This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Szmuc, T., Mrówka, R., Brańka, M., Ficoń, J., Pieta, P., A Novel Method for Fast Generation of 3D Objects from Multiple Depth Sensors, Journal of Artificial Intelligence and Soft Computing Research, 2023, 13(2): 95–105.
Martin-Gomez, A., Li, H., Song, T., Yang, S., Wang, G., Ding, H., Navab, N., Zhao, Z., Armand, M., STTAR: Surgical tool tracking using off-the-shelf augmented reality head-mounted displays, IEEE Transactions on Visualization and Computer Graphics, 2023, 1–16.
Rodrigues, R.T., Miraldo, P., Dimarogonas, D.V., Aguiar, A.P., A framework for depth estimation and relative localization of ground robots using computer vision, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019, 3719–3724.
Silva, R., Cielniak, G., Gao, J., Leaving the Lines Behind: Vision-Based Crop Row Exit for Agricultural Robot Navigation, arXiv preprint arXiv:2306.05869, 2023.
Sharma, A., Nett, R., Ventura, J., Unsupervised learning of depth and ego-motion from cylindrical panoramic video with applications for virtual reality, International Journal of Semantic Computing, 2020, 14(3): 333–356.
Rasla, A., Beyeler, M., The relative importance of depth cues and semantic edges for indoor mobility using simulated prosthetic vision in immersive virtual reality, Proceedings of the 28th ACM Symposium on Virtual Reality Software and Technology, 2022, 1–11.
Patakin, N., Vorontsova, A., Artemyev, M., Konushin, A., Single-stage 3D geometry-preserving depth estimation model training on dataset mixtures with uncalibrated stereo data, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 1705–1714.
Peng, R., Wang, R., Wang, Z., Lai, Y., Wang, R., Rethinking depth estimation for multi-view stereo: A unified representation, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 8645–8654.
Choe, J., Joo, K., Imtiaz, T., Kweon, I.S., Volumetric propagation network: Stereo-LiDAR fusion for long-range depth estimation, IEEE Robotics and Automation Letters, 2021, 6(3): 4672–4679.
Hirschmuller, H., Accurate and efficient stereo processing by semi-global matching and mutual information, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, 2: 807–814.
Chang, J.-R., Chen, Y.-S., Pyramid stereo matching network, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, 5410–5418.
Liu, P., King, I., Lyu, M.R., Xu, J., Flow2Stereo: Effective self-supervised learning of optical flow and stereo matching, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, 6648–6657.
Ullman, S., The interpretation of structure from motion, Proceedings of the Royal Society of London. Series B. Biological Sciences, 1979, 203(1153): 405–426.
Zhou, T., Brown, M., Snavely, N., Lowe, D.G., Unsupervised learning of depth and ego-motion from video, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017, 1851–1858.
Godard, C., Mac Aodha, O., Firman, M., Brostow, G.J., Digging into self-supervised monocular depth estimation, Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, 3828–3838.
Zhou, Z., Fan, X., Shi, P., Xin, Y., R-MSFM: Recurrent multi-scale feature modulation for monocular depth estimating, Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, 12777–12786.
Zhang, N., Nex, F., Vosselman, G., Kerle, N., Lite-Mono: A lightweight CNN and transformer architecture for self-supervised monocular depth estimation, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, 18537–18546.
Zhang, X., Zhou, X., Lin, M., Sun, J., ShuffleNet: An extremely efficient convolutional neural network for mobile devices, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, 6848–6856.
Eigen, D., Fergus, R., Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture, Proceedings of the IEEE/CVF International Conference on Computer Vision, 2015, 2650–2658.
Hui, T.-W., RM-Depth: Unsupervised learning of recurrent monocular depth in dynamic scenes, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 1675–1684.
Yan, J., Zhao, H., Bu, P., Jin, Y., Channel-wise attention-based network for self-supervised monocular depth estimation, 2021 International Conference on 3D Vision (3DV), 2021, 464–473.
Zhao, C., Zhang, Y., Poggi, M., Tosi, F., Guo, X., Zhu, Z., Huang, G., Tang, Y., Mattoccia, S., MonoViT: Self-supervised monocular depth estimation with a vision transformer, 2022 International Conference on 3D Vision (3DV), 2022, 668–678.
He, M., Hui, L., Bian, Y., Ren, J., Xie, J., Yang, J., RA-Depth: Resolution adaptive self-supervised monocular depth estimation, European Conference on Computer Vision, 2022, 565–581.
Shim, D., Kim, H.J., SwinDepth: Unsupervised depth estimation using monocular sequences via Swin transformer and densely cascaded network, arXiv preprint arXiv:2301.06715, 2023.
Jaderberg, M., Vedaldi, A., Zisserman, A., Speeding up convolutional neural networks with low rank expansions, Proceedings of the British Machine Vision Conference, 2014.
Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H., MobileNets: Efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861, 2017.
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.-C., MobileNetV2: Inverted residuals and linear bottlenecks, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, 4510–4520.
Howard, A., Sandler, M., Chu, G., Chen, L.-C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V., et al., Searching for MobileNetV3, Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, 1314–1324.
Ma, N., Zhang, X., Zheng, H.-T., Sun, J., ShuffleNet V2: Practical guidelines for efficient CNN architecture design, Proceedings of the European Conference on Computer Vision, 2018, 116–131.
Mehta, S., Rastegari, M., MobileViT: Light-weight, general-purpose, and mobile-friendly vision transformer, International Conference on Learning Representations, 2021.
Yang, R., Ma, H., Wu, J., Tang, Y., Xiao, X., Zheng, M., Li, X., ScalableViT: Rethinking the context-oriented generalization of vision transformer, Proceedings of the European Conference on Computer Vision, 2022, 480–496.
Chen, Y., Dai, X., Chen, D., Liu, M., Dong, X., Yuan, L., Liu, Z., Mobile-Former: Bridging MobileNet and transformer, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, 5270–5279.
Ho, J., Kalchbrenner, N., Weissenborn, D., Salimans, T., Axial attention in multidimensional transformers, arXiv preprint arXiv:1912.12180, 2019.
Mehta, S., Rastegari, M., Separable self-attention for mobile vision transformers, Transactions on Machine Learning Research, 2022.
Ronneberger, O., Fischer, P., Brox, T., U-Net: Convolutional networks for biomedical image segmentation, Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, 234–241.
Krizhevsky, A., Sutskever, I., Hinton, G.E., ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, 2012, 25.
Xie, S., Girshick, R., Dollar, P., Tu, Z., He, K., Aggregated residual transformations for deep neural networks, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2017, 1492–1500.
Glorot, X., Bordes, A., Bengio, Y., Deep sparse rectifier neural networks, Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, 315–323.
Maas, A.L., Hannun, A.Y., Ng, A.Y., et al., Rectifier nonlinearities improve neural network acoustic models, Proc. ICML, 2013, 30(1): 3.
Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P., Image quality assessment: From error visibility to structural similarity, IEEE Transactions on Image Processing, 2004, 13(4): 600–612.
Girshick, R., Fast R-CNN, Proceedings of the IEEE/CVF International Conference on Computer Vision, 2015, 1440–1448.
Zhou, H., Greenwood, D., Taylor, S., Self-supervised monocular depth estimation with internal feature fusion, arXiv preprint arXiv:2110.09482, 2021.
Geiger, A., Lenz, P., Stiller, C., Urtasun, R., Vision meets robotics: The KITTI dataset, The International Journal of Robotics Research, 2013, 32(11): 1231–1237.
Saxena, A., Sun, M., Ng, A.Y., Make3D: Learning 3D scene structure from a single still image, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 31(5): 824–840.
Eigen, D., Puhrsch, C., Fergus, R., Depth map prediction from a single image using a multi-scale deep network, Advances in Neural Information Processing Systems, 2014, 27.
Wang, C., Buenaposada, J.M., Zhu, R., Lucey, S., Learning depth from monocular videos using direct methods, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, 2022–2030.