Open Access

A Novel Variance Reduction Proximal Stochastic Newton Algorithm for Large-Scale Machine Learning Optimization

  
31 Dec 2024
ABOUT THIS ARTICLE

Language:
English
Publication frequency:
4 times per year
Journal subjects:
Computer Science, other