Open Access

A Novel Variance Reduction Proximal Stochastic Newton Algorithm for Large-Scale Machine Learning Optimization

  
31 Dec 2024
ABOUT THIS ARTICLE

Language:
English
Publication frequency:
4 times per year
Journal subjects:
Computer Science, other