M. Liu, Y. Mroueh, J. Ross, W. Zhang, X. Cui, P. Das, and T. Yang, “Towards understanding acceleration phenomena in large-scale stochastic optimization and deep learning,” arXiv preprint arXiv:2203.17191, 2022.
Y. Arjevani, Y. Carmon, J. C. Duchi, D. J. Foster, N. Srebro, and B. Woodworth, “Lower bounds for non-convex stochastic optimization,” Journal of Machine Learning Research, vol. 23, no. 115, pp. 1–75, 2022.
D. Richards, M. Rabbat, and M. Rowland, “Sharpness-aware minimization improves distributed training of neural networks,” in International Conference on Machine Learning. PMLR, 2023, pp. 29 115–29 135.
N. Agarwal, Z. Allen-Zhu, K. Sridharan, and Y. Wang, “On the theory of variance reduction for stochastic gradient Monte Carlo,” Mathematical Programming, pp. 1–41, 2023.
F. Huang, S. Chen, and Z. Huang, “Revisiting ResNets: Improved training and scaling strategies,” Neural Networks, vol. 153, pp. 324–337, 2022.
P. Xu, Z. Chen, D. Zou, and Q. Gu, “How can we craft large-scale neural networks in the presence of measurement noise?” Advances in Neural Information Processing Systems, vol. 34, pp. 28 140–28 152, 2021.
R. Johnson et al., “Stochastic variance reduced gradient descent for non-convex optimization,” Journal of Machine Learning Research, vol. 21, pp. 1–30, 2020.
T. Guo, Y. Liu, and C. Han, “An overview of stochastic quasi-Newton methods for large-scale machine learning,” Optimization Letters, vol. 17, no. 2, pp. 385–400, 2023. doi:10.1007/s11590-023-01884-8.
H. Zhang, Q. Yang, and Y. Zhang, “Linear convergence of stochastic gradient descent for non-strongly convex smooth optimization,” in Proceedings of the 37th International Conference on Machine Learning, 2020, pp. 124–135. doi:10.5555/3327763.3327786.
A. K. Sinha, M. K. Gupta, and A. R. Jain, “Variance reduction techniques for stochastic gradient descent in deep learning,” in Proceedings of the 38th International Conference on Machine Learning, 2021, pp. 1–10. doi:10.5555/3495724.3495801.
T. Qianqian, L. Guannan, and C. Xingyu, “Asynchronous parallel stochastic quasi-Newton methods,” Journal of Computational and Applied Mathematics, vol. 386, pp. 112–123, 2021. doi:10.1016/j.cam.2021.112123.
R. M. Gower, P. Richtarik, and F. Bach, “Stochastic block coordinate descent with variance reduction,” IEEE Transactions on Information Theory, vol. 64, no. 9, pp. 6262–6281, 2018. doi:10.1109/TIT.2018.2841289.
Y. Chen et al., “Variance reduced stochastic gradient descent with momentum for non-convex optimization,” in Proceedings of the 37th International Conference on Machine Learning (ICML), 2020, pp. 1–10.