
The Generalization Error Bound for a Stochastic Gradient Descent Family via a Gaussian Approximation Method

Jun 24, 2025


Arora, S., Du, S.S., Hu, W., Li, Z., Salakhutdinov, R.R. and Wang, R. (2019). On exact computation with an infinitely wide neural net, in H. Wallach et al. (Eds), Advances in Neural Information Processing Systems, Vol. 32, Curran Associates, New York, pp. 8141–8150.

Bartlett, P.L., Foster, D.J. and Telgarsky, M.J. (2017). Spectrally-normalized margin bounds for neural networks, in I. Guyon et al. (Eds), Advances in Neural Information Processing Systems, Vol. 30, Curran Associates, New York, pp. 6240–6249.

Bottou, L. (1998). On-line learning and stochastic approximations, in L. Bottou (Ed.), On-line Learning in Neural Networks, Cambridge University Press, Cambridge, pp. 9–42.

Bousquet, O. and Elisseeff, A. (2002). Stability and generalization, Journal of Machine Learning Research 2: 499–526, DOI: 10.1162/153244302760200704.

Chen, H., Mo, Z., Yang, Z. and Wang, X. (2019). Theoretical investigation of generalization bound for residual networks, Proceedings of the 28th International Joint Conference on Artificial Intelligence, Morgan Kaufmann, San Francisco, pp. 2081–2087, DOI: 10.24963/ijcai.2019/288.

Chen, Y., Chang, H., Meng, J. and Zhang, D. (2019). Ensemble neural networks (ENN): A gradient-free stochastic method, Neural Networks 110: 170–185.

D’Agostino, R. and Pearson, E.S. (1973). Tests for departure from normality: Empirical results for the distributions of b2 and √b1, Biometrika 60(3): 613–622.

Dieuleveut, A., Durmus, A. and Bach, F. (2017). Bridging the gap between constant step size stochastic gradient descent and Markov chains, arXiv: 1707.06386.

Elisseeff, A., Evgeniou, T. and Pontil, M. (2005). Stability of randomized learning algorithms, Journal of Machine Learning Research 6: 55–79.

Feng, Y., Gao, T., Li, L., Liu, J.-G. and Lu, Y. (2020). Uniform-in-time weak error analysis for stochastic gradient descent algorithms via diffusion approximation, Communications in Mathematical Sciences 18(1).

Hardt, M., Recht, B. and Singer, Y. (2016). Train faster, generalize better: Stability of stochastic gradient descent, in M.F. Balcan and K.Q. Weinberger (Eds), Proceedings of the 33rd International Conference on Machine Learning, New York, USA, pp. 1225–1234.

He, F., Liu, T. and Tao, D. (2019). Control batch size and learning rate to generalize well: Theoretical and empirical evidence, in H. Wallach et al. (Eds), Advances in Neural Information Processing Systems, Vol. 32, Curran Associates, New York, pp. 1143–1152.

He, K., Zhang, X., Ren, S. and Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1026–1034.

Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H. and Kalenichenko, D. (2018). Quantization and training of neural networks for efficient integer-arithmetic-only inference, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2704–2713, DOI: 10.1109/CVPR.2018.00286.

Kramers, H. (1940). Brownian motion in a field of force and the diffusion model of chemical reactions, Physica 7(4): 284–304.

Krizhevsky, A. and Hinton, G. (2009). Learning Multiple Layers of Features from Tiny Images, Master’s thesis, Department of Computer Science, University of Toronto.

LeCun, Y. and Cortes, C. (2010). MNIST handwritten digit database, http://yann.lecun.com/exdb/mnist/.

Li, H., Xu, Z., Taylor, G., Studer, C. and Goldstein, T. (2018). Visualizing the loss landscape of neural nets, in S. Bengio et al. (Eds), Advances in Neural Information Processing Systems, Vol. 31, Curran Associates, New York, pp. 6389–6399.

Li, J., Luo, X. and Qiao, M. (2020). On generalization error bounds of noisy gradient methods for non-convex learning, arXiv: 1902.00621.

Li, Q., Tai, C. and Weinan, E. (2019). Stochastic modified equations and dynamics of stochastic gradient algorithms I: Mathematical foundations, Journal of Machine Learning Research 20(40): 1–47.

Li, X., Lu, J., Wang, Z., Haupt, J. and Zhao, T. (2019). On tighter generalization bound for deep neural networks: CNNs, ResNets, and beyond, arXiv: 1806.05159.

Ljung, L., Pflug, G. and Walk, H. (1992). Stochastic Approximation and Optimization of Random Systems, Birkhäuser, Basel, Switzerland.

London, B., Huang, B., Taskar, B. and Getoor, L. (2014). PAC-Bayesian collective stability, in S. Kaski and J. Corander (Eds), Proceedings of the 17th International Conference on Artificial Intelligence and Statistics, Reykjavik, Iceland, pp. 585–594.

Mandt, S., Hoffman, M.D. and Blei, D.M. (2017). Stochastic gradient descent as approximate Bayesian inference, Journal of Machine Learning Research 18(1): 4873–4907.

McAllester, D.A. (1999). PAC-Bayesian model averaging, Proceedings of the 12th Annual Conference on Computational Learning Theory, COLT ’99, New York, NY, USA, pp. 164–170, DOI: 10.1145/307400.307435.

Negrea, J., Haghifam, M., Dziugaite, G.K., Khisti, A. and Roy, D.M. (2019). Information-theoretic generalization bounds for SGLD via data-dependent estimates, in H. Wallach et al. (Eds), Advances in Neural Information Processing Systems, Vol. 32, Curran Associates, New York, pp. 11015–11025.

Panigrahi, A., Somani, R., Goyal, N. and Netrapalli, P. (2019). Non-Gaussianity of stochastic gradient noise, arXiv: 1910.09626.

Qian, Y., Wang, Y., Wang, B., Gu, Z., Guo, Y. and Swaileh, W. (2022). Hessian-free second-order adversarial examples for adversarial learning, arXiv: 2207.01396.

Rong, Y., Huang, W., Xu, T. and Huang, J. (2020). DropEdge: Towards deep graph convolutional networks on node classification, International Conference on Learning Representations 2020, https://openreview.net/pdf?id=Hkx1qkrKPr; arXiv: 1907.10903.

Shalev-Shwartz, S. and Ben-David, S. (2014). Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press, Cambridge.

Simsekli, U., Sagun, L. and Gurbuzbalaban, M. (2019). A tail-index analysis of stochastic gradient noise in deep neural networks, in K. Chaudhuri and R. Salakhutdinov (Eds), Proceedings of the 36th International Conference on Machine Learning, Long Beach, USA, pp. 5827–5837.

Sulaiman, I.M., Kaelo, P., Khalid, R. and Nawawi, M.K.M. (2024). A descent generalized RMIL spectral gradient algorithm for optimization problems, International Journal of Applied Mathematics and Computer Science 34(2): 225–233, DOI: 10.61822/amcs-2024-0016.

Sutskever, I., Martens, J., Dahl, G. and Hinton, G. (2013). On the importance of initialization and momentum in deep learning, in S. Dasgupta and D. McAllester (Eds), Proceedings of the 30th International Conference on Machine Learning, Atlanta, USA, pp. 1139–1147.

Villani, C. (2008). Optimal Transport: Old and New, Grundlehren der Mathematischen Wissenschaften, Springer, Berlin/Heidelberg.

Weinan, E., Ma, C. and Wang, Q. (2019). A priori estimates of the population risk for residual networks, arXiv: 1903.02154.

Welling, M. and Teh, Y.W. (2011). Bayesian learning via stochastic gradient Langevin dynamics, Proceedings of the 28th International Conference on Machine Learning, ICML ’11, Omnipress, Madison, WI, USA, pp. 681–688.

Wu, J., Hu, W., Xiong, H., Huan, J., Braverman, V. and Zhu, Z. (2020). On the noisy gradient descent that generalizes as SGD, International Conference on Machine Learning, PMLR 119: 10367–10376.

Zhang, C., Bengio, S., Hardt, M., Recht, B. and Vinyals, O. (2016). Understanding deep learning requires rethinking generalization, arXiv: 1611.03530.
