The Generalization Error Bound for a Stochastic Gradient Descent Family via a Gaussian Approximation Method
Published: 24 June 2025
Page range: 251 - 266
Received: 03 June 2024
Accepted: 16 October 2024
DOI: https://doi.org/10.61822/amcs-2025-0018
© 2025 Hao Chen et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
Recent works have developed model-complexity-based and algorithm-based generalization error bounds to explain how stochastic gradient descent (SGD) methods help over-parameterized models generalize better. However, previous analyses are limited in scope and fail to provide comprehensive explanations. In this paper, we propose a novel Gaussian approximation framework to establish generalization error bounds for the 𝒰-SGD family, a class of SGD methods with asymptotically unbiased and uniformly bounded gradient noise. We study 𝒰-SGD dynamics and show, both theoretically and numerically, that the limiting distribution of the model parameters tends to be Gaussian even when the original gradient noise is non-Gaussian. For the 𝒰-SGD family, we establish a desirable iteration-number-independent generalization error bound at the order of
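The Gaussian-limit claim in the abstract can be illustrated numerically. The following is a minimal sketch (not the authors' experiment): it runs many independent SGD trajectories on a toy quadratic loss with centered, uniformly bounded, decidedly non-Gaussian gradient noise (uniform noise here, chosen for illustration), then checks that the empirical distribution of the final iterates has skewness and kurtosis close to Gaussian values. The loss, noise model, and hyperparameters are all assumptions for the sake of the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_runs(n_runs=2000, n_steps=500, lr=0.05):
    """Run SGD on the 1-D quadratic loss f(w) = w**2 / 2 with
    asymptotically unbiased, uniformly bounded, non-Gaussian
    gradient noise (centered uniform), and collect the final
    iterate from each independent run."""
    finals = np.empty(n_runs)
    for i in range(n_runs):
        w = 1.0
        for _ in range(n_steps):
            noise = rng.uniform(-1.0, 1.0)  # bounded, mean-zero, non-Gaussian
            grad = w + noise                # true gradient of w**2/2 plus noise
            w -= lr * grad
        finals[i] = w
    return finals

finals = sgd_runs()
# Standardize and compare empirical moments with those of N(0, 1):
z = (finals - finals.mean()) / finals.std()
skew = np.mean(z**3)
kurt = np.mean(z**4)
print(f"skewness = {skew:.3f} (Gaussian: 0), kurtosis = {kurt:.3f} (Gaussian: 3)")
```

Each final iterate is an exponentially weighted sum of many bounded noise terms, so a central-limit effect drives the iterate distribution toward Gaussian even though each noise draw is uniform; this is the intuition behind the Gaussian approximation the paper develops rigorously.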