Some Mathematical Aspects of Deep Learning and Stochastic Gradient Descent

Lexing Ying, Stanford University
Fine Hall 214

In-Person Talk

This talk concerns several mathematical aspects of deep learning and stochastic gradient descent (SGD). The first is why deep neural networks trained with SGD often generalize well. We make a connection between generalization and the stochastic stability of the SGD dynamics. The second is understanding the training process of SGD. Here, we use simple mathematical examples to explain several key empirical observations, including the edge of stability, the exploration of flat minima, and learning rate decay.

Based on joint work with Chao Ma.