Samuel Daudin (LJLL, Univ. Paris-Cité)
Performing regression tasks with deep neural networks can be modeled as an optimal control problem for an ordinary differential equation. The ODE describes the evolution of the features along the layers of the network, while the control is given, for each layer, by the parameters (weights and biases) of the neurons. We investigate a relaxation of this problem in which the controls are probability measures over the parameter space and the cost involves an additional entropy penalization. We are particularly interested in the stability of the optimal parameters, where stability is understood in terms of second-order optimality conditions. We show that, for an open and dense set of initial data (in terms of the initial distribution of the features), the control problem admits a unique stable global minimizer. Moreover, we prove that stable minimizers satisfy a local Polyak-Łojasiewicz condition and that the continuous-time analog of gradient descent with back-propagation converges exponentially fast when initialized near a stable minimizer. This is joint work with François Delarue.
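
For orientation, here is a minimal sketch of the kind of relaxed problem described above, written in our own notation (the dynamics f, reference measure \nu, and penalization parameter \lambda are illustrative assumptions; the precise formulation in the talk may differ). The features X_t evolve in layer time t under a measure-valued control \mu_t over the parameter space, and the cost combines a terminal regression loss with an entropy penalization of the controls:
\[
  \dot X_t \;=\; \int_{\mathbb{R}^m} f(X_t, a)\, d\mu_t(a), \qquad X_0 \sim m_0,
\]
\[
  J(\mu) \;=\; \mathbb{E}\bigl[\ell(X_T, Y)\bigr] \;+\; \lambda \int_0^T H\bigl(\mu_t \,\big|\, \nu\bigr)\, dt,
\]
where, for instance, f(x,(w,b)) = \sigma(wx+b) for an activation \sigma, and H(\cdot\,|\,\nu) denotes relative entropy with respect to a fixed reference measure \nu. As a generic illustration of how a local Polyak-Łojasiewicz condition yields the stated convergence (again in schematic notation, not the talk's), an inequality of the form \|\nabla J(\mu)\|^2 \ge 2c\,(J(\mu) - J^*) along a gradient flow \dot\mu = -\nabla J(\mu) gives \frac{d}{dt}(J - J^*) \le -2c\,(J - J^*), hence exponential decay J(\mu_t) - J^* \le e^{-2ct}\,(J(\mu_0) - J^*) for initializations in the region where the inequality holds.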