Documents
- Title Pages_Contents (open access)
- Chapter 2 (open access; full text at publisher's site)
- Chapter 4 (open access; full text at publisher's site)
- Bibliography (open access)
- Summary in English (open access)
- Summary in Dutch (open access)
- Propositions (open access)
Risk bounds for deep learning
In this thesis, deep learning is studied from a statistical perspective. Convergence rates for worst-case risk bounds of neural network estimators are obtained in the classification, density estimation, and linear regression models. Special attention is given to the role of the input dimension, since in practice neural networks have shown promising results in high-dimensional input settings.
First, the estimation of conditional class probabilities under the cross-entropy loss is studied. A challenge with this loss is that it is unbounded as the predicted probabilities approach zero. To deal with this, the loss is truncated. Convergence rates are obtained for a neural network estimator under this truncated loss.
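As a rough sketch of the idea (not the thesis's exact construction; the function name `truncated_cross_entropy` and the truncation level `eps` are hypothetical choices), the cross-entropy loss can be bounded by clamping the predicted probabilities away from zero:

```python
import numpy as np

def truncated_cross_entropy(p_hat, y, eps=1e-3):
    """Cross-entropy -log p_hat[i, y_i], with probabilities clamped to
    [eps, 1] so the loss is bounded above by -log(eps).

    p_hat : (n, K) array of predicted class probabilities
    y     : (n,) array of class labels in {0, ..., K-1}
    eps   : truncation level (illustrative; the thesis's choice may differ)
    """
    p = np.clip(p_hat[np.arange(len(y)), y], eps, 1.0)
    return float(-np.log(p).mean())
```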
The second problem considered is density estimation. A two-step procedure is proposed. The first step transforms the density estimation problem into a regression problem by constructing response variables with a kernel density estimator applied to half of the data. In the second step, a neural network is fitted to the constructed data. Convergence rates for this method are obtained using existing approximation results for compositional functions.
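A minimal sketch of such a two-step procedure, with off-the-shelf components standing in for the estimators analyzed in the thesis (scipy's `gaussian_kde` and scikit-learn's `MLPRegressor` are illustrative substitutes, not the thesis's constructions):

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))   # sample from the unknown density

# Step 1: a kernel density estimator on one half of the data constructs
# regression responses at the points of the other half.
half = len(X) // 2
kde = gaussian_kde(X[:half].T)   # gaussian_kde expects shape (d, n)
X2 = X[half:]
y = kde(X2.T)

# Step 2: fit a neural network to the constructed regression data; the
# fitted network serves as the density estimator.
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(X2, y)
density_at_origin = net.predict(np.zeros((1, 2)))
```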
Finally, forward gradient descent is studied. This is a biologically motivated alternative to gradient descent. Convergence rates are derived for this method in the linear regression model with random design.
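A minimal sketch of forward gradient descent in this setting (dimensions, step size, and iteration count are illustrative, not the thesis's tuning): at each step the gradient of the squared loss is contracted with a random direction v, and (grad · v) v, which is an unbiased estimate of the gradient when v is standard normal, drives the update; only a forward-mode directional derivative is needed, no backward pass.

```python
import numpy as np

rng = np.random.default_rng(1)
d, steps, lr = 5, 5000, 0.01
theta_star = rng.normal(size=d)        # unknown regression vector
theta = np.zeros(d)

for _ in range(steps):
    x = rng.normal(size=d)             # random design
    y = x @ theta_star + rng.normal()  # noisy response
    grad = -2.0 * (y - x @ theta) * x  # gradient of the squared loss (y - x.theta)^2
    v = rng.normal(size=d)             # random direction
    # Forward gradient update: (grad . v) v is an unbiased gradient estimate.
    theta -= lr * (grad @ v) * v

print(np.linalg.norm(theta - theta_star))  # should be small
```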
- All authors: Bos, J.M.
- Supervisor: Grünwald, P.D.; Schmidt-Hieber, A.J.
- Committee: Derks, G.L.A.; Goeman, J.J.; Rohde, A.; Kohler, M.; Castro, R.M.
- Qualification: Doctor (dr.)
- Awarding Institution: Mathematical Institute (MI), Faculty of Science, Leiden University
- Date: 2024-06-19
- ISBN (print): 9789464699784
Funding
- Sponsorship: NWO/STAR, NWO