Double descent

In statistics and machine learning, double descent is the phenomenon where a statistical model with a small number of parameters and a model with a very large number of parameters both achieve low test error, while a model whose number of parameters is roughly equal to the number of training data points suffers high test error.

History
Early observations of double descent in specific models date back to 1989, while double descent as a broader phenomenon shared by many model classes gained attention around 2019. The renewed interest was prompted by a perceived contradiction between the conventional wisdom that too many parameters cause significant generalization error (an extrapolation of the bias-variance tradeoff) and empirical observations in the 2010s that some modern machine learning models tend to perform better as they grow larger.

Theoretical models
A simple theoretical model shows that double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise.
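The phenomenon in this setting can be reproduced numerically. The following minimal sketch (the sample size, noise level, and trial count are illustrative choices, not taken from any specific paper) fits minimum-norm least squares, computed via the pseudoinverse, to data with isotropic Gaussian covariates and noise, and tracks the test error as the number of parameters p varies:

```python
# Minimal sketch: double descent in minimum-norm linear regression.
# Illustrative assumptions: n = 40 training points, noise level 0.5,
# unit-norm true coefficients; averaged over trials to smooth the curve.
import numpy as np

rng = np.random.default_rng(0)
n, n_test, sigma, trials = 40, 1000, 0.5, 50

for p in range(2, 121, 2):
    errs = []
    for _ in range(trials):
        beta = rng.standard_normal(p)
        beta /= np.linalg.norm(beta)           # unit-norm true coefficients
        X = rng.standard_normal((n, p))        # isotropic Gaussian covariates
        y = X @ beta + sigma * rng.standard_normal(n)
        # The pseudoinverse gives ordinary least squares when p <= n and the
        # minimum-norm interpolating solution when p > n.
        beta_hat = np.linalg.pinv(X) @ y
        X_test = rng.standard_normal((n_test, p))
        y_test = X_test @ beta + sigma * rng.standard_normal(n_test)
        errs.append(np.mean((X_test @ beta_hat - y_test) ** 2))
    print(f"p = {p:3d}   mean test MSE = {np.mean(errs):.3f}")
```

The printed error falls for small p, spikes near the interpolation threshold p = n = 40, and descends again as p grows beyond n.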

A model of double descent in the thermodynamic limit has been analyzed using the replica method, and the result has been confirmed numerically.
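For orientation, the limiting test risk in the isotropic Gaussian model above has a well-known closed form in the proportional regime, with overparameterization ratio \(\gamma = p/n\), signal norm \(r^2 = \lVert\beta\rVert^2\), and noise variance \(\sigma^2\) (stated here as the standard proportional-asymptotics result for that model, not as the replica computation itself):

\[
R(\gamma) =
\begin{cases}
\sigma^2 \dfrac{\gamma}{1-\gamma}, & \gamma < 1, \\[1ex]
r^2 \left(1 - \dfrac{1}{\gamma}\right) + \dfrac{\sigma^2}{\gamma - 1}, & \gamma > 1,
\end{cases}
\]

which diverges at the interpolation threshold \(\gamma = 1\), producing the characteristic double descent peak.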

Empirical examples
The scaling behavior of double descent has been found to follow a broken neural scaling law functional form.
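For concreteness, here is a sketch of that functional form (the parametrization follows the usual presentation of broken neural scaling laws; the example parameter values are arbitrary illustrations, not fitted values from any experiment):

```python
import numpy as np

def bnsl(x, a, b, c0, breaks):
    """Broken neural scaling law:
    y = a + b * x**(-c0) * prod_i (1 + (x / d_i)**(1 / f_i))**(-c_i * f_i)

    `breaks` is a list of (c_i, d_i, f_i) triples: each one shifts the local
    power-law exponent by c_i around the scale d_i, with sharpness f_i.
    """
    x = np.asarray(x, dtype=float)
    y = b * x ** (-c0)
    for c_i, d_i, f_i in breaks:
        y = y * (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
    return a + y

# Made-up parameters producing a double-descent-shaped curve: the first break
# (negative c_i) bends the curve upward, the second bends it back down,
# giving descent, ascent, then a second descent.
x = np.logspace(0, 4, 9)
print(bnsl(x, a=0.1, b=2.0, c0=0.3, breaks=[(-0.6, 30.0, 1.0), (0.9, 300.0, 1.0)]))
```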