Benign overfitting in linear regression.

Peter L Bartlett, Philip M Long, Gábor Lugosi, Alexander Tsigler.

Abstract

The phenomenon of benign overfitting is one of the key mysteries uncovered by deep learning methodology: deep neural networks seem to predict well, even with a perfect fit to noisy training data. Motivated by this phenomenon, we consider when a perfect fit to training data in linear regression is compatible with accurate prediction. We give a characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy. The characterization is in terms of two notions of the effective rank of the data covariance. It shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size. By studying examples of data covariance properties that this characterization shows are required for benign overfitting, we find an important role for finite-dimensional data: the accuracy of the minimum norm interpolating prediction rule approaches the best possible accuracy for a much narrower range of properties of the data distribution when the data lie in an infinite-dimensional space vs. when the data lie in a finite-dimensional space with dimension that grows faster than the sample size.
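The two objects named in the abstract, the minimum norm interpolating prediction rule and the effective rank of the data covariance, can be illustrated with a minimal NumPy sketch. It rests on assumptions that this record does not state: the effective ranks are taken to be r_k = (sum of eigenvalues beyond the k-th) / (the (k+1)-th eigenvalue) and R_k = (sum of tail eigenvalues)^2 / (sum of squared tail eigenvalues), and the spectrum, sample sizes, noise level, and function names are hypothetical choices made only for illustration.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Minimum l2-norm solution of X @ theta = y, computed via the pseudoinverse."""
    return np.linalg.pinv(X) @ y


def effective_ranks(eigvals, k):
    """Two effective ranks of the covariance tail beyond the top-k eigenvalues.

    Assumed definitions (not spelled out in the record above):
      r_k = (sum_{i>k} lam_i) / lam_{k+1}
      R_k = (sum_{i>k} lam_i)^2 / sum_{i>k} lam_i^2
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]  # descending order
    tail = lam[k:]                                         # lam_{k+1}, lam_{k+2}, ...
    r_k = tail.sum() / tail[0]
    R_k = tail.sum() ** 2 / np.sum(tail ** 2)
    return r_k, R_k


# Hypothetical overparameterized problem: n samples, p >> n features.
rng = np.random.default_rng(0)
n, p = 20, 200
eigvals = 1.0 / np.arange(1, p + 1)                 # illustrative decaying spectrum
X = rng.standard_normal((n, p)) * np.sqrt(eigvals)  # rows have covariance diag(eigvals)
theta_star = np.zeros(p)
theta_star[0] = 1.0
y = X @ theta_star + 0.1 * rng.standard_normal(n)   # noisy labels

theta_hat = min_norm_interpolator(X, y)
train_residual = np.max(np.abs(X @ theta_hat - y))  # near zero: a perfect fit to noisy data
r_0, R_0 = effective_ranks(eigvals, k=0)
```

With more features than samples, `theta_hat` fits the noisy labels exactly; whether it also predicts well depends, per the abstract, on how the effective ranks compare with the sample size.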

Keywords: interpolation; linear regression; overfitting; statistical learning theory

Year: 2020 PMID: 32332161 DOI: 10.1073/pnas.1907378117

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205

Cited

11 in total

1. The science of deep learning.

Authors: Richard Baraniuk; David Donoho; Matan Gavish
Journal: Proc Natl Acad Sci U S A Date: 2020-11-23 Impact factor: 11.205

2. Surprises in high-dimensional ridgeless least squares interpolation.

Authors: Trevor Hastie; Andrea Montanari; Saharon Rosset; Ryan J Tibshirani
Journal: Ann Stat Date: 2022-04-07 Impact factor: 4.904

3. A smart, practical, deep learning-based clinical decision support tool for patients in the prostate-specific antigen gray zone: model development and validation.

Authors: Sang Hun Song; Hwanik Kim; Jung Kwon Kim; Hakmin Lee; Jong Jin Oh; Sang-Chul Lee; Seong Jin Jeong; Sung Kyu Hong; Junghoon Lee; Sangjun Yoo; Min-Soo Choo; Min Chul Cho; Hwancheol Son; Hyeon Jeong; Jungyo Suh; Seok-Soo Byun
Journal: J Am Med Inform Assoc Date: 2022-10-07 Impact factor: 7.942

4. High-dimensional dynamics of generalization error in neural networks.

Authors: Madhu S Advani; Andrew M Saxe; Haim Sompolinsky
Journal: Neural Netw Date: 2020-09-05

5. Using Muse: Rapid Mobile Assessment of Brain Performance.

Authors: Olave E Krigolson; Mathew R Hammerstrom; Wande Abimbola; Robert Trska; Bruce W Wright; Kent G Hecker; Gordon Binsted
Journal: Front Neurosci Date: 2021-01-28 Impact factor: 4.677

6. Establishment and Effectiveness Evaluation of a Scoring System-RAAS (RDW, AGE, APACHE II, SOFA) for Sepsis by a Retrospective Analysis.

Authors: Yingying Huang; Shaowei Jiang; Wenjie Li; Yiwen Fan; Yuxin Leng; Chengjin Gao
Journal: J Inflamm Res Date: 2022-01-20

10. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation.

Authors: Davide Chicco; Matthijs J Warrens; Giuseppe Jurman
Journal: PeerJ Comput Sci Date: 2021-07-05