Matthew J Salganik1, Ian Lundberg2, Alexander T Kindel2, Caitlin E Ahearn3, Khaled Al-Ghoneim4, Abdullah Almaatouq5,6, Drew M Altschul7, Jennie E Brand3,8, Nicole Bohme Carnegie9, Ryan James Compton10, Debanjan Datta11, Thomas Davidson12, Anna Filippova13, Connor Gilroy14, Brian J Goode15, Eaman Jahani16, Ridhi Kashyap17,18,19, Antje Kirchner20, Stephen McKay21, Allison C Morgan22, Alex Pentland6, Kivan Polimis23, Louis Raes24, Daniel E Rigobon25, Claudia V Roberts26, Diana M Stanescu27, Yoshihiko Suhara6, Adaner Usmani28, Erik H Wang27, Muna Adem29, Abdulla Alhajri30, Bedoor AlShebli31, Redwane Amin32, Ryan B Amos26, Lisa P Argyle33, Livia Baer-Bositis34, Moritz Büchi35, Bo-Ryehn Chung36, William Eggert37, Gregory Faletto38, Zhilin Fan39, Jeremy Freese34, Tejomay Gadgil40, Josh Gagné34, Yue Gao41, Andrew Halpern-Manners29, Sonia P Hashim26, Sonia Hausen34, Guanhua He42, Kimberly Higuera34, Bernie Hogan43, Ilana M Horwitz44, Lisa M Hummel34, Naman Jain25, Kun Jin45, David Jurgens46, Patrick Kaminski29,47, Areg Karapetyan48,49, E H Kim34, Ben Leizman26, Naijia Liu27, Malte Möser26, Andrew E Mack27, Mayank Mahajan26, Noah Mandell50, Helge Marahrens29, Diana Mercado-Garcia44, Viola Mocz51, Katariina Mueller-Gastell34, Ahmed Musse52, Qiankun Niu32, William Nowak53, Hamidreza Omidvar54, Andrew Or26, Karen Ouyang26, Katy M Pinto55, Ethan Porter56, Kristin E Porter57, Crystal Qian26, Tamkinat Rauf34, Anahit Sargsyan58, Thomas Schaffner26, Landon Schnabel34, Bryan Schonfeld27, Ben Sender59, Jonathan D Tang26, Emma Tsurkov34, Austin van Loon34, Onur Varol60,61, Xiafei Wang62, Zhi Wang61,63, Julia Wang26, Flora Wang59, Samantha Weissman26, Kirstie Whitaker64,65, Maria K Wolters66, Wei Lee Woon67, James Wu68, Catherine Wu26, Kengran Yang54, Jingwen Yin39, Bingyu Zhao69, Chenyun Zhu39, Jeanne Brooks-Gunn70,71, Barbara E Engelhardt26,36, Moritz Hardt72, Dean Knox27, Karen Levy73, Arvind Narayanan26, Brandon M Stewart2, Duncan J Watts74,75,76, Sara McLanahan1.
Abstract
How predictable are life trajectories? We investigated this question with a scientific mass collaboration using the common task method; 160 teams built predictive models for six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. Despite using a rich dataset and applying machine-learning methods optimized for prediction, the best predictions were not very accurate and were only slightly better than those from a simple benchmark model. Within each outcome, prediction error was strongly associated with the family being predicted and weakly associated with the technique used to generate the prediction. Overall, these results suggest practical limits to the predictability of life outcomes in some settings and illustrate the value of mass collaborations in the social sciences.
Keywords: life course; machine learning; mass collaboration; prediction
Year: 2020 PMID: 32229555 PMCID: PMC7165437 DOI: 10.1073/pnas.1915006117
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Data collection modules in the Fragile Families study. Each module is made up of ∼10 sections, where each section includes questions about a specific topic (e.g., marriage attitudes, family characteristics, demographic characteristics). During the Fragile Families Challenge, data from waves 1 to 5 (birth to age 9 y) were used to predict outcomes in wave 6 (age 15 y).
Fig. 2. Datasets in the Fragile Families Challenge. During the Fragile Families Challenge, participants used the background data (measured from child’s birth to age 9 y) and the training data (measured at child age 15 y) to predict the holdout data as accurately as possible. While the Fragile Families Challenge was underway, participants could assess the accuracy of their predictions in the leaderboard data. At the end of the Fragile Families Challenge, we assessed the accuracy of the predictions in the holdout data.
Fig. 3. Performance in the holdout data of the best submissions and a four-variable benchmark model. A shows the best performance (bars) and the benchmark model (lines); error bars are 95% confidence intervals. B–D compare the predictions with the truth; perfect predictions would lie along the diagonal. E–G show the predicted probabilities for cases where the event happened and where it did not. In B–G, the dashed line is the mean of the training data for that outcome.
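The benchmark comparison described in this caption can be sketched as a holdout R² computed relative to a baseline that predicts the training-data mean for every family (the dashed line in panels B–G): a score of 1.0 is perfect, 0.0 matches the baseline, and negative values do worse than it. This is a minimal illustration under that assumed scoring form, not the study's code; the function name and toy values below are hypothetical.

```python
import numpy as np

def r2_holdout(y_true, y_pred, y_train_mean):
    """Holdout R^2 benchmarked against predicting the mean of the
    training data for every family."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse_model = np.mean((y_true - y_pred) ** 2)
    mse_baseline = np.mean((y_true - y_train_mean) ** 2)
    return 1.0 - mse_model / mse_baseline

# Hypothetical toy GPA values, for illustration only.
y_true = [2.8, 3.1, 3.9, 2.2]
y_pred = [2.9, 3.0, 3.5, 2.6]
print(round(r2_holdout(y_true, y_pred, y_train_mean=3.0), 3))  # → 0.773
```

A model whose predictions cluster near the training mean (as in panels B–D) will score only slightly above 0.0 on this metric, which is the pattern the caption describes.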
Fig. 4. Heatmaps of the squared prediction error for each observation in the holdout data. Within each heatmap, each row represents a team that made a qualifying submission (sorted by predictive accuracy), and each column represents a family (sorted by predictive difficulty). Darker colors indicate higher squared error; scales differ across subfigures, as does the order of rows and columns. The hardest-to-predict observations tend to be those that are very different from the mean of the training data, such as children with unusually high or low GPAs. This pattern is particularly clear for the three binary outcomes (eviction, job training, layoff), where errors are large for families where the event occurred and small for families where it did not.
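The row and column ordering described in this caption can be sketched as sorting a teams-by-families matrix of squared errors by the marginal means. This is a minimal illustration with a hypothetical toy matrix, not the study's code.

```python
import numpy as np

# errors[i, j]: squared prediction error of team i on family j.
# Hypothetical toy matrix, for illustration only.
errors = np.array([[0.1, 0.9, 0.2],
                   [0.3, 1.1, 0.1],
                   [0.2, 0.7, 0.4]])

# Sort rows by team accuracy (mean error across families, best first)
# and columns by family difficulty (mean error across teams, easiest first).
row_order = np.argsort(errors.mean(axis=1))
col_order = np.argsort(errors.mean(axis=0))
heatmap = errors[np.ix_(row_order, col_order)]
print(heatmap)
```

Sorting both margins this way makes the caption's pattern visible at a glance: if error is driven mainly by which family is being predicted, dark columns dominate the heatmap; if it were driven by technique, dark rows would dominate instead.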