
STATISTICS. The reusable holdout: Preserving validity in adaptive data analysis.

Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, Aaron Roth.

Abstract

Misapplication of statistical data analysis is a common cause of spurious discoveries in scientific research. Existing approaches to ensuring the validity of inferences drawn from data assume a fixed procedure to be performed, selected before the data are examined. In common practice, however, data analysis is an intrinsically adaptive process, with new analyses generated on the basis of data exploration, as well as the results of previous analyses on the same data. We demonstrate a new approach for addressing the challenges of adaptivity based on insights from privacy-preserving data analysis. As an application, we show how to safely reuse a holdout data set many times to validate the results of adaptively chosen analyses.
Copyright © 2015, American Association for the Advancement of Science.
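The application named in the abstract is safely reusing a holdout set to validate adaptively chosen analyses. The Python below is a minimal, hypothetical sketch of that idea, built on stated assumptions. Each analysis is modeled as a statistical query `phi` that maps a data point to a value in [0, 1]. The oracle normally answers with the training-set mean. When that mean differs from the holdout mean by more than a noisy threshold, the oracle spends one unit of a finite budget and returns a Laplace-noised holdout mean instead, so the holdout is consulted only rarely and only through noise. The class name `ThresholdHoldout`, the parameters `threshold`, `sigma`, and `budget`, and their default values are illustrative assumptions that do not come from the abstract. The sketch should not be read as the paper's exact mechanism.

```python
import random
from typing import Callable, Sequence


def laplace(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. Exp(1) draws is Laplace(0, 1); multiply to set the scale.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


class ThresholdHoldout:
    """Hypothetical reusable-holdout oracle (illustrative sketch only).

    Assumptions: queries map a data point to [0, 1]; threshold, sigma and
    budget are tuning parameters chosen here for illustration.
    """

    def __init__(self, train: Sequence, holdout: Sequence, threshold: float = 0.04,
                 sigma: float = 0.01, budget: int = 100, seed=None):
        self.train = list(train)
        self.holdout = list(holdout)
        self.threshold = threshold
        self.sigma = sigma
        self.budget = budget
        self.rng = random.Random(seed)
        # The comparison threshold itself is perturbed with Laplace noise.
        self._noisy_threshold = threshold + laplace(2 * sigma, self.rng)

    def query(self, phi: Callable[[object], float]) -> float:
        if self.budget <= 0:
            raise RuntimeError("holdout budget exhausted")
        train_mean = sum(phi(x) for x in self.train) / len(self.train)
        holdout_mean = sum(phi(x) for x in self.holdout) / len(self.holdout)
        gap = abs(holdout_mean - train_mean)
        if gap > self._noisy_threshold + laplace(4 * self.sigma, self.rng):
            # Training and holdout disagree: spend budget, refresh the noisy
            # threshold, and release only a noised holdout estimate.
            self.budget -= 1
            self._noisy_threshold = self.threshold + laplace(2 * self.sigma, self.rng)
            return holdout_mean + laplace(self.sigma, self.rng)
        # Agreement: answer from the training set without touching the budget.
        return train_mean
```

A typical use is to wrap each adaptively chosen check, such as "fraction of points a candidate classifier labels correctly", as a `phi` and pass it to `query`. Setting `sigma=0.0` removes all noise, which is convenient for testing but gives up the privacy-style protection that the noise is meant to provide.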

Year:  2015        PMID: 26250683     DOI: 10.1126/science.aaa9375

Source DB:  PubMed          Journal:  Science        ISSN: 0036-8075            Impact factor:   47.728


20 articles in total; the first 10 are listed below.

Review 1.  Crowdsourcing biomedical research: leveraging communities as innovation engines.

Authors:  Julio Saez-Rodriguez; James C Costello; Stephen H Friend; Michael R Kellen; Lara Mangravite; Pablo Meyer; Thea Norman; Gustavo Stolovitzky
Journal:  Nat Rev Genet       Date:  2016-07-15       Impact factor: 53.242

Review 2.  Integrating explanation and prediction in computational social science.

Authors:  Jake M Hofman; Duncan J Watts; Susan Athey; Filiz Garip; Thomas L Griffiths; Jon Kleinberg; Helen Margetts; Sendhil Mullainathan; Matthew J Salganik; Simine Vazire; Alessandro Vespignani; Tal Yarkoni
Journal:  Nature       Date:  2021-06-30       Impact factor: 49.962

3.  Consensus features nested cross-validation.

Authors:  Saeid Parvandeh; Hung-Wen Yeh; Martin P Paulus; Brett A McKinney
Journal:  Bioinformatics       Date:  2020-05-01       Impact factor: 6.937

4.  Enabling Privacy-Preserving GWASs in Heterogeneous Human Populations.

Authors:  Sean Simmons; Cenk Sahinalp; Bonnie Berger
Journal:  Cell Syst       Date:  2016-07-21       Impact factor: 10.304

5.  Approval policies for modifications to machine learning-based software as a medical device: A study of bio-creep.

Authors:  Jean Feng; Scott Emerson; Noah Simon
Journal:  Biometrics       Date:  2020-10-11       Impact factor: 2.571

6.  The Psychological Science Accelerator: Advancing Psychology through a Distributed Collaborative Network.

Authors:  Hannah Moshontz; Lorne Campbell; Charles R Ebersole; Hans IJzerman; Heather L Urry; Patrick S Forscher; Jon E Grahe; Randy J McCarthy; Erica D Musser; Jan Antfolk; Christopher M Castille; Thomas Rhys Evans; Susann Fiedler; Jessica Kay Flake; Diego A Forero; Steve M J Janssen; Justin Robert Keene; John Protzko; Balazs Aczel; Sara Álvarez Solas; Daniel Ansari; Dana Awlia; Ernest Baskin; Carlota Batres; Martha Lucia Borras-Guevara; Cameron Brick; Priyanka Chandel; Armand Chatard; William J Chopik; David Clarance; Nicholas A Coles; Katherine S Corker; Barnaby James Wyld Dixson; Vilius Dranseika; Yarrow Dunham; Nicholas W Fox; Gwendolyn Gardiner; S Mason Garrison; Tripat Gill; Amanda C Hahn; Bastian Jaeger; Pavol Kačmár; Gwenaël Kaminski; Philipp Kanske; Zoltan Kekecs; Melissa Kline; Monica A Koehn; Pratibha Kujur; Carmel A Levitan; Jeremy K Miller; Ceylan Okan; Jerome Olsen; Oscar Oviedo-Trespalacios; Asil Ali Özdoğru; Babita Pande; Arti Parganiha; Noorshama Parveen; Gerit Pfuhl; Sraddha Pradhan; Ivan Ropovik; Nicholas O Rule; Blair Saunders; Vidar Schei; Kathleen Schmidt; Margaret Messiah Singh; Miroslav Sirota; Crystal N Steltenpohl; Stefan Stieger; Daniel Storage; Gavin Brent Sullivan; Anna Szabelska; Christian K Tamnes; Miguel A Vadillo; Jaroslava V Valentova; Wolf Vanpaemel; Marco A C Varella; Evie Vergauwe; Mark Verschoor; Michelangelo Vianello; Martin Voracek; Glenn P Williams; John Paul Wilson; Janis H Zickfeld; Jack D Arnal; Burak Aydin; Sau-Chin Chen; Lisa M DeBruine; Ana Maria Fernandez; Kai T Horstmann; Peder M Isager; Benedict Jones; Aycan Kapucu; Hause Lin; Michael C Mensink; Gorka Navarrete; Miguel A Silan; Christopher R Chartier
Journal:  Adv Methods Pract Psychol Sci       Date:  2018-10-01

7.  Differential privacy-based evaporative cooling feature selection and classification with relief-F and random forests.

Authors:  Trang T Le; W Kyle Simmons; Masaya Misaki; Jerzy Bodurka; Bill C White; Jonathan Savitz; Brett A McKinney
Journal:  Bioinformatics       Date:  2017-09-15       Impact factor: 6.937

8.  Generating highly accurate prediction hypotheses through collaborative ensemble learning.

Authors:  Nino Arsov; Martin Pavlovski; Lasko Basnarkov; Ljupco Kocarev
Journal:  Sci Rep       Date:  2017-03-17       Impact factor: 4.379

Review 9.  The Weak Spots in Contemporary Science (and How to Fix Them).

Authors:  Jelte M Wicherts
Journal:  Animals (Basel)       Date:  2017-11-27       Impact factor: 2.752

10.  Realizing privacy preserving genome-wide association studies.

Authors:  Sean Simmons; Bonnie Berger
Journal:  Bioinformatics       Date:  2016-01-14       Impact factor: 6.937

