Ranking and combining multiple predictors without labeled data.

Fabio Parisi, Francesco Strino, Boaz Nadler, Yuval Kluger.

Abstract

In a broad range of classification and decision-making problems, one is given the advice or predictions of several classifiers, of unknown reliability, over multiple questions or queries. This scenario is different from the standard supervised setting, where each classifier's accuracy can be assessed using available labeled data, and raises two questions: Given only the predictions of several classifiers over a large set of unlabeled test data, is it possible to (i) reliably rank them and (ii) construct a metaclassifier more accurate than most classifiers in the ensemble? Here we present a spectral approach to address these questions. First, assuming conditional independence between classifiers, we show that the off-diagonal entries of their covariance matrix correspond to a rank-one matrix. Moreover, the classifiers can be ranked using the leading eigenvector of this covariance matrix, because its entries are proportional to their balanced accuracies. Second, via a linear approximation to the maximum likelihood estimator, we derive the Spectral Meta-Learner (SML), an unsupervised ensemble classifier whose weights are equal to these eigenvector entries. On both simulated and real data, SML typically achieves a higher accuracy than most classifiers in the ensemble and can provide a better starting point than majority voting for estimating the maximum likelihood solution. Furthermore, SML is robust to the presence of small malicious groups of classifiers designed to veer the ensemble prediction away from the (unknown) ground truth.
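The spectral idea in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: labels are coded as +1/-1, classifiers are simulated as conditionally independent, and the diagonal of the covariance matrix (which does not follow the rank-one structure) is filled in by a simple iterative heuristic chosen here for illustration. The function names (simulate_predictions, leading_eigenvector, balanced_accuracy) and all parameter values are hypothetical.

```python
import numpy as np

def simulate_predictions(rng, y, sensitivity, specificity):
    """Simulate conditionally independent classifiers; row i holds classifier i's +/-1 predictions."""
    m, n = len(sensitivity), len(y)
    # Probability that classifier i outputs +1, given the true label of each instance.
    p_plus = np.where(y == 1, sensitivity[:, None], 1.0 - specificity[:, None])
    return np.where(rng.random((m, n)) < p_plus, 1, -1)

def leading_eigenvector(preds, n_iter=20):
    """Leading eigenvector of a rank-one fit to the off-diagonal sample covariance.

    Assumption: the diagonal is re-estimated iteratively from the current
    rank-one approximation (a simple heuristic used only for this sketch).
    """
    r = np.cov(preds)  # m x m covariance; each row of preds is one classifier
    for _ in range(n_iter):
        eigvals, eigvecs = np.linalg.eigh(r)  # eigenvalues in ascending order
        lam, v = eigvals[-1], eigvecs[:, -1]
        np.fill_diagonal(r, lam * v**2)
    v = np.linalg.eigh(r)[1][:, -1]
    # Resolve the sign ambiguity by assuming most classifiers beat random guessing.
    return -v if v.sum() < 0 else v

def balanced_accuracy(y, pred):
    """Mean of sensitivity and specificity for +/-1 labels."""
    sens = np.mean(pred[y == 1] == 1)
    spec = np.mean(pred[y == -1] == -1)
    return 0.5 * (sens + spec)

rng = np.random.default_rng(0)
n, m = 20000, 10
y = rng.choice([-1, 1], size=n, p=[0.4, 0.6])  # hidden ground truth
sensitivity = np.linspace(0.55, 0.90, m)
specificity = np.linspace(0.60, 0.85, m)
preds = simulate_predictions(rng, y, sensitivity, specificity)

# Ranking step: uses only the predictions, never the labels.
v = leading_eigenvector(preds)
ranking = np.argsort(-v)  # classifier indices, best first

# Weighted vote with the eigenvector entries as weights.
sml_pred = np.where(v @ preds >= 0, 1, -1)

# The labels are used below only to evaluate the unsupervised result.
bal_acc = np.array([balanced_accuracy(y, p) for p in preds])
sml_acc = balanced_accuracy(y, sml_pred)
print("estimated ranking (best first):", ranking)
print("balanced accuracy, individual classifiers:", np.round(bal_acc, 3))
print("balanced accuracy, weighted vote:", round(float(sml_acc), 3))
```

In this sketch the true labels y enter only the final evaluation. On real data the conditional-independence assumption may hold only approximately, so the ranking should be read as an estimate.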

Keywords:  cartels; classifier balanced accuracy; crowdsourcing; spectral analysis; unsupervised learning

Year:  2014        PMID: 24474744      PMCID: PMC3910607          DOI: 10.1073/pnas.1219097111

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


  5 in total

1.  Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review. [Review]

Authors:  S D Walter; L M Irwig
Journal:  J Clin Epidemiol       Date:  1988       Impact factor: 6.437

2.  Multidisciplinary cancer conferences: a systematic review and development of practice standards. [Review]

Authors:  F C Wright; C De Vito; B Langer; A Hunter
Journal:  Eur J Cancer       Date:  2007-02-27       Impact factor: 9.162

3.  Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer.

Authors:  Adam A Margolin; Erhan Bilal; Erich Huang; Thea C Norman; Lars Ottestad; Brigham H Mecham; Ben Sauerwine; Michael R Kellen; Lara M Mangravite; Matthew D Furia; Hans Kristian Moen Vollan; Oscar M Rueda; Justin Guinney; Nicole A Deflaux; Bruce Hoff; Xavier Schildwachter; Hege G Russnes; Daehoon Park; Veronica O Vang; Tyler Pirtle; Lamia Youseff; Craig Citro; Christina Curtis; Vessela N Kristensen; Joseph Hellerstein; Stephen H Friend; Gustavo Stolovitzky; Samuel Aparicio; Carlos Caldas; Anne-Lise Børresen-Dale
Journal:  Sci Transl Med       Date:  2013-04-17       Impact factor: 17.956

4.  VDA, a method of choosing a better algorithm with fewer validations.

Authors:  Francesco Strino; Fabio Parisi; Yuval Kluger
Journal:  PLoS One       Date:  2011-10-12       Impact factor: 3.240

5.  Picking ChIP-seq peak detectors for analyzing chromatin modification experiments.

Authors:  Mariann Micsinai; Fabio Parisi; Francesco Strino; Patrik Asp; Brian D Dynlacht; Yuval Kluger
Journal:  Nucleic Acids Res       Date:  2012-02-03       Impact factor: 16.971

  12 in total

1.  Data Programming: Creating Large Training Sets, Quickly.

Authors:  Alexander Ratner; Christopher De Sa; Sen Wu; Daniel Selsam; Christopher Ré
Journal:  Adv Neural Inf Process Syst       Date:  2016-12

2.  scEnhancer: a single-cell enhancer resource with annotation across hundreds of tissue/cell types in three species.

Authors:  Tianshun Gao; Zilong Zheng; Yihang Pan; Chengming Zhu; Fuxin Wei; Jinqiu Yuan; Rui Sun; Shuo Fang; Nan Wang; Yang Zhou; Jiang Qian
Journal:  Nucleic Acids Res       Date:  2022-01-07       Impact factor: 16.971

3.  A predictive modeling approach for cell line-specific long-range regulatory interactions.

Authors:  Sushmita Roy; Alireza Fotuhi Siahpirani; Deborah Chasman; Sara Knaack; Ferhat Ay; Ron Stewart; Michael Wilson; Rupa Sridharan
Journal:  Nucleic Acids Res       Date:  2015-09-03       Impact factor: 16.971

4.  Snorkel: Rapid Training Data Creation with Weak Supervision.

Authors:  Alexander Ratner; Stephen H Bach; Henry Ehrenberg; Jason Fries; Sen Wu; Christopher Ré
Journal:  Proceedings VLDB Endowment       Date:  2017-11

5.  Ensembles of change-point detectors: implications for real-time BMI applications.

Authors:  Zhengdong Xiao; Sile Hu; Qiaosheng Zhang; Xiang Tian; Yaowu Chen; Jing Wang; Zhe Chen
Journal:  J Comput Neurosci       Date:  2018-09-12       Impact factor: 1.621

6.  A spectral approach integrating functional genomic annotations for coding and noncoding variants.

Authors:  Iuliana Ionita-Laza; Kenneth McCallum; Bin Xu; Joseph D Buxbaum
Journal:  Nat Genet       Date:  2016-01-04       Impact factor: 38.330

7.  Spectral Transfer Learning Using Information Geometry for a User-Independent Brain-Computer Interface.

Authors:  Nicholas R Waytowich; Vernon J Lawhern; Addison W Bohannon; Kenneth R Ball; Brent J Lance
Journal:  Front Neurosci       Date:  2016-09-22       Impact factor: 4.677

8.  Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen.

Authors:  Michael P Menden; Dennis Wang; Mike J Mason; Bence Szalai; Krishna C Bulusu; Yuanfang Guan; Thomas Yu; Jaewoo Kang; Minji Jeon; Russ Wolfinger; Tin Nguyen; Mikhail Zaslavskiy; In Sock Jang; Zara Ghazoui; Mehmet Eren Ahsen; Robert Vogel; Elias Chaibub Neto; Thea Norman; Eric K Y Tang; Mathew J Garnett; Giovanni Y Di Veroli; Stephen Fawell; Gustavo Stolovitzky; Justin Guinney; Jonathan R Dry; Julio Saez-Rodriguez
Journal:  Nat Commun       Date:  2019-06-17       Impact factor: 14.919

9.  Assessment of network module identification across complex diseases.

Authors:  Sarvenaz Choobdar; Mehmet E Ahsen; Jake Crawford; Mattia Tomasoni; Tao Fang; David Lamparter; Junyuan Lin; Benjamin Hescott; Xiaozhe Hu; Johnathan Mercer; Ted Natoli; Rajiv Narayan; Aravind Subramanian; Jitao D Zhang; Gustavo Stolovitzky; Zoltán Kutalik; Kasper Lage; Donna K Slonim; Julio Saez-Rodriguez; Lenore J Cowen; Sven Bergmann; Daniel Marbach
Journal:  Nat Methods       Date:  2019-08-30       Impact factor: 28.547

10.  Homogenizing Estimates of Heritability Among SOLAR-Eclipse, OpenMx, APACE, and FPHI Software Packages in Neuroimaging Data.

Authors:  Peter Kochunov; Binish Patel; Habib Ganjgahi; Brian Donohue; Meghann Ryan; Elliot L Hong; Xu Chen; Bhim Adhikari; Neda Jahanshad; Paul M Thompson; Dennis Van't Ent; Anouk den Braber; Eco J C de Geus; Rachel M Brouwer; Dorret I Boomsma; Hilleke E Hulshoff Pol; Greig I de Zubicaray; Katie L McMahon; Nicholas G Martin; Margaret J Wright; Thomas E Nichols
Journal:  Front Neuroinform       Date:  2019-03-12       Impact factor: 4.081
