Harrison G Zhang (1,2,3), Boris P Hejblum (4,5), Griffin M Weber (1), Nathan P Palmer (1), Susanne E Churchill (1), Peter Szolovits (6), Shawn N Murphy (7,8), Katherine P Liao (1,2), Isaac S Kohane (1), Tianxi Cai (1,4).
1. Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA.
2. Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, Massachusetts, USA.
3. Department of Biological Sciences, Columbia University, New York City, New York, USA.
4. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.
5. Bordeaux Population Health, Université de Bordeaux, Inserm U1219, Inria SISTM, Bordeaux, France.
6. Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
7. Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts, USA.
8. Research IS and Computing, Mass General Brigham HealthCare, Charlestown, Massachusetts, USA.
Abstract
OBJECTIVE: Large amounts of health data are becoming available for biomedical research. Synthesizing information across databases may capture more comprehensive pictures of patient health and enable novel research studies. When no gold standard mappings between patient records are available, researchers may probabilistically link records from separate databases and analyze the linked data. However, previous linked data inference methods are constrained to certain linkage settings and exhibit low power. Here, we present ATLAS, an automated, flexible, and robust association testing algorithm for probabilistically linked data.
MATERIALS AND METHODS: Missing variables are imputed at various thresholds using a weighted average method that propagates uncertainty from probabilistic linkage. Next, estimated effect sizes are obtained using a generalized linear model. ATLAS then conducts the threshold combination test by optimally combining P values obtained from data imputed at varying thresholds using Fisher's method and perturbation resampling.
RESULTS: In simulations, ATLAS controls type I error and exhibits higher power than previous methods. In a real-world genetic association study, meta-analysis of ATLAS-enabled analyses on a linked cohort with analyses using an existing cohort yielded additional significant associations between rheumatoid arthritis genetic risk score and laboratory biomarkers.
DISCUSSION: Weighted average imputation weathers false matches and increases the contribution of true matches to mitigate linkage error-induced bias. The threshold combination test avoids arbitrarily choosing a threshold for declaring a match, thus automating linked data-enabled analyses and preserving power.
CONCLUSION: ATLAS promises to enable novel and powerful research studies that use linked data to capitalize on all available data sources.
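As a rough illustration of two steps named in the methods, the following Python sketch shows (a) weighted average imputation at a single threshold and (b) Fisher's method for combining P values across thresholds. It is a minimal sketch under stated assumptions, not the ATLAS implementation: the function names, the layout of the match-probability matrix, and the rule of discarding candidates below the threshold are all hypothetical. The closed-form chi-square tail used here assumes independent P values, which does not hold across thresholds; the abstract states that ATLAS relies on perturbation resampling for that step, and that part is not reproduced.

```python
import math
import numpy as np


def impute_weighted_average(match_probs, y_b, threshold):
    """Impute a variable held in database B for each record of database A.

    match_probs[i, j] is an assumed posterior probability that record i in A
    matches record j in B. Candidates below `threshold` are ignored
    (hypothetical rule); the rest are averaged with their probabilities
    as weights. Records with no remaining candidate get NaN.
    """
    probs = np.asarray(match_probs, dtype=float)
    y_b = np.asarray(y_b, dtype=float)
    weights = np.where(probs >= threshold, probs, 0.0)
    totals = weights.sum(axis=1)
    imputed = np.full(probs.shape[0], np.nan)
    has_match = totals > 0
    imputed[has_match] = (weights[has_match] @ y_b) / totals[has_match]
    return imputed


def fisher_combined_p(pvals):
    """Combine K P values with Fisher's method.

    The statistic is -2 * sum(log p). Under independence it follows a
    chi-square distribution with 2K degrees of freedom, whose upper tail
    has the closed form exp(-x/2) * sum_{j<K} (x/2)^j / j!.
    The independence assumption is a simplification made for this sketch.
    """
    pvals = np.asarray(pvals, dtype=float)
    k = pvals.size
    half_stat = -np.sum(np.log(pvals))  # Fisher statistic divided by 2
    tail = sum(half_stat**j / math.factorial(j) for j in range(k))
    return float(math.exp(-half_stat) * tail)
```

In a full analysis one would presumably impute at each threshold, fit the generalized linear model to each imputed data set, and pass the resulting per-threshold P values to the combination step; for example, `impute_weighted_average([[0.9, 0.05], [0.3, 0.3]], [10.0, 20.0], 0.2)` gives 10.0 for the first record and 15.0 for the second.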