
Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated.

Eran Elhaik.

Abstract

Principal Component Analysis (PCA) is a multivariate analysis that reduces the complexity of datasets while preserving data covariance. The outcome can be visualized on colorful scatterplots, ideally with only a minimal loss of information. PCA applications, implemented in well-cited packages like EIGENSOFT and PLINK, are extensively used as the foremost analyses in population genetics and related fields (e.g., animal and plant or medical genetics). PCA outcomes are used to shape study design, identify and characterize individuals and populations, and draw historical and ethnobiological conclusions on origins, evolution, dispersion, and relatedness. The replicability crisis in science has prompted us to evaluate whether PCA results are reliable, robust, and replicable. We analyzed twelve common test cases using an intuitive color-based model alongside human population data. We demonstrate that PCA results can be artifacts of the data and can be easily manipulated to generate desired outcomes. PCA adjustment also yielded unfavorable outcomes in association studies. PCA results may not be reliable, robust, or replicable as the field assumes. Our findings raise concerns about the validity of results reported in the population genetics literature and related fields that place a disproportionate reliance upon PCA outcomes and the insights derived from them. We conclude that PCA may have a biasing role in genetic investigations and that 32,000-216,000 genetic studies should be reevaluated. An alternative mixed-admixture population genetic model is discussed.
© 2022. The Author(s).
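
As a rough illustration of the abstract's claim that PCA results can be artifacts of the data, the sketch below applies PCA to a toy color dataset and compares an evenly sampled design with one in which a single group is under-sampled. It is only loosely inspired by the "color-based model" mentioned above: the four colors, the sample sizes, the noise level, and the helper names (`pca_project`, `color_sample`, `centroid_distance`) are all assumptions made for this example, not the author's data or code.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components using SVD."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # coordinates on the top PCs
    explained = (S ** 2) / np.sum(S ** 2)        # fraction of variance per PC
    return scores, explained[:n_components]

def color_sample(counts, noise=0.02, seed=0):
    """Hypothetical 'populations': noisy samples around pure RGB colors."""
    rng = np.random.default_rng(seed)
    colors = {"red": [1, 0, 0], "green": [0, 1, 0],
              "blue": [0, 0, 1], "black": [0, 0, 0]}
    rows, labels = [], []
    for name, n in counts.items():
        base = np.array(colors[name], dtype=float)
        rows.append(base + rng.normal(0.0, noise, size=(n, 3)))
        labels += [name] * n
    return np.vstack(rows), np.array(labels)

def centroid_distance(scores, labels, a, b):
    """Distance between the mean PC coordinates of groups a and b."""
    diff = scores[labels == a].mean(axis=0) - scores[labels == b].mean(axis=0)
    return float(np.linalg.norm(diff))

# Even sampling: 50 samples per color (assumed sizes).
X_even, lab_even = color_sample({"red": 50, "green": 50, "blue": 50, "black": 50})
even_scores, even_var = pca_project(X_even)

# Skewed sampling: identical colors, but blue is under-sampled.
X_skew, lab_skew = color_sample({"red": 50, "green": 50, "blue": 5, "black": 50})
skew_scores, skew_var = pca_project(X_skew)

d_even = centroid_distance(even_scores, lab_even, "blue", "black")
d_skew = centroid_distance(skew_scores, lab_skew, "blue", "black")
print("blue-black distance on PC1/PC2, even sampling:  ", round(d_even, 3))
print("blue-black distance on PC1/PC2, skewed sampling:", round(d_skew, 3))
```

In the original three-dimensional color space, blue and black are the same distance apart in both designs. Their apparent distance on the first two components nevertheless shrinks when blue is under-sampled, because the leading components are then driven mostly by the other three groups.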

Year:  2022        PMID: 36038559      PMCID: PMC9424212          DOI: 10.1038/s41598-022-14395-4

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.996


