Polygenic risk scores: a biased prediction?

Francisco M De La Vega^1,2, Carlos D Bustamante^3,4,5.

Abstract

A new study highlights the biases and inaccuracies of polygenic risk scores (PRS) when predicting disease risk in individuals from populations other than those used in their derivation. The design bias of workhorse tools used for research, particularly genotyping arrays, contributes to these distortions. To avoid further inequities in health outcomes, the inclusion of diverse populations in research, unbiased genotyping, and methods of bias reduction in PRS are critical.

Year: 2018 PMID: 30591078 PMCID: PMC6309089 DOI: 10.1186/s13073-018-0610-x

Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117

Resurgence of polygenic risk scores

There is a renewed interest in developing and applying polygenic risk scores (PRS) to predict the genetic liability of human traits, including predisposition to common diseases [1]. This resurgence is fueled by several major developments: (i) thousands of reports of genome-wide association studies (GWAS) encompassing larger samples, with some studies reaching up to a million subjects [2]; (ii) new methodology for developing PRS from raw GWAS genotypes without relying solely on genome-wide significant hits [3]; and (iii) the availability of large longitudinal cohorts providing the rich phenotype and genetic data [4] needed to validate and test PRS. Validation is needed to show that a PRS does not overfit the training data and thereby produce inflated results; it requires a sample that is entirely separate from the training dataset in which to evaluate the score's performance. GWAS have been successful in identifying a subset of the genes and causal variants behind polygenic common diseases, such as coronary artery disease (CAD), cancers, and type 2 diabetes. It was initially hoped that once the genetic architecture of a trait was identified, the observed effects of the risk-associated alleles could be used to construct a combined score and to predict individuals at the tail ends of the risk distribution. In the early days of GWAS, the observed effects of the risk alleles were often found to be small, so more GWAS samples were aggregated to achieve greater power and more associated alleles were found, but with even smaller effects. Even when these were accounted for, only a small fraction of heritability seemed to be explained (the so-called ‘missing heritability problem’ [5]), suggesting that the hope of genetic risk prediction would never be realized.
However, new methodologies that relinquished the goal of finding the complete catalog of causal genes, and instead aggregated data from a larger fraction of the genotyped variants that scored below the genome-wide significance threshold, were devised to account for undiscovered loci [6]. These approaches explained a much larger fraction of trait heritability. With larger GWAS and the advent of datasets such as the UK Biobank [4], which collected deep genetic and phenotypic data from approximately 500,000 individuals, the prospect of utilizing PRS as a clinical tool is gaining traction [1].
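As a rough illustration of the combined score described above, a PRS is commonly computed as a weighted sum of the risk-allele copies an individual carries, with per-variant weights taken from GWAS effect estimates. The sketch below assumes that simple additive form; the function name, the weights, and the genotype are hypothetical values chosen for illustration, not figures from any study cited here.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Additive score: sum over variants of (effect-allele copies * weight)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(g * b for g, b in zip(dosages, weights))


# Hypothetical per-allele weights (for example, log odds ratios) for four variants.
weights = [0.10, 0.05, -0.02, 0.20]
# Hypothetical genotype: copies (0, 1, or 2) of the effect allele at each variant.
person = [2, 0, 1, 1]

score = polygenic_score(person, weights)
print(f"PRS = {score:.2f}")
```

In practice the sum runs over thousands to millions of variants, including many below the genome-wide significance threshold, and the weights are typically adjusted for linkage disequilibrium before use [3].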

Defining the role of PRS in healthcare

The causation of common human diseases is complex as it results from a combination of genetic and environmental factors. A key mission of genomic medicine is to predict the genetic liability of disease on the basis of an individual’s genotype. Identifying those in the population who are at greater risk of disease can result in breakthroughs in healthcare management and can lower costs by reducing unnecessary disease burden and by introducing preemptive therapies or lifestyle changes for those at greater risk. Khera et al. [7] provide an example of how a convergence of factors is starting to realize this mission. PRS constructed from large-scale GWAS of five common diseases could identify individuals within the UK Biobank with high disease risk. The PRS for CAD, for example, found 8% of individuals in the test dataset who exhibited a threefold or greater increase in risk for the disease, a fraction of the population that is 20-fold larger than that comprising individuals who carry monogenic mutations conferring a comparable increase in disease risk. This finding suggests that if this PRS were applied in clinical care, individuals above the 95th percentile of the CAD risk distribution could be started on statins and prescribed a healthier diet, probably preventing morbidity and untimely mortality in this population. Many more recent or upcoming studies have used similar approaches to describe PRS for a multitude of traits. And as obtaining genotype array data is becoming less expensive, there are now suggestions that the time has come to apply PRS in clinical care [1, 7]. But are PRS ready for prime time?
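The percentile-based selection mentioned above can be sketched as follows. This is only a minimal illustration that assumes scores are already computed for a cohort and that the top 5% are to be flagged; the function name, the simulated scores, and the cutoff are hypothetical and do not reproduce the procedure of Khera et al. [7].

```python
import random


def flag_top_fraction(scores, fraction=0.05):
    """Return sorted indices of individuals whose score is in the top `fraction`."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    n_flag = max(1, round(len(scores) * fraction))
    # Rank individuals from highest to lowest score and keep the top n_flag.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flag])


# Hypothetical cohort: 1,000 simulated scores drawn from a standard normal.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(1000)]

high_risk = flag_top_fraction(scores, fraction=0.05)
print(f"{len(high_risk)} of {len(scores)} individuals flagged")
```

A cutoff of this kind is defined relative to the cohort in which the scores are ranked, so its meaning depends on how well that cohort represents the person being scored.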

The bias in the machine

There are several potential pitfalls in the construction of PRS that could affect how they perform in real-world clinical populations. One of the most obvious is that they suffer from the same bias that most genetics research experiences: a lack of diversity in the populations recruited for genetic studies [8]. Until recently, over 80% of participants in genetic studies have been of European descent, 14% have been Asian, and just 6% have been from other populations [8]. Disease-associated alleles can have significantly different frequencies between populations as the result of demographic events, such as migrations and population bottlenecks, which can lead to discovery bias. In addition, linkage disequilibrium-based pruning or adjustments performed as part of the construction of the PRS [3] can contribute bias, because of the limited reference haplotype panels for diverse populations. Accordingly, Martin et al. [9] reported that PRS derived from European-based GWAS show biases in different, often unpredictable, directions when tested in non-European cohorts. A recent report from Kim et al. [10] not only confirms that PRS derived from GWAS of European-ancestry samples can misestimate risk when applied to other populations, but also that the very tools used to genotype the GWAS samples contain bias and contribute significantly to the misestimation of disease risk across populations. These researchers first showed that disease allele frequencies for loci in the National Human Genome Research Institute (NHGRI) catalog of published GWAS studies differ significantly between Europeans and other populations sampled in the 1000 Genomes Project. Second, they observed that Africans exhibit significantly higher risk allele frequencies, a difference that is higher for ancestral risk alleles (i.e., the allele sequence present in hominid common ancestors) than for derived risk alleles (i.e., sequences that arose in the human population more recently). 
When risk alleles are binned into disease categories, those diseases with a higher proportion of causal ancestral alleles show elevated average risk allele frequencies in Africa. This skew in risk allele frequencies is sometimes discordant with known differences in disease prevalence between populations (e.g., for cardiovascular disease, African-Americans have a higher incidence but a PRS showed lower risk for Africans), implying that genetic disease risks may be misestimated, most significantly for individuals with African ancestry. Furthermore, the commercial single nucleotide polymorphism (SNP) genotyping arrays used in GWAS have a strong ascertainment bias, as these SNPs were selected from the sequencing data of a small sample of individuals, mostly of European descent. Through simulations, Kim et al. [10] show that this ascertainment bias alone can cause disease risks to be misestimated. On the other hand, simulations using whole-genome sequencing show much reduced (although not completely eliminated) biases in allele frequency differences between Africans and non-Africans, particularly when sample sizes increase. These results suggest that performing GWAS in more diverse samples, which include participants from around the world, is not sufficient to reduce discovery bias [8], because performing such studies with standard commercial SNP arrays would still result in biases. This is an important insight, as SNP arrays are inexpensive and genetic studies planned around the world are cost-constrained. Performing whole-genome sequencing in place of using SNP arrays would alleviate the ascertainment bias problem, but would increase costs by orders of magnitude. How might we resolve this dilemma?
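To make the effect of allele frequency differences concrete: the mean number of copies of an allele carried per person is twice its frequency, so the mean additive score in a population is the sum of 2 × frequency × weight over variants. The sketch below applies one set of weights to two sets of frequencies; all values and the labels "population A" and "population B" are hypothetical and are not taken from Kim et al. [10].

```python
def expected_score(freqs, weights):
    """Population mean of an additive score: each variant contributes 2 * p * weight."""
    if len(freqs) != len(weights):
        raise ValueError("freqs and weights must have the same length")
    return sum(2.0 * p * b for p, b in zip(freqs, weights))


# Hypothetical per-allele weights estimated in a single discovery population.
weights = [0.10, 0.05, 0.20]
# Hypothetical risk-allele frequencies at the same variants in two populations.
pop_a = [0.30, 0.50, 0.10]
pop_b = [0.60, 0.40, 0.25]

mean_a = expected_score(pop_a, weights)
mean_b = expected_score(pop_b, weights)
shift = mean_b - mean_a
print(f"mean score A = {mean_a:.2f}, B = {mean_b:.2f}, shift = {shift:.2f}")
```

The shift between the two means arises purely from which variants were ascertained and how common their risk alleles are in each population. On its own it says nothing about whether true disease risk differs, which is why such shifts can be discordant with observed prevalence.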

Overcoming biases

A number of approaches have been proposed to reduce the biases in PRS with respect to their application in populations with diverse or admixed ancestry. Clearly, the inclusion of more diverse populations in GWAS and biobanking is essential to reducing biases and addressing health disparities [8]. These studies also require improved arrays designed for cosmopolitan samples and informed by diverse variant discovery efforts. Whole-genome sequencing would be the ideal platform on which to perform such studies but, until costs drop further, alternative approaches, such as low-coverage sequencing, have been proposed. Low-coverage sequencing at < 1× depth now has costs approaching those of SNP microarrays and could be used to impute a set of genotypes with high accuracy. Imputation relies, however, on haplotype reference panels that are mostly available for individuals of European descent and East Asians, and consequently imputation into other populations is less accurate. In the absence of truly cosmopolitan GWAS data and validation cohorts, statistical adjustments could be applied to PRS derived from European data so that they predict risk in other populations more accurately. Kim et al. [10] suggest a method that considers whether the risk allele is ancestral or derived and show encouraging results in their simulations, but more research is needed in this area.

Towards precision health equity

Biases in genetic research have created the potential for health disparities [8]. PRS based on GWAS of European-descent cohorts could become useful in improving health outcomes for individuals from these populations, but currently may misestimate risk in admixed individuals and those of different ancestries [10]. To strive towards health equity in precision medicine and to prevent further health disparities, both study designs that include population diversity and methods to compensate for the biases incurred in constructing PRS need to be prioritized. For the sake of simplicity, we have not discussed important non-genetic sources of health disparities, including discrimination, lack of access to healthcare, and gene-by-environment interactions, which further complicate the problem at hand. Nonetheless, we remain optimistic that a concerted effort both to broaden representation in discovery cohorts and to develop tools that translate these discoveries into actionable healthcare management strategies is the way forward to improving health outcomes for all.

9 in total

1. Harnessing the information contained within genome-wide association studies to improve individual prediction of complex disease risk.

Authors: David M Evans; Peter M Visscher; Naomi R Wray
Journal: Hum Mol Genet Date: 2009-06-24 Impact factor: 6.150

2. Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations.

Authors: Alicia R Martin; Christopher R Gignoux; Raymond K Walters; Genevieve L Wojcik; Benjamin M Neale; Simon Gravel; Mark J Daly; Carlos D Bustamante; Eimear E Kenny
Journal: Am J Hum Genet Date: 2017-03-30 Impact factor: 11.025

3. Genomics is failing on diversity.

Authors: Alice B Popejoy; Stephanie M Fullerton
Journal: Nature Date: 2016-10-13 Impact factor: 49.962

Review 4. Finding the missing heritability of complex diseases.

Authors: Teri A Manolio; Francis S Collins; Nancy J Cox; David B Goldstein; Lucia A Hindorff; David J Hunter; Mark I McCarthy; Erin M Ramos; Lon R Cardon; Aravinda Chakravarti; Judy H Cho; Alan E Guttmacher; Augustine Kong; Leonid Kruglyak; Elaine Mardis; Charles N Rotimi; Montgomery Slatkin; David Valle; Alice S Whittemore; Michael Boehnke; Andrew G Clark; Evan E Eichler; Greg Gibson; Jonathan L Haines; Trudy F C Mackay; Steven A McCarroll; Peter M Visscher
Journal: Nature Date: 2009-10-08 Impact factor: 49.962

5. Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores.

Authors: Bjarni J Vilhjálmsson; Jian Yang; Hilary K Finucane; Alexander Gusev; Sara Lindström; Stephan Ripke; Giulio Genovese; Po-Ru Loh; Gaurav Bhatia; Ron Do; Tristan Hayeck; Hong-Hee Won; Sekar Kathiresan; Michele Pato; Carlos Pato; Rulla Tamimi; Eli Stahl; Noah Zaitlen; Bogdan Pasaniuc; Gillian Belbin; Eimear E Kenny; Mikkel H Schierup; Philip De Jager; Nikolaos A Patsopoulos; Steve McCarroll; Mark Daly; Shaun Purcell; Daniel Chasman; Benjamin Neale; Michael Goddard; Peter M Visscher; Peter Kraft; Nick Patterson; Alkes L Price
Journal: Am J Hum Genet Date: 2015-10-01 Impact factor: 11.025

6. Genetic disease risks can be misestimated across global populations.

Authors: Michelle S Kim; Kane P Patel; Andrew K Teng; Ali J Berens; Joseph Lachance
Journal: Genome Biol Date: 2018-11-14 Impact factor: 13.583

7. Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits.

Authors: Evangelos Evangelou; Helen R Warren; David Mosen-Ansorena; Borbala Mifsud; Raha Pazoki; He Gao; Georgios Ntritsos; Niki Dimou; Claudia P Cabrera; Ibrahim Karaman; Fu Liang Ng; Marina Evangelou; Katarzyna Witkowska; Evan Tzanis; Jacklyn N Hellwege; Ayush Giri; Digna R Velez Edwards; Yan V Sun; Kelly Cho; J Michael Gaziano; Peter W F Wilson; Philip S Tsao; Csaba P Kovesdy; Tonu Esko; Reedik Mägi; Lili Milani; Peter Almgren; Thibaud Boutin; Stéphanie Debette; Jun Ding; Franco Giulianini; Elizabeth G Holliday; Anne U Jackson; Ruifang Li-Gao; Wei-Yu Lin; Jian'an Luan; Massimo Mangino; Christopher Oldmeadow; Bram Peter Prins; Yong Qian; Muralidharan Sargurupremraj; Nabi Shah; Praveen Surendran; Sébastien Thériault; Niek Verweij; Sara M Willems; Jing-Hua Zhao; Philippe Amouyel; John Connell; Renée de Mutsert; Alex S F Doney; Martin Farrall; Cristina Menni; Andrew D Morris; Raymond Noordam; Guillaume Paré; Neil R Poulter; Denis C Shields; Alice Stanton; Simon Thom; Gonçalo Abecasis; Najaf Amin; Dan E Arking; Kristin L Ayers; Caterina M Barbieri; Chiara Batini; Joshua C Bis; Tineka Blake; Murielle Bochud; Michael Boehnke; Eric Boerwinkle; Dorret I Boomsma; Erwin P Bottinger; Peter S Braund; Marco Brumat; Archie Campbell; Harry Campbell; Aravinda Chakravarti; John C Chambers; Ganesh Chauhan; Marina Ciullo; Massimiliano Cocca; Francis Collins; Heather J Cordell; Gail Davies; Martin H de Borst; Eco J de Geus; Ian J Deary; Joris Deelen; Fabiola Del Greco M; Cumhur Yusuf Demirkale; Marcus Dörr; Georg B Ehret; Roberto Elosua; Stefan Enroth; A Mesut Erzurumluoglu; Teresa Ferreira; Mattias Frånberg; Oscar H Franco; Ilaria Gandin; Paolo Gasparini; Vilmantas Giedraitis; Christian Gieger; Giorgia Girotto; Anuj Goel; Alan J Gow; Vilmundur Gudnason; Xiuqing Guo; Ulf Gyllensten; Anders Hamsten; Tamara B Harris; Sarah E Harris; Catharina A Hartman; Aki S Havulinna; Andrew A Hicks; Edith Hofer; Albert Hofman; Jouke-Jan Hottenga; Jennifer E Huffman; Shih-Jen Hwang; Erik Ingelsson; 
Alan James; Rick Jansen; Marjo-Riitta Jarvelin; Roby Joehanes; Åsa Johansson; Andrew D Johnson; Peter K Joshi; Pekka Jousilahti; J Wouter Jukema; Antti Jula; Mika Kähönen; Sekar Kathiresan; Bernard D Keavney; Kay-Tee Khaw; Paul Knekt; Joanne Knight; Ivana Kolcic; Jaspal S Kooner; Seppo Koskinen; Kati Kristiansson; Zoltan Kutalik; Maris Laan; Marty Larson; Lenore J Launer; Benjamin Lehne; Terho Lehtimäki; David C M Liewald; Li Lin; Lars Lind; Cecilia M Lindgren; YongMei Liu; Ruth J F Loos; Lorna M Lopez; Yingchang Lu; Leo-Pekka Lyytikäinen; Anubha Mahajan; Chrysovalanto Mamasoula; Jaume Marrugat; Jonathan Marten; Yuri Milaneschi; Anna Morgan; Andrew P Morris; Alanna C Morrison; Peter J Munson; Mike A Nalls; Priyanka Nandakumar; Christopher P Nelson; Teemu Niiranen; Ilja M Nolte; Teresa Nutile; Albertine J Oldehinkel; Ben A Oostra; Paul F O'Reilly; Elin Org; Sandosh Padmanabhan; Walter Palmas; Aarno Palotie; Alison Pattie; Brenda W J H Penninx; Markus Perola; Annette Peters; Ozren Polasek; Peter P Pramstaller; Quang Tri Nguyen; Olli T Raitakari; Meixia Ren; Rainer Rettig; Kenneth Rice; Paul M Ridker; Janina S Ried; Harriëtte Riese; Samuli Ripatti; Antonietta Robino; Lynda M Rose; Jerome I Rotter; Igor Rudan; Daniela Ruggiero; Yasaman Saba; Cinzia F Sala; Veikko Salomaa; Nilesh J Samani; Antti-Pekka Sarin; Reinhold Schmidt; Helena Schmidt; Nick Shrine; David Siscovick; Albert V Smith; Harold Snieder; Siim Sõber; Rossella Sorice; John M Starr; David J Stott; David P Strachan; Rona J Strawbridge; Johan Sundström; Morris A Swertz; Kent D Taylor; Alexander Teumer; Martin D Tobin; Maciej Tomaszewski; Daniela Toniolo; Michela Traglia; Stella Trompet; Jaakko Tuomilehto; Christophe Tzourio; André G Uitterlinden; Ahmad Vaez; Peter J van der Most; Cornelia M van Duijn; Anne-Claire Vergnaud; Germaine C Verwoert; Veronique Vitart; Uwe Völker; Peter Vollenweider; Dragana Vuckovic; Hugh Watkins; Sarah H Wild; Gonneke Willemsen; James F Wilson; Alan F Wright; Jie Yao; Tatijana 
Zemunik; Weihua Zhang; John R Attia; Adam S Butterworth; Daniel I Chasman; David Conen; Francesco Cucca; John Danesh; Caroline Hayward; Joanna M M Howson; Markku Laakso; Edward G Lakatta; Claudia Langenberg; Olle Melander; Dennis O Mook-Kanamori; Colin N A Palmer; Lorenz Risch; Robert A Scott; Rodney J Scott; Peter Sever; Tim D Spector; Pim van der Harst; Nicholas J Wareham; Eleftheria Zeggini; Daniel Levy; Patricia B Munroe; Christopher Newton-Cheh; Morris J Brown; Andres Metspalu; Adriana M Hung; Christopher J O'Donnell; Todd L Edwards; Bruce M Psaty; Ioanna Tzoulaki; Michael R Barnes; Louise V Wain; Paul Elliott; Mark J Caulfield
Journal: Nat Genet Date: 2018-09-17 Impact factor: 41.307

8. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations.

Authors: Amit V Khera; Mark Chaffin; Krishna G Aragam; Mary E Haas; Carolina Roselli; Seung Hoan Choi; Pradeep Natarajan; Eric S Lander; Steven A Lubitz; Patrick T Ellinor; Sekar Kathiresan
Journal: Nat Genet Date: 2018-08-13 Impact factor: 38.330

9. The UK Biobank resource with deep phenotyping and genomic data.

Authors: Clare Bycroft; Colin Freeman; Desislava Petkova; Gavin Band; Lloyd T Elliott; Kevin Sharp; Allan Motyer; Damjan Vukcevic; Olivier Delaneau; Jared O'Connell; Adrian Cortes; Samantha Welsh; Alan Young; Mark Effingham; Gil McVean; Stephen Leslie; Naomi Allen; Peter Donnelly; Jonathan Marchini
Journal: Nature Date: 2018-10-10 Impact factor: 49.962
