Literature DB >> 35373152

From pharmacogenetics to pharmaco-omics: Milestones and future directions.

Chiara Auwerx^1,2,3,4, Marie C Sadler^2,3,4, Alexandre Reymond¹, Zoltán Kutalik^2,3,4.

Abstract

The origins of pharmacogenetics date back to the 1950s, when it was established that inter-individual differences in drug response are partially determined by genetic factors. Since then, pharmacogenetics has grown into its own field, motivated by the translation of identified gene-drug interactions into therapeutic applications. Despite numerous challenges ahead, our understanding of the human pharmacogenetic landscape has greatly improved thanks to the integration of tools originating from disciplines as diverse as biochemistry, molecular biology, statistics, and computer sciences. In this review, we discuss past, present, and future developments of pharmacogenetics methodology, focusing on three milestones: how early research established the genetic basis of drug responses, how technological progress made it possible to assess the full extent of pharmacological variants, and how multi-dimensional omics datasets can improve the identification, functional validation, and mechanistic understanding of the interplay between genes and drugs. We outline novel strategies to repurpose and integrate molecular and clinical data originating from biobanks to gain insights analogous to those obtained from randomized controlled trials. Emphasizing the importance of increased diversity, we envision future directions for the field that should pave the way to the clinical implementation of pharmacogenetics.

Entities: Chemical

Keywords: biobanks; bioinformatics; causal inference; electronic health records; genome-wide association studies; multi-omics; pharmacogenetics; pharmacogenome; pharmacogenomics

Year: 2022 PMID： 35373152 PMCID： PMC8971318 DOI： 10.1016/j.xhgg.2022.100100

Source DB: PubMed Journal: HGG Adv ISSN： 2666-2477

Introduction

Through the study of genetic determinants of drug responses, pharmacogenetics (PGx) plays a crucial role in achieving the promises of personalized medicine: providing a medical treatment tailored to one’s genetic background. PGx variants modify a drug’s pharmacokinetics by impacting its absorption, distribution, metabolism, and excretion (ADME), its pharmacodynamics by perturbing proteins involved in the drug’s mechanism of action, or immune regulation., Understanding these interactions improves treatment outcome by optimizing drug dosage and efficacy while minimizing adverse drug reaction (ADR) risk. To date, the Clinical Pharmacogenetics Implementation Consortium (CPIC) reports almost 450 gene-drug interactions. Eighty-three of these, involving 22 genes and 63 drugs, are annotated with the highest level of confidence and have prescription guidelines. These 63 drugs represent a small fraction of approved drugs but form a large portion of prescribed drugs,, so that 35%–65% of the population has been exposed to at least one prescription drug with PGx indication.6, 7, 8 Notwithstanding, clinical implementation of PGx has been slow, with concerns regarding clinical validity and cost-effectiveness, infrastructure and data management, lack of awareness and education of health professionals, and ethical and regulatory issues being identified as the main barriers. To address these concerns, numerous PGx initiatives have been undertaken. So far, these revealed that pre-emptive genetic testing is cost effective in most situations, virtually benefits all given that 95% of individuals present at least one clinically actionable PGx variant, and could improve drug dosage while reducing ADRs. As encouraging results from these initiatives bring us a step closer to a widespread clinical implementation of PGx, this review takes a step back to discuss key milestones in the development of PGx methodology aiming at the identification, functional validation, and mechanistic understanding of clinically actionable germline variants (Figure 1). Purposefully omitting advances in oncology, where the consideration of somatic mutations adds an additional layer of complexity, we selected key studies that illustrate these milestones, highlighting how they have shaped PGx. The scientific community increasingly benefits from data streaming from high-throughput experiments, as well as biobanks coupling genetic information to intermediate molecular phenotypes and electronic health records. We emphasize remaining limitations and areas of active research, discussing strategies and methods that have not yet been fully developed and applied to PGx, envisioning the future of the field.

Figure 1

Overview of the chronological development of pharmacogenetics methodology

Each section of this review deals with a major PGx milestone: pharmacogenetics, pharmacogenomics, and pharmaco-omics. Listed are some of the main tools and approaches that were instrumental in the development of PGx at various stages. They will be discussed in their respective sections.

Overview of the chronological development of pharmacogenetics methodology Each section of this review deals with a major PGx milestone: pharmacogenetics, pharmacogenomics, and pharmaco-omics. Listed are some of the main tools and approaches that were instrumental in the development of PGx at various stages. They will be discussed in their respective sections.

The origins of modern pharmacogenetics

The genetic basis of drug responses

The term “pharmacogenetics” was coined in 1959 by Friedrich Vogel to describe the emerging concept that drug reactions are under genetic control., This idea originated from a series of observations indicating that individuals react differently when exposed to comparable amounts of a drug. For instance, 10% of African Americans develop hemolytic anemia following treatment with the antimalarial drug primaquine, an ADR rarely observed in individuals of European ancestry (MIM: 300908, 305900). This observation was in line with a genetic origin of the ADR, as narrated by one of the founders of PGx, Arno Motulsky: “since a given gene may be more frequent in certain ethnic groups, any drug reaction that is more frequently observed in a given racial group, when other environmental variables are equal, will usually have a genetic basis.” The intuition was confirmed decades later, as studies revealed a higher prevalence of causative glucose-6-phosphate dehydrogenase deficiency in Africans, tightly linking PGx with population genetics from the onset (Box 2). Allele: one of two or more versions of a genetic region. Biomarker: measure of a biological state that is indicative of a physiological or pathological process. Diplotype: combination of two haplotypes present in an individual. Epistasis: phenomenon whereby the phenotypic consequence of a genetic variant is dependent on the presence of one or multiple other genetic variants (non-additive effects; gene-by-gene interactions). Expression quantitative trait locus (eQTL): genetic locus harboring markers that explain a certain proportion of the variance in expression of a given gene. Fine-mapping: statistical approach aiming at determining the most likely causal genetic variant(s) for an observed association signal. Forward genetics: identification of the genetic basis of a phenotype. Genetic architecture: underlying genetic basis of a phenotype (e.g., monogenic, polygenic, ...). Genetic reference population: collection of strains from a model organism that were generated through specific breeding and for which broad genotype and phenotype data are available. Genome-wide significance: threshold of significance that accounts for the fact that genome-wide association studies (GWASs) perform a large number of associations on not fully independent genetic variants. Typically, p ≤ 5 × 10−8 is used. Haplotype: group of alleles that are inherited together from a single parent. Heritability: proportion of phenotypic variation in a given population that is due to genetic variation. Imputation: inference of genetic variants that were not directly genotyped using haplotype reference panels. Imputation is possible as blocks of variants in linkage disequilibrium (LD) tend to be co-inherited, so that the presence of a given variant is predictive of the genotype at an adjacent location. Linkage disequilibrium (LD): non-random association of alleles at two or more loci in a population. Missense variant: SNV leading to the substitution of a different amino acid in the resulting protein. Monogenic trait: trait influenced by genetic variants in or near a single gene. Oligogenic trait: trait influenced by genetic variants in or near a few genes. Pleiotropy: phenomenon whereby a single gene or variant affects multiple phenotypes. Polygenic score (PGS): estimate of an individual’s genetic liability to a trait calculated as the weighted sum of phenotype-associated variants, weights being the effect sizes from GWAS summary statistics. Polygenic trait: trait influenced by genetic variants in or near a large number of genes. Polypharmacy: co-prescription of several drugs. Population differentiation: accumulation of differences in allele frequencies between populations with restricted gene flow due to natural selection or genetic drift. Population stratification: presence of sub-populations with different allele frequencies within a seemingly homogeneous population. If not accounted for, population stratification can lead to spurious genotype-phenotype associations. Reverse genetics: identification of the phenotypic consequence(s) of a particular genetic variant. Sequencing coverage: number of unique reads that include a particular base pair. Single nucleotide variant (SNV): substitution of a single nucleotide at a specific genomic location. Star allele: allele following the standardized nomenclature for genetic polymorphisms in pharmacogenetics (PGx) genes. Star alleles are not restricted to single variants but can refer to functional haplotypes of PGx genes. Summary statistics: aggregated GWAS results lacking individual genotype and phenotype data and typically including the variant’s identifier, genomic location, reference and alternative allele, and their respective frequencies, a metrics for the strength of association with incertitude and a p value. Ability to assess the phenotypic impact of a variant depends on its frequency in the studied population. Early on, it was observed that different populations do not respond equally to certain drugs. Later, microarray studies revealed population differentiation across ADME genes, and sequencing studies generated global maps of PGx allele frequencies., Following, a gene-based population differentiation method that optimally accounts for rare variants showed that 10% of investigated PGx genes exhibit high population differentiation. Despite growing awareness around the importance of inclusiveness, the majority of genetic studies remain Eurocentric, undermining the ability to discover new PGx variants and implement equitable clinical guidelines. A 2021 systematic review showed that 88% of discovery PGx genome-wide association studies (GWASs) participants were of European ancestry. This bias is apparent in the largest PGx study to date: 17% of the UK Biobank participants carry at least one undocumented deleterious variant in the 14 analyzed PGx genes, the majority of which being of non-European ancestry. Similarly, 72% of the participants enrolled in clinical trials leading to US Food and Drug Administration (FDA) drug approval in 2019 were of white ancestry. Biased and underpowered, such studies miss or underestimate the strength of robust PGx interactions involving variants predominantly observed in minority populations. This results in poor generalizability of findings, which translate into healthcare disparities. For example, polygenic scores (PGSs) (Polygenic pharmacogenomics) ought to play a major role in personalized medicine but raise ethical concerns due to poor accuracy in individuals of non-European ancestry. Studies discussed in this review illustrate how increased diversity can benefit PGx. Including participants of both European and African ancestry, it was shown that the strength of association between hepatitis C treatment efficacy and IFNL3 genotype was comparable in both populations and that the increased prevalence of the unfavorable IFNL3 genotype in Africans explained a large fraction of the differences in treatment responsiveness across groups (The first pharmacogenomics GWAS). In another example, three out of five regions associated with bronchodilators response in asthmatic children were discovered by analyzing minority populations. However, these proved difficult to validate due to the absence of replication cohorts (The human pharmacogenomic landscape). A third study proposes a mechanistic model to improve warfarin dose prediction by analyzing both European and African American cohorts (Molecular modelling of pharmaco-omics interactions). Trained on each ancestry group, the model was used to predict maintenance dose in an ancestry-matched and cross-ancestry manner, yielding improved results compared with existing algorithms. This study highlights how a mechanistic approach to modeling combined with a limited number of samples of different ancestries allows to broaden model applicability to other ethnic groups. Overall, inclusiveness can be applied to a wide range of methodologies, leading to knowledge that benefits all. Acknowledging the role of genetics in drug response promoted usage of classical human genetics tools to the study of PGx. Twin studies could verify that a drug response was under genetic control by estimating its heritability, while family studies allowed inferring inheritance patterns of drug responses. For instance, inability to metabolize the antihypertensive drug debrisoquine and the sodium channel blocker sparteine follows an autosomal recessive pattern of inheritance (MIM: 608902). Inability to metabolize these compounds is strongly correlated and was believed to be controlled by the recently discovered cytochrome P450 (CYP) system., As oxidative metabolism of other drugs seemed unaffected, it suggested that only one enzyme of the CYP system was dysfunctional. Overall, these pioneering studies had the merit to confirm the underlying genetic basis of differential drug response, forming the foundations of PGx.

The candidate gene approach

A drawback of early PGx studies was their inability to map differential drug responses. Progress in molecular biology and improved understanding of cellular and physiological processes set the stage for candidate gene studies (CGSs), which use prior knowledge to identify PGx interactions typically involving common or large-effect-size variants. After selecting candidate genes based on reported roles in drug ADME or relevant biological pathways, association between drug response and genetic variants within these genes is assessed. For instance, the etiology of the debrisoquine and sparteine non-metabolizer phenotype was elucidated by purifying the relevant CYP and cloning the relevant alleles. This gene later came to be known as CYP2D6 (MIM: 124030), one of the best studied PGx genes. Human CYP2D6 is part of a family of 57 putatively functional genes, a dozen of which being responsible for the oxidative biotransformation of 70%–80% of all drugs, arguably making them the most important PGx gene family. Other CGSs elucidated the genetic cause of abacavir hypersensitivity (MIM: 142830), a severe multisystemic response experienced by 5% of human immunodeficiency virus (HIV)-infected patients treated with the antiviral drug. Suspicion of a genetic origin for the ADR led two groups to assess variants in genes involved in abacavir metabolism and immune response, jointly identifying HLA-B∗5701 as the main genetic risk factor for abacavir hypersensitivity., These success stories sparked the idea that new PGx knowledge could be brought into the clinic to optimize treatment. Given genotyping costs in the early 2000s, it was crucial to validate interactions before adopting systematic screening. Most PGx interactions unraveled by CGS resulted from retrospective case-control studies comparing the frequency of suspected genetic variants between a group that experienced the drug response and a group of matched controls that did not. Despite being cost effective, their retrospective nature makes them prone to various types of biases. They also suffer from general challenges associated with epidemiological study design, such as improper definition of the outcome phenotype, lack of power due to inadequate sample size, or failure to control for stratification or pleiotropy. Hence, many associations failed to replicate and further proof was warranted prior to clinical implementation.

Randomized controlled trials

Randomized controlled trials (RCTs) have become the gold standard of evidence-based medicine over the course of the 20th century. PGx RCTs allow measuring the effects of the genotype, treatment, and their interaction, providing causal evidence for the use of a genotype-informed treatment plan. Ideally, participants are randomized to receive either genotype-guided or conventional therapy (Figure 2A). Alternatively, when the genotype of interest is rare or when several treatments are to be compared, participants can be stratified based on their genotype and randomized within each stratum (Figure 2B). Following two independent reports of HLA-B∗5701 as the main genetic risk factor for abacavir hypersensitivity,, a prospective RCT set out to investigate the clinical benefits of HLA-B∗5701 screening by randomizing 2,000 HIV-positive patients according to the design of Figure 2A. While all participants in the control group received abacavir, individuals in the intervention group were genotyped so that only HLA-B∗5701-negative patients received the drug. The study demonstrated that pre-emptive screening reduced the incidence of clinically diagnosed abacavir hypersensitivity from 7.8% to 3.4% (p 0.001) and eliminated immunologically confirmed abacavir hypersensitivities (2.7% in controls; p 0.001), prompting the establishment of clinical screening guidelines. Recently, the impact of these guidelines was evaluated, showing that, since their introduction, the proportion of patients undergoing pre-emptive screening steadily increased, while ADR incidence decreased 6-fold.

Figure 2

Possible study designs for pharmacogenetic randomized controlled trials

(A) Participants are randomly assigned to an intervention or control group. In the former, participants are screened for the presence of the PGx variant of interest (star). Negative individuals receive standard treatment (black), while positive ones receive an adapted alternative treatment (green). In the control group, all individuals receive conventional treatment and genetic screening is performed post hoc. The number of participants exhibiting the response of interest (red) is assessed and compared among groups.

(B) Participants are stratified based on the presence or absence of the PGx variant of interest. In each strata, participants are randomly assigned to an intervention group, which receives standard treatment (black) and a control group, which receives a placebo or alternative treatment (white). The number of participants exhibiting the response of interest is assessed and compared among groups and strata.

Possible study designs for pharmacogenetic randomized controlled trials (A) Participants are randomly assigned to an intervention or control group. In the former, participants are screened for the presence of the PGx variant of interest (star). Negative individuals receive standard treatment (black), while positive ones receive an adapted alternative treatment (green). In the control group, all individuals receive conventional treatment and genetic screening is performed post hoc. The number of participants exhibiting the response of interest (red) is assessed and compared among groups. (B) Participants are stratified based on the presence or absence of the PGx variant of interest. In each strata, participants are randomly assigned to an intervention group, which receives standard treatment (black) and a control group, which receives a placebo or alternative treatment (white). The number of participants exhibiting the response of interest is assessed and compared among groups and strata. RCTs provide strong support of causality between an exposure and an outcome, but their role in PGx has been called into question., RCTs present multiple limitations, often related to their high costs. Systematic investigation of 138 clinical trials estimated the median cost of a trial to $19 million, with costs increasing with participant number and the inclusion of a randomized control group. Studying rare PGx variants or drug response requires large sample sizes and quickly becomes financially unsustainable. This problem is illustrated with the TAILOR-PCI RCT that found no statistically significant reduction in major adverse cardiovascular events (MACEs) (12 months follow-up) upon CYP2C19-guided prescription of anti-platelet medication (hazard ratio = 0.66; 95% confidence interval [CI]: 0.43–1.02; p = 0.06). Clopidogrel is the most widely prescribed anti-platelet medication, but due to genetically driven variability in drug response, it is suspected to increase the risk for MACEs. As CYP2C19 (MIM: 124020, 609535) is required to activate clopidogrel, CYP2C19 loss-of-function (LOF) carriers in the genotype-guided group were prescribed ticagrelor instead of clopidogrel. Despite its large size (n = 5,302), the trial was only powered (85%) to detect a minimum hazard ratio of 0.50, so that the detected trend did not reach significance. Another limitation is that only two LOF alleles were considered, although recent evidence demonstrated a polygenic architecture of clopidogrel response (Polygenic pharmacogenomics). Considering these results, the US Food and Drug Administration (FDA) does not require genetic testing prior to clopidogrel treatment initiation, nor do the American College of Cardiology and European Society of Cardiology recommend routine CYP2C19 testing. Notwithstanding, recent meta-analyses suggest that CYP2C19-guided prescription could identify patients benefiting from alternative anti-platelet medication,, prompting the CPIC to update its recommendations in 2022, thereby providing guidance to clinicians on how to interpret CYP2C19 genetic test results. Concurrently, a subset of the TAILOR-PCI RCT has increased the follow-up period to 24 months, additionally performing a feasibility study to assess digital solutions allowing more efficient and cost-effective patient follow-up and study design. Besides the costs and hurdles associated with generating prospective evidence of clinical utility, there is a lack of incentive for the pharmaceutical industry to conduct PGx RCTs, as patents for most drugs with PGx guidelines—including clopidogrel—have expired, making them unprofitable., Another recurrent issue is the lack of diversity (Box 2). Finally, PGx RCTs are associated with serious ethical concerns: Rarely performed without prior knowledge, some study designs (e.g., Figure 2B) expose patients with a likely actionable variant to a treatment that puts them at risk of ADRs., Slow clinical implementation of PGx is often credited to lack of supporting evidence by RCTs, which can partially be attributed to the limitations described above. RCTs remain an important epidemiological tool, as demonstrated by the more than 800 PGx trials reported over the last 20 years, of which 155 are still recruiting or ongoing. However, RCTs should not be the sole approach to validate PGx interactions. Proof of efficacy of genetic testing often needs to meet higher standards compared with non-genetic tests, and most physicians unconsciously practice PGx by adjusting patient’s treatments. To ensure transfer of PGx knowledge from bench to bedside while maintaining rigorous standards of clinical efficacy, it will be important to find more efficient ways to conduct RCTs and broaden the spectrum of accepted proof of evidence. In Leveraging observational data to gain causal insights, we introduce statistical tools leveraging large-scale observational data to approximate the effects of RCT studies.

The genomic revolution

The pharmacogenome

Sequencing of the human genome in the early 2000s, brought forward the concept of the druggable genome, originally defined as genes whose product can or are predicted to be bound by small molecules. Besides providing a blueprint to map genes and variants associated with drug responses, access to the reference genome catalyzed waves of technological progress in genomics (Box 3). A common theme is the possibility to assess more—and potentially novel—variants with higher accuracy and at ever lower prices, leading to the metamorphosis of pharmacogenetics into pharmacogenomics. Conceptually similar and therefore both referred to as PGx hereafter, pharmacogenomics aims at discovering genetic variants affecting drug response by probing the entire genome, as opposed to a few candidate loci. This progress enabled the exploration of the pharmacogenome, which we define as regions of the human genome that influence response to medication. Microarray genotyping: Genomic variability can be assessed with genotyping microarrays that probe the identity of 250,000–4,000,000 predefined single nucleotide variants (SNVs). Diverse, high-coverage, haplotype reference panels are used to impute variants at positions that were not genotyped. In spite of increasing the multiple-testing burden, boosting the number of assessed SNVs increases power, improves fine-mapping, and facilitates meta-analyses. Imputed microarray data allow the assessment of millions of SNVs for a fraction of sequencing costs. Short-read sequencing: A major breakthrough came from the development of high-throughput next-generation sequencing (NGS) methods that implement massively parallel sequencing of short DNA fragments. Applications include whole-genome sequencing (WGS), which assesses the entire genome, and whole-exome sequencing (WES), which uses capture and enrichment methods to selectively sequence exons, the protein coding regions of the genome. Because exons represent 1% to 2% of the human genome, for the same amount of sequencing, WES can increase coverage or sample number. Cost effective and better suited to assess rare variants and diverse populations, low coverage sequencing ( 1×) combined to imputation represents a promising alternative to microarrays., Long-read sequencing: Bypassing the need for polymerase chain reaction, third-generation sequencing based on long reads reduces the risk of amplification-related errors and biases. By sequencing DNA fragments over 1 Mb long,, this technology can accurately resolve complex regions, call haplotypes, and map structural variants (e.g., copy-number variants). Long-read sequencing can also detect base modifications, providing further insights into the analyzed sequence. CRISPR-based genome editing: CRISPR technologies allow to assess the impact of precise genetic modifications, ranging from point mutations, to whole-chromosome deletions,, in physiological settings. CRISPR tools are available for most model organisms, cell lines, and patient-derived cells, with a growing number of functionalities extending beyond simple gene editing.73, 74, 75 It is possible to generate whole-body or tissue-specific gene knockins and knockouts, combine both approaches to create humanized models by excising the endogenous gene and replacing it by its human version, or perform epigenetic editing. Several alterations can be induced simultaneously, opening the door to the study of epistatic interactions. For PGx to embrace the genomic revolution, ways to effectively organize, store, and share newly generated knowledge were required, leading to the creation of PGx databases (Table 1). The further the field advances, the more it will rely on the presence of centralized, well-structured, and inter-connected resources. Search terms pharmacogenetics/pharmacogenomics returned over 30,000 PubMed entries published between 2000 and 2020, testifying how the genomic revolution allowed the collection of data and knowledge at an unprecedented scale.

Table 1

List of five major pharmacogenetics knowledge databases and resources

Name	Aim	Reference and website
CPIC: The Clinical Pharmacogenetics Implementation Consortium	facilitate the clinical implementation of PGx by generating curated, evidence-based, and updated guidelines that provide prescription recommendations for gene-drug pairs	Relling and Klein⁷⁶, Relling et al.³, https://cpicpgx.org/
DGIdb: The Drug Gene Interaction database	bridge the gap between drug discovery and PGx by integrating data from other databases, such as PharmGKB, Therapeutic Target Database,⁷⁷ DrugBank,⁷⁸ etc.	Griffith et al.⁷⁹, Freshour et al.⁸⁰, https://www.dgidb.org/
PGRN: The Pharmacogenomics Global Research Network	PGx hub with the aims of• providing a platform for the PGx community • promoting and advancing research in PGx • bringing awareness to the importance of PGx Coordinates PharmGKB, CPIC, and PharmVar	Relling et al.⁸¹, https://www.pgrn.org/
PharmGKB: The Pharmacogenetics Knowledge Base	first centralized gene-drug interaction database aiming at linking genomic data to molecular and cellular phenotypes, as well as to clinical information	Whirl-Carrillo et al.⁸², Whirl-Carrillo et al.⁸³https://www.pharmgkb.org/
PharmVar: The Pharmacogenetic Variation Consortium	provide a centralized repository for all PGx variants and a standardized nomenclature for PGx alleles	Gaedigk et al.⁸⁴, Gaedigk et al.⁸⁵, https://www.pharmvar.org/

List of five major pharmacogenetics knowledge databases and resources providing a platform for the PGx community promoting and advancing research in PGx bringing awareness to the importance of PGx

Genome-wide association studies

Genome-wide association studies (GWASs) are observational studies wherein a large set of genetic variants spread across the genome, typically single nucleotide variants (SNVs) assessed with microarrays (Box 3), is scanned to identify associations with a phenotype. Hence, GWASs mitigate biases inherent to CGSs, enabling data-driven discovery of unsuspected PGx variants.

The first pharmacogenomics GWASs

One of the first PGx GWASs found that each additional copy of the SLCO1B1 rs4149056 TC allele increased the odds ratio for statin-induced myopathy by 4.5 (95% CI: 2.6–7.7; p = 2 × 10−9; MIM: 604843). If the gene was known to encode a hepatic statin transporter, it had not been linked to the ADR. Due to the rarity of the ADR, the GWAS was conducted within the larger SEARCH RCT, which followed 12,000 participants exposed to different doses of statins over the course of 6 years, providing access to matched controls. Case number was doubled by considering both individuals with definite and milder myopathy, highlighting the importance of adequate phenotype definition. These steps were crucial in gathering enough participants to have the statistical power to detect associations. The challenge is exacerbated in PGx studies, as only a fraction of individuals take the drug of interest and even a smaller fraction might develop a given ADR. Review of 23 PGx GWASs conducted between 2007 and 2010 showed that, on average, drug-response GWASs had 570 participants and ADR GWASs a case/control ratio of 70/120. Estimates from 2015 to 2020 show that these numbers are increasing (median: 1,220 participants) but remain 10 times lower than for non-PGx GWASs. An indirect consequence is that fewer GWASs have investigated drug responses., To overcome this hurdle, PGx GWASs can be performed within the context of RCTs or conducted by international consortia (e.g., PGRN-RIKEN), facilitating the recruitment of large cohorts. So far, most examples focused on ADRs, which are easy to identify but rare. Conversely, providing sufficient technical and financial means, drug response can be assessed in all treated subjects, offering greater statistical power to detect PGx interactions. One of the first drug responses investigated by GWAS was warfarin maintenance dose., At the time, warfarin was the most widely used oral anticoagulant, with prescription rates totaling 0.5%–1.5% of the population. The therapeutic window of warfarin is narrow and exhibits inter-individual variability, about 50% of which being explained by patient-specific demographics, clinical factors, and genetic background. Based on prior knowledge of warfarin metabolism (Figure 3), CGSs demonstrated the independent role of polymorphisms in VKORC1 (MIM: 608547) and CYP2C9 (MIM: 601130) in determining an individual’s drug dosage and ADR risk due to drug concentrations outside the therapeutic index. To determine whether other genes were implicated, a GWAS on warfarin dosage was conducted (n = 181), identifying a single genome-wide significant signal near VKORC1. A second GWAS with improved power (n = 1,053) identified significant signals for VKORC1, CYP2C9, and CYP4F2 (MIM: 604426). The fact that these genome-wide studies did not reveal new genetic risk factors reflects that candidate genes with mechanistic support are bound to have strong PGx effect and will have the highest power to be detected by GWASs. Nevertheless, they corroborated earlier findings through an orthogonal method, excluding the presence of additional large-effect variants.

Figure 3

Warfarin pharmacogenetics

Main enzymes (orange) encoded by genes whose polymorphisms (star) affect warfarin (red) dosage. Warfarin inhibits VKORC1, an enzyme that reduces vitamin K1 2,3 epoxide to vitamin K, so that the latter can act as a cofactor for clotting factor activation. CYP2C9 is the main enzyme involved in the inactivation of warfarin. CYP4F2 is involved in vitamin K1 catabolism. Metabolites are in square boxes.

Warfarin pharmacogenetics Main enzymes (orange) encoded by genes whose polymorphisms (star) affect warfarin (red) dosage. Warfarin inhibits VKORC1, an enzyme that reduces vitamin K1 2,3 epoxide to vitamin K, so that the latter can act as a cofactor for clotting factor activation. CYP2C9 is the main enzyme involved in the inactivation of warfarin. CYP4F2 is involved in vitamin K1 catabolism. Metabolites are in square boxes. GWASs also discovered new PGx interactions. Before 2011, hepatitis C was treated with a poorly tolerated combination of pegylated interferons and ribavirin that failed to clear the infection in 50% of patients. Differential success rate across populations suggested a genetic origin for treatment efficacy (Box 2), prompting analysis through GWASs and revealing an association signal near IFNL3 (previously IL28B; MIM: 607402, 609532).,96, 97, 98 Arrival of direct-acting antivirals improved treatment success rate, especially in patients with unfavorable IFNL3 genotype. Nowadays, new generations of well-tolerated and highly effective direct-acting antiviral regimens bypass the need for pegylated interferons altogether. This highlights an alternative to PGx screening: identifying drugs that elicit a more homogeneous response. Not always available, this alternative can lead to other issues, as new antivirals are associated with drug-drug interactions, themselves influenced by PGx. Together, GWASs are an important complement to hypothesis-driven CGSs and have discovered or replicated 586 unique drug-variant associations in the 14 years since the first published PGx GWASs.

Polygenic pharmacogenomics

A long unanswered question relates to the genetic architecture of PGx traits. In 1918, R.A. Fisher showed that continuous traits can be explained through Mendelian inheritance of many genetic variants with a genuine, albeit small, effect. Since, the polygenic nature of complex traits has become widely accepted and pushed to the extreme with the omnigenic model, which hypothesizes that all genes expressed within a phenotype-relevant cell type contribute to the phenotype in question., Despite the quantitative nature of drug responses, early GWASs pointed to a mono- or oligogeneic architecture governed by few large-effect-size variants detectable by CGSs or small GWASs. Demonstrating polygenicity in PGx traits has not been straightforward. In the 2000s, CGSs implicated several genes in the response to the antidiabetic drug metformin,105, 106, 107, 108 but these loci were not sufficient to explain the estimated 34% heritability of the drug response. Instead, chromosome-wise heritability estimates pointed at the presence of multiple small-effect variants spread across the genome. In the following years, independent, better-powered GWASs implicated an increasing number of genes in metformin response,110, 111, 112, 113 supporting the predicted polygenic architecture. More recently, heritability of 12 PGx traits was estimated between 5% and 59%, with 62%–95% of the heritability being attributed to moderate- and small-effect variants. These results suggest a highly polygenic architecture for these traits, indicating that larger GWASs are likely to reveal further associations. Improving genotype-based patient classification, polygenic scores (PGSs) have gained popularity to estimate the cumulative effect of genome-wide genetic variation on a phenotype. Given their success in identifying individuals at risk for common diseases, it is of interest to determine whether PGSs can help predict an individual’s drug response. So far, applications in PGx are sparse and have yielded mixed results, as highlighted in a 2021 comprehensive review of 51 PGx PGS studies. One of the most widely studied trait is response to antidepressants. Despite being an excellent candidate trait given the large number of treatment-refractory patients and a high heritability, only recently was its polygenic basis confirmed., Other studies started demonstrating the usefulness of PGx PGSs. Prior knowledge was used to design a PGS for ADR risk to the previously discussed anti-platelet drug clopidogrel. Six SNVs associating with on-treatment platelet reactivity were used to build a predictive PGS for MACEs, despite none of the variants associating with the trait, thereby demonstrating the predictive value of small cumulative effects. Another study developed a series of PGSs for drug-induced liver injury (DILI), a rare and genetically determined ADR induced by a wide range of drugs., In addition to validation in an independent dataset, PGSs were tested in liver organoids and primary hepatocytes, revealing shared predictivity across different drugs. Cell-based and transcriptomics assays further provided mechanistic insights. Free from the ethical and technical considerations associated with human studies, this “polygenicity-in-a-dish” approach allows to investigate the downstream impact of diverse genetic profiles in a tissue-specific and environmentally controlled way. Similar approaches with tailored follow-up assays might be applied to study other PGx traits. Beyond classical challenges (sample sizes, phenotype definition, integration of rare variants, etc.), limitations intrinsic to microarray data will have to be overcome to fully understand the genetic architecture of PGx traits. It is difficult to design probes for the highly polymorphic, repetitive, or complex regions many PGx-relevant loci map to (e.g., CYPs and HLA). Though newer arrays include more PGx content, they do not yet provide perfect coverage of known PGx alleles. Furthermore, microarrays designed to assess common variants in Europeans provide suboptimal coverage in other populations (Box 2). This is mitigated by new arrays optimized for population-specific or multiethnic studies and ever larger and more diverse reference panels (e.g., TOPMed Imputation Reference panel) that improve imputation quality across diverse ancestries. Lastly, microarrays are not suited to accurately assess rare variants,, despite evidence suggesting that these might account for some of the missing heritability in PGx traits.

Next-generation sequencing in pharmacogenomics

Unlike genotyping, sequencing assesses the exact order and identity of all nucleotides within a DNA segment (Box 3), allowing the discovery of unreported genetic variants. With decreasing sequencing price, it becomes possible to (1) increase sequencing coverage, leading to improved accuracy, and (2) sequence individuals from diverse populations, leading to the identification of new genetic variants.

The human pharmacogenomic landscape

Next-generation sequencing (NGS) has widely been used to gauge the mutational landscape of known PGx genes. If non-targeted approaches, such as whole-exome sequencing (WES) or whole-genome sequencing (WGS) (Box 3) allow the unbiased assessment of the mutational spectrum among all human genes, resources can be optimized by evaluating with increased coverage a set of genes likely to yield new insights. For instance, PGRNseq, a capture panel that sequences 84 PGx genes with ultra-deep 500× coverage, will assess PGx variation in the electronic Medical Records and Genomics (eMERGE)-PGx project (n 9,000)., Another study (n = 14,002) analyzed rare variants in 202 drug target genes, suggesting that rare variants are both numerous and population specific. Genetic characterization of the 57 human CYP genes revealed over 6,000 new SNVs. Similar studies sampling individuals from various ethnicities were later conducted for a broader spectrum of PGx genes, coming to the conclusion that 90% of SNVs in these genes are rare., Despite their low frequency, rare variants strongly contribute to functional variability,, many being predicted to be deleterious. At fixed effect size, sample size required to detect an association increases with decreasing allele frequency, so that GWASs lack the power to detect rare variant associations. This led to the development of statistical methods for cumulative rare variant association testing (Box 4), but few PGx studies have made use of them so far. Recently, optimal unified sequence kernel association test (SKAT-O) was used on WGS data to identify genetic determinants for bronchodilators response in 1,441 asthmatic children of diverse ethnicities (Box 2). Five significant SKAT-O regions were identified at non-ADME loci, explaining 4%–8% of the phenotypic variation. This exemplifies how new PGx discoveries can be made by promoting diversity and investigating rare variants in genes lacking prior known PGx interactions. Burden tests collapse rare variants within a predefined region or gene into a genetic score. As they falsely assume the same direction and magnitude of effect for all grouped variants, they can be inaccurate. For instance, variants in CYP2C9 can increase, decrease, or abolish CYP2C9’s ability to metabolize warfarin. Variance-component tests investigate rare and common variants within a region simultaneously and have increased power in the presence of opposite effect variants or when only a small fraction of the variants is causal. This comes at the cost of requiring larger sample sizes, which can be challenging in PGx. The best known variance-component test is the sequence kernel association test (SKAT). Combined tests combine features from both above-mentioned tests, increasing robustness and performance in small sample sizes. The most widely used is the optimal unified SKAT test (SKAT-O).

The non-coding pharmacogenome

Despite most GWAS signals mapping to non-coding regions, fine-mapping and mechanistic interpretation of these signals remains difficult. In the previously discussed association between SLCO1B1 and statin-induced myopathy, the variant originally identified by the GWAS was intronic. Only after linkage disequilibrium (LD) analysis was the likely causal missense variant, rs4149056, identified. PGx associations mapping to non-coding regions could be explained by synthetic associations, a concept referring to the apparent association of a common marker with a trait resulting from that marker being in LD with one or multiple unobserved, rare, causal variants. Drug-response-associated non-coding variants could simply tag the deleterious mutational burden in corresponding PGx genes, even if it has been argued that synthetic associations created by rare variants are unlikely to explain most GWAS results. Non-coding variants can mechanistically influence drug response, e.g., by influencing the binding of drug-regulated transcription factors. This concept is demonstrated by a variant (rs4743771 CA) affecting the response to rosiglitazone, an antidiabetic drug associated with adverse effects, such as increased cholesterol levels. Rosiglitazone mediates its therapeutic effect by binding and activating the nuclear receptor PPARγ (MIM: 601487), which induces the expression of multiple genes, including the cholesterol efflux transporter ABCA1 (MIM: 600046). Chromatin immunoprecipitation sequencing showed that A/A carriers lack a rosiglitazone-induced PPARγ-binding site and fail to induce expression of nearby ABCA1 upon administration of the drug. Confirming the clinical relevance of the variant, diabetic patients with the corresponding genotype experienced a reduced rosiglitazone-induced increase in cholesterol levels. Overall, combining NGS to biochemical and molecular assays can shed light on PGx variants in regulatory regions. Another emerging mechanism of action is via long non-coding RNAs (lncRNAs) modulation. While this area of research remains in its infancy, it was shown that the transcription factor HNF1α (MIM: 142410) and the lncRNA HNF1α-AS1 form a regulatory network that controls the expression of several CYP genes.,

Elucidating complex pharmacogenomic regions

Obtaining fine-scaled sequencing of PGx genes residing in complex genomic regions remains challenging. Previously mentioned CYP2D6 is notoriously difficult to analyze with NGS due to the presence of homologous pseudogenes and a propensity for structural rearrangements. The gene is also highly polymorphic, with 149 alleles listed on PharmVar. Long-read sequencing workflows (Box 3) have been implemented to resolve CYP2D6 haplotypes and diplotypes.150, 151, 152, 153, 154 Besides recovering known polymorphisms, these studies report a large fraction of new variants. As CYP2D6 metabolizes 25% of commonly prescribed drugs and that CYP2D6 genotype directly influences drug metabolism efficiency, obtaining accurate diplotypes could improve CYP2D6 metabolizer phenotype prediction. While long-read sequencing technologies offer exciting perspectives and first population-scale applications are emerging, they remain expensive and limited by low throughput and high error rates.

Experimental pharmacogenomics

Experimental settings provide precise control over genetic and environmental factors (e.g., diet, lifestyle, medication, and combinations thereof) and reduce the influence of confounders, making it possible to establish causal genotype-phenotype relationships and study complex phenomena, such as gene-by-gene (i.e., epistasis), gene-by-environment, and gene-by-sex interactions. The importance of understanding these effects is highlighted by an epidemiological study that found female sex and polypharmacy as the strongest predisposing factors for ADRs. Pharmacological differences between sexes have long been acknowledged, and there is emerging evidence that these might influence PGx., Moreover, polypharmacy might impact ADR risk through drug-drug interactions and drug-drug-gene interactions. Model organisms offer a convenient setting to disentangle these complex interactions through two experimental strategies (Figure 4).

Figure 4

Schematic representation of two experimental approaches to pharmacogenetics

The reverse genetics approach starts with a genetic variant of interest and aims at characterizing its phenotypic consequences upon drug exposure. Conversely, the forward genetics approach starts with a phenotype of interest and screens a genetically diverse population to identify genes or variants eliciting the studied response upon drug exposure.

Schematic representation of two experimental approaches to pharmacogenetics The reverse genetics approach starts with a genetic variant of interest and aims at characterizing its phenotypic consequences upon drug exposure. Conversely, the forward genetics approach starts with a phenotype of interest and screens a genetically diverse population to identify genes or variants eliciting the studied response upon drug exposure.

A reverse genetics approach to pharmacogenetics

Reverse genetics investigates the phenotypic impact of a specific polymorphism by itself or in different sexes, genetic backgrounds, tissues, and/or environments. Before 2012, reverse genetics relied on laborious cloning techniques. Since, clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated protein 9 (Cas9) revolutionized genetic engineering through its simplicity, versatility, and precision (Box 3) and over 50 publications already made use of CRISPR to induce full PGx gene knockouts or characterize specific variants. For instance, CRISPR-induced deletion of two linked SNVs in an upstream enhancer of CYP2D6 revealed which one was responsible for increasing the expression of the gene., CRISPR technologies have been linked to off-target mutagenesis at sites with high sequence similarity—which are particularly common among ADME genes—even if careful usage of the tool should mitigate this issue. More importantly, the large number of PGx variants and persistent increase in drug prescription and polypharmacy render it impossible to assess the full combinatorial space of experimental settings at current experimental throughput, limiting usage of CRISPR to strong candidate loci.

A forward genetics approach to pharmacogenetics

Forward genetics exploits genetically diverse population maintained under controlled environmental conditions to facilitate the identification of genetic modifiers of drug response. Contrasting with targeted approach, CRISPR technologies can also be used to perform high-throughput gene-level genome-wide knockout, knockdown, or knockin screens to assess the low-dimension combinatorial effect of genetic perturbations. Screens can be coupled to high-throughput molecular techniques, such as bulk or single-cell RNA sequencing, allowing measurement of downstream consequences of induced mutations. Since drug resistance or susceptibility measured as cell proliferation or death can easily be assessed and exhibits limited tissue specificity, CRISPR screens have primarily been applied to cancer PGx research. In the future, screens could characterize a wider array of PGx-relevant readouts and move toward the investigation of specific SNVs. Deep mutational scanning takes a step in this direction by probing the function of thousands of variants in a single experiment and was recently applied to generate abundance and activity scores for 87% of the possible missense variants in CYP2C9. Providing unprecedented functional insights into the mutational landscape of the gene, these data can be used to train functional prediction algorithms (Functional prediction of rare variants). Another source of genetic diversity are genetic reference populations, which are available for most model organisms (e.g., mouse). Producing genetically identical individuals, they represent powerful mapping tools allowing the in vivo multi-tissue characterization of drug responses and can be coupled to extensive phenotyping and follow-up studies. An in vitro alternative that directly mimics human genetic diversity is to harvest patient cells, which can be studied as primary cells or induced into pluripotent stem cells to generate organoids, as in the previously described drug-induced liver injury PGS study. Patient-derived organoids can mimic an increasing number of organs and tissues, rendering the technology apt for personalized medicine strategies aiming at selecting the most efficient and safe treatment. While widespread clinical implementation awaits, it was shown that organoids from cystic fibrosis patients can predict drug efficacy., By studying appropriate and genetically diverse model systems, questions that were unanswerable due to ethical, logistical, or technical concerns become solvable. Experimental studies have their limitations, such as high costs, limited throughput, and/or poor reproducibility and translatability,, but the virtuous cycle between human observational studies and in vitro and in vivo experimentation in model organisms should incite PGx to exploit the respective strengths of these approaches.

Pharmacogenomics in the era of big data

Electronic health records and biobanks

Electronic health records (EHRs) have become widespread over the past decade as, besides supporting and improving diagnosis, clinical decisions, and treatment coordination, they provide new data analytics opportunities. EHRs commonly encompass patient demographics, medical history, drug prescriptions, and in some cases laboratory results, radiological images, and wearable device data. Integration of EHRs is key to the realization of precision medicine, and current research already benefits from it by conducting nonrandomized studies.182, 183, 184, 185 Cost effectiveness, possibility to investigate rare diseases, large cohorts allowing stratification, and analysis of co-occurring conditions are some of the advantages over classical RCTs.

Electronic health records in pharmacogenomics

Increasingly, biobanks match genotype entries with EHRs, broadening the phenotype query space exploitable for PGx research. Table 2 lists major research initiatives and biobanks for which EHRs are available. The UK Biobank (UKB) recently added primary care data for 230,000 participants, including consultation reports, medical diagnoses, and medication prescription. FinnGen plans to recruit 500,000 individuals by 2023 with access to EHRs from national health registries comprehending medication usage. Similarly, the Estonian Biobank contains detailed information about drug purchase and disease incidences collected since 2000., In parallel, many initiatives focus on the clinical implementation of PGx, such as the American eMERGE-PGx project, the European Ubiquitous Pharmacogenomics (U-PGx) consortium, the South East Asian Pharmacogenomics Research Network (SEAPharm),, or the African Pharmacogenomics Consortium (APC). Because of the increased genetic diversity observed in African populations and their exposure to different types of diseases (e.g., higher infectious disease burden), the latter initiative is promising to unravel new PGx findings.

Table 2

List of 10 major biobanks and initiatives suitable for pharmacogenetic research

	Recruitment	Sample size	Sequencing data	EHR data	Omics data	References	IHCC	Notes
Estonian Genome Project	2002–	200,000	genome-wide genotyping (n = 200,000); WES (n = 2,500); WGS (n = 3,000)	n > 40,000	transcriptomics (n = 600; blood); metabolomics (n = 11,000); methylomics (n = 800); microbiomics (n = 2,500)	Leitsalu et al.¹⁸⁷, Leitsalu et al.¹⁸⁸, https://genomics.ut.ee	yes	in 2018–2019, increased from 50,000 to 200,000 samples
eMERGE	2007–	>135,000	genome-wide genotyping (n > 105,000); WES (n > 3,700); WGS (n > 1,700); PGRNseq (n = 9,000a)	all participants are linked to EHRs	None	Gottesman et al.¹⁸⁹, Bush et al.¹²⁹, https://emerge-network.org	yes	network of multiple distinct cohorts
UK Biobank	2006–2010	500,000	genome-wide genotyping (n = 500,000); WES (n = 500,000); WGS (n = 500,000b^,c)	n = 230,000	blood biomarkers (n = 500,000); proteomics (n = 53,000b); metabolomics (n = 120,000); telomere length (n = 500,000)	Elliott and Peakman¹⁹⁰, Bycroft et al.¹⁹¹, Backman et al.¹⁹², https://www.ukbiobank.ac.uk	yes
DiscovEHR	2007–	>250,000	genome-wide genotyping (n > 150,000); WES (n > 100,000)	all participants are linked to EHRs	blood biomarkers	Carey et al.¹⁹³, Dewey et al.¹⁹⁴, http://www.discovehrshare.com	yes	part of Geisinger’s MyCode Community Health Initiative
Million Veteran Program	2011–	>1,000,000b	genome-wide genotyping (n = 1,000,000b^,d); WESb; WGS (n = 100,000)	all participants are linked to Veteran Affairs EHR	blood biomarkers (n = 1,000,000b); proteomicsb; methylomicsb	Gaziano et al.¹⁹⁵, Hunter-Zinck et al.¹⁹⁶, https://www.mvp.va.gov	yes
Taiwan Biobank	2012–	200,000e	genome-wide genotyping (n = 200,000e^,f); WGS (n = 1,500)	national EHRs availableg	blood biomarkers; metabolomicse	Wei et al.¹⁹⁷, Lin et al.¹⁹⁸, https://www.twbiobank.org.tw	yes
H3Africa	2012–	>100,000	genome-wide genotypingh; WESh; WGSh (n > 400)	None	blood biomarkers; microbiomicsh	H3Africa Consortium et al.¹⁹⁹, Mulder et al.²⁰⁰, https://h3africa.org	no	H3Africa is composed of several cohorts
Tohoku Medical Megabank Project	2013–2017	150,000	genome-wide genotyping (n = 150,000b); WGS (n = 14,000)	EHRs from MMWIN back-up system	transcriptomics (n = 100; blood); proteomics (n = 500); metabolomics (n = 46,000); methylomics (N = 100)	Tadaka et al.²⁰¹, Ido et al.²⁰², https://www.megabank.tohoku.ac.jp	no	composed of a population community-based cohort and a Birth and Three-Generation Cohort
FinnGen	2017–2023	500,000i	genome-wide genotyping (n = 500,000i^,j)	all participants are linked to national EHRs	None	Locke et al.²⁰³, https://www.finngen.fi	no
All of Us	2018–	>1,000,000b	genomic assaysb	all participants are linked to EHRs	biological assaysb	The All of Us Research Program Investigators²⁰⁴, https://www.researchallofus.org	yes

The IHCC column informs whether the cohort is part of the International HundredK + Cohorts Consortium.

eMERGE-PGx Project

Planned.

150,000 samples released in 2021.

460,000 samples released in 2020.

Planned in the population community-based cohort.

100,000 samples released in 2021.

Not yet crosslinked.

Sample sizes and available measurements are cohort dependent.

By 2023.

350,000 samples released in 2022.

List of 10 major biobanks and initiatives suitable for pharmacogenetic research The IHCC column informs whether the cohort is part of the International HundredK + Cohorts Consortium. eMERGE-PGx Project Planned. 150,000 samples released in 2021. 460,000 samples released in 2020. Planned in the population community-based cohort. 100,000 samples released in 2021. Not yet crosslinked. Sample sizes and available measurements are cohort dependent. By 2023. 350,000 samples released in 2022. EHR-coupled biobanks accelerate our understanding of PGx by enabling the replication of known interactions and catalyzing new discoveries. Associations between drug maintenance dose and 9 PGx genes were tested in 200,000 UKB participants by assigning individuals to a metabolizer class (e.g., poor, intermediate, and normal) based on their genotype, revealing known CYP2C9 and a novel CYP2C19 variant as determinants for warfarin dosage. Another UKB study discovered an association between statin use and NAT2 (MIM: 612182) genotype. The first study making use of WGS-coupled EHRs was conducted in 2,500 Estonian Biobank participants. Increasing sample size to 16,000 by including individuals with imputed genotypes, six known and nine new PGx interactions were identified. Importantly, many biobanks follow participants longitudinally, so that the number of individuals exposed to a drug or reporting an ADR increases with time. Initiatives such as the International HundredK+ Cohorts Consortium (IHCC) aim at creating a platform to harmonize data and facilitate information sharing across cohorts. Specifically, one of IHCC’s core initiatives is to “improve understanding of variability in response to treatments and identify novel PGx associations.” With 50 million participants from 100 cohorts, statistical power largely surpasses the one of traditional RCTs, making it likely for combined biobank studies to establish new PGx interactions.

Leveraging observational data to gain causal insights

Biobank studies follow the design of nonrandomized trials. Despite high agreement between treatment effects in randomized and nonrandomized trials, correlations between drug responses and genetic variants do not guarantee causality. Propensity score matching reduces bias from concomitant confounding variables through inclusion of selected covariates, and randomization between study participants is mimicked based on known observed covariates. Alternatively, Mendelian randomization (MR) approaches account for unobserved confounders, where the causal effect of a modifiable risk factor on an outcome is inferred from genetic associations coming from observational data (i.e., GWAS summary statistics or individual-level data). Analogously to RCTs, individuals are stratified based on the presence of genetic variants affecting the exposure of interest. MR was recently extended to study PGx through the triangulation within a study (TWIST) framework, which estimates a genetically moderated treatment effect (GMTE) and allows to calculate the reduction in treatment effect experienced by patients with a particular PGx variant, as compared with individuals lacking the variant (Figure 5A). Applying TWIST to the UKB revealed that CYP2C19 LOF carriers experienced a 0.28% increased risk of stroke (p = 0.048) while on clopidogrel. As the TAILOR-PCI RCT failed at detecting a significant benefit to CYP2C19-based anti-platelet prescription (Randomized controlled trials), increased power resulting from larger sample size (clopidogrel n = 7,483; no clopidogrel n = 198,868) and consideration of a larger spectrum of LOF alleles could have contributed to the positive results. A different study analyzing statin usage in the UKB found that females homozygous for the SLCO1B1∗5 allele known to impair intracellular statin uptake would have 0.147 mmol/L lower total cholesterol were they treated with a lipid-lowering medication unaffected by their genotype, reiterating the importance of sex as a modulator of PGx interactions. Overall, MR studies provide in silico orthogonal lines of evidence for the functional consequence of PGx variants and represent valuable tools to prioritize RCTs.

Figure 5

MR-analogous frameworks to infer causal effects in pharmacogenetic studies

(A) Framework proposed by Bowden et al. to estimate a genetically modified treatment effect (GMTE) (green arrow), which describes the reduction in treatment effect experienced by patients with a PGx variant (treatment inhibiting) as opposed to those without (treatment enabling) the variant whilst on the same treatment.

(B) Proposed framework to identify causal molecular mediators of PGx effects. Genetic variants are used as instrumental variables, intermediate phenotypes (e.g., methylomics, transcriptomics, proteomics, or metabolomics data) as exposure, and drug responses as outcome. The causal effect of the exposure on the outcome (green arrow) is estimated thanks to (1) the effect of the genetic variants on the exposure, measured by means of quantitative trait loci (QTLs), and (2) the effect of the same genetic variants on the outcome, measured by means of PGx GWAS.

MR-analogous frameworks to infer causal effects in pharmacogenetic studies (A) Framework proposed by Bowden et al. to estimate a genetically modified treatment effect (GMTE) (green arrow), which describes the reduction in treatment effect experienced by patients with a PGx variant (treatment inhibiting) as opposed to those without (treatment enabling) the variant whilst on the same treatment. (B) Proposed framework to identify causal molecular mediators of PGx effects. Genetic variants are used as instrumental variables, intermediate phenotypes (e.g., methylomics, transcriptomics, proteomics, or metabolomics data) as exposure, and drug responses as outcome. The causal effect of the exposure on the outcome (green arrow) is estimated thanks to (1) the effect of the genetic variants on the exposure, measured by means of quantitative trait loci (QTLs), and (2) the effect of the same genetic variants on the outcome, measured by means of PGx GWAS.

Challenges of large-scale biomedical databases

Increased usage of EHR-linked biobanks has brought attention to new challenges. First, EHRs frequently consist of unstructured, noisy, sparse, and incomplete clinical notes. As manual review by clinical experts is not scalable, automated solutions based on natural language-processing algorithms are required to extract informative features. Second, the massive size of biobanks necessitates sufficient storage capacities and computational resources to conduct analyses. This issue is exacerbated by the sensitive nature of the data: whereas patient privacy should be ensured, access to the data should be facilitated for the research community. With varying legislation across countries and differing views on acceptable privacy levels, development of data storage and management solutions is ongoing work. Finally, validating the increasing number of putative PGx interactions resulting from data-driven approaches is likely to become a major bottleneck. Conventional methods, such as RCTs, are inherently slow and low throughput, calling for alternative ways to validate putative PGx findings.

Integration of multi-omics data

Biotechnological advances foster the generation of “omics” datasets, including methylomics, transcriptomics, proteomics, and metabolomics, which quantify DNA methylation, RNA expression, protein expression and modifications, and metabolite abundance, respectively. As disruptions in intermediate layers might be predictive of therapeutic outcomes (i.e., drug responses), exploring and integrating changes in biomarkers could help understand and predict pharmaco-omics interactions.

Molecular modeling of pharmaco-omics interactions

GWASs have shed light on the genetic architecture of complex traits but fall short on elucidating molecular mechanisms, as they identify associations between traits and blocks of co-inherited variants in LD, hindering fine-mapping. This challenge is addressed by methods integrating GWAS results with other omics data. Colocalization approaches prioritize putative causal genes based on shared signals between GWAS and omics data., Similarly, transcriptome-wide association studies (TWASs) pinpoint genes whose differential expression is associated with a given phenotype., Starting from genetic variants, gene expression is inferred from expression quantitative trait loci (eQTLs) and a derived gene expression PGS is correlated to the investigated trait. MR provides a framework that can be generalized to any modifiable exposure, including the transcriptome, proteome,, methylome (CpG sites), or metabolome. Not widely employed in PGx, these methods improve our understanding of the molecular basis of association signals and could provide a corroborating source of evidence for novel PGx interactions (Figure 5B). Recent studies highlighted the benefits of including biomarkers to machine-learning (ML) algorithms predicting drug response for antidepressant use, precision psychiatry, and warfarin dosage (Box 2). In the latter study, assuming that warfarin dosage depends on genetic variation altering both the expression and activity of key enzymes (Figure 3), tissue-specific eQTLs were used to infer the expression of 116 genes implicated in warfarin pharmacokinetics and pharmacokinetics and construct prediction models, which improved predictive performance by an additional 8%–12% of variance.

Limitations to multi-omics modeling in pharmacogenomics

Despite promising results, most predictive models are not ready for the clinics. Limited size and number of available multi-omics datasets increase the risk of model overfitting and prevents independent replication, respectively. Furthermore, study results are often siloed and only few databases with unbiased drug-screening experiments are publicly available. There is a need for large-scale in vitro and in vivo perturbation experiments measuring drug effects on multiple omics layers, as results from these assays could open new avenues to study biomarker combinations and drug-drug-gene interactions. Importantly, omics layers are not confined entities. Instead, each forms a small portion of an inter-connected biological system, so that testing every possible drug-biomarker combination in isolation would not provide a complete picture. System genetics strategies have been proposed to elucidate molecular mechanisms,, but they remain exploratory and often lack the drug-omics interaction dimension. Toward this direction, a “multiscale interactome” was constructed, compiling public data on interactions between drugs, diseases, proteins, and biological functions to identify disease treatment mechanisms. System-level PGx is likely to become more elaborate in the future (Perspectives in mechanistic approaches), providing mechanistic explanations for pharmaco-omics interactions.

In silico pharmacogenomics

Functional prediction of rare variants

The wealth of rare variants in known PGx genes discovered by NGS makes it challenging to infer their function through classical methods. Not only are in silico approaches necessary, but they can advance PGx faster and beyond what can be achieved with traditional methods. Functional impact of common variants can be characterized through association studies, in vitro heterologous expression systems, or computational prediction tools. However, these typically fall short when assessing rare variants. Computational algorithms usually rely on evolutionary conservation and use training sets with annotated pathogenic variants. Often under lower evolutionary constraints than disease-associated variants, PGx variants can be functional without being pathogenic. One solution is to engineer cell lines to harbor these variants and estimate their effect on enzymatic activity in a high-throughput and cost-effective manner, as was recently done for 8,000 CYP2C9 variants following a forward genetics approach (A forward genetics approach to pharmacogenetics). Generated data can be used to train ML algorithms, yielding models applicable to clinical data. For instance, an ensemble ML model combining multiple functionality prediction algorithms was trained based on the enzymatic activity of 337 deleterious variants across 43 ADME genes, significantly increasing performance accuracy compared with single ML algorithms. Variant combinations—within the same or different genes—can also affect drug response. Employing deep learning algorithms, haplotype effects were investigated to predict CYP2D6 enzymatic activity., Training a convolutional neural network based on star alleles definitions, in vitro enzymatic activity of both known and unknown CYP2D6 haplotypes could be predicted. Similarly, a neural network was trained on a prospective cohort of breast cancer patients to predict CYP2D6-mediated tamoxifen metabolism. In this second study, omitting star allele haplotype assignment yielded better prediction scores, implying it might be time to move beyond star allele PGx. Both studies being limited in their scope, investigating a broader set of genes is warranted to draw general conclusions.

Perspectives in mechanistic approaches

We previously introduced statistical methods that integrate GWASs’ findings with other datasets to identify causal biomarker-drug response relationships. Assuming a biomarker alteration is causal to a disease, identifying drugs achieving their therapeutic effect by inducing the opposite change should provide viable treatment options. This hypothesis was tested by searching for drugs inducing gene expression patterns opposite to the ones observed in seven psychiatric disorders within the Connectivity Map (CMap), a library containing the expression profile of 1,000 genes upon perturbation by 20,000 small molecules. Although there is no protein analogue to CMap, ChEMBL, the largest database of curated drug targets, and DrugBank form valuable resources to conduct similar studies at the proteome level. Drug repositioning is a topic on its own, but this example showcases an application of a mechanistic understanding of drug-disease interactions. In the context of PGx, this strategy could identify drugs counteracting patient-specific, disease-causing biomarker changes. Alternatively to data-driven top-down strategies are bottom-up approaches, such as the ones used in systems biology. Systems biology aims at constructing metabolic network models describing complex biological systems, with nodes representing metabolite concentrations connected by mathematical equations defined by enzyme kinetics. Since enzyme kinetics is at the core of drug metabolism, it would be relevant to understand how consequences of genetic variants propagate through the biochemical network (Figure 6). Network effects can be calculated through flux balance analyses, using mathematical models accounting for thermodynamic constraints and mRNA and protein abundances. Efforts to reconstruct a genome-scale network are concerted in Recon3D, the most comprehensive human metabolic network to date. Recon3D provides information on 13,543 enzymatic reactions and 4,140 metabolites and describes the relationships between involved genes, proteins, the reactions these proteins catalyze, and links between genetic variants and metabolic effects. Integration of genetic variation into metabolic studies remains in its infancy, but the ever-growing supply of diverse datasets is leading to a steady increase in our understanding of disease mechanisms. There is a great demand for refined multiscale modeling, as expressed in a review authored by the pharmaceutical industry, with the hope that more realistic in silico models will improve prediction of drug mechanisms and safety before human clinical testing.

Figure 6

A mechanistic systems biology view on pharmacogenomics

The metabolic network of each cell is composed of metabolites (nodes) connected by enzymatic reactions (edges). It is modulated by genetic and epigenetic variants (star) that propagate through several intermediate molecular layers (e.g., transcriptome and proteome), as well as by drug intake. This network and its modulation are cell type specific and influence an individual’s drug response and ADR risk, as illustrated on the right with examples of differential drug responses manifesting themselves in specific tissues.

A mechanistic systems biology view on pharmacogenomics The metabolic network of each cell is composed of metabolites (nodes) connected by enzymatic reactions (edges). It is modulated by genetic and epigenetic variants (star) that propagate through several intermediate molecular layers (e.g., transcriptome and proteome), as well as by drug intake. This network and its modulation are cell type specific and influence an individual’s drug response and ADR risk, as illustrated on the right with examples of differential drug responses manifesting themselves in specific tissues.

The past and future of pharmacogenetics

Seventy years of methodological development have transformed PGx from an emerging science into an interdisciplinary area of research key to the implementation of personalized medicine. Pharmacogenetics established the role of genetics in pharmacological responses, with RCTs validating the first CGSs discoveries. Completion of the Human Genome Project triggered the transition to pharmacogenomics: sequencing technologies allowed ascertainment of the human PGx landscape while genome-editing tools enabled discovery and functional validation of PGx interactions. Finally, tools and concepts from statistics and computer science have been and will be key to analyze and integrate ever more complex datasets, bringing us a step closer to the in silico study of pharmaco-omics. Despite indisputable progress, numerous challenges lay ahead. Throughout the review, we highlighted areas of research we imagine will be decisive for the future of PGx and that overlap challenges recently laid out. Due to its interdisciplinary nature, progress in PGx can be achieved through different avenues, including data generation, data analysis and integration, and discovery prioritization and validation. Size and diversity of information contained in biobanks is constantly increasing, as development of sequencing and omics technologies improves genetic and molecular pathway understanding, while EHRs broaden clinical characterization. In parallel, high-throughput screens in genetically engineered cell lines, patient-derived organoids, and model organisms can assess the impact of genetic variations in controlled but tunable environmental settings. Establishment of efficient and secure solutions for data storage and sharing will foster the development of statistical and ML approaches aiming at analyzing and integrating data originating from various sources, resulting in a better understanding of the genetic architecture of PGx traits and mechanistic insights. RCTs will likely continue to play an important role in the clinical implementation of PGx, but it will be important to maximize their success rate and improve their throughput. Creative in vitro and in vivo experiments, as well as high-quality in silico mechanistic and predictive models, can generate robust hypotheses, thereby improving prioritization of new putative PGx interactions. Awareness and incorporation of this knowledge in RCT design might increase their success rate and/or promote development of companion diagnostics gauging the suitability of a treatment for a given patient. Solutions to the above-mentioned challenges should ensure that progress in PGx benefits all. Genomic research remains predominantly conducted in populations of European ancestry, despite general agreement that more diversity is needed to understand the human genome.,256, 257, 258, 259 This particularly applies to PGx, as many PGx genes are under weak global selective pressures and exhibit population-specific allele frequencies.15, 16, 17, 18, 19,, In this optic, All of US will recruit one million US citizens to build the largest and most diverse cohort in history, marking an important shift from the paradigm of studying white-male individuals to representing the full spectrum of a globalized world, including its minorities. Another core value of the project is to strengthen ties between participants and researchers to return value to all parties and ensure knowledge flow from bench to bedside and back. Acknowledging that most health care systems currently lack the infrastructure to implement screening for even well-established PGx variant, we argue that genetically-guided prescription, together with the recording of health outcomes through EHRs, will fuel PGx research while helping refine prescription guidelines. This virtuous cycle will benefit from increased diversity in both research and clinical settings, as the richness in genetic variation will accelerate mechanistic understandings of PGx, whose findings can in turn be generalized to other populations.

247 in total

1. Understanding adverse drug reactions in older adults through drug-drug interactions.

Authors: A Marengoni; L Pasina; C Concoreggi; G Martini; F Brognoli; A Nobili; G Onder; D Bettoni
Journal: Eur J Intern Med Date: 2014-10-11 Impact factor: 4.487

Review 2. The next generation of CRISPR-Cas technologies and applications.

Authors: Adrian Pickar-Oliver; Charles A Gersbach
Journal: Nat Rev Mol Cell Biol Date: 2019-08 Impact factor: 94.444

3. Effect of genetic variation in the organic cation transporter 1 (OCT1) on metformin action.

Authors: Yan Shu; Steven A Sheardown; Chaline Brown; Ryan P Owen; Shuzhong Zhang; Richard A Castro; Alexandra G Ianculescu; Lin Yue; Joan C Lo; Esteban G Burchard; Claire M Brett; Kathleen M Giacomini
Journal: J Clin Invest Date: 2007-05 Impact factor: 14.808

4. Cohort Profile: Estonian Biobank of the Estonian Genome Center, University of Tartu.

Authors: Liis Leitsalu; Toomas Haller; Tõnu Esko; Mari-Liis Tammesoo; Helene Alavere; Harold Snieder; Markus Perola; Pauline C Ng; Reedik Mägi; Lili Milani; Krista Fischer; Andres Metspalu
Journal: Int J Epidemiol Date: 2014-02-11 Impact factor: 7.196

5. Identification of disease treatment mechanisms through the multiscale interactome.

Authors: Camilo Ruiz; Marinka Zitnik; Jure Leskovec
Journal: Nat Commun Date: 2021-03-19 Impact factor: 14.919

6. Genome-Wide Approach to Measure Variant-Based Heritability of Drug Outcome Phenotypes.

Authors: Ayesha Muhammad; Ida T Aka; Kelly A Birdwell; Adam S Gordon; Dan M Roden; Wei-Qi Wei; Jonathan D Mosley; Sara L Van Driest
Journal: Clin Pharmacol Ther Date: 2021-07-12 Impact factor: 6.903

7. Analysis of population-specific pharmacogenomic variants using next-generation sequencing data.

Authors: Eunyong Ahn; Taesung Park
Journal: Sci Rep Date: 2017-09-04 Impact factor: 4.379

8. Projected Prevalence of Actionable Pharmacogenetic Variants and Level A Drugs Prescribed Among US Veterans Health Administration Pharmacy Users.

Authors: Catherine Chanfreau-Coffinier; Leland E Hull; Julie A Lynch; Scott L DuVall; Scott M Damrauer; Francesca E Cunningham; Benjamin F Voight; Michael E Matheny; David W Oslin; Michael S Icardi; Sony Tuteja
Journal: JAMA Netw Open Date: 2019-06-05

9. The ETFL formulation allows multi-omics integration in thermodynamics-compliant metabolism and expression models.

Authors: Pierre Salvy; Vassily Hatzimanikatis
Journal: Nat Commun Date: 2020-01-13 Impact factor: 14.919

10. Heritability of variation in glycaemic response to metformin: a genome-wide complex trait analysis.

Authors: Kaixin Zhou; Louise Donnelly; Jian Yang; Miaoxin Li; Harshal Deshmukh; Natalie Van Zuydam; Emma Ahlqvist; Chris C Spencer; Leif Groop; Andrew D Morris; Helen M Colhoun; Pak C Sham; Mark I McCarthy; Colin N A Palmer; Ewan R Pearson
Journal: Lancet Diabetes Endocrinol Date: 2014-03-19 Impact factor: 32.069

1 in total

1. Genetic Variation among Pharmacogenes in the Sardinian Population.

Authors: Maria Laura Idda; Magdalena Zoledziewska; Silvana Anna Maria Urru; Gregory McInnes; Alice Bilotta; Viola Nuvoli; Valeria Lodde; Sandro Orrù; David Schlessinger; Francesco Cucca; Matteo Floris
Journal: Int J Mol Sci Date: 2022-09-02 Impact factor: 6.208

1 in total