
The global spectrum of protein-coding pharmacogenomic diversity.

G E B Wright^1,2, B Carleton^2,3, M R Hayden^1,2, C J D Ross^2,4.

Abstract

Differences in response to medications have a strong genetic component. By leveraging publicly available data, the spectrum of such genomic variation can be investigated extensively. Pharmacogenomic variation was extracted from the 1000 Genomes Project Phase 3 data (2504 individuals, 26 global populations). A total of 12 084 genetic variants were found in 120 pharmacogenes, with the majority (90.0%) classified as rare variants (global minor allele frequency <0.5%) and 52.9% being singletons. Common variation clustered individuals into continental super-populations and 23 pharmacogenes contained highly differentiated variants (FST>0.5) for one or more super-population comparisons. A median of three clinical variants (PharmGKB level 1A/B) was found per individual, and 55.4% of individuals carried loss-of-function variants, varying by super-population (East Asian 60.9%>South Asian 60.3%>African 60.1%>European 49.3%>Admixed 39.2%). Genome sequencing can therefore identify clinical pharmacogenomic variation, and future studies need to consider rare variation to understand the spectrum of genetic diversity contributing to drug response.

Year: 2016 PMID: 27779249 PMCID: PMC5817389 DOI: 10.1038/tpj.2016.77

Source DB: PubMed Journal: Pharmacogenomics J ISSN: 1470-269X Impact factor: 3.550

Introduction

Inter-individual differences in response to medications are known to have a strong genetic component and several genes that influence either response to medications or adverse drug reactions (ADRs) have been identified.[1, 2] The majority of previous pharmacogenomic studies, however, have either assessed individual candidate genes or analyzed a subset of genetic variation interrogated with genotyping arrays. Current sequencing technologies therefore offer an opportunity to assess the full spectrum of variation present in populations,[3] as well as to determine how genes of pharmacogenomic importance are affected by rare genetic variation, the class of genetic variants that are most likely to be deleterious.[4] Further, sequencing approaches present a means to investigate understudied populations and identify groups of individuals at risk of certain ADRs on a scale not previously possible. The 1000 Genomes Project (1000GP) aimed to detect the majority of variants with minor allele frequencies (MAFs) >1% in numerous human populations through the use of current sequencing and array genotyping technologies.[5] The final stage consisted of 2504 individuals from 26 populations.[6] The current study aimed to leverage these genomic data to determine the spectrum of variation found in pharmacogenes across human populations. A previous study investigated an earlier release of the 1000GP, analyzing 15 populations and a subset of pharmacogenomic variation present on a commercial array (that is, 1156 markers).[7] We therefore analyzed the full catalogue of diversity in the protein-coding regions of genes of relevance to pharmacogenomics, incorporating data from across the entire allele frequency spectrum.
The protein-coding regions of these pharmacogenes were the focus of investigation, since these areas were subjected to the most comprehensive sequencing coverage in the 1000GP (mean coverage, complete exome=65.7×), compared with the rest of the genome (mean coverage, whole-genome sequencing=7.4×).[5] Performing pharmacogenomic studies of inclusive population cohorts will lead to a better understanding of the pattern of genetic factors that influence drug safety and effectiveness.

Materials and methods

Selection of pharmacogenes

Pharmacogenes were selected based on curated Pharmacogenomics Knowledgebase (PharmGKB) data and the literature. Autosomal genes annotated as ‘very important pharmacogenes’ and/or containing variants with high to moderate levels of clinical annotation (PharmGKB levels 1 and 2) were prioritized (www.pharmgkb.org/downloads, accessed 26 August 2014). In addition, pharmacogenes with emerging evidence, as highlighted in recent reviews, were included if they had not already fulfilled these criteria.[1, 2] Human leukocyte antigen (HLA) and UDP-glucuronosyltransferases (UGT) genes were excluded from analyses due to their complex nomenclature and difficulties associated with current sequencing.[8, 9]

Study population, genetic data retrieval and functional annotation

The 1000GP Phase 3 consists of 2504 individuals from 26 global populations (Supplementary Table 1), grouped according to five super-populations: African, admixed American, East Asian, European and South Asian. GRCh37 exon locations of pharmacogenes were extracted with the R (R Foundation for Statistical Computing, Vienna, Austria) package, biomaRt, and an intersection between these coordinates and the exome region targeted by the 1000GP was created (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/exome_pull_down_targets, accessed 10 May 2016). The intersection was padded by 25 bp with bedtools (v2.24.0) to capture flanking exon/intron boundaries. Sequencing coverage for the exome capture regions was then calculated with samtools (v0.1.19). Variants were extracted (ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502, accessed 2 June 2016) using tabix (v1.3) and annotated with the Ensembl Variant Effect Predictor (VEP v83) (per gene, default ranking criteria). Variants were assigned CADD scores and values of ⩾15 were considered deleterious (http://cadd.gs.washington.edu/info, accessed 9 June 2016). The VEP-plugin, Loss-Of-Function Transcript Effect Estimator (LOFTEE), was employed to annotate high-confidence loss-of-function (LOF) variants. LOF variants with a global MAF >30% were manually curated. In order to generate a list of robust LOF variants, we selected variants annotated using GENCODE (v19) transcripts that were annotated as ‘high confidence’ and were not flagged by LOFTEE.
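The intersect-then-pad step described above can be illustrated with a minimal sketch. It is not the authors' pipeline (which used biomaRt and bedtools); the interval coordinates and function names below are hypothetical, and intervals are assumed to be 0-based and half-open, as in BED files.

```python
from typing import List, Tuple

# (chromosome, start, end), 0-based half-open like a BED record
Interval = Tuple[str, int, int]

def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping part of every pair of intervals from a and b."""
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep only non-empty overlaps
                out.append((chrom_a, start, end))
    return sorted(out)

def pad(intervals: List[Interval], flank: int = 25) -> List[Interval]:
    """Extend each interval by `flank` bp on both sides, clamping the start at 0.

    Unlike a genome-aware tool, this sketch does not clamp at chromosome ends.
    """
    return [(c, max(0, s - flank), e + flank) for c, s, e in intervals]

# Hypothetical exon coordinates and exome capture targets
exons = [("chr1", 100, 200), ("chr1", 500, 650)]
targets = [("chr1", 150, 600)]
padded = pad(intersect(exons, targets))
```

The nested loop is quadratic and only suitable for small inputs; dedicated interval tools use sorted sweeps or interval trees for genome-scale data.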

Population genetic analyses

Principal component analysis was performed on a pruned subset of the data with EIGENSOFT (v5.0). Pruning was based on linkage disequilibrium and MAF (parameters: 50-variant window, shifted by a 5-variant interval, r²>0.2, global MAF>0.01) with PLINK (v1.07), and these data were used exclusively for principal component analysis. VCFtools (v0.1.14) was used to calculate global and population-specific allele frequencies and fixation index (FST) statistics, and to analyze variants in inaccessible regions and/or segmental duplications. Rare variants were defined as those with a MAF <0.5%, while singletons were variants with an allele count of one in all 1000GP individuals. Highly differentiated variants in individual populations were defined as variants that were rare in the global sample (MAF <0.5%), but common in one population (MAF >5.0%).
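The frequency classes defined above can be expressed as a short sketch. The thresholds come from the text; the function names, the example allele counts and the population frequencies are hypothetical, and 5008 is simply the number of chromosomes in 2504 diploid individuals.

```python
RARE_MAF = 0.005        # rare: global MAF < 0.5%
COMMON_POP_MAF = 0.05   # common in one population: MAF > 5.0%

def minor_allele_count(alt_count, total_alleles):
    """Count of the less frequent allele at a biallelic site."""
    return min(alt_count, total_alleles - alt_count)

def classify(alt_count, total_alleles, pop_mafs):
    """Assign the frequency classes used in the text to one variant.

    pop_mafs maps a population label to that population's MAF.
    """
    mac = minor_allele_count(alt_count, total_alleles)
    maf = mac / total_alleles
    rare = maf < RARE_MAF
    return {
        "singleton": mac == 1,
        "rare": rare,
        # globally rare, yet common in at least one population
        "population_differentiated": rare
        and any(m > COMMON_POP_MAF for m in pop_mafs.values()),
    }

# Hypothetical variant: 20 alternate alleles among 5008 chromosomes,
# concentrated in one population
example = classify(20, 5008, {"FIN": 0.08, "GBR": 0.0})
```

Here `example` is flagged as rare and population-differentiated but not as a singleton.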

Clinical pharmacogenomic variants

Clinical variants were defined as variants with a PharmGKB clinical annotation level of evidence of 1A/B with unambiguous allele-defining variants. Level 1A/B variants represent those that are being implemented in clinical practice or have an unequivocal influence on a pharmacogenomic trait, while level 2 variants are ones that are either found in known pharmacogenes or have been replicated with moderate evidence for association (https://www.pharmgkb.org/page/clinAnnLevels, accessed 9 June 2016). Due to pleiotropic effects, the number of minor alleles carried per individual was used to calculate clinical variant statistics. Downstream statistical analyses and plotting were performed in R (dplyr, reshape2 and ggplot2).
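As a rough illustration of the per-individual statistic described above, the sketch below sums the minor alleles carried across clinical variants and then summarizes the cohort. The sample and variant names are hypothetical, and the authors' own analysis was performed in R rather than Python.

```python
from statistics import median

# Hypothetical genotype matrix: minor-allele counts (0, 1 or 2)
# per clinical variant for each individual.
genotypes = {
    "sample_A": {"variant_1": 1, "variant_2": 2, "variant_3": 0},
    "sample_B": {"variant_1": 0, "variant_2": 0, "variant_3": 0},
    "sample_C": {"variant_1": 0, "variant_2": 1, "variant_3": 0},
    "sample_D": {"variant_1": 2, "variant_2": 1, "variant_3": 1},
}

def minor_allele_load(calls):
    """Total number of minor alleles one individual carries."""
    return sum(calls.values())

loads = {sample: minor_allele_load(calls) for sample, calls in genotypes.items()}

# Cohort summaries: median load and the fraction carrying at least one allele
median_load = median(loads.values())
carrier_fraction = sum(load > 0 for load in loads.values()) / len(loads)
```

Counting alleles rather than distinct variants means a homozygous carrier contributes two to the total, which matches the allele-based definition given in the text.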

Short-read sequencing accessibility and variant site assessment

Pharmacogenes were assessed for accessibility to short-read sequencing technologies by investigating variants located in (i) potentially inaccessible regions defined by the 1000GP ‘strict mask’ and (ii) segmental duplications (>1000 bp with >90% identity, http://humanparalogy.gs.washington.edu/build37/data/, accessed 23 February 2015). Extreme outlier genes (>3 × interquartile range) with regards to proportion of variants located in either the ‘strict mask’ or segmental duplication regions were flagged as being potentially problematic for short-read technologies. In order to assess the quality of variant calls in the data set, we generated a list of variants that are found in the 1000GP data, but are more likely to be sequencing artefacts. The 1000GP used a support vector machine (SVM) classifier to select high-quality variants and the final call set included single-nucleotide variants with SVM>0. We therefore described marginal quality variants as those close to the SVM separating hyperplane (that is, with SVM scores only marginally above 0).[10] Finally, to provide independent in silico verification of the 1000GP variants we compared allele frequency data for overlapping markers found in either the Exome Aggregation Consortium (ExAC)[11] or the Human Genome Diversity Project.[12]
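The extreme-outlier rule can be sketched as follows. The text only states ">3 × interquartile range"; this example assumes the conventional upper fence of Q3 + 3 × IQR, and the gene labels and proportions are hypothetical.

```python
from statistics import quantiles

def extreme_outliers(values_by_gene, multiplier=3.0):
    """Return genes whose value exceeds Q3 + multiplier * IQR.

    Assumption: the '>3 x interquartile range' criterion is read as the
    usual upper fence for extreme outliers.
    """
    q1, _, q3 = quantiles(values_by_gene.values(), n=4)
    fence = q3 + multiplier * (q3 - q1)
    return {gene for gene, value in values_by_gene.items() if value > fence}

# Hypothetical proportion of each gene's variants that fall in the
# strict mask or in segmental duplications
proportions = {
    "GENE_A": 0.02, "GENE_B": 0.03, "GENE_C": 0.04,
    "GENE_D": 0.05, "GENE_E": 0.05, "GENE_F": 0.06,
    "GENE_G": 0.07, "GENE_H": 0.08, "GENE_I": 0.90,
}
flagged = extreme_outliers(proportions)
```

Quartile estimates depend on the interpolation method, so borderline genes could be classified differently by other software; clear outliers such as `GENE_I` are unaffected.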

Code availability

The code used to perform these analyses will be made available via GitHub (https://github.com/GalenWright/1000gpPGX).

Results

Summary of pharmacogenomic variation

A total of 120 pharmacogenes were included, spanning 369 kb of genomic sequence and containing 12 084 variants, with a mean coverage of 105.2× for the analyzed pharmacogenomic exome region (Supplementary Table 2). Notably, 6398 (52.9%) of the variants were singletons. Rare variants, with global MAFs <0.5%, made up 90.0% of the data set. Variants that could influence protein function (for example, missense, stop gained, splice acceptor) were enriched in the rare variant classes, while, conversely, those more likely to be benign (for example, synonymous, intronic and 3′UTR) were more frequent in the more common variant classes (Figure 1). The most significant enrichment was observed for missense variants (corrected P=9.7 × 10^−40), where this class was over twice as prevalent in singletons (41.0%) compared with common (global MAF >5.0%) variants (19.5%). Further, rare variants had 50.1% higher mean CADD scores than variants with higher allele frequencies (13.1 versus 8.6 CADD). Supplementary Table 3 presents a per gene summary of the number of variants and select functional annotations.

Figure 1

Summary of the functional annotation of the pharmacogenomic variants in the 1000 Genomes Project individuals. (a) Counts of the different variant classes according to consequence type. (b) Relative proportion of variants across consequence type stratified by global minor allele frequency (MAF) bins. Consequence types that differ significantly in frequency according to global MAF are annotated with Bonferroni corrected P-values. Missense variants displayed the most significant differences in relative frequencies (P=9.68 × 10^−40).

The number of missense variants per coding sequence length ranged from 0.001 (YEATS4, missense variant every ~684 bp) to 0.058 (IFNL3, missense variant every ~17 bp). Seventeen pharmacogenes exclusively carried rare missense variants, while ADRB1 was an outlier with regards to this statistic, with only 66.7% of its missense variants classified as rare (Supplementary Table 3). Some of the most conserved pharmacogenes were those where somatic mutations are predictive of cancer treatment response (for example, BRAF, KRAS and NRAS), indicating their important role in biological processes. Many of the other conserved pharmacogenes are important for hypertension (NEDD4L, PRKCA and PTGS2), statins (HMGCR) and beta blockers (ADRB1, ADRA2C and PTGS2). Principal component analysis and FST analyses (Supplementary Figures 1 and 2) revealed that pharmacogenomic variation tends to separate continental super-populations into different clusters (that is, African, European, South Asian and East Asian). African populations had the highest number of polymorphic sites in their pharmacogenes (Supplementary Figure 3). The average number of singletons per individual per population ranged from 1.2 to 3.6, with the Finnish population displaying the fewest singletons per individual (Supplementary Figure 4). There were 23 pharmacogenes (19.2%) that contained highly differentiated pharmacogenomic variants (pairwise FST>0.5 for one or more continental comparison, Supplementary Figure 5 and Table 1) and 17 (14.2%) possessed a rare variant that was common in one population (Supplementary Table 4 and Supplementary Figure 6).
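The relationship between the per-base missense density and the "one variant every N bp" figures quoted above is a simple reciprocal, sketched here. The function names are hypothetical, and the single-variant count used for the YEATS4-like case is only a stand-in to reproduce the reported spacing.

```python
def missense_density(n_missense, coding_length_bp):
    """Missense variants per base pair of coding sequence."""
    return n_missense / coding_length_bp

def mean_spacing_bp(density):
    """Average number of base pairs between missense variants."""
    return 1.0 / density

# IFNL3-like case: density 0.058 corresponds to one variant every ~17 bp
ifnl3_spacing = round(mean_spacing_bp(0.058))

# YEATS4-like case: one variant every ~684 bp gives a density that
# rounds to 0.001 at three decimal places
yeats4_density = round(missense_density(1, 684), 3)
```

This also shows why the rounded density of 0.001 and the spacing of ~684 bp are consistent, even though 1/0.001 would suggest 1000 bp.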

Table 1

Pharmacogenes containing highly differentiated genetic variants. Twenty-three genes showed at least one variant with an FST value of 0.5 or greater for one or more super-population comparisons. These genes are important for various drug classes, and the table presents the variant with the highest mean FST for each of these genes.

Gene	ID	Annotation	CADD	AFR EAS	AFR EUR	AFR SAS	EUR EAS	EUR SAS	SAS EAS	AMR AFR	AMR EAS	AMR EUR	AMR SAS	Important drugs
ABCG2	rs2231153	Intron	3.91	0.43	0.66	0.51	0.12	0.06	0.01	0.38	0.00	0.17	0.03	Various
ADCY9	rs2230738	Synonymous	10.52	0.60	0.22	0.49	0.22	0.12	0.02	0.33	0.15	0.01	0.07	Dalcetrapib
ADH1B	rs1229984	Missense	8.98	0.72	0.03	0.02	0.65	0.00	0.66	0.07	0.58	0.01	0.02	Alcohol
ADH1C	rs2241894	Synonymous	0.05	0.14	0.14	0.02	0.44	0.24	0.06	0.19	0.51	0.01	0.31	Anticancer
ALK	rs2246745	Synonymous	0.09	0.58	0.59	0.48	0.00	0.02	0.02	0.48	0.02	0.03	0.00	Anticancer
ANKK1	rs11214596	Intron	3.67	0.01	0.50	0.28	0.41	0.09	0.19	0.30	0.19	0.09	0.00	Antipsychotics, antidepressants
BRAF	rs3789806	Intron	6.63	0.47	0.52	0.30	0.00	0.08	0.05	0.50	0.00	0.00	0.08	Anticancer
CFTR	rs213950	Missense	13.09	0.52	0.47	0.40	0.00	0.01	0.03	0.41	0.03	0.01	0.00	Ivacaftor
COL22A1	rs3935045	Splice region	2.93	0.47	0.54	0.33	0.01	0.07	0.03	0.32	0.04	0.08	0.00	Salbutamol
CYP1A2	rs2470890	Synonymous	2.09	0.11	0.57	0.10	0.33	0.34	0.00	0.32	0.07	0.13	0.08	Antipsychotics, caffeine
CYP2D6	rs1081003	Synonymous	4.54	0.43	0.04	0.00	0.53	0.03	0.42	0.01	0.45	0.01	0.01	Various
CYP2E1	rs2480257	3′UTR	7.13	0.31	0.59	0.42	0.12	0.05	0.02	0.46	0.02	0.04	0.00	Analgesics, antituberculosis
CYP3A4	rs2242480	Intron	3.78	0.52	0.74	0.40	0.11	0.21	0.02	0.39	0.03	0.25	0.00	Various
CYP3A5	rs15524	3′UTR	2.60	0.27	0.58	0.23	0.18	0.22	0.00	0.40	0.03	0.08	0.05	Immunosuppressives, anticancer
ERCC1	rs11615	Synonymous	0.30	0.20	0.58	0.41	0.23	0.05	0.08	0.37	0.03	0.10	0.01	Anticancer
F5	rs13306334	Missense	1.68	0.63	0.02	0.19	0.56	0.12	0.33	0.12	0.43	0.04	0.03	Contraceptives, anticancer
GRIK4	rs644057	Synonymous	3.51	0.54	0.13	0.30	0.30	0.06	0.15	0.22	0.24	0.02	0.01	Antidepressants
GSTP1	rs4147581	Intron	8.01	0.58	0.37	0.53	0.07	0.04	0.00	0.21	0.21	0.05	0.17	Anticancer
IFNL3	rs8103142	Missense	0.00	0.55	0.26	0.34	0.15	0.01	0.08	0.16	0.26	0.02	0.06	Interferon
P2RY1	rs701265	Synonymous	0.64	0.45	0.56	0.51	0.02	0.00	0.01	0.43	0.00	0.03	0.01	Antiplatelet
PTGS1	rs5788	Synonymous	14.54	0.60	0.46	0.52	0.06	0.01	0.03	0.33	0.17	0.03	0.07	Antiplatelet
TCF7L2	rs1056877	3′UTR	0.00	0.58	0.58	0.58	0.00	0.00	0.00	0.47	0.07	0.05	0.07	Sulfonamides
TXNRD2	rs5748469	Missense	28.60	0.59	0.05	0.20	0.43	0.07	0.23	0.20	0.25	0.06	0.00	Antidepressants

Abbreviations: AFR, African; AMR, admixed American; EAS, East Asian; EUR, European; SAS, South Asian; UTR, untranslated region.
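To give a sense of how a pairwise FST value such as those in Table 1 relates to allele frequencies, the sketch below uses the basic heterozygosity-based definition, FST = (HT − HS)/HT, for two equally weighted populations. This is a simplification: VCFtools implements the Weir and Cockerham estimator, which additionally accounts for sample sizes, so the values here will not reproduce the table exactly. The frequencies are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic site."""
    return 2.0 * p * (1.0 - p)

def pairwise_fst(p1, p2):
    """Simplified FST for two equally weighted populations.

    H_S is the mean within-population heterozygosity and H_T the
    heterozygosity of the pooled allele frequency.
    """
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    h_t = heterozygosity((p1 + p2) / 2.0)
    if h_t == 0.0:  # site is monomorphic in both populations
        return 0.0
    return (h_t - h_s) / h_t

# Hypothetical allele frequencies in two super-populations
fst_divergent = pairwise_fst(0.1, 0.9)   # strongly differentiated
fst_identical = pairwise_fst(0.3, 0.3)   # no differentiation
```

Under this definition a value above 0.5, the threshold used in the text, requires a large frequency difference between the two groups being compared.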

A total of 22 clinical variants were found in 11 pharmacogenes with 7 of these variants displaying global MAF ⩾5.0%. The number of clinical variants per individual varied between 0 and 11 (median 3), with 97% of individuals being carriers (Figure 2). Apart from ANKK1, the coverage of clinical pharmacogenes did not vary substantially between populations (Supplementary Figure 7). High-confidence LOF variants were found in 69 pharmacogenes (57.5%) and we detected 175 unique variants, comprising 1968 alleles (Figure 3). Individuals carried 0–5 of such LOF variants, with 55.4% of individuals being carriers, but this varied by super-population (East Asian 60.9%>South Asian 60.3%>African 60.1%>European 49.3%>Admixed 39.2%). Apart from 12 variants (6.9%), all high-confidence LOF variants were rare (global MAF<0.5%) and the allele frequencies of many of the higher-frequency LOF variants were driven by one super-population. CYP2D6 provided the largest contribution to the LOF allele count and CYP2D6*4 (rs3892097, splice acceptor) displayed the highest global MAF (9.3%).

Figure 2

Pharmacogenomic variants with a high level of clinical annotation (that is, PharmGKB Level 1A/B). (a) Scatterplot of allele frequencies of clinically relevant variants in the different population groups. Variants in certain genes, such as CYP2C19 and CYP4F2, displayed differences in allele frequencies between super-populations. (b) Violin plot of the number of clinically relevant pharmacogenomic variants carried per individual, grouped by population, and coloured by super-population. Ninety-seven percent of the individuals in the 1000 Genomes Project carried at least one such variant (median of 3).

Figure 3

Pharmacogenes that carried high-confidence loss-of-function (LOF) variants as designated by LOFTEE. (a) The size of the points is proportional to the number of unique LOF variants in the gene, with the cumulative allele count per gene indicated. (b) Combined allele frequencies of LOF variants per gene in each of the global super-populations. Common LOF variants were frequently driven by one super-population.

Sequencing performance and variant assessment

As our criteria for assessing sequencing performance are stringent,[8] no pharmacogenes were removed from subsequent analyses, and these metrics should be considered as a reflection of pharmacogenes that should be treated with caution when short-read sequencing technologies are applied. Of 120 pharmacogenes, 16 had variants located within segmental duplications, of which 50% were cytochrome P450 (CYP) genes (Supplementary Table 2). This overrepresentation of CYP genes in the segmental duplication gene list was statistically significant (P=0.001) as CYPs only comprise 10% of the complete set. Ten pharmacogenes (CES1, CYP2A6, CYP2B6, CYP2D6, CYP3A4, CYP4F2, FCGR3A, GSTT1, IFNL3 and SULT1A1) were extreme outliers with regards to the proportion of variants located in either the 1000GP ‘strict mask’ regions or segmental duplications (that is, >64%). These 10 pharmacogenes had a higher proportion of variants that failed the filtering steps performed by the 1000GP quality control (14.3% versus 1.9%) and more variants that were classified in this study as marginal quality variants (3.4% versus 0.1%; Supplementary Figure 8 and Supplementary Table 2). Of note, none of the clinical variants (that is, PharmGKB level 1A/B) failed the 1000GP filtering or fell into our marginal variant category. Further, only four high-confidence LOF variants (SCN5A rs202196386, ABCG2 rs573803020, C8orf34 rs554409474 and SLC28A3 rs548288413) were in the marginal quality variant category. A complete list of the 110 marginal quality variants can be found in Supplementary Table 5. Validation of our results using genomic data from external projects showed a strong correlation between the 1000GP pharmacogenomic data and results that were generated either by genotyping arrays or exome sequencing (Supplementary Figure 9). Comparison with the ExAC data showed that the allele frequencies for 10 871 variants were comparable, even though different bioinformatic analyses were employed. 
Previously identified array-genotyped markers (n=136) from the Human Genome Diversity Project correlated well between super-population groups (R²⩾0.95) for all populations except the admixed American populations, indicating the difficulty of predicting allele frequencies in highly admixed populations.
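The allele frequency concordance described above can be quantified with a squared Pearson correlation, sketched below. The frequencies are hypothetical and the function name is illustrative; the text does not specify how the correlation was computed beyond reporting R².

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical allele frequencies for the same markers in two data sets
freq_sequencing = [0.10, 0.20, 0.30, 0.40]
freq_array = [0.12, 0.19, 0.33, 0.41]

concordance = r_squared(freq_sequencing, freq_array)
well_correlated = concordance >= 0.95  # threshold quoted in the text
```

A high R² indicates that the two data sets rank and scale frequencies similarly, but it does not by itself rule out a systematic offset between them.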

Discussion

This study presents an extensive surveillance of pharmacogenomic variation in global populations. Analysis of these regions with current sequencing technologies was shown to be feasible in genes of relevance to drug safety and effectiveness. By assessing the full spectrum of genetic variation, the importance of rare variation in influencing the protein function of pharmacogenes was highlighted. Future pharmacogenomic approaches in clinical practice will need to develop methods to address this class of variation to ensure the maximum predictive value for diagnostic tests. Furthermore, 97% of individuals carried at least one well-established variant of pharmacogenomic relevance, indicating the valuable clinical information related to drug response and/or ADRs that can be obtained through genomic sequencing. Sequence analysis facilitated the identification of protein-coding pharmacogenomic variation across a globally representative cohort at a scope not previously feasible. The majority of the variation was made up of rare variants and singletons (~90%). Further, the relative frequency of deleterious variants is inversely correlated with allele frequency (Figure 1b), since deleterious variants are more likely to be rare.[13] This was demonstrated by the high prevalence of rare missense variants in the pharmacogenes examined in this study and is in line with research involving re-sequencing of drug target genes.[14] This is of particular importance to pharmacogenomics, as rare variants are an understudied class of pharmacogenomic variation[15] and such low-frequency functional variants are unlikely to be adequately covered on conventional genotyping arrays. 
One of the pharmacogenes with the highest proportion of missense variants, SLC22A1, encodes the major hepatic uptake transporter of the antidiabetic drug, metformin.[16] Over 20 SLC22A1 variants have been associated with either changes in protein function in vitro or clinical traits, such as treatment response.[17] Future studies should ensure that variation in highly polymorphic pharmacogenes is adequately genotyped to ensure robust findings. Variation in conserved germline pharmacogenes may be easier to capture through conventional genotyping, although regulatory genetic variants may still have an important role.

Population genetics

The inclusion of diverse populations in genomic studies ensures that the benefits of precision medicine can be applied globally, in accordance with the ethical principle of justice.[18] Common pharmacogenomic variants stratified individuals into continental super-populations, with the admixed individuals separating along clines between these clusters (Supplementary Figure 1). This was also observed for the FST analyses of synonymous variants (Supplementary Figure 2). Rare variants have been shown to be geographically localized[14] and this clustering makes the design of arrays that adequately capture global variants difficult. This indicates that sequencing is the most appropriate way to assess pharmacogenomic variation across the frequency spectrum. The pharmacogenes that displayed highly differentiated variants are important for a variety of drug classes (Table 1, Supplementary Figure 5). Consistent with the history of modern humans, most differences were observed between African populations in relation to the other super-populations (91% of such variants displayed differences involving an African population) and there were no highly differentiated variants for the European-South Asian comparisons. The most differentiated polymorphism was a missense ADH1B variant (rs1229984), which is involved in alcohol metabolism. This variant has been linked to an increased oesophageal cancer risk,[19] and could contribute towards the elevated prevalence of this cancer in certain Asian populations,[20] although this phenotype is multifactorial and the effect size of the variant is modest. The CYP3A4*1G allele (rs2242480), which has been associated with increased tacrolimus metabolism,[21] displayed the greatest individual FST statistic (0.74 between Africans-Europeans). 
Unique patterns of genetic diversity for CYP3A4 in African populations have been documented,[22] and this, combined with the fact that African populations have higher frequencies of active CYP3A5,[23] indicates that these individuals would require higher dosages of immunosuppressive drugs on average. The angiotensin converting enzyme (ACE) gene contained the most variants that were globally rare, yet common in one population, with four independent signals (three African and one admixed, Supplementary Table 4). ACE inhibitors display different response profiles, with African patients displaying less effective blood pressure reduction from these medications than Europeans[24] and a higher risk for the ADR, angio-oedema.[25] Genetic variants identified through these analyses are therefore good candidates for future pharmacogenomic research. Three variants of potential relevance to CYP-related drug metabolism, in CYP2B6 (rs28399501, 3′UTR), CYP2C8 (rs11572079, splice region) and CYP2C19 (rs181297724, missense/splice region), were common in the Finnish, but rare in the global population. Allele frequency differences between the Finnish and other European populations have been documented for other CYP2 polymorphisms.[26] Pharmacogenomic studies of related medications and cohorts should include these variants to determine clinical relevance. For example, 27% of patients in Finland were found to discontinue statins (CYP2C8 substrate) during the first year of treatment, and ADRs potentially contributed towards this statistic.[27] Another notable finding in the Finnish population was the depletion of singletons in this bottlenecked population (Supplementary Figure 4), which is in line with previous genomic research in these individuals,[28] and provides the opportunity to study the effect of rare pharmacogenomic variants in these individuals.

Clinical pharmacogenomics and high-confidence LOF variants

Almost every 1000GP individual (97%) carried a high-evidence clinical variant (Figure 2), indicating the clinical utility of current sequencing technologies. In addition, if a patient presents with the absence of pharmacogenomic risk variants for a particular drug, the treating physician can have more confidence prescribing that medication. Pharmacogenes relevant for anticancer agents featured prominently on this clinical list (DPYD–fluorouracil, MTHFR–methotrexate, TMEM43/XPC–cisplatin, TPMT–mercaptopurine), reflecting an active research field, with several biomarkers available for clinical intervention. This was followed by pharmacogenes involved in warfarin-related traits (CYP2C9 and CYP4F2), with CYP2C9*3 (rs1057910) also having relevance for severe skin reactions from phenytoin.[29] The highly polymorphic pharmacogene, CYP2D6, along with CYP2C19, each contributed four clinical variants. CYP2D6 is important in the metabolism of many drugs, including antidepressants as well as analgesics (for example, codeine and tramadol), indicating that carriers of these clinical variants are likely to benefit from receiving these pharmacogenomic results. The European super-population had the highest mean number of clinical variants (4.1), while the African populations had the lowest number of such variants (2.3) (Figure 2), which is similar to the findings for disease-related variants in different populations.[5] This most likely represents database bias, as the clinical pharmacogenomic variants assessed in this study rely on previously published evidence. African populations have been underrepresented in past pharmacogenomic research,[6, 30, 31] therefore reiterating the importance of performing research in diverse populations. These genetically diverse individuals are likely to harbour pharmacogenomic variants that are common in African populations, with similar effect sizes, but remain to be identified as being clinically relevant. 
As only the coding regions were assessed, these clinical carrier counts are underestimated in all populations. For example, increasing the capture region to include a more comprehensive set of transcripts incorporating untranslated regions would allow for the inclusion of additional clinical variants (for example, CYP3A5*3 and VKORC1 rs7294/rs9934438). With the addition of these variants, every individual in the 1000GP would carry a clinical variant, providing support for the use of augmented exome approaches.[32] A recent study also highlighted the importance of rare variation in a predominantly European-descent cohort of patients from the eMERGE Network analyzed with the PGRNseq platform.[33] This represents a significant advance in incorporating sequencing-based pharmacogenomic approaches into the clinic. Our study adds important additional support for these findings through capturing the diversity of pharmacogenomic alleles observed across the globe, surveying population genetic differences and annotating high-confidence pharmacogenomic LOF variants. LOF variants have a marked impact on protein function, and consequently, pharmacogenomic traits. We generated a list of pharmacogenes that are impaired by LOF variants that have been annotated with a high degree of confidence, minimizing potential false positives. Of possible clinical relevance, 50% of the top 10 pharmacogenes contributing towards the high-confidence LOF allele count also contain variants with PharmGKB level 1A/B evidence (CYP2C19, SLCO1B1, ANKK1, CYP3A5 and CYP2D6). The high number of CYP2D6 poor metabolizer (PM) alleles is of relevance since poor metabolizers will not receive therapeutic benefit from pro-drugs such as codeine,[34] while being placed at risk for ADRs from other medications (for example, tricyclic antidepressants).[35] Although >50% of pharmacogenes possessed high-confidence LOF variants, the majority of genes were only affected by rare LOF variation. 
Sequencing should therefore be considered the best strategy to capture the variation in drug response phenotypes. Finally, the true number of LOF variants is also likely to be higher due to our stringent annotation strategy and because the 1000GP did not report singleton indels,[6] a type of variation likely to cause frameshift mutations.

Limitations of current sequencing technologies

This study highlighted pharmacogenes that could be problematic with regards to short-read technologies (Supplementary Figure 8). Our results reiterate the difficulties associated with analyzing the CYP genes with such technologies,[8, 36] although our criteria used to flag potential problematic genes could be overly strict for research purposes.[8] For clinical sequencing applications, however, variation in these pharmacogenes should be confirmed via alternative methods. The inadequacy of sequencing for highly complex HLA and UGT genes also needs to be addressed since this group represents many important clinical pharmacogenes. The UGT loci play a major role in phase II drug metabolism,[9] while the HLA region is important for drug hypersensitivity reactions.[37] 1000GP Phase 3 employed 76–101 bp paired-end sequencing and many of these limitations could be overcome with longer-read technologies. There have been attempts to address some of these issues through novel bioinformatic pipelines and individuals that have been genotyped with numerous platforms.[38, 39, 40] Despite these limitations, the overall concordance between the 1000GP and external data with regards to allele frequency patterns was strong (Supplementary Figure 9). Reference transcripts can have a substantial influence on the annotation of variants, with LOF variants being particularly difficult to assess.[41, 42] For example, the important PM allele, CYP2C19*2, was not annotated as a high-confidence LOF variant in our analyses. It has also recently been shown that tools to infer pharmacogenomic alleles are currently inadequate when being used on current sequencing data and need to be improved.[43] An additional limitation of only assessing the exome is that non-coding regulatory variation, which is not captured with this approach, can have an important role in pharmacogenomic phenotypes,[44] highlighting one of the advantages of performing whole-genome sequencing. 
Finally, as this was beyond the scope of this study, a dedicated analysis of copy number of pharmacogenes is still required.

Conclusions

Sequencing technologies will continue to be used for pharmacogenomic applications in both research and clinical settings at an increasing rate. This study highlighted that sequencing remains the best way to capture rare variants, which, although individually rare, make up the bulk of the variation in pharmacogenes. To facilitate clinical uptake, it will be important to address the analysis burden associated with high-throughput sequencing data. Developing variant interpretation systems that include drug response prediction beyond well-characterized clinical factors will help achieve this goal. Rare variants will need to be considered in such approaches, a task that will be assisted by improvements in computational prediction. Sequencing is a globally inclusive technique, as genotypes are not restricted to a predetermined panel of variants. Our clinical analyses detected variants that were mainly relevant to anticancer agents and warfarin, suggesting a bias in the existing literature. Additional robust pharmacogenomic studies using globally representative cohorts are therefore essential. Further, once sequenced, a genome can be used throughout a patient’s lifetime and can provide a constant source of medically relevant information that can be used to achieve a balance between mitigating ADRs and achieving drug efficacy.
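The rare-variant definition used in this study (global MAF <0.5%, with singletons as the subset observed once) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, the allele counts in the usage note are invented toy values, and the sketch assumes biallelic autosomal sites with complete genotypes for all 2504 individuals.

```python
N_INDIVIDUALS = 2504                 # 1000GP Phase 3 sample size
N_CHROMOSOMES = 2 * N_INDIVIDUALS    # assumption: diploid autosomal sites
RARE_MAF_THRESHOLD = 0.005           # global MAF < 0.5% counts as rare


def minor_allele_count(alt_count, n_chromosomes=N_CHROMOSOMES):
    """Count of the less frequent allele at a biallelic site."""
    if not 0 < alt_count < n_chromosomes:
        raise ValueError("site must be polymorphic in the sample")
    return min(alt_count, n_chromosomes - alt_count)


def is_singleton(alt_count, n_chromosomes=N_CHROMOSOMES):
    """True when the minor allele is seen on exactly one chromosome."""
    return minor_allele_count(alt_count, n_chromosomes) == 1


def is_rare(alt_count, n_chromosomes=N_CHROMOSOMES):
    """True when the minor allele frequency is below the rare threshold."""
    maf = minor_allele_count(alt_count, n_chromosomes) / n_chromosomes
    return maf < RARE_MAF_THRESHOLD


def summarize(alt_counts, n_chromosomes=N_CHROMOSOMES):
    """Tally rare variants and singletons; singletons are also rare."""
    alt_counts = list(alt_counts)
    return {
        "n_variants": len(alt_counts),
        "n_rare": sum(is_rare(c, n_chromosomes) for c in alt_counts),
        "n_singletons": sum(is_singleton(c, n_chromosomes) for c in alt_counts),
    }
```

For example, `summarize([1, 1, 3, 25, 26, 2500, 5007])` tallies a toy set of alternate-allele counts; with 5008 chromosomes, a minor allele count of 25 falls just below the 0.5% threshold and 26 falls just above it.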

42 in total

1. Pharmacogenomics, ancestry and clinical decision making for global populations.

Authors: E Ramos; A Doumatey; A G Elkahloun; D Shriner; H Huang; G Chen; J Zhou; H McLeod; A Adeyemo; C N Rotimi
Journal: Pharmacogenomics J Date: 2013-07-09 Impact factor: 3.550

2. OCT1 is a high-capacity thiamine transporter that regulates hepatic steatosis and is a target of metformin.

Authors: Ligong Chen; Yan Shu; Xiaomin Liang; Eugene C Chen; Sook Wah Yee; Arik A Zur; Shuanglian Li; Lu Xu; Kayvan R Keshari; Michael J Lin; Huan-Chieh Chien; Youcai Zhang; Kari M Morrissey; Jason Liu; Jonathan Ostrem; Noah S Younger; John Kurhanewicz; Kevan M Shokat; Kaveh Ashrafi; Kathleen M Giacomini
Journal: Proc Natl Acad Sci U S A Date: 2014-06-24 Impact factor: 11.205

3. Pharmacogenetic variation at CYP2C9, CYP2C19, and CYP2D6 at global and microgeographic scales.

Authors: Johanna Sistonen; Silvia Fuselli; Jukka U Palo; Neelam Chauhan; Harish Padh; Antti Sajantila
Journal: Pharmacogenet Genomics Date: 2009-02 Impact factor: 2.089

Review 4. Pharmacogenetic Markers of Drug Efficacy and Toxicity.

Authors: V L M Yip; D B Hawcutt; M Pirmohamed
Journal: Clin Pharmacol Ther Date: 2015-06-03 Impact factor: 6.875

Review 5. Clinical Pharmacogenetics Implementation Consortium guidelines for cytochrome P450 2D6 genotype and codeine therapy: 2014 update.

Authors: K R Crews; A Gaedigk; H M Dunnenberger; J S Leeder; T E Klein; K E Caudle; C E Haidar; D D Shen; J T Callaghan; S Sadhasivam; C A Prows; E D Kharasch; T C Skaar
Journal: Clin Pharmacol Ther Date: 2014-01-23 Impact factor: 6.875

Review 6. New genetic findings lead the way to a better understanding of fundamental mechanisms of drug hypersensitivity.

Authors: Munir Pirmohamed; David A Ostrov; B Kevin Park
Journal: J Allergy Clin Immunol Date: 2015-08 Impact factor: 10.793

7. Cypiripi: exact genotyping of CYP2D6 using high-throughput sequencing data.

Authors: Ibrahim Numanagić; Salem Malikić; Victoria M Pratt; Todd C Skaar; David A Flockhart; S Cenk Sahinalp
Journal: Bioinformatics Date: 2015-06-15 Impact factor: 6.937

Review 8. The emerging era of pharmacogenomics: current successes, future potential, and challenges.

Authors: J W Lee; F Aminkeng; A P Bhavsar; K Shaw; B C Carleton; M R Hayden; C J D Ross
Journal: Clin Genet Date: 2014-05-09 Impact factor: 4.438

9. Analysis of protein-coding genetic variation in 60,706 humans.

Authors: Monkol Lek; Konrad J Karczewski; Eric V Minikel; Kaitlin E Samocha; Eric Banks; Timothy Fennell; Anne H O'Donnell-Luria; James S Ware; Andrew J Hill; Beryl B Cummings; Taru Tukiainen; Daniel P Birnbaum; Jack A Kosmicki; Laramie E Duncan; Karol Estrada; Fengmei Zhao; James Zou; Emma Pierce-Hoffman; Joanne Berghout; David N Cooper; Nicole Deflaux; Mark DePristo; Ron Do; Jason Flannick; Menachem Fromer; Laura Gauthier; Jackie Goldstein; Namrata Gupta; Daniel Howrigan; Adam Kiezun; Mitja I Kurki; Ami Levy Moonshine; Pradeep Natarajan; Lorena Orozco; Gina M Peloso; Ryan Poplin; Manuel A Rivas; Valentin Ruano-Rubio; Samuel A Rose; Douglas M Ruderfer; Khalid Shakir; Peter D Stenson; Christine Stevens; Brett P Thomas; Grace Tiao; Maria T Tusie-Luna; Ben Weisburd; Hong-Hee Won; Dongmei Yu; David M Altshuler; Diego Ardissino; Michael Boehnke; John Danesh; Stacey Donnelly; Roberto Elosua; Jose C Florez; Stacey B Gabriel; Gad Getz; Stephen J Glatt; Christina M Hultman; Sekar Kathiresan; Markku Laakso; Steven McCarroll; Mark I McCarthy; Dermot McGovern; Ruth McPherson; Benjamin M Neale; Aarno Palotie; Shaun M Purcell; Danish Saleheen; Jeremiah M Scharf; Pamela Sklar; Patrick F Sullivan; Jaakko Tuomilehto; Ming T Tsuang; Hugh C Watkins; James G Wilson; Mark J Daly; Daniel G MacArthur
Journal: Nature Date: 2016-08-18 Impact factor: 49.962

10. Distribution and medical impact of loss-of-function variants in the Finnish founder population.

Authors: Elaine T Lim; Peter Würtz; Aki S Havulinna; Priit Palta; Taru Tukiainen; Karola Rehnström; Tõnu Esko; Reedik Mägi; Michael Inouye; Tuuli Lappalainen; Yingleong Chan; Rany M Salem; Monkol Lek; Jason Flannick; Xueling Sim; Alisa Manning; Claes Ladenvall; Suzannah Bumpstead; Eija Hämäläinen; Kristiina Aalto; Mikael Maksimow; Marko Salmi; Stefan Blankenberg; Diego Ardissino; Svati Shah; Benjamin Horne; Ruth McPherson; Gerald K Hovingh; Muredach P Reilly; Hugh Watkins; Anuj Goel; Martin Farrall; Domenico Girelli; Alex P Reiner; Nathan O Stitziel; Sekar Kathiresan; Stacey Gabriel; Jeffrey C Barrett; Terho Lehtimäki; Markku Laakso; Leif Groop; Jaakko Kaprio; Markus Perola; Mark I McCarthy; Michael Boehnke; David M Altshuler; Cecilia M Lindgren; Joel N Hirschhorn; Andres Metspalu; Nelson B Freimer; Tanja Zeller; Sirpa Jalkanen; Seppo Koskinen; Olli Raitakari; Richard Durbin; Daniel G MacArthur; Veikko Salomaa; Samuli Ripatti; Mark J Daly; Aarno Palotie
Journal: PLoS Genet Date: 2014-07-31 Impact factor: 5.917
