Greyson P Twist1,2, Andrea Gaedigk3,4,5, Neil A Miller1, Emily G Farrow1,5, Laurel K Willig1,4,5, Darrell L Dinwiddie6, Josh E Petrikin1,4,5, Sarah E Soden1,4,5, Suzanne Herd1, Margaret Gibson1, Julie A Cakici1, Amanda K Riffel3, J Steven Leeder1,3,4,5, Deendayal Dinakarpandian2,5, Stephen F Kingsmore1,4,5,7.
Abstract
An important component of precision medicine, the use of whole-genome sequencing (WGS) to guide lifelong healthcare, is electronic decision support to inform drug choice and dosing. To achieve this, automated identification of genetic variation in genes involved in drug absorption, distribution, metabolism, excretion and response (ADMER) is required. CYP2D6 is a major enzyme for drug bioactivation and elimination. CYP2D6 activity is predominantly governed by genetic variation; however, CYP2D6 is technically arduous to haplotype. Not only is the nucleotide sequence of CYP2D6 highly polymorphic, but the locus also features diverse structural variations, including gene deletion, duplication and multiplication events, and rearrangements with the neighbouring, nonfunctional CYP2D7 and CYP2D8 genes. We developed Constellation, a probabilistic scoring system that enables automated ascertainment of CYP2D6 activity scores from 2×100 paired-end WGS. The consensus reference method comprised TaqMan genotyping assays, quantitative copy-number variation determination and Sanger sequencing. Compared with the consensus reference, Constellation had an analytic sensitivity of 97% (59 of 61 diplotypes) and an analytic specificity of 95% (116 of 122 haplotypes). All extreme phenotypes, i.e., poor and ultrarapid metabolisers, were accurately identified by Constellation. Constellation is anticipated to be extensible to functional variation in all ADMER genes, and to be performable at marginal incremental financial and computational cost in the setting of diagnostic WGS.
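As a quick arithmetic check, the two headline percentages follow directly from the counts quoted in the abstract. The sketch below only reproduces those ratios with rounding to whole percent; the helper name `percent` is hypothetical, and it makes no claim about how the authors defined or computed sensitivity and specificity beyond the counts they report.

```python
def percent(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a whole-number percentage (rounded)."""
    return round(100 * numerator / denominator)


# Counts as reported in the abstract (Constellation vs the consensus reference).
sensitivity = percent(59, 61)    # 59 of 61 diplotypes
specificity = percent(116, 122)  # 116 of 122 haplotypes

print(f"analytic sensitivity: {sensitivity}%")  # analytic sensitivity: 97%
print(f"analytic specificity: {specificity}%")  # analytic specificity: 95%
```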
Year: 2016 PMID: 29263805 PMCID: PMC5685293 DOI: 10.1038/npjgenmed.2015.7
Source DB: PubMed Journal: NPJ Genom Med ISSN: 2056-7944 Impact factor: 8.617
Figure 1. Graphical overview of the highly polymorphic CYP2D6/2D7/2D8 locus. (a) The reference Chr 22 locus comprising the CYP2D6*1 haplotype (white) and two non-functional paralogs, CYP2D7 (red) and CYP2D8 (grey). Note that the locus is on the minus strand and is shown in reverse. REP6 and REP7 are paralogous, Alu-containing, 600-bp repetitive segments found downstream of CYP2D6 and CYP2D7, respectively. The blue boxes indicate identical unique sequences downstream of CYP2D6 and CYP2D7; downstream of CYP2D7, this sequence is separated from REP7 by 1.6 kb. (b) Three CYP2D6 haplotypes: CYP2D6*2, CYP2D6*10 and CYP2D6*4. The CYP2D6 activity conveyed by these haplotypes is indicated by colour-coded boxes (red, non-functional variant; orange, decreased activity; green, fully functional reference activity; blue, increased activity). (c) The most common CYP2D6 copy-number variations. CYP2D6*5 is characterised by a deletion of the entire CYP2D6 gene and fusion of REP6 and REP7 (REP-del). Duplication haplotypes have two or more CYP2D6 copies, as exemplified by CYP2D6*2x2 (ultrarapid metaboliser) and CYP2D6*4x2 (non-functional). Copy-number variants with three or more copies are less common. (d) Hybrid genes composed of CYP2D7 and CYP2D6 fusion products that result from unequal recombination. A number of hybrid genes with a variety of switch regions have been described and are consolidated as the CYP2D6*13 haplotype. (e) Four tandem arrangements featuring two or more non-identical copies of CYP2D6.
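Figure 1 shows how deletions, duplications and tandem arrangements change the number of CYP2D6 gene copies on a chromosome, which is what the CNV assay in the summary table determines. The minimal sketch below counts gene copies per diplotype, assuming common star-allele notation: '*5' for the whole-gene deletion, 'xN' for N identical copies, and '+' joining non-identical tandem copies. The function names are hypothetical, the tandem example '*36+*10' is illustrative rather than taken from this study, and CYP2D7/2D6 hybrids (Figure 1d) are not modelled; they would simply be counted as ordinary copies.

```python
import re

# Hypothetical parser for one tandem unit, e.g. "*2x2" -> ("*2", "2"), "*4" -> ("*4", None).
_UNIT = re.compile(r"^(\*\d+[A-Z]?)(?:x(\d+))?$")


def gene_copies(haplotype: str) -> int:
    """Count CYP2D6 gene copies on one chromosome from star-allele notation.

    Assumptions (illustrative only): '*5' is the whole-gene deletion (0 copies),
    'xN' marks N identical copies, '+' joins non-identical tandem copies.
    Hybrid genes such as *13 are NOT treated specially.
    """
    total = 0
    for unit in haplotype.split("+"):
        m = _UNIT.match(unit.strip())
        if m is None:
            raise ValueError(f"unrecognised haplotype notation: {unit!r}")
        allele, n = m.group(1), int(m.group(2) or 1)
        total += 0 if allele == "*5" else n
    return total


def diplotype_copies(diplotype: str) -> int:
    """Total CYP2D6 copies for a diplotype written as 'hapA/hapB'."""
    hap_a, hap_b = diplotype.split("/")
    return gene_copies(hap_a) + gene_copies(hap_b)


print(diplotype_copies("*1/*5"))       # 1
print(diplotype_copies("*2x2/*4"))     # 3
print(diplotype_copies("*36+*10/*1"))  # 3
```

Parsing each '+'-separated unit separately keeps duplications ('x2') and non-identical tandems on the same footing, which is convenient when comparing against a per-subject copy number.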
Figure 2. In silico modelling of the uniqueness of alignments of simulated short-read sequences to the region of chromosome 22 containing CYP2D6, CYP2D7 and CYP2D8 (hg19, chr22:42,518,000-42,555,000). Simulated singleton reads (a) and paired-end reads (b), 50 to 5,000 nt in length, were generated from this region. For paired-end reads, insert lengths varied from 300 to 800 nt. Exons, introns and genomic features to which reads mapped uniquely with GSNAP are shown as green ‘1’; regions to which reads did not map uniquely are shown as red ‘0’.
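The idea behind Figure 2 can be illustrated with a toy model. The sketch below is not GSNAP: it is a simplified exact-match stand-in that marks a read (or read pair) as unique, 1, if its sequence can be placed at exactly one position and strand within the given region, and 0 otherwise, with no mismatches, indels or genome-wide search. All function names are hypothetical, and treating 'insert' as the total fragment length is an assumption. The toy region contains a short repeat twice, showing how pairing reads can rescue positions that single reads cannot resolve.

```python
from __future__ import annotations

from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq[::-1].translate(_COMPLEMENT)


def unique_single_reads(region: str, read_len: int) -> list[int]:
    """1/0 per start position: does the read of length read_len map uniquely?"""
    reads = [region[i:i + read_len] for i in range(len(region) - read_len + 1)]
    fwd = Counter(reads)

    def placements(read: str) -> int:
        rc = revcomp(read)
        # A reverse-complement palindrome sits at the same locus on both strands.
        return fwd[read] + (fwd[rc] if rc != read else 0)

    return [1 if placements(r) == 1 else 0 for r in reads]


def unique_pairs(region: str, read_len: int, insert: int) -> list[int]:
    """1/0 per fragment start: does the read pair map uniquely?

    'insert' is taken here to be the total fragment length (an assumption).
    """
    pairs = [
        (region[i:i + read_len], region[i + insert - read_len:i + insert])
        for i in range(len(region) - insert + 1)
    ]
    fwd = Counter(pairs)

    def placements(pair: tuple[str, str]) -> int:
        # The same fragment read from the opposite strand.
        rc = (revcomp(pair[1]), revcomp(pair[0]))
        return fwd[pair] + (fwd[rc] if rc != pair else 0)

    return [1 if placements(p) == 1 else 0 for p in pairs]


# Toy region in which the 4-nt segment "AGGG" occurs twice.
region = "AGGGCAGGGT"
print(unique_single_reads(region, 3))  # [0, 0, 1, 1, 1, 0, 0, 1]
print(unique_pairs(region, 3, 6))      # [1, 1, 1, 1, 1]
```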
Figure 3. Flow diagram of the assignment of CYP2D6 phenotype inferred by WGS and Constellation. Whole-genome sequence data are mapped to the reference human genome, and CYP2D6 diplotypes are called by Constellation. The predicted phenotype is determined by assigning an ‘activity score’ based on the individual diplotype.[8–10]
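The last step of Figure 3, from diplotype to activity score to phenotype, can be sketched as follows. The per-haplotype activity values are assumptions chosen to match the qualitative classes in Figure 1b (fully functional, decreased, non-functional) and the whole-gene deletion in Figure 1c; the values actually used by the authors are given in their references [8–10]. The phenotype cut-offs are chosen to reproduce the score/phenotype pairs in the summary table (0 to PM, 0.5 to IM, 1 to 2 to EM, 3 to UM); how scores between those values would be binned is an assumption. The function and dictionary names are hypothetical.

```python
# Illustrative activity values per haplotype (ASSUMED for this sketch).
ALLELE_ACTIVITY = {
    "*1": 1.0,   # fully functional reference
    "*2": 1.0,   # fully functional (Figure 1b)
    "*10": 0.5,  # decreased activity (Figure 1b)
    "*4": 0.0,   # non-functional (Figure 1b)
    "*5": 0.0,   # whole-gene deletion (Figure 1c)
}


def haplotype_score(haplotype: str) -> float:
    """Activity of one haplotype; an 'xN' suffix multiplies by the copy number."""
    allele, _, copies = haplotype.partition("x")
    return ALLELE_ACTIVITY[allele] * (int(copies) if copies else 1)


def activity_score(diplotype: str) -> float:
    """Sum of the two haplotype activities, e.g. '*1/*4' -> 1.0."""
    hap_a, hap_b = diplotype.split("/")
    return haplotype_score(hap_a) + haplotype_score(hap_b)


def phenotype(score: float) -> str:
    """Map an activity score to PM/IM/EM/UM using table-derived cut-offs."""
    if score == 0:
        return "PM"
    if score < 1:
        return "IM"
    if score <= 2:
        return "EM"
    return "UM"


for d in ["*1/*4", "*10/*4", "*4/*5", "*1/*2x2"]:
    s = activity_score(d)
    print(d, s, phenotype(s))
# *1/*4 1.0 EM
# *10/*4 0.5 IM
# *4/*5 0.0 PM
# *1/*2x2 3.0 UM
```

Because duplications simply multiply a haplotype's activity, a *2x2 haplotype combined with a functional allele lands above the EM range, consistent with the ultrarapid label for CYP2D6*2x2 in Figure 1c.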
Summary of diplotype and activity score assignments and phenotype predictions for different methods, the consensus reference and Constellation
| Subject | Relatedness | Ethnicity | TaqMan | CNV | Sanger | Consensus reference | Constellation | AS (consensus reference) | AS (Constellation) | Predicted phenotype |
|---|---|---|---|---|---|---|---|---|---|---|
| CMH 064 | No | C | 1 | | | | | 1 | 1 | EM |
| CMH 076 | No | AA | | | | | | 2 | 2 | EM |
| CMH 172 | No | Mex | | | | | | 2 | 2 | EM |
| UDT 002 | No | n/a | | | | | | 0 | 0 | PM |
| UDT 173 | No | n/a | | | | | | 1 | 1 | EM |
| CMH 557 | No | C | | | | | | 2 | 2 | EM |
| CMH 563 | No | C | | | | | | 2 | 2 | EM |
| CMH 010 | No | C | | | | | | 1.5 | 1.5 | EM |
| CMH 154 | No | C | | | | | | 1.5 | 1.5 | EM |
| CMH 487 | No | C | | | | | | 2 | 2 | EM |
| CMH 545 | No | C | | | | | | 1 | 1 | EM |
| CMH 589 | No | C | | | | | | 0 | 0 | PM |
| CMH 663 | No | C | | | | | | 0.5 | 0.5 | IM |
| CMH 677 | No | C | | | | | | 0 | 0 | PM |
| CMH 731 | No | C | | | | | | | | IM |
| NA07019 | No | C | | | | | | 1 | 1 | EM |
| NA12753 | No | C | | | | | | 1 | 1 | EM |
| NA19685 | No | Mex | | | | | | 3 | 3 | UM |
| NA18507 | No | Yoruban | | | | | | | | EM |
| CMH 186 | M | Mex | | | | | | 1 | 1 | EM |
| CMH 202 | F | Mex | | | | | | 1 | 1 | EM |
| CMH 184 | Ch-1 | Mex | | | | | | 1 | 1 | EM |
| CMH 185 | Ch-2 | Mex | | | | | | 0 | 0 | PM |
| CMH 224 | M | n/a | | | | | | 0.5 | 0.5 | IM |
| CMH 222 | Ch-1 | n/a | | | | | | 0.5 | 0.5 | IM |
| CMH 223 | Ch-2 | n/a | | | | | | 1.5 | 1.5 | EM |
| CMH 248 | M | C | | | | | | 1.5 | 1.5 | EM |
| CMH 249 | F | C | | | | | | 1 | 1 | EM |
| CMH 446 | Ch-1 | C | | | | | | 2 | 2 | EM |
| CMH 447 | Ch-2 | C | | | | | | 1.5 | 1.5 | EM |
| CMH 397 | M | AA/AI | | | | | | 1.5 | 1.5 | EM |
| CMH 398 | F | AA/AI | | | | | | 1.5 | 1.5 | EM |
| CMH 396 | Ch | AA/AI | | | | | | 1.5 | 1.5 | EM |
| CMH 437 | M | AA | | | | | | 1.5 | 1.5 | EM |
| CMH 438 | F | AA | | | | | | 1.5 | 1.5 | EM |
| CMH 436 | Ch | AA | | | | | | 2 | 2 | EM |
| CMH 570 | M | C | | | | | | | | EM |
| CMH 571 | F | C | | | | | | 1 | 1 | EM |
| CMH 569 | Ch | C | | | | | | 1 | 1 | EM |
| CMH 579 | M | C | | | | | | 2 | 2 | EM |
| CMH 580 | F | C | | | | | | 1.5 | 1.5 | EM |
| CMH 578 | Ch | C | | | | | | 2 | 2 | EM |
| CMH 630 | M | n/a | | | | | | 2 | 2 | EM |
| CMH 631 | F | n/a | | | | | | Unknown | Unknown | Unknown |
| CMH 629 | Ch | MR | | | | | | 1.5 | 1.5 | EM |
| CMH 673 | M | C | | | | | | 2 | 1 | EM |
| CMH 674 | F | C | | | | | | 1 | 1 | EM |
| CMH 672 | Ch | C | | | | | | 1 | 1 | EM |
| CMH 681 | M | C | | | | | | 1 | 1 | EM |
| CMH 682 | F | C | | | | | | 2 | 2 | EM |
| CMH 680 | Ch | C | | | | | | 2 | 2 | EM |
| CMH 729 | M | C | | | | | | 1.5 | 1.5 | EM |
| CMH 730 | F | C | | | | | | 0.5 | 0.5 | IM |
| CMH 728 | Ch | C | | | | | | 1 | 1 | EM |
| CMH 679 | M | C | | | | | | 0 | 0 | PM |
| CMH 678 | Ch | C | | | | | | 1 | 1 | EM |
| CMH 719 | M | C | | | | | | 2 | 2 | EM |
| CMH 718 | Ch | C | | | | | | 2 | 2 | EM |
| NA12878 | M | Eur | | | | | | 0 | 0 | PM |
| NA12877 | F | Eur | | | | | | 0 | 0 | PM |
| NA12882 | Ch | Eur | | | | | | 0 | 0 | PM |
Abbreviations: AA, African American; AI, American Indian; C, Caucasian; Ch, child; Ch-1, child 1; Ch-2, child 2; CNV, copy-number variation; EM, extensive metaboliser phenotype; Eur, European ethnicity; F, father; IM, intermediate metaboliser phenotype; M, mother; Mex, Mexican; MR, mixed race; No, not related; PM, poor metaboliser phenotype; UM, ultrarapid metaboliser phenotype; WGS, whole-genome sequencing.
TaqMan refers to genotype analysis using a panel of genotyping assays (see Supplementary Table 1). CNV refers to quantitative multiplex PCR that determines CYP2D6 gene copy number (deletion, duplication, multiplication and gene hybrids). This assay was complemented by genotyping of XL-PCR amplicons generated from duplicated or hybrid gene copies (Supplementary Figure 1) or by sequencing. The number of gene copies is indicated, as is the presence of CYP2D6/CYP2D7 gene hybrids (6/7 hyb). Sanger refers to diplotype calls based on Sanger sequencing of a 6.6-kb XL-PCR product encompassing the CYP2D6 gene (Supplementary Figure 1). Consensus reference indicates calls derived from a combination of CNV, TaqMan and Sanger sequencing. Constellation refers to calls made by the Constellation software using .vcf files generated from WGS. Activity scores (AS) were assigned to the diplotypes called by the consensus reference and by Constellation. Calls that are inconsistent between the consensus reference and Constellation are bolded. Phenotype prediction is consistent between the consensus reference and Constellation calls, with the exception of three cases. (+) denotes that the subject was identified as having a duplication. [mac], multiple ambiguous calls causing a ‘no call’ result. #, novel subvariant(s) identified (see Supplementary Figure 3 for details); for brevity, this is annotated only in the column labelled ‘Sanger’. [*2], TaqMan genotype result for SNP rs16947 was not conclusive. Allele subtype assignments are not shown in this table but are provided for each individual in Supplementary Figure 3. Subjects with a CMH or UDT prefix are patient samples; those with an NA prefix were obtained from the Coriell Institute. Relatedness of subjects is as indicated. Coriell samples are annotated as European (Eur) in the Coriell database.
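The footnote notes that calls inconsistent between the consensus reference and Constellation are bolded, a marking lost in this copy of the table. A minimal sketch of flagging such rows at the activity-score level is shown below, using a few rows transcribed from the summary table with 'Unknown' mapped to `None`. The helper names are hypothetical, and the sketch compares activity scores only, not the underlying diplotypes, so it does not reproduce the authors' diplotype- or haplotype-level concordance counts.

```python
from typing import List, Optional, Tuple

# (subject, AS consensus reference, AS Constellation); None = score not available.
Row = Tuple[str, Optional[float], Optional[float]]


def discordant_scores(rows: List[Row]) -> List[str]:
    """Subjects whose two activity scores are both known but differ."""
    return [
        subject
        for subject, ref, con in rows
        if ref is not None and con is not None and ref != con
    ]


def comparable(rows: List[Row]) -> int:
    """Number of rows where both activity scores are known."""
    return sum(1 for _, ref, con in rows if ref is not None and con is not None)


# A few rows transcribed from the summary table ('Unknown' mapped to None).
rows: List[Row] = [
    ("CMH 064", 1.0, 1.0),
    ("NA19685", 3.0, 3.0),
    ("CMH 631", None, None),
    ("CMH 673", 2.0, 1.0),
]
print(discordant_scores(rows))  # ['CMH 673']
print(comparable(rows))         # 3
```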