Frederick E Dewey, Rong Chen, Sergio P Cordero, Kelly E Ormond, Colleen Caleshu, Konrad J Karczewski, Michelle Whirl-Carrillo, Matthew T Wheeler, Joel T Dudley, Jake K Byrnes, Omar E Cornejo, Joshua W Knowles, Mark Woon, Katrin Sangkuhl, Li Gong, Caroline F Thorn, Joan M Hebert, Emidio Capriotti, Sean P David, Aleksandra Pavlovic, Anne West, Joseph V Thakuria, Madeleine P Ball, Alexander W Zaranek, Heidi L Rehm, George M Church, John S West, Carlos D Bustamante, Michael Snyder, Russ B Altman, Teri E Klein, Atul J Butte, Euan A Ashley.
Abstract
Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (< 1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.
Year: 2011 PMID: 21935354 PMCID: PMC3174201 DOI: 10.1371/journal.pgen.1002280
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Pedigree and genetic risk prediction workflow.
A, Family pedigree with known medical history. The displayed ages represent the age of death for deceased subjects or the age at the time of medical history collection (9/2010) for living family members. Arrows denote sequenced family members. Abbreviations: AD, Alzheimer's disease; CABG, coronary artery bypass graft surgery; CHF, congestive heart failure; CVA, cerebrovascular accident; DM, diabetes mellitus; DVT, deep venous thrombosis; GERD, gastroesophageal reflux disease; HTN, hypertension; IDDM, insulin-dependent diabetes mellitus; MI, myocardial infarction; SAB, spontaneous abortion; SCD, sudden cardiac death. B, Workflow for phased genetic risk evaluation using whole genome sequencing.
Figure 2. Development of major allele reference sequences.
Allele frequencies from the low-coverage whole-genome sequencing pilot of the 1000 Genomes Project were used to estimate the major allele for each of the three main HapMap populations. The major allele was substituted for the reference base of NCBI reference sequence 37.1 at every position at which the two differed, resulting in approximately 1.6 million single-nucleotide substitutions in the reference sequence. A, Approximately half of these positions were shared between all three HapMap population groups, with the YRI population containing the greatest number of major alleles differing from the NCBI reference sequence. B, Number of disease-associated variants represented in the NCBI reference genome by the minor allele in each of the three HapMap populations. C, Number of positions per Mbp at which the major allele differed from the reference base by chromosome and HapMap population.
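The substitution step described in this caption can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the function name `build_major_allele_reference`, the 0-based position keys, and the toy sequence and frequencies are all assumptions made for the example.

```python
from typing import Dict, Tuple


def build_major_allele_reference(
    reference: str,
    allele_freqs: Dict[int, Dict[str, float]],
) -> Tuple[str, int]:
    """Return (new_reference, n_substitutions).

    reference: reference sequence for one contig.
    allele_freqs: hypothetical map from 0-based position to
        {allele: population frequency} for single-nucleotide alleles.
    """
    bases = list(reference)
    substitutions = 0
    for pos, freqs in allele_freqs.items():
        # The major allele is the most frequent allele in the population.
        major = max(freqs, key=freqs.get)
        # Substitute only where the reference carries a non-major allele.
        if bases[pos] != major:
            bases[pos] = major
            substitutions += 1
    return "".join(bases), substitutions


# Toy data (not from the study).
toy_reference = "ACGTAC"
toy_freqs = {
    2: {"G": 0.2, "T": 0.8},  # reference G is the minor allele here
    4: {"A": 0.9, "C": 0.1},  # reference A is already the major allele
}
new_reference, n_sub = build_major_allele_reference(toy_reference, toy_freqs)
```

Running the same function once per population frequency table would give one synthetic reference per population, which is the sense in which the caption speaks of major allele reference sequences in the plural.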
Figure 3. Inheritance state analysis, error estimation, and phasing.
A, A Hidden Markov Model (HMM) was used to infer one of four Mendelian and two non-Mendelian inheritance states for each allele assortment at variant positions across the quartet. “MIE-rich” refers to Mendelian-inheritance error (MIE) rich regions. “Compression” refers to genotype errors from heterozygous structural variation in the reference or study subjects, manifesting as a high proportion of uniformly heterozygous positions across the quartet. B, A combination of quality score calibration using orthogonal genotyping technology and filtering SNVs in error-prone regions (MIE-rich and compression regions) identified by the HMM resulted in >90% reduction in the genotype error rate estimated by the MIE rate. C, Consistent with PRDM9 allelic status, approximately half of all recombinations in each parent occurred in hotspots. The mother has two haplotypes in the gene RNF212 associated with low recombination rates, while the father has one haplotype each associated with high and low recombination rates. Notation denotes base at [rs3796619, rs1670533]. D, Variant phasing using pedigree, inheritance state, and population linkage disequilibrium data. Pedigree data were first used to phase informative allele assortments in trios (top). The inheritance state of neighboring regions was used to phase positions in which all members of a mother-father-child trio were heterozygous and the sibling was homozygous for the reference or non-reference allele (middle). For uniformly heterozygous positions, we phased the non-reference allele using a maximum likelihood model to assign the non-reference allele to paternal or maternal chromosomes based on population linkage disequilibrium with phased SNVs within 250 kbp (bottom). In all panels a corresponds to the reference allele and b to the non-reference allele.
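The Mendelian-inheritance check and the first, pedigree-based phasing step of panel D can be illustrated with a small Python sketch. It is not the HMM used in the study: it examines one position of one trio at a time, using the caption's a/b notation. The function names and the encoding of genotypes as two-character strings are assumptions made for the example.

```python
from itertools import product
from typing import Optional, Tuple


def is_mendelian_consistent(father: str, mother: str, child: str) -> bool:
    """True if the child's unordered genotype can arise from the parents."""
    possible = {"".join(sorted(p + m)) for p, m in product(father, mother)}
    return "".join(sorted(child)) in possible


def phase_child(father: str, mother: str, child: str) -> Optional[Tuple[str, str]]:
    """Return (paternal_allele, maternal_allele) for the child, or None.

    None means the position is a Mendelian inheritance error, or it is
    not informative from the trio alone (for example, all three members
    are heterozygous).
    """
    if not is_mendelian_consistent(father, mother, child):
        return None
    # Distinct (paternal, maternal) transmissions that reproduce the child.
    options = {
        (p, m)
        for p, m in product(father, mother)
        if sorted(p + m) == sorted(child)
    }
    # Phase is resolved only when exactly one transmission is possible.
    return next(iter(options)) if len(options) == 1 else None
```

For instance, `phase_child("aa", "ab", "ab")` resolves the child's b allele to the maternal chromosome, whereas a uniformly heterozygous trio returns `None`; those are the positions the caption hands on to the inheritance-state and linkage-disequilibrium steps.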
Putative loss of function variants across the family quartet.
| Variant type | All variants: HG19 reference (n = 4302405) | All variants: CEU reference (n = 3733299) | All rare/novel: HG19 reference (n = 351555) | All rare/novel: CEU reference (n = 354074) | Rare/novel and OMIM-disease associated gene: HG19 reference | Rare/novel and OMIM-disease associated gene: CEU reference |
| Coding-missense | 9468 | 7982 | 1276 | 1276 | 203 | 200 |
| Coding-nonsense | 52 | 50 | 13 | 13 | 1 | 1 |
| Coding-synonymous | 11663 | 9928 | 1061 | 1059 | 186 | 186 |
| Intronic | 1303341 | 1128283 | 116276 | 115397 | 19544 | 19766 |
| Splice-5′ | 156 | 147 | 16 | 16 | 0 | 0 |
| Splice-3′ | 98 | 96 | 9 | 9 | 1 | 1 |
| UTR-5′ | 40142 | 37794 | 3637 | 3619 | 510 | 516 |
| UTR-3′ | 61826 | 59396 | 5989 | 5953 | 848 | 857 |
| miRNA target | 0 | 0 | 0 | 0 | 0 | 0 |
| Pri-miRNA | 2 | 2 | 1 | 1 | 0 | 0 |
| Mature miRNA | 0 | 0 | 0 | 0 | 0 | 0 |
| Coding indels | 1519 | 1476 | 432 | 412 | 73 | 71 |
| Coding frameshift indels | 440 | 418 | 273 | 253 | 29 | 27 |
Abbreviations: CEU reference, variant calls against CEU major allele reference; HG19 reference, variant calls against NCBI reference sequence 37.1; miRNA, microRNA; Pri-miRNA, primary microRNA transcript; OMIM, Online Mendelian Inheritance in Man database; UTR, untranslated region.
Rare variants with known clinical associations.
| Chromosome | Gene | rsid | Affected family members | Disease | Inheritance | Onset-earliest | Onset-median | Severity | Actionability | Lifetime risk | Variant pathogenicity |
| 12 | | rs61750615 | M, S, D | Von Willebrand disease | Incomplete dominant | 1 | 1 | 5 | 5 | variable | 7 |
| 10 | | rs7080536 | M, S, D | Carotid stenosis, thrombophilia | AD | 4 | 4 | 1 | 5 | variable | 7 |
| 19 | | rs79389353 | M, D | Cystinuria – kidney stones | AR | 1 | 1 | 3 | 5 | 7 | 7 |
| 1 | | rs6025 | F, D | Thrombophilia | Incomplete dominant | 4 | 4 | 4 | 5 | 2 | 7 |
| 1 | | rs1801133 | F, D | Hyperhomocysteinemia | AR | 1 | 1 | 1 | 6 | 2 | 7 |
Key: Father, mother, son, daughter = F, M, S, D. Abbreviations: AD, autosomal dominant; AR, autosomal recessive. Variants were scored according to disease phenotype features and variant pathogenicity as outlined in Table S4.
Figure 4. Ancestry and immunogenotyping using phased variant data.
A, Ancestry analysis of maternal and paternal origins based on principal components analysis of SNP genotypes intersected with the Population Reference Sample dataset. B, The HMM identified a recombination spanning the HLA-B locus and facilitated resolution of haplotype phase at HLA loci. Contig colors in the lower panel correspond to the inheritance state as depicted in Figure 3A. C, Common HLA types for family quartet based on phased sequence data.
Figure 5. Common variant risk prediction.
A, Common variant risk prediction for 28 disease states for each of the family members (f, father; m, mother; s, son; d, daughter) and 174 ethnicity-matched HapMap subjects. The x-axis in each plot represents the log10(likelihood ratio) for each disease according to allelic distribution of SNPs identified in the literature as significantly associated with disease by 2 or more studies including 2000 or more total subjects. B, Upper left: pre (base) and post (bar end) estimates of disease risk for the father according to common variant risk prediction, derived from the pre-probability of disease multiplied by the composite likelihood ratio from all SNPs meeting the criteria described above. Upper right: Composite likelihood ratio estimates for disease risk according to common genetic variation. Blue bars represent paternal estimate, pink bars represent maternal estimate, red points represent the estimate for the daughter, and blue points represent the estimate for the son. Lower panels: parental haplotype contribution to disease risk for each child (points) for the daughter (lower left) and son (lower right). Blue shading represents paternal haplotype risk allele contribution and pink shading represents maternal haplotype risk allele contribution.
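The pre-to-post risk update described in panel B can be sketched as follows. The caption does not spell out the exact arithmetic, so this example makes two assumptions: per-SNP likelihood ratios are combined by multiplication (which treats the SNPs as independent), and the update is applied on the odds scale, as in the usual form of Bayes' rule. The function names and the toy values are hypothetical.

```python
import math
from typing import Iterable


def composite_log10_lr(likelihood_ratios: Iterable[float]) -> float:
    """Sum of log10 per-SNP likelihood ratios (assumes independent SNPs)."""
    return sum(math.log10(lr) for lr in likelihood_ratios)


def post_test_probability(pre_test_probability: float, log10_lr: float) -> float:
    """Update a pre-test probability with a composite log10 likelihood ratio."""
    # Work on the odds scale so the result always stays between 0 and 1.
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * 10.0 ** log10_lr
    return post_odds / (1.0 + post_odds)


# Toy values only; these are not estimates from the study.
toy_lrs = [1.2, 0.9, 1.5]
log_lr = composite_log10_lr(toy_lrs)
risk = post_test_probability(0.10, log_lr)
```

Keeping the composite on the log10 scale matches the x-axis of panel A, and a value of 0 leaves the pre-test probability unchanged.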
Drug metabolizing enzyme variants.
| Drug Metabolizing Enzyme | Drugs Metabolized | Father: Genotype | Father: Phenotype | Mother: Genotype | Mother: Phenotype | Sister: Genotype | Sister: Phenotype | Brother: Genotype | Brother: Phenotype |
| CYP2C9 | warfarin, NSAIDs (naproxen, ibuprofen, celecoxib, etc.), sulfonylureas (glimepiride, glipizide, etc.), fluvastatin | | normal metabolizer | | normal metabolizer | | normal metabolizer | | normal metabolizer |
| CYP2C19 | clopidogrel, proton pump inhibitors (omeprazole, pantoprazole, etc.), citalopram | | Undetermined | | ultra metabolizer | | ultra metabolizer | | Undetermined |
| CYP2D6 | codeine, metoprolol, tamoxifen, fluoxetine | | intermediate metabolizer | | normal metabolizer | | intermediate metabolizer | | intermediate metabolizer |
*: CYP2C9 genotypes checked and ruled out: *2, *3, *5, *8, *9, *10, *11, *12, *18; absence of these alleles defaults to *1.
†: CYP2C19 genotypes based on single defining SNPs for the *17 and *2 alleles; all other alleles ruled out by default.
‡: The in vivo phenotype for the combination of an increased activity allele and a loss-of-function allele for CYP2C19 is not well studied to date. According to Scott et al [53], one paper has reported intermediate activity for this allele combination with respect to clopidogrel, but the study was not replicated and therefore the phenotype is considered provisional. The actual phenotype associated with this combination may vary depending upon other factors such as the medication(s) the patient is taking, as well as other inducers and inhibitors of CYP2C19.
§: CYP2D6 genotypes checked: *2, *4, *5, *10, *15, *8, *11, *12, *14, *17, *19, *20, *29, *31, *35, *40, *41, *69; absence of these alleles defaults to *1.
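The "absence of these alleles defaults to *1" rule in the footnotes above can be sketched in Python. This shows the defaulting logic only: real star-allele calling depends on phased, multi-variant allele definitions. The function name `call_diplotype` and its input format are hypothetical, and the allele panel is the CYP2C9 list from the footnote.

```python
from typing import List, Tuple

# CYP2C9 alleles listed as checked in the table footnote.
CYP2C9_CHECKED = ["*2", "*3", "*5", "*8", "*9", "*10", "*11", "*12", "*18"]


def call_diplotype(detected: List[str], checked: List[str]) -> Tuple[str, ...]:
    """Return a two-allele diplotype, filling undetected copies with *1.

    detected: star alleles observed in the sample, one entry per
        chromosome copy (so at most two entries).
    checked: the panel of star alleles that was actually tested.
    """
    unknown = [allele for allele in detected if allele not in checked]
    if unknown:
        raise ValueError(f"alleles not in the checked panel: {unknown}")
    if len(detected) > 2:
        raise ValueError("a diploid genotype has at most two alleles")
    # Any copy without a detected variant allele defaults to *1.
    alleles = list(detected) + ["*1"] * (2 - len(detected))
    # Sort numerically so that *2 comes before *10.
    return tuple(sorted(alleles, key=lambda allele: int(allele[1:])))
```

With this rule, `call_diplotype([], CYP2C9_CHECKED)` yields `("*1", "*1")`. Note that a `*1` call means only that none of the tested alleles was found, not that the gene is free of variation.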
Genetic pharmacological response predictions.
| SNP location | Drug(s) | Drug(s) more likely to work | Drug(s) less likely to work | Drug(s) more likely to cause side effect | Drug(s) less likely to cause side effect | Drug dose(s) above average | Drug dose(s) below average | Drug dose(s) average | No PGx action/ phenotype unknown | Confidence level |
| rs9934438 | warfarin | F, M, S, D | High | |||||||
| rs1954787 | citalopram | F, M, D | S | High | ||||||
| rs776746 | cyclosporine | F, M, S, D | High | |||||||
| rs1800460 | thiopurines | High | ||||||||
| rs2108622 | warfarin | F, M, S, D | Medium | |||||||
| rs4680 | morphine | F, M, S, D | Medium | |||||||
| rs5443 | statins | F, M | S, D | Medium | ||||||
| rs4253778 | beta blocking agents | D | F, M, S | Medium | ||||||
| rs622342 | metformin | M, S | F, D | Medium | ||||||
| rs7569963 | citalopram | S | F | M, D | Medium | |||||
| rs8012552 | ACE inhibitors | F, M, S, D | Low | |||||||
| rs11209716 | ACE inhibitors | F, S, D | M | Low |
Key: Father, mother, son, daughter = F, M, S, D. Abbreviations: ACE, angiotensin converting enzyme; PGx, pharmacogenomic. Family members' genotypes are compared to other possible genotypes; this is not a population-based statistic.