Harendra Guturu1,2, Sandeep Chinchali1,3, Shoa L Clarke4, Gill Bejerano2,3,5.
Abstract
Although many human diseases have a genetic component involving many loci, the majority of studies are statistically underpowered to isolate the many contributing variants, raising the question of the existence of alternate processes to identify disease mutations. To address this question, we collect ancestral transcription factor binding sites disrupted by an individual's variants and then look for their most significant congregation next to a group of functionally related genes. Strikingly, when the method is applied to five different full human genomes, the top enriched function for each is invariably reflective of their very different medical histories. For example, our method implicates "abnormal cardiac output" for a patient with a longstanding family history of heart disease, "decreased circulating sodium level" for an individual with hypertension, and other biologically appealing links for medical histories spanning narcolepsy to axonal neuropathy. Our results suggest that erosion of gene regulation by mutation load significantly contributes to observed heritable phenotypes that manifest in the medical history. The test we developed exposes a hitherto hidden layer of personal variants that promise to shed new light on human disease penetrance, expressivity and the sensitivity with which we can detect them.
Year: 2016 PMID: 26845687 PMCID: PMC4742230 DOI: 10.1371/journal.pcbi.1004711
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. Schematic of conserved binding site eroding loci method.
(A) Method for inferring conserved binding site eroding loci (CoBELs) and hypothesizing functional consequences of erosions. (B) Conserved binding site eroding loci (CoBELs) are human reference transcription factor binding sites, conserved across multiple mammals, that are disrupted by a sequenced individual’s derived variant. Shown is a CoBEL upstream of ADRA1B contributing to the Quake genome “abnormal cardiac output” prediction in Table 1. (C) Conserved binding site eroding loci (CoBELs) are checked for enrichment of function and the functional phenotypes are matched to medical histories via literature survey. Each step is evaluated for statistical significance (see text).
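The core idea of panel B, a variant landing inside a conserved binding site, can be sketched as a simple interval lookup. This is only an illustration under stated assumptions: the function name `find_disrupted_sites`, the tuple layouts, and the coordinates are hypothetical, and treating any overlap as a disruption is a simplification rather than the authors' criterion (see the paper's text for the actual definition).

```python
from bisect import bisect_left


def find_disrupted_sites(sites, variants):
    """Return names of binding sites that contain at least one variant.

    sites:    list of (chrom, start, end, name), half-open [start, end)
    variants: list of (chrom, pos), 0-based positions
    Both layouts are hypothetical, chosen only for this sketch.
    """
    # Group variant positions by chromosome and sort them for binary search.
    by_chrom = {}
    for chrom, pos in variants:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    disrupted = []
    for chrom, start, end, name in sites:
        positions = by_chrom.get(chrom, [])
        # Index of the first variant at or after the site start.
        i = bisect_left(positions, start)
        # The site is hit if that variant lies before the site end.
        if i < len(positions) and positions[i] < end:
            disrupted.append(name)
    return disrupted


# Made-up coordinates, for illustration only.
example_sites = [
    ("chr5", 100, 112, "siteA"),
    ("chr5", 300, 310, "siteB"),
    ("chr8", 50, 60, "siteC"),
]
example_variants = [("chr5", 105), ("chr5", 310), ("chr8", 49)]
hits = find_disrupted_sites(example_sites, example_variants)
```

In this toy input only `siteA` contains a variant; the other two variants fall just outside their nearest site.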
Table 1. Top predicted phenotype and matching medical phenotype.
The set of conserved binding site eroding loci (CoBELs) for each individual is searched for the most significant congregation of binding site erosion events next to a group of genes sharing the same function or phenotype (see text). For each personal genome, columns 2–7 of the top row describe the top prediction obtained from the personal genome data and its properties. The fold enrichment and FDR q-value are both reported by GREAT’s binomial enrichment test. The fraction of relevant genes is the number of affected genes annotated with the phenotype (those listed under affected target genes) divided by the number of all genes annotated with the phenotype. Column 8 highlights the matching personal medical phenotype. The bottom row for each personal genome, spanning columns 2–7, provides exact quotes from references that confirm the link between the predicted and observed phenotypes (columns 2 and 8 for each personal genome).
| Person | Affected phenotype | # of CoBEL loci | Fold enrichment | False discovery rate (q-value) | Affected target genes | Fraction of relevant genes | Personal medical phenotype |
|---|---|---|---|---|---|---|---|
| Quake | abnormal cardiac output | 57 | 2.00 | 1.69 × 10⁻⁴ | ADRA1A, ADRA1B, ARSB, CACNB2, CDC42, CDH2, DDAH1, ELN, FXN, MLYCD, NPPA, NRG1, PDLIM3, PLN, PPARGC1A, PPARGC1B, RAF1, RXRA, TMOD1 | 58% | family history of ARVD/C and heart disease and presumed sudden cardiac death |
| | “Arrhythmogenic right ventricular dysplasia/cardiomyopathy is an inherited cardiomyopathy estimated to affect approximately 1 in 5,000 individuals. [. . .] The disease is frequently familial and typically involves autosomal dominant transmission with low penetrance and variable expressivity.” [ | | | | | | |
| | preganglionic parasympathetic nervous system development | 23 | 3.26 | 1.18 × 10⁻⁴ | EGR2, HES1, HES3, HOXA1, HOXB1, HOXB2, PLXNA4, TFAP2A | 80% | narcolepsy |
| | “… a non-secondary involvement of the autonomic nervous system in narcolepsy is strongly suggested” [ | | | | | | |
| | epithelial cell morphogenesis | 60 | 2.11 | 1.38 × 10⁻⁵ | BASP1, BCL11B, BMP4, CTNNB1, EPB41L5, FZD7, GATA3, GDNF, GREM1, HEG1, IHH, PAX2, PAX8, SALL1, SIX2, WT1 | 59% | possible keratosis pilaris |
| | “The epidermis [in keratosis pilaris] demonstrates mild hyperkeratosis, hypogranulosis, and follicular plugging.” [ | | | | | | |
| | decreased circulating sodium level (hyponatremia) | 32 | 3.23 | 4.94 × 10⁻⁶ | EDN1, NR3C2, SCNN1B, SCNN1G, SLC26A3, SLC4A4, TXNIP, WWOX | 89% | hypertension |
| | “A sodium-conserving genome in the context of the contemporary high-sodium and low-potassium diet is maladaptive, with documented pathological and epidemiological consequences (ie, epidemic hypertension).” [ | | | | | | |
| | regulation of oligodendrocyte differentiation | 59 | 2.11 | 2.93 × 10⁻⁵ | ASPA, BMP4, CTNNB1, CXCR4, DLX1, DLX2, HDAC2, HES1, HES5, ID2, ID4, LINGO1, OLIG2, PPARG, SHH, TCF7L2 | 73% | family history of patchy axonal polyneuropathy |
| | “Oligodendrocytes, the myelin-forming glial cells of the central nervous system, maintain long-term axonal integrity.” [ | | | | | | |
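The three statistics reported per row (fold enrichment, a binomial enrichment p-value, and the fraction of relevant genes) can be sketched as follows. This is a hedged illustration, not GREAT's implementation: it assumes the usual region-based binomial setup in which `n` is the total number of CoBELs, `k` the number falling in the regulatory domains of genes annotated with a term, and `p` the fraction of the genome those domains cover. All function names and example numbers are hypothetical, and the q-value (multiple-testing correction across terms) is not computed here.

```python
import math


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), with terms evaluated in log space."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i)
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def fold_enrichment(n, k, p):
    """Observed hits divided by the hits expected under the binomial null."""
    return k / (n * p)


def fraction_relevant(affected_genes, annotated_genes):
    """Share of a term's annotated genes that appear among the affected genes."""
    annotated = set(annotated_genes)
    return len(set(affected_genes) & annotated) / len(annotated)


# Hypothetical inputs: 40 loci, 20 of them in domains covering 25% of the genome.
example_fold = fold_enrichment(40, 20, 0.25)
example_p = binomial_tail(40, 20, 0.25)
example_fraction = fraction_relevant({"A", "B"}, {"A", "B", "C", "D"})
```

With these made-up inputs the fold enrichment is 2 (20 observed against 10 expected) and the fraction of relevant genes is 0.5.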
Fig 2. Enrichment distribution of hypothesized phenotypes in ‘control’ genomes.
(A-E) Comparison of the personal genome enrichments of 1,094 genomes from the 1,000 genomes project with those of the five genomes analyzed in this report. Dashed lines indicate GREAT’s default binomial fold (greater than or equal to two) and FDR (less than or equal to 0.05) significance thresholds. The lower left corner holds the mass of genomes that were not significant by GREAT’s default hypergeometric FDR (less than or equal to 0.05). A red marker indicates that an analyzed personal genome’s prediction is significant and distinguishes it from the 1,000 genomes cohort, indicating that such associations do not appear spuriously at high frequency in control individuals. Panel A indicates that enrichment of “abnormal cardiac output” is fairly common in the background 1,000 genomes cohort, which is not unexpected since predisposition to mild forms of heart disease is common in otherwise normal populations.
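The comparison against control genomes amounts to asking how often a control genome clears the same thresholds. A minimal sketch, assuming one (fold, q-value) pair per control genome for a given phenotype: the function names and the example values are hypothetical, and only the binomial fold and a single FDR cutoff from the caption are checked.

```python
def passes_thresholds(fold, q_value, min_fold=2.0, max_fdr=0.05):
    """True if an enrichment meets the fold and FDR cutoffs named in the caption."""
    return fold >= min_fold and q_value <= max_fdr


def control_hit_rate(control_results):
    """Fraction of control genomes whose enrichment passes both cutoffs.

    control_results: list of (fold, q_value), one pair per control genome.
    """
    if not control_results:
        return 0.0
    hits = sum(1 for fold, q in control_results if passes_thresholds(fold, q))
    return hits / len(control_results)


# Made-up control values, for illustration only.
example_controls = [(2.5, 0.01), (1.2, 0.5), (2.1, 0.2), (0.9, 0.9)]
example_rate = control_hit_rate(example_controls)
```

A low hit rate in the control cohort is what would make a significant personal prediction stand out; a high one, as the caption notes for panel A, signals a phenotype that is commonly enriched in the background.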
Fig 3. Frequency distribution of CoBELs in relation to population structure.
(A) Principal component analysis (PCA) of the five genomes with respect to the genomes in the 1,000 genomes project revealed clustering with the European population, as expected. (B-F) Comparison of the five individuals' enrichment-specific CoBEL frequencies in all 1,000 genomes data and in the two populations with which the five genomes cluster by PCA. Both this and additional frequency distribution analysis (see text) reveal that the top CoBEL enrichments are composed of both common and rare variants, as expected of low pathogenicity mutations that exert a noticeable effect only in aggregate. The similarity of the frequency distributions for the full 1,000 genomes and the two sub-populations further suggests the lack of any population-specific bias in our enrichments.
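One way to compare such frequency distributions is to bin each set of CoBEL variant frequencies into a normalized histogram, once for the full cohort and once per sub-population. The sketch below is an assumption about how that could be done, not the authors' procedure; `frequency_histogram`, the bin count, and the example frequencies are all hypothetical.

```python
def frequency_histogram(frequencies, n_bins=10):
    """Bin population frequencies in [0, 1] into n_bins equal-width bins.

    Returns the share of variants per bin, so that cohorts of different
    sizes can be compared directly.
    """
    counts = [0] * n_bins
    for f in frequencies:
        if not 0.0 <= f <= 1.0:
            raise ValueError("frequencies must lie in [0, 1]")
        # A frequency of exactly 1.0 goes into the last bin.
        idx = min(int(f * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(frequencies)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


# Made-up frequencies: two rarer and two more common variants.
example_hist = frequency_histogram([0.0, 0.05, 0.5, 1.0], n_bins=2)
```

Here half of the variants fall in each bin; similar histograms for the full cohort and its sub-populations would be consistent with the absence of population-specific bias described in the caption.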