Anna L Tyler, Abbas Raza, Dimitry N Krementsov, Laure K Case, Rui Huang, Runlin Z Ma, Elizabeth P Blankenhorn, Cory Teuscher, J Matthew Mahoney.
Abstract
Genetic mapping is a primary tool of genetics in model organisms; however, many quantitative trait loci (QTL) contain tens or hundreds of positional candidate genes. Prioritizing these genes for validation is often ad hoc and biased by previous findings. Here we present a technique for prioritizing positional candidates based on computationally inferred gene function. Our method uses machine learning with functional genomic networks, whose links encode functional associations among genes, to identify network-based signatures of functional association with a trait of interest. We demonstrate the method by functionally ranking positional candidates in a large locus on mouse Chr 6 (45.9 Mb to 127.8 Mb) associated with histamine hypersensitivity (Histh). Histh is characterized by systemic vascular leakage and edema in response to histamine challenge, which can lead to multiple organ failure and death. Although Histh risk is strongly influenced by genetics, little is known about its underlying molecular or genetic causes, owing to the genetic and physiological complexity of the trait. To dissect this complexity, we ranked genes in the Histh locus by predicting their functional association with multiple Histh-related processes. We integrated these predictions with new single nucleotide polymorphism (SNP) association data, derived from a survey of 23 inbred mouse strains, and with congenic mapping data. The top-ranked genes included Cxcl12, Ret, Cacna1c, and Cntn3, all of which had strong functional associations and were proximal to SNPs segregating with Histh. These results demonstrate the power of network-based computational methods to nominate highly plausible quantitative trait genes, even in challenging cases involving large QTL and extreme trait complexity.
Keywords: Clarkson’s Disease; Gene prioritization; histamine hypersensitivity; machine learning; quantitative trait locus
Year: 2019 PMID: 31645420 PMCID: PMC6893195 DOI: 10.1534/g3.119.400740
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Workflow overview. The workflow is broken into blocks by color, each with a bolded title. Each block shows how data (blue rectangles) were operated on (gray rectangles) to achieve results (green rectangles). Arrows show the general flow of work and the dependence (and independence) of individual analyses.
Figure 3. Interval-specific recombinant congenic line (ISRCL) mapping identified four linked QTL controlling Histh. ISRCLs were injected with CFA on day 0 (D0) and challenged on day 30 (D30) by i.v. injection of 75 mg/kg histamine to determine histamine hypersensitivity. Deaths were recorded at 30 min post injection, and the data are reported as the number of animals dead over the number of animals studied. Significance of observed differences was determined by a … test, with p-values below … considered significant.
Figure 2. Network-based machine learning for functionally annotating genes. A Known-positive genes annotated to a functional term (blue nodes) are typically densely interconnected in a functional network. B The adjacency matrix of a network is a tabular representation of the network's connectivity structure, in which each row/column corresponds to a node of the network and connected pairs of nodes have non-zero values in the corresponding cell of the matrix. In general the connections are weighted, but for display only present/absent links (white/black cells) are shown. The connections from every gene in the genome to the known positives form a sub-matrix of the adjacency matrix, called the feature matrix (vertical red lines), whose rows are the feature vectors for each gene. C Using the network-based feature vectors for each gene, we train SVMs to distinguish known positives (blue dots) from random genes in the genome (gray dots) to identify the full sub-network corresponding to the true positive genes (green-bordered dots and dotted red lines in panels A and B).
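The feature-matrix construction and classifier described in panels B and C can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes a dense, symmetric, weighted adjacency matrix stored as nested lists, and it swaps in a Pegasos-style subgradient solver for a linear SVM so that the example needs no external libraries. The toy graph, gene indices, and helper names (`feature_matrix`, `train_linear_svm`, `svm_scores`) are hypothetical.

```python
import random


def feature_matrix(adjacency, positive_idx):
    """Columns of the adjacency matrix for the known positives: one feature vector per gene."""
    return [[row[j] for j in positive_idx] for row in adjacency]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss (a stand-in solver).

    A constant 1.0 is appended to each vector so the last weight acts as the bias
    (this also regularizes the bias, a common simplification).
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the regularizer
            if margin < 1.0:  # hinge loss active: step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def svm_scores(w, X):
    """Linear decision value w.x + b for every gene (higher = more like the positives)."""
    return [sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) for x in X]


# Hypothetical toy network: genes 0 and 1 are annotated positives, gene 2 is an
# unannotated gene wired into the same module, genes 3-7 are unrelated background.
n_genes = 8
adjacency = [[0.0] * n_genes for _ in range(n_genes)]
edges = [(0, 1, 0.9), (0, 2, 0.8), (1, 2, 0.7),
         (3, 4, 0.5), (5, 6, 0.4), (6, 7, 0.3), (3, 7, 0.2), (0, 5, 0.1)]
for a, b, weight in edges:
    adjacency[a][b] = adjacency[b][a] = weight

positives = [0, 1]
negatives = [3, 4, 5, 6, 7]  # stands in for a random sample of genome-wide genes
X_all = feature_matrix(adjacency, positives)
X_train = [X_all[i] for i in positives + negatives]
y_train = [1] * len(positives) + [-1] * len(negatives)

w = train_linear_svm(X_train, y_train)
scores = svm_scores(w, X_all)
for gene, s in enumerate(scores):
    print(gene, round(s, 2))
```

Gene 2 carries no annotation, yet it scores alongside the known positives because its feature vector (strong links to both of them) resembles theirs, which is the intuition behind panel C. In a real genome the negatives would be drawn at random, so a few hidden positives may be mislabeled; the method relies on them being rare.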
A survey of Histh phenotypes across 23 inbred mouse strains
| Strain | HA (dead/studied) | Strain | HA (dead/studied) | Strain | HA (dead/studied) |
|---|---|---|---|---|---|
| A/J | 0/8 | CZECHII/EiJ | 0/8 | | |
| AKR/J | 0/8 | DBA/1J | 0/8 | | |
| BALB/cJ | 0/8 | DBA/2J | 0/8 | | |
| BPL/1J | 0/8 | JF1/Ms | 0/8 | | |
| C3H/HeJ | 0/8 | MOLF/EiJ | 0/8 | | |
| C57BL/10J | 0/8 | MRL/MpJ | 0/8 | | |
| C57BL/6J | 0/7 | PWD/PhJ | 0/12 | | |
| CBA/J | 0/8 | PWK/PhJ | 0/6 | | |
Cohorts of CFA-injected 8- to 10-week-old mice were challenged 30 days later with 75 mg/kg HA by i.v. injection, and deaths were recorded at 30 min. Results are expressed as (number of animals dead)/(number of animals studied). The last column contains strains with haplotype structure similar to SJL/J in bold typeface. These strains are divided into those that did not develop Histh (top) and those that did (bottom).
Figure 4. Targeted genetic association analysis for Histh. Negative log-transformed p values of SNP associations with Histh. Genomic coordinates (mm10, Mbp) of each SNP are shown along the x-axis. Each circle denotes a single SNP. Gene names are included for SNPs that crossed the p-value threshold of …, shown with a red dotted line. The locations of the Histh sub-QTL are shown at the top of the figure.
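The y-axis transformation and labeling rule in Figure 4 amount to the following. This sketch assumes each SNP arrives as a (position in Mbp, p value, nearest gene) tuple; the threshold used in the figure is not recoverable from this text, so `p_threshold`, the helper name `manhattan_points`, and the example SNPs are all hypothetical.

```python
import math


def manhattan_points(snps, p_threshold):
    """Convert (position_mbp, p_value, gene) tuples into plot coordinates.

    The y value is -log10(p), so smaller p values plot higher; the gene name is
    kept as a label only for SNPs whose p value falls below p_threshold.
    """
    points = []
    for position_mbp, p_value, gene in snps:
        y = -math.log10(p_value)
        label = gene if p_value < p_threshold else None
        points.append((position_mbp, y, label))
    return points


# Hypothetical SNPs and threshold, for illustration only.
example = [(50.0, 1e-5, "geneA"), (75.5, 0.03, "geneB"), (110.2, 0.6, "geneC")]
pts = manhattan_points(example, p_threshold=1e-3)
for point in pts:
    print(point)
```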
Figure 5. Two axes of gene scoring. Gene names are plotted by their SNP association score on the x-axis and their functional score on the y-axis. Both scores were scaled by their maximum values for easier comparison. Genes farther to the right were associated with SNPs that segregated with Histh; genes higher on the y-axis have stronger functional association with the gene modules. The blue line marks the Pareto front: no gene on this line is outscored on both axes by any other gene, so these genes are the best candidates when both scores are considered together.
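The max-scaling and Pareto front in Figure 5 can be computed with a simple dominance check. A minimal sketch, assuming both scores are non-negative with a positive maximum; the helper names and the four genes' raw scores are hypothetical.

```python
def scale_by_max(values):
    """Divide every score by the maximum so the top gene scores 1.0 (assumes max > 0)."""
    top = max(values)
    return [v / top for v in values]


def pareto_front(xs, ys):
    """Indices of genes not dominated on (x, y) by any other gene, maximizing both.

    Gene j dominates gene i if it is at least as good on both scores and strictly
    better on at least one.
    """
    front = []
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        dominated = any(
            xj >= xi and yj >= yi and (xj > xi or yj > yi)
            for j, (xj, yj) in enumerate(zip(xs, ys))
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical raw scores for four genes.
genes = ["geneA", "geneB", "geneC", "geneD"]
snp_score = scale_by_max([2.0, 4.0, 1.0, 3.0])
functional_score = scale_by_max([0.5, 0.2, 1.0, 0.1])
print([genes[i] for i in pareto_front(snp_score, functional_score)])
# → ['geneA', 'geneB', 'geneC']  (geneD is dominated by geneB)
```

The pairwise check is quadratic in the number of genes, which is negligible for a few hundred positional candidates.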
Figure 6. Final gene scores. Gene functional values were combined with SNP associations to assign each gene a final gene score; higher gene scores indicate better candidates. A Heat map showing the final score of each of the top 20 ranked genes for each gene module. To highlight the strongest candidates, asterisks mark cells in which the candidate gene was associated with the module at the threshold (…). B The top panel shows individual SNPs plotted at their genomic location (x-axis) and their … (y-axis). All SNPs with a nominally significant p value are plotted. The horizontal line indicates the Bonferroni-corrected significance cutoff. The four sub-QTL are demarcated by background color and labeled at the top of the figure. The bottom panel shows genes plotted at their genomic location (x-axis) and their final gene score (y-axis) to show how the final ranked genes align with the SNP association data.
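The Bonferroni line in Figure 6B divides the family-wise significance level by the number of SNPs tested. A minimal sketch, assuming alpha = 0.05 for both the nominal filter and the family-wise level (neither value is stated in this text) and a hypothetical count of 1,000 SNPs; `bonferroni_cutoff` and `nominal_snps` are illustrative names.

```python
import math


def bonferroni_cutoff(n_tests, alpha=0.05):
    """Per-SNP p-value cutoff alpha / n_tests, returned with its -log10 value."""
    p_cut = alpha / n_tests
    return p_cut, -math.log10(p_cut)


def nominal_snps(snps, alpha=0.05):
    """Keep (position_mbp, p_value) pairs whose p value is below the nominal alpha."""
    return [(pos, p) for pos, p in snps if p < alpha]


# Hypothetical: 1,000 SNPs tested at a family-wise alpha of 0.05.
p_cut, y_cut = bonferroni_cutoff(1000)
print(p_cut, round(y_cut, 2))
```

On the plot's -log10 scale, the cutoff is drawn at `y_cut`; only SNPs above that line survive correction for the number of tests.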