Yun Tan, Lulu Jiang, Kankan Wang, Hai Fang.
Abstract
We propose a computational workflow (I3) for the intuitive, integrative interpretation of complex genetic data, building mainly on the self-organising principle. We illustrate its use in interpreting the genetics of gene expression and in understanding genetic regulators of protein phenotypes, particularly in conjunction with information from human population genetics and/or the evolutionary history of human genes. We reveal that loss-of-function intolerant genes tend to be depleted of tissue-sharing genetics of gene expression in brains and, if highly expressed, have broad effects on the protein phenotypes studied. We suggest that this workflow presents a general solution to the challenge of complex genetic data interpretation. I3 is available at http://suprahex.r-forge.r-project.org/I3.html.
Keywords: Evolution; Human genetics; Interpretation; Machine learning; Self-organising
Year: 2019 PMID: 31765831 PMCID: PMC7056857 DOI: 10.1016/j.gpb.2018.10.006
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1 Overview of
A. I3 workflow. B. Applications to the interpretation of eQTL genes (eGenes) in brain tissues in terms of additional data (LoF intolerance) and annotation data (curated gene sets and gene evolutionary ages). C. Applications to the interpretation of genetic regulators in terms of additional data (protein phenotypic effects, gene expression, and LoF intolerance) and annotation data (pathways and gene druggable categories). eQTL, expression quantitative trait loci; LoF, loss-of-function.
Figure 2 Genetics of gene expression in brain tissues
A. Overview of the analytical workflow. B. Brain tissue landscape: a diamond-shaped map trained using GTEx eGene datasets in brain tissues (and in whole blood for comparison). The colour bar represents the q-value significance defining an eGene in a tissue. C. LoF intolerance map produced by overlaying the LoF intolerance data onto the trained tissue map. D. Gene clusters identified from the trained tissue map. Clusters are colour-coded and labelled. E. The probability of containing LoF-intolerant genes, averaged per cluster. F. Enrichment analysis of gene clusters (in columns) in terms of 5 curated gene sets (in rows). G. Evolutionary analysis of gene clusters. Rows show phylostrata ordered by evolutionary history.
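The overlay in panel C and the per-cluster averaging in panel E can be illustrated with a minimal Python sketch. It is not the I3 implementation: the function names, the toy codebook, and the rule that each gene maps to its nearest map node by squared Euclidean distance are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def best_matching_node(vector, codebook):
    """Index of the codebook vector closest to `vector` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])),
    )


def overlay(gene_vectors, codebook, extra):
    """Per map node, average the additional data over the genes assigned to that node."""
    per_node = defaultdict(list)
    for gene, vector in gene_vectors.items():
        if gene in extra:  # genes without additional data are skipped
            per_node[best_matching_node(vector, codebook)].append(extra[gene])
    return {node: mean(values) for node, values in per_node.items()}


def cluster_average(gene_vectors, codebook, node_cluster, extra):
    """Per cluster, average the additional data over all genes falling in that cluster."""
    per_cluster = defaultdict(list)
    for gene, vector in gene_vectors.items():
        if gene in extra:
            node = best_matching_node(vector, codebook)
            per_cluster[node_cluster[node]].append(extra[gene])
    return {cluster: mean(values) for cluster, values in per_cluster.items()}


# Hypothetical toy inputs: 3 map nodes in a 2-tissue space, 4 genes.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
node_cluster = {0: "C1", 1: "C1", 2: "C2"}
gene_vectors = {
    "g1": [0.1, 0.0],
    "g2": [0.9, 0.1],
    "g3": [0.0, 0.8],
    "g4": [0.1, 1.1],
}
lof_probability = {"g1": 0.2, "g2": 0.6, "g3": 1.0, "g4": 0.8}

node_map = overlay(gene_vectors, codebook, lof_probability)
cluster_map = cluster_average(gene_vectors, codebook, node_cluster, lof_probability)
```

In this sketch the codebook stands in for an already trained map; training the map itself is outside the scope of the example.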
Figure 3 Genetic regulators of protein phenotypes
A. Overview of the analytical workflow. B. Protein phenotype landscape in 2D: a trefoil-shaped map trained from positive regulators involving 11 protein phenotypes. The colour bar represents the mutation index as a measure for identifying regulators; the lower the index, the more likely the gene is a regulator, according to haploid mutagenesis screens for genetic regulators. The landscape is drawn within the outermost box, in which geometric location depicts the similarity between these 11 protein phenotypes. C. Gene clusters identified from the trained protein phenotype map. Clusters are colour-coded and labelled. D-F. Overlaid maps obtained by overlaying additional data onto the trained protein phenotype map: the phenotypic effect map using the per-regulator number of phenotypes (D), the expression map using the RNA-seq expression data in HAP1 cells (E), and the ExAC LoF map using the ExAC LoF intolerance data (F). Also shown on the right are values for the corresponding additional data, averaged per cluster. G. Reactome pathways enriched in gene clusters. H. LoF intolerance explaining relationships between expression and phenotypic effects of genetic regulators. I. Druggable categories enriched in gene clusters. Odds ratio (and 95% confidence interval) based on Fisher’s exact test. ATK, Phosphorylated ATK; CTNNB1, Non-phosphorylated β-catenin; ERK, Phosphorylated ERK; GNB1, GNB1 abundance; H2AK119, Histone H2A(K119) crotonylation; H3K27, Histone H3(K27) trimethylation; IRF1, IRF1 abundance; LAMP1, Glycosylated LAMP1; p38, Phosphorylated p38; PD-L1, PD-L1 abundance; XBP1, Spliced XBP1.
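The enrichment statistics mentioned for panel I can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the I3 code: the function names and the 2x2 table layout are hypothetical, the p-value is a one-sided Fisher's exact test computed from the hypergeometric tail, and the interval is a simple Wald (Woolf) interval on the log sample odds ratio, which differs from the conditional estimate that some statistical packages report alongside Fisher's exact test.

```python
from math import comb, exp, log, sqrt


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment (odds ratio > 1).

    Assumed 2x2 layout:
        a = in cluster and in category      b = in cluster, not in category
        c = not in cluster, in category     d = in neither
    """
    n = a + b + c + d
    row1 = a + b  # genes in the cluster
    col1 = a + c  # genes in the category
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for overlaps at least as large as observed.
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Sample odds ratio with an approximate 95% Wald interval on the log scale."""
    if min(a, b, c, d) == 0:
        raise ValueError("Wald interval is undefined when a cell count is zero")
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts for one cluster and one druggable category.
p_value = fisher_exact_greater(3, 1, 1, 3)
estimate, ci_low, ci_high = odds_ratio_wald_ci(3, 1, 1, 3)
```

With counts this small the interval is very wide and spans 1, which is consistent with the non-significant p-value from the exact test.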