Rachel K Severin, Xinwei Li, Kun Qian, Andreas C Mueller, Lynn Petukhova.
Abstract
Knowledge about genetic drivers of disease increases the efficiency of interpreting patient DNA sequence and helps to identify and prioritize biological points of intervention. Discoveries of genes with single mutations exerting substantial phenotypic impact reliably provide new biological insight, although such approaches tend to generate knowledge that is disjointed from the complexity of biological systems governed by elaborate networks. Here we sought to facilitate diagnostic sequencing for hair disorders and assess the underlying biology by compiling an archive of 684 genes discovered in studies of monogenic disorders and identifying molecular annotations enriched by them. To demonstrate the utility of this dataset, we performed two data-driven analyses. First, we extracted and analyzed data implicating enriched signaling pathways and identified previously unrecognized contributions from Hippo signaling. Second, we performed hierarchical clustering on the entire dataset to investigate the underlying causal structure of hair disorders. We identified 35 gene clusters representing genetically derived biological modules that provide a foundation for the development of a new disease taxonomy grounded in biology, rather than clinical presentations alone. This Resource will be useful for diagnostic sequencing in patients with diseases affecting the hair follicle, improved characterization of hair follicle biology, and methods development in precision medicine.
Year: 2017 PMID: 29176608 PMCID: PMC5701154 DOI: 10.1038/s41598-017-16050-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Hair follicle signaling network revealed by genes underlying monogenic disorders. Annotations significantly enriched by the 684 genes we identified include 57 cellular signaling pathways (diamond nodes) that are connected by a network of 220 genes (rectangular nodes). Edges represent gene-pathway memberships. The most highly connected genes (black outlines) connect 49 pathways (black outlines). Of the eight pathways that do not contain any of the highly connected genes (red outlines), seven are connected by a set of 59 genes (indicated by red edges). This subnetwork was also identified by the Louvain method for gene community detection (red nodes) as one of four gene communities, and includes all 29 genes of the Hippo pathway. The other three gene communities are color-coded, indicating a consistency of results across both analytic methods.
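The gene-pathway network described in this caption can be illustrated with a small bipartite membership map. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the gene names, the memberships, and the `MIN_DEGREE` cutoff for "highly connected" are all hypothetical, since the caption gives no cutoff. It counts how many pathways each gene belongs to and then lists the pathways that contain no hub gene.

```python
from collections import defaultdict

# Hypothetical toy memberships (pathway -> member genes); not data from the paper.
pathway_genes = {
    "Hippo signaling": {"GENE_A", "GENE_B"},
    "Wnt signaling": {"GENE_B", "GENE_C", "GENE_H"},
    "PI3K-Akt signaling": {"GENE_C", "GENE_H"},
    "Ras signaling": {"GENE_H", "GENE_D"},
}


def gene_degrees(memberships):
    """Count the pathways each gene belongs to (its degree in the bipartite graph)."""
    degree = defaultdict(int)
    for genes in memberships.values():
        for gene in genes:
            degree[gene] += 1
    return dict(degree)


def pathways_without_hubs(memberships, hubs):
    """Return, in sorted order, the pathways that contain none of the hub genes."""
    return sorted(p for p, genes in memberships.items() if not genes & hubs)


degrees = gene_degrees(pathway_genes)

# Assumed cutoff for "highly connected"; the caption does not state one.
MIN_DEGREE = 3
hubs = {gene for gene, d in degrees.items() if d >= MIN_DEGREE}

print(sorted(hubs))
print(pathways_without_hubs(pathway_genes, hubs))
```

In this toy example only `GENE_H` qualifies as a hub, and the one pathway left without a hub is the sort of candidate a separate community-detection step would then examine.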
Figure 2. Molecular taxonomy of hair disorder genes revealed by functional hierarchical clustering analysis of 684 genes and 4,937 annotations. Unsupervised agglomerative hierarchical clustering was performed to group 684 genes based on the degree of similarity among their functional annotations. Color-coding distinguishes 35 clusters created by using an arbitrary threshold of height (h) = 1.15, indicated by a black horizontal line. Genes with similar functional annotations are grouped within the same or neighboring clusters. We propose that each cluster represents a biological module, a set of genes that converge on a shared biological feature whose diagnostic and clinical utility remain to be established.
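Cutting an agglomerative clustering at a fixed height can be sketched in a few lines. This is an illustration only: the caption names neither the distance measure nor the linkage rule, so the Jaccard distance and average linkage used here are assumptions, and the gene names and annotation sets are hypothetical. Because Jaccard distance is bounded by 1, the threshold in this sketch is not comparable to the h = 1.15 used in the figure.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 minus the shared fraction of two annotation sets (0 = identical, 1 = disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_by_height(annotations, height):
    """Average-linkage agglomerative clustering, cut at `height`.

    Merging stops once the closest pair of clusters is farther apart than
    `height`, which matches cutting the dendrogram at that height.
    """
    clusters = [[gene] for gene in sorted(annotations)]

    def linkage(c1, c2):
        dists = [
            jaccard_distance(annotations[g1], annotations[g2])
            for g1 in c1
            for g2 in c2
        ]
        return sum(dists) / len(dists)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        (i, j), best = min(
            (
                ((i, j), linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best > height:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical annotation sets; not data from the paper.
annotations = {
    "GENE_A": {"keratin", "intermediate filament"},
    "GENE_B": {"keratin", "intermediate filament", "type ii keratin"},
    "GENE_C": {"lysosome", "lysosomal lumen"},
    "GENE_D": {"lysosome", "glycosaminoglycan degradation"},
}

modules = cluster_by_height(annotations, height=0.7)
print(modules)
```

Lowering `height` yields more, smaller clusters and raising it yields fewer, larger ones, which is why the caption describes the chosen threshold as arbitrary.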
Summary of natural language processing of cluster annotations.
| Cluster | Gene Count | Mapped Genes | Term Extraction |
|---|---|---|---|
| 1 | 11 | 11 | choline metabolism in cancer, binding site:atp, kinase, hsa04722:neurotrophin signaling pathway, hsa04071:sphingolipid signaling pathway, hsa04910:insulin signaling pathway |
| 2 | 24 | 23 | pi3k-akt signaling pathway, hsa04014:ras signaling pathway, kinase |
| 3 | 15 | 15 | hsa05100:bacterial invasion of epithelial cells, hsa04520:adherens junction, hsa04510:focal adhesion |
| 4 | 15 | 15 | hsa04110:cell cycle, 7157:tp53 tumor protein p53, heat shock protein, nucleolin |
| 5 | 8 | 8 | obesity, dna-binding region:nuclear receptor, steroid hormone receptor |
| 6 | 8 | 8 | NF-κB signaling pathway |
| 7 | 7 | 7 | autoimmune disease, infection, graft-versus-host disease |
| 8 | 19 | 18 | cardiovascular diseases, autoimmune disease, atherosclerosis, obesity, metabolic syndrome, type 2 diabetes |
| 9 | 13 | 13 | T-cell factor dependent signaling, hormone |
| 10 | 12 | 12 | lysosome, lysosomal lumen, glycosaminoglycan degradation |
| 11 | 4 | 4 | synaptic vesicle transport, melanosome organization, lysosomal organelles biogenesis |
| 12 | 15 | 14 | keratinocyte differentiation, foreskin |
| 13 | 10 | 10 | keratin, intermediate filament, ipr003054:type ii keratin |
| 14 | 6 | 6 | keratin, intermediate filament, ipr002957:keratin type i |
| 15 | 23 | 21 | magnesium, protein heterooligomerization |
| 16 | 18 | 17 | cell differentiation, fatty acid biosynthesis, iron, go:0030148 sphingolipid biosynthetic process |
| 17 | 3 | 3 | ribosomal protein |
| 18 | 22 | 22 | cell-cell adherens junction, methylation, gaba type a receptor associated protein like |
| 19 | 36 | 36 | go:0045892 negative regulation of transcription dna-te, 3065:hdac1 histone deacetylase 1, domain:leucine-zipper, ipr011598:myc-type basic helix-loop-helix (bhlh) domain |
| 20 | 22 | 21 | chromatin regulator, 3066:hdac2 histone deacetylase 2, go:0006310 dna recombination |
| 21 | 15 | 15 | go:0007568 aging, hsa04913:ovarian steroidogenesis, iron |
| 22 | 51 | 41 | cytoplasmic vesicle, endosome, go:0000139 golgi membrane |
| 23 | 5 | 5 | go:0004713 protein tyrosine kinase activity, go:0008543 fibroblast growth factor receptor signaling, go:0036092 phosphatidylinositol-3-phosphate biosynthetic process |
| 24 | 39 | 39 | go:0042438 melanin biosynthetic process, go:0033162 melanosome membrane, go:0043066 negative regulation of apoptotic process |
| 25 | 6 | 6 | go:0030057 desmosome, ipr014868:cadherin prodomain, ipr027397:catenin binding domain |
| 26 | 16 | 16 | go:0032496 response to lipopolysaccharide, myocardial infarction, go:0006954 inflammatory response |
| 27 | 61 | 53 | go:0007399 nervous system development, lipoprotein, cell projection |
| 28 | 21 | 18 | homeobox, go:0001942 hair follicle development |
| 29 | 34 | 32 | 5914:retinoic acid receptor alpha(rara), cross-link:Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in SUMO), dna-binding, transcription regulation |
| 30 | 37 | 35 | go:0005887 integral component of plasma membrane, calcium transport, go:0043588 skin development |
| 31 | 40 | 31 | go:0043473 pigmentation |
| 32 | 27 | 27 | go:0007155 cell adhesion, go:0030198 extracellular matrix organization, go:0005788 endoplasmic reticulum lumen |
| 33 | 14 | 14 | ipr001881:egf-like calcium-binding, ipr009030:insulin-like growth factor binding protein, n terminal |
| 34 | 10 | 10 | hsa04550:signaling pathways regulating pluripotency of stem cells, hsa05205:proteoglycans in cancer, hsa04390:hippo signaling pathway, hsa04916:melanogenesis, wnt signaling pathway |
| 35 | 17 | 16 | go:0005125 cytokine activity, sm00204:tgfb, growth factor, go:0008285 negative regulation of cell proliferation |
NLP identified the most frequent significantly enriched annotations specific to each of the 35 clusters, allowing for semantic interpretation of the hierarchical clustering analysis. The Mapped Genes column gives the number of genes in each cluster annotated by at least one NLP feature. Dominant features of clusters suggest the functional significance of modules revealed by our analytic approach. To increase the specificity of terms, annotations that appeared in more than 21 of the 35 clusters (60%) were excluded from NLP. A list of the 20 most enriched annotations for each cluster may be found in Supplementary Table 6.
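The exclusion rule and the frequency ranking described in this note can be sketched as follows. This is a hedged illustration rather than the authors' NLP procedure: the function names, the `max_fraction` parameter, and the toy term lists are hypothetical. With 35 clusters, a fraction of 0.6 reproduces the stated limit of 21 clusters.

```python
from collections import Counter


def filter_common_terms(cluster_terms, max_fraction=0.6):
    """Drop terms that occur in more than `max_fraction` of all clusters."""
    n_clusters = len(cluster_terms)
    clusters_per_term = Counter()
    for terms in cluster_terms.values():
        # Count each term once per cluster, however often it repeats inside it.
        clusters_per_term.update(set(terms))
    limit = max_fraction * n_clusters
    too_common = {t for t, n in clusters_per_term.items() if n > limit}
    return {
        cid: [t for t in terms if t not in too_common]
        for cid, terms in cluster_terms.items()
    }


def top_terms(terms, k=3):
    """Return the k most frequent terms of one cluster."""
    return [term for term, _ in Counter(terms).most_common(k)]


# Hypothetical annotation terms per cluster; not data from the paper.
cluster_terms = {
    1: ["phosphoprotein", "kinase", "kinase"],
    2: ["phosphoprotein", "keratin", "keratin", "kinase"],
    3: ["phosphoprotein", "lysosome"],
    4: ["phosphoprotein", "kinase", "homeobox"],
    5: ["pigmentation"],
}

filtered = filter_common_terms(cluster_terms)
for cluster_id, terms in filtered.items():
    print(cluster_id, top_terms(terms, k=1))
```

In the toy data, "phosphoprotein" appears in four of five clusters and is dropped, while "kinase" appears in exactly three (60%) and is kept, because the rule excludes only terms found in more than the limit.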