Sumit Mukherjee, Thanneer M Perumal, Kenneth Daily, Solveig K Sieberts, Larsson Omberg, Christoph Preuss, Gregory W Carter, Lara M Mangravite, Benjamin A Logsdon.
Abstract
MOTIVATION: Late-onset Alzheimer's disease currently has no known effective treatment options. To better understand the disease, new multi-omic datasets have recently been generated with the goal of identifying its molecular causes. However, most analytic studies using these datasets focus on uni-modal analysis of the data. Here, we propose a data-driven approach that integrates multiple data types and analytic outcomes to aggregate evidence supporting the hypothesis that a gene is a genetic driver of the disease. The main algorithmic contributions of our article are: (i) a general machine learning framework that learns the key characteristics of a few known driver genes from multiple feature sets and identifies other potential driver genes with similar feature representations, and (ii) a flexible ranking scheme that can integrate external validation in the form of genome-wide association study (GWAS) summary statistics. While we currently focus on demonstrating the effectiveness of the approach using different analytic outcomes from RNA-Seq studies, the method generalizes readily to other data modalities and analysis types.
Year: 2019 PMID: 31510680 PMCID: PMC6612835 DOI: 10.1093/bioinformatics/btz365
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Description of various feature sets used for multiview evidence aggregation
| Feature set | SynapseID | No. features | Type | Descriptions |
|---|---|---|---|---|
| Differential expression | syn18097426 | 250 | Binary | Membership based on differential expression in different brain regions and patient subgroups (such as males/females) |
| Global network | syn18097427 | 42 | Numeric | Features derived from graph structure in different brain regions |
| Module network | syn18097424 | 66 | Numeric | Features derived from graph structure in important co-expression modules from different brain regions |
Fig. 1. RNA-Seq data for AD patients and controls were derived for seven different brain regions from three centers. Differential expression, co-expression module and global network features were derived from all brain regions. Each feature set and the known drivers were used to build predictive models for driver genes. The resulting driver probabilities and GWAS statistics were used for an evidence-based driver ranking
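The workflow in Fig. 1 can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the nearest-centroid scorer stands in for whatever predictive model is trained per feature set, the combination rule (mean per-view probability plus a weighted −log10 of the minimum GWAS P-value) is hypothetical, and the gene names, feature values and `gwas_weight` are toy inputs.

```python
import math


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def view_probability(features, known_drivers):
    """Score every gene within one feature set (view).

    Stand-in model (an assumption, not the article's classifier): a logistic
    function of the difference between the squared distances to the
    non-driver centroid and to the known-driver centroid.
    """
    pos = centroid([v for g, v in features.items() if g in known_drivers])
    neg = centroid([v for g, v in features.items() if g not in known_drivers])

    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return {
        g: 1.0 / (1.0 + math.exp(-(sqdist(v, neg) - sqdist(v, pos))))
        for g, v in features.items()
    }


def rank_genes(views, known_drivers, gwas_min_p, gwas_weight=0.1):
    """Rank genes by mean per-view probability plus a weighted GWAS term.

    The additive combination and gwas_weight are hypothetical choices.
    """
    per_view = [view_probability(f, known_drivers) for f in views.values()]
    scores = {}
    for gene, p_value in gwas_min_p.items():
        mean_prob = sum(p[gene] for p in per_view) / len(per_view)
        # Floor the P-value so that -log10 stays finite for reported zeros.
        gwas_term = -math.log10(max(p_value, 1e-308))
        scores[gene] = mean_prob + gwas_weight * gwas_term
    return sorted(scores, key=scores.get, reverse=True)


# Toy inputs: two views, four hypothetical genes, one known driver.
views = {
    "differential_expression": {"G1": [1, 1], "G2": [1, 0], "G3": [0, 0], "G4": [0, 0]},
    "global_network": {"G1": [0.9], "G2": [0.8], "G3": [0.1], "G4": [0.2]},
}
known_drivers = {"G1"}
gwas_min_p = {"G1": 1e-20, "G2": 1e-6, "G3": 0.5, "G4": 0.04}

ranking = rank_genes(views, known_drivers, gwas_min_p)
print(ranking)
```

Keeping the per-view scoring separate from the aggregation step mirrors the figure: a different model or an additional feature set can be swapped in without touching the ranking logic.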
Fig. 2. Comparison of various classification algorithms trained on corrupted class labels and tested on actual labels
Fig. 3. (A) Results of the Mann–Whitney U test performed on IGAP and Jansen MP distributions for predicted driver versus non-driver genes. (B) Results of the t-test performed on IGAP and Jansen MLP distributions for predicted driver versus non-driver genes
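As a rough illustration of the kind of comparison reported in Fig. 3(A), the sketch below computes a Mann–Whitney U statistic with midranks and a two-sided P-value from the normal approximation (without tie or continuity correction). The input values are made-up GWAS summary scores for two hypothetical gene groups, not data from the study; in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used instead.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided normal-approximation P-value).

    Ties receive midranks; no tie or continuity correction is applied to
    the variance, so the P-value is only approximate for small samples.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold tied values with 1-based ranks i+1..j.
        midrank = (i + 1 + j) / 2.0
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_value


# Hypothetical per-gene scores for predicted drivers versus other genes.
predicted_driver_scores = [3.1, 4.0, 5.2, 6.3]
non_driver_scores = [0.5, 1.0, 1.2, 2.0, 2.5]

u_stat, p_val = mann_whitney_u(predicted_driver_scores, non_driver_scores)
print(u_stat, p_val)
```

A rank-based test is a natural fit when the compared quantity, such as a minimum P-value, is heavily skewed, since only the ordering of the values matters.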
Top 20 ranked genes along with their associated driver score and minimum P-value from the IGAP (Lambert et al.) and Jansen (Jansen et al.) GWAS datasets
| Gene | Driver score | Jansen min. P-value | IGAP min. P-value |
|---|---|---|---|
| APOC1 | 42.92 | <1E−308 | <1E−308 |
| APOE | 41.75 | <1E−308 | <1E−308 |
| BCAM | 5.88 | 1.60E−143 | 4.66E−69 |
| CD74 | 4.92 | 1.93E−02 | 1.20E−01 |
| TREM2 | 4.65 | 2.95E−15 | 1.07E−03 |
| CLPTM1 | 4.58 | 7.07E−50 | 2.80E−21 |
| DEF6 | 4.28 | 5.94E−03 | 3.52E−02 |
| SLC7A7 | 4.05 | 2.29E−03 | 2.36E−02 |
| DOCK2 | 3.72 | 9.14E−04 | 4.82E−03 |
| SPI1 | 3.62 | 1.06E−06 | 1.99E−06 |
| STEAP3 | 3.61 | 3.63E−05 | 2.21E−02 |
| PICALM | 3.56 | 2.19E−18 | 1.91E−12 |
| HMOX1 | 3.56 | 1.16E−02 | 1.43E−01 |
| CLU | 3.55 | 2.61E−19 | 2.48E−17 |
| MS4A6A | 3.55 | 1.55E−15 | 6.64E−11 |
| IRF5 | 3.45 | 1.21E−02 | 1.48E−02 |
| TYROBP | 3.44 | 1.34E−02 | 5.40E−02 |
| PARVG | 3.42 | 1.44E−02 | 1.05E−03 |
| ITGAL | 3.41 | 1.92E−04 | 4.36E−03 |
| PTPRC | 3.33 | 2.12E−03 | 7.24E−03 |
Top 20 enriched gene sets for biological process and molecular function along with their associated adjusted P-values obtained from Enrichr (Chen et al.)
| GO biological process | Adjusted P-value | GO molecular function | Adjusted P-value |
|---|---|---|---|
| Neutrophil mediated immunity | 3.03E−12 | MHC Class II receptor activity | 7.67E−03 |
| Neutrophil activation involved in immune response | 3.03E−12 | Activin binding | 7.67E−03 |
| Neutrophil degranulation | 4.62E−12 | MHC Class II protein complex binding | 7.67E−03 |
| Interferon-gamma-mediated signaling pathway | 4.62E−12 | MHC protein complex binding | 7.67E−03 |
| Cytokine-mediated signaling pathway | 9.91E−11 | Transforming growth factor beta binding | 7.67E−03 |
| Cellular response to interferon-gamma | 5.79E−10 | Phosphotyrosine residue binding | 7.67E−03 |
| Negative regulation of amyloid precursor protein catabolic process | 7.71E−05 | Transforming growth factor beta receptor binding | 7.67E−03 |
| Regulation of amyloid-beta formation | 7.94E−05 | Amyloid-beta binding | 7.67E−03 |
| Positive regulation of intracellular signal transduction | 1.62E−04 | Scavenger receptor activity | 1.04E−02 |
| Positive regulation of actin nucleation | 1.68E−04 | Protein phosphorylated amino acid binding | 1.09E−02 |
| Endocytosis | 2.26E−04 | Low-density lipoprotein receptor activity | 1.42E−02 |
| Regulation of mast cell degranulation | 3.07E−04 | Phosphatidylinositol bisphosphate binding | 1.42E−02 |
| Regulation of apoptotic process | 3.07E−04 | Protein kinase binding | 1.42E−02 |
| Extracellular matrix organization | 3.07E−04 | Clathrin heavy chain binding | 1.91E−02 |
| Negative regulation of amyloid-beta formation | 4.01E−04 | Lipoprotein particle receptor activity | 1.95E−02 |
| Antigen receptor-mediated signaling pathway | 4.01E−04 | GTPase regulator activity | 2.02E−02 |
| Negative regulation of extrinsic apoptotic signaling pathway | 5.26E−04 | Actin binding | 2.23E−02 |
| Regulation of amyloid-beta clearance | 5.77E−04 | Type II transforming growth factor beta receptor binding | 2.30E−02 |
| T cell receptor signaling pathway | 5.77E−04 | Low-density lipoprotein particle binding | 2.30E−02 |
| Cellular response to transforming growth factor beta stimulus | 1.09E−03 | Peptidase activity, acting on L-amino acid peptides | 2.30E−02 |
Spearman rank correlation (with model predictions) for the top 10 features of network topological feature sets
| Module net feature | Spearman correlation | Global net feature | Spearman correlation |
|---|---|---|---|
| TCXbrownTCXauthority | −0.36 | STGcloseness | 0.58 |
| TCXbrownTCXdegree | −0.36 | STGdegree | 0.57 |
| TCXbrownTCXeccentricity | −0.36 | STGauthority | 0.57 |
| DLPFCredDLPFCauthority | −0.34 | PHGauthority | 0.54 |
| DLPFCredDLPFCeccentricity | −0.34 | STGpagerank | 0.53 |
| TCXbrownTCXcloseness | −0.34 | PHGdegree | 0.53 |
| DLPFCredDLPFCdegree | −0.34 | PHGcloseness | 0.52 |
| TCXbrownTCXpagerank | −0.34 | DLPFCauthority | 0.52 |
| DLPFCredDLPFCcloseness | −0.33 | STGcentr_betw | 0.50 |
| DLPFCredDLPFCpagerank | −0.33 | DLPFCdegree | 0.50 |
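For readers unfamiliar with the statistic in the table above, the following sketch shows how a Spearman rank correlation between one feature and the model predictions can be computed: both vectors are converted to midranks and the Pearson correlation of the ranks is taken. The feature and prediction values are invented for illustration and do not come from the study.

```python
import math


def midranks(values):
    """Return 1-based ranks of values, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2.0
        i = j
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the midranks."""
    ra, rb = midranks(a), midranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical closeness values for five genes and their predicted driver scores.
closeness = [0.10, 0.40, 0.35, 0.80, 0.60]
predictions = [0.05, 0.30, 0.45, 0.90, 0.70]

rho = spearman(closeness, predictions)
print(round(rho, 2))
```

Because the statistic depends only on ranks, a positive value indicates that genes ranking higher on the feature tend to receive higher predictions and a negative value the opposite, whatever the feature's scale.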
Fig. 4. Known driver genes (colored in gray) and all other genes highlighted on the top two principal components for each of the three feature sets