Yichuan Liu, Hui-Qi Qu, Xiao Chang, Lifeng Tian, Jingchun Qu, Joseph Glessner, Patrick M A Sleiman, Hakon Hakonarson.
Abstract
RNA-seq has been a powerful method for detecting differentially expressed genes and long non-coding RNAs (lncRNAs) in schizophrenia (SCZ) patients; however, because of overfitting, differentially expressed targets (DETs) cannot be used reliably as biomarkers. This study used machine learning to reduce the number of gene and non-coding RNA features. Dorsolateral prefrontal cortex (DLPFC) RNA-seq data from 254 individuals were obtained from the CommonMind Consortium. The average predictive accuracy for SCZ patients was 67% based on coding genes and 96% based on lncRNAs. Machine learning is a powerful approach for reducing the set of candidate functional biomarkers in SCZ patients. The lncRNAs capture the characteristics of SCZ tissue more accurately than mRNA, as the former regulate every level of gene expression and are not limited to mRNA levels.
Keywords: long non-coding RNAs; machine learning; schizophrenia; transcriptome
Year: 2021 PMID: 33805976 PMCID: PMC8037538 DOI: 10.3390/ijms22073364
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Two-fold random shuffle testing results for 50 rounds. The X-axis is the round number, the Y_1 axis (left) is the number of reduced genes (a) and lncRNAs (b), and the Y_2 axis (right) is the accuracy measurement, ranging from 0 to 1.
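The testing scheme named in the Figure 1 caption can be illustrated with a small sketch. It assumes that "two-fold random shuffle" means a fresh random 50/50 train/test split in each round; the record does not spell this out. The nearest-centroid classifier, the synthetic data, and all function names below are hypothetical stand-ins, not the study's model or the CommonMind data.

```python
import random
from statistics import mean


def fit_centroids(X, y):
    """Per-class mean vectors (a stand-in classifier, not the study's model)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def two_fold_shuffle_accuracy(X, y, rounds=50, seed=0):
    """Assumed scheme: reshuffle, train on one half, test on the other, per round."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    accuracies = []
    for _ in range(rounds):
        rng.shuffle(indices)
        half = len(indices) // 2
        train, test = indices[:half], indices[half:]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        accuracies.append(correct / len(test))
    return accuracies


def make_synthetic(n=254, n_features=5, shift=1.0, seed=1):
    """Synthetic two-class data; only the sample count (254) comes from the abstract."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(label * shift, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


X, y = make_synthetic()
accs = two_fold_shuffle_accuracy(X, y, rounds=50)
print(f"mean accuracy over {len(accs)} rounds: {mean(accs):.3f}")
```

Each entry of `accs` corresponds to one point on the accuracy axis of such a plot; a real analysis would substitute the reduced gene or lncRNA features and the actual classifier.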
Figure 2. (a) Number of feature vectors for coding genes after multiple filtering methods; (b) number of feature vectors for lncRNAs after multiple filtering methods; (c) factor analysis cumulative curve and number of remaining coding-gene feature vectors; (d) factor analysis cumulative curve and number of remaining feature vectors for lncRNAs and their targeted genes.
Figure 3. -Log10 (adjusted p value) scale for enriched functional pathways: (a) gene factor 1; (b) targeted genes in lncRNA factor 2.
Genes in gene factors with strong supporting evidence, including genome-wide association study (GWAS), genome-wide linkage study (Linkage), copy number variation (CNV), integrative analysis (Integrative), differential methylation (Diff Methy), differential expression (Diff Exp), identification by exome sequencing (Exome), expression level in brain tissues (Brain Exp), Gene Ontology (GO), and total score (Score).
| Factor | Gene | GWAS | Linkage | CNV | Integrative | Diff Methy | Diff Exp | Exome | Brain Exp | GO | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ASPHD1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 (26.68) | 0 | 5 |
| 1 | AK4 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 (13.55) | 1 | 4 |
| 1 | APH1A | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 (43.24) | 1 | 4 |
| 1 | FPGS | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 (14.38) | 1 | 4 |
| 1 | FSCN1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 (48.13) | 0 | 4 |
| 1 | INO80E | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 (20.22) | 0 | 4 |
| 1 | P2RX6 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 (4.99) | 1 | 4 |
| 1 | PCCB | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 (26.64) | 0 | 4 |
| 1 | PRODH | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 (24.04) | 0 | 4 |
| 1 | SCN1B | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 (25.12) | 1 | 4 |
| 1 | SEMA7A | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 (13.64) | 1 | 4 |
| 2 | BCCIP | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 (26.92) | 1 | 5 |
| 2 | HNRNPU | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 (108.84) | 0 | 4 |
| 2 | HSP90AA1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 (411.41) | 1 | 4 |
| 2 | NRG1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 (4.47) | 1 | 4 |
| 2 | PDE4B | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 (38.88) | 1 | 4 |
| 3 | TIMP2 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 (107.98) | 1 | 4 |
| 7 | BCL6 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 (29.50) | 1 | 5 |
| 9 | RERE | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 (25.51) | 1 | 4 |
Target genes in lncRNA factors with strong supporting evidence.
| Factor | lncRNA | Target Gene | GWAS | Linkage | CNV | Integrative | Diff Methy | Diff Exp | Exome | Brain Exp | GO | Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ENSG00000247735.2 | ASPHD1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 (26.68) | 0 | 5 |
| 1 | ENSG00000232912.1 | RERE | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 (25.51) | 1 | 4 |
| 1 | ENSG00000235770.1 | FN1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 (38.33) | 1 | 4 |
| 1 | ENSG00000235831.2 | ITPR1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 (25.59) | 1 | 4 |
| 1 | ENSG00000239569.2 | SRPK2 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 (75.17) | 1 | 4 |
| 1 | ENSG00000243762.1 | RANBP1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 (30.73) | 0 | 4 |
| 1 | ENSG00000247735.2 | SEZ6L2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 (38.77) | 1 | 4 |
| 1 | ENSG00000257126.1 | FOXG1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 (22.05) | 1 | 4 |
| 1 | ENSG00000261220.2 | ST3GAL1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 (8.93) | 1 | 4 |
| 1 | ENSG00000271849.1 | PJA2 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 (144.54) | 1 | 4 |
| 2 | ENSG00000224563.1 | BCL6 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 (29.50) | 1 | 5 |
| 2 | ENSG00000226978.1 | MAGI2 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 (24.46) | 1 | 4 |
| 2 | ENSG00000236031.1 | AKT3 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 (35.93) | 1 | 4 |
| 2 | ENSG00000248816.1 | TENM3 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 (8.06) | 1 | 4 |
| 2 | ENSG00000272367.1 | RASA1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 (20.45) | 1 | 4 |
| 3 | ENSG00000272989.1 | DLG1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 (52.38) | 1 | 5 |
| 3 | ENSG00000239569.2 | SRPK2 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 (75.17) | 1 | 4 |
| 3 | ENSG00000273164.1 | PRODH | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 (24.04) | 0 | 4 |
| 3 | ENSG00000273164.1 | DGCR2 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 (30.80) | 0 | 4 |
| 37 | ENSG00000248816.1 | TENM3 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 (8.06) | 1 | 4 |
Figure 4. Proportion of differentially expressed genes/lncRNAs versus genes/lncRNAs with at least one line of supporting evidence: (a) coding genes after factor analysis; (b) targeted coding genes based on lncRNA genomic locus.