Sameer Sardaar, Bill Qi, Alexandre Dionne-Laporte, Guy A Rouleau, Reihaneh Rabbany, Yannis J Trakadis.
Abstract
BACKGROUND: Machine learning (ML) algorithms and methods offer great tools to analyze large complex genomic datasets. Our goal was to compare the genomic architecture of schizophrenia (SCZ) and autism spectrum disorder (ASD) using ML.
Keywords: Autism spectrum disorder; Genomic; Machine learning; Schizophrenia; Unsupervised clustering
Year: 2020; PMID: 32111185; PMCID: PMC7049199; DOI: 10.1186/s12888-020-02503-5
Source DB: PubMed; Journal: BMC Psychiatry; ISSN: 1471-244X; Impact factor: 3.630
Performance of different approaches (algorithms) on test data
| Method | Accuracy | Precision | Recall | NIR | P-value (Acc > NIR) | 95% CI |
|---|---|---|---|---|---|---|
| SNV-based | 0.86 | 0.73 | 0.98 | 0.63 | < 4.97e-22 | (0.82,0.89) |
| Gene-based | 0.88 | 0.80 | 0.96 | 0.58 | < 3.09e-36 | (0.85,0.92) |
The performance of the two approaches, each trained to distinguish ASD cases from SCZ cases, was measured on a previously unseen test dataset. Accuracy is the number of correctly predicted samples divided by the total number of samples.
Acc: accuracy; NIR: no information rate; CI: confidence interval
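As an illustration of how the quantities in this table relate to one another, the following minimal sketch derives accuracy, precision, recall, and the no information rate from a 2x2 confusion matrix. The counts are hypothetical and are not taken from the study, treating ASD as the positive class is an assumption, and the one-sided exact binomial test used for the p-value is also an assumption, since the record does not state how that value was computed.

```python
from math import comb


def classification_summary(tp, fp, fn, tn):
    """Summarize a 2x2 confusion matrix.

    tp, fp, fn, tn are hypothetical counts; the positive class is assumed
    to be ASD. Assumes no denominator below is zero.
    """
    n = tp + fp + fn + tn
    correct = tp + tn
    accuracy = correct / n            # correctly predicted / total samples
    precision = tp / (tp + fp)        # predicted positives that are true positives
    recall = tp / (tp + fn)           # actual positives that were found
    # No information rate: accuracy obtained by always predicting
    # the more frequent true class.
    nir = max(tp + fn, fp + tn) / n
    # Assumed test: one-sided exact binomial, P(X >= correct) when the
    # true accuracy equals the NIR.
    p_value = sum(
        comb(n, k) * nir**k * (1 - nir) ** (n - k)
        for k in range(correct, n + 1)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "nir": nir,
        "p_value": p_value,
    }


# Hypothetical example counts (not from the study).
summary = classification_summary(tp=45, fp=15, fn=5, tn=35)
for name, value in summary.items():
    print(f"{name}: {value:.3g}")
```

A classifier is only informative when its accuracy clearly exceeds the NIR, which is why the table reports the NIR and the associated p-value alongside accuracy.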
Performance of SNV and Gene-based approaches using five-fold cross validation
| Method | Accuracy | Precision | Recall | NIR | P-value (Acc > NIR) | 95% CI |
|---|---|---|---|---|---|---|
| SNV-based | 0.88 | 0.78 | 0.97 | 0.59 | < 2.2e-16 | (0.86,0.90) |
| Gene-based | 0.88 | 0.81 | 0.95 | 0.57 | < 2.2e-16 | (0.86,0.90) |
The performance of the two approaches, each trained to distinguish ASD cases from SCZ cases, was measured using five-fold cross validation. All performance metrics are averages over the five cross-validation folds.
Acc: accuracy; NIR: no information rate; CI: confidence interval
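The fold-averaging described above can be sketched as follows. This is a generic illustration of five-fold cross validation, not the authors' pipeline: the helper names (`k_fold_indices`, `cross_validated_accuracy`, `majority_baseline`), the shuffling seed, the label counts, and the majority-class placeholder model are all hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, test_idx


def cross_validated_accuracy(labels, fit_predict, k=5, seed=0):
    """Average per-fold accuracy of fit_predict(train_idx, test_idx)."""
    fold_accuracies = []
    for train_idx, test_idx in k_fold_indices(len(labels), k, seed):
        predictions = fit_predict(train_idx, test_idx)
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        fold_accuracies.append(correct / len(test_idx))
    return mean(fold_accuracies), fold_accuracies


def majority_baseline(labels):
    """Placeholder model: always predict the majority class of the training fold."""
    def fit_predict(train_idx, test_idx):
        train_labels = [labels[i] for i in train_idx]
        majority = max(set(train_labels), key=train_labels.count)
        return [majority] * len(test_idx)
    return fit_predict


# Hypothetical labels (not the study's sample sizes).
labels = ["ASD"] * 70 + ["SCZ"] * 30
mean_acc, per_fold = cross_validated_accuracy(labels, majority_baseline(labels))
print(f"mean accuracy over {len(per_fold)} folds: {mean_acc:.2f}")
```

With a majority-class placeholder, the averaged accuracy equals the no information rate of the full label set, which gives a natural baseline against which a real model's cross-validated accuracy can be compared.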
Top 10 important genes from SNV-based and gene-based models
| SNV-based approach (SNV rsID) | Gene-based approach |
|---|---|
| SARM1 (rs71373646) | SARM1 |
| QRICH2 (rs6501878) | QRICH2 |
| AKAP1 (rs34535433) | PRPF31 |
| PCLO (rs77721383) | SEC24D |
| TSPO2 (rs147405274) | SCN4A |
| ABCC3 (rs11568605) | CACNA1S |
| KIF13A (rs41267712) | CDSN |
| FAN1 (rs150393409) | HERC2 |
| CCDC155 (rs201671744) | MUC16 |
| PRPF31 (rs199870856) | PCLO |
Boosted regression tree models were trained to separate SCZ and ASD probands based on the population-structure-adjusted SNV-based and gene-based datasets. The table shows the 10 most important genes from the gene-based model and from the SNV-based model (with the corresponding SNV in parentheses), ordered from most to least important.
Fig. 1. Hierarchical clustering of overlapping genes using SCZ cases
Fig. 2. Hierarchical clustering of overlapping genes using ASD cases