Yichuan Liu, Hui-Qi Qu, Frank D Mentch, Jingchun Qu, Xiao Chang, Kenny Nguyen, Lifeng Tian, Joseph Glessner, Patrick M A Sleiman, Hakon Hakonarson.
Abstract
Mental disorders are a global health concern, and their diagnosis can be challenging. Diagnosis is even harder for patients with more than one type of mental disorder, and especially for toddlers, who cannot complete the questionnaires or standardized rating scales used for diagnosis. In the past decade, multiple genomic association signals have been reported for mental disorders, some of which present attractive drug targets. Concurrently, machine learning algorithms, especially deep learning algorithms, have been successful in the diagnosis and/or labeling of complex diseases, such as attention deficit hyperactivity disorder (ADHD) or cancer. In this study, we focused on eight common mental disorders in African Americans, an ethnic minority: ADHD, depression, anxiety, autism, intellectual disabilities, speech/language disorder, developmental delays, and oppositional defiant disorder (ODD). Blood-derived whole genome sequencing data were generated from 4179 individuals, including 1384 patients diagnosed with at least one mental disorder. The burden of genomic variants in coding and non-coding regions was used as the feature vector for the deep learning algorithm. Our model showed ~65% accuracy in differentiating patients from controls. The ability to label patients with multiple disorders was similarly successful, with a Hamming loss below 0.3, while exact diagnostic matches were around 10%. Genes in the genomic regions with the highest weights showed enrichment of biological pathways involved in immune responses, antigen/nucleic acid binding, chemokine signaling, and G-protein receptor activities.
Notably, variants in non-coding regions (e.g., ncRNA, intronic, and intergenic) performed as well as variants in coding regions; however, unlike coding-region variants, non-coding variants did not show genomic hotspots and had much narrower standard deviations, indicating that they probably serve as alternative markers.
Year: 2022 PMID: 34997195 PMCID: PMC9095459 DOI: 10.1038/s41380-021-01418-1
Source DB: PubMed Journal: Mol Psychiatry ISSN: 1359-4184 Impact factor: 13.437
Fig. 1 Phenotype summary of 4179 African American individuals from the NHLBI Trans-Omics for Precision Medicine (TOPMed) project.
a Age distribution of patients: the majority (~95%) are children under 18 years old. b Number of patients with each of the eight diagnoses: ADHD, depression, anxiety, autism, intellectual disabilities, speech/language disorder, developmental delays, and oppositional defiant disorder (ODD). Note that one patient can have multiple diagnoses. c Distribution of the number of diagnoses per individual, ranging from controls (no diagnosed mental disorder) to a maximum of six diagnoses.
Prediction accuracy summary (mean ± standard deviation).
| Variation type | Prediction accuracy (single-diagnosis mental disorder vs controls) | Prediction accuracy (mental disorder vs controls) | Hamming loss (among 8 disorders) | Exact matches (among 8 disorders) |
|---|---|---|---|---|
| Nonsynonymous SNVs | 71.7 ± 1.34% | 64.5 ± 1.2% | 0.28 ± 0.011 | 8.8 ± 1.1% |
| Frameshift SNVs | 70.8 ± 1.61% | 64 ± 1.4% | 0.30 ± 0.007 | 8.4 ± 0.91% |
| Stop codon SNVs | 71.49 ± 1.69% | 65.1 ± 0.97% | 0.28 ± 0.004 | 7.2 ± 0.69% |
| SNVs in UTR region | 72.4 ± 1.44% | 65.5 ± 1.1% | 0.31 ± 0.009 | 7.6 ± 1.2% |
| SNVs in ncRNA | 72.6 ± 1.52% | 65.7 ± 1.3% | 0.29 ± 0.009 | 8.1 ± 1.4% |
| SNVs in intronic regions | 72.8 ± 1.29% | 65.7 ± 1.1% | 0.28 ± 0.008 | 8.1 ± 0.98% |
| SNVs in intergenic regions | 73.1 ± 1.23% | 64.6 ± 1.1% | 0.30 ± 0.006 | 9.3 ± 0.90% |
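
The two multi-label columns above measure different things: Hamming loss is the fraction of individual patient-by-disorder label slots predicted incorrectly, whereas an exact match requires all eight labels of a patient to be correct at once. The following minimal Python sketch illustrates the difference on made-up toy labels (not study data); the function names `hamming_loss` and `exact_match_ratio` are illustrative and are not taken from the authors' code.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def exact_match_ratio(y_true, y_pred):
    """Fraction of patients whose full label vector is predicted correctly."""
    matches = sum(
        list(true_row) == list(pred_row)
        for true_row, pred_row in zip(y_true, y_pred)
    )
    return matches / len(y_true)


# Toy example (hypothetical): 4 patients, 8 binary disorder labels each.
y_true = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 1],
]
y_pred = [
    [1, 0, 0, 0, 0, 0, 0, 0],  # all 8 labels correct
    [1, 0, 0, 0, 0, 0, 0, 0],  # 1 label wrong
    [0, 0, 1, 0, 0, 0, 0, 0],  # 2 labels wrong
    [0, 0, 0, 1, 0, 0, 1, 0],  # 2 labels wrong
]

print(hamming_loss(y_true, y_pred))       # 5 wrong slots out of 32
print(exact_match_ratio(y_true, y_pred))  # 1 of 4 patients fully correct
```

On this toy input the Hamming loss is 5/32 (about 0.16) while only one of four patients is matched exactly, which shows how a Hamming loss below 0.3 can coexist with an exact-match rate under 10%.
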
Prediction accuracy for specific disorders in patients with at least one diagnosis based on coding variants.
| Disorder | Nonsynonymous SNVs: precision | Nonsynonymous SNVs: recall | Frameshift SNVs: precision | Frameshift SNVs: recall | Stop codon: precision | Stop codon: recall |
|---|---|---|---|---|---|---|
| ADHD | 39.4 ± 2.7% | 40.5 ± 5.1% | 39.8 ± 2.8% | 39.1 ± 3.3% | 31.3 ± 40.2% | 0.08 ± 0.27% |
| Speech/language disorders | 30.3 ± 3.4% | 30.6 ± 5.2% | 32.6 ± 3.1% | 30.9 ± 4.2% | 0.0 ± 0.0% | 0.0 ± 0.0% |
| Developmental delays | 36.8 ± 2.1% | 37.1 ± 4.2% | 36.4 ± 2.8% | 36.2 ± 4.3% | 33.6 ± 1.2% | 90.1 ± 1.9% |
| Depression | 25.9 ± 3.1% | 23.6 ± 4.4% | 26.3 ± 3.4% | 24.8 ± 3.6% | 19.9 ± 6.5% | 3.7 ± 1.5% |
| Anxiety | 18.1 ± 4.1% | 13.7 ± 6.1% | 20.6 ± 3.4% | 18.9 ± 4.1% | 15.1 ± 2.3% | 14.9 ± 2.9% |
| ODD | 13 ± 11.1% | 6.1 ± 5.1% | 20.6 ± 5.9% | 14.6 ± 4.7% | 17.4 ± 9.7% | 2.5 ± 1.5% |
| Autism | 11.9 ± 7.1% | 6.1 ± 4.8% | 15.8 ± 4.5% | 13.2 ± 4.6% | 11.2 ± 2.6% | 12.2 ± 3.4% |
| Intellectual disabilities | 20.3 ± 3.8% | 19.4 ± 5.3% | 21.4 ± 2.9% | 18.8 ± 4.4% | 0.0 ± 0.0% | 0.0 ± 0.0% |
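
Precision and recall in this table are per-disorder quantities: precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP, and FN count true positives, false positives, and false negatives for one label. The sketch below computes both for a single label on hypothetical toy data; returning 0.0 when a denominator is zero is an assumption of this sketch, not something the source specifies.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for one binary label (1 = disorder present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Assumption: report 0.0 when the ratio is undefined (zero denominator).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical label for 10 patients, 3 of whom have the disorder.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

# A predictor that assigns the label to almost everyone.
predict_almost_all = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
# A predictor that never assigns the label.
predict_none = [0] * 10

print(precision_recall(y_true, predict_almost_all))  # recall 1.0, precision 3/9
print(precision_recall(y_true, predict_none))        # (0.0, 0.0)
```

A predictor like the first one (high recall, precision close to the label's prevalence) would produce a pattern similar to the stop codon result for developmental delays (33.6% precision, 90.1% recall), and a predictor like the second would produce 0.0 ± 0.0% rows, although the source does not state how the model behaved in those cases.
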
Prediction accuracy for specific disorders in patients with at least one diagnosis based on non-coding variants.
| Disorder | SNVs in UTR region: precision | SNVs in UTR region: recall | SNVs in ncRNA: precision | SNVs in ncRNA: recall | Intronic SNVs: precision | Intronic SNVs: recall | Intergenic SNVs: precision | Intergenic SNVs: recall |
|---|---|---|---|---|---|---|---|---|
| ADHD | 38.4 ± 2.5% | 33.4 ± 7.3% | 39.6 ± 2.7% | 32.4 ± 5.7% | 39.3 ± 3.6% | 31.1 ± 7.1% | 42.6 ± 2.3% | 42.8 ± 3.9% |
| Speech/language disorders | 30.8 ± 3.3% | 31.6 ± 5.8% | 30.1 ± 2.9% | 26.5 ± 5.3% | 30.6 ± 2.8% | 31.8 ± 5.4% | 33.5 ± 2.8% | 33.6 ± 4.5% |
| Developmental delays | 35.4 ± 3.5% | 33.5 ± 7.3% | 35.4 ± 2.5% | 38.4 ± 4.5% | 36.6 ± 3.1% | 35.5 ± 4.5% | 36.4 ± 2.4% | 36.8 ± 4.2% |
| Depression | 24.1 ± 2.9% | 25.9 ± 7.6% | 24.6 ± 2.8% | 26.9 ± 4.7% | 24.7 ± 3.9% | 24.7 ± 5.5% | 26 ± 3.1% | 26.7 ± 5.0% |
| Anxiety | 18.6 ± 3.7% | 17.2 ± 4.2% | 17.2 ± 6.6% | 6.8 ± 3.9% | 16.1 ± 12.3% | 3.8 ± 3.2% | 20.1 ± 4.4% | 17.9 ± 4.4% |
| ODD | 10.9 ± 5.5% | 10.3 ± 6.2% | 11.6 ± 4.8% | 9.6 ± 5.5% | 6.8 ± 9.1% | 3.2 ± 2.5% | 15.4 ± 4.4% | 13.5 ± 4.1% |
| Autism | 12.9 ± 4.9% | 10.6 ± 5.2% | 13.2 ± 4.8% | 9.5 ± 3.6% | 14.9 ± 7.2% | 7.9 ± 4.2% | 16.9 ± 5.3% | 13.6 ± 4.3% |
| Intellectual disabilities | 19.9 ± 3.4% | 14.8 ± 5.3% | 22.1 ± 7.9% | 10.5 ± 5.3% | 21.8 ± 5.8% | 9.8 ± 3.5% | 20.5 ± 2.9% | 22.2 ± 4.4% |
Fig. 2 Boxplots of the weights of 587 genomic regions (feature vectors).
a In prediction of cases versus controls. b In multiple labeling for 1384 mental disorder patients.
Fig. 3 Feature vector weight distribution of three coding variant types (nonsynonymous SNVs, frameshift SNVs, and stop codon SNVs) across 22 autosomes.
a In prediction of cases versus controls. b In multiple labeling for 1384 mental disorder patients. The red dashed line marks the value expected if the genomic regions were uniformly weighted.
Fig. 4 Feature vector weight distribution of four non-coding variant types (SNVs in UTR regions, ncRNA, intronic regions, and intergenic regions) across 22 autosomes.
a In prediction of cases versus controls. b In multiple labeling for 1384 mental disorder patients. The red dashed line marks the value expected if the genomic regions were uniformly weighted.
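
The uniform-weight reference line in these figures can be worked out from the number of regions. The short sketch below assumes that the weights of the 587 genomic regions are normalized to sum to 100% (an assumption, since the source only reports weights in percent); the name `fold_over_uniform` is illustrative.

```python
N_REGIONS = 587  # number of genomic regions (feature vectors) stated in Fig. 2

# Assumption: weights are normalized so that all regions sum to 100%.
uniform_weight_pct = 100.0 / N_REGIONS


def fold_over_uniform(weight_pct):
    """How many times larger a region's weight is than the uniform value."""
    return weight_pct / uniform_weight_pct


print(round(uniform_weight_pct, 3))      # uniform weight per region, in percent
print(round(fold_over_uniform(1.0), 2))  # a region weighted at 1.0%
print(round(fold_over_uniform(1.5), 2))  # a region weighted at 1.5%
```

Under that assumption the uniform value is about 0.17% per region, so the coding hotspots reported with weights of 1.0 to 1.5% would sit roughly six to nine times above the reference line.
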
Coding hotspots based on weight of genomic regions and enriched Gene Ontology (GO)/KEGG pathways.
| Variation type | Locus | Weight for case–control classification (%) | Weight for multiple labeling (%) | Number of genes containing the corresponding variation | Enriched Gene Ontology terms/pathways (Benjamini–Hochberg adjusted P value) |
|---|---|---|---|---|---|
| Stop codon | chr19:1-5000000 | 1.1 | 1.2 | 59 | – |
| Stop codon | chr19:50000001-55000000 | 1.5 | 1.5 | 68 | Regulation of immune response (2.2E-4); regulation of transcription (5.1E-4); osteoclast differentiation (2.1E-5); antigen processing and presentation (5.2E-4) |
| Stop codon | chr11:55000001-60000000 | 1.2 | 1.2 | 42 | G-protein-coupled receptor signaling pathway (4.3E-27); olfactory receptor activity (2.3E-36); olfactory transduction (1.8E-31) |
| Stop codon | chr17:35000001-40000000 | 1.0 | 1.1 | 33 | Lymphocyte/monocyte chemotaxis (2.1E-3); chemokine-mediated signaling pathway (5.3E-3); chemokine signaling pathway (0.034); cytokine-cytokine receptor interaction (0.036) |
| Frameshift SNVs | chr11:55000001-60000000 | 1.3 | 1.4 | 49 | G-protein-coupled receptor signaling pathway (4E-30); sensory perception of smell (2E-42); olfactory receptor activity (1.3E-42); olfactory transduction (9.1E-35) |
| Frameshift SNVs | chr16:55000001-60000000 | 1.3 | 1.3 | 27 | – |
| Frameshift SNVs | chr19:50000001-55000000 | 1.3 | 1.4 | 76 | IMMUNE category diseases (6.9E-3); regulation of immune response (3.2E-11); regulation of transcription (3.2E-3); receptor activity (8.3E-3); osteoclast differentiation (1.2E-8); antigen processing and presentation (2.9E-6); natural killer cell-mediated cytotoxicity (5.3E-4) |
| Frameshift SNVs | chr6:30000001-35000000 | 1.1 | 1.2 | 65 | IMMUNE category diseases (2.5E-29); MHC class II receptor activity (1.2E-3); antigen processing and presentation (4.1E-5) |
| Frameshift SNVs | chr7:100000001-105000000 | 1.1 | 1.2 | 33 | – |
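
The loci in this table, such as chr11:55000001-60000000, are 5 Mb windows. As a rough illustration of how a per-window variant burden could be assembled into a feature vector, the sketch below maps variant positions to such windows and counts them. The 5 Mb window size is inferred from the loci shown, the helper names `window_locus` and `burden_vector` are hypothetical, and the positions are made up; this is not the authors' pipeline, and it ignores truncation of the last window at a chromosome end.

```python
from collections import Counter

WINDOW = 5_000_000  # assumption: window size inferred from the loci in the table


def window_locus(chrom, pos):
    """Return the 1-based, inclusive 5 Mb window containing a position."""
    start = (pos - 1) // WINDOW * WINDOW + 1
    end = start + WINDOW - 1
    return f"{chrom}:{start}-{end}"


def burden_vector(variants, loci):
    """Count variants per window, ordered by the given list of loci."""
    counts = Counter(window_locus(chrom, pos) for chrom, pos in variants)
    return [counts.get(locus, 0) for locus in loci]


# Hypothetical variant calls for one individual: (chromosome, position).
variants = [
    ("chr11", 55_000_001),
    ("chr11", 59_999_999),
    ("chr19", 52_000_000),
    ("chr6", 31_000_000),
]
loci = [
    "chr11:55000001-60000000",
    "chr19:50000001-55000000",
    "chr7:100000001-105000000",
]

print(window_locus("chr19", 52_000_000))
print(burden_vector(variants, loci))  # chr6 variant falls outside the listed loci
```

In a full analysis one such vector would be built per individual and per variant type, covering every window on the 22 autosomes rather than the three loci used here.
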
Genes in coding hotspots and the medications they interact with.
| Variation type | Locus | Genes (interacting medications) |
|---|---|---|
| Frameshift SNVs | chr11:55000001-60000000 | TCN1 (cyanocobalamin); MED19 (alcohol) |
| Frameshift SNVs | chr16:55000001-60000000 | SLC12A3 (interacts with 18 medications); CETP (tamoxifen, atorvastatin, simvastatin, pravastatin, lovastatin, fluvastatin); MMP2 (cyclosporine, pravastatin, bevacizumab, vinblastine, filgrastim, zileuton, paclitaxel, simvastatin, letrozole, streptozocin, acetazolamide, deferoxamine, ramipril) |
| Frameshift SNVs | chr19:50000001-55000000 | KIR2DS4 (methotrexate); KCNC3 (dalfampridine, guanidine hydrochloride); FPR1 (penicillin G potassium, sulfinpyrazone); KLK1 (ecallantide); PRPF31 (metformin); CACNG6 (bepridil hydrochloride, pregabalin, gabapentin enacarbil, gabapentin) |
| Frameshift SNVs | chr6:30000001-35000000 | HCG22 (triamcinolone); CCHCR1 (nevirapine); HSPA1L (carbamazepine); MUCL3 (carboplatin, gemcitabine); EHMT2 (interacts with 189 medications); CDSN (carboplatin, gemcitabine); TCF19 (nevirapine); NOTCH4 (allopurinol); TAPBP (aspirin); ATAT1 (gemcitabine, carboplatin); COL11A2 (ocriplasmin, collagenase clostridium histolyticum); ZBTB22 (aspirin); TNF (interacts with 41 medications) |
| Frameshift SNVs | chr7:100000001-105000000 | SERPINE1 (cetrorelix, hydrochlorothiazide, epirubicin, captopril, orlistat, levothyroxine, nimodipine, dexamethasone, defibrotide, citalopram, urokinase, fluoxetine, vasopressin); STAG3 (vemurafenib); ACHE (interacts with 28 medications); EPHB4 (vandetanib) |
| Stop codon | chr11:55000001-60000000 | TMX2 (alcohol) |
| Stop codon | chr17:35000001-40000000 | SLFN11 (niraparib, temozolomide, talazoparib); CCL3 (infliximab) |
| Stop codon | chr19:1-5000000 | TBXA2R (morphine, iloprost, furosemide, vinblastine, dinoprostone, cyclosporine, aspirin, alprostadil); GRIN3B (felbamate, ketamine hydrochloride, esketamine, amantadine hydrochloride, orphenadrine hydrochloride, acamprosate calcium, orphenadrine, orphenadrine citrate, esketamine hydrochloride, memantine hydrochloride); PLIN3 (galsulfase, idursulfase); AMH (testosterone); MKNK2 (erlotinib, gefitinib, sorafenib); PIP5K1C (alcohol) |
| Stop codon | chr19:50000001-55000000 | KIR2DS4 (methotrexate); KLK4 (ecallantide, bortezomib); CACNG6 (bepridil hydrochloride, pregabalin, gabapentin enacarbil, gabapentin); PRPF31 (metformin); NDUFA3 (metformin hydrochloride) |