Ioanna Talli, Nikolas Dovrolis, Anastasis Oulas, Stavroula Stavrakaki, Kali Makedou, George M Spyrou, Ioanna Maroulakou.
Abstract
BACKGROUND: Clinical classification of autistic patients based on current WHO criteria provides a valuable but simplified depiction of the true nature of the disorder. Our goal is to determine the biology of the disorder and the ASD-associated genes that lead to differences in the severity and variability of clinical features, which can enhance the ability to predict clinical outcomes.
Keywords: ASD; Bioinformatics; Clinical phenotype; Genetic variation; Genomics
Year: 2022 PMID: 36117207 PMCID: PMC9482726 DOI: 10.1186/s40246-022-00415-x
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 6.481
Fig. 1 Flowchart of the classification process using a linear regression classifier, with PLINK as the feature (variant) selection tool. Step 1: Describes the dataset, which has 2 classes of patients (severe and non-severe autism). Step 2: Denotes the selection of a set of variants to be used during the leave-one-out cross-validation (LOOCV) process. Step 3: The LOOCV is initiated by extracting one sample from the dataset. Step 4: PLINK is used to perform odds ratio analysis on the remaining samples (this avoids overfitting). The top X most significant variants are carried forward to the next step. Step 5: A linear regression model is trained using the data and features for the specific iteration. Step 6: The LOOCV process (Steps 3–5) is repeated for every sample (N = 33) and statistics are recorded. Step 7: Steps 2–5 are repeated for every value of X
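The key point of Steps 3 and 4 is that variant ranking is redone inside every fold, using only the samples that remain after one is held out. The following is a minimal sketch of that idea, not the authors' pipeline: a smoothed log odds ratio computed in plain Python stands in for the PLINK odds ratio analysis, and an additive score over the selected variants stands in for the linear regression model. The function names, the 0/1 carrier encoding, and the toy data are all assumptions made for illustration.

```python
import math

def log_odds_ratio(genotypes, labels, j):
    """Log odds ratio of carrying variant j in severe (1) vs non-severe (0) samples.

    Each cell starts at 0.5 (Haldane-Anscombe correction) so the ratio stays
    finite when a cell of the 2x2 table is empty.
    """
    a = b = c = d = 0.5
    for row, y in zip(genotypes, labels):
        carrier = row[j] > 0
        if y == 1 and carrier:
            a += 1
        elif y == 1:
            b += 1
        elif carrier:
            c += 1
        else:
            d += 1
    return math.log((a * d) / (b * c))

def loocv_accuracy(genotypes, labels, top_x):
    """Leave-one-out accuracy with variant selection repeated in every fold."""
    n = len(labels)
    n_variants = len(genotypes[0])
    correct = 0
    for held_out in range(n):
        train_x = [row for i, row in enumerate(genotypes) if i != held_out]
        train_y = [y for i, y in enumerate(labels) if i != held_out]
        # Rank variants on the training samples only, so the held-out sample
        # never influences which features are selected.
        weights = [log_odds_ratio(train_x, train_y, j) for j in range(n_variants)]
        ranked = sorted(range(n_variants), key=lambda j: abs(weights[j]), reverse=True)
        selected = ranked[:top_x]
        # Stand-in model (assumption): sum the log odds ratios of the selected
        # variants that the held-out sample carries, and threshold at zero.
        score = sum(weights[j] for j in selected if genotypes[held_out][j] > 0)
        predicted = 1 if score > 0 else 0
        correct += int(predicted == labels[held_out])
    return correct / n

# Toy data (hypothetical): variant 0 tracks severity, variant 1 is noise.
toy_genotypes = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1]]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_accuracy = loocv_accuracy(toy_genotypes, toy_labels, top_x=1)
```

Repeating the call for several values of `top_x` corresponds to Step 7. Selecting variants once on the full dataset and only then cross-validating would leak the held-out sample into feature selection, which is the overfitting that Step 4 is designed to avoid.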
Fig. 2 Pipeline for the identification of genes that characterize the IQ, verbal, memory, and attention measurements, based on current scientific knowledge provided by the databases SIFT and PolyPhen
Groups’ performance on IQ, vocabulary, memory and attention tasks
| Variables | ASD_MH Mean | ASD_MH SD | ASD_L Mean | ASD_L SD | n | U | p | g |
|---|---|---|---|---|---|---|---|---|
| Non-verbal IQ: RPM Standard Score | 98.61 | 15.13 | 40.66 | 12.23 | 33 | 0.00*** | < 0.001 | 4.17 |
| Expressive Vocabulary: CVS Standard Score | 76.11 | 12.43 | 3.33 | 12.91 | 33 | 0.00*** | < 0.001 | 5.75 |
| Picture comprehension: DSLD Test Preschool | – | – | 2.57 | 2.95 | 15 | – | – | – |
| Attention (total score): TAAC Test Percentile | 6.33 | 10.32 | 0.07 | 0.26 | 33 | 13.00*** | < 0.001 | .82 |
| Auditory Attention: TAAC Test Percentile | 16.66 | 26.21 | 0.14 | 0.35 | 33 | 24.00*** | < 0.001 | .85 |
| Visual Attention: TAAC Test Percentile | 10.94 | 18.47 | 0.13 | 0.35 | 33 | 7.00*** | < 0.001 | .79 |
| Visual Attention Range: TAAC Test Percentile | 17.67 | 23.84 | 0.33 | 1.29 | 33 | 20.50*** | < 0.001 | .98 |
| VSTM Sentence recall: TAAC Test Percentile | 14.28 | 14.63 | 0.67 | 0.26 | 33 | 11.50*** | < 0.001 | 1.26 |
| VSTM Word Recall: Memory Test | 20.39 | 7.09 | 1.07 | 4.13 | 33 | 4.00*** | < 0.001 | 3.25 |
| Immediate Visual Memory: Memory Test | 24.44 | 9.80 | 4.07 | 5.23 | 33 | 13.50*** | < 0.001 | 2.53 |
| Delayed Visual Memory: Memory Test | 4.05 | 2.34 | 0.53 | 1.12 | 33 | 24.00*** | < 0.001 | 1.86 |
| Visual Information Recall: Memory Test | 6.05 | 4.64 | 0.47 | 0.83 | 33 | 32.00*** | < 0.001 | 1.60 |
| Narration Total Elements: Memory Test | 5.50 | 5.54 | 0.07 | 0.26 | 33 | 25.50*** | < 0.001 | 1.32 |
| Narration Total Sections: Memory Test | 4.94 | 4.99 | 0.07 | 0.26 | 33 | 32.00*** | < 0.001 | 1.32 |
| Information Retention Factor: Memory Test | 5.28 | 6.06 | 0.40 | 1.55 | 33 | 32.00*** | < 0.001 | 1.06 |
| Recognition: Memory Test | 19.67 | 6.68 | 1.00 | 3.87 | 33 | 10.00*** | < 0.001 | 3.34 |
RPM Raven Progressive Matrices, CVS Crichton Vocabulary Scale, DSLD Test Preschool Detection of Speech and Language Disorders Test Preschool, TAAC Test for the Assessment of Attention and Concentration in Primary School
**p < 0.01, ***p < 0.001; Hedges' g: 0.2 = small effect size, 0.5 = medium effect size, 0.8 = large effect size
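For readers who want to see how the U and g columns are typically obtained, here is a minimal sketch of the Mann-Whitney U statistic and of Hedges' g. It assumes the common small-sample correction factor 1 − 3/(4·df − 1) for g; the source does not state which variant of the formula was used, so values computed this way may not reproduce the table exactly. The function names and the example numbers are illustrative only.

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference with a small-sample bias correction."""
    df = n1 + n2 - 2
    # Pooled standard deviation weights each group's variance by its degrees of freedom
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    cohens_d = (mean1 - mean2) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)  # assumed approximation of the correction factor
    return cohens_d * correction

def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U: the smaller of the two pairwise 'win' counts (ties count 0.5)."""
    wins_a = 0.0
    for x in group_a:
        for y in group_b:
            if x > y:
                wins_a += 1.0
            elif x == y:
                wins_a += 0.5
    wins_b = len(group_a) * len(group_b) - wins_a
    return min(wins_a, wins_b)

# Illustrative numbers only, not taken from the study
example_g = hedges_g(10.0, 2.0, 10, 8.0, 2.0, 10)
example_u = mann_whitney_u([1, 3, 5], [2, 4, 6])
```

A U of 0 arises when every score in one group exceeds every score in the other, that is, when the two distributions do not overlap at all.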
Fig. 3 Results of the classification process described above, showing statistics for each LOOCV run across different numbers of top significant variants selected for validation. The recorded statistics are LOOCV prediction accuracy (blue bars), sensitivity (orange bars) and specificity (gray bars), together with the Matthews correlation coefficient (yellow line)
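The four statistics named in this caption can all be derived from the confusion matrix accumulated over the LOOCV folds. The sketch below shows the standard definitions, treating "severe" as the positive class (an assumption); the function name and the counts in the example are hypothetical and are not results from the study.

```python
import math

def classification_stats(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and Matthews correlation from counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the coefficient is reported as 0 when any margin is empty
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }

# Hypothetical counts for 33 samples (15 positive, 18 negative)
stats = classification_stats(tp=12, tn=15, fp=3, fn=3)
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which makes it a useful summary when the two classes are of unequal size.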
Fig. 4 Visualization of risk model results for 33 ASD patients (18 non-severe and 15 severe) using the 26 variants selected during LOOCV. The dendrogram was obtained by performing hierarchical clustering (using Euclidean distance and the average linkage algorithm) of the model prediction outputs. The clustering represents the molecular subtypes obtained by the trained model for all ASD patients. The two molecular subtypes predicted by the risk models are color-coded: pink for the most severe cases (high-risk individuals) and light green for the least severe cases (low-risk individuals). Moreover, the continuous spectrum of risk prediction scores is shown in the red-green gradient traversing the dendrogram. Patients are further sorted by severity in descending order. Clinical experimental data are also shown in parallel to the results obtained from the machine learning algorithm, as columns with dark and light gray boxes. The boxes denote the different levels of severity for the six different clinical measures available for this study. The molecular classification of samples 8574_9, 8574_14, 8574_7 and 8574_23 appears to differ from the clinical classification. These samples cluster separately from the rest of the samples with similar severe clinical phenotypes. Similarly, based on their molecular classification, samples 8574_13 and 8574_18 also appear to cluster away from samples of similar non-severe clinical classification
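To illustrate the clustering step named in this caption, the sketch below runs a naive agglomerative clustering with average linkage until two groups remain. It assumes one risk score per patient, so the Euclidean distance reduces to an absolute difference; the function name and the scores are hypothetical, and a production analysis would normally rely on an established clustering library rather than this quadratic loop.

```python
def average_linkage_clusters(scores, n_clusters=2):
    """Agglomerative clustering of 1-D scores using average linkage."""
    clusters = [[i] for i in range(len(scores))]  # start with one cluster per sample

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters
        total = sum(abs(scores[i] - scores[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = linkage(clusters[a], clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        # Merge the closest pair; b > a, so deleting b leaves index a valid
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical risk scores for five patients
risk_scores = [0.9, 0.1, 0.8, 0.2, 0.15]
groups = average_linkage_clusters(risk_scores)
```

Here `groups` holds sample indices, so each inner list is one molecular subtype; comparing those lists with the clinical labels is how discordant samples such as the ones named in the caption would surface.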
Total of 84 unique top variants for severe vs non-severe molecular classification selected from LOOCV
| rs_ids | gene | chrom | start | end | aa_change | impact | aaf_1kg_all |
|---|---|---|---|---|---|---|---|
| rs76264143 |  | chr1 | 982843 | 982844 |  | intron variant | 0.04 |
| rs2748972 |  | chr1 | 1891476 | 1891477 | S/P | missense variant | 0.13 |
| rs1763347 |  | chr1 | 103354427 | 103354428 | G | synonymous variant | 0.62 |
| rs12119908 |  | chr1 | 156902221 | 156902222 | R/H | missense variant | 0.20 |
| rs822431 |  | chr1 | 156902280 | 156902281 | S/A | missense variant | 0.28 |
| rs4570419 |  | chr1 | 156907030 | 156907031 |  | intron variant | 0.22 |
| rs2275199 |  | chr1 | 156909694 | 156909695 | N | synonymous variant | 0.19 |
| rs2275206 |  | chr1 | 156939066 | 156939067 |  | splice region variant | 0.18 |
| rs12063382 |  | chr1 | 236925843 | 236925844 | S | synonymous variant | 0.20 |
| rs1061157 |  | chr2 | 203421198 | 203421199 | R | synonymous variant | 0.12 |
| rs2921705 |  | chr2 | 233792564 | 233792565 |  | intron variant | 0.17 |
| rs34116584 |  | chr2 | 241808313 | 241808314 | P/L | missense variant | 0.08 |
| rs66494441 |  | chr2 | 241808462 | 241808463 |  | intron variant | -1.00 |
| rs34726174 |  | chr2 | 241871846 | 241871847 | G/R | missense variant | 0.13 |
| rs4683158 |  | chr3 | 46010076 | 46010077 | R/Q | missense variant | 0.93 |
| rs4682801 |  | chr3 | 46021217 | 46021218 | R | synonymous variant | 0.76 |
| rs12492868 |  | chr3 | 46755936 | 46755937 | T | synonymous variant | 0.38 |
| rs34788938 |  | chr3 | 46759009 | 46759010 | Q/P | missense variant | 0.31 |
| rs56260729 |  | chr3 | 195594949 | 195594950 | P/L | missense variant | 0.12 |
| rs2070018 |  | chr4 | 155508626 | 155508627 |  | intron variant | 0.89 |
| rs2271704 |  | chr5 | 41008779 | 41008780 | L/P | missense variant | 0.78 |
| rs316408 |  | chr5 | 43066773 | 43066774 |  | upstream gene variant | 0.67 |
| rs3749787 |  | chr5 | 151784182 | 151784183 | L | synonymous variant | 0.23 |
| rs7965 |  | chr5 | 154346324 | 154346325 | K | synonymous variant | 0.06 |
| rs2251702 |  | chr6 | 24797646 | 24797647 | H | synonymous variant | 0.52 |
| rs62000984 |  | chr6 | 26017674 | 26017675 | L | synonymous variant | 0.09 |
| rs2969042 |  | chr7 | 2612293 | 2612294 |  | intron variant | 0.26 |
| rs2969043 |  | chr7 | 2612294 | 2612295 |  | intron variant | 0.26 |
| rs10264715 |  | chr7 | 44555405 | 44555406 | Y | synonymous variant | 0.08 |
| rs34947817 |  | chr7 | 143792990 | 143792991 | S/N | missense variant | 0.11 |
| rs10230228 |  | chr7 | 143806687 | 143806688 | Q/K | missense variant | 0.16 |
| rs10252253 |  | chr7 | 143807303 | 143807304 | L/P | missense variant | 0.16 |
| rs7791767 |  | chr7 | 149513151 | 149513152 |  | non coding transcript exon variant | 0.21 |
| rs10250401 |  | chr7 | 149515102 | 149515103 |  | non coding transcript exon variant | 0.20 |
| rs622106 |  | chr8 | 12878676 | 12878677 | A | synonymous variant | 0.71 |
| rs3739310 |  | chr8 | 12878806 | 12878807 | C/G | missense variant | 0.65 |
| rs503550 |  | chr8 | 12879061 | 12879062 | L | synonymous variant | 0.71 |
| rs608909 |  | chr8 | 12879333 | 12879334 | V | synonymous variant | 0.71 |
| rs608052 |  | chr8 | 12879538 | 12879539 | R/G | missense variant | 0.70 |
| rs7826836 |  | chr8 | 12888907 | 12888908 | S/A | missense variant | 0.67 |
| rs1799931 |  | chr8 | 18258369 | 18258370 | G/E | missense variant | 0.08 |
| rs16892543 |  | chr8 | 89340161 | 89340162 | P | synonymous variant | 0.21 |
| rs1328285 |  | chr9 | 15922136 | 15922137 |  | intron variant | 0.88 |
| rs62559879 |  | chr9 | 33566233 | 33566234 | A | synonymous variant | 0.16 |
| rs11791445 |  | chr9 | 114476747 | 114476748 | M/L | missense variant | 0.14 |
| rs12352352 |  | chr9 | 114484782 | 114484783 | P | synonymous variant | 0.14 |
| rs10512411 |  | chr9 | 114490228 | 114490229 | A | synonymous variant | 0.14 |
| rs73563696 |  | chr9 | 140108856 | 140108857 | S | synonymous variant | 0.14 |
| rs76301014 |  | chr11 | 123676387 | 123676388 | R/C | missense variant | 0.12 |
| rs1298463 |  | chr12 | 72013831 | 72013832 | A | synonymous variant | 0.51 |
| rs6538681 |  | chr12 | 96284649 | 96284650 | A | synonymous variant | 0.75 |
| rs2985989 |  | chr13 | 46108853 | 46108854 | L | synonymous variant | 0.81 |
| rs12586727 |  | chr14 | 38218342 | 38218343 | I/V | missense variant | 0.21 |
| rs2229518 |  | chr15 | 30008888 | 30008889 | A | synonymous variant | 0.79 |
| rs12904906 |  | chr15 | 60689994 | 60689995 |  | intron variant | 0.11 |
| rs714181 |  | chr16 | 3640273 | 3640274 | P/L | missense variant | 0.24 |
| rs3743690 |  | chr16 | 20635417 | 20635418 | K | splice region variant | 0.18 |
| rs2301672 |  | chr16 | 20636813 | 20636814 | S | synonymous variant | 0.18 |
| rs12922670 |  | chr16 | 89234947 | 89234948 |  | upstream gene variant | 0.06 |
| rs9890913 |  | chr17 | 31618550 | 31618551 | L | synonymous variant | 0.13 |
| rs77247739 |  | chr17 | 73518327 | 73518328 | Q/P | missense variant | 0.07 |
| rs3744183 |  | chr17 | 77075666 | 77075667 | I | synonymous variant | 0.42 |
| rs3744184 |  | chr17 | 77075669 | 77075670 | P | synonymous variant | 0.41 |
| rs3744185 |  | chr17 | 77075672 | 77075673 | P | synonymous variant | 0.42 |
| rs3744186 |  | chr17 | 77076439 | 77076440 | S | synonymous variant | 0.39 |
| rs3752042 |  | chr17 | 78010412 | 78010413 |  | upstream gene variant | 0.12 |
| rs11082414 |  | chr18 | 42529995 | 42529996 | V/L | missense variant | 0.16 |
| rs78047294 |  | chr19 | 871986 | 871987 | T | synonymous variant | 0.17 |
| rs17445374 |  | chr19 | 21326357 | 21326358 | D/G | missense variant | 0.14 |
| rs2280746 |  | chr19 | 35770055 | 35770056 | V/I | missense variant | 0.20 |
| rs12104393 |  | chr19 | 48801217 | 48801218 |  | intron variant | 0.15 |
| rs2242463 |  | chr19 | 48806976 | 48806977 | D | synonymous variant | 0.15 |
| rs28582401 |  | chr19 | 48807366 | 48807367 | L | synonymous variant | 0.15 |
| rs2617667 |  | chr19 | 53993669 | 53993670 | A/T | missense variant | 0.30 |
| rs17373408 |  | chr20 | 31624299 | 31624300 | S | synonymous variant | 0.07 |
| rs2070326 |  | chr20 | 31678533 | 31678534 | L | synonymous variant | 0.22 |
| rs3787220 |  | chr20 | 33337750 | 33337751 | P | synonymous variant | 0.80 |
| rs6060043 |  | chr20 | 33364583 | 33364584 |  | intron variant | 0.81 |
| rs4811888 |  | chr20 | 56182182 | 56182183 | Q/E | missense variant | 0.13 |
| rs5939319 |  | chrX | 2700156 | 2700157 | D/N | missense variant | 0.04 |
| rs7049300 |  | chrX | 5821785 | 5821786 | T | synonymous variant | 0.12 |
| rs6150 |  | chrX | 30327366 | 30327367 | C | synonymous variant | 0.10 |
| rs5952285 |  | chrX | 44913051 | 44913052 |  | intron variant | 0.24 |
| rs5952682 |  | chrX | 44966794 | 44966795 |  | intron variant | 0.24 |
Validation of the IGs highlighted by our machine learning approach with the help of the 5 autism-related databases (AutismKB, SFARI, HuVarBase, DisGeNET and OpenTargets)
| Severity | Novel | Known (5 database validation set) |
|---|---|---|
| Mild | ||
| Severe | ||
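The Novel/Known split in this table amounts to checking each candidate gene against the union of the five databases. A minimal sketch of that check follows. The database names come from the caption, but the gene symbols (`GENE_A`, `GENE_B`, and so on) are placeholders invented for illustration, the function name is hypothetical, and how the authors actually queried each resource is not described here.

```python
def split_known_novel(candidate_genes, databases):
    """Partition candidate genes by whether any reference database lists them."""
    known_anywhere = set().union(*databases.values())
    candidates = set(candidate_genes)
    return {
        "known": sorted(candidates & known_anywhere),
        "novel": sorted(candidates - known_anywhere),
    }

# Placeholder contents (assumption): real gene sets would be exported from each database
reference_sets = {
    "AutismKB": {"GENE_A", "GENE_B"},
    "SFARI": {"GENE_B"},
    "HuVarBase": set(),
    "DisGeNET": {"GENE_C"},
    "OpenTargets": {"GENE_A"},
}
result = split_known_novel(["GENE_A", "GENE_C", "GENE_X"], reference_sets)
```

Keeping the per-database sets separate, rather than merging them up front, also makes it straightforward to report which resource supports each known gene.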
Fig. 5 Common and distinct selected genes harboring variants (candidate mutations) with possible contribution to ASD risk, based on severity across distinct ASD subgroups (phenotypes)
Validation of the IGs highlighted by our literature-based approach with the help of the 5 autism-related databases (AutismKB, SFARI, HuVarBase, DisGeNET and OpenTargets). Results highlighted in bold were found in SFARI
| INTERSECTION OF SUBPHENOTYPES AND 5 DATABASES | |||||
|---|---|---|---|---|---|
| SEVERITY ↓ / SUBPHENOTYPE → | IQ | MEMORY | VERBAL | ATTENTION | LANGUAGE |
| INDEPENDENT SEVERITY | |||||
| SEVERE | |||||
| MILD | |||||
Fig. 6 Functional analysis of the genes discovered by our machine learning approach for severe and mild autism. The figure shows individual gene participation in specific Autism Mechanisms (AMs)
Fig. 7 Functional analysis of the genes discovered by our literature-based approach for severe and mild autism, broken down by specific clinical sub-phenotypes. The figure shows individual gene participation in specific Autism Mechanisms (AMs)