Predicting Autism Spectrum Disorder Using Blood-based Gene Expression Signatures and Machine Learning.

Dong Hoon Oh, Il Bin Kim, Seok Hyeon Kim, Dong Hyun Ahn.

Abstract

OBJECTIVE: The aim of this study was to identify a transcriptomic signature that could be used to classify subjects with autism spectrum disorder (ASD) compared to controls on the basis of blood gene expression profiles. The gene expression profiles could ultimately be used as diagnostic biomarkers for ASD.
METHODS: We used the published microarray data (GSE26415) from the Gene Expression Omnibus database, which included 21 young adults with ASD and 21 age- and sex-matched unaffected controls. Nineteen differentially expressed probes were identified from a training dataset (n=26, 13 ASD cases and 13 controls) using the limma package in R language (adjusted p value <0.05) and were further analyzed in a test dataset (n=16, 8 ASD cases and 8 controls) using machine learning algorithms.
RESULTS: Hierarchical cluster analysis showed that subjects with ASD were relatively well discriminated from controls. With both the support vector machine and K-nearest neighbors algorithms, validation of the 19 differentially expressed (DE) probes on the test dataset yielded an overall class prediction accuracy of 93.8%, with a sensitivity of 100% and a specificity of 87.5%.
CONCLUSION: The results of our exploratory study suggest that the gene expression profiles identified from the peripheral blood samples of young adults with ASD can be used to identify a biological signature for ASD. Further study using a larger cohort and more homogeneous datasets is required to improve the diagnostic accuracy.

Keywords:  Autism spectrum disorder; Blood; Decision support techniques; Machine learning; Microarray analysis; Transcriptome

Year:  2017        PMID: 28138110      PMCID: PMC5290715          DOI: 10.9758/cpn.2017.15.1.47

Source DB:  PubMed          Journal:  Clin Psychopharmacol Neurosci        ISSN: 1738-1088            Impact factor:   2.582


INTRODUCTION

Autism spectrum disorders (ASDs) are devastating neurodevelopmental disorders characterized by deficits in social communication and interaction across multiple contexts as well as restricted, repetitive patterns of interests and behavior. The Centers for Disease Control recently reported that the prevalence of ASD in the United States has risen to approximately 1 in 68 and that most children are not diagnosed with ASD until after 4 years of age.1) Because early intensive behavioral and developmental interventions for toddlers and children with autism could improve outcomes,2) there is a scientific need for reliable diagnostic ASD biomarkers that are expressed early in life. Such markers could have a significant impact on diagnosis and treatment.
Although the complex etiologies of ASD are poorly understood, the high heritability of ASD is supported by high concordance rates (from 36% to 95%) in monozygotic twins and higher recurrence risks of 11% and 19% with single-sibling involvement.3–5) Rapid advances in clinical genetic testing technology have increased the diagnostic yield from about 10% a few years ago to about 30%.6) However, because many of these genetic variants show incomplete penetrance and variable phenotypic expression,7) the use of gene expression signature biomarkers may be informative and provide the best model for identifying ASD cases.
In particular, four studies have investigated blood-derived gene expression signatures to differentiate between ASD individuals (toddlers and children) and unaffected controls.8–11) These studies focused on individuals with a mean age of 2.2 to 9.6 years who were at risk for ASD and reported relatively high predictive accuracies (between 68% and 91%). To date, no study has demonstrated diagnostic prediction using blood-derived gene expression signatures in adult subjects with ASD. Accordingly, whether the gene expression profiles of adult individuals offer information about the ASD risk remains a critical question.
The aim of this study is to apply a transcriptomic approach to identify a gene expression signature with promising performance in the diagnostic prediction of young adults with ASD. Here, we used a published ASD microarray dataset to test the hypothesis. These methods provide researchers with the opportunity to test hypotheses without performing time-consuming, labor-intensive bench work.

METHODS

Acquisition of the Microarray Data

A publicly available microarray dataset (GSE26415), which was deposited by Kuwano et al.,13) was downloaded from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) database.12) The original data included 21 samples of peripheral leukocytes obtained from young adults with ASD and age- and sex-matched controls as well as from healthy women with children with ASD and matched controls. The pre-existing clinical diagnoses of ASD were made by clinicians experienced in child psychiatry and developmental pediatrics according to the Diagnostic and Statistical Manual of Mental Disorders, 4th edition, text revision (DSM-IV-TR). To corroborate the ASD diagnosis, the Japanese version of the Autism Spectrum Quotient was completed.13) The platform was GPL6480 (Agilent-014850 Whole Human Genome Microarray 4x44K G4112F). In this study, we used 42 microarrays from subjects with ASD (n=21) and their matched controls (n=21) for further analyses. The demographic and clinical characteristics of the study subjects are summarized in Table 1.
Table 1

Demographic and clinical characteristics of study subjects

| Characteristic | ASD (n=21) | Control (n=21) |
|---|---|---|
| Demographic | | |
| Gender, male/female | 17/4 | 17/4 |
| Age (yr) | 26.7 (5.5) | 27.0 (5.5) |
| Clinical | | |
| AQ | 30.2 (5.1) | NA |
| WAIS VIQ | 96.2 (19.9) | NA |
| WAIS PIQ | 87.5 (21.6) | NA |
| WAIS FIQ | 91.9 (21.6) | NA |

Values are presented as number only or mean (standard deviation).

ASD, autism spectrum disorder; AQ, autism spectrum quotient; WAIS, Wechsler Adult Intelligence Scale; IQ, intelligence quotients; VIQ, verbal IQ; PIQ, performance IQ; FIQ, full IQ; NA, not applied.

Data Preprocessing and Selection of Differentially Expressed Genes

The raw data in .CEL format were processed primarily in the R language (http://www.r-project.org/)14) using the “limma” package.15) The datasets were imported into R using the “read.maimages” function, and the “normexp” method was used for background correction. The background-corrected data were log-transformed and normalized using the quantile method (Supplementary Fig. 1). Probes were then filtered against a brightness threshold, defined as the 95th percentile of the negative control probes on each array. Control probes and low-expression probes were filtered out when the probes at one-third of the total arrays were 10% less bright than this threshold. The “avereps” function was used to average the replicate spots on each array. Differentially expressed (DE) probes were identified using the moderated t-test from the limma package. p values were adjusted for multiple testing with the Bonferroni correction, and probes were called significant when the adjusted p value was <0.05.
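The Bonferroni step described above can be illustrated with a minimal sketch. This is not the authors' R code: the probe names, raw p values, and the number of tests (10,000) are hypothetical values chosen for illustration. The rule itself (multiply each raw p value by the number of tests and cap the result at 1) is the same one applied by R's `p.adjust` with `method = "bonferroni"`.

```python
def bonferroni_adjust(p_values, n_tests=None):
    """Multiply each raw p value by the number of tests, capped at 1."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]


def significant_probes(probe_ids, p_values, n_tests, alpha=0.05):
    """Return (probe, adjusted p) pairs whose adjusted p value is below alpha."""
    adjusted = bonferroni_adjust(p_values, n_tests)
    return [(pid, adj) for pid, adj in zip(probe_ids, adjusted) if adj < alpha]


# Hypothetical probes and p values, for illustration only.
probes = ["probe_a", "probe_b", "probe_c"]
raw_p = [1e-7, 1e-5, 0.02]

# Assume 10,000 probes were tested after filtering (an illustrative number).
hits = significant_probes(probes, raw_p, n_tests=10_000)
print(hits)
```

Because the correction scales with the number of tests, the filtering step that removes control and low-expression probes also determines how small a raw p value must be to pass the adjusted 0.05 threshold.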

Development of a Prediction Model Using a Machine Learning Algorithm

We applied machine learning to develop a prediction model that used DE probes extracted from the training set to differentiate between individuals with ASD and controls in the test set. Our strategy included two main types of machine learning, unsupervised and supervised learning. For unsupervised learning, we adopted hierarchical cluster analysis using complete linkage and the Euclidean distance. Cluster analysis and visualization were performed using the “heatmap.2” function in the “gplots” package16) in R. For supervised learning, we used three different machine learning algorithms, namely the support vector machine (SVM),17) K-nearest neighbors (KNN)18) and linear discriminant analysis (LDA).19)
We performed the prediction analysis in the following sequential steps. After fixing the random seed with the “set.seed” function in R, we randomly divided our data (n=42) into a training dataset (13 ASD and 13 control subjects) and a test dataset (8 ASD and 8 control subjects). Each algorithm was trained on the 26 randomly selected samples of the training dataset, using the DE probes as features. The trained models were then validated on the 8 ASD and 8 control subjects in the test dataset. All supervised machine learning analyses were performed using the “MLInterfaces” package20) in the R language. Supplementary Figure 2 briefly describes the study design.
The protocol of this study was reviewed and approved by the institutional review board of Hanyang University Hospital (HYUH IRB-2015-05-008).
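The split-then-classify procedure can be sketched as follows. This is an illustrative outline rather than the authors' MLInterfaces code: the seed value, the choice of k=3, and the two-dimensional toy points are assumptions, since the text does not report the seed, the KNN neighborhood size, or any tuning parameters.

```python
import math
import random
from collections import Counter


def stratified_split(labels, n_train_per_class, seed):
    """Return (train_idx, test_idx) with a fixed number of training samples per class."""
    rng = random.Random(seed)  # seeded generator, so the split is reproducible
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        train += idx[:n_train_per_class]
        test += idx[n_train_per_class:]
    return sorted(train), sorted(test)


def knn_predict(train_x, train_y, query, k=3):
    """Classify one sample by majority vote among its k nearest training samples."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    votes = Counter(y for _, y in nearest)
    return votes.most_common(1)[0][0]


# 21 ASD and 21 control labels, split 13/13 for training and 8/8 for testing.
labels = ["ASD"] * 21 + ["control"] * 21
train_idx, test_idx = stratified_split(labels, n_train_per_class=13, seed=1)
print(len(train_idx), len(test_idx))

# Toy two-feature expression values (hypothetical, not study data).
toy_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
toy_y = ["control"] * 3 + ["ASD"] * 3
print(knn_predict(toy_x, toy_y, (0.95, 1.0), k=3))
```

Splitting within each class keeps the training and test sets balanced, which matches the 13/13 and 8/8 design described above.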

RESULTS

Altered Gene Expression Profiling between the ASD and Control Groups

In comparing microarray data for the subjects with ASD (n=13) with those of unaffected controls (n=13) in the training dataset, a total of 19 DE probes were identified (adjusted p value <0.05), including 6 up-regulated probes and 13 down-regulated probes (Supplementary Fig. 3). Among the 19 probes, 15 were annotated as gene symbols using the Bioconductor “hgug4112a.db” package.21) Ten of these genes (or loci) had previously reported associations with ASD (Table 2).
Table 2

Nineteen probes significantly dysregulated* in the ASD training sample compared with the unaffected control’s training sample

| Probe ID | Gene symbol | Gene name | Location | logFC | p value | Adjusted p value | Evidence for association with ASD: type | Evidence for association with ASD: number of reports |
|---|---|---|---|---|---|---|---|---|
| A_32_P9963 | HSF2 | Heat shock transcription factor 2 | 6q22.31 | −0.5288 | 3.99E-08 | 0.0007 | Deletion-duplication of 6q22.31 | 20 |
| A_24_P391104 | RFX1 | Regulatory factor X, 1 (influences HLA class II expression) | 19p13.1 | 0.5379 | 1.16E-07 | 0.0021 | Deletion of 19p13.13-p13.11 | 2 |
| A_23_P214037 | NPM1 | Nucleophosmin (nucleolar phosphoprotein B23, numatrin) | 5q35.1 | −0.5139 | 4.34E-07 | 0.0079 | | |
| A_32_P184330 | <NA> | | | 0.5664 | 4.39E-07 | 0.0080 | | |
| A_24_P832113 | NPM1 | Nucleophosmin (nucleolar phosphoprotein B23, numatrin) | 5q35.1 | −0.4808 | 4.79E-07 | 0.0088 | | |
| A_23_P119683 | MIER2 | Mesoderm induction early response 1, family member 2 | 19p13.3 | 0.3391 | 5.41E-07 | 0.0099 | | |
| A_23_P162807 | MRPS31 | Mitochondrial ribosomal protein S31 | 13q14.11 | −0.4422 | 1.13E-06 | 0.0206 | Deletion-duplication of 13q14.11 | 15 |
| A_23_P88439 | TC2N | Tandem C2 domains, nuclear | 14q32.12 | −0.4056 | 1.25E-06 | 0.0228 | Deletion of 14q32.11-q32.13 | 1 |
| A_32_P188674 | NPM1 | Nucleophosmin (nucleolar phosphoprotein B23, numatrin) | 5q35.1 | −0.4742 | 1.30E-06 | 0.0237 | | |
| A_23_P399501 | PKM | Pyruvate kinase, muscle | 15q22 | 0.4023 | 1.32E-06 | 0.0242 | | |
| A_32_P46765 | C12orf29 | Chromosome 12 open reading frame 29 | 12q21.32 | −0.4320 | 1.38E-06 | 0.0252 | Deletion of 12q21.31-q21.33 | 1 |
| A_24_P927883 | JADE2 | Jade family PHD finger 2 | 5q31.1 | −0.3695 | 1.52E-06 | 0.0278 | | |
| A_23_P131676 | ACKR3 | Atypical chemokine receptor 3 | 2q37.3 | −0.4413 | 1.66E-06 | 0.0303 | Deletion of 2q37.1-q37.3 | 3 |
| A_23_P84154 | ARHGAP15 | Rho GTPase activating protein 15 | 2q22.2-q22.3 | −0.3958 | 1.69E-06 | 0.0309 | Rare single gene variant | 6 |
| A_23_P250462 | ATP6AP1 | ATPase, H+ transporting, lysosomal accessory protein 1 | Xq28 | 0.3399 | 2.25E-06 | 0.0412 | Deletion-duplication of Xq27.1-q28 | 1 |
| A_23_P322593 | TAPT1-AS1 | TAPT1 antisense RNA 1 | 4p15.32 | −0.3650 | 2.28E-06 | 0.0417 | Deletion of 4p16.3-p15.32 | 2 |
| A_32_P173058 | TMEM41B | Transmembrane protein 41B | 11p15.4 | −0.3388 | 2.40E-06 | 0.0438 | Deletion-duplication of 11p15.4 | 28 |
| A_23_P117424 | DCAF11 | DDB1 and CUL4 associated factor 11 | 14q11.2 | 0.3174 | 2.54E-06 | 0.0464 | Duplication of 14q11.2-q21.1 | 1 |
| A_32_P90685 | <NA> | | | −0.4853 | 2.63E-06 | 0.0481 | | |

ASD, autism spectrum disorder; ID, intellectual disability; NA, not applied; logFC, log2 of fold change.

*Adjusted p values <0.05. Adjusted p values were calculated with the Bonferroni correction. Evidence for association with ASD is from the Simons Foundation Autism Research Initiative (SFARI) Gene 2.0 database (available at http://gene.sfari.org).
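As a reading aid for Table 2, the short sketch below tallies the direction of change from the sign of logFC and converts a log2 fold change to a linear fold change. The logFC values are copied from Table 2; the helper name `fold_change` is introduced here for illustration only.

```python
# logFC values copied from Table 2 (log2 fold change, ASD vs. control).
log_fc = {
    "A_32_P9963": -0.5288, "A_24_P391104": 0.5379, "A_23_P214037": -0.5139,
    "A_32_P184330": 0.5664, "A_24_P832113": -0.4808, "A_23_P119683": 0.3391,
    "A_23_P162807": -0.4422, "A_23_P88439": -0.4056, "A_32_P188674": -0.4742,
    "A_23_P399501": 0.4023, "A_32_P46765": -0.4320, "A_24_P927883": -0.3695,
    "A_23_P131676": -0.4413, "A_23_P84154": -0.3958, "A_23_P250462": 0.3399,
    "A_23_P322593": -0.3650, "A_32_P173058": -0.3388, "A_23_P117424": 0.3174,
    "A_32_P90685": -0.4853,
}


def fold_change(log2_fc):
    """Convert a log2 fold change to a linear fold change (ratio)."""
    return 2.0 ** log2_fc


# Positive logFC means higher expression in ASD; negative means lower.
up = [probe for probe, value in log_fc.items() if value > 0]
down = [probe for probe, value in log_fc.items() if value < 0]
print(len(up), "up-regulated;", len(down), "down-regulated")

# Linear fold change for the RFX1 probe (logFC 0.5379).
print(round(fold_change(log_fc["A_24_P391104"]), 2))
```

On the linear scale, the largest changes in the table correspond to roughly a 1.45-fold increase or a 0.7-fold decrease, so the signature rests on modest but consistent shifts rather than large expression differences.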

Unsupervised Machine Learning

Using the 19-probe expression signature, a hierarchical cluster analysis of all samples (n=42) showed that ASDs were relatively well discriminated from controls (with the sorting of three ASD cases into the control group), suggesting that these probes could be helpful for differentiating between ASDs and controls. Detailed results from the hierarchical cluster analysis are presented in Figure 1.
Fig. 1

Heat-map overview of the two-way hierarchical clustering analysis of 19 differentially-expressed probes. Each row represents the relative levels of expression for a single probe. The red or green color indicates relatively high or low expression, respectively. In the sample clustering dendrogram, red indicates autism spectrum disorder samples while blue indicates control samples.
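For readers unfamiliar with complete-linkage clustering, the sketch below shows the merging rule on toy data. It is a simplified stand-in for the “heatmap.2” analysis: it clusters samples only (the heat map also clusters probes), stops at a fixed number of clusters instead of building a full dendrogram, and uses hypothetical two-dimensional points rather than the 19-probe expression values.

```python
import math


def complete_linkage(points, n_clusters):
    """Agglomerative clustering with complete linkage and Euclidean distance.

    At each step, merge the two clusters whose farthest pair of members
    is closest, until n_clusters remain. Returns lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        candidates = [
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(candidates)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Toy samples (hypothetical): two well-separated groups of three.
samples = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
print(complete_linkage(samples, n_clusters=2))
```

Complete linkage tends to produce compact clusters, so a sample that sits between the two groups can be absorbed by either one, which is one way a few ASD cases could end up sorted with controls.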

Supervised Machine Learning

For the supervised machine learning algorithms, we simply built a classifier using the 19-probe expression signature and assessed its predictive performance. With this 19-probe prediction model, the test dataset was used to validate the prediction of ASD. This validation test revealed that our prediction model successfully distinguished between the individuals with ASD and controls. Both SVM and KNN analysis accurately identified 8 individuals with ASD and 8 controls with the exception of classifying one control as ASD, resulting in a predictive accuracy of 93.8% (sensitivity of 100% and specificity of 87.5%). However, in the LDA analysis, the diagnostic prediction of ASD vs. control samples was 68.8% accurate (Table 3).
Table 3

Prediction performances of the 19-probe set on the test (validation) set, according to machine learning algorithms

| Algorithm | Accuracy (%) | Sensitivity (%) | Specificity (%) | Positive predictive value (%) | Negative predictive value (%) |
|---|---|---|---|---|---|
| SVM | 93.8 | 100.0 | 87.5 | 88.9 | 100.0 |
| KNN | 93.8 | 100.0 | 87.5 | 88.9 | 100.0 |
| LDA | 68.8 | 62.5 | 75.0 | 71.4 | 66.7 |

SVM, support vector machine; KNN, K-nearest neighbor; LDA, linear discriminant analysis.
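The percentages in Table 3 follow directly from the confusion-matrix counts on the 16 test samples, as the sketch below shows. The SVM/KNN counts (8 true positives, 7 true negatives, 1 false positive) come from the text; the LDA counts (5 true positives, 3 false negatives, 6 true negatives, 2 false positives) are not stated in the text and are inferred here from the reported percentages.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity, PPV and NPV as percentages."""
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),  # ASD cases correctly identified
        "specificity": 100.0 * tn / (tn + fp),  # controls correctly identified
        "ppv": 100.0 * tp / (tp + fp),          # predicted ASD that are ASD
        "npv": 100.0 * tn / (tn + fn),          # predicted control that are control
    }


# SVM and KNN: all 8 ASD cases detected, one of 8 controls called ASD.
svm_knn = diagnostic_metrics(tp=8, fn=0, tn=7, fp=1)

# LDA: counts inferred from the reported percentages (an assumption).
lda = diagnostic_metrics(tp=5, fn=3, tn=6, fp=2)

for name, metrics in (("SVM/KNN", svm_knn), ("LDA", lda)):
    print(name, {key: round(value, 1) for key, value in metrics.items()})
```

With only 16 test samples, each misclassified subject shifts accuracy by 6.25 percentage points, which is worth keeping in mind when comparing the algorithms.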

DISCUSSION

Our analyses were designed to validate a potential biological signature using peripheral blood microarray data obtained from young Asian adults with ASD in combination with machine learning algorithms. In this exploratory study using previously published microarray data,13) we identified a blood-based gene expression signature that reliably identified young adults with ASD. These results are consistent with the findings of four previous studies that reported gene expression signatures with high diagnostic accuracy for toddlers and children with ASD.8–11) The results of this and the four previous studies suggest that gene expression profiles from peripheral blood samples contain a biological signature that could be used to predict the ASD risk in both children and young adults. According to several studies of healthy adults, the expression of most genes within individuals remains temporally stable, and only 1% to 2% of genes display significant changes over time periods of at least one month.22,23) In addition, previous studies observed that the cognitive, behavioral, and emotional symptoms of individuals with ASD generally persist over time.24,25) Therefore, the gene expression patterns underlying these long-standing phenotypes may remain constant over the transition from childhood to young adulthood.
Gene expression microarrays primarily measure messenger RNA for thousands of identified genes.26) The microarrays specifically evaluate the sequence of DNA that is transcribed to RNA in the genome at a given time. Prediction models using multivariate gene expression have been widely adopted for screening, diagnosis, and prognosis.27,28) Several previous transcriptome-wide studies of gene expression in ASD subjects have used post mortem brain tissue29–31) or peripheral blood samples.8–11,13) Among them, the studies using peripheral blood have shown that RNA expression is disrupted across hundreds of genes in individuals with ASD.
Blood-based analyses of gene expression profiles are encouraging because blood samples are easily obtainable from living individuals and are likely to contain ASD-relevant signatures. Although the connection between blood and brain transcriptomic profiles is not well known, growing evidence suggests that measurements performed in tissues that are not primarily involved in the disease process may uncover disease signatures.10) Sullivan et al.32) have established a shared gene expression profile between whole blood and brain tissues, suggesting that the cautious and thoughtful use of peripheral gene expression may be a useful surrogate for gene expression in the brain. Further research will be required to determine whether the dysregulated signatures in peripheral blood are actual indicators of the brain pathophysiology in ASD. Our results could also provide further evidence of the emerging consensus that peripheral blood is a potential source of biological signatures that are responsible for dysregulation of the brain and other unreachable tissues.33)
The gene list in our study partially overlaps with previously reported candidate genes and loci associations for ASD (Table 2). These various transcriptomic changes would be representative of the genomic alteration in ASD. Blood-derived gene expression studies of subjects with ASD repeatedly demonstrate dysregulation of immune/inflammation genes.34) Expression of regulatory factor X1 (RFX1; a transcription factor regulating a wide variety of genes involved in immunity)35) was significantly increased in the ASD group in our study. Substantial percentages of patients with ASD show peripheral markers of mitochondrial energy metabolism dysfunction.36) We found that mitochondrial ribosomal protein S31 (MRPS31) expression was significantly reduced in the ASD group.
In particular, we identified a probe (A_23_P399501, pyruvate kinase muscle isozyme [PKM]) that had the best ability to detect whether a sample was collected from a patient with ASD (Supplementary Fig. 4). The PKM expression level was significantly higher in ASD subjects than in controls. Pyruvate kinase is an enzyme involved in glycolysis. Its primary function is to catalyze the transfer of a phosphate group from phosphoenolpyruvate to adenosine diphosphate as the last step of glycolysis, generating one molecule of pyruvate and one molecule of adenosine triphosphate.37) A previous study also demonstrated that plasma pyruvate levels were higher in children with autism than in controls.38) These results suggest that the PKM expression level in peripheral blood may serve as a biomarker to distinguish ASD from controls.
Our study has several limitations, mostly stemming from the small sample size and the lack of phenotypic information in the original data. In particular, most of the ASD subjects in this study had intelligence quotients (IQ) in the normal range (mean full-scale IQ, 91.9), which probably does not represent the broader ASD population. In addition, the connection between peripheral blood and brain transcriptomic profiles and the influence of age on gene expression in subjects with ASD are not well understood. The results of our study should therefore be interpreted cautiously. If further analysis is performed on a more homogeneous dataset and validated in an independent, large cohort of cases and controls, the accuracy of the results may improve. These strategies for class prediction analyses will help identify robust biomarkers for both the diagnosis of ASD and individualized treatment options for patients and their families.39)
In conclusion, this study reveals a blood-based gene expression signature that has promising accuracy in distinguishing between young adults with ASD and age- and sex-matched unaffected controls.
The ability of the 19 DE probes to correctly predict ASD samples compares favorably with the results of four previous studies on ASD diagnosis in toddlers and children. This transcriptomics approach may shed light on an important aspect of clinical biomarker discovery, offering high predictive accuracy for detecting ASD.
  32 in total

1.  Analysis of blood-based gene expression signature in first-episode psychosis.

Authors:  Jimmy Lee; Liang-Kee Goh; Gengbo Chen; Swapna Verma; Chay-Hoon Tan; Tih-Shih Lee
Journal:  Psychiatry Res       Date:  2012-04-13       Impact factor: 3.222

2.  Gene expression profiling predicts clinical outcome of breast cancer.

Authors:  Laura J van 't Veer; Hongyue Dai; Marc J van de Vijver; Yudong D He; Augustinus A M Hart; Mao Mao; Hans L Peterse; Karin van der Kooy; Matthew J Marton; Anke T Witteveen; George J Schreiber; Ron M Kerkhoven; Chris Roberts; Peter S Linsley; René Bernards; Stephen H Friend
Journal:  Nature       Date:  2002-01-31       Impact factor: 49.962

3.  The stability of cognitive and behavioral parameters in autism: a twelve-year prospective study.

Authors:  B J Freeman; B Rahbar; E R Ritvo; T L Bice; A Yokota; R Ritvo
Journal:  J Am Acad Child Adolesc Psychiatry       Date:  1991-05       Impact factor: 8.829

4.  Mitochondrial dysfunction in autism.

Authors:  Cecilia Giulivi; Yi-Fan Zhang; Alicja Omanska-Klusek; Catherine Ross-Inta; Sarah Wong; Irva Hertz-Picciotto; Flora Tassone; Isaac N Pessah
Journal:  JAMA       Date:  2010-12-01       Impact factor: 56.272

5.  Prediction of autism by translation and immune/inflammation coexpressed genes in toddlers from pediatric community practices.

Authors:  Tiziano Pramparo; Karen Pierce; Michael V Lombardo; Cynthia Carter Barnes; Steven Marinero; Clelia Ahrens-Barbeau; Sarah S Murray; Linda Lopez; Ronghui Xu; Eric Courchesne
Journal:  JAMA Psychiatry       Date:  2015-04       Impact factor: 21.596

6.  Sibling recurrence and the genetic epidemiology of autism.

Authors:  John N Constantino; Yi Zhang; Thomas Frazier; Anna M Abbacchi; Paul Law
Journal:  Am J Psychiatry       Date:  2010-10-01       Impact factor: 18.112

Review 7.  A systematic review of early intensive intervention for autism spectrum disorders.

Authors:  Zachary Warren; Melissa L McPheeters; Nila Sathe; Jennifer H Foss-Feig; Allison Glasser; Jeremy Veenstra-Vanderweele
Journal:  Pediatrics       Date:  2011-04-04       Impact factor: 7.124

8.  Disruption of cerebral cortex MET signaling in autism spectrum disorder.

Authors:  Daniel B Campbell; Rosanna D'Oronzio; Krassi Garbett; Philip J Ebert; Karoly Mirnics; Pat Levitt; Antonio M Persico
Journal:  Ann Neurol       Date:  2007-09       Impact factor: 10.422

9.  Integrated analysis of whole-exome sequencing and transcriptome profiling in males with autism spectrum disorders.

Authors:  Marta Codina-Solà; Benjamín Rodríguez-Santiago; Aïda Homs; Javier Santoyo; Maria Rigau; Gemma Aznar-Laín; Miguel Del Campo; Blanca Gener; Elisabeth Gabau; María Pilar Botella; Armand Gutiérrez-Arumí; Guillermo Antiñolo; Luis Alberto Pérez-Jurado; Ivon Cuscó
Journal:  Mol Autism       Date:  2015-04-15       Impact factor: 7.509

10.  Characteristics and predictive value of blood transcriptome signature in males with autism spectrum disorders.

Authors:  Sek Won Kong; Christin D Collins; Yuko Shimizu-Motohashi; Ingrid A Holm; Malcolm G Campbell; In-Hee Lee; Stephanie J Brewster; Ellen Hanson; Heather K Harris; Kathryn R Lowe; Adrianna Saada; Andrea Mora; Kimberly Madison; Rachel Hundley; Jessica Egan; Jillian McCarthy; Ally Eran; Michal Galdzicki; Leonard Rappaport; Louis M Kunkel; Isaac S Kohane
Journal:  PLoS One       Date:  2012-12-05       Impact factor: 3.240
