Literature DB >> 32176083

Identification of a robust non-coding RNA signature in diagnosing autism spectrum disorder by cross-validation of microarray data from peripheral blood samples.

Wei Cheng1, Shanhu Zhou, Jinxia Zhou, Xijia Wang.   

Abstract

Novel molecular signatures are needed to improve the early and accurate diagnosis of autism spectrum disorder (ASD), and indicate physicians to provide timely intervention. This study aimed to identify a robust blood non-coding RNA (ncRNA) signature in diagnosing ASD. One hundred eighty six blood samples in the microarray dataset were randomly divided into the training set (n = 112) and validation set (n = 72). Then, the microarray probe expression profile was re-annotated into the expression profile of 4143 ncRNAs though probe sequence mapping. In the training set, least absolute shrinkage and selection operator (LASSO) penalized generalized linear model was adopted to identify the 20-ncRNA signature, and a diagnostic score was calculated for each sample according to the ncRNA expression levels and the model coefficients. The score demonstrated an excellent diagnostic ability for ASD in the training set (area under receiver operating characteristic curve [AUC] = 0.96), validation set (AUC = 0.97) and the overall (AUC = 0.96). Moreover, the blood samples of 23 ASD patients and 23 age- and gender-matched controls were collected as the external validation set, in which the signature also showed a good diagnostic ability for ASD (AUC = 0.96). In subgroup analysis, the signature was also robust when considering the potential confounders of sex, age, and disease subtypes. In comparison with a 55-gene signature deriving from the same dataset, the ncRNA signature showed an obviously better diagnostic ability (AUC: 0.96 vs 0.68, P < .001). In conclusion, this study identified a robust blood ncRNA signature in diagnosing ASD, which might help improve the diagnostic accuracy for ASD in clinical practice.

Entities:  

Mesh:

Substances:

Year:  2020        PMID: 32176083      PMCID: PMC7220435          DOI: 10.1097/MD.0000000000019484

Source DB:  PubMed          Journal:  Medicine (Baltimore)        ISSN: 0025-7974            Impact factor:   1.817


Introduction

Autism spectrum disorder (ASD) is a heterogeneous set of neurodevelopmental diseases, characterized by deficits in social communication and verbal/nonverbal interaction, as well as restricted and repetitive patterns of interests and behaviors. It has a high prevalence of approximately 0.3% to 1.2%, with 3 main subtypes of autistic disorder, Asperger disorder and pervasive developmental disorder-not otherwise specified (PDD-NOS).[ Despite of onsite before 3 years old, most children are diagnosed with ASD after 4 years old.[ Early intensive behavioral interventions could improve the outcomes (e.g., language skills, cognitive performance, and adaptive behavior skills) in some young children with ASD.[ Thus, it has a critical need in clinical practice to increase the diagnostic accuracy for ASD. As functional RNA molecules, non-coding RNAs (ncRNAs) are transcribed from DNA but not translated into proteins, including the types of long non-coding RNA (lncRNA), pseudogene, small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), ribosomal RNA (rRNA), miscellaneous RNA (miscRNA), and so on. ncRNAs have been reported in the pathogenesis of ASD, and aberrant expression of ncRNAs was detected in peripheral blood of ASD patients.[ With great advances in genetic detection, gene expression profiles are available to identify novel and robust biomarkers. In this study, we obtain the ncRNA expression profiles through microarray probe re-annotation, and identify and validate a blood ncRNA signature for the diagnosis of ASD.

Methods

Data preparation

The database of Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/) was a public functional genomics data repository of array- and sequence-based data, and users could query and download experiments and curated gene expression profiles. In this database, we searched ASD-related datasets of gene expression profiles from inception to June 2019, using the key words including: (“autism spectrum disorder” OR “Asperger” OR “autis∗” OR “pervasive developmental disorder” OR “childhood disintegrative disorder”). Datasets were included if meeting the following criteria: detected blood gene expression profiles (probe-tabulated) of both ASD cases and healthy controls (HC); availability of clinical data and corresponding sequences of the microarray probes; the sample size was large enough. Then, the tab-delimited expression value-matrix table was downloaded and log2-transformed. This study was approved by the ethnic committee of Puren Hospital Affiliated to Wuhan University of Science and Technology.

Probe re-annotation

First, we obtained the microarray probe sequences from the Affymetrix product website (http://www.affymetrix.com), as well as the human genome sequences (GRCh38.p12) and comprehensive gene annotation from the GENCODE database (https://www.gencodegenes.org).[ Then, the probe sequences were aligned to the human genome sequences, using the software of hierarchical indexing for spliced alignment of transcripts (HISAT). No mismatches between probe sequence and reference sequence were allowed. Alignments were stored in SAM format where all alternative best hits were included. Among the successful alignments to the reference genome, we only included the transcripts with at least 1 alignment, each of which was also matched to only 1 transcript. When multiple probes matched to an identical gene, the average expression value across these probes was calculated to represent the corresponded gene. ncRNA expression profiles were extracted after excluding protein-coding RNAs (protein-coding and nonsense mediated decay) according to the RNA types.

ncRNA signature construction, evaluation, and validation

The samples were randomly divided into the training set and validation set according to the ratio of 6:4. In the training set, a least absolute shrinkage and selection operator (LASSO) penalized generalized linear model was adopted to identify significant ncRNAs.[ The penalty parameter was estimated by 10-fold cross-validation at 1 standard error (SE) beyond the minimum partial likelihood deviance. Then, the coefficients of significant ncRNAs in the model were extracted to calculate a diagnostic score for each sample in the training set, validation set, and the overall. In receiver operating characteristic (ROC) curve analysis, area under ROC curve (AUC) was calculated to evaluate the diagnostic ability of the signature. Moreover, subgroup analysis was conducted on sex, age, and disease subtypes to assess the diagnostic stability of the signature, and DeLong test for 2 ROC curves was performed to investigate the difference between subgroups. Finally, we also compared the ncRNA signature with a 55-gene signature which derived from the same dataset.[

External validation

The blood samples of 23 ASD and 23 age- and sex-matched controls were obtained in Puren Hospital from September 2015 to July 2017. Written informed consent was provided before sample collection, and the present study protocol was approved by the ethnic committee of Puren Hospital. Then, total RNA was extracted using TRIzol reagent (Invitrogen, Waltham, MA, USA), and stored at −80°C. The RNA concentration and purity were measured by the NanoDrop spectrophotometer (Thermo Fisher, Waltham, MA, USA). Total RNA was synthesized into first-strand cDNA using fluorescent-labeled dNTPs (Thermo Fisher, Waltham, MA, USA), before hybridization with a customized microarray which tailed and fixed 20 ncRNA probes (CapitalBio, China). The probe sequences were as follows: RNU1-16P: GGGACTATGTTCGTGTTCTCTCCTG RNU6-258P: AAGTCGTGAAATAGTCCATATGTTA RNU6-485P: CCCTGTGCAAGGATGATATGCAAAT RNU6-549P: GCTCACTTCAGTGGTACATATACTA RNVU1-15: GAACTCGACTGCATAACTTGTGATA RNA5SP160: CTATGGCCATAAAACCCTGAACTTG IGHV3-47: TCTCCAGAGACAACGCCAAGAAGTC TUBB2BP1: CATGCCCTCACCCAAGGTGTCAGAC CHST9-AS1: GGAAGTCTTCAGTATCTGTACAACT AC074183.2: TGCCCAGGAGAAAGGCTGAAGGACA AC136940.3: TTAGATTACGCTTGGCTTCTTTTGA RP4-646B12.2: GAAAAAAGTGTGAATCAGTCACTAC RP11-90M2.3: ATTGGTGGGTTTTGATGCCCAGTGA RN7SL132P: AGCCCAGGAGCTCATAGTGCGCTAT AC074117.12: CTGGCGTGGCAGAATACCTCTTTGA AC069513.3: GCAGCGCCTCTTCCGACAGCCCCCA RNY1P11: AAAGGGAGTGAGTTATCTCATTTGA RP11-162A23.5: CCCACCTCCCCTACCAAAGCCCATA MYCL2: GACCGCGACTCGTACCATCACTATT RNU105B: AGGTCACTCTCTCCCCAGGCTGTGT Finally, the ncRNA expression levels were detected by the GenePix microarray scanner (Axon Instrument, Union City, CA, USA), and a diagnostic score was calculated for each sample according to the signature formula.

Statistical analysis

All statistical analyses were conducted using R 3.6.0 software (The R Foundation, MA, USA). The generalized linear model was constructed with glmnet 2.0–18 package, and ROC curve analysis was performed with ROCR 1.0–7 package. A 2-sided P value < .05 was considered statistically significant.

Results

Characteristic of included dataset

The included microarray dataset of GSE18123 was based on the platform of GPL6244 (Affymetrix Human Gene 1.0 ST Array [HuGene-1_0-st]), with a total of 104 ASD (80 men [76.9%] and average age 8.1 years [2-21]) and 82 controls (48 men [58.3%] and average age 8.0 years [2-22]). The subtypes of autistic disorder, Asperger disorder, and PDD-NOS accounted for 39.4% (n = 41), 14.4% (n = 15), and 46.2% (n = 48) respectively. Then, 186 samples were randomly divided into the training set (n = 112) and validation set (n = 74).

Data preprocessing and sample clustering

The GPL6244 platform contained 861,493 sequences (25 bases) aligned to 33,297 probes. After probe re-annotation, a total of 8089 RNAs (32 types) were identified with 18,329 specific probes, among which there were 4143 ncRNAs mapped to 10,258 probes. Then, samples were clustered according to the distance in Pearson correlation matrices. When adopted the expression profiles of probes or genes, no outliers were detected (height < 0.2).

Signature construction, evaluation, and validation

One hundred eighty six blood samples in the microarray dataset were randomly divided into the training set (n = 112) and validation set (n = 72) according to the ratio of 6:4. In the training set, the LASSO penalized generalized linear model identified 20 significant ncRNAs, which demonstrated an obvious discrepancy between ASD and control samples (Table 1). A diagnostic score was calculated for each sample according to the ncRNA expression levels weighted by their coefficients in the LASSO model.
Table 1

Twenty non-coding RNAs in the blood diagnostic signature of autism spectrum disorder.

Twenty non-coding RNAs in the blood diagnostic signature of autism spectrum disorder. Diagnostic score = AQP4-AS1∗(–0.169) + FTH1P2∗(–0.070) + FTH1P3∗0.261 + GMCL2∗(–0.097)  + HMGN2P11∗(–0.212) + IGHV3-47∗(–0.024) + MUC20-OT1∗(–0.143) + MYCLP1∗(–0.105) + POLR2KP2∗(–0.272) + RN7SL132P∗0.044 + RNA5SP160∗(–0.184) + RNU105B∗0.200 + RNU1-16P∗0.040 + RNU6-258P∗0.080 + RNU6-485P∗(–0.078) + RNU6-549P∗0.119 + RNVU1-15∗0.114 + RNY1P11∗(–0.160) +  RPS26P39∗(–0.083) + TUBB2BP1∗(–0.075) The score displayed an excellent diagnostic ability for ASD in the training set (AUC = 0.96), validation set (AUC = 0.97), and the overall (AUC = 0.96) (Fig. 1). No significant difference was detected between the women and men in the training set (AUC: 0.96 vs 0.97, P = .890), validation set (AUC: 0.96 vs 1.00, P = .205), and the overall (AUC: 0.95 vs 0.98, P = .231) (Fig. 2). It was also insignificant between the older and younger cases in the training set (AUC: 0.93 vs 0.99, P = .088), validation set (AUC: 0.99 vs 0.95, P = .366), and the overall (AUC: 0.95 vs 0.97, P = .337) (Fig. 3).
Figure 1

Receiver operating characteristic (ROC) curve analysis of the diagnostic signature. AUC = area under ROC curve.

Figure 2

Receiver operating characteristic (ROC) curve analysis of the diagnostic signature in the subgroups of different sexes. AUC = area under ROC curve.

Figure 3

Receiver operating characteristic (ROC) curve analysis of the diagnostic signature in the subgroups of different ages. AUC = area under ROC curve.

Receiver operating characteristic (ROC) curve analysis of the diagnostic signature. AUC = area under ROC curve. Receiver operating characteristic (ROC) curve analysis of the diagnostic signature in the subgroups of different sexes. AUC = area under ROC curve. Receiver operating characteristic (ROC) curve analysis of the diagnostic signature in the subgroups of different ages. AUC = area under ROC curve. As for disease subtypes, the score had a good power in diagnosing autistic disorder (AUC = 0.97 in training set, 0.98 in validation set, and 0.97 in the overall), PDD-NOS (AUC = 0.96 in training set, 0.96 in validation set, and 0.95 in the overall), and Asperger disorder (AUC = 0.87 in training set, 0.96 in validation set, and 0.91 in the overall) (Fig. 4). No significant difference was detected between the subtypes (P > .05).
Figure 4

Receiver operating characteristic (ROC) curve analysis of the diagnostic signature in the subgroups of different disease subtypes. AUC = area under ROC curve.

Receiver operating characteristic (ROC) curve analysis of the diagnostic signature in the subgroups of different disease subtypes. AUC = area under ROC curve. The blood samples were collected from 23 ASD patients (14 men [60.9%] and average age 8.0 years [2-16]), and 23 age- and sex-matched controls. According to the signature formula, the score displayed a good diagnostic ability for ASD (AUC = 0.96) (Fig. 5).
Figure 5

Receiver operating characteristic (ROC) curve analysis of the diagnostic signature in the external validation set. AUC = area under ROC curve.

Receiver operating characteristic (ROC) curve analysis of the diagnostic signature in the external validation set. AUC = area under ROC curve.

Comparison with a 55-gene signature

The previously published 55-gene signature derived from the same dataset.[ In the training set, the 55-gene signature showed no obvious difference than the ncRNA signature (AUC: 0.96 vs 0.96, P = 1.000) (Fig. 6). In the validation set, the 55-gene signature displayed a poorer performance than the ncRNA signature (AUC: 0.71 vs 0.97, P < .001). In general, the ncRNA signature showed a better diagnostic ability than the 55-gene signature (AUC: 0.96 vs 0.68, P < .001).
Figure 6

Receiver operating characteristic (ROC) curve analysis of the 55-gene diagnostic signature. AUC = area under ROC curve.

Receiver operating characteristic (ROC) curve analysis of the 55-gene diagnostic signature. AUC = area under ROC curve.

Discussion

In this study, we adopted the methods of probe re-annotation and penalized generalized linear model to identify a novel and robust blood ncRNA signature in diagnosing ASD. The signature showed an excellent stability in the subgroup analyses of age, sex, and disease subtypes. It also displayed a higher diagnostic efficiency than the 55-gene signature. The ncRNA signature consisted of 20 genes, covering 5 kinds of ncRNAs (lncRNA, pseudogene, miscRNA, rRNA, snoRNA, and snRNA). lncRNAs are untranslated RNA molecules with >200 nucleotides in length, which perform a wide variety of functions and play an important role in the development of mental discords.[ AQP4-AS1 had a moderate to high expression in nervous system (brain, cortex, and cerebellum) (based on the GTEx database). In GWAS Catalog, AQP4-AS1 was associated with the human phenotypes of blood protein measurement, lobe attachment, breast carcinoma, hair color, and glomerular filtration rate. MUC20-OT1 was involved into the phenotypes of mean corpuscular hemoglobin, mean corpuscular volume, mean corpuscular hemoglobin concentration, eosinophil count, iron biomarker measurement, and transferrin measurement. Pseudogenes are gene copies that have lost the original coding ability, and they have been reported in the pathogenesis of ASD.[ IGHV3-47 was associated with the phenotype of blood protein measurement. FTH1P3 was involved into the progression and prognosis of multiple cancers.[ GMCL2 was related with protein binding (GO:0005515). MYCLP1 was associated with the phenotype of alcoholic pancreatitis. POLR2KP2 had a moderate to high expression in nervous system (brain, cortex, and cerebellum). RPS26P39 was associated with the phenotypes of sleep duration, age at menopause, and neutropenia, response to gemcitabine, pancreatic carcinoma. TUBB2BP1 had a moderate to high expression in nervous system (brain, cortex, and cerebellum), which was related with the phenotypes of red blood cell distribution width, mean corpuscular hemoglobin, and Crohn disease. Small ncRNAs are regarded as new class of biomarkers and potential therapeutic targets in neurodegenerative diseases.[ In the signature, snRNAs are most frequent. As the components of spliceosome, snRNAs are fairly conserved with a uridine-rich non-coding sequence of <200nt, and involved into the splicing of precursor mRNA. RNU1-16P had a moderate to high expression in nervous system (brain, cortex, and cerebellum) and whole blood. RNU6-258P was related with intelligence.[ Compared with normally developed children, the intelligence development in ASD children was significantly delayed.[ There were positive correlations between age and mean corpuscular volume, and red cell distribution width in ASD children, and 24.1% cases had iron deficiency and 15.5% had anemia.[ RNU6-549P was associated with the GWAS phenotypes of mathematical ability, self reported educational attainment, coronary artery disease, and factor VII activating protease measurement. Previous studies suggested impaired metacognitive monitoring, mathematics under-achievement, and educational needs in ASD.[ RNVU1-15 had a moderate to high expression in nervous system and whole blood. It was associated with U1 snRNP (GO:0005685), mRNA 5′-splice site recognition (GO:0000395), pre-mRNA 5′-splice site binding (GO:0030627). A growing number of alternative splicing regulators have been reported in relation with ASD.[ The limitations in this study should be also acknowledged. First, the sample size is not as large as we expected. Second, the method of probe re-annotation could not cover all ncRNAs. In the future, a large-scale prospective designed study was needed to validate this ncRNA signature. In conclusion, through probe re-annotation and penalized generalized linear model, we identified a novel and robust blood ncRNA signature in diagnosing ASD, which might help improve the diagnostic accuracy for ASD in clinical practice.

Author contributions

Conceptualization: Wei Cheng. Data curation: Wei Cheng, Jinxia Zhou. Formal analysis: Wei Cheng. Methodology: Jinxia Zhou. Software: Shanhu Zhou. Supervision: Shanhu Zhou, Jinxia Zhou. Visualization: Shanhu Zhou. Writing – original draft: Wei Cheng.
  23 in total

1.  Serum miRNA expression profiling reveals miR-486-3p may play a significant role in the development of autism by targeting ARID1B.

Authors:  Dan Yu; Xueqian Jiao; Tao Cao; Fusheng Huang
Journal:  Neuroreport       Date:  2018-12-05       Impact factor: 1.837

2.  Microdeletion of pseudogene chr14.232.a affects LRFN5 expression in cells of a patient with autism spectrum disorder.

Authors:  Gerarda Cappuccio; Sergio Attanasio; Marianna Alagia; Margherita Mutarelli; Roberta Borzone; Marianthi Karali; Rita Genesio; Angela Mormile; Lucio Nitsch; Floriana Imperati; Annalisa Esposito; Sandro Banfi; Ennio Del Giudice; Nicola Brunetti-Pierri
Journal:  Eur J Hum Genet       Date:  2019-05-31       Impact factor: 4.246

3.  Creativity in Autism: An Examination of General and Mathematical Creative Thinking Among Children with Autism Spectrum Disorder and Children with Typical Development.

Authors:  Orit Hetzroni; Hila Agada; Mark Leikin
Journal:  J Autism Dev Disord       Date:  2019-09

4.  Supporting metacognitive monitoring in mathematics learning for young people with autism spectrum disorder: A classroom-based study.

Authors:  Katie Maras; Tim Gamble; Mark Brosnan
Journal:  Autism       Date:  2017-10-26

Review 5.  A systematic review of early intensive intervention for autism spectrum disorders.

Authors:  Zachary Warren; Melissa L McPheeters; Nila Sathe; Jennifer H Foss-Feig; Allison Glasser; Jeremy Veenstra-Vanderweele
Journal:  Pediatrics       Date:  2011-04-04       Impact factor: 7.124

6.  Genomic heterogeneity in peritoneal implants: A differential analysis of gene expression using nanostring Human Cancer Reference panel identifies a malignant signature.

Authors:  Paulette Mhawech-Fauceglia; Iyare Izevbaye; Tassja Spindler; Guisong Wang; Helena Hwang; Damanzoopinder Samrao; Ester Elishaev; G Larry Maxwell; Kate Lawrenson; Kathleen M Darcy
Journal:  Gynecol Oncol       Date:  2019-11-08       Impact factor: 5.482

7.  The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog).

Authors:  Jacqueline MacArthur; Emily Bowler; Maria Cerezo; Laurent Gil; Peggy Hall; Emma Hastings; Heather Junkins; Aoife McMahon; Annalisa Milano; Joannella Morales; Zoe May Pendlington; Danielle Welter; Tony Burdett; Lucia Hindorff; Paul Flicek; Fiona Cunningham; Helen Parkinson
Journal:  Nucleic Acids Res       Date:  2016-11-29       Impact factor: 16.971

8.  Development and validation of an individualized diagnostic signature in thyroid cancer.

Authors:  Li-Ou Han; Xin-Yu Li; Ming-Ming Cao; Yan Cao; Li-Hong Zhou
Journal:  Cancer Med       Date:  2018-03-09       Impact factor: 4.452

Review 9.  Small Non-coding RNAs: New Class of Biomarkers and Potential Therapeutic Targets in Neurodegenerative Disease.

Authors:  Callum N Watson; Antonio Belli; Valentina Di Pietro
Journal:  Front Genet       Date:  2019-04-26       Impact factor: 4.599

10.  Development and validation of a novel and robust blood small nuclear RNA signature in diagnosing autism spectrum disorder.

Authors:  Jinxia Zhou; Qian Hu; Xijia Wang; Wei Cheng; Chunlian Pan; Xiaobin Xing
Journal:  Medicine (Baltimore)       Date:  2019-11       Impact factor: 1.889

View more
  2 in total

1.  Downregulation of TGIF2 is possibly correlated with neuronal apoptosis and autism-like symptoms in mice.

Authors:  Jing Lei; Yijue Deng; Songdong Ma
Journal:  Brain Behav       Date:  2022-05-19       Impact factor: 3.405

Review 2.  Paving the Way toward Personalized Medicine: Current Advances and Challenges in Multi-OMICS Approach in Autism Spectrum Disorder for Biomarkers Discovery and Patient Stratification.

Authors:  Areej G Mesleh; Sara A Abdulla; Omar El-Agnaf
Journal:  J Pers Med       Date:  2021-01-13
  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.