
AutismKB: an evidence-based knowledgebase of autism genetics.

Li-Ming Xu, Jia-Rui Li, Yue Huang, Min Zhao, Xing Tang, Liping Wei.

Abstract

Autism spectrum disorder (ASD) is a heterogeneous neurodevelopmental disorder with a prevalence of 0.9–2.6%. Twin studies have shown a heritability of 38–90%, indicating strong genetic contributions. Yet it is unclear how many genes have been associated with ASD and how strong the evidence is. A comprehensive review and analysis of the literature and data may provide a clearer overall picture of autism genetics. We show that as many as 2193 genes, 2806 SNPs/VNTRs, 4544 copy number variations (CNVs) and 158 linkage regions have been associated with ASD by GWAS, genome-wide CNV studies, linkage analyses, low-scale genetic association studies, expression profiling and other low-scale experimental studies. To evaluate the evidence, we collected metadata about each study, including clinical and demographic features, experimental design and statistical significance, and used a scoring and ranking approach to select a core data set of 434 high-confidence genes. The genes mapped to pathways including neuroactive ligand-receptor interaction, synaptic transmission and axon guidance. To better understand the genes, we parsed over 30 databases to retrieve extensive data about expression patterns, protein interactions, animal models and pharmacogenetics. We constructed a MySQL-based online database, which supports sophisticated browsing and searching, and share it with the broader autism research community at http://autismkb.cbi.pku.edu.cn.


Year:  2011        PMID: 22139918      PMCID: PMC3245106          DOI: 10.1093/nar/gkr1145

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Autism spectrum disorder (ASD) is a heterogeneous neurodevelopmental disorder characterized by impairments in reciprocal social interaction and communication and by the presence of restricted, repetitive and stereotyped patterns of behavior, interests and activities (1). ASD is an umbrella term for Autistic Disorder, Asperger Syndrome and Pervasive Developmental Disorder Not Otherwise Specified (PDD-NOS) (1). With onset before age 3 and a prevalence as high as 0.9–2.6% (2,3), ASD is one of the leading causes of childhood disability and imposes serious suffering and burden on families and society (4). Understanding the causes of ASD is critical for developing better treatments. Twin studies have estimated the heritability of ASD at 38–90%, indicating strong contributions from genetic factors as well as a role for environmental factors (5,6). The search for environmental factors has not yet yielded convincing major candidates, whereas the search for genes associated with autism, although far from complete or conclusive, has been more fruitful. The genes discovered so far can be roughly grouped into two categories: ‘syndromic autism-related genes’, that is, causal genes underlying genetic disorders that present with autistic symptoms, such as Fragile X Syndrome, Rett Syndrome, Tuberous Sclerosis Complex and dozens of other disorders (7,8); and ‘non-syndromic autism-related genes’, most of which are susceptibility genes (9). Many experimental methods have been used to identify associated genes, including the earlier linkage analyses and low-scale candidate gene association or experimental studies, as well as the more recent genome-wide association studies (GWAS), genome-wide CNV studies and expression profiling.
With hundreds of studies published, especially the recent genome-wide studies, and with next-generation sequencing technologies providing even more power for further gene discovery (10), a new challenge has emerged: it has become increasingly difficult for an autism researcher to say with confidence how many genes have been associated with ASD, how strong the evidence is, what features the genes have and what pathways they involve. The amount of available literature and data and the intrinsic complexity of autism genetics demand bioinformatic data management and analysis. Different groups have made three efforts so far to collect genes and variations associated with ASD: AutDB (also known as SFARI Gene) collected 219 genes (11,12), the Autism Genetic Database (AGD) collected 226 genes and 743 CNVs (13), and the Autism Chromosome Rearrangement Database (ACRD) collected 372 breakpoints and other genomic features (14). However, none of them provides a comprehensive survey of autism genetics. To provide a clearer overall picture, we performed a comprehensive review and analysis of the published literature and data, described below, resulting in a total of 2193 genes, 2806 SNPs/VNTRs, 4544 CNVs and 158 linkage regions. We provide the results as an online resource for the broader autism research community at http://autismkb.cbi.pku.edu.cn/, with extensive evidence and annotations and sophisticated browsing and searching functionalities.

DATA COLLECTION

Literature search

We searched the PubMed database for publications related to autism genetics. We used the query ‘autism AND associat*’ for association studies, ‘autism AND (gene OR microarray OR proteomics)’ for expression profiling studies and other low-scale experimental studies, and ‘autism AND (CNV OR copy number variation OR microarray* OR microdel* OR microdup* OR rearrange* OR (genome-wide AND (linkage OR associa* OR scan)))’ for CNV and linkage studies. We reviewed the abstracts of the 4000+ retrieved articles to remove irrelevant papers, resulting in a final set of 579 articles. These articles report a total of 11 GWAS, 242 low-scale candidate gene association studies, 13 expression profiling studies, 95 genome-wide CNV studies, 23 genome-wide linkage analyses and 236 other low-scale experimental studies. For syndromic autism-related genes, we first collected the autism-related disorders and their causal genes from a recently published comprehensive review (7). We then searched OMIM to obtain the official disease names and linked each disorder to its OMIM entry. We also searched PubMed for additional citations using the query ‘(OMIM disease name) AND autism’ for each disease. All citations were double-checked manually. In the end, 99 genes for 94 autism-related disorders, supported by 250 references, were included in our data set of ‘Syndromic Autism Related Genes’. In total, we collected 2135 non-syndromic autism-related genes, 99 syndromic autism-related genes, 4544 CNVs and 158 linkage regions. The genes located in the CNV and linkage regions were then retrieved using the UCSC Genome Browser (15).
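
The last step, finding the genes that fall inside each CNV or linkage region, is an interval-overlap query. The authors used the UCSC Genome Browser for this step. The sketch below only illustrates the underlying logic, and it rests on two assumptions. First, gene and region coordinates are already available as BED-style intervals (0-based start, exclusive end). Second, any partial overlap counts as "located in" the region. The class and function names and the coordinates in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A genomic interval in BED-style coordinates: 0-based start, exclusive end."""
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals on the same chromosome overlap if each starts before the other ends."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_in_region(region: Interval, genes: list[Interval]) -> list[str]:
    """Names of genes overlapping the region; partial overlap counts (an assumption)."""
    return sorted(g.name for g in genes if overlaps(region, g))


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    cnv = Interval("cnv_1", "chr16", 100, 200)
    genes = [
        Interval("GENE_A", "chr16", 50, 150),   # partial overlap
        Interval("GENE_B", "chr16", 200, 300),  # starts exactly at the region's end: no overlap
        Interval("GENE_C", "chr1", 120, 130),   # different chromosome
        Interval("GENE_D", "chr16", 150, 160),  # fully inside the region
    ]
    print(genes_in_region(cnv, genes))  # ['GENE_A', 'GENE_D']
```

The linear scan checks every gene against every region. That is adequate for illustration, but a genome-wide run over thousands of CNVs would normally use a sorted sweep or an interval tree.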

Evidence collection

To establish the strength of evidence, we collected metadata about each study and result. Supplementary Tables S1–S7 list the evidence collected for each type of experimental method. For each study of non-syndromic autism, we collected the clinical and demographic features of the samples. These include ancestral background, country of origin, inclusion and exclusion criteria, number of cases and controls with gender ratio, age at examination and diagnostic criteria. We also collected metadata about the experimental design, including platform, experimental methods, statistical methods and statistical significance. For each gene, we estimated how much evidence supports its role in autism from each type of experimental method and calculated a weighted sum, following a multi-dimensional evidence-based candidate gene prioritization approach (16). First, we assigned initial scores to the genes for each type of experimental method (Supplementary Table S8); a score of 0 is given if there is no positive evidence of that type. Table 1 lists the distribution of the scores for each type. Next, we calculated the weights using a benchmark data set of 21 non-syndromic autism-related genes considered high-confidence in six autism reviews (8,9,17–20) (Supplementary Table S9). Following the gene prioritization approach (16), we generated a candidate weight matrix pool of d^N = 7^6 (117,649) weight vectors. Here N = 6 is the number of experimental methods, and d = N + 1 = 7 is the number of possible weight values (1–7) for each element of a weight vector. We then calculated a combined score for each gene by summing the products of the scores and the corresponding weights over the six experimental methods (16). We sorted all 2135 candidate genes, including the 21 benchmark genes, by their combined scores. The weight matrix that gave the benchmark genes the highest ranks was selected as the optimal weight matrix (Supplementary Table S10).
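
In symbols, the combined score of gene g is S_g = w_1·s_{g,1} + … + w_N·s_{g,N}. Here s_{g,i} is the initial score from method i, and each weight w_i ranges over 1…d, giving d^N candidate weight vectors. The sketch below enumerates them exhaustively on toy data. Two details are assumptions, because the text does not pin them down. First, "gave the benchmark genes the highest ranks" is interpreted as minimizing the sum of the benchmark genes' ranks. Second, ties in combined score are broken alphabetically by gene name. The function names and toy scores are hypothetical.

```python
from itertools import product


def combined_score(scores, weights):
    """Weighted sum of per-method initial scores: S = sum_i w_i * s_i."""
    return sum(w * s for w, s in zip(weights, scores))


def benchmark_rank_sum(gene_scores, benchmark, weights):
    """Sum of 1-based ranks of the benchmark genes when all genes are sorted by
    combined score, highest first. Lower is better. Ties are broken by gene name
    (an assumption made only for determinism)."""
    ordered = sorted(gene_scores,
                     key=lambda g: (-combined_score(gene_scores[g], weights), g))
    rank = {g: i + 1 for i, g in enumerate(ordered)}
    return sum(rank[g] for g in benchmark)


def best_weights(gene_scores, benchmark, n_methods, d):
    """Try every weight vector with entries 1..d (d ** n_methods of them) and keep
    the first one that minimizes the benchmark rank sum."""
    best_cost, best_w = None, None
    for w in product(range(1, d + 1), repeat=n_methods):
        cost = benchmark_rank_sum(gene_scores, benchmark, w)
        if best_cost is None or cost < best_cost:
            best_cost, best_w = cost, w
    return best_w


if __name__ == "__main__":
    # Toy example: two methods (N = 2), weights 1..3 (d = 3), hypothetical genes.
    toy = {"B1": (3, 0), "B2": (2, 1), "X": (0, 3), "A": (1, 2)}
    print(best_weights(toy, ["B1", "B2"], n_methods=2, d=3))  # (2, 1)
```

With the values in the text (N = 6, d = 7), the search covers 117,649 weight vectors, and each one requires a sort of 2135 genes. That is feasible, though slow in pure Python.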
About 95% of the benchmark genes were ranked among the top 98% of all candidate genes. We chose the lowest combined score among the benchmark genes, 9, as the cutoff for high-confidence genes, resulting in a core data set of 383 non-syndromic autism-related genes. Because the definition of an ‘optimal weight matrix’ is inevitably debatable, we provide an online ranking tool. It allows users to re-rank the genes interactively by entering customized weights based on their own experience and preferences.
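
The cutoff rule can be sketched the same way. The threshold is the lowest combined score among the benchmark genes, and every candidate scoring at or above it enters the core set. Passing different weights re-ranks the genes, which is roughly what the online ranking tool lets users do. This is an illustrative sketch, not the tool's code; the function name, genes and scores are hypothetical.

```python
def core_set(gene_scores, benchmark, weights):
    """Return (cutoff, core genes ranked by combined score).

    gene_scores maps gene -> per-method initial scores; weights has one entry per method.
    The cutoff is the lowest combined score among the benchmark genes.
    """
    combined = {g: sum(w * s for w, s in zip(weights, sc))
                for g, sc in gene_scores.items()}
    cutoff = min(combined[g] for g in benchmark)
    core = sorted((g for g, v in combined.items() if v >= cutoff),
                  key=lambda g: (-combined[g], g))  # highest score first, ties by name
    return cutoff, core


if __name__ == "__main__":
    # Toy scores for two methods; hypothetical genes.
    scores = {"B1": (3, 0), "B2": (1, 1), "X": (2, 0), "Y": (0, 1)}
    print(core_set(scores, ["B1", "B2"], (2, 3)))  # (5, ['B1', 'B2'])
    print(core_set(scores, ["B1", "B2"], (1, 1)))  # (2, ['B1', 'B2', 'X'])
```

The second call shows how user-supplied weights change both the cutoff and the membership of the core set.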
Table 1.

Score distribution of genes discovered by each experimental method (number of genes at each score)

Experimental method                     Score 1   Score 2   Score 3
Genome-wide association studies              81        46         5
Expression profiling                       1320       285        50
Genome-wide CNV studies                    1086        34        19
Linkage analyses                            535        43         0
Low-scale genetic association studies       128        23        12
Other low-scale experimental studies        241        37        30

For syndromic autism, we assigned four levels to the autism-related disorders:

- Level 1 disorders have one reported case with autistic symptoms.
- Level 2 disorders have two to three cases in a single family.
- Level 3 disorders have cases in more than one family.
- Level 4 disorders are reported in multiple review papers (8).

Causal genes of Level 3 and Level 4 disorders were considered high-confidence genes and included in the core data set.
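
The four-level scheme is a small decision rule, and the sketch below encodes it by checking the strongest level first. The inputs (numbers of reported cases, families and review papers) are assumptions. So is the handling of edge cases the text leaves open, such as more than three cases in a single family, which is treated here as Level 2. The function names are hypothetical.

```python
def disorder_level(n_cases: int, n_families: int, n_reviews: int) -> int:
    """Evidence level (1-4) for a syndromic autism-related disorder; 0 if no case is reported.

    Level 4: reported in multiple review papers.
    Level 3: cases in more than one family.
    Level 2: several cases confined to a single family (the text says two to three).
    Level 1: a single reported case.
    """
    if n_reviews >= 2:
        return 4
    if n_families > 1:
        return 3
    if n_cases >= 2:
        return 2
    if n_cases == 1:
        return 1
    return 0


def is_core_gene_level(level: int) -> bool:
    """Causal genes of Level 3 and Level 4 disorders go into the high-confidence core set."""
    return level >= 3


if __name__ == "__main__":
    print(disorder_level(n_cases=3, n_families=1, n_reviews=0))  # 2
    print(is_core_gene_level(disorder_level(5, 2, 0)))           # True
```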

Functional annotations

To better understand the function of the genes associated with autism, we collected extensive functional information and data. This includes cross-links to NCBI Entrez Gene (21), OMIM (21), UniProt (http://www.uniprot.org/) and Ensembl (http://www.ensembl.org/), and functional groups based on Gene Ontology (http://www.geneontology.org/). It also includes protein–protein interactions from BioGRID (22), BIND (23) and HPRD (24), and genomic variants from the Database of Genomic Variants (DGV) (25). When a gene is shared between ASD and another brain disorder, we linked it to the corresponding genetic association database: AlzGene (26) for Alzheimer's disease, SzGene (27) for schizophrenia and PDGene (http://www.pdgene.org/) for Parkinson's disease. Information about homologues of the genes was retrieved from Mouse Genome Informatics (MGI) (28), the Zebrafish Model Organism Database (ZFIN) (29) and FlyBase (30). We collected comprehensive mRNA expression data, including ESTs from NCBI UniGene Profiles (21), microarray expression profiles from BioGPS (31) and the Allen Brain Atlas (32), and RNA-Seq data (33–38). Protein expression evidence at the peptide level was retrieved from PRIDE (39) and PeptideAtlas (40). We also collected three kinds of regulatory information: transcription factor binding sites in the upstream regions of the genes from an in-house collection of ChIP-chip and ChIP-seq data, miRNAs that may target the genes from miRWalk (41) and TarBase (42), and natural antisense transcripts that may regulate the genes from NATsDB (43). Possible post-translational modifications were retrieved from UniProt and dbPTM (44). We used KOBAS 2.0 (45) to retrieve the pathways in which the genes are involved from BioCyc (46), KEGG Pathway (47), PID (48), PID Reactome (48), PANTHER (49) and Reactome (50). We also used it to retrieve possible associations with other diseases from disease databases including KEGG Disease (51), FunDO (52,53), GAD (54), the NHGRI GWAS Catalog (55) and OMIM (21).
Pharmacogenetic and drug information was collected from the Comparative Toxicogenomics Database (CTD) (56), the Pharmacogenomics Knowledge Base (57) and DrugBank (58). Supplementary Table S11 summarizes the gene coverage of each source database. Supplementary Table S12 shows the overlap between the genes discovered by expression profiling and those discovered by the other genetic technologies. Enriched functional pathways were identified with KOBAS 2.0 (45), and enriched GO terms were identified with DAVID (59). Pathways such as neuroactive ligand–receptor interaction, synaptic transmission and axon guidance were significantly enriched in the core data set (Table 2). In addition to synaptic transmission, GO terms such as transmission of nerve impulse and neuron differentiation were also significantly enriched (Table 3). These results are consistent with recent findings that synapse development, axon targeting and neuron motility are related to autism etiology (60,61).
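
The P and Q values in Tables 2 and 3 come from over-representation tests. The code below is a generic sketch of that idea, not the exact procedure of KOBAS 2.0 or DAVID; each tool applies its own variant of the test and correction. It computes two things: a one-sided hypergeometric P value for a single pathway or GO term, and Benjamini–Hochberg Q values across several terms. Argument names and the example numbers are illustrative.

```python
from math import comb


def hypergeom_sf(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: `drawn` genes are sampled from `population`
    genes, `annotated` of which carry the term; k is the observed overlap."""
    upper = min(annotated, drawn)
    tail = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
               for i in range(k, upper + 1))
    return tail / comb(population, drawn)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted values (Q values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 background genes, a term annotating 200 of them,
    # a 434-gene core set, and an observed overlap of 20 genes.
    p = hypergeom_sf(20, population=20000, annotated=200, drawn=434)
    print(p)
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Exact integer binomials keep the tail sum precise even for large backgrounds. The division converts to a float only at the end.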
Table 2.

Top five enriched pathways of the genes in the high-confidence core data set, using KOBAS 2.0

Term                                      Database       ID            P value    Q value
Neuroactive ligand-receptor interaction   KEGG PATHWAY   hsa04080      1.03E-11   1.65E-09
Synaptic Transmission                     Reactome       REACT:13685   7.50E-10   9.06E-08
Axon guidance                             Reactome       REACT:18266   1.29E-08   1.24E-06
Calcium signaling pathway                 KEGG PATHWAY   hsa04020      2.25E-08   1.97E-06
Long-term potentiation                    KEGG PATHWAY   hsa04720      1.76E-07   9.98E-06
Table 3.

Top 10 enriched GO terms of the genes in the high-confidence core data set

GO ID        GO term                                       P value    Q value
GO:0019226   transmission of nerve impulse                 5.44E-29   9.73E-26
GO:0007268   synaptic transmission                         4.59E-28   8.21E-25
GO:0045202   synapse                                       1.05E-23   1.45E-20
GO:0007610   behavior                                      4.53E-23   8.10E-20
GO:0044456   synapse part                                  7.21E-22   9.94E-19
GO:0044057   regulation of system process                  4.12E-21   7.38E-18
GO:0007267   cell-cell signaling                           4.17E-21   7.46E-18
GO:0030182   neuron differentiation                        8.21E-19   1.47E-15
GO:0031644   regulation of neurological system process     1.53E-18   2.74E-15
GO:0051969   regulation of transmission of nerve impulse   1.74E-18   3.11E-15

DATABASE INTERFACE

We set up a MySQL relational database to store all the data. A user-friendly web interface for browsing and searching was implemented in PHP and JavaScript using the jQuery framework.

Browsing

Users can browse the data in AutismKB in a variety of ways, including by data set, experimental method or chromosome. The gene lists include a summary of information about each gene, hyperlinked to detailed evidence and annotation pages. Figure 1 shows a typical AutismKB gene entry. Basic information such as gene symbol, gene name, cytoband and cross-links is provided (Figure 1A). Nucleotide and protein sequences can be sent to WebLab (62) for further analysis (Figure 1B). Summaries of the supporting evidence and category-specific scores are provided (Figure 1C). Users can click on the hyperlink of a category-specific score to view the evidence in that category. Categories without any evidence are hidden by default (Figure 1D); users can click on ‘+’ to expand or ‘−’ to collapse each category. Detailed information on the polymorphisms from low-scale association studies and GWAS can be found by clicking on ‘detail’ in the tables (Figure 1E). When exploring other low-scale studies and large-scale expression studies, users can click the down arrow on the right of the table to obtain more information (Figure 1F). Annotations of each gene can be obtained by clicking the label ‘view annotation’ at the top left.
Figure 1.

A typical gene entry in AutismKB. (A) Basic information and quick links, (B) nucleotide and protein sequences, (C) evidence statistics and links to different data sources, (D) example of default collapsed data source, (E) link and example of polymorphism information and (F) example of expanded data source with hidden information.

CNVs are presented in a tabular view listing name, cytoband, gain or loss, number, evidence types and reference. Users can filter the table by evidence type and chromosome (Figure 2A). Clicking on a CNV name displays detailed information about that CNV (Figure 2B). This includes the samples and methods of the study, the CNV region, and any syndromic and non-syndromic autism genes in the region. Linkage regions can likewise be filtered by chromosome, and clicking on a linkage region name displays its detailed information.
Figure 2.

CNV list and a typical CNV entry in AutismKB. (A) the CNV list in AutismKB and (B) a typical CNV entry.


Searching

AutismKB supports both text-based and sequence-based search. A quick search box at the top right of each page allows users to search by gene symbol. Advanced search can be used to find genes, CNVs and linkage regions by gene name, gene symbol, NCBI Entrez ID, Ensembl ID, GO term, UniProt ID, location, score, method and PubMed ID. Finally, a BLAST search against the nucleotide or protein sequences of all AutismKB genes is also available.
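
Field-based advanced search maps naturally onto a parameterized SQL query. The sketch below uses Python's built-in sqlite3 module as a stand-in for the MySQL back end. The table layout, column names, field whitelist and rows are all hypothetical, not AutismKB's actual schema. Searchable fields are restricted to a whitelist, and values are passed as bound parameters rather than concatenated into the SQL string, which keeps user input out of the SQL text.

```python
import sqlite3

# Hypothetical whitelist: search field -> (column, SQL operator).
SEARCH_FIELDS = {
    "symbol": ("symbol", "="),
    "name": ("name", "LIKE"),
    "entrez_id": ("entrez_id", "="),
    "chrom": ("chrom", "="),
    "min_score": ("score", ">="),
}


def build_query(criteria: dict) -> tuple[str, list]:
    """Combine the requested fields with AND; only whitelisted columns reach the SQL text."""
    clauses, params = [], []
    for field, value in criteria.items():
        if field not in SEARCH_FIELDS:
            raise ValueError(f"unsupported search field: {field}")
        column, op = SEARCH_FIELDS[field]
        clauses.append(f"{column} {op} ?")
        params.append(f"%{value}%" if op == "LIKE" else value)
    where = " AND ".join(clauses) if clauses else "1=1"
    return f"SELECT symbol FROM gene WHERE {where} ORDER BY score DESC, symbol", params


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with hypothetical genes and scores."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gene (symbol TEXT, name TEXT, entrez_id INTEGER, "
                 "chrom TEXT, score REAL)")
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?, ?)", [
        ("GENE_A", "example synaptic protein", 1001, "chrX", 12.0),
        ("GENE_B", "example receptor", 1002, "chrX", 4.0),
        ("GENE_C", "example synaptic adhesion", 1003, "chr7", 9.0),
    ])
    return conn


def search(conn: sqlite3.Connection, criteria: dict) -> list[str]:
    sql, params = build_query(criteria)
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = make_demo_db()
    print(search(conn, {"chrom": "chrX", "min_score": 5}))  # ['GENE_A']
    print(search(conn, {"name": "synaptic"}))               # ['GENE_A', 'GENE_C']
```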

CONCLUSION

AutismKB is a comprehensive knowledgebase of autism-related genes, CNVs and linkage regions with extensive evidence and annotations. AutismKB will be updated periodically. We hope that it can be a valuable resource for the autism research community.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Tables S1–S12.

FUNDING

Funding for open access charge: National Outstanding Young Investigator Award from the Natural Science Foundation of China (grant number: 31025014); 973 Basic Research Program (grant number: 2011CBA01102); scholarships from Merck and Johnson and Johnson.

Conflict of interest statement. None declared.

REFERENCES

J Zhang, L Feuk, G E Duggan, R Khaja, S W Scherer (2006) Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome. Cytogenet Genome Res.

Sabine M Klauck (2006) Genetics of autism spectrum disorder. Eur J Hum Genet.

Lucia A Hindorff, Praveen Sethupathy, Heather A Junkins, Erin M Ramos, Jayashri P Mehta, Francis S Collins, Teri A Manolio (2009) Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A.

Nicole C Allen, Sachin Bagade, Matthew B McQueen, John P A Ioannidis, Fotini K Kavvoura, Muin J Khoury, Rudolph E Tanzi, Lars Bertram (2008) Systematic meta-analyses and field synopsis of genetic association studies in schizophrenia: the SzGene database. Nat Genet.

Catalina Betancur (2010) Etiological heterogeneity in autism spectrum disorders: more than 100 genetic and genomic disorders and still counting. Brain Res.

David Croft, Gavin O'Kelly, Guanming Wu, Robin Haw, Marc Gillespie, Lisa Matthews, Michael Caudy, Phani Garapati, Gopal Gopinath, Bijay Jassal, Steven Jupe, Irina Kalatskaya, Shahana Mahajan, Bruce May, Nelson Ndegwa, Esther Schmidt, Veronica Shamovsky, Christina Yung, Ewan Birney, Henning Hermjakob, Peter D'Eustachio, Lincoln Stein (2010) Reactome: a database of reactions, pathways and biological processes. Nucleic Acids Res.

Pauline A Fujita, Brooke Rhead, Ann S Zweig, Angie S Hinrichs, Donna Karolchik, Melissa S Cline, Mary Goldman, Galt P Barber, Hiram Clawson, Antonio Coelho, Mark Diekhans, Timothy R Dreszer, Belinda M Giardine, Rachel A Harte, Jennifer Hillman-Jackson, Fan Hsu, Vanessa Kirkup, Robert M Kuhn, Katrina Learned, Chin H Li, Laurence R Meyer, Andy Pohl, Brian J Raney, Kate R Rosenbloom, Kayla E Smith, David Haussler, W James Kent (2010) The UCSC Genome Browser database: update 2011. Nucleic Acids Res.

Giorgos L Papadopoulos, Martin Reczko, Victor A Simossis, Praveen Sethupathy, Artemis G Hatzigeorgiou (2008) The database of experimentally supported targets: a functional update of TarBase. Nucleic Acids Res.

Gregory Matuszek, Zohreh Talebizadeh (2009) Autism Genetic Database (AGD): a comprehensive database including autism susceptibility gene-CNVs integrated with known noncoding RNAs and fragile sites. BMC Med Genet.

Carl F Schaefer, Kira Anthony, Shiva Krupa, Jeffrey Buchoff, Matthew Day, Timo Hannay, Kenneth H Buetow (2008) PID: the Pathway Interaction Database. Nucleic Acids Res.
