Jinchen Li, Leisheng Shi, Kun Zhang, Yi Zhang, Shanshan Hu, Tingting Zhao, Huajing Teng, Xianfeng Li, Yi Jiang, Liying Ji, Zhongsheng Sun.
Abstract
A growing number of genomic tools and databases have been developed to facilitate the interpretation of genomic variants, particularly in coding regions. However, these tools are scattered across different websites and databases, making it challenging for clinicians, geneticists and biologists to obtain first-hand information about particular variants and genes of interest. Starting with coding regions and splice sites, we computationally generated all possible single nucleotide variants (n = 110 154 363) and cataloged all reported insertions and deletions (n = 1 223 370). We then annotated the functional consequences of these variants using more than 60 genomic data sources to develop a database, named VarCards (http://varcards.biols.ac.cn/), with which users can conveniently search, browse and annotate the variant- and gene-level implications of given variants, including: (i) functional effects; (ii) functional consequences predicted by different in silico algorithms; (iii) allele frequencies in different populations; (iv) disease- and phenotype-related knowledge; (v) general gene-level information; and (vi) drug-gene interactions. As a case study, we successfully employed VarCards to interpret de novo mutations in autism spectrum disorders. In conclusion, VarCards provides an intuitive interface to the information researchers need to prioritize candidate variants and genes.
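The step of generating "all possible single nucleotide variants" amounts to substituting each reference base with each of the three other bases. The sketch below illustrates that idea under stated assumptions: the segment tuples `(chrom, start, seq)` and the functions `enumerate_snvs` and `count_snvs` are hypothetical, segments are assumed to be non-overlapping (overlapping exon and splice-site intervals would need to be merged first to avoid double counting), and ambiguous bases such as `N` are skipped. It is not the VarCards pipeline.

```python
from typing import Iterator, List, Tuple

BASES = "ACGT"

def enumerate_snvs(chrom: str, start: int, seq: str) -> Iterator[Tuple[str, int, str, str]]:
    """Yield every possible SNV (chrom, pos, ref, alt) for a reference
    segment whose first base sits at 1-based position `start`."""
    for offset, ref in enumerate(seq.upper()):
        if ref not in BASES:
            continue  # skip ambiguous bases such as N
        for alt in BASES:
            if alt != ref:
                yield (chrom, start + offset, ref, alt)

def count_snvs(segments: List[Tuple[str, int, str]]) -> int:
    # Each unambiguous reference base contributes exactly three alternative alleles.
    return sum(1 for chrom, start, seq in segments for _ in enumerate_snvs(chrom, start, seq))
```

If every targeted position contributed exactly three alternatives, the reported total of 110 154 363 SNVs would correspond to 110 154 363 / 3 = 36 718 121 reference positions; this is only arithmetic on the published count, not a figure stated by the authors.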
Year: 2018 PMID: 29112736 PMCID: PMC5753295 DOI: 10.1093/nar/gkx1039
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. A general workflow of VarCards. A large body of genomic, genetic and clinical data sources must be systematically evaluated to prioritize candidate variants and genes underlying genetic diseases. VarCards integrates a variety of variant-level and gene-level implications.
Summary of integrated data sources in VarCards
| Category | Data source |
|---|---|
| Part one: variation-level implication | |
| Allele frequency | dbSNP, gnomAD, ExAC, 1000 Genomes, ESP, Kaviar, HRC, CG69 |
| Missense prediction | SIFT, PolyPhen2_HDIV, PolyPhen2_HVAR, LRT, MutationTaster, MutationAssessor, FATHMM, PROVEAN, MetaSVM, MetaLR, VEST, M-CAP, CADD, GERP++, DANN, fathmm-MKL, Eigen, GenoCanyon, fitCons, PhyloP, PhastCons, SiPhy, REVEL |
| Disease-related | InterVar, ClinVar, denovo-db, COSMIC, ICGC, GWAS Catalog |
| Other data | RefSeq, InterPro, Segmental duplication |
| Part two: gene-level implication | |
| Basic information | UniProt, HomoloGene, Ensembl, NCBI Gene |
| Genic intolerance | RVIS, LoFtool, heptanucleotide context intolerance score |
| Gene function | UniProt, Gene Ontology, InterPro, InBio Map, BioSystems |
| Disease-related | OMIM, MGI, ClinVar, HPO |
| Gene expression | UniProt, GTEx, The Human Protein Atlas |
| Target drug | DGIdb |
dbSNP, Single Nucleotide Polymorphism Database; gnomAD, Genome Aggregation Database; ExAC, Exome Aggregation Consortium; ESP, NHLBI GO Exome Sequencing Project; HRC, Haplotype Reference Consortium; Kaviar, Kaviar Genomic Variant Database; CG69, allele frequency in 69 human subjects sequenced by Complete Genomics; OMIM, Online Mendelian Inheritance in Man; MGI, Mouse Genome Informatics; COSMIC, Catalogue Of Somatic Mutations In Cancer; ICGC, International Cancer Genome Consortium; HPO, Human Phenotype Ontology; GTEx, Genotype-Tissue Expression.
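Integrating many variant-level sources into one view requires joining them on a shared variant identifier. A minimal sketch, assuming each source has already been normalized to a table keyed by `(chrom, pos, ref, alt)`; the function `build_cards`, the source names and the field values in the usage note are hypothetical placeholders, not VarCards' actual schema or data.

```python
from collections import defaultdict
from typing import Dict, Mapping, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), assumed already normalized

def build_cards(
    sources: Mapping[str, Mapping[VariantKey, Dict[str, object]]],
) -> Dict[VariantKey, Dict[str, Dict[str, object]]]:
    """Merge per-source annotation tables into one record per variant,
    keeping each source's fields under the source's own name."""
    cards: Dict[VariantKey, Dict[str, Dict[str, object]]] = defaultdict(dict)
    for source_name, table in sources.items():
        for key, fields in table.items():
            # Copy the fields so later edits to a card do not alter the source table.
            cards[key][source_name] = dict(fields)
    return dict(cards)
```

Keeping each source's fields in its own namespace avoids collisions between identically named columns (for example, several databases reporting an allele frequency) and lets a variant missing from one source still receive annotations from the others.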
Figure 2. Snapshot of variant-level implications in VarCards. Variant-level implications can be accessed in three ways: ‘Quick search’, ‘Advanced search’ and ‘Annotate’. As an example, the results of a quick search for the variant ‘SCN2A:c.562C>T’ are shown, including functional effects at the transcript and protein levels, the predicted damaging severity of the missense variant, allele frequencies in different populations and information from disease-related databases.
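The quick-search example uses a gene symbol followed by a coding-DNA substitution in HGVS-style notation. The sketch below shows one way such a query could be parsed; the regular expression, the `CodingSubstitution` class and the functions `parse_quick_query` and `codon_of` are hypothetical and handle only simple substitutions (no intronic offsets, UTR positions or indels). It is not VarCards' own parser. `codon_of` relies on the standard c. numbering convention in which c.1 is the A of the ATG start codon.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Hypothetical pattern for simple coding substitutions such as "SCN2A:c.562C>T".
QUERY_RE = re.compile(r"(?P<gene>[A-Za-z0-9.-]+):c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])")

@dataclass(frozen=True)
class CodingSubstitution:
    gene: str
    cdna_pos: int
    ref: str
    alt: str

def parse_quick_query(query: str) -> CodingSubstitution:
    match = QUERY_RE.fullmatch(query.strip())
    if match is None:
        raise ValueError(f"unsupported query format: {query!r}")
    return CodingSubstitution(
        gene=match.group("gene"),
        cdna_pos=int(match.group("pos")),
        ref=match.group("ref"),
        alt=match.group("alt"),
    )

def codon_of(cdna_pos: int) -> Tuple[int, int]:
    """Return (codon number, position within codon) for a positive c. position."""
    if cdna_pos < 1:
        raise ValueError("only positive coding positions are supported")
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1
```

For the query in Figure 2, `parse_quick_query("SCN2A:c.562C>T")` yields gene `SCN2A`, position 562, C to T, and `codon_of(562)` places the change at the first base of codon 188; mapping that codon to an amino acid change would additionally require the transcript sequence.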
Figure 3. Snapshot of gene-level implications in VarCards. As an example, typical gene-level implications of the SCN2A gene are illustrated, including basic information, gene functions, associated phenotypes and diseases, gene expression, homology, variants in different populations and drug–gene interactions.
Figure 4. Case study of de novo mutations (DNMs) in autism spectrum disorder (ASD). (A) Workflow of data analysis. Loss-of-function (LoF) and predicted deleterious missense DNMs that had never been observed in the general population (based on gnomAD) were found to be associated with ASD. These DNMs were identified in 600 ASD cases, accounting for 23.92% of the ASD cohorts. We then classified these 600 ASD cases into five classes based on the evidence associating DNM-targeted genes with ASD, other neuropsychiatric disorders and known disease pathways (see also panel C). (B) Comparison of the average number of DNMs between ASD cases and siblings, classified by functional effect and by allele frequency in gnomAD. (C) Pie charts illustrating the percentages of ASD cases that harbored significant functional DNMs, non-functional DNMs or non-coding DNMs. *P < 0.05; **P < 0.01; ***P < 0.001.
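The selection in panel (A) keeps DNMs that are either LoF or predicted-deleterious missense and that are absent from gnomAD, then counts the cases carrying at least one such DNM. A minimal sketch of that filter, assuming a simplified record per DNM; the consequence labels in `LOF_CONSEQUENCES`, the `DeNovoMutation` fields and the helper functions are hypothetical, and which missense predictor supplies `predicted_deleterious` is left open.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical consequence labels; real annotation tools use their own vocabularies.
LOF_CONSEQUENCES = {"stopgain", "frameshift", "splicing", "startloss"}

@dataclass(frozen=True)
class DeNovoMutation:
    sample_id: str
    gene: str
    consequence: str              # e.g. "missense", "stopgain", "synonymous"
    predicted_deleterious: bool   # call from some missense predictor (assumed)
    gnomad_af: Optional[float]    # None when the variant is absent from gnomAD

def is_candidate(dnm: DeNovoMutation) -> bool:
    # Never observed in the general population: absent or zero frequency in gnomAD.
    novel = dnm.gnomad_af is None or dnm.gnomad_af == 0.0
    lof = dnm.consequence in LOF_CONSEQUENCES
    damaging_missense = dnm.consequence == "missense" and dnm.predicted_deleterious
    return novel and (lof or damaging_missense)

def carriers(dnms: Iterable[DeNovoMutation]) -> List[str]:
    """Sorted IDs of cases carrying at least one candidate DNM."""
    return sorted({d.sample_id for d in dnms if is_candidate(d)})

def carrier_fraction(dnms: Iterable[DeNovoMutation], cohort_size: int) -> float:
    return len(carriers(dnms)) / cohort_size
```

Counting distinct sample IDs rather than DNMs matters here: a case with two candidate DNMs should contribute once to the carrier fraction, which is how a figure such as 600 cases (23.92% of the cohorts) is expressed.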