
VarCards: an integrated genetic and clinical database for coding variants in the human genome.

Jinchen Li¹,²,³, Leisheng Shi¹, Kun Zhang¹, Yi Zhang¹, Shanshan Hu¹, Tingting Zhao¹, Huajing Teng⁴, Xianfeng Li³,⁴, Yi Jiang³, Liying Ji¹, Zhongsheng Sun¹,⁴.

Abstract

A growing number of genomic tools and databases have been developed to facilitate the interpretation of genomic variants, particularly in coding regions. However, these tools are available separately on different websites or databases, making it challenging for general clinicians, geneticists and biologists to obtain first-hand information on particular variants and genes of interest. Starting with coding regions and splice sites, we artificially generated all possible single nucleotide variants (n = 110 154 363) and cataloged all reported insertions and deletions (n = 1 223 370). We then annotated these variants with respect to functional consequences from more than 60 genomic data sources to develop a database, named VarCards (http://varcards.biols.ac.cn/), by which users can conveniently search, browse and annotate the variant- and gene-level implications of given variants, including the following information: (i) functional effects; (ii) functional consequences through different in silico algorithms; (iii) allele frequencies in different populations; (iv) disease- and phenotype-related knowledge; (v) general meaningful gene-level information; and (vi) drug-gene interactions. As a case study, we successfully employed VarCards in the interpretation of de novo mutations in autism spectrum disorders. In conclusion, VarCards provides an intuitive interface to the information researchers need to prioritize candidate variations and genes.


Year: 2018 PMID: 29112736 PMCID: PMC5753295 DOI: 10.1093/nar/gkx1039

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

In recent decades, next-generation sequencing (NGS) has revolutionized the rapid detection of large numbers of sequence variants in the human genome (1). Among NGS technologies, whole-exome sequencing is probably the most commonly used for prioritizing candidate mutations and genes underlying Mendelian, complex and undiagnosed genetic diseases as well as human cancers. However, only a small subset of functionally relevant variants, particularly in coding regions, is potentially associated with a given disease (2). To better interpret human variants and identify disease-causing variations, a growing number of databases and tools have been developed (3). In addition, several organizations, such as the American College of Medical Genetics and Genomics, have presented guidelines for evaluating the causality between genetic variants and human diseases based on known genetic and clinical data sources and functional studies (4–6). Datasets from the 1000 Genomes Project (7), the National Heart, Lung, and Blood Institute (NHLBI) Exome Sequencing Project (ESP) (8), the Exome Aggregation Consortium (ExAC) (9,10) and the Genome Aggregation Database (gnomAD) (9) provide large-scale reference genetic variations for multiple populations, which are critical for filtering out common variants that are less likely to be disease-causing, thereby allowing rare variants to be identified. Additionally, a variety of in silico algorithms and tools, such as SIFT (11), PolyPhen2 (12) and MutationTaster (13), were developed to predict whether missense variants are damaging to protein function or structure. To facilitate the process of querying missense predictions, the dbNSFP database, which has been widely used in the research community, was developed by integrating different algorithms and is constantly being updated (14–16). To date, dbNSFP v3.0 has integrated more than 20 algorithms (16).
Wang and colleagues developed a functional annotation pipeline named ANNOVAR for genetic variant annotation (17). To further facilitate web-based personal genome annotation, they also developed a web server called wANNOVAR (18). Through the command-line tool and web server, users can effectively analyze genomic variants (19). Furthermore, several other variant- and gene-level databases, such as InterVar (20), ClinVar (21), InterPro (22), denovo-db (23), COSMIC (24), OMIM (25), Ensembl (26), GenBank (27), the UCSC Genome Browser (28), UniProt (29), Gene Ontology (30) and DGIdb (31), were successfully developed to assist in the interpretation of genetic variants and the prioritization of disease candidate genes. Despite the great progress of these databases and tools in genomics and medical genetics, these resources are presented separately on various websites, and no single resource provides both functional predictions and clinical implications at the same time. It is also both tedious and time-consuming to obtain first-hand core information on variants and genes of interest. Therefore, there is a need for a convenient, integrated online database through which users can retrieve general genetic and clinical knowledge for given variants. To address this need, we developed VarCards, which provides an intuitive graphical user interface for querying genetic and clinical data regarding coding variants in the human genome.

MATERIALS AND METHODS

Variant-level data source

Based on definitions of transcripts from RefSeq (32), CCDS (33), UCSC Known Genes (34) and Ensembl Gene (35) with the reference human genome (GRCh37/hg19), we retrieved the coding regions and splicing sites (within 2 base pairs of the splicing junctions), and artificially generated all possible single nucleotide variants (SNVs) in these regions. For example, for a given genomic position, if the nucleotide was cytosine (C) in the reference human genome, we artificially generated three SNVs: cytosine to thymine (C>T), cytosine to guanine (C>G) and cytosine to adenine (C>A). In addition, we cataloged all reported insertions and deletions (INDELs) sourced from the general population variant database gnomAD (9), the clinical variation database ClinVar (21), the International Cancer Genome Consortium (ICGC) (36), the Catalogue of Somatic Mutations In Cancer (COSMIC) (24) and the de novo mutation database denovo-db (23). We downloaded the allele frequencies of different populations from various human genetic variation databases, including gnomAD (variants of 15 496 genomes and 123 136 exomes from seven populations worldwide) (9), ExAC (60 706 exomes from seven populations) (9,10), ESP (6503 exomes from European Americans and African Americans) (8), the 1000 Genomes Project (genomic data for 2504 individuals from five populations) (7), the Kaviar genomic variant database (integrated variants from 35 projects encompassing 13 200 genomes and 64 600 exomes) (37), the Haplotype Reference Consortium (HRC, 64 976 haplotypes from individuals with predominantly European ancestry) (38) and CG69 (69 individuals with complete genomes) (39). In addition, we extracted variants and their related disease or phenotype information from InterVar (20), ClinVar (21), denovo-db (23), COSMIC (24), ICGC (36) and the GWAS Catalog (40).
Furthermore, we obtained predictive scores and pathogenicity consequences of missense variants from 23 in silico algorithms or tools, including SIFT (41,42), PolyPhen2_HDIV (12), PolyPhen2_HVAR (12), LRT (43), MutationTaster (44), MutationAssessor (45), FATHMM (46), PROVEAN (47), MetaSVM (48), MetaLR (48), VEST3 (49), M-CAP (50), CADD (51), GERP++ (52), DANN (53), fathmm-MKL (54), Eigen (55), GenoCanyon (56), fitCons (57), PhyloP (58), PhastCons (59), SiPhy (60) and REVEL (61). In particular, the predicted damaging scores and functional consequences of the 23 algorithms were sourced from the dbNSFP v3.0 database (16). Finally, some genomic features, such as protein domains from InterPro (22) and repeat segments from the segmental duplication database (62), were also cataloged.
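The exhaustive SNV generation described above (three alternate alleles for every reference base) can be illustrated with a minimal Python sketch. The function name `enumerate_snvs`, the tuple layout and the toy sequence are hypothetical and chosen only for illustration; the sketch does not reproduce the authors' actual pipeline or transcript handling.

```python
from typing import Iterator, Tuple

BASES = ("A", "C", "G", "T")


def enumerate_snvs(chrom: str, start: int, ref_seq: str) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (chrom, pos, ref, alt) for every possible SNV in ref_seq.

    `start` is the 1-based coordinate of the first base of ref_seq.
    Illustrative helper only; not part of VarCards.
    """
    for offset, ref in enumerate(ref_seq.upper()):
        if ref not in BASES:
            continue  # skip N or other ambiguity codes
        for alt in BASES:
            if alt != ref:
                yield (chrom, start + offset, ref, alt)


# Toy example: two reference bases give 2 x 3 = 6 candidate SNVs.
snvs = list(enumerate_snvs("chr1", 100, "CA"))
for snv in snvs:
    print(snv)
```

Each reference base contributes exactly three SNVs, so the total number of generated variants is three times the number of unambiguous bases in the targeted regions.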

Gene-level data source

Gene-level basic information and functional information were sourced from UniProt (29), NCBI Gene (63) and BioSystems (64). Gene Ontology (GO) terms from the Gene Ontology Consortium, protein domains from InterPro (22) and protein–protein interactions from InBio Map (65) were also integrated. We collected the genic intolerance score of each gene from three studies: (i) the residual variation intolerance score (RVIS) from Petrovski et al. (66), (ii) loss-of-function (LoF) intolerance (gene intolerance score based on loss-of-function variants in 60 706 individuals) from Fadista et al. (67) and (iii) the heptanucleotide context intolerance score from Aggarwala et al. (68). In addition, data for genes associated with different diseases or phenotypes were curated from Online Mendelian Inheritance in Man (OMIM) (25), ClinVar (21), the Human Phenotype Ontology (HPO) (69) and MGI (mammalian phenotypes from Mouse Genome Informatics) (70). Furthermore, we collected gene expression data for various tissues from the Genotype-Tissue Expression Project (GTEx) (71) and the protein subcellular map from the Human Protein Atlas (72). To present an overall view of gene expression levels, the means and standard deviations across 31 primary tissues and 54 secondary tissues were calculated for each gene. Protein sequences across 21 species were sourced from HomoloGene at NCBI. Finally, the data for drug–gene interactions and gene druggability were sourced from the Drug-Gene Interaction Database (DGIdb) (31) to assist with precision medicine.
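The per-gene expression summary mentioned above (mean and standard deviation across tissues) can be sketched as follows. The tissue names and values are invented for illustration, and the use of the sample standard deviation is an assumption, since the text does not say whether the sample or population formula was used.

```python
from statistics import mean, stdev
from typing import Dict, Tuple


def summarize_expression(values_by_tissue: Dict[str, float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of expression across tissues.

    Hypothetical helper; assumes at least two tissues are supplied.
    """
    values = list(values_by_tissue.values())
    return mean(values), stdev(values)


# Made-up expression values for a single gene in three tissues.
example = {"brain": 2.0, "liver": 4.0, "heart": 6.0}
avg, sd = summarize_expression(example)
print(avg, sd)
```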

Combination and annotation

Similar to our previous studies (73–77), we used the command-line tool ANNOVAR (17) to annotate all SNVs and INDELs with respect to variant-level data sources, including the following information: (i) functional effects of variants; (ii) functional prediction of missense mutations by 23 predictive algorithms; (iii) allele frequencies in different populations; (iv) reported variants in different disease- and phenotype-related databases; and (v) some other genome features, such as CytoBand. The gene-level data sources were integrated into the database using our in-house script. LoF variants (stop-gain, stop-loss and splice-site SNVs, and frameshift INDELs) and deleterious missense SNVs with an allele frequency of <0.0001 based on gnomAD (9) were regarded as potential extreme variants. Deleterious missense SNVs were predicted using a combination of 23 computational methods.
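The definition of potential extreme variants given above can be expressed as a small filter. This is a sketch under stated assumptions: the effect labels, the function name and the boolean `deleterious_missense` flag are hypothetical; the allele-frequency cutoff is assumed to apply to both LoF and missense variants; and variants absent from gnomAD are treated as having a frequency of zero. How the 23 predictors are combined into a deleterious call is not specified in the text and is therefore left to the caller.

```python
from typing import Optional

# Hypothetical effect labels; real annotation output may use different names.
LOF_EFFECTS = {"stopgain", "stoploss", "splicing", "frameshift"}
AF_CUTOFF = 0.0001  # gnomAD allele-frequency threshold stated in the text


def is_potential_extreme(effect: str,
                         gnomad_af: Optional[float],
                         deleterious_missense: bool = False) -> bool:
    """Return True if a variant meets the 'potential extreme variant' criteria.

    Assumption: the frequency cutoff applies to every variant class, and a
    variant not observed in gnomAD (gnomad_af is None) counts as AF = 0.
    """
    af = 0.0 if gnomad_af is None else gnomad_af
    if af >= AF_CUTOFF:
        return False
    if effect in LOF_EFFECTS:
        return True
    return effect == "missense" and deleterious_missense


print(is_potential_extreme("stopgain", None))
print(is_potential_extreme("missense", 0.00005, deleterious_missense=True))
print(is_potential_extreme("frameshift", 0.01))
```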

Database construction

A user-friendly web interface, VarCards (http://159.226.67.237/sun/varcards/ or http://varcards.biols.ac.cn/), was developed by combining jQuery with the PHP-based web framework CodeIgniter and is supported by versatile browsing and searching functionalities, as in our previous databases and web servers (73,76–78). Annotation information was stored in either a MySQL database or flat files. Academic users can freely access genetic data and extended analysis results through the web interface without a username or password.

De novo mutation (DNM) annotations in a case study

In the current case study, DNMs from 2508 autism spectrum disorder (ASD) cases and 1911 unaffected siblings were sourced from the Simons Simplex Collection (SSC) (79,80). VarCards was used to annotate all DNMs, and only coding and splicing site DNMs were retained for further analysis. Deleterious missense mutations were predicted by the combination of REVEL (61) and VEST3 (49). ASD candidate genes were sourced from ClinVar (21), OMIM (25) and SFARI Gene (81). The online tool DAVID (82) was employed to perform functional enrichment analysis.

RESULTS AND WEB INTERFACE

To assess the clinical significance of given variants, such as DNMs from sporadic families, homozygous variants from consanguineous families, and cosegregating heterozygous variants from multigeneration families, various lines of genomic, genetic and clinical evidence must be systematically evaluated. VarCards provides integrated web interfaces to conveniently search, browse and annotate the variant- and gene-level implications of any given coding variant (Figure 1 and Table 1). For variant-level implications, users can obtain first-hand information, including: (i) whether the variant has been reported to be associated with diseases; (ii) allele frequencies in different populations; (iii) functional effects at the transcript and protein levels; and (iv) deleteriousness predicted by various algorithms. In addition, VarCards provides general gene-level implications, such as basic information, genic intolerance, gene function, gene-related diseases, gene expression and target drugs, to assist users with prioritizing candidate genes.

Figure 1.

A general workflow of VarCards. A large number of genomic, genetic and clinical data sources must be systematically evaluated to prioritize candidate variants and genes underlying genetic diseases. Various variant-level and gene-level implications have been integrated in VarCards. dbSNP, Single Nucleotide Polymorphism Database; gnomAD, Genome Aggregation Database; ESP, NHLBI GO Exome Sequencing Project; HRC, Haplotype Reference Consortium; Kaviar, Kaviar Genomic Variant Database; CG69, allele frequency in 69 human subjects sequenced by Complete Genomics; OMIM, Online Mendelian Inheritance in Man; MGI, Mouse Genome Informatics; COSMIC, Catalogue of Somatic Mutations in Cancer; ICGC, International Cancer Genome Consortium; HPO, Human Phenotype Ontology; GTEx, Genotype-Tissue Expression.

Table 1.

Summary of integrated data sources in VarCards

Category	Data source
Part one: variant-level implication
Allele frequency	dbSNP, gnomAD, ExAC, 1000 Genomes, ESP, Kaviar, HRC, CG69
Missense prediction	SIFT, PolyPhen2_HDIV, PolyPhen2_HVAR, LRT, MutationTaster, MutationAssessor, FATHMM, PROVEAN, MetaSVM, MetaLR, VEST3, M-CAP, CADD, GERP++, DANN, fathmm-MKL, Eigen, GenoCanyon, fitCons, PhyloP, PhastCons, SiPhy, REVEL
Disease-related	InterVar, ClinVar, denovo-db, COSMIC, ICGC, GWAS Catalog
Other data	RefSeq, InterPro, Segmental duplication

Part two: gene-level implication
Basic information	UniProt, HomoloGene, Ensembl, NCBI Gene
Genic intolerance	RVIS, LoFtool, heptanucleotide context intolerance score
Gene function	UniProt, Gene Ontology, InterPro, InBio Map, BioSystems
Disease-related	OMIM, MGI, ClinVar, HPO
Gene expression	UniProt, GTEx, The Human Protein Atlas
Target drug	DGIdb

dbSNP, Single Nucleotide Polymorphism Database; gnomAD, Genome Aggregation Database; ESP, NHLBI GO Exome Sequencing Project; HRC, Haplotype Reference Consortium; Kaviar, Kaviar Genomic Variant Database; CG69, allele frequency in 69 human subjects sequenced by Complete Genomics; OMIM, Online Mendelian Inheritance in Man; MGI, Mouse Genome Informatics; COSMIC, Catalogue of Somatic Mutations in Cancer; ICGC, International Cancer Genome Consortium; HPO, Human Phenotype Ontology; GTEx, Genotype-Tissue Expression.

Variant-level implications

Overall, 110 154 363 SNVs and 1 223 370 INDELs in coding regions or splicing sites are included in VarCards. Both general and advanced query interfaces are provided to access the detailed annotation data of these variants. Common search terms, such as genomic positions and regions, gene symbols, and nucleic acid changes in a certain gene or transcript, are supported to allow users to quickly analyze variants of interest. Search results are returned as a page containing a table that displays all variant-level implications (Figure 2), including (i) functional effects at the protein and transcript levels in all four gene annotation systems; (ii) the predicted damaging scores and functional consequences of missense variants from 23 in silico algorithms; (iii) allele frequencies of different populations in gnomAD (9), ExAC (10), ESP (8), 1000G (7), Kaviar (37), HRC (38) and CG69 (39); and (iv) disease- and phenotype-related knowledge in dbSNP (83), ClinVar (21), denovo-db (23), InterVar (20), COSMIC (24), ICGC (36), GWAS Catalog (40) and InterPro (22). The search results can be flexibly filtered by several properties, such as functional effects, damaging scores and allele frequencies. To meet the needs of different users, VarCards also allows users to perform advanced searches by pasting a list or uploading a file containing a large number of search terms in specific formats, including VCF4, ANNOVAR, genomic coordinates and genomic regions (Figure 2). We encourage users to specify data sources of interest for advanced searches. Notably, users can freely export query results as Excel or CSV files or copy them to the clipboard.

Figure 2.

Snapshot of variant-level implications in VarCards. There are three approaches to access variant-level implications: ‘Quick search’, ‘Advanced search’ and ‘Annotate’. As an example, the results of a quick search for the variant ‘SCN2A:c.562C>T’ are shown, including functional effects at the transcript and protein levels, the predicted damaging severity of missense variants, allele frequencies in different populations and information in disease-related databases.
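To make the structure of a quick-search term such as ‘SCN2A:c.562C>T’ concrete, the following minimal sketch splits a term of the form gene symbol, coding position and nucleotide change into its parts. The regular expression, class name and function name are assumptions made for illustration; they do not describe how VarCards itself parses queries, and only simple coding substitutions are handled.

```python
import re
from typing import NamedTuple, Optional


class CodingSNV(NamedTuple):
    gene: str
    cdna_pos: int
    ref: str
    alt: str


# Hypothetical pattern: GENE:c.<position><ref>><alt>, substitutions only.
_PATTERN = re.compile(r"^([A-Za-z0-9\-]+):c\.(\d+)([ACGT])>([ACGT])$")


def parse_coding_snv(term: str) -> Optional[CodingSNV]:
    """Parse a term like 'SCN2A:c.562C>T'; return None if it does not match."""
    match = _PATTERN.match(term.strip())
    if match is None:
        return None
    gene, pos, ref, alt = match.groups()
    return CodingSNV(gene, int(pos), ref, alt)


parsed = parse_coding_snv("SCN2A:c.562C>T")
print(parsed)
```

Terms in other supported formats (VCF4, ANNOVAR, genomic coordinates or regions) would need their own parsers; here they simply return `None`.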

Gene-level implications

For genes containing given variants, VarCards provides seven dedicated panels to present gene-level implications (Figure 3). The ‘Basic information’ panel provides the following information: (i) primary information extracted from NCBI Gene (63), such as official gene name, synonyms and chromosomal location; (ii) a brief description of the cellular function of the protein encoded by the gene, sourced from UniProt (29); and (iii) the genic intolerance scores from three studies (66–68). The ‘Gene function’ panel provides information related to the protein entry name, length, subunit structure and domains, protein–protein interactions, GO terms and biological pathways. The ‘Phenotype and disease’ panel presents the reported disease-associated variants or genes from OMIM (25), ClinVar (21), denovo-db (23), MGI (70) and HPO (69). In the ‘Gene expression’ panel, the expression levels across 31 primary tissues and 54 secondary tissues are illustrated using a bar chart. In the ‘Homology’ panel, multiple alignments of protein amino acid sequences across 21 species are presented to assist the user in evaluating evolutionarily conserved sites. In addition, quick links to the gene of interest at Ensembl (26) and TreeFam (84) are listed below the panel. Via the ‘Variants in different populations’ panel, users can inspect the number of variants with different functional effects and allele frequencies to make a preliminary estimate of the general mutation rate in different populations. In the ‘Drug-gene interaction’ panel, the drug-gene interactions and gene druggability data sourced from DGIdb 2.0 (31) are presented in real time. Notably, only the core gene-level information is shown in VarCards. Links to external resources with detailed information are provided and can be easily accessed by academic users.

Figure 3.

Snapshot of gene-level implications in VarCards. As an example, the typical gene-level implications of the SCN2A gene are illustrated, including basic information, gene functions, associated phenotypes and diseases, gene expression, homology, variants in different populations and drug–gene interactions.

Browsing and customized annotations

Users can access variant- and gene-level implications via the browse function in the VarCards database. Moreover, VarCards implements a function for customized annotation by which users can conveniently annotate their variants in VCF or ANNOVAR format. For different annotation needs, users can flexibly specify their data sources of interest and the cutoffs that define extreme variants, including functional effects, allele frequencies and predicted damaging scores from any of the 23 in silico algorithms. After the variant file is uploaded, an annotation job runs in the backend, and when the job is completed, an email containing a download link for retrieving the results is sent to the user.

Case study

DNMs play essential roles in the etiology of ASD, as shown in our previous studies (73–75). We cataloged 3397 and 2285 DNMs from 2508 ASD cases and 1911 unaffected siblings, respectively, from SSC (79,80) (Figure 4A). After removing noncoding variants, 2723 exonic DNMs were retained in ASD, including 1114 DNMs that were present in gnomAD (9) and 1609 DNMs that were novel variants. Consistent with previous studies (85), we found that the former category of DNMs did not show significant differences between patients with ASD and siblings, whereas among the novel DNMs, the mutation rate of LoF and predicted deleterious missense variants (i.e. putative functional DNMs), but not that of tolerated missense, synonymous and nonframeshift INDEL variants (i.e. non-functional DNMs), was significantly higher in patients with ASD than in controls (P < 0.05, Figure 4B). We found that 600 (23.92%) patients with ASD harbored functional novel DNMs, 1015 (40.47%) patients harbored nonfunctional novel DNMs or other DNMs that were present in gnomAD, and 893 (35.61%) patients did not harbor any exonic DNMs (Figure 4C).

Figure 4.

Case study of de novo mutations in ASD. (A) Workflow of data analysis. The LoF and predicted deleterious missense DNMs that had never been previously observed in the general population (based on gnomAD) were found to be associated with ASD. These DNMs were identified in 600 ASD cases, accounting for 23.92% of the ASD cohort. We then classified these 600 ASD cases into five classes based on evidence of the associations of DNM-targeted genes with ASD, other neuropsychiatric disorders, and known disease pathways (see also panel C). (B) The average numbers of DNMs, classified by functional effect and allele frequency in gnomAD, were compared between ASD cases and siblings. (C) Pie charts illustrating the percentages of ASD cases that harbored significant functional DNMs, non-functional DNMs, or non-coding DNMs. *P < 0.05; **P < 0.01; ***P < 0.001.

A previous study estimated that 45% of de novo LoF mutations and 13% of de novo missense mutations accounted for 9 and 12% of ASD cases, respectively (80). For the 600 cases with functional DNMs, we then prioritized their candidate genes based on clinical, genetic and biological information from VarCards and SFARI Gene (81). As a result, we found that 126 (21%) cases harbored functional DNMs in strong ASD candidate genes; 114 (19%) cases harbored functional DNMs in suggested ASD candidate genes; 41 (6.83%) cases harbored functional DNMs in genes associated with other neurodevelopmental disorders; 20 (3.33%) cases harbored functional DNMs in genes involved in known ASD pathways; and 299 (49.83%) cases harbored functional DNMs in genes without sufficient evidence supporting their identity as ASD candidate genes (Figure 4C). In total, 301 of 2508 (12%) ASD cases were found to have possible ASD risk DNMs and genes, which was higher than that reported in a previous study (86) in which de novo LoF mutations in ASD candidate genes accounted for 5% of patients.
Finally, we found that candidate genes in the 301 cases were enriched in multiple biological processes in GO and KEGG pathways, including in utero embryonic development (GO:0001701, adjusted P = 6.5 × 10⁻⁴), circadian entrainment (hsa04713, adjusted P = 6.3 × 10⁻⁴), dopaminergic synapse (hsa04728, adjusted P = 7.9 × 10⁻⁴), glutamatergic synapse (hsa04724, adjusted P = 0.001), covalent chromatin modification (GO:0016569, adjusted P = 0.024), and the canonical Wnt signaling pathway (GO:0060070, adjusted P = 0.032).
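The case-level proportions reported in this case study can be recomputed directly from the counts given in the text. The short sketch below does only that arithmetic; the dictionary keys are informal labels introduced here, not terms used by VarCards.

```python
TOTAL_ASD_CASES = 2508

# Counts of ASD cases by DNM category, as reported in the text.
case_groups = {
    "functional novel DNMs": 600,
    "nonfunctional novel or gnomAD DNMs": 1015,
    "no exonic DNMs": 893,
}

# Breakdown of the 600 cases with functional DNMs, as reported in the text.
functional_classes = {
    "strong ASD candidate genes": 126,
    "suggested ASD candidate genes": 114,
    "other neurodevelopmental disorder genes": 41,
    "known ASD pathway genes": 20,
    "insufficient evidence": 299,
}


def percent(count: int, total: int) -> float:
    """Percentage of `count` in `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


group_pct = {name: percent(n, TOTAL_ASD_CASES) for name, n in case_groups.items()}
class_pct = {name: percent(n, 600) for name, n in functional_classes.items()}

# Cases with possible ASD risk DNMs: every class except 'insufficient evidence'.
risk_cases = 600 - functional_classes["insufficient evidence"]
risk_pct = percent(risk_cases, TOTAL_ASD_CASES)

print(group_pct)
print(class_pct)
print(risk_cases, risk_pct)
```

The three case groups sum to the 2508 cases and the five classes sum to the 600 cases with functional DNMs, so the reported percentages are internally consistent.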

DISCUSSION

Analysis of the numerous variants detected by NGS technologies provides unprecedented opportunities to prioritize clinically significant variations and genes underlying human genetic diseases (1). The major challenge is to interpret the relationships between genotypes and phenotypes (87). Several scattered genetic, genomic, and clinical data sources can assist in prioritizing disease-causing or disease-risk variations. In this study, we retrieved the most important core information from more than 60 genetic, genomic and clinical data sources and integrated it into the VarCards database, allowing clinicians, geneticists and biologists to conveniently analyze first-hand general variant- and gene-level implications without having to search various websites or annotate variants on the command line. Despite the advances made by other available tools, VarCards differs from them in significant ways. dbNSFP (14–16) focuses on the functional effects of non-synonymous SNVs and their annotations. In addition, dbNSFP is a locally installed database and therefore does not provide a web interface to search, browse and annotate genetic variants, which makes it less accessible. Command-line tools, such as ANNOVAR (17) and WGSA (88), and the application program interface MyVariant.info (89), were developed to perform functional annotation of genetic variants. These tools can analyze large numbers of variants, but they do not provide a web interface and their results are not clearly visualized for end users, which is inconvenient for researchers without sufficient bioinformatics skills. Web servers, such as wANNOVAR (18,19), the VEP tool (90), Phenolyzer (91), wInterVar (20) and SeqMule (92), have been developed to analyze genomic variants and predict functional consequences. However, results are reported in tab-delimited, CSV or VCF formats, which may not be intuitive enough for general clinicians, geneticists and biologists.
In addition, these tools mainly focus on variant-level annotations, and gene-level information is not sufficiently annotated in these web servers. Moreover, when a small number of variants is queried using a web server, the results cannot be shown immediately because newly submitted jobs usually need to be queued. Compared with these tools or web servers, VarCards not only provides similar annotation functions, but also provides a more intuitive online interface through which researchers without sufficient bioinformatics skills can obtain first-hand genetic, genomic and clinical information on any coding variant within a short time. To determine whether a variant contributes significantly to human disease, systematic and quantitative evaluation of the positive and negative evidence regarding its pathogenicity is urgently needed. There are several issues we would like to emphasize here. First, since the evidence for clinically significant variations in disease- and phenotype-related databases, such as ClinVar (21), COSMIC (24), OMIM (25) and HGMD (93), was mostly provided by individual studies or manually collected from the scientific literature, different criteria and methodological biases certainly occurred in assessing the pathogenicity of genetic variants. Users should note the possibility of false-positive data in these data sources when interpreting known disease-contributing variations and genes (20,94). Second, VarCards provides prediction scores from 23 in silico algorithms for missense variants, and users should also note the potential limitations in the specificity and sensitivity of these methods (16). Third, to reduce false-positive results in the identification of disease candidate genes, we encourage users to replicate their findings in more samples, perform functional experiments and carefully examine the clinical data of patients.
Considering the complex processes of genetic testing, VarCards does not directly identify disease-causing variations, but provides various publicly available data sources containing information on the given variants. It is expected that users or groups will be able to flexibly prioritize candidate variations and genes based on their own criteria and genetic data according to the needs of their specific study. VarCards will be updated continuously to provide the research community with an up-to-date resource, not only by updating the data sources that we have integrated, but also by integrating new datasets that may be useful for medical genetics. To improve the VarCards database in further updates, we encourage users to provide feedback with any suggestions or data sources. In the first phase, VarCards focused on variants in coding regions and splicing sites, which account for 85% of disease-causing variations in Mendelian disorders (2). By reducing sequencing costs, new high-throughput technologies and analysis methods give us additional opportunities to investigate regulatory variants and functional elements in noncoding regions (95). However, the clinical interpretation of variants in noncoding regions remains a major challenge (96). We plan to update the VarCards database in the next phase for rapid interpretation of noncoding variants. In summary, VarCards provides an intuitive interface to genetic, genomic, and clinical knowledge of coding variants, accelerating the prioritization of candidate variations and genes.

92 in total

1. Vitamin D-related genes are subjected to significant de novo mutation burdens in autism spectrum disorder.

Authors: Jinchen Li; Lin Wang; Ping Yu; Leisheng Shi; Kun Zhang; Zhong Sheng Sun; Kun Xia
Journal: Am J Med Genet B Neuropsychiatr Genet Date: 2017-04-13 Impact factor: 3.568

2. Mouse Genome Informatics (MGI): Resources for Mining Mouse Genetic, Genomic, and Biological Data in Support of Primary and Translational Research.

Authors: Janan T Eppig; Cynthia L Smith; Judith A Blake; Martin Ringwald; James A Kadin; Joel E Richardson; Carol J Bult
Journal: Methods Mol Biol Date: 2017

3. dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations.

Authors: Xiaoming Liu; Xueqiu Jian; Eric Boerwinkle
Journal: Hum Mutat Date: 2013-07-10 Impact factor: 4.878

4. A global reference for human genetic variation.

Authors: Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal: Nature Date: 2015-10-01 Impact factor: 49.962

5. An expanded sequence context model broadly explains variability in polymorphism levels across the human genome.

Authors: Varun Aggarwala; Benjamin F Voight
Journal: Nat Genet Date: 2016-02-15 Impact factor: 38.330

6. Ensembl 2017.
