Wenliang Zhang, Haiyue Zhang, Huan Yang, Miaoxin Li, Zhi Xie, Weizhong Li.
Abstract
The causes of a disease and its therapies are related not only to genotypes but also to other factors, including phenotypes, environmental exposures, drugs and chemical molecules. Distinguishing disease-related factors from the many neutral factors is both critical and difficult. Over the past two decades, bioinformaticians have developed many computational resources to integrate omics data and discover associations among these factors. However, researchers and clinicians have difficulty choosing appropriate resources from hundreds of relevant databases and software tools. Here, to assist researchers and clinicians, we systematically review the public computational resources for human diseases related to genotypes, phenotypes, environmental factors, drugs and chemical exposures. We briefly describe the development history of these computational resources, followed by the details of the relevant databases and software tools. We conclude with a discussion of current challenges, future opportunities and prospects for this topic.
Keywords: database; disease phenotype; environmental exposure; genotype; software tool; web platform
Year: 2019 PMID: 30102366 PMCID: PMC6954426 DOI: 10.1093/bib/bby071
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1. Development history of disease-related computational resources. The development of disease-related databases, software tools and web platforms is depicted along a timeline. The computational resources are classified into groups according to their scopes and applications.
Comparison of different disease-related data resources
|
|
|
|
|
|
| |||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
|
|
|
|
|
|
| |||
|
| ||||||||||
| OMIM | √(M) | √(F) | √ | √(M) | √(F) | GPAs | ||||
| Orphanet | √ | √(M) | √(F) | √ | GPAs, PDAs | |||||
| DIDA | √ | √(digenic) | GPAs | |||||||
| DiseaseMeth | √(Cancer) | √ | GPAs | |||||||
|
| ||||||||||
| miR2Disease | √ | √(miR) | GPAs | |||||||
| HMDD v2.0 | √ | √(miR) | GPAs | |||||||
| NONCODE | √ | √(lnc) | GPAs | |||||||
| LncRNADisease | √ | √(lnc) | GPAs | |||||||
| Lnc2Cancer | √(Cancer) | √(lnc) | GPAs | |||||||
| NSDNA | √(NSDs) | √(ncR) | GPAs | |||||||
| circRNADisease | √ | √(circ) | GPAs | |||||||
| MNDR | √(F) | √(M) | √(ncR) | GPAs | √ | |||||
|
| ||||||||||
| HGMD | √ | √ | √(M) | √(F) | √ | GPAs | √ | |||
| ClinVar | √ | √ | √(M) | √(F) | √ | GPAs | √ | |||
| VarCards | √ | √ | √(M) | √(F) | √ | GPAs | √ | |||
| GWAS Catalog | √ | √(M) | √(F) | √ | GPAs | √ | ||||
| GWAS Central | √ | √(M) | √(F) | √ | GPAs | √ | ||||
| GWASdb | √ | √(M) | √(F) | √ | √ | GPAs, GDAs | √ | |||
| COSMIC | √(Cancer) | √(M) | √(F) | √ | √ | GPAs, GDAs | √ | |||
| CIViC | √(Cancer) | √ | √ | √ | GPAs, GDAs | √ | ||||
| Denovo-db | √(NSDs) | √(M) | √(F) | √ | GPAs | √ | ||||
| miRdSNP | √ | √(miR) | √ | GPAs | ||||||
| LincSNP | √ | √(lnc) | √ | GPAs | √ | |||||
| LncRNASNP | √ | √(lnc) | √ | GPAs | √ | |||||
|
| ||||||||||
| dbSNP | √ | √ | √ | √ | ||||||
| ESP | √ | √ | √ | √ | GPAs | |||||
| ExAC | √ | √ | ||||||||
| 1000Genome | √ | √ | √ | |||||||
| Kaviar | √ | √ | √ | |||||||
| FINDbase | √ | √ | √ | |||||||
|
| ||||||||||
| MGD | √ | √ | √(Mouse) | √ | GPAs | |||||
| MTB | √(Cancer) | √(Mouse) | √ | GPAs | ||||||
| RGD | √ | √ | √(Rat) | √ | GPAs | |||||
| ZFIN | √ | √ | √(zebrafish) | √ | √ | GPAs | ||||
|
| ||||||||||
| CTD | √ | √ | √ | √ | GPAs, GEFAs, PEFAs | |||||
| miREnvironment | √ | √(miR) | √ | GPEFAs | ||||||
| SM2miR | √ | √(miR) | √ | √ | GEFAs | |||||
| LncEnvironmentDB | √(lnc) | √ | GEFAs | √ | ||||||
| DLREFD | √ | √(lnc) | √ | √ | GPEFAs | |||||
Continued
Summary of disease-related databases
| Database | Number of associations or items | Date of statistic |
|---|---|---|
|
| ||
| OMIM [ | 15 919 gene descriptions, 8670 phenotypes and 3928 genes with association to 1 or more phenotype(s) | 22 June 2018w |
| Orphanet [ | 6949 associations between genes and rare diseases | Aug 2016w |
| Gene2phenotype [ | 2285 GPAs in developmental disorders | Oct 2017w |
| DIDA [ | 213 digenic combination-disease associations in 44 digenic diseases | Oct 2015p |
| DiseaseMeth v2.0 [ | 679 602 aberrant DNA methylation-disease associations in 88 diseases, especially in various cancers | Nov 2016p |
|
| ||
| NONCODE [ | 1110 lncRNAs associated with 284 diseases | Nov 2016p |
| miR2Disease [ | 3273 associations between 349 miRNAs and 169 diseases | Jun 2018w |
| HMDD v2.0 [ | 10 368 associations between 572 miRNAs and 378 diseases | Jun 2013p |
| LncRNADisease [ | 3000 associations between 914 lncRNAs and 329 diseases | July 2017w |
| Lnc2Cancer [ | 1488 associations between 666 lncRNAs and 97 cancers | July 2016w |
| NSDNA [ | 26 128 associations between 8736 ncRNAs and 144 nervous system diseases | May 2017w |
| circRNADisease [ | 354 associations between 330 circRNAs and 48 diseases | Apr 2018p |
| MNDR v2.0 [ | 8824 lncRNA-disease, 70 381 miRNA-disease, 118 piRNA-disease and 67 snoRNA-disease experimental associations across 6 mammals | Nov 2017p |
|
| ||
| ClinVar [ | 428 435 genomic variant-disease associations across 30 181 genes | Jun 2018w |
| HGMD [ | 224 642 disease-related variants on 8784 genes | Jan 2018w |
| Denovo-db [ | 32 991 de novo genetic variants in neurodevelopmental disorders | July 2016p |
| VarCards [ | 110 154 363 artificially generated SNVs and 1 223 370 reported indels in coding regions and splicing sites | Oct 2017p |
| LOVD 2.0 [ | 3 334 104 (2 400 084 unique) variants in 248 807 individuals in 86 LOVD installations | Dec 2015p |
| MITOMAP [ | 1746 variants on mitochondrial DNA | Dec 2015p |
| COSMIC [ | 208 368 associations between somatic mutations and cancer | Nov 2016p |
| CIViC [ | 1678 interpretations of clinical relevance for 713 variants affecting 283 genes associated with 209 cancer subtypes and 291 drugs | Feb 2017p |
| GWAS Catalog v2 [ | ∼60 000 associations between SNPs and traits/diseases | Apr 2018w |
| GWASdb v2.0 [ | 252 530 associations between SNPs and traits/diseases | Nov 2015p |
| GWAS Central [ | 69 986 326 associations between 2 974 961 SNPs and 829 traits/diseases | Nov 2017w |
| LincSNP2.0 [ | 371 647 associations between lncRNA SNPs and diseases, and 1 266 485 linkage disequilibrium (LD)-SNPs | Oct 2016p |
| LncRNASNP2 [ | 697 lncRNA-disease associations; 602 GWAS-SNPs and 2 859 147 SNPs in LD regions | Oct 2017p |
| miRdSNP [ | 786 associations between 630 unique disease-associated SNPs and 204 disease types | 2012p |
| miRNASNP [ | 2257 SNPs in 1596 human pre-miRNAs; 706 SNPs in miRNA mature regions and 227 SNPs in miRNA seed regions | Jan 2015p |
| dbSNP [ | A genomic variation database including 660 773 127 SNPs of | Mar 2018w |
| ExAC [ | Variations from 130 000 subject exome sequencing data from a wide variety of large-scale sequencing projects | Aug 2016p |
| ESP [ | 1 788 563 variants of 6700 exome sequencing data from heart-, lung- and blood-related diseases and traits | Oct 2016p |
| 1000Genome [ | Over 88 million variants of 2504 whole genome sequencing data from 26 populations | Oct 2015p |
| Kaviar [ | Over 162 million variants from 35 projects encompassing 13 200 genomes and 64 600 exomes | Feb 2016w |
|
| ||
| MGD [ | 5021 associations between mouse genetic models and human diseases | Nov 2016p |
| MouseNet v2 [ | 788 080 functional gene network associations for laboratory mouse and eight other model vertebrates | Nov 2015p |
| MTB [ | 6057 associations between mouse genetic models and human cancers; 2288 associations between specific genes and cancers | Oct 2014p |
| RGD [ | 2998 associations between rat genetic models and human diseases | Nov 2016p |
| ZFIN [ | 11 348 associations between zebrafish genetic models and human diseases | Nov 2016p |
|
| ||
| CTD [ | 1 379 105 chemical-gene associations, 202 085 chemical-disease associations and 33 583 gene-disease associations | Sep 2016p |
Continued
Genomic feature-driven tools for annotation and evaluation of clinical significance of variants
| Category | Year: tool | Update status | Features used |
|---|---|---|---|
| Functional annotation of genomic variants | 2010: ANNOVAR [ | Yes, annually since 2015 | Functional annotation of genetic variants from high-throughput sequencing data |
| | 2012: wANNOVAR [ | Yes | Functional annotation of genetic variants from high-throughput sequencing data |
| | 2012: KGGSeq [ | Yes, bugs fixed monthly | Three different levels: genetic level, variant-gene level and knowledge level |
| Prediction of functional impact of amino acid substitutions | 2003: SIFT [ | Last update in Aug 2011 | Sequence homology based on PSI-BLAST |
| | 2009: LRT [ | Last update in Nov 2009 | Sequence homology |
| | 2010: PolyPhen2 [ | Last update in 2016 | Eight sequence-based and three structure-based predictive features |
| | 2011: MutationAssessor [ | Last update in Dec 2015 | Sequence homology of protein families and subfamilies between species |
| | 2012: PROVEAN [ | Last update in Jan 2015 | Sequence homology |
| | 2013: FATHMM [ | Last update in May 2015 | Sequence homology |
| | 2015: MetaSVM [ | Last update in 2016 | 9 prediction scores and allele frequencies in 1000Genomes |
| | 2017: IMHOTEP [ | Unknown | 9 popular prediction tools |
| Prioritization of rare missense variants | 2013: VEST3 [ | Yes, quarterly | 86 sequence features |
| | 2016: REVEL [ | Last update in 2016 | 13 popular prediction tools |
| | 2016: M-CAP [ | Last update in 2016 | Pathogenicity likelihood scores and direct measures of evolutionary conservation, the cross-species analog to frequency within the human population |
| Prioritization of non-coding variants | 2014: GWAVA [ | Last update in 2014 | Various genomic and epigenomic annotations |
| | 2015: DeepSEA [ | Yes, annually | Regulatory sequence code |
| Prediction of functional consequences for both coding and non-coding variants | 2010: MutationTaster [ | Yes | Conservation, splice site, mRNA features, protein features and regulatory features |
| | 2011: VAAST [ | Last update in Sep 2016 | Variant frequency data with AAS effect information on a feature-by-feature basis |
| | 2014: CADD [ | Last update in Apr 2018 | 63 annotations including 949 sequence features |
| | 2015: DANN [ | Last update in 2015 | 63 annotations including 949 sequence features, the same as CADD |
| | 2015: FATHMM-MKL [ | Last update in 2015 | 1281 sequence features |
| | 2016: Eigen [ | Last update in 2016 | Functional, evolutionary conservation and regulatory annotations |
| Interpretation of clinical significance of variants | 2017: InterVar [ | Yes, last update in Jan. 2018 | The 2015 ACMG-AMP guidelines |
| | 2015: ClinLabGeneticist [ | Last update in 2014 | Extensive variant annotation data source and prioritization of variants |
The tools are classified into different categories according to their uses.
Comparison of phenotype-driven tools for interpretation of clinical significance of variants
| Tool (year) | Availability | Operating system | Requirements | Methods | Input | Use |
|---|---|---|---|---|---|---|
| 2013: eXtasy [ | Online & Standalone | Linux | Ruby; Tabix; Bedtools; R Statistical Framework with randomForest; RobustRankAggreg libraries | Random Forests; Phenomizer | VCF file; TSV for HPO term(s) | Mendelian and oligogenic disorders; nSNVs; Exome analysis; (Only 1 sample per run) |
| 2014: Phen-Gen [ | Online & Standalone | Linux (Ubuntu, CentOS, & RHEL) | Perl | Bayesian framework; Random walk with restart; Variant-predicted pathogenicity score; Phenomizer | VCF file; text file for HPO term(s); Pedigree (PED) file; Inheritance models; Type of prediction (genomic or coding); Discard de novo and Stringency | Rare disorders; nSNVs, splice-sites, short indels and non-coding variants; Genome and Exome analysis; (Only 1 family or 1 sample per run) |
| 2014: PhenIX [ | Online | - | - | Semantic similarity score; Variant-frequency score; Variant-predicted pathogenicity score | VCF file; HPO term(s); Inheritance modes; Frequency sources; Number of candidates to show | Mendelian diseases; SNVs, splice-sites and short indels; Exome analysis; (Only 1 sample per run) |
| 2014: Phevor [ | Online | - | - | Disease-gene association score; Variant-prioritization score | VAAST simple or Table for variants; Ontology Term(s); Ontologies to link to HPO | Rare disorders; SNVs; Exome analysis; (Only 1 sample per run) |
| 2016: Exomiser [ | Standalone | Linux; Mac OS X; Windows | ∼4 GB RAM for an exome analysis and ∼12 GB RAM for a genome analysis; >3 GB free RAM (8 GB preferred); Java 8 or above | HiPHIVE; PHIVE; PhenIX; ExomeWalker; OWLSim; Logistic regression | YML file that includes VCF file name; HPO term(s); PED file name; inheritance modes, Probands; Frequency sources; Pathogenicity sources and other alternative parameters | Mendelian, oligogenic and multigenic disorders; SNVs, splice-sites, short indels and non-coding variants; Genome and Exome analysis; (Multiple samples or families per run) |
| 2014: PhenogramViz [ | Cytoscape app | Windows | Cytoscape version 3.1.0 and above | Phenogram-score (PHS); NAG, OBE, OPA, HI score | Enter symptom(s) directly or create a file with HPO term(s); Lists of CNVs (including type, chromosome, start, end); Lists of genes | Mendelian disorders; CNVs; aCGH and exome analysis; (Only 1 sample per run) |
The availabilities, the requirements and the use of these tools are detailed in the table.
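Several tools in the table combine a phenotype-based score with a variant-level score. As a rough illustration only, and not the algorithm of any listed tool, the sketch below ranks candidate genes by a weighted sum of a Jaccard similarity over HPO term sets and a per-gene variant pathogenicity score. The function names, the weight and the gene annotations and scores in the usage example are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two sets of HPO term identifiers."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_genes(
    patient_terms: Set[str],
    gene_terms: Dict[str, Set[str]],
    variant_scores: Dict[str, float],
    phenotype_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank genes by a weighted sum of phenotype similarity and variant score.

    Assumption: variant_scores holds one pathogenicity score in [0, 1] per
    candidate gene; real tools use far richer semantic-similarity measures.
    """
    ranked = []
    for gene, variant_score in variant_scores.items():
        phenotype_score = jaccard(patient_terms, gene_terms.get(gene, set()))
        combined = (
            phenotype_weight * phenotype_score
            + (1.0 - phenotype_weight) * variant_score
        )
        ranked.append((gene, combined))
    # Highest combined score first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical patient phenotype and gene annotations.
    patient = {"HP:0001250", "HP:0001263"}
    annotations = {
        "GENE_A": {"HP:0001250", "HP:0001263", "HP:0000252"},
        "GENE_B": {"HP:0000365"},
    }
    scores = {"GENE_A": 0.6, "GENE_B": 0.9}
    for gene, combined in rank_genes(patient, annotations, scores):
        print(gene, round(combined, 3))
```

In this toy case the gene with the weaker variant score can still rank first because its annotated phenotypes overlap the patient's terms, which is the intuition behind phenotype-driven prioritization.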
Summary of different biomedical data and analysis web platforms
|
|
|
|
|
|---|---|---|---|
| DisGeNET [ | | Web interface, DisGeNET Cytoscape plugin, disgenet2r R package, DisGeNET-RDF | UniProt, dbSNP, GDA, CTD, MGD, OMIM, ClinVar, RGD, GWAS Catalog, Orphanet, HPO, UMLS, MeSH, DO, ICD9-CM, HGNC, dbSNP, CTD; 22 resources in total |
| Monarch Initiative [ | | Web interface, Phenotypes Analyzer, PhenoGrid, Text annotator, Exomiser | ClinVar, CTD, GeneReviews, OMIM, HPO, Orphanet, GWAS Catalog, MGI, ZFIN, NCBI, UCSC, HGNC, MeSH, OMIM, ORDO, HPO, EFO, UMLS; 53 resources in total |
| Open Targets Platform [ | | Web interface, Phylogenetic tree and HEART, Application programming interface | GWAS Catalog, UniProt, Expression Atlas, ChEMBL, Reactome, PhenoDigm, UMLS, MeSH, GO, ECO, HPO, MP, OMIM, ICD9-CM; 21 resources in total |
| MalaCards [ | | Web interface, TGex, GeneAnalytics, VarElect, GeneALaCart, PathCards | ClinVar, COSMIC, dbSNP, DGIdb, DrugBank, FDA, HGMD, OMIM, PharmGKB, ICD10, MeSH, MGI, UMLS, UniProt; 68 resources in total |
| MARRVEL [ | | Web interface, Mutalyzer Position Converter, OMIM API, DIOPT, GTEx | ExAC, gnomAD, IMPC, Monarch, ClinVar, Geno2MP, DGV, DECIPHER, DIOPT, Mutalyzer, SGD, PomBase, WormBase, FlyBase, ZFIN, MGI and RGD; 17 resources in total |
Scope refers to the major focus of the web platform. Scale is the number of associations and items currently provided in the platform. Each platform has integrated multiple tools/applications. Sources refer to the original data resources that have been integrated in the platform. In the date of statistic, p indicates the Month-Year of statistic from journal publications; w refers to the Month-Year of statistic from official websites.
Figure 2. Framework of a comprehensive web platform. A comprehensive web platform should integrate various disease-related information, including genotypes, phenotypes, environmental factors and lifestyles. The information available in the platform should be homogeneously annotated with controlled vocabularies and community-driven ontologies, such as GenBank, dbSNP and miRBase for genotypes, HPO and DO for phenotypes, EFO and ChEBI for environmental factors and lifestyles, and DrugBank and PubChem for drugs. Moreover, the platform should have solid scoring models to prioritize associations between different factors, such as genotype-phenotype associations (GPAs), environmental factor-phenotype associations (EFPAs), genotype-environmental factor-phenotype associations (GEFPAs), phenotype-treatment associations (PTAs), genotype-treatment associations (GTAs) and genotype-phenotype-treatment associations (GPTAs).
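To make the framework in Figure 2 more concrete, the following minimal sketch shows one way such a platform might represent typed, scored associations and prioritize them. It is an assumption-laden illustration rather than the design of any existing platform: the class names, the `prioritize` helper, the score range of 0 to 1 and the placeholder identifiers are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Tuple


class AssociationType(Enum):
    """Association categories named in the Figure 2 caption."""
    GPA = "genotype-phenotype"
    EFPA = "environmental factor-phenotype"
    GEFPA = "genotype-environmental factor-phenotype"
    PTA = "phenotype-treatment"
    GTA = "genotype-treatment"
    GPTA = "genotype-phenotype-treatment"


@dataclass(frozen=True)
class Association:
    kind: AssociationType
    # Vocabulary-prefixed identifiers, e.g. one term per linked factor.
    terms: Tuple[str, ...]
    # Hypothetical confidence score; the [0, 1] range is an assumption.
    score: float

    def __post_init__(self) -> None:
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score must be in [0, 1], got {self.score}")


def prioritize(
    associations: Iterable[Association],
    kind: AssociationType,
    top_n: int = 10,
) -> List[Association]:
    """Return the top_n associations of one type, highest score first."""
    selected = [a for a in associations if a.kind is kind]
    return sorted(selected, key=lambda a: a.score, reverse=True)[:top_n]


if __name__ == "__main__":
    # Placeholder identifiers, not real database entries.
    records = [
        Association(AssociationType.GPA, ("GENE:EXAMPLE1", "HP:EXAMPLE1"), 0.42),
        Association(AssociationType.GPA, ("GENE:EXAMPLE2", "HP:EXAMPLE1"), 0.87),
        Association(AssociationType.GTA, ("GENE:EXAMPLE1", "DRUG:EXAMPLE1"), 0.95),
    ]
    for record in prioritize(records, AssociationType.GPA):
        print(record.kind.name, record.terms, record.score)
```

Keeping the association type explicit lets a single scoring and ranking routine serve every category, while the vocabulary prefix on each term records which controlled vocabulary or ontology it came from.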
Comparison of different disease-related data resources (continued)
|
|
|
|
|
|
| |||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
|
|
|
|
|
|
| |||
|
| ||||||||||
| ChEMBL | √(Target) | √(F) | √ | PDTAs | ||||||
| DrugBank | √ | √ | √(Target) | √(F) | √ | PDTAs | ||||
| DrugCentral | √ | √ | √(Target) | √ | PDTAs | |||||
| TTD | √ | √ | √(Target) | √ | √ | PDTAs | ||||
| PharmGKB | √ | √ | √(Target) | √ | √ | GDAs | √ | |||
| DGIdb | √(Target) | √ | DTAs | √ | ||||||
| CancerPPD | √(Cancer) | √(Target) | √(Peptides) | DTAs | ||||||
According to scopes and data associations, the databases can be categorised into major groups, although some of them could be included in multiple groups. The symbol ‘√’ indicates that the relevant information is provided in the database. Abbreviations: NSDs: nervous system diseases; M: majority; F: few; lnc: lncRNA; miR: miRNA; circ: circRNA; ncR: ncRNAs, including lncRNA, miRNA, piRNA, siRNA and snoRNA; GPAs: genotype-phenotype associations; GDAs: genotype-drug associations; PDAs: phenotype-drug associations; GPEFAs: genotype-phenotype-environmental factor associations; GEFAs: genotype-environmental factor associations; PDTAs: phenotype-drug-target associations; DTAs: drug-target associations.
Summary of disease-related databases (continued)
| Database | Number of associations or items | Date of statistic |
|---|---|---|
| ExposomeExplorer [ | 8034 concentrations corresponding to 488 dietary biomarkers for 50 foods and 78 food compounds | Oct 2016p |
| CEBS [ | Over 11 000 exposure agents and over 8000 exposure studies | Nov 2016p |
| SM2miR [ | 5161 associations between 1681 miRNAs and 255 small molecules | Apr 2015p |
| miREnvironment [ | 3857 associations between 1242 miRNAs, EFs and 305 phenotypes | Sep 2012w |
| DLREFD [ | 835 associations between 475 LncRNAs, 153 EFs and 124 phenotypes | Oct 2016p |
|
| ||
| ChEMBL [ | Over 1.6 million distinct compound structures and 14 million activity values from over 1.2 million assays; ∼11 000 drug targets including 9052 proteins | Nov 2016p |
| DrugBank 4.0 [ | 2037 FDA-approved small molecule drugs and 241 FDA-approved biotech (protein/peptide) drugs; over 6000 experimental drugs and over 201 SNP-associated drug effects, and 4661 drug targets | Nov 2013p |
| DrugCentral [ | 2021 FDA drugs, 2423 drugs approved outside US, 3799 small molecules, 239 peptides, 294 other drugs; 10 427 human protein targets including 837 drug efficacy targets | Oct 2016p |
| TTD [ | 2071 approved drugs, 7291 clinical trial drugs, 357 preclinical drugs, 17 803 experimental drugs | Nov 2015p |
| PharmGKB [ | 20 017 associations between SNPs and drugs, and 65 important pharmacogenes | Jun 2018w |
| DGIdb [ | 40 017 clinically relevant associations mined between 2644 genes and 11 215 drugs | Nov 2015p |
| CancerPPD [ | 3491 experimentally verified anticancer peptides and 121 proteins spanning 21 tissues | Sep 2014p |
Scope refers to the major focus of the databases. The number of associations or items currently provided in the database is given. In the date of statistic, p indicates the Month-Year of statistic from journal publications; w refers to the Month-Year of statistic from official websites.
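The p/w suffix convention described in the note above lends itself to simple machine parsing when the table is reused. The sketch below is one possible way to do that, assuming date strings shaped like `Nov 2016p`, `22 June 2018w` or `2012p`; the `StatisticDate` type and `parse_statistic_date` function are hypothetical names introduced only for this illustration.

```python
import re
from typing import NamedTuple, Optional

_MONTHS = {
    name: number
    for number, name in enumerate(
        ["jan", "feb", "mar", "apr", "may", "jun",
         "jul", "aug", "sep", "oct", "nov", "dec"],
        start=1,
    )
}

# Optional day, optional month name, four-digit year, then 'p' or 'w'.
_PATTERN = re.compile(r"^(?:(\d{1,2})\s+)?(?:([A-Za-z]+)\.?\s+)?(\d{4})([pw])$")


class StatisticDate(NamedTuple):
    year: int
    month: Optional[int]
    day: Optional[int]
    source: str  # "publication" for 'p', "website" for 'w'


def parse_statistic_date(text: str) -> StatisticDate:
    """Parse a date-of-statistic cell such as 'Nov 2016p' or '2012p'."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized date of statistic: {text!r}")
    day, month_name, year, suffix = match.groups()
    month = None
    if month_name is not None:
        # Accept both short and long month names ('Jun', 'June').
        month = _MONTHS.get(month_name[:3].lower())
        if month is None:
            raise ValueError(f"unrecognized month in: {text!r}")
    return StatisticDate(
        year=int(year),
        month=month,
        day=int(day) if day is not None else None,
        source="publication" if suffix == "p" else "website",
    )


if __name__ == "__main__":
    for cell in ["Nov 2016p", "22 June 2018w", "2012p"]:
        print(cell, "->", parse_statistic_date(cell))
```

Rejecting malformed cells with a `ValueError`, rather than guessing, makes transcription errors in the date column easy to spot.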