Elizabeth A Webb, Timothy D Smith, Richard G H Cotton.
Abstract
DNA mutation data currently reside in many online databases, which differ markedly in the terminology used to describe or define the mutation and also in completeness of content, potentially making it difficult both to locate a mutation of interest and to find sought-after data (eg phenotypic effect). To highlight the current deficiencies in the accessibility of web-based genetic variation information, we examined the ease with which various resources could be interrogated for five model mutations, using a set of simple search terms relating to the change in amino acid or nucleotide. Fifteen databases were investigated for the time and/or number of mouse clicks required to find the mutations; availability of phenotype data; the procedure for finding information; and site layout. Google and PubMed were also examined. The three locus-specific databases (LSDBs) generally yielded positive outcomes, but the 12 genome-wide databases gave poorer results, with most proving not to be searchable and only three yielding successful outcomes. Google and PubMed searches found some mutations and provided patchy information on phenotype. The results show that many web-based resources are not currently configured for fast and easy access to comprehensive mutation data, with only the isolated LSDBs providing optimal outcomes. Centralising this information within a common repository, coupled with a simple, all-inclusive interrogation process, would improve searching for all gene variation data.
Year: 2011 PMID: 21504866 PMCID: PMC3500169 DOI: 10.1186/1479-7364-5-3-141
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Mutations used for web searches
| Mutation number | Gene name | Gene | Disorder | Amino acid position | Amino acid change |
|---|---|---|---|---|---|
| 1 | Phenylalanine hydroxylase | PAH | Phenylketonuria | 148 | Glycine/serine |
| 2 | mutL homolog 1 | MLH1 | Hereditary | 62 | Glutamine/lysine |
| 3 | mutL homolog 1 | MLH1 | Hereditary | 618 | Lysine/alanine |
| 4 | Breast cancer type 2 | BRCA2 | Breast cancer | 2723 | Aspartic acid/histidine |
| 5 | Breast cancer 1, | BRCA1 | Breast cancer | 772 | Valine/alanine |
The databases studied
| Database | URL | Description |
|---|---|---|
| HGMD--Human Gene Mutation | | Established for the study of mutational |
| OMIM--Online Mendelian Inheritance | | Comprehensive collection of human genes |
| dbSNP--NCBI Single Nucleotide | | Established to serve as a central repository |
| MutDB--Structurally Annotated | | Annotation of human variation data with |
| MutView--Mutation View | | Developed by the Keio University School of |
| dbGaP--NCBI Genotypes and | | Developed to archive/distribute results of |
| GeneRev--GeneReviews | | Expert-authored, peer-reviewed, current |
| GeneCards | | Searchable, integrated database of human |
| UniProt--Universal Protein Resource | | A comprehensive, high-quality and freely |
| GDB--Human Genome Database | | A community-curated collection of human |
| Ensembl | | Software system which produces and |
| DGV--Database of Genomic Variants | | Comprehensive summary of structural |
| PAHdb--Phenylalanine Hydroxylase | | Maintains and centralises mutation data on |
| InSiGHT--LOVD (Leiden Open | | International organisation aiming to improve |
| BIC--Open Access On-Line Breast | | Maintains a central repository for |
Searches of general and locus-specific databases
| Database | Mutation found | | | | | CMC to find or [not find] | | | | | Time to find or [not find] (min) | | | | | Phenotype found | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1 | 2 | 3 | 4 | 5 | 1 | 2 | 3 | 4 | 5 | 1 | 2 | 3 | 4 | 5 | 1 | 2 | 3 | 4 | 5 | 1 | 2 | 3 | 4 | 5 |
| HGMD | + | + | + (a) | + | + | 5 | 5 | 5 (a) | 5 | 5 | 6.15 | 3.27 | 8.95 (a) | 1.55 | 1.35 | + | + | + (a) | + | + | - | - | - | - | - |
| OMIM | - | - | + | - | - | [15] | [17] | 8 | [19] | [15] | [7.73] | [7.03] | 1.50 | [6.03] | [5.88] | - | - | + | - | - | - | - | - | - | - |
| dbSNP | - | - | - | - | - | [18] | [18] | [25] | [15] | [12] | [10.80] | [11.57] | [11.57] | [11.56] | [4.48] | - | - | - | - | - | - | - | - | - | - |
| MutDB | + | + | + | + | + | 4 | 4 | 4 | 4 | 4 | 6.50 | 2.90 | 1.58 | 1.15 | 0.98 | + | + | + | + | + | + | + | + | + | + |
| MutView | - | - | - | - | - | [9] | [5] | [5] | [9] | [9] | [4.48] | [1.32] | [1.32] | [4.78] | [4.90] | - | - | - | - | - | - | - | - | - | - |
| GeneCards | - | - | - | - (c) | - | [19] | [16] | [15] | [15] | [14] | [9.57] | [9.62] | [8.68] | [8.02] | [8.07] | - | - | - | - | - | - | - | - | - | - |
| GeneRev | - | - | - | - | - | [15] | [4] | [4] | [4] | [4] | [13.00] | [3.48] | [3.48] | [2.70] | [2.70] | - | - | - | - | - | - | - | - | - | - |
| dbGaP | - | - | - | - | - | [3] | [3] | [3] | [2] | [2] | [1.11] | [1.10] | [1.10] | [0.98] | [0.98] | - | - | - | - | - | - | - | - | - | |
| UniProt | + | + | + | + | + | 3 | 3 | 3 | 3 | 3 | 1.18 | 1.00 | 0.75 | 1.03 | 0.80 | + | + | + | + | + | + | + | + | + | + |
| Ensembl | - | + | + | + | - | [27] | 6 | 6 | 6 | [24] | [15.50] | 5.95 | 4.75 | 2.75 | [12.50] | - | - | - | - | - | - | - | - | - | - |
| GDB (NO) | | | | | | | | | | | | | | | | | | | | | | | | | |
| DGV | - | - | - | - | - | [3] | [3] | [3] | [3] | [3] | [2.75] | [1.07] | [1.07] | [1.07] | [1.82] | - | - | - | - | - | - | - | - | - | - |
| PAHdb | + | | | | | 2 | | | | | 1.15 | | | | | - | | | | | - | | | | |
| InSiGHT | | + | + | | | | 10 | 4 | | | | 10.85 | 1.98 | | | | + | + | | | | + | + | | |
| BIC | | | | + | + | | | | 5 | 5 | | | | 1.00 | 0.92 | | | | + | + | | | | + | + |
(a) Wrongly numbered as 617.
(b) Defined as being appropriate additional information (eg impact on severity of disease).
(c) Mutation found using D/H format. Wrongly numbered as 2722. No phenotype data.
(d) Mutations 1-5.
Abbreviation: NO, not operational.
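As a cross-check of the abstract's statement that only three of the genome-wide databases yielded successful outcomes, the "Mutation found" columns above can be tallied. The following is a minimal sketch, not part of the original study: the `FOUND` dictionary and the function names are illustrative, the flags are transcribed by hand from the table, GDB is left out because it was not operational, and footnoted cells are counted by their printed sign (the HGMD `+` for mutation 3 as found, the GeneCards `-` for mutation 4 as not found).

```python
# '+' = mutation found, '-' = not found, for mutations 1-5 in order.
# Transcribed from the genome-wide rows of the table above (assumption:
# footnote markers are ignored and only the printed +/- sign is used).
FOUND = {
    "HGMD": "+++++",
    "OMIM": "--+--",
    "dbSNP": "-----",
    "MutDB": "+++++",
    "MutView": "-----",
    "GeneCards": "-----",
    "GeneRev": "-----",
    "dbGaP": "-----",
    "UniProt": "+++++",
    "Ensembl": "-+++-",
    "DGV": "-----",
}


def tally(found):
    """Return the number of mutations found per database."""
    return {db: flags.count("+") for db, flags in found.items()}


def fully_successful(found):
    """Return the databases in which every searched mutation was found."""
    return [db for db, flags in found.items() if flags and set(flags) == {"+"}]


counts = tally(FOUND)
complete = fully_successful(FOUND)
```

Under these assumptions the per-database counts agree with the "Mutations" column of the ease-of-use comparison (for example one for OMIM and three for Ensembl), and the databases with all five mutations found are HGMD, MutDB and UniProt.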
Figure 1. Time taken to access mutations (Mut) 1 to 5 on the Human Gene Mutation Database (HGMD), Structurally Annotated Mutation Data (MutDB), Ensembl and Universal Protein Resource (UniProt) websites.
Database comparisons for ease of use characteristics for finding variations causing inherited disease (mutations)
| Database | Mutations | Time | CMCs | Phenotype | Password | Database | Clear | Clear | Recent |
|---|---|---|---|---|---|---|---|---|---|
| HGMD | * | * | * | * | * | * | (e) | | |
| OMIM | 1 (b) | * | * | * | * | * | * | * | * |
| dbSNP (c) | 0 (b) | NA | NA | NA | * | * | | | |
| MutDB | * | * | * | * | * | * | * | * | |
| MutView | 0 (b) | NA | NA | NA | * | * | * | | |
| GeneCards | 0 (b) | NA | NA | NA | * | * | * | (d) | * |
| GeneRev | 0 (b) | NA | NA | NA | * | * | * | | |
| dbGaP | 0 (b) | NA | NA | NA | * | * | | | |
| UniProt | * | * | * | * | * | * | * | * | * |
| Ensembl | 3 (b) | * | * | * | * | * | (d) | * | |
| DGV | 0 (b) | NA | NA | NA | * | * | * | * | |
| PAHdb | * | * | * | * | * | * | * | * | |
| InSiGHT | * | * | * | * | * | * | * | (d) | * |
| BIC | * | * | * | * | * | * | * | | |
*Indicates compliance with column heading criteria.
(a) For the last mutation searched (thus allowing for an 'experience' factor).
(b) Actual number of mutations found.
(c) Included for comparison, not a mutation database.
(d) Although layout is relatively clear, many fields are included in these databases, making them complex to navigate initially.
(e) Public version only.
NA, Not applicable (mutation not found).
Links available from the various databases
| Links | HGMD | OMIM | dbSNP | MutView | GeneRev | dbGaP | UniProt | Ensembl | DGV | InSiGHT | BIC | |||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OMIM | * | * | * | * | * | * | * | |||||||
| GDB | * | * | ||||||||||||
| NCBI | * | * | * | * | * | * | * | |||||||
| EntrezGene | * | * | * | * | * | * | * | |||||||
| GeneCards | * | * | ||||||||||||
| GenAtlas | * | * | ||||||||||||
| JSNP | * | |||||||||||||
| GAD | * | * | * | |||||||||||
| FINDbase | * | |||||||||||||
| GeneClinics | * | * | ||||||||||||
| SwissProt | * | * | * | * | * | * | ||||||||
| TrEMBL | * | * | ||||||||||||
| LSDB | * | * | * | * | * | |||||||||
| Coriell | * | |||||||||||||
| HGVS | * | |||||||||||||
| HGMD | * | * | * | * | * | |||||||||
| Genage | * | |||||||||||||
| HGNC | * | * | * | * | * | |||||||||
| dbGaP | * | |||||||||||||
| PharmGKB | * | * | ||||||||||||
| SeattleSNPs | * | |||||||||||||
| UCSC | * | * | * | |||||||||||
| Ensembl | * | * | ||||||||||||
| GeneTests | * | * | ||||||||||||
| UniGene | * | * | ||||||||||||
| WikiGene | * | * | ||||||||||||
| RCSB | * | |||||||||||||
| BIOPKU | * | |||||||||||||
| THBdb | * | |||||||||||||
| WoodsMMR | * | |||||||||||||
| MMRUV | * | |||||||||||||
| HapMap | * | * | ||||||||||||
| DECIPHER | * | |||||||||||||
| dbRIP | ||||||||||||||
| HSVD | ||||||||||||||
| CAC | ||||||||||||||
| PDB | * | |||||||||||||
| IPI | * |
*Indicates link available.
(a) Additional links specific to MutDB: SIFT; PolyPhen; LS-SNP; SNPs3D; PolyDoms; Panther; PMut; SNPEffect; FASTSNP.
(b) Additional links specific to GeneCards: GeneLoc; dbSNP; AKS; HuGE; AceView; euGenes; miRbase; ECGene; H-InvDB; ATLAS; HORDE; IMGT; Leiden; GeneRev; Navigator; BCGD; TGDB; Pupasuite; Homologene; Pseudogene; SGD; MGI; Flybase; Wormbase; GeneDecks; GeneNote; GNF SynAtlas; GeneAnnotation; GeneTide; SAGE tags; CGAP; Source; GNF BioGPS; ExpoldeB; RNAdb; ASD; BioMol; MINT; String; Kegg; IntAct; Phosphosite; Proteopedia; OCA; ProtoNet; BLOCKS; InterPro.
(c) Additional links specific to PAHdb: Cell bank; The Waystation; PHEXdb; CASRdb; HEXdb; CYSdb; Human Genome Variation Society (nomenclature guidelines).
Google searches
| Search term | No. entries | | Entry no./name | Phenotype |
|---|---|---|---|---|
| 'PAH G148S' (a) | 0 | - | | |
| PAH G148S (b) | 19 | 2 | 3/PAH LK (f) | No |
| | | | 5/FINDbase | No |
| 'PAH Gly148Ser' | 0 | - | - | |
| PAH Gly148Ser | 0 | - | - | |
| 'PAH GLY148SER' | 0 | - | - | |
| PAH GLY148SER | 0 | - | - | |
| 'MLH1 Q62K' | 2 | 0 | - | |
| MLH1 Q62K | 10 | 0 | - | |
| 'MLH1 Gln62Lys' | 0 | - | - | |
| MLH1 Gln62Lys (c) | 8 | 0 | - | |
| 'MLH1 K618A' | 7 | 0 | - | |
| MLH1 K618A | 159 | 2 | 55;56/LOVD (d) | Yes |
| | (72) (e) | | | |
| 'MLH1 Lys618Ala' | 4 | 0 | - | |
| MLH1 Lys618Ala (c) | 283 | 2 | 37;49/LOVD (d) | Yes |
| | (80) (e) | 1 | 53/GeneCards | No |
| 'BRCA2 D2723H' | 6 | 0 | | |
| BRCA2 D2723H | 87 | 1 | 11/kConFab | Yes |
| | (33) (e) | | | |
| 'BRCA2Asp2723His' | 0 | - | | |
| BRCA2Asp2723His (c) | 5 | 1 | 4/kConFab | Yes |
| 'BRCA1 V772A' | 0 | - | | |
| BRCA1 V772A | 12 | 1 | 12/kConFab | Yes |
| 'BRCA1 Val772Ala' | 0 | - | | |
| BRCA1 Val772Ala (c) | 1 | 1 | 1/kConFab Consortium | Yes |
(a) Quote marks indicate Google Advanced exact wording or phrase search.
(b) Lack of quote marks indicates Google Advanced search with unlinked terms.
(c) Upper case letters gave the same result as lower case.
(d) Leiden Open Variation Database.
(e) Google estimate of unique entries.
(f) PAH Locus Knowledgebase.
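The search terms in the table follow a regular pattern: gene symbol plus the amino acid change in one-letter, three-letter and upper-case three-letter form, each tried as an exact phrase and as unlinked terms. The sketch below generates that set for the five model mutations. It is an illustration only, not the authors' procedure: the `MUTATIONS` list, the `THREE_LETTER` lookup and the `search_terms` function are names introduced here, and the single quotes simply mirror the table's notation for an exact-phrase search.

```python
# Standard one-letter to three-letter amino acid codes, limited to the
# residues that occur in the five model mutations.
THREE_LETTER = {
    "G": "Gly", "S": "Ser", "Q": "Gln", "K": "Lys",
    "A": "Ala", "D": "Asp", "H": "His", "V": "Val",
}

# (gene symbol, original residue, position, replacement residue)
MUTATIONS = [
    ("PAH", "G", 148, "S"),
    ("MLH1", "Q", 62, "K"),
    ("MLH1", "K", 618, "A"),
    ("BRCA2", "D", 2723, "H"),
    ("BRCA1", "V", 772, "A"),
]


def search_terms(gene, ref, pos, alt):
    """Return exact-phrase and unlinked search terms for one mutation."""
    one_letter = f"{gene} {ref}{pos}{alt}"
    three_letter = f"{gene} {THREE_LETTER[ref]}{pos}{THREE_LETTER[alt]}"
    upper_case = f"{gene} {THREE_LETTER[ref].upper()}{pos}{THREE_LETTER[alt].upper()}"
    terms = []
    for term in (one_letter, three_letter, upper_case):
        terms.append(f"'{term}'")  # quoted: exact wording or phrase
        terms.append(term)         # unquoted: unlinked terms
    return terms


all_terms = {m: search_terms(*m) for m in MUTATIONS}
```

For mutation 1 this reproduces the six PAH search terms listed at the top of the table, in the same order.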
PubMed searches
| First | Second | | Reference |
|---|---|---|---|
| 'PAH G148S' | | 0 | |
| PAH G148S | | 0 | |
| PAH | | 8420 | |
| PAH | G148S | 0 | |
| PAH | Gly148Ser (a) | 0 | |
| MLH1 | | 2353 | |
| MLH1 | Q62K | 0 | |
| MLH1 | Gln62Lys (a) | 0 | |
| MLH1 | K618A | 5 | [ |
| MLH1 | Lys618Ala (a) | 1 | [ |
| BRCA2 | | 3906 | |
| BRCA2 | D2723H | 1 | [ |
| BRCA2 | Asp2723His (a) | 1 | [ |
| BRCA1 | | 6317 | |
| BRCA1 | V772A | 0 | |
| BRCA1 | Val772Ala (a) | 0 | |
Quote marks indicate combined search terms.
(a) Upper case letters gave the same result as lower case.
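A two-term PubMed search of this kind can also be issued programmatically. The sketch below only builds the request URL for the NCBI E-utilities `esearch` endpoint and makes no network call. It rests on assumptions that go beyond the source: the study appears to have used the PubMed web interface, the choice of `AND` to join the first and second terms is a guess at how they were combined, and `pubmed_query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (public API, not the interface used in the study).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_query_url(first, second=None, exact_phrase=False):
    """Build an esearch URL for a gene term and an optional mutation term."""
    if second is None:
        term = first                      # gene symbol only, e.g. "PAH"
    elif exact_phrase:
        term = f'"{first} {second}"'      # combined terms as one quoted phrase
    else:
        term = f"{first} AND {second}"    # assumption: terms joined with AND
    params = {"db": "pubmed", "term": term, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


url = pubmed_query_url("MLH1", "K618A")
```

Fetching such a URL returns a JSON document whose hit count could be compared with the counts in the table, although present-day counts would be expected to differ from those recorded for the study.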
Figure 2. Proposal for a recommended search strategy, exemplified by phenylalanine hydroxylase (shown in italics).