Chisato Yamasaki, Katsuhiko Murakami, Jun-ichi Takeda, Yoshiharu Sato, Akiko Noda, Ryuichi Sakate, Takuya Habara, Hajime Nakaoka, Fusano Todokoro, Akihiro Matsuya, Tadashi Imanishi, Takashi Gojobori.
Abstract
We report the extended database and data mining resources newly released in the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). H-InvDB is a comprehensive annotation resource of human genes and transcripts, and consists of two main views and six sub-databases. The latest release of H-InvDB (release 6.2) provides the annotation for 219,765 human transcripts in 43,159 human gene clusters based on human full-length cDNAs and mRNAs. H-InvDB now provides several new annotation features, such as mapping of microarray probes, new gene models, relation to known ncRNAs and information from the Glycogene database. H-InvDB also provides useful data mining resources: 'Navigation search', 'H-InvDB Enrichment Analysis Tool (HEAT)' and web service APIs. 'Navigation search' is an extended search system that enables complicated searches by combining 16 different search options. HEAT is a data mining tool for automatically identifying features specific to a given human gene set. HEAT searches for H-InvDB annotations that are significantly enriched in a user-defined gene set, as compared with the entire H-InvDB representative transcripts. H-InvDB now has web service APIs of SOAP and REST to allow the use of H-InvDB data in programs, providing the users extended data accessibility.
Year: 2009 PMID: 19933760 PMCID: PMC2808976 DOI: 10.1093/nar/gkp1020
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
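The abstract describes HEAT as looking for annotations that are significantly enriched in a user-defined gene set relative to all H-InvDB representative transcripts, but it does not name the statistical test. The sketch below illustrates one common way to frame such a comparison, a one-sided hypergeometric test; the choice of test, the function names, the transcript IDs (`T01` to `T10`) and the annotation terms are all assumptions made for illustration and are not taken from HEAT itself.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which carry the annotation of interest."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def enriched_annotations(background, gene_set, alpha=0.05):
    """Return (term, p-value) pairs with p <= alpha, sorted by p-value.

    background: dict mapping transcript ID -> set of annotation terms
                (stands in for the full set of representative transcripts).
    gene_set:   iterable of transcript IDs chosen by the user.
    """
    N = len(background)
    selected = [t for t in gene_set if t in background]
    n = len(selected)
    terms = set().union(*background.values())
    results = []
    for term in terms:
        K = sum(1 for anns in background.values() if term in anns)
        k = sum(1 for t in selected if term in background[t])
        p = hypergeom_upper_tail(N, K, n, k)
        if p <= alpha:
            results.append((term, p))
    return sorted(results, key=lambda item: item[1])


# Toy data (hypothetical): 10 transcripts, two annotation terms.
background = {f"T{i:02d}": set() for i in range(1, 11)}
for t in ("T01", "T02", "T03", "T04", "T05"):
    background[t].add("kinase")
for t in ("T01", "T06"):
    background[t].add("membrane")

gene_set = {"T01", "T02", "T03", "T04"}
hits = enriched_annotations(background, gene_set)
print(hits)
```

In this toy case all four selected transcripts carry the "kinase" term, so it passes the threshold, while "membrane" does not. A real analysis over many annotation terms would normally also apply a multiple-testing correction, which this sketch omits.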
Statistics of H-InvDB entries
| H-InvDB release | Date of release | Number of transcripts (HIT) | Number of gene clusters (HIX) | Number of proteins (HIP) | Annotation jamboree | Date of jamboree |
|---|---|---|---|---|---|---|
| 1.0 | 20 April 2004 | 41 118 | 21 037 | – | H-Invitational 1 (a) | August 2002 |
| 2.0 | 31 August 2005 | 56 419 | 25 585 | – | H-Invitational 2 FA (a) | November 2003 |
| 3.0 | 31 March 2006 | 167 992 | 35 005 | – | All human gene FA meeting 2005 (b) | October 2005 |
| 4.0 | 28 March 2007 | 175 542 | 34 701 | 173 690 | All human gene FA meeting 2006 (b) | October 2006 |
| 5.0 | 26 December 2008 | 187 156 | 36 073 | 124 280 | All human gene FA meeting 2007 (b) | October 2007 |
| 6.0 | 18 December 2008 | 219 765 | 43 159 | 133 523 | – | – |
| 6.2 | 30 March 2009 | 219 765 | 43 159 | 133 629 | – | – |
(a) Meeting of the H-Invitational project.
(b) Meeting hosted by the Genome Information Integration Project (GIIP).
Statistics of curated representative H-Inv proteins (H-InvDB release 6.2)
| Category | Definition | Number of representative HITs | Percentage |
|---|---|---|---|
| I | Identical to known | 13 314 | 37.71 |
| II | Similar to known | 3380 | 9.57 |
| III | InterPro domain containing protein | 2584 | 7.32 |
| IV | Conserved hypothetical protein | 4584 | 12.98 |
| V | Hypothetical protein | 5203 | 14.74 |
| VI | Hypothetical short protein (20–79 amino acids) | 5446 | 15.43 |
| VII | Pseudogene candidates | 901 | 2.55 |
| Total | | 35 303 | 100.00 |
Note: ‘Known’ proteins are proteins that have been experimentally validated in the literature.
Figure 1. pHIT gene model in the G-integra genome browser. An image of the G-integra genome browser for a pHIT gene model, pHIT000015735, is shown (http://www.h-invitational.jp/hinv/g-integra/cgi-bin/f_genemap.cgi?id=pHIT000015735). The gene structure of pHIT000015735 is indicated by a blue solid square in the all human gene and JIGSAW tracks.
New annotated features in H-InvDB
| No. | Annotation item | Area | Available at |
|---|---|---|---|
| 1 | Mappings of microarray probes to H-InvDB data | Expression | ‘Expression’ tab in Transcript view |
| 2 | New ID for gene families/groups (HIF) | Gene family | ‘Function’ tab in Transcript view, Locus view, and Gene Family/groups view. |
| 3 | pHIT gene models | Gene model | Transcript view, Locus view, G-integra and all the related viewers |
| 4 | eHIT gene models | Gene model | Transcript view, Locus view, G-integra and all the related viewers |
| 5 | Truncation judgment | Quality control | ‘Transcript Information’ tab in Transcript view |
| 6 | Kozak sequence | Quality control | ‘Transcript Information’ tab in Transcript view |
| 7 | Anti-sense gene information | Gene structure | ‘Gene structure’ tab in Locus view |
| 8 | Detailed data of similarity to known ncRNA. | ncRNA | ‘Function’ tab in Transcript view |
| 9 | Two new species (horse and medaka) for comparative analysis | Comparative | ‘Evolution’ tab in Transcript view, G-integra and Evola |
| 10 | Detailed annotation for unmapped (UM) transcripts | Gene structure | Topic Annotation viewer |
| 11 | Remote integration of GlycoGene Database (GGDB) | Function | ‘Function’ tab in Transcript view |
| 12 | Remote integration of the functional RNA database (fRNAdb) | ncRNA | ‘Function’ tab in Transcript view |
Figure 2. ‘Navigation search’: a powerful search tool with 16 search items. Example screenshot of the Navigation search system (http://www.h-invitational.jp/hinv/c-search/). (A) Links to the Navigation system, ‘Navi’, appear in the black menu bar of all H-InvDB viewers, including the top page. (B) The search navigation menu provides the list of all searches available in H-InvDB. (C) The new advanced search provides combination searches over 16 search contents, for example, #2 gene structure, #3 alternative splicing (AS) variants, #10 genetic polymorphism and #13 relation to disease. (D) The search results provide the list of HIX IDs, HIT IDs, chromosome number, definition, HGNC gene symbol, and links to the appropriate H-InvDB and related viewers.
The list of search contents and items in the H-InvDB Navigation search
| No. | Search content | Search items |
|---|---|---|
| 1 | Keyword or ID | 13 IDs and 7 different types of keywords |
| 2 | Gene structure | chromosome number, chromosomal band, genome strand and location on the human genome |
| 3 | Alternative splicing (AS) variants | splicing site, pattern and location of alternative splicing |
| 4 | Non-coding functional RNAs | type and classification of ncRNAs |
| 5 | Protein functions | definition, similarity category, gene symbol, EC name and molecular function of GO |
| 6 | Functional domains | ID, name and type of InterPro domain |
| 7 | Subcellular localization | cellular component of GO and predicted subcellular localization by WoLF PSORT, SOSUI, TMHMM, TargetP and PTS1 |
| 8 | Metabolic pathways | biological process of GO, ID and name of the KEGG pathway |
| 9 | Protein 3D structure | PDB and SCOP IDs of GTOP prediction |
| 10 | Genetic polymorphism | types and features of variation such as SNP, microsatellite, copy number variation (CNV), synonymous or nonsynonymous variations |
| 11 | Gene expression | tissue specific expression in ten tissue/organ classes, Affymetrix probe ID, promoter motif and upstream transcriptional start site (TSS) |
| 12 | Relation to disease | relation to MutationView, ID and disease name of OMIM |
| 13 | Molecular evolution | orthologues and genome conservation among human and 13 model organisms |
| 14 | Protein–protein interaction | number of interacting proteins |
| 15 | Gene families and groups | all the predicted human gene families and four manually curated gene families/groups; Ig, MHC, TCR and OR |
| 16 | Transcript information | sequence data provider, molecular type, coding potential and curation status information |
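The Navigation search is described as combining several of the 16 search contents listed above into one query. As a rough illustration of what such a combination amounts to, the sketch below treats each search content as a predicate and keeps only the records that satisfy all of them. The record fields (`chromosome`, `as_variants`, `has_disease_link`), the IDs and the helper names are hypothetical; they are not the H-InvDB schema or its query interface.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]
Condition = Callable[[Record], bool]


def equals(field: str, value: Any) -> Condition:
    """Condition that holds when a record field equals the given value."""
    return lambda rec: rec.get(field) == value


def at_least(field: str, minimum: int) -> Condition:
    """Condition that holds when a numeric record field is >= minimum."""
    return lambda rec: rec.get(field, 0) >= minimum


def combined_search(records: Iterable[Record], conditions: List[Condition]) -> List[Record]:
    """Keep records that satisfy every condition (logical AND)."""
    return [rec for rec in records if all(cond(rec) for cond in conditions)]


# Toy records with made-up IDs and fields.
records = [
    {"id": "X0001", "chromosome": "1", "as_variants": 3, "has_disease_link": True},
    {"id": "X0002", "chromosome": "1", "as_variants": 0, "has_disease_link": True},
    {"id": "X0003", "chromosome": "7", "as_variants": 2, "has_disease_link": False},
]

# Combine three conditions: gene structure, AS variants, relation to disease.
hits = combined_search(
    records,
    [
        equals("chromosome", "1"),
        at_least("as_variants", 1),
        equals("has_disease_link", True),
    ],
)
print([rec["id"] for rec in hits])
```

Representing each search content as an independent predicate keeps the combination step trivial: adding a further criterion only means appending one more condition to the list.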
The list of representative H-InvDB web service APIs (SOAP)
| API type | Description of API | WSDL | Query and output |
|---|---|---|---|
| Search entries | Search by IDs | soap_id_search.php?wsdl | query = any ID; output = HIT ID |
| Search entries | Search by keywords | soap_keyword_search.php?wsdl | query = any keyword; output = HIT ID |
| Search entries | Search by genomic location | soap_location2hit.php?wsdl | query = genomic location; output = corresponding HIT ID |
| Count entries | Total number of HIT | soap_hit_cnt.php?wsdl | output = total number of HIT ID |
| Convert IDs | Convert INSD accession to HIT | soap_acc2hit.php?wsdl | query = Accession No.; output = HIT ID |
| Retrieve data | Retrieve HIT XML file | soap_hit_xml.php?wsdl | query = HIT ID; output = HIT XML file |
| Retrieve data | Retrieve HIT definition | soap_hit_definition.php?wsdl | query = HIT ID; output = HIT definition |
| Retrieve data | Retrieve HIT evolutionary information | soap_hit_evolution.php?wsdl | query = HIT ID; output = evolutionary information |
| Retrieve data | Retrieve HIT gene expression information | soap_hit_expression.php?wsdl | query = HIT ID; output = gene expression information |
| Retrieve data | Retrieve genomic location of HIT | soap_hit_location.php?wsdl | query = HIT ID; output = genomic location of HIT |
| Retrieve data | Retrieve nucleotide sequence of HIT | soap_hit_nucleotide_seq_xml.php?wsdl | query = HIT ID; output = nucleotide sequence of HIT (XML format) |
| Retrieve data | Retrieve protein sequence of HIT | soap_hit_protein_seq_xml.php?wsdl | query = HIT ID; output = protein sequence of HIT (XML format) |
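The table gives only the WSDL file name for each SOAP API, not the directory it is served from. The sketch below shows how a client program might turn those file names into full WSDL addresses once the service base URL is known. The base URL used here (`http://example.org/ws`) is a placeholder, and the short keys (`hit_cnt`, `acc2hit`, and so on) and the helper `wsdl_url` are names invented for this example; only the `soap_*.php?wsdl` file names come from the table.

```python
# WSDL file names copied from the table; the dictionary keys are illustrative labels.
WSDL_FILES = {
    "id_search": "soap_id_search.php?wsdl",
    "keyword_search": "soap_keyword_search.php?wsdl",
    "location2hit": "soap_location2hit.php?wsdl",
    "hit_cnt": "soap_hit_cnt.php?wsdl",
    "acc2hit": "soap_acc2hit.php?wsdl",
    "hit_xml": "soap_hit_xml.php?wsdl",
    "hit_definition": "soap_hit_definition.php?wsdl",
    "hit_evolution": "soap_hit_evolution.php?wsdl",
    "hit_expression": "soap_hit_expression.php?wsdl",
    "hit_location": "soap_hit_location.php?wsdl",
    "hit_nucleotide_seq": "soap_hit_nucleotide_seq_xml.php?wsdl",
    "hit_protein_seq": "soap_hit_protein_seq_xml.php?wsdl",
}


def wsdl_url(base_url: str, api: str) -> str:
    """Build the WSDL address for one API under a caller-supplied base URL."""
    if api not in WSDL_FILES:
        raise KeyError(f"unknown API label: {api}")
    return base_url.rstrip("/") + "/" + WSDL_FILES[api]


# Placeholder base URL (assumption): replace with the real service location.
BASE_URL = "http://example.org/ws"
print(wsdl_url(BASE_URL, "hit_definition"))
```

A SOAP client library could then load the resulting address to discover the operation names and parameter types, which are not listed in the table and are therefore left out of this sketch.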