Jun-Ichi Takeda, Chisato Yamasaki, Katsuhiko Murakami, Yoko Nagai, Miho Sera, Yuichiro Hara, Nobuo Obi, Takuya Habara, Takashi Gojobori, Tadashi Imanishi.
Abstract
H-InvDB (http://www.h-invitational.jp/) is a comprehensive human gene database started in 2004. In the latest version, H-InvDB 8.0, a total of 244 709 human complementary DNAs were mapped onto the hg19 reference genome, and 43 829 gene loci, including non-protein-coding ones, were identified. Of these loci, 35 631 were identified as potential protein-coding genes, and 22 898 of these were identical to known genes. In our analysis, 19 309 annotated genes were specific to H-InvDB and not found in RefSeq and Ensembl. Of these 19 309 genes, 233 were assigned protein functions in this version of H-InvDB; they had been annotated as proteins of unknown function in the previous version. Furthermore, 11 genes were identified as known Mendelian disorder genes. An advantage of H-InvDB is that many biologically functional genes are hidden among its unique genes. As large-scale proteomic projects have been conducted to elucidate the functions of all human proteins, we have enhanced the proteomic information with an advanced protein view and a new subdatabase of protein complexes (Protein Complex Database with quality index). We propose that H-InvDB is an important resource for finding novel candidate targets for medical care and drug development.
Year: 2012 PMID: 23197657 PMCID: PMC3531145 DOI: 10.1093/nar/gks1245
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. A schematic diagram of H-InvDB as a central hub for human omics studies. Each component is described briefly on the Quick guide page (http://h-invitational.jp/hinv/ahg-db/tools.jsp).
Table 1. Statistics of H-InvDB 8.0
| Number of gene clusters (HIX) | Number of transcripts (HIT) | Number of proteins (HIP) |
|---|---|---|
| 43 829 | 244 709 | 147 684 |
Table 2. Statistics of representative HIPs
| Category | Definition | Number of representative HITs |
|---|---|---|
| I | Identical to known human protein (≥98% identity and 100% coverage) | 16 128 |
| II | Similar to known protein (≥50% identity and ≥50% coverage) | 5872 |
| III | InterPro domain containing protein | 898 |
| IV | Conserved hypothetical protein | 1705 |
| V | Hypothetical protein | 5268 |
| VI | Hypothetical short protein (20–79 amino acids) | 5068 |
| VII | Pseudogene candidates | 692 |
| Total | | 35 631 |
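The totals in this table can be checked with a few lines of arithmetic. The sketch below copies the per-category counts from the table; the variable names are illustrative and not part of H-InvDB. It also shows that categories I to III sum to 22 898, which appears to correspond to the number of genes the abstract describes as identical to known genes, although the record does not state that mapping explicitly.

```python
# Representative HIT counts per protein category, copied from the table above.
category_counts = {
    "I": 16128,
    "II": 5872,
    "III": 898,
    "IV": 1705,
    "V": 5268,
    "VI": 5068,
    "VII": 692,
}

# Sum over all seven categories; should match the "Total" row.
total = sum(category_counts.values())

# Sum over categories I-III (assumption: these are the categories
# with a match to a known protein or an InterPro domain).
categories_i_to_iii = sum(category_counts[c] for c in ("I", "II", "III"))

print(total)                # 35631
print(categories_i_to_iii)  # 22898
```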
Figure 2. Comparison of gene numbers between H-InvDB and other databases. (A) The Venn diagram represents the numbers of unique and overlapping genes among H-InvDB, RefSeq and Ensembl. (B) The bar graph represents the numbers of H-Inv unique genes when compared with CCDS genes. The roman numerals indicate the protein categories shown in Table 2.
Table 3. Protein category-upgraded genes related to Mendelian disorders and found only in H-InvDB 8.0
| Category | Number of category-upgraded genes |
|---|---|
| Upgrade from V or VI to I | 11 |
| Upgrade from V or VI to II | 209 |
| Upgrade from V or VI to III | 13 |
Category definitions are shown in Table 2.
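As a consistency check, the three upgrade counts can be summed and compared with the figures quoted in the abstract. The sketch below is only arithmetic on numbers taken from this record; the variable names are illustrative.

```python
# Category-upgraded gene counts, copied from the table above.
upgrades = {
    "V or VI to I": 11,
    "V or VI to II": 209,
    "V or VI to III": 13,
}

# Number of H-InvDB-specific genes, as quoted in the abstract.
hinv_unique_genes = 19309

upgraded_total = sum(upgrades.values())

# Share of the H-InvDB-specific genes that were upgraded, in percent.
upgraded_share = round(100 * upgraded_total / hinv_unique_genes, 2)

print(upgraded_total)  # 233, matching the 233 genes quoted in the abstract
print(upgraded_share)  # 1.21
```

On these numbers, roughly 1.2% of the H-InvDB-specific genes gained a functional annotation between versions.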
Figure 3. Screenshots of part of the protein view and the top page of PCDq. (A) Hyperlinks to NBRC and HGPD are circled in red. (B) The entrance to PCDq is http://www.h-invitational.jp/hinv/pcdq/.