Ingo Paulsen, Arndt von Haeseler.
Abstract
Classification of proteins into families of homologous sequences constitutes the basis of functional analysis or of evolutionary studies. Here we present INVertebrate HOmologous GENes (INVHOGEN), a database combining the available invertebrate protein genes from UniProt (consisting of Swiss-Prot and TrEMBL) into gene families. For each family INVHOGEN provides a multiple protein alignment, a maximum likelihood based phylogenetic tree and taxonomic information about the sequences. It is possible to download the corresponding GenBank flatfiles, the alignment and the tree in Newick format. Sequences and related information have been structured in an ACNUC database under a client/server architecture. Thus, complex selections can be performed. An external graphical tool (FamFetch) allows access to the data to evaluate homology relationships between genes and distinguish orthologous from paralogous sequences. Thus, INVHOGEN complements the well-known HOVERGEN database. The databank is available at http://www.bi.uni-duesseldorf.de/~invhogen/invhogen.html.
Year: 2006 PMID: 16381884 PMCID: PMC1347462 DOI: 10.1093/nar/gkj100
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Removing incompatible HSPs. For each pair of sequences X and Y that hit each other in a BLASTP search, HSPs that are not compatible with a global alignment are removed. In this example, hits H1 and H2 are compatible, whereas H3 and H4 are not. Therefore, only H1 and H2 are used in the subsequent computation of similarity measures. Because H1 and H2 overlap, the overlapping region is allocated to H1 and H2 is shortened accordingly. If H1 and H2 cross over between sequences X and Y, H1 is used if length(H1) > length(H2); otherwise, H2 is used.
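The filtering in Figure 1 can be illustrated with a minimal Python sketch. It is not the INVHOGEN pipeline and rests on assumptions made here: HSPs are treated as ungapped segments with 0-based, half-open coordinates; a greedy longest-first pass (a strategy chosen for this sketch because it matches the crossing-over rule) keeps an HSP only if it lies consistently before or after every HSP already kept on both X and Y; and when two HSPs overlap, the overlap stays with the one kept first (the longer one), while the other is shortened. The names `HSP`, `trim_against` and `compatible_hsps` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HSP:
    """Hypothetical ungapped HSP: X[x_start:x_start+length] ~ Y[y_start:y_start+length]."""
    name: str
    x_start: int
    y_start: int
    length: int

    @property
    def x_end(self) -> int:
        return self.x_start + self.length

    @property
    def y_end(self) -> int:
        return self.y_start + self.length


def trim_against(h: HSP, kept: HSP) -> Optional[HSP]:
    """Shorten h so it no longer overlaps kept; None if h crosses kept or vanishes."""
    if h.x_start < kept.x_start and h.y_start < kept.y_start:
        # h precedes kept on both sequences: give the overlap to kept, cut h's right end.
        cut = max(0, h.x_end - kept.x_start, h.y_end - kept.y_start)
        new = HSP(h.name, h.x_start, h.y_start, h.length - cut)
    elif h.x_start > kept.x_start and h.y_start > kept.y_start:
        # h follows kept on both sequences: cut h's left end.
        cut = max(0, kept.x_end - h.x_start, kept.y_end - h.y_start)
        new = HSP(h.name, h.x_start + cut, h.y_start + cut, h.length - cut)
    else:
        # Order on X disagrees with order on Y (crossing-over), or same start.
        return None
    return new if new.length > 0 else None


def compatible_hsps(hsps: List[HSP]) -> List[HSP]:
    """Greedily keep a collinear, non-overlapping subset, longest HSPs first."""
    kept: List[HSP] = []
    # sorted(..., reverse=True) is stable, so equal lengths keep input order.
    for hsp in sorted(hsps, key=lambda s: s.length, reverse=True):
        candidate: Optional[HSP] = hsp
        for k in kept:
            candidate = trim_against(candidate, k)
            if candidate is None:
                break
        if candidate is not None:
            kept.append(candidate)
    return sorted(kept, key=lambda s: s.x_start)


hsps = [HSP("H1", 100, 100, 50), HSP("H2", 145, 148, 40),
        HSP("H3", 200, 20, 30), HSP("H4", 10, 220, 25)]
print([(h.name, h.x_start, h.y_start, h.length) for h in compatible_hsps(hsps)])
# [('H1', 100, 100, 50), ('H2', 150, 153, 35)]
```

In this example H3 and H4 cross H1 and are dropped, and H2 loses the 5 residues it shares with H1. Because trimming only shrinks a candidate, an HSP that is disjoint from and correctly ordered with a kept HSP stays that way after later trims, so a single pass suffices. Gapped HSPs, whose X and Y spans differ, would need the overlap trimmed separately on each sequence.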
Figure 2. The web interface for querying gene family databases. Window 1 allows queries with different kinds of search criteria. In this example, INVHOGEN is queried for all gene families containing Apis mellifera (honey bee). The gene families resulting from this query are listed in window 2.
Distribution of families in INVHOGEN Release 2 and HOVERGEN Release 46
| Family size | No. of families (INVHOGEN) | % (INVHOGEN) | No. of families (HOVERGEN) | % (HOVERGEN) |
|---|---|---|---|---|
| 2 | 8567 | 55.7% | 3219 | 24.5% |
| 3 | 2257 | 14.7% | 1788 | 13.6% |
| 4 | 1210 | 7.8% | 1369 | 10.5% |
| 5–9 | 2093 | 13.6% | 3677 | 28.0% |
| 10–19 | 693 | 4.5% | 1928 | 14.7% |
| 20–49 | 358 | 2.3% | 832 | 6.3% |
| 50–99 | 116 | 0.8% | 182 | 1.4% |
| ≥100 | 95 | 0.6% | 149 | 1.1% |
| Total | 15 389 | 100% | 13 144 | 100% |
Ten largest families of INVHOGEN Release 2 and HOVERGEN Release 46
| Family name (INVHOGEN) | Sequences (INVHOGEN) | Sequences (HOVERGEN) | Family name (HOVERGEN) |
|---|---|---|---|
| Cytochrome | 22 287 | 22 616 | Cytochrome |
| Cytochrome | 6192 | 8480 | NADH dehydrogenase subunit 4 |
| Cytochrome | 3229 | 5987 | Family 1 of G-protein-coupled receptors |
| Elongation factor-1α | 3124 | 3608 | Class I histocompatibility antigen |
| NADH dehydrogenase subunit 1 | 1586 | 2990 | ATP synthase subunit 6 |
| NADH dehydrogenase subunit 5 | 1568 | 2291 | ATP synthase subunit 8 |
| WNT family | 1528 | 2090 | Cytochrome |
| Serine peptidase | 1096 | 1657 | NADH dehydrogenase subunit 1 |
| Homeobox protein | 860 | 1499 | Zinc finger protein |
| Histone H3 | 836 | 1314 | NADH dehydrogenase subunit 6 |
| Total | 42 306 | 52 532 | |
The top 10 species in INVHOGEN Release 2 and HOVERGEN Release 46
| Species (INVHOGEN) | Sequences (INVHOGEN) | Sequences (HOVERGEN) | Species (HOVERGEN) |
|---|---|---|---|
| | 17 348 | 56 932 | |
| | 16 604 | 46 693 | |
| | 10 704 | 9066 | |
| | 8423 | 7577 | |
| Schistosoma japonicum | 2143 | 5392 | |
| Drosophila simulans | 998 | 3258 | Gallus gallus |
| | 894 | 3038 | Bos taurus |
| | 689 | 2790 | Sus scrofa |
| Drosophila yakuba | 608 | 1720 | Macaca fascicularis |
| Ixodes scapularis | 538 | 1325 | Oryctolagus cuniculus |
| Total | 58 949 | 137 791 | |
a Organisms whose complete genome sequence has been published (Genomes OnLine Database, August 11, 2005).
Distribution of the main classified invertebrate groups in INVHOGEN Release 2 compared with the literature (20)
| Invertebrate group | Species (literature) | Fraction (literature) | Species in INVHOGEN | Fraction (INVHOGEN species) | Sequences in INVHOGEN | Fraction (INVHOGEN sequences) |
|---|---|---|---|---|---|---|
| Arthropods | 900 000 | 85.86% | 16 681 | 77% | 81 896 | 62.36% |
| Urochordates | 3000 | 0.29% | 65 | 0.30% | 910 | 0.69% |
| Echinoderms | 7000 | 0.67% | 326 | 1.50% | 2718 | 2.07% |
| Poriferans | 9000 | 0.86% | 112 | 0.52% | 398 | 0.30% |
| Nematodes | 15 000 | 1.43% | 348 | 1.61% | 29 630 | 22.56% |
| Platyhelminths | 20 000 | 1.91% | 369 | 1.70% | 4296 | 3.27% |
| Cnidarians | 9000 | 0.86% | 448 | 2.07% | 1629 | 1.24% |
| Molluscs | 70 000 | 6.68% | 2930 | 13.52% | 8088 | 6.16% |
| Annelids | 15 000 | 1.43% | 369 | 1.70% | 1041 | 0.79% |
| Hemichordates | 100 | 0.01% | 3 | 0.01% | 74 | 0.06% |
| Cephalochordates | 25 | 0% | 8 | 0.04% | 608 | 0.46% |
| Ctenophorans | 150 | 0.01% | 6 | 0.03% | 31 | 0.02% |
| Total | 1 048 275 | 100% | 21 665 | 100% | 131 319 | 100% |
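Each fraction column in this table is a count divided by its column total. The short Python sketch below recomputes those fractions from the tabulated counts. It also computes a sequence-share to species-share ratio, a quantity introduced here only for illustration (it is not reported in the table), which makes the skew explicit: nematodes account for about 1.43% of species in the literature but about 22.56% of INVHOGEN sequences. `COUNTS`, `percentages` and `representation_ratio` are hypothetical names.

```python
from typing import Dict

# Counts copied from the table: (species in the literature,
# species in INVHOGEN, sequences in INVHOGEN).
COUNTS = {
    "Arthropods": (900_000, 16_681, 81_896),
    "Urochordates": (3_000, 65, 910),
    "Echinoderms": (7_000, 326, 2_718),
    "Poriferans": (9_000, 112, 398),
    "Nematodes": (15_000, 348, 29_630),
    "Platyhelminths": (20_000, 369, 4_296),
    "Cnidarians": (9_000, 448, 1_629),
    "Molluscs": (70_000, 2_930, 8_088),
    "Annelids": (15_000, 369, 1_041),
    "Hemichordates": (100, 3, 74),
    "Cephalochordates": (25, 8, 608),
    "Ctenophorans": (150, 6, 31),
}

# Column indices into each tuple above.
LITERATURE, INVHOGEN_SPECIES, INVHOGEN_SEQUENCES = 0, 1, 2


def percentages(column: int) -> Dict[str, float]:
    """Share (in %) of each group within one column of the table."""
    total = sum(row[column] for row in COUNTS.values())
    return {group: 100.0 * row[column] / total for group, row in COUNTS.items()}


def representation_ratio(group: str) -> float:
    """Illustrative metric: share of INVHOGEN sequences / share of literature species.

    Values above 1 mean the group contributes more sequences than its
    species count alone would suggest.
    """
    return percentages(INVHOGEN_SEQUENCES)[group] / percentages(LITERATURE)[group]


totals = [sum(row[c] for row in COUNTS.values()) for c in range(3)]
print(totals)  # [1048275, 21665, 131319], matching the Total row
print(round(percentages(INVHOGEN_SEQUENCES)["Nematodes"], 2))  # 22.56
```

With these counts the ratio is roughly 16 for nematodes and below 1 for arthropods, which is consistent with the table: arthropods make up 85.86% of species in the literature but 62.36% of INVHOGEN sequences.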