I Iliopoulos, S Tsoka, M A Andrade, P Janssen, B Audit, A Tramontano, A Valencia, C Leroy, C Sander, C A Ouzounis.
Abstract
To assess how automatic function assignment will contribute to genome annotation in the next five years, we have performed an analysis of 31 available genome sequences. An emerging pattern is that function can be predicted for almost two-thirds of the 73,500 genes that were analyzed. Despite progress in computational biology, there will always be a great need for large-scale experimental determination of protein function.
Year: 2000 PMID: 11178275 PMCID: PMC150431 DOI: 10.1186/gb-2000-2-1-interactions0001
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. A summary of the annotation levels for 31 genomes. Annotations for all genomes (for 73,500 unique genes, 134,000 annotations in total, approximately a twofold annotation coverage) are available on the world wide web at the European Bioinformatics Institute Computational Genomics Group Services page [15], under 'GeneQuiz'. Total computation required 2,400 CPU-hrs on a 16-CPU SGI Power Challenge and 68 GB of storage. Results for other genomes will be made available at the same URL as they are completed. (a) Information snapshot for 31 entire genomes and a eukaryotic chromosome (Plasmodium falciparum, chromosome 2). For species (and strain) name abbreviations, please refer to the website [15]. Bacteria are shown in black, Archaea in red and Eukarya in blue. Percentages for proteins with homologs of known structure (pink) or function (blue), hypothetical proteins (dark brown) and unique proteins (light brown) are shown. Species are sorted according to the sum of structure and function information; the horizontal line represents the average of known/predicted functions across species. Diamonds (bottom panel) represent the percentage increase in new findings over the original (or public database) annotations (except Drosophila melanogaster, for which such a comparison is not currently possible). The range of this percentage, from 0 to 20, is indicated in brackets. (b) An 'information clock' for the genome of Haemophilus influenzae, showing the relative levels of annotation over time, reflecting a general increase of information in the public databases. Colours are used as in (a).
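The summary figures quoted in the abstract and in the caption of Figure 1 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the wall-clock estimate assumes perfect parallel scaling across all 16 CPUs (which the source does not claim), and the gene count uses exactly two-thirds as an upper bound for the "almost two-thirds" stated in the abstract.

```python
# Figures quoted in the abstract and in the caption of Figure 1.
UNIQUE_GENES = 73_500
TOTAL_ANNOTATIONS = 134_000
CPU_HOURS = 2_400
CPUS = 16


def annotation_coverage(annotations: int, genes: int) -> float:
    """Average number of annotations per unique gene."""
    return annotations / genes


def ideal_wall_clock_hours(cpu_hours: float, cpus: int) -> float:
    """Wall-clock time ASSUMING perfect parallel scaling (an assumption,
    not a figure reported in the source)."""
    return cpu_hours / cpus


def genes_with_predicted_function(genes: int, fraction: float = 2 / 3) -> int:
    """Upper-bound estimate: the abstract says 'almost two-thirds'."""
    return round(genes * fraction)


coverage = annotation_coverage(TOTAL_ANNOTATIONS, UNIQUE_GENES)
wall_clock = ideal_wall_clock_hours(CPU_HOURS, CPUS)
predicted = genes_with_predicted_function(UNIQUE_GENES)

print(f"Annotation coverage: {coverage:.2f} annotations per gene")
print(f"Ideal wall-clock time: {wall_clock:.0f} hours")
print(f"Genes with predicted function (upper bound): {predicted}")
```

The coverage ratio comes out slightly below 2, consistent with the caption's description of "approximately a twofold annotation coverage".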