Abstract
The Gene Ontology (GO) is a collaborative effort that provides structured vocabularies for annotating the molecular function, biological role, and cellular location of gene products in a highly systematic, species-neutral manner, with the aim of unifying the representation of gene function across different organisms. Each contributing member of the GO Consortium independently associates GO terms with gene products from the organism(s) they are annotating. Here we introduce the Reference Genome project, which brings together those independent efforts into a unified framework based on the evolutionary relationships between genes in these different organisms. The Reference Genome project has two primary goals: to increase the depth and breadth of annotations for genes in each of the organisms in the project, and to create data sets and tools that enable other genome annotation efforts to infer GO annotations for homologous genes in their organisms. In addition, the project has several important incidental benefits, such as increasing annotation consistency across genome databases and providing important improvements to the GO's logical structure and biological content.
Year: 2009 PMID: 19578431 PMCID: PMC2699109 DOI: 10.1371/journal.pcbi.1000431
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Distribution of the PANTHER families with respect to the number of reference genome species having representatives in each family.
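The quantity plotted in Figure 1 can be computed from a mapping of genes to families and species. The sketch below is a minimal illustration under stated assumptions: the input is a hypothetical list of (family ID, species) pairs, one per gene, and the family IDs in the example are placeholders, not PANTHER data.

```python
from collections import Counter, defaultdict


def species_per_family_distribution(memberships):
    """Count families by how many reference species they contain.

    memberships: iterable of (family_id, species) pairs, one per gene.
    This input format is a hypothetical simplification, not a PANTHER file format.
    Returns a Counter mapping k (number of species) -> number of families.
    """
    species_by_family = defaultdict(set)
    for family_id, species in memberships:
        # A set ignores paralogs: several genes from one species count once.
        species_by_family[family_id].add(species)
    return Counter(len(members) for members in species_by_family.values())


# Illustrative toy data, not real PANTHER family assignments.
toy = [
    ("FAM_A", "H. sapiens"), ("FAM_A", "M. musculus"), ("FAM_A", "E. coli"),
    ("FAM_B", "H. sapiens"), ("FAM_B", "H. sapiens"),
    ("FAM_C", "S. cerevisiae"), ("FAM_C", "S. pombe"),
]
print(sorted(species_per_family_distribution(toy).items()))
# → [(1, 1), (2, 1), (3, 1)]
```

Collapsing each family to a set of species means that a family with many paralogs in one organism still counts as represented in a single species, which is the sense of "number of species having representatives" in the caption.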
Figure 2. Tree representation of the TOP2 homolog set for the twelve species from the Reference Genome project.
Genes having experimental data are labeled in red. Since members of all represented branches have “GO:0003918 DNA topoisomerase (ATP-hydrolyzing) activity” and a role in “GO:0007059 chromosome segregation”, the common ancestor (CA) can be inferred to have had these functions as well. We thus predict, with reasonable confidence, that all descendants can be annotated to those terms. The sequences represented are (from top to bottom): A. thaliana TAIR:locus=2075765; E. coli UniProt:P0AFI2 (parC); E. coli UniProt:P0AES4 (gyrA); E. coli UniProt:P20083 (parE); E. coli UniProt:P0AES6 (gyrB); A. thaliana TAIR:locus=2146658; A. thaliana TAIR:locus=2076268; A. thaliana TAIR:locus=2146698; A. thaliana TAIR:locus=2076201; D. discoideum dictyBase:DDB_G0279737 (top2mt); D. discoideum dictyBase:DDB_G0270418 (top2); S. cerevisiae SGD:S000005032 (TOP2); S. pombe GeneDB:SPBC1A4.03c (top2); D. melanogaster FlyBase:FBgn0003732 (top2); C. elegans WormBase:WBGene00019876 (R05D3.1); C. elegans WormBase:WBGene00022854 (cin-4); C. elegans WormBase:WBGene00021604 (Y46H3C.4); D. rerio ZFIN:ZDB-GENE-030131-2453 (top2A); D. rerio ZFIN:ZDB-GENE-041008-136 (top2B); G. gallus UniProt:O42130 (top2A); H. sapiens UniProt:P11288 (top2A); M. musculus MGI:98790 (top2A); R. norvegicus RGD:62048 (top2A); G. gallus UniProt:O42131 (top2B); H. sapiens UniProt:P02880 (top2B); M. musculus MGI:98791 (top2B); R. norvegicus RGD:1586156 (top2B).
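The inference rule in the Figure 2 caption (a term found in every represented branch is assigned to the common ancestor and then predicted for all descendants) can be sketched as a tree traversal. This is a minimal illustration, not the project's actual tooling: the `Node` class, the gene names, and the hypothetical term `TERM_X` are placeholders, and only the two GO IDs come from the caption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a gene tree; leaves are genes, internal nodes are ancestors."""
    name: str
    children: list["Node"] = field(default_factory=list)
    # GO IDs supported by experimental evidence for this gene (leaves only).
    experimental: set[str] = field(default_factory=set)


def subtree_terms(node: Node) -> set[str]:
    """All GO IDs with experimental support anywhere in this subtree."""
    terms = set(node.experimental)
    for child in node.children:
        terms |= subtree_terms(child)
    return terms


def leaf_names(node: Node) -> list[str]:
    """Names of all genes (leaves) below this node, left to right."""
    if not node.children:
        return [node.name]
    return [name for child in node.children for name in leaf_names(child)]


def infer_ancestral_terms(ancestor: Node) -> frozenset[str]:
    """Terms present in every child branch are assigned to the common ancestor."""
    if not ancestor.children:
        return frozenset(ancestor.experimental)
    branch_terms = (subtree_terms(child) for child in ancestor.children)
    return frozenset(set.intersection(*branch_terms))


def propagate(ancestor: Node) -> dict[str, frozenset[str]]:
    """Predict the inferred ancestral terms for every descendant gene."""
    inferred = infer_ancestral_terms(ancestor)
    return {name: inferred for name in leaf_names(ancestor)}


GO_TOPO = "GO:0003918"  # DNA topoisomerase (ATP-hydrolyzing) activity
GO_SEG = "GO:0007059"   # chromosome segregation

# Hypothetical tree with three branches under the common ancestor (CA);
# gene names and annotations are placeholders, not the TOP2 data in Figure 2.
ca = Node("CA", children=[
    Node("branch_1", children=[Node("gene_1", experimental={GO_TOPO, GO_SEG}),
                               Node("gene_2")]),
    Node("branch_2", children=[Node("gene_3", experimental={GO_TOPO, GO_SEG, "TERM_X"})]),
    Node("branch_3", children=[Node("gene_4", experimental={GO_TOPO}),
                               Node("gene_5", experimental={GO_SEG})]),
])

for gene, terms in propagate(ca).items():
    print(gene, sorted(terms))
# Every gene, including gene_2 (no experimental data), gets both GO IDs;
# TERM_X occurs in only one branch, so it is not inferred.
```

Intersecting across branches is a deliberately conservative choice; this sketch does not model gene loss or changes of function along a branch, which a curator would still need to assess before transferring an annotation.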
Increase in information content of the annotations of the genes from the twelve reference genomes (“All”), compared to that of the subset of genes selected for concurrent annotation (“Ref”).
| Ontology | Gene set | July 2006 | December 2008 | Change | Relative change |
|---|---|---|---|---|---|
| Biological process | All | 6.09 | 6.07 | −0.02 | +2.44 |
| Biological process | Ref | | | | |
| Cellular component | All | 4.32 | 4.29 | −0.03 | +2.06 |
| Cellular component | Ref | | | | |
| Molecular function | All | 6.18 | 5.69 | −0.49 | +1.99 |
| Molecular function | Ref | | | | |
The relative change corresponds to the sum of the changes for “All” and “Ref” sets of genes.
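The "Change" column is the December 2008 value minus the July 2006 value. This record does not define "information content", so the sketch below assumes a common annotation-frequency definition, IC(t) = −log2(fraction of genes annotated to t), summarized per gene by its most informative term. The authors' exact measure may differ. The functions and toy gene and term names are hypothetical; only the table values are from the source.

```python
import math
from statistics import mean


def term_information_content(annotations):
    """IC(t) = -log2(fraction of genes annotated to t); an assumed definition.

    annotations: dict mapping gene -> set of GO IDs (hypothetical format).
    For brevity this ignores propagation of annotations up the GO graph.
    """
    n_genes = len(annotations)
    counts = {}
    for terms in annotations.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    return {term: -math.log2(c / n_genes) for term, c in counts.items()}


def mean_gene_ic(annotations, ic):
    """Mean over annotated genes of their most informative (max-IC) term."""
    return mean(max(ic[t] for t in terms) for terms in annotations.values() if terms)


# Toy annotation set with placeholder gene and term names.
toy = {"g1": {"T1"}, "g2": {"T1", "T2"}, "g3": {"T1"}, "g4": {"T3"}}
ic = term_information_content(toy)
print(round(mean_gene_ic(toy, ic), 3))  # → 1.208

# "Change" = December 2008 minus July 2006 for the "All" rows (values from the table).
all_rows = {
    "Biological process": (6.09, 6.07),
    "Cellular component": (4.32, 4.29),
    "Molecular function": (6.18, 5.69),
}
for ontology, (jul_2006, dec_2008) in all_rows.items():
    print(f"{ontology}: {dec_2008 - jul_2006:+.2f}")
# prints -0.02, -0.03 and -0.49, matching the table
```

Under this kind of definition, rarer (more specific) terms score higher, so a mean IC can fall even as annotations are added if many of the new annotations use broad, frequently used terms.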
Figure 3. AmiGO, the Gene Ontology browser, displays a comparison graph for genes present in homolog sets.
These graphs show all annotations: experimental ones (evidence codes IDA, IMP, IGI, IPI, IEP) as well as those inferred from sequence similarity to an experimentally characterized gene (ISS) and those inferred by curators (IC). Direct annotations to a GO term are indicated by colored wedges, with a different color for each species. The species to display can be selected from the Control Panel on the right-hand side (here, H. sapiens, D. rerio, and E. coli are selected). Each wedge also contains a small color-coded circle that indicates whether the annotation to the term is based on experimental data (green) or supported by sequence similarity (blue); annotations with other evidence have no circle. Mousing over a term displays the term ID, term name, and a complete list of annotations to that term by species. Here we show the term “chromosome segregation”, for which five of the twelve species have experimental data supporting the annotation. Annotations based on experimental data are indicated by “E”, and those based on sequence similarity by “I”.
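The evidence-code grouping described for Figure 3 can be expressed as a small classification step plus a per-species summary for one term. The following is a minimal sketch, not AmiGO's implementation: the tuple input format, function names, and placeholder species are hypothetical. Only the evidence codes and the E/I convention come from the caption.

```python
from collections import defaultdict

# Evidence codes named in the Figure 3 caption.
EXPERIMENTAL = {"IDA", "IMP", "IGI", "IPI", "IEP"}
SEQUENCE_SIMILARITY = {"ISS"}


def evidence_marker(code):
    """Return 'E' (experimental, green circle), 'I' (sequence similarity,
    blue circle), or 'other' (e.g. IC; wedge without a circle)."""
    if code in EXPERIMENTAL:
        return "E"
    if code in SEQUENCE_SIMILARITY:
        return "I"
    return "other"


def summarize_term(annotations, term_id):
    """Group the evidence markers for one GO term by species.

    annotations: iterable of (species, go_id, evidence_code) tuples,
    a hypothetical flattened format rather than an AmiGO export.
    """
    by_species = defaultdict(set)
    for species, go_id, code in annotations:
        if go_id == term_id:
            by_species[species].add(evidence_marker(code))
    return dict(by_species)


# Placeholder species and annotations, for illustration only.
toy = [
    ("species_A", "GO:0007059", "IMP"),
    ("species_A", "GO:0007059", "ISS"),
    ("species_B", "GO:0007059", "ISS"),
    ("species_C", "GO:0007059", "IGI"),
    ("species_D", "GO:0007059", "IC"),
    ("species_E", "GO:0003918", "IDA"),
]
summary = summarize_term(toy, "GO:0007059")
with_experimental = sorted(s for s, markers in summary.items() if "E" in markers)
print(with_experimental)  # → ['species_A', 'species_C']
```

Counting the species whose marker set contains "E" gives the kind of figure quoted in the caption ("five of the twelve species have experimental data"); a species can carry both "E" and "I" when it has annotations of both kinds.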