Susan Tweedie, Michael Ashburner, Kathleen Falls, Paul Leyland, Peter McQuilton, Steven Marygold, Gillian Millburn, David Osumi-Sutherland, Andrew Schroeder, Ruth Seal, Haiyan Zhang.
Abstract
FlyBase (http://flybase.org) is a database of Drosophila genetic and genomic information. Gene Ontology (GO) terms are used to describe three attributes of wild-type gene products: their molecular function, the biological processes in which they play a role, and their subcellular location. This article describes recent changes to the FlyBase GO annotation strategy that are improving the quality of the GO annotation data. Many of these changes stem from our participation in the GO Reference Genome Annotation Project, a multi-database collaboration producing comprehensive GO annotation sets for 12 diverse species.
Year: 2008 PMID: 18948289 PMCID: PMC2686450 DOI: 10.1093/nar/gkn788
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Evidence codes used in GO annotation
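| Assignment | Category | Evidence code |
|---|---|---|
| Manually assigned | Experimental evidence codes | Inferred from Direct Assay (IDA) |
| Manually assigned | Experimental evidence codes | Inferred from Physical Interaction (IPI) |
| Manually assigned | Experimental evidence codes | Inferred from Mutant Phenotype (IMP) |
| Manually assigned | Experimental evidence codes | Inferred from Genetic Interaction (IGI) |
| Manually assigned | Experimental evidence codes | Inferred from Expression Pattern (IEP) |
| Manually assigned | Computational analysis evidence codes | Inferred from Sequence or Structural Similarity (ISS) |
| Manually assigned | Computational analysis evidence codes | Inferred from Sequence Orthology (ISO) (a) |
| Manually assigned | Computational analysis evidence codes | Inferred from Sequence Alignment (ISA) (a) |
| Manually assigned | Computational analysis evidence codes | Inferred from Sequence Model (ISM) (a) |
| Manually assigned | Computational analysis evidence codes | Inferred from Genomic Context (IGC) |
| Manually assigned | Computational analysis evidence codes | Inferred from Reviewed Computational Analysis (RCA) |
| Manually assigned | Author statement evidence codes | Traceable Author Statement (TAS) |
| Manually assigned | Author statement evidence codes | Non-traceable Author Statement (NAS) |
| Manually assigned | Curatorial statement codes | Inferred by Curator (IC) |
| Manually assigned | Curatorial statement codes | No biological Data available (ND) |
| Automatically assigned | | Inferred from Electronic Annotation (IEA) |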
(a) ISO, ISA and ISM are three new subcategories of the ISS evidence code, which is used to assign GO terms based on sequence similarity. ISO is used when the similar sequences are considered to be orthologous. ISA is used where there is extensive sequence alignment but the sequences are not known to be orthologous. ISM is used when a sequence model has been generated from a set of related sequences, e.g. hidden Markov models for transmembrane regions. Full documentation of evidence codes, together with how they are used in annotation, can be found on the GO website (http://www.geneontology.org/GO.contents.doc.shtml).
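The grouping in the table above can be expressed as a small lookup, which is handy when tallying annotations by evidence type. The following is a minimal sketch only: the names `EVIDENCE_GROUPS`, `ISS_SUBCATEGORIES`, `evidence_group` and `is_manually_assigned` are hypothetical and are not part of any FlyBase or GO software.

```python
# Evidence codes grouped as in the table above.
# All identifiers here are illustrative, not an official API.
EVIDENCE_GROUPS = {
    "experimental": {"IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"},
    "author_statement": {"TAS", "NAS"},
    "curatorial_statement": {"IC", "ND"},
    "electronic": {"IEA"},
}

# ISO, ISA and ISM are the three subcategories of ISS noted in the footnote.
ISS_SUBCATEGORIES = {"ISO", "ISA", "ISM"}


def evidence_group(code: str) -> str:
    """Return the group name for an evidence code, or raise ValueError."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    raise ValueError(f"unknown evidence code: {code!r}")


def is_manually_assigned(code: str) -> bool:
    """Every code in the table except IEA is assigned manually."""
    return evidence_group(code) != "electronic"


if __name__ == "__main__":
    for code in ("IGI", "ISO", "IC", "IEA"):
        print(code, evidence_group(code), is_manually_assigned(code))
```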
Examples of GO annotations for D. melanogaster genes
| 2. Object ID | 3. Object symbol | 4. Qualifier | 5. GO ID | 6. DB:Reference | 7. Evidence Code | 8. With/From | 15. Date |
|---|---|---|---|---|---|---|---|
| FBgn0029891 | Pink1 | | GO:0007005 | FB:FBrf0193630\|PMID:16672980 | IGI | FB:FBgn0040491 | 20070523 |
| FBgn0034879 | Rrp4 | | GO:0006397 | FB:FBrf0105495 | ISS | SGD:S0001111 | 20060803 |
| FBgn0020615 | SelD | NOT | GO:0004756 | FB:FBrf0099751\|PMID:9398525 | IDA | | 20060803 |
| FBgn0010349 | Dhc64C | colocalizes_with | GO:0005739 | FB:FBrf0191163\|PMID:16467387 | IDA | | 20071221 |
| FBgn0033687 | CG8407 | | GO:0007017 | FB:FBrf0174215 | IEA | InterPro:IPR001372 | 20080731 |
| FBgn0036811 | MED11 | contributes_to | GO:0016455 | FB:FBrf0150795\|PMID:12021283 | IC | GO:0000119 | 20070523 |
The column numbers are identical to those in the gene_association.fb file; the full file contains 15 columns of information.
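To illustrate how rows like those in the table map onto the file, here is a minimal parsing sketch. It assumes each line of gene_association.fb is tab-delimited with 15 columns and that multiple values within a column are separated by `|`, as in the DB:Reference entries above. Only columns 2 to 8 are read; the class and function names (`GoAnnotation`, `parse_annotation_line`) are hypothetical, and the example line leaves every other column empty as a placeholder.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GoAnnotation:
    """Columns 2-8 of one annotation line (names are illustrative)."""
    object_id: str          # column 2, e.g. FBgn0029891
    symbol: str             # column 3, e.g. Pink1
    qualifiers: List[str]   # column 4, e.g. ["NOT"]; empty if absent
    go_id: str              # column 5
    references: List[str]   # column 6, pipe-separated in the file
    evidence_code: str      # column 7
    with_from: List[str]    # column 8, pipe-separated in the file


def split_multi(value: str) -> List[str]:
    """Split a pipe-separated field, dropping empty entries."""
    return [part for part in value.split("|") if part]


def parse_annotation_line(line: str) -> GoAnnotation:
    """Parse one tab-delimited line; assumes exactly 15 columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 15:
        raise ValueError(f"expected 15 columns, found {len(cols)}")
    return GoAnnotation(
        object_id=cols[1],
        symbol=cols[2],
        qualifiers=split_multi(cols[3]),
        go_id=cols[4],
        references=split_multi(cols[5]),
        evidence_code=cols[6],
        with_from=split_multi(cols[7]),
    )


# Example built from the Pink1 row; columns outside 2-8 are left blank
# here because their values are not shown in the table.
EXAMPLE_COLUMNS = [""] * 15
EXAMPLE_COLUMNS[1:8] = [
    "FBgn0029891", "Pink1", "", "GO:0007005",
    "FB:FBrf0193630|PMID:16672980", "IGI", "FB:FBgn0040491",
]
EXAMPLE = parse_annotation_line("\t".join(EXAMPLE_COLUMNS))

if __name__ == "__main__":
    print(EXAMPLE)
```

Keeping qualifiers, references and With/From values as lists makes it straightforward to test for a `NOT` qualifier or to look up a single PMID without further string handling.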
Figure 1. GO annotation on the Gene Report of D. melanogaster MBD-like gene. Note the presence of contradictory experimental evidence for the term ‘methyl-CpG binding’ as indicated by the use of the qualifier ‘NOT’ for some publications. Each highlighted term links to a CV term report that includes a definition for the term and a diagram indicating its relationship to other terms.
Numbers of D. melanogaster protein-coding genes (from a total of 14029 in FB2008_08) with GO annotation by evidence type and ontology
| | Biological Process | Molecular Function | Cellular Component | Combined GO |
|---|---|---|---|---|
| Genes with any GO annotation | 8080 | 9253 | 6893 | 10 131 |
| Genes with ≥1 experimentally based term (a) | 2603 | 1217 | 1403 | 3163 |
| Genes with only electronic annotation (b) | 2288 | 2055 | 1684 | 1716 |
| Genes with no data available (c) | 855 | 859 | 1013 | 728 |
(a) Assigned with evidence codes: IDA, IPI, IMP, IGI, IEP.
(b) Assigned with IEA evidence code.
(c) Root GO terms assigned with ND evidence code (note that ‘ND’ is applied only to genes that have been assessed for functional data; it is not used for genes that have not yet been subject to GO curation).
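The Combined GO counts can be read as shares of the 14029 protein-coding genes in FB2008_08. The short sketch below performs that arithmetic using only figures from the table; the variable and function names are illustrative, and the percentages are derived here rather than quoted from the article.

```python
# Total protein-coding genes in FB2008_08, as stated in the table title.
TOTAL_GENES = 14029

# "Combined GO" column of the table above.
COMBINED_GO = {
    "any GO annotation": 10131,
    "at least one experimentally based term": 3163,
    "only electronic annotation": 1716,
    "no data available": 728,
}


def percent_of_total(count: int, total: int = TOTAL_GENES) -> float:
    """Share of all protein-coding genes, as a percentage."""
    return 100.0 * count / total


if __name__ == "__main__":
    for label, count in COMBINED_GO.items():
        print(f"{label}: {count} genes ({percent_of_total(count):.1f}%)")
```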
Comparison of total GO annotations in FlyBase releases FB2006_01 and FB2008_08 for all D. melanogaster genes (including those not yet located to the genome) by evidence type
| | FB2006_01 | FB2008_08 | % change |
|---|---|---|---|
| Experimental evidence (a) | 11 888 | 17 322 | +46% |
| Computational evidence (b) | 14 946 | 15 487 | +4% |
| Author/curator statements (c) | 16 711 | 16 506 | −1% |
| Electronic annotation (d) | 22 626 | 16 010 | −29% |
| No biological data available (e) | 4997 | 5022 | +0.5% |
| Total annotations | 71 168 | 70 351 | −1% |
(a) Assigned with evidence codes: IDA, IPI, IMP, IGI, IEP.
(b) Assigned with evidence codes: ISS, ISA, ISM, ISO, RCA.
(c) Assigned with evidence codes: TAS, NAS, IC.
(d) Assigned with IEA evidence code.
(e) Root GO terms assigned with ND evidence code (note that ‘ND’ is applied only to genes that have been assessed for functional data; it is not used for genes that have not yet been subject to GO curation).
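The % change column follows from the two release counts as (new − old) / old. The sketch below recomputes it from the reported figures so the rounding can be checked; the names `RELEASE_COUNTS` and `percent_change` are hypothetical.

```python
# (FB2006_01, FB2008_08) annotation counts as reported in the table above.
RELEASE_COUNTS = {
    "Experimental evidence": (11888, 17322),
    "Computational evidence": (14946, 15487),
    "Author/curator statements": (16711, 16506),
    "Electronic annotation": (22626, 16010),
    "No biological data available": (4997, 5022),
    "Total annotations": (71168, 70351),
}


def percent_change(old: int, new: int) -> float:
    """Relative change from old to new, as a percentage."""
    return 100.0 * (new - old) / old


if __name__ == "__main__":
    for label, (old, new) in RELEASE_COUNTS.items():
        print(f"{label}: {percent_change(old, new):+.1f}%")
```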