Fiona M McCarthy, Nan Wang, G Bryce Magee, Bindu Nanduri, Mark L Lawrence, Evelyn B Camon, Daniel G Barrell, David P Hill, Mary E Dolan, W Paul Williams, Dawn S Luthe, Susan M Bridges, Shane C Burgess.
Abstract
BACKGROUND: Many agricultural species and their pathogens have sequenced genomes and more are in progress. Agricultural species provide food, fiber, xenotransplant tissues, biopharmaceuticals and biomedical models. Moreover, many agricultural microorganisms are human zoonoses. However, systems biology from functional genomics data is hindered in agricultural species because agricultural genome sequences have relatively poor structural and functional annotation and agricultural research communities are smaller with limited funding compared to many model organism communities.
DESCRIPTION: To facilitate systems biology in these traditionally agricultural species we have established "AgBase", a curated, web-accessible, public resource (http://www.agbase.msstate.edu) for structural and functional annotation of agricultural genomes. The AgBase database includes a suite of computational tools to use GO annotations. We use standardized nomenclature following the Human Genome Organization Gene Nomenclature guidelines and are currently functionally annotating chicken, cow and sheep gene products using the Gene Ontology (GO). The computational tools we have developed accept and batch process data derived from different public databases (with different accession codes), return all existing GO annotations, provide a list of products without GO annotation, identify potential orthologs, model functional genomics data using GO and assist proteomics analysis of ESTs and EST assemblies. Our journal database helps prevent redundant manual GO curation. We encourage and publicly acknowledge GO annotations from researchers and provide a service for researchers interested in GO and analysis of functional genomics data.
Year: 2006 PMID: 16961921 PMCID: PMC1618847 DOI: 10.1186/1471-2164-7-229
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Table 1. Comparison of human, mouse, rat, chicken and bovine genome statistics.
| Species | Genome build | Estimated gene products (UniGene) | ESTs |
| H. sapiens | 36.1 | 86 803 | 7 741 746 |
| M. musculus | 36.1 | 67 096 | 4 719 380 |
| R. norvegicus | 3.4 | 51 564 | 871 147 |
| G. gallus | 1.1 | 30 470 | 588 288 |
| B. taurus | 2.1 | 41 986 | 1 039 059 |
| H. sapiens | 299 863 | 73 932 | 5.9 |
| M. musculus | 184 110 | 61 537 | 15.5 |
| R. norvegicus | 52 857 | 14 885 | 29.9 |
| G. gallus | 29 763 | 7 291 | 47.9 |
| B. taurus | 52 425 | 10 059 | 57.1 |
| H. sapiens | 266 785 | 50 612 | 81 |
| M. musculus | 327 082 | 517 032 | 60.2 |
| R. norvegicus | 73 783 | 10 285 | 86.1 |
| G. gallus | 29 963 | 3 058 | 89.8 |
| B. taurus | 39 832 | 9 484 | 76.2 |
Current annotation statistics for selected genomes (15/03/06) are shown. The build number was obtained from NCBI, the estimated number of gene products is based on UniGene numbers [38], and EST numbers were obtained from ESTdb. Proteins from agricultural species are underrepresented in the UniProtKB database. To estimate the proportion of predicted genes in the genome, the number of gene predictions is expressed as a percentage of the total number of genes, both predicted and from UniGene. GO statistics were obtained from GO association files using GOProfiler (available from AgBase).
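The percentage of predicted genes described in the table note can be written as a short function. This is a minimal sketch of the stated arithmetic only: the function name and the input counts are hypothetical and are not taken from the table.

```python
def percent_predicted(n_predicted, n_unigene):
    """Gene predictions as a percentage of all genes (predicted + UniGene)."""
    total = n_predicted + n_unigene
    if total == 0:
        raise ValueError("at least one gene is required")
    return 100.0 * n_predicted / total


# Hypothetical counts, chosen only to illustrate the calculation.
example_percent = percent_predicted(250, 750)
print(example_percent)
```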
Figure 1. Papers referencing GO by species. The number of papers referencing GO, as determined from PubMed (06/09/06). GO annotation has become the accepted standard for functional annotation [13] and its use is growing exponentially (A). Despite this, GO annotation has been minimally used in chicken and cow (B). This is partly because there are fewer livestock researchers, and partly because using GO annotation in livestock first requires researchers to functionally annotate their own data.
AgBase GO annotations by species and evidence code.
| Chicken | 1 007 | 142 | 80 | 24 |
| Cow | 4 411 | 382 | 4 | 13 |
| Sheep | 316 | 61 | 0 | 0 |
| Channel Catfish | 19 | 3 | 0 | 2 |
We aim to increase the coverage of GO annotations in agriculturally important species and we are currently GO annotating chicken, cow, sheep and channel catfish. To improve GO coverage, we determine which proteins currently have no GO annotations and use GOanna to do a 'first-pass' annotation based on sequence homology (ISS). We have also provided GO annotations via literature curation for chicken, cow and channel catfish. We do not currently provide IEA annotations.
Figure 2. A comparison of chicken and cow GO annotations from AgBase and EBI-GOA. We are currently focused on providing GO annotations for chicken and cow gene products and we collaborate with EBI-GOA to provide a combined GO gene association file for each of these species. The number of GO annotations for chicken and cow is represented here based on GO evidence code; details about the GO evidence codes can also be found on the GO Consortium homepage [37]. (1) Unlike EBI-GOA, AgBase does not currently annotate to IEA. (2) In newly sequenced genomes, such as cow and chicken, a large proportion of gene products are not represented in the UniProtKB database (Table 1) and are not annotated by EBI-GOA. To complement the EBI-GOA annotation effort and provide breadth of coverage, we identify the expression of these 'predicted' gene products in vivo and, where possible, provide GO annotations. (3) By definition, there is no published literature for these 'predicted' proteins and they can only be GO annotated using either IEA or ISS.
Figure 3. The AgBase protein detail page. The AgBase protein detail page shows proteins and their GO annotation. The GO annotation terms are interactive links and the source of the GO annotation is acknowledged. Protein sequence is displayed in a text-accessible window and, where possible, links to other databases are cross-referenced.
Figure 4. GORetriever takes a list of accession numbers or IDs and fetches the existing GO annotation for these products. A list of IDs for which there is currently no GO annotation is also returned and may be used as input for GOanna (Figure 5). An example of a chicken protein and its corresponding matches is shown.
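The lookup described for GORetriever can be illustrated with a minimal sketch. It is not the AgBase implementation: it assumes the annotations are available as tab-delimited gene association (GAF-style) lines, and the function names, accession IDs and reference in the example are hypothetical.

```python
from collections import defaultdict


def parse_gaf(lines):
    """Map each DB_Object_ID to its (GO ID, evidence code) pairs."""
    annotations = defaultdict(list)
    for line in lines:
        # Skip blank lines and '!' comment/header lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # not enough columns to hold GO ID and evidence code
        # GAF columns (0-based): 1 = object ID, 4 = GO ID, 6 = evidence code.
        annotations[cols[1]].append((cols[4], cols[6]))
    return dict(annotations)


def retrieve_go(ids, annotations):
    """Split IDs into those with existing GO annotation and those without."""
    found = {i: annotations[i] for i in ids if i in annotations}
    missing = [i for i in ids if i not in annotations]
    return found, missing


# Hypothetical records, truncated to the first seven GAF columns.
example_gaf = [
    "!gaf-version: 2.0",
    "UniProtKB\tEX0001\tGENE1\t\tGO:0005515\tPMID:0000000\tIPI",
    "UniProtKB\tEX0001\tGENE1\t\tGO:0005634\tPMID:0000000\tISS",
    "UniProtKB\tEX0002\tGENE2\t\tGO:0005634\tPMID:0000000\tIDA",
]
found, missing = retrieve_go(["EX0001", "EX0002", "EX0003"], parse_gaf(example_gaf))
print(found)
print(missing)
```

In this sketch the `missing` list plays the role of the unannotated IDs that the caption says may be passed on to GOanna.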
Figure 5. GOanna allows a user to make GO annotations based on sequence similarity. The user inputs a file of IDs or sequences and the tool does a BLAST search against a user-specified database of GO annotated gene products using user-defined parameters. The output is shown both at the web interface and as a downloadable file that contains hyperlinks to the BLASTP alignments.
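One way to picture similarity-based annotation is to take the best hit for each query and propose its GO terms with the ISS evidence code. The sketch below is an assumption-laden illustration, not GOanna's code: it assumes BLAST results in the standard 12-column tabular layout (e-value in column 11), an arbitrary e-value cutoff, and hypothetical sequence IDs. The proposals are intended as candidates for a user to review against the alignments, not as final annotations.

```python
def best_hits(blast_lines, max_evalue=1e-10):
    """Return {query: (subject, evalue)}, keeping the lowest e-value per query."""
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        query, subject, evalue = cols[0], cols[1], float(cols[10])
        if evalue > max_evalue:
            continue  # hit is too weak under the assumed cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def propose_iss(best, subject_go):
    """Propose (query, GO ID, 'ISS', source subject) tuples for review."""
    proposals = []
    for query, (subject, _evalue) in sorted(best.items()):
        for go_id in subject_go.get(subject, []):
            proposals.append((query, go_id, "ISS", subject))
    return proposals


# Hypothetical tabular BLAST hits and subject annotations.
example_hits = [
    "Q1\tS1\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-100\t400",
    "Q1\tS2\t60.0\t180\t72\t0\t1\t180\t1\t180\t1e-20\t90",
    "Q2\tS3\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t20",
]
example_subject_go = {"S1": ["GO:0005515"], "S2": ["GO:0005634"]}
proposals = propose_iss(best_hits(example_hits), example_subject_go)
print(proposals)
```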
Figure 6. GOSlimViewer takes a list of GO numbers generated from the GORetriever program (A) and, using a user-defined slim, creates an Excel-compatible file that can be used for visualization of the results (B).
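The summarization step can be sketched as counting GO IDs per slim category and writing the counts as CSV, which spreadsheet programs such as Excel can open. This is a simplified illustration and not the GOSlimViewer implementation: it assumes a precomputed, hypothetical lookup from GO ID to slim category, whereas a real slim mapping would be derived from the ontology structure.

```python
import csv
import io
from collections import Counter


def slim_counts(go_ids, go_to_slim):
    """Count how many input GO IDs fall under each slim category."""
    counts = Counter()
    for go_id in go_ids:
        # IDs absent from the lookup are tallied under 'unmapped'.
        for slim in go_to_slim.get(go_id, ["unmapped"]):
            counts[slim] += 1
    return counts


def write_slim_csv(counts, handle):
    """Write slim categories and counts, largest count first."""
    writer = csv.writer(handle)
    writer.writerow(["slim_term", "count"])
    for slim, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([slim, n])


# Hypothetical lookup from GO ID to slim category.
example_map = {
    "GO:0005515": ["binding"],
    "GO:0003677": ["binding"],
    "GO:0005634": ["nucleus"],
}
example_ids = ["GO:0005515", "GO:0003677", "GO:0005634", "GO:9999999"]
counts = slim_counts(example_ids, example_map)
buffer = io.StringIO()
write_slim_csv(counts, buffer)
rows = buffer.getvalue().splitlines()
print(rows)
```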