P Gaudet, L Lane, P Fey, A Bridge, S Poux, A Auchincloss, K Axelsen, S Braconi Quintaje, E Boutet, P Brown, E Coudert, R S Datta, W C de Lima, T de Oliveira Lima, S Duvaud, N Farriol-Mathis, S Ferro Rojas, M Feuermann, A Gateau, U Hinz, C Hulo, J James, S Jimenez, F Jungo, G Keller, P Lemercier, D Lieberherr, M Moinat, A Nikolskaya, I Pedruzzi, C Rivoire, B Roechert, M Schneider, E Stanley, M Tognolli, K Sjölander, L Bougueleret, R L Chisholm, A Bairoch.
Abstract
UniProtKB/Swiss-Prot, a curated protein database, and dictyBase, the Model Organism Database for Dictyostelium discoideum, have established a collaboration to improve data sharing. One of the major steps in this effort was the 'Dicty annotation marathon', a week-long exercise with 30 annotators aimed at achieving a major increase in the number of D. discoideum proteins represented in UniProtKB/Swiss-Prot. The marathon led to the annotation of over 1000 D. discoideum proteins in UniProtKB/Swiss-Prot. Concomitantly, there were a large number of updates in dictyBase concerning gene symbols, protein names and gene models. This exercise demonstrates how UniProtKB/Swiss-Prot can work in very close cooperation with model organism databases and how the annotation of proteins can be accelerated through those collaborations.
Year: 2009 PMID: 20157489 PMCID: PMC2790310 DOI: 10.1093/database/bap016
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Dictyostelium genome statistics
| Feature | Value |
|---|---|
| Genome size | 34 Mbp |
| Chromosomes | 6 |
| Protein coding genes | ∼12 500 |
| Genes with splice variants | ∼20 identified |
| Curated pseudogenes | 160 |
| Repetitive elements | 500–1000, clustered |
| rRNA genes | 8 (100 copies, extrachromosomal) |
| tRNA genes | 418 |
| Other non-coding RNAs | ∼100 |
Annotation marathon summary
| Measure | Value |
|---|---|
| Proteins annotated during the 5 days of the annotation marathon | 1044 |
| Papers read and annotated | 284 |
| Gene names changed or added in dictyBase | 254 |
| Name changes still being processed | 88, including 40 with naming disagreements |
| Gene models corrected in dictyBase | 27 |
Figure 1. A modified gene model, commd10 (DDB_G0275249). (A) The gene prediction (underlined) selected a splice donor (gttaat) 21 nt upstream of the donor (gtaata) used in the curated gene model (highlighted in yellow). This image was generated using the dictyBase Genome Browser. (B) The modified gene model is supported by sequence similarity. The new Dictyostelium gene model is indicated by black shading of ‘DICTY’ in the labels on the left; it produces a better alignment than the original gene prediction (bottom row), whose alignment contains a gap.
Figure 2. The new dictyBase protein information page will display annotations from UniProtKB/Swiss-Prot. Information for the MhcA protein (gene ID: DDB_G0286355; sequence ID: DDB0191444) parsed from UniProtKB/Swiss-Prot record P08799 includes sequence processing, sequence existence evidence, subunit structure, post-translational modifications and subcellular location. Fields obtained from UniProtKB/Swiss-Prot are marked with an asterisk.
Progress of Dictyostelium annotations in Swiss-Prot
| Release | Date | Number of Dicty entries | Rank in Swiss-Prot |
|---|---|---|---|
| 51.4 | January 2007 | 337 | 165th |
| 55.0 | February 2008 | 537 | 86th |
| 55.2 | April 2008 | 1803 | 18th |
| 57.1 | April 2009 | 3619 | 10th |
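As a reading aid for the progress table, the short Python sketch below computes the number of Dicty entries added between consecutive Swiss-Prot releases and the overall fold increase. The counts are copied from the table; the helper name `increments` is hypothetical and introduced only for this illustration.

```python
# Dicty entry counts per UniProtKB/Swiss-Prot release, copied from the progress table.
releases = [
    ("51.4", "January 2007", 337),
    ("55.0", "February 2008", 537),
    ("55.2", "April 2008", 1803),
    ("57.1", "April 2009", 3619),
]


def increments(rows):
    """Hypothetical helper: return (from_release, to_release, entries_added)
    for each pair of consecutive releases."""
    return [(a[0], b[0], b[2] - a[2]) for a, b in zip(rows, rows[1:])]


for start, end, added in increments(releases):
    print(f"{start} -> {end}: +{added} entries")

# Overall growth from the first to the last release listed.
fold = releases[-1][2] / releases[0][2]
print(f"Overall: {releases[0][2]} -> {releases[-1][2]} entries ({fold:.1f}-fold)")
```

Run as-is, it reports +200, +1266 and +1816 entries for the three intervals and about a 10.7-fold increase from release 51.4 to 57.1. The intervals differ in length (two months between 55.0 and 55.2, roughly a year for the other two), so the raw differences are not rates.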