Thomas D Otto, Gary P Dillon, Wim S Degrave, Matthew Berriman.
Abstract
Second-generation sequencing technologies have made large-scale sequencing projects commonplace. However, making use of these datasets often requires gene function to be ascribed genome wide. Although tool development has kept pace with the changes in sequence production, for tasks such as mapping, de novo assembly or visualization, genome annotation remains a challenge. We have developed a method to rapidly provide accurate annotation for new genomes using previously annotated genomes as a reference. The method, implemented in a tool called RATT (Rapid Annotation Transfer Tool), transfers annotations from a high-quality reference to a new genome on the basis of conserved synteny. We demonstrate that a Mycobacterium tuberculosis genome or a single 2.5 Mb chromosome from a malaria parasite can be annotated in less than five minutes with only modest computational resources. RATT is available at http://ratt.sourceforge.net.
Year: 2011 PMID: 21306991 PMCID: PMC3089447 DOI: 10.1093/nar/gkq1268
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Workflow of RATT.
Figure 2. Transfer of annotation from the M. tuberculosis strain H37Rv onto the strain F11 sequence, over a deletion. The genomes of H37Rv (upper) and F11 (lower) are shown using the Artemis Comparison Tool (ACT). The source H37Rv annotation (light blue) is directly mapped onto F11 by RATT (green), except for features in a region unique to the source strain, which cannot be transferred and are written to a separate output file (brown).
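The behaviour described in the Figure 2 caption can be illustrated with a minimal sketch. This is not RATT's implementation: it assumes that synteny is already available as a list of ungapped, same-strand blocks, and the names `SyntenyBlock`, `Feature` and `transfer_features` are hypothetical. A feature lying entirely within one block is shifted into the coordinates of the new genome; any other feature is set aside, in the spirit of the features that RATT writes to a separate output file.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SyntenyBlock:
    """Ungapped, same-strand match (assumed input; 1-based, inclusive)."""
    ref_start: int
    ref_end: int
    query_start: int


@dataclass(frozen=True)
class Feature:
    name: str
    start: int
    end: int


def transfer_feature(feature: Feature,
                     blocks: List[SyntenyBlock]) -> Optional[Feature]:
    """Return the feature in query coordinates, or None if no block contains it."""
    for block in blocks:
        if block.ref_start <= feature.start and feature.end <= block.ref_end:
            offset = block.query_start - block.ref_start
            return Feature(feature.name, feature.start + offset, feature.end + offset)
    return None


def transfer_features(features: List[Feature],
                      blocks: List[SyntenyBlock]) -> Tuple[List[Feature], List[Feature]]:
    """Split features into (transferred, not_transferred)."""
    transferred, not_transferred = [], []
    for feature in features:
        mapped = transfer_feature(feature, blocks)
        if mapped is None:
            not_transferred.append(feature)  # e.g. region unique to the reference
        else:
            transferred.append(mapped)
    return transferred, not_transferred


# Toy data invented for illustration: reference positions 2001-3000 are
# absent from the query, mimicking a region unique to the source strain.
blocks = [SyntenyBlock(1, 2000, 1), SyntenyBlock(3001, 6000, 2001)]
features = [
    Feature("geneA", 100, 900),
    Feature("geneB", 2200, 2800),
    Feature("geneC", 3500, 4100),
]
transferred, not_transferred = transfer_features(features, blocks)
```

In this toy case `geneB` lies in the reference-only region and ends up in `not_transferred`, while `geneC` is shifted by the length of the deleted region. Features that span a block boundary are also set aside here; handling them as partial transfers would need extra logic that this sketch does not attempt.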
Figure 3. RATT corrections of transferred annotations. Annotations from H37Rv were transferred onto the F11 sequence (pale blue), corrected (green) and then compared with the existing strain F11 annotation in EMBL (yellow). (A and B) The correction of start and stop codons, respectively. In a more complex mapping situation (C), where all three reading frames are shown for clarity, RATT maps a large single coding sequence (CDS) from H37Rv to a locus within F11 that includes several in-frame stop codons. By inserting a frameshift (i.e. to indicate a pseudogene) the conceptual translation is preserved. This contrasts with two overlapping genes predicted as part of the F11 genome project.
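The kinds of problems shown in Figure 3 (wrong start codon, wrong stop codon, in-frame stop codons) can be detected with a simple check on a transferred CDS. The sketch below only flags such problems and does not reproduce how RATT corrects them. The function name `check_cds` is hypothetical, the input is assumed to be the forward-strand nucleotide sequence of the CDS, and the start codon set (ATG, GTG, TTG) is an assumption covering the common bacterial starts.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial start codons


def check_cds(seq: str) -> dict:
    """Flag start, stop and in-frame stop problems in a CDS nucleotide sequence."""
    seq = seq.upper()
    in_frame = len(seq) % 3 == 0
    # Complete codons only; a trailing partial codon is ignored.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    valid_start = bool(codons) and codons[0] in START_CODONS
    valid_stop = in_frame and bool(codons) and codons[-1] in STOP_CODONS
    # The terminal stop is expected; every other stop is an in-frame stop.
    body = codons[:-1] if valid_stop else codons
    internal_stops = [i for i, codon in enumerate(body) if codon in STOP_CODONS]
    return {
        "length_multiple_of_3": in_frame,
        "valid_start": valid_start,
        "valid_stop": valid_stop,
        "internal_stop_codon_indices": internal_stops,
    }


# Toy sequences invented for illustration.
ok = check_cds("ATGGCCTAA")
with_internal_stop = check_cds("ATGTAGGCCTAA")
bad_start = check_cds("CTGGCCTAA")
```

A CDS with a non-empty `internal_stop_codon_indices` list corresponds to the situation in panel C, where a frameshift or pseudogene annotation may be the appropriate outcome.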
Comparison of three annotation transfers by RATT
| Transfer | Features transferred/total | CDS transferred/total | Partial transfers/CDS | Corrected | Manual correction needed |
|---|---|---|---|---|---|
| | 9406/9557 | 3955/3999 | 44 | 113 | 3 |
| | 626/686 | 626/686 | 78 | 139 | 21 |
| | 4970/5105 | 4889/4902 | 157/135 | 7 | 3 |
Corrections by RATT in three annotation transfers
| Transfer | Wrong | Wrong | Number of frameshifts (corrected/total) | Wrong splice sites (corrected/total) |
|---|---|---|---|---|
| | 44/44 | 62/62 | 40/43 | – |
| | 37/40 | 88/97 | 61/70 | 9/27 |
| | 0/0 | 4/4 | 0/3 | 1/1 |
Comparison of predicted CDS annotations with original strain F11 annotations
| Annotation method | Predicted CDSs | Start matches | End matches | Exact matches |
|---|---|---|---|---|
| Glimmer | 4234 | 2525 | 3838 | 2522 |
| RATT | 3955 | 3505 | 3821 | 3495 |
Predicted annotations by RATT (by transfer from the H37Rv strain annotation) are compared with the existing 3950 CDS annotations in the public version of strain F11.
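To make the comparison easier to read, the counts in the table above can be converted to percentages. The short calculation below uses only the numbers given in the table and its note (3950 public F11 CDS annotations); the variable and function names are illustrative.

```python
PUBLIC_F11_CDS = 3950  # existing CDS annotations in the public F11 version

# Counts copied from the comparison table above.
table = {
    "Glimmer": {"predicted": 4234, "start": 2525, "end": 3838, "exact": 2522},
    "RATT": {"predicted": 3955, "start": 3505, "end": 3821, "exact": 3495},
}


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


summary = {
    method: {
        # Share of public F11 CDSs reproduced exactly.
        "exact_vs_public_pct": percent(row["exact"], PUBLIC_F11_CDS),
        # Share of the method's own predictions that match exactly.
        "exact_vs_predicted_pct": percent(row["exact"], row["predicted"]),
    }
    for method, row in table.items()
}

for method, values in summary.items():
    print(method, values)
```

On these counts, RATT reproduces about 88.5% of the public F11 CDS annotations exactly, compared with about 63.8% for Glimmer.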