Sascha Steinbiss, Fatima Silva-Franco, Brian Brunk, Bernardo Foth, Christiane Hertz-Fowler, Matthew Berriman, Thomas D Otto.
Abstract
Currently available sequencing technologies enable quick and economical sequencing of many new eukaryotic parasite (apicomplexan or kinetoplastid) species or strains. Compared to SNP calling approaches, de novo assembly of these genomes enables researchers to additionally determine insertion, deletion and recombination events as well as to detect complex sequence diversity, such as that seen in variable multigene families. However, there are currently no automated eukaryotic annotation pipelines offering the required range of results to facilitate such analyses. A suitable pipeline needs to perform evidence-supported gene finding as well as functional annotation and pseudogene detection, up to the generation of output ready to be submitted to a public database. Moreover, no current tool includes quick yet informative comparative analyses and a first-pass visualization of both annotation and analysis results. To address these needs, we have developed the Companion web server (http://companion.sanger.ac.uk), providing parasite genome annotation as a service using a reference-based approach. We demonstrate the use and performance of Companion by annotating two Leishmania and Plasmodium genomes as typical parasite cases and evaluate the results compared to manually annotated references.
Year: 2016 PMID: 27105845 PMCID: PMC4987884 DOI: 10.1093/nar/gkw292
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1.Schematic overview of the Companion workflows. (A) – genome annotation workflow, (B) – downstream analysis and visualization workflow. Input files are represented as blue boxes, output files as yellow boxes. All output files are used to construct the result set presented in the web interface: (C) and (D) – target-reference synteny diagrams for the Leishmania aethiopica target chromosome 34 and the unassembled ‘bin’ chromosome (the latter not drawn to scale), (E) – zoomable tree placing the newly annotated species (here ‘LAET’) in the context of the reference species set, (F) – interactive Venn diagram summarizing core and species-specific clusters.
Figure 2.Example of gene model integration across different gene finders. Case 1 depicts a situation in which RATT was not able to correctly produce a sensible gene model. In case 2, AUGUSTUS missed this gene completely.
Annotation accuracy evaluation for the example runs on Leishmania and Plasmodium parasite species
| Extrinsic evidence | Protein | Protein | RNA-seq + protein |
| Score threshold | 0.8 | 0.5 | 0.5 |
| # Reference genes | 8077 | 5491 | 5491 |
| # Predicted genes | 8412 | 5634 | 5634 |
| Gene level sensitivity | 86.60% | 92.59% | 91.99% |
| Gene level specificity | 83.14% | 90.24% | 89.65% |
| AA level sensitivity | 98.06% | 98.07% | 98.61% |
| AA level specificity | 95.15% | 98.34% | 98.35% |
Please see Supplementary Tables S1 and S2 for complete results for all species.
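
As a rough illustration of how the gene-level figures in the table relate to the gene counts, the sketch below assumes the convention common in gene-finding evaluation: sensitivity is the fraction of reference genes that were recovered, and specificity is the fraction of predicted genes that match a reference gene. The function name and the count of correctly predicted genes are hypothetical; the source reports only the percentages and the reference and predicted totals, so the count used here is an illustrative assumption, not a published value.

```python
def gene_level_accuracy(n_correct, n_reference, n_predicted):
    """Return (sensitivity, specificity) as fractions.

    Assumed convention (not confirmed by the source):
      sensitivity = correct predictions / reference genes
      specificity = correct predictions / predicted genes
    """
    if n_reference <= 0 or n_predicted <= 0:
        raise ValueError("gene counts must be positive")
    if not 0 <= n_correct <= min(n_reference, n_predicted):
        raise ValueError("n_correct must lie between 0 and both totals")
    sensitivity = n_correct / n_reference
    specificity = n_correct / n_predicted
    return sensitivity, specificity


# Reference and predicted totals are taken from the second column of the table.
# n_correct is a HYPOTHETICAL value chosen for illustration only.
sens, spec = gene_level_accuracy(n_correct=5084, n_reference=5491, n_predicted=5634)
print(f"gene level sensitivity: {sens:.2%}")
print(f"gene level specificity: {spec:.2%}")
```

Under this convention, specificity corresponds to what is elsewhere called precision, which is why a run that predicts more genes than the reference contains tends to show lower specificity than sensitivity.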