MOTIVATION: Chloroplast genomes are now produced in the hundreds for angiosperm phylogenetics projects, but current methods for annotation, alignment and tree estimation still require some manual intervention, which reduces throughput and increases analysis time for large chloroplast systematics projects.
RESULTS: Verdant is a web-based software suite and database built to take advantage of a novel annotation program, annoBTD. Using annoBTD, Verdant provides accurate annotation of chloroplast genomes without manual intervention. Subsequent alignment and tree estimation can incorporate both newly annotated and publicly available plastomes and can accommodate a large number of taxa. Verdant sharply reduces the time required to analyze assembled chloroplast genomes and removes the need to run pipelines and software on personal hardware.
AVAILABILITY AND IMPLEMENTATION: Verdant is available at http://verdant.iplantcollaborative.org/plastidDB/ and is implemented in PHP, Perl, MySQL, JavaScript, HTML and CSS; all major browsers are supported.
CONTACT: mrmckain@gmail.com
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
1 Introduction
Chloroplast genomes, or plastomes, are a valuable tool for phylogenetics in angiosperms because they provide a less complex, albeit partial, evolutionary history relative to nuclear loci. Plastomes are also easily obtained from low-coverage genome sequencing (Soltis et al.; Steele et al., 2012), making them a desirable by-product of many sequencing projects. A major hurdle to the scalable use of these data is fast, accurate annotation of plastomes, followed by alignment and phylogenetic estimation. The time, computational resources or skill required to complete these tasks may act as a barrier to novel research in underrepresented flowering plant groups.
Here, we present Verdant, a taxonomically structured, database-driven suite of tools for the annotation, alignment and tree estimation of chloroplast genomes on a web-based platform. An exhaustive tutorial is provided, but much of Verdant's interface is designed to be as intuitive as possible.
Verdant provides a number of key features designed for usability, including:
- Automated annotation of both whole and partial plastomes for protein-coding genes, tRNAs and rRNAs using our novel software, annoBTD.
- Orientation- and orthology-focused alignments of annotated genes, rRNAs, tRNAs, introns and intergenic regions using MAFFT (Katoh and Standley, 2013).
- Phylogeny estimation using RAxML (Stamatakis, 2014).
- Annotation visualization using Circos (Krzywinski et al., 2009) and JBrowse (Skinner et al., 2009).
- Downloadable datasets of aligned and unaligned plastome regions, individual-gene and concatenated plastome trees, and project metadata, including full plastome size, the sizes and locations of the large single copy (LSC), small single copy (SSC) and inverted repeat (IRA and IRB) regions, and the total number of annotated features present.
These functions are enabled by an underlying database of high-quality plastomes downloaded from GenBank, together with secure, user-populated databases of newly annotated plastomes for individual projects. Users can then release their data to the public database at their discretion.
2 Implementation
Verdant is organized into two primary workflows (see Fig. 1). The first, which runs automatically upon upload, annotates the plastome sequence(s) using our novel software, annoBTD, populates the user's personal and secure data structure, and creates Circos and JBrowse visualizations for each plastome. The second workflow is entirely user-driven and includes project creation, taxon selection, feature selection, alignment and phylogenetic tree reconstruction.
Fig. 1. Verdant workflow. Workflow diagram depicting the automated (grey box) and user-driven (white box) steps and options available in Verdant. Parallelograms at the bottom of the diagram represent the downloadable files.
2.1 Annotation with annoBTD
Our novel annotation software, annoBTD, removes the need for routine manual annotation, although manual intervention may occasionally be necessary for some aberrant plastomes. Annotation of a full plastome sequence takes approximately 10–30 min, and annotations can be downloaded in GFF3 format. AnnoBTD is an alternative to the current standard web-based program, DOGMA (Wyman et al.), which is effective and easy to use but requires manual intervention to produce final, accurate annotations, thus limiting throughput.
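Because annotations are exported as GFF3, the sketch below shows what that nine-column, tab-separated layout looks like for a single two-exon gene. It is a minimal illustration assuming 1-based, inclusive coordinates (the GFF3 convention); the seqid `plastome1`, the source label `example`, the ID scheme and the `Gene`/`Exon` classes are hypothetical and do not reproduce annoBTD's actual output.

```python
from dataclasses import dataclass, field


@dataclass
class Exon:
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int


@dataclass
class Gene:
    name: str
    strand: str  # "+" or "-"
    exons: list[Exon] = field(default_factory=list)


def gene_to_gff3(seqid: str, gene: Gene, source: str = "example") -> list[str]:
    """Return GFF3 lines (one gene feature plus one exon feature per exon).

    The gene feature spans from the first exon start to the last exon end;
    each exon points back to the gene through its Parent attribute.
    Columns: seqid, source, type, start, end, score, strand, phase, attributes.
    """
    start = min(e.start for e in gene.exons)
    end = max(e.end for e in gene.exons)
    gene_id = f"gene-{gene.name}"
    lines = ["\t".join([seqid, source, "gene", str(start), str(end), ".",
                        gene.strand, ".", f"ID={gene_id};Name={gene.name}"])]
    for i, exon in enumerate(sorted(gene.exons, key=lambda e: e.start), 1):
        lines.append("\t".join([seqid, source, "exon", str(exon.start),
                                str(exon.end), ".", gene.strand, ".",
                                f"ID={gene_id}-exon{i};Parent={gene_id}"]))
    return lines


if __name__ == "__main__":
    # Hypothetical two-exon gene on a hypothetical sequence "plastome1".
    g = Gene("geneX", "+", [Exon(100, 150), Exon(900, 1200)])
    print("##gff-version 3")
    print("\n".join(gene_to_gff3("plastome1", g)))
```

Exon features are used here instead of CDS features so that the phase column can stay `.`; a CDS-level export would need a real phase value for each segment.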
2.1.1 Protein coding genes
Details of annoBTD are given in the Supplementary Information. A novel feature is de novo ORF identification; ORFs are then identified by comparison with reference sequences from the five most closely related species available in the database. Once a putative identity is established for an ORF, its position in the plastome informs the final annotation decision. Overlapping genes, such as psbD and psbC, are allowed. Start and stop codons for each gene, as well as intron–exon boundaries, are estimated from the sequence by methods that require neither canonical start codons nor exact boundary matches. AnnoBTD also finds very small exons that may be missed by other annotation programs.
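The de novo ORF step can be illustrated with a simple stop-to-stop scan which, like the approach described above, does not depend on a canonical start codon. This is a swapped-in textbook technique, not annoBTD's algorithm (which is documented in the Supplementary Information); the function names, the `min_codons` threshold and the 0-based half-open coordinates are assumptions, and the sketch ignores ORFs that wrap around the origin of a circular plastome.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def stop_to_stop_orfs(seq: str, min_codons: int = 30):
    """Yield (strand, start, end) for every stop-to-stop ORF with at least
    min_codons sense codons, in 0-based half-open forward-strand coordinates.

    No start codon is required: an ORF is any run of sense codons in one
    reading frame that ends in a stop codon (the stop lies inside [start, end)).
    Runs that reach the end of the sequence without a stop are not reported.
    """
    seq = seq.upper()
    n = len(seq)
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = frame
            for i in range(frame, n - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    if (i - orf_start) // 3 >= min_codons:
                        start, end = orf_start, i + 3
                        if strand == "-":
                            # map the reverse-strand interval back to the forward strand
                            start, end = n - end, n - start
                        yield strand, start, end
                    orf_start = i + 3  # next candidate ORF begins after this stop


if __name__ == "__main__":
    demo = "ATG" + "GCT" * 30 + "TAA"  # toy sequence, not a real plastome gene
    print(list(stop_to_stop_orfs(demo)))
```

In a pipeline like the one described, each candidate ORF would then be compared against the reference species to assign an identity; that step is not shown here.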
2.1.2 rRNAs and tRNAs
Because rRNAs and tRNAs are highly conserved in chloroplasts, they are detected with an optimized blastn search and annotated according to their position and length in the plastome sequence.
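The blastn-based RNA detection could be post-processed roughly as follows. The sketch assumes reference rRNA/tRNA sequences (queries) searched against a plastome (subject) with BLAST+ `-outfmt 6`, whose 12 default columns are listed in `FIELDS`; the `min_coverage` cutoff, the `RnaHit` type and the reference-length table are hypothetical and are not annoBTD's optimized settings. All passing hits are kept rather than only the best one, because RNA genes located in the inverted repeat occur twice.

```python
import csv
import io
from typing import NamedTuple

# The 12 default columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


class RnaHit(NamedTuple):
    name: str       # reference RNA (query) identifier
    start: int      # 1-based, inclusive, on the plastome (subject)
    end: int
    strand: str     # "-" when BLAST reports sstart > send
    bitscore: float


def rna_hits(tabular: str, ref_lengths: dict[str, int],
             min_coverage: float = 0.9) -> list[RnaHit]:
    """Keep hits covering >= min_coverage of the reference RNA, sorted by start.

    Every passing hit is kept, so RNA genes in the inverted repeat are
    reported once per IR copy. ref_lengths must contain every query ID.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        name = rec["qseqid"]
        covered = int(rec["qend"]) - int(rec["qstart"]) + 1
        if covered / ref_lengths[name] < min_coverage:
            continue  # partial hit: too short to annotate as this RNA
        s, e = int(rec["sstart"]), int(rec["send"])
        hits.append(RnaHit(name, min(s, e), max(s, e),
                           "+" if s <= e else "-", float(rec["bitscore"])))
    return sorted(hits, key=lambda h: h.start)
```

Requiring near-full query coverage is one simple way to use "length" as an annotation criterion; position-based decisions (for example, expected gene order) would be layered on top.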
2.1.3 LSC, SSC and IR
If the full plastome sequence is provided, the LSC, SSC and IR regions are estimated by identifying the two copies of the inverted repeat and then assigning the LSC and SSC by size, the LSC being the larger of the two single-copy regions.
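A minimal sketch of IR detection, assuming the full plastome is supplied as a single linear string in which neither IR copy spans the sequence ends: index the k-mers of the sequence, look them up in its reverse complement, extend each seed into a maximal exact match, and keep the longest non-overlapping pair as IRA/IRB. The remaining single-copy regions are then assigned by size. Exact matching, `k = 31`, `min_len = 1000` and all function names are simplifying assumptions; real IR copies can differ slightly, so a production tool would need to tolerate mismatches.

```python
import random

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMP)[::-1]


def find_inverted_repeat(seq: str, k: int = 31, min_len: int = 1000):
    """Return ((ira_start, ira_end), (irb_start, irb_end)), 0-based half-open,
    for the longest exact inverted repeat, or None if none reaches min_len."""
    seq = seq.upper()
    n = len(seq)
    rc = revcomp(seq)
    index: dict[str, list[int]] = {}  # k-mer -> positions on the forward strand
    for i in range(n - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    best = None  # (length, copy1, copy2)
    for j in range(n - k + 1):
        for i in index.get(rc[j:j + k], ()):
            # extend only from the left end of a maximal match
            if i > 0 and j > 0 and seq[i - 1] == rc[j - 1]:
                continue
            length = k
            while i + length < n and j + length < n and seq[i + length] == rc[j + length]:
                length += 1
            # rc[j:j+length] is the reverse complement of seq[n-j-length : n-j]
            copy1, copy2 = (i, i + length), (n - j - length, n - j)
            disjoint = copy1[1] <= copy2[0] or copy2[1] <= copy1[0]
            if disjoint and (best is None or length > best[0]):
                best = (length, copy1, copy2)
    if best is None or best[0] < min_len:
        return None
    ira, irb = sorted([best[1], best[2]])
    return ira, irb


def partition_plastome(seq: str, **kwargs):
    """Return LSC, SSC and IR lengths, or None if no IR is found."""
    ir = find_inverted_repeat(seq, **kwargs)
    if ir is None:
        return None
    (a0, a1), (b0, b1) = ir
    between = b0 - a1               # single-copy region between IRA and IRB
    outside = (len(seq) - b1) + a0  # single-copy region across the sequence ends
    return {"LSC": max(between, outside), "SSC": min(between, outside), "IR": a1 - a0}


if __name__ == "__main__":
    # Synthetic plastome: LSC + IR + SSC + reverse complement of IR.
    rng = random.Random(1)
    rand = lambda m: "".join(rng.choice("ACGT") for _ in range(m))
    lsc, ir, ssc = rand(3000), rand(1200), "A" + rand(798) + "A"
    print(partition_plastome(lsc + ir + ssc + revcomp(ir)))
```

Extending only from the left end of each maximal match keeps the work per matching diagonal proportional to the match length, which keeps a ~150 kb plastome tractable even in pure Python.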
2.2 Analyses in Verdant
Verdant's project management system allows users to create multiple projects that include their own plastome data, publicly available data from the database, or both. Users then choose single genes, ranges of genes or whole plastomes to include in their analyses. Unaligned or aligned sequences may be downloaded. For alignment, each region of the genome (an annotated feature or a region between annotated features) is aligned separately, and the resulting alignments are then concatenated into a single alignment. The MAFFT nucleotide direction option is used to keep all regions in the same orientation, maintaining alignment accuracy across inversion events. Where a taxon lacks a specific feature, that region is coded as gaps (an indel) for the taxon in the alignment. Both individual and concatenated alignments are provided for download. Phylogenies are estimated from both the individual region alignments and the concatenated alignment, and all RAxML output files are available for download.
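The gap-padding and concatenation step can be sketched as below, assuming each region has already been aligned (for example by MAFFT) and is held as a mapping from taxon name to aligned sequence. Taxa missing a region receive a run of gaps of that region's width, and the function also emits partition lines in the `DNA, name = start-end` style read by RAxML's `-q` option. The function name and data layout are hypothetical, and region names are assumed to be valid partition labels.

```python
def concatenate_alignments(regions: dict[str, dict[str, str]]):
    """Concatenate per-region alignments into one supermatrix.

    regions maps region name -> {taxon: aligned sequence}, in the desired
    concatenation order. A taxon that lacks a region is filled with gaps.
    Returns (supermatrix, partition_lines); partition coordinates are
    1-based and inclusive.
    """
    taxa = sorted({taxon for aln in regions.values() for taxon in aln})
    pieces: dict[str, list[str]] = {taxon: [] for taxon in taxa}
    partitions = []
    pos = 1
    for name, aln in regions.items():
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1 or 0 in widths:
            raise ValueError(f"region {name!r}: expected a non-empty alignment "
                             "with sequences of equal length")
        width = widths.pop()
        for taxon in taxa:
            # missing feature -> coded as gaps (an indel) for this taxon
            pieces[taxon].append(aln.get(taxon, "-" * width))
        partitions.append(f"DNA, {name} = {pos}-{pos + width - 1}")
        pos += width
    return {taxon: "".join(p) for taxon, p in pieces.items()}, partitions


if __name__ == "__main__":
    supermatrix, parts = concatenate_alignments({
        "geneA": {"t1": "AC-T", "t2": "ACGT"},
        "geneB": {"t1": "GG", "t3": "G-"},
    })
    for taxon, seq in supermatrix.items():
        print(f">{taxon}\n{seq}")
    print("\n".join(parts))
```

Keeping the per-region alignments separate until this final step is what allows both individual-region and concatenated trees to be estimated from the same data.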
3 Conclusion
The annotation and project development features of Verdant provide a high-throughput method for conducting phylogenetic analyses with whole chloroplast sequences, a much-needed utility given the glut of plastome data now available. Because of its focus on phylogenetic applications, Verdant complements other tools developed for functional studies of plastome biology. Future additions to Verdant will include further evolutionary analyses of plastome structure and function and, ultimately, user-created modules.
References
P Roxanne Steele, Kate L Hertweck, Dustin Mayfield, Michael R McKain, James Leebens-Mack and J Chris Pires (2012) Am J Bot.
Martin Krzywinski, Jacqueline Schein, Inanç Birol, Joseph Connors, Randy Gascoyne, Doug Horsman, Steven J Jones and Marco A Marra (2009) Genome Res.
Mitchell E Skinner, Andrew V Uzilov, Lincoln D Stein, Christopher J Mungall and Ian H Holmes (2009) Genome Res.
Christine A McAllister, Michael R McKain, Mao Li, Bess Bookout and Elizabeth A Kellogg (2018) Philos Trans R Soc Lond B Biol Sci.
Michael Tillich, Pascal Lehwark, Tommaso Pellizzer, Elena S Ulbricht-Jones, Axel Fischer, Ralph Bock and Stephan Greiner (2017) Nucleic Acids Res.
Alison P A Menezes, Luciana C Resende-Moreira, Renata S O Buzatti, Alison G Nazareno, Monica Carlsen, Francisco P Lobo, Evanguedes Kalapothakis and Maria Bernadete Lovato (2018) Sci Rep.