ReFernment: An R package for annotating RNA editing in plastid genomes.

Tanner A Robison, Paul G Wolf.

Abstract

PREMISE OF THE STUDY: In the absence of cDNA, the annotation of RNA editing in plastomes must be done manually, representing a significant time cost to those studying the organellar genomes of ferns and hornworts.
METHODS AND RESULTS: We developed an R package to automatically annotate apparent nonsense mutations in plastid genomes. The software successfully annotates such sites and results in no false positives for data with no sequencing or assembly errors.
CONCLUSIONS: Compared to manual annotation, ReFernment offers greater speed and accuracy for annotating RNA editing sites. This software should be especially useful for researchers generating large numbers of plastome sequences for taxa with high levels of RNA editing.

Keywords:  GenBank; National Center for Biotechnology Information (NCBI); RNA editing; annotation; chloroplast; genome; plastome

Year:  2019        PMID: 30828503      PMCID: PMC6384294          DOI: 10.1002/aps3.1216

Source DB:  PubMed          Journal:  Appl Plant Sci        ISSN: 2168-0450            Impact factor:   1.936


The development of next‐generation sequencing has led to an explosion of available genome data, especially for plastid genomes (plastomes). These relatively small genomes are a major source of data for phylogenetic analyses. As of September 2018, more than 2700 plastome sequences from green plants had been published in public databases (https://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=2759&opt=plastid), which has in turn aided in the resolution of deep phylogenetic relationships across plant diversity (Ruhfel et al., 2014; Tonti‐Filippini et al., 2017; Gitzendanner et al., 2018). However, researchers assembling and annotating plastomes are often faced with the problem of RNA editing, whereby the sequence of the initial transcript is altered prior to translation. In some groups of plants, levels of RNA editing can be high, affecting up to 78% of protein‐coding genes in the plastomes of ferns (Wolf et al., 2004) and hornworts (Kugita et al., 2003). Many of these RNA editing sites affect start codons or stop codons, or occur at apparent stop codons within the genomic coding sequence. The most common forms of RNA editing in plastomes are U‐to‐C and C‐to‐U editing (Kugita et al., 2003; Wolf et al., 2004). Although many of the automated annotation tools presently available are generally good at annotating plastid genes, none of them accounts for RNA editing (Wyman et al., 2004; Liu et al., 2012; McKain et al., 2017; Jung et al., 2018). This results in annotated genes that appear to be missing start codons or stop codons, or to contain numerous internal stops based on their nucleotide translations. Understandably, such issues make it difficult to get some plastome sequences approved for public databases such as GenBank. Although RNA editing appears to occur at a lower rate in angiosperms than in other clades, 138 RNA editing sites were detected in the plastome of Amborella Baill. (Hein et al., 2016). Thus, the need to annotate RNA editing sites may not be restricted to a few seed‐free lineages.

Tools are available to predict RNA editing sites, for example, PREPACT (Lenz and Knoop, 2013) and the PREP suite (Mower, 2009). While these tools are powerful for predicting RNA editing sites in organelle genomes, they do not directly add to or alter an existing annotation file. Thus, in many cases, researchers manually add these annotations by examining each nonsense mutation and determining whether RNA editing would likely restore this site. This process, while necessary for admission to public repositories, is tedious and time‐consuming, especially considering that these edits to nonsense mutations occur in a highly predictable manner. Here, we attempt to solve this problem by introducing ReFernment, a simple R package that automatically annotates nonsense codons in DNA translations to account for RNA editing and provides conceptual translations for coding sequences. ReFernment is available at https://github.com/TARobison/ReFernment.

METHODS AND RESULTS

ReFernment operates by refining existing annotations. Thus, the software uses an annotation generated by programs such as DOGMA, CpGAVAS, Verdant, or AGORA (Wyman et al., 2004; Liu et al., 2012; McKain et al., 2017; Jung et al., 2018) and adjusts these annotations to account for RNA editing. ReFernment requires both a GFF3 (no sequence) file and a GenBank flat file (including nucleotide sequence), and its basic operation is simple. First, ReFernment checks the starting and final codons of each gene. In both cases, ReFernment initially checks whether the codon is a valid start or stop. If the codon is not valid, it checks whether an RNA editing event would result in the restoration of the codon to a valid start or stop (e.g., ACG to AUG). If the codon is not valid, even after checking for possible RNA editing, ReFernment checks whether any codon within five codons of the annotated boundary is a valid start or stop; if so, ReFernment changes the gene boundaries to start or stop at that valid site. Next, ReFernment checks whether a gene has any internal stops, and if so, checks whether RNA editing would restore these nonsense mutations, adjusting the translation to account for this. ReFernment then edits the input GenBank flat file, adding ‘misc_feature’ annotations at the sites where RNA editing occurred, adding the necessary RNA editing flags to the relevant genes, and providing a conceptual translation for each gene. Finally, ReFernment produces a five‐column feature table, formatted correctly for submission to GenBank, and a protein FASTA file with the conceptual translations for coding sequences where RNA editing has occurred. ReFernment operates under the assumption that only U‐to‐C or C‐to‐U RNA editing is occurring in the plastome (Takenaka et al., 2013). Additionally, ReFernment assumes that all nonsense mutations are the result of RNA editing.
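The codon checks described above can be sketched as follows. This is a minimal illustration in Python, not ReFernment's own code (the package is written in R). The function names are hypothetical, codons are written in DNA letters (so C‐to‐U editing appears as C to T), and the sketch assumes a single edit per codon and ATG as the only accepted start codon.

```python
# Illustrative sketch only; names and simplifications are assumptions,
# not ReFernment's actual implementation.
START_CODONS = {"ATG"}                # assumption: canonical start only
STOP_CODONS = {"TAA", "TAG", "TGA"}


def single_base_edits(codon, src, dst):
    """Return every codon reachable by changing exactly one `src` base to `dst`."""
    return [codon[:i] + dst + codon[i + 1:]
            for i, base in enumerate(codon) if base == src]


def rescue_start(codon):
    """Return the start codon produced by one C-to-U edit, or None."""
    for edited in single_base_edits(codon, "C", "T"):
        if edited in START_CODONS:
            return edited
    return None


def rescue_final_stop(codon):
    """Return the stop codon produced by one C-to-U edit, or None."""
    for edited in single_base_edits(codon, "C", "T"):
        if edited in STOP_CODONS:
            return edited
    return None


def rescue_internal_stop(codon):
    """Return the sense codon produced by one U-to-C edit of a stop, or None."""
    if codon not in STOP_CODONS:
        return None
    for edited in single_base_edits(codon, "T", "C"):
        if edited not in STOP_CODONS:
            return edited
    return None


print(rescue_start("ACG"))          # candidate edited start codon
print(rescue_final_stop("CAA"))     # candidate edited stop codon
print(rescue_internal_stop("TGA"))  # candidate edited sense codon
```

In this sketch, a caller would first test whether the annotated codon is already valid and only then try the rescue functions; a result of None corresponds to the case where ReFernment falls back to searching nearby codons.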
Because most of the genes that reside within the plastome are vital to photosynthetic function, it is assumed that these genes will remain operational. There may, however, be cases where internal stops, bad starts, or missing stops are actually the result of an uncorrected mutation, especially in parasitic lineages (Krause, 2008). When ReFernment was tested against plastomes with high levels of RNA editing, confirmed with cDNA data (AB086179 and AY178864.1), every nonsense mutation was correctly annotated, and there were no false positive annotations. A major limitation of ReFernment is that the annotations it produces are only as good as the annotations it is provided. If a gene annotation is frameshifted, if a pseudogene is annotated as a coding sequence, if there are assembly errors, or if an annotation has the incorrect start and stop sites, ReFernment might interpret this as RNA editing, rather than an error. In other words, ReFernment is not a substitute for manually checking gene annotations, nor is ReFernment a fix for sloppy annotation. In an attempt to mitigate these problems, if there are more than five detected internal stops in a gene, ReFernment will produce an error suggesting that the user manually check that gene. There are cases where genes have more than five RNA‐edited internal stops, but these are relatively rare, so users should use their best judgement. The utility of ReFernment is simple: it saves users time in the final stages of annotation. Manually accounting for RNA edits generally takes hours for a typical fern or hornwort plastid genome, but with ReFernment, this process takes less than a minute. There are currently efforts to publish some 1000 additional fern plastomes in the coming years, and hopefully similar efforts are underway for hornworts, meaning many thousands of hours can be saved by the implementation of this simple program.
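The internal-stop safeguard described above can be illustrated with a short Python sketch. It is not taken from ReFernment; the function names are hypothetical, and the only value drawn from the text is the threshold of five internal stops. The sketch assumes the coding sequence is given in frame, 5' to 3', in DNA letters.

```python
# Illustrative sketch only; not ReFernment's actual implementation.
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_INTERNAL_STOPS = 5  # threshold stated in the text


def internal_stop_positions(cds):
    """Return 0-based codon indices of in-frame stops, excluding the final codon."""
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def needs_manual_check(cds):
    """True when a gene has more internal stops than the threshold allows."""
    return len(internal_stop_positions(cds)) > MAX_INTERNAL_STOPS


# A hypothetical gene with six internal stops is flagged for manual review.
suspicious = "ATG" + "TAA" * 6 + "GCT" + "TAG"
print(internal_stop_positions(suspicious), needs_manual_check(suspicious))
```

A gene flagged this way is not necessarily misannotated, since the text notes that some genes do carry more than five RNA‐edited internal stops; the flag only signals that the annotation deserves a manual look.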
ReFernment not only saves the researcher time, but also provides consistent methodology for the annotation of RNA editing. In many cases, RNA editing sites are not annotated in plastid sequences and only conceptual translations are provided. This not only results in confusion about how to annotate such sites consistently, but also makes it difficult for researchers interested in the evolution of such sites to readily identify them.
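To make the annotation output more concrete, the following Python sketch writes records in the tab-delimited, five-column feature table layout accepted for GenBank submissions. It is a hedged illustration, not ReFernment's code: the helper name, sequence identifier, coordinates, product name, and note text are all hypothetical, and the exact qualifiers ReFernment emits are an assumption here. The `exception` qualifier with the value "RNA editing" is a standard GenBank qualifier for coding sequences whose translation differs from the genomic sequence.

```python
# Illustrative sketch only; identifiers, coordinates, and qualifier text are
# hypothetical and do not come from ReFernment.
def feature_table(seq_id, features):
    """Format features as a five-column, tab-delimited feature table."""
    lines = [f">Feature {seq_id}"]
    for feat in features:
        # Columns 1-3: start, stop, feature key.
        lines.append(f"{feat['start']}\t{feat['stop']}\t{feat['key']}")
        # Columns 4-5 (after three empty columns): qualifier, value.
        for qualifier, value in feat["qualifiers"]:
            lines.append(f"\t\t\t{qualifier}\t{value}")
    return "\n".join(lines) + "\n"


example = feature_table(
    "Seq1",
    [
        {"start": 100, "stop": 400, "key": "CDS",
         "qualifiers": [("product", "hypothetical protein"),
                        ("exception", "RNA editing")]},
        {"start": 130, "stop": 130, "key": "misc_feature",
         "qualifiers": [("note", "C to U RNA editing")]},
    ],
)
print(example)
```

Recording each edited site as its own feature, in addition to flagging the gene, is what lets other researchers locate the sites directly instead of inferring them from the conceptual translation.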

CONCLUSIONS

ReFernment offers easy and rapid annotation of RNA‐edited sites and automatic conceptual translation of amino acid sequences, streamlining the process of GenBank submission and saving the user valuable time.
REFERENCES (10 of 14 shown)

1.  RNA editing in hornwort chloroplasts makes more than half the genes functional.

Authors:  Masanori Kugita; Yuhei Yamamoto; Takeshi Fujikawa; Tohoru Matsumoto; Koichi Yoshinaga
Journal:  Nucleic Acids Res       Date:  2003-05-01       Impact factor: 16.971

2.  Automatic annotation of organellar genomes with DOGMA.

Authors:  Stacia K Wyman; Robert K Jansen; Jeffrey L Boore
Journal:  Bioinformatics       Date:  2004-06-04       Impact factor: 6.937

3.  From chloroplasts to "cryptic" plastids: evolution of plastid genomes in parasitic plants.

Authors:  Kirsten Krause
Journal:  Curr Genet       Date:  2008-08-12       Impact factor: 3.886

4.  RNA editing in plants and its evolution.

Authors:  Mizuki Takenaka; Anja Zehrmann; Daniil Verbitskiy; Barbara Härtel; Axel Brennicke
Journal:  Annu Rev Genet       Date:  2013       Impact factor: 16.830

5.  High levels of RNA editing in a vascular plant chloroplast genome: analysis of transcripts from the fern Adiantum capillus-veneris.

Authors:  Paul G Wolf; Carol A Rowe; Mitsuyasu Hasebe
Journal:  Gene       Date:  2004-09-15       Impact factor: 3.688

6.  The PREP suite: predictive RNA editors for plant mitochondrial genes, chloroplast genes and user-defined alignments.

Authors:  Jeffrey P Mower
Journal:  Nucleic Acids Res       Date:  2009-05-11       Impact factor: 16.971

7.  PREPACT 2.0: Predicting C-to-U and U-to-C RNA Editing in Organelle Genome Sequences with Multiple References and Curated RNA Editing Annotation.

Authors:  Henning Lenz; Volker Knoop
Journal:  Bioinform Biol Insights       Date:  2013-01-10

8.  CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences.

Authors:  Chang Liu; Linchun Shi; Yingjie Zhu; Haimei Chen; Jianhui Zhang; Xiaohan Lin; Xiaojun Guan
Journal:  BMC Genomics       Date:  2012-12-20       Impact factor: 3.969

9.  From algae to angiosperms-inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes.

Authors:  Brad R Ruhfel; Matthew A Gitzendanner; Pamela S Soltis; Douglas E Soltis; J Gordon Burleigh
Journal:  BMC Evol Biol       Date:  2014-02-17       Impact factor: 3.260

10.  Frequent chloroplast RNA editing in early-branching flowering plants: pilot studies on angiosperm-wide coexistence of editing sites and their nuclear specificity factors.

Authors:  Anke Hein; Monika Polsakiewicz; Volker Knoop
Journal:  BMC Evol Biol       Date:  2016-01-25       Impact factor: 3.260
