Carolina Osuna-Mascaró1,2, Rafael Rubio de Casas3,4, Francisco Perfectti5,6.
Abstract
Chloroplast genomes (cp genomes) are widely used in comparative genomics, population genetics, and phylogenetic studies. Obtaining chloroplast genomes from RNA-Seq data seems feasible due to the almost full transcription of cpDNA. However, the reliability of chloroplast genomes assembled from RNA-Seq instead of genomic DNA libraries remains to be thoroughly verified. In this study, we assembled chloroplast genomes for three Erysimum (Brassicaceae) species from three RNA-Seq replicas and from one genomic library of each species, using a streamlined bioinformatics protocol. We compared these assembled genomes, confirming that assembled cp genomes from RNA-Seq data were highly similar to each other and to those from genomic libraries in terms of overall structure, size, and composition. Although post-transcriptional modifications, such as RNA editing, may introduce variations in the RNA-Seq data, the assembly of cp genomes from RNA-Seq appeared to be reliable. Moreover, RNA-Seq assembly was less sensitive to sources of error such as the recovery of nuclear plastid DNAs (NUPTs). Although some precautions should be taken when producing reference genomes in non-model plants, we conclude that assembling cp genomes from RNA-Seq data is a fast, accurate, and reliable strategy.
Year: 2018 PMID: 30479362 PMCID: PMC6258696 DOI: 10.1038/s41598-018-35654-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Details of the plant populations sampled: taxon, population code, sampled tissue, location, elevation, and geographical coordinates.
| Taxon | Population code | Sample | Location | Elevation (m) | Geographical coordinates |
|---|---|---|---|---|---|
| E. baeticum | Ebb09 | Leaves | Sierra Nevada, Almería, Spain | 2128 | 37°05′46″N 3°01′01″W |
| E. baeticum | Ebb07 | Buds | Sierra Nevada, Almería, Spain | 2128 | 37°05′46″N 3°01′01″W |
| E. baeticum | Ebb10 | Buds | Sierra Nevada, Almería, Spain | 2140 | 37°05′32″N 3°00′40″W |
| E. baeticum | Ebb12 | Buds | Sierra Nevada, Almería, Spain | 2264 | 37°05′51″N 2°58′06″W |
| E. mediohispanicum | Em21 | Leaves and buds | Sierra Nevada, Granada, Spain | 1723 | 37°08′04″N 3°25′43″W |
| E. mediohispanicum | Em71 | Buds | Sierra de Huétor, Granada, Spain | 1352 | 37°57′10″N 2°29′24″W |
| E. mediohispanicum | Em39 | Buds | Sierra Jureña, Granada, Spain | 1272 | 37°19′08″N 3°33′11″W |
| E. nevadense | En14 | Leaves | Nigüelas, Granada, Spain | 2314 | 37°01′27″N 3°28′08″W |
| E. nevadense | En12 | Buds | Sierra Nevada, Granada, Spain | 2255 | 37°05′37″N 2°56′19″W |
| E. nevadense | En10 | Buds | Sierra Nevada, Granada, Spain | 2321 | 37°06′37″N 3°24′18″W |
| E. nevadense | En05 | Buds | Sierra Nevada, Granada, Spain | 2074 | 37°06′35″N 3°01′32″W |
Figure 1. A flow chart depicting the bioinformatics analyses used to assemble cp genomes.
Figure 2. Chloroplast genome map of Erysimum mediohispanicum. Genes drawn inside the circle are transcribed clockwise, and those outside are transcribed counter-clockwise. Genes belonging to different functional groups are shown in different colors. See supplementary material for the functional category of these genes.
Characteristics of the chloroplast genomes of Erysimum: type of library (genomic DNA or RNA-Seq library), length of the cp genome (bp), number of assembled reads, length of the two inverted repeats (IRa and IRb), length of the small single copy (SSC) and large single copy (LSC) regions, and GC content (%).
| Taxon | Population code | Type of library | Length (bp) | Assembled reads | IRa (bp) | SSC (bp) | IRb (bp) | LSC (bp) | GC % |
|---|---|---|---|---|---|---|---|---|---|
| E. baeticum | Ebb09 | Genomic DNA | 154,581 | 983,811 | 26,429 | 83,767 | 26,426 | 136,625 | 36.6 |
| E. baeticum | Ebb07 | RNA-Seq libraries | 154,791 | 3,727,511 | 25,783 | 95,135 | 13,797 | 134,715 | 37.5 |
| E. baeticum | Ebb10 | RNA-Seq libraries | 154,768 | 9,963,413 | 25,847 | 95,396 | 14,419 | 135,662 | 36.5 |
| E. baeticum | Ebb12 | RNA-Seq libraries | 154,761 | 10,356,264 | 24,617 | 95,167 | 13,305 | 133,089 | 36.5 |
| E. mediohispanicum | Em21 | Genomic DNA | 154,599 | 1,414,714 | 26,429 | 83,853 | 26,429 | 136,628 | 36.6 |
| E. mediohispanicum | Em71 | RNA-Seq libraries | 154,788 | 1,314,441 | 24,671 | 95,187 | 13,303 | 133,161 | 36.5 |
| E. mediohispanicum | Em39 | RNA-Seq libraries | 154,827 | 13,595,017 | 26,472 | 83,764 | 24,099 | 134,335 | 36.5 |
| E. mediohispanicum | Em21 | RNA-Seq libraries | 154,251 | 19,075,780 | 25,280 | 89,248 | 18,133 | 132,661 | 36.6 |
| E. nevadense | En14 | Genomic DNA | 154,660 | 1,554,542 | 26,442 | 83,840 | 26,442 | 136,724 | 36.6 |
| E. nevadense | En05 | RNA-Seq libraries | 153,467 | 12,482,406 | 25,863 | 85,139 | 24,831 | 135,833 | 36.7 |
| E. nevadense | En10 | RNA-Seq libraries | 154,834 | 9,515,436 | 25,902 | 85,182 | 23,492 | 134,576 | 36.7 |
| E. nevadense | En12 | RNA-Seq libraries | 154,747 | 5,338,711 | 25,764 | 84,289 | 24,027 | 134,080 | 36.7 |
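The size agreement between RNA-Seq and genomic assemblies can be checked directly from the lengths in this table. The following Python sketch is illustrative only and is not part of the authors' protocol: it hard-codes the tabulated lengths, assumes that the population-code prefixes (Ebb, Em, En) group the rows by species, and uses hypothetical names (`LENGTHS`, `length_deviations`).

```python
# Assembly lengths (bp) copied from the table above, as
# (species prefix, population code, library type, length).
# Assumption: the prefixes Ebb, Em and En group the rows by species.
LENGTHS = [
    ("Ebb", "Ebb09", "genomic", 154_581),
    ("Ebb", "Ebb07", "rna-seq", 154_791),
    ("Ebb", "Ebb10", "rna-seq", 154_768),
    ("Ebb", "Ebb12", "rna-seq", 154_761),
    ("Em", "Em21", "genomic", 154_599),
    ("Em", "Em71", "rna-seq", 154_788),
    ("Em", "Em39", "rna-seq", 154_827),
    ("Em", "Em21", "rna-seq", 154_251),
    ("En", "En14", "genomic", 154_660),
    ("En", "En05", "rna-seq", 153_467),
    ("En", "En10", "rna-seq", 154_834),
    ("En", "En12", "rna-seq", 154_747),
]


def length_deviations(rows):
    """Map each RNA-Seq population code to (difference in bp, difference in %)
    relative to the genomic assembly of the same species."""
    genomic = {prefix: length for prefix, _, lib, length in rows if lib == "genomic"}
    deviations = {}
    for prefix, code, lib, length in rows:
        if lib != "rna-seq":
            continue
        reference = genomic[prefix]
        diff = length - reference
        deviations[code] = (diff, 100.0 * diff / reference)
    return deviations


if __name__ == "__main__":
    for code, (diff, pct) in length_deviations(LENGTHS).items():
        print(f"{code}: {diff:+d} bp ({pct:+.3f} %)")
```

With the tabulated values, the largest deviation is the En05 assembly, which is 1,193 bp shorter than the En14 genomic assembly (less than 1% of its length).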
Figure 3. Composition of Erysimum baeticum, E. mediohispanicum, and E. nevadense cp genomes, obtained from genomic data and from the three RNA-Seq replicas.
Comparison of RNA-Seq vs. genomic assembly of chloroplast genomes. Number of protein-coding genes, tRNA, mRNA, rRNA, exons, coding sequences (CDS), genes with introns, repeat sequences, and the total number of repeats (i.e. including forward, reverse, palindrome, and complemented repeats) in different chloroplast regions (IRa, SSC, IRb, and LSC) for chloroplast genomes obtained from genomic DNA and chloroplast genomes obtained from RNA-Seq libraries are presented. The eight genes showing introns were rpl2, atpF, rpoC1, psaA, ycf3, clpP, ndhB, and ndhA.
| Taxon | Population code | Type of library | Protein-coding genes | tRNA | mRNA | rRNA | Exons | CDS | Genes with introns | Total repeats number | Forward repeats | Reverse repeats | Palindrome repeats | Complemented repeats | Total repeats in IRa region | Total repeats in SSC region | Total repeats in IRb region | Total repeats in LSC region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| E. baeticum | Ebb09 | Genomic DNA | 124 | 29 | 87 | 8 | 136 | 99 | 8 | 49 | 31 | 0 | 18 | 0 | 7 | 33 | 7 | 2 |
| E. baeticum | Ebb07 | RNA-Seq | 124 | 29 | 87 | 8 | 136 | 99 | 8 | 50 | 25 | 0 | 25 | 0 | 7 | 36 | 7 | 0 |
| E. baeticum | Ebb10 | RNA-Seq | 124 | 29 | 87 | 8 | 136 | 99 | 8 | 61 | 36 | 0 | 23 | 2 | 8 | 45 | 8 | 0 |
| E. baeticum | Ebb12 | RNA-Seq | 124 | 29 | 87 | 8 | 136 | 99 | 8 | 64 | 36 | 0 | 26 | 2 | 8 | 48 | 8 | 0 |
| E. mediohispanicum | Em21 | Genomic DNA | 124 | 29 | 86 | 8 | 135 | 98 | 8 | 63 | 37 | 0 | 24 | 2 | 10 | 40 | 3 | 10 |
| E. mediohispanicum | Em71 | RNA-Seq | 124 | 29 | 87 | 8 | 135 | 98 | 8 | 60 | 36 | 0 | 22 | 2 | 7 | 46 | 7 | 0 |
| E. mediohispanicum | Em39 | RNA-Seq | 124 | 29 | 87 | 8 | 135 | 98 | 8 | 74 | 36 | 0 | 36 | 2 | 13 | 48 | 13 | 0 |
| E. mediohispanicum | Em21 | RNA-Seq | 124 | 29 | 87 | 8 | 135 | 98 | 8 | 69 | 42 | 1 | 24 | 2 | 9 | 50 | 9 | 1 |
| E. nevadense | En14 | Genomic DNA | 124 | 29 | 87 | 8 | 136 | 99 | 8 | 78 | 51 | 1 | 25 | 1 | 7 | 59 | 7 | 5 |
| E. nevadense | En05 | RNA-Seq | 124 | 29 | 86 | 8 | 136 | 99 | 8 | 73 | 38 | 0 | 33 | 2 | 11 | 51 | 11 | 0 |
| E. nevadense | En10 | RNA-Seq | 124 | 29 | 87 | 8 | 136 | 99 | 8 | 70 | 36 | 0 | 31 | 2 | 11 | 48 | 11 | 0 |
| E. nevadense | En12 | RNA-Seq | 124 | 29 | 87 | 8 | 136 | 99 | 8 | 69 | 36 | 0 | 31 | 2 | 9 | 49 | 9 | 2 |
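Because the table breaks the total repeat count down twice, once by repeat type and once by genome region, the two breakdowns can be cross-checked against the stated total. The Python sketch below is a simple consistency check on the values as they appear in this table, not an analysis from the study; `REPEATS` and `inconsistent_rows` are hypothetical names.

```python
# Repeat counts copied from the table above. Each row holds:
# (population code, library, total repeats,
#  (forward, reverse, palindrome, complemented),
#  (IRa, SSC, IRb, LSC))
REPEATS = [
    ("Ebb09", "Genomic DNA", 49, (31, 0, 18, 0), (7, 33, 7, 2)),
    ("Ebb07", "RNA-Seq", 50, (25, 0, 25, 0), (7, 36, 7, 0)),
    ("Ebb10", "RNA-Seq", 61, (36, 0, 23, 2), (8, 45, 8, 0)),
    ("Ebb12", "RNA-Seq", 64, (36, 0, 26, 2), (8, 48, 8, 0)),
    ("Em21", "Genomic DNA", 63, (37, 0, 24, 2), (10, 40, 3, 10)),
    ("Em71", "RNA-Seq", 60, (36, 0, 22, 2), (7, 46, 7, 0)),
    ("Em39", "RNA-Seq", 74, (36, 0, 36, 2), (13, 48, 13, 0)),
    ("Em21", "RNA-Seq", 69, (42, 1, 24, 2), (9, 50, 9, 1)),
    ("En14", "Genomic DNA", 78, (51, 1, 25, 1), (7, 59, 7, 5)),
    ("En05", "RNA-Seq", 73, (38, 0, 33, 2), (11, 51, 11, 0)),
    ("En10", "RNA-Seq", 70, (36, 0, 31, 2), (11, 48, 11, 0)),
    ("En12", "RNA-Seq", 69, (36, 0, 31, 2), (9, 49, 9, 2)),
]


def inconsistent_rows(rows):
    """Return (code, library, breakdown) for every breakdown whose
    counts do not add up to the stated total number of repeats."""
    problems = []
    for code, library, total, by_type, by_region in rows:
        if sum(by_type) != total:
            problems.append((code, library, "type"))
        if sum(by_region) != total:
            problems.append((code, library, "region"))
    return problems


if __name__ == "__main__":
    for code, library, breakdown in inconsistent_rows(REPEATS):
        print(f"{code} ({library}): counts by {breakdown} do not sum to the total")
```

For the values as transcribed here, every per-region breakdown sums to its total, and the only per-type mismatch is En10, whose type counts sum to 69 against a stated total of 70.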
Figure 4. The number of simple sequence repeats (SSRs) in the chloroplast genomes of Erysimum species, obtained from genomic data and from the three RNA-Seq replicas.
Figure 5. Sequence identity plots among the Erysimum chloroplast genomes, with Arabidopsis thaliana as a reference. Annotated genes are displayed at the top. A cut-off of 50% identity was used for the plot. The vertical scale represents percent identity between 50 and 100%. Genome regions are color-coded as CNS (conserved non-coding sequences), exons, and introns. The color legend is shown in the upper left-hand corner.
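An identity plot of this kind is typically built from percent identity computed in windows along a pairwise alignment against the reference. The Python sketch below shows one way such a profile could be computed and clipped at a 50% cut-off. It is a minimal illustration under stated assumptions, not the tool or settings used for the figure: the window size, step, gap handling, and the function names `window_identity` and `clip_to_cutoff` are all hypothetical.

```python
def window_identity(ref, query, window=100, step=50):
    """Percent identity of `query` against `ref` in sliding windows.

    Both strings must be rows of the same pairwise alignment, so they
    have equal length and use '-' for gaps. Assumption: a column counts
    as a match only if both bases are identical and neither is a gap.
    Returns a list of (window start, percent identity).
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if not ref:
        return []
    ref, query = ref.upper(), query.upper()
    profile = []
    last_start = max(len(ref) - window, 0)
    for start in range(0, last_start + 1, step):
        columns = list(zip(ref[start:start + window], query[start:start + window]))
        matches = sum(1 for a, b in columns if a == b and a != "-")
        profile.append((start, 100.0 * matches / len(columns)))
    return profile


def clip_to_cutoff(profile, cutoff=50.0):
    """Keep only windows at or above the identity cut-off."""
    return [(start, pct) for start, pct in profile if pct >= cutoff]


if __name__ == "__main__":
    # Toy alignment, made up for illustration only.
    reference = "ACGTACGTACGTACGTACGT"
    sample = "ACGTACGTAC--TCGTACGA"
    profile = window_identity(reference, sample, window=10, step=5)
    print(clip_to_cutoff(profile))
```

Windows that fall below the cut-off are dropped rather than drawn, which matches a vertical scale that starts at 50%.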