Chenxi Zhou, Tania Duarte, Rocio Silvestre, Genoveva Rossel, Robert O M Mwanga, Awais Khan, Andrew W George, Zhangjun Fei, G Craig Yencho, David Ellis, Lachlan J M Coin.
Abstract
Background: The chloroplast (cp) genome is an important resource for studying plant diversity and phylogeny. Assembly of cp genomes from next-generation sequencing data is complicated by the presence of two large inverted repeats in the cp DNA.
Keywords: Convolvulaceae Ipomoea; Illumina sequencing; Oxford Nanopore sequencing; chloroplast; genome assembly; phylogenetic analysis; sweetpotato
Year: 2020 PMID: 33062940 PMCID: PMC7536352 DOI: 10.12688/gatesopenres.12856.2
Source DB: PubMed Journal: Gates Open Res ISSN: 2572-4754
Figure 1. Assembly of the Tanzania chloroplast (cp) genome.
(a) Dot plot of Nanopore read length versus alignment identity to the reference. The read alignment identity is defined as I = M/L, where M is the total number of exactly matching base pairs and L is the size of the alignment span on the reference genome. The reference is the set of 30 cp genomes downloaded from NCBI (Supplementary Table 2). The alignment was performed with BWA MEM [45]. The alignment identities were calculated from the CIGAR string. Purple and yellow points represent reads before and after error correction with Illumina reads using Nanocorr [20], respectively. (b) Dot plot of the reference cp genome versus the contigs produced by Canu [17]. (c) Dot plot of the reference cp genome versus the contigs produced by AMOS minimus [46] after merging the Canu contigs. (d) Dot plot of the reference cp genome versus the contigs produced by AMOS minimus after circularization. (e) Dot plot of the reference cp genome versus the final cp genome assembly, which was polished with Illumina reads using Pilon [19] and adjusted to start at the LSC. For (b–e), the cp genome assembly of I. trifida was used as the reference (accession number REM 753, GenBank accession number KF242496) [16]. The green bars on the x-axis indicate the positions of the two IRs.
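The identity I = M/L in panel (a) can be illustrated with a minimal Python sketch. It is not the authors' script. The function name `alignment_identity` and the `nm` parameter are hypothetical. The sketch assumes that when the CIGAR string uses plain `M` operations (which do not distinguish matches from mismatches), the mismatch count is recovered from the SAM `NM` edit-distance tag; how the original analysis handled mismatches is not stated in the caption.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_identity(cigar: str, nm: int = 0) -> float:
    """Return I = M / L for a single alignment.

    L is the reference span: total length of M, =, X, D and N operations.
    M is the number of exactly matching bases.
    Assumption: if the CIGAR contains plain 'M' operations, mismatches are
    estimated as NM minus inserted and deleted bases; otherwise the 'X'
    operations give the mismatch count directly.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    ref_span = sum(n for n, op in ops if op in "M=XDN")
    if ref_span == 0:
        raise ValueError("alignment does not cover any reference bases")

    aligned = sum(n for n, op in ops if op in "M=X")
    inserted = sum(n for n, op in ops if op == "I")
    deleted = sum(n for n, op in ops if op == "D")

    if any(op == "M" for _, op in ops):
        mismatches = max(nm - inserted - deleted, 0)
    else:
        mismatches = sum(n for n, op in ops if op == "X")

    return (aligned - mismatches) / ref_span


if __name__ == "__main__":
    # Extended CIGAR: 98 matches over a 100 bp reference span.
    print(alignment_identity("50=2X48="))
    # Plain CIGAR with NM=12: 5 inserted + 5 deleted + 2 mismatched bases.
    print(alignment_identity("90M5I5D", nm=12))
```

Soft-clipped and hard-clipped bases do not count toward L here because they do not span the reference; insertions lower the identity only through the `NM`-derived mismatch estimate.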
Figure 2. The chloroplast genome of the sweetpotato cultivar Tanzania.
The preliminary annotations were produced by DOGMA [48]. MUSCLE [49] was used to refine the annotations. The plot was generated with OGDRAW [50].
List of annotated genes.
The functional systems were adopted from OGDRAW [50]. Bracketed superscripts represent the number of copies.
| Functional system | Number | Gene list |
|---|---|---|
| Photosystem I | 7 | |
| Photosystem II | 15 | |
| Cytochrome b/f complex | 6 | |
| ATP synthase | 6 | |
| NADH dehydrogenase | 13 | |
| RubisCO large subunit | 1 | |
| C-type cytochrome synthesis | 1 | |
| RNA polymerase | 4 | |
| Ribosomal proteins (LSU) | 9 | |
| Ribosomal proteins (SSU) | 16 | |
| Maturase K | 1 | |
| Acetyl-CoA carboxylase | 1 | |
| Clp protease proteolytic subunit | 1 | |
| Chloroplast envelope membrane | 1 | |
| ORFs | 6 | |
| Hypothetical chloroplast RF | 8 | |
Figure 3. A phylogenetic tree of the Convolvulaceae Ipomoea section Batatas on the basis of chloroplast genomes.
The numbers on the branches are bootstrap support values. Branches shorter than 2×10⁻⁴ substitutions per bp were collapsed, resulting in two clades consisting of 12 and 6 sweetpotato cultivars, represented in the plot by a large and a small solid circle, respectively. The plot was generated with iTOL [52].
Figure 4. A phylogenetic tree of the East African sweetpotato cultivars used in the GT4SP project on the basis of chloroplast genomes.
This is a fine-scale representation of the two clades in Figure 3. The numbers on the branches are branch lengths given in terms of substitutions per bp.