Yunsoo Kim, Christopher Cullis.
Abstract
Tylosema esculentum (marama bean) is being developed as a possible crop for resource-poor farmers in arid regions of Southern Africa. As part of the molecular characterization of this species, the chloroplast genome has been assembled from next-generation sequencing using both Illumina and PacBio data. The genome is of typical organization with a large single-copy region and a small single-copy region separated by a pair of inverted repeats and covers 161537 bp. It contains a unique inversion not present in any other legumes, even in the closest relatives for which the complete chloroplast genome is available, and two complete copies of the ycf1 gene. These data extend the range of variability of legume chloroplast genomes. The sequencing of multiple individuals has identified two different chloroplast genomes which were geographically separated. The current sampling is limited so that the extent of the intraspecific variation is still to be determined, leaving open the question of legume chloroplast genomes adapted to particular arid environments.
Keywords: Basal legume; chloroplast genome sequence; intraspecific variation; marama; unique inversion
Year: 2017 PMID: 28158587 PMCID: PMC5429017 DOI: 10.1093/jxb/erw500
Source DB: PubMed Journal: J Exp Bot ISSN: 0022-0957 Impact factor: 6.992
Fig. 1. Circular gene map of the Tylosema esculentum (Genistoid; Fabaceae) plastid genome. Genes are drawn as boxes inside or outside the first circle to indicate clockwise or counterclockwise transcription, respectively. Genes belonging to different functional groups are color coded. The locations of the main plastome regions (inverted repeats, large single copy, and small single copy) are indicated in the inner circle. The map was drawn using the analysis site http://www.herbalgenomics.org/0506/cpgavas/analyzer/home.
Summary of chloroplast genome characteristics of marama
| Characteristic | Value |
|---|---|
| Total size (bp) | 161537 |
| LSC size (bp) | 86113 |
| SSC size (bp) | 13630 |
| IR length (bp) | 30897 |
| Size of coding regions (bp) | 101241 |
| Size of protein-coding regions (bp) | 80218 |
| Size of rRNA (bp) | 10282 |
| Size of tRNA (bp) | 10741 |
| Size of intergenic regions (bp) | 60296 |
| No. of different genes | 125 |
| No. of different protein-coding genes | 79 |
| No. of different tRNA genes | 30 |
| No. of different rRNA genes | 4 |
| No. of different genes duplicated by IR | 17 |
| No. of different genes with introns | 22 |
| Overall GC content | 36.13% |
| GC content in protein-coding regions | 37.5% |
| GC content in IGSs | 31.16% |
| GC content in rRNA | 54.6% |
| GC content in tRNA | 43.4% |
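The size figures in the summary table can be cross-checked with simple arithmetic. The sketch below assumes that "IR length" refers to one copy of the inverted repeat and that the coding total is the sum of the protein-coding, rRNA, and tRNA rows; the variable names are illustrative only.

```python
# Values copied from the summary table (all in bp).
total = 161537
lsc = 86113
ssc = 13630
ir = 30897              # assumed: length of ONE inverted repeat copy
protein_coding = 80218
rrna = 10282
trna = 10741
coding = 101241
intergenic = 60296

# A quadripartite plastome should equal LSC + SSC + two IR copies.
quadripartite_sum = lsc + ssc + 2 * ir

# Assumed decomposition of the coding total into its three classes.
coding_sum = protein_coding + rrna + trna

checks = {
    "LSC + SSC + 2*IR == total": quadripartite_sum == total,
    "protein + rRNA + tRNA == coding": coding_sum == coding,
    "coding + intergenic == total": coding + intergenic == total,
}

for label, ok in checks.items():
    print(f"{label}: {'consistent' if ok else 'MISMATCH'}")

# Share of the genome annotated as coding, as a fraction of total length.
coding_fraction = coding / total
print(f"coding fraction: {coding_fraction:.4f}")
```

Under these assumptions all three identities hold for the tabulated values, which supports reading the IR row as a per-copy length.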
Coding regions of the marama chloroplast
| Category | Genes |
|---|---|
| rRNAs | 16S ( |
| tRNAs | tRNA-His(GTG), tRNA-Lys(TTT), tRNA-Gln(TTG), tRNA-Ser(GCT), tRNA-Thr(CGT), tRNA-Arg(TCT), tRNA-Cys(GCA), tRNA-Asp(GCT), tRNA-Tyr(GTA), tRNA-Glu(TTC), tRNA-Thr(GGT), tRNA-Ser(TGA), tRNA-Gly(GCC), tRNA-Met(CAT), tRNA-Ser(GGA), tRNA-Thr(TGT), tRNA-Leu(TAA), tRNA-Phe(GAA), tRNA-Ile(AAT), tRNA-Met(CAT), tRNA-Trp(CCA), tRNA-Pro(TGG), tRNA-Met(CAT) (×2), tRNA-Leu(CAA) (×2), tRNA-Val(GAC) (×2), tRNA-Glu(TTC) (×2), tRNA-Ala(TGC) (×2), tRNA-Arg(ACG) (×2), tRNA-Asn(GTT) (×2), tRNA-Leu(TAG) |
| Small subunit of ribosome | |
| Large subunit of ribosome | |
| RNA polymerase | |
| NADH-dehydrogenase | |
| PSI | |
| PSII | |
| Cytochrome | |
| ATP synthase | |
| Rubisco | |
| Subunit of acetyl-CoA-carboxylase | |
| Others | |
| Unknown function ORFs | |
Lengths of introns and exons for the split genes
| Gene | Strand | Start | End | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
|---|---|---|---|---|---|---|---|---|
| | – | 12120 | 13437 | 148 | 709 | 461 | | |
| | – | 21208 | 24065 | 437 | 800 | 1621 | | |
| | – | 41327 | 43468 | 1788 | 30 | 324 | | |
| | – | 44330 | 46381 | 127 | 716 | 230 | 829 | 150 |
| | – | 60367 | 61305 | 238 | 120 | 581 | | |
| | – | 71533 | 72885 | 299 | 837 | 217 | | |
| | – | 86463 | 87955 | 394 | 632 | 467 | | |
| | – | 96955 | 99163 | 870 | 586 | 753 | | |
| | – | 126748 | 129120 | 552 | 1263 | 558 | | |
| | + | 148488 | 150699 | 867 | 589 | 756 | | |
| | + | 159696 | 161191 | 391 | 635 | 470 | | |
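The coordinates and segment lengths in this table can be checked against each other. The sketch below assumes that the two large numbers in each row are inclusive start and end positions and that the remaining numbers are the lengths of consecutive exons and introns; the row list and the helper name `span_matches_segments` are illustrative, not from the source.

```python
from typing import List, Tuple

# (strand, start, end, segment lengths) copied from the table rows.
# Gene names are absent from the table as given, so rows are kept in order.
Row = Tuple[str, int, int, List[int]]

rows: List[Row] = [
    ("-", 12120, 13437, [148, 709, 461]),
    ("-", 21208, 24065, [437, 800, 1621]),
    ("-", 41327, 43468, [1788, 30, 324]),
    ("-", 44330, 46381, [127, 716, 230, 829, 150]),
    ("-", 60367, 61305, [238, 120, 581]),
    ("-", 71533, 72885, [299, 837, 217]),
    ("-", 86463, 87955, [394, 632, 467]),
    ("-", 96955, 99163, [870, 586, 753]),
    ("-", 126748, 129120, [552, 1263, 558]),
    ("+", 148488, 150699, [867, 589, 756]),
    ("+", 159696, 161191, [391, 635, 470]),
]


def span_matches_segments(row: Row) -> bool:
    """True if the inclusive span equals the summed exon/intron lengths."""
    _strand, start, end, segments = row
    return end - start + 1 == sum(segments)


results = [span_matches_segments(r) for r in rows]

for (strand, start, end, segments), ok in zip(rows, results):
    status = "ok" if ok else "MISMATCH"
    print(f"{strand} {start}-{end}: span={end - start + 1}, "
          f"segments={sum(segments)} -> {status}")
```

Every row satisfies end - start + 1 = sum of segment lengths, which is consistent with reading the columns as inclusive coordinates followed by alternating exon and intron lengths.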
Fig. 2. Amplification across the two ends of the inversion, confirming the arrangement of the sequences as assembled. DNAs from three different marama plants and from lupin as a control were used. Lane 1, New England Biolabs 100 bp ladder (bands visible at 1517, 1200, 1000, 900, 800, 700, 600, and 500 bp); lane 2, marama DNA; lane 3, marama DNA; lane 4, marama DNA; lane 5, lupin DNA; lane 6, water; lane 7, blank; lane 8, marama DNA; lane 9, marama DNA; lane 10, marama DNA; lane 11, lupin DNA; lane 12, water. The products in lanes 2–6 were amplified with primers 1Forward and 1Reverse, while those in lanes 8–12 were amplified with primers 2Forward and 2Reverse.
Fig. 3. Alignment of the sequences from the marama and Cercis canadensis chloroplast genomes covering the inversion region. The respective regions of the chloroplast genomes were aligned using Blastn (megablast) for high similarity. Note not only the inversion but also the sequence differences across this region. The organization of the protein-coding genes is shown. The axes are labeled with the nucleotide positions in the chloroplast molecules: for marama, the new genome, and for Cercis, the genome available at NCBI (KF856619.1).
Fig. 4. Phylogenetic relationships. Twenty-nine proteins (psb, matK, atpA, atpF, atpF, rpoB, psbD, psbC, psaB, psaA, ndhK, psbG, ndhC, atpE, atpB, rbcL, accD, cemA, petA, rpl20, psbB, petB, rpoA, ndhH, ndhA, ndhI, ndhG, ndhE, and ndhD) were identified from the available chloroplast genomes of Arabidopsis thaliana (AP000423.1), Cercis canadensis (KF856619.1), Haematoxylum brasiletto (NC_026679), Lotus japonicus (NC_002694.1), Medicago truncatula (NC_003119.6), Tamarindus indica (NC_026685), Tylosema esculentum (KX792933), and Vigna unguiculata (NC_018051.1) in NCBI and concatenated. The concatenated protein sets were aligned in KAlign, and the tree was then inferred with MrBayes.
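The concatenation step described in the Fig. 4 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `concatenate_proteins` and `to_fasta` are hypothetical, the sequences are toy placeholders rather than real protein data, and the alignment (KAlign) and tree inference (MrBayes) steps are not reproduced here.

```python
from typing import Dict, List


def concatenate_proteins(proteins: Dict[str, Dict[str, str]],
                         gene_order: List[str]) -> Dict[str, str]:
    """Join each taxon's protein sequences in one fixed gene order."""
    concatenated: Dict[str, str] = {}
    for taxon, by_gene in proteins.items():
        # Every taxon must supply every gene, or the columns would not line up.
        missing = [g for g in gene_order if g not in by_gene]
        if missing:
            raise ValueError(f"{taxon} lacks: {', '.join(missing)}")
        concatenated[taxon] = "".join(by_gene[g] for g in gene_order)
    return concatenated


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Format records as FASTA text, wrapping sequences at `width` characters."""
    lines: List[str] = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy placeholder sequences, NOT real protein data.
toy = {
    "taxon_A": {"rbcL": "MSPQT", "matK": "MEEF"},
    "taxon_B": {"rbcL": "MSPKT", "matK": "MEEL"},
}
gene_order = ["matK", "rbcL"]

concatenated = concatenate_proteins(toy, gene_order)
fasta_text = to_fasta(concatenated)
print(fasta_text, end="")
```

Using one fixed gene order for every taxon keeps homologous proteins in the same relative position, so the resulting FASTA text could be passed to an aligner as a single sequence per species.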