Plastome characteristics of Cannabaceae.

Huanlei Zhang1,2, Jianjun Jin1,2, Michael J Moore3, Tingshuang Yi1, Dezhu Li1.   

Abstract

Cannabaceae is an economically important family that includes ten genera and ca. 117 accepted species. To explore the structure and size variation of their plastomes, we sequenced ten plastomes representing all ten genera of Cannabaceae. Each plastome possessed the typical angiosperm quadripartite structure and contained a total of 128 genes. The Inverted Repeat (IR) regions in five plastomes had experienced small expansions (330-983 bp) into the Large Single-Copy (LSC) region. The plastome of Chaetachme aristata has experienced a 942-bp IR contraction and lost rpl22 and rps19 in its IRs. The substitution rates of rps19 and rpl22 decreased after they shifted from the LSC to IR. A 270-bp inversion was detected in the Parasponia rugosa plastome, which might have been mediated by 18-bp inverted repeats. Repeat sequences, simple sequence repeats, and nucleotide substitution rates varied among these plastomes. Molecular markers with more than 13% variable sites and 5% parsimony-informative sites were identified, which may be useful for further phylogenetic analysis and species identification. Our results show strong support for a sister relationship between Gironniera and Lozanella (BS = 100). Celtis, Cannabis-Humulus, Chaetachme-Pteroceltis, and Trema-Parasponia formed a strongly supported clade, and their relationships were well resolved with strong support (BS = 100). The availability of these ten plastomes provides valuable genetic information for accurately identifying species, clarifying taxonomy and reconstructing the intergeneric phylogeny of Cannabaceae.

Keywords:  IR expansion/contraction; Phylogenomics; Plastome; Repeats; SSR; Sequence divergence

Year:  2018        PMID: 30175293      PMCID: PMC6114266          DOI: 10.1016/j.pld.2018.04.003

Source DB:  PubMed          Journal:  Plant Divers        ISSN: 2468-2659


Introduction

Cannabaceae sensu APG IV (Byng et al., 2016) comprise ten genera (Lipton, 1997, Sytsma et al., 2002, Haston et al., 2007, Haston et al., 2009; Mabberley, 2008, Bell et al., 2010) and ca. 117 species (Jin et al., unpublished). Most Cannabaceae species are trees and shrubs, while some are herbs (Cannabis L.) or vines (Humulus L.). The family has a cosmopolitan distribution; Aphananthe (Thunb.) Planch., Celtis L. and Trema Lour. are widely distributed in tropical and temperate regions (Yang et al., 2013; Jin et al., unpublished); the remaining genera have restricted distributions. A few species of this family are of great economic importance. Cannabis sativa L. (hemp) is one of the earliest and most important domesticated food and fiber crops, and an increasingly important drug used for its anesthetic and antipsychotic properties (Measham et al., 1994, Kostic et al., 2008, Marks et al., 2009). Humulus lupulus L. (hops) is a key ingredient for brewing beer (Wilson, 1975, Murakami et al., 2006), and the phloem fiber of Pteroceltis tatarinowii Maxim. is the sole raw material for manufacturing traditional Chinese Xuan paper (Cao, 1993).

There are long-standing controversies over the circumscription and phylogenetic position of Cannabaceae. Cannabaceae was first separated from Moraceae by Rendle (1925). The circumscription of this family has been expanded significantly to include most former members of Ulmaceae subfam. Celtidoideae sensu Engler and Prantl (1893) or Celtidaceae sensu Link (1829) (Yang et al., 2013). A series of molecular studies elucidated the phylogenetic position of this family, which was supported as a member of Rosales and sister to Moraceae and Urticaceae (Sytsma et al., 2002, Van Velzen et al., 2006, Wang et al., 2009, Zhang et al., 2011a, Zhang et al., 2011b). Multiple molecular studies have also helped to clarify intergeneric relationships of the family (Yang et al., 2013; Jin et al., unpublished). However, a few nodes among genera have remained unresolved with weak support (Yang et al., 2013).

The plastome of angiosperms is usually conserved in gene content and structure, typically featuring two ∼25 kb Inverted Repeat (IR) regions separating the remainder of the genome into Large and Small Single-Copy regions (LSC, SSC). Size variation among plastomes is mostly due to the expansion or contraction of the IR and/or larger indels, for example those caused by the loss of genes (especially the ndh genes) (Downie and Jansen, 2015). Plastomes have proved highly valuable in resolving difficult phylogenetic relationships at both deeper taxonomic levels (e.g. Jansen et al., 2007, Moore et al., 2007, Moore et al., 2010) and shallower levels (e.g. Zhang et al., 2011a, Zhang et al., 2011b, Givnish et al., 2015, Wysocki et al., 2015, Duvall et al., 2016).

In this article, we report the complete plastome sequences of ten species representing all ten genera of Cannabaceae. We annotated the plastomes in detail, identified structure and size variation, and determined the distribution and location of microsatellites (SSRs) and repeats. We demonstrate that the resulting plastome information will be widely useful for understanding phylogenetic relationships, population genetics and breeding programs across the family.

Materials and methods

Chloroplast DNA extraction and sequencing

We used about 100 mg of fresh leaf material of each species (see Table S1 for voucher specimens). Total genomic DNA was extracted with a modified CTAB (Cetyl Trimethyl Ammonium Bromide) method (Doyle and Doyle, 1987), in which 4% CTAB with approximately 1% polyvinyl polypyrrolidone (PVP) and 0.2% DL-dithiothreitol (DTT) was included (Yang et al., 2014). Long-range polymerase chain reaction (PCR) was used for DNA amplification of the plastome using 15 universal primer pairs and methods described by Zhang et al. (2016). Illumina Nextera XT libraries (Illumina, San Diego, CA, USA) with 500 bp inserts were constructed following the manufacturer's protocol. Paired-end (PE) sequencing was performed on an Illumina HiSeq 2500 instrument at the Beijing Genomics Institute (BGI, Shenzhen, Guangdong, China) or on a HiSeq 2000 instrument at the Plant Germplasm and Genomics Center (Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China).

Plastome assembly and annotation

Raw reads were filtered using NGSQCToolkit (Patel and Jain, 2012; cut-off value for percentage of read length = 80, cut-off value for PHRED quality score = 30) to obtain high-quality reads that were free of vector and adaptor sequences. Filtered reads were then assembled de novo into contigs using CLC Genomics Workbench 8, with a k-mer of 63 and a minimum contig length of 1 kb. Using BLAST (Altschul et al., 1990) with default search parameters, all contigs were aligned to the Morus mongolica Schneid. plastome (NC025772.2) as a reference. We mapped the paired reads to the assembled plastomes using Bowtie 2 (Langmead and Salzberg, 2012), as implemented in Geneious v9.5 (Kearse et al., 2012), to verify the IR boundaries, correct biased bases introduced by the CLC assembler, and determine the number of matched paired-end (PE) reads and the depth of coverage. Lastly, we filled the remaining gaps using long-range PCR and Sanger sequencing. We designed primers based on previous incomplete plastomes (Table S2). Each amplification was performed in a 25 μL reaction volume containing 12.5 μL Taq DNA polymerase, 0.5 μL each of forward and reverse primers (dissolved in 10× ddH2O), and 1 μL (30 ng/μL) template DNA. The amplification program was 94 °C for 3 min; 35 cycles of 94 °C for 50 s, 50 °C for 2 min, and 72 °C for 1 min; and a final extension step at 72 °C for 8 min. PCR products were sequenced at the Kunming Sequencing Department of Biosune Biotechnology Limited Company (Shanghai, China). Assembled genomes were annotated using DOGMA (Wyman et al., 2004) along with manual correction of start and stop codons and intron/exon boundaries in Geneious. Transfer RNA (tRNA) genes were further annotated using tRNAscan-SE (Schattner et al., 2005). Genome maps were created in OGDraw 1.2 (Lohse et al., 2013). All annotated plastomes were deposited in GenBank; the accession numbers (MH118117–MH11812) are provided in Table S1.
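The read-filtering criterion quoted above (at least 80% of a read's bases with PHRED quality of 30 or more) can be illustrated with a minimal sketch. This is not NGSQCToolkit's code: the function name, the default arguments, and the assumption of ASCII offset 33 quality encoding are ours, chosen only for illustration.

```python
def passes_quality_filter(quality_string, min_phred=30, min_fraction=0.80, offset=33):
    """Return True if at least min_fraction of bases have PHRED >= min_phred.

    Assumption: qualities are encoded with ASCII offset 33 (Sanger / Illumina 1.8+).
    This is an illustrative criterion, not the toolkit's actual implementation.
    """
    if not quality_string:
        return False
    # Decode each quality character into its PHRED score.
    scores = [ord(ch) - offset for ch in quality_string]
    high_quality = sum(1 for score in scores if score >= min_phred)
    return high_quality / len(scores) >= min_fraction


# 'I' encodes Q40 and '#' encodes Q2 under offset 33.
print(passes_quality_filter("IIIIIIII##"))  # 8 of 10 bases reach Q30
print(passes_quality_filter("IIIIIII###"))  # only 7 of 10 bases reach Q30
```

A read is kept or discarded as a whole under this criterion; trimming of low-quality ends would be a separate step.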

Phylogenetic analysis

Phylogenetic analyses included all ten genera of Cannabaceae as ingroups, with two species, M. mongolica (Moraceae) and Ulmus macrocarpa Hance (Ulmaceae), representing closely related families as outgroups (Table S1). A total of 237 loci (112 coding and 125 noncoding regions) were extracted from each plastome (exons were joined) for phylogenetic analysis. Loci shared by fewer than six taxa or shorter than 30 bp were excluded (Table S3). Sequences were aligned using MAFFT version 7 (Katoh and Standley, 2013) with default parameters. Maximum likelihood analysis was performed with RAxML v8.2.10 (Stamatakis, 2006), using the ‘-f a’ option, the GTRGAMMA model, and 1000 bootstrap replicates, with data partitioned by locus.
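The locus filter and the per-locus partitioning described above can be sketched as follows. This is a hedged illustration rather than the authors' pipeline: the function names and data layout are hypothetical, and reading "length" as aligned length is an assumption. The partition lines follow the "DNA, name = start-end" layout used in RAxML partition files.

```python
def filter_loci(loci, min_taxa=6, min_length=30):
    """Keep loci present in at least min_taxa taxa and at least min_length bp long.

    loci maps locus name -> {taxon: aligned sequence}. A taxon that lacks the
    locus is either missing from the inner dict or has an all-gap sequence.
    """
    kept = {}
    for name, alignment in loci.items():
        present = [seq for seq in alignment.values() if seq.replace("-", "")]
        if len(present) < min_taxa:
            continue  # shared by too few taxa
        if max(len(seq) for seq in present) < min_length:
            continue  # too short to be informative
        kept[name] = alignment
    return kept


def partition_lines(locus_lengths):
    """Turn (name, aligned length) pairs into partition lines for a concatenated matrix."""
    lines, start = [], 1
    for name, length in locus_lengths:
        end = start + length - 1
        lines.append(f"DNA, {name} = {start}-{end}")
        start = end + 1
    return lines


# Hypothetical lengths, for illustration only.
for line in partition_lines([("matK", 1500), ("rbcL", 1400)]):
    print(line)
```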

Analysis of sequence divergence

To characterize sequence divergence among all sequenced plastomes of Cannabaceae, we extracted 133 coding and 129 noncoding regions (including intergenic spacers and introns), each of them treated as a separate locus. These regions were aligned using MEGA v6.06 (Tamura et al., 2013). For each alignment, the number of invariant sites, variable but parsimony-uninformative sites, and parsimony-informative sites were calculated, as was pairwise sequence divergence (uncorrected “p” distance), all using PAUP* 4.0a147 (Swofford, 2002). Gaps were treated as missing data. Using the Humulus scandens plastome as a reference, sequence identity was also plotted using mVISTA (Frazer et al., 2004) in LAGAN mode.
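The site categories and the uncorrected p-distance used here can be made concrete with a short sketch. It assumes the standard definitions (a site is parsimony-informative when at least two character states each occur in at least two sequences) and treats gaps and ambiguity codes as missing data; the function names are illustrative and this is not PAUP*'s implementation.

```python
VALID = set("ACGT")


def classify_site(column):
    """Classify one alignment column as 'invariant', 'uninformative' or 'informative'."""
    counts = {}
    for base in column:
        base = base.upper()
        if base in VALID:  # gaps and ambiguity codes are treated as missing data
            counts[base] = counts.get(base, 0) + 1
    if len(counts) <= 1:
        return "invariant"
    # Informative: at least two states that each occur in two or more sequences.
    shared_states = sum(1 for n in counts.values() if n >= 2)
    return "informative" if shared_states >= 2 else "uninformative"


def summarize_alignment(sequences):
    """Count site categories across an alignment given as equal-length strings."""
    totals = {"invariant": 0, "uninformative": 0, "informative": 0}
    for column in zip(*sequences):
        totals[classify_site(column)] += 1
    return totals


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: differing sites / compared sites, skipping missing data."""
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID and b in VALID:
            compared += 1
            differences += a != b
    return differences / compared if compared else float("nan")


print(summarize_alignment(["AAAA", "AATA", "ATTA", "ATTC"]))
print(p_distance("ACGT", "ACGA"))
```

The percentage of variable sites for a locus is then the sum of the uninformative and informative counts divided by the aligned length.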

Repeat analysis

REPuter (Kurtz et al., 2001) was used to locate sequence repeats including forward, reverse, and palindromic repeats. The minimal repeat size was set to 30 bp and repeat identity was set to ≥90% (Hamming distance equal to 3). To avoid redundancy, we removed the IRA region from each plastome before running REPuter. However, IR repeats were counted twice (to represent both copies) when summarizing repeats across the genome. Tandem repeats were analyzed using the Tandem Repeats Finder (TRF) web interface (Benson, 1999) with match, mismatch and indel parameters set to 2, 7 and 7, respectively. The minimum alignment score and maximum period size were set to 50 and 500, respectively. After analysis, tandem repeats <15 bp in length and the redundant results of REPuter were manually removed (Wang et al., 2017). We also tallied the total number of repeats, measured repeat lengths, and calculated the proportion of repeats in the LSC, SSC, and IR.
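The link between the two thresholds is simple arithmetic: a 30-bp repeat with a Hamming distance of 3 has 27 matching positions, i.e. 27/30 = 90% identity. The sketch below illustrates this, together with the three repeat orientations named above. It is not REPuter's algorithm; the function names are hypothetical and the example sequence is made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    return (len(a) - hamming(a, b)) / len(a)


def repeat_type(first, second, max_mismatches=3):
    """Classify a pair of equal-length copies, or return None if no orientation matches."""
    orientations = {
        "forward": second,                                  # same strand, same direction
        "reverse": second[::-1],                            # same strand, reversed
        "palindromic": second[::-1].translate(COMPLEMENT),  # reverse complement
    }
    for name, oriented in orientations.items():
        if hamming(first, oriented) <= max_mismatches:
            return name
    return None


unit = "ACGTTGCAAGGCTTACCGATAGGCTAACGT"  # made-up 30-bp sequence
mutated = "TTT" + unit[3:]               # three substitutions
print(hamming(unit, mutated), identity(unit, mutated))
print(repeat_type(unit, unit[::-1].translate(COMPLEMENT)))
```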

SSR analysis

Microsatellite detection was performed using MISA with minimum number of repeats of 8, 5, 4, 3, 3, and 3 respectively for mono-, di-, tri-, tetra-, penta-, and hexanucleotide repeats. One copy of the IR was removed prior to microsatellite detection. All of the repeats were manually verified, and redundant results were removed.
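The repeat-number thresholds above can be expressed as a small regular-expression search. This is a simplified sketch under stated assumptions, not MISA itself: the function name and output layout are ours, only perfect repeats of A/C/G/T are considered, compound SSRs are not handled, and the example sequence is made up.

```python
import re

# Minimum number of repeat units per motif length (mono- to hexanucleotide),
# as listed in the text.
MIN_REPEATS = {1: 8, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(sequence):
    """Return (start, end, motif, repeat count) tuples, 1-based inclusive coordinates."""
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter motif (e.g. 'AA', 'ATAT').
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            repeats = len(match.group(0)) // unit_len
            hits.append((match.start() + 1, match.end(), motif, repeats))
    return sorted(hits)


example = "CG" + "A" * 10 + "CG" + "AT" * 6 + "CG"  # made-up sequence
print(find_ssrs(example))
```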

Results and discussion

Conservation of Cannabaceae plastomes

Illumina sequencing produced from 289,464 (Celtis blondii) to 4,807,452 (Trema orientalis) paired-end reads, among which 257,965 (Celtis blondii) to 4,346,229 (T. orientalis) reads were mapped to their respective assembled genomes. De novo and reference-guided assembly produced full coverage for all plastomes, with mean coverages ranging from 120.3 × (Celtis blondii) to 2569.3 × (T. orientalis) (Table 1).
Table 1

Assembly statistics and genome features for newly sequenced Cannabaceae plastomes.

Species | Total PE reads | Matched PE reads | Mean coverage (×) | Genome length (bp) | LSC length (bp) | SSC length (bp) | IR length (bp) | GC content (%)
Aphananthe aspera | 1,695,716 | 374,611 | 583.7 | 157,687 | 86,135 | 19,442 | 26,015 | 36.4
Cannabis sativa | 2,040,500 | 1,880,700 | 1351.8 | 153,910 | 84,059 | 17,829 | 26,011 | 36.7
Celtis blondii | 289,464 | 257,965 | 120.3 | 159,001 | 86,072 | 19,171 | 26,879 | 36.3
Chaetachme aristata | 1,142,608 | 1,045,891 | 1415.4 | 157,939 | 86,743 | 20,064 | 25,566 | 36.1
Gironniera subaequalis | 396,352 | 374,583 | 583.6 | 157,807 | 86,215 | 18,942 | 26,325 | 36.3
Humulus scandens | 1,010,646 | 839,251 | 1436.6 | 153,776 | 83,885 | 17,751 | 26,070 | 36.9
Lozanella enantiophylla | 1,077,002 | 1,026,115 | 1573.4 | 156,711 | 85,928 | 19,133 | 25,825 | 36.6
Parasponia rugosa | 586,024 | 498,328 | 627.5 | 157,434 | 86,961 | 19,313 | 25,580 | 36.3
Pteroceltis tatarinowii | 1,051,832 | 992,380 | 1711.1 | 158,504 | 87,620 | 18,856 | 26,014 | 36.3
Trema orientalis | 4,807,452 | 4,346,229 | 2569.3 | 157,192 | 86,859 | 19,309 | 25,512 | 36.3

PE = paired-end; LSC = Large Single-Copy region; SSC = Small Single-Copy region; IR = Inverted Repeat region.
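Because a quadripartite plastome consists of one LSC, one SSC and two IR copies, the total length should equal LSC + SSC + 2 × IR. The short sketch below applies this check to three rows copied from Table 1; the variable and function names are illustrative.

```python
# (genome length, LSC, SSC, IR) in bp, copied from three rows of Table 1.
TABLE1_ROWS = {
    "Cannabis sativa": (153_910, 84_059, 17_829, 26_011),
    "Celtis blondii": (159_001, 86_072, 19_171, 26_879),
    "Humulus scandens": (153_776, 83_885, 17_751, 26_070),
}


def expected_length(lsc, ssc, ir):
    """Total plastome length implied by the quadripartite structure."""
    return lsc + ssc + 2 * ir


for species, (genome, lsc, ssc, ir) in TABLE1_ROWS.items():
    # Each line reports whether the reported genome length matches the sum of its parts.
    print(species, expected_length(lsc, ssc, ir) == genome)
```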

All sequenced plastomes displayed the typical quadripartite structure of most angiosperms (Wang et al., 2013, Li et al., 2014). The ten plastomes ranged in size from 153,776 bp (H. scandens) to 159,001 bp (Celtis blondii). The length of their LSC region varied from 83,885 bp (H. scandens) to 87,620 bp (P. tatarinowii), that of the SSC region from 17,751 bp (H. scandens) to 20,064 bp (Chaetachme aristata), and that of the IR region from 25,512 bp (T. orientalis) to 26,879 bp (Celtis blondii) (Table 1). The overall GC content was approximately 37.3% across all ten sampled plastomes. The gene content and structural organization of all ten sequenced plastomes were also highly conserved (Fig. 1, Fig. S1). Most plastomes harbored 112 unique genes, including 78 protein-coding genes, 30 transfer RNA (tRNA) genes, and four ribosomal RNA (rRNA) genes. The exceptions were the plastomes of P. tatarinowii and C. aristata; the former had a pseudogenic rpl22 and the latter had lost rpl22 (Table 2). All plastomes had lost infA, consistent with most eurosids (Millen et al., 2001).
Fig. 1

Gene maps of the plastome. Genes are indicated by boxes on the inside (clockwise transcription) and outside (counterclockwise transcription) of the outermost circle. The inner circle identifies the major structural components of the plastome (LSC, IR, and SSC). Genes belonging to different functional groups are color-coded. Dashed area in the inner circle indicates the GC content of the plastome. * represents the tRNA with an intron.

Table 2

Gene content in Cannabaceae plastomes.

Category | Gene group | Name of genes
Self-replication | Large subunit of ribosomal proteins | rpl2[b] (×2), rpl14, rpl16[b], rpl20, rpl22 (×2)[e,f], rpl23 (×2), rpl32, rpl33, rpl36
 | Small subunit of ribosomal proteins | rps2, rps3, rps4, rps7 (×2), rps8, rps11, rps12[a–c] (×2), rps14, rps15, rps16[b], rps18, rps19 (×2)[d]
 | DNA-dependent RNA polymerase | rpoA, rpoB, rpoC1[b], rpoC2
 | Ribosomal RNA genes | rrn4.5 (×2), rrn5 (×2), rrn16 (×2), rrn23 (×2)
 | Transfer RNA genes | trnA-UGC[b] (×2), trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, trnG-GCC, trnG-UCC[b], trnH-GUG, trnI-CAU (×2), trnI-GAU[b] (×2), trnK-UUU[b], trnL-CAA (×2), trnL-UAA[b], trnL-UAG, trnM-CAU, trnN-GUU (×2), trnP-UGG, trnQ-UUG, trnR-ACG (×2), trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC (×2), trnV-UAC[b], trnW-CCA, trnY-GUA
Photosynthesis | Photosystem I | psaA, psaB, psaC, psaI, psaJ
 | Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
 | NADH dehydrogenase | ndhA[b], ndhB[b] (×2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
 | Cytochrome b/f complex | petA, petB[b], petD[b], petG, petL, petN
 | ATP synthase | atpA, atpB, atpE, atpF[b], atpH, atpI
 | Rubisco large subunit | rbcL
Other genes | Maturase K | matK
 | Envelope membrane protein | cemA
 | Subunit of acetyl-CoA carboxylase | accD
 | c-type cytochrome synthesis gene | ccsA
 | Protease | clpP[a]
 | Proteins of unknown function | ycf1, ycf2 (×2), ycf3[a], ycf4

(×2) = gene present twice due to position within the IR; [a] Contains two introns; [b] Contains one intron; [c] Exons separated and joined by trans-splicing; [d] Gene present in the IRs in the IR-expanded species; [e] Gene present in the IR of Celtis blondii; [f] Gene present in the IR of Chaetachme aristata.

The IR, LSC, and SSC gene content, as well as intron content, for most of the Cannabaceae plastomes matched the typical content for angiosperms, with some differences in IR gene content (Fig. 2, Table S4). The plastomes of Aphananthe aspera, Lozanella enantiophylla, Parasponia rugosa and T. orientalis possessed canonical IRs ranging from 25,512 bp in T. orientalis to 26,015 bp in A. aspera. Their IRs contained 17 complete genes (including six protein-coding genes, seven tRNAs, and all four rRNAs) as well as the 5′ ends of ycf1 (1037–1076 bp) and rps19 (0–100 bp). The plastomes of C. sativa, H. scandens, P. tatarinowii, Celtis blondii and Gironniera subaequalis had longer IRs, ranging from 26,011 bp (C. sativa) to 26,879 bp (Celtis blondii), caused by 330-bp (C. sativa) to 983-bp (Celtis blondii) IR expansions into the LSC; specifically, IRs expanded into all of rps19 and all or part of rpl22 (25–408 bp).

In contrast, C. aristata had a shorter IR (25,566 bp), due to a 942-bp IR contraction. Its IRs lost rps19 and rpl22, but rps19 was found before trnH-GUG in the LSC near the IRa/LSC junction (JLA). The IRs of C. aristata may first have expanded by more than 942 bp into the LSC to include rps19 and rpl22, followed by the loss of rps19 (279 bp) and rpl22 (408 bp) from IRb and of rpl22 from IRa. The IR/SSC junctions, by comparison, showed little variation, including 0 (A. aspera) to 45 bp (L. enantiophylla) of the 3′ end of ndhF.
Fig. 2

Comparison of IR/SC boundaries among Cannabaceae plastomes. JSB, JSA and JLA refer to junctions of SSC/IRB, SSC/IRA, and LSC/IRA, respectively. Ψ indicates a pseudogene copy of a gene partially duplicated in the IR.

IR expansion and contraction are common, especially small contractions and expansions of <100 base pairs (bp), and the positions of four IR/single-copy junctions can vary even among closely related species (Goulding et al., 1996, Plunkett and Downie, 2000). Large IR expansions occur less frequently and sometimes accompany structural rearrangements elsewhere in the plastid genome (Guisinger et al., 2011, Wicke et al., 2011). Cannabaceae provide yet another example of moderate to small IR expansion and contraction. IR expansion has been suggested to start with double-strand breaks followed by strand invasion and recombination (Goulding et al., 1996, Wang et al., 2008). Regions with a high content of short repeats or “poly A tracts” were inferred to be associated with the dynamics of IR-LSC junctions and expansions of IR (Wang et al., 2008, Dugas et al., 2015). In Cannabaceae plastomes with expanded IRs, a region ca. 100 bp upstream of the IR-LSC junctions was found to be extremely AT-rich (>90%), including many poly A tracts and short repeats, which could explain the IR expansion of Cannabaceae plastomes. Large IR contractions have been rarely reported, and illegitimate recombination has been considered as the most plausible explanation (Goulding et al., 1996, Downie and Jansen, 2015, Blazier et al., 2016), which may also account for the IR contraction in C. aristata.

Nucleotide substitution rates of most plastome coding genes have been demonstrated to decrease after translocation from SC regions to the IR (Lin et al., 2012, Li et al., 2016, Zhu et al., 2016; but see exceptions in Lin et al., 2012, Wang et al., 2017). In this study, we also found a decrease of substitution rates for rps19 (0.0154) and rpl22 (0.0229) after their shifts from LSC into IR.
Finally, an interesting 270-bp inversion between petN and psbM was detected in the plastome of P. rugosa, representing the first known reasonably long inversion in Cannabaceae plastomes. A pair of 18-bp inverted repeats resided at the boundaries of this inversion, and it is likely that these repeats helped mediate this inversion, as seen for other smaller inversions (Kim et al., 2005; Qu et al., 2017a, Qu et al., 2017b). Likewise, short repeats have also been inferred to be associated with large inversions, such as the association of 29-bp repeats with a 36-kb inversion in legumes (Martin et al., 2014); the association of ≥20-bp repeats with a 45-kb inversion in Medicago truncatula (Gurdon and Maliga, 2014); and the association of 11-bp repeats with a 36-kb inversion in Calocedrus macrolepis (Qu et al., 2017a, Qu et al., 2017b).

Phylogenetic relationships

The monophyly of Cannabaceae was strongly supported (BS = 100). Relationships among the ten genera of Cannabaceae were also fully resolved with high bootstrap support (BS) (Fig. 3). Complete plastome sequences have also been used to successfully resolve intergeneric relationships in many other vascular plants (e.g. Givnish et al., 2015, Qu et al., 2017a, Qu et al., 2017b, Zhang et al., 2017, Wang et al., 2018), and our study provides yet another example. Some previously resolved intrafamilial relationships were strongly supported in this study (Fig. 3): Aphananthe was sister to the other genera of Cannabaceae (Song et al., 2001, Sytsma et al., 2002, Van Velzen et al., 2006, Yang et al., 2013); Gironniera, Lozanella and clade B together formed a monophyletic group (Yang et al., 2013); Chaetachme and Pteroceltis were sisters (Van Velzen et al., 2006, Yang et al., 2013); Cannabis and Humulus were sisters (Song et al., 2001, Song and Li, 2002, Sytsma et al., 2002); Parasponia was nested within Trema (Zavada and Kim, 1996, Sytsma et al., 2002, Yesson et al., 2004, Van Velzen et al., 2006, Yang et al., 2013). However, our study also supported some new relationships. Our results show strong support (BS = 100) for a sister relationship between Gironniera and Lozanella. Celtis was strongly supported as sister to clade A (BS = 100). The Humulus-Cannabis clade and the Trema-Parasponia clade were sisters with strong support (BS = 100). Morphologically, they all have persistent tepals and stigmas. The Chaetachme-Pteroceltis clade was sister to the Humulus-Cannabis-Trema-Parasponia clade with relatively low support (BS = 80).
Fig. 3

The best maximum likelihood (ML) tree based on RAxML analysis. Bootstrap support values are provided next to each node.


Sequence divergence and phylogenetic informativeness

Sequence alignments and the mVISTA plot (Fig. 4) revealed high sequence similarity among Cannabaceae plastomes. Aligned lengths of 133 coding and 129 noncoding regions ranged from 9 bp (psbF-psbE intergenic spacer) to 6828 bp (ycf2). The number of variable sites ranged from 0 (for 20 loci) to 943 (ycf1), and the number of parsimony-informative sites ranged from 0 (for 26 loci) to 392 (ycf1). Percentages of variable and parsimony-informative sites in coding and noncoding regions are provided in Fig. 5A and B and Table S5. Among coding regions, matK, rps8, rpl22, ndhF and ycf1 had the highest percentages of variable and parsimony-informative sites, with matK having an especially high percentage of variable sites (14.05%) and rpl22 having a high percentage of parsimony-informative sites (6.70%). The percentages of variable sites in noncoding regions ranged from 0 to 28.93% with a mean value of 9.43%, which was nearly twice that of coding regions (5.24% on average). The five noncoding regions with the highest percentages of variable sites were trnfM-CAU-rps14, psaI-ycf4, petD-2-rpoA, rpl36-rps8 and rps15-ycf1, with rpl36-rps8 having the highest percentage of variable (28.93%) and parsimony-informative sites (10.85%). The five noncoding regions with the highest percentages of parsimony-informative sites were rpl33-rps18, clpP-3-clpP-2, rpoA-rps11, rpl36-rps8 and rps15-ycf1. The proportions of parsimony-informative sites in noncoding regions ranged from 0 to 10.85% with a mean value of 2.99%, which was higher than that of the coding regions (2.19% on average). In the IRs, the percentages of both variable and parsimony-informative sites in coding regions ranged from 0 to 2.78%, with a mean value of 0.88%. Among noncoding regions in the IRs, the percentages of variable sites ranged from 0 to 6.93% with a mean value of 2.65%, and the percentages of parsimony-informative sites (PIS) were similarly low (0–2.97%, mean of 1.00%). These findings all showed that fewer mutations were observed within IR regions, in both coding and noncoding regions, than within the LSC and SSC regions. Loci with no mutations were mostly tRNAs and rrn5, illustrating that tRNAs are more conserved than other genes.
Fig. 4

mVISTA-based identity plot showing sequence identity among Cannabaceae plastomes. Humulus scandens is set as the reference. Coding and noncoding regions are colored in blue and red, respectively.

Fig. 5

Percentages of variable (blue, top line) and parsimony-informative (red, bottom line) sites across coding and non-coding loci. A coding regions; B noncoding regions. Regions are oriented according to their genome locations.

Plastomes supply many valuable loci for reconstructing phylogenetic relationships at multiple taxonomic scales. A number of plastid coding and noncoding loci have been used in phylogenetic studies among genera in the same family, including for example atpB, atpB-rbcL, matK, ndhF, rbcL, rpl16, rps4-trnS, rps16, trnH-psbA, trnL-F, and trnS-G (Kim and Jansen, 1995, Gao et al., 2008, Hilu et al., 2008, Wilson, 2009, Peterson et al., 2010). Some plastome regions, such as atpF-H, matK, psbK-I, rbcL, rpoB, rpoC1, trnH-psbA, etc., have been relied upon heavily for development of candidate markers for plant DNA barcoding (Kress et al., 2005, Newmaster et al., 2006, Chase et al., 2007, Hollingsworth et al., 2011, Dong et al., 2012). The fast-evolving loci we identified, such as rpl36-rps8, rpl22, rpl33-rps18, rps15-ycf1, matK and rps8 could be applied to resolve inter- or intraspecific relationships.

Repetitive sequences

Repeat regions are thought to play an important role in genome recombination and rearrangement (Smith, 2002). In this study, a total of 431 repeats were detected across all Cannabaceae plastomes, including 116 dispersed repeats and 314 tandem repeats (Table S6). Among all ten plastomes, T. orientalis had the most repeats (56) and C. sativa had the fewest (29). After excluding overlapped repeats detected by REPuter and accounting for both IR copies, 7 (G. subaequalis) to 19 (C. aristata) pairs of dispersed repeats were identified. Plastomes of C. aristata, P. rugosa, and T. orientalis had three repeat types: direct, reverse and palindromic repeats (Fig. 6). Among these, 61% were direct, 33% were palindromic and 6% were reverse. The lengths of repeats ranged from 30 to 55 bp. The total length of dispersed repeats ranged from 541 (G. subaequalis) to 1229 bp (C. aristata), and their proportion of the whole plastome ranged from 0.34% (G. subaequalis) to 0.77% (C. aristata). We detected 20 (C. sativa) to 42 (T. orientalis) tandem repeats with a size ≥ 15 bp, of which 184 were 15–20 bp in size, 112 were 21–30 bp, 13 were 31–40 bp, four were 41–50 bp, and one was 61 bp (in A. aspera). The total length of tandem repeats ranged from 950 (H. scandens) to 1727 bp (T. orientalis), and their proportion of the whole plastome ranged from 0.62% (H. scandens) to 1.59% (C. aristata). Across all repeats, most were located in intergenic spacer regions (64%), followed by coding sequences (19%), introns (11%), and tRNAs (6%).
Fig. 6

Analyses of repeated sequences in Cannabaceae plastomes. A Numbers of the three dispersed repeat types; B Numbers of tandem repeats; C Frequency of dispersed repeats by length; D Frequency of tandem repeats by length; E The locations of repeats.


Simple sequence repeat (SSR) polymorphisms

SSRs, including mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide repeats, were detected in all plastomes, although hexanucleotide repeats were absent from the plastomes of Celtis blondii, H. scandens, and P. rugosa (see Table S7 for a comprehensive list of SSRs, including their positions within the plastome). In total, 221, 186, 193, 229, 210, 172, 195, 250, 209 and 228 SSRs were found in the plastomes of Aphananthe aspera, C. sativa, Celtis blondii, C. aristata, G. subaequalis, H. scandens, L. enantiophylla, P. rugosa, P. tatarinowii and T. orientalis, respectively. The majority of mononucleotide repeat units were A/T, ranging from 8 to 23 bp in length (Fig. 7; the longest was present in T. orientalis). This finding is consistent with previous observations that cpSSRs are dominated by A/T mononucleotide repeats (Kuang et al., 2011). SSR loci were mainly located within intergenic spacers, followed by coding sequences and introns. Most SSRs were located in the LSC region, followed by the IR and SSC regions. SSRs have been used to understand evolutionary relationships among some closely related plant taxa, and are also effective genetic markers for studying plant breeding, population genetics, biological conservation, mating systems, and uniparental lineages (Terrab et al., 2006, Cardle et al., 2000, Peakall et al., 1998). The SSRs characterized in this study may prove useful for understanding phylogeography and genetic structure of populations.
Fig. 7

The distribution of the simple sequence repeats (SSRs) in Cannabaceae plastomes.


Conclusion

We reported ten complete plastomes in Cannabaceae using Illumina sequencing technology via a combination of de novo and reference-guided assembly. These plastomes were relatively conserved, but the IR regions in some plastomes experienced small expansions and contractions. Substitution rates were calculated after the genes shifted from the LSC to IR. We investigated the variation of repeat sequences, SSRs, and sequence divergence among the ten complete plastomes. Molecular markers with rapid evolution rates were identified, which may be useful for further phylogenetic analysis and species identification. Phylogenies were constructed using the entire genomes. The availability of these ten plastomes provided valuable genetic information for accurately identifying species, clarifying taxonomy and reconstructing the intergeneric phylogeny of Cannabaceae.