Complete chloroplast genome of a valuable medicinal plant, Huperzia serrata (Lycopodiaceae), and comparison with its congener.

Zhi-You Guo, Hong-Rui Zhang, Nawal Shrestha, Xian-Chun Zhang.

Abstract

PREMISE OF THE STUDY: Here we report the complete chloroplast genome of the important medicinal species Huperzia serrata (Lycopodiaceae) and compare it to the chloroplast genome of the congeneric species H. lucidula.
METHODS AND RESULTS: The whole chloroplast genome of H. serrata was sequenced using an Illumina platform and assembled with Geneious version R9.0.5. The genome size of H. serrata was 154,176 bp, with 36.3% GC content. The complete chloroplast genome contained 120 unique genes, including 86 coding genes, four rRNA genes, and 30 tRNA genes. Comparison with the chloroplast genome of H. lucidula revealed three highly variable regions (rps16-chlB, ycf12-trnR, and ycf1) between these two species and 252 mutation events including 27 insertion/deletion polymorphisms and 225 single-nucleotide polymorphisms (SNPs). Ninety-two SNPs were identified in the gene-coding regions. In addition, 18 microsatellite sites were found, which can potentially be used in phylogeographic studies.
CONCLUSIONS: The complete chloroplast genome of H. serrata is reported here, and will be a valuable genome resource for further phylogenetic, evolutionary, and medical studies of medicinal plants in the genus Huperzia.

Keywords:  Huperzia serrata; Lycopodiaceae; lycophytes; mutation; next-generation sequencing

Year: 2016; PMID: 27843724; PMCID: PMC5104525; DOI: 10.3732/apps.1600071

Source DB: PubMed; Journal: Appl Plant Sci; ISSN: 2168-0450; Impact factor: 1.936


The structure of chloroplast genomes in land plants is generally highly conserved in terms of gene order, organization, and content, which makes them suitable for characterizing genetic relationships among species (Bock, 2007). Portions of these genomes have also been widely used by plant taxonomists as effective DNA barcoding tools. Most chloroplast genomes of land plants have a pair of inverted repeats (IRs) separated by one large single copy region (LSC) and one small single copy region (SSC) (Jansen et al., 2005). However, variations occur in certain lineages, and these variations have proven useful in identifying critical events during the evolution of land plants (Dong et al., 2014; Song et al., 2015). One typical example is the 30-kb inversion (from trnC to ycf2) in the LSC region that distinguishes bryophytes and lycophytes from other land plants, supporting the hypothesis that lycophytes are sister to all other extant vascular plants (Raubeson and Jansen, 1992).
Compared with those on seed plants, studies on chloroplast genomes of ferns and lycophytes have been relatively sparse (Lu et al., 2015). The North American firmoss Huperzia lucidula (Michx.) Trevis. (Lycopodiaceae) was the first lycophyte species with a complete chloroplast genome sequence (Wolf et al., 2005; GenBank accession no. NC_006861). Because H. lucidula belongs to a clade that is sister to all other extant vascular plants, sequencing its complete chloroplast genome facilitated the exploration of the relationships between lycophytes and other vascular plants. Both the rearrangement structure of the chloroplast genome and the phylogenomic analyses of 73 protein-coding genes supported the hypothesis that lycophytes are sister to both extant fern and seed plant lineages (Wolf et al., 2005). However, the phylogenetic relationships within this family and particularly within the genus Huperzia Bernh. (ca. 55 species) are still unclear because of insufficient phylogenetic data (Zhang and Iwatsuki, 2013).
Here we describe the complete chloroplast genome sequence of a valuable species (H. serrata (Thunb.) Trevis.) within this genus and compare it to existing chloroplast genome data of H. lucidula to better understand the mutation patterns in chloroplast genomes of Huperzia. Both H. lucidula and H. serrata belong to Huperzia sect. Serratae (Rothm.) Holub and form a clade based on matK sequences, showing a close phylogenetic relationship (Zhang, 2004; Ji et al., 2007). Furthermore, H. serrata is an important medicinal plant containing huperzine A, which several studies have found to be effective in the treatment of Alzheimer’s disease (Tang, 1996; Wang et al., 1998; Guo et al., 2005). Thus, this genome sequence may not only facilitate investigations into genetic variation but also elucidate relationships within the genus to guide further exploration of compounds in closely related species.

METHODS AND RESULTS

Specimens of H. serrata were collected from Helong, Jilin Province, northeastern China. A voucher specimen (X. C. Zhang 6972) has been deposited in the Herbarium of the Institute of Botany, Chinese Academy of Sciences (PE). Total DNA was extracted with a modified cetyltrimethylammonium bromide (CTAB) method (Li et al., 2013). The DNA was sheared into ∼350-bp fragments using the Covaris M220 focused-ultrasonicator (Covaris, Woburn, Massachusetts, USA). The NEBNext DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, Massachusetts, USA) was used for library construction. Paired-end reads of 2 × 150 bp were then generated using an Illumina HiSeq PE150 (Illumina, San Diego, California, USA). A total of 9,391,796 paired-end sequence reads of 150 bp were generated, of which 406,164 reads belonged to the chloroplast genome. The chloroplast reads were extracted using H. lucidula as a reference and assembled de novo with Geneious version R9.0.5 (Kearse et al., 2012). The first de novo assembly generated eight contigs, which were then extended by repeatedly mapping raw reads to them until all contigs merged into one sequence of 138,841 bp. The four IR boundaries were located by a BLAST search of this sequence against itself, and the complete chloroplast genome sequence was then assembled. All genes encoding proteins, tRNAs, and rRNAs were annotated with Dual Organellar GenoMe Annotator (DOGMA; Wyman et al., 2004), and the annotation was uploaded to GenBank. The tRNAs were further verified using tRNAscan-SE version 1.21 (Lowe and Eddy, 1997; Schattner et al., 2005). The genome map was drawn with OGDraw version 1.2 (Lohse et al., 2007). The chloroplast genome sequence of H. lucidula was downloaded from GenBank and aligned with that of H. serrata using MAFFT version 7 (Katoh and Standley, 2013).
A sliding window analysis was conducted with DnaSP version 5.1 (Librado and Rozas, 2009) to evaluate the nucleotide diversity (π) across the whole genomes within the genus Huperzia. The window length was set to 600 bp with a 200-bp step size based on the proposed length of DNA barcoding regions (Song et al., 2015). DnaSP was also used to identify insertion/deletion polymorphisms (indels), with the chloroplast genome of H. lucidula as a reference. A custom Python script (https://www.biostars.org/p/119214/), based on the definition of a single-nucleotide polymorphism (SNP; a variation in a single nucleotide that occurs at a specific position in the genome), was employed to call SNPs. The SNPs in coding regions were classified in two ways: as synonymous or nonsynonymous, and as transitions or transversions. The simple sequence repeats (SSRs) in H. serrata were detected using NWISRL-Imperfect SSR Finder version 1.0 (Stieneke and Eujayl, 2007). The repeat unit length was set to two to nine base pairs, with at least five copies for dinucleotide repeats and four copies for other multinucleotide repeats.
The complete genome sequence of H. serrata (GenBank accession no. KX426071) was 154,176 bp long, 197 bp shorter than that of H. lucidula (154,373 bp; GenBank accession no. NC_006861). Both genomes had a GC content of 36.3% (Table 1). In H. serrata, a pair of IRs totaling 30,438 bp was separated by an LSC of 104,080 bp and an SSC of 19,658 bp; these three values sum to the genome size of 154,176 bp. The complete chloroplast genome contained 120 putative unique genes, including 86 coding genes, four rRNA genes, and 30 tRNA genes. The gene map of H. serrata is shown in Fig. 1. Based on our preliminary analysis, 15 genes have one predicted intron (10 coding genes and five tRNA genes) and two coding genes have two introns (clpP and ycf3). The gene order and features of the H. serrata genome are almost identical to those of H. lucidula. Because comparisons between H. lucidula and other land plants have already been conducted in previous studies (Wolf et al., 2005), we did not repeat that work. However, some unusual features also existed: an extra tRNA trnI-GAU between rrn16 and trnA-UGC in the IR region, and an intron within ycf66 in the LSC region were first annotated in H. serrata. These three genes were also annotated in the chloroplast genome sequence of another lycophyte plant, Isoetes flaccida A. Braun (GenBank accession no. NC_014675) (Karol et al., 2010). Similar to H. lucidula, nine predicted protein-coding genes (ndhJ, atpI, chlL, ndhH, ccsA, rpl36, ycf1, rps15, ndhD) lack their canonical start codons and/or stop codons at the expected positions. A triplet ACG, which is changed into a start codon by C-to-U RNA editing, and another triplet CAA, which is changed into a stop codon by C-to-U RNA editing, appear in the position of the expected start codon and stop codon, respectively (Tsuji et al., 2007). Furthermore, rps16 has two internal stop codons in the chloroplast genome and is therefore considered to be a pseudogene, but chloroplast transcriptome evidence is needed to confirm this hypothesis (Oldenkott et al., 2014).
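The codon changes described above can be illustrated with a minimal Python sketch. The helper name `edit_c_to_u` and the choice of which cytidine is edited are assumptions made for illustration; the sketch only shows that a single C-to-U change turns ACG into the start codon AUG and CAA into the stop codon UAA, and it says nothing about which sites are actually edited in Huperzia.

```python
# Illustrative only: apply one C-to-U edit at a chosen codon position.
STOP_CODONS = {"UAA", "UAG", "UGA"}
START_CODON = "AUG"

def edit_c_to_u(codon: str, position: int) -> str:
    """Return the RNA codon after editing the C at `position` (0-based) to U."""
    rna = codon.upper().replace("T", "U")
    if rna[position] != "C":
        raise ValueError(f"no C at position {position} of {rna}")
    return rna[:position] + "U" + rna[position + 1:]

edited_start = edit_c_to_u("ACG", 1)  # second base edited
edited_stop = edit_c_to_u("CAA", 0)   # first base edited
print(edited_start, edited_start == START_CODON)
print(edited_stop, edited_stop in STOP_CODONS)
```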
Table 1.

Summary of Huperzia serrata and H. lucidula chloroplast features.

Feature                  H. lucidula   H. serrata
Total cpDNA size (bp)    154,373       154,176
  LSC (bp)               104,088       104,080
  SSC (bp)               19,657        19,658
  IR (bp)                30,628        30,438
Total GC content (%)     36.3          36.3
  LSC (%)                34.4          34.4
  SSC (%)                32.8          32.8
  IR (%)                 44.9          45.0
Total no. of genes       119           120
  Protein encoding       86            86
  tRNA                   29            30
  rRNA                   4             4

Note: IR = inverted repeat; LSC = large single copy; SSC = small single copy.
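As a quick consistency check on Table 1, the following Python sketch sums the region lengths for each species. The values are copied from the table; the dictionary layout and the function name `region_sum` are assumptions made for illustration. In both species LSC + SSC + IR equals the total genome size, which suggests that the IR row reports the combined length of the two repeat copies.

```python
# Region lengths (bp) copied from Table 1.
table1 = {
    "H. lucidula": {"total": 154_373, "LSC": 104_088, "SSC": 19_657, "IR": 30_628},
    "H. serrata": {"total": 154_176, "LSC": 104_080, "SSC": 19_658, "IR": 30_438},
}

def region_sum(row):
    """Sum of the LSC, SSC, and IR entries of one table row."""
    return row["LSC"] + row["SSC"] + row["IR"]

for species, row in table1.items():
    print(species, region_sum(row), region_sum(row) == row["total"])

# Difference in total genome size between the two species (bp).
size_difference = table1["H. lucidula"]["total"] - table1["H. serrata"]["total"]
print(size_difference)
```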

Fig. 1.

Gene map of the Huperzia serrata chloroplast reference genome. Genes outside of the outer circle are transcribed clockwise, whereas genes inside the outer circle are transcribed counterclockwise. The colored bars indicate different functional groups. The dashed darker gray area in the inner circle denotes GC content while the lighter gray area shows the AT content of the genome. IR = inverted repeat; LSC = large single copy; SSC = small single copy.

The nucleotide variability (π) of the aligned genome sequences of H. lucidula and H. serrata was calculated with DnaSP version 5.1 to explore the level of sequence divergence. The value varied from 0 to 0.1 with an average of 0.00143, showing that divergence between the genomes of these closely related species is small. However, three highly variable regions (rps16-chlB, ycf12-trnR, and ycf1) were located (Fig. 2). Only ycf1 is in the SSC region; the other two loci are in the LSC region. None of these highly variable regions have been employed in previous phylogenetic analyses of ferns and lycophytes (Kuo et al., 2011; Li et al., 2011). Based on these results, we infer that ycf1, ycf12-trnR, and rps16-chlB (π > 0.008) could be suitable for phylogenetic analyses at the species level.
Fig. 2.

Sliding window analysis of the whole chloroplast genomes of Huperzia serrata and H. lucidula. Window length = 600 bp; step size = 200 bp; x-axis = position of the midpoint of a window; y-axis = value of π of each window.
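The sliding window calculation can be sketched in Python as follows. This is not the DnaSP implementation; it is a minimal illustration that assumes two aligned sequences of equal length, in which case π for a window reduces to the proportion of compared sites that differ. Skipping gapped columns and reporting the midpoint as a 0-based offset are assumptions, as is the function name `sliding_window_pi`.

```python
def sliding_window_pi(seq_a, seq_b, window=600, step=200):
    """Return a list of (midpoint, pi) tuples for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        compared = 0
        differing = 0
        for a, b in zip(seq_a[start:start + window], seq_b[start:start + window]):
            if a == "-" or b == "-":
                continue  # assumption: gapped columns are not compared
            compared += 1
            if a != b:
                differing += 1
        pi = differing / compared if compared else 0.0
        results.append((start + window // 2, pi))
    return results

# Toy alignment: 1000 columns with six substitutions at columns 300-305.
toy_a = "A" * 1000
toy_b = "A" * 300 + "C" * 6 + "A" * 694
for midpoint, pi in sliding_window_pi(toy_a, toy_b):
    print(midpoint, round(pi, 5))
```

Under these assumptions, and with no gapped columns, a 600-bp window exceeds π = 0.008 once it contains at least five differing sites (5/600 ≈ 0.0083).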

Eighteen potential SSR motifs were found, and most were located in the intergenic regions of the LSC region (Table 2). Three repeat types were identified: 10 dinucleotide, seven trinucleotide, and one nonanucleotide SSR. ACT/TCT and AAT/TAA/TAG/TAT motifs were found among trinucleotide SSRs, while only AT/TA motifs were identified among the dinucleotide SSRs.
Twenty-seven indels were revealed in the comparison between the chloroplast genome sequences of H. serrata and H. lucidula. Most indels ranged from one to nine base pairs in size and were located in noncoding regions, while three indels occurred within the coding region of the rpoC2 gene, with lengths of 24 bp, 30 bp, and 126 bp, respectively (Table 3). The three indels are all deletions in rpoC2 of H. serrata. Ninety-two SNPs (75 transitions and 17 transversions) were detected in gene-coding regions, and 133 SNPs (88 transitions and 45 transversions) were detected in noncoding regions (Table 4). Within gene-coding regions, 36 substitutions were synonymous and 56 were nonsynonymous. Thirty-seven of the 86 coding genes contained SNPs, and 22 of these had at least one nonsynonymous substitution site (Table 4). Among these genes, rpoC2, ycf1, and ycf2 have the most nonsynonymous substitution sites, suggesting that these three genes may have relatively fast rates of evolution and can be used in phylogenetic analyses.
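The distinction between synonymous and nonsynonymous substitutions can be illustrated with a short Python sketch. It is not the script used in the study: the function name `classify_substitution` is hypothetical, the codon table is the standard genetic code written out by hand, and the sketch ignores RNA editing, which the text above notes can alter codons in these genomes.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_substitution(ref_codon, alt_codon):
    """Classify a single-base codon change as synonymous or nonsynonymous."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    differences = sum(r != a for r, a in zip(ref_codon, alt_codon))
    if len(ref_codon) != 3 or len(alt_codon) != 3 or differences != 1:
        raise ValueError("expected two codons differing at exactly one position")
    same_amino_acid = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same_amino_acid else "nonsynonymous"

print(classify_substitution("CTT", "CTC"))  # Leu -> Leu
print(classify_substitution("ATG", "ATA"))  # Met -> Ile
```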
Table 2.

Location of simple sequence repeats in Huperzia serrata.

No.  Start    End      Location    Region      Motif      No. of repeats
1    11,242   11,255   petB        Intron      TA         7
2    40,488   40,497   trnF-trnL   Intergenic  TA         5
3    43,483   43,492   rps4-trnS   Intergenic  AT         5
4    56,437   56,454   psbD-trnE   Intergenic  AT         9
5    70,509   70,522   trnK-rps16  Intergenic  AT         7
6    74,093   74,104   trnQ-psbK   Intergenic  TA         6
7    84,161   84,178   atpI-rps2   Intergenic  TA         9
8    85,147   85,161   rps2-rpoC2  Intergenic  TAG        5
9    87,843   87,878   rpoC2       CDS         TGCTTCATC  4
10   93,017   93,031   rpoC1       CDS         TCT        5
11   97,444   97,457   trnC-petN   Intergenic  TA         7
12   99,074   99,097   psbM-trnL   Intergenic  TAT        8
13   99,938   99,947   trnL-ndhB   Intergenic  TA         5
14   100,289  100,298  trnL-ndhB   Intergenic  AT         5
15   121,906  121,917  chlN-ycf1   Intergenic  TAA        4
16   127,339  127,350  ycf1-rps15  Intergenic  ACT        4
17   130,250  130,261  ndhA        Intron      AAT        4
18   138,671  138,685  rpl21-ndhF  Intergenic  TAA        5

Note: CDS = coding DNA sequence.
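The copy-number thresholds used for SSR detection (unit length of two to nine base pairs, at least five copies for dinucleotides and four for longer units) can be sketched with a regular expression in Python. This is an illustration only: the study used NWISRL-Imperfect SSR Finder, whereas the hypothetical `find_perfect_ssrs` below reports perfect repeats only, and flanking bases can shift the reported start by a few positions.

```python
import re

def is_primitive(unit):
    """True if `unit` is not itself a repeat of a shorter motif (e.g. 'ATAT')."""
    n = len(unit)
    return not any(n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n))

def find_perfect_ssrs(seq, min_unit=2, max_unit=9, min_di=5, min_other=4):
    """Return sorted (start, end, motif, copies) tuples, 1-based and inclusive."""
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        min_copies = min_di if k == 2 else min_other
        # One unit of length k followed by at least (min_copies - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_primitive(unit):
                copies = len(match.group(0)) // k
                hits.append((match.start() + 1, match.end(), unit, copies))
    return sorted(hits)

# Toy sequence containing a (TA)7 repeat and a (TGCTTCATC)4 repeat.
toy = "GG" + "TA" * 7 + "GG" + "TGCTTCATC" * 4 + "A"
for hit in find_perfect_ssrs(toy):
    print(hit)
```

Coordinates follow the convention of Table 2 (start and end are both included), so a seven-copy dinucleotide repeat spans 14 positions.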

Table 3.

Location of indels in the genomes of Huperzia serrata and H. lucidula.

No.  Position         Location     Region      Size (bp)  Direction (a)  Motif
1    5862             rpl14-rps8   Intergenic  1          Insertion      A
2    7064             rpl36-rps11  Intergenic  1          Insertion      G
3    9953             petD         Intron      1          Insertion      A
4    20,555–20,556    rpl20-rps18  Intergenic  2          Insertion      GG
5    30,794           psaI-accD    Intergenic  1          Deletion       T
6    42,082–42,083    trnL-UAA     Exon        2          Insertion      CC
7    42,758–42,759    trnT-rps4    Intergenic  2          Deletion       CC
8    44,361–44,363    trnS-ycf3    Intergenic  3          Deletion       GGG
9    44,986–44,989    ycf3         Intron      3          Insertion      TTC
10   51,669–51,673    psaB-rps14   Intergenic  5          Deletion       GGGGG
11   51,749           psaB-rps14   Intergenic  1          Deletion       G
12   58,365           psbD-trnE    Intergenic  1          Insertion      T
13   73,875           chlB-trnQ    Intergenic  1          Insertion      T
14   75,019           psbK-psbI    Intergenic  1          Insertion      A
15   76,144           psaM-ycf12   Intergenic  1          Deletion       A
16   77,128           ycf12-trnR   Intergenic  1          Insertion      A
17   77,497–77,505    ycf12-trnR   Intergenic  9          Deletion       CGTAGTATT
18   78,117           ycf12-trnR   Intergenic  1          Deletion       G
19   81,577           atpF         Intron      1          Deletion       A
20   85,338–85,361    rpoC2        CDS         24         Deletion       TCGGTTGCTTCACCAACAGTTTCC
21   85,557–85,586    rpoC2        CDS         30         Deletion       TTATCACTAGTTTCTTCATCACTAGTTTCT
22   88,781–88,906    rpoC2        CDS         126        Deletion       TTCAAATTCTGTCTGATCTTCTTCTAAAGAAGAATCAAATGATTCAAATTCTGTCTGATCTTCTTCTAAAGAAGAATCAAATGATTCAAATTCTGTCTGATCTTCTTCTAAAGAAGAATCAAATGA
23   92,802–92,803    rpoC1        Intron      2          Insertion      TT
24   99,278–99,283    psbM         CDS         6          Deletion       TATTAT
25   100,234–100,235  trnL-ndhB    Intergenic  2          Deletion       CC
26   104,210          rps7-rps12   Intergenic  1          Deletion       T
27   122,056          chlN-ycf1    Intergenic  1          Insertion      A

Note: CDS = coding DNA sequence.

(a) The plastome of Huperzia lucidula was used as a reference.
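The way indels are read off a pairwise alignment against a reference can be sketched in Python. This is not the DnaSP procedure used in the study; the function name `find_indels` and the coordinate convention are assumptions. A gap run in the reference row is reported as an insertion, positioned at the last reference base before it, and a gap run in the query row is reported as a deletion, positioned at the first deleted reference base (1-based in both cases).

```python
def find_indels(ref_aln, qry_aln):
    """Return (ref_position, motif, size, direction) for each gap run."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    events = []
    ref_pos = 0  # number of reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r == "-" and q != "-":
            # Gap in the reference: extra bases in the query.
            j = i
            while j < n and ref_aln[j] == "-" and qry_aln[j] != "-":
                j += 1
            events.append((ref_pos, qry_aln[i:j], j - i, "Insertion"))
            i = j
        elif q == "-" and r != "-":
            # Gap in the query: reference bases missing from the query.
            j = i
            while j < n and qry_aln[j] == "-" and ref_aln[j] != "-":
                j += 1
            events.append((ref_pos + 1, ref_aln[i:j], j - i, "Deletion"))
            ref_pos += j - i
            i = j
        else:
            if r != "-":
                ref_pos += 1
            i += 1
    return events

# Toy alignment: a 2-bp insertion and a 3-bp deletion relative to the reference.
toy_ref = "ACGT--ACGTTTACG"
toy_qry = "ACGTGGACG---ACG"
for event in find_indels(toy_ref, toy_qry):
    print(event)
```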

Table 4.

Comparisons of mutations, number of transitions (Ts) and transversions (Tv), and number of synonymous (S) and nonsynonymous (N) substitutions per gene of Huperzia serrata and H. lucidula.

Gene type                 Gene    Ts  Tv  S   N
Photosynthetic apparatus  petB    1   0   0   1
                          petD    1   0   0   1
                          petN    1   0   1   0
                          psaA    0   1   1   0
                          psaB    3   0   1   2
                          psbB    3   0   1   2
                          psbD    1   0   1   0
Photosynthetic metabolism atpA    1   0   0   1
                          atpB    3   0   1   2
                          atpE    1   0   1   0
                          atpH    2   0   2   0
                          atpI    2   0   0   2
                          ndhA    0   1   0   1
                          ndhB    1   0   0   1
                          ndhC    1   0   1   0
                          ndhF    2   2   0   4
                          ndhG    1   0   0   1
                          ndhH    1   0   1   0
                          ndhK    1   0   1   0
                          rbcL    2   0   2   0
Gene expression           rpl21   1   0   1   0
                          rpoB    5   0   2   3
                          rpoC1   2   0   2   0
                          rpoC2   6   4   4   6
                          rps11   1   0   1   0
                          rps12   1   0   1   0
                          rps7    2   0   0   2
                          rps8    1   0   1   0
                          accD    2   0   0   2
                          clpP    1   0   0   1
                          matK    1   1   0   2
Other genes               chlB    2   0   2   0
                          chlL    2   0   0   2
                          chlN    2   0   0   2
                          ycf1    13  3   6   10
                          ycf10   1   0   0   1
                          ycf2    4   5   2   7
Total                             75  17  36  56
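The transition/transversion split in Table 4 follows the usual definition: a transition exchanges two purines (A, G) or two pyrimidines (C, T), and any other substitution is a transversion. A minimal Python sketch, with hypothetical function names and not the script used in the study, is:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(ref_base, alt_base):
    """Classify a single-base substitution as 'transition' or 'transversion'."""
    pair = {ref_base.upper(), alt_base.upper()}
    if len(pair) != 2 or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"

def count_ts_tv(ref_aln, qry_aln):
    """Count transitions and transversions between two aligned sequences."""
    counts = {"transition": 0, "transversion": 0}
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        # Gaps and ambiguous characters are skipped (an assumption of this sketch).
        if r in "ACGT" and q in "ACGT" and r != q:
            counts[substitution_type(r, q)] += 1
    return counts

toy_counts = count_ts_tv("AACCGGTT", "GACTGCTA")
print(toy_counts)
```

For the totals reported in the text, the transition-to-transversion ratio is 75/17 ≈ 4.4 in gene-coding regions and 88/45 ≈ 2.0 in noncoding regions.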

CONCLUSIONS

Here we report the complete chloroplast genome sequence of H. serrata, an important and widely distributed medicinal plant. The availability of this chloroplast genome sequence, together with the existing H. lucidula chloroplast genome sequence, enables us to evaluate genome-wide mutational events within the genus Huperzia. The genome arrangement, gene order, gene size, and GC content of H. serrata and H. lucidula are almost identical. Three divergence hotspots (rps16-chlB, ycf12-trnR, and ycf1), 18 SSRs, 27 indels, and 225 SNPs across the whole genome were identified and could provide useful phylogenetic and phylogeographic information for closely related species. Moreover, conserved primers could be designed for the highly variable regions in Huperzia based on these two complete chloroplast genomes.

REFERENCES (10 of 20 shown)

1. Stacia K Wyman, Robert K Jansen, Jeffrey L Boore. Automatic annotation of organellar genomes with DOGMA. Bioinformatics, 2004-06-04.
2. Paul G Wolf, Kenneth G Karol, Dina F Mandoli, Jennifer Kuehl, K Arumuganathan, Mark W Ellis, Brent D Mishler, Dean G Kelch, Richard G Olmstead, Jeffrey L Boore. The first complete chloroplast genome sequence of a lycophyte, Huperzia lucidula (Lycopodiaceae). Gene, 2005-03-19.
3. Shui-Liang Guo, Pei-Ling Li, Fang Fang, Hua Huang, Cun-Gui Cheng. [FTIR spectra-principal component analysis of phenetic relationships of Huperzia serrata and its closely related species]. Guang Pu Xue Yu Guang Pu Fen Xi, 2005-05.
4. Li-Yaung Kuo, Fay-Wei Li, Wen-Liang Chiou, Chun-Neng Wang. First insights into fern matK phylogeny. Mol Phylogenet Evol, 2011-03-21.
5. Kazutaka Katoh, Daron M Standley. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol, 2013-01-16.
6. Sheng-Guo Ji, Sheng-Li Pan, Jun Wang, Ke-Ke Huo. [Phylogeny relationship and molecular identification of ten Huperzia species (Huperziaceae) based on matK gene sequences]. Zhongguo Zhong Yao Za Zhi, 2007-10.
7. L A Raubeson, R K Jansen. Chloroplast DNA evidence on the ancient evolutionary split in vascular land plants. Science, 1992-03-27.
8. Matthew Kearse, Richard Moir, Amy Wilson, Steven Stones-Havas, Matthew Cheung, Shane Sturrock, Simon Buxton, Alex Cooper, Sidney Markowitz, Chris Duran, Tobias Thierer, Bruce Ashton, Peter Meintjes, Alexei Drummond. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics, 2012-04-27.
9. Bastian Oldenkott, Kazuo Yamaguchi, Sumika Tsuji-Tsukinoki, Nils Knie, Volker Knoop. Chloroplast RNA editing going extreme: more than 3400 events of C-to-U editing in the chloroplast transcriptome of the lycophyte Selaginella uncinata. RNA, 2014-08-20.
10. Wenpan Dong, Han Liu, Chao Xu, Yunjuan Zuo, Zhongjian Chen, Shiliang Zhou. A chloroplast genomic strategy for designing taxon specific DNA mini-barcodes: a case study on ginsengs. BMC Genet, 2014-12-20.