Weiwei Zheng1,2, Jinhui Chen3, Zhaodong Hao4, Jisen Shi5.
Abstract
Chinese fir (Cunninghamia lanceolata (Lamb.) Hook) is an important coniferous tree species for timber production, accounting for ~40% of the log supply from plantations in southern China. Chloroplast genetic engineering is a promising approach for improving several valuable tree traits. In this study, we revisited the published complete chloroplast genome sequences of Chinese fir (NC_021437) and four other coniferous species in Taxodiaceae. Comparison of their chloroplast genomes revealed three unique inversions downstream of the gene clusters, as well as evolutionary divergence, although the overall chloroplast genome structure of the Cupressaceae lineage was conserved. We also investigated the phylogenetic position of Chinese fir among conifers by examining gene functions, selection forces, substitution rates, and the full chloroplast genome sequence. Consistent with previous molecular systematics analyses, the results provided a well-supported phylogenetic framework for the Cupressaceae that strongly confirms the "basal" position of Cunninghamia lanceolata. The structure of the Cunninghamia lanceolata chloroplast genome showed a partial lack of one IR copy; rearrangements clearly occurred, and slight evolutionary divergence appeared among the cp genomes of C. lanceolata, Taiwania cryptomerioides, Taiwania flousiana, Calocedrus formosana and Cryptomeria japonica. The information on sequence divergence and length variation of genes could be further considered for bioengineering research.
Keywords: Cunninghamia lanceolata (Lamb.) Hook; chloroplast; coniferous species; phylogeny
Year: 2016 PMID: 27399686 PMCID: PMC4964460 DOI: 10.3390/ijms17071084
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. The Cunninghamia lanceolata sequence (NC_021437) was re-annotated using DOGMA [33]. The complete genome contains 121 genes. The graphical map of C. lanceolata was then generated with OGDRAW [34]. Red arrows indicate newly defined genes, including two protein-coding and three rRNA genes.
REPuter [35] was used to locate and count both forward and inverted repeats in the C. lanceolata chloroplast genome. The minimal repeat size was set to 30 bp and the identity of repeats was set to ≥90%. Fifty-one repeats were detected in the Cunninghamia lanceolata chloroplast genome. Most of them are between 10 and 29 bp in length. Repeats longer than 30 bp are listed in the table.
| Repeat Number | Size (bp) | Repeat Unit | Location |
|---|---|---|---|
| 1 | 30 | AAAAAAGAAAAAATCAACACGAGCAGTAAAA(×2) 1 | |
| 2 | 36 | TTGGACGATTTAGAATACGAAACTACATTGGACAAT(×2) | |
| 3 | 132 | AAGTATTATTTTCAATGGAAAAAAGCATTCAAAAGATACTATATTGAATTCATAAAAACATTGAATAAGTATTATTTTGAATGGAAAAAAGTATTATTTTGATTCTGTATTAAATTCATAAAAACATTGAAT(×2) | |
| 4 | 66 | AAGTATTATTTTGAATGGAAAAAAGTATTAAAAGATTCTGTATTGAATTCATAAAAACATTGAAT(×4) | |
| 5 | 94 | TTACGAGCAATAATGAAACAAAACTTGCCAAATACAATGATGACATTATATAATGATACATAGAGATATTGTGTTGCGTTGTTTACAAAACATG(×2) | IGS 3 ( |
| 6 | 104 | CAAAACTTGCCAAATACAATGATGACATTATATAATGATACATAGAGATATTGTGTTGCGTTGTTTACAAAACATGTTACGAGCAATAATGAAACAAAACTTGT(×2) | IGS ( |
| 7 | 119 | ACAAAACTTGACAAAACTTGCCAAATACAATGATGACATTCTATAATGATAAATAGAGATATTGTGTTGCGTTGTTTAAATGTTACGAGCAATAATGAAACAAAACTTGTCAAAACTG(×2) | IGS ( |
| 8 | 185 | GGAAAAACAAAAAGAACAAATTGAAAGAATAAGATGCTTAAAATTGACTAATAATATTTTTTTTAATGCAACAAAAATTATTTTAAATACCACTACCACAGGAGGGATATGATCACCACTTTTGCATTGTCTTGGCTACAAAGATGTAGCCCAATAATATTGTTTGGTTTCTATTATGGTTTTTT(×2) | IGS ( |
| 9 | 30 | GAAAAGAAAAGAGAAAAGAACAAGAAGCAT | |
| 10 | 66 | ATGAATGAGGCAAAGGATACAAAAATAGACTCCATAACTTCGTCTCAAATGGACTCTTTTTGTAGC(×2) | |
| 11 | 44 | TTATTATCTCTTCTAAAATTATTTTGAAAGATCTGATTCAATGG(×2) | |
| 12 | 44 | CTCTTCTAAAATTATTTTGAAAGATCTGATTCAATGGTTATAAC(×2) | |
| 13 | 33 | TTTGTTTCAATATTTTCAGAATCTTTGTTTTCC(×3) | |
1 Parenthetical values give the copy number of the repeat unit; for example, (×2) indicates that the repeat unit occurs twice; 2 CDS = coding sequence; 3 IGS = intergenic spacer.
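The repeat search described above can be illustrated with a much simpler stand-in. The sketch below is not REPuter and does not reproduce its algorithm: it only finds exact forward and inverted (reverse-complement) repeat seeds of a fixed length `k`, whereas the analysis above also allowed repeats with ≥90% identity. The function names `reverse_complement` and `find_exact_repeats` and the toy sequence are assumptions made for illustration.

```python
from collections import defaultdict

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_exact_repeats(seq: str, k: int = 30):
    """Find exact repeat seeds of length k (hypothetical helper).

    Returns a pair (forward, inverted):
      forward  - dict mapping each k-mer that occurs at two or more
                 positions to its sorted list of start positions
      inverted - sorted list of (i, j) with i < j, where the k-mer at j
                 is the reverse complement of the k-mer at i
    Mismatches are NOT tolerated, unlike the >=90% identity setting
    reported in the text.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)

    forward = {kmer: pos for kmer, pos in index.items() if len(pos) > 1}

    inverted = []
    for kmer, positions in index.items():
        rc = reverse_complement(kmer)
        if rc in index:
            for i in positions:
                for j in index[rc]:
                    if i < j:  # report each pair once
                        inverted.append((i, j))
    return forward, sorted(inverted)


# Toy sequence (not from the C. lanceolata genome), searched with k = 6.
toy = "AAACCCTAAACCCTGGGTTT"
forward_repeats, inverted_repeats = find_exact_repeats(toy, k=6)
```

For a real chloroplast genome, `k` would be set to the minimal repeat size of interest (30 bp in the text), and overlapping seeds would still need to be merged into maximal repeats.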
Figure 2. The gene content of five samples from Cupressaceae lineages was visualized and compared using Mauve [40] with default settings. The colored boxes above and below the middle lines represent DNA sequences in opposite orientations. Three unique inversions were found downstream of the gene clusters and evolutionary divergence was shown, although the overall chloroplast genome structure appears to be conserved in the Cupressaceae lineage based on the selected plants.
Figure 3. Comparison of the selection forces (dN/dS) of the 46 common protein-coding genes in the 19-species matrix. The matrix consisted of 19 species, including Selaginella moellendorffii and 18 gymnosperms. A, B and C represent the dN/dS range groups described in Section 3.6.
Figure 4. Comparison of the total nucleotide substitution rates (Ts + Tv) of the 46 common protein-coding genes in the 19-species matrix. The matrix consisted of 19 species, including Selaginella moellendorffii and 18 gymnosperms. a, b and c represent the Ts + Tv range groups described in Section 3.6.
Figure 5. Phylogenetic trees based on the different gene functional groups in the 19-species matrix and the 45-species matrix, respectively. I, II and III represent three main categories of functional genes: (I) photosynthetic electron transport and related processes; (II) gene expression; and (III) other genes.
Figure 6. Phylogenetic analyses based on the 65 protein-coding sequences in the 45-species matrix, performed using the maximum likelihood (ML) method implemented in MEGA5 [46], with the best-fit models [47] selected by the MEGA5 [46] built-in tool “Find DNA/Protein Models” and rapid bootstrapping with 1000 replicates.
The 45 chloroplast genomes were selected from Selaginella moellendorffii and almost all orders of gymnosperms and angiosperms in order to minimize missing data and balance taxon sampling.
| No. | Taxon | Family | Genus | Accession Number | No. | Taxon | Family | Genus | Accession Number | No. | Taxon | Family | Genus | Accession Number |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | Selaginellaceae | Selaginella | NC_013086 | 16 | | Pinaceae | Pinus | NC_001631 | 31 | | Calycanthaceae | Calycanthus | NC_004993 |
| 2 | | Cycadaceae | Cycas | NC_020319 | 17 | | Pinaceae | Pinus | NC_021439 | 32 | | Magnoliaceae | Liriodendron | NC_008326 |
| 3 | | Cycadaceae | Cycas | NC_009618 | 18 | | Pinaceae | Pinus | NC_021440 | 33 | | Magnoliaceae | Magnolia | NC_020318 |
| 4 | | Ginkgoaceae | Ginkgo | NC_016986 | 19 | | Taxaceae | Taxus | NC_020321 | 34 | | Piperaceae | Piper | NC_008457 |
| 5 | | Araucariaceae | Agathis | NC_023119 | 20 | | Cucurbitaceae | Cucumis | NC_007144 | 35 | | Acoraceae | Acorus | NC_010093 |
| 6 | | Cephalotaxaceae | Cephalotaxus | NC_016063 | 21 | | Fabaceae | Lotus | NC_002694 | 36 | | Orchidaceae | Phalaenopsis | NC_007499 |
| 7 | | Cupressaceae | Calocedrus | NC_023121 | 22 | | Fabaceae | Medicago | NC_003119 | 37 | | Gramineae | Phyllostachys | NC_016699 |
| 8 | | Cupressaceae | Cryptomeria | NC_010548 | 23 | | Salicaceae | Populus | NC_008235 | 38 | | Gramineae | Oryza | NC_001320 |
| 9 | | Cupressaceae | Cunninghamia | NC_021437 | 24 | | Salicaceae | Populus | NC_009143 | 39 | | Gramineae | Phyllostachys | NC_015817 |
| 10 | | Cupressaceae | Taiwania | NC_021441 | 25 | | Malvaceae | Gossypium | NC_007944 | 40 | | Gramineae | Saccharum | NC_006084 |
| 11 | | Cupressaceae | Taiwania | NC_016065 | 26 | | Myrtaceae | Eucalyptus | NC_008115 | 41 | | Gramineae | Triticeae | NC_002762 |
| 12 | | Pinaceae | Cathaya | NC_014589 | 27 | | Ranunculaceae | Ranunculus | NC_008796 | 42 | | Gramineae | Zea | NC_001666 |
| 13 | | Pinaceae | Cedrus | NC_014575 | 28 | | Solanaceae | Nicotiana | NC_001879 | 43 | | Typhaceae | Typha | NC_013823 |
| 14 | | Pinaceae | Keteleeria | NC_011930 | 29 | | Vitaceae | Vitis | NC_007957 | 44 | | Amborellaceae | Amborella | NC_005086 |
| 15 | | Pinaceae | Picea | NC_021456 | 30 | | Winteraceae | Drimys | NC_008456 | 45 | | Nymphaeaceae | Nymphaea | NC_006050 |
The 65 protein-coding genes in 45 representative species were extracted from NCBI for construction of the phylogenetic trees [24]. Nucleotide sequences were translated into amino acids using Geneious [59]. Amino acid sequences were aligned using MUSCLE [60]. Aligned genes were concatenated by functional category [24,66].
| Category | Gene Group | Genes |
|---|---|---|
| Photosynthetic Electron Transport and Related Processes (I) | Subunits of Photosystem I | |
| | Subunits of Photosystem II | |
| | Subunits of Cytochrome | |
| | Subunits of ATP synthase | |
| | Large subunit of Rubisco | |
| Gene Expression (II) | DNA-dependent RNA polymerase | |
| | Small/Large subunits of Ribosome | |
| Other (III) | | |
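The concatenation step described above (aligned genes joined per functional category) could look roughly like the following sketch. It is an illustration under stated assumptions, not the authors' pipeline: the helper name `concatenate_by_category`, the toy alignments, and the gene-to-category mapping are all hypothetical, and species missing from a gene alignment are padded with gap characters, a choice the source does not specify.

```python
def concatenate_by_category(alignments, gene_category):
    """Concatenate per-gene alignments into one alignment per category.

    alignments    - dict: gene name -> {species: aligned sequence}
    gene_category - dict: gene name -> category label (e.g. "I", "II", "III")
    Returns dict: category -> {species: concatenated aligned sequence}.
    Species absent from a gene are padded with '-' (an assumption).
    """
    species = sorted({sp for aln in alignments.values() for sp in aln})
    result = {}
    for gene in sorted(alignments):  # fixed gene order for reproducibility
        aln = alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences for {gene} are not aligned")
        width = lengths.pop()
        category = gene_category[gene]
        block = result.setdefault(category, {sp: "" for sp in species})
        for sp in species:
            block[sp] += aln.get(sp, "-" * width)
    return result


# Toy amino acid alignments; sequences and category labels are illustrative.
toy_alignments = {
    "psaA": {"sp1": "MA-", "sp2": "MAL"},
    "rbcL": {"sp1": "MS"},
    "rpoB": {"sp1": "MK", "sp2": "M-"},
}
toy_categories = {"psaA": "I", "rbcL": "I", "rpoB": "II"}
supermatrix = concatenate_by_category(toy_alignments, toy_categories)
```

Each resulting per-category alignment could then be passed to a tree-building program separately, which is the idea behind comparing trees across gene groups.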
The genes were sorted into categories by gene function and by average dN/dS and Ts + Tv values among lineages. Phylogenetic analyses were performed on these gene groups to determine whether gene function, selection force and nucleotide substitution rate affect phylogenetic estimation [41].
| Category | Category ID | Fields |
|---|---|---|
| gene function | I | Photosynthetic Electron Transport and Related Processes |
| | II | Gene Expression |
| | III | Other |
| selection force (dN/dS) | A | dN/dS ≤ 0.25 |
| | B | 0.25 < dN/dS ≤ 0.5 |
| | C | 0.5 < dN/dS |
| substitution rate (Ts + Tv) | a | Ts + Tv ≤ 0.25 |
| | b | 0.25 < Ts + Tv ≤ 0.5 |
| | c | 0.5 < Ts + Tv |
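The range groups in the table above amount to a simple threshold rule. As a minimal sketch, the function below assigns a gene to groups A/B/C by dN/dS and a/b/c by Ts + Tv using the cut-offs 0.25 and 0.5 from the table. The helper names `bin_value` and `group_genes` and the numeric values in the example are hypothetical and are not results from the study.

```python
def bin_value(value: float, labels: str) -> str:
    """Map a value to one of three labels using the table's cut-offs.

    value <= 0.25        -> labels[0]
    0.25 < value <= 0.5  -> labels[1]
    value > 0.5          -> labels[2]
    """
    if value <= 0.25:
        return labels[0]
    if value <= 0.5:
        return labels[1]
    return labels[2]


def group_genes(stats):
    """Assign each gene a (selection group, substitution-rate group) pair.

    stats - dict: gene name -> (average dN/dS, average Ts + Tv)
    """
    return {
        gene: (bin_value(dnds, "ABC"), bin_value(rate, "abc"))
        for gene, (dnds, rate) in stats.items()
    }


# Hypothetical per-gene averages, for illustration only.
example_stats = {
    "geneX": (0.10, 0.30),
    "geneY": (0.50, 0.25),
    "geneZ": (0.75, 0.90),
}
example_groups = group_genes(example_stats)
```

Note that both boundaries are inclusive on the lower group, matching the "≤" signs in the table.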