Literature DB >> 35136556

Loss of the IR region in conifer plastomes: Changes in the selection pressure and substitution rate of protein-coding genes.

Jingyao Ping1, Jing Hao1, Jinye Li1, Yiqing Yang2, Yingjuan Su3,4, Ting Wang1.   

Abstract

Plastid genomes (plastomes) have a quadripartite structure, but some species have drastically reduced or lost inverted repeat (IR) regions. IR regions are important for genome stability and the evolution rate. In the evolutionary process of gymnosperms, the typical IRs of conifers were lost, possibly affecting the evolutionary rate and selection pressure of genomic protein-coding genes. In this study, we selected 78 gymnosperm species (51 genera, 13 families) for evolutionary analysis. The selection pressure analysis results showed that negative selection effects were detected in all 50 common genes. Among them, six genes in conifers had higher ω values than non-conifers, and 12 genes had lower ω values. The evolutionary rate analysis results showed that 9 of 50 common genes differed between conifers and non-conifers. It is more obvious that in non-conifers, the rates of psbA (trst, trsv, ratio, dN, dS, and ω) were 2.6- to 3.1-fold of conifers. In conifers, trsv, ratio, dN, dS, and ω of ycf2 were 1.2- to 3.6-fold of non-conifers. In addition, the evolution rate of ycf2 in the IR was significantly reduced. psbA is undergoing dynamic change, with an abnormally high evolution rate as a small portion of it enters the IR region. Although conifers have lost the typical IR regions, we detected no change in the substitution rate or selection pressure of most protein-coding genes due to gene function, plant habitat, or newly acquired IRs.
© 2022 The Authors. Ecology and Evolution published by John Wiley & Sons Ltd.

Entities:  

Keywords:  conifer; gymnosperm; inverted repeat region; rate heterogeneity; selection pressure

Year:  2022        PMID: 35136556      PMCID: PMC8809450          DOI: 10.1002/ece3.8499

Source DB:  PubMed          Journal:  Ecol Evol        ISSN: 2045-7758            Impact factor:   2.912


INTRODUCTION

The plastid genomes (plastomes) of most land plants have a highly conserved quadripartite structure, consisting of a large single‐copy (LSC) region and a small single‐copy (SSC) region separated by a pair of inverted repeat (IR) regions (Wicke et al., 2011). Due to selective pressure on photosynthesis‐related elements, plastids have highly conserved gene content and order (Ruhlman & Jansen, 2014). And due to their highly conservation, a large copy number, lack of recombination, and uniparental inheritance, plastomes have been used to evaluate phylogeographic relationships, phylogeographic histories, and evolutionary events (Barrett et al., 2016; Julian et al., 2017; Moore et al., 2006; Pacheco et al., 2019; Shaw et al., 2014; Wang et al., 2020). Inverted repeat regions are important in replication initiation (Heinhorst & Cannon, 1993), genomic stability (Maréchal & Brisson, 2010; Palmer & Thompson, 1982), and gene conservation (Palmer & Thompson, 1982; Wolfe et al., 1987). IR changes alter the size of the plastid genome (Kwon et al., 2020). From green algae to angiosperms, IR regions typically contain at least four rRNAs and five tRNAs (Mower & Vickrey, 2018). However, the IR region, as a hot spot of structural rearrangement, often contracts, expands, or is lost. In related species, the boundary of the IR region changes little, resulting in the gain or loss of a small number of genes (Downie & Jansen, 2015; Li et al., 2016; Ping, Li, et al., 2021; Wicke et al., 2014). The IR region of Pelargonium, Psilotum, and Trochodendraceae has undergone large‐scale expansion, gaining a large number of genes from the SC region (Chumley et al., 2006; Grewe et al., 2013; Sun et al., 2013). The IR region is absent in the plastomes of some plant lineages, such as prasinophyceans (Lemieux et al., 2014), trebouxiophyceans (Turmel et al., 2015), streptophytes (Lemieux et al., 2016), Ulvophyceae (Cai et al., 2017), and Euglenaceae (Karnkowska et al., 2018) among algae; Pinaceae, Cupressophytes (Li, Gao, et al., 2016; Wu, Lin, et al., 2011; Wu, Wang, et al., 2011), and Taxaceae (Zhang et al., 2014) among gymnosperms; and the putranjivoid clade of Malpighiales (Jin et al., 2020), Tahina spectabilis (Arecaceae) (Barrett et al., 2016), legumes (Palmer & Thompson, 1981, 1982), Carnegiea gigantea (Cactaceae) (Sanderson et al., 2015), and Erodium species (Geraniaceae) (Blazier et al., 2016; Guisinger et al., 2011; Ruhlman et al., 2017) among angiosperms. The sequence substitution rate in the IR region differs from that in the SC region. In some legumes with missing IR regions, the synonymous substitution rate of genes entering the SC region from the IR region is similar to that of those already in the SC region (Perry & Wolfe, 2002). The IR region has a low substitution rate in Cycas (Wu & Chaw, 2015). Zhu et al. (2016) found that the synonymous substitution rate of genes in the SC region in vascular plants (angiosperms, gymnosperms, and ferns) is 3.7‐fold that of genes in the IR region; after the transfer of genes between the SC and IR regions, the substitution rate becomes consistent. Some genes in the IR region have high synonymous substitution rates in some species (Pelargonium, Plantago, and Silene). In ferns, psbA, rps7, 3'‐rps12, and ycf2 showed a reduced substitution rate and increased GC content after entering the IR region (Li, Kuo, et al., 2016). Recently, Ping, Feng, et al. (2021), Ping, Li, et al. (2021) reported that the substitution rate of 3'‐rps12 in the IR region is lower than that of 5'‐rps12 in the SC region in ferns and gymnosperms. The IR region has important effects on genome stability and the evolutionary rate (Maréchal & Brisson, 2010; Palmer & Thompson, 1982; Wolfe et al., 1987). During gymnosperm evolution, conifers lost the IR region. Araucaria cunninghamii (Araucariaceae) is native to southeastern coastal areas of Oceania and is the main afforestation tree species in tropical and subtropical regions. It is one of the top five park tree species worldwide and an important garden ornamental plant, and it is cultivated widely in China. Callitropsis funebris (Cupressaceae) is endemic to China and distributed widely; it grows rapidly and has a wide range of uses and strong adaptability. Its phylogenetic classification is a focus of research (Zheng & Fu, 1978). As these two species are important representatives of their genera, we sequenced their plastomes. In this study, we selected 78 gymnosperms and 2 Polypodiales (outgroup) and analyzed selection pressure and the evolutionary rate using the maximum likelihood method. We investigated whether the selection pressure and evolutionary rate of protein‐coding genes in conifers differed according to the presence of the IR region; whether the selection pressure and evolutionary rate of genes that enter the IR region differ from those that enter the SC region; and the heterogeneity of gymnosperm evolutionary rates.

MATERIALS AND METHODS

Sequencing and sequence preparation

Fresh leaves of Callitropsis funebris (E113°35’, N23°15’) and Araucaria cunninghamii (E113°200’, N23°90’) were taken from the campus of South China Agricultural University. Specimens were stored in Herbarium of the College of Life Sciences, SCAU (specimen no.: PJY‐NYS1910 and PJY‐BM1910). Plastid genomic DNA was extracted using the DNASECure Plant Genomic DNA Extraction Kit (Tiangen) and double‐ended sequencing was performed on Illumina HiSeq2500 platform. Clean reads filtered by Trimmomatic V0.32 (Bolger et al., 2014) were spliced and assembled in Velvet V1.2.03 (Zerbino & Birney, 2008). The genes were predicted by the DOGMA Program (Wyman et al., 2004) and the plastome was mapped using the online site OGDRAW v1.3 (https://chlorobox.mpimp‐golm.mpg.de/OGDraw.html) after gene annotation (Greiner et al., 2019). Finally, the sequence information was submitted on the Banklt Platform (https://www.ncbi.nlm.nih.gov/WebSub/), and Genbank accession no. is MT227812 and MT227813, respectively. In addition, 76 gymnosperms and 2 Polypodiales (as outgroups) were downloaded from the NCBI database. A total of 78 gymnosperms cover 13 families and 51 genera (Table 1). Genious prime 2020 (Kearse et al., 2012) software was used to extract 50 protein‐coding genes. And performed sequence alignment and correction through the ClustalW (codons) module in MEGA X (Kumar et al., 2018).
TABLE 1

The Information of sampled species

OrderFamilySpecies nameGenBank accession no.Species nameGenBank accession no.
CycadalesCycadaceae Cycas revoluta NC_020319 Cycas szechuanensis NC_042668
Cycas panzhihuaensis NC_031413 Cycas taitungensis NC_009618
Boweniaceae Bowenia serrulata NC_026036
Zamiaceae Stangeria eriopus NC_026041 Zamia furfuracea NC_026040
Ceratozamia hildae NC_026037 Encephalartos lehmannii NC_027514
Dioon spinulosum NC_027512 Lepidozamia peroffskyana NC_027513
Macrozamia mountperriensis NC_027511
GinkgoalesGinkgoaceae Ginkgo biloba NC_016986
GnetalesGnetaceae Gnetum montanum NC_021438 Gnetum ula NC_028734
Gnetum parvifolium NC_011942 Gnetum gnemon NC_026301
EphedralesEphedraceae Ephedra equisetina NC_011954 Ephedra intermedia NC_044772
Ephedra foeminea NC_029347 Ephedra sinica NC_044773
WelwitschialesWelwitschiaceae Welwitschia mirabilis EU342371
CupressalesCupressaceae Cryptomeria japonica NC_010548 Cupressus tonkinensis NC_039562
Taiwania cryptomerioides NC_016065 Cupressus gigantea NC_028155
Taiwania flousiana NC_021441 Cupressus sempervirens NC_026296
Cunninghamia lanceolata NC_021437 Callitropsis funebris MT227813
Juniperus monosperma NC_024022 Callitropsis nootkatensis KP099642
Juniperus recurva NC_042763 Callitropsis vietnamensis KX832629
Taxodium distichum NC_034941 Hesperocyparis lusitanica MH121051
Taxodium mucronatum NC_045277 Chamaecyparis formosensis NC_034943
Calocedrus formosana NC_023121 Chamaecyparis hodginsii NC_036996
Glyptostrobus pensilis NC_031354 Thuja occidentalis NC_042177
Metasequoia glyptostroboides NC_027423 Thuja sutchuenensis NC_042176
Callitris rhomboidea NC_034940
Taxaceae Cephalotaxus sinensis MF977938 Taxus fuana NC_038099
Cephalotaxus oliveri NC_021110 Pseudotaxus chienii NC_041503
Amentotaxus argotaenia NC_027581 Torreya fargesii NC_029398
Amentotaxus formosana NC_024945
Sciadopityaceae Sciadopitys verticillata NC_029734
PinalesPinaceae Cedrus deodara NC_014575 Picea neoveitchii NC_043913
Pinus massoniana MF564195 Keteleeria davidiana NC_011930
Pinus yunnanensis NC_043856 Tsuga chinensis NC_030630
Pseudolarix amabilis NC_030631 Larix sibirica NC_036811
Pseudotsuga sinensis NC_016064 Larix decidua NC_016058
Abies fargesii NC_042775 Abies fanjingshanensis NC_042777
AraucarialesPodocarpaceae Nageia nagi NC_023120 Retrophyllum piresii NC_024827
Podocarpus lambertii NC_023805 Dacrydium elatum NC_045880
Dacrycarpus imbricatus NC_034942 Manoao colensoi NC_044893
Araucariaceae Agathis dammara NC_023119 Araucaria heterophylla NC_026450
Wollemia nobilis KP259800 Araucaria araucana NC_045394
Araucaria cunninghamii MT227812 Araucaria bidwillii NC_045395
Araucaria angustifolia NC_039155
PolypodialesPolypodiaceae Lepisorus clathratus NC_035739
Athyriaceae Athyrium anisopterum NC_035738
The Information of sampled species

Analysis of phylogenetic relationships

We constructed the phylogenetic relationships based on the common gene dataset (the two Polypodiales as the outgroup). MEGA X and PAUP4.0 (Swofford, 2002) were used to construct the NJ (neighbor‐joining) tree and MP (maximum‐parsimony) tree, respectively. RaxmlGUI2 software built ML (maximum‐likelihood) tree with GTRGAMMAI substitution model and 1,000 bootstrap (Stamatakis, 2014). BI (Bayesian inference) tree was built with MrBayes 3.2.6 software (Huelsenbeck & Ronquist, 2001). We reconstructed a background tree combined with the published phylogenetic relationship to analyze selection pressure and evolution rate, finally.

Analysis of selection pressure

Codeml program in PAML4.9 was used to analyze the selection pressure (Yang, 2004). The M0 (one‐ration) model assumes that each branch has the same value of ω under the Branch model. The Model 2 (two‐ratio) assumes the foreground branch and the background branch have different ω values. The likelihood ratio test of M0 and Model 2 can be used to detect the difference of selection pressure between the foreground branch and the background branch.

Analysis of evolutionary rate

We used HyPhy 2.2.4 software (Pond et al., 2005) to calculate the evolution rate, which is based on the maximum likelihood method and in the context of the phylogenetic tree. The transition rate (trst), transversion rate (trsv), and trsv/trst (ratio) of each branch were calculated under the nucleotide HKY85 substitution model with the local parameter. Similarly, the synonymous (dS) and non‐synonymous (dN) substitution rates and dN/dS (ω) of each branch were calculated under the codon MG94×HKY85×3_4 substitution model with the rate Het. parameter and 4 rate classes. IBM SPSS v22.0 (IBM Corporation, 2014) software was used to perform the Mann–Whitney U test and Spearman's rank correlations test on related parameters.

RESULTS

Plastome characteristics

The 76 gymnosperms were divided into two types according to the presence of a typical IR region: 56 gymnosperms lacked a typical IR region (SC‐56; conifer) and 22 gymnosperms had a typical IR region (gnetophytes, cycads, and Ginkgo biloba; IR‐22; non‐conifer). For SC‐56 (Appendix S1), the genome size ranged from 117,720 bp (K. davidiana) to 146,723 bp (A. heterophylla), with an average of 130,858 bp. The GC content ranged from 34.3% (T. sutchuenensis) to 39.1% (C. davidiana), with an average of 36.2%. For IR‐22 (Table 2), the genome size ranged from 109,518 bp (Ephedra equisetina) to 166,341 bp (Macrozamia mountperriensis), with an average of 142,888 bp. The GC content ranged from 36.6% to 40.1%, with an average of 38.7%. The size of the IR region ranged from 17,732 bp (G. biloba) to 26,137 bp (Ceratozamia hildae). Compared with other groups, the plastome of the gnetophytes was the smallest, and the proportion of SSC region was also the smallest (7.2–9.3%), but the proportion of the IR region was largest (16%–18.9%). The three groups all contained the protein‐coding genes rps7 and 3'‐rps12 in the IR region, and ndhB was present in G. biloba and cycads. All plants, except G. biloba, possessed ycf2. Individually, 3'‐psbA was present in Welwitschia mirabilis, Gnetum parvifolium, and Gnetum gnemon, rpl32 was detected in W. mirabilis, and rps15 was detected in E. equisetina (Figure 1).
TABLE 2

Plastome structure information of 22 species with typical IR regions

Species nameGenome size/bpGC/%LSC/bpLSC percentageSSC/bpSSC percentageIR/bpIR percentageNumber of genes in the IR region
CDStRNA
Ginkgo biloba 156,98839.699,25463.2%22,26714.2%17,73211.3%36
Cycas revoluta 162,48939.488,97854.8%23,37914.4%25,06615.4%46
Cycas panzhihuaensis 162,47039.488,93254.7%23,44814.4%25,04515.4%47
Cycas szechuanensis 162,08339.488,97054.9%23,10714.3%25,00315.4%47
Cycas taitungensis 163,40339.590,21655.2%23,03914.1%25,07415.3%47
Bowenia serrulata 165,69539.990,73354.8%23,15614.0%25,90315.6%47
Stangeria eriopus 163,54839.589,85054.9%23,00614.1%25,34615.5%47
Ceratozamia hildae 165,73439.790,48754.6%22,97313.9%26,13715.8%47
Dioon spinulosum 161,81540.188,75454.8%23,35514.4%24,85315.4%47
Zamia furfuracea 164,95339.790,44154.8%23,22814.1%25,64215.5%47
Encephalartos lehmannii 165,82239.990,72454.7%23,30214.1%25,89815.6%47
Lepidozamia peroffskyana 165,93939.990,80454.7%23,29914.0%25,98115.7%47
Macrozamia mountperriensis 166,34139.891,17154.8%23,33414.0%25,91815.6%47
Gnetum montanum 115,01938.266,13957.5%9,4948.3%19,69317.1%38
Gnetum parvifolium 114,91438.266,09557.5%9,5598.3%19,63017.1%48
Gnetum ula 113,24938.564,91457.3%8,7917.8%19,77217.5%38
Gnetum gnemon 115,02238.266,59157.9%8,3297.2%20,05117.4%58
Ephedra equisetina 109,51836.659,93654.7%8,0787.4%20,75218.9%68
Ephedra foeminea 109,58436.760,70855.4%8,0787.4%20,39918.6%68
Ephedra intermedia 109,66736.659,93654.7%8,2477.5%20,74218.9%68
Ephedra sinica 109,55036.759,96154.7%8,1037.4%20,74318.9%68
Welwitschia mirabilis 119,72636.768,55657.3%11,1569.3%20,00716.7%58
FIGURE 1

Comparison of IR regions between different taxa. Different color blocks represent different gene types. In species with typical IR regions, there were two protein‐coding genes (rps7 and 3’‐rps12) located in the IR region. In Ginkgo biloba, the ycf2 was located in the LSC region due to contraction of the IR region. Gnetophytes lost ndh genes, and in Welwitschia mirabilis, 3’‐psbA entered IR region

Plastome structure information of 22 species with typical IR regions Comparison of IR regions between different taxa. Different color blocks represent different gene types. In species with typical IR regions, there were two protein‐coding genes (rps7 and 3’‐rps12) located in the IR region. In Ginkgo biloba, the ycf2 was located in the LSC region due to contraction of the IR region. Gnetophytes lost ndh genes, and in Welwitschia mirabilis, 3’‐psbA entered IR region Rank sum tests indicated no significant differences in genome size (p = .202) and significant differences in GC content (p < .01) between IR‐22 and SC‐56.

Phylogenetic analysis

The 50 protein‐coding genes comprised 30 photosynthetic system genes, 15 genetic systems, and 5 other genes (Table 3). The constructed phylogenetic results (Appendix S2) showed that the phylogenetic relationship constructed by the NJ method was the clearest, and each group formed a monophyletic branch. Using the BI and ML methods, we found that all groups, except the Pinales, formed a monophyletic branch. In the MP tree, the relationship between Cupressales and Araucariales was unclear.
TABLE 3

Types of common genes

Gene typeGene name
Genes for photosynthesis
Photosystem psaA psaB psaC psaI psaJ
Photosystem II psbA psbB psbC psbD psbE psbF psbH psbI
psbJ psbK psbL psbM psbN psbT
Genetic system genes
Cytochrome petA petB petD petG petL petN
ATP Synthase atpB atpE atpF atpI
RubiscoCO large subunit rbcL
Ribosomal Proteins (LSU) rpl14 rpl20 rpl33 rpl36
Ribosomal Proteins (SSU) rps2 rps4 rps7 rps8 rps11 rps18 rps19
RNA Polymerase rpoA rpoB rpoC1 rpoC2
Other genes
Envelop membrane protein cemA
C‐type cytochrome synthesis ccsA
Hypothetical chloroplast reading frames ycf2 ycf3 ycf4
Types of common genes The NJ (bootstrap value = 100) and MP (bootstrap value = 50) trees supported gnetophytes as the basic group of gymnosperms, and the MP (bootstrap value = 100) and BI (posterior probability = 1) trees supported gnetophytes and P. neoveitchii as sister groups. Based on previous reports, we accepted the Gnepine hypothesis, and performed a manual adjustment to obtain the phylogenetic tree for selection pressure and evolution rate analysis (Figure 2).
FIGURE 2

Phylogenetic relationship of sampled species. The tree was obtained through manual adjustment by combining four methods and previous research results. See Appendix S2 for four trees

Phylogenetic relationship of sampled species. The tree was obtained through manual adjustment by combining four methods and previous research results. See Appendix S2 for four trees

Evolutionary analysis based on the presence of a typical IR region

With IR‐22 as the foreground branch and SC‐56 as the background branch, likelihood ratio tests of M0 and Model 2 yielded 18 genes with significant differences (p < .05). That is, these 18 genes experience different selection pressure in IR‐22 and SC‐56. Among them, six genes had an ω value in IR‐22 that were lower than SC‐56, and the rest were higher (Table 4).
TABLE 4

Genes that accepted Model 2 in the Branch Model

Gene lnL M0 lnL Model 22 ΔƖ P‐value ω foreground ω background ω foreground/ ω background
atpF −6580.533−6587.05613.04600.3670.1951.882
ccsA −11747.691−11749.6974.012.0450.2680.346 0.775
cemA −10261.516−10266.1929.352.0020.4180.2721.537
petD −3345.317−3348.1005.566.0180.0660.117 0.564
psaA −16296.555−16299.0805.050.0250.0780.0571.368
psaJ −1520.374−1522.3403.932.0470.2690.1322.038
psbA −6124.730−6133.24517.02900.0260.071 0.366
psbC −9768.612−9773.69910.173.0010.0420.073 0.575
psbE −1676.534−1681.76810.468.0010.0580.183 0.317
psbJ −1229.013−1231.7215.416.0200.1330.366 0.363
psbM −1381.099−1384.8937.588.0060.4790.1323.629
rpoB −38294.175−38327.81967.28900.2860.1561.833
rpoC1 −26817.747−26840.73645.97800.3570.1911.869
rpoC2 −50272.622−50279.51913.79500.3720.2941.265
rps2 −8506.349−8508.8855.071.0240.2810.2051.371
rps11 −4972.071−4979.64215.14200.3390.1522.230
rps18 −5208.180−5212.7619.163.0020.3520.1602.200
ycf2 −142683.719−142688.96810.499.0010.7680.6351.209

Foreground is IR‐22 and background is SC‐56; Bold font indicates that ω foreground is less than ω background.

Genes that accepted Model 2 in the Branch Model Foreground is IR‐22 and background is SC‐56; Bold font indicates that ω foreground is less than ω background. We analyzed and tested the evolutionary rates of the 50 selected common genes (Appendix S3). Among these, we found that the related parameters of nine genes were significantly different between IR‐22 and SC‐56 (p < .05; Figure 3). Compared with SC‐56, the genes with a higher evolution rates in IR‐22 were trst (.003 vs. .001), trsv (.026 vs. .010), ratio (.082 vs. .031), dN (.007 vs. .001), dS (.070 vs. .025), and ω (.100 vs. .032) of psbA (2.6‐ to 4.2‐fold that of the SC‐56); trst (.052 vs. .021), dN (.033 vs. .014), dS (.112 vs. .051), and ω (.631 vs. .226) of rps8 (2.3‐ to 2.4‐fold); dN (.010 vs. .002) of psbE (5‐fold); and ω (.064 vs. .002) of petG (32‐fold). Compared with IR‐22, the genes with higher evolution rates in the SC‐56 were trsv (.033 vs. .009), ratio (.504 vs. .145), dN (.063 vs. .022), dS (.068 vs. .024), and ω (.904 vs. .024) of ycf2 (1.2‐ to 3.6‐fold those of IR‐22); trsv (.008 vs. .005) of rps4 (1.8‐fold); and ratios of rpoC1 (.224 vs. .087), rbcL (.234 vs. .077), and cemA (.183 vs. .064) (2.6‐, 3‐, and 2.8‐fold, respectively). Spearman's rank correlation test results showed that the evolutionary rate of genes other than dS of rps8 (Spearman's rho = .157, p = .169) and ω of psbA (Spearman's rho = .212, p = .063) was significantly correlated (p < .05) with the IR region (Appendix S4).
FIGURE 3

Nine of fifty common genes with significantly different evolutionary rates between IR‐22 (non‐conifers) and SC‐56 (conifers). (a) Transversion rate. (b) Transition rate. (c) Ratio. (d) Non‐synonymous substitution rate. (e) Synonymous substitution rate. (f) ω. *: Rank sum test, p < .05. Compared with SC‐56, the genes with a higher evolution rate in IR‐22 were trst, trsv, ratio, dN, dS, and ω of psbA; trst, dN, dS, and ω of rps8; dN of psbE; and ω of petG. Compared with IR‐22, the genes with higher evolution rates in the SC‐56 were trsv, ratio, dN, dS, and ω of ycf2; trsv of rps4; and ratios of rpoC1, rbcL, and cemA

Nine of fifty common genes with significantly different evolutionary rates between IR‐22 (non‐conifers) and SC‐56 (conifers). (a) Transversion rate. (b) Transition rate. (c) Ratio. (d) Non‐synonymous substitution rate. (e) Synonymous substitution rate. (f) ω. *: Rank sum test, p < .05. Compared with SC‐56, the genes with a higher evolution rate in IR‐22 were trst, trsv, ratio, dN, dS, and ω of psbA; trst, dN, dS, and ω of rps8; dN of psbE; and ω of petG. Compared with IR‐22, the genes with higher evolution rates in the SC‐56 were trsv, ratio, dN, dS, and ω of ycf2; trsv of rps4; and ratios of rpoC1, rbcL, and cemA The evolutionary rate of psbA (Appendix S5) was abnormally high in Cycas taitungensis and G. biloba and in Cycas spp. By contrast, the species evolutionary rate of SC‐56 was low. For rps8 (Appendix S6), evolutionary rates were high in G. biloba, W. mirabilis, and cycads; the values for most other species branches in SC‐56 were 0, except C. rhomboidea and S. verticillata. For rps4 (Appendix S7A), the transversion rate of IR‐22 was low, and most values were 0. All species branches in SC‐56 had high transversion rates, with those of C. rhomboidea and S. verticillata being abnormally high. For psbE (Appendix S7B), the dN of IR‐22 was high (especially in W. mirabilis and Dioon spinulosum) and was 0 for most species in SC‐56 (n = 49). For ycf2 (Appendix S8), substitution rates were high in most branches of SC‐56, especially C. rhomboidea and S. verticillata. In IR‐22, the value of the other branches was 0, except in W. mirabilis and G. biloba.

Evolutionary analysis based on genes entering the IR region

The IR regions of 22 species contained the protein‐coding genes 3’‐rps12 and rps7 (Figure 1). In W. mirabilis, G. parvifolium, and G. gnemon, part of the psbA (3’‐psbA) entered the IR region; thus, it was divided into two categories (IR‐3 and SC‐75). In G. biloba, ycf2 has been left from the IR region; thus, it was divided into two categories (IR‐21 and SC‐57). Likelihood ratio test results revealed that the selection pressure differed among categories. For psbA, ω IR‐3 was 0.03474, with IR‐3 as the foreground branch, and ω SC‐75 was 0.0001, with SC‐75 as the background branch. For ycf2, ω IR‐21 was 0.76124, with IR‐21 as the foreground branch, and ω SC‐57 was 0.67832, with SC‐57 as the background branch. The trsv, ratio, and ω values of psbA were significantly higher (P = .048, P = .003, and P = .027) in the IR region than in the SC region, and the evolutionary rate of ycf2 was significantly lower (p < .05) in the IR region than in the SC region (Figure 4).
FIGURE 4

The evolutionary rate of psbA and ycf2 in the different categories. (a) For psbA, the mean evolutionary rate of IR‐3 and SC‐75. B: For ycf2, the mean evolutionary rate of IR‐21 and SC‐57. *Rank sum test, p < .05. The trsv, ratio, and ω values of psbA were significantly higher in the IR region than in the SC region, and the evolutionary rate of ycf2 was significantly lower in the IR region than in the SC region

The evolutionary rate of psbA and ycf2 in the different categories. (a) For psbA, the mean evolutionary rate of IR‐3 and SC‐75. B: For ycf2, the mean evolutionary rate of IR‐21 and SC‐57. *Rank sum test, p < .05. The trsv, ratio, and ω values of psbA were significantly higher in the IR region than in the SC region, and the evolutionary rate of ycf2 was significantly lower in the IR region than in the SC region

DISCUSSIONS

Changes in genome size

Evolution has decreased the gymnosperm genome size. Conifers and non‐conifers have genomes of similar sizes, which is related to the large compact genome of gnetophytes (115,022–109,518 bp). This compact genome may increase the gnetophyte survival rate in harsh and competitive environments (Wu et al., 2009). After the removal of gnetophytes, a significant difference was detected between groups (p < .01). The shrinkage of the conifer plastome is a result mainly of the loss of the IR region (IRb in Pinaceae and IRa in cupressophytes). In addition, the reduction in the cupressophyte plastome size may be caused by intergenic shrinkage resulting from the mutational burden (Wu & Chaw, 2014). Pinaceae and cupressophytes have lost their typical IRs during evolution, but have acquired one or more short, novel IRs, some of which also exhibit recombination to generate genomic structural diversity (Guo et al., 2014). Compared with that of ginkgo, the IR region of other non‐conifers has expanded. Although the IR region in gnetophytes is smaller than that in cycads, it accounts for a large proportion of the genome and contains more genes. Gnetophytes lost the ndh gene and non‐coding sequence, resulting in a compact, more economical genome (Wu et al., 2009). In Pinaceae, the IR comprises only the full trnI‐CAU gene and, in most species, a portion of the psbA gene (Lin et al., 2010; Wu, Lin, et al., 2011). The GC content was increased in non‐conifers, which may be related to a GC bias in gene conversion in the IR region. Indeed, the IR region of cycads contains more A/T to GC substitutions (Wu & Chaw, 2015).

Phylogenetic location of gnetophytes

The four phylogenetic trees constructed using a tandem dataset of 50 common protein‐coding genes were not consistent, and the application of different algorithms yielded different topological structures. Compared with those produced by NJ and MP, the phylogenetic relationships constructed by ML and BI are more accurate (Hall, 2005). The ML and BI trees showed that gnetophytes and P. neoveitchi were sister groups with a high degree of support. Although Pinaceae was a non‐monophyletic group, it was closest to the gnetophytes, supporting the Gnepine hypothesis. The phylogenetic position of the gnetophytes has been debated (Mathews, 2009; Palmer et al., 2004). Burleigh and Mathews (2004) examined 13 genes in gymnosperms (five plastid, four nuclear, and four mitochondrial) and almost 19 000 nucleotides, providing evidence for the Gnepine hypothesis. Wu and Chaw (2014) analyzed the rearrangement of DNA in the plastome of gymnosperms, and the results supported the Gnepine hypothesis. Li et al. (2017) analyzed single‐copy genes from the genome and transcriptome, and the phylogenetic relationships constructed from their different datasets supported two topological structures: a sister relationship between gnetophytes and other gymnosperms and gnetophytes as a sister group to Pinaceae. Ran et al. (2018) constructed a phylogenetic tree with 1308 loci based on transcriptome data, which also supported the Gnepine hypothesis. Ping, Feng, et al. (2021) constructed ML and BI trees using rbcL and matK, which supported the classification of gnetophytes and Pinaceae as sister groups.

Heterogeneity of evolutionary rates

The change in evolutionary rate was independent of the presence of the IR region, but rather indicates an increased substitution rate of gnetophytes (especially W. mirabilis) and a lower evolutionary rate of conifers. The evolutionary rate of four genes was increased in non‐conifers. The increased substitution rates of rps8, psbE, and psbA were related to the low evolutionary rate of conifers. The petG value was abnormally high. The dN or dS value for most species branches was 0, resulting in false‐positive results. By contrast, the substitution rates of rps4 and ycf2 were decreased in non‐conifers, which is related to their generally low evolutionary rate. The evolutionary rate of most genes in W. mirabilis was high. The phylogenetic tree constructed using 51 common genes showed that the gnetophytes, especially W. mirabilis and Ephedra sinica, had high evolutionary rates. McCoy et al. (2008) reported that about 75% of plastid protein‐coding genes showed high substitution rates in W. mirabilis. The substitution rates of most protein‐coding genes did not differ significantly between conifers and non‐conifers, which may be related to the newly acquired short IR region of conifers (Hirao et al., 2008; Wu & Chaw, 2014; Yi et al., 2013). In Cephalotaxus oliveri, the 544‐bp IR (repeated trnQ‐UUG gene) was inferred to have recombination activity (Yi et al., 2013). Gnetophytes have a higher substitution rate than do other gymnosperms (Ran et al., 2018; Wang et al., 2015; Wu et al., 2009), possibly because of higher mutation rates, changes in selective pressure, increased fixation of mutations by genetic drift, and biological characteristics (Fry & Wernegreen, 2005; Lanfear et al., 2010). Wu et al. (2009) proposed that the high frequency of AT‐rich codons in gnetophytes leads to a higher substitution rate. Wang et al. (2015) proposed that the generation time and plant height underlie the increased substitution rate of gnetophytes (Lanfear et al., 2010, 2013). Ran et al. (2018) found that gnetophytes and angiosperms have similar rates of molecular evolution, which are higher than those of other gymnosperms, suggesting that gnetophytes and angiosperms experienced similar selection pressure during their evolutionary histories. As conifers are typically taller than non‐conifers, the long‐term rate of mitosis in the apical meristem is slower, the frequency of DNA replication is lower, and the accumulated errors per unit time are reduced, resulting in low mutation and substitution rates (Lanfear et al., 2013).

Negative selection pressure

Genes in the IR region experience strong negative selection (Ping, Feng, et al., 2021; Ping, Li, et al., 2021). The IR region is important for genome structural stability, and genomes lacking an IR region may experience different natural selection effects. However, the selection pressure of most genes did not differ significantly, indicating that they are not affected by the IR region. When ω < 1, smaller ω values reflect stronger negative selection pressure (Yang, 2006). The selection pressure of 18 genes differed between conifers and non‐conifers, and their ω values were <1 (Table 4). Compared with conifers, six genes (four PSII genes, one cytochrome gene, and one c‐type cytochrome synthesis gene) showed lower ω values in non‐conifers, and 12 genes (three PSI genes, one PSII gene, six genetic system genes, and two other genes) showed higher ω values in non‐conifers. The lower ω values of these genes indicates the greater the negative selection effect, which may be more conducive to their function in various groups. The differences in selection pressure between conifers and non‐conifers may be related to plant height; non‐conifers are typically shorter than conifers. Efficient photosynthesis is required to obtain enough sunlight, necessitating negative selection pressure on the genes encoding the photosynthetic machinery. However, some such genes in conifers experienced negative selection pressure, which may be related to their structure or function or to habitat differences. The negative selection exerted by the habitats of gnetophytes leads to gene‐specific reductions in dN/dS in their plastomes and genomes (Wang et al., 2015).

Reduced substitution rate in the IR region

Genes in the IR region tend to have a low substitution rate (Li, Kuo, et al., 2016; Li, Gao, et al., 2016; Ping, Feng, et al., 2021; Ping, Li, et al., 2021). In this study, ycf2 in the IR region showed a reduced substitution rate (Figure 4) but weak selection pressure (ω IR‐21 = 0.76124, ω SC‐57 = 0.67832), indicating the lack of correlation between these factors. However, Lin et al. (2012) showed that the substitution rate of ycf2 in the SC region of G. biloba was not significantly higher than that in the IR region in seven other species. These discrepancies in results are attributable to differences in sample size. In gymnosperms, 3'‐rps12 (exons 2–3) shows a reduced substitution rate and negative selection pressure when present in the IR region (Ping, Feng, et al., 2021). The substitution rate and selection pressure of rps7 in the IR region did not change, indicating high stability during gymnosperm evolution. The other gene, psbA, encodes the D1 protein in the PSII core complex. In ferns, psbA, ycf2, rps7, and 3'‐rps12 (exons 2–3) showed reduced substitution rates when present in the IR region (Li, Kuo, et al., 2016; Li, Gao, et al., 2016). psbA only partially enters the IR region in three gymnosperm species and shows increased substitution rates (Figure 4) and weak selection pressure. These characteristics differ from those of other genes; the most reasonable explanation is that this gene is undergoing dynamic changes in gymnosperms, resulting in changes in its substitution rate and selection pressure. IR regions may play an important role in the maintenance of genomic stability (Maréchal & Brisson, 2010), such as by replication initiation (Heinhorst & Cannon, 1993), genome stabilization (Palmer & Thompson, 1982), and gene conservation (Palmer & Thompson, 1982; Wolfe et al., 1987). IR loss is related to structural recombination. In some taxa lacking an IR region, plastids underwent significant structural changes, including gene and intron loss, multiple inversion, and translocation and duplication of plastome segments (Jin et al., 2020; Hirao et al., 2008; Mower & Vickrey, 2018). Sequences in the IR region often have a reduced substitution rate, which is related to the region's characteristics. Due to its double‐copy nature, the frequency of gene conversion is higher, and some mutation sites are repaired, resulting in a low substitution rate (Birky & Walsh, 1992; Khakhlova & Bock, 2006).

CONFLICT OF INTEREST

The authors declare no conflict of interest.

AUTHOR CONTRIBUTION

Jingyao Ping: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Investigation (equal); Methodology (equal); Writing – original draft (equal). Jing Hao: Data curation (equal); Formal analysis (equal); Methodology (equal). Jinye Li: Data curation (equal); Formal analysis (equal); Resources (equal). Yiqing Yang: Formal analysis (equal); Methodology (equal). Yingjuan Su: Funding acquisition (equal); Writing – original draft (equal); Writing – review & editing (equal). Ting Wang: Funding acquisition (equal); Writing – original draft (equal); Writing – review & editing (equal). Appendix S1 Click here for additional data file. Appendix S2 Click here for additional data file. Appendix S3 Click here for additional data file. Appendix S4 Click here for additional data file. Appendix S5 Click here for additional data file. Appendix S6 Click here for additional data file. Appendix S7 Click here for additional data file. Appendix S8 Click here for additional data file.
  63 in total

1.  Nucleotide substitution rates in legume chloroplast DNA depend on the presence of the inverted repeat.

Authors:  Antoinette S Perry; Kenneth H Wolfe
Journal:  J Mol Evol       Date:  2002-11       Impact factor: 2.395

2.  HyPhy: hypothesis testing using phylogenies.

Authors:  Sergei L Kosakovsky Pond; Simon D W Frost; Spencer V Muse
Journal:  Bioinformatics       Date:  2004-10-27       Impact factor: 6.937

3.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

Authors:  Daniel R Zerbino; Ewan Birney
Journal:  Genome Res       Date:  2008-03-18       Impact factor: 9.043

4.  Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs.

Authors:  K H Wolfe; W H Li; P M Sharp
Journal:  Proc Natl Acad Sci U S A       Date:  1987-12       Impact factor: 11.205

5.  Plastome evolution and phylogenetic relationships among Malvaceae subfamilies.

Authors:  Jian-Hua Wang; Michael J Moore; Hongxin Wang; Zhi-Xin Zhu; Hua-Feng Wang
Journal:  Gene       Date:  2020-09-01       Impact factor: 3.688

6.  MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms.

Authors:  Sudhir Kumar; Glen Stecher; Michael Li; Christina Knyaz; Koichiro Tamura
Journal:  Mol Biol Evol       Date:  2018-06-01       Impact factor: 16.240

7.  Genetic, evolutionary and phylogenetic aspects of the plastome of annatto (Bixa orellana L.), the Amazonian commercial species of natural dyes.

Authors:  Túlio Gomes Pacheco; Amanda de Santana Lopes; Gélia Dinah Monteiro Viana; Odyone Nascimento da Silva; Gleyson Morais da Silva; Leila do Nascimento Vieira; Miguel Pedro Guerra; Rubens Onofre Nodari; Emanuel Maltempi de Souza; Fábio de Oliveira Pedrosa; Wagner Campos Otoni; Marcelo Rogalski
Journal:  Planta       Date:  2018-10-11       Impact factor: 4.116

8.  Evolution of short inverted repeat in cupressophytes, transfer of accD to nucleus in Sciadopitys verticillata and phylogenetic position of Sciadopityaceae.

Authors:  Jia Li; Lei Gao; Shanshan Chen; Ke Tao; Yingjuan Su; Ting Wang
Journal:  Sci Rep       Date:  2016-02-11       Impact factor: 4.379

9.  Comparative chloroplast genomes of pinaceae: insights into the mechanism of diversified genomic organizations.

Authors:  Chung-Shien Wu; Ching-Ping Lin; Chi-Yao Hsu; Rui-Jiang Wang; Shu-Miaw Chaw
Journal:  Genome Biol Evol       Date:  2011-03-14       Impact factor: 3.416

10.  Complete plastid genomes from Ophioglossum californicum, Psilotum nudum, and Equisetum hyemale reveal an ancestral land plant genome structure and resolve the position of Equisetales among monilophytes.

Authors:  Felix Grewe; Wenhu Guo; Emily A Gubbels; A Katie Hansen; Jeffrey P Mower
Journal:  BMC Evol Biol       Date:  2013-01-11       Impact factor: 3.260

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.