The evolutionary fate of the chloroplast and nuclear rps16 genes as revealed through the sequencing and comparative analyses of four novel legume chloroplast genomes from Lupinus.

J Keller1, M Rousseau-Gueutin1,2, G E Martin3, J Morice2, J Boutte1, E Coissac4, M Ourari5, M Aïnouche1, A Salmon1, F Cabello-Hurtado1, A Aïnouche1.   

Abstract

The Fabaceae family is considered as a model system for understanding chloroplast genome evolution due to the presence of extensive structural rearrangements, gene losses and localized hypermutable regions. Here, we provide sequences of four chloroplast genomes from the Lupinus genus, belonging to the underinvestigated Genistoid clade. Notably, we found in Lupinus species the functional loss of the essential rps16 gene, which was most likely replaced by the nuclear rps16 gene that encodes chloroplast and mitochondrion targeted RPS16 proteins. To study the evolutionary fate of the rps16 gene, we explored all available plant chloroplast, mitochondrial and nuclear genomes. Whereas no plant mitochondrial genomes carry an rps16 gene, many plants still have a functional nuclear and chloroplast rps16 gene. Ka/Ks ratios revealed that both chloroplast and nuclear rps16 copies were under purifying selection. However, due to the dual targeting of the nuclear rps16 gene product and the absence of a mitochondrial copy, the chloroplast gene may be lost. We also performed comparative analyses of lupine plastomes (SNPs, indels and repeat elements), identified the most variable regions and examined their phylogenetic utility. The markers identified here will help to reveal the evolutionary history of lupines, Genistoids and closely related clades.
© The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

Keywords:  Lupinus; chloroplast genomes; functional gene relocation; phylogeny; repeated sequences

Year:  2017        PMID: 28338826      PMCID: PMC5737547          DOI: 10.1093/dnares/dsx006

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


1. Introduction

The Fabaceae (or Leguminosae) is one of the largest flowering plant families, with ca. 19,500 herbaceous to tree species (ca. 751 genera) distributed in very diverse ecogeographical areas around the world. Because of their ability to establish specific associations with nitrogen-fixing rhizobial bacteria, many legume species are of great ecological and economic interest. They provide valuable biological nitrogen for better productivity and ecosystem functioning, and supply significant sources of protein for human and animal nutrition and health. Within Fabaceae, the Papilionoideae clade includes several major crops for human and animal consumption, such as soybean (Glycine), barrel medic (Medicago), bean (Phaseolus), cowpea (Vigna), chickpea (Cicer), pea (Pisum), peanut (Arachis), pigeon pea (Cajanus) and lupine (Lupinus). Increasing our knowledge of the evolutionary history of this family, as well as of the mechanisms involved in its physiological and ecological properties, will improve management of natural and agricultural ecosystems and guide plant breeding programs. During the last decade, our understanding of the structural and functional evolutionary dynamics of legume genomes increased significantly due to progress in Next Generation Sequencing (NGS) technologies. The recent sequencing of many plant plastomes revealed the unusual evolution of the Fabaceae, Geraniaceae and Campanulaceae plastomes. To date, 34 complete Fabaceae plastomes have been sequenced (including 17 in the last three years), mainly from Papilionoid lineages (25), and a few from the Caesalpinioid (5) and Mimosoid (4) lineages.
Comparative analyses of Fabaceae plastomes showed that they have undergone major structural evolution compared with other plant families, including the loss of one inverted repeat (IR), a 51-kb inversion shared by most Papilionoid clades (including species from the IR-lacking clade, also called IRLC), a 78-kb inversion in Phaseoleae, a 5.6-kb inversion in Millettia, a 36-kb inversion in the Genistoid clade and a 39-kb inversion in Robinia. Although gene content is relatively well conserved in angiosperm plastomes, it has been shown that several genes, such as accD, psaI, ycf4, rpl33, clpP or rps16, have been functionally lost in various legume lineages. Some of these chloroplast genes (accD and rps16), which are essential for plant survival, were shown to be functionally replaced by a nuclear gene. In contrast to the plastomes of most angiosperm families, Fabaceae plastomes have regions with accelerated mutation rates, including genic regions such as the rps16-ycf4 region in the IRLC clade or the clpP gene in Mimosoids. It has been suggested that this remarkable pattern of variation most likely results from the functional alteration of genes involved in DNA replication, repair and recombination, which may also facilitate the expansion of repeat sequences and the formation of structural rearrangements. For instance, the extensive reorganization of the plastid genome in Trifolium was correlated with an increase in repeat number, and the increase in size of the plastid genome in Mimosoids was correlated with tandem repeat expansions. Until recently, most of the knowledge of legume plastome evolutionary dynamics was derived from model and crop plants in the Papilionoid lineage, and specifically the non-protein amino acid-accumulating (NPAA) clade (including Millettioids, Robinioids and IRLC).
In the last three years, plastomes of other Papilionoid, Mimosoid and Caesalpinioid lineages have been sequenced and have provided additional insights into the unusual plastome evolution of the Fabaceae. In a few genera (Glycine, Lathyrus, Trifolium), the plastomes of several species were sequenced, contributing to a better understanding of the origin of specific structural variations. Localized hypermutations, gene losses and plastome size variations were identified, and useful sequence resources were found for phylogenetic inference. However, additional sequencing efforts in key genera of the highly diverse legume family are essential for understanding key features of plastome evolution and for resolving phylogenetic relationships at these taxonomic levels. The Genistoid clade contains ∼18% of the 13,800 Papilionoid taxa. Within this poorly studied lineage, the diverse Lupinus genus is considered a good model system. Lupinus is composed of hundreds of annual and perennial herbaceous species and a few soft-woody shrubs and trees, which occur in a wide range of ecogeographical conditions. Lupines are mainly distributed in the New World (NW) (from Alaska to southern Chile and Argentina), whereas ∼20 species and subspecies are native to the Old World (OW), where two groups are distinguished: the smooth-seeded (circum-Mediterranean) and the rough-seeded lupines (scattered in North-equatorial Africa). In addition, this genus includes some crop species (Lupinus albus, Lupinus luteus, Lupinus angustifolius, Lupinus mutabilis), which are of growing interest due to their high seed protein content, their potential as nitrogen producers and their health benefits. Molecular phylogenetic investigations using nuclear (ITS, nrDNA, LEGCYC1A, SymRK) and chloroplastic (rbcL, matK, trnL-trnF, trnL-trnT, trnL-intron, trnS-trnG) regions drastically improved knowledge of the evolutionary history of this complex genus.
Many clades have been well circumscribed, and patterns of diversification were identified in both the NW and the OW. In spite of these significant advances, there are still uncertainties and unresolved relationships to be elucidated, such as: (i) basal relationships between the NW and OW lineages in the Lupinus phylogeny; (ii) relationships amongst the OW lineages and within the African clade; and (iii) the enigmatic position of some taxa (e.g. Floridian lupines). Recently, the plastome of L. luteus was published, representing the first chloroplast genome sequenced in Lupinus and in the Genistoid lineage. Comparison with other legume plastomes allowed the discovery that the Genistoids share a 36-kb inversion, and the identification of mutational hotspots representing potentially informative regions for evolutionary studies. However, these identified regions, such as the ycf4 gene in the NPAA clade or the clpP gene in the IRLC and Mimosoids, may be of interest only in particular clades, due to a specific accelerated evolutionary rate in these lineages. Thus, additional plastomes are needed to specifically understand the plastome evolution of the lupine/Genistoid lineage and to accurately identify their most variable regions that are of phylogenetic significance. In this context, we sequenced four novel lupine plastomes: two Mediterranean smooth-seeded species, L. albus and Lupinus micranthus Guss., and two rough-seeded species, Lupinus atlanticus Glads. and Lupinus princei Harms. Comparative analyses were performed among the five OW lupine plastomes (including the previously published L. luteus) at the structural (inversions, indels, repeat numbers and distribution), gene and sequence levels, in order to better understand their evolutionary dynamics and to identify novel phylogenetically informative regions. More specifically, sequencing of these four additional plastomes revealed the pseudogenization of the chloroplast rps16 gene in Lupinus species.
Analyses of the Ka/Ks ratios of the functional chloroplast and nuclear rps16 genes (both encoding the same chloroplast RPS16 protein) from some representative Angiosperm species revealed that both copies were under purifying selection. However, since the nuclear rps16 gene also encodes the mitochondrial RPS16 protein, and since rps16 is absent from the mitochondrial genomes of all plants sequenced to date, the loss of the nuclear rps16 gene would be detrimental for plant survival. This could explain why only the chloroplast rps16 gene has been functionally lost many times during plant evolution, despite being under purifying selection. In addition, investigations of the evolutionary dynamics of the lupine plastomes (mutations, indels and repeated elements) allowed identification of variable characters and regions. The phylogenetic interest of these regions in the genus Lupinus was tested using representative species of the main lupine clades.

2. Material and methods

2.1. Plant material and DNA isolation

Genomic DNA of 30 lupine species was extracted from fresh leaves using the NucleoSpin® Plant II kit (Macherey-Nagel), following the manufacturer’s instructions. The genomic DNA extracts of four Lupinus species (L. albus, L. micranthus, L. atlanticus and L. princei) were subjected to NGS for plastome reconstruction. DNA extracts from the other 26 lupine species were used in different evolutionary tests on genes and regions of interest, including four OW rough-seeded species (L. digitatus, L. cosentinii, L. anatolicus and L. pilosus); three OW smooth-seeded species (L. hispanicus subsp. bicolor, L. angustifolius subsp. angustifolius and L. angustifolius subsp. reticulatus); and fourteen species representing the main known groups in the NW lupines. Among these are (i) five members of the North and South East American clade (L. texensis, L. paraguariensis, L. gibertianus and L. sellowianus); (ii) nine members from various groups mainly occurring in western regions of North, Central and South America (L. affinis, L. hirsutissimus, L. luteolus, L. nanus, L. polyphyllus, L. mutabilis, L. mexicanus, L. elegans, two unidentified samples L. sp. from Ecuador); and (iii) two Florida endemic species (L. diffusus and L. villosus). Moreover, the DNA of three representatives from the Genista-Cytisus complex, sister group to Lupinus in the Genisteae tribe, was obtained: Retama sphaerocarpa, Cytisus battandieri and Genista erioclada. More details on geographic origins and reference numbers of these plant materials are presented in Supplementary Table S1.

2.2. High throughput sequencing, plastome assembly and annotation

The genomic DNA of L. albus, L. micranthus, L. atlanticus and L. princei was subjected to high-throughput sequencing using an Illumina HiSeq 2000 platform (BGI, Hong Kong). One flow cell containing a library of each species was used, yielding ∼11 million 100 bp paired-end (PE) reads (insert size = 500 bp) for each library, except L. micranthus, for which ∼5.5 million PE reads were obtained. De novo chloroplast genome (plastome) assemblies were performed using the PE Illumina reads and ‘The organelle assembler’ software (http://metabarcoding.org/asm (January 2015, date last accessed)), which is designed to assemble over-represented sequences such as organelle genomes (chloroplast or mitochondrion) or the rDNA cistron. Each draft plastome sequence was then verified and corrected by mapping the Illumina reads against each genome using Bowtie 2 v2.0. A few uncertain nucleotides were verified by Sanger sequencing. Plastome annotation was performed using DOGMA (Dual Organellar GenoMe Annotator, http://dogma.ccbb.utexas.edu (January 2015, date last accessed)) and by aligning each of the four newly constructed plastomes with the published L. luteus plastome (KC695666). A graphical representation of each plastome was drawn using Circos (Supplementary Figs S1–S4).

2.3. Identification of rps16 gene sequences in plant mitochondrion, chloroplast and nuclear genomes

Sequences of the rps16 gene were searched for in all non-parasitic plant mitochondrial, chloroplast and nuclear genomes available to date. Organelle and nuclear genomes were downloaded from GenBank (https://ncbi.nlm.nih.gov (November 2016, date last accessed)) and Phytozome v11 (https://phytozome.jgi.doe.gov/pz/portal.html (November 2016, date last accessed)), respectively. For the nuclear rps16 genes, the presence in the encoded proteins of a signal peptide targeting them to the organelles was tested using BaCelLo, ProteinProwler, TargetP, MultiLoc2 and Predotar. In addition, as the chloroplast rps16 gene has a subgroup IIB intron, we assessed whether this intron can be correctly spliced by verifying the presence of the strictly conserved splicing sites (GTGYG and AY at the 5ʹ and 3ʹ splice sites of the intron, respectively) in all chloroplast rps16 genes with a complete coding sequence (742 species).
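The splice-site check described above can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function name `classify_splice_sites` and the category labels are hypothetical, and the sketch assumes the intron sequence has already been extracted in the 5ʹ to 3ʹ orientation of the gene.

```python
import re

# Conserved intron boundaries quoted in the text: GTGYG at the 5' end
# and AY at the 3' end, where Y stands for C or T.
FIVE_PRIME = re.compile(r"^GTG[CT]G")
THREE_PRIME = re.compile(r"A[CT]$")


def classify_splice_sites(intron: str) -> str:
    """Classify an intron by which conserved splice sites it retains.

    The labels returned here are hypothetical names chosen for this sketch.
    """
    seq = intron.upper().replace("U", "T")  # accept RNA or DNA input
    ok5 = bool(FIVE_PRIME.search(seq))
    ok3 = bool(THREE_PRIME.search(seq))
    if ok5 and ok3:
        return "both_conserved"
    if ok5:
        return "3prime_mutated"
    if ok3:
        return "5prime_mutated"
    return "both_mutated"
```

A gene would be counted as retaining its splicing capacity only when the function returns `both_conserved`; the three other labels correspond to mutations at the 3ʹ site, the 5ʹ site, or both.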

2.4. Selective pressure acting on the nuclear and chloroplast rps16 genes

Within a subset of plants representing the main clades of Angiosperms (Arabidopsis lyrata: Brassicales; Citrus sinensis: Sapindales; Cucumis sativus: Cucurbitales; Glycine max: Fabales; Manihot esculenta: Malpighiales; Musa acuminata: Zingiberales; Oryza sativa: Poales; Panicum virgatum: Poales; Prunus persica: Rosales; Solanum lycopersicum: Solanales; Theobroma cacao: Malvales; Vitis vinifera: Vitales), we retrieved the functional nuclear and chloroplast rps16 gene sequences, which both encode the chloroplast RPS16 proteins. The different nuclear or chloroplast copies were aligned using Geneious v6.1.8 and the alignments were adjusted manually. Non-synonymous and synonymous nucleotide substitution rates were evaluated using the yn00 method implemented in PAML for the nuclear and chloroplast rps16 sequences. A list of the species considered and the accession numbers of the nuclear and chloroplast rps16 sequences used is presented in Supplementary Table S2. Ka/Ks analyses of the chloroplast rps16 gene were also performed using only the representatives of the following Angiosperm families: Asteraceae, Brassicaceae, Fabaceae, Poaceae and Solanaceae.

2.5. Sequence divergence among lupine plastomes

To identify the most variable regions among lupine chloroplast genomes, the five plastomes were aligned using Geneious v6.1.8 and pairwise comparisons between each of the five plastomes were performed to evaluate the percentage of identity in sliding window frames of 1 kb with a custom Python script. Using this script, insertion-deletions (indels) with a minimum size of 20 bp were identified. These large indels as well as the pairwise comparison results were represented graphically using Circos. Additionally, the five aligned plastomes were screened to identify autapomorphic (single) and shared indels of at least 2 bp, excluding regions with homopolymers or with ambiguous overlapping indels. Sequence divergence among the five lupine plastomes (including L. luteus) was also evaluated independently for intergenic spacers, introns, exons, rRNAs and tRNAs by calculating pairwise distances between homologous regions. Pairwise distances were determined with the ape package for R (available at: http://cran.r-project.org/web/packages/ape/ape.pdf) using the Kimura 2-parameter (K2p) evolution model for introns and intergenic spacers. Additionally, sequence divergence of protein-encoding sequences was estimated using the synonymous (Ks) and non-synonymous (Ka) nucleotide substitution rates with the yn00 method from the PAML package. Repeat sequences in each lupine plastome were identified using REPuter with parameters similar to those previously described for the analysis of Fabaceae plastomes, and excluding one copy of the IR. Palindromic sequences as well as dispersed direct and inverted repeats of a minimum length of 30 bp and presenting at least 90% sequence identity were identified (Hamming distance of three). Additionally, mono-, di-, tri-, tetra- and penta-nucleotide Short Sequence Repeats (SSRs) with a minimal size of 12 bp and a minimal repeat number of five were detected using the Phobos software implemented in Geneious v6.1.8.
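The custom script itself is not reproduced in the text, so the following Python sketch only illustrates the two operations it is described as performing: percentage identity in 1-kb windows of a pairwise alignment, and detection of indels of at least 20 bp. The function names, the `step` parameter, and the choices to count a gap opposite a base as a mismatch and to skip columns where both sequences carry a gap are assumptions made for this example.

```python
from typing import List, Tuple

GAP = "-"


def window_identity(a: str, b: str, window: int = 1000,
                    step: int = 1000) -> List[Tuple[int, float]]:
    """Return (window start, % identity) for two aligned sequences.

    Columns where both sequences have a gap are ignored; a gap facing a
    base counts as a mismatch (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(a), step):
        cols = [(x, y)
                for x, y in zip(a[start:start + window], b[start:start + window])
                if not (x == GAP and y == GAP)]
        if not cols:
            continue
        matches = sum(1 for x, y in cols if x == y)
        results.append((start, 100.0 * matches / len(cols)))
    return results


def large_indels(a: str, b: str, min_size: int = 20) -> List[Tuple[int, int, str]]:
    """Return (start, length, gapped sequence) for gap runs >= min_size."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    indels = []
    run_start, run_seq = 0, None  # run_seq is "a", "b" or None (no gap run)
    for i, (x, y) in enumerate(zip(a, b)):
        if x == GAP and y != GAP:
            current = "a"
        elif y == GAP and x != GAP:
            current = "b"
        else:
            current = None
        if current != run_seq:
            # A run has just ended: keep it if it is long enough.
            if run_seq is not None and i - run_start >= min_size:
                indels.append((run_start, i - run_start, run_seq))
            run_start, run_seq = i, current
    if run_seq is not None and len(a) - run_start >= min_size:
        indels.append((run_start, len(a) - run_start, run_seq))
    return indels
```

Positions are 0-based alignment columns. Setting `step` smaller than `window` would give overlapping windows; the text does not state which was used.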

2.6. PCR amplification of the rps16 gene and the most variable regions in lupines

Several primer pairs were designed using Primer 3.0 to examine the variation of the rps16 gene and two fast evolving regions (psaA-ycf4 and ycf1-rps15) in Lupinus. For rps16, the following primer pairs were used: F-CCGTCCCAGAGCATATTCAG, R-GCAACGATTCGATAAATGGC and F-CCCATTCATATCGAAGGAAAACT, R-CCATCATGTACTATTTACATCATCAATC and R-CTATATACAAGTCATCCACACCCTC. Within the fast evolving regions, primer pairs were designed to amplify four sub-regions (accD and ycf1 genes, ycf1-rps15 and trnF 5ʹ-3ʹ intergenic spacers): accD with F-GTCTATAAATACATTACCCCCG, R-TGTCTTCATCCATAGGATTCC; ycf1-rps15 with F-GATTTATGTTGCACAAACCG, R-CATTGATGGGTGGTGAGG; trnF-trnL with F-TTGAACTGGTGACACGAGG, R-TGGCGAAATTGGTAGACG. Because of the large size of the ycf1, two primer pairs were designed: ycf1 part1 with F-AATCAAGCAGAAAGTTATGGG, R-CTTACATCTTTTGAGCTTTCACTC; ycf1 part2 with F-GGAATGGAAGTAGAATTGCC, R-TTTTGTTTACGCGTCTTGT. PCR amplifications of these regions were carried out for 32 taxa (including three Genisteae outgroups, Supplementary Table S1) in a total volume of 50 μl, containing 5× Green GoTaq flexi Reaction Buffer (Promega), 0.2 mM of dNTP, 0.2 μM of each primer, 1.25 Unit of G2 flexi DNA polymerase (Promega), mqH2O and 20 ng of template DNA. Cycling conditions were 94 °C for 2 min followed by 35 cycles at 94 °C for 30 s, 48–52 °C (adapted according to the primer pairs used) for 30 s, 72 °C for 90 s and a final extension at 72 °C for 7 min. PCR products were purified using the NucleoSpin gel and PCR clean up kit (Macherey Nagel), following the manufacturer’s instructions. Purified PCR products were sequenced directly (in both directions) by Sanger at Macrogen Europe (Amsterdam, The Netherlands). All sequences were deposited in Genbank under the accession numbers, KX147685 to KX147753 and KX787895 to KX787910.

2.7. Phylogenetic analyses

For each of the four chloroplast regions investigated, all lupine sequences were aligned using MAFFT implemented in Geneious v6.1.8. The resulting alignments were adjusted manually. In addition, a concatenated data matrix was constructed using the sequences obtained from the four regions (accD and ycf1 genes, ycf1-rps15 and trnF 5ʹ-3ʹ intergenic spacers). These matrices were first subjected to phylogenetic analyses using Maximum Parsimony (MP). Bootstrap analyses were performed with 1,000 replicates. These data matrices were also subjected to Maximum Likelihood (ML) phylogenetic analyses. The best-fitting model of sequence evolution for each region (individual or concatenated) was determined using JModeltest, and ML analyses were then performed for each matrix with 1,000 bootstrap replicates using MEGA 6.0.

3. Results and discussion

3.1. Structure, organization and gene content of lupine plastomes

The Illumina PE reads obtained for L. albus, L. atlanticus, L. micranthus and L. princei were used to assemble their chloroplast genome sequences (deposited in GenBank under accession numbers KU726826, KU726827, KU726828 and KU726829, respectively). The four plastomes harbor a quadripartite structure (a Large Single Copy and a Small Single Copy separated by two IRs) with a total length ranging from 151,808 bp to 152,272 bp. As expected from previous PCR-based evidence, they all have the 36-kb inversion that occurred at the base of the Genistoid emergence or soon after. The different Lupinus plastomes have similar gene, intron and GC content (Table 1), as do most photosynthetic and non-parasitic angiosperm plastomes. The genes are distributed into three main categories: self-replication (58 genes), photosynthesis (47 genes) and other functions (six genes) (Supplementary Table S3). Among these genes, 76 are protein-encoding genes, 30 encode tRNAs and four encode rRNAs. None of the genes known to be lost or pseudogenized in other legume lineages, such as accD, psaI, ycf4, rpl23 or rpl33, are missing in the lupine plastomes. Interestingly, comparative analyses of the lupine plastomes (including L. luteus) revealed a likely loss of functionality of the rps16 gene in L. albus and L. micranthus but not in the other species. Both pseudogenes showed a deletion (verified by Sanger sequencing), which led to a pre-mature stop codon (19 and 20 amino acids earlier in L. albus and L. micranthus, respectively) within the functional domain of the RPS16 protein (Fig. 1). To determine whether these truncated rps16 genes in L. albus and L. micranthus are still functional, we used Pfam (Pfam-A, default parameters) to search for the presence of a functional domain in the five lupine species investigated in this study. The RPS16 functional domain could not be identified in L. albus and L. micranthus only, clearly suggesting that rps16 is a pseudogene in these two lupine species.
Recently, an additional route to the loss of functionality of the chloroplast rps16 gene was identified: the loss of its splicing capacity. In lupines, we observed that the rps16 intron does not have the correct splicing sites (Fig. 1). This suggests that rps16 is not functional in any of the five chloroplast genomes (all five lupine plastomes must therefore have 76 functional protein-coding genes), and that the loss of functionality most likely occurred first via the loss of the ability to splice the intron. Thereafter, additional mutations in L. albus and L. micranthus led to pre-mature stop codons. Determining whether this shared pre-mature stop codon was inherited from a common ancestor or results from independent mutational events requires a more accurate resolution of the phylogenetic relationships of these two species among the OW lupines (see later in the phylogenetic section). Sequencing of the rps16 gene in other lupines and closely related species revealed that another population of L. micranthus has a pseudo rps16, and that it is also defunct in L. angustifolius, Lupinus mariae-josephae, L. villosus and in a member of the Lupinus sister group, G. erioclada (data not shown).
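As an illustration of how a pre-mature stop codon of the kind described above can be flagged in a coding sequence, the short Python sketch below scans the reading frame for the first stop codon. It is not the procedure used in the study (which relied on alignment, Sanger verification and Pfam); the function names are hypothetical, and the sketch assumes a spliced coding sequence that starts in frame and uses the standard stop codons TAA, TAG and TGA.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop, or None."""
    seq = cds.upper().replace("U", "T")
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_premature_stop(cds: str) -> bool:
    """True when a stop codon occurs before the last complete codon."""
    idx = first_in_frame_stop(cds)
    n_codons = len(cds) // 3
    return idx is not None and idx < n_codons - 1
```

Comparing the index returned for a candidate pseudogene with that of a functional homologue gives the number of codons by which the reading frame is shortened.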
Table 1. Characteristics of Lupinus plastomes

| Plastome characteristics | L. luteus | L. albus | L. atlanticus | L. princei | L. micranthus |
|---|---|---|---|---|---|
| Overall size in bp | 151,894 | 151,921 | 152,272 | 152,243 | 151,808 |
| LSC size in bp (%) | 82,327 (54.2) | 82,280 (54.2) | 82,674 (54.3) | 82,663 (54.3) | 82,145 (54.1) |
| SSC size in bp (%) | 17,847 (11.7) | 17,841 (11.7) | 17,894 (11.8) | 17,876 (11.7) | 17,857 (11.8) |
| IR size in bp (%) | 25,860 (34.1) | 25,900 (34.1) | 25,852 (34) | 25,852 (34) | 25,903 (34.1) |
| Coding regions size in bp (%) | 90,217 (59.4) | 90,002 (59.2) | 90,125 (59.2) | 90,104 (59.2) | 90,083 (59.3) |
| Protein-coding region in bp (%) | 78,363 (51.6) | 78,148 (51.4) | 78,271 (51.4) | 78,250 (51.4) | 78,229 (51.5) |
| Introns size in bp (%) | 19,136 (12.6) | 19,115 (12.6) | 19,111 (12.6) | 19,121 (12.6) | 18,754 (12.4) |
| rRNA size in bp (%) | 9,056 (6) | 9,056 (6) | 9,056 (5.9) | 9,056 (5.9) | 9,056 (6) |
| tRNA size in bp (%) | 2,798 (1.8) | 2,798 (1.8) | 2,798 (1.8) | 2,798 (1.8) | 2,798 (1.8) |
| IGSs size in bp (%) | 42,541 (28) | 42,804 (28.2) | 43,036 (28.3) | 43,018 (28.3) | 42,971 (28.3) |
| No. of different genes | 110 | 110 | 110 | 110 | 110 |
| No. of different protein-coding genes | 76 | 76 | 76 | 76 | 76 |
| No. of different rRNA genes | 4 | 4 | 4 | 4 | 4 |
| tRNA genes | 30 | 30 | 30 | 30 | 30 |
| No. of different duplicated genes by IR | 17 | 17 | 17 | 17 | 17 |
| No. of different genes with introns | 18 | 18 | 18 | 18 | 18 |
| Overall % of GC content | 36.6 | 36.7 | 36.6 | 36.7 | 36.6 |
| % of GC content in protein-coding regions | 37.3 | 37.3 | 37.3 | 37.3 | 37.3 |
| % of GC content in introns | 36.3 | 36.9 | 36.8 | 36.8 | 36.8 |
| % of GC content in IGSs | 30.3 | 30.4 | 30.3 | 30.4 | 30.3 |
| % of GC content in rRNA | 55.3 | 55.3 | 55.3 | 55.3 | 55.3 |
| % of GC content in tRNA | 53.3 | 53.2 | 53.3 | 53.3 | 53.3 |
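The region sizes in Table 1 can be cross-checked against the quadripartite structure described in the text: the overall size should equal LSC + SSC + two IR copies. The Python sketch below performs this check with the values copied from the table; the helper names are hypothetical. The percentage listed in the IR row appears to refer to both IR copies together, which is an inference from the arithmetic rather than something stated in the table.

```python
# (LSC, SSC, one IR copy, reported overall size), all in bp, from Table 1.
PLASTOMES = {
    "L. luteus": (82327, 17847, 25860, 151894),
    "L. albus": (82280, 17841, 25900, 151921),
    "L. atlanticus": (82674, 17894, 25852, 152272),
    "L. princei": (82663, 17876, 25852, 152243),
    "L. micranthus": (82145, 17857, 25903, 151808),
}


def quadripartite_total(lsc: int, ssc: int, ir: int) -> int:
    """Overall plastome size for one LSC, one SSC and two IR copies."""
    return lsc + ssc + 2 * ir


def percent(part: int, total: int) -> float:
    """Share of the plastome occupied by a region, to one decimal place."""
    return round(100.0 * part / total, 1)


for species, (lsc, ssc, ir, reported) in PLASTOMES.items():
    computed = quadripartite_total(lsc, ssc, ir)
    status = "ok" if computed == reported else "mismatch"
    print(f"{species}: computed {computed} bp, reported {reported} bp ({status})")
```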
Figure 1

Comparison of lupine chloroplast rps16 coding-sequences with legume rps16 sequences (Glycine max and Lotus japonicus) and Cucumis sativus rps16 sequence (outgroup). The ribosomal protein S16 domain is indicated between brackets. The presence of a pre-mature stop codon within the rps16 functional domain of L. albus and L. micranthus is represented by a black asterisk. The black triangle denotes the position of the rps16 intron. It is important to note that the five lupine species present incorrect splicing sites.

3.2. Evolutionary dynamic and fate of the rps16 gene in plant mitochondrial, chloroplast and nuclear genomes

As in some Lupinus species, the chloroplast rps16 gene was missing in many other Fabaceae, including P. vulgaris and the IRLC. In this family, the chloroplast rps16 gene, which is essential for plant survival, has been functionally replaced by a nuclear gene that can encode both mitochondrial and chloroplast RPS16 proteins. To better understand the origin and evolutionary fate of rps16 genes residing in different genome compartments but with similar functions, we searched for plant chloroplast, mitochondrial and nuclear rps16 genes in the currently available nuclear and organelle genomes. In total, we investigated 52 nuclear, 289 mitochondrion and 1,166 chloroplast genomes from the non-parasitic brown and green plant lineages (Haptophytes, Stramenopiles, Glaucophytes, Rhodophytes, Chlorophytes and Streptophytes). Within all the sequenced mitochondrial genomes, no rps16 gene was found, whereas a functional (no pre-mature stop codon) nuclear rps16 gene copy was observed in all species investigated. The loss of the mitochondrial rps16 gene before the divergence of the Glaucophyta from Rhodophyta, Chlorophyta and Streptophyta lineages suggests that the transfer of rps16 from the mitochondrion to the nucleus occurred ∼1,500 million years ago (Fig. 2), which is much earlier than previously determined (i.e. before the emergence of angiosperms). As the rps16 gene was lost from the mitochondrion before divergence of the green lineage and as the nuclear rps16 gene encodes mitochondrion RPS16 proteins, its functional loss from the nuclear genome would be detrimental. As expected, a peptide signal targeting the nuclear encoded RPS16 proteins to the mitochondrion was identified in all species investigated, while the presence of a chloroplast target peptide was predicted in only a few species (Supplementary Table S4). 
However, it is likely that all species possess a nuclear rps16 gene whose product can be targeted to both the mitochondria and the plastids, as previously demonstrated. Indeed, it was shown experimentally that in two species that have lost the rps16 gene from their chloroplast (Medicago truncatula and Populus alba), and for which the nuclear-encoded RPS16 proteins were only predicted to be localized in the mitochondrion (Supplementary Table S4), the RPS16 proteins were targeted to both organelles. It is thus very likely that, despite the absence of a predicted chloroplast target peptide, nuclear-encoded RPS16 proteins are targeted to both mitochondria and chloroplasts.
Figure 2

Genome localization of the rps16 gene(s) encoding the mitochondrial and chloroplast RPS16 proteins in Archaeplastida. Early in Archaeplastida, the mitochondrial rps16 gene was transferred to the nucleus (nuc) and acquired a signal peptide targeting both mitochondrion (mt) and chloroplast (cp). In Glaucophytes and Red Algae, the mitochondrial rps16 gene is always absent, whereas the gene is present in 13 Rhodophyta chloroplast genomes (no plastome sequence available from Glaucophytes). In the core Chlorophyta lineage, none of the 76 species with fully sequenced chloroplast and mitochondrial genomes has an rps16 gene. In the Streptophyta lineage, no rps16 gene was found in the mitochondrial genomes, whereas the chloroplast rps16 gene may either be functional or have lost its functionality (complete gene loss, presence of a pre-mature stop codon or loss of the splicing capacity). The tree was redrawn from previously published phylogenies.

Within the chloroplast genomes investigated (1,166), the rps16 gene was found to be missing (total absence of the gene or truncated proteins due to a pre-mature stop codon) in 312 genomes. We looked for the presence of correct splicing sites in chloroplast rps16 genes with a complete coding sequence and an intron. We found that 434 had the splicing capacity and 306 did not. Among the latter, 197 exhibited mutations in both the 5ʹ and 3ʹ splice sites, whereas 22 and 87 had mutations only at the 5ʹ or the 3ʹ splice site, respectively (Supplementary Table S5). As previously observed, this gene has lost its functionality many times during flowering plant evolution, through the loss of either all or part of the coding sequence or of the splicing sites. Since ∼1,500 MYA, chloroplast RPS16 proteins could be produced by either the nuclear or the chloroplast rps16 gene. Our results highlight the fact that even though the chloroplast rps16 gene could have become non-functional in all plant genomes since then, it is still present and functional in most plants.
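The splice-site counts quoted above can be tallied as a quick consistency check. The Python sketch below uses only the numbers given in the text; the variable and function names are hypothetical.

```python
# Counts quoted in the text for chloroplast rps16 genes with a complete
# coding sequence and an intron.
splicing_intact = 434
splicing_lost = 306
lost_breakdown = {"both_sites": 197, "5prime_only": 22, "3prime_only": 87}


def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100.0 * part / whole, 1)


total_checked = splicing_intact + splicing_lost

# The three categories of splice-site mutation should add up to the
# number of genes reported as having lost their splicing capacity.
assert sum(lost_breakdown.values()) == splicing_lost

print("genes checked:", total_checked)
print("% without splicing capacity:", share(splicing_lost, total_checked))
for category, count in lost_breakdown.items():
    print(category, share(count, splicing_lost))
```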
To better understand the evolutionary dynamics of the rps16 gene, we analysed the selective pressure acting on functional chloroplast and nuclear rps16 genes among 12 species representing the main angiosperm clades. As the rps16 gene was functional in both chloroplast and nuclear genomes and as only the chloroplast copy is likely to be lost, the selective constraints acting on the chloroplast gene could be relaxed. However, the Ka/Ks ratios revealed strong purifying selection for both chloroplast and nuclear rps16 (average Ka/Ks ratio: 0.045 ± se 0.010 and 0.1707 ± se 0.003 for nuclear and chloroplast copies, respectively; Supplementary Tables S6 and S7). Ka/Ks ratios of cp-rps16 for each of the main Angiosperm families (Asteraceae, Brassicaceae, Fabaceae, Poaceae and Solanaceae) were also calculated for all cp-rps16 found with a complete coding sequence and a correctly spliced intron. Results were similar for each of the five families investigated and revealed strong purifying selection acting on all the tested datasets (average Ka/Ks ratios always lower than 0.23; Supplementary Table S8). A possible explanation for this strong negative selection pressure still acting on the chloroplast rps16 gene, and for the presence of a functional chloroplast rps16 gene in many plant genomes (in contrast with plant mitochondrial genomes), is that the chloroplast rps16 gene may function or be regulated slightly better than the nuclear gene under certain conditions. Although these results revealed the multiple states (absent, truncated, incorrectly spliced, functional) of the chloroplast rps16 gene across the plant kingdom and confirmed the hypothesis that the loss of splicing capacity is widespread among plant species, the mechanisms behind the conservation of the chloroplast copy in most species remain unknown. Indeed, the chloroplast rps16 gene is still essential in certain plant species, as revealed by knock-down studies in tobacco.
Different hypotheses have been proposed to explain the retention of some genes within the organelle genomes. The currently most widely accepted hypothesis is the Colocation of gene and gene product for Redox Regulation of gene expression (CoRR). This hypothesis concerns genes that are redox-dependent (such as rbcL; rps2,3,4,7,8,11,12,14,19; rpl2,14,16,20,36). However, the chloroplast rps16 gene has been found to be redox-independent. An alternative hypothesis that was considered concerns the retention of the ribosomal assembly genes in the organelle. A ‘core set’ of ribosomal genes was identified in all plants investigated; however, rps16 was not included. Another possible explanation for the retention of a functional chloroplast rps16 gene in many species may be the loss of the chloroplast target peptide of the nuclear-encoded RPS16 protein (despite the fact that the mitochondrion rps16 target signal remains retained).

3.3. Lupine plastome variability

3.3.1. Identification of single nucleotide polymorphisms and indels in lupine plastomes

To identify putative mutation hotspots, pairwise comparisons of the five lupine plastomes were performed and showed that they have a very high level of sequence identity (98% on average). The two African species (L. atlanticus and L. princei), which are the most closely related lupine species investigated in this study, exhibit the highest identity (99.7%), whereas the species with the lowest sequence identity are L. luteus and L. micranthus (97.9%). These comparisons also enabled identification of 164 non-ambiguous indels along the chloroplast genomes, including 14 with a size ranging from 20 to 357 bp. Of the 164 indels, 50% are 5–6 bp long. These analyses revealed two highly variable regions (Fig. 3). The first region spans from psaA to ycf4 (∼11.5 kb) and was already identified as a hypermutable region. This region contains 11 genes: psaA, ycf4, ycf3, trnS-GGA, psbI, psbK, trnQ-UUG, rps16, accD and psaI, of which four (accD, rps16, ycf4, psaI) were shown to be functionally lost in at least one legume species. The second most variable region includes the ycf1-rps15 genes (∼6.5 kb). The ycf1 gene, which encodes a translocon protein of the inner chloroplast membrane, is larger than 5 kb in lupines and is highly variable with the exception of a 5ʹ fragment duplicated in the IR (519 bp in lupines). This gene was recently identified as one of the most variable chloroplast genes in Angiosperms and is considered a powerful tool for DNA barcoding. The longest hypervariable region (psaA-ycf4) contains the highest number of indels (25), with 11 (among the 14 present in the genome) between 20 and 357 bp. Some of these large indels will most likely be useful to discriminate lupine species and/or groups of species from other Lupinus lineages or from other closely related genera.
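The pairwise comparison described above can be illustrated with a minimal Python sketch. It assumes two sequences that are already aligned (gaps written as "-") and computes identity over ungapped columns only; the function names, the identity definition and the toy sequences are our assumptions, not the pipeline used in the study.

```python
def pairwise_identity(a, b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def find_indels(a, b, gap="-"):
    """Return (start, length) for each run of columns gapped in exactly one sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    indels = []
    start, which = None, None
    for i, (x, y) in enumerate(zip(a, b)):
        if x == gap and y != gap:
            state = "a"
        elif y == gap and x != gap:
            state = "b"
        else:
            state = None
        if state != which:
            if which is not None:          # close the run that just ended
                indels.append((start, i - start))
            start, which = (i, state) if state else (None, None)
    if which is not None:                  # run extends to the end of the alignment
        indels.append((start, len(a) - start))
    return indels


# Toy aligned pair (hypothetical sequences, for illustration only).
seq_a = "ACGT--ACGTA"
seq_b = "ACGTTTAC-TC"
print(pairwise_identity(seq_a, seq_b))
print(find_indels(seq_a, seq_b))
```

Filtering the returned list by length (for example, keeping runs of 20 bp or more) would isolate the large indels discussed in the text.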
Figure 3

Pairwise comparison of lupine plastomes to identify single nucleotide polymorphisms and indels. The outer circle represents the gene map of lupine plastomes; boxes outside this first circle indicate a counterclockwise transcription direction, whereas boxes inside indicate a clockwise transcription direction. In the second circle, potentially informative sites are indicated by black bars. The following ten inner circles represent pairwise comparisons between the five available lupine plastomes; pairwise identity level is indicated and indels >20 bp are represented by black triangles. The central black circle represents the different parts of the chloroplast genome (LSC, SSC and IRs). The endpoints of the 50-kb inversion, specific to the Papilionoid legumes, and of the 36-kb inversion, specific to the Genistoid clade, are represented by arrows.


3.3.2. Sequence divergence between lupine plastomes

Pairwise distance (K2p) comparisons among the five lupine plastomes were calculated for non-coding sequences. As expected, the lowest rates of variation were observed for tRNA and rRNA (maximum K2p value: 0.0141, Supplementary Table S9). For introns (Supplementary Table S9, Supplementary Fig. S6A), average K2p rates ranged from 0.0006 (ndhB intron) to 0.0263 (clpP intron 1). Compared with previous K2p analyses, which estimated sequence divergence between L. luteus and other legume species, our overall K2p values obtained by comparing only lupine species are, as expected, significantly lower (Wilcoxon test, P-value = 0.05; see Fig. 4A). Among the five lupine species considered, the clpP intron 1, rpl16, rpoC1 and ndhA introns exhibit the highest K2p values. The most variable intron in lupines corresponds to the first intron of clpP, which also showed an accelerated mutation rate in Mimosoideae. The trnK and trnL introns previously used for legume and lupine phylogenies were found to exhibit high variation only when comparing L. luteus to other Fabaceae.
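For readers unfamiliar with the K2p measure, the following minimal Python sketch computes the standard Kimura 2-parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The function name and the choice to skip columns containing gaps or ambiguous bases are our assumptions; the software actually used in the study may handle such columns differently.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Columns where either sequence is not A, C, G or T are ignored.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy example: one transition in ten sites (hypothetical sequences).
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Averaging this distance over all ten species pairs for a given intron or spacer would give a mean K2p of the kind reported here.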
Figure 4

K2p mean values ± standard error for (A) introns and (B) intergenic spacers between (i) L. albus, L. atlanticus, L. luteus, L. micranthus and L. princei (black circles) and (ii) L. luteus, Phaseolus vulgaris, Pisum sativum, Vigna radiata, Glycine max, Lathyrus sativus, Cicer arietinum, Trifolium subterraneum, Medicago truncatula, Lotus japonicus and Millettia pinnata (black squares). The x-axis corresponds to intron and intergenic regions. Asterisks represent a statistically significant difference (P-value: 0.05).

K2p values for IGSs ranged from 0 to 0.0434 (Supplementary Table S9, Supplementary Fig. S6B). In comparison to the commonly used IGSs in legume phylogenetic studies (trnF_trnL, mean K2p = 0.0185, 428 bp; trnL_trnT, 0.0182, 633 bp; trnS_trnG, 0.0181, 799 bp), 36 IGS regions present higher K2p values, and 15 of them are larger than 300 bp and thus may be useful for phylogenetic studies. This analysis allowed detection of two relatively variable IGS sequences, corresponding to ycf1_rps15 (mean K2p = 0.0355, aligned length = 470 bp) and rpl32_ndhF (0.0322, 486 bp), that were not detected in previous analysis (Fig. 4B).

Non-synonymous (Ka) and synonymous (Ks) nucleotide substitution rates were calculated for protein-coding sequences, as well as the Ka/Ks ratio (Supplementary Tables S10–S12). The mean Ks among the five lupines studied ranged from 0 (petG, petL, petN, psaJ, psbF, psbI, psbM, rpl23, rpl33, rps7) to 0.05049 (psbT). All protein-encoding genes have a Ks value lower than 0.1. Similarly, the non-synonymous substitution rate (Ka) was lower than 0.025 for all genes. Finally, Ka/Ks ratios were calculated for each protein-coding region in order to determine the selective constraint acting on each gene.
Almost all genes evolved under high purifying selective constraint (53 of the 77 genes have a Ka/Ks ratio lower than 0.2), except for six genes (matK, rpoA, ycf2, rpoC2, accD and ycf1) that show a ratio > 0.5, including three genes evolving almost neutrally (matK, rpoA, ycf2; Supplementary Fig. S7). Except for ycf1 and ycf2, these genes were not identified as neutrally evolving between legumes and L. luteus. Comparison of the Ka/Ks ratios obtained when considering only lupine species with those obtained when comparing L. luteus to other legumes revealed 14 genes that exhibit higher Ka/Ks ratios between lupines than between L. luteus and legumes. Among these genes, only six are significantly higher (accD, ndhF, psbB, rbcL, rpoB and rps2; Fig. 5). However, detailed analysis of these genes (synonymous and non-synonymous substitution comparisons, and codon-based ML phylogenetic analyses; results not shown) did not reveal significantly accelerated mutation rates at either synonymous or non-synonymous sites. Conversely, the ycf4 gene and the flanking cemA and accD genes, which were found to be highly variable in the Lathyrus and Desmodium clades, were more stable, lacking major rearrangements in Lupinus. These results highlight that fast-evolving regions may strongly differ among clades within a family.
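The way Ka/Ks ratios are binned in this paragraph can be sketched in Python. The thresholds 0.2 and 0.5 are the ones quoted in the text; the function names, the category labels and the example Ka and Ks values are hypothetical. Genes with Ks = 0, which occur among lupines, are flagged as undefined because the ratio cannot be computed.

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks


def classify(ratio, strong=0.2, relaxed=0.5):
    """Bin a Ka/Ks ratio using the thresholds quoted in the text."""
    if ratio is None:
        return "undefined (Ks = 0)"
    if ratio < strong:
        return "strong purifying selection"
    if ratio <= relaxed:
        return "intermediate"
    return "relaxed constraint or near-neutral"


# Hypothetical (Ka, Ks) values, for illustration only.
examples = {"geneA": (0.002, 0.040), "geneB": (0.012, 0.015), "geneC": (0.000, 0.000)}
for gene, (ka, ks) in examples.items():
    print(gene, classify(ka_ks_ratio(ka, ks)))
```

A ratio near 1 is conventionally read as near-neutral evolution and a ratio well below 1 as purifying selection, which is the interpretation applied to the lupine genes above.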
Figure 5

Mean Ka/Ks ratio values ± standard error between homologous regions of (i) the five lupines L. albus, L. atlanticus, L. luteus, L. micranthus and L. princei (black circles) and (ii) L. luteus, Phaseolus vulgaris, Pisum sativum, Vigna radiata, Glycine max, Lathyrus sativus, Cicer arietinum, Trifolium subterraneum, Medicago truncatula, Lotus japonicus and Millettia pinnata (black squares). The x-axis corresponds to the CDS regions. Asterisks represent a statistically significant difference (P-value: 0.05).


3.3.3. Lupinus plastid sequences of phylogenetic utility

To explore the putative phylogenetic utility of different chloroplast regions, potentially informative sites (Pi) were evaluated among lupines in: (i) complete chloroplast genomes, (ii) protein-coding sequences, (iii) intergenic spacers, (iv) introns and (v) the two hypervariable regions (psaA-ycf4 and ycf1-rps15) (Table 2). Results revealed 666 Pi (among 2,874 variable sites) in the five aligned plastomes, which are distributed as follows: 45.3% of the Pi in IGS, 38.3% in CDS and only 11.4% in introns. The remaining five percent are located in tRNA and rRNA genes. The two hypervariable regions, psaA-ycf4 and ycf1-rps15, account for 8.7 and 14.3% of the total number of Pi, respectively. Molecular phylogenies were performed using either complete plastomes, introns, IGS or CDS, and revealed a similar topology (Supplementary Fig. S5), with the rough-seeded species (L. atlanticus and L. princei) in a well-supported clade (always with 100% bootstrap support) clearly distinct from the smooth-seeded species L. albus, L. luteus and L. micranthus. Among the latter, L. albus was always the closest Mediterranean lupine to the rough-seeded group (86–87% bootstrap value based on either IGS or complete plastome data). These results differ from previous phylogenies based on single or few genes (chloroplast and nuclear genes), which found L. micranthus to be the closest Mediterranean lupine to the rough-seeded species. In addition, the whole plastome phylogenies provide, for the first time, strong evidence (97–99% bootstrap support) of a common ancestor for L. micranthus, L. albus and the rough-seeded lupines, which are positioned as sister group to L. luteus. MP analyses (using PAUP4) of the five aligned lupine plastomes, with or without the 164 non-ambiguous indels (coded as 0 or 1 for the presence of a deletion or an insertion, respectively), led to the same results (not shown).
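To make the distinction between variable and potentially informative sites concrete, here is a minimal Python sketch. It assumes that Pi corresponds to the usual parsimony-informative criterion (at least two character states, each present in at least two sequences) and that gaps and ambiguous bases are ignored; both points are our assumptions, as are the function name and the toy alignment.

```python
from collections import Counter


def count_variable_and_informative(alignment, alphabet="ACGT"):
    """Return (variable_sites, informative_sites) for a list of aligned sequences."""
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    variable = informative = 0
    for column in zip(*alignment):
        # Tally unambiguous nucleotides only; gaps and N are skipped.
        counts = Counter(c for c in (x.upper() for x in column) if c in alphabet)
        if len(counts) < 2:
            continue                      # constant column
        variable += 1
        states_seen_twice = sum(1 for n in counts.values() if n >= 2)
        if states_seen_twice >= 2:
            informative += 1              # parsimony-informative column
    return variable, informative


# Toy alignment of five sequences (hypothetical, for illustration only).
toy = ["AAGA", "AAGA", "AACA", "ATCA", "AAC-"]
print(count_variable_and_informative(toy))
```

In the toy alignment the second column carries a singleton substitution (variable but not informative) and the third column splits the sequences two against three (informative), which mirrors why only 666 of the 2,874 variable sites are counted as Pi.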
To further investigate the phylogenetic utility of the most variable lupine regions identified, we amplified and sequenced five chloroplast regions (accD, two parts of the ycf1 gene and the ycf1-rps15 and trnF 5ʹ-3ʹ intergenic spacers, length ranging from 800 to 2,000 bp) from 16 lupine species. Each matrix was subjected to ML analysis. After verifying the absence of incongruence between the trees obtained for the five different regions, a concatenated matrix of all regions was analysed, following the conditional-combination approach. L. villosus and L. anatolicus were not considered in this analysis, as not all regions were amplified in these two species. The ML tree obtained from this latter matrix is presented in Fig. 6. Despite low resolution of the basal nodes, the topology is consistent with an early divergence of the lupines into two main lineages: the OW lineage comprising all the smooth- and rough-seeded Mediterranean and African taxa, which includes the representative of the Floridian species (L. diffusus), and the NW lineage composed of all American taxa from diverse origins (except L. diffusus). Within these two main lineages, most clades are consistent with previous phylogenies and some of them present very high support using these cpDNA data, such as: (i) the OW rough-seeded species (L. atlanticus, L. cosentinii, L. digitatus, L. pilosus and L. princei) with a bootstrap value of 99%; (ii) the Mediterranean smooth-seeded lupines L. luteus and L. hispanicus subsp. bicolor (which together form the lutei section) with 100% bootstrap support, linked to L. angustifolius as sister group; (iii) the clade including the Texan lupines and the eastern South American species (L. texensis, L. paraguariensis and L. gibertianus) with 100% bootstrap support; and (iv) a clade (100% bootstrap support) corresponding to the Western American and Mexican species (L. polyphyllus, L. mutabilis, L. mexicanus and the undetermined lupine from Ecuador).
Support for these clades is reinforced by synapomorphic indels (Fig. 6). Within the OW lineage, the Mediterranean smooth-seeded species do not form a distinct clade and appear as paraphyletic to the rough-seeded group and the Floridian L. diffusus; L. albus and L. micranthus are placed (with 88% bootstrap support) as the closest Mediterranean smooth-seeded lupines to the rough-seeded species. In this phylogeny, L. albus is sister to L. micranthus, with moderate bootstrap support (75%), rather than to the rough-seeded lupines (with a bootstrap support of 86–87%), as observed in the whole plastome based phylogenies (see above and Supplementary Fig. S5). This incongruence might be explained by the low number of taxa or by the different sequence datasets analysed in these phylogenies (Fig. 6, Supplementary Fig. S5). Further investigation is needed to resolve this phylogenetic uncertainty. Compared with previous Lupinus phylogenies based on chloroplast (matK, rbcL, trnL intron, trnL-trnF, trnS-trnG, trnT-trnL) or nuclear sequences (LEGCYC1A, LEGCYC1B, ITS1 + 2, GPAT1, GPAT2, SymRK, ETS), our five hypervariable chloroplast regions may not have revealed novel relationships but strongly reinforced support for some known clades (such as the West and the East American groups, or the OW rough-seeded section). Moreover, they provided additional and significant data supporting the singular Floridian unifoliolate lupines (represented here by L. diffusus), whose phylogenetic placement has always been questionable, as close relatives of the OW lupines rather than of the NW ones (at least from the maternally inherited plastome).
Finally, we showed the phylogenetic utility of these two identified regions but the consideration of a higher number of lupines and related species will allow for optimal exploitation of their potential to inform these phylogenies, and improve our knowledge of the evolutionary history of lupine and closely related Genistoid clades that are poorly investigated.
Table 2

Number of potentially informative sites in complete plastomes (cp), protein-coding sequences (CDS), intergenic spacers (IGS), introns as well as in the two hypervariable regions

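| Regions | Complete cp | CDS | IGS | Introns | tRNA-rRNA | psaA-ycf4 | ycf1-rps15 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Number of Pi | 666 | 255 | 302 | 76 | 33 | 58 | 95 |
| % of total Pi | 100 | 38.3 | 45.3 | 11.4 | 5 | 8.7 | 14.3 |
| Aligned length (bp) | 153,462 | 76,518 | 37,948 | 16,093 | 22,903 | 11,534 | 6,092 |
| % of Pi by region | 0.4 | 0.3 | 0.7 | 0.5 | 0.1 | 0.5 | 1.6 |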
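The "% of total Pi" row of Table 2 follows directly from the counts in the "Number of Pi" row, as the short Python sketch below shows. The dictionary layout and variable names are ours; the counts are those given in the table.

```python
# Number of potentially informative sites (Pi) per region, from Table 2.
pi_counts = {
    "CDS": 255,
    "IGS": 302,
    "Introns": 76,
    "tRNA-rRNA": 33,
    "psaA-ycf4": 58,
    "ycf1-rps15": 95,
}
total_pi = 666  # complete plastome

# The four sequence classes partition the plastome; the two hypervariable
# regions overlap them and are therefore excluded from the sum.
partition = ["CDS", "IGS", "Introns", "tRNA-rRNA"]
partition_sum = sum(pi_counts[name] for name in partition)
print("sum over the four classes:", partition_sum)

share_of_total = {name: round(100 * n / total_pi, 1) for name, n in pi_counts.items()}
for name, share in share_of_total.items():
    print(f"{name}: {share}% of total Pi")
```

The four sequence classes add up to the 666 Pi of the complete plastome, whereas the two hypervariable regions are subsets that span several classes.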
Figure 6

Maximum likelihood unrooted tree (General Time Reversible model, rates Gamma distributed with Invariant Sites, 1,000 bootstraps) of concatenated regions (part of accD and ycf1 genes, ycf1-rps15 IGS and trnF-trnL regions). Bootstrap support values are indicated above branches. Grey diamonds represent indels specific to a node. The numbers above the diamonds indicate the number of additional indels supporting the node, with corresponding indel sizes (in bp) between brackets. The Old World (OW) and New World (NW) ancestral nodes are indicated on the tree by solid black points.


3.3.4. Repeated sequences

Repeated sequences are known to play a major role in genome evolution. In chloroplast genomes, repeats are involved in various structural rearrangements, such as inversions, insertions or deletions. These structural modifications sometimes lead to pseudogenization or duplication as well as to plastome expansion or contraction. The most striking example of the involvement of repeat sequences in genome size change was observed in Geraniaceae, where the plastome size varies from 128,787 bp in Monsonia speciosa to 217,942 bp in Pelargonium hortorum. In Fabaceae, repeat sequences were also shown to be involved in LSC extension in Mimosoids or related to structural rearrangements, as in Trifolium subterraneum, which presents numerous reorganization events and a very high percentage of repeated elements (20% of its genome). Recently, a 29 bp IR in two trnS genes was found to be at the origin of a large 36 kb inversion discovered in L. luteus (and Genisteae), through a flip-flop recombination event. This inversion was regarded as a new powerful clade marker for most Genistoids in legumes, and our study confirmed the presence of this inversion in the four additional Lupinus plastomes investigated here. Since these short inverted repeats (separated by at least 30 kb) are present in almost all known Fabaceae plastomes, it has been underlined that such inversion events could have occurred and could occur again elsewhere via the same mechanism. Interestingly, an independent 39 kb inversion was recently discovered at exactly the same location in Robinia pseudoacacia among 13 taxa investigated. This result confirms the potential of such repeats in plastome dynamics, and demonstrates that large inversions, even if rare, might result from independent events in distantly related taxa, such as here in Robinioids and Genistoids, biasing their phylogenetic utility.
Despite the homoplasious nature of these inversions, such remarkable parallel inversions could be cautiously used as clade evolutionary markers in each of the affected lineages. Because of the importance of repeated elements in plastome evolution (particularly in Fabaceae), we investigated the type and number of repeats present in each Lupinus plastome using REPuter. A total of 142 repeats were identified across the five Lupinus species. These repeats, which are relatively well distributed along the plastome sequences (Fig. 7), were divided into three categories: (i) palindromes (60 repeats), (ii) forward repeats (45 repeats) and (iii) reverse repeats (37 repeats). Although all five chloroplast genomes show a relatively similar number of repeats (24 in L. albus to 33 in L. princei), which confirms previous results, we identified three, four, nine and six repeats specific to L. atlanticus, L. albus, L. micranthus and L. luteus, respectively.
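The three repeat categories can be illustrated with a small Python sketch that labels the relationship between two equal-length repeat copies. It assumes the common convention in which a forward repeat is an identical copy, a reverse repeat is the same sequence read backwards and a palindromic repeat is the reverse complement; the function names are ours and the sketch does not reproduce REPuter's search algorithm.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def repeat_type(copy1, copy2):
    """Classify two repeat copies as forward, palindromic or reverse.

    Returns None when the copies match under none of the three relations.
    Self-symmetric sequences can satisfy several relations; the first
    matching label in the order below is returned.
    """
    a, b = copy1.upper(), copy2.upper()
    if a == b:
        return "forward"
    if a == reverse_complement(b):
        return "palindromic"
    if a == b[::-1]:
        return "reverse"
    return None


# Toy repeat copies (hypothetical, for illustration only).
print(repeat_type("AAGCT", "AAGCT"))
print(repeat_type("AAGCT", "AGCTT"))
print(repeat_type("AAGCT", "TCGAA"))
```

The category totals reported in the text (60 palindromes, 45 forward and 37 reverse repeats) add up to the 142 repeats identified across the five plastomes.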
Figure 7

Distribution of repeated sequences and potentially informative SNPs in lupine plastomes. From the outermost to the innermost circle. First circle: representation of gene content; second circle: potentially informative sites; third circle: SSRs (circles correspond to mononucleotides, squares stand for dinucleotides and triangles represent trinucleotides); fourth circle: direct repeats interspaced by < 3 kb; fifth circle: inverted repeats interspaced by < 3 kb; sixth circle: palindromic repeats. In the middle, full and dotted lines represent direct and inverted dispersed repeats (separated by > 3 kb), respectively. The endpoints of the 50-kb inversion, specific to the Papilionoid legumes, and of the 36-kb inversion, specific to the Genistoid clade, are represented by arrows.

The number of repeats found across lupine plastomes is much lower than the number observed in some other legumes, such as T. subterraneum, M. truncatula, G. max, Pisum sativum, Lathyrus sativus and Cicer arietinum, which contain ∼500, 190, 100, 74, 78 and 75 repeats of a similar size, respectively. It is thus not surprising to observe a more conserved genome size and gene content, and less structural rearrangement, in the Lupinus genus. Distribution of these repeats (mononucleotides, dispersed and palindromic) along plastomes was characterized, revealing that around 70% of the repeats are in the LSC, whereas 20% and 10% are localized in the SSC and IR, respectively. Within the plastome, most of the repeats are situated in the highly variable regions of the LSC, in the rps12-trnV intergenic spacer of the IR and in the intron of ndhA in the SSC. The five lupine plastomes exhibit a similar pattern of repeat distribution, with more than half of the repeats localized in non-coding regions, around 30% in protein-coding sequences and around 15% in introns.
Only two dispersed repeats (shared by all lupines) were found in tRNA genes, including the inverted repeat found to be involved in the 36-kb inversion. While performing these analyses, we also paid particular attention to Simple Sequence Repeats (SSRs or microsatellites), which are particularly interesting in a wide range of genetic studies in population genetics, plant evolution and domestication, or for the estimation of gene and pollen flow. We found between 37 (L. princei) and 51 (L. luteus) microsatellites longer than 12 bp in lupine plastomes, with mononucleotide repeats representing between 70 and 80% of these microsatellites, compared with only 16–30% and 0–5% for di- and tri-nucleotide SSRs, respectively. The ycf1 gene, which corresponds to the most variable lupine plastome region, is the richest region in SSRs, with ∼20% of the total microsatellites. In comparison, the second hypervariable region (psaA-ycf4) presents only zero to five percent of SSRs (Supplementary Table S13). Among the 213 SSRs identified within the five plastomes, nine are perfectly shared by all species and two additional shared SSRs vary in size, whereas the others are species- or group-specific and thus represent potentially useful markers. Taken together, the various kinds of markers revealed by this study (Single Nucleotide Polymorphisms or SNPs, indels, repeats and inversions) represent important resources of genetic/genomic markers with which to deepen our investigations of Lupinus and its Genistoid allies, and for comparative analyses in legumes.
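A simple way to screen a plastome sequence for mono-, di- and trinucleotide SSRs is sketched below in Python with regular expressions. The function name, the default minimum length of 12 bp (taken here as "at least 12 bp", which is our reading of the threshold), the rule discarding motifs that are themselves homopolymers and the toy sequence are all assumptions; this is not the SSR detection tool used in the study.

```python
import math
import re


def find_ssrs(seq, min_length=12, motif_sizes=(1, 2, 3)):
    """Return sorted (start, motif, n_repeats) tuples for perfect SSRs.

    An SSR is reported when a motif of size k is repeated in tandem often
    enough to span at least `min_length` bp.
    """
    seq = seq.upper()
    hits = []
    for k in motif_sizes:
        min_repeats = max(2, math.ceil(min_length / k))
        # One captured motif followed by at least (min_repeats - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if k > 1 and len(set(motif)) == 1:
                continue  # e.g. "AA" repeated is a mononucleotide run
            hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Toy sequence with one SSR of each class (hypothetical, for illustration only).
toy = "GGC" + "A" * 13 + "CGC" + "AT" * 6 + "GCA" + "TTC" * 4 + "G"
for start, motif, n in find_ssrs(toy):
    print(start, motif, n)
```

Counting the hits per motif size across the five plastomes would give class proportions comparable to the mono-, di- and trinucleotide percentages reported above.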

4. Conclusion and further perspectives

In this work, four additional lupine chloroplast genomes were sequenced, assembled and analysed at different levels. This study provides novel insights into the chloroplast genome evolutionary dynamics in the poorly studied Genistoid clade. Our results revealed highly conserved structure and gene content among the five Lupinus species with the exception of the rps16 gene, which is very likely pseudogenized in the different lupine species investigated. Detailed surveys of the mitochondrial, nuclear and chloroplast genomes available to date revealed that the rps16 gene is absent from all plant mitochondria, strongly suggesting that this gene was functionally replaced by the nuclear rps16 gene since the divergence of plants. In contrast to the mitochondrial copy, the chloroplast rps16 gene is still present in many plants but has lost its functionality many times independently. Analysis of the evolution rate of functional rps16 genes present in both the nuclear and chloroplast genomes of some representative angiosperm species revealed that these genes are both under purifying selection, whereas a relaxed selective constraint was expected for the chloroplast copy. Comparative analyses of lupine plastomes also enabled identification of two hypervariable regions: psaA-ycf4 (11.5 kb) and ycf1-rps15 (6.5 kb). We demonstrate that these regions, which contain a high number of potentially informative sites and the highest number of SSRs, were highly consistent with, and reinforced the support for, previous phylogenies. The analyses of the short repeated sequences present in Lupinus plastomes allowed us to identify different types of chloroplast markers that could be very useful, low cost and easy to use for studying the genetic diversity and evolutionary history of lupines or Genistoids.