
Comparative genomics of ten solanaceous plastomes.

Harpreet Kaur1, Bhupinder Pal Singh1, Harpreet Singh2, Avinash Kaur Nagpal1.   

Abstract

The availability of complete plastid genomes of ten solanaceous species, Atropa belladonna, Capsicum annuum, Datura stramonium, Nicotiana sylvestris, Nicotiana tabacum, Nicotiana tomentosiformis, Nicotiana undulata, Solanum bulbocastanum, Solanum lycopersicum, and Solanum tuberosum, provided us with an opportunity to conduct an in-depth in silico comparative analysis. The complete chloroplast genomes and the LSC and SSC regions of the three Solanum species are comparatively smaller than those of any other species studied to date (exception: SSC region of A. belladonna). The AT content of coding regions was found to be lower than that of noncoding regions. A duplicate copy of the trnH gene in C. annuum and two alternative tRNA genes for proline in D. stramonium were observed for the first time in this analysis. Further, homology search revealed the presence of the rps19 pseudogene and infA genes in A. belladonna and D. stramonium, a region identical to the rps19 pseudogene in C. annuum, and orthologues of the sprA gene in another six species. Among the eighteen intron-containing genes, 3 genes have two introns and 15 genes have one intron. The longest insertion was found in the accD gene of C. annuum. Phylogenetic analysis using concatenated protein-coding sequences gave two clades, one for the Nicotiana species and another for Solanum, Capsicum, Atropa, and Datura.


Year:  2014        PMID: 25477958      PMCID: PMC4248371          DOI: 10.1155/2014/424873

Source DB:  PubMed          Journal:  Adv Bioinformatics        ISSN: 1687-8027


1. Introduction

Chloroplasts are essential organelles of plant cells that possess the enzymatic machinery for photosynthesis, which provides essential energy to plants. Besides photosynthesis, chloroplasts are also involved in the biosynthesis of fatty acids, amino acids, pigments, and vitamins [1, 2]. Despite enormous divergence in whole plant form and habitat, chloroplast structure and function have remained remarkably conserved, which might be due to intense evolutionary selection pressures associated with the functional requirements of photosynthesis [3-7]. The chloroplast genome is a reduced genome derived from a cyanobacterial ancestor that was captured early in the evolution of the eukaryotic cell [8, 9]. Among the three genomes of the plant cell, the plastome is the most gene dense, with more than 100 genes in a genome of only 120 to 210 kb [10]. In the last two decades, the nucleotide sequences of a large number of plastid genomes have been published, leading to a better understanding of their organization and evolution [2, 11, 12]. Currently, about 470 eukaryotic chloroplast genomes have been sequenced completely (http://www.ncbi.nlm.nih.gov/genomes/GenomesHome.cgi?taxid=2759&hopt=html), with the best representation from flowering plants. Most land plant chloroplast genomes are composed of a single circular chromosome with a quadripartite structure, which includes two copies of an inverted repeat (IR) region that separate the large and small single copy regions (LSC and SSC). Genes of the chloroplast genomes of higher plants can be divided into three broad categories [13, 14]. The first comprises genetic system genes encoding rRNAs, tRNAs, ribosomal proteins, and RNA polymerase subunits. The second comprises genes for photosynthesis, which encode subunits of the two photosystems, the cytochrome b6f complex, and the ATP synthase. Open reading frames (orfs) of unknown function constitute the third category. Besides these, there are some other genes coding for different kinds of proteins, including infA, matK, clpP, cemA, accD, and ccsA.

Although overall chloroplast genome organization is highly conserved among taxa, structural rearrangements due to inversions have been reported in different taxa such as Campanulaceae [15], Cyatheaceae [16], Fabaceae [17], Funariaceae [18], Geraniaceae [19], Onagraceae [20], and Poaceae [21, 22]. Besides structural rearrangements, sequence polymorphisms have also been reported in some cereals [23, 24] and Oenothera species [20]. These studies revealed that highly divergent sequences were concentrated in specific regions called “hotspots.” Such sequence polymorphisms have been used to derive phylogenetic relationships among species.

Solanaceae is an important family of dicots comprising more than 3000 species placed within about 90 genera. It is an ethnobotanically important family that is extensively utilized by humans and has recently become a model for comparative and evolutionary genomics research. Few efforts have been made to study the variations in chloroplast genomes of the Solanaceae family using in silico tools. Most of these attempts have concentrated on comparing a newly sequenced chloroplast genome with the complete chloroplast genomes available from some members of this family [25-29]. The availability of complete nucleotide sequences of plastid genomes of ten solanaceous species, Atropa belladonna (NC_004561.1; [30]), Capsicum annuum (NC_018552.1; [29]), Datura stramonium (NC_018117.1; Li et al. (unpublished)), Nicotiana sylvestris (NC_007500.1; [28]), Nicotiana tabacum (NC_001879.2; [31]), Nicotiana tomentosiformis (NC_007602.1; [28]), Nicotiana undulata (NC_016068.1; [32]), Solanum bulbocastanum (NC_007943.1; [26]), Solanum lycopersicum (NC_007898.2; [27]), and Solanum tuberosum (NC_008096.2; [33]), provided us with an opportunity to conduct an in-depth in silico comparative analysis.

Hence, the present study is an attempt to compare the genome organization, structure, and coding capacity of the chloroplast genomes of ten solanaceous species. The study focuses on length mutations, intron-containing genes, grouping of genes into different identity classes based on pairwise comparison of individual genes, and InDel analysis of divergent genes.

2. Materials and Methods

2.1. Sequence Analysis

Whole chloroplast genome sequences, as well as individual gene and protein sequences, of the ten solanaceous species were obtained from the “Organelle Genome Resources” section of NCBI in GenBank as well as FASTA format. Sequence regions corresponding to various genomic features, including genes, exons, introns, and CDS, were extracted from the GenBank files using the Extractfeat, Extractseq, and Featcopy programs from the Jemboss package. AT percentage for the different genomic regions was calculated using the Wordcount and Union programs from the Jemboss package. Pairwise comparison of gene sequences was done using the NCBI BLAST program, and multiple sequence alignment of nucleotide as well as protein sequences was done using ClustalW. Alignments of protein sequences for some of the genes were manually edited to correspond to the InDels observed in the alignments of their nucleotide sequences.
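To make the AT-percentage step concrete, the following is a minimal Python sketch of the calculation. It is not the Jemboss Wordcount/Union workflow used in this study; the function names `at_percent` and `pooled_at_percent` are hypothetical, and the pooling of several regions into one sequence before counting is an assumption about how a per-category value (for example, all introns) could be obtained.

```python
from collections import Counter


def at_percent(seq: str) -> float:
    """AT content (%) of one sequence; symbols other than A, C, G, T are ignored."""
    counts = Counter(seq.upper())
    at = counts["A"] + counts["T"]
    total = at + counts["G"] + counts["C"]
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 100.0 * at / total


def pooled_at_percent(seqs) -> float:
    """AT content (%) of several regions pooled into one sequence (e.g. all introns)."""
    return at_percent("".join(seqs))


# Toy sequences, not taken from any of the ten plastomes.
print(at_percent("ATGCATTA"))               # 6 of 8 bases are A or T
print(pooled_at_percent(["ATAT", "GCGC"]))  # 4 of 8 bases are A or T
```

Ambiguity codes such as N are skipped rather than counted in the denominator; a different convention would shift the percentages slightly.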

2.2. Phylogenetic Analysis of Concatenated Protein-Coding Genes

Of the 79 classified protein-coding genes, 75 were selected for phylogenetic analysis of the plastomes of the ten solanaceous species and two outgroup species (Daucus carota and Coffea arabica); accD, rpl20, ycf1, and ycf15 were excluded. ycf15 was excluded because it is absent from the plastomes of both outgroup species, while the other three were excluded because of their high levels of variation. A multiple sequence alignment of each gene was obtained using ClustalW (https://www.ebi.ac.uk/Tools/msa/clustalw2/). These alignments were then concatenated using standalone BIOEDIT version 7.25 (http://www.mbio.ncsu.edu/bioedit/bioedit.html), and a maximum likelihood phylogenetic tree with 500 bootstrap iterations was constructed using PhyML v3.0 (http://www.atgc-montpellier.fr/phyml/). A graphical view of the tree was generated using Archaeopteryx 0.988 SR (https://sites.google.com/site/cmzmasek/home/software/archaeopteryx).
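The concatenation step can be illustrated with a short Python sketch. The study performed it in BIOEDIT; the code below is only an illustration of the idea, with a hypothetical function name `concatenate_alignments` and toy data. Padding a missing taxon with gaps is an assumption added for generality and would not be triggered here, because genes absent from the outgroups were excluded beforehand.

```python
def concatenate_alignments(gene_alignments, taxa):
    """Join per-gene alignments (gene -> taxon -> aligned sequence) taxon by taxon."""
    parts = {taxon: [] for taxon in taxa}
    for gene, aln in gene_alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        width = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon lacking this gene is padded with gap characters.
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy alignments with hypothetical gene and taxon names.
toy = {
    "geneA": {"t1": "ATG-", "t2": "ATGC"},
    "geneB": {"t1": "GG", "t2": "GA"},
}
print(concatenate_alignments(toy, ["t1", "t2"]))
```

Checking that every gene alignment has a single width before joining catches the most common source of a misaligned supermatrix.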

3. Results and Discussion

3.1. Comparison of Properties of Chloroplast Genomes

Table 1 compares the plastid genomes of the ten solanaceous species with respect to genome size (complete plastid genome and LSC, SSC, and IR regions); the percentages of coding regions, introns, and intergenic regions; and the AT content of the overall plastid genomes as well as of coding and noncoding regions. The total plastid genome size ranged from 155296 bp (S. tuberosum) to 156781 bp (C. annuum). The large plastome of C. annuum can be attributed to its LSC region, which is larger than that of the other species. In contrast, the SSC region of C. annuum was the smallest among the species compared. The largest IR region was found in A. belladonna. Among the four Nicotiana species studied, N. sylvestris and N. tabacum were almost identical to each other in the size of the complete genome (a difference of only 2 bp) and of the LSC, SSC, and IR regions, more so than in comparison with the plastome of any other species studied. However, the percent coding region was slightly higher in N. sylvestris (61.49%) than in N. tabacum (61.12%). The complete chloroplast genome and the LSC and SSC regions of the three Solanum species are comparatively smaller than those of any other species studied, except that the SSC regions of A. belladonna (18008 bp) and C. annuum (17849 bp) were smaller still. However, the IR region of the Solanum species is larger than that of the Nicotiana species. Coding region percentage was found to be higher in the Nicotiana species than in all other species, with a maximum for N. undulata (63.12%) and a minimum for S. tuberosum (58.45%). The highest intron percentage (12.82%) was observed in S. bulbocastanum, whereas the lowest (11.62%) was observed in D. stramonium. The highest percentage of intergenic region (29.19%) was observed in D. stramonium and the lowest (24.19%) in N. undulata.

The AT content of noncoding regions was found to be higher than that of coding regions for all ten species studied. Similarly, protein-coding regions showed a higher AT content than RNA-coding genes, which can be explained by the requirement of more GC base pairs for proper folding of highly structured ribosomal RNAs and tRNAs [13-27]. Comparison of AT content in the LSC, SSC, and IR regions reveals that AT content was the highest in the SSC regions and the lowest in the IR regions. Some earlier studies have also shown a similar distribution, with the lowest AT content in the IR region and the highest in the SSC region [2, 27, 34, 35]. The low AT content of the IR regions reflects the low AT content of the four ribosomal RNA genes in this region.
Table 1

Properties of the solanaceous chloroplast genomes.

| Property | ABE | CAN | DST | NSY | NTA | NTO | NUN | SBU | SLY | STU |
|---|---|---|---|---|---|---|---|---|---|---|
| Genome size (bp) | 156687 | 156781 | 155871 | 155941 | 155943 | 155745 | 155863 | 155371 | 155461 | 155296 |
| LSC (bp) (coordinates)* | 86869 (1–86869) | 87366 (1–87366) | 86297 (1–86297) | 86684 (1–86684) | 86686 (1–86686) | 86392 (1–86392) | 86633 (1–86633) | 85785 (1–85785) | 85882 (1–85882) | 85737 (1–85737) |
| IRB (bp) (coordinates)* | 25905 (86870–112774) | 25783 (87367–113149) | 25563 (86298–111860) | 25342 (86685–112026) | 25343 (86687–112029) | 25429 (86393–111821) | 25331 (86634–111964) | 25588 (85786–111373) | 25608 (85883–111490) | 25593 (85738–111330) |
| SSC (bp) (coordinates)* | 18008 (112775–130782) | 17849 (113150–130998) | 18448 (111861–130308) | 18573 (112027–130599) | 18571 (112030–130600) | 18495 (111822–130316) | 18568 (111965–130532) | 18381 (111374–129754) | 18363 (111491–129853) | 18373 (111331–129703) |
| IRA (bp) (coordinates)* | 25905 (130783–156687) | 25783 (130999–156781) | 25563 (130309–155871) | 25342 (130600–155941) | 25343 (130601–155943) | 25429 (130317–155745) | 25331 (130533–155863) | 25588 (129755–155342) | 25608 (129854–155461) | 25593 (129704–155296) |
| Coding regions (%) | 58.89 | 58.50 | 59.19 | 61.49 | 61.12 | 61.58 | 63.12 | 58.52 | 58.91 | 58.45 |
| Introns (%) | 12.51 | 12.71 | 11.62 | 12.70 | 12.70 | 12.68 | 12.69 | 12.82 | 12.47 | 12.49 |
| Intergenic regions (%) | 28.60 | 28.79 | 29.19 | 25.81 | 26.18 | 25.73 | 24.19 | 28.66 | 28.62 | 29.06 |
| AT content (%): | | | | | | | | | | |
| Overall | 62.44 | 62.27 | 62.12 | 62.15 | 62.15 | 62.21 | 62.12 | 62.12 | 62.14 | 62.12 |
| Coding regions | 59.86 | 59.68 | 59.65 | 59.85 | 59.79 | 59.79 | 59.70 | 59.61 | 59.65 | 59.59 |
| Noncoding regions | 66.13 | 65.93 | 65.72 | 65.84 | 65.87 | 66.09 | 66.27 | 65.66 | 65.71 | 65.68 |
| tRNAs | 47.70 | 47.38 | 47.08 | 47.06 | 47.05 | 47.10 | 47.08 | 47.12 | 47.01 | 47.06 |
| rRNAs | 44.64 | 44.73 | 44.63 | 44.64 | 44.64 | 44.64 | 44.64 | 44.66 | 44.66 | 44.65 |
| Protein-coding genes | 62.01 | 61.83 | 61.79 | 61.91 | 61.86 | 61.84 | 61.68 | 61.76 | 61.80 | 61.74 |
| LSC | 64.37 | 64.25 | 64.04 | 64.05 | 64.05 | 64.12 | 64.01 | 63.99 | 64.01 | 63.99 |
| SSC | 68.35 | 67.99 | 67.72 | 67.94 | 67.93 | 68.03 | 67.87 | 67.87 | 67.97 | 67.91 |
| IR | 57.14 | 56.94 | 56.87 | 56.78 | 56.78 | 56.84 | 56.78 | 56.93 | 56.91 | 56.90 |

ABE: Atropa belladonna, CAN: Capsicum annuum, DST: Datura stramonium, NSY: Nicotiana sylvestris, NTA: Nicotiana tabacum, NTO: Nicotiana tomentosiformis, NUN: Nicotiana undulata, SBU: Solanum bulbocastanum, SLY: Solanum lycopersicum, STU: Solanum tuberosum, LSC: large single copy region, SSC: small single copy region, and IR: inverted repeat region.

*Start and end position of nucleotide in the genome.
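Because the quadripartite structure implies that the four region lengths add up to the genome size, the values in Table 1 can be cross-checked arithmetically. The sketch below does this in Python with the lengths copied from Table 1; the names `TABLE1` and `region_sum_gaps` are hypothetical, and the check is offered only as a consistency test of the tabulated numbers, not as part of the original analysis.

```python
# (LSC, IRB, SSC, IRA, reported genome size) in bp, copied from Table 1.
TABLE1 = {
    "ABE": (86869, 25905, 18008, 25905, 156687),
    "CAN": (87366, 25783, 17849, 25783, 156781),
    "DST": (86297, 25563, 18448, 25563, 155871),
    "NSY": (86684, 25342, 18573, 25342, 155941),
    "NTA": (86686, 25343, 18571, 25343, 155943),
    "NTO": (86392, 25429, 18495, 25429, 155745),
    "NUN": (86633, 25331, 18568, 25331, 155863),
    "SBU": (85785, 25588, 18381, 25588, 155371),
    "SLY": (85882, 25608, 18363, 25608, 155461),
    "STU": (85737, 25593, 18373, 25593, 155296),
}


def region_sum_gaps(table):
    """Return species whose genome size differs from LSC + IRB + SSC + IRA, with the gap in bp."""
    gaps = {}
    for species, (lsc, irb, ssc, ira, genome) in table.items():
        gap = genome - (lsc + irb + ssc + ira)
        if gap != 0:
            gaps[species] = gap
    return gaps


print(region_sum_gaps(TABLE1))  # {'SBU': 29}
```

Nine species sum exactly. For S. bulbocastanum the tabulated regions end at position 155342, 29 bp short of the reported genome size of 155371 bp; the table alone does not indicate which value is responsible for the difference.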

3.2. Gene Content of Solanaceous Chloroplast Genomes

The genes present in the different regions of the plastid genomes are highly conserved except for several open reading frames [6, 26, 36]. There are typically 111 genes, 5 hypothetical chloroplast reading frames (ycfs), and a few open reading frames (orfs). Some of our unique findings are discussed below.

The trnP-GGG gene, which codes for a proline tRNA, was annotated only in D. stramonium, whereas its alternative, trnP-UGG, was annotated in all species including D. stramonium (NC_018117.1; Li et al. (unpublished)). We searched all the species for a similar sequence using BLAST, but no similar sequence was found in any other species. A region annotated as a trnH-coding gene was reported only in C. annuum. In all other species this region was reported to be part of the ycf2 gene, as it also was in C. annuum. These observations indicate the presence of a duplicate copy of the trnH gene sequence in C. annuum and two alternative tRNA genes coding for proline in D. stramonium. However, no other evidence was found in databases that this particular region codes for trnH.

The rps19 pseudogene was reported in three species, namely, N. tomentosiformis, S. bulbocastanum, and S. tuberosum. All other species were searched for a similar pseudogene using the BLAST pairwise algorithm, which confirmed the presence of the rps19 pseudogene in A. belladonna, C. annuum, and D. stramonium. The presence of the pseudogene may be attributed to the expansion of IRB into the LSC region. In three species, namely, N. sylvestris, N. tabacum, and N. undulata, the rps19 pseudogene was found to be absent.

infA is annotated as a pseudogene in all species except A. belladonna, D. stramonium, and S. lycopersicum, whereas it is annotated as a protein-coding gene in S. bulbocastanum. Homology search with the infA sequence from S. bulbocastanum against the plastomes of A. belladonna and D. stramonium revealed an identical sequence in both species. The sprA gene has been annotated for N. sylvestris, N. tomentosiformis, S. lycopersicum, and S. tuberosum. Identical orthologous gene sequences were found in A. belladonna, C. annuum, D. stramonium, N. tabacum, N. undulata, and S. bulbocastanum using BLAST search.

3.3. Split Genes

A total of eighteen split genes have been reported. The sizes of exons and introns for these genes in all the solanaceous species studied are summarized in Table 2. The rps12 gene is divided such that its 5′ end exon is located in the LSC region, whereas the second and third exons are located in the IR region. Maturation of the RNA transcript requires a trans-splicing mechanism between exon 1 and exon 2 [34, 37]. Among the eighteen intron-containing genes, ycf3, clpP, and rps12 contain two introns, whereas the other 15 genes contain only one intron. According to Kim and Lee [34], the trnL-UAA intron is a self-splicing group I intron, whereas all other introns belong to group II. Generally, exon sizes were conserved and variability was observed in the intron regions; ndhB, however, was highly conserved in length for both exons and introns.
Table 2

The lengths of introns and exons for the split genes of ten solanaceous species.

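| Gene (region) | Exon/intron | ABE | CAN | DST | NSY | NTA | NTO | NUN | SBU | SLY | STU |
|---|---|---|---|---|---|---|---|---|---|---|---|
| trnK-UUU (LSC) | Exon I | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 |
| | Intron I | 2519 | 2500 | 2506 | 2526 | 2526 | 2526 | 2521 | 2501 | 2514 | 2512 |
| | Exon II | 36 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| rps16 (LSC) | Exon I | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
| | Intron I | 822 | 865 | 866 | 860 | 860 | 860 | 859 | 855 | 864 | 855 |
| | Exon II | 227 | 227 | 227 | 218 | 218 | 218 | 218 | 227 | 227 | 227 |
| trnG-UCC (LSC) | Exon I | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| | Intron I | 692 | 692 | 694 | 692 | 692 | 690 | 691 | 701 | 695 | 692 |
| | Exon II | 48 | 48 | 48 | 48 | 48 | 48 | 48 | 37 | 48 | 48 |
| atpF (LSC) | Exon I | 145 | 145 | 145 | 145 | 145 | 145 | 145 | 144 | 144 | 145 |
| | Intron I | 715 | 693 | 700 | 695 | 695 | 692 | 692 | 693 | 686 | 693 |
| | Exon II | 410 | 410 | 410 | 410 | 410 | 410 | 410 | 411 | 411 | 410 |
| rpoC1 (LSC) | Exon I | 432 | 453 | 453 | 453 | 453 | 432 | 453 | 453 | 453 | 453 |
| | Intron I | 737 | 742 | 737 | 737 | 737 | 709 | 733 | 737 | 737 | 737 |
| | Exon II | 1614 | 1614 | 1614 | 1614 | 1614 | 1614 | 1623 | 1614 | 1614 | 1614 |
| ycf3 (LSC) | Exon I | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 | 124 |
| | Intron I | 739 | 742 | 740 | 739 | 738 | 731 | 735 | 730 | 729 | 727 |
| | Exon II | 230 | 230 | 230 | 230 | 230 | 230 | 230 | 230 | 230 | 230 |
| | Intron II | 763 | 744 | 753 | 783 | 783 | 779 | 781 | 750 | 750 | 750 |
| | Exon III | 153 | 153 | 159 | 153 | 153 | 153 | 153 | 153 | 153 | 153 |
| trnL-UAA (LSC) | Exon I | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 37 | 35 | 35 |
| | Intron I | 497 | 426 | 501 | 503 | 503 | 497 | 498 | 502 | 497 | 497 |
| | Exon II | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 |
| trnV-UAC (LSC) | Exon I | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 |
| | Intron I | 572 | 575 | 569 | 571 | 571 | 572 | 573 | 569 | 571 | 571 |
| | Exon II | 35 | 35 | 37 | 35 | 35 | 35 | 35 | 37 | 35 | 35 |
| rps12* | Exon I | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 | 114 |
| | Intron I | | | | | | | | | | |
| | Exon II | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 | 232 |
| | Intron II | 535 | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 | 536 |
| | Exon III | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 | 26 |
| clpP (LSC) | Exon I | 71 | 71 | 71 | 71 | 71 | 71 | 71 | 71 | 71 | 71 |
| | Intron I | 799 | 811 | 792 | 807 | 807 | 789 | 789 | 789 | 798 | 789 |
| | Exon II | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 | 292 |
| | Intron II | 622 | 626 | 624 | 637 | 637 | 634 | 631 | 625 | 617 | 620 |
| | Exon III | 228 | 228 | 234 | 228 | 228 | 228 | 228 | 234 | 258 | 234 |
| petB (LSC) | Exon I | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| | Intron I | 759 | 755 | 746 | 753 | 753 | 753 | 753 | 747 | 747 | 747 |
| | Exon II | 642 | 642 | 642 | 642 | 642 | 642 | 642 | 642 | 642 | 642 |
| petD (LSC) | Exon I | 8 | 8 | 9 | 8 | 8 | 8 | 8 | 6 | 8 | 8 |
| | Intron I | 742 | 742 | 748 | 742 | 742 | 742 | 742 | 739 | 738 | 739 |
| | Exon II | 475 | 475 | 474 | 475 | 475 | 475 | 475 | 477 | 475 | 475 |
| rpl16 (LSC) | Exon I | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| | Intron I | 1019 | 1026 | 1025 | 1020 | 1020 | 1021 | 1020 | 1014 | 1018 | 1014 |
| | Exon II | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 | 396 |
| rpl2 (IR) | Exon I | 391 | 391 | 393 | 391 | 391 | 391 | 391 | 390 | 391 | 391 |
| | Intron I | 664 | 665 | 669 | 666 | 666 | 666 | 666 | 666 | 666 | 666 |
| | Exon II | 434 | 434 | 429 | 434 | 434 | 434 | 434 | 435 | 434 | 434 |
| ndhB (IR) | Exon I | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 | 777 |
| | Intron I | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 | 679 |
| | Exon II | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 | 756 |
| trnI-GAU (IR) | Exon I | 37 | 37 | 42 | 37 | 37 | 37 | 37 | 42 | 37 | 37 |
| | Intron I | 717 | 722 | 717 | 707 | 707 | 716 | 716 | 717 | 722 | 722 |
| | Exon II | 34 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| trnA-UGC (IR) | Exon I | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 | 38 |
| | Intron I | 681 | 811 | 811 | 709 | 709 | 709 | 709 | 811 | 811 | 811 |
| | Exon II | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| ndhA (SSC) | Exon I | 553 | 553 | 552 | 553 | 553 | 553 | 553 | 552 | 553 | 553 |
| | Intron I | 1150 | 1157 | 1154 | 1148 | 1148 | 1149 | 1148 | 1158 | 1133 | 1158 |
| | Exon II | 539 | 539 | 537 | 539 | 539 | 539 | 539 | 540 | 539 | 539 |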

*rps12 is a divided gene: the 3′ part (3′-rps12) is located in the IR region, while the 5′ part (5′-rps12) is located in the LSC region.

ABE: Atropa belladonna, CAN: Capsicum annuum, DST: Datura stramonium, NSY: Nicotiana sylvestris, NTA: Nicotiana tabacum, NTO: Nicotiana tomentosiformis, NUN: Nicotiana undulata, SBU: Solanum bulbocastanum, SLY: Solanum lycopersicum, and STU: Solanum tuberosum.
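Length variation in Table 2 can be summarized per feature by simple arithmetic. The following Python sketch uses three intron rows copied from Table 2; the names `INTRON_LENGTHS`, `shortfall_from_longest`, and `length_spread` are hypothetical, and comparing each species with the longest observed length is an illustrative choice rather than the procedure used in this study.

```python
SPECIES = ["ABE", "CAN", "DST", "NSY", "NTA", "NTO", "NUN", "SBU", "SLY", "STU"]

# Intron I lengths (bp) copied from Table 2, in the species order above.
INTRON_LENGTHS = {
    "trnA-UGC": [681, 811, 811, 709, 709, 709, 709, 811, 811, 811],
    "ndhB": [679, 679, 679, 679, 679, 679, 679, 679, 679, 679],
    "trnK-UUU": [2519, 2500, 2506, 2526, 2526, 2526, 2521, 2501, 2514, 2512],
}


def shortfall_from_longest(lengths):
    """Difference (bp) between the longest observed length and each species' length."""
    longest = max(lengths)
    return [longest - n for n in lengths]


def length_spread(lengths):
    """Range of lengths (bp) across species; 0 means the length is invariant."""
    return max(lengths) - min(lengths)


trna = dict(zip(SPECIES, shortfall_from_longest(INTRON_LENGTHS["trnA-UGC"])))
print(trna["ABE"], trna["NSY"], trna["CAN"])
for gene, lengths in INTRON_LENGTHS.items():
    print(gene, length_spread(lengths))
```

For trnA-UGC the shortfalls are 130 bp in A. belladonna and 102 bp in the four Nicotiana species, and the ndhB intron has a spread of 0 bp. Equal lengths do not by themselves show that two sequences are identical, so this is a summary of size variation only.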

3.4. Pairwise Comparison of Plastid Genes of Solanaceae and InDel Analyses

Pairwise comparison of nucleotide sequences of individual gene sequences (45 combinations) for 116 genes was also performed to classify genes based on percent identity. Supplementary Table 1 (Supplementary Material available online at http://dx.doi.org/10.1155/2014/424873) shows grouping of genes in different clusters based on percent identity in pairwise comparison. Genes which showed 100% identity in comparison were considered as highly conserved and the genes showing less than 95% identity at least once in the comparison were considered as highly divergent. These highly divergent genes were further explored at nucleotide as well as at protein level to probe the variations in detail. A total of 11 highly divergent genes were found whereas the number of highly conserved genes varied from 26 (for species pair: N. tomentosiformis and S. lycopersicum) to 107 (for species pair: N. sylvestris and N. tabacum). Most of the tRNA genes were found to be highly conserved. Genes accD, cemA, clpP, ndhA, rpl32, rpl36, rps16, sprA, trnA-UGC, trnL-UAA, and ycf1 were found to be highly diverged. Tables 3 and 4 describe the summary of InDels observed in nucleotide and amino acid sequences, respectively. Partial multiple sequence alignment of 9 genes and 5 proteins is shown in Figures 1 and 2, respectively. The longest insertion of 141 bp was observed in accD gene sequence of C. annuum. Since genes clpP, ndhA, rps16, and trnL-UAA contained introns, it was important to examine whether these InDels were present in exon or intron region. It was found that all the InDels reported in ndhA and trnL-UAA were present in introns whereas, in case of clpP, InDel 24 was located in exon of the gene. Similarly, the first and last InDels of gene rps16 were present in exons of the gene. Keeping in view the observations in number and length of InDels in nucleotide and protein sequences of genes under consideration, the variation for individual genes is discussed below.
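The classification rule described above can be written down compactly. The Python sketch below is an illustration under stated assumptions: the percent identities in `toy` are invented values for hypothetical genes, the function names are not from the original study, and the identities themselves would come from BLAST pairwise comparisons that are not reproduced here.

```python
from itertools import combinations

SPECIES = ["ABE", "CAN", "DST", "NSY", "NTA", "NTO", "NUN", "SBU", "SLY", "STU"]
PAIRS = list(combinations(SPECIES, 2))  # every unordered species pair


def is_highly_divergent(identities, threshold=95.0):
    """True if identity falls below the threshold in at least one species pair."""
    return any(value < threshold for value in identities)


def conserved_count(identity_by_gene, pair):
    """Number of genes showing 100% identity for one species pair."""
    return sum(1 for per_pair in identity_by_gene.values() if per_pair[pair] == 100.0)


# Invented identities for two hypothetical genes and two of the 45 pairs.
toy = {
    "geneX": {("NSY", "NTA"): 100.0, ("NTO", "SLY"): 93.8},
    "geneY": {("NSY", "NTA"): 100.0, ("NTO", "SLY"): 100.0},
}
print(len(PAIRS))                                   # 45
print(is_highly_divergent(toy["geneX"].values()))   # True
print(conserved_count(toy, ("NSY", "NTA")))         # 2
```

Ten species give 10 × 9 / 2 = 45 unordered pairs, which matches the 45 combinations stated in the text. Note that "highly conserved" is counted per species pair, whereas "highly divergent" is a property of a gene across all pairs.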
Table 3

InDels in nucleotide sequences of 9 genes of ten solanaceous plastid genomes.

| S. number | Gene | Location | Total number of InDels | InDel length (bp) |
|---|---|---|---|---|
| 1 | accD | LSC | 4 | 24, 9, 141, 6 |
| 2 | clpP | LSC | 24 | 8(I), 14(I), 13(I), 7(I), 1(I), 2–3(I), 7(I), 1–7(I), 3(I), 2(I), 3(I), 1–7(I), 1–3(I), 1(I), 1(I), 1(I), 1–5(I), 4–7(I), 1(I), 9(I), 1–2(I), 3(I), 5(I), 24–30 |
| 3 | ndhA | SSC | 14 | 9(I), 5–6(I), 3(I), 1(I), 9(I), 3(I), 4(I), 1–4(I), 1–2(I), 1–23(I), 1–2(I), 2(I), 1(I), 3(I) |
| 4 | rpl32 | SSC | 2 | 2–3, 4 |
| 5 | rps16 | LSC | 11 | 1–38, 9(I), 1(I), 1(I), 5(I), 1–2(I), 5(I), 4(I), 6(I), 1(I), 9 |
| 6 | sprA | SSC | 2 | 109, 7 |
| 7 | trnA-UGC | IR | 1 | 102–130 |
| 8 | trnL-UAA | LSC | 4 | 1, 6, 71, 4 |
| 9 | ycf1 | SSC | 31 | 3, 18, 18, 21, 6, 6, 48, 9, 6, 6, 42, 3, 6, 30, 3, 15, 12–39, 18, 6, 9–36, 6, 6, 6, 9, 9, 12, 6, 6, 6, 57, 6 |

Location: region of the plastome in which the gene lies (LSC, SSC, or IR). (I): InDel present in an intron.

Table 4

InDels in amino acid sequences of 5 proteins of ten solanaceous plastid genomes.

| S. number | Protein | Total number of InDels | InDel length (aa) |
|---|---|---|---|
| 1 | accD | 4 | 8, 3, 47, 2 |
| 2 | clpP | 2 | 2, 10 |
| 3 | rpl32 | 1 | 1–2 |
| 4 | rps16 | 1 | 3 |
| 5 | ycf1 | 29 | 1, 6, 6, 7, 2, 2, 7, 3, 2, 2, 14, 1–10, 1, 5, 4–13, 6, 2, 3–12, 2, 2, 2, 3, 3, 4, 2, 2, 2, 19, 2 |
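The relation between the nucleotide-level InDels of Table 3 and the amino-acid-level InDels of Table 4 follows from the triplet code: an exonic InDel whose length is a multiple of 3 adds or removes whole codons, while any other length shifts the reading frame. The Python sketch below checks this for accD using the values from both tables; the function name `codon_effect` is hypothetical, and the rule applies only to InDels located in exons.

```python
def codon_effect(indel_bp):
    """Classify an exonic InDel by its length in bp.

    Returns ("in-frame", number_of_codons) when the length is a multiple of 3,
    otherwise ("frameshift", None).
    """
    codons, remainder = divmod(indel_bp, 3)
    if remainder == 0:
        return ("in-frame", codons)
    return ("frameshift", None)


ACCD_BP = [24, 9, 141, 6]  # accD InDel lengths in bp (Table 3)
ACCD_AA = [8, 3, 47, 2]    # accD InDel lengths in amino acids (Table 4)

print([codon_effect(n) for n in ACCD_BP])
print([codon_effect(n)[1] for n in ACCD_BP] == ACCD_AA)  # True
print(codon_effect(1))  # a 1 bp insertion cannot be in-frame
```

The same arithmetic gives 3 amino acids for the 9 bp rps16 deletion, in agreement with Table 4, while a 1 bp insertion is classified as a frameshift.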
Figure 1

Partial multiple sequence alignment of accD, clpP, ndhA, rpl32, rps16, sprA, tRNA-Ala (UGC), tRNA-Leu(UAA), and ycf1 gene sequences of ten solanaceous species showing location of InDels indicated by hyphens.

Figure 2

Partial multiple sequence alignment of amino acid sequences of five proteins, namely, accD, clpP, rpl32, rps16, and ycf1, of ten solanaceous species showing location of InDels indicated by hyphens.

(1) accD. A total of four InDels were observed in the accD gene, as depicted in Figure 1. An insertion of 24 bp was present in all Nicotiana species and D. stramonium, followed by an insertion of 9 bp in all Solanum species, indicating stronger sequence conservation at the genus level. These insertions were also reported by Chung et al. [25]. A 141 bp insertion was observed specifically in C. annuum; it has also been reported by Jo et al. [29] and confirmed by RT-PCR. Similarly, a species-specific deletion of 6 bp was found in D. stramonium. All these InDels were also reflected in the corresponding protein sequences, as shown in Figure 2. The accD gene has been reported to be one of the most variable plastid genes and is probably under diversifying selection [26].

(2) clpP. In the clpP gene, InDels were found in both intron and exon regions. Two major consequences were observed for the InDels in the exon regions. An insertion of 6 bp in S. bulbocastanum and S. tuberosum and of 30 bp in S. lycopersicum at the 3′ end (exon 3) of the clpP gene shifted the stop codon downstream by 6, 6, and 30 bp in the respective species compared to the other species of the Solanaceae family, increasing the length of the coding sequence and the protein product (Figures 1 and 2). An interesting feature was observed as InDel 1 in the protein sequence, corresponding to insertion of a repeat of “I” amino acids in D. stramonium that makes the exon 3 region longer by 6 bp. This region, however, corresponds to the end of intron 2 of the clpP gene in all other species. Since the D. stramonium chloroplast genome has been sequenced recently, this observation needs to be experimentally validated.

(3) ndhA. All the InDels found in ndhA were present in introns, while the protein-coding regions (exons) were highly conserved. This indicates high diversifying selection on the intronic region of this gene. Of the 14 InDels, most were observed with respect to C. annuum (InDels 1, 2, 5, 7, and 10). InDel 10 was shared by C. annuum and S. lycopersicum in full and by C. annuum and A. belladonna in part.

(4) rpl32. In rpl32, an insertion of 1 bp in D. stramonium and of 3 bp in C. annuum was found in the 3′ region of the gene, while a deletion of 4 bp was observed in D. stramonium. The insertion of 3 bp in C. annuum only altered the length of the protein, making it longer by 1 amino acid. However, the small insertion of 1 bp in D. stramonium proved to be a frameshift mutation resulting in three changes in the amino acid sequence near the C-terminus. Moreover, the deletion of 4 bp at the 3′ end resulted in premature termination of protein synthesis. Together, the frameshift mutation and the 3′ end deletion reduced the length of the gene product by 1 amino acid. As the C-terminus of the amino acid chain is well conserved in all the other species, the effect of the above mentioned variations needs to be validated experimentally.

(5) rps16. In rps16, InDels were likewise observed in introns as well as exons. Five of the major insertions in the intron regions were species specific. Insertions of 38 bp (InDel 1), 9 bp (InDel 2), 5 bp (InDel 7), 4 bp (InDel 8), and 6 bp (InDel 9) were observed in A. belladonna, S. lycopersicum, D. stramonium, and C. annuum. A deletion of 5 bp was observed in all three Solanum species and C. annuum. A deletion of 9 bp was observed in all Nicotiana species, resulting in an amino acid change (P to S) and shortening of the protein by three amino acids in the C-terminal region. A similar deletion has also been observed by Kahlau et al. [27] and was suggested to be functionally neutral.

(6) sprA. The sprA gene has been reported to encode a stable noncoding RNA of unknown function. This gene has been suggested to influence 16S rRNA maturation [38, 39]. In many species this gene seems to be present as a remnant and shows large variations in its 5′ region. The largest deletion, of 109 bp, was observed in C. annuum. The rest of this gene appears to be more conserved, with a deletion towards the 3′ end in all Nicotiana species and A. belladonna. The manner in which this gene functions and the consequences of the above mentioned variations are yet to be investigated experimentally.

(7) trnA-UGC. In this gene a long deletion of 102 bp was observed in all Nicotiana species. Interestingly, this deletion was further extended to 130 bp in both directions in A. belladonna. These deletions were found in the intron region and so are unlikely to have any negative impact on gene product function.

(8) trnL-UAA. The trend of variation in trnL-UAA was similar to that in ndhA, as all InDels were observed in introns. The longest species-specific deletion (InDel 3) was observed in C. annuum, whereas a short insertion of four nucleotides, a repeat of “T,” was observed specifically in D. stramonium. Another insertion of 6 bp was observed in two Nicotiana species, that is, N. sylvestris and N. tabacum.

(9) ycf1. Many InDels (3′ region) were found in ycf1, the fastest evolving gene. Most of the InDels were found to be species specific. The most InDels (InDels 2, 3, 5, 7, 9, 16, 17, 19, 23, 24, 25, and 30) were observed in C. annuum, followed by D. stramonium (InDels 4, 6, 20, 26, and 31), N. tomentosiformis (InDels 8, 17, and 18), and S. lycopersicum (InDels 16, 22, and 28). Two genus-specific InDels (InDels 14 and 21) were observed in all four Nicotiana species. However, InDel 19 was also present in D. stramonium. Another genus-specific InDel (InDel 29) was observed in the Solanum species. Most of the InDels altered the length of the gene product, with a maximum length of 1906 amino acids (aa) observed in C. annuum and a minimum of 1873 aa observed in D. stramonium. Among the Solanum species, the length of the protein (1887 aa) was conserved between S. bulbocastanum and S. tuberosum. However, S. lycopersicum had an amino acid sequence of 1891 aa, longer by 4 aa than in the other two species of the same genus. Among the four Nicotiana species, the length of the ycf1 gene product was conserved among N. sylvestris, N. tabacum, and N. undulata, with protein lengths of 1901 aa, 1902 aa, and 1901 aa, respectively. However, N. tomentosiformis was observed to be the most variable member of the genus Nicotiana, with a protein length of 1892 aa.
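Since InDels appear as runs of hyphens in a multiple sequence alignment, their positions, lengths, and the set of species sharing them can be read off programmatically. The Python sketch below shows one way to do this on a toy alignment; the function names `gap_runs` and `taxa_with_gap` and the sequences are hypothetical, and the study itself identified InDels from ClustalW alignments with manual editing.

```python
import re


def gap_runs(aligned_seq):
    """Return (start, length) for each run of '-' in an aligned sequence (0-based start)."""
    return [(m.start(), m.end() - m.start()) for m in re.finditer(r"-+", aligned_seq)]


def taxa_with_gap(alignment, start, length):
    """Taxa whose aligned sequence consists only of gaps over the given columns."""
    return sorted(
        taxon
        for taxon, seq in alignment.items()
        if set(seq[start:start + length]) == {"-"}
    )


# Toy alignment with invented sequences and taxon labels.
toy_alignment = {
    "sp1": "ATG---CCA",
    "sp2": "ATG---CCA",
    "sp3": "ATGAAACCA",
}
print(gap_runs("ATG---CC-A"))               # [(3, 3), (8, 1)]
print(taxa_with_gap(toy_alignment, 3, 3))   # ['sp1', 'sp2']
```

A gap shared by all members of one genus and absent elsewhere would correspond to what the text calls a genus-specific InDel, while a gap found in a single taxon would be species specific.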

3.5. Phylogenetic Analysis of Solanaceous Plastomes

Evolutionary relationships between diverse plant species have been analyzed using several plastome markers, including matK and rbcL (genes) or trnH-psbA and trnL-trnF (intergenic regions), because they combine sequence conservation among plant taxa with suitable variation [40, 41]. However, determination of phylogeny based on single gene sequences may be inaccurate [42]. The availability of complete chloroplast sequences for many species has made it possible to use many individual genes or concatenated gene sequences to deduce phylogenetic relationships among taxa [42-44]. Efforts have been made to carry out phylogenetic analysis of solanaceous species using complete plastome sequences by Moore et al. [44] and Jansen et al. [45]. Evolutionary positions of Capsicum and Datura in Solanaceae have been determined using a single or a few plastid genes [46, 47]. Recently, concatenated protein-coding gene sequences from completely sequenced plastomes were used to obtain reasonable phylogenetic relationships for solanaceous species [29]. In the present investigation we used a similar approach to analyze the phylogenetic relationships of ten solanaceous species with completely sequenced plastomes. Individual multiple sequence alignments were concatenated for maximum likelihood phylogenetic tree generation. As depicted in Figure 3, the taxa were divided into two clades with 100% bootstrap support (500 replicates). The first clade consisted of the four Nicotiana species, while the species of Solanum, Capsicum, Atropa, and Datura were included in the second clade. These results are in line with previous phylogenetic analyses using concatenated protein-coding gene sequences as well as phylogenetic analyses using plastid ndhF and trnL-F sequences [29, 47]. However, in an analysis of 13 orfs of solanaceous plastomes, a different arrangement was shown in which Atropa was separated from Solanum and grouped together with Nicotiana [25].
Figure 3

Maximum likelihood phylogenetic tree derived using concatenated nucleotide sequences of 75 protein-coding genes of ten solanaceous species and two outgroup species.
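The bootstrap values reported for the tree can be understood through a small conceptual sketch. In nonparametric bootstrapping, alignment columns are resampled with replacement, a tree is inferred from each resampled alignment, and the support of a clade is the percentage of replicates in which it appears. The Python code below illustrates only the resampling and the percentage; it is not PhyML's implementation, the tree inference step is omitted, and the function names and toy alignment are hypothetical.

```python
import random


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (alignment: taxon -> aligned sequence)."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {taxon: "".join(seq[i] for i in cols) for taxon, seq in alignment.items()}


def support_percent(replicate_has_clade):
    """Percentage of bootstrap replicates whose tree contains a given clade."""
    flags = list(replicate_has_clade)
    return 100.0 * sum(flags) / len(flags)


# Toy alignment; a fixed seed keeps the resampling reproducible.
toy = {"a": "ACGT", "b": "ACGA"}
replicate = bootstrap_columns(toy, random.Random(0))
print(replicate)

# A clade recovered in all 500 replicates has 100% support.
print(support_percent([True] * 500))
```

Each resampled alignment keeps the original number of columns and keeps every column intact across taxa, so only the weighting of sites changes between replicates.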

4. Conclusions

The analyses of the complete plastid genomes of ten solanaceous species revealed overall similarity in terms of gene content and organization. The sizes of the LSC, SSC, and IR regions were found to be somewhat conserved among species, but a significant variation was found between genera. Most of the coding regions were well conserved. However, many of the features in a few genes were observed to be typical of a particular genus or even species and can be used as molecular markers to investigate genetic diversity and evolution. These typical variations can be further utilized to develop more sophisticated DNA barcoding based techniques. The ten solanaceous species are divided into two clades on the basis of phylogenetic analysis using a concatenated alignment of gene sequences from the coding regions of the plastomes. The first clade consisted of the four Nicotiana species and the second clade consisted of the species of Solanum, Capsicum, Atropa, and Datura.

Supplementary Table 1 shows the grouping of genes into different clusters based on percent identity in pairwise comparison. Ten clusters were made depending upon the percentage identity observed between the genes, ranging from 80% (minimum identity observed for a given gene between any two species) to 100%.
References (first 10 of 38 listed)

Review 1.  Why have organelles retained genomes?

Authors:  H L Race; R G Herrmann; W Martin
Journal:  Trends Genet       Date:  1999-09       Impact factor: 11.639

2.  Chloroplast DNA inversions and the origin of the grass family (Poaceae).

Authors:  J J Doyle; J I Davis; R J Soreng; D Garvin; M J Anderson
Journal:  Proc Natl Acad Sci U S A       Date:  1992-08-15       Impact factor: 11.205

Review 3.  The chloroplast genome.

Authors:  M Sugiura
Journal:  Plant Mol Biol       Date:  1992-05       Impact factor: 4.076

4.  Complete nucleotide sequence of the chloroplast genome from the Tasmanian blue gum, Eucalyptus globulus (Myrtaceae).

Authors:  Dorothy A Steane
Journal:  DNA Res       Date:  2005       Impact factor: 4.458

5.  Gene transfer from organelles to the nucleus: how much, what happens, and Why?

Authors: 
Journal:  Plant Physiol       Date:  1998-09       Impact factor: 8.340

6.  The highly rearranged chloroplast genome of Trachelium caeruleum (Campanulaceae): multiple inversions, inverted repeat expansion and contraction, transposition, insertions/deletions, and several repeat families.

Authors:  M E Cosner; R K Jansen; J D Palmer; S R Downie
Journal:  Curr Genet       Date:  1997-05       Impact factor: 3.886

7.  Complete chloroplast DNA sequence from a Korean endemic genus, Megaleranthis saniculifolia, and its evolutionary implications.

Authors:  Young-Kyu Kim; Chong-wook Park; Ki-Joong Kim
Journal:  Mol Cells       Date:  2009-03-19       Impact factor: 5.034

8.  Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns.

Authors:  Robert K Jansen; Zhengqiu Cai; Linda A Raubeson; Henry Daniell; Claude W Depamphilis; James Leebens-Mack; Kai F Müller; Mary Guisinger-Bellian; Rosemarie C Haberle; Anne K Hansen; Timothy W Chumley; Seung-Bum Lee; Rhiannon Peery; Joel R McNeal; Jennifer V Kuehl; Jeffrey L Boore
Journal:  Proc Natl Acad Sci U S A       Date:  2007-11-28       Impact factor: 11.205

9.  Phylogenetic analyses of Vitis (Vitaceae) based on complete chloroplast genome sequences: effects of taxon sampling and phylogenetic methods on resolving relationships among rosids.

Authors:  Robert K Jansen; Charalambos Kaittanis; Christopher Saski; Seung-Bum Lee; Jeffrey Tomkins; Andrew J Alverson; Henry Daniell
Journal:  BMC Evol Biol       Date:  2006-04-09       Impact factor: 3.260

10.  The complete nucleotide sequences of the five genetically distinct plastid genomes of Oenothera, subsection Oenothera: I. sequence evaluation and plastome evolution.

Authors:  Stephan Greiner; Xi Wang; Uwe Rauwolf; Martina V Silber; Klaus Mayer; Jörg Meurer; Georg Haberer; Reinhold G Herrmann
Journal:  Nucleic Acids Res       Date:  2008-02-24       Impact factor: 16.971
