Literature DB >> 25398081

Complete chloroplast genome sequence of Omani lime (Citrus aurantiifolia) and comparative analysis within the rosids.

Huei-Jiun Su1, Saskia A Hogenhout2, Abdullah M Al-Sadi3, Chih-Horng Kuo4.   

Abstract

The genus Citrus contains many economically important fruits that are grown worldwide for their high nutritional and medicinal value. Due to frequent hybridizations among species and cultivars, the exact number of natural species and the taxonomic relationships within this genus are unclear. To compare the differences between the Citrus chloroplast genomes and to develop useful genetic markers, we used a reference-assisted approach to assemble the complete chloroplast genome of Omani lime (C. aurantiifolia). The complete C. aurantiifolia chloroplast genome is 159,893 bp in length; the organization and gene content are similar to most of the rosids lineages characterized to date. Through comparison with the sweet orange (C. sinensis) chloroplast genome, we identified three intergenic regions and 94 simple sequence repeats (SSRs) that are potentially informative markers with resolution for interspecific relationships. These markers can be utilized to better understand the origin of cultivated Citrus. A comparison among 72 species belonging to 10 families of representative rosids lineages also provides new insights into their chloroplast genome evolution.

Entities:  

Mesh:

Substances:

Year:  2014        PMID: 25398081      PMCID: PMC4232571          DOI: 10.1371/journal.pone.0113049

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Citrus is in the family of Rutaceae, which is one of the largest families in order Sapindales. Flowers and leaves of Citrus are usually strong scented, the extracts from which contain many useful flavonoids and other compounds that are effective insecticides, fungicides and medicinal agents [1]–[3]. Citrus is of great economic importance and contains many fruit crops such as oranges, grapefruit, lemons, limes, and tangerines. However, due to a long cultivation history, wide dispersion, somatic bud mutation, and sexual compatibility among Citrus species and related genera, the taxonomy of Citrus remains controversial [4], [5] and the origination of many Citrus species and hybrids is still unresolved [6], [7]. The chloroplast (cp) genome sequence contains useful information in plant systematics because of its maternal inheritance in most angiosperms [8], [9] and its highly conserved structures for developing promising genetic markers. The only complete cp genome available in Citrus is sweet orange (Citrus sinensis) [10], which has provided valuable information to the position of Sapindales in rosids. Although a genome sequencing project is in progress for C. clementine, its complete chloroplast genome sequence is not available yet. To identify the cp genome regions that are polymorphic and may be used as molecular markers for resolving the evolutionary relationships among Citrus species, a second cp genome within the genus is necessary for comparative analysis. For this purpose, the major aim of this study is to determine the complete cp genome sequence of C. aurantiifolia. C. aurantiifolia, which is commonly known as Key lime, Mexican lime, Omani lime, Indian lime, or acid lime, is native to Southeast Asia and widely cultivated in tropics and subtropics. Oman is known to be a transit country for lime, from which lime spread to Africa and the New World [11]. In Oman, Omani lime is considered the fourth most important fruit crop in terms of cultivated area and production. The products of Omani lime can be used for beverage, food additives and cosmetic industries [12]. Omani lime is sensitive to several biotic agents, the most serious of which is ‘Candidatus Phytoplasma aurantifolia’, the cause of witches’ broom disease of lime (WBDL). Recent studies on WBDL focused on effect of genetic diversity of Omani limes on the disease [13], transcriptome and proteomic analysis of lime response to infection by phytoplasma [14]–[16] and effect of phytoplasma on seed germination, growth and metabolite content in lime [17], [18]. Here, we present the complete chloroplast genome sequence of Omani lime (C. aurantiifolia). To identify loci of potential utility for the molecular identification and phylogenetic analyses of Citrus cultivars and species, we compared the intergenic regions and SSRs in the cp genomes of C. aurantiifolia and C. sinensis. Furthermore, we performed phylogenetic analyses to infer the history of gene losses in the cp genome evolution among representative rosids lineages.

Materials and Methods

Sample Preparation and Sequencing

The Omani lime leaves were collected from a 5-year-old lime tree at a private farm located in the Omani territory of Madha (GPS coordinates: 25.276318, 56.318909). This farm is owned by one of the co-authors of this work, Dr. Abdullah M. Al-Sadi, whom should be contacted for future permissions. This study does not involve endangered or protected species and does not require specific permission from regulatory authority concerned with protection of wildlife. The sample was stored in a cool box and transported to the Plant Pathology Research Laboratory at Sultan Qaboos University (Al Khoud, Oman) for DNA extraction following a protocol of Maixner et al. [19]. The leaves were washed with clear water before the isolation procedure. 1 g of leaves were used and crushed in 3 ml CTAB extraction buffer (2% CTAB, 1.4 M NaCl, 500 mM EDTA pH8, 1 M Tris-HCl pH8 and 0.2% beta-mercaptol). 1.5 ml of the leave extract was transferred to a 2 ml tube and incubated in a water-bath at 65°C for 15 min. The tube was turned up and down twice during incubation, centrifuged at 960 g for 5 min, and the supernatant was subsequently transferred to a clean eppendorf tube. An equal volume of chloroform-isoamyl alcohol mix (24∶1) was added and the tube was centrifuged at 21000 g for 20 min. The supernatant was transferred to a new tube and then 0.6 volume of isopropanol was added to the supernatant and incubated at −20°C for 30 min. The DNA pellet was collected by centrifugation at 21000 g for 20 min and then washed with 1 ml of 70% ethanol. The final DNA was resuspended in 100 µl TE (Tris 10 mM, EDTA 1 mM pH8) and was stored at −80°C until used. The library construction and sequencing were done at the Genome Analysis Centre (Norwich, UK). The Illumina TruSeq DNA Sample Preparation v2 Kit was used to prepare an indexed library. The DNA sample was sheared to a fragment size of 500–600 bp using a sonicator, followed by end-repair and the addition of a single A base for binding of the indexed adapter. The appropriate sized library (500 bp) was selected by gel electrophoresis, followed by PCR enrichment. The 251 bp paired-end sequencing run was performed on an Illumina MiSeq instrument using the SBS chemistry and Illumina software MCS v2.3.0.3 and RTA v1.18.42. The raw reads were deposited at the NCBI Sequence Read Archive under the accession number SRR1611615.

Genome Assembly and Analyses

The procedures for genome assembly and annotation were based on our previous studies of cp genomes [20], [21]. In addition to the standard de novo assembly approach by using Velvet v1.2.10 [22] with the k-mer size set to 243, a reference-based approach for assembly as described below was used in parallel. All of the raw reads were initially mapped onto the published cp genome of C. sinensis [10] using BWA v0.6.2 [23]. The sequence variations were identified with SAMtools v0.1.19 [24] and visually inspected using IGV v2.3.25 [25]. The variants were corrected with the raw reads and the regions without sufficient coverage were converted into gaps. This corrected sequence was then used as the new draft reference for the next iteration of verification. Gaps were filled using the reads overhang at margins and the process was repeated until the reference was fully supported by all mapped raw reads. The final assembly, which was supported by our de novo and reference-based approaches, resulted in an average of 1,441-fold coverage of paired-end reads with a mapping quality of 60 and the region with the lowest coverage is 506-fold. The preliminary annotations of the C. aurantiifolia cp genome were performed online using the automatic annotator DOGMA [26] and verified using BLASTN [27], [28] searches (e-value cutoff = 1e-10) against other land plant cp genomes. Each annotated gene was manually compared with C. sinensis cp genome for start and stop codons or intron junctions to ensure accurate annotation. The codon usage was analyzed by using the seqinr R-cran package [29]. A circular map of genome was produced using OGDRAW [30]. To identify the differences between C. aurantiifolia and C. sinensis, the two sequences were aligned using Mauve v2.3.1 [31] and the result was analyzed using custom Perl scripts. Intergenic gene regions were parsed out from the two Citrus cp genomes and aligned using MUSCLE v3.8.31 [32] with the default settings. The pairwise distances were calculated using the DNADIST program in the PHYLIP package v3.695 [33]. The positions and types of simple sequence repeats (SSRs) in the two Citrus cp genomes were detected using MISA (http://pgrc.ipk-gatersleben.de/misa/). The minimum number of repeats were set to 10, 5, 4, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexanucleotides, respectively. For long repeats, the program REPuter [34] was used to identify the number and location of direct and inverted (i.e., palindromic) repeats. A minimum repeat size of 30 bp and sequence identity greater than 90% setting were used according to the study of C. sinensis cp genome [10]. The redundant or overlapping repeats were identified and filtered manually.

Phylogenetic Inference

Phylogenetic analysis of the representative rosids lineages with complete cp genomes available was performed using PhyML v20120412 [35] with the GTR+I+G model. A total of 72 rosids species were chosen as the ingroups and Vitis venifera was included as the outgroup, the accession numbers were provided in Table S1. The protein-coding and rRNA genes were parsed from the selected cp genomes and clustered into ortholog groups using OrthoMCL [36]. The presence/absence of orthologous genes in each genome was examined and further verified using TBLASTN [27], [28] searches (e-value cutoff = 1e-10). The nucleotide sequences of the conserved genes were aligned individually by using MUSCLE with the default settings. The concatenated alignment was used to infer a maximum likelihood phylogeny as described above. The bootstrap supports were estimated from 1,000 resampled alignments generated by the SEQBOOT program in the PHYLIP package.

Investigations of orf56 and ycf68

To investigate the presence/absence of orf56 and ycf68 in the selected cp genomes, the gene sequences from C. aurantiifolia was used as the queries to perform BLASTN [27], [28] searches (e-value cutoff = 1e-10). The significant hits were examined to investigate the presence of intact open reading frames (ORFs). Phylogenetic analysis of the cp orf56 genes and the homologous mitochondrial sequences was performed as described above. The final alignment contains 190 aligned nucleotide sites and a total of 70 sequences, including two sequences of Amborella as the outgroup.

Results and Discussion

General Features of the Omani Lime Chloroplast Genome

The complete cp genome of C. aurantiifolia (Christm.) Swingle (GenBank accession number KJ865401.1) is 159,893 bp in length, including a large single copy (LSC) region of 87,148 bp, a small single copy (SSC) region of 18,763 bp, and a pair of inverted repeats (IRa and IRb) of 26,991 bp each (Figure 1 and Table 1). A total of 137 different genes, including 93 protein-coding genes, 30 tRNA genes, and four rRNA genes, were annotated (Table S2). Among these, 12 protein-coding genes and 7 tRNA genes are duplicated in the IR regions. Most of the protein-coding genes are composed of a single exon, while 14 contain one intron and three contain two introns. The gene rps12 was predicted to undergo trans-splicing, with the 5′ exon located in the LSC region and the other two exons located in the IR regions.
Figure 1

Chloroplast genome map of Citrus aurantiifolia.

Gene drawn inside the circle are transcribed clockwise, whereas those outside are counterclockwise. The within-genome GC content variation is indicated in the middle circles.

Table 1

Summary of the Citrus chloroplast genome characteristics.

Attribute C. aurantiifolia (KJ865401) C. sinensis (NC_008334)
Size (bp)159,893160,129
overall GC content (%)38.438.5
LSC size in bp (% total)87,148 (54.5%)87,744 (54.8%)
SSC size in bp (% total)18,763 (11.7%)18,393 (11.5%)
IR size in bp (% total)a 26,991 (16.9%)26,996 (16.9%)
Protein-coding regions size in bp (% total)81,468 (51.0%)79,773 (49.8%)
rRNA and tRNA size in bp (% total)11,850 (7.5%)11,850 (7.4%)
Introns size in bp (% total)17,129 (10.7%)18,252 (11.4%)
Intergenic spacer size in bp (% total)49,446 (30.9%)50,254 (31.4%)
Number of different genes115113b
Number of different protein-coding genes8179b
Number of different rRNA genes44
Number of different tRNA genes3030
Number of different genes duplicated by IR2220
Number of different genes with introns1717

Each cp genome contains two copies of inverted repeats (IRs).

According to the original annotation, not including orf56.

Chloroplast genome map of Citrus aurantiifolia.

Gene drawn inside the circle are transcribed clockwise, whereas those outside are counterclockwise. The within-genome GC content variation is indicated in the middle circles. Each cp genome contains two copies of inverted repeats (IRs). According to the original annotation, not including orf56. The protein-coding regions contain a total of 27,159 codons (Table S3). Isoleucine and cysteine are the most and least frequent amino acids and have 2,892 (10.7%) and 359 (1.2%) codons, respectively. The codon usage is biased towards a high ratio of A/T at the third position, which is also observed in many land plant cp genomes [37].

Sequence Comparisons with Sweet Orange

The general characteristics of the two Citrus cp genomes are summarized in Table 1, overall the compositions are quite similar. The GC content of these Citrus cp genomes is approximately 38.5%, which is slightly higher than the average of the 72 representative rosids lineages (36.7%). In these two Citrus cp genomes, the genic regions, introns, and intergenic regions account for ca. 58%, 11%, and 31%, respectively (Table 1). The pairwise sequence alignment between the two Citrus cp genomes revealed approximately 1.3% sequence divergence (Table 2), including 1,780 indels (1.11%) and 330 substitutions (0.21%). The LSC region contains more sequence polymorphisms than expected by its size, including 1,360 (76.4%) indels and 235 (71.2%) substitutions. In contrast, the two IR regions account for ca. 34% of the cp genome yet contain only 16 (0.9%) indels and 12 (3.6%) substitutions. The size differences in the LSC and SSC regions between these two cp genomes are mostly explained by one large indel in each region. The LSC sizes differ by 596 bp and a 523-bp indel was found in the spacer between rps16 and trnQ-UUG. The SSC sizes differ by 370 bp and a 354-bp indel was found in the spacer between rpl32 and trnL-UAA.
Table 2

Differences between the C. aurantiifolia and C. sinensis cp genomes.

Indel
Length (bp)Count
143
2–1020
11–10018
101–1,0003
Sum1,780116Percentagea: 1.11%
Substitution
TypeCount
A <-> T34
C <-> G15
A <-> C81
T <-> C64
A <-> G51
T <-> G85
Sum330Percentagea: 0.21%
10 most divergent intergenic regions
RegionLengthb (bp)Pairwise distance
rps3 - rpl22 (LSC)2340.027
ndhE - ndhG (SSC)2760.018
psaC - ndhE (SSC)2310.017
psbH - petB (LSC)1180.017
trnY-GUA-trnE-UCC (LSC)590.017
trnH-GUG - psbA (LSC)4490.016
rpl32 - trnL-UAG (SSC)1,1410.015
psbT-psbN (LSC)660.015
trnG-GCC-trnR-UCU (LSC)2040.015
trnD-GUC-trnY-GUA (LSC)4690.013

Relative to the length of C. aurantiifolia.

Length in C. aurantiifolia.

Relative to the length of C. aurantiifolia. Length in C. aurantiifolia. To identify the intergenic regions that may be useful for phylogenic analysis or molecular identification, we searched for the spacers that are >400 bp in length and exhibit above-average sequence divergence between the two Citrus species (i.e., >1.3%). A total of three regions satisfied these criteria, including the spacer between trnH-GUG and psbA (449 bp, 1.6% divergence), the spacer between rpl32 and trnL-UAG (1141 bp, 1.5% divergence), and the spacer between trnD-GUC and trnY-GUA (469 bp, 1.3% divergence). The junctions between the IR, LSC, and SSC regions in C. aurantiifolia are similar to that of C. sinensis except for the LSC-IRb boundary. A total of 23 indels and five substitutions were found at this region, resulting in one copy of rpl22 spanning across the LSC-IRb junction in C. aurantiifolia. Comparing the IR junctions of Citrus with Theobroma and Gossypium in Malvaceae [38], it was found that the IRs in Citrus have expanded to include rps19 and 252 nt of rpl22, whereas in Malvaceae, rps19 is located in LSC and rpl22 was missing [38]–[40].

Analyses of Repetitive Sequences

A total of 109 SSR loci were found in the cp genome of C. aurantiifoliaa, accounting for 1,352 bp of the total sequence (ca. 0.8%). Among these, 94 were also found in C. sinensis and 42 exhibit length polymorphism (Table 3). Most SSRs are located in intergenic regions, but some were found in coding genes such as matK and ycf1. Concerning the controversial status of Citrus taxonomy, the SSRs identified in this study may provide new perspective to refine the phylogeny and elucidate the origin of the cultivars. Furthermore, these SSRs may be used as molecular markers for population studies.
Table 3

List of simple sequence repeats.

Repeat unitLength (bp)Number of SSRsStart positiona
A106 4512; 47812; 53871; 72614; 121748; 159288
116 6866; 10130; 69481; 71892; 117725; 134802
129 8332; 31399; 47307; 63928; 111804; 113977 ( ycf1 ); 118367; 140302 (ycf68); 144255
132 10107; 84557
141 385
151 32360
162 69965; 118302
173 7620; 39139; 74176
191 12023
22170289
T1010 2424 ( matK ); 19786; 26964 ( rpoB ); 37622; 46938; 63632; 87731; 117742; 117871; 118851
11119401; 10416; 17001; 30912; 46021; 63530; 112216; 117988; 118224; 121703; 131189 (ycf1)
126 14722; 29024; 102773; 106715 (ycf68); 133040 ( ycf1 ); 135213
132 73946; 80423
1421776; 85274
152 54209; 57817
17145965
183 52748; 68339; 81409
201 49202
232 23694; 33282
C102 28769; 104247
G101 142772
AT104 20631 ( rpoC2 ); 33636; 11817; 121517 ( ndhD )
AAG121 97331
AAT122 38604; 122629
ATA12170220
ATT123 10283; 53810; 54088
181 1760;
CTT122 37353 ( psbC ); 149686
TAA122 30250; 61945
TAT121 83297
TTC121 73084
TAAA1224866; 45088
AAAT123 30423; 32502; 71394
ATAC121 51167
ATTT12149193
201 117168
TTAA12239175; 39188
TTAG121 61483
TTTC121 14352
TCTT121 46961
AATAA201144226
TTTTA201102781
TTCAAA18163817

The SSR-containing coding regions are indicated in parentheses. SSRs that are identical in the C. sinensis chloroplast genome are highlighted in bold; SSRs that are conserved but with different lengths are highlighted by underline.

The SSR-containing coding regions are indicated in parentheses. SSRs that are identical in the C. sinensis chloroplast genome are highlighted in bold; SSRs that are conserved but with different lengths are highlighted by underline. In addition, 62 large repeats that are longer than 30 bp were found in the C. aurantiifolia cp genome (Table 4). Most of these repeats are located in intergenic spacers, except for three that are located in the coding regions of rps4, psaA and psaB. Twelve of these long repeats were also found in C. sinensis, indicating that these repeats might be widespread in the genus.
Table 4

List of long repeat sequences.

RepeatsizeTypea Start position of1st repeatStart position the repeatfound in other regionLocationb Region
30D17591762IGS (psbA-trnK-UUU)LSC
30P177112015IGS (psbA-trnK-UUU, atpA-atpF)LSC
30P823137726, 47606 IGS ( trnS-GCU , trnS-UGA , trnS-GGA ), LSC
30D2322685067 intron ( rpoC1 ), IGS ( rpl16 - rps3 ) LSC
30D2368652733intron (rpoC1 ), IGS (ndhK-ndhC)LSC
30P2368770291intron (rpoC1), IGS (trnP-UGG-psaJ)LSC
30D2369233280intron (rpoC1), IGS (trnE-UUC-trnT-GGU)LSC
30D49192117171IGS (psbA-trnK-UUU), IGS (atpA-atpF)LSC, IR
30D, P49197102764, 144233IGS (trnT-UGU-trnL-UAA), IGS (rps12-trnV-GAC)LSC, IR
30D, P51215102768, 144229IGS (trnF-GAA-ndhJ), IGS (rps12-trnV-GAC)LSC, IR
30P7134471344 IGS ( rpl33 - rps18 ) LSC
30D, P102768102773, 144224IGS (rps12-trnV-GAC)IR
30D144225144230IGS (trnV-GAC-rps12)IR
31P4492117868IGS (trnK-UUU-rps16), IGS (rpl32-trnL-UAG)LSC, IR
31P1010649188IGS (trnG-GCC-trnR-UCU, trnT-UGU-trnL-UAA)LSC
31P2981129811 IGS ( petN - psbM ) LSC
31P3328170282 IGS ( trnE-UUC - trnT-GGU , trnP-UGG - psaJ ) LSC
31P119977119977intron (ccsA)IR
32D761574171IGS (psbK-psbI), intron (clpP)LSC
32P3916639166IGS (trnG-GCC-trnfM-CAU)LSC
34P3877438782 IGS ( psbZ - trnG-GCC ) LSC
34P4918670288rps4, IGS (trnP-UGG-psaJ)LSC
34D, P111432111464, 135529, 135561 IGS ( rrn4.5 - rrn5 ) IR
35P1009749193IGS trnG-GCC-trnR-UCU, trnT-UGU-trnL-UAA)LSC
36P2764827648IGS (rpoB-trnC-GCA)LSC
40P7777677776IGS (psbT-psbN)LSC
41D4129443518 psaB , psaA LSC
41D, P102353124945, 144633 IGS ( rps12 - trnV-GAC ), intron ( ndhA ) IR
48P3062630626 IGS ( petN - psbM ) LSC
50D3902039044 IGS ( trnG-GCC - trnfM-CAU ) LSC
51P99849984IGS (trnG-GCC-trnR-UCU)LSC
53P886931095 IGS (trnS-GCU-trnG-GCC, psbM - trnD-GUC)LSC
54P441441IGS (trnH-GUG-psbA)LSC

D: direct repeat; P: palindrome inverted repeat.

IGS: intergenic spacer region. Sequences conserved in the C. sinensis chloroplast genome are highlighted in bold.

D: direct repeat; P: palindrome inverted repeat. IGS: intergenic spacer region. Sequences conserved in the C. sinensis chloroplast genome are highlighted in bold.

Gene Content Analyses within the Rosids

A maximum likelihood phylogenetic analysis of 72 representative rosids lineages was conducted based on a concatenated alignment of four rRNA and 58 protein-coding genes with 54,689 sites (Figure 2). Citrus represents Sapindales and is sister to the clade containing Malvales and Brassicales. These relationships are congruent with the previous reports [10], [41]–[43]. Based on this phylogeny and the gene content, we inferred the gene loss events during the cp genome evolution in rosids.
Figure 2

Maximum likelihood phylogeny of the representative rosids lineages.

The common grape vine (Vitis vinifera) is included as the outgroup to root the tree. The concatenated alignment includes 62 conserved chloroplast genome genes and 54,689 aligned nucleotide sites. Nodes received <70% bootstrap support are indicated by gray arrows. The putative events of gene losses are inferred based on the most parsimonious scenario.

Maximum likelihood phylogeny of the representative rosids lineages.

The common grape vine (Vitis vinifera) is included as the outgroup to root the tree. The concatenated alignment includes 62 conserved chloroplast genome genes and 54,689 aligned nucleotide sites. Nodes received <70% bootstrap support are indicated by gray arrows. The putative events of gene losses are inferred based on the most parsimonious scenario. The translation initiation factor gene infA in cp has been lost independently at least 24 times in angiosperms and evidence provided from some cases suggested functional replacement by a nucleus copy [44]. Although the majority of infA in our selected cp genomes were found to be pseudogenized or completely lost, an intact infA was found in Quercus, Francoa, and two Cuscumis species. The rpl22 were found to be lost in Fabaceae [45] and Castanea of Fagaceae [46] following independent transfers to nucleus. Furthermore, another putative loss of rpl22 was detected in Passiflora [46]. The rpl22 in Malvaceae, including Theobroma and three Gossypium species, were found to be pseudogenized in our analysis. In Citrus, the ORF of rpl22 was shortened to 252–264 nt compared to the typical length of 399–489 nt in other rosids [10], [46]. However, compared with the pseudogenized rpl22 found in Malvalvace, the rpl22 homologs in Citrus still show high sequence conservation. Additionally, the rpl22 transcripts can be identified in the EST database for various Citrus species (data not shown). Taking account into the above consideration, we did not annotate rpl22 as a pseudogene in Citrus. The parallel losses of rps16 were found in several rosids lineages (Figure 2), including one time in Salicaceae, two times in Fabaceae and another two times in Brassicaceae. The loss of rps16 in Medicago and Populus was found to be substituted by a nuclear-encoded copy that transferred from the mitochondrion (mt) [47]. Because the nuclear-encoded RPS16 was found to target both mt and cp in Arabidopsis, Lycopersicon, and Oryza [47], it is possible that the cp genome-encoded rps16 would not be maintained by selection and will eventually become lost in these lineages. There are only a few gene loss events of photosynthetic genes found in rosids. In addition to the loss of psaI in Lathyrus sativus [48], the accD seems to be lost independently in Trifolium subterraneum and several Gerantiaceae species except for Geranium palmatum. In Trifolium, a nuclear-encoded accD copy has been reported [48], which presented another example of horizontal gene transfer from cp to nucleus. Successful gene transfers from cp to the nucleus in angiosperms are rare and have been only documented for four genes in rosids. Other than the three genes described above (i.e., infA, rpl22, and accD), the rpl32 in Populus (Salicaceae) is the fourth example [49]–[51]. The IR has been reported to be independently lost at least five times among seed plants, two of which are within rosids [51]. In addition to the inverted repeat lacking clade (IRLC) of papilionoid Fabaceae [52] and Erodium of Gerantiaceae [53], [54], the IR was found to be lost in two lineages of Fragaria (Rosaceae), which are F. vesca ssp. bracteatea and F. mandschurica (accession: NC_018767, not shown in Figure 2). Based on the Fragaria phylogeny shown in a previous study [55], it seems that IR loss was not a single event in Fragaria.

Molecular Evolution of orf56 and ycf68 within the Rosids

In the comparison of gene content between the two Citrus cp genomes, C. aurantiifolia was found to contain two additional protein-coding genes. The first gene, orf56, is located in the trnA-UGC intron that contains one sequence homologous to previously recognized mitochondrial ACRS (ACR-toxin sensitivity gene) in Citrus [56]. In addition to the 171-bp identical sequences between cp orf56 and the ORF sequences of ACRS in mt, the full length of 355-bp region of ACRS that conferred sensitivity to ACR-toxin in E. coil are also identical. Furthermore, the whole trnA-UGC among two Citrus cp regions and C. jambhiri mitochondrial ACRS shared more than 96% identity (Figure S1), which highlight the conservation of this region between cp and mt. The gene orf56 has also been included in the annotation of complete cp genomes of Calycanthus [57] and Pelargonium [58]. Our BLAST search against the rosids genome database revealed that in addition to Citrus and Pelargonium, all of the species examined in Cucurbitaceae and Myrtales also contain an intact orf56 (Figure 3). Moreover, an intact ACRS ORF is also present in the mt genomes of Liriodendron [59] and Silene [60] and the ORF sequences between cp and mt are identical. Goremykin et al. [57] suggested that the ACRS gene was relative recently transferred from cp to mt. Based on the phylogeny containing the cp orf56 and the mt ACRS (Figure S2), it appears that orf56 has been independently transferred from cp to mt in different lineages.
Figure 3

The phylogenetic distribution patterns of orf56 and ycf68.

The second gene, ycf68, is located in the trnI-GAU intron. A nearly identical sequence was found in C. sinensis but an additional T insertion near the C-terminus abolished the stop codon at the corresponding position. The intact ycf68 can be detected in several monocots and Nymphaeaceae [61], [62]. However, in the majority of other rosids (Figure 3) and the rest of the eudicots [61], the ycf68 homologs all contain premature stop codons. Although Raubeson et al. [61] argued that ycf68 is not a protein-coding gene based on the lack of intron-folding pattern, the high levels of sequence conservation among the ORFs of identified homologs suggest that the true identity and functionality of this putative gene remains to be further investigated.

Conclusions

We reported the complete cp genome sequence of Citrus aurantiifolia (Rutaceae) in this study. The genome organization and gene content is typical of most angiosperms and highly similar to that of C. sinensis (i.e., 98.7% identical at the nucleotide level). The only difference in the gene content between the two Citrus cp genomes is the C. aurantiifolia-specific presence of a protein-coding gene (ycf68) in the trnI-GAU intron. Notably, three long intergenic spacers with high sequence divergence and 94 shared SSR regions were identified in the C. aurantiifolia-C. sinensis comparison. These regions may provide phylogenetic utility at low taxonomic levels and could be applied to the molecular identification of Citrus cultivars. Finally, our comparative analysis of gene content among 72 representative rosids lineages highlighted multiple events of gene losses within this group. Alignment of the -containing sequences of two cp genomes and mitochondrial sequences. (TIF) Click here for additional data file. The maximum likelihood phylogeny of the cp and mt ORF sequences. (TIF) Click here for additional data file. List of the complete chloroplast genome sequences included in the phylogenetic analysis. (XLSX) Click here for additional data file. List of the genes found in the cp genome. (XLSX) Click here for additional data file. Codon usage of the cp genome. (XLSX) Click here for additional data file.
  44 in total

1.  Automatic annotation of organellar genomes with DOGMA.

Authors:  Stacia K Wyman; Robert K Jansen; Jeffrey L Boore
Journal:  Bioinformatics       Date:  2004-06-04       Impact factor: 6.937

2.  Localized hypermutation and associated gene losses in legume chloroplast genomes.

Authors:  Alan M Magee; Sue Aspinall; Danny W Rice; Brian P Cusack; Marie Sémon; Antoinette S Perry; Sasa Stefanović; Dan Milbourne; Susanne Barth; Jeffrey D Palmer; John C Gray; Tony A Kavanagh; Kenneth H Wolfe
Journal:  Genome Res       Date:  2010-10-26       Impact factor: 9.043

3.  Ultra-barcoding in cacao (Theobroma spp.; Malvaceae) using whole chloroplast genomes and nuclear ribosomal DNA.

Authors:  Nolan Kane; Saemundur Sveinsson; Hannes Dempewolf; Ji Yong Yang; Dapeng Zhang; Johannes M M Engels; Quentin Cronk
Journal:  Am J Bot       Date:  2012-02-01       Impact factor: 3.844

Review 4.  When gene marriages don't work out: divorce by subfunctionalization.

Authors:  Brian P Cusack; Kenneth H Wolfe
Journal:  Trends Genet       Date:  2007-04-05       Impact factor: 11.639

5.  Complete plastid genome sequences of three Rosids (Castanea, Prunus, Theobroma): evidence for at least two independent transfers of rpl22 to the nucleus.

Authors:  Robert K Jansen; Christopher Saski; Seung-Bum Lee; Anne K Hansen; Henry Daniell
Journal:  Mol Biol Evol       Date:  2010-10-08       Impact factor: 16.240

6.  Substitution of the gene for chloroplast RPS16 was assisted by generation of a dual targeting signal.

Authors:  Minoru Ueda; Tomotaro Nishikawa; Masaru Fujimoto; Hideki Takanashi; Shin-Ichi Arimura; Nobuhiro Tsutsumi; Koh-Ichi Kadowaki
Journal:  Mol Biol Evol       Date:  2008-05-02       Impact factor: 16.240

7.  The Complete Plastid Genome Sequence of Madagascar Periwinkle Catharanthus roseus (L.) G. Don: Plastid Genome Evolution, Molecular Marker Identification, and Phylogenetic Implications in Asterids.

Authors:  Chuan Ku; Wan-Chia Chung; Ling-Ling Chen; Chih-Horng Kuo
Journal:  PLoS One       Date:  2013-06-18       Impact factor: 3.240

8.  Phylogenetic relationships of citrus and its relatives based on matK gene sequences.

Authors:  Tshering Penjor; Masashi Yamamoto; Miki Uehara; Manami Ide; Natsumi Matsumoto; Ryoji Matsumoto; Yukio Nagano
Journal:  PLoS One       Date:  2013-04-25       Impact factor: 3.240

9.  Mutational dynamics of aroid chloroplast genomes.

Authors:  Ibrar Ahmed; Patrick J Biggs; Peter J Matthews; Lesley J Collins; Michael D Hendy; Peter J Lockhart
Journal:  Genome Biol Evol       Date:  2012       Impact factor: 3.416

10.  Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus.

Authors:  Linda A Raubeson; Rhiannon Peery; Timothy W Chumley; Chris Dziubek; H Matthew Fourcade; Jeffrey L Boore; Robert K Jansen
Journal:  BMC Genomics       Date:  2007-06-15       Impact factor: 3.969

View more
  28 in total

1.  The complete chloroplast genome sequence of Morus mongolica and a comparative analysis within the Fabidae clade.

Authors:  Weiqing Kong; Jinhong Yang
Journal:  Curr Genet       Date:  2015-07-24       Impact factor: 3.886

Review 2.  Genomics and Evolution in Traditional Medicinal Plants: Road to a Healthier Life.

Authors:  Da-Cheng Hao; Pei-Gen Xiao
Journal:  Evol Bioinform Online       Date:  2015-10-04       Impact factor: 1.625

3.  Comparative analysis of complete chloroplast genome sequences of two tropical trees Machilus yunnanensis and Machilus balansae in the family Lauraceae.

Authors:  Yu Song; Wenpan Dong; Bing Liu; Chao Xu; Xin Yao; Jie Gao; Richard T Corlett
Journal:  Front Plant Sci       Date:  2015-08-25       Impact factor: 5.753

4.  Complete Genome Sequence of Agrobacterium tumefaciens Ach5.

Authors:  Ya-Yi Huang; Shu-Ting Cho; Wen-Sui Lo; Yi-Chieh Wang; Erh-Min Lai; Chih-Horng Kuo
Journal:  Genome Announc       Date:  2015-06-04

5.  The Chloroplast Genome of Elaeagnus macrophylla and trnH Duplication Event in Elaeagnaceae.

Authors:  Kyoung Su Choi; O Gyeong Son; SeonJoo Park
Journal:  PLoS One       Date:  2015-09-22       Impact factor: 3.240

Review 6.  Chloroplast genomes: diversity, evolution, and applications in genetic engineering.

Authors:  Henry Daniell; Choun-Sea Lin; Ming Yu; Wan-Jung Chang
Journal:  Genome Biol       Date:  2016-06-23       Impact factor: 13.583

7.  Towards a molecular taxonomic key of the Aurantioideae subfamily using chloroplastic SNP diagnostic markers of the main clades genotyped by competitive allele-specific PCR.

Authors:  Amel Oueslati; Frederique Ollitrault; Ghada Baraket; Amel Salhi-Hannachi; Luis Navarro; Patrick Ollitrault
Journal:  BMC Genet       Date:  2016-08-18       Impact factor: 2.797

8.  Complete plastid genome of Eriobotrya japonica (Thunb.) Lindl and comparative analysis in Rosaceae.

Authors:  Liqun Shen; Qijie Guan; Awais Amin; Wei Zhu; Mengzhu Li; Ximin Li; Lin Zhang; Jingkui Tian
Journal:  Springerplus       Date:  2016-11-29

9.  Comparative analysis of the complete chloroplast genome sequences in psammophytic Haloxylon species (Amaranthaceae).

Authors:  Wenpan Dong; Chao Xu; Delu Li; Xiaobai Jin; Ruili Li; Qi Lu; Zhili Suo
Journal:  PeerJ       Date:  2016-11-10       Impact factor: 2.984

10.  Comparative Analysis of Six Lagerstroemia Complete Chloroplast Genomes.

Authors:  Chao Xu; Wenpan Dong; Wenqing Li; Yizeng Lu; Xiaoman Xie; Xiaobai Jin; Jipu Shi; Kaihong He; Zhili Suo
Journal:  Front Plant Sci       Date:  2017-01-19       Impact factor: 5.753

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.