
Chloroplast genome data of Luffa acutangula and Luffa aegyptiaca and their phylogenetic relationships.

Chutintorn Yundaeng1, Wanapinun Nawae1, Chaiwat Naktang1, Jeremy R Shearman1, Chutima Sonthirod1, Duangjai Sangsrakru1, Thippawan Yoocha1, Nukoon Jomchai1, John R Sheedy2, Supat Mekiyanon2, Methawat Tuntaisong1, Wirulda Pootakham1, Sithichoke Tangphatsornruang1.   

Abstract

Luffa acutangula and Luffa aegyptiaca are domesticated plants in the family Cucurbitaceae, cultivated mainly in tropical and subtropical Asia. The chloroplast genomes of many Cucurbitaceae species have been sequenced to examine gene content and evolution, but the chloroplast genome sequences of L. acutangula and L. aegyptiaca had not been reported. We report the first complete chloroplast genome sequences of L. acutangula and L. aegyptiaca, obtained by Pacific Biosciences (PacBio) sequencing, and use them to infer evolutionary relationships. The chloroplast genomes of L. acutangula and L. aegyptiaca are 157,202 and 157,275 bp long, respectively. Both genomes have the typical quadripartite structure and contain 131 genes, including 87 protein-coding genes, 36 tRNA genes and 8 rRNA genes. We identified simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) in both chloroplast genomes. Polycistronic mRNA was examined in L. acutangula and L. aegyptiaca using isoform sequencing (Iso-Seq) reads to identify co-transcribed genes. The sizes and boundary locations of the inverted repeats (IRs) were compared with those of other species and found to be relatively unchanged. Phylogenetic analysis confirmed the close relationship between L. acutangula and L. aegyptiaca within Cucurbitaceae and separated the monophyletic Luffa clade from other species of the tribe Sicyoeae. These results should be useful for studying the evolution of Cucurbitaceae plants.
© 2020 The Author(s). Published by Elsevier Inc.


Keywords:  Luffa acutangula; Luffa aegyptiaca; PacBio sequencing; chloroplast genome; comparative analysis

Year:  2020        PMID: 33195780      PMCID: PMC7644877          DOI: 10.1016/j.dib.2020.106470

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Value of the Data

The L. acutangula and L. aegyptiaca chloroplast genomes provide molecular data for resolving complex evolutionary relationships and support phylogenetic research in various plant groups. The complete chloroplast genome data can be used in genetics, biotechnology, plant breeding and ecology. Sequence variation among the chloroplast genomes of Luffa species and other members of the family Cucurbitaceae improves understanding of their phylogenetic relationships. Polymorphisms in the chloroplast genome, such as simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs), can be used to develop molecular markers and to study evolutionary patterns in Luffa and closely related species.

Data Description

The complete chloroplast genomes of L. acutangula and L. aegyptiaca were assembled from long reads obtained by PacBio sequencing and annotated for gene content. The chloroplast genome sequences and annotated genes are available from NCBI under accession numbers MT381996 (L. acutangula) and MT381997 (L. aegyptiaca). Both chloroplast genomes have the typical quadripartite structure, in which a large single-copy region (LSC) and a small single-copy region (SSC) are separated by a pair of inverted repeats (IRs) (Fig. 1, Table 1). Both genomes encode 131 genes, including 87 protein-coding genes, 36 tRNA genes and 8 rRNA genes (Table 2, Table 3). Codon usage frequencies were calculated for the protein-coding genes, and the codon-anticodon recognition pattern was determined from the tRNA genes of both chloroplast genomes (Fig. 2, Table 4). The lengths and positions of the LSC, SSC and IR regions, and the genetic variation among chloroplast genomes, were compared between L. acutangula, L. aegyptiaca and other species in the family Cucurbitaceae (Figs. 3 and 4). Simple sequence repeats (SSRs) (Fig. 5, Supplementary Table S1), single nucleotide polymorphisms (SNPs) (Table 5) and RNA editing events (Table 6) were identified in both the L. acutangula and L. aegyptiaca chloroplast genomes. Polycistronic transcript sequences were similar in the L. acutangula and L. aegyptiaca chloroplast genomes (Table 7, Supplementary Table S2). Finally, a phylogenetic analysis of Luffa and several other Cucurbitaceae species placed L. acutangula and L. aegyptiaca close to Trichosanthes and Hodgsonia in the tribe Sicyoeae (Fig. 6).
Fig. 1

The chloroplast genomes of L. acutangula and L. aegyptiaca. Genes shown outside of the circle are transcribed counterclockwise, while those inside are transcribed clockwise, as shown by the arrows. The functions of genes are grouped by color. Asterisks indicate intron-containing genes.

Table 1

Chloroplast genome features among Cucurbitaceae species.

Feature | L. acutangula | L. aegyptiaca | C. lanatus | C. melo | C. sativus | C. pepo
Genome size (bp) | 157,202 | 157,275 | 156,906 | 156,017 | 155,293 | 157,343
LSC size (bp) | 86,226 | 86,310 | 86,846 | 86,335 | 86,689 | 87,828
SSC size (bp) | 18,402 | 18,393 | 17,898 | 18,090 | 18,209 | 18,169
IR size (bp) | 26,280 | 26,286 | 26,081 | 25,796 | 25,199 | 25,678
GC content (%) | 37.14 | 37.12 | 37.18 | 36.92 | 37.08 | 37.16
LSC GC content (%) | 34.96 | 34.93 | 34.94 | 34.67 | 34.85 | 34.91
SSC GC content (%) | 31.02 | 31.04 | 31.54 | 30.94 | 31.83 | 31.44
IR GC content (%) | 42.86 | 42.86 | 42.84 | 42.79 | 42.83 | 43.05
No. of genes | 131 | 131 | 124 | 135 | 133 | 131
No. of CDS | 87 | 87 | 87 | 90 | 89 | 86
No. of tRNA genes | 36 | 36 | 29 | 37 | 37 | 37
No. of rRNA genes | 8 | 8 | 8 | 8 | 8 | 8
No. of CDS with intron | 15 | 15 | 10 | 16 | 15 | 15
Gene coding density (%) | 50.08 | 50.04 | 49.74 | 51.74 | 50.06 | 46.60
GenBank accession number | MT381996 | MT381997 | NC_032008 | NC_015983 | NC_007144 | NC_038229

Table 2

List of genes present in the L. acutangula and L. aegyptiaca chloroplast genomes.

Category | Gene group | Gene names
Photosynthesis | Photosystem I (5) | psaA, psaB, psaC, psaI, psaJ
Photosynthesis | Photosystem II (15) | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
Photosynthesis | Cytochrome b6/f complex (6) | petA, petB*, petD*, petG, petL, petN
Photosynthesis | ATP synthase (6) | atpA, atpB, atpE, atpF*, atpH, atpI
Photosynthesis | Rubisco large subunit (1) | rbcL
Photosynthesis | NADH dehydrogenase (12) | ndhA*, ndhB (×2)*, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
Self-replication | Large subunit ribosomal proteins (11) | rpl2 (×2)*, rpl14, rpl16*, rpl20, rpl22, rpl23 (×2), rpl32, rpl33, rpl36
Self-replication | Small subunit ribosomal proteins (14) | rps2, rps3, rps4, rps7 (×2), rps8, rps11, rps12 (×2)*, rps14, rps15, rps16*, rps18, rps19
Self-replication | RNA polymerase (4) | rpoA, rpoB, rpoC1*, rpoC2
Self-replication | Ribosomal RNAs (8) | rrn4.5 (×2), rrn5 (×2), rrn16 (×2), rrn23 (×2)
Self-replication | Transfer RNAs (36) | trnA-UGC (×2)*, trnC-GCA, trnD-GTC, trnE-TTC, trnF-GAA, trnfM-CAT, trnG-GCC, trnH-GTG, trnI-CAT (×2), trnI-GAU (×2)*, trnK-UUU*, trnL-CAA (×2), trnL-TAG, trnL-UAA*, trnM-CAT, trnN-GTT (×2), trnP-TGG, trnQ-TTG, trnR-ACG (×2), trnR-TCT, trnS-GCU, trnS-GGA, trnS-TGA, trnT-GGT, trnT-TGT, trnV-GAC (×2), trnV-UAC*, trnW-CCA, trnY-GUA
Other genes | Acetyl-CoA carboxylase (1) | accD
Other genes | c-type cytochrome biogenesis (1) | ccsA
Other genes | ATP-dependent protease subunit (1) | clpP*
Other genes | Maturase (1) | matK
Other genes | Membrane protein (1) | cemA
Other genes | Proteins of unknown function (7) | ycf1, ycf2 (×2), ycf3*, ycf4, ycf15 (×2)
Other genes | Translation-related gene (1) | infA

* Gene with intron(s).

Table 3

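Genes with intron(s) in the L. acutangula and L. aegyptiaca chloroplast genomes.

All lengths in bp; -, not applicable.
Gene | Location | L. acutangula Exon I | Intron I | Exon II | Intron II | Exon III | L. aegyptiaca Exon I | Intron I | Exon II | Intron II | Exon III
rps16 | LSC | 42 | 855 | 213 | - | - | 45 | 856 | 213 | - | -
atpF | LSC | 144 | 755 | 411 | - | - | 144 | 757 | 411 | - | -
rpoC1 | LSC | 432 | 753 | 1611 | - | - | 432 | 756 | 1611 | - | -
ycf3 | LSC | 126 | 740 | 228 | 743 | 153 | 126 | 740 | 228 | 740 | 156
clpP | LSC | 69 | 847 | 288 | 613 | 228 | 69 | 835 | 297 | 615 | 225
petB | LSC | 6 | 783 | 642 | - | - | 9 | 780 | 642 | - | -
petD | LSC | 9 | 727 | 474 | - | - | 9 | 732 | 474 | - | -
rpl16 | LSC | 9 | 1100 | 402 | - | - | 9 | 1098 | 402 | - | -
rpl2 | IRb | 390 | 665 | 435 | - | - | 390 | 665 | 435 | - | -
ndhB | IRb | 777 | 686 | 756 | - | - | 777 | 686 | 756 | - | -
rps12 | IRb | 114 | 28918 | 234 | 537 | 27 | 114 | 28346 | 234 | 537 | 27
ndhA | SSC | 552 | 1155 | 540 | - | - | 552 | 1146 | 540 | - | -
rps12 | IRa | 114 | 71157 | 234 | 537 | 27 | 114 | 71136 | 234 | 537 | 27
ndhB | IRa | 786 | 677 | 756 | - | - | 777 | 686 | 756 | - | -
rpl2 | IRa | 390 | 665 | 435 | - | - | 393 | 662 | 435 | - | -
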
Fig. 2

Amino acid frequencies in L. acutangula and L. aegyptiaca protein-coding sequences.

Table 4

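The codon-anticodon recognition pattern and codon usage for the L. acutangula and L. aegyptiaca chloroplast genomes.

Amino acid | Codon | Frequency (a), L. acutangula | Frequency (a), L. aegyptiaca | RSCU, L. acutangula | RSCU, L. aegyptiaca | tRNA (b)
Phe | UUU | 957 | 957 | 1.29 | 1.29 | trnF-GAA
Phe | UUC | 530 | 529 | 0.71 | 0.71 | -
Leu | UUA | 860 | 860 | 1.88 | 1.88 | trnL-UAA
Leu | UUG | 556 | 556 | 1.22 | 1.22 | trnL-CAA
Leu | CUU | 585 | 585 | 1.28 | 1.28 | trnL-TAG
Leu | CUC | 190 | 189 | 0.42 | 0.41 | -
Leu | CUA | 377 | 379 | 0.82 | 0.83 | -
Leu | CUG | 174 | 176 | 0.38 | 0.38 | -
Ile | AUU | 84 | 83 | 1.45 | 1.45 | trnI-GAU
Ile | AUC | 474 | 472 | 0.63 | 0.63 | -
Ile | AUA | 688 | 687 | 0.92 | 0.92 | trnI-CAT
Met | AUG | 624 | 625 | 1 | 1 | trnM-CAT, trnfM-CAT
Val | GUU | 508 | 507 | 1.43 | 1.43 | trnV-GAC
Val | GUC | 181 | 183 | 0.51 | 0.52 | -
Val | GUA | 530 | 531 | 1.5 | 1.5 | trnV-UAC
Val | GUG | 198 | 198 | 0.56 | 0.56 | -
Ser | UCU | 571 | 566 | 1.69 | 1.68 | trnS-GGA
Ser | UCC | 319 | 322 | 0.94 | 0.95 | -
Ser | UCA | 428 | 429 | 1.27 | 1.27 | trnS-UGA
Ser | UCG | 189 | 188 | 0.56 | 0.56 | -
Pro | CCU | 413 | 410 | 1.53 | 1.52 | trnP-UGG
Pro | CCC | 201 | 203 | 0.75 | 0.75 | -
Pro | CCA | 315 | 314 | 1.17 | 1.17 | -
Pro | CCG | 150 | 151 | 0.56 | 0.56 | -
Thr | ACU | 534 | 535 | 1.61 | 1.61 | trnT-GGU
Thr | ACC | 248 | 248 | 0.75 | 0.75 | -
Thr | ACA | 397 | 399 | 1.2 | 1.2 | trnT-UGU
Thr | ACG | 149 | 147 | 0.45 | 0.44 | -
Ala | GCU | 634 | 635 | 1.81 | 1.81 | trnA-UGC
Ala | GCC | 231 | 232 | 0.66 | 0.66 | -
Ala | GCA | 384 | 383 | 1.1 | 1.09 | -
Ala | GCG | 149 | 150 | 0.43 | 0.43 | -
Tyr | UAU | 782 | 784 | 1.6 | 1.6 | trnY-GUA
Tyr | UAC | 194 | 194 | 0.4 | 0.4 | -
STOP | UAA | 54 | 54 | 1.93 | 1.93 | -
STOP | UAG | 16 | 16 | 0.57 | 0.57 | -
His | CAU | 475 | 477 | 1.53 | 1.53 | trnH-GTG
His | CAC | 147 | 146 | 0.47 | 0.47 | -
Gln | CAA | 719 | 720 | 1.54 | 1.54 | trnQ-TTG
Gln | CAG | 215 | 216 | 0.46 | 0.46 | -
Asn | AAU | 983 | 982 | 1.54 | 1.53 | trnN-GTT
Asn | AAC | 293 | 298 | 0.46 | 0.47 | -
Lys | AAA | 48 | 42 | 1.5 | 1.5 | trnK-UUU
Lys | AAG | 350 | 348 | 0.5 | 0.5 | -
Asp | GAU | 873 | 871 | 1.61 | 1.61 | trnD-GTC
Asp | GAC | 211 | 209 | 0.39 | 0.39 | -
Glu | GAA | 20 | 22 | 1.49 | 1.49 | trnE-TTC
Glu | GAG | 348 | 349 | 0.51 | 0.51 | -
Cys | UGU | 216 | 216 | 1.47 | 1.47 | trnC-GCA
Cys | UGC | 78 | 78 | 0.53 | 0.53 | -
STOP | UGA | 14 | 14 | 0.5 | 0.5 | -
Trp | UGG | 464 | 462 | 1 | 1 | trnW-CCA
Arg | CGU | 354 | 354 | 1.34 | 1.34 | trnR-ACG
Arg | CGC | 103 | 100 | 0.39 | 0.38 | trnR-TCT
Arg | CGA | 368 | 370 | 1.4 | 1.41 | -
Arg | CGG | 113 | 112 | 0.43 | 0.43 | -
Ser | AGU | 401 | 399 | 1.19 | 1.18 | trnS-GCU
Ser | AGC | 121 | 122 | 0.36 | 0.36 | -
Arg | AGA | 474 | 478 | 1.8 | 1.82 | -
Arg | AGG | 168 | 166 | 0.64 | 0.63 | -
Gly | GGU | 606 | 606 | 1.35 | 1.35 | trnG-GCC
Gly | GGC | 166 | 167 | 0.37 | 0.37 | -
Gly | GGA | 727 | 727 | 1.62 | 1.62 | -
Gly | GGG | 295 | 292 | 0.66 | 0.65 | -

* RSCU (relative synonymous codon usage) value ≥ 1.00.

(a) Frequency of codon usage in 23,224 and 23,220 codons in all potential protein-coding genes of L. acutangula and L. aegyptiaca, respectively.

(b) Gene encoding the transfer RNA.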

Fig. 3

Comparison of the borders between the LSC, SSC and IR regions of the chloroplast genomes of six species. ψ, partial fragment of the ycf1 gene.

Fig. 4

Alignment of chloroplast genome sequences, showing percent similarity, among six species using L. acutangula as a reference.

Fig. 5

Simple sequence repeat (SSR) analysis of the L. acutangula and L. aegyptiaca chloroplast genomes. (a) Percentage of SSRs in the LSC, SSC and IR regions. (b) Number of SSRs per motif size.

Table 5

Candidate single nucleotide polymorphisms (SNPs) identified in CDS between the reference (L. acutangula) and L. aegyptiaca.

Position | Reference (L. acutangula) | L. aeg | Substitution (a) | Gene | Function
1973 | T | C | NS | matK | Maturase K
3132 | G | T | S | matK | Maturase K
5299 | T | G | NS | rps16 | 30S ribosomal protein S16
8127 | C | A | NS | psbK | Photosystem II reaction center protein K
8217 | C | A | NS | psbK | Photosystem II reaction center protein K
12059 | G | T | S | atpA | ATP synthase subunit alpha
13328 | G | T | S | atpF | ATP synthase subunit b
17060 | G | T | S | rps2 | 30S ribosomal protein S2
17982 | C | A | NS | rpoC2 | DNA-directed RNA polymerase subunit beta
18665 | C | A | NS | rpoC2 | DNA-directed RNA polymerase subunit beta
19148 | C | T | S | rpoC2 | DNA-directed RNA polymerase subunit beta
19540 | C | A | NS | rpoC2 | DNA-directed RNA polymerase subunit beta
20274 | G | T | NS | rpoC2 | DNA-directed RNA polymerase subunit beta
20678 | A | G | S | rpoC2 | DNA-directed RNA polymerase subunit beta
20777 | A | G | S | rpoC2 | DNA-directed RNA polymerase subunit beta
25097 | G | T | S | rpoB | DNA-directed RNA polymerase subunit beta
26705 | C | T | S | rpoB | DNA-directed RNA polymerase subunit beta
27002 | C | T | S | rpoB | DNA-directed RNA polymerase subunit beta
35125 | G | C | NS | psbD | Photosystem II D2 protein
51601 | G | T | NS | ndhJ | NAD(P)H-quinone oxidoreductase subunit J
52335 | G | T | S | ndhK | NAD(P)H-quinone oxidoreductase subunit K
55091 | A | T | S | atpE | ATP synthase epsilon chain
55260 | T | G | NS | atpB | ATP synthase subunit beta
55588 | C | A | S | atpB | ATP synthase subunit beta
56576 | G | A | NS | atpB | ATP synthase subunit beta
57691 | T | G | NS | rbcL | Ribulose bisphosphate carboxylase large chain
59684 | A | C | NS | accD | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta
59876 | C | A | NS | accD | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta
59878 | C | G | NS | accD | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta
59913 | G | C | S | accD | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta
60037 | A | G | S | accD | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta
60042 | T | G | S | accD | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta
60169 | T | C | NS | accD | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta
60287 | C | A | S | accD | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta
60384 | G | C | S | accD | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta
60417 | C | G | S | accD | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta
60615 | C | G | S | accD | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta
60665 | G | T | S | accD | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta
60914 | G | C | NS | accD | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta
60921 | T | G | S | accD | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta
60963 | A | G | S | accD | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta
62698 | C | A | S | ycf4 | Protein of unknown function
63405 | C | A | S | cemA | Chloroplast envelope membrane protein
63691 | A | C | NS | cemA | Chloroplast envelope membrane protein
64793 | G | A | S | petA | Cytochrome f
67969 | T | G | S | petG | Cytochrome b6-f complex subunit 5
112795 | T | G | NS | ndhF | NAD(P)H-quinone oxidoreductase subunit 5
112868 | C | G | NS | ndhF | NAD(P)H-quinone oxidoreductase subunit 5
112869 | C | A | NS | ndhF | NAD(P)H-quinone oxidoreductase subunit 5
113666 | C | A | S | ndhF | NAD(P)H-quinone oxidoreductase subunit 5
114616 | C | G | NS | ndhF | NAD(P)H-quinone oxidoreductase subunit 5
114678 | G | A | NS | ndhF | NAD(P)H-quinone oxidoreductase subunit 5
117774 | T | C | S | ccsA | Cytochrome c biogenesis protein

Note: L. aeg, Luffa aegyptiaca. (a) NS, non-synonymous; S, synonymous.
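
The S/NS labels in Table 5 follow from translating the reference and alternative codons and checking whether the amino acid changes. The sketch below illustrates that rule; it is not the authors' pipeline. It assumes the standard codon assignments (plastids use NCBI translation table 11, which assigns amino acids exactly as the standard code and differs only in its alternative start codons), and it assumes the affected codon is already known in coding-strand orientation. The function name `classify_substitution` and the example codons are hypothetical.

```python
from itertools import product

# Standard codon table: codons enumerated TTT, TTC, TTA, TTG, TCT, ... (bases in TCAG order)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(ref_codon: str, pos_in_codon: int, alt_base: str) -> str:
    """Return "S" if the SNP leaves the encoded amino acid unchanged, else "NS".

    ref_codon    -- reference codon on the coding strand (DNA alphabet)
    pos_in_codon -- 0-based position of the SNP within the codon (0, 1 or 2)
    alt_base     -- alternative base on the coding strand
    """
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:pos_in_codon] + alt_base.upper() + ref_codon[pos_in_codon + 1:]
    return "S" if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon] else "NS"


# Illustrative codons only; they are not the codons behind Table 5.
print(classify_substitution("CTG", 2, "A"))  # CTG (Leu) -> CTA (Leu): S
print(classify_substitution("TCA", 1, "T"))  # TCA (Ser) -> TTA (Leu): NS
```

Third-position changes are often synonymous, which is why many of the S entries in Table 5 are expected to fall at codon position 3; the sketch only makes that check explicit.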

Table 6

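Comparison of RNA editing patterns in the L. acutangula and L. aegyptiaca chloroplast genomes with other species.

Location | Gene | AA position | Codon conversion | AA change | Substitution | L. acutangula | L. aegyptiaca | C. sativus | C. pepo | A. thaliana | N. tabacum
LSC | atpA | 258 | uCa→uUa | S→L | Nonsynonymous | (-) | (+) | (-) | (-) | (-) | (-)
LSC | atpA | 305 | uCa→uUa | S→L | Nonsynonymous | (-) | (+) | (-) | (-) | (-) | (-)
LSC | atpA | 383 | uCa→uUa | S→L | Nonsynonymous | (-) | (+) | (-) | (-) | (-) | (-)
LSC | atpF | 31 | cCa→cUa | P→L | Nonsynonymous | (+) | (+) | (+) | (+) | (+) | (+)
LSC | rps2 | 83 | uCa→uUa | S→L | Nonsynonymous | (-) | (+) | (+) | (+) | - | (+)
LSC | rpoC2 | 1,245 | uCa→uUa | S→L | Nonsynonymous | (+) | (+) | (+) | (+) | - | (+)
LSC | rpoB | 809 | uCa→uUa | S→L | Nonsynonymous | (-) | (+) | (+) | (+) | (+) | (+)
LSC | ndhK | 22 | uCa→uUa | S→L | Nonsynonymous | (+) | (+) | (-) | (-) | (-) | (-)
LSC | petA | 273 | Cag→Uag | Q→Q | Synonymous | (-) | - | (-) | (-) | (-) | (-)
LSC | petA | 276 | gCg→gUg | A→S | Nonsynonymous | (-) | (+) | (-) | (-) | (-) | (-)
LSC | petA | 279 | guC→guU | V→V | Synonymous | (-) | - | (-) | (-) | (-) | (-)
LSC | psbJ | 20 | cCu→cUu | P→L | Nonsynonymous | (+) | (+) | (-) | (-) | (-) | (-)
LSC | psbF | 26 | uCu→uUu | S→F | Nonsynonymous | (+) | (+) | (+) | (-) | (+) | (+)
LSC | rpoA | 67 | uCu→uUu | S→F | Nonsynonymous | (+) | (+) | (-) | (-) | (-) | (-)
LSC | rpoA | 277 | uCa→uUa | S→L | Nonsynonymous | (+) | (+) | (+) | (+) | - | (+)
LSC | rps11 | 36 | uuC→uuU | F→F | Synonymous | - | - | (-) | (-) | (-) | (-)
IRb | rpl23 | 24 | uCu→uUu | S→F | Nonsynonymous | (-) | (+) | (-) | (-) | (-) | (-)
SSC | ndhD | 97 | uCa→uUa | S→L | Nonsynonymous | (+) | (-) | (-) | (-) | (-) | (-)
SSC | ndhD | 194 | uCa→uUa | S→L | Nonsynonymous | (+) | (+) | (-) | (-) | (-) | (-)
SSC | ndhD | 262 | uCa→uUa | S→L | Nonsynonymous | (-) | (+) | (-) | (-) | (-) | (-)
SSC | ndhD | 265 | uCg→uUg | S→L | Nonsynonymous | (+) | (-) | (-) | (-) | (-) | (-)
SSC | ndhE | 77 | cCa→cUa | P→L | Nonsynonymous | (+) | (+) | (-) | (-) | (-) | (-)
SSC | ndhA | 114 | uCa→uUa | S→L | Nonsynonymous | (+) | (+) | (-) | (-) | (+) | (+)
SSC | ndhH | 169 | Cau→Uau | H→Y | Nonsynonymous | (+) | (+) | (-) | (-) | (-) | (-)

Capital letters in codon triplets indicate target nucleotides; AA, amino acid; (+), editing; (-), no editing; -, U encoded in the DNA (no editing); blank space, silent mutation.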

Table 7

Polycistronic gene clusters in the L. acutangula and L. aegyptiaca chloroplast genomes.

Function | Gene cluster | L. acutangula genes | L. acutangula position | L. acutangula length (bp) | L. aegyptiaca genes | L. aegyptiaca position | L. aegyptiaca length (bp)
ATP synthase | atp-1 | atpI+atpH | 16,507..14,566 | 1,942 | atpI+atpH | 16,511..14,570 | 1,942
Ribosomal protein, ATP synthase | atp-2 | rps2+atpI+atpH | 17,422..14,566 | 2,857 | rps2+atpI | 17,432..15,768 | 1,665
NADH oxidoreductase | ndh-1 | ndhC+ndhK+ndhJ | 52,894..51,215 | 1,680 | ndhC+ndhK+ndhJ | 52,970..51,292 | 1,679
NADH oxidoreductase | ndh-2 | ndhE+psaC+ndhD | 120,578..118,128 | 2,451 | ndhE+psaC+ndhD | 120,668..118,224 | 2,445
Photosystem II | psb-1 | psbE+psbF+psbL+psbJ | 66,388..65,615 | 774 | psbE+psbF+psbL+psbJ | 66,493..65,721 | 773
Ribosomal protein | rpl-1 | rpl14+rps8+infA+rpl36+rps11 | 82,936..80,856 | 2,081 | rpl16+rpl14+rps8+infA+rpl36+rps11 | 84,678..80,945 | 3,734
Ribosomal protein | rpl-2 | - | - | - | rpl22+rps3 | 85,963..84,819 | 1,145
Ribosomal protein | rpl-3 | - | - | - | rpl23+rpl2+rps19 | 88,163..86,033 | 2,131
Ribosomal protein | rps-1 | - | - | - | rps12+rpl20 | 71,652..70,393 | 1,260
Ribosomal protein | rps-2 | - | - | - | rps19+rpl22+rps3 | 86,311..84,819 | 1,493
Ribosomal protein, NADH oxidoreductase | rps-3 | rps15+ndhH | 126,075..124,517 | 1,559 | rps15+ndhH | 126,156..124,599 | 1,558
Ribosomal RNAs | rrn-1 | rrn23+rrn4.5+rrn5 | 106,587..109,977 | 3,391 | rrn23+rrn4.5+rrn5 | 106,675..110,065 | 3,391

Fig. 6

Phylogenetic relationships of 17 species (15 Cucurbitaceae species plus the outgroups O. sativa and A. thaliana) based on 66 protein-coding chloroplast genes. Numbers above the nodes are bootstrap values from the maximum likelihood (ML) analysis.


Experimental Design, Materials and Methods

DNA extraction, sequencing and assembly

Young leaves of L. acutangula (ridge gourd) and L. aegyptiaca (smooth gourd) plants from Chia Tai Company Limited were collected at the National Omics Center, Thailand Science Park, Pathum Thani, Thailand, in March 2019 for DNA extraction. Genomic DNA was extracted using a CTAB method [2]. Total DNA was assessed with a NanoDrop One spectrophotometer (Thermo Scientific, Wilmington, USA) and visualized by pulsed-field gel electrophoresis (PFGE). High-quality DNA was used to construct PacBio libraries according to the 'Procedure & Checklist - 20 kb Template Preparation Using BluePippin Size Selection System' protocol, and the libraries were sequenced on the PacBio RSII system. The shorter PacBio reads were used to correct the longer PacBio reads, and the corrected long reads were assembled using Canu version 1.4 [3]. The resulting contigs were searched with BLAST against a plastid genome database to identify chloroplast contigs, which were then used to construct the complete chloroplast genomes.

For Illumina sequencing, young leaves of L. acutangula and L. aegyptiaca seedlings (Chia Tai Co., Ltd.) were harvested, and genomic DNA was isolated using the Roche High Pure PCR Template Preparation Kit. Genomic DNA was assessed with a NanoDrop One spectrophotometer (Thermo Scientific, Wilmington, USA). High-quality DNA was used to prepare Illumina HiSeq X Ten libraries, and 150-bp paired-end sequencing was performed by Novogene (Singapore) according to standard Illumina protocols.
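
The contig-screening step described above (BLAST of assembled contigs against plastid genomes) can be sketched as a filter over BLAST tabular output (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). This is a minimal sketch under stated assumptions, not the authors' procedure: scoring each contig by the fraction of its length covered by high-identity plastid hits, and the `min_pident` and `min_cover` thresholds, are hypothetical choices for illustration.

```python
from collections import defaultdict


def merged_length(intervals):
    """Bases covered by (start, end) intervals, counting overlapping bases once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is not None and start <= cur_end + 1:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend current block
        else:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def plastid_contigs(blast_lines, contig_lengths, min_pident=90.0, min_cover=0.5):
    """Contigs whose plastid hits (identity >= min_pident) cover >= min_cover of their length.

    blast_lines    -- BLAST -outfmt 6 lines with the assembled contigs as queries
    contig_lengths -- dict: contig ID -> contig length (bp)
    min_pident, min_cover -- hypothetical thresholds, not taken from the text
    """
    hits = defaultdict(list)
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        qseqid, pident = fields[0], float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])  # columns 7 and 8 of -outfmt 6
        if pident >= min_pident:
            hits[qseqid].append((min(qstart, qend), max(qstart, qend)))
    return sorted(
        contig for contig, ivs in hits.items()
        if merged_length(ivs) / contig_lengths[contig] >= min_cover
    )


# Toy input: contig and subject IDs are made up for illustration
blast_out = [
    "ctg1\tplastid_ref\t99.2\t5000\t40\t0\t1\t5000\t1\t5000\t0.0\t9000",
    "ctg1\tplastid_ref\t98.7\t1200\t15\t0\t4500\t5700\t6000\t7200\t0.0\t2100",
    "ctg2\tplastid_ref\t85.0\t300\t45\t0\t1\t300\t800\t1100\t1e-50\t200",
]
lengths = {"ctg1": 6000, "ctg2": 40000}
print(plastid_contigs(blast_out, lengths))  # ['ctg1']
```

Merging overlapping hits before computing coverage avoids double-counting the same contig region when it hits both inverted-repeat copies of a reference plastome.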

Chloroplast genome annotation

The assembled chloroplast genomes of L. acutangula and L. aegyptiaca were annotated with GeSeq [4] on the MPI-MP CHLOROBOX server, using HMMER, tRNAscan-SE and ARAGORN. An annotated genome map was drawn with OrganellarGenomeDRAW (OGDRAW) [5]. Finally, the preliminary annotations were curated manually to ensure that correct start and stop positions were reported.

Codon usage analysis

Relative synonymous codon usage (RSCU) values were calculated from the L. acutangula and L. aegyptiaca coding sequences using CodonW version 1.4.2 [6]. Codon usage frequency was expressed as the number of codons encoding a given amino acid divided by the total number of codons [7].
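
RSCU compares the observed count of a codon with the count expected if all synonymous codons for that amino acid were used equally: RSCU(ij) = x(ij) / [(1/n(i)) × Σj x(ij)], where x(ij) is the count of codon j for amino acid i and n(i) is the number of synonymous codons for amino acid i. The minimal sketch below implements only this definition; it is not CodonW, and it assumes codon counts have already been grouped by amino acid. Applied to the L. acutangula leucine counts in Table 4, it reproduces the RSCU column (1.88, 1.22, 1.28, 0.42, 0.82, 0.38).

```python
def rscu(codon_counts):
    """Relative synonymous codon usage for the synonymous codons of one amino acid.

    codon_counts -- dict mapping each synonymous codon to its observed count
    Returns a dict mapping each codon to count / (mean count over its synonymous codons).
    """
    expected = sum(codon_counts.values()) / len(codon_counts)
    if expected == 0:  # amino acid absent: define every RSCU as 0 rather than divide by zero
        return {codon: 0.0 for codon in codon_counts}
    return {codon: count / expected for codon, count in codon_counts.items()}


# Leucine codon counts for L. acutangula, taken from Table 4
leu = {"UUA": 860, "UUG": 556, "CUU": 585, "CUC": 190, "CUA": 377, "CUG": 174}
for codon, value in rscu(leu).items():
    print(codon, round(value, 2))
```

By construction the RSCU values of one amino acid sum to its number of synonymous codons, so values above 1.00 (the asterisked threshold in Table 4) mark codons used more often than uniform usage would predict.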

Comparative structure analysis

The IR regions in the chloroplast genomes of L. acutangula, L. aegyptiaca, Cucumis melo (NC_015983), Cucumis sativus (NC_007144), Citrullus lanatus (NC_032008) and Cucurbita pepo (NC_038229) were compared using IRscope [8]. All six chloroplast genome sequences were aligned in the LAGAN mode of the mVISTA alignment software [9] (http://genome.lbl.gov/vista/mvista/submit.shtml).
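
IRscope-style border plots are built around the four junctions between the single-copy regions and the inverted repeats, commonly labelled JLB (LSC/IRb), JSB (IRb/SSC), JSA (SSC/IRa) and JLA (IRa/LSC). Assuming the usual LSC, IRb, SSC, IRa order starting at position 1, the junction coordinates follow from cumulative region sizes. The minimal sketch below is not IRscope code; the function names are hypothetical, and the input values are the L. aegyptiaca region sizes from Table 1.

```python
def quadripartite_layout(lsc, ir, ssc):
    """1-based, inclusive coordinates of LSC, IRb, SSC and IRa, in that order."""
    regions, start = {}, 1
    for name, size in (("LSC", lsc), ("IRb", ir), ("SSC", ssc), ("IRa", ir)):
        regions[name] = (start, start + size - 1)
        start += size
    return regions


def junctions(regions):
    """Junction positions, given as the last base of the upstream region."""
    return {
        "JLB": regions["LSC"][1],  # LSC / IRb
        "JSB": regions["IRb"][1],  # IRb / SSC
        "JSA": regions["SSC"][1],  # SSC / IRa
        "JLA": regions["IRa"][1],  # IRa / LSC (the circular genome wraps to position 1)
    }


layout = quadripartite_layout(lsc=86_310, ir=26_286, ssc=18_393)  # L. aegyptiaca, Table 1
print(layout)
print(junctions(layout))
```

The end of IRa (157,275) equals the L. aegyptiaca genome size in Table 1, so the same calculation doubles as a quick consistency check on reported region sizes.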

Simple sequence repeat (SSR) analysis

The L. acutangula and L. aegyptiaca chloroplast genomes were scanned for simple sequence repeats (SSRs) using the MIcroSAtellite identification tool (MISA) [10]. The minimum number of repeat units was set to ten for mononucleotide repeats, four for di- and trinucleotide repeats, and three for tetra-, penta- and hexanucleotide repeats, following the method of Ivanova and co-workers [11].
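
The repeat thresholds above can be illustrated with a simple scanner that, at each position and for each motif length, counts how many times the motif repeats in tandem. This is a simplified sketch, not MISA: it skips motifs that are themselves repeats of a shorter unit (so AAAA is reported as a mononucleotide run, not as an AA dinucleotide), but it does not merge compound SSRs or collapse rotated motifs such as AT and TA. The function names and the toy sequence are hypothetical.

```python
from collections import Counter

# Minimum number of repeat units per motif length, as given in the text
MIN_REPEATS = {1: 10, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if motif is not a repeat of a shorter unit ('AT' is primitive, 'ATAT' is not)."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) tuples, with 1-based start positions."""
    seq = seq.upper()
    ssrs = []
    for k, min_rep in sorted(min_repeats.items()):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            if set(motif) - set("ACGT") or not is_primitive(motif):
                i += 1  # ambiguous bases or non-primitive motif: skip
                continue
            j = i + k
            while seq[j:j + k] == motif:  # extend over tandem copies
                j += k
            repeats = (j - i) // k
            if repeats >= min_rep:
                ssrs.append((i + 1, motif, repeats))
                i = j  # resume scanning after this repeat
            else:
                i += 1
    return sorted(ssrs)


example = "GC" + "A" * 12 + "GC" + "AT" * 5 + "GC" + "AAG" * 4 + "C"
hits = find_ssrs(example)
print(hits)  # [(3, 'A', 12), (17, 'AT', 5), (29, 'AAG', 4)]
print(Counter(len(motif) for _, motif, _ in hits))  # SSR count per motif size
```

Tallying hits by motif length, as in the last line, gives the kind of per-motif-size summary shown in Fig. 5b; assigning each hit to the LSC, SSC or IR coordinates would give the per-region percentages in Fig. 5a.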

Single nucleotide polymorphism (SNP) identification

Illumina reads were mapped to the chloroplast genomes using the Burrows-Wheeler Aligner (BWA-MEM) [12]. SNPs were called in L. acutangula and L. aegyptiaca using the Genome Analysis Toolkit (GATK) v4.1.2.0 [13]. SNPs were retained only if they had a read depth ≥ 20 and ≤ 10% missing data.
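
The two SNP filters can be sketched as a predicate over per-site summaries. This is a minimal sketch, not the GATK API: the `SiteCall` layout (site position, read depth, and one genotype call per sample, with None for a missing call) is hypothetical, and applying the depth threshold to total site depth rather than per-sample depth is an assumption, because the text does not say which was used. The positions and depths in the example are toy values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiteCall:
    """Hypothetical per-site summary: position, read depth and per-sample genotypes."""
    pos: int
    depth: int
    genotypes: List[Optional[str]]  # None marks a missing call


def passes_filters(site: SiteCall, min_depth: int = 20, max_missing: float = 0.10) -> bool:
    """Keep a site if depth >= min_depth and the fraction of missing calls <= max_missing."""
    if not site.genotypes:
        return False
    missing = sum(g is None for g in site.genotypes) / len(site.genotypes)
    return site.depth >= min_depth and missing <= max_missing


# Toy sites; positions and depths are illustrative, not values from this study
sites = [
    SiteCall(pos=100, depth=35, genotypes=["T"] * 9 + ["C"]),      # kept
    SiteCall(pos=200, depth=12, genotypes=["G"] * 10),              # depth below 20
    SiteCall(pos=300, depth=40, genotypes=["T"] * 8 + [None] * 2),  # 20% missing
]
print([s.pos for s in sites if passes_filters(s)])  # [100]
```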

RNA editing analysis and polycistronic mRNA in chloroplast genomes

Isoform sequencing (Iso-Seq) reads of L. acutangula [SRA accession number: SRR11445640] and L. aegyptiaca [SRA accession number: SRR11452010] were obtained from a previous study by Pootakham et al. (2020) [1]. These long reads were mapped to the corresponding chloroplast genomes using BWA-MEM [12]. RNA editing sites were then identified by calling SNPs from the RNA alignments with GATK and comparing them with the genomic SNP data [13]. To identify co-transcribed gene clusters, the RNA reads were also aligned to their respective chloroplast genome sequences using BLASTN version 2.2.28, and single reads spanning more than one gene were recorded.
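
Both analyses reduce to simple set and interval operations once alignments are summarised. The minimal sketch below is not the authors' code. For RNA editing it keeps sites that vary in the RNA but not in the genomic DNA and that look like C-to-U edits; treating C>T on the forward strand, or G>A for reverse-strand genes, as C-to-U is an assumption of the sketch. For polycistronic transcripts it lists the genes each aligned read overlaps; the `min_overlap` of 50 bp is a hypothetical threshold, and all names, positions and genes in the example are made up.

```python
from collections import Counter

# Forward-strand changes treated as C-to-U edits (G>A = C>T on the reverse strand)
C_TO_U = {("C", "T"), ("G", "A")}


def candidate_editing_sites(rna_snps, dna_snps):
    """Positions that vary in RNA but not in genomic DNA and look like C-to-U edits.

    rna_snps, dna_snps -- dict: 1-based genome position -> (ref_base, alt_base)
    """
    return sorted(
        pos for pos, change in rna_snps.items()
        if pos not in dna_snps and change in C_TO_U
    )


def genes_spanned(read_start, read_end, genes, min_overlap=50):
    """Genes overlapped by at least min_overlap bp of one aligned read.

    genes -- list of (name, start, end) in genome order, 1-based inclusive
    """
    lo, hi = sorted((read_start, read_end))
    return tuple(
        name for name, g_start, g_end in genes
        if min(hi, g_end) - max(lo, g_start) + 1 >= min_overlap
    )


def polycistronic_clusters(read_alignments, genes, min_overlap=50):
    """Count gene combinations found together on single reads spanning two or more genes."""
    clusters = Counter()
    for start, end in read_alignments:
        spanned = genes_spanned(start, end, genes, min_overlap)
        if len(spanned) >= 2:
            clusters[spanned] += 1
    return clusters


# Toy data; positions and gene names are hypothetical
rna = {100: ("C", "T"), 200: ("G", "A"), 300: ("A", "G"), 400: ("C", "T")}
dna = {400: ("C", "T")}
print(candidate_editing_sites(rna, dna))  # [100, 200]

genes = [("geneA", 100, 600), ("geneB", 650, 1200), ("geneC", 2000, 2500)]
reads = [(120, 1100), (700, 1150), (1100, 2300)]
print(polycistronic_clusters(reads, genes))
```

Here the first read links geneA and geneB, the third links geneB and geneC, and the second read covers only geneB, so it does not count as polycistronic.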

Phylogenetic analysis

The chloroplast genomes of L. acutangula and L. aegyptiaca, together with those of 13 other Cucurbitaceae species, were used to analyze phylogenetic relationships. The 13 other species were Cucumis melo (NC_015983), Cucumis sativus (NC_007144), Coccinia grandis (NC_031834), Citrullus lanatus (NC_032008), Lagenaria siceraria (NC_036808), Cucurbita maxima (NC_036505), Cucurbita moschata (NC_036506), Cucurbita pepo (NC_038229), Trichosanthes kirilowii (NC_041088), Hodgsonia macrocarpa (NC_039628), Momordica charantia (NC_036807), Siraitia grosvenorii (NC_043881) and Gynostemma pentaphyllum (NC_029484). Oryza sativa (NC_031333) and Arabidopsis thaliana (NC_000932) were included as outgroups. Sixty-six protein-coding genes conserved among these 17 species (Table S3) were aligned using Kalign [14], and a phylogenetic tree was constructed with the maximum likelihood (ML) method in MEGA X [15]. Branch support was estimated from 1,000 bootstrap replicates.
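
Nonparametric bootstrap support comes from resampling alignment columns with replacement, inferring a tree from each pseudo-replicate, and reporting the percentage of replicates that recover each clade. MEGA X performs all of this internally; the minimal sketch below shows only the resampling step on a toy alignment and is not MEGA X code. The taxon names and sequences are hypothetical, and tree inference and clade counting are left out.

```python
import random


def bootstrap_replicate(alignment, rng):
    """One bootstrap pseudo-alignment: alignment columns sampled with replacement.

    alignment -- dict mapping taxon name to an aligned sequence (all the same length)
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_cols = lengths.pop()
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]  # same columns for every taxon
    return {taxon: "".join(seq[c] for c in cols) for taxon, seq in alignment.items()}


def bootstrap_replicates(alignment, n_reps=1000, seed=1):
    """Generate n_reps pseudo-alignments (1,000 replicates, as in the analysis above)."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n_reps)]


toy = {"taxon1": "ATGCCGTA", "taxon2": "ATGCTGTA", "taxon3": "ATACTGCA"}
reps = bootstrap_replicates(toy)
print(len(reps), reps[0])
```

Sampling whole columns, rather than bases independently per taxon, keeps the sites aligned across taxa, which is what makes each replicate a valid input for tree inference.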

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.

Subject: Plant Science
Specific subject area: Genomics
Type of data: Tables, graph, figures, raw data, sequences
How data were acquired: Pacific Biosciences sequencing (PacBio RSII sequencing)
Data format: Chloroplast raw sequence data in FASTQ format; complete chloroplast genome sequences in FASTA format
Parameters for data collection: Genomic DNA was extracted from fresh leaves of L. acutangula and L. aegyptiaca plants obtained from Chia Tai Company Limited. Leaves of 61 accessions of L. acutangula and 23 accessions of L. aegyptiaca seedlings (Chia Tai Co., Ltd.) were harvested and genomic DNA was isolated.
Description of data collection: PacBio libraries were prepared and sequenced on the PacBio RSII system to assemble the complete chloroplast genomes. Illumina HiSeq X Ten libraries (150-bp paired-end) were constructed and sequenced for simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) identification.
Data source location: Institution: National Science and Technology Development Agency; Region: Khlong Luang, Pathum Thani; Country: Thailand
Data accessibility: All data in this article are available at NCBI under BioProject number PRJNA639390. Chloroplast raw sequence data are accessible under SRA accession numbers SRR12011300 (L. acutangula) and SRR12011301 (L. aegyptiaca).
Direct URLs to data: https://www.ncbi.nlm.nih.gov/sra/?term=SRR12011300 and https://www.ncbi.nlm.nih.gov/sra/?term=SRR12011301
Complete chloroplast genome sequences are accessible at NCBI under GenBank accession numbers MT381996 (L. acutangula) and MT381997 (L. aegyptiaca).
Direct URLs to data: https://www.ncbi.nlm.nih.gov/genome/?term=MT381996 and https://www.ncbi.nlm.nih.gov/genome/?term=MT381997
Isoform sequencing (Iso-Seq) data of L. acutangula [SRA accession number: SRR11445640] and L. aegyptiaca [SRA accession number: SRR11452010] were obtained from NCBI [1].
Direct URLs to data: https://www.ncbi.nlm.nih.gov/sra/?term=SRR11445640 and https://www.ncbi.nlm.nih.gov/sra/?term=SRR11452010
References

Sudhir Kumar, Glen Stecher, Michael Li, Christina Knyaz, Koichiro Tamura. MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms. Mol Biol Evol. 2018.

T Thiel, W Michalek, R K Varshney, A Graner. Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor Appl Genet. 2002.

Wirulda Pootakham, Chutima Sonthirod, Chaiwat Naktang, Wanapinun Nawae, Thippawan Yoocha, Wasitthee Kongkachana, Duangjai Sangsrakru, Nukoon Jomchai, Sonicha U-Thoomporn, John R Sheedy, Jarunee Buaboocha, Supat Mekiyanon, Sithichoke Tangphatsornruang. De novo assemblies of Luffa acutangula and Luffa cylindrica genomes reveal an expansion associated with substantial accumulation of transposable elements. Mol Ecol Resour. 2020.

Timo Lassmann, Erik L L Sonnhammer. Kalign, Kalignvu and Mumsa: web servers for multiple sequence alignment. Nucleic Acids Res. 2006.

Michael Tillich, Pascal Lehwark, Tommaso Pellizzer, Elena S Ulbricht-Jones, Axel Fischer, Ralph Bock, Stephan Greiner. GeSeq - versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 2017.

Sergey Koren, Brian P Walenz, Konstantin Berlin, Jason R Miller, Nicholas H Bergman, Adam M Phillippy. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017.

Stephan Greiner, Pascal Lehwark, Ralph Bock. OrganellarGenomeDRAW (OGDRAW) version 1.3.1: expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Res. 2019.

Meng Wang, Jie Zhang, Jian-Hua Zhou, Hao-Tai Chen, Li-Na Ma, Yao-Zhong Ding, Wen-Qian Liu, Yuan-Xing Gu, Feng Zhao, Yong-Sheng Liu. Analysis of codon usage in type 1 and the new genotypes of duck hepatitis virus. Biosystems. 2011.

Heng Li, Richard Durbin. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009.

S Tangphatsornruang, D Sangsrakru, J Chanprasert, P Uthaipaisanwong, T Yoocha, N Jomchai, S Tragoonrung. The chloroplast genome sequence of mungbean (Vigna radiata) determined by high-throughput pyrosequencing: structural organization and phylogenetic relationships. DNA Res. 2009.
