Chutintorn Yundaeng1, Wanapinun Nawae1, Chaiwat Naktang1, Jeremy R Shearman1, Chutima Sonthirod1, Duangjai Sangsrakru1, Thippawan Yoocha1, Nukoon Jomchai1, John R Sheedy2, Supat Mekiyanon2, Methawat Tuntaisong1, Wirulda Pootakham1, Sithichoke Tangphatsornruang1.
Abstract
Luffa acutangula and Luffa aegyptiaca are domesticated plants in the family Cucurbitaceae. They are mainly cultivated in the tropical and subtropical regions of Asia. The chloroplast genomes of many Cucurbitaceae species were sequenced to examine gene content and evolution. However, the chloroplast genome sequences of L. acutangula and L. aegyptiaca have not been reported. We report the first complete sequences of L. acutangula and L. aegyptiaca chloroplast genomes obtained from Pacific Biosciences sequencing and use them to infer evolutionary relationships. The chloroplast genomes of L. acutangula and L. aegyptiaca are 157,202 and 157,275 bp, respectively. Both genomes possessed the typical quadripartite structure and contained 131 genes, including 87 coding genes, 36 tRNA genes and 8 rRNA genes. We identified simple sequence repeats (SSR) and single nucleotide polymorphisms (SNP) from both chloroplast genomes. Polycistronic mRNA was examined in L. acutangula and L. aegyptiaca using RNA sequences from Isoform sequencing to identify co-transcribed genes. IR size and locations were compared to other species and found to be relatively unchanged. Phylogenetic analysis confirmed the close relationship between L. acutangula and L. aegyptiaca in the Cucurbitaceae lineage and showed separation of the Luffa monophyletic clade from other species in the subtribe Sicyocae. The results obtained from this study can be useful for studying the evolution of Cucurbitaceae plants.
Keywords: Luffa acutangula; Luffa aegyptiaca; PacBio sequencing; chloroplast genome; comparative analysis
Year: 2020 PMID: 33195780 PMCID: PMC7644877 DOI: 10.1016/j.dib.2020.106470
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1 The chloroplast genomes of L. acutangula and L. aegyptiaca. Genes shown outside of the circle are transcribed counterclockwise, while those inside are transcribed clockwise, as shown by the arrows. The functions of genes are grouped by color. Asterisks indicate intron-containing genes.
Chloroplast genome features among Cucurbitaceae species.
| Feature | L. acutangula | L. aegyptiaca | | | | |
|---|---|---|---|---|---|---|
| Genome size (bp) | 157,202 | 157,275 | 156,906 | 156,017 | 155,293 | 157,343 |
| LSC size (bp) | 86,226 | 86,310 | 86,846 | 86,335 | 86,689 | 87,828 |
| SSC size (bp) | 18,402 | 18,393 | 17,898 | 18,090 | 18,209 | 18,169 |
| IRs size (bp) | 26,280 | 26,286 | 26,081 | 25,796 | 25,199 | 25,678 |
| GC content (%) | 37.14 | 37.12 | 37.18 | 36.92 | 37.08 | 37.16 |
| LSC GC content (%) | 34.96 | 34.93 | 34.94 | 34.67 | 34.85 | 34.91 |
| SSC GC content (%) | 31.02 | 31.04 | 31.54 | 30.94 | 31.83 | 31.44 |
| IRs GC content (%) | 42.86 | 42.86 | 42.84 | 42.79 | 42.83 | 43.05 |
| No. of genes | 131 | 131 | 124 | 135 | 133 | 131 |
| No. of CDS | 87 | 87 | 87 | 90 | 89 | 86 |
| No. of tRNA | 36 | 36 | 29 | 37 | 37 | 37 |
| No. of rRNA | 8 | 8 | 8 | 8 | 8 | 8 |
| No. of CDS with intron | 15 | 15 | 10 | 16 | 15 | 15 |
| Gene coding density (%) | 50.08 | 50.04 | 49.74 | 51.74 | 50.06 | 46.60 |
| GenBank accession number | MT381996 | MT381997 | NC_032008 | NC_015983 | NC_007144 | NC_038229 |
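As a quick consistency check on the table above, a quadripartite chloroplast genome should satisfy genome size = LSC + SSC + 2 × IR. The sketch below applies that check to the first two columns, whose genome sizes match the L. acutangula and L. aegyptiaca values given in the abstract. The function name and dictionary layout are illustrative assumptions, not part of the original data article.

```python
def quadripartite_gap(genome, lsc, ssc, ir):
    """Return the reported genome size minus LSC + SSC + 2*IR (0 means consistent)."""
    return genome - (lsc + ssc + 2 * ir)

# Region sizes (bp) copied from the first two columns of the table.
regions = {
    "L. acutangula": dict(genome=157_202, lsc=86_226, ssc=18_402, ir=26_280),
    "L. aegyptiaca": dict(genome=157_275, lsc=86_310, ssc=18_393, ir=26_286),
}

gaps = {name: quadripartite_gap(**sizes) for name, sizes in regions.items()}
for name, gap in gaps.items():
    print(f"{name}: reported minus summed = {gap} bp")
```

A nonzero gap does not say which value is off; it only flags a column worth rechecking against the corresponding GenBank record.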
List of genes present in L. acutangula and L. aegyptiaca chloroplast genomes.
| Category | Gene groups | Gene name |
|---|---|---|
| Photosynthesis | Photosystem I (5) | |
| | Photosystem II (15) | |
| | Cytochrome b6/f complex (6) | |
| | ATP synthase (6) | |
| | Rubisco large subunit (1) | |
| | NADH dehydrogenase (12) | |
| Self-replication | Large subunit ribosomal protein (11) | |
| | Small subunit ribosomal protein (14) | |
| | RNA polymerase (4) | |
| | Ribosomal RNAs (8) | |
| | Transfer RNAs (36) | |
| Other genes | Acetyl-CoA carboxylase gene (1) | |
| | c-type cytochrome biogenesis (1) | |
| | ATP-dependent protease subunit (1) | |
| | Maturase (1) | |
| | Membrane protein (1) | |
| | Proteins of unknown function (7) | |
| | Translation-related gene (1) | |
Gene with intron(s)
Genes with intron(s) in L. acutangula and L. aegyptiaca chloroplast genomes.
| Gene | Location | Species | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Exon I | Intron I | Exon II | Intron II | Exon III | Exon I | Intron I | Exon II | Intron II | Exon III | ||
| (bp) | (bp) | (bp) | (bp) | (bp) | (bp) | (bp) | (bp) | (bp) | (bp) | ||
| LSC | 42 | 855 | 213 | - | - | 45 | 856 | 213 | - | - | |
| LSC | 144 | 755 | 411 | - | - | 144 | 757 | 411 | - | - | |
| LSC | 432 | 753 | 1611 | - | - | 432 | 756 | 1611 | - | - | |
| LSC | 126 | 740 | 228 | 743 | 153 | 126 | 740 | 228 | 740 | 156 | |
| LSC | 69 | 847 | 288 | 613 | 228 | 69 | 835 | 297 | 615 | 225 | |
| LSC | 6 | 783 | 642 | - | - | 9 | 780 | 642 | - | - | |
| LSC | 9 | 727 | 474 | - | - | 9 | 732 | 474 | - | - | |
| LSC | 9 | 1100 | 402 | - | - | 9 | 1098 | 402 | - | - | |
| IRb | 390 | 665 | 435 | - | - | 390 | 665 | 435 | - | - | |
| IRb | 777 | 686 | 756 | - | - | 777 | 686 | 756 | - | - | |
| IRb | 114 | 28918 | 234 | 537 | 27 | 114 | 28346 | 234 | 537 | 27 | |
| SSC | 552 | 1155 | 540 | - | - | 552 | 1146 | 540 | - | - | |
| IRa | 114 | 71157 | 234 | 537 | 27 | 114 | 71136 | 234 | 537 | 27 | |
| IRa | 786 | 677 | 756 | - | - | 777 | 686 | 756 | - | - | |
| IRa | 390 | 665 | 435 | - | - | 393 | 662 | 435 | - | - | |
Fig. 2 Amino acid frequencies in L. acutangula and L. aegyptiaca protein-coding sequences.
The codon-anticodon recognition pattern and codon usage for L. acutangula and L. aegyptiaca chloroplast genomes.
| Amino acid | Codon | Frequency (L. acutangula) | Frequency (L. aegyptiaca) | RSCU (L. acutangula) | RSCU (L. aegyptiaca) | trn |
|---|---|---|---|---|---|---|
| Phe | UUU | 957 | 957 | 1.29 | 1.29 | |
| Phe | UUC | 530 | 529 | 0.71 | 0.71 | |
| Leu | UUA | 860 | 860 | 1.88 | 1.88 | |
| Leu | UUG | 556 | 556 | 1.22 | 1.22 | |
| Leu | CUU | 585 | 585 | 1.28 | 1.28 | |
| Leu | CUC | 190 | 189 | 0.42 | 0.41 | |
| Leu | CUA | 377 | 379 | 0.82 | 0.83 | |
| Leu | CUG | 174 | 176 | 0.38 | 0.38 | |
| Ile | AUU | 84 | 83 | 1.45 | 1.45 | |
| Ile | AUC | 474 | 472 | 0.63 | 0.63 | |
| Ile | AUA | 688 | 687 | 0.92 | 0.92 | |
| Met | AUG | 624 | 625 | 1 | 1 | |
| Val | GUU | 508 | 507 | 1.43 | 1.43 | |
| Val | GUC | 181 | 183 | 0.51 | 0.52 | |
| Val | GUA | 530 | 531 | 1.5 | 1.5 | |
| Val | GUG | 198 | 198 | 0.56 | 0.56 | |
| Ser | UCU | 571 | 566 | 1.69 | 1.68 | |
| Ser | UCC | 319 | 322 | 0.94 | 0.95 | |
| Ser | UCA | 428 | 429 | 1.27 | 1.27 | |
| Ser | UCG | 189 | 188 | 0.56 | 0.56 | |
| Pro | CCU | 413 | 410 | 1.53 | 1.52 | |
| Pro | CCC | 201 | 203 | 0.75 | 0.75 | |
| Pro | CCA | 315 | 314 | 1.17 | 1.17 | |
| Pro | CCG | 150 | 151 | 0.56 | 0.56 | |
| Thr | ACU | 534 | 535 | 1.61 | 1.61 | |
| Thr | ACC | 248 | 248 | 0.75 | 0.75 | |
| Thr | ACA | 397 | 399 | 1.2 | 1.2 | |
| Thr | ACG | 149 | 147 | 0.45 | 0.44 | |
| Ala | GCU | 634 | 635 | 1.81 | 1.81 | |
| Ala | GCC | 231 | 232 | 0.66 | 0.66 | |
| Ala | GCA | 384 | 383 | 1.1 | 1.09 | |
| Ala | GCG | 149 | 150 | 0.43 | 0.43 | |
| Tyr | UAU | 782 | 784 | 1.6 | 1.6 | |
| Tyr | UAC | 194 | 194 | 0.4 | 0.4 | |
| STOP | UAA | 54 | 54 | 1.93 | 1.93 | |
| STOP | UAG | 16 | 16 | 0.57 | 0.57 | |
| His | CAU | 475 | 477 | 1.53 | 1.53 | |
| His | CAC | 147 | 146 | 0.47 | 0.47 | |
| Gln | CAA | 719 | 720 | 1.54 | 1.54 | |
| Gln | CAG | 215 | 216 | 0.46 | 0.46 | |
| Asn | AAU | 983 | 982 | 1.54 | 1.53 | |
| Asn | AAC | 293 | 298 | 0.46 | 0.47 | |
| Lys | AAA | 48 | 42 | 1.5 | 1.5 | |
| Lys | AAG | 350 | 348 | 0.5 | 0.5 | |
| Asp | GAU | 873 | 871 | 1.61 | 1.61 | |
| Asp | GAC | 211 | 209 | 0.39 | 0.39 | |
| Glu | GAA | 20 | 22 | 1.49 | 1.49 | |
| Glu | GAG | 348 | 349 | 0.51 | 0.51 | |
| Cys | UGU | 216 | 216 | 1.47 | 1.47 | |
| Cys | UGC | 78 | 78 | 0.53 | 0.53 | |
| STOP | UGA | 14 | 14 | 0.5 | 0.5 | |
| Trp | UGG | 464 | 462 | 1 | 1 | |
| Arg | CGU | 354 | 354 | 1.34 | 1.34 | |
| Arg | CGC | 103 | 100 | 0.39 | 0.38 | |
| Arg | CGA | 368 | 370 | 1.4 | 1.41 | |
| Arg | CGG | 113 | 112 | 0.43 | 0.43 | |
| Ser | AGU | 401 | 399 | 1.19 | 1.18 | |
| Ser | AGC | 121 | 122 | 0.36 | 0.36 | |
| Arg | AGA | 474 | 478 | 1.8 | 1.82 | |
| Arg | AGG | 168 | 166 | 0.64 | 0.63 | |
| Gly | GGU | 606 | 606 | 1.35 | 1.35 | |
| Gly | GGC | 166 | 167 | 0.37 | 0.37 | |
| Gly | GGA | 727 | 727 | 1.62 | 1.62 | |
| Gly | GGG | 295 | 292 | 0.66 | 0.65 | |
*RSCU (Relative synonymous codon usage) value ≥ 1.00
Frequency of codon usage in 23,224 and 23,220 codons in all potential protein-coding genes of L. acutangula and L. aegyptiaca, respectively;
Gene encoding transfer RNA
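The RSCU columns can be reproduced from the frequency columns using the standard definition: the observed count of a codon divided by the mean count across all synonymous codons for the same amino acid. The article does not spell out the formula, so the sketch below is a minimal illustration under that assumption. The function name is hypothetical, and the counts are the L. acutangula values for Phe and Leu from the table.

```python
def rscu(counts):
    """Relative synonymous codon usage for one amino acid.

    counts maps each synonymous codon to its observed count.
    RSCU = observed count / mean count across the synonymous codons.
    """
    mean = sum(counts.values()) / len(counts)
    return {codon: n / mean for codon, n in counts.items()}

# L. acutangula counts taken from the table above.
phe = rscu({"UUU": 957, "UUC": 530})
leu = rscu({"UUA": 860, "UUG": 556, "CUU": 585,
            "CUC": 190, "CUA": 377, "CUG": 174})

for name, values in (("Phe", phe), ("Leu", leu)):
    print(name, {codon: round(v, 2) for codon, v in values.items()})
```

Rounded to two decimals, these values agree with the tabulated RSCU entries for the same codons. A value above 1 means the codon is used more often than expected under uniform synonymous usage.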
Fig. 3 Comparison of the chloroplast genome borders of the LSC, SSC, and IR regions among six species. ψ indicates a partial fragment of the ycf1 gene.
Fig. 4 Alignment of chloroplast genome sequences, showing percent similarity, among six species using L. acutangula as a reference.
Fig. 5 Simple sequence repeat (SSR) analysis in L. acutangula and L. aegyptiaca chloroplast genomes. (a) SSR percentage in the LSC, SSC and IR regions, (b) Number of SSRs per motif size.
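To make the idea of SSR detection concrete, the sketch below scans a sequence for perfect tandem repeats with a regular expression. The tool and thresholds used in the study are not stated here, so the minimum repeat counts, the function name and the toy sequence are all illustrative assumptions.

```python
import re

# Minimum repeat counts per motif length (assumed values, not the study's settings).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect tandem repeats; start is 0-based."""
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least min_rep - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return hits

# Toy sequence, not taken from the Luffa genomes.
toy = "GGC" + "A" * 12 + "CGC" + "AT" * 6 + "GCC"
print(find_ssrs(toy))
```

Counting the hits by motif length, or by the region in which the start coordinate falls, would give the kind of summaries shown in panels (a) and (b).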
Candidate single nucleotide polymorphisms (SNPs) identified in CDS between the reference (L. acutangula) and L. aegyptiaca.
| Position | Reference | L. aeg | Substitution (a) | Gene | Function |
|---|---|---|---|---|---|
| 1973 | T | C | NS | Maturase K | |
| 3132 | G | T | S | Maturase K | |
| 5299 | T | G | NS | 30S ribosomal protein S16 | |
| 8127 | C | A | NS | Photosystem II reaction center protein K | |
| 8217 | C | A | NS | Photosystem II reaction center protein K | |
| 12059 | G | T | S | ATP synthase subunit alpha | |
| 13328 | G | T | S | ATP synthase subunit b | |
| 17060 | G | T | S | 30S ribosomal protein S2 | |
| 17982 | C | A | NS | DNA-directed RNA polymerase subunit beta | |
| 18665 | C | A | NS | DNA-directed RNA polymerase subunit beta | |
| 19148 | C | T | S | DNA-directed RNA polymerase subunit beta | |
| 19540 | C | A | NS | DNA-directed RNA polymerase subunit beta | |
| 20274 | G | T | NS | DNA-directed RNA polymerase subunit beta | |
| 20678 | A | G | S | DNA-directed RNA polymerase subunit beta | |
| 20777 | A | G | S | DNA-directed RNA polymerase subunit beta | |
| 25097 | G | T | S | DNA-directed RNA polymerase subunit beta | |
| 26705 | C | T | S | DNA-directed RNA polymerase subunit beta | |
| 27002 | C | T | S | DNA-directed RNA polymerase subunit beta | |
| 35125 | G | C | NS | Photosystem II D2 protein | |
| 51601 | G | T | NS | NAD(P)H-quinone oxidoreductase subunit J | |
| 52335 | G | T | S | NAD(P)H-quinone oxidoreductase subunit K | |
| 55091 | A | T | S | ATP synthase epsilon chain | |
| 55260 | T | G | NS | ATP synthase subunit beta | |
| 55588 | C | A | S | ATP synthase subunit beta | |
| 56576 | G | A | NS | ATP synthase subunit beta | |
| 57691 | T | G | NS | Ribulose bisphosphate carboxylase large chain | |
| 59684 | A | C | NS | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta | |
| 59876 | C | A | NS | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta | |
| 59878 | C | G | NS | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta | |
| 59913 | G | C | S | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta | |
| 60037 | A | G | S | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta | |
| 60042 | T | G | S | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta | |
| 60169 | T | C | NS | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta | |
| 60287 | C | A | S | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta | |
| 60384 | G | C | S | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta | |
| 60417 | C | G | S | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta | |
| 60615 | C | G | S | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta | |
| 60665 | G | T | S | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta | |
| 60914 | G | C | NS | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta | |
| 60921 | T | G | S | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta | |
| 60963 | A | G | S | Acetyl-coenzyme A carboxylase carboxyl transferase subunit beta | |
| 62698 | C | A | S | Proteins of unknown function | |
| 63405 | C | A | S | Chloroplast envelope membrane protein | |
| 63691 | A | C | NS | Chloroplast envelope membrane protein | |
| 64793 | G | A | S | Cytochrome f | |
| 67969 | T | G | S | Cytochrome b6-f complex subunit 5 | |
| 112795 | T | G | NS | NAD(P)H-quinone oxidoreductase subunit 5 | |
| 112868 | C | G | NS | NAD(P)H-quinone oxidoreductase subunit 5 | |
| 112869 | C | A | NS | NAD(P)H-quinone oxidoreductase subunit 5 | |
| 113666 | C | A | S | NAD(P)H-quinone oxidoreductase subunit 5 | |
| 114616 | C | G | NS | NAD(P)H-quinone oxidoreductase subunit 5 | |
| 114678 | G | A | NS | NAD(P)H-quinone oxidoreductase subunit 5 | |
| 117774 | T | C | S | Cytochrome c biogenesis protein |
Note: L. aeg, Luffa aegyptiaca; (a) NS: non-synonymous, S: synonymous
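The S/NS labels depend on whether the base change alters the encoded amino acid. The sketch below shows that classification for a single codon using the standard genetic code. The table lists genome positions rather than codon context, so the codons here are hypothetical examples and the function name is an assumption.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_snp(ref_codon, pos_in_codon, alt_base):
    """Return 'S' or 'NS' for a single-base change in a codon (0-based position)."""
    ref_codon = ref_codon.upper()
    alt_codon = (ref_codon[:pos_in_codon] + alt_base.upper()
                 + ref_codon[pos_in_codon + 1:])
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "S" if same else "NS"

# Hypothetical codons, not taken from the Luffa sequences.
print(classify_snp("GGT", 2, "C"))  # Gly -> Gly
print(classify_snp("GAT", 2, "A"))  # Asp -> Glu
```

For genes encoded on the reverse strand, both the codon and the substituted base would need to be reverse-complemented before calling such a function.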
Comparison of RNA editing patterns in L. acutangula and L. aegyptiaca chloroplast genomes with other species.
| Location | Gene | AA position | Codon conversion | AA Change | Substitution | L. acutangula | L. aegyptiaca | C. sativus | C. pepo | A. thaliana | N. tabacum |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LSC | 258 | uCa→uUa | S→L | Nonsynonymous | (-) | (+) | (-) | (-) | (-) | (-) | |
| 305 | uCa→uUa | S→L | Nonsynonymous | (-) | (+) | (-) | (-) | (-) | (-) | ||
| 383 | uCa→uUa | S→L | Nonsynonymous | (-) | (+) | (-) | (-) | (-) | (-) | ||
| 31 | cCa→cUa | P→L | Nonsynonymous | (+) | (+) | (+) | (+) | (+) | (+) | ||
| 83 | uCa→uUa | S→L | Nonsynonymous | (-) | (+) | (+) | (+) | - | (+) | ||
| 1,245 | uCa→uUa | S→L | Nonsynonymous | (+) | (+) | (+) | (+) | - | (+) | ||
| 809 | uCa→uUa | S→L | Nonsynonymous | (-) | (+) | (+) | (+) | (+) | (+) | ||
| 22 | uCa→uUa | S→L | Nonsynonymous | (+) | (+) | (-) | (-) | (-) | (-) | ||
| 273 | Cag→Uag | Q→Q | Synonymous | (-) | - | (-) | (-) | (-) | (-) | ||
| 276 | gCg→gUg | A→S | Nonsynonymous | (-) | (+) | (-) | (-) | (-) | (-) | ||
| 279 | guC→guU | V→V | Synonymous | (-) | - | (-) | (-) | (-) | (-) | ||
| 20 | cCu→cUu | P→L | Nonsynonymous | (+) | (+) | (-) | (-) | (-) | (-) | ||
| 26 | uCu→uUu | S→F | Nonsynonymous | (+) | (+) | (+) | (-) | (+) | (+) | ||
| 67 | uCu→uUu | S→F | Nonsynonymous | (+) | (+) | (-) | (-) | (-) | (-) | ||
| 277 | uCa→uUa | S→L | Nonsynonymous | (+) | (+) | (+) | (+) | - | (+) | ||
| 36 | uuC→uuU | F→F | Synonymous | - | - | (-) | (-) | (-) | (-) | ||
| IRb | 24 | uCu→uUu | S→F | Nonsynonymous | (-) | (+) | (-) | (-) | (-) | (-) | |
| SSC | 97 | uCa→uUa | S→L | Nonsynonymous | (+) | (-) | (-) | (-) | (-) | (-) | |
| 194 | uCa→uUa | S→L | Nonsynonymous | (+) | (+) | (-) | (-) | (-) | (-) | ||
| 262 | uCa→uUa | S→L | Nonsynonymous | (-) | (+) | (-) | (-) | (-) | (-) | ||
| 265 | uCg→uUg | S→L | Nonsynonymous | (+) | (-) | (-) | (-) | (-) | (-) | ||
| 77 | cCa→cUa | P→L | Nonsynonymous | (+) | (+) | (-) | (-) | (-) | (-) | ||
| 114 | uCa→uUa | S→L | Nonsynonymous | (+) | (+) | (-) | (-) | (+) | (+) | ||
| 169 | Cau→Uau | H→Y | Nonsynonymous | (+) | (+) | (-) | (-) | (-) | (-) |
Capital letters in codon triplets indicate target nucleotides; AA, Amino acid; (+), editing; (-), no editing; -, U encoded in the DNA (no editing); Blank space, Silent mutation
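The codon conversion notation in the table (for example uCa→uUa) can be translated mechanically to recover the amino acid change and the substitution type. The following is a minimal sketch using the standard genetic code in RNA letters. The function name is an assumption, and the inputs are conversions that appear in the table.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code in RNA letters, with codons enumerated in UCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def edit_effect(conversion):
    """Translate both sides of a conversion written like 'uCa→uUa'."""
    before, after = (codon.upper() for codon in conversion.split("→"))
    aa_before, aa_after = CODON_TABLE[before], CODON_TABLE[after]
    kind = "Synonymous" if aa_before == aa_after else "Nonsynonymous"
    return f"{aa_before}→{aa_after}", kind

for conversion in ("uCa→uUa", "cCa→cUa", "Cau→Uau", "guC→guU"):
    print(conversion, *edit_effect(conversion))
```

Running every row through a check of this kind is one way to spot entries whose codon and amino acid columns disagree.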
Polycistronic gene clusters in L. acutangula and L. aegyptiaca chloroplast genomes.
| Function | Gene cluster | Luffa acutangula genes | Luffa acutangula position | Luffa acutangula length (bp) | Luffa aegyptiaca genes | Luffa aegyptiaca position | Luffa aegyptiaca length (bp) |
|---|---|---|---|---|---|---|---|
| ATP synthase | atp-1 | | 16,507..14,566 | 1,942 | atpI+atpH | 16,511..14,570 | 1,942 |
| Ribosomal protein, ATP synthase | atp-2 | | 17,422..14,566 | 2,857 | rps2+atpI | 17,432..15,768 | 1,665 |
| NADH oxidoreductase | ndh-1 | | 52,894..51,215 | 1,680 | ndhC+ndhK+ndhJ | 52,970..51,292 | 1,679 |
| NADH oxidoreductase | ndh-2 | | 120,578..118,128 | 2,451 | ndhE+psaC+ndhD | 120,668..118,224 | 2,445 |
| Photosystem II | psb-1 | | 66,388..65,615 | 774 | psbE+psbF+psbL+psbJ | 66,493..65,721 | 773 |
| Ribosomal protein | rpl-1 | | 82,936..80,856 | 2,081 | rpl16+rpl14+rps8+ | 84,678..80,945 | 3,734 |
| Ribosomal protein | rpl-2 | - | - | - | rpl22+rps3 | 85,963..84,819 | 1,145 |
| Ribosomal protein | rpl-3 | - | - | - | rpl23+rpl2+rps19 | 88,163..86,033 | 2,131 |
| Ribosomal protein | rps-1 | - | - | - | rps12+rpl20 | 71,652..70,393 | 1,260 |
| Ribosomal protein | rps-2 | - | - | - | rps19+rpl22+rps3 | 86,311..84,819 | 1,493 |
| Ribosomal protein, NADH oxidoreductase | rps-3 | | 126,075..124,517 | 1,559 | rps15+ndhH | 126,156..124,599 | 1,558 |
| Ribosomal RNAs | rrn-1 | | 106,587..109,977 | 3,391 | rrn23+rrn4.5+rrn5 | 106,675..110,065 | 3,391 |
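The length column follows from the position column as an inclusive coordinate span, that is, |start − end| + 1. The sketch below parses the `start..end` notation used in the table and recomputes the length. The function name is an illustrative assumption.

```python
def span_length(position):
    """Length in bp of an inclusive range written like '16,507..14,566'."""
    start, end = (int(part.replace(",", "")) for part in position.split(".."))
    return abs(end - start) + 1

# Positions copied from the table above.
for position in ("16,507..14,566", "17,432..15,768", "106,675..110,065"):
    print(position, span_length(position))
```

Comparing the recomputed value with the tabulated length is a cheap way to catch transcription errors in either column.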
Fig. 6 Phylogenetic relationship of 17 species within the Cucurbitaceae family based on 66 protein-coding chloroplast genes. O. sativa and A. thaliana are outgroups. Numbers above the nodes are the bootstrap values from maximum likelihood (ML) analysis.
| Subject | Plant Science |
|---|---|
| Specific subject area | Genomics |
| Type of data | Tables |
| How data were acquired | Pacific Biosciences sequencing (PacBio RSII sequencing) |
| Data format | Chloroplast raw sequence data in FASTQ format |
| Parameters for data collection | Genomic DNA was extracted from fresh leaves of |
| Description of data collection | PacBio libraries were prepared and sequenced on the PacBio RSII platform for complete chloroplast genome assembly. |
| Data source location | Institution: National Science and Technology Development Agency, Region: Khlong Luang, Pathum Thani |
| Data accessibility | All data in this article are available at NCBI, BioProject number PRJNA639390. Chloroplast raw sequence data with this article are accessible under SRA accession number SRR12011300 ( |