Wenbin Wang, Huan Yu, Jiahui Wang, Wanjun Lei, Jianhua Gao, Xiangpo Qiu, Jinsheng Wang.
Abstract
Forsythia suspensa is an important medicinal plant traditionally used to treat inflammation, pyrexia, gonorrhea, diabetes, and other conditions. However, limited sequence and genomic information is available for F. suspensa. Here, we produced the complete chloroplast genome of F. suspensa using Illumina sequencing technology. F. suspensa is the first sequenced member of the genus Forsythia (Oleaceae). The gene order and organization of the F. suspensa chloroplast genome are similar to those of other Oleaceae chloroplast genomes. The F. suspensa chloroplast genome is 156,404 bp in length and exhibits a conserved quadripartite structure, with a large single-copy (LSC; 87,159 bp) region and a small single-copy (SSC; 17,811 bp) region separated by a pair of inverted repeat (IRa/b; 25,717 bp each) regions. A total of 114 unique genes were annotated, including 80 protein-coding genes, 30 tRNA genes, and four rRNA genes. The low GC content (37.8%) and the bias toward A- or T-ending codons may largely shape gene codon usage. Sequence analysis identified a total of 26 forward repeats and 23 palindromic repeats with lengths >30 bp (identity > 90%), as well as 54 simple sequence repeats (SSRs) with an average rate of 0.35 SSRs/kb. We predicted 52 RNA editing sites in the chloroplast of F. suspensa, all of which are C-to-U transitions. IR expansion or contraction and the divergent regions were analyzed among several species, including the F. suspensa genome reported in this study. Phylogenetic analysis based on whole plastomes revealed that F. suspensa, as a member of the Oleaceae family, diverged relatively early within Lamiales. This study will support medicinal resource conservation, molecular phylogenetics, and genetic engineering research on this species.
Keywords: Forsythia suspensa; chloroplast genome; comparative genomics; phylogenetic analysis; sequencing
Year: 2017 PMID: 29088105 PMCID: PMC5713258 DOI: 10.3390/ijms18112288
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
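The figures quoted in the abstract can be cross-checked with simple arithmetic. The short sketch below is only an illustration; the variable names are ours, and the numbers are copied from the abstract. It confirms that the four regions of the quadripartite structure add up to the reported genome length, that the gene classes add up to 114, and that 54 SSRs over the genome give roughly 0.35 SSRs/kb.

```python
# Region lengths (bp) as reported in the abstract
LSC, SSC, IR = 87_159, 17_811, 25_717
GENOME = 156_404

total = LSC + SSC + 2 * IR            # the inverted repeat is present twice (IRa and IRb)
genes = 80 + 30 + 4                   # protein-coding + tRNA + rRNA genes
ssr_per_kb = 54 / (GENOME / 1000)     # SSRs per kilobase of genome

print(total, genes, round(ssr_per_kb, 2))
```

The IR length enters twice, which is why the 25,717 bp figure has to be read as the length of each copy.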
Figure 1. Chloroplast genome map of Forsythia suspensa. Genes drawn inside the circle are transcribed clockwise, and those outside are transcribed counterclockwise. Genes are color-coded by function, as shown in the legend at the bottom left. The inner circle indicates the inverted repeat boundaries and GC content.
A list of genes found in the plastid genome of Forsythia suspensa.
| Category for Genes | Group of Gene | Name of Gene |
|---|---|---|
| Photosynthesis related genes | Rubisco | |
| | Photosystem I | |
| | Assembly/stability of photosystem I | |
| | Photosystem II | |
| | ATP synthase | |
| | cytochrome b/f complex | |
| | cytochrome c synthesis | |
| | NADPH dehydrogenase | |
| Transcription and translation related genes | transcription | |
| | ribosomal proteins | |
| | translation initiation factor | |
| RNA genes | ribosomal RNA | |
| | transfer RNA | |
| Other genes | RNA processing | |
| | carbon metabolism | |
| | fatty acid synthesis | |
| | proteolysis | |
| Genes of unknown function | conserved reading frames | |
* indicates intron-containing genes.
Genes with introns within the F. suspensa chloroplast genome and the length of exons and introns.
| Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
|---|---|---|---|---|---|---|
| | IR | 38 | 814 | 35 | | |
| | LSC | 24 | 676 | 48 | | |
| | IR | 42 | 942 | 35 | | |
| | LSC | 38 | 2494 | 37 | | |
| | LSC | 37 | 473 | 50 | | |
| | LSC | 38 | 572 | 37 | | |
| | LSC | 114 | - | 231 | 536 | 27 |
| | LSC | 40 | 864 | 227 | | |
| | LSC | 144 | 705 | 411 | | |
| | LSC | 445 | 758 | 1619 | | |
| | LSC | 129 | 714 | 228 | 737 | 153 |
| | LSC | 69 | 815 | 291 | 642 | 228 |
| | LSC | 6 | 707 | 642 | | |
| | LSC | 8 | 713 | 475 | | |
| | LSC | 9 | 865 | 399 | | |
| | IR | 393 | 664 | 435 | | |
| | IR | 777 | 679 | 756 | | |
| | SSC | 555 | 1106 | 531 | | |
* rps12 is a trans-spliced gene, with its 5′ end located in the LSC region and its duplicated 3′ end in the IR regions.
Figure 2. Comparison of the LSC, SSC, and IR region borders among six Lamiales chloroplast genomes. Ψ indicates a pseudogene. Color coding distinguishes the genes on either side of the junctions. Numbers above the gene features give the distance between the gene ends and the junction sites, and the arrows mark the span each distance refers to. This figure is not to scale.
The relative synonymous codon usage of the Forsythia suspensa chloroplast genome.
| Amino Acids | Codon | Number | RSCU | AA Frequency | Amino Acids | Codon | Number | RSCU | AA Frequency |
|---|---|---|---|---|---|---|---|---|---|
| Phe | UUU | 779 | | 5.59% | Ser | UCU | 472 | | 7.59% |
| | UUC | 405 | 0.68 | | | UCC | 247 | 0.92 | |
| Leu | UUA | 720 | | 10.56% | | UCA | 307 | | |
| | UUG | 451 | | | | UCG | 152 | 0.57 | |
| | CUU | 486 | | | | AGU | 339 | | |
| | CUC | 129 | 0.35 | | | AGC | 91 | 0.34 | |
| | CUA | 301 | 0.81 | | Pro | CCU | 351 | | 4.26% |
| | CUG | 150 | 0.40 | | | CCC | 170 | 0.75 | |
| Ile | AUU | 890 | | 8.57% | | CCA | 269 | | |
| | AUC | 377 | 0.62 | | | CCG | 113 | 0.50 | |
| | AUA | 548 | 0.91 | | Thr | ACU | 430 | | 4.98% |
| Met | AUG | 495 | 1.00 | 2.34% | | ACC | 201 | 0.76 | |
| Val | GUU | 423 | | 5.41% | | ACA | 324 | | |
| | GUC | 126 | 0.44 | | | ACG | 100 | 0.38 | |
| | GUA | 447 | | | Ala | GCU | 526 | | 5.41% |
| | GUG | 151 | 0.53 | | | GCC | 177 | 0.62 | |
| Tyr | UAU | 631 | | 3.70% | | GCA | 328 | | |
| | UAC | 152 | 0.39 | | | GCG | 115 | 0.40 | |
| TER | UAA | 28 | | 0.25% | Cys | UGU | 171 | | 1.05% |
| | UAG | 10 | 0.58 | | | UGC | 52 | 0.47 | |
| | UGA | 14 | 0.81 | | Arg | CGU | 275 | | 6.00% |
| His | CAU | 404 | | 2.42% | | CGC | 90 | 0.42 | |
| | CAC | 108 | 0.42 | | | CGA | 284 | | |
| Gln | CAA | 595 | | 3.69% | | CGG | 97 | 0.46 | |
| | CAG | 186 | 0.48 | | Arg | AGA | 392 | | |
| Asn | AAU | 796 | | 4.81% | | AGG | 133 | 0.63 | |
| | AAC | 224 | 0.44 | | Gly | GGU | 493 | | 7.00% |
| Lys | AAA | 837 | | 5.15% | | GGC | 145 | 0.39 | |
| | AAG | 253 | 0.46 | | | GGA | 594 | | |
| Asp | GAU | 690 | | 4.09% | | GGG | 251 | 0.68 | |
| | GAC | 176 | 0.41 | | Glu | GAA | 866 | | 5.32% |
| Trp | UGG | 386 | 1.00 | 1.82% | | GAG | 262 | 0.46 | |
Relative synonymous codon usage (RSCU) values > 1 are highlighted in bold.
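RSCU is conventionally defined as a codon's observed count divided by the mean count across all synonymous codons for the same amino acid. The sketch below applies that standard definition to three codon families whose counts are copied from the table. The function name `rscu` and the dictionary layout are illustrative choices, not taken from the paper. The results agree with the RSCU values that are present in the table (for example 0.68 for UUC and 0.39 for GGC).

```python
# Codon counts copied from the table for three amino-acid families
counts = {
    "Phe": {"UUU": 779, "UUC": 405},
    "Ile": {"AUU": 890, "AUC": 377, "AUA": 548},
    "Gly": {"GGU": 493, "GGC": 145, "GGA": 594, "GGG": 251},
}

def rscu(family):
    """RSCU = observed count / mean count over the synonymous codons of one amino acid."""
    mean = sum(family.values()) / len(family)
    return {codon: n / mean for codon, n in family.items()}

for aa, family in counts.items():
    for codon, value in rscu(family).items():
        print(aa, codon, round(value, 2))
```

Within each family the RSCU values sum to the number of synonymous codons, so a value above 1 means the codon is used more often than expected under uniform usage.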
Repetitive sequences of Forsythia suspensa calculated using REPuter.
| No. | Size/bp | Type # | Repeat 1 Start (Location) | Repeat 2 Start (Location) | Region |
|---|---|---|---|---|---|
| 1 | 30 | F | 10,814 ( | 38,746 ( | LSC |
| 2 | 30 | F | 17,447 ( | 17,448 ( | LSC |
| 3 | 30 | F | 44,547 ( | 44,550 ( | LSC |
| 4 | 30 | F | 45,978 ( | 101,338 ( | LSC, IRa |
| 5 | 30 | F | 91,923 ( | 91,965 ( | IRa |
| 6 | 30 | F | 110,167 ( | 110,198 ( | IRa |
| 7 | 30 | F | 133,335 ( | 133,366 ( | IRb |
| 8 | 30 | F | 149,178 ( | 149,214 ( | IRb |
| 9 | 30 | F | 149,196 ( | 149,214 ( | IRb |
| 10 | 30 | F | 151,568 ( | 151,610 ( | IRb |
| 11 | 32 | F | 9313 ( | 37,781 ( | LSC |
| 12 | 32 | F | 40,965 ( | 43,189 ( | LSC |
| 13 | 32 | F | 53,338 ( | 53,358 ( | LSC |
| 14 | 32 | F | 115,350 ( | 115,378 ( | SSC |
| 15 | 34 | F | 94,332 ( | 94,368 ( | IRa |
| 16 | 34 | F | 94,350 ( | 94,368 ( | IRa |
| 17 | 35 | F | 149,188 ( | 149,206 ( | IRb |
| 18 | 39 | F | 45,966 ( | 101,326 ( | LSC, IRa |
| 19 | 39 | F | 45,966 ( | 122,604 ( | LSC, SSC |
| 20 | 41 | F | 40,953 ( | 43,177 ( | LSC |
| 21 | 41 | F | 101,324 ( | 122,602 ( | IRa, SSC |
| 22 | 42 | F | 94,320 ( | 94,356 ( | IRa |
| 23 | 42 | F | 149,165 ( | 149,201 ( | IRb |
| 24 | 44 | F | 94,340 ( | 94,358 ( | IRa |
| 25 | 58 | F | 94,332 ( | 94,340 ( | IRa |
| 26 | 58 | F | 149,165 ( | 149,183 ( | IRb |
| 27 | 30 | P | 9315 ( | 47,653 ( | LSC |
| 28 | 30 | P | 14,359 ( | 14,359 ( | LSC |
| 29 | 30 | P | 34,338 ( | 34,338 ( | LSC |
| 30 | 30 | P | 37,783 ( | 47,653 ( | LSC |
| 31 | 30 | P | 45,978 ( | 142,195 ( | LSC, IRb |
| 32 | 30 | P | 91,923 ( | 151,568 ( | IRa, IRb |
| 33 | 30 | P | 91,965 ( | 151,610 ( | IRa, IRb |
| 34 | 30 | P | 110,167 ( | 133,335 ( | IRa, IRb |
| 35 | 30 | P | 110,198 ( | 133,366 ( | IRa, IRb |
| 36 | 30 | P | 122,764 ( | 122,766 ( | SSC |
| 37 | 34 | P | 94,332 ( | 149,161 ( | IRa, IRb |
| 38 | 34 | P | 94,350 ( | 149,161 ( | IRa, IRb |
| 39 | 34 | P | 94,368 ( | 149,179 ( | IRa, IRb |
| 40 | 34 | P | 94,368 ( | 149,179 ( | IRa, IRb |
| 41 | 39 | P | 45,966 ( | 45,966 ( | LSC, IRb |
| 42 | 41 | P | 122,602 ( | 142,198 ( | SSC, IRb |
| 43 | 42 | P | 94,320 ( | 149,165 ( | IRa, IRb |
| 44 | 42 | P | 94,356 ( | 149,201 ( | IRa, IRb |
| 45 | 44 | P | 77,475 ( | 77,475 ( | LSC |
| 46 | 44 | P | 94,340 ( | 149,161 ( | IRa, IRb |
| 47 | 44 | P | 94,358 ( | 149,179 ( | IRa, IRb |
| 48 | 58 | P | 94,332 ( | 149,165 ( | IRa, IRb |
| 49 | 58 | P | 94,340 ( | 149,183 ( | IRa, IRb |
# F: forward; P: palindrome; * part in the gene.
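The Region column can be reproduced from the start coordinates and the region lengths given in the abstract. The sketch below assumes that coordinate 1 is the first base of the LSC and that the regions follow in the order LSC, IRa, SSC, IRb. That ordering is our assumption rather than something the text states, although it matches the rows checked here. The name `region_of` is hypothetical.

```python
from bisect import bisect_left

# Region lengths (bp) from the abstract
LSC, IR, SSC = 87_159, 25_717, 17_811

# Assumed region order and the 1-based end coordinate of each region
REGIONS = ["LSC", "IRa", "SSC", "IRb"]
ENDS = [LSC, LSC + IR, LSC + IR + SSC, LSC + 2 * IR + SSC]

def region_of(pos):
    """Return the region containing a 1-based genome coordinate."""
    if not 1 <= pos <= ENDS[-1]:
        raise ValueError(f"position {pos} is outside the genome (1..{ENDS[-1]})")
    # Index of the first region whose end coordinate is >= pos
    return REGIONS[bisect_left(ENDS, pos)]

# Repeat 4 in the table starts at 45,978 and 101,338
print(region_of(45_978), region_of(101_338))
```

For a repeat pair, calling `region_of` on both start positions yields the two-part labels used in the table, such as "LSC, IRa".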
Figure 3. Distribution of repeat sequences and simple sequence repeats (SSRs) within the F. suspensa chloroplast genome. (A) Distribution of repeats; (B) distribution of SSRs. IGS: intergenic spacer.
Distribution of SSR loci in the chloroplast genome of Forsythia suspensa.
| SSR Type # | SSR Sequence | Size | Start | SSR Location | Region |
|---|---|---|---|---|---|
| p1 | (A)10 | 10 | 31,855 | | LSC |
| | | 10 | 31,992 | | LSC |
| | | 10 | 38,025 | | LSC |
| | | 10 | 73,886 | | LSC |
| | | 10 | 85,390 | | LSC |
| | (T)10 | 10 | 507 | | LSC |
| | | 10 | 9056 | | LSC |
| | | 10 | 11,162 | | LSC |
| | | 10 | 59,781 | | LSC |
| | | 10 | 66,291 | | LSC |
| | | 10 | 69,202 | | LSC |
| | (C)10 | 10 | 5236 | | LSC |
| | (T)11 | 11 | 19,678 | | LSC |
| | | 11 | 50,871 | | LSC |
| | | 11 | 61,662 | | LSC |
| | | 11 | 72,263 | | LSC |
| | | 11 | 74,741 | | LSC |
| | (T)12 | 12 | 20,216 | | LSC |
| | | 12 | 81,254 | | LSC |
| | | 12 | 83,666 | | LSC |
| | (A)13 | 13 | 12,741 | | LSC |
| | | 13 | 46,877 | | LSC |
| | (T)13 | 13 | 14,109 | | LSC |
| | | 13 | 34,486 | | LSC |
| | | 13 | 37,645 | | LSC |
| | | 13 | 86,860 | | LSC |
| | (T)14 | 14 | 48,630 | | LSC |
| | (A)15 | 15 | 33,163 | | LSC |
| | (A)16 | 16 | 46,618 | | LSC |
| | (A)19 | 19 | 44,559 | | LSC |
| | (T)19 | 19 | 117,928 | | SSC |
| | (A)20 | 20 | 29,957 | | LSC |
| p2 | (AT)5 | 10 | 4646 | | LSC |
| | | 10 | 6558 | | LSC |
| | | 10 | 21,057 | | LSC |
| | (TA)5 | 10 | 69,619 | | LSC |
| | (TA)6 | 12 | 48,772 | | LSC |
| | | 12 | 49,291 | | LSC |
| | | 12 | 69,931 | | LSC |
| p3 | (CCT)4 | 12 | 69,371 | | LSC |
| p4 | (AAAG)3 | 12 | 73,413 | | LSC |
| | (TCTT)3 | 12 | 31,191 | | LSC |
| | (TTTA)3 | 12 | 55,102 | | LSC |
| | (AAAT)4 | 16 | 9284 | | LSC |
| p5 | (TCTAT)3 | 15 | 9458 | | LSC |
| c | - | 23 | 17,456 | | LSC |
| | - | 27 | 63,589 | | LSC |
| | - | 33 | 78,324 | | LSC |
| | - | 45 | 71,570 | | LSC |
| | - | 59 | 38,501 | | LSC |
| | - | 90 | 57,078 | | LSC |
# p1: mono-nucleotide; p2: di-nucleotide; p3: tri-nucleotide; p4: tetra-nucleotide; p5: penta-nucleotide; c: compound; * part in the gene.
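Perfect SSRs of the p1 to p5 classes can be located with a regular-expression scan. The sketch below is a minimal illustration and not necessarily the tool or settings the authors used. The minimum repeat counts in `MIN_REPEATS` are assumptions inferred from the smallest entries in the table (10 for mono-, 5 for di-, 4 for tri-, and 3 for tetra- and penta-nucleotide motifs). The function names are hypothetical, and compound SSRs (class c) are not handled.

```python
import re

# Minimum repeat count per motif length; assumed from the smallest table entries
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3}

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AA', 'TATA')."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))

def find_ssrs(seq):
    """Return (motif, repeat_count, 1-based start) for each perfect SSR, sorted by start."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-base motif followed by at least (min_rep - 1) further copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' x5, already reported as (A)10
                hits.append((motif, len(m.group(0)) // k, m.start() + 1))
    return sorted(hits, key=lambda h: h[2])

# Toy sequence containing (A)10 and (TA)6
toy = "CG" + "A" * 10 + "CGG" + "TA" * 6 + "CC"
print(find_ssrs(toy))
```

The primitive-motif filter prevents a homopolymer run from being counted again as a di- or tetra-nucleotide repeat.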
Predicted RNA editing sites in the Forsythia suspensa chloroplast genes.
| Gene | Codon Position | Amino Acid Position | Codon (Amino Acid) Conversion | Score |
|---|---|---|---|---|
| | 794 | 265 | uCg (S) => uUg (L) | 0.8 |
| | 1403 | 468 | cCu (P) => cUu (L) | 1 |
| | 914 | 305 | uCa (S) => uUa (L) | 1 |
| | 92 | 31 | cCa (P) => cUa (L) | 0.86 |
| | 629 | 210 | uCa (S) => uUa (L) | 1 |
| | 71 | 24 | aCu (T) => aUu (I) | 1 |
| | 271 | 91 | Ccu (P) => Ucu (S) | 0.86 |
| | 460 | 154 | Cac (H) => Uac (Y) | 1 |
| | 646 | 216 | Cau (H) => Uau (Y) | 1 |
| | 1180 | 394 | Cgg (R) => Ugg (W) | 1 |
| | 1249 | 417 | Cau (H) => Uau (Y) | 1 |
| | 344 | 115 | uCa (S) => uUa (L) | 1 |
| | 569 | 190 | uCa (S) => uUa (L) | 1 |
| | 149 | 50 | uCa (S) => uUa (L) | 1 |
| | 467 | 156 | cCa (P) => cUa (L) | 1 |
| | 586 | 196 | Cau (H) => Uau (Y) | 1 |
| | 611 | 204 | uCa (S) => uUa (L) | 0.8 |
| | 737 | 246 | cCa (P) => cUa (L) | 1 |
| | 746 | 249 | uCu (S) => uUu (F) | 1 |
| | 830 | 277 | uCa (S) => uUa (L) | 1 |
| | 836 | 279 | uCa (S) => uUa (L) | 1 |
| | 1292 | 431 | uCc (S) => uUc (F) | 1 |
| | 1481 | 494 | cCa (P) => cUa (L) | 1 |
| | 2 | 1 | aCg (T) => aUg (M) | 1 |
| | 47 | 16 | uCu (S) => uUu (F) | 0.8 |
| | 313 | 105 | Cgg (R) => Ugg (W) | 0.8 |
| | 878 | 293 | uCa (S) => uUa (L) | 1 |
| | 1298 | 433 | uCa (S) => uUa (L) | 0.8 |
| | 1310 | 437 | uCa (S) => uUa (L) | 0.8 |
| | 290 | 97 | uCa (S) => uUa (L) | 1 |
| | 671 | 224 | uCa (S) => uUa (L) | 1 |
| | 314 | 105 | aCa (T) => aUa (I) | 0.8 |
| | 385 | 129 | Cca (P) => Uca (S) | 0.8 |
| | 418 | 140 | Cgg (R) => Ugg (W) | 1 |
| | 611 | 204 | cCa (P) => cUa (L) | 1 |
| | 94 | 32 | Cuu (L) => Uuu (F) | 0.86 |
| | 214 | 72 | Ccu (P) => Ucu (S) | 1 |
| | 596 | 199 | gCg (A) => gUg (V) | 0.86 |
| | 308 | 103 | uCa (S) => uUa (L) | 0.86 |
| | 830 | 277 | uCa (S) => uUa (L) | 1 |
| | 338 | 113 | uCu (S) => uUu (F) | 1 |
| | 551 | 184 | uCa (S) => uUa (L) | 1 |
| | 566 | 189 | uCg (S) => uUg (L) | 1 |
| | 1672 | 558 | Ccc (P) => Ucc (S) | 0.86 |
| | 2000 | 667 | uCu (S) => uUu (F) | 1 |
| | 2426 | 809 | uCa (S) => uUa (L) | 0.86 |
| | 1792 | 598 | Cgu (R) => Ugu (C) | 0.86 |
| | 2305 | 769 | Cgg (R) => Ugg (W) | 1 |
| | 3746 | 1249 | uCa (S) => uUa (L) | 0.86 |
| | 248 | 83 | uCa (S) => uUa (L) | 1 |
| | 80 | 27 | uCa (S) => uUa (L) | 1 |
| | 149 | 50 | cCa (P) => cUa (L) | 1 |
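Each row of the table can be checked mechanically: the edited C sits at a known position, the amino acid position follows from dividing by three, and the amino acid change follows from the genetic code. The sketch below assumes that the "Codon Position" column gives the 1-based nucleotide position of the edited base within the coding sequence, which is our reading and is consistent with the rows shown. It uses the standard genetic code, and the name `edit_effect` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def edit_effect(codon, nt_position):
    """Apply a C-to-U edit and return (amino acid position, residue before, residue after).

    codon: RNA codon as written in the table (case is ignored).
    nt_position: assumed 1-based position of the edited C in the coding sequence.
    """
    codon = codon.upper()
    offset = (nt_position - 1) % 3          # 0, 1 or 2: position within the codon
    if codon[offset] != "C":
        raise ValueError(f"no C at codon offset {offset} in {codon}")
    edited = codon[:offset] + "U" + codon[offset + 1:]
    aa_position = (nt_position + 2) // 3    # integer ceiling of nt_position / 3
    return aa_position, CODON_TABLE[codon], CODON_TABLE[edited]

# First row of the table: position 794, uCg (S) => uUg (L), amino acid 265
print(edit_effect("uCg", 794))
```

The capital letter in each table codon marks the edited base, and the offset computed from the position lands on that same base for the rows tested.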
Figure 4. Maximum likelihood phylogeny of Lamiales species inferred from complete chloroplast genome sequences. Numbers near branches are bootstrap values from 100 pseudo-replicates. The tree in the right panel was drawn manually with reference to the left one, and its branch lengths are not meaningful. Branches without numbers have 100% bootstrap support.