Feng Li, Ying Liu, Junhui Wang, Peiyao Xin, Jiangtao Zhang, Kun Zhao, Minggang Zhang, Huiling Yun, Wenjun Ma.
Abstract
Species within the genus Catalpa are mostly semi-evergreen or deciduous trees with opposite or whorled leaves. C. bungei, C. fargesii f. duclouxii and C. fargesii are sources of traditional precious wood in China, known as the "kings of wood". Because phenotypic and molecular studies are lacking and sequence information is insufficient, intraspecific morphological differences, common DNA barcodes and partial sequence fragments cannot clearly resolve the phylogenetic or intraspecific relationships within Catalpa. Therefore, we sequenced the complete chloroplast genomes of six taxa of the genus Catalpa and analyzed their basic structure and evolutionary relationships. The chloroplast genome of Catalpa shows a typical quadripartite structure, with a total length ranging from 157,765 bp (C. fargesii) to 158,355 bp (C. ovata). The length of the large single-copy (LSC) region ranges from 84,599 bp (C. fargesii) to 85,004 bp (C. ovata), that of the small single-copy (SSC) region ranges from 12,662 bp (C. fargesii) to 12,675 bp (C. ovata), and that of the inverted repeat (IR) regions ranges from 30,252 bp (C. fargesii) to 30,338 bp (C. ovata). The GC content of each of the six chloroplast genomes was 38.1%. In total, 113 unique genes were detected, and 19 genes were located in the IR regions. The 113 genes included 79 protein-coding genes, 30 tRNA genes and four rRNA genes. Five hypervariable regions (trnH-psbA, rps2-rpoC2, rpl22, ycf15-trnL-CAA and rps15) were identified by analyzing chloroplast nucleotide polymorphisms and might serve as potential DNA barcodes for the species. Comparative analysis showed that single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) were highly diverse in the six species. Codon usage patterns were highly similar among the taxa included in the present study. In addition to the stop codons, all codons showed a preference for ending in A or T.
Phylogenetic analysis of the entire chloroplast genome showed that all taxa within the genus Catalpa formed a monophyletic group, clearly reflecting the relationships within the genus. This study provides information on the chloroplast genome sequence, structural variation, codon bias and phylogeny of Catalpa, which will facilitate future research efforts.
Keywords: catalpa; chloroplast genome; chloroplast structure; codon bias; phylogenetic; simple sequence repeat
Year: 2022 PMID: 35368674 PMCID: PMC8966708 DOI: 10.3389/fgene.2022.845619
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Summaries of the complete chloroplast genomes of six taxa within the genus Catalpa.
| Parameter | | C. fargesii | | | | C. ovata |
|---|---|---|---|---|---|---|
| GenBank accession number | OL628868 | OL628869 | OL628866 | OL628865 | OL628867 | OL628864 |
| Number of raw reads | 21995192 | 20011296 | 20020162 | 21755074 | 25543980 | 16234746 |
| Number of mapped reads | 4522145 | 5797641 | 5243356 | 4686024 | 7111529 | 3929601 |
| Percentage of chloroplast genome reads (%) | 20.56 | 28.97 | 26.19 | 21.54 | 27.84 | 24.20 |
| Chloroplast genome coverage (×) | 4289 | 5512 | 4973 | 4445 | 6742 | 3722 |
| Plastome size (bp) | 158164 | 157765 | 158164 | 158143 | 158213 | 158355 |
| LSC length (bp) | 84929 | 84599 | 84929 | 84909 | 84931 | 85004 |
| IR length (bp) | 30285 | 30252 | 30285 | 30285 | 30309 | 30338 |
| SSC length (bp) | 12665 | 12662 | 12665 | 12664 | 12664 | 12675 |
| GC content (%) | 38.10 | 38.10 | 38.10 | 38.10 | 38.10 | 38.10 |
| Number of protein-coding genes | 79 | 79 | 79 | 79 | 79 | 79 |
| Number of tRNA genes | 30 | 30 | 30 | 30 | 30 | 30 |
| Number of rRNA genes | 4 | 4 | 4 | 4 | 4 | 4 |
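
The figures in this table can be cross-checked against one another. The following Python sketch is not part of the original study. It assumes that the plastome size equals LSC + SSC + 2 × IR (the four-region layout described in the abstract) and that the percentage of chloroplast genome reads is the number of mapped reads divided by the number of raw reads. The values are transcribed from the table and keyed by GenBank accession number; the names `TABLE` and `check_row` are illustrative.

```python
# Values transcribed from the summary table, keyed by GenBank accession number.
# Tuple order: raw reads, mapped reads, reported % chloroplast reads,
#              plastome size, LSC length, IR length, SSC length (all lengths in bp)
TABLE = {
    "OL628868": (21995192, 4522145, 20.56, 158164, 84929, 30285, 12665),
    "OL628869": (20011296, 5797641, 28.97, 157765, 84599, 30252, 12662),
    "OL628866": (20020162, 5243356, 26.19, 158164, 84929, 30285, 12665),
    "OL628865": (21755074, 4686024, 21.54, 158143, 84909, 30285, 12664),
    "OL628867": (25543980, 7111529, 27.84, 158213, 84931, 30309, 12664),
    "OL628864": (16234746, 3929601, 24.20, 158355, 85004, 30338, 12675),
}


def check_row(raw, mapped, pct_reported, plastome, lsc, ir, ssc):
    """Return (size_ok, pct_ok) for one column of the table."""
    # Assumption: one LSC, one SSC and two IR copies make up the whole plastome.
    size_ok = lsc + ssc + 2 * ir == plastome
    # Assumption: reported percentage = mapped / raw * 100, rounded to 2 decimals.
    pct_ok = abs(100 * mapped / raw - pct_reported) < 0.01
    return size_ok, pct_ok


results = {acc: check_row(*row) for acc, row in TABLE.items()}
for acc, (size_ok, pct_ok) in results.items():
    print(acc, "size consistent:", size_ok, "| read % consistent:", pct_ok)
```

Both assumptions are meant to be checked by running the code rather than taken on trust; a `False` in either column would point to a transcription error or to a different definition than the one assumed here.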
FIGURE 1. Gene map of the Catalpa chloroplast genome. Genes shown outside the outer circle are transcribed clockwise, and those inside are transcribed counterclockwise. Genes are color-coded according to their functional groups. The darker gray in the inner circle indicates the GC content, and the lighter gray indicates the AT content. The inner circle also shows that the chloroplast genome contains two copies of the inverted repeat (IRA and IRB), a large single-copy (LSC) region and a small single-copy (SSC) region. The map was constructed using OrganellarGenomeDRAW.
Genes in the Catalpa chloroplast genomes.
| Gene category | Gene group | Gene name |
|---|---|---|
| Photosynthesis-related genes | Rubisco | rbcL |
| | Photosystem I | psaA, psaB, psaC, psaI, psaJ |
| | Assembly/stability of photosystem I | **ycf3, ycf4 |
| | Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ |
| | ATP synthase | atpA, atpB, atpE, *atpF, atpH, atpI |
| | Cytochrome b/f complex | petA, *petB, *petD, petG, petL, petN |
| | Cytochrome c synthesis | ccsA |
| | NADPH dehydrogenase | *ndhA, *ndhB, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
| Transcription- and translation-related genes | Transcription | rpoA, rpoB, *rpoC1, rpoC2 |
| | Ribosomal proteins | rps2, rps3, rps4, rps7, rps8, rps11, rps12, rps14, rps15, rps16, rps18, rps19, *rpl2, rpl14, *rpl16, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36 |
| RNA genes | Ribosomal RNA | rrn5, rrn4.5, rrn16, rrn23 |
| | Transfer RNA | *trnA-UGC, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, *trnG-UCC, *trnG-GCC, trnH-GUG, trnI-CAU, *trnI-GAU, *trnK-UUU, trnL-CAA, *trnL-UAA, trnL-UAG, trnfM-CAU, trnM-CAU, trnN-GUU, trnP-UGG, trnQ-UUG, trnR-ACG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC, *trnV-UAC, trnW-CCA, trnY-GUA |
| Other genes | RNA processing | matK |
| | Carbon metabolism | cemA |
| | Fatty acid synthesis | accD |
| | Proteolysis | **clpP |
| Genes of unknown function | Conserved reading frames | ycf1, ycf2 |
| | Pseudogenes | ycf15 |
Note: * indicates a gene containing one intron; ** indicates a gene containing two introns.
FIGURE 2. Microsatellite and oligonucleotide repeat analyses. (A) Frequency of identified SSRs in LSC, IR, and SSC regions. (B) Frequency of identified SSR motifs in different repeat class types. (C) Number of SSRs detected in six chloroplast genomes. (D) Number of SSR types detected in six chloroplast genomes. (E) Comparison of the various types of oligonucleotide repeats. (F) Comparison of repeats based on size.
FIGURE 3. Comparisons of LSC, SSC and IR junctions among six taxa within the genus Catalpa.
FIGURE 4. Visualization of the chloroplast genome alignments of six taxa within the genus Catalpa, generated in mVISTA with C. bungei as the reference. The x-axis represents the position in the chloroplast genome. The sequence similarity of the aligned regions is shown as horizontal bars indicating the average percent identity, on a scale of 50–100%.
DNA polymorphisms identified in the six Catalpa plastomes.
| Region | Alignment length (bp) | Variable sites (polymorphic) | Variable sites (singleton) | Variable sites (parsimony-informative) | Nucleotide diversity | Haplotypes |
|---|---|---|---|---|---|---|
| LSC | 85107 | 193 | 169 | 24 | 0.00071 | 6 |
| SSC | 12703 | 71 | 61 | 10 | 0.00178 | 5 |
| IR | 30417 | 18 | 17 | 1 | 0.00018 | 4 |
| Whole plastome | 159629 | 301 | 265 | 36 | 0.00059 | 6 |
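
To make the column definitions concrete, the sketch below classifies variable sites and computes per-site nucleotide diversity for a made-up alignment. It is an illustration under stated assumptions, not the study's pipeline: the software and settings used by the authors are not given here, columns containing gaps or ambiguity codes are simply skipped, nucleotide diversity is taken as the mean number of pairwise differences divided by the number of sites compared, and the function name `site_stats` and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def site_stats(alignment):
    """Classify variable sites and compute per-site nucleotide diversity."""
    n = len(alignment)
    if n < 2 or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")
    net_sites = singleton = informative = pair_diffs = 0
    for column in zip(*alignment):
        if any(base not in "ACGT" for base in column):
            continue  # assumption: skip columns with gaps or ambiguity codes
        net_sites += 1
        counts = Counter(column)
        if len(counts) == 1:
            continue  # invariant site
        # Number of sequence pairs that differ at this site.
        pair_diffs += sum(a * b for a, b in combinations(counts.values(), 2))
        # Parsimony-informative: at least two bases, each seen in >= 2 sequences.
        if sum(c >= 2 for c in counts.values()) >= 2:
            informative += 1
        else:
            singleton += 1
    if net_sites == 0:
        raise ValueError("no usable alignment columns")
    n_pairs = n * (n - 1) // 2
    return {
        "net_sites": net_sites,
        "polymorphic": singleton + informative,
        "singleton": singleton,
        "parsimony_informative": informative,
        "pi": pair_diffs / n_pairs / net_sites,
    }


# Toy alignment of six made-up sequences (not Catalpa data).
toy = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGTACCTAC",
    "ACGTACCTA-",
    "ACGTACGTAC",
]
stats = site_stats(toy)
print(stats)
```

In this sketch every polymorphic site is either a singleton or parsimony-informative, which matches the table: for the LSC, 193 = 169 + 24, and for the whole plastome, 301 = 265 + 36.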
FIGURE 5. Pattern of single-nucleotide substitutions in the six Catalpa chloroplast genomes.
FIGURE 6. Extent of polymorphism in all plastid regions. Regions with no nucleotide diversity were excluded and are not shown here. The black circle indicates the five suitable polymorphic loci with a length >200 bp. The x-axis shows plastid regions, and the y-axis shows nucleotide diversity.
Identified suitable polymorphic loci based on comparative plastome analysis of taxa within the genus Catalpa.
| Serial number | Region | Nucleotide diversity | Number of substitutions | Number of indels | Region length (bp) | Alignment length (bp) | Missing data (%) |
|---|---|---|---|---|---|---|---|
| 1 | trnH-psbA | 0.00452 | 3 | 1 | 299 | 299 | 6.35 |
| 2 | rps2-rpoC2 | 0.00574 | 3 | 3 | 214 | 214 | 2.34 |
| 3 | rpl22 | 0.00378 | 4 | 0 | 459 | 459 | 0 |
| 4 | ycf15-trnL-CAA | 0.00458 | 3 | 0 | 364 | 364 | 0 |
| 5 | rps15 | 0.00497 | 4 | 1 | 255 | 266 | 4.14 |
Abbreviation: Indels, insertions/deletions.
Note: The region length of each locus was taken from the C. bungei reference sequence.
FIGURE 7. Analysis of codon bias in the chloroplast genome of Catalpa.
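
The abstract reports that codons tend to end in A or T. One simple way to look at this kind of bias is to count third-position bases in a coding sequence, as in the Python sketch below. This is a simplified proxy and an assumption of this example, not the authors' method: the function name `at_ending_fraction` and the toy sequence are hypothetical, and setting the stop codons (TAA, TAG, TGA) aside is a choice made here for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def at_ending_fraction(cds):
    """Fraction of non-stop codons in a coding sequence whose third base is A or T."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    sense = [codon for codon in codons if codon not in STOP_CODONS]
    if not sense:
        raise ValueError("no sense codons found")
    return sum(codon[2] in "AT" for codon in sense) / len(sense)


# Made-up coding sequence (not a Catalpa gene): ATG GCT GCA AAA TTT GGC TAA
toy_cds = "ATGGCTGCAAAATTTGGCTAA"
fraction = at_ending_fraction(toy_cds)
print(f"{fraction:.3f}")
```

A fraction well above 0.5 across the protein-coding genes would be in line with the A/T-ending preference described in the abstract; a per-codon measure such as relative synonymous codon usage would give a finer picture than this single number.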
FIGURE 8. Phylogenetic tree constructed using the maximum likelihood (ML) and Bayesian inference (BI) methods based on the whole chloroplast genomes of 23 species. The numbers above the branches represent the ML bootstrap values and BI posterior probabilities.