Xinbo Pang, Hongshan Liu, Suran Wu, Yangchen Yuan, Haijun Li, Junsheng Dong, Zhaohua Liu, Chuanzhi An, Zhihai Su, Bin Li.
Abstract
Species identification in oaks (Quercus) is a persistent challenge because many species exhibit variable phenotypes that overlap with those of other species. Oaks are notorious for interspecific hybridization and introgression, and for complex speciation patterns involving incomplete lineage sorting. As a result, attempts to identify DNA barcodes that accurately discriminate Quercus species have been unsuccessful. In this study, we used chloroplast genome sequence data to identify molecular markers for oak species identification. Using next-generation sequencing, we sequenced 14 chloroplast genomes of Quercus species and added 10 chloroplast genome sequences from GenBank to develop a DNA barcode for oaks. Chloroplast genome sequence divergence was low. We identified four mutation hotspots as candidate Quercus DNA barcodes: two intergenic regions (matK-trnK-rps16 and trnR-atpA) located in the large single-copy region, and two coding regions (ndhF and ycf1b) located in the small single-copy region. The standard plant DNA barcodes (rbcL and matK) were less variable than the newly identified markers. Our data provide complete chloroplast genome sequences that improve the phylogenetic resolution and species-level discrimination of Quercus. This study demonstrates that the complete chloroplast genome can substantially increase species discriminatory power and resolve phylogenetic relationships in plants.
Keywords: Quercus; chloroplast genome; mutation hotspots; oak species identification
Year: 2019 PMID: 31779118 PMCID: PMC6928813 DOI: 10.3390/ijms20235940
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Gene map of Quercus chloroplast genome. Genes drawn within the circle are transcribed clockwise; genes drawn outside are transcribed counterclockwise. Genes in different functional groups are shown in different colors. Dark bold lines indicate the extent of the inverted repeats (IRa and IRb) that separate the genomes into small single-copy (SSC) and large single-copy (LSC) regions.
Summary statistics for the assembly of 14 Quercus species chloroplast genomes.
| Species | LSC (bp) | IR (bp) | SSC (bp) | Total Size (bp) | Number of Genes | Protein Coding Genes | tRNA | rRNA | Accession Number in GenBank |
|---|---|---|---|---|---|---|---|---|---|
|  | 90594 | 25848 | 18946 | 161236 | 113 | 79 | 30 | 4 | MK105459 |
|  | 90570 | 25848 | 18947 | 161213 | 113 | 79 | 30 | 4 | MK105457 |
|  | 90562 | 25848 | 18956 | 161214 | 113 | 79 | 30 | 4 | MK105467 |
|  | 90624 | 25852 | 18956 | 161284 | 113 | 79 | 30 | 4 | MK105461 |
|  | 90532 | 25837 | 18988 | 161194 | 113 | 79 | 30 | 4 | MK105452 |
|  | 90363 | 25866 | 19037 | 161132 | 113 | 79 | 30 | 4 | MK105462 |
|  | 90534 | 25826 | 19038 | 161224 | 113 | 79 | 30 | 4 | MK105458 |
|  | 90520 | 25825 | 19041 | 161211 | 113 | 79 | 30 | 4 | MK105466 |
|  | 90504 | 25820 | 19047 | 161191 | 113 | 79 | 30 | 4 | MK105460 |
|  | 90593 | 25826 | 19055 | 161300 | 113 | 79 | 30 | 4 | MK105453 |
|  | 90557 | 25832 | 19064 | 161285 | 113 | 79 | 30 | 4 | MK105456 |
|  | 90447 | 25817 | 19065 | 161146 | 113 | 79 | 30 | 4 | MK105464 |
|  | 90464 | 25817 | 19070 | 161168 | 113 | 79 | 30 | 4 | MK105451 |
|  | 90553 | 25870 | 19073 | 161366 | 113 | 79 | 30 | 4 | MK105463 |
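Because the chloroplast genome consists of one LSC, one SSC, and two inverted repeat copies (IRa and IRb), each total size in the table should equal LSC + 2 × IR + SSC. The following Python sketch checks that identity for the 14 rows; the function and variable names are illustrative and not taken from the study.

```python
# Region lengths (bp) copied from the table: (LSC, IR, SSC, total, accession).
ROWS = [
    (90594, 25848, 18946, 161236, "MK105459"),
    (90570, 25848, 18947, 161213, "MK105457"),
    (90562, 25848, 18956, 161214, "MK105467"),
    (90624, 25852, 18956, 161284, "MK105461"),
    (90532, 25837, 18988, 161194, "MK105452"),
    (90363, 25866, 19037, 161132, "MK105462"),
    (90534, 25826, 19038, 161224, "MK105458"),
    (90520, 25825, 19041, 161211, "MK105466"),
    (90504, 25820, 19047, 161191, "MK105460"),
    (90593, 25826, 19055, 161300, "MK105453"),
    (90557, 25832, 19064, 161285, "MK105456"),
    (90447, 25817, 19065, 161146, "MK105464"),
    (90464, 25817, 19070, 161168, "MK105451"),
    (90553, 25870, 19073, 161366, "MK105463"),
]


def quadripartite_total(lsc, ir, ssc):
    """Total length of a genome with one LSC, one SSC, and two IR copies."""
    return lsc + 2 * ir + ssc


# Collect the accessions whose reported total disagrees with the region sum.
mismatches = [
    acc
    for lsc, ir, ssc, total, acc in ROWS
    if quadripartite_total(lsc, ir, ssc) != total
]
print(mismatches)  # prints [] : every reported total matches its region sum
```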
Figure 2. Phylogenetic tree inferred from the 25 chloroplast genomes. Left: maximum likelihood (ML) tree with bootstrap values; right: Bayesian tree with posterior probabilities.
The variability of the four new markers, chloroplast genome, and the universal chloroplast DNA barcodes in Quercus.
| Markers | Length (bp) | Variable Sites (Number) | Variable Sites (%) | Information Sites (Number) | Information Sites (%) | Discrimination Success (%) Based on Distance Method |
|---|---|---|---|---|---|---|
|  | 698 | 8 | 1.15% | 5 | 0.72% | 12.50% |
|  | 744 | 21 | 2.82% | 11 | 1.48% | 25.00% |
|  | 574 | 27 | 4.70% | 16 | 2.79% | 37.50% |
|  | 1442 | 29 | 2.01% | 16 | 1.11% | 29.17% |
|  | 2016 | 56 | 2.78% | 32 | 1.59% | 50.00% |
|  | 2311 | 93 | 4.02% | 59 | 2.55% | 79.17% |
|  | 1309 | 57 | 4.35% | 35 | 2.67% | 66.67% |
|  | 1536 | 74 | 4.82% | 45 | 2.93% | 83.33% |
|  | 1765 | 94 | 5.33% | 59 | 3.34% | 70.83% |
|  | 3301 | 168 | 5.09% | 104 | 3.15% | 91.67% |
|  | 6921 | 318 | 4.59% | 198 | 2.86% | 100.00% |
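The record does not spell out how the distance-based discrimination success was scored. One plausible reading, offered here only as a sketch under stated assumptions, is that a species counts as discriminated when its sequence has a nonzero pairwise distance to every other species in the alignment. The percentages in the table are at least consistent with counts out of 24 sequences (for example, 12.50% = 3/24). The function names, the use of uncorrected p-distance, and the toy alignment are all assumptions for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap or ambiguity."""
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                diffs += 1
    return diffs / compared if compared else 0.0


def discrimination_success(aligned):
    """Percent of species whose sequence differs from all other species.

    `aligned` maps a species name to one aligned sequence (assumption:
    a single representative sequence per species).
    """
    names = list(aligned)
    resolved = set(names)
    for a, b in combinations(names, 2):
        if p_distance(aligned[a], aligned[b]) == 0.0:
            # Identical sequences cannot be told apart by distance.
            resolved.discard(a)
            resolved.discard(b)
    return 100.0 * len(resolved) / len(names)


# Toy alignment (hypothetical data): sp1 and sp2 are identical.
toy = {
    "sp1": "ACGTACGT",
    "sp2": "ACGTACGT",
    "sp3": "ACGTACGA",
    "sp4": "TCGTACGT",
}
print(discrimination_success(toy))  # 50.0
```

A tree-based or best-match criterion would give different numbers, so this sketch should not be read as a reproduction of the published values.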
Figure 3. Neighbor joining trees for Quercus using rbcL + matK, rbcL + matK, and trnH-psbA combinations.
Figure 4. Specific DNA barcode development. (A) Mean distance of each window; (B) proportion of zero pairwise distances for each species; (C) nucleotide diversity (pi) of each window. Window length: 800 bp; step size: 100 bp; X-axis: position of the midpoint of a window.
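The sliding-window scan described in this caption can be sketched as follows. Nucleotide diversity (pi) is computed here as the mean number of pairwise differences per site across all sequence pairs, and each window is reported at its midpoint. The window length (800 bp) and step size (100 bp) come from the caption; the function names and the toy alignment are illustrative, and gaps are treated as an ordinary character, which is a simplification.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site over all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    length = len(seqs[0])
    if not pairs or length == 0:
        return 0.0
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / (len(pairs) * length)


def sliding_pi(seqs, window=800, step=100):
    """Yield (midpoint, pi) for each full window along the alignment."""
    length = len(seqs[0])
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        yield start + window // 2, nucleotide_diversity(chunk)


# Toy alignment (hypothetical data) scanned with a small window for readability.
toy_alignment = [
    "AAAAAAAAAAAA",
    "AAAAACAAAAAA",
    "AAAAAGAAAAAT",
]
toy_scan = list(sliding_pi(toy_alignment, window=4, step=4))
for midpoint, pi in toy_scan:
    print(midpoint, round(pi, 4))
```

Windows with the highest pi values are the candidates for mutation hotspots; on real data the alignment would be read from a file rather than written inline.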
Figure 5. Neighbor joining tree for Quercus using the four highly variable markers and complete chloroplast genome data.
Variable site analyses in Quercus chloroplast genomes.
| Region | Number of Sites | Variable Sites (Number) | Variable Sites (%) | Information Sites (Number) | Information Sites (%) | Nucleotide Diversity |
|---|---|---|---|---|---|---|
| LSC | 92,888 | 2009 | 2.16% | 1257 | 1.35% | 0.0043 |
| SSC | 19,535 | 593 | 3.04% | 368 | 1.88% | 0.00624 |
| IR | 25,879 | 91 | 0.35% | 54 | 0.21% | 0.00073 |
| Complete chloroplast genome | 164,156 | 2778 | 1.69% | 1727 | 1.05% | 0.00335 |
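The site counts in this table can be illustrated with a short Python sketch. It assumes that "information sites" means parsimony-informative sites, that is, alignment columns with at least two nucleotide states that each occur in at least two sequences; a variable site is any column with two or more states. The function names and the toy alignment are hypothetical, and columns are scored on A, C, G, and T only.

```python
from collections import Counter


def site_counts(seqs):
    """Return (variable, parsimony_informative) column counts for an alignment."""
    variable = 0
    informative = 0
    for column in zip(*seqs):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) >= 2:
            variable += 1
            # Informative: at least two states, each seen in two or more sequences.
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return variable, informative


def percent(count, sites):
    """Share of sites as a percentage, rounded to two decimals as in the table."""
    return round(100.0 * count / sites, 2)


# Toy alignment (hypothetical data): columns 2, 4, and 5 vary.
toy_seqs = ["ACGTA", "ACGTC", "ATGTC", "ATGAA"]
print(site_counts(toy_seqs))  # (3, 2)

# Reproducing the LSC percentages from the reported counts.
print(percent(2009, 92888), percent(1257, 92888))  # 2.16 1.35
```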