Abstract
The Triticum (wheat)-Aegilops (goatgrass) complex has been extensively studied, but the evolutionary history of polyploid wheats has not been fully elucidated. The chloroplast (cp), with its maternal inheritance and homoplasy, can simplify sequence-based evolutionary inferences, but informative inferences require a complete and accurate cp genome sequence. In this study, 16 cp genomes representing five Aegilops and 11 Triticum species and subspecies were sequenced, assembled and annotated, yielding five novel circular cp genome sequences. Analysis of the assembled cp genomes revealed no marked differences in genome structure or gene arrangement across the assayed species. A polymorphism analysis of 72 published cp genome sequences representing 10 Aegilops and 15 Triticum species and subspecies detected 1183 SNPs and 1881 SSRs. More than 80% of the SNPs detected resided in the downstream and upstream gene regions, and no more than 2.78% of SNPs were predicted to be deleterious. The largest nucleotide diversity was observed in the short single-copy genomic region. Relatively weak selection pressure on cp coding genes was detected. Different phylogenetic analyses confirmed that the maternal divergence of the Triticum-Aegilops complex had three deep lineages, each representing a diploid species with nuclear A, B, or D genome. Dating the maternal divergence yielded age estimates that matched well with those reported previously. The divergence between emmer and bread wheats occurred 8200-11,200 years ago. These findings are useful for further genomic studies, provide insight into cp genome evolvability and allow for better understanding of the maternal divergence of the Triticum-Aegilops complex.
Year: 2021 PMID: 34321524 PMCID: PMC8319314 DOI: 10.1038/s41598-021-94649-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
List of 16 samples representing five Aegilops and six Triticum species and their chloroplast genome assemblies, annotations and GenBank accessions.
| Species | NG | PGRC | Raw | CPG (bp) | LSC (bp) | IRb (bp) | SSC (bp) | IRa (bp) | All | tRNA | Pseudo | NCBI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | M | CN034253 | 4,898,139 | 136,247 | 80,497 | 21,491 | 12,770 | 21,489 | 130 | 38 | 3 | |
| | SI | CN108024 | 4,035,549 | 136,762 | 81,030 | 21,479 | 12,776 | 21,477 | 131 | 39 | 4 | MG958549 |
| | B | CN108018 | 2,397,826 | 135,982 | 80,214 | 21,489 | 12,792 | 21,487 | 130 | 38 | 3 | MG958553 |
| | D | CN034229 | 4,249,172 | 135,502 | 79,766 | 21,483 | 12,772 | 21,481 | 130 | 38 | 3 | MG958544 |
| | U | CN108095 | 3,443,538 | 136,743 | 80,988 | 21,489 | 12,779 | 21,487 | 130 | 38 | 3 | |
| | AuBD | CN011189 | 5,632,473 | 135,766 | 80,003 | 21,487 | 12,791 | 21,485 | 130 | 38 | 3 | MG958554 |
| | AuBD | CN012261 | 3,902,845 | 135,819 | 80,054 | 21,488 | 12,791 | 21,486 | 130 | 38 | 3 | MG958556 |
| | Am | CN034176 | 4,475,344 | 136,820 | 81,048 | 21,484 | 12,806 | 21,482 | 130 | 38 | 3 | |
| | Am | CN052948 | 3,692,717 | 136,758 | 80,986 | 21,484 | 12,806 | 21,482 | 130 | 38 | 3 | MG958558 |
| | GAu | CN034138 | 4,596,575 | 136,757 | 80,984 | 21,484 | 12,807 | 21,482 | 130 | 38 | 3 | MG958559 |
| | GAu | CN001847 | 2,734,120 | 136,028 | 80,257 | 21,495 | 12,789 | 21,487 | 131 | 39 | 4 | MG958546 |
| | AuB | CN107962 | 2,812,946 | 135,786 | 80,021 | 21,488 | 12,791 | 21,486 | 130 | 38 | 3 | MG958552 |
| | AuB | CN011272 | 6,466,083 | 135,775 | 80,008 | 21,489 | 12,791 | 21,487 | 130 | 38 | 3 | |
| | AuB | CN041178 | 4,855,401 | 135,772 | 80,005 | 21,489 | 12,791 | 21,487 | 130 | 38 | 3 | MG958545 |
| | Au | CN038614 | 3,755,335 | 136,749 | 80,962 | 21,484 | 12,821 | 21,482 | 129 | 37 | 3 | MG958555 |
| | GAmAu | CN036212 | 1,566,335 | 136,028 | 80,257 | 21,495 | 12,789 | 21,487 | 130 | 38 | 3 | |
The new circular chloroplast genome assemblies with annotations are highlighted in bold for NCBI accessions.
NG nuclear genome designation, PGRC Plant Gene Resources of Canada, Acc# accession number, CPG chloroplast genome, LSC large single copy, SSC small single copy, IRa and IRb two inverted repeat regions, NCBI National Center for Biotechnology Information.
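Because a circular cp genome is made up of the LSC, IRb, SSC and IRa regions, the four region lengths in each row should add up to the reported CPG length. The following is a minimal Python sketch of that consistency check, using three rows copied from the table; the function names `region_sum_matches` and `check_rows` are illustrative and not part of the study.

```python
# Rows copied from the table: (PGRC accession, CPG, LSC, IRb, SSC, IRa), all in bp.
ROWS = [
    ("CN034253", 136247, 80497, 21491, 12770, 21489),
    ("CN108024", 136762, 81030, 21479, 12776, 21477),
    ("CN011189", 135766, 80003, 21487, 12791, 21485),
]


def region_sum_matches(cpg, lsc, irb, ssc, ira):
    """Return True when the four regions add up to the total genome length."""
    return lsc + irb + ssc + ira == cpg


def check_rows(rows):
    """Map each accession to whether its region lengths are consistent."""
    return {
        acc: region_sum_matches(cpg, lsc, irb, ssc, ira)
        for acc, cpg, lsc, irb, ssc, ira in rows
    }


if __name__ == "__main__":
    for acc, ok in check_rows(ROWS).items():
        print(acc, "consistent" if ok else "MISMATCH")
```

A check of this kind is also a quick way to catch digits lost or regrouped when a table is copied between formats.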
Figure 1. Percent identity plot of 16 complete Triticum and Aegilops chloroplast genome assemblies, using T. aestivum as internal reference. The vertical order of samples (or rows) follows the tree topology shown in Fig. 3. The vertical scale indicates the percentage of identity, ranging from 50 to 100%. Coding regions are in blue and non-coding regions are in orange. Note that the plot was generated using mVISTA[42] (http://genome.lbl.gov/vista/mvista/about.shtml) and the output figures were merged with the GNU Image Manipulation Program version 2.8.20 (http://www.gimp.org).
Figure 3. Maternal phylogenetic trees of 16 samples representing six Triticum and five Aegilops species with node age and support, as inferred from chloroplast sequence variations using BEAST software. The top (A) and bottom (B) trees were rooted with wild barley (Hordeum vulgare ssp. spontaneum) and rye (Secale cereale), respectively. The node ages were calibrated with the root ages in Mya following the related age estimates by Chalupska et al.[14]. All nodes were supported with a posterior probability of 0.99 or higher, except one node in purple and two nodes in red with posterior probabilities of 0.89 and 0.33, respectively. Eleven major nodes are labeled and each sample is labeled with its nuclear genome designation.
SNP detection and annotation based on the consensus sequences of the 16 and 72 chloroplast genomes of the Triticum–Aegilops complex using SnpEff[47].
| Type | 16 cp: Count | 16 cp: Percent | 72 cp: Count | 72 cp: Percent |
|---|---|---|---|---|
| Total SNPs | 871 | | 1183 | |
| Total possible effects | 16,313 | | 21,775 | |
| Downstream gene variant | 6803 | 41.70 | 8967 | 41.18 |
| Upstream gene variant | 6447 | 39.52 | 8712 | 40.01 |
| Intragenic variant | 1443 | 8.85 | 1736 | 7.97 |
| Intron variant | 693 | 4.25 | 1027 | 4.72 |
| Intergenic region | 404 | 2.48 | 516 | 2.37 |
| Missense variant | 353 | 2.16 | 516 | 2.37 |
| Synonymous variant | 86 | 0.53 | 142 | 0.65 |
| Non coding transcript exon variant | 17 | 0.10 | 61 | 0.28 |
| Stop lost | 23 | 0.14 | 40 | 0.18 |
| Stop gained | 29 | 0.18 | 35 | 0.16 |
| Stop retained variant | 13 | 0.08 | 15 | 0.07 |
| Non coding transcript variant | 2 | 0.01 | 4 | 0.02 |
| 5 prime UTR variant | 2 | 0.01 | ||
| Splice region variant | 2 | 0.01 | ||
Note that the gene regions defined in the SnpEff annotation differ from the four regions of cp genome structure (LSC, SSC, IRb and IRa). Up/Downstream gene means the distance to the first/last codon of the gene; Intergenic region means the distance to the closest gene; and Splice region means the region of the splice site, either within 1–3 bases of the exon or 3–8 bases of the intron.
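The Percent values in the table are consistent with each effect count being divided by the total number of possible effects for that dataset (16,313 for the 16-genome set and 21,775 for the 72-genome set), rather than by the number of SNPs, since one SNP can receive more than one effect annotation. This reading is inferred from the table values. A minimal Python sketch of the arithmetic, with the helper name `effect_percent` chosen only for illustration:

```python
# Totals taken from the "Total possible effects" row of the table.
TOTAL_EFFECTS_16 = 16313
TOTAL_EFFECTS_72 = 21775


def effect_percent(count, total_effects):
    """Share of all predicted effects in one category, in percent (2 decimals)."""
    return round(100.0 * count / total_effects, 2)


if __name__ == "__main__":
    # Downstream and upstream gene variants in the 72-genome set.
    downstream = effect_percent(8967, TOTAL_EFFECTS_72)
    upstream = effect_percent(8712, TOTAL_EFFECTS_72)
    print(downstream, upstream, round(downstream + upstream, 2))
```

Summing the downstream and upstream percentages for the 72-genome set gives a value above 80%, in line with the statement in the abstract.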
Figure 2. Nucleotide diversity (Pi) from the sliding window analysis of the 16 complete Triticum and Aegilops chloroplast genome assemblies generated in this study (A) and published 72 circular chloroplast genome sequences (B) (window length: 2000 bp, step size: 200 bp). X-axis: position of the window midpoint; Y-axis: nucleotide diversity within each window. Note that the plots were generated in Microsoft Excel based on the outputs from DnaSP 6 software[49].
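The sliding-window calculation behind this figure can be illustrated with a short Python sketch. It computes Pi as the average number of pairwise differences per site within each window of an alignment, using the window length (2000 bp) and step size (200 bp) given in the caption as defaults. This is a simplified stand-in and not DnaSP itself: the function names are hypothetical, and the rule of skipping any alignment column that contains a gap or ambiguous base is an assumption made here for simplicity.

```python
from itertools import combinations


def window_pi(seqs, start, end):
    """Nucleotide diversity (Pi) over alignment columns [start, end)."""
    # Assumption: keep only columns where every sequence has an unambiguous base.
    cols = [i for i in range(start, end) if all(s[i] in "ACGT" for s in seqs)]
    if not cols or len(seqs) < 2:
        return 0.0
    pairs = list(combinations(seqs, 2))
    # Total mismatches summed over all sequence pairs and usable columns.
    diffs = sum(sum(a[i] != b[i] for i in cols) for a, b in pairs)
    return diffs / (len(pairs) * len(cols))


def sliding_pi(seqs, window=2000, step=200):
    """Return (window midpoint, Pi) pairs along an alignment of equal-length sequences."""
    length = len(seqs[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = min(start + window, length)
        results.append(((start + end) // 2, window_pi(seqs, start, end)))
    return results


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT"]  # toy alignment, not real data
    for midpoint, pi in sliding_pi(toy, window=2, step=1):
        print(midpoint, round(pi, 3))
```

Plotting the midpoints against the Pi values yields a profile of the kind shown in the figure, with peaks marking the more variable parts of the genome.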
The extent of codons showing positive selection in the chloroplast (cp) genes in two sets of samples (16 cp genome assemblies generated in this study and 72 published cp genome sequences), obtained from tests by two methods (PAML codeml and HyPhy MEME).
| Sample | Total codons | CPG region | 72 cp | 16 cp | |||
|---|---|---|---|---|---|---|---|
| Method | Codeml | MEME | Codeml | MEME | |||
| Gene | Codon count | Proportion | Codon count | Codon count | Codon count | ||
| 353 | LSC | 1 | 0.003 | 1 | |||
| 512 | LSC | 5 | 0.010 | 3 | 5 | ||
| 1076 | LSC | 1 | 0.001 | 1 | 1 | ||
| 683 | LSC | 4 | 0.006 | 2 | |||
| 1479 | LSC | 15 | 0.010 | 3 | 7 | 1 | |
| 247 | LSC | 1 | 0.004 | ||||
| 504 | LSC | 1 | 0.002 | 1 | |||
| 170 | LSC | 1 | 0.006 | 2 | |||
| 201 | LSC | 1 | |||||
| 245 | LSC | 2 | 0.008 | 1 | 2 | 1 | |
| 137 | LSC | 1 | 0.007 | 1 | 1 | ||
| 498 | LSC | 1 | 0.002 | 1 | |||
| 477 | LSC | 1 | 0.002 | ||||
| 320 | LSC | 1 | 0.003 | 1 | |||
| 73 | LSC | 1 | 0.014 | 1 | |||
| 123 | LSC | 1 | 0.008 | 1 | |||
| 739 | SSC | 2 | 0.003 | 1 | 2 | ||
| 322 | SSC | 2 | 0.006 | 1 | 2 | 1 | |
| 500 | SSC | 1 | 0.002 | 1 | |||
| 362 | SSC | 3 | 0.008 | 3 | 1 | ||
| 393 | SSC | 70 | 0.178 | 2 | |||
| 90 | IRa | 12 | 0.133 | ||||
| 156 | IRa | 6 | 0.038 | ||||
| 93 | IRa | 8 | 0.086 | ||||
| Sum/Average | 9753 | 141 | 0.024 | 19 | 30 | 3 | |
| Total genes | 24 | 23 | 12 | 15 | 3 | ||
Note that the total codon count for each associated gene was obtained from the coding protein sequences of T. aestivum (MG958554).
CPG chloroplast genome.
Figure 4. Bayesian maximum clade credibility trees of published 72 complete and six incomplete chloroplast genomes representing six Triticum and 11 Aegilops species with nodal support (in posterior probability) and outgroup of Hordeum vulgare ssp. spontaneum, as inferred from chloroplast sequence variations using BEAST software. A sample with a number after its species name was published by others and a sample without the number label was assembled from this study (see Table S1). The nodes with a posterior probability of 0.90 or less are highlighted in red. The node ages were calibrated with the root ages in Mya following the related age estimates by Chalupska et al.[14]. Some lineage ages are shown along the divergence time axis at the bottom of the figure.
Estimated node ages with standard deviation (in parenthesis) for seven major lineages (I: AD-B, II: A-D, V: AuBD-B, XI: Einkorn, VIII: Emmer, IX: Durum-Bread, X: Emmer-Bread wheat) for barley- and rye-rooted trees.
| Tree | Rooted age (Mya) | I: AD-B | II: A-D | V: AuBD-B | XI: Einkorn | VIII: Emmer | IX: Durum-Bread | X: Emmer-Bread |
|---|---|---|---|---|---|---|---|---|
| BrT | Mean 11.6 | 5.7880(3.3860) | 3.9369(2.2385) | 1.7502(1.2728) | 0.6262(0.4363) | 0.3675(0.2468) | 0.0335(0.0401) | 0.0083(NA) |
| RrT | Mean 6.7 | 5.2976(1.9577) | 3.5499(1.4188) | 2.1719(0.9709) | 0.7066(0.3698) | 0.4548(0.2743) | 0.0414(0.0456) | 0.0109(NA) |
| BrT | LB 9.7 | 4.8400(2.8314) | 3.2921(1.8718) | 1.4635(1.0644) | 0.5236(0.3468) | 0.3074(0.2064) | 0.0281(0.0335) | 0.0070(NA) |
| BrT | UB 12.2 | 6.0874(3.5612) | 4.1405(2.3543) | 1.8407(1.3387) | 0.6586(0.4588) | 0.3866(0.2595) | 0.0354(0.0422) | 0.0088(NA) |
| RrT | LB 4.3001 | 3.4000(1.2516) | 2.2783(0.9071) | 1.3939(0.6207) | 0.4535(0.2364) | 0.2919(0.1753) | 0.0266(0.0298) | 0.0070(NA) |
| RrT | UB 5.1854 | 4.1000(1.5093) | 2.7474(1.0938) | 1.6809(0.7484) | 0.5469(0.2851) | 0.3520(0.2114) | 0.0321(0.0359) | 0.0085(NA) |
The lineages were defined in the maternal phylogenetic trees in Fig. 3. The calibrations were made at the root age in Mya estimated from nuclear genes by Chalupska et al.[14] and chloroplast (cp) genes by Bernhardt et al.[21].
BrT is barley-rooted tree. RrT is rye-rooted tree. LB is lower bound. UB is upper bound. The standard deviation was obtained based on the estimate of height_95%_HD and scaled with the root age, but it was not obtained for the lineage X (Emmer-Bread) due to little sequence variation and labeled as not available (NA).
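The lower- and upper-bound rows in the table appear to be consistent with a linear rescaling of the mean-calibrated node ages by the ratio of the root ages. This is an inference from the tabulated values and not a description of the authors' pipeline. A minimal Python sketch of that arithmetic, with hypothetical helper names, including the conversion from Mya to years used when quoting the emmer-bread wheat divergence:

```python
def rescale_age(node_age, root_age, new_root_age):
    """Linearly rescale a node age (Mya) when the root calibration changes."""
    return node_age * new_root_age / root_age


def mya_to_years(age_mya):
    """Convert an age in million years ago (Mya) to years, rounded to whole years."""
    return round(age_mya * 1_000_000)


if __name__ == "__main__":
    # Lineage I (AD-B), barley-rooted tree: mean root age 11.6 Mya, lower bound 9.7 Mya.
    print(round(rescale_age(5.7880, 11.6, 9.7), 4))
    # Lineage X (Emmer-Bread), barley- and rye-rooted trees with mean root ages.
    print(mya_to_years(0.0083), mya_to_years(0.0109))
```

Applying `rescale_age` to the mean-calibrated values reproduces the tabulated lower- and upper-bound ages for lineage I to four decimal places.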