Yongfu Tao, Barbara George-Jaeggli, Marie Bouteillé-Pallas, Shuaishuai Tai, Alan Cruickshank, David Jordan, Emma Mace.
Abstract
C4 photosynthesis has evolved in over 60 different plant taxa and is an excellent example of convergent evolution. Plants using the C4 photosynthetic pathway have an efficiency advantage, particularly in hot and dry environments. They account for 23% of global primary production and include some of our most productive cereals. While previous genetic studies comparing phylogenetically related C3 and C4 species have elucidated the genetic diversity underpinning the C4 photosynthetic pathway, no previous studies have described the genetic diversity of the genes involved in this pathway within a C4 crop species. Enhanced understanding of the allelic diversity and selection signatures of genes in this pathway may present opportunities to improve photosynthetic efficiency, and ultimately yield, by exploiting natural variation. Here, we present the first genetic diversity survey of 8 known C4 gene families in an important C4 crop, Sorghum bicolor (L.) Moench, using sequence data of 48 genotypes covering wild and domesticated sorghum accessions. Average nucleotide diversity of C4 gene families varied more than 20-fold from the NADP-malate dehydrogenase (MDH) gene family (θπ = 0.2 × 10⁻³) to the pyruvate orthophosphate dikinase (PPDK) gene family (θπ = 5.21 × 10⁻³). Genetic diversity of C4 genes was reduced by 22.43% in cultivated sorghum compared to wild and weedy sorghum, indicating that the group of wild and weedy sorghum may constitute an untapped reservoir for alleles related to the C4 photosynthetic pathway. A SNP-level analysis identified purifying selection signals on C4 PPDK and carbonic anhydrase (CA) genes, and balancing selection signals on C4 PPDK-regulatory protein (RP) and phosphoenolpyruvate carboxylase (PEPC) genes. Allelic distribution of these C4 genes was consistent with the selection signals detected. A better understanding of the genetic diversity of the C4 pathway in sorghum paves the way for mining the natural allelic variation for the improvement of photosynthesis.
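The nucleotide diversity statistic (θπ) quoted in the abstract is conventionally defined as the average number of pairwise differences per site among aligned sequences. The sketch below illustrates that textbook definition on a toy alignment; the function name and the sequences are hypothetical, and this is not the authors' pipeline, whose software and filtering steps are not described here.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site across equal-length aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Toy alignment (hypothetical data, not taken from the study).
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACGTAC",
    "ACGTACGTAC",
]
pi = nucleotide_diversity(toy_alignment)
print(f"theta_pi = {pi:.3f} per site")
```

Real analyses would also need to handle missing calls and heterozygous genotypes, which this sketch ignores.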
Keywords: C4 pathway; SNPs; domestication; genetic diversity; sorghum
Year: 2020 PMID: 32708598 PMCID: PMC7397294 DOI: 10.3390/genes11070806
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Diagram of the nicotinamide adenine dinucleotide phosphate-malic enzyme (NADP-ME) biosynthetic pathway of C4 photosynthesis (adapted from [40]). In the mesophyll cells, CO2 is converted to HCO3− catalyzed by carbonic anhydrase (CA) and fixed into the four-carbon acid, oxaloacetate (OAA), by phosphoenolpyruvate carboxylase (PEPC). Phosphorylation of PEPC is carried out by PEPC kinase (PPCK). The OAA generated by PEPC is then reduced to malate by the NADP-malate dehydrogenase (NADP-MDH) or trans-aminated to aspartate. The resultant C4 acids, malate and aspartate, are transported to the bundle sheath and then decarboxylated in the vicinity of Rubisco to release CO2 and pyruvate. Pyruvate is transported back to mesophyll cells to regenerate PEP by pyruvate orthophosphate dikinase (PPDK), while CO2 enters the Calvin–Benson–Bassham cycle and is fixed by ribulose-1,5-bisphosphate carboxylase (Rubisco). Activation and inactivation of PPDK is catalyzed by PPDK regulatory protein (PPDK-RP).
Single nucleotide polymorphism (SNP) information and selection signals across 27 genes from C4 gene families.
| Gene ID | Enzyme | GL | CDSL | NoS | NoSiC | NoNS | NoSS | UPSGL | UBSGL | NoSUPS | NoNSUPS | NoSUBS | NoNSUBS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sobic.002G230100 | CA | 4823 | 1014 | 115 | 14 | 4 | 10 | No | No | 0 | 0 | 1 | 0 |
| | CA | 10440 | 1371 | 475 | 33 | 7 | 26 | No | No | 1 | 1 | 0 | 0 |
| | CA | 4749 | 615 | 138 | 13 | 3 | 10 | No | No | 0 | 0 | 0 | 0 |
| Sobic.003G234500 | CA | 2986 | 609 | 173 | 11 | 5 | 6 | No | No | 0 | 0 | 0 | 0 |
| Sobic.003G234600 | CA | 4750 | 771 | 210 | 18 | 10 | 8 | No | No | 0 | 0 | 0 | 0 |
| Sobic.007G166200 | NADP-MDH | 3354 | 1308 | 53 | 11 | 6 | 5 | No | No | 0 | 0 | 0 | 0 |
| | NADP-MDH | 3816 | 1290 | 108 | 12 | 3 | 9 | No | No | 0 | 0 | 0 | 0 |
| Sobic.003G036000 | NADP-ME | 6107 | 1941 | 111 | 11 | 4 | 7 | No | No | 0 | 0 | 0 | 0 |
| | NADP-ME | 5447 | 1911 | 141 | 12 | 3 | 9 | No | No | 0 | 0 | 0 | 0 |
| Sobic.003G280900 | NADP-ME | 5691 | 1782 | 175 | 22 | 13 | 9 | No | No | 1 | 1 | 0 | 0 |
| Sobic.003G292400 | NADP-ME | 4527 | 1782 | 95 | 22 | 8 | 14 | No | No | 10 | 2 | 0 | 0 |
| Sobic.009G069600 | NADP-ME | 3624 | 1713 | 118 | 34 | 10 | 24 | No | No | 3 | 1 | 0 | 0 |
| Sobic.002G167000 | PEPC | 5632 | 2904 | 41 | 11 | 6 | 5 | No | No | 0 | 0 | 0 | 0 |
| Sobic.003G100600 | PEPC | 8881 | 3117 | 371 | 43 | 9 | 34 | No | No | 0 | 0 | 21 | 2 |
| Sobic.003G301800 | PEPC | 7610 | 2901 | 138 | 19 | 3 | 17 | No | No | 0 | 0 | 0 | 0 |
| Sobic.004G106900 | PEPC | 6977 | 2883 | 146 | 34 | 5 | 29 | No | No | 0 | 0 | 7 | 0 |
| Sobic.007G106500 | PEPC | 5616 | 2895 | 64 | 12 | 8 | 4 | No | No | 1 | 1 | 0 | 0 |
| | PEPC | 6647 | 3087 | 193 | 28 | 9 | 19 | No | No | 0 | 0 | 2 | 0 |
| Sobic.004G219900 | PPCK | 1612 | 924 | 40 | 9 | 1 | 8 | No | No | 0 | 0 | 2 | 0 |
| | PPCK | 1749 | 855 | 37 | 9 | 4 | 4 | No | No | 0 | 0 | 0 | 0 |
| Sobic.006G148300 | PPCK | 1997 | 900 | 64 | 4 | 1 | 3 | No | No | 0 | 0 | 0 | 0 |
| Sobic.001G326900 | PPDK | 8494 | 2730 | 321 | 46 | 18 | 28 | No | Yes | 0 | 0 | 24 | 5 |
| | PPDK | 12748 | 2847 | 441 | 16 | 0 | 16 | No | No | 3 | 0 | 0 | 0 |
| | PPDK-RP | 2507 | 1290 | 79 | 22 | 8 | 14 | No | No | 0 | 0 | 3 | 0 |
| Sobic.002G324500 | PPDK-RP | 3072 | 1260 | 69 | 20 | 5 | 15 | No | No | 4 | 0 | 0 | 0 |
| Sobic.002G324700 | PPDK-RP | 4662 | 1587 | 222 | 28 | 19 | 9 | No | No | 1 | 1 | 2 | 2 |
| | RbcS | 1556 | 510 | 45 | 7 | 4 | 3 | No | No | 0 | 0 | 0 | 0 |
Gene ID is according to sorghum reference genome V3.1. Gene IDs in bold indicate the C4 genes. Enzyme: Encoded enzyme. GL: Gene length. CDSL: Length of coding sequence (CDS). NoS: Total number of SNPs identified across the gene. NoSiC: Number of SNPs identified in CDS. NoNS: Number of non-synonymous SNPs. NoSS: Number of synonymous SNPs. UPSGL: Under purifying selection based on gene level analysis. UBSGL: Under balancing selection based on gene level analysis. NoSUPS: Number of SNPs under purifying selection. NoNSUPS: Number of non-synonymous SNPs under purifying selection. NoSUBS: Number of SNPs under balancing selection. NoNSUBS: Number of non-synonymous SNPs under balancing selection.
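As a reading aid for the column definitions above, the following sketch derives two simple summaries from the table: SNP density (NoS per kb of gene length, GL) and the share of coding SNPs that are non-synonymous (NoNS / NoSiC). The three rows are copied from the table; the helper names and the choice of summaries are illustrative assumptions, not statistics reported by the authors.

```python
# (gene_id, enzyme, GL in bp, NoS, NoSiC, NoNS), copied from the table above.
rows = [
    ("Sobic.001G326900", "PPDK", 8494, 321, 46, 18),
    ("Sobic.003G100600", "PEPC", 8881, 371, 43, 9),
    ("Sobic.007G166200", "NADP-MDH", 3354, 53, 11, 6),
]


def snp_density_per_kb(n_snps, gene_length_bp):
    """SNPs per 1000 bp of gene length (hypothetical helper)."""
    return 1000 * n_snps / gene_length_bp


def nonsyn_fraction(n_nonsyn, n_cds_snps):
    """Share of coding SNPs that are non-synonymous (hypothetical helper)."""
    return n_nonsyn / n_cds_snps if n_cds_snps else float("nan")


summary = {
    gene_id: (snp_density_per_kb(nos, gl), nonsyn_fraction(nons, nosic))
    for gene_id, _enzyme, gl, nos, nosic, nons in rows
}

for gene_id, (density, fraction) in summary.items():
    print(f"{gene_id}: {density:.2f} SNPs/kb, {fraction:.2f} non-synonymous")
```

Density over the full gene length mixes coding and non-coding sites, so it is only a coarse comparison between genes.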
Genetic diversity (θπ) and fixation index (FST) of 27 genes from C4 gene families.
| Gene ID | Enzyme | θπ-All | θπ-Cultivated | θπ-W&W | FST |
|---|---|---|---|---|---|
| Sobic.002G230100 | CA | 0.80 | 0.74 | 0.90 | 0.19 |
| | CA | 2.65 | 2.46 | 2.66 | 0.16 |
| | CA | 1.01 | 0.91 | 0.88 | 0.37 |
| Sobic.003G234500 | CA | 5.55 | 5.51 | 4.56 | 0.07 |
| Sobic.003G234600 | CA | 1.27 | 1.35 | 0.65 | 0.06 |
| Sobic.007G166200 | NADP-MDH | 0.18 | 0.21 | 0.13 | 0.07 |
| | NADP-MDH | 0.33 | 0.33 | 0.42 | 0.08 |
| Sobic.003G036000 | NADP-ME | 0.88 | 0.65 | 1.59 | 0.15 |
| | NADP-ME | 0.89 | 0.67 | 1.39 | 0.06 |
| Sobic.003G280900 | NADP-ME | 0.93 | 0.85 | 1.11 | 0.09 |
| Sobic.003G292400 | NADP-ME | 1.43 | 0.08 | 4.44 | 0.32 |
| Sobic.009G069600 | NADP-ME | 0.52 | 0.49 | 0.10 | 0.45 |
| Sobic.002G167000 | PEPC | 0.58 | 0.51 | 0.85 | 0.04 |
| Sobic.003G100600 | PEPC | 5.36 | 5.18 | 3.56 | 0.05 |
| Sobic.003G301800 | PEPC | 0.64 | 0.22 | 2.37 | 0.22 |
| Sobic.004G106900 | PEPC | 3.18 | 3.02 | 2.14 | 0.07 |
| Sobic.007G106500 | PEPC | 0.44 | 0.22 | 0.47 | 0.21 |
| | PEPC | 2.49 | 2.25 | 2.86 | 0.04 |
| Sobic.004G219900 | PPCK | 2.08 | 1.94 | 2.12 | 0.12 |
| | PPCK | 1.03 | 0.96 | 0.91 | 0.03 |
| Sobic.006G148300 | PPCK | 0.48 | 0.39 | 0.13 | 0.41 |
| Sobic.001G326900 | PPDK | 8.34 | 5.64 | 5.64 | 0.40 |
| | PPDK | 2.07 | 1.79 | 2.19 | 0.13 |
| | PPDK-RP | 5.04 | 3.82 | 4.55 | 0.41 |
| Sobic.002G324500 | PPDK-RP | 1.27 | 0.10 | 3.75 | 0.24 |
| Sobic.002G324700 | PPDK-RP | 2.58 | 2.50 | 3.51 | 0.05 |
| | rbcS | 4.32 | 3.41 | 5.72 | 0.12 |
Gene ID is according to sorghum reference genome V3.1. Gene IDs in bold indicate the C4 gene versions. Enzyme: Encoded enzyme. θπ-All: Nucleotide diversity across all 48 genotypes. θπ-Cultivated: Nucleotide diversity across cultivated genotypes. θπ-W&W: Nucleotide diversity across wild and weedy genotypes. All θπ values are given per kb. FST: Fixation index between cultivated genotypes and wild and weedy genotypes.
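To make the cultivated versus wild and weedy comparison concrete, the sketch below computes a per-gene relative reduction in diversity, 100 × (θπ-W&W − θπ-Cultivated) / θπ-W&W, for three rows copied from the table. This per-gene formula is an assumption chosen for illustration; how the study aggregated genes to obtain its overall 22.43% figure is not stated here, so the sketch is not expected to reproduce that number.

```python
def percent_reduction(pi_wild, pi_cultivated):
    """Relative loss of diversity in cultivated vs. wild and weedy, in percent."""
    if pi_wild == 0:
        raise ValueError("wild and weedy diversity must be non-zero")
    return 100 * (pi_wild - pi_cultivated) / pi_wild


# gene_id -> (theta_pi cultivated, theta_pi wild and weedy), copied from the table above.
diversity = {
    "Sobic.003G292400": (0.08, 4.44),  # NADP-ME
    "Sobic.002G324500": (0.10, 3.75),  # PPDK-RP
    "Sobic.003G234600": (1.35, 0.65),  # CA
}

reductions = {
    gene_id: percent_reduction(wild, cultivated)
    for gene_id, (cultivated, wild) in diversity.items()
}

for gene_id, value in reductions.items():
    print(f"{gene_id}: {value:+.1f}% reduction in cultivated sorghum")
```

A negative value, as for Sobic.003G234600, means the cultivated group is more diverse than the wild and weedy group at that gene.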
Figure 2. Genetic diversity and fixation index (FST) of C4 gene families in cultivated sorghum and the wild and weedy group. (A) Genetic diversity (pi) for each of the C4 gene families. Gene IDs in red indicate core C4 genes. Red bars represent the pi of cultivated sorghum, while dark blue bars represent the pi of the wild and weedy group. (B) FST between the cultivated and the wild and weedy groups for each of the C4 gene families. Gene IDs in red indicate core C4 genes.
Figure 3. Haplotype networks of four core C4 genes with selection signals based on individual SNP analysis. (A) The PPDK gene (Sobic.009G132900) with a signal of purifying selection; (B) one of the CA genes (Sobic.003G234200) with a signal of purifying selection; (C) the PPDK-RP gene (Sobic.002G324400) with a signal of balancing selection; (D) the PEPC gene (Sobic.010G160700) with a signal of balancing selection. Group classification of the sorghum accessions is detailed in Table S1. Color-coding is as follows: cultivated sorghum (red), wild and weedy genotypes (purple), Sorghum propinquum (blue), and Sorghum guinea margaritiferum (green). The size of the circles in the haplotype networks is proportional to the number of accessions with that haplotype. The branch length represents the genetic distance between two haplotypes.