Åsa Grimberg1, Ganapathi Varma Saripella1, Ritva Ann-Mari Repo-Carrasco Valencia2, Therése Bengtsson1, Gabriela Alandia3, Anders S Carlsson1.
Abstract
Quinoa (Chenopodium quinoa Willd.) is a crop that has great potential for increased cultivation in diverse climate regions. The seed protein quality of this crop is high relative to human nutritional requirements, but the seed protein content is relatively low compared to crops such as grain legumes. Increased seed protein content is desirable for increasing the economic viability of this crop in order for it to be used as a protein crop. In this study, we characterized three genotypes of quinoa with different levels of seed protein content. By performing RNA sequencing of developing seeds, we determined the genotype differences in gene expression and identified genetic polymorphisms that could be associated with increased protein content. Storage nutrient analyses of seeds of three quinoa genotypes (Titicaca, Pasankalla, and Regalona) from different ecoregions grown under controlled climate conditions showed that Pasankalla had the highest protein content (20%) and the lowest starch content (46%). Our seed transcriptome analyses revealed highly differentially expressed transcripts (DETs) in Pasankalla as compared to the other genotypes. These DETs encoded functions in sugar transport, starch and protein synthesis, genes regulating embryo size, and seed transcription factors. We selected 60 genes that encode functions in the central carbon metabolism and transcription factors as potential targets for the development of high-precision markers. Genetic polymorphisms, such as single nucleotide polymorphisms (SNPs) and base insertions and deletions (InDels), were found in 19 of the 60 selected genes, which can be further evaluated for the development of genetic markers for high seed protein content in quinoa. Increased cultivation of quinoa can contribute to a more diversified agriculture and support the plant protein diet shift.
The identification of quinoa genotypes with contrasting seed quality can help establish a model system that can be used for the identification of precise breeding targets to improve the seed quality of quinoa. The data presented in this study based on nutrient and transcriptome analyses contribute to an enhanced understanding of the genetic regulation of seed quality traits in quinoa and suggest high-precision candidate markers for such traits.
Keywords: Chenopodium quinoa; RNA sequencing; SNP; plant protein; transcription factor; transcriptional regulation; transcriptome-based markers
Year: 2022 PMID: 35720573 PMCID: PMC9201758 DOI: 10.3389/fpls.2022.816425
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 6.627
Figure 1. Microscopy photos of quinoa seeds (Titicaca) at the early (A), mid (B), and late (C) developmental stages used in this study. These developmental stages correspond to the following days post anthesis (dpa) for Titicaca: 15 dpa (early), 25 dpa (mid), and 35 dpa (late). Sepals have been removed from seeds on the right in each panel. Scale bars are 1.07 mm.
Figure 2. Protein (A), starch (B), and oil (C) content [% by dry weight (dw)] in developing seeds of quinoa Titicaca, Pasankalla, and Regalona. The protein content in mature seeds of these genotypes was published previously (Gargiulo et al., 2019). Plants were grown under controlled growth conditions. For corresponding dpa for each developmental stage, see the text. Results show mean values ± SD of three biological replicates. Bars that do not share a letter are significantly different according to Fisher’s pairwise test (p ≤ 0.05).
Figure 3. Light microscopy photos of quinoa seed flour from the mid developmental stage (A–C), flour at maturity (D–F), and whole seeds at maturity (G–I) of Titicaca (A,D,G), Pasankalla (B,E,H), and Regalona (C,F,I). For corresponding dpa for the mid developmental stage, see the text. Scale bars are 1 mm.
Figure 4. Amino acid profile in mature quinoa seeds of Titicaca, Pasankalla, and Regalona. Results show mean values ± SD of two biological replicates. Bars for amino acids showing significant differences between genotypes are marked with letters. Bars that do not share a letter are significantly different according to Tukey’s test (p ≤ 0.05).
Figure 5. Fatty acid profile (mol % of total fatty acids) of triacylglycerol (oil) in mature quinoa seeds of Titicaca, Pasankalla, and Regalona. Results show mean values ± SD of three biological replicates. Bars that do not share a letter (for each individual fatty acid) are significantly different according to Tukey’s test (p ≤ 0.05).
Table 1. Intersecting DETs (thresholds: FDR < 0.01 and log2FC > 1) from pairwise comparisons of quinoa seed transcriptomes between genotypes (one to one, columns 2–4; or one to two, columns 5–7).
| | Pas_vs_Reg | Pas_vs_Tit | Reg_vs_Tit | Pas_vs_Reg-Tit | Reg_vs_Pas-Tit | Tit_vs_Pas-Reg |
|---|---|---|---|---|---|---|
| Number of DETs | 3,691 | 2,702 | 353 | 3,568 | 361 | 376 |
| Unique genes | 3,242 | 2,320 | 299 | 3,072 | 330 | 332 |
| AT BLAST-hit genes | 2,999 | 2,076 | 252 | 2,776 | 295 | 276 |
| TF PlantTFDBv5.0 | 182 | 131 | 12 | 171 | 9 | 11 |
| TF ITAKv18.12 | 202 | 144 | 13 | 192 | 11 | 17 |
For example, 3,568 transcripts were identified as differentially expressed in Pasankalla (Pas) compared to both Regalona (Reg) and Titicaca (Tit). After duplicate transcripts (i.e., those representing the same coding strand in the reference genome) were removed, this list was reduced to 3,072 genes [genes with missing (.) gene information were not included]. Of these 3,072 unique genes, 2,776 showed hits in the Arabidopsis TAIR database through BLAST searches, and 171 and 192 were annotated as coding for transcription factors, as determined from BLAST searches against two plant transcription factor databases (PlantTFDBv5.0 and TF ITAKv18.12, respectively). For clarity, note that the numbers in, for example, the blue circle in Figure 7B sum to 3,568, which is the value given in this table for the Pas_vs_Reg-Tit comparison.
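The counting steps described in this footnote (apply the thresholds, then collapse transcripts to unique genes while skipping records with missing gene information) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout, the field names, and the toy values are assumptions, and the fold-change cut-off is applied to the absolute value on the assumption that both higher and lower expression count.

```python
from typing import NamedTuple, Optional


class DeResult(NamedTuple):
    """One row of a differential-expression table (hypothetical layout)."""
    transcript: str
    gene: Optional[str]   # None stands in for missing (".") gene information
    log2fc: float
    padj: float


def significant(rows, padj_max=0.01, log2fc_min=1.0):
    """Keep rows with adjusted p <= padj_max and |log2FC| > log2fc_min."""
    return [r for r in rows
            if r.padj <= padj_max and abs(r.log2fc) > log2fc_min]


def unique_genes(rows):
    """Collapse transcripts to genes, dropping rows without gene information."""
    return {r.gene for r in rows if r.gene is not None}


# Toy data, invented purely for illustration.
rows = [
    DeResult("rna1", "geneA", 2.4, 0.001),
    DeResult("rna2", "geneA", 1.8, 0.004),   # second transcript of geneA
    DeResult("rna3", "geneB", -3.1, 0.0002),
    DeResult("rna4", None, 4.0, 0.0001),     # missing gene information
    DeResult("rna5", "geneC", 0.6, 0.0005),  # fold change too small
    DeResult("rna6", "geneD", 2.0, 0.2),     # not significant
]

dets = significant(rows)
genes = unique_genes(dets)
print(len(dets), len(genes))  # 4 DETs collapse to 2 unique genes
```

In the toy data, four transcripts pass the thresholds but only two unique genes remain, mirroring how the "Number of DETs" row is always at least as large as the "Unique genes" row in the table.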
Figure 6. (A) Number of differentially expressed transcripts (DETs) identified by DESeq2 from pairwise comparisons between the three quinoa genotypes Pasankalla, Regalona, and Titicaca, using thresholds of adjusted p-value ≤ 0.01 and log2FC > 1.0. (B) Number of DETs identified by DESeq2 from comparisons of one genotype against the other two [Pasankalla (Pas), Regalona (Reg), and Titicaca (Tit)], using thresholds of adjusted p-value ≤ 0.01 and log2FC > 1.0. FC, fold change. Also see data provided in Supplementary Table S6.
Figure 7. (A) Venn diagram showing the numbers of intersecting genes from the DETs identified from pairwise comparisons of the three genotypes Pasankalla (Pas), Regalona (Reg), and Titicaca (Tit). Adjusted p-value ≤ 0.01 and log2FC > 1.0. (B) Venn diagram showing the numbers of intersecting genes from the DETs identified from comparisons of one genotype against the other two genotypes of quinoa. Adjusted p-value ≤ 0.01 and log2FC > 1.0. FC, fold change. Also see data provided in Supplementary Table S6.
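The regions of a three-way Venn diagram such as the ones in this figure are plain set operations on the three DET lists. The sketch below uses invented identifiers, and `venn_regions` is a hypothetical helper written for this example, not part of the published analysis.

```python
def venn_regions(a, b, c):
    """Return the seven disjoint regions of a three-set Venn diagram."""
    return {
        "a_only": a - b - c,
        "b_only": b - a - c,
        "c_only": c - a - b,
        "a_b": (a & b) - c,
        "a_c": (a & c) - b,
        "b_c": (b & c) - a,
        "a_b_c": a & b & c,
    }


# Toy DET lists, invented for illustration only.
pas = {"t1", "t2", "t3", "t4"}
reg = {"t3", "t4", "t5"}
tit = {"t4", "t6"}

regions = venn_regions(pas, reg, tit)

# The four regions inside one circle add up to the size of that list.
pas_total = sum(len(regions[k]) for k in ("a_only", "a_b", "a_c", "a_b_c"))
print(pas_total == len(pas))  # True
```

This is the same relationship noted for the blue circle in panel B: the numbers inside one circle sum to the total number of DETs for that comparison.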
Figure 8. Heat map showing the expression levels (rlog-normalized scores) in quinoa seeds of the 60 selected transcripts of potential importance for regulating seed quality in Pasankalla. These selected transcripts encode either functions involved in central carbon metabolism (sugar, starch, lipid, and protein metabolism) or transcription factors, and they all showed at least eight times higher/lower transcript expression (log2FC > 3.0, p-value ≤ 0.01) in Pasankalla (Pas, turquoise) compared to Regalona and Titicaca (RegTit, pink). Expression levels shown are raw read counts (from the mapping onto the reference genome) normalized using the regularized logarithm (rlog). Values are shown from triplicate biological samples from the early and mid developmental stages (for corresponding dpa, see the text).
Number of sequence variations (SNPs/InDels) in quinoa seed transcriptomes as compared to the quinoa reference genome.
| Explanation of identified SNPs/InDels | Pasankalla | Regalona | Titicaca |
|---|---|---|---|
| Total SNPs/InDels in transcripts after alignment to reference genome | 341,911 | 151,816 | 126,847 |
| Subset of SNPs/InDels after quality filters | 148,863 | 67,446 | 45,456 |
| Subset of SNPs/InDels after quality filters that are present within significantly DETs | 8,939 | 442 | 416 |
| Total no of High and Moderate impact variants | 2,863 | 115 | 142 |
| | 47 | 2 | 6 |
| | 47 | 5 | 4 |
| | 68 | 3 | 4 |
| | 219 | 10 | 11 |
| | 14 | 1 | 1 |
| | 15 | 0 | 0 |
| | 2,473 | 97 | 119 |
Numbers given are the total number of SNPs/InDels in the transcriptomes of the three genotypes (each compared individually to the reference genome) and the number of these SNPs/InDels remaining after quality filtering [QUAL (the Phred-scaled probability that a REF/ALT polymorphism exists at this site given the sequencing data) ≥ 30; Read Depth ≥ 10; Genotype Quality ≥ 20]. Next, the quality-filtered SNPs/InDels located in the significantly (p < 0.01 and Fold Change > 1) DETs were extracted, and finally the number of these SNPs/InDels yielding high- or moderate-impact amino acid changes was determined. For example, the seed transcriptome of Pasankalla showed 341,911 SNPs/InDels in comparison to the quinoa reference genome, of which 148,863 remained after quality filtering. Of these 148,863 SNPs/InDels, 8,939 were found in the 3,072 significant DETs identified in Pasankalla as compared to Titicaca and Regalona (see Table 1). Of these 8,939 SNPs/InDels in DETs, 2,863 were identified as high- or moderate-impact variants, giving rise to splice acceptor/donor variants, gained or lost stop/start codons, frameshifts, and missense variants.
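The three quality filters named above (QUAL ≥ 30, read depth ≥ 10, genotype quality ≥ 20) can be applied to VCF records with a few lines of code. The sketch below is an illustration under stated assumptions rather than the authors' pipeline: it assumes a single-sample VCF whose FORMAT column carries per-sample `DP` and `GQ` keys with no missing values, and both records are invented.

```python
def passes_filters(vcf_line, min_qual=30.0, min_depth=10, min_gq=20):
    """Return True if a single-sample VCF data line passes all three filters."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual = float(fields[5])                       # column 6: QUAL
    sample = dict(zip(fields[8].split(":"),       # column 9: FORMAT keys
                      fields[9].split(":")))      # column 10: sample values
    depth = int(sample["DP"])                     # per-sample read depth
    gq = int(sample["GQ"])                        # genotype quality
    return qual >= min_qual and depth >= min_depth and gq >= min_gq


# Two invented records: the first passes, the second fails on read depth.
records = [
    "scaffold_1\t100\t.\tC\tT\t2062.77\t.\t.\tGT:DP:GQ\t1/1:58:99",
    "scaffold_1\t250\t.\tA\tG\t45.10\t.\t.\tGT:DP:GQ\t0/1:6:35",
]

kept = [r for r in records if passes_filters(r)]
print(len(kept))  # 1
```

Real VCF files may encode missing values as `.`, which this sketch does not handle; a production filter would more likely rely on an established VCF toolkit.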
List of moderate to high impact SNPs/InDels found in Pasankalla seed transcripts as compared to the reference genome in 19 of the 60 candidate transcripts (listed in Supplementary Table S5).
| #CHROM | POS | REF | ALT | QUAL | Genotype | Predicted effects | NG-14833_Pas | Gene annotation | Closest Arabidopsis homolog |
|---|---|---|---|---|---|---|---|---|---|
| NW_018742204.1 | 12313015 | C | T | 2062.77 | 1/1 | Missense_variant | rna498 | Protein FAR-RED IMPAIRED RESPONSE 1-like | AT4G15090 |
| NW_018742204.1 | 12313702 | A | G | 4467.77 | 1/1 | Stop_lost&splice_region_variant | rna498 | Protein FAR-RED IMPAIRED RESPONSE 1-like | AT4G15090 |
| NW_018742204.1 | 19431791 | G | A | 3625.77 | 1/1 | Stop_gained | rna1048 | Truncated transcription factor CAULIFLOWER A-like | AT1G69120 |
| NW_018742205.1 | 1222228 | A | T | 1548.77 | 0/1 | Splice_acceptor_variant&intron_variant | rna1790 | Granule-bound starch synthase 1 | AT1G32900 |
| NW_018742205.1 | 1223861 | A | G | 154458.77 | 1/1 | Missense_variant | rna1790 | Granule-bound starch synthase 1 | AT1G32900 |
| NW_018742205.1 | 1224425 | G | C | 101104.77 | 1/1 | Missense_variant | rna1790 | Granule-bound starch synthase 1 | AT1G32900 |
| NW_018742205.1 | 1225082 | A | G | 29044.77 | 1/1 | Missense_variant | rna1790 | Granule-bound starch synthase 1 | AT1G32900 |
| NW_018742418.1 | 4213817 | C | A | 1850.77 | 1/1 | Splice_donor_variant&intron_variant | rna6695 | Glucose-6-phosphate isomerase | AT5G42740 |
| NW_018742418.1 | 4213846 | A | G | 2701.77 | 1/1 | Missense_variant | rna6695 | Glucose-6-phosphate isomerase | AT5G42740 |
| NW_018742418.1 | 4214066 | A | G | 2573.77 | 1/1 | Missense_variant | rna6695 | Glucose-6-phosphate isomerase | AT5G42740 |
| NW_018742418.1 | 6800112 | C | T | 382.77 | 0/1 | Missense_variant | rna6969 | Trihelix transcription factor ASR3-like | AT2G33550 |
| NW_018742418.1 | 6800393 | G | T | 992.77 | 1/1 | Missense_variant | rna6969 | Trihelix transcription factor ASR3-like | AT2G33550 |
| NW_018742484.1 | 3420361 | G | T | 3626.77 | 1/1 | Splice_acceptor_variant&intron_variant | rna8863 | Leucoanthocyanidin dioxygenase-like | AT4G22880 |
| NW_018743014.1 | 1550482 | C | A | 346.77 | 0/1 | Splice_acceptor_variant&intron_variant | rna20892 | Hexokinase-2-like | AT2G19860 |
| NW_018743033.1 | 1533172 | A | G | 3475.77 | 0/1 | Missense_variant | rna21577 | Pyruvate decarboxylase 1-like | AT4G33070 |
| NW_018743033.1 | 1534233 | T | C | 4599.77 | 0/1 | Missense_variant | rna21577 | Pyruvate decarboxylase 1-like | AT4G33070 |
| NW_018743033.1 | 1535169 | G | C | 7051.77 | 0/1 | Missense_variant | rna21577 | Pyruvate decarboxylase 1-like | AT4G33070 |
| NW_018743066.1 | 1868796 | A | G | 717.77 | 1/1 | Missense_variant | rna22494 | Uncharacterized LOC110731339(AT:pyrophosphorylase 2) | AT2G18230 |
| NW_018743175.1 | 1811370 | C | G | 109.77 | 0/1 | Missense_variant | rna23725 | Ethylene-responsive transcription factor ERF104-like | AT5G47230 |
| NW_018743175.1 | 1811380 | A | T | 112.77 | 0/1 | Missense_variant | rna23725 | Ethylene-responsive transcription factor ERF104-like | AT5G47230 |
| NW_018743175.1 | 1811963 | A | C | 3684.77 | 1/1 | Missense_variant | rna23725 | Ethylene-responsive transcription factor ERF104-like | AT5G47230 |
| NW_018743397.1 | 696295 | G | C | 27773.77 | 0/1 | Missense_variant | rna31317 | Granule-bound starch synthase 1 | AT1G32900 |
| NW_018744290.1 | 1355398 | A | T | 2531.77 | 0/1 | Missense_variant | rna48879 | BURP Domain-containing protein BNM2A-like | AT1G49320 |
| NW_018744290.1 | 1355566 | G | A | 12661.77 | 0/1 | Missense_variant | rna48879 | BURP Domain-containing protein BNM2A-like | AT1G49320 |
| NW_018744489.1 | 33930 | T | TATCA | 49.73 | 0/1 | Frameshift_variant&stop_gained | rna51742 | BURP Domain-containing protein BNM2A-like | AT1G49320 |
| NW_018744878.1 | 3241449 | T | C | 2645.77 | 1/1 | Missense_variant | rna60851 | Protein FAR-RED IMPAIRED RESPONSE 1-like | AT4G15090 |
| NW_018744878.1 | 3241962 | G | C | 1718.77 | 1/1 | Missense_variant | rna60851 | Protein FAR-RED IMPAIRED RESPONSE 1-like | AT4G15090 |
| NW_018744878.1 | 3241977 | CCGGCATATAACATGGCACCA | C | 1003.73 | 1/1 | Frameshift_variant | rna60851 | Protein FAR-RED IMPAIRED RESPONSE 1-like | AT4G15090 |
| NW_018744878.1 | 3242167 | C | A | 1792.77 | 1/1 | Missense_variant | rna60851 | Protein FAR-RED IMPAIRED RESPONSE 1-like | AT4G15090 |
| NW_018744955.1 | 1064185 | A | G | 17049.77 | 1/1 | Missense_variant | rna61870 | Transcription factor bHLH19-like | AT2G22760 |
| NW_018744955.1 | 1064321 | T | C | 14898.77 | 1/1 | Missense_variant | rna61870 | Transcription factor bHLH19-like | AT2G22760 |
| NW_018745003.1 | 393540 | A | T | 472.77 | 0/1 | Missense_variant | rna63362 | NADP-dependent malic enzyme | AT5G25880 |
| NW_018745003.1 | 393541 | A | T | 472.77 | 0/1 | Missense_variant | rna63362 | NADP-dependent malic enzyme | AT5G25880 |
| NW_018745323.1 | 1054649 | C | T | 2359.77 | 0/1 | Missense_variant | rna68911 | Glyceraldehyde-3-phosphate dehydrogenase | AT1G13440 |
| NW_018745454.1 | 1606358 | C | CACCACCGG | 1039.73 | 1/1 | Frameshift_variant | rna70849 | Phospholipase D gamma 1-like | AT2G42010 |
| NW_018745454.1 | 1606396 | C | G | 919.77 | 1/1 | Missense_variant | rna70849 | Phospholipase D gamma 1-like | AT2G42010 |
| NW_018745454.1 | 1606841 | G | A | 1495.77 | 1/1 | Missense_variant | rna70849 | Phospholipase D gamma 1-like | AT2G42010 |
| NW_018745454.1 | 1607164 | C | CA | 1286.73 | 0/1 | Frameshift_variant | rna70849 | Phospholipase D gamma 1-like | AT2G42010 |
| NW_018745454.1 | 1607221 | A | G | 2132.78 | 1/1 | Missense_variant | rna70849 | Phospholipase D gamma 1-like | AT2G42010 |
| NW_018745454.1 | 1607446 | G | A | 2668.77 | 1/1 | Missense_variant | rna70849 | Phospholipase D gamma 1-like | AT2G42010 |
| NW_018745454.1 | 1607710 | A | G | 1366.77 | 1/1 | Missense_variant | rna70849 | Phospholipase D gamma 1-like | AT2G42010 |
| NW_018745454.1 | 1610173 | T | A | 2160.77 | 1/1 | Splice_donor_variant&intron_variant | rna70849 | Phospholipase D gamma 1-like | AT2G42010 |
| NW_018745454.1 | 1613459 | C | G | 4019.77 | 0/1 | Missense_variant | rna70849 | Phospholipase D gamma 1-like | AT2G42010 |
| NW_018745684.1 | 2622955 | C | T | 11608.77 | 0/1 | Missense_variant | rna74938 | 13S Globulin seed storage protein 2-like | AT5G44120 |
| NW_018745684.1 | 2624227 | C | G | 6071.73 | 0/1 | Missense_variant | rna74938 | 13S Globulin seed storage protein 2-like | AT5G44120 |
#CHROM: assembly scaffold number; POS: base position in the scaffold; REF: DNA base in the reference genome; ALT: DNA base in the Pasankalla seed transcript; QUAL: quality score; Genotype: 1/1 means that Pasankalla is homozygous for the identified SNP/InDel and 0/1 means that Pasankalla is heterozygous for it; Predicted effects: type of SNP/InDel; NG-14833_Pas: Pasankalla transcript number; Gene annotation: functional annotation from the quinoa genome; Closest Arabidopsis homolog: gene identification number of the closest homolog in Arabidopsis thaliana.
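As an illustration of how the Genotype column can be read, the sketch below tallies homozygous (1/1) and heterozygous (0/1) calls per transcript for a handful of rows copied from the table above. The tuple layout and the helper name `tally_zygosity` are assumptions made for this example.

```python
from collections import Counter, defaultdict

# Genotype codes as explained in the table footnote.
ZYGOSITY = {"1/1": "homozygous", "0/1": "heterozygous"}

# (transcript, genotype, predicted effect) for a subset of the table rows.
rows = [
    ("rna498", "1/1", "Missense_variant"),
    ("rna498", "1/1", "Stop_lost&splice_region_variant"),
    ("rna1790", "0/1", "Splice_acceptor_variant&intron_variant"),
    ("rna1790", "1/1", "Missense_variant"),
    ("rna1790", "1/1", "Missense_variant"),
    ("rna1790", "1/1", "Missense_variant"),
    ("rna21577", "0/1", "Missense_variant"),
    ("rna21577", "0/1", "Missense_variant"),
    ("rna21577", "0/1", "Missense_variant"),
]


def tally_zygosity(rows):
    """Count homozygous and heterozygous variant calls per transcript."""
    counts = defaultdict(Counter)
    for transcript, genotype, _effect in rows:
        counts[transcript][ZYGOSITY[genotype]] += 1
    return counts


counts = tally_zygosity(rows)
for transcript, c in counts.items():
    print(transcript, c["homozygous"], c["heterozygous"])
# rna498 2 0
# rna1790 3 1
# rna21577 0 3
```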