Anna Hawliczek, Leszek Bolibok, Katarzyna Tofil, Ewa Borzęcka, Joanna Jankowicz-Cieślak, Piotr Gawroński, Adam Kral, Bradley J Till, Hanna Bolibok-Brągoszewska.
Abstract
BACKGROUND: Loss of genetic variation negatively impacts breeding efforts and food security. Genebanks house over 7 million accessions representing vast allelic diversity that is a resource for sustainable breeding. Discovery of DNA variations is an important step in the efficient use of these resources. While technologies have improved and costs have dropped, it remains impractical to consider resequencing millions of accessions. Candidate genes are known for most agronomic traits, providing a list of high-priority targets. Heterogeneity in seed stocks means that multiple samples from an accession need to be evaluated to recover available alleles. To address this, we developed a pooled amplicon sequencing approach and applied it to the out-crossing cereal rye (Secale cereale L.).
Keywords: Allele frequency; FBA; GSP-1; MATE1; Natural variation; PBF; Secale cereale; Sinb; TLP; Variant calling
Year: 2020 PMID: 33256606 PMCID: PMC7706248 DOI: 10.1186/s12864-020-07240-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Scatter plot of Variant Allele Frequency (VAF) data from GATK HaplotypeCaller. VAF is plotted on the x-axis. Black dots represent every predicted variant. The number of accessions predicted to harbor the variant is plotted on the y-axis. Data is plotted on the z-axis to separate different variants that share the same VAF and number of accessions. The percentage of the total data from VAF 0 to a specific frequency is overlaid in red. For example, 75% of all predicted nucleotide variants have a VAF of 0.05 or lower.
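The red overlay described in the Fig. 1 caption is a cumulative percentage over VAF. A minimal sketch of that calculation follows; the function name and the example values are illustrative assumptions, not data or code from the study.

```python
import numpy as np

def cumulative_fraction(vafs, threshold):
    """Return the fraction of variants whose VAF is at or below `threshold`."""
    vafs = np.asarray(vafs, dtype=float)
    return float(np.mean(vafs <= threshold))

# Made-up VAF values for illustration only (not taken from the paper).
example_vafs = [0.01, 0.02, 0.03, 0.05, 0.10, 0.25, 0.40, 0.80]

# Share of the example variants with VAF <= 0.05.
share_low_frequency = cumulative_fraction(example_vafs, 0.05)
print(share_low_frequency)
```

Applied to the full set of predicted variants, the same call with a threshold of 0.05 would correspond to the 75% figure quoted in the caption.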
Fig. 2 Common and unique variants called by GATK, SNVer and CRISP. The Venn diagram shows the overlap of variant calls for the three algorithms (interior image). Eight hundred and ninety-five variants were commonly identified. The outer image is a Circos plot of the common variants. Only the PCR amplified regions of gene targets are displayed (track 1). Gene models are shown on track 2 with exons and introns represented by thick and thin black lines, respectively. Tracks 3, 4, and 5 show the position and frequency (indicated by bar height) of variants predicted by GATK, SNVer and CRISP, respectively.
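The overlap shown in the Venn diagram amounts to intersecting the call sets of the three algorithms. The sketch below assumes each variant is keyed by a (target, position, reference, alternate) tuple; that key, the helper name, and the example records are hypothetical and do not reproduce the study's pipeline.

```python
def common_variants(*call_sets):
    """Return the variants present in every supplied call set."""
    sets = [set(calls) for calls in call_sets]
    if not sets:
        return set()
    return set.intersection(*sets)

# Hypothetical call sets; keys are (target, position, ref, alt).
gatk_calls = {("geneA", 170, "A", "G"), ("geneA", 210, "A", "G"), ("geneB", 310, "C", "T")}
snver_calls = {("geneA", 170, "A", "G"), ("geneB", 310, "C", "T")}
crisp_calls = {("geneA", 170, "A", "G"), ("geneA", 210, "A", "G"), ("geneB", 310, "C", "T")}

shared = common_variants(gatk_calls, snver_calls, crisp_calls)
print(sorted(shared))
```

Keying on reference and alternate alleles as well as position avoids counting two different substitutions at the same site as one shared call.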
Missense, nonsense and silent changes with different variant calling methods
| | GATK | SNVer | CRISP | Common variants |
|---|---|---|---|---|
| Missense | 1183 | 336 | 868 | 164 |
| Nonsense | 14 | 7 | 9 | 2 |
| Total | 1770 | 602 | 1322 | 348 |
CAPS and Sanger validation of variants in multiple single plants of an accession
| Gene | Pos (a) | Ref (b) | Alt (c) | Method | RE used | Acc (d) | GATK (e) | SNVer (e) | CRISP (e) | VAFobs (f) | No. plants (g) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 170 | A | G | CAPS | | D2 | 0.880 | 0.587 | 0.819 | 0.79 | 26[11] |
| | 170 | A | G | CAPS | | E12 | 0.875 | 0.592 | 0.783 | 0.70 | 27[8] |
| | 210 | A | G | CAPS | | H5 | 0.172 | 0.079 | 0.276 | 0.38 | 25[15] |
| | 364 | G | C | CAPS | | H5 | 0.307 | 0.137 | 0.393 | 0.24 | 25[8] |
| | 310 | C | T | CAPS | | D2 | 0.292 | 0.206 | 0.165 | 0.44 | 25[14] |
| | 310 | C | T | CAPS | | E12 | 0.120 | 0.059 | 0.059 | 0.10 | 25[5] |
| | 517 | G | A | CAPS | | D2 | 0.286 | 0.262 | 0.180 | 0.44 | 27[12] |
| | 517 | G | A | CAPS | | E12 | 0.016 | 0.104 | 0.065 | 0.09 | 27[5] |
| | 532 | C | T | CAPS | | D2 | 0.104 | 0.096 | 0.138 | 0.00 | 26[0] |
| | 532 | C | T | CAPS | | E12 | 0.536 | 0.405 | 0.472 | 0.43 | 25[14] |
| | 666 | C | T | | na (h) | F8 | 0.401 | 0.371 | 0.359 | 0.58 | 25[11] |
| | 810 | C | T | | na | F10 | 0.042 | 0.022 | 0.068 | 0.00 | 6[0] |
| | 810 | C | T | | na | F11 | 0.214 | 0.074 | 0.216 | 0.16 | 16[5] |
| | 846 | G | C | | na | F8 | 0.094 | 0.053 | 0.104 | 0.10 | 25[5] |
| | 847 | G | A | | na | F8 | 0.094 | 0.064 | 0.100 | 0.08 | 25[4] |
| | 211 | A | G | CAPS | | H5 | 0.026 | 0.183 | 0.111 | 0.00 | 25[0] |
(a) nucleotide position
(b) reference sequence
(c) variant sequence
(d) accession code
(e) algorithm predicted allele frequency (VAF)
(f) observed allele frequency
(g) numbers in brackets indicate the number of heterozygous individuals
(h) not applicable
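The observed allele frequency in the validation table can be related to the genotype counts of the individual plants. The sketch below assumes diploid plants, so each heterozygote carries one variant allele and each variant homozygote carries two. The table reports only the number of plants and the number of heterozygotes, so the homozygote count passed in below is an assumption, and the function name is illustrative.

```python
def observed_vaf(n_plants, n_het, n_hom_alt):
    """Variant allele frequency among diploid plants from genotype counts."""
    if n_plants <= 0:
        raise ValueError("n_plants must be positive")
    if n_het < 0 or n_hom_alt < 0 or n_het + n_hom_alt > n_plants:
        raise ValueError("genotype counts are inconsistent with n_plants")
    # Each plant contributes two alleles; heterozygotes carry one variant copy.
    return (2 * n_hom_alt + n_het) / (2 * n_plants)

# 25 plants with 5 heterozygotes, ASSUMING no variant homozygotes.
print(round(observed_vaf(25, 5, 0), 2))
```

Under that assumption the result agrees with the rows reporting 25[5] plants and an observed frequency of 0.10.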
Fig. 3 Neighbor Joining tree based on Nei’s genetic distance calculated from VAF values, showing genetic relationships between 95 rye accessions. VAF values reported by GATK for 895 variants detected in common by three algorithms were used. To simplify the output, accessions are referred to by the 96-well plate coordinates, which are also included in the accession list (Additional file 11: Table S5). Numbers in brackets indicate private alleles identified in the respective accession. Colors indicate improvement status: light blue – modern cultivar, dark blue – historic cultivar, dark green – landrace, light green – wild accession.
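As a rough illustration of how a distance can be derived from per-accession VAF values, the sketch below computes Nei's standard genetic distance, D = -ln(I), treating every variant as biallelic with frequencies p and 1 - p. Which variant of Nei's distance the authors used, and the software they used, is not stated here, so this formulation and all names and values below are assumptions.

```python
import numpy as np

def nei_standard_distance(vaf_x, vaf_y):
    """Nei's standard distance between two populations from biallelic VAFs."""
    x = np.asarray(vaf_x, dtype=float)
    y = np.asarray(vaf_y, dtype=float)
    # Average gene identities within each population and between them.
    jx = np.mean(x**2 + (1 - x) ** 2)
    jy = np.mean(y**2 + (1 - y) ** 2)
    jxy = np.mean(x * y + (1 - x) * (1 - y))
    identity = jxy / np.sqrt(jx * jy)
    # identity == 0 (no shared alleles at any site) yields an infinite distance.
    return float(-np.log(identity))

def distance_matrix(vaf_table):
    """Pairwise distances; rows are accessions, columns are variants."""
    vaf = np.asarray(vaf_table, dtype=float)
    n = vaf.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = nei_standard_distance(vaf[i], vaf[j])
    return dist

# Made-up VAFs for three accessions at three variants (illustration only).
example_table = [[0.1, 0.5, 0.9], [0.2, 0.4, 0.7], [0.8, 0.1, 0.3]]
print(distance_matrix(example_table))
```

The resulting symmetric matrix is the kind of input a neighbor-joining implementation expects.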
Fig. 4 PCoA plot based on Nei’s genetic distance matrix derived from VAF values, showing genetic relationships between 95 rye accessions. VAF values reported by GATK for 895 variants detected in common by three algorithms were used. Colors indicate clusters in NJ tree (Fig. 3). To simplify the output, accession codes are shown only for the most distant accessions.
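Principal coordinates analysis turns a distance matrix into coordinates that can be plotted. A minimal sketch of classical PCoA (double centering followed by eigendecomposition) is given below; the function name and the toy distance matrix are assumptions, and the authors' actual software is not stated here.

```python
import numpy as np

def pcoa(distances, n_axes=2):
    """Classical PCoA: return (coordinates, eigenvalues) for the top axes."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances (Gower's centering).
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d**2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    # eigh returns ascending eigenvalues; keep the largest ones first.
    order = np.argsort(eigvals)[::-1][:n_axes]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    coords = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return coords, eigvals

# Toy distances for three points lying on a line at positions 0, 1 and 3.
toy_distances = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
coords, eigvals = pcoa(toy_distances)
print(coords)
```

For distances that are exactly Euclidean, as in the toy example, the first axes reproduce the original pairwise distances; for genetic distances the leading axes give only an approximation.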
Fig. 5 Combined strip/violin plots of GATK VAF values for variants predicted in common by GATK HaplotypeCaller, SNVer and CRISP. Horizontal bars indicate median values. Representative accessions are shown. Combined strip/violin plots for all 95 accessions included in the study are presented in Additional file 10: Figure S6.