Yogendra Khedikar, Wayne E Clarke, Lifeng Chen, Erin E Higgins, Sateesh Kagale, Chu Shin Koh, Rick Bennett, Isobel A P Parkin.
Abstract
Ethiopian mustard (Brassica carinata A. Braun) is an emerging sustainable source of vegetable oil, in particular for the biofuel industry. The present study exploited genome assemblies of the Brassica diploids, Brassica nigra and Brassica oleracea, to discover over 10,000 genome-wide SNPs using genotyping by sequencing of 620 B. carinata lines. The analyses revealed a SNP frequency of one every 91.7 kb, a heterozygosity level of 0.30, and a nucleotide diversity of 1.31 × 10⁻⁵; the first five principal components captured only 13% of the molecular variation. Together these results indicate low levels of genetic diversity in the B. carinata collection. Genome bias was observed, with greater SNP density found on the B subgenome. The 620 lines clustered into two distinct sub-populations (SP1 and SP2), with the majority of accessions (88%) falling in SP1 alongside those from Ethiopia, the presumed centre of origin. SP2 was distinguished by a collection of breeding lines, implicating targeted selection in creating population structure. Two selective sweep regions on B3 and B8 were detected, which harbour genes involved in fatty acid and aliphatic glucosinolate biosynthesis, respectively. The assessment of genetic diversity, population structure, and linkage disequilibrium (LD) in the global B. carinata collection provides critical information to assist future crop improvement.
Year: 2020 PMID: 32724070 PMCID: PMC7387349 DOI: 10.1038/s41598-020-69255-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1 (a) Distribution of genome-wide SNPs across the B and C subgenomes in 1 Mb windows. (b) Annotation of SNPs.
Figure 2 Summary of population analysis for the worldwide B. carinata collection. (a) Bar chart of inferred population structure for K = 2 from STRUCTURE. (b) Phylogenetic analysis; track I indicates subpopulations identified by STRUCTURE (SP1 is coloured red, SP2 green and AG blue); track II shows the source of accessions (gene banks); track III indicates the country of origin. (c) Principal component analysis (PCA) of 620 accessions.
Diversity statistics for various genomic contexts calculated over 100 kb non-overlapping windows across the B. carinata genome.
| Genomic context | Number of SNPs | Nucleotide diversity (π)ᵃ | Watterson’s θᵇ | Tajima’s Dᶜ |
|---|---|---|---|---|
| Total (BC genome) | 10,199 | 1.31 × 10⁻⁵ | 6.60 × 10⁻⁶ | 1.30 |
| B subgenome | 7,452 | 1.56 × 10⁻⁵ | 7.83 × 10⁻⁶ | 1.35 |
| C subgenome | 2,747 | 8.78 × 10⁻⁶ | 4.43 × 10⁻⁷ | 1.21 |
| Coding | 4,475 | 9.23 × 10⁻⁶ | 4.68 × 10⁻⁶ | 1.23 |
| Synonymous | 2,988 | 7.72 × 10⁻⁶ | 3.93 × 10⁻⁶ | 1.16 |
| Non-synonymous | 1,487 | 6.46 × 10⁻⁶ | 3.30 × 10⁻⁶ | 1.12 |
| Introns | 2,435 | 8.61 × 10⁻⁶ | 4.40 × 10⁻⁶ | 1.19 |
| Intergenic | 3,183 | 1.10 × 10⁻⁵ | 4.85 × 10⁻⁶ | 1.09 |
| SP1 | 9,575 | 1.36 × 10⁻⁵ | 6.60 × 10⁻⁶ | 1.43 |
| SP2 | 4,604 | 1.71 × 10⁻⁵ | 8.89 × 10⁻⁶ | 1.43 |
ᵃ Nucleotide diversity (π): the average number of pairwise nucleotide differences per site.
ᵇ Watterson’s estimator of nucleotide diversity per site.
ᶜ Tajima’s D neutrality test statistic.
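The three statistics in the table above are related: Tajima’s D compares π with Watterson’s θ and is positive when π exceeds θ, as it does in every row. The following is a minimal sketch of how they can be computed from phased biallelic haplotypes, using the standard formulas from Tajima (1989). It is not the pipeline used in the study; the function name `diversity_stats` and the `n_sites` argument are hypothetical, and the input is assumed to be complete (no missing calls).

```python
from math import sqrt, nan


def diversity_stats(haplotypes, n_sites):
    """Return per-site pi, per-site Watterson's theta, and Tajima's D.

    haplotypes: equal-length strings of '0'/'1', one per sampled chromosome
                (assumed phased, biallelic, no missing data).
    n_sites:    number of sites surveyed, used only for per-site scaling
                (hypothetical input for this illustration).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")

    # Count of the '1' allele at each segregating site.
    counts = [column.count("1") for column in zip(*haplotypes)]
    counts = [k for k in counts if 0 < k < n]
    s = len(counts)

    # pi: average number of pairwise differences, summed over sites.
    pi_total = sum(2 * k * (n - k) / (n * (n - 1)) for k in counts)

    # Watterson's theta: segregating sites scaled by the harmonic number a1.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_total = s / a1

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * s + e2 * s * (s - 1)
    d = (pi_total - theta_total) / sqrt(variance) if variance > 0 else nan

    return {
        "pi": pi_total / n_sites,
        "theta_w": theta_total / n_sites,
        "tajimas_d": d,
    }


if __name__ == "__main__":
    # Toy data: four haplotypes, two segregating sites, ten sites surveyed.
    print(diversity_stats(["00", "01", "10", "11"], n_sites=10))
```

In the toy data both sites carry intermediate-frequency alleles, so π is larger than θ and D comes out positive.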
Linkage disequilibrium pattern and distribution of haplotype blocks in the B. carinata collection.
| Chromosome | Number of SNPs | LD decay at | Mean r² | Number of haplotype blocks | Max size of block (kbp) | Recombination rate (ρ/kb) |
|---|---|---|---|---|---|---|
| BC | 10,199 | 700 | 0.077 | 1,970 | 697.43 | 1.07 |
| B | 7,452 | 475 | 0.076 | 1,431 | 697.43 | 1.28 |
| C | 2,747 | 725 | 0.081 | 539 | 617.98 | 0.88 |
| B1 | 771 | 350 | 0.073 | 146 | 284.77 | 1.47 |
| B2 | 1,261 | 175 | 0.048 | 240 | 96.02 | 1.33 |
| B3 | 1,207 | 5,000 | 0.121 | 227* | 147.53 | 1.09 |
| B4 | 599 | 200 | 0.057 | 118 | 155.05 | 1.73 |
| B5 | 894 | 275 | 0.052 | 186 | 202.54 | 0.98 |
| B6 | 809 | 200 | 0.056 | 165 | 190.35 | 1.22 |
| B7 | 920 | 775 | 0.101 | 162 | 715.05 | 0.97 |
| B8 | 991 | 525 | 0.061 | 188 | 506.46 | 1.50 |
| C1 | 213 | 175 | 0.073 | 47 | 88.97 | 0.77 |
| C2 | 266 | 425 | 0.108 | 51 | 188.85 | 0.73 |
| C3 | 503 | 300 | 0.062 | 101 | 197.44 | 0.50 |
| C4 | 300 | 675 | 0.170 | 56 | 617.98 | 0.53 |
| C5 | 262 | 200 | 0.087 | 46 | 60.39 | 1.18 |
| C6 | 270 | 300 | 0.066 | 57 | 188.77 | 0.65 |
| C7 | 302 | 525 | 0.082 | 57 | 457.46 | 0.70 |
| C8 | 348 | 100 | 0.062 | 66 | 68.43 | 1.61 |
| C9 | 283 | 200 | 0.077 | 58 | 139.89 | 1.25 |
| SP1 | 9,575 | 475 | 0.070 | – | – | – |
| SP2 | 4,604 | > 50,000 | 0.465 | – | – | – |
*Strong LD (FAE1 region) was excluded and LD recalculated for use in haplotype block analysis.
Figure 3 Linkage disequilibrium (LD) decay and genome-wide haplotype blocks. (a) LD decay at whole genome and subgenome level. Scatterplots showing r² plotted against physical distance in kb. (b) Genome-wide distribution of haplotype blocks. Red rectangles represent genomic regions with haplotype blocks. Grey indicates genomic regions without haplotype blocks.
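The r² statistic plotted in Figure 3 measures the squared correlation between alleles at two SNPs, and LD decay is read off as the distance at which mean r² falls below a chosen cut-off. The sketch below illustrates both steps under stated assumptions: haplotypes are phased and coded 0/1, and the `threshold` and `bin_kb` values are hypothetical parameters, since the cut-off used for the table is not given in this record. The function names are illustrative only.

```python
from collections import defaultdict


def r_squared(a, b):
    """Squared allelic correlation between two SNPs.

    a, b: equal-length sequences of 0/1 alleles on the same phased haplotypes.
    Returns NaN when either SNP is monomorphic.
    """
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(x * y for x, y in zip(a, b)) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")


def decay_distance(pairs, threshold, bin_kb):
    """First distance bin (kb) whose mean r2 is below `threshold`.

    pairs: iterable of (distance_kb, r2) for SNP pairs.
    Returns the lower edge of that bin, or None if r2 never drops below it.
    """
    bins = defaultdict(list)
    for distance_kb, r2 in pairs:
        bins[int(distance_kb // bin_kb)].append(r2)
    for index in sorted(bins):
        values = bins[index]
        if sum(values) / len(values) < threshold:
            return index * bin_kb
    return None


if __name__ == "__main__":
    print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))  # perfectly correlated SNPs
    print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))  # independent SNPs
    toy_pairs = [(10, 0.8), (20, 0.6), (120, 0.3), (130, 0.1), (220, 0.05)]
    print(decay_distance(toy_pairs, threshold=0.1, bin_kb=100))
```

Binning by distance before averaging smooths the noisy pairwise scatter; a fitted decay curve is a common alternative that this sketch does not attempt.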
Genomic co-ordinates for regions of interest.
| Chromosome | Coordinates (bp) | Length (Mbp) | Genes* | PC variation (%) (first two PCs) | |
|---|---|---|---|---|---|
| B1 | 36,151,527–37,051,510 | 0.90 | | 49.60 | SP1 and SP2 are not differentiated |
| C2-1 | 13,922,159–16,610,327 | 2.69 | | 83.66 | SP1 and SP2 are not differentiated |
| C2-2 | 27,814,654–29,475,511 | 1.66 | | 88.92 | SP1 and SP2 are not differentiated |
Bold values indicate multiple independent lines of evidence suggesting selection.
*Candidate genes involved in fatty acid, glucosinolate and flowering pathways.
Figure 4 Recent selective sweep regions in B. carinata on chromosomes B3 and B8. Each row from the top, calculated for all SNPs in non-overlapping windows of 100 kb, represents: level of genetic differentiation (pairwise FST); nucleotide diversity estimates; Tajima’s D neutrality statistics; and heterozygosity of SNPs.
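A windowed scan of the kind shown in Figure 4 groups SNPs into non-overlapping 100 kb windows and summarises each window separately. The sketch below does this for genetic differentiation between two groups, using a simple Nei-style estimator, (H_T − H_S) / H_T, computed as a ratio of sums within each window. The estimator actually used in the study is not stated in this record, so this choice, the equal weighting of the two groups, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict


def expected_het(p):
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    return 2 * p * (1 - p)


def windowed_fst(snps, window_bp=100_000):
    """Nei-style differentiation per non-overlapping window.

    snps: iterable of (position_bp, freq_pop1, freq_pop2), where the
          frequencies refer to the same allele in each group.
    Returns {window_start_bp: fst}; NaN where a window has no variation.
    """
    sums = defaultdict(lambda: [0.0, 0.0])  # window -> [sum(H_T - H_S), sum(H_T)]
    for position, p1, p2 in snps:
        h_total = expected_het((p1 + p2) / 2)  # pooled, equal group weights
        h_within = (expected_het(p1) + expected_het(p2)) / 2
        window = position // window_bp
        sums[window][0] += h_total - h_within
        sums[window][1] += h_total
    return {
        window * window_bp: (num / den if den > 0 else float("nan"))
        for window, (num, den) in sorted(sums.items())
    }


if __name__ == "__main__":
    toy = [
        (50_000, 1.0, 0.0),   # fixed difference between the two groups
        (150_000, 0.5, 0.5),  # identical frequencies in both groups
    ]
    print(windowed_fst(toy))
```

Windows in which differentiation is high while diversity and Tajima’s D are low are the usual candidates for a sweep; combining the tracks is left to the reader.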