| Literature DB >> 31740845 |
Jonathan J Stocks1,2, Carey L Metheringham1,2, William J Plumb1,2,3, Steve J Lee4, Laura J Kelly1,2, Richard A Nichols1, Richard J A Buggs5,6.
Abstract
Populations of European ash trees (Fraxinus excelsior) are being devastated by the invasive alien fungus Hymenoscyphus fraxineus, which causes ash dieback. We sequenced whole genomic DNA from 1,250 ash trees in 31 DNA pools, each pool containing trees with the same ash dieback damage status in a screening trial and from the same seed-source zone. A genome-wide association study identified 3,149 single nucleotide polymorphisms (SNPs) associated with low versus high ash dieback damage. Sixty-one of the 192 most significant SNPs were in, or close to, genes with putative homologues already known to be involved in pathogen responses in other plant species. We also used the pooled sequence data to train a genomic prediction model, cross-validated using individual whole genome sequence data generated for 75 healthy and 75 damaged trees from a single seed source. The model's genomic estimated breeding values (GEBVs) allocated these 150 trees to their observed health statuses with 67% accuracy using 10,000 SNPs. Using the top 20% of GEBVs from just 200 SNPs, we could predict observed tree health with over 90% accuracy. We infer that ash dieback resistance in F. excelsior is a polygenic trait that should respond well to both natural selection and breeding, which could be accelerated using genomic prediction.Entities:
Mesh:
Year: 2019 PMID: 31740845 PMCID: PMC6887550 DOI: 10.1038/s41559-019-1036-6
Source DB: PubMed Journal: Nat Ecol Evol ISSN: 2397-334X Impact factor: 15.460
Extended Data Fig. 1Schematic overview of the study design.
Showing sampling and pooling strategies and dependencies of analyses for genome-wide association study and genomic prediction.
Figure 1Summary of variation among the sequenced DNA pools using Correspondence Analysis (CA).
Major allele frequencies were used for all 31 seed source populations (including replicate). Numbers after seed source code correspond to health status (1 - healthy or 2 - infected by ADB). The vertical axis represents Principal Coordinate 1, which accounts for 10% of the variation and the horizontal axis represents Principal Coordinate 2, which accounts for 9% of the variation.
Extended Data Fig. 2Circle plot of major allele frequency correlation values between all 31 pools in the Pool-seq dataset.
Numbers after seed source code correspond to health status (1 - healthy or 2 - damaged by ADB). Pool NSZ204:1 (with low ADB damage) was technically replicated (NSZ204:1R) using the same set of trees. Both pools from NSZ106 and NSZ107 were biologically replicated for both high and low damage pools, using different sets of trees. High correlation for both technical (NSZ204:1R) and biological replicates (NSZ 106 & 107) can be seen.
Extended Data Fig. 3Detection of contamination in the F. excelsior reference genome (BATG0.5).
Blobtools plot for the showing taxonomic affiliation at the phylum rank level, distributed according to GC content and base coverage. Contigs that were not classified as streptophyta corresponded to 0.5% of the genome assembly and 0.24% of all mapped reads.
Extended Data Fig. 4Pool-seq GWAS p-value density histogram with line plots of the q-values and local False Discovery Rate (FDR) values versus p-values.
The π0 estimate is also displayed.
Figure 2Manhattan plot for pool-seq genome-wide association study of tree health under natural ash dieback inoculation.
For each SNP a -log10(p) value is shown. The green line represents the p = 1 x 10-13 threshold. Loci are ordered by position in the F. excelsior reference genome (BATG0.5).
Ash genes likely to be affected by top GWAS candidate SNPs.
Based on the top 192 hits by p-value (with -log10(p) > 13): (1) Genes that contain one or more significant SNP loci altering protein sequence; (2) Genes containing SNPs that are transcribed but not translated (synonymous changes, and changes in UTRs and introns); (3) Genes that are within 5Kb of significant SNP loci and the closest gene to those loci; Asterisks (*) mark genes that have evidence for involvement with disease resistance in other species. The “Gene” column gives the final six digits for the full gene names for the annotation of the ash genome[2], which are in the form FRAEX38873_v2_000######. Details of amino acid changes in missense variants can be found in Table S5.
| Contig location | Gene | Predicted function | Variant functions (positions) |
|---|---|---|---|
| 003260* | BED finger-NBS-LRR resistance protein (for
model see | 1x missense variant (3018) | |
| 003270* | Protein CPR-5-like (LOC111390874), transcript variant X1, mRNA | 5x 3' UTR variant (6993, 7018, 7026,
7098, 7133) | |
| 116110 | 60S ribosomal protein L4-1 (LOC111391733),
mRNA (for model see | 4x missense variant (48646, 48672, 48665,
48775) | |
| 164520 | F-box/kelch-repeat protein SKIP6
(LOC111408673), mRNA (for model see | 1x 5' UTR variant (26379) | |
| 180950 | Protein DAMAGED DNA-BINDING (for model see
| 1x missense variant (25205) | |
| 305440* | Uncharacterized LOC111377332 (LOC111377332), transcript variant X1, mRNA | 1x missense variant (206888) | |
| 346660 | Protein HEAT INTOLERANT 4-like (LOC111409690),
mRNA( | 1x missense variant (12331) | |
| 116430 | Uncharacterized LOC111374226 (LOC111374226), transcript variant X2, mRNA | 1x synonymous variant (13617) | |
| 145630 | VIN3-like protein 1 (LOC111390514), transcript variant X2, mRNA | 1x synonymous variant (45617) | |
| 234590* | WPP domain-interacting protein 1-like (LOC111407140), mRNA | 1x synonymous variant (42379) | |
| 013250* | MACPF domain-containing protein CAD1-like (LOC111379406), mRNA | 1x 3' UTR variant (98777) | |
| 047060 | Short-chain dehydrogenase TIC 32, chloroplastic-like (LOC111372928), transcript variant X2, mRNA | 1x intron variant (118417) | |
| 051390 | Probable boron transporter | 1x intron variant (13054) | |
| 057960 | Beta-taxilin (LOC111407559) | 1x intron variant (123113) | |
| 074310 | Squamosa promoter-binding-like protein 8 (LOC111383449), mRNA | 1x 3' UTR variant (174158) | |
| 094440 | Regulatory-associated protein of TOR 1 (LOC111407995), mRNA | 1x 3' UTR variant (29838) | |
| 105920 | Uncharacterized LOC111409367 (LOC111409367), mRNA | 1x 5' UTR variant (60622) | |
| 114040 | ATP synthase subunit O, mitochondrial-like (LOC111411675), mRNA | 1x intron variant (371764) | |
| 154470 | Uncharacterized LOC111404100 | 2x intron variant (83904, 83931) | |
| 168770 | Protein LATE FLOWERING-like (LOC111406993), mRNA | 1x 5' UTR variant (6633) | |
| 207550 | receptor-like cytosolic | 1x intron variant (50024) | |
| 211580* | Squalene monooxygenase-like (LOC111410179), mRNA | 1x intron variant (957) | |
| 238810 | Uncharacterized LOC111381639 | 1x 3' UTR variant (40482) | |
| 266510 | Zinc finger CCCH domain-containing protein 11-like (LOC111366362), transcript variant X3, mRNA | 1x intron variant (3930) | |
| 305460* | Protein PHR1-LIKE 3-like (LOC111377335), mRNA | 14x intron variant (235226, 235272, 235318, 235327, 235343, 235356, 235367, 235506, 235514, 235705, 235801, 235831, 235852, 235915) | |
| 308800 | Probable DNA helicase MCM8 (LOC111365493), transcript variant X2, mRNA | 2x intron variant (70778, 71059) | |
| 319390 | Uncharacterized LOC111408674 (LOC111408674), mRNA | 1x intron variant (7471) | |
| 342270 | Protein LIKE COV 2-like (LOC111397136), mRNA | 2x intron variant (38068, 42193) | |
| 342280 | Uncharacterized LOC111408663 (LOC111408663), transcript variant X5, misc_RNA | 1x 5' UTR variant (76154) | |
| 346650 | Pentatricopeptide repeat-containing protein At4g39620, chloroplastic-like (LOC111408678), transcript variant X2, mRNA | 1x 3' UTR variant (7650) | |
| 372350 | Uncharacterized LOC111393674 (LOC111393674), mRNA | 3x intron variant (383677, 383731, 383732) | |
| 378970 | Uncharacterized LOC111377872 (LOC111377872), transcript variant X8, mRNA | 1x intron variant (20201) | |
| 025560* | Probable xyloglucan endotransglucosylase/hydrolase protein 28 (LOC111399252), mRNA(3) | 1x upstream gene variant (130319) | |
| 059350 | Low affinity sulfate | 1x upstream gene variant (22510) | |
| 059880 | 60S Ribosomal protein L30-like (LOC111409078), transcript variant X1, mRNA | 1x upstream gene variant (3808) | |
| 065110 | E3 ubiquitin-protein ligase RNF170-like (LOC111409836), transcript variant X3, mRNA | 2x upstream gene variant (16009, 16035) | |
| 086130 | Oleoyl-acyl carrier protein thioesterase 1, chloroplastic-like (LOC111385815), mRNA(1) | 2x downstream gene variant (130264, 130505) | |
| 124500 | Ent-kaurene oxidase, chloroplastic-like (LOC111394477), mRNA | 1x upstream gene variant (5627) | |
| 164530 | Uncharacterized LOC111408676 (LOC111408676), transcript variant X3, mRNA | 1x upstream gene variant (70203) | |
| 190500* | Ethylene-responsive transcription factor ERF098-like (LOC111379140), mRNA(1) | 2x downstream gene variant (152443, 152551) | |
| 214510 | Basic Helix loop helix protein A (LOC111388546) mRNA | 1x upstream gene variant (32110) | |
| 239330 | Vacuolar protein sorting-associated protein 20 homolog 2-like (LOC111393567), mRNA | 1x upstream gene variant (44192) | |
| 241210 | Kinesin-like protein KIN-7K, chloroplastic (LOC111375100), mRNA | 1x upstream gene variant (108478) | |
| 255180 | Casein kinase 1-like protein HD16 (LOC111366886), mRNA | 1x upstream gene variant (152485) | |
| 258470* | F-box/FBD/LRR-repeat protein At1g13570-like (LOC111367195), transcript variant X2, mRNA | 1x upstream gene variant (39661) | |
| 262070 | Putative zinc transporter At3g08650 (LOC111388858), mRNA | 1x downstream gene variant (94609) | |
| 282910 | Nitrate regulatory gene2 protein-like (LOC111409481), mRNA | 1x upstream gene variant (101905) | |
| 282920 | Uncharacterized LOC111409076 (LOC111409076), mRNA | 2x downstream gene variant (119168,
119172) | |
| 282930 | Uncharacterized LOC111409077 (LOC111409077), transcript variant X3, mRNA | 1x upstream gene variant (138012) | |
| 296810* | Ankyrin repeat-containing protein NPR4-like (LOC111379708), mRNA | 1x downstream gene variant (222554) | |
| 310310 | Calmodulin-binding protein 60 A-like (LOC111368134), transcript variant X3, mRNA | 2x upstream gene variant (4636, 4779) | |
| 340820* | Dehydration-responsive element-binding protein 2C-like (LOC111397561), transcript variant X1, mRNA | 5x upstream gene variant (17466, 17478, 17533, 18485, 18547) | |
| 342250 | Ethylene-responsive transcription factor ERF113-like (LOC111408666), mRNA | 1x upstream gene variant (15367) | |
| 342260* | Protein S-acyltransferase 8-like (LOC111408665), mRNA | 2x upstream gene variant (28606, 28760) | |
| 364260 | Pentatricopeptide repeat-containing protein At4g39620, chloroplastic-like (LOC111408678), transcript variant X2, mRNA | 1x upstream gene variant (9024) | |
Blast performed using cDNA sequences
Figure 3Manhattan plots for contigs containing genes with missense variants associated with tree health under natural ash dieback inoculations.
Points representing SNPs within genes are colored and those genes containing missense SNPs are named above the plot in the same colour as the points representing SNPs within them. The red line represents the p = 1 x 10-13 threshold.
Extended Data Fig. 5Predicted protein structures for genes containing amino acid changes associated with tree health status under ADB pressure.
The protein structures to the left were more common in damaged trees, and those to the right were more common in healthy trees. Variant amino acids are coloured in magenta and indicated with a black arrowhead. (a) Gene FRAEX38873_v2_000003260, a BED finger-NBS-LRR resistance protein, where position 157 is a leucine (left) versus tryptophan (right) variant. Two ATP molecules are shown in orange to indicate the location of nucleotide binding sites. (b) Gene FRAEX38873_v2_000164520, a F-box/kelch-repeat, where position 13 is a glutamine (left) versus arginine (right) variant. (c) FRAEX38873_v2_000180950, a Protein DAMAGED DNA-BINDING, where position 99 is a proline (left) versus leucine (right) variant. DNA molecules are shown in orange docked at the proteins’ DNA binding sites. (d) Gene FRAEX38873_v2_000116110, a 60S ribosomal protein L4-1, where position 251 is an arginine (left) versus glycine (right) variant, position 285 is a methionine (left) versus arginine (right) variant, position 287 is an asparagine (left) versus lysine (right) variant and position 297 is a threonine (left) versus alanine (right) variant.
Figure 4Performance of genomic prediction models for health under ash dieback pressure.
For 150 individual ash trees, with models trained on pooled sequencing of 1250 trees, using varying numbers of SNPs in training and test sets. Solid lines show results for SNPs selected using the pool-seq GWAS; dashed lines show mean results for repeated runs (n=10) of randomly selected SNPs, with bars indicating standard error. Left column: correlation of genomic estimated breeding value (GEBV) with observed health status. Right column: accuracy of health status assignment from GEBV.
Extended Data Fig. 6Genomic prediction results using the 150 individually genotyped samples as both training and testing set, showing little difference in accuracy between GWAS SNPs and random SNPs.
(A) GWAS candidate SNPs with all data filters applied (mapping quality, indel and repeat removal); (B) GWAS candidate SNPs only filtering by mapping quality and indel removal; (C) random selection of SNPs using all data filters (mean and standard error shown for N=10 runs, each of 500 iterations); (D) GP allocation accuracy calculated using data with all filters applied. The scale on the left hand vertical axis is for correlation, and the scale on the right hand vertical axis is for accuracy. 100 to 5 million SNPs used to train and test the rrBLUP model.
Extended Data Fig. 7Genomic prediction using Pool-seq data for training and 150 NSZ 204 individuals for testing.
Dashed lines show results excluding Pool-seq data from NSZ 204 (the test seed source) from the training dataset, whereas solid lines show results with NSZ 204 included. The left column shows correlation of observed phenotype and GEBV and the right column shows accuracy of phenotypic assignment from GEBV.
Figure 5Performance of genomic prediction models for selection.
Genomic prediction accuracy of assignment of health status for the (left) top 20% and (right) top 30% of test population trees by GEBV, using 1000 to 50,000 SNPs identified by GWAS in the training set and use of ten to 250 SNPs in the testing set.