Elisabeth Veeckman, Sabine Van Glabeke, Annelies Haegeman, Hilde Muylle, Frederik R D van Parijs, Stephen L Byrne, Torben Asp, Bruno Studer, Antje Rohde, Isabel Roldán-Ruiz, Klaas Vandepoele, Tom Ruttink.
Abstract
Revealing DNA sequence variation within the Lolium perenne genepool is important for genetic analysis and the development of breeding applications. We reviewed current literature on plant development to select candidate genes in pathways that control agronomic traits, and identified 503 orthologues in L. perenne. Using targeted resequencing, we constructed a comprehensive catalogue of genomic variation for a L. perenne germplasm collection of 736 genotypes derived from current cultivars, breeding material and wild accessions. To overcome the challenges of variant calling in heterogeneous outbreeding species, we used two complementary strategies to explore sequence diversity. First, four variant calling pipelines were integrated with the VariantMetaCaller to reach maximal sensitivity, and additional multiplex amplicon sequencing was used to empirically estimate an appropriate precision threshold. Second, a de novo assembly strategy was used to reconstruct divergent alleles for each gene. The advantage of this approach was illustrated by the discovery of 28 novel alleles of LpSDUF247, a polymorphic gene co-segregating with the S-locus of the grass self-incompatibility system. Our approach is applicable to other genetically diverse outbreeding species. The resulting collection of functionally annotated variants can be mined for variants causing phenotypic variation, either through genetic association studies or by selecting carriers of rare defective alleles for physiological analyses.
Keywords: allele reconstruction; genomic diversity; natural variation; targeted resequencing; variant calling
Year: 2019 PMID: 30325414 PMCID: PMC6379033 DOI: 10.1093/dnares/dsy033
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
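The abstract states that multiplex amplicon sequencing was used to empirically estimate a precision threshold for the VariantMetaCaller output, without giving the procedure. One generic way to do this is sketched below in Python, assuming that each variant carries an EP score, that amplicon-confirmed variants are available as a set, and that only variants in regions assayed by both methods are compared. The function names, candidate thresholds and target precision are hypothetical, and this is not presented as the authors' exact method.

```python
"""Sketch: pick an EP threshold using amplicon-validated variants.

Hypothetical helper names; not the authors' published procedure.
Assumes variants are compared only within regions covered by both
the probe capture data and the amplicon (validation) data.
"""


def empirical_precision(ep_by_variant, validated, threshold):
    """Share of variants with EP > threshold that amplicons confirm.

    Returns None when no variant passes the threshold.
    """
    kept = [v for v, ep in ep_by_variant.items() if ep > threshold]
    if not kept:
        return None
    return sum(v in validated for v in kept) / len(kept)


def choose_threshold(ep_by_variant, validated, target, candidates):
    """Lowest candidate threshold whose empirical precision reaches target."""
    for t in sorted(candidates):
        precision = empirical_precision(ep_by_variant, validated, t)
        if precision is not None and precision >= target:
            return t
    return None


# Toy data (invented variant IDs and EP scores).
ep_scores = {"v1": 0.95, "v2": 0.90, "v3": 0.85, "v4": 0.50, "v5": 0.30}
confirmed = {"v1", "v2", "v3", "v4"}
print(empirical_precision(ep_scores, confirmed, 0.0))  # → 0.8
print(choose_threshold(ep_scores, confirmed, target=0.9,
                       candidates=[0.0, 0.4, 0.8]))     # → 0.4
```

Returning the lowest threshold that reaches the target keeps as many variants as possible, in line with the abstract's aim of maximal sensitivity; the figure captions report EP > 80% as the filter that was applied.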
Assignment of 503 candidate genes to pathways and distribution of high-impact mutations per pathway
| Pathway | Gene families | Candidate genes (n) | Stop gain, n (%) | Splice site, n (%) | Frameshift, n (%) |
|---|---|---|---|---|---|
| Development | BCH1, BRIZ, CBP80, DRM1, HB13, HYL1, ING2, RSM1, SAMDC4 | 14 | 2 (14%) | 4 (29%) | – |
| Cell wall | 4CL, ALDH, C3H, C4H, CAD, CAD2, CCoAOMT, CCR, CES, COMT, F5H, HCT, HPRGP, IRX, LAC, OFP, PAL, POX, SND, XylS, XylT | 121 | 41 (34%) | 16 (13%) | 5 (4%) |
| Cell wall TF | ERF, WRKY | 6 | 2 (33%) | – | 1 (17%) |
| Cell wall TF MYB | MYB | 21 | 3 (14%) | – | 1 (5%) |
| Cell wall TF NAC | NAC | 11 | 2 (18%) | 4 (36%) | 1 (9%) |
| Chromatin remodelling | MET1, SWI | 4 | 3 (75%) | 2 (50%) | – |
| Lateral organ initiation | ANT, SLOMO, TOP1A | 6 | 1 (17%) | – | – |
| Lateral organ patterning morphogenesis | AS, CLF, DOT5, GRF, KAN, NOV, SE, TRN1, YABBY, ZPR1, ZPR3 | 30 | 7 (23%) | 3 (10%) | 2 (7%) |
| Lateral organ identity | AN3, BOP, HDZIPIII | 10 | 4 (40%) | 1 (10%) | – |
| Light signalling | bHLHABAI, CO1, COP9, CRY, DET1, HY5, LHY, PCI, PFT1, PHYB, PIF, SPA | 29 | 4 (14%) | 7 (24%) | – |
| Shoot apical meristem | BARD1, BLH, CLPS3, FTA, KNAT, OBE1, ULT1, USP1, VEF2, WOX14, WUS | 25 | 8 (32%) | 5 (20%) | 3 (12%) |
| Self-incompatibility | DUF247, GK | 4 | 2 (50%) | 1 (25%) | 1 (25%) |
| Transition to flowering | CCA, FCA, FIE, FKF1, FLD, FPA, FT, FVE, FWA, FY, GI, LHP1, MBD9, PHP, RAV, SDG8, SPL3, VIL3, VRN1, VRN1-like | 45 | 19 (42%) | 12 (27%) | 1 (2%) |
| Flower development | ESD4, HAC3, LFY3, LUG, MADS, RGA, SEU, SUF4, SUP | 31 | 2 (6%) | 4 (13%) | – |
| Transcription factor | BIM2, TCP | 8 | 2 (25%) | – | – |
| ABA biosynthesis | NCED1, PDS1, PDS3 | 4 | 1 (25%) | 1 (25%) | – |
| ABA signalling | ABI1, ABI3, ABI5, ABI8, AIP3, DRIP, GBF, GPA, GTG2, HD2C, PSY, SAD1, SIR3, WIG, ZEP | 29 | 10 (34%) | 3 (10%) | – |
| Auxin biosynthesis | TAA1, TAR2, YUC | 6 | 3 (50%) | – | 1 (17%) |
| Auxin signalling | ADA2B, AMP1, ARF, AUXIAA, AXR, AXR1, AXR4, AXR6, CAND1, GH3, TIR1 | 20 | 8 (40%) | 3 (15%) | – |
| Auxin transport | AUX1, ENP, PGP4, PID2, PIN1, PIN1like, SPS | 12 | 1 (8%) | – | – |
| Brassinosteroid biosynthesis | DWF1, DWF3, DWF5, DWF7, SQS | 8 | 2 (25%) | 1 (13%) | – |
| Brassinosteroid signalling | BES1 | 2 | – | – | – |
| Cytokinin signalling | ARR, CRE, GCR1, RR | 11 | 2 (18%) | 1 (9%) | – |
| Ethylene biosynthesis | ACS | 2 | 1 (50%) | – | – |
| Ethylene signalling | EBF1, EBF2, EIL3, EIN2, ETO1, ETR1 | 13 | 7 (54%) | 1 (8%) | 1 (8%) |
| Gibberellin biosynthesis | GAOX | 11 | 4 (36%) | – | 2 (18%) |
| Gibberellin signalling | GID1A, SHI, SPY | 5 | – | 1 (20%) | – |
| Strigolactone biosynthesis | D14, D27, MAX1, MAX3, MAX4 | 11 | 3 (27%) | 2 (18%) | – |
| Strigolactone signalling | MAX2, TB1 | 4 | – | – | – |
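The percentages in the table above are consistent with each count being divided by the number of candidate genes in that pathway and rounded half up (for example, 41/121 ≈ 34% for stop gains in the cell wall pathway), and the candidate-gene column sums to 503. A small Python check of that reading, using a few rows copied from the table; `None` stands for a dash, and the helper names are hypothetical.

```python
"""Check the table arithmetic for a few pathways (illustrative only)."""

# Candidate-gene counts for all 29 pathways, in table order.
CANDIDATE_GENES = [14, 121, 6, 21, 11, 4, 6, 30, 10, 29, 25, 4, 45, 31, 8,
                   4, 29, 6, 20, 12, 8, 2, 11, 2, 13, 11, 5, 11, 4]

# pathway: (candidate genes, stop gain, splice site, frameshift); None = dash
ROWS = {
    "Development": (14, 2, 4, None),
    "Cell wall": (121, 41, 16, 5),
    "Brassinosteroid biosynthesis": (8, 2, 1, None),
    "Transition to flowering": (45, 19, 12, 1),
}


def pct(count, total):
    """Integer percentage rounded half up; None passes through unchanged."""
    if count is None:
        return None
    # (200*c + n) // (2*n) == floor(100*c/n + 0.5), using exact integer math.
    return (200 * count + total) // (2 * total)


def percentages(rows):
    """Map each pathway to its three percentages."""
    return {name: tuple(pct(c, n) for c in counts)
            for name, (n, *counts) in rows.items()}


print(sum(CANDIDATE_GENES))  # → 503
for name, values in percentages(ROWS).items():
    print(name, values)
```

Rounding half up matters for the brassinosteroid biosynthesis row: 1/8 is exactly 12.5%, which the table reports as 13%, whereas Python's built-in `round(12.5)` returns 12.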
Figure 1. Target region coverage per genotype. For each of the 503 candidate genes, the target region was defined as the gene model plus 1,000 bp of upstream promoter sequence. The mean fraction of the target region covered per genotype is shown as a function of the number of uniquely mapped reads (BWA-MEM, after duplicate removal) at different read depth (RD) thresholds: RD ≥ 1 (blue), RD ≥ 6 (green) and RD ≥ 10 (orange).
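The coverage statistic in Figure 1 can be expressed as follows: for one genotype, compute the fraction of each target region (gene model plus 1,000 bp upstream) that reaches a given read depth, then average over target regions. The Python sketch below assumes per-position depths are already available as lists and uses 0-based, half-open coordinates; this data layout and the function names are assumptions, not taken from the paper.

```python
"""Sketch of the Figure 1 coverage metric (assumed data layout)."""
from statistics import mean

PROMOTER_BP = 1000  # upstream flank added to each gene model (from the caption)


def target_region(start, end, strand):
    """Gene model [start, end) extended by 1,000 bp upstream.

    Assumes 0-based, half-open coordinates; 'upstream' depends on the strand.
    """
    if strand == "+":
        return max(0, start - PROMOTER_BP), end
    return start, end + PROMOTER_BP


def covered_fraction(depths, min_rd):
    """Fraction of positions in one target region with read depth >= min_rd."""
    return sum(d >= min_rd for d in depths) / len(depths)


def mean_covered_fraction(per_gene_depths, thresholds=(1, 6, 10)):
    """For one genotype: mean covered fraction over all target regions,
    reported per RD threshold."""
    return {t: mean(covered_fraction(d, t) for d in per_gene_depths)
            for t in thresholds}


# Toy genotype with two (very short) target regions.
genotype = [[0, 1, 6, 10], [10, 10, 10, 10]]
print(mean_covered_fraction(genotype))  # → {1: 0.875, 6: 0.75, 10: 0.625}
```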
Figure 2. Overlap of variant sets generated using BWA-MEM and GSNAP mappings as input for four variant calling (VC) pipelines. SNPs and indels were called for 503 candidate genes in 736 genotypes from BWA-MEM and GSNAP mappings using four VC pipelines. The intersection of the variant sets was calculated to determine common variants (dark grey) and variants identified uniquely from BWA-MEM mappings (light grey) or GSNAP mappings (black). The Jaccard index indicates the similarity between the two sets.
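The Jaccard index in Figure 2 is the number of variants shared by both call sets divided by the number of distinct variants in either set. A minimal Python sketch, assuming variants are identified by contig, position, reference and alternative allele; the contig names and variants below are made up.

```python
"""Compare two variant call sets as in Figure 2 (toy data)."""


def compare_call_sets(bwa_calls, gsnap_calls):
    """Count shared and mapper-specific variants and their Jaccard index.

    Variants are keyed as (contig, position, ref, alt) tuples.
    """
    bwa, gsnap = set(bwa_calls), set(gsnap_calls)
    union = bwa | gsnap
    common = bwa & gsnap
    return {
        "common": len(common),
        "bwa_only": len(bwa - gsnap),
        "gsnap_only": len(gsnap - bwa),
        # Jaccard index = |A & B| / |A | B|; defined as 1.0 for two empty sets.
        "jaccard": len(common) / len(union) if union else 1.0,
    }


bwa_mem = {("ctg1", 100, "A", "G"), ("ctg1", 200, "C", "T"), ("ctg2", 50, "G", "GA")}
gsnap = {("ctg1", 100, "A", "G"), ("ctg2", 50, "G", "GA"), ("ctg3", 10, "T", "C")}
print(compare_call_sets(bwa_mem, gsnap))
# → {'common': 2, 'bwa_only': 1, 'gsnap_only': 1, 'jaccard': 0.5}
```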
Figure 3. Size and concordance of bi-allelic SNP and indel sets of four VC pipelines, before and after precision-based filtering. SNPs and indels were identified for 503 candidate genes in 736 genotypes using four VC pipelines, and concordance was calculated for bi-allelic SNPs (a) and indels (b). In each UpSet plot, the lower left panel shows the total number of variants per VC pipeline and the lower right panel shows the overlap in call sets between the four VC pipelines. The bar graph shows the size of each concordance group before (black) and after (light grey) integration by the VariantMetaCaller (VMC) and precision-based filtering (estimated precision, EP > 80%).
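The concordance groups in the UpSet plots of Figure 3 correspond to the exact combination of pipelines that reported each variant. The Python sketch below counts those groups before and after keeping only variants with EP > 80%. The pipeline labels P1 to P4, the variant IDs and the EP values are placeholders, because the caption does not name the four pipelines.

```python
"""UpSet-style concordance groups and EP-based filtering (toy data)."""
from collections import Counter


def concordance_groups(call_sets):
    """Count variants per exact combination of pipelines that called them."""
    membership = {}
    for pipeline, variants in call_sets.items():
        for variant in variants:
            membership.setdefault(variant, set()).add(pipeline)
    return Counter(frozenset(p) for p in membership.values())


def precision_filter(ep_by_variant, min_ep=0.80):
    """Keep variants whose estimated precision is strictly above min_ep."""
    return {v for v, ep in ep_by_variant.items() if ep > min_ep}


calls = {"P1": {"v1", "v2", "v3"}, "P2": {"v1", "v2"},
         "P3": {"v1"}, "P4": {"v1", "v4"}}
ep = {"v1": 0.99, "v2": 0.85, "v3": 0.40, "v4": 0.80}

before = concordance_groups(calls)
kept = precision_filter(ep)
after = concordance_groups({p: vs & kept for p, vs in calls.items()})
for group, n in sorted(after.items(), key=lambda kv: -len(kv[0])):
    print(sorted(group), n)
```

A variant with EP exactly 0.80 (v4 here) is removed, matching the strict inequality in the caption.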
Figure 4. Effect of hard filtering and precision-based filtering on the saturation of genotype calls across the 736 genotypes. SNPs and indels were determined for 503 candidate genes in 736 genotypes using four VC pipelines and integrated using the VMC. The genotype call rate was calculated as the number of genotype calls present for each variant divided by the total number of genotypes, and plotted cumulatively to estimate genotype call saturation. This was done for bi-allelic variant sets (a) before and (b) after hard filtering (RD > 6, GQ > 30) of the variant sets returned by the four VC pipelines, and (c) before and (d) after precision-based filtering (EP > 80%) of the VMC output.
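The genotype call rate in Figure 4 is defined per variant as the number of non-missing genotype calls divided by the number of genotypes (736 in this collection). The Python sketch below applies the hard filter from the caption (RD > 6, GQ > 30) by masking failing calls as missing, then builds one possible cumulative curve. Reading "plotted cumulatively" as the fraction of variants whose call rate is at least each threshold is an assumption, as are the `Call` record and the helper names.

```python
"""Genotype call rate and a cumulative saturation curve (toy data)."""
from typing import NamedTuple

N_GENOTYPES = 736  # size of the germplasm collection (from the caption)


class Call(NamedTuple):
    """Hypothetical per-genotype call record."""
    genotype: str  # e.g. "0/1"
    rd: int        # read depth at the variant in this genotype
    gq: int        # genotype quality


def hard_filter(calls, min_rd=6, min_gq=30):
    """Set calls failing RD > min_rd and GQ > min_gq to missing (None)."""
    return [c if c is not None and c.rd > min_rd and c.gq > min_gq else None
            for c in calls]


def call_rate(calls, n_genotypes=N_GENOTYPES):
    """Non-missing genotype calls for one variant divided by all genotypes."""
    return sum(c is not None for c in calls) / n_genotypes


def saturation_curve(rates, steps=10):
    """For thresholds 0, 1/steps, ..., 1: fraction of variants whose
    call rate is at least that threshold (assumed reading of the caption)."""
    return [(i / steps, sum(r >= i / steps for r in rates) / len(rates))
            for i in range(steps + 1)]


# One toy variant observed in 4 genotypes (one call already missing).
variant = [Call("0/1", 10, 40), Call("1/1", 5, 50), None, Call("0/0", 7, 31)]
print(call_rate(variant, 4), call_rate(hard_filter(variant), 4))  # → 0.75 0.5
```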
Figure 5. Distribution of EP values in HQ and LQ variant sets. For variants present in 78 genotypes and 147 amplicons, box plots show the distributions of EP values for commonly identified (HQ) and uniquely identified (LQ) variants in the probe capture and Hi-Plex variant sets.
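Figure 5 compares EP values for variants found by both the probe capture and Hi-Plex approaches (HQ) with variants found by only one of them (LQ). Below is a Python sketch of that split for the probe capture set, assuming each set is available as a mapping from variant ID to EP; the IDs and values are invented, and the same function can be applied with the roles of the two sets reversed. The summary uses `statistics.quantiles`, whose default "exclusive" method may place quartiles slightly differently from a given plotting library.

```python
"""Split EP values into HQ (found by both methods) and LQ (found by one)."""
from statistics import quantiles


def split_hq_lq(ep_by_variant, other_method_variants):
    """HQ: variants also found by the other method; LQ: found only by this one."""
    hq = [ep for v, ep in ep_by_variant.items() if v in other_method_variants]
    lq = [ep for v, ep in ep_by_variant.items() if v not in other_method_variants]
    return hq, lq


def box_stats(values):
    """Five-number summary of the kind drawn in a box plot (needs >= 2 values)."""
    q1, q2, q3 = quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3,
            "max": max(values)}


capture_ep = {"v1": 0.99, "v2": 0.97, "v3": 0.95,
              "v4": 0.60, "v5": 0.45, "v6": 0.30}
hiplex_variants = {"v1", "v2", "v3", "v7"}
hq, lq = split_hq_lq(capture_ep, hiplex_variants)
print(box_stats(hq))
print(box_stats(lq))
```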
Figure 6. Sequence diversity and distribution of 28 newly identified alleles of LpSDUF247 across breeding populations and wild accessions. A similarity matrix (a) and a phylogenetic tree (b) were built from the protein sequences of the 28 alleles and the reference sequence (R) of LpSDUF247, together with three additional DUF247 genes. Panel (c) shows the distribution of the LpSDUF247 alleles across the gene pool. The alleles present in each genotype were identified by mapping the reads to a multi-allelic reference genome and calculating, per allele, the ratio of its average RD to the total number of reads mapping to LpSDUF247 alleles.
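The allele assignment in Figure 6c relies on a per-allele ratio: the average read depth of an allele divided by the total number of reads mapping to any LpSDUF247 allele. The Python sketch below computes that ratio from per-position depths on a multi-allelic reference and reports alleles above a cut-off. The cut-off value (`MIN_RATIO`), the allele names and the data layout are hypothetical, since the caption does not specify them.

```python
"""Per-allele depth ratio on a multi-allelic reference (toy data)."""
from statistics import mean

MIN_RATIO = 0.1  # hypothetical cut-off; the caption does not state one


def allele_ratios(depths_per_allele, total_reads):
    """Average read depth per allele divided by the total number of reads
    mapping to any LpSDUF247 allele."""
    return {allele: mean(d) / total_reads
            for allele, d in depths_per_allele.items()}


def alleles_present(depths_per_allele, total_reads, min_ratio=MIN_RATIO):
    """Alleles whose ratio reaches the (hypothetical) cut-off, sorted by name."""
    ratios = allele_ratios(depths_per_allele, total_reads)
    return sorted(a for a, r in ratios.items() if r >= min_ratio)


# One toy genotype: per-position depths on three hypothetical alleles.
depths = {"A1": [40, 40, 40], "A7": [20, 20, 20], "A12": [1, 0, 2]}
print(alleles_present(depths, total_reads=100))  # → ['A1', 'A7']
```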