Literature DB >> 35450212

SNP Discovery Using BSR-Seq Approach for Spot Blotch Resistance in Wheat (Triticum aestivum L.), an Essential Crop for Food Security.

Ravi Ranjan Saxesena¹, Vinod Kumar Mishra¹, Ramesh Chand², Uttam Kumar³, Apurba Kumar Chowdhury⁴, Jyotika Bhati⁵, Neeraj Budhlakoti⁵, Arun Kumar Joshi^3,6.

Abstract

The pathogenic fungus, Bipolaris sorokiniana, that causes spot blotch (SB) disease of wheat, is a major production constraint in the Eastern Gangetic Plains of South Asia and other warm, humid regions of the world. A recombinant inbred line population was developed and phenotyped at three SB-prone locations in India. The single nucleotide polymorphism (SNP) for SB resistance was identified using a bulked segregant RNA-Seq-based approach, referred to as "BSR-Seq." Transcriptome sequencing of the resistant parent (YS#24), the susceptible parent (YS#58), and their resistant and susceptible bulks yielded a total of 429.67 million raw reads. The bulk frequency ratio (BFR) of SNPs between the resistant and susceptible bulks was estimated, and selection of SNPs linked to resistance was done using sixfold enrichments in the corresponding bulks (BFR >6). With additional filtering criteria, the number of transcripts was further reduced to 506 with 1055 putative polymorphic SNPs distributed on 21 chromosomes of wheat. Based on SNP enrichment on chromosomal loci, five transcripts were found to be associated with SB resistance. Among the five SB resistance-associated transcripts, four were distributed on the 5B chromosome with putative 52 SNPs, whereas one transcript with eight SNPs was present on chromosome 3B. The SNPs linked to the trait were exposed to a tetra-primer ARMS-PCR assay, and an SNP-based allele-specific marker was identified for SB resistance. The in silico study of these five transcripts showed homology with pathogenesis-related genes; the metabolic pathway also exhibits similar results, suggesting their role in the plant defense mechanism.

Entities: Chemical

Keywords: RILs; SNPs; bulk segregant analysis; food security; spot blotch; wheat

Year: 2022 PMID： 35450212 PMCID： PMC9016274 DOI： 10.3389/fgene.2022.859676

Source DB: PubMed Journal: Front Genet ISSN： 1664-8021 Impact factor: 4.772

Introduction

As a primary staple crop, the importance of wheat is well-documented in food security and provides nutrition for more than 35% of the world’s population (FAO, 2018). Great progress has been achieved in wheat production since the Green Revolution; however, due to climate change and the popularization of dwarf and semidwarf varieties, many of the biotic factors, including the pathogen of spot blotch (SB) disease have gained importance in countries such as India (Joshi et al., 2007a). The prevalence of SB is more common in the wheat-producing countries, notably in the Eastern Gangetic Plains of India, Nepal, Bangladesh, China, and South America (Joshi et al., 2007b; Gupta et al., 2018; Singh et al., 2018). A hemi-biotrophic fungus, Bipolaris sorokiniana (Sacc.), causes SB disease in wheat, seedling blight, common root rot, seedling rot, and seed rot (Acharya et al., 2011). Crop yield losses in the Indian subcontinent alone are estimated to be in the range of 15%–25% (Dubin and Van Ginkel, 1991; Pandey et al., 2021), but the level of loss in individual fields can be much higher. In South Asia, the disease is expected to inflict a 15%–20% average yield loss (Duveiller and Sharma, 2009); however, under favorable conditions, more than 85% of losses are reported in Zambia during the summer season (Raemaekers, 1987). Thus, SB disease might have a significant impact on global food security. Seed treatments and foliar fungicidal sprays are both recommended for the treatment of SB disease. Although strong-efficacy fungicides are available to manage SB disease, their application may have adverse effects on human health and the environment (Monyo et al., 2009; Castro et al., 2018). Other than health and environmental constraints, the use of fungicides led to an increase in the cost of cultivation and a reduction in farmers’ income. Therefore, the most successful, cost-efficient, and environmentally friendly strategy to control the disease is to use cultivars with host resistance (Gupta et al., 2018; Zhang et al., 2020). However, breeding for SB resistance has been slow due to the quantitative nature of inheritance (Joshi et al., 2004) and is often influenced by the environment (Joshi et al., 2007b). In such a situation, genetics, and genomics-based technology aid in developing resistant plants more efficiently. Classic methods of mapping QTLs/genes, involve phenotyping and genotyping of segregating mapping populations with polymorphic markers identified between parents. However, the identification of polymorphic markers between contrasting parents is a time-consuming and tedious task (Schneeberger and Weigel, 2011; Jaganathan et al., 2020). So far, SSRs, or microsatellite markers, are used most widely to map wheat genomes for SB resistance (Singh and Singh, 2015). Presently, several QTLs and four genes (Sb1-Sb4) have been identified as having major effects on SB resistance (Gupta et al., 2018). Gene Sb1 is present on chromosome 7DS, where it shares space with Lr34 (Lillemo et al., 2013), Sb2 on chromosomes 5BL (Kumar et al., 2015), Sb3 on 3BS (Lu et al., 2016), and a recently discovered Sb4 are present on 4BL (Zhang et al., 2020). The identification of genetic regions and the development of robust molecular markers in wheat has long been hampered by its hexaploidy nature (AABBDD), which has the large sizes of the subgenomes and more than 85% repeated sequences (Paux et al., 2012; Wicker et al., 2018). It is difficult to design single-copy markers because the level of polymorphism is quite low in wheat compared with other cereals (Paux et al., 2012). Liu et al. (2012) propose a new genetic mapping strategy called “BSR-Seq,” which combines bulked segregant analysis with RNA-Seq. BSR-Seq is a technique that involves sequencing RNAs from extreme bulks for the trait of interest. The method is especially important for crops with large and complex genomes, such as wheat, where resequencing is still prohibitively expensive (Liu et al., 2012; Xie et al., 2020). BSR-Seq can also be used to fine-map crops that do not yet have a reference genome sequence (O’Neil and Emrich, 2013). RNA-Seq captures the full range of dynamic spectrum of the transcriptome, advantageous over array platforms that are restricted to the predefined set of variants incorporated into the array design. SNPs can be identified either by aligning to a known transcriptome or by de novo assembly over the transcriptome (Grabherr et al., 2011; O’Neil and Emrich, 2013). RNA-Seq is more likely to discover functional SNPs than other SNP discovery methods (Pootakham et al., 2014). Genotyping by RNA-Seq can detect much more variation compared with array-based technology because it covers 70%–90% of the total genes based on the tissue and development stage of the sample. For the development of constitutive markers, the combination of advanced sequencing technology with BSA provides a powerful tool for the rapid identification of genes or causal mutations (Xu and Bai, 2015; Zou et al., 2016). BSR-Seq has been applied successfully to localize the candidate gene for grain protein content (GPC) gene GPC-B1 in wheat to 0.4 cM from 30 cM (Trick et al., 2012). The glossy 3 (gl3) gene of maize was allocated to a ∼2 Mb area by BSR-Seq, and a single gene, MYB transcription factor, was found (Liu et al., 2012). Ramirez-Gonzalez et al. (2015) identified putative single nucleotide polymorphisms (SNPs) for the Yr15 locus using BSR-Seq and mapped this gene to a 0.77-cM interval that imparts resistance to yellow rust in wheat. Pearce et al. (2016) used RNA-Seq to analyze and compare the transcriptomes of phyB-null and phyC-null TILLING mutants and identified 82 genes that are significantly upregulated or downregulated in both types of mutants. In a more recent study on BSR-Seq (Klein et al., 2018), cloned mutant genes in maize were involved in plant growth by delineating mapping intervals. Compared with the entire population analysis, BSA provides a shortcut to identifying and developing markers for a trait. The substantial reduction in the cost of sequencing, particularly with the introduction of BSR-Seq, may be accomplished by genotyping several bulks from a large-sized population, and the power of detection can be significantly improved, particularly for alleles of interest or rare alleles (Hiebert et al., 2014; Zou et al., 2016). As a result, BSR-Seq is widely used for the quick finding of genes and markers associated with the target gene. To meet future food demands, climatic resilience and disease-resistant wheat combined with good agronomic value can potentially improve its productivity (Mondal et al., 2016). To date, SB resistance of wheat is quantitative, involving genes and many QTLs with low-coverage linkage maps. To reduce the loss of wheat productivity and grain quality caused by SB, new resistance genes must be identified. Here, the present investigation was initiated with an objective to identify putative SNPs for SB resistance by the “BSR-Seq” approach.

Materials and Methods

Plant Materials

A total of 211 single seed descent (SSD)-derived recombinant inbred lines (RILs) were generated from the cross YS#24 × YS#58. The parental lines YS#24 and YS#58 are stable RILs selected from the Yangmai6 × Sonalika cross and advanced to the F12 generation. Yangmai6 is a Chinese source of SB resistance, and Sonalika is a susceptible cultivar of Mexican origin that has been under cultivation in India for more than five decades/during the green revolution. The parents of the RILs used in this study are similar with respect to the agronomical and phenological traits but harbor different SB resistance QTLs (Kumar et al., 2009). To develop RILs, field trials in the crop season were conducted on the Agricultural Research Farm, BHU, Varanasi, whereas off-season nurseries were raised for generation advancement at Wellington, Tamil Nadu, India. The whole procedure of RIL development and its evaluation is mentioned in the flow chart (Supplementary Figure S1).

Phenotypic Evaluation of RILs for SB Resistance

The 211 RILs (F5 and F6) were evaluated at three hot spots in India, namely, Agricultural Research Farm, BHU, Varanasi (25°18′N, 83°03′E); Borlaug Institute for South Asia, Samastipur, Pusa (25°57′N, 85°40′E), Bihar; and Uttar Banga Krishi Vishwavidyalaya, Coochbehar (26°19′N, 89°27′E), West Bengal, during two consecutive crop seasons 2013–14(F5) and 2014–15(F6). The F7 generation was evaluated only at BHU in crop season 2015–16. At each location, all RILs were planted along with their parents (YS#24 and YS#58) in two replications following a randomized complete block design (RCBD). Each line was sown in two rows of 2 m, keeping row-to-row and plant-to-plant distances of 20 and 5 cm, respectively. To ensure disease build-up and spread, two rows of the susceptible genotype Sonalika was planted after every 20th row and in alleys along with the plots. The susceptible spreader rows served as an inoculum source for epidemic development in addition to the direct inoculation on the RILs (Saxesena et al., 2017). Planting at all sites was done between the 1st and 10th of December each year to coincide with the post-anthesis with higher temperatures, conducive to disease development (Chaurasia et al., 2000). The experimental plots were fertilized at 120 kg N2, 60 kg P2O5, and 40 kg K2O per hectare. A complete dose of K2O and P2O5 was given at the time of sowing. The N2 was given in three splits of 60, 30, and 30 kg per ha at sowing, 21 days after sowing (at first irrigation), and 45 days after sowing (at second irrigation), respectively. A total of five irrigations were given to maintain sufficient moisture in the field.

Inoculation of the Pathogen, Disease Assessment, and Estimation of Area Under the Disease Progress Curve (AUDPC)

The highly aggressive B. sorokiniana isolate HDBHU (NABM MAT1; NCBIJN128877) was used to create an artificial epiphytotic. The pathogen was multiplied on sorghum grains, suspended in water (104 spores per ml), and sprayed in the evening at the time of flag leaf emergence as described earlier (Joshi and Chand, 2002). Disease severity (DS) was scored using a double-digit scale, first at the beginning of anthesis (GS 63) and then at the end of anthesis (GS 69) and third at the late milk stage (GS 77) (Saari and Prescott, 1975). The percentage DS was calculated as D1/9 × D2/9 × 100, where D1 is the first digit referring to the vertical progress of disease and D2 is the second digit indicating the extent of leaf area affected. The AUDPC based on DS recorded at three growth stages was calculated following Shaner and Finney (1977). where Yi = disease level at time ti, t (i+1)−ti = time (days) between two disease scores, n = number of dates when disease was recorded.

Grouping of RILs and Construction of Bulk Samples

To increase the statistical power and reduce false positives, multiple bulks were selected independently from each of the three locations in both years (Zou et al., 2016). AUDPC was analyzed from the pooled data of disease severity in all the screened environments, and three resistant bulks with lower AUDPC (R-bulk1, R-bulk2, and R-bulk3) and three susceptible bulks with higher AUDPC (S-bulk1, S-bulk2, and S-bulk3) were prepared (Figure 1). Each group bulked for resistance and susceptibility is composed of 10 RILs.

FIGURE 1

Frequency distribution of the AUDPC of the wheat RIL population and parents. Resistant parent (RP), YS#24, and susceptible parent (SP), YS#58 clearly show the phenotypic differences for SB disease severity. Resistant bulk (R1-bulk, R2-bulk, and R3-bulk) with lower AUDPC and susceptible (S1-bulk, S2-bulk, and S3-bulk) with higher AUDPC were selected from each tail.

RNA Extraction and Transcriptome Sequencing

Seeds of each RIL of the resistant and susceptible bulks (30R + 30S RILs) and their parents (YS#24 and YS#58) were grown separately in a greenhouse; seedling leaves were harvested 14 days after sowing for RNA isolation. Total RNA was extracted from the leaf tissues of each RIL using a standard TRIzol method (Catalog # 15596018) and treated with DNase I to remove residual DNA contaminants. RNA samples were purified using the RNeasy Mini Kit (Qiagen, Hilden, Germany) and the quantitative and qualitative estimation was performed using the NanoDrop 1000 (Thermo Fisher Scientific) and the Agilent Bioanalyzer 2100, respectively. An equimolar concentration of RNA from 10 resistant individuals of RILs was pooled together to make resistant bulk 1, and the same was done to prepare the other resistant and susceptible bulks. Before cDNA library preparation, we enriched the RNA samples for transcripts using the absolute mRNA Purification Kit (Agilent Technologies, Santa Clara, CA, United States). The cDNA libraries were constructed, and Illumina paired-end adapters and barcode sequences were ligated onto the cDNA fragments (Pootakham et al., 2014). The pooled libraries were sequenced at QTLomics Technologies (Bengaluru, India) using the Illumina Next Seq 500 platform (Illumina, San Diego, CA, United States) to generate 76 bp paired end (PE) sequence reads for parents and bulked samples (resistance and susceptible) in three biological replicates.

Sequence Analysis; SNP Calling and Bulk Frequency Ratio Calculation

The data from Illumina Next Seq 500 was passed through Fast QC to check the quality of the reads (Babraham Bioinformatics - FastQC A, 2012). The ends of the reads with low quality, adaptor contamination, and low-quality regions were trimmed using the fastx-tool kit (http://hannonlab.cshl.edu/fastx_toolkit). Identification of SNPs was done; using a reference-based approach, the UniGene build63 was used as reference transcriptome (NCBI:ftp://ftp.ncbi.nhi.gov/repository/UniGene/Triticum_aestivum/Ta.seq.uniq.gz). Since UniGene represents only the genic portion, the complexity of the wheat genome was further reduced (Sidhu et al., 2015). The reads were aligned to transcript sequence build 63 using the Burrows–Wheeler Aligner (BWA) (Li and Durbin, 2009) using default parameters for PE libraries. Misaligned reads were removed from the unigene alignment and the unigenes with a high bulk frequency ratio (BFR) shortlisted. The allele frequency ratio of the resistant allele (allele of the resistant parent) in the resistant bulk compared with the resistant allele frequency in the susceptible bulk is known as the BFR. However, the genome locations of the SNPs were derived by alignment of shortlisted unigenes to the reference genome. We ran BLAST alignment and took the best hit throughout the transcript rather than the short read. The whole transcript alignment would give the best alignment to their respective genomes rather than the short read. The alignments were converted to one binary alignment/map (BAM) file per sample. On average, 80% of the RNA-Seq reads from each sample were able to align to the reference transcriptome (Pootakham et al., 2014). The SNPs for each sample were collected using the SAMtool sv0.1.18 as described by Li et al. (2009). To identify SNPs linked to SB resistance, polymorphic markers between the parents were identified using a custom Perl script followed by BFR calculation for the bulks following Trick et al. (2012). The algorithm was implemented in Perl and R software. BFR values were calculated independently for the comparisons of three bulks (Bulk 1:S-bulk1, R-bulk1; Bulk 2: S-bulk2, R-bulk2; Bulk 3: S-bulk3, R-bulk3). BFR values and depth of each SNP for all the bulks and parents were calculated in the R program. Potential SNPs linked to the trait were selected based on SNPs with a BFR ratio of six as the minimum threshold in all three bulk replicates. The unigene that contain putative SNPs were aligned to a repeat masked reference genome (transcriptome) of wheat using BLASTN, and the best hit for each transcript was recorded. The BLAST result was segregated according to the chromosome number, and the chromosomal regions based on SNP density (twice or greater than the average SNP density) were further shortlisted. To identify SNPs that were enriched for the corresponding parental allele, BFR was calculated in the appropriate bulk, YS #24 derived SNPs for the resistant bulks, and YS#58 derived SNPs for the susceptible. Finally, to classify and prioritize the SNPs across each bulk, the frequency of the allele (SNP index) calculated at each SNP position (Takagi et al., 2013) was estimated, and then the ratio between the bulks (BFR) for each SNP was determined (Trick et al., 2012). Thus, a high BFR in the resistant parent (YS#24) derived SNP was indicative of an allele that is very frequent in the resistant bulk while depleted in the susceptible. In addition, a threshold of BFR >6 was set to select putative SNPs for the presence or absence of polymorphism between bulks for further validation as done previously in wheat (Ramirez-Gonzalez et al., 2015).

SNP-Based Primer Designing

The tetra-primer amplification refractory mutation system-PCR (ARMS-PCR) is a simple and economical technique that produces an allele-specific reaction (Ye et al., 2001; Ruiz-Sanz et al., 2007) The principle behind this technique is that the allele-specific primer amplifies a region specific to the base present at the 3′terminus, thus making it allele-specific (Newton et al., 1989; Ye et al., 2001). The genomic region flanking 300 bp on both ends of putative SNPs was extracted and formatted using the Perl script, and the SNPs with the highest score were selected to design primers. The batch primer 3 web tools (http://probes.pw.usda.gov/batchprimer3/) were finally used to create the tetra-primer ARMS-PCR (You et al., 2008).

DNA Isolation and PCR Conditions for Allele-Specific Primer

The genomic DNA of RILs used for BSR-Seq was extracted from the leaves of young seedlings using a cetyltrimethylammonium bromide (CTAB) protocol (Saghai-Maroof et al., 1984). The DNA pellet was vacuum dried, dissolved in DNase-free water, and stored at −20°C. A target-specific tetra-primer ARMS PCR amplification of all the resistant and susceptible bulks along with the parents was performed. The PCR was performed in a total volume of 10 μl reaction mixture, containing 30 ng DNA, 1X PCR buffer [75 mM Tris-HCl (pH9.0), 50 mM KCl, 20 mM (NH4)2SO4], 0.33 pmol of each outer primer (forward and reverse), 0.5 pmol of inner forward primer, 0.83 pmol of inner reverse primer, 2.5 mM MgCl2, 0.125 mM of each dNTPs, 1U of Taq Polymerase (3B DNA polymerase, 3B Black Bio Biotech India Ltd.). PCR reactions were performed in an Agilent SureCycler 8800 using the following program: initial denaturation at 95°C for 5 min, 38 cycles consisting of denaturation at 95°C for 1 min, annealing at 62°C for 90 s, extension at 72°C for 2 min, followed by a final extension at 72°C for 10 min. The PCR product was resolved on 3.5% agarose gel and stained with ethidium bromide to visualize.

Functional Identification and Annotation of Transcripts

To perform the functional analysis of the transcripts, Blast2GO v 2.5 was used (Conesa et al., 2005). It is a Gene Ontology–based annotation tool and was found effective in the functional characterization of sequence data (Conesa and Götz, 2008). For functional characterization of the transcripts, we performed NCBI-BLASTX analysis using transcripts homologous to annotated proteins in the nr database with the criterion of E-value (threshold of 1e-03) and alignment size (threshold length 33). Furthermore, the transcript sequences were categorized for gene ontology (i.e., GO terms) into three groups: molecular function, biological process, and cellular component. The pathways for the selected transcripts were also delineated using the Kyoto encyclopaedia of genes and genomes (KEGG) database.

Results

Phenotyping of RIL Population and Construction of Bulk Samples

In all environments, the AUDPC of the resistant parent YS#24 was consistently and considerably higher than that of the susceptible parent YS#58 (Table 1). The mean AUDPC values of the resistant (YS#24) and susceptible parents (YS#58) were 336.29 and 820.55, respectively, over the environment (Table 1). Based on the pooled data, the RIL population for SB AUDPC revealed considerable phenotypic variation, ranging from 231.70 to 836.80. The RILs also exhibited a high coefficient of variation (CV) for AUDPC across environments, ranging from 8.80% (BHU14) to 16.92% (UBKV14), whereas the broad-sense heritability (h 2) values of AUDPC were lowest at BISA14 (0.65) and highest at BHU14 (0.87). Therefore, resistant and susceptible bulks were constituted by taking the samples independently using extreme phenotypes (resistant and susceptible) from both tails as illustrated in Figure 1.

TABLE 1

Disease response to SB of parents and RILs measured in different environments.

Env.	AUDPC
	YS#24	YS#58	RILs
	Mean ± SD	Mean ± SD	Range	Mean ± SD	h ²	CV
BHU14	288.90 ± 53.25	796.50 ± 81.52	142.20–906.80	583.30 ± 141.12	0.87	8.80
BHU15	401.50 ± 96.46	1062.30 ± 61.10	235.80–1016.00	701.00 ± 174.00	0.81	11.41
BHU16	360.90 ± 18.75	896.70 ± 32.89	145.20–925.40	610.00 ± 172.00	0.83	11.02
BISA14	229.80 ± 8.43	545.00 ± 39.84	219.00–588.20	424.00 ± 77.18	0.65	11.97
BISA15	330.50 ± 9.16	680.60 ± 21.38	248.40–846.90	546.39 ± 127.40	0.83	10.05
UBKV14	321.40 ± 18.58	886.00 ± 18.43	161.90–928.80	469.80 ± 173.59	0.81	16.92
UBKV15	431.80 ± 39.72	966.30 ± 39.73	192.30–1085.80	546.50 ± 178.21	0.77	16.55
Over Env	336.29 ± 68.07	820.55 ± 183.28	231.70–836.84	554.00 ± 183.60	0.92	13.61

Env, environment; AUDPC, area under disease progress curve; h2, broad sense heritability; SD, standard deviation; CV, coefficient of variation.

Disease response to SB of parents and RILs measured in different environments. Env, environment; AUDPC, area under disease progress curve; h2, broad sense heritability; SD, standard deviation; CV, coefficient of variation.

Transcriptome Alignment and Variant Identification

The bulk samples utilized for BSR-Seq produced 429.40 million raw reads, containing 32,634.40 million base pairs. The number of raw reads in the sequenced samples ranged from 30.40 million (7.08%) in the resistant bulk-3 to 171.70 million (39.99%) in the susceptible parent (Table 2). The highest mapping of reads to the reference genome was obtained for the susceptible parent (171.70 million reads), followed by susceptible bulk-3 (43.90 million reads), resistant parent (42.50 million reads), resistant bulk-1 (37.90 million reads), and susceptible bulk-2 (36.50 million reads), and the lowest values were obtained for resistant bulk-3 (30.40 million reads) (Figure 2A). All the raw sequence reads have been deposited at NCBI, and the provisional Sequence Reads Archive (SRA) identifiers were obtained under the accession codes SRR5948908.

TABLE 2

Number of reads generated, alignment percentage, and number of SNPs in parents, resistant bulk, and susceptible bulk of the “YS#24 × YS#58” in wheat.

Samples used for BSR-seq	Number of reads (76 bp/read pair)	% of total number of reads	Number of base pairs	Percent alignment	Number of SNPs
Resistant parent	42.50 × 10⁶	9.90	3230.0 × 10⁶	79.11%	843,836
Susceptible parent	171.70 × 10⁶	39.99	13,049.2 × 10⁶	79.55%	656,548
Resistant bulk-1	37.90 × 10⁶	8.83	2880.4 × 10⁶	79.07%	629,129
Resistant bulk-2	35.10× 10⁶	8.17	2667.6 × 10⁶	79.01%	669,279
Resistant bulk -3	30.40 × 10⁶	7.08	2310.4 × 10⁶	79.27%	694,316
Susceptible bulk-1	31.40 × 10⁶	7.31	2386.4 × 10⁶	80.38%	535,390
Susceptible bulk-2	36.50 × 10⁶	8.50	2774.0 × 10⁶	79.43%	635,868
Susceptible bulk-3	43.90 × 10⁶	10.22	3336.4 × 10⁶	80.09%	613,480
Total	429.40× 10⁶	100	32,634.4 × 10⁶	—	5277846

FIGURE 2

RNA-Seq data study in parents and bulk samples. (A) The number of raw sequence reads (76 bp/read pair) generated in resistant and susceptible parents and bulk samples. (B) The percentage alignment of the read to the wheat reference sequence. Susceptible bulk-1 shows the highest alignment. BWA was used to align the reads. (C) The total number of SNPs obtained in each sequenced sample and the maximum number of SNPs present in the resistant parent.

Number of reads generated, alignment percentage, and number of SNPs in parents, resistant bulk, and susceptible bulk of the “YS#24 × YS#58” in wheat. RNA-Seq data study in parents and bulk samples. (A) The number of raw sequence reads (76 bp/read pair) generated in resistant and susceptible parents and bulk samples. (B) The percentage alignment of the read to the wheat reference sequence. Susceptible bulk-1 shows the highest alignment. BWA was used to align the reads. (C) The total number of SNPs obtained in each sequenced sample and the maximum number of SNPs present in the resistant parent. Each sample yielded an average of 80% high-quality (free of adaptor contamination and low-quality areas) RNA-Seq reads that were matched to the wheat reference transcriptome. Susceptible bulk-1 had the highest percentage alignment (80.38% with 535390 SNPs), followed by susceptible bulk-3 (80.09% with 613480 SNPs), and resistant bulk-1 had the lowest percentage alignment (79.01% with 669279 SNPs) (Table 2). The resistant parent had the highest number of SNPs (843836), whereas the minimum number (535390) was in the susceptible bulk-1 with the highest percentage of alignment ( Figures 2B,C ).

BFR and Polymorphic SNPs Associated With SB Resistance

A total of 1,379,122 SNPs were identified on 91,855 transcripts of wheat samples used in the present investigation (Table 3). The putative SNPs linked to SB were selected based on SNPs with BFR >6 in all three bulk samples, i.e., 3666 SNPs present on 1837 transcripts in Bulk-1 (S-bulk1: R-bulk1), 3635 SNPs on 2056 transcripts in Bulk-2 (S-bulk2: R-bulk2), and 2130 SNPs on 1443 transcripts in Bulk-3 (S-bulk3: R-bulk3). A total of 7860 SNPs with >6 BFR were found across all bulks (Bulk-1: Bulk-2: Bulk-3) on 3950 transcripts. Out of 7860 SNPs, only 1055 SNPs were present on 506 transcripts detected as homozygous and polymorphic between parents (Table 3). The transcripts carrying homozygous and polymorphic SNPs with other details are listed in Supplementary Table S1.

TABLE 3

Number of transcripts containing SNPs having BFR >6 in bulked samples.

Description of samples	SNPs count	Number of transcripts
Raw SNP count	1379122	91855
Bulk 1 BFR >6	3666	1837
Bulk 2 BFR >6	3635	2056
Bulk 3 BFR >6	2130	1443
Bulk total BFR >6	7860	3950
Bulk total with parental polymorphism	1055	506

Number of transcripts containing SNPs having BFR >6 in bulked samples.

Distribution of Polymorphic SNPs for SB in Wheat Genome

The distribution of trait-linked SNP markers in the A, B, and D genomes for each homoeologous group and percentage of the total number of SNPs is given in Table 4. The 1055 polymorphic SNP markers, bar plotted on 21 wheat chromosomes, formed seven unevenly distributed homoeologous groups (Figure 3). The highest number of SNP markers were identified in the B genome (532 SNP; 50.42%), followed by A (311 SNP; 29.46%) and D (212 SNP; 20.09%). The number of SNPs per linkage group ranged from 16 (2D) to 198 (5B) (Figure 4). The homologous group 5 had the highest markers (313 SNP, 29.67%), followed by group 3 (207 SNP, 19.60%), and group 4 had the lowest (79 SNP, 7.49%) (Table 4).

TABLE 4

Distribution of polymorphic SNPs markers of SB with BFR >6 across the A, B, and D genomes of wheat.

Chromosome	Number of SNPs	Number of transcripts	% of total number of SNPs	Number of SNPs in homoeologous group	Percentage
1A	30	14	2.84	90	8.53
1B	36	17	3.41
1D	24	10	2.27
2A	63	32	5.97	135	12.79
2B	56	36	5.30
2D	16	11	1.52
3A	49	16	4.64	207	19.62
3B	136	67	12.89
3D	22	13	2.09
4A	34	22	3.22	79	7.49
4B	27	22	2.56
4D	18	14	1.70
5A	50	26	4.74	313	29.67
5B	198	74	18.77
5D	65	34	6.16
6A	44	14	4.17	126	11.94
6B	53	18	5.02
6D	29	13	2.75
7A	41	27	3.88	105	9.95
7B	26	8	2.46
7D	38	18	3.60
Average	50.23	24.09	4.76	150.71	14.29
Total	1055	506	100	1055	100

FIGURE 3

Bar plot showing the densities of polymorphic SNPs marker with bulk frequency ratio greater than 6 (BFR >6) on the 21 wheat chromosomes of the cross “YS#24 × YS#58” RIL population. The locations of the SNPs were determined by the best alignment to the wheat genome with the transcript containing the SNPs.

FIGURE 4

Chromosomal distribution of SB-associated transcripts (blue bar) and polymorphic SNPs (orange bar) on wheat chromosomes with a bulk frequency ratio greater than 6. The figure shows that the 5B chromosome has the maximum number of SNPs, followed by the 3B.

Distribution of polymorphic SNPs markers of SB with BFR >6 across the A, B, and D genomes of wheat. Bar plot showing the densities of polymorphic SNPs marker with bulk frequency ratio greater than 6 (BFR >6) on the 21 wheat chromosomes of the cross “YS#24 × YS#58” RIL population. The locations of the SNPs were determined by the best alignment to the wheat genome with the transcript containing the SNPs. Chromosomal distribution of SB-associated transcripts (blue bar) and polymorphic SNPs (orange bar) on wheat chromosomes with a bulk frequency ratio greater than 6. The figure shows that the 5B chromosome has the maximum number of SNPs, followed by the 3B.

Analysis of Polymorphic SNPs on Chromosomes 3B and 5B

The maximum number of polymorphic SNP with BFR >6 was found on chromosome 5B (198 SNP), followed by 3B (136 SNP) (Figure 4). Thus, a total of 334 SNPs of 3B and 5B chromosomes were present on 142 transcripts, out of which 60 SNPs were found only on five transcripts. Out of these five, one transcript of the 3B chromosome had eight SNPs, and four transcripts of the 5B chromosome had 52 SNPs (Table 5). Among the four transcripts of 5B, a maximum of 27 SNPs were present on transcript gnl/UG/Ta#S61812294, followed by transcripts gnl/UG/Ta#S17985740 (19 SNPs) and gnl/UG/Ta#S61830716 (4 SNPs), while transcript gnl/UG/Ta#S65715070 had the lowest number (2 SNPs) (Table 5). The chromosomal distribution of 60 polymorphic SNP present on five different transcripts of the 3B and 5B chromosomes associated with SB resistance is shown in Figure 5, which indicates their relative position.

TABLE 5

Transcripts of 3B and 5B chromosomes having SNPs markers of SB resistance with BFR >6.

Chromosome	No. of transcripts	Total number of SNPs	Transcript ID	Number of SNPs
3B	1	8	gnl/UG/Ta#S61799095	8
5B	4	52	gnl/UG/Ta#S17985740	19
			gnl/UG/Ta#S61812294	27
			gnl/UG/Ta#S61830716	4
			gnl/UG/Ta#S65715070	2

FIGURE 5

The distribution of 60 polymorphic SNPs linked to SB resistance on the wheat chromosome derived from the RIL population. The eight SNPs present on 3B and 52 SNPs on 5B chromosomes indicate their relative position.

Transcripts of 3B and 5B chromosomes having SNPs markers of SB resistance with BFR >6. The distribution of 60 polymorphic SNPs linked to SB resistance on the wheat chromosome derived from the RIL population. The eight SNPs present on 3B and 52 SNPs on 5B chromosomes indicate their relative position.

Development of an Assay From Allele-Specific Primers in Bulk Samples

In this study, allele-specific tetra-primer ARMS PCR primers were designed to develop an assay for SB resistance. The SNPs on transcript gnl/UG/Ta#S61799095 of 3B chromosomes and transcript gnl/UG/Ta#S61830716 and gnl/UG/Ta#S17985740 of chromosome 5B were used to design the tetra-primers for ARMS PCR (Supplementary Table S2). A primer (Ta_S61830716_1262_3) from chromosome 5B amplified 142 bp fragments in both the resistant parent and resistant bulk-1, whereas a 142 bp fragment was not amplified in the susceptible parent, but in a few samples of susceptible bulk-1, a faint band appeared in lane 10–12 (Figures 6A,B). The primer (Ta_S61830716_1262_3) showed clear differentiation between resistant and susceptible genotypes.

FIGURE 6

(A) Lane 1: 100 bp ladder, Lane 2: susceptible parent, Lane 3: resistant parent, Lane 4–13: resistant lines. The resistant lines amplify the inner allele-specific primer region with a product size of 142 bp, which is absent in the susceptible parent. (B) Lane 1: 100bp ladder, Lane 2: susceptible parent, Lane 3: resistant parent, Lane 4–13: susceptible lines. The inner allele-specific primer region amplifies in the resistant parent at 142 bp and is absent in the susceptible lines except for lanes 10–12.

Pathway Enrichment Analysis Using GO and KEGG

The transcript of 3B chromosomes (gnl/UG/Ta#S61799095) shared homology with acetyl-CoA acetyltransferase, whereas two transcripts of 5B (gnl/UG/Ta#S17985740 and gnl/UG/Ta#S61830716) shared homology with proteinase/protease and the remaining two with phospholipase C1 and exohydrolase proteins (Supplementary Table S3). A total of 346 GO terms were identified for the selected transcripts and further categorized into molecular function, biological process, and cellular component (Supplementary Figure S2). The maximum number of GO terms fall into the molecular function category (231 terms), which were further grouped into hydrolase activity, transferase activity, peptidase activity, and catalytic activity. There were 70 terms associated with the biological process, which fall into the category of protein catabolic process, carbohydrate metabolic process, metabolic process, phospholipid catabolic process, and fatty acid beta-oxidation (Supplementary Figure S2). A few terms were associated with cellular components, viz., peroxisome, membrane, an integral component of the membrane, lysosome, and extracellular space. Furthermore, as shown in Supplementary Figure S3, the most highly identified pathways in the study were fatty acid metabolism, valine, leucine, isoleucine metabolism, and benzoate degradation.

Discussion

In the 21st century, wheat is placed as one of the world’s most productive and essential crops (Curtis and Halford, 2014), contributing significantly to global food security by providing nutrition for 35% of the world population (FAO, 2018; Tomar et al., 2021). The continuous threat of SB fungus in major wheat-growing areas of the world results in a significant yield loss that affects future food security. Wheat cultivars with host resistance are employed to minimize crop yield loss, which is the most effective and economical way to manage SB (Gupta et al., 2018). However, the conventional breeding approaches to develop cultivars with disease resistance have certain limitations due to the genetic complexity of wheat (Zhang et al., 2020). Several molecular breeding tools and techniques have been developed to study the genetics and genomics of plants having simple and complex genomes. For a significant period, researchers primarily relied on SSR markers, but their sparse distribution across the genome rendered them less ideal for a large-scale genotyping assay (Pootakham et al., 2014). Recently, SNP markers have become increasingly popular in molecular genetics and breeding studies due to their abundance. However, SNP discovery in organisms with highly repetitive DNA and polyploid nature, such as wheat, remains difficult (Ganal et al., 2009; Trick et al., 2012). The SNP discovery via transcriptome sequencing is an attractive strategy to reduce genome complexity in wheat (Westermann et al., 2012; Edae and Rouse, 2019). To reduce the complexity of the data, we focused on sequencing the wheat transcriptome using RNA-Seq instead of genomic DNA. Due to the introduction of next-generation sequencing, combining BSA and RNA-Seq (BSR-Seq) reduces the cost remarkably when repetitive sequences are enriched in the genome and enabled a rapid and detailed understanding of a near-complete set of transcripts and SNPs linked to the trait (Garg et al., 2011; Zou et al., 2016). In this context, the RIL population was used for making bulks with extreme phenotypes for SB disease. The RIL population’s AUDPC showed a continuous distribution across environments, indicating that SB resistance components behave like quantitative traits (Figure 1) as quantitative traits generally show a continuous phenotypic distribution. The estimated broad-sense heritability for AUDPC was high (0.92) over the environments, indicating good reproducibility of the phenotypic data (Table 1). The high heritability over environments revealed the genetic control of AUDPC. The numbers of bulks for pooling were selected in multiples (in replicate) independently from each of the two tails by following Navabi et al. (2009) for small- to moderate-sized populations; the optimum tail size should be 20%–30% of the entire population. The replicated number of bulks for pooling provides high accuracy in SNP predictions by reducing false positives, increasing the likelihood of obtaining reliable markers by many orders of magnitude. The parents were used to define the SNPs rather than to make quantitative estimates; therefore, replications to detect SNPs were omitted (Ramirez-Gonzalez et al., 2015). In diploid crops of the medium genome, approximately 57–65 million reads were generated and successfully achieved higher (>90%) genome coverage in cucumber (Lu et al., 2014) and pigeon pea (Singh et al., 2016). In this study, 429.40 million reads were generated for the bulk samples; these excess reads provided better coverage to the genome. In a bulked sample of groundnut (allotetraploid) for rust and late leaf spot resistance, 423.70 million reads were generated by Pandey et al. (2017). Because bread wheat is hexaploid, having three subgenomes (A, B, and D), more sequence data should be generated than the other diploid species, to achieve maximum genome coverage and read depth. Hence, the generated sequencing data with maximum genome coverage and read depth allowed for detailed sequence analysis. The alignment of reads to the reference transcriptome revealed that the maximum number of SNPs were present in the resistant parent, whereas the minimum was in the susceptible bulk-1 with the highest percentage of alignment. The higher percentage alignment indicated that the sample was closer to the reference being aligned. In the present study, a total of 1,379,122 genome-wide, high-quality SNPs were identified in parents and resistant and susceptible bulks of wheat after sequence alignment of filtered reads, and only 1055 SNPs were detected as polymorphic between the parents using BFR >6, i.e., associated with SB resistance. Among the 1055 SNPs, 198 (18.77%) and 136 (12.89%) were mapped on chromosomes 5B and 3B, suggesting the SB resistance gene might be located on chromosome 5B or 3B. Among the three genomes, there were polymorphic SNPs present in the B genome (50.42%) compared with A (29.46%) and D (20.09%) (Table 4). A similar result was reported by Kumar et al. (2009) in wheat when parents were screened with SSR markers. However, the density of polymorphic SNPs throughout the B genome was not uniform; only the 5B and 3B chromosomes were found saturated considerably with SNP markers having a magnitude of BFR >6. Kumar et al. (2009, 2010) also report two QTLs for SB resistance on the 5B chromosome in two different populations that explained around 38.62% and 10.70% of the phenotypic variation, respectively. In the recent past, three major SB resistance genes, Sb2 (Kumar et al., 2015), Sb3 (Lu et al., 2016), and Sb4 (Zhang et al., 2020), were identified in the B genome of wheat. Thus, the previous (SSR, SNP) and present SNP studies indicate the effectiveness of the B genome for SB resistance, especially 5, 4, and 3B. The average number of SNPs mapped to each linkage group was 50.23, whereas the highest number of markers was mapped to 5B. In this study, the D genome was found less saturated compared with A and B because a lesser number of polymorphic SNP markers (>6 BFR) was identified on it, which is commensurate with microsatellite markers reported in wheat (Ganal and Röder, 2007). However, the first resistance gene (Sb1) for SB was reported on the 7D chromosome by Lillemo et al. (2013); as a result, the importance of the D genome on a saturation basis cannot be overstated. It appears to indicate that, despite the lesser number of SNPs identified on the D genome, we should pick only those SNPs having the highest magnitude of BFR. Because the higher the BFR, the more likely the SNP is genetically linked to the R-gene, the putative SNPs with enriched BFR can then be converted into high-throughput SNP assays and genotyped across the individuals that were used to assemble the bulk (Ramirez-Gonzalez et al., 2015). Finally, the key issue is to move from in silico SNPs into a high-throughput SNP assay with allele-specific markers that can score on agarose gel electrophoresis. The allelic-specific primer XTaSb_S61830716_1262_3, designed from SNPs present on transcript id Ta_S61830716, was found polymorphic in parents and validated in each line of resistant bulk-1, but only in 70% lines of the susceptible bulk-1 and was named as a marker of SB resistance. The marker XTaSb_S61830716_1262_3 is characterized by a few individuals of susceptible bulks as resistant but is otherwise susceptible phenotypically, limiting the efficiency of the marker. Because plant immunity is regulated by the expression of pathogenesis-related genes (PRs), transcription is de-repressed by pathogen-induced signals. The studied transcripts here are shown to have homology with pathogenesis-related genes. The transcript gnl/UG/Ta#S61799095 of 3B identified in this study showed homology with acetyl-CoA acetyltransferase, which is conferred for its pathogenesis-related activity in rice (Zhong et al., 2015). It is a vital starting molecule for the biosynthesis of various metabolites. The transcripts of 5B (gnl/UG/Ta#S17985740 and gnl/UG/Ta#S61830716) also show homology to cysteine proteinase and proteases. Because different families of proteases manage the extracellular defense, which contributes to effector-triggered immunity (ETI), others help in the induction of microbe-associated molecular pattern–triggered immunity. A few proteases/proteinases are associated with systemic acquired resistance and the establishment of induced systemic resistance (Kim and Hwang, 2015; Balakireva and Zamyatnin, 2018). The other transcripts of 5B (gnl/UG/Ta#S61812294 and gnl/UG/Ta#S65715070) were found homologous to phospholipase and exohydrolase. It is reported that phospholipase affects the translocation of nonexpressor pathogenesis-related (NPR) proteins to the nucleus in Arabidopsis thaliana. The structural changes and localization of this protein in plant cells are responsible for the plant defense signaling (Kinkema et al., 2000; Janda et al., 2015). The major pathways identified in the study are fatty acid degradation and valine, leucine, and isoleucine degradation, along with other pathways. Fatty acid degradation is the process by which fatty acids break down into their metabolites, which finally generates acetyl-CoA, the entry molecule for the citric acid cycle. In the case of Brassica napus, when infected with the pathogen, it significantly enriched in fatty acid oxidation activities in the upregulated gene sets on both susceptible and resistant lines (Chittem et al., 2020). Thus, the study of five wheat transcripts shows that they are closely related to genes involved in pathogenesis and metabolism, suggesting their prominent role in the plant defense mechanism.

Conclusion

In this study, BSA combined with RNA-Seq (BSR-Seq) appears to be useful for SNP discovery in bread wheat for SB resistance, later used to develop a new marker assay. The marker XTaSb_S61830716_1262_3 could be productive in screening wheat germplasm for SB resistance. In future projects, the SNPs discovered for SB resistance across the wheat genome will be visualized by converting them into Kompetitive Allele Specific PCR (KASP) markers for establishing a high-throughput genotyping platform, useful for MAS of the target genes. The newly developed SNPs marker could also be converted to a qPCR-based assay for large-scale application in crop improvement and study of the molecular biology of SB resistance.

49 in total

1. Nuclear localization of NPR1 is required for activation of PR gene expression.

Authors: M Kinkema; W Fan; X Dong
Journal: Plant Cell Date: 2000-12 Impact factor: 11.277

Review 2. SNP identification in crop plants.

Authors: Martin W Ganal; Thomas Altmann; Marion S Röder
Journal: Curr Opin Plant Biol Date: 2009-01-29 Impact factor: 7.834

3. A rare single nucleotide variant in Pm5e confers powdery mildew resistance in common wheat.

Authors: Jingzhong Xie; Guanghao Guo; Yong Wang; Tiezhu Hu; Lili Wang; Jingting Li; Dan Qiu; Yahui Li; Qiuhong Wu; Ping Lu; Yongxing Chen; Lingli Dong; Miaomiao Li; Huaizhi Zhang; Panpan Zhang; Keyu Zhu; Beibei Li; Karin R Deal; Naxin Huo; Yan Zhang; Ming-Cheng Luo; Sanzhen Liu; Yong Qiang Gu; Hongjie Li; Zhiyong Liu
Journal: New Phytol Date: 2020-07-26 Impact factor: 10.151

4. QTL for spot blotch resistance in bread wheat line Saar co-locate to the biotrophic disease resistance loci Lr34 and Lr46.

Authors: Morten Lillemo; Arun K Joshi; Ravindra Prasad; Ramesh Chand; Ravi P Singh
Journal: Theor Appl Genet Date: 2012-11-09 Impact factor: 5.699

5. Food security: the challenge of increasing wheat yield and the importance of not compromising food safety.

Authors: T Curtis; N G Halford
Journal: Ann Appl Biol Date: 2014-02-21 Impact factor: 2.750

6. Genetic Dissection of Novel QTLs for Resistance to Leaf Spots and Tomato Spotted Wilt Virus in Peanut (Arachis hypogaea L.).

Authors: Manish K Pandey; Hui Wang; Pawan Khera; Manish K Vishwakarma; Sandip M Kale; Albert K Culbreath; C Corley Holbrook; Xingjun Wang; Rajeev K Varshney; Baozhu Guo
Journal: Front Plant Sci Date: 2017-01-31 Impact factor: 5.753

7. Bulked segregant analysis RNA-seq (BSR-Seq) validated a stem resistance locus in Aegilops umbellulata, a wild relative of wheat.

Authors: Erena A Edae; Matthew N Rouse
Journal: PLoS One Date: 2019-09-20 Impact factor: 3.240

8. BatchPrimer3: a high throughput web application for PCR and sequencing primer design.

Authors: Frank M You; Naxin Huo; Yong Qiang Gu; Ming-Cheng Luo; Yaqin Ma; Dave Hane; Gerard R Lazo; Jan Dvorak; Olin D Anderson
Journal: BMC Bioinformatics Date: 2008-05-29 Impact factor: 3.169

Review 9. Bulked sample analysis in genetics, genomics and crop improvement.

Authors: Cheng Zou; Pingxi Wang; Yunbi Xu
Journal: Plant Biotechnol J Date: 2016-04-28 Impact factor: 9.803

Review 10. Harnessing Diversity in Wheat to Enhance Grain Yield, Climate Resilience, Disease and Insect Pest Resistance and Nutrition Through Conventional and Modern Breeding Approaches.

Authors: Suchismita Mondal; Jessica E Rutkoski; Govindan Velu; Pawan K Singh; Leonardo A Crespo-Herrera; Carlos Guzmán; Sridhar Bhavani; Caixia Lan; Xinyao He; Ravi P Singh
Journal: Front Plant Sci Date: 2016-07-06 Impact factor: 5.753