Aasim Majeed1, Prerna Johar2, Aamir Raina3, R K Salgotra2, Xianzhong Feng4, Javaid Akhter Bhat4,5.
Abstract
Most plant traits are polygenic, governed by both major and minor genes. Linkage mapping and positional cloning have contributed greatly to mapping the genomic loci that control important traits in crop species. However, these methods are low-throughput, time-consuming, and low in resolution, which reduces their efficiency in crop breeding. Bulk segregant analysis sequencing (BSA-seq) and its related approaches, viz., quantitative trait locus sequencing (QTL-seq), bulk segregant RNA sequencing (BSR-seq), and MutMap, have emerged as efficient methods for identifying the genomic loci/QTLs that control specific traits with high resolution and accuracy, in less time, and in a high-throughput manner. These approaches combine BSA with next-generation sequencing (NGS) and enable the rapid identification of genetic loci underlying both qualitative and quantitative traits. Many studies have successfully identified genetic loci for different plant traits using BSA-seq and its related approaches, as discussed in detail in the text. However, the efficiency and accuracy of BSA-seq depend on factors such as sequencing depth and coverage, and raising these increases the sequencing cost. The recent rapid reduction in the cost of NGS, together with the expected future reduction in the cost of third-generation sequencing, has further increased the accuracy and commercial applicability of these approaches in crop improvement programs. This review provides an overview of BSA-seq and its related approaches in crop breeding, together with their merits and challenges in trait mapping.
Keywords: MutMap; QTL-seq; crop breeding; fine-mapping; next-generation sequencing
Year: 2022 PMID: 36003337 PMCID: PMC9393495 DOI: 10.3389/fgene.2022.944501
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.772
FIGURE 1. Representation of BSA-seq and the general data analysis approach for marker-trait associations. (A) depicts the creation of opposite bulks and their sequencing. (B) depicts the identification of variants and their association with the trait. This figure was created with BioRender (https://biorender.com).
Key characteristics of different statistical approaches and pipelines used to analyze BSA-seq data.
| S.No. | Method | Key statistics used | Citations | Limitation | Advantage |
|---|---|---|---|---|---|
| 1 | G-statistic | G-test | 200 | Based on estimating the G′ threshold; the method for calculating FDR for multiple testing has not been concretely devised; significantly affected by sequencing depth and is less suitable under low sequencing depth; no estimation of confidence intervals | Simplicity |
| 2 | MULTIPOOL | Probabilistic multi-locus dynamic Bayesian network model | 70 | Based on estimating the LOD threshold, judging the significance of signals | Non-reliance on a particular aligner and SNP calling strategy |
| 3 | QTL-Seq | SNP index, ∆ SNP index | 780 | Significance threshold estimated in QTL-seq is inappropriate; no estimation of confidence interval | Simplicity and intuition |
| 4 | EXPLoRA | Hidden Markov model (HMM), Linkage disequilibrium (LD) | 45 | No multiple testing correction, sometimes maps a single QTL as two or more adjacent QTLs, no confidence interval estimation | Robust even at a low signal-to-noise ratio |
| 5 | Hidden Markov model | HMM | 9 | Does not take into account that co-segregation of SNPs is affected by the distance between them | |
| 6 | Non-homogeneous hidden Markov model | HMM | 16 | | Takes the effect of distance between SNPs during co-segregation into account |
| 7 | QTG-Seq | smooth LOD test, Euclidean distance, and G-statistic | 49 | Large pool size and high sequencing coverage required | Time- and cost-saving strategy for fine-mapping, suitable for minor-effect QTLs, mapping resolution up to the gene level, and requires only four generations from the first cross of any parent lines for fine-mapping |
| 8 | PyBSASeq | Fisher’s exact test, ∆ SNP index or G-statistic, significant | 4 | No estimation of confidence intervals for detected QTLs | Simple and effective, calculates significance, can detect SNP-trait associations at lower sequencing coverage and so can reduce sequencing cost by up to 80%, high sensitivity |
| 9 | Block regression mapping | Δf or ∆ SNP index, Δf curve LOESS analysis, block regression, central limit theorem, and Bonferroni correction | 10 | Not apparent yet | Calculates significance, uses multiple testing, estimates confidence intervals |
| 10 | QTLseqr | ∆ SNP index and G-statistic | 93 | Not apparent | Calculates significance, uses multiple testing, and options for better visualization |
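Several of the pipelines above rest on two per-SNP quantities: the ∆ SNP index and the G-statistic. The following is a minimal sketch of both, assuming that for each SNP the reference- and alternate-allele read counts are already available for the two bulks. The function names and the "high"/"low" bulk labels are hypothetical, chosen only for illustration. The G-statistic here is the plain G-test of independence on the 2×2 table of allele counts; the sketch omits the smoothing across neighbouring SNPs that gives G′, as well as the significance thresholds and confidence intervals discussed in the table.

```python
import math


def snp_index(alt_reads, total_reads):
    """Fraction of reads at a SNP that carry the alternate allele."""
    if total_reads <= 0:
        raise ValueError("no reads cover this SNP")
    return alt_reads / total_reads


def delta_snp_index(alt_high, depth_high, alt_low, depth_low):
    """Difference between the SNP indices of the two bulks."""
    return snp_index(alt_high, depth_high) - snp_index(alt_low, depth_low)


def g_statistic(ref_high, alt_high, ref_low, alt_low):
    """G-test of independence on the 2x2 table (bulk x allele) of read counts."""
    observed = [[ref_high, alt_high], [ref_low, alt_low]]
    total = ref_high + alt_high + ref_low + alt_low
    if total <= 0:
        raise ValueError("no reads cover this SNP")
    row_sums = [ref_high + alt_high, ref_low + alt_low]
    col_sums = [ref_high + ref_low, alt_high + alt_low]
    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = observed[i][j]
            if obs > 0:  # empty cells contribute nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g += obs * math.log(obs / expected)
    return 2.0 * g


# Hypothetical SNP: 18 of 20 reads are alternate in one bulk, 2 of 20 in the other.
print(delta_snp_index(18, 20, 2, 20))
print(g_statistic(2, 18, 18, 2))
```

A SNP unlinked to the trait is expected to give a ∆ SNP index near zero and a small G value, while a SNP close to a causal locus drifts toward the extremes. Because both quantities are computed from read counts, they become noisy at low sequencing depth, which is consistent with the depth-related limitations listed in the table.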
Details of the studies utilizing BSA-Seq for the elucidation of trait-specific genomic regions in different crop species.
| S.No. | Species | Pop type | Pop size | Pool size | Sequencing strategy | Number of SNPs | Bioinformatics approach used | Trait | Key findings | Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Rice | NIL-F2 | 176 | 35 individuals from extreme phenotypes | Whole-genome sequencing (WGS) | 455,262 | QTL-seq method | Grain length and weight | One major QTL, 15–20 Mb on chr 5, for grain length and weight identified | |
| 2 | Rice | F3 | 10,800 | 385 tolerant pools and 430 sensitive pools | Paired-end Illumina sequencing on Hiseq 2000 platform | 450,000 | G′ statistic method | Cold tolerance | Six QTLs were mapped on chromosomes 1, 2, 5, 8, and 10 | |
| 3 | Rice | F2 | 940 | 20 extreme phenotypes for heading time (HT) and plant height (PH) | WGS on the Illumina HiSeq X Ten platform | 511,393 for HT and 543,319 for PH | ΔSNP-index method | Heading time and plant height | Four QTLs for HT on chromosomes 3, 6, 9 and 10. Three QTLs for PH on chromosomes 1 and 8 | |
| 4 | Rice | RILs | 190 | 20 extreme phenotypes were used for bulking | Paired-end sequencing using the Illumina MiSeq platform | 184,917 | Euclidean distance and ΔSNP-index method | Cold tolerance | One major QTL on chr6 was identified, which spans 1.81 Mb and harbors 269 genes | |
| 5 | Rice | RILs | 151 | ---------- | Paired-end sequencing using Illumina HiSeq 2500 | 116,993 | Euclidean distance and ΔSNP-index method | Grain shape | One major QTL on chr9 was identified, which spans 0.8 Mb and harbors 101 genes | |
| 6 | Cucumber | F2 | 258 | 10 individuals from extreme phenotypes | Illumina paired-end sequencing | 234,393 | Δ (SNP index) | Early flowering | One major QTL around 890 kb on chr 1 for early flowering. The gene Csa1G651710 was identified as the main flowering switch | |
| 7 | Cucumber | F3 | 135 | 15 resistant and 15 susceptible | Paired-end sequencing using Illumina HiSeq 2000 | 933,846 and 915,524 for susceptible and resistant bulk | ΔSNP-index method | Vein yellowing virus resistance | A unique region in chromosome 5 containing 24 annotated genes was identified for resistance | |
| 8 | Maize | RILs | 224 | 46 more extreme plants formed two pools | Paired-end sequencing using Illumina HiSeq 2000 | 3,301,371 | Customized R-script | Flowering time and plant height | Two major QTLs found for FT on chr 5 and chr 8 were 10.8 Mb and 18.9 Mb in size, respectively. Two major QTLs on chr 4 and chr 6 found for PH were 21.2 Mb and 9.7 Mb in size, respectively | |
| 9 | Maize | ILs | 400 | 10 tolerant and 10 sensitive extreme phenotypes | BSR-seq | 114,580 | Bayes’ theorem | Waterlogging | In tolerant and sensitive bulks, 354 and 1,094 genes were differentially expressed, respectively. GRMZM2G055704 on chromosome 1 was identified as a candidate gene responsive to waterlogging | |
| 10 | Wheat | RILs | 244 | six low- and six high-TGW | SLAF-seq | 132,530 | ΔSNP-index method | 1,000 grain weight (TGW) | One candidate gene associated with TGW was identified on chr 7A | |
| 11 | Hessian fly (HF), a wheat galling parasite | Non-structured Louisiana field population | --- | 23 virulent and 19 avirulent | WGS | 1.5 million | Fisher’s exact test using PoPoolation2 | Hessian fly (HF) virulence to wheat R genes H6, Hdic, and H5 | One 1.3-Mb region for HF virulence was mapped to HF autosome 2 | |
| 12 | Chickpea | F4 | 221 | 10 individuals of each low and high seed weight forming two pools | Paired-end WGS using the Illumina HiSeq 2000 platform | 118,321 | Δ (SNP index) | 100 seed weight | One major QTL of 35 kb on chromosome 1 containing six genes | |
| 13 | Chickpea | RILs | 92 and 139 for two populations | 10 and 14 extreme phenotypes for two populations | WGS | 77,938 in one population and 106,907 in the other | G-statistic and ΔSNP-index method | Ascochyta blight resistance | 17 QTLs identified and mapped on chromosomes Ca1, Ca2, Ca4, Ca6, and Ca7 | |
| 14 | Tomato | F2 populations | 549 | 10 individuals of each extreme phenotype | Paired-end WGS using the Illumina HiSeq 2000 platform | ---------- | Δ (SNP index) | Fruit weight (FW) and locule number (LC) | Three highly significant and newly mapped FW QTLs on chr 1 and chr 11. 66 candidate genes for FW. Three LC QTLs of low significance | |
| 15 | Groundnut | RILs | 266 | 25 individuals with extreme phenotypes | Paired-end WGS using Illumina HiSeq 2000 | 259,621 for rust and 243,262 for LS | Δ (SNP index) | Rust and late leaf spot disease | One 3.06-Mb region on the A03 pseudomolecule of A-genome harboring 3,136 SNPs was identified for rust resistance. A 2.98 Mb region on A03 pseudomolecule harboring 66 SNPs was identified for LS resistance | |
| 16 | Groundnut | RILs | 366 | 20 individuals with extreme phenotypes | WGS | 10,759 | ΔSNP-index method | Fresh seed dormancy | Two genomic regions on the B05 and A09 pseudomolecules control seed dormancy | |
| 17 | Potato | Diploid mapping population | 90 | 10 individuals with extreme phenotypes | Paired-end Illumina HiSeq 2000 | 6,766,8,152,000 | Pearson’s chi-squared test | Steroidal glycoalkaloids (GAs) | One region located on chromosome 1 ranging from 63.1 to 73.5 Mb was found the most confident | |
| 18 | Pepper | F2 | 249 | 30 individuals with extreme phenotypes | SLAF-seq | 106,848 | Euclidean distance | First flower node | One QTL on chr 12 was detected, followed by 393 high-quality SNP markers associated with FFN | |
| 19 | Sunflower | F2 and F3 | 300 | 15 individuals with extreme phenotypes | Genotyping-by-sequencing (GBS) | 11,484 | G-statistic using QTLseqr | Broomrape resistance | Two major QTLs on chromosome 3 | |
| 20 | Rapeseed | BC8 | 965 | 36 individuals with extreme phenotypes | Illumina HiSeq 2000 platform | 1,830,225 | ΔSNP-index method | Plant architecture | Five major QTLs on chromosome 1 | |
| 21 | Pigeonpea | F2 | 179 | 15 individuals with extreme phenotypes | WGS | 47,429, 46,510, and 54,556 for three different bulk types | ΔSNP-index method | Days to flowering (DTF) | Two significant genomic regions, one on CcLG03 and another on CcLG08 were found controlling DTF | |
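Several studies in the table above use the Euclidean distance between the allele frequencies of the two bulks as their association measure. The following is a minimal sketch of that calculation for a single SNP, assuming per-base read counts (A, C, G, T) are available for each bulk. The function names and the dictionary input format are hypothetical, and the sketch does not include the smoothing or curve-fitting steps that individual studies apply afterwards.

```python
import math

BASES = ("A", "C", "G", "T")


def allele_frequencies(counts):
    """Convert per-base read counts at one SNP into allele frequencies."""
    depth = sum(counts.get(base, 0) for base in BASES)
    if depth <= 0:
        raise ValueError("no reads cover this SNP")
    return {base: counts.get(base, 0) / depth for base in BASES}


def euclidean_distance(counts_bulk1, counts_bulk2):
    """Euclidean distance between the allele-frequency vectors of two bulks."""
    freq1 = allele_frequencies(counts_bulk1)
    freq2 = allele_frequencies(counts_bulk2)
    return math.sqrt(sum((freq1[b] - freq2[b]) ** 2 for b in BASES))


# Hypothetical SNP: the bulks carry mostly opposite alleles.
bulk_tolerant = {"A": 18, "G": 2}
bulk_sensitive = {"A": 3, "G": 17}
print(euclidean_distance(bulk_tolerant, bulk_sensitive))
```

The distance is zero when both bulks have identical allele frequencies and reaches its maximum of √2 when they are fixed for different alleles. Unlike the ∆ SNP index, it does not require deciding which allele is the reference.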