A combinatorial strategy to identify various types of QTLs for quantitative traits using extreme phenotype individuals in an F₂ population.

Pei Li¹, Guo Li¹, Ya-Wen Zhang¹, Jian-Fang Zuo¹, Jin-Yang Liu², Yuan-Ming Zhang³.

Abstract

Theoretical and applied studies demonstrate the difficulty of detecting extremely over-dominant and small-effect genes for quantitative traits via bulked segregant analysis (BSA) in an F2 population. To address this issue, we proposed an integrated strategy for mapping various types of quantitative trait loci (QTLs) for quantitative traits via a combination of BSA and whole-genome sequencing. In this strategy, the numbers of read counts of marker alleles in two extreme pools were used to predict the numbers of read counts of marker genotypes. These observed and predicted numbers were used to construct a new statistic, Gw, for detecting quantitative trait genes (QTGs), and the method was named dQTG-seq1. This method was significantly better than existing BSA methods. If the goal was to identify extremely over-dominant and small-effect genes, another reserved DNA/RNA sample from each extreme phenotype F2 plant was sequenced, and the observed numbers of marker alleles and genotypes were used to calculate Gw to detect QTGs; this method was named dQTG-seq2. In simulated and real rice dataset analyses, dQTG-seq2 could identify many more extremely over-dominant and small-effect genes than BSA and QTL mapping methods. dQTG-seq2 may be extended to other heterogeneous mapping populations. The significance threshold of Gw in this study was determined by permutation experiments. In addition, a handbook for the R software dQTG.seq, which is available at https://cran.r-project.org/web/packages/dQTG.seq/index.html, has been provided in the supplemental materials for the users' convenience. This study provides a new strategy for identifying all types of QTLs for quantitative traits in an F2 population.

Keywords: F2; bulked segregant analysis; dQTG-seq; extremely over-dominant gene; rice; small-effect gene

Year: 2022 PMID: 35576159 PMCID: PMC9251438 DOI: 10.1016/j.xplc.2022.100319

Source DB: PubMed Journal: Plant Commun ISSN: 2590-3462

Introduction

In the past several decades, many methods have been established to mine elite genes for quantitative traits in animals, plants, and human beings; these include quantitative trait locus (QTL) mapping (Lander and Botstein, 1989), genome-wide association studies (Zhang et al., 2005; Yu et al., 2006), and bulked segregant analysis (BSA) (Giovannoni et al., 1991; Michelmore et al., 1991; Zhang et al., 1994). In real data analysis, it is difficult to identify small-effect and linked genes (Kroymann and Mitchell-Olds, 2005; Mackay et al., 2009; Wang et al., 2016; Wen et al., 2019). In theory, it is also difficult to detect extremely over-dominant genes in BSA (Takagi et al., 2013a; Schneeberger, 2014; Haase et al., 2015; Wang et al., 2019). As we know, quantitative traits are controlled by both a few major genes and a series of polygenes, whereas over-dominant genes can be used to explain heterosis, which is a natural phenomenon. Thus, it is necessary to investigate methods for detecting extremely over-dominant and small-effect genes in BSA (Cockram and Mackay, 2018). Giovannoni et al. (1991) and Michelmore et al. (1991) established the BSA method in the early 1990s to quickly detect the association of a molecular marker with a trait of interest at a relatively low cost. To date, many BSA methods have been developed, and they have been summarized in Li and Xu, 2021. These methods are mainly used to detect genes for qualitative traits, mutants, and quantitative traits. In the early stage, BSA methods were widely used to identify dominant genes for qualitative traits and then increase molecular marker density in each F2 individual to fine-map target genes such as the dominant white locus in chicken (Ruyter-Spira et al., 1997). Recently, microsatellite markers were replaced by single nucleotide polymorphism (SNP) markers in the references of Li et al. (2018), Feng et al. (2019), and Liu et al. (2019). 
BSA methods were then widely used to identify mutant genes for qualitative traits. Schneeberger et al. (2009) first integrated BSA with whole-genome resequencing in a large pool of mutant F2 plants to rapidly identify recessive/dominant mutant genes based on the frequency of mutant alleles using SHOREmap software. When a smaller number of mutant plants was available, Cuperus et al. (2010) used wild-type SNP enrichment in a pool of mutant F2 plants to identify mutant genes. Austin et al. (2011) used the frequency of wild-type SNP alleles in a pool of mutant F2 plants to locate recessive mutant genes by the next generation mapping method, and Lindner et al. (2012) used the causative SNP/non-causative SNP ratio in a pool of mutant BC2 individuals to clone lethal mutations. For species with large and incomplete reference genomes, CloudMap (Minevich et al., 2012), SNPtrack (Leshchiner et al., 2012), Fishyskeleton (Bowen et al., 2012), MegaMapper (Obholzer et al., 2012), and MMAPPR (Hill et al., 2013) are available. When a mutant is dominant, the reverse mapping and F3 screen strategies of Smith et al. (2016) are available. Abe et al. (2012), Fekih et al. (2013), and Takagi et al. (2013b) used the SNP index in a pool of mutant F2 (M3) plants from a cross between a mutant and its wild type to identify mutant genes with MutMap, MutMap+, and MutMap-Gap, respectively. To simultaneously identify multiple causal mutations, SIMM (Yan et al., 2016) and BSA-seq (Wang et al., 2021) are available. However, these recessive/dominant mutant traits are different from the dominant effects of quantitative traits. Finally, BSA methods are widely used to identify quantitative trait genes (QTGs). As we know, quantitative traits are controlled by a few major genes and a series of polygenes with small genetic effects, making polygene identification difficult (Kroymann and Mitchell-Olds, 2005; Mackay et al., 2009). 
To identify these polygenes, SHOREmap (Schneeberger et al., 2009), MMAPPR (Hill et al., 2013), MutMap (Abe et al., 2012), MutMap+ (Fekih et al., 2013), and MutMap-Gap (Takagi et al., 2013b) are available. Ehrenreich et al. (2010) used allelic frequency to identify yeast polygenes via BSA-seq. The Z-statistic is also used to detect allelic frequency differences at the target locus in high and low pools (Huang et al., 2012; Benowitz et al., 2019). If several markers in one window (or region) are used to calculate the average Z-statistic, this is the Z′-statistic (Haase et al., 2015). Similarly, a smoothed G′ of the standard G statistic was proposed in Magwene et al. (2011). In addition, a dynamic Bayesian network via MULTI POOL (Edwards and Gifford, 2012) is used to capture the association among nearby loci in a pooled sequencing experiment; a hidden Markov model (HMM) is used to calculate the most probable states via EXPLoRA, indicating genomic regions that may be likely to contain trait-related genes (Duitama et al., 2014; Claesen and Burzykowski, 2015); a nonhomogeneous HMM with a transition matrix is used to detect the association of markers with genes (Ghavidel et al., 2015). All the approaches above make use of high and low pools. To increase the power of QTG detection, Wang et al. (2019) added an additional, middle pool in GradedPool-Seq to identify the rice heterotic QTL GW3p6. If no secondary F2 populations are available, a primary F2 population from a cross between any two homozygous parents may be used. In this case, the ΔSNP index is also used to detect QTLs, and the procedure is termed QTL-seq (Takagi et al., 2013a). Clearly, the results may be affected by genetic background, although the method may be effective in some instances. To address this issue, Zhang et al. (2019) proposed a new design and a smooth-LOD (logarithm of odds) statistic. 
In analyses of real data, however, it is difficult to detect extremely over-dominant and small-effect genes for quantitative traits (Ehrenreich et al., 2010; Takagi et al., 2013a; Schneeberger, 2014; Haase et al., 2015; Wang et al., 2019). A dominant mutant of a single-gene qualitative trait is different from the dominant effect of a QTL. In theory, the former is a qualitative trait, and the latter is a quantitative trait; they have different genetic structures in the two extreme phenotype pools. The structure of the high and low pools is QQ + Qq versus qq for a dominant mutant and QQ + qq versus Qq for a fully over-dominant (zero additive and non-zero dominance) QTL for a quantitative trait (Haase et al., 2015). Thus, the allelic frequency of a homozygous genotype pool is indistinguishable from the allelic frequency of a heterozygous genotype pool in the fully over-dominant QTL model, and the fully over-dominant QTL becomes undetectable (Supplemental File 1). In the same way, it is difficult to detect extremely over-dominant QTLs (Schneeberger, 2014; Wang et al., 2019). To address the issues above, we propose a combinatorial strategy for identifying various types of QTLs for quantitative traits in the F2 population by a combination of BSA and whole-genome sequencing. Specifically, an equally mixed DNA sample of all plants in each extreme pool was sequenced. On the basis of the observed numbers of read counts of marker alleles in the two extreme pools, we propose a method for predicting the numbers of read counts of marker genotypes. The observed allele and predicted genotype numbers were used to construct a new statistic, Gw, to identify marker-trait associations. If the goal was to detect extremely over-dominant and small-effect genes, another reserved DNA/RNA sample from each extreme phenotype individual was sequenced one plant at a time, and the observed numbers of marker alleles and genotypes were used to calculate Gw to identify marker-trait associations.
The former strategy is a new BSA method named dQTG-seq1, and the latter is a new individual segregant analysis named dQTG-seq2, where “d” indicates the dominant effect of the QTG (see materials and methods). Real and simulated datasets were used to analyze the sensitivity and specificity of the new strategy, and we also investigated the best sampling plan for the new strategy.
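The pool structures described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' simulation code: it assumes a single unlinked QTL with genotypic values a, d, and −a for QQ, Qq, and qq, normally distributed residuals, and 20% tails as pools. The function name `simulate_pools` and all parameter defaults are hypothetical.

```python
import random

def simulate_pools(a, d, n=2000, tail=0.2, sigma=1.0, seed=1):
    """Simulate one QTL in an F2 and summarise the low and high pools.

    Assumed model (illustrative only): genotypic values are a (QQ),
    d (Qq), and -a (qq); residuals are N(0, sigma^2).
    """
    rng = random.Random(seed)
    value = {2: a, 1: d, 0: -a}  # key = number of Q alleles
    plants = []
    for _ in range(n):
        g = rng.choice([0, 1, 1, 2])  # 1:2:1 segregation in an F2
        plants.append((value[g] + rng.gauss(0.0, sigma), g))
    plants.sort()  # ascending phenotype
    k = int(n * tail)
    pools = {"low": plants[:k], "high": plants[-k:]}
    summary = {}
    for name, pool in pools.items():
        genos = [g for _, g in pool]
        summary[name] = {
            "freq_Q": sum(genos) / (2 * len(genos)),   # allele frequency of Q
            "freq_Qq": genos.count(1) / len(genos),    # heterozygote frequency
        }
    return summary

if __name__ == "__main__":
    for label, (a, d) in {"additive": (1.0, 0.0), "fully over-dominant": (0.0, 1.0)}.items():
        print(label, simulate_pools(a, d))
```

Under the additive setting the two pools differ clearly in the frequency of Q, whereas under the fully over-dominant setting the allele frequencies stay close to 0.5 in both pools and only the heterozygote frequency separates them. This is the situation in which allele-based BSA statistics lose power.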

Results

A combinatorial strategy for mapping all types of QTLs for quantitative traits in an F2 population by a combination of BSA and whole-genome sequencing

The new strategy in this study was proposed to quickly identify QTGs for quantitative traits in an F2 population, especially for extremely over-dominant and small-effect genes (Figure 1). There were six types of F2 designs available. They included an F2 from a cross between any two homozygous parents (F2-I), an immortalized F2 (IMF2) derived from a DH (doubled haploid) or RIL (recombinant inbred line) population (F2-II) (Gardiner et al., 1993; Hua et al., 2003), a secondary F2 from a cross between a mutant and its wild type (F2-III) (Abe et al., 2012), a secondary F2 from a cross between two near-isogenic lines (F2-IV), a secondary F2 from the selfing of residual heterozygous lines of a targeted QTL (F2-V) (Yamanaka et al., 2005), and the improved secondary F2 of Zhang et al. (2019) from the selfing of residual heterozygous lines of a targeted QTL (F2-VI). When one of these F2 populations was constructed, the new strategy was available.

Figure 1

A combinatorial strategy for mapping all types of QTLs for quantitative traits in an F2 population by a combination of BSA and whole-genome sequencing. F2-I to F2-V have been described in previous studies; F2-VI here is slightly different from that in Zhang et al. (2019). The improved F2 design involves the following steps: (1) Develop F1 and F2 populations from the cross between any two inbred lines. (2) Identify QTL for the quantitative trait of interest using trait-phenotype and marker-genotype datasets from the F2 population. (3) If the additive (a) and dominant (d) effects of the target QTL have the same sign, P2 should be crossed with F1 to construct BC1(P2)F1. If they have opposite signs, P1 should be crossed with F1 to construct BC1(P1)F1. This is because the large difference in the two genotypic values in BC1(Ps)F1 (s = 1, 2) can increase the power of QTL detection in BSA. To select BC1(Ps)F1 individuals with a heterozygous genotype at the target QTL and homozygous genotypes at non-target QTLs, molecular markers tightly linked to each detected QTL in the F2 population are available. The selected BC1(Ps)F1 individuals are self-pollinated to produce a secondary F2 population (Zhang et al., 2019). To ensure a large population, multiple target BCsF1 individuals should be selected. (4) The above-mentioned secondary F2 individuals are measured for the trait of interest, and the plants with minimum and maximum phenotypes are selected separately within each family. This is the F2-VI design used in the present study. In each aforementioned F2 design, an equally mixed DNA/RNA sample of all individuals in each extreme pool is separately subjected to whole-genome sequencing with deep genome coverage c. The sequenced datasets for each mixed sample are mapped to the reference genome, and genomic variants across the genome are identified. 
On the basis of the observed numbers of read counts of marker alleles Q (j = 1) and q (j = 2) in low (i = 1) and high (i = 2) pools, we propose a method to predict the numbers of read counts of marker genotypes QQ (j = 1), Qq (j = 2), and qq (j = 3) in the two extreme pools (see materials and methods). Thus, the observed allele and predicted genotype numbers are used to calculate the new statistic $G_w = c_1 G_1 + c_2 G_2$, where $G_1$ and $G_2$ are the standard G statistics of Magwene et al. (2011): $G_1 = 2\sum_{i=1}^{2}\sum_{j=1}^{2} n_{ij}\ln[n_{ij}/E(n_{ij})]$, using allelic read numbers $n_{ij}$ and their expectations $E(n_{ij})$, and $G_2 = 2\sum_{i=1}^{2}\sum_{j=1}^{3} n_{ij}\ln[n_{ij}/E(n_{ij})]$, using genotypic read numbers $n_{ij}$ and their expectations $E(n_{ij})$. The Gw statistic was used to identify QTLs for quantitative traits, and the strategy was named dQTG-seq1. Although dQTG-seq1 has higher power for QTL detection than existing BSA methods, it may fail to detect extremely and fully over-dominant and small-effect genes, for which all existing BSA methods including dQTG-seq1 do not work well. To address this issue, an additional reserved DNA/RNA sample from each extreme phenotype individual is separately subjected to whole-genome sequencing one plant at a time. In this case, the observed numbers of marker alleles and genotypes in the two pools can be used to calculate Gw to identify QTGs in a strategy we have named dQTG-seq2. To validate the new strategy, other BSA and QTL mapping approaches, such as the ΔSNP index (Takagi et al., 2013a, 2013b), G′ (Magwene et al., 2011), composite interval mapping (CIM) (Zeng, 1993), and inclusive CIM (ICIM) (Li et al., 2007), were also used in this study.
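The weighted combination of an allele-based and a genotype-based G statistic (Table 1 gives G_w = c₁G₁ + c₂G₂) can be sketched as follows. This is an illustrative sketch rather than the dQTG.seq implementation: expectations are assumed to be the usual row total × column total / grand total of a pools-by-categories table, and the weights `c1 = c2 = 0.5` are placeholders, because the weights are not defined in this section.

```python
import math

def g_statistic(table):
    """Standard G statistic, 2 * sum(n * ln(n / E(n))), for a contingency table.

    `table` has one row per pool (low, high) and one column per category
    (2 alleles or 3 genotypes). E(n) is assumed to be
    row total * column total / grand total.
    """
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    g = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n > 0:  # an empty cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g += 2.0 * n * math.log(n / expected)
    return g

def g_w(allele_table, genotype_table, c1=0.5, c2=0.5):
    """Weighted sum of the allele-based (G1) and genotype-based (G2) statistics.

    c1 and c2 are hypothetical placeholder weights.
    """
    return c1 * g_statistic(allele_table) + c2 * g_statistic(genotype_table)

if __name__ == "__main__":
    # Fully over-dominant pattern: equal allele counts, different genotype counts.
    genotypes = [[40, 20, 40], [10, 80, 10]]   # rows: low, high; cols: QQ, Qq, qq
    alleles = [[100, 100], [100, 100]]         # rows: low, high; cols: Q, q
    print("G1 =", g_statistic(alleles))
    print("G2 =", g_statistic(genotypes))
    print("Gw =", g_w(alleles, genotypes))
```

In the example tables the allele counts are identical in the two pools, so the allele-based term is zero and the signal comes entirely from the genotype-based term.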

Monte Carlo simulation studies

dQTG-seq1 has higher power than other BSA methods

To demonstrate that dQTG-seq1 has higher power than other BSA methods, one small-effect (5%) QTL was simulated under four kinds of genetic models: additive (a ≠ 0 and d = 0), completely dominant (a = d), and over-dominant (d = 2a and d = 3a) (Supplemental Table 1). All samples in simulated dataset I were analyzed by the dQTG-seq1, ED (Euclidean distance), G′, and ΔSNP methods. dQTG-seq1 had significantly higher power than the other BSA methods; its power was 10.0%, 8.5%, 4.7%, and 4.1% higher than those of the other BSA methods in situations with dominance ratios of 0.0, 1.0, 2.0, and 3.0, respectively (Figure 2A).
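To make the four genetic models concrete, the sketch below derives additive and dominance effects for a QTL of a given size. It relies on the textbook result that a single locus in an F2 contributes a²/2 + d²/4 to the genetic variance; the phenotypic variance of 1.0 and the function name `effects_for_qtl` are assumptions for illustration, and the parameter values actually used in the simulations are those in Supplemental Table 1.

```python
import math

def effects_for_qtl(r2, ratio, v_p=1.0):
    """Return (a, d) with d = ratio * a such that a^2/2 + d^2/4 = r2 * v_p.

    r2    : proportion of phenotypic variance explained by the QTL
    ratio : dominance ratio d/a
    v_p   : total phenotypic variance (assumed value)
    """
    a = math.sqrt(r2 * v_p / (0.5 + ratio ** 2 / 4.0))
    return a, ratio * a

if __name__ == "__main__":
    for ratio in (0.0, 1.0, 2.0, 3.0):
        a, d = effects_for_qtl(0.05, ratio)
        print(f"d/a = {ratio:.0f}: a = {a:.3f}, d = {d:.3f}")
```

For a fixed QTL size, the additive effect shrinks as the dominance ratio grows, which reduces the allelic frequency difference between the two pools.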

Figure 2

Comparison of statistical power for QTL detection across various methods under different genetic models.

(A) Comparison of dQTG-seq1 with existing BSA (ED, G′, and ΔSNP) methods under various degrees of dominance (d/a = 0, 1, 2, and 3) models of one simulated QTL in a secondary F2 population.

(B) Comparison of dQTG-seq2 with dQTG-seq1, ED, G′, ΔSNP, and ICIM methods under extremely and fully over-dominant models of one simulated QTL in a secondary F2 population.

(C and D) Comparison of dQTG-seq2 with dQTG-seq1, ED, G′, ΔSNP, and ICIM methods under extremely (C) and fully (D) over-dominant models of five simulated QTLs in a primary F2 population. ∗, ∗∗, and ∗∗∗ indicate the 0.05, 0.01, and 0.001 significance probability levels using a two-sample percentage test.

dQTG-seq2 can identify more loci with large dominance ratios and small allelic frequency differences

To investigate whether the dQTG-seq1 and dQTG-seq2 methods can identify extremely and fully over-dominant and small-effect genes in the F2 population, one small-effect (5%) QTL was simulated under extremely (d/a = 4.0) and fully (a = 0 and d ≠ 0) over-dominant models in a secondary F2 (F2-III to F2-VI) population. Each simulation sample in simulated dataset II was analyzed by the dQTG-seq1, dQTG-seq2, ED, G′, ΔSNP, and ICIM methods, and the power, false positive rate (FPR), false negative rate (FNR), and accuracy of QTL position estimates in QTL detection were calculated. Under the extremely over-dominant model, the powers of QTL detection using dQTG-seq2 (71.3%) and ICIM (70.0%) were significantly higher than those using the dQTG-seq1 (14.3%), ED (12.7%), G′ (13.7%), and ΔSNP (13.3%) methods. Similar trends were observed in the FPR, FNR, and accuracy of QTL position estimates (Figure 2B; Supplemental Table 2). Under the fully over-dominant model, the powers of QTL detection using dQTG-seq2 (74.8%) and ICIM (73.0%) increased slightly, whereas the powers of the other methods decreased significantly (<5%). The trends in FPR, FNR, and accuracy of QTL position estimates were similar to those in the extremely over-dominant model (Figure 2B; Supplemental Table 2). To assess the applicability of dQTG-seq1 and dQTG-seq2 to a primary F2 population from a cross between any two homozygous parents (F2-I and F2-II), five QTLs were simulated under extremely (d/a = 4) and fully over-dominant (a = 0 and d ≠ 0) models in simulated dataset III, and their sizes were set to 2.5%, 5.0%, 5.0%, 10.0%, and 10.0%. Under the extremely over-dominant model, the average powers of dQTG-seq2 (49.0%) and ICIM (30.1%) were significantly higher than those of the dQTG-seq1 (15.8%), ED (9.3%), G′ (13.9%), and ΔSNP (13.3%) methods, and the FPR, FNR, and accuracy of QTL position estimates from dQTG-seq2 and ICIM were better than those from the other methods (Figure 2C; Supplemental Table 3). 
Under the fully over-dominant model, the average powers of dQTG-seq2 (50.2%) and ICIM (28.4%) showed slight changes, whereas the average powers of the dQTG-seq1 (7.7%), ED (4.0%), G′ (6.8%), and ΔSNP (6.6%) methods decreased. The trends in FPR, FNR, and accuracy of QTL position estimates were similar to those in the extremely over-dominant model (Figure 2D; Supplemental Table 3). Interestingly, we found that QTL4 and QTL5 with a size of 10% had lower powers than QTL2 and QTL3 with a size of 5%. This may be because the effects of QTL4 and QTL5 were in opposite directions. In conclusion, from the second and third simulation experiments, dQTG-seq2 can detect extremely and fully over-dominant and small-effect genes, but the other BSA methods, such as dQTG-seq1, ED, G′, and ΔSNP, do not work well. To determine whether dQTG-seq1 and dQTG-seq2 can be used to detect multiple QTLs on a chromosome, two 10% QTLs on one chromosome were simulated in an F2 population; the genetic model was d = 0.5a, and all the samples in simulated dataset IV were analyzed by dQTG-seq1 and dQTG-seq2. The two QTLs could be detected (Supplemental Table 4), and the new methods can therefore be used to detect multiple QTLs on a chromosome.

Analyses of real data from maize and rice

dQTG-seq1, as well as the G′ and ΔSNP methods, can identify a QTG for maize plant height in a secondary F2 population

To validate dQTG-seq1 in a secondary F2 population (F2-III to F2-VI), the maize dataset from Zhang et al. (2019) was re-analyzed. In Zhang et al. (2019), the numbers of read counts of marker alleles obtained from whole-genome sequencing were used to detect a QTN for plant height. The target QTL on chromosome 7 in the F2 population was also detected in a secondary F2 population, and its candidate gene around this QTN was confirmed by a molecular biology experiment (Zhang et al., 2019). In the present study, we predicted the numbers of read counts of marker genotypes, and the observed allele and predicted genotype numbers were used to detect the QTN. In Figure 3, the QTN in Zhang et al. (2019) was also found in the secondary F2 population by the dQTG-seq1, G′, and ΔSNP methods, indicating the feasibility of dQTG-seq1 for detection of target QTN in a secondary F2 population.

Figure 3

Identification of a QTG for maize plant height in the secondary F2 of a target QTL using three BSA methods.

(A–C) The three methods are dQTG-seq1 (A), G′ (B), and ΔSNP (C).

The dataset was derived from Zhang et al. (2019), and the red and blue colors are used to distinguish adjacent chromosomes in the whole genome.

dQTG-seq1 identifies more known genes for rice yield in a primary F2 population than the previous BSA (ΔSNP and G′) methods

To validate dQTG-seq1 in a primary IMF2 population (F2-I and F2-II) from a cross between any two homozygous parents, the 1998 dataset of rice yield from Zhou et al. (2012) was re-analyzed. In Zhou et al. (2012), 11 QTLs were identified as significantly associated with rice yield. Around these QTLs, there were eight previously reported genes (Supplemental Table 5). In this study, 20% extremely low-yield plants and 20% extremely high-yield plants were grouped into low and high pools. In each pool, we calculated the numbers of marker alleles and predicted the numbers of marker genotypes. These numbers were used to calculate G. As a result, 18 QTLs were found to be significantly associated with rice yield (Supplemental Table 5). Around these QTLs, 11 previously reported genes were found to be truly associated with rice yield (Supplemental Table 6; Figure 4A). When the previous (G′ and ΔSNP) BSA methods were used, only eight previously reported genes were found by each method (Figure 4B and 4C). Clearly, dQTG-seq1 identified more known genes than the G′ and ΔSNP methods, demonstrating the advantage of dQTG-seq1 over the previous BSA (G′ and ΔSNP) methods.
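The pooling step used in this re-analysis (taking the lowest and highest 20% of plants and counting marker alleles in each pool) can be sketched as below. This is an assumed, simplified data layout with one genotype string per plant at a single marker; the helper names are hypothetical, and the genotype prediction step of dQTG-seq1 is not reproduced here.

```python
def build_pools(phenotypes, fraction=0.2):
    """Return (low, high) lists of plant indices for the two phenotype tails."""
    order = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i])
    k = max(1, int(len(phenotypes) * fraction))
    return order[:k], order[-k:]

def count_alleles_and_genotypes(genotypes, members):
    """Count alleles and genotypes at one marker for the plants in `members`.

    `genotypes` holds 'QQ', 'Qq', or 'qq' for every plant in the population.
    """
    geno = {"QQ": 0, "Qq": 0, "qq": 0}
    for i in members:
        geno[genotypes[i]] += 1
    alleles = {
        "Q": 2 * geno["QQ"] + geno["Qq"],
        "q": 2 * geno["qq"] + geno["Qq"],
    }
    return alleles, geno

if __name__ == "__main__":
    yield_values = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 8.0, 4.0, 6.0, 10.0]
    marker = ["QQ", "qq", "QQ", "Qq", "Qq", "qq", "QQ", "Qq", "Qq", "QQ"]
    low, high = build_pools(yield_values, fraction=0.2)
    print("low pool :", count_alleles_and_genotypes(marker, low))
    print("high pool:", count_alleles_and_genotypes(marker, high))
```

The observed genotype counts returned here correspond to what individual sequencing provides in dQTG-seq2, whereas dQTG-seq1 starts from the allele counts alone.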

Figure 4

Identification of QTGs for rice yield in a primary F2 population using six mapping methods.

(A–F) The six mapping methods are dQTG-seq1 (A), G′ (B), ΔSNP (C), composite interval mapping (CIM, D), inclusive CIM (ICIM, E), and dQTG-seq2 (F).

Horizontal dotted lines indicate thresholds for significant QTLs, where the critical values were determined by permutation experiments for the new methods and by the R package QTLseqr for the G′ and ΔSNP methods and set to LOD = 2.5 for the CIM and ICIM methods. Various statistics of genome-wide scanning using new and existing methods are indicated by blue lines. Genes with a dominance ratio |d/a| < 2.0, small effects, and |d/a| ≥ 2.0 are indicated in black, red, and pink, respectively. If the dominance ratio of a small-effect gene is larger than 2.0, the gene name is in pink, and its corresponding solid line is in red. The dataset was derived from Zhou et al. (2012).

dQTG-seq1 and existing BSA methods identify fewer known genes with large dominance ratios or small effects for rice yield in a primary F2 population than CIM and ICIM

To compare dQTG-seq1 with previous QTL mapping methods, all individuals in Zhou et al. (2012) were re-analyzed by the CIM and ICIM methods. As a result, eight (CIM) and six (ICIM) known genes were found to be truly associated with rice yield (Figure 4D and 4E). Among these known genes identified by QTL mapping methods, five were not identified by the BSA methods. Here, we calculated the dominance ratio (d/a) and the allelic frequency difference between the two extreme pools for the five known genes and found that four had a large dominance ratio (3.04, 5.93, 4.22, and –6.46), and one had a small allelic frequency difference. This result highlights the difficulty of using the dQTG-seq1, G′, and ΔSNP methods to detect genes with |d/a| > 2 or a small allelic frequency difference (≤0.15). To confirm this viewpoint, we calculated the dominance ratio and the allelic frequency difference of the 11 known genes identified by dQTG-seq1. The value of |d/a| ranged from 0.00 to 1.73, and its average was 0.72 ± 0.55; the allelic frequency difference ranged from 0.13 to 0.22, and its average was 0.19 ± 0.026 (Supplemental Table 6). The same pattern of low dominance ratios and large allelic frequency differences was also observed for the G′ and ΔSNP methods. Thus, dQTG-seq1 and existing BSA methods have shortcomings in the detection of genes with a large dominance ratio or a small allelic frequency difference, although dQTG-seq1 has higher power than existing BSA methods. Here, we provide some simulation and theoretical analyses to support the thresholds of |d/a| > 2 and allelic frequency difference ≤ 0.15. First, in Figure 2A, d/a = 2.0 was viewed as a critical value based on the stability of power changes. That is, the power of small-effect (5%) QTL detection in 500 F2 individuals is approximately 30%. Second, we carried out one theoretical analysis to explain why the critical value of the allelic frequency difference is 0.15. In situations where QTL size ranged from 1% to 5% and sampling fraction was 5%, 10%, 15%, and 20% in the F2 population, we calculated allelic frequency differences between the two extreme pools. We found that most allelic frequency differences were less than 0.15 (Supplemental Figure 1), which makes these loci difficult to detect. The above two critical values were consistent with those in analysis of real data (Supplemental Table 6).

dQTG-seq2 identifies more loci/genes with large dominance ratios and small allelic frequency differences for rice yield in an F2 population

To address the aforementioned issue with dQTG-seq1 and existing BSA methods, each extreme plant from the low and high pools of the primary or secondary F2 population was sequenced, and the observed numbers of marker alleles and genotypes were used to calculate Gw (Figure 1). To validate dQTG-seq2, the above-mentioned rice dataset from Zhou et al. (2012) was re-analyzed. All the IMF2 individuals were sorted according to their phenotypic values, and 20% of the extreme high and extreme low individuals were selected to form the high and low pools. From the 20% extreme low and high pools, we could obtain 5%, 10%, and 15% extreme low and high pools. In these four datasets, dQTG-seq2 was used to detect QTLs for rice yield. As a result, 33 QTLs were found to be significantly associated with rice yield (Figure 4F; Supplemental Table 5). Among these QTLs, 26 previously reported genes were found to be truly associated with rice yield. Their dominance ratio |d/a| ranged from 0.04 to 36.61 with an average of 4.37 ± 7.69, and there were eight known genes with |d/a| > 2.0, four and three of which were identified by CIM and ICIM, respectively. The allelic frequency difference ranged from 0.00 to 0.22 with an average of 0.13 ± 0.064, and there were 14 known genes with an allelic frequency difference < 0.15, one of which was identified by CIM (Supplemental Table 6). These results confirmed the advantages of dQTG-seq2 over dQTG-seq1 and existing BSA methods. Among the above-mentioned 27 known genes, 25 were validated by one-way ANOVA by calculating the average values of individuals with each genotype (Supplemental Figure 2).
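The significance thresholds for the new methods are stated to come from permutation experiments, but the permutation scheme is not described in this section. The sketch below shows one common scheme under explicit assumptions: pool labels are shuffled among the sequenced extreme plants, a genotype-based G statistic is recomputed at every marker, and the (1 − α) quantile of the genome-wide maxima is taken as the threshold. All function names and defaults are hypothetical.

```python
import math
import random

def g_stat(table):
    """G statistic, 2 * sum(n * ln(n / E(n))), with E(n) from the table margins."""
    total = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    g = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n > 0:
                g += 2.0 * n * math.log(n * total / (rows[i] * cols[j]))
    return g

def marker_table(genos, labels):
    """2 x 3 table of genotype counts; genos are 0/1/2 copies of Q, labels 0 (low) or 1 (high)."""
    table = [[0, 0, 0], [0, 0, 0]]
    for g, lab in zip(genos, labels):
        table[lab][g] += 1
    return table

def permutation_threshold(markers, labels, n_perm=200, alpha=0.05, seed=0):
    """Assumed scheme: shuffle pool labels, keep the genome-wide maximum each time."""
    rng = random.Random(seed)
    shuffled = list(labels)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(g_stat(marker_table(m, shuffled)) for m in markers))
    maxima.sort()
    index = min(n_perm - 1, int(math.ceil((1 - alpha) * n_perm)) - 1)
    return maxima[index]

if __name__ == "__main__":
    labels = [0] * 20 + [1] * 20
    linked = [0] * 20 + [2] * 20        # marker perfectly associated with the pools
    unlinked = [0, 1, 2, 1] * 10        # marker unrelated to the pools
    threshold = permutation_threshold([linked, unlinked], labels)
    print("threshold:", threshold)
    print("linked marker G:", g_stat(marker_table(linked, labels)))
```

Using the maximum over all markers in each permutation controls the genome-wide error rate rather than the per-marker rate; whether dQTG.seq does the same is not stated here.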

Suitable sampling plans via Monte Carlo simulation studies

To obtain suitable sampling plans for the dQTG-seq1 and dQTG-seq2 methods, we conducted a series of Monte Carlo simulation studies. In these studies, sample sizes were set to 250, 500, 1000, and 2000; QTL sizes (r², %) were set to 2%, 5%, and 10%; and 30, 50, and 80 extreme individuals were sampled in each extreme pool. There were 1000 replicates. All samples in simulated dataset V were analyzed by dQTG-seq1 and dQTG-seq2, and the results are shown in Figure 5 and Supplemental Table 7. Taking a power of QTL detection of >40% as the criterion for a suitable sampling plan, we obtained the following plans. Eighty extreme individuals per pool from 2000 F2 plants were sufficient to identify 2% QTLs via dQTG-seq2, whereas more than 2000 F2 individuals were needed for dQTG-seq1. Thirty extreme individuals per pool from at least 250 F2 individuals were sufficient to identify 5%–10% QTLs via dQTG-seq1 and dQTG-seq2. In summary, selection of 80 extreme individuals for each pool from a large F2 population was suitable for identifying small-effect (2%) genes, whereas selection of 30 extreme individuals for each pool from a general F2 population was suitable for identifying large-effect genes.
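The sampling plans above can also be read as selection fractions per pool. The short calculation below is only an arithmetic aid; the grid of population and pool sizes is taken from the simulation design, and the helper name is hypothetical.

```python
def pool_fraction(n_pool, n_f2):
    """Fraction of the F2 population placed in one extreme pool."""
    return n_pool / n_f2

# Every combination of population size and pool size in the simulation design.
plans = {
    (n_f2, n_pool): pool_fraction(n_pool, n_f2)
    for n_f2 in (250, 500, 1000, 2000)
    for n_pool in (30, 50, 80)
}

if __name__ == "__main__":
    for (n_f2, n_pool), fraction in sorted(plans.items()):
        print(f"{n_pool:>2} plants per pool from {n_f2:>4} F2 plants: {fraction:.1%} per tail")
```

The plan recommended for 2% QTLs (80 of 2000 plants) corresponds to a 4% tail, and the plan for 5%–10% QTLs (30 of 250 plants) corresponds to a 12% tail.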

Figure 5

Monte Carlo simulation studies to determine suitable sampling plans for dQTG-seq1 and dQTG-seq2 under various QTL sizes (2%, 5%, and 10%) and numbers of plants in each extreme pool (30, 50, and 80) in an F2 population with sample sizes of 250, 500, 1000, and 2000. The statistical power of various sampling plans is indicated with black and gray colors, where black indicates suitable sampling plans.

Discussion

Significant progress in BSA has been made in this study. First, a new BSA strategy was proposed to quickly detect QTLs for quantitative traits in a primary or secondary F2 population. Specifically, equally mixed DNA/RNA samples of all the plants in each extreme pool are sequenced one pool at a time. The sequencing results are used to identify QTLs via dQTG-seq1. If the goal is to identify genes with a large dominance ratio and/or a small allelic frequency difference, a reserved DNA/RNA sample from each extreme plant is sequenced one plant at a time. The observed numbers of marker alleles and genotypes are used to detect QTLs via dQTG-seq2. In analyses of simulated and real data, dQTG-seq2 can identify extremely and fully over-dominant and small-effect QTLs, whereas dQTG-seq1 is feasible for detecting loci with a low dominance ratio and large effects. The two methods had significantly higher powers of QTL detection than existing BSA methods. This is because the new methods utilize both allelic and genotypic frequency differences between high and low pools (Table 1). As shown in Figure 6A–6E, the allelic frequency difference decreases as the dominance ratio increases, especially in the fully over-dominant model in which the allelic frequency difference is 0, but differences in marker genotypic frequencies exist, demonstrating the effectiveness of detecting a gene or locus using dQTG-seq2. Second, in BSA, we predicted the numbers of read counts of marker genotypes from the numbers of read counts of marker alleles, and the observed allele and predicted genotype numbers were used to construct the new statistic Gw. We found that the predicted values were acceptable when |d/a| < 2.0 and the allelic frequency difference was > 0.15, even in a primary F2 population (Supplemental Tables 1 and 5; Supplemental Figure 1). When |d/a| ≥ 2.0, the allelic frequency difference was < 0.15, or a = 0 and d ≠ 0, the predicted values were inaccurate, the power of QTL detection was low, and the FPR was high (>5‱) (Supplemental Tables 1 and 5; Supplemental Figure 1), indicating the difficulty of detecting extremely and fully over-dominant and small-effect genes or loci via dQTG-seq1.
We therefore proposed sequencing each extreme individual in the two pools (Figure 1) so that the numbers of marker alleles and genotypes could be obtained, and extremely and fully over-dominant and small-effect genes could be detected by dQTG-seq2 with high power and accuracy (Figures 2 and 4; Supplemental Table 6) and a low false-positive rate (Figure 2; Supplemental Table 2). In the analysis of real data for rice yield, dQTG-seq2 identified eight over-dominant known genes, four of which were detected by CIM and ICIM; the other BSA methods found no known genes. More importantly, dQTG-seq2 identified eight small-effect known genes, whereas the other methods detected no known small-effect genes. Finally, dQTG-seq2 had some slight advantages over CIM and ICIM in simulation and real data analysis, such as the low cost of marker genotyping and slightly higher power and accuracy. As shown in Figure 4, dQTG-seq2 identified more known genes than CIM and ICIM, suggesting that CIM and ICIM can be replaced by dQTG-seq2 in QTL mapping.

Table 1

Mathematical differences among dQTG-seq1, dQTG-seq2, and other BSA methods.

Method	Observed no. of alleles	Predicted no. of genotypes	Observed no. of genotypes	Statistic
dQTG-seq1	✓	✓		G_w = c₁G₁ + c₂G₂
dQTG-seq2	✓		✓	G_w = c₁G₁ + c₂G₂
ED	✓			ED = (p_AL − p_AH)² + (p_aL − p_aH)²
G′	✓			G′ = Σ_{j in a window} k_j G_j
ΔSNP	✓			ΔSNP-index = |p_AL − p_AH|

G₁ and G₂ are standard G statistics (Magwene et al., 2011) computed from the numbers of read counts of marker alleles and of marker genotypes, respectively; in each case n is the number of read counts and E(n) is the expectation of n. c₁ and c₂ are the weights of G₁ and G₂. p_AL and p_AH (p_aL and p_aH) are the frequencies of marker allele A (a) in the extreme low and high pools, respectively. k_j is the weight of the jth marker in the window.
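To make the allele-based statistics in Table 1 concrete, the following minimal Python sketch evaluates them for one marker. It is an illustration only, not the dQTG.seq implementation: the function names and the read counts are hypothetical, ED is coded exactly as written in Table 1, and the G statistic uses the standard form 2Σ n ln(n/E(n)) with expectations taken from the margins of the pools-by-alleles contingency table.

```python
import math

def allele_freqs(counts):
    """Convert (A, a) read counts from one pool into allele frequencies."""
    total = sum(counts)
    return [c / total for c in counts]

def delta_snp_index(low, high):
    """|p_AL - p_AH|, the difference in allele A frequency between pools."""
    return abs(allele_freqs(low)[0] - allele_freqs(high)[0])

def euclidean_distance(low, high):
    """ED as written in Table 1: (p_AL - p_AH)^2 + (p_aL - p_aH)^2."""
    p_low, p_high = allele_freqs(low), allele_freqs(high)
    return sum((pl - ph) ** 2 for pl, ph in zip(p_low, p_high))

def g_statistic(table):
    """G = 2 * sum(n * ln(n / E(n))) for a pools-by-categories count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            if n > 0:  # a zero cell contributes nothing to the sum
                g += 2.0 * n * math.log(n / expected)
    return g

# Hypothetical read counts of alleles (A, a) at one marker.
low_pool = (70, 30)
high_pool = (40, 60)

print("delta SNP-index:", delta_snp_index(low_pool, high_pool))
print("ED:", euclidean_distance(low_pool, high_pool))
print("G (alleles, 2 x 2 table):", g_statistic([low_pool, high_pool]))
```

The same `g_statistic` function accepts a 2 × 3 table of genotype counts, which is the extra information that the two new methods bring in.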

Figure 6

Theoretical analyses for allelic and genotypic frequencies under various genetic models and various statistics under different dominance ratios in two extreme pools of an F2 population.

(A–D) Additive (d/a = 0, A), completely-dominant (d/a = 1, B), extremely over-dominant (d/a = 6, C), and fully over-dominant (a = 0 and d ≠ 0, D) models.

(E) The frequencies of alleles and genotypes of one fully over-dominant QTL.

(F) The values of G1, G2, and Gw under various degrees of dominance (d/a = 0, 1, 2, 3, 4, and ∞). x_L (P_L) and x_H (P_H) are the standardized cut points (sampling fractions) in low and high pools, respectively. f_QL (f_QH) is the frequency of QTL allele Q in the low (high) pool, and f_QQL (f_QQH) is the frequency of QTL genotype QQ in the low (high) pool. a and d are the additive and dominant effects of one simulated QTL, respectively.

In quantitative genetics, over-dominance and complete dominance are defined as |d/a| > 1.2 and 0.8 < |d/a| < 1.2, respectively (Stuber et al., 1987). In this study, extreme over-dominance was defined as |d/a| ≥ 4.0, and full over-dominance was defined as d ≠ 0 and a = 0. Small- and large-effect QTLs (genes) were distinguished as r2 < 5% and r2 ≥ 5%, respectively, based on the results in Supplemental Figure 1 and the analysis of real rice data. The definitions in this study are helpful for understanding the new methods. The new methods can be used to identify QTLs in the aforementioned six types of F2 population. As is well known, a secondary F2 population is better than a primary F2 population for fine-mapping of QTLs. This is also true for BSA.
If no QTLs in a secondary F2 (F2-III to F2-VI) or no ideal QTLs in a primary F2 (F2-I and F2-II) are detected by dQTG-seq1, we recommend sequencing another reserved DNA/RNA sample from each extreme plant, and the observed numbers of marker alleles and genotypes can be used to identify QTLs via dQTG-seq2. We should point out that F2-VI is an improved F2 design based on Zhang et al. (2019), and the designs differ in whether two parents may be used to backcross the F1.

In addition, dQTG-seq2 is theoretically applicable to any heterogeneous mapping population, such as BC1Fs, Fs (s ≥ 3), genetic mating design, or association mapping population, because the numbers of marker alleles and genotypes are observed, and their theoretical numbers are calculated from a contingency table. Thus, dQTG-seq2 is feasible. Detailed results will be reported soon.

Sometimes, F2 individuals have a large experimental error. The F2:3 design (Zhang and Xu, 2004) can be used to overcome this problem. That is, DNA/RNA is extracted from each F2 plant, and extreme individuals are determined based on the average phenotypes of F2:3 families. This design is frequently used in maize and cotton genetic analysis. IMF2 derived from DH lines or RILs is also used to overcome this problem (Hua et al., 2003). This is because replicated experiments can be implemented in IMF2, especially from RILs with more recombinants.

The new methods are different from previous BSA and QTL mapping methods. To reduce the time required to construct the mapping population and the cost of marker genotyping (Darvasi and Soller, 1994), BSA was proposed at an early stage to detect the association of a marker with a trait (Giovannoni et al., 1991; Michelmore et al., 1991). This method is also frequently used to locate a single-gene trait in Mendelian genetics (Ruyter-Spira et al., 1997).
Specifically, some closely linked markers around one marker with significant differences in allele frequency between the two extreme pools are used to fine-map the gene via linkage analysis. With the development of sequencing technology and reductions in sequencing costs, the integration of BSA with sequencing technology provides new opportunities for quickly detecting QTGs. To date, a series of methods have been proposed (Magwene et al., 2011; Abe et al., 2012; Takagi et al., 2013a, 2013b; Yan et al., 2016). In these methods, the numbers of read counts of marker alleles in low and high pools in primary or secondary F2 populations are used to construct their statistics. In this study, the numbers of (read counts of) marker alleles and genotypes in the two extreme pools are used to construct the new statistic Gw. If the trait is controlled by a fully over-dominant gene, in theory there are no differences in allelic frequencies of markers around the trait gene between the two extreme pools (Supplemental File 1), highlighting the inability of existing BSA methods to locate fully over-dominant genes (Schneeberger, 2014; Wang et al., 2019). Clearly, the present study solves this problem, which has been pending for some time.

As described in model 1, Gw consists of G1 and G2, which are based on the numbers of read counts of marker alleles and genotypes, respectively. If G1 = 0, Gw is determined by G2; if G2 = 0, Gw is the G′ of Magwene et al. (2011). Thus, Gw is an extension of G′. Although the distributions of G1 and G2 are χ2 (Magwene et al., 2011), G1 and G2 are correlated; thus, the distribution of Gw is unknown. This makes it difficult to determine the critical value of Gw for significant QTGs. In the present study, 1000 permutation experiments were used to determine the critical value. The results are presented in Supplemental Figure 3.
They show that the critical value of Gw is 6.52–7.24 at the 0.10 probability level and 7.60–8.52 at the 0.05 probability level using the dQTG-seq2 method. We suggest that the 0.10 threshold be used with dQTG-seq1 and the 0.05 threshold be used with dQTG-seq2. This is because the predicted numbers of read counts of marker genotypes have residual error. This may explain why different critical values are used in Figures 4A (6.52) and 4F (7.60). In addition, we have listed additional Gw thresholds under various probability levels in Supplemental Table 8 and Supplemental Figure 4.

The new method is also applicable in other situations. If structural variants are obtained from sequencing, the markers with structural variants can also be used in the new methods.

Although the new methods have many advantages over the existing methods, they have some limitations. First, in the additive–dominant model, the numbers of marker genotypes in the two extreme pools need to be predicted when dQTG-seq1 is used, and the prediction accuracy decreases with increasing dominance ratio; in particular, when a QTG is extremely over-dominant (|d/a| > 4.0), dQTG-seq1 does not work. To address this issue, each extreme plant is sequenced, and dQTG-seq2 is used. Second, the new and existing BSA methods are affected by QTL size, sample size, sampling fraction, and sequencing depth. The best values for these factors can improve the accuracy of these methods (Zou et al., 2016) (Supplemental Table 9). In this study, we also determined the best sampling plan for the new methods (Figure 5). Although marker segregation distortion is a common phenomenon in bi-parental populations, only three markers around 27 known genes (Supplemental Table 6) had segregation distortion, indicating that marker segregation distortion had little influence on the analysis of real rice data.
In our simulation study, marker segregation distortion affected the detection of genes (Supplemental Table 10), and its methodological extension needs to be further addressed in the near future.

Materials and methods

Materials

The maize dataset was derived from Zhang et al. (2019), in which the F1 plants between HZS (an elite inbred line) and 1462 (a temperate inbred line) were self-pollinated and backcrossed to HZS to produce F2 and BC1F1 seeds, respectively. The 20% tallest (580) and 20% shortest (567) plants were selected from each BC1F1:2 family and bulked in equal ratios into high and low pools. The two pools were collectively subjected to whole-genome sequencing with up to >280× coverage, and 197 021 high-quality SNPs were generated.

The genotypic and phenotypic observations from a rice IMF2 population were downloaded from Zhou et al. (2012) (http://www.pnas.org/content/suppl/2012/09/07/1214141109.DCSupplemental). In the IMF2 population, there were 278 individuals and 1619 bin markers, and yield per plant in 1998 was re-analyzed. All the IMF2 individuals were sorted according to their phenotypic values, and 20% of the extreme high and low individuals were selected to form the high and low pools. Using dQTG-seq1, the observed numbers of marker alleles were used to predict the numbers of marker genotypes, and the observed and predicted numbers in extreme pools were used to calculate Gw. Using dQTG-seq2, the observed numbers of marker alleles and genotypes in extreme pools were used to calculate Gw.

Idea for constructing the statistic in the new method

In Figure 6A–6D, we drew the mixture distributions of one simulated QTG with four types of genetic model (a ≠ 0 and d = 0; a = d; d = 6a; a = 0 and d ≠ 0) in the F2 population. We found that the difference in allele frequencies between high and low pools decreased with increasing dominance ratio |d/a|; in particular, no difference was observed when a = 0 and d ≠ 0. The theoretical derivation is shown in Supplemental File 1, and the frequency changes in marker alleles and genotypes are shown in Figure 6E. Almost all the statistics in the existing BSA methods are derived from the differences between allelic read frequencies in the two pools. This explains the inability of existing BSA methods to identify QTGs with fully over-dominant effects (Wang et al., 2019). It should be noted that differences in marker genotype frequencies between the two pools exist under all types of genetic model. Thus, genotypic frequencies should be used to detect all types of QTGs, including fully over-dominant QTLs. The values of G1, G2, and Gw are also shown in Figure 6F. As the dominance ratio increases, the value of G1 calculated from the frequencies of marker alleles decreases, and the value of G2 calculated from the frequencies of marker genotypes increases. In the extreme case (full over-dominance), the value of G1 is close to zero, and the value of G2 is close to Gw. This is why the new statistic Gw in this study is constructed from the frequencies of (read counts of) marker alleles and genotypes.
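The argument above can be checked numerically with a small Python sketch. It is not the derivation in Supplemental File 1: it assumes a single QTL segregating 1:2:1 in the F2, effects a and d already standardized by the residual standard deviation, a marker located exactly at the QTL, and noise-free expected counts for a hypothetical 100 plants per pool. All function names and the chosen effect values are illustrative.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf        # standard normal cumulative distribution function
WEIGHTS = (0.25, 0.5, 0.25)   # F2 proportions of genotypes QQ, Qq, qq

def mixture_cdf(x, means):
    """CDF of the standardized phenotype: a 1:2:1 mixture of unit-variance normals."""
    return sum(w * PHI(x - m) for w, m in zip(WEIGHTS, means))

def cut_point(fraction, means):
    """Bisection search for the cut point whose lower tail holds `fraction`."""
    lo, hi = -20.0, 20.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if mixture_cdf(mid, means) < fraction:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def pool_genotype_freqs(a, d, fraction=0.2):
    """Frequencies of QQ, Qq, qq in the low and high tails of the F2."""
    means = (a, d, -a)
    x_low = cut_point(fraction, means)
    x_high = cut_point(1.0 - fraction, means)
    low = [w * PHI(x_low - m) / fraction for w, m in zip(WEIGHTS, means)]
    high = [w * (1.0 - PHI(x_high - m)) / fraction for w, m in zip(WEIGHTS, means)]
    return low, high

def allele_q_freq(geno):
    """Frequency of allele Q implied by genotype frequencies (QQ, Qq, qq)."""
    return geno[0] + geno[1] / 2.0

def g_statistic(table):
    """G = 2 * sum(n * ln(n / E(n))) with expectations from the table margins."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    grand = sum(rows)
    return sum(2.0 * n * math.log(n * grand / (rows[i] * cols[j]))
               for i, r in enumerate(table) for j, n in enumerate(r) if n > 0)

def g1_g2(a, d, plants_per_pool=100, fraction=0.2):
    """G1 from expected allele counts and G2 from expected genotype counts."""
    low, high = pool_genotype_freqs(a, d, fraction)
    geno = [[plants_per_pool * f for f in pool] for pool in (low, high)]
    alleles = [[2 * plants_per_pool * allele_q_freq(pool),
                2 * plants_per_pool * (1.0 - allele_q_freq(pool))]
               for pool in (low, high)]
    return g_statistic(alleles), g_statistic(geno)

models = [("additive", 0.5, 0.0), ("complete dominance", 0.5, 0.5),
          ("extreme over-dominance", 0.1, 0.6), ("full over-dominance", 0.0, 0.5)]
for label, a, d in models:
    g1, g2 = g1_g2(a, d)
    print(f"{label:24s} G1 = {g1:8.3f}   G2 = {g2:8.3f}")
```

Under these assumptions, the full over-dominance case yields equal allele Q frequencies in the two pools, so G1 vanishes up to rounding, while the heterozygote frequency still differs between pools and G2 stays clearly positive.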

Prediction for the numbers of marker genotypes in high and low pools

If high and low pools are sequenced in groups, allelic read frequencies, rather than genotypic read frequencies, can be obtained. Thus, only the differences in allelic read frequencies between the two extreme pools are available for the detection of QTGs in existing BSA methods, and, as described above, it is difficult to detect QTGs with fully over-dominant effects. To solve this issue, genotypic read frequencies in high and low pools should be used. Thus, we need to predict the numbers of read counts of marker genotypes in the two pools.

In theory, these numbers are functions of the following quantities: r is the recombination fraction between the current marker and the putative QTG; P_L and P_H are the sampling fractions in low and high pools, respectively; P_L(·) and P_H(·) are the probabilities of genotypes in low and high pools, respectively; a and d are the additive and dominant effects standardized by σ, respectively; x_L and x_H are the cut points in low and high pools, respectively, each standardized as (x − μ)/σ; μ and σ² are the mean and residual variance, respectively; and Φ(·) is the cumulative distribution function of the standard normal distribution (Lander and Botstein, 1989). Read counts are indexed by pool (i = 1 for low and i = 2 for high) and by marker allele (j = 1, 2) or marker genotype (j = 1, 2, and 3).

However, the parameters a, d, r, x_L, and x_H are unknown. To estimate these five parameters, five equations are used (Darvasi and Soller, 1992; Lebowitz et al., 1987); they are written in terms of f_ML (f_QL) and f_MH (f_QH), the frequencies of read counts of marker allele M (Q) at the currently detected marker (or QTG) in low and high pools, respectively. The five parameters are solved using the R package BB.
The initial value for is set as the maximum difference of marker allelic read frequencies in the current chromosome between high and low pools; the initial values for x_L and x_H are determined based on P_L, P_H, a = 0, and d = 0; the initial values for a and d are set to zero; the initial value for r is set to the recombination fraction between the putative QTL and the marker with the maximum allele frequency difference on the chromosome, based on the relationship between physical and genetic distances (Supplemental Table 11) and the Haldane function (Haldane, 1919). If the estimation of the five parameters does not converge, G2 is set to zero.
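Two of the starting values described above follow from textbook formulas, sketched here in Python. This is an illustration under stated assumptions rather than the dQTG.seq code: the Haldane map function is the standard r = (1 − e^(−2m))/2 with m in Morgans, and the starting cut points are an interpretation of "based on P_L, P_H, a = 0, and d = 0", namely that with no QTL effect the standardized trait is standard normal. The function names are hypothetical, and the conversion from physical to genetic distance (Supplemental Table 11) is not reproduced.

```python
import math
from statistics import NormalDist

def haldane_recombination(distance_cm):
    """Haldane (1919) map function: r = (1 - exp(-2m)) / 2, with m in Morgans."""
    morgans = abs(distance_cm) / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))

def initial_cut_points(p_low, p_high):
    """Starting x_L and x_H when a = d = 0, i.e. a standard normal trait.

    The lower tail below x_L holds p_low; the upper tail above x_H holds p_high.
    """
    std_normal = NormalDist()
    return std_normal.inv_cdf(p_low), std_normal.inv_cdf(1.0 - p_high)

# Hypothetical inputs: a marker 10 cM from the putative QTL, 20% per pool.
print("r at 10 cM:", haldane_recombination(10.0))
print("initial (x_L, x_H):", initial_cut_points(0.20, 0.20))
```

With equal sampling fractions the two starting cut points are symmetric about zero, which is a convenient neutral start before the nonlinear solver adjusts them.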

Construction of a new statistic, Gw

A new statistic, Gw, is a weighted mean of G1 and G2 and is indicated by

$$G_w = c_1 G_1 + c_2 G_2$$

where G1 and G2 are standard G statistics (Magwene et al., 2011) calculated from the numbers of read counts of marker alleles and genotypes, respectively; the theoretical numbers E(n) of these read counts are calculated from the 2 × 2 and 2 × 3 contingency tables, respectively. The Smooth_G value is a weighted average of the original statistic (G) values for all the SNPs in a window (Magwene et al., 2011) and is indicated by

$$\text{Smooth\_G} = \sum_{j \,\in\, \text{window}} k_j G_j$$

where k_j is the weight of the jth marker in the window. Although G1 and G2 obey the χ2 distribution, they are correlated. Thus, the distribution of Gw is unclear. In this case, permutation experiments are used to determine the critical value of Gw for significant QTGs (Fisher, 1935; Pitman, 1937).

The purposes of a series of Monte Carlo simulation studies were to confirm whether the new methods could identify fully and extremely over-dominant and small-effect genes, to explore the conditions under which dQTG-seq1 was feasible, and to obtain the best sampling plans for the new methods.

In the first simulation experiment, one small-effect (5%) QTL with various degrees of dominance (0.0, 1.0, 2.0, and 3.0) was simulated to imitate a secondary F2 population with a sample size of 500. This QTL was placed at the position of the 250th marker on a chromosome 100 cM in length covered by 500 evenly distributed markers. The phenotypes were simulated from the population mean (100), the QTL effect, and residual variance (10). The sampling fraction was 20% for the high (low) pool. This was simulated dataset I. Each sample was analyzed by the dQTG-seq1, G′, ΔSNP, and ED methods. The critical values of significant QTLs at the 0.10 and 0.05 probability levels were determined by 10 000 permutation experiments for ED, dQTG-seq1, and dQTG-seq2 and by using the R package QTLseqr (Mansfeld and Grumet, 2018) for the G′ and ΔSNP methods.
The number of replicates was 100 for ICIM and 1 000 for the other methods, and their results were used to evaluate the statistical power for this QTL; detected QTLs within 1 cM of the simulated QTL were viewed as true.

In the second simulation experiment, one small-effect (5%) QTL was simulated under extremely (d/a = 4) and fully (a = 0 and d ≠ 0) over-dominant models to imitate a secondary F2 population with a sample size of 500. The other settings were the same as those in simulated dataset I. This was simulated dataset II.

In the third simulation experiment, five QTLs were simulated under extremely (d/a = 4) and fully (a = 0 and d ≠ 0) over-dominant models to imitate a primary F2 population with a sample size of 500. Their positions and effects are shown in Supplemental Table 3. The other settings were the same as those in simulated dataset I. This was simulated dataset III.

In the fourth simulation experiment, two QTLs with d/a = 0.5 were simulated on one chromosome. The sample size in the F2 was 500, each QTL size was 10%, and the QTLs were placed at 25.00 cM and 45.00 cM. The other settings were the same as those in simulated dataset I. This was simulated dataset IV.

In the fifth simulation experiment, we investigated the effect of sample size, QTL size, and number of plants in each extreme pool on the statistical power of dQTG-seq1 and dQTG-seq2. QTL sizes were set to 2%, 5%, and 10%; sample sizes in the F2 were set to 250, 500, 1 000, and 2 000; and the number of plants in each extreme pool was set to 30, 50, and 80. The other settings were the same as those in simulated dataset I. The simulated datasets of the 36 parameter combinations were analyzed by dQTG-seq1 and dQTG-seq2 to obtain suitable sampling plans for the two new methods. This was simulated dataset V.

In the sixth simulation experiment, to investigate the effect of marker segregation distortion on QTL detection, one QTL with d/a = 1.0 was simulated in the F2.
The sample size was 500, the QTL position overlapped with a segregation distortion locus (SDL) at 50.00 cM, the sizes of both QTL and SDL were 10%, and the genetic models of the SDL were set to additive (a ≠ 0 and d = 0), completely dominant (d/a = 1.0), and fully over-dominant (a = 0 and d ≠ 0). The other settings were the same as those in simulated dataset I. This was simulated dataset VI. The above simulated datasets are found in Supplemental Data 1.
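The simulation and permutation steps described in this section can be illustrated with a single-locus Python sketch. It is deliberately much simpler than the published experiments: it simulates only the QTL genotype itself (no linked markers, no read sampling), it assumes that QTL size means r2 = Vg/(Vg + Ve) with the usual F2 genetic variance Vg = a²/2 + d²/4 and Ve = 10, and it permutes phenotypes for one locus rather than taking a genome-wide maximum. The statistic used is the genotype-based G on the 2 × 3 pool table; all names, the seed, and the number of permutations are illustrative choices, not those of dQTG.seq.

```python
import math
import random

def additive_effect(r2, ratio, resid_var=10.0):
    """Solve a from r2 = Vg / (Vg + Ve), Vg = a^2/2 + d^2/4, and d = ratio * a."""
    vg = r2 / (1.0 - r2) * resid_var
    return math.sqrt(vg / (0.5 + ratio ** 2 / 4.0))

def simulate_f2(n, a, d, mean=100.0, resid_var=10.0, rng=random):
    """Return (genotype, phenotype) pairs; genotype = copies of allele Q."""
    sd = math.sqrt(resid_var)
    effects = {2: a, 1: d, 0: -a}
    plants = []
    for _ in range(n):
        g = rng.choice((0, 1, 1, 2))  # 1:2:1 segregation in the F2
        plants.append((g, mean + effects[g] + rng.gauss(0.0, sd)))
    return plants

def extreme_pools(plants, fraction=0.2):
    """Lowest and highest `fraction` of plants ranked by phenotype."""
    ranked = sorted(plants, key=lambda p: p[1])
    k = int(round(fraction * len(ranked)))
    return ranked[:k], ranked[-k:]

def genotype_counts(plants):
    """Counts of genotypes qq, Qq, QQ (0, 1, or 2 copies of Q)."""
    counts = [0, 0, 0]
    for g, _ in plants:
        counts[g] += 1
    return counts

def g_statistic(table):
    """G = 2 * sum(n * ln(n / E(n))) with expectations from the table margins."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    grand = sum(rows)
    return sum(2.0 * n * math.log(n * grand / (rows[i] * cols[j]))
               for i, r in enumerate(table) for j, n in enumerate(r) if n > 0)

def pool_g2(plants, fraction=0.2):
    """Genotype-based G from the 2 x 3 table of low and high pools."""
    low, high = extreme_pools(plants, fraction)
    return g_statistic([genotype_counts(low), genotype_counts(high)])

def permutation_threshold(plants, stat, n_perm=200, alpha=0.05, rng=random):
    """Shuffle phenotypes over genotypes; return the (1 - alpha) null quantile."""
    genos = [g for g, _ in plants]
    phenos = [y for _, y in plants]
    null = []
    for _ in range(n_perm):
        rng.shuffle(phenos)
        null.append(stat(list(zip(genos, phenos))))
    null.sort()
    index = min(n_perm - 1, int(math.ceil((1.0 - alpha) * n_perm)) - 1)
    return null[index]

rng = random.Random(2022)                  # arbitrary seed for repeatability
a = additive_effect(r2=0.05, ratio=0.0)    # additive QTL explaining 5%
plants = simulate_f2(500, a, d=0.0, rng=rng)
observed = pool_g2(plants)
threshold = permutation_threshold(plants, pool_g2, n_perm=200, alpha=0.05, rng=rng)
print("observed G2:", observed, " permutation threshold:", threshold)
```

Repeating the simulation many times and recording how often the observed value exceeds the threshold would give an empirical power estimate in the spirit of the experiments above, though the numbers from this single-locus sketch are not comparable with the published results.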

Funding

This work was supported by the (32070557 and 31871242), the (2662020ZKPY017), and the Scientific and Technological Self-Innovation Foundation (2014RC020).

Author contributions

Y.-M.Z. conceived and designed the study. P.L. wrote the code and conducted Monte Carlo simulation studies. P.L., G.L., Y.-W.Z., J.-F.Z., J.-Y.L., and Y.-M.Z. performed the data analyses. P.L. and Y.-M.Z. developed the method, wrote the draft, and revised the manuscript. All authors reviewed the manuscript.

57 in total

1. Genome sequencing reveals agronomically important loci in rice using MutMap.

Authors: Akira Abe; Shunichi Kosugi; Kentaro Yoshida; Satoshi Natsume; Hiroki Takagi; Hiroyuki Kanzaki; Hideo Matsumura; Kakoto Yoshida; Chikako Mitsuoka; Muluneh Tamiru; Hideki Innan; Liliana Cano; Sophien Kamoun; Ryohei Terauchi
Journal: Nat Biotechnol Date: 2012-01-22 Impact factor: 54.908

2. Identification of markers linked to disease-resistance genes by bulked segregant analysis: a rapid method to detect markers in specific genomic regions by using segregating populations.

Authors: R W Michelmore; I Paran; R V Kesseli
Journal: Proc Natl Acad Sci U S A Date: 1991-11-01 Impact factor: 11.205

3. Identification of MIR390a precursor processing-defective mutants in Arabidopsis by direct genome sequencing.

Authors: Josh T Cuperus; Taiowa A Montgomery; Noah Fahlgren; Russell T Burke; Tiffany Townsend; Christopher M Sullivan; James C Carrington
Journal: Proc Natl Acad Sci U S A Date: 2009-12-14 Impact factor: 11.205

4. A hidden Markov-model for gene mapping based on whole-genome next generation sequencing data.

Authors: Jürgen Claesen; Tomasz Burzykowski
Journal: Stat Appl Genet Mol Biol Date: 2015-02

5. QTLseqr: An R Package for Bulk Segregant Analysis with Next-Generation Sequencing.

Authors: Ben N Mansfeld; Rebecca Grumet
Journal: Plant Genome Date: 2018-07 Impact factor: 4.089

6. Selective DNA pooling for determination of linkage between a molecular marker and a quantitative trait locus.

Authors: A Darvasi; M Soller
Journal: Genetics Date: 1994-12 Impact factor: 4.562

7. Bulked segregant analysis using microsatellites: mapping of the dominant white locus in the chicken.

Authors: C P Ruyter-Spira; Z L Gu; J J Van der Poel; M A Groenen
Journal: Poult Sci Date: 1997-02 Impact factor: 3.352

8. Dissection of genetically complex traits with extremely large pools of yeast segregants.

Authors: Ian M Ehrenreich; Noorossadat Torabi; Yue Jia; Jonathan Kent; Stephen Martis; Joshua A Shapiro; David Gresham; Amy A Caudy; Leonid Kruglyak
Journal: Nature Date: 2010-04-15 Impact factor: 49.962

9. High-resolution genetic mapping with pooled sequencing.

Authors: Matthew D Edwards; David K Gifford
Journal: BMC Bioinformatics Date: 2012-04-19 Impact factor: 3.169

10. Improved linkage analysis of Quantitative Trait Loci using bulk segregants unveils a novel determinant of high ethanol tolerance in yeast.

Authors: Jorge Duitama; Aminael Sánchez-Rodríguez; Annelies Goovaerts; Sergio Pulido-Tamayo; Georg Hubmann; María R Foulquié-Moreno; Johan M Thevelein; Kevin J Verstrepen; Kathleen Marchal
Journal: BMC Genomics Date: 2014-03-19 Impact factor: 3.969

2 in total

1. Multi-faceted approaches for breeding nutrient-dense, disease-resistant, and climate-resilient crop varieties for food and nutritional security.

Authors: Reyazul Rouf Mir; Sachin Rustgi; Yuan-Ming Zhang; Chenwu Xu
Journal: Heredity (Edinb) Date: 2022-05-23 Impact factor: 3.832

2. dQTG.seq: A comprehensive R tool for detecting all types of QTLs using extreme phenotype individuals in bi-parental segregation populations.

Authors: Pei Li; Liu-Qiong Wei; Yi-Fan Pan; Yuan-Ming Zhang
Journal: Comput Struct Biotechnol J Date: 2022-05-14 Impact factor: 6.155
