
Identification of superior parental lines for biparental crossing via genomic prediction.

Abstract

A parental selection approach based on genomic prediction has been developed to help plant breeders identify a set of superior parental lines from a candidate population before conducting field trials. A classical parental selection approach based on genomic prediction usually involves truncation selection, i.e., selecting the top fraction of accessions on the basis of their genomic estimated breeding values (GEBVs). However, truncation selection inevitably results in the loss of genomic diversity during the breeding process. To preserve genomic diversity, the selection of closely related accessions should be avoided during parental selection. We thus propose a new index to quantify the genomic diversity for a set of candidate accessions, and analyze two real rice (Oryza sativa L.) genome datasets to compare several selection strategies. Our results showed that the pure truncation selection strategy produced the best starting breeding value but the least genomic diversity in the base population, leading to less genetic gain. On the other hand, strategies that considered only genomic diversity resulted in greater genomic diversity but less favorable starting breeding values, leading to more genetic gain but unsatisfactorily performing recombinant inbred lines (RILs) in progeny populations. Among all strategies investigated in this study, compromise strategies, which considered both GEBVs and genomic diversity, produced the best or second-best performing RILs, mainly because these strategies balance the starting breeding value with the maintenance of genomic diversity.


Year: 2020 PMID: 33270706 PMCID: PMC7714229 DOI: 10.1371/journal.pone.0243159

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Biparental crossing is a common scheme used for pure-line breeding in self-pollinated crops such as rice (Oryza sativa L.), wheat (Triticum aestivum L.), soybean (Glycine max [L.] Merr.), and oat (Avena sativa L.). Plant breeders cross two inbred parental lines to produce the F1 population. Then, a subset of diverse individuals from the F2 population is selected to produce potential recombinant inbred lines (RILs) after several generations of selfing. Parental lines play a fundamental role in line development, and significantly affect the performance of the resulting RILs. However, the identification of superior parental lines from germplasm collections for maximizing selection response in subsequent cycles remains challenging for plant breeders [1, 2]. Another practical concern is that the number of possible crosses in such a breeding program is often far greater than what can be handled in the field. Therefore, developing a method that can identify a limited number of superior parents before field trials would be of great help to plant breeders.

Genomic selection, based on the statistical method of genomic prediction (GP), has been used to improve breeding efficiency in dairy cattle [3] and a variety of crops [4-8]. The main concept of GP is to capture all the effects of quantitative trait loci (QTLs) using high-density DNA markers over the whole genome [9]. The most commonly used DNA markers are single nucleotide polymorphisms (SNPs). A GP model is first built using the phenotypic and genotypic data of a training population. Then, genomic estimated breeding values (GEBVs) for the candidate individuals with known genotypic data are predicted through the resulting GP model. Two kinds of mixed linear model methods are widely employed to obtain GEBVs: (i) best linear unbiased prediction (BLUP) based on markers, and (ii) BLUP based on a genomic relationship matrix.
To perform marker-based BLUP, the marker effects are treated as random effects, and GEBVs of individuals are calculated by multiplying their marker scores by these BLUP estimates; the ridge regression BLUP (rr-BLUP) method [9, 10] follows this approach. To perform genomic relationship matrix-based BLUP, the genotypic values of individuals are treated as random effects and estimated through a genomic relationship matrix; this approach is used in the genomic BLUP (GBLUP) model [11, 12].

Gaynor et al. [13] proposed a two-part strategy for implementing genomic selection for line development, which addresses two components: (i) a product development component, to identify inbred lines either for hybrid parent development or cultivar release, and (ii) a population improvement component, to increase the frequency of favorable alleles through rapid recurrent genomic selection. Gaynor et al. [13] conducted a stochastic simulation and showed that programs using the two-part strategy generated up to 2.5- and 1.5-fold more genetic gain than conventional programs and the best performing standard genomic selection strategy, respectively. Additionally, Yao et al. [14] combined GP with Monte Carlo simulation to select superior parents in wheat breeding programs before field trials. The authors evaluated each cross with a usefulness function defined on a selection index that incorporates yield and two quality traits, and concluded that the use of the usefulness function for parental selection resulted in higher genetic gain than the use of mid-parent GEBV, implying that a parental selection strategy should not rely solely on the GEBVs of the candidate accessions. By selecting parental lines with the highest GEBVs, breeders hope to maximally pass the favorable traits of parental lines on to their progeny. However, the truncation selection approach risks the elimination of several favorable QTLs from the breeding population because of a lack of genomic diversity [15].
Therefore, in this study, we took both GEBV and genomic diversity into account for identifying superior parents in a biparental crossing program. We constructed a GBLUP model for a specific target trait to predict the GEBVs of the candidate accessions. We first proposed a new index to quantify the genomic diversity of a set of candidate accessions. Subsequently, we simulated the genotypic data for progeny populations derived from a cross over successive generations, and predicted the GEBVs of the simulated progeny populations through the trained GBLUP model. Then, we made generation advancement decisions according to the resulting GEBVs. Finally, we assessed a set of parental lines based on F10 RILs. We compared the performance of several selection strategies via analysis of two real rice genome datasets.

Materials and methods

Rice genome datasets

Dataset I

The rice genome dataset originally collected for a genome-wide association study (GWAS) in Zhao et al. [16] was used to illustrate the proposed procedure. This dataset contains 44,100 SNP variants and 36 traits of 413 O. sativa accessions, comprising five subpopulations and one admixed group. SNPs with missing rate > 0.05 and minor allele frequency < 0.05 were removed from the dataset. To reduce redundant collinearity in the genomic relationship matrix, one SNP was randomly selected from each bin of 20,000 bp over each chromosome. A scatter plot based on the first two principal components (PCs) using the retained 11,047 SNPs is displayed in Fig 1, which is almost the same as the corresponding plot generated using all 44,100 SNPs [16]. The SNP genotype at each locus was coded as -1, 0, or 1, where 1 indicates the homozygous genotype of the major allele; -1 indicates the homozygous genotype of the minor allele; and 0 indicates the heterozygous genotype. After SNP coding, any missing locus was imputed as 1. Six traits were analyzed: brown rice seed width (BRSW), florets per panicle (FPP), flowering time at Arkansas (FTAA), flowering time at Faridpur (FTAF), plant height (PH), and panicle number per plant (PNPP).
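The filtering and coding steps described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the input layout (a matrix of minor-allele counts with NaN for missing calls), the function name `filter_and_code`, and the reading of the quality filter as "drop a SNP if either threshold is violated" are all assumptions.

```python
import numpy as np

def filter_and_code(counts, pos, max_missing=0.05, min_maf=0.05,
                    bin_size=20_000, seed=0):
    """Filter the SNPs of one chromosome and code them as -1/0/1.

    counts : (n_individuals, n_snps) float array with the number of
             minor-allele copies (0, 1, 2) and np.nan for missing calls
             (hypothetical input layout).
    pos    : (n_snps,) integer array of base-pair positions.
    Returns the coded matrix and the column indices that were kept.
    """
    rng = np.random.default_rng(seed)
    missing_rate = np.isnan(counts).mean(axis=0)
    freq = np.nanmean(counts, axis=0) / 2.0
    maf = np.minimum(freq, 1.0 - freq)
    passed = np.flatnonzero((missing_rate <= max_missing) & (maf >= min_maf))

    # Keep one randomly chosen SNP per bin of `bin_size` base pairs.
    bins = pos[passed] // bin_size
    kept = np.array(sorted(int(rng.choice(passed[bins == b]))
                           for b in np.unique(bins)), dtype=int)

    # 1 = homozygous major, 0 = heterozygous, -1 = homozygous minor.
    coded = 1.0 - counts[:, kept]
    coded[np.isnan(coded)] = 1.0          # impute missing calls as 1
    return coded.astype(int), kept
```

Imputing missing calls as 1 corresponds to assigning the homozygous major genotype, which matches the coding rule stated in the text.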

Fig 1

Scatter plot of 413 accessions with 11,047 SNPs according to the first two principal components (PCs) for the 44k rice genome dataset.

IND: indica rice; TEJ: temperate japonica rice; TRJ: tropical japonica rice; AUS: Aus rice; AROMATIC: aromatic rice; ADMIX: admixed group.


Dataset II

The rice genome dataset collected for a genomic selection study [8] was further analyzed as Dataset II. This dataset contains 73,147 SNP variants and 363 elite breeding lines belonging to the indica or indica–admixed group. Phenotypic observations include four years (2009–2012; two seasons per year [dry and wet]) of data on grain yield (YLD), flowering time (FT), and plant height (PH), although PH data in the wet season of 2009 were not available. Phenotypic values of 35 out of 363 individuals were missing; therefore, adjusted means of only 328 individuals were used in this study. Additionally, only 10,772 out of 73,147 SNPs were used in this study: one SNP marker was selected per 0.1-cM interval on each chromosome, because this subset of the full marker set has been shown to be efficient enough for genomic selection in this collection of rice germplasm [8].

Monte Carlo simulation of the genotypic data of progeny populations

To simulate the genotypic data of progeny populations, the Gramene Annotated Nipponbare Sequence [17] was used to estimate recombination rates between two adjacent SNPs. The Gramene Annotated Nipponbare Sequence database contains both physical and linkage distances between SNPs, which can be downloaded from http://archive.gramene.org. The genetic positions of SNPs were estimated via linear interpolation between the two markers flanking each SNP. Once the genetic positions were obtained, the recombination rates between adjacent SNPs were estimated using Haldane’s mapping function [18]:

$$r = \frac{1}{2}\left(1 - e^{-2X}\right),$$

where r is the recombination rate between markers A and B; X is the linkage distance between markers A and B, expressed in Morgans; and e is Euler’s number, a mathematical constant approximately equal to 2.71828. Based on a series of Bernoulli distributions and the estimated recombination rates, the crossover of each chromosome was simulated to yield the sequence of a gamete. Then, two gametes were paired to produce the genotypic data for the progeny.
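The gamete simulation can be illustrated with a short sketch, assuming a single chromosome, linkage distances in Morgans, and one independent Bernoulli draw per marker interval. The function names `haldane_r` and `simulate_gamete` are hypothetical and not taken from the original study.

```python
import numpy as np

def haldane_r(x_morgans):
    """Haldane's mapping function: recombination rate for a map distance X."""
    return 0.5 * (1.0 - np.exp(-2.0 * np.asarray(x_morgans, dtype=float)))

def simulate_gamete(hap_a, hap_b, r, rng):
    """Draw one gamete from a diploid parent.

    hap_a, hap_b : (n_snps,) arrays, the two parental haplotypes.
    r            : (n_snps - 1,) recombination rates between adjacent SNPs.
    """
    start = rng.integers(2)                  # haplotype the gamete starts on
    switches = rng.random(len(r)) < r        # Bernoulli draw per interval
    phase = (start + np.concatenate(([0], np.cumsum(switches)))) % 2
    return np.where(phase == 0, hap_a, hap_b)

# Usage: distances between adjacent SNPs in Morgans (illustrative values).
rng = np.random.default_rng(42)
r = haldane_r([0.01, 0.05, 0.20])
gamete = simulate_gamete(np.zeros(4, dtype=int), np.ones(4, dtype=int), r, rng)
```

Pairing two such gametes, one from each parent or two from the same plant under selfing, gives the genotype of one progeny.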

GBLUP model

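The following GBLUP model was considered for GP:

$$\mathbf{y} = \mu\mathbf{1} + \mathbf{g} + \mathbf{e}, \tag{1}$$

where $\mathbf{y}$ denotes the vector of phenotypic values of a training population with n individuals; μ is a constant term; $\mathbf{1}$ is the vector of order n with all elements equal to 1; $\mathbf{g}$ represents the vector of genotypic values; and $\mathbf{e}$ is the vector of random errors. It is assumed that $\mathbf{g}$ follows a multivariate normal distribution $MVN(\mathbf{0}, \sigma_g^2\mathbf{K})$, where $\mathbf{0}$ is a zero vector; $\sigma_g^2$ is the genetic variance of additive effects; and $\mathbf{K}$ is a genomic relationship matrix among the individuals. Furthermore, $\mathbf{e}$ follows $MVN(\mathbf{0}, \sigma_e^2\mathbf{I}_n)$, where $\sigma_e^2$ is the random error variance, and $\mathbf{I}_n$ denotes the identity matrix of order n. Here, $\mathbf{g}$ and $\mathbf{e}$ are assumed to be mutually independent. In this study, the genomic relationship matrix $\mathbf{K} = \mathbf{M}\mathbf{M}^{T}/p$ was considered, where $\mathbf{M}$ is the marker score matrix, and p is the number of SNP markers. The model parameters of the GBLUP model can be estimated through Henderson’s equation [19], as follows:

$$\begin{bmatrix}\hat{\mu}\\ \hat{\mathbf{g}}\end{bmatrix} = \begin{bmatrix}\mathbf{1}^{T}\mathbf{1} & \mathbf{1}^{T}\\ \mathbf{1} & \mathbf{I}_n + \lambda\mathbf{K}^{-1}\end{bmatrix}^{-1}\begin{bmatrix}\mathbf{1}^{T}\mathbf{y}\\ \mathbf{y}\end{bmatrix}, \tag{2}$$

where the regularization parameter λ is given by $\lambda = \sigma_e^2/\sigma_g^2$. The mmer() function in the R package sommer [20] was used to obtain the restricted maximum likelihood (REML) estimates for the two variance components $\sigma_g^2$ and $\sigma_e^2$, and the resulting estimates were entered into Eq (2) to obtain $\hat{\mu}$ and $\hat{\mathbf{g}}$. If $\hat{\mathbf{g}}_c$ is considered as the vector of estimated genotypic values for a breeding population, and $\mathbf{K}_c$ is considered as the genomic relationship matrix between the breeding and training populations, the following equation is obtained:

$$\hat{\mathbf{g}}_c = \mathbf{K}_c\mathbf{K}^{-1}\hat{\mathbf{g}}.$$

The GEBV for the breeding population is $\hat{\mathbf{g}}_c$ plus the estimate of the constant term $\hat{\mu}$.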
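As a rough illustration of how Henderson's equation yields the GEBVs, the sketch below solves the mixed model equations with NumPy for a given regularization parameter. It is not the sommer-based workflow used in the study: the ratio `lam` (error variance over genetic variance) is taken as an input rather than estimated by REML, the genomic relationship matrix is assumed to be invertible, and the function names are hypothetical.

```python
import numpy as np

def genomic_relationship(M_a, M_b=None):
    """K = M_a M_b^T / p for marker score matrices with p columns."""
    M_b = M_a if M_b is None else M_b
    return M_a @ M_b.T / M_a.shape[1]

def gblup_fit(y, K, lam):
    """Solve Henderson's mixed model equations for mu_hat and g_hat."""
    n = len(y)
    one = np.ones((n, 1))
    K_inv = np.linalg.inv(K)              # assumes K is non-singular
    lhs = np.block([[one.T @ one, one.T],
                    [one, np.eye(n) + lam * K_inv]])
    rhs = np.concatenate(([y.sum()], y))
    sol = np.linalg.solve(lhs, rhs)
    return sol[0], sol[1:]

def gblup_predict(mu_hat, g_hat, K, K_c):
    """GEBVs of candidates: mu_hat + K_c K^{-1} g_hat."""
    return mu_hat + K_c @ np.linalg.solve(K, g_hat)
```

Here `K_c` would be computed as `genomic_relationship(M_candidates, M_training)`. When the candidates are the training individuals themselves, `K_c` equals `K` and the prediction reduces to `mu_hat + g_hat`.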

Index for quantifying genomic diversity

Let $\mathbf{g}_0$ be the vector of genotypic values, and $\mathbf{K}_0$ be the genomic relationship matrix for a particular set of accessions with size n0. According to the GBLUP model in Eq (1), the covariance matrix for $\mathbf{g}_0$ is given by:

$$\mathrm{Var}(\mathbf{g}_0) = \sigma_g^2\mathbf{K}_0. \tag{3}$$

The determinant of the covariance matrix represents the overall variability of the genotypic values, which is calculated as:

$$\left|\sigma_g^2\mathbf{K}_0\right| = \left(\sigma_g^2\right)^{n_0}\left|\mathbf{K}_0\right|.$$

Clearly, the determinant of Eq (3) is proportional to the D-score defined below:

$$\text{D-score} = \left|\mathbf{K}_0\right|. \tag{4}$$

For a fixed number of n0, a subset of accessions chosen from a breeding population with the maximal D-score will have greater genomic diversity than the competing choices with size n0. The concept of the D-score is adopted from optimum experimental designs [21]. A simple example is given below to illustrate the D-score. Suppose that there are three accessions (n = 3) in the candidate set, with genotypic values g1, g2, and g3 and a 3 × 3 genomic relationship matrix. For n0 = 2, the D-score for g1 and g2 is the determinant of the 2 × 2 submatrix of the genomic relationship matrix for these two accessions. Similarly, the D-scores for g1 and g3 and for g2 and g3 are given as 0.75 and 0.99, respectively. Clearly, the two accessions with g2 and g3 genotypic values have greater genomic variation (smaller genomic correlation) than the other competing choices. Closely related individuals could be excluded from the maximal D-score set. The genetic algorithm presented in Ou and Liao [22] was used to search a subset of accessions from a candidate population, such that it can attain the maximal D-score of Eq (4).
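The D-score can be illustrated with a small sketch, assuming it is the determinant of the genomic relationship submatrix of the chosen accessions. The 3 × 3 matrix below is hypothetical: its (g1, g3) and (g2, g3) entries are chosen so that the D-scores of 0.75 and 0.99 quoted in the text are reproduced, while the (g1, g2) entry is an arbitrary illustrative value. The exhaustive search is a simple stand-in for the genetic algorithm of Ou and Liao [22], which is needed when the candidate set is too large to enumerate.

```python
import numpy as np
from itertools import combinations

def d_score(K, idx):
    """Determinant of the relationship submatrix for the accessions in idx."""
    idx = list(idx)
    return float(np.linalg.det(K[np.ix_(idx, idx)]))

def best_subset(K, n0):
    """Exhaustive search for the size-n0 subset with the maximal D-score."""
    return max(combinations(range(K.shape[0]), n0),
               key=lambda subset: d_score(K, subset))

# Hypothetical relationship matrix for three accessions g1, g2, g3.
K = np.array([[1.0, 0.9, 0.5],
              [0.9, 1.0, 0.1],
              [0.5, 0.1, 1.0]])

scores = {pair: round(d_score(K, pair), 2) for pair in combinations(range(3), 2)}
chosen = best_subset(K, 2)   # indices of g2 and g3, the least related pair
```

With unit diagonals, the D-score of a pair equals one minus the squared off-diagonal entry, so a smaller genomic correlation gives a larger D-score.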

Procedure for selecting parental lines

To evaluate a variety of strategies for determining parental lines, the following steps were carried out:

Step 1. For a specific target trait, all phenotypic values available from each rice genome dataset were used to build the corresponding GBLUP model of Eq (1).

Step 2. The GEBVs of all accessions in the dataset were predicted through the trained GBLUP model in step 1.

Step 3. Seven strategies were used to select a subset of 10 parental lines according to their GEBVs: (i) the GEBV only (GEBV-O) approach, which chose the top 10 accessions (either maximal or minimal); three genomic diversity only (GD-O) approaches: (ii) GD-O-30, (iii) GD-O-50, and (iv) GD-O-100, which applied the genetic exchange algorithm to search for an optimal subset of 10 accessions from each of the three candidate sets composed of the top 30, 50, and 100 accessions, respectively, such that the chosen subset had the maximal D-score; and three approaches that considered both GEBV and genomic diversity: (v) GEBV-GD-30, (vi) GEBV-GD-50, and (vii) GEBV-GD-100, which retained the top two accessions, and then applied the genetic exchange algorithm to search for another eight accessions from the remainder of each candidate set for GD-O-30, GD-O-50, and GD-O-100, respectively, so that the resulting 10 accessions had the maximal D-score.

Step 4. For each subset of 10 accessions determined by the seven strategies, any two parental lines were crossed to produce 45 F1 hybrids. Here, we started to simulate the genotypic data for successive generations of progeny populations through the Monte Carlo simulation.

Step 5. Each of the 45 F1 hybrids produced 60 individuals of the F2 population by self-pollination, resulting in 2,700 F2 individuals. After obtaining the GEBVs for the 2,700 F2 individuals via the trained GBLUP model, the top 45 F2 individuals were retained.

Step 6. Again, these 45 F2 individuals were used to produce 2,700 F3 individuals (60 F3 individuals per F2 individual), and the top 45 F3 individuals were retained.

Step 7. The same procedure was repeated to produce 2,700 F10 individuals, which were assumed to be a fixed population. For the resulting 2,700 F10 individuals generated according to each strategy, we found the best F10 RILs with the highest GEBVs.

A flowchart of this procedure is displayed in Fig 2. This procedure was repeated 30 times to obtain the best F10 RILs from each repetition for each strategy. The average of the GEBVs for the best F10 RILs was then calculated and used as the measure of efficiency for the strategy. Then, pairwise comparisons were performed among the GEBV averages, based on the least significant difference (LSD) test. Note that for BRSW, FPP, and PNPP in Dataset I and for YLD in Dataset II, larger GEBVs are preferable (i.e., for these traits, the larger the GEBV, the better). The remaining five traits (FTAA, FTAF, and PH in Dataset I, and FT and PH in Dataset II) follow the rule that the smaller the GEBV, the better.
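The crossing and generation-advancement loop can be sketched as below. This is a toy version under several assumptions: a single chromosome, haplotype alleles coded 0/1 (so that their sum minus 1 gives the -1/0/1 genotype code), and a linear marker-effect score standing in for the GBLUP-based GEBV. All function names and the `effects` vector are hypothetical.

```python
import numpy as np
from itertools import combinations

def gamete(h0, h1, r, rng):
    """One gamete from haplotypes h0/h1; r = rates between adjacent SNPs."""
    switches = rng.random(len(r)) < r
    phase = (rng.integers(2) + np.concatenate(([0], np.cumsum(switches)))) % 2
    return np.where(phase == 0, h0, h1)

def score(ind, effects):
    """Stand-in for the GEBV: -1/0/1 genotype code times marker effects."""
    return float((ind[0] + ind[1] - 1) @ effects)

def self_all(parents, r, rng, n_off=60):
    """Selfing: each retained plant contributes n_off offspring."""
    return [(gamete(*p, r, rng), gamete(*p, r, rng))
            for p in parents for _ in range(n_off)]

def top(pop, effects, n_keep=45, larger_is_better=True):
    """Truncation selection on the score."""
    s = np.array([score(ind, effects) for ind in pop])
    order = np.argsort(-s if larger_is_better else s)[:n_keep]
    return [pop[i] for i in order]

def run_scheme(lines, r, effects, rng, last_gen=10):
    """lines: inbred haplotypes (0/1 arrays). Returns the final-generation pool."""
    pop = list(combinations(lines, 2))    # all pairwise F1 hybrids
    pool = pop
    for _ in range(2, last_gen + 1):      # F2, F3, ..., F(last_gen)
        pool = self_all(pop, r, rng)      # 45 plants x 60 offspring = 2,700
        pop = top(pool, effects)          # keep the top 45 for the next round
    return pool
```

With 10 parental lines the scheme starts from 45 F1 hybrids, and every selfing round produces 2,700 individuals from which the top 45 are carried forward. The best individual of the final pool is then the highest-scoring (or lowest-scoring, for traits where smaller is better) element.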

Fig 2

Flowchart showing the Monte Carlo simulation.

GEBV: genomic estimated breeding value; GBLUP: genomic best linear unbiased predictor; RIL: recombinant inbred line.


Calculation of genetic gain

To understand the genetic improvement in a target trait using different strategies, the genetic gain was estimated as:

$$\text{Genetic gain} = \overline{\mathrm{GEBV}}_{F_{10}} - \overline{\mathrm{GEBV}}_{P},$$

where $\overline{\mathrm{GEBV}}_{F_{10}}$ denotes the average GEBV of the resulting 2,700 F10 RILs (reported as GEBV-F10 in Tables 4 and 5); and $\overline{\mathrm{GEBV}}_{P}$ denotes the average GEBV of 10 parental lines selected using each strategy (reported as GEBV-P) [23]. The larger the absolute value of the genetic gain, the greater the improvement in the target trait. The average of genetic gains from 30 repetitions was reported for each strategy, and multiple comparisons among the genetic gain averages were performed using the LSD test.
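As a worked check, the snippet below takes the genetic gain to be the average F10 GEBV minus the average parental GEBV, and applies this to the BRSW values reported in Table 4 for three of the strategies. The function name `genetic_gain` is hypothetical.

```python
def genetic_gain(gebv_f10_mean, gebv_parent_mean):
    """Average GEBV of the F10 RILs minus average GEBV of the parents."""
    return gebv_f10_mean - gebv_parent_mean

# BRSW values from Table 4: (GEBV-P, GEBV-F10)
brsw = {
    "GEBV-O": (3.17, 3.42),
    "GD-O-100": (2.94, 3.49),
    "GEBV-GD-100": (3.00, 3.63),
}
gains = {name: round(genetic_gain(f10, parent), 2)
         for name, (parent, f10) in brsw.items()}
# gains == {"GEBV-O": 0.25, "GD-O-100": 0.55, "GEBV-GD-100": 0.63}
```

The resulting values agree with the genetic gain column of Table 4 for BRSW. For traits where smaller values are preferred, the same difference is negative, which is why the comparison uses the absolute value.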

Results

Comparison of strategies based on the best F10 RILs

The GEBV averages of the best F10 RILs and results of the LSD test are displayed in Tables 1 and 2 for Datasets I and II, respectively. The strategies that considered both GEBV and genomic diversity (GEBV-GD-30, -50, -100) generally showed satisfactory efficiency because they achieved the best or second-best performance for all traits. Therefore, these types of strategies could be used as a reliable means for selecting parental lines. On the other hand, strategies accounting for only genomic diversity (GD-O-30, -50, -100) did not show satisfactory efficiency for any trait, with the exception of GD-O-100, which was satisfactory for YLD in Dataset II. The GEBV-O strategy showed the best or second-best performance for FPP and PH in Dataset I and for PH and FT in Dataset II, but it also showed the worst or second-worst performance for the remaining four traits in Dataset I and for YLD in Dataset II. These data indicate that GEBV-O is a high-risk strategy. In general, the results of the LSD test showed significant differences in GEBV averages between the best/second-best and worst/second-worst performances for all traits in both datasets.

Table 1

Ranking and GEBV averages (in parentheses) of the best F10 RILs selected by 30 repetitions of the seven proposed strategies applied to six traits in Dataset I.

Strategy¹	Traits²
Strategy¹	BRSW	FPP	FTAA	FTAF	PH	PNPP
GEBV-O	6 (3.418)e	2 (5.961)a	6 (56.52)e	6 (61.85)d	1 (42.18)a	6 (4.125)c
GD-O-30	7 (3.408)e	5 (5.951)a	3 (51.56)b	3 (59.35)a	5 (49.33)b	3 (4.188)b
GD-O-50	3 (3.576)c	6 (5.916)b	5 (53.34)d	5 (60.12)c	6 (49.80)b	5 (4.138)c
GD-O-100	4 (3.496)d	7 (5.882)c	7 (56.83)e	7 (61.96)d	7 (51.78)c	7 (4.086)d
GEBV-GD-30	5 (3.419)e	3 (5.954)a	1 (47.13)a	1 (59.21)a	2 (42.69)a	1 (4.225)a
GEBV-GD-50	1 (3.656)a	1 (5.964)a	2 (47.45)a	2 (59.30)a	3 (43.23)a	2 (4.214)a
GEBV-GD-100	2 (3.634)b	4 (5.953)a	4 (51.38)c	4 (59.63)b	4 (43.49)a	4 (4.171)b

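1 GEBV-O: subset of the top 10 accessions with minimal or maximal GEBV; GD-O-30, -50, -100: subsets of 10 accessions with the maximal D-scores chosen from candidate sets comprising the top 30, 50, and 100 accessions, respectively; GEBV-GD-30, -50, -100: subsets of the top two accessions plus eight accessions chosen from the remainder of the candidate sets composed of the top 30, 50, and 100 accessions, respectively, with the maximal D-scores.

2 BRSW: brown rice seed width; FPP: florets per panicle; FTAA: flowering time at Arkansas; FTAF: flowering time at Faridpur; PH: plant height; PNPP: panicle number per plant. Different lowercase letters indicate significant differences among the strategies for a given trait (P < 0.01; LSD test). The best and second-best strategies are indicated in bold, while the worst and second-worst strategies are underlined.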

Table 2

Ranking and GEBV averages (in parentheses) of the best F10 RILs selected by 30 repetitions of the seven proposed strategies applied to three traits in Dataset II.

Strategy¹	Traits²
Strategy¹	YLD	PH	FT
GEBV-O	7 (6472)b	1 (85.817)a	2 (77.818)a
GD-O-30	4 (6491)b	5 (87.517)b	7 (78.410)c
GD-O-50	5 (6489)b	6 (89.920)c	5 (78.164)b
GD-O-100	1 (6546)a	7 (91.799)e	6 (78.359)bc
GEBV-GD-30	3 (6506)ab	3 (85.976)a	4 (77.883)a
GEBV-GD-50	6 (6485)b	2 (85.917)a	1 (77.725)a
GEBV-GD-100	2 (6539)a	4 (86.062)a	3 (77.873)a

1 GEBV-O: subset of the top 10 accessions with the minimal or maximal GEBV; GD-O-30, -50, -100: subsets of 10 accessions with maximal D-scores chosen from the candidate sets comprising the top 30, 50, and 100 accessions, respectively; GEBV-GD-30, -50, -100: subsets of the top two accessions plus eight accessions chosen from the remainder of the candidate sets composed of the top 30, 50, and 100 accessions, respectively, with the maximal D-scores.

2 YLD: yield; PH: plant height; FT: flowering time. Different lowercase letters indicate significant differences among the strategies for a given trait (P < 0.01; LSD test). The best and second-best strategies are indicated in bold, while the worst and second-worst strategies are underlined.

We also displayed the average GEBV ± standard deviation (SD) of the best RILs selected by 30 repetitions over consecutive generations in Figs 3, 4 and 5.
Four strategies including GEBV-O, GEBV-GD-30, -50, and -100 selected the same best individual from the 30 repetitions at the parental generation and also at the F1 generation; therefore, no SD is shown with the corresponding GEBV averages. The GEBV averages of the best parental lines selected by the strategies can be ranked as GEBV-O = GEBV-GD-30 = GEBV-GD-50 = GEBV-GD-100 ≥ GD-O-30 ≥ GD-O-50 ≥ GD-O-100 in decreasing desirability. For the three strategies that considered genomic diversity only, the desirability at the parental generation decreased as the degree of diversity increased. Additionally, the desirability declined from the parental generation to the F1 generation for every strategy because of the heterozygous alleles in F1 hybrids.

Fig 3

GEBV averages of the best individuals selected from 30 repetitions at consecutive generations for the six chosen traits in Dataset I.

GEBV-O: subset of the top 10 accessions with minimal or maximal GEBVs; GD-O-30, -50, -100: subsets of 10 accessions with maximal D-scores chosen from candidate sets composed of the top 30, 50, and 100 accessions, respectively; GEBV-GD-30, -50, -100: subsets of the top two accessions plus eight accessions chosen from the remainder of the candidate sets composed of the top 30, 50, and 100 accessions, respectively, with maximal D-scores; BRSW: brown rice seed width; FPP: florets per panicle; FTAA: flowering time at Arkansas.

Fig 4

GEBV averages of the best individuals selected from 30 repetitions at consecutive generations for the six chosen traits in Dataset I.

GEBV-O: subset of the top 10 accessions with minimal or maximal GEBVs; GD-O-30, -50, -100: subsets of 10 accessions with maximal D-scores chosen from candidate sets composed of the top 30, 50, and 100 accessions, respectively; GEBV-GD-30, -50, -100: subsets of the top two accessions plus eight accessions chosen from the remainder of the candidate sets composed of the top 30, 50, and 100 accessions, respectively, with maximal D-scores; FTAF: flowering time at Faridpur; PH: plant height; PNPP: panicle number per plant.

Fig 5

GEBV averages of the best individuals selected from 30 repetitions at consecutive generations for the three target traits in Dataset II.

GEBV-O: subset of the top 10 accessions with minimal or maximal GEBVs; GD-O-30, -50, -100: subsets of 10 accessions with maximal D-scores chosen from candidate sets composed of the top 30, 50, and 100 accessions, respectively; GEBV-GD-30, -50, -100: subsets of the top two accessions plus eight accessions chosen from the remainder of the candidate sets composed of the top 30, 50, and 100 accessions, respectively, with the maximal D-scores; YLD: yield; PH: plant height; FT: flowering time.

To explore the extent to which the top two accessions contributed to the subset of 10 parental lines determined by four strategies (GEBV-O, GEBV-GD-30, -50, and -100), we compared each subset with a reduced group consisting of the F1 hybrids whose parental lines contained at least one of the top two accessions for each subset. Each reduced group consisted of 17 F1 hybrids. We followed the same analysis procedure to obtain the GEBV averages of the best F10 RILs from 30 repetitions based on the reduced group. The results are displayed in Table 3, together with the corresponding GEBV averages based on the group of the original 45 F1 hybrids. The results showed no practically meaningful difference between these two groups for any of the traits using the four strategies (Table 3). Therefore, the reduced group can be an alternative to the full group.
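The sizes of the full and reduced groups follow from simple counting, as the sketch below shows. Parents are labeled by rank, with 0 and 1 standing for the top two accessions; the labels are illustrative only.

```python
from itertools import combinations

parents = list(range(10))                  # 0 and 1 are the top two accessions
all_crosses = list(combinations(parents, 2))
reduced = [c for c in all_crosses if 0 in c or 1 in c]

n_all = len(all_crosses)       # 45 F1 hybrids in the full group
n_reduced = len(reduced)       # 17 F1 hybrids involving a top-two parent
```

Equivalently, the 28 crosses among the other eight parents are dropped, leaving 45 - 28 = 17 crosses.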

Table 3

GEBV averages of the best F10 RILs selected by 30 repetitions, based on the original group comprising 45 F1 hybrids and the reduced group comprising 17 F1 hybrids, using four strategies.

Trait¹	Strategy²
Trait¹	GEBV-O		GEBV-GD-30		GEBV-GD-50		GEBV-GD-100
Dataset I	45 F₁	17 F₁	45 F₁	17 F₁	45 F₁	17 F₁	45 F₁	17 F₁
BRSW	3.418	3.423	3.419	3.418	3.656	3.652	3.634	3.650
FPP	5.961	5.965	5.954	5.957	5.964	5.958	5.953	5.943
FTAA	56.521	57.513	47.136	46.961	47.457	47.421	51.382	51.734
FTAF	61.856	61.850	59.216	59.123	59.304	59.232	59.634	59.713
PH	42.185	43.409	42.699	43.271	43.232	43.791	43.498	43.854
PNPP	4.125	4.129	4.225	4.226	4.214	4.204	4.171	4.161
Dataset II	45 F₁	17 F₁	45 F₁	17 F₁	45 F₁	17 F₁	45 F₁	17 F₁
YLD	6472	6476	6506	6499	6485	6484	6539	6534
PH	85.817	85.991	85.976	85.844	85.917	86.092	86.062	86.060
FT	77.818	77.834	77.883	77.750	77.725	77.778	77.873	77.690

1 BRSW: brown rice seed width; FPP: florets per panicle; FTAA: flowering time at Arkansas; FTAF: flowering time at Faridpur; PH: plant height; PNPP: panicle number per plant; YLD: yield; FT: flowering time.

2 GEBV-O: subset of the top 10 accessions with minimal or maximal GEBVs; GEBV-GD-30, -50, -100: subsets of the top two accessions plus eight accessions chosen from the remainder of the candidate sets composed of the top 30, 50, and 100 accessions, respectively, with the maximal D-scores.

Genetic gains with different strategies

The average genetic gains and results of the LSD test are displayed in Tables 4 and 5 for Datasets I and II, respectively. It is also reasonable to evaluate the performance of the strategies according to the endpoint, i.e., the average GEBV of the F10 RILs (GEBV-F10 in Tables 4 and 5). The comparison results based on GEBV-F10 were consistent with the above comparison results based on the best F10 RILs. Strategies that considered genomic diversity (GD-O-30, -50, -100; GEBV-GD-30, -50, -100) showed greater genetic gain than GEBV-O for all traits, except PH in Dataset I (Table 4). The genetic gain generally increased with the increase in genomic diversity, as expected (GD-O-100 outperformed both GD-O-50 and GD-O-30 for all traits, except BRSW and FTAF in Dataset I; GEBV-GD-100 outperformed both GEBV-GD-50 and GEBV-GD-30 for all traits). The results of the LSD test showed that the GEBV-GD-100 strategy differed significantly from the remaining strategies in genetic gain for all traits in Dataset I, except that it showed no significant difference from GEBV-GD-50 for FTAA and from GEBV-GD-50 and -30 for PH. On the other hand, the GD-O-100 strategy significantly differed from the remaining strategies for all traits in Dataset II, except from GEBV-GD-100 for PH. In addition, GEBV-O showed the best average parental GEBV (GEBV-P in Tables 4 and 5), while GEBV-GD-30, -50, and -100 showed better GEBV-P than their counterparts (GD-O-30, -50, and -100, respectively) for all traits in both datasets. Thus, the more weight a strategy gives to the candidate accessions with the top GEBVs, the better its starting point.

Table 4

Average genetic gains from 30 repetitions for Dataset I.

Strategy¹	Brown rice seed width (BRSW)			Florets per panicle (FPP)
Strategy¹	GEBV-P²	GEBV-F10³	Genetic gain⁴	GEBV-P	GEBV-F10	Genetic gain
GEBV-O	3.17	3.42	0.25f	5.51	5.96	0.45f
GD-O-30	3.10	3.41	0.31e	5.48	5.95	0.47e
GD-O-50	3.00	3.57	0.57c	5.41	5.91	0.50d
GD-O-100	2.94	3.49	0.55d	5.31	5.88	0.57b
GEBV-GD-30	3.12	3.42	0.30e	5.48	5.95	0.47e
GEBV-GD-50	3.04	3.65	0.61b	5.43	5.96	0.53c
GEBV-GD-100	3.00	3.63	0.63a	5.34	5.95	0.61a
	Flowering time at Arkansas (FTAA)			Flowering time at Faridpur (FTAF)
	GEBV-P	GEBV-F10	Genetic gain	GEBV-P	GEBV-F10	Genetic gain
GEBV-O	64.30	56.57	-7.73d	63.45	61.87	-1.58f
GD-O-30	72.25	49.26	-22.99bc	64.93	59.40	-5.53cd
GD-O-50	75.41	53.54	-21.87c	65.82	60.16	-5.66c
GD-O-100	80.01	57.00	-23.01bc	67.34	62.01	-5.33e
GEBV-GD-30	71.09	47.31	-23.78b	64.68	59.25	-5.43de
GEBV-GD-50	72.86	47.64	-25.22a	65.40	59.35	-6.05b
GEBV-GD-100	77.16	51.53	-25.63a	66.46	59.68	-6.78a
	Plant height (PH)			Panicle number per plant (PNPP)
	GEBV-P	GEBV-F10	Genetic gain	GEBV-P	GEBV-F10	Genetic gain
GEBV-O	83.77	42.52	-41.25b	3.93	4.12	0.19e
GD-O-30	89.50	49.69	-39.81b	3.86	4.19	0.33d
GD-O-50	90.11	50.13	-39.98b	3.80	4.14	0.34d
GD-O-100	92.10	52.10	-40.00b	3.64	4.08	0.44b
GEBV-GD-30	87.26	42.99	-44.27a	3.90	4.22	0.32d
GEBV-GD-50	87.95	43.50	-44.45a	3.84	4.21	0.37c
GEBV-GD-100	89.27	43.95	-45.32a	3.70	4.17	0.47a

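1 GEBV-O: subset of the top 10 accessions with minimal or maximal GEBVs; GD-O-30, -50, -100: subsets of 10 accessions with maximal D-scores chosen from the candidate sets composed of the top 30, 50, and 100 accessions, respectively; GEBV-GD-30, -50, -100: subsets of the top two accessions plus eight accessions chosen from the remainder of the candidate sets composed of the top 30, 50, and 100 accessions, respectively, with the maximal D-scores.

2 GEBV-P: average GEBV of the 10 selected parental lines.

3 GEBV-F10: average GEBV of 2,700 F10 RILs.

4 Lowercase letters indicate significant differences among strategies for a given trait (P < 0.01; LSD test).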

Table 5

Average genetic gains derived from 30 repetitions for Dataset II.

Strategy¹	Yield (YLD)
Strategy¹	GEBV-P²	GEBV-F10³	Genetic gain⁴
GEBV-O	5571.61	6468.60	896.99e
GD-O-30	5452.39	6488.02	1035.63c
GD-O-50	5436.58	6484.58	1048.00bc
GD-O-100	5289.74	6540.72	1250.98a
GEBV-GD-30	5538.44	6501.23	962.79d
GEBV-GD-50	5522.45	6482.13	959.68d
GEBV-GD-100	5454.37	6535.79	1081.42b
	Plant height (PH)
	GEBV-P	GEBV-F10	Genetic gain
GEBV-O	97.75	85.89	-11.86d
GD-O-30	102.20	87.59	-14.61a
GD-O-50	103.66	89.99	-13.67b
GD-O-100	106.83	91.85	-14.98a
GEBV-GD-30	99.00	86.01	-12.99c
GEBV-GD-50	99.39	85.99	-13.40bc
GEBV-GD-100	101.15	86.13	-15.02a
	Flowering time (FT)
	GEBV-P	GEBV-F10	Genetic gain
GEBV-O	83.14	77.84	-5.30e
GD-O-30	83.98	78.43	-5.55d
GD-O-50	84.57	78.19	-6.38b
GD-O-100	85.62	78.39	-7.23a
GEBV-GD-30	83.44	77.90	-5.54d
GEBV-GD-50	83.69	77.76	-5.93c
GEBV-GD-100	84.16	77.89	-6.27b

2 Average GEBV of the 10 selected parental lines.

3 Average GEBV of 2,700 F10 RILs.

4 Lowercase letters indicate significant differences among the strategies for a given trait (P < 0.01; LSD test).

1 GEBV-O: subset of the top 10 accessions with minimal or maximal GEBVs; GD-O-30, -50, -100: subsets of 10 accessions with maximal D-scores chosen from the candidate sets composed of the top 30, 50, and 100 accessions, respectively; GEBV-GD-30, -50, -100: subsets of the top two accessions plus eight accessions chosen from the remainder of the candidate sets composed of the top 30, 50, and 100 accessions, respectively, with the maximal D-scores.
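The three families of strategies defined in footnote 1 can be illustrated with a short sketch. This is not the authors' R code from S1 File. The function `diversity_score` is a hypothetical stand-in for the D-score (here simply the mean pairwise Euclidean distance between marker profiles), the greedy search is an assumed heuristic, and the genotypes and GEBVs are random placeholders.

```python
import random
from itertools import combinations
from math import dist
from statistics import mean


def diversity_score(G, idx):
    """Hypothetical stand-in for the D-score (NOT the paper's index):
    mean pairwise Euclidean distance between marker profiles."""
    return mean(dist(G[a], G[b]) for a, b in combinations(idx, 2))


def select_parents(gebv, G, strategy, k=None, n_parents=10, n_fixed=2,
                   maximize=True):
    """Return indices of n_parents accessions under one strategy.

    GEBV-O : the n_parents accessions with the best GEBVs.
    GD-O   : n_parents accessions from the top-k candidate set,
             chosen for diversity only.
    GEBV-GD: the top n_fixed accessions plus diverse picks from the
             remainder of the top-k candidate set.
    """
    # Best-first ordering; maximize=False suits traits where smaller is better.
    order = sorted(range(len(gebv)), key=lambda i: gebv[i], reverse=maximize)
    if strategy == "GEBV-O":
        return order[:n_parents]
    pool = order[:k]
    if strategy == "GEBV-GD":
        chosen = pool[:n_fixed]
    elif strategy == "GD-O":
        # Seed with the most distant pair in the candidate set.
        chosen = list(max(combinations(pool, 2),
                          key=lambda pair: diversity_score(G, pair)))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Assumed greedy search: add the candidate that yields the largest
    # diversity score for the enlarged set, until n_parents are chosen.
    while len(chosen) < n_parents:
        rest = [i for i in pool if i not in chosen]
        chosen.append(max(rest, key=lambda i: diversity_score(G, chosen + [i])))
    return chosen


# Random placeholder data: 100 accessions, 200 SNPs coded -1/0/1.
random.seed(0)
G = [[random.choice((-1, 0, 1)) for _ in range(200)] for _ in range(100)]
gebv = [random.gauss(0, 1) for _ in range(100)]

for name, k in [("GEBV-O", None), ("GD-O", 50), ("GEBV-GD", 50)]:
    parents = select_parents(gebv, G, name, k=k)
    print(name, diversity_score(G, parents), mean(gebv[i] for i in parents))
```

The numbers of parents, retained top accessions, and candidates (`n_parents`, `n_fixed`, `k`) are exposed as parameters because the choice of 10, 2, and 30/50/100 is adjustable.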

Discussion

Dataset II was specifically collected for genomic selection, and all available accessions in this dataset are indica or indica–admixed. Based on the best F10 RILs (Table 2), all seven strategies performed similarly for the three target traits: the GEBV averages of the best F10 RILs ranged from 6472 to 6546 kg/ha for YLD, from 85.889 to 91.852 cm for PH, and from 77.725 to 78.410 days for FT. This could be because the candidate accessions in Dataset II are elite breeding lines with limited genetic diversity and similar phenotypic values for the target traits. Nevertheless, the LSD test revealed that the two strategies with greater genomic diversity (GD-O-100 and GEBV-GD-100) led to significantly higher YLD than the other five strategies. Four strategies (GEBV-O and GEBV-GD-30, -50, and -100) performed equally well for PH, and all four performed significantly better than GD-O-30, -50, and -100.

Dataset I contains more genomic diversity than Dataset II because it consists of five subpopulations and one admixed group. This higher genomic diversity could result in a larger difference between the GEBV-GD-30/50/100 strategies and the GEBV-O strategy for some traits. For example, the difference in the GEBV averages of the best F10 RILs between GEBV-GD-50 and GEBV-O was approximately -9.06 days for FTAA and -2.55 days for FTAF in Dataset I (Table 1), whereas the corresponding difference was only -0.09 days for FT in Dataset II (Table 2). However, flowering time is very sensitive to environmental conditions, implying that genomic diversity alone cannot account for the differences in results between the two datasets. More interestingly, the higher genomic diversity of Dataset I could lead to a larger genetic gain for a specific trait.
The average genetic gain across the seven strategies for PH was -42.15 cm in Dataset I (Table 4), whereas the corresponding mean in Dataset II was only -13.79 cm (Table 5).

The average GEBV of the best F10 RILs for YLD was the highest under the GD-O-100 strategy in Dataset II (Table 2), whereas the corresponding GEBV averages for two yield components, FPP and PNPP, were the lowest under the same strategy in Dataset I (Table 1). This is possible because the two sets of results were based on different collections of rice lines. There is little diversity among the RILs in Dataset II; therefore, the differences in the average GEBV for YLD among the strategies seem to be negligible. Note that the LSD test revealed only two significance groups for YLD. Nonetheless, the results for FPP and PNPP under the GD-O-100 strategy in Dataset I appear to be reasonable.

Admittedly, the numbers of accessions fixed in the proposed strategies are somewhat arbitrary, such as selecting 10 parental lines, retaining the top 2 accessions, and searching for 10 or another 8 accessions from the three candidate sets composed of the top 30, 50, and 100 accessions, respectively. Users can adjust these numbers according to the needs of their own study. Additionally, historical phenotypic data are required to build the GP model. If such data are not available, a pilot experiment is needed to phenotype a set of accessions, which can be determined using an optimization algorithm [22]. Two R functions for performing the proposed parental selection procedure are provided in S1 File.

We found that incorporating genomic diversity into conventional truncation selection could improve the likelihood of identifying superior parental lines. More importantly, we showed that combining GP with Monte Carlo simulation could help breeders discover superior parental lines before conducting field experiments.
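The two PH averages quoted in the Discussion (-42.15 cm and -13.79 cm) can be recomputed from the genetic gain columns of Tables 4 and 5. The snippet below is only an arithmetic check. The values are copied from the tables in the order GEBV-O, GD-O-30, -50, -100, GEBV-GD-30, -50, -100, and the variable names are illustrative.

```python
# Genetic gain for plant height (PH), one value per strategy.
ph_gain_dataset1 = [-41.25, -39.81, -39.98, -40.00, -44.27, -44.45, -45.32]  # Table 4
ph_gain_dataset2 = [-11.86, -14.61, -13.67, -14.98, -12.99, -13.40, -15.02]  # Table 5

# Each tabulated gain equals GEBV-F10 minus GEBV-P, e.g. GEBV-O in Table 4:
assert round(42.52 - 83.77, 2) == ph_gain_dataset1[0]

avg1 = sum(ph_gain_dataset1) / len(ph_gain_dataset1)
avg2 = sum(ph_gain_dataset2) / len(ph_gain_dataset2)

print(f"Dataset I:  {avg1:.2f} cm")   # -42.15
print(f"Dataset II: {avg2:.2f} cm")   # -13.79
```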
It is well known that phenotype is affected by genotype (G), environment (E), and G × E interaction. In reality, the environment can have a significant impact on the performance of progeny populations during the growth period of each generation up to the F10 generation. Thus, parental lines selected in our simulation study may not perform as expected, and conducting field experiments to validate our results would be worthwhile. As mentioned earlier, the works of Gaynor et al. [13] and Yao et al. [14] support the view that selecting better parental lines through GP with Monte Carlo simulation should prove useful in plant breeding. The study of Vanavermaete et al. [15] supports our view that considering both GEBV and genomic diversity in parental selection is a promising strategy.

In this study, we focused on single-trait selection; therefore, the proposed approach could select different parental lines for different target traits. In practice, it is desirable to extend the approach to multiple-trait selection. A straightforward extension is to apply a selection index incorporating multiple traits and then treat the selection index as a new target trait in the current single-trait approach. Another possible modification is to directly implement multiple-trait GP models and then use an appropriate selection index to evaluate candidate accessions. Jia and Jannink [24], Hayashi and Iwata [25], and Guo et al. [26] have shown that multiple-trait GP models provide better prediction accuracy than single-trait GP models for a low-heritability trait that is strongly correlated with a high-heritability trait. We will present results of the multiple-trait approach in a future communication.
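As a rough illustration of the selection-index extension mentioned above, the sketch below combines standardized single-trait GEBVs into one weighted score per accession. The text does not specify the form of the index, so the linear form, the weights, and the toy GEBV values are all assumptions made for illustration. The negative weights for PH and FT assume that shorter plants and earlier flowering are desired, which is consistent with the negative genetic gains in Tables 4 and 5.

```python
from statistics import mean, stdev


def selection_index(gebvs_by_trait, weights):
    """Weighted sum of z-scored GEBVs (an assumed linear index).

    gebvs_by_trait: dict mapping trait name -> list of GEBVs, one per accession.
    weights: dict mapping trait name -> weight (negative if smaller is better).
    """
    n = len(next(iter(gebvs_by_trait.values())))
    index = [0.0] * n
    for trait, values in gebvs_by_trait.items():
        m, s = mean(values), stdev(values)
        for i, v in enumerate(values):
            # Standardize so traits on different scales are comparable.
            index[i] += weights[trait] * (v - m) / s
    return index


# Hypothetical GEBVs for three accessions (not taken from the paper).
gebvs = {
    "YLD": [6500.0, 6000.0, 6200.0],   # kg/ha, larger is better
    "PH": [95.0, 105.0, 100.0],        # cm, smaller assumed better
    "FT": [80.0, 85.0, 78.0],          # days, smaller assumed better
}
weights = {"YLD": 1.0, "PH": -0.5, "FT": -0.5}  # illustrative weights

index = selection_index(gebvs, weights)
best = max(range(len(index)), key=index.__getitem__)
print(index, best)
```

The resulting index could then be ranked like a single-trait GEBV when forming the candidate sets.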

Conclusion

Combining GP with Monte Carlo simulation can serve as a practical means of detecting superior parents for crop pre-breeding programs. Different strategies can be implemented to identify a set of superior parental lines from a candidate population. A strategy that considers only GEBV gives a higher starting average GEBV among the selected parental lines, but it may lead to less genetic gain. On the other hand, strategies that consider only genomic diversity retain greater genomic diversity but do not give a favorable starting GEBV average, and therefore may not produce RILs with satisfactory performance. Strategies that consider both GEBV and genomic diversity balance the starting GEBV average against genomic diversity among parental lines; these strategies show satisfactory genetic gain and produce high-performing RILs.

R functions.

(DOCX)

13 Oct 2020

PONE-D-20-25857 Identify Superior Lines for Biparental Crossing via Genomic Prediction PLOS ONE

Dear Dr. Liao,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

I had very difficulty time to obtain review comments from the second reviewer in a timely fashion. I believe that the first reviewer's comments are reasonable, thus, I recommend you to make revisions per the comments.

Please submit your revised manuscript by Nov 27 2020 11:59PM.

Kind regards, David D Fang, Ph.D. Academic Editor PLOS ONE

Journal Requirements:

Thank you for stating the following in the Acknowledgments Section of your manuscript: 'This research was supported by the Ministry of Science and Technology, Taiwan (grant number: MOST 108-2118-M-002-002-MY2).' We note that you have provided funding information that is not currently declared in your Funding Statement, i.e. grant number. Currently, your Funding Statement reads as follows: 'MOST (Taiwan) funded the master study of PY, but the funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.'

Reviewers' comments:

Reviewer's Responses to Questions

1. Is the manuscript technically sound, and do the data support the conclusions? Reviewer #1: Partly

2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: No

3. Have the authors made all data underlying the findings in their manuscript fully available? Reviewer #1: No

4. Is the manuscript presented in an intelligible fashion and written in standard English? Reviewer #1: No

5. Review Comments to the Author

Reviewer #1: The strategy for selecting better parental line through genomic prediction (GP) is an excellent idea. This will obviously help the breeder to select the best parents efficiently for getting higher genetic gain. The authors tried to explore GP to select best parental lines for bi-parental crossing based on genomic estimated breeding value (GEBV) couple with genomic diversity (GD) using third party data and simulation study using computer. However, I do have several issue regarding this study and manuscript (MS). I would recommend PlosOne to accept this MS with major revision. Please see below my concern and comments.

1. Authors compared the performance of seven strategies for selecting better parental lines by using the actual average value of traits of top 10 lines. They didn’t do any statistical analysis for comparing the significance of the differences. They need to do some kind of statically analysis for any kind of comparing results in this MS.

2. This is a simulation study. In reality, they may not perform similarly since environment will play a big role on the performance of the progeny during the growing cycle of each of the generation up to F10. It would be worthy, if authors can include some kinds of validation work from the real situation.

3. Authors wrote in the line# 320-23, “Apparently, the numbers of accessions fixed in the proposed strategies seem to be a little arbitrary, such as those of selecting 10 parental lines, retaining the top 2 accessions, and searching 10 or another 8 accessions from the three candidate sets composed of the top 30, 50 and 100 accessions, respectively.” My question is why they select it arbitrary instead selecting top based on actual trait value or doing some statistical analysis. Another concern, how they select the top lines for different traits as they could be different for different traits.

4. They need to rewrite the whole discussion part since many of the statements are redundant with elsewhere especially results section in this MS. They need to include more supporting literature of their outcome. Also they did not explain well of their interesting results.

5. Authors wrote, “An R function for performing the proposed procedure of selecting parental lines is available from the authors upon request”. I would request to authors to include those as supplementary materials for the convenient to readers.

6. Abstract need to rewrite since more than half contains is background. Only four lines (~15%) results and implication. They need to write more results and outcome from this study.

7. Some of the sentences are totally identical in the more than one sections.

8. They need to define first the abbreviation before use in the main body other than abstract.

9. Authors referred one publication in the MS (line# 58) for those who will interest to learn more. I believe that it is unnecessary.

10. In the dataset 1, authors used only one third of the available SNPs. My question is how did they select the SNPs and what was the distribution of SNPs in the genome. Please put some numbers.

11. Authors wrote that “we then imputed a missing SNP marker from its corresponding major homozygous alleles.” I have no idea how could they imputed a missing marker using alleles data.

12. Why and how did they reduce the number of progeny (328 from 363) and SNPs (10,772 out of 73, 147) in the dataset 2.

13. What is the “e” stand for in the equation that measured the recombination rate?

14. The GEBV average of F10 RILs for yield was the highest in the GD-0-100. However, GEBV for all other yield components were lowest. How is it possible? What are the possible explanation? Please include in the discussion.

15. Authors wrote that “the GEBV averages of the best selected parental lines by the strategies can be ranked as GEBV-O = GEBV-GD-30 = GEBV-GD-50 = GEBV-GD-100 > GD-O-30 > GD-O-50 > GD-O-100 in decreasing desirability”. Results of the corresponding figures 2 and 3 do not support this statement for the all traits.

16. English language and grammar need to improve. Some references do not have required information such as page number.

17. In table 2, ranking for PH is not correct.

6. Do you want your identity to be public for this peer review? Reviewer #1: No

13 Nov 2020

We have revised the manuscript according to the decision letter.

Submitted filename: Response to Reviewers.docx

17 Nov 2020

Identification of superior parental lines for biparental crossing via genomic prediction PONE-D-20-25857R1

Dear Dr. Liao,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Kind regards, David D Fang, Ph.D. Academic Editor PLOS ONE

23 Nov 2020

PONE-D-20-25857R1 Identification of superior parental lines for biparental crossing via genomic prediction

Dear Dr. Liao:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. David D Fang Academic Editor PLOS ONE

15 in total

1. Prediction of total genetic value using genome-wide dense marker maps.

Authors: T H Meuwissen; B J Hayes; M E Goddard
Journal: Genetics Date: 2001-04 Impact factor: 4.562

2. Parental selection, number of breeding populations, and size of each population in inbred development.

Authors: R Bernardo
Journal: Theor Appl Genet Date: 2003-08-19 Impact factor: 5.699

3. The impact of genetic relationship information on genome-assisted breeding values.

Authors: D Habier; R L Fernando; J C M Dekkers
Journal: Genetics Date: 2007-12 Impact factor: 4.562

4. Multiple-trait genomic selection methods increase genetic value prediction accuracy.

Authors: Yi Jia; Jean-Luc Jannink
Journal: Genetics Date: 2012-10-19 Impact factor: 4.562

5. Gramene database in 2010: updates and extensions.

Authors: Ken Youens-Clark; Ed Buckler; Terry Casstevens; Charles Chen; Genevieve Declerck; Paul Derwent; Palitha Dharmawardhana; Pankaj Jaiswal; Paul Kersey; A S Karthikeyan; Jerry Lu; Susan R McCouch; Liya Ren; William Spooner; Joshua C Stein; Jim Thomason; Sharon Wei; Doreen Ware
Journal: Nucleic Acids Res Date: 2010-11-13 Impact factor: 16.971

6. Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa.

Authors: Keyan Zhao; Chih-Wei Tung; Georgia C Eizenga; Mark H Wright; M Liakat Ali; Adam H Price; Gareth J Norton; M Rafiqul Islam; Andy Reynolds; Jason Mezey; Anna M McClung; Carlos D Bustamante; Susan R McCouch
Journal: Nat Commun Date: 2011-09-13 Impact factor: 14.919

7. Correction: Genomic Selection and Association Mapping in Rice (Oryza sativa): Effect of Trait Genetic Architecture, Training Population Composition, Marker Number and Statistical Model on Accuracy of Rice Genomic Selection in Elite, Tropical Rice Breeding Lines.

Authors: Jennifer Spindel; Hasina Begum; Deniz Akdemir; Parminder Virk; Bertrand Collard; Edilberto Redoña; Gary Atlin; Jean-Luc Jannink; Susan R McCouch
Journal: PLoS Genet Date: 2015-06-30 Impact factor: 5.917