
Multi-Locus Genome-Wide Association Study and Genomic Selection of Kernel Moisture Content at the Harvest Stage in Maize.

Guangfei Zhou^1,2, Qiuli Zhu^3, Yuxiang Mao^1, Guoqing Chen^1,2, Lin Xue^1,2, Huhua Lu^1, Mingliang Shi^1, Zhenliang Zhang^1, Xudong Song^1, Huimin Zhang^1, Derong Hao^1.

Abstract

Kernel moisture content at the harvest stage (KMC) is an important trait that affects the mechanical harvesting of maize grain, and the identification of genetic loci for KMC is beneficial for maize molecular breeding. In this study, we performed a multi-locus genome-wide association study (ML-GWAS) to identify quantitative trait nucleotides (QTNs) for KMC using an association mapping panel of 251 maize inbred lines that were genotyped with an Affymetrix CGMB56K SNP Array and phenotypically evaluated in three environments. Ninety-eight QTNs for KMC were detected using six ML-GWAS models (mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO). Eleven of these QTNs were considered to be stable, as they were detected by at least four ML-GWAS models in the same environment or in at least two environments and BLUP using the same ML-GWAS model. With qKMC5.6 removed, the remaining 10 stable QTNs each explained <10% of the phenotypic variation, suggesting that KMC is mainly controlled by multiple minor-effect genetic loci. A total of 63 candidate genes were predicted from the 11 stable QTNs, and 10 candidate genes were highly expressed in the kernel at different time points after pollination. High prediction accuracy was achieved when the KMC-associated QTNs were included as fixed effects in genomic selection, and the best strategy was to integrate all KMC QTNs identified by all six ML-GWAS models. These results further our understanding of the genetic architecture of KMC and highlight the potential of genomic selection for KMC in maize breeding.


Keywords: candidate gene; genomic selection; kernel moisture content; maize (Zea mays L.); multi-locus genome-wide association study; quantitative trait nucleotide

Year: 2021 PMID: 34305987 PMCID: PMC8299107 DOI: 10.3389/fpls.2021.697688

Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753

Introduction

Kernel moisture content at the harvest stage (KMC) is one of the important traits that influence maize mechanical harvesting, especially in high latitude areas (Sala et al., 2012; Li et al., 2017). Since the 1970s, many developed countries, such as the United States and Germany, have achieved fully mechanical harvesting of maize. By contrast, other countries, like China, have not yet implemented mechanical harvesting, primarily due to a lack of suitable maize varieties (Liu et al., 2013). The high KMC of currently used maize varieties restricts mechanical harvesting and represents the major barrier to maize development in China (Zhou et al., 2016, 2020; Li et al., 2017). Therefore, the genetic improvement of KMC and the breeding of elite varieties with low KMC is a major goal for maize breeders in China. The loss of maize kernel moisture occurs in two phases. The first phase lasts from pollination to kernel physiological maturity. During this phase, water in the kernel is replaced with carbohydrates, oils, proteins, etc., and the moisture of the kernel is highly dependent on its own physiological characteristics; this is called the physiological dehydration stage. The second phase lasts from physiological maturity to harvest. During this phase, the change in kernel moisture is primarily caused by moisture evaporation into the air and is thus readily influenced by environmental factors and other agronomic traits; this is known as the field dehydration stage (Brooking, 1990; Reid et al., 2010). Rapid rates of kernel filling and field dehydration are marked features of maize varieties with low KMC (Johnson and Tanner, 1972; Sala et al., 2006). Fewer husk layers as well as shorter and lighter husks are associated with greater loss of kernel moisture after physiological maturity (Reid et al., 2010; Li et al., 2016; Zhou et al., 2018). 
In addition, moisture from the kernels can be transported to other plant parts through the cob and stem in response to water potential differences (Zhou et al., 2018). Previous studies have revealed that KMC is controlled by numerous quantitative trait loci (QTLs), and hundreds of QTLs for maize KMC have been identified (Beavis et al., 1994; Melchinger et al., 1998; Austin et al., 2000; Ho et al., 2002; Mihaljevic et al., 2004, 2005; Blanc et al., 2006; Sala et al., 2006; Frascaroli et al., 2007; Capelle et al., 2010; Kebede et al., 2016; Song et al., 2017; Zhou et al., 2018; Liu et al., 2020; Yin et al., 2020b; Zhang et al., 2020; Li et al., 2021). Using meta-analysis, 44 and 34 meta-QTLs for KMC were identified by Xiang et al. (2012) and Sala et al. (2012), respectively. Liu et al. (2020) narrowed a major QTL for KMC (qGwc1.1) to a 2.05-Mb genomic region on chromosome 1 using a recombinant-derived progeny test. Li et al. (2021) cloned a gene (gar2-related nucleolar protein, GAR2) for KMC on maize chromosome 7. Yin et al. (2020b) identified seven QTLs for KMC using multiple-environment analysis and revealed that the interactions between QTLs and the environment were larger than their additive effects. Zhou et al. (2018) detected five QTLs for KMC through a mixed linear model (MLM) of single-locus genome-wide association study (SL-GWAS).
The Bonferroni correction for multiple tests is frequently used in SL-GWAS to reduce spurious associations, but it also eliminates some true loci with small effects. Multi-locus GWAS (ML-GWAS), an alternative GWAS method, was developed to address this issue; it considers the information from all markers simultaneously and does not require a multiple testing correction. ML-GWAS has been shown to have higher power and accuracy for the detection of quantitative trait nucleotides (QTNs) in maize. Zhang et al. (2018) used four ML-GWAS methods (mrMLM, FASTmrEMMA, ISIS EM-BLASSO, and pLARmEB) to identify QTNs for three stalk lodging resistance-related traits in maize and reported that the methods were reliable and complementary. Xu et al. (2018) compared one SL-GWAS method (GEMMA) and three ML-GWAS methods (FASTmrEMMA, FarmCPU, and LASSO) for the genetic detection of maize starch pasting properties, and more QTNs were detected by individual ML-GWAS methods than by the SL-GWAS method. An et al. (2020) used one SL-GWAS method (MLM) and six ML-GWAS methods (mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO) to dissect the genetic architecture of maize kernel row number. The largest number of QTNs were identified with the mrMLM method, and the most co-detected QTNs were identified with ISIS EM-BLASSO.
Given the lack of large-effect QTLs, the use of marker-assisted selection (MAS) for KMC is not ideal in maize breeding programs, and it is necessary to incorporate far more markers. Genomic selection (GS), an upgraded form of MAS, aims to use genetic effects of genome-wide molecular markers to estimate the genomic estimated breeding value (GEBV) of individuals based on optimum statistical models (Meuwissen et al., 2001). This approach has been considered most promising for the genetic improvement of complex traits controlled by multiple genes with minor effects (Wang X. et al., 2018; Xu et al., 2020). Controlling costs by using fewer markers while still achieving accurate predictions for complex quantitative traits remains a challenge (Xu et al., 2020). Recently, several studies reported that including trait-associated markers detected by GWAS as fixed effects in GS models resulted in higher accuracy than that achieved with GS models using genome-wide markers alone (Spindel et al., 2016; Qin et al., 2019; Ravelombola et al., 2019; An et al., 2020; Sehgal et al., 2020).
In this study, we used 251 maize inbred lines that were genotyped using an Affymetrix CGMB56K SNP Array and phenotypically evaluated in three field trials to (i) identify significant QTNs for KMC using ML-GWAS, (ii) predict candidate genes associated with KMC, and (iii) explore the potential of GS for KMC in maize.

Materials and Methods

Plant Material and Field Experiments

An association mapping panel of 251 diverse maize inbred lines was used as the plant material in this study (Supplementary Table 1). The field experiments were performed in three environments in 2020: Nantong, Jiangsu Province (NT, 120°E, 31°N), which is in mid-eastern China and has an average temperature of 15.1°C and an average rainfall of 1,040 mm per year; Xinxiang, Henan Province (XX, 113°E, 35°N), which is in the middle of China and has an average temperature of 15.5°C and an average rainfall of 573.4 mm per year; and Sanya, Hainan Province (SY, 108°E, 18°N), which is in southern China and has an average temperature of 25.7°C and an average rainfall of 1,347 mm per year. Each line was grown in single rows, which were 3 m in length and spaced 0.6 m apart, thereby giving a planting density of 65,000 plants/ha. The trial followed a randomized complete block design with two replicates per environment. The agronomic management of the field experiments was the same in the three environments.

Phenotypic Evaluation and Data Analysis

As described in our previous study (Zhou et al., 2018), before the experimental treatment, the physiological maturity of each line was evaluated in the field. According to their growth periods, the 251 maize inbred lines were sown on three separate dates to obtain similar physiological maturity in each environment (Supplementary Table 1). In NT, they were planted on the 22nd, 26th, and 30th of March, and the harvest stage was adjusted to between the 24th and 26th of July. In XX, they were planted on the 1st, 5th, and 9th of June, and the harvest stage was adjusted to between the 26th and 28th of September. In SY, they were planted on the 14th, 18th, and 22nd of November, and the harvest stage was adjusted to between the 5th and 7th of the following March. The ears were bagged before silking, and artificial pollination was performed at the same time for each line. The KMC for 6–8 uniformly growing plants in the middle of rows was measured using a hand-held moisture meter. At 10 days after physiological maturity, the KMC of each plant was recorded once at the middle part of the ear. The phenotypic data were analyzed using R version 3.6.3 for Windows (https://www.r-project.org/). Analysis of variance (ANOVA) of KMC was performed using the lmer function of the lme4 package based on the following model: $y_{ijk} = \mu + G_i + E_j + GE_{ij} + R_{k(j)} + e_{ijk}$, where $y_{ijk}$ is the KMC of the ith genotype in the jth environment and kth replication, $\mu$ is the grand mean over all environments, $G_i$ is the genotypic effect of the ith genotype, $E_j$ is the environmental effect of the jth environment, $GE_{ij}$ is the genotype × environment interaction effect of the ith genotype and jth environment, $R_{k(j)}$ is the effect of the kth replication in the jth environment, and $e_{ijk}$ is the residual error.
The trait heritability (H²) was estimated following Knapp et al. (1985): the individual environment $H^2\,(\%) = \sigma_g^2 / (\sigma_g^2 + \sigma_e^2 / r) \times 100\%$, and the multiple environments $H^2\,(\%) = \sigma_g^2 / (\sigma_g^2 + \sigma_{ge}^2 / n + \sigma_e^2 / nr) \times 100\%$, where $\sigma_g^2$ is the genotypic variance, $\sigma_{ge}^2$ is the variance for the interaction of genotype with environment, $\sigma_e^2$ is the error variance, n is the number of environments, and r is the number of replications. To minimize the effects of the environment, the best linear unbiased prediction (BLUP) for KMC across the three environments was estimated using the lmer function of the lme4 package with the same ANOVA model. The normality of KMC in each environment was tested (W-value) using the shapiro.test function of the stats package.
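The heritability estimates can be illustrated numerically. The following is a minimal Python sketch, not the R code used in the study; it assumes the individual-environment estimate is the genotypic variance divided by the genotypic variance plus the error variance over r, and that the multi-environment estimate adds the genotype × environment variance over n and divides the error variance by nr. The function names and the variance components in the example are hypothetical and were chosen only for easy arithmetic.

```python
def heritability_single(var_g: float, var_e: float, r: int) -> float:
    """H2 (%) within one environment: var_g / (var_g + var_e / r) * 100."""
    return var_g / (var_g + var_e / r) * 100.0


def heritability_multi(var_g: float, var_ge: float, var_e: float, n: int, r: int) -> float:
    """H2 (%) across n environments with r replications each."""
    return var_g / (var_g + var_ge / n + var_e / (n * r)) * 100.0


# Hypothetical variance components (not values from this study).
h2_one = heritability_single(var_g=30.0, var_e=20.0, r=2)                   # 30 / (30 + 10)
h2_all = heritability_multi(var_g=30.0, var_ge=9.0, var_e=24.0, n=3, r=2)   # 30 / (30 + 3 + 4)
print(round(h2_one, 2), round(h2_all, 2))
```

With these inputs the two estimates are 75% and about 81%, which shows how averaging over more environments and replications shrinks the non-genetic terms in the denominator.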

Genotyping, Population Structure, Linkage Disequilibrium, and Relative Kinship

Genotypes of the 251 maize inbred lines were evaluated using an Affymetrix CGMB56K SNP Array, which contains 56,000 single nucleotide polymorphisms (SNPs) and is made by China Golden Marker (Beijing) Biotech Co., Beijing, China. After quality control, 32,853 SNPs with minor allele frequencies >5% and missing data <20% were used for subsequent analysis. The genotypic data can be downloaded from the website https://pan.baidu.com/s/1_V0fm7hsxNdMbyYzciQsWg. Population structure was assessed using STRUCTURE 2.3 (Pritchard et al., 2000). The number of subpopulations (K) was set from 1 to 10 with five independent runs for each K. Both the burn-in period and the number of Markov chain Monte Carlo replications were set at 100,000 in each run under the admixture model. The K value was estimated from the log likelihood of the data [LnP(D)] and an ad hoc statistic ΔK, based on the rate of change of LnP(D) between successive K values (Evanno et al., 2005). Nei's genetic distance (Nei, 1972) was calculated and used to construct a neighbor-joining tree with MEGA-X software (Kumar et al., 2018). The linkage disequilibrium (LD) parameter r² between pairwise SNPs was calculated with PLINK (Purcell et al., 2007), with the window size set at 1,000 kb and the r² threshold set at 0.2. The relative kinship matrix of the 251 lines was computed using SPAGeDi 1.3 (Hardy and Vekemans, 2002), with negative values between two individuals set to zero.
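The SNP quality-control step (minor allele frequency >5% and missing data <20%) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the pipeline used in the study: genotypes are assumed to be coded as 0, 1, or 2 copies of one allele with None for a missing call, and the function name and toy genotype vectors are hypothetical.

```python
from typing import List, Optional

Genotype = Optional[int]  # 0, 1, or 2 allele copies; None marks a missing call


def passes_qc(calls: List[Genotype], min_maf: float = 0.05, max_missing: float = 0.20) -> bool:
    """Keep a SNP only if its missing rate is < max_missing and its MAF is > min_maf."""
    called = [g for g in calls if g is not None]
    missing_rate = 1.0 - len(called) / len(calls)
    if not called or missing_rate >= max_missing:
        return False
    allele_freq = sum(called) / (2.0 * len(called))
    maf = min(allele_freq, 1.0 - allele_freq)
    return maf > min_maf


# Toy SNPs scored on 10 hypothetical inbred lines.
snp_common = [0, 0, 2, 2, 0, 2, 0, 0, 2, None]         # 10% missing, MAF about 0.44
snp_monomorphic = [0] * 10                             # MAF of 0
snp_sparse = [0, 2, None, None, None, 0, 2, 0, 2, 0]   # 30% missing

kept = [name for name, calls in [("common", snp_common),
                                 ("monomorphic", snp_monomorphic),
                                 ("sparse", snp_sparse)] if passes_qc(calls)]
print(kept)
```

Only the first toy SNP passes both filters; the other two fail on allele frequency and on missing data, respectively.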

Multi-Locus Genome-Wide Association Study

ML-GWAS was conducted using the mrMLM package (https://cran.r-project.org/web/packages/mrMLM/index.html), which includes six statistical models: mrMLM (Wang et al., 2016), FASTmrMLM (Tamba and Zhang, 2018), FASTmrEMMA (Wen et al., 2018), pLARmEB (Zhang et al., 2017), pKWmEB (Ren et al., 2018), and ISIS EM-BLASSO (Tamba et al., 2017). The mrMLM is a multi-locus model including markers selected from the random-SNP-effect MLM with a less stringent selection criterion (Wang et al., 2016). The FASTmrMLM is faster than mrMLM and has higher statistical power and accuracy in estimating QTNs (Tamba and Zhang, 2018). The FASTmrEMMA combines the MLM and the expectation maximization empirical Bayes method. The pLARmEB integrates least angle regression with empirical Bayes (Zhang et al., 2017), while the pKWmEB integrates the Kruskal-Wallis test with empirical Bayes (Ren et al., 2018). The ISIS EM-BLASSO can detect significant associations with higher robustness and accuracy than mrMLM and FASTmrEMMA (Tamba et al., 2017). Default values were used for all parameters, and the threshold of logarithm of odds (LOD) ≥ 3 (or P ≤ 0.0002) was selected to determine significant QTNs (Zhang et al., 2019). To confirm the efficiency of ML-GWAS, one widely used SL-GWAS method, MLM, was conducted using TASSEL 5.0 (Bradbury et al., 2007), controlling for population structure and kinship. The threshold was also set at P ≤ 0.0002 [-log10(P) ≥ 3.70]. Two types of QTNs were defined as stable. One is the model-stable QTN (msQTN), which is identified by at least four ML-GWAS models in the same environment; the other is the environment-stable QTN (esQTN), which is identified in at least two environments and BLUP using the same ML-GWAS model.
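The significance threshold and the two stability rules can be made concrete with a short sketch. The Python code below is illustrative only and is not part of the mrMLM package. It assumes that the LOD score corresponds to a likelihood-ratio statistic of 2·ln(10)·LOD tested against a chi-square distribution with one degree of freedom, and that BLUP is recorded as its own label alongside the field environments. The function names and the toy detections (qA, qB, qC) are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Detection = Tuple[str, str, str]  # (QTN, model, environment label)


def lod_to_p(lod: float) -> float:
    """P-value for a LOD score, assuming a 1-df chi-square likelihood-ratio test."""
    chi2 = 2.0 * math.log(10.0) * lod
    return math.erfc(math.sqrt(chi2 / 2.0))  # survival function of chi-square(1)


def classify_stable(detections: Iterable[Detection],
                    min_models: int = 4, min_envs: int = 2) -> Dict[str, Set[str]]:
    """Label QTNs as msQTN and/or esQTN following the definitions in the text."""
    models_by_env = defaultdict(set)  # (qtn, env) -> models that detected it there
    envs_by_model = defaultdict(set)  # (qtn, model) -> labels where it was detected
    for qtn, model, env in detections:
        models_by_env[(qtn, env)].add(model)
        envs_by_model[(qtn, model)].add(env)

    labels: Dict[str, Set[str]] = defaultdict(set)
    for (qtn, _env), models in models_by_env.items():
        if len(models) >= min_models:
            labels[qtn].add("msQTN")
    for (qtn, _model), envs in envs_by_model.items():
        if "BLUP" in envs and len(envs - {"BLUP"}) >= min_envs:
            labels[qtn].add("esQTN")
    return dict(labels)


# Hypothetical detections, not taken from the supplementary tables.
toy = [("qA", m, "XX") for m in ("mrMLM", "FASTmrMLM", "pLARmEB", "pKWmEB")]
toy += [("qB", "mrMLM", e) for e in ("NT", "XX", "BLUP")]
toy += [("qC", "mrMLM", "NT")]

stable = classify_stable(toy)
print(stable, lod_to_p(3.0))
```

Under the stated chi-square assumption, a LOD of 3 gives a P-value very close to 0.0002, in line with the paired thresholds quoted above. In the toy data, qA qualifies as an msQTN, qB as an esQTN, and qC as neither.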

Candidate Gene Analysis

Based on the B73 reference genome v4 (https://www.maizegdb.org/gbrowse), the genes located within 100 kb upstream and 100 kb downstream of the stable QTNs (the LD decay distance of the association mapping panel) were regarded as candidates. Candidate gene annotation was performed at NCBI (https://www.ncbi.nlm.nih.gov/). Expression data for candidate genes were collected from qTeller (https://qteller.maizegdb.org/).
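The window-based candidate search can be sketched in a few lines. The Python example below is a hedged illustration rather than the procedure actually run against MaizeGDB: it assumes that a gene counts as a candidate when any part of it overlaps the ±100 kb window, and the gene records are hypothetical. Only the QTN position (chromosome 5, 61,236,323 bp for qKMC5.6) is taken from Table 2.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: int
    start: int  # bp
    end: int    # bp


def genes_near_qtn(genes: List[Gene], chrom: int, pos: int, flank: int = 100_000) -> List[str]:
    """Return IDs of genes on the same chromosome that overlap [pos - flank, pos + flank]."""
    lo, hi = pos - flank, pos + flank
    return [g.gene_id for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Hypothetical gene models; the coordinates are made up for illustration.
annotation = [
    Gene("geneA", 5, 61_150_000, 61_155_000),  # inside the window
    Gene("geneB", 5, 61_400_000, 61_405_000),  # beyond the downstream edge
    Gene("geneC", 1, 61_236_000, 61_237_000),  # same coordinates, other chromosome
    Gene("geneD", 5, 61_330_000, 61_340_000),  # straddles the downstream edge
]

candidates = genes_near_qtn(annotation, chrom=5, pos=61_236_323)
print(candidates)
```

Whether a gene that only partly overlaps the window (geneD here) should be kept is a design choice that the text does not specify; a stricter variant would require the whole gene to fall inside the window.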

Genomic Selection

GS was performed using the ridge regression best linear unbiased predictor (rrBLUP) model (Meuwissen et al., 2001) in the rrBLUP package of R (Endelman, 2011). The rrBLUP model (model 1) was: $y_i = \mu + \sum_{k=1}^{p} x_{ik}\beta_k + \varepsilon_i$, where $y_i$ is the predicted phenotype of the ith individual, $\mu$ is the grand mean, $x_{ik}$ is the genotype at the kth marker of the ith individual, p is the total number of markers (i.e., 32,853 high quality markers from the Affymetrix array), $\beta_k$ is the estimated random additive marker effect of the kth marker, and $\varepsilon_i$ is the residual error. To improve the prediction accuracy, the significant markers identified by ML-GWAS were included as fixed effects in the following model (model 2): $y_i = \mu + \sum_{j=1}^{m} x_{ij}\alpha_j + \sum_{k=1}^{p} x_{ik}\beta_k + \varepsilon_i$, where m is the number of significant markers identified by ML-GWAS and treated as fixed effect covariates, $x_{ij}$ is the genotype of the ith individual at the jth of these markers, $\alpha_j$ is the fixed additive effect of the jth marker, and the remaining terms are the same as those described in model 1. Twelve sets of markers were included as fixed effects in model 2: the significant markers identified by mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO; the markers identified by at least two, three, four, five, or six models; and all significant markers identified by all six models. Prediction accuracy was evaluated using 5-fold cross validation with 100 iterations. In brief, the association panel was randomly divided into five equal subsets; four subsets were regarded as the training population, and the remaining subset was considered to be the testing population. The prediction accuracy was defined as the coefficient of determination between the predicted and the observed values based on linear regression analysis.
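The cross-validation scheme and the accuracy metric can be sketched independently of the prediction model. The Python code below is a minimal illustration, not the rrBLUP workflow itself: it only shows a random 5-fold split of 251 lines and a regression summary, and it assumes that observed values are regressed on predicted values. The function names and the seed are hypothetical, and no genomic model is fitted.

```python
import random
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k near-equal folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def regression_summary(pred: Sequence[float], obs: Sequence[float]) -> Tuple[float, float, float]:
    """Regress observed on predicted values; return (R2, slope, intercept)."""
    n = len(pred)
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    sxx = sum((p - mean_p) ** 2 for p in pred)
    syy = sum((o - mean_o) ** 2 for o in obs)
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    slope = sxy / sxx
    intercept = mean_o - slope * mean_p
    return sxy * sxy / (sxx * syy), slope, intercept


folds = k_fold_indices(251, k=5, seed=1)
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(251) if i not in held_out]
    # A GS model (not implemented here) would be trained on train_idx
    # and used to predict the lines in test_idx.
    assert len(train_idx) + len(test_idx) == 251

r2, slope, intercept = regression_summary([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print([len(f) for f in folds], r2, slope, intercept)
```

Because 251 is not divisible by five, one fold holds 51 lines and the others 50. Repeating the split with different seeds would correspond to the repeated iterations described above.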

Results

Population Structure, Linkage Disequilibrium, and Relative Kinship

STRUCTURE software was used to calculate the Bayesian clustering from K = 1 to 10 with five independent runs for each K. The LnP(D) values increased as K increased from 1 to 10 without an obvious inflection point, and ΔK reached its peak at K = 6 (Figure 1A), suggesting that this association mapping panel could be divided into six subgroups (Figure 1B). A neighbor-joining tree was constructed based on Nei's genetic distance, and it showed six main clusters for this panel (Figure 1C), consistent with the STRUCTURE results.

Figure 1

Population structure analysis of 251 maize inbred lines. (A) Estimated ΔK and LnP(D) in the STRUCTURE analysis. (B) Neighbor-joining tree of 251 maize inbred lines. (C) The Bayes cluster plot of 251 maize inbred lines when K = 6.

The six subpopulations were designated PA, PB, Tangsipingtou (TSPT), Lancaster (LAN), BSSS, and X (Supplementary Table 1). Subgroup PA, including 45 inbred lines, tended to be improved Reid lines, such as Zheng 58 and Ye478. Subgroup PB, including 45 inbred lines, derived mainly from hybrid 78599 and included Qi319, ND1145, etc. Subgroup TSPT, including 74 inbred lines, consisted mainly of inbred lines such as Huangzaosi and Chang7-2. Subgroup LAN, including 15 inbred lines, contained the representative inbred line Mo17. Subgroup BSSS derived from the synthetic variety BSSS and included 44 inbred lines; its representative inbred line was B73. Subgroup X, including 28 inbred lines, derived mainly from the hybrid Xianyu335, which is widely cultivated in China.

LD decayed differently in the 10 chromosomes; chromosome 7 had the most rapid decay rate, and chromosome 4 had the slowest. The average LD decay distance across all chromosomes, defined as the distance at which the LD parameter (r²) dropped to half of its maximum value, was ~100 kb (Figure 2A). The average pairwise relative kinship value was 0.076. Pairwise relative kinship values of 0 accounted for 64.64% of all kinship values, values from 0 to 0.2 accounted for 88.03%, and values >0.5 accounted for only 4.07% (Figure 2B). This result revealed that the 251 inbred lines used in this study were distantly related.

Figure 2

Linkage disequilibrium across the 10 chromosomes (A) and pairwise relative kinship for 251 maize inbred lines (B).

Phenotypic Variation in KMC

Descriptive statistics for KMC are presented in Table 1. KMC showed abundant variation among the 251 lines in each environment and was positively correlated among the different environments (Figure 3). The KMC in each environment approximately fitted a normal distribution with small skewness and kurtosis and high W-value (P > 0.05) (Table 1; Figure 3), suggesting that KMC was controlled by multiple genetic loci in this association mapping panel. ANOVA showed that the environment exerted significant influence on KMC (Supplementary Table 2), and the heritability was high (67.36–75.86%) (Table 1).

Table 1

Phenotypic performance, variance component and heritability of KMC.

Env.^a	Mean ± SD^b (%)	Range (%)	Skewness	Kurtosis	W-value	σg²^c	σge²^d	H² (%)^e
Nantong	34.74 ± 6.23	18.63–51.13	−0.15	−0.17	0.99	25.60^**		67.36
Xinxiang	35.30 ± 8.60	10.63–52.13	−0.29	−0.29	0.99	52.11^**		70.42
Sanya	38.33 ± 8.03	18.42–53.83	−0.56	−0.42	0.97	44.63^**		69.73
BLUP	36.12 ± 4.79	21.14–47.31	0.19	−0.42	0.99	30.20^**	9.63^**	75.86

^a Environment.
^b Standard deviation.
^c Variance of genotype.
^d Variance of genotype × environment.
^e Heritability.
^** Significant at P < 0.01.

Figure 3

The correlation and frequency distribution of KMC in three environments. The upper panel shows correlation coefficients, and the lower panel shows scatter plots. The histograms represent the frequency distribution of the trait. *** indicates significance at P < 0.001.

Based on the BLUP values across the three environments, 11 of the 251 lines showed low KMC (below 27% moisture) (ID: 110, 111, 114, 131, 188, 209, 235, 242, 246, 247, and 249), and these are marked in bold in Supplementary Table 1. Fifty-five lines showed high KMC (above 40%), and the remaining lines showed moderate KMC (Supplementary Table 1). There was a significant difference in KMC among the six subpopulations; the KMC of subgroup X was the lowest, and that of subgroup PA was the highest (Figure 4A).

Figure 4

Violin plot of (A) the KMC and (B) the number of favorable alleles for KMC in six subpopulations of this association mapping panel. Different letters indicate significant difference at P < 0.001 estimated by Student's t-test.

Multi-Locus Genome-Wide Association Study of KMC

A total of 98 QTNs were detected in NT, XX, SY, and BLUP across all environments by the six ML-GWAS models, and each explained 0.67–26.96% of the phenotypic variation in KMC (Supplementary Table 3). Thirty-eight, 35, 23, 27, 34, and 39 QTNs were detected by mrMLM (Supplementary Figure 1), FASTmrMLM (Supplementary Figure 2), FASTmrEMMA (Supplementary Figure 3), pLARmEB, pKWmEB (Supplementary Figure 4), and ISIS EM-BLASSO, respectively. By contrast, only 7 QTNs were detected in NT, XX, SY, and BLUP across all environments by the SL-GWAS model (MLM) (Supplementary Figure 5; Supplementary Table 3). Except for qKMC7.4, the remaining 6 QTNs overlapped with those from the ML-GWAS models. Among the 98 QTNs, 44, 27, 16, 7, and 4 were co-detected by at least two, three, four, five, or six ML-GWAS models, respectively, and 25, 25, 32, and 42 were detected in NT, XX, SY, and BLUP, respectively. No QTN was detected in all three environments and BLUP, and only three QTNs (qKMC2.15, qKMC6.1, and qKMC8.2) were detected in two environments and BLUP. Eleven QTNs were considered to be stable; these were distributed on chromosomes 1, 2, 3, 5, 6, 8, and 10 (Table 2). All 11 stable QTNs were msQTNs, and 1 of them (qKMC2.15) was also an esQTN. Seven of them overlapped previously reported genomic regions, and the remaining four were putatively novel loci. Only one QTN, qKMC5.6, accounted for more than 10% of the phenotypic variation (12.41–23.27%), and it may be a major genetic locus for KMC.

Table 2

Stable QTNs for KMC co-detected by at least four models under a uniform environment or in at least two environments and BLUP using the same ML-GWAS model.

QTN	Marker	Chr. ^a	Position (bp)	LOD	R² (%)^b	Model^c	Environment^d	References
qKMC1.4	AX-86284737	1	38082648	5.06–9.07	3.14–7.37	1, 2, 3, 4, 5, 6	XX, BLUP	Sala et al., 2012; Xiang et al., 2012
qKMC1.5	AX-86259253	1	39246603	3.50–9.41	3.02–7.66	1, 2, 3, 4, 5, 6	NT, BLUP	Sala et al., 2012; Xiang et al., 2012
qKMC1.18	AX-86266353	1	297863807	4.69–6.06	3.13–7.16	1, 2, 4, 5	XX, BLUP
qKMC2.15	AX-116874459	2	178270600	5.15–7.42	2.19–6.23	1, 3, 4, 5, 6	NT, XX, BLUP	Xiang et al., 2012
qKMC3.3	AX-86264182	3	5148837	4.69–7.95	3.46–9.74	1, 2, 4, 5, 6	XX, BLUP
qKMC3.15	AX-116872692	3	229667802	3.97–5.94	2.33–6.89	1, 2, 3, 4, 5, 6	XX, BLUP	Yin et al., 2020b
qKMC5.6	AX-86314969	5	61236323	4.21–7.84	12.41–23.27	1, 2, 4, 5, 6	NT	Xiang et al., 2012
qKMC5.15	AX-86282179	5	217125252	3.23–10.12	1.12–6.48	1, 2, 4, 6	NT, BLUP	Li et al., 2021
qKMC6.7	AX-86294630	6	163230474	3.41–7.67	3.29–7.93	1, 3, 4, 6	SY
qKMC8.3	AX-86297230	8	174417551	3.09–5.97	2.45–3.89	2, 3, 4, 5	SY, BLUP
qKMC10.1	AX-86257470	10	10313586	3.19–9.60	1.86–9.81	1, 2, 3, 4, 5, 6	XX, BLUP	Xiang et al., 2012

^a Chromosome.
^b Phenotypic variation explained.
^c 1: mrMLM, 2: FASTmrMLM, 3: FASTmrEMMA, 4: pLARmEB, 5: pKWmEB, 6: ISIS EM-BLASSO.
^d NT, Nantong; XX, Xinxiang; SY, Sanya; BLUP, best linear unbiased prediction.


Candidate Genes for KMC

According to the LD in this association panel (Figure 2A), 63 candidate genes were identified within 100 kb upstream and downstream of the 11 stable QTNs, and their expression in the kernel varied widely among 11 time points after pollination (Supplementary Table 4). Ten of these genes showed high expression (FPKM ≥ 20) and are marked in bold in Supplementary Table 4, suggesting that they may take part in the loss of kernel moisture. Nine of the genes encode proteins with assigned functions in multiple biological pathways, and the remaining gene encodes a protein of unknown function.

Favorable Allele Mining

Based on the BLUP values across the three environments, the favorable alleles of the 98 QTNs were mined (Supplementary Table 5). Inbred line ID 110 harbored the most favorable alleles (69), and inbred line ID 21 harbored the fewest (21). Across the association panel, linear regression revealed a significant negative correlation between KMC and the number of favorable alleles in each inbred line (R² = 0.68, slope = −0.41, intercept = 53.17, P < 0.001), indicating that pyramiding these favorable alleles could reduce KMC effectively. There was a significant difference in the number of favorable alleles among the six subpopulations; subgroup X had the most favorable alleles (48), and subgroup PA had the fewest (32) (Figure 4B).
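The reported regression can be read as a simple prediction rule. The sketch below is illustrative only: it assumes that KMC (%) is the response and the number of favorable alleles is the predictor, uses the slope and intercept quoted above, and evaluates the fitted line at the two extreme allele counts reported for the panel. The function name is hypothetical.

```python
SLOPE = -0.41      # change in KMC (%) per additional favorable allele, from the text
INTERCEPT = 53.17  # fitted KMC (%) at zero favorable alleles, from the text


def fitted_kmc(n_favorable: int) -> float:
    """Value of the fitted regression line at a given favorable-allele count."""
    return INTERCEPT + SLOPE * n_favorable


most = round(fitted_kmc(69), 2)    # the line with the most favorable alleles
fewest = round(fitted_kmc(21), 2)  # the line with the fewest favorable alleles
print(most, fewest)                # 24.88 44.56
```

Because the regression explains 68% of the variation, these are values on the fitted line rather than the measured KMC of any particular inbred line.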

Genomic Selection of KMC

The average prediction accuracy was 0.12, 0.13, 0.11, and 0.17 in NT, XX, SY, and BLUP, respectively, when all 32,853 markers across the entire genome were included in model 1 (Table 3). To obtain optimal prediction accuracy, 12 different marker sets were included as fixed effects in model 2. A significant increase in prediction accuracy was obtained when ML-GWAS-derived markers were used: the prediction accuracies based on markers detected by each individual ML-GWAS model were almost the same, at approximately 0.34 in NT, 0.53 in XX, 0.29 in SY, and 0.59 in BLUP (Table 3). When all significant markers detected by all six ML-GWAS models were incorporated into the GS model, the prediction accuracy was highest: 0.54 in NT, 0.64 in XX, 0.58 in SY, and 0.76 in BLUP (Table 3).

Table 3

The prediction accuracy of the KMC when using markers in different ML-GWAS models in three environments and BLUP.

Marker set^a	Nantong				Xinxiang				Sanya				Best linear unbiased prediction
	R²^b	Slope	Intercept	RMSE^c	R²	Slope	Intercept	RMSE	R²	Slope	Intercept	RMSE	R²	Slope	Intercept	RMSE
Genome	0.12	0.83	5.99	5.84	0.13	0.90	3.45	7.97	0.11	0.81	6.07	7.84	0.17	0.85	5.20	4.35
mrMLM	0.34	0.99	0.15	5.02	0.53	0.99	0.32	5.86	0.29	0.97	0.98	6.75	0.59	1.00	0.01	3.05
FASTmrMLM	0.34	0.99	0.37	5.02	0.53	0.99	0.28	5.86	0.29	0.98	0.77	6.75	0.59	1.00	0.12	3.06
FASTmrEMMA	0.34	0.99	0.19	5.01	0.53	1.00	0.19	5.85	0.29	0.98	0.76	6.73	0.59	1.00	0.05	3.06
pLARmEB	0.34	0.99	0.45	5.02	0.53	0.99	0.33	5.86	0.29	0.97	0.96	6.75	0.59	1.00	0.12	3.05
pKWmEB	0.34	0.99	0.23	5.02	0.53	0.99	0.32	5.85	0.29	0.97	1.14	6.76	0.59	1.00	0.08	3.05
ISIS EM-BLASSO	0.34	0.99	0.31	5.01	0.53	0.99	0.26	5.87	0.29	0.97	1.25	6.75	0.59	1.00	0.13	3.06
All models	0.54	1.01	−0.20	4.21	0.64	1.00	−0.06	5.12	0.58	1.02	−0.60	5.16	0.76	1.01	−0.29	2.32
C2	0.46	1.01	−0.31	4.55	0.60	1.00	−0.03	5.42	0.42	0.99	0.43	6.06	0.69	1.00	−0.05	2.64
C3	0.47	1.00	−0.05	4.48	0.54	0.99	0.33	5.81	0.33	0.98	0.33	6.52	0.65	0.99	0.05	2.81
C4	0.42	1.00	0.03	4.72	0.48	1.00	0.09	6.19	0.26	0.98	0.82	6.87	0.56	1.00	0.10	3.17
C5	0.19	0.98	0.86	5.59	0.20	0.99	0.51	7.42	0.14	0.95	1.21	7.81	0.22	0.96	1.10	4.21
C6	0.15	0.92	2.80	6.03	0.16	0.97	0.86	8.06	0.13	0.88	0.86	7.89	0.19	0.96	1.36	4.53

^a Genome indicates that 32,853 markers across the entire genome were included in model 1; mrMLM indicates that 38 markers identified by the mrMLM model were included as fixed effects in model 2; FASTmrMLM indicates that 35 markers identified by the FASTmrMLM model were included as fixed effects in model 2; FASTmrEMMA indicates that 23 markers identified by the FASTmrEMMA model were included as fixed effects in model 2; pLARmEB indicates that 27 markers identified by the pLARmEB model were included as fixed effects in model 2; pKWmEB indicates that 34 markers identified by the pKWmEB model were included as fixed effects in model 2; ISIS EM-BLASSO indicates that 39 markers identified by the ISIS EM-BLASSO model were included as fixed effects in model 2; All models indicates that 98 markers identified by all six ML-GWAS models were included as fixed effects in model 2; C2 indicates that 44 markers identified by at least two ML-GWAS models were included as fixed effects in model 2; C3 indicates that 27 markers identified by at least three ML-GWAS models were included as fixed effects in model 2; C4 indicates that 16 markers identified by at least four ML-GWAS models were included as fixed effects in model 2; C5 indicates that 7 markers identified by at least five ML-GWAS models were included as fixed effects in model 2; C6 indicates that 4 markers identified by each of the six ML-GWAS models were included as fixed effects in model 2.
^b Coefficient of determination.
^c Square root of the mean square error.

To explore whether using QTNs co-detected in multiple ML-GWAS models could improve prediction accuracy, we conducted GS using QTNs identified in at least two, three, four, five, or six models.
Use of the QTNs identified in at least two, three, or four models maintain a relatively high prediction level, but QTNs identified in five or six models provided slight advantage in predicting KMC (Table 3). Seven QTNs were randomly selected from the 44 QTNs identified in at least two models (repeated 5 times) to conducted GS, and the mean of prediction accuracy was relatively low: 0.19 in NT, 0.15 in XX, 0.11 in SY, and 0.19 in BLUP, consistent with the results obtained by GS using QTNs identified in at least five models. This may be due to the smaller QTN numbers, which only explained a small fraction of phenotypic variance.
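The co-detection subsets (QTNs found by at least k models) and the random-sampling control described above can be illustrated with a minimal Python sketch. The QTN identifiers, the three-model toy input, and the function names `codetection_subsets` and `random_control_sets` are hypothetical and serve only as an illustration; they are not taken from the study's analysis scripts.

```python
import random
from collections import Counter


def codetection_subsets(detections):
    """Map k -> sorted QTNs detected by at least k models.

    `detections` maps a model name to the set of QTN ids it reported.
    """
    counts = Counter(q for qtns in detections.values() for q in set(qtns))
    return {
        k: sorted(q for q, c in counts.items() if c >= k)
        for k in range(1, len(detections) + 1)
    }


def random_control_sets(pool, size, repeats, seed=0):
    """Draw `repeats` random subsets of `size` QTNs from `pool` (no replacement)."""
    rng = random.Random(seed)
    ordered = sorted(pool)  # fixed order so a given seed is reproducible
    return [sorted(rng.sample(ordered, size)) for _ in range(repeats)]


# Toy input with hypothetical QTN ids (three models instead of six).
toy = {
    "mrMLM": {"q1", "q2", "q3"},
    "FASTmrMLM": {"q2", "q3"},
    "PKWmEB": {"q3", "q4"},
}
subsets = codetection_subsets(toy)

# Control mirroring the text: 7 QTNs drawn from a pool of 44, repeated 5 times.
pool44 = [f"qtn{i}" for i in range(44)]
controls = random_control_sets(pool44, size=7, repeats=5)
```

Each control set would then be passed to the same GS model as the C5 set, so that any difference in accuracy reflects which QTNs were chosen rather than how many.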

Discussion

In this study, the 251 maize inbred lines were sown on three dates according to their growth periods, enabling us to measure the KMC of each line over similar periods in each environment. A hand-held moisture meter was used to measure KMC; this is a reliable method that has been reported to be useful for evaluating genetic materials for QTL mapping (Sala et al., 2006; Kebede et al., 2016) and GWAS (Zhou et al., 2018; Li et al., 2021). However, we observed a significant genotype-by-environment interaction and relatively low correlation coefficients (0.44–0.58) among environments. This is typical because temperature, air humidity, and rainfall differ across environments, suggesting that genotype-by-environment interactions should be considered during maize breeding. Despite this issue, our aim was to obtain stable genetic loci that make a consistent contribution to KMC.

As described by Zhang et al. (2019), QTNs identified by multiple models are usually reliable when several ML-GWAS methods are applied to the same dataset. To reduce false-positive signals and retain a set of true-positive loci, we considered two types of QTNs to be stable: msQTNs, which are identified by at least four ML-GWAS models in a single environment, and esQTNs, which are identified in at least two environments and BLUP using the same ML-GWAS model. Ninety-eight QTNs were identified by six ML-GWAS models in three environments and in BLUP across the three environments; eleven were considered to be stable QTNs. Only one QTN, qKMC5.6, explained more than 10% of the phenotypic variation, consistent with previous studies in which KMC was mainly controlled by numerous minor-effect genetic loci (Kebede et al., 2016; Song et al., 2017). Of these stable QTNs, seven were located in genomic regions reported by previous studies, confirming the accuracy of QTN detection by ML-GWAS.
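The two stability criteria can be expressed as a short Python sketch. It rests on two assumptions that are one reading of the definitions rather than the authors' implementation: an msQTN needs at least four models within the same dataset (one environment, or BLUP), and an esQTN needs a single model to detect it in at least two of the environments NT, XX, and SY and also in BLUP. The function name `classify_stable` and the QTN ids are hypothetical.

```python
from collections import defaultdict

ENVIRONMENTS = {"NT", "XX", "SY"}  # environment codes used in the text


def classify_stable(records, min_models=4, min_envs=2):
    """Return (msQTN set, esQTN set) from (qtn, model, dataset) detection records.

    `dataset` is an environment code or "BLUP". Thresholds follow the text;
    the handling of BLUP is an assumption of this sketch.
    """
    models_by_dataset = defaultdict(set)   # (qtn, dataset) -> models
    datasets_by_model = defaultdict(set)   # (qtn, model) -> datasets
    for qtn, model, dataset in records:
        models_by_dataset[(qtn, dataset)].add(model)
        datasets_by_model[(qtn, model)].add(dataset)

    # msQTN: enough models agree within one dataset.
    ms = {q for (q, _), models in models_by_dataset.items()
          if len(models) >= min_models}
    # esQTN: one model detects the QTN in enough environments and in BLUP.
    es = {q for (q, _), datasets in datasets_by_model.items()
          if "BLUP" in datasets and len(datasets & ENVIRONMENTS) >= min_envs}
    return ms, es


# Toy records with hypothetical QTN ids.
records = [
    ("qA", "mrMLM", "NT"), ("qA", "FASTmrMLM", "NT"),
    ("qA", "PLARmEB", "NT"), ("qA", "PKWmEB", "NT"),
    ("qB", "mrMLM", "NT"), ("qB", "mrMLM", "XX"), ("qB", "mrMLM", "BLUP"),
    ("qC", "mrMLM", "NT"), ("qC", "FASTmrMLM", "XX"), ("qC", "FASTmrMLM", "BLUP"),
]
ms_qtns, es_qtns = classify_stable(records)
stable = ms_qtns | es_qtns
```

In this toy input, qA qualifies through model agreement, qB through environmental consistency, and qC through neither, because no single model detects it in two environments.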
Five QTNs (qKMC1.4, qKMC1.5, qKMC2.15, qKMC5.6, and qKMC10.1) were located in the meta-QTL regions estimated by Xiang et al. (2012) and Sala et al. (2012). In addition, qKMC3.15 was located in the QTL regions reported by Yin et al. (2020b), and qKMC5.15 overlapped with a QTL region identified by Li et al. (2021). More importantly, four novel genetic loci for KMC were identified in this study.

Candidate gene analysis of the stable QTNs is necessary for further gene cloning and functional verification. To date, only two genes underlying major QTLs for KMC have been identified (Li et al., 2021). In this study, 63 candidate genes were identified surrounding the 11 stable QTNs. Among these genes, 10 were highly expressed in the kernel at different time points after pollination, suggesting that they may affect kernel water loss. We cannot accurately determine which are the causal genes associated with KMC based on the present data. However, four of the candidate genes (Zm00001d028560, Zm00001d005546, Zm00001d014742, and Zm00001d012439) caught our attention.

Zm00001d028560, a candidate gene for qKMC1.5, encodes a leucine-rich repeat protein kinase that has been suggested to have an important role in signaling during pathogen recognition (Romeis, 2001; Afzal et al., 2007). Its Arabidopsis homolog, FEI1, participates in cell wall elongation (Xu et al., 2008; Basu et al., 2016). Interestingly, this gene was located in the QTL regions for maize ear rot resistance identified by Butrón et al. (2019), Martin et al. (2012), and Robertson-Hoyt et al. (2007a). Previous studies have reported that KMC is negatively correlated with resistance to maize ear rot (Robertson-Hoyt et al., 2007b; Kebede et al., 2016). Moreover, Xiang et al. (2012) reported 14 pleiotropic meta-QTLs associated with both ear rot resistance and KMC in maize. These results suggest that Zm00001d028560 may be related to both KMC and ear rot resistance in maize.
Zm00001d005546, a candidate gene for qKMC2.15, encodes ADP-glucose pyrophosphorylase (AGPase), which provides the nucleotide sugar ADP-glucose and thus catalyzes the first step in starch biosynthesis (Slattery et al., 2000; Comparot-Moss and Denyer, 2009). During the maize kernel filling period, AGPase activity and starch synthesis were significantly enhanced by increasing AGPase expression (Ozbun et al., 1973; Li et al., 2010). In addition, Zm00001d005546 is located in a QTL region related to the maize kernel filling process identified by Yin et al. (2020a). Kernel filling had a notable influence on kernel drying rate before and after physiological maturity in maize (Jia et al., 2020). Therefore, we hypothesize that high expression of Zm00001d005546 may have promoted starch synthesis in the kernel, increasing the kernel filling rate and thereby accelerating the kernel dehydration rate before physiological maturity. This may have prolonged the field dehydration time of the kernel and ultimately resulted in low KMC. This explanation also provides a molecular hypothesis for the maize breeding phenomenon in which a hybrid or inbred line with a high kernel filling rate generally has low KMC (Johnson and Tanner, 1972; Kang and Zuber, 1989). More experiments are needed to elucidate the function and mechanism of Zm00001d005546.

Zm00001d014742 encodes an F-box domain protein and is a candidate gene for the major QTN, qKMC5.6. Its Arabidopsis homolog, AtSKIP31, is involved in primary root growth under nitrogen deficiency and regulates nitrogen utilization efficiency (Hong et al., 2017). Nitrogen utilization efficiency is related to grain yield and maturation (Wang W. et al., 2018), which are positively correlated with KMC (Zhou et al., 2018; Li et al., 2021). Zm00001d012439, a candidate gene for the novel QTN, qKMC8.3, encodes histone H4, which may affect gene transcription activity through histone modification (Heintz, 1991).
In this study, the 251 maize inbred lines were divided into six subgroups: PA, PB, TSPT, BASSS, LAN, and X. Among these, Subgroup PA had the highest KMC and the fewest favorable alleles for KMC; however, Subgroup PA contains multiple elite inbred lines, such as Ye478 and Zheng58, and has played an important role in maize breeding in China over the last 40 years (Li and Wang, 2010). Subgroup X had the lowest KMC and the greatest number of favorable alleles for KMC, and has gradually been adopted in maize breeding in China (Zhao et al., 2018). This phenomenon is mainly due to changes in maize breeding goals over time in China. Before 2010, maize harvesting relied mainly on manual labor. Farmers preferred to plant varieties with large ears at low density to ensure yield, and Chinese maize breeders increased yields by extending the growth period. Subgroup PA is characterized by large ears and a long growth period, which was in line with the breeding goals at that time (Li and Wang, 2010). Maize inbred lines or hybrids with large ears and long growth periods tend to have high KMC (Zhou et al., 2018; Li et al., 2021). In recent years, with the development of agricultural modernization, mechanical harvesting of grain has become the direction of development for maize production (Li et al., 2017), and Chinese maize breeders have increasingly concentrated on KMC. Thus, a new heterotic group, Subgroup X, was bred. Subgroup X has a short growth period and low KMC, which makes it suitable for mechanical harvesting of maize grain (Zhao et al., 2018).

With advances in sequencing technology and reductions in testing costs, GS has been widely implemented in plant breeding. Fitting GS models must contend with the fact that the number of markers (p) far exceeds the number of individuals (n) (de los Campos et al., 2013).
Consequently, when a GS model that considers the additive effect of each marker is fitted to such large-p, small-n data, there is an infinite number of maximum likelihood estimates of these effects (Gianola, 2013). rrBLUP overcomes this issue by incorporating all marker information to predict an individual's genomic estimated value while simultaneously applying a penalization function that restricts the size of each marker's predicted additive contribution (Meuwissen et al., 2001). In this study, p was 32,853, whereas n was only 251. Thus, rrBLUP was selected to conduct GS. When using 32,853 markers across the entire genome, we obtained a low prediction accuracy (0.11–0.17). However, higher prediction levels were readily attained when the ML-GWAS-derived markers were included as fixed effects. The prediction accuracy was still high (0.26–0.56) when only 16 stable markers identified by at least four models were included. Similar findings were reported for maize kernel row number (An et al., 2020), resistance to maize southern leaf blight and gray leaf spot (Bian and Holland, 2017), and maize low-phosphorus tolerance (Xu et al., 2018). Therefore, using a small set of markers identified by multiple ML-GWAS methods as fixed effects in an rrBLUP model is a powerful tool for KMC prediction in maize molecular breeding and can effectively save time and costs.
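The idea of penalizing genome-wide marker effects while leaving selected QTNs unpenalized can be sketched with NumPy. This is a simplified illustration, not the study's rrBLUP analysis: the penalty `lam` is fixed by hand (rrBLUP software typically estimates the corresponding variance ratio from the data), the genotypes and phenotypes are simulated, the QTN positions in `qtn_idx` are hypothetical, and accuracy is scored as the Pearson correlation on a single held-out split, which may differ from the metric and validation scheme used in the study.

```python
import numpy as np


def fit_ridge_with_fixed(y, Z, X=None, lam=1.0):
    """Fit y = [1, X] b + Z u + e, shrinking only the marker effects u.

    Z holds genome-wide markers (penalized); X holds optional QTN genotypes
    treated as fixed effects (unpenalized). Returns (b, u).
    """
    ones = np.ones((len(y), 1))
    Xd = ones if X is None else np.hstack([ones, X])
    q = Xd.shape[1]
    W = np.hstack([Xd, Z])
    penalty = np.zeros(W.shape[1])
    penalty[q:] = lam  # ridge penalty on the random-effect block only
    coef = np.linalg.solve(W.T @ W + np.diag(penalty), W.T @ y)
    return coef[:q], coef[q:]


def predict(b, u, Z, X=None):
    """Predict phenotypes from fitted fixed (b) and marker (u) effects."""
    ones = np.ones((Z.shape[0], 1))
    Xd = ones if X is None else np.hstack([ones, X])
    return Xd @ b + Z @ u


# Simulated toy data: n = 251 lines as in the text, but only 500 markers
# so that the (p x p) system stays small.
rng = np.random.default_rng(0)
n, p = 251, 500
Z = rng.integers(0, 3, size=(n, p)).astype(float)  # 0/1/2 genotype codes
qtn_idx = [10, 200, 350]                           # hypothetical GWAS hits
effects = np.zeros(p)
effects[qtn_idx] = 1.0
y = Z @ effects + rng.normal(0.0, 1.0, n)

train, test = np.arange(200), np.arange(200, n)
X = Z[:, qtn_idx]

# Model 1: all markers penalized. Model 2: QTNs added as fixed effects.
b1, u1 = fit_ridge_with_fixed(y[train], Z[train], lam=100.0)
b2, u2 = fit_ridge_with_fixed(y[train], Z[train], X[train], lam=100.0)
r1 = np.corrcoef(predict(b1, u1, Z[test]), y[test])[0, 1]
r2 = np.corrcoef(predict(b2, u2, Z[test], X[test]), y[test])[0, 1]
print(f"model 1 accuracy: {r1:.2f}, model 2 accuracy: {r2:.2f}")
```

The penalty is what makes the system solvable when p exceeds n: without it, `W.T @ W` is singular and the marker effects are not uniquely determined. Solving the full (p + q)-dimensional system directly is only practical for small p; with tens of thousands of markers, an equivalent n-by-n formulation is usually preferred.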

Conclusions

Ninety-eight QTNs for KMC were identified using six ML-GWAS models in three environments and BLUP across three environments. Eleven QTNs were considered to be stable. Seven stable QTNs corresponded to previously reported QTL regions, whereas the remaining four were putatively novel loci. Sixty-three candidate genes were identified within LD blocks of the 11 stable QTNs. Among these candidates, 10 may potentially affect the loss of water from the maize kernel. High prediction levels were easily reached when the KMC-associated markers were included as fixed effects in GS. The best strategy was to integrate all KMC-associated markers identified by all six ML-GWAS models. These results facilitate our understanding of the genetic basis of KMC and provide useful information for the reduction of KMC in maize breeding.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

Author Contributions

GZ and DH designed the experiment and wrote the manuscript. GZ, YM, LX, GC, ZZ, XS, HZ, HL, MS, and DH performed the experiments and collected the phenotypic data. GZ and QZ analyzed the genotypic and phenotypic data. All authors read and approved the final manuscript.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

49 in total

Review 1. Protein kinases in the plant defence response.

Authors: T Romeis
Journal: Curr Opin Plant Biol Date: 2001-10 Impact factor: 7.834

2. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study.

Authors: G Evanno; S Regnaut; J Goudet
Journal: Mol Ecol Date: 2005-07 Impact factor: 6.185

3. PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors: Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal: Am J Hum Genet Date: 2007-07-25 Impact factor: 11.025

4. Quantitative trait locus (QTL) mapping using different testers and independent population samples in maize reveals low power of QTL detection and large bias in estimates of QTL effects.

Authors: A E Melchinger; H F Utz; C C Schön
Journal: Genetics Date: 1998-05 Impact factor: 4.562

5. Molecular mapping of quantitative trait loci for grain moisture at harvest and field grain drying rate in maize (Zea mays L.).

Authors: Jun Zhang; Fengqi Zhang; Baojun Tang; Yong Ding; Laikun Xia; Jianshuang Qi; Xinyuan Mu; Limin Gu; Daowen Lu; Yanhui Chen
Journal: Physiol Plant Date: 2019-12-06 Impact factor: 4.500

6. Integrating a genome-wide association study with transcriptomic analysis to detect genes controlling grain drying rate in maize (Zea may, L.).

Authors: Tengjiao Jia; Lifeng Wang; Jingjing Li; Juan Ma; Yanyong Cao; Thomas Lübberstedt; Huiyong Li
Journal: Theor Appl Genet Date: 2019-12-03 Impact factor: 5.699

7. Iterative sure independence screening EM-Bayesian LASSO algorithm for multi-locus genome-wide association studies.

Authors: Cox Lwaka Tamba; Yuan-Li Ni; Yuan-Ming Zhang
Journal: PLoS Comput Biol Date: 2017-01-31 Impact factor: 4.475

8. Methodological implementation of mixed linear models in multi-locus genome-wide association studies.

Authors: Yang-Jun Wen; Hanwen Zhang; Yuan-Li Ni; Bo Huang; Jin Zhang; Jian-Ying Feng; Shi-Bo Wang; Jim M Dunwell; Yuan-Ming Zhang; Rongling Wu
Journal: Brief Bioinform Date: 2018-07-20 Impact factor: 11.622

9. Editorial: The Applications of New Multi-Locus GWAS Methodologies in the Genetic Dissection of Complex Traits.

Authors: Yuan-Ming Zhang; Zhenyu Jia; Jim M Dunwell
Journal: Front Plant Sci Date: 2019-02-11 Impact factor: 5.753

10. Genome-wide association study dissects yield components associated with low-phosphorus stress tolerance in maize.

Authors: Cheng Xu; Hongwei Zhang; Jianhao Sun; Zifeng Guo; Cheng Zou; Wen-Xue Li; Chuanxiao Xie; Changling Huang; Ruineng Xu; Hong Liao; Jinxiang Wang; Xiaojie Xu; Shanhong Wang; Yunbi Xu
Journal: Theor Appl Genet Date: 2018-05-12 Impact factor: 5.699
