Literature DB >> 31242867

Genome-wide associations and epistatic interactions for internode number, plant height, seed weight and seed yield in soybean.

Teshale Assefa1, Paul I Otyama2, Anne V Brown3, Scott R Kalberer3, Roshan S Kulkarni2, Steven B Cannon4.   

Abstract

BACKGROUND: Breeding programs benefit from information about marker-trait associations for many traits, whether the goal is to place those traits under active selection or to maintain them through background selection. Association studies are also important for identifying accessions bearing potentially useful alleles by characterizing marker-trait associations and allelic states across germplasm collections. This study reports the results of a genome-wide association study and evaluation of epistatic interactions for four agronomic and seed-related traits in soybean.
RESULTS: Using 419 diverse soybean accessions, together with genotyping data from the SoySNP50K Illumina Infinium BeadChip, we identified marker-trait associations for internode number (IN), plant height (PH), seed weight (SW), and seed yield per plant (SYP). We conducted a genome-wide epistatic study (GWES), identifying candidate genes that show evidence of SNP-SNP interactions. Although these candidate genes will require further experimental validation, several appear to be involved in developmental processes related to the respective traits. For IN and PH, these include the Dt1 determinacy locus (a soybean meristematic transcription factor), as well as a pectinesterase gene and a squamosa promoter binding gene that in other plants are involved in cell elongation and the vegetative-to-reproductive transition, respectively. For SW, candidate genes include an ortholog of the AP2 gene, which in other species is involved in maintaining seed size, embryo size, seed weight and seed yield. Another SW candidate gene is a histidine phosphotransfer protein - orthologs of which are involved in cytokinin-mediated seed weight regulating pathways. The SYP association loci overlap with regions reported in previous QTL studies to be involved in seed yield.
CONCLUSIONS: This study further confirms the utility of GWAS and GWES approaches for identifying marker-trait associations and interactions within a diverse germplasm collection.

Entities:  

Keywords:  Epistasis; GWAS; GWES; Internode number; Plant height; QTL; Seed weight; Seed yield per plant; Soybean

Mesh:

Year:  2019        PMID: 31242867      PMCID: PMC6595607          DOI: 10.1186/s12864-019-5907-7

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

Soybean (Glycine max [L] Merr.) is the world’s largest single source of vegetable protein and oil, accounting for more than 50% of world edible oil (Soystats 2013, http://www.soystats.com). Soybean is also a valuable component of many agricultural systems due to its N-fixing capacity. Improvement of seed yield is a major objective in soybean breeding. Seed yield (seed yield per plant, SYP) is a complex trait and is influenced by many developmental traits including seed weight (SW), internode number (IN) and plant height (PH). Like seed yield, these developmental traits are also quantitatively inherited. For example, SW is influenced by numerous physiological and morphological components [1]. Internode number and plant height affect seed yield via their impact on important traits including lodging and adaptability in soybean [2]. Many linkage mapping studies in soybean have been curated and compiled at SoyBase (https://www.soybase.org), collectively resulting in approximately 250, 200 and 30 QTLs for SW, PH and IN, respectively ([3] https://www.soybase.org). Significant, positive correlations have also been reported between PH and IN [3] as well as SW and SYP [4, 5]. Recent mapping studies have identified associations among QTLs related to seed yield and seed weight [2, 6, 7]. However, in general, QTL studies for yield and seed weight have not resulted in the detection of candidate genes, due to the typically low genetic resolution of biparental QTL studies [6]. Plant height and internode number have significant correlations with flowering and maturity traits, which are important agronomic traits associated with adaptability and productivity in soybean [8]. Chang et al. [3] identified 34 loci for PH and 30 loci for node number via genome wide association studies (GWAS) in 368 soybean accessions. This study also confirmed that IN and PH are correlated (r = 0.49). Similarly, Fang et al. [9] phenotyped 809 accessions for eighty-four agronomic traits and identified 245 significant loci, of which 95 genetically interacted with other loci. They also reported correlation of IN and PH (r = 0.6). IN and PH are polygenic traits controlled by many loci of small effect [3, 8]. The molecular mechanisms of some genes involved in IN and PH have been reported – for example, Dt1 is a meristematic transcription factor, orthologous to the A. thaliana Tfl1 gene [10], and E2 is an ortholog of GIGANTEA, which functions upstream of CONSTANS (CO) and FLOWERING LOCUS T (FT) in A. thaliana [11]. A linkage mapping study by Sun et al. [12] showed various QTL for plant height at different growth stages. Similarly, Chang et al. [3] reported that several loci of IN and PH were captured at different growth stages in soybean. Several other studies that associated developmental quantitative traits with genetic markers have been reported in soybean [3, 13, 14]. GWAS methods provide a powerful approach for discovering candidate genes associated with complex traits [3, 15–17]. They have identified QTLs in many crop species, including rice, maize, and soybean. GWAS complements QTL studies by offering a way to identify more association regions with greater precision – albeit depending on the number, diversity and genetic structure of the germplasm accessions. GWAS mainly addresses additive genetic effects; however, these only explain a portion of the heritability estimates for complex traits. Recent studies have revealed that both additive and epistatic interactions have measurable effects on the genetic architecture of soybean diseases such as sclerotinia stem rot, and sudden death syndrome [18, 19]. The combination of additive genetic and epistatic effects was able to explain additional phenotypic variations. We have used a “genome wide epistatic study” (GWES) approach to complement the more widely-used GWAS analysis and provide a fuller understanding of the genetic architecture of complex traits. In particular, GWES helps reveal the genetic basis of IN, PH, SW and SYP in soybean.

Results

Measurements from field evaluation

Significant differences (P < 0.05) were found among the 419 genotypes for all four traits (IN, PH, SW, SYP), measured at three locations. The IN (measured on the main stem) ranged from 8.3 to 17.8 with an average of 13.8. The PH ranged from 22.6 to 144.8 cm with an average of 70.3 cm. The SW ranged from 2.9 to 29.3 g with an average of 12.6 g. The SYP ranged from 7.6 to 81.0 g with an average of 33.9 g (Table 1). The population had approximately normal distribution for all four traits (Additional file 1: Figure S1). The heritability estimate was found to be moderate to high for each trait (65–75%), suggesting that genetic effects played a significant role in each trait’s expression (Table 1).
Table 1

Phenotypic variation of internode number, plant height, 100 seed weight, and seed yield per plant, based on genotype performance across locations

TraitMean ± SEMRangeHeritability (%)
Internode number (count)13.8 ± 0.368.3–17.865
Plant height (cm)70.3 ± 8.622.6–144.870
100 seed weight (gm)12.6 ± 0.642.9–29.375
Seed yield per plant (gm)33.9 ± 5.07.6–8167

SEM = Standard Error of the Mean

Phenotypic variation of internode number, plant height, 100 seed weight, and seed yield per plant, based on genotype performance across locations SEM = Standard Error of the Mean

Linkage disequilibrium (LD)

The linkage disequilibrium (LD) decay rate for euchromatin and heterochromatin was 238 kb and 1648 kb, respectively (Additional file 2: Figure S2). The average marker density varied over different chromosomes, from 39 kb per SNP in chromosome 1 (Gm01) to 19 kb per SNP in chromosome 14 (Gm14). From the total SNPs used for the soybean genome, 77% were found in euchromatic regions where 78% of the predicted genes occurred in the chromosome ends; this is the region which accounts for most genetic recombination events.

GWAS and candidate gene prediction

The best linear unbiased prediction (BLUP) of each PI line’s performance across three environments was used in a mixed linear model (MLM). This strategy was used to help minimize the rate of false positives [20]. A total of 75 significant SNPs (P < 0.05) were identified for the IN, PH, SW, and SYP traits (Fig. 1 a, b, c and d). The trait variation explained by each marker (R2) varied from 0.03 to 0.11.
Fig. 1

Manhattan plots of GWAS for internode number. The horizontal black line in each panel shows a selected genome-wide significance threshold at FDR < 0.05. The panels give GWAS values for (a) internode number (IN), (b) plant height (PH), (c) seed weight (SW), and (d) seed yield per plant (SYP). See Table 2 for regions and candidate genes

Manhattan plots of GWAS for internode number. The horizontal black line in each panel shows a selected genome-wide significance threshold at FDR < 0.05. The panels give GWAS values for (a) internode number (IN), (b) plant height (PH), (c) seed weight (SW), and (d) seed yield per plant (SYP). See Table 2 for regions and candidate genes
Table 2

Summary of GWAS regions, genes of interest, and overlapping QTLs. Counts of candidate genes are given for regions defined by two methods: by flanking non-significant markers (column “Genes in region - A”) and within a window defined by the significant SNP(s) plus 100 kb before and after the significant SNP(s) (column “Genes in region - B”). Known QTLs are abbreviated IN: Node number; PH: Plant height; SY: seed yield; SW: seed weight. Also see a more detailed version of the table in Additional file 3 (including full *QTL names), and a list of all genes within GWAS regions in Additional file 4

Start SNPEnd SNPChromGene count AGene count BP-valuestable 3 locsGenes of special interestAnnotationKnown QTLs*
Internode Number (IN)
 ss715578706ss715578717Gm018247E-05Glyma.01 g022500AP2-like ethylene-responsive TFNN
 ss715579430ss715579430Gm01117
 ss715578432ss715578590Gm0133
 ss715578480ss715578480Gm0149
 ss715578432ss715578590Gm01430.0004Glyma.01 g074000ubiquitin protein ligaseNN, PH
 ss715589109ss715589108Gm04025
 ss715590708ss715590708Gm05624
 ss715598728ss715598728Gm07321
 ss715614000ss715614000Gm133238E-05Glyma.13 g101400Putative lysophospholipaseNN, PH
 ss715625429ss715625429Gm16021
 ss715623458ss715623458Gm163220.0005Glyma.16 g016400NAC domain containing protein 62NN, PH
 ss715631395ss715631424Gm1814248E-05XGlyma.18 g203800Metallopeptidase proteinPH
 ss715635223ss715635190Gm194294E-05Glyma.19 g002900NAC domain-containing protein 8-likePH
 ss715635454ss715635456Gm195244E-05XGlyma.19 g194300Phosphatidylethanolamine binding prot (Dt1)PH
 ss715635024ss715635024Gm191220.0002Glyma.19 g146000squamosa promoter bindingPH
 samesameGlyma.19 g145700pectinesterasePH
 samesameGlyma.19 g145400plant phosphoribosyltransferase proteinPH
Plant Height (PH)
 ss715582992ss715582994Gm023239E-05XGlyma.02G245600Gibberellin-regulated family proteinPH
 ss715585780ss715585783Gm0310254E-05
 ss715585781ss715585784Gm039240.0002
 ss715585788ss715585790Gm038230.0002
 ss715588701ss715588703Gm043250.0001
 ss715592575ss715592573Gm051200.0002
 ss715602866ss715602868Gm086369E-05
 ss715635355ss715635357Gm193260.0001XGlyma.19G187800pectinesterase 11PH, SL
 ss715635405ss715635407Gm193177E-05XGlyma.19G192400ethylene-responsive transcription factor 12PH, SL
 samesameXGlyma.19G192700growth-regulating factor 4PH, SL
 ss715635424ss715635459Gm1917361E-06XGlyma.19 g194300Phosphatidylethanolamine binding prot (Dt1)PH, SL
Seed Weight (SW)
 ss715579653ss715579702Gm011826
 ss715583620ss715583627Gm02535
 ss715582341ss715582351Gm025268E-06Glyma.02 g04660histidine kinase 1SW
 ss715584321ss715584321Gm02116
 ss715588755ss715588756Gm043275E-05XGlyma.04 g228200Late embryogenesis protein (LEA)SW
 ss715594571ss715594571Gm06317
 ss715608522ss715608522Gm10216
 ss715609333ss715609333Gm11314
 ss715613704ss715613704Gm12817
 ss715612715ss715612715Gm12723
 ss715614263ss715614263Gm133192E-06XGlyma.13 g149000Late embryogenesis abundant (LEA)SW
 ss715624623ss715624628Gm16418
 ss715635103ss715635103Gm191249E-05Glyma.19 g151900Hpt domainSW
 ss715635205ss715635205Gm196200.0004XGlyma.19 g163900AP2 domain proteinSW
Seed Yield per Plant (SYP)
 ss715586641ss715586641Gm03730
 ss715602684ss715602688Gm08429
 ss715603479ss715603479Gm0908
 ss715603626ss715603626Gm095270.0004XGlyma.09 g040000Response regulator receiverSY, SW
 ss715615670ss715615670Gm131635
 ss715618757ss715618757Gm14122
 ss715620375ss715620383Gm156280.0003Glyma.15 g145200Response regulatorSY, SW
 ss715634241ss715634241Gm19014
 ss715637551ss715637574Gm2013370.0002Glyma.20 g112200Inner centromere proteinSY, SW
For all loci showing significant GWA, we examined annotations of genes both within regions defined by nonsignificant SNPs that flanked significant GWA SNP peaks, and within regions defined by 100 kb windows around significant GWA SNPs. In addition, we compared these results with QTL intervals for related traits reported for these regions. The count of genes falling within these two types of regions around each locus is given in Table 2, and details about all genes and loci in Additional files 3 and 4. The average number of genes found within regions around one or more significant SNPs and between non-significant SNPs is 5.1 genes (max 33, min 0), and the average number of genes found in regions defined by 100 kb windows around significant GWA SNPs is 22.3 (max 37, min 3). SNPs showing stable interactions across all three loci are indicated in Table 2 and Additional file 3. Summary of GWAS regions, genes of interest, and overlapping QTLs. Counts of candidate genes are given for regions defined by two methods: by flanking non-significant markers (column “Genes in region - A”) and within a window defined by the significant SNP(s) plus 100 kb before and after the significant SNP(s) (column “Genes in region - B”). Known QTLs are abbreviated IN: Node number; PH: Plant height; SY: seed yield; SW: seed weight. Also see a more detailed version of the table in Additional file 3 (including full *QTL names), and a list of all genes within GWAS regions in Additional file 4 We evaluated genes around 15 loci associated with IN (Table 2 and Additional files 3 and 4). We note several genes with functional annotations related to shoot related development, including shoot apical meristem (SAM) development and processing of pectin-containing cell walls. A candidate gene Glyma.01 g022500 on chromosome 1 encodes an AP2 domain, which has been shown in other species to be involved in SAM identity [21]. A second candidate gene, Glyma.01 g074000, at a different GWA locus on chromosome 1, encodes a ubiquitin protein ligase; it is homologous to AT2G44950, which encodes a gene involved in meristematic transition from vegetative to reproductive phase (Table 2, Additional files 3 and 4) [22, 23]. On chromosome 16, Glyma.16 g016400, encodes a “No Apical Meristem” (NAM) protein and is homologous to AT3G49530, which is involved in formation of the SAM [24-26]. Similarly, Glyma.19 g002900, on chromosome 19, also encodes a NAM protein. For the second locus on chromosome 19, a candidate gene in that region, Glyma.19 g145700, encodes a pectinesterase (pectin lyase-like protein) (Fig. 2). Another GWA locus on chromosome 19 is near (82 kb) the Dt1 gene, which is involved in control of flowering time and development of the inflorescence meristem (Fig. 3) [10, 27, 28].
Fig. 2

An association region for internode length (IN), on chromosome 19. Top panel: -log10 of P transformed values from GWAS for IN, within a 300 kb window; bottom panel: LD, measured in r2. The most significant SNP is ss715635024 (red dot), at a genomic position of 40,683,097. A candidate gene in this region is Glyma.19 g145700, a pectinestrase, at 14 kb from the significant SNP (location marked in green)

Fig. 3

An association region for plant height (PH), on chromosome 19. Top panel: -log10 of P transformed values from GWAS for IN, within a 300 kb window; bottom panel: LD, measured in r2. The most significant SNP is ss715635425 (red dot), at a genomic position of 45,204,441. A candidate gene in this region is Glyma.19 g194300, the determinacy (Dt1) gene, at 20 kb from the significant SNP (location marked in green)

An association region for internode length (IN), on chromosome 19. Top panel: -log10 of P transformed values from GWAS for IN, within a 300 kb window; bottom panel: LD, measured in r2. The most significant SNP is ss715635024 (red dot), at a genomic position of 40,683,097. A candidate gene in this region is Glyma.19 g145700, a pectinestrase, at 14 kb from the significant SNP (location marked in green) An association region for plant height (PH), on chromosome 19. Top panel: -log10 of P transformed values from GWAS for IN, within a 300 kb window; bottom panel: LD, measured in r2. The most significant SNP is ss715635425 (red dot), at a genomic position of 45,204,441. A candidate gene in this region is Glyma.19 g194300, the determinacy (Dt1) gene, at 20 kb from the significant SNP (location marked in green) For PH, we note 4 of 10 GWA loci having with functional annotations clearly related to plant growth and development, growth regulation, and regulation of flowering time and development of inflorescence meristem (Table 2 and Additional files 3 and 4). A candidate gene, Glyma.02G245600, encoding Gibberellin-regulated family protein with homologs related to plant height, is located within 10 kb of a significant SNP on chromosome 2 (Additional files 3 and 4). In another association region on chromosome 19, we note that a candidate gene Glyma.19G192400 is homologous to AT2G36450.1, which encodes an ethylene-responsive transcription factor that is involved (in Arabidopsis) in plant growth regulation (Additional files 3 and 4) [29, 30] . PH was also associated with the Dt1 gene on chromosome 19 (overlapping an association region for IN, above) (Fig. 3). For SW, we note 5 of 14 GWA loci having functional annotations clearly related to seed maturation and seed storage proteins (Table 2 and Additional files 3 and 4). The candidate gene Glyma.02 g046600 on chromosome 2 encodes a histidine kinase protein and is homologous to AT2G17820, which has been shown to be associated with seed maturation during water stress [31]. The candidate gene Glyma.19 g151900 on chromosome 19 encodes a histidine-containing phosphotransfer (Hpt) domain protein and is homologous to AT1G03430.1, which has been reported to play a role in seed size [6]. The second locus is associated with a SNP in high LD (r2 > 0.7) on chromosome 19. The candidate gene in that region, Glyma.19 g163900, encodes an AP2-domain protein, which is involved in maintaining seed size, embryo size, seed weight and seed yield [32, 33]. For SYP, we note 3 of 9 GWA loci having functional annotations related to the development maturation, and size of seeds (Table 2 and Additional files 3 and 4). A candidate gene on chromosome 9, Glyma.09 g040000, encode a response regulator receiver. The is related to regulation of seed growth and seed size development [34, 35]. On chromosome 15, the candidate gene Glyma.15 g145200 encodes a response regulator receiver domain, which is related to regulation of seed growth in Arabidopsis [36, 37].

Genome wide epistatic interaction

Tests for SNP-SNP interactions were performed using a linear regression method implemented with PLINK v1.07 [38]. Epistatic tests identified 11, 10, 9, and 3 SNP-SNP interactions associated with IN, PH, SW, and SYP, respectively (Additional file 6). Four SNPs (ss715578706, ss715588448, ss715592908, and ss715593060) on chromosomes 1, 4 and 6 exhibited significant interactions with nine SNPs on 8 other chromosomes for the IN trait (Additional file 6). The main-effect SNP ss715578706 had an additional significant epistatic interaction. For PH, four main-effect SNPs (ss715582993, ss715635454, ss715635458, and ss715635406), on chromosomes 2 and 19, showed significant epistatic interactions with nine SNPs on other chromosomes. For SW, nine significant SNPs (ss715582341, ss715592842, ss715593095, ss715593417, ss715593870, ss715613107, ss715611786, ss715616880, and ss715614051), on chromosomes 2, 6, 12, and 13 interacted with six SNPs on other chromosomes. A main-effect SNP ss715582341 had additional epistatic interactions (Additional file 6). For SYP, three SNPs (ss715603598, ss715603776, and ss715620383) on chromosome 9 and 15 exhibited significant interactions with three SNPs on other chromosomes. In this study, a total of 9, 10, 6, and 3 candidate genes were predicted for IN, PH, SW and SYP, respectively for loci involved in SNP-SNP interactions (Additional file 6). We hypothesize that more SNP-SNP interactions were seen for IN than for SYP because there are likely numerous contributing loci for the complex SYP trait, generally with small effect, and many of these are likely not picked up from our study – and therefore not available for discovery of epistatic effects.

Discussion

Many QTL mapping studies have provided useful information about approximate genetic locations underlying the genetic control of important agronomic traits in soybean, but these results often have limited mapping resolution. In the current study, we used GWAS to refine the regions previously reported with QTL associations for internode number, plant height, seed weight and seed yield per plant. A collection of 419 diverse soybean PI lines, obtained from the U.S. National Plant Germplasm System, was used to evaluate internode number (IN), plant height (PH), 100 seed weight (SW) and seed yield per plant (SYP). We observed significant genetic differences among genotypes for all four traits, a result connected to the general stability of these traits for each genotype across replicates and locations. We found moderate to high heritability (H2) estimates for the traits measured. GWAS identified 15 and 10 loci associated with IN and PH, respectively (Table 2 and Additional files 3 and 4). These loci were distributed on 11 chromosomes and some of regions overlapped previously reported QTLs for soybean node number and plant height [3, 8, 39–41]. One locus at chromosome 19 was shared by both IN and PH, suggesting these traits are controlled by genes that have pleotropic effects. Indeed, we found a significant phenotypic correlation (0.71) between IN and PH, further indicating the two traits are likely to be at least partly under common control [3]. In our study, both IN and PH were associated with the locus identified by SNP ss715635454, which is in LD with the Dt1 gene, that has been shown to control flowering and plant height; these traits are often highly correlated [8]. Previous studies have showed the Dt1 gene to regulate many agronomic traits including plant height and flowering in soybean [8, 42–44]. In soybean, genotypes exhibit either determinate or indeterminate stem growth [45]. The determinate growth habit genotype (which is partly controlled by the recessive dt1/dt1 alleles), is produced if the shoot apical meristem (SAM) switches from vegetative to reproductive growth [45]. The presence of the dominant alleles (Dt1/Dt1) suppresses the transition to a reproductive inflorescence [45]. Hence, soybean plant height and node production are highly affected by stem growth habit, which in turn has consequences for flowering. Our results provide further evidence in support of the effects of the Dt1 gene on important agronomic traits like plant height and internode number. Another two loci on chromosome 19 that are associated with IN and PH include candidate genes Glyma.19 g145700 and Glyma.19G187800. Both genes are annotated as having pectinesterase homology and may therefore be involved in processing of cellulose and pectin-containing cell walls [46]. Cellulose and pectin are polysaccharides and are important for plant growth and development, with involvement in cell wall expansion and stem elongation [47]. This is evidenced, for example, by overexpression of a petunia inflate pectinesterase in potato plants, which was involved in stem elongation [48]. Pectinesterase has been shown to control cell growth and normal cell elongation in Arabidopsis hypocotyls by affecting the degree of pectin methyl-esterification [49]. Through numerous studies, more than 200 QTLs have been identified for SW in soybean (https://soybase.org/), as well as for related traits such as seed size, seed length, seed height and width. In the current study, 14 loci associated with SW were identified. Each locus could explain <3.5% of the phenotypic variation – as might be expected for a complex quantitatively inherited trait. Of the 14 loci identified, seven of them were previously reported as QTL at least once (Additional file 3). The locus on chromosome 2 (ss715582341) showed strong association with SW and seed size-related traits, including seed height and seed width (Additional file 3). This region overlaps QTL for similar traits [50, 51]. Several candidate genes identified on chromosome 19 may have relevance for SW. Glyma.19 g151900 is annotated as a member of the AHP (Arabidopsis histidine phosphotransfer) protein family (Additional file 3). AHPs mediate the intracellular response to cytokinins via a two-component phosphorelay signaling pathway from the AHK (Arabidopsis histidine kinase) cytokinin receptors to the ARRs (Arabidopsis response regulators) and CRFs (cytokinin response factors) [50]. CRFs consist of a subgroup of AP2 transcriptional factors [51]. The sizes of triple cytokinin receptor (AHK2, AHK3, and CRE1/AHK4) loss-of-function mutant seeds were more than doubled due to increased size of the embryo [52]. Similarly, the ahp1,2,3,4,5 “penta” histidine phosphotransfer protein mutation in Arabidopsis resulted in larger seeds and embryos [50], suggesting that the homologous AHP Glyma.19 g151900 could also be involved in a cytokinin-mediated seed weight regulating pathway in soybean [6]. The candidate gene Glyma.19 g163900 encodes an AP2 domain protein, analogues of which play significant roles in maintaining seed size, embryo size, seed weight, and seed yield. APETALA2 is involved in regulating in the maternal endosperm in Arabidopsis [33] and affects various vegetative organs as well. Further studies will be needed to test the functions and effects of these genes in soybean. Nine significant loci were identified for seed yield per plant (SYP). The SYP loci on chromosome 9 (ss715603626) and chromosome 15 (ss715620383) were identified within regions overlapping four previously reported QTLs for seed yield [53-55] and four others for seed weight [2, 4, 44, 56] . In addition, several QTLs identified by previous studies tested in different environments were detected in this study, indicating that these seed yield associations are likely stable and might be used to improve soybean seed yield. For agronomic traits such as IN, PH, SW, and SYP, estimating epistatic loci could help to clarify complex genetic effects of GWAS loci and elucidate other types of interactions such as genotype x environment effects. For internode number, an epistatic candidate gene Glyma.03 g259800 is annotated as a being in the ROP (Rho of Plants)-activated gene family. It potentially interacts with the main gene Glyma.01 g022500 that manifests an AP2 homology (suggesting regulation of SAM identity). ROP-activated genes are reported to play a role in proper cell wall organization during the growth and development of plants [57]. In A. thaliana, AP2 is actively involved in vegetative development by regulating the RAP2 (“Related to AP2”) gene expression in stems as well as functioning in reproductive development [58]. Another epistatic candidate gene, Glyma.17 g110300, annotated as an Acetylglucosaminyltransferase protein, interacts with the main-effect gene, Glyma.06 g152000 (a pectinesterase). Furthermore, Glyma.17 g110300 is also homologous to the AT5G33290 gene annotated as a glycosyltransferase superfamily protein shown to play a role in pectin biosynthesis [59]. Pectin is a complex polysaccharide in plant cell wall and it is important for plant growth, development and defense [60]. For plant height, Glyma.02G245600, encodes a gibberellin-regulated family protein, with homologs known to have key function in plant development, while Glyma.19 g194300 encodes a phosphatidylethanolamine binding protein, and has been shown to play key role in regulating of flowering time in soybean and other plants [4, 8, 44]. For seed weight and seed yield significant related main and epistatic genes were identified (Additional file 6). In this GWAS study, an overlap between the locations of main effect and epistatic PH and IN genes suggests that GWES was an effective methodology.

Conclusions

This study provides a large number of interesting additive and epistatic loci possibly controlling important agronomic traits such as IN, PH, SW, and SYP in soybean. Markers reported here may help in future genomic studies and cultivar development programs.

Methods

Plant materials

This study included 419 diverse soybean PI accessions originating from 26 countries, including China, Japan, South Korea and the USA. These genotypes were obtained from the USDA National Plant Germplasm System (http://www.ars-grin.gov; Additional file 5). They were selected for breadth of phenotypic diversity and were derived from three maturity groups (MG): MG I, II, and III.

Field phenotyping

Soybeans were planted at three locations in central Iowa in 2016 and 2017: the Bruner Research Farm, Agronomy and Agricultural Engineering Research Farm (AAERF), and Burkey Research Farm. The soil type at Bruner Research Farm is Nicollet (poorly drained black loam), AAERF is Webster (silty clay loam) and the Burkey Research Farm is Clarion (well-drained black clay). (Note that in 2016, only PH was measured, as the primary objective in that year was for initial assessment of maturity and disease characteristics.) No fertilizer was applied to experimental plots. At all three sites, in both years, the soybean lines were planted using randomized complete block design with two replications. Each accession was planted in single row plot 0.5 m long, with 40 cm between plots. At full pod (R4) and beginning seed (R5), internode number (IN) per plant were recorded. The internode number of the main stem was taken as the average of three such measurements. Plant height was measured in 2016 and 2017 at physiological maturity from three randomly selected plants per plot, with each measured for main stem length from the soil surface to the tip of the plant. SW and SYP were harvested from one replication at each location. Hundred seed weight was measured for each genotype on a plot basis. SYP was estimated as an average of six plants per plot and seed weight was adjusted to 12% moisture content.

Genotyping and quality control

The “SoySNP50k” SNP dataset for the panel was described by Song et al. [61]. The SoySNP50k variant data were accessed from SoyBase (https://www.soybase.org/snps/) in spring 2018. The SNP dataset contains 42,506 high confidence SNPs. Of these, 59 SNPs that were unanchored to the Williams 82 v2 reference genome sequence were excluded from further analysis. We used BEAGLE version 3.3.1 with the default parameter settings to impute missing data [62, 63]. Markers with more than 10% missing values were excluded from further analyses. Following imputation, SNPs with a minor allele frequency < 5% were also excluded, yielding 36,140 SNPs for use in GWAS and GWES.

Marker distribution and linkage disequilibrium

Genome-wide inter-marker distances and chromosome-wide densities were calculated using chromosomal physical lengths, which were obtained from the Glycine max Wm82.a2 reference genome. The linkage disequilibrium (LD) was determined as the squared allelic frequency correlation coefficient (r2) in the R package synbreed [64]. The r2 was computed separately for euchromatic and heterochromatic regions due to significant differences in recombination rate for these two regions. The approximate euchromatic/heterochromatic boundaries for each chromosome were as reported in the SoyBase Wm82.a2 genome browser (https://www.soybase.org). The average LD decay graph was plotted using an R script, which was computed according to Remington et al. [65] r2 was calculated only for SNPs with a pairwise distance less than 10 Mbp in either euchromatic or heterochromatic regions of each chromosome. The LD decay rate was determined as the chromosomal distance at the point where the mean r2 dropped to half of its maximum value [66].

Genome-wide association (GWA) and genome wide epistatic (GWE) interaction analysis

The best linear unbiased prediction (BLUP) values for individual accessions were determined using the R package lme4 [67]. This is used to reduce the effects of environmental variation for the measured phenotypic traits. The GWAS analyses were implemented for each trait using GAPIT software [20, 68]. The mixed linear model (MLM) didn’t include population structure as a covariant, as indicated by the Bayesian information criteria (BIC) test of model fit (Additional file 7, [69]). The empirical significance level at p < 0.001, determined by 1000 permutations, was used as the significance threshold of SNPs-trait associations as described in Zhang et al. [18]. The genome-wide epistatic interactions between a pair of SNPs were analyzed using the software PLINK version 1.07 [18, 38]. To adjust the multiple comparisons of SNPs, we selected a Bonferroni threshold of α = 0.05.

Candidate gene prediction

Two sets of gene models were used for identification of candidate genes: the JGI-produced Glyma.Wm82.a2.v1 annotations and the NCBI RefSeq gene models, both called in the GlymaWm82.a2 assembly. Candidate genes in proximity to GWA markers were identified using two approaches: (1) genes within the region circumscribed by the two closest non-significant SNPs on either side of a significant SNP and (2) genes found in the genomic region within 100 kb upstream and downstream of the SNPs with highest association for the trait. All significant SNPs in close proximity were grouped at LD r2 > 0.7 [8] to designate candidate regions for quantitative trait loci (QTL), and only the peak SNPs within each LD block were maintained.

Statistical analysis

The model for phenotypic data collected for each trait was Yijk = μ + gi + l + (gl)ij + b + eijk, where μ is the overall population mean, gi is the genetic effect of the ith genotype, l is the effect of jth location, (gl) is the interaction effects between the i genotype and j location, bk(j) is the effect of the block within the jth location, and eijk is the residual effect including random error following N (0, σ). Broad sense heritability for each trait was estimated on an entry mean basis using the equation, H2 = σ 2 g / [σ 2 g + (σ 2 g*l/l) + (σ 2 e/rl)], where σ genotypic variance, σ 2 g*l = genotype by location interaction variance, l = number of locations, and r = number of replicates, and σ2 e is the error variance assuming all effects are random. We computed variance components estimation was computed using SAS version 9.3 (SAS Institute Inc., Cary, NC). Figure S1. Frequency distribution of observation of internode number, plant height, seed weight, and seed yield per plant in soybean. (DOCX 79 kb) Figure S2. The mean level of linkage disequilibrium (LD) decay rate in euchromatic and heterochromatic chromosome regions. The mean decay of LD was estimated as squared correlation coefficient (r2) using all pairs of loci within 10 Mb of physical distance. The x-axis shows the distance between markers pairs in Mb and the y-axis shows LD in r2. The red line denotes euchromatic region and black line denotes the heterochromatic region. The dashed grey line shows the position where r2 dropped to half of its maximum value. (DOCX 53 kb) Summary of GWAS regions, genes of interest, and overlapping QTLs. Extends Table 2, with SNP position data, P-values, distances to SNPs, and additional information about functional annotations. Counts of candidate genes are given for regions defined by two methods: by flanking non-significant markers around significant SNP(s), and within a window defined by the significant SNP(s) plus 100 kb before and after the significant SNP(s). (XLSX 22 kb) Genes within GWAS regions along with gene annotations. Genes are given for regions defined by two methods: by flanking non-significant markers around significant SNP(s), and within a window defined by the significant SNP(s) plus 100 kb before and after the significant SNP(s). (XLSX 150 kb) PI lines used in the study, arranged based on maturity group and country of origin. (XLSX 20 kb) Significantly associated genome wide epistatic (SNP-SNP) interactions and associated candidate genes. (XLSX 14 kb) Bayesian Information Criterion (BIC) mixed linear model with principal components (PCs) applied for association analysis of internode, plant height, seed weight and seed yield per plant. (DOCX 14 kb)
  63 in total

Review 1.  Formation and maintenance of the shoot apical meristem.

Authors:  J L Bowman; Y Eshed
Journal:  Trends Plant Sci       Date:  2000-03       Impact factor: 18.313

2.  Structure of linkage disequilibrium and phenotypic associations in the maize genome.

Authors:  D L Remington; J M Thornsberry; Y Matsuoka; L M Wilson; S R Whitt; J Doebley; S Kresovich; M M Goodman; E S Buckler
Journal:  Proc Natl Acad Sci U S A       Date:  2001-09-18       Impact factor: 11.205

Review 3.  Cell-cell signaling in the shoot meristem.

Authors:  A Haecker; T Laux
Journal:  Curr Opin Plant Biol       Date:  2001-10       Impact factor: 7.834

4.  Shoot Meristem Formation in Vegetative Development.

Authors:  R. A. Kerstetter; S. Hake
Journal:  Plant Cell       Date:  1997-07       Impact factor: 11.277

5.  Quantitative trait loci analysis for the developmental behavior of Soybean (Glycine max L. Merr.).

Authors:  Desheng Sun; Wenbin Li; Zhongchen Zhang; Qingshan Chen; Hailong Ning; Lijuan Qiu; Genlou Sun
Journal:  Theor Appl Genet       Date:  2005-12-20       Impact factor: 5.699

6.  Expression of a Petunia inflata pectin methyl esterase in Solanum tuberosum L. enhances stem elongation and modifies cation distribution.

Authors:  J Pilling; L Willmitzer; J Fisahn
Journal:  Planta       Date:  2000-02       Impact factor: 4.116

7.  Control of seed mass by APETALA2.

Authors:  Masa-Aki Ohto; Robert L Fischer; Robert B Goldberg; Kenzo Nakamura; John J Harada
Journal:  Proc Natl Acad Sci U S A       Date:  2005-02-11       Impact factor: 11.205

8.  Control of seed mass and seed yield by the floral homeotic gene APETALA2.

Authors:  K Diane Jofuku; Pamela K Omidyar; Zorana Gee; Jack K Okamuro
Journal:  Proc Natl Acad Sci U S A       Date:  2005-02-11       Impact factor: 11.205

9.  AtBXL1, a novel higher plant (Arabidopsis thaliana) putative beta-xylosidase gene, is involved in secondary cell wall metabolism and plant development.

Authors:  Thomas Goujon; Zoran Minic; Abdelhak El Amrani; Olivier Lerouxel; Estelle Aletti; Catherine Lapierre; Jean-Paul Joseleau; Lise Jouanin
Journal:  Plant J       Date:  2003-02       Impact factor: 6.417

10.  The response regulator 2 mediates ethylene signalling and hormone signal integration in Arabidopsis.

Authors:  Claudia Hass; Jens Lohrmann; Verónica Albrecht; Uta Sweere; Florian Hummel; Sang Dong Yoo; Ildoo Hwang; Tong Zhu; Eberhard Schäfer; Jörg Kudla; Klaus Harter
Journal:  EMBO J       Date:  2004-07-29       Impact factor: 11.598

View more
  11 in total

1.  Identification of QTNs and Their Candidate Genes for 100-Seed Weight in Soybean (Glycine max L.) Using Multi-Locus Genome-Wide Association Studies.

Authors:  Muhammad Ikram; Xu Han; Jian-Fang Zuo; Jian Song; Chun-Yu Han; Ya-Wen Zhang; Yuan-Ming Zhang
Journal:  Genes (Basel)       Date:  2020-06-27       Impact factor: 4.096

2.  Loci discovery, network-guided approach, and genomic prediction for drought tolerance index in a multi-parent advanced generation intercross (MAGIC) cowpea population.

Authors:  Waltram Ravelombola; Ainong Shi; Bao-Lam Huynh
Journal:  Hortic Res       Date:  2021-02-01       Impact factor: 6.793

3.  Identification of quantitative trait loci underlying five major agronomic traits of soybean in three biparental populations by specific length amplified fragment sequencing (SLAF-seq).

Authors:  Bo Hu; Yuqiu Li; Hongyan Wu; Hong Zhai; Kun Xu; Yi Gao; Jinlong Zhu; Yuzhuo Li; Zhengjun Xia
Journal:  PeerJ       Date:  2021-12-14       Impact factor: 2.984

4.  Detecting QTL and Candidate Genes for Plant Height in Soybean via Linkage Analysis and GWAS.

Authors:  Jiajing Wang; Bo Hu; Yuliang Jing; Xiping Hu; Yue Guo; Jiankun Chen; Yuxi Liu; Jianhui Hao; Wen-Xia Li; Hailong Ning
Journal:  Front Plant Sci       Date:  2022-01-21       Impact factor: 5.753

Review 5.  The Organ Size and Morphological Change During the Domestication Process of Soybean.

Authors:  Xuan Zhou; Dongfa Wang; Yawen Mao; Yueqiong Zhou; Limei Zhao; Chunbao Zhang; Yu Liu; Jianghua Chen
Journal:  Front Plant Sci       Date:  2022-06-09       Impact factor: 6.627

Review 6.  Quantitative Trait Loci for Seed Size Variation in Cucurbits - A Review.

Authors:  Yu Guo; Meiling Gao; Xiaoxue Liang; Ming Xu; Xiaosong Liu; Yanling Zhang; Xiujie Liu; Jixiu Liu; Yue Gao; Shuping Qu; Feishi Luan
Journal:  Front Plant Sci       Date:  2020-03-20       Impact factor: 5.753

7.  Genome-wide association and epistatic interactions of flowering time in soybean cultivar.

Authors:  Kyoung Hyoun Kim; Jae-Yoon Kim; Won-Jun Lim; Seongmun Jeong; Ho-Yeon Lee; Youngbum Cho; Jung-Kyung Moon; Namshin Kim
Journal:  PLoS One       Date:  2020-01-22       Impact factor: 3.240

8.  Identification of quantitative trait nucleotides and candidate genes for soybean seed weight by multiple models of genome-wide association study.

Authors:  Benjamin Karikari; Zili Wang; Yilan Zhou; Wenliang Yan; Jianying Feng; Tuanjie Zhao
Journal:  BMC Plant Biol       Date:  2020-09-01       Impact factor: 4.215

9.  Genome-wide association studies of plant architecture-related traits and 100-seed weight in soybean landraces.

Authors:  Xiaoli Zhang; Wentao Ding; Dong Xue; Xiangnan Li; Yang Zhou; Jiacheng Shen; Jianying Feng; Na Guo; Lijuan Qiu; Han Xing; Jinming Zhao
Journal:  BMC Genom Data       Date:  2021-03-06

10.  Genome-wide association coupled gene to gene interaction studies unveil novel epistatic targets among major effect loci impacting rice grain chalkiness.

Authors:  Gopal Misra; Saurabh Badoni; Sabiha Parween; Rakesh Kumar Singh; Hei Leung; OluFunmilayo Ladejobi; Richard Mott; Nese Sreenivasulu
Journal:  Plant Biotechnol J       Date:  2020-12-09       Impact factor: 9.803

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.