Literature DB >> 35521543

Linkage and association analyses reveal that hub genes in energy-flow and lipid biosynthesis pathways form a cluster in upland cotton.

Juwu Gong1,2,3, Yan Peng4, Jiwen Yu1,3, Wenfeng Pei1, Zhen Zhang1, Daoran Fan1, Linjie Liu1,2, Xianghui Xiao1,2, Ruixian Liu1,2, Quanwei Lu5, Pengtao Li5, Haihong Shang1,3, Yuzhen Shi1,3, Junwen Li1,3, Qun Ge1, Aiying Liu1, Xiaoying Deng1, Senmiao Fan1, Jingtao Pan1, Quanjia Chen2, Youlu Yuan1,2,3, Wankui Gong1.   

Abstract

Upland cotton is an important allotetraploid crop that provides both natural fiber for the textile industry and edible vegetable oil for the food or feed industry. To better understand the genetic mechanism that regulates the biosynthesis of storage oil in cottonseed, we identified the genes harbored in the major quantitative trait loci/nucleotides (QTLs/QTNs) of kernel oil content (KOC) in cottonseed via both multiple linkage analyses and genome-wide association studies (GWAS). In 'CCRI70' RILs, six stable QTLs were simultaneously identified by linkage analysis of CHIP and SLAF-seq strategies. In '0-153' RILs, eight stable QTLs were detected by consensus linkage analysis integrating multiple strategies. In the natural panel, thirteen and eight loci were associated across multiple environments with two algorithms of GWAS. Within the confidence interval of a major common QTL on chromosome 3, six genes were identified as participating in the interaction network highly correlated with cottonseed KOC. Further observations of gene differential expression showed that four of the genes, LtnD, PGK, LPLAT1, and PAH2, formed hub genes and two of them, FER and RAV1, formed the key genes in the interaction network. Sequence variations in the coding regions of LtnD, FER, PGK, LPLAT1, and PAH2 genes may support their regulatory effects on oil accumulation in mature cottonseed. Taken together, clustering of the hub genes in the lipid biosynthesis interaction network provides new insights to understanding the mechanism of fatty acid biosynthesis and TAG assembly and to further genetic improvement projects for the KOC in cottonseeds.
© 2022 The Authors.

Entities:  

Keywords:  Gene cluster; Hub gene; Interaction network; Kernel oil content; Quantitative trait loci; Upland cotton

Year:  2022        PMID: 35521543      PMCID: PMC9046884          DOI: 10.1016/j.csbj.2022.04.012

Source DB:  PubMed          Journal:  Comput Struct Biotechnol J        ISSN: 2001-0370            Impact factor:   6.155


The growth of the world’s population and the improvement in living standards have greatly increased the global demand for vegetable oil [1]. As an important economic crop, upland cotton (2n = 4x = 26, AADD) is not only the main natural fiber source for the textile industry [66], [79] but also the fifth largest vegetable oil source for the food and feed industry [11]. Cottonseed oil contains a high proportion of unsaturated fatty acids (FAs), including linoleic acid (18-carbon chain with two double bonds; C18:2) and oleic acid (C18:1) [46]. Due to the beneficial effects of unsaturated FAs on human health and the significant advantages of low cost and stable flavor compared with rapeseed oil or olive oil, cottonseed oil is becoming increasingly attractive in various food processing techniques [70]. The carbon chain length of 99% FAs of cottonseed oil is between C16 and C18, similar to that of fossil diesel, which is usually between C15 and C18. In addition, the biodiesel converted from cottonseed oil is oxygen-rich but sulfur-free, which makes the biodiesel burn more thoroughly and causes less environmental pollution. These facts make cottonseed oil an ideal raw biofuel material, showing promising application prospects in the production of biodiesels [20]. Cottonseed oil can also be used to synthesize lubricants, inks, paints, soaps, and other chemical products. However, in cotton-related breeding and research projects, breeders put little effort into improving seed composition and quality, let alone cottonseed oil content and composition [80]. Therefore, on the premise of improving fiber yield and quality, it is necessary to improve the yield and quality of cottonseed to meet the high nutritional and industrial demand for cottonseed derivatives. Mapping the quantitative trait loci/nucleotides (QTLs/QTNs) of kernel oil content (KOC) in cottonseed and mining its candidate genes are effective ways to dissect the regulatory mechanism of cottonseed oil biosynthesis and to improve the KOC of cottonseed through corresponding breeding strategies. Cotton growers may also benefit from these joint improvement programs for both fiber and cottonseed qualities. Vegetable oil exists in the form of triacylglycerol (TAG), and its biosynthesis is a complex process of co-expression and regulation of several enzymes and genes. It involves FA biosynthesis and TAG assembly [8], a continuous procedure of incorporating acyl CoA FAs from the cytosol into a glycerol-3-phosphate (G3P) backbone in the endoplasmic reticulum (ER) [82]. G3P is produced by the reduction of dihydroxyacetone phosphate (DHAP), an intermediate of glycolysis, by glycerophosphate dehydrogenase [105]. At present, a large number of genes have been cloned to dissect gene expression regulation and mechanisms in the process of oil biosynthesis [131], [133]. WRKYs are a large family of transcription factors that are mainly involved in plant growth, development, and response to stress [10]. In A. thaliana, WRKY6 influences seed oil accumulation and FA composition in developing Arabidopsis seeds [90]. GLUCOSE-6-PHOSPHATE/PHOSPHATE TRANSLOCATOR1 (GPT1), the key target gene of WRKY2/WRKY34, drives Glc6P from the cytosol into the plastid in the late stage of pollen development, which is an important step in lipid biogenesis in pollen [12]. In Gossypium species, WRKY12 has 198 potential target genes, and their differential expression may alter the FA biosynthetic pathway [39]. The serine/threonine/tyrosine kinase (STYK) gene phosphorylates oil body proteins, which in turn affect seed oil content. The seeds of the styk mutant of Arachis hypogaea accumulated 10–12% less TAG compared to the wild type, validating the role of STYK phosphorylation in lipid accumulation [78]. The peanut β-ketoacyl-CoA synthase (KCS) genes, AhKCS1 and AhKCS28, are involved in the regulation of very long-chain FA (VLCFA) content in peanut seeds. Overexpression of AhKCS1 or AhKCS28 in A. thaliana can increase VLCFA content, especially the content of very long-chain saturated FAs (VLCSFAs) [38]. The hydroxysteroid dehydrogenase (HSD1) gene, which has two copies, At5g50600 and At5g50700, in Arabidopsis, is specifically and highly induced in the oil-accumulating tissues of maturing seeds [7]. The Phospholipid:diacylglycerol acyltransferase (PDAT) gene increases seed oil content in rice [41], soybean [63], A. thaliana [68], and upland cotton [123]. In Brassica napus, the genes POCKET and BnPMT6s negatively regulate seed oil content [97]. Among the four domesticated cotton species, upland cotton accounts for 90–95% of the total cotton fiber production due to its wide adaptability and attractive yield [47], [66]. The commercially planted upland cotton cultivars are the result of long-term selection and domestication [33], in which cottonseed oil content varies greatly from 12.60 to 39.33% of the total kernel weight [33], [103], [130]. To better understand the genetic mechanism of oil accumulation, some early studies have tentatively tackled QTL identification in cottonseed oil [85], [120]. Many QTLs have thus been identified using intraspecific populations of G. hirsutum [112], interspecific populations between G. hirsutum and G. barbadense [120], [134], and, more recently, natural accessions in genome-wide association studies (GWAS) [22], [130]. The rapid development of genome sequencing technologies has provided a new platform for QTL mapping based on linkage and association analyses of upland cotton and gene function studies [26], [37], [52], [107], [125]. The lysophosphatidic acid acyltransferase (LPAAT) gene has been reported to be involved in oil biosynthesis. Enhancing the expression of an LPAAT gene, At-Gh13LPAAT5, significantly increases the yield of total TAG and other FAs [108]). The FA desaturase (FAD2) gene encodes seed-specific desaturase, which catalyzes the conversion of oleic acid to linoleic acid in cotton seed and is the main factor determining the content of essential polyunsaturated FAs (with multiple unsaturated bonds) in seed oil [21]. There are at least two copies of this gene in tetraploid Gossypium species, and knockouts of GhFAD2 genes with the CRISPR/Cas9 system have generated high-KOC materials [11]. The transcription factor WRINKLED 1 (GhWRI1) participates in the upregulation of FA biosynthesis during seed development. Overexpression of GhWRI1 has been observed to increase KOC and seed weight in transgenic A. thaliana and upland cotton [131]. GhCPS1, GhCPS2, and GhCPPS3 are cyclopropane synthase homologues in cotton, and the expression of GhCPS1 and GhCPS2 correlates with cyclic FA distribution [121]. GbSWEET and GbACBP6 from G. barbadense were also demonstrated to have the potential for improving the KOC in cottonseed [133]. These studies are sporadic, aiming at isolated single genes; however, there are few reports on the regulatory network of oil synthesis in cottonseed, which is critical to dissect the genetic basis of biosynthesis of storage lipids. The current study combined multiple linkage and association strategies for major QTL identification and candidate gene verification of KOC in cottonseed across two recombinant inbred line (RIL) populations and a panel of 207 natural accessions to dissect the genetic regulatory mechanism of KOC accumulation in cottonseed. The hub genes were identified through expression network analysis of the candidate genes harbored in stable QTL intervals. Two hub genes, LPLAT1 and PAH2, coding lysophospholipid acyltransferase [106] and Mg2+-dependent phosphatidic acid phosphohydrolases [25], respectively, which coordinate the key steps of the Kennedy pathway and the Lands cycle for TAG assembly, were identified to co-localize in a cluster, determining the accumulation of oil in cells. These results provide a better understanding of the molecular regulatory mechanism in the metabolic pathways of lipid biosynthesis and oil accumulation during cottonseed development. This study also identified elite germplasm for future improvement of KOC in cottonseed in pragmatic breeding projects.

Materials and methods

Plant materials

The experimental materials included two RIL populations for linkage analyses and a panel of 219 natural accessions (natural panel) for association analysis. One RIL population consisted of 250 individual lines derived from an elite hybrid cultivar ‘CCRI70′, a commercial hybrid approved by the State Crop Variety Approval Committee (Approval certificate 2008011; CCRI70 RIL) [122], and parental lines ‘sGK156′ (female) and ‘901-001′ (male). In 2017, the F5:6 generation of the 250 CCRI70 lines was reached and thereafter regarded as RILs. The development of the population has been detailed in previous studies [19], [32]. From 2015 to 2017, the phenotypic performance of the CCRI70 RIL population was evaluated at multiple locations. The experiments were performed using a randomized block design, and the experimental locations and layouts are detailed in Table S1. The other RIL population consisted of 196 individual lines derived from a cross between parental lines ‘sGK9708′ and ‘0-153′ (0–153 RIL). The construction procedure and characteristics of the parental lines have been described elsewhere [95], [128]. The phenotypic performance of KOC was evaluated across eight environments from 2014 to 2015 in Anyang, Alaer, Kuerle, and Shihezi. Briefly, the field design and layout were conducted in a completely randomized block design with two replicates in each experimental location, which were detailed in a previous report [128]. The natural panel was collected from a pool of commercial planting cultivars and breeding nurseries, which represented the cotton production levels of the main cotton planting regions. The panel also contains a collection of advanced lines and germplasm in current breeding projects. The phenotypic performance of the natural panel was evaluated in four environments, namely Anyang in 2017 and Alaer, Anyang, and Wuhan in 2019. The experiments were performed in a randomized block design, which is detailed in Table S1. Field management for both RIL populations and the natural panel followed local practices in cotton production using ‘CCRI60,’ a commercial cultivar, as the control.

Trait phenotypic evaluation

In October, 30 normal cotton bolls in each line or accession were sampled. The cotton lint fiber and cottonseed were separated with a roller gin. The broken and immature seeds were discarded manually, and 50 healthy seeds were randomly sampled from each line or accession. The husks of the sampled seeds were removed manually. Then, the KOC in cottonseed was measured with three repeats using NMI20 Analyst (Shanghai, China).

Data statistical analyses

The best linear unbiased estimates (BLUE) of the KOC of the three populations and the KOC broad-sense heritability (H2) in different environments were estimated using QTL ICIMapping 4.2 software [73]. Statistical analysis of the phenotypic performance and BLUE was performed with Minitab 17 statistical software. The Chart.Correlation function in the Performance Analytics Package of R was applied to create the correlation graphs and figures.

QTL identification

Two linkage maps were used to detect the KOC QTLs of the CCRI70 RILs. One was constructed with the Intl Cotton SNP Consortium_80k CHIP containing 8292 SNP markers, with a total coverage of 4876.7 cM genetic distance and an average marker interval of 1.08 cM [136]. The other map was constructed with SLAF-seq technology, containing 24,425 SNPs with a total coverage of 4850.47 cM genetic distance and an average marker interval of 0.73 cM (Table S2) (Jiang et al., unpublished data). The linkage map of 0–153 RILs was a consensus map integrating multiple strategies, which has been detailed in a previous report (Table S2) [128]. The markers and genetic linkage map were re-analyzed using reference genome data of TM-1 (v2.1) [37]. QTLs were detected using the composite interval mapping (CIM) method of Windows QTL Cartographer v2.5 software [9], [110]. In the parameter setting, a likelihood ratio (LR) value of 11.5, which is equivalent to an LOD value of 2.5, was taken as the threshold of QTL existence, that is, when the LOD of the marker interval was greater than or equal to 2.5, a QTL existed at the peak of the LOD value in that interval. The scanning of the whole genetic linkage map for QTL existence was performed with 1000 permutation tests at a step length of 0.5 cM with p < 0.05. The nomenclature for QTL followed Sun et al. [95], and QTLs that were consistently identified in at least three environments were regarded as stable. Positive additive effect indicated that the favorable allele was from parent 901–001 (CCRI70 RIL) or 0–153 (0–153 RIL), and negative additive effect indicated that the favorable allele was from parent sGK156 (CCRI70 RIL) or sGK9708 (0–153 RIL). Stable QTLs were compared with the Cotton QTLdb database (https://www.cottonqtldb.org) or previous GWAS [23], [69], [120], [134] to determine whether they were identified de novo or previously published. If the markers in the confidence interval of QTLs in the current study shared the same or overlapped physical location with those in QTLs in the Cotton QTLdb database or if the physical position of GWAS QTL markers in previous studies rested on confidence intervals of the current study, then QTLs in the current study were regarded as previously identified; otherwise, they were regarded as new QTLs.

GWAS of the natural panel

The natural panel was genotyped using the SLAF-seq technology described previously [128]. Briefly, 3,464,561 SNPs were obtained. The reads containing these SNPs were aligned to the reference genome [37] using BWA software [53]. SAMtools [54] and GATK [72] software were used to call SNPs and short indel variants. After filtering at the integrity level of no<0.85 and a minimum allele frequency of 0.05, a total of 32,994 SNP markers were obtained to genotype the natural panel (Table S2). The GLM and MLM models [127] of the genetic association and prediction integrated tool (GAPIT package) [59] in R software were used to calculate the significance of QTNs. The signal threshold of − log10(P) > 4 was determined to declare the existence of a significant association site.

Acquisition of candidate genes in QTLs and QTNs

Cotton genome data and genome-wide functional annotation files were downloaded from the cotton genome database (CottonFGD, https://cottonfgd.org/) [135]. QTLs and QTNs of the KOC were aligned to the physical map. The candidate genes corresponding to the upland cotton genome (TM-1 v2.1) [37] were searched based on genome coordinates or genetic markers. Genes within the range of 100 kb flanking QTNs and within the confidence marker interval of QTLs were regarded as candidate genes of the KOC in cottonseed.

Prediction, annotation, and expression profiling of candidate genes

All candidate genes and their corresponding protein sequences were blasted with the Arabidopsis annotation database (https://www.arabidopsis.org/). Candidate genes in QTL regions were analyzed for Gene ontology annotation (GO) using agriGO 2.0 software [99]. Metabolic pathway enrichment analysis of the Kyoto Encyclopedia of Genes and Genomes (KEGG) was performed using KOBAS 3.0 [117]. Transcription factors (TFs) were obtained through annotation analysis from the PlantTFDB v5.0 database [104]. Cotton expression profile data were downloaded from the database (https://cotton.zju.edu.cn/). The database contains FPKM values of developing ovules at 0, 1, 3, 5, 10, and 20 days post anthesis (dpa) of ‘TM-1,’ a standard upland cotton cultivar with a low KOC of 30.82%, and ‘Hai7124,’ a cultivar with a high KOC of 35.31%. The TM-1 database was used to analyze the expression profiles of candidate genes in KOC QTLs, and the Hai7124 database was used to compare the differential expression of these genes during cotton ovule development between high and low KOC materials.

Construction of a regulatory network of lipid synthesis-related genes

The interaction between proteins was explored in the Metascape database [132] and rendered as a network plot, where terms with a similarity > 0.3 were connected by edges. The terms with the best p-value were selected from each cluster based on the constraint that the terms selected did not exceed 15 per cluster and that the total number of terms did not exceed 250. Using all interactions (details) in STRING 10.0 (https://string-db.org/cgi/ input.pl), the synthesized network exclusively kept the subset of proteins that physically interacted with at least one other member on the list. If the network contained 3–500 proteins, then the molecular complexity detection (MCODE) algorithm [40], [55] was used to identify the network components with dense node–node connections. For each MCODE component, pathway and process enrichment analyses were performed independently. The three best scoring items of the p-value were reserved to describe the function of the corresponding component. Finally, a regulatory network was constructed. Cytoscape 3.9.0 [45] was used to visualize the regulatory network, where each node represented an enriched term, and then, the network was pigmented by its cluster ID and then by its p-value.

Results and analysis

Statistical analysis of KOC phenotypic data in CCRI70 RIL and natural populations

In the 14 environment phenotypic evaluations of CCRI70 RILs and their parental lines, sGK156 and 901–001, the KOC of 901–001 was significantly higher than that of sGK156 (Table 1 and Fig. 1A). The phenotypic distributions of KOC of the CCRI70 RILs in the Cotton Regions of the Yellow River Basin (namely 14AY, 15AY, 16AY, 17AY, 15LQ, and 16LQ), of Northwest Inland (namely 16ALE, 16KEL, 17KEL, 16SHZ, and 17SHZ), and of the Yangtze River Basin (namely 16CD, 17CD, and 17DT) were 19.45–27.24%, 23.48–41.17%, and 21.47–40.24%, respectively, which exceeded the values of their two parents, indicating that both parents of CCRI70 RILs had alleles that favored KOC accumulation in cottonseed. The results also showed that the average KOC of CCRI70 RILs was between those of the two parents and that they varied significantly across different environments.
Table 1

Phenotype analysis of CCRI70 RILs and natural panel across multiple environments.

PopulationEnvaParentsbPopulation
P1P2NMean ± SDcRangeSkewnessKurtosisCVd (%)
CCRI70 RILs15AY21.7326.82**25026.21 ± 3.8719.45–33.31−0.05−1.5414.78
16AY25.0430.15**25027.86 ± 3.1521.78–34.23−0.15−1.3411.31
17AY27.4533.09**25031.74 ± 3.0924.29–37.24−0.23−1.359.74
15LQ26.5632.49**25029.48 ± 3.2123.38–34.40−0.15−1.4910.89
16LQ26.4533.52**25029.68 ± 2.9823.04–35.50−0.21−1.2310.03
15ALE25.7630.29**25029.05 ± 2.9523.48–34.78−0.08−1.2310.16
16ALE26.2930.78**25028.62 ± 2.3523.75–32.98−0.02−1.228.19
16KEL26.2330.63**25029.33 ± 2.7123.97–34.62−0.28−1.379.24
17KEL29.1636.43**25034.74 ± 2.6129.23–41.17−0.19−0.997.51
16SHZ29.1932.73**25030.71 ± 2.3925.60–35.41−0.13−1.047.76
17SHZ30.5237.77**25034.39 ± 2.9028.23–40.69−0.08−1.088.42
16CD25.7529.76**25028.31 ± 2.9521.47–33.49−0.06−1.2910.43
17CD26.6632.57**25031.98 ± 2.9825.55–37.26−0.22−1.279.32
17DT31.7538.97**25035.02 ± 2.6328.38–40.24−0.09−1.087.50
BLUECCRI7025030.37 ± 2.6924.90–34.86−0.18−1.508.86
GWAS17AY19331.60 ± 2.4126.27–38.721.061.147.61
19AY21929.35 ± 3.0324.27–37.321.010.1010.33
19ALE21930.03 ± 2.8324.69–38.791.241.169.42
19WH21929.67 ± 2.6324.50–36.760.81−0.038.85
BLUEGWAS21929.69 ± 2.7024.86–37.061.110.399.08

Env., 15AY: 2015 Anyang, 16AY: 2016 Anyang, 17AY: 2017 Anyang, 15LQ: 2015 Linqing, 16LQ: 2016 Linqing, 15ALE: 2015 Alaer, 16ALE: 2016 Alaer, 16KEL: 2016 Kuerle, 17KEL: 2017 Kuerle, 16SHZ: 2016 Shihezi, 17SHZ: 2017 Shihezi, 16CD: 2016 Changde, 17CD: 2017 Changde, 17DT: 2017 Dangtu, 19ALE: 2019 Alaer, 19AY: 2019 Anyang, 19WH: 2019 Wuhan, BLUECCRI70 and BLUEGWAS: best linear unbiased estimates of CCRI70 RILs and GWAS, respectively.

Parents, P1: sGK156, P2: 901–001.

SD, standard deviation.

CV, coefficient of variation.

indicates that the differences between sGK156 (P1) and 901–001 (P2) reached a 0.01 significance level.

Fig. 1

Phenotypic statistics of kernel oil content (KOC) of CCRI70 RILs and their parental lines, and correlations of the natural panel and CCRI70 RILs across different environments. A. Phenotypic performance of KOC of CCRI70 RILs (including highest, lowest, and average values) and their parental lines. B. Correlation analysis of the natural panel across five environments. C. Correlation analysis of CCRI70 RILs across 14 environments and their BLUE values. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Phenotype analysis of CCRI70 RILs and natural panel across multiple environments. Env., 15AY: 2015 Anyang, 16AY: 2016 Anyang, 17AY: 2017 Anyang, 15LQ: 2015 Linqing, 16LQ: 2016 Linqing, 15ALE: 2015 Alaer, 16ALE: 2016 Alaer, 16KEL: 2016 Kuerle, 17KEL: 2017 Kuerle, 16SHZ: 2016 Shihezi, 17SHZ: 2017 Shihezi, 16CD: 2016 Changde, 17CD: 2017 Changde, 17DT: 2017 Dangtu, 19ALE: 2019 Alaer, 19AY: 2019 Anyang, 19WH: 2019 Wuhan, BLUECCRI70 and BLUEGWAS: best linear unbiased estimates of CCRI70 RILs and GWAS, respectively. Parents, P1: sGK156, P2: 901–001. SD, standard deviation. CV, coefficient of variation. indicates that the differences between sGK156 (P1) and 901–001 (P2) reached a 0.01 significance level. Phenotypic statistics of kernel oil content (KOC) of CCRI70 RILs and their parental lines, and correlations of the natural panel and CCRI70 RILs across different environments. A. Phenotypic performance of KOC of CCRI70 RILs (including highest, lowest, and average values) and their parental lines. B. Correlation analysis of the natural panel across five environments. C. Correlation analysis of CCRI70 RILs across 14 environments and their BLUE values. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) The KOC of the accessions of the natural panel across four environments was distributed within the range of 24.27–38.79%, with an average of 30.16% (Table 1). Skewness and kurtosis showed that the phenotypes of both RILs and the natural panel followed a normal distribution and the typical characteristics of quantitative traits. Significant differences in KOC across different environments were also observed in both populations. The continuous variations in KOC in both the RILs and the natural panel indicated that it might be controlled by multiple genes.

Broad-sense heritability and correlations of KOC across different environments

The effect of genotype (G), environment (E), and genotype by environment (G × E) of KOC reached a significant level (p < 0.01) (Table S3). The average broad-sense heritability of the KOC of CCRI70 RILs and the natural panel was 90.51% and 92.30%, respectively, based on the above variance estimations. The KOC of upland cotton seed had strong heredity, showing little response to environmental variations. The stability and consistency of KOC in different environments provided a good basis for further genetic analysis. Correlation analysis showed a highly significant positive correlation across different environments, with correlation coefficients of 0.82–0.95 and 0.82–0.97 in CCRI70 RILs and the natural panel, respectively. Correlation analysis between 14 environments and BLUE values (Fig. 1B, 1C) showed a highly significant positive correlation, indicating that the environmental data were consistent and accurate.

QTL analysis of KOC in multiple RIL populations and GWAS of the natural panel

CCRI70 RILs Based on the genetic linkage map constructed by the Intl Cotton SNP Consortium_80k CHIP [136], 28 QTLs for KOC in cottonseed were detected by the CIM method using Windows QTL Cartographer v2.5 software (Table S4, Figure S1A). The number and distribution of QTLs throughout the genome are detailed in Fig. 2A. The LOD values of these QTLs ranged from 2.51 to 29.13, and their contribution rate to phenotypic values ranged from 2.58% to 63.07%. Seven of these QTLs were detected in at least three environments (Table 2), mainly distributed on chromosome 2 (c2), c3, c17, and c18, with additive effects from − 2.63 to 1.78 (Fig. 2D). Chromosome c17 (D03) harbored the most stable KOC QTLs (4), one of which, qOCchip-c17-5, was detected simultaneously in 14 environments. Its contribution rate to the phenotypic value ranged from 7.43% to 36.92%.
Fig. 2

Distribution of quantitative trait loci (QTLs) across the genome of cotton and the additive effects of stable QTLs. A. Distribution of QTLs of CCRI70 RILs identified by the CHIP strategy. B. Distribution of QTLs of CCRI70 RILs identified by the SLAF-seq strategy. C. Distribution of QTLs of 0–153 RILs. D. Additive effects of stable QTLs of CCRI70 RILs identified by the CHIP strategy. E. Additive effects of stable QTLs of CCRI70 RILs identified by the SLAF-seq strategy. F. Additive effects of stable QTLs of 0–153 RILs. The red bars indicate the number of QTLs that are stably epressed in at least three environments, and the green bars indicate the number of QTLs that are only expressed in one or two environments. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 2

Stable QTLs detected in CCRI70 and 0–153 RIL populations.

QTL nameaChrEnvbPosition(cM)LODAdditiveR2Physical startPhysical endReferences
qOCslaf-c1-1A0115LQ,17SHZ,17DT,BLUECCRI7086.81–86.812.59–3.38−1.21–-0.864.95–6.8112,460,81913,709,725[134]
qOCchip-c2-2A0217AY,16LQ,16KEL,17DT,16AY147.51–151.613.02–4.301.28–1.785.33–8.161,049,0901,483,801
qOCslaf-c2-1A0215LQ,17KEL,17SHZ,16CD105.21–112.512.70–3.111.20–1.774.88–6.051,190,2481,269,625
qOCslaf-c2-2A0216ALE,16SHZ,17SHZ,16CD,17DT,16LQ,BLUECCRI70121.71–124.813.01–3.531.07–1.825.11–6.251,069,0521,238,794
qOCslaf-c3-1A0316CD,17CD,16KEL6.01–7.012.67–2.770.68–0.704.52–5.491,547,0941,800,523
qOC0-153-c3-1A0314AY,15SHZ,14ALE,BLUE0-1533.81–7.812.50–3.40−0.37–-0.316.66–7.77835,2422,119,737
qOCchip-c3-1A0315AY,16ALE,17KEL,16SHZ74.01–76.112.57–3.29−1.35–-0.684.45–5.8022,751,74357,120,804[134][61]
qOC0-153-c3-2A0314ALE, BLUE0-15385.21–86.812.50–3.200.32–0.425.32–6.4416,858,60492,580,366[134]
qOC0-153-c3-3A0314ALE,15ALE,BLUE0-153115.11–115.112.70–3.30−0.44–-0.335.72–7.01102,477,680104,626,789
qOCslaf-c4-2A0417AY,15LQ,17KEL90.91–90.912.53–3.930.61–0.924.12–6.2270,463,95879,831,756
qOCslaf-c4-3A0416ALE,16KEL,17CD204.81–204.812.89–4.210.69–1.564.82–6.941,216,7152,315,633
qOC0-153-c4-1A0415KEL,14SHZ,15SHZ,BLUE0-1530.01–0.012.50–3.700.29–0.415.36–7.3084,518,72286,498,365
qOC0-153-c4-2A0414ALE,14SHZ31.81–32.613.20–3.40−0.90–-0.696.52–7.5480,133,67080,594,623
qOC0-153-c4-3A0414AY,15AY,15ALE,14SHZ,BLUE0-15335.61–40.612.90–5.90−0.92–-0.335.99–11.8479,085,78380,126,560
qOC0-153-c5-1A0515KEL,BLUE0-15314.31–14.312.50–2.600.26–0.465.14–5.51107,249,639109,379,072
qOCslaf-c6-1A0616ALE,16SHZ,17SHZ,17CD,BLUECCRI7085.11–85.412.69–3.26−1.21–-0.804.50–5.73120,173,225121,010,649
qOCslaf-c6-3A0616AY,17AY,15LQ,15ALE,16ALE,16SHZ,17SHZ,BLUECCRI70107.71–107.712.88–5.41.08–1.524.99–9.04119,831,704120,002,055
qOC0-153-c7-1A0714AY,14SHZ0.01–0.013.80–3.90−0.47–-0.467.78–7.9593,488,53094,679,900
qOC0-153-c7-2A0714ALE,14SHZ16.11–17.114.00–5.700.56–0.618.11–13.0290,291,12193,073,159[69]
qOC0-153-c7-5A07BLUE,15KEL67.71–69.412.80–3.500.54–0.835.87–7.5725,450,54726,472,025
qOC0-153-c7-6A0715KEL,15SHZ,BLUE,14SHZ80.41–81.712.8–3.4−0.66–-0.425.76–6.8321,941,17425,186,324
qOCslaf-c8-3A0816AY,17AY,16ALE,17KEL,16KEL,16SHZ,17SHZ,16CD,17CD,17DT,BLUECCRI70177.41–184.312.96–4.750.71–1.104.88–7.981,777,3122,135,693
qOCslaf-c8-4A0816AY,17AY,16LQ,BLUECCRI70190.71–190.712.60–3.390.86–0.984.33–5.691,572,7171,768,288
qOC0-153-c10-2A1014ALE,BLUE0-153109.21–111.712.90–3.30−0.43–-0.326.09–8.2673,178,54194,556,940
qOCslaf-c11-1A1115AY,16AY,15LQ,16LQ,15ALE,16ALE,17KEL,17KEL,16SHZ,17SHZ,16CD,17DT,BLUECCRI70122.11–128.512.75–4.44−0.99–-0.584.49–7.4123,689,87458,859,230
qOC0-153-c11-2A1114KEL,15AY18.11–19.612.70–2.700.57–0.875.62–5.83110,990,183113,750,731
qOC0-153-c11-3A1115KEL,15ALE,BLUE0-15358.21–60.212.30–2.90−0.52–-0.405.25–6.0159,121,52659,583,114
qOCslaf-c14-1D0215LQ,16CD,BLUE,16ALE,17SHZ137.31–143.512.69–3.780.65–0.914.33–6.2360,566,07061,911,566
qOCslaf-c15-1D0115LQ,15ALE,17KEL,16SHZ,17DT94.51–95.112.84–4.770.70–1.164.75–7.9010,767,13211,338,952
qOCslaf-c17-1D0315AY,16AY,17AY,15LQ,16LQ,15ALE,16ALE,17KEL,16KEL,16SHZ,17SHZ,16CD,17CD,17DT,BLUECCRI7043.81–45.6125.4–43.61−2.92–-1.6630.64–54.0342,061,07643,311,612[130][23]
qOCslaf-c17-2D0315AY,17AY,15LQ,15ALE,16CD,17DT,BLUECCRI7050.51–51.5127.00–37.80−2.75–-1.6935.95–49.0931,778,26842,054,654[23][130], [134]
qOCslaf-c17-4D0317AY,15ALE,16CD140.91–144.112.55–3.540.55–0.752.01–3.252,476,6962,492,953
qOCchip-c17-2D0315AY,16AY,15LQ,16LQ,15ALE,17KEL,16KEL,17SHZ,16CD,17CD,17DT40.31–42.912.52–15.49−1.96–-0.662.86–24.1839,490,03939,800,209[130], [134]
qOCchip-c17-3D0316LQ,15ALE,BLUECCRI7048.51–48.5121.80–25.28−2.01–-1.8733.36–37.2835,884,33539,490,039[130], [134]
qOCchip-c17-4D0315AY,16AY,17AY,15LQ,16ALE,17KEL,16KEL,16SHZ,17SHZ,16CD,17CD,17DT51.11–54.7118.29–29.13−2.63–-1.4424.56–41.0534,643,83535,888,354[130]
qOCchip-c17-5D0315AY,16AY,17AY,15LQ,16LQ,15ALE,16ALE,16KEL,16SHZ,17SHZ,16CD,17CD,17DT,BLUECCRI7058.21–60.615.97–25.28−2.43–-1.097.43–36.9232,858,40834,643,835[130]
qOC0-153-c17-3D0315ALE,14SHZ,BLUE0-153116.41–117.212.10–4.10−0.52–-0.284.92–9.54608,4891,048,342
qOCslaf-c22-1D0415AY,16AY,15LQ,15ALE,16KEL,17CD,BLUECCRI70144.91–148.914.50–11.58−3.33–-2.3260.70–72.8313,34554,237,568
qOC0-153-c19-1D0515SHZ,14AY40.11–41.912.40–2.600.38–0.405.16–5.6724,566,96428,945,620[69]
qOC0-153-c19-2D0515KEL,15SHZ47.91–47.912.50–2.600.37–0.425.48–5.7416,717,49623,760,060[60], [84]
qOCslaf-c25-1D0615AY,17SHZ,17CD33.11–33.112.50–2.92−1.14–-0.824.55–5.576,710,7636,853,575
qOC0-153-c25-1D0614SHZ,15KEL37.31–39.712.50–2.500.51–0.525.38–5.4527,591,30043,038,644
qOC0-153-c25-2D0615ALE,15KEL58.91–58.913.10–3.20−0.57–-0.537.42–7.566,133,0468,278,619
qOCslaf-c23-1D0916AY,17AY,17SHZ,17CD,BLUECCRI7026.81–26.812.59–5.170.85–1.354.69–8.8648,167,98348,221,325
qOC0-153-c20-1D1014KEL,BLUE,15ALE10.11–10.612.80–4.80−1.02–-0.386.05–8.0261,328,78962,929,532[69], [130]
qOC0-153-c20-3D1015KEL,15SHZ75.21–77.312.80–2.900.55–0.576.06–6.386,479,6568,103,827
qOCchip-c18-2D1317KEL,17CD,17DT74.71–75.312.72–2.90−0.86–-0.794.77–5.2131,737,19245,707,863[61]

In the name of a QTL, chip indicates that the QTL is identified on the genetic linkage map of CCRI70 by CHIP strategy, slaf indicates that the QTL is identified on the genetic linkage map of CCRI70 by SLAF-seq strategy, and 0-153 indicates that the QTL is identified on the genetic linkage map of 0–153.

Env., 14ALE: 2014 Alaer, 14AY: 2014 Anyang, 14KEL: 2014 Kuerle, 14SHZ: 2014 Shihezi, 15ALE: 2015 Alaer, 15AY: 2015 Anyang, 15KEL: 2015 Kuerle, 15LQ: 2015 Linqing, 15SHZ: 2015 Shihezi, 16ALE: 2016 Alaer, 16AY: 2016 Anyang, 16CD: 2016 Changde, 16KEL: 2016 Kuerle, 16LQ: 2016 Linqing, 16SHZ: 2016 Shihezi, 17AY: 2017 Anyang, 17CD: 2017 Changde, 17DT: 2017 Dangtu, 17KEL: 2017 Kuerle, 17SHZ: 2017 Shihezi, BLUECCRI70 and BLUE0-153: best linear unbiased estimates of CCRI70 RILs and 0–153 RILs, respectively.

Distribution of quantitative trait loci (QTLs) across the genome of cotton and the additive effects of stable QTLs. A. Distribution of QTLs of CCRI70 RILs identified by the CHIP strategy. B. Distribution of QTLs of CCRI70 RILs identified by the SLAF-seq strategy. C. Distribution of QTLs of 0–153 RILs. D. Additive effects of stable QTLs of CCRI70 RILs identified by the CHIP strategy. E. Additive effects of stable QTLs of CCRI70 RILs identified by the SLAF-seq strategy. F. Additive effects of stable QTLs of 0–153 RILs. The red bars indicate the number of QTLs that are stably epressed in at least three environments, and the green bars indicate the number of QTLs that are only expressed in one or two environments. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.) Stable QTLs detected in CCRI70 and 0–153 RIL populations. In the name of a QTL, chip indicates that the QTL is identified on the genetic linkage map of CCRI70 by CHIP strategy, slaf indicates that the QTL is identified on the genetic linkage map of CCRI70 by SLAF-seq strategy, and 0-153 indicates that the QTL is identified on the genetic linkage map of 0–153. Env., 14ALE: 2014 Alaer, 14AY: 2014 Anyang, 14KEL: 2014 Kuerle, 14SHZ: 2014 Shihezi, 15ALE: 2015 Alaer, 15AY: 2015 Anyang, 15KEL: 2015 Kuerle, 15LQ: 2015 Linqing, 15SHZ: 2015 Shihezi, 16ALE: 2016 Alaer, 16AY: 2016 Anyang, 16CD: 2016 Changde, 16KEL: 2016 Kuerle, 16LQ: 2016 Linqing, 16SHZ: 2016 Shihezi, 17AY: 2017 Anyang, 17CD: 2017 Changde, 17DT: 2017 Dangtu, 17KEL: 2017 Kuerle, 17SHZ: 2017 Shihezi, BLUECCRI70 and BLUE0-153: best linear unbiased estimates of CCRI70 RILs and 0–153 RILs, respectively. Based on the genetic linkage map constructed by SLAF-seq technology, 41 QTLs for the KOC in cottonseed were identified by the CIM method (Table S5, Figure S1B). The number and distribution of QTLs throughout the genome are detailed in Fig. 2B. The LOD value of these QTLs ranged from 2.50 to 43.61, and the contribution rate ranged from 2.01 to 72.80%. Nineteen of these QTLs were detected in at least three environments, mainly distributed on c1, c2, c3, c4, c6, c8, c11, c14, c15, c17, c22, c23, and c25, with additive effects from − 3.33 to 1.82 (Fig. 2E). A major QTL, qOCslaf-c17-1, on c17 (D03) was detected simultaneously in all 14 environments, with a LOD value of 25.4–43.61 and a contribution rate to the phenotype of 30.64–54.03%. Its negative additive effect, which ranged from − 2.92 to − 1.66, indicated that its additive allele, which favors phenotypic formation of KOC, was obtained from parent 901–001. In summary, the above two linkage maps simultaneously identified two DNA fragments on c2 (A02) and c17 (D03). In the c2 fragment, the Intl Cotton SNP Consortium_80k CHIP strategy identified locus qOCchip-c2-2, and the SLAF-seq strategy identified two loci, qOCslaf-c2-1 and qOCslaf-c2-2. In the c17 fragment, the Intl Cotton SNP Consortium_80k CHIP strategy identified four loci, qOCchip-c17-2, qOCchip-c17-3, qOCchip-c17-4, and qOCchip-c17-5, and the SLAF-seq strategy identified one locus, qOCslaf-c17-2. Previous studies have isolated genes controlling the KOC in cottonseed on these chromosome loci [130], [133]. 0–153 RILs Based on the genetic linkage map [128], which was constructed by the integration of SSRs, SLAF-SNPs, and chip-SNPs using TM-1 v2.1 as a reference genome [37], 44 QTLs for KOC in cottonseed were obtained by the CIM method (Table S6, Figure S1C). The distribution of these QTLs across the genome is detailed in Fig. 2C. These QTLs have LOD values of 2.1–5.9, contributing 4.87–13.02% of KOC phenotypic variance. Eight of these QTLs were detected in at least three environments (Table 2), mainly distributed on chromosomes c3, c4, c7, c11, c17, and c20 (Figure S1C). The distribution of additive effects is shown in Fig. 2F. GWAS of the natural panel GWAS of the natural panel was performed with both GLM and MLM algorithms (Figures 3 and S2). The evaluation of phenotypic data and SNP filtering exhibited their reliabilities (Figure S2A, S2B, S2C, S2D, S2E). GWAS was carried out under the condition of K = 4, under which the population was in the most appropriate structure (Figure S2F). The quantile–quantile plot (QQ plot) showed that there was a significant correlation between the KOC phenotype and genotypes of the natural panel (Fig. 3B and 3D). The results finally associated 34 and 22 loci with the KOC in cottonseed, among which 13 and 8 loci, respectively, were detected in at least two environments (Fig. 3A and 3C, Table S7), distributed mainly on chromosomes A1, A3, A5, A6, A7, A10, A11, D5, D11, and D13 (Table 3). Among these loci, seven were simultaneously associated with KOC in both models, and all of them were on c18 (D13). These significant SNP markers explained 9.38–18.33% of the phenotypic variance, with genetic effects of − 7.33 to 6.9. These results suggest that the chromosome regions where the significant SNPs were located may harbor candidate genes regulating KOC accumulation in cottonseed.
Fig. 3

Manhattan map and the quantile–quantile (QQ) chart in the genome-wide association studies (GWAS) of the natural panel. A. Manhattan map of GLM model. B. QQ chart of GLM model. C. Manhattan map of MLM model. D. QQ chart of MLM model.

Table 3

The significant loci that are associated with KOC in at least two environments using models GLM and MLM in GWAS.

QTNChrPositionAlleleEnv*GLMMLMReference
p-valueMafR2effectp-valueMafR2effect
c01_69712855A0169,712,855G/A19ALE6.21E-060.009713.94−7.33////
17AY6.15E-050.010813.81−6.12////
c03_18925549A0318,925,549A/G19ALE5.13E-050.009711.854.73////[134]
17AY2.63E-050.010814.744.58////
c05_107481128A05107,481,128A/G19ALE8.42E-060.024213.643.05////
BLUEGWAS6.84E-050.024211.502.73////
c05_107481161A05107,481,161A/G19ALE1.73E-060.021715.243.431.32E-050.021718.333.42
17AY1.23E-050.016215.583.381.16E-050.016216.253.69
19AY4.29E-050.021713.103.365.18E-050.021712.423.49
BLUEGWAS5.45E-060.021714.013.261.45E-050.021714.353.35
19WH5.64E-050.02179.843.03////
c06_17240705A0617,240,705C/G19ALE6.29E-050.009711.654.67////
17AY6.83E-050.010813.694.33////
BLUEGWAS6.49E-050.009711.554.68////
c07_8706328A078,706,328T/C19AY////6.61E-050.024212.18−3.17
BLUEGWAS////8.41E-050.024212.65−2.72
c10_77548836A1077,548,836T/A19ALE6.14E-060.007213.966.454.10E-050.007217.286.36
17AY2.26E-050.008114.905.701.58E-050.008115.916.27
BLUEGWAS7.94E-050.007211.355.63////
c11_104291533A11104,291,533G/A19ALE6.73E-050.004811.586.54////
17AY6.03E-060.005416.386.96////
c18_42111539D1342,111,539A/G19ALE5.85E-060.019314.003.704.63E-050.019312.524.18[61]
19AY3.60E-050.019313.273.87////
BLUEGWAS7.47E-060.019313.693.673.05E-050.019313.623.91
19WH5.40E-050.01939.893.47////
c18_42113870D1342,113,870G/A19AY4.55E-050.016913.04−3.804.26E-050.016912.60−4.09[61]
BLUEGWAS4.29E-050.016911.95−3.329.63E-050.016912.52−3.51
c18_42507564D1342,507,564A/G19ALE5.89E-050.019311.712.80////[61]
19AY2.56E-050.019313.603.392.47E-050.019313.143.69
BLUEGWAS1.75E-050.019312.843.024.40E-050.019313.273.24
19WH8.98E-050.01939.382.88////
c18_46810731D1346,810,731G/A19ALE1.90E-050.024212.82−2.54////
BLUEGWAS5.51E-050.024211.71−2.40////
c19_19777635D519,777,635G/A19AY5.13E-060.014515.19−6.444.95E-060.014514.73−6.86[60], [69], [84]
BLUEGWAS1.67E-050.014512.89−5.282.38E-050.014513.87−5.69
c21_22911784D1122,911,784C/A19AY4.75E-050.014513.00−3.943.67E-050.014512.75−4.10
BLUEGWAS1.87E-050.014512.77−3.621.68E-050.014514.21−3.72
19WH9.11E-060.014511.69−3.964.47E-060.014513.07−4.19

Env., 17AY: 2017 Anyang, 19AY: 2019 Anyang, 19ALE: 2019 Alaer, 19WH: 2019 Wuhan, and BLUEGWAS: best linear unbiased estimates of GWAS, respectively.

Manhattan map and the quantile–quantile (QQ) chart in the genome-wide association studies (GWAS) of the natural panel. A. Manhattan map of GLM model. B. QQ chart of GLM model. C. Manhattan map of MLM model. D. QQ chart of MLM model. The significant loci that are associated with KOC in at least two environments using models GLM and MLM in GWAS. Env., 17AY: 2017 Anyang, 19AY: 2019 Anyang, 19ALE: 2019 Alaer, 19WH: 2019 Wuhan, and BLUEGWAS: best linear unbiased estimates of GWAS, respectively. Combining the results of linkage analyses of CCRI70 and 0–153 RILs and association study of the natural panel, 11 QTLs mapped by SLAF-seq and 10 QTLs mapped by chip technology in CCRI70 RILs, 10 QTLs mapped in 0–153 RILs, and 8 QTNs mapped by GWAS could be mapped simultaneously on at least two linkage maps (CCRI70 RIL) or at least in two populations (CCRI70 and 0–153 RILs and the natural panel) (Table S8). The chromosome regions of these QTLs or QTNs are important research objects for candidate gene mapping and metabolic pathway analysis in this study.

Candidate genes in stable QTLs and their GO enrichment and KEGG analysis

The SNPs at both ends of the QTL confidence interval were mapped back to the upland cotton genome (TM-1_V2.1) [37] based on their sequence information. Based on the gene annotation information of the physical interval in the cotton genome, a total of 832 and 3177 genes were obtained in the confidence interval stable QTLs identified on the two genetic linkage maps of the Intl Cotton SNP Consortium_80k chip and SLAF-seq technique, respectively. Using the identification results of the two linkage maps in CCRI70 RILs, 3763 genes were identified, and 246 genes were identified simultaneously on both maps (Table 4, Fig. 4A). In 0–153 RILs, 3736 genes were identified in the confidence interval of the stable QTLs based on linkage [128] (Table 4). In the natural panel, 205 genes were associated with KOC (Table 4). Based on the results of linkage analyses, 7126 genes were identified in the two linkage populations (CCRI70 and 0–153 RILs), among which 373 genes were identified simultaneously in both populations (Table 4, Fig. 4B). Finally, 7196 candidate genes were detected through both linkage and association analyses (Table S9, Fig. 4B) on 20 chromosomes of the G. hirsutum genome (Table 4). Among the genes identified in the three populations, 754 candidate genes were detected simultaneously on the two genetic linkage maps of CCRI70 RILs or in each pair of the populations (Table 4, Fig. 4A and 4B).
Table 4

Number of candidate genes identified from KOC QTLs that are identified simultaneously by at least two strategies or two populations.

ChrCCRI70 RILs0–153 RILsNatural panelTotal
SLAFCHIPCommonTotal
A0146//2/48
A0221**33**//21a33
A0317**240**892**7**17b + 240c + 7e892
A04277**/253**/26b504
A05//119**13**13e119
A0641//3/44
A07//35112/363
A0843////43
A10//374**9**9e374
A11415**/265**981b608
D0141////41
D0247////47
D03396**225**56/225a452
D041818////1818
D05//887**21**21e887
D069**/344**/9b344
D096////6
D10//195//195
D11///3/3
D13/334**/126**85d375
Total genes317783237362057547196

/ indicates that no common QTLs were identified.

indicates that the genes are identified in the marker intervals of common QTLs.

indicates that the genes are identified in the common QTLs of CCRI70_Chip and CCRI70_SLAF.

indicates that the genes are identified in the common QTLs of CCRI70_SLAF and 0–153 RILs.

indicates that the genes are identified in the common QTLs of CCRI70_Chip and 0–153 RILs.

indicates that the genes are identified in the common QTLs of CCRI70_Chip and the natural panel.

indicates that the genes are identified in the common QTLs of 0–153 RILs and the natural panel.

Fig. 4

Bioinformatics studies of the candidate genes harbored in stable QTLs. A. The gene numbers harbored in the stable QTLs in the CHIP and SLAF-seq strategies of CCRI70 RILs. B. Comparison of the genes in the stable QTLs of CCRI70 RILs, 0–153 RILs, and the natural panel. C. gene ontology annotation (GO) term enrichment analysis of the candidate genes. D. The first 20 significantly enriched GO terms. E. The primary and secondary level annotaions of the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway of the candidate genes. F. The first 20 significantly enriched KEGG pathways. G. The number of genes related to oil metabolism in GO term enrichment and KEGG pathway annotation after referring to the genes related to oil metabolism in Arabidopsis.

Number of candidate genes identified from KOC QTLs that are identified simultaneously by at least two strategies or two populations. / indicates that no common QTLs were identified. indicates that the genes are identified in the marker intervals of common QTLs. indicates that the genes are identified in the common QTLs of CCRI70_Chip and CCRI70_SLAF. indicates that the genes are identified in the common QTLs of CCRI70_SLAF and 0–153 RILs. indicates that the genes are identified in the common QTLs of CCRI70_Chip and 0–153 RILs. indicates that the genes are identified in the common QTLs of CCRI70_Chip and the natural panel. indicates that the genes are identified in the common QTLs of 0–153 RILs and the natural panel. Bioinformatics studies of the candidate genes harbored in stable QTLs. A. The gene numbers harbored in the stable QTLs in the CHIP and SLAF-seq strategies of CCRI70 RILs. B. Comparison of the genes in the stable QTLs of CCRI70 RILs, 0–153 RILs, and the natural panel. C. gene ontology annotation (GO) term enrichment analysis of the candidate genes. D. The first 20 significantly enriched GO terms. E. The primary and secondary level annotaions of the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway of the candidate genes. F. The first 20 significantly enriched KEGG pathways. G. The number of genes related to oil metabolism in GO term enrichment and KEGG pathway annotation after referring to the genes related to oil metabolism in Arabidopsis. GO analysis of the 754 candidate genes showed that these candidate genes were mainly enriched in the categories of biological process, cellular component, and molecular function (Table S10, Fig. 4C). Biological process mainly included cellular process, metabolic process, and single-organism process. Cellular components mainly included the cell, cell part, and organelle, and molecular functions mainly included binding and catalytic activity. Thirteen genes, GH_A03G0921, GH_A03G0964, GH_A03G0975, GH_A03G0976, GH_A03G0979, GH_A03G1006, GH_A03G1056, GH_A03G1070, GH_A03G1100, GH_A10G1508, GH_A11G2175, GH_A11G2207, and GH_D03G0938, were identified to participate in the FA metabolic process, VLCFA metabolic process, and FA elongase complex (Table S11, Fig. 4D), which were regarded to be involved in important pathways of lipid biosynthesis [25], [42], [56]. KEGG analysis of these candidate genes showed that they mainly play important roles in the pathways of metabolism, genetic information processing, and cellular processes (Table S12). A total of 32 genes were involved in the metabolic processes of pathways related to lipid biosynthesis (Fig. 4E, Tables S12 and S13). Among them, GH_A04G1201 was identified as being involved in the metabolic process of FA degradation (ko00071), biosynthesis of secondary metabolites (ko01110), FA metabolism (ko01212), metabolic pathways (ko01100), and alpha-linolenic acid metabolism (ko00592). GH_A03G0956 was involved in the biosynthesis of secondary metabolites (ko01110), metabolic pathways (ko01100), alpha-linolenic acid metabolism (ko00592), and glycerophospholipid metabolism (ko00564). GH_A11G2246 was involved in the biosynthesis of secondary metabolites (ko01110), metabolic pathways (ko01100), starch and sucrose metabolism (ko00500), and cyanoamino acid metabolism (ko00460). GH_A03G0949 is involved in the biosynthesis of secondary metabolites (ko01110), metabolic pathways (ko01100), and glycerophospholipid metabolism (ko00564). GH_D13G1396 was involved in the biosynthesis of secondary metabolites (ko01110), metabolic pathways (ko01100), and lipoic acid metabolism (ko00785). Three genes, GH_D03G0980, GH_A03G1138, and GH_D13G1354, were involved in the biosynthesis of secondary metabolites (ko01110), metabolic pathways (ko01100), and starch and sucrose metabolism (ko00500). GH_D13G1401 was involved in metabolic pathways (ko01100) and lipoic acid metabolism (ko00785). Both GH_D03G0934 and GH_D03G0968 were involved in metabolic pathways (ko01100) and starch and sucrose metabolism (ko00500) (Fig. 4F). At present, a large number of achievements have been made in research relating to Arabidopsis lipid metabolic pathways [8], [42], [44], [48], [106]. A database (https://aralip.plantbiology.msu.edu/pathways/pathways) [48] has collected comparatively complete gene pools relating to lipid metabolic pathways in Arabidopsis. Comparing the results of GO term enrichment and KEGG pathway annotation revealed that they shared 338 genes. Referring these common genes to the gene pool relating to oil metabolism in Arabidopsis, we found that 28 genes had ascertained functions in synthetic oil metabolism (Fig. 4G, Table S14).

Dynamic expression patterns of candidate genes relating to lipid metabolism in developing ovules

Combining analyses of GO, KEGG, and homologues in pathways related to lipid metabolism in Arabidopsis, 64 candidate genes, 57 coding genes, and 7 TFs were considered to be related to lipid metabolism in cottonseed. Using the FPKM values of gene expression levels of transcriptome data of TM-1 at different developmental stages of 0, 1, 3, 5, 10, and 20 dpa ovules to validate the dynamic expression patterns of these candidate genes, almost all of them had some specific expression at these stages, except five, which did not have any expression (Table S15). Clustering analyses of the gene expression pattern, the expression curves across multiple stages and expression tendency sorted these genes into seven expression modules (Figure S4). The lowest gene number in a module had five genes, and the highest had 12 genes. The accumulation pattern of lipid and FA components during the development of cottonseed showed an upward trend from 2 dpa to maturity, and the oil content reached its peak in mature seeds [36], [76]. Notably, Cluster 5 included six members: GH_A11G2214, GH_D03G0988, GH_D03G0922, GH_A11G2180, GH_D13G1385, and GH_A04G1183. The genes had a steady expression before 10 dpa, and then, their expression significantly increased at 20 dpa. Gene annotation revealed that GH_A11G2214, which encodes Probable beta-D-xylosidase 6 BXL6, GH_D03G0988, which encodes 2-keto-3-deoxy-L-rhamnonate aldolase rhmA, and GH_D03G0922, which encodes arabinogalactan peptide 20 (AGP20), were involved in glycolysis. The genes in Clusters 1 and 2 had high expression at 10 dpa, while the genes in Clusters 3 and 4 had high expression at 5 dpa.

Hub genes in the interaction network of differentially expressed candidate genes

To further capture the interaction between these differentially expressed genes, the interaction network among proteins of 57 coding genes and 7 TFs was explored using the Metascape database [132]. The network construction revealed that 24 hub genes formed interaction nodes that had at least three directions of gene interaction (Fig. 5). The network recognized three core MCODE components, which contained 13 hub or key genes and 5 TFs, including RAV1 (GH_A03G1130), ERF1-3 (GH_D03G1065), FERONIA (FER, a receptor-like kinase) (GH_A03G1011), BHLH93 (GH_A11G2218), and MYB28 (GH_D03G1077), which were involved in the interaction of genes in the core MCODE components (Fig. 5 and Table S16). The genes in MCODE 1, which mainly participated in the carbon metabolic process, included atpB (GH_A03G1009), MENB (GH_D03G1063), PGK (coding phosphoglycerate kinase) (GH_A03G0979), SHM1 (GH_A11G2207), and LtnD (GH_A03G1070). The genes in MCODE 2, which participated in the coenzyme metabolic process, cofactor metabolic process, and small molecule biosynthetic process, included EMB8 (GH_D03G0974), GDSL-like Lipase (GH_D13G1338), MJE7.13 (GH_A10G1509), SPCC663.09c (GH_A03G0993), and D27 (GH_D03G0969). The genes in MCODE 3, which participated in the lipid biosynthetic process and cellular lipid metabolic process, included LACS4 (GH_D03G0938), PAH2 (GH_A03G0964), and LPLAT1 (GH_A03G1006) (Fig. 5, Table S16).
Fig. 5

Analysis of the protein interaction network of candidate genes under the molecular complexity detection (MCODE) algorithm.

Analysis of the protein interaction network of candidate genes under the molecular complexity detection (MCODE) algorithm.

Expression verification of hub genes in different KOC germplasm

To further clarify the expression differences of the genes in relation to lipid biosynthesis between materials of high and low KOC, 17 genes recognized in the network were further analyzed for their expression differences during ovule development between Hai7124 (high KOC of 35.31%) and TM-1 (low KOC of 30.82%) (Figure S5A). The genes included 13 hub or key genes of core MCODE components (Figures S5B, S5C, S5D, S5E, S5F, S5H, S5I, S5J, S5L, S5M, S5N, S5O, S5Q) and 4 TF genes (Figures S5G, S5K, S5P, S5R). According to the FPKM values of the transcriptome data of 0, 1, 3, 5, 10, and 20 dpa ovules of Hai7124 and TM-1, GH_ A02G0124 (FPA), GH_A03G0949 (Atlg22950), GH_D03G0961 (ARF9), and GH_D03G1065 (ERF1-3) (Figure S5Q) had a higher FPKM expression in Hai7124 than in TM-1 almost across all six ovule development stages (Figures S5B, S5C, S5P, and S5R), while GH_A03G1070 (LtnD) and GH_A10G1508 (DIR1) had a higher expression in TM-1 than in Hai7124 (Figures S5I, S5L, Table S17). GH_A03G0964 (PAH2), GH_A03G1075 (NMT1), GH_D03G0956 (P14KG2), and GH_D03G0974 (EMB8) had a higher expression in 5 and 10 dpa ovules in TM-1 than in Hai7124 (Figures S5D, S5J, S5O, and S5Q). GH_A03G0991 (NTF4), GH_A10G1509 (MJE7.13), and GH_D03G0938 (LACS4) had a higher expression in 10 and 20 dpa ovules in Hai7124 than in TM-1, while GH_A03G1006 (LPLAT1) had a higher expression in TM-1 than in Hai7124 in these stages (Figures S5E, S5M, S5N, and S5F). GH_A03G1056 (ABCG11) had a high expression in 3 dpa ovules in TM-1, and GH_A03G1130 (RAV1) had a high expression exclusively before 3 dpa in Hai7124 (Figures S5H and S5K).

Identification of functional gene clusters in carbon metabolism and the Kennedy pathway

Based on the gene expression profiles and differences in developing ovules and the interaction network of the differentially expressed candidate genes, to identify causal genes within the QTL intervals, we examined the genetic variations of the core region of these hub genes. First, we mapped the hub genes back to the QTLs and aligned them to the physical location of the reference genome of G. hirsutum [37]. Five of the hub genes, GH_A03G0964 (PAH2), GH_A03G0979 (PGK), GH_A03G1006 (LPLAT1), GH_A03G1011 (FER), and GH_A03G1070 (LtnD), were identified in the common region of a QTL identified in both CCRI70 and 0–153 RILs (qOCchip-c3-1 and qOC0-153-c3-2), in which the association analysis of the natural panel identified 20 haplotypes (Fig. 6A, 6B, and 6C). Analysis of these genes in the currently available Gossypium genome assembly revealed that the clustering of these genes varied slightly [37], [107] (Fig. 6D). Genetic variation analysis of these genes showed that all five genes had SNP differences between sGK156 and 901–001, the two parental lines of CCRI70 RILs (Fig. 6E). Three of these can also be verified in the natural panel (Fig. 6F). These genes have a dynamic expression profiling across different organs and tissues, especially the developmental stages of the ovule and fiber (Fig. 6G). Functional analysis indicated that GH_A03G0964 (PAH2) and GH_A03G1006 (LPLAT1) coordinated the key steps of the Kennedy pathway and Lands cycle, respectively, and GH_A03G0979 (PGK), GH_A03G1011 (FER), and GH_A03G1070 (LtnD) were responsible for catalyzing the steps in carbon metabolism that provide carbon sources and energy-flow for FA biosynthesis and TAG assembly (Fig. 7).
Fig. 6

Candidate gene identification in a major quantative trait locus (QTL) on chromosome 3 (A03). A. The genetic position and significance of the common QTL identified both in CCRI70 and 0–153 RILs. B. Physical position of the marker interval of common QTL. C. Haplotypes of the overlapping region of common QTL, where the hub genes were identified. D. Arrangement of the hub genes on the physical maps of the different Gossypium genome assemblies. E. SNP differences identified in the hub genes between sGK156 and 901–001, the two parental lines of CCRI70 RILs. F. SNPs identified in the hub genes in the natural panel. G. The gene expression levels (FPKM in log2(FPKM + 1)) in different organs, tissues, and developmental stages of fiber and ovule.

Fig. 7

A working model of the hub genes LPLAT1 and PAH2 in the Kennedy pathway. This model was modified based on Li-Beisson et al. [48], Wang et al. [106], Zhao et al. [131], and Zhu et al. [133].

Candidate gene identification in a major quantative trait locus (QTL) on chromosome 3 (A03). A. The genetic position and significance of the common QTL identified both in CCRI70 and 0–153 RILs. B. Physical position of the marker interval of common QTL. C. Haplotypes of the overlapping region of common QTL, where the hub genes were identified. D. Arrangement of the hub genes on the physical maps of the different Gossypium genome assemblies. E. SNP differences identified in the hub genes between sGK156 and 901–001, the two parental lines of CCRI70 RILs. F. SNPs identified in the hub genes in the natural panel. G. The gene expression levels (FPKM in log2(FPKM + 1)) in different organs, tissues, and developmental stages of fiber and ovule. A working model of the hub genes LPLAT1 and PAH2 in the Kennedy pathway. This model was modified based on Li-Beisson et al. [48], Wang et al. [106], Zhao et al. [131], and Zhu et al. [133].

Discussion

Influencing factors of the KOC in cottonseed

Oil is an important storage substance in plant seeds, and its accumulation is closely correlated with seed development [36], [133]. Previously, research on KOC emphasized the effect of the environment on oil accumulation in mature cottonseed [77]. An early study found that the KOC in cottonseed varied according to the locations where the seed samples were collected [102]. These results suggest that environmental impact accounts for a considerable proportion of the total variations in KOC. More recently, multiple environmental data of upland cotton cultivars have shown that the genotype, environment, and their interaction impose significant impacts on the KOC in cottonseed [31]. All genetic effects, including additive, dominance, dominance × environment, and additive–dominance epistasis × environment effects, were involved in the genetic architecture of the KOC in cottonseed. However, it has been demonstrated that the dominance effect contributes the most to the total KOC heritability [23]. Maternal or cytoplasmic effects may also contribute to KOC variations to some extent [62]. In the current study, phenotypic evaluations were performed in a multi-year multi-location style (Table 1). The phenotypic variations of KOC ranged from 19.45% to 41.17% in CCRI70 RILs and from 24.27% to 38.72% in the natural panel accessions. Both showed a high broad-sense heritability of 86.23–98.81% and 90.13–96.12%, respectively, which was consistent with the results of previous research [70], [76]. The high heritability of oil content ensures the effect of the genotype of each line (RIL) or accession (natural panel) on phenotypic performance and the reliability of QTLs mapped on this basis. Significant differences between years or between locations indicate that the environmental effect and the interaction effect between environment and genotype also contribute greatly to phenotypic performance [31]. The variation between different years was greater than that between different locations. It can be reckoned that the optimum collection of parental germplasm, targeting selection, and multi-environment/location evaluations are the key elements to guide breeders to obtain high oil content progeny materials.

Linkage and association analysis of the KOC loci of cottonseed

Previous studies have also tackled the KOC in cottonseed, mainly using genetic linkage analysis [61], [112] or association studies of natural populations [22], [69]. GWAS mapping utilizes abundant genetic variations in natural populations, which bestows its advantages of high mapping accuracy. However, linkage disequilibrium makes inefficient mapping unable to detect loci of low abundance, yielding false-positive association results [4]. Genetic linkage mapping can identify low-abundance alleles with low marker coverage in the genome. However, the low mapping resolution renders the mapped fragments large and imprecise, which affects the research expectations. The genetic segregation of the target traits involves only differences between parents [67], [114], [128]. At present, strategies combining linkage and association analyses have been successfully applied in wheat [64], [88], maize [65], [67], rice [43], and oil rape (Brassica napus L) [100]. In cotton, based on the GWAS of a panel of 355 natural accessions and linkage analysis of an F2 population derived from two extreme materials in fiber length, a major QTL, which explains>10% of the phenotypic variance, on D03 was simultaneously identified in both strategies. A further study revealed that a candidate gene (Gh_D03G1338) in the QTL region may be responsible for determining cotton fiber elongation by regulating the synthesis of sucrose [124]. Wen et al. [114] combining linkage and association analyses of brown fiber accessions, fine-mapped the brown fiber region, Lc1, and dissected it into two loci, qBFA07-1 and qBF-A07-2. They further identified Gh_A07G2341 and Gh_A07G0100 as the causal genes of these two loci, respectively. The current study also combined both strategies of multiple linkage and association analyses. The linkage maps genotyping CCRI70 RILs with the Intl Cotton SNP Consortium_80k chip and SLAF-seq technique identified 7 and 19 stable QTLs, respectively, of which six were simultaneously identified by both strategies, namely qOCchip-c2-2 and qOCslaf-c2-1, qOCchip-c2-2 and qOCslaf-c2-2, qOCslaf-c17-2 and qOCchip-c17-2, qOCslaf-c17-2 and qOCccri70-c17-3, qOCslaf-c17-2 and qOCccri70-c17-4, and qOCslaf-c17-2 and qOCccri70-c17-5 (Table 1). Notably, qOCchip-c3-1 and qOC0-153-c3-2 shared identical confidence marker intervals in which a locus, c03_18925549, was also significantly associated with KOC in the GWAS of the natural panel. In the confidence marker interval of qOCchip-c18-2, three loci, c18_42111539, c18_42113870, and c18_42507564, were significantly associated with KOC in the GWAS of the natural panel. Linkage analysis of 0–153 RILs identified 21 stable QTLs for KOC in cottonseed (Table 1), among which, qOC0-153-c3-1, qOC0-153-c4-3 (together with qOC0-153-c4-2), qOC0-153-c11-3, and qOC0-153-c25-2 shared identical marker intervals of qOCslaf-c3-1, qOCslaf-c4-2, qOCslaf-c11-1, and qOCslaf-c25-1 in CCRI70 RILs, respectively. In the confidence marker interval of qOC0-153-c5-1, two loci, c05_107481128 and c05_107481161, were significantly associated with KOC in the GWAS of the natural panel. qOC0-153-c10-2 and qOC0-153-c19-2 each harbored a locus, c10_77548836 and c19_19777635, respectively, that was significantly associated with the GWAS of the natural panel. The mutual confirmation of the mapping results between different linkage maps (within CCRI70 RIL), across different populations (between CCRI70 and 0–153 RILs) and between linkage and association analyses, ensures the accuracy of the major QTLs. The major QTLs and QTNs mapped by linkage and association analyses were consistent, which could significantly improve the precision and accuracy of the loci [98]. The mapping results provide a good opportunity to further identify the hub genes in the lipid biosynthesis pathways to dissect the genetic mechanism of KOC accumulation during ovule development through interaction analysis of the candidate genes harbored in these loci. To better understand the QTL/QTN identification results, markers in the QTL interval were aligned with the physical map of TM-1 (v2.1) [37]) to compare with those reported in previous studies based on their respective physical confidence intervals. The stable QTLs that were obtained through linkage analysis of CCRI70 RILs (qOCslaf-c1-1, qOCslaf-c3-1, qOCslaf-c17-1, qOCslaf-c17-2, qOCslaf-c17-4, qOCchip-c17-2, qOCchip-c17-3, qOCchip-c17-4, qOCchip-c17-5, qOCchip-c18-2, and qOCslaf-c22-1) were identified in regions already reported in previous studies (Table 2) [23], [61], [130], [134]. The remaining 17 QTLs were identified de novo in the current study. The stable QTLs obtained through linkage analysis of 0–153 RILs (qOC0-153-c3-2, qOC0-153-c5-1, qOC0-153-c7-2, qOC0-153-c10-2, qOC0-153-c19-1, qOC0-153-c19-2, and qOC0-153-c20-1) were identified in the regions already reported in previous studies (Table 2) [60], [69], [84], [134]. The remaining 16 QTLs were identified de novo in the current study. In the natural panel, 14 stable significant loci were associated with the KOC in cottonseed (Table 1). The locus on chromosome A03 (c03_18925549) overlapped with that reported by Zhu et al. [134]. The loci on D13 (c18_42111539, c18_42113870, and c18_42507564) overlapped with the regions reported in several previous studies [61]. The locus of c19_19777635 also had overlapping regions in previous results [60], [69], [84]. The remaining seven loci, c01_69712855, c05_107481161, c06_17240705, c07_8706328, c11_104291533, c18_46810731, and c21_22911784, were identified de novo in the current study.

Clustering and regulation of hub genes in carbon metabolism and oil synthesis pathways

The main component of vegetable oil is TAG, which is synthesized by assembling three FA molecules onto a glycerol skeleton [57]. The process of oil anabolism is complex and involves several pathways, including FA biosynthesis, FA desaturation, and glycerophospholipid metabolism [115]. The FA biosynthesis pathway is the most clearly understood pathway in oil synthesis. The essential first step in the de novo biosynthesis of FAs is the catalyzation of acetyl-CoA to produce malonyl-CoA by ACCase. Specific overexpression of heteromeric GhACCase in upland cotton can effectively increase KOC in mature cottonseed [16]. The activation of FAs through the formation of ATP-dependent thioester-linked FA CoA conjugates is required for further TAG synthesis [93]. Hexose and triose are the main carbohydrates for the anabolism of plant seed oil. They are transmitted to plastids to synthesize FA via glycolysis [6]. The FA synthesis pathway in plastids determines the final length and unsaturation level of FA chains in seed oils. Finally, the nascent TAG is transported to and stored in the oil bodies, the main storage organelles of lipids in plant cells [48]. The polyploidy of the G. hirsutum genome implies the complexity of gene function in the pathways related to FA biosynthesis and TAG assembly [26], [52], [125]. Studies have revealed that the genes in oil-related pathways in allotetraploid G. hirsutum are tightly co-regulated in response to artificial and natural selection during the domestication process [36]. The interaction network provides a strong indicator of gene function in a biological process [86], [113], it may also indicate the expression characteristics of a gene cluster in the metabolic pathway. Here, we demonstrated that the hub genes in the interaction network correlated to the oil biosynthetic pathway in G. hirsutum co-localize in a cluster, which is harbored in the confidence marker interval of a major QTL identified simultaneously in both CCRI70 and 0–153 RILs (qOCchip-c3-1 and qOC0-153-c3-2) (Fig. 6). In the cluster, LtnD [126], FER [24], [34], [49], PGK [111], [133], and RAV1 [96], [129] are involved in the related pathways of carbon metabolism in plastids, supplying carbon source and energy for FA synthesis, while PAH2 [6], [25] and LPLAT1 [12], [92] coordinate the Kennedy pathway and Lands cycle for TAG assembly [106] (Fig. 7). LtnD catalyzes the oxidation of L-threonate to produce 2-oxo-tetronate and synthesizes DHAP, an intermediate of glycolysis [126]. DHAP is then reduced to G3P via the catalysis of G3P dehydrogenase [105]. PGK catalyzes the conversion of 1,3-bisphosphoglycerate (1,3-BPG) to G3P in glycolysis [5], [56]. Increasing PGK activity has been demonstrated to increase the KOC in sunflower seeds [101]. A significant expression difference in PGK was also observed in developing ovules between different KOC lines of cotton [133]. FER interacts with glyceraldehyde-3-phosphate dehydrogenase (GAPDH) [118], which catalyzes the conversion of GAP to 1,3-BPG, an NADPH-dependent biochemical reaction. FER belongs to a large family of plasma membrane-localized receptor-like kinases (RLKs) [27], [34]. It has a transmembrane domain [50], [51], and its plasma-membrane localization and signaling require direct interaction with the glycosylphosphatidylinositol-anchored protein (GPI-AP) LORELEI-like GPI-AP1 (LLG1) [27], [51]. The RAV TFs have two specific DNA-binding domains: one is the AP2/ERF domain about 60–70 amino acids in size that is contained in the AP2/ERF TF superfamily, and the other is the B3 domain about 110 amino acids in size that is contained in another plant-specific B3 TF superfamily, including VP1/ABI3 [74], [96]. A recent study revealed that the stability of WRINKLED1, an AP2 TF, enhances oil accumulation in Arabidopsis [129], implying a role of the AP2 domain in TAG biosynthesis. The LPLAT gene encodes a lysophospholipid acyltransferase (LPLAT) with broad specificity, which participates in the opposite transport of nascent FAs from the chloroplast to the ER through phosphatidylcholine (PC) acyl editing [44] and mediates the transformation of lysophosphatidylethanolamine (1-acyl-sn-glycerin-3-phosphoethanolamine or LPE) to phosphatidylethanolamine (1,2-diacyl-sn-glycerin-3-phosphoethanolamine or PE). It can also use lysophosphatidylglycerol as a substrate in vitro, together with LPCAT2, to export FAs synthesized de novo from the chloroplast to the PC pool through acyl editing [106]. In upland cotton, a homolog of LPCAT1 (Gh_A03G0735) was identified in a major QTL for KOC in cottonseed [61]. PAH1 and PAH2 act redundantly to inhibit phospholipid biosynthesis in the ER [25]. The phospholipid content in the leaves from pah1 pah2 double mutants is ∼ 1.8-fold more than that from the wild type. The dephosphorylation of phosphatidic acid to diacylglycerol (DAG) is a rate-limiting step in TAG formation [89]. Thus, PAH may control the formation of oil bodies through DAG; a decrease in TAG content is then observed in seeds of the mutants [2]. A homolog of PAH2 (Gh_D05G1934) in upland cotton was identified in the interval of the major QTL qOC-Dt5-1 [69]. Gene clustering in eukaryotes was first identified in the biosynthesis pathway of secondary metabolite cyclic hydroxamic acids in maize [30]. This phenomenon was also found in major plant species, including oat, rice [81], [109], [116], Arabidopsis [28], [29], wheat, and barley [35]. Most of the clustering genes in early studies are correlated with the biosynthesis pathway of secondary metabolites [3], however, an increasing number of studies in recent years have shown that the genes of the primary metabolic pathways of both fungi and plants may also form clusters [75]. Plants usually produce a large number of compounds that underpin the traits of ecological and agronomic importance to help shape ecological interactions and adaptations to changing environments [86]. Each specific compound is often restricted to specific taxon lineages and tissues or synthesized under certain environmental conditions [86]. Preformed or induced compounds that plants produce in response to biotic or abiotic stress are major secondary metabolites synthesized by plants [28]. Clustering the genes relating to the metabolic pathways of these compounds is conducive to the formation of plant resistance to corresponding biotic or abiotic stress. Examples of these compounds include at least cyclic hydroxamic acids in maize (Zea mays) [30], triterpene avenacin in diploid oat (Avena strigosa) [14], [81], and momilactones in rice [109]. The disruption of any member of the clustered genes may lead not only to the loss of the pathway end-products but also to the accumulation of toxic/bioactive intermediates [14], [29], which may further impose strong selection for the maintenance of intact clusters [71], [75]. Previous studies have established a correlation between lipid metabolism and plant responses to abiotic stress [18], [94]. In the gene cluster of the oil symbiotic pathway in the current study, some genes were also involved in plant responses to stress. FER signaling pathways are multifunctional for plant growth and development in response to internal physiological signals [51], [119] or for plant defense establishment in responses to biotic or abiotic environmental signals [17], [58]. PGK1 and PGK3 of upland cotton were targeted by downregulated lncRNAs [91]. RAV1 also responded to environmental signals, including cold, NaCl, and PEG treatments [83]. When cells are exposed to excess nutrients, FAs are esterified to G3P to form TAG [15], [93]. Oil bodies provide an effective mechanism to keep lipids in dynamic turnover equilibrium, sequestering potentially toxic lipids and providing an on-demand source of FAs during periods of increased cellular need in response to ER stress, or lipid sensing in the ER [13], [87], [111]. The involvement of these clustered genes in plant responses to stress indirectly supports their expression interaction in the pathways of lipid anabolism. In summary, we identified 16 major QTLs/QTNs for the KOC in cottonseed that were simultaneously detected across multiple environments and populations based on both linkage analysis and GWAS (Table S8). In a short DNA fragment on chromosome 3 (A03) of the confidence interval of a major QTL, qOCchip-c3-1 in CCRI70 RILs and qOC0-153-c3-2 in 0–153 RILs harbored a cluster-like hub or key genes in lipid biosynthesis-related pathways (Fig. 6), proposing a working model of their roles in lipid biosynthesis (Fig. 7). These clustered genes consisted of two distinct but interacting groups: those that regulate carbon metabolism and energy-flow, including LtnD, FER, PGK, and RAV1, and those that regulate oil synthesis and assembly, including LPLAT1 and PAH2. Upon the plant internal developmental signal of lipid biosynthesis initiation, the former four genes modulated the flow of hexose or triose and energy to FA synthesis, while the latter two genes catalyzed the metabolic interaction of the key steps of the Lands Cycle and Kennedy Pathway in TAG assembly in upland cottonseed. A literature survey indicated that oil metabolism is also involved in the response of plants to abiotic stress, thereby inferring the expression characteristics of these clustered regulatory genes.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
  116 in total

1.  Gene network of oil accumulation reveals expression profiles in developing embryos and fatty acid composition in Upland cotton.

Authors:  Yanpeng Zhao; Yumei Wang; Yi Huang; Yupeng Cui; Jinping Hua
Journal:  J Plant Physiol       Date:  2018-06-04       Impact factor: 3.549

2.  Serine/threonine/tyrosine protein kinase phosphorylates oleosin, a regulator of lipid metabolic functions.

Authors:  Velayoudame Parthibane; Ramachandiran Iyappan; Anitha Vijayakumar; Varadarajan Venkateshwari; Ram Rajasekharan
Journal:  Plant Physiol       Date:  2012-03-20       Impact factor: 8.340

3.  Glycolytic enzymatic activities in developing seeds involved in the differences between standard and low oil content sunflowers (Helianthus annuus L.).

Authors:  M Adrián Troncoso-Ponce; Rafael Garcés; Enrique Martínez-Force
Journal:  Plant Physiol Biochem       Date:  2010-10-01       Impact factor: 4.270

4.  A family of eukaryotic lysophospholipid acyltransferases with broad specificity.

Authors:  Ulf Ståhl; Kjell Stålberg; Sten Stymne; Hans Ronne
Journal:  FEBS Lett       Date:  2007-12-26       Impact factor: 4.124

5.  Genetic dissection of wheat panicle traits using linkage analysis and a genome-wide association study.

Authors:  Kai Liu; Xiaoxiao Sun; Tangyuan Ning; Xixian Duan; Qiaoling Wang; Tongtong Liu; Yuling An; Xin Guan; Jichun Tian; Jiansheng Chen
Journal:  Theor Appl Genet       Date:  2018-02-22       Impact factor: 5.699

6.  Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines.

Authors:  Susanna Atwell; Yu S Huang; Bjarni J Vilhjálmsson; Glenda Willems; Matthew Horton; Yan Li; Dazhe Meng; Alexander Platt; Aaron M Tarone; Tina T Hu; Rong Jiang; N Wayan Muliyati; Xu Zhang; Muhammad Ali Amer; Ivan Baxter; Benjamin Brachi; Joanne Chory; Caroline Dean; Marilyne Debieu; Juliette de Meaux; Joseph R Ecker; Nathalie Faure; Joel M Kniskern; Jonathan D G Jones; Todd Michael; Adnane Nemri; Fabrice Roux; David E Salt; Chunlao Tang; Marco Todesco; M Brian Traw; Detlef Weigel; Paul Marjoram; Justin O Borevitz; Joy Bergelson; Magnus Nordborg
Journal:  Nature       Date:  2010-03-24       Impact factor: 49.962

7.  A novel functional module detection algorithm for protein-protein interaction networks.

Authors:  Woochang Hwang; Young-Rae Cho; Aidong Zhang; Murali Ramanathan
Journal:  Algorithms Mol Biol       Date:  2006-12-05       Impact factor: 1.405

8.  Rapid resistome mapping using nanopore sequencing.

Authors:  Eric van der Helm; Lejla Imamovic; Mostafa M Hashim Ellabaan; Willem van Schaik; Anna Koza; Morten O A Sommer
Journal:  Nucleic Acids Res       Date:  2017-05-05       Impact factor: 16.971

9.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18       Impact factor: 6.937

10.  A comprehensive meta QTL analysis for fiber quality, yield, yield related and morphological traits, drought tolerance, and disease resistance in tetraploid cotton.

Authors:  Joseph I Said; Zhongxu Lin; Xianlong Zhang; Mingzhou Song; Jinfa Zhang
Journal:  BMC Genomics       Date:  2013-11-11       Impact factor: 3.969

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.