Literature DB >> 33369825

Large-fragment insertion activates gene GaFZ (Ga08G0121) and is associated with the fuzz and trichome reduction in cotton (Gossypium arboreum).

Xiaoyang Wang1,2, Yuchen Miao3, Yingfan Cai3, Gaofei Sun4, Yinhua Jia1, Song Song1, Zhaoe Pan1, Yuanming Zhang2, Liyuan Wang1, Guoyong Fu1, Qiong Gao1, Gaoxiang Ji1, Pengpeng Wang1, Baojun Chen1, Zhen Peng1, Xiaomeng Zhang1, Xiao Wang1, Yi Ding1, Daowu Hu1, Xiaoli Geng1, Liru Wang1, Baoyin Pang1, Wenfang Gong1,5, Shoupu He1, Xiongming Du1.   

Abstract

Cotton seeds are typically covered by lint and fuzz fibres. Natural 'fuzzless' mutants are an ideal model system for identifying genes that regulate cell initiation and elongation. Here, using a genome-wide association study (GWAS), we identified a ~ 6.2 kb insertion, larINDELFZ , located at the end of chromosome 8, composed of a ~ 5.0 kb repetitive sequence and a ~ 1.2 kb fragment translocated from chromosome 12 in fuzzless Gossypium arboreum. The presence of larINDELFZ was associated with a fuzzless seed and reduced trichome phenotypes in G. arboreum. This distant insertion was predicted to be an enhancer, located ~ 18 kb upstream of the dominant-repressor GaFZ (Ga08G0121). Ectopic overexpression of GaFZ in Arabidopsis thaliana and G. hirsutum suggested that GaFZ negatively modulates fuzz and trichome development. Co-expression and interaction analyses demonstrated that GaFZ might impact fuzz fibre/trichome development by repressing the expression of genes in the very-long-chain fatty acid elongation pathway. Thus, we identified a novel regulator of fibre/trichome development while providing insights into the importance of noncoding sequences in cotton.
© 2020 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

Entities:  

Keywords:  zzm321990Gossypium arboreumzzm321990; fuzzless mutant; genome-wide association study; structural variation

Mesh:

Year:  2021        PMID: 33369825      PMCID: PMC8196653          DOI: 10.1111/pbi.13532

Source DB:  PubMed          Journal:  Plant Biotechnol J        ISSN: 1467-7644            Impact factor:   9.803


Introduction

Mutations are distributed extensively in the genome, including single‐nucleotide polymorphisms (SNPs), small insertions/deletions (InDels), copy number (CNVs) and structural variations (SVs). The precision and integrity of large‐scale SVs detection are limited by technological restrictions. However, with the development of next‐generation sequencing technologies and algorithms, many InDels/CNVs/SVs have been identified and found to be associated with important traits in animals and plants (Chia et al., 2012; Mills et al., 2011; Yang et al., 2013). Enhancers are cis‐regulatory elements (CREs) derived from SVs or the movement of transposable elements (TEs) and control gene expression in an orchestrated spatiotemporal pattern during development (Paliou and Andrey, 2018). A classic example of an enhancer in maize (Zea mays) is a noncoding segment located ~ 58 kb upstream of the teosinte branched 1 (tb1) gene, which was found to enhance tb1 transcription and affect quantitative phenotypes (Clark et al., 2006). Further study demonstrated that this segment included a Hopscotch retrotransposon inserted in a regulatory region of tb1 and acting as an enhancer to increased tb1 expression; this explains why maize exhibits more distinctive apical dominance than its progenitor teosinte (Studer et al., 2011). Similar studies have been reported since the identification of tb1. The introduction of a 366‐bp CRE in the ZmVPP1 promoter increased drought tolerance in maize (Wang et al., 2016). Additionally, a Harbinger‐like TE located ~ 57 kb upstream of ZmCCT9 negatively regulates this flowering‐associated CCT transcription factor (Huang et al., 2018). In rice, insertion of a 17.1‐kb tandem duplication segment into the OsGL7 locus led to elevated expression of GL7, resulting in improvements in grain yield and quality (Wang et al., 2015). In wheat, a terminal‐repeat retrotransposon in miniature element was found in the TaMs2 locus and demonstrated to cause male sterility (Ni et al., 2017; Xia et al., 2017). Together, these studies indicate that some complex and important agricultural traits are determined by large‐scale genomic variations rather than small‐scale nucleotide mutations. Cotton (Gossypium) fibres are the main source of natural textile materials. These fibres are divided into two different types, viz. long lint and short fuzz, both of which are single‐celled tubular outgrowths arising from seed coat epidermal cells. The cells initiated from the epidermal layer at −3 to 0 days post‐anthesis (DPA) develop into lint fibres, whereas fuzz fibres originate from these cells at + 3 to 5 DPA (Du et al., 2013). Most studies have focused on the development of lint, with little attention given to fuzz fibre. Studies of fuzz initiation are essential for understanding the mechanism of cotton fibre development. Moreover, during the cotton seed production process, fuzz fibres are anatomically differentiated from lint fibre, integrated tightly with the seed epidermis and not easy to gin, thus disturbing the seed coating. Strong acids must be used to remove fuzz, increasing production cost and leading to severe environmental pollution. Hence, a ‘fuzzless’ seed phenotype would be beneficial and more environment‐friendly. Fuzzless mutants are important genetic resources for discovering genes associated with fibre development. Several important G. hirsutum mutants, including Xu142fl, N and n, have been well studied and characterized (Wan et al., 2014, 2016). The dominant fuzzless gene N was mapped to chromosome A12 in G. hirsutum using forward genetics, and designated GhMML3_A12, which encodes a MYBMIXTA‐like transcription factor (Wan et al., 2016). Furthermore, by combining map‐based cloning and qRT‐PCR, the recessive fuzzless gene n was mapped to chromosome D12, and MYB25‐like_Dt was proposed as the candidate gene in G. barbadense (Zhu et al., 2018). A recent survey indicated that non‐synonymous SNPs in the coding region of MML3‐A were common in all six mutants (G. hirsutum) and responsible for both the dominant and recessive fuzzless phenotype (Naoumkina et al., 2020). A previous study showed that GaHD‐1, a recessive locus on Chr06, contains an alternative splicing mutation and may confer the lintless trait in G. arboreum mutant A2_106 (Liu et al., 2020 ). Another study demonstrated that dysfunctional GaHD1 may cause fibreless mutant sma‐4 (ha) (Ding et al., 2020). A genome‐wide association study (GWAS) identified a major quantitative trait locus responsible for the fuzzless phenotype on chromosome 8 (Du et al., 2018), and another report fine‐mapped the quantitative trait locus using a large F2 population, suggesting GaFZ (Ga08G0121) as the candidate gene (Feng et al., 2019). A recent study identified two regions associated with fuzz development, one of which contains GaFZ (Liu et al., 2020). Although GaFZ was considered as the most likely causal gene responsible for the fuzzless phenotype in G. arboreum, however, the function and causal variation of this gene remains unknown. Cotton fibres, like Arabidopsis thaliana leaf trichomes, are elongated single‐cell structures of epidermal origin and may share similar regulatory mechanisms (Serna and Martin, 2006). Trichome development in A. thaliana is regulated by MYB/bHLH/WD40 (MBW) complexes, including R2R3 MYB, GLABROUS1 (GL1), the basic helix–loop–helix protein‐encoding gene GLABRA3 (GL3), ENHANCER OF GLABRA3 (EGL3), the WD40 protein‐encoding gene and TRANSPARENT TESTA GLABRA1(TTG1), which either promote or inhibit trichome initiation and growth (Serna and Martin, 2006). The homeodomain (HDZip IV) transcription factor GLABRA2 (GL2) is the downstream target of the MBW complexes and is involved in regulating trichome morphogenesis (Khosla et al., 2014). In cotton, GaMYB2 has a similar function as AtGL1, which can rescue gl1 glabrous phenotype in trichome formation (Wang et al., 2004). GhMYB109, a homologous gene of AtGL1, is involved in the positive regulation of fibre initiation and elongation (Pu et al., 2008; Suo et al., 2003). GhMYB25‐like, belonging to the MYBMIXTA‐like (MML) transcription factor family, acts upstream of the homeodomain leucine zipper protein GhHD‐1 and positively regulates cotton seed trichome development (Walford et al., 2011, 2012). Two MYB25‐like transcription factors, GhMML4_D12 and GhMML3_A12, were isolated as known regulators of fibre development (Wan et al., 2016; Wu et al., 2018). Moreover, three cotton HDZip IV factors (HOX1/2/3) have been identified, and the interaction of GhHOX3 with GhHD1 affects fibre elongation in cotton (Guan et al., 2008; Shan et al., 2014). Using a map‐based cloning strategy, HD1 genes (GaHD1, GbHD1 and GhHD1), were identified as key regulators of lint fibre, stem hair and leaf trichome initiation in the diploids G. arboreum, allotetraploids G. barbadense and G. hirsutum, respectively (Ding et al., 2020; Ding et al., 2015; Liu et al., 2020; Niu et al., 2018; Tang et al., 2019). However, whether additional regulators of trichome development exist in cotton and their association with the MBW‐GL2 system are unknown. Very‐long‐chain fatty acid (VLCFA) synthesis‐related genes have been suggested as positive regulators of cotton fibre elongation (Lassner et al., 1996; Qin et al., 2007a). Many cotton genes involved in fatty acid chain elongation and nonspecific lipid transfer are significantly upregulated during early fibre development, indicating that this pathway may be essential for fibre‐cell elongation (Ji et al., 2003; Qin et al., 2007a; Qin et al., 2007b; Shi et al., 2006). VLCFA biosynthesis occurring in the endoplasmic reticulum is controlled by the fatty acid elongase complex which comprises four proteins, 3‐ketoacyl‐CoA synthase (KCS), 3‐ketoacyl‐CoA reductase (KCR), 3‐hydroxyacyl‐CoA dehydratase (HACD) and trans‐2‐enoyl‐CoA reductase (ECR) (Fehling and Mukherjee, 1991). KCS is the first rate‐limiting enzyme determining VLCFA production in higher plants (Lassner et al., 1996). In Arabidopsis thaliana lacking KCS, abnormal plant morphology and a glabrous phenotype were observed (Pruitt et al., 2000; Reina‐Pinto et al., 2009; Yephremov et al., 1999), suggesting that the VLCFA biosynthesis pathway is involved in trichome development. Therefore, identifying regulators affecting VLCFA and elongation pathway genes may be important for understanding cotton fibre and trichome development. Following up on previous work (Du et al., 2018; Feng et al., 2019; Liu et al., 2020), we identified a large insertion in G. arboreum associated with the fuzzless/glabrous trait by upregulating GaFZ expression. GaFZ may function independently of the MBW‐GL2 system by directly repressing the fatty acid elongation pathway. Our results provide insight into the mechanisms of fibre/trichome development in plants.

Results

A large noncoding insertion is responsible for the fuzzless and the reduced leaf trichome number in G. arboreum acc. GA0149

Previous studies of G. arboreum suggested that seed fuzz was controlled by a single dominant locus (designated FZ) on chromosome 8, although the causal variation remained unknown (Du et al., 2018; Feng et al., 2019; Liu et al., 2020). We found that seed fuzz correlated significantly with the amount of leaf trichomes (Figure 1a; Figure 2c); therefore, based on published genotypes (Du et al., 2018), we supplemented the GWAS for the leaf trichome trait using non‐SNP‐based markers (InDels of any size, copy number duplications and deletions). Large‐insertion markers (larINDEL) for the seed fuzz and leaf trichome traits co‐localized to a common region (FZ) on chromosome 8, where the signal intensity for larINDEL (‐log P = 33.60) was much stronger than that of the highest SNP (‐log P = 18.95) (Figure 1a‐b). Except for SNP and larINDEL, we found no association for all other variations (Figure S1‐S3). Four linkage disequilibrium (LD) blocks were identified in FZ, whereas SNP, which is not localized to any LD blocks, was located in the second intron of Casparian‐strip membrane protein (Ga08G0117) (Figure 1c‐d). Investigation of the association between genotypes of SNP and larINDEL and seed fuzz revealed that the presence of larINDEL was highly correlated with the fuzzless phenotype (Figure 1e). Together, these results indicate that the causal variation behind the seed fuzz and leaf trichome traits is likely larINDEL, rather than SNP.
Figure 1

Both the seed fuzz and leaf hair traits are associated with large insertions/deletions in chromosome 8 in G. arboreum. (a) Manhattan plots of the GWAS of seed fuzz and leaf hairs using SNP and SV (large insertion) markers. The strongest associations, designated by red arrows and regions, are highlighted by red transparent columns. The thresholds of GWAS are indicated by red dot lines. (b) Leaf hair and seed fuzz phenotypes between fuzzy and fuzzless accessions. Scale bar = 2cm (top) and 1 cm (bottom). (c) Local Manhattan plot of the GWAS on chromosome 8. The positions of SNP and larINDEL are marked. (d) Location of the gene models and linkage disequilibrium (LD) status in the local Manhattan plot region. Four putative LD blocks were identified, and the corresponding genomic regions are represented by grey bars. The levels of pairwise correlation between SNPs are represented by different colours. (e) Association between the seed fuzz traits and genotypes of SNP and larINDEL in the GWAS population determined by resequencing mapping reads.

Figure 2

Genomic structure of larINDEL and its association with seed fuzz and trichome phenotypes (a) Schematic diagram of the larINDEL region with continuous contigs designed by overlap PCR in chromosome 8. Genomic structure of the fuzzy accession (acc. GA0146, left) and position of the insertion fragment in the fuzzless accession (acc. GA0149, right). All PCR products are shown as red lines with primer pair numbers (middle). The gel indicates the product amplified by Primer 6 (6F/6R). (b) Schematic diagram of genomic components for both fuzzy (acc. GA0146) and fuzzless (acc. GA0149) accessions nearby larINDEL. The five types of repeat elements are shown by arrowed boxes in different colours (Figure S5). The repeat element combinations are circled by red solid lines and categorized as A, B and C (red letters). The breakpoints (BPs) duplicated translocation from Chr.12 and duplication from Chr.8 are marked by red vertical dotted lines. Read depth is shown by grey columns; the horizontal black dotted lines indicate the average sequencing depth for these two accessions (~25×); and the region with an abnormally high read depth is circled by a black dotted line. The location of Primer 6F/6R on the two accessions is marked by blue arrows. (c) Relationships among seed fuzz, leaf hair and larINDEL diversity for each individual in the F2 population and GWAS population. The number of leaf hairs was determined by microscopy with a fixed field (n = 9, three fields were investigated for each individual, three individuals were randomly selected). larINDEL (+/+), larINDEL (+/‐) and larINDEL (‐/‐) indicate the insertion was homozygous present, heterozygous present and absent in individuals, respectively. The upper, median and lower quartiles for each group are denoted by three vertical lines. Significances of difference between groups were derived with an unpaired two‐tailed Student’s t‐test.

Both the seed fuzz and leaf hair traits are associated with large insertions/deletions in chromosome 8 in G. arboreum. (a) Manhattan plots of the GWAS of seed fuzz and leaf hairs using SNP and SV (large insertion) markers. The strongest associations, designated by red arrows and regions, are highlighted by red transparent columns. The thresholds of GWAS are indicated by red dot lines. (b) Leaf hair and seed fuzz phenotypes between fuzzy and fuzzless accessions. Scale bar = 2cm (top) and 1 cm (bottom). (c) Local Manhattan plot of the GWAS on chromosome 8. The positions of SNP and larINDEL are marked. (d) Location of the gene models and linkage disequilibrium (LD) status in the local Manhattan plot region. Four putative LD blocks were identified, and the corresponding genomic regions are represented by grey bars. The levels of pairwise correlation between SNPs are represented by different colours. (e) Association between the seed fuzz traits and genotypes of SNP and larINDEL in the GWAS population determined by resequencing mapping reads. Genomic structure of larINDEL and its association with seed fuzz and trichome phenotypes (a) Schematic diagram of the larINDEL region with continuous contigs designed by overlap PCR in chromosome 8. Genomic structure of the fuzzy accession (acc. GA0146, left) and position of the insertion fragment in the fuzzless accession (acc. GA0149, right). All PCR products are shown as red lines with primer pair numbers (middle). The gel indicates the product amplified by Primer 6 (6F/6R). (b) Schematic diagram of genomic components for both fuzzy (acc. GA0146) and fuzzless (acc. GA0149) accessions nearby larINDEL. The five types of repeat elements are shown by arrowed boxes in different colours (Figure S5). The repeat element combinations are circled by red solid lines and categorized as A, B and C (red letters). The breakpoints (BPs) duplicated translocation from Chr.12 and duplication from Chr.8 are marked by red vertical dotted lines. Read depth is shown by grey columns; the horizontal black dotted lines indicate the average sequencing depth for these two accessions (~25×); and the region with an abnormally high read depth is circled by a black dotted line. The location of Primer 6F/6R on the two accessions is marked by blue arrows. (c) Relationships among seed fuzz, leaf hair and larINDEL diversity for each individual in the F2 population and GWAS population. The number of leaf hairs was determined by microscopy with a fixed field (n = 9, three fields were investigated for each individual, three individuals were randomly selected). larINDEL (+/+), larINDEL (+/‐) and larINDEL (‐/‐) indicate the insertion was homozygous present, heterozygous present and absent in individuals, respectively. The upper, median and lower quartiles for each group are denoted by three vertical lines. Significances of difference between groups were derived with an unpaired two‐tailed Student’s t‐test. Bioinformatic tools can estimate only the type and approximate location of SVs. larINDEL was amplified from representative fuzzy and fuzzless accessions (GA0146 and GA0149, respectively) using 19 overlapping primer sets, and the whole sequence across this region (Chr08: 880–903 kb) was assembled (Figure 2a and Table S1). Only the fragment amplified by Primer‐6F6R, with a 6226‐bp larINDEL, showed a difference between accessions. Sequencing and alignment of the PCR products from fuzzy and fuzzless accessions identified Primer‐6F6R and breakpoint‐1 (Figure 2a, Figure S4 and S5). To determine the genomic structure of larINDEL, we first constructed the fuzzless sequence (Figure S6a, right panel) by integrating the insertion (larINDEL) into the fuzzy sequence (Figure S6a, left panel). Collinearity analysis of this region showed that five major repeat elements exist in both sequences, which were categorized into three combinations: A, B and C (Figure S6a). The fuzzless accession contained all three combinations, whereas the fuzzy accession had only B and C. In the fuzzless accession, combination A sequences included three repeat elements because of the duplication of combination C upstream of GaFZ (Ga08G0121). Another group A element (deep blue) was a translocated duplication from chromosome 12 (Figure 2b). Combination B may be a duplication relic of C, which was interrupted by the insertion of two inverted (orange, Figure 2b and Figure S6a) and two very small (dark green, Figure 2b and Figure S6a) repeat elements. The different orientations and coverage depth of Illumina read pairs in combination C, between the fuzzless and the fuzzy accessions, support this duplication event (Figure 2b), for which a total of three possible breakpoints (BPs) were identified. The position of BP‐1 (Chr08: 887 151 bp), located at the left boundary where larINDEL inserted, was further identified by aligning ten sequenced clones (Figure S5). BP‐2 (Chr12: 100 898 047 bp) and BP‐3 (Chr12: 100 899 234 bp) were located on chromosome 12 (Figure 2b, Figure S6b‐e). Therefore, the ~ 6.2‐kb larINDEL insertion identified in the fuzzless accession is composed of a homolog repeat combination (A, ~5.0 kb) and a ~ 1.2‐kb homologous fragment derived from chromosome 12. The sequence similarity compared with its original copy (combination C) emphasized this duplicate insertion as a recent event (Figure 2b, Figure S7). According to sequence identity analysis of repeat elements near locus FZ, multiple duplication events likely occurred in this region and continuously accumulated SNPs, but only the most recent duplication caused the fuzzless phenotype (Figure S7). Characterization of the genomic structure of locus FZ may improve the understanding of both the evolutionary history and biological function of this region.

Linkage/association between larINDEL and traits of interest

Previous work demonstrated that the fuzzless trait is controlled by a single dominant gene in GA0149 (Du et al., 2018; Feng et al., 2019). To assess the linkage/association between larINDEL and traits of interest, an F2 population consisting of 1212 individuals was developed by crossing the fuzzless mutant with fewer leaf trichomes (GA0149, female parent) to wild type (GA0146, male parent). The number of leaf trichomes per individual was quantified from the F2 and GWAS populations. The fuzzless accessions/individuals had significantly fewer leaf trichomes and less seed fuzz in the F2 (Figure 2c). Primers 6F/6R and 6F/6R' were used to distinguish heterozygous (larINDEL (+/−)), homozygous fuzzless (larINDEL (−/−)) and homozygous fuzzy (larINDEL (+/+)) plants in the F2 population (Figure 2c, Figure S8a and Table S2). As expected, a large number of heterozygous insertions (n = 624) was detected in fuzzless individuals, which were consistent with a 1:2:1 segregation ratio (homozygous fuzzy: heterozygous fuzzless: homozygous fuzzless; Chi‐square test, P = 0.192). Primer 6F/6R could easily distinguish between homozygous fuzzy and fuzzless accessions in the GWAS population, as they were predominant (Figure 2c and Figure S8b). Overall, our results indicate that the presence or absence of the 6226‐bp larINDEL insertion on chromosome 8 is strongly associated with leaf trichome and seed fuzz phenotypes in G. arboreum.

larINDEL was the key regulator enhancing GaFZ expression

We investigated the expression patterns of eight genes in FZ during different developmental stages in fibres, roots, hypocotyls and leaves. Consistent with previous reports (Feng et al., 2019), only GaFZ exhibited specific expression patterns. GaFZ was highly expressed during the fuzz critical initiation stages (from + 3 to + 5 DPA) in the fuzzless compared with the fuzzy accession (Figure 3a, Figure S9). To verify the relationship between GaFZ expression and larINDEL, we randomly selected six accessions each of the fuzzy and fuzzless accessions from the GWAS population to analyse the sequence diversity of the genic and promoter regions. Five variations were detected in GaFZ, including InDel‐2 (3 bp) and SNP‐2 (C/G), both of which were located in the genic region of GaFZ (Figure 3b). InDel‐1 (24 bp) and SNP‐1 (A/T) were found in the GaFZ promoter (Figure 3b). Haplotype data from this region showed that larINDEL was present in all fuzzless accessions (Figure 3b). Further, the GaFZ expression was higher during developmental stages of 0 to + 5DPA (Figure 3c) in the fuzzless accessions. Several additional SNPs were detected in the regions flanking GaFZ but did not appear to be relevant to the fuzzless trait (Figure S10).
Figure 3

larINDEL activates GaFZ gene expression. (a) Expression profiles of eight genes in the FZ locus region during critical stages of fuzz development. DPA, day post‐anthesis. (b, c) Schematic diagram of all variation polymorphisms in the FZ region and their association with fuzz phenotypes and downstream gene (GaFZ) expression at three stages in the fuzzy and fuzzless accessions. The genotypes of larINDEL in these accessions are marked by red boxes. The genotypes of InDels were represented by symbols ‘+’ or ‘‐’, and the genotypes of SNPs are shown by bases. (d) GUS promoter activity of GaFZ in fuzzy (GA0146) and fuzzless (GA0149) accessions. Each experiment was performed with three independent biological replicates. Data are shown as the mean ± SD. The P values were determined by Student's t‐test. (e) Schematic representation of the reporter vector used to determine the effect of the insert fragment on gene expression in the cotton leaf protoplast transient expression system. This vector contained the minimal CaMV 35S (35S) promoter (minimal promoter of the cauliflower mosaic virus), Renilla luciferase (REN) controlled by 35S, firefly luciferase (LUC) and nopaline synthase terminator (Ter). The red arrow indicates the multiple cloning site (MCS). (f) Dual‐luciferase transient expression assay in cotton protoplasts. The fragments with various lengths and orientations (left) were cloned into the reporter vector, and the corresponding effects on LUC were tested (right). Each experiment was performed with four independent biological replicates. The relative fluorescence value of the empty vector was set as a control. Data are shown as the mean ± SD. Comparison was performed between the vector with greatest (6F6R‐2) and second‐greatest (LUC‐2) average of LUC expression. The P value was determined by Dunnett's multiple comparison test.

larINDEL activates GaFZ gene expression. (a) Expression profiles of eight genes in the FZ locus region during critical stages of fuzz development. DPA, day post‐anthesis. (b, c) Schematic diagram of all variation polymorphisms in the FZ region and their association with fuzz phenotypes and downstream gene (GaFZ) expression at three stages in the fuzzy and fuzzless accessions. The genotypes of larINDEL in these accessions are marked by red boxes. The genotypes of InDels were represented by symbols ‘+’ or ‘‐’, and the genotypes of SNPs are shown by bases. (d) GUS promoter activity of GaFZ in fuzzy (GA0146) and fuzzless (GA0149) accessions. Each experiment was performed with three independent biological replicates. Data are shown as the mean ± SD. The P values were determined by Student's t‐test. (e) Schematic representation of the reporter vector used to determine the effect of the insert fragment on gene expression in the cotton leaf protoplast transient expression system. This vector contained the minimal CaMV 35S (35S) promoter (minimal promoter of the cauliflower mosaic virus), Renilla luciferase (REN) controlled by 35S, firefly luciferase (LUC) and nopaline synthase terminator (Ter). The red arrow indicates the multiple cloning site (MCS). (f) Dual‐luciferase transient expression assay in cotton protoplasts. The fragments with various lengths and orientations (left) were cloned into the reporter vector, and the corresponding effects on LUC were tested (right). Each experiment was performed with four independent biological replicates. The relative fluorescence value of the empty vector was set as a control. Data are shown as the mean ± SD. Comparison was performed between the vector with greatest (6F6R‐2) and second‐greatest (LUC‐2) average of LUC expression. The P value was determined by Dunnett's multiple comparison test. To further confirm whether the promoter‐localized polymorphism was involved in regulating GaFZ expression, we cloned a 2000‐bp promoter sequence of GaFZ from accessions GA0146 and GA0149. Vectors pGaFZ GA0149::GUS and pGaFZ GA0146::GUS were constructed for transient expression in tobacco (Nicotiana benthamiana) leaves. The expression level showed no significant difference (P = 0.9746) between pGaFZ GA0149::GUS and pGaFZ GA0146::GUS, but both were significantly lower than that of pCaMV35S::GUS (P = 0.0010) (Figure 3d). These findings suggest that larINDEL is the genomic variation responsible for upregulation of GaFZ. To test the effect of larINDEL on GaFZ expression, we cloned various lengths of larINDEL into a reporter vector carrying the minimal promoter of the cauliflower mosaic virus, firefly luciferase (LUC) open reading frame, and nopaline synthase terminator (Ter) (Figure 3e). The effects of these constructs on LUC expression were compared in cotton protoplasts. Only the construct carrying the complete fragment (amplified with primer 6F6R) with a specific orientation significantly activated luciferase (Figure 3f). Therefore, enhancement of GaFZ expression by larINDEL may be activated only when the inserted fragment is linked to the flanking sequence in a specific orientation. To evaluate the impact of larINDEL on gene expression within 1 Mb of the flanking FZ region (Chr. 8: 405 400‐1 405 400 bp), we compared the expression of 117 genes. GaFZ was the only gene significantly up‐regulated during fuzz‐initiation stages (0 to + 5DPA) in the fuzzless accession (Figure S11). Moreover, PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) predictions indicated that larINDEL contained an abundance of enhancer cis‐acting and core promoter elements (Figure S12; Table S3). These results suggest that larINDEL might act as an enhancer, specifically by upregulating GaFZ expression during the fuzz‐initiation stages.

GaFZ suppresses trichome and fuzz fibre development

To confirm the subcellular location of GaFZ, constructs 35S::GaFZ‐GFP and 35S::GaFZ‐RFP were prepared and transformed into A. thaliana and tobacco, respectively. The fluorescent signal was observed predominantly in the membrane and nucleus, with some fluorescence detected in the cytoplasm (Figure S13a–c), indicating that GaFZ acts as a receptor at the membrane and regulatory factor in the nucleus. Phylogenetic analysis of the conserved GaFZ domain from various species indicated that homologous proteins can be classified into three groups (Figure S14). GaFZ clustered with group 1, with most of Gossypium homologs, and was most closely related to GhA08G010900 (Figure S14). The A. thaliana homologs of GaFZ (AtGIR1 and AtGIR2), AtGIR1 and AtGIR2 are known to act as negatively regulate of root hair development (Wu and Citovsky, 2017a, 2017b), clustered in group 3. Together, these data suggest that the functions of these homologs differentiated between Gossypium and Arabidopsis. To investigate the effects of GaFZ on trichome and fibre development, we transformed GaFZ into A. thaliana and Upland cotton (G. hirsutum) (Figure 4 and Figure S15‐S18). Compared to wild‐type A. thaliana, 35S::GaFZ transgenic lines exhibited abnormal leaves and significantly reduced (~fivefold) leaf trichome numbers (Figure S16a–e). To confirm whether these phenotypes resulted from overexpression of GaFZ, we examined the expression levels of GaFZ in the T3 generation and found that GaFZ was dramatically up‐regulated (>10 000‐fold) in the overexpression lines compared with that in the wild type (Figure S16f). For Upland cotton overexpressing GaFZ, compared with WT (Figure 4a–c), transgenic lines exhibited a glabrous phenotype in young cauline leaves, stems and veins, which were clearly distinct from the wild type (Figure 4d–f, h, Figure S17). Three positive lines (OE‐L11, OE‐L14 and OE‐L25) showed high GaFZ expression, whereas the negative control lines (OE‐L18) showed very low expression levels similar to wild type, as determined by qRT‐PCR analysis (Figure 4i–j). However, no leaf abnormalities were observed (Figure S18). Lint fibre length in OE‐L14 plants appeared to be slightly shorter than that of the wild type and controls (OE‐L18 and OE‐L29), but was not evident in the other three lines (OE‐L11, OE‐L14 and OE‐L25) (Figure S19). The amount of fuzz fibre was reduced in the transgenic cotton lines (Figure 4g, Figure S19), but root hairs showed no significant difference between the wild‐type and overexpression lines in both A. thaliana and cotton (Figure S20). Our results confirm that the function of GaFZ may differ from that of AtGIR, as GaFZ suppressed trichome and fuzz fibre development but not that of root hairs, in A. thaliana and cotton.
Figure 4

GaFZ repressed cotton trichome and fuzz development. (a‐f) Trichome numbers and fibre phenotype in WT (top) and transgenic lines (below). Young leaves (a, d), stems (b, e), leaf veins (c, f) and the fuzz fibre phenotype (g). Bars = 1 cm in a, d and g and 0.5 cm in b, c, e and f. (h) Statistical analysis of trichome numbers of transgenic and control cotton plants (n = 3). Data are shown as the mean ± SD, and the P values were determined by Student’s t‐test. (i) Electrophoretogram of PCR verification. M, DNA marker, DL2000; lanes 1‐8, PCR products from the genomic DNA of transgenic plant leaves; lane 7, PCR product by deionized water; lane 8, PCR product by the recombinant plasmid of GaFZ, the primers were 35S‐F/OEGaFZ‐Sac1. (j) Quantitative real‐time PCR analysis of different GaFZ‐overexpressing transgenic lines (n = 3). WT, wild type. Data are shown as the mean ± SD.

GaFZ repressed cotton trichome and fuzz development. (a‐f) Trichome numbers and fibre phenotype in WT (top) and transgenic lines (below). Young leaves (a, d), stems (b, e), leaf veins (c, f) and the fuzz fibre phenotype (g). Bars = 1 cm in a, d and g and 0.5 cm in b, c, e and f. (h) Statistical analysis of trichome numbers of transgenic and control cotton plants (n = 3). Data are shown as the mean ± SD, and the P values were determined by Student’s t‐test. (i) Electrophoretogram of PCR verification. M, DNA marker, DL2000; lanes 1‐8, PCR products from the genomic DNA of transgenic plant leaves; lane 7, PCR product by deionized water; lane 8, PCR product by the recombinant plasmid of GaFZ, the primers were 35S‐F/OEGaFZ‐Sac1. (j) Quantitative real‐time PCR analysis of different GaFZ‐overexpressing transgenic lines (n = 3). WT, wild type. Data are shown as the mean ± SD.

GaFZ might repress a key step in VLCFA elongation

To investigate whether GaFZ is involved in the MBW‐GL2 system or other reported genes regulating fuzz/trichome development, we analysed global gene co‐expression profiles between fuzzy and fuzzless cotton accessions, as well as in A. thaliana overexpressing GaFZ. A total of 1343 and 367 differentially expressed genes (DEGs) were identified in A. thaliana lines and G. arboreum at fuzz critical development stages (+3 to + 5 DPA), respectively (Figure S21). In both species, these genes were mainly involved in secondary metabolisms, such as fatty acid elongation (ko00062) and cutin, suberin and wax biosynthesis (ko00073) (Figure 5, Figure S22 and Table S4). The expression of cotton homologs GaMYB109 and GaWER (homolog of AtGL1), GaHOX1 (homolog of AtGL2)/GaHOX2/GaHOX3, GaDEL65 (AtGL3) and GaWD40 (AtTTG1) in the MBW system was not significantly different between the two cotton accessions, and only a slight decrease in AtGL2 transcript levels was observed in GaFZ‐overexpressing A. thaliana (Table S5–S6). Yeast two‐hybrid assay results also indicated that no direct interaction occurred between GaFZ and the MBW‐GL2 system (Figure S23a). At the transcriptional level, a transient dual‐luciferase and GUS activity assay in tobacco demonstrated that the LUC/REN level and GUS activity of pGaHOX1 + GaFZ was unchanged compared with that of pGaHOX1 (Figure S23b–c). For the expression level of other reported trichome/fuzz‐regulating genes, different expression was found in GaHD1, but no difference was detected in GaMML3, between the fuzzy and fuzzless accession at the critical stage of fuzz development (Figure S24). Additionally, most DEGs were involved in the fatty acid elongation pathway in fuzzless cotton and A. thaliana overexpressing GaFZ (Figure 5). Several key DEGs in G. arboreum, including β‐ketoacyl‐CoA synthase 19 (GaKCS19), β‐ketoacyl‐CoA synthase 12 (GaKCS12) and β‐ketoacyl‐CoA synthase 2 (GaKCS2), showed significantly reduced expression patterns at the fuzz fibre formation stage (+3 to + 5DPA) in fuzzless cotton (Figure 5; Table S7). GaFZ‐overexpressing A. thaliana plants also exhibited decreased trichome levels and reduced expression that was five‐ to sixfold lower than in non‐transformed plants which involved the β‐ketoacyl‐CoA synthase (AtKCS) genes—AtKCS1, AtKCS3, AtKCS6, AtKCS10 and AtKCS12 (Figure S25a; Table S8). Genes encoding a subset of wax biosynthetic enzymes, such as cytochrome P450 (AtCYP86A2), ECERIFERUM1 (CER1) and midchain alkane hydroxylase (AtMAH1), were down‐regulated in OE plants (Figure S25a; Table S8), as were genes encoding putative wax ABC transporters (GDSL lipase and AtCER5) (Figure S25a; Table S9). A toluidine blue (TB) test confirmed that wax biosynthesis and transport were inhibited in OE plants (Figure S25b). We predicted that GaFZ might have down‐regulated the first step of fatty acid elongation (KCS genes) and further repressed downstream pathways in fatty acid elongation to impact fuzz or trichome development.
Figure 5

Proposed model showing the mechanism of GaFZ in regulating cotton fuzz/leaf hair development. Heat map indicates gene transcript profiles with FPKM value in different tissues and developmental stages between fuzzy (GA0146) and fuzzless accessions (GA0149), each horizonal block represents one gene. The simplified regulation model of fatty acid elongation and wax, cutin and suberin biosynthesis was modified from Li‐Beisson et al. (2013). Simply, GaFZ was activated by an inserted enhancer located at ~ 18 kb upstream, indicating that it inhibits the expression of genes involved with the fatty acid elongation pathway, further repressed the development of cotton fuzz and trichome through wax, cutin and subering biosynthesis pathway in G. arboreum fuzzless. KCR, ketoacyl‐CoA reductase; HCD, β‐hydroxyacyl‐CoA dehydratase; ECR, enoyl‐CoA reductase.

Proposed model showing the mechanism of GaFZ in regulating cotton fuzz/leaf hair development. Heat map indicates gene transcript profiles with FPKM value in different tissues and developmental stages between fuzzy (GA0146) and fuzzless accessions (GA0149), each horizonal block represents one gene. The simplified regulation model of fatty acid elongation and wax, cutin and suberin biosynthesis was modified from Li‐Beisson et al. (2013). Simply, GaFZ was activated by an inserted enhancer located at ~ 18 kb upstream, indicating that it inhibits the expression of genes involved with the fatty acid elongation pathway, further repressed the development of cotton fuzz and trichome through wax, cutin and subering biosynthesis pathway in G. arboreum fuzzless. KCR, ketoacyl‐CoA reductase; HCD, β‐hydroxyacyl‐CoA dehydratase; ECR, enoyl‐CoA reductase.

Discussion

Insertion larINDEL is a major regulator responsible for the fuzzless seed phenotype in G. arboreum

Lack of fuzz (fuzzless) is a common mutant seed phenotype in cotton. In commercial cotton seed production, natural naked seeds do not require treatment with strong acids during the seed‐coating process, thus reducing environmental pollution. However, commercial fuzzless varieties are rare, as the fuzzless phenotype is associated with poor yield (lint fibre). In G. arboreum, nearly a quarter (57 of 215 accessions) of investigated germplasms were found to be fuzzless (Du et al., 2018), though little is known about the fuzzless regulatory mechanism in these germplasms. As a diploid, G. arboreum is an ideal model for studying the genetic basis of fuzzless or hairless mutations in cotton. Recently, using varied technologies, the major locus for fuzzless in G. arboreum was confirmed on chromosome 8 (Du et al., 2018; Feng et al., 2019; Liu et al., 2020). In contrast to previous studies (Feng et al., 2019; Liu et al., 2020), we found that GaFZ function may differ from that of its A. thaliana homologs GIR1/GIR2. In A. thaliana, GIR1/GIR2 represses root hair development by regulating GL2 (Wu and Citovsky, 2017a, 2017b). However, there was no difference in root hair density between the wild‐type and overexpressed cotton (Figure S20). Moreover, Feng et al. (2019) and Liu et al. (2020) suggested that SNPs or small InDels are responsible for regulating GaFZ; however, we confirmed that variations such as those in GaFZ were not correlated with its expression (Figure 3b‐c, Figure S10). We performed follow‐up work analysis to determine the genetic basis of the fuzzless/glabrous trait in G. arboreum. However, in contrast to the results obtained by traditional GWAS, the strongest SNP signal (SNP) identified was not responsible for the fuzzless phenotype (Figure 1e) or expression of GaFZ (Figure 3a). The associated SNP cluster identified may be regarded as tags, and thus other variations nearby may be important. Therefore, we compared raw read mapping data at this region between fuzzy and fuzzless accessions and successfully identified a large‐insertion (larINDEL) upstream of GaFZ in the fuzzless mutant. This SV was also confirmed in non‐SNP GWAS. In humans, large‐scale genomic changes have been considered as major causes of disease (Stankiewicz and Lupski, 2010). Similarly, in‐depth studies of plant genomes revealed that many important agricultural traits are also controlled by genomic SVs. CNVs often increase the transcript levels of associated genes, such as the genetic basis of hybrid incompatibility between japonica and indica and grain size diversity in rice (Shen et al., 2018; Wang et al., 2015). SVs can act as enhancers and disrupters by inserting or deleting sequences in the host genome, which is often induced by TE activity (Duan et al., 2017; Studer et al., 2011). The G. arboreum genome harbours > 85% repeat sequences (Du et al., 2018), providing an abundance of resources for yielding regulators under natural selection or domestication (Wang et al., 2017). Here, we found evidence that distant structural variations in certain genes can effectively impact important phenotypes in cotton, and highlighted the importance of noncoding regulators in crops. Enhancers are CREs, which control the spatiotemporal expression of some functional genes (Paliou and Andrey, 2018), and are typically derived from the insertion of TEs or genomic SVs; including deletions, inversions, duplications and translocations. We identified a large insertion with a translocation, larINDEL, which acts as a distant directional enhancer (~18 kb) for GaFZ, ultimately regulating the fuzzless trait in cotton.

Fuzz development is controlled by multiple loci in cotton

Complex metabolic networks coordinately regulate cotton fibre and trichome initiation. In our study, the fuzzless phenotype in most accessions was associated with larINDEL (Figure 1). Rong et al. (2005) and Liu et al. (2020) also reported several G. arboreum fuzzless accessions with a non‐dominant inheritance pattern, which may explain these fuzzless accessions without larINDEL. Perhaps because of the low allelic frequency in the population (minor allele frequency < 0.05), other regulators responsible for fuzzless phenotype were not identified in the previous GWAS project (Du et al., 2018). To date, loci influencing seed fuzz in the G. hirsutum germplasm population have not been identified at the genome‐wide level. However, both dominant and recessive inheritance patterns have been reported to regulate seed fuzz (Rong et al., 2005). Additionally, Zhu et al. (2018) identified five loci associated with the recessive fuzzless trait, implying the presence of a complicated regulating network in G. barbadense. Therefore, larger population size should be used to comprehensively screen loci resulting in fuzz reduction in G. arboreum. More surveys, such as investigation of hybrid phenotype between extensive fuzzless mutants and fuzzy accession, should be conducted to comprehensively explore other genetic factors responsible for the fuzzless trait in G. arboreum. The function of the homologous gene (GaFZ) or sequence variations of the locus FZ region is unknown in the other two cultivated tetraploid cottons (G. hirsutum and G. barbadense). Deficient expression of GhMML3 (GhMYB25‐like) was suggested to be associated with the dominant fuzzless phenotype in G. hirsutum mutant N1 (Wan et al., 2016). However, we found that the expression level of GaMML3 did not differ between the fuzzy and dominant fuzzless accession in G. arboreum (Figure S24). This inconsistency suggests that the dominant fuzzless phenotype was controlled by different loci in G. hirsutum and G arboreum. Recently, a study of a new naked‐tufted mutant (n) was the first to mention a relationship between the fuzzless phenotype and GaFZ homolog in G. hirsutum (Naoumkina et al., 2020). We demonstrated that overexpression of GaFZ in G. hirsutum led to a significant reduction in the fuzz and trichome (Figure 4), indicating that GaFZ (or its homologs in other cotton species) might regulate similar downstream pathway influencing fuzz/trichome development in both G. arboreum and G. hirsutum. However, larINDEL is most likely a mutation that specifically exists in the G. arboreum genome, which may have resulted the activity of noncoding sequences. Studies are needed detect whether the large‐scale sequence variations also present in the homologous region of FZ locus in both G. hirsutum and G. barbadense genomes, as well as their impact on the expression of GaFZ homologs and fuzz or trichome phenotypes in the future study. Further analysis of the pattern of sequence variations of this active noncoding region among cotton species is an ideal model for understanding the genome evolution of cotton.

GaFZ may repress key VLCFA genes (KCSs) and further regulate fuzz/trichome

According to Qin et al., 2007, long‐fibre cotton contains almost twice the amount of KCS transcripts than short‐fibre cotton at the fast‐growing + 10 DPA stage. Our findings indicate that GaFZ may be associated with fatty acid elongation and cutin, suberin and wax biosynthesis (Figure 4). Transcription of all 11 KCS genes was reduced in GaFZ‐OE A. thaliana, accompanied by a distinct decrease in trichome numbers (Table S9). Particularly, KCS2 was 15‐fold lower in GaFZ‐OE compared with that in wild‐type A. thaliana. Genes encoding two major components of the fatty acid elongation complex, KCR (YBR159) and 3‐hydroxyacyl‐CoA dehydratase (PAS2) (Qin et al., 2007b; Roudier et al., 2010), were also transcriptionally down‐regulated in GaFZ‐OE A. thaliana (Table S9). These findings suggest that GaFZ might regulate KCSs. Plant waxes are typically composed of primary alcohols, wax esters, aldehydes and alkanes, derived from VLCFAs, with chain lengths ranging from C20:0 to C34:0 (Yeats and Rose, 2013). Epidermal cells and cotton fibres require wax and cutin for elongation (Degani et al., 2002; Suh et al., 2005). Two genes, CER1 and CER3, are involved in alkane biosynthesis and highly expressed in trichomes (Bernard et al., 2012; Chen et al., 2003). The levels of enzymes CER1 and MAH1, which are involved in the reactions associated with cutin, suberin and wax biosynthesis in the decarbonylation pathway, and WSD1, which is involved in the acyl reduction pathway (Greer et al., 2007; Li et al., 2008; Rowland et al., 2007), were reduced in GaFZ‐OE plants. Additionally, CER5, encoding a plasma membrane‐localized ABC transporter that exports cutin monomers and waxes (Luo et al., 2007; Panikashvili et al., 2007; Ukitsu et al., 2007) were regulated similarly by GaFZ. GDSL lipases were recently suggested to be plant cutinases (Takahashi et al., 2010), and the levels of GDSL lipase genes were decreased by 23‐fold in GaFZ‐OE plants. A TB test revealed dark blue staining in GaFZ‐OE plants, which is consistent with observations of cuticle‐defective 35S:MYB106‐SRDX plants (Oshima et al., 2013). Therefore, GaFZ may also act as a significant negative regulator of the biosynthesis and export of cuticular cutin, suberin and wax, which form the outermost layer of cotton fibre. Based on the genomic variations and gene functional analysis, we proposed a model for fuzz development in G. arboreum (Figure 5). Our results indicated that GaFZ is a dominant inhibitory regulator of fuzz initiation, activated by a ~ 18‐kb large‐insertion (larINDEL) upstream of the gene, and acts predominantly by repressing genes involved in fatty acid biosynthesis (Figure 5). Identification of the regulatory mechanism behind GaFZ provides insight into towards the noncoding sequence function, as well as provides opportunities for the development of environmentally‐friendly (by avoiding usage of strong acids to remove fuzz during seed production process) cotton varieties by combining favourable alleles related to fibre yield and fibre quality.

Experimental procedures

Plant materials, growth conditions and phenotyping

A GWAS population containing 215 accessions was used (Du et al., 2018). The two representatives, G. arboreum cultivars, fuzzy and fuzzless‐type (accessions GA0146 and GA0149, respectively), were described previously (Wang et al., 2008). The F2 population (1218 individuals) was developed by crossing GA0146 and GA0149. For GWAS and segregating populations, plants were grown in the experimental field in Anyang, China. The seed fuzz trait was easily identified after ginning. The fourth youngest leaf (the fourth leaf closest to the shoot apical meristem) was collected from field or greenhouse plants, and leaf trichomes on the main veins were counted using an inverted microscope. Three fields of view were counted on each of three leaves for each sample. Transgenic cotton lines were grown in a controlled environment (28°C day/25°C night, 75% relative humidity, and 16‐h photoperiod with a light intensity of ~ 100 molm2/s). Arabidopsis thaliana and N. benthamiana seeds were sterilized with 3% NaClO for 10 min and 70% ethanol for 1 min, and then washed with sterile ddH2O five times. Seeds were grown on MS media with 0.8% agar and 3% sucrose, vernalized at 4°C for 2 days and transferred to a growth chamber (21°C day/18°C night, 60–80% relative humidity and 16‐h photoperiod with a light intensity of ~ 100 molm2/s) (Liu et al., 2017; Murashige and Skoog, 1962). Seven‐day‐old wild‐type and transgenic A. thaliana seedlings grown on vertical on solid MS media were used for root hair density determination (Wu and Citovsky, 2017a, 2017b).

GWAS

SNP‐based GWAS was performed as described previously (Du et al., 2018). Non‐SNP markers were first transformed into a nucleotide format suitable for EMMAX (ver. emmax‐beta‐07Mar2010) (Kang et al., 2010). SNPs and InDels were called using GATK (ver. 4.0.12.0) (Du et al., 2018). SVs and CNVs from the whole population (215 accessions of G. arboreum and 1218 F2 individuals) were called using BreakDancer (v1.1) and CNVnator (v0.3), respectively (Abyzov et al., 2011; Chen et al., 2009). The existence of an InDel marker in an accession was marked as ‘T’, and ‘C’ if not present. For SV markers, chromosomes were divided into 100‐bp fragments as windows, and each window was regarded as a single marker. Windows containing SVs were marked as ‘T’, and those without were marked as ‘C’. The transformation of CNVs markers was similar to SVs. Genome‐wide significance thresholds of all tested traits were evaluated with the formula P = 0.05/n (where n is the number of variations).

Structural variation validation and visualization

For PCR validation, primers flanking the breakpoint of the interposition were designed using Primer 5.0 (Table S1). SV6 was amplified from GA0146 and GA0149 genomic DNA using Advantage 2 PCR Enzyme (Clontech, Mountain View, CA, USA) with an extension time of 4.5 min. PCR products were cloned into T‐vectors and sequenced. The GenBank number of GA0149‐SV6 sequence is MT584306. We distinguish homozygous fuzzy and fuzzless accessions using the primer sets of 6F/6R and 6F/6R’ and amplified the inserted sequence in GA0146, GA0149, hybrid F1 and F2 populations using Phanta Max Super‐Fidelity DNA Polymerase (Vazyme, Nanjing, China) with an extension time of 1.5 min. The variations in the FZ region (Figure S3, Figure S6b‐e, Figure S10) were also exhibited by Integrative Genomics Viewer (IGV, ver. 2.3.97) (Robinson et al., 2011). The introduction of viewing alignment data in Integrative Genomics Viewer is found at http://software.broadinstitute.org/software/igv/AlignmentData.

RNA sequencing and expression analyses

For RNA extraction, the root and hypocotyls of GA0149 and GA0146 were collected from two‐day‐old seedlings, and fresh leaves were collected from three‐week‐old greenhouse‐grown seedlings. Fibre and ovules (at −2 day post‐anthesis −2, 0, +3, +5, +8, +13, +18, +23DPA), flowers, bracts and stamens were harvested from the field. Samples were immediately frozen in liquid nitrogen and stored at −80°C. Total RNA was extracted using the RNAprep Pure Plant Kit (polysaccharides and polyphenolic‐rich) (TIANGEN, Beijing, China) followed by DNase I (TIANGEN) treatment, according to the manufacturer’s instructions. Simultaneously, A. thaliana RNA of wild type and 35S::GaFZ at the vegetative stage (samples were named as WT‐1 and OE‐1) and reproductive growth stage (samples were named as WT‐2 and OE‐2) was extracted (three replications each) using TRIzol (Invitrogen, Carlsbad, CA, USA), respectively. A total of 15 μg RNA was used to construct Strand‐specific Illumina RNA‐Seq libraries (Zhong et al., 2011), and RNA sequencing was performed on an Illumina HiSeq 2000 platform (Biomarker technologies, Beijing). After removing the adaptor sequences and low‐quality reads, the remaining reads were mapped to the G. arboreum reference genome (Du et al., 2018) and A. thaliana (https://www.arabidopsis.org/) reference genomes using TopHat (ver. 2.0.13) (Kim et al., 2013). Only reads with a perfect match or one mismatch were further analysed and annotated. The raw transcriptome sequencing data are available in the NCBI Sequence Read Archive (SRA) under the BioProject accession ID PRJNA622455. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway functional enrichment and Gene Ontology were performed using the KEGG Ortholog database (https://www.kegg.jp) and Gene Ontology database (http://www.geneontology.org), respectively. Nr (non‐redundant protein sequence database, https://blast.ncbi.nlm.nih.gov/Blast.cgi) and Swiss‐Prot (https://www.uniprot.org/blast/) were used for gene annotation. The Clusters of Orthologous Groups database was used for functional classification (Ashburner et al., 2000; Kanehisa et al., 2004). Gene expression was calculated by FPKM using Cufflinks (Trapnell et al., 2010). Differentially expressed genes were identified as log2(fold change) ≥1.5, with a false discovery rate of ≤ 0.05. The GaFZ expression level was validated by quantitative real‐time PCR and RT‐PCR. A total of 1 μg RNA was synthesized using a PrimeScript RT Reagent Kit with gDNA Eraser (Perfect Real Time) (Takara, Shiga, Japan) according to the manufacturer’s instructions. Primers for real‐time and RT‐PCR were designed using NCBI Primer‐BLAST (https://www.ncbi.nlm.nih.gov/tools/primer‐blast/index.cgi?LINK_LOC=BlastHome) (Table S1). Real‐time PCR was performed using the 7500 Fast Real‐Time PCR system (Applied Biosystems, Foster City, CA, USA) and SYBR Premix Ex Taq II (Tli RNaseH Plus) (Takara) in 20‐μL reactions following the manufacturer’s instructions. The PCR efficiency of all primers was firstly evaluated by LinRegPCR (Ruijter et al., 2009), which revealed an efficiency of 90.5%‐110.2%. Histone 3 (AF024716) from cotton was selected as an internal control. The relative gene expression level was calculated by the 2−ΔΔCt method (Livak and Schmittgen, 2001). Three technical replicates and independent experiments were tested. Semi‐quantitative PCR was performed to further verify the expression levels of GaFZ. The reverse‐transcribed cDNA template was serially diluted. Histone3 (AF024716) was amplified using PCR Mix (CWBIO, Beijing, China) to adjust the initial template concentration following gel band brightness. The appropriate concentration of the template was used to amplify GaFZ from various samples.

Subcellular localization of GaFZ

The full‐length CDS of GaFZ (without stop codons) was amplified from GA0149 with specific primers (Table S1) and inserted into the transient expression vectors pBinGFP438 and pBinRFP438, generating constructs 35S::GaFZ‐GFP and 35S::GaFZ‐RFP, respectively (Liu et al., 2014). Recombined plasmids 35S::GaFZ‐GFP and 35S::GaFZ‐RFP and positive control plasmids 35S::GFP and 35S::RFP were transferred into the Agrobacterium tumefaciens strain LBA4404 by liquid nitrogen freezing and thawing. Vector 35S::GaFZ‐GFP was transferred into A. tumefaciens strain GV3101. Agrobacterium tumefaciens cells were cultured overnight at 28°C with shaking at 200 rpm, collected by centrifugation and resuspended in infiltration media (10 mM MgCl2, 10 mM MESKOH at pH 5.6 and 200 μM acetosyringone) to an OD600 of 0.8‐1.0 and incubated at room temperature for 2 h. Nicotiana benthamiana leaves were infiltrated with A. tumefaciens LBA4404 containing the expression and control vectors. Arabidopsis thaliana was transformed by floral dip (Clough and Bent, 1998) with A. tumefaciens GV3101 carrying 35S::GaFZ‐GFP. The fluorescent protein signal was observed using a Zeiss LSM 710 confocal microscope (Zeiss, Oberkochen, Germany) with excitation and emission wavelengths of 488 and 495‐530 nm, respectively.

CDS and promoter sequence analyses of GaFZ alleles

Following the CTAB method (Song et al., 1998 ), genomic DNA was extracted from fresh cotton leaves (acc. GA0146 and acc. GA0149), from which the full‐length of GaFZ and the upstream 2000 bp promoters of GaFZ promoters were cloned. All PCRs were run with KOD FX enzyme (Toyobo, Osaka, Japan), following standard PCR protocols. The products were cloned into the T‐19 simple vector (Takara Kit, Code No. 3271) and sequenced (General Biosystem, Durham, NC, USA). The diversity of nucleotide sequences was analysed using DNAMAN (ver. 6.0).

GaFZ promoter activity

GaFZ promoters, 2000 bp upstream of the transcription start site, were cloned from GA0146 and GA0149 genomes with primers pGaFZ −2000‐Sca I and pGaFZ −2000‐Xba I (Table S1). pGaFZ −2000‐GA0146 and pGaFZ −2000‐GA0149 were fused into the binary vector pBI121 with GUS, respectively. The recombinant plasmids were induced into A. tumefaciens strain LBA4404 and cultured in LB containing 50 μg/mL kanamycin, 30 μg/mL rifampicin and 50 μg/mL streptomycin sulphate at 28°C with shaking at 200 rpm for 12‐16 h. The Agrobacteria cells were centrifuged and resuspended in infiltration buffer (10 mM MgCl2, 10 mM MES and 200 μM acetosyringone at pH 5.7) to an OD600 of 0.8, and incubated at room temperature for 2 h. The abaxial sides of N. benthamiana leaves were infiltrated with the suspension and cultured in a dark greenhouse for 24 h, after which they were moved to a growth chamber with a 16‐h photoperiod for two days. For GUS histochemical staining, fresh transgenic N. benthamiana tissue was cut into 3‐5 mm strips, followed immediately by incubation in staining solution (1 mM X‐Gluc, 100 mM pH 7.0 sodium phosphate, 10 mM EDTA, 0.5 mM potassium ferricyanide, 0.5 mM potassium ferrocyanide and 0.3% Triton X‐100) at 37°C for 12 h. To remove chlorophyll, the tissue was washed with an ethanol series (50, 70, and 95%) and stored in 70% ethanol. Samples were imaged with a stereomicroscope (Leica Microsystems, Wetzlar, Germany), and GUS activity was detected as described previously (Jefferson, 1987). Briefly, tissue samples were ground into a fine powder, added to the GUS‐extraction solution and centrifuged for 2 min at 1000 rpm. The supernatant contained the crude GUS enzyme. The substrate 4‐methylumbelliferyl‐d‐glucopyranoside (4‐MUG) (Sigma‐Aldrich, Shanghai) was added to the supernatant, and fluorescence of the 4‐methylumbelliferone (4‐MU) (Sigma‐Aldrich, St. Louis, MO, USA) product was determined with excitation and emission wavelengths of 365 and 455 nm, respectively. The amount of 4‐MU was calculated using the 4‐MU standard curve. Protein concentration was used to normalize GUS activity. Each experiment was performed with at least three biological and technical replicates.

Protoplast transient expression

The insertion–deletion fragments of different lengths were cloned and inserted upstream of the 35S minimal promoter in the pGreenII 0800‐LUC vector, as described previously (Huang et al., 2018). The primers used to construct all vectors are listed in Table S1. For all transfections, plasmids were purified from Escherichia coli DH5α (CWBIO) using the Plasmid Maxi Kit (Tiangen) and adjusted to 1 µg/µL. Cotton protoplasts were isolated from 10‐day‐old etiolated GA0149 seedling cotyledons, following (slightly modified) established methods (Huang et al., 2018). Transfection was performed as described for polyethylene glycol‐mediated transformation (Kuroha et al., 2018). Protoplasts harbouring the plasmid were transferred into 6‐well plates and cultured the dark at 25°C for 16‐18 h. The LUC assay was measured using the Dual‐Luciferase Reporter Assay System (Promega Madison, WI, USA), following the manufacturer's protocol. Six biological replicates and technical replicates were assayed.

GaFZ overexpression vector construction and transformation

To construct the overexpression vector, the GaFZ full‐length CDS was PCR‐amplified from GA0149 with primer sets OEGaFZ‐XabI and OEGaFZ‐SacI, subcloned into T‐Vector pMD19 (Takara) and confirmed by sequencing. The cloned fragments were digested with XbaI and SacI, and then ligated into the pBI1221 (containing the CaMV 35S promoter) XbaI‐SacI cloning site. This vector was transformed into A. tumefaciens strains GV3101 and LBA4404. The strain GV3101 including vector was introduced into Arabidopsis following the floral dip method (Clough and Bent, 1998), and the strain LBA4404 including vector was transferred into cotton (HM1) plants following the A. tumefaciens‐mediated hypocotyl transformation method (Jin et al., 2006). Transgenic plants were selected on media containing 40 μg/mL of kanamycin and 50 μg/mL of cefotaxime. To identify the transgenic positive lines, the OE lines were genotyped by PCR using a specific primer set (CaMV35S‐F/OEGaFZ‐SacI) (Table S1). The GaFZ expression level was also determined in all OE lines using a specific primer set (Table S1).

Yeast two‐hybrid

The full‐length CDS for the transcription factors (GaWER, GaHOX1, GaDEL65 and GaWD40) was amplified and cloned into the bait vector, pGBKT7. Full‐length GaFZ CDS was inserted into the prey vector, pGADT7. pGBKT7 was transformed into Y2HGold yeast, and pGADT7 into Y187 yeast following the Transformation System (Clontech). Non‐autoactivation of the bait and non‐toxicity of the prey vectors were determined. Interactions between GaFZ and the transcription factors were detected using the mating method following the Matchmaker Gold Yeast Two‐Hybrid System on SD/‐Leu/‐Trp and SD/‐Ade/‐His/‐Leu/‐Trp media containing X‐gal (5‐bromo‐4‐chloro‐3‐indolyl‐b‐D‐galactopyranoside) media (Clontech).

Transcriptional activation assays

The GaHOX1 promoter (2000 bp upstream of the ATG) was amplified and cloned into pGreenII 0800‐LUC to drive expression of the firefly LUC reporter gene. To determine relative effectiveness, the Renilla reporter gene was driven by the CaMV 35S promoter in the same vector. The coding sequence of GaFZ was amplified and fused to pGreenII 62‐SK as an effector plasmid. The dual‐reporter system was performed as described previously (Liu et al., 2008). Relative luciferase activities were determined by dividing the LUC value with the REN value. To generate the GUS reporter vector, the GaHOX1 promoter was cloned into pcambia1391, to drive expression of GUS. The CDS of GaFZ was amplified and fused to the 2300 flag as an effector plasmid. The recombinant vectors were transferred into A. tumefaciens GV3101 with the helper plasmid pSoup‐P19 (ensuring high levels of transient gene expression). Agrobacteria cells containing the effector and reporter plasmids were mixed 1:2, respectively, and then co‐transformed into tobacco leaves. GUS activity was measured as described previously (Jefferson, 1987). At least six biological replicates were performed for each transient assay.

Phylogenetic analysis

The full‐length FZ sequences from cotton and other species were aligned using CLUSTALX (v2.1) (Larkin et al., 2007). A neighbour‐joining phylogenetic tree was constructed using MEGA (v.7.0) with 1000 bootstraps.

Trypan blue staining

Three‐week‐old A. thaliana were stained with trypan blue (TB) as described previously (Tanaka et al., 2004).

Conflict of interest

The authors declare no competing financial interests.

Author contributions

X.M.D., W.F.G. and S.P.H. conceived the project and designed the research. S.P.H., G.F.S. and Y.M. Z. performed built the non‐SNP analysis pipeline and performed GWAS. G.F.S. performed the bioinformatic analysis. Z.E.P., X.M.Z., P.P.W., Y.D., Y.H.J., X.L. G., B.Y.P., D.W.H. and L.R.W. designed and carried out field collection work. X.Y.W, Y.C.M and Y.F.C performed most of the experiments. X.Y.W. and L.Y.W. performed protoplast separation experiment. G.Y.F. performed the locus homozygous identification. S.S, Z.P and G.X.J performed LUC activity. Q.G., B.J.C. and G.Y.F. performed yeast two‐hybrid assays. S.P.H, W.F.G., X.Y.W., Y.C.M. and X.M.D. wrote the paper. All authors read and approved the manuscript. Figure S1 GWAS generated by different types of variations on seed fuzz. Figure S2 GWAS generated by different types of variations on leaf hair. Figure S3 Comparison of mapping paired‐end reads in candidate region between fuzzy and fuzzless accessions. Figure S4 Gels of all designated markers for continuous contigs designing overlapped PCR in locus FZ region on chromosome 8 between fuzzy (GA0146) and fuzzless (GA0146) accessions. Figure S5 Identification of the exact breakpoint and sequence of larINDEL. Figure S6 Deciphering of sequence details in locus FZ region. Figure S7 Proposed evolutionary model nearby the genomic region of larINDEL. Figure S8 Test of genotype‐phenotype linkage in both GWAS and F2 segregation populations. Figure S9 Expression of GaFZ in different tissues between fuzzy (GA0146) and fuzzless (GA0149) accessions by RT‐PCR (a) and RNA‐seq (b). Figure S10 Alignment of GaFZ (Ga08G0121) flanking sequences between GA0146 (fuzzy) and GA0149 (fuzzless). Figure S11 Enhancement effect on gene expression within 1Mb‐length region flanking larINDEL. Figure S12 Illustration of cis‐regulatory elements in the sequence amplified by primer 6F6R in fuzzless and fuzzy accession. Figure S13 Subcellular localization of GaFZ. Figure S14 Phylogenetic tree analysis GaFZ among different species. Figure S15 Selection of transgenic T0 generation‐positive seedlings in MS included kanamycin and cefotaxime. Figure S16 Phenotypes and gene expression of GaFZ in overexpressing Arabidopsis plants. Figure S17 Stem trichome phenotype of wild type and GaFZ‐overexpressing cotton. Figure S18 Whole plant phenotype of Wild‐type (WT) and 35S::GaFZ (OE) in cotton. Figure S19 Fuzz and lint phenotypes in WT and OE cottons. Figure S20 Root hair phenotype in OE Arabidopsis and OE cotton. Figure S21 Venn diagrams showed the common differential expressed genes (DEGs) in two developmental stages of GaFZ‐OE Arabidopsis lines vs. WT (a), and in two fuzz initiation stages (+3, +5DPA) of fuzzless G. arboreum line vs. fuzzy line (b). Figure S22 KEGG enrichment analysis of DEGs between WT and OE Arabidopsis, and between fuzzy and fuzzless G. arboreum. Figure S23 Interactions of GaFZ with GaHOX1, GaWER, GaDEL65 and GaWD40 at the protein and transcriptional level. Figure S24 Gene expression of GaHD1 and GaMML3 between fuzzy (GA0146) and fuzzless (GA0149) accession. Figure S25 GaFZ affects the gene expression of fatty acid elongation (FAE) pathway and cutin, suberin and wax biosynthesis pathway in Arabidopsis. Click here for additional data file. Table S1 All primers used in this study. Table S2 Genotype of larINDEL, seed fuzz and leaf hair in both GWAS population and F2 segregation population. Table S3 Detailed information of cis‐regulatory elements in larINDEL. Table S4 KEGG enriched pathways in GaFZ‐OE Arabidopsis line vs. WT. Table S5 Expression of GL1, GL2, GL3 and TTG1 in Arabidopsis. Table S6 Expression of GL1, GL2, GL3 and TTG1 in G. arboreum. Table S7 Gene expression of biosynthesis of unsaturated fatty acids, fatty acid elongation, and cutin, suberin and wax biosynthesis pathway in G. arboreum. Table S8 DEGs in cutin, suberin and wax biosynthesis, and fatty acid elongation pathway in Arabidopsis. Table S9 DEGs in fatty acid and wax transporter, and fatty acid biosynthesis in Arabidopsis. Click here for additional data file.
  82 in total

Review 1.  Structural variation in the human genome and its role in disease.

Authors:  Paweł Stankiewicz; James R Lupski
Journal:  Annu Rev Med       Date:  2010       Impact factor: 13.739

2.  Large genomic insertion at the Shh locus results in hammer toes through enhancer adoption.

Authors:  Christina Paliou; Guillaume Andrey
Journal:  Proc Natl Acad Sci U S A       Date:  2018-01-12       Impact factor: 11.205

3.  Natural Variation in the Promoter of GSE5 Contributes to Grain Size Diversity in Rice.

Authors:  Penggen Duan; Jinsong Xu; Dali Zeng; Baolan Zhang; Mufan Geng; Guozheng Zhang; Ke Huang; Luojiang Huang; Ran Xu; Song Ge; Qian Qian; Yunhai Li
Journal:  Mol Plant       Date:  2017-03-30       Impact factor: 13.164

4.  Ectopic expression of an esterase, which is a candidate for the unidentified plant cutinase, causes cuticular defects in Arabidopsis thaliana.

Authors:  Kentaro Takahashi; Tomoo Shimada; Maki Kondo; Atsushi Tamai; Masashi Mori; Mikio Nishimura; Ikuko Hara-Nishimura
Journal:  Plant Cell Physiol       Date:  2009-12-08       Impact factor: 4.927

5.  MIXTA-like transcription factors and WAX INDUCER1/SHINE1 coordinately regulate cuticle development in Arabidopsis and Torenia fournieri.

Authors:  Yoshimi Oshima; Masahito Shikata; Tomotsugu Koyama; Norihiro Ohtsubo; Nobutaka Mitsuda; Masaru Ohme-Takagi
Journal:  Plant Cell       Date:  2013-05-24       Impact factor: 11.277

6.  A new method for rapid visualization of defects in leaf cuticle reveals five intrinsic patterns of surface defects in Arabidopsis.

Authors:  Toshihiro Tanaka; Hirokazu Tanaka; Chiyoko Machida; Masaru Watanabe; Yasunori Machida
Journal:  Plant J       Date:  2004-01       Impact factor: 6.417

7.  Identification of the wax ester synthase/acyl-coenzyme A: diacylglycerol acyltransferase WSD1 required for stem wax ester biosynthesis in Arabidopsis.

Authors:  Fengling Li; Xuemin Wu; Patricia Lam; David Bird; Huanquan Zheng; Lacey Samuels; Reinhard Jetter; Ljerka Kunst
Journal:  Plant Physiol       Date:  2008-07-11       Impact factor: 8.340

8.  The CER3 wax biosynthetic gene from Arabidopsis thaliana is allelic to WAX2/YRE/FLP1.

Authors:  Owen Rowland; Robert Lee; Rochus Franke; Lukas Schreiber; Ljerka Kunst
Journal:  FEBS Lett       Date:  2007-07-03       Impact factor: 4.124

9.  Unconventionally secreted effectors of two filamentous pathogens target plant salicylate biosynthesis.

Authors:  Tingli Liu; Tianqiao Song; Xiong Zhang; Hongbo Yuan; Liming Su; Wanlin Li; Jing Xu; Shiheng Liu; Linlin Chen; Tianzi Chen; Meixiang Zhang; Lichuan Gu; Baolong Zhang; Daolong Dou
Journal:  Nat Commun       Date:  2014-08-26       Impact factor: 14.919

10.  Genetic dissection of the fuzzless seed trait in Gossypium barbadense.

Authors:  Qian-Hao Zhu; Yuman Yuan; Warwick Stiller; Yinhua Jia; Pengpeng Wang; Zhaoe Pan; Xiongming Du; Danny Llewellyn; Iain Wilson
Journal:  J Exp Bot       Date:  2018-02-23       Impact factor: 6.992

View more
  2 in total

1.  Large-fragment insertion activates gene GaFZ (Ga08G0121) and is associated with the fuzz and trichome reduction in cotton (Gossypium arboreum).

Authors:  Xiaoyang Wang; Yuchen Miao; Yingfan Cai; Gaofei Sun; Yinhua Jia; Song Song; Zhaoe Pan; Yuanming Zhang; Liyuan Wang; Guoyong Fu; Qiong Gao; Gaoxiang Ji; Pengpeng Wang; Baojun Chen; Zhen Peng; Xiaomeng Zhang; Xiao Wang; Yi Ding; Daowu Hu; Xiaoli Geng; Liru Wang; Baoyin Pang; Wenfang Gong; Shoupu He; Xiongming Du
Journal:  Plant Biotechnol J       Date:  2021-02-07       Impact factor: 9.803

2.  Weighted Gene Co-Expression Network Analysis Reveals Hub Genes Contributing to Fuzz Development in Gossypium arboreum.

Authors:  Xiaoxu Feng; Shang Liu; Hailiang Cheng; Dongyun Zuo; Youping Zhang; Qiaolian Wang; Limin Lv; Guoli Song
Journal:  Genes (Basel)       Date:  2021-05-17       Impact factor: 4.096

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.