
Gossypium barbadense genome sequence provides insight into the evolution of extra-long staple fiber and specialized metabolites.

Xia Liu1, Bo Zhao2, Hua-Jun Zheng3, Yan Hu4, Gang Lu3, Chang-Qing Yang2, Jie-Dan Chen4, Jun-Jian Chen1, Dian-Yang Chen2, Liang Zhang3, Yan Zhou3,5, Ling-Jian Wang2, Wang-Zhen Guo4, Yu-Lin Bai1, Ju-Xin Ruan2, Xiao-Xia Shangguan2, Ying-Bo Mao2, Chun-Min Shan2, Jian-Ping Jiang3, Yong-Qiang Zhu3, Lei Jin3, Hui Kang3, Shu-Ting Chen3, Xu-Lin He3, Rui Wang3, Yue-Zhu Wang3, Jie Chen3, Li-Jun Wang3, Shu-Ting Yu3, Bi-Yun Wang3, Jia Wei3, Si-Chao Song3, Xin-Yan Lu3, Zheng-Chao Gao3, Wen-Yi Gu3, Xiao Deng6, Dan Ma4, Sen Wang4, Wen-Hua Liang4, Lei Fang4, Cai-Ping Cai4, Xie-Fei Zhu4, Bao-Liang Zhou4, Z Jeffrey Chen4,7, Shu-Hua Xu8, Yu-Gao Zhang1, Sheng-Yue Wang3, Tian-Zhen Zhang4, Guo-Ping Zhao2,3,5, Xiao-Ya Chen2.   

Abstract

Of the two cultivated species of allopolyploid cotton, Gossypium barbadense produces extra-long fibers for the production of superior textiles. We sequenced its genome (AD)2 and performed a comparative analysis. We identified three bursts of retrotransposons from 20 million years ago (Mya) and a genome-wide uneven pseudogenization peak at 11-20 Mya, which likely contributed to genomic divergences. Among the 2,483 genes preferentially expressed in fiber, a cell elongation regulator, PRE1, is strikingly At biased and fiber specific, echoing the A-genome origin of spinnable fiber. The expansion of the PRE members implies a genetic factor that underlies fiber elongation. Mature cotton fiber consists of nearly pure cellulose. G. barbadense and G. hirsutum contain 29 and 30 cellulose synthase (CesA) genes, respectively; whereas most of these genes (>25) are expressed in fiber, genes for secondary cell wall biosynthesis exhibited a delayed and higher degree of up-regulation in G. barbadense compared with G. hirsutum, conferring an extended elongation stage and highly active secondary wall deposition during extra-long fiber development. The rapid diversification of sesquiterpene synthase genes in the gossypol pathway exemplifies the chemical diversity of lineage-specific secondary metabolites. The G. barbadense genome advances our understanding of allopolyploidy, which will help improve cotton fiber quality.

Year:  2015        PMID: 26420475      PMCID: PMC4588572          DOI: 10.1038/srep14139

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Whole-genome duplication (WGD) or polyploidy is a primary driving force in the evolution of many eukaryotic organisms, especially flowering plants1234. Many crops are neo-allopolyploids that harbor different sets of genomes56, including the cultivated Upland cotton Gossypium hirsutum (AD)1 and the extra-long staple (ELS) cotton Gossypium barbadense (AD)2. However, our understanding of the molecular mechanism that facilitates the success of allopolyploids and the formation of agronomic traits remains limited. Cotton provides the most important raw material for the textile industry and consequently profoundly affects the world economy and daily human life. The cotton genus Gossypium contains 45 diploid (2n = 26) and six tetraploid (2n = 52) species78, among which only four species, including two tetraploids (G. hirsutum and G. barbadense) and two diploids (G. herbaceum and G. arboreum), produce spinnable fiber. Diploid cottons are divided into eight cytogenetic genome groups, A-G and K. The sizes of genomes vary between groups due to the lineage-specific proliferation of retrotransposons7. The D-group species have the smallest genome with G. raimondii (D5) of less than 880 Mb91011, whereas the genome of G. arboreum (A2) in the A-group is approximately 1,700 Mb12. G. hirsutum and G. barbadense are considered classic natural allotetraploids that originated in the New World approximately 2 million years ago (Mya) from trans-oceanic hybridization between an A-genome ancestral African species, G. herbaceum (A1) or G. arboreum (A2), and a native D-genome species, G. raimondii or G. gossypioides (D6)13, followed by divergence from their common ancestor (Fig. 1). These two allotetraploids are likely the oldest major allopolyploid crops101415.
Figure 1

A schematic map of the evolution of allotetraploid cottons.

Allotetraploid cotton evolved from the natural hybridization between A- and D-genome species and has split into six species, including the widely cultivated G. barbadense (AD2) and G. hirsutum (AD1). Evolutionary time (in Mya) is indicated by a numbered axis; major evolutionary events are represented by arrows and summarized in boxes. A black star indicates a retrotransposon burst, and a red star indicates a boom in pseudogene production. Gr, G. raimondii, a diploid species (D5); Gb, G. barbadense; Gh, G. hirsutum. Mature cotton fiber is shown for extra-long staple (ELS) cotton (G. barbadense, AD2) and Upland cotton (G. hirsutum, AD1).

Cotton fiber is derived from single-celled, seed-borne hair (trichome), and the development of fiber cells is largely synchronized in a cotton boll (fruit) in four overlapping stages: initiation, elongation, secondary cell wall synthesis and maturation16. These processes provide an excellent model to dissect cell differentiation, elongation and cellulose biosynthesis. The rate and duration of the elongation stage determine fiber length, and the secondary cell wall biosynthesis affects fiber strength and fineness1718. The Upland cotton G. hirsutum constitutes ~90% of the annual cotton output and is characterized by its high yield yet moderate fiber qualities, whereas the ELS cotton G. barbadense produces over 5% of the world’s cotton and is famous for its superior quality fiber, as judged by the length, strength and fineness of its fibers (Fig. 1). Therefore, G. barbadense is preferred for the production of high-grade or special cotton textiles. Although G. barbadense and G. hirsutum may share a common progenitor, the two species substantially differ, which has hindered the transfer of the superior fiber traits of G. barbadense to G. hirsutum via inter-species hybridization. This transfer has been particularly hindered by distorted segregation19. The recently released genome sequences of G. hirsutum2021 and the two extant diploid progenitor species, G. raimondii1011 and G. arboreum12, have provided insight into cotton evolution and a wealth of resources for fiber improvement. A genome sequence of G. barbadense will further our understanding of the dynamics of genome structures and the genetic driving force associated with allotetraploids, particularly the molecular basis of the formation of fibers with superior traits.

Results

Genome sequence and assembly

We adopted a progressive strategy to sequence the allotetraploid genome of G. barbadense cv. Xinhai21 (AD)2. First, the genomes of the extant diploid species G. arboreum (A2) and G. raimondii (D5) were separately sequenced and assembled. These sequences, together with their published genomes1012, were used as references for the initial sorting of the primary reads into At and Dt subgenomes. The sequences were then assembled into At and Dt contigs and scaffolds (Supplementary Table 1). A total of 471 Gb (188× genome equivalent) of data were separately produced using the Roche 454, Illumina Hiseq2000 and PacBio SMRT sequencing platforms (Supplementary Table 2). The particularly long reads (22.67 Gb) obtained from PacBio SMRT and the assembled 53-Gb contigs of the BAC pool further reduced the effects of repeats in the assembly, yielding a gap reduction of 63.4% (Supplementary Fig. 1). Finally, we used the ultra-dense linkage map consisting of 4,999,048 single-nucleotide polymorphism (SNP) loci22 to assign and orient the 26 chromosomes and validate the polyploid genome of G. barbadense (Supplementary Fig. 2). We detected only 20 Mb of sequence in which the subgenome classification of homoeologous sequences conflicted between the sequence assembly and the linkage mapping strategies, which was likely due to sequence conversions between the two subgenomes. A total of 208 Mb of sequence with erroneous inter-chromosomal joins in the At or Dt subgenome was detected and then corrected. The combination of these methods resulted in a draft genome for G. barbadense with an overall contig N50 of 72 kilobases (kb) and scaffold N50 of 503 kb covering 1.395 gigabases (Gb) of the A subgenome (At) and 0.776 Gb of the D subgenome (Dt) (Table 1 and Fig. 2). In total, the assembly covers ~88% of the 2.470-Gb genome size estimated by k-mer analysis (Supplementary Fig. 3). The genome contains at least 63.2% repeated sequences (Supplementary Table 3), half of which are transposable elements (TEs) that primarily consist of long-terminal-repeat retrotransposons (LTR retrons) (Supplementary Fig. 4).
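The assembly statistics quoted above (contig and scaffold N50, and the fraction of the k-mer-estimated genome that is covered) can be illustrated with a minimal Python sketch. The function name `n50` and the toy length list are hypothetical and serve only as an illustration; they are not the authors' pipeline. The subgenome sizes and the 2.470-Gb estimate are taken from the text.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy contig lengths (kb), not data from the paper.
toy_lengths_kb = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_lengths_kb)

# Assembled fraction of the k-mer-estimated genome (values from the text).
assembled_gb = 1.395 + 0.776      # At + Dt
estimated_gb = 2.470              # k-mer estimate
assembled_percent = round(100 * assembled_gb / estimated_gb, 1)

print(toy_n50, assembled_percent)
```

With the figures from the text, the assembled fraction comes out just under 88%, in line with the value reported above.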
Table 1

Statistics of G. barbadense genome features.

Category                    At               Dt
Genome Size (bp)            1,394,663,696    775,997,401
Gene Number                 40,502           37,024
Ave. Gene Size (bp)         2,601            2,553
Total Gene Region           123,247,562      104,783,505
Ave. CDS Size (bp)          1,099            1,111
Max. CDS Length             19,647           16,596
Total Coding Region (bp)    52,095,402       45,586,340
Total Exon Number           240,755          208,290
Exon Number per Gene        5                5
Ave. Exon Size (bp)         216              219
Max. Exon Size (bp)         5,651            6,031
Total Intron Number         193,370          167,253
Ave. Intron Size (bp)       368              354
Max. Intron Size (bp)       85,091           86,599
Figure 2

G. barbadense genome atlas and chromosome-level translocations.

(a) Genome atlas. The outermost circle represents the numbered chromosomes of At and Dt, and chromosome sizes are marked by a scale plate. The three tracks moving inward successively represent gene, pseudogene and repeat densities (calculated with 1 Mb windows) across the chromosomes. The core ribbon-link shows collinearity between At and Dt. (b,c) Chromosomal translocations. The translocations among chromosome 2 and chromosome 3 of either At or Dt are indicated with blue lines (b) and those among chromosome 4 and chromosome 5 with blue and purple lines (c). The vertical colored lines from left to right represent chromosomes. The loci of PRE1 implicated in fiber cell elongation are specifically marked with red in the chromosomes A05 and D04. Digits (01 to 13) after A, D or Gr indicate the chromosome of the At/Dt subgenome of G. barbadense or of G. raimondii, respectively.

Gene annotation

To initiate gene prediction, ~1 million expressed sequence tags (ESTs) that were generated using Roche 454 from a combination of 28 samples of eight tissues/organs collected at different development stages were mapped to the genome as gene models, which resulted in 40,502 and 37,024 protein-coding genes (CDSs) with an average length of 1,077 and 1,123 bp in the G. barbadense At and Dt subgenomes, respectively (Table 1), and falling in the same range as the number and length of CDSs of G. raimondii1011. Further evaluation using the 70-Gb RNA-Seq data via Illumina supported 96.6% of the predicted CDSs. The 77,526 predicted genes were annotated, which revealed 62,966 functional genes, excluding 8,518 At and 6,042 Dt genes (~20%) that lacked clear biological functions. To examine the influence of allopolyploidy on gene contents, we classified cotton genes into domain families. The composition and family size of the assigned Pfam domain families are overall identical in G. barbadense At and Dt, G. raimondii and, to a lesser extent, G. arboreum. Protein domains whose function was clearly annotated, such as protein kinase, cytochrome P450, and pentatrico-peptide repeat (PPR), were commonly over-represented as large families (Supplementary Table 4 and Supplementary Fig. 5) as in other angiosperm plants232425. Although most domains (3,039 out of 3,674) were maintained in each subgenome after the two were merged, pronounced changes in family size occurred, as exemplified by more ring finger domain (PF13639) and leucine rich repeat (PF13855) genes in the diploid D genome than in either At or Dt (Supplementary Table 4). This finding suggested that super-large families have evolved faster than others and tended to lose members in polyploids26.

Genome evolution

A total of 21,639 pairs of orthologs were identified between At and Dt. We compared the Ks values of orthologous gene pairs among G. barbadense (Gb), G. hirsutum (Gh) and G. raimondii (Gr) at the whole-genome level (Fig. 3a and Supplementary Table 5). A peak of 0.011 in both GbDt:GrD5 and GhDt:GrD5 indicates that the Dt subgenome of both allotetraploids originated from a G. raimondii-like progenitor27. The peak values for GbAt:GaA2 and GhAt:GaA2 are lower but again similar, presumably due to a shorter time since divergence compared to that between D-genome species. In addition, unlike G. raimondii, which is a wild species, G. arboreum has long been cultivated in African and Asian countries. Another pair of similar Ks peaks (0.005) of GbAt:GhAt and GbDt:GhDt further supports the common origin of the two allotetraploid cottons and suggests their later divergence approximately 1 Mya (Fig. 3a). Based on the larger Ks value (0.04) for At:Dt, we estimated the divergence time between the Gossypium A- and D-genome species to be approximately 8 Mya, consistent with previous estimates that were based on a few single-copy genes1327. The Ks values of paralogs in the two subgenomes of G. barbadense both peak at 0.4–0.5, indicating ancient WGD event(s) that occurred 50–70 Mya (Fig. 3b) and that were responsible for the repeated genome expansion in Gossypium after divergence from the Theobroma cacao lineage more than 60 Mya10.
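The conversion from a Ks peak to a divergence time follows the standard relation T = Ks / (2r), where r is the synonymous substitution rate per site per year. The sketch below is a minimal illustration of that relation. The rate of 2.5 × 10^-9 is an assumption: it is not stated in this section and is simply the value implied by the pairing of Ks = 0.04 with ~8 Mya and Ks = 0.005 with ~1 Mya. The function name is hypothetical. The dating of the older paralog peak (Ks 0.4–0.5, 50–70 Mya) evidently relies on a different calibration and is not reproduced here.

```python
# Assumption: synonymous substitutions per site per year, inferred from the
# Ks/time pairs quoted in the text rather than stated by the authors.
ASSUMED_RATE = 2.5e-9


def divergence_time_mya(ks, rate=ASSUMED_RATE):
    """Convert a Ks value into a divergence time in million years (T = Ks / 2r)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    return ks / (2 * rate) / 1e6


# Ks peaks quoted in the text.
for label, ks in [("At:Dt", 0.04), ("GbAt:GhAt / GbDt:GhDt", 0.005)]:
    print(f"{label}: Ks = {ks} -> ~{divergence_time_mya(ks):.1f} Mya")
```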
Figure 3

Evolutionary analysis of the G. barbadense genome.

(a) Ks distribution of orthologs in cotton genomes. Data are grouped into 0.001 Ks units. (b) Ks distribution of paralogs in the G. barbadense genome. Data are grouped into 0.01 Ks units, and the peak region corresponds to 50–70 million years. (c) The distribution curve of the insertion times in the LTR retrons in the G. barbadense genome. The LTR retrons bursts are separated by dashed lines. (d) Ks distribution of pseudogenes with their closest functional paralogous genes. Data are grouped into 0.001-Ks units. The genomes of allotetraploid cottons are labeled using At/Dt, and the genomes of G. arboreum (A2) and G. raimondii (D5) are labeled using Ga and Gr.

Both the At and Dt subgenomes of G. barbadense demonstrate a high level of co-linearity with the G. raimondii genome1011 (Supplementary Fig. 6). A total of 21 Megabase (Mb) sequences in the Dt and 7.4 Mb in the At were identified as inter-subgenome translocation regions (Supplementary Fig. 7). Two of three major intra-subgenomic rearrangements between chrA2/chrA3 and chrA4/chrA52829 were observed in the At of both of the allotetraploid cottons but absent in the Dt or G. raimondii genome (Fig. 2), suggesting that the two translocations likely occurred after the separation of the A and D genomes.

Genomic plasticity and evolution

We identified 6,014/2,422 complete LTR retrons with an average length of 9,256/8,130 bp in At/Dt (Supplementary Tables 6 and 7), similar to the numbers of LTR retrons in G. hirsutum At and Dt, G. arboreum and G. raimondii (Supplementary Table 8). The singleton LTR retrons ratio is 83.5% in At and 82.2% in Dt (compared with 85.4% in G. raimondii and 73.2% in G. arboreum), close to that (86%) in the genome of a gymnosperm tree, Picea abies30 (an indication of high divergence). The TE proliferations in G. barbadense and G. hirsutum2021, represented by insertions of LTR retrons based on estimations according to the sequence divergence between the left and right soloLTR31, have increased since 20 Mya, and three distinct bursts were identified. Interestingly, the first two bursts appear to successively pre-date the divergence and the re-unification of the diploid A/D genomes (Fig. 3c). The LTR retrons clearly show type-specific and subgenome-biased proliferations (Fig. 3c). Their insertion rates in the A genome appear consistently higher than those in the D genome. For example, a large number (9.15%) of LTR retrons burst at 5 Mya and decreased thereafter in At, whereas a substantially lower and flat peak appeared 3–5 Mya in Dt (Fig. 3c). This peak at least partly accounts for the 1.7-fold more LTR retrons in the former genome. However, the faster loss of LTR retrons in the D genome may also be responsible for genome size variations and the different rates of genome expansion32. Notably, the third asymmetric activities of transposons differ between G. barbadense and G. hirsutum (Fig. 3c), which suggests a possible cause of subgenome divergence that may have promoted the speciation of allotetraploid cottons beginning approximately 1 Mya (Fig. 1). 
These observations indicate that the genome-specific differential dynamics of TE proliferations could be a major force that has driven the rapid evolution and diversification of Gossypium species, which may also be inferred in other flowering plants.
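Dating an LTR retron insertion from the divergence between its two terminal repeats can be sketched as follows. The two LTRs of an element are identical at insertion, so their accumulated divergence K gives an age of T = K / (2r). This sketch uses a p-distance with the Jukes-Cantor correction; the distance model, the function names and the substitution rate passed in the usage line are all assumptions, because the section does not specify them.

```python
import math


def p_distance(left, right):
    """Proportion of differing sites between two aligned, equal-length sequences.
    Columns containing a gap ('-') in either sequence are ignored."""
    if len(left) != len(right):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(left.upper(), right.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_time_mya(left_ltr, right_ltr, rate_per_site_per_year):
    """Estimate insertion time (Mya) from the divergence of the two LTRs: T = K / 2r."""
    k = jukes_cantor(p_distance(left_ltr, right_ltr))
    return k / (2 * rate_per_site_per_year) / 1e6


# Hypothetical toy LTR fragments and a placeholder rate, for illustration only.
age = insertion_time_mya("ACGTACGTAC", "ACGTACGTAT", rate_per_site_per_year=1e-8)
print(f"estimated insertion time: {age:.2f} Mya")
```

Real LTRs span hundreds to thousands of bases, and the resulting ages scale inversely with whichever rate is chosen, so the absolute times depend strongly on that calibration.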

Pseudogenization prior to and after polyploidization

Pseudogenes are disabled copies resembling functional genes that have been retained in the genome2633. They can be grouped into three categories: duplicated (derived from gene duplication), processed (generated by the integration of reverse-transcribed cDNAs into genomes) and fragmented (neither processed nor duplicated)33. To further investigate the influence of TE bursts and polyploidization on the cotton genomic architecture, we predicted pseudogenes in G. barbadense (Supplementary Table 9) and classified them into the three categories (Supplementary Fig. 8), most of which are silenced without any detectable transcripts in all tissues examined. Each subgenome of G. barbadense contains more predicted pseudogenes than the diploid genome of G. raimondii (Supplementary Table 9 and Supplementary Fig. 8), implying an accelerated pseudogenization after allopolyploid formation. A substantial portion of the pseudogenes in At and Dt showed a high sequence identity (above 90%, for example) with their parental genes (Supplementary Fig. 9), suggesting an insufficient duration for degeneration in recently formed pseudogenes. As expected, the Ka/Ks distributions indicate a substantially weaker natural selection on pseudogenes than on protein-coding genes (Supplementary Fig. 10), which is likely due to a loss of function in pseudogenes. The Ks value peaks at 0.06–0.1, corresponding to 11–20 Mya (Fig. 3d), and this boom of pseudogenization correlates with an LTR retron burst prior to the divergence of the A and D genomes (Fig. 3c). The average expression levels of the genes with an LTR retron insertion within a 20-kb region upstream of the start codon are generally lower (RPKM = 7.72) than those of genes lacking this insertion (RPKM = 13) (Supplementary Table 10). Therefore, LTR retrons negatively affect the expression of nearby genes, which may promote pseudogenization.
These results suggest that cotton progenitors likely lost genes and experienced LTR retron bursts following the ancient WGD, which promoted diversification in Gossypium genomes; however, the role of TE-associated pseudogenization in the stabilization of subgenomes in polyploids requires a more detailed analysis.
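The expression comparison above is stated in RPKM (reads per kilobase of exon model per million mapped reads). A minimal sketch of that normalization, and of the ratio between the two reported group means, is given below. The function name and the toy read counts are hypothetical; only the means 7.72 and 13 come from the text.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical toy gene: 100 reads on a 1-kb gene in a library of 1 million reads.
toy_value = rpkm(read_count=100, gene_length_bp=1000, total_mapped_reads=1_000_000)

# Group means quoted in the text: genes with vs. without an upstream LTR retron.
mean_with_insertion = 7.72
mean_without_insertion = 13
relative_expression = round(mean_with_insertion / mean_without_insertion, 2)

print(toy_value, relative_expression)
```

On the reported means, genes with an upstream insertion are expressed at roughly 60% of the level of genes without one.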

Extra-long staple fiber formation

We identified 2,483 and 1,879 genes that are specifically or preferentially expressed in fibers and the ovule, respectively (Supplementary Tables 11 and 12). The highly active genes in the ovule are abundant in the protein families of nucleic acid binding/transcription factor activity and nutrient reservoir activity, whereas the up-regulated genes of fibers are enriched in several categories, such as those related to cytoskeleton, carbohydrate metabolism, cell wall biosynthesis and cellulose biosynthesis function (Supplementary Tables 13 and 14). Consistent with a previous report34, equal numbers of genes in the At and Dt subgenomes demonstrated biased expression patterns (Supplementary Tables 15 and 16). Transcription factors play an important role in controlling agronomic novelty, and the MYB and homeodomain-containing factors have been shown to be key regulators of cotton fiber trait development10353637. We then analyzed transcription factor genes expressed in G. barbadense fiber in detail (Supplementary Table 17 and Supplementary Fig. 11). Paclobutrazol resistance (PRE) genes encode a group of transcription regulators known in other plants to promote cell elongation383940. We identified 13 PRE family genes in G. raimondii; their 26 orthologous genes were recovered in G. barbadense. Analyzing the PRE-containing synteny blocks in plants revealed that cacao41 has five PRE genes, each of which has at least two orthologs in the Gossypium diploid genomes or the allotetraploid subgenomes (Fig. 4a and Supplementary Fig. 12). This expansion of PRE genes in cotton may have occurred during a complex 5–6-fold polyploidy process1011, which was followed by differential gene loss but the retention of the ancient orthologs. Interestingly, two PRE genes are located in the two At translocation regions (chrA2/chrA3 and chrA4/chrA5) (Fig. 2c and Supplementary Fig. 12). In cotton, PRE genes are preferentially expressed in young tissues (Fig. 4b,c), which is consistent with their role in controlling cell size. Moreover, the expression of At and Dt PRE homoeologous genes was biased in G. barbadense (Supplementary Tables 11–12). In particular, the expression level of At-subgenome PRE1 was high and fiber specific, whereas the expression of the Dt homoeolog was nearly undetectable (Fig. 4b). The At-specific expression of a cell growth regulator provides a clue to support the origin or early evolution of spinnable fiber in the A-genome species1011. The expansion and subsequent selection1134 of PRE genes in Gossypium may have increased their regulatory activity and recruited specific member(s) for the rapid and extensive elongation of cotton fiber (Figs 1 and 4c).
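One simple way to express homoeolog expression bias of the kind described for PRE1 is a log2 ratio of the At and Dt expression values. The sketch below is illustrative only: the pseudocount, the two-fold threshold, the function names and the toy RPKM values are all assumptions and do not represent the criteria used in the study.

```python
import math

PSEUDOCOUNT = 1.0       # assumption: avoids division by zero for silent genes
LOG2_THRESHOLD = 1.0    # assumption: call bias at a two-fold difference


def log2_bias(at_rpkm, dt_rpkm, pseudocount=PSEUDOCOUNT):
    """log2 ratio of At to Dt expression; positive values favor the At copy."""
    if at_rpkm < 0 or dt_rpkm < 0:
        raise ValueError("expression values must be non-negative")
    return math.log2((at_rpkm + pseudocount) / (dt_rpkm + pseudocount))


def classify_bias(at_rpkm, dt_rpkm, threshold=LOG2_THRESHOLD):
    """Label a homoeologous pair as At-biased, Dt-biased or balanced."""
    ratio = log2_bias(at_rpkm, dt_rpkm)
    if ratio >= threshold:
        return "At-biased"
    if ratio <= -threshold:
        return "Dt-biased"
    return "balanced"


# Hypothetical RPKM values resembling a strongly At-biased pair.
print(classify_bias(at_rpkm=120.0, dt_rpkm=0.5))
```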
Figure 4

Expansion and diversification of PRE genes in Gossypium.

(a) Phylogenetic analysis of PRE family genes in Amborella trichopoda, Arabidopsis thaliana, G. raimondii and G. barbadense. Subfamilies are overlaid with different colors, and the curved dotted lines indicate homoeologous gene pairs expressed in fiber. (b) GbPRE1 (GOBAR_AA33780, GOBAR_DD03693) is a fiber-specific gene with strong At bias expression. The expression levels (RPKM) in ovules (0 DPA) and fiber cells (5, 10, 20, and 30 DPA) are shown. Detailed expression data are provided in Supplementary Table 10. (c) Hierarchical clustering analysis of expression of PRE genes in G. barbadense. LB, leaf bud; YL, young leaf; ML, mature leaf; O, ovule; F, fiber; DPA, days post-anthesis.

Cellulose, which consists of linear chains of β (1–4)-linked D-glucose, is the major component of higher plant cell walls and the most abundant biopolymer on land. Plants express multiple cellulose synthases (CesAs) that, together with CesA-associated proteins, form the cellulose synthase complex4243. Cotton fiber is distinct not only in its extensive elongation (ELS cotton fiber is longer than 35 mm) but also in its exceptionally high amount of cellulose, which constitutes more than 95% of the dry weight of mature fiber1644. Notably, the first higher plant cellulose synthase gene was cloned from cotton45. Ten, 14 and 15 CesA genes are expressed in Arabidopsis thaliana4243, G. arboreum12 and G. raimondii10, respectively (Fig. 5 and Table 2). We identified 29 CesA genes, including 14 At and 15 Dt, in the G. barbadense genome, whereas 30 (14 At and 16 Dt) CesA genes were identified in G. hirsutum; most CesA genes had been retained after the merger of the A and D genomes (Table 2 and Supplementary Fig. 13). Compared to Arabidopsis, each cotton genome or subgenome contains more genes in the CesA3, CesA4, CesA7 and CesA8 clades. Notably, chromosome 5 of both the At and Dt subgenomes of G. barbadense (GOBAR_AA25282, GOBAR_AA25287/GOBAR_DD32643, GOBAR_DD32648 and GOBAR_DD32650) and G. hirsutum (Gh_A05G3959, Gh_A05G3965, Gh_A05G3967/Gh_D05G0077, Gh_D05G0079 and Gh_D05G0084) as well as G. arboreum and G. raimondii contain a CesA cluster composed of 3 or, rarely, 2 genes, in addition to the CesA-like (CSL) genes (Table 2); thus, the duplication(s) occurred in the ancient cotton genome.
Figure 5

Cotton CesA genes and their expression in developing fiber cells of G. barbadense and G. hirsutum.

(a) CesA genes from four cotton species were clustered (left) via MEGA5 using the maximum likelihood method. G. arboreum (Cotton_A) and G. raimondii (Gorai) contain 14 and 15 CesA genes, respectively, which are shown in the left column. The heat map (middle) shows the transcript level (FPKM, fragments per kilobase of exon model per million mapped fragments) of each homoeologous gene in G. barbadense and G. hirsutum (Table 2) fibers at different DPA. The relative expression level in the two allotetraploid cottons was compared (right). CesA1, CesA3 and CesA6 are implicated in primary cell wall biosynthesis, and CesA4, CesA7 and CesA8 are implicated in secondary cell wall biosynthesis. (b) Temporal expression patterns of secondary cell wall CesA genes (CesA7, CesA4 and CesA8 clades) in G. barbadense and G. hirsutum fiber. Note that the expression was generally delayed in G. barbadense fiber. X-axis: days post-anthesis. Y-axis: FPKM.

Table 2

Cellulose synthase (CesA) genes in four cotton species.

Homologs in G. arboreum and G. raimondii    Homologs in G. hirsutum    Homologs in G. barbadense

Genes related to the synthesis of cellulose in prototypical primary cell walls (CESA1, CESA3, CESA6 clades)

CESA1 Clade
no apparent ortholog                        Gh_A05G3959                no apparent ortholog
Gorai.009G010200.1                          Gh_D05G0077                GOBAR_DD32650
Cotton_A_16257                              Gh_A05G3967                GOBAR_AA25287
Gorai.009G009500.1                          Gh_D05G0084                GOBAR_DD32643

CESA3 Clade
Cotton_A_03319                              Gh_A08G0498                GOBAR_AA12453
Gorai.004G065900.1                          Gh_D08G0584                GOBAR_DD11497
Cotton_A_12985                              Gh_A08G1305                GOBAR_AA08823
Gorai.004G172400.1                          Gh_D08G1597                GOBAR_DD05460
Cotton_A_36448                              Gh_A02G1066                GOBAR_AA03569
Gorai.003G092600.2                          Gh_D03G0611                GOBAR_DD02554

CESA6 Clade
Cotton_A_25726                              Gh_A02G1317                GOBAR_AA16276
Gorai.003G049300.1                          Gh_D03G0455                GOBAR_DD10475
Cotton_A_33937                              Gh_A05G3694                GOBAR_AA32700
Gorai.009G255100.1                          Gh_D05G2313                GOBAR_DD35549
Cotton_A_40717                              Gh_A06G1017                GOBAR_AA04815
Gorai.010G134500.1                          Gh_D06G1219                GOBAR_DD30509
Cotton_A_35866                              Gh_A11G3209                GOBAR_AA22611
Gorai.002G150300.1                          Gh_D11G2235                GOBAR_DD13415
Cotton_A_12808                              no apparent ortholog       GOBAR_AA34523
no apparent ortholog                        Gh_D12G0885                GOBAR_DD19420

Genes related to the synthesis of cellulose in prototypical secondary cell walls (CESA4, CESA7, CESA8 clades)

CESA4 Clade
Cotton_A_05922                              Gh_A07G1871                GOBAR_AA03078
Gorai.001G238100.1                          Gh_D07G2083                GOBAR_DD07125
Cotton_A_03224                              Gh_A08G0421                no apparent ortholog
Gorai.004G057400.1                          Gh_D08G0509                GOBAR_DD02235

CESA7 Clade
Cotton_A_03538                              Gh_A07G0322                GOBAR_AA24905
Gorai.001G044700.1                          Gh_D07G0380                no apparent ortholog
Cotton_A_16254                              Gh_A05G3965                GOBAR_AA25282
Gorai.009G009700.1                          Gh_D05G0079                GOBAR_DD32648

CESA8 Clade
Gorai.009G161200.1                          Gh_D05G1460                GOBAR_AA30803*
Cotton_A_17521                              Gh_A10G0327                GOBAR_AA30381
Gorai.011G037900.1                          Gh_D10G0333                GOBAR_DD29521

*May have translocated.

Although not exclusively, plant CesAs have functionally diverged into two major classes responsible for either primary cell wall or secondary cell wall biosynthesis4243. Whereas spinnable cotton fiber evolved in the A-genome species and further developed in AD allotetraploids, the CesA gene family has not undergone expansion in any of the three cultivated cotton species sequenced. However, cotton fiber expresses many (at least 25) CesA genes (Fig. 5), demonstrating an enrichment of cellulose synthases in fiber cells. A comparison of the two allotetraploid cottons revealed that the secondary cell wall genes CesA4, CesA7 and CesA8 showed a delayed (>5 days) and more drastic up-regulation in G. barbadense fiber than in G. hirsutum fiber (Fig. 5), which indicates a prolonged duration of fiber elongation and a high activity of cellulose biosynthesis in the secondary cell wall formation stage. Additionally, this temporal expression pattern suggests that the functional allocation of CesA members to primary and secondary wall biosynthesis, which is primarily based on Arabidopsis research424346, is likely conserved in angiosperms. Thus, both the retention of CesA family members and the expression pattern of functionally specialized genes in G. barbadense support the formation of extra-long and high-grade cotton fiber.

Terpene synthases and the evolution of cotton phytoalexins

Terpenoids constitute a large family of natural compounds and play diverse roles in plant-environment interactions. Cotton plants accumulate a specialized group of cadinene-type sesquiterpenoids (including gossypol) that function as phytoalexins against pathogens and pests4748. However, these sesquiterpenoids also reduce the value of cotton seeds that are rich in oil and proteins. Terpene synthases (TPSs) are a family of enzymes responsible for the synthesis of various terpenes from the 10-, 15-, and 20-carbon precursors assembled from the 5-carbon building blocks of IPP and its isomer DMAPP49. A manual search of the G. barbadense genome with TPS N- and C-terminal domains (PF01397 and PF03936) identified 115 TPS genes, including 44 monoterpene, 59 sesquiterpene and 8 diterpene synthases, as well as 4 triterpene (squalene) synthases. This number is higher than that in T. cacao (43), Arabidopsis thaliana (34) and Vitis vinifera (98) and similar to that in G. hirsutum (110) but slightly less than twice that in G. raimondii (69). The cotton sesquiterpene synthase (+)-δ-cadinene synthase (CDN) catalyzes the first step of gossypol biosynthesis50. The G. barbadense genome harbors 19 CDN family genes (sharing >80% nucleotide identity), whereas G. raimondii, G. arboreum and G. hirsutum harbor 11, 14 and 13 of these genes, respectively (Fig. 6 and Supplementary Table 18). These genes evolved faster than cotton speciation; thus, the CDN family evolved approximately 60 Mya based on the phylogenetics of cotton plants (Fig. 1). The CDN subfamilies A and E were found closer to the ancient type and duplicated after the divergence of the cotton and cacao lineages (Fig. 6 and Supplementary Fig. 14). The variable CDN gene numbers in cotton species possibly refer to recent small-scale duplication events, e.g., CDN-A member duplication in the D genome ~1 Mya (Supplementary Table 18 and Supplementary Fig. 14). 
Thus, the CDN subfamilies in Gossypium represent an example of the rapid lineage-specific evolution of critical genes for specialized metabolites.
Figure 6

Phylogenetic analysis of (+)-δ-cadinene synthase (CDN) family genes and their genome distribution.

(a) The amino acid sequences of CDNs of G. arboreum (Cotton_A), G. raimondii (Gorai), G. hirsutum (Gh) and G. barbadense (GOBAR) and T. cacao (Thecc) were used to build the phylogenetic tree using a neighbor-joining algorithm via the MEGA software. The Arabidopsis thaliana sesquiterpene synthase gene At5g23960 was used as a phylogenetic outgroup. (b) Chromosomal locations of the CDN genes in four Gossypium species as indicated.

Discussion

ELS cotton likely produces one of the most resilient fibers in the plant kingdom; they are highly elongated and contain nearly pure cellulose. This draft sequence of the G. barbadense genome provides valuable genomic resources for studying various aspects of cotton. This draft sequence also facilitates breeding practices aimed at improving cotton fiber traits and increasing the production of high-quality biomass (cellulose). The genomes of two or more parental species have combined to significantly change the genome structure and function of allopolyploid plants385152. Inter-genomic chromosomal rearrangements, differential gene loss (the loss of some duplicates), gene conversion, divergence and the functional diversification of duplicated genes often arise with the onset of polyploidization53. Our comparative analysis of cotton genomes also provides new insight into dynamic allopolyploidy processes, such as the mechanism via TE (LTR retrons) bursts and pseudogenization, which have significantly contributed to plant genome evolution and trait formation.

Methods

Plant materials

Young leaves of Gossypium barbadense cv. Xinhai21, G. arboreum cv. Qingyangxiaozi and G. raimondii were collected from a single plant of each species for genomic DNA extraction and sequencing. For transcriptome sequencing, 28 samples from G. barbadense roots, stems, flowers, leaves, ovules and fibers were collected for total RNA extraction (Supplementary Table 19).

DNA isolation, library construction and sequencing

Genomic DNA was isolated from fresh cotton leaves using a previously described method54. The shotgun library (300–800 bp fragments) was prepared from 5 μg of DNA using a standard protocol, and a total of 55,296,227 reads with an average length of 542 bp were produced via Roche 454 GS FLX to provide 12-fold coverage of the genome. Paired-end libraries of different insert sizes were constructed, and 1,325,215,140 pairs of 100-bp reads were produced via Illumina Hiseq2000 (Illumina, San Diego, CA) to provide 105-fold coverage of the genome. The 3-, 5-, 8- and 20-kb mate-pair libraries were constructed by combining the GS FLX and Illumina mate-pair protocols, and a total of 773,715,534 mate-pair reads were produced via Illumina Hiseq2000 to provide 61.9-fold sequencing coverage. The BAC library (insert, 80–120 kb) was constructed using the pCC1BAC vector (Epicentre Inc.) and HindIII enzyme digestion. The BAC clones were sequenced from both ends using ABI 3730, and 20 BACs at a time were pooled, prepared as a 300-bp paired-end library and sequenced on Illumina Hiseq2000. For the PacBio library construction and sequencing, genomic DNA was sheared using a Covaris g-TUBE followed by purification via binding to pre-washed AMPure XP beads (Beckman Coulter Inc.). After end-repair, the blunt adapters were ligated, followed by exonuclease incubation to remove all un-ligated adapters and DNA. The final “SMRT bells” were annealed with primers and bound to the proprietary polymerase using the PacBio DNA/Polymerase Binding Kit P4 (Part Number 100–236–500) to form the “Binding Complex”. After dilution, the library was loaded onto the instrument with DNA Sequencing Kit 2.0 (Part Number 100–216–400) and a SMRT Cell 8Pac for sequencing. A primary filtering analysis was performed on the RS instrument, and the secondary analysis was performed using the SMRT analysis pipeline version 2.1.0.
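
As a consistency check on the read counts and fold-coverage values quoted above, the following minimal Python sketch applies the usual relation depth = (number of reads × read length) / genome size and solves it for the genome size implied by each data set. The function names are illustrative and are not part of the original pipeline; the mate-pair libraries are left out because their read length is not stated.

```python
def sequencing_depth(n_reads: int, read_len_bp: float, genome_size_bp: float) -> float:
    """Fold coverage = total sequenced bases / genome size."""
    return n_reads * read_len_bp / genome_size_bp


def implied_genome_size(n_reads: int, read_len_bp: float, depth: float) -> float:
    """Genome size (bp) implied by a read count, read length and reported depth."""
    return n_reads * read_len_bp / depth


# Roche 454: 55,296,227 reads, average length 542 bp, reported 12-fold coverage.
size_454 = implied_genome_size(55_296_227, 542, 12)

# Illumina paired-end: 1,325,215,140 pairs of 100-bp reads (two reads per pair),
# reported 105-fold coverage.
size_illumina = implied_genome_size(2 * 1_325_215_140, 100, 105)

print(f"454-implied genome size:      {size_454 / 1e9:.2f} Gb")
print(f"Illumina-implied genome size: {size_illumina / 1e9:.2f} Gb")
```

Both data sets imply a genome size of roughly 2.5 Gb, so the two reported coverage figures are mutually consistent.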

Genome assembly

The genomes of two diploid cotton species, G. arboreum and G. raimondii, were each sequenced at 100-fold coverage using Illumina Hiseq2000. The assembly resulted in 3,767,593 contigs of 1.5 Gb for G. arboreum and 1,111,300 contigs of 788 Mb for G. raimondii. These contigs, together with the published genomic data of G. raimondii10 and G. arboreum12, were used as templates for grouping the G. barbadense sequencing reads into subgenomes. In total, 44.9% of the reads were At-unique, 26.9% were Dt-unique and 9.7% were shared by both subgenomes. The remaining 18.5% of the reads, which had no hit, were assigned to subgenomes during sequence assembly. After subgenome grouping, the At and Dt subgenomes of G. barbadense were assembled separately using a combined strategy. The Roche 454 reads were first assembled using Newbler v2.3. In total, 773,548 contigs with an average length of 2.5 kb were produced. Illumina paired-end reads, mate-pair reads, PacBio SMRT reads and BAC ends were then successively mapped to the contigs to improve quality. The 59,868 contigs (BACtigs) with an N50 of 23.8 kb from 515 BAC pools were merged. These approaches resulted in 4,586 At scaffolds and 2,186 Dt scaffolds with a total size of 2.2 Gb and a maximum length of 3.4 Mb. Data statistics are given in Supplementary Table 2 and Table 1. Finally, a high-density genetic map of G. hirsutum cv. TM-1 × G. barbadense cv. Hai7124 containing 4,999,048 SNPs22 was mapped to the G. barbadense assembly using the BWA program, which anchored 1.95 Gb or 88% of the assembled sequences and produced 26 pseudo-molecules (chromosomes).
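
The N50 quoted for the BACtigs is a standard contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembled length. The sketch below shows one common way to compute it; the function name and the toy contig lengths are illustrative and do not come from the assembly itself.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first contig that brings the running sum to >= 50% of the total.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable: the running sum always reaches the total")


# Toy example with made-up lengths (bp).
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))
```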

Gene prediction and annotation

Three gene prediction programs, GeneMark (v2.3a)54, Augustus (v2.5)55 and FgeneSH56, were used to predict protein-coding genes in the G. barbadense genome. A final gene model was produced by combining the three prediction results with an in-house developed program (GLAD), a tool that creates consensus gene lists by integrating evidence from homology, de novo prediction, and RNA-Seq/EST data. Annotation was performed by comparing the predicted proteins with the NCBI non-redundant protein database (nr) and the UniProt and KEGG databases. Blast2go57 was used to assign preliminary GO terms to the predicted gene models. Transcription factors were predicted using PlantTFDB v3.0 (ref. 58). Protein domain predictions were performed using RPS-BLAST with a coverage >90%. The metabolic pathways were constructed using the KEGG database59.

Ortholog identification and Ks calculation

Genes were classified into ortholog groups with OrthoMCL60 against OrthoMCL proteins (default parameters) [PMID: 12952885]. The orthologs between species, or homoeologs between the At and Dt subgenomes of G. barbadense, were defined using BLASTP based on the Bidirectional Best Hit (BBH) method with a sequence coverage >30% and identity >30%, followed by selection of the best match. The Ka and Ks between orthologs were calculated using the KaKs_Calculator61 via model averaging. Genes unique to each subgenome were defined by two criteria: (1) the protein sequence had no match in a BLASTP search against the proteins of the other subgenome at an E-value cutoff of 1E-3; and (2) the summed length of the high-scoring segment pairs (HSPs) from a BLASTN search against the genome sequence of the other subgenome was less than 1/3 of the CDS length.
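
The Bidirectional Best Hit step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the `Hit` record, the function names and the use of bit score to rank hits are assumptions, and the gene identifiers in the example are invented. Only the >30% identity and >30% coverage thresholds are taken from the text.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One BLASTP hit (hypothetical record layout)."""
    query: str
    subject: str
    identity: float  # percent identity
    coverage: float  # percent of the query sequence covered
    bitscore: float


def best_hits(hits: Iterable[Hit],
              min_identity: float = 30.0,
              min_coverage: float = 30.0) -> Dict[str, str]:
    """Map each query to its top-scoring subject among hits above both thresholds."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        # The text requires coverage >30% and identity >30% (strict inequalities).
        if hit.identity <= min_identity or hit.coverage <= min_coverage:
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def bidirectional_best_hits(a_vs_b: Iterable[Hit],
                            b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Invented example: At genes searched against Dt genes and vice versa.
at_vs_dt = [
    Hit("At1", "Dt1", 95.0, 90.0, 500.0),
    Hit("At1", "Dt2", 80.0, 85.0, 300.0),
    Hit("At2", "Dt2", 25.0, 90.0, 100.0),  # fails the identity threshold
]
dt_vs_at = [
    Hit("Dt1", "At1", 95.0, 90.0, 500.0),
    Hit("Dt2", "At1", 80.0, 85.0, 300.0),
]
pairs = bidirectional_best_hits(at_vs_dt, dt_vs_at)
print(pairs)
```

In this example Dt2's best hit is At1, but At1's best hit is Dt1, so only the reciprocal At1/Dt1 pair is retained.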

Repeat and LTR retrotransposon analysis

Repetitive sequences were identified using RepeatScout with default parameters. The consensus sequences of each repeat family were used to identify repeat compositions in the genome via Censor. The complete LTR retrotransposon structures were predicted using LTR_finder62, and miniature inverted-repeat transposable elements (MITEs) were identified using MITE-Hunter63. Individual LTR retrotransposons were clustered into the same family using the 80–80–80 rule: If two TIR sequences share 80% or higher similarity in at least 80% of their length with an alignment length longer than 80 bp, the two sequences were clustered into the same family64. The insertion age of each full-length LTR retrotransposon was calculated from the divergence between its left and right solo-LTR sequences, estimated using distmat from EMBOSS with the Kimura-2-parameter distance option, according to the formula T = K/(2r) (K = Kimura distance value, average substitution rate r = 2.6 × 10^-9 in cotton).
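
The insertion-age formula T = K/(2r) can be evaluated directly, as in the minimal sketch below. The function names are illustrative, and the rate is assumed to be expressed per site per year, which the text implies by reporting ages in Mya but does not state explicitly.

```python
SUBSTITUTION_RATE = 2.6e-9  # r from the text; assumed to be per site per year


def insertion_age_years(kimura_distance: float, rate: float = SUBSTITUTION_RATE) -> float:
    """Insertion age T = K / (2r) for a full-length LTR retrotransposon.

    The factor of 2 reflects that both LTRs of one element accumulate
    substitutions independently after insertion.
    """
    if kimura_distance < 0:
        raise ValueError("Kimura distance must be non-negative")
    if rate <= 0:
        raise ValueError("substitution rate must be positive")
    return kimura_distance / (2 * rate)


def insertion_age_mya(kimura_distance: float, rate: float = SUBSTITUTION_RATE) -> float:
    """Same as insertion_age_years, expressed in millions of years."""
    return insertion_age_years(kimura_distance, rate) / 1e6


# An illustrative LTR pair with K = 0.104 corresponds to an insertion about 20 Mya.
print(f"{insertion_age_mya(0.104):.1f} Mya")
```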

Pseudogene identification

Pseudogenes were predicted using Pseudopipe65 with default parameters. The predicted protein-coding gene sequences from both G. barbadense subgenomes were used as queries to search repeat-masked intergenic regions. Putative pseudogenes were filtered by excluding genes that significantly overlapped with functional gene annotations, genes with parental genes annotated as transposon elements or plastid genes, and genes with sequence lengths shorter than 150 bp.
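
The three exclusion criteria above amount to a simple filter over candidate records, sketched here in Python. The `Candidate` fields, the parent-class labels and in particular the 0.5 overlap cutoff are hypothetical: the text says only that candidates "significantly" overlapping functional genes were excluded and gives no numeric threshold. The 150-bp minimum length is taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List

EXCLUDED_PARENT_CLASSES = {"transposon", "plastid"}


@dataclass(frozen=True)
class Candidate:
    """A putative pseudogene (hypothetical record layout)."""
    name: str
    length_bp: int
    parent_class: str        # e.g. "gene", "transposon" or "plastid"
    overlap_fraction: float  # fraction of the candidate overlapping an annotated gene


def keep_candidate(candidate: Candidate,
                   min_length_bp: int = 150,
                   max_overlap: float = 0.5) -> bool:
    """Apply the three exclusion rules; max_overlap is an assumed cutoff."""
    if candidate.length_bp < min_length_bp:
        return False
    if candidate.parent_class in EXCLUDED_PARENT_CLASSES:
        return False
    if candidate.overlap_fraction > max_overlap:
        return False
    return True


def filter_candidates(candidates: Iterable[Candidate]) -> List[Candidate]:
    return [c for c in candidates if keep_candidate(c)]


# Invented records for illustration.
examples = [
    Candidate("psg1", 420, "gene", 0.0),
    Candidate("psg2", 120, "gene", 0.0),        # too short
    Candidate("psg3", 900, "transposon", 0.0),  # parent gene is a transposable element
    Candidate("psg4", 600, "gene", 0.8),        # overlaps a functional gene annotation
]
kept = [c.name for c in filter_candidates(examples)]
print(kept)
```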

RNA extraction and transcriptome sequencing

The total RNA from each sample was extracted using TRIzol reagent (Invitrogen) following a standard protocol. The mRNAs were purified with the MicroPoly(A) Purist Kit (Ambion), fragmented and converted into an RNA-Seq library using the mRNAseq library construction kit (Illumina Inc.) and sequenced via Illumina Hiseq2000. The mRNAs of 28 samples were also pooled and sequenced on the 454 Genome Sequencer FLX instrument. Sequence reads from all samples were cleaned using the FASTX toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). All reads containing ‘N’ were discarded. Adapter sequences were then removed using the fastx_clipper program, followed by the removal of low-quality (Q < 5) bases from the 3′ end with fastq_quality_trimmer while requiring a minimum sequence length of 50 bp. The RNA-Seq reads of each sample were mapped to the At and Dt genes using Bowtie2 (ref. 66), allowing no mismatches in the seed alignment. Differentially expressed genes were identified via the DEGseq package using the MARS method (MA-plot-based method with Random Sampling model)67 based on their RPKM (Reads Per Kilobase per Million mapped reads) or FPKM (Fragments Per Kilobase of exon model per Million mapped reads) values68, with FDR ≤ 0.001 and |log2 ratio| ≥ 1 as the thresholds. KEGG pathway enrichment was performed with a corrected P-value of < 0.05 as a threshold. GO enrichment was performed using Blast2go57.
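
For readers unfamiliar with the expression measure, the sketch below computes RPKM from a raw read count and then applies the two cutoffs stated above (FDR ≤ 0.001 and |log2 ratio| ≥ 1). It is an illustration only: the function names are hypothetical, the FDR is taken as an input because it is computed by DEGseq rather than by this code, and treating a zero RPKM as failing the cutoff is an assumption.

```python
import math


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def passes_thresholds(rpkm_a: float, rpkm_b: float, fdr: float,
                      max_fdr: float = 0.001, min_abs_log2: float = 1.0) -> bool:
    """True if a gene meets both the FDR and the |log2 ratio| cutoffs."""
    if rpkm_a <= 0 or rpkm_b <= 0:
        # Assumption: the log ratio is undefined when either value is zero.
        return False
    return fdr <= max_fdr and abs(math.log2(rpkm_a / rpkm_b)) >= min_abs_log2


# 100 reads on a 1-kb gene in a library of one million mapped reads.
value = rpkm(100, 1000, 1_000_000)
print(value)

# A four-fold difference with FDR 0.0005 passes; a 1.5-fold difference does not.
print(passes_thresholds(40.0, 10.0, 0.0005))
print(passes_thresholds(15.0, 10.0, 0.0005))
```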

Additional Information

Accession numbers: The G. barbadense genome assembly contigs and scaffolds have been deposited in GenBank under PRJNA251673. The sequences and functional annotation of G. barbadense protein-encoding genes, including predicted genes and transcriptome data, are available from the website (http://database.chgc.sh.cn/cotton/index.html). How to cite this article: Liu, X. et al. Gossypium barbadense genome sequence provides insight into the evolution of extra-long staple fiber and specialized metabolites. Sci. Rep. 5, 14139; doi: 10.1038/srep14139 (2015).