Xia Liu1, Bo Zhao2, Hua-Jun Zheng3, Yan Hu4, Gang Lu3, Chang-Qing Yang2, Jie-Dan Chen4, Jun-Jian Chen1, Dian-Yang Chen2, Liang Zhang3, Yan Zhou3,5, Ling-Jian Wang2, Wang-Zhen Guo4, Yu-Lin Bai1, Ju-Xin Ruan2, Xiao-Xia Shangguan2, Ying-Bo Mao2, Chun-Min Shan2, Jian-Ping Jiang3, Yong-Qiang Zhu3, Lei Jin3, Hui Kang3, Shu-Ting Chen3, Xu-Lin He3, Rui Wang3, Yue-Zhu Wang3, Jie Chen3, Li-Jun Wang3, Shu-Ting Yu3, Bi-Yun Wang3, Jia Wei3, Si-Chao Song3, Xin-Yan Lu3, Zheng-Chao Gao3, Wen-Yi Gu3, Xiao Deng6, Dan Ma4, Sen Wang4, Wen-Hua Liang4, Lei Fang4, Cai-Ping Cai4, Xie-Fei Zhu4, Bao-Liang Zhou4, Z Jeffrey Chen4,7, Shu-Hua Xu8, Yu-Gao Zhang1, Sheng-Yue Wang3, Tian-Zhen Zhang4, Guo-Ping Zhao2,3,5, Xiao-Ya Chen2.
Abstract
Of the two cultivated species of allopolyploid cotton, Gossypium barbadense produces extra-long fibers for the production of superior textiles. We sequenced its (AD)2 genome and performed a comparative analysis. We identified three bursts of retrotransposons from 20 million years ago (Mya) and a genome-wide uneven pseudogenization peak at 11–20 Mya, which likely contributed to genomic divergence. Among the 2,483 genes preferentially expressed in fiber, a cell elongation regulator, PRE1, is strikingly At-biased and fiber-specific, echoing the A-genome origin of spinnable fiber. The expansion of the PRE family suggests a genetic factor underlying fiber elongation. Mature cotton fiber consists of nearly pure cellulose. G. barbadense and G. hirsutum contain 29 and 30 cellulose synthase (CesA) genes, respectively. Whereas most of these genes (>25) are expressed in fiber, genes for secondary cell wall biosynthesis showed a delayed and higher degree of up-regulation in G. barbadense than in G. hirsutum, conferring an extended elongation stage and highly active secondary wall deposition during extra-long fiber development. The rapid diversification of sesquiterpene synthase genes in the gossypol pathway exemplifies the chemical diversity of lineage-specific secondary metabolites. The G. barbadense genome advances our understanding of allopolyploidy and will help improve cotton fiber quality.
Year: 2015 PMID: 26420475 PMCID: PMC4588572 DOI: 10.1038/srep14139
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. A schematic map of the evolution of allotetraploid cottons.
Allotetraploid cotton evolved from natural hybridization between A- and D-genome species and has since diversified into six species, including the widely cultivated G. barbadense (AD2) and G. hirsutum (AD1). Evolutionary time (in Mya) is indicated on a numbered axis. Major evolutionary events are represented by arrows and summarized in boxes. A black star indicates a retrotransposon burst, and a red star indicates a peak in pseudogene formation. Gr, G. raimondii, a diploid species (D5); Gb, G. barbadense; Gh, G. hirsutum. Mature cotton fiber is shown for extra-long staple (ELS) cotton (G. barbadense, AD2) and Upland cotton (G. hirsutum, AD1).
Statistics of G. barbadense genome features.
| Feature | At subgenome | Dt subgenome |
|---|---|---|
| Genome Size (bp) | 1,394,663,696 | 775,997,401 |
| Gene Number | 40,502 | 37,024 |
| Ave. Gene Size (bp) | 2,601 | 2,553 |
| Total Gene Region (bp) | 123,247,562 | 104,783,505 |
| Ave. CDS Size (bp) | 1,099 | 1,111 |
| Max. CDS Length (bp) | 19,647 | 16,596 |
| Total Coding Region (bp) | 52,095,402 | 45,586,340 |
| Total Exon Number | 240,755 | 208,290 |
| Exon Number per Gene | 5 | 5 |
| Ave. Exon Size (bp) | 216 | 219 |
| Max. Exon Size (bp) | 5,651 | 6,031 |
| Total Intron Number | 193,370 | 167,253 |
| Ave. Intron Size (bp) | 368 | 354 |
| Max. Intron Size (bp) | 85,091 | 86,599 |
Figure 2. G. barbadense genome atlas and chromosome-level translocations.
(a) Genome atlas. The outermost circle represents the numbered chromosomes of At and Dt, with chromosome sizes marked on a scale. Moving inward, the next three tracks show gene, pseudogene and repeat densities (calculated in 1 Mb windows) along the chromosomes. The central ribbon links show collinearity between At and Dt. (b,c) Chromosomal translocations. Translocations between chromosomes 2 and 3 of either At or Dt are indicated with blue lines (b), and those between chromosomes 4 and 5 with blue and purple lines (c). The vertical colored lines, from left to right, represent chromosomes. The PRE1 loci implicated in fiber cell elongation are marked in red on chromosomes A05 and D04. The digits (01 to 13) after A, D or Gr denote chromosome numbers in the At/Dt subgenome of G. barbadense or in G. raimondii, respectively.
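The density tracks in panel (a) bin features along each chromosome into 1 Mb windows. Below is a minimal sketch of that binning. It assumes each feature is assigned to the window containing its start coordinate. The caption does not state the exact rule, and overlap-based counting would be an equally plausible choice. The function name and the coordinates are hypothetical.

```python
from collections import Counter

WINDOW_BP = 1_000_000  # 1 Mb windows, as stated in the Figure 2a caption


def window_density(starts, chrom_length, window=WINDOW_BP):
    """Count features whose start coordinate falls in each fixed-size window.

    starts: 0-based start positions of genes, pseudogenes or repeats on one
    chromosome (hypothetical input). Returns one count per window; the last
    window may be shorter than `window`.
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = Counter(pos // window for pos in starts if 0 <= pos < chrom_length)
    return [counts.get(i, 0) for i in range(n_windows)]


# Hypothetical feature starts on a 2.5 Mb chromosome
print(window_density([10, 999_999, 1_000_000, 2_400_000], 2_500_000))  # [2, 1, 1]
```

Running the same function separately over gene, pseudogene and repeat coordinates would give the three tracks.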
Figure 3. Evolutionary analysis of the G. barbadense genome.
(a) Ks distribution of orthologs in cotton genomes. Data are grouped into 0.001 Ks units. (b) Ks distribution of paralogs in the G. barbadense genome. Data are grouped into 0.01 Ks units, and the peak region corresponds to 50–70 Mya. (c) Distribution of insertion times of LTR retrotransposons in the G. barbadense genome. Retrotransposon bursts are separated by dashed lines. (d) Ks distribution of pseudogenes paired with their closest functional paralogs. Data are grouped into 0.001 Ks units. The genomes of allotetraploid cottons are labeled At/Dt, and the genomes of G. arboreum (A2) and G. raimondii (D5) are labeled Ga and Gr.
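Panels (a), (b) and (d) bin Ks values into fixed-width units and read divergence times off the peaks. The usual molecular-clock conversion is T = Ks / (2r), where r is the synonymous substitution rate per site per year. LTR insertion times (panel c) are commonly estimated the same way, using the divergence between the two LTRs of one element. Below is a minimal sketch. The rate the authors used is not given in these captions, so `rate_per_site_per_year` is a caller-supplied assumption. The function names and the Ks values are hypothetical.

```python
import math
from collections import Counter


def ks_histogram(ks_values, bin_width):
    """Group Ks values into bins of fixed width (e.g. 0.001 or 0.01 Ks units).

    Returns {lower bin edge: count}, ordered by bin.
    """
    # Round before flooring so that, e.g., 0.29 / 0.01 = 28.999... lands in bin 29
    counts = Counter(math.floor(round(ks / bin_width, 9)) for ks in ks_values)
    # Key each bin by its lower edge, rounded to suppress floating-point noise
    return {round(b * bin_width, 6): n for b, n in sorted(counts.items())}


def peak_bin(hist):
    """Lower edge of the most populated bin."""
    return max(hist, key=hist.get)


def ks_to_mya(ks, rate_per_site_per_year):
    """Molecular-clock divergence time T = Ks / (2r), in millions of years."""
    return ks / (2 * rate_per_site_per_year) / 1e6


hist = ks_histogram([0.551, 0.553, 0.559, 0.612], 0.01)  # hypothetical Ks values
print(hist)            # {0.55: 3, 0.61: 1}
print(peak_bin(hist))  # 0.55
# With a hypothetical rate of 1e-8 substitutions/site/year, Ks = 0.6 maps to about 30 Mya
print(ks_to_mya(0.6, 1e-8))
```

The factor of 2 reflects that substitutions accumulate independently along both lineages after a split.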
Figure 4. Expansion and diversification of PRE genes in Gossypium.
(a) Phylogenetic analysis of PRE family genes in Amborella trichopoda, Arabidopsis thaliana, G. raimondii and G. barbadense. Subfamilies are shaded in different colors, and curved dotted lines connect homoeologous gene pairs expressed in fiber. (b) GbPRE1 (GOBAR_AA33780, GOBAR_DD03693) is a fiber-specific gene with strongly At-biased expression. Expression levels (RPKM) are shown for ovules (0 DPA) and fiber cells (5, 10, 20 and 30 DPA). Detailed expression data are provided in Supplementary Table 10. (c) Hierarchical clustering of PRE gene expression in G. barbadense. LB, leaf bud; YL, young leaf; ML, mature leaf; O, ovule; F, fiber; DPA, days post-anthesis.
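Panel (b) reports RPKM values for the At and Dt homoeologs of GbPRE1. The standard RPKM definition is reads × 10^9 / (gene length in bp × total mapped reads). One common way to express homoeolog bias is a log2 ratio of At to Dt expression. Below is a minimal sketch of both. The pseudocount and the log2-ratio criterion are assumptions, not necessarily how the authors quantified bias. The function names and numbers are hypothetical.

```python
import math


def rpkm(mapped_reads, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads."""
    return mapped_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def homoeolog_bias(at_rpkm, dt_rpkm, pseudocount=1.0):
    """log2((At + c) / (Dt + c)); positive values indicate At-biased expression.

    The pseudocount c (an assumption) avoids division by zero for silent copies.
    """
    return math.log2((at_rpkm + pseudocount) / (dt_rpkm + pseudocount))


print(rpkm(500, 1000, 10_000_000))  # 50.0 (hypothetical counts)
print(homoeolog_bias(63, 1))        # 5.0, i.e. At copy ~32x the Dt copy
```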
Figure 5. Cotton CesA genes and their expression in developing fiber cells of G. barbadense and G. hirsutum.
(a) CesA genes from four cotton species were clustered (left) with MEGA5 using the maximum likelihood method. G. arboreum (Cotton_A) and G. raimondii (Gorai) contain 14 and 15 CesA genes, respectively, shown in the left column. The heat map (middle) shows the transcript level (FPKM, fragments per kilobase of exon model per million mapped fragments) of each homoeologous gene in G. barbadense and G. hirsutum fibers (Table 2) at different DPA. The relative expression levels in the two allotetraploid cottons are compared on the right. CesA1, CesA3 and CesA6 are implicated in primary cell wall biosynthesis, and CesA4, CesA7 and CesA8 in secondary cell wall biosynthesis. (b) Temporal expression patterns of secondary cell wall CesA genes (CesA7, CesA4 and CesA8 clades) in G. barbadense and G. hirsutum fibers. Note that expression was generally delayed in G. barbadense fiber. X-axis: days post-anthesis (DPA). Y-axis: FPKM.
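"Delayed" expression in panel (b) can be made concrete by comparing, for each gene, the first sampled day at which FPKM reaches half of that gene's maximum. Below is a minimal sketch. The half-maximum criterion, the function name and the expression profiles are all hypothetical. They illustrate the comparison and are not data from the figure.

```python
def half_max_onset(dpa, fpkm):
    """Return the first sampled DPA at which FPKM reaches half its maximum.

    dpa and fpkm are parallel sequences ordered by increasing DPA.
    """
    if not fpkm or len(dpa) != len(fpkm):
        raise ValueError("dpa and fpkm must be non-empty and equal in length")
    threshold = max(fpkm) / 2
    for day, value in zip(dpa, fpkm):
        if value >= threshold:  # the maximum itself always qualifies
            return day


dpa = [5, 10, 15, 20, 25]       # hypothetical sampling days
upland = [2, 40, 120, 150, 140]  # hypothetical FPKM profile
els = [1, 5, 30, 90, 200]        # hypothetical, later-rising profile
print(half_max_onset(dpa, upland), half_max_onset(dpa, els))  # 15 25
```

A later onset day for one species' copy of the same clade is one simple readout of delayed up-regulation.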
Cellulose synthase (CesA) genes in four cotton species.
| G. arboreum (Cotton_A) / G. raimondii (Gorai) | G. hirsutum (Gh) | G. barbadense (GOBAR) |
|---|---|---|
| Genes related to the synthesis of cellulose in prototypical primary cell walls (CESA1, CESA3, CESA6 clades) | | |
| CESA1 Clade | ||
| no apparent ortholog | Gh_A05G3959 | no apparent ortholog |
| Gorai.009G010200.1 | Gh_D05G0077 | GOBAR_DD32650 |
| Cotton_A_16257 | Gh_A05G3967 | GOBAR_AA25287 |
| Gorai.009G009500.1 | Gh_D05G0084 | GOBAR_DD32643 |
| CESA3 Clade | ||
| Cotton_A_03319 | Gh_A08G0498 | GOBAR_AA12453 |
| Gorai.004G065900.1 | Gh_D08G0584 | GOBAR_DD11497 |
| Cotton_A_12985 | Gh_A08G1305 | GOBAR_AA08823 |
| Gorai.004G172400.1 | Gh_D08G1597 | GOBAR_DD05460 |
| Cotton_A_36448 | Gh_A02G1066 | GOBAR_AA03569 |
| Gorai.003G092600.2 | Gh_D03G0611 | GOBAR_DD02554 |
| CESA6 Clade | ||
| Cotton_A_25726 | Gh_A02G1317 | GOBAR_AA16276 |
| Gorai.003G049300.1 | Gh_D03G0455 | GOBAR_DD10475 |
| Cotton_A_33937 | Gh_A05G3694 | GOBAR_AA32700 |
| Gorai.009G255100.1 | Gh_D05G2313 | GOBAR_DD35549 |
| Cotton_A_40717 | Gh_A06G1017 | GOBAR_AA04815 |
| Gorai.010G134500.1 | Gh_D06G1219 | GOBAR_DD30509 |
| Cotton_A_35866 | Gh_A11G3209 | GOBAR_AA22611 |
| Gorai.002G150300.1 | Gh_D11G2235 | GOBAR_DD13415 |
| Cotton_A_12808 | no apparent ortholog | GOBAR_AA34523 |
| no apparent ortholog | Gh_D12G0885 | GOBAR_DD19420 |
| CESA4 Clade | ||
| Cotton_A_05922 | Gh_A07G1871 | GOBAR_AA03078 |
| Gorai.001G238100.1 | Gh_D07G2083 | GOBAR_DD07125 |
| Cotton_A_03224 | Gh_A08G0421 | no apparent ortholog |
| Gorai.004G057400.1 | Gh_D08G0509 | GOBAR_DD02235 |
| CESA7 Clade | ||
| Cotton_A_03538 | Gh_A07G0322 | GOBAR_AA24905 |
| Gorai.001G044700.1 | Gh_D07G0380 | no apparent ortholog |
| Cotton_A_16254 | Gh_A05G3965 | GOBAR_AA25282 |
| Gorai.009G009700.1 | Gh_D05G0079 | GOBAR_DD32648 |
| CESA8 Clade | ||
| Gorai.009G161200.1 | Gh_D05G1460 | GOBAR_AA30803 |
| Cotton_A_17521 | Gh_A10G0327 | GOBAR_AA30381 |
| Gorai.011G037900.1 | Gh_D10G0333 | GOBAR_DD29521 |
*May have translocated.
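The diploid totals in the Figure 5 caption (14 G. arboreum and 15 G. raimondii CesA genes) and the 30 G. hirsutum genes in the abstract can be checked against the table rows. Below is a minimal sketch. It assumes each gene ID is identified only by its prefix (Cotton_A_, Gorai., Gh_, GOBAR_) and that "no apparent ortholog" cells are skipped. The function name is hypothetical, and the rows are copied from the table above.

```python
from collections import Counter

# ID prefix -> species, following the labels used in the Figure 5 caption
PREFIXES = {"Cotton_A_": "G. arboreum", "Gorai.": "G. raimondii",
            "Gh_": "G. hirsutum", "GOBAR_": "G. barbadense"}

# Data rows of the CesA table (clade header rows omitted)
CESA_TABLE = """
| no apparent ortholog | Gh_A05G3959 | no apparent ortholog |
| Gorai.009G010200.1 | Gh_D05G0077 | GOBAR_DD32650 |
| Cotton_A_16257 | Gh_A05G3967 | GOBAR_AA25287 |
| Gorai.009G009500.1 | Gh_D05G0084 | GOBAR_DD32643 |
| Cotton_A_03319 | Gh_A08G0498 | GOBAR_AA12453 |
| Gorai.004G065900.1 | Gh_D08G0584 | GOBAR_DD11497 |
| Cotton_A_12985 | Gh_A08G1305 | GOBAR_AA08823 |
| Gorai.004G172400.1 | Gh_D08G1597 | GOBAR_DD05460 |
| Cotton_A_36448 | Gh_A02G1066 | GOBAR_AA03569 |
| Gorai.003G092600.2 | Gh_D03G0611 | GOBAR_DD02554 |
| Cotton_A_25726 | Gh_A02G1317 | GOBAR_AA16276 |
| Gorai.003G049300.1 | Gh_D03G0455 | GOBAR_DD10475 |
| Cotton_A_33937 | Gh_A05G3694 | GOBAR_AA32700 |
| Gorai.009G255100.1 | Gh_D05G2313 | GOBAR_DD35549 |
| Cotton_A_40717 | Gh_A06G1017 | GOBAR_AA04815 |
| Gorai.010G134500.1 | Gh_D06G1219 | GOBAR_DD30509 |
| Cotton_A_35866 | Gh_A11G3209 | GOBAR_AA22611 |
| Gorai.002G150300.1 | Gh_D11G2235 | GOBAR_DD13415 |
| Cotton_A_12808 | no apparent ortholog | GOBAR_AA34523 |
| no apparent ortholog | Gh_D12G0885 | GOBAR_DD19420 |
| Cotton_A_05922 | Gh_A07G1871 | GOBAR_AA03078 |
| Gorai.001G238100.1 | Gh_D07G2083 | GOBAR_DD07125 |
| Cotton_A_03224 | Gh_A08G0421 | no apparent ortholog |
| Gorai.004G057400.1 | Gh_D08G0509 | GOBAR_DD02235 |
| Cotton_A_03538 | Gh_A07G0322 | GOBAR_AA24905 |
| Gorai.001G044700.1 | Gh_D07G0380 | no apparent ortholog |
| Cotton_A_16254 | Gh_A05G3965 | GOBAR_AA25282 |
| Gorai.009G009700.1 | Gh_D05G0079 | GOBAR_DD32648 |
| Gorai.009G161200.1 | Gh_D05G1460 | GOBAR_AA30803 |
| Cotton_A_17521 | Gh_A10G0327 | GOBAR_AA30381 |
| Gorai.011G037900.1 | Gh_D10G0333 | GOBAR_DD29521 |
"""


def count_cesa(table_text):
    """Count gene IDs per species in pipe-delimited table rows."""
    counts = Counter()
    for line in table_text.splitlines():
        for cell in line.strip().strip("|").split("|"):
            cell = cell.strip()
            for prefix, species in PREFIXES.items():
                if cell.startswith(prefix):  # "no apparent ortholog" never matches
                    counts[species] += 1
    return counts


counts = count_cesa(CESA_TABLE)
print(counts["G. arboreum"], counts["G. raimondii"], counts["G. hirsutum"])  # 14 15 30
```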
Figure 6. Phylogenetic analysis of (+)-δ-cadinene synthase (CDN) family genes and their genome distribution.
(a) The amino acid sequences of CDNs from G. arboreum (Cotton_A), G. raimondii (Gorai), G. hirsutum (Gh), G. barbadense (GOBAR) and T. cacao (Thecc) were used to build a phylogenetic tree with the neighbor-joining algorithm in MEGA. The Arabidopsis thaliana sesquiterpene synthase gene At5g23960 was used as the outgroup. (b) Chromosomal locations of the CDN genes in the four Gossypium species indicated.
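Neighbor-joining builds a tree by repeatedly merging the pair of nodes that minimizes the Q criterion computed from a pairwise distance matrix. Below is a minimal textbook sketch of the Saitou and Nei algorithm. It does not reproduce MEGA's implementation, and it assumes distances have already been computed from the CDN alignment, a step not shown here. The taxon labels A to D and their distances are hypothetical. They describe an additive tree, so the known topology and branch lengths should be recovered.

```python
def build_matrix(pairs):
    """Turn {(x, y): distance} into a symmetric dict-of-dicts."""
    dist = {}
    for (x, y), value in pairs.items():
        dist.setdefault(x, {})[y] = value
        dist.setdefault(y, {})[x] = value
    return dist


def neighbor_joining(names, dist):
    """Unrooted neighbor-joining tree (Saitou and Nei) as a Newick string."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    d = {a: dict(dist[a]) for a in names}  # working copy, grows as nodes merge
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b)
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node u to a and b
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        d[u] = {}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = (d[a][c] + d[b][c] - d[a][b]) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Join the last three nodes at a single central node
    x, y, z = nodes
    lx = (d[x][y] + d[x][z] - d[y][z]) / 2
    ly = (d[x][y] + d[y][z] - d[x][z]) / 2
    lz = (d[x][z] + d[y][z] - d[x][y]) / 2
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"


# Hypothetical additive distances for the tree ((A:1,B:2):3,(C:4,D:5))
dist = build_matrix({("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 9,
                     ("B", "C"): 9, ("B", "D"): 10, ("C", "D"): 9})
print(neighbor_joining(list("ABCD"), dist))  # (C:4,D:5,(A:1,B:2):3);
```

The result is unrooted. Rooting it on an outgroup such as At5g23960, as described in the caption, is a separate step.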