Daojun Yuan1, Zhonghui Tang1,2, Maojun Wang1, Wenhui Gao1, Lili Tu1, Xin Jin1, Lingling Chen1,3, Yonghui He1, Lin Zhang1, Longfu Zhu1, Yang Li1, Qiqi Liang1, Zhongxu Lin1, Xiyan Yang1, Nian Liu1, Shuangxia Jin1, Yang Lei3, Yuanhao Ding1, Guoliang Li1,2,3, Xiaoan Ruan1,2, Yijun Ruan1,2,4, Xianlong Zhang1.
Abstract
Gossypium hirsutum accounts for most cotton fibre production, but G. barbadense is valued for its better overall resistance and superior fibre properties. However, the allotetraploid genome of G. barbadense has not been comprehensively analysed. Here we present a high-quality assembly of the 2.57-gigabase genome of G. barbadense, including 80,876 protein-coding genes. The A (or At) subgenome (1.50 Gb) is nearly twice the size of the D (or Dt) subgenome (853 Mb), a difference that resulted primarily from the expansion of Gypsy elements, including the Peabody and Retrosat2 subclades in the Del clade and the Athila subclade in the Athila/Tat clade. Substantial gene expansion and contraction were observed, and many homoeologous gene pairs with biased expression patterns were identified, suggesting that abundant gene sub-functionalization occurred through allopolyploidization. More specifically, the CesA gene family has adopted temporally distinct expression patterns, suggesting an integrated regulatory mechanism in which CesA genes from the At and Dt subgenomes drive the primary and secondary cellulose biosynthesis of cotton fibre in a "relay race"-like fashion. We anticipate that the G. barbadense genome sequence will advance our understanding of the mechanism of genome polyploidization and underpin genome-wide comparative research in this genus.
Year: 2015 PMID: 26634818 PMCID: PMC4669482 DOI: 10.1038/srep17662
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Characteristics of the G. barbadense genome.
| | Whole genome | Allocated to At-subgenome | Allocated to Dt-subgenome | Ungrouped |
|---|---|---|---|---|
| Assembly | ||||
| Scaffold N50 (Mb) | 0.260 | 0.253 | 0.306 | 0.157 |
| Maximum scaffold length (Mb) | 2.15 | 1.63 | 2.15 | 0.96 |
| Minimum scaffold length (Mb) | 0.001 | 0.001 | 0.001 | 0.001 |
| Number of scaffolds | 29,751 | 14,319 | 6,967 | 8,465 |
| Total length of assemblies (Mb) | 2,573.19 | 1,493.53 | 852.98 | 226.68 |
| Total gaps in assemblies (Mb) | 334.55 | 223.47 | 82.35 | 28.72 |
| Annotation | ||||
| Number of protein-coding genes | 80,876 | 36,947 | 34,575 | 9,354 |
| Average gene density (per 100 kb) | 3.14 | 2.47 | 4.05 | 4.12 |
| Average exon/intron sizes (bp) | 283.40/423.41 | 283.07/434.49 | 281.81/422.92 | 290.92/449.44 |
| Total size of transposable elements (Mb) | 1,778.62 | 1,097.99 | 541.57 | 139.06 |
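The derived rows of the table can be cross-checked from its primary values. The following is a minimal sketch, not the authors' pipeline: it assumes that gene density is simply gene count divided by total assembly length (gaps included), and that a transposable-element fraction can be approximated as TE size over total assembly length. The names `TABLE`, `genes_per_100kb`, `te_fraction` and `summarize` are illustrative only.

```python
# Values copied from the table above; all lengths are in Mb.
TABLE = {
    "Whole genome": {"length_mb": 2573.19, "genes": 80876, "te_mb": 1778.62},
    "At-subgenome": {"length_mb": 1493.53, "genes": 36947, "te_mb": 1097.99},
    "Dt-subgenome": {"length_mb": 852.98, "genes": 34575, "te_mb": 541.57},
    "Ungrouped": {"length_mb": 226.68, "genes": 9354, "te_mb": 139.06},
}


def genes_per_100kb(genes: int, length_mb: float) -> float:
    """Gene density per 100 kb (1 Mb holds ten 100-kb windows).

    Assumption: density is computed over total assembly length, gaps included.
    """
    return genes / (length_mb * 10)


def te_fraction(te_mb: float, length_mb: float) -> float:
    """Share of the assembly annotated as transposable elements.

    Assumption: the denominator is total assembly length, gaps included.
    """
    return te_mb / length_mb


def summarize(table: dict) -> dict:
    """Return the two derived quantities for every column of the table."""
    return {
        name: {
            "genes_per_100kb": genes_per_100kb(row["genes"], row["length_mb"]),
            "te_fraction": te_fraction(row["te_mb"], row["length_mb"]),
        }
        for name, row in table.items()
    }


if __name__ == "__main__":
    for name, stats in summarize(TABLE).items():
        print(
            f"{name}: {stats['genes_per_100kb']:.2f} genes per 100 kb, "
            f"TE fraction {stats['te_fraction']:.1%}"
        )
    at, dt = TABLE["At-subgenome"], TABLE["Dt-subgenome"]
    print(f"At/Dt length ratio: {at['length_mb'] / dt['length_mb']:.2f}")
```

Under these assumptions the whole-genome, At and Dt densities reproduce the tabulated 3.14, 2.47 and 4.05 genes per 100 kb. The ungrouped column comes out at about 4.13 rather than the tabulated 4.12, which may reflect rounding or truncation in the source values.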
Figure 1. Assembling the allotetraploid genome of G. barbadense.
(a) Three-dimensional scatter plot with density contours showing the base-pair mapping coverage of shotgun sequencing reads derived from G. arboreum, G. raimondii and G. hirsutum on the assembled scaffolds of the G. barbadense tetraploid genome. (b) Examples of G. barbadense scaffolds assigned to At or Dt by mapping shotgun sequencing reads from diploid progenitor genomes (red tracks for the A genome clade; green track for the tetraploid AD1 genome; blue tracks for the D5 genome clade). The mapping density scale is the same (maximum 100) for all tracks. (c) Circos plot showing the genome-wide alignments between the TM-1 genome (ref. 11) and the two subgenomes of G. barbadense. I: syntenic alignments between At, Dt and the TM-1 genome; II: SNP density between G. barbadense and G. hirsutum (window size 1 Mb); III: SNP density in At (G. barbadense vs. G. arboreum) and in Dt (G. barbadense vs. G. raimondii); IV-VII: coverage of G. raimondii, G. herbaceum, G. arboreum and G. hirsutum reads mapped to the scaffolds of G. barbadense, respectively; VIII: gene density; IX: TE density; X: pseudo-chromosome length.
Figure 2. Expansions of LTR retrotransposons in the At subgenome of G. barbadense.
(a) Distribution of different classes of transposable elements. (b) LTR retrotransposons clustered into putative families according to reciprocal BLAST similarity. The Gossypium lineage-shared (I), divergent (II) and At lineage-specific (III) LTR retrotransposons are enlarged. (c) Insertion times of LTR retrotransposons in At and Dt.
Figure 3. Impact of allotetraploidization on genomic variation.
(a) An example of an At-Dt hybrid scaffold. The mapping tracks, order and density scale are the same as in Fig. 1b. (b) Evaluation of the assembly quality of a hybrid scaffold at the junction region. The y-axis indicates the fragment size of paired-end reads on a log scale; the x-axis indicates the length of the scaffold. As shown, single-end shotgun reads (blue, ~100 bp) and paired-end shotgun reads (white, ~300 bp) were clustered into contigs, which were further connected by abundant DNA-PET reads (size ranges of 5, 10 and 20 kb) in this scaffold. (c) Genome-wide distribution of inter-subgenome translocations. Using the D5 genome reference as a framework, all 77 putative translocation sites are indicated along the 13 chromosomes by triangles. (d) Lineage-specific SNP divergence in diploid genomes and tetraploid subgenomes. The right panel provides examples of genes affected by lineage-specific non-synonymous SNPs in the subgenomes of G. barbadense (two MYB genes carrying lineage-specific SNPs in At and two auxin-responsive factor genes carrying lineage-specific SNPs in Dt). The RNA-Seq tracks show the expression profiles of the four genes (sample details are listed in Supplementary Table 10), adjusted to the same scale.
Figure 4. Dynamic changes of genes and their expression in allotetraploid subgenomes of G. barbadense.
(a) Dot plot showing the number of orthologous genes in the subgenomes. The y-axis represents the log2 ratio of gene number changes of At versus Dt using the D5 reference genome as normalizer. The x-axis represents the number of orthologous gene clusters. Expanded orthologous gene clusters with biologically relevant functions (GO terms) are indicated by colored dots. (b) Representative enriched GO terms for orthologous gene clusters that are expanded in At (category I), unchanged (II) or expanded in Dt (III). The dotted red line indicates the GO enrichment cutoff at p-value = 0.01. (c) Phylogenetic tree of HD-Zip genes from G. barbadense, G. raimondii and Arabidopsis. A sub-clade is highlighted. The right panel shows the expression profiles of the corresponding HD-Zip genes. (d) Transcriptional divergence of homoeologous gene pairs in G. barbadense. The heat map shows normalized expression levels for 6,461 homoeologous gene pairs. (e) An example of a homoeologous gene pair and its differential expression pattern.
Figure 5. Subgenomic contribution of CesA functions to fibre development in allotetraploid G. barbadense.
(a) Phylogenetic tree of the CesA orthologous gene family from G. barbadense, G. raimondii and Arabidopsis. According to their function in Arabidopsis, CesAs were grouped into primary (P1 to P3) and secondary (S1 to S3) clusters in cell wall cellulose biosynthesis. In each cluster, the numbers of CesA genes are shown in parentheses in the order Arabidopsis, D5 genome, At and Dt. (b) Expression profiles of CesA genes in both At and Dt of G. barbadense. The expression profiles of the major contributing genes are indicated with their gene IDs. For the P3 group, the expression data of the eight CesA genes in At (upper left) and the seven CesA genes in Dt (upper right) are summed to show one expression profile per subgenome. (c) A proposed CesA "relay race" working model for fibre development in allotetraploid cotton.