Isobel A P Parkin, Chushin Koh, Haibao Tang, Stephen J Robinson, Sateesh Kagale, Wayne E Clarke, Chris D Town, John Nixon, Vivek Krishnakumar, Shelby L Bidwell, France Denoeud, Harry Belcram, Matthew G Links, Jérémy Just, Carling Clarke, Tricia Bender, Terry Huebert, Annaliese S Mason, J Chris Pires, Guy Barker, Jonathan Moore, Peter G Walley, Sahana Manoli, Jacqueline Batley, David Edwards, Matthew N Nelson, Xiyin Wang, Andrew H Paterson, Graham King, Ian Bancroft, Boulos Chalhoub, Andrew G Sharpe.
Abstract
BACKGROUND: Brassica oleracea is a valuable vegetable species that has contributed to human health and nutrition for hundreds of years and comprises multiple distinct cultivar groups with diverse morphological and phytochemical attributes. In addition to this phenotypic wealth, B. oleracea offers unique insights into polyploid evolution, as it results from multiple ancestral polyploidy events and a final Brassiceae-specific triplication event. Further, B. oleracea represents one of the diploid genomes that formed the economically important allopolyploid oilseed, Brassica napus. A deeper understanding of B. oleracea genome architecture provides a foundation for crop improvement strategies throughout the Brassica genus.
Year: 2014 PMID: 24916971 PMCID: PMC4097860 DOI: 10.1186/gb-2014-15-6-r77
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Assembly statistics for B. oleracea (a)
| | | |
|---|---|---|
| 33,459 | Number of bases | 488,574,611 |
| 4,900,790 | | |
| 169 | N50 size | 850,003 |
| 236 | N60 size | 654,695 |
| 321 | N70 size | 488,433 |
| 443 | N80 size | 328,342 |
| 659 | N90 size | 142,857 |
(a) Data are reported for all scaffolds greater than 200 bp.
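The N50 to N90 rows pair a scaffold count with a scaffold size. A minimal sketch of how such values are commonly derived is shown here, assuming the standard definition (the smallest set of longest scaffolds whose combined length reaches the given percentage of all assembled bases). The function name `nx_statistic` and the toy lengths are illustrative only and are not taken from the paper, which does not describe its own calculation.

```python
def nx_statistic(lengths, percent=50):
    """Return (count, size) for an Nx statistic such as N50 or N90.

    count: how many of the longest scaffolds are needed to reach
           `percent` of the total assembled bases.
    size:  length of the shortest scaffold in that set.
    Assumes the conventional Nx definition; hypothetical helper.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return count, length
    raise ValueError("no scaffold lengths supplied")


# Toy scaffold lengths (not real data): total is 300 bases.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50_count, n50_size = nx_statistic(toy_lengths, percent=50)
n90_count, n90_size = nx_statistic(toy_lengths, percent=90)
print(n50_count, n50_size)
print(n90_count, n90_size)
```

Read against the table, the row `169 | N50 size | 850,003` would then mean that the 169 longest scaffolds cover half of the assembly and the shortest of them is 850,003 bp long.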
Figure 1. Comparison of efficacy of GBS methods. (a) Distribution of restriction sites across the B. oleracea genome, representing potential tag sites for RAD (blue) and GBS (red). (b) Observed tag coverage for restriction sites within the B. oleracea genome for RAD (blue) and GBS (red). A sliding window of 500 kb was used and the trend line is based on the mean of 10 windows.
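The windowing described in this caption can be sketched as follows. This is only an illustration under stated assumptions: windows are treated as non-overlapping 500 kb bins, the trend line is taken as a simple moving mean over 10 consecutive windows, and positions are 0-based and smaller than the chromosome length. The caption does not specify these details, and the helper names are hypothetical.

```python
def window_counts(positions, chromosome_length, window=500_000):
    """Count sites per fixed-size window (assumed non-overlapping bins)."""
    n_windows = -(-chromosome_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        counts[pos // window] += 1
    return counts


def moving_mean(values, span=10):
    """Mean of each run of `span` consecutive windows (the trend line)."""
    return [
        sum(values[i:i + span]) / span
        for i in range(len(values) - span + 1)
    ]


# Toy restriction-site positions on a 1.5 Mb chromosome (not real data).
sites = [0, 499_999, 500_000, 1_200_000]
counts = window_counts(sites, chromosome_length=1_500_000)
trend = moving_mean(counts, span=2)  # span shortened for the toy example
print(counts)
print(trend)
```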
Genetic anchoring of the TO1000 assembly to pseudochromosomes (C1 to C9)
| Chromosome | Scaffolds genetically linked | Bases genetically linked | % of assembly | Scaffolds anchored | Bases anchored (% of assembly) |
|---|---|---|---|---|---|
| C1 | 154 | 44,537,578 | 9.1% | 106 | 43,754,388 (9.0) |
| C2 | 161 | 53,780,918 | 11.0% | 111 | 52,875,895 (10.8) |
| C3 | 142 | 65,831,836 | 13.5% | 102 | 64,974,595 (13.3) |
| C4 | 235 | 55,258,765 | 11.3% | 138 | 53,705,393 (11.0) |
| C5 | 141 | 48,366,635 | 9.9% | 99 | 46,892,785 (9.6) |
| C6 | 132 | 40,383,462 | 8.3% | 79 | 39,814,676 (8.1) |
| C7 | 129 | 49,639,853 | 10.2% | 64 | 48,360,397 (9.9) |
| C8 | 124 | 43,398,395 | 8.9% | 63 | 41,752,485 (8.5) |
| C9 | 192 | 56,144,286 | 11.5% | 121 | 54,667,868 (11.2) |
| Total | 1,410 | 457,341,728 | 93.6% | 883 | 446,798,482 (91.5) |
The total number of assembled bases was 488,535,107; the percentage genetically linked (RAD mapping) was 93.61%; the percentage anchored to pseudochromosomes was 91.46%.
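The two percentages quoted above follow directly from the table totals and the stated number of assembled bases. A short check, using only figures given in the text (the function name `percent_of_assembly` is illustrative):

```python
# Values copied from the table totals and the note above.
TOTAL_ASSEMBLED_BASES = 488_535_107
GENETICALLY_LINKED_BASES = 457_341_728
ANCHORED_BASES = 446_798_482


def percent_of_assembly(bases, total=TOTAL_ASSEMBLED_BASES):
    """Express a base count as a percentage of the assembled bases."""
    return 100.0 * bases / total


linked_pct = percent_of_assembly(GENETICALLY_LINKED_BASES)
anchored_pct = percent_of_assembly(ANCHORED_BASES)
print(f"{linked_pct:.2f}% genetically linked")      # 93.61%
print(f"{anchored_pct:.2f}% anchored")              # 91.46%
```

The same function can be applied to any per-chromosome row to confirm its percentage against the 488,535,107 bp total.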
Figure 2. Distribution of unique and shared gene families among Brassicaceae species. Homologous proteins in A. thaliana, A. lyrata, B. rapa and B. oleracea were clustered into gene families using TRIBE-MCL. Numbers in individual sections indicate the number of gene families (not genes).
Figure 3. The B. oleracea genome. From the outside ring to the centre: 1) the nine B. oleracea pseudochromosomes (C1 to C9, represented on a Mb scale) are shown in different colors with putative centromeric regions indicated by black bands; 2) gene expression levels (average log(FPKM), bin = 500 kb), values range from 0 (yellow) to 3.19 (red); 3) the distribution of protein coding regions (nucleotides per 100 kb; orange) compared to repetitive sequences (nucleotides per 100 kb; yellow); 4) cytosine methylation levels (average number of methylated cytosines, bin = 500 kb) for mCG (blue), mCHG (yellow) and mCHH (grey); and 5) Ka/Ks ratios (median, bin = 500 kb) of syntenic (black) and non-syntenic (green) genes.
Summary of repeat elements annotated in B. rapa and B. oleracea (a)
| Repeat type | B. rapa elements | B. rapa bases | B. rapa % | B. oleracea elements | B. oleracea bases | B. oleracea % |
|---|---|---|---|---|---|---|
| | | 283,841,084 | 273,102,035 | | 488,622,507 | 445,620,295 |
| RLC | 30,349 | 11,292,047 | 4.13% | 77,899 | 42,075,014 | 9.44% |
| RLG | 19,229 | 9,327,740 | 3.42% | 52,619 | 36,956,399 | 8.29% |
| RLX | 11,358 | 3,768,473 | 1.38% | 22,357 | 10,165,627 | 2.28% |
| RSX | 4,248 | 549,493 | 0.20% | 8,442 | 1,141,202 | 0.26% |
| RIL | 2,658 | 1,200,998 | 0.44% | 4,845 | 2,500,079 | 0.56% |
| RIX | 4,432 | 2,059,525 | 0.75% | 6,453 | 4,002,179 | 0.90% |
| Subtotal | | | | | | |
| DTA | 14,722 | 2,915,918 | 1.07% | 22,838 | 5,053,940 | 1.13% |
| DTC | 17,742 | 5,289,213 | 1.94% | 44,958 | 16,124,758 | 3.62% |
| DTH | 2,057 | 581,852 | 0.21% | 4,028 | 1,193,462 | 0.27% |
| DTM | 13,307 | 3,022,686 | 1.11% | 19,208 | 5,098,147 | 1.14% |
| DTT | 16,867 | 2,720,339 | 1.00% | 33,537 | 6,425,309 | 1.44% |
| DTX | 29,919 | 7,453,903 | 2.73% | 46,687 | 12,566,497 | 2.82% |
| DHH | 46,182 | 10,206,949 | 3.74% | 67,720 | 18,514,526 | 4.15% |
| Subtotal | | | | | | |
| Unclassified | | 2,183,677 | 0.80% | | 3,945,235 | 0.89% |
| Total | | | | | | |
(a) Repeat elements are named according to Wicker et al. [44].
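The three-letter codes in the table follow the Wicker scheme, in which the first letter gives the class, the second the order and the third the superfamily (X meaning unknown). A small decoder covering only the codes that appear in this table is sketched here; the function name `decode_wicker` and the wording of the labels are illustrative choices, not part of the source.

```python
# Lookup tables restricted to the codes listed in the repeat table.
CLASSES = {"R": "Class I (retrotransposon)", "D": "Class II (DNA transposon)"}
ORDERS = {"RL": "LTR", "RS": "SINE", "RI": "LINE", "DT": "TIR", "DH": "Helitron"}
SUPERFAMILIES = {
    "RLC": "Copia", "RLG": "Gypsy", "RIL": "L1",
    "DTA": "hAT", "DTC": "CACTA", "DTH": "PIF-Harbinger",
    "DTM": "Mutator", "DTT": "Tc1-Mariner", "DHH": "Helitron",
}


def decode_wicker(code):
    """Split a three-letter Wicker code into (class, order, superfamily)."""
    code = code.upper()
    if len(code) != 3 or code[0] not in CLASSES or code[:2] not in ORDERS:
        raise ValueError(f"unrecognised code: {code!r}")
    if code[2] == "X":
        superfamily = "unknown"
    else:
        superfamily = SUPERFAMILIES.get(code, "not listed in this table")
    return CLASSES[code[0]], ORDERS[code[:2]], superfamily


for code in ("RLC", "RIX", "DTC", "DHH"):
    print(code, decode_wicker(code))
```

Under this reading, the first Subtotal row groups the Class I codes (RLC to RIX) and the second groups the Class II codes (DTA to DHH).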
Figure 4. Alignment of the B. oleracea genome with that of B. rapa and A. thaliana. (a) Alignment with the B. rapa genome; (b) alignment with the A. thaliana genome. Dot-plots show Nucmer alignments of stretches of sequence similarity between the genomes.
Figure 5. Derived ancestral block structure for B. oleracea and B. rapa.
Figure 6. Cytosine methylation levels across specific categories of genes of the B. oleracea genome. The mCG (red), mCHG (green) and mCHH (blue) levels are shown for each gene model (includes promoter regions, UTRs, exons, introns and 3′ flanking), based on a sliding window of 500 kb.
Figure 7. Correlation of methylation status with gene expression and genome triplication in B. oleracea. (a) Expression levels (log(FPKM)) plotted against mCG gene body methylation levels. (b) Box plot representation of different levels of mCG gene body methylation in syntenic genes (along x-axis) with normalized gene expression levels plotted on the y-axis. (c) Box plot representation of different levels of mCG observed across the three sub-genomes. (d) Correlation of gene expression (FPKM) and methylation levels among the fully retained orthologues of the three genomes. Below the diagonal, positive and negative pair-wise correlations are indicated in blue and red, respectively. Darker coloring indicates a greater magnitude for the correlation. Above the diagonal, the color and extent of the filled area of each of the pie-charts represents the strength of each pair-wise correlation. Positive and negative correlations are indicated by the pie being filled in a clockwise or anticlockwise direction, respectively.
Figure 8. Genome dominance and functional diversification of homologues retained across three sub-genomes. (a) Cumulative frequency of homologous genes belonging to the three sub-genomes with highest expression across all tissue types. P-values were calculated for interaction between sub-genomes (G) and tissue-type (T) effects on expression. (b) Hierarchical clustering of gene expression profiles for fully retained triplicated genes across four tissue types. Red and blue indicate lowest and highest expression values, respectively. Intermediate expression values follow a rainbow coloring pattern. The dotted lines to the right correspond to partitioning of the genes into 15 clusters.