Zhesi He, Feng Cheng, Yi Li, Xiaowu Wang, Isobel A P Parkin, Boulos Chalhoub, Shengyi Liu, Ian Bancroft.
Abstract
This data article reports the establishment of the first pan-transcriptome resources for the Brassica A and C genomes. These were developed using existing coding DNA sequence (CDS) gene models from the now-published Brassica oleracea TO1000 and Brassica napus Darmor-bzh genome sequence assemblies representing the chromosomes of these species, along with preliminary CDS models from an updated Brassica rapa Chiifu genome sequence assembly. The B. rapa genome sequence scaffolds required splitting and re-ordering to match the genome organisation expected from a high-density SNP linkage map, whereas the B. oleracea assembly was used unchanged. The resulting B. rapa (A genome) pseudomolecules contained 47,656 ordered CDS models and the B. oleracea (C genome) pseudomolecules contained 54,766 ordered CDS models. Interpolation of B. napus CDS models not already represented by orthologues resulted in 52,790 and 63,308 ordered CDS models in the A and C pan-transcriptomes, respectively, an increase of 13,676 overall. Comparison of the organisation of this resource with publicly available genome sequences for B. napus showed excellent consistency for the B. napus Darmor-bzh resource, but more breakdown of collinearity for the B. napus ZS11 resource. CDS datasets comprising the pan-transcriptomes are available with this article (B. rapa) or from public repositories (B. oleracea and B. napus).
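The counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the published figures; the variable names are illustrative and are not taken from the article or its datasets.

```python
# Ordered CDS models in the genome-derived pseudomolecules (from the abstract).
base_models = {"A": 47_656, "C": 54_766}

# Ordered CDS models after interpolating B. napus models (from the abstract).
pan_models = {"A": 52_790, "C": 63_308}

# Models added to each pan-transcriptome by interpolation.
added = {genome: pan_models[genome] - base_models[genome] for genome in base_models}
total_added = sum(added.values())

for genome, count in added.items():
    print(f"{genome} genome: +{count:,} CDS models")
print(f"Total increase: {total_added:,}")
```

This gives 5,134 additional models for the A genome and 8,542 for the C genome, which sum to the 13,676 stated in the abstract.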
Year: 2015 PMID: 26217816 PMCID: PMC4510581 DOI: 10.1016/j.dib.2015.06.016
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Collinearity of ordered pan-transcriptomes and the genome sequences of B. napus Darmor-bzh and B. napus ZS11. The positions of best sequence matches in the B. napus chromosome assemblies are plotted for CDS models with significant similarity matches (threshold e-value 1E−30) in the B. napus Darmor-bzh assembly and B. napus ZS11 assembly.
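The best-match selection described in the caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the similarity search results are in the standard 12-column BLAST tabular layout (query, subject, identity, length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score), which the record does not state. The function name, the `Hit` type and the example rows are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple

E_VALUE_THRESHOLD = 1e-30  # threshold quoted in the Fig. 1 caption


class Hit(NamedTuple):
    query: str          # CDS model identifier
    subject: str        # chromosome or scaffold in the target assembly
    subject_start: int  # leftmost matched coordinate on the subject
    evalue: float
    bitscore: float


def best_hits(rows: Iterable[str], max_evalue: float = E_VALUE_THRESHOLD) -> Dict[str, Hit]:
    """Keep, for each query, the highest-scoring hit that passes the e-value cut-off.

    Assumes tab-separated rows in the 12-column BLAST tabular layout.
    """
    best: Dict[str, Hit] = {}
    for row in rows:
        row = row.strip()
        if not row or row.startswith("#"):
            continue  # skip blank lines and comment lines
        f = row.split("\t")
        # Subject coordinates are reversed for minus-strand hits, so take the minimum.
        hit = Hit(f[0], f[1], min(int(f[8]), int(f[9])), float(f[10]), float(f[11]))
        if hit.evalue > max_evalue:
            continue  # not a significant match at this threshold
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


# Made-up rows for illustration only; they are not data from the article.
example_rows = [
    "cds1\tchrA01\t99.0\t900\t9\t0\t1\t900\t1500\t2399\t0.0\t1600",
    "cds1\tchrC01\t92.0\t900\t72\t0\t1\t900\t8000\t8899\t1e-150\t1100",
    "cds2\tchrA02\t80.0\t120\t24\t0\t1\t120\t500\t381\t1e-10\t90.5",
]
hits = best_hits(example_rows)
for hit in hits.values():
    print(hit.query, hit.subject, hit.subject_start)
```

Plotting each retained `subject_start` against the CDS model's rank in the ordered pan-transcriptome would give a collinearity dot plot of the kind the figure describes. Ties in bit score are resolved here by keeping the first hit encountered, which is a simplification.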
| Subject area | Biology |
| More specific subject area | Plant genome organisation |
| Type of data | CDS gene model sequences for the A genome, in FASTA format. Tables (in the form of MS Excel spreadsheets) providing A genome pseudomolecule specification based on genome sequence scaffolds, inferred order and anchoring positions in the A and C genome pseudomolecules for CDS models and a figure illustrating the collinearity of the ordered pan-transcriptome and two genome sequences reported for |
| How data was acquired | CDS gene model sequences for the A genome were developed as part of the reported work. Genome sequence scaffolds and other CDS data were obtained from the groups generating them prior to publication. |
| Data format | The data accompanying this article are provided as text files (for |
| Experimental factors | n/a |
| Experimental features | CDS modelling was undertaken using V2.0 |
| Data source location | SRA, NCBI, ENA |
| Data accessibility | All genome sequence datasets were provided for analysis prior to publication, but are now available: |
| The |