Shubin Li1,2, Micai Zhong3,4, Xue Dong3, Xiaodong Jiang3,4, Yuxing Xu4, Yibo Sun3,4, Fang Cheng3, De-Zhu Li5, Kaixue Tang2, Siqing Wang6, Silan Dai7, Jin-Yong Hu8.
Abstract
BACKGROUND: Roses are important plants for human beings, with pivotal economic and biological traits such as continuous flowering, flower architecture, color and scent. Due to frequent hybridization and high genome heterozygosity, classification of roses and their relatives remains a major challenge.
Keywords: Comparative transcriptomics; Rosa sp.; Rosa-specific; Rosaceae-common; Selection pattern
Year: 2018 PMID: 30579326 PMCID: PMC6303930 DOI: 10.1186/s12870-018-1585-x
Source DB: PubMed Journal: BMC Plant Biol ISSN: 1471-2229 Impact factor: 4.215
Fig. 1 Leaf (a) and shoot (b) materials used for RNA-seq in this study. In each panel, Rosa wichuriana ‘Basye’s Thornless’ (BT) is on the left and R. chinensis ‘Old Blush’ (OB) is on the right. Bars = 1 cm
Fig. 2 Workflow for assembling reference transcriptomes and identifying Rosaceae-common and Rosa-specific transcripts. Main steps are shown in boxes, with key transcript numbers given. Major tools used in these analyses are marked in blue. Dashed arrows and boxes indicate applications in which the data generated in this study could be explored
Summary of sequencing strategies and sequences obtained
| Species | Sample | Repetition | Reads number | Reads bases (nt) | Q20 (%) | Accession code |
|---|---|---|---|---|---|---|
| | SAM^a | Total | 156,722,692 | 14,104,983,240 | 95.8 | _ |
| | | Rep1 | 51,669,692 | 4,650,272,280 | | SAMN07808870 |
| | | Rep2 | 53,341,618 | 4,800,745,620 | | SAMN07808871 |
| | | Rep3 | 51,710,726 | 4,653,965,340 | | SAMN07808872 |
| | leaf_nov^b | Total | 131,025,452 | 18,923,248,232 | 95.8 | _ |
| | | Rep1 | 42,294,066 | 6,100,424,928 | | SAMN07808867 |
| | | Rep2 | 42,725,012 | 6,170,189,794 | | SAMN07808868 |
| | | Rep3 | 46,006,374 | 6,652,633,510 | | SAMN07808869 |
| | leaf_mar^b | Total | 134,110,978 | 19,389,814,312 | 96.1 | _ |
| | | Rep1 | 44,236,512 | 6,418,203,960 | | SAMN07808864 |
| | | Rep2 | 46,651,880 | 6,762,697,476 | | SAMN07808865 |
| | | Rep3 | 43,222,586 | 6,208,912,876 | | SAMN07808866 |
| | SAM^a | Total | 159,834,774 | 14,385,129,660 | 95.3 | _ |
| | | Rep1 | 52,443,532 | 4,643,438,236 | | SAMN07808861 |
| | | Rep2 | 53,284,740 | 4,800,745,624 | | SAMN07808862 |
| | | Rep3 | 54,106,502 | 4,940,945,800 | | SAMN07808863 |
| | leaf_nov^b | Total | 137,399,212 | 19,719,505,590 | 95.8 | _ |
| | | Rep1 | 44,950,340 | 6,501,086,190 | | SAMN07808858 |
| | | Rep2 | 45,260,556 | 6,426,857,278 | | SAMN07808859 |
| | | Rep3 | 47,188,316 | 6,791,562,122 | | SAMN07808860 |
| | leaf_mar^b | Total | 130,341,518 | 18,669,073,728 | 96.7 | _ |
| | | Rep1 | 42,299,982 | 6,021,996,256 | | SAMN07808855 |
| | | Rep2 | 43,001,278 | 6,170,274,204 | | SAMN07808856 |
| | | Rep3 | 45,040,258 | 6,476,803,268 | | SAMN07808857 |
| All other data^c | | | 550,108,308 | 63,356,156,640 | 95.6 | _ |
Totals are the sum of three biological replicates. a and b, samples sequenced via Illumina paired-end methods (PE 100 bp for a and PE 150 bp for b); c, data from references (see Table 2)
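The relationship between the replicate rows and the "Total" row can be checked with simple arithmetic. The sketch below is only an illustration: it uses the three replicate values of the first SAM sample as they appear in the table, and the variable names are hypothetical. It sums the bases and derives the mean number of bases per read for each replicate.

```python
# Replicate values copied from the first SAM sample in the table:
# (reads number, reads bases in nt). Variable names are illustrative only.
replicates = {
    "Rep1": (51_669_692, 4_650_272_280),
    "Rep2": (53_341_618, 4_800_745_620),
    "Rep3": (51_710_726, 4_653_965_340),
}

# The "Total" row reports the sum over the three biological replicates.
total_bases = sum(bases for _, bases in replicates.values())

# Mean bases per read for each replicate (reads bases / reads number).
mean_read_lengths = {
    name: bases / reads for name, (reads, bases) in replicates.items()
}

print(f"total bases: {total_bases:,}")
for name, length in mean_read_lengths.items():
    print(f"{name}: {length:.1f} nt per read")
```

The same check can be repeated for any other sample by substituting its replicate rows.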
Statistics of final assemblies for this study and published data
| Assembly components | Contig number | Transcript number | Transcript N50 | GC content (%) | Total assembled bases | Average length (bp) | Data sources |
|---|---|---|---|---|---|---|---|
| BT^a | 86,642 | 68,612 | 2099 | 46.4 | 92 M | 1338 | This study |
| OB^a | 99,456 | 81,389 | 2092 | 44.2 | 111 M | 1359 | This study |
| OB | na | 80,714 | na | na | 36 M | 444 | Dubois et al. 2012 [ |
| OB^b | na | 68,565 | na | 46.46 | 61 M | 887 | Yan et al. 2016 [ |
| OB | na | 85,663 | na | na | 70 M | 814 | Guo et al. 2017 [ |
| OB^c | 208,039 | 111,954 | 1997 | 45.8 | 231 M^c | 1111 | Han et al. 2017 [ |
| Core set 1 (BT vs. OB; 90% identity) | na | 10,773 | 2282 | na | 20 M | 1863 | This study |
| | 78,676 | 61,864 | 1907 | 46.03 | 75 M | 1216 | Zhang et al. 2016 [ |
| | na | 80,226 | na | na | 60 M | 743 | Gao et al. 2016 [ |
| | na | 106,590 | na | na | 37 M | 343 | Yan et al. 2015 [ |
| | 93,947 | na | 1589 | na | na | na | [ |
| | na | 89,614 | na | na | 38 M | 428 | Yan et al. 2014 [ |
| | 60,944 | na | 314 | na | 18 M | 302 | Fei lab |
| Core set 2 (all samples; 80% identity) | na | 5959 | 2326 | na | 13 M | 2161 | This study |
a, assembly based on data produced in this study; b, assembly based on data from this study and references Yan et al. [20] and Han et al. [27]; c, conceptual confusion in original text; d, data from Fei lab (http://bioinfo.bti.cornell.edu/cgi-bin/rose_454/index.cgi) with transcript N50 and average length recalculated
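The N50 and average length columns can be recalculated from a list of transcript lengths. The following is a minimal sketch using the common definition of N50 (the length L such that sequences of length at least L contain half or more of all assembled bases). It is not the script used in the study, and the function names and toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest; the N50 is the length
    of the sequence at which the running total first reaches half of the
    total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one sequence length")


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    lengths = list(lengths)
    return sum(lengths) / len(lengths)


# Toy transcript lengths (hypothetical, for illustration only).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print("N50:", n50(toy_lengths))
print("Average length:", mean_length(toy_lengths))
```

For a real assembly, the lengths would be read from the FASTA file of transcripts instead of a hard-coded list.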
Fig. 3 The assembly of high-quality transcriptomes for roses. a Length distribution, in proportion, of assembled unigenes for the two species, Rosa chinensis ‘Old Blush’ (OB, bars filled in grey) and R. wichuriana ‘Basye’s Thornless’ (BT, open bars). Bars filled in black mark the length distribution of transcripts shared between the two species (coreset1; see below and main text). b BUSCO analysis showing the completeness of the assemblies and coreset1. c Annotation results of the assembled unigenes and core sets for Rosa. Coreset1 is shared between the two species, while coreset2 comprises the unigenes shared among Rosa (see Fig. 2) based on published data and data newly collected in this study. For each category (Nr_plants, GO, Uniprot, Swissprot and COG databases), the total unigene counts annotated in each database are given together with the proportion (in brackets). Shared and total unigenes annotated by all databases are also given. d Venn diagram showing the results of coreset1 identification. A total of 10,773 transcripts were identified at the 95% sequence identity level between the two species
Fig. 4 Identification and characterization of Rosaceae-common potential coding genes. a Venn diagram showing the Rosaceae-common and Rosa-specific transcripts. Note that, except for Rosa, transcripts specific to other genera were not identified (marked with na), because the other shared sets were not of interest here. b GO enrichment analysis of the 4447 Rosaceae-common transcripts (http://bioinfo.cau.edu.cn/agriGO). X-axis shows the fold enrichment of specific GO terms in comparison with the background. BP, CC and MF denote biological process, cellular component and molecular function, respectively. The area indicates gene counts. c Representative phylogenetic topologies based on the 4447 Rosaceae-common genes. The upper panel shows the topology supported by about 65% of genes (2812), which clusters Prunus with Malus, while the topology in the lower panel is supported by 33% of genes (1436). Numbers on branches indicate distance. d Three-dimensional plot of the genetic distances of the 4447 transcripts between Rosa and Fragaria (X-axis), Malus (Y-axis) and Prunus (Z-axis). Black and blue dots mark the genes supporting the topologies in c (black for the upper panel and blue for the lower panel), while gray dots show genes supporting other topologies. e Distribution and GO enrichment analysis of the 409 selected Rosaceae-common transcripts. Y-axis shows the fold enrichment of specific GO terms in comparison with the background. Only four GO terms are significantly enriched (marked in orange). f Clustered heat map comparing scaled expression values for the 409 selected Rosaceae-common transcripts. Yellow indicates higher and purple lower expression. Blue and red bars indicate membership in the identified transcription clusters
Fig. 5 Identification and characterization of Rosa-specific transcripts. a Heat map comparing scaled expression values for the 164 Rosa-specific transcripts. Yellow indicates higher and purple lower expression. Blue, green, yellow, and black bars indicate membership in the identified transcription clusters. b F-box and TMV resistance proteins were significantly enriched among Rosa-specific transcripts. χ² tests were performed online (http://www.quantpsy.org/chisq/chisq.htm) by comparing the number of Rosa-specific transcripts with those from the Rosaceae-common, coreset1 and coreset2 genes. P values were adjusted with the Bonferroni correction
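The kind of test described for panel b can be sketched offline as a Pearson χ² test on a 2×2 table followed by a Bonferroni adjustment. This is an illustration under stated assumptions, not the procedure of the online calculator: no continuity correction is applied, the function names are hypothetical, and all counts in the example are invented placeholders (only the set size of 164 Rosa-specific transcripts comes from the caption).

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table sums to zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: (family members, other transcripts) in the
# Rosa-specific set versus three comparison sets. Placeholders only.
rosa_specific = (12, 152)  # 164 transcripts in total
comparisons = {
    "Rosaceae-common": (60, 4387),
    "coreset1": (150, 10623),
    "coreset2": (80, 5879),
}

raw = [chi2_2x2(*rosa_specific, *counts)[1] for counts in comparisons.values()]
for name, p_adj in zip(comparisons, bonferroni(raw)):
    print(f"{name}: Bonferroni-adjusted P = {p_adj:.3g}")
```

With three comparison sets, each raw P value is multiplied by three, which is the Bonferroni adjustment assumed here.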