Zhiying Li, Jiabin Wang, Yunliu Fu, Yu Gao, Hunzhen Lu, Li Xu.
Abstract
Anthurium andraeanum is a popular tropical ornamental plant. Its spathes are brilliantly coloured due to variable anthocyanin contents. To examine the mechanisms that control anthocyanin biosynthesis, we sequenced the spathe transcriptomes of 'Albama', a red-spathed cultivar of A. andraeanum, and 'Xueyu', its anthocyanin-loss mutant. Both long reads and short reads were sequenced. Long read sequencing produced 805,869 raw reads, resulting in 83,073 high-quality transcripts. Short read sequencing produced 347.79 M reads, and the subsequent assembly resulted in 111,674 unigenes. High-quality transcripts and unigenes were quantified using the short reads, and differential expression analysis was performed between 'Albama' and 'Xueyu'. Obtaining high-quality, full-length transcripts enabled the detection of long transcript structures and transcript variants. These data provide a foundation to elucidate the mechanisms regulating the biosynthesis of anthocyanin in A. andraeanum.
Year: 2018 PMID: 30422122 PMCID: PMC6233480 DOI: 10.1038/sdata.2018.247
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Metadata of samples submitted to the NCBI Sequence Read Archive.
| Source | Library strategy | Samples | Library layout | Platform | Instrument model | Biosample accession | Tissue |
|---|---|---|---|---|---|---|---|
| Albama | RNA-Seq | Albama_1 | paired | ILLUMINA | Illumina HiSeq 4000 | SAMN08322140 | Spathe |
| Albama | RNA-Seq | Albama_2 | paired | ILLUMINA | Illumina HiSeq 4000 | SAMN08322141 | Spathe |
| Albama | RNA-Seq | Albama_3 | paired | ILLUMINA | Illumina HiSeq 4000 | SAMN08322142 | Spathe |
| Xueyu | RNA-Seq | Xueyu_1 | paired | ILLUMINA | Illumina HiSeq 4000 | SAMN08322143 | Spathe |
| Xueyu | RNA-Seq | Xueyu_2 | paired | ILLUMINA | Illumina HiSeq 4000 | SAMN08322144 | Spathe |
| Xueyu | RNA-Seq | Xueyu_3 | paired | ILLUMINA | Illumina HiSeq 4000 | SAMN08322145 | Spathe |
| Albama and Xueyu | RNA-Seq | Mixed samples | single | PACBIO_SMRT | PacBio RS II | SAMN08322146 | Spathe |
Summary of long read filtering.
| Library | reads of insert | five prime reads | three prime reads | poly-A reads | full-length non-chimeric reads | full-length non-chimeric read length(bp) |
|---|---|---|---|---|---|---|
| between1k2k | 258848 | 171,398(66.22%) | 174,002(67.22%) | 166,730(64.41%) | 132,754(51.29%) | 1836 |
| between2k3k | 172219 | 96,963(56.3%) | 102,382(59.45%) | 94,980(55.15%) | 69,908(40.59%) | 2967 |
| between3k6k | 174783 | 88,434(50.6%) | 90,415(51.73%) | 78,934(45.16%) | 53,959(30.87%) | 4026 |
| under1k | 200019 | 150,610(75.3%) | 160,467(80.23%) | 153,074(76.53%) | 131,224(65.61%) | 703 |
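As a consistency check on the table above, the following minimal sketch recomputes the total number of reads of insert and the full-length non-chimeric (FLNC) percentage for each library. The values are copied from the table; the names `libraries` and `flnc_percent` are illustrative and not taken from the authors' pipeline.

```python
# Values copied from the "Summary of long read filtering" table.
# Variable and function names are illustrative assumptions.
libraries = {
    "between1k2k": {"reads_of_insert": 258848, "flnc": 132754},
    "between2k3k": {"reads_of_insert": 172219, "flnc": 69908},
    "between3k6k": {"reads_of_insert": 174783, "flnc": 53959},
    "under1k": {"reads_of_insert": 200019, "flnc": 131224},
}


def flnc_percent(row):
    """Share of reads of insert that are full-length non-chimeric, in percent."""
    return round(100 * row["flnc"] / row["reads_of_insert"], 2)


# Total reads of insert across the four size-selected libraries.
total_reads = sum(row["reads_of_insert"] for row in libraries.values())
flnc_rates = {name: flnc_percent(row) for name, row in libraries.items()}

print(total_reads)
for name, rate in flnc_rates.items():
    print(name, rate)
```

The total agrees with the 805,869 raw long reads reported in the abstract, and the recomputed percentages agree with the table's FLNC column.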
Cluster of long reads.
| Library | Cluster type | Total isoforms | Total base(bp) | Mean Quality | Mean isoform length(bp) | Mean Full length coverage |
|---|---|---|---|---|---|---|
| between1k2k | High quality | 40898 | 74299859 | 0.9967 | 1817 | 2.8 |
| between1k2k | Low quality | 18000 | 38692106 | 0.3382 | 2150 | 1.01 |
| between2k3k | High quality | 20121 | 57171114 | 0.9953 | 2841 | 2.4 |
| between2k3k | Low quality | 21410 | 71870532 | 0.4915 | 3357 | 1.01 |
| between3k6k | High quality | 18403 | 68961773 | 0.9916 | 3747 | 1.81 |
| between3k6k | Low quality | 20589 | 93097977 | 0.4182 | 4522 | 1 |
| under1k | High quality | 17162 | 11707217 | 0.9991 | 682 | 5.1 |
| under1k | Low quality | 12006 | 9306751 | 0.3018 | 775 | 3.64 |
Summary of short read filtering.
| Sample | Total Raw Reads(M) | Total Clean Reads(M) | Total Clean Bases(Gb) | Clean Reads Q20(%) | Clean Reads Q30(%) | Clean Reads Ratio(%) |
|---|---|---|---|---|---|---|
| R1 | 52.25 | 44.24 | 6.64 | 98.61 | 95.75 | 84.66 |
| R2 | 58.78 | 44.62 | 6.69 | 98.62 | 95.77 | 75.91 |
| R3 | 58.78 | 44.13 | 6.62 | 98.59 | 95.68 | 75.08 |
| W1 | 60.42 | 44.4 | 6.66 | 98.6 | 95.72 | 73.49 |
| W2 | 58.78 | 45.22 | 6.78 | 98.48 | 95.39 | 76.93 |
| W3 | 58.78 | 45.1 | 6.77 | 98.45 | 95.31 | 76.73 |
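The relationship between the columns of this table can be sketched as follows: the clean reads ratio is the clean read count divided by the raw read count, and the raw read counts add up to the total stated in the abstract. Read counts are in millions and are copied from the table; because they are already rounded to two decimals, a recomputed ratio may differ from the published one by about 0.01 percentage points. The names used here are illustrative only.

```python
# (raw reads, clean reads, published clean reads ratio in %), counts in millions.
# Values copied from the "Summary of short read filtering" table.
samples = {
    "R1": (52.25, 44.24, 84.66),
    "R2": (58.78, 44.62, 75.91),
    "R3": (58.78, 44.13, 75.08),
    "W1": (60.42, 44.40, 73.49),
    "W2": (58.78, 45.22, 76.93),
    "W3": (58.78, 45.10, 76.73),
}


def clean_ratio(raw, clean):
    """Percentage of raw reads that survive filtering."""
    return round(100 * clean / raw, 2)


# Sum of raw reads over all six samples, in millions.
total_raw = round(sum(raw for raw, _, _ in samples.values()), 2)
recomputed = {name: clean_ratio(raw, clean) for name, (raw, clean, _) in samples.items()}

print(total_raw)
for name, (_, _, published) in samples.items():
    print(name, recomputed[name], published)
```

The total matches the 347.79 M short reads reported in the abstract.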
Summary of short read de novo assembly.
| Sample | Total Number | Total Length | Mean Length | N50 | N70 | N90 | GC(%) |
|---|---|---|---|---|---|---|---|
| R1 | 61609 | 54847001 | 890 | 1561 | 939 | 329 | 48.36 |
| R2 | 61048 | 55007752 | 901 | 1579 | 948 | 335 | 48.31 |
| R3 | 60934 | 54374909 | 892 | 1560 | 939 | 330 | 48.35 |
| W1 | 64474 | 57552118 | 892 | 1579 | 937 | 329 | 48.2 |
| W2 | 68776 | 62144741 | 903 | 1620 | 964 | 330 | 47.49 |
| W3 | 67950 | 61466947 | 904 | 1606 | 965 | 332 | 47.57 |
| All-Unigene | 111674 | 110235185 | 987 | 1875 | 1166 | 340 | 47.45 |
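For readers unfamiliar with the N50, N70 and N90 columns, the sketch below shows the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches the given percentage of the total assembly length. The function name `nxx` and the toy lengths are hypothetical; this is not the assembler's own code, and the paper's unigene lengths are not reproduced here.

```python
def nxx(lengths, percent):
    """Return the Nxx statistic (e.g. percent=50 for N50) of a list of sequence lengths.

    Sequences are visited from longest to shortest; the result is the length
    of the sequence at which the cumulative length first reaches `percent`
    percent of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * percent:
            return length
    return min(lengths)  # not reached for 0 < percent <= 100


# Hypothetical toy assembly, total length 32.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
n50 = nxx(toy_lengths, 50)
n70 = nxx(toy_lengths, 70)
n90 = nxx(toy_lengths, 90)
print(n50, n70, n90)

# Mean length of the merged unigene set, from the All-Unigene row of the table.
mean_unigene_length = round(110235185 / 111674)
print(mean_unigene_length)
```

By construction N50 ≥ N70 ≥ N90, which is the ordering seen in every row of the table.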
Figure 1. Length distributions of transcripts and CDS.
(a) The transcript lengths of the HQ transcripts and unigenes. (b) The CDS length distribution of the transcripts.
Figure 2. Annotation and Blast results for the HQ transcripts and unigenes.
(a) HQ transcripts and unigenes were mapped to the NR, KEGG, COG, Swiss-Prot and InterPro databases. (b) HQ transcripts were mapped to unigenes with different similarity levels (H, identity more than 95%; L, identity less than 95%; U, no similarity).
Figure 3. Volcano plot of differentially expressed genes between 'Xueyu' and 'Albama'.
(a) The volcano plot of unigenes. (b) The volcano plot of HQ transcripts. The X-axis represents -log10-transformed significance. The Y-axis represents log2-transformed fold change. Red dots indicate differentially expressed genes.
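The two transformations named in the caption can be sketched as below. The cutoffs used to flag a gene as differentially expressed (an absolute log2 fold change of at least 1 and a significance value of at most 0.05) are assumptions chosen for illustration; this page does not state the thresholds the authors applied, and the function names are hypothetical.

```python
import math


def volcano_point(fold_change, p_value):
    """Return (log2 fold change, -log10 significance) for one gene."""
    return math.log2(fold_change), -math.log10(p_value)


def is_differential(fold_change, p_value, min_abs_log2fc=1.0, max_p=0.05):
    """Flag a gene as differentially expressed.

    The default thresholds are illustrative assumptions, not the paper's values.
    """
    log2fc, _ = volcano_point(fold_change, p_value)
    return abs(log2fc) >= min_abs_log2fc and p_value <= max_p


# Hypothetical example genes: (fold change, significance value).
examples = {"gene_a": (4.0, 0.001), "gene_b": (1.5, 0.001), "gene_c": (0.25, 0.2)}
for name, (fc, p) in examples.items():
    print(name, volcano_point(fc, p), is_differential(fc, p))
```

Under these assumed cutoffs only `gene_a` would be drawn as a red dot: `gene_b` changes too little and `gene_c` is not significant.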