Runxuan Zhang, Cristiane P G Calixto, Nikoleta A Tzioutziou, Allan B James, Craig G Simpson, Wenbin Guo, Yamile Marquez, Maria Kalyna, Rob Patro, Eduardo Eyras, Andrea Barta, Hugh G Nimmo, John W S Brown.
Abstract
RNA-sequencing (RNA-seq) allows global gene expression analysis at the individual transcript level. Accurate quantification of transcript variants generated by alternative splicing (AS) remains a challenge. We have developed a comprehensive, nonredundant Arabidopsis reference transcript dataset (AtRTD) containing over 74 000 transcripts for use with algorithms to quantify AS transcript isoforms in RNA-seq. The AtRTD was formed by merging transcripts from TAIR10 and novel transcripts identified in an AS discovery project. We have estimated transcript abundance in RNA-seq data using the transcriptome-based alignment-free programmes Sailfish and Salmon and have validated quantification of splicing ratios from RNA-seq by high resolution reverse transcription polymerase chain reaction (HR RT-PCR). Good correlations between splicing ratios from RNA-seq and HR RT-PCR were obtained, demonstrating the accuracy of abundances calculated for individual transcripts in RNA-seq. The AtRTD is a resource that will have immediate utility in analysing Arabidopsis RNA-seq data to quantify differential transcript abundance and expression.
Keywords: Arabidopsis thaliana; RNA-sequencing (RNA-seq); Sailfish; Salmon; alternative splicing; high resolution reverse transcription (HR RT)-PCR; transcripts per million
Year: 2015 PMID: 26111100 PMCID: PMC4744958 DOI: 10.1111/nph.13545
Source DB: PubMed Journal: New Phytol ISSN: 0028-646X Impact factor: 10.151
Number of Arabidopsis thaliana genes and transcripts in different datasets
| Dataset | Number of genes | Number of transcripts | Average transcripts per gene |
|---|---|---|---|
| TAIR10 (a) | 33 602 | 41 671 | 1.24 |
| Marquez (b) | 23 905 | 57 408 | 2.40 |
| AtRTD v3 (c) | 33 625 | 74 216 | 2.21 |
Note: TAIR10, The Arabidopsis Information Resource version 10; AtRTD, Arabidopsis reference transcript dataset. (a) Contains redundant transcripts that differ only in the lengths of their 5′ and 3′ UTRs. (b) De novo assembled transcripts defined by splice junctions. (c) Merged, nonredundant transcripts.
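The last column of the table is the transcript count divided by the gene count. The short Python sketch below recomputes it from the two count columns as a consistency check; the dictionary and function names are illustrative and are not part of the AtRTD resource.

```python
# Gene and transcript counts copied from the table above.
# Names (datasets, transcripts_per_gene) are illustrative only.
datasets = {
    "TAIR10": (33602, 41671),
    "Marquez": (23905, 57408),
    "AtRTD v3": (33625, 74216),
}


def transcripts_per_gene(genes, transcripts):
    """Average number of transcripts per gene, rounded to two decimals."""
    return round(transcripts / genes, 2)


averages = {
    name: transcripts_per_gene(genes, transcripts)
    for name, (genes, transcripts) in datasets.items()
}

for name, average in averages.items():
    print(f"{name}: {average:.2f}")
```

The recomputed values agree with the reported 1.24, 2.40 and 2.21.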
Figure 1. Distribution of the number of transcripts per gene in The Arabidopsis Information Resource version 10 (TAIR10) and the Arabidopsis reference transcript dataset (AtRTD). The number of Arabidopsis thaliana genes (y‐axis) containing two or more transcripts (x‐axis) is shown.
Figure 2. Arabidopsis thaliana gene and transcript structures and histograms of transcript ratios calculated from transcripts per million (TPM) generated by Sailfish and Salmon with the Arabidopsis reference transcript dataset (AtRTD) and from relative fluorescence units (RFUs), which measure peak areas, in high resolution reverse transcription polymerase chain reaction (HR RT‐PCR). (a) At5g16820, (); (b) At2g39730, (). Transcripts are shown below the gene structure. Open boxes, exons; black rectangles, untranslated regions; thin lines, introns; diagonal lines, splicing events; arrowheads, approximate positions of primers used in HR RT‐PCR. FS, fully spliced; E, exon; I2R/I6R, intron retention of introns 2 and 6 in (a) and (b), respectively; Alt 3′/5′ss, alternative (a) 3′ and (b) 5′ splice sites, respectively; T1 and T2, different time‐points. Transcript variants from The Arabidopsis Information Resource (TAIR) and the alternative splicing (AS) discovery dataset (Marquez et al., 2012) are indicated by .1, .2, or _ID3, respectively. Error bars represent ± standard deviation (SD) of three independent biological replicates.
Figure 3. Correlation of the splicing ratios calculated from the RNA‐seq data and from high resolution reverse transcription polymerase chain reaction (HR RT‐PCR). (a) Sailfish, (b) Salmon. Splicing ratios for 50 alternative splicing events from 29 Arabidopsis thaliana genes (three biological replicates of the time points T1 and T2) generated 300 data points in total. AS, alternatively spliced; FS, fully spliced; TPM, transcripts per million; RFU, relative fluorescence unit. Pearson's correlation coefficient: Sailfish = 0.7044, Salmon = 0.9051. Spearman's rank correlation coefficient: Sailfish = 0.712, Salmon = 0.907.
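A comparison of this kind can be sketched in Python as follows. The caption does not give the formula for the splicing ratio, so the sketch assumes AS / (AS + FS), applied to TPM values for RNA‐seq and to RFU values for HR RT‐PCR. All function names and input numbers are hypothetical and are not data from the study. Pearson's and Spearman's coefficients are implemented directly so that the example needs only the standard library.

```python
from math import sqrt


def splicing_ratio(as_value, fs_value):
    """Assumed definition: share of the AS isoform, AS / (AS + FS)."""
    total = as_value + fs_value
    if total == 0:
        raise ValueError("AS and FS abundances are both zero")
    return as_value / total


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical (AS, FS) pairs: TPM from RNA-seq, RFU from HR RT-PCR.
tpm_pairs = [(12.0, 48.0), (5.0, 95.0), (30.0, 30.0), (2.0, 18.0)]
rfu_pairs = [(1500.0, 5500.0), (400.0, 9600.0), (4200.0, 3800.0), (900.0, 9100.0)]

rnaseq_ratios = [splicing_ratio(a, f) for a, f in tpm_pairs]
pcr_ratios = [splicing_ratio(a, f) for a, f in rfu_pairs]

pearson_r = pearson(rnaseq_ratios, pcr_ratios)
spearman_rho = spearman(rnaseq_ratios, pcr_ratios)

# 50 AS events x 3 biological replicates x 2 time points, as in the caption.
n_points = 50 * 3 * 2
```

The data point count follows directly from the design stated in the caption: 50 events, three replicates and two time points give 300 ratio pairs per quantification tool.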