Juan P Romero, María Ortiz-Estévez, Ander Muniategui, Soraya Carrancio, Fernando J de Miguel, Fernando Carazo, Luis M Montuenga, Remco Loos, Rubén Pío, Matthew W B Trotter, Angel Rubio.
Abstract
BACKGROUND: RNA-seq is a reference technology for determining alternative splicing at a genome-wide level. Exon arrays remain widely used for the analysis of gene expression, but show a poor validation rate with regard to splicing events. Commercial arrays that include probes within exon junctions have been developed to overcome this problem. We compare the performance of RNA-seq (Illumina HiSeq) and junction arrays (Affymetrix Human Transcriptome array) for the analysis of transcript splicing events. Three different breast cancer cell lines were treated with CX-4945, a drug that severely affects splicing. To enable a direct comparison of the two platforms, we adapted EventPointer, an algorithm that detects and labels alternative splicing events using junction arrays, to work also on RNA-seq data. Common results and discrepancies between the technologies were validated and/or resolved by over 200 PCR experiments.
Keywords: Alternative splicing; Microarrays; RNA-seq
Year: 2018 PMID: 30253752 PMCID: PMC6156849 DOI: 10.1186/s12864-018-5082-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 EventPointer overview for junction arrays and RNA-seq data. a The CEL or BAM files are the input data for each technology. The splicing graph for each gene is built using the array annotation files or directly using the sequenced reads. b Each node in the splicing graph is split into two nodes that correspond to its start and end positions in the genome, respectively. EventPointer identifies events within each gene and annotates the type of event. In the figure, among the events in the gene, an exon cassette is highlighted. c Statistical significance of the events is computed. d Finally, the top-ranked events are validated using PCR and the results are visualized in IGV
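The node-splitting step described in panel b can be illustrated with a small sketch. This is an illustration only, not EventPointer's implementation: the function name `split_nodes`, the `.a`/`.b` suffixes, the edge labels and the toy coordinates are all assumptions made for the example.

```python
def split_nodes(exons, junctions):
    """Split each exon node into a start node and an end node.

    exons:     dict mapping a hypothetical exon id to (start, end) coordinates
    junctions: list of (from_exon_id, to_exon_id) pairs
    Returns (nodes, edges): nodes maps a node name to a genomic position,
    edges is a list of (source, target, kind) tuples.
    """
    nodes = {}
    edges = []
    for exon_id, (start, end) in exons.items():
        # One node for the exon start, one for the exon end.
        nodes[f"{exon_id}.a"] = start
        nodes[f"{exon_id}.b"] = end
        # The exon itself becomes an edge between its two nodes.
        edges.append((f"{exon_id}.a", f"{exon_id}.b", "exon"))
    for src, dst in junctions:
        # A junction links the end of one exon to the start of another.
        edges.append((f"{src}.b", f"{dst}.a", "junction"))
    return nodes, edges


# Toy cassette exon: exon 2 is either included (1-2-3) or skipped (1-3).
toy_exons = {1: (100, 200), 2: (300, 350), 3: (500, 600)}
toy_junctions = [(1, 2), (2, 3), (1, 3)]
toy_nodes, toy_edges = split_nodes(toy_exons, toy_junctions)
```

In this toy graph the two alternative paths between node `1.b` and node `3.a` are what an exon cassette looks like once exons are represented as edges.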
Number and statistical significance of detected AS events using both RNA-seq and array technologies
| Expression Threshold | Detected Events | Significant events | FDR for significant |
|---|---|---|---|
| RNA-seq | | | |
| Junction coverage > 6 FPKM | 9277 | 4526 | 2.7e-4 |
| Junction coverage > 2 FPKM | 34,961 | 13,780 | 4.7e-4 |
| Junction coverage > 2/3 FPKM | 92,986 | 29,443 | 7.0e-4 |
| Exon-junction arrays | | | |
| Signal > 50% | 10,114 | 2385 | 9.2e-4 |
| Signal > 25% | 31,506 | 6197 | 1.37e-3 |
| No threshold | 92,405 | 11,761 | 3.45e-3 |
Results are shown for different expression thresholds. The default filters are junction coverage greater than 2 FPKM for RNA-seq and probe-set signal greater than the top 25% quantile for microarrays
Number of AS events detected per technology, alongside statistical significance of events against distinct thresholds
| Matched Events | Significant in both (R+M+) | FDR (RNA-seq) | FDR (arrays) |
|---|---|---|---|
| 6222 | 1324 | 4.96e-4 | 6.23e-4 |

| R+M∅: Expression Threshold | R+M∅: Detected Events | R+M∅: FDR | R∅M+: Expression Threshold | R∅M+: Detected Events | R∅M+: FDR for significant |
|---|---|---|---|---|---|
| Junction coverage > 6 FPKM | 2973 | 2.44e-4 | Signal > 50% | 1016 | 1.46e-3 |
| Junction coverage > 2 FPKM | 10,617 | 4.56e-4 | Signal > 25% | 3297 | 1.99e-3 |
| Junction coverage > 2/3 FPKM | 25,063 | 6.90e-4 | No threshold | 7581 | 4.85e-3 |
Where thresholds not shown, default filters were employed (junction coverage > 2 FPKM for RNA-seq; upper quartile probe signal for microarrays)
Fig. 2 Correspondence between the events detected by arrays and RNA-seq. An event is considered significant if its p-value is smaller than 0.001 and non-significant if it is larger than 0.2. Events with p-values between these two thresholds are considered inconclusive
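The three-way labelling used in Fig. 2 can be written as a short rule. The sketch below is a minimal illustration: the function name `classify_event` is hypothetical, and treating p-values exactly equal to 0.001 or 0.2 as inconclusive is an assumption, since the caption only specifies "smaller than" and "larger than".

```python
SIGNIFICANT_BELOW = 0.001      # threshold stated in the caption
NON_SIGNIFICANT_ABOVE = 0.2    # threshold stated in the caption


def classify_event(p_value):
    """Label an event from its p-value using the Fig. 2 thresholds."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < SIGNIFICANT_BELOW:
        return "significant"
    if p_value > NON_SIGNIFICANT_ABOVE:
        return "non-significant"
    # Everything in between, including the two boundary values (assumption).
    return "inconclusive"


labels = [classify_event(p) for p in (0.0005, 0.05, 0.5)]
```

Applying the same rule to the p-value from each platform gives the category pairs used in the tables, for example significant in RNA-seq and non-significant in arrays.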
Fig. 3 FDR for different types of events using both technologies. Panel a shows the FDR for matched events. Panel b shows the FDR for the events detected in each technology, regardless of whether they are matched. In both technologies, alternative 3′, alternative 5′, first and last exons have a larger FDR than other types of events
PCR validation for RNA-seq and microarray technologies across events detected by one or both technologies
| AS Event Category | RNA-seq | Arrays |
|---|---|---|
| Top-ranked events (topRNA, topArrays) | 5/5 | 5/5 |
| Significant in RNA-seq and not detected by arrays (R+M∅) | 5/5 | – |
| Significant in arrays and not detected by RNA-seq (R∅M+) | – | 5/5 |
| Detected by both. Significant in RNA-seq, not significant in arrays (R+M-) | 5/5 | |
| Detected by both. Significant in arrays, not significant in RNA-seq (R-M+) | 3/5 | |
| Detected by both. Significant and coherent events (R+M+) | 5/5 | |
Values reported are validations / events selected
Fig. 4 Estimated PSI (for RNA-seq, microarrays and PCR image analysis), PCR bands, the reference HTA transcriptome and the alternative paths of the DONSON (panel a) and MELK (panel b) genes in RM. Each point represents the same replicate in each of the three technologies. The last numbers shown are the expected bands for the selected primers. If the number is shown to the left of the double bars, the band corresponds to Path 1 of the event (long path). If it is shown to the right, the band corresponds to Path 2 (short path)
Fig. 5 Increment of PSI for both microarrays and RNA-seq. The black (gray) dots represent events with high (low) standard deviation in the differential usage of the isoforms in both paths. The correlations for events with high and low variability are 0.90 and 0.61, respectively
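The captions of Figs. 4 and 5 refer to PSI and its increment without stating how they are estimated. A common definition, used here purely as an assumption, takes PSI as the share of the Path 1 (long path) signal in the combined signal of both paths, and the increment as the difference in PSI between two conditions. The function names and the toy numbers below are hypothetical.

```python
import math


def psi(path1_signal, path2_signal):
    """Percent spliced in: share of Path 1 in the total signal of both paths.

    Assumed definition; the estimators used per platform are not given here.
    Returns NaN when neither path has any signal.
    """
    total = path1_signal + path2_signal
    if total <= 0:
        return math.nan
    return path1_signal / total


def delta_psi(treated, control):
    """Increment of PSI between two conditions.

    treated, control: (path1_signal, path2_signal) tuples.
    """
    return psi(*treated) - psi(*control)


# Toy example: the long path dominates after treatment, not before.
example_psi = psi(30, 10)
example_delta = delta_psi(treated=(30, 10), control=(10, 30))
```

Under this definition, a positive increment means that Path 1 gains usage relative to Path 2 in the first condition.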
Fig. 6 a Events detected using RNA-seq and array technologies. b Type of event after matching the events detected by both technologies
Obtained FDR values after subsampling the number of reads in the RNA-seq experiment
| Decimation percentage | FDR |
|---|---|
| Decimated 10% | 1.86e-3 |
| Decimated 30% | 6.97e-4 |
Resources required for both technologies. Analysis was performed on a Linux server running a 64-bit CentOS distribution, with 16 cores (Intel Xeon E5-2670 @ 2.60 GHz) and 64 GB of RAM
| Step | Computing time (RNA-seq) | Computing time (arrays) | Memory requirements (RNA-seq) | Memory requirements (arrays) | Storage requirements (RNA-seq) | Storage requirements (arrays) |
|---|---|---|---|---|---|---|
| Mapping to the transcriptome (STAR) | 11.5 h | – | 32 Gb | – | 1023 GB | – |
| Splicing graph generation (SGSeq) | 2 days | 14 h | 8 Gb per core | 5 Gb | 70.3 Mb | 2 Gb |
| Event detection | 7 min 16 s | | 1.2 Gb per core | | 643.6 Mb | |
| Statistical analysis | 1 min 43 s | 3 min 06 s | 2 Gb | < 1 Gb | 6.2 Mb | 11 Mb |