David J Wright1, Nicola A L Hall2,3, Naomi Irish1, Angela L Man1, Will Glynn1, Arne Mould2,3, Alejandro De Los Angeles2,3, Emily Angiolini1, David Swarbreck1, Karim Gharbi1, Elizabeth M Tunbridge2,3, Wilfried Haerty4.
Abstract
BACKGROUND: Alternative splicing is a key mechanism underlying cellular differentiation and a driver of complexity in mammalian neuronal tissues. However, which isoforms are differentially used or expressed, and how this affects cellular differentiation, remains unclear. Long read sequencing allows full-length transcript recovery and quantification, enabling transcript-level analysis of alternative splicing processes and how these change with cell state. Here, we utilise Oxford Nanopore Technologies sequencing to produce a custom annotation of a well-studied human neuroblastoma cell line, SH-SY5Y, and to characterise isoform expression and usage across differentiation.
Year: 2022 PMID: 35012468 PMCID: PMC8744310 DOI: 10.1186/s12864-021-08261-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Comparison of detection sensitivity between ONT (long read) and short read sequencing (Illumina paired-end, SRS) for Sequin synthetic spike-ins MixA (A) and MixB (B). The labelled sequin (red point) in each plot is at the ONT limit of quantification (LOQ) for that mix: 0.059 attomol/μl for mix A and 0.27 attomol/μl for mix B. Plot C shows the correlation of ONT differential isoform observed vs expected expression (log2 fold change + 1) of synthetic Sequin spike-ins of known concentration. The Pearson correlation coefficient is displayed along with a linear regression trend line with standard error in pale grey
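The comparison in plot C can be illustrated with a minimal sketch: compute a log2 fold change for each spike-in from its known concentrations in the two mixes, then correlate it with the fold change measured from read counts. The function names and the input values are hypothetical and chosen only for illustration; they are not taken from the study, and the exact transformation used for the figure is not reproduced here.

```python
import math


def log2_fold_change(value_a, value_b):
    """Log2 ratio of two positive values (e.g. concentrations or counts)."""
    return math.log2(value_a / value_b)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical spike-ins: (concentration in mix A, concentration in mix B)
known = [(8.0, 1.0), (2.0, 1.0), (1.0, 4.0), (1.0, 1.0)]
# Hypothetical normalised read counts for the same spike-ins
counts = [(790.0, 105.0), (210.0, 98.0), (95.0, 410.0), (100.0, 102.0)]

expected = [log2_fold_change(a, b) for a, b in known]
observed = [log2_fold_change(a, b) for a, b in counts]
r = pearson_r(expected, observed)
```

A coefficient close to 1 would indicate that measured fold changes track the known ones across the concentration range.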
Fig. 2 A Mean normalised coverage (± std. dev) across transcript normalised positions for ONT and Illumina libraries calculated with picard toolkit. B Custom UCSC Genome Browser visualization of the full coverage of short read RNA-Seq and a subset of long read RNA across a representative genome model for two samples (undifferentiated: blue, differentiated: red). Full UCSC tracks for visualisation of all sequencing reads are available as supplementary material
Fig. 3 Breakdown of novel transcripts identified using ONT long reads and TALON custom transcriptome annotation. Cassette exons are previously unannotated positions. *CPAT assessment of CDS coding probability (CP ≥ 0.364). **Novel junctions are previously unannotated junctions identified between existing exonic parts. All novel assessments are relative to Gencode v29 human transcriptome annotation
Number of genes and transcripts differentially expressed between undifferentiated and differentiated cells, based on ONT long read counts after multiple testing correction (FDR < 0.05). Bracketed numbers give the subset of each total that are TALON-identified novel transcripts. U = undifferentiated and D = differentiated cells; arrows indicate the direction of expression change
| Metric | Gene level | Transcript level (TALON) |
|---|---|---|
| Total features assessed | 32,977 | 99,067 (1855) |
| Differentially Expressed | 4239 | 5456 (197) |
| ↑U ↓D (all > 0 log2FC) | 2041 | 2390 (67) |
| ↑D ↓U (all < 0 log2FC) | 2198 | 3066 (130) |
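The counts above come from applying an FDR < 0.05 cut-off and then splitting significant features by the sign of their log2 fold change. The sketch below shows one way this could be done. The text does not state which correction procedure was used, so the Benjamini-Hochberg adjustment here is an assumption, and the function names and input values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def classify(log2fc, fdr, alpha=0.05):
    """Label a feature using the table's convention: log2FC > 0 is higher in U."""
    if fdr >= alpha:
        return "not significant"
    return "up in U" if log2fc > 0 else "up in D"


# Hypothetical features: (log2 fold change, raw p-value)
features = [(2.1, 0.01), (-0.4, 0.60), (-1.8, 0.03), (3.0, 0.005)]
fdr_values = benjamini_hochberg([p for _, p in features])
labels = [classify(fc, q) for (fc, _), q in zip(features, fdr_values)]
```

With these made-up inputs, three of the four features pass the threshold. In the table, the two directional rows sum to the "Differentially Expressed" row at both gene level (2041 + 2198 = 4239) and transcript level (2390 + 3066 = 5456).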
Fig. 4 Panel of: A gene-level differential expression (DE) smear plot, with solid red lines highlighting the ±1.5 logFC threshold; B gene-level DE of CACNA2D2 during differentiation; C isoform-level DE smear plot with ±1.5 logFC threshold; D CACNA2D2 isoform expression, showing the novel TALON isoform with the highest read count. Red points on smear plots indicate significant differential expression (FDR < 0.05). Boxplots display median and IQR. Short and long read mapping example provided in Fig. S8