Masahide Seki, Eri Katsumata, Ayako Suzuki, Sarun Sereewattanawoot, Yoshitaka Sakamoto, Junko Mizushima-Sugano, Sumio Sugano, Takashi Kohno, Martin C Frith, Katsuya Tsuchihara, Yutaka Suzuki.
Abstract
Current RNA-Seq methods analyse fragments of mRNAs, from which it is occasionally difficult to reconstruct the entire transcript structure. Here, we performed and evaluated a recent procedure for full-length cDNA sequencing using the Nanopore sequencer MinION. We applied MinION RNA-Seq to various applications that are not always easy with the usual Illumina RNA-Seq. First, we found that, even though the sequencing accuracy was still limited to 92.3%, practically useful RNA-Seq analysis is possible. In particular, taking advantage of the long-read nature of MinION, we demonstrate the identification of splicing patterns and their combinations in the form of full-length cDNAs without losing precise information concerning their expression levels. Transcripts of fusion genes in cancer cells can also be identified and characterized. Furthermore, the full-length cDNA information can be used to phase the SNPs detected by WES on the transcripts, providing essential information for identifying allele-specific transcriptional events. We constructed a catalogue of full-length cDNAs in seven major organs for two individuals and identified allele-specific transcription and splicing. Finally, we demonstrate that single-cell sequencing is also possible. RNA-Seq on the MinION platform should provide a novel approach that is complementary to current RNA-Seq.
Keywords: allelic expression; nanopore sequencing; transcript isoform; transcriptome
Year: 2019 PMID: 30462165 PMCID: PMC6379022 DOI: 10.1093/dnares/dsy038
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Figure 1. MinION sequencing of full-length cDNA. (A) Schematic of the FL-cDNA-Seq method. (B) Distribution of the Bioanalyzer concentration and the frequency of pass 2D reads of LC2/ad (R9.4) in each length range. (C, D) Distribution of the coverage (C) and identity (D) of pass 2D reads of LC2/ad, aligned by LAST (top) and BWA MEM (bottom). The average coverage or identity and the percentage of reads with a coverage or identity greater than 0.8 are shown on the graphs. Coverage was defined as the fraction of the length of a RefSeq transcript covered by a single MinION read.
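The coverage definition in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, coordinates are assumed to be 0-based and end-exclusive on the transcript, and the identity formula (matches divided by alignment columns) is an assumption, since the caption does not define identity.

```python
def transcript_coverage(aligned_start: int, aligned_end: int, transcript_length: int) -> float:
    """Fraction of a reference transcript spanned by one read alignment.

    Coordinates are 0-based, end-exclusive positions on the transcript.
    """
    if transcript_length <= 0:
        raise ValueError("transcript_length must be positive")
    if not 0 <= aligned_start <= aligned_end <= transcript_length:
        raise ValueError("alignment must lie within the transcript")
    return (aligned_end - aligned_start) / transcript_length


def alignment_identity(matches: int, alignment_columns: int) -> float:
    """Assumed definition: matching bases divided by all alignment columns
    (matches + mismatches + gap positions)."""
    if alignment_columns <= 0:
        raise ValueError("alignment_columns must be positive")
    if not 0 <= matches <= alignment_columns:
        raise ValueError("matches must be between 0 and alignment_columns")
    return matches / alignment_columns


def fraction_above(values, threshold: float = 0.8) -> float:
    """Share of reads whose coverage (or identity) is strictly above the threshold."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    return sum(v > threshold for v in values) / len(values)
```

For example, a read aligned to positions 100 to 1,900 of a 2,000-base transcript has a coverage of 0.9 under this definition.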
Figure 2. Comparison of FL-cDNA-Seq and Illumina RNA-Seq. (A) Gene expression measured by FL-cDNA-Seq of LC2/ad (R9.4) was compared with that measured by TruSeq RNA (left) and SMART-Seq (right). Pearson correlation coefficients are shown on the graph. (B) Influence of sequencing depth on the estimation of gene expression level and on gene detection. Reads for each method were randomly sampled in triplicate. The average of the Pearson correlation coefficients between TruSeq RNA and the randomly sampled data for FL-cDNA-Seq and SMART-Seq is shown (left). The average number of genes with an expression level of more than 1 tpm or ppm is shown (right). (C) Comparison to qRT-qPCR. Forty-four genes detected by all methods were analyzed. Expression of these genes was normalized to GAPDH. Pearson correlation coefficients are shown on the graph. qRT-qPCR data of LC2/ad were obtained as in our previous study.
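The quantities compared in Figure 2 (expression in ppm, Pearson correlation between methods, and the number of genes above 1 ppm) can be illustrated with a small standard-library sketch. The helper names are hypothetical, and correlating log10-transformed values with a pseudocount of 1 is an assumption; the caption does not say which transformation, if any, was used.

```python
import math


def to_ppm(read_counts: dict) -> dict:
    """Scale per-gene read counts to parts per million of all counted reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {gene: count * 1_000_000 / total for gene, count in read_counts.items()}


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def log_expression_correlation(expr_a: dict, expr_b: dict, pseudocount: float = 1.0) -> float:
    """Correlate log10 expression over genes present in both data sets (assumed transform)."""
    shared = sorted(set(expr_a) & set(expr_b))
    xs = [math.log10(expr_a[g] + pseudocount) for g in shared]
    ys = [math.log10(expr_b[g] + pseudocount) for g in shared]
    return pearson(xs, ys)


def detected_genes(expr: dict, threshold: float = 1.0) -> int:
    """Number of genes whose expression is strictly above the threshold."""
    return sum(value > threshold for value in expr.values())
```

To mimic the depth analysis in panel B, one could draw a random subset of reads (for instance with `random.sample`), recount reads per gene, and repeat these calculations for each of the three replicates.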
Application of MinION transcriptome sequencing: statistics for the isoforms detected by FL-cDNA-Seq
| No. of mapped reads | No. of PF reads | % PF reads | Known isoform: No. of reads | Known isoform: % reads | Known isoform: No. of isoforms | Novel isoform: No. of reads | Novel isoform: % reads | Novel isoform: No. of isoforms | Novel isoform: No. of Cufflinks-supported isoforms | Novel isoform: No. of Illumina-supported isoforms |
|---|---|---|---|---|---|---|---|---|---|---|
| 556,195 | 297,865 | 54% | 294,780 | 99% | 6,018 | 3,085 | 1% | 158 | 33 | 137 |
Figure 3. Applications of MinION transcriptome sequencing. (A) Novel isoforms of the STRAP gene (left) and the ACTG1 gene (right). RefSeq transcripts and Cufflinks assemblies from Illumina RNA-Seq are shown in the upper panel. FL-cDNA-Seq reads annotated as novel isoforms are shown in the middle panel. Illumina RNA-Seq reads are shown in the lower panel. (B) Three fusion genes detected by FL-cDNA-Seq. CCDC6-RET is a known fusion gene of LC2/ad. Only the fusion chromosome of CCDC6-RET harbors a G to T SNP. WAC-SFMBT2 and ZSCAN22-CHMP2A were detected in LC2/ad and PC-9, respectively. (C) Sanger sequencing of the three fusion junctions. (D) Heterozygous SNPs phased by FL-cDNA-Seq reads. Two phased genes, DSG2 (left) and SEPT9 (right), detected by the R9.4 reads of LC2/ad are shown as examples. The distances between the phased SNPs on the transcript and on the genome, and the heterozygous SNP patterns, are shown at the top. The FL-cDNA-Seq reads are shown in the middle. The phased SNP patterns and the number of reads covering all the SNPs are shown at the bottom.
Application of MinION transcriptome sequencing: number of fusion gene candidates
| Cell line | Flow cell | No. of reads of fusion gene candidates | No. of fusion gene candidates |
|---|---|---|---|
| H1975 | R9 | 8 | 8 |
| PC-9 | R9 | 5 | 3 |
| PC-7 | R9 | 5 | 5 |
| H2228 | R9 | 7 | 7 |
| VMRC-LCD | R9 | 14 | 11 |
| LC2/ad | R9 | 10 | 10 |
| LC2/ad | R9.4 | 158 | 151 |
Application of MinION transcriptome sequencing: phased gene number and heterozygous site distance
| Cell line | Flow cell | No. of phased genes | Average phase distance | Average phase distance without intron |
|---|---|---|---|---|
| H1975 | R9 | 17 | 5,791 | 737 |
| PC-9 | R9 | 5 | 5,256 | 570 |
| PC-7 | R9 | 2 | 1,546 | 725 |
| H2228 | R9 | 12 | 6,528 | 484 |
| VMRC-LCD | R9 | 2 | 2,246 | 102 |
| LC2/ad | R9 | 16 | 1,542 | 219 |
| LC2/ad | R9.4 | 237 | 8,872 | 569 |
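The two distance columns above differ because introns are spliced out of a cDNA read. The sketch below shows one way to compute both distances for a pair of SNPs. It assumes that "phase distance" means the distance between two phased heterozygous SNPs, that exons are given as 0-based, end-exclusive genomic intervals, and that both SNPs lie in exons; the function names are hypothetical.

```python
def transcript_offset(exons, genomic_pos: int) -> int:
    """Offset of a genomic position within the spliced transcript.

    Exons are (start, end) genomic intervals, 0-based and end-exclusive,
    accumulated in ascending genomic order.
    """
    offset = 0
    for start, end in sorted(exons):
        if start <= genomic_pos < end:
            return offset + (genomic_pos - start)
        offset += end - start
    raise ValueError("position is not inside any exon")


def phase_distances(exons, snp_a: int, snp_b: int):
    """Return (distance on the genome, distance on the spliced transcript)."""
    genomic = abs(snp_b - snp_a)
    spliced = abs(transcript_offset(exons, snp_b) - transcript_offset(exons, snp_a))
    return genomic, spliced
```

With exons at 100 to 200 and 1,000 to 1,100, SNPs at positions 150 and 1,050 are 900 bases apart on the genome but only 100 bases apart on the transcript, which is why a single read can link sites that are far apart on the genome.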
Number of genes with imbalanced allelic expression
| Tissue | Male: No. of phased genes | Male: No. of genes with allelic expression ( | Female: No. of phased genes | Female: No. of genes with allelic expression ( |
|---|---|---|---|---|
| Merged data | 1,219 | 105 | 1,707 | 201 |
| Colon | 625 | 64 | 1,394 | 161 |
| Heart | 836 | 78 | 1,227 | 161 |
| Kidney | 842 | 82 | 1,305 | 165 |
| Liver | 854 | 85 | 1,387 | 172 |
| Lung | 986 | 87 | 1,563 | 175 |
| Pancreas | 760 | 83 | 818 | 143 |
| Skeletal muscle | 792 | 63 | 1,215 | 157 |
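Once reads are assigned to alleles by their phased SNP pattern, allelic imbalance can be assessed by testing the read counts against an equal split. The statistical test used in the study is not stated in this record, so the exact two-sided binomial test below is an assumption, as are the function names and the 0.01 cut-off applied per gene.

```python
import math


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value for an allele read split.

    Sums the probabilities of all outcomes that are no more likely than
    the observed one under the null proportion p.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("at least one read is required")

    def prob(k: int) -> float:
        # Work in log space so large read counts do not overflow.
        log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        return math.exp(log_comb + k * math.log(p) + (n - k) * math.log(1 - p))

    observed = prob(ref_reads)
    # A small relative tolerance keeps ties robust to rounding error.
    total = sum(q for q in (prob(k) for k in range(n + 1)) if q <= observed * (1 + 1e-7))
    return min(1.0, total)


def has_allelic_imbalance(ref_reads: int, alt_reads: int, alpha: float = 0.01) -> bool:
    """Flag a gene whose allele read split deviates from 1:1 at the given level."""
    return binomial_two_sided_p(ref_reads, alt_reads) < alpha
```

A gene with 30 reads on one allele and 5 on the other would be flagged under these assumptions, whereas a 6 to 4 split would not. When many genes are tested, a multiple-testing correction may also be appropriate.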
Figure 4. Detection of differential allelic expression using FL-cDNA-Seq reads. Two genes with differential allelic expression are shown as examples: UBXN4 (A) in the colon and liver of the female sample and GNAS (B) in the merged data of the male sample. The FL-cDNA-Seq reads are shown separately for each SNP pattern. The positions of the SNPs are marked by arrows. (A) The expression patterns of UBXN4 in each tissue of the female sample are also shown. *P < 0.01.
Figure 5. Single-cell FL-cDNA-Seq using C1. (A) Comparison of the expression level of the virtual bulk of nine single cells of LC2/ad quantified by FL-cDNA-Seq and SMART-Seq. (B) Comparison of the expression level of the bulk and the virtual bulk of nine single cells of LC2/ad quantified by FL-cDNA-Seq. (C) Relationship between the number of mapped FL-cDNA-Seq reads and the correlation coefficient between the same single cell data quantified by FL-cDNA-Seq using MinION. The Pearson correlation coefficient between them is shown in the graph. (A, B) The Pearson correlation coefficient was calculated using the genes detected in both data sets.
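A "virtual bulk" profile can be pictured as the per-gene sum of read counts over the individual cells. The sketch below assumes that interpretation, which the caption does not spell out, and uses hypothetical function names. It also restricts a comparison to genes detected in both data sets, as described for panels A and B.

```python
from collections import Counter


def virtual_bulk(cell_counts) -> dict:
    """Pool per-gene read counts from several single cells into one profile."""
    pooled = Counter()
    for counts in cell_counts:
        pooled.update(counts)  # Counter.update adds counts gene by gene
    return dict(pooled)


def shared_detected_genes(profile_a: dict, profile_b: dict) -> list:
    """Genes with at least one read in both profiles, in sorted order."""
    return sorted(
        gene for gene in set(profile_a) & set(profile_b)
        if profile_a[gene] > 0 and profile_b[gene] > 0
    )
```

The pooled counts can then be normalized and correlated against a true bulk profile over the shared genes.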