Valerio Costa, Claudia Angelini, Italia De Feis, Alfredo Ciccodicola.
Abstract
In recent years, the introduction of massively parallel sequencing platforms for Next Generation Sequencing (NGS) protocols, able to sequence hundreds of thousands of DNA fragments simultaneously, has dramatically changed the landscape of genetics studies. RNA-Seq for transcriptome studies, ChIP-Seq for DNA-protein interactions, and CNV-Seq for large genome nucleotide variations are only some of the intriguing new applications supported by these innovative platforms. Among them, RNA-Seq is perhaps the most complex NGS application. Expression levels of specific genes, differential splicing, and allele-specific expression of transcripts can be accurately determined by RNA-Seq experiments to address many biological issues. These attributes are not readily achievable with the previously widespread hybridization-based or tag sequence-based approaches. However, the unprecedented level of sensitivity and the large amount of data produced by NGS platforms provide clear advantages as well as new challenges and issues. While this technology brings great power to make new biological observations and discoveries, it also requires considerable effort in the development of new bioinformatics tools to deal with these massive data files. This paper gives a survey of the RNA-Seq methodology, focusing particularly on the challenges that this application presents from both a biological and a bioinformatics point of view.
Year: 2010 PMID: 20625424 PMCID: PMC2896904 DOI: 10.1155/2010/853916
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
Figure 1. Evolution of DNA revolution.
Selection of papers on mammalian RNA-Seq.
| Reference | Organism | Cell type/tissue | NGS platform |
|---|---|---|---|
| Bainbridge et al., 2006 | | Prostate cancer cell line | Roche |
| Cloonan et al., 2008 | | ES cells and embryoid bodies | ABI |
| Core et al., 2008 | | Lung fibroblasts | Illumina |
| Hashimoto et al., 2008 | | HT29 cell line | ABI |
| Li et al., 2008 | | Prostate cancer cell line | Illumina |
| Marioni et al., 2008 | | Liver and kidney samples | Illumina |
| Morin et al., 2008 | | ES cells and embryoid bodies | Illumina |
| Morin et al., 2008 | Homo sapiens | HeLa S3 cell line | Illumina |
| Mortazavi et al., 2008 | | Brain, liver and skeletal muscle | Illumina |
| Rosenkranz et al., 2008 | | ES cells | Illumina |
| Sugarbaker et al., 2008 | | Malignant pleural mesothelioma, adenocarcinoma and normal lung | Roche |
| Sultan et al., 2008 | | Human embryonic kidney and B cell line | Illumina |
| Asmann et al., 2009 | | Universal and brain human reference RNAs | Illumina |
| Chepelev et al., 2009 | | Jurkat and CD4+ T cells | Illumina |
| Levin et al., 2009 | | K562 | Illumina |
| Maher et al., 2009 | | Prostate cancer cell lines | Roche, Illumina |
| Parkhomchuk et al., 2009 | | Brain | Illumina |
| Reddy et al., 2009 | | A549 cell line | Illumina |
| Tang et al., 2009 | | Blastomere and oocyte | ABI |
| Blekhman et al., 2010 | | Liver | Illumina |
| Heap et al., 2010 | | Primary CD4+ T cells | Illumina |
| Raha et al., 2010 | | K562 cell line | Illumina |
Figure 2. Library preparation and clonal amplification. Schematic representation of a workflow for library preparation in RNA-Seq experiments on the SOLiD platform. The figure depicts a total RNA sample after rRNA depletion, containing both polyA and non-polyA mRNAs, tRNAs, miRNAs and small noncoding RNAs. Ribo-depleted total RNA is fragmented (1), then ligated to specific adaptor sequences (2) and reverse-transcribed (3). The resulting cDNA is size-selected by gel electrophoresis (4), and the cDNAs are PCR-amplified (5). The size distribution is then evaluated (6). Emulsion PCR, with one cDNA fragment per bead, is used for the clonal amplification of cDNA libraries (7). Purified and enriched beads are finally deposited onto glass slides (8), ready to be sequenced by ligation.
Figure 3. RNA-Seq computational pipeline.
Figure 4. Strand-Specific Read Distribution in UCSC Genome Browser and IGV. (a) UCSC Genome Browser view of stranded sequences generated by an RNA-Seq experiment on an NGS platform. The screenshot shows two human genes in a characteristic “tail to tail” orientation; the specific expression on both strands where the two genes overlap indicates that the strandedness of reads is preserved. (b) The same genomic location in the IGV browser, showing the distribution of reads (coloured blocks) along the TMED1 gene. The grey arrows indicate the direction of transcription. The specific expression on both strands where the genes overlap indicates that the strandedness of reads is preserved. (c) A higher magnification of the reads mapping to the same region at the nucleotide level, useful for SNP analysis. The chromosome positions are shown at the top and the genomic loci of the genes at the bottom of each panel.
Figure 5. Mapping and quantification of the signal. RNA-Seq experiments produce short reads sequenced from processed mRNAs. When a reference genome is available, the reads can be mapped to it using efficient alignment software. Classical alignment tools accurately map reads that fall within an exon, but they fail to map spliced reads. To handle this problem, suitable mappers, based either on a junction library or on more sophisticated approaches, need to be considered. After the mapping step, annotated features can be quantified.
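The junction-library idea in the Figure 5 caption can be illustrated with a minimal sketch. It is a toy, not the method of any particular mapper: the sequences, exon coordinates, the `FLANK` size and the function names `build_junction_library` and `map_read` are all hypothetical, and exact string matching stands in for a real alignment algorithm. Reads are first searched against the genome; those that fail are searched against sequences made by joining the end of one exon to the start of the next, and the resulting hits are counted per feature.

```python
from collections import Counter

# Hypothetical toy gene: two exons separated by one intron.
# Coordinates are 0-based and end-exclusive.
EXON1, INTRON, EXON2 = "ATGGCCATTG", "GTAAGTCCTTTTCAG", "TGAAGCTGAC"
GENOME = EXON1 + INTRON + EXON2
EXONS = [(0, 10), (25, 35)]
FLANK = 8  # bases taken from each side of a junction (assumed value)


def build_junction_library(genome, exons, flank):
    """Join `flank` bases from the end of each exon to the start of the next."""
    library = []
    for (_, end_a), (start_b, _) in zip(exons, exons[1:]):
        library.append(genome[end_a - flank:end_a] + genome[start_b:start_b + flank])
    return library


def map_read(read, genome, exons, junctions, flank):
    """Return (feature_type, feature_index) for one read, using exact matching."""
    pos = genome.find(read)
    if pos != -1:
        # Unspliced hit: report the exon that fully contains the read, if any.
        for i, (start, end) in enumerate(exons):
            if start <= pos and pos + len(read) <= end:
                return ("exon", i)
        return ("other", pos)
    # No genomic hit: try the junction library, requiring the read to
    # actually span the exon-exon boundary located at offset `flank`.
    for j, seq in enumerate(junctions):
        pos = seq.find(read)
        if pos != -1 and pos < flank < pos + len(read):
            return ("junction", j)
    return ("unmapped", None)


JUNCTIONS = build_junction_library(GENOME, EXONS, FLANK)
READS = ["GGCCATT", "AAGCTGA", "CATTGTGAAG", "TTTTTTTTTT"]

# Quantification step: count reads per annotated feature.
counts = Counter(map_read(r, GENOME, EXONS, JUNCTIONS, FLANK) for r in READS)
for feature, n in sorted(counts.items(), key=str):
    print(feature, n)
```

The third read is the spliced one: it does not occur anywhere in the genome string, but it matches the junction sequence across the boundary, which is why a genome-only search would lose it.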
Figure 6. Alternative splicing. Schematic representation of the possible patterns of alternative splicing of a gene. Boxes are discrete exons that can be independently included in or excluded from the mRNA transcript. Light blue boxes represent constitutive exons; violet and red boxes are alternatively spliced exons. Dashed lines represent alternative splicing events. (a) Canonical exon skipping; (b) 5′ or (c) 3′ alternative splicing; (d) mutually exclusive splicing event involving the selection of only one of two or more exon variants; (e) intra-exonic “cryptic” splice site causing the exclusion of a portion of the exon from the transcript; (f) usage of new alternative 5′ or (g) 3′ exons; (h) intron retention.
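As a rough illustration of pattern (a) only, the sketch below flags candidate exon-skipping events from a list of observed splice junctions. It assumes each junction has already been reduced to a pair of exon indices (donor exon, acceptor exon) in transcript order; that representation and the function name `find_exon_skipping` are hypothetical, and the other patterns in the figure would need coordinate-level information that this toy does not model.

```python
def find_exon_skipping(junctions):
    """Map each junction that jumps over exons to the list of exons it skips.

    `junctions` is an iterable of (donor_exon_index, acceptor_exon_index)
    pairs, with exons numbered in transcript order (an assumed encoding).
    """
    events = {}
    for donor, acceptor in junctions:
        if acceptor - donor > 1:
            # Exons strictly between donor and acceptor are left out.
            events[(donor, acceptor)] = list(range(donor + 1, acceptor))
    return events


# Junctions 0-1, 1-2 and 2-3 join consecutive exons; 0-2 jumps over exon 1.
observed = [(0, 1), (1, 2), (0, 2), (2, 3)]
skipping = find_exon_skipping(observed)
print(skipping)
```

Here the presence of both the 0-1/1-2 junctions and the 0-2 junction suggests that exon 1 is included in some transcripts and skipped in others.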