Hideya Kawaji, Marina Lizio, Masayoshi Itoh, Mutsumi Kanamori-Katayama, Ai Kaiho, Hiromi Nishiyori-Sueki, Jay W Shin, Miki Kojima-Ishiyama, Mitsuoki Kawano, Mitsuyoshi Murata, Noriko Ninomiya-Fukuda, Sachi Ishikawa-Kato, Sayaka Nagao-Sato, Shohei Noma, Yoshihide Hayashizaki, Alistair R R Forrest, Piero Carninci.
Abstract
CAGE (cap analysis gene expression) and RNA-seq are two major technologies used to identify transcript abundances as well as structures. They measure expression by sequencing from either the 5' end of capped molecules (CAGE) or tags randomly distributed along the length of a transcript (RNA-seq). Library protocols for clonally amplified (Illumina, SOLiD, 454 Life Sciences [Roche], Ion Torrent), second-generation sequencing platforms typically employ PCR preamplification prior to clonal amplification, while third-generation, single-molecule sequencers can sequence unamplified libraries. Although these transcriptome profiling platforms have been demonstrated to be individually reproducible, no systematic comparison has been carried out between them. Here we compare CAGE, using both second- and third-generation sequencers, and RNA-seq, using a second-generation sequencer based on a panel of RNA mixtures from two human cell lines to examine power in the discrimination of biological states, detection of differentially expressed genes, linearity of measurements, and quantification reproducibility. We found that the quantified levels of gene expression are largely comparable across platforms and conclude that CAGE and RNA-seq are complementary technologies that can be used to improve incomplete gene models. We also found systematic bias in the second- and third-generation platforms, which is likely due to steps such as linker ligation, cleavage by restriction enzymes, and PCR amplification. This study provides a perspective on the performance of these platforms, which will be a baseline in the design of further experiments to tackle complex transcriptomes uncovered in a wide range of cell types.
Year: 2014 PMID: 24676093 PMCID: PMC3975069 DOI: 10.1101/gr.156232.113
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Experimental design and reproducibility. (A) Schematic representation of experimental design. (B) Scatter plots of quantified levels of gene expression and basic statistics. (±) Standard deviation. (C) Scatter plots of quantified levels of TSS activities at 1-bp resolution.
Figure 2. Gene expression quantification across different RNAs. (A) Venn diagram of genes up-regulated in HeLa cells relative to THP-1 cells. (B) Hierarchical clustering of the six RNAs, based on the 8000 most highly expressed genes. Gray font indicates the reliability of each grouping, given as approximately unbiased probabilities from multiscale bootstrap resampling calculated with the pvclust package (Suzuki and Shimodaira 2006).
Figure 3. Gene expression quantification by using different platforms. (A) Scatter plots of gene expression measured on different platforms, where RPKM (reads per kilobase of exon model per million) is used for RNA-seq. Spearman’s correlation coefficient is shown for each comparison. (B) Individual profiles in the SNAPC4 locus. Blue signals indicate reverse-strand signals by the CAGE and RNA-seq platforms. ENCODE histone modification profiles indicating promoter and elongation activities are shown below. (C) MA plot between the 50% mixture experimental profile and the computationally synthesized one from the THP-1 and HeLa profiles. The 50% profile is based on the average of triplicates, while the computational one is based on combining six profiles (triplicates of THP-1 and HeLa).
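The quantities named in the Figure 3 caption can be illustrated with a minimal sketch. It assumes the conventional definitions of RPKM and of MA-plot coordinates (M as the log2 ratio, A as the mean log2 level). The function names, the pseudocount of 1, and the simple weighted-sum mixing rule are illustrative assumptions, not the authors' pipeline.

```python
import math


def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    kilobases = exon_length_bp / 1_000
    millions = total_mapped_reads / 1_000_000
    return read_count / kilobases / millions


def ma_values(x, y, pseudocount=1.0):
    """Return (M, A) for two expression values.

    M is the log2 ratio and A is the mean log2 level. The pseudocount
    is an assumption added here to avoid log2(0).
    """
    log_x = math.log2(x + pseudocount)
    log_y = math.log2(y + pseudocount)
    return log_x - log_y, (log_x + log_y) / 2


def synthesize_mixture(profile_a, profile_b, fraction_a=0.5):
    """Hypothetical in-silico mixture: weighted sum of two gene -> level dicts."""
    genes = set(profile_a) | set(profile_b)
    return {
        gene: fraction_a * profile_a.get(gene, 0.0)
        + (1 - fraction_a) * profile_b.get(gene, 0.0)
        for gene in genes
    }
```

Under these assumptions, a gene measured identically in the experimental 50% mixture and in the synthesized profile has M = 0, so departures from the M = 0 line in such a plot point to nonlinearity in the measurement.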
Figure 4. Systematic bias at the gene level. (A) Schematic view of the CAGE platforms. Blue box indicates RNA; pink box, DNA; yellow box, EcoP15I sites; red box, internal EcoP15I sites; and green and purple boxes, 5′ and 3′ linkers. Text in blue suggests potential causes of gene quantification bias. (B) Relative expression of the CAGE profiles against RNA-seq quantification, according to GC content within 500 bp from RefSeq TSS. (C) Relative expression of the CAGE profiles depending on the presence of EcoP15I sites on an antisense strand to the gene orientation. (D) Relative expression of the RNA-seq profiles against HeliScopeCAGE quantification, according to GC content of the exons.
Figure 5. Systematic bias at a single-base-pair resolution. (A) Relative TSS activities of IlluminaCAGE against HeliScopeCAGE. (B) The CAGE and RNA-seq profiles on the TUBB promoter.