Matthew S Hestand, Andreas Klingenhoff, Matthias Scherf, Yavuz Ariyurek, Yolande Ramos, Wilbert van Workum, Makoto Suzuki, Thomas Werner, Gert-Jan B van Ommen, Johan T den Dunnen, Matthias Harbers, Peter A C 't Hoen.
Abstract
Next-generation sequencing is well suited to evaluating mRNA abundance in gene expression studies. Here we compare two alternative technologies, cap analysis of gene expression (CAGE) and serial analysis of gene expression (SAGE), applied to the same RNA samples. In addition to quantifying gene expression levels, CAGE can be used to identify tissue-specific transcription start sites, while SAGE monitors 3'-end usage. We used both methods to gain more insight into the transcriptional control of myogenesis, studying differential gene expression between differentiated and proliferating C2C12 myoblast cells with statistical evaluation of reproducibility and differential gene expression. Both CAGE and SAGE provided highly reproducible data (Pearson's correlations >0.92 among biological triplicates). With both methods we found around 10,000 genes expressed at levels >2 transcripts per million (approximately 0.3 copies per cell), with an overlap of 86%. We identified 4304 and 3846 genes differentially expressed between proliferating and differentiated C2C12 cells by CAGE and SAGE, respectively, with an overlap of 2144 genes. We identified 196 novel regulatory regions with preferential use in proliferating or differentiated cells. Next-generation sequencing of CAGE and SAGE libraries provides consistent expression levels and can enrich current genome annotations with tissue-specific promoters and alternative 3'-UTR usage.
Year: 2010 PMID: 20615900 PMCID: PMC2938216 DOI: 10.1093/nar/gkq602
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
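The expression figures in the abstract rest on simple normalization arithmetic. The sketch below is a minimal illustration, not the authors' pipeline: it scales raw tag counts to tags per million, converts that to approximate copies per cell using the roughly 150,000 mRNAs per cell implied by the abstract's own numbers (2 TPM ≈ 0.3 copies, so 0.3 / 2 × 10⁶), applies the >2 TPM detection threshold, and works out the overlap fractions for the 4304 CAGE and 3846 SAGE differentially expressed genes (2144 shared). All function names and the `MRNA_PER_CELL` constant are hypothetical.

```python
# Minimal sketch; function names and MRNA_PER_CELL are hypothetical.
MRNA_PER_CELL = 150_000  # assumption: implied by "2 TPM ~ 0.3 copies per cell"


def tags_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw tag counts per gene so that they sum to one million."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


def copies_per_cell(tpm: float, mrna_per_cell: int = MRNA_PER_CELL) -> float:
    """Convert a TPM value to approximate transcript copies per cell."""
    return tpm * mrna_per_cell / 1_000_000


def detected(tpm: dict[str, float], threshold: float = 2.0) -> set[str]:
    """Genes expressed above the detection threshold (default 2 TPM)."""
    return {gene for gene, value in tpm.items() if value > threshold}


def overlap_summary(n_a: int, n_b: int, n_shared: int) -> dict:
    """Overlap statistics for two gene lists, given only their sizes."""
    union = n_a + n_b - n_shared
    return {
        "frac_of_a": n_shared / n_a,  # share of list A also found in B
        "frac_of_b": n_shared / n_b,  # share of list B also found in A
        "union": union,
        "jaccard": n_shared / union,
    }


if __name__ == "__main__":
    # Differentially expressed genes: 4304 (CAGE), 3846 (SAGE), 2144 shared.
    summary = overlap_summary(4304, 3846, 2144)
    print(round(summary["frac_of_a"], 3))  # 0.498
    print(round(summary["frac_of_b"], 3))  # 0.557
    print(summary["union"])  # 6006
    print(round(copies_per_cell(2.0), 2))  # 0.3
```

By this arithmetic, about half of the CAGE differentially expressed genes (49.8%) and 55.7% of the SAGE ones are shared between the two methods.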
Sequencing results
| Sample | No. of reads sequenced | No. of reads aligned | Percent aligned (%) |
|---|---|---|---|
| CAGE samples | | | |
| Prolif-1 | 4 886 341 | 2 086 233 | 42.7 |
| Prolif-1 duplo | 3 933 233 | 1 770 247 | 45.0 |
| Prolif-2 | 5 003 964 | 2 421 443 | 48.4 |
| Prolif-3 | 4 734 605 | 2 062 081 | 43.6 |
| Diff-1 | 4 525 321 | 1 679 081 | 37.1 |
| Diff-1 duplo | 3 101 153 | 1 252 451 | 40.4 |
| Diff-2 | 5 060 041 | 2 195 263 | 43.4 |
| Diff-3 | 4 830 194 | 1 578 087 | 32.7 |
| SAGE samples | | | |
| Prolif-1 | 5 941 753 | 3 351 426 | 56.4 |
| Prolif-2 | 7 768 787 | 4 464 057 | 57.5 |
| Prolif-3 | 6 723 476 | 3 878 953 | 57.7 |
| Diff-1 | 9 467 926 | 5 811 947 | 61.4 |
| Diff-2 | 7 269 002 | 4 618 715 | 63.5 |
| Diff-3 | 4 392 416 | 2 494 618 | 56.8 |
Sample labels for CAGE and SAGE: Prolif, proliferating cells; Diff, differentiated cells; the trailing number identifies the biological replicate. CAGE samples that were sequenced in duplicate are marked ‘duplo’. The table lists the number of reads sequenced, the number of reads that align uniquely to the repeat-masked genome, and the percentage aligned.
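The last column of the table is the number of uniquely aligned reads as a percentage of reads sequenced. The short check below, using counts copied from the table, recomputes that column; rounding to one decimal reproduces the published values. The `percent_aligned` helper is a name chosen for this sketch.

```python
# Reads sequenced and reads aligned uniquely to the repeat-masked genome,
# copied from the table above: sample -> (sequenced, aligned).
CAGE = {
    "Prolif-1": (4_886_341, 2_086_233),
    "Prolif-1 duplo": (3_933_233, 1_770_247),
    "Prolif-2": (5_003_964, 2_421_443),
    "Prolif-3": (4_734_605, 2_062_081),
    "Diff-1": (4_525_321, 1_679_081),
    "Diff-1 duplo": (3_101_153, 1_252_451),
    "Diff-2": (5_060_041, 2_195_263),
    "Diff-3": (4_830_194, 1_578_087),
}
SAGE = {
    "Prolif-1": (5_941_753, 3_351_426),
    "Prolif-2": (7_768_787, 4_464_057),
    "Prolif-3": (6_723_476, 3_878_953),
    "Diff-1": (9_467_926, 5_811_947),
    "Diff-2": (7_269_002, 4_618_715),
    "Diff-3": (4_392_416, 2_494_618),
}


def percent_aligned(sequenced: int, aligned: int) -> float:
    """Uniquely aligned reads as a percentage of reads sequenced."""
    return 100 * aligned / sequenced


if __name__ == "__main__":
    for method, samples in (("CAGE", CAGE), ("SAGE", SAGE)):
        for sample, (seq, ali) in samples.items():
            print(f"{method} {sample}: {percent_aligned(seq, ali):.1f}%")
```

Laid out this way, the recomputed values also make the difference between methods visible: every SAGE sample aligns at a higher rate (56.4 to 63.5%) than any CAGE sample (32.7 to 48.4%).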
Figure 1. CAGE and SAGE wiggle tracks in the UCSC Genome Browser for proliferating (Prolif) and differentiated (Diff) cells at the myogenic marker MyoD. Only reads aligning to the forward strand, the coding direction of MyoD, are displayed. Chromosomal positions are indicated at the top. For each track, the Y-axis corresponds to the number of tags aligned at that genomic position. The tracks of each technique share a scale whose maximum is the highest count for that technique in this viewing window (129 for CAGE and 3912 for SAGE). There is 5′ and 3′ concordance for CAGE and SAGE samples, respectively. CAGE produces broader peaks, reflecting TSSs plus 26 nt of downstream sequence, whereas SAGE produces discrete peaks. Differentiated samples contain more tags than proliferating samples.
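Conceptually, each wiggle track in Figure 1 is a per-position count of aligned tags on one strand. The sketch below assumes a hypothetical input of `(chromosome, 1-based position, strand)` tuples, one per aligned tag, and emits a UCSC `variableStep` WIG block for the forward strand only, plus the window maximum used to set a shared Y-axis scale. It illustrates the idea and is not the authors' track-building code; all function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# (chrom, 1-based position, strand) per aligned tag; hypothetical input format.
Tag = Tuple[str, int, str]


def forward_counts(tags: Iterable[Tag], chrom: str) -> Counter:
    """Count forward-strand tags per genomic position on one chromosome."""
    return Counter(pos for c, pos, strand in tags if c == chrom and strand == "+")


def to_variable_step_wig(counts: Counter, chrom: str) -> str:
    """Render per-position counts as a variableStep WIG block."""
    lines = [f"variableStep chrom={chrom}"]
    lines += [f"{pos} {counts[pos]}" for pos in sorted(counts)]
    return "\n".join(lines)


def window_max(counts: Counter, start: int, end: int) -> int:
    """Highest count within [start, end], e.g. to set a shared Y-axis scale."""
    return max((n for pos, n in counts.items() if start <= pos <= end), default=0)


if __name__ == "__main__":
    tags = [
        ("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 105, "+"),
        ("chr1", 100, "-"),  # reverse strand: excluded
        ("chr2", 100, "+"),  # other chromosome: excluded
    ]
    counts = forward_counts(tags, "chr1")
    print(to_variable_step_wig(counts, "chr1"))
    print(window_max(counts, 90, 110))  # 2
```

Computing one maximum per technique over the viewing window, rather than per track, mirrors the caption's statement that CAGE tracks share a maximum of 129 and SAGE tracks a maximum of 3912.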
Figure 2. High reproducibility of CAGE regions between sequencing duplicates (A) and between biological replicates (B). (C) Correlation between CAGE samples from proliferating and differentiated cells. (D) High reproducibility between SAGE biological replicates. (E) Correlation between SAGE samples from proliferating and differentiated cells. Plotted values are the square root of the number of tags per region.
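The replicate comparisons in Figure 2 plot square-root-transformed tag counts per region, and the abstract reports Pearson's correlations above 0.92 among biological triplicates. The caption does not say whether those coefficients were computed on the square-root scale; this sketch simply applies Pearson's r to square-root-transformed counts as one way to reproduce such a comparison. The helper names are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError if a replicate is constant


def sqrt_pearson(counts_a: Sequence[int], counts_b: Sequence[int]) -> float:
    """Correlate two replicates after square-root transforming tags per region."""
    return pearson([math.sqrt(c) for c in counts_a], [math.sqrt(c) for c in counts_b])


if __name__ == "__main__":
    rep1 = [1, 4, 9, 16]   # toy tag counts per region
    rep2 = [4, 16, 36, 64]
    print(round(sqrt_pearson(rep1, rep2), 6))  # 1.0
```

The square root compresses the large range of tag counts, so a few very highly expressed regions dominate the scatter less than they would on a linear scale.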
Figure 3. Correlation of CAGE versus SAGE for proliferating samples (A), differentiated samples (B) and the expression ratio between differentiated and proliferating cells (C). Values in A and B are the square root of the number of tags per gene; values in C are the log ratio of the normalized number of tags per gene in differentiated over proliferating cells. (D) and (E) show the overlap between CAGE and SAGE in detectable genes and in differentially expressed genes, respectively.
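Panel C compares per-gene log ratios of normalized tags (differentiated over proliferating) between the two methods. The caption gives neither the logarithm base nor how zero counts are handled, so this sketch assumes base 2 and adds a hypothetical pseudocount of 1 tag per million to both samples before dividing. Function names are hypothetical.

```python
import math


def per_million(counts: dict[str, int]) -> dict[str, float]:
    """Normalize raw tags per gene to tags per million for one sample."""
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def log2_ratios(diff: dict[str, int], prolif: dict[str, int],
                pseudo: float = 1.0) -> dict[str, float]:
    """log2 of normalized Diff over Prolif tags per gene.

    Base 2 and the pseudocount (which avoids dividing by or taking the log
    of zero) are assumptions of this sketch, not taken from the paper.
    """
    d, p = per_million(diff), per_million(prolif)
    genes = set(d) | set(p)
    return {g: math.log2((d.get(g, 0.0) + pseudo) / (p.get(g, 0.0) + pseudo))
            for g in genes}


if __name__ == "__main__":
    diff = {"g1": 40, "g2": 60}     # toy counts
    prolif = {"g1": 10, "g2": 90}
    for gene, value in sorted(log2_ratios(diff, prolif).items()):
        print(gene, round(value, 3))  # g1 2.0, g2 -0.585
```

Positive values indicate higher normalized expression in differentiated cells; plotting the CAGE values against the SAGE values gene by gene gives a comparison of the kind shown in panel C.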
Differential gene expression
| CAGE gene | Ratio (Diff/Prolif) | Microarray adj. P | SAGE gene | Ratio (Diff/Prolif) | Microarray adj. P |
|---|---|---|---|---|---|
| Hfe2 | 4073 | NA | RP23-36P22.5 | 576 | NA |
| Myom3 | 1624 | NA | Neb | 525 | NA |
| Lmod2 | 1305 | NA | Mylpf | 504 | 1.70 × 10⁻¹⁵ |
| Myh7 | 1124 | 5.98 × 10⁻³ | Ttn | 380 | NA |
| Mb | 908 | 1.07 × 10⁻¹⁴ | Myh3 | 368 | 2.40 × 10⁻¹⁴ |
| RP23-36P22.5 | 735 | NA | Xirp1 | 306 | 2.24 × 10⁻¹³ |
| Pygm | 717 | 4.82 × 10⁻¹⁷ | 1110002H13Rik | 263 | NA |
| Myl4 | 614 | 8.86 × 10⁻²⁰ | Tnnc1 | 232 | 1.24 × 10⁻¹¹ |
| Synpo2l | 595 | NA | Cav3 | 150 | 3.58 × 10⁻²² |
| Myh1 | 561 | 3.64 × 10⁻¹⁵ | Cbfa2t3 | 133 | 2.89 × 10⁻¹⁰ |
| Tnni1 | 529 | 2.24 × 10⁻⁹ | Chrng | 115 | 4.63 × 10⁻⁹ |
| Tnni2 | 442 | 3.20 × 10⁻¹¹ | Myom2 | 105 | 6.66 × 10⁻¹⁶ |
| Mpa2l | 410 | NA | Tnnt1 | 100 | 1.15 × 10⁻¹⁰ |
| Ctrb1 | 406 | 7.55 × 10⁻⁷ | Ryr1 | 92 | 7.03 × 10⁻¹⁴ |
| Ttn | 402 | NA | Apobec2 | 84 | 2.95 × 10⁻¹⁵ |
| Neb | 374 | NA | Cox6a2 | 72 | 2.45 × 10⁻¹⁶ |
| Kcnq4 | 365 | NA | Dio2 | 64 | 2.14 × 10⁻¹⁰ |
| Mylpf | 341 | 1.70 × 10⁻¹⁵ | C1qtnf3 | 52 | 4.36 × 10⁻⁵ |
| 1110002H13Rik | 341 | NA | Htr2b | 43 | 3.76 × 10⁻⁶ |
| Inpp4b | 328 | NA | Sgcg | 42 | 1.15 × 10⁻¹² |
| Xirp1 | 307 | 2.24 × 10⁻¹³ | Fndc5 | 39 | NA |
| Atp2a1 | 304 | 2.06 × 10⁻¹⁴ | Jsrp1 | 36 | NA |
| Casq2 | 297 | 4.74 × 10⁻⁶ | Ankrd23 | 36 | NA |
| Cacna1s | 296 | 5.20 × 10⁻¹⁹ | AK031267 | 29 | NA |
| Ces2 | 245 | NA | Sema6a | 26 | 3.08 × 10⁻³ |
| Cox6a2 | 241 | 2.45 × 10⁻¹⁶ | Lgr5 | 23 | 9.33 × 10⁻¹ |
| Myog | 238 | 2.36 × 10⁻⁶ | Pdlim3 | 22 | 3.18 × 10⁻⁶ |
| Myh3 | 234 | 2.40 × 10⁻¹⁴ | Klhl31 | 22 | NA |
| Tmem182 | 216 | NA | ORF63 | 21 | NA |
| Tnnc1 | 215 | 1.24 × 10⁻¹¹ | Gfra2 | 19 | 2.98 × 10⁻² |
Top 30 genes from the CAGE and SAGE expression data. All genes with a Bayesian error rate < 1 × 10⁻⁵⁰ were sorted by ratio (normalized tags from differentiated/proliferating cells), and the genes with the highest ratios, i.e. the strongest up-regulation in differentiated cells, are displayed. The microarray P-values are adjusted P-values for differential gene expression from a similar experiment [proliferating and differentiated C2C12 cells (12)]. NA, no probe annotation for the gene.
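The ranking described in the table caption is a filter followed by a sort. In this sketch the Bayesian error rate is assumed to be precomputed for each gene (its calculation is not described here), the strict `<` cut-off is an assumption, and the `GeneCall` type, `top_upregulated` function and toy gene records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneCall:
    gene: str
    ratio: float       # normalized Diff tags / normalized Prolif tags
    error_rate: float  # Bayesian error rate, assumed precomputed


def top_upregulated(calls: list[GeneCall], max_error: float = 1e-50,
                    n: int = 30) -> list[GeneCall]:
    """Keep confident calls, then rank by Diff/Prolif ratio, highest first."""
    confident = [c for c in calls if c.error_rate < max_error]
    return sorted(confident, key=lambda c: c.ratio, reverse=True)[:n]


if __name__ == "__main__":
    calls = [  # toy records, not data from the paper
        GeneCall("geneA", 500.0, 1e-80),
        GeneCall("geneB", 900.0, 1e-10),   # fails the error-rate filter
        GeneCall("geneC", 1200.0, 1e-60),
        GeneCall("geneD", 50.0, 1e-70),
    ]
    print([c.gene for c in top_upregulated(calls, n=2)])  # ['geneC', 'geneA']
```

Filtering before sorting means a gene with a very large ratio but weak statistical support (geneB above) never enters the ranked list.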
[Table: top 10 GO biological processes for the CAGE, SAGE and microarray top-30 gene lists (columns: CAGE GO, SAGE GO, Microarray GO); individual entries not recoverable]
Top 10 GO biological processes associated with the top 30 genes from the CAGE, SAGE and microarray experiments. All point to clear muscle-related functions, except for three processes (in italics) in the microarray data.
Figure 4. (A) UCSC display of the UCSC/Ensembl-defined first exon of Myl1 and an upstream CAGE region for samples Prolif-1 and Diff-1 (reverse-strand reads only, the strand on which the gene lies). The Y-axis indicates the number of tags aligned at each genomic position. Additional tracks are shown (UCSC genes, Ensembl genes, Vega genes, Other RefSeq, AceView Genes, N-SCAN and Transcriptome), several of which confirm the presence of the upstream CAGE region. (B) qPCR with primers within the CAGE region for Prolif, Prolif-C (reverse transcriptase control), Diff and Diff-C (reverse transcriptase control). Results are plotted as threshold cycle (Cp) values (lower Cp means higher expression); bars indicate one SD between technical duplicates. (C) Standard PCR on an agarose gel with the forward primer in the novel CAGE region and the reverse primer in the conventional exon 1. Comparison with the genomic control verifies the presence of a 200-base intron. A 100-bp ladder is included. (A–C) are consistent with higher expression in differentiated than in proliferating cells. (D) Cross-species conserved muscle-specific transcription factor binding sites around and upstream of the Myl1 CAGE region support its role as a promoter.
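Panel B reports Cp values, where a lower Cp means more starting template. A common rule of thumb, not stated in the caption and assumed here, is that under ideal amplification each cycle doubles the product, so the fold difference is roughly 2 raised to the Cp difference. The sketch below summarizes technical duplicates as mean and sample SD and applies that approximation; the function names and toy Cp values are hypothetical.

```python
import statistics


def mean_sd(cps: list[float]) -> tuple[float, float]:
    """Mean and sample standard deviation of technical-replicate Cp values."""
    return statistics.mean(cps), statistics.stdev(cps)


def relative_abundance(cp_reference: float, cp_test: float,
                       efficiency: float = 1.0) -> float:
    """Approximate fold difference of test over reference.

    Assumes each cycle multiplies the product by (1 + efficiency); with
    efficiency = 1.0 (perfect doubling), a Cp one cycle lower corresponds
    to roughly twice as much starting template.
    """
    return (1 + efficiency) ** (cp_reference - cp_test)


if __name__ == "__main__":
    prolif_mean, prolif_sd = mean_sd([30.2, 29.8])  # toy duplicates
    diff_mean, diff_sd = mean_sd([28.1, 27.9])
    print(round(prolif_sd, 3), round(diff_sd, 3))
    print(round(relative_abundance(prolif_mean, diff_mean), 2))  # about 4
```

Real amplification efficiencies are usually somewhat below 100%, which is why the `efficiency` parameter is exposed rather than fixed.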