Shin-ichi Hashimoto, Wei Qu, Budrul Ahsan, Katsumi Ogoshi, Atsushi Sasaki, Yoichiro Nakatani, Yongjun Lee, Masako Ogawa, Akio Ametani, Yutaka Suzuki, Sumio Sugano, Clarence C Lee, Robert C Nutter, Shinichi Morishita, Kouji Matsushima.
Abstract
Massively parallel, tag-based sequencing systems, such as the SOLiD system, hold the promise of revolutionizing the study of whole-genome gene expression because of the number of data points that can be generated in a simple and cost-effective manner. We describe the development of a 5'-end transcriptome workflow for the SOLiD system and demonstrate the advantages in sensitivity and dynamic range offered by this tag-based application over traditional approaches for the study of whole-genome gene expression. 5'-end transcriptome analysis was used to study whole-genome gene expression within a colon cancer cell line, HT-29, treated with the DNA methyltransferase inhibitor 5-aza-2'-deoxycytidine (5Aza). More than 20 million 25-base 5'-end tags were obtained from untreated and 5Aza-treated cells and matched to sequences within the human genome. Seventy-three percent of the mapped unique tags were associated with RefSeq cDNA sequences, corresponding to approximately 14,000 different protein-coding genes in this single cell type. The level of expression of these genes ranged from 0.02 to 4,704 transcripts per cell. The sensitivity of a single sequence run of the SOLiD platform was 100- to 1,000-fold greater than that observed from 5'-end SAGE data generated from the analysis of 70,000 tags obtained by Sanger sequencing. The high-resolution 5'-end gene expression profiling presented in this study will not only provide novel insight into the transcriptional machinery but should also serve as a basis for a better understanding of cell biology.
Year: 2009 PMID: 19119315 PMCID: PMC2606021 DOI: 10.1371/journal.pone.0004108
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Schematic depicting the 5′-end library construction for SOLiD sequencing (a) and the transcript copy number and abundance in libraries prepared from HT-29 cells, either untreated or treated with 5Aza.
Total expressed mRNA (b). The frequency denotes the expression-level category, defined by the number of transcript copies per cell. Unique genes represent the total number of unique genes that corresponded to the RefSeq dataset (c). Five copies per cell corresponds to 100 tags per 6 million tags, since human cells are predicted to contain 300,000 mRNA molecules. (d) Validation of the 5′SOLiD tag-profiling gene expression data using qPCR. The mRNAs corresponding to 40 genes in cells treated with 5Aza were quantified using qPCR. The ratio of mRNA abundances in 5Aza-treated versus control cells determined by this method was compared to the corresponding ratios determined using 5′SOLiD data. The logarithmic values of these ratios were plotted. The Y = X line (slope of 1) is the line expected when both platforms report identical expression patterns.
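The conversion from tag counts to copies per cell stated in the caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `copies_per_cell` is hypothetical, and the only inputs taken from the text are the 300,000 mRNA molecules per cell estimate and the 100-tags-per-6-million example.

```python
# Estimate of mRNA molecules per human cell, as stated in the caption.
MRNA_PER_CELL = 300_000


def copies_per_cell(tag_count, total_tags, mrna_per_cell=MRNA_PER_CELL):
    """Convert a gene's tag count into estimated transcript copies per cell.

    Hypothetical helper: assumes a gene's share of tags equals its share
    of the cell's mRNA molecules.
    """
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    return tag_count * mrna_per_cell / total_tags


# The caption's example: 100 tags out of 6 million tags.
print(copies_per_cell(100, 6_000_000))  # 5.0
```

Under the same assumption, a single tag in a 6-million-tag library corresponds to 0.05 copies per cell.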
Sequencing Summary.
| Library | Sequenced Tags | Used Tags (A) | Mapped Tags (B) | % (B/A) | Unique Tags (C) | % (C/B) | Unique tags in RefSeq TSSs (D) | % (D/C) |
| Control | 29,231,644 | 16,050,987 | 9,980,657 | 62% | 7,062,060 | 71% | 5,110,167 | 72% |
| 5Aza | 32,629,872 | 19,865,139 | 13,037,914 | 66% | 9,373,606 | 72% | 6,981,749 | 74% |
| Total | 61,861,516 | 35,916,126 | 23,018,571 | 64% | 16,435,666 | 72% | 12,091,916 | 73% |
Unique tags are tags that aligned unambiguously to a single position. Unique tags in RefSeq TSSs are the unique tags that mapped to regions within 500 bases of the representative TSSs of genes in the RefSeq database. Unique tags are categorized into three groups according to the number of mismatches in individual alignments.
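The percentage columns for the two libraries follow directly from the counts in the table. The sketch below recomputes them for the Control and 5Aza rows; the dictionary layout and the function name `mapping_rates` are illustrative choices, while the counts are copied from the table.

```python
# Tag counts for the two libraries, copied from the Sequencing Summary table.
COUNTS = {
    "Control": {"used": 16_050_987, "mapped": 9_980_657,
                "unique": 7_062_060, "tss": 5_110_167},
    "5Aza": {"used": 19_865_139, "mapped": 13_037_914,
             "unique": 9_373_606, "tss": 6_981_749},
}


def mapping_rates(row):
    """Return the three ratios of the table as whole-number percentages."""
    return {
        "B/A": round(100 * row["mapped"] / row["used"]),
        "C/B": round(100 * row["unique"] / row["mapped"]),
        "D/C": round(100 * row["tss"] / row["unique"]),
    }


for library, row in COUNTS.items():
    print(library, mapping_rates(row))
```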
Copy number and transcript abundance in each library.
| Copy/cell | Cont (Unique genes) | % of genes | Mass fraction mRNA | 5Aza (Unique genes) | % of genes | Mass fraction mRNA |
| >500 | 68 | 0% | 24% | 65 | 0% | 24% |
| 51–500 | 1016 | 7% | 41% | 1046 | 7% | 41% |
| 6–50 | 5666 | 38% | 30% | 5609 | 39% | 30% |
| 1–5 | 4012 | 27% | 4% | 3913 | 27% | 4% |
| 0.1–<1 | 4151 | 28% | 1% | 3904 | 27% | 1% |
| Total | 14913 | 100% | | 14537 | 100% | |
Frequency denotes the expression-level category, in transcript copies per cell, in each library. Unique genes represent the total number of unique genes that matched RefSeq sequences. An estimate of about 300,000 transcripts per cell was used to convert the abundances to copies per cell.
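Assigning a gene to one of the Copy/cell categories can be sketched as below. The table does not say how fractional values between the listed integer ranges (for example 5.5 copies) were handled, so the thresholds in this hypothetical `expression_category` function are an assumption: each value falls into the highest category whose lower label it exceeds.

```python
def expression_category(copies):
    """Map an estimated copies-per-cell value to a table category label.

    Boundary handling for fractional values is an assumption; the source
    only lists the category labels. Returns None below 0.1 copies per cell.
    """
    if copies > 500:
        return ">500"
    if copies > 50:
        return "51–500"
    if copies > 5:
        return "6–50"
    if copies >= 1:
        return "1–5"
    if copies >= 0.1:
        return "0.1–<1"
    return None


for value in (4704, 120, 5, 0.5, 0.02):
    print(value, expression_category(value))
```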
Figure 2. Comparison of gene expression patterns between 5′SOLiD and 5′SAGE.
Scatter plots of unique transcripts identified by SOLiD sequencing (a), random samples of 5′SOLiD tags (b), and 5′SAGE tag sequencing (c) of libraries prepared from RNA isolated from HT-29 cells treated with 5Aza. Each 5′-end tag is associated with the human gene from which it originated. In each graph (a–c), each dot represents one gene, and its x and y coordinates indicate the numbers of 5′-end tags associated with the gene in each library. The number shown is the Pearson correlation coefficient between the two libraries. (d)–(f) MA plots calculated from the paired numbers of 5′SOLiD tags and 5′SAGE tags associated with each gene. X and Y values are translated into A and M values according to the following formulas: A = ½ (log2 X + log2 Y), M = log2 X − log2 Y.
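The A and M transformation given in the caption can be written out as a short sketch. The function name `ma_values` is hypothetical. Because the logarithm is undefined at zero, this version rejects non-positive counts; the caption does not say how genes with zero tags in one library were treated.

```python
import math


def ma_values(x, y):
    """Return (A, M) for a pair of tag counts, following the caption:
    A = 1/2 (log2 X + log2 Y), M = log2 X - log2 Y.
    """
    if x <= 0 or y <= 0:
        raise ValueError("tag counts must be positive to take log2")
    log_x, log_y = math.log2(x), math.log2(y)
    return 0.5 * (log_x + log_y), log_x - log_y


# A gene with 8 tags in one library and 2 in the other.
print(ma_values(8, 2))  # (2.0, 2.0)
```

A is the mean log2 abundance across the two libraries and M is the log2 ratio, so a gene with equal counts in both libraries has M = 0.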
Comparison of cell-cycle-related gene profiling between 5′-end SAGE and 5′-end SOLiD.
| Description | 5′-end SAGE Cont (n) | 5′-end SAGE 5Aza (n) | RefSeq | 5′-end SOLiD Cont (n) | 5′-end SOLiD 5Aza (n) |
| eukaryotic translation elongation factor 2 | 33 | 7 | NM_001961 | 5605 | 4110 |
| cyclin-dependent kinase inhibitor 1A | 0 | 1 | NM_000389 | 34 | 55 |
| cyclin-dependent kinase inhibitor 2A | 0 | 2 | NM_000077 | 172 | 124 |
| cyclin-dependent kinase inhibitor 1B | 0 | 0 | NM_004064 | 61 | 59 |
| cyclin-dependent kinase inhibitor 2B | 0 | 0 | NM_004936 | 27 | 70 |
| cyclin-dependent kinase inhibitor 1C | 0 | 0 | NM_000076 | 6 | 6 |
| cyclin-dependent kinase inhibitor 2C | 0 | 0 | NM_001262 | 124 | 109 |
| cyclin-dependent kinase inhibitor 2D | 0 | 1 | NM_001800 | 319 | 248 |
| cyclin-dependent kinase inhibitor 3 | 1 | 0 | NM_005192 | 477 | 591 |
| cyclin-dependent kinase 2 | 0 | 0 | NM_001798 | 384 | 406 |
| cyclin-dependent kinase 3 | 0 | 0 | NM_001258 | 34 | 24 |
| cyclin-dependent kinase 4 | 7 | 4 | NM_000075 | 1077 | 1155 |
| cyclin-dependent kinase 5 | 0 | 0 | NM_004935 | 998 | 996 |
| cyclin-dependent kinase 6 | 0 | 0 | NM_001259 | 459 | 406 |
| cyclin-dependent kinase 7 | 0 | 1 | NM_001799 | 162 | 118 |
| cyclin-dependent kinase 8 | 0 | 0 | NM_001260 | 78 | 63 |
| cyclin-dependent kinase 9 | 0 | 0 | NM_001261 | 259 | 199 |
| cyclin-dependent kinase 10 | 0 | 1 | NM_052988 | 183 | 161 |
| retinoblastoma 1 | 0 | 0 | NM_000321 | 71 | 59 |
| tumor protein p53 | 0 | 3 | NM_000546 | 1068 | 1260 |
| myc proto-oncogene protein | 0 | 0 | NM_002467 | 337 | 340 |
| MAX protein isoform f | 0 | 0 | NM_197957 | 73 | 76 |
| adenomatosis polyposis coli | 0 | 0 | NM_000038 | 156 | 138 |
| phosphatase and tensin homolog | 0 | 0 | NM_000314 | 245 | 223 |
| matrix metalloproteinase 7 | 7 | 6 | NM_002423 | 4458 | 3800 |
| erbB-2 | 0 | 1 | NM_001005862 | 412 | 349 |
| erbB-3 | 0 | 1 | NM_001982 | 1046 | 851 |
| v-erb-a erythroblastic leukemia viral oncogene | 0 | 0 | NM_001042599 | 171 | 84 |
| wee1 tyrosine kinase | 0 | 0 | NM_003390 | 277 | 204 |
| F-box only protein 5 | 0 | 0 | NM_012177 | 48 | 41 |
| cyclin A | 0 | 1 | NM_001237 | 657 | 617 |
| cyclin B1 | 0 | 0 | NM_031966 | 1144 | 1587 |
| cyclin B2 | 0 | 0 | NM_004701 | 975 | 686 |
| cyclin D1 | 1 | 3 | NM_053056 | 311 | 449 |
| cyclin D2 | 0 | 0 | NM_001759 | 3 | 3 |
| cyclin D3 | 0 | 0 | NM_001760 | 225 | 220 |
| cyclin E1 | 0 | 0 | NM_001238 | 83 | 66 |
| cyclin E2 | 0 | 0 | NM_057749 | 17 | 13 |
| cyclin J | 0 | 0 | NM_019084 | 43 | 44 |
| cyclin J-like | 0 | 0 | NM_024565 | 133 | 141 |
| cyclin M2 | 0 | 0 | NM_017649 | 53 | 58 |
| cyclin M4 | 0 | 0 | NM_020184 | 48 | 56 |
| cyclin N-terminal domain containing 2 | 0 | 0 | NM_024877 | 6 | 11 |
| cyclin T1 | 0 | 0 | NM_001240 | 15 | 14 |
| cyclin Y-like 1 | 0 | 0 | NM_152523 | 32 | 31 |
| cell division cycle 25A | 0 | 0 | NM_001789 | 306 | 348 |
| cell division cycle 25B | 0 | 0 | NM_004358 | 907 | 867 |
| cell division cycle 25C | 0 | 0 | NM_001790 | 177 | 120 |
| jun oncogene | 0 | 0 | NM_002228 | 670 | 626 |
| polo-like kinase | 0 | 1 | NM_005030 | 414 | 612 |
| secreted frizzled-related protein 1 | 0 | 0 | NM_003012 | 6 | 2 |
| secreted frizzled-related protein 5 | 1 | 0 | NM_003015 | 1 | 1 |
| secreted frizzled-related protein 2 precursor | 0 | 0 | NM_003013 | 1 | 0 |
| myeloid cell leukemia sequence 1 | 0 | 0 | NM_021960 | 1106 | 951 |
| cell division cycle associated 3 | 0 | 0 | NM_031299 | 44 | 43 |
| catenin (cadherin-associated protein), beta 1, | 0 | 0 | NM_001098209 | 424 | 499 |
| tumor protein p73 | 0 | 0 | NM_005427 | 25 | 14 |
| cadherin 1, type 1 preproprotein | 1 | 0 | NM_004360 | 1515 | 1428 |
| lymphoid enhancer-binding factor 1 | 0 | 0 | NM_016269 | 21 | 10 |
| glycogen synthase kinase 3 beta | 1 | 0 | NM_002093 | 436 | 489 |
In this table, the numbers of tags from 5′SAGE and 5′SOLiD were normalized to 40,000 and 6,000,000, respectively.
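One way to read the table is to rescale a SOLiD count to the SAGE library size and see how many tags a gene would be expected to yield at that depth. The sketch below assumes the normalization is a simple linear scaling to the stated totals, which the footnote implies but does not spell out; the function name `rescale` is hypothetical, and the two counts are the Control SOLiD values from the table.

```python
SAGE_TOTAL = 40_000        # normalized 5′SAGE library size from the footnote
SOLID_TOTAL = 6_000_000    # normalized 5′SOLiD library size from the footnote


def rescale(count, from_total, to_total):
    """Linearly rescale a tag count from one library size to another."""
    if from_total <= 0:
        raise ValueError("from_total must be positive")
    return count * to_total / from_total


# Control SOLiD counts taken from the table.
eef2 = rescale(5605, SOLID_TOTAL, SAGE_TOTAL)    # elongation factor 2
cdkn1a = rescale(34, SOLID_TOTAL, SAGE_TOTAL)    # CDK inhibitor 1A

print(f"EEF2 expected at SAGE depth: {eef2:.2f}")
print(f"CDKN1A expected at SAGE depth: {cdkn1a:.2f}")
```

Under this assumption, the abundant elongation factor 2 transcript rescales to about 37 tags, close to the 33 observed by 5′-end SAGE, while cyclin-dependent kinase inhibitor 1A rescales to well under one tag, consistent with the zero observed in the SAGE control library.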
Figure 3. The distribution of 5′-end tags that correspond to annotated exons and introns of well-characterized genes (a).
(b) Correlation of the fold change in the number of tags within the promoter+1st exon relative to the number of tags in the inner exons/introns. The solid lines represent linear regression fits. The slopes are 0.79 and 0.73 for the control versus 5Aza (intron/promoter+1st exon) and the control versus 5Aza (exon/promoter+1st exon) comparisons, respectively. (c) and (d) Analysis of TSSs identified by SOLiD sequence tags. (c) Ratio of 5′SOLiD tags from each library to known TSSs and all libraries. (d) Ratio of SOLiD TSSs to known 5′-end tags. The horizontal axes show the distance of 5′SOLiD tags relative to the mRNA start sites in 1,562,911 known TSSs collected from a variety of human tissues in the DBTSS database. Distances are shown as the number of upstream and downstream nucleotides. Coverage is given on the y-axis.