Wanling Yang, Dingge Ying, Yu-Lung Lau.
Abstract
Quantitative gene expression analysis plays an important role in identifying differentially expressed genes in various pathological states and in studying gene expression regulation and co-regulation, shedding light on gene functions. Although microarrays are widely used as a powerful tool in this regard, they are suboptimal quantitatively and unable to detect unknown gene variants. Here we demonstrate effective detection of differential expression and co-regulation of certain genes by expressed sequence tag (EST) analysis using a selected subset of cDNA libraries. We discuss the issues of sequencing depth and library preparation, and propose that increased sequencing depth and improved preparation procedures may allow detection of many expression features for less abundant gene variants. With the reduction of sequencing cost and the emergence of new-generation sequencing technology, in-depth sequencing of cDNA pools or libraries may represent a better and more powerful tool in gene expression profiling and cancer biomarker detection. We also propose using sequence-specific subtraction to remove hundreds of the most abundant housekeeping genes to increase sequencing depth without affecting the relative expression ratios of other genes, as transcripts from as few as 300 of the most abundantly expressed genes constitute about 20% of the total transcriptome. In-depth sequencing also offers the unique advantage of detecting unknown forms of transcripts, such as alternative splicing variants, fusion genes, and regulatory RNAs, as well as detecting mutations and polymorphisms that may play important roles in disease pathogenesis.
Year: 2009 PMID: 19591787 PMCID: PMC5054226 DOI: 10.1016/S1672-0229(09)00003-5
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Non-normalized libraries used in the analysis of this study
| Tissue origin of libraries | No. of libraries | Total sequence entries | Minimum library size (entries) | Maximum library size (entries) | Median library size (entries) |
|---|---|---|---|---|---|
| Normal bulk tissue | 43 | 429,781 | 5,235 | 23,703 | 9,180 |
| Normal cell line | 12 | 124,308 | 6,462 | 18,479 | 9,541 |
| Tumor bulk tissue | 23 | 261,446 | 5,385 | 25,235 | 8,430 |
| Tumor cell line | 64 | 908,856 | 5,178 | 41,936 | 11,945 |
| Total | 142 | 1,724,391 | 5,178 | 41,936 | 10,583 |
Figure 1. Differentially expressed genes detected by EST analysis using 130 non-normalized libraries. The libraries were grouped according to their tissue origin as “normal tissue (normal_bulk)”, “cancer tissue (neoplasia_bulk)”, and “cultured cancer cell line (neoplasia_cell)”. Differential gene expression was analyzed by detected copy numbers per 10,000 sequenced clones in each library.
A. Differential expression of ACTB, GAPDH, RPS2, EEF1G, RAC1, and MALAT1. Comparisons showed significant P values calculated by unpaired nonparametric t-test with Welch’s correction: non-muscle beta actin (ACTB) between normal_bulk and neoplasia_bulk, P=0.04; between normal_bulk and neoplasia_cell, P=0.011; glyceraldehyde-3-phosphate dehydrogenase (GAPDH) between normal_bulk and neoplasia_bulk, P=0.0016; between normal_bulk and neoplasia_cell, P<0.0001; between neoplasia_bulk and neoplasia_cell, P=0.0002; ribosomal protein S2 (RPS2) between normal_bulk and neoplasia_bulk, P=0.0022; P<0.0001 between normal_bulk and neoplasia_cell as well as between neoplasia_bulk and neoplasia_cell; eukaryotic translation elongation factor 1 gamma (EEF1G) between normal_bulk and neoplasia_bulk, P=0.0004; between neoplasia_bulk and neoplasia_cell, P<0.0001; ras-related C3 botulinum toxin substrate 1 (rho family, small GTP binding protein Rac1, RAC1) between normal_bulk and neoplasia_bulk, P=0.0034; between normal_bulk and neoplasia_cell, P=0.0013; between neoplasia_bulk and neoplasia_cell, P=0.043; metastasis associated lung adenocarcinoma transcript 1 (non-protein coding) (MALAT1) between normal_bulk and neoplasia_bulk, P=0.0084; and between normal_bulk and neoplasia_cell, P=0.0098.
B. Differential expression of BIRC5 analyzed by libraries of different preparations. For baculoviral IAP repeat-containing 5 (survivin) (BIRC5) in uncharacterized libraries, between normal_bulk and neoplasia_cell, P=0.0067; between neoplasia_bulk and neoplasia_cell, P=0.017. For normalized/subtracted libraries, between normal_bulk and neoplasia_cell, P=0.0065; between neoplasia_bulk and neoplasia_cell, P=0.0090. For non-normalized libraries, between normal_bulk and neoplasia_cell, P<0.0001; between neoplasia_bulk and neoplasia_cell, P<0.0001.
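
The per-library normalization named in the caption (copies per 10,000 sequenced clones) and a Welch-type comparison of two library groups can be sketched as follows. This is only an illustration: the function names and the hit counts are hypothetical, and the caption does not specify the exact test procedure beyond "Welch's correction", so the statistic below is the standard Welch t, not necessarily the authors' computation.

```python
from math import sqrt
from statistics import mean, variance


def copies_per_10k(hits: int, library_size: int) -> float:
    """Scale a raw EST hit count to copies per 10,000 sequenced clones."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return 10_000 * hits / library_size


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical (hits, library size) pairs for one gene in two library groups.
normal_bulk = [copies_per_10k(h, n) for h, n in [(12, 9_180), (20, 10_000), (9, 6_000)]]
neoplasia_cell = [copies_per_10k(h, n) for h, n in [(40, 11_945), (55, 12_000), (30, 8_000)]]

t_stat, dof = welch_t(normal_bulk, neoplasia_cell)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Scaling by library size before comparing groups matters here because the libraries range from about 5,000 to over 40,000 entries, so raw hit counts are not directly comparable.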
Expression of RAC1b variant form detected in the 142 non-normalized libraries
| Library name | Tissue | Histology | Total sequenced clones | RAC1b sequences detected | Library origin |
|---|---|---|---|---|---|
| NIH_MGC_70 | pancreas | neoplasia | 16,633 | 1 | cell line |
| NIH_MGC_15 | colon | neoplasia | 14,224 | 1 | cell line |
| NIH_MGC_98 | brain | neoplasia | 12,808 | 1 | cell line |
| NIH_MGC_42 | pancreas | neoplasia | 10,751 | 1 | cell line |
| NIH_MGC_101 | lung | neoplasia | 9,166 | 1 | cell line |
| NCI_CGAP_GU1 | uncharacterized tissue | neoplasia | 5,550 | 1 | bulk |
Figure 2. Expression correlation between ACTB and ACTG1 and between KRT8 and KRT18. A. Correlation between ACTB and ACTG1 in the uncharacterized libraries. No significant expression correlation between the two genes was detected (r²=0.0086, P=0.43). B. Correlation between ACTB and ACTG1 among the normalized/subtracted libraries. The correlation is significant with r²=0.47 and P<0.0001. C. Correlation between ACTB and ACTG1 among the non-normalized libraries. The correlation is significant with r²=0.56 and P<0.0001. D. Expression correlation between KRT8 and KRT18 in non-normalized libraries. The correlation is significant with r²=0.57 and P<0.0001.
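
As a rough illustration of how an r² value like those in the caption is obtained from per-library expression levels, the sketch below computes a Pearson correlation and squares it. The caption does not say which correlation method was used, so Pearson is an assumption, and the input values are hypothetical rather than taken from the study.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical per-library expression levels (copies per 10,000 clones).
gene_a = [10.0, 18.0, 25.0, 31.0, 40.0]
gene_b = [8.0, 15.0, 17.0, 27.0, 30.0]

r = pearson_r(gene_a, gene_b)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

Each list holds one value per library, in the same library order, so the pairing of points corresponds to the scatter plots described in panels A to D.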
Figure 3. Probability analysis of false negative detection in relation to library sizes and gene expression levels. The analysis is based on library sizes (x-axis, as powers of 10) and gene expression levels at 1 copy of mRNA in 1,000, 10,000, 50,000, and 1 million total transcripts, using Poisson distribution as described in Results. The y-axis shows the probability of false negative detection, that is, the gene goes undetected although it is actually expressed.
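
The Poisson model mentioned in the caption can be written out as below. The symbols are introduced here for illustration and are not taken from the source: N is the library size in sequenced clones, f is the fraction of the transcriptome contributed by the gene, X is the number of clones sampled from that gene, and α is a target false negative rate.

```latex
% X ~ Poisson(Nf): expected number of clones from the gene is Nf.
P(\text{false negative}) = P(X = 0) = e^{-Nf}
\qquad\Longrightarrow\qquad
N = \frac{-\ln \alpha}{f}
\quad \text{for a target false negative rate } \alpha .
```

For example, a library of 10,000 clones and a transcript present at 1 copy in 10,000 give Nf = 1 and a false negative probability of e⁻¹ ≈ 0.37.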
Targeted library sizes for certain false negative rates for various expression levels
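| Expression level (copy number in transcriptome) | Targeted library size, false negative rate ≈30% | Targeted library size, false negative rate ≈20% | Targeted library size, false negative rate ≈10% | Targeted library size, false negative rate ≈5% |
|---|---|---|---|---|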
| 1/100 | 120 | 160 | 230 | 300 |
| 1/1,000 | 1,200 | 1,600 | 2,300 | 3,000 |
| 1/10,000 | 12,000 | 16,000 | 23,000 | 30,000 |
| 1/100,000 | 120,000 | 160,000 | 230,000 | 300,000 |
| 1/1,000,000 | 1,200,000 | 1,600,000 | 2,300,000 | 3,000,000 |
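
The table entries can be checked against the Poisson zero-count probability exp(-N·f), where N is the library size and f the transcript fraction. The sketch below is a minimal check under that assumption; the function names are illustrative. For the 1/10,000 row it yields false negative rates of roughly 30%, 20%, 10%, and 5% for the four listed sizes.

```python
from math import exp, log


def false_negative_rate(library_size: float, fraction: float) -> float:
    """Poisson probability of sampling zero clones of a transcript."""
    return exp(-library_size * fraction)


def required_library_size(fraction: float, alpha: float) -> float:
    """Library size at which the false negative rate equals alpha."""
    return -log(alpha) / fraction


# Library sizes from the 1/10,000 row of the table.
for size in (12_000, 16_000, 23_000, 30_000):
    print(size, round(false_negative_rate(size, 1 / 10_000), 3))

# Inverse direction: size needed for a 5% false negative rate at 1/10,000.
print(round(required_library_size(1 / 10_000, 0.05)))
```

Because the required size scales as 1/f, each tenfold drop in expression level multiplies every column by ten, which is the pattern seen down the rows of the table.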
Figure 4. Percentage of sequence entries in the total transcriptome contributed by the most abundant genes. The x-axis shows the portion of UniGene clusters ranked by abundance, from the most abundant to the least abundant clusters. The y-axis shows the cumulative portion of total UniGene entries constituted by the most abundant UniGene clusters. The inset is the full scale of the same figure.
Albumin sequences in libraries of liver origin*
| Library name | Total sequenced clones | Detected albumin sequences | Portion of albumin (%) | Library histology | Library protocol |
|---|---|---|---|---|---|
| LIVER2 | 6,715 | 2,336 | 34.79 | normal | uncharacterized treatment |
| TLIVE2 | 8,656 | 2,202 | 25.44 | neoplasia | uncharacterized treatment |
| human hepatoblastoma cDNA | 7,898 | 280 | 3.55 | neoplasia | uncharacterized treatment |
| Stratagene liver (#937224) | 8,417 | 1,022 | 12.14 | normal | non-normalized |
| | 10,027 | 907 | 9.05 | normal | non-normalized |
| NIH_MGC_76 | 11,960 | 1,046 | 8.75 | normal | non-normalized |
| GLC | 19,285 | 1,272 | 6.60 | normal | uncharacterized treatment |
| GKC | 17,736 | 1,146 | 6.46 | neoplasia | uncharacterized treatment |
| 779 (synonym: hncc1) | 10,690 | 682 | 6.38 | normal | uncharacterized treatment |
*Only tissue-derived libraries with 5,000 or more sequenced clones are included (no libraries from cell lines).
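
The "Portion of albumin" column is simply detected albumin sequences divided by total sequenced clones. The sketch below reproduces one row and then estimates how much effective sequencing depth the remaining transcripts would gain if a dominant transcript were removed. It assumes perfect, sequence-specific removal and an unchanged number of sequenced clones; these are simplifying assumptions for illustration, not a procedure described in the source, and the function names are hypothetical.

```python
def portion_percent(hits: int, total_clones: int) -> float:
    """Share of a library taken up by one transcript, in percent."""
    return 100 * hits / total_clones


def depth_gain(removed_fraction: float) -> float:
    """Fold increase in effective depth for the remaining transcripts when a
    fraction of the library is removed and sequencing effort is held constant."""
    if not 0 <= removed_fraction < 1:
        raise ValueError("removed_fraction must be in [0, 1)")
    return 1 / (1 - removed_fraction)


# LIVER2 row of the table: 2,336 albumin sequences among 6,715 clones.
albumin_share = portion_percent(2_336, 6_715)
print(f"albumin share: {albumin_share:.2f}%")
print(f"depth gain if albumin were removed: {depth_gain(albumin_share / 100):.2f}x")

# Abstract's estimate: about 300 top genes make up about 20% of the transcriptome.
print(f"depth gain at 20% removed: {depth_gain(0.20):.2f}x")
```

Since every remaining transcript is scaled by the same factor, the relative expression ratios among the remaining genes are unchanged in this idealized model.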
Median expression levels of ACTB and ACTG1 in libraries prepared by different protocols
| Library type | ACTB (copies per 10,000 sequenced clones) | ACTG1 (copies per 10,000 sequenced clones) |
|---|---|---|
| Non-normalized library | 24.65 | 19.43 |
| Normalized library | 1.97 | 1.06 |
| Subtracted library | 0 | 0 |
| Uncharacterized library | 19.85 | 0.16 |