Peter A C 't Hoen, Yavuz Ariyurek, Helene H Thygesen, Erno Vreugdenhil, Rolf H A M Vossen, Renée X de Menezes, Judith M Boer, Gert-Jan B van Ommen, Johan T den Dunnen.
Abstract
The hippocampal expression profiles of wild-type mice and mice transgenic for deltaC-doublecortin-like kinase were compared with Solexa/Illumina deep sequencing technology and five different microarray platforms. With Illumina's digital gene expression assay, we obtained approximately 2.4 million sequence tags per sample, their abundance spanning four orders of magnitude. Results were highly reproducible, even across laboratories. With a dedicated Bayesian model, we found differential expression of 3179 transcripts with an estimated false-discovery rate of 8.5%. This is a much higher figure than found for microarrays. The overlap in differentially expressed transcripts found with deep sequencing and microarrays was most significant for Affymetrix. The changes in expression observed by deep sequencing were larger than observed by microarrays or quantitative PCR. Relevant processes such as calmodulin-dependent protein kinase activity and vesicle transport along microtubules were found affected by deep sequencing but not by microarrays. While undetectable by microarrays, antisense transcription was found for 51% of all genes and alternative polyadenylation for 47%. We conclude that deep sequencing provides a major advance in robustness, comparability and richness of expression profiling data and is expected to boost collaborative, comparative and integrative genomics studies.
Year: 2008 PMID: 18927111 PMCID: PMC2588528 DOI: 10.1093/nar/gkn705
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Categorization and abundance of tags. Distribution (in percentage of total) of unique tags (black bars) and individual reads (counts; open bars) over different categories (average from eight samples): high-confidence transcripts (canonical), low-confidence transcripts (noncanonical), mitochondrial RNA (mito), ribosomal RNA (ribo), genomic region with no evidence for transcription (just genome), repetitive genomic region (repeats) and tags with no hits in the genome.
Counts for blood-derived transcripts including P-values from Fisher test and Student's t-test
| Gene | Name | Pool_WT | Pool_dC | Fisher | WT1 | WT3 | WT4 | WT6 | dC1 | dC2 | dC3 | dC4 | t-test |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Serpina3k | Serine (or cysteine) peptidase inhibitor, clade A, member 3K | 87 | 0 | 1.22E-26 | 143 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0.18 |
| Gc | Group specific component | 22 | 0 | 4.21E-19 | 41 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0.36 |
| Fgg | Fibrinogen, gamma polypeptide | 60 | 0 | 1.69E-18 | 72 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0.14 |
| Serpina1a | Serine (or cysteine) peptidase inhibitor, clade A, member 1a | 35 | 0 | 5.76E-11 | 71 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.36 |
| Mug1 | Murinoglobulin 1 | 20 | 0 | 2.96E-08 | 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.16 |
| Itih4 | Inter alpha-trypsin inhibitor, heavy chain 4 | 26 | 0 | 4.75E-07 | 51 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0.28 |
| Mup1 | Major urinary protein 1 | 14 | 0 | 1.90E-06 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.36 |
| Orm1 | Orosomucoid 1 | 11 | 0 | 7.61E-06 | 22 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.36 |
| Rdh7 | Retinol dehydrogenase 7 | 17 | 0 | 1.52E-05 | 21 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.36 |
| Exosc8 | Exosome component 8 | 14 | 0 | 1.22E-04 | 28 | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 0.17 |
| Mup1 | Major urinary protein 1 | 18 | 0 | 1.22E-04 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.36 |
| Pnpo | Pyridoxine 5′-phosphate oxidase | 12 | 0 | 9.76E-04 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.14 |
Figure 2. Volcano plot of canonical tags. For every tag, the ratio in expression levels of transgenic over wild-type mice (2log scale, x-axis) is plotted against the Bayesian error rate (10log scale, y-axis). The horizontal line indicates the significance threshold applied, the 3179 differentially expressed tags being above that line. The plot shows that the tags with the highest average differences between transgenic and wild-type mice (far left and right part of the plot) are not all significant (due to large intragroup variation). The most significant tags (top of the plot) generally display small differences in expression between transgenic and wild-type mice but are, due to relatively high expression levels, very accurately measured and therefore display low intragroup variation.
List of the 20 most significantly differentially expressed tags
| Tag | Chr | Strand | Start | Unigene ID | Entrez ID | Gene symbol | Gene name | Ratio | Vencio's error rate |
|---|---|---|---|---|---|---|---|---|---|
| CATGCACTTAGAGTGTGAGAG | chr10 | − | 126575485 | Mm.248373 | 216441 | C78409 | Expressed sequence C78409 | 2.48 | <1E-50 |
| CATGTCCACTACACAGAGCAT | chr6 | + | 55008968 | Mm.250004 | 353172 | Gars | Glycyl-tRNA synthetase | 1.98 | <1E-50 |
| CATGGGGCAGGGAGCATTCAG | chr4 | + | 151150448 | Mm.277464 | 57295 | Icmt | Isoprenylcysteine carboxyl methyltransferase | 2.75 | <1E-50 |
| CATGGTCAGAAGCAGAAGCTA | chr8 | − | 88150714 | Mm.296520 | 65114 | Vps35 | Vacuolar protein sorting 35 | 3.83 | <1E-50 |
| CATGCTGCTAAGCAGAAGCAA | chr19 | − | 5274809 | Mm.196532 | 319322 | Sf3b2 | Splicing factor 3b, subunit 2 | 18.36 | <1E-50 |
| CATGAAATTAATAAAAGTTAC | chr16 | − | 30232416 | Mm.426334 | 106342 | AU022875 | Expressed sequence AU022875 | 0.34 | <1E-50 |
| CATGAAGGACTATGTCTAATC | chr19 | − | 60918807 | Mm.29821 | 11757 | Prdx3 | Peroxiredoxin 3 | 0.31 | <1E-50 |
| CATGATGTCTAAGCTGAGAAA | chr12 | − | 80083926 | Mm.265929 | 11847 | Arg2 | Arginase type II | 0.43 | <1E-50 |
| CATGTAGTCAGGGAGAAAACC | chr8 | + | 126289830 | Mm.178818 | 66855 | Tcf25 | Transcription factor 25 (basic helix-loop-helix) | 0.62 | <1E-50 |
| CATGGTGAACGTGCCTAAAAC | chrX | + | 129932066 | Mm.286408 | 19982 | Rpl36a | Ribosomal protein L36a | 0.30 | <1E-50 |
| CATGACAGACTTAAAACTGCT | chr9 | + | 54514230 | Mm.52319 | 58233 | Dnaja4 | DnaJ (Hsp40) homolog, subfamily A, member 4 | 0.26 | 1.00E-50 |
| CATGACAGCAGTATAAGGATC | chr10 | + | 83192493 | Mm.271188 | 69784 | 1500009L16Rik | RIKEN cDNA 1500009L16 gene | 0.41 | 1.00E-50 |
| CATGACTGACTCACACAGAGA | chr18 | + | 77175488 | Mm.236127 | 76987 | Hdhd2 | Haloacid dehalogenase-like hydrolase domain containing 2 | 0.56 | 4.20E-49 |
| CATGATGATAATGGACTGAGC | chr14 | − | 24757417 | Mm.33344 | 211623 | Plac9 | Placenta specific 9 | 2.15 | 1.98E-48 |
| CATGAAATAAATGTCAAGGGC | chr9 | − | 26724636 | Mm.289244 | 66948 | Acad8 | Acyl-coenzyme A dehydrogenase family, member 8 | 0.43 | 3.12E-47 |
| CATGTACAATGTGACAATAAA | chr18 | + | 33320540 | Mm.391658 | 12326 | Camk4 | Calcium/calmodulin-dependent protein kinase IV | 0.45 | 2.30E-45 |
| CATGTTTCAAATAAAATTCTC | chr7 | + | 130555878 | Mm.86322 | 57752 | Tacc2 | Transforming, acidic coiled-coil containing protein 2 | 0.26 | 1.09E-44 |
| CATGGACCTGAAGCTCCTGGA | chr2 | − | 30782819 | Mm.154994 | 30931 | Tor1a | Torsin family 1, member A (torsin A) | 2.08 | 2.57E-43 |
| CATGCCAATTGTCCTGTGCAT | chr8 | + | 86886174 | Mm.19111 | 18747 | Prkaca | Protein kinase, cAMP dependent, catalytic, alpha | 1.70 | 5.71E-43 |
| CATGCTGTCTGGCCTTAGTGT | chr5 | − | 124379384 | Mm.44261 | 19679 | Pitpnm2 | Phosphatidylinositol transfer protein, membrane-associated 2 | 1.74 | 1.13E-41 |
Displayed ratios are the ratios of the averaged normalized number of counts in transgenic over those in wild-type mice.
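The ratio described in this footnote can be sketched as follows. This is only an illustration: scaling to tags per million (t.p.m.) is assumed as the normalization, and the function names, counts and library sizes are hypothetical rather than taken from the study.

```python
from statistics import mean


def tags_per_million(count, library_size):
    """Scale a raw tag count to tags per million (assumed normalization)."""
    return count * 1_000_000 / library_size


def expression_ratio(tg_counts, tg_sizes, wt_counts, wt_sizes):
    """Mean normalized count in transgenic over mean normalized count in wild-type.

    Raises ZeroDivisionError when the tag is absent from all wild-type samples.
    """
    tg = mean(tags_per_million(c, n) for c, n in zip(tg_counts, tg_sizes))
    wt = mean(tags_per_million(c, n) for c, n in zip(wt_counts, wt_sizes))
    return tg / wt


# Hypothetical counts for one tag in four samples per group.
ratio = expression_ratio(
    tg_counts=[50, 48, 52, 50], tg_sizes=[2_000_000] * 4,
    wt_counts=[25, 24, 26, 25], wt_sizes=[2_000_000] * 4,
)
print(ratio)
```

A ratio above 1 corresponds to higher expression in transgenic mice, and a ratio below 1 to lower expression.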
Significantly deregulated pathways in DCLK transgenic mice
| GOID | Term | Ontology | Genes tested | Statistic | Median z-score | P-value |
|---|---|---|---|---|---|---|
| GO:0051010 | Microtubule plus-end binding | MF | 4 | 136 | 3.07 | 0.022 |
| GO:0004683 | Calmodulin regulated protein kinase activity | MF | 8 | 161 | 2.79 | 0.011 |
| GO:0005391 | Sodium:potassium-exchanging ATPase activity | MF | 6 | 416 | 2.71 | 0.013 |
| GO:0016909 | SAP kinase activity | MF | 5 | 31 | 2.67 | 0.010 |
| GO:0019238 | Cyclohydrolase activity | MF | 4 | 40 | 2.61 | 0.027 |
| GO:0019209 | Kinase activator activity | MF | 9 | 70 | 2.31 | 0.014 |
| GO:0043552 | Positive regulation of phosphoinositide 3-kinase activity | BP | 4 | 454 | 2.29 | 0.009 |
| GO:0046339 | Diacylglycerol metabolic process | BP | 5 | 45 | 2.18 | 0.039 |
| GO:0021782 | Glial cell development | BP | 7 | 118 | 2.07 | 0.015 |
| GO:0048709 | Oligodendrocyte differentiation | BP | 5 | 143 | 2.07 | 0.017 |
| GO:0014037 | Schwann cell differentiation | BP | 5 | 37 | 2.07 | 0.027 |
| GO:0030325 | Adrenal gland development | BP | 5 | 23 | 2.07 | 0.031 |
| GO:0001936 | Regulation of endothelial cell proliferation | BP | 5 | 27 | 2.07 | 0.035 |
| GO:0009894 | Regulation of catabolic process | BP | 10 | 20 | 1.94 | 0.017 |
| GO:0006970 | Response to osmotic stress | BP | 6 | 298 | 1.84 | 0.010 |
| GO:0004602 | Glutathione peroxidase activity | MF | 6 | 44 | 1.80 | 0.012 |
| GO:0042176 | Regulation of protein catabolic process | BP | 9 | 21 | 1.77 | 0.018 |
| GO:0006265 | DNA topological change | BP | 8 | 38 | 1.75 | 0.027 |
| GO:0015020 | Glucuronosyltransferase activity | MF | 9 | 34 | 1.66 | 0.016 |
| GO:0000149 | SNARE binding | MF | 15 | 584 | 1.55 | 0.014 |
| GO:0030295 | Protein kinase activator activity | MF | 7 | 75 | 1.51 | 0.016 |
The global test (11) was used to identify which pathways, as defined by the Gene Ontology consortium (BP = biological process; MF = molecular function), were significantly deregulated in DCLK mice. Only nonredundant pathways which contained at least four genes, had an asymptotic P-value <0.05, and for which the median of the z-scores of all genes in the pathway was at least 1.5, are shown.
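The three numeric selection criteria in this footnote can be expressed as a simple filter. The sketch below assumes that per-gene z-scores and an asymptotic P-value are already available for each pathway. It does not implement the global test itself or the removal of redundant pathways, and all pathway names and values are hypothetical.

```python
from statistics import median


def keep_pathway(z_scores, p_value, min_genes=4, max_p=0.05, min_median_z=1.5):
    """Apply the three numeric criteria from the table footnote."""
    return (
        len(z_scores) >= min_genes
        and p_value < max_p
        and median(z_scores) >= min_median_z
    )


# Hypothetical pathways: name -> (per-gene z-scores, asymptotic P-value).
pathways = {
    "A": ([3.1, 2.9, 3.2, 3.0], 0.022),  # passes all three criteria
    "B": ([3.0, 3.0, 3.0], 0.010),       # fewer than four genes
    "C": ([0.5, 1.0, 2.0, 1.2], 0.030),  # median z-score below 1.5
    "D": ([2.0, 2.0, 2.0, 2.0], 0.200),  # P-value not below 0.05
}
selected = [name for name, (z, p) in pathways.items() if keep_pathway(z, p)]
print(selected)
```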
Overlap between DGE and microarrays in detectable transcripts
| Platform | DGE | ABI | Affy | Agilent | Illumina | LGTC |
|---|---|---|---|---|---|---|
| Detectable | 15 189 | 13 331 | 11 683 | 22 510 | 13 376 | 2017 |
| Detected with DGE | 100% | 78% | 89% | 61% | 82% | 83% |
For each platform we determined how many ENSEMBL transcripts could be reliably detected. For DGE, we put the threshold at 2 t.p.m., while for the microarray platforms the signal had to be higher than that of the lowest 95% of all negative control spots. In the second row, the number of transcripts detected by both a specific platform and DGE is expressed as a percentage of all transcripts detected by that platform.
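A minimal sketch of the detection thresholds and of the percentage in the second row, under stated assumptions: "at 2 t.p.m." is read as greater than or equal to 2, the microarray cutoff is taken as the highest value among the lowest 95% of negative controls, and all identifiers and numbers are hypothetical.

```python
def dge_detected(tpm_by_transcript, min_tpm=2.0):
    """Transcripts at or above the t.p.m. threshold (">=" is an assumption)."""
    return {t for t, tpm in tpm_by_transcript.items() if tpm >= min_tpm}


def array_detected(signal_by_transcript, negative_controls, percent=95):
    """Transcripts whose signal exceeds the lowest `percent`% of negative controls."""
    ranked = sorted(negative_controls)
    # Integer ceiling of percent% of the control count, converted to an index.
    cutoff = ranked[-(-percent * len(ranked) // 100) - 1]
    return {t for t, s in signal_by_transcript.items() if s > cutoff}


def percent_also_in_dge(array_set, dge_set):
    """Share of array-detected transcripts that DGE also detects."""
    return 100 * len(array_set & dge_set) / len(array_set)


# Hypothetical data for five transcripts and twenty negative control spots.
tpm = {"T1": 5.0, "T2": 1.0, "T3": 2.0, "T4": 0.0, "T5": 3.0}
signal = {"T1": 25, "T2": 30, "T3": 10, "T4": 21, "T5": 40}
controls = list(range(1, 21))

on_array = array_detected(signal, controls)
by_dge = dge_detected(tpm)
pct = percent_also_in_dge(on_array, by_dge)
print(sorted(on_array), sorted(by_dge), pct)
```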
Figure 3. Correlation between absolute expression level (DGE) and microarray signal intensity. Correlation of the tag abundance (square root transformed; x-axis) and intensities [normalized as described in (9)] on the five microarray platforms (y-axis) for matching ENSEMBL transcripts, for wild-type sample 1. Pearson correlations are indicated in the graphs. ABI: Applied Biosystems; AFF: Affymetrix; ILL: Illumina; AGL: Agilent; LGTC: home-spotted long oligonucleotide arrays.
Figure 4. Assessment of precision and accuracy of DGE. (A) Samples from the wild-type and transgenic pools were sequenced in three different lanes. We calculated the three possible independent log ratios between transgenic and wild-type samples (technical replicates). As a measure of precision, we determined the pair-wise differences between these technical replicates. The distribution of these differences is plotted as a density function (black line). This is also done for three technical replicates of wild-type over transgenic ratios determined on Agilent (red) and home-spotted (blue) microarrays. We balanced the number of observations per platform through random selection of 21 886 features. (B) As a measure of accuracy, we correlated logged ratios of the expression in transgenic versus wild-type mice as obtained by DGE (x-axis) against those obtained by qPCR (y-axis). All data and primer sequences can be found in Supplementary Table 3.
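The precision measure in panel (A) can be illustrated for a single feature. The sketch assumes that lane i of the transgenic pool is paired with lane i of the wild-type pool and that base-2 logarithms are used. Neither detail is stated in the caption, and the counts are hypothetical.

```python
import math
from itertools import combinations


def log2_ratios(tg_lanes, wt_lanes):
    """One log2(transgenic / wild-type) ratio per lane pair (pairing is assumed)."""
    return [math.log2(tg / wt) for tg, wt in zip(tg_lanes, wt_lanes)]


def pairwise_differences(values):
    """Differences between every pair of technical replicates."""
    return [a - b for a, b in combinations(values, 2)]


# Hypothetical normalized counts for one tag across three lanes per pool.
ratios = log2_ratios(tg_lanes=[8.0, 16.0, 8.0], wt_lanes=[4.0, 4.0, 2.0])
diffs = pairwise_differences(ratios)
print(ratios, diffs)
```

Collecting these differences over all features gives the distribution that the panel plots as a density. A narrower distribution around zero indicates higher precision.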
Overlap between DGE and microarrays in differentially expressed transcripts
| Platform | Differentially expressed (MA) | Differentially expressed (DGE) | Overlap | Chi-square | P-value | Same direction | Opposite direction |
|---|---|---|---|---|---|---|---|
| ABI | 8 | 2088 | 4 | 6.0 | 1.4E-02 | 4 | 0 |
| AFF | 153 | 2041 | 41 | 19.2 | 1.2E-05 | 31 | 10 |
| ILL | 52 | 2404 | 17 | 13.9 | 1.9E-04 | 14 | 3 |
| AGL | 2701 | 2414 | 400 | 1.9 | 1.7E-01 | 189 | 211 |
| LGTC | 33 | 1864 | 7 | 0.9 | 3.5E-01 | 6 | 1 |
For each subset of matching ENSEMBL transcripts between DGE and one of the microarray platforms, we show the number of differentially expressed genes for DGE (Vencio's error rate < 0.05) and for the microarray (MA; false discovery rate 10%), and the overlap. We calculated the chi-square statistic and P-value, and indicate whether the overlapping genes changed in the same or the opposite direction.
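The overlap statistic can be sketched as a Pearson chi-square test on a 2×2 table. The number of matching transcripts per platform is not listed in the table, so the totals below are hypothetical. Whether a continuity correction was applied is also not stated, and none is used here. The listed P-values are consistent with a chi-square distribution with one degree of freedom, for which the upper tail equals `erfc(sqrt(x / 2))`.

```python
import math


def chi_square_2x2(n_total, n_a, n_b, n_both):
    """Pearson chi-square (no continuity correction) for the overlap of two lists."""
    observed = [
        [n_both, n_a - n_both],
        [n_b - n_both, n_total - n_a - n_b + n_both],
    ]
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n_total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def p_value_1df(stat):
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical example: 100 matching transcripts, 20 called by each method, 10 shared.
stat = chi_square_2x2(n_total=100, n_a=20, n_b=20, n_both=10)
print(stat, p_value_1df(stat))

# Consistency check against the ABI row: chi-square 6.0 gives P of about 1.4E-02.
p_abi = p_value_1df(6.0)
print(p_abi)
```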