Klemens Vierlinger, Markus H Mansfeld, Oskar Koperek, Christa Nöhammer, Klaus Kaserer, Friedrich Leisch.
Abstract
BACKGROUND: Several DNA microarray based expression signatures for the different clinically relevant thyroid tumor entities have been described over the past few years. However, reproducibility of these signatures is generally low, mainly due to study biases, small sample sizes and the highly multivariate nature of microarrays. While new technologies are available for more accurate high-throughput expression analysis, we show that there is still a lot of information to be gained from data deposited in public microarray databases. In this study we aimed (1) to identify potential markers for papillary thyroid carcinomas (PTC) through meta analysis of public microarray data and (2) to confirm these markers in an independent dataset using an independent technology.
Year: 2011 PMID: 21470421 PMCID: PMC3082219 DOI: 10.1186/1755-8794-4-30
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Figure 1. DWD integration. The effect of DWD on the first two principal components (PC) and on hierarchical clustering of the data. DWD removed the separation between the datasets, as indicated by the PC plots and by the mixing of the branches in the dendrogram. The PC plots show that biological information is preserved after DWD integration (samples cluster by dataset before integration and by tumor entity thereafter). Leaves in the dendrogram are colored by tumor entity and branches are colored by dataset.
Classification Results before and after DWD integration
| | before DWD | | | | after DWD | | | |
|---|---|---|---|---|---|---|---|---|
| he | 1.00 | 1.00 | 0.98 | 1.00 | 1.00 | 1.00 | 0.96 | 1.00 |
| huang | 0.50 | 1.00 | 0.55 | 0.50 | 0.50 | 1.00 | 0.90 | 0.71 |
| jarzab | 0.50 | 0.81 | 1.00 | 0.57 | 0.89 | 1.00 | 1.00 | 1.00 |
| reyes | 0.78 | 0.50 | 0.92 | 1.00 | 0.89 | 0.88 | 0.90 | 1.00 |
Classification results for PTC data when applying classifiers from one study to another study, before (left) and after (right) DWD integration.
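The drop to 0.50 accuracy in several cells before integration is what one would expect when a classifier trained on one study meets data with a systematic offset. The sketch below illustrates this with made-up single-gene values. It assumes a simple midpoint threshold as the classifier and uses per-study mean-centering as a stand-in for the integration step; it is not DWD, and all function names and numbers are hypothetical.

```python
def midpoint_threshold(values, labels):
    """Threshold halfway between the class means (1 = PTC, 0 = benign)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(values, labels, threshold):
    """Fraction of samples classified correctly when value >= threshold means PTC."""
    hits = sum((v >= threshold) == (y == 1) for v, y in zip(values, labels))
    return hits / len(labels)


def center(values):
    """Subtract the study mean; a crude stand-in for integration, NOT DWD."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Toy values (invented for illustration): same biological signal,
# but study B is shifted by a constant study-specific offset.
study_a, labels_a = [1.0, 1.2, 3.0, 3.2], [0, 0, 1, 1]
study_b, labels_b = [5.0, 5.2, 7.0, 7.2], [0, 0, 1, 1]

# Train on A, test on B without any correction.
acc_before = accuracy(study_b, labels_b, midpoint_threshold(study_a, labels_a))

# Train on centered A, test on centered B.
acc_after = accuracy(
    center(study_b), labels_b, midpoint_threshold(center(study_a), labels_a)
)
```

In this toy case every sample of study B falls above the threshold learned on study A, so half of the samples are misclassified; after centering, the two studies share a scale. Mean-centering only behaves this well when both studies have similar class proportions, which is one reason a dedicated method may be preferable.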
Figure 2. Heatmap and hierarchical clustering of meta analysis data on further candidate marker genes. As shown in the heatmap, there is a range of different genes with good discriminatory power between PTC and benign nodules. Therefore, removing SERPINA1 from the meta analysis dataset leads to a range of possible expression signatures with 99% classification accuracy in leave-one-out cross-validation (see main text). In all cases, the same one sample is misclassified (see discussion for details). Columns correspond to samples and rows to genes; the red/green color bar on top of the heatmap corresponds to the histological classification (red: benign, green: PTC).
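Leave-one-out cross-validation, as mentioned in the caption, holds out each sample in turn, trains on the rest, and counts how often the held-out sample is predicted correctly. A minimal sketch follows. The nearest-centroid classifier is an assumption chosen for brevity (the caption does not name the classifier), and the function name is hypothetical.

```python
def loocv_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    X: list of feature vectors (one list of floats per sample)
    y: list of class labels, same length as X
    """
    correct = 0
    for held_out in range(len(X)):
        # Build class centroids from every sample except the held-out one.
        sums, counts = {}, {}
        for i, (row, label) in enumerate(zip(X, y)):
            if i == held_out:
                continue
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

        # Assign the held-out sample to the closest centroid
        # (squared Euclidean distance).
        def distance(c):
            return sum((a - b) ** 2 for a, b in zip(X[held_out], centroids[c]))

        predicted = min(centroids, key=distance)
        correct += predicted == y[held_out]
    return correct / len(X)
```

With a dataset in which a single sample sits among the opposite class, that sample is misclassified in its own fold while every other fold succeeds, which is the pattern of one consistently misclassified sample described above.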
Figure 3. SERPINA1 expression. Expression values and receiver operating characteristic (ROC) analysis of the SERPINA1 gene in the meta analysis data (left) and the RT-qPCR independent validation data (right). Classification thresholds were chosen from ROC analysis (shown as 'X' in the ROC plots). Positive predictive values (PPV) were calculated as the number of true positives divided by the number of all samples called positive, and negative predictive values (NPV) as the number of true negatives divided by the number of all samples called negative, both at the chosen threshold.
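The PPV and NPV definitions in the caption can be written out as a short calculation. The sketch below assumes that higher expression values indicate the positive class and that a sample is called positive when its value is at or above the threshold; the function name and the example numbers are hypothetical.

```python
def predictive_values(scores, labels, threshold):
    """Return (PPV, NPV) at a given threshold.

    scores: expression values, one per sample
    labels: 1 for positive (e.g. PTC), 0 for negative (e.g. benign)
    A sample is called positive when score >= threshold (assumption).
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        called_positive = score >= threshold
        if called_positive and label == 1:
            tp += 1
        elif called_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    # PPV = TP / all called positive; NPV = TN / all called negative.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Made-up example: one PTC sample falls below the threshold.
ppv, npv = predictive_values([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1], threshold=0.5)
```

Unlike sensitivity and specificity, both values depend on the proportion of positives in the sample set, so they are not directly comparable between datasets with different class balance.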
Differential Expression Analysis
| SYMBOL | RefSeq | logFC | adj.P.Val | Sensitivity | Specificity | KEGGID |
|---|---|---|---|---|---|---|
| SERPINA1 | NM_000295, NM_001002235, NM_001002236 | 3.30 | 7.81e-39 | 0.98 | 1.00 | 04610 |
| PROS1 | NM_000313 | 2.12 | 8.89e-34 | 0.98 | 1.00 | 04610 |
| LRP4 | NM_002334 | 2.80 | 8.89e-34 | 1.00 | 0.96 | NA |
| NPC2 | NM_006432 | 1.29 | 5.83e-33 | 1.00 | 0.94 | 04142 |
| LAMB3 | NM_000228, NM_001017402 | 2.46 | 5.11e-31 | 0.96 | 1.00 | 04510, 04512, 05200, 05222 |
| DPP4 | NM_001935 | 2.86 | 6.96e-31 | 0.98 | 0.98 | NA |
| SDC4 | NM_002999 | 1.68 | 5.50e-30 | 0.96 | 0.96 | 04512, 04514 |
| IPCEF1 | NM_015553 | -2.10 | 2.88e-29 | 1.00 | 0.00 | NA |
| QPCT | NM_012413 | 2.09 | 3.00e-29 | 0.98 | 1.00 | NA |
| MPPED2 | NM_001584 | -2.28 | 7.73e-29 | 1.00 | 0.00 | NA |
| TIMP1 | NM_003254 | 1.85 | 1.48e-27 | 0.96 | 0.94 | NA |
| TFF3 | NM_003226 | -3.42 | 2.07e-27 | 1.00 | 0.00 | NA |
| PRSS23 | NM_007173 | 1.48 | 2.32e-27 | 0.98 | 0.98 | NA |
| MET | NM_000245 | 1.64 | 2.37e-27 | 0.94 | 0.98 | 04060, 04144, 04360, 04510, 04520, 05120, 05200, 05210, 05211, 05218 |
| CDH3 | NM_001793 | 2.72 | 3.45e-27 | 0.96 | 0.94 | 04514 |
| GGCT | NM_024051 | 1.21 | 2.74e-26 | 0.98 | 0.92 | 00480 |
| PDLIM4 | NM_003687 | 2.21 | 3.37e-26 | 0.96 | 0.98 | NA |
| KRT19 | NM_002276 | 2.33 | 3.40e-25 | 0.98 | 0.98 | NA |
| CITED1 | NM_004143 | 3.17 | 4.42e-25 | 0.94 | 0.96 | NA |
| CHI3L1 | NM_001276 | 3.50 | 4.42e-25 | 0.96 | 0.90 | NA |
The first 20 entries in the top table of differential expression in the meta analysis data, including log2 fold changes (logFC) and Benjamini-Hochberg adjusted p-values (adj.P.Val) as calculated by the limma software. Sensitivity and specificity are given for the distinction of PTC vs NG at maximum accuracy. Annotation such as gene symbols, consensus sequence identifiers (RefSeq) and KEGG pathway IDs is also shown.
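For readers unfamiliar with the Benjamini-Hochberg adjustment named in the caption, the procedure can be sketched in a few lines: each p-value is scaled by the number of tests divided by its rank, and the results are then made monotone from the largest p-value downward. This is an illustrative re-implementation with a hypothetical function name, not limma's code.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is never larger than the one ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for illustration.
adj = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

Ties in the adjusted values, such as the identical adj.P.Val entries for neighboring genes in the table, arise naturally from the monotonicity step.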
Datasets used for meta analysis
| Dataset | Published | PTC | o.d | c.lat | Platform | Size |
|---|---|---|---|---|---|---|
| He | PNAS 2005 | 9 | 0 | 9 | Affy U133plus | 54k |
| Huang | PNAS 2001 | 8 | 0 | 8 | Affy U95A | 12k |
| Jarzab | Cancer Res 2005 | 23 | 11 | 17 | Affy U133A | 22k |
| Reyes | not published | 7 | 0 | 7 | Affy U133A | 22k |
Microarray data used for meta analysis. The studies used two types of benign samples: samples from patients who were operated on for other thyroid disease (o.d) and samples from the contralateral lobe (c.lat). Data were obtained from GEO (http://www.ncbi.nlm.nih.gov/geo/) or institutional websites.