Li-Ou Han, Xin-Yu Li, Ming-Ming Cao, Yan Cao, Li-Hong Zhou.
Abstract
New molecular signatures are needed to improve the diagnosis of thyroid cancer (TC) and avoid unnecessary surgeries. In this study, we aimed to develop a robust and individualized diagnostic signature for TC. Gene expression profiles of tumor and nontumor samples were obtained from 13 microarray datasets in the Gene Expression Omnibus (GEO) database and one RNA-sequencing dataset from The Cancer Genome Atlas (TCGA). A total of 1246 samples were divided into a training set (N = 435), a test set (N = 247), and one independent validation set (N = 564). In the training set, the 115 most frequent differentially expressed genes (DEGs) among the included datasets were used to construct 6555 gene pairs, and 19 significant pairs were selected to construct the diagnostic signature with a penalized generalized linear model. The signature showed good diagnostic ability for TC in the training set (area under the receiver operating characteristic curve (AUC) = 0.976), the test set (AUC = 0.960), and the TCGA dataset (AUC = 0.979). Subgroup analyses showed consistent results across types of nontumor samples and microarray platforms. In the diagnosis of thyroid nodules, the signature (AUC = 0.933) also showed higher diagnostic ability than two existing molecular signatures (AUC = 0.886 for a 7-gene signature and AUC = 0.892 for a 10-gene signature). In conclusion, our study developed and validated an individualized diagnostic signature for TC. Large-scale prospective studies are needed to further validate its diagnostic ability.
Keywords: Diagnostic; individualized; signature; thyroid cancer
Year: 2018 PMID: 29522282 PMCID: PMC5911625 DOI: 10.1002/cam4.1397
Source DB: PubMed Journal: Cancer Med ISSN: 2045-7634 Impact factor: 4.452
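The abstract's figure of 6555 gene pairs is consistent with taking every unordered pair among the 115 DEGs, since 115 × 114 / 2 = 6555. The following is a minimal sketch of that enumeration, assuming unordered pairs with no gene paired with itself. The gene identifiers are placeholders, because the full list of 115 DEGs is not given in this record.

```python
from itertools import combinations
from math import comb

# Placeholder identifiers standing in for the 115 most frequent DEGs
# (hypothetical names; the actual gene list is not part of this record).
genes = [f"gene_{i:03d}" for i in range(1, 116)]

# Every unordered pair of distinct genes (assumed pairing scheme).
gene_pairs = list(combinations(genes, 2))

n_pairs = len(gene_pairs)
assert n_pairs == comb(115, 2)
print(n_pairs)  # 6555
```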
Details about the datasets used in this study
| Accession | Year | Area | Platform | Total samples | Tumor samples | Nontumor samples |
|---|---|---|---|---|---|---|
| Training set | ||||||
| GSE27155 | 2011 | USA | GPL96 | 99 | 78 | 21 |
| GSE33630 | 2012 | Belgium | GPL570 | 105 | 60 | 45 |
| GSE35570 | 2015 | Poland | GPL570 | 116 | 65 | 51 |
| GSE60542 | 2015 | Belgium | GPL570 | 63 | 33 | 30 |
| GSE82208 | 2017 | Poland | GPL570 | 52 | 27 | 25 |
| Test set | ||||||
| GSE29265 | 2012 | Belgium | GPL570 | 49 | 29 | 20 |
| GSE3467 | 2005 | USA | GPL570 | 18 | 9 | 9 |
| GSE3678 | 2006 | USA | GPL570 | 14 | 7 | 7 |
| GSE53157 | 2013 | Portugal | GPL570 | 27 | 24 | 3 |
| GSE5364 | 2008 | Singapore | GPL96 | 51 | 35 | 16 |
| GSE58545 | 2015 | Poland | GPL96 | 45 | 27 | 18 |
| GSE6004 | 2006 | USA | GPL570 | 18 | 14 | 4 |
| GSE65144 | 2015 | USA | GPL570 | 25 | 12 | 13 |
| Independent validation set | ||||||
| TCGA | 2015 | USA | IlluminaHiSeq | 564 | 505 | 59 |
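As a consistency check, the per-dataset counts in the table can be summed and compared with the set sizes reported in the abstract (435, 247, 564, and 1246 in total). This is a small sketch using only the numbers copied from the table. The helper name `summarize` is introduced here for illustration.

```python
# (accession, total, tumor, nontumor) copied from the dataset table.
training = [
    ("GSE27155", 99, 78, 21),
    ("GSE33630", 105, 60, 45),
    ("GSE35570", 116, 65, 51),
    ("GSE60542", 63, 33, 30),
    ("GSE82208", 52, 27, 25),
]
test = [
    ("GSE29265", 49, 29, 20),
    ("GSE3467", 18, 9, 9),
    ("GSE3678", 14, 7, 7),
    ("GSE53157", 27, 24, 3),
    ("GSE5364", 51, 35, 16),
    ("GSE58545", 45, 27, 18),
    ("GSE6004", 18, 14, 4),
    ("GSE65144", 25, 12, 13),
]
validation = [("TCGA", 564, 505, 59)]


def summarize(rows):
    """Return (total, tumor, nontumor) sums after checking each row adds up."""
    for accession, total, tumor, nontumor in rows:
        assert total == tumor + nontumor, accession
    return tuple(sum(row[i] for row in rows) for i in (1, 2, 3))


train_sum = summarize(training)
test_sum = summarize(test)
valid_sum = summarize(validation)
grand_total = train_sum[0] + test_sum[0] + valid_sum[0]

print(train_sum, test_sum, valid_sum, grand_total)
```

The sums agree with the abstract: 13 microarray datasets (5 training, 8 test) plus the TCGA dataset, for 1246 samples overall.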
Figure 1. Heatmap of the gene‐pair scores in tumor and nontumor samples of the training set.
Signature information
| Gene 1 | Full name | Gene 2 | Full name | Coefficient |
|---|---|---|---|---|
| CA4 | Carbonic anhydrase IV | CDH3 | Cadherin 3, type 1, P‐cadherin | −0.343957681 |
| CA4 | Carbonic anhydrase IV | DPP4 | Dipeptidyl‐peptidase 4 | −0.414152857 |
| DPP4 | Dipeptidyl‐peptidase 4 | SMAD9 | SMAD family member 9 | 0.198178763 |
| GLRB | Glycine receptor, beta | SEMA3D | Sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3D | 0.213058352 |
| GLT8D2 | Glycosyltransferase 8 domain containing 2 | IER2 | Immediate early response 2 | 0.054809905 |
| GLUL | Glutamate‐ammonia ligase | TFF3 | Trefoil factor 3 | 0.106305040 |
| HSPA5 | Heat shock 70 kDa protein 5 | TFF3 | Trefoil factor 3 | 0.137164218 |
| ID1 | Inhibitor of DNA binding 1 | TPO | Thyroid peroxidase | 0.133836321 |
| ITIH5 | Interalpha‐trypsin inhibitor heavy chain family, member 5 | LRP4 | Low‐density lipoprotein receptor‐related protein 4 | −0.449239654 |
| KRT19 | Keratin 19 | LMOD1 | Leiomodin 1 | 0.166685729 |
| KRT19 | Keratin 19 | LRP1B | Low‐density lipoprotein receptor‐related protein 1B | 0.026410689 |
| LMOD1 | Leiomodin 1 | SLC34A2 | Solute carrier family 34, member 2 | −0.134480682 |
| LRP1B | Low‐density lipoprotein receptor‐related protein 1B | LRP4 | Low‐density lipoprotein receptor‐related protein 4 | −0.180802140 |
| LRP1B | Low‐density lipoprotein receptor‐related protein 1B | MYEF2 | Myelin expression factor 2 | −1.070962895 |
| LRP1B | Low‐density lipoprotein receptor‐related protein 1B | SLMO1 | Slowmo homolog 1 | −0.158658033 |
| LRP4 | Low‐density lipoprotein receptor‐related protein 4 | TNFRSF11B | Tumor necrosis factor receptor superfamily, member 11b | 0.149105475 |
| NELL2 | NEL‐like 2 | TCEAL2 | Transcription elongation factor A (SII)‐like 2 | 0.178969566 |
| QPCT | Glutaminyl‐peptide cyclotransferase | TNFRSF11B | Tumor necrosis factor receptor superfamily, member 11b | 0.692483617 |
| TCEAL2 | Transcription elongation factor A (SII)‐like 2 | TRAPPC6A | Trafficking protein particle complex 6A | −0.229715917 |
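To illustrate how a gene-pair signature of this kind could be applied to a single sample, the sketch below combines the 19 coefficients from the table into a linear predictor. Two points are assumptions, not statements from this record: the scoring rule (a pair scores 1 when the first gene is expressed below the second, otherwise 0), and the omission of an intercept and decision threshold, which are not reported here. The function names `pair_score` and `linear_predictor` are hypothetical.

```python
# Coefficients copied from the signature table, keyed by the two genes of each pair.
COEFFICIENTS = {
    ("CA4", "CDH3"): -0.343957681,
    ("CA4", "DPP4"): -0.414152857,
    ("DPP4", "SMAD9"): 0.198178763,
    ("GLRB", "SEMA3D"): 0.213058352,
    ("GLT8D2", "IER2"): 0.054809905,
    ("GLUL", "TFF3"): 0.106305040,
    ("HSPA5", "TFF3"): 0.137164218,
    ("ID1", "TPO"): 0.133836321,
    ("ITIH5", "LRP4"): -0.449239654,
    ("KRT19", "LMOD1"): 0.166685729,
    ("KRT19", "LRP1B"): 0.026410689,
    ("LMOD1", "SLC34A2"): -0.134480682,
    ("LRP1B", "LRP4"): -0.180802140,
    ("LRP1B", "MYEF2"): -1.070962895,
    ("LRP1B", "SLMO1"): -0.158658033,
    ("LRP4", "TNFRSF11B"): 0.149105475,
    ("NELL2", "TCEAL2"): 0.178969566,
    ("QPCT", "TNFRSF11B"): 0.692483617,
    ("TCEAL2", "TRAPPC6A"): -0.229715917,
}

# All genes that appear in at least one pair.
SIGNATURE_GENES = sorted({g for pair in COEFFICIENTS for g in pair})


def pair_score(expr, gene_a, gene_b):
    """Assumed rule: 1 if gene_a is expressed below gene_b in this sample, else 0."""
    return 1 if expr[gene_a] < expr[gene_b] else 0


def linear_predictor(expr):
    """Sum of coefficient * pair score over the 19 pairs (no intercept; assumption)."""
    return sum(
        coef * pair_score(expr, a, b) for (a, b), coef in COEFFICIENTS.items()
    )


# Synthetic sample: all genes at the same level except CDH3, so only the
# CA4/CDH3 pair scores 1 under the assumed rule.
sample = {gene: 1.0 for gene in SIGNATURE_GENES}
sample["CDH3"] = 2.0
print(linear_predictor(sample))
```

Because each pair score depends only on the within-sample ordering of two genes, a score of this form does not require normalization across samples or platforms, which is the usual motivation for calling such signatures individualized.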
Figure 2. Receiver operating characteristic (ROC) curves and area under ROC curve (AUC) of the diagnostic signature in the training set.
Figure 3. Receiver operating characteristic (ROC) curves and area under ROC curve (AUC) of the diagnostic signature in the test set.
Figure 4. Receiver operating characteristic (ROC) curves and area under ROC curve (AUC) of different signatures in diagnosing thyroid nodules.
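The AUC values reported for these ROC curves can be read as the probability that a randomly chosen tumor sample receives a higher signature score than a randomly chosen nontumor sample. The sketch below computes AUC from that pairwise definition. The scores and labels are synthetic toy values for illustration only, not data from the study, and the function name `auc` is introduced here.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic toy data: 1 = tumor, 0 = nontumor.
toy_scores = [0.9, 0.8, 0.35, 0.4, 0.1]
toy_labels = [1, 1, 1, 0, 0]
print(auc(toy_scores, toy_labels))  # 5 of 6 pairs ranked correctly
```

This pairwise form is quadratic in the number of samples, which is acceptable for cohorts of the sizes listed here. For larger data, a rank-based implementation gives the same value more efficiently.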