G C Mayne, C M Woods, N Dharmawardana, T Wang, S Krishnan, J C Hodge, A Foreman, S Boase, A S Carney, E A W Sigston, D I Watson, E H Ooi, D J Hussey.
Abstract
BACKGROUND: Oropharyngeal squamous cell carcinoma (OPSCC) is often diagnosed at an advanced stage because the disease frequently causes few symptoms other than metastasis to neck lymph nodes. Better tools are required to assist with the early detection of OPSCC. MicroRNAs (miRNAs, miRs) are potential biomarkers for the early diagnosis, prognosis, recurrence, and presence of metastatic disease in head and neck squamous cell cancer. However, there is no widespread agreement on a panel of miRNAs with clinically meaningful utility for head and neck squamous cell cancers. This could be due to variations in the collection, storage, pre-processing, and isolation of RNA, but several reports have indicated that the selection and reproducibility of biomarkers have been widely affected by the methods used for data analysis. The primary analysis issues appear to be model overfitting and the incorrect application of statistical techniques. The purpose of this study was to develop a robust statistical approach to identify a miRNA signature that can distinguish controls and patients with inflammatory disease from patients with human papillomavirus-positive (HPV+) OPSCC.
Keywords: Biomarkers; Data analysis; Oropharyngeal squamous cell carcinoma; Serum; microRNAs
Year: 2020 PMID: 32650803 PMCID: PMC7350687 DOI: 10.1186/s12967-020-02446-1
Source DB: PubMed Journal: J Transl Med ISSN: 1479-5876 Impact factor: 5.531
Fig. 1 Nested cross validation scheme with stable variable selection (StaVarSel). In the inner loop the level of regularisation (lambda) for the regression model was optimised via repeated tenfold cross validation. For the StaVarSel, the miR-ratios derived from applying lasso regression with the optimised lambda to each training set were collated, ranked according to frequency of selection, and then subjected to stepwise selection at percentile cut-offs to determine the optimum model with the least prediction error. The stable miR-ratios thus selected from the inner loop cross validation were then used to build regression models in the cross validation outer loop and make predictions of the held-out samples.
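The collation and ranking step described in the caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the placeholder ratio names (`r1`, `r2`, `r3`), and the use of plain selection-frequency thresholds as a stand-in for the percentile cut-offs are all assumptions, and `error_fn` is a hypothetical callback that would return a cross-validated prediction error for a candidate set.

```python
from collections import Counter


def rank_by_selection_frequency(selected_per_fit):
    """Rank miR-ratios by the fraction of training sets in which lasso kept them."""
    counts = Counter()
    for selected in selected_per_fit:
        counts.update(set(selected))  # count each ratio at most once per fit
    n_fits = len(selected_per_fit)
    # Most frequently selected first; ties broken by name for determinism.
    return sorted(
        ((name, count / n_fits) for name, count in counts.items()),
        key=lambda item: (-item[1], item[0]),
    )


def candidate_sets(ranked, cutoffs):
    """For each cut-off, keep the ratios selected in at least that fraction of fits."""
    return {c: [name for name, freq in ranked if freq >= c] for c in cutoffs}


def pick_stable_set(candidates, error_fn):
    """Return (cutoff, ratios) for the non-empty candidate set with the lowest error."""
    non_empty = {c: s for c, s in candidates.items() if s}
    best = min(non_empty, key=lambda c: error_fn(non_empty[c]))
    return best, non_empty[best]


# Hypothetical selections from four lasso fits (placeholder ratio names).
fits = [["r1", "r2"], ["r1", "r3"], ["r1", "r2"], ["r1"]]
ranked = rank_by_selection_frequency(fits)
sets_by_cutoff = candidate_sets(ranked, cutoffs=(0.9, 0.5))
```

In a full pipeline, `error_fn` would refit a regression model on each candidate set inside the inner loop, so that the held-out samples of the outer loop never influence which ratios are kept.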
Clinicopathologic characteristics of the patients included in this analysis
| Characteristic | Controls (n = 19) | GORD (n = 20) | OPSCCs (n = 39) |
|---|---|---|---|
| Median age, years (range) ** | 60 (50–69) | 56 (39–86) | 58 (47–74) |
| Sex | | | |
| Male | 19 | 20 | 36 |
| Female | 0 | 0 | 3 |
| Smoking | | | |
| Never smoked | – | – | 20 |
| Smoked | – | – | 19 |
| Overall stage (AJCC 7) | | | |
| Stage III | – | – | 3 |
| Stage IVa | – | – | 35 |
| Stage IVb | – | – | 1 |
| T-stage | | | |
| T1 | – | – | 10 |
| T2 | – | – | 14 |
| T3 | – | – | 9 |
| T4 | – | – | 6 |
| Lymph node metastasis | | | |
| N0 | – | – | 2 |
| N1-N2 | – | – | 37 |
| Cancer location | | | |
| Tonsil | – | – | 26 |
| Base of tongue | – | – | 13 |
**There were no significant differences in median age between controls, patients with GORD, and patients with OPSCC (Kruskal–Wallis test, p = 0.75)
Fig. 2 ROC curves with 95% confidence intervals for sensitivity and specificity at each threshold level. a Standard nested 2-stage cross validation method (optimized lambda lasso regression). b Nested 2-stage cross validation with additive penalization (one-standard-error rule). c Stabilized percentile lasso nested 3-stage cross validation method (11 miR-ratio logistic regression model)
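The one-standard-error rule named in panel b is commonly stated as: take the largest lambda whose mean cross-validated error lies within one standard error of the minimum. A minimal sketch under that definition follows; the function name and the numbers in the example are illustrative assumptions, not values from the study.

```python
def one_standard_error_lambda(lambdas, mean_errors, std_errors):
    """Pick the strongest penalty whose CV error is within one SE of the best."""
    # Index of the lambda with the lowest mean cross-validated error.
    best = min(range(len(lambdas)), key=lambda i: mean_errors[i])
    limit = mean_errors[best] + std_errors[best]
    # Any lambda whose mean error does not exceed the limit is acceptable.
    eligible = [lam for lam, err in zip(lambdas, mean_errors) if err <= limit]
    # A larger lambda means heavier regularisation and a sparser model.
    return max(eligible)


# Made-up cross-validation results for three candidate lambdas.
chosen = one_standard_error_lambda(
    lambdas=[0.01, 0.1, 1.0],
    mean_errors=[0.20, 0.22, 0.35],
    std_errors=[0.03, 0.03, 0.04],
)
```

Here the minimum error is 0.20 at lambda 0.01, the limit is 0.23, and the rule therefore prefers lambda 0.1, accepting a slightly higher error in exchange for a simpler model.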
miRNAs present in the 11 miR-ratios model
| MiRNA-ratio | Denominator miRNA (miRBase) | Numerator miRNA (miRBase) |
|---|---|---|
| 1 | hsa-miR-206 | |
| 2 | ||
| 3 | hsa-miR-532-3p | |
| 4 | hsa-miR-193b-3p | |
| 5 | ||
| 6 | hsa-miR-150-5p | |
| 7 | hsa-miR-193a-5p | |
| 8 | hsa-miR-93-5p | |
| 9 | hsa-miR-152-3p | |
| 10 | ||
| 11 | hsa-miR-375-3p |
Each row in the table lists the two miRs present in each miR-ratio. The bold-highlighted miRNAs were differentially expressed when normalized with selected housekeeping genes
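As a rough illustration of how a miR-ratio feature pairs two miRNAs, the sketch below forms every pairwise log2 ratio from one sample's expression values. The linear-scale input, the alphabetical choice of numerator, the function name, and the placeholder miRNA names are assumptions made for the example; the source does not specify how the ratios were computed.

```python
import math
from itertools import combinations


def log2_ratios(expression):
    """Return log2(numerator / denominator) for every pair of miRNAs in one sample."""
    ratios = {}
    # Sorting fixes which member of each pair is treated as the numerator.
    for numerator, denominator in combinations(sorted(expression), 2):
        ratios[f"{numerator}/{denominator}"] = (
            math.log2(expression[numerator]) - math.log2(expression[denominator])
        )
    return ratios


# Made-up linear-scale expression values for three placeholder miRNAs.
sample = {"miR-A": 8.0, "miR-B": 2.0, "miR-C": 4.0}
features = log2_ratios(sample)
```

Because each feature compares two miRNAs measured in the same sample, a ratio does not rely on a separate reference gene for normalisation.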
Fig. 3 a Cross validated sensitivity vs. specificity estimates from ROC curve analysis using the “stable” 11 miR-ratio multivariate logistic regression model. b Cross validated sensitivity (red) and specificity (blue) lower bound estimates at increasing threshold levels using the “stable” 11 miR-ratio model
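The quantities plotted in this figure can be sketched as follows: sensitivity and specificity at a given probability threshold, plus a lower confidence bound on each proportion. The Wilson score interval is used here purely as an assumption, since the source does not state which interval the authors computed; the function names and example data are likewise hypothetical.

```python
import math


def sensitivity_specificity(scores, labels, threshold):
    """Call score >= threshold positive; labels are 1 (case) or 0 (non-case)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def wilson_lower_bound(successes, n, z=1.96):
    """Lower limit of the Wilson score interval for a proportion (z=1.96 for 95%)."""
    p = successes / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)


# Made-up predicted probabilities for three cases and two non-cases.
sens, spec = sensitivity_specificity(
    scores=[0.9, 0.8, 0.4, 0.3, 0.2], labels=[1, 1, 1, 0, 0], threshold=0.5
)
```

Raising the threshold generally trades sensitivity for specificity, which is why both lower bounds are tracked across thresholds.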