Yao Liu¹, Yicheng Li², Yue Fu¹, Tong Liu¹, Xiaoyong Liu³, Xinyan Zhang⁴, Jie Fu¹, Xiaobing Guan¹, Tong Chen⁵, Xiaoxin Chen², Zheng Sun¹.
Abstract
Exfoliative cytology has been widely used for the early diagnosis of oral squamous cell carcinoma (OSCC). We previously developed an oral cancer risk index (OCRI) based on DNA index values to quantitatively assess cancer risk in patients with oral leukoplakia (OLK), but with limited success. To improve the performance of the risk index, we collected exfoliative cytology, histopathology, and clinical follow-up data from two independent cohorts (a training set and a validation set) of normal subjects and patients with leukoplakia or cancer. Peaks were defined on the basis of positive first derivatives, and modern machine learning techniques were used to build statistical prediction models on the transformed data. Random forest performed best, with a sensitivity of 100% and a specificity of 99.2%. Using the Peaks-Random Forest model, we constructed an index (OCRI2) as a quantitative measure of cancer risk. Among 11 leukoplakia patients with an OCRI2 above 0.5, 4 (36.4%) developed cancer during follow-up (23 ± 20 months), whereas 3 of 57 (5.3%) leukoplakia patients with an OCRI2 below 0.5 developed cancer (32 ± 31 months). OCRI2 predicted the development of oral squamous cell carcinoma during follow-up better than the other methods tested. In conclusion, we have developed an exfoliative cytology-based method for the quantitative prediction of cancer risk in patients with oral leukoplakia.
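The sensitivity and specificity quoted above are the usual confusion-matrix ratios. The sketch below shows how they are computed, assuming a binary labelling (1 = cancer, 0 = non-cancer) that the abstract does not spell out; the function name and the toy labels are hypothetical and are not study data.

```python
from typing import Sequence, Tuple

def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = positive (assumed: cancer)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancers correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancers missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-cancers correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-cancers wrongly flagged
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical toy labels, not study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(sensitivity_specificity(y_true, y_pred))  # (1.0, 0.8)
```

A sensitivity of 100% means no cancer case was classified as negative; a specificity of 99.2% means a small fraction of non-cancer cases were classified as positive.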
Keywords: exfoliative cytology; oral cancer risk; oral leukoplakia; oral squamous cell carcinoma; quantitative prediction
Year: 2017 PMID: 28545021 PMCID: PMC5542248 DOI: 10.18632/oncotarget.17550
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
Figure 1. Oral cancer risk index 2 (OCRI2) of normal subjects, OLK patients, and OSCC patients in the training and validation sets, computed with five statistical models (SVM, SVMfull, KNN, CF, and RF).
The y-axis shows the OCRI2 value. Each box plot shows the median and the 25th to 75th percentiles (interquartile range).
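The box in each plot spans the first quartile (Q1) to the third quartile (Q3), with a line at the median. A minimal sketch of that summary using Python's standard `statistics.quantiles`; the function name and the input values are hypothetical, and the quartile convention of the original plotting software is not stated, so results may differ slightly from the published figure.

```python
import statistics
from typing import Sequence, Tuple

def box_summary(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (Q1, median, Q3): the box edges and centre line of a box plot."""
    # method="inclusive" interpolates between order statistics in the same way
    # as R's default quantile() (type 7); other plotting tools may use other rules.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3

# Hypothetical OCRI2 values for one group, not study data.
print(box_summary([0.1, 0.2, 0.3, 0.4, 0.5]))
```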
Figure 2. Prediction of OSCC in OLK patients during the follow-up period using three methods
Seven patients developed OSCC during follow-up. With 0.5 as an arbitrary cut-off, OCRI2 predicted OSCC better than OCRI and the traditional method.
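Splitting patients at the 0.5 cut-off and counting who progressed gives the incidence figures reported in the abstract (4 of 11 above 0.5, 3 of 57 below). The sketch below reproduces that stratification; the function name is hypothetical, the records are rebuilt from those counts with placeholder OCRI2 values rather than real patient data, and placing a value of exactly 0.5 in the low group is an assumption because the source only describes values above and below 0.5.

```python
from typing import Dict, Iterable, Tuple

def incidence_by_cutoff(records: Iterable[Tuple[float, bool]],
                        cutoff: float = 0.5) -> Dict[str, Tuple[int, int, float]]:
    """Split (index_value, developed_cancer) records at `cutoff`.

    Returns {"high": (n, events, percent), "low": (n, events, percent)}.
    A value exactly equal to the cut-off goes to "low" (assumption).
    """
    groups = {"high": [0, 0], "low": [0, 0]}
    for value, developed_cancer in records:
        key = "high" if value > cutoff else "low"
        groups[key][0] += 1                      # patients in this stratum
        groups[key][1] += int(developed_cancer)  # patients who progressed to OSCC
    return {
        key: (n, events, round(100 * events / n, 1) if n else 0.0)
        for key, (n, events) in groups.items()
    }

# Hypothetical records rebuilt from the abstract's counts:
# 11 patients above 0.5 (4 progressed) and 57 below 0.5 (3 progressed).
records = [(0.8, i < 4) for i in range(11)] + [(0.2, i < 3) for i in range(57)]
print(incidence_by_cutoff(records))
```

The two strata together account for the 68 followed OLK patients and the seven who developed OSCC.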
General characteristics of subjects in the training and validation sets
| Characteristic | Training: Normal | Training: OLK | Training: OSCC | Validation: Normal | Validation: OLK | Validation: OSCC |
|---|---|---|---|---|---|---|
| Age (y) | | | | | | |
| Mean ± SD | 39.67 ± 15.48 | 57.68 ± 13.51 | 64.68 ± 11.71 | 44.00 ± 16.00 | 58.16 ± 11.48 | 61.70 ± 11.11 |
| Range | 20–68 | 26–77 | 32–88 | 22–80 | 25–85 | 21–83 |
| Sex | | | | | | |
| Male, n (%) | 5 (27.8) | 19 (67.9) | 20 (48.8) | 46 (45.1) | 37 (45.1) | 45 (48.4) |
| Female, n (%) | 13 (72.2) | 9 (32.1) | 21 (51.2) | 56 (54.9) | 45 (54.9) | 48 (51.6) |
| Site | | | | | | |
| Tongue, n (%) | 4 (22.2) | 6 (21.4) | 16 (39.0) | 28 (27.5) | 22 (26.8) | 41 (44.1) |
| Gingiva, n (%) | 4 (22.2) | 11 (39.3) | 7 (17.1) | 15 (14.7) | 33 (40.2) | 27 (29.0) |
| Other, n (%) | 10 (55.6) | 11 (39.3) | 18 (43.9) | 59 (57.8) | 27 (32.9) | 25 (26.9) |
| Smoking | | | | | | |
| Yes, n (%) | 1 (5.6) | 16 (57.1) | 10 (24.4) | 32 (31.4) | 29 (35.4) | 31 (33.3) |
| No, n (%) | 17 (94.4) | 12 (42.9) | 31 (75.6) | 70 (68.6) | 53 (64.6) | 62 (66.7) |
| Drinking | | | | | | |
| Yes, n (%) | 1 (5.6) | 9 (32.1) | 9 (22.0) | 29 (28.4) | 19 (23.2) | 29 (31.2) |
| No, n (%) | 17 (94.4) | 19 (67.9) | 32 (78.0) | 73 (71.6) | 63 (76.8) | 64 (68.8) |
| Total, n (%) | 18 (100.0) | 28 (100.0) | 41 (100.0) | 102 (100.0) | 82 (100.0) | 93 (100.0) |
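Each "n (%)" cell in the table is a count followed by its share of the group total, rounded to one decimal place. The helper below (a hypothetical name, for illustration only) formats a row that way and checks that the categories sum to the group total; the example reproduces the site row for the 102 normal subjects in the validation set.

```python
from typing import List

def n_pct_cells(counts: List[int], total: int) -> List[str]:
    """Format counts as 'n (%)' cells, with percentages of the group total to one decimal."""
    if sum(counts) != total:
        raise ValueError("category counts must add up to the group total")
    return [f"{n} ({100 * n / total:.1f})" for n in counts]

# Site of the validation-set normal subjects (tongue, gingiva, other), n = 102.
print(n_pct_cells([28, 15, 59], 102))  # ['28 (27.5)', '15 (14.7)', '59 (57.8)']
```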
Figure 3. Peaks method of data transformation
The Peaks method is based on first derivatives, with positives defined as peaks. Ploidy values from 0.5 to 10.5 were divided into ten intervals: 0.5 to 1.5 as the first interval, 1.5 to 2.5 as the second, and so on. (A) Pooled data from all 87 training-set cases, showing the overall distribution. (B) Normal subjects (18 cases) showed two peaks, in the first and second intervals. (C) OSCC subjects (41 cases) showed high variance, with data spread across most of the intervals.
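The legend does not spell out how peaks are located, so the following is only one possible reading, not the authors' implementation. It assumes each case's per-cell ploidy values are first binned into a fine histogram (the 0.1 bin width is a hypothetical choice), a peak is a bin where the first difference turns from positive to non-positive, and each peak is then assigned to one of the ten unit-width intervals. Under these assumptions, a normal-like case with clusters near ploidy 1 and 2 yields peaks in the first and second intervals, as in panel B.

```python
from typing import List, Sequence

LOW, HIGH = 0.5, 10.5    # ploidy range covered by the ten intervals
N_INTERVALS = 10         # [0.5, 1.5), [1.5, 2.5), ..., [9.5, 10.5)
FINE_WIDTH = 0.1         # hypothetical resolution of the per-case histogram
N_FINE = round((HIGH - LOW) / FINE_WIDTH)  # 100 fine bins

def fine_histogram(ploidy: Sequence[float]) -> List[int]:
    """Count per-cell ploidy values in narrow bins; values outside the range are ignored."""
    counts = [0] * N_FINE
    for p in ploidy:
        if LOW <= p < HIGH:
            counts[min(int((p - LOW) / FINE_WIDTH), N_FINE - 1)] += 1
    return counts

def peak_centres(counts: Sequence[int]) -> List[float]:
    """Return bin centres where the first difference turns from positive to non-positive."""
    padded = [0, *counts, 0]
    centres = []
    for i in range(len(counts)):
        rise = padded[i + 1] - padded[i]      # first difference into bin i
        fall = padded[i + 2] - padded[i + 1]  # first difference out of bin i
        if rise > 0 and fall <= 0:
            centres.append(LOW + (i + 0.5) * FINE_WIDTH)
    return centres

def peaks_features(ploidy: Sequence[float]) -> List[int]:
    """Number of peaks falling in each of the ten unit-width ploidy intervals."""
    features = [0] * N_INTERVALS
    for centre in peak_centres(fine_histogram(ploidy)):
        features[min(int(centre - LOW), N_INTERVALS - 1)] += 1
    return features

# Hypothetical normal-like case: a cluster near ploidy 1 and a smaller one near 2.
case = [0.97, 0.98, 1.01, 1.02, 1.03, 1.98, 2.01, 2.02]
print(peaks_features(case))  # [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```

With real single-cell data, isolated cells would create many spurious one-bin peaks, so some smoothing or minimum-height rule would likely be needed before peak detection; that step is not shown here.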