Jose-Miguel Yamal, Martial Guillaud, E Neely Atkinson, Michele Follen, Calum MacAulay, Scott B Cantor, Dennis D Cox.
Abstract
Although the Papanicolaou smear has been successful in decreasing cervical cancer incidence in the developed world, many challenges remain for its implementation in the developing world. Quantitative cytology, a semi-automated method that quantifies cellular image features, is a promising screening test candidate. The nested structure of its data (measurements of multiple cells within a patient) poses challenges for the usual classification problem. Here we perform a comparative study of three main approaches for problems with this general data structure: a) extract patient-level features from the cell-level data; b) use a statistical model that accounts for the hierarchical data structure; and c) classify at the cellular level and use an ad hoc approach to classify at the patient level. We apply these methods to a dataset of 1,728 patients, with an average of 2,600 cells collected per patient and 133 features measured per cell, predicting whether a patient had a positive biopsy result. The best approach we found was to classify at the cellular level and count the number of cells that had a posterior probability greater than a threshold value, with estimated 61% sensitivity and 89% specificity on independent data. Recent statistical learning developments allowed us to achieve high accuracy.
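As a rough illustration of approach c), the sketch below aggregates cell-level posterior probabilities into a patient-level call by counting the cells that exceed a probability threshold. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the threshold of 0.9, and the `min_cells` cutoff that turns a count into a positive call are hypothetical, and the abstract does not say how the count is converted into a decision. The posteriors are assumed to come from some already-fitted cell-level classifier.

```python
from collections import defaultdict


def count_cells_above(cell_posteriors, patient_ids, prob_threshold):
    """Count, per patient, the cells whose posterior exceeds prob_threshold."""
    counts = defaultdict(int)
    for pid, p in zip(patient_ids, cell_posteriors):
        counts[pid] += 0  # register the patient even if no cell exceeds the threshold
        if p > prob_threshold:
            counts[pid] += 1
    return dict(counts)


def classify_patients(cell_posteriors, patient_ids,
                      prob_threshold=0.9, min_cells=2):
    """Call a patient positive when at least min_cells cells exceed the threshold.

    Both prob_threshold and min_cells are illustrative values (assumptions);
    in practice they would be tuned, for example by cross-validation.
    """
    counts = count_cells_above(cell_posteriors, patient_ids, prob_threshold)
    return {pid: n >= min_cells for pid, n in counts.items()}


# Toy example: five cells from two patients, with made-up posteriors.
patient_ids = ["A", "A", "A", "B", "B"]
cell_posteriors = [0.95, 0.97, 0.10, 0.20, 0.93]
calls = classify_patients(cell_posteriors, patient_ids)
```

Raising `prob_threshold` or `min_cells` would be expected to trade sensitivity for specificity, which is why both would normally be chosen on held-out data rather than fixed in advance.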
Keywords: DNA ploidy; L1-regularized logistic regression; cross-validation; multilevel classification; quantitative cytology; variable selection
Year: 2015; PMID: 26617681; PMCID: PMC4659436; DOI: 10.1002/sam.11261
Source DB: PubMed; Journal: Stat Anal Data Min; ISSN: 1932-1864; Impact factor: 1.051