Visa Suomi¹, Gaber Komar², Teija Sainio³, Kirsi Joronen⁴, Antti Perheentupa⁴, Roberto Blanco Sequeiros².
Abstract
The aim of the study was to use multiple feature selection methods to select the most important parameters from clinical patient data for classifying the outcome of high-intensity focused ultrasound (HIFU) treatment of uterine fibroids. This retrospective study used patient data from 66 HIFU treatments covering 89 uterine fibroids. A total of 39 features were extracted from the patient data, and 14 different filter-based feature selection methods were used to select the most informative features. The selected features were then used in a support vector classification (SVC) model to evaluate how well these parameters predict HIFU therapy outcome. The therapy outcome was defined as the non-perfused volume (NPV) ratio in three classes: <30%, 30–80% or >80%. The ten highest-ranked features, in order, were: fibroid diameter, subcutaneous fat thickness, fibroid volume, fibroid distance, Funaki type I, fundus location, gravidity, Funaki type III, submucosal fibroid type and urinary symptoms. The maximum F1-micro classification score was 0.63, obtained using the top ten features from the Mutual Information Maximisation (MIM) and Joint Mutual Information (JMI) feature selection methods. Classification performance of HIFU therapy outcome prediction in uterine fibroids is highly dependent on the chosen feature set, which should be determined beforehand using different classifiers.
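MIM and JMI are named above as the best-performing selectors. As a minimal sketch of what these two filter criteria compute, the Python below uses plug-in (frequency-count) mutual information estimates on already-discrete features: MIM ranks each feature by its own relevance I(X_k; Y), while JMI greedily adds the feature with the largest summed joint relevance I(X_k, X_j; Y) over the already-selected features X_j. How the study estimated mutual information for continuous features such as fibroid diameter is not stated here, so discretisation is assumed to happen beforehand; all function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Plug-in Shannon entropy (bits) of a sequence of hashable values."""
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from empirical frequencies."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def mim_rank(features, y):
    """MIM: rank features (name -> discrete column) by I(X_k;Y), best first."""
    scores = {name: mutual_info(col, y) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


def jmi_rank(features, y, n_select):
    """JMI: greedily add the feature maximising sum_j I(X_k, X_j; Y)
    over the already-selected features X_j."""
    remaining = list(features)
    # Start from the single most relevant feature, as MIM would.
    first = max(remaining, key=lambda k: mutual_info(features[k], y))
    selected = [first]
    remaining.remove(first)
    while remaining and len(selected) < n_select:
        def jmi_score(k):
            # Joint variable (X_k, X_j) is represented by zipped tuples.
            return sum(
                mutual_info(list(zip(features[k], features[j])), y)
                for j in selected
            )
        best = max(remaining, key=jmi_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

MIM ignores redundancy between features, whereas JMI rewards features that add information jointly with those already chosen; on small datasets the two can still produce similar top-ten lists.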
Year: 2019 PMID: 31358836 PMCID: PMC6662821 DOI: 10.1038/s41598-019-47484-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary of study patients and treatment outcome class labels.
| Category | Count |
|---|---|
| Treated patients | 66 |
| Uterine fibroids | 89 |
| Non-perfused volume (NPV) ratio < 30% | 15 |
| 30% ≤ NPV ratio ≤ 80% | 52 |
| NPV ratio > 80% | 22 |
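The three outcome classes above partition the NPV ratio at 30% and 80%, with both boundary values falling in the middle class. A minimal sketch of that labelling follows; the integer codes 0/1/2 are hypothetical, and computing the ratio as NPV divided by fibroid volume is an assumption about the standard definition rather than something stated in this record.

```python
def npv_ratio(npv_ml, fibroid_volume_ml):
    """NPV ratio in percent (assumed: non-perfused volume / fibroid volume)."""
    return 100.0 * npv_ml / fibroid_volume_ml


def npv_class(npv_ratio_percent):
    """Map an NPV ratio (%) to the three outcome classes (codes hypothetical):
    0: NPV < 30%, 1: 30% <= NPV <= 80%, 2: NPV > 80%."""
    if npv_ratio_percent < 30:
        return 0
    if npv_ratio_percent <= 80:
        return 1
    return 2
```

With this scheme, a fibroid whose NPV ratio is exactly 30% or exactly 80% is labelled as the middle class, matching the inequalities in the table.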
Numerical features with their mean, standard deviation (SD), minimum and maximum values per fibroid.
| Feature | Mean | SD | Min | Max |
|---|---|---|---|---|
| Age (years) | 41.9 | 6.0 | 26 | 51 |
| Weight (kg) | 68.0 | 12.2 | 44 | 108 |
| Height (cm) | 165.4 | 6.2 | 152 | 178 |
| Gravidity | 1.2 | 1.5 | 0 | 7 |
| Parity | 0.8 | 1.0 | 0 | 4 |
| Subcutaneous fat thickness (mm) | 16.2 | 7.9 | 3.3 | 41.2 |
| Front-back distance (mm) | 141.0 | 17.9 | 89.2 | 173.3 |
| Fibroid diameter (mm) | 42.0 | 20.5 | 8.6 | 89.8 |
| Fibroid distance (mm) | 47.5 | 18.7 | 15.6 | 91.8 |
| Fibroid volume (ml) | 84.4 | 126.0 | 0.4 | 898 |
Categorical features with counts per fibroid.
| Feature | Category | Count |
|---|---|---|
| Ethnicity | White | 80 |
| | Black | 6 |
| | Asian | 3 |
| Previous pregnancies | Yes | 50 |
| | No | 39 |
| Live births | Yes | 42 |
| | No | 47 |
| C-section | Yes | 4 |
| | No | 85 |
| Treatment history | Esmya | 30 |
| | Open myomectomy | 4 |
| | Laparoscopic myomectomy | 3 |
| | Hysteroscopic myomectomy | 14 |
| | Embolisation | 1 |
| Abdominal scars | Yes | 24 |
| | No | 65 |
| Symptoms | Bleeding | 69 |
| | Pain | 8 |
| | Mass | 17 |
| | Urinary | 11 |
| | Infertility | 15 |
| Fibroid type | Intramural | 54 |
| | Subserosal | 8 |
| | Submucosal | 31 |
| Fibroid location | Anterior | 44 |
| | Posterior | 38 |
| | Lateral | 31 |
| | Fundus | 7 |
| Uterus position | Anteverted | 80 |
| | Retroverted | 9 |
| Funaki type | Type I | 18 |
| | Type II | 66 |
| | Type III | 5 |
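Several of the top-ranked features in the abstract (Funaki type I, fundus location, submucosal fibroid type, urinary symptoms) are single categories from this table, which suggests each category was used as its own 0/1 feature. Because some fields, such as symptoms and fibroid location, have counts summing to more than 89, one fibroid can carry several categories. A minimal sketch of such indicator encoding follows; the field layout and the `encode_indicators` helper are hypothetical, not the study's documented preprocessing.

```python
def encode_indicators(record, vocab):
    """Turn categorical fields into 0/1 indicator features.

    record: field -> set of observed categories (a set allows multi-valued
            fields such as symptoms or fibroid location).
    vocab:  field -> ordered list of every category for that field.
    """
    features = {}
    for field, categories in vocab.items():
        observed = record.get(field, set())
        for cat in categories:
            # One binary column per (field, category) pair.
            features[f"{field}: {cat}"] = int(cat in observed)
    return features
```

For example, a fibroid recorded as Funaki type I with bleeding and urinary symptoms would get 1 in `"Funaki type: Type I"`, `"Symptoms: Bleeding"` and `"Symptoms: Urinary"`, and 0 elsewhere.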
Figure 1. Overview of the data processing pipeline: (1) the data were read into a dataframe; (2) split into training and test sets; (3) missing values were imputed with mean or mode values computed from the training set; (4) the features were log-scaled; (5) each feature selection method (k = 1–14) was applied to the training set; (6) the n highest-ranking features (n = 2–20) were obtained; (7) the n features from method k were used to train a support vector classification (SVC) model via hyperparameter grid search with inner 10-fold cross-validation; (8) the SVC model was refit on the whole training set using the hyperparameter combination with the highest cross-validation score (F1-micro); (9) the fitted SVC model was used to classify the uterine fibroids in the test set using the same n features, and the test score (F1-micro) was saved. Steps (1–9) were repeated 200 times, each time with a new randomisation seed.
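Steps (2)–(4) are where data leakage is easiest to introduce: the imputation values must come from the training rows only, so that the test fibroids never influence them. The stdlib sketch below shows that ordering for rows stored as dicts. The test fraction, the use of `log1p` (so that zero-valued features such as gravidity stay finite) and the set of log-scaled columns are assumptions rather than details given in the caption; feature selection and SVC fitting (steps 5–9) are omitted, and all names are hypothetical.

```python
import math
import random
from statistics import mean, mode


def split_indices(n, test_fraction, seed):
    """Step 2: random train/test split of n row indices."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def fit_imputer(rows, train_idx, numeric_cols):
    """Step 3: compute fill values from the TRAINING rows only
    (mean for numeric columns, mode for all other columns)."""
    fill = {}
    for col in rows[0]:
        observed = [rows[i][col] for i in train_idx if rows[i][col] is not None]
        fill[col] = mean(observed) if col in numeric_cols else mode(observed)
    return fill


def transform(rows, idx, fill, log_cols):
    """Apply the training-set fill values, then log-scale selected columns
    (log1p is an assumption, chosen so zeros map to 0 instead of -inf)."""
    out = []
    for i in idx:
        row = {c: (v if v is not None else fill[c]) for c, v in rows[i].items()}
        for c in log_cols:
            row[c] = math.log1p(row[c])
        out.append(row)
    return out
```

In each of the 200 repetitions, `fit_imputer` would be called once per split and the same `fill` dict reused for both the training and test rows.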
Hyperparameters for grid search in support vector classification (SVC) model fitting.
| Parameter | Values |
|---|---|
| Kernel | RBF |
| C | 1e-1, 1, 1e1, 1e2, 1e3, 1e4 |
| Gamma | 1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4 |
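The grid above has 6 values of C and 7 values of gamma, so each grid search evaluates 6 × 7 = 42 combinations, and with inner 10-fold cross-validation that is 420 model fits before the single refit in step (8). The sketch below enumerates the grid and spells out the RBF kernel in the convention used by, for example, scikit-learn's `SVC`, where K(x, z) = exp(−gamma·‖x − z‖²); larger gamma makes the kernel more local. The `rbf_kernel` helper is illustrative, not code from the study.

```python
import itertools
import math

C_VALUES = [1e-1, 1, 1e1, 1e2, 1e3, 1e4]
GAMMA_VALUES = [1e-2, 1e-1, 1, 1e1, 1e2, 1e3, 1e4]


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


grid = list(itertools.product(C_VALUES, GAMMA_VALUES))  # all (C, gamma) pairs
n_folds = 10
fits_per_search = len(grid) * n_folds  # 42 x 10 = 420, plus one refit
```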
Figure 2. Heatmap showing the median feature rankings (range 0–38, from best to worst) from 14 different filter-based feature selection methods (200 rankings per feature per method) for classifying HIFU treatment outcome in uterine fibroids. The median ranking of each feature across all methods (14 methods × 200 repetitions = 2800 rankings per feature) is also shown at the bottom (TOPN).
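The TOPN row aggregates by taking, for each feature, the median of its rank positions over every method and repetition. A minimal sketch of that aggregation follows, assuming each ranking is a best-to-worst list of feature names; how the study broke ties between equal medians is not stated, so here ties simply keep first-seen order.

```python
from statistics import median


def aggregate_rankings(rankings):
    """rankings: list of rankings, each a list of feature names ordered from
    best to worst (one per method and repetition). Returns the features
    ordered by median rank position (0 = best) and the medians themselves."""
    positions = {}
    for ranking in rankings:
        for pos, feature in enumerate(ranking):
            positions.setdefault(feature, []).append(pos)
    medians = {f: median(p) for f, p in positions.items()}
    order = sorted(medians, key=medians.get)  # stable sort: ties keep first-seen order
    return order, medians
```

For the study's data this would be called with 2800 rankings of 39 features each.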
Figure 3. Boxplot showing the feature rankings (range 0–38, from best to worst) from 14 different filter-based feature selection methods (14 methods × 200 repetitions = 2800 rankings per feature) for classifying HIFU treatment outcome in uterine fibroids. The features are ordered by their median ranking. Boxes show the interquartile range (IQR) with the median (notch), and whiskers extend up to 1.5 × IQR from the lower and upper quartiles. Outliers are plotted as individual points beyond the ends of the whiskers.
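The whisker and outlier rule in the caption can be made concrete: each whisker ends at the most extreme ranking that lies within 1.5 × IQR of its quartile, and anything beyond is drawn as an outlier. The sketch below uses linear-interpolation quartiles via `statistics.quantiles(..., method="inclusive")`; this quartile convention is an assumption, since plotting libraries differ, and `box_stats` is a hypothetical helper.

```python
from statistics import quantiles


def box_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers for one box of a boxplot."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": med, "q3": q3,
        # Whiskers stop at the most extreme data points inside the fences.
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }
```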
Figure 4. Diagonal correlation matrix showing the pairwise Spearman's correlation coefficients between the numerical features.
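Spearman's coefficient is the Pearson correlation computed on ranks, so it captures any monotonic relationship (for instance between fibroid diameter and fibroid volume) rather than only a linear one. A minimal stdlib sketch follows, giving tied values the average of their rank positions; the helper names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```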
Figure 5. (a) Heatmap showing the mean test scores (F1-micro) of the support vector classification (SVC) model in classifying HIFU treatment outcome in uterine fibroids. Each value is the mean test score over 200 repetitions using the stated number of highest-ranking features from each feature selection method. The mean test scores using the highest-ranking features from the aggregate votes (TOPN) are shown at the bottom. (b) Line plot showing the mean and dispersion of validation and test scores versus the number of highest-ranking features from all feature selection methods. The shaded areas show the 95% confidence intervals.
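F1-micro pools true positives, false positives and false negatives over all classes before computing F1 = 2TP / (2TP + FP + FN). With single-label classes such as the three NPV-ratio classes, every misclassified fibroid counts once as a false positive and once as a false negative, so F1-micro reduces to plain accuracy. A minimal sketch, with a hypothetical `f1_micro` helper:

```python
def f1_micro(y_true, y_pred, labels):
    """Micro-averaged F1: pool TP, FP and FN over all classes, then
    F1 = 2TP / (2TP + FP + FN)."""
    tp = fp = fn = 0
    for c in labels:
        tp += sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp += sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn += sum(t == c and p != c for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn)
```

For example, `f1_micro([0, 1, 2, 1, 1], [0, 1, 1, 1, 2], [0, 1, 2])` gives 0.6, the same as the 3 correct out of 5. Assuming all three classes are included in the average, the reported maximum of 0.63 therefore corresponds to roughly 63% of test fibroids being classified correctly on average.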