Manuel Casal-Guisande, Alberto Comesaña-Campos, Inês Dutra, Jorge Cerqueiro-Pequeño, José-Benito Bouza-Rodríguez.
Abstract
Breast cancer is currently one of the most common tumoral diseases and one of the main causes of death in women. Although early diagnosis has improved in recent years thanks to the widespread use of mammograms, reliable diagnosis systems that are free from variability in their interpretation remain a challenge. To this end, this work presents the design and development of an intelligent clinical decision support system for the preventive diagnosis of breast cancer, aiming both to improve the accuracy of the evaluation and to reduce its uncertainty. Through the integration of expert systems (based on Mamdani-type fuzzy-logic inference engines) deployed in cascade, exploratory factorial analysis, data augmentation approaches, and classification algorithms such as k-nearest neighbors and bagged trees, the system is able to learn from and interpret the patient's medical data, generating an alert level associated with her risk of suffering from cancer. For the system's initial performance tests, a software implementation was built and used to diagnose a series of patients from a 130-case database provided by the School of Medicine and Public Health of the University of Wisconsin-Madison, which was also used to create the knowledge base. The results obtained, with areas under the ROC curves of 0.95-0.97 and high success rates, highlight the diagnostic and preventive potential of the developed system and, although a detailed and contrasted validation is still pending, suggest its relevance and applicability within the clinical field.
Keywords: breast cancer; clinical decision support system; data augmentation; design science research; expert systems; exploratory factorial analysis; machine learning; medical algorithm
Year: 2022 PMID: 35207657 PMCID: PMC8880667 DOI: 10.3390/jpm12020169
Source DB: PubMed Journal: J Pers Med ISSN: 2075-4426
Summary of descriptors for the dataset used.
| Descriptor | Value |
|---|---|
| Number of cases | 130 |
| Cancer cases | 21 |
| No cancer cases | 109 |
| Average age | 55.2 |
| Number of significant criteria used | 17 |
| Nature of data | Qualitative |
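The class counts in the dataset summary imply a marked imbalance between cancer and no-cancer cases, which helps explain why data augmentation is part of the pipeline. The arithmetic can be checked with a short sketch; the variable names are illustrative and only the three counts come from the summary.

```python
# Counts taken from the dataset summary; variable names are illustrative.
total_cases = 130
cancer_cases = 21
no_cancer_cases = 109

# Sanity check: the two classes add up to the reported total.
assert cancer_cases + no_cancer_cases == total_cases

prevalence = cancer_cases / total_cases            # share of positive cases
imbalance_ratio = no_cancer_cases / cancer_cases   # negatives per positive
majority_baseline = no_cancer_cases / total_cases  # accuracy of always answering "no cancer"

print(f"Cancer prevalence:       {prevalence:.1%}")         # 16.2%
print(f"Negatives per positive:  {imbalance_ratio:.2f}")    # 5.19
print(f"Majority-class baseline: {majority_baseline:.1%}")  # 83.8%
```

A classifier that always answered "no cancer" would already be right in about 84% of these cases, so accuracy alone says little here and the area under the ROC curve is the more informative figure.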
Figure 1. Diagram of the Clinical Decision Support System. This diagram shows the different flows of information across the three system stages. Stage 1 collects and groups the initial data, Stage 2 performs the data analysis and the inference of results, and finally, Stage 3 determines the decision of the system.
Data associated to the patient’s history.
| Data | Comment |
|---|---|
| Age | - |
| Personal history | It indicates whether the person has previously had any type of cancer or any cancer-related issue. |
| Family history | It assesses whether there have been any breast cancer cases among first- or second-degree relatives. |
Mammogram-related data.
| Data | Comment |
|---|---|
| Masses | If a mass is present, its shape, margins, and density are documented. |
| Calcifications | As with masses, when calcifications are present their morphology is documented (usually grouped into ‘typically benign’ and ‘suspicious morphology’ categories) and their distribution is characterized. It is also indicated whether the calcifications are primary or associated, that is, whether they are the predominant finding or, on the contrary, accompany another finding and are therefore of lesser importance. |
| Architectural distortion and asymmetries | For distortion, it is indicated whether the finding is present and whether it is primary or associated. For asymmetries, it is indicated whether any is present and, if so, its type (focal, developing, etc.). |
| BI-RADS© indicator | The BI-RADS© (Breast Imaging Reporting and Data System) system is nowadays a widely accepted and used diagnosis instrument in the evaluation of breast cancer. It was developed by the American College of Radiology (ACR) [ |
| Composition | The breast type, i.e., the tissue type, will also be taken into account. As commented in |
Figure 2. Detail of the expert systems cascade.
Figure 3. Screen capture of the application’s initial screen. Labels [1.a], [1.b], [1.c], [2], [3], and [4] mark the inputs of Expert systems #1a, #1b, #1c, #2, #3, and #4, respectively, as shown in Figure 1 and Figure 2.
Figure 4. Schematic diagram of the inference systems.
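The abstract states that the expert systems rely on Mamdani-type fuzzy inference. The following is a minimal sketch of one such engine in plain Python, intended only to illustrate the mechanics (min for AND and implication, max for aggregation, centroid defuzzification). The two inputs, the membership functions, and the three rules are all hypothetical: the system's actual variables and rule base are not given in this record.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 at a and c, 1 at the peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical input sets on a 0..10 scale (not taken from the paper).
def low_in(x):
    return tri(x, -10, 0, 10)


def high_in(x):
    return tri(x, 0, 10, 20)


# Hypothetical output sets on a 0..100 hazard scale (not taken from the paper).
OUT_SETS = {
    "low": (-50, 0, 50),
    "medium": (0, 50, 100),
    "high": (50, 100, 150),
}


def mamdani_hazard(suspicion, history, steps=1001):
    """Map two inputs in [0, 10] to a crisp hazard value in [0, 100]."""
    # Invented rule base; AND is evaluated with min.
    strength = {
        "high": min(high_in(suspicion), high_in(history)),
        "medium": min(high_in(suspicion), low_in(history)),
        "low": low_in(suspicion),
    }
    num = den = 0.0
    for i in range(steps):
        y = 100.0 * i / (steps - 1)
        # Implication clips each output set at its rule strength (min);
        # aggregation takes the pointwise max of the clipped sets.
        mu = max(min(strength[name], tri(y, *abc)) for name, abc in OUT_SETS.items())
        num += y * mu
        den += mu
    # Centroid defuzzification over the sampled output universe.
    return num / den if den else 0.0


print(mamdani_hazard(1, 1))
print(mamdani_hazard(9, 9))
```

With these invented sets, stronger inputs push the centroid toward the upper end of the scale. In a cascade, the crisp output of one engine would be passed as an input to the next.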
General data about the Machine Learning algorithms.
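| Model | Parameter | Value |
|---|---|---|
| Weighted KNN | Distance | Euclidean distance |
| Weighted KNN | Number of nearest neighbors in X used to classify each point | 10 |
| Weighted KNN | Distance weight | Squared inverse |
| Bagged Trees | Ensemble aggregation method | Bag |
| Bagged Trees | Number of ensemble learning cycles | 30 |
| Bagged Trees | Learners | Decision tree |
| Bagged Trees | Maximum number of splits | 218 |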
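The nearest-neighbor settings listed above (Euclidean distance, 10 neighbors, squared-inverse distance weights) can be illustrated with a small plain-Python sketch. It is not the authors' implementation, and the toolkit they used is not named in this record. The function name and the two-dimensional toy points are invented for the example.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, query, k=10):
    """Classify `query` by a vote of its k nearest neighbours,
    each weighted by 1 / distance**2 (squared-inverse weighting)."""
    neighbours = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = defaultdict(float)
    for distance, label in neighbours[:k]:
        if distance == 0.0:
            # An exact match would get infinite weight, so it decides alone.
            return label
        votes[label] += 1.0 / distance**2
    return max(votes, key=votes.get)


# Invented toy data: two features per case, purely for illustration.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["no cancer"] * 3 + ["cancer"] * 3

print(weighted_knn_predict(train_X, train_y, (0.95, 1.0)))   # cancer
print(weighted_knn_predict(train_X, train_y, (0.05, 0.05)))  # no cancer
```

Squared-inverse weighting lets the closest neighbours dominate the vote, which limits the influence of the majority class when k is large relative to the number of positive cases.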
Figure 5. ROC curves for the best models. (a) Weighted KNN model; (b) Bagged Trees model.
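The area under a ROC curve can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it from that pairwise definition. The labels and scores are invented toy values, not results from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive (cancer), 0 for negative.
    scores: higher means more suspicious. Ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Invented toy scores, for illustration only.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(toy_labels, toy_scores)
print(f"{auc:.4f}")  # 0.9167 (11 of 12 positive/negative pairs are ordered correctly)
```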
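Data of the patient under study, as entered into the application.
| Group | Field | Value |
|---|---|---|
| Mass | Present/Absent | Absent |
| Mass | Shape | (None) |
| Mass | Margins | (None) |
| Mass | Density | (None) |
| Calcifications | Present/Absent | Present |
| Calcifications | Primary/Associated | Associated |
| Calcifications | Shape | Coarse heterogeneous |
| Calcifications | Distribution | Segmental |
| Asymmetry | Present/Absent | Present |
| Asymmetry | Type | Focal |
| Architectural distortion | Present/Absent | Absent |
| Architectural distortion | Primary/Associated | (None) |
| BI-RADS category | - | 4A |
| Breast density | - | Scattered |
| Other data | Age | 53 |
| Other data | Patient history | No |
| Other data | Family history | Minor |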
Diagnosis returned by the system for the 11 developed cases.
| Case | Mass: Shape | Mass: Margin | Mass: Density | Calcifications: Type | Calcifications: Shape | Calcifications: Distribution | Asymmetry | Architectural Distortion | BI-RADS Category | Breast Density | Age | Patient History | Family History | Hazard Index | System Advice | Real Chart |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | - | - | - | Associated | Coarse heterogeneous | Segmental | Focal | - | 4A | Scattered | 53 | No | Minor | 66.67 | Cancer | Cancer |
| 2 | Oval | Circumscribed | High | - | - | - | - | - | 4B | Heterogeneously | 50 | N/A | N/A | 63.33 | Uncertain | No cancer |
| 3 | Irregular | Indistinct | High | - | - | - | - | - | 4B | Heterogeneously | 42 | No | None | 56.67 | No cancer | No cancer |
| 4 | Irregular | Spiculated | Equal | Associated | Coarse heterogeneous | Grouped | - | - | 4B | Scattered | 65 | No | None | 53.33 | No cancer | No cancer |
| 5 | - | - | - | Associated | Amorphous | Grouped | Focal | - | 4B | Heterogeneously | 31 | No | None | 56.67 | No cancer | No cancer |
| 6 | - | - | - | Primary | Amorphous | Grouped | - | - | 4B | Heterogeneously | 49 | Yes | N/A | 53.33 | No cancer | No cancer |
| 7 | - | - | - | - | - | - | - | Primary | 4C | Scattered | 62 | Yes | Major | 60 | Uncertain | No cancer |
| 8 | - | - | - | - | - | - | - | Primary | 4B | Scattered | 58 | No | None | 56.67 | No cancer | No cancer |
| 9 | - | - | - | - | - | - | Developing | - | 4B | Scattered | 64 | No | None | 50 | No cancer | No cancer |
| 10 | - | - | - | Primary | Fine pleomorphic | Grouped | - | - | 4B | Scattered | 58 | No | Major | 30 | No cancer | No cancer |
| 11 | Irregular | Indistinct | High | - | - | - | - | - | 4B | Heterogeneously | 53 | No | Major | 87.22 | Cancer | Cancer |
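The agreement between the system's advice and the real chart in the table above can be tallied with a few lines of Python. The pairs below are transcribed from the last two columns for cases 1 to 11. Grouping them into "agree", "uncertain", and "contradict" is an illustrative choice, not a metric defined in this record.

```python
from collections import Counter

# (system advice, real chart) for cases 1..11, transcribed from the table.
cases = [
    ("Cancer", "Cancer"),        # case 1
    ("Uncertain", "No cancer"),  # case 2
    ("No cancer", "No cancer"),  # case 3
    ("No cancer", "No cancer"),  # case 4
    ("No cancer", "No cancer"),  # case 5
    ("No cancer", "No cancer"),  # case 6
    ("Uncertain", "No cancer"),  # case 7
    ("No cancer", "No cancer"),  # case 8
    ("No cancer", "No cancer"),  # case 9
    ("No cancer", "No cancer"),  # case 10
    ("Cancer", "Cancer"),        # case 11
]

tally = Counter(cases)
agree = sum(n for (advice, real), n in tally.items() if advice == real)
uncertain = sum(n for (advice, _), n in tally.items() if advice == "Uncertain")
contradict = len(cases) - agree - uncertain

print(f"agree={agree}, uncertain={uncertain}, contradict={contradict}")
# agree=9, uncertain=2, contradict=0
```

In these 11 cases the system never contradicts the chart: both cancer cases are flagged, and the two "Uncertain" outputs correspond to patients without cancer.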