Boris Campillo-Gimenez, Wassim Jouini, Sahar Bayat, Marc Cuggia.
Abstract
INTRODUCTION: Case-based reasoning (CBR) is an emerging decision-making paradigm in medical research in which new cases are solved by relying on previously solved similar cases. Usually, a database of solved cases is provided, and every case is described through a set of attributes (inputs) and a label (output). Extracting useful information from this database can help the CBR system provide more reliable results on cases that are yet to be solved.
Year: 2013 PMID: 24039730 PMCID: PMC3767727 DOI: 10.1371/journal.pone.0071991
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Experimental Protocol.
During the learning phase, a training set is used to estimate the parameters of a logistic regression model. These parameters are used to compute the attribute weights as well as the patient weights. A setting set is then used to select an optimal K value for the K-NN algorithm. Finally, all these estimates are used to evaluate five decision-making algorithms, referred to by the indexes (i) to (v).
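The protocol above can be illustrated with a minimal sketch of an attribute-weighted K-NN. It is not the authors' implementation. The record does not say how weights are derived from the logistic regression parameters, so the sketch assumes that each attribute weight is the absolute coefficient normalised so that all weights sum to 1, and that K is chosen by accuracy on the setting set. All function names and the toy data are hypothetical, and patient weights are left out.

```python
import math

def attribute_weights(coefs):
    """Assumed scheme: absolute coefficients normalised to sum to 1."""
    total = sum(abs(c) for c in coefs)
    return [abs(c) / total for c in coefs]

def weighted_distance(a, b, weights):
    """Euclidean distance in which each attribute is scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def knn_score(query, cases, labels, weights, k):
    """Fraction of positive labels among the k nearest solved cases."""
    ranked = sorted(range(len(cases)),
                    key=lambda i: weighted_distance(query, cases[i], weights))
    return sum(labels[i] for i in ranked[:k]) / k

def choose_k(candidates, train_x, train_y, tune_x, tune_y, weights):
    """Pick the candidate K with the highest accuracy on the setting set
    (accuracy is an assumption; the source does not name the criterion)."""
    def accuracy(k):
        hits = sum(
            (knn_score(x, train_x, train_y, weights, k) >= 0.5) == bool(y)
            for x, y in zip(tune_x, tune_y)
        )
        return hits / len(tune_y)
    return max(candidates, key=accuracy)

# Toy data (hypothetical): two attributes, binary label.
train_x = [[0.0, 0.0], [0.1, 0.9], [1.0, 0.1], [0.9, 1.0]]
train_y = [0, 0, 1, 1]
tune_x = [[0.05, 0.5], [0.95, 0.5]]
tune_y = [0, 1]

# Hypothetical coefficients: only the first attribute carries signal.
weights = attribute_weights([3.0, 0.0])
best_k = choose_k([1, 3], train_x, train_y, tune_x, tune_y, weights)
score = knn_score([0.95, 0.5], train_x, train_y, weights, k=2)
```

With these weights the second attribute is ignored when neighbours are ranked, which is the intended effect of attribute weighting: attributes that the regression finds uninformative contribute little to the distance.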
List of the attributes and weights used by the K-Nearest Neighbours algorithms before and after adding the 50 random attributes, and before and after stepwise selection of the case description attributes.
| Category | Attribute | Before adding 50 random factors: all attributes | Before adding 50 random factors: after stepwise selection | After adding 50 random factors: all attributes | After adding 50 random factors: after stepwise selection |
|---|---|---|---|---|---|
| Social and demographic factors | Sex | 0.0% | – | 0.2% | – |
| | Age | 65.4% | 68.8% | 12.2% | 23.9% |
| | Current occupation | 2.5% | 2.7% | 1.3% | 1.3% |
| Clinical and biological factors | Diabetes (type 1 or 2) | 1.0% | – | 2.7% | 2.5% |
| | Hypertension | 5.2% | 5.1% | 5.2% | 4.8% |
| | Chronic respiratory failure | 0.4% | – | 2.4% | 1.9% |
| | Chronic heart failure | 2.0% | – | 1.3% | 2.2% |
| | Ischemic heart disease | 5.7% | 7.3% | 2.0% | 1.3% |
| | Heart conduction disorder (or arrhythmia) | 0.2% | – | 0.8% | 1.2% |
| | Past history of malignancy | 6.1% | 4.5% | 3.1% | 4.3% |
| | Positive serology (HCV, HBV, HIV) | 1.3% | – | 1.4% | – |
| | Liver cirrhosis | 0.9% | – | 1.0% | 1.9% |
| | Disability | 2.7% | 3.0% | 1.5% | 1.5% |
| | Hemoglobin (< or ≥ 11 g/dl) | 0.0% | – | 0.0% | – |
| Factors related to medical care | Ownership of nephrology facilities (private or public) | 3.4% | 5.9% | 0.1% | – |
| | Institution performing transplantation | 3.1% | 2.8% | 0.1% | – |
| | Hemodialysis or peritoneal dialysis | 0.0% | – | 1.4% | 2.6% |
| | Urgent or planned dialysis session | 0.0% | – | 0.1% | – |
| | Urgent or planned first catheterization | 0.0% | – | 0.2% | 1.8% |
at the first renal replacement therapy;
HCV: Hepatitis C Virus, HBV: Hepatitis B Virus, HIV: Human Immunodeficiency Virus.
Figure 2. Performances of the different classification algorithms.
Predictions were performed by a logistic regression, a K-NN algorithm (standalone CBR), and three combinations of the K-NN algorithm with the logistic regression: CBR+ (a K-NN with weighted attributes), CBR+ (a K-NN with weighted patients), and CBR+ (a K-NN with both attributes and patients weighted). Performances are presented as bootstrap estimates of the area under the ROC curve with 95% confidence intervals. Panels (A) and (B) show predictions before adding the 50 random variables, using either all available attributes of the case database (A) or only the attributes selected by an automatic stepwise selection procedure (B). Panels (C) and (D) show predictions after adding the 50 random variables, using either all available attributes of the case database (C) or only the variables selected by an automatic stepwise selection procedure (D).
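As an illustration of how a bootstrap estimate of the area under the ROC curve with a 95% confidence interval can be obtained, here is a minimal sketch. It assumes a simple percentile bootstrap over the test cases. The record does not specify the exact resampling scheme used in the study, and the function names, number of resamples, and toy labels and scores are hypothetical.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the share of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc(labels, scores, n_boot=1000, seed=0):
    """Percentile bootstrap (assumed scheme): resample cases with replacement,
    recompute the AUC, and read the 2.5th and 97.5th percentiles."""
    rng = random.Random(seed)
    n = len(labels)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes to define an AUC
            estimates.append(auc(ys, [scores[i] for i in idx]))
    estimates.sort()
    mean = sum(estimates) / n_boot
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return mean, lower, upper

# Toy predictions (hypothetical): 1 = positive class, scores in [0, 1].
labels = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9, 0.6, 0.3]
mean_auc, ci_low, ci_high = bootstrap_auc(labels, scores)
```

Running the same procedure for each algorithm and each attribute configuration would give one point estimate and one interval per bar or marker in a figure of this kind.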