Michela Carlotta Massi, Francesca Ieva, Emanuele Lettieri.
Abstract
BACKGROUND: The healthcare sector is an attractive target for fraudsters. The availability of large amounts of data makes it possible to tackle this issue with data mining techniques, making the auditing process more efficient and effective. This research aims to develop a novel data mining model for detecting fraud among hospitals using Hospital Discharge Charts (HDC) in Administrative Databases. In particular, it focuses on the DRG upcoding practice, i.e., the tendency to register codes for provided services and inpatients' health status so that the hospitalization falls within a more remunerative DRG class.
Keywords: Administrative database; DRG; Data mining; Fraud; Upcoding
Year: 2020 PMID: 32664923 PMCID: PMC7362640 DOI: 10.1186/s12911-020-01143-9
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Process Flow. Schematic representation of the algorithm’s process flow. The first three grey boxes on the left represent the data sources. From the HDC Dataset we derived the two additional datasets (Patients and Hospitals). Only the Hospital Dataset enters Step 1, while all three are retrieved for Step 2
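The derivation of the Patients and Hospitals datasets from the HDC records can be pictured as a grouping step. The following is only a minimal sketch under stated assumptions: the field names (`hdc_id`, `patient_id`, `hospital_id`, `cost`, `los`), the sample records, and the chosen summaries are hypothetical and are not the schema or the aggregation used by the authors.

```python
from collections import defaultdict

# Hypothetical HDC records; field names and values are illustrative only.
hdc_records = [
    {"hdc_id": 1, "patient_id": "P1", "hospital_id": "H1", "cost": 1200.0, "los": 3},
    {"hdc_id": 2, "patient_id": "P1", "hospital_id": "H1", "cost": 800.0, "los": 2},
    {"hdc_id": 3, "patient_id": "P2", "hospital_id": "H2", "cost": 5000.0, "los": 10},
]


def aggregate(records, key):
    """Group HDC records by `key` and compute simple per-group summaries."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    summary = {}
    for group_id, recs in groups.items():
        total_cost = sum(r["cost"] for r in recs)
        summary[group_id] = {
            "n_hdc": len(recs),                  # number of discharge charts
            "total_cost": total_cost,            # summed cost over the group
            "avg_cost": total_cost / len(recs),  # mean cost per chart
            "total_los": sum(r["los"] for r in recs),
        }
    return summary


# One derived dataset per aggregation level (assumed keys).
patients = aggregate(hdc_records, "patient_id")
hospitals = aggregate(hdc_records, "hospital_id")
```

In this sketch only `hospitals` would feed a hospital-level step, while `hdc_records`, `patients`, and `hospitals` would all be available to a later validation step.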
Pseudo-code for the HDCs extraction process
Pseudo-code for the first part of Step 1 Algorithm
Pseudo-code for the second part of Step 1 Algorithm
Variables of interest for cross validation
| Variable | Dataset | Meaning |
|---|---|---|
| Age | Patients | Indicator of patients’ complexity |
| Length of Stay | Patients, HDC | Indicator of patients’ complexity |
| Comorbidity | Patients, HDC | Indicator of patients’ complexity |
| Total Costs | Patients, HDC | Patients’ expensiveness |
| Cost / Length of Stay | HDC | Expenses in relation to intensity of care |
| Cost / Comorbidity | Hospital | Expenses in relation to intensity of care |
Fig. 2 Distribution of Local Distances and Threshold. Distribution of the local distances of hospitals from the center of their cluster. The 95th percentile threshold is highlighted by the vertical dashed line
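The thresholding idea in this caption can be sketched as follows: compute each hospital's distance from the center of its own cluster, then flag hospitals whose distance exceeds the 95th percentile. This is a hedged illustration only. The Euclidean distance, the mean-based cluster center, the linear-interpolation percentile, and all function names and sample values are assumptions; the caption does not specify the clustering method or the distance metric.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def euclidean(a, b):
    """Euclidean distance between two feature tuples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_distances(features, labels):
    """Return {hospital: distance from the center of its own cluster}."""
    clusters = {}
    for hospital, label in labels.items():
        clusters.setdefault(label, []).append(features[hospital])
    centers = {label: centroid(pts) for label, pts in clusters.items()}
    return {h: euclidean(features[h], centers[labels[h]]) for h in features}


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def flag_outliers(features, labels, q=95):
    """Return the distance threshold and the hospitals above it."""
    dist = local_distances(features, labels)
    threshold = percentile(dist.values(), q)
    return threshold, sorted(h for h, d in dist.items() if d > threshold)


# Hypothetical hospital features and cluster assignments.
demo_features = {"HA": (0.0, 0.0), "HB": (1.0, 0.0), "HC": (0.0, 1.0), "HD": (9.0, 9.0)}
demo_labels = {"HA": 0, "HB": 0, "HC": 0, "HD": 0}
demo_threshold, demo_outliers = flag_outliers(demo_features, demo_labels)
```

Because the threshold is a percentile of the observed distances, roughly 5% of hospitals are flagged by construction; the flagged set is a shortlist for further inspection rather than evidence of fraud in itself.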
Fig. 3 Outliers’ Analysis w.r.t. relevant Features. Distribution of the five Hospital Variables mentioned in the “Model Proposal” section for the whole population of hospitals. The vertical lines mark the positions of the three outliers (H11, H31, H51) whose behavior is analyzed in the “Results” section
Percentile of each variable’s distribution into which each analyzed outlier falls
| Variable | H11 | H31 | H51 |
|---|---|---|---|
| Avg Cost | 0.994 | 0.937 | 0.734 |
| Percent CC | 0.399 | 0.899 | 0.981 |
| Specialization | 0.994 | 0.816 | 0 |
| Number of Visits (patients) | 0.532 | 0.994 | 0.032 |
| Upcoding Index | 0.513 | 0.506 | 0.890 |
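The values in this table are positions within each variable's distribution, expressed as fractions between 0 and 1. A minimal sketch of one way to obtain such a position is shown below. The definition used here (fraction of hospitals with a strictly smaller value) is an assumption, chosen because it yields exactly 0 for the hospital with the lowest value; the source does not state which definition was applied, and the sample values are hypothetical.

```python
def percentile_rank(values, x):
    """Fraction of `values` strictly below `x` (assumed definition)."""
    return sum(v < x for v in values) / len(values)


# Hypothetical average costs for a small population of hospitals.
avg_costs = [2100.0, 2500.0, 2650.0, 3000.0, 4800.0]

# Position of the most expensive hospital within the distribution.
rank_of_max = percentile_rank(avg_costs, 4800.0)
```

Under this definition the highest-valued hospital among n hospitals scores (n - 1) / n rather than 1, and the lowest-valued hospital scores 0.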
Fig. 4 Outliers and Length of Stay. Distribution of the logarithm of Length of Stay for each outlier’s patients compared with the whole patient population
Fig. 5 Outliers and Cost/LOS. Distribution of the logarithm of Cost/Length of Stay for each hospital’s HDCs compared with all HDCs in the dataset
Fig. 6 Outliers and Cost/Comorbidity. Distribution of the logarithm of Cost/Comorbidity Score for each hospital’s HDCs compared with all HDCs in the dataset
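The log-ratio quantities named in these captions can be computed per HDC as sketched below. This is an illustration under assumptions: the record fields, the sample values, and the choice to return `None` when the ratio is undefined (for example, a comorbidity score of zero) are hypothetical, since the captions do not say how such cases were handled.

```python
import math


def log_ratio(cost, denominator):
    """Return log(cost / denominator), or None when the ratio is undefined."""
    if cost <= 0 or denominator <= 0:
        return None
    return math.log(cost / denominator)


# Hypothetical HDC records; field names and values are illustrative only.
hdcs = [
    {"hospital_id": "H1", "cost": 1200.0, "los": 3, "comorbidity": 2},
    {"hospital_id": "H1", "cost": 800.0, "los": 2, "comorbidity": 0},
    {"hospital_id": "H2", "cost": 5000.0, "los": 10, "comorbidity": 4},
]

# Keep only the defined values for each quantity.
log_cost_per_los = [
    v for v in (log_ratio(r["cost"], r["los"]) for r in hdcs) if v is not None
]
log_cost_per_comorbidity = [
    v for v in (log_ratio(r["cost"], r["comorbidity"]) for r in hdcs) if v is not None
]
```

Comparing the values for one hospital's HDCs against the values for all HDCs then gives the kind of per-outlier versus population comparison the figures describe.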