Conghai Zhang¹, Xinyao Xiao², Chao Wu¹.
Abstract
It is estimated that approximately 10% of healthcare system expenditures are wasted due to medical fraud and abuse. In medicine, the combination of thousands of drugs and diseases makes the supervision of health care more difficult. To quantify the disease-drug relationship as a relationship score, and to perform anomaly detection based on this score and other features, we proposed a neural network with fully connected layers and sparse convolution. We introduced a focal-loss function to handle the data imbalance and a relative probability score to measure the model's performance. As our model performs much better than previous ones, it can considerably reduce analysts' workload.
Keywords: anomaly detection; healthcare fraud; medical abuse
Year: 2020; PMID: 33027884; PMCID: PMC7579458; DOI: 10.3390/ijerph17197265
Source DB: PubMed; Journal: Int J Environ Res Public Health; ISSN: 1660-4601; Impact factor: 3.390
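The abstract names a focal-loss function for the data imbalance but this record gives no settings for it. A minimal sketch, assuming the standard binary focal loss of Lin et al. (2017), FL(p_t) = -α_t (1 - p_t)^γ · log(p_t), with the commonly used defaults α = 0.25 and γ = 2; these defaults and the function name `focal_loss` are assumptions, not values taken from this paper.

```python
import math
from typing import Sequence


def focal_loss(probs: Sequence[float], labels: Sequence[int],
               alpha: float = 0.25, gamma: float = 2.0,
               eps: float = 1e-7) -> float:
    """Mean binary focal loss -alpha_t * (1 - p_t)**gamma * log(p_t).

    Hypothetical helper; alpha/gamma defaults follow Lin et al. (2017),
    not the paper summarized here.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)             # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p              # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

For an easy negative (p = 0.1, label 0) the loss is about 0.0008, while a hard positive (p = 0.1, label 1) costs about 0.466, so the (1 - p_t)^γ factor shifts the training signal toward rare, misclassified records. With γ = 0 and α = 0.5 it reduces to half the ordinary cross-entropy.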
Main data structure in the digital healthcare system.
| Personal Attributes | Payment Detail | Settlement Detail | Hospitalized Detail |
|---|---|---|---|
| Person ID | Detail-ID | Settlement-ID | Hospitalized-ID |
| Sex | Person ID | Person ID | Person ID |
| Age | Insurance pay | Total cost | Type |
| Medical insurance usage | Drug name | Type of insurance | Department code |
| Outpatient amount | Number of medications | Medical insurance costs | Hospitalized days |
| Hospitalized amount | … | … | … |
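All four tables carry a Person ID, so per-person features can be assembled by grouping each table's rows on that key. A minimal sketch with hypothetical record types: the class and field names are illustrative renderings of the table columns, not the system's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class PaymentDetail:
    # Hypothetical rendering of the "Payment Detail" columns.
    detail_id: str
    person_id: str
    drug_name: str
    insurance_pay: float
    num_medications: int


@dataclass
class Settlement:
    # Hypothetical rendering of the "Settlement Detail" columns.
    settlement_id: str
    person_id: str
    total_cost: float
    insurance_type: str
    insurance_cost: float


@dataclass
class PersonRecords:
    payments: List[PaymentDetail] = field(default_factory=list)
    settlements: List[Settlement] = field(default_factory=list)


def group_by_person(payments: Iterable[PaymentDetail],
                    settlements: Iterable[Settlement]) -> Dict[str, PersonRecords]:
    """Collect payment and settlement rows under their shared Person ID."""
    grouped: Dict[str, PersonRecords] = defaultdict(PersonRecords)
    for p in payments:
        grouped[p.person_id].payments.append(p)
    for s in settlements:
        grouped[s.person_id].settlements.append(s)
    return dict(grouped)
```

Each `PersonRecords` value can then be reduced to the per-person fee and frequency features listed in the main-features table.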
Figure 1. Record count distribution of the top 10 diseases.
Figure 2. Record count distribution across diseases.
Main features.
| Dimension | Attributes | Description |
|---|---|---|
| Fee | Total amount | Total amount during the last year |
| | Total health-care pay | Total amount paid by the health-care fund during the last year |
| | Total self-pay | Total self-paid amount during the last year |
| | Total amount of medicine | Total cost of medicine during the last year |
| | Average amount | Average amount during the last year |
| | Average self-pay | Average self-pay during the last year |
| | Average medicine fee | Average medicine fee during the last year |
| | Maximum self-pay | Maximum self-pay during the last year |
| | Total-amount percentile among all patients | Percentile rank of the patient's total amount among all patients |
| Frequency/Hospital | Total visit count | Total number of hospital visits during the last year |
| | Average gap | Average gap between hospital visits |
| | Hospital count | Total number of hospitals visited during the last year |
| Personal information | Age | |
| | Health-care type | |
| | Gender | |
| | Total balance | |
| | … | Other description |
| Treatment detail | Primary disease | |
| | Secondary disease | |
| Prescription | Prescription drugs list | |
| | Maximum amount of a single drug | Maximum amount of a single drug in a prescription |
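Several of the fee and frequency features above are simple aggregates over one patient's visits. A minimal sketch, assuming a hypothetical `Visit` record and assuming that self-pay equals the total amount minus the fund-paid share; the record does not state how the paper computes these, and all names here are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Dict, List, Sequence


@dataclass
class Visit:
    # Hypothetical per-visit record assembled from the settlement tables.
    day: date
    hospital_id: str
    total_amount: float
    fund_paid: float  # portion paid by the health-care fund

    @property
    def self_pay(self) -> float:
        # Assumption: self-pay is whatever the fund did not cover.
        return self.total_amount - self.fund_paid


def person_features(visits: List[Visit]) -> Dict[str, float]:
    """Fee and frequency features for one patient (requires at least one visit)."""
    visits = sorted(visits, key=lambda v: v.day)
    gaps = [(b.day - a.day).days for a, b in zip(visits, visits[1:])]
    return {
        "total_amount": sum(v.total_amount for v in visits),
        "total_fund_pay": sum(v.fund_paid for v in visits),
        "total_self_pay": sum(v.self_pay for v in visits),
        "average_amount": mean(v.total_amount for v in visits),
        "average_self_pay": mean(v.self_pay for v in visits),
        "max_self_pay": max(v.self_pay for v in visits),
        "total_visits": len(visits),
        # Single-visit patients have no gap; 0.0 is an arbitrary placeholder.
        "average_gap_days": mean(gaps) if gaps else 0.0,
        "hospital_count": len({v.hospital_id for v in visits}),
    }


def percentile_rank(value: float, population: Sequence[float]) -> float:
    """Share of patients whose total amount is <= value, in [0, 1]."""
    return sum(1 for x in population if x <= value) / len(population)
```

Visits are sorted by date first so that the average gap is computed between consecutive visits regardless of input order.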
Results of the multilabel algorithms.
| Algorithm | One-Error (lower is better) | Coverage (lower is better) |
|---|---|---|
| ML-DT | 0.670 | 25.801 |
| Rank SVM | 0.733 | 36.962 |
| NN | 0.427 | 13.431 |
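One-Error and Coverage are standard multilabel ranking metrics. The sketch below assumes their usual definitions: One-Error is the fraction of samples whose top-ranked label is not a true label, and Coverage is the average rank of the worst-ranked true label minus one. The paper may use a variant; for example, scikit-learn's `coverage_error` omits the "minus one". Function names are hypothetical.

```python
from typing import List, Sequence, Set


def one_error(scores: Sequence[Sequence[float]],
              relevant: Sequence[Set[int]]) -> float:
    """Fraction of samples whose top-scored label is not a relevant label."""
    misses = 0
    for s, rel in zip(scores, relevant):
        top = max(range(len(s)), key=lambda j: s[j])  # first index wins ties
        if top not in rel:
            misses += 1
    return misses / len(scores)


def coverage(scores: Sequence[Sequence[float]],
             relevant: Sequence[Set[int]]) -> float:
    """Average number of steps down the ranked label list needed to reach every
    relevant label: rank of the worst-ranked relevant label minus one (ranks
    start at 1). Each sample must have at least one relevant label."""
    total = 0
    for s, rel in zip(scores, relevant):
        order: List[int] = sorted(range(len(s)), key=lambda j: s[j], reverse=True)
        rank = {label: pos + 1 for pos, label in enumerate(order)}
        total += max(rank[j] for j in rel) - 1
    return total / len(scores)
```

Both metrics are computed purely from the ranking of label scores, so any model that outputs a score per disease or drug label (decision tree, Rank SVM, or neural network) can be compared on the same footing.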
Figure 3. System diagram.
Figure 4. (a) t-SNE result for K-means; (b) result for isolation forest.
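This record does not say how K-means was turned into an anomaly score. One common choice, assumed here, is the distance from each record to its nearest centroid. The stdlib-only sketch uses Lloyd's algorithm with a deterministic first-k initialization for reproducibility (k-means++ is the more usual choice in practice); the function names are hypothetical.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def kmeans(points: Sequence[Point], k: int, iters: int = 50) -> List[List[float]]:
    """Plain Lloyd's k-means; the initial centroids are the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Empty clusters keep their previous centroid.
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids converged
            break
        centroids = new_centroids
    return centroids


def kmeans_anomaly_scores(points: Sequence[Point], k: int) -> List[float]:
    """Anomaly score = Euclidean distance to the nearest centroid."""
    centroids = kmeans(points, k)
    return [min(math.dist(p, c) for c in centroids) for p in points]
```

On two tight groups plus one stray point, the stray point receives the highest score. A known weakness of this scoring is that an outlier far enough from everything else, or a large k, can capture a centroid of its own and then score close to zero.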
Detection rate of different algorithms.
| Algorithm | Detection Rate (DR) |
|---|---|
| Traditional rule-based sorting | 24.0% |
| K-means | 35.0% |
| DBSCAN | 33.0% |
| Isolation Forest | 47.0% |
| Local Outlier Factor | 45.0% |
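The detection rate is not defined in this record; the later table reports DR@10%, which suggests the share of known abnormal records that fall within the top 10% of all records when sorted by anomaly score. A minimal sketch under that assumption, together with each record's percentile position in the ranking, which is similar in spirit to the ranks of known abnormal records in Figure 5. Function names are hypothetical.

```python
from typing import List, Sequence


def rank_percentiles(scores: Sequence[float]) -> List[float]:
    """Position of each record in descending-score order as a fraction of n
    (0.0 = highest anomaly score)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    pct = [0.0] * len(scores)
    for position, i in enumerate(order):
        pct[i] = position / len(scores)
    return pct


def detection_rate(scores: Sequence[float], is_abnormal: Sequence[bool],
                   top_fraction: float = 0.10) -> float:
    """Share of known abnormal records whose score is in the top_fraction of all records."""
    n_top = round(top_fraction * len(scores))  # nearest whole number of records
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(order[:n_top])
    known = [i for i, a in enumerate(is_abnormal) if a]
    return sum(1 for i in known if i in flagged) / len(known)
```

Read this way, a DR@10% of 45.2% would mean that roughly 45 of every 100 known abnormal records rank in the top 10% of scores. Rounding the cutoff count avoids floating-point artifacts such as 0.3 × 10 evaluating to slightly more than 3.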
Figure 5. Ranks of known abnormal records based on (a) our model and (b) the traditional rules.
Detection rate of different algorithms on the SMOTE 10k dataset.
| Algorithm | Detection Rate (DR) @10% |
|---|---|
| K-means | 38.1% |
| Isolation Forest | 45.2% |
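SMOTE (Synthetic Minority Over-sampling Technique, Chawla et al., 2002) enlarges the minority class by interpolating between a minority sample and one of its k nearest minority-class neighbours. This record does not describe how the 10k dataset was built, so the sketch only illustrates the interpolation step. It picks base samples at random rather than cycling through every minority sample as the original algorithm does, and its name and parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_synthetic points on segments between minority samples
    and their k nearest minority neighbours (simplified SMOTE)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # k nearest minority neighbours of the base sample, excluding itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != base_idx),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (x - b) for b, x in zip(base, nb)])
    return synthetic
```

Every synthetic point lies on a line segment between two real minority records, so the new samples stay inside the region the minority class already occupies. For production use, the imbalanced-learn package provides `imblearn.over_sampling.SMOTE`.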
Abnormal score vs. actual abnormal ratio.
| Abnormal score band | Actual abnormal / samples |
|---|---|
| Top 0.01% | 24/500 |
| Top 20.01%–Top 20.02% | 2/500 |
| Top 40.01%–Top 40.02% | 3/500 |
| Top 60.01%–Top 60.02% | 2/500 |
| Top 80.01%–Top 80.02% | 1/500 |
| Random 0.01% (excluding the top 1%) | 1/500, 2/500, 2/500 |
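The last table samples 500 records from narrow bands of the score ranking and counts the abnormal ones. In the top 0.01% band, 24/500 = 4.8% of the sampled records are abnormal, against 1/500 to 3/500 (0.2% to 0.6%) in the other bands, i.e. 8 to 24 times higher. The sketch below reproduces that kind of band sampling under stated assumptions: bands are taken on the descending-score order, sampling is uniform without replacement, and the function name is hypothetical.

```python
import random
from typing import Sequence, Tuple


def band_abnormal_ratio(scores: Sequence[float], is_abnormal: Sequence[bool],
                        start_pct: float, end_pct: float,
                        sample_size: int = 500, seed: int = 0) -> Tuple[int, int]:
    """Sample up to sample_size records ranked between start_pct% and end_pct%
    (descending score) and return (abnormal_count, sampled_count)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    band = order[round(n * start_pct / 100):round(n * end_pct / 100)]
    rng = random.Random(seed)
    # Use the whole band when it is smaller than the requested sample.
    sample = band if len(band) <= sample_size else rng.sample(band, sample_size)
    return sum(1 for i in sample if is_abnormal[i]), len(sample)
```

Dividing the returned count by the sample size gives the per-band abnormal ratio; a well-calibrated score should show that ratio falling as the band moves down the ranking, as in the table above.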