The effects of varying class distribution on learner behavior for medicare fraud detection with imbalanced big data.

Richard A Bauder, Taghi M Khoshgoftaar.

Abstract

Healthcare in the United States is a critical aspect of most people's lives, particularly for the aging demographic. This rising elderly population continues to demand more cost-effective healthcare programs. Medicare is a vital program serving the needs of the elderly in the United States. The growing number of Medicare beneficiaries, along with the enormous volume of money in the healthcare industry, increases the appeal for, and risk of, fraud. In this paper, we focus on the detection of Medicare Part B provider fraud, which involves fraudulent activities, such as patient abuse or neglect and billing for services not rendered, perpetrated by providers and other entities who have been excluded from participating in Federal healthcare programs. We discuss Part B data processing and describe a unique process for mapping fraud labels to known fraudulent providers. The labeled big dataset is highly imbalanced, with a very limited number of fraud instances. To combat this class imbalance, we generate seven class distributions and assess the behavior and fraud detection performance of six different machine learning methods. Our results show that RF100 using a 90:10 class distribution is the best learner, with an AUC of 0.87302. Moreover, learner behavior with the 50:50 balanced class distribution is similar to that with more imbalanced distributions, which keep more of the original data. Based on the performance and significance testing results, we posit that retaining more of the majority class information leads to better Medicare Part B fraud detection performance than the balanced datasets across the majority of learners.
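As a rough illustration of how class distributions such as 90:10 and 50:50 can be produced by random undersampling (the technique named in the keywords), the sketch below keeps every fraud instance and randomly discards non-fraud instances until the requested majority share is reached. It is not the authors' pipeline: the function name `undersample`, the 0/1 label encoding, the seed and the toy label counts are all assumptions made for the example.

```python
import random


def undersample(labels, majority_share, seed=0):
    """Return sorted row indices after random undersampling.

    Hypothetical helper: keeps every minority row (label 1, e.g. fraud)
    and a random subset of majority rows (label 0) so that the majority
    class makes up about `majority_share` of the returned rows.
    """
    if not 0 < majority_share < 1:
        raise ValueError("majority_share must be strictly between 0 and 1")
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Majority rows needed for the requested ratio:
    # n_majority = n_minority * share / (1 - share)
    target = round(len(minority) * majority_share / (1 - majority_share))
    target = min(target, len(majority))  # cannot keep more than exist
    rng = random.Random(seed)  # fixed seed for repeatable samples
    kept = rng.sample(majority, target)
    return sorted(minority + kept)


# Toy labels (made up for illustration): 20 fraud rows among 2000.
labels = [1] * 20 + [0] * 1980

for share in (0.90, 0.50):
    idx = undersample(labels, share)
    n_fraud = sum(labels[i] for i in idx)
    print(f"majority share {share:.2f}: {len(idx)} rows kept, {n_fraud} fraud")
```

The 90:10 sample retains far more majority-class rows than the 50:50 sample, which is the trade-off the abstract refers to when it speaks of retaining more of the majority class information.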

Keywords:  Big data; Class imbalance; Medicare fraud; Random undersampling

Year:  2018        PMID: 30186595      PMCID: PMC6120851          DOI: 10.1007/s13755-018-0051-3

Source DB:  PubMed          Journal:  Health Inf Sci Syst        ISSN: 2047-2501


