
Adaptations of data mining methodologies: a systematic literature review.

Veronika Plotnikova, Marlon Dumas, Fredrik Milani.

Abstract

The use of end-to-end data mining methodologies such as CRISP-DM, the KDD process, and SEMMA has grown substantially over the past decade. However, little is known about how these methodologies are used in practice. In particular, the question of whether data mining methodologies are used 'as-is' or adapted for specific purposes has not been thoroughly investigated. This article addresses this gap via a systematic literature review focused on the context in which data mining methodologies are used and the adaptations they undergo. The literature review covers 207 peer-reviewed and 'grey' publications. We find that data mining methodologies are primarily applied 'as-is'. At the same time, we identify various adaptations of data mining methodologies and note that their number is growing rapidly. The dominant adaptation pattern is methodology adjustment at a granular level (modifications), followed by extensions of existing methodologies with additional elements. Further, we identify two recurrent purposes for adaptation: (1) adaptations to handle Big Data technologies, tools and environments (technological adaptations); and (2) adaptations for context-awareness and for integrating data mining solutions into business processes and IT systems (organizational adaptations). The study suggests that standard data mining methodologies do not pay sufficient attention to deployment issues, which play a prominent role when turning data mining models into software products that are integrated into the IT architectures and business processes of organizations. We conclude that refinements of existing methodologies aimed at combining data, technological, and organizational aspects could help to mitigate these gaps.
© 2020 Plotnikova et al.

Keywords:  CRISP-DM; Data mining; Data mining methodology; Literature review

Year:  2020        PMID: 33816918      PMCID: PMC7924527          DOI: 10.7717/peerj-cs.267

Source DB:  PubMed          Journal:  PeerJ Comput Sci        ISSN: 2376-5992


