Nurulhuda Mustafa, Lew Sook Ling, Siti Fatimah Abdul Razak.
Abstract
Background: Customer churn refers to the rate at which customers leave a business. Churn can stem from various factors, including switching to a competitor, cancelling a subscription because of poor customer service, or discontinuing all contact with a brand due to insufficient touchpoints. Maintaining long-term relationships with customers is more effective than trying to attract new ones. A rise of 5% in customer satisfaction is followed by a 95% increase in sales. By analysing past behaviour, companies can anticipate future revenue. This article examines which variables in the Net Promoter Score (NPS) dataset influence customer churn in Malaysia's telecommunications industry. The aim of this study was to identify the factors behind customer churn and to propose a churn prediction framework, which is currently lacking in the telecommunications industry.
Keywords: Classification and Regression Trees (CART); Customer Churn; Data Mining Techniques; Net Promoter Score (NPS)
Year: 2021 PMID: 35528953 PMCID: PMC9051585 DOI: 10.12688/f1000research.73597.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. Telecommunications and Internet industry: Participants' Annual Revenue as of 31st December 2019.
Net Promoter Score (NPS) scale.
| Scale | Score | Description |
|---|---|---|
| Promoters | 9-10 | Customers who are typically the brand's ambassadors, enhancing the brand's reputation and/or publicity and the flow of referrals. |
| Passives | 7-8 | Customers who have positive/constructive feelings towards the brand but do not express a need to change. |
| Detractors | 0-6 | Customers who are unlikely to remain or to encourage others to return and who, even worse, may discourage others from trusting the business or brand. |
Source: 2021 Guide & Definition (2021).
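The scale above can be expressed as a small lookup. The following is a minimal sketch, not the authors' code: the function names are hypothetical, the sample ratings are invented, and the aggregate score uses the standard NPS definition (percentage of promoters minus percentage of detractors), which the table itself does not state.

```python
from collections import Counter


def nps_category(score: int) -> str:
    """Map a 0-10 NPS feedback rating to its scale category."""
    if not 0 <= score <= 10:
        raise ValueError(f"NPS rating must be between 0 and 10, got {score}")
    if score >= 9:
        return "Promoter"
    if score >= 7:
        return "Passive"
    return "Detractor"


def net_promoter_score(scores: list[int]) -> float:
    """Standard NPS: percentage of promoters minus percentage of detractors."""
    if not scores:
        raise ValueError("at least one rating is required")
    counts = Counter(nps_category(s) for s in scores)
    return 100.0 * (counts["Promoter"] - counts["Detractor"]) / len(scores)


# Hypothetical ratings, for illustration only.
ratings = [10, 9, 8, 7, 6, 3, 10, 0]
print([nps_category(r) for r in ratings])
print(net_promoter_score(ratings))
```

Passives count towards the denominator but not the numerator, so a sample with equal numbers of promoters and detractors scores zero regardless of how many passives it contains.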
A summary of churn prediction studies.
| Author (Year) | Techniques and method | Disadvantages of the prediction studies and proposed enhancements |
|---|---|---|
| Ahmad et al. (2019) | | |
| Höppner et al. (2020) | | |
| Yang (2019) | | |
| Ahmed and Maheswari (2017) | | |
| Eria and Marikannan (2018) | | |
A summary for churn prediction framework studies.
| Author (Year) | Study focus | Hypothesis results |
|---|---|---|
| Ahn et al. (2006) | Describes a customer's status transition from active to non-user or suspended as partial defection, and the transition from active to churn as absolute (total) defection. | H1a, H1c, H2a, H3a, H4, H1c′, H2a′ and H3a′ = Supported |
| Clemes et al. (2010) | Identifies and analyses factors influencing bank customers' switching behaviour in the Chinese retail banking industry. | H1 to H15 = Supported |
| Geetha and Abitha Kumari (2012) | Provides a brief overview of the trend of non-revenue earning customers (NRECs) that trigger revenue churn and are likely to churn soon. | H1 to H4 = Supported |
| Kim et al. (2017) | Analyses the factors affecting IPTV service providers' behaviour regarding switching barriers, VOCs, and content consumption. | H1, H2, H3, H4, H5, H6, H9, H14 and H15 = Supported |
Churn prediction techniques.
| Algorithms | Description |
|---|---|
| Logistic Regression (LR) | Logistic regression is a regression analysis used when the dependent variable is dichotomous (binary). It is used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval, or ratio-level independent variables. Logistic regression calculates the odds ratio in the presence of multiple explanatory variables. The method is similar to multiple linear regression, except that the response variable is binomial. The result is the impact of each variable on the odds ratio of the event of interest. |
| Linear Discriminant Analysis (LDA) | Linear Discriminant Analysis (LDA) is a dimensionality reduction technique. It is used as a pre-processing step in machine learning (ML) and pattern classification applications. LDA aims to project features from a higher-dimensional space onto a lower-dimensional space to avoid the curse of dimensionality and to reduce resource and dimensional costs. |
| K-Nearest Neighbours Classifier (KNN) | The nearest neighbour classifier classifies a query example by identifying its nearest neighbours and using those neighbours to determine the query's class. This approach is of particular interest because concerns about run-time efficiency are less of a problem given the computing resources available today. |
| Classification and Regression Trees (CART) | Classification and Regression Trees (CART) is a classification method that uses historical data to construct decision trees, which can then be used to classify new observations. First, the CART algorithm searches all possible variables and all possible values to find the best split: a question that divides the data into two parts with the highest homogeneity. The process is then repeated for each of the resulting data fragments. |
| Gaussian Naive Bayes (NB) | Gaussian Naive Bayes is a special case of probabilistic networks that allows the treatment of continuous variables. It is a generalisation of Naive Bayes networks. Naive Bayes classifiers are based on Bayes' theorem and make the strong assumption that features are independent: the value of a particular feature is assumed to be unaffected by the value of any other feature. As a result, Naive Bayes classifiers require only a small amount of training data. |
| Support Vector Machine (SVM) | Support vector machines (SVMs) are supervised learning methods used for classification and regression. An SVM applies machine learning theory to maximise predictive accuracy while avoiding overfitting to the training data. In general, SVMs can be thought of as systems that use functions in a high-dimensional feature space and are trained with an optimisation-based learning method grounded in statistical learning theory. |
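The split search described for CART can be illustrated with a short sketch. It is not the authors' implementation. It assumes Gini impurity as the homogeneity measure (the conventional CART criterion, which the table does not name) and uses invented toy data; `gini` and `best_split` are hypothetical helper names.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Search every feature and every observed value for the binary split
    (feature <= threshold) with the lowest weighted Gini impurity."""
    best = None  # (impurity, feature_index, threshold)
    n = len(rows)
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            if not left or not right:
                continue  # a split must produce two non-empty parts
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, j, threshold)
    return best


# Hypothetical toy data: [NPS rating, response duration in days] -> churner flag.
X = [[9, 1], [10, 2], [8, 1], [3, 5], [2, 7], [6, 4]]
y = [0, 0, 0, 1, 1, 1]
impurity, feature, threshold = best_split(X, y)
print(feature, threshold, impurity)
```

A full tree would call `best_split` again on the left and right partitions until a stopping rule is met, which is the repetition the description refers to.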
Figure 2. A conceptual model for the prediction of potential churners.
H1 Hypothesis.
| H1a | WEEK is positively associated with the customer churn probability |
| H1b | SR TYPE is positively associated with the customer churn probability |
| H1c | SR AREA is positively associated with the customer churn probability |
| H1d | REPLY DT is positively associated with the customer churn probability |
| H1e | REPLY DAY is positively associated with the customer churn probability |
| H1f | REPLY SHIFT is positively associated with the customer churn probability |
| H1g | SR CREATED DATE is positively associated with the customer churn probability |
| H1h | SR DAY is positively associated with the customer churn probability |
| H1i | SR SHIFT is positively associated with the customer churn probability |
| H1j | DURATION is positively associated with the customer churn probability |
H2 Hypothesis.
| H2a | SR CREATOR ID is positively associated with the customer churn probability |
| H2b | SR CREATOR NAME is positively associated with the customer churn probability |
| H2c | SR CREATOR POSITION is positively associated with the customer churn probability |
| H2d | USERNAME ASSIGNED TO is positively associated with the customer churn probability |
| H2e | ASSIGNED TO is positively associated with the customer churn probability |
| H2f | ASSIGNED TO POSITION is positively associated with the customer churn probability |
H3 Hypothesis.
| H3a | DIVISION ASSIGNED TO is positively associated with the customer churn probability |
| H3b | OUTLET NAME is positively associated with the customer churn probability |
H4 Hypothesis.
| H4 | A customer with a lower NPS feedback rating is more likely to be a potential churner than a customer with a higher NPS feedback rating |
Mediation effects on NPS feedback.
| H1a' | An NPS feedback rating mediates the effect of request week on customer churn. |
| H1b' | An NPS feedback rating mediates the effect of service request type on customer churn. |
| H1c' | An NPS feedback rating mediates the effect of the service request area on customer churn. |
| H1d' | An NPS feedback rating mediates the effect of respond date on customer churn. |
| H1e' | An NPS feedback rating mediates the effect of respond day on customer churn. |
| H1f' | An NPS feedback rating mediates the effect of respond day shift on customer churn. |
| H1g' | An NPS feedback rating mediates the effect of service request date on customer churn. |
| H1h' | An NPS feedback rating mediates the effect of service request day on customer churn. |
| H1i' | An NPS feedback rating mediates the effect of service request day shift on customer churn. |
| H1j' | An NPS feedback rating mediates the effect of service duration on customer churn. |
| H2a' | An NPS feedback rating mediates the effect of service request created staff ID on customer churn. |
| H2b' | An NPS feedback rating mediates the effect of service request created staff name on customer churn. |
| H2c' | An NPS feedback rating mediates the effect of service request created staff position on customer churn. |
| H2d' | An NPS feedback rating mediates the effect of assigned staff ID on customer churn. |
| H2e' | An NPS feedback rating mediates the effect of assigned staff name on customer churn. |
| H2f' | An NPS feedback rating mediates the effect of assigned staff position on customer churn. |
| H3a' | An NPS feedback rating mediates the effect of assigned division on customer churn. |
| H3b' | An NPS feedback rating mediates the effect of the assigned outlet on customer churn. |
Dataset details.
| No | Data | Data type | Original/adding data | Details |
|---|---|---|---|---|
| 1 | SR NUMBER | object | Original data | Service request tracking number |
| 2 | WEEK | int64 | Original data | Number of weeks (year) |
| 3 | CUSTOMER NAME | object | Original data | Customer/Business Name |
| 4 | RACE | object | Add data | Customer Race |
| 5 | REPLY DT | object | Original data | Respond Date |
| 6 | REPLY DAY | object | Add data | Respond Day |
| 7 | REPLY SHIFT | object | Add data | Respond Day Shift |
| 8 | S_NPS_FEEDBACK | int64 | Original data | NPS Feedback Rating (0-10) |
| 9 | S_NPS_FEEDBACK_TYPE_FK | object | Original data | NPS Feedback Type (Promoter, Passive, Detractor) |
| 10 | NES RESPONSE | object | Original data | NPS comment |
| 11 | SEGMENT GROUP | object | Original data | Customer segmentation group (consumer, SMEs, government, and enterprise) |
| 12 | SEGMENT CODE | object | Original data | Customer segmentation code |
| 13 | ARPU | float64 | Add data | The average revenue per user (customer) |
| 14 | SR CREATED DATE | object | Original data | Service Request Date |
| 15 | SR DAY | object | Add data | Service Request Day |
| 16 | SR SHIFT | object | Add data | Service Request Day shift |
| 17 | DURATION | object | Add data | Respond Time duration |
| 18 | SR TYPE | object | Original data | Service Request Type |
| 19 | SR AREA | object | Original data | Service Request Area |
| 20 | SR SUB AREA | object | Original data | Service Request Sub Area |
| 21 | SR CREATOR ID | object | Original data | Helpdesk Staff Creator ID |
| 22 | SR CREATOR NAME | object | Original data | Helpdesk Staff Creator Name |
| 23 | CREATOR POSITION | object | Original data | Helpdesk Staff Creator Position |
| 24 | USERNAME ASSIGNED TO | object | Original data | Officer in Charge Username |
| 25 | ASSIGNED TO | object | Original data | Officer in Charge Name |
| 26 | ASSIGNED TO POSITION | object | Original data | Officer in Charge Position |
| 27 | DIVISION ASSIGNED TO | object | Original data | Assigned Division |
| 28 | BUILDING ID | object | Original data | Assigned Outlet ID |
| 29 | OUTLET NAME | object | Original data | Assigned Outlet Name |
| 30 | ZONE | object | Original data | Assigned Outlet Zone |
| 31 | STATE | object | Original data | Assigned Outlet State |
| 32 | SOURCE | object | Original data | Service Request System |
| 33 | POTENTIAL CHURNER | object | Add data | Potential churner or not |
Key findings from the discovery process.
| MTD Sept 2019 | MTD Sept 2020 |
|---|---|
Variable selection and transformation.
| No | Data | Data type |
|---|---|---|
| 1 | WEEK | int64 |
| 2 | RACE | int64 |
| 3 | REPLY DT | int64 |
| 4 | REPLY DAY | int64 |
| 5 | REPLY SHIFT | int64 |
| 6 | S_NPS_FEEDBACK | int64 |
| 7 | S_NPS_FEEDBACK_TYPE_FK | int64 |
| 8 | SEGMENT GROUP | int64 |
| 9 | SEGMENT CODE | int64 |
| 10 | ARPU | float64 |
| 11 | SR CREATED DATE | int64 |
| 12 | SR DAY | int64 |
| 13 | SR SHIFT | int64 |
| 14 | DURATION | int64 |
| 15 | SR TYPE | int64 |
| 16 | SR AREA | int64 |
| 17 | SR CREATOR ID | int64 |
| 18 | SR CREATOR NAME | int64 |
| 19 | SR CREATOR POSITION | int64 |
| 20 | USERNAME ASSIGNED TO | int64 |
| 21 | ASSIGNED TO | int64 |
| 22 | ASSIGNED TO POSITION | int64 |
| 23 | DIVISION ASSIGNED TO | int64 |
| 24 | OUTLET NAME | int64 |
| 25 | POTENTIAL CHURNER | int64 |
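In the table above, the retained columns are numeric (int64, with ARPU as float64), whereas most were typed object in the dataset details. The source does not say how the conversion was done. One common approach is label encoding, sketched below as an assumption. The column names come from the dataset table, but the row values and the `label_encode` helper are invented for illustration.

```python
def label_encode(values):
    """Map each distinct category to an integer code (sorted order, starting at 0)."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


# Hypothetical rows; the values are invented, only the column names are from the table.
rows = [
    {"SR TYPE": "Complaint", "REPLY SHIFT": "Morning", "POTENTIAL CHURNER": "Yes"},
    {"SR TYPE": "Request", "REPLY SHIFT": "Evening", "POTENTIAL CHURNER": "No"},
    {"SR TYPE": "Complaint", "REPLY SHIFT": "Evening", "POTENTIAL CHURNER": "Yes"},
]

encoded = [dict() for _ in rows]
mappings = {}
for column in rows[0]:
    # Encode one column at a time so each column gets its own code book.
    codes, mappings[column] = label_encode([row[column] for row in rows])
    for target, code in zip(encoded, codes):
        target[column] = code

print(encoded)
print(mappings)
```

Label encoding imposes an arbitrary order on nominal categories. Tree-based models such as CART tolerate this, while distance-based or linear models may be sensitive to it.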
Figure 3. Correlation coefficient results for MTD Sept 2019 and MTD Sept 2020.
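The figure caption does not say which correlation coefficient was used. Assuming the Pearson coefficient, a correlation between one encoded variable and the churner flag could be computed as in the following sketch. The sample values are invented and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: NPS feedback ratings and a 0/1 potential-churner flag.
nps_feedback = [9, 10, 8, 3, 2, 6]
potential_churner = [0, 0, 0, 1, 1, 1]
print(round(pearson_r(nps_feedback, potential_churner), 3))
```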
Machine learning algorithms comparison results.
| Algorithm | MTD Sept 2019: Mean | MTD Sept 2019: Std | MTD Sept 2019: Accuracy score | MTD Sept 2020: Mean | MTD Sept 2020: Std | MTD Sept 2020: Accuracy score |
|---|---|---|---|---|---|---|
| Logistic Regression (LR) | 0.42 | 0.01 | 41% | 0.44 | 0.01 | 45% |
| Linear Discriminant Analysis (LDA) | 0.41 | 0.02 | 42% | 0.47 | 0.02 | 45% |
| K-Nearest Neighbours Classifier (KNN) | 0.98 | 0.01 | 98% | 0.98 | 0.01 | 97% |
| Classification and Regression Trees (CART) | 0.98 | 0.01 | 98% | 0.98 | 0.01 | 98% |
| Gaussian Naive Bayes (NB) | 0.41 | 0.01 | 41% | 0.44 | 0.02 | 44% |
| Support Vector Machine (SVM) | 0.98 | 0.01 | 98% | 0.96 | 0.01 | 98% |
Figure 4. Comparing machine learning algorithms for MTD Sept 2019 and MTD Sept 2020.
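The Mean and Std columns in the comparison table are the kind of summary that k-fold cross-validation produces, although the source does not state the evaluation procedure. The sketch below assumes k-fold cross-validation and uses a simple 1-nearest-neighbour classifier on invented toy data. It illustrates how such figures can be obtained and does not reproduce the study's setup.

```python
import statistics


def one_nn_predict(train_X, train_y, query):
    """Predict the label of the single closest training row (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, query)) for row in train_X]
    return train_y[distances.index(min(distances))]


def k_fold_accuracy(X, y, k):
    """Return the accuracy of a 1-nearest-neighbour classifier on each of k folds."""
    n = len(X)
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th row forms one held-out fold
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        hits = sum(one_nn_predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return scores


# Hypothetical toy data: [NPS rating, response duration in days] -> churner flag.
X = [[9, 1], [3, 5], [10, 2], [2, 7], [8, 1], [6, 4], [9, 2], [1, 6]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
scores = k_fold_accuracy(X, y, k=4)
print("mean:", statistics.mean(scores), "std:", statistics.pstdev(scores))
```

The mean summarises expected accuracy across folds, and the standard deviation shows how much that accuracy varies with the choice of held-out data. The population standard deviation (`pstdev`) is used here as one possible convention.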
Mediation analysis results.
X: Independent variable; M: NPS feedback; Y: Potential churner.
| X | X and M (p-value) | X and Y (p-value) | X, M and Y (p-value), Sobel test | Significant relationship between X and Y via M |
|---|---|---|---|---|
| WEEK | 0.5365 | 0.7852 | 0.5365 | No |
| SR TYPE | 0.0001 | 0.0024 | 0.0001 | Yes |
| SR AREA | 0.6196 | 0.7088 | 0.6195 | No |
| REPLY DT | 0.5634 | 0.8215 | 0.5634 | No |
| REPLY DAY | 0.4331 | 0.4755 | 0.4331 | No |
| REPLY SHIFT | 0.0001 | 0.0138 | 0.0001 | Yes |
| SR CREATED DATE | 0.7494 | 0.8044 | 0.7494 | No |
| SR DAY | 0.7311 | 0.7859 | 0.7311 | No |
| SR SHIFT | 0.4008 | 0.2146 | 0.4008 | No |
| DURATION | 0.0001 | 0.0001 | 0.0001 | Yes |
| SR CREATOR ID | 0.0009 | 0.0653 | 0.0009 | Yes |
| SR CREATOR NAME | 0.5596 | 0.1048 | 0.5596 | No |
| SR CREATOR POSITION | 0.7382 | 0.7535 | 0.7382 | No |
| USERNAME ASSIGNED TO | 0.0002 | 0.0292 | 0.0002 | Yes |
| ASSIGNED TO | 0.1727 | 0.0206 | 0.1727 | No |
| ASSIGNED TO POSITION | 0.9376 | 0.4814 | 0.9376 | No |
| DIVISION ASSIGNED TO | 0.9622 | 0.6390 | 0.9622 | No |
| OUTLET NAME | 0.6197 | 0.5280 | 0.6197 | No |
p-value (<.05).
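The Sobel test named in the table checks whether the indirect effect of X on Y through the mediator M differs from zero. The following is a minimal sketch of the standard first-order Sobel statistic. Here `a` is the coefficient of X when predicting M, `b` is the coefficient of M when predicting Y while controlling for X, and the numeric inputs are hypothetical and not taken from the study.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Sobel z statistic and two-sided p-value for the indirect effect a * b."""
    se_ab = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    if se_ab == 0:
        raise ValueError("standard error of the indirect effect is zero")
    z = (a * b) / se_ab
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical path coefficients and standard errors, for illustration only.
z, p = sobel_test(a=0.5, se_a=0.1, b=0.4, se_b=0.1)
print(round(z, 3), round(p, 4), "significant" if p < 0.05 else "not significant")
```

With the table's threshold of p < .05, a mediated relationship would be flagged as significant when the returned p-value falls below 0.05.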
Summary of hypothesis results.
| Hypothesis | Description | Decision |
|---|---|---|
| H1a | WEEK is positively associated with the customer churn probability | Rejected |
| H1b | SR TYPE is positively associated with the customer churn probability | Supported |
| H1c | SR AREA is positively associated with the customer churn probability | Rejected |
| H1d | REPLY DT is positively associated with the customer churn probability | Rejected |
| H1e | REPLY DAY is positively associated with the customer churn probability | Rejected |
| H1f | REPLY SHIFT is positively associated with the customer churn probability | Supported |
| H1g | SR CREATED DATE is positively associated with the customer churn probability | Rejected |
| H1h | SR DAY is positively associated with the customer churn probability | Rejected |
| H1i | SR SHIFT is positively associated with the customer churn probability | Rejected |
| H1j | DURATION is positively associated with the customer churn probability | Supported |
| H2a | SR CREATOR ID is positively associated with the customer churn probability | Supported |
| H2b | SR CREATOR NAME is positively associated with the customer churn probability | Rejected |
| H2c | SR CREATOR POSITION is positively associated with the customer churn probability | Rejected |
| H2d | USERNAME ASSIGNED TO is positively associated with the customer churn probability | Supported |
| H2e | ASSIGNED TO is positively associated with the customer churn probability | Rejected |
| H2f | ASSIGNED TO POSITION is positively associated with the customer churn probability | Rejected |
| H3a | DIVISION ASSIGNED TO is positively associated with the customer churn probability | Rejected |
| H3b | OUTLET NAME is positively associated with the customer churn probability | Rejected |
| H4 | A customer with a lower NPS feedback rating is more likely to be a potential churner than a customer with a higher NPS feedback rating | Supported |