Ali Rodan, Ayham Fayyoumi, Hossam Faris, Jamal Alsakran, Omar Al-Kadi.
Abstract
Recently, telecommunication companies have been paying more attention to the problem of identifying customer churn behavior. In business, it is well known to service providers that attracting new customers is much more expensive than retaining existing ones. Therefore, adopting accurate models that can predict customer churn effectively helps customer retention campaigns and maximizes profit. In this paper we utilize an ensemble of multilayer perceptrons (MLPs) trained with negative correlation learning (NCL) to predict customer churn in a telecommunication company. Experimental results confirm that the NCL-based MLP ensemble achieves better generalization performance (higher churn rate) than an ensemble of MLPs without NCL (flat ensemble) and other common data mining techniques used for churn analysis.
Year: 2015 PMID: 25879060 PMCID: PMC4386545 DOI: 10.1155/2015/473283
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
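The NCL training scheme the abstract refers to trains each network in the ensemble with a penalty term that pushes its output away from the ensemble mean, trading a little individual accuracy for diversity across members. The following is a minimal sketch on synthetic data, not the paper's implementation: the ensemble size (10), hidden nodes (10), learning rate (0.3), and penalty λ = 0.5 mirror the parameter table below, while the network class, data, and training loop are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SmallMLP:
    """Illustrative one-hidden-layer MLP with sigmoid activations."""
    def __init__(self, n_in, n_hidden, rng):
        self.W1 = rng.normal(0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)
        self.out = sigmoid(self.h @ self.W2 + self.b2)
        return self.out

    def backward(self, X, delta_out, lr):
        # delta_out is dE/d(output) per sample; chain through the sigmoids
        d2 = delta_out * self.out * (1 - self.out)
        d1 = (d2 @ self.W2.T) * self.h * (1 - self.h)
        self.W2 -= lr * (self.h.T @ d2)
        self.b2 -= lr * d2.sum(axis=0)
        self.W1 -= lr * (X.T @ d1)
        self.b1 -= lr * d1.sum(axis=0)

def train_ncl(nets, X, y, lam=0.5, lr=0.3, epochs=500):
    """Batch gradient descent with the NCL gradient (f_i - y) - lam*(f_i - f_bar)."""
    for _ in range(epochs):
        outs = [net.forward(X) for net in nets]
        f_bar = np.mean(outs, axis=0)          # ensemble (average) output
        for net, f_i in zip(nets, outs):
            delta = (f_i - y) - lam * (f_i - f_bar)
            net.backward(X, delta, lr / len(X))
    return f_bar

# Toy churn-like data: two features, linearly separable labels
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

nets = [SmallMLP(2, 10, rng) for _ in range(10)]   # ensemble size 10
f_bar = train_ncl(nets, X, y, lam=0.5)
acc = np.mean((f_bar > 0.5) == (y > 0.5))
```

Note that averaging the NCL deltas over all members gives (f_bar - y): the ensemble mean still descends the plain squared error, while individual members are decorrelated.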
Related work on churn prediction methods.
| Author | Method | Description |
|---|---|---|
| Idris et al. [ ] | GP | Genetic programming applied with AdaBoost for churn prediction |
| Tsai and Lu [ ] | ANN with BP | Applied as a hybrid two-stage approach (i.e., reduction and prediction) |
| Wang and Niu [ ] | SVM | Least-squares support vector machine (LS-SVM) applied to establish a prediction model of credit card customer churn |
| Eastwood and Gabrys [ ] | IBK | A simple instance-based k-nearest-neighbor classifier (IBK) is applied |
| Kraljević and Gotovac [ ] | Decision trees | Decision trees (DT) applied and compared with ANN and logistic regression; the DT results outperform the other models |
| Verbraken et al. [ ] | Naive Bayes | A number of Bayesian network algorithms, ranging from the naive Bayes classifier to general Bayesian network classifiers, are applied for churn prediction |
Figure 1: Ensemble of MLP networks.
List of attributes.
| Attribute name | Description |
|---|---|
| 3G | Subscriber is provided with 3G service (yes, no) |
| Total consumption | Total monthly fees (calling + SMS) (JD) |
| Calling fees | Total monthly calling fees (JD) |
| Local SMS fees | Monthly local SMS fees (JD) |
| International SMS fees | Monthly fees for international SMS (JD) |
| International calling fees | Monthly fees for international calling (JD) |
| Local SMS count | Number of monthly local SMS |
| International SMS count | Number of monthly international SMS |
| International MOU | Total of international outgoing calls in minutes |
| Total MOU | Total minutes of use for all outgoing calls |
| On-net MOU | Minutes of use for on-net outgoing calls |
| Churn | Churning customer status (yes, no) |
Confusion matrix.
| | Predicted nonchurn | Predicted churn |
|---|---|---|
| Actual nonchurn | True negative (TN) | False positive (FP) |
| Actual churn | False negative (FN) | True positive (TP) |
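The evaluation columns used later (accuracy, actual churn rate, hit rate) can all be read off such a matrix. A small helper, assuming the usual confusion-matrix conventions with churn as the positive class; the function name and the example counts are illustrative, not from the paper:

```python
def churn_metrics(tp, fp, fn, tn):
    """Metrics from a churn confusion matrix (churn = positive class).

    Assumed standard definitions:
      accuracy   - fraction of all customers classified correctly
      churn rate - recall: fraction of actual churners that were caught
      hit rate   - precision: fraction of predicted churners that churned
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    churn_rate = tp / (tp + fn) if tp + fn else 0.0
    hit_rate = tp / (tp + fp) if tp + fp else 0.0
    return accuracy, churn_rate, hit_rate

# Hypothetical counts: 40 churners caught, 10 missed, 20 false alarms,
# 930 non-churners correctly kept
acc, cr, hr = churn_metrics(tp=40, fp=20, fn=10, tn=930)
# acc = 970/1000 = 0.97, cr = 40/50 = 0.8, hr = 40/60 ≈ 0.667
```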
Selected parameters for the ensemble of MLPs using NCL, based on 5-fold cross-validation.
| Parameter | Value |
|---|---|
| Hidden layers | 1 |
| Ensemble size | 10 |
| Decay | 0.001 |
| Hidden layer nodes | 10 |
| Activation function | Sigmoid, 1/(1 + e^(-x)) |
| Learning rate | 0.3 |
| Momentum | 0.2 |
| Penalty coefficient (λ) | 0.5 |
Tuning parameters for the data mining techniques used in the comparison study.
| Method | Parameters |
|---|---|
| GP | Population size = 1000, … |
| ANN with BP | Activation function = sigmoid, epochs = 5000, learning rate = 0.3, momentum = 0.2 |
| SVM | Cost = 1, gamma = 10000 |
| IBK | Number of neighbors = 1, nearest-neighbor search algorithm = linear search (brute force) |
| AdaBoost | Number of classifiers = 10 |
| Bagging | Number of classifiers = 10 |
| NNCS | Hidden layers = 2, hidden nodes = 15 |
| SMOTE | Number of neighbors = 5 |
| NCR + CPSO | Number of neighbors = 5 for SMOTE, number of particles = 75 for CPSO |
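SMOTE, used in two of the compared setups above, oversamples the minority (churn) class by interpolating between a minority sample and one of its k = 5 nearest minority neighbors. A minimal numpy sketch of that idea follows; it is not the reference implementation, and the function name and toy data are illustrative:

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, rng=None):
    """SMOTE-style oversampling sketch for the minority class.

    Each synthetic point: pick a random minority sample, one of its
    k nearest minority neighbors, and interpolate at a random fraction.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(X_min)
    # pairwise distances among minority samples (self-distance masked out)
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]       # k nearest per sample
    out = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                        # random minority sample
        b = neighbors[a, rng.integers(min(k, n - 1))]
        gap = rng.random()                         # interpolation fraction
        out[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return out

X_churn = np.random.default_rng(1).normal(size=(30, 4))  # toy minority class
synthetic = smote_sketch(X_churn, n_new=60, k=5)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the minority class's per-feature range.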
Evaluation results (results of the best five models are shown in bold).
| Model | Accuracy | Actual churn rate | Hit rate |
|---|---|---|---|
| | 0.927 | 0.022 | 0.067 |
| Naive Bayes (NB) | 0.597 | 0.901 | 0.115 |
| Random Forest (RF) | 0.940 | 0.006 | 0.109 |
| Genetic programming (GP) | 0.759 | 0.638 | 0.142 |
| Single ANN with BP | 0.941 | 0.625 | 0.607 |
| Decision trees (C4.5) | 0.703 | | |
| Support vector machine (SVM) | 0.703 | | |
| AdaBoost | 0.719 | | |
| Bagging | 0.703 | | |
| MLP for cost-sensitive classification (NNCS) | 0.496 | 0.819 | 0.113 |
| SMOTE + MLP | 0.722 | 0.724 | 0.177 |
| NCR + CPSO | 0.894 | 0.827 | 0.694 |
| NCR + MLP | 0.642 | 0.751 | 0.144 |
| Flat ensemble of ANN | 0.958 | 0.732 | 0.725 |
| Ensemble of ANN using NCL | | | |