Rishav Raj Agarwal1, Chia-Ching Lin1, Kuan-Ta Chen1, Vivek Kumar Singh2,3.
Abstract
An ability to understand and predict financial wellbeing for individuals is of interest to economists, policy designers, financial institutions, and the individuals themselves. According to the Nilson reports, there were more than 3 billion credit cards in use in 2013, accounting for purchases exceeding US$2.2 trillion, and according to the Federal Reserve report, 39% of American households were carrying credit card debt from month to month. Prior literature has connected individual financial wellbeing with social capital. However, as yet, there is limited empirical evidence connecting social interaction behavior with financial outcomes. This work reports results from one of the largest known studies connecting financial outcomes and phone-based social behavior (180,000 individuals; a 2-year time frame; 82.2 million monthly bills; and 350 million call logs). Our methodology tackles a highly imbalanced dataset, which is a pertinent problem in modelling credit risk behavior, and offers a novel hybrid method that yields improvements over both a traditional transaction-data-only approach and an approach that uses only call data. The results pave the way for better financial modelling of billions of unbanked and underbanked customers using non-traditional metrics like phone-based credit scoring.
Year: 2018 PMID: 29474411 PMCID: PMC5825009 DOI: 10.1371/journal.pone.0191863
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Dataset summary for various data sources used in the study.
| Dataset | Description | # records |
|---|---|---|
| | Records of the bill generated for each month | 3.6 million accounts, 82.2 million monthly bills |
| | Day-to-day transactions for each individual, including purchase type and location of purchase | 2.3 million accounts, 190 million transactions |
| | Information about the account holder, which is recorded and regularly updated by the bank | 1.6 million |
| | Call-related data, including the hashed remote number called; does not include any private information | 180,000 users, 350 million call logs |
The datasets are explained further in the following subsections.
Bill data consumer rating summary.
| Rating | Description | % of Population |
|---|---|---|
| | No consumption | 52.35% |
| | Pay full amount on time | 39.85% |
| | Pay full amount, not on time | 0.76% |
| | Pay minimum | 5.98% |
| | Pay minimum amount late | 0.55% |
| | Pay less than minimum amount | 0.06% |
| | Not pay | 0.46% |
Trouble variable summary.
| Trouble | Description | % of Population |
|---|---|---|
| | Avoids late payment | 98.94% |
| | Makes payment late or makes no payment | 1.07% |
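The Trouble variable is heavily skewed: roughly 1% of accounts fall in the positive class. One common way to handle such imbalance, not necessarily the one used in the study, is to weight each class inversely to its frequency. The following is a minimal sketch under that assumption; the helper name `inverse_frequency_weights` and the class labels are hypothetical, and only the two percentages come from the table above.

```python
def inverse_frequency_weights(class_shares):
    """Return weights proportional to 1 / class share.

    Weights are normalised so that the share-weighted average weight is 1.
    This is a generic re-weighting scheme, assumed here for illustration.
    """
    total = sum(class_shares.values())
    shares = {label: share / total for label, share in class_shares.items()}
    n_classes = len(shares)
    return {label: 1.0 / (n_classes * share) for label, share in shares.items()}


# Percentages from the Trouble variable summary (they sum to 100.01 due to rounding).
trouble_shares = {"no_trouble": 98.94, "trouble": 1.07}

weights = inverse_frequency_weights(trouble_shares)
imbalance_ratio = trouble_shares["no_trouble"] / trouble_shares["trouble"]

print(weights)
print(round(imbalance_ratio, 1))
```

With these shares there are about 92 accounts without trouble for every account with trouble, which is why plain accuracy is a weak metric for this task.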
Fig 1. Distribution of credit card transaction amounts (on a log scale).
Transaction based features.
| Metric | Formula | Description |
|---|---|---|
| | Min / Max / Log(T) | Quantifies the overall transaction activity. Note: T can be the number of transactions or the transaction amount. |
| | | Quantifies the spread of T. |
| | | Quantifies the preference to spend on weekdays. |
| | | Quantifies the preference to spend on holidays. |
| | | Quantifies the preference to spend in Taiwan vs foreign. |
| | | Quantifies the likelihood to spend on a given category as described below. |
| | | Quantifies the likelihood to spend within 10 days of salary credit (i.e. first 10 days of the month). |
| | ∑ | Quantifies the evenness of spending across MCC. |
| | | Quantifies the preference of spending in top 3 MCC. |
*where T can be the number of transactions or the transaction amount.
MCC categories [44]:
Business Services
Utilities
Service providers
Retail Stores (Grocery stores, Supermarkets)
Government services
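The transaction features above describe spread, evenness across MCC categories, and concentration in the top 3 categories. The sketch below shows one plausible way to compute such quantities. It assumes standard textbook definitions (coefficient of variation, Shannon entropy, top-k share), which may differ from the exact formulas used in the study; all function names and the sample data are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Spread of T: population standard deviation divided by the mean."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def category_shares(categories):
    """Fraction of transactions that fall into each MCC category."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def shannon_entropy(shares):
    """Evenness of spending: higher means spending is spread more evenly."""
    return -sum(p * math.log(p) for p in shares.values() if p > 0)


def top_k_share(shares, k=3):
    """Preference for the k most-used categories."""
    return sum(sorted(shares.values(), reverse=True)[:k])


# Made-up data for one account.
amounts = [120.0, 80.0, 100.0, 100.0]
mccs = ["Utilities", "Utilities", "Retail Stores", "Business Services"]

shares = category_shares(mccs)
cv = coefficient_of_variation(amounts)
entropy = shannon_entropy(shares)
top2 = top_k_share(shares, k=2)
print(cv, entropy, top2)
```

An account that spends in a single category has entropy 0, while an account that splits spending equally across n categories reaches the maximum, log(n).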
Call/Call duration data based features.
| Metric | Formula | Description |
|---|---|---|
| | Log (Comm | Quantifies the overall communication activity. |
| | | Quantifies the circadian rhythm: the ratio of communication taking place in each of four six-hour phases. |
| | | Quantifies the difference between “work” week and the more social weekend behavior. |
| | | Quantifies the communication effort that a user devotes to her top/bottom third of contacts. This is computed for known, unknown, and overall contacts. |
| | | Quantifies the frequency of comm. |
| | ∑ | Quantifies the evenness of engagements across contacts. |
| | | Quantifies the reception of comm. |
| | | Quantifies the latency in comm. |
| | | Quantifies the likelihood of replying to the communication during a given time period. |
*where comm can be incoming/outgoing/missed/total calls
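Two of the call features described above, the circadian rhythm ratio and the inter-event time, can be illustrated from a list of call timestamps. This is a minimal sketch, not the study's implementation: the phase boundaries (00:00, 06:00, 12:00, 18:00) are an assumption, since the text only says there are four six-hour phases, and the function names and sample timestamps are hypothetical.

```python
from datetime import datetime


def circadian_ratios(timestamps):
    """Share of events in each six-hour phase: 0-6, 6-12, 12-18, 18-24.

    The phase boundaries are assumed for illustration.
    """
    counts = [0, 0, 0, 0]
    for ts in timestamps:
        counts[ts.hour // 6] += 1
    total = len(timestamps)
    return [c / total for c in counts] if total else [0.0] * 4


def mean_inter_event_seconds(timestamps):
    """Average gap, in seconds, between consecutive events; None if fewer than two."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps) if gaps else None


# Made-up incoming-call timestamps for one user.
calls = [
    datetime(2014, 1, 1, 1, 0),
    datetime(2014, 1, 1, 7, 0),
    datetime(2014, 1, 1, 13, 0),
    datetime(2014, 1, 1, 13, 30),
]

ratios = circadian_ratios(calls)
gap = mean_inter_event_seconds(calls)
print(ratios, gap)
```

The same functions can be applied separately to incoming, outgoing, missed, or total calls, matching the note that comm can be any of these.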
Demographic based features.
| Variable | Description |
|---|---|
| | 1: male; 2: female |
| | 1: doctoral; 2: master; 3: bachelor; 4: college; 5: high school; 6: others |
| | 1: below 1 million; 2: 1 ~ 3 million; 3: 3 ~ 5 million; 4: above 5 million |
| | 1: married; 2: not married; 3: divorced |
| | 1: responsible persons; 2: executives; 3: mid-level executives; 4: normal employees; 5: others |
| | Number of open credit cards across all banks |
Testing results—Predicting financial trouble as a function of different feature sets.
| Metric | Call + Transaction + Demographics | Call | Transaction + Demographics |
|---|---|---|---|
| AUCROC | 0.781 | 0.725 | 0.731 |
| Recall | 0.731 | 0.679 | 0.689 |
| Accuracy | 0.687 | 0.649 | 0.652 |
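For readers less familiar with the three metrics in this table, the sketch below computes them on a tiny made-up example. AUCROC is computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative one, which is a standard equivalent definition. The 0.5 decision threshold, the function names, and the data are assumptions for illustration and are not taken from the study.

```python
def auc_roc(labels, scores):
    """AUCROC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_and_accuracy(labels, scores, threshold=0.5):
    """Recall on the positive class and overall accuracy at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    return tp / (tp + fn), correct / len(labels)


# Made-up labels (1 = financial trouble) and model scores.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.3, 0.2, 0.6]

auc = auc_roc(labels, scores)
recall, accuracy = recall_and_accuracy(labels, scores)
print(auc, recall, accuracy)
```

Unlike accuracy, AUCROC and recall do not reward a model for simply predicting the majority class, which matters when about 99% of accounts have no trouble.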
Fig 2. AUCROC comparison across all periods.
T-test for AUCROC.
| Testing call + transaction model against: | T score (P value) |
|---|---|
| Only transaction model | 28.422 (4.342e-13) |
| Only call model | 33.248 (5.802e-14) |
T-test comparing the call-only and transaction-only models.
| Testing call vs transaction model | T score (P value) |
|---|---|
| AUCROC | -2.1056 (0.05525) |
| Accuracy | -1.027 (0.3231) |
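The T scores above compare model performance across periods. The tables do not state whether a paired or an independent test was used, so the following sketch assumes a paired t-test over per-period AUCROC values purely for illustration. The function name and the numbers are made up and are not the study's data.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Made-up per-period AUCROC values for two hypothetical models.
hybrid_auc = [0.78, 0.79, 0.77, 0.80]
single_source_auc = [0.73, 0.72, 0.73, 0.74]

t_score, dof = paired_t_statistic(hybrid_auc, single_source_auc)
print(round(t_score, 2), dof)
```

Turning the statistic into a P value requires the CDF of Student's t distribution, which the Python standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_rel`) can compute both in one call.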
Top-10 features for each category, as well as the sign (positive or negative) of their Pearson’s correlation with the outcome variable (having financial trouble or not).
| Rank | Call + Transaction + Demographics | Call | Transaction + Demographics |
|---|---|---|---|
| 1 | Inter-event time (incoming) | Inter-event time (incoming) | # months with at least one transaction |
| 2 | # months with at least one transaction | Incoming Latency (daytime) | Coefficient of variation |
| 3 | MCC ratio (Business Services) | Missed call Latency | Weekend ratio (# transactions) |
| 4 | Coefficient of variation | Contact engagement ratio (Outgoing) | Mean transaction across all banks |
| 5 | MCC ratio (Retail Stores) | Inter-event time (outgoing) | MCC ratio (Retail Stores) |
| 6 | Mean transaction across all banks | Incoming Latency (morning) | MCC ratio (Business Services) |
| 7 | Missed Call Latency (total) | Landline engagement ratio (outgoing) | Domestic Ratio (# transactions) |
| 8 | # of opened credit cards | Contact engagement ratio (total) | MCC ratio (Utilities) |
| 9 | Domestic Ratio (# transactions) | Contact engagement ratio (incoming) | Maximum transaction amount |
| 10 | Incoming Latency (daytime) | Incoming Latency (night) | # of opened credit cards |
Red indicates a negative impact and green a positive impact, taken over all the periods. The coefficient of variation (COV) is white because it exhibited a positive correlation in some periods and a negative one in others.
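The sign attached to each feature is that of its Pearson correlation with the binary outcome. A minimal sketch of that computation follows, using made-up values; the function names are hypothetical, and the study may have computed the correlation per period before aggregating.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_sign(xs, ys):
    """Label the direction of the correlation."""
    r = pearson_r(xs, ys)
    return "positive" if r > 0 else "negative" if r < 0 else "zero"


# Made-up feature values and outcome (1 = financial trouble).
feature = [1.0, 2.0, 3.0, 4.0]
trouble = [0, 0, 1, 1]

r = pearson_r(feature, trouble)
sign = correlation_sign(feature, trouble)
print(round(r, 4), sign)
```

A feature whose sign flips between periods, as reported for the coefficient of variation, would return "positive" for some periods and "negative" for others when this is run period by period.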