Mikaela Irene D Fudolig, Kunal Bhattacharya, Daniel Monsivais, Hang-Hyun Jo, Kimmo Kaski.
Abstract
We present a link-centric approach to study variation in the mobile phone communication patterns of individuals. Unlike most previous research on call detail records, which focused on the variation of phone usage across individual users, we examine how the calling and texting patterns obtained from call detail records vary among pairs of users and how these patterns are affected by the nature of the relationships between users. To demonstrate this link-centric perspective, we extract factors that contribute to the variation in the mobile phone communication patterns and predict demographics-related quantities for pairs of users. The time of day and the channel of communication (calls or texts) are found to explain most of the variance among pairs that frequently call each other. Furthermore, we find that this variation can be used to predict the relationship between the users in a pair, as inferred from their age and gender, as well as the age of the younger user in a pair. From the classifier performance across different age and gender groups, as well as the inherent class overlap suggested by the estimated bounds of the Bayes error, we gain insights into the similarities and differences of communication patterns across different relationships.
Year: 2020 PMID: 31899785 PMCID: PMC6941803 DOI: 10.1371/journal.pone.0227037
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Features obtained from the call detail records.
*These features were log-transformed. †Log transformation was applied for late-night quantities involving the numbers of calls and call durations. ‡Log transformation was applied for every feature.
| Quantity | Time division | Features |
|---|---|---|
| Weekly (a) number of calls, (b) call duration, and (c) number of texts | Weekday/weekend daytime/evening/late-night | Mean*, median*, std*, min*, max*, skewness, kurtosis (taken over the entire observation period) |
| Total (a) number of calls, (b) call duration, and (c) number of texts for the entire observation period | Weekday/weekend daytime/evening/late-night | Fraction of [weekday, weekend] [calls, call duration, texts] in daytime/evening/late-night† |
| (a) Number of days with at least one call and (b) number of days with at least one text for the entire observation period | Weekday/weekend daytime/evening/late-night | Number of days with at least one [call, text] in each division of time‡ |
| Reciprocity for (a) number of calls, (b) call duration, and (c) number of texts for the entire observation period | N/A | \|in−out\| ÷ (in+out) for each quantity |
| (a) Interevent time for calls and (b) interevent time for texts for the entire observation period | N/A | Mean*, median*, std*, min*, max*, skewness*, kurtosis* |
| Number of common contacts | N/A | Number of common contacts within the top 5 most called alters; number of common contacts among all alters |
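Two of the feature definitions above are simple enough to sketch directly. The snippet below is a minimal illustration, not the authors' pipeline: the function names are hypothetical, "in" and "out" are taken to be the counts in each direction between the two users of a pair, and the hour boundaries are an assumption read off the factor labels used later in this record (7am–4pm, 5pm–10pm, 11pm–4am, treated as inclusive whole hours).

```python
from collections import Counter


def reciprocity(n_in, n_out):
    """Return |in - out| / (in + out), or None when there is no activity.

    0 means the two directions are balanced, 1 means fully one-sided.
    """
    total = n_in + n_out
    if total == 0:
        return None
    return abs(n_in - n_out) / total


def time_division(hour):
    """Map an hour of day (0-23) to a time division.

    Assumption: inclusive whole-hour boundaries taken from the factor
    labels; under this reading hours 5 and 6 belong to no division.
    """
    if 7 <= hour <= 16:
        return "daytime"
    if 17 <= hour <= 22:
        return "evening"
    if hour >= 23 or hour <= 4:
        return "late-night"
    return None


def division_fractions(hours):
    """Fraction of events in each time division, ignoring unassigned hours."""
    counts = Counter(d for d in map(time_division, hours) if d is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {d: counts[d] / total for d in ("daytime", "evening", "late-night")}
```

For example, `reciprocity(3, 1)` gives 0.5, and a pair whose calls start at hours 8, 9, 18 and 23 has half of its calls in the daytime division under the assumed boundaries.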
Fig 1. Scree plot obtained from PCA.
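For readers unfamiliar with scree plots, the quantity plotted is typically the eigenvalue of each principal component, or the share of total variance it explains, in decreasing order. The sketch below shows that bookkeeping only; the helper name and the eigenvalues are hypothetical and are not taken from the paper.

```python
def scree_values(eigenvalues):
    """Return (proportion, cumulative proportion) of variance per component.

    Eigenvalues are sorted in decreasing order first, as on a scree plot.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    proportions = [ev / total for ev in ordered]
    cumulative = []
    running = 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative


# Hypothetical eigenvalues for illustration only.
props, cum = scree_values([0.5, 3.0, 1.0, 0.5])
```

With these made-up values the first component explains 60% of the variance and the first two together explain 80%.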
Factors accounting for the top 5 principal components in the dataset with the corresponding top 5 features assigned to each and their absolute loadings.
Loadings were obtained by performing an oblimin rotation on the loadings obtained from the PCA.
| Factor | Feature | Absolute loading |
|---|---|---|
| Daytime calls (7am–4pm) | Mean number of daytime calls in a week on weekdays | 0.90 |
| | Number of days with daytime calls on weekdays | 0.89 |
| | Stdev of number of daytime calls in a week on weekdays | 0.82 |
| | Median of number of daytime calls in a week on weekdays | 0.82 |
| | Maximum number of daytime calls in a week on weekdays | 0.80 |
| Evening calls (5pm–10pm) | Number of days with evening calls on weekdays | 0.82 |
| | Number of days with evening calls on weekends | 0.79 |
| | Mean number of evening calls in a week on weekdays | 0.79 |
| | Mean number of evening calls in a week on weekends | 0.77 |
| | Median evening call duration in a week on weekdays | 0.76 |
| Late-night calls (11pm–4am) | Fraction of late-night calls (frequency) on weekdays | 0.89 |
| | Fraction of late-night calls (duration) on weekdays | 0.89 |
| | Fraction of late-night calls (frequency) on weekends | 0.88 |
| | Fraction of late-night calls (duration) on weekends | 0.88 |
| | Maximum number of late-night calls in a week on weekdays | 0.84 |
| Texts | Mean number of daytime texts in a week on weekdays | 0.92 |
| | Mean number of daytime texts in a week on weekends | 0.92 |
| | Mean number of evening texts in a week on weekdays | 0.91 |
| | Number of days with daytime texts on weekdays | 0.90 |
| | Number of days with daytime texts on weekends | 0.90 |
| Texts | Median interevent time for texts | 0.60 |
| | Mean interevent time for texts | 0.58 |
| | Minimum interevent time for texts | 0.55 |
| | Skewness of daytime texts in a week on weekdays | 0.54 |
| | Maximum interevent time for texts | 0.53 |
Fig 2. Test accuracy for the different training sizes and prediction models in identifying opposite-gender peers.
The symbols ○ and × indicate the accuracies obtained using logistic regression and linear SVM, respectively, for feature selection, while the triangles (△) denote accuracies obtained using the full set of features.
Accuracy of the best-performing models (n_train = 20000) in identifying opposite-gender peers (OGP) and predicting the age of the younger user (YUA) for each relationship as inferred from a pair’s age and gender difference.
For peers, + indicates same-gender pairs, while − indicates opposite-gender pairs. Y corresponds to pairs where the younger user is in the age range 18–28; M, 29–45; L, 46–55; O, 56–79. Peers are indicated by “peers”, while parent-child relationships are denoted by “child”. Note that in the latter case, it is the age group of the child that is given, while the parent is at least 20 years older. For the YUA prediction, we also include the composition and accuracy for the subgroups of age group M in which the younger user is below 35 and at least 35 years old.
| Relationship code | Accuracy (%), OGP | Accuracy (%), YUA | % of test set | % of train set, OGP | % of train set, YUA |
|---|---|---|---|---|---|
| −Y peers | 86.1 | 88.8 | 13.8 | 11.6 | 13.3 |
| +Y peers | 38.0 | 85.1 | 4.3 | 5.6 | 4.2 |
| −M peers | 74.8 | 66.0 | 36.4 | 30.1 | 36.6 |
| +M peers | 50.2 | 66.8 | 10.6 | 12.9 | 10.7 |
| −L peers | 40.0 | 82.9 | 7.1 | 5.4 | 6.9 |
| +L peers | 69.3 | 77.5 | 3.2 | 4.3 | 3.3 |
| −O peers | 23.1 | 91.5 | 3.3 | 2.8 | 3.0 |
| +O peers | 80.4 | 88.2 | 1.8 | 2.5 | 2.2 |
| Y child | 56.6 | 63.5 | 6.3 | 7.9 | 6.0 |
| M child | 72.5 | 63.4 | 9.6 | 12.2 | 10.0 |
| L child | 84.0 | 90.9 | 1.3 | 1.7 | 1.4 |
| −M peers (< 35) | | 69.1 | 16.6 | | 16.4 |
| −M peers (≥ 35) | | 63.4 | 19.8 | | 20.2 |
| +M peers (< 35) | | 63.3 | 4.5 | | 4.0 |
| +M peers (≥ 35) | | 69.4 | 6.1 | | 6.6 |
| M child (< 35) | | 41.4 | 4.4 | | 4.3 |
| M child (≥ 35) | | 81.9 | 5.3 | | 5.7 |
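A per-relationship breakdown like the one above can be produced from any classifier's test predictions by grouping on the relationship code. The following is a minimal sketch under that assumption; the function name and the toy records are hypothetical, and the numbers it yields are unrelated to the reported results.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Summarise predictions per group.

    records: iterable of (group, y_true, y_pred) tuples.
    Returns {group: (accuracy in %, share of all records in %)}.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for group, y_true, y_pred in records:
        counts[group] += 1
        hits[group] += int(y_true == y_pred)
    n = sum(counts.values())
    return {
        g: (100.0 * hits[g] / counts[g], 100.0 * counts[g] / n)
        for g in counts
    }


# Hypothetical toy predictions (1 = opposite-gender peers, 0 = otherwise).
toy = [
    ("-Y peers", 1, 1),
    ("-Y peers", 1, 0),
    ("+Y peers", 0, 0),
    ("Y child", 0, 1),
]
summary = accuracy_by_group(toy)
```

Reporting the share next to the accuracy matters here because small groups can show extreme accuracies while contributing little to the overall figure.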
Fig 3. Relative frequencies for each relationship category among (a) peers and (b) parent-child pairs showing the probabilities that users in a particular pair are opposite-gender peers.
The dashed vertical line shows p = 0.5, the case where the classifier assigns equal probabilities to whether users in a pair are opposite-gender peers or not.
Accuracy (in %) in predicting if a pair of peers is opposite-gender for peers in different age groups using the full and age-restricted training sets.
The OGP columns give the accuracy among opposite-gender peers, the SGP columns among same-gender peers, and the OGP+SGP columns among all peers in the given age group.
| Peer age group | OGP+SGP, full | OGP+SGP, age-restricted | OGP only, full | OGP only, age-restricted | SGP only, full | SGP only, age-restricted |
|---|---|---|---|---|---|---|
| Y | 74.6 | 70.2 | 86.1 | 75.9 | 38.0 | 52.6 |
| M | 69.3 | 64.5 | 74.8 | 67.6 | 50.2 | 53.8 |
| L | 49.2 | 58.2 | 40.0 | 57.0 | 69.3 | 60.5 |
| O | 43.3 | 58.4 | 23.1 | 60.4 | 80.4 | 54.6 |
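The OGP+SGP accuracies for the full training set can be cross-checked as share-weighted averages of the OGP-only and SGP-only accuracies. The sketch below does this with the values reported in this record; the weights are the shares of the − and + peer rows in the relationship table, read here as shares of the test set, and the helper name is hypothetical.

```python
# Per age group:
# (OGP accuracy, SGP accuracy, OGP test share, SGP test share, reported OGP+SGP)
rows = {
    "Y": (86.1, 38.0, 13.8, 4.3, 74.6),
    "M": (74.8, 50.2, 36.4, 10.6, 69.3),
    "L": (40.0, 69.3, 7.1, 3.2, 49.2),
    "O": (23.1, 80.4, 3.3, 1.8, 43.3),
}


def pooled_accuracy(acc_a, acc_b, share_a, share_b):
    """Accuracy over two groups combined, weighting each by its share."""
    return (acc_a * share_a + acc_b * share_b) / (share_a + share_b)


recomputed = {
    group: pooled_accuracy(a, b, share_a, share_b)
    for group, (a, b, share_a, share_b, _reported) in rows.items()
}
```

The recomputed values agree with the reported OGP+SGP column to within roughly 0.1 percentage points, which is consistent with rounding in the published shares. The same weighting explains why groups L and O score below 50% overall with the full training set: opposite-gender peers dominate those groups and are the ones classified poorly.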
Fig 4. Test accuracy for the different training sizes and prediction models in identifying whether the younger user of a pair is less than 35 years of age or not.
The symbols ○ and × indicate the accuracies obtained using logistic regression and linear SVM, respectively, for feature selection, while the triangles (△) denote accuracies obtained using the full set of features.
Fig 5. Relative frequencies for each relationship category among (a) peers and (b) parent-child pairs showing the probabilities that the younger user of a particular pair is <35 years old.
The “y” and “o” suffixes refer to the subgroups of pairs in the M age group in which the younger user is below 35 and at least 35 years old, respectively. The dashed vertical line shows p = 0.5, the case where the classifier assigns equal probabilities to the younger user’s age being below or at least 35.