Anatoliy Gruzd, Priya Kumar, Deena Abul-Fottouh, Caroline Haythornthwaite.
Abstract
As social media become a staple for knowledge discovery and sharing, questions arise about how self-organizing communities manage learning outside the domain of organized, authority-led institutions. Yet examination of such communities is challenged by the quantity of posts and variety of media now used for learning. This paper addresses the challenges of identifying (1) what information, communication, and discursive practices support successful online communities, (2) whether such practices are similar on Twitter and Reddit, and (3) whether machine learning classifiers can be successfully used to analyze larger datasets of learning exchanges. This paper builds on earlier work that used manual coding of learning and exchange in Reddit 'Ask' communities to derive a coding schema we refer to as 'learning in the wild'. This schema comprises eight categories: explanation with disagreement, agreement, or neutral presentation; socializing with negative or positive intent; information seeking; providing resources; and comments about forum rules and norms. To compare across media, results from coding Reddit's AskHistorians are compared to results from coding a sample of #Twitterstorians tweets (n = 594). High agreement between coders affirmed the applicability of the coding schema to this different medium. LIWC lexicon-based text analysis was used to build machine learning classifiers and apply these to code a larger dataset of tweets (n = 69,101). This research shows that the 'learning in the wild' coding schema holds across at least two different platforms, and is partially scalable to study larger online learning communities.
Keywords: #Twitterstorians; AskHistorians; Content analysis; Knowledge exchange; Machine learning; Reddit; Social media; Twitter
Year: 2020 PMID: 33343085 PMCID: PMC7731652 DOI: 10.1007/s10606-020-09376-y
Source DB: PubMed Journal: Comput Support Coop Work ISSN: 0925-9724 Impact factor: 1.825
‘Learning in the Wild’ Coding Schema.
| Code | Definition | Linguistic Dialogue Example |
|---|---|---|
| Explanation with Disagreement | Expresses a NEGATIVE take on the content of the previous posts by adding new ideas or facts to the discussion thread | ‘But’, ‘I disagree’, ‘not sure’, ‘not exactly’, with explanation/judgement/reasoning/etc. |
| Explanation with Agreement | Expresses a POSITIVE take on the content of the previous posts by adding new ideas or facts to the discussion thread | ‘Indeed’, ‘also’, ‘I agree’, with explanation/judgement/reasoning/etc. |
| Explanation with Neutral Presentation | Expresses a NEUTRAL explanation/judgement/reasoning/etc., with neither negative nor positive reference to the content of the previous posts, nor necessarily any reference to previous posts | ‘I can understand’, ‘interesting’, ‘depends on…’, or statement responses |
| Socializing with Negative Intent | Socializing that expresses negative affect through tone, words, insults, or expletives intended as abusive | ‘No’, ‘you’re an idiot’, ‘this has been explained multiple times’ |
| Socializing with Positive Intent | Socializing that expresses positive affect through tone, words, praise, humour, or irony intended in a positive way | ‘Thanks’, ‘great feedback’, ‘you’re correct’ |
| Information Seeking | Postings asking questions or soliciting opinions, resources, etc. This does not include questions answered rhetorically within the post, e.g., a question asked and answered | ‘Does anyone know?’, ‘Can anyone explain?’ |
| Providing Resources | Postings that include a direct reference to a URL, book, article, etc.; postings that call upon a well-known theory or the name of a well-known figure | Link to a resource (book, URL, article, audio/video file); referencing a theory/theorist, scholar, or public work (Einstein, Newton, Freud) |
| Rules and Norms | Postings on topics such as what is appropriate for a particular discussion, what language is appropriate to use, how to back up claims by using resources, using hashtags, etc. | ‘See/don’t forget link’, ‘this post doesn’t belong here’, acknowledging OP/HT Twitter users, hashtags and bots |
Figure 1. Process of Building Classifiers Using Ensemble Stacking Technique.
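The ensemble stacking shown in Figure 1 can be sketched with scikit-learn. The base learners (RF, SVM, LR, GBM) mirror those named in the paper's tables, but the toy data, hyperparameters, and the choice of logistic regression as the meta-learner are assumptions for illustration, not the study's configuration.

```python
# Minimal sketch of stacking RF, SVM, LR, and GBM base learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy stand-in for LIWC feature vectors labelled with one binary code.
X, y = make_classification(n_samples=400, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

base_learners = [
    ("rf", RandomForestClassifier(random_state=42)),
    ("svm", SVC(probability=True, random_state=42)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("gbm", GradientBoostingClassifier(random_state=42)),
]

# The meta-learner is trained on the base learners' out-of-fold
# predictions, produced by internal cross-validation.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(X_train, y_train)
accuracy = stack.score(X_test, y_test)
```

Stacking lets the meta-learner weight each base model's predictions, which is why the stacked rows in the comparison table below typically match or beat every individual classifier.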
Manual Coding Results: #Twitterstorians vs Reddit’s AskHistorians.
| Code | #Twitterstorians (n = 594) | Reddit AskHistorians |
|---|---|---|
| Explanation with Disagreement | 3 (1%) | 71 (6%) |
| Explanation with Agreement | 4 (1%) | 45 (4%) |
| Explanation with Neutral Presentation | 73 (12%) | 592 (48%) |
| Socializing with Negative Intent | 1 (0%) | 4 (0%) |
| Socializing with Positive Intent | 99 (17%) | 204 (17%) |
| Information Seeking | 100 (17%) | 274 (22%) |
| Providing Resources | 223 (38%) | 260 (21%) |
| Rules and Norms | 22 (4%) | 66 (5%) |
| Krippendorff’s Alpha | 0.65 (73%) | 0.76 (79%) |
*Messages were classified under a particular code only if the two coders agreed. Percentages add up to over 100% because coders could assign up to three codes per message. Percentages are rounded to the nearest 1%
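The Krippendorff's alpha values in the table can be illustrated with a small from-scratch sketch of nominal-level alpha for exactly two coders with no missing data, built from a coincidence matrix. The example labels are made up; they are not the study's data, and the study may have used a different alpha implementation.

```python
# Nominal Krippendorff's alpha for two coders, complete data.
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(coder_a, coder_b):
    """alpha = 1 - observed disagreement / expected disagreement."""
    pairs = list(zip(coder_a, coder_b))
    n = 2 * len(pairs)  # total number of pairable values
    # Coincidence counts: each unit contributes its value pair in both orders.
    coincidence = Counter()
    totals = Counter()
    for a, b in pairs:
        coincidence[(a, b)] += 1
        coincidence[(b, a)] += 1
        totals[a] += 1
        totals[b] += 1
    observed = sum(v for (c, k), v in coincidence.items() if c != k) / n
    expected = sum(totals[c] * totals[k]
                   for c, k in permutations(totals, 2)) / (n * (n - 1))
    return 1.0 - observed / expected

# Perfect agreement yields alpha = 1.0.
print(krippendorff_alpha_nominal(["IS", "PR", "PR"], ["IS", "PR", "PR"]))  # → 1.0
```

Alpha discounts agreement expected by chance, which is why it sits below the raw percent agreement reported in parentheses in the table.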
Figure 2. Examples of Tweets for Frequently Found Categories.
Performance Measures of Machine Learning Classifiers.
| Code | Classifier | Mean Accuracy over Ten Folds | Accuracy Standard Deviation over Ten Folds | Precision | Recall | F1 Score (Harmonic Mean) |
|---|---|---|---|---|---|---|
| Explanation with Neutral Presentation | Stacking of RF, SVM, Tree Bagging, LR, and GBM | 0.9164 | 0.0109 | 0.9000 | 0.4091 | 0.5625 |
| Socializing with Positive Intent | Stacking of RF, SVM, Tree Bagging, LR, Naïve Bayes, and GBM | 0.8836 | 0.0178 | 0.4615 | 0.207 | 0.286 |
| Information Seeking | Stacking of RF, SVM, Tree Bagging, LR, Naïve Bayes, and GBM | 0.9750 | 0.0106 | 0.8500 | 0.5667 | 0.6800 |
| Providing Resources | Stacking of RF, SVM, Tree Bagging, LR, and Naïve Bayes | 0.8728 | 0.0192 | 0.7308 | 0.5588 | 0.6333 |
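The precision, recall, and F1 columns above follow the standard definitions; a minimal sketch with made-up binary labels (1 = code applies, 0 = it does not) shows how each is computed.

```python
# Standard binary classification metrics on illustrative labels.
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # coder-assigned labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # classifier predictions

precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3/4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3/4
f1 = f1_score(y_true, y_pred)                # harmonic mean of the two
```

High accuracy with low recall, as in the Socializing with Positive Intent row, is typical when the positive class is rare: the classifier can be right most of the time while still missing many true instances of the code.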
Distribution of ‘Learning in the Wild’ Codes in 2017 versus 2018 Datasets.
| Code | Manual coding: % of tweets in the 2017 dataset (n = 594) | Machine learning: % of tweets in the 2018 dataset (n = 69,101) |
|---|---|---|
| Explanation with Neutral Presentation | 12% | 7% |
| Socializing with Positive Intent | 17% | 15% |
| Information Seeking | 17% | 17% |
| Providing Resources | 38% | 33% |
Figure 3. Sample Tweets as Categorized.
Comparing Performance across Classifiers
| Classifier Model | Mean Accuracy over Ten Folds | Accuracy Standard Deviation over Ten Folds |
|---|---|---|
| Code: Explanation with Neutral Presentation | ||
| RF | 0.8983 | 0.0226 |
| SVM | 0.9065 | 0.0240 |
| Tree Bagging | 0.8954 | 0.0266 |
| LR | 0.8974 | 0.0308 |
| GBM | 0.9007 | 0.0282 |
| Stacking of RF, SVM, Tree Bagging, LR, and GBM | 0.9164 | 0.0109 |
| Code: Socializing with Positive Intent | ||
| RF | 0.8468 | 0.0326 |
| SVM | 0.8467 | 0.0321 |
| Tree Bagging | 0.8434 | 0.0363 |
| LR | 0.8360 | 0.0406 |
| Naïve Bayes | 0.8346 | 0.0380 |
| GBM | 0.8467 | 0.0276 |
| Stacking of RF, SVM, Tree Bagging, LR, Naïve Bayes, and GBM | 0.8836 | 0.0178 |
| Code: Information Seeking | ||
| RF | 0.9484 | 0.0263 |
| SVM | 0.9465 | 0.0300 |
| Tree Bagging | 0.9345 | 0.0355 |
| LR | 0.9407 | 0.0308 |
| Naïve Bayes | 0.8935 | 0.0440 |
| GBM | 0.9495 | 0.0291 |
| Stacking of RF, SVM, Tree Bagging, LR, Naïve Bayes, and GBM | 0.9750 | 0.0106 |
| Code: Providing Resources | ||
| RF | 0.8116 | 0.0641 |
| SVM | 0.8025 | 0.0635 |
| Tree Bagging | 0.7730 | 0.0633 |
| LR | 0.7875 | 0.0725 |
| Naïve Bayes | 0.7638 | 0.0638 |
| Stacking of RF, SVM, Tree Bagging, LR, and Naïve Bayes | 0.8728 | 0.0192 |
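The mean-accuracy and standard-deviation columns above come from ten-fold cross-validation. A sketch of that procedure for two of the named classifiers follows; the toy data and default hyperparameters are assumptions, not the study's setup.

```python
# Ten-fold cross-validation: mean and standard deviation of accuracy.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy stand-in for the labelled LIWC feature matrix.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

results = {}
for name, model in [("SVM", SVC()), ("LR", LogisticRegression(max_iter=1000))]:
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
```

Reporting the fold-to-fold standard deviation alongside the mean, as the table does, shows how stable each classifier is across resamples; note the stacked models have both the highest means and the lowest standard deviations.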
Lexical Features Predicting Codes of the ‘Learning in the Wild’ Schema.
All feature columns after ‘Codes’ are LIWC categories.
| Codes | Linguistic Processes | Other Grammar | Psychological Processes | Time Orientations | Personal Concerns | Punctuation |
|---|---|---|---|---|---|---|
| Explanation with Neutral Presentation | Analytic; Authentic; Words per sentence; Words >6 letters; Dictionary words; Total function words; Auxiliary verbs; Prepositions | Common verbs; Number | Power; Drives; Relativity | Past focus; Space; Time | Death; Informal language; Netspeak | All punctuation; Other punctuation; Period; Comma |
| Socializing with Positive Intent | Analytic; Tone; Words per sentence; Words >6 letters; Dictionary words; 1st person singular | Common verbs | Positive emotion | Present focus | Work | Exclamation marks; Other punctuation |
| Information Seeking | Analytic; Words per sentence; Words >6 letters; Dictionary words; Impersonal pronouns; Auxiliary verbs | Common verbs; Interrogatives | Social processes; Tentative | | Informal language; Netspeak | Period; Question marks; Other punctuation |
| Providing Resources | Word count; Analytic; Clout; Authentic; Words per sentence; Words >6 letters; Dictionary words; Total pronouns; Articles; Prepositions; Auxiliary verbs | Common verbs; Common adjectives; Number | Affective processes; Social processes; Drives | Past focus; Present focus; Space; Time | Informal language; Netspeak | Period; Comma; Colon; Question mark; Other punctuation |
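LIWC itself is a proprietary lexicon, but a few of the surface features in the table above (word count, words per sentence, words longer than six letters, question marks) can be approximated with plain string processing. This is an illustrative approximation on a made-up tweet, not the LIWC implementation.

```python
# Rough surface-feature extraction approximating a handful of LIWC counts.
import re

def surface_features(text):
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "word_count": len(words),
        "words_per_sentence": len(words) / max(len(sentences), 1),
        "words_gt6": sum(1 for w in words if len(w) > 6),
        "question_marks": text.count("?"),
    }

features = surface_features(
    "Does anyone know a good source? I found two useful archives.")
```

Features like question marks and interrogatives lining up with the Information Seeking code, and punctuation such as colons lining up with Providing Resources (links, citations), is consistent with how those codes read in the examples.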