Diana Paula Dudău, Florin Alin Sava.
Abstract
Today, there is a range of computer-aided techniques to convert text into data. However, compared with traditional content analysis, these techniques have vulnerabilities as well as strengths. One challenge that has gained increasing attention is performing automatic language analysis so as to make sound inferences in a multilingual assessment setting. The current study is the first to test the equivalence of multiple language versions of Linguistic Inquiry and Word Count 2015 (LIWC2015), one of the most appealing and widely used lexicon-based tools worldwide. For this purpose, we employed supervised learning in a classification problem and computed Pearson's correlations and intraclass correlation coefficients on a large corpus of parallel texts in English, Dutch, Brazilian Portuguese, and Romanian. Our findings suggested that LIWC2015 is a valuable tool for multilingual analysis, but within-language standardization is needed when the aim is to analyze texts sourced from different languages.
Keywords: LIWC; LIWC2015; Linguistic Inquiry and Word Count; automatic text analysis; content analyses; multilingual analysis
Year: 2021 PMID: 34322047 PMCID: PMC8311520 DOI: 10.3389/fpsyg.2021.570568
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Pearson's correlation coefficients for the word percentages obtained with the four LIWC2015 dictionaries.
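| LIWC2015 category | Eng-Du | Eng-BP | Eng-Ro | Du-BP | Du-Ro | BP-Ro |
|---|---|---|---|---|---|---|
| I | 0.99 | 0.89 | 0.88 | 0.89 | 0.88 | 0.86 |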
| We | 0.97 | 0.72 | 0.75 | 0.72 | 0.75 | 0.68 |
| You | 0.93 | 0.71 | 0.65 | 0.71 | 0.64 | 0.50 |
| She and he | 0.74 | 0.41 | 0.30 | 0.33 | 0.33 | 0.21 |
| They | 0.83 | 0.70 | 0.60 | 0.65 | 0.58 | 0.54 |
| Impersonal | 0.73 | 0.69 | 0.50 | 0.67 | 0.55 | 0.51 |
| Articles | 0.82 | 0.64 | 0.45 | 0.59 | 0.43 | 0.33 |
| Prepositions | 0.77 | 0.65 | 0.64 | 0.64 | 0.61 | 0.66 |
| Auxiliary verbs | 0.74 | 0.77 | 0.18 | 0.65 | 0.20 | −0.01 |
| Adverbs | 0.58 | 0.61 | 0.34 | 0.67 | 0.31 | 0.31 |
| Conjunctions | 0.55 | 0.69 | 0.36 | 0.52 | 0.30 | 0.30 |
| Negations | 0.95 | 0.86 | 0.92 | 0.86 | 0.90 | 0.80 |
| Verbs | 0.75 | 0.72 | 0.73 | 0.64 | 0.65 | 0.64 |
| Adjectives | 0.46 | 0.68 | 0.49 | 0.37 | 0.31 | 0.44 |
| Comparisons | 0.64 | 0.68 | 0.57 | 0.54 | 0.45 | 0.45 |
| Interrogatives | 0.66 | 0.60 | 0.67 | 0.49 | 0.55 | 0.68 |
| Numbers | 0.91 | 0.72 | 0.77 | 0.68 | 0.73 | 0.81 |
| Quantifiers | 0.61 | 0.63 | 0.66 | 0.53 | 0.53 | 0.51 |
| Positive | 0.83 | 0.64 | 0.80 | 0.60 | 0.77 | 0.65 |
| Negative | 0.86 | 0.83 | 0.85 | 0.75 | 0.77 | 0.76 |
| Anxiety | 0.84 | 0.82 | 0.86 | 0.74 | 0.77 | 0.77 |
| Anger | 0.83 | 0.82 | 0.83 | 0.74 | 0.77 | 0.76 |
| Sadness | 0.69 | 0.71 | 0.60 | 0.53 | 0.44 | 0.52 |
| Family | 0.77 | 0.88 | 0.80 | 0.70 | 0.63 | 0.81 |
| Friend | 0.57 | 0.56 | 0.47 | 0.52 | 0.59 | 0.53 |
| Female | 0.81 | 0.85 | 0.67 | 0.76 | 0.58 | 0.67 |
| Male | 0.85 | 0.70 | 0.60 | 0.65 | 0.56 | 0.51 |
| Insight | 0.81 | 0.81 | 0.73 | 0.70 | 0.66 | 0.65 |
| Causation | 0.75 | 0.58 | 0.60 | 0.53 | 0.59 | 0.52 |
| Discrepancy | 0.66 | 0.64 | 0.72 | 0.52 | 0.59 | 0.68 |
| Tentative | 0.73 | 0.63 | 0.72 | 0.50 | 0.61 | 0.62 |
| Certainty | 0.60 | 0.65 | 0.47 | 0.56 | 0.50 | 0.52 |
| Difference | 0.77 | 0.76 | 0.69 | 0.73 | 0.65 | 0.66 |
| See | 0.86 | 0.80 | 0.88 | 0.75 | 0.85 | 0.77 |
| Hear | 0.96 | 0.82 | 0.96 | 0.83 | 0.94 | 0.79 |
| Feel | 0.81 | 0.60 | 0.79 | 0.59 | 0.65 | 0.54 |
| Body | 0.90 | 0.90 | 0.89 | 0.87 | 0.86 | 0.87 |
| Health | 0.91 | 0.91 | 0.94 | 0.86 | 0.88 | 0.90 |
| Sexual | 0.89 | 0.85 | 0.90 | 0.87 | 0.87 | 0.86 |
| Ingest | 0.93 | 0.71 | 0.91 | 0.70 | 0.87 | 0.68 |
| Affiliation | 0.90 | 0.74 | 0.60 | 0.68 | 0.51 | 0.65 |
| Achievement | 0.76 | 0.83 | 0.66 | 0.71 | 0.59 | 0.64 |
| Power | 0.71 | 0.75 | 0.15 | 0.70 | 0.11 | 0.12 |
| Reward | 0.45 | 0.35 | 0.56 | 0.28 | 0.42 | 0.45 |
| Risk | 0.81 | 0.55 | 0.62 | 0.51 | 0.57 | 0.41 |
| Past | 0.81 | 0.88 | 0.86 | 0.77 | 0.73 | 0.81 |
| Present | 0.74 | 0.75 | 0.78 | 0.56 | 0.64 | 0.64 |
| Future | 0.60 | 0.67 | 0.73 | 0.53 | 0.56 | 0.63 |
| Motion | 0.52 | 0.62 | 0.69 | 0.39 | 0.43 | 0.53 |
| Space | 0.80 | 0.76 | 0.81 | 0.67 | 0.76 | 0.73 |
| Time | 0.76 | 0.64 | 0.84 | 0.66 | 0.74 | 0.67 |
| Work | 0.81 | 0.87 | 0.87 | 0.76 | 0.75 | 0.82 |
| Leisure | 0.92 | 0.65 | 0.97 | 0.62 | 0.91 | 0.64 |
| Home | 0.88 | 0.85 | 0.83 | 0.81 | 0.78 | 0.73 |
| Money | 0.91 | 0.89 | 0.94 | 0.84 | 0.89 | 0.89 |
| Religion | 0.89 | 0.90 | 0.93 | 0.83 | 0.85 | 0.87 |
| Death | 0.88 | 0.88 | 0.93 | 0.82 | 0.86 | 0.85 |
| Swear | 0.40 | 0.38 | 0.28 | 0.25 | 0.24 | 0.23 |
| Net speak | 0.59 | 0.14 | 0.46 | 0.32 | 0.16 | 0.11 |
| Agreement | 0.79 | 0.66 | 0.78 | 0.59 | 0.69 | 0.65 |
| Non-fluencies | 0.50 | 0.27 | 0.76 | 0.41 | 0.48 | 0.21 |
| Filler words | 0.16 | 0.12 | 0.25 | 0.14 | 0.12 | 0.01 |
Eng, English; Du, Dutch; BP, Brazilian Portuguese; Ro, Romanian; N = 7,012; n = 1,753 transcripts per language; p < 0.01.
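Each cell in the table above is a Pearson correlation between the percentages that two LIWC2015 dictionaries assign to the same category across parallel transcripts. A minimal sketch of that computation follows, assuming the two lists are aligned transcript by transcript. The sample values are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical percentages of one category for the same five transcripts,
# scored once with the English dictionary and once with the Dutch one.
eng = [2.1, 3.4, 0.8, 5.0, 1.9]
du = [2.0, 3.6, 0.9, 4.7, 2.2]
r = pearson_r(eng, du)
```

Because the correlation ignores differences in mean and scale, two dictionaries can correlate highly on a category while still producing very different average percentages.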
Means (M) and standard deviations (SD) for the word percentages obtained with each LIWC2015 dictionary and the intraclass correlation coefficients (ICC) for the between-dictionary agreement.
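| LIWC2015 category | Eng M | Eng SD | Du M | Du SD | BP M | BP SD | Ro M | Ro SD | ICC [CI] |
|---|---|---|---|---|---|---|---|---|---|
| I | 2.78 | 2.21 | 2.81 | 2.27 | 1.87 | 1.63 | 1.17 | 1.12 | 0.83 [0.82, 0.85] |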
| We | 2.13 | 1.26 | 2.12 | 1.26 | 1.07 | 0.72 | 0.95 | 0.63 | 0.71 [0.69, 0.72] |
| You | 1.87 | 1.25 | 1.82 | 1.27 | 1.59 | 0.93 | 1.05 | 0.82 | 0.69 [0.67, 0.70] |
| She and he | 0.74 | 0.95 | 1.72 | 1.00 | 7.07 | 1.38 | 3.19 | 1.13 | 0.36 [0.33, 0.38] |
| They | 1.12 | 0.72 | 1.45 | 0.85 | 2.58 | 0.94 | 1.18 | 0.56 | 0.63 [0.61, 0.65] |
| Impersonal | 7.28 | 1.83 | 6.81 | 1.41 | 17.96 | 2.28 | 3.73 | 0.96 | 0.56 [0.54, 0.58] |
| Articles | 7.40 | 1.49 | 9.59 | 1.82 | 12.64 | 1.82 | 4.27 | 0.98 | 0.54 [0.52, 0.56] |
| Prepositions | 13.34 | 1.88 | 14.11 | 2.00 | 16.10 | 2.28 | 12.60 | 1.89 | 0.65 [0.63, 0.67] |
| Auxiliary verbs | 8.93 | 1.66 | 7.20 | 1.41 | 6.95 | 1.35 | 4.45 | 1.13 | 0.45 [0.43, 0.48] |
| Adverbs | 5.84 | 1.42 | 8.31 | 1.60 | 13.30 | 2.18 | 7.39 | 2.13 | 0.44 [0.41, 0.46] |
| Conjunctions | 7.25 | 1.41 | 7.50 | 1.39 | 11.85 | 1.85 | 4.59 | 1.94 | 0.43 [0.40, 0.45] |
| Negations | 1.25 | 0.62 | 1.34 | 0.62 | 1.54 | 0.64 | 1.65 | 0.84 | 0.86 [0.85, 0.87] |
| Verbs | 16.33 | 2.68 | 15.59 | 2.21 | 13.97 | 2.11 | 17.69 | 2.66 | 0.68 [0.66, 0.70] |
| Adjectives | 4.17 | 1.10 | 6.84 | 1.33 | 4.09 | 1.01 | 7.71 | 1.51 | 0.43 [0.40, 0.45] |
| Comparisons | 2.29 | 0.79 | 3.51 | 0.88 | 2.85 | 0.82 | 2.43 | 0.87 | 0.55 [0.53, 0.58] |
| Interrogatives | 1.92 | 0.67 | 1.51 | 0.62 | 5.81 | 1.21 | 3.37 | 0.89 | 0.55 [0.53, 0.58] |
| Numbers | 1.97 | 1.01 | 2.02 | 1.00 | 4.44 | 1.16 | 4.62 | 1.20 | 0.76 [0.75, 0.78] |
| Quantifiers | 2.36 | 0.75 | 2.08 | 0.71 | 2.58 | 0.76 | 1.78 | 0.62 | 0.57 [0.55, 0.60] |
| Positive | 2.86 | 1.39 | 2.33 | 1.08 | 2.64 | 0.96 | 3.93 | 1.49 | 0.70 [0.68, 0.72] |
| Negative | 1.27 | 0.87 | 1.12 | 0.68 | 1.53 | 0.89 | 2.37 | 1.20 | 0.76 [0.74, 0.77] |
| Anxiety | 0.23 | 0.30 | 0.22 | 0.25 | 0.20 | 0.26 | 0.33 | 0.36 | 0.78 [0.77, 0.80] |
| Anger | 0.32 | 0.38 | 0.26 | 0.29 | 0.30 | 0.36 | 0.58 | 0.55 | 0.74 [0.72, 0.75] |
| Sadness | 0.25 | 0.30 | 0.29 | 0.27 | 0.29 | 0.28 | 0.57 | 0.44 | 0.54 [0.52, 0.56] |
| Family | 0.28 | 0.44 | 0.43 | 0.64 | 0.31 | 0.46 | 0.46 | 0.58 | 0.73 [0.71, 0.75] |
| Friend | 0.17 | 0.22 | 0.15 | 0.17 | 0.20 | 0.20 | 0.19 | 0.22 | 0.53 [0.51, 0.55] |
| Female | 0.48 | 0.88 | 1.33 | 1.02 | 0.80 | 0.78 | 0.81 | 0.57 | 0.70 [0.68, 0.72] |
| Male | 0.74 | 0.92 | 1.69 | 0.93 | 1.31 | 0.78 | 1.09 | 0.60 | 0.64 [0.62, 0.66] |
| Insight | 2.47 | 0.92 | 2.96 | 0.88 | 2.20 | 0.85 | 3.09 | 1.16 | 0.71 [0.69, 0.72] |
| Causation | 1.98 | 0.74 | 2.11 | 0.75 | 3.76 | 0.89 | 3.52 | 1.03 | 0.57 [0.55, 0.59] |
| Discrepancy | 1.46 | 0.64 | 2.70 | 0.86 | 2.94 | 0.92 | 2.52 | 1.00 | 0.61 [0.59, 0.63] |
| Tentative | 2.48 | 0.89 | 2.72 | 0.84 | 3.54 | 1.00 | 4.41 | 1.25 | 0.61 [0.59, 0.64] |
| Certainty | 1.40 | 0.53 | 1.60 | 0.55 | 1.69 | 0.56 | 2.01 | 0.74 | 0.53 [0.50, 0.55] |
| Difference | 3.05 | 0.90 | 3.11 | 0.92 | 4.07 | 1.04 | 3.81 | 1.07 | 0.70 [0.69, 0.72] |
| See | 1.25 | 0.91 | 1.06 | 0.75 | 1.38 | 0.85 | 1.33 | 0.96 | 0.81 [0.80, 0.83] |
| Hear | 1.13 | 2.85 | 1.03 | 2.46 | 1.14 | 1.66 | 1.80 | 5.55 | 0.69 [0.67, 0.71] |
| Feel | 0.40 | 0.38 | 0.35 | 0.33 | 0.54 | 0.42 | 0.45 | 0.41 | 0.65 [0.63, 0.67] |
| Body | 0.61 | 0.75 | 0.47 | 0.58 | 0.59 | 0.72 | 0.71 | 0.80 | 0.87 [0.86, 0.88] |
| Health | 0.78 | 0.93 | 0.61 | 0.66 | 0.81 | 0.90 | 0.77 | 0.97 | 0.88 [0.87, 0.89] |
| Sexual | 0.11 | 0.29 | 0.08 | 0.23 | 0.09 | 0.27 | 0.10 | 0.33 | 0.85 [0.84, 0.86] |
| Ingest | 0.34 | 0.56 | 0.26 | 0.43 | 0.94 | 0.53 | 0.45 | 0.62 | 0.78 [0.77, 0.80] |
| Affiliation | 3.08 | 1.43 | 3.10 | 1.44 | 2.11 | 0.99 | 1.65 | 0.88 | 0.66 [0.64, 0.68] |
| Achievement | 1.45 | 0.70 | 1.39 | 0.63 | 1.52 | 0.65 | 2.93 | 0.98 | 0.66 [0.64, 0.68] |
| Power | 2.32 | 1.04 | 2.09 | 0.98 | 2.45 | 0.98 | 4.15 | 3.20 | 0.17 [0.15, 0.19] |
| Reward | 1.19 | 0.58 | 0.83 | 0.58 | 2.49 | 0.73 | 1.03 | 0.54 | 0.40 [0.37, 0.42] |
| Risk | 0.47 | 0.39 | 0.48 | 0.37 | 1.79 | 0.60 | 0.99 | 0.62 | 0.52 [0.49, 0.54] |
| Past | 3.82 | 1.88 | 5.47 | 1.77 | 2.91 | 1.55 | 8.16 | 2.54 | 0.77 [0.76, 0.79] |
| Present | 11.17 | 2.64 | 12.72 | 2.10 | 8.78 | 1.84 | 10.76 | 2.42 | 0.68 [0.66, 0.70] |
| Future | 1.11 | 0.56 | 2.29 | 0.79 | 0.83 | 0.50 | 0.97 | 0.62 | 0.59 [0.57, 0.61] |
| Motion | 2.09 | 0.78 | 1.72 | 0.76 | 4.14 | 1.03 | 2.32 | 0.87 | 0.52 [0.50, 0.54] |
| Space | 7.31 | 1.78 | 6.60 | 1.49 | 6.65 | 1.58 | 9.99 | 2.26 | 0.73 [0.71, 0.75] |
| Time | 4.46 | 1.55 | 5.02 | 1.22 | 6.09 | 1.33 | 6.00 | 1.69 | 0.71 [0.69, 0.73] |
| Work | 2.45 | 1.50 | 2.02 | 1.16 | 2.15 | 1.23 | 2.77 | 1.44 | 0.81 [0.79, 0.82] |
| Leisure | 1.09 | 3.03 | 0.79 | 2.42 | 0.86 | 1.02 | 1.17 | 2.75 | 0.74 [0.72, 0.75] |
| Home | 0.29 | 0.33 | 0.21 | 0.27 | 0.27 | 0.31 | 0.32 | 0.40 | 0.79 [0.78, 0.80] |
| Money | 0.64 | 0.80 | 0.52 | 0.60 | 0.68 | 0.72 | 0.66 | 0.81 | 0.88 [0.87, 0.89] |
| Religion | 0.19 | 0.42 | 0.17 | 0.34 | 0.21 | 0.41 | 0.22 | 0.46 | 0.87 [0.86, 0.88] |
| Death | 0.18 | 0.32 | 0.15 | 0.25 | 0.18 | 0.29 | 0.20 | 0.35 | 0.86 [0.85, 0.87] |
| Swear | 0.03 | 0.08 | 0.03 | 0.08 | 0.01 | 0.04 | 0.05 | 0.11 | 0.26 [0.24, 0.29] |
| Net speak | 0.09 | 0.43 | 0.46 | 0.60 | 0.21 | 0.38 | 0.16 | 0.28 | 0.30 [0.27, 0.32] |
| Agreement | 0.17 | 0.31 | 0.14 | 0.26 | 0.37 | 0.36 | 0.25 | 0.33 | 0.68 [0.66, 0.70] |
| Non-fluencies | 0.20 | 0.24 | 0.03 | 0.11 | 0.19 | 0.22 | 0.07 | 0.20 | 0.39 [0.37, 0.42] |
| Filler words | 0.01 | 0.05 | 1.71 | 1.15 | 0.65 | 0.44 | 0.01 | 0.03 | 0.04 [0.02, 0.06] |
Eng, English; Du, Dutch; BP, Brazilian Portuguese; Ro, Romanian; N = 7,012; n = 1,753 transcripts per language.
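The record does not state which ICC form was used, so the following is only an illustrative sketch. It computes two common single-measure forms from a two-way ANOVA decomposition, treating transcripts as targets and the language versions as raters: a consistency form, ICC(3,1), and an absolute-agreement form, ICC(2,1). The function name and the sample data are hypothetical.

```python
def icc_two_way(ratings):
    """Return (consistency, agreement) single-measure ICCs.

    ratings: list of rows, one per target (e.g. transcript); each row holds
    k scores, one per rater (e.g. language version of the dictionary).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k table with n, k >= 2")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for targets (rows), raters (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    agreement = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    return consistency, agreement


# Hypothetical scores: each column equals the first plus a constant offset,
# so the versions rank transcripts identically but differ in level.
ratings = [
    [2.0, 3.0, 4.5],
    [1.0, 2.0, 3.5],
    [3.0, 4.0, 5.5],
    [0.5, 1.5, 3.0],
]
consistency, agreement = icc_two_way(ratings)
```

In this toy case the consistency form is essentially 1 while the agreement form is lower, which shows how mean differences between language versions can depress an agreement-type ICC even when the versions rank transcripts identically.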
The composition of the transcripts based on the LIWC2015 tokenizer.
| English | 1,980.93 | 999.15 | 25.71 | 54.83 | 86.62% | 4.55% |
| Dutch | 1,852.36 | 931.24 | 48.19 | 150.73 | 79.09% | 4.69% |
| Brazilian Portuguese | 1,915.69 | 957.49 | 52.31 | 97.51 | 79.59% | 4.46% |
| Romanian | 1,792.31 | 921.10 | 29.00 | 94.71 | 75.25% | 5.12% |
Word counts = raw number of words; Dictionary words = the percentage of words in the analyzed text covered by the dictionary; N = 7,012; n = 1,753 transcripts per language.
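As a rough illustration of the "Dictionary words" measure defined in the note above, the sketch below counts the share of tokens that a word list covers. It does not reproduce the LIWC2015 tokenizer: the regular-expression tokenizer, the function name, and the convention that a trailing `*` marks a stem wildcard are all assumptions made for the example, and the entries are hypothetical.

```python
import re


def dictionary_coverage(text, entries):
    """Return (word_count, percentage of tokens matched by the word list)."""
    # Simplified tokenizer: runs of letters, optionally with one apostrophe part.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())
    if not tokens:
        return 0, 0.0

    exact = {e.lower() for e in entries if not e.endswith("*")}
    stems = tuple(e[:-1].lower() for e in entries if e.endswith("*"))

    # A token counts as covered if it matches an entry exactly or starts with a stem.
    matched = sum(1 for t in tokens if t in exact or t.startswith(stems))
    return len(tokens), 100.0 * matched / len(tokens)


# Hypothetical three-entry word list applied to a six-word sentence.
count, pct = dictionary_coverage("I am happy and happier today", ["i", "am", "happ*"])
```

Here four of the six tokens are covered. A lower coverage rate for one language means that more of its text falls outside the dictionary, which is worth keeping in mind when comparing category percentages across languages.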
Figure 1. The performance of the SVM in estimating the language of the transcripts based on all LIWC2015 categories after grand mean standardization (image on the left) and within-language standardization (image on the right). The results were obtained on the test subset.
The performance of the SVM in classifying the transcripts.
| Eng | 0.44 | 0.06 | 0.87 | 0.91 | 0.48 | 0.09 |
| Du | 0.61 | 0.08 | 0.82 | 0.93 | 0.56 | 0.12 |
| BP | 0.99 | 0.01 | 1 | 0.99 | 1 | 0.02 |
| Ro | 0.99 | 0.87 | 0.99 | 0.17 | 0.98 | 0.40 |
| Eng | 0.62 | 0.84 | 0.89 | 0.15 | 0.64 | 0.38 |
| Du | 0.72 | 0.01 | 0.89 | 0.99 | 0.70 | 0.02 |
| BP | 0.94 | 0.14 | 0.98 | 0.86 | 0.95 | 0.18 |
| Ro | 0.88 | 0.00 | 0.95 | 1 | 0.87 | – |
GMS, Grand mean standardization; WLS, Within-language standardization; The results were obtained on the test subset; Eng, English; Du, Dutch; BP, Brazilian Portuguese; Ro, Romanian; N = 1,752; n = 438 transcripts per language.
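The difference between the two standardization schemes named in the note above can be sketched as follows. This is one plausible reading, not the authors' code: grand mean standardization z-scores a category over all transcripts pooled, while within-language standardization z-scores it separately inside each language. The function names, the use of the population standard deviation, and the sample values are assumptions.

```python
from statistics import mean, pstdev


def zscores(values):
    """Z-score a list of numbers using its own mean and population SD."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


def grand_mean_standardize(values):
    """Standardize over all transcripts pooled, regardless of language."""
    return zscores(values)


def within_language_standardize(values, languages):
    """Standardize each transcript relative to its own language group."""
    by_lang = {}
    for idx, lang in enumerate(languages):
        by_lang.setdefault(lang, []).append(idx)
    out = [0.0] * len(values)
    for idxs in by_lang.values():
        for idx, z in zip(idxs, zscores([values[i] for i in idxs])):
            out[idx] = z
    return out


# Hypothetical percentages for one category in three English and three Romanian transcripts.
values = [7.0, 8.0, 6.0, 3.0, 4.0, 2.0]
languages = ["Eng", "Eng", "Eng", "Ro", "Ro", "Ro"]
gms = grand_mean_standardize(values)
wls = within_language_standardize(values, languages)
```

After pooled standardization the English scores remain above zero and the Romanian scores below it, so the language is still recoverable from the feature. After within-language standardization both groups are centered on zero, which removes that level difference.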
The AUC for the binary classifications.
| English | Dutch | 0.53 | 0.49 |
| English | Brazilian Portuguese | 0.98 | 0.47 |
| English | Romanian | 0.98 | 0.50 |
| Dutch | Brazilian Portuguese | 0.99 | 0.48 |
| Dutch | Romanian | 0.99 | 0.52 |
| Brazilian Portuguese | Romanian | 0.99 | 0.53 |
| English | Dutch | 0.65 | 0.50 |
| English | Brazilian Portuguese | 0.90 | 0.50 |
| English | Romanian | 0.92 | 0.49 |
| Dutch | Brazilian Portuguese | 0.96 | 0.50 |
| Dutch | Romanian | 0.94 | 0.50 |
| Brazilian Portuguese | Romanian | 0.89 | 0.49 |
AUC, area under the curve.
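For a binary classification such as one language against another, the AUC can be read as the probability that a randomly chosen transcript from the positive class receives a higher classifier score than a randomly chosen transcript from the negative class. A minimal sketch of that pairwise definition follows; the function name and the scores are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four transcripts (label 1 = first language).
example = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

On this reading, values near 0.50 mean the classifier cannot tell the two languages apart from the LIWC2015 features, and values near 1 mean the language is almost always recoverable.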