David M Markowitz, Jeffrey T Hancock.
Abstract
When scientists report false data, does their writing style reflect their deception? In this study, we investigated the linguistic patterns of fraudulent (N = 24; 170,008 words) and genuine publications (N = 25; 189,705 words) first-authored by social psychologist Diederik Stapel. The analysis revealed that Stapel's fraudulent papers contained linguistic changes in science-related discourse dimensions, including more terms pertaining to methods, investigation, and certainty than his genuine papers. His writing style also matched patterns in other deceptive language, including fewer adjectives in fraudulent publications relative to genuine publications. Using differences in language dimensions, we were able to classify Stapel's publications with above-chance accuracy. Beyond these discourse dimensions, Stapel included fewer co-authors when reporting fake data than genuine data, although other evidentiary claims (e.g., number of references and experiments) did not differ across the two article types. This research supports recent findings that language cues vary systematically with deception, and that deception can be revealed in fraudulent scientific discourse.
Year: 2014; PMID: 25153333; PMCID: PMC4143312; DOI: 10.1371/journal.pone.0105937
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Frequencies and Percentages of Language Categories Across Stapel's Publications.
| Discourse category | Example | Fraudulent: frequency | Fraudulent: % | Genuine: frequency | Genuine: % | LLR |
| Total word count | | 170,008 | | 189,705 | | |
| Science-related | | | | | | |
| Means and methods | pattern, procedure | 822 | 0.48 | 576 | 0.30 | 74.68**** |
| Certainty | explicit, precise | 840 | 0.49 | 646 | 0.34 | 51.13**** |
| Investigation | feedback, research, assess | 1,329 | 0.78 | 1,265 | 0.67 | 16.38**** |
| Amplifiers | more, extreme, profoundly | 1,192 | 0.70 | 1,125 | 0.59 | 16.24**** |
| Diminishers | less, somewhat, merely | 202 | 0.12 | 312 | 0.16 | 13.21*** |
| Reasoning | interpret, comprehend | 787 | 0.46 | 744 | 0.39 | 10.52† |
| Quantities | multiple, general, enough | 703 | 0.41 | 839 | 0.44 | 1.73 |
| Cause and effect/connection | determine, result, attribute | 4,452 | 2.62 | 5,101 | 2.69 | 1.67 |
| Deception-related | | | | | | |
| Emotional states and processes | affective, mood | 256 | 0.15 | 133 | 0.07 | 54.22**** |
| Adjectives | cooperative, difficult | 16,535 | 9.73 | 19,314 | 10.18 | 18.65**** |
| Negations | no, not, nor | 1,352 | 0.80 | 1,608 | 0.85 | 2.99 |
| Conjunctions | and, or | 5,536 | 3.26 | 6,025 | 3.18 | 1.80 |
| Discrepancies | could, would, should | 1,813 | 1.07 | 2,053 | 1.08 | 0.21 |
Note: Table 1 is organized by descending LLR within each group of discourse categories. LLR values of 10.83 and 15.13 equate to ***p<.001 and ****p<.0001, respectively; †p<.01 [20]. Wmatrix categories were renamed for clarity: Amplifiers = “Degree: Boosters,” Reasoning = “Understanding,” Certainty = “Detailed,” Discrepancies = “Modal Auxiliary Verbs,” and Negations = “Negative.”
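The LLR column can be illustrated with a short sketch. It assumes the standard two-corpus log-likelihood statistic commonly used for corpus frequency comparisons; the record does not spell out the formula, and the function name `log_likelihood_ratio` is hypothetical. Applied to the Means and methods row, this formula gives approximately 74.68, matching the table.

```python
import math


def log_likelihood_ratio(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood ratio for a single category.

    Assumed formula (not stated in the record):
        LL = 2 * sum(observed * ln(observed / expected))
    where each expected count is the corpus size times the pooled rate.
    """
    total_freq = freq_a + freq_b
    total_size = size_a + size_b
    expected_a = size_a * total_freq / total_size
    expected_b = size_b * total_freq / total_size

    llr = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        # A zero count contributes nothing (limit of x * ln(x) as x -> 0).
        if observed > 0:
            llr += observed * math.log(observed / expected)
    return 2.0 * llr


# "Means and methods": 822 of 170,008 fraudulent words vs. 576 of 189,705 genuine words.
means_and_methods = log_likelihood_ratio(822, 170_008, 576, 189_705)
print(round(means_and_methods, 2))
```

Values above 10.83 or 15.13 would then be flagged at the p<.001 or p<.0001 level, following the thresholds given in the note.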
Cross-Validated Classification Accuracy Across Stapel's Fraudulent and Genuine Publications.
| Actual | Predicted: Fraudulent | Predicted: Genuine | Classification accuracy |
| Fraudulent (N = 24) | 17 | 7 | 70.8% |
| Genuine (N = 25) | 7 | 18 | 72.0% |
| Overall | | | 71.4% |
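The accuracy figures follow directly from the counts in the classification table. The sketch below recomputes them from the four cell counts; it does not reproduce the cross-validated classifier itself, and the names `confusion`, `per_class_accuracy`, and `overall_accuracy` are hypothetical.

```python
# Rows are actual classes, inner keys are predicted classes (counts from the table).
confusion = {
    "fraudulent": {"fraudulent": 17, "genuine": 7},
    "genuine": {"fraudulent": 7, "genuine": 18},
}


def per_class_accuracy(matrix):
    """Percentage of each actual class that was predicted correctly."""
    return {
        actual: 100.0 * row[actual] / sum(row.values())
        for actual, row in matrix.items()
    }


def overall_accuracy(matrix):
    """Percentage of all publications that were predicted correctly."""
    correct = sum(row[actual] for actual, row in matrix.items())
    total = sum(sum(row.values()) for row in matrix.values())
    return 100.0 * correct / total


for label, accuracy in per_class_accuracy(confusion).items():
    print(f"{label}: {accuracy:.1f}%")
print(f"overall: {overall_accuracy(confusion):.1f}%")
```

The overall figure is 35 correct classifications out of 49 publications, so it is weighted by class size and is not the simple mean of the two per-class accuracies.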