Vassiliki Rentoumi, Timothy Peters, Jonathan Conlin, Peter Garrard.
Abstract
We used a computational linguistic approach, exploiting machine learning techniques, to examine the letters written by King George III during mentally healthy and apparently mentally ill periods of his life. The aims of the study were: first, to establish the existence of alterations in the King's written language at the onset of his first manic episode; and second, to identify salient sources of variation contributing to the changes. Effects on language were sought in two control conditions (politically stressful vs. politically tranquil periods and seasonal variation). We found clear differences in the letter corpus, across a range of different features, in association with the onset of mental derangement, which were driven by a combination of linguistic and information theory features that appeared to be specific to the contrast between acute mania and mental stability. The paucity of existing data relevant to changes in written language in the presence of acute mania suggests that lexical, syntactic and stylometric descriptions of written discourse produced by a cohort of patients with a diagnosis of acute mania will be necessary to support the diagnosis independently and to look for other periods of mental illness over the course of the King's life, and in other historically significant figures with similarly large archives of handwritten documents.
Year: 2017 PMID: 28328964 PMCID: PMC5362044 DOI: 10.1371/journal.pone.0171626
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Structure and characteristics of the data sets used in contrasts A to F.
| Comparison | Condition | Date ranges of letters | Number of letters in subset | Average word tokens per letter | Relevant contrast | Expected outcome of ML classification | Data sources |
|---|---|---|---|---|---|---|---|
| A | Pre-mania | April 1788–September 1788 | 21 | 129.1 | Mania related language change (immediate) | Distinguishable | Aspinal Vol. 1 [19]; Stanhope Vol. 2 [55] |
| | vs. Acute mania | October 1788–April 1789 | 31 | 152.4 | | | |
| B | Acute mania | October 1788–April 1789 | 31 | 152.4 | Mania related language change (immediate) | Distinguishable | Aspinal Vol. 1 [19]; Stanhope Vol. 2 [55]; Dobree [21] |
| | vs. Post-mania | May 1788–October 1789 | 37 | 73.3 | | | |
| C | Acute mania | October 1788–April 1789 | 31 | 152.4 | Mania related language change (delayed) | Distinguishable | Aspinal Vol. 1 [19]; Stanhope Vol. 2 [55]; Fortescue [16] |
| | vs. Mentally healthy, no stressors | October 1770–April 1771 | 47 | 85.1 | | | |
| D | Acute mania | October 1788–April 1789 | 31 | 152.4 | Stress related language change | Distinguishable | Aspinal Vol. 1 [19]; Stanhope Vol. 2 [55]; Fortescue [16] |
| | vs. Mentally healthy, political stressors | April 1780–October 1781 | 42 | 127 | | | |
| E | Mentally healthy: Spring & Summer | April 1770–September 1770 | 20 | 57.2 | Seasonally-determined language change | Indistinguishable | Fortescue [16] |
| | vs. Mentally healthy: Autumn & Winter | October 1770–April 1771 | 47 | 85.1 | | | |
| F | Mentally healthy: Autumn & Winter | October 1770–April 1771 | 47 | 85.1 | Seasonally-determined language change | Indistinguishable | Fortescue [16] |
| | vs. Mentally healthy: Spring & Summer | May 1771–October 1771 | 11 | 101 | | | |
Definitions of the syntactic and textual features used in the analysis.
| No. | Feature | Abbreviation | Definition | |
|---|---|---|---|---|
| 1 | Mean length of clause | MLC | Number of words / number of clauses | |
| 2 | Mean length of sentence | MLS | Number of words / number of sentences | |
| 3 | Mean length of T-Unit | MLT | Number of words / number of T-units | |
| 4 | Sentence complexity ratio | C/S | Number of clauses/number of sentences | |
| 5 | T-Unit complexity ratio | C/T | Number of clauses/number of T-units | |
| 6 | Complex T-Unit ratio | CT/T | Number of complex T-units / number of T-units | |
| 7 | Dependent clause ratio | DC/C | Number of dependent clauses / number of clauses | |
| 8 | Dependent clauses per T-Unit | DC/T | Number of dependent clauses / number of T-units | |
| 9 | Coordinate phrases per clause | CP/C | Number of coordinate phrases / number of clauses | |
| 10 | Coordinate phrases per T-Unit | CP/T | Number of coordinate phrases / number of T-units | |
| 11 | Sentence coordination ratio | T/S | Number of T-units / number of sentences | |
| 12 | Complex nominals per clause | CN/C | Number of complex nominals / number of clauses | |
| 13 | Complex nominals per T-Unit | CN/T | Number of complex nominals / number of T-Units | |
| 14 | Verb phrases per T-Unit | VP/T | Number of verb phrases / number of T-units | |
| 15 | Lexical word variation | LV | Number of lexical word types / number of lexical words | |
| 16 | Bilogarithmic type token ratio | LogTTR | Log10 word types / Log10 total words (tokens) | |
| 17 | Noun variation | NV | Number of noun types / number of lexical words | |
| 18 | Adjective variation | ADJV | Number of adjective types / number of lexical words | |
| 19 | Modifier variation | MODV | Number of adjective types + adverb types / number of lexical words | |
| 20 | Adverb variation | ADVV | Number of adverb types / number of lexical words | |
| 21 | Corrected verb variation | CVV | Number of lexical verb types / √(2 × number of verbs) | |
| 22 | Verb variation | VV | Number of lexical verb types / number of lexical words | |
| 23 | Brunet's index W | W | $W = N^{V^{-0.165}}$ (N = number of word tokens; V = vocabulary) | |
| 24 | Hapax legomena | HL | Mean number of hapax legomena (once-used tokens) per 10-word block | |
| 25 | Pair hapax legomena | PHL | Mean number of pair hapax legomena (once-used token pairs) per 10-word block | |
| 26 | Simpson's diversity index | D | ||
| 27 | Dis legomena over vocabulary | DL/V | Mean number of dis legomena (twice-used tokens) / number of word types | |
| 28 | Shannon entropy | H | ||
| 29 | Compression ratio | CR | Compressed (zipped) file size / uncompressed size |
1 Clause = a structure consisting of at least a subject and a finite verb
2 Sentence = a group of words delimited with a punctuation mark that signals a sentence-ending
3 T-Unit = a main clause plus attached or embedded subordinate clause or other structure
4 Complex T-Unit = a T-Unit that contains a dependent clause
5 Dependent clause = a clause that cannot form a sentence on its own
6 Coordinate phrase = a phrase immediately before a coordinating conjunction (e.g. 'and', 'but', 'or')
7 Complex nominal = a sequence of words denoting a single concept
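Several of the vocabulary-based definitions above can be computed directly from a token list. The following is a minimal sketch, not the authors' pipeline: it assumes lower-cased whitespace tokenisation, reads DL/V as the count of twice-used types over vocabulary size, and uses `zlib` as a stand-in for the unspecified "zipped" compressor. The function name `lexical_features` is hypothetical.

```python
import math
import zlib
from collections import Counter


def lexical_features(text: str) -> dict:
    """Compute LogTTR, Brunet's W, DL/V and CR for one letter (needs > 1 token)."""
    tokens = text.lower().split()        # assumption: simple whitespace tokens
    n = len(tokens)                      # N: number of word tokens
    counts = Counter(tokens)
    v = len(counts)                      # V: vocabulary size (word types)
    raw = text.encode("utf-8")
    return {
        # Bilogarithmic type-token ratio: log10(types) / log10(tokens)
        "LogTTR": math.log10(v) / math.log10(n),
        # Brunet's index: W = N ** (V ** -0.165)
        "W": n ** (v ** -0.165),
        # Dis legomena (types used exactly twice) over vocabulary size
        "DL/V": sum(1 for c in counts.values() if c == 2) / v,
        # Compression ratio: compressed size / uncompressed size
        # (assumption: zlib/DEFLATE approximates the zip tool used)
        "CR": len(zlib.compress(raw)) / len(raw),
    }


features = lexical_features("the king wrote the letter and the king signed it")
```

On very short texts the compression ratio can exceed 1 because of the compressor's fixed overhead, so values are only comparable across letters of similar length.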
Feature selection results for contrasts A to F.
| Contrast | Text set 1 | Text set 2 | Feature selected | IG value | Average feature value in text set 1 | Average feature value in text set 2 |
|---|---|---|---|---|---|---|
| A | Pre-mania | Acute mania | DL/V | 0.35 | 0.12 | 0.09 |
| | | | W | 0.33 | 9.55 | 8.56 |
| | | | LogTTR | 0.17 | 0.91 | 0.92 |
| B | Acute mania | Post-mania | W | 0.34 | 9.62 | 8.68 |
| | | | CR | 0.3 | 0.59 | 0.64 |
| | | | LogTTR | 0.21 | 0.91 | 0.94 |
| | | | H | 0.2 | 0.95 | 0.96 |
| | | | ADVV | 0.1 | 0.037 | 0.05 |
| C | Acute mania | Mentally healthy, no stressors | W | 0.34 | 9.61 | 8.74 |
| | | | LogTTR | 0.21 | 0.91 | 0.93 |
| | | | H | 0.15 | 0.95 | 0.96 |
| | | | CR | 0.15 | 0.59 | 0.64 |
| | | | LV | 0.15 | 0.89 | 0.93 |
| | | | CP/T | 0.12 | 0.62 | 0.39 |
| D | Acute mania | Mentally healthy, political stressors | C/S | 0.28 | 4.19 | 5.95 |
| | | | MLS | 0.27 | 44.01 | 55.32 |
| | | | CP/T | 0.18 | 0.62 | 0.37 |
| | | | CP/C | 0.14 | 0.2 | 0.12 |
| E | Mentally healthy: Spring & Summer | Mentally healthy: Autumn & Winter | None | | | |
| F | Mentally healthy: Autumn & Winter | Mentally healthy: Spring & Summer | None | | | |
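The IG (information gain) values in this table measure how much knowing a feature reduces uncertainty about which text set a letter belongs to. The sketch below illustrates the idea for a single numeric feature split at one threshold. It is an assumption-laden illustration: the discretisation method the authors used is not stated here, and the threshold, the sample values and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """IG of splitting letters into value <= threshold and value > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    # Weighted entropy that remains after the split (empty sides contribute nothing)
    remainder = sum(
        len(part) / len(labels) * entropy(part) for part in (left, right) if part
    )
    return entropy(labels) - remainder


# Hypothetical feature values for four letters, two from each text set
w_values = [9.6, 9.5, 8.6, 8.5]
text_sets = ["set 1", "set 1", "set 2", "set 2"]
gain = information_gain(w_values, text_sets, threshold=9.0)
```

A feature that separates the two sets perfectly yields the full class entropy as its gain, while a feature unrelated to the sets yields a gain near zero, which is consistent with no features being selected for contrasts E and F.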
Micro-average classification accuracy of ML classifiers vs. baseline for comparisons A to F.
| Comparison | Text set | ML classifier: correct | ML classifier: incorrect | ML classifier: accuracy | Baseline: correct | Baseline: incorrect | Baseline: accuracy | p-value |
|---|---|---|---|---|---|---|---|---|
| A | Pre-mania | 11 | 10 | 0.75 | 0 | 21 | 0.6 | < 0.05 |
| | Acute mania | 28 | 3 | | 31 | 0 | | |
| B | Acute mania | 26 | 5 | 0.76 | 0 | 31 | 0.54 | < 0.05 |
| | Post-mania | 26 | 11 | | 37 | 0 | | |
| C | Acute mania | 18 | 13 | 0.7 | 0 | 31 | 0.6 | < 0.05 |
| | Mentally healthy, no stressors | 37 | 10 | | 47 | 0 | | |
| D | Acute mania | 25 | 6 | 0.69 | 0 | 31 | 0.57 | < 0.01 |
| | Mentally healthy, political stressors | 26 | 16 | | 42 | 0 | | |
| E | Mentally healthy: Spring & Summer | 38 | 9 | 0.72 | 47 | 0 | 0.7 | > 0.05 |
| | Mentally healthy: Autumn & Winter | 10 | 10 | | 0 | 20 | | |
| F | Mentally healthy: Autumn & Winter | 34 | 13 | 0.62 | 47 | 0 | 0.81 | > 0.05 |
| | Mentally healthy: Spring & Summer | 2 | 9 | | 0 | 11 | | |
1 Naïve Bayes (NB) for comparisons A to D; Multilayer perceptron (MLP) for comparisons E & F
2 Student's t-tests, classifier vs. baseline
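The accuracy columns can be reproduced from the correct and incorrect counts. The sketch below assumes that micro-averaging pools all letters of a comparison before dividing, and that the baseline assigns every letter to the larger text set, which is what the baseline counts in the table suggest. The function names are hypothetical.

```python
def micro_accuracy(per_class):
    """per_class: list of (correct, incorrect) counts, one pair per text set."""
    correct = sum(c for c, _ in per_class)
    total = sum(c + i for c, i in per_class)
    return correct / total


def majority_baseline(per_class):
    """Accuracy obtained by assigning every letter to the larger text set."""
    sizes = [c + i for c, i in per_class]
    return max(sizes) / sum(sizes)


# Comparison A: pre-mania (11 correct, 10 incorrect), acute mania (28, 3)
comparison_a = [(11, 10), (28, 3)]
print(round(micro_accuracy(comparison_a), 2))     # 0.75
print(round(majority_baseline(comparison_a), 2))  # 0.6
```

Because the text sets differ in size, the majority baseline is already well above 0.5 in every comparison, so classifier accuracy is only informative relative to it.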
Stability analysis for letters written during the acute mania period.
| No. | Feature | Early manic phase: mean | (sd) | Mid manic phase: mean | (sd) | Late manic phase: mean | (sd) | p value | Significance | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MLC | 11 | (1.38) | 11.59 | (1.19) | 12.2 | (3.91) | 0.07 | ns | |
| 2 | MLS | 41.49 | (13.14) | 43.49 | (15.25) | 47.28 | (13.59) | 0.73 | ns | |
| 3 | MLT | 28.48 | (7.73) | 35.47 | (10.67) | 37.7 | (16.53) | 0.27 | ns | |
| 4 | C/S | 4.62 | (1.27) | 3.8 | (1.42) | 4.1 | (1.35) | 0.54 | ns | |
| 5 | C/T | 3.13 | (0.6) | 3.12 | (1.1) | 3.07 | (0.86) | 0.99 | ns | |
| 6 | CT/T | 0.85 | (0.11) | 0.76 | (0.24) | 0.85 | (0.21) | 0.5 | ns | |
| 7 | DC/C | 0.59 | (0.13) | 0.6 | (0.14) | 0.58 | (0.16) | 0.93 | ns | |
| 8 | DC/T | 1.9 | (0.7) | 1.99 | (1.11) | 1.87 | (0.98) | 0.94 | ns | |
| 9 | CP/C | 0.19 | | 0.25 | (0.16) | 0.15 | (0.11) | 0.18 | ns | |
| 10 | CP/T | 0.61 | (0.33) | 0.82 | (0.68) | 0.44 | (0.36) | 0.25 | ns | |
| 11 | T/S | 1.49 | (0.35) | 1.24 | (0.36) | 1.36 | (0.46) | 0.34 | ns | |
| 12 | CN/C | 1.47 | (0.25) | 1.51 | (0.4) | 1.57 | (0.76) | 0.45 | ns | |
| 13 | CN/T | 4 | (1.13) | 4.59 | (1.54) | 4.9 | (2.87) | 0.07 | ns | |
| 14 | VP/T | 4.02 | (0.84) | 3.99 | (1.85) | 4.59 | (1.96) | 0.65 | ns | |
| 15 | LV | 0.88 | (0.04) | 0.91 | (0.04) | 0.89 | (0.05) | 0.88 | ns | |
| 16 | LogTTR | 0.79 | (0.02) | 0.8 | (0.05) | 0.77 | (0.07) | 0.09 | ns | |
| 17 | NV | 0.88 | (0.06) | 0.94 | (0.06) | 0.87 | (0.08) | 0.94 | ns | |
| 18 | ADJV | 0.15 | (0.04) | 0.13 | (0.06) | 0.11 | (0.05) | 0.34 | ns | |
| 19 | MODV | 0.2 | (0.03) | 0.16 | (0.08) | 0.14 | (0.06) | 0.24 | ns | |
| 20 | ADVV | 0.05 | (0.02) | 0.03 | (0.03) | 0.04 | (0.03) | 0.17 | ns | |
| 21 | CVV | 2.85 | (0.06) | 2.34 | (0.69) | 2.26 | (0.53) | 0.08 | ns | |
| 22 | VV | 0.27 | (0.05) | 0.24 | (0.07) | 0.24 | (0.05) | 0.34 | ns | |
| 23 | W | 10.27 | (0.5) | 9.23 | (1.43) | 9.23 | (0.67) | 0.07 | ns | |
| 24 | HL | 9.4 | (0.18) | 9.32 | (0.59) | 8.99 | (0.56) | 0.25 | ns | |
| 25 | PHL | 9.98 | (0.04) | 9.89 | (0.31) | 9.99 | (0.03) | 0.36 | ns | |
| 26 | D | 0.89 | (0.02) | 0.89 | (0.01) | 0.89 | (0.01) | 0.24 | ns | |
| 27 | DL/V | 0.12 | (0.04) | 0.11 | (0.04) | 0.13 | (0.05) | 0.71 | ns | |
| 28 | H | 0.96 | (0.01) | 0.96 | (0.02) | 0.95 | (0.01) | 0.15 | ns | |
| 29 | CR | 0.58 | (0.03) | 0.63 | (0.11) | 0.6 | (0.04) | 0.08 | ns | |
1 One-way ANOVA
2 Based on a Bonferroni-corrected threshold of 0.001.
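The stability analysis compares each feature across three phases with a one-way ANOVA and judges significance against a Bonferroni-corrected threshold. The sketch below computes the F statistic from first principles and the corrected threshold for 29 tests. It assumes a conventional starting alpha of 0.05, which is not stated here; the sample values and function names are hypothetical, and converting F to a p-value would additionally need the F distribution (for example from a statistics library).

```python
def one_way_anova_f(*groups):
    """Return the F statistic of a one-way ANOVA over two or more groups."""
    k = len(groups)                              # number of groups (phases)
    n = sum(len(g) for g in groups)              # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical feature values for early, mid and late phases
f_stat = one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5])
threshold = bonferroni_threshold(0.05, 29)       # 29 features tested
```

With 29 features, 0.05 / 29 is roughly 0.0017, the same order of magnitude as the 0.001 threshold quoted in the footnote.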
Stability analysis for letters written during the healthy period October 1770–April 1771.
| No. | Feature | Early healthy phase: mean | sd | Mid healthy phase: mean | sd | Late healthy phase: mean | sd | p-value | Significance | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MLC | 10.02 | 2.02 | 10.75 | 3.98 | 11.13 | 3.32 | 0.38 | ns | |
| 2 | MLS | 44.38 | 16.28 | 40.88 | 11.71 | 46.04 | 15.6 | 0.78 | ns | |
| 3 | MLT | 33.11 | 13.37 | 31.84 | 14.35 | 27.16 | 8.22 | 0.19 | ns | |
| 4 | C/S | 4.45 | 1.45 | 4.09 | 1.43 | 4.26 | 1.31 | 0.9 | ns | |
| 5 | C/T | 3.26 | 0.99 | 3.1 | 1.29 | 2.56 | 0.81 | 0.08 | ns | |
| 6 | CT/T | 0.83 | 0.23 | 0.82 | 0.25 | 0.82 | 0.24 | 0.89 | ns | |
| 7 | DC/C | 0.64 | 0.12 | 0.66 | 0.14 | 0.58 | 0.1 | 0.1 | ns | |
| 8 | DC/T | 2.15 | 0.95 | 2.16 | 1.21 | 1.47 | 0.7 | 0.22 | ns | |
| 9 | CP/C | 0.18 | 0.15 | 0.16 | 0.21 | 0.08 | 0.1 | 0.17 | ns | |
| 10 | CP/T | 0.51 | 0.38 | 0.45 | 0.61 | 0.22 | 0.24 | 0.12 | ns | |
| 11 | T/S | 1.44 | 0.46 | 1.4 | 0.46 | 1.73 | 0.5 | 0.72 | ns | |
| 12 | CN/C | 1.07 | 0.33 | 1.17 | 0.53 | 1.42 | 0.46 | 0.08 | ns | |
| 13 | CN/T | 3.59 | 1.79 | 3.7 | 2.47 | 3.44 | 1.18 | 0.59 | ns | |
| 14 | VP/T | 4.47 | 1.25 | 4.15 | 1.61 | 3.95 | 1.13 | 0.51 | ns | |
| 15 | LV | 0.94 | 0.05 | 0.92 | 0.08 | 0.94 | 0.06 | 0.86 | ns | |
| 16 | LogTTR | 0.79 | 0.03 | 0.8 | 0.05 | 0.798 | 0.04 | 0.58 | ns | |
| 17 | NV | 0.93 | 0.07 | 0.9 | 0.1 | 0.92 | 0.09 | 0.69 | ns | |
| 18 | ADJV | 0.1 | 0.05 | 0.13 | 0.06 | 0.14 | 0.07 | 0.22 | ns | |
| 19 | MODV | 0.14 | 0.07 | 0.17 | 0.07 | 0.18 | 0.09 | 0.39 | ns | |
| 20 | ADVV | 0.04 | 0.04 | 0.04 | 0.03 | 0.04 | 0.03 | 0.92 | ns | |
| 21 | CVV | 2.2 | 0.48 | 1.79 | 0.55 | 2.15 | 0.59 | 0.13 | ns | |
| 22 | VV | 0.26 | 0.04 | 0.23 | 0.06 | 0.28 | 0.07 | 0.18 | ns | |
| 23 | W | 8.93 | 0.75 | 8.52 | 0.91 | 8.78 | 0.99 | 0.37 | ns | |
| 24 | HL | 9.03 | 0.72 | 9.43 | 0.56 | 9.27 | 0.33 | 0.2 | ns | |
| 25 | PHL | 9.96 | 0.17 | 9.97 | 0.09 | 10 | 0 | 0.55 | ns | |
| 26 | D | 0.89 | 0.01 | 0.89 | 0.01 | 0.89 | 0 | 0.24 | ns | |
| 27 | DL/V | 0.11 | 0.04 | 0.08 | 0.03 | 0.08 | 0.04 | 0.11 | ns | |
| 28 | H | 0.96 | 0 | 0.97 | 0 | 0.96 | 0 | 0.28 | ns | |
| 29 | CR | 0.63 | 0.04 | 0.65 | 0.06 | 0.63 | 0.06 | 0.4 | ns | |
1 One-way ANOVA
2 Based on a Bonferroni-corrected threshold of 0.001.