David Kernot, Terry Bossomaier, Roger Bradbury.
Abstract
Little is known of the private life of William Shakespeare, but he is famous for his collection of plays and poems, even though many of the works attributed to him were published anonymously. Determining the identity of Shakespeare has fascinated scholars for 400 years, and four significant figures in English literary history have been suggested as likely alternatives to Shakespeare for some disputed works: Bacon, de Vere, Stanley, and Marlowe. A myriad of computational and statistical tools and techniques have been used to determine the true authorship of his works. Many of these techniques rely on basic statistical correlations, word counts, collocated word groups, or keyword density, but no single method has been agreed on. We suggest that an alternative technique that uses word semantics to draw on personality can provide an accurate profile of a person. To test this claim, we analyse the works of Shakespeare, Christopher Marlowe, and Elizabeth Cary. We use Word Accumulation Curves, Hierarchical Clustering overlays, Principal Component Analysis, and Linear Discriminant Analysis techniques in combination with RPAS, a multi-faceted text analysis approach that draws on a writer's personality, or self, to identify subtle characteristics within a person's writing style. Here we find that RPAS can separate the known authored works of Shakespeare from those of Marlowe and Cary. Further, it separates their contested works, that is, works suspected of being written by others. While few authorship identification techniques identify self from the way a person writes, we demonstrate that these stylistic characteristics are as applicable 400 years ago as they are today and have the potential to be used within cyberspace for law enforcement purposes.
Keywords: authorship identification; linear discriminant analysis; personality; principal component analysis; sensory processing
Year: 2018 PMID: 29599734 PMCID: PMC5862847 DOI: 10.3389/fpsyg.2018.00289
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Shakespeare, Marlowe, and Cary's Works and how they were broken into chunks.
| No. | Year | Work or chunk | Type | ID | Source work |
|---|---|---|---|---|---|
| WILLIAM SHAKESPEARE | | | | | |
| 1 | 1589 | Comedy of Errors | C | C1 | Comedy of Errors |
| 2 | 1590 | Henry VI, Part II | H | H1 | Henry VI, Part II |
| 3 | 1590 | Henry VI, Part III | H | H2 | Henry VI, Part III |
| 4 | 1591 | Henry VI, Part I | H | H3 | Henry VI, Part I |
| 5 | 1592 | Richard III | H | H4 | Richard III |
| 6 | 1593 | Taming of the Shrew | C | C2 | Taming of the Shrew |
| 7 | 1593 | Titus Andronicus | T | T1 | Titus Andronicus |
| 8 | 1593 | Venus and Adonis | P | P1 | Venus and Adonis |
| 9 | 1594 | Love's Labour's Lost | C | C4 | Love's Labour's Lost |
| 10 | 1594 | Romeo and Juliet | T | T2 | Romeo and Juliet |
| 11 | 1594 | The Rape of Lucrece | P | P2 | The Rape of Lucrece |
| 12 | 1594 | Two Gentlemen of Verona | C | C3 | Two Gentlemen of Verona |
| 13 | 1595 | Midsummer Night's Dream | C | C5 | Midsummer Night's Dream |
| 14 | 1595 | Richard II | H | H5 | Richard II |
| 15 | 1596 | King John | H | H6 | King John |
| 16 | 1596 | Merchant of Venice | C | C6 | Merchant of Venice |
| 17 | 1597 | Henry IV, Part I | H | H7 | Henry IV, Part I |
| 18 | 1597 | Henry IV, Part II | H | H8 | Henry IV, Part II |
| 19 | 1598 | Henry V | H | H9 | Henry V |
| 20 | 1598 | Much Ado about Nothing | C | C7 | Much Ado about Nothing |
| 21 | 1599 | As You Like It | C | C9 | As You Like It |
| 22 | 1599 | Julius Caesar | T | T3 | Julius Caesar |
| 23 | 1599 | Love's Answer | P | P5 | The Passionate Pilgrim |
| 24 | 1599 | Sonnets to sundry notes of music | P | P4 | The Passionate Pilgrim |
| 25 | 1599 | The Passionate Pilgrim | P | P3 | The Passionate Pilgrim |
| 26 | 1599 | Twelfth Night | C | C8 | Twelfth Night |
| 27 | 1600 | Hamlet | T | T4 | Hamlet |
| 28 | 1600 | Merry Wives of Windsor | C | C10 | Merry Wives of Windsor |
| 29 | 1601 | The Phoenix and the Turtle | P | P6 | The Phoenix and the Turtle |
| 30 | 1601 | Threnos | P | P7 | The Phoenix and the Turtle |
| 31 | 1601 | Troilus and Cressida | C | C11 | Troilus and Cressida |
| 32 | 1602 | All's Well That Ends Well | C | C12 | All's Well That Ends Well |
| 33 | 1604 | Measure for Measure | C | C13 | Measure for Measure |
| 34 | 1604 | Othello | T | T5 | Othello |
| 35 | 1605 | King Lear | T | T6 | King Lear |
| 36 | 1605 | Macbeth | T | T7 | Macbeth |
| 37 | 1606 | Anthony and Cleopatra | T | T10 | Anthony and Cleopatra |
| 38 | 1607 | Coriolanus | T | T8 | Coriolanus |
| 39 | 1607 | Timon of Athens | T | T9 | Timon of Athens |
| 40 | 1608 | Pericles | C | C14 | Pericles |
| 41 | 1609 | A Lover's Complaint | P | P8 | The Passionate Pilgrim |
| 42 | 1609 | Cymbeline | C | C15 | Cymbeline |
| 43 | 1609 | Sonnets | P | P9 | Sonnets |
| 44 | 1610 | Winter's Tale | C | C16 | Winter's Tale |
| 45 | 1611 | Tempest | C | C17 | Tempest |
| 46 | 1612 | Henry VIII | H | H10 | Henry VIII |
| CHRISTOPHER MARLOWE | | | | | |
| 47 | 1590 | Tamburlaine Part I | | M1 | Tamburlaine The Great Part I |
| 48 | 1590 | Tamburlaine Part II | | M2 | Tamburlaine The Great Part II |
| 49 | | Edward II | H | M3 | Edward II |
| 50 | | The Jew of Malta | T | M4 | The Jew of Malta |
| 51 | | Doctor Faustus | | M5 | Doctor Faustus |
| 52 | | Dido Queen of Carthage | | M6 | Dido Queen of Carthage |
| 53 | | The Massacre at Paris | | M7 | The Massacre at Paris with the Death of the Duke of Guise |
| 54 | | Hero and Leander | P | M8 | Hero and Leander |
| 55 | | The Passionate Shepherd | P | M9 | The Passionate Shepherd to His Love |
| 56 | | Walter Raleigh | P | M10 | The Passionate Shepherd to His Love |
| ELIZABETH CARY | | | | | |
| 57 | 1612 | The Tragedy of Mariam | T | EC1 | The Tragedy of Mariam, the Fair Queen of Jewry |
Type: C, Comedies; H, Histories; T, Tragedies; P, Poems.
The Year may not have any bearing as many works may well have been written earlier. In Marlowe's case, all but two of his works were published after his death.
Pearson correlation coefficient, R, results of RPAS, the five Sensory elements (VAHOG), and the four Referential Activity Power elements.
| | | R | P | A | S |
|---|---|---|---|---|---|
| Richness (R) | Pearson Correlation | 1 | 0.399 | −0.833 | 0.456 |
| | Sig. (2-tailed) | | 0.002 | 0.000 | 0.000 |
| | N | 57 | 57 | 57 | 57 |
| Personal_Pronouns (P) | Pearson Correlation | 0.399 | 1 | −0.451 | 0.366 |
| | Sig. (2-tailed) | 0.002 | | 0.000 | 0.005 |
| | N | 57 | 57 | 57 | 57 |
| RA Power (A) | Pearson Correlation | −0.833 | −0.451 | 1 | −0.575 |
| | Sig. (2-tailed) | 0.000 | 0.000 | | 0.000 |
| | N | 57 | 57 | 57 | 57 |
| Sensory (S) | Pearson Correlation | 0.456 | 0.366 | −0.575 | 1 |
| | Sig. (2-tailed) | 0.000 | 0.005 | 0.000 | |
| | N | 57 | 57 | 57 | 57 |

| | | V | A | H | O | G |
|---|---|---|---|---|---|---|
| Sensory-Visual (V) | Pearson Correlation | 1 | 0.284 | 0.715 | 0.784 | 0.571 |
| | Sig. (2-tailed) | | 0.032 | 0.000 | 0.000 | 0.000 |
| | N | 57 | 57 | 57 | 57 | 57 |
| Sensory-Auditory (A) | Pearson Correlation | 0.284 | 1 | −0.038 | 0.167 | −0.119 |
| | Sig. (2-tailed) | 0.032 | | 0.777 | 0.215 | 0.378 |
| | N | 57 | 57 | 57 | 57 | 57 |
| Sensory-Haptic (H) | Pearson Correlation | 0.715 | −0.038 | 1 | 0.632 | 0.772 |
| | Sig. (2-tailed) | 0.000 | 0.777 | | 0.000 | 0.000 |
| | N | 57 | 57 | 57 | 57 | 57 |
| Sensory-Olfactory (O) | Pearson Correlation | 0.784 | 0.167 | 0.632 | 1 | 0.628 |
| | Sig. (2-tailed) | 0.000 | 0.215 | 0.000 | | 0.000 |
| | N | 57 | 57 | 57 | 57 | 57 |
| Sensory-Gustatory (G) | Pearson Correlation | 0.571 | −0.119 | 0.772 | 0.628 | 1 |
| | Sig. (2-tailed) | 0.000 | 0.378 | 0.000 | 0.000 | |
| | N | 57 | 57 | 57 | 57 | 57 |

| | | A | C | P | PRON |
|---|---|---|---|---|---|
| RA Power-Article (A) | Pearson Correlation | 1 | 0.800 | 0.899 | 0.686 |
| | Sig. (2-tailed) | | 0.000 | 0.000 | 0.000 |
| | N | 57 | 57 | 57 | 57 |
| RA Power-Conjunctive (C) | Pearson Correlation | 0.800 | 1 | 0.859 | 0.563 |
| | Sig. (2-tailed) | 0.000 | | 0.000 | 0.000 |
| | N | 57 | 57 | 57 | 57 |
| RA Power-Preposition (P) | Pearson Correlation | 0.899 | 0.859 | 1 | 0.706 |
| | Sig. (2-tailed) | 0.000 | 0.000 | | 0.000 |
| | N | 57 | 57 | 57 | 57 |
| RA Power-Pronoun (PRON) | Pearson Correlation | 0.686 | 0.563 | 0.706 | 1 |
| | Sig. (2-tailed) | 0.000 | 0.000 | 0.000 | |
| | N | 57 | 57 | 57 | 57 |
Correlation is significant at the 0.05 level (2-tailed).
Correlation is significant at the 0.01 level (2-tailed).
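As a rough illustration of how the reported coefficients and two-tailed significance values relate, the following is a minimal sketch in plain Python, not the authors' procedure. It assumes the standard t-test for a Pearson coefficient with N − 2 degrees of freedom; the function names and the numerical integration (Simpson's rule) are choices made here for self-containment.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(r, n, steps=10000):
    """Two-tailed p-value for a Pearson r from n pairs (steps must be even)."""
    if abs(r) >= 1:
        return 0.0
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the area under the t density between 0 and t.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 2 * (0.5 - area)

# Visual vs. Auditory in the table: r = 0.284 with N = 57.
p = two_tailed_p(0.284, 57)
print(round(p, 3))  # approximately 0.03, significant at the 0.05 level only
```

Under these assumptions, the Visual and Auditory pair (r = 0.284, N = 57) comes out close to the tabulated significance of 0.032, which is below 0.05 but not below 0.01.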
Figure 1. Word Accumulation Curves for Shakespeare, Marlowe, and Cary by word groups and accumulated words. The lower panel shows the differing number of words each playwright used; the upper panel highlights the similarities between Marlowe's and Shakespeare's word usage.
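A word accumulation curve of this general kind can be sketched as the number of distinct words seen so far plotted against the number of words read. The snippet below is only an illustration under that assumption; the tokenizer, the `step` parameter, and the function name are hypothetical and are not taken from the paper.

```python
import re

def word_accumulation_curve(text, step=1000):
    """Return (words_read, distinct_words_seen) pairs, sampled every `step` words.

    The final point is always included, even if the text length is not a
    multiple of `step`.
    """
    # Simple lower-cased tokenizer (an assumption, not the paper's method).
    words = re.findall(r"[a-z']+", text.lower())
    seen = set()
    curve = []
    for i, word in enumerate(words, start=1):
        seen.add(word)
        if i % step == 0 or i == len(words):
            curve.append((i, len(seen)))
    return curve

print(word_accumulation_curve("to be or not to be", step=2))
# [(2, 2), (4, 4), (6, 4)]
```

A curve that flattens early indicates heavy reuse of a small vocabulary, while a curve that keeps climbing indicates a steadily growing vocabulary.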
Figure 2. Results of the two clusters from the Principal Component Analysis, overlaid with the Hierarchical Cluster Analysis results, showing the three clusters that form to separate the known works of the three playwrights from the works of contested authorship (or, in the case of 8, 29, and 30, works that are stylistically different). Personal Pronoun (gender) scores > 0.25 are also shown to emphasize differences. The table shows the contribution of the RPAS-VAHOG variables to the two components.
Figure 3. Results of the Linear Discriminant Analysis of the uncontested works of the playwrights, showing the most significant element from each canonical function (Auditory and Haptic Sensory elements). The mean of the works of each playwright is also shown. Five partially synthetic Shakespeare works, constructed and overlaid on the original data, fall closest to Shakespeare.
Figure 4. Venus and Adonis (8), which seems to be stylistically different and has an unusual Richness to Referential Activity Power relationship (see inset), is divided into 2,000-word chunks, as is the Merchant of Venice (16). The centroid of each work maintains the low RA Power/high Richness anomaly, showing that the result in the inset is not an artifact of the size of the work.
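The chunk-and-centroid check described in this caption can be sketched as follows. This is a minimal illustration, not the RPAS implementation: the Richness and RA Power formulas are not given in this record, so a type-token ratio and the share of a small, hypothetical function-word list stand in for them, and all names below are assumptions.

```python
# Hypothetical stand-in list of articles, conjunctions, prepositions, pronouns.
FUNCTION_WORDS = {"the", "a", "an", "and", "but", "or",
                  "of", "in", "to", "with", "he", "she", "it", "they"}

def chunk_words(words, size=2000):
    """Split a list of words into consecutive chunks of `size` words.

    The last chunk may be shorter than `size`.
    """
    return [words[i:i + size] for i in range(0, len(words), size)]

def chunk_features(chunk):
    """Return (richness, ra_power) stand-ins for one chunk of words."""
    richness = len(set(chunk)) / len(chunk)                            # type-token ratio
    ra_power = sum(w in FUNCTION_WORDS for w in chunk) / len(chunk)    # function-word share
    return (richness, ra_power)

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

# Usage: features per chunk, then one centroid per work.
words = ("the cat and the dog " * 1000).split()   # 5,000 toy words
features = [chunk_features(c) for c in chunk_words(words)]
print(len(features), centroid(features))
```

Comparing the centroid of a work's chunks with the score of the whole work is one simple way to check whether a measure depends on text length, which is the point the caption makes.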