Jaimie Murdock, Colin Allen, Katy Börner, Robert Light, Simon McAlister, Andrew Ravenscroft, Robert Rose, Doori Rose, Jun Otsuka, David Bourget, John Lawrence, Chris Reed.
Abstract
We show how faceted search using a combination of traditional classification systems and mixed-membership topic models can go beyond keyword search to inform resource discovery, hypothesis formulation, and argument extraction for interdisciplinary research. Our test domain is the history and philosophy of scientific work on animal mind and cognition. The methods can be generalized to other research areas and ultimately support a system for semi-automatic identification of argument structures. We provide a case study for the application of the methods to the problem of identifying and extracting arguments about anthropomorphism during a critical period in the development of comparative psychology. We show how a combination of classification systems and mixed-membership models trained over large digital libraries can inform resource discovery in this domain. Through a novel approach of "drill-down" topic modeling, which simultaneously reduces both the size of the corpus and the unit of analysis, we are able to reduce a large collection of fulltext volumes to a much smaller set of pages within six focal volumes containing arguments of interest to historians and philosophers of comparative psychology. The volumes identified in this way did not appear among the first ten results of the keyword search in the HathiTrust digital library, and the pages bear the kind of "close reading" needed to generate original interpretations that is the heart of scholarly work in the humanities. Zooming back out, we provide a way to place the books onto a map of science originally constructed from very different data and for different purposes. The multilevel approach advances understanding of the intellectual and societal contexts in which writings are interpreted.
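The "drill-down" idea in the abstract (shrinking the corpus while also shrinking the unit of analysis from volume to page) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `drill_down`, the threshold rule for keeping a volume, the threshold value, and the toy weights are hypothetical and are not taken from the paper.

```python
def drill_down(volumes, volume_topic_weight, focal_topics, threshold):
    """Keep volumes with enough weight on the focal topics, then split them into pages.

    ASSUMPTION: a simple summed-weight threshold stands in for whatever
    selection rule the authors actually used.
    """
    kept = [
        v for v in volumes
        if sum(volume_topic_weight[v["id"]].get(t, 0.0) for t in focal_topics) >= threshold
    ]
    # The unit of analysis changes here: one record per page, not per volume.
    pages = [
        {"volume": v["id"], "page": n, "text": text}
        for v in kept
        for n, text in enumerate(v["pages"], start=1)
    ]
    return kept, pages


# HYPOTHETICAL toy data: two volumes with made-up topic weights.
volumes = [
    {"id": "v1", "pages": ["page one text", "page two text"]},
    {"id": "v2", "pages": ["a", "b", "c"]},
]
weights = {"v1": {16: 0.4, 10: 0.2}, "v2": {38: 0.7, 16: 0.05}}

kept, pages = drill_down(volumes, weights, focal_topics=[10, 16, 26], threshold=0.3)
print([v["id"] for v in kept], len(pages))
```

The page records returned by such a step would then be the input to a second, page-level topic model.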
Year: 2017 PMID: 28922416 PMCID: PMC5602542 DOI: 10.1371/journal.pone.0184188
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Corpus analysis sequence.
Schematic rendering of the six-step process that sequentially drills down from macroscopic “distant reading” to microscopic “close reading” before zooming back out to the macroscopic scale at the final step. The approximate orders of magnitude of the datasets either side of each processing step are shown below the icons as powers of 10 of book/fulltext-sized units, and grey bars representing the data are scaled logarithmically.
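As a rough illustration of the logarithmic scaling described in the caption, the sketch below draws text bars for the three named corpora. It assumes that the numeric suffix of each corpus name (HT1315, HT86, HT6) is its volume count, and it rounds to the nearest power of 10; the intermediate steps of the six-step sequence and the figure's exact scaling are not reproduced.

```python
import math

# ASSUMPTION: the suffix of each corpus name is its size in volumes.
corpus_sizes = {"HT1315": 1315, "HT86": 86, "HT6": 6}


def order_of_magnitude(size):
    """Nearest power of 10 for a corpus size."""
    return round(math.log10(size))


def log_bar(size, unit=10):
    """Text bar whose length is proportional to log10(size)."""
    return "#" * round(unit * math.log10(size))


for name, size in corpus_sizes.items():
    print(f"{name:>7} ~10^{order_of_magnitude(size)} {log_bar(size)}")
```

On a linear scale the smallest corpus would be invisible next to the largest; on the log scale the bar lengths differ by only about a factor of four.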
Topics ranked by similarity to ‘anthropomorphism’ in the HT1315 corpus.
Topic 16 (highlighted with bold text) is highly relevant to the inquiry.
| Topic | 10 most probable words from topic |
|---|---|
| 38 | god, religion, life, man, religious, spirit, world, nature, spiritual, divine |
| 51 | philosophy, nature, knowledge, world, thought, idea, things, reason, truth, science |
| 58 | man, among, tribes, primitive, men, people, also, races, women, race |
| 12 | child, children, first, development, movements, play, life, little, mental, mother |
| 21 | social, life, new, mind, upon, individual, human, mental, world, subfield |
| 11 | motion, force, must, forces, matter, changes, us, parts, like, evolution |
| 1 | pp, der, vol, die, de, des, und, ibid, university, la |
| 31 | gods, religion, p, name, see, god, india, ancient, one, worship |
Topics ranked by similarity to ‘anthropomorphism’, ‘animal’, and ‘psychology’ in the HT1315 corpus.
Topics 26, 16, and 10 (highlighted with bold text) were used to derive the HT86 corpus, as they were most relevant to the inquiry.
| Topic | 10 most probable words from topic |
|---|---|
| 47 | college, university, professor, school, law, work, students, degree, education, new |
| 49 | subfield, code, datafield, tag, ind2, ind1, b, d, c, controlfield |
| 1 | pp, der, vol, die, de, des, und, ibid, university, la |
| 12 | child, children, first, development, movements, play, life, little, mental, mother |
| 58 | man, among, tribes, primitive, men, people, also, races, women, race |
| 21 | social, life, new, mind, upon, individual, human, mental, world, subfield |
| 2 | test, tests, age, group, children, mental, table, per, cent, number |
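The two tables above rank topics against a one-word query and a three-word query. The record does not state how that similarity is computed, so the following is only a sketch of one simple possibility: score each topic by the total probability it assigns to the query words. The function name `rank_topics`, the topic labels, and all probabilities are hypothetical toy values.

```python
def rank_topics(topic_word, query):
    """Rank topic ids by the summed probability they give to the query words.

    ASSUMPTION: summed word probability is a stand-in scoring rule, not
    necessarily the similarity measure behind the tables.
    """
    scores = {
        topic: sum(words.get(w, 0.0) for w in query)
        for topic, words in topic_word.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# HYPOTHETICAL toy model: three topics over a tiny vocabulary.
toy_model = {
    "A": {"animal": 0.5, "instinct": 0.3, "psychology": 0.2},
    "B": {"god": 0.6, "religion": 0.3, "anthropomorphism": 0.1},
    "C": {"psychology": 0.45, "mental": 0.4, "animal": 0.1, "anthropomorphism": 0.05},
}

single = rank_topics(toy_model, ["anthropomorphism"])
multi = rank_topics(toy_model, ["anthropomorphism", "animal", "psychology"])
print(single)  # ['B', 'C', 'A']
print(multi)   # ['A', 'C', 'B']
```

In the toy model, adding 'animal' and 'psychology' to the query moves the religion-flavoured topic from first to last, which mirrors how the two tables surface different topics.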
Book titles ranked by proximity of the full texts to topics 10, 16, and 26 in the k = 60 model of the HT1315 corpus.
| Document | Distance |
|---|---|
| Secrets of animal life | 0.87689 |
| Comparative studies in the psychology of ants and of higher … | 0.88814 |
| The colours of animals, their meaning and use, especially … | 0.98445 |
| The foundations of normal and abnormal psychology | 0.99833 |
| The bird rookeries of the Tortugas | 1.00286 |
| Mind in animals | 1.00294 |
| Ants and some other insects; an inquiry into the psychic … | 1.00504 |
| Systematic science teaching: a manual of inductive … | 1.01040 |
| The riddle of the universe at the close of the 19th C. | 1.01450 |
| The colour-sense: its origin and development. | 1.02795 |
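Ranking documents by their distance to a set of chosen topics could look like the sketch below. The table does not name its distance measure, and some of its values exceed 1, so the measure used here (base-2 Jensen-Shannon distance, which is bounded by 1, against a target spread evenly over the focal topics) should be read as an illustrative substitute. The document names and topic mixtures are hypothetical.

```python
import math


def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two equal-length distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    # max() guards against tiny negative values from floating-point rounding.
    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def rank_documents(doc_topics, focal, n_topics):
    """Sort documents by distance to a target spread evenly over the focal topics."""
    target = [1 / len(focal) if k in focal else 0.0 for k in range(n_topics)]
    dist = {doc: js_distance(theta, target) for doc, theta in doc_topics.items()}
    return sorted(dist.items(), key=lambda item: item[1])


# HYPOTHETICAL document-topic mixtures over four topics.
doc_topics = {
    "doc_z": [0.0, 0.0, 0.5, 0.5],
    "doc_y": [0.25, 0.25, 0.25, 0.25],
    "doc_x": [0.5, 0.5, 0.0, 0.0],
}

ranked = rank_documents(doc_topics, focal={0, 1}, n_topics=4)
for doc, d in ranked:
    print(f"{doc}  {d:.5f}")
```

A document that places all of its weight on the focal topics sits at distance 0, and one with no weight on them sits at the maximum distance of 1.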
Topics ranked by similarity to ‘anthropomorphism’ in the HT86 corpus, as modeled at the page level.
| Topic | 10 most probable words from topic |
|---|---|
| 18 | god, religion, evolution, religious, man, human, science, world, christian, belief |
| 3 | mind, man, facts, life, evolution, instinct, subjective, instincts, organic, development |
| 1 | animal, animals, may, stimulus, experience, would, instinct, reaction, one, stimuli |
| 51 | sense, sensation, qualities, touch, perception, sensations, extension, sight, senses, us |
Topics ranked by similarity to ‘anthropomorphism’, ‘animal’, and ‘psychology’ in the HT86 corpus.
| Topic | 10 most probable words from topic |
|---|---|
| 1 | animal, animals, may, stimulus, experience, would, instinct, reaction, one, stimuli |
| 51 | sense, sensation, qualities, touch, perception, sensations, extension, sight, senses, us |
| 18 | god, religion, evolution, religious, man, human, science, world, christian, belief |
| 3 | mind, man, facts, life, evolution, instinct, subjective, instincts, organic, development |
Pages ranked by similarity to Topic 1.
| Document | Distance |
|---|---|
| The animal mind, 1st ed., p. 43 | 0.04414 |
| The animal mind, 2nd ed., p. 47 | 0.04552 |
| The animal mind, 2nd ed., p. 263 | 0.10360 |
| The animal mind, 2nd ed., p. 16 | 0.12336 |
| The animal mind, 2nd ed., p. 71 | 0.15828 |
| The animal mind, 1st ed., p. 219 | 0.16288 |
| The animal mind, 1st ed., p. 232 | 0.16674 |
| The animal mind, 1st ed., p. 57 | 0.18380 |
| The animal mind, 1st ed., p. 72 | 0.22610 |
| Mind in the lower animals, p. 179 | 0.23408 |
Pages for which OVA+ argument maps were created, showing total number of pages analyzed and numbers of arguments identified on each of the passes described in the main text.
| Volume | Pages | Total pages | Pass 1 | Pass A |
|---|---|---|---|---|
| | 13–16, 16–21, 24–27, 28–31, 31–34, 58–64, 204–207, 288–294 | 40 | 9 | 15 |
| | Preface, 15–19, 31–34, 48–53, 99–103, 108–112, 206–209, 209–214 | 37 | 8 | 10 |
| | 374, 381, 382, 385, 386, 390, 394, 395 | 10 | | 8 |
| | 434–435, 436 | 3 | | 2 |
| | 16–18, 21–26, 30–32 | 12 | | 5 |
| | 479–484 | 6 | | 3 |
| Total | | 108 | 17 | 43 |
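The totals row can be checked against the per-volume rows. The check below works only under one reading of the table, stated here as an assumption: the four volumes that list a single argument count were analyzed in the second pass only, so their counts belong in the last column.

```python
# (total pages, pass 1 arguments, second-pass arguments) per volume row.
# ASSUMPTION: None marks volumes with no first-pass count.
rows = [
    (40, 9, 15),
    (37, 8, 10),
    (10, None, 8),
    (3, None, 2),
    (12, None, 5),
    (6, None, 3),
]


def column_total(rows, index):
    """Sum one column, treating missing counts as zero."""
    return sum(row[index] or 0 for row in rows)


totals = tuple(column_total(rows, i) for i in range(3))
print(totals)  # (108, 17, 43)
```

Under that reading the sums match the final row exactly: 108 pages, 17 first-pass arguments, and 43 second-pass arguments.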
Fig 2. An argument map derived from The Animal Mind, represented in OVA+.
Fig 3. UCSD map of science with overlay of HathiTrust search results.
This image shows topical coverage of humanities and life science data. The basemap of science shows each sub-discipline denoted by a circle colored or shaded according to the 13 core disciplines. Links indicate journal co-citations from the basemap. The 776 volumes of HT1315 with LCCN metadata are shown on the map as circles. Volumes also in HT86 are shown with thicker circles, and those in HT6 are shown in the thickest circles. An online, interactive version can be explored at http://inpho.cogs.indiana.edu/scimap/scits.