Aki-Juhani Kyröläinen, James Gillett, Megan Karabin, Ranil Sonnadara, Victor Kuperman.
Abstract
This paper presents the Cognitive and Social WELL-being (CoSoWELL) project that consists of two components. One is a large corpus of narratives written by over 1000 North American older adults (55+ years old) in five test sessions before and during the first year of the COVID-19 pandemic. The other component is a rich collection of socio-demographic data collected through a survey from the same participants. This paper introduces the first release of the corpus consisting of 1.3 million tokens and the survey data (CoSoWELL version 1.0). It also presents a series of analyses validating design decisions for creating the corpus of narratives written about personal life events that took place in the distant past, recent past (yesterday) and future, along with control narratives. We report results of computational topic modeling and linguistic analyses of the narratives in the corpus, which track the time-locked impact of the COVID-19 pandemic on the content of autobiographical memories before and during the COVID-19 pandemic. The main findings demonstrate a high validity of our analytical approach to unique narrative data and point to both the locus of topical shifts (narratives about recent past and future) and their detailed timeline. We make the CoSoWELL corpus and survey data available to researchers and discuss implications of our findings in the framework of research on aging and autobiographical memories under stress.
Keywords: Aging; Autobiographical memory; COVID-19; Text analytics
Year: 2022 PMID: 36002624 PMCID: PMC9400578 DOI: 10.3758/s13428-022-01926-0
Source DB: PubMed Journal: Behav Res Methods ISSN: 1554-351X
Number of COVID-19 cases and deaths in the United States and Canada
| USA | | CANADA | |
|---|---|---|---|
| T1 (March 1, 2019 to June 30, 2019) | |||
| Total confirmed cases as of March 1, 2019 | 0 | Total confirmed cases as of March 1, 2019 | 0 |
| Total confirmed deaths as of March 1, 2019 | 0 | Total confirmed deaths as of March 1, 2019 | 0 |
| T2 (April 8, 2020 to June 16, 2020) | |||
| Total confirmed cases as of April 8, 2020 | 378,220 | Total confirmed cases as of April 8, 2020 | 17,049 |
| Total confirmed deaths as of April 8, 2020 | 12,620 | Total confirmed deaths as of April 8, 2020 | 345 |
| Week of March 30 – April 5 cases | 171,279 | Week of March 30 – April 5 cases | 8181 |
| Week of March 30 – April 5 deaths | 5002 | Week of March 30 – April 5 deaths | 159 |
| T3 (June 17, 2020 to June 30, 2020) | |||
| Total confirmed cases as of June 17, 2020 | 2,105,922 | Total confirmed cases as of June 17, 2020 | 108,829 |
| Total confirmed deaths as of June 17, 2020 | 117,182 | Total confirmed deaths as of June 17, 2020 | 8175 |
| Week of July 13 – July 19 cases | 461,531 | Week of July 13 – July 19 cases | 2543 |
| Week of July 13 – July 19 deaths | 6263 | Week of July 13 – July 19 deaths | 80 |
| T4 (Oct 14, 2020 to Nov 5, 2020) | |||
| Total confirmed cases as of Oct 14, 2020 | 7,880,896 | Total confirmed cases as of October 14, 2020 | 182,839 |
| Total confirmed deaths as of Oct 14, 2020 | 219,658 | Total confirmed deaths as of October 14, 2020 | 9589 |
| Week of Oct 12 – Oct 19 cases | 387,466 | Week of Oct 12 – Oct 19 cases | 15,989 |
| Week of Oct 12 – Oct 19 deaths | 4805 | Week of Oct 12 – Oct 19 deaths | 136 |
| T5 (Jan 12, 2021 to Feb 15, 2021) | |||
| Total confirmed cases as of Jan 12, 2021 | 23,533,768 | Total confirmed cases as of Jan 12, 2021 | 661,334 |
| Total confirmed deaths as of Jan 12, 2021 | 390,029 | Total confirmed deaths as of Jan 12, 2021 | 16,849 |
| Week of Jan 11 – Jan 17 cases | 1,591,103 | Week of Jan 11 – Jan 17 cases | 51,355 |
| Week of Jan 11 – Jan 17 deaths | 23,289 | Week of Jan 11 – Jan 17 deaths | 1016 |
Summary of participation in CoSoWELL tasks presented by the number of completed test sessions
| CoSoWELL | 1 test session | 2 test sessions | 3 test sessions | Total |
|---|---|---|---|---|
| Survey | 1451 | 0 | 0 | 1451 |
| Corpus | 687 | 315 | 176 | 1178 |
| Survey and corpus | 546 | 308 | 174 | 1028 |
Number of participants by the CoSoWELL components and age group
| CoSoWELL | Total | Age 55–59 | Age 60–64 | Age 65–69 | Age 70–100 |
|---|---|---|---|---|---|
| Survey | 1451 | 536 | 473 | 297 | 145 |
| Survey and corpus | 1028 | 409 | 315 | 208 | 96 |
Number of participants by the source of the data in CoSoWELL and level of education including missing data (NA)
| CoSoWELL | Total | High school or less | College (complete or partial) | Bachelor’s | Graduate degree | NA |
|---|---|---|---|---|---|---|
| Survey | 1451 | 134 | 530 | 488 | 286 | 13 |
| Survey and corpus | 1028 | 86 | 372 | 336 | 229 | 5 |
Descriptive statistics of the corpus by the test session. The upper part summarizes corpus data from all participants in the narrative writing task. The lower part summarizes data from the participants who completed both the narrative writing task and the survey (corpus + survey)
| Test session | Token | Type | Lemma | Narrative | Sentence | Participant |
|---|---|---|---|---|---|---|
| CoSoWELL corpus | ||||||
| t1 | 161,181 | 9462 | 7787 | 848 | 9278 | 212 |
| t2 | 397,416 | 14,670 | 12,195 | 2111 | 22,497 | 402 |
| t3 | 270,443 | 12,779 | 10,571 | 1708 | 15,137 | 427 |
| t4 | 248,931 | 12,058 | 10,002 | 1640 | 13,588 | 409 |
| t5 | 260,349 | 12,461 | 10,318 | 1668 | 14,315 | 395 |
| Total | 1,338,320 | 61,430 | 50,873 | 7975 | 74,815 | 1845∗ (1178) |
| CoSoWELL corpus + survey | ||||||
| t1 | 158,656 | 9394 | 7728 | 832 | 9127 | 208 |
| t2 | 326,118 | 13,252 | 10,978 | 1727 | 18,429 | 317 |
| t3 | 256,606 | 12,427 | 10,272 | 1604 | 14,371 | 401 |
| t4 | 224,696 | 11,407 | 9483 | 1480 | 12,274 | 370 |
| t5 | 256,957 | 12,378 | 10,251 | 1640 | 14,105 | 388 |
| Total | 1,223,033 | 58,858 | 48,712 | 7283 | 68,306 | 1684∗ (1028) |
The number marked by ∗ represents the total number of participants across test sessions, counting participation in each test session separately. The number in parentheses represents the unique number of participants across all test sessions.
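The relation between the ∗ totals and the unique participant counts can be checked against the participation table. The sketch below is a consistency check on the figures as printed in the two tables, not part of the authors' pipeline; all variable and function names are illustrative.

```python
# Per-session participant counts (t1..t5), copied from the corpus table.
corpus_by_session = [212, 402, 427, 409, 395]
corpus_survey_by_session = [208, 317, 401, 370, 388]

# Participants by number of completed test sessions, copied from the
# participation table: {sessions completed: number of participants}.
corpus_by_completed = {1: 687, 2: 315, 3: 176}
corpus_survey_by_completed = {1: 546, 2: 308, 3: 174}


def session_participations(by_completed):
    """Total participations, counting each completed session separately."""
    return sum(sessions * people for sessions, people in by_completed.items())


def unique_participants(by_completed):
    """Number of distinct participants across all test sessions."""
    return sum(by_completed.values())


for label, per_session, by_completed in [
    ("corpus", corpus_by_session, corpus_by_completed),
    ("corpus + survey", corpus_survey_by_session, corpus_survey_by_completed),
]:
    # The first two numbers should agree with the value marked by the asterisk,
    # the third with the value in parentheses.
    print(label,
          sum(per_session),
          session_participations(by_completed),
          unique_participants(by_completed))
```

Both routes give 1845 participations for 1178 unique participants in the full corpus, and 1684 participations for 1028 unique participants in the corpus + survey subset.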
Fig. 1. Visualization of the output of the preprocessing pipeline for one sentence. The arcs represent dependency relations. Tokenization is given in green, part-of-speech tagging in black and the labels of the dependency relations in red
Fig. 2. Model diagnostics for evaluating the number of topics. The vertical line indicates the chosen model (k = 22)
Fig. 3. A word cloud of the estimated keywords for each topic. The most important keyword of a given topic is placed in the center of the cloud. The keywords are presented in rank order and the ranking is further denoted with the size and color
Fig. 4. Estimated document-topic probability distribution across the four story types. Each shaded line represents a life story and its document-topic probability distribution
Confusion matrix of the random forest (RF) classifier based on the test data. Correctly classified narratives are located on the diagonal
| Predicted (rows) / Observed (columns) | cookie | future | past | yesterday |
|---|---|---|---|---|
| cookie | 402 | 0 | 3 | 3 |
| future | 2 | 332 | 43 | 59 |
| past | 9 | 29 | 366 | 49 |
| yesterday | 3 | 55 | 31 | 316 |
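Summary metrics can be derived directly from the confusion matrix. The following is a minimal sketch that uses only the counts printed in the table; the function names are illustrative, and the resulting values are not necessarily the figures reported in the paper.

```python
LABELS = ["cookie", "future", "past", "yesterday"]

# Rows are predicted story types, columns are observed story types,
# in the same order as in the table.
CONFUSION = [
    [402, 0, 3, 3],
    [2, 332, 43, 59],
    [9, 29, 366, 49],
    [3, 55, 31, 316],
]


def accuracy(matrix):
    """Share of all test narratives that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def precision(matrix, i):
    """Share of narratives predicted as class i that were observed as class i."""
    return matrix[i][i] / sum(matrix[i])


def recall(matrix, i):
    """Share of narratives observed as class i that were predicted as class i."""
    return matrix[i][i] / sum(row[i] for row in matrix)


print(f"accuracy: {accuracy(CONFUSION):.3f}")
for i, label in enumerate(LABELS):
    print(f"{label}: precision={precision(CONFUSION, i):.3f}, "
          f"recall={recall(CONFUSION, i):.3f}")
```

With these counts, 1416 of the 1702 test narratives fall on the diagonal, an overall accuracy of about 0.83. Most of the confusion is among the three autobiographical story types, while the control ("cookie") narratives are rarely misclassified.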
Fig. 5. Estimated partial dependence (PD) profiles of the most discriminative predictors for the story type “cookie”. The average predicted probability is given on the y-axis and the x-axis gives the values associated with a specific predictor (topic) broken down by the four story types
Fig. 6. Estimated PD profiles of the most discriminative predictors for the story type “future”. The average predicted probability is given on the y-axis and the x-axis gives the values associated with a specific predictor (topic) broken down by the four story types
Fig. 7. Estimated PD profiles of the most discriminative predictors for the story type “yesterday”. The average predicted probability is given on the y-axis and the x-axis gives the values associated with a specific predictor (topic) broken down by the four story types
Fig. 8. Estimated PD profiles of the most discriminative predictors for the story type “past”. The average predicted probability is given on the y-axis and the x-axis gives the values associated with a specific predictor (topic) broken down by the four story types
Fig. 9. Topic prevalence by test session. Panel titles indicate Story type: topic name. Error bars stand for 95% CI. Dotted lines mark the baseline prevalence. The test sessions correspond to the following dates: t1 = 2019; t2 = from 2020-04-08 to 2020-06-16; t3 = from 2020-06-17 to 2020-06-30; t4 = from 2020-10-14 to 2020-11-05 and t5 = from 2021-01-12 to 2021-02-15
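The quantity plotted in the PD profiles of Figs. 5 to 8, an average predicted probability as a function of one predictor, can be illustrated with a generic partial dependence computation. This is a sketch under stated assumptions, not the authors' code: `toy_predict_proba` is a hypothetical stand-in for a fitted classifier, `topic_a` and `topic_b` are made-up topic proportions, and only one story type is shown rather than four.

```python
def partial_dependence(predict_proba, rows, feature, grid):
    """For each grid value, force `feature` to that value in every row
    and average the predicted probabilities over all rows."""
    profile = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original data stay intact
            modified[feature] = value  # overwrite the predictor of interest
            total += predict_proba(modified)
        profile.append((value, total / len(rows)))
    return profile


def toy_predict_proba(row):
    """Hypothetical model: probability of one story type given two topics."""
    score = 0.2 + 0.7 * row["topic_a"] - 0.1 * row["topic_b"]
    return min(1.0, max(0.0, score))


# Made-up narratives described by two topic proportions (illustration only).
rows = [
    {"topic_a": 0.1, "topic_b": 0.5},
    {"topic_a": 0.4, "topic_b": 0.0},
    {"topic_a": 0.8, "topic_b": 1.0},
]

profile = partial_dependence(toy_predict_proba, rows, "topic_a", [0.0, 0.5, 1.0])
for value, mean_probability in profile:
    print(f"topic_a={value:.1f} -> mean predicted probability {mean_probability:.2f}")
```

Each point on a PD profile averages over the observed values of all other predictors, so the curve shows how the predicted probability of a story type changes with one topic while the rest of the data are held as observed.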