Machine learning to detect invalid text responses: Validation and comparison to existing detection methods.

Ryan C Yeung, Myra A Fernandes.

Abstract

A crucial step in analysing text data is the detection and removal of invalid texts (e.g., texts with meaningless or irrelevant content). To date, research topics that rely heavily on analysis of text data, such as autobiographical memory, have lacked methods of detecting invalid texts that are both effective and practical. Although researchers have suggested many data quality indicators that might identify invalid responses (e.g., response time, character/word count), few of these methods have been empirically validated with text responses. In the current study, we propose and implement a supervised machine learning approach that can mimic the accuracy of human coding, but without the need to hand-code entire text datasets. Our approach (a) trains, validates, and tests on a subset of texts manually labelled as valid or invalid, (b) calculates performance metrics to help select the best model, and (c) predicts whether unlabelled texts are valid or invalid based on the text alone. Model validation and evaluation using autobiographical memory texts indicated that machine learning accurately detected invalid texts with performance near human coding, significantly outperforming existing data quality indicators. Our openly available code and instructions enable new methods of improving data quality for researchers using text as data.
© 2022. The Psychonomic Society, Inc.
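
The three steps described in the abstract (train and test on a hand-labelled subset, compute performance metrics, then predict labels for unlabelled texts) can be illustrated with a minimal sketch. The abstract does not name the model or features the authors used, so everything here is an assumption made for illustration: a small bag-of-words multinomial naive Bayes classifier stands in for their model, the helper names (`tokenize`, `NaiveBayesTextClassifier`, `precision_recall_f1`) are hypothetical, and the toy texts are invented. A separate validation split for model selection is omitted for brevity. The authors' openly available code is the reference for their actual pipeline.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict_one(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            denom = total_words + len(self.vocab)
            score = math.log(self.doc_counts[c] / total_docs)  # class prior
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][tok] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def precision_recall_f1(y_true, y_pred, positive="invalid"):
    """Compute precision, recall, and F1 for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# (a) Hand-labelled subset, split into training and test texts (invented data).
train_texts = [
    "I remember my graduation day with my family",
    "We went to the beach last summer and I swam in the ocean",
    "My grandmother baked bread with me when I was a child",
    "asdf asdf asdf",
    "nothing no idea",
    "n/a none nothing",
]
train_labels = ["valid", "valid", "valid", "invalid", "invalid", "invalid"]
test_texts = ["I remember the day my family went to the beach", "asdf nothing none"]
test_labels = ["valid", "invalid"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)

# (b) Performance metrics on the held-out test texts.
test_pred = model.predict(test_texts)
precision, recall, f1 = precision_recall_f1(test_labels, test_pred)

# (c) Predict validity for unlabelled texts from the text alone.
unlabelled = ["no idea", "My family baked bread"]
unlabelled_pred = model.predict(unlabelled)
print(test_pred, (precision, recall, f1), unlabelled_pred)
```

With a real dataset, the metrics from step (b) would be computed for several candidate models on a validation split, and only the selected model would be applied in step (c).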

Keywords:  Autobiographical memory; Careless responding; Machine learning; Text as data; Text classification

Year:  2022        PMID: 35175566     DOI: 10.3758/s13428-022-01801-y

Source DB:  PubMed          Journal:  Behav Res Methods        ISSN: 1554-351X


