Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality.

Adithya V Ganesan, Matthew Matero, Aravind Reddy Ravula, Huy Vu, H Andrew Schwartz.

Abstract

In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the standard 768+ hidden-state size of each layer within modern transformer-based language models, which limits the ability to leverage transformers effectively. Here, we provide a systematic study of the role of dimension reduction methods (principal components analysis, factorization techniques, or multi-layer auto-encoders), as well as the dimensionality of embedding vectors and sample sizes, as a function of predictive performance. We first find that fine-tuning large models with a limited amount of data poses a significant difficulty, which can be overcome with a pre-trained dimension reduction regime. RoBERTa consistently achieves top performance in human-level tasks, with PCA giving a benefit over other reduction methods by better handling users who write longer texts. Finally, we observe that a majority of the tasks achieve results comparable to the best performance with just 1/12 of the embedding dimensions.
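The abstract's central recipe, learning a dimension reduction on separate data and then fitting a model on a small labeled sample, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes PCA computed with NumPy's SVD, a closed-form ridge-regression head, and random synthetic arrays standing in for 768-dimensional user embeddings. The helper names (`fit_pca`, `transform`, `fit_ridge`, `predict_ridge`) are hypothetical. Keeping 768 / 12 = 64 components mirrors the "1/12 of the embedding dimensions" finding.

```python
import numpy as np

HIDDEN_SIZE = 768                # hidden-state size of a base-size transformer layer
REDUCED_DIM = HIDDEN_SIZE // 12  # 1/12 of the dimensions -> 64


def fit_pca(X, k):
    """Learn a k-component PCA projection from X (n_samples x n_features)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes of the centered data, sorted by variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def transform(X, mean, components):
    """Project X onto the learned principal axes."""
    return (X - mean) @ components.T


def fit_ridge(Z, y, alpha=1.0):
    """Closed-form ridge regression (assumed head): w = (Zc^T Zc + alpha*I)^-1 Zc^T yc."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc = Z - z_mean
    w = np.linalg.solve(Zc.T @ Zc + alpha * np.eye(Z.shape[1]), Zc.T @ (y - y_mean))
    return w, z_mean, y_mean


def predict_ridge(Z, w, z_mean, y_mean):
    return (Z - z_mean) @ w + y_mean


rng = np.random.default_rng(0)
# Synthetic stand-ins (assumption): random vectors, not real transformer outputs.
unlabeled = rng.normal(size=(2000, HIDDEN_SIZE))  # data used to pre-train the reduction
task_X = rng.normal(size=(100, HIDDEN_SIZE))      # only 100 labeled users, fewer than 768 dims
task_y = rng.normal(size=100)

mean, comps = fit_pca(unlabeled, REDUCED_DIM)     # reduction learned without the task labels
Z = transform(task_X, mean, comps)
w, z_mean, y_mean = fit_ridge(Z, task_y, alpha=1.0)
preds = predict_ridge(Z, w, z_mean, y_mean)
print(Z.shape)  # (100, 64)
```

With 100 labeled users and 768 raw dimensions, an unregularized least-squares fit would be underdetermined; projecting to 64 components first yields more samples than features, which is the small-sample regime the abstract argues a pre-trained reduction helps with.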

Year:  2021        PMID: 34296226      PMCID: PMC8294338          DOI: 10.18653/v1/2021.naacl-main.357

Source DB:  PubMed          Journal:  Proc Conf