

subs2vec: Word embeddings from subtitles in 55 languages.

Abstract

This paper introduces a novel collection of word embeddings, numerical representations of lexical semantics, in 55 languages, trained on a large corpus of pseudo-conversational speech transcriptions from television shows and movies. The embeddings were trained on the OpenSubtitles corpus using the fastText implementation of the skipgram algorithm. Performance comparable with (and in some cases exceeding) embeddings trained on non-conversational (Wikipedia) text is reported on standard benchmark evaluation datasets. A novel evaluation method of particular relevance to psycholinguists is also introduced: prediction of experimental lexical norms in multiple languages. The models, as well as code for reproducing the models and all analyses reported in this paper (implemented as a user-friendly Python package), are freely available at: https://github.com/jvparidon/subs2vec.
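As a rough illustration of what "numerical representations of lexical semantics" means in practice, the sketch below compares words by the cosine similarity of their vectors. It is not taken from the subs2vec package: the function names `cosine_similarity` and `most_similar` and the three-dimensional `toy_vectors` are hypothetical, invented here for illustration, whereas real embeddings have hundreds of dimensions and are learned from a corpus.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, made up for this example only;
# they are NOT values from the subs2vec models.
toy_vectors = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}


def most_similar(word, vectors):
    """Return the other word whose vector is closest to `word` by cosine similarity."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )
```

With these toy values, `most_similar("cat", toy_vectors)` picks "dog" over "car", because the "cat" and "dog" vectors point in nearly the same direction.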


Keywords: Distributional semantics; Lexical norms; Multilingual; Word embeddings

Year: 2021 PMID: 32789660 DOI: 10.3758/s13428-020-01406-3

Source DB: PubMed Journal: Behav Res Methods ISSN: 1554-351X

31 in total

1. Age-of-acquisition norms for a set of 1,749 Portuguese words.

Authors: Manuela L Cameirão; Selene G Vicente
Journal: Behav Res Methods Date: 2010-05

2. Subjective frequency and imageability ratings for 3,600 French nouns.

Authors: Alain Desrochers; Glenn L Thompson
Journal: Behav Res Methods Date: 2009-05

3. Affective norms for 210 British English and Finnish nouns.

Authors: Tiina M Eilola; Jelena Havelka
Journal: Behav Res Methods Date: 2010-02

4. Lexico-semantic effects on word naming in Persian: does age of acquisition have an effect?

Authors: Mehdi Bakhtiar; Brendan Weekes
Journal: Mem Cognit Date: 2015-02

5. Sensory experience ratings for 5,500 Spanish words.

Authors: Antonio M Díez-Álamo; Emiliano Díez; Dominika Zofia Wojcik; María Angeles Alonso; Angel Fernandez
Journal: Behav Res Methods Date: 2019-06

6. Toward a brain-based componential semantic representation.

Authors: Jeffrey R Binder; Lisa L Conant; Colin J Humphries; Leonardo Fernandino; Stephen B Simons; Mario Aguilar; Rutvik H Desai
Journal: Cogn Neuropsychol Date: 2016-06-16 Impact factor: 2.468

7. Normative ratings for perceptual and motor attributes of 750 object concepts in Spanish.

Authors: Antonio M Díez-Álamo; Emiliano Díez; María Ángeles Alonso; C Alejandra Vargas; Angel Fernandez
Journal: Behav Res Methods Date: 2018-08

8. Concreteness norms for 1,659 French words: Relationships with other psycholinguistic variables and word recognition times.

Authors: Patrick Bonin; Alain Méot; Aurélia Bugaiska
Journal: Behav Res Methods Date: 2018-12

9. Assessing the usefulness of Google Books' word frequencies for psycholinguistic research on word processing.

Authors: Marc Brysbaert; Emmanuel Keuleers; Boris New
Journal: Front Psychol Date: 2011-03-02

10. Humor norms for 4,997 English words.

Authors: Tomas Engelthaler; Thomas T Hills
Journal: Behav Res Methods Date: 2018-06

2 in total

1. Rapid adaptation of predictive models during language comprehension: Aperiodic EEG slope, individual alpha frequency and idea density modulate individual differences in real-time model updating.

Authors: Ina Bornkessel-Schlesewsky; Isabella Sharrad; Caitlin A Howlett; Phillip M Alday; Andrew W Corcoran; Valeria Bellan; Erica Wilkinson; Reinhold Kliegl; Richard L Lewis; Steven L Small; Matthias Schlesewsky
Journal: Front Psychol Date: 2022-08-26

2. The verb-self link: An implicit association test study.

Authors: Patrick P Weis; Jan Nikadon; Cornelia Herbert; Magdalena Formanowicz
Journal: Psychon Bull Rev Date: 2022-05-02
