Corpus-based typology: applications, challenges and some solutions.

Natalia Levshina

Abstract

Over the last few years, the number of corpora that can be used for language comparison has dramatically increased. The corpora are so diverse in their structure, size and annotation style that a novice might not know where to start. The present paper charts this new and changing territory, providing a few landmarks, warning signs and safe paths. Although no corpus at present can replace the traditional type of typological data based on language description in reference grammars, corpora can help with diverse tasks. They are particularly well suited for investigating probabilistic and gradient properties of languages and for discovering and interpreting cross-linguistic generalizations based on processing and communicative mechanisms. At the same time, the use of corpora for typological purposes brings not only advantages and opportunities but also numerous challenges. This paper also contains an empirical case study addressing two pertinent problems: the role of text types in language comparison and the problem of the word as a comparative concept.
© 2021 Walter de Gruyter GmbH, Berlin/Boston.

Keywords:  analyticity; comparable corpora; corpus annotation; language comparison; parallel corpora; universals

Year: 2021; PMID: 35881664; PMCID: PMC9159679; DOI: 10.1515/lingty-2020-0118

Source DB: PubMed; Journal: Linguist Typol; ISSN: 1430-0532

