
CAS: corpus of clinical cases in French.

Natalia Grabar, Clément Dalloux, Vincent Claveau.

Abstract

BACKGROUND: Textual corpora are essential to NLP applications: they provide the information needed to create, configure, and test those applications and the corresponding tools. They are also crucial for designing reliable methods and obtaining reproducible results. Yet in some domains, such as medicine, confidentiality and ethical constraints make it difficult or even impossible to access representative textual data. We propose the CAS corpus, built from clinical cases as they are reported in the published scientific literature in French.
RESULTS: The corpus currently contains 4,900 clinical cases in French, totaling nearly 1.7M word occurrences. Some clinical cases are accompanied by discussions. A subset of the cases is enriched with morpho-syntactic annotations (PoS tagging, lemmatization) and semantic annotations (UMLS concepts, negation, uncertainty). The corpus is continuously being enriched with new clinical cases and annotations. The CAS corpus has been compared with similar clinical narratives: computed on tokenized, lowercased words, the Jaccard index between clinical cases and narratives reaches up to 0.9727.
CONCLUSION: We expect that the CAS corpus can be effectively exploited for the development and testing of NLP tools and methods. In addition, the corpus will be used in NLP challenges and distributed to the research community.
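
The Jaccard index mentioned in RESULTS is the size of the intersection of two sets divided by the size of their union, J(A, B) = |A ∩ B| / |A ∪ B|. The following is a minimal sketch of how such a score could be computed on tokenized, lowercased words. It assumes that the sets are the unique word types of each text and uses a simple regex tokenizer; the abstract does not specify the tokenizer or the exact setup used by the authors, and the function names and example sentences are illustrative only.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Assumption: a plain regex tokenizer, not the authors' actual tool.
    """
    return re.findall(r"\w+", text.lower())


def jaccard_index(text_a, text_b):
    """Jaccard index between the vocabularies (unique tokens) of two texts."""
    vocab_a = set(tokenize(text_a))
    vocab_b = set(tokenize(text_b))
    union = vocab_a | vocab_b
    if not union:
        # Both texts are empty: define the similarity as 0.0 by convention.
        return 0.0
    return len(vocab_a & vocab_b) / len(union)


# Hypothetical example sentences, not taken from the CAS corpus.
case = "Le patient présente une fièvre et une toux."
narrative = "La patiente présente une toux sans fièvre."
print(jaccard_index(case, narrative))
```

In practice the two inputs would be whole corpora rather than single sentences, and the score depends strongly on tokenization and normalization choices, so values computed this way are not directly comparable to the figure reported in the abstract.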


Keywords:  Corpus with clinical cases; Medical area; Morpho-syntactic and semantic annotation; Natural language processing; Reproducibility; Sustainability


Year: 2020    PMID: 32762729    PMCID: PMC7410149    DOI: 10.1186/s13326-020-00225-x

Source DB: PubMed    Journal: J Biomed Semantics


