
Optimizing Corpus Creation for Training Word Embedding in Low Resource Domains: A Case Study in Autism Spectrum Disorder (ASD).

Yang Gu, Gondy Leroy, Sydney Pettygrove, Maureen Kelly Galindo, Margaret Kurzius-Spencer.

Abstract

Automating the extraction of behavioral criteria indicative of Autism Spectrum Disorder (ASD) from electronic health records (EHRs) can contribute significantly to efforts to monitor the condition. Word embedding algorithms such as Word2Vec encode the semantic meanings of words as vectors and can assist in automated vocabulary discovery from EHRs. However, the text available for training word embeddings for ASD is minuscule compared to the billions of tokens typically used. We evaluate the importance of corpus specificity versus size and hypothesize that, for specific domains, small corpora can generate excellent word embeddings. We custom-built 6 ASD-themed corpora (N=4482), using ASD EHRs and abstracts from PubMed (N=39K) and PsychInfo (N=69K), and evaluated them. The most useful 200-dimension embeddings were generated from the small ASD EHR data. Due to the diversity of their vocabulary, the abstract-based embeddings generated fewer related terms and saw minimal improvement when the size of the corpus increased.
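
The abstract measures embedding quality by how many related terms an embedding surfaces for a seed word. As a rough illustration of that lookup step only, the sketch below ranks vocabulary entries by cosine similarity to a seed term. It is not the authors' pipeline: the function names `cosine` and `most_related`, the 3-dimensional vectors, and the example words are all invented for illustration (the study used 200-dimension Word2Vec embeddings trained on real corpora).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_related(term, embeddings, top_n=3):
    """Return the top_n (word, similarity) pairs closest to `term`, excluding the term itself."""
    query = embeddings[term]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy vectors, made up for this sketch; not taken from the paper.
toy_embeddings = {
    "tantrum": [0.9, 0.1, 0.0],
    "meltdown": [0.8, 0.2, 0.1],
    "echolalia": [0.1, 0.9, 0.2],
    "invoice": [0.0, 0.1, 0.9],
}

related = most_related("tantrum", toy_embeddings, top_n=2)
for word, score in related:
    print(f"{word}\t{score:.3f}")
```

In a setting like the one described, the candidate terms returned for each seed word would presumably still need review by domain experts before being counted as related.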


Year:  2018        PMID: 30815091      PMCID: PMC6371367     

Source DB:  PubMed          Journal:  AMIA Annu Symp Proc        ISSN: 1559-4076


  1 in total

1.  Development and evaluation of novel ophthalmology domain-specific neural word embeddings to predict visual prognosis.

Authors:  Sophia Wang; Benjamin Tseng; Tina Hernandez-Boussard
Journal:  Int J Med Inform       Date:  2021-04-16       Impact factor: 4.730

