Optimizing Corpus Creation for Training Word Embedding in Low Resource Domains: A Case Study in Autism Spectrum Disorder (ASD).

Yang Gu¹, Gondy Leroy¹, Sydney Pettygrove¹, Maureen Kelly Galindo¹, Margaret Kurzius-Spencer¹.

Abstract

Automating the extraction of behavioral criteria indicative of Autism Spectrum Disorder (ASD) from electronic health records (EHRs) can contribute significantly to the effort to monitor the condition. Word embedding algorithms such as Word2Vec encode the semantic meaning of words as vectors and can assist in automated vocabulary discovery from EHRs. However, the text available for training word embeddings for ASD is minuscule compared to the billions of tokens typically used. We evaluate the importance of corpus specificity versus size and hypothesize that, for specific domains, small corpora can generate excellent word embeddings. We custom-built 6 ASD-themed corpora (N=4482), using ASD EHRs and abstracts from PubMed (N=39K) and PsychInfo (N=69K), and evaluated them. We generated the most useful 200-dimension embeddings from the small ASD EHR data. Due to the diversity of their vocabulary, the abstract-based embeddings generated fewer related terms and showed minimal improvement as corpus size increased.
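The abstract does not describe how related terms are pulled out of a trained embedding. A common approach, sketched here under stated assumptions, is to rank every vocabulary word by cosine similarity to a seed term and keep the top k. The words, the 3-dimensional vectors, and the helper names `cosine` and `related_terms` are hypothetical illustrations; the paper's embeddings are 200-dimensional and learned by Word2Vec, not written by hand.

```python
import math

# Hypothetical toy embeddings: 3 dimensions for readability only.
# Real Word2Vec vectors (200-d in the abstract) are learned from a corpus.
EMBEDDINGS = {
    "tantrum": [0.9, 0.1, 0.0],
    "meltdown": [0.85, 0.15, 0.05],
    "outburst": [0.8, 0.2, 0.1],
    "echolalia": [0.1, 0.9, 0.2],
    "fever": [0.0, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def related_terms(seed, embeddings, k=2):
    """Return the k words most similar to `seed`, excluding the seed itself."""
    target = embeddings[seed]
    scored = [
        (word, cosine(target, vec))
        for word, vec in embeddings.items()
        if word != seed
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:k]]


if __name__ == "__main__":
    print(related_terms("tantrum", EMBEDDINGS))
```

With these toy vectors, `related_terms("tantrum", EMBEDDINGS)` returns `["meltdown", "outburst"]`. In this framing, an embedding that "generates fewer related terms" is one whose nearest neighbors for a seed word are less often judged relevant.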

Entities: Disease

Year: 2018 PMID: 30815091 PMCID: PMC6371367

Source DB: PubMed Journal: AMIA Annu Symp Proc ISSN: 1559-4076

