Ensembles of natural language processing systems for portable phenotyping solutions.

Cong Liu¹, Casey N Ta¹, James R Rogers¹, Ziran Li¹, Junghwan Lee¹, Alex M Butler¹, Ning Shang¹, Fabricio Sampaio Peres Kury¹, Liwei Wang², Feichen Shen², Hongfang Liu², Lyudmila Ena¹, Carol Friedman¹, Chunhua Weng³.

Abstract

BACKGROUND: Manually curating standardized phenotypic concepts, such as Human Phenotype Ontology (HPO) terms, from narrative text in electronic health records (EHRs) is time-consuming and error-prone. Natural language processing (NLP) techniques can facilitate automated phenotype extraction and thus improve the efficiency of curating clinical phenotypes from clinical text. While individual NLP systems can perform well on a single cohort, an ensemble-based method may help increase the portability of NLP pipelines across different cohorts.
METHODS: We compared four NLP systems (MetaMapLite, MedLEE, ClinPhen and cTAKES) and four ensemble techniques (intersection, union, majority voting and machine learning) for extracting generic phenotypic concepts. We addressed two research questions regarding automated phenotype recognition. First, we evaluated the performance of different approaches in identifying generic phenotypic concepts. Second, we compared the performance of different methods in identifying patient-specific phenotypic concepts. To better quantify how differences in concept granularity affect performance, we developed a novel evaluation metric that considers concept hierarchies and frequencies. Each approach was evaluated on a gold standard set of clinical documents annotated by clinical experts. One dataset, containing 1,609 concepts derived from 50 clinical notes from two different institutions, was used in both evaluations; an additional dataset of 608 concepts derived from 50 case report abstracts obtained from PubMed was used only for evaluating generic phenotypic concept identification.
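The three rule-based ensemble techniques named above can be illustrated with simple set operations. The following is a minimal sketch, not the authors' implementation: it assumes each system's output for one document is a set of HPO concept IDs, and it assumes "majority" means more than half of the systems (the abstract does not state the exact voting threshold). The function names and the concept IDs in the example are hypothetical placeholders, not results from the study. The machine learning (training-based) ensemble is not shown.

```python
from collections import Counter


def intersection_ensemble(outputs):
    """Keep only concepts that every system extracted."""
    if not outputs:
        return set()
    return set.intersection(*outputs)


def union_ensemble(outputs):
    """Keep every concept extracted by at least one system."""
    return set().union(*outputs)


def majority_vote_ensemble(outputs, min_votes=None):
    """Keep concepts extracted by at least `min_votes` systems.

    Assumption: by default a strict majority (more than half of the
    systems) is required; the paper's exact threshold may differ.
    """
    # Each output is a set, so a concept gets at most one vote per system.
    votes = Counter(concept for out in outputs for concept in out)
    if min_votes is None:
        min_votes = len(outputs) // 2 + 1
    return {concept for concept, n in votes.items() if n >= min_votes}


# Hypothetical per-document outputs (placeholder HPO IDs, not real results).
system_outputs = {
    "MetaMapLite": {"HP:0001250", "HP:0001263"},
    "MedLEE": {"HP:0001250", "HP:0000252"},
    "ClinPhen": {"HP:0001250", "HP:0001263", "HP:0000252"},
    "cTAKES": {"HP:0001250", "HP:0001263"},
}
outputs = list(system_outputs.values())

print(sorted(intersection_ensemble(outputs)))   # ['HP:0001250']
print(sorted(union_ensemble(outputs)))          # ['HP:0000252', 'HP:0001250', 'HP:0001263']
print(sorted(majority_vote_ensemble(outputs)))  # ['HP:0001250', 'HP:0001263']
```

The intersection favors precision and the union favors recall; majority voting sits between the two, which is one plausible reason a vote-based ensemble can rank consistently well across datasets.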
RESULTS: For generic phenotypic concept recognition, the top three performers in the NYP/CUIMC dataset are the union ensemble (F1, 0.634), training-based ensemble (F1, 0.632) and majority vote-based ensemble (F1, 0.622). In the Mayo dataset, the top three are the majority vote-based ensemble (F1, 0.642), cTAKES (F1, 0.615) and MedLEE (F1, 0.559). In the PubMed dataset, the top three are the majority vote-based ensemble (F1, 0.719), training-based ensemble (F1, 0.696) and MetaMapLite (F1, 0.694). For identifying patient-specific phenotypes, the top three performers in the NYP/CUIMC dataset are the majority vote-based ensemble (F1, 0.610), MedLEE (F1, 0.609) and training-based ensemble (F1, 0.585). In the Mayo dataset, the top three are the majority vote-based ensemble (F1, 0.604), cTAKES (F1, 0.531) and MedLEE (F1, 0.527).
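F1 is the harmonic mean of precision and recall. The sketch below shows conventional exact-match scoring of extracted concept sets against a gold standard; it is illustrative only and is NOT the paper's novel hierarchy- and frequency-aware metric, whose details are not given in the abstract. It assumes micro-averaging (pooling true positives, false positives and false negatives across documents), which is an assumption rather than something the abstract states. The function names and concept IDs are hypothetical placeholders.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from TP, FP and FN counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def micro_scores(documents):
    """Micro-averaged exact-match scores over (predicted, gold) concept-set pairs."""
    tp = fp = fn = 0
    for predicted, gold in documents:
        tp += len(predicted & gold)  # extracted and annotated
        fp += len(predicted - gold)  # extracted but not annotated
        fn += len(gold - predicted)  # annotated but missed
    return precision_recall_f1(tp, fp, fn)


# Hypothetical (predicted, gold) pairs for two documents.
docs = [
    ({"HP:0001250", "HP:0001263"}, {"HP:0001250", "HP:0000252"}),
    ({"HP:0001250"}, {"HP:0001250"}),
]
p, r, f1 = micro_scores(docs)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.667 R=0.667 F1=0.667
```

Under exact matching, a system that outputs a parent or child of the annotated HPO term scores as one false positive plus one false negative, which is the kind of granularity mismatch a hierarchy-aware metric is meant to account for.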
CONCLUSIONS: Our study demonstrates that ensembles of NLP systems can improve both generic phenotypic concept recognition and patient-specific phenotypic concept identification over individual systems. Each individual NLP system performed best when applied to the dataset it was primarily designed for. However, combining multiple NLP systems into an ensemble can generally improve performance. Specifically, an ensemble can increase the reproducibility of results across different cohorts and tasks, and thus provides a more portable phenotyping solution than individual NLP systems.

Keywords: Concept recognition; Ensemble method; Evaluation; Human phenotype ontology; Natural language processing; Reproducibility

Year: 2019 PMID: 31655273 PMCID: PMC6899194 DOI: 10.1016/j.jbi.2019.103318

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317
