Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes.

Yujia Bao, Zhengyi Deng, Yan Wang, Heeyoon Kim, Victor Diego Armengol, Francisco Acevedo, Nofal Ouardaoui, Cathy Wang, Giovanni Parmigiani, Regina Barzilay, Danielle Braun, Kevin S Hughes.

Abstract

PURPOSE: The medical literature relevant to germline genetics is growing exponentially. Clinicians need tools that help them monitor and prioritize this literature to understand the clinical implications of pathogenic genetic variants. We developed and evaluated two machine learning models to classify abstracts as relevant to penetrance (the risk of cancer for germline mutation carriers) or to the prevalence of germline genetic mutations.
MATERIALS AND METHODS: We conducted literature searches in PubMed and retrieved paper titles and abstracts to create an annotated data set for training and evaluating the two classification models. Our first model is a support vector machine (SVM) that learns a linear decision rule from the bag-of-n-grams representation of each title and abstract. Our second model is a convolutional neural network (CNN) that learns a complex nonlinear decision rule from the raw text of the title and abstract. We evaluated both models on classifying papers as relevant to penetrance or to prevalence.
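As a rough illustration of the first model, the sketch below builds a bag-of-n-grams representation (unigrams and bigrams) and trains a linear SVM with the Pegasos stochastic subgradient method. The whitespace tokenizer, the n-gram range, the `<bias>` feature, the `lam` and `epochs` values, and the toy sentences are all assumptions for illustration; the abstract does not say how the authors tokenized text or which SVM solver they used.

```python
import random
from collections import Counter


def bag_of_ngrams(text, n_max=2):
    """Count word n-grams (n = 1..n_max) in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def featurize(text):
    """Sparse feature vector: bag-of-n-grams plus a constant bias feature (an assumption)."""
    x = bag_of_ngrams(text)
    x["<bias>"] = 1
    return x


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(docs, labels, lam=0.01, epochs=20, seed=0):
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 (relevant) or -1 (not relevant).
    """
    rng = random.Random(seed)
    data = [(featurize(d), y) for d, y in zip(docs, labels)]
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)  # uses the weights before this step's update
            # Regularizer step: shrink every weight by (1 - eta * lam).
            for f in w:
                w[f] *= 1.0 - eta * lam
            # The hinge-loss subgradient is nonzero only inside the margin.
            if margin < 1.0:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Linear decision rule: the sign of w . x (ties go to +1)."""
    return 1 if dot(w, featurize(text)) >= 0 else -1


# Toy data invented for illustration: +1 = penetrance, -1 = prevalence.
TOY_DOCS = [
    "estimated cancer penetrance among brca1 carriers",
    "cancer penetrance among lynch syndrome carriers",
    "breast cancer penetrance among palb2 carriers",
    "prevalence of germline variants in unselected patients",
    "germline variants prevalence in a population cohort",
    "prevalence of pathogenic germline variants in families",
]
TOY_LABELS = [1, 1, 1, -1, -1, -1]

if __name__ == "__main__":
    weights = train_linear_svm(TOY_DOCS, TOY_LABELS)
    print(predict(weights, "penetrance among chek2 carriers"))  # prints 1
```

Because the learned rule is a single weight per n-gram, the largest positive and negative weights can be inspected directly, which is one practical appeal of a linear model over the CNN.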
RESULTS: For penetrance classification, we annotated 3,740 paper titles and abstracts and evaluated the two models using 10-fold cross-validation. The SVM model achieved 88.93% accuracy (the percentage of papers that were correctly classified), whereas the CNN model achieved 88.53% accuracy. For prevalence classification, we annotated 3,753 paper titles and abstracts; the SVM model achieved 88.92% accuracy and the CNN model achieved 88.52% accuracy.
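The evaluation protocol described in the results (10-fold cross-validation, with accuracy defined as the percentage of correctly classified papers) can be sketched as follows. The shuffled round-robin fold assignment, the seed, and the majority-class baseline used for the demonstration are hypothetical; any classifier, such as the SVM or the CNN, could be passed as `train`. This sketch pools the out-of-fold predictions before computing accuracy; with equal-sized folds, that equals the mean of the per-fold accuracies.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(predicted, gold):
    """Fraction of examples whose predicted label equals the gold label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("need equally long, non-empty label lists")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def cross_validated_accuracy(examples, labels, train, k=10, seed=0):
    """Predict each fold with a model trained on the other k-1 folds.

    `train(xs, ys)` must return a function mapping one example to a label.
    Out-of-fold predictions are pooled before computing accuracy.
    """
    predictions = [None] * len(examples)
    for fold in k_fold_indices(len(examples), k, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(examples)) if i not in held_out]
        model = train([examples[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        for i in fold:
            predictions[i] = model(examples[i])
    return accuracy(predictions, labels)


def train_majority(xs, ys):
    """Hypothetical baseline: always predict the most frequent training label."""
    majority = Counter(ys).most_common(1)[0][0]
    return lambda x: majority


if __name__ == "__main__":
    # Invented labels: 7 relevant (1) and 3 not relevant (0).
    toy_labels = [1] * 7 + [0] * 3
    toy_examples = [f"abstract {i}" for i in range(10)]
    print(cross_validated_accuracy(toy_examples, toy_labels, train_majority))  # prints 0.7
```

A majority-class baseline like this one is a useful reference point when reading accuracy figures: on an imbalanced annotated set, it can score well above 50% without learning anything from the text.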
CONCLUSION: Both models classify abstracts as relevant to penetrance or prevalence with high accuracy (approximately 89%). By facilitating literature review, this tool could help clinicians and researchers keep abreast of the burgeoning knowledge of gene-cancer associations and keep the knowledge bases behind clinical decision support tools up to date.

Year:  2019        PMID: 31545655      PMCID: PMC6873946          DOI: 10.1200/CCI.19.00042

Source DB:  PubMed          Journal:  JCO Clin Cancer Inform        ISSN: 2473-4276
