PURPOSE: Quantifying the risk of cancer associated with pathogenic mutations in germline cancer susceptibility genes (that is, penetrance) enables the personalization of preventive management strategies. Conducting a meta-analysis is the best way to obtain robust risk estimates. We previously developed a natural language processing (NLP)-based abstract classifier that classifies abstracts as relevant to penetrance, prevalence of mutations, both, or neither. In this work, we evaluate the performance of this NLP-based procedure.

MATERIALS AND METHODS: We compared the semiautomated NLP-based procedure, which involves automated abstract classification and text mining followed by human review of the identified studies, with the traditional procedure, which requires human review of all studies. Ten high-quality gene-cancer penetrance meta-analyses spanning 16 gene-cancer associations served as the gold standard for evaluating our procedure. For each meta-analysis, we evaluated the number of abstracts that required human review (workload) and the ability to identify the studies that the authors included in their quantitative analysis (coverage).

RESULTS: Compared with the traditional procedure, the semiautomated NLP-based procedure led to a lower workload across all 10 meta-analyses, with an overall 84% reduction (2,774 abstracts vs 16,941 abstracts) in the amount of human review required. Overall coverage was 93% (132 of 142 studies identified) before reviewing the references of identified studies. Reasons for the 10 missed studies included blank and poorly written abstracts. After reviewing references, nine of the previously missed studies were identified and coverage improved to 99% (141 of 142 studies).

CONCLUSION: We demonstrated that an NLP-based procedure can significantly reduce the review workload without compromising the ability to identify relevant studies. NLP algorithms have promising potential for reducing human effort in the literature review process.
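The two evaluation metrics reduce to simple ratios, and the reported percentages can be checked directly from the counts given in the abstract. The following is a minimal sketch in Python; the function names `workload_reduction` and `coverage` are illustrative assumptions and are not taken from the authors' code.

```python
# Sketch only: recomputes the reported summary percentages from the counts
# stated in the abstract. Function names are hypothetical, not the authors'.

def workload_reduction(reviewed_semiautomated: int, reviewed_traditional: int) -> float:
    """Fraction of human abstract review avoided relative to reviewing everything."""
    return 1 - reviewed_semiautomated / reviewed_traditional


def coverage(identified: int, included: int) -> float:
    """Fraction of the gold-standard studies that the procedure identified."""
    return identified / included


# Counts as reported in the abstract.
reduction = workload_reduction(2774, 16941)
coverage_before_refs = coverage(132, 142)  # before reviewing references
coverage_after_refs = coverage(141, 142)   # after reviewing references

print(f"Workload reduction: {reduction:.0%}")
print(f"Coverage before reference review: {coverage_before_refs:.0%}")
print(f"Coverage after reference review: {coverage_after_refs:.0%}")
```

Rounded to whole percentages, these ratios agree with the 84%, 93%, and 99% figures reported above.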