Mengmeng Liu¹, Yunshan Zhong¹, Hongqian Liu², Desheng Liang³,⁴, Erhong Liu¹, Yu Zhang¹, Feng Tian¹, Qiaowei Liang⁴, David S Cram¹, Hua Wang⁵, Lingqian Wu³, Fuli Yu⁶
1. Berry Genomics Corporation, Beijing, China.
2. Department of Obstetrics and Gynecology, West China Second University Hospital, Sichuan University, Chengdu, China.
3. Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China.
4. Hunan Jiahui Genetics Hospital, Changsha, China.
5. Hunan Provincial Maternal and Child Health Care Hospital, Changsha, China.
6. Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.
Abstract
BACKGROUND: Methods for identifying copy number variations (CNVs) have matured rapidly. However, postdetection processes such as variant interpretation and reporting remain inefficient. To address this, we developed REDBot, an automated software package that accurately and directly generates clinical diagnostic reports for prenatal and products of conception (POC) samples. METHODS: We applied natural language processing (NLP) methods with active learning to analyze 30,235 in-house historical clinical reports, and then developed clinical knowledge bases, evidence-based interpretation methods, and reporting criteria to support the entire postdetection pipeline. RESULTS: From the 30,235 reports, we obtained 37,175 CNV-paragraph pairs. On these pairs, the active learning approaches achieved an average F1-score of 0.9466 in sentence classification. The overall accuracy of variant classification was 95.7%, 95.2%, and 100.0% in the retrospective, prospective, and clinical utility experiments, respectively. CONCLUSION: By integrating NLP methods into the CNV postdetection pipeline, REDBot is a robust and rapid tool with clinical utility for prenatal and POC diagnosis.
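The abstract does not say which active-learning query strategy was used to label sentences from the historical reports. As a hedged illustration only, the sketch below shows least-confidence uncertainty sampling, one common pool-based strategy: the current classifier scores every unlabeled sentence, and the sentences whose most likely class has the lowest probability are sent to annotators next. The function name `least_confidence_query`, the sentence ids, and the probabilities are hypothetical and are not taken from REDBot.

```python
from typing import Dict, List, Tuple


def least_confidence_query(probs: Dict[str, List[float]], k: int) -> List[str]:
    """Pick the k unlabeled sentences the current model is least sure about.

    probs maps a (hypothetical) sentence id to the model's predicted class
    probabilities for that sentence.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Confidence of a prediction = probability assigned to its most likely class.
    scored: List[Tuple[float, str]] = [(max(p), sid) for sid, p in probs.items()]
    # Lowest confidence first; ties are broken by sentence id for determinism.
    scored.sort()
    return [sid for _, sid in scored[:k]]


if __name__ == "__main__":
    # Toy pool of three unlabeled sentences with 3-class probabilities (made-up numbers).
    pool = {
        "s1": [0.90, 0.05, 0.05],
        "s2": [0.40, 0.35, 0.25],
        "s3": [0.60, 0.30, 0.10],
    }
    # s2 and s3 are the most uncertain, so they would go to annotators next.
    print(least_confidence_query(pool, 2))  # ['s2', 's3']
```

In a full loop of this kind, the selected sentences would be labeled by experts, added to the training set, and the classifier retrained before the next round of selection; whether REDBot follows this exact scheme is an assumption.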