Jia Xu, Pengwei Yang, Shang Xue, Bhuvan Sharma, Marta Sanchez-Martin, Fang Wang, Kirk A Beaty, Elinor Dehan, Baiju Parikh.
Abstract
In cancer genomics, the broad availability of genetic information from next-generation sequencing technologies and the rapid growth in biomedical publications have led to the advent of the big-data era. The integration of artificial intelligence (AI) approaches such as machine learning, deep learning, and natural language processing (NLP), both to tackle the scalability and high dimensionality of data and to transform big data into clinically actionable knowledge, is expanding and becoming the foundation of precision medicine. In this paper, we review the current status and future directions of AI applications in cancer genomics within the context of workflows that integrate genomic analysis for precision cancer care. Existing AI solutions and their limitations in cancer genetic testing and diagnostics, such as variant calling and interpretation, are critically analyzed. Publicly available tools and algorithms for key NLP technologies used in literature mining for evidence-based clinical recommendations are reviewed and compared. In addition, the paper highlights the challenges to AI adoption in digital healthcare with regard to data requirements, algorithmic transparency, reproducibility, and real-world assessment, and discusses the importance of preparing patients and physicians for modern digitized healthcare. We believe that AI will remain the main driver of healthcare transformation toward precision medicine, yet the unprecedented challenges it poses should be addressed to ensure safety and a beneficial impact on healthcare.
Year: 2019 PMID: 30671672 PMCID: PMC6373233 DOI: 10.1007/s00439-019-01970-5
Source DB: PubMed Journal: Hum Genet ISSN: 0340-6717 Impact factor: 4.132
Fig. 1 Topics discussed in the review paper. The figure shows several key topics discussed in the paper, with green icons representing benefits or improvements and red icons representing challenges or caveats
Fig. 2 Number of publications plotted against publication year. Two y-axes are plotted: one represents the number of papers related to “Cancer Genomics”, and the other represents the number of papers related to “Cancer Genomics + NLP”. The x-axis represents the publication year
Survey list of selected tools or algorithms for Bio-NER and relationship extraction in genomics
| Category | Paper | Tool name | Extraction target | Algorithm/model | License/availability | Evaluation corpus |
|---|---|---|---|---|---|---|
| Named entity | Wei et al. ( | tmVar | Mutation | CRF (conditional random field) + rule-based | NCBI; accessible by RESTful API | PubMed abstracts |
| | Doughty et al. ( | EMU | Mutation | Rule-based approach | Freely available | In-house corpus |
| | Caporaso et al. ( | MutationFinder | Mutation | Rule-based approach | Source code available in Java, Python, and Perl | In-house corpus |
| | Thomas et al. ( | SETH | Mutation | Extended Backus–Naur Form (EBNF) grammar | Freely available | A series of previously available corpora |
| | Settles ( | ABNER | Genes, proteins | CRF | Open source | In-house corpus |
| | Leaman and Gonzalez ( | BANNER | Genes | CRF | Open source | BioCreative 2 GM task |
| | Wei et al. ( | GNormPlus | Genes | CRF + additional information | Open source | BioCreative II GN corpus and Citation GIA test collection |
| | Rocktaschel et al. ( | ChemSpot | Drugs | CRF + dictionary | Freely available | SCAI corpus and IUPAC test corpus |
| | Leaman et al. ( | tmChem | Drugs | CRF | RESTful API | CHEMDNER task |
| | Lee et al. ( | BEST Biomedical Entity Extractor | Gene, disease, drug and cell line names | Dictionary-based | Freely available | BRONCO |
| | Leaman et al. ( | DNorm | Disease | Machine learning | RESTful API | NCBI disease corpus |
| | Leaman and Lu ( | TaggerOne | Disease and chemical | Semi-Markov models | Open source | NCBI disease corpus and BioCreative V Chemical-Disease Relation corpus |
| Relationship | Pletscher-Frankild et al. ( | DISEASES | Disease gene | Dictionary and co-occurrence | Freely available | In-house corpus |
| | Mahmood et al. ( | DiMeX | Disease mutation | Lexical and semantic patterns + additional information | – | Bio_muta project |
| | Ravikumar et al. ( | MutD | Protein-mutation disease | Dependency parse graph | Web application and REST API release planned | Abstracts from PubMed articles |
| | Zou et al. ( | IBRel | microRNA gene | Multi-instance learning | Open source | Bagewadi’s corpus |
| | Burger et al. ( | Mturk | Gene mutation | Crowdsourcing | Open source | In-house corpus |
| | Mallory et al. ( | DeepDive | Gene interactions | Distant supervision | Open source | In-house corpus |
| | Quirk and Poon ( | DISCREX | Drug–gene | Distant supervision | – | In-house corpus |
| | Barbosa-Silva et al. ( | PESCADOR | Gene/protein interactions | Co-occurrence | Web application | AIMed corpus |
| | Bravo et al. ( | BeFree | Gene disease, drug disease, and drug-target | Shallow Linguistic Kernel, Dependency Kernel KDEP | Code available upon request | EU-ADR corpus and GAD corpus |
| | Song et al. ( | PKDE4J | Protein–protein interactions, gene–disease, and disease–drug | Dependency parsing-based rules | Publicly available | BioInfer, AIMed (protein–protein interactions); GAD, CoMAGC, Gene–cancer (disease-gene); PolySearch (drug-disease) |
| | Tsuruoka et al. ( | FACTA+ | Various binary relationships | Joint CRF learning model | Web application | BioNLP’09 shared task corpus |
| | Rinaldi et al. ( | OntoGene | Various binary relationships | Rule-based approach and maximum entropy | RESTful API | In-house corpus |
| | Poon et al. ( | Literome | Various binary relationships | Dependency graph and co-occurrence | Freely available for non-commercial usage | In-house corpus |
| | Liu et al. ( | PolySearch2 | Various binary relationships | Bag of words + dictionaries | Freely available | In-house corpus |
| | Xu and Wang ( | – | Drug gene | Co-occurrence | – | MEDLINE abstracts |
| | Percha and Altman ( | – | Drug gene | Dependency graph generated by Stanford Parser and Ensemble Biclustering for Classification | – | In-house corpus |
| | Singhal et al. ( | – | Disease mutation | Decision tree, multi-layer perceptron and Bayesian logistic regression | – | EMU, PubMed data set generated in-house |
| | Muzaffar et al. ( | – | Treatment disease | Machine learning | – | Corpus obtained from MEDLINE 2001 |
| | Poon et al. ( | – | Pathway interactions | Distant supervision | – | In-house corpus |
| | Miwa et al. ( | – | Protein–protein interactions | Combining kernels | – | AIMed, BioInfer, HPRD50, IEPA, LLL corpora |
| | Yang et al. ( | – | Protein–protein interactions | Weighted multiple kernel learning-based approach | – | AIMed, BioInfer, HPRD50, IEPA, LLL corpora |
| | Bui et al. ( | – | Protein–protein interactions | Somatic properties + machine learning | Open source | AIMed, BioInfer, HPRD50, IEPA, LLL corpora |
| | Tikk et al. ( | – | Protein–protein interactions | Kernel-based approach | – | Five publicly available annotated corpora |
| | Thomas et al. ( | – | Drug–drug interaction | Ensemble learning | – | DDI Extraction 2011 challenge |
| | Bui et al. ( | – | Drug–drug interaction | Feature-based machine learning | Open source | 2011 and 2013 DDI extraction challenge |
| | Bundschus et al. ( | – | Disease-treatment and genes-disease | CRF | – | GeneRIFs data set and annotated MEDLINE abstracts |
| | Lee et al. ( | – | Mutation-gene-drug | CNN | – | In-house corpus |
| | Peng et al. ( | – | n-ary relationship | Graph LSTM | – | In-house corpus |
| | Hakenberg et al. ( | – | Various binary relationships | Rule-based approach | – | In-house corpus |
Tools were evaluated with respect to selected technical criteria including extraction target, algorithm, license, and evaluation corpus, and were grouped into named entity recognition (NER) and relation extraction categories
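To make two of the simpler algorithm families in the table concrete, the following is a minimal sketch of rule-based mutation recognition combined with dictionary lookup and sentence-level co-occurrence for relation extraction. It does not reproduce any of the listed tools: the regular expression, the `GENE_DICT` mini-dictionary, and all function names are illustrative assumptions, and real systems rely on far larger rule sets, normalization steps, and curated lexicons.

```python
import re
from itertools import product

# One-letter and three-letter amino acid codes (standard 20 residues).
AA1 = "ACDEFGHIKLMNPQRSTVWY"
AA3 = ("Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|"
       "Phe|Pro|Ser|Thr|Trp|Tyr|Val")

# Toy rule for protein-level point mutations such as "V600E" or "Val600Glu".
# Assumption: a single pattern is enough for illustration; it will also
# match look-alike tokens that are not mutations.
MUTATION_RE = re.compile(
    rf"\b(?:[{AA1}]\d+[{AA1}]|(?:{AA3})\d+(?:{AA3}))\b"
)

# Hypothetical mini-dictionary of gene symbols (dictionary-based NER).
GENE_DICT = {"BRAF", "EGFR", "KRAS"}


def find_mutations(text):
    """Return mutation mentions matched by the toy rule."""
    return MUTATION_RE.findall(text)


def find_genes(text):
    """Return upper-case tokens that appear in the gene dictionary."""
    tokens = re.findall(r"\b[A-Z0-9]+\b", text)
    return [tok for tok in tokens if tok in GENE_DICT]


def cooccurrence_pairs(text):
    """Pair every gene with every mutation found in the same sentence."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        pairs.extend(product(find_genes(sentence), find_mutations(sentence)))
    return pairs


if __name__ == "__main__":
    example = ("The BRAF V600E mutation is common in melanoma. "
               "EGFR T790M confers resistance.")
    print(cooccurrence_pairs(example))
```

Co-occurrence treats any gene and mutation in the same sentence as related, so it is cheap and high-recall but cannot tell a negated or unrelated mention from a true association. The dependency-based, kernel-based, and neural methods in the table target that weakness.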
Major functionalities and transparency of key players in the text mining and personalized medicine fields
| Players | Functionality | Transparency |
|---|---|---|
| Blueprint Genetics | Offers single-gene tests, targeted variant testing, or whole-exome sequencing services along with interpretation | No explicit AI description |
| Cambridge Cancer Genomics | Uses blood tests to guide cancer therapy | No explicit AI description |
| DeepGene | Provides cancer-type classifier based on deep learning and somatic point mutations | Publication is available (Yuan et al. |
| Deep Genomics | Develops genetic medicines using artificial intelligence technology, with a focus on the preclinical development of oligonucleotide therapies | No detailed explanation but related publication is available (Wainberg et al. |
| DeepVariant | Analysis pipeline using a deep neural network to call genetic variants from NGS DNA data | Available on GitHub |
| Genomenon | Genomic search engine and database to provide disease-gene-variant relationships from the full text of the scientific literature for gene and variant interpretation | No explicit AI description |
| Genoox | Fully customized platform for genetic applications including primary, secondary and tertiary analyses | No explicit AI description but related publications available (Stajkovska et al. |
| Literome | Automatic curation system to extract genomic knowledge from PubMed articles to facilitate browsing, searching, and reasoning | Publications are available (Poon et al. |
| Perthera | Manages the process from tumor testing through the Perthera Report to provide cancer patients and physicians with therapeutic options ranked by the probability of outcome | No explicit AI description |
| Sophia Genetics | Provides NGS data analysis to detect, annotate and pre-classify genomic variants associated with multiple disorder areas | No explicit AI description |
| Watson for Genomics | Provides in-depth clinical interpretation of the genetic alterations in the sample automatically, enabling clinical decision-making for personalized cancer care | No explicit AI description but related publication is available (Patel et al. |
| WuXi NextCODE | Uses genomics to identify the underlying biology and advance the scientific understanding of disease and propel the next generation of transformative therapies | No explicit AI description but related publication is available (Zhang et al. |
For each company, the main functionality and transparency are summarized