Chayakrit Krittanawong, Kipp W Johnson, Edward Choi, Scott Kaplin, Eric Venner, Mullai Murugan, Zhen Wang, Benjamin S Glicksberg, Christopher I Amos, Michael C Schatz, W H Wilson Tang.
Abstract
Polygenic diseases, which are genetic disorders caused by the combined action of multiple genes, pose unique and significant challenges for the diagnosis and management of affected patients. A major goal of cardiovascular medicine has been to understand how genetic variation leads to the clinical heterogeneity seen in polygenic cardiovascular diseases (CVDs). Recent advances and emerging technologies in artificial intelligence (AI), coupled with the ever-increasing availability of next-generation sequencing (NGS) technologies, now provide researchers with unprecedented possibilities for dynamic and complex biological genomic analyses. Combining these technologies may lead to a deeper understanding of heterogeneous polygenic CVDs, better prognostic guidance, and, ultimately, more personalized medicine. Advances will likely be achieved through increasingly frequent and robust genomic characterization of patients, as well as the integration of genomic data with other clinical data, such as cardiac imaging, coronary angiography, and clinical biomarkers. This review discusses the current opportunities and limitations of genomics; provides a brief overview of AI; and identifies the current applications, limitations, and future directions of AI in genomics.
Keywords: AI; artificial intelligence; cardiology; cardiovascular disease; deep learning; genetics; genomics; machine learning
Year: 2022 PMID: 35207566 PMCID: PMC8875522 DOI: 10.3390/life12020279
Source DB: PubMed Journal: Life (Basel) ISSN: 2075-1729
Example of direct-to-consumer genomics companies.
| Company | AI algorithms | Input | Database | Limitations | More Information | Example Diseases |
|---|---|---|---|---|---|---|
| 23andMe | ML models | Genetic variants | In-house 23andMe database and public databases (e.g., UK Biobank) | Heterogeneity of data (phenotypes, QC control for genetics) between UK Biobank and 23andMe | Map the impact of individuals’ genetic material on phenotypes | Weight pharmacogenetic testing |
| AncestryDNA | Not specified | Genotype samples on the Illumina OmniExpress platforms | AncestryDNA database | Serious privacy concerns | ||
| Atomwise | ANN model | Gene targets and drug discovery | Public databases and proprietary sources | NA | Predict novel binding compounds; drug discovery | Prevent drug related cardiac toxicity |
| ATUM | ML to develop its Leap-In transposase technology | DNA synthesis | Protein engineering | NA | Enables any recombinant DNA sequence to behave as a transposon (a DNA sequence that can change its position within a genome altering the cell’s genetic identity and genomic size) | NA |
| BenevolentAI | Several models: BioNLP, BERT, deep learning, GuacaMol, Monte Carlo tree search, and symbolic AI | The Reaxys | NA | Understanding the disease mechanisms at the earliest stage of our programs; identify the patients who are likely to respond to a treatment; identify drug targets that control these mechanism(s); and make drugs to correct them | NA | |
| Calico (Calico Life Sciences LLC) | Proteome Analysis GWAS | AncestryDNA database | NA | NA | ||
| Color Genomics | ML models | In-house and industry (e.g., Agilent, Illumina and Hamilton) | No detail of ML model provided | Long QT syndrome (LQTS); left ventricular noncompaction cardiomyopathy | ||
| CZ Biohub | ML models | Biochips embedded with human cells | Transcriptome data from animal model | NA | NA | |
| Deep Genomics | Deep Learning | Several types of genetic data | European Genome-Phenome Archive | No detail of DL model provided | Identifying one or more genes responsible for a disease, potential drug therapies for an individual based on genome | Spinal muscular atrophy, nonpolyposis colorectal cancer, and autism |
| DNAnexus | DeepVariant | NGS data | Public database such as UK Biobank | NA | NA | |
| Fabric Genomics | Proprietary algorithms | NGS | Public database such as gnomAD (gnomad.broadinstitute.org/) | Proprietary model | A proprietary set of algorithms; the Variant Annotation, Analysis and Search Tool (VAAST) and Phevor (Phenotype Driven Variant Ontological Re-ranking tool) | NA |
| Freenome | Standard ML models such as logistic regression, principal component analysis (PCA) and support vector machine (SVM) | Whole-genome sequencing, cfDNA, cfRNA, and protein data | Proprietary sources and public database (e.g., NIH Roadmap Epigenome Mapping Consortium) | Proprietary sources | AI-EMERGE (NCT03688906) | NA |
| Futura Genetics | DNA from saliva | NA | APEX (arrayed primer extension) technology for detecting SNPs | NA | ||
| Genoox | AI-based variant classification (aiVCE) | NGS | In-house exome | NA | Diagnosis and treatment of genetic disorders and cancer, as well as new drug discovery and family planning; automated classification engine based on ACMG guidelines | NA |
| Grail | NA | The Circulating Cell-free Genome Atlas (CCGA) Study | NA | |||
| IBM Watson for Genomics | NLP for several different predictive models | VCFs, CNV, and gene expression data abstracts and full-text articles | In-house hospital, PubMed and ClinicalTrials.gov | NA | Driver alterations, actionable variants, VUS, relevant therapies, and potential clinical trials | glioblastoma |
| Illumina | SpliceAI | NGS | Public databases (e.g., the ExAC/gnomAD database) | NA | Distinguish a handful of disease-causing mutations in patients with rare genetic diseases from a large number of benign variants present in healthy people | NA |
| Karius | Proprietary Karius AI technology | blood test based on next-generation sequencing | NA | Proprietary model | endocarditis | |
| Nvidia and Scripps Research Translational Institute | Deep Learning | Development phase | NA | Still in development phase and not many details disclosed | Blood pressure monitoring; blood glucose genomics; digital wearable data | NA |
| Quest Diagnostics | Watson’s cognitive computing and hc1’s machine learning technology | Genome sequencing | In-house | No detail of ML model provided | NA | |
| SOPHiA Genetics | Proprietary and standard algorithms (e.g., hidden Markov model algorithm) | NGS data | In-house and public databases (e.g., ClinVar, ExAC, and dbSNP) | NA | SNVs, Indels and CNVs detection, ALU insertions, Pseudogene variants differentiation and variant annotation | arrhythmias (e.g., Long/Short QT syndrome or Brugada syndrome) and cardiomyopathies |
| Synpromics | ML models | Gene promoter design, a novel genomics-based platform | BIOBASE Biological Databases, UCSC GoldenPath, European Bioinformatics Institute | No detail of ML model provided | Predict the genomic sequences that are involved in cell type-specific regulation of gene expression | Design of Synthetic Mammalian Promoters |
| Verge Genomics | AI in pharmacogenomics | microRNA (miRNA) | Academic databases, research centers, and public databases (e.g., the NCBI database and | Proprietary AI model | AI-generated therapies for ALS and Parkinson's disease by screening thousands of genes | NA |
| Verily | DeepMass | Protein signals, genomics, and transcriptomics | Identify and quantify proteins | No validation | Integrate protein signals with other biomolecular data, such as genomics and transcriptomics, as well as with device measurements and disease status, to find out how genetics and behavior affect protein profiles | NA |
| Veritas Genetics | ML models and AI | Whole Genome Sequencing and Whole Exome Sequencing | Internal databases of two clinical testing laboratories (Laboratory for Molecular Medicine and Veritas Genetics) and public databases (e.g., ClinVar) | NA | NA | |
| Viome | Watson machine-learning | Gut microbiome | NA | No publications seen in PubMed | NA | |
Examples of variant calling, reporting, and interpretation AI.
| Name | Algorithms | Example Function |
|---|---|---|
| DeepVariant | Deep convolutional neural network (CNN) | Variant calling from short-read sequencing by reconstructing DNA alignments as an image |
| Clairvoyante | A multi-task convolutional deep neural network | (1) Variant calling in single molecule sequencing |
| Skyhawk | Neural network | Mimics the process of expert review to identify clinically significant genomic variants |
| DeepBind | Deep CNN | Predicts the binding sites of DNA-binding proteins and RBPs |
| iDeep | Deep belief networks (DBN) and CNN | Cross-domain features and sequence information |
| DeepSEA | Deep CNN | Predicts functional consequences of noncoding variants |
| DeepNano | Recurrent neural networks (RNN) | Base calling in MinION nanopore reads |
| SpliceAI | Deep neural network (DNN) | (1) Predicts splice junctions from an arbitrary pre-mRNA transcript sequence |
| DeepGestalt | DNN | Distinguishes more than 200 rare diseases based on patient face images, which could also separate different genetic subtypes (e.g., Noonan syndrome) |
| DeepPVP | DNN | Variant prioritization by integrating patients’ phenotype information |
| DeepSVR | Deep learning and random forest models | Predicts somatic variants confirmed by orthogonal validation sequencing data |
| DeepGene | DNN | Extracts the high-level features between combinatorial somatic point mutations and cancer types |
| Deep AE | Autoencoder | Gene expression data |
| DeepMethyl | | Predicts methylation states of DNA CpG dinucleotides |
| BioVec | Feature representation | |
| DeepMotif | Deep convolutional/highway MLP framework | Sequential data about gene regulation |
| DeepChrome | Deep CNN | Sequential data about gene regulation |
| Chiron | Deep learning model | Translates the raw signal to DNA sequence |
| Variational Autoencoders | Autoencoder | Predicts drug response |
| GARFIELD-NGS | Deep CNN | Dissects false and true variants in exome sequencing |
| DeepGS | Deep CNN | Predicts phenotypes from genotypes |
| DANN | DNN | Predicts deleterious annotation or pathogenicity of genetic variants |
| DanQ | Hybrid deep RNN and CNN model | Quantifies the function of non-coding DNA |
| ProLanGO | RNN | Protein function prediction |
| BCC-NER | NLP | Bidirectional and contextual clues named entity tagger for gene/protein mention recognition |
| BioNLP | NLP | Gene regulation network |
| SpaCy | NLP | Tagging, parsing, and entity recognition |
Figure 1. Conceptual schematic for artificial intelligence in cardiovascular genetics. Artificial intelligence encompasses a spectrum of concepts, including machine learning, NLP, and cognitive computing, which are generally enabled by deep learning and could ultimately be used in cardiovascular genomics for prediction, integration, reconstruction, bioinformatic techniques (e.g., pipeline, screening, variant analysis), and clinical practice. Artificial intelligence has the potential to filter raw genetic data into novel insights that could inform future clinical trials and, ultimately, clinical practice.
Figure 2. Potential analytic models for cardiovascular genomics. A reference genome or a single read could be fed into neural network models using convolutional genetic coding based on genetic structures. After neural network processing, outputs can be categorized into homozygous variants, heterozygous variants, and references (no variants), which could ultimately provide novel clinical genetic insights.
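The input encoding and three-way output described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes a one-hot encoding of bases and a softmax over three raw scores; the function names, class labels, and example scores are hypothetical, and the trained network that would produce the scores is not shown.

```python
import math

BASES = "ACGT"
# Assumed labels for the three output categories named in the caption.
GENOTYPE_CLASSES = ("hom_ref", "het", "hom_alt")


def one_hot(sequence):
    """Encode a DNA string as rows of four values (A, C, G, T).

    Bases outside ACGT (for example N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES]
            for base in sequence.upper()]


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def call_genotype(scores):
    """Return the most probable class label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return GENOTYPE_CLASSES[best], probs[best]


# Hypothetical raw scores standing in for a network's output layer.
label, probability = call_genotype([0.1, 2.0, -1.0])
print(label, round(probability, 3))
```

In practice the encoded reads around a candidate site would pass through convolutional layers before the final three-way classification; only the two ends of that pipeline are sketched here.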
Figure 3. Potential artificial intelligence improvements to the workflow in cardiovascular genomics. This includes the assessment of the quality of genetic samples obtained (e.g., DNA, RNA, exome), the improvement of informatics pipelines for variant calling, the translation of clinical guidelines for variant interpretation, the transformation of genetic files (e.g., VCF to BAM, VCF to PED), the prediction of variant pathogenicity, the mapping of an individual’s sequence to genome references, and the identification of any clinically actionable mutations.
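As a rough illustration of the file-handling step in this workflow, the sketch below reads one tab-separated VCF data line and derives zygosity from the GT field. It is a simplified reading of the VCF layout (eight fixed columns, then FORMAT and per-sample columns), not a full parser; the function names and the example record are hypothetical.

```python
def parse_vcf_line(line):
    """Parse one VCF data line (not a header line) into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, var_id, ref, alt = fields[:5]
    record = {
        "chrom": chrom,
        "pos": int(pos),
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),  # multiple ALT alleles are comma-separated
        "genotypes": [],
    }
    # Column 9 is FORMAT; columns 10 onward hold one entry per sample.
    if len(fields) > 9:
        keys = fields[8].split(":")
        if "GT" in keys:
            gt_index = keys.index("GT")
            for sample in fields[9:]:
                record["genotypes"].append(sample.split(":")[gt_index])
    return record


def zygosity(genotype):
    """Classify a GT string such as 0/1 or 1|1."""
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:
        return "missing"
    if all(allele == "0" for allele in alleles):
        return "hom_ref"
    if len(set(alleles)) == 1:
        return "hom_alt"
    return "het"


# Hypothetical record, for illustration only.
example = "1\t12345\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30"
parsed = parse_vcf_line(example)
print(parsed["pos"], [zygosity(gt) for gt in parsed["genotypes"]])
```

Production pipelines would normally rely on an established VCF library rather than hand-written parsing; the sketch only shows which fields a downstream conversion or interpretation step depends on.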