Sandra Brasil, Carlota Pascoal, Rita Francisco, Vanessa Dos Reis Ferreira, Paula A Videira, and Gonçalo Valadão.
Abstract
The amount of data collected and managed in (bio)medicine is ever-increasing. Thus, there is a need to rapidly and efficiently collect, analyze, and characterize all this information. Artificial intelligence (AI), with an emphasis on deep learning, holds great promise in this area and is already being successfully applied to basic research, diagnosis, drug discovery, and clinical trials. Rare diseases (RDs), which are severely underrepresented in basic and clinical research, can particularly benefit from AI technologies. Of the more than 7000 RDs described worldwide, only 5% have a treatment. The ability of AI technologies to integrate and analyze data from different sources (e.g., multi-omics, patient registries, and so on) can be used to overcome RDs' challenges (e.g., low diagnostic rates, reduced number of patients, geographical dispersion, and so on). Ultimately, RDs' AI-mediated knowledge could significantly boost therapy development. Presently, there are AI approaches being used in RDs and this review aims to collect and summarize these advances. A section dedicated to congenital disorders of glycosylation (CDG), a particular group of orphan RDs that can serve as a potential study model for other common diseases and RDs, has also been included.
Keywords: artificial intelligence; big data; congenital disorders of glycosylation; diagnosis; drug repurposing; machine learning; personalized medicine; rare diseases
Year: 2019 PMID: 31783696 PMCID: PMC6947640 DOI: 10.3390/genes10120978
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
List of available artificial intelligence (AI)- and machine learning (ML)-based methods used in rare diseases (RDs) organized by function.
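| Category | Function | Reference | Tool | AI/ML method | Disease application |
|---|---|---|---|---|---|
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Alirezaie et al. | CADD | SVM | Several RDs |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Alirezaie et al. | ClinPred | Ensemble classifier (RF and gradient boosting models) | Several RDs |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Yan et al. | CNVdigest | DNorm (conditional random fields, stochastic gradient descent, pairwise learning to rank) | DiGeorge syndrome |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Alirezaie et al. | Fathmm-MKL | SVM based on multiple kernel learning | Several RDs |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Alirezaie et al. | GenoCanyon | Unsupervised statistical learning | Several RDs |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Alirezaie et al. | M-CAP | Gradient boosting trees | Several RDs |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Alirezaie et al. | MetaLR | Ensemble classifier | Several RDs |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Alirezaie et al. | Meta-SVM | Meta-analytic SVM | Several RDs |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Browne et al. | Meta-SNP | RF | Mevalonic kinase deficiency |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Browne et al. | nsSNP Analyzer | RF | Mevalonic kinase deficiency |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Browne et al. | PhD-SNP | SVM | Mevalonic kinase deficiency |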
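| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Browne et al. | PredictSNP | Consensus classifier using the Naïve Bayes classifier, the multinomial logistic regression model, NN, SVM, K-nearest neighbor classifier, and RF | Mevalonic kinase deficiency |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Alirezaie et al. | REVEL | RF | Several RDs |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Buske et al. | SilVA | RF | Meckel syndrome and other RDs |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Browne et al. | SNAP | Neural network | Mevalonic kinase deficiency |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Jaganathan et al. | SpliceAI | Deep residual NN | RDs with intellectual disability and autism spectrum disorders |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Alirezaie et al. | VAAST Variant Prioritizer (VVP) | Probabilistic search ML tool using the CLRT | Several RDs |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Papadimitriou et al. | VarCoPP | RF | Several RDs (including MODY, Kallmann syndrome, familial hemophagocytic lymphohistiocytosis, and non-type I cystinuria) |
| Mutation Detection and Prediction | Predicts the pathogenicity/disease relevance of genetic variants | Carter et al. | VEST | RF | Miller syndrome and Freeman-Sheldon syndrome |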
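| Mutation Detection and Prediction | Predicts the impact of SNVs on protein stability, affinity, and functionality | Alirezaie et al. | Eigen | Unsupervised spectral approach | Several RDs |
| Mutation Detection and Prediction | Predicts the impact of SNVs on protein stability, affinity, and functionality | Browne et al. | I-Mutant | SVM | Mevalonic kinase deficiency |
| Mutation Detection and Prediction | Predicts the impact of SNVs on protein stability, affinity, and functionality | Browne et al. | iStable | SVM | Mevalonic kinase deficiency |
| Mutation Detection and Prediction | Predicts the impact of SNVs on protein stability, affinity, and functionality | Browne et al. | mCSM | Gaussian process regression model | Mevalonic kinase deficiency |
| Mutation Detection and Prediction | Predicts the impact of SNVs on protein stability, affinity, and functionality | Browne et al. | MUpro | SVM and NN | Mevalonic kinase deficiency |
| Mutation Detection and Prediction | Predicts the impact of SNVs on protein stability, affinity, and functionality | Carter et al., Alirezaie et al., Browne et al. | PolyPhen2 | Naïve Bayes classifier | Several RDs, Mevalonic kinase deficiency |
| Mutation Detection and Prediction | Predicts the impact of SNVs on protein stability, affinity, and functionality | Browne et al. | PoPMuSiC-2.1/DEZYME | Simple NN | Mevalonic kinase deficiency |
| Mutation Detection and Prediction | Predicts gene/variant pathogenicity and clinical relevance while integrating phenotypic data | Boudellioua et al. | DeepPVP | Deep NN | Several RDs |
| Mutation Detection and Prediction | Predicts gene/variant pathogenicity and clinical relevance while integrating phenotypic data | Bosio et al. | eDiVA | RF | CF, PKU, and other RDs |
| Mutation Detection and Prediction | Predicts gene/variant pathogenicity and clinical relevance while integrating phenotypic data | Li et al. | Exomiser | RF | Several RDs |
| Mutation Detection and Prediction | Predicts gene/variant pathogenicity and clinical relevance while integrating phenotypic data | Li et al. | Xrare | Gradient boosting decision tree | Several RDs |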
| Decision Support Systems | DDSS based on phenotype | Ronicke et al. | AdaDX | Augmented QMR Bayesian network | Several RDs |
| Decision Support Systems | DDSS based on phenotype | Basel-Vanagaite et al., Liehr et al., Zarate et al., Marbach et al., Martinez-Monseny et al., Hsieh et al. | Face2Gene | Deep NN | RDs including Cornelia de Lange syndrome, Emanuel syndrome, Pallister–Killian syndrome, and SATB2-associated syndrome |
| Decision Support Systems | DDSS based on phenotype | Rao et al. | HANRD | Graph convolution-based association scoring | Several RDs |
| Decision Support Systems | DDSS based on phenotype | Javed et al. | Phen-Gen | Bayesian network | Several RDs |
| Decision Support Systems | DDSS based on phenotype | Jia et al. | RDAD | Logistic regression, K-nearest neighbor, RF, extra | Several RDs |
| Decision Support Systems | DDSS based on phenotype | Garcelon et al., Garcelon et al. | Dr. Warehouse | Vector space model | Lowe syndrome, dystrophic epidermolysis bullosa, activated PI3K delta syndrome, Rett syndrome, and Dowling Meara |
| Disease Classification and Mechanisms’ Elucidation | Data mining for discovery of molecular patterns | Blasco et al., Lagrue et al. | Biosigner | Partial least squares discriminant analysis (PLS-DA), RF, and SVM | ALS |
| Disease Classification and Mechanisms’ Elucidation | N-, O-, and C-glycosylation site prediction | Caragea et al. | EnsembleGly | Ensembles of SVM | Possible application in human disorders of glycosylation |
| Disease Classification and Mechanisms’ Elucidation | N-, O-, and C-glycosylation site prediction | Hamby et al. | GPP | RF | Possible application in human disorders of glycosylation |
| Disease Classification and Mechanisms’ Elucidation | Sub-Golgi protein identification | Rahman et al. | isGPT | RF and SVM | Possible application in human disorders of glycosylation |
| Disease Classification and Mechanisms’ Elucidation | Data clustering | Hoehndorf et al. | FLAME | Fuzzy clustering | Several RDs, including LSDs and Charcot–Marie–Tooth disease 4J |
| Disease Classification and Mechanisms’ Elucidation | Disease pathway prediction | Taroni et al. | MultiPLIER | Transfer learning | Systemic lupus erythematosus, microscopic polyangiitis, and (eosinophilic) granulomatosis with polyangiitis |
| Disease Classification and Mechanisms’ Elucidation | Prediction models based on gene expression data and anatomical relationships hierarchy | Lee et al. | URSAHD | Bayesian network | Refractory anemia with excess blasts and sideroblastic anemia |
| Disease Classification and Mechanisms’ Elucidation | Data mining, clustering, and visualization tools | Dehiya et al. | Weka | Collection of ML algorithms | Several RDs (exemplified with CF and Rett syndrome) |
Legend: AI—artificial intelligence; ALS—amyotrophic lateral sclerosis; CADD—combined annotation dependent depletion; CF—cystic fibrosis; CLRT—composite likelihood ratio test; DDSS—diagnosis decision support system; DeepPVP—deep PhenomeNET variant predictor; eDiVA—exome disease variant analysis; GPP—glycosylation predictor; HANRD—heterogeneous association network for rare diseases; isGPT—identification of sub-Golgi protein types; LSD—lysosomal storage disease; M-CAP—mendelian clinically applicable pathogenicity; mCSM—mutation cutoff scanning matrix; ML—machine learning; NER—named entity recognition; NN—neural network; PKU—phenylketonuria; RD—rare disease; RDAD—rare disease auxiliary diagnosis; RF—random forest; SilVA—silent variant analyzer; SVM—support vector machine; URSAHD—unveiling RNA sample annotation for human diseases; VarCoPP—variant combinations pathogenicity predictor; VEST—variant effect scoring tool; SNP—single nucleotide polymorphism; SNV—single nucleotide variant.