I S Stafford, M Kellermann, E Mossotto, R M Beattie, B D MacArthur, S Ennis.
Abstract
Autoimmune diseases are chronic, multifactorial conditions. Through machine learning (ML), a branch of the wider field of artificial intelligence, it is possible to extract patterns within patient data and exploit these patterns to predict patient outcomes for improved clinical management. Here, we surveyed the use of ML methods to address clinical problems in autoimmune disease. A systematic review was conducted using the MEDLINE, Embase, and Computers and Applied Sciences Complete databases. Relevant papers included "machine learning" or "artificial intelligence" and the autoimmune disease search term(s) in their title, abstract or keywords. Exclusion criteria were: studies not written in English, studies with no real human patient data, publication prior to 2001, studies that were not peer reviewed, non-autoimmune disease comorbidity research, and review papers. Of 702 studies, 169 met the criteria for inclusion. Support vector machines and random forests were the most popular ML methods used. ML models using data on multiple sclerosis, rheumatoid arthritis and inflammatory bowel disease were most common. A small proportion of studies (7.7%, or 13/169) combined different data types in the modelling process. Cross-validation combined with a separate testing set, which gives a more robust model evaluation, was used in 8.3% of papers (14/169). The field may benefit from adopting a best practice of validation, cross-validation and independent testing of ML models. Many models achieved good predictive results in simple scenarios (e.g. classification of cases and controls). Progression to more complex predictive models may be achievable in future through integration of multiple data types.
Keywords: Autoimmune diseases; Machine learning; Predictive medicine
Year: 2020 PMID: 32195365 PMCID: PMC7062883 DOI: 10.1038/s41746-020-0229-3
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Fig. 1 The three factors contributing to autoimmune disease development.
I: Genetic susceptibility is conferred by a combination of genes, which may include genes encoding human leukocyte antigen (HLA), innate and adaptive immune proteins, and which directly or indirectly affect the regulation of the immune system. II: Examples of potential environmental triggers for dysregulation. III: Autoantibody production alone will not always result in disease development; self-antigen production and a subsequent elevated immune response are necessary.[3]
Fig. 2 Simplified workflow for developing a machine learning model.
This includes the cycle of feature selection, training and validation that is required to avoid overfitting (cross validation).
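The cycle in this caption, together with the held-out testing recommended in the abstract, can be sketched as follows. This is a minimal illustration under stated assumptions, not the workflow of any reviewed study: the data are synthetic, the "model" is a one-feature threshold classifier, and the names `make_data`, `fit`, `accuracy` and `cross_validate` are hypothetical. The point it shows is that feature selection and training are repeated inside each fold, and the test set is used only once at the end.

```python
import random
from statistics import mean


def make_data(n=200, seed=0):
    """Synthetic two-class rows: feature 0 is informative, feature 1 is noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        features = [rng.gauss(2.0 * label, 1.0), rng.gauss(0.0, 1.0)]
        rows.append((features, label))
    return rows


def class_means(rows, j):
    """Mean of feature j within each class."""
    m0 = mean(x[j] for x, y in rows if y == 0)
    m1 = mean(x[j] for x, y in rows if y == 1)
    return m0, m1


def fit(rows):
    """Feature selection plus training, using only the rows passed in."""
    n_features = len(rows[0][0])

    def gap(j):
        m0, m1 = class_means(rows, j)
        return abs(m1 - m0)

    # Select the feature whose class means are furthest apart.
    j = max(range(n_features), key=gap)
    m0, m1 = class_means(rows, j)
    # Model: selected feature, midpoint threshold, and which side is class 1.
    return j, (m0 + m1) / 2.0, m1 > m0


def accuracy(model, rows):
    """Fraction of rows the threshold model classifies correctly."""
    j, threshold, positive_above = model
    correct = 0
    for x, y in rows:
        predicted = 1 if (x[j] > threshold) == positive_above else 0
        correct += predicted == y
    return correct / len(rows)


def cross_validate(train, k=5):
    """k-fold cross-validation: refit (including selection) on k-1 folds each time."""
    folds = [train[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        fit_rows = [r for m, fold in enumerate(folds) if m != i for r in fold]
        scores.append(accuracy(fit(fit_rows), folds[i]))
    return scores


data = make_data()
random.Random(1).shuffle(data)
test, train = data[:40], data[40:]    # the test set is set aside before any fitting

cv_scores = cross_validate(train)     # used for model and feature choices
final_model = fit(train)              # refit on all training rows
test_accuracy = accuracy(final_model, test)  # reported once, on unseen rows

print("cross-validation accuracy per fold:", cv_scores)
print("held-out test accuracy:", test_accuracy)
```

Keeping feature selection inside `fit` means each fold selects features without seeing its own validation rows, which is what prevents the optimistic bias that the caption refers to as overfitting.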
Fig. 3 Methodological flowchart and number of papers reviewed at each stage.
The inclusion and exclusion criteria were applied to the title and abstract at the screening step and to the full article at the eligibility step. During screening, it was unclear from some abstracts whether the article fulfilled the criteria, so these records were read in full at the eligibility step to clarify their status. Two reviewers completed screening independently; where consensus could not be reached, a third reviewer assessed the article and decided whether it was included or excluded.
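The adjudication rule in this caption can be written as a small decision function. This is only an illustrative sketch; the function name `screening_decision` and its boolean encoding (True for include, False for exclude) are assumptions, not part of the published method.

```python
from typing import Optional


def screening_decision(reviewer_a: bool, reviewer_b: bool,
                       reviewer_c: Optional[bool] = None) -> bool:
    """Return True to include a record, False to exclude it."""
    # Two independent reviewers who agree settle the record.
    if reviewer_a == reviewer_b:
        return reviewer_a
    # On disagreement, a third reviewer makes the final decision.
    if reviewer_c is None:
        raise ValueError("reviewers disagree; a third assessment is needed")
    return reviewer_c


print(screening_decision(True, True))          # both include
print(screening_decision(True, False, False))  # third reviewer excludes
```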
Machine learning and artificial intelligence applications to autoimmune diseases.
| Disease | Number of studies | Years | Most popular classification/prediction application(s) | Most popular machine learning method(s) | Median sample size (min, max) | Data types used |
|---|---|---|---|---|---|---|
| Multiple sclerosis | 41 | 2008–2019 | Diagnosis, Prognosis, Disease Subtype | Type of Regression, Random Forest, Support Vector Machine | 99 (12, 12566) | Clinical, Survey, Genetic, MRI, Lipid Markers, SNPs, Gait Data, Immune repertoire, Gene Expression |
| Rheumatoid arthritis | 32 | 2003–2018 | Risk, Diagnosis, Early Diagnosis, Identify Patients | Support Vector Machine, Variations of Random Forest, Neural Network and Decision Tree | 338 (22, 922199) | Medical Database, Immunoassay, Metagenomic, Microbiome, GWAS/SNP, Clinical, Movement Data, Amino acid analytes, Transcriptomic, EMRs, Ultrasound images, Proteomic, Laser images |
| Inflammatory bowel disease | 30 | 2007–2018 | Diagnosis, Response to Treatment, Disease Risk, Disease Severity | Random Forest, Support Vector Machine | 273 (50, 53279) | Clinical, Colonoscopy Images, Metagenomic, Gene Expression, GWAS, Microbiota, miRNA Expression, EMRs, Exome, MRI |
| Type 1 diabetes | 17 | 2009–2018 | Disease Management | Novel Methods/Hybrid Models, Neural Network, Support Vector Regression | 23 (10, 10579) | Clinical, Red Blood Cell Images, VOCs, GWAS/SNPs |
| Systemic lupus erythematosus | 14 | 2009–2018 | Variations of prognosis, Diagnosis | Logistic Regression, Neural Network, Random Forest Decision Tree | 318 (14, 17057) | Clinical, Electronic Health Records, Drug Treatment, SNPs, MRI, Exome, Gene Expression, Proteomic, Urine Biomarkers |
| Psoriasis | 11 | 2007–2018 | Diagnosis, Disease Severity | Support Vector Machine | 540 (80, 22181) | Digital Image, GWAS, Proteomic, RNA Biomarkers |
| Coeliac disease | 7 | 2011–2018 | Diagnosis | Random Forest, Logistic Regression, Bayesian Classifier, Support Vector Machine, Logistic Model, Natural Language Processing, Combined Fuzzy Cognitive Map and Possibilistic Fuzzy c-means clustering | 465 (47, 1498) | VOCs, Clinical, Peptide, EMRs |
| Thyroid diseases | 6 | 2008–2018 | Diagnosis | Hybrid Models | 215 (215, 7200) | Clinical |
| Autoimmune liver diseases | 5 | 2009–2018 | Prognosis | Variations on Random Forest | 288 (64, 787) | Clinical, Clinical Trial, Microbiome |
| Systemic sclerosis | 4 | 2016–2018 | Diagnosis, Treatment, Prognosis | Support Vector Machine, Random Forest | 119 (37, 991) | Gene Expression, Nailfold capillaroscopy images, Peripheral Blood Mononuclear cell data (flow cytometry, DNA, mRNA) |
Information includes the number of studies per autoimmune disease, the years in which they were published, popular applications and methods, and the data types used. Median sample size was a better representation than the mean, because studies using data from genome-wide association studies and electronic medical records had very large cohorts.
EMR electronic medical record, GWAS genome-wide association study, miRNA micro RNA, MRI magnetic resonance imaging, SNP single nucleotide polymorphism, VOC volatile organic compound.
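The preference for the median over the mean can be illustrated with a short calculation. The sample sizes below are hypothetical: only the smallest, middle and largest values echo the multiple sclerosis row (minimum 12, median 99, maximum 12566), and the remaining values are invented for illustration because the per-study sizes are not given here.

```python
from statistics import mean, median

# Hypothetical per-study sample sizes; one very large cohort dominates the mean.
sample_sizes = [12, 45, 80, 99, 150, 300, 12566]

size_median = median(sample_sizes)
size_mean = mean(sample_sizes)

print("median:", size_median)
print("mean:", round(size_mean, 1))
```

In this made-up list a single large cohort pulls the mean far above every other study, while the median stays at a typical study size.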
Search terms used in OvidSP and EBSCO for each autoimmune disease.
| Autoimmune disease | Disease Search Term(s) Used |
|---|---|
| Addison’s disease | Addison* |
| Alopecia | Alopecia |
| Celiac disease | Celiac, Coeliac |
| Inflammatory bowel disease | Inflammatory bowel disease, Crohn* disease, ulcerative colitis |
| Type 1 diabetes | Type 1 Diabetes, Insulin dependent Diabetes? |
| Autoimmune hepatitis | Autoimmune hepatitis, chronic active hepatitis, primary biliary cirrhosis, primary sclerosing cholangitis |
| Thyroid disease | Autoimmune thyroiditis, Hashimoto* thyroiditis, Hashimoto* disease, Grave* disease, hyperthyroid*, hypothyroid* |
| Multiple sclerosis | Multiple sclerosis |
| Myasthenia gravis | Myasthenia gravis |
| Polymyalgia rheumatica | Polymyalgia rheumatica |
| Psoriasis | Psoriasis |
| Psoriatic arthritis | Psoriatic arthritis |
| Rheumatoid arthritis | Rheumatoid Arthritis |
| Sjögren syndrome | Sjogren syndrome |
| Systemic sclerosis | Systemic sclerosis |
| Systemic lupus erythematosus | Lupus |
| Systemic vasculitis | Polyarteritis nodosa, microscopic polyangiitis, granulomatosis with polyangiitis, eosinophilic granulomatosis with polyangiitis |
| Uveitis (iridocyclitis) | Uvetitis, iridocyclitis |
| Vitiligo | Vitiligo |
Asterisk (*) and question mark (?) are wildcard characters used for searching the databases OvidSP and EBSCO.
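As a rough illustration of how such wildcard terms behave, the sketch below translates a search term into a regular expression. The semantics are assumptions made for this example only: `*` is treated as zero or more word characters or apostrophes, and `?` as zero or one word character. The exact rules differ between search platforms, so this does not reproduce either database's matching. The helper name `wildcard_to_regex` is hypothetical.

```python
import re


def wildcard_to_regex(term: str) -> "re.Pattern[str]":
    """Translate a wildcard search term into a case-insensitive regex."""
    parts = []
    for ch in term:
        if ch == "*":
            parts.append(r"[\w']*")   # assumed: unlimited truncation
        elif ch == "?":
            parts.append(r"\w?")      # assumed: optional single character
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


titles = [
    "Subclinical hypothyroidism in adults",
    "Crohn's disease activity prediction",
    "Insulin dependent diabetes management",
]

for term in ["hypothyroid*", "Crohn* disease", "Insulin dependent Diabetes?"]:
    pattern = wildcard_to_regex(term)
    hits = [t for t in titles if pattern.search(t)]
    print(term, "->", hits)
```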