Apeksha Koul, Rajesh K Bawa, Yogesh Kumar.
Abstract
Airway disease is a major healthcare issue that causes at least 3 million fatalities every year. It is also expected to be one of the leading causes of death worldwide by 2030. Numerous studies have been undertaken to demonstrate the latest advances in artificial intelligence algorithms that assist in identifying and classifying these diseases. This comprehensive review aims to summarise the state-of-the-art machine and deep learning-based systems for detecting airway disorders, outline trends in recent work in this domain, and analyze the difficulties and potential future directions. This systematic literature review covers one hundred fifty-five articles on airway diseases such as cystic fibrosis, emphysema, lung cancer, mesothelioma, COVID-19, pneumoconiosis, asthma, pulmonary edema, tuberculosis, and pulmonary embolism, and highlights the automated learning techniques used to predict them. The study concludes with a discussion of the challenges in improving the efficiency of machine and deep learning-assisted airway disease detection applications.
Year: 2022 PMID: 36189431 PMCID: PMC9516534 DOI: 10.1007/s11831-022-09818-4
Source DB: PubMed Journal: Arch Comput Methods Eng ISSN: 1134-3060 Impact factor: 8.171
Fig. 1 Global impact of airway diseases [2]
Traditional approaches to diagnosing airway diseases
| Technique | Image | Description | Drawbacks |
|---|---|---|---|
| Spirometry [ | | Spirometry is the most common pulmonary function test. It assesses lung function, specifically the amount of air that can be inhaled and exhaled and the rate at which this can be done | Patients may feel dizzy and short of breath for a moment after the test |
| Body plethysmography [ | | Body plethysmography is a pulmonary (lung-related) function test that evaluates how much air is in the lungs after a deep breath. It also determines how much air remains in the lungs after exhaling as much as possible | Technically demanding and time-consuming |
| Impulse oscillometry [ | | The impulse oscillometry system (IOS) is a newer technique for measuring airway resistance and reactance. It is a form of forced oscillation in which sound waves of various frequencies, typically 5 and 20 Hz, are transmitted along the bronchial tree | The forceful impulse used in the test causes a slight change in lung mechanics |
| Washout tests [ | | Nitrogen washout (also known as Fowler's method) measures anatomic dead space in the lungs and other factors related to airway closure throughout a breathing cycle | Time-consuming, and fails to assess poorly ventilated lung regions |
Fig. 2 Role of AI in healthcare
Inclusion and exclusion criteria
| S. no. | Attributes | Inclusion criteria | Exclusion criteria |
|---|---|---|---|
| 1 | Duration | Research carried out between 2010 and 2022 | Articles published before 2010 |
| 2 | Exploration | Research concentrating on (a) the findings, (b) the benchmark dataset, and (c) the research goal | Research that focuses on diseases other than airway diseases |
| 3 | Comparability | Research studies aimed at predicting airway diseases | Research studies on diseases other than airway diseases |
| 4 | Techniques | Research articles that mainly focus on machine and deep learning methods, including a few traditional ones | Research articles that apply methods other than machine and deep learning models |
| 5 | Research design | Original articles that report experimental results | Case studies, articles in languages other than English, patents |
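The inclusion and exclusion criteria in the table above amount to a conjunction of filters applied to each candidate article. The sketch below encodes them in Python under stated assumptions. The `Article` record and its field names are hypothetical, because the review does not describe its screening tooling. Criterion 4 is also simplified to a single `uses_ml_or_dl` flag, even though the review admitted a few traditional methods.

```python
from dataclasses import dataclass

# Hypothetical record layout: the review does not specify how screened
# articles were represented, so these field names are illustrative only.
@dataclass
class Article:
    title: str
    year: int
    language: str
    article_type: str             # e.g. "original", "case study", "patent"
    targets_airway_disease: bool  # criteria 2 and 3
    uses_ml_or_dl: bool           # criterion 4 (simplified)
    reports_experiments: bool     # criterion 5

EXCLUDED_TYPES = {"case study", "patent"}


def meets_criteria(article: Article) -> bool:
    """Return True if the article passes every inclusion criterion in the table."""
    return (
        2010 <= article.year <= 2022                       # 1. Duration
        and article.targets_airway_disease                 # 2-3. Exploration, comparability
        and article.uses_ml_or_dl                          # 4. Techniques
        and article.language.strip().lower() == "english"  # 5. Research design
        and article.article_type.strip().lower() not in EXCLUDED_TYPES
        and article.reports_experiments
    )


def screen(articles):
    """Keep only the articles that satisfy all criteria, preserving their order."""
    return [a for a in articles if meets_criteria(a)]


candidates = [
    Article("CNN for TB screening", 2019, "English", "original", True, True, True),
    Article("Early spirometry study", 2008, "English", "original", True, False, True),
]
print([a.title for a in screen(candidates)])  # ['CNN for TB screening']
```

In a real PRISMA workflow, each rejected record would also be logged with the criterion it failed, so that the counts in the flow chart can be reproduced.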
Fig. 3 PRISMA flow chart
Fig. 4 Predicting airway diseases using multiple learning models
Datasets of multiple airway diseases
| References | Diseases | Dataset name | Description | URLs |
|---|---|---|---|---|
| [ | Cystic fibrosis (CF) | Cystic fibrosis data | The dataset contains one file and 25 columns | |
| [ | | Cystic fibrosis registry in the United Kingdom | The dataset contains geographical information about cystic fibrosis patients | |
| [ | | European cystic fibrosis registry | The database includes data on more than 49,000 people with CF from 38 countries, collected from 2008 to 2018 | |
| [ | | Cystic fibrosis dataset | BioGPS has 10 cystic fibrosis datasets covering 5 species | |
| [ | | Cystic fibrosis data and statistics | The information collected includes state of residence, weight, height, sex, race, respiratory function test results, pancreatic enzyme usage, duration of hospitalizations, home IV therapy, and CF-related complications | |
| [ | Pulmonary embolism | RSNA STR (pulmonary embolism detection) | The dataset includes training and testing images, each with 17 data fields related to PE | |
| [ | | Pulmonary embolism in CT images | The dataset comprises CT scan images taken from 35 different individuals | |
| [ | | Pulmonary embolism_codelist | The dataset includes the ICD-10 codes for pulmonary embolism diagnosis | |
| [ | | Chinese Clinical Trials Registry Center | The dataset includes both acute pulmonary embolism (APE) and non-APE cases, selected retrospectively based on clinical diagnosis | |
| [ | | FUMPE dataset | The dataset contains 35 patients with pulmonary embolism, aged 24 to 82 | |
| [ | Asthma | Informatica | The dataset covers all visits by the asthma patient cohort within Intermountain Healthcare between 2005 and 2018 | |
| [ | | Taiwan National Health Insurance Research Database | The dataset includes 1,000,000 individuals randomly sampled from the 2010 Registry of Beneficiaries (ID), which contained around 27.38 million people | |
| [ | | Data.world | A repository containing 10 asthma datasets | |
| [ | | CHIS open data | This dataset contains the estimated percentage of Californians with asthma (asthma prevalence) | |
| [ | | Asthma dataset | BioGPS has 24 asthma datasets covering 5 species | |
| [ | Lung cancer | LIDC-IDRI | The dataset contains computed tomography (CT) scans of lung cancer with marked-up annotated lesions | |
| [ | | Lung cancer dataset | The data describe three types of pathological lung cancer | |
| [ | | Lung cancer dataset | The dataset contains information about the diagnosis of each lung cancer in every trial | |
| [ | | ChestX-ray8 | The dataset comprises 108,948 frontal-view X-ray images of 32,717 unique patients | |
| [ | | Dataset for lung cancer diagnosis | The dataset contains CT and PET-CT DICOM images of lung cancer, along with XML annotation files that locate each tumor with bounding boxes | |
| [ | COVID-19 | Novel coronavirus 2019 dataset | This dataset contains daily statistics on the number of affected patients, fatalities, and recoveries from the 2019 novel coronavirus | |
| [ | | COVID-19 dataset | The repository contains COVID-19 data from January 22, 2020, to April 1, 2020 | |
| [ | | Covid-chest X-ray dataset | This public open dataset contains details of patients who are positive for or suspected of COVID-19 | |
| [ | | COVID-19 Radiography Database | The database contains 3616 COVID-19-positive images, as well as 10,192 normal, 6012 non-COVID lung infection, and 1345 viral pneumonia images | |
| [ | Tuberculosis | Shenzhen dataset | The dataset contains 662 frontal CXR images, of which 335 are TB positive and 327 TB negative | |
| [ | | Tuberculosis chest X-ray dataset | The database contains 700 tuberculosis (TB)-positive chest X-ray images and 2800 normal images | |
| [ | | Montgomery County X-ray Set | This collection has 138 posterior-anterior X-rays, of which 80 are normal and 58 are abnormal with signs of TB | |
| [ | | ImageCLEF tuberculosis | The dataset contains CT images of TB-affected patients | |
| [ | | Tuberculosis datasets | BioGPS has 16 tuberculosis datasets covering 5 species | |
| [ | Emphysema | Exasens dataset | This repository presents a new dataset for classifying four categories: COPD, bronchitis, infections, and healthy controls (HC) | |
| [ | | COPD dataset | BioGPS has 10 emphysema datasets covering 5 species | |
| [ | | COPD Gene | The data identify chest CT phenotypes in COPD patients, such as emphysema, air trapping, and airway clot formation | |
| [ | | Danish Lung Cancer Screening Trial | The dataset covers 4104 current and former smokers aged 50 to 70 and compares annual CT screening for lung cancer with no screening | |
| [ | | Computed tomography emphysema database | The database consists of 115 high-resolution CT (HRCT) scans, with 168 square regions carefully labeled in a sample of the slices | |
| [ | Mesothelioma | Mesothelioma disease dataset | The dataset contains 324 patients with mesothelioma, and each sample has 34 attributes | |
| [ | | BioGPS | BioGPS has 1 mesothelioma dataset covering 5 species | |
| [ | | MesobanK | MesobanK collects fresh mesothelioma tumour samples, stored for 24 h in RNAlater and then frozen at −80 °C | |
| [ | | ICCR mesothelioma | The dataset covers mesothelioma histology, clinical care, grading, and mortality | |
| [ | | Mesothelioma disease dataset | The dataset contains RNA-Seq data for mesothelioma | |
| [ | Pulmonary edema | MIMIC-CXR database | The collection includes 377,110 images from 227,835 radiographic studies performed at Beth Israel Deaconess Medical Center in Boston | |
| [ | | MIMIC-CXR dataset | The dataset consists of 473,064 chest X-ray images and 206,574 clinical information entries from 63,478 edema patients | |
| [ | | Chest X-ray dataset | This public collection contains 112,120 anonymized frontal-view CXR images obtained from 30,805 patients | |
| [ | | MIMIC-CXR dataset | A large, publicly available dataset of labelled chest radiographs | |
| [ | Pneumoconiosis | ChestX-ray8 database | This database labels CXRs by the presence or absence of 14 radiological abnormalities | |
| [ | | Pneumoconiosis radiograph dataset | The data, a complete image dataset, are collected from the electronic health records of Chongqing CDC | |
| [ | | Chest X-ray dataset | The dataset contains images and their associated diagnostic labels | |
| [ | | Chest X-ray dataset | The collection contains posterior-anterior (PA) radiographs, some fully digital and others digitized films | |
| [ | | Pneumoconiosis dataset | The dataset contains chest X-ray information | |
Evaluation metrics to test system performance
| Parameters | Symbols | Formulae |
|---|---|---|
| Accuracy | Acc | |
| Loss | Loss | |
| Area under the curve | AUC | |
| Sensitivity | St | |
| Specificity | Sp | |
| Recall | Re | |
| Precision | Pr | |
| F1 Score | F1 | |
| Root mean square error | RMSE | |
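The formulae column of the table above did not survive extraction. For reference, the sketch below computes the conventional definitions of these metrics from binary labels (1 = diseased, 0 = healthy). It makes two assumptions. First, "Loss" is taken to mean binary cross-entropy. Second, AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The function names are illustrative, and the original paper may have used other variants. Sensitivity (St) and recall (Re) are the same quantity, TP / (TP + FN).

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive/diseased, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true, y_pred):
    """Threshold-based metrics keyed by the symbols used in the table."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = safe_div(tp, tp + fn)  # identical to recall
    precision = safe_div(tp, tp + fp)
    return {
        "Acc": safe_div(tp + tn, tp + tn + fp + fn),
        "St": sensitivity,
        "Re": sensitivity,
        "Sp": safe_div(tn, tn + fp),
        "Pr": precision,
        "F1": safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


def auc(y_true, scores):
    """ROC AUC via pairwise ranking (Mann-Whitney U divided by n_pos * n_neg)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))


def binary_cross_entropy(y_true, probs, eps=1e-12):
    """Mean log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error for continuous targets (e.g. case-count forecasts)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

When a class is rare, accuracy alone can look strong even if sensitivity is poor. This is why most rows in the comparison tables report sensitivity and specificity alongside accuracy.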
Comparative analysis of the techniques used to predict pulmonary edema, pulmonary embolism, and COVID-19
| References | Diseases | Dataset | Techniques | Outcome | Findings | Challenges/remarks |
|---|---|---|---|---|---|---|
| [ | Pulmonary edema | Indiana chest X-ray dataset, JSRT dataset, Shenzhen dataset | AlexNet, VGG16, VGG19, ResNet50, Adam optimizer, ResNet101, ResNet152 | Accuracy = 52% AUC = 0.94 Sensitivity = 96% Specificity = 96% | The authors found that their network could successfully localize abnormalities such as pulmonary edema | The system produced degraded results when rule-based features were concatenated with the features extracted by the DCN |
| [ | | 330,000 chest X-ray images collected from Beth Israel Deaconess Medical Center | EM (electron microscopy), DGM (Deep Galerkin Method) | RMSE = 0.66 Pearson CC (correlation coefficient) = 0.52 | The findings could help physicians provide better treatment by allowing them to quantify the degree of pulmonary edema from chest X-ray images | Only limited ground-truth labels were available for medical image analysis |
| [ | | In situ biochemical investigation dataset | Principal component analysis, random forest | Sensitivity = 97.3% Accuracy = 96.5% Specificity = 95.5% | The findings showed that FTIR microspectroscopy combined with chemometrics might be a useful tool for recognizing pulmonary edema | The model should be able to account for the contaminating effects of decomposition products in FTIR (Fourier transform infrared) spectroscopy measurements |
| [ | | MIMIC-CXR dataset | DenseNet, random minority oversampling, ResNet50 | AUC = 79.1% | The authors found DL to be a capable approach for estimating pulmonary edema from chest radiographs | Class imbalance |
| [ | | 40 normal images taken from Show Chwan Memorial Hospital, Changhua, Taiwan | Support vector machine, Gabor filter | AUC (area under the curve) = 0.999 | The study was able to distinguish between normal and pulmonary edema lung images | The algorithms were applied only to a limited dataset, which needs to be expanded |
| [ | | CXR images from the NIH Clinical Center | CNN DenseNet model, Adam optimizer, K-fold cross-validation | AUROC = 0.9164 Sensitivity = 71.493% Specificity = 10.011% | The design provided precise localisation in the form of a heat map, making it simpler to discover and characterise pathologic anomalies associated with acute pulmonary edema | The system's performance metrics needed to be improved |
| [ | | 1.5 M frontal (PA) CXR studies obtained from adults over 18 years of age | RadBot-CXR (chest X-ray) | AUC = 93.6% | The authors achieved automatic interpretation of chest X-rays for pulmonary edema at a level comparable to that of an expert | The architecture needed to address localization of the disease identified on the radiographic image |
| [ | Pulmonary embolism (PE) | CT scan dataset | Deep learning, image classification, Long Short-Term Memory | IoU (intersection over union) threshold = 50% Accuracy = 91% Precision = 68% | The model offers a quick solution by combining image classification and object detection algorithms to improve pulmonary embolism detection | The authors used CTPA scans from a single CT system (software and hardware); because scans vary from one system to another, this can reduce accuracy |
| [ | | 2800 CTPA (CT pulmonary angiogram) scans | Linear regression | Sensitivity = 93% Specificity = 95.5% F1 score = 86% | The authors found that the AI prototype algorithm had a high degree of diagnostic performance for detecting PE on CTPAs | Some PE test cases that might influence the performance measurements were misclassified by the system |
| [ | | 85 CTA lung images collected from November 2016 to May 2017 at Wuhan Central Hospital | Generative adversarial network | Sensitivity = 90.9% Susceptibility = 92% | The proposed computer-aided diagnostic technique successfully increased the diagnosis rate of PE | The algorithm had a significant probability of misdetection, indicating that more comprehensive training datasets were required to detect embolisms below grade 3 |
| [ | | DICOM images of CTPA taken since January 2019 | Deep learning algorithm | Sensitivity = 79.6% Specificity = 95.0% | The study assessed the impact of DL-based PE identification on patient care parameters | Despite the model's excellent performance, it did not produce meaningful effects on clinical performance metrics |
| [ | | 590 patients (460 with APE and 130 without APE) | Deep learning, convolutional neural network, U-Net | Sensitivity = 94.6% Specificity = 76.5% | The AUC of the DL-CNN for identifying pulmonary embolism was high, and it may help reduce doctors' workload | Training of the DL-CNN model was limited by the lack of large training sets |
| [ | | Stanford dataset | Convolutional neural network, PENet | AUROC = 84% Accuracy = 81% Specificity = 82% Sensitivity = 75% | The work tackled the difficult task of diagnosing pulmonary embolism without time- and cost-intensive methods | Failed to recognize other pathologies associated with pulmonary embolism |
| [ | | 1427 people from the Italian National Hospital “Ospedali Riuniti di Ancona” | Machine learning, artificial network, Q-analysis | Accuracy = 86% | The proposed framework was able to handle partial as well as incomplete pulmonary embolism data | The S[B]-paradigm had to be used to characterise final negative and positive diagnoses in the system |
| [ | COVID-19 | Cohen's dataset | CNN, HOG (Histogram of Oriented Gradients) | Accuracy = 92.95% Recall = 85% Specificity = 82% Precision = 91.5% | The proposed CNN approach has a high detection rate and is quick and easy to use | The system's robustness to real-world conditions was analyzed only on restricted datasets from multiple sources |
| [ | | 200 chest X-ray and 180 COVID-19 images | SVM, CNN model, ResNet50 | Accuracy = 92% | Deep learning features improved on local descriptors; in particular, deep features combined with the SVM classifier outperformed the other techniques | The study should incorporate other imaging patterns of COVID-19 |
| [ | | Data collected from Johns Hopkins University from Jan 22, 2020, to Apr 1, 2020 | Support vector machine, deep neural network, long short-term memory, polynomial regression | RMSE (confirmed) = 455.92 RMSE (deaths) = 117.94 RMSE (recovered) = 809.71 | In forecasting COVID-19 transmission, polynomial regression (PR) produced the lowest root mean square error (RMSE) of the compared methods | More algorithms should be evaluated to lower the RMSE |
| [ | | OSR dataset of 1624 patients | Logistic regression, naïve Bayes, KNN (K-nearest neighbour), random forest, SVM | Accuracy = 88% Sensitivity = 89% Specificity = 91% AUC = 90% | The ML algorithms in their research performed similarly to, but not as well as, RT-PCR for COVID-19 diagnosis | The technique worked only for COVID-negative individuals; it failed to detect COVID-19 in positive patients |
| [ | | Real-world data of 337 patient images | nConv net, deep learning | Accuracy = 97.6% | This approach might help hospital administrators and medical professionals take the necessary actions to handle COVID-19 patients after rapid diagnosis | The system was evaluated on a small dataset |
| [ | | Dataset collected by Joseph Paul Cohen, Paul Morrison, and Lan Dao | Manta Ray Foraging Optimization, Fractional Multichannel Exponent Moments | Accuracy = 98.09% | By selecting the most important features, the proposed technique achieved both high efficiency and low resource usage | The system dealt with resource limitations and high CPU time |
| [ | | 1065 CT pathogenic images | CNN, GraphNet | Accuracy = 89.5% Specificity = 0.88 Sensitivity = 0.87 | The findings showed that rapid and accurate diagnosis of COVID-19 was achieved | The performance of the deep learning models was hampered by the signal-to-noise ratio |
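Several rows in this table report localization-based results. For example, the CT-based PE detector counts a detection as correct at an IoU threshold of 50%. The sketch below computes intersection over union for axis-aligned bounding boxes. The `(x_min, y_min, x_max, y_max)` box format and the use of `>=` at the threshold are assumptions, because conventions differ between benchmarks.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (0.0 if they do not overlap)."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """Count a predicted box as a hit when its IoU with the ground truth reaches the threshold."""
    return iou(pred, truth) >= threshold
```

For instance, two 2 × 2 boxes offset by one unit in each direction share a 1 × 1 square, so their IoU is 1 / (4 + 4 − 1) = 1/7. That overlap would not count as a detection at a 50% threshold.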
Comparative analysis of the techniques used to predict mesothelioma and lung cancer
| References | Diseases | Dataset | Techniques | Outcome | Findings | Challenges/remarks |
|---|---|---|---|---|---|---|
| [ | Mesothelioma | Data collected from the Diyarbakir district of southeast Turkey (324 patients) | Apriori method, recursive feature elimination method | Lift = 1.0–1.6 Support = 0.5–1.0 Confidence = 0.5–1.0 | The study reached important conclusions about prognostic variables for malignant mesothelioma (MM); the findings revealed that histopathological variables play a role in the development of MM | A larger dataset would increase execution time |
| [ | | UCI repository dataset | Apriori algorithm | Support = 75% Confidence = 90% | According to the findings, asbestos exposure, the duration of exposure, and the duration of symptoms all had a significant influence on the frequency of MM | The study did not examine other association rule mining techniques for extracting rules |
| [ | | Dicle University, Turkey | Random forest, Clojure classifier (CC), decision tree model, kernel logistic regression (KLR) | Accuracy = 71.29% | According to the authors, biopsy and radiological test findings are good predictors of mesothelioma | Overfitting in decision tree models, particularly random forest, lowered the system's accuracy |
| [ | | UCI repository | Apriori algorithm, association rule mining, data normalization | Confidence Support = 75% | The data showed that the duration of symptoms, together with asbestos exposure, had a significant impact on the malignant mesothelioma (MM) rate | When memory for the number of transactions was limited, the Apriori algorithm became inefficient |
| [ | | Dataset collected from the UCI machine learning repository | SMOTE, ADASYN, artificial neural network, principal component analysis | Accuracy = 96% | The study highlighted the significant input features for the malignant mesothelioma problem | Only a few factors related to mesothelioma were studied |
| [ | Lung cancer | LUNA16 and Kaggle Data Science Bowl (KDSB) | DFD-Net, denoising model, convolutional neural network | Accuracy = 87.8% Specificity = 89.1% | The method considerably enhanced model performance | The method addressed class imbalance in the Kaggle dataset |
| [ | | Computed tomography scan images | Fuzzy particle swarm optimization (FPSO), convolutional neural network | Accuracy = 94.97% St = 96.68% Sp = 95.89% | The study used an input lung image to detect malignant lung nodules and to classify lung cancer and its severity | Some cancers were incorrectly classified as benign or malignant |
| [ | | Hubei Taihe Hospital (110 patients) | Naïve Bayes classifier, fast correlation-based filter | AUC = 98.9% Sensitivity = 98.1% Specificity = 100% | The study demonstrated the ability of metabolic indicators to identify lung tumours early | Small dataset size |
| [ | | Chest X-ray dataset | Convolutional neural network | Acc = 84.02% Specificity = 85.34% Sensitivity = 82.71% | The proposed training strategy outperformed the existing transfer learning method | Accuracy needs to be improved by refining the features |
| [ | | Data collected from the Cancer Imaging Archive (CIA) dataset | Ensemble classifier | Accuracy = 97.6% Precision = 97.2% Recall = 97.5% F1 score = 97.5% | The system recognized cancer with high accuracy | - |
| [ | | LIDC-IDRI dataset | Convolutional neural network | AUC = 0.967 | The study summarized the most common methods for nodule classification and lung cancer prediction from CT imaging data | The author noted that system efficiency on the training and testing datasets employed should be considered |
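The mesothelioma studies in this table report association rules in terms of support, confidence, and lift. The sketch below is a minimal level-wise Apriori implementation that uses the standard definitions:

- support(X) is the fraction of transactions (here, patient records) that contain itemset X;
- confidence(A → C) = support(A ∪ C) / support(A);
- lift = confidence / support(C).

The binary clinical attributes in the usage example are hypothetical, and this is not the cited authors' implementation.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {itemset: support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items if support([i], transactions) >= min_support]
    frequent = {s: support(s, transactions) for s in current}
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = [c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))]
        current = [c for c in candidates if support(c, transactions) >= min_support]
        frequent.update({c: support(c, transactions) for c in current})
        k += 1
    return frequent


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    transactions = [frozenset(t) for t in transactions]
    a, c = frozenset(antecedent), frozenset(consequent)
    s_ac = support(a | c, transactions)
    s_a = support(a, transactions)
    s_c = support(c, transactions)
    confidence = s_ac / s_a if s_a else 0.0
    lift = confidence / s_c if s_c else 0.0
    return {"support": s_ac, "confidence": confidence, "lift": lift}


# Hypothetical patient records encoded as sets of binary attributes
records = [
    {"asbestos_exposure", "long_symptom_duration", "MM"},
    {"asbestos_exposure", "MM"},
    {"asbestos_exposure", "long_symptom_duration", "MM"},
    {"long_symptom_duration"},
]
print(rule_metrics({"asbestos_exposure"}, {"MM"}, records))
```

The pruning step is what makes Apriori tractable: a candidate itemset is discarded without scanning the data if any of its subsets is already known to be infrequent. Each surviving candidate still needs a full pass over the records, which is why larger datasets increase execution time.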
Comparative analysis of the techniques used to predict asthma, tuberculosis, and cystic fibrosis
| References | Diseases | Dataset | Techniques | Outcome | Findings | Challenges/remarks |
|---|---|---|---|---|---|---|
| [ | Asthma | Asthma dataset | Decision tree, naïve Bayes, K-nearest neighbour | Accuracy = 96.52% | The proposed model provides a low-cost, user-friendly tool for detecting and classifying asthma in its early stages | A limited dataset was used for testing |
| [ | | Clinical data of 2870 patients | Feature-based time-series classification | Recall = 87% Accuracy = 92% Precision = 89% F measure = 88% Specificity = 94% AUC = 87% | The researchers examined how time-series dynamics and temporal sequences affected daily asthma symptoms | The models should be trained on a larger sample |
| [ | | Data of 2010 patients | Principal component analysis, naïve Bayes | AUC = 85% Sensitivity = 90% Specificity = 83% | The authors showed the potential to improve early detection of asthma exacerbations compared with traditional paper-based action plans | The data were collected manually, which may have introduced inaccuracies |
| [ | | Clinical data of 1225 patients | Normalization, orthogonal array, Mahalanobis distance | Accuracy = 94.15% | The study developed an asthma detection algorithm that accurately detects the illness | The study did not include a precise asthma risk score or a list of reference doctors |
| [ | Tuberculosis | Data collected from the Shenzhen (China) and Montgomery County datasets | CLAHE method, deep convolutional neural network, U-Net architecture | Accuracy = 97.1% Specificity = 96.2% Sensitivity = 97.9% | The model provides inexpensive, easily accessible, highly accurate solutions for low-income countries affected by TB | Although the model achieved good sensitivity and specificity, its performance is hard to compare with human performance because it has not yet been tested in the field |
| [ | | Microscopic images of 22 sputum smears | CNN | Recall = 97.1% Precision = 78% F-score = 86.7% | The model helped physicians detect the disease quickly, improving clinical outcomes | – |
| [ | | 100 TB CT scan images | ResNet | Accuracy = 85.2% | The research was expected to contribute substantially to the discipline and encourages the use of ML approaches in medicine | Traditional 3D CNN architectures tend to be less successful because of limited datasets and the similarity of abnormal TB patterns across the five severity levels |
| [ | | Montgomery and Shenzhen datasets | Ensemble classifier, Canny edge detector, deep learning | Accuracy = 93.59% Specificity = 94.87% Sensitivity = 92.31% | The findings show that ensemble classifiers trained on a variety of features derived from different types of images improve detection performance | The model needs to be extended to categorize chest X-rays by tuberculosis severity, and more features need to be investigated to improve the classifiers' performance |
| [ | | 501 computed tomography scan images | Convolutional neural network, deep learning | Recall = 98.7% Precision = 93.7% | The article successfully researched the detection of pulmonary TB and developed a quantitative diagnosis report after reviewing a large number of relevant publications and methodologies | The system failed to correctly classify pulmonary tuberculosis |
| [ | | Taipei Medical University | RF, ANN | Accuracy = 88.67% Specificity = 90.4% Sensitivity = 80% | The model gave doctors a chance to take preventive action before anti-tuberculosis drug-induced hepatotoxicity (ATDH) occurred | The models were not trained on a large data sample |
| [ | | Real-time dataset collected in the form of chest X-ray images | Linear regression, KNN, naïve Bayes, decision trees, random forest, SVM, MLP | Accuracy = 98.57% Precision = 99.58% Sensitivity = 91.50% Specificity = 99.50% | The authors filled a significant gap in the literature by comparing multiple algorithms for predicting tuberculosis | To improve hyperparameter tuning, the authors intended to investigate single-class classification approaches and to assess deep learning ensembles |
| [ | | Montgomery dataset | InceptionNet model | Accuracy = 87.5% AUC = 0.92 Sensitivity = 0.76 Specificity = 0.95 | The model offers a quick and accurate method for mass tuberculosis screening in low-resource areas across the world | The limited dataset gave insight only into the model's intrinsic capabilities |
| | | Shenzhen dataset | | Accuracy = 91.7% AUC = 0.96 Sensitivity = 0.89 Specificity = 0.93 | | |
| [ | Cystic fibrosis | Clinical data of 16 patients aged 19–59 years | HASTE (half-Fourier single-shot turbo spin-echo) technique, Lasso method | Recall = 0.68 Precision = 0.016 | The researchers were able to characterize morpho-functional abnormalities of the small intestine in individuals with cystic fibrosis | Objective indicators of therapeutic response to the novel CFTR modulators, as well as gastrointestinal outcome assessments, were required |
| [ | | CT scans of 194 patients | Deep learning, cascade network, binary and multiclass classification | Accuracy = 94% | The findings revealed that the proposed technique can detect abnormalities and assign a grade to each illness in the initial stages of CF respiratory illness | The model needed further work on the detection of cystic fibrosis |
| [ | | Synthetic data produced with a second-order ODE mathematical model of the lung | Support vector machine, naïve Bayes classifier, logistic regression | Accuracy = 99% | The model gave clinicians real-time assistance in making diagnostic decisions | The work was limited because all training and testing used simulated data and all classes used similar profiles |
| [ | | 1000 distinct peaks extracted from 277 sweat samples | Gaussian-based decision tree model | Accuracy = 98% Recall = 96% Precision = 94% Specificity = 98% | The model examined the relationship between lipid profiles and moderate and severe CFTR gene mutations | Enrolling additional patients to enlarge the sample size of a genetically varied CF community might improve this research in the future |
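Naïve Bayes appears in several rows of this table, including the asthma and cystic fibrosis studies. As an illustration only, here is a from-scratch Gaussian naïve Bayes classifier. It assumes each feature is normally distributed and conditionally independent given the class, which may not match the variants the cited studies used. The small `eps` added to each variance is an illustrative choice to avoid division by zero, and the toy feature values are hypothetical.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: per-class normal likelihood for each feature."""

    def __init__(self, eps: float = 1e-9):
        self.eps = eps  # added to every variance to avoid division by zero

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        n = len(X)
        self.params = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            variances = [sum((v - m) ** 2 for v in c) / len(c) + self.eps
                         for c, m in zip(cols, means)]
            # Store log prior, per-feature means and per-feature variances
            self.params[label] = (math.log(len(rows) / n), means, variances)
        return self

    def _log_posterior(self, row, label):
        """Unnormalized log posterior: log prior + sum of Gaussian log-likelihoods."""
        log_prior, means, variances = self.params[label]
        log_lik = sum(-0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                      for x, m, var in zip(row, means, variances))
        return log_prior + log_lik

    def predict(self, X):
        """Return the class with the highest posterior for each row."""
        return [max(self.params, key=lambda c: self._log_posterior(row, c)) for row in X]


# Hypothetical two-feature data: class 0 = control, class 1 = case
X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [5.0, 6.0], [5.2, 5.9], [4.8, 6.1]]
y = [0, 0, 0, 1, 1, 1]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([[1.1, 2.0], [5.1, 6.0]]))
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied together. This matters for wide clinical feature sets such as lipid-profile peaks.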
Comparative analysis of the techniques used to predict emphysema, pneumoconiosis
| References | Diseases | Dataset | Techniques | Outcome | Findings | Challenges/remarks |
|---|---|---|---|---|---|---|
| [ | Emphysema | Benchmark dataset and manual dataset of 39 and 19 patients respectively | Improved red deer algorithm,Fuzzy C means, Adaptive local ternary pattern | Benchmark Dataset Accuracy = 94.99% F1 Score = 89.45% Sensitivity = 85.66% | Their research looked at how a deep learning technology may be used to diagnose pulmonary emphysema automatically | On the generated picture, the approach was unable to offer localization information |
Manual Dataset Accuracy = 95.56% Sensitivity = 91.6% F1 score = 95.25% | ||||||
| [ | Data from the Danish Lung Cancer Screening Trial, which included 1990 people | Proportion Net, GAP Net | AUC = 0.96 | The model's possible to identify the target was excellent enough to accurately characterize the geographical distribution of emphysema | With human-level accuracy, the geo-location was excellent enough to categorize the geographical distribution of emphysema | |
| [ | 7143 COPD Gene Participants | Deep Learning, Convolution Neural Network, Cox proportional hazard models | Confidence Interval = 95% | The technology produced a decipherable result to recognize patients who are at higher risk of death | The model might be impacted by the unique CT methodology because it was trained using just COPD Gene data | |
| [ | 126 input recordings taken from respiratory sounds database | CNN, MFCC, Librosa machine learning features, K fold cross validation | Specificity = 0.93 Sensitivity = 0.93 ICBHI score = 0.93 | Medical specialists are able to use the algorithm to diagnose COPD using breathing sounds | They can expand its functions in the future to assist physicians in diagnosing numerous other ailments and their severity | |
| [ | The Danish Lung Cancer Screening Study used 600 low-dose CT images | Multiple instance learning approach | AUC (scan level prediction) = 0.82 AUC ( region level prediction) = 0.88 | The study give accurate predictions of emphysema occurrence at both the scan and area levels | The method resulted in a higher ratio of false to true detections in the region, resulting in worse search performance | |
| [ | From COLIBRI-COPD, 1409 COPD patients were recruited | SMOTE, Random forest | Accuracy = 84% PPV = 87% Sensitivity = 59% | The system was able to detect COPD at all stages of severity | Accuracy needs to be improved | |
| [ | 9,925 CT scans collected from COPD Gene multi-center | Convolutional Neural Network | Pearson co-relation coefficient = 0.940 | Biomarkers from images were learned directly by the deep-learning regression architecture and also simplified the development of biomarker extraction algorithms | The suggested method's shortcoming was that it required two-dimensional depth of field of the structures where biomarkers were calculated, which was not practical at the time | |
| [ | | HRCT scans from the Frederikshavn (Fre) and Aalborg (Aal) datasets | MILES classifier, cross-validation, miSVM-Q classifier | miSVM-Q AUC = 95% MILES AUC = 78.8% | The study introduced two novel multiple instance learning classifiers that detect emphysema patches in COPD patients without any manual annotation | The size and proportions of the datasets were the study's key limitations |
| [ | Pneumoconiosis | 1881 digital chest X-ray images | DCNN, Inception V3 | AUC = 87.8% | The authors found that the deep learning solution produced comparatively better classification performance | Limited dataset and higher time complexity |
| [ | | Image data collected by the Chongqing CDC, China, from August 2016 to June 2017 | ResNet34, DenseNet40, DenseNet64, DenseNet53 | Accuracy = 88.6% Precision = 83.3% Recall = 53.6% F1 score = 65.2% | The system screened radiographic images for suspected pneumoconiosis | The system's performance, and in particular its accuracy, needs further improvement |
| [ | | 5424 chest radiographs | CAD algorithm, McNemar test | Sensitivity = 0.89 to 0.98 Specificity = 0.68 to 0.86 Kappa values = 0.57 to 0.84 | The investigation showed that CAD can significantly improve pneumoconiosis diagnosis | The study covered only a limited number of patients, and the outcomes were inconsistent |
| [ | | 405 pneumoconiosis patients | Convolutional Neural Network | Accuracy = 97.3% Sensitivity = 98.1% Specificity = 97% | The model outperformed two groups of radiologists in pneumoconiosis grading accuracy | The model could not distinguish pneumoconiosis from other lung diseases with similar pathologies |
| [ | | Original pulse signals | Support vector machine, RBF kernel function | Precision = 100% Recall = 86.85% F-measure = 92.96% Accuracy = 88.31% | The proposed technique for detecting disease from coal miners' pulse signals can remind coal workers of the need to check their health and prevent illness progression | In future work, the authors would need to optimize the objective function to make the system more realistic, accounting for the different consequences of misdiagnosis |
| [ | | Pneumoconiosis data | Transfer learning, CNN | Specificity = 87% Sensitivity = 95% AUC = 94% | Experiments showed that both transfer learning strategies outperformed training from scratch when training data were insufficient | Pneumoconiosis chest X-rays are scarce because patient data are sensitive, which made it difficult to meet the data requirements of deep learning |
| [ | | Dataset from the National Institute for Occupational Safety and Health (NIOSH) website | Autoencoder, SVM, CheXNet, Multilayer perceptron | Sensitivity = 93.3% Specificity = 88.46% Accuracy = 90.24% | When training datasets are skewed or lack variety, the cascaded machine learning architecture could be applied to other medical image analysis tasks | Future work includes a pilot study testing the approach in a clinical environment with human readers |
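The Results column mixes several confusion-matrix metrics (sensitivity, specificity, precision/PPV, accuracy, F1). A minimal Python sketch of their standard definitions follows; the `ConfusionCounts` type and the function names are hypothetical, chosen only for illustration. It also recomputes F1 as the harmonic mean of precision and recall for the two pneumoconiosis rows that report all three values, as a quick check that the reported figures are internally consistent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix (positive = disease present)."""
    tp: int  # diseased, predicted diseased
    fp: int  # healthy, predicted diseased
    tn: int  # healthy, predicted healthy
    fn: int  # diseased, predicted healthy


def sensitivity(c: ConfusionCounts) -> float:
    """True-positive rate, also reported as recall."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True-negative rate."""
    return c.tn / (c.tn + c.fp)


def precision(c: ConfusionCounts) -> float:
    """Positive predictive value (PPV)."""
    return c.tp / (c.tp + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def f1_from_pr(p: float, r: float) -> float:
    """F1 score: harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


if __name__ == "__main__":
    # Precision/recall pairs as reported in the pneumoconiosis rows of the table.
    print(round(f1_from_pr(0.833, 0.536), 3))  # 0.652, table reports F1 = 65.2%
    print(round(f1_from_pr(1.0, 0.8685), 4))   # 0.9296, table reports F-measure = 92.96%
```

Both recomputed values match the table. Sensitivity and recall are the same quantity, which is why different rows report one name or the other.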
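The CAD row for pneumoconiosis reports kappa values and uses McNemar's test. As a hedged sketch, assuming the usual roles of these statistics in reader studies (kappa for agreement between two readers, or between a reader and a reference standard, and McNemar's test for paired correct/incorrect outcomes on the same cases under two conditions, such as reading with and without CAD), the standard formulas can be written as below. The table does not give the study's contingency counts, so the example inputs and the helper names `cohens_kappa` and `mcnemar` are hypothetical.

```python
import math


def cohens_kappa(table: list[list[int]]) -> float:
    """Cohen's kappa for two raters scoring the same cases.

    table[i][j] = number of cases rater A assigned to class i and rater B to class j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the two raters' marginal frequencies.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def mcnemar(b: int, c: int) -> tuple[float, float]:
    """Continuity-corrected McNemar test on the discordant pairs of a paired 2x2 table.

    b = cases correct under condition 1 only; c = cases correct under condition 2 only.
    Returns (chi-square statistic with 1 degree of freedom, two-sided p-value).
    """
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(cohens_kappa([[20, 5], [10, 15]]))
    print(mcnemar(10, 2))
```

The p-value relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable, so its tail probability reduces to `math.erfc` and no statistics library is needed.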
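Two of the COPD studies in the table use k-fold cross-validation, but the table does not state the value of k. The sketch below shows how k-fold splitting partitions a dataset; the `k_fold_splits` helper is hypothetical, and k = 5 with n = 126 (the recording count in the respiratory-sound row) is used purely for illustration.

```python
import random
from typing import Optional


def k_fold_splits(n: int, k: int, seed: Optional[int] = None) -> list[tuple[list[int], list[int]]]:
    """Partition sample indices 0..n-1 into k folds.

    Returns one (train_indices, test_indices) pair per fold; every index is
    used for testing exactly once. If `seed` is given, indices are shuffled
    first so that folds are not simply contiguous blocks.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    order = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(order)
    # The first n % k folds get one extra sample, so fold sizes differ by at most 1.
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    splits = []
    start = 0
    for size in sizes:
        test = order[start:start + size]
        train = order[:start] + order[start + size:]
        splits.append((train, test))
        start += size
    return splits


if __name__ == "__main__":
    for train, test in k_fold_splits(126, 5, seed=42):
        print(len(train), len(test))
```

Because each sample lands in exactly one test fold, averaging the k test scores evaluates the model on every sample once.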
Fig. 5 Multi-pathway CNN architecture for lung cancer detection [107]
Fig. 6 Ways to predict asthma exacerbations in patients
Fig. 7 Architecture of the cascade approach for detecting cystic fibrosis [128]
Overall comparison of models
| Ref | Diseases | Techniques | Accuracy |
|---|---|---|---|
| [ | Pulmonary Embolism | Deep Learning, Image Classification, Long Short-Term Memory | 91% |
| [ | Pulmonary Edema | Principal Component Analysis, Random Forest | 96.5% |
| [ | Cystic fibrosis | Support vector machine, Naïve Bayes classifier, logistic regression | 99% |
| [ | Pneumoconiosis | Convolutional Neural Network | 97.3% |
| [ | Lung cancer | Ensemble classifier | 97.6% |
| [ | Asthma | Decision Tree, Naive Bayes, K-Nearest Neighbour | 96.52% |
| [ | Covid-19 | Manta Ray Foraging Optimization, Fractional Multichannel Exponent Moments | 98.09% |
| [ | Mesothelioma | SMOTE, ADASYN, Artificial neural network, principal component analysis | 96% |
| [ | Tuberculosis | CLAHE method, Deep Convolutional Neural Network, U-Net architecture | 97.1% |
| [ | Emphysema | Improved red deer algorithm, Fuzzy C Means, Adaptive local ternary pattern | 95.56% |
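SMOTE appears both in the COPD row (COLIBRI-COPD, combined with a random forest) and in the mesothelioma row of the comparison table (alongside ADASYN). The core idea of SMOTE is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified, hypothetical pure-Python version for numeric feature vectors; it is not the implementation used in either study, and ADASYN (which adapts how many samples are generated around each point) is not shown.

```python
import math
import random


def smote(minority: list[list[float]], n_synthetic: int, k: int = 5,
          seed: int = 0) -> list[list[float]]:
    """Generate synthetic minority-class samples by SMOTE-style interpolation.

    For each synthetic point: pick a random minority sample, pick one of its
    k nearest minority neighbours (Euclidean distance), and place the new
    point at a random position on the segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to `base`; keep the k nearest.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(minority, 5, k=2, seed=1):
        print(point)
```

Since every synthetic point is a convex combination of two real minority samples, it stays inside the region spanned by the minority class. Oversampling is normally applied to the training data only, so that synthetic points never leak into the evaluation set.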
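The emphysema row of the comparison table combines an improved red deer algorithm, Fuzzy C Means, and an adaptive local ternary pattern. As a minimal sketch of only the fuzzy c-means part, the standard alternating updates can be applied to one-dimensional values such as pixel intensities. The fuzzifier m = 2, the evenly spaced initial centres, and the `fuzzy_c_means` name are assumptions for illustration; the red deer optimization and the texture features are not shown.

```python
def fuzzy_c_means(values: list[float], c: int, m: float = 2.0,
                  iterations: int = 100, tol: float = 1e-6) -> tuple[list[float], list[list[float]]]:
    """1-D fuzzy c-means clustering.

    Returns (centres, memberships), where memberships[i][j] is the degree to
    which values[i] belongs to cluster j (each row sums to 1). Memberships are
    those computed in the final iteration.
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be > 1")
    lo, hi = min(values), max(values)
    # Simple deterministic initialization: centres spread evenly over the data range.
    centres = [lo + (hi - lo) * (j + 0.5) / c for j in range(c)]
    exponent = 2.0 / (m - 1.0)
    u: list[list[float]] = []
    for _ in range(iterations):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        u = []
        for x in values:
            d = [abs(x - cj) for cj in centres]
            if 0.0 in d:
                # The point sits exactly on a centre: give that centre full membership.
                row = [1.0 if dj == 0.0 else 0.0 for dj in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((d[j] / d[q]) ** exponent for q in range(c))
                       for j in range(c)]
            u.append(row)
        # Centre update: c_j = sum_i u_ij^m * x_i / sum_i u_ij^m.
        new_centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(values))]
            new_centres.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, u


if __name__ == "__main__":
    intensities = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]  # hypothetical values
    centres, memberships = fuzzy_c_means(intensities, c=2)
    print(centres)
```

Unlike k-means, each value keeps a graded membership in every cluster, which is why fuzzy c-means is often used when tissue boundaries in an image are not sharp.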
Fig. 8 Distribution of papers
Fig. 9 Comparison of pulmonologists and AI techniques in predicting the diseases
Fig. 10 Airway disease prediction by AI and pulmonologists
Fig. 11 Machine and deep learning-based prediction models