
Artificial Intelligence Techniques to Predict the Airway Disorders Illness: A Systematic Review.

Apeksha Koul¹, Rajesh K Bawa², Yogesh Kumar³.

Abstract

Airway disease is a major healthcare issue that causes at least 3 million fatalities every year, and it is projected to be one of the foremost causes of death worldwide by 2030. Numerous studies have been undertaken to demonstrate the latest advances in artificial intelligence algorithms that assist in identifying and classifying these diseases. This comprehensive review aims to summarise the state-of-the-art machine and deep learning-based systems for detecting airway disorders, to identify trends in recent work in this domain, and to analyze the difficulties and potential future paths. The systematic literature review covers one hundred fifty-five articles on airway diseases such as cystic fibrosis, emphysema, lung cancer, mesothelioma, covid-19, pneumoconiosis, asthma, pulmonary edema, tuberculosis, and pulmonary embolism, and highlights the automated learning techniques used to predict them. The study concludes with a discussion of the challenges in improving the efficiency and applications of machine and deep learning-assisted airway disease detection.

© The Author(s) under exclusive licence to International Center for Numerical Methods in Engineering (CIMNE) 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


Year: 2022 PMID: 36189431 PMCID: PMC9516534 DOI: 10.1007/s11831-022-09818-4

Source DB: PubMed Journal: Arch Comput Methods Eng ISSN: 1134-3060 Impact factor: 8.171

Introduction

Airway diseases have been called a worldwide source of sickness, as they damage the air tubes that carry oxygen and other gases into and out of the lungs. These diseases generally induce a constriction or obstruction of the airways. People with airway illnesses often remark that they are "trying to breathe out through a straw." Airway disease is a major cause of mortality and disability: it affects at least 65 million people and kills 3 million individuals yearly, and it is ranked as the third most prevalent cause of death worldwide [1]. As shown in Fig. 1, fatalities from chronic respiratory disorders rose from 3.32 million in 1990 to 3.91 million in 2017. The age-standardized mortality rate for chronic respiratory diseases fell by 2.41% annually (2.28% to 2.55%), but across 195 nations the annual fatality rates attributable to chronic respiratory illnesses varied greatly. The highest mortality was seen in regions with a low Socio-demographic index [2]. Particulate matter pollution, such as pneumoconiosis and asthma, has been the leading cause of COPD fatalities in areas with a low Socio-demographic index [3]. In addition, an estimated 40,000 people in the US live with cystic fibrosis [4], and an estimated 10 million tuberculosis cases and 1.4 million TB deaths were reported in 2019 [5]. Lung cancer, meanwhile, has been termed the leading cause of cancer death in the US, accounting for approximately 350 deaths each day [6].

Fig. 1

Global impact of airway diseases [2]

After considering these numbers, physicians, clinicians, and medical experts from all over the world have sought to predict, detect, and diagnose these diseases using both conventional and AI-based approaches, which are described in the following subsections.

Conventional Methods to Diagnose Airway Diseases

Small airways account for between 10 and 25% of total airway resistance in healthy lungs, and their contribution increases significantly in multiple airway disorders. This is because the internal diameter of the small airways is less than 2 mm and they contain no cartilage [5]. Table 1 lists a few traditional techniques used for diagnosing airway disorders in patients, namely spirometry, body plethysmography, impulse oscillometry, and washout tests, along with a brief description and the drawbacks of each.

Table 1

Traditional approaches to diagnose airway diseases

Technique	Description	Drawbacks
Spirometry [7]	Spirometry is the most common pulmonary function test. It assesses lung function, specifically the amount of air that can be inhaled and exhaled and the rate at which this can be done	The patient may feel dizzy and short of breath for a moment after the test
Body plethysmography [8]	A pulmonary function test that evaluates how much air is in the lungs after a deep breath. It also determines how much air is left in the lungs after exhaling as much as possible	Technically demanding and time consuming
Impulse oscillometry [9]	The impulse oscillation system (IOS) is a newer technique for measuring airway resistance and reactance. It is a form of forced oscillation in which oscillating sound waves of various frequencies, typically 5 and 20 Hz, are conveyed along the bronchial tree	The impulse used in the test is forceful, which causes a slight change in lung mechanics
Washout tests [10]	Nitrogen washout (also known as Fowler's method) is a test that measures anatomic dead space in the lungs and other factors related to airway closure throughout a breathing cycle	It is time consuming and fails to assess poorly ventilated lung areas


AI Techniques to Predict Airway Diseases

Artificial intelligence has shown tremendous growth in the health sector through its various applications, as shown in Fig. 2. Given the drawbacks of traditional approaches to diagnosing airway diseases, artificial intelligence has demonstrated efficiency and excellent performance in automatic image categorization, using multiple machine and deep learning algorithms to detect numerous airway illnesses [10]. Researchers from various disciplines are steadily amassing evidence to support the use of AI in diagnosing airway illness with models that can learn and make decisions from massive input data [11]. The fundamental rationale for using deep learning models is that, unlike classical machine learning, they learn by constructing a more abstract representation of the input, so the model extracts information automatically and produces more accurate results [12].

Fig. 2

Role of AI in healthcare

Researchers have applied various machine and deep learning algorithms, as discussed in Sect. 5, to detect multiple airway diseases. A corresponding spike has been seen in the medical applications of AI, particularly in pulmonology. Applications of AI to the global problem of airway illnesses can meet the highest-priority requirements for accurate detection, diagnosis, and treatment [13]. In one study, a deep learning system produced results nearly equal to those of a group of thoracic radiologists in identifying fibrotic lung disease [14]. Another study [15] found that a neural network developed by Google scientists was as good as radiologists at detecting cancerous lung nodules. A similar model [16] could identify and predict acute respiratory illness episodes and death in smokers, along with chronic obstructive pulmonary disease (COPD). Large amounts of well-structured data are required to create and validate AI algorithms, and the algorithms must operate with data of varying quality [17]. Clinicians must understand how AI works in the context of disorders such as asthma and chronic obstructive pulmonary disease. It will therefore be interesting to see what benefits AI brings to doctors and patients as its use in medical practice grows [18]. Other AI tools, such as robotics, natural language processing, and expert systems, are also widely used in medical science. In robotics, more than 200,000 robots are installed annually and are used to perform prostate surgery and head and neck surgery, update patient records, etc.
Likewise, natural language processing (NLP) helps to analyze, understand, and classify unstructured clinical documentation and the interactions of patients with doctors via bots [19]. Thus, to fully understand the role of artificial intelligence techniques in predicting and diagnosing airway diseases, a systematic literature review (SLR) has been conducted on the AI learning models used to predict such illnesses. The study is structured as follows. Section 1 gives a brief description of airway diseases and their global impact, along with the traditional approaches and the role of AI in diagnosing airway diseases. Section 2 presents the materials and methods used to select the papers for the study, while Sect. 3 presents the framework to predict airway diseases using AI-based techniques. Section 4 compares the various methods in terms of their datasets, the applied algorithms and their results on metrics such as precision, accuracy, and F1-score, and their limitations, for airway diseases such as mesothelioma, cystic fibrosis, emphysema, pulmonary edema, pneumoconiosis, lung cancer, pulmonary embolism, tuberculosis, asthma, and covid-19. The discussion, which answers the research questions from Sect. 2, is presented in Sect. 5. Finally, Sect. 6 concludes the study, which assists researchers in determining the optimal technique for detecting these disorders, and outlines the future scope.

Materials and Methods

Full-text English-language archives of six publication databases were searched for the period between 2010 and 2022: ScienceDirect (https://www.sciencedirect.com), Google Scholar (https://scholar.google.co.in), Scopus (https://www.scopus.com), Web of Science (http://isiwebofknowledge.com), PubMed (http://www.ncbi.nlm.nih.gov/pubmed), and EMBASE (https://www.embase.com). The papers were searched using the keywords "airway diseases", "cystic fibrosis", "pneumoconiosis", "pulmonary embolism", "pulmonary edema", "Mesothelioma", "lung cancer", "tuberculosis", "asthma", "covid-19", "emphysema", "machine learning", "artificial intelligence", and "deep learning", as well as combinations of these keywords. In addition, the articles were chosen based on inclusion and exclusion parameters (Table 2), which cover the time period, the scholarly sources from which the paper can be accessed, the research problem addressed, a comparator for analyzing and comparing the researchers' work, the methodologies and strategies used in the articles, and the research design used to analyze the results.

Table 2

Inclusion standards and exclusion standards

S. no.	Attributes	Inclusion standards	Exclusion standards
1	Duration	Research work that had been carried out between 2010 and 2022	Published articles before 2010
2	Exploration	Research work concentrating on (a) the findings, (b) the benchmark dataset, and (c) the research goal	Research work that focuses on diseases other than airway diseases
3	Comparability	Research studies that aim at the prediction of airway diseases	Research studies that address diseases other than airway diseases
4	Techniques	Research articles that mostly focus on machine and deep learning methods, including a few traditional ones	Research articles that apply methods other than machine and deep learning models
5	Research design	Original articles that include experimental results	Case studies, articles in languages other than English, patents

In this systematic literature review, the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines have been applied, and four phases have been used to select the research papers (Fig. 3): Identification, in which records are identified by accessing various repositories; Screening, in which papers are selected transparently by examining the decisions made at different stages of the systematic review; Eligibility, in which all full-length articles are evaluated [20]; and Included, in which the final selected articles are used to write the review paper. PRISMA was chosen because it helps improve the reporting and transparency of systematic reviews and meta-analyses. It also helps readers understand how the authors filtered the selected papers by keywords, year of publication, language, etc.

Fig. 3

PRISMA flow chart

To better understand the state of research on machine and deep learning in airway disease detection, peer-reviewed papers were examined; they indicate that these algorithms have played a vital role in predicting such disorders. In addition, the following research questions were framed and investigated in the study: RQ 1: Year-wise analysis of predicting multiple airway diseases using AI-based techniques. RQ 2: How do deep and machine learning techniques help doctors detect airway diseases? RQ 3: Which ML and DL techniques are broadly applied to predict airway diseases? RQ 4: Which characteristics influence the quality of prediction models based on deep and machine learning?

Framework to Predict Multiple Airway Diseases

This section describes the phases used to predict and classify airway diseases, which are shown diagrammatically in Fig. 4.

Fig. 4

Predicting airway diseases using multiple learning models

Dataset: The first step is to gather images from various repositories or datasets so that the system can learn to classify them. It is essential to feed the system many images for better classification. The data for predicting airway diseases such as pulmonary embolism, emphysema, tuberculosis, pulmonary edema, pneumoconiosis, cystic fibrosis, asthma, covid-19, and lung cancer has been collected from multiple sources, including X-ray, CT (computed tomography) scans, and histopathology images. Table 3 describes the datasets for the respective airway diseases.

Table 3

Dataset of multiple airway diseases

References	Diseases	Dataset name	Description	URLs
[21]	Cystic fibrosis (CF)	Cystic fibrosis data	The dataset contains one file and 25 columns	https://www.kaggle.com/ukveteran/cystic-fibrosis-data
[22]		Cystic fibrosis registry in the United Kingdom	The dataset contains the geographical information about the cystic fibrosis patients	https://www.cysticfibrosis.org.uk/the-work-we-do/uk-cf-registry
[23]		European cystic fibrosis registry	The database includes data from more than 49,000 CF people from 38 countries, and from 2008 to 2018	https://www.ecfs.eu/projects/ecfs-patient-registry/intro
[24]		Cystic fibrosis dataset	BioGPS has 10 datasets of cystic fibrosis on 5 species	http://biogps.org/dataset/tag/cystic%20fibrosis/
[25]		Cystic fibrosis data and statistics	State of residency, weight, height, sexuality, race, respiratory function test results, pancreatic enzyme usage, duration of hospitalizations, home IVs, and CF-related problems are among the information collected	https://www.health.ny.gov/statistics/diseases/chronic/cystic_fibrosis/
[26]	Pulmonary embolism	RSNA STR (pulmonary embolism detection)	The dataset includes training and testing images, each with 17 data fields related to PE	https://www.kaggle.com/c/rsna-str-pulmonary-embolism-detection
[27]		Pulmonary embolism in CT images	The dataset comprises CT scan images taken from 35 different individuals	https://www.kaggle.com/andrewmvd/pulmonary-embolism-in-ct-images
[28]		Pulmonary embolism_codelist	The ICD-10 codes for pulmonary embolism diagnosis are included in the dataset	https://datacompass.lshtm.ac.uk/id/eprint/734/
[29]		Chinese Clinical Trials Registry Center	The dataset, which includes both APE and non-APE, was chosen retrospectively based on medical diagnosis	http://www.chictr.org/en/
[30]		FUMPE dataset	The dataset covers 35 patients with pulmonary embolism, aged 24 to 82	https://figshare.com/authors/Moj-taba_Masoudi/5215238
[31]	Asthma	Informatica	The data set covered all visits by the asthma patient cohort within Intermountain Healthcare between 2005 and 2018	https://www.informatica.com/in/about-us/customers/customer-success-stories/intermountain-healthcare.html
[32]		Taiwan National Health Insurance Research Database	The dataset includes 1,000,000 samples drawn randomly from the Registry of Beneficiaries (ID) in 2010, which contained around 27.38 million people	https://nhird.nhri.org.tw/en/
[33]		Data.world	A repository containing 10 asthma datasets	https://data.world/datasets/asthma
[34]		CHIS open data	This dataset contains the estimated percentage of Californians with asthma (asthma prevalence)	https://data.chhs.ca.gov/dataset/01f456c3-db34-44f2-a52c-6811bef8ba6d/resource/13d6472c-1b35-4d4e-9c66-e643941de459/download/current-asthma-prev
[35]		Asthma dataset	BioGPS has 24 datasets of asthma on 5 species	http://biogps.org/dataset/tag/asthma/
[36]	Lung Cancer	LIDC-IDRI	The dataset has computed tomography (CT) scans of lung cancer having marked-up annotated lesions	https://wiki.cancerimagingarchive.net/display/Public/LIDC-IDRI
[37]		Lung cancer dataset	The data described pathological lung cancers in its 3 types	https://archive.ics.uci.edu/ml/datasets/lung+cancer
[38]		Lung cancer dataset	The dataset contains information about the diagnosis of each lung cancer recorded in the trial	https://cdas.cancer.gov/datasets/nlst/
[39]		Chest X-ray 8	The dataset comprises 108,948 frontal-view X-ray images of 32,717 patients	https://paperswithcode.com/dataset/chestx-ray8
[40]		Dataset for Lung cancer diagnosis	The dataset contains CT and PET-CT DICOM images of lung cancer, along with XML annotation files that locate each tumor with bounding boxes	https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=70224216
[41]	Covid-19	Novel coronavirus 2019 dataset	This dataset contains daily statistics on the number of patients impacted, fatalities, and recovery from the 2019 new coronavirus	https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset
[42]		Covid-19 dataset	The repository contains the data of COVID-2019 from January 22, 2020, to April 1, 2020,	https://coronavirus.jhu.edu/map.html
[43]		Covid-chest X-ray dataset	The details of the patients who are positive or suspected of Covid19 are in this public open dataset	https://github.com/ieee8023/covid-chestxray-dataset
[44]		Covid-19 radiography database	There are 3616 COVID-19 positive instances in the database, as well as 10,192 normal, 6012 non-COVID lung infection, and 1345 viral pneumonia photos	https://www.kaggle.com/tawsifurrahman/covid19-radiography-database
[45]	Tuberculosis	Shenzhen dataset	662 frontal CXR images are available in the dataset in which 335 are TB positive and 327 TB negative	https://www.kaggle.com/raddar/tuberculosis-chest-xrays-shenzhen
[46]		Tuberculosis Chest X-ray dataset	The database contains 700 Tuberculosis (TB) positive chest X-ray images as well as 2800 Normal images	https://www.kaggle.com/tawsifurrahman/tuberculosis-tb-chest-xray-dataset
[47]		Montgomery County X-ray Set	This collection has 138 posterior-anterior X-rays, of which 80 are normal and 58 are abnormal, showing signs of TB	https://lhncbc.nlm.nih.gov/LHC-publications/pubs/TuberculosisChestXrayImageDataSets.html
[48]		ImageCLEF tuberculosis	The dataset contains the CT images of TB affected patients	https://www.imageclef.org/2018/tuberculosis
[49]		Tuberculosis datasets	BioGPS has 16 datasets of tuberculosis on 5 species	http://biogps.org/dataset/tag/tuberculosis/
[50]	Emphysema	Exasens dataset	This repository presents a new dataset for the categorization of four types of respiratory diseases: COPD, bronchitis, infections, and Healthy Controls (HC)	https://archive.ics.uci.edu/ml/datasets/Exasens
[51]		COPD dataset	BioGPS has 10 datasets of emphysema on 5 species	http://biogps.org/dataset/tag/copd/
[52]		COPD Gene	The data identifies phenotypes on chest CT in COPD patients, such as emphysema, air trapping, and airway clot formation	http://www.copdgene.org/study-design
[53]		Danish Lung Cancer Screening Trial	In 4104 smokers and former smokers between the ages of 50 and 70, the dataset includes yearly CT screening for lung cancer compared to no screening	https://clinicaltrials.gov/ct2/show/NCT00496977
[54]		Computed tomography emphysema database	115 high-resolution CT (HRCT) scans and 168 square areas are carefully labeled in a sample of the slices to make up the database	https://lauge-soerensen.github.io/emphysema-database/
[55]	Mesothelioma	Mesothelioma disease dataset	The dataset contains three hundred and twenty-four patients with Mesothelioma. All of the samples in the collection have 34 characteristics	https://archive.ics.uci.edu/ml/datasets/Mesothelioma%C3%A2%E2%82%AC%E2%84%A2s+disease+data+set+
[56]		BioGPS	BioGPS has 1 dataset of mesothelioma on 5 species	http://biogps.org/dataset/tag/malignant%20mesothelioma/
[57]		Mesobank	MesobanK collects fresh Mesothelioma tumour samples, which are stored for 24 h in RNAlater and then frozen at −80 degrees	https://www.mesobank.com/samples-and-research/?lang=en
[58]		ICCR mesothelioma	The dataset is made up of parts that include Mesothelioma histology, clinical care, grading, and mortality	http://www.iccr-cancer.org/datasets/published-datasets/thorax/mesothelioma
[59]		Mesothelioma disease dataset	The dataset has RNA-Seq data for Mesothelioma	https://ega-archive.org/datasets/EGAD00001001915
[60]	Pulmonary edema	MIMIC-CXR database	The collection includes 377,110 images from 227,835 radiographic studies performed at Boston's Beth Israel Deaconess Medical Center	https://physionet.org/content/mimic-cxr-pe-severity/1.0.1/
[61]		MIMIC-CXR dataset	The dataset comprises 473,064 chest X-ray images and 206,574 clinical records from 63,478 edema patients	https://lcp.mit.edu/mimic
[62]		Chest X-ray dataset	This public collection contains 112,120 anonymized frontal-view CXR images obtained from 30,805 patients	https://clinicalcenter.nih.gov/
[63]		MIMIC-CXR dataset	A large, publicly available dataset of labelled chest radiographs	https://sciwheel.com/work/signin?targetUrl=%2Fwork%2F%23%2Fitems%2F7368080
[64]	Pneumoconiosis	ChestX-ray8 database	This database includes CXR classifications depending on the presence or exclusion of 14 radiological abnormalities	https://nihcc.app.box.com/v/ChestXrayNIHCC
[65]		Pneumoconiosis radiograph dataset	The data is collected from Chongqing CDC's electronic health records, which is a complete image dataset	https://cloud.tsinghua.edu.cn/f/d8324c2 5dbb744b183df/
[66]		Chest X-ray dataset	The dataset has images and diagnostic labels associated with it	https://www.cdc.gov/niosh/index.htm
[67]		Chest X-ray dataset	There are posterior-anterior (PA) radiography images in the collection, some of which are totally digital and others which are digitized films	https://www.ilo.org/safework/info/WCMS_108548/lang--en/index.htm
[68]		Pneumoconiosis dataset	The dataset contains the chest-X ray information	https://github.com/liyu10000/pneumoconiosis

Pre-processing: After the images have been collected from the various repositories, they must be pre-processed before the system is trained on them. Images may be blurred or noisy, or their features may not be visible, and training the system on such images can reduce accuracy and generate wrong outputs, which would put people's health at risk. Images can be pre-processed with techniques such as Contrast Limited Adaptive Histogram Equalization (CLAHE), which increases image contrast, geometric transformations, and image filtering [69]. Data pre-processing also includes data augmentation, which increases the dataset size without collecting new data and reduces overfitting. Data normalization brings multiple images into a common statistical distribution in terms of size and pixel values by rescaling the range of pixel intensity values [70]. Resizing is a critical pre-processing step because neural networks expect inputs of the same size, so all images must be resized to a fixed size before being passed to the convolutional neural network. Feature extraction can then be performed on the training images so that the extracted features feed the learning models used to identify or predict the class of any new image. This process produces altered or modified images that are used in the training phase [71]. Learning models: Modern systems are deemed artificially intelligent when they use machine and deep learning methods, which allow the computer to learn tasks from an ever-changing dataset. Thanks to recent breakthroughs in learning algorithms and processing speed, deep learning models have become feasible for many prediction problems [72].
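
As a rough illustration of the pre-processing phase, the sketch below applies resizing, intensity normalization, and a simple flip augmentation to a toy image. It is a minimal example under stated assumptions, not the pipeline of any reviewed study: plain Python lists stand in for image arrays, nearest-neighbour sampling stands in for whatever interpolation a real pipeline uses, and the function names and the 4 x 4 target size are hypothetical. CLAHE and filtering are omitted.

```python
def min_max_normalize(image):
    """Rescale pixel intensities of a 2-D image (list of rows) to [0, 1]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in image]
    return [[(p - lo) / (hi - lo) for p in row] for row in image]


def resize_nearest(image, out_h, out_w):
    """Resize a 2-D image to (out_h, out_w) with nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def flip_horizontal(image):
    """Simple augmentation: mirror the image left to right."""
    return [row[::-1] for row in image]


def preprocess(image, size=(4, 4)):
    """Resize to a fixed shape, then normalize intensities."""
    return min_max_normalize(resize_nearest(image, *size))


# Toy 2 x 3 "image" with 8-bit intensities (hypothetical data).
raw = [[0, 128, 255],
       [64, 32, 16]]

# Augmentation doubles the number of samples without collecting new data.
samples = [raw, flip_horizontal(raw)]
batch = [preprocess(img) for img in samples]
```

Every image in `batch` ends up with the same fixed shape and an intensity range of [0, 1], which is the property a convolutional network needs from its inputs. In practice a library such as NumPy or OpenCV would replace the list operations.
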
In the learning-model phase, machine learning algorithms, deep learning algorithms, transfer learning, and ensemble learning models can be selected based on factors such as the size of the dataset and the complexity of the data. The CNN (convolutional neural network), which excels at finding patterns in images, is well suited to classification and many other image-related tasks. CNNs, like neural networks in the biological brain, are made up of neurons with trainable weights and biases that receive various inputs [73]. Each neuron computes the weighted sum of its inputs and passes it through an activation function, which produces the neuron's output. As far as transfer learning is concerned, pre-trained models such as VGG16 (Visual Geometry Group), VGG19, MobileNetV2, and ResNet50 (Residual Neural Network) are frequently trained on massive datasets that serve as standard benchmarks in computer vision. These models can be used directly to make predictions on new tasks or integrated into the training process of a new model. Moreover, transfer and ensemble techniques are used to minimize training time, enhance classification accuracy, and prevent modeling errors [74]. Classification: In this phase, the trained model determines which class an image belongs to, such as emphysema, cystic fibrosis, pneumoconiosis, pulmonary embolism, pulmonary edema, asthma, mesothelioma, tuberculosis, covid-19, and lung cancer. If the image does not correspond to any of these disorders, the model should indicate that it is a normal lung image. To verify the system's performance, the model should be evaluated using metrics such as accuracy, loss, precision, recall, and F1 score (Table 4) [75-79].
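
The neuron computation and the final class decision described above can be sketched in a few lines. This is a minimal illustration, not a CNN: the sigmoid activation, the softmax step used to turn scores into class probabilities, and all weights, scores, and class labels are assumptions chosen for the example and do not come from the reviewed studies.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical inputs, weights, and bias for a single neuron.
output = neuron(inputs=[0.5, -1.0, 2.0], weights=[0.4, 0.3, -0.2], bias=0.1)

# Hypothetical final-layer scores for three illustrative classes.
labels = ["normal", "tuberculosis", "covid-19"]
probs = softmax([2.0, 0.5, 0.1])
predicted = labels[probs.index(max(probs))]
```

In a trained network the weights and biases are learned from data rather than set by hand, and the predicted class is the label with the highest probability.
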

Table 4

Evaluation metrics to test system performance

Parameters	Symbols	Formulae
Accuracy	Acc	$\frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{False Positive} + \text{True Negative} + \text{False Negative}}$
Loss	Loss	$\frac{\sum (\text{Actual Value} - \text{Predicted Value})^{2}}{\text{Number of observations}}$
Area under the curve	AUC	$\lim_{x \to \infty} \sum_{i=1}^{n} f(x)$
Sensitivity	St	$\frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$
Specificity	Sp	$\frac{\text{True Negative}}{\text{True Negative} + \text{False Positive}}$
Recall	Re	$\frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$
Precision	Pr	$\frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}}$
F1 Score	F1	$\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
Root mean square error	RMSE	$\sqrt{\sum_{i=1}^{n} \frac{(\hat{y}_{i} - y_{i})^{2}}{n}}$
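
The confusion-matrix metrics can be computed directly from the four counts. The sketch below uses the standard definitions, in which sensitivity equals recall, TP / (TP + FN), and specificity is TN / (TN + FP). The function names and the example counts are hypothetical, and the code assumes every denominator is non-zero.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (non-zero denominators assumed)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # identical to sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
        "recall": recall,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }


def rmse(y_true, y_pred):
    """Root mean square error between actual and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical counts: 80 true positives, 10 false positives,
# 95 true negatives, 15 false negatives.
m = classification_metrics(tp=80, fp=10, tn=95, fn=15)

# Hypothetical regression targets and predictions.
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Reporting sensitivity and specificity together matters on imbalanced medical datasets, where a high accuracy can hide a model that misses most positive cases.
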


Background

The work done by researchers to predict the numerous airway diseases is discussed in four subsections: pulmonary edema, pulmonary embolism, and covid-19; mesothelioma and lung cancer; asthma, tuberculosis, and cystic fibrosis; and emphysema and pneumoconiosis. To support comparative analysis, the goal of each study is first briefly described. The datasets, procedures, outcomes, findings, and limitations are then given in tabular form in Tables 5, 6, 7, and 8, followed by the overall analysis.

Table 5

Comparative analysis of the techniques used to predict pulmonary edema, pulmonary embolism, covid-19

References	Diseases	Dataset	Techniques	Outcome	Findings	Challenges/remarks
[79]	Pulmonary edema	Indiana chest X-ray dataset, JSRT dataset, Shenzen dataset	Alex Net, VGG16, VGG19, ResNet50, Adam Optimizer, ResNet101, ResNet152	Accuracy = 52% AUC = 0.94 Sensitivity = 96% Specificity = 96%	The authors found that their network could localize the abnormalities such as Pulmonary edema successfully	The system generated degraded results when the rule based features were concatenated with the features extracted from DCN
[80]		330,000 chest X-ray images collected from Beth Israel Deaconess Medical Center	EM (electron microscopy), DGM(Deep Galerkin Method)	RMSE = 0.66 Pearson CC (correlation coefficient) = 0.52	Their findings promised to help physicians provide better treatment by allowing them to quantify the degree of pulmonary edema using chest X-ray pictures	Limited ground truth labels had been used for medical image analysis
[82]		in situ biochemical investigation dataset	Principal Component Analysis, Random Forest	Sensitivity = 97.3% Accuracy = 96.5% Specificity = 95.5%	The findings showed that FTIR microspectroscopy in conjunction with chemometrics might be a useful tool for recognizing pulmonary edema	The model should be able to determine the contaminative effects of breakdown components in FTIR (Fourier transform infrared) spectroscopy measurements
[83]		MIMIC-CXR dataset	DenseNet, random minority oversampling, ResNet50	AUC = 79.1%	The authors found DL as a capable approach to estimate pulmonary edema from chest radiographs	Class imbalance
[84]		40 normal images taken from Show Chwan Memorial Hospital, Changhua, Taiwan	Support vector machine, Gabor filter	AUC (area under curve) = 0.999	The study was able to distinguish between normal and pulmonary edema lung images	The algorithms had been applied only on limited dataset which needs to be improved
[85]		CXR images from NIH clinical centre	CNN DenseNet model, Adam optimizer, K-fold cross validation	AUROC = 0.9164 Sensitivity = 71.493% Specificity = 10.011%	The design was able to distinguish precise information with localisation in the form of a heat map, making it simpler to discover and characterise pathologic anomalies associated with acute pulmonary edema	The system’s performance needed to be improved to enhance the evaluation values
[86]		1.5 M frontal (PA) CXR studies obtained from adults over 18 years of age	RadBot-CXR(chest X-ray)	AUC = 93.6%	The authors reached a level of automatic interpretation of chest X-rays for pulmonary edema that was comparable to that of an expert	The architecture needed to address the localization of disease identified on the radiographic image
[87]	Pulmonary embolism (PE)	CT scan Dataset	Deep Learning, Image Classification, Long Short Term Model	IoU (Intersection over union) threshold = 50% Accuracy = 91% Precision = 68%	The model provides a quick fix by combining classification and object detection algorithms to increase the performance of pulmonary embolism detection	The authors used CTPA scans from a single CT system (software and hardware); because scans vary from one system to another, accuracy decreased
[88]		2800 CTPA-Scan (CT pulmonary angiogram) Dataset	Linear Regression	Sensitivity = 93% Specificity = 95.5% F1 score = 86%	The authors found that the AI prototype algorithm had a greater degree of diagnostic performance for detecting PE on CTPAs	Testing for pulmonary embolism that might influence performance measurements were incorrectly categorised by the system
[98]		85 CTA lung images collected from November 2016 to May 2017 at the Wuhan Central Hospital	Generational Advanced Network	Sensitivity = 90.9% Susceptibility = 92%	The suggested computer-aided diagnostic technique successfully increased the diagnosis rate of PE	The algorithm had a significant probability of misdetection, indicating that more in-depth training datasets were required for embolism detection below grade 3
[99]		DICOM images of CTPA taken since January 2019	Deep Learning algorithm	Sensitivity = 79.6% Specificity = 95.0%	The model assessed the impact of DL-based PE identification on patient care parameters	Despite their excellent performance, the authors were unable to produce meaningful effects on medical performance metrics
[89]		590 patients (460 with APE and 130 without APE)	Deep Learning, Convolution Neural Network, U-Net	Sensitivity = 94.6% Specificity = 76.5%	The AUC of DL-CNN for the identification of pulmonary embolism was high, and it may help doctors to minimize their strain	The DL-CNN model could not be trained due to a lack of big training sets
[100]		Stanford dataset	Convolution Neural Network, PENet	AUROC = 84% Accuracy = 81% Specificity = 82% Sensitivity = 75%	The study addressed the hard part of diagnosing pulmonary embolism without the need for time- and cost-consuming methods	Failed to recognize the other pathogens of pulmonary embolism
[90]		1427 people of an Italian National Hospital “Ospedali Riuniti di Ancona”	Machine Learning, Artificial Network, Q Analysis	Accuracy = 86%	The proposed framework was able to study partial as well as incomplete data of pulmonary embolism	The S[B]-paradigm had to be used to characterise final negative and positive diagnoses in the system
[91]	Covid-19	Cohen’s dataset	CNN, HOG (Histogram of Oriented Gradients)	Accuracy = 92.95% Recall = 85% Specificity = 82% Precision = 91.5%	The suggested CNN approach has a high detection rate and is quick and easy to use	The study analyzed the resilience of the systems by reacting to real-world circumstances using restricted datasets from multiple sources
[92]		200 chest X-ray and 180 Covid-19 images	SVM, CNN model, ResNet50	Accuracy: 92%	Local descriptors were improved by deep learning algorithms. Deep features as well as the SVM classification algorithm, in particular, outperformed the other techniques	The study should incorporate different imagistic patterns of Covid 19
[93]		Data collected from Jan 22, 2020 to Apr 1 2020 at Johns Hopkins University	Support Vector Machine, Deep Neural Network, Long Short Term Memory, Polynomial Regression	RMSE Confirmed: 455.92 Death: 117.94 Recovered: 809.71	In anticipating the COVID-19 transmission, the findings demonstrated that polynomial regression (PR) produced the lowest root mean square error (RMSE) score when compared to other methodologies	The study should work on more algorithms to enhance the RMSE score
[94]		OSR dataset of 1624 patients	Logistic regression, Naïve Bayes, KNN( K nearest neighbour), Random forest, SVM	Accuracy = 88% Sensitivity = 89% Specificity = 91% AUC = 90%	The ML algorithms provided in their research performed similarly to, but not as well as, RT-PCR for COVID-19 diagnosis	The technique only worked for those people who are covid negative. For positive patients, the model failed to detect covid in them
[95]		Real world data of 337 patient images	nConv net, Deep learning	Accuracy: 97.6%	This approach might aid hospital administrators and medical professionals in taking the essential actions to handle COVID-19 patients following their rapid diagnosis	System worked on the small dataset
[96]		Dataset collected from Joseph Paul Cohen and Paul Morrison Lan Dao	Manta Ray Foraging Optimization, Fractional Multichannel Exponent Moments	Accuracy: 98.09%	By picking the most important traits, the suggested technique was capable of achieving both high efficiency and low resource usage	The system dealt with resource limitations and high CPU time
[97]		1065 CT pathogenic images	CNN, GraphNet	Accuracy: 89.5% Specificity: 0.88 Sensitivity: 0.87	The findings showed that rapid and accurate diagnosis of COVID-19 was achieved	The performance of deep learning models was hampered by the signal-to-noise ratio

Table 6

Comparative analysis of the techniques used to predict Mesothelioma and lung cancer

References	Diseases	Dataset	Techniques	Outcome	Findings	Challenges/remarks
[101]	Mesothelioma	Data collected from the Diyarbakir district of southeast Turkey (324 patients)	Apriori method, recursive feature elimination method	Lift = 1.0–1.6 Support = 0.5–1.0 Confidence = 0.5–1.0	This study came to some important results on MM prognostic variables. Their findings revealed that histopathological variables have a role in the development of MM	Large dataset would escalate the execution time
[102]		UCI repository dataset	Apriori Algorithm	Support 75% confidence 90%	Asbestos exposure, its length, and duration of symptoms all had a significant influence on the frequency of MM, according to the findings	To extract the rules, the model did not examine the various association mining techniques
[103]		Dicle University, Turkey	Random forest, Clojure classifier (CC), decision tree model, kernel logistic regression (KLR)	Accuracy = 71.29%	The findings from biopsy as well as radiological testing are good predictors of mesothelioma, according to the authors	Overfitting was a problem with decision tree models, particularly random forest, which lowered the system's accuracy
[104]		UCI repository	Apriori Algorithm, Association Rule Mining, Data Normalization	Confidence Support = 75%	The data showed that the length of symptoms, together with exposure to asbestos, has had a significant impact on the MM (malignant Mesothelioma) rate	If memory usage for the quantity of transitions was limited, the Apriori algorithm revealed an issue of incompetence
[105]		Dataset collected from UCI machine learning repository	SMOTE, ADASYN, Artificial neural network, principal component analysis	Accuracy = 96%	Their study highlighted the significant input features of malignant Mesothelioma problem	Only few factors related to Mesothelioma were studied
[107]	Lung Cancer	LUNA 16 and Kaggle Data Science Bowl (KDSB)	DFD-Net, denoising model, Convolution Neural Network	Accuracy = 87.8% Specificity = 89.1%	The method considerably enhanced the performance of model	The method addressed the issue of class imbalance in the information while using Kaggle dataset
[108]		Computed Tomography Scan Images	Fuzzy Particle Swarm Optimization (FPSO), Convolution Neural Network	Accuracy = 94.97% St = 96.68% Sp = 95.89%	This study used an input lung picture to detect malignant lung nodules and to categorise lung cancer and its degree	Incorrect classification of cancer as benign or malignant
[109]		Hubei Taihe Hospital (110 patients)	Naïve Bayes Classifier, Fast Correlation Based Filter	AUC = 98.9% Sensitivity = 98.1% Specificity = 100%	The ability of metabolic indicators to identify lung tumours early had been proven	Small dataset used to carry out the research
[110]		Chest X-ray dataset	Convolution Neural Network	Acc = 84.02% Specificity = 85.34% Sensitivity = 82.71%	The proposed training strategy outperformed the existing transfer learning method	Accuracy needed to be improved by working on its features
[111]		Data collected from cancer imaging archive (CIA) dataset	Ensemble classifier	Accuracy = 97.6% Precision = 97.2% Recall = 97.5% F1 score = 97.5%	The system recognized the cancer with maximum accuracy	-
[113]		LIDC-IDRI dataset	Convolution neural network	AUC = 0.967	The system gave a summary of the most common methods for nodule categorization as well as lung cancer prediction using CT imaging data	The author mentioned that the system efficiency for the training and testing data sets employed should be considered

Table 7

Comparative analysis of the techniques used to predict asthma, tuberculosis, cystic fibrosis

References	Diseases	Dataset	Techniques	Outcome	Findings	Challenges/remarks
[114]	Asthma	Asthma dataset	Decision Tree, Naive Bayes, K Nearest Neighbour	Accuracy = 96.52%	The suggested model provided a low-cost, user-friendly instrument for detecting and classifying asthma in its early stages	Limited dataset used for testing
[115]		Clinical data of 2870 patients	Feature based time series classification	Recall = 87% Accuracy = 92% Precision = 89% F measure = 88% Specificity = 94% AUC = 87%	The researchers looked at how time-series data dynamics as well as temporal sequences affected daily asthma symptoms	The models should be trained with a large sample of data
[116]		Data of 2010 patients	Principal component Analysis, Naïve Bayes	AUC = 85% Sensitivity = 90% Specificity = 83%	The authors had shown the capability to progress the early detection of asthma exacerbations when compared to traditional paper-based action plans	The data was collected manually which could have been prone to inaccuracies
[117]		Clinical data of 1225 patients	Normalization, Orthogonal array, Mahalanobis distance	Accuracy = 94.15%	The study developed an asthma detection algorithm that accurately detects illness	The study did not include a precise asthma risk score or a list of reference doctors
[118]	Tuberculosis	Data collected from the Shenzhen (China) dataset and the Montgomery County dataset	CLAHE method, Deep convolution neural network, UNet architecture	Accuracy = 97.1% Specificity = 96.2% Sensitivity = 97.9%	The model provided inexpensive, easily accessible, highly accurate solutions for low-income countries that are suffering from TB	Even though the model achieved good sensitivity and specificity, it is not easy to compare model performance to human performance because it has not been tested in the field yet
[119]		microscopic images of 22 sputum smear	CNN	Recall = 97.1% Precision = 78% F-score = 86.7%	The model helped the physicians to detect the disease in a small amount of time to improve the clinical outcome	–
[120]		100 TB CT scan images	ResNet	Accuracy = 85.2%	This research was expected to contribute substantially to the discipline and to encourage the use of ML approaches in the medical area	Traditional 3D CNN architectures tend to be less successful due to the limitation of the datasets as well as the features of comparable aberrant patterns of TB among the five severity levels
[121]		Montgomery and Shenzhen datasets	Ensemble classifier, Canny Edge Detector, Deep Learning	Accuracy = 93.59%, Specificity = 94.87% Sensitivity = 92.31%	The findings show that employing ensemble classifiers trained on a variety of characteristics derived from different types of photos improves detection performance	The model needed to be expanded to categorize chest X-rays on the basis of severity of tuberculosis, and more features needed to be investigated to improve the classifiers' performance
[122]		501 Computed Tomography Scan images	Convolution Neural Network, Deep Learning	Recall = 98.7% Precision = 93.7%	The detection of pulmonary TB was successfully researched in the article, and a quantitative diagnosis report was developed after researching a large number of relevant publications and methodologies	The system failed to correctly classify pulmonary tuberculosis
[123]		Taipei Medical university	RF, ANN	Accuracy = 88.67% Specificity = 90.4% Sensitivity = 80%	The model gave a chance for doctors to conduct preventative actions before ATDH occurred	The models were not trained with large data sample
[124]		Real-time dataset collected in the form of chest X-ray images	Linear regression, KNN, Naïve Bayes, Decision Trees, Random Forest, SVM, MLP	Accuracy = 98.57% Precision = 99.58% Sensitivity = 91.50% Specificity = 99.50%	The authors filled a significant gap in the literature by comparing and contrasting multiple algorithms for predicting tuberculosis	To enhance model hyper-parameter tuning, the authors intended to investigate single-class classification approaches and assess the use of deep learning ensembles
[125]		Montgomery dataset	InceptionNet model	Accuracy = 87.5% AUC = 0.92 Sensitivity = 0.76 Specificity = 0.95	The model offers a quick and accurate method for mass tuberculosis screening in low-resource areas across the world	Limited dataset alone gave insight into their model's intrinsic capabilities
[125]		Shenzhen dataset	InceptionNet model	Accuracy = 91.7% AUC = 0.96 Sensitivity = 0.89 Specificity = 0.93
[126]	Cystic fibrosis	Sixteen clinical data of patients whose age ranged from 19–59 years	HASTE technique (Half Fourier Single Shot Turbo spin-Echo), Lasso method	Recall = 0.68 Precision = 0.016	The researchers were able to define small intestines morpho-functional abnormalities in individuals with cystic fibrosis	Objective indicators of therapeutic responsiveness to the novel CFTR modulators, as well as gastrointestinal outcome assessments, were required
[128]		CT scans of 194 patients	Deep Learning, Cascade Network, binary and multiclass classification	Accuracy = 94%	The findings revealed that the suggested technique may detect abnormalities and assign a grade to each illness in the initial stages of CF respiratory illness	The model needed to work on the detection of cystic fibrosis
[129]		To produce synthetic data, a second order ODE mathematical model of the lung was employed	Support vector machine, Naïve Bayes classifier, logistic regression,	Accuracy = 99%	The model gave real-time assistance to clinicians in making diagnostic decisions	Their work was limited by the fact that all of the training and testing was done with simulated data and that all of the classes used the similar profile
[130]		1000 distinct peaks were extracted from 277 perspiration samples	Gaussian based decision tree model	Accuracy = 98% Recall = 96% Precision = 94% Specificity = 98%	The model looked at the relationship between lipid profiles and moderate and severe CFTR gene mutations	Enrolling additional patients to enhance the sample size of a genetically varied CF community might improve this research in the future

Table 8

Comparative analysis of the techniques used to predict emphysema, pneumoconiosis

References	Diseases	Dataset	Techniques	Outcome	Findings	Challenges/remarks
[131]	Emphysema	Benchmark dataset and manual dataset of 39 and 19 patients respectively	Improved red deer algorithm, Fuzzy C means, Adaptive local ternary pattern	Benchmark Dataset Accuracy = 94.99% F1 Score = 89.45% Sensitivity = 85.66%	Their research looked at how a deep learning technology may be used to diagnose pulmonary emphysema automatically	On the generated picture, the approach was unable to offer localization information
[131]				Manual Dataset Accuracy = 95.56% Sensitivity = 91.6% F1 score = 95.25%
[132]		Data from the Danish Lung Cancer Screening Trial, which included 1990 people	Proportion Net, GAP Net	AUC = 0.96	The model's ability to identify the target was good enough to accurately characterize the geographical distribution of emphysema	With human-level accuracy, the geo-location was excellent enough to categorize the geographical distribution of emphysema
[133]		7143 COPD Gene Participants	Deep Learning, Convolution Neural Network, Cox proportional hazard models	Confidence Interval = 95%	The technology produced a decipherable result to recognize patients who are at higher risk of death	The model might be impacted by the unique CT methodology because it was trained using just COPD Gene data
[134]		126 input recordings taken from respiratory sounds database	CNN, MFCC, Librosa machine learning features, K fold cross validation	Specificity = 0.93 Sensitivity = 0.93 ICBHI score = 0.93	Medical specialists are able to use the algorithm to diagnose COPD using breathing sounds	They can expand its functions in the future to assist physicians in diagnosing numerous other ailments and their severity
[135]		The Danish Lung Cancer Screening Study used 600 low-dose CT images	Multiple instance learning approach	AUC (scan level prediction) = 0.82 AUC (region level prediction) = 0.88	The study gave accurate predictions of emphysema occurrence at both the scan and region levels	The method resulted in a higher ratio of false to true detections in the region, resulting in worse search performance
[136]		From COLIBRI-COPD, 1409 COPD patients were recruited	SMOTE, Random forest	Accuracy = 84% PPV = 87% Sensitivity = 59%	The system was able to detect COPD at all stages of severity	Accuracy needs to be improved
[137]		9,925 CT scans collected from COPD Gene multi-center	Convolutional Neural Network	Pearson co-relation coefficient = 0.940	Biomarkers from images were learned directly by the deep-learning regression architecture and also simplified the development of biomarker extraction algorithms	The suggested method's shortcoming was that it required two-dimensional depth of field of the structures where biomarkers were calculated, which was not practical at the time
[138]		HRCT scans collected from Frederikshavn (Fre) and Aalborg (Aal) datasets	MILES classifier, Cross validation, miSVM-Q classifier	miSVM-Q AUC = 95%	Their research revealed two novel multiple instance based classifiers that can detect emphysema patches in COPD people without any need for manual annotation	The quantity and proportion of the datasets were the study's key limitations
[138]			MILES classifier, Cross validation, miSVM-Q classifier	MILES AUC = 78.8%
[139]	Pneumoconiosis	1881 digital images of chest X-ray	DCNN, Inception V3	AUC = 87.8%	The authors found that a deep learning solution could produce comparatively superior classification performance	Limited dataset and higher time complexity
[65]		Chongqing CDC, China collected image data from August 2016 to June 2017	ResNet 34, DenseNet40, DenseNet64, DenseNet53	Accuracy = 88.6% Precision = 83.3% Recall = 53.6% F1 score = 65.2%	The authors suspected pneumoconiosis in radiographic images	The authors needed to work on the performance of the system to enhance its accuracy
[140]		5424 chest radiographic images	CAD algorithm, McNemar test	Sensitivity = 0.89 to 0.98 Specificity = 0.68 to 0.86 Kappa values = 0.57 to 0.84	The findings of the investigation showed that CAD can significantly increase the performance of pneumoconiosis diagnosis	The research only covered a limited number of patients, and the outcomes were inconsistent
[141]		405 pneumoconiosis patients	Convolution Neural Network	Accuracy = 97.3% Sensitivity = 98.1% Specificity = 97%	In terms of pneumoconiosis grading accuracy, their study beat two categories of radiologists	The model was unable to distinguish pneumoconiosis from those other lung illnesses with comparable pathologies
[142]		Original pulse signal	Support vector machine, RBF kernel function	Precision = 100% Recall = 86.85% F-measure = 92.96% Accuracy = 88.31%	The researchers' proposed technique of detecting coal miners' pulse signals will remind coal workers of the need to check for and prevent illness progression	In the future, the authors would need to optimize the goal function to make the entire system more realistic, taking into account the various consequences of misdiagnosis
[143]		Pneumoconiosis data	Transfer learning, CNN	Specificity = 87% Sensitivity = 95% AUC = 94%	Both transfer learning strategies outperformed training from scratch with insufficient training data, according to the findings of the experiments	It was difficult to satisfy the data needs of deep learning since the chest X-rays available for pneumoconiosis were limited due to sensitive patient data
[144]		Dataset taken from National Institute for Occupational Safety and Health website (NIOSH)	Autoencoder, SVM, CheXNet, Multilayer perceptron	Sensitivity = 93.3% Specificity = 88.46% Accuracy = 90.24%	When training datasets are skewed or lack variety, the cascaded machine learning architecture might be employed in various medical image analysis	Future research included a pilot study in which the approach is tested in a clinical environment using human readers


Role of AI to Predict Pulmonary Edema, Pulmonary Embolism, Covid-19

Medical images are often among the first diagnostic tools since they can disclose pathologic changes that would otherwise go undetected. Still, the absence of publicly available datasets and benchmark studies makes it impossible to compare detection systems and identify the best ones. To address this, Islam et al. [79] employed several datasets to analyze the effectiveness of models on various disorders, including pulmonary edema. To find the anomalies in chest X-rays, the investigators employed trained classifiers. According to Liao et al. [80], one major obstacle in analyzing medical images is the limited availability of ground truth labels. As a result, they designed and tested a semi-supervised learning system for estimating pulmonary edema to aid therapeutic choices in congestive heart failure. To tackle the problem, they created a Bayesian model (Eq. 1) that learns probabilistic feature representations from the complete image collection and uses them to forecast the degree of edema:

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

where $P(c \mid x)$ is the posterior probability, $P(x \mid c)$ is the likelihood, $P(c)$ is the class prior probability, and $P(x)$ is the predictor prior probability. The authors maximized the log probability of the data (Eq. 2) to construct the probabilistic feature representation that is learned from all images for predicting pulmonary edema. Eq. 2 is defined over the model parameters, where N is the total number of images, y is the edema severity label, and x represents a single image.

According to Hong et al. [81], patients with pulmonary edema face consequences for the rest of their lives. Physicians must deal with stress, concern, and inconvenience when treating such patients because they may suffocate if they are not treated. To address these issues, the authors advocated a comprehensive investigation of auditory classification algorithms for the automated identification of excessive lung water that engorges the alveolar beds. They developed a unique approach using recursive feature elimination, logistic regression (Eq. 3), and principal component analysis (Eqs. 4–6) to validate the learned data with supplemented samples from local hospitals:

$$y = \beta_{0} + \beta_{1}x + \varepsilon$$

Here $y$ is the dependent variable, $\beta_{0}$ is the population y-intercept, $\beta_{1}$ is the population slope coefficient, $x$ is an independent variable, and $\varepsilon$ is the random error term. Given a dataset of N centered observations $x_{j}$ in a d-dimensional space, PCA diagonalizes the covariance matrix

$$C = \frac{1}{N}\sum_{j=1}^{N} x_{j}x_{j}^{\top}$$

where C is the covariance matrix. This is solved through the eigenvalue equation

$$\lambda v = Cv$$

Lin et al. [82] investigated the diagnosis and progression of illnesses to integrate bio-fluid-based infrared spectroscopy into the clinical area. The authors looked at using Fourier transform infrared microspectroscopy to detect sudden cardiac death. Assessing the degree of pulmonary edema is complex, and detecting it in chest radiographs allows doctors to make prompt treatment decisions for patients. Using the large-scale clinical MIMIC-CXR (chest X-ray) database, Kumar et al. [83] tested different supervised and semi-supervised deep learning approaches to detect the degree of edema from radiological images. Furthermore, the authors evaluated three ways to alleviate class imbalance during implementation: weighted cross-entropy loss (Eqs. 7–8), class-aware sampling, and random minority oversampling. In Eqs. 7–8, the multiclass cross-entropy loss is computed per window i, p is the probability that window i belongs to class c as predicted by the given model, and the weighted cross-entropy function applies a weight to each component. Kumar et al. [84] reported a texture analysis method to detect pulmonary edema in chest X-rays automatically. In other words, the authors were able to tell the difference between a chest X-ray with indications of pulmonary edema and a normal chest X-ray. According to Hayat et al. [85], quick screening of pulmonary edema patients is required so that radiologists can make a prediction as soon as feasible. 
However, dependence on specialists' knowledge and reasoning impedes the diagnostic process. As a result, the authors created a deep learning-based architectural model to detect the presence of acute pulmonary edema in chest X-ray images. Brestel et al. [86] sought to deliver expert-level information for every chest X-ray image right away. To achieve expert-level automatic interpretation of regular chest X-rays, they used a machine learning approach and discussed the results using a robust approach of clinical validation.

Kiourt et al. [87] used transfer learning methodologies to adapt and analyze some of the most common convolutional neural network designs and obtain decent model accuracy for detecting pulmonary embolism in CT scans. According to Weikert et al. [88], deep convolutional neural networks (DCNNs) exhibited good results in identifying critical abnormalities in CT images, including intracranial haemorrhage, acute brain ischemia, and essential abdominal findings. An end-to-end, fully convolutional network (i.e., U-Net) was created by Liu et al. [89] to segment clots and determine clot volume in CTPA (CT pulmonary angiogram). This study aimed to evaluate U-Net's ability to identify clots in terms of efficiency and accuracy and to compute the acute pulmonary embolism (APE) clot load. Rucco et al. [90] suggested a CAD (computer-aided detection) approach based on the mathematical theory of hyper-networks and Q-analysis, allowing for a dimensional patient dataset description. The findings were utilized for feature selection in the artificial neural network training stage. The authors were able to identify and diagnose pulmonary embolism in that manner.

Because the number of Covid-19 cases rose dramatically in the last year, it has become even more critical to track and identify healthy and infected persons quickly and precisely. Many existing detection approaches are ineffective in detecting viral patterns. 
As a result, Chen [91] developed an excellent classification approach for detecting COVID-19 viral sequences. Ismael et al. [92] employed an AI-based technique that effectively monitors various lung illnesses. For deep feature extraction, the authors used pre-trained CNN models with SVM (support vector machine) classifiers with different kernel functions, such as linear, quadratic, cubic, and Gaussian, for covid-19 classification. Similarly, according to Punn et al. [93], artificial intelligence experts have concentrated their professional knowledge on constructing mathematical models for assessing the epidemic condition utilizing state-wide shared data. As a result, the authors advocated a few AI models to analyze everyday exponential behaviour and to project the future reach of the coronavirus across countries using real-time data. The data science community has developed many machine learning (ML) models to improve Covid-19 diagnostic capabilities. Most are based on computed tomography (CT) images or chest X-rays. Cabitza et al. [94] instead applied various machine learning classifiers to blood-test results, which are generally available in clinical practice. Panwar et al. [95] focused on observational lockdown analysis and said that artificial intelligence technologies were necessary to overcome such a situation. With this in mind, the authors presented a CNN-based algorithm, emphasizing that lockdown is not the only way to combat the covid-19 epidemic. Elaziz et al. [96] demonstrated a technique for accurately classifying covid-19 chest X-ray images. The classification strategy relies on orthogonal moment features and feature selection approaches. The authors created a novel feature selection approach based on several assessment techniques to improve the behavior of manta ray foraging optimization. Several aspects may be used to identify viral infections based on imaging patterns. Wang et al.
[97] hypothesized that CNN might aid in identifying distinctive characteristics that would be difficult to detect using visual recognition alone.
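
As a rough illustration of the weighted cross-entropy loss that Kumar et al. [83] evaluated against class imbalance, the sketch below scales each sample's cross-entropy term by a class weight. It is not the authors' implementation: the function names, the inverse-frequency weighting heuristic, and the choice to average over samples are all assumptions made here for illustration.

```python
import math

def inverse_frequency_weights(labels, num_classes):
    """One common heuristic (assumed here): weight_c = N / (C * count_c)."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return [n / (num_classes * c) if c else 0.0 for c in counts]

def weighted_cross_entropy(probs, labels, class_weights):
    """Average of per-sample cross-entropy terms, each scaled by the weight
    of the sample's true class.

    probs: one list of predicted class probabilities per sample
    labels: integer index of the true class per sample
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # With a one-hot target, the multiclass cross-entropy term
        # reduces to -log of the probability given to the true class.
        loss_i = -math.log(max(p[y], 1e-12))  # clamp to avoid log(0)
        total += class_weights[y] * loss_i
    return total / len(labels)

# Three samples of class 0 and one of class 1: the rare class gets weight 2.0.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels, num_classes=2)
loss = weighted_cross_entropy([[0.5, 0.5]] * 4, labels, weights)
```

With these weights a mistake on the rare class costs three times as much as a mistake on the common class, which is the effect the weighting is meant to achieve.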

Role of AI to Predict Mesothelioma, Lung Cancer

Alam et al. [101] aimed to look for clinical, radiological, and histological variables in malignant Mesothelioma (MM). The authors suggested a novel framework for identifying prognostic indicators utilizing non-invasive and cost-effective methods based on various techniques. According to the authors, their suggested framework would aid medical professionals and healthcare experts in detecting malignant Mesothelioma early and treating it more effectively by including crucial prognostic markers. In their study, Latif et al. [102] looked at the risk factors for malignant Mesothelioma. The authors employed a dataset that included healthy people and Mesothelioma patients, but only Mesothelioma patients were chosen for symptom identification. According to the authors, these findings will aid in managing MM-related co-morbidities such as cardiovascular disease, cancer-related mental distress, diabetes, anemia, and hypothyroidism. Choudhry et al. [103] employed artificial intelligence-based algorithms to offer the best system for early identification and prognosis of Malignant Pleural Mesothelioma (MPM). According to the authors, decision tree models such as random forest have a risk of overfitting; hence, they created a model that can diagnose with or without pricey biopsy data to overcome this flaw. Similarly, Alam et al. [104] concentrated on investigating MM risk variables. The authors included ill and healthy people in their study, resulting in a larger dataset. The dataset has a class imbalance problem, with the number of malignant Mesothelioma patients being much lower than the number of healthy people. Furthermore, the numerical attributes were categorized as nominal attributes, and association rules were created from the dataset. To detect malignant Mesothelioma, Gupta et al. [105] compared numerous machine learning algorithms with different feature sets to address the class imbalance problem.
To achieve this, the authors used three sampling techniques: resampling, synthetic minority oversampling technique (SMOTE), and adaptive synthetic sampling (ADASYN). They also used other dimension reduction approaches, such as the ordinary least square method (OLS), principal component analysis (PCA), and random forest feature selection (RFFS), as well as genetic algorithms, to determine the exact collection of features. MesoNet was developed by Courtiol et al. [106]. It successfully predicted survival rates of Mesothelioma patients using whole-slide digitized images with no need for pathologist-provided locally labeled regions. The researchers also confirmed that the method was more suitable for estimating the patient's survival rate than conventional pathological approaches. Several researchers have proposed several CNN strategies for lung cancer detection. These models, however, could not give the expected detection accuracy. Sori et al. [107] utilized a retraining technique, i.e., multi-phase CNN. The initial training was identical to the usual one. It was based on fine-tuning, which just used the selected parts of the model to describe and learn distinct morphology of lung nodule characteristics for contextual information, as shown in Fig. 5. Much work has gone into developing computer-assisted diagnosis and detection methods to increase the diagnostic quality for lung cancer detection categorization. As a result, Asuntha et al. [108] attempted to identify malignant lung nodules in the input picture and categorize lung cancer according to severity. The authors employed several optimal feature extraction approaches such as local binary pattern (LBP), scale-invariant feature transform (SIFT), and others to extract textural, geometric, and intensity features. Machine learning models improve the model's performance by learning from previous experiences. These models also seek to identify practical factors and their relationships. As a result, Xie et al. 
[109] discovered clinical metabolic markers that demonstrated significant differences between lung tumor patients and healthy persons. According to the authors, biomarkers were also employed to distinguish between histological subtypes and illness degrees.
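
PCA appears above both as a validation tool and as one of the dimension reduction steps used by Gupta et al. [105]. The following minimal sketch, assuming small dense data, builds the covariance matrix of centered observations and approximates its leading eigenvector by power iteration. Power iteration is a substitute chosen here to keep the example dependency-free; the reviewed studies do not state which eigen-solver they used, and all function names are hypothetical.

```python
import math

def covariance_matrix(rows):
    """Center the rows and return (C, centered), with C = (1/N) * sum_k x_k x_k^T."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(x[a] * x[b] for x in centered) / n for b in range(d)]
           for a in range(d)]
    return cov, centered

def first_principal_component(cov, iters=200):
    """Power iteration: repeatedly apply C and renormalise so that v approaches
    the eigenvector of the largest eigenvalue (lambda * v = C * v).
    Caveat: this fails if the start vector is orthogonal to that eigenvector."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # zero-variance data: no direction to find
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the matching eigenvalue for unit v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    return eigenvalue, v

# Toy data lying exactly on the line y = 2x.
cov, centered = covariance_matrix([[0, 0], [1, 2], [2, 4], [3, 6]])
eigenvalue, direction = first_principal_component(cov)
# One-dimensional representation: projection of each centered row on the direction.
scores = [sum(x[j] * direction[j] for j in range(2)) for x in centered]
```

Because the toy data are perfectly collinear, a single component captures all of the variance; real clinical features would normally need several components.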

Fig. 5

Architecture based on multi-pathway CNN to detect lung cancer [107]

Because their restricted lung cancer image dataset did not work well with typical transfer learning and data augmentation techniques, Ausawalaithong et al. [110] applied transfer learning twice to improve performance. The first transfer took the model from the domain of public images to chest X-rays. The model was then transferred a second time, to lung cancer. According to the authors, multi-transfer learning compensated for the limited sample size and produced better test results than classic transfer learning. Shakeel et al. [111] introduced an intelligent machine learning approach to improve the lung cancer detection process. The CT scan-based lung images were continually evaluated using a multilayer brightness-preserving method that reduced image noise and improved lung image quality. Because of the importance of the segmentation process, the authors adopted a multilayer augmented deep neural network technique to extract the cancer-affected area. Pradhan et al. [112] studied the prediction of various illnesses in order to make a judgment on lung cancer prediction. The authors also provided a comprehensive analysis of several machine learning algorithms to assess their competence and performance in predicting lung cancer and, as a result, detecting lung cancer with IoT integration. In a retrospective study of the USA National Lung Screening Trial (NLST), Kadir et al. [113] indicated the influence of the Lung Imaging Reporting and Data System (Lung-RADS). Although Lung-RADS has been shown to lower the total number of benign nodules during screening, its categorization task has proven complicated. As a result, the authors addressed the issue by recommending that radiologists and pulmonary medicine specialists use computer-assisted technology as a tool.
As a result, the authors analyzed their study's progress in developing and validating the lung cancer predictive model and nodule categorization.

Role of AI to Predict Asthma, Tuberculosis, Cystic Fibrosis

Awal et al. [114] applied multiple learning models to investigate the parameters that characterize asthma diagnosis and prediction. BOMLA (Bayesian Optimisation-based Machine Learning Framework for Asthma), a new machine learning technique, had been developed for identifying asthma. Khasha et al. [115] set out to identify asthma control levels, and they only looked at research that used supervised approaches like classification models in data mining. Their primary goal was to improve the effectiveness of classification algorithms by investigating the impact of daily clinical data's time-series/time-sequences dynamics on asthma control level detection in patients. Zhang et al. [116] identified severe asthma exacerbations based on freely accessible daily monitoring data. Compared to previously published models, the authors hypothesized that a predictive algorithm created utilizing machine learning techniques and an extensive training dataset of daily monitoring data would yield greater accuracy for identifying asthma exacerbations, as shown in Fig. 6.

Fig. 6

Ways to predict asthma exacerbation patients

According to Zhan et al. [117], several studies have employed the Mahalanobis–Taguchi System (MTS) for intelligent illness detection with reasonable accuracy. Their research aimed to see if MTS could be used to diagnose asthma based on regular blood measurements from healthy people and people living with asthma. Tuberculosis can be cured if it is diagnosed at an early stage, and the main requirement for this is proper use of the diagnostic technologies. Hence, Dasanayaka et al. [118] presented a model that can detect TB using a deep generative adversarial network. The chest X-ray images were chosen based on subjective and objective quality assessment metrics. The objective quality assessment metric was the peak signal-to-noise ratio (PSNR) (calculated by Eqs. 9–10), and the radiologists performed the subjective evaluation: $\mathrm{PSNR} = 10 \log_{10}\left(\mathrm{MAX}_I^2 / \mathrm{MSE}\right)$ (Eq. 9), where $\mathrm{MAX}_I$ is the maximum possible pixel value. The mean square error (MSE) is calculated by Eq. 10, $\mathrm{MSE} = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(i,j) - K(i,j)\right]^2$, where $I$ and $K$ are the observed and predicted values and $m$ and $n$ are the numbers of data points along each dimension. According to Panicker et al. [119], traditional microscopic sputum smear screening for TB (tuberculosis) diagnosis is time-consuming and error-prone. To make the diagnostic procedure more accessible, the authors employed an image binarization approach and a modified convolutional neural network to detect TB by identifying the pixels in the image corresponding to bacilli. Gao et al. [120] concentrated on using cutting-edge deep learning approaches to analyze CT pulmonary images. The authors identified five categories of severity for tuberculosis to track therapy effectiveness. The latter was the subject of Hwa et al. [121]. They detected tuberculosis using contrast-enhanced Canny edge detected (CEED-Canny) X-ray images, which generated edge detection in lung X-ray images. Li et al. [122] developed a new model for generating quantitative computed tomography images in a clinic for diagnosing pulmonary TB.
Based on CT data, the authors used a fine-tuned 3D CNN model to classify pulmonary TB lesion areas. Their aim was to digitally assess the spatial position of each lesion, the confidence of each infection, the presence of calcification, lesion type classifications, overall infection likelihood, and practical volume of the left and right lungs. According to Lai et al. [123], machine learning technologies identify and diagnose various disorders. The effectiveness of artificial neural networks and support vector machines in predicting the development of prostate, ovarian, breast, and liver cancer was outstanding. As a result, the researchers utilized three machine learning-based algorithms to predict anti-tuberculosis drug-induced hepatotoxicity. According to Barros et al. [124], determining the severity of illnesses is necessary to improve patient quality of life, efficiently manage health resources, and so on. As a result, they focused on evaluating nine machine learning models, including KNN (calculated by Eq. 11), Naïve Bayes, and decision trees, on improving TB prognosis to predict the chance of mortality using patient geographical, medical, and laboratory data. KNN relies on the Euclidean distance $d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$, where $p$ and $q$ are two points in Euclidean n-space. To summarise, the authors used feature selection techniques to identify the most relevant fields, used randomized search techniques to choose the optimal hyper-parameters of the machine learning model, and proposed an ensemble learning model to achieve better results. According to Das et al. [125], a chest X-ray is a potential signal for detecting TB. However, the shortage of competent radiologists in limited resource areas exacerbates the problem. As a result, the authors wanted to employ an end-to-end deep learning TB screening tool based on chest X-ray images so that their system could adjust to changed data over time.
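
To make the distance-based KNN step referred to by Eq. 11 concrete, here is a minimal sketch of a k-nearest-neighbour classifier using Euclidean distance. It is only an illustration of the general technique; the feature values, label names, and `k=3` are invented for the example and are not taken from Barros et al. [124].

```python
import math
from collections import Counter

def euclidean(p, q):
    """Distance between two points p and q in Euclidean n-space."""
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to `query`."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-feature patients with made-up risk labels.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["low", "low", "low", "high", "high", "high"]
prediction = knn_predict(points, labels, query=(0.5, 0.5))
```

In practice the features would be scaled first, since Euclidean distance is dominated by whichever feature has the largest numeric range.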
The cystic fibrosis transmembrane conductance regulator is widely distributed in the colon and plays a crucial function in controlling gut secretion viscosity and pH. As a result, Malagelada et al. [126] evaluated gut function using imaging methods often used to detect anatomical abnormalities of the digestive system. The authors also predicted that combining internal (endoluminal) and exterior (MRI) imaging gives a new viewpoint on the relationship between cystic fibrosis anatomical and functional findings. Zucker et al. [127] investigated if a deep CNN model might aid automated Brasfield rating of radiographic images of a chest for cystic fibrosis sufferer that would be analogous to that of a radiologist. Marques et al. [128] developed a texture classification-based technique to detect cystic fibrosis anomalies. The researchers used convolutional neural networks, as shown in Fig. 7, in two ways, in which the first one detected aberrant tissues, and the next determined the sort of structural abnormalities. The authors also presented a network that solely used patch-wise annotations to compute pixel-wise heatmaps of present irregularities.

Fig. 7

Architecture of cascade approach for the detection of cystic fibrosis [128]

Dio et al. [129] stated that physicians were given tools to recognize airway disorders in an automated and quick manner. Their goal was to produce a proof-of-concept and the first favorable results that might lead to rapid, reliable, and automatic detection of such disorders, similar to how the Sweat Chloride Test was used to diagnose Cystic Fibrosis (CF). Likewise, Zhou et al. [130] proposed a novel technique for cystic fibrosis diagnosis based on combining desorption electrospray ionization mass spectrometry with gradient boosted decision trees to analyze perspiration samples. The decision trees (calculated by Eqs. 12, 13) divide the set of data into sub-sets using the entropy $E(S) = -\sum_{i} P(x_i) \log_2 P(x_i)$, where $S$ is the set of data, $X$ is a discrete random variable with values $x_i$, and $P$ is a probability. The entropy using the frequency table of two attributes is given as $E(S, X) = \sum_{c \in X} P(c)\, E(c)$.
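
The entropy quantities referred to by Eqs. 12 and 13 can be illustrated with a short sketch of how a decision tree scores a candidate split. This is a generic illustration under the assumption of categorical features and base-2 logarithms; it is not the pipeline of Zhou et al. [130], and the feature and label values are invented.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy of a label set: -sum_c P(c) * log2 P(c) over the classes present."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(feature_values, labels):
    """Entropy after splitting on a feature: sum_x P(x) * entropy(labels | x),
    read off the frequency table of the two attributes."""
    groups = defaultdict(list)
    for x, y in zip(feature_values, labels):
        groups[x].append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(feature_values, labels):
    """Reduction in entropy achieved by splitting on the feature."""
    return entropy(labels) - conditional_entropy(feature_values, labels)

diagnosis = ["cf", "cf", "healthy", "healthy"]
informative = ["high", "high", "low", "low"]   # separates the classes perfectly
uninformative = ["a", "b", "a", "b"]           # tells us nothing about the class
gain_good = information_gain(informative, diagnosis)
gain_bad = information_gain(uninformative, diagnosis)
```

A tree-growing procedure would pick the split with the larger gain, here the `informative` feature.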

Role of AI to Predict Emphysema, Pneumoconiosis

Emphysema is a condition that causes difficulties in breathing and needs to be detected early with computed tomography scans and pulmonary function testing. On the other hand, the challenges involved with specific diagnostic methods have prompted additional computer-assisted techniques to flourish. As a result, Mondal et al. [131] used deep learning networks to conduct automated pulmonary emphysema diagnosis, resulting in increased detection accuracy. Bortsova et al. [132] investigated a weakly labeled strategy comparable to multiple instance learning. Using this method, the authors measured emphysema by leveraging percentage labels and applying existing information on the nature of these labels. The authors developed an approach that used a custom loss to learn the intervals and a label proportion problem-specific architecture (LPP). Deep learning has made significant progress in various complex image processing tasks. In light of this, Humphries et al. [133] used the Fleischner technique to classify emphysema using chest CT image processing. The authors' primary goal was to see if emphysema patterns at the participant level may predict disability and death when identified using a deep learning algorithm. Srivastava et al. [134] employed neural networks to help medical experts diagnose Chronic Obstructive Pulmonary Disease by providing a complete and systematic assessment of clinical pulmonary audio data. The authors used ten splits of K-fold cross-validation (defined by Eq. 14) to improve the performance of the existing deep neural networks for N observations. Let $\kappa: \{1, \ldots, N\} \mapsto \{1, \ldots, K\}$ be an indexing function that indicates the division to which randomization assigns observation i. The fitted function that is computed with the kth part of the data removed is denoted by $\hat{f}^{-k}(x)$. The cross-validation (CV) estimate of prediction error is shown by Eq. (15), $CV(\hat{f}) = \frac{1}{N}\sum_{i=1}^{N} L\left(y_i, \hat{f}^{-\kappa(i)}(x_i)\right)$, where $L$ is the loss function. According to Nyboe et al.
[135], automatic evaluation of emphysema presence might produce highly steady forecasts at a far cheaper cost, so it could be a viable alternative to having an expert do the assessments. As a result, they proposed a multiple instance learning (MIL) technique for emphysema detection. They also looked at whether emphysema labels at the scan level are sufficient to train the system to predict emphysema occurrence at the area level. Improving the detection of extreme inactivity (EI) in COPD patients can reduce morbidity and death. Except for patients with evident EI, detecting such conduct in a real-life session is impossible. As a result, Aguilaniu et al. [136] presented a machine learning technique to test for extreme inactivity. They created a prediction system that could accurately identify EI patients who would benefit the most from therapies like pulmonary rehabilitation. Gonzalez et al. [137] introduced a biomarker estimation approach based on deep learning techniques that relied on a regression network. In this system, the input to the algorithm was images containing the structure from which the biomarker was derived, and the output was the biomarker value itself. The two indicators used to demonstrate the suggested regression architectures were emphysema and bone mineral density (BMD), both of which are medically relevant. Pena et al. [138] described a technique for autonomously estimating emphysema areas in patients with chronic obstructive pulmonary disease (COPD) using High-Resolution Computed Tomography (HRCT) images that does not require manually labeled records for training. The authors wanted to use HRCT images without local annotations to detect emphysema sites in COPD patients. Wang et al. [139] wanted to see if deep learning could be used to screen for pneumoconiosis on digital chest radiographs and to contrast its performance with that of qualified radiologists.
The scientists used a conventional deep convolutional neural network to analyze chest X-ray pictures and verify them with various parameters. Furthermore, the scientists requested two trained radiologists to assess the testing dataset separately and compare their results to the computerized system. Hao et al. [65] presented an electronic health record-based pneumoconiosis radiography dataset. According to the authors, recent research has focused on machine learning for computer-aided detection. These have attained remarkable precision, with the artificial neural network (ANN) doing exceptionally well. Nevertheless, wide use in clinical practice has been challenging due to unbalanced samples and a lack of readability. Hence to address such issues, the authors initially created a pneumoconiosis-based radiograph dataset, which included both un-favorable and favorable representatives. Secondly, deep convolutional diagnostic methodologies were examined in identifying pneumoconiosis, and balanced training was used to improve recall. Pneumoconiosis diagnosis is based mainly on chest radiographic images, according to Wang et al. [140], and there is substantial disagreement across clinicians. Zhang et al. [141] set out to create an artificial intelligence (AI)-based model that would aid physicians in pneumoconiosis diagnosis as well as grading using chest radiographic images. The chest radiograph system was created with the help of a training cohort and validated with the help of an independent assessment cohort. Their groundbreaking research evaluated the possibility and effectiveness of AI-assisted radiography diagnosis and screening in the field of occupational lung disease. To surmount the issues of insufficient annotation pneumoconiosis data and increase the accuracy of pneumoconiosis diagnostics, Zheng et al. [142] presented two transfer learning models. 
They also demonstrated various pre-processing techniques for improving the quality and accuracy of X-rays, such as segmentation of lungs and amplification of data. Many computer-aided investigations on pneumoconiosis classification algorithms have been offered, according to Zhang et al. [143], but most of them were based on lung pictures. As a result, the authors provided a technique for diagnosing pneumoconiosis using wrist pulse signals gathered from non-pneumoconiosis individuals and pneumoconiosis patients. Machine learning approaches were utilized to process and assay the pulses of non-pneumoconiosis persons and the pneumoconiosis patient. Other than specialist radiologists, Wang et al. [144] noted a lack of sequential, automatic, and primary procedures for identifying and analyzing the evolution of pneumoconiosis in every coal miner. Consequently, the authors presented the most recent research findings from a study to address the challenges described by creating Computer-Aided Diagnosis (CAD) tools to identify pneumoconiosis using chest X-rays automatically.
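
The K-fold cross-validation estimate mentioned earlier in this section (Eqs. 14 and 15) can be sketched as follows. The fold-assignment function, called `kappa` here, plays the role of the indexing function; the constant-mean model and squared-error loss are placeholders chosen only to keep the example self-contained and are not the networks used by Srivastava et al. [134].

```python
import random

def kfold_assignment(n, k, seed=0):
    """Randomly assign each of n observations to one of k near-equal folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    kappa = [0] * n
    for position, i in enumerate(order):
        kappa[i] = position % k
    return kappa

def cross_validation_error(xs, ys, fit, loss, k=10, seed=0):
    """Average loss over all observations, each predicted by a model that was
    fitted without the fold containing that observation. Assumes k <= len(xs)."""
    n = len(xs)
    kappa = kfold_assignment(n, k, seed)
    total = 0.0
    for fold in range(k):
        train = [(xs[i], ys[i]) for i in range(n) if kappa[i] != fold]
        model = fit(train)  # fitted with this fold's data removed
        total += sum(loss(ys[i], model(xs[i]))
                     for i in range(n) if kappa[i] == fold)
    return total / n

def fit_mean(train):
    """Placeholder model (illustrative only): always predict the training mean."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean

def squared_error(y, y_hat):
    return (y - y_hat) ** 2

kappa = kfold_assignment(20, 10)
cv = cross_validation_error(list(range(20)), [3.0] * 20, fit_mean, squared_error, k=10)
```

Any model can be plugged in through `fit`, provided it returns a callable that maps an input to a prediction.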

Overall analysis

The best techniques have been filtered out based on their respective accuracy in Table 9 after observing the performance of the models that have been used for the detection and diagnosis of multiple airway diseases such as pulmonary edema, pulmonary embolism, cystic fibrosis, pneumoconiosis, lung cancer, asthma, covid-19, Mesothelioma, tuberculosis, and emphysema.

Table 9

Overall comparison of models

Ref	Diseases	Techniques	Accuracy
[87]	Pulmonary Embolism	Deep Learning, Image Classification, Long Short Term Model	91%
[82]	Pulmonary Edema	Principal Component Analysis, Random Forest	96.5%
[129]	Cystic fibrosis	Support vector machine, Naïve Bayes classifier, logistic regression	99%
[141]	Pneumoconiosis	Convolution Neural Network	97.3%
[111]	Lung cancer	Ensemble classifier	97.6%
[114]	Asthma	Decision Tree, Naive Bayes, K Nearest Neighbour	96.52%
[96]	Covid-19	Manta Ray Foraging Optimization, Fractional Multichannel Exponent Moments	98.09%
[105]	Mesothelioma	SMOTE, ADASYN, Artificial neural network, principal component analysis	96%
[118]	Tuberculosis	CLAHE method, Deep convolution neural network, UNet architecture	97.1%
[131]	Emphysema	Improved red deer algorithm, Fuzzy C Means, Adaptive local ternary pattern	95.56%

Discussion

RQ 1: Year Wise Analysis of Predicting Multiple Airway Diseases Using AI Techniques

As demonstrated in Fig. 8, one hundred fifty-five (155) papers that range over 12 years, i.e. from (2010 to 2022)* have been studied for identifying different types of airway diseases using AI techniques which include pulmonary edema, cystic fibrosis, emphysema, Mesothelioma, pneumoconiosis, pulmonary embolism, lung cancer, covid19, tuberculosis, and asthma.

Fig. 8

Distribution of papers

One hundred papers have been taken from recent years, i.e. 2019 to 2022, and the remaining fifty-five papers have been taken from 2010 to 2018. This research intends to create a broader sense of the various types of AI and AI-derived techniques, such as machine and deep learning techniques, for detecting and diagnosing multiple airway diseases. * (As per the sources, papers from the past ten years can be considered for an SLR, but in this paper we had to include some other information, such as a dataset of a particular disease, for which we studied papers from at most the past 12 years.)

RQ 2: How Doctors are Being Helped by Deep and Machine Learning Techniques in Detecting the Airway Diseases?

The goal of an AI system is to organically learn a function and continue to improve without being actively taught. Machine learning and deep learning are artificial intelligence technologies that allow systems to detect patterns and connections between data and desired outputs [145]. These techniques connect multiple chunks of information discovered from the facts without requiring explicit human description [146]. At its most basic level, a machine and deep learning-based strategy leads to a better diagnosis by analyzing a greater range of data than a physician can. Medical treatment is substantially affected by the increasing amount of healthcare data, which makes it very difficult for pulmonologists to handle and analyze manually when treating their patients. Hence, to improve the patient-doctor relationship, clinicians use AI techniques to predict, classify, and diagnose diseases efficiently. Fig. 9 compares the accuracy of AI techniques in predicting and analyzing the diseases with that of the traditional methods used by pulmonologists.

Fig. 9

Analysis between Pulmonologists and AI techniques in predicting the diseases

One advantage of employing AI to interpret diagnostic exams such as imaging is the ability to evaluate tests done in geographically isolated or underserved places [147]. This may result in an earlier and more accurate diagnosis, as well as a referral to expert treatment at an earlier stage of the disease, potentially influencing the prognosis. Radiographs from these centers, on the other hand, can be remotely submitted and analyzed by a single central system using AI [148]. Fig. 10 shows that artificial intelligence techniques have proven to be better than pulmonologists in predicting airway disorders.

Fig. 10

Airway Diseases Prediction by AI and Pulmonologists

RQ 3: Which ML and DL Techniques are Broadly Applied to Predict Airway Diseases?

Patients who require early diagnosis and treatment can benefit from AI-driven sickness detection models that help medical companies produce improved diagnostic tools. As a result, the literature mentions the techniques for diagnosing airway disorders such as cystic fibrosis, pneumoconiosis, pulmonary edema, pulmonary embolism, covid-19, emphysema, tuberculosis, lung cancer, asthma, and mesothelioma. Linear regression, support vector machine (SVM), random forest (RF), decision tree (DT), naïve Bayes (NB), fuzzy particle, logistic regression, and ensemble learning are the most commonly used machine learning models in the literature. Convolutional Neural Networks (CNN) are the most often utilized deep learning models for illness diagnosis. Furthermore, artificial neural networks (ANN) and transfer learning techniques such as VGGNet, ResNet, and others have been extensively used, as shown in Fig. 11.

Fig. 11

Machine and deep learning based prediction models

RQ 4: Name the Characteristics that Manipulate the Quality of Prediction Models Based on Deep and Machine Learning?

AI algorithms in the medical industry are essential, notably for identifying a disease from a medical database. Many firms employ these approaches to predict illnesses early and improve medical diagnostics. Despite their continual progress, specific problems remain. On the one hand, machine and deep learning algorithms can handle complicated situations; on the other hand, they demand more research effort for practical implementation [149]. The limitations below, gathered from research gaps, arise when machine and deep learning models are used to diagnose airway illnesses.

Data paucity: One of the most pervasive issues in artificial intelligence is a shortage of high-quality data. Every business will encounter this difficulty throughout the AI deployment process. Thus, in the future, open-source datasets of multiple airway diseases should be considered; they occasionally lack quality but represent promising solutions for organizations [150]. In addition, synthetic data should be created to increase data security and privacy; data augmentation should be considered to enlarge the dataset without collecting additional data; and finally, transfer learning techniques should be used when there is not enough training data.

Modeling errors: The two main restricting errors are overfitting and underfitting. Overfitting occurs when a model learns the information and noise in the training dataset to the point that its performance on a new dataset degrades. In underfitting, models cannot learn from the training data or generalize to new datasets, thus impairing the system's performance [151]. Therefore, algorithms such as pre-trained models that address these modeling errors should be incorporated in the future.

Perpetual improvement of models: At times, optimizing a model's performance might be difficult, yet it is what makes the model more accurate in its predictions and more dependable and acceptable. Developing an AI-based model is not difficult for engineers, but verifying its performance is critical to obtaining accurate and trustworthy results [152]. Hence, to improve it in the future, the appropriate amount of data should be utilized, the proper algorithms should be used, and models should be verified and evaluated appropriately. Transfer learning can also enhance the performance of AI-based models [153].

Class imbalance: An unbalanced classification issue arises when the distribution of instances across recognized classes is uneven or biased [154]. Imbalanced data make it hard to predict the class because most machine learning methods for classification are built on the premise of an equal number of instances for each class. So, in the future, re-sampling, K-fold cross-validation, or pre-trained models can be used to deal with class imbalance issues [155].
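
As an illustration of the re-sampling remedy for class imbalance, the sketch below performs random minority oversampling: samples from under-represented classes are duplicated at random until every class is as large as the biggest one. It is a minimal example with hypothetical names, not a method taken from any of the reviewed studies.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until all
    classes match the size of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for y, group in by_class.items():
        # Draw with replacement from the class's own samples to fill the gap.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extra:
            out_samples.append(s)
            out_labels.append(y)
    return out_samples, out_labels

# Eight majority samples (class 0) and two minority samples (class 1).
samples = list(range(10))
labels = [0] * 8 + [1] * 2
balanced_samples, balanced_labels = random_oversample(samples, labels)
class_sizes = Counter(balanced_labels)
```

Oversampling should be applied to the training split only; duplicating samples before splitting lets copies of the same record land in both training and validation data and inflates the measured accuracy.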

Conclusion and Future Work

Between 2010 and 2022, 155 studies were chosen from six digital libraries that can be accessed online, and four questions were investigated after going through them. Researchers analyzed a variety of technical developments that might be used to enhance AI-based models in the field of pulmonology. This article discusses the influence of machine learning and deep learning techniques on analyzing the information related to airway disorders. Furthermore, the paper details several academics' rigorous research efforts to demonstrate how machine learning and deep learning models help to detect or categorize various airway illnesses. The literature's systematic review is presented in a tabular format, with each column indicating the dataset, methodology, and outcomes used by the researchers and their limitations. After accumulating the constraints encountered by researchers for the prediction of airway problems, an attempt has been made to determine the usage of the latest models that may be incorporated in the future to enhance the system's performance. The lasso method, CNN, Decision Tree, GAP Net, and other algorithms were examined by certain researchers. However, their findings revealed that the models could not distinguish between diseases or detect anomalies in their data. Aside from that, several models only worked with limited data sets, failed to pre-process data, and refused to provide localization information for the final image. The model's overall performance is hampered due to these limitations, which must be addressed. Several approaches used in research publications, such as random forest and logistic regression, have the lowest prediction accuracy because of modeling flaws such as overfitting and underfitting. 
Many researchers have also struggled to categorize data effectively when using DFD-Net and Fuzzy Particle Swarm Optimization to detect lung cancer; CNN to detect pneumoconiosis; DenseNet, Inception V3, ResNet, and DNN to diagnose pulmonary embolism; and Canny edge detection to detect tuberculosis. It is also worth noting that researchers have focused on only one or two airway disorders for prediction, which limits users' capacity to distinguish between other types of airway disease. Although artificial intelligence has innumerable benefits, its flaws and limitations may restrict its applications, notably in the healthcare sector. There is nevertheless scope for improvement in this field: models should be selected carefully and then further improved or innovated to provide better output, and optimization techniques should be added to the network to obtain optimal results. The most important task when using learning models is classifying the data correctly. Hence, a methodology that relates the data's features to one another, so as to improve the classification model and generate correct output, should be included. Besides this, deep learning techniques should be improved so that they can detect various airway diseases in a dataset with less processing time. In addition, conventional learning methods should also be considered in the future, as they can be used to improve airway disease detection models.

85 in total

1. Deep learning to automate Brasfield chest radiographic scoring for cystic fibrosis.

Authors: Evan J Zucker; Zachary A Barnes; Matthew P Lungren; Yekaterina Shpanskaya; Jayne M Seekins; Safwan S Halabi; David B Larson
Journal: J Cyst Fibros Date: 2019-05-02 Impact factor: 5.482

Review 2. Airway macrophages as the guardians of tissue repair in the lung.

Authors: Franz Puttur; Lisa G Gregory; Clare M Lloyd
Journal: Immunol Cell Biol Date: 2019-02-15 Impact factor: 5.126

3. Predicting cancer using supervised machine learning: Mesothelioma.

Authors: Avishek Choudhury
Journal: Technol Health Care Date: 2021 Impact factor: 1.285

4. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository.

Authors: Kenneth Clark; Bruce Vendt; Kirk Smith; John Freymann; Justin Kirby; Paul Koppel; Stephen Moore; Stanley Phillips; David Maffitt; Michael Pringle; Lawrence Tarbox; Fred Prior
Journal: J Digit Imaging Date: 2013-12 Impact factor: 4.056

5. Deep learning-based classification of mesothelioma improves prediction of patient outcome.

Authors: Pierre Courtiol; Charles Maussion; Françoise Galateau-Sallé; Gilles Wainrib; Thomas Clozel; Matahi Moarii; Elodie Pronier; Samuel Pilcer; Meriem Sefta; Pierre Manceron; Sylvain Toldo; Mikhail Zaslavskiy; Nolwenn Le Stang; Nicolas Girard; Olivier Elemento; Andrew G Nicholson; Jean-Yves Blay
Journal: Nat Med Date: 2019-10-07 Impact factor: 53.440

6. Evaluation of different phenotypic methods to detect methicillin resistance in Staphylococcus aureus isolates recovered from cystic fibrosis patients.

Authors: Andrea García-Caballero; Juan de Dios Caballero; Ainhize Maruri; Maria Isabel Serrano-Tomás; Rosa Del Campo; María Isabel Morosini; Rafael Cantón
Journal: Diagn Microbiol Infect Dis Date: 2021-09-23 Impact factor: 2.803

7. Analyzing the epidemiological outbreak of COVID-19: A visual exploratory data analysis approach.

Authors: Samrat K Dey; Md Mahbubur Rahman; Umme R Siddiqi; Arpita Howlader
Journal: J Med Virol Date: 2020-03-11 Impact factor: 2.327

8. Psychosocial consequences of false positives in the Danish Lung Cancer CT Screening Trial: a nested matched cohort study.

Authors: Jakob Fraes Rasmussen; Volkert Siersma; Jessica Malmqvist; John Brodersen
Journal: BMJ Open Date: 2020-06-04 Impact factor: 2.692

9. Cystic fibrosis-related diabetes: Prevalence, screening, and diagnosis.

Authors: Swapnil Khare; Marisa Desimone; Nader Kasim; Christine L Chan
Journal: J Clin Transl Endocrinol Date: 2021-12-07

10. Early lung cancer diagnostic biomarker discovery by machine learning methods.

Authors: Ying Xie; Wei-Yu Meng; Run-Ze Li; Yu-Wei Wang; Xin Qian; Chang Chan; Zhi-Fang Yu; Xing-Xing Fan; Hu-Dan Pan; Chun Xie; Qi-Biao Wu; Pei-Yu Yan; Liang Liu; Yi-Jun Tang; Xiao-Jun Yao; Mei-Fang Wang; Elaine Lai-Han Leung
Journal: Transl Oncol Date: 2020-11-17 Impact factor: 4.243