D Arul Pon Daniel (1), K Thangavel (2). (1) Department of Computer Science, Loyola College, Mettala, Namakkal, Tamil Nadu, India. (2) Department of Computer Science, Periyar University, Salem, Tamil Nadu, India.
Abstract
Breathomics is the metabolomic study of exhaled air. It is an emerging research field that focuses mainly on health-related volatile organic compounds (VOCs). Because the quantity of these compounds varies with health status, breathomics promises to deliver noninvasive diagnostic tools. The main aim of breathomics is therefore to discover patterns of VOCs related to abnormal metabolic processes in the human body. Most classification systems, however, are not designed for cost-sensitive classification domains. They are therefore poorly suited to the gastric carcinoma (GC) domain, where the benefit of correctly classifying early stages is greater than that of later stages and the cost of misclassification differs for every pair of predicted and actual classes. The aim of this work is to demonstrate the basic principles of using breathomics to classify GC. To that end, VOCs such as acetone, carbon disulfide, 2-propanol, ethyl alcohol, and ethyl acetate in exhaled air and stomach tissue emission were analyzed for the detection of GC. Breath samples from 49 GC and 30 gastric ulcer patients were collected to distinguish normal, suspected, and positive cases using a back-propagation neural network (BPN), which produced an accuracy of 93%, a sensitivity of 94.38%, and a specificity of 89.93%. The study compares the results obtained by single- and multi-layer cascade-forward and feed-forward BPNs with different activation functions. In this comparison, the multilayer cascade-forward network performed best at separating GC from normal and benign cases.
Keywords:
Breath Analysis; Human Body; Metabolomics; Neural Networks; Sensitivity and Specificity; Stomach Cancer; Stomach Ulcer; Volatile Organic Compounds
Analytical methods for measuring volatile organic compounds (VOCs) in exhaled air with high resolution and high throughput have been developed extensively in recent years. Yet the application of machine learning methods for fingerprinting VOC profiles in breathomics is still in its infancy. Recent literature suggests that breath analysis has potential utility as an alternative noninvasive methodology.
In clinical medicine, reaching a conclusion about a patient's symptoms when presented with complex and sometimes contradictory clinical information is difficult. A clinician usually makes decisions based on a set of measurements and observations about a patient and evaluates all the factors subjectively to reach a diagnosis. Clinicians may, however, have great difficulty analyzing the enormous amount of clinical and histopathological data. More sophisticated quantitative techniques are therefore needed to help clinicians consider all the data and make better diagnoses. Computer scientists have proposed machine-learning techniques to support this decision-making process.[1]
Breath diagnostics, the measurement of volatile chemicals in human breath, is currently receiving attention as a technique for detecting disease. Being noninvasive, it is particularly suited to screening for presymptomatic disease in healthy populations.[2] As an entirely noninvasive methodology, breath analysis has the potential to deliver accurate and reproducible diagnostic tests without risk to the patient, making it suitable for population-based health screening as well as individual testing in response to symptoms.
Breath analysis relies on the fact that disease states alter cellular metabolite levels. These metabolites are transferred to the bloodstream and, in the case of volatile compounds, can subsequently be detected in the breath.[3]
Gastric carcinoma (GC) is the second most common cause of cancer-related deaths among Indian men and women.[4] Based on a study from Karnataka, GC ranks among the five most common cancers in young Indian men and women (aged 15–44 years).[5] The number of new GC cases has been estimated at about 34,000 (with a male predominance ratio of 1:2), with a progressive increase postulated such that by the year 2020 there would be approximately 50,000 new GC cases annually in India.[6]
The Gastric Carcinoma Domain
GC is a disease in which cancer (malignant) cells are found in the tissues of the stomach. The stomach is a J-shaped organ in the upper abdomen where food is digested. Food reaches the stomach through a tube called the esophagus, which connects the mouth to the stomach. After leaving the stomach, partially digested food passes into the duodenum, then the rest of the small intestine, and then into the large intestine, called the colon. Cancer can sometimes be present in the stomach for a long time and grow very large before it causes any symptoms. In the early stages of stomach cancer, a patient may have indigestion and stomach discomfort, a bloated feeling after eating, mild nausea, loss of appetite, or heartburn. In more advanced stages, the patient may have blood in the stool, vomiting, weight loss, or pain in the stomach. Factors that increase the chance of getting stomach cancer include a stomach disorder called atrophic gastritis, a blood disorder called anemia, and a hereditary condition of growths, called polyps, in the large intestine. Stomach cancer is difficult to detect in its early stages because the early symptoms are absent or mild. It is a highly aggressive cancer, and the overall survival rate is very low. The chance of recovery (prognosis) and the choice of treatment depend on the stage of the cancer, whether it is confined to the stomach or has spread to other places, and the patient's general state of health.[1]
Classification of Gastric Carcinoma
If there are symptoms of cancer, a physician will usually order an upper gastrointestinal X-ray or look inside the stomach with a thin, lighted tube called a gastroscope. This procedure, called gastroscopy, is useful in detecting most stomach cancers. For this test, the gastroscope is inserted through the mouth and guided into the stomach, and the stomach mucosa is examined. According to the Gastroenterological Endoscopy Society, based on visual inspection of the mucosal surface of the patient's stomach, GC is classified into three main categories: early GC (EGC), advanced GC (AGC), and the remaining cases that cannot be included in either category.[7]
EGC is defined as GC confined to the mucosa or submucosa, regardless of the presence or absence of lymph node involvement.[8] In AGC, as defined by Borrmann, the tumor has invaded the proper muscle layer or beyond.[9] Knowledge of these types permits a preliminary assessment of tumor spread in the stomach.
Chemical analysis of the breath samples showed that five VOCs (2-propenenitrile, 2-butoxyethanol, furfural, 6-methyl-5-hepten-2-one, and isoprene) were significantly elevated in patients with GC and/or peptic ulcer compared with less severe gastric conditions. The encouraging preliminary results presented here have initiated a multicenter clinical trial with a considerably larger sample size to confirm the observed breath prints.
The remainder of this study is organized as follows. Section 2 presents the experimental setup of breath analysis. Section 3 elaborates the methodology of collecting breath samples from the volunteers. Section 4 discusses the experimental method and materials. Section 5 presents the experimental analysis results. Section 6 presents the conclusion and the scope for further research.
EXPERIMENTAL SETUP
Study Design
The primary aim of this study was to distinguish GC patients from patients with benign gastric conditions who may present similar clinical symptoms. The secondary aim was to distinguish subpopulations within the malignant and nonmalignant study groups. The study, with a limited group of 161 patients (out of 236 patients after application of the exclusion criteria), was designed as a feasibility test of a nonmaterial-based breath test for GC, with a more realistic ratio of malignant to nonmalignant gastric conditions.
Sensor Array
Three screen-printed, commercially available metal oxide semiconductor gas sensor arrays are used to construct the proposed breath analyzer. The gas sensors are manufactured and commercialized by Figaro USA Inc. The resulting array, populated by sensor devices tagged by the manufacturer as TGS813, TGS822, and TGS2620, is placed in a test chamber. Each sensor element is mounted onto a stainless steel substrate with a head of chlorinated polyvinyl chloride and then connected by lead wires to the pins of the sensor package. To generate the required dataset, the test chamber is connected to a data acquisition card (DAQ), which provides versatility for conveying the chemical compounds of interest at the desired concentrations to the sensing chamber. The response of the gas sensor array was measured at the sensors' operating temperature, which, according to the deterministic one-to-one look-up table provided by the manufacturer (Figaro USA Inc., http://www.figarosensor.com), is attained through a built-in heater driven by an external DC voltage source set at 5 V. The sensor response is read out as the resistance across the active layer of each sensor; hence, each measurement produces a 6-channel time series sequence. The DAQ collects data from the gas sensors and controls the analog voltage signal to every sensor heater. The experimental setup is shown in Figure 1.
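The text states that the sensor response is read out as a resistance but does not describe the measuring circuit. As a minimal sketch, assuming a simple voltage divider in which each sensor sits in series with a load resistor and the DAQ samples the voltage across that resistor, the resistance could be recovered as follows. The circuit voltage, the load resistance, and the sample voltages are all hypothetical values chosen for illustration.

```python
def sensor_resistance(v_out, v_circuit=5.0, r_load=10_000.0):
    """Estimate the sensing-layer resistance (ohms) from a divider reading.

    Assumption: the sensor and a load resistor are in series across
    v_circuit, and v_out is the voltage measured across the load resistor.
    v_circuit and r_load are illustrative defaults, not values from the study.
    """
    if not 0.0 < v_out < v_circuit:
        raise ValueError("v_out must lie strictly between 0 and v_circuit")
    return r_load * (v_circuit - v_out) / v_out


# One hypothetical reading per sensor (volts across the load resistor).
sample_volts = {"TGS813": 1.0, "TGS822": 2.5, "TGS2620": 4.0}
resistances = {name: sensor_resistance(v) for name, v in sample_volts.items()}
```

Under this assumption, a higher voltage across the load resistor corresponds to a lower sensor resistance, so repeated readings over time yield one resistance time series per channel.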
Figure 1
Electronic-nose system using data acquisition card
COLLECTION OF BREATH SAMPLES
Exhaled alveolar breath was collected in a controlled manner. None of the volunteers consumed food, tobacco, or alcohol during a 12 h (overnight) interval before the breath collection. All volunteers were asked to rest for 1 h before the breath sampling and did not perform heavy physical exercise in the 24 h before the breath sample was taken. All breath samples were collected in the same clinical environment and in duplicate (for the dual analysis) from each volunteer. The breath samples were characterized with an array of sensors, combined with a statistical pattern recognition algorithm, with the aim of identifying specific patterns (the so-called breath prints) for GC and nonmalignant gastric conditions.
Samples are collected through a standardized stainless steel chamber of 140 ml volume. Samples could be collected even from elderly or bedridden patients without causing discomfort. The collection period was 1.5 min at 0.5 per 1 min, and the dead space of the samples is removed by setting the system timer. Volunteers came from various hospitals and dispensaries in and around the Tirunelveli district, Tamil Nadu, India. All subject samples were collected in random order. Sample collection from a volunteer is shown in Figure 2.
Figure 2
Collection of breath samples from cancer volunteers
Breath samples were collected after written informed consent from 270 volunteers, aged 21–73 years, at the CSI Jayaraj Annapackiam Mission Hospital and CSI Bell Pins Inndrani Chelladurai Mission Hospital, Palayamkottai, Tirunelveli. All volunteers underwent upper digestive endoscopy after recruitment, according to the hospital's routine clinical protocol. Biopsy samples were taken for histopathology if lesions (including ulceration of the stomach lining) were observed visually.
The following exclusion criteria were applied before sample collection: patients who had undergone gastric resection in the past; patients found to suffer from endoscopically detectable precancerous conditions (e.g., mucosal atrophy); and patients who had taken medication affecting gastric acid secretion (e.g., proton pump inhibitors) and/or antibiotics during the month before the breath test. The reason for the last criterion was that previous medication could strongly affect the composition of the exhaled breath.
After exclusion, the breath samples of 161 patients were analyzed for this study: 49 GC patients, 19 patients with benign gastric ulcers, and 11 patients with less severe gastric conditions, as shown in Table 1. The less severe cases comprised patients with no endoscopic abnormalities (82) and patients with endoscopic abnormalities without ulceration (11).
Table 1
Composition of the subject database
Ethical approval was obtained from the Ethics Committee of Periyar University, Salem, Tamil Nadu, India, and the clinical trial was registered. The treatment decisions were based solely on the conventional diagnosis described above. Neither the patients nor their treating physicians were informed of the results of the breath tests.
EXPERIMENTAL RESULTS
Data Sampling
The dataset consists of 161 experimentally obtained observations. It was divided into two disjoint subsets: a training set containing 145 observations (90% of the total) and a test set comprising 16 observations (10% of the total). The overall acceptability was used as the output parameter for developing the artificial intelligence tool.
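The 90/10 partition described above can be sketched as a shuffled index split. This is only an illustration: the text does not say how observations were assigned to each subset, so the random shuffle, the seed, and the function name `split_train_test` are assumptions.

```python
import random


def split_train_test(n_observations, test_fraction=0.10, seed=0):
    """Return two disjoint, sorted lists of indices (train, test).

    Assumption: observations are assigned at random; the source does not
    state the assignment rule.
    """
    indices = list(range(n_observations))
    random.Random(seed).shuffle(indices)
    n_test = round(n_observations * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_idx, test_idx = split_train_test(161)
print(len(train_idx), len(test_idx))  # 145 16
```

Rounding 10% of 161 gives the 16 test observations reported in the text, leaving 145 for training.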
Artificial Neural Networks
Cascade-forward back-propagation (CFBP) and feed-forward back-propagation (FFBP) artificial intelligence models [Figures 3-6] were trained with the breath sample dataset. Different combinations of several internal parameters were attempted, including data preprocessing, data partitioning approach, number of hidden layers, number of neurons in each hidden layer, transfer function, and error goal. The following variants of the back-propagation algorithm were tried (Tables 2-5): Levenberg-Marquardt (LM), Bayesian regularization, BFGS quasi-Newton, resilient (RP), scaled conjugate gradient, conjugate gradient with Powell/Beale restarts, conjugate gradient with Fletcher-Powell, conjugate gradient with Polak-Ribière, one-step secant, variable learning rate gradient descent, gradient descent with momentum, and gradient descent. They were combined with different activation functions (Tables 6-9): radial basis, normalized radial basis, triangular basis, hyperbolic tangent sigmoid, Elliot symmetric sigmoid, Elliot 2 symmetric sigmoid, hard-limit, symmetric hard-limit, competitive, and soft max. The RP algorithm produced better results during training of the single- and multi-layer FFBP neural networks, as shown in Figures 7-12; during training of the multilayer CFBP neural network, however, LM produced better results, as shown in Figures 13 and 14. The normalized radial basis, symmetric hard-limit, competitive, and soft max activation functions supported the best-performing algorithms, as shown in Figures 15-22. There is no generalized method for determining the optimum number of hidden layers, neurons in each hidden layer, and so on; these values were chosen empirically.
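For readers unfamiliar with the activation functions listed above, the sketch below gives the definitions commonly used for most of them. These formulas are assumed conventions, not taken from the paper, which does not state them; the function names are illustrative, and the Elliot 2 symmetric sigmoid is omitted.

```python
import math


# Element-wise functions (applied to one net input at a time).
def tanh_sigmoid(x):
    return math.tanh(x)                 # output in (-1, 1)

def elliot_symmetric_sigmoid(x):
    return x / (1.0 + abs(x))           # output in (-1, 1), no exponential

def radial_basis(x):
    return math.exp(-x * x)             # peak of 1 at x = 0

def triangular_basis(x):
    return max(0.0, 1.0 - abs(x))       # triangle on [-1, 1]

def hard_limit(x):
    return 1.0 if x >= 0 else 0.0

def symmetric_hard_limit(x):
    return 1.0 if x >= 0 else -1.0


# Layer-wise functions (need the whole vector of net inputs).
def soft_max(values):
    shift = max(values)                 # subtract max for numerical stability
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def competitive(values):
    winner = values.index(max(values))  # first maximum wins
    return [1.0 if i == winner else 0.0 for i in range(len(values))]

def normalized_radial_basis(values):
    raw = [radial_basis(v) for v in values]
    total = sum(raw)
    return [r / total for r in raw]
```

The first group maps each neuron's net input independently, while soft max, competitive, and normalized radial basis depend on all net inputs in the layer, which is why they take a list.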
Figure 3
Network diagram of single-layer feed-forward neural network classifiers
Figure 6
Network diagram of multilayer cascade-forward neural network classifiers
Table 2
Classification accuracy of single-layer feed-forward neural network
Table 5
Classification accuracy of multilayer cascade-forward neural network
Table 6
Classification accuracy of single-layer feed-forward neural network based on activation functions
Table 9
Classification accuracy of multilayer cascade-forward neural network based on activation functions
Figure 7
Resilient back-propagation training algorithm outperformance: good in classification accuracy of single-layer feed-forward neural network
Figure 12
Resilient back-propagation training algorithm outperformance: good in error rate of single-layer cascade-forward neural network
Figure 13
Levenberg-Marquardt back-propagation training algorithm outperformance: good in classification accuracy of multilayer cascade-forward neural network
Figure 14
Levenberg-Marquardt back-propagation training algorithm outperformance: good in error rate of multilayer cascade-forward neural network
Figure 15
Symmetric hard-limit activation function outperformance: good in classification accuracy of single-layer feed-forward neural network
Figure 22
Competitive activation function outperformance: good in error rate of multilayer cascade-forward neural network
Figure 4
Network diagram of multilayer feed-forward neural network classifiers
Figure 5
Network diagram of single-layer cascade-forward neural network classifiers
Figure 8
Resilient back-propagation training algorithm outperformance: good in error rate of single-layer feed-forward neural network
Figure 9
Resilient back-propagation training algorithm outperformance: good in classification accuracy of multilayer feed-forward neural network
Figure 10
Resilient back-propagation training algorithm outperformance: good in error rate of multilayer feed-forward neural network
Figure 11
Resilient back-propagation training algorithm outperformance: good in classification accuracy of single-layer cascade-forward neural network
Figure 16
Symmetric hard-limit activation function outperformance: good in error rate of single-layer feed-forward neural network
Figure 17
Normalized radial basis activation function outperformance: good in classification accuracy of multilayer feed-forward neural network
Figure 18
Normalized radial basis activation function outperformance: good in error rate of multilayer feed-forward neural network
Figure 19
Elliot symmetric sigmoid and soft max activation function outperformance: good in classification accuracy of single-layer cascade-forward neural network
Figure 20
Elliot symmetric sigmoid and soft max activation function outperformance: good in error rate of single-layer cascade-forward neural network
Figure 21
Competitive activation function outperformance: good in classification accuracy of multilayer cascade-forward neural network
Initially, the reduced feature set selected by the feature selection methods is normalized between zero and one; that is, each value in the feature set is divided by the maximum value in the set. These normalized values are assigned to the input neurons.
The number of hidden neurons is greater than or equal to the number of input neurons, and there is only one output neuron. Initial weights are assigned randomly. The output from each hidden neuron is calculated using the sigmoid function:
$$S_1 = \frac{1}{1 + e^{-\lambda x}}, \quad \text{where } \lambda = 1 \text{ and } x = \sum_i w_{ih} k_i \tag{1}$$
where $w_{ih}$ is the weight assigned between the input and hidden layers and $k$ is the input value. The output from the output layer is calculated using the sigmoid function:
$$S_2 = \frac{1}{1 + e^{-\lambda x}}, \quad \text{where } \lambda = 1 \text{ and } x = \sum_i w_{ho} S_i \tag{2}$$
where $w_{ho}$ is the weight assigned between the hidden and output layers and $S_i$ is the output value from the hidden neurons. $S_2$ is subtracted from the desired output. Using this error ($e$) value, the weight update is computed as:
$$\delta = e S_2 (1 - S_2) \tag{3}$$
The weights assigned between the input and hidden layers and between the hidden and output layers are updated as:
$$w_{ho} = w_{ho} + (n \delta S_1) \tag{4}$$
$$w_{ih} = w_{ih} + (n \delta k) \tag{5}$$
where $n$ is the learning rate and $k$ is the input value. The output is then calculated again from the hidden and output neurons, the error ($e$) value is checked, and the weights are updated.[2] This procedure is repeated until the network output equals the desired output. The back-propagation classifier used for classification follows this procedure.[10]
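The training loop described above can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: the feature values, target, learning rate, seed, and epoch count are hypothetical. One deliberate difference is labeled in the code: the input-to-hidden update uses the standard chain-rule term (δ scaled by the hidden-to-output weight and the hidden sigmoid derivative), which is an assumption here, whereas Eq. (5) as printed uses δ directly.

```python
import math
import random


def sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + math.exp(-lam * x))


def normalize(features):
    """Divide each value by the maximum of the set (assumes a positive maximum)."""
    peak = max(features)
    return [f / peak for f in features]


def forward(k, w_ih, w_ho):
    """Eqs. (1) and (2): hidden outputs s1 and the single network output s2."""
    s1 = [sigmoid(sum(w_ih[i][h] * k[i] for i in range(len(k))))
          for h in range(len(w_ho))]
    s2 = sigmoid(sum(w_ho[h] * s1[h] for h in range(len(w_ho))))
    return s1, s2


def train_step(k, target, w_ih, w_ho, n=0.5):
    """One weight update for one sample; returns the absolute error before it."""
    s1, s2 = forward(k, w_ih, w_ho)
    e = target - s2
    delta = e * s2 * (1.0 - s2)                      # Eq. (3)
    for h in range(len(w_ho)):
        # Assumption: standard hidden-layer delta, computed with the
        # pre-update w_ho; Eq. (5) as printed uses delta itself here.
        delta_h = delta * w_ho[h] * s1[h] * (1.0 - s1[h])
        w_ho[h] += n * delta * s1[h]                 # Eq. (4)
        for i in range(len(k)):
            w_ih[i][h] += n * delta_h * k[i]
    return abs(e)


rng = random.Random(0)
k = normalize([2.0, 10.0, 6.0])        # hypothetical feature values
n_hidden = len(k)                      # hidden neurons >= input neurons
w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in k]
w_ho = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
errors = [train_step(k, 1.0, w_ih, w_ho) for _ in range(200)]
```

Because a sigmoid output only approaches 0 or 1 asymptotically, a practical stopping rule would compare the error against a small tolerance (the "error goal" mentioned earlier) rather than wait for exact equality.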
Feed-forward back-propagation model
The FFBP artificial intelligence model consists of input, hidden, and output layers. The back-propagation learning algorithm was used to train these networks. During training, calculations were carried out from the input layer of the network toward the output layer, and error values were then propagated back to the prior layers. Feed-forward networks often have one or more hidden layers of sigmoid neurons followed by an output layer of linear neurons. Multiple layers of neurons with nonlinear transfer functions allow the network to learn both nonlinear and linear relationships between input and output vectors. The linear output layer lets the network produce values outside the range –1 to +1. If, on the other hand, the outputs of the network must be constrained, for example between 0 and 1, the output layer should use a sigmoid transfer function.[11]
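The effect of the output transfer function can be seen in a small forward pass. The sketch below is illustrative only; the weights, biases, and input are hypothetical, and the helper names `layer` and `feed_forward` are not from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases, transfer):
    """One fully connected layer: one weight row and one bias per neuron."""
    return [transfer(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def feed_forward(x, hidden_layers, output_layer, output_transfer):
    """Sigmoid hidden layers followed by an output layer with a chosen transfer."""
    activation = x
    for weights, biases in hidden_layers:
        activation = layer(activation, weights, biases, sigmoid)
    weights, biases = output_layer
    return layer(activation, weights, biases, output_transfer)


# Hypothetical network: 2 inputs, 2 hidden sigmoid neurons, 1 output neuron.
hidden = [([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0])]
output = ([[4.0, 4.0]], [0.0])
x = [1.0, 1.0]

linear_out = feed_forward(x, hidden, output, lambda v: v)   # unbounded output
bounded_out = feed_forward(x, hidden, output, sigmoid)      # squashed into (0, 1)
```

With the same weights, the linear output exceeds 1 while the sigmoid output stays strictly between 0 and 1, which is the distinction the paragraph draws.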
Cascade-forward back-propagation model
CFBP models are similar to feed-forward networks but include a weight connection from the input to each layer and from each layer to the successive layers. While two-layer feed-forward networks can potentially learn virtually any input-output relationship, feed-forward networks with more layers might learn complex relationships more quickly. For example, a three-layer network has connections from layer 1 to layer 2, layer 2 to layer 3, and layer 1 to layer 3. The three-layer network also has connections from the input to all three layers. The additional connections might improve the speed at which the network learns the desired relationship.[12] The CFBP artificial intelligence model is similar to the FFBP neural network in using the back-propagation algorithm for updating weights, but its main characteristic is that each layer of neurons is connected to all previous layers of neurons.[11]
The performance of CFBP and FFBP was evaluated using the mean squared normalized error, mean absolute error, sum squared error, and sum absolute error.
The functionality of the 12 training algorithms used in this work is summarized in Table 10. A short description of each training algorithm is given in Table 10,[25] and more analytical representations can be found in the cited sources.[13-24] The basic steps of the back-propagation algorithm have been described in several textbooks.[26,27] The functionality of the ten activation functions used in this work is summarized in Table 11.[28,29] The overall performance of the neural networks by algorithm and by activation function is shown in Figures 23 and 24.
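The cascade connections in the three-layer example can be made concrete with a short forward pass in which every layer sees the network input together with the outputs of all earlier layers. This is a sketch under stated assumptions: the weights are hypothetical, and the error measures use their plain textbook definitions without the normalization step implied by "mean squared normalized error".

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cascade_forward(x, layers):
    """Forward pass where each layer receives the input plus all earlier outputs.

    layers: list of (weights, biases, transfer). Each weight row must span
    the network input followed by the outputs of every preceding layer.
    """
    seen = list(x)
    out = []
    for weights, biases, transfer in layers:
        for row in weights:
            assert len(row) == len(seen), "row must cover input and earlier layers"
        out = [transfer(sum(w * v for w, v in zip(row, seen)) + b)
               for row, b in zip(weights, biases)]
        seen = seen + out
    return out


def error_measures(targets, outputs):
    """Sum/mean of squared and absolute errors (no normalization applied)."""
    errs = [t - o for t, o in zip(targets, outputs)]
    sse = sum(e * e for e in errs)
    sae = sum(abs(e) for e in errs)
    return {"sse": sse, "sae": sae, "mse": sse / len(errs), "mae": sae / len(errs)}


# Hypothetical three-layer network on a 2-value input.
layers = [
    ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], sigmoid),    # layer 1 sees the input
    ([[1.0, 0.0, 0.0, 0.0]], [0.0], sigmoid),           # layer 2 sees input + layer 1
    ([[2.0, 0.0, 1.0, 1.0, 0.0]], [0.0], lambda v: v),  # layer 3 sees input + layers 1, 2
]
y = cascade_forward([1.0, 0.0], layers)
```

In this example the first weight of layer 3 acts directly on the network input, a connection that a plain feed-forward network of the same depth would not have.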
Table 10
Description of artificial neural networks training algorithms
Table 11
Description of artificial neural networks activation functions
Figure 23
Classification accuracy outperformance in different neural network based on algorithms
Figure 24
Classification accuracy outperformance in different neural network based on activation function
CONCLUSION
In this study, neural networks were used to classify GC as malignant, benign, or normal. Based on the results obtained, the RP algorithm produced the best classification accuracy (92.54%, 93.17%, and 92.49%) during training of the single- and multi-layer FFBP neural networks; for the multilayer CFBP neural network, however, LM (92.62%) produced better classification accuracy. The normalized radial basis (92.15%), symmetric hard-limit (91.71%), Elliot symmetric sigmoid (91.56%), competitive (92.36%), and soft max (91.56%) activation functions supported the algorithms' performance in achieving better classification. It was also observed that, in general, networks with multiple hidden layers provided better classification accuracy than single hidden layer networks when classifying the breath samples of GC. In the near future, the procedures need to be standardized and a learning system developed that is widely acceptable to breath analysts worldwide. This would help reduce deaths due to GC, the second leading cause of cancer deaths worldwide.
Financial Support and Sponsorship
Nil.
Conflicts of Interest
There are no conflicts of interest.
Table 3
Classification accuracy of multilayer feed-forward neural network
Table 4
Classification accuracy of single-layer cascade-forward neural network
Table 7
Classification accuracy of multilayer feed-forward neural network based on activation functions
Table 8
Classification accuracy of single-layer cascade-forward neural network based on activation functions