Literature DB >> 29186913

Colorectal Cancer and Colitis Diagnosis Using Fourier Transform Infrared Spectroscopy and an Improved K-Nearest-Neighbour Classifier.

Qingbo Li1, Can Hao2, Xue Kang3, Jialin Zhang4, Xuejun Sun5, Wenbo Wang6, Haishan Zeng7.   

Abstract

Combining Fourier transform infrared spectroscopy (FTIR) with endoscopy, it is expected that noninvasive, rapid detection of colorectal cancer can be performed in vivo in the future. In this study, Fourier transform infrared spectra were collected from 88 endoscopic biopsy colorectal tissue samples (41 colitis and 47 cancers). A new method, viz., entropy weight local-hyperplane k-nearest-neighbor (EWHK), which is an improved version of K-local hyperplane distance nearest-neighbor (HKNN), is proposed for tissue classification. In order to avoid limiting high dimensions and small values of the nearest neighbor, the new EWHK method calculates feature weights based on information entropy. The average results of the random classification showed that the EWHK classifier for differentiating cancer from colitis samples produced a sensitivity of 81.38% and a specificity of 92.69%.

Entities:  

Keywords:  Fourier transform infrared spectroscopy (FTIR); colorectal cancer; entropy weight local-hyperplane k-nearest-neighbor (EWHK); pattern recognition

Mesh:

Year:  2017        PMID: 29186913      PMCID: PMC5750796          DOI: 10.3390/s17122739

Source DB:  PubMed          Journal:  Sensors (Basel)        ISSN: 1424-8220            Impact factor:   3.576


1. Introduction

Every year, the number of cancer-caused deaths rises [1]. Among all types of cancer, colorectal cancer is the third most common cause of cancer death worldwide, with an annual incidence of approximately one million cases and 600,000 deaths. The high mortality rate is partially attributed to the fact that established clinical procedures lack reliability and sensitivity for finding cancer at early stages [2,3]. Thus, the importance of early diagnosis in preventing and treating cancer mandates development of an accurate, fast, convenient, and inexpensive diagnostic tool for early detection [4]. Substantial modifications in cancer cells at the molecular level occur prior to morphological changes could be observed in tissues. Therefore, molecular spectroscopes are promising tools to detect cancer-related chemical changes at an early stage [1]. In particular, Fourier transform infrared spectroscopy (FTIR), a popular tool in modern analytical chemistry labs, provides rich information about the bio-molecules that act as building blocks in tissues and cells [5,6,7,8]. Existing clinical diagnosis requires taking biopsy via endoscope, which causes pain and requires lengthy pathological exams. In addition, surgical resection can lead to taking biopsies from non-cancerous tissues due to a number of factors, and there is always a possibility that malignant cells could go into blood stream during such invasive procedure. Therefore, being able to diagnose colorectal cancer in vivo or ex vivo could largely overcome the limitations of existing procedures, providing accurate and rapid determination of proper operative treatment. Combining an attenuated total reflectance (ATR) fiber probe-coupled FTIR spectrometer with an endoscope, a simple, rapid, and noninvasive method to detect human cancer tissues directly with minimal sample preparation may achieve results comparable to the gene expression-based method [9,10,11,12,13]. The current work was carried out on ex vivo tissues using an ATR-FTIR probe. In recent years, the use of FTIR to diagnose various cancers, such as lung, breast, gastric, liver, and colorectal cancer, has been reported [14,15,16,17,18,19,20,21,22,23]. Chemometric methods, such as support vector machine (SVM) [21], K-nearest neighbor classifier (KNN) [22], and K-local hyperplane distance nearest neighbor (HKNN), enable efficient information extraction and classification model calibration [24,25]. These afore mentioned reports indicate that FTIR spectroscopy along with an effective chemometric classifier could be a useful tool for screening a variety of human tumors. Until now, few studies have been developed for diagnosis and discrimination of colorectal cancers and colitis using FTIR spectroscopy [26]. Most research efforts focus on enabling a high-accuracy and high-sensitivity algorithm for cancer diagnosis. In this study, we combine preprocessing techniques and a novel classification method for analyzing FT-IR spectral data and achieve high accuracy in diagnosing colorectal cancer tissues.

2. Materials and Methods

2.1. Tissue Specimens

All colorectal cancer and colitis tissues were provided by the Medical Division of the First Hospital of Xi’an Jiaotong University, China. Informed consent was obtained from each patient prior to the study, and clinical diagnosis was confirmed by histopathology. A total of 88 tissue samples from 42 female and 46 male patients, were obtained. The average age was 53.7 years old with the oldest being 76 years and the youngest age being 21 years. One fresh endoscopic biopsy of 1–3 mm in diameter was obtained from each patient. According to the pathological exam results, the samples consisted of 41 cases of colitis and 47 cases of cancer.

2.2. Instrumentation and FTIR Data Collection

A WQF-500 FTIR spectrometer linked with a modified attenuated total reflectance (ATR) fiber probe (Beijing No. 2 optical instrument factory, Beijing, China) was used to acquire spectra. The FTIR spectrometer was equipped with a liquid-nitrogen-cooled mercury cadmium telluride (MCT) detector. Specimens were frozen and transported to the laboratory. Before experiment, frozen specimens were thawed at room temperature for approximately 3–5 min. Then, a background spectrum was acquired first. The ATR probe was placed at a 90° angle on the tissue specimen surface for spectrum acquisition. To achieve an acceptable signal-to-noise ratio at a resolution of 4 cm−1, 32 scans were recorded with wavenumbers ranging from 1000 cm−1 to 4000 cm−1. The procedure took approximately 1–2 min. After sample spectra were recorded, samples were stored in liquid nitrogen and sent for the histological examination as reference for spectral analysis.

2.3. Spectra Preprocessing Method

Two preprocessing methods, viz., smoothing and standard normal variate (SNV) [27,28,29], were performed on the FTIR spectra. First, the Savitsky–Golay algorithm with a window width of 5 points was applied to each spectrum to reduce random noise in the data. Then, all available spectra were normalized by the SNV method to remove multiplication interference, slope variation, and scatter effects generated by particles of the sample. For spectrum of sample at wavenumber , SNV standardization is defined as follows: where denotes the average spectrum of sample , denotes the number of wavelengths, and denotes the degree of freedom.

2.4. Entropy Weight Local-Hyperplane K-Nearest Neighbor Method

A novel classification method called entropy weight local-hyperplane k-nearest neighbor (EWHK) is proposed for discrimination between colorectal cancer and colitis. For the EWHK method, which is an upgrade of K-local hyperplane distance nearest neighbor (HKNN) algorithm [24,25], feature weights of training sets based on the information entropy are objectively considered to measure the importance of each single feature and to avoid the bias in high dimensions and the limit in small values of the nearest neighbor. On the other hand, HKNN treats every variable as an equally relevant component for classification. Therefore, the class labels of unknown samples are calculated according to the feature weights, the Euclidean distance, and the local hyperplane. Suppose that training set consists of training instances with classes. Each training instance consists of input features with known class label , for and . The class label of a query with input vector . The three stages in the proposed method were as follows: prototype selection, local hyperplane construction, and query classification. Firstly, the feature weight is estimated objectively based on the concept of information entropy to figure out the entropy weight according to the variance of every variable. Low information entropy resulted in high feature weight, which corresponds to a feature with better class separation capability. The entropy weight is calculated according to the following formula: where denotes the normalized th component of sample in the training set; denotes the regularization parameter; denotes the information entropy of the th feature of the sample. Hence, new weighted Euclidean distance metric between and is defined as follows: Then, a local hyperplane of class is constructed for the given query according to the distance metric and the number of nearest neighbors of class . Formally, the formula is as follows: where is the th nearest neighbor of class ; is solved by minimizing the distance between and using regularization. Thus, the calculated minimum distance is as follows: where is the regularization parameter. is minimized, and the equation , where is used to calculate , is derived. Finally, the class of the query is assigned as follows: .

3. Results and Discussion

3.1. Preprocessing of FTIR Spectra

In the process of measurement, the obtained spectra contain not only useful information regarding the molecular structure and the components of the measured samples, but also the noises, such as the high-frequency random noise, the baseline drift, and the stray light. This additional noise needs to be eliminated; otherwise, they will affect the discrimination result. Prior to classification analysis, data preprocessing is necessary to improve performance of the classification model. Savitzky–Golay (SG) smoothing reduces random noise, and SNV was applied to remove unwanted background variances to some extent. There is no bio-molecules absorbance peak in the 1800–2800 cm−1 region, the majority of peaks are in the 1000–1800 cm−1 region and in the 2800–3800 cm−1 region. Thus, the SNV method was separately used from 1000 to 1800 cm−1 and from 2800 to 3800 cm−1.The FTIR spectra of colitis and cancerous tissues before and after preprocessing are shown in Figure 1. After performing background correction and normalization, useful information of all spectra (such as at the wavenumber near 1743 cm−1, 2858cm−1, and 2924 cm−1) were marked as shown in Figure 2. The quality of FTIR spectra was greatly improved after data preprocessing.
Figure 1

The Fourier transform infrared spectra (FTIR) of colitis and cancerous tissues. (a) Original spectra of colitis tissues; (b) original spectra of cancerous tissues; (c) preprocessed spectra of colitis tissues using smoothing and standard normal variate (SNV); (d) preprocessed spectra of cancerous tissues using smoothing and SNV.

Figure 2

The typical FTIR spectra of colorectal biopsies after preprocessing.

3.2. Analysis of FTIR Spectra

A total of 88 spectra were obtained by FTIR spectroscopy within the spectral region between 1000 and 4000 cm−1. Because there were no bio-molecule absorbance peaks in the 1800–2800 cm−1 region, the SNV method was separately used from 1000 to 1800 cm−1 and from 2800 to 3800 cm−1 after smoothing. Figure 2 shows the spectra of colitis and colorectal cancer biopsies after preprocessing, where the band assignments of major absorption in the FTIR spectra of colorectal tissue are marked. The major peaks are similar for the spectra of colitis and colorectal cancer. However, the differences including peak shape and relative intensity can be observed. These results are reasonable because significant changes occurred in both the structure and composition of the main bio-molecules, which constitute the cell such as DNA, water, protein, and lipids, between cancerous and colitis tissues. As is shown in Figure 2, the spectral profile of cancerous tissues indicates the presence of fewer lipids and more proteins compared with colitis tissues. The peak intensity of the C=O band assigning to the lipids (near 1743 cm−1) and the peak intensity of C-H stretching vibration bands relating to the lipids (near 2958 cm−1, 2924 cm−1, and 2858 cm−1) decrease and even disappear in the spectra of malignant tissues, making it essential to consume fat in the malignant tissue to meet the nutritional and energy requirements in carcinoma development. The spectral profile of cancerous tissues indicates the presence of proteins at wavenumbers ~1643 cm−1 and ~1550cm−1, which belong to amide I band and amide II band of the protein, respectively. The relative intensity near decreases more for the spectra of cancerous tissues than for those in colitis biopsies because of the changes in the proportion of proteins during tumor formation. The intensity of the ~1460 cm−1 peak is weaker than that of the ~1400 cm−1 peak in the spectra of the cancerous samples, while the peak at ~1460 cm−1 is stronger than or equal to that of ~1400 cm−1 in the spectra of colitis samples. Cancerous tissue contains greater amounts of nucleic acids, collagen, and certain amino acids compared to the colitis ones. In colitis tissues, the peak at ~1240 cm−1 is weaker, and the band near 1310 cm−1 becomes weak and sometimes disappears. The absorption peak ~1080 cm−1 assigned to nucleic acid is obviously weaker in the spectra of colitis samples than that in the spectra of cancerous tissues. The peak at ~1160 cm−1 assigned to carbohydrate decreases noticeably in the spectra of the cancerous samples. Thus, the characteristics mentioned above between cancerous and colitis tissues provide the basis for spectroscopic diagnosis. Specific assignments of individual peaks can be found in Table 1.
Table 1

Peak positions and assignments of FTIR bands in colon tissues.

Peak Positions (cm−1)Major Assignment
1080Stretching vibration(DNA, RNA)
1160Carbohydrate
1240Asymmetric stretching vibration(RNA)
1310C-H deformation vibration & Amide III (protein)
1550amide II (protein)
1643amide I (protein)
1743C=O stretching vibration(lipids)
2858, 2924, 2958C-H stretching vibration (lipids)
3300N-H stretching vibration( protein), O-H stretching vibration(water)

3.3. Classification Analysis

After spectra preprocessing, 88 spectra data (41 colitis and 47 cancers) were analyzed to identify their class labels. The total 88 spectra were divided into two data sets. The 44 FTIR spectra (21 colitis and 23 cancers) after preprocessing were randomly selected as the training set. The other 44 FTIR spectra (20 colitis and 24 cancers) after preprocessing were randomly selected as the test set. Both EWHK and traditional classification models were built by the training set and validated by the test set. Traditional classification models include SVM and HKNN in this paper. These procedures were repeated five times. The five predicted results were averaged. Table 2 shows the classification results of colorectal tissues with entropy weight local-hyperplane k-nearest neighbor (EWHK). Table 3 shows the average of the five results using the three different methods.
Table 2

The average of predict results of colorectal tissues with entropy weight local-hyperplane k-nearest neighbor (EWHK).

Histologic ExaminationThe Predicted Results of Fourier Transform Infrared Spectroscopy (FTIR) Spectra
CancerColitis
Cancer389
Colitis338
Table 3

Comparison of the average of statistical analysis results with several classification models.

MethodSensitivity (%)Specificity (%)PPV 1 (%)NPV 2 (%)Accuracy (%)
EWHK81.3892.6992.6880.8585.91
HKNN66.4696.1995.2570.4379.77
SVM72.4668.7070.3071.8770.45

Note: 1 Positive predictive value; 2 Negative predictive value.

The experiment results are summarized in Table 2 and Table 3. The classification results of colorectal tissue samples with EWHK (Table 2) showed that, among the 88 cases of colorectal tissue samples, only three colitis samples and nine cancer samples are misclassified. In Table 3, EWHK achieved a high accuracy, viz., 85.91%. In addition, other statistics results of detection of colorectal biopsies by FTIR spectroscopy with EWHK and traditional classification models are shown in Table 3. For colorectal cancer diagnosis with EWHK, sensitivity is 81.38%, specificity is 92.69%, predictive value of a positive test is 92.68%, and the predictive value of a negative test is 80.85%. In comparison, statistical analysis results with HKNN were worse than those with EWHK in diagnosing colorectal cancer tissues, achieving 66.46% sensitivity and 79.77% accuracy. The SVM works worse than HKNN. SVM can perform well with large-scale data. However, the choices of the parameters for the kernel are complex and unstable. The HKNN works well only for small values of the nearest-neighbor. However, the accuracy decreases as values of the nearest-neighbor increase. The FTIR spectra can be classified accurately with EWHK because it considers the influence of feature weight according to the information entropy of every variable. In conclusion, the results indicate that the EWHK has better capability in identifying colorectal cancer from colitis.

4. Conclusions

This study shows that it is feasible to classify colitis and cancers using FTIR spectroscopy and chemometrics. FTIR fiber-optic ATR spectroscopy is a powerful tool to detect changes at the molecular level and can rapidly capture small changes in molecular compositions and structures. Therefore, it has the potential to be further developed into noninvasive, in vivo, and real-time detection tools of cancerous tissues before a surgical operation is required. Data pre-processing such as smoothing and SNV greatly improved the signal-to-noise ratio for the FTIR spectra of colorectal tissues, and the EWHK classifier achieved a classification accuracy of 85.91%. The reason that EWHK performs well is because feature weights are calculated according to the information entropy of every variable. The proposed preprocessing and classification method using FTIR spectroscopy is effective and practical for in vivo colorectal cancer or other malignant tissue diagnosis and will be pursued in future studies.
  21 in total

1.  The potential of chiroptical and vibrational spectroscopy of blood plasma for the discrimination between colon cancer patients and the control group.

Authors:  Michal Tatarkovič; Michaela Miškovičová; Lucie Šťovíčková; Alla Synytsya; Luboš Petruželka; Vladimír Setnička
Journal:  Analyst       Date:  2015-04-07       Impact factor: 4.616

Review 2.  IR microspectroscopy: potential applications in cervical cancer screening.

Authors:  Michael J Walsh; Matthew J German; Maneesh Singh; Hubert M Pollock; Azzedine Hammiche; Maria Kyrgiou; Helen F Stringfellow; Evangelos Paraskevaidis; Pierre L Martin-Hirsch; Francis L Martin
Journal:  Cancer Lett       Date:  2006-05-19       Impact factor: 8.679

3.  Fourier transform infrared spectroscopy of gallbladder carcinoma cell line.

Authors:  Jun-Kai Du; Jing-Sen Shi; Xue-Jun Sun; Jian-Sheng Wang; Yi-Zhuang Xu; Jin-Guang Wu; Yuan-Fu Zhang; Shi-Fu Weng
Journal:  Hepatobiliary Pancreat Dis Int       Date:  2009-02

4.  Identification and characterization of colorectal cancer using Raman spectroscopy and feature selection techniques.

Authors:  Shaoxin Li; Gong Chen; Yanjiao Zhang; Zhouyi Guo; Zhiming Liu; Junfa Xu; Xueqiang Li; Lin Lin
Journal:  Opt Express       Date:  2014-10-20       Impact factor: 3.894

5.  Intraoperative diagnosis of benign and malignant breast tissues by fourier transform infrared spectroscopy and support vector machine classification.

Authors:  Peirong Tian; Weitao Zhang; Hongmei Zhao; Yutao Lei; Long Cui; Wei Wang; Qingbo Li; Qing Zhu; Yuanfu Zhang; Zhi Xu
Journal:  Int J Clin Exp Med       Date:  2015-01-15

6.  [Research on Application of Fourier Transform Infrared Spectrometry in the Diagnosis of Lymph Node Metastasis in Gastric Cancer].

Authors:  Yue-kui Bai; Li-wei Yu; Le Zhang; Jing Fu; Hui Leng; Xiao-jun Yang; Jun-qiang Ma; Xiao-juan Li; Xiu-juan Li; Qing Zhu; Yuan-fu Zhang; Xiao-feng Ling; Wen-lan Cao
Journal:  Guang Pu Xue Yu Guang Pu Fen Xi       Date:  2015-03       Impact factor: 0.589

7.  In vivo and in situ detection of colorectal cancer using Fourier transform infrared spectroscopy.

Authors:  Qing-Bo Li; Zhi Xu; Neng-Wei Zhang; Li Zhang; Fan Wang; Li-Min Yang; Jian-Sheng Wang; Su Zhou; Yuan-Fu Zhang; Xiao-Si Zhou; Jing-Sen Shi; Jin-Guang Wu
Journal:  World J Gastroenterol       Date:  2005-01-21       Impact factor: 5.742

8.  Infrared spectral histopathology for cancer diagnosis: a novel approach for automated pattern recognition of colon adenocarcinoma.

Authors:  Jayakrupakar Nallala; Marie-Danièle Diebold; Cyril Gobinet; Olivier Bouché; Ganesh Dhruvananda Sockalingum; Olivier Piot; Michel Manfait
Journal:  Analyst       Date:  2014-08-21       Impact factor: 4.616

9.  [Study on malignant and normal rectum tissues using IR and 1H and 31P NMR spectroscopy].

Authors:  Xiu-xiang Gao; Hong-wei Yao; Jun-kai Du; Mei-xian Zhao; Jian Qi; Hui-zhen Li; Qing-hua Pan; Yi-zhuang Xu; Jin-guang Wu
Journal:  Guang Pu Xue Yu Guang Pu Fen Xi       Date:  2009-04       Impact factor: 0.589

10.  Rapid detection of melamine adulteration in dairy milk by SB-ATR-Fourier transform infrared spectroscopy.

Authors:  Sana Jawaid; Farah N Talpur; S T H Sherazi; Shafi M Nizamani; Abid A Khaskheli
Journal:  Food Chem       Date:  2013-06-02       Impact factor: 7.514

View more
  6 in total

1.  An investigation on annular cartilage samples for post-mortem interval estimation using Fourier transform infrared spectroscopy.

Authors:  Zhouru Li; Jiao Huang; Zhenyuan Wang; Ji Zhang; Ping Huang
Journal:  Forensic Sci Med Pathol       Date:  2019-08-01       Impact factor: 2.007

Review 2.  Liquid biopsy for cancer diagnosis using vibrational spectroscopy: systematic review.

Authors:  D J Anderson; R G Anderson; S J Moug; M J Baker
Journal:  BJS Open       Date:  2020-05-19

3.  Fourier transform infrared spectroscopic imaging of colon tissues: evaluating the significance of amide I and C-H stretching bands in diagnostic applications with machine learning.

Authors:  Cai Li Song; Martha Z Vardaki; Robert D Goldin; Sergei G Kazarian
Journal:  Anal Bioanal Chem       Date:  2019-08-16       Impact factor: 4.142

4.  Synergy Effect of Combined Near and Mid-Infrared Fibre Spectroscopy for Diagnostics of Abdominal Cancer.

Authors:  Thaddäus Hocotz; Olga Bibikova; Valeria Belikova; Andrey Bogomolov; Iskander Usenov; Lukasz Pieszczek; Tatiana Sakharova; Olaf Minet; Elena Feliksberger; Viacheslav Artyushenko; Beate Rau; Urszula Zabarylo
Journal:  Sensors (Basel)       Date:  2020-11-23       Impact factor: 3.576

5.  Colorectal Cancer Detected by Machine Learning Models Using Conventional Laboratory Test Data.

Authors:  Hui Li; Jianmei Lin; Yanhong Xiao; Wenwen Zheng; Lu Zhao; Xiangling Yang; Minsheng Zhong; Huanliang Liu
Journal:  Technol Cancer Res Treat       Date:  2021 Jan-Dec

6.  Fourier-Transform Infra-Red Microspectroscopy Can Accurately Diagnose Colitis and Assess Severity of Inflammation.

Authors:  Charlotte Keung; Philip Heraud; Nathan Kuk; Rebecca Lim; William Sievert; Gregory Moore; Bayden Wood
Journal:  Int J Mol Sci       Date:  2022-03-05       Impact factor: 5.923

  6 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.