
Improving the Prediction of Survival in Cancer Patients by Using Machine Learning Techniques: Experience of Gene Expression Data: A Narrative Review.

Azadeh Bashiri, Marjan Ghazisaeedi, Reza Safdari, Leila Shahmoradi, Hamide Ehtesham.

Abstract

BACKGROUND: Despite many advances in the early detection of disease, cancer patients still have a poor prognosis and low survival rates. Recently, microarray technologies have been used to gather thousands of data points on the gene expression levels of cancer cells. These data are the main indicators in predicting cancer survival. This study highlights how machine learning techniques improve survival prediction based on gene expression data in cancer patients.
METHODS: This review was conducted by searching scientific databases and e-journals for articles published between 2000 and 2016. We used keywords such as machine learning, gene expression data, survival, and cancer.
RESULTS: Studies have shown that gene expression data are more accurate and effective than clinical data in survival prediction. Because such data are complex and high in volume, studies have highlighted the importance of machine learning algorithms such as Artificial Neural Networks (ANN) for finding the distinctive gene expression signatures of cancer patients. These algorithms improve the efficiency of probing and analyzing gene expression in cancer profiles for survival prediction.
CONCLUSION: Given the capabilities of machine learning techniques in proteomics and genomics applications, developing clinical decision support systems based on these methods for analyzing gene expression data can prevent potential errors in survival estimation, provide appropriate and individualized treatments to patients, and improve the prognosis of cancers.


Keywords:  Cancer; Clinical decision support system; Gene expression; Machine-learning techniques; Survival

Year:  2017        PMID: 28451550      PMCID: PMC5402773     

Source DB:  PubMed          Journal:  Iran J Public Health        ISSN: 2251-6085            Impact factor:   1.429


Introduction

Today, cancer is one of the major health problems worldwide (1–3). Cancer research has become one of the most common research efforts of the 21st century. Based on a 2002 dataset, the International Agency for Research on Cancer estimated the number of cancer patients at 25 million, and in 2004 the American Cancer Society announced that cancer would officially replace heart disease as the main cause of death (4–6). Despite many advances in the early detection of disease, cancer patients have a poor prognosis and their survival rates are low (7–9). A correct understanding and analysis of the biological behavior of the tumor helps in choosing the right treatment and has the potential to improve cancer outcomes as well. Accurate estimation of prognosis and survival duration is the most important part of clinical decision-making in patients with malignant disease (9). The first step toward ensuring that cancer patients receive proper care is to improve physicians' ability to formulate this type of estimate (10). The prediction of prognosis covers a wide range of decisions about different aspects of cancer treatment (10). Gene expression profiles are obtained from different tissue types (11). Comparing the genes expressed in normal and diseased tissue can bring better insight into cancer pathology and help physicians in decision-making. Checking gene expression patterns for attributes associated with clinical behavior is very important, because these patterns inform prognosis and lead to an alternative approach to understanding the molecular and physiological mechanisms (10–12). Gene expression pattern analysis offers ways to improve the diagnosis and risk classification of many cancers (11, 12). Studies have marked the power of these analytical methods over histological and clinical criteria in survival prediction.
Recently, in the artificial intelligence domain, the development of clinical decision support systems based on machine-learning methods for analyzing gene expression data has facilitated and improved medical prognosis. Studies have shown that machine learning algorithms are more accurate than regression models in predicting cancer survival (12, 13). Gene expression data have the potential to prevent errors caused by the fatigue and impatience of oncology experts in survival estimation. Analyzing such data with machine-learning techniques leads to clinical decision support systems that estimate survival time correctly and so provide patients with treatments appropriate to their expected survival. This can prevent unnecessary surgical and treatment procedures, which consume human resources and time and impose unnecessary costs on patients and the health care system (8, 14, 15). This study highlights the advantages of machine learning techniques in survival prediction of cancer patients based on gene expression data.

Methods

This review was conducted by searching for articles published between 2000 and 2016 in scientific databases (Scopus, PubMed, Google Scholar, and IEEE) and an e-journal platform (ScienceDirect), using keywords such as machine learning algorithms, gene expression data, survival, and cancer. Non-English articles, articles without available full text, and studies not published as journal articles were excluded.

Results

Microarray Technology and Gene Expression Data

Several genes and proteins with abnormal function and expression contribute to the development and pathogenesis of cancer (9, 16, 17). Gene expression measures the level of gene activity in a tissue and thus gives information about the complex activities of cells. These data are usually obtained by measuring the activation and function of genes as they are expressed. Since cancers are associated with genetic abnormalities, gene expression data can reveal these abnormalities. Microarray techniques have been widely used to obtain this information. A microarray is a technology for measuring the expression and activity of dozens, hundreds, or thousands of genes or proteins simultaneously on a small scale, in order to monitor changes in their structure and activity. The microarray process involves preprocessing the raw data to obtain a gene expression matrix and then analyzing it to show the differences and similarities in gene expression (7, 13, 18, 19). The ultimate goal of this technology is to discover new treatments or refine existing ones based on the genetic makeup of tumors (11, 20, 21). Studies based on microarray technology fall into three categories: class comparison, class discovery, and class prediction. Class comparison compares the gene expression profiles of two or more groups of patients; the aim is to find the genes that are expressed at different levels between groups. Class discovery identifies subgroups that share similar gene expression profiles; the most common technique in this type of study is clustering. In class prediction, the categories are predefined and the aim is to determine which class a target sample belongs to (8, 16, 19). In recent years, microarray technology has brought great advances to the biomedical sciences and has been applied across a wide range of genomic analyses, such as drug discovery, gene identification, and clinical diagnostics.
The ultimate goal of this technology in cancer is to discover new genetic markers for the diagnosis, treatment, and prediction of patient outcomes based on the genetic components of the tumors (8, 16).
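As an illustration of the class comparison category described above, the following minimal sketch ranks genes by how differently they are expressed between two patient groups. It is not taken from any of the reviewed studies: the gene names, the expression values, and the choice of Welch's t-statistic as the ranking score are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent samples."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def rank_genes(expression, labels):
    """Rank genes by absolute t-statistic between group 1 and group 0.

    expression: dict mapping gene name -> list of values, one per sample
    labels: list of group labels (0 or 1), one per sample
    """
    scores = {}
    for gene, values in expression.items():
        group0 = [v for v, lab in zip(values, labels) if lab == 0]
        group1 = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = welch_t(group1, group0)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical toy matrix: 3 genes x 6 samples (log-scale intensities).
expression = {
    "GENE_A": [2.1, 2.0, 2.3, 5.9, 6.1, 6.0],
    "GENE_B": [4.0, 4.2, 3.9, 4.1, 4.0, 4.3],
    "GENE_C": [7.5, 7.1, 7.3, 6.2, 6.6, 6.4],
}
labels = [0, 0, 0, 1, 1, 1]  # e.g. 0 = normal tissue, 1 = tumor tissue

ranking = rank_genes(expression, labels)
for gene, t in ranking:
    print(f"{gene}: t = {t:.2f}")
```

In a real microarray study the matrix would hold thousands of genes, so the resulting scores would also need a correction for multiple testing, which this sketch omits.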

Survival prediction in patients with cancers

Cancer prognosis involves three predictive tasks: predicting cancer susceptibility, cancer recurrence, and cancer survivability. The first predicts the likelihood of developing a type of cancer before the disease occurs. The second predicts the likelihood of the cancer recurring, and the third predicts outcomes after diagnosis, such as life expectancy, survivability, progression, and tumor drug sensitivity (22). Survival is the interval from a defined starting point until the occurrence of an event, such as the period between the beginning and end of a recovery, or the time from diagnosis of an illness until death (23). Survival analysis is often used to evaluate time-to-event data in medical research. Oncologists often face the difficult task of estimating prognosis and survival in patients with incurable malignancy. Their assessments in these cases are based on clinical experience and comprehensive knowledge of the patients. However, such predictions, with 20% to 60% accuracy, are largely unreliable and usually overly optimistic. Physicians tend to overestimate survival in patients with advanced cancer, and sometimes, despite patients' willingness to hear their prognosis, refuse to disclose it (23, 24). Genetic factors, the size, grade, and stage of the tumor, and the relationship between physicians and patients are vital in survivability prediction. When physicians have a good understanding of their patients' prognosis, patients are likely to receive less invasive care (8, 16, 25). Hence, in medical decision-making, the physician's ability to formulate a correct estimate of survival in patients with advanced and incurable cancers is necessary (24, 26).
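To make the notion of time-to-event data concrete, the sketch below computes a Kaplan-Meier survival curve, a standard estimator in survival analysis. The review itself does not name this estimator, so it is offered only as an illustrative assumption; the follow-up times and event flags are invented.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times: follow-up duration for each patient (e.g. in months)
    events: 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        n_at_risk -= removed
    return curve

# Hypothetical cohort of six patients.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t:>2}: S(t) = {s:.3f}")
```

Censored patients (event flag 0) contribute to the number at risk up to their last follow-up but never count as deaths, which is what distinguishes survival analysis from a plain average of observed times.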

Machine learning in analysis of gene expression data

Gene expression levels are associated with the overall survival of cancer patients (9). There is a strong association between gene expression levels and survival, and various studies have indicated that this type of data is more powerful than clinical data and other prognostic factors (21, 27). The superiority of gene expression data has been demonstrated in providing individualized and appropriate treatments for patients with malignancies (28, 30–33). Analysis of gene profiles helps improve the accuracy of survival estimation and histopathological classification (28). Using predictive tools is an important step in this analysis (26). Machine learning methods can build survival prediction models based on gene expression data (16). Machine learning is a subfield of computer science that creates and evaluates algorithms to facilitate pattern recognition, classification, and prediction. Supervised and unsupervised learning are the two main forms of machine learning with applications in biology. Examples of supervised techniques are ANN and decision trees, and a popular unsupervised technique is clustering (34–36). In analyzing gene expression with machine learning techniques, preprocessing methods first extract and analyze the gene expression data; in this step, the inputs are identified. Predictive models are then built and tested using machine-learning algorithms. Finally, the model uses the inputs (gene expression data) to predict outputs (survival time) in cancer patients (8, 9, 28). Fig. 1 summarizes this process (9).
Fig. 1: The process of analyzing gene expression data by using machine-learning techniques (9)
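The three steps of this process (preprocess the expression data, build and test a predictive model, then predict survival time) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any reviewed study: a k-nearest-neighbour regressor stands in for the ANN and other algorithms discussed in the review, the patient data are invented, and censoring is ignored for simplicity.

```python
from math import dist
from statistics import mean, pstdev

def standardize(train, test):
    """Z-score each gene using means and SDs from the training set only."""
    columns = list(zip(*train))
    mu = [mean(c) for c in columns]
    sd = [pstdev(c) or 1.0 for c in columns]  # guard against constant genes

    def scale(rows):
        return [[(x - m) / s for x, m, s in zip(row, mu, sd)] for row in rows]

    return scale(train), scale(test)

def knn_predict(train_x, train_y, x, k=2):
    """Predict survival time as the mean over the k nearest training profiles."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], x))
    return mean([y for _, y in ranked[:k]])

# Hypothetical data: rows are patients, columns are two genes;
# targets are survival times in months (censoring is ignored here).
train_x = [[1.0, 9.0], [1.2, 8.5], [5.0, 3.0], [5.5, 2.5]]
train_y = [10.0, 12.0, 48.0, 52.0]
test_x = [[1.1, 8.8], [5.2, 2.9]]
test_y = [11.0, 50.0]

# Step 1: preprocessing. Step 2: build and test. Step 3: predict.
train_z, test_z = standardize(train_x, test_x)
predictions = [knn_predict(train_z, train_y, x) for x in test_z]
mae = mean([abs(p - y) for p, y in zip(predictions, test_y)])
print(predictions, mae)
```

Scaling with training-set statistics only keeps information from the test patients out of the model, so the test error remains an honest estimate of how the model would perform on new patients.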

Different studies have strongly emphasized the ability of machine-learning techniques to identify patterns, process the interactions in gene expression data, and improve the accuracy of cancer prediction, including susceptibility and recurrence (9, 37). These methods can also reduce potential errors caused by the fatigue and impatience of oncology experts in survival estimation (37, 38). Predictive models created with these analytical techniques and based on gene expression data can help physicians optimize clinical decision-making, provide individualized treatment, manage patients, and reduce the cost burden on patients and healthcare systems. Table 1 lists each of these studies with its cancer type, number of participants, machine learning technique, and reported benefit (9, 12–14, 18, 21, 22, 27–33, 38–47).
Table 1: Experiences of survival prediction by using machine learning methods and based on gene expression data on cancer patients

| Cancer type | Participants number | Machine learning technique | Training data | Benefits | Reference |
|---|---|---|---|---|---|
| Mantle Cell Lymphoma (MCL) | N/A | Bayesian Model Averaging (BMA) | Genomic | Analyzing survival with high precision and low cost by using BMA | Moslemi et al (2016) |
| Esophageal adenocarcinoma | 64 | Tail-strength statistic and Cox regression analysis | Genomic | Creating high association between gene expression levels and survival | Pennathur et al (2013) |
| Esophageal squamous cell carcinoma | 12 | Clustering | Genomic | Well predicting by using gene expression data than other prognostic factors | Ishibashi et al (2013) |
| Non-small cell lung carcinomas | 91 | Hierarchical clustering | Genomic | Improving histopathological classification | Hou et al (2010) |
| Diffuse large B-cell lymphoma (DLBCL) | 58 | Artificial neural networks | Genomic/Clinical | Creating correct prediction of survival time with high accuracy | Yen-Chen Chen et al (2009) |
| Astrocytic tumor | 65 | Artificial neural network | Genomic | Creating a novel model by using ANN for grading Astrocytic tumor | Petalidis et al (2008) |
| Lung adenocarcinomas | 86 | Random committee and Bayesian belief networks | Genomic | Providing correct prediction of patient outcomes and individualized treatment and also increase survival time | Guo et al (2006) |
| Esophageal carcinoma | 418 | Artificial neural networks | Genomic/Clinical | Providing more accurate prognosis | Sato et al (2005) |
| Breast carcinomas | 295 | Decision tree analysis | Genomic/Clinical | Improving cancer classifications, clinical decision making and patients' treatment | Chang et al (2005) |
| Malignant pleural mesothelioma | 21 | Artificial neural networks | Genomic | Improving appropriate therapy | Pass et al (2004) |
| Hepatocellular carcinoma (HCC) | 90 | Clustering | Genomic | Providing a source for treatment selection | Lee et al (2004) |
| Diffuse large B-cell lymphoma | N/A | Clustering techniques | Genomic/Clinical | Providing powerful tool for diagnosing and treating cancer | Bair et al (2004) |
| Neuroblastoma | 49 | Artificial neural networks | Genomic | Helping physicians in patient management | Wei et al (2004) |
| Breast cancer | 78 | Artificial neural networks | Genomic | Selecting patients with poor prognosis | Lancashire et al (2003) |
| Diffuse large B-cell lymphoma | 40 | Artificial neural networks | Genomic/Clinical | Predicting survival time with high accuracy | O'Neil and Song (2003) |
| Lung adenocarcinomas | N/A | Univariate Cox analysis | Genomic | Determining of high-risk groups | Beer et al (2002) |
| Diffuse large B-cell lymphoma | 40 | Fuzzy neural network | Genomic | Extracting biological markers with high accuracy | Ando et al (2002) |

Discussion

Given the priority of cancer as one of the world's major health issues, with the mortality and costs it imposes, there is an urgent need for survival prediction strategies. Estimating survival is one of the main goals in caring for cancer patients. It leads to better management, optimal use of resources, and individualized treatment (1, 3). In recent years, the advent of machine learning algorithms for extracting and analyzing gene expression in cancer tissues has led to the use of these techniques in cancer detection, diagnosis, classification, and prediction. Research in biology and medicine has also confirmed new achievements in cancer prognosis (34–36). When gene expression data are analyzed with machine-learning techniques for survivability prediction, the data are first extracted and analyzed by preprocessing methods. Then, by building and testing predictive models with machine learning algorithms, survival duration can be predicted (8, 28). In this study, we highlighted the advantages of using machine-learning techniques to analyze gene expression data for correct prediction of patients' survival and for histopathological classification. Across studies of different kinds of cancer, our findings indicated that gene expression data are more powerful in survival prediction than other prognostic factors such as clinical data. The studies of Ishibashi et al. on patients with esophageal squamous cell carcinoma and Pennathur et al. on esophageal adenocarcinoma support this point. According to their findings, there is a strong association between gene expression levels and survival, and gene expression data provide better prediction of outcomes in cancer patients (21, 27). Some studies combined genetic and clinical data in the prediction of survival.
Using gene expression data can provide more accurate prognosis and cancer classification, as well as sound decision-making for patients' treatment (9, 28, 31, 40, 44). Studies also supported the effectiveness of gene expression data in helping physicians with patient management and in providing a source for appropriate treatment of cancer patients (32, 45). Bair et al. highlighted that such data provide a powerful tool for diagnosing and treating any type of malignancy (44). Because of the capabilities of machine-learning techniques across a wide range of tasks including detection, diagnosis, classification, and prediction, we found it feasible to use such analytical techniques to create a clinical decision support tool for predicting survival in cancer patients (48, 49). Chang et al. showed improved classification of breast cancer using decision tree analysis (31). Other studies noted the high precision and low cost of Bayesian model techniques in survival prediction for mantle cell lymphoma (30). In addition, Guo et al., using random committee and Bayesian belief networks on lung adenocarcinomas, showed the capacity of these algorithms to correctly predict patient outcomes (42). Among machine learning algorithms, our review of various studies indicated the frequent use of ANN, with high accuracy, in survival prediction for any malignancy based on gene expression data (9, 28, 32, 43, 45, 46). However, the combination of ANN and fuzzy logic, with 93% accuracy, is a superior and powerful tool for extracting significant biological markers (38, 39).

Conclusion

Developing clinical decision support systems based on these algorithms for analyzing gene expression data can improve survival prediction and prognosis of cancer patients.

Ethical considerations

Ethical issues (Including plagiarism, Informed Consent, misconduct, data fabrication and/or falsification, double publication and/or submission, redundancy, etc.) have been completely observed by the authors.
References

1. Chang-Yu Wang; Tsair-Fwu Lee; Chun-Hsiung Fang; Jyh-Horng Chou. Fuzzy logic-based prognostic score for outcome prediction in esophageal cancer. IEEE Trans Inf Technol Biomed. 2012-08-02.
2. Yutaka Shimada; Fumiaki Sato; Kazuharu Shimizu; Gozoh Tsujimoto; Kazuhiro Tsukada. cDNA microarray analysis of esophageal cancer: discoveries and prospects. Gen Thorac Cardiovasc Surg. 2009-07-14. (Review)
3. Motahareh Moghtadaei; Mohammad Reza Hashemi Golpayegani; Farshad Almasganj; Arash Etemadi; Mohammad R Akbari; Reza Malekzadeh. Predicting the risk of squamous dysplasia and esophageal squamous cell carcinoma using minimum classification error method. Comput Biol Med. 2013-11-26.
4. R Wong; R Malthaner. Esophageal cancer: a systematic review. Curr Probl Cancer. 2000 Nov-Dec. (Review)
5. Lan Guo; Yan Ma; Rebecca Ward; Vince Castranova; Xianglin Shi; Yong Qian. Constructing molecular classifiers for the accurate prognosis of lung adenocarcinoma. Clin Cancer Res. 2006-06-01.
6. Howard Y Chang; Dimitry S A Nuyten; Julie B Sneddon; Trevor Hastie; Robert Tibshirani; Therese Sørlie; Hongyue Dai; Yudong D He; Laura J van't Veer; Harry Bartelink; Matt van de Rijn; Patrick O Brown; Marc J van de Vijver. Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc Natl Acad Sci U S A. 2005-02-08.
7. Lawrence P Petalidis; Anastasis Oulas; Magnus Backlund; Matthew T Wayland; Lu Liu; Karen Plant; Lisa Happerfield; Tom C Freeman; Panayiota Poirazi; V Peter Collins. Improved grading and survival prediction of human astrocytic brain tumors by artificial neural network analysis of gene expression microarray data. Mol Cancer Ther. 2008-04-29.
8. Harvey I Pass; Zhandong Liu; Anil Wali; Raphael Bueno; Susan Land; Daniel Lott; Fauzia Siddiq; Fulvio Lonardo; Michele Carbone; Sorin Draghici. Gene expression profiles predict survival and progression of pleural mesothelioma. Clin Cancer Res. 2004-02-01.
9. Tatsuya Ando; Miyuki Suguro; Taizo Hanai; Takeshi Kobayashi; Hiroyuki Honda; Masao Seto. Fuzzy neural network applied to gene expression profiling for predicting the prognosis of diffuse large B-cell lymphoma. Jpn J Cancer Res. 2002-11.
10. Adi L Tarca; Vincent J Carey; Xue-wen Chen; Roberto Romero; Sorin Drăghici. Machine learning and its applications to biology. PLoS Comput Biol. 2007-06. (Review)