Healthcare Biclustering-Based Prediction on Gene Expression Dataset.

M Ramkumar¹, N Basker², D Pradeep³, Ramesh Prajapati⁴, N Yuvaraj⁵, R Arshath Raja⁵, C Suresh⁶, Rahul Vignesh⁷, U Barakkath Nisha⁸, K Srihari⁹, Assefa Alene¹⁰.

Abstract

In this paper, we develop a healthcare biclustering model that reduces the inconveniences linked to clustering gene expression data. The study uses two separate healthcare biclustering approaches to identify gene activity that is specific to certain conditions and to remove redundant components of broad gene information. Machine learning and heuristic algorithms have become extensively used for healthcare biclustering, because populations of candidate solutions allow a greater portion of the search space to be explored. The model is evaluated in terms of the average match score for nonoverlapping modules and for overlapping modules under the influence of noise, for constant and additive biclusters, and in terms of run time. The results show that the proposed FCM biclustering method achieves a higher average match score and a shorter run time than the existing PSO-SA and fuzzy logic healthcare biclustering methods.

Year: 2022 PMID： 35265709 PMCID： PMC8901349 DOI： 10.1155/2022/2263194

Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411

1. Introduction

DNA microarray technologies measure the expression levels of thousands of genes under different experimental conditions [1]. Local patterns in these data have motivated a large body of work on pattern-based search. Because it can uncover hidden patterns, healthcare biclustering is commonly applied to biological data, and it is particularly important in the analysis of gene expression data. Its primary objective is to identify groups of genes that behave similarly under a subset of samples (conditions). However, the pioneering algorithms in the literature have shown certain limitations in the quality of the biclusters they uncover [2]. Biological entities such as genes, proteins, and metabolites are linked together in a network [3]. One of the key issues in healthcare bioinformatics is analyzing these entities and extracting biologically significant information from them. Gene expression is the mechanism by which a protein is generated from a gene. The process happens in two main steps, transcription and translation; transcription involves the production of an mRNA molecule by the enzyme RNA polymerase and the processing of the resulting mRNA [4]. DNA microarray technologies measure the concentration of mRNA as numeric values, known as gene expression data, which makes it possible to evaluate the expression levels of thousands of genes under various experimental conditions [5]. These technologies have become indispensable tools for many biologists because they allow gene expression to be monitored across the genome of a particular organism. DNA microarrays have been used in various areas of research, including gene discovery [6], disease diagnosis [7], and drug discovery [8]. They are also used to identify the functions of genes and the mechanisms that underlie diseases.
In genetic engineering, gene expression is the most important stage in the path from genotype to phenotype. Gene expression occurs when DNA is transcribed into RNA, the process by which the information in a gene is used to synthesize a gene product. These products are usually proteins, but in non-protein-coding genes the product is a functional RNA [9]. The number of genes in a eukaryotic cell varies between 2,000 and 60,000 depending on the organism. Although the DNA sequence is the same in every cell of a given organism, each cell expresses only a subset of these genes at any given time, in keeping with the intricacies of gene expression regulation [10]. To draw meaningful conclusions from gene expression data, the genes must be examined under several different conditions. The analysis of gene expression draws on a wide range of computational techniques that contribute to the understanding of gene expression datasets. The main contributions of this work are as follows:
(i) The authors investigate the proposed method using the experimental setup.
(ii) The raw data are obtained once the microarray has been created from the initial setup.
(iii) The authors analyse several data preprocessing techniques, and the high-level data analysis is carried out with computational techniques.

2. Background

It is of the utmost importance to unravel the transcriptional modules of genes that are coregulated across a series of experiments [11]. Clustering has proved beneficial for many challenges in bioinformatics. For example, it has allowed researchers to derive information about cancer, certain tumor subtypes, and cancer survival rates (Wei et al., 2010). Although these results were encouraging, clustering algorithms have two main disadvantages:
(i) They take the entire set of samples into account, even though genes are not necessarily coexpressed under all samples. A group of genes may instead behave consistently only under a subset of samples, which is a key aspect of many problems in biomedicine [12]. Therefore, genes and conditions should be clustered simultaneously.
(ii) Each gene can be assigned to only one cluster. However, many genes can belong to several clusters according to their role in various biochemical processes [13].
Healthcare biclustering, which is one type of clustering, alleviates these inconveniences. Biclustering aims to identify maximal submatrices (biclusters) in which a subset of genes is highly correlated across a subset of conditions [14]. However, healthcare biclustering is a highly combinatorial, NP-hard problem [15]. As the literature shows, healthcare biclustering is commonly used in the analysis of gene expression data. Below, we recall some basic definitions from the field of healthcare biclustering [16].

3. Proposed Model

The design of a microarray experiment is a procedure that comprises five main steps. Figure 1 summarizes the whole process of the proposed system. The process starts with a biological question that needs to be answered and ends with the biological interpretation of the results.

Figure 1

Proposed FCM architecture.

Experimental design. This phase consists of defining the objectives, selecting the genes and experimental conditions under study, and choosing the platform, the labelling methodology, and the number of replicates. In Figure 1, this phase is depicted by the experimental box.
Data generation. The raw data are obtained in this phase, once the microarray has been created. This phase corresponds to data generation in Figure 1.
Data preprocessing. The raw data from the previous step need to be preprocessed before being used. This step comprises background correction of the image, value extraction, data normalization, and data summarization. If the data quality is not good enough, the process returns to the first step.
High-level analysis. Once the expression matrix has been obtained, one or more statistical and computational methods must be applied to extract useful and relevant information. Although many data mining techniques are now used on microarrays, a lot of work remains to be done in this field.

Two separate healthcare biclustering approaches are used in this analysis to identify gene activity that is specific to certain conditions and to remove redundant components of broad gene information. Machine learning has also become extensively used for healthcare biclustering because of its adequacy for the problem. Furthermore, none of the solutions currently in place allows the user to choose the search objectives and their relative importance. We have therefore also worked towards a customizable approach to healthcare biclustering, in which the user can weight different objectives according to the desired results. Such objectives include size, gene variance, and bicluster overlap. Integrating new objectives into the search would also be straightforward for the user. Firstly, biclusters in gene expression data tend to contain similar values under multiple gene expression conditions.
The first step is to identify the set of submatrices using the large average submatrix, which ranks submatrices by a simple significance score that accounts for their size and their average value. At this stage, serious problems arise when the healthcare biclustering algorithm is applied directly to enrich the information. Therefore, to improve healthcare biclustering, we use the large average submatrix, which compresses or removes irrelevant or weakly correlated clustering results. The study also uses FCM to determine how many rows and columns can be added to the submatrix for further calculation. The system is evaluated on the accuracy of the components and the strength of the submatrices. Secondly, the elements that are missing from the gene expression data are biclustered with a genetic-based simulated annealing algorithm. This approach relies on a healthcare biclustering method that is considered necessary to distinguish the information on gene expression.

3.1. Preliminaries

A microarray dataset is displayed as a data matrix M, in which each cell represents the expression level of a gene. The proposed method uses a new strategy that eliminates weakly correlated biclusters and accepts only high-quality biclusters with strong internal correlation. The consistency of the matrix elements is measured using the mean-squared residue, which also helps to find the threshold value, instead of substituting the submatrix element values during the generation of the input matrix for subsequent clustering. The threshold value is usually determined with a greedy strategy. Further, a fuzzy rule set determines whether a row or column is added to or removed from a submatrix.

3.2. Mean-Squared Residues

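In this section, the consistency of the matrix elements is measured using the mean-squared residue, following its standard definition. For a submatrix with row set I, column set J, and elements $a_{ij}$, the average values of row i and column j are expressed as

$$a_{iJ} = \frac{1}{|J|}\sum_{j \in J} a_{ij}, \qquad a_{Ij} = \frac{1}{|I|}\sum_{i \in I} a_{ij}.$$

The average value of the whole matrix is

$$a_{IJ} = \frac{1}{|I|\,|J|}\sum_{i \in I}\sum_{j \in J} a_{ij}.$$

The residue of a matrix element is

$$r_{ij} = a_{ij} - a_{iJ} - a_{Ij} + a_{IJ}.$$

The mean-squared residue (MSR) is finally estimated over all matrix elements:

$$\mathrm{MSR}(I, J) = \frac{1}{|I|\,|J|}\sum_{i \in I}\sum_{j \in J} r_{ij}^{2}.$$

The average residue R of the ith row and C of the jth column are

$$R_i = \frac{1}{|J|}\sum_{j \in J} r_{ij}^{2}, \qquad C_j = \frac{1}{|I|}\sum_{i \in I} r_{ij}^{2},$$

where ∣I∣ is the number of rows and ∣J∣ is the number of columns.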
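As an illustration of the residue computations in this section, the following NumPy sketch assumes the standard mean-squared-residue definition (element minus row mean minus column mean plus matrix mean). The function names, the threshold `delta`, and the greedy trimming loop are assumptions made for this example; they are not taken from the authors' implementation.

```python
import numpy as np


def residues(sub):
    """Residue of every element: a_ij - row mean - column mean + matrix mean."""
    row_mean = sub.mean(axis=1, keepdims=True)
    col_mean = sub.mean(axis=0, keepdims=True)
    return sub - row_mean - col_mean + sub.mean()


def msr(sub):
    """Mean-squared residue of a submatrix."""
    return float((residues(sub) ** 2).mean())


def greedy_trim(data, delta):
    """Hypothetical greedy strategy: repeatedly drop the row or column with
    the largest average squared residue until the MSR is at most delta."""
    rows = np.arange(data.shape[0])
    cols = np.arange(data.shape[1])
    while len(rows) > 2 and len(cols) > 2:
        r2 = residues(data[np.ix_(rows, cols)]) ** 2
        if r2.mean() <= delta:
            break
        row_score = r2.mean(axis=1)   # R_i for each remaining row
        col_score = r2.mean(axis=0)   # C_j for each remaining column
        if row_score.max() >= col_score.max():
            rows = np.delete(rows, row_score.argmax())
        else:
            cols = np.delete(cols, col_score.argmax())
    return rows, cols


# A perfectly additive matrix (row effect + column effect) has zero residue.
additive = np.add.outer(np.arange(6.0), 10.0 * np.arange(5.0))
additive_msr = msr(additive)

# Corrupt one row; the greedy loop should remove exactly that row.
corrupted = additive.copy()
corrupted[3] = [100.0, -50.0, 7.0, 300.0, -200.0]
kept_rows, kept_cols = greedy_trim(corrupted, delta=1e-9)
```

A low MSR indicates that the rows of the submatrix move coherently across its columns, which is why it can serve as a threshold for accepting or trimming a candidate bicluster.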

3.3. Fitness Function

The final fitness function of our algorithm is shown in Equation (6). The objective is to minimize the value of each term, so that large biclusters with a low transposed virtual error (TVE), a high gene variance, and little overlap are found. All terms are weighted except TVE, which serves as the benchmark. The TVE of a bicluster is, however, divided by the TVE of the whole microarray. This is because the range of TVE values depends on the values in each microarray, so the raw TVE that the algorithm tries to minimize is not comparable when a different microarray is used. By changing the weights associated with the different objectives, the algorithm produces different types of biclusters in terms of size, overlap, or gene variance. All weights work in the same way: lowering a weight yields biclusters with lower values of the corresponding characteristic, and vice versa. For example, lower w values result in smaller biclusters, whereas larger w values lead to larger biclusters.
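The weighting scheme described above can be sketched as follows. This is only one plausible form of such a fitness function: the individual terms, the argument names, and the default weights are hypothetical, and the TVE values are taken as precomputed inputs rather than derived here.

```python
def bicluster_fitness(tve, tve_full, n_rows, n_cols, total_rows, total_cols,
                      gene_variance, overlap_fraction,
                      w_size=1.0, w_var=1.0, w_overlap=1.0):
    """Hypothetical weighted fitness to be minimized (lower is better)."""
    # TVE is normalized by the TVE of the whole microarray and is not weighted.
    tve_term = tve / tve_full
    # Penalty that shrinks as the bicluster covers more of the matrix.
    size_term = w_size * (1.0 - (n_rows * n_cols) / (total_rows * total_cols))
    # Penalty that shrinks as the gene variance grows.
    var_term = w_var / (1.0 + gene_variance)
    # Penalty proportional to the share of cells already covered by other biclusters.
    overlap_term = w_overlap * overlap_fraction
    return tve_term + size_term + var_term + overlap_term


common = dict(tve=0.2, tve_full=1.0, total_rows=100, total_cols=20,
              gene_variance=0.5, overlap_fraction=0.1)
small = bicluster_fitness(n_rows=10, n_cols=5, **common)
large = bicluster_fitness(n_rows=40, n_cols=10, **common)
small_heavy = bicluster_fitness(n_rows=10, n_cols=5, w_size=3.0, **common)
large_heavy = bicluster_fitness(n_rows=40, n_cols=10, w_size=3.0, **common)
```

With everything else equal, the larger bicluster receives the lower (better) fitness, and raising `w_size` widens that gap, which matches the statement that larger weights favor larger biclusters.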

3.4. FCM Healthcare Biclustering

The proposed system uses FCM to check the number of columns and rows in the submatrix for calculation. FCM is developed from a cost minimization function, which in its standard form is

$$J(\theta, U) = \sum_{i=1}^{N}\sum_{j=1}^{c} u_{ij}^{m}\, d(x_i, \theta_j)^{2},$$

where m > 1 is the fuzzifier. FCM operates as an iterative process. Let $u_{ij}$ be the membership degree of an input sample $x_i$ in cluster $C_j$, with i = 1, 2, ⋯, N and j = 1, 2, ⋯, c. The Euclidean distance $d(x_i, \theta_j)$ of $x_i$ to each $\theta_j$ is computed, and the memberships are updated as

$$u_{ij} = \left[\sum_{k=1}^{c}\left(\frac{d(x_i, \theta_j)}{d(x_i, \theta_k)}\right)^{2/(m-1)}\right]^{-1},$$

where $\theta_j$ is the cluster representative, which is updated at every iteration as a membership-weighted mean of the input vectors:

$$\theta_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}.$$

The FCM algorithm can be terminated in a variety of ways. If θ or the membership grades change only slightly between two iterations, the process can be halted. Alternatively, the number of iterations can be fixed in advance. FCM is sensitive to outliers; the corresponding requirement is given in Equation (10), which represents the noise and accounts for noisy points obtaining a higher degree of membership in a cluster.
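The iterative process described in this section can be sketched with NumPy as follows. This is a minimal implementation of standard fuzzy c-means, not the authors' code: the fuzzifier `m`, the tolerance `tol`, the random initialization, and all function names are assumptions made for this example.

```python
import numpy as np


def update_centers(X, U, m):
    """Representatives as membership-weighted means of the input vectors."""
    Um = U ** m
    return (Um.T @ X) / Um.sum(axis=0)[:, None]


def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard FCM; returns memberships U (N x c) and centers theta (c x d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)       # memberships of each point sum to 1

    for _ in range(max_iter):               # stopping rule: fixed iteration budget
        theta = update_centers(X, U, m)
        # Euclidean distance from every point to every representative.
        d = np.linalg.norm(X[:, None, :] - theta[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)               # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        # Stopping rule: memberships barely change between two iterations.
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, update_centers(X, U, m)


# Two well-separated groups of 2-D points as a toy input.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
U, theta = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)
```

Because the memberships of every point are normalized to sum to one, an outlier far from all representatives still receives substantial membership in some cluster, which is the usual explanation for the sensitivity of FCM to noise.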

4. Performance Evaluation

This section compares the performance of the proposed FCM healthcare biclustering with the existing PSO-SA and fuzzy logic healthcare biclustering methods. The study is evaluated in terms of the average match score for nonoverlapping modules and for overlapping modules under the influence of noise, for constant and additive biclusters, and in terms of run time. The methods are tested on the Garber gene expression dataset (Table 1), with 4553 genes, 66 samples, and 4 classes. The simulation is conducted in Python 3.8, where the CNN is modelled in TensorFlow 2.2. The implementations were run on a computer with two 2.10 GHz 6-core Intel Xeon CPUs and 126 GB of RAM.

Table 1

Dataset.

Dataset	Garber
Objects	66
Features	2
Classes	4
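The section does not define the average match score. The sketch below assumes the gene match score commonly used in biclustering benchmarks: for each bicluster in one set, take the best Jaccard similarity of its gene set with any bicluster in the other set, then average. The function name and the representation of a bicluster as a `(genes, conditions)` pair of sets are assumptions for this example.

```python
def gene_match_score(found, reference):
    """Average best Jaccard overlap of gene sets, from `found` to `reference`.

    Each bicluster is a (genes, conditions) pair; gene sets must be non-empty.
    """
    if not found or not reference:
        return 0.0
    total = 0.0
    for genes_a, _ in found:
        a = set(genes_a)
        # Best-matching reference bicluster for this found bicluster.
        total += max(len(a & set(genes_b)) / len(a | set(genes_b))
                     for genes_b, _ in reference)
    return total / len(found)


found = [({1, 2, 3}, {0, 1}), ({7, 8}, {2})]
reference = [({1, 2, 3, 4}, {0, 1}), ({7, 8}, {2, 3})]
score = gene_match_score(found, reference)       # (3/4 + 1) / 2
perfect = gene_match_score(reference, reference)
```

The score is not symmetric in general: evaluating `found` against `reference` measures how relevant the found biclusters are, while swapping the arguments measures how well the reference modules are recovered.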

Figure 2 shows the nonoverlapping modules with increasing noise levels for constant biclusters on the Garber dataset. The graph shows that the average match score decreases for all methods as the noise level increases. However, the proposed FCM healthcare biclustering attains a higher average match score than the other methods.

Figure 2

Nonoverlapping modules with increasing noise levels for constant bicluster.

Figure 3 shows the overlapping modules with increasing overlap degree for constant biclusters. The graph shows that the average match score decreases for all methods as the overlap degree increases. However, the proposed FCM healthcare biclustering attains a higher average match score than the other methods.

Figure 3

Overlapping modules in case of constant bicluster with increasing overlap degree.

Figure 4 shows the nonoverlapping modules with increasing noise levels for additive biclusters. The graph shows that the average match score decreases for all methods as the noise level increases. However, the proposed FCM healthcare biclustering attains a higher average match score than the other methods.

Figure 4

Nonoverlapping modules for additive bicluster with increasing noise levels.

Figure 5 shows the overlapping modules with increasing overlap degree for additive biclusters. The graph shows that the average match score decreases for all methods as the overlap degree increases. However, the proposed FCM healthcare biclustering attains a higher average match score than the other methods.

Figure 5

Overlapping modules in case of additive bicluster with increasing overlap degree.
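The constant and additive bicluster models referred to in Figures 2 to 5 can be illustrated with a small generator. The section does not describe how the test matrices were built, so the zero background, the implanted values, the Gaussian noise model, and the function name below are all assumptions made for this example.

```python
import numpy as np


def implant_bicluster(shape, rows, cols, kind="constant", noise_sd=0.0, seed=0):
    """Matrix of zeros with one implanted bicluster plus optional Gaussian noise.

    kind="constant": every implanted cell has the same value (1.0 here).
    kind="additive": cell (i, j) equals row effect i plus column effect j.
    """
    rng = np.random.default_rng(seed)
    data = np.zeros(shape)
    if kind == "constant":
        block = np.ones((len(rows), len(cols)))
    elif kind == "additive":
        row_effect = np.arange(1, len(rows) + 1, dtype=float)
        col_effect = np.arange(len(cols), dtype=float)
        block = np.add.outer(row_effect, col_effect)
    else:
        raise ValueError(f"unknown bicluster kind: {kind}")
    data[np.ix_(rows, cols)] = block
    if noise_sd > 0:
        # A larger noise_sd corresponds to a higher noise level.
        data += rng.normal(0.0, noise_sd, size=shape)
    return data


rows, cols = [2, 3, 4], [0, 1, 5, 6]
clean_constant = implant_bicluster((10, 8), rows, cols, kind="constant")
clean_additive = implant_bicluster((10, 8), rows, cols, kind="additive")
noisy_additive = implant_bicluster((10, 8), rows, cols, kind="additive", noise_sd=0.1)
```

Increasing `noise_sd` blurs the implanted pattern, which is consistent with the falling match scores reported for all methods as the noise level grows.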

The run time results are shown in Table 2, where the proposed method achieves a shorter run time than the existing methods at every matrix size tested. This indicates a faster response in gene expression estimation than the other methods.

Table 2

Run time (ms).

Number of rows	PSO-SA	Fuzzy logic	Proposed FCM
4000	9.2345295	8.5186745	6.647225
8000	9.796987	8.7027515	8.160747
12000	10.594654	9.490192	8.426636
16000	11.474133	10.1958205	9.1118115
24000	12.4661035	10.492389	9.3572475
32000	13.8160015	11.760475	11.228697
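To put the differences in Table 2 on a common scale, the relative run-time reduction of the proposed FCM against each baseline can be computed directly from the reported values. The snippet below only re-expresses the numbers in Table 2; the variable and function names are illustrative.

```python
# Run times in ms, transcribed from Table 2: (PSO-SA, fuzzy logic, proposed FCM).
run_time_ms = {
    4000: (9.2345295, 8.5186745, 6.647225),
    8000: (9.796987, 8.7027515, 8.160747),
    12000: (10.594654, 9.490192, 8.426636),
    16000: (11.474133, 10.1958205, 9.1118115),
    24000: (12.4661035, 10.492389, 9.3572475),
    32000: (13.8160015, 11.760475, 11.228697),
}


def relative_reduction(baseline, proposed):
    """Fraction of the baseline run time saved by the proposed method."""
    return (baseline - proposed) / baseline


reductions = {
    n_rows: (relative_reduction(pso_sa, fcm), relative_reduction(fuzzy, fcm))
    for n_rows, (pso_sa, fuzzy, fcm) in run_time_ms.items()
}

for n_rows, (vs_pso_sa, vs_fuzzy) in reductions.items():
    print(f"{n_rows:>6} rows: {vs_pso_sa:.1%} vs PSO-SA, {vs_fuzzy:.1%} vs fuzzy logic")
```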

5. Conclusion

This study uses healthcare biclustering to reduce the inconveniences linked to the clustering phase for gene expression data and applies it to gene expression results in the field of healthcare. Several strategies are needed to deal with this issue and to expand the search area through a heuristic clustering approach. The results for increasing noise levels with constant and additive biclusters, together with the results for overlapping modules, show that the proposed FCM biclustering obtains clusters more accurately than the existing methods. The results also show that the proposed FCM biclustering has a shorter run time than the other existing methods. In the future, the proposed model can be improved through the use of machine learning or deep learning methods.
