
Parallel Algorithms for Inferring Gene Regulatory Networks: A Review.

Omid Abbaszadeh¹, Ali Reza Khanteymoori¹, Ali Azarpeyvand¹.

Abstract

Systems biology problems such as whole-genome network construction from large-scale gene expression data are complex and time-consuming, so sequential algorithms cannot obtain a solution in an acceptable amount of time. Today, massively parallel computing makes it possible to infer large-scale gene regulatory networks, and establishing gene regulatory networks from large-scale datasets has recently drawn considerable attention from researchers in parallel computing and systems biology. In this paper, we provide a detailed overview of recent parallel algorithms for constructing gene regulatory networks. First, the fundamentals of gene regulatory network inference and the challenges of large-scale datasets are given. Second, four parallel frameworks and libraries, namely CUDA, OpenMP, MPI, and Hadoop, are described in detail. Third, parallel algorithms are reviewed. Finally, some conclusions and guidelines for parallel reverse engineering are presented.


Keywords: CUDA; Gene regulatory network; Hadoop; MPI; OpenMP; Parallel algorithms; Parallel processing; Reverse engineering

Year: 2018 PMID: 30386172 PMCID: PMC6194435 DOI: 10.2174/1389202919666180601081718

Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.236

INTRODUCTION

Each cell contains thousands of genes, but only a small percentage of them is expressed in any given cell. Expressed genes interact with each other through mRNAs (messenger RNAs), proteins, or other types of molecules, and together they govern cellular phenotypes and functions. Differences in gene expression are responsible for both morphological and phenotypic differences, and they reflect cellular reactions to environmental disturbances or hormonal stimuli [1]. Several methods are available for measuring gene expression levels. Serial Analysis of Gene Expression (SAGE) [2], DNA microarrays [3], tiling arrays [4], and RNA-Seq [5] are the most important and widely used. The output of these methods is a set of gene expression profiles that can be used in bioinformatics applications. One of the main objectives of bioinformatics researchers is to decipher gene-gene interactions, a task known as constructing a Gene Regulatory Network (GRN), or reverse engineering, from gene expression profiles. A GRN is a graphical representation of the associations among a set of genes. In this model, nodes represent molecular entities such as genes, and edges represent regulatory influence (in a regulatory network) or co-expression relationships (in a co-expression network) [6]. Knowledge of the gene network not only sheds light on biological processes such as cellular differentiation, division, and signaling, but also provides valuable information for drug discovery, molecular biology, and cancer-related and other medical research [7, 8]. For example, the studies of Imoto et al. [9] and di Bernardo et al. [10] are prominent works that used gene regulatory networks in drug discovery. Reverse engineering is widely regarded as a difficult problem, particularly for large-scale data. A considerable number of sequential algorithms have been developed to derive GRN models and meaningful information from experimental data.
These algorithms can be categorized into seven main groups: Boolean networks [11], statistical methods such as partial least squares [12, 13], differential equation systems [14], Bayesian networks [15, 16], graphical Gaussian models [17], evolutionary approaches [18], and information theory-based approaches [19]. Although various sequential algorithms exist for reverse engineering, they cannot construct high-dimensional gene networks or reveal valuable information such as hub nodes, master regulators, and important regulated genes [20]. Furthermore, as the size of the data increases, the quality of the networks constructed by sequential algorithms decreases (e.g., a large number of false-positive edges in a huge network). Even with recent progress in reverse engineering, constructing an appropriate network from large-scale data requires new machine learning methods and high-performance computing, which is both important and challenging [7, 21]. Recently, parallel and distributed reverse engineering algorithms have received significant attention, and most of the proposed algorithms are scalable and reasonably accurate for reconstructing GRNs from large-scale datasets. In practice, parallel and distributed algorithms can considerably reduce execution time and provide scalability without losing quality. In the past, one of the main difficulties in implementing such algorithms was the lack of an efficient framework for developing them. Furthermore, parallel programming required a level of expertise that few researchers and biologists have. Today, parallel frameworks such as CUDA (Compute Unified Device Architecture), MPI (Message Passing Interface), OpenMP, and Hadoop considerably ease the work of researchers who need to implement efficient parallel algorithms. There are outstanding review papers covering the field of GRN inference.
Well-structured overviews of the general idea behind GRN inference and the common mathematical models can be found in [19, 22-28]. Bansal et al. [22], Chai et al. [23], Lee and Tzou [24], and Schlitt et al. [25] reviewed computational approaches and gave brief mathematical formulations for GRN reconstruction. Sima et al. [26] reviewed dynamic methods that infer GRNs from time-series experimental data. Biswas et al. [27] reviewed evolutionary approaches to GRN inference, and Sirbu et al. [28] analyzed several evolutionary algorithms. To the best of our knowledge, no work has reviewed parallel algorithms for GRN inference. Therefore, we highlight two major issues: the algorithms and tools that have been implemented in parallel frameworks, and the different parallel frameworks that can be used for learning gene networks. The paper is organized as follows: Section 2 gives a brief overview of parallelism and of parallel frameworks such as CUDA, MPI, OpenMP, and Hadoop. Section 3 reviews recent parallel algorithms for reverse engineering. Section 4 draws conclusions and offers some directions for future research on parallel GRN inference.

PARALLEL FRAMEWORKS

To achieve the promise of GRN inference on large-scale datasets, existing GRN algorithms must be executed in parallel. Parallel programming is concerned with distributing a program among a set of processors and defining how they interact to produce the results. One of the most important aspects of parallelism is its relation to the hardware and programming frameworks. Among the several frameworks for parallel programming, CUDA, MPI, OpenMP, and Hadoop are the most popular for parallel and distributed programming. CUDA, proposed by Nvidia, is a parallel programming framework for Nvidia GPUs (Graphics Processing Units). It is an extension to C and C++ that provides a set of libraries for exploiting GPUs as general-purpose processors. MPI (Message Passing Interface) and OpenMP (Open Multi-Processing) are standard interfaces for parallel programming in distributed-memory and shared-memory environments, respectively. MPI uses message passing among processes in clustered systems, which generally follow a shared-nothing design. OpenMP is used for parallel programming on multi-core machines, which are generally based on a shared-memory architecture. MPI can also be used to distribute an algorithm across multiple GPUs. Hadoop is a software framework for running applications and storing big datasets in distributed environments. In this section, we briefly introduce the organization of CUDA, MPI, OpenMP, and Hadoop.

GPU and CUDA

In the last decade, the clock speed of processors has remained roughly constant. Processor designers therefore concluded that complex multi-core processors are not the most efficient choice for massively parallel computing. The current design trend is toward many-core devices such as GPUs or co-processors (e.g., the Xeon Phi) rather than complex multi-core processors [29]. This kind of architecture provides heterogeneous computing and high achievable performance for SIMD (Single Instruction Multiple Data) programs, that is, programs in which one instruction performs the same operation on multiple data points simultaneously. This change of design paradigm has had (and will continue to have) a significant impact on the design of parallel algorithms [30]. Each GPU consists of a series of streaming multiprocessors (SM, or SMX in the latest architectures). Within each SM, a number of streaming processors (SP), known as cores, are arranged in arrays and execute arithmetic and logical operations in parallel. Furthermore, each SM has a number of registers and a per-block shared memory for transferring data between concurrent threads. According to the programming model, threads and thread blocks are distributed across SPs and SMs. Another memory, called global memory, is used to share data across all SMs. In CUDA terms, a grid is the set of thread blocks launched for one kernel; the blocks work independently and may therefore be executed asynchronously in parallel [31]. In 2007, Nvidia released the CUDA framework, an extension to C and C++ that makes it possible to use GPUs as general-purpose processors [30]. CUDA provides three kinds of features for programmers: 1) thread management, 2) memory management, and 3) synchronization. These fine-grained features help divide a program into subprograms that can be executed in parallel and then integrated. Code written in CUDA contains one or more functions called kernels, which are loaded onto the GPU and replicated in many threads.
The programmer determines the number of threads for each kernel and manages the memory spaces visible to the kernel functions [31, 32]. One of the main tasks in CUDA-based parallel algorithms is configuring threads, blocks, and grids and managing memory allocation; these choices are a major source of performance differences between algorithms. Despite its remarkable advantages, GPU-based programming differs from CPU-based programming. Nevertheless, several packages have been released that give users without any knowledge of GPU programming access to the high-performance computing power of GPUs. OpenCL, maintained by the Khronos Group, is another framework for cross-platform GPU programming that runs on different hardware platforms. Recently, Nobile et al. [33] surveyed computational tools in bioinformatics that exploit GPUs as a processing engine.

MPI

MPI is the de facto standard for parallel programming based on the message-passing paradigm. It specifies a collection of library routines for sending messages between computers or processes in a distributed-memory environment. In the MPI programming model, each node has its own memory space and processors, and nodes communicate with each other to access data held in another node's memory [34]. Programmers must divide the tasks among nodes with separate memory spaces, define the communication between nodes, and synchronize them. To facilitate parallel programming, MPI provides various functions for communication, coordination, and synchronization between distributed nodes. Much current bioinformatics code can be parallelized under the MPI model; examples include mpiBLAST [35], a parallel version of the BLAST (Basic Local Alignment Search Tool) sequence alignment tool, and MPI-CMS [36], a parallel implementation of the Cross Motif Search algorithm.

OpenMP

OpenMP is a set of high-level APIs (Application Programming Interfaces) that provides shared-memory parallelism and multi-threading in multi-core environments. It consists of a set of compiler directives, library routines, and predefined interfaces; the OpenMP specification targets C, C++, and Fortran. At run time, the threads of an OpenMP program communicate with each other through the shared memory space, which increases the performance of the program. Similar to CUDA, OpenMP provides three kinds of features for programmers: 1) control features that alter the flow of execution in a program, 2) synchronization features for coordinating the execution of threads, and 3) data environment features for communicating between threads [37]. OpenMP provides a high-level abstraction that makes it well suited to high-performance computing in shared-memory environments. One of its main advantages is that converting sequential code into parallel code does not require major changes.

Hadoop

Before introducing Hadoop, the Map-Reduce paradigm should be introduced. In this paradigm, the data are divided into subsets, which are assigned to different machines for parallel processing; the separate partial results are then brought together and the final result is returned. The stage in which the subsets are allocated to machines and processed is called Map, and the stage in which the partial results are combined and presented is called Reduce. Map-Reduce is suitable for big data analysis because it can execute a program in parallel over a cluster of computers without loading the whole dataset into memory. Hadoop is a Java-based framework that enables parallel and distributed programming across a distributed environment using the Map-Reduce paradigm. It has two main components: YARN (Yet Another Resource Negotiator) and HDFS (Hadoop Distributed File System). YARN manages the computational resources needed for distributed execution. HDFS provides a scalable and robust distributed file system for big data. Apart from Hadoop, numerous software frameworks (such as Pig, Spark, and Mahout) provide specific features that allow users with no knowledge of distributed programming to process large amounts of data in a specific domain [38]. Additionally, many bioinformatics tools have been developed on top of Hadoop, such as CloudBLAST [39], a distributed version of the BLAST2 algorithm; Eoulsan [40], a framework for RNA sequence data analysis; and Seqpig [41] and BioPig [42], for analyzing large-scale sequencing data. Unfortunately, there is no “silver bullet” for parallel programming. Whatever framework is selected, parallel programming is more complex than, and different from, sequential programming. Efficient distribution of tasks across the processing units and the avoidance of inefficient data replication and unnecessary communication among them are vital factors that affect parallel programming performance.
Each of the frameworks mentioned provides a different paradigm of parallel programming and has its own strengths and weaknesses. OpenMP is easier to use than the other frameworks, but it runs only in shared-memory environments. MPI runs on both shared and distributed memory but requires more changes to the sequential algorithm. Hadoop provides a highly scalable and fault-tolerant environment, but it is not always straightforward to express a sequential algorithm as a Map-Reduce program. Although CUDA achieves higher performance than the CPU in data-parallel programming, CUDA programming is more difficult than CPU programming, and programmers need an in-depth understanding of the GPU architecture. The corresponding table summarizes the characteristics of the frameworks in terms of usability, complexity, and scalability. There are many reasons to integrate two or more parallel programming frameworks. For example, CUDA-aware MPI programs use MPI to scale an existing single-GPU application across multiple GPUs. Apart from the above frameworks, several other projects that provide specific features have been developed. A further table summarizes some important libraries and tools that help programmers exploit and integrate parallel frameworks efficiently. These libraries aim to support the development of more efficient parallel programs and provide a high-level abstraction for researchers with little experience in parallel and distributed programming.
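The Map and Reduce stages described in this section can be illustrated with a small single-machine sketch. It is not Hadoop code: the function names (`split_into_chunks`, `map_chunk`, `reduce_partials`, `mean_expression`) and the toy records are assumptions made for illustration, and the "machines" are simulated by plain Python calls. The example computes the mean expression value per gene without ever holding more than one subset in a single map call.

```python
from collections import defaultdict
from functools import reduce


def split_into_chunks(records, n_chunks):
    """Divide the records into roughly equal subsets, one per simulated machine."""
    return [records[i::n_chunks] for i in range(n_chunks)]


def map_chunk(chunk):
    """Map stage: compute a partial [sum, count] per gene for one subset."""
    partial = defaultdict(lambda: [0.0, 0])
    for gene, value in chunk:
        partial[gene][0] += value
        partial[gene][1] += 1
    return dict(partial)


def reduce_partials(left, right):
    """Reduce stage: merge two partial results into one."""
    merged = {gene: list(acc) for gene, acc in left.items()}
    for gene, (total, count) in right.items():
        acc = merged.setdefault(gene, [0.0, 0])
        acc[0] += total
        acc[1] += count
    return merged


def mean_expression(records, n_chunks=3):
    """Run the map stage on every subset, then reduce to per-gene means."""
    partials = map(map_chunk, split_into_chunks(records, n_chunks))
    totals = reduce(reduce_partials, partials, {})
    return {gene: total / count for gene, (total, count) in totals.items()}


# Hypothetical (gene, expression value) records.
records = [("geneA", 1.0), ("geneB", 4.0), ("geneA", 3.0),
           ("geneB", 6.0), ("geneA", 5.0)]
means = mean_expression(records)
```

In a real Hadoop job the map calls would run on different cluster nodes and the framework would handle the data movement between the two stages; only the division into a per-subset computation and an associative merge carries over from this sketch.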

PARALLEL ALGORITHMS

The crucial step in GRN inference is selecting the model. In this review, we focus only on approaches whose modeling algorithms are parallel. In the following subsections, we review parallel algorithms grouped by their mathematical models.

Bayesian Network Based Models

Modeling gene regulatory networks with Probabilistic Bayesian Networks (PBN) has become popular in the bioinformatics community. The main advantages of PBNs are their ability to represent uncertainty in models, their flexibility, and their ability to integrate prior knowledge (e.g., biological knowledge) with experimental data. In 1999, Murphy et al. [16] used the Bayesian network for GRN inference for the first time, and thereafter significant efforts focused on reverse engineering with PBNs. According to Pearl and Russell [43], a PBN is a Directed Acyclic Graph (DAG) $G = (V, E)$, where $V$, the set of nodes, represents random variables, and $E$ is the set of directed edges, which represent cause-and-effect relationships such as regulatory influence among genes. A directed edge $X_i \rightarrow X_j$ indicates that $X_i$ is a parent of $X_j$ or, in the GRN context, that gene $X_i$ regulates gene $X_j$. Mathematically, a PBN encodes the Markov assumption that, given its parents, each variable is conditionally independent of its non-descendants. Based on this assumption, a PBN factorizes the joint probability distribution as follows:
$$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i)) \quad (1)$$
where $Pa(X_i)$ is the set of parents of $X_i$ in the DAG. [Table footnotes: ease of use refers to the programming effort required; code conversion refers to the effort required to change sequential code into parallel code.] Many outstanding algorithms have been developed for Bayesian network learning; a well-structured review is presented in [44]. In essence, learning a PBN from data consists of both parameter learning and structure learning (or model selection). Parameter learning estimates the local conditional probabilities for each node, and structure learning establishes the network as a candidate DAG. Structure learning is more important than parameter learning in GRN inference because the cause-and-effect interactions among genes are determined at this step. Finding the network that exactly fits the data is an NP-hard problem because the number of DAGs grows super-exponentially with the number of variables.
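To make the super-exponential growth of the search space concrete, the number of labeled DAGs on a given number of nodes can be computed with Robinson's recurrence. The recurrence is a standard combinatorial result brought in here for illustration; it is not taken from the review, and the function name `count_dags` is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labeled DAGs on n nodes, computed with Robinson's recurrence.

    a(0) = 1
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k*(n-k)) * a(n-k)
    """
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


# Candidate structures for networks of 1 to 5 genes.
dag_counts = [count_dags(n) for n in range(1, 6)]
```

Five genes already admit 29281 candidate structures, which is why exhaustive structure search is hopeless at the scale of a whole genome.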
NP-hardness implies that exact algorithms can become computationally intractable: no polynomial-time algorithm is currently known for any NP-hard problem, so instances of large or even moderate size cannot be solved exactly in reasonable time. One way to tackle NP-hard problems is to design heuristic or parallel algorithms that reduce the computational time. There are three generic approaches to structure learning: score-based, constraint-based, and hybrid methods [45]. Score-based methods assign a score to each candidate DAG using a scoring function and try to optimize the scoring criterion with a heuristic algorithm such as greedy search. Selecting an appropriate scoring function is very important because it is the key ingredient in reconstructing a high-quality GRN with PBNs. These methods work well on small datasets with not too many variables. Constraint-based methods efficiently restrict the search space and can therefore often work well on large datasets. The Sparse Candidate Algorithm (SCA) [46] is a prominent constraint-based algorithm in which each variable is constrained to have at most $k$ parents. Finally, hybrid methods combine the score-based and constraint-based approaches. Nikolov and Aluru [47] developed a parallel hybrid Bayesian structure learning method for reverse engineering. They demonstrated that the main cause of error in SCA-based approaches is the incorrect selection of optimal parents (OP) from the candidate parent (CP) set. To address this issue, and inspired by parallel pairwise mutual information [48], the authors created a mutual information-based network to identify the CP set for each node. They then developed a parallel exact algorithm for selecting the OPs from the CP set: it checks all subsets of the CP set, selects the OPs for each node with a scoring function, and finally uses the OP sets to create an initial network.
Note that the resulting graph may contain cycles, which are detected and eliminated by exponentiation of the adjacency matrix, in order of cycle length (shorter cycles before longer ones). The authors implemented the proposed method in C++ with the MPI library on a Cray system with AMD many-core processors. In their performance evaluation on data with 500 genes and 100 observations, the method inferred the GRN in less than 2 minutes in the best case, using 1024 cores of the Cray AMD cluster. Misra et al. [49] developed a massively parallel heuristic PBN structure learning algorithm for whole-genome network reverse engineering that exploits the Tianhe-2 and Stampede high-end heterogeneous supercomputers. The proposed method is similar to [47] but differs in the scoring function used for network evaluation, in limiting the size of the CP set to reduce computational complexity, and in the implementation techniques used to achieve performance, scalability, and efficient load balancing. To distribute the work efficiently among the processing units, they performed hierarchical dynamic work distribution, which first divides tasks across the cluster nodes and then subdivides each task within a node. One conventional approach to parallel structure learning is to divide the whole network learning problem into several subnetwork learning problems, each of which contains randomly sampled variables. Evidently, the main issue here is how to select an appropriate sampling approach. Tamada et al. [20] developed a parallel PBN structure learning algorithm for reverse engineering based on the subnetwork strategy and a random walk technique, called Neighbor Node Sampling and Repeat (NNSR). The authors demonstrated that a small sample size and appropriate sampling of the variables (or genes) lead to subnetworks that can efficiently capture cause-and-effect relationships.
Therefore, they proposed a two-phase heuristic algorithm. In the first phase, each iteration draws variables by random sampling (all variables being equally likely) and learns a new subnetwork over the sampled variables. In the second phase, the whole network is created using neighbor node sampling based on random walks over the subnetworks. To do this, they create a weighted graph by introducing the edge frequency. The edge frequency is the number of occurrences of a directed edge across the subnetworks divided by the number of subnetworks in which its two variables were selected together (a greater value indicates a stronger cause-and-effect relationship). Next, the random walk procedure selects a specific proportion of the nodes from the weighted graph and creates a large number of smaller subnetworks. The authors implemented the proposed method in the C programming language with the OpenMPI library on 724 computation nodes, each with dual Intel quad-core Xeon 3.0 GHz processors, for a total of 5792 cores. They applied the method to Human Umbilical Vein Endothelial Cell (HUVEC) data with 13731 transcripts and extracted the GRN in less than 3 hours. Furthermore, the method also extracts valuable information, such as hub nodes and putative master regulators, that cannot be obtained from small networks. Based on this model, Tamada et al. [50] developed a software collection called SiGN. SiGN also includes two other parallel programs: SiGN-L1, based on the graphical Gaussian model, and SiGN-SSM, based on the state space model [51]. One of the main sources of error in statistical inference is overconfidence in the model, which results from ignoring model uncertainty [52]. Model uncertainty refers to situations in which there is no unique, agreed-upon model for a specific problem. In most situations, the main cause of uncertainty is the inclusion of irrelevant variables when constructing the model. Inspired by ensemble learning, one way to tackle model uncertainty is Bayesian Model Averaging (BMA).
BMA selects variables by averaging the posterior probabilities of models, each of which consists of a set of candidate variables (regulators, in the GRN context). The main challenge in BMA is selecting an efficient set of models. Young et al. [53] proposed ScanBMA, a BMA-based Bayesian inference method for selecting regression variables from time-series data. They developed a greedy mechanism for picking appropriate models based on the Occam's window principle. A parallel implementation of ScanBMA, named fastBMA [54], is available from https://github.com/lhhunghimself/fastBMA.
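The edge frequency used by NNSR in this section (occurrences of a directed edge divided by the number of subnetworks in which both of its genes were sampled together) can be sketched as follows. This is an illustration of the definition only, not the authors' implementation: the data layout (a list of `(nodes, edges)` pairs), the function name `edge_frequencies`, and the toy subnetworks are all assumptions.

```python
from collections import Counter
from itertools import combinations


def edge_frequencies(subnetworks):
    """Compute the edge frequency of every directed edge seen in the subnetworks.

    subnetworks: list of (nodes, edges) pairs, where nodes is a set of gene
    names and edges is a set of (parent, child) tuples learned on those nodes.
    """
    edge_counts = Counter()   # how often each directed edge was learned
    pair_counts = Counter()   # how often each unordered gene pair was co-sampled
    for nodes, edges in subnetworks:
        for pair in combinations(sorted(nodes), 2):
            pair_counts[frozenset(pair)] += 1
        for parent, child in edges:
            if parent not in nodes or child not in nodes:
                raise ValueError(f"edge {(parent, child)} uses an unsampled gene")
            edge_counts[(parent, child)] += 1
    return {
        edge: count / pair_counts[frozenset(edge)]
        for edge, count in edge_counts.items()
    }


# Hypothetical subnetworks learned from three random samples of genes.
subnetworks = [
    ({"g1", "g2", "g3"}, {("g1", "g2"), ("g2", "g3")}),
    ({"g1", "g2"}, {("g1", "g2")}),
    ({"g2", "g3", "g4"}, {("g3", "g2")}),
]
frequencies = edge_frequencies(subnetworks)
```

Here `g1 -> g2` was learned in both subnetworks that contained the two genes, so it gets frequency 1.0, whereas `g2 -> g3` and `g3 -> g2` each appear in one of two and get 0.5. Each subnetwork can be processed independently, which is what makes the strategy easy to parallelize.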

Information Theory Based Models

Owing to their easy implementation, simplicity, low computational cost, and ability to detect complex interactions, parallel Information Theory Based Models (ITBM) are an attractive choice for reverse engineering. In the last two decades, a number of algorithms based on information theory have been developed. ITBMs such as correlation-based methods [55, 56], Mutual Information (MI) methods [57-60], and Gaussian Graphical Models (GGM) [17] are the main state-of-the-art approaches for extracting dependencies in biological network inference. In the following, along with the review, we introduce the mathematical details of some similarity measures that are the cornerstone of ITBMs. Pearson correlation (PC), Mutual Information (MI), and partial correlation are the main similarity measures that have been used extensively in the literature. Each has its own limitations and benefits, and there is no proof that one is superior to the others [61]. MI is often used as a similarity measure because it enables the detection of non-linear relationships among variables. It is defined in terms of the individual and joint entropies as follows:
$$I(X_i; X_j) = H(X_i) + H(X_j) - H(X_i, X_j) \quad (2)$$
where $H(X)$ is the differential entropy of a random variable $X$ and is a measure of its uncertainty. In particular, for a continuous variable $X$, it is defined by:
$$H(X) = -\int p(x) \log p(x)\, dx \quad (3)$$
In (3), $p(x)$ is the probability density function of the continuous variable $X$. It can be estimated by different methods such as histograms, kernel estimators, k-nearest neighbor estimators [62], and B-spline estimators [63]. Note that estimating the probability density function is one of the challenging problems in MI-based approaches. Binning the continuous variables into quantile intervals is another way of estimating the probability distribution. In this approach, each continuous expression value is replaced by an integer corresponding to the bin it falls into. The MI is then defined as follows:
$$I(X_i; X_j) = \sum_{u=1}^{b} \sum_{v=1}^{b} p_{uv} \log \frac{p_{uv}}{p_u\, p_v} \quad (4)$$
where $b$ is the number of bins, $p_{uv}$ represents the joint probability $P(X_i = u, X_j = v)$, and $p_u$ and $p_v$ are the marginal probabilities $P(X_i = u)$ and $P(X_j = v)$.
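A minimal sketch of the binning estimator follows, assuming equal-frequency (quantile) bins assigned by rank and base-2 logarithms. The function names `quantile_bins` and `mutual_information` and the toy profiles are hypothetical, and ties between equal expression values are broken by position, which a production estimator would need to handle more carefully.

```python
from collections import Counter
from math import log2


def quantile_bins(values, n_bins):
    """Replace each value by the index of the quantile bin it falls into."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y, n_bins=4):
    """Estimate MI (in bits) between two expression profiles by binning."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same number of observations")
    n = len(x)
    bx, by = quantile_bins(x, n_bins), quantile_bins(y, n_bins)
    joint = Counter(zip(bx, by))          # counts for the joint distribution
    px, py = Counter(bx), Counter(by)     # counts for the two marginals
    return sum(
        (c / n) * log2((c / n) / ((px[u] / n) * (py[v] / n)))
        for (u, v), c in joint.items()
    )


# Two hypothetical profiles over 8 observations.
gene_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
gene_y = [2 * v for v in gene_x]                     # monotone function of gene_x
gene_z = [0.0, 1.0, 4.0, 5.0, 2.0, 3.0, 6.0, 7.0]    # unrelated at 2 bins

mi_dependent = mutual_information(gene_x, gene_y, n_bins=4)
mi_independent = mutual_information(gene_x, gene_z, n_bins=2)
```

With four equally filled bins, a perfectly dependent pair reaches the maximum of log2(4) = 2 bits, while the second pair yields 0 bits. Because every gene pair is scored independently, the pairwise computation maps naturally onto the parallel frameworks reviewed here.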
This binning method is very simple and fast but is sensitive to the number of bins used. Based on this approach, Belcastro et al. [64] developed a parallel MI-based algorithm. Kernel-based estimators are computationally expensive when a large number of variables are available. To tackle this problem, Daub et al. [63] proposed a B-spline-based method for binning continuous data. In this approach, each continuous value is assigned to bins with weights given by a B-spline function of order $k$ defined over the knot points. For a continuous value, this function returns a vector of size $b$ (the number of bins) with continuous non-negative weights that indicate to which bins the value should be assigned. Based on this idea, four parallel reverse engineering methods [48, 65-67] have been developed, which are discussed in detail below. Zola et al. [48] proposed a parallel algorithm named TINGe (Tool Inferring Networks of Genes). TINGe is the first parallel software for reverse engineering, and it was used to construct the largest whole-genome plant network. It uses B-spline-based MI and provides efficient permutation testing with rank transformation for assessing statistical significance, the Data Processing Inequality (DPI) for removing indirect relationships, and parallel processing. The DPI states that if three random variables $X$, $Y$, $Z$ form a Markov chain in that order, i.e., $X \rightarrow Y \rightarrow Z$, then $I(X; Z) \le I(X; Y)$ and $I(X; Z) \le I(Y; Z)$. Thus, if three genes $X_i$, $X_j$, $X_k$ form a triangle in the network, the DPI can be applied to remove the indirect edge, namely the one among the three edges with the weakest MI value. This can significantly decrease the false-positive rate. In their performance evaluation on an Arabidopsis thaliana dataset with 15222 genes and 3137 observations, the method inferred the GRN in 30 minutes on a 2048-CPU Blue Gene/L and in 2 hours and 25 minutes on an 8-node Cell blade cluster. Because TINGe was successful, Misra et al. [65] implemented it on the Intel Xeon Phi single-chip coprocessor and Chockalingam et al.
[67] developed a distributed version of TINGe on the Amazon EC2 cloud computing platform using the Hadoop framework. Shi et al. [66] proposed CUDA-MI, a parallel MI-based algorithm that uses B-spline functions and the CUDA framework. By defining a weighting matrix (of size $n \times b$; $n$: number of genes; $b$: number of bins) in which each row contains the weight coefficients of a gene value over the set of bins, CUDA-MI calculates pairwise MI among genes in parallel. The authors implemented their approach on an Nvidia Tesla C2050 GPU with 448 cores at 1.15 GHz and compared it with a quad-core i7 2.66 GHz CPU. With the single-GPU version, their best speedup was 82x compared with execution on the multi-threaded CPU. Additionally, they combined CUDA-MI with the ARACNE method [57], and the analysis of specificity, sensitivity, and precision showed that the combined method outperforms plain ARACNE and the TINGe software [48]. PC is another widely used measure that detects linear relationships among variables and is defined as follows:
$$\rho(X_i, X_j) = \frac{\mathrm{cov}(X_i, X_j)}{\sigma_{X_i}\, \sigma_{X_j}} \quad (5)$$
where $\mathrm{cov}(X_i, X_j)$ is the covariance of $X_i$ and $X_j$, and $\sigma_{X_i}$ is the standard deviation of $X_i$. Liang et al. [56] used a PC-based method, called FASTGCN, for gene co-expression network reconstruction. They proposed a parallel algorithm that integrates genetic information entropy for preprocessing, PC for dependency analysis, and z-scores for coefficient normalization, and that efficiently exploits GPU memory using the zero-copy technique. The authors compared the CUDA version of FASTGCN (implemented on an Nvidia Tesla K20c with 2496 cores at 760 MHz) against three CPU versions: a multi-core version (Intel Xeon, 16 cores at 2.90 GHz) with 16 OpenMP threads, a single-threaded C/C++ version, and a single-threaded R version. On a dataset containing 16000 genes of 590 individuals, it achieved speedups of 2x, 10x, and 80x, respectively. Zheng et al.
[68] developed new software, known as CMIP, based on their previous PCA-CMI (Path Consistency Algorithm based on Conditional Mutual Information) algorithm [69]. PCA-CMI is a well-known iterative algorithm for reverse engineering. It first creates a complete graph of size $n$ ($n$ is the number of genes) and, at each iteration $L$, uses $L$-order Conditional Mutual Information (CMI) to quantify the relationship between two genes given $L$ of their common neighbors. The CMI of variables $X_i$ and $X_j$ given $X_k$ is defined as:
$$I(X_i; X_j \mid X_k) = H(X_i, X_k) + H(X_j, X_k) - H(X_k) - H(X_i, X_j, X_k) \quad (6)$$
where $H(X_i, X_k)$, $H(X_j, X_k)$, and $H(X_i, X_j, X_k)$ are joint entropies. A high CMI value indicates that there may be a close relationship between the variables $X_i$ and $X_j$ given the variable(s) $X_k$. The algorithm then deletes the edges with zero or low CMI at each iteration. The time consumed on large-scale data and the difficulty of determining an appropriate edge deletion threshold are the main drawbacks of PCA-CMI. To overcome these drawbacks, the authors developed two parallel programs using the CUDA and OpenMP frameworks and defined a mechanism for automatic threshold setting. In the CUDA version of CMIP, pre-processed data are delivered to the GPU cores for correlation calculation using a parallel model, and in the OpenMP version, loop calculations are accelerated with multi-threading. CMIP attained acceptable performance compared with conventional methods. Borelli et al. [70] proposed a new exhaustive search algorithm that expresses reverse engineering as a feature selection problem. In this view, feature selection is an iterative search for the optimal subset of genes that regulate a target gene, with the mean conditional entropy as the selection criterion. The mean conditional entropy of variable $X_j$ given $X_i$ is defined as:
$$H(X_j \mid X_i) = \sum_{x} P(X_i = x)\, H(X_j \mid X_i = x) \quad (7)$$
That is, the conditional entropy of $X_j$ given $X_i$ is the entropy of $X_j$ conditional on the value of $X_i$, averaged over all possible values of $X_i$. A small conditional entropy indicates that $X_i$ predicts $X_j$ well or, in the GRN context, that gene $X_i$ is associated with gene $X_j$.
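The feature-selection view can be sketched on discretized data as follows. This is a sequential, CPU-only illustration of scoring candidate regulator subsets by mean conditional entropy; it is not the GPU implementation of Borelli et al., and the function names, the `max_size` limit, and the Boolean toy data are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mean_conditional_entropy(target, predictor):
    """Entropy of the target within each predictor state, weighted by P(state)."""
    groups = defaultdict(list)
    for t, p in zip(target, predictor):
        groups[p].append(t)
    n = len(target)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def best_regulators(target, candidates, max_size=2):
    """Exhaustively search candidate subsets for the lowest conditional entropy."""
    best_subset, best_score = (), entropy(target)
    for size in range(1, max_size + 1):
        for subset in combinations(sorted(candidates), size):
            # The joint state of a subset is the tuple of its members' values.
            joint_state = list(zip(*(candidates[name] for name in subset)))
            score = mean_conditional_entropy(target, joint_state)
            if score < best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical Boolean profiles: the target is the XOR of genes "a" and "b".
candidates = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1], "c": [0, 0, 0, 1]}
target = [0, 1, 1, 0]
regulators, score = best_regulators(target, candidates)
```

Neither `a` nor `b` alone reduces the uncertainty about the target, but the pair determines it completely, which is why the search has to examine subsets rather than single genes. The subsets are scored independently of one another, so the inner loop is the natural unit of work to spread across GPU threads.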
The exhaustive search, which is the time-consuming step, has been implemented in parallel on single-GPU and multi-GPU systems. Furthermore, the search algorithm has been implemented in global and local versions: the genes considered for each target gene are limited in the local search and not limited in the global search. Finally, the authors generated data with the AGN simulator for 1024, 2048, 4096, and 8192 genes to evaluate their approach. They compared the proposed method running on one, two, and four GPUs with 240 cores per GPU against a CPU version that used six 3.2 GHz cores and the OpenMP library. With four GPUs, the acceleration over the CPU execution was 55, 110, and 260 when there were 32, 64, and 128 target genes per block, respectively. LegumeGRN [71] is a reverse engineering web tool that implements multiple well-known reverse engineering algorithms. The LegumeGRN developers implemented parallel versions of TIGRESS [72] and GENIE3 [73], two popular algorithms that use feature-selection-like methods as the mechanism for reverse engineering. GENIE3 uses a tree-based ensemble feature selection method for reverse engineering on multifactorial expression data, and TIGRESS uses LARS feature selection. When dealing with high-dimensional data and non-uniformly distributed variables, the bias of the MI estimator is one of the main sources of error. To overcome this problem, Kraskov et al. [62] proposed an unbiased MI estimator based on the k-nearest neighbor (KNN) method. The main idea is to estimate the probability densities from the distribution of each point's nearest neighbors, which yields a minimally biased estimator. Sales and Romualdi [74] developed a parallel R package for reverse engineering based on KNN and MI, called PARMIGENE (PARallel Mutual Information estimation for GEne NEtwork reconstruction).
The authors combined PARMIGENE with CLR, ARACNE, and MRNET, three state-of-the-art ITBMs that use MI for reverse engineering. Experimental results on in-silico datasets show that the PARMIGENE estimator not only gives unbiased and more precise results, but is also faster than the other estimators.
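The pairwise measures reviewed in this section share one structure: every gene pair is scored independently, so the pair list can be split across workers. The sketch below illustrates this with Pearson correlation and a simple equal-width histogram MI estimate. It is an assumption-laden toy: the histogram estimator stands in for the B-spline and KNN estimators discussed above, the gene names and values are invented, and the thread pool only demonstrates the decomposition (CPython threads give no real speedup for this CPU-bound loop; the reviewed tools use CUDA, OpenMP, or MPI instead).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation: cov(x, y) / (std(x) * std(y))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def binned(x, bins):
    """Map each value to an equal-width bin index in [0, bins)."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0      # avoid a zero width for constant input
    return [min(int((v - lo) / width), bins - 1) for v in x]


def mutual_information(x, y, bins=3):
    """Plug-in MI estimate (bits) from a 2-D histogram of binned values."""
    bx, by = binned(x, bins), binned(y, bins)
    n = len(x)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def score_pair(pair):
    """Score one gene pair; independent of every other pair."""
    (i, x), (j, y) = pair
    return i, j, pearson(x, y), mutual_information(x, y)


# Invented expression values for three hypothetical genes over six samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8, 12.1],   # roughly linear in geneA
    "geneC": [4.0, 1.0, 0.0, 0.0, 1.0, 4.0],    # non-linear in geneA
}

pairs = list(combinations(expression.items(), 2))
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(score_pair, pairs))   # order of pairs is preserved

for i, j, pc, mi in results:
    print(f"{i}-{j}: PC={pc:+.3f}  MI={mi:.3f} bits")
```

In this toy input the geneA-geneC pair has a Pearson correlation near zero but a clearly positive MI, which reflects the point made above that PC captures only linear relationships.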

Differential Equation-Based Models

Ordinary differential equations that are based on biochemical systems theory are popular approaches for reverse engineering. In this model, by using a non-linear function $f_i$, regulatory interactions between genes can be expressed as follows: $\frac{dx_i(t)}{dt} = f_i(x_1, x_2, \ldots, x_N, p, u)$ (8) where $x_i(t)$ describes the expression level of gene $i$ at time $t$, and $p$ and $u$ are the interaction parameters among genes and the external perturbation of the gene, respectively. To date, one of the most prominent methods is a type of system of ordinary differential equations called the S-System. The general form of an S-System for representing a gene regulatory network is as follows: $\frac{dx_i}{dt} = \alpha_i \prod_{j=1}^{N} x_j^{g_{ij}} - \beta_i \prod_{j=1}^{N} x_j^{h_{ij}}$ (9) where $x_i$ is the expression level of gene $i$, and $N$ is the total number of genes in the network. The non-negative parameters $\alpha_i$ and $\beta_i$ are rate constants; $g_{ij}$ and $h_{ij}$ are kinetic orders that reflect the interaction from gene $j$ to gene $i$ in the activation and degradation processes, respectively. The parameter estimation of an S-system model is a large-scale optimization problem that is computationally expensive. Lee et al. [75] and Jostin and Jaeger [76] developed GRN models based on the S-system and proposed two distributed evolutionary algorithms for large-scale S-system parameter estimation. Lee et al. [75] combined the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The authors used two fitness functions based on the Minimum Square Error (MSE) and exploited island model parallelism. In this way, the entire population is divided into a number of subpopulations, and each of them is executed independently on one or more processors. The algorithm is implemented on top of the Hadoop platform. Jostin and Jaeger [76] developed a parallel island evolutionary algorithm, which is faster and more accurate than the comparable simulated annealing algorithm. Xiao et al. [77] recently developed an asynchronous parallel algorithm to improve the accuracy and lower the time complexity of large-scale GRN inference by combining splitting technology and ODEs.
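To make the cost of S-system parameter estimation concrete, the sketch below evaluates the S-system right-hand side, integrates it with a forward-Euler step, and scores a candidate parameter set by its squared error against a reference trajectory. Everything specific here is an assumption for illustration: the two-gene network, its parameter values, the Euler integrator, and the function names are hypothetical and do not reproduce the fitness functions of [75] or [76].

```python
def s_system_rhs(x, alpha, beta, g, h):
    """dx_i/dt = alpha_i * prod_j x_j**g[i][j] - beta_i * prod_j x_j**h[i][j]."""
    n = len(x)
    dx = []
    for i in range(n):
        production, degradation = alpha[i], beta[i]
        for j in range(n):
            production *= x[j] ** g[i][j]
            degradation *= x[j] ** h[i][j]
        dx.append(production - degradation)
    return dx


def simulate(x0, params, dt=0.01, steps=500):
    """Forward-Euler integration; returns the trajectory as a list of states."""
    x, trajectory = list(x0), [list(x0)]
    for _ in range(steps):
        dx = s_system_rhs(x, *params)
        x = [max(xi + dt * dxi, 1e-9) for xi, dxi in zip(x, dx)]  # keep x > 0
        trajectory.append(x)
    return trajectory


def mse(observed, simulated):
    """Mean squared error between two trajectories of equal shape."""
    total = sum((o - s) ** 2 for ro, rs in zip(observed, simulated)
                for o, s in zip(ro, rs))
    return total / (len(observed) * len(observed[0]))


# Hypothetical 2-gene network: gene 2 represses gene 1, gene 1 activates gene 2.
alpha, beta = [2.0, 1.0], [1.0, 1.0]
g = [[0.0, -0.5], [0.5, 0.0]]        # kinetic orders of the production term
h = [[1.0, 0.0], [0.0, 1.0]]         # first-order degradation
true_params = (alpha, beta, g, h)

x0 = [1.0, 1.0]
observed = simulate(x0, true_params)  # synthetic "data" from known parameters

# A candidate that wrongly drops the repression of gene 1 by gene 2.
wrong_params = (alpha, beta, [[0.0, 0.0], [0.5, 0.0]], h)
wrong_error = mse(observed, simulate(x0, wrong_params))

print("error of true parameters :", mse(observed, simulate(x0, true_params)))
print("error of wrong candidate :", wrong_error)
```

A single fitness evaluation already requires a full simulation, and an N-gene S-system has 2N(N+1) parameters, so an evolutionary search needs many such evaluations; this is why the fitness evaluations are distributed over subpopulations in the approaches above.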
The authors demonstrated that the sparsity and modularity of large-scale GRNs are much higher than those of small-scale GRNs. In their approach, the whole network is decomposed into clusters based on the MI criterion, and each cluster is modeled by ODEs. They used the Gaussian elimination process for parameter estimation. Gardner et al. [78] developed an algorithm, called NIR (Network Identification by multiple Regression), that applies a set of ODEs to a series of steady-state RNA expression measurements. NIR constructs a first-order model of regulatory interactions and uses multiple linear regression to estimate the model parameters. Because of its high time complexity, sequential NIR, like other sequential algorithms, cannot be used with large-scale datasets containing thousands of genes. Gregoretti et al. [79] developed a parallel version of the NIR algorithm. They argued that the parameter estimation of NIR can be done independently by decomposing the data matrix into a set of sub-matrices. In addition to the speedup, the results of tests on large datasets show that parallel NIR produces far fewer errors. Differential evolution is a population-based approach that holds promise for parameter estimation of ordinary differential equations and is well suited to parallelization [80], because the populations are evaluated independently of each other. In this approach, a problem is solved iteratively until the solution no longer improves with regard to a given objective function. In each iteration, a new population is created via a migration technique in which the best individual from each population is selected and copied to another population [81]. Kozlov and Samsonov [82] and Ramirez et al. [83] proposed parallel differential evolution algorithms for parameter estimation of differential equations by using the MPI library and the CUDA framework, respectively. As discussed in the introduction, there are many algorithms for GRN modeling from expression data.
In this article, we reviewed only the approaches whose modeling algorithms are parallel. Table 3 shows some of the strengths and weaknesses of the computational methods, which provides useful insights into GRN reconstruction.
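The island-style differential evolution with migration described in this section can be sketched as follows. This is a sequential toy under stated assumptions: the one-gene model dx/dt = a - b x, the DE/rand/1/bin variant, the ring migration policy, and all names and settings are hypothetical choices for illustration, not the algorithms of [82] or [83]. The per-island loop marks the work that an MPI rank or a GPU block would carry out concurrently.

```python
import random


def simulate(a, b, x0=0.0, dt=0.1, steps=30):
    """Euler trajectory of the toy model dx/dt = a - b * x."""
    xs, x = [x0], x0
    for _ in range(steps):
        x += dt * (a - b * x)
        xs.append(x)
    return xs


TARGET = simulate(2.0, 0.5)   # synthetic "observations" from known parameters


def cost(params):
    """Sum of squared errors between simulated and observed trajectories."""
    return sum((s - t) ** 2 for s, t in zip(simulate(*params), TARGET))


def de_generation(pop, rng, f=0.7, cr=0.9):
    """One DE/rand/1/bin generation with greedy replacement."""
    dim, new_pop = len(pop[0]), []
    for i, target in enumerate(pop):
        a, b, c = rng.sample([p for k, p in enumerate(pop) if k != i], 3)
        forced = rng.randrange(dim)          # at least one mutated coordinate
        trial = [a[d] + f * (b[d] - c[d]) if rng.random() < cr or d == forced
                 else target[d] for d in range(dim)]
        new_pop.append(trial if cost(trial) <= cost(target) else target)
    return new_pop


def migrate(islands):
    """Ring migration: each island's best replaces the next island's worst."""
    bests = [min(pop, key=cost) for pop in islands]
    for k, pop in enumerate(islands):
        worst = max(range(len(pop)), key=lambda i: cost(pop[i]))
        pop[worst] = list(bests[k - 1])


def island_de(n_islands=4, pop_size=10, generations=100, interval=10, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.uniform(0, 5), rng.uniform(0, 5)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        # Islands evolve independently between migrations; in a parallel
        # implementation each island would run on its own worker.
        islands = [de_generation(pop, rng) for pop in islands]
        if gen % interval == 0:
            migrate(islands)
    return min((ind for pop in islands for ind in pop), key=cost)


best = island_de()
print("estimated (a, b):", [round(v, 3) for v in best], "cost:", cost(best))
```

Because islands exchange only their best individual every few generations, communication is small compared with the cost of fitness evaluation, which is one reason this scheme maps well onto message-passing frameworks.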

CONCLUSION AND DISCUSSION

According to the reviewed papers, parallel approaches mostly use the MPI library (Fig. ). There are several possible reasons for this. One is that some frameworks, such as CUDA, are supported only on specific hardware and programming languages. Another is that MPI can be applied to a wider range of problems than the other frameworks. The last reason is that, in spite of the complexity of MPI programming, researchers tend to be more proficient in MPI than in the CUDA and Hadoop frameworks. However, none of the frameworks is complete; each has its own limitations. As discussed earlier, hybrid parallel programming such as MPI-OpenMP, MPI-CUDA, and OpenMP-CUDA is a good way to achieve better performance and increase flexibility. The mathematical model is an alternative categorization. Based on the reviewed papers, information theory based and differential equation based approaches are used more often than PBN models (Fig. ). There are two important reasons for this: first, these approaches are more prevalent among bioinformatics researchers; second, their branch-less nature makes them attractive for parallelism. However, several parallel algorithms developed for PBN structure learning exist in the literature and can be used in the context of GRN problems with minor modifications. As discussed earlier, using PBNs together with prior knowledge (e.g. gene ontology or biological knowledge) can ultimately improve accuracy and has a reasonable biological justification. In order to perform parallel inference, selecting the modeling approach and the parallel framework are essential steps. In this work, we reviewed parallel algorithms for the GRN inference problem. We also briefly explained parallel frameworks for programming and development of algorithms. Table 4 summarizes the research works we have found in the literature that use parallelism in the reverse engineering process.
As a result of our study, we propose some guidelines to facilitate decision-making for parallel reverse engineering. First, GRNs are often modular [84]; modularity is a suitable property for parallel reverse engineering, and researchers can build efficient parallel algorithms on it. Second, based on the reviewed papers, much less attention has been paid to the knowledge-based approach, so developing parallel knowledge-based algorithms is an interesting direction. Third, in gene expression datasets, sample sizes are substantially smaller than the number of available genes. This is known as the “large p small n” problem, and researchers must take it into account to design more efficient methods. Sequential inference algorithms are severely limited by the size of the dataset and often do not provide valuable information such as hub genes, master regulators, and many others. Parallel algorithms for large-scale GRN problems deliver fast and useful results. However, this field is interdisciplinary, involving parallel algorithm design, bioinformatics, and machine learning. Therefore, in this paper, parallel reverse engineering algorithms were reviewed from the perspective of the parallel frameworks used, the bioinformatics knowledge used for inference, and the mathematical modeling methods.

CONSENT FOR PUBLICATION

Not applicable.

Table 1

Parallel framework comparison.

Framework	Programming Model	Framework Complexity	Programming Language	Ease of Use	Code Conversion Effort	Scalability
CUDA	SIMD	Fair	C/C++	Moderate	More	Low
OpenMP	Multi-thread	Low	Most Languages	Easy	Less	Low
MPI	SIMD/MIMD	Fair	Most Languages	Poor	More	Medium
Hadoop	Distributed	High	Java	Poor	More	High

Note: Framework complexity refers to the difficulty in using different frameworks.

Table 2

Some related libraries and projects on CUDA, MPI, OpenMP, and Hadoop.

Project	Description	URL
Spark	An open-source cluster-computing framework on Hadoop	http://spark.apache.org/
Pig	A query language based on Hadoop for basic calculations over large datasets	http://pig.apache.org/
Mahout	A distributed machine learning and data mining library on Hadoop	http://mahout.apache.org/
OpenMPI	Most used implementation of the MPI model. Open MPI 1.7 and later is CUDA-aware	https://www.open-mpi.org/
MVAPICH	CUDA-aware MPI implementation. It helps to run CUDA+MPI	http://mvapich.cse.ohio-state.edu/
Mars	A Map-Reduce framework on graphics processors	https://github.com/arianepaola/Mars
CuBLAS	An implementation of basic linear algebra subprograms on CUDA framework	https://developer.nvidia.com/cublas
JCUDA	Java bindings for CUDA libraries. It helps to run Hadoop Map task on GPUs	http://www.jcuda.org/
omp4j	An OpenMP like library for Java programming language	http://www.omp4j.org/
mpi4py	A library for MPI programming in Python	http://pythonhosted.org/mpi4py/
PyCUDA	A library for integrating CUDA in Python	https://github.com/inducer/pycuda

Table 3

Advantages and disadvantages of computational methods.

Model	Strength	Weakness
Bayesian network	• Facilitates the incorporation of prior knowledge and experimental data • Able to cope with incomplete and noisy data • Handles uncertainty	• Feedback regulations not allowed • Learning the structure of the Bayesian network is NP-hard and therefore applies only to small-scale networks • Cannot model time series data
Information theory	• Easy to parallelize • Low computational cost • Able to detect complex interactions	• Can have a high rate of false positives in high dimensional data • Poor asymptotic behaviour under high dimensional data
Differential equation	• Suitable for time series and steady-state data • Models positive and negative feedback interactions	• Difficult to find optimal parameter values • Applicable to small-scale networks

Table 4

Parallel GRN inference algorithms.

Reference	Data Type	Based on	Framework	(Co)processor	Source Available	Description
[66]	Discrete	Information Theory	CUDA	GPU	√³	Known as CUDA-MI
[56]	Continuous	Information Theory	CUDA	GPU	√⁴	Known as FastGCN
[68]	Discrete	Information Theory	CUDA	GPU	√⁵	Known as CMIP
[83]	Continuous	Differential Equation	CUDA	GPU	-	-
[70]	Discrete	Information Theory	CUDA-OpenMP	GPU	-	-
[74]	Continuous	Information Theory	OpenMP	-	√⁶	Known as PARMIGENE
[76]	Continuous	Differential Equation	MPI	-	-	-
[77]	Continuous	Differential Equation	MPI	-	-	Known as LSGPA
[53]	Continuous	Bayesian Network	MPI	-	√⁷,⁸	Known as fastBMA
[50]	Continuous	B-S-L¹	MPI	-	√⁹	Known as SiGN
[65]	Discrete	Information Theory	MPI	Intel Xeon Phi	-	Based on TINGe
[49]	Discrete	Bayesian Network	MPI	Intel Xeon/ Intel Xeon Phi	-	-
[82]	Continuous	Differential Equation	MPI	Intel Xeon	-	Known as DEEP
[48]	Continuous	Information Theory	MPI	-	√¹⁰	Known as TINGe
[64]	Discrete	Information Theory	MPI	Intel Xeon	-	-
[79]	Continuous	Differential Equation	MPI	-	-	Known as Parallel NIR
[20]	Discrete	Bayesian Network	MPI	Intel Xeon	√¹¹	-
[47]	Discrete	Bayesian Network	MPI	Cray AMD	-	-
[75]	Continuous	Differential Equation	Hadoop	-	-	-
[67]	Continuous	Information Theory	Hadoop	-	-	-
[71]	Continuous	*²	-	-	√¹²	Known as LegumeGRN

1 B-S-L: Bayesian Network, State Space Model, L1-regularization

2 Software that implements multiple well-known reverse engineering algorithms

3 https://sites.google.com/site/liuweiguohome/cuda-mi

4 http://ibi.zju.edu.cn/software/FastGCN/

5 http://www.picb.ac.cn/CMIP/

6 https://cran.r-project.org/web/packages/parmigene/index.html

7 https://github.com/lhhunghimself/fastBMA, fastBMA is a parallel implementation of ScanBMA

8 https://www.bioconductor.org/

9 http://sign.hgc.jp/

10 http://aluru-sun.ece.iastate.edu/doku.php?id=tinge_gena

11 http://bonsai.hgc.jp/~tamada/hgc/suppl/GWGN/index.html

12 https://legumegrn.noble.org/cc.html

62 in total

Review 1. Genomics, gene expression and DNA arrays.

Authors: D J Lockhart; E A Winzeler
Journal: Nature Date: 2000-06-15 Impact factor: 49.962

2. Estimating mutual information.

Authors: Alexander Kraskov; Harald Stögbauer; Peter Grassberger
Journal: Phys Rev E Stat Nonlin Soft Matter Phys Date: 2004-06-23

3. Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information.

Authors: Xiujun Zhang; Xing-Ming Zhao; Kun He; Le Lu; Yongwei Cao; Jingdong Liu; Jin-Kao Hao; Zhi-Ping Liu; Luonan Chen
Journal: Bioinformatics Date: 2011-11-15 Impact factor: 6.937

4. Estimating genome-wide gene networks using nonparametric Bayesian network models on massively parallel computers.

5. Inference of gene regulatory networks using time-series data: a survey.

Review 6. RNA-Seq: a revolutionary tool for transcriptomics.

7. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics.

8. Gene regulatory networks inference using a multi-GPU exhaustive search algorithm.

9. Chemogenomic profiling on a genome-wide scale using reverse-engineered gene networks.

10. Fast Bayesian inference for gene regulatory networks using ScanBMA.