
IBBOMSA: An Improved Biogeography-based Approach for Multiple Sequence Alignment.

Rohit Kumar Yadav1, Haider Banka2.   

Abstract

In bioinformatics, multiple sequence alignment (MSA) is an NP-hard problem. Hence, nature-inspired techniques can be used to approximate the solution. In the current study, a novel biogeography-based optimization (NBBO) is proposed to solve the MSA problem. Biogeography-based optimization (BBO) is a relatively new paradigm for optimization, but it has some deficiencies in solving complicated problems, such as low population diversity and a slow convergence rate. NBBO is an enhanced version of BBO in which a new migration operation is proposed to overcome these limitations. The new migration adopts more information from other habitats, maintains population diversity, and preserves exploitation ability. In the performance analysis, the proposed method and existing techniques such as VDGA, MOMSA, and GAPAM are tested on publicly available benchmark datasets (ie, BAliBASE). The proposed method is observed to be superior to, or competitive with, the existing techniques.


Keywords:  Multiple sequence alignment (MSA); biogeography-based optimization (BBO); diversity; migration operator

Year:  2016        PMID: 27812276      PMCID: PMC5084829          DOI: 10.4137/EBO.S40457

Source DB:  PubMed          Journal:  Evol Bioinform Online        ISSN: 1176-9343            Impact factor:   1.625


Introduction

Aligning three or more amino acid (protein) sequences at a time is called multiple sequence alignment (MSA). MSA is one of the most important tools for solving biological problems. It helps to predict the secondary and tertiary structures of RNA and proteins.1,2 It can be used to reconstruct phylogenetic trees and to predict the function of an unknown protein by aligning its sequence with sequences of known function. MSA also reveals similarity among sequences, which can help to identify similarity in function and structure.3,4 For an MSA to be valid, all sequences in the alignment must have a common origin. The goal of MSA is to maximize the matching of amino acids as far as possible.5 Therefore, MSA is an important problem in bioinformatics for studying genetic and phylogenetic relationships. Several methods have been proposed in the past to solve the MSA problem. An optimal alignment can be obtained by dynamic programming (DP). DP uses a scoring function defined over a large domain. In 1970, Needleman and Wunsch6 proposed a DP algorithm for aligning two sequences. However, when the number and length of the sequences increase, the complexity of DP grows exponentially, and the MSA problem becomes NP-hard. Since complexity is the main practical constraint, the matching of amino acid sequences has to be maximized in limited time. This is the major reason why researchers have switched to other methods. The MSA problem can also be solved using the progressive method.
The progressive approach has lower time and space complexity for solving an MSA problem.7,8 In progressive alignment, the most similar sequences are aligned first, and then more divergent sequences or groups of sequences are incrementally added to the initial alignment. The standard representative of progressive methods is CLUSTALW.9 In the first step of this approach, a weight is assigned to each pair of sequences in a partial alignment: a small weight for the most similar sequences and a large weight for the most divergent ones. In the second step, a substitution matrix defines the score between two residues of a protein sequence based on similarity. In the third step, two types of gap penalties are introduced: residue-specific gap penalties and locally reduced gap penalties. In the fourth step, positions where gaps were introduced early receive locally reduced gap penalties to encourage the opening of gaps at these positions. These four steps are integrated into CLUSTALW, which is freely available. Progressive alignment performs well in terms of accuracy and time. However, this method has some limitations: it depends on the initial alignment and on the choice of scoring scheme. In other words, the most similar sequences must be aligned in the initial stage; otherwise, the solution may be trapped in local optima. An iterative method is another option for solving MSA. An iterative method depends less on the initial alignment because it starts with an initial alignment and improves the solution in each iteration until no more improvement is possible. The main objective of the iterative approach is to globally improve the quality of a sequence alignment. There are some iterative and stochastic approaches for MSA (for example, simulated annealing10,11).
Hidden Markov Models Training (HMMT)12 is based on a simulated annealing process. The solutions produced by these methods may also be trapped in local optima. Evolutionary algorithms (EAs)13,14 are population-based algorithms. In the first step, a random initial population is generated. In the next step, operators are applied to modify the population for the next generation. These operators are applied repeatedly in search of the global optimum. When an EA is used for MSA, the initial generation is generated randomly, and the steps of the EA are then applied to improve the similarities among the sequences. Several evolutionary computation methods have been proposed for MSA.15–19 There are also genetic algorithm (GA)-based methods for MSA, such as SAGA,19 GA-ACO,20 MSA-EC,21 MSA-GA,22 RBT-GA,23 GAPAM,24 VDGA,25 and MOMSA.26 We briefly describe the methodology of some of these GA-based algorithms. In SAGA, the initial generation is generated randomly, and 22 different operators are used to gradually improve the fitness of the MSA. The drawback of SAGA is its time complexity, which is caused by repeated evaluation of the fitness function. RBT-GA is also a GA-based method, combined with the rubber band technique (RBT), to find optimal protein sequence alignments.27 RBT28 is an iterative algorithm for sequence alignment using a DP table. The authors26 solved 56 problems from reference sets 1, 2, 3, 4, and 5 of the benchmark BAliBASE 2.0 dataset and BAliBASE 3.0 dataset. A drawback of these evolutionary methods is that they can also be trapped in local optima because of poor diversity of the solutions.

Motivation and contributions

In biology, MSA is crucial for solving numerous standard problems such as structure prediction and phylogenetic analysis. According to the open literature, MSA is still an open and challenging problem. Hence, we are motivated to solve the MSA problem using an improved version of biogeography-based optimization (BBO). This paper makes the following contributions. We first propose a method to improve the migration operator in BBO and then use it in MSA to maintain the diversity of the solutions. The results obtained in the experimental analysis are better in terms of running time. In addition, we provide comparison tables, which indicate that our method is better than the existing competitive solutions in terms of matching score.

Biogeography-Based Optimization

BBO29 is modeled on the emigration and immigration of species from one habitat to another. In the BBO algorithm, candidate solutions are called habitats (or islands). Each feature in a solution represented by a habitat is called a suitability index variable (SIV), while the goodness of a habitat is measured by the habitat suitability index (HSI). Habitats with a high HSI can support more species, whereas low HSI habitats support only a few species. Poor habitats can improve their HSI by accepting new features from more attractive habitats in the evolution process. In BBO, there are two main operators: migration and mutation. The migration operator is a probabilistic operator that can randomly modify SIVs based on the immigration rate λ and emigration rate µ. Both λ and µ are functions of the number of species in the ith habitat (Hi). In the original BBO algorithm, for mathematical convenience, µ and λ are assumed to be linear functions of the number of species with the same maximum values. The linear migration model for the ith habitat (Hi) can be calculated as

$$ \lambda_i = I \left(1 - \frac{n_i}{n_{\max}}\right), \qquad \mu_i = E \, \frac{n_i}{n_{\max}} $$

where E is the maximum possible emigration rate, I is the maximum possible immigration rate, n_i is the number of species in the ith habitat, and n_max is the maximum number of species. The complete process of BBO is given in Algorithm 1.
Algorithm 1

Main procedure of BBO

Begin
  Initialize the population Pop with N habitats randomly
  Evaluate the fitness (HSI) for each habitat in Pop
  While (criteria of termination not satisfied)
    Map the HSI to the number of species count S for each habitat
    Calculate the immigration rate and emigration rate according to migration model
    Modify habitats with the migration operator (Algorithm 2)
    Mutate habitats with mutation operator (Algorithm 3)
  End While
End
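The species-count mapping and rate calculation in Algorithm 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the linear model of the original BBO and assumes that the species count of a habitat is simply its fitness rank (the function name `migration_rates` and the rank-based mapping are hypothetical choices).

```python
def migration_rates(fitness, E=1.0, I=1.0):
    """Return (immigration, emigration) rates for each habitat.

    Assumption: the species count of a habitat is its fitness rank,
    so the best habitat holds n_max species and the worst holds 1.
    """
    n_max = len(fitness)
    # Rank habitats from worst (count 1) to best (count n_max).
    order = sorted(range(n_max), key=lambda i: fitness[i])
    species = [0] * n_max
    for count, idx in enumerate(order, start=1):
        species[idx] = count
    lam = [I * (1.0 - s / n_max) for s in species]  # immigration rate
    mu = [E * s / n_max for s in species]           # emigration rate
    return lam, mu


lam, mu = migration_rates([3.0, 9.0, 6.0, 1.0])
print(lam, mu)
```

Under these assumptions the fittest habitat gets the lowest immigration rate and the highest emigration rate, which matches the description that good habitats mainly export features.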
In BBO, the migration operator is a probabilistic operator that is used to randomly adjust each habitat Hi by sharing features among habitats. The probability that Hi is modified is proportional to its immigration rate λi, while the probability that the source of the modification comes from Hj is proportional to the emigration rate µj. The migration equation is expressed as

$$ H_i(\mathrm{SIV}) \leftarrow H_j(\mathrm{SIV}) $$

where Hi(SIV) denotes a feature (SIV) of the ith habitat Hi. As Simon stated, the migration operator merely migrates SIVs from one solution to another and does not involve reproduction of “children”.29 The migration operator algorithm process is shown in Algorithm 2.
Algorithm 2

Migration operator

Begin
  For i = 1 to N
    If rand(0,1) < λi
      Hi is selected
    End If
    For j = 1 to N
      If rand(0,1) < µj
        Hi(SIV) = Hj(SIV)
      End If
    End For
  End For
End
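A minimal Python sketch of this migration operator is given below. It is illustrative only and makes three assumptions that the pseudocode leaves open: the inner loop runs only when Hi has been selected, a habitat does not migrate features to itself, and one randomly chosen SIV position is copied per accepted emigrating habitat. The function name `migrate` is hypothetical.

```python
import random


def migrate(pop, lam, mu, rng=random):
    """Basic BBO migration over a population of habitats (lists of SIVs)."""
    new_pop = [list(h) for h in pop]  # copy so source habitats stay unmodified
    n = len(pop)
    for i in range(n):
        if rng.random() >= lam[i]:
            continue  # H_i is not selected for immigration
        for j in range(n):
            # Assumption: skip j == i, a habitat does not emigrate to itself.
            if j != i and rng.random() < mu[j]:
                k = rng.randrange(len(pop[i]))  # random SIV position
                new_pop[i][k] = pop[j][k]       # H_i(SIV) <- H_j(SIV)
    return new_pop
```

Copying into `new_pop` rather than modifying `pop` in place keeps every emigrating habitat unchanged during one pass, in line with the remark that features remain in the source habitat.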
Cataclysmic events can cause a species count to differ from its equilibrium value, thereby suddenly changing a habitat’s HSI. This sudden change is modeled in BBO as mutation. The SIVs of the ith habitat Hi can be randomly modified by the mutation operator according to the habitat’s a priori probability Pi. The mutation probability mi of the ith habitat Hi is expressed as

$$ m_i = m_{\max} \left(1 - \frac{P_i}{P_{\max}}\right) $$

where mmax is a user-defined parameter and Pmax = max(Pi), i = 1, 2, …, N. In the BBO mutation operator, an SIV in a habitat is replaced by a new feature generated randomly in the entire solution space, which tends to increase population diversity. The process of the mutation operator is given in Algorithm 3.
Algorithm 3

Mutation operator

Begin
  For i = 1 to N
    Use µ to compute the probability Pi
    If rand(0,1) < Pi
      Hi is selected
      Hi(SIV) = Random value generated within the search space
    End If
  End For
End
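The mutation step can be sketched in Python as follows. This is a hedged illustration: it assumes the standard BBO mutation probability m_i = m_max(1 − P_i/P_max), takes the habitat probabilities P as given, and replaces one randomly chosen SIV with a uniform random integer. The names `mutation_probabilities` and `mutate`, the default `m_max`, and the integer range are all hypothetical.

```python
import random


def mutation_probabilities(P, m_max=0.05):
    """m_i = m_max * (1 - P_i / P_max) for each habitat probability P_i."""
    p_max = max(P)
    return [m_max * (1.0 - p / p_max) for p in P]


def mutate(pop, m, low, high, rng=random):
    """Replace one random SIV of each selected habitat by a random value."""
    out = [list(h) for h in pop]
    for i, habitat in enumerate(out):
        if rng.random() < m[i]:               # H_i is selected
            k = rng.randrange(len(habitat))   # random SIV position
            habitat[k] = rng.randint(low, high)
    return out
```

With this form, the most probable habitat (P_i = P_max) is never mutated, while unlikely habitats are mutated with probability close to `m_max`.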

Proposed Method

Habitat representation

In BBO, each solution is represented as a habitat, and the population consists of N habitats. In the initialization stage, gaps are first inserted at random positions in the given sequences. The initial solution is shown in Figure 1.
Figure 1

Initial solution.

Binary encoding scheme: In the encoding scheme, a 1 is placed at each gap position and a 0 at each residue position. Figure 2 shows the encoding of the initial solution.
Figure 2

Encoding scheme.

After that, we take the decimal value of the binary code of each column, read from bottom to top. Hence, the habitat representation of this solution is X1 = (1, 0, 0, 8, 2, 4), and the number of features in the habitat equals the number of columns in the MSA. In this manner, we generate 100 solutions by inserting gaps into the MSA, which gives 100 habitats at initialization.
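The column-wise encoding can be illustrated with a short Python sketch. The alignment below is a made-up example, not the one in Figures 1 and 2, and the bit order is an assumption: the top row is treated as the most significant bit and the bottom row as the least significant one. The function name `encode_alignment` is hypothetical.

```python
def encode_alignment(rows, gap="_"):
    """Encode an alignment as one integer per column.

    A gap contributes bit 1 and a residue bit 0. Assumption: the top row
    is the most significant bit; reverse `rows` for the opposite reading.
    """
    n_cols = len(rows[0])
    assert all(len(r) == n_cols for r in rows), "rows must have equal length"
    habitat = []
    for c in range(n_cols):
        bits = "".join("1" if r[c] == gap else "0" for r in rows)
        habitat.append(int(bits, 2))
    return habitat


rows = ["AC_GT",
        "A_CGT",
        "ACCG_",
        "_CCGT"]
print(encode_alignment(rows))  # [1, 4, 8, 0, 2]
```

Each habitat therefore has one SIV per alignment column, and with four sequences every SIV lies between 0 and 15.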

Fitness function

The sum-of-pairs score is used to measure the fitness of an MSA. Each column in an alignment is scored by summing the scores of every pair of symbols in that column. The score of the entire alignment is then obtained by summing over all column scores using (3.2.1) and (3.2.2):

$$ W = \sum_{i=1}^{P} W_i, \qquad W_i = \sum_{j=1}^{N-1} \sum_{k=j+1}^{N} \mathrm{Cost}(A_j, A_k) \qquad (3.2.1) $$

$$ Z = Q + r \cdot A \qquad (3.2.2) $$

Here, W is the cost of the MSA, P is the length (number of columns) of the alignment, Wi is the cost of the ith column, N is the number of sequences, and Cost(Aj, Ak) is the alignment score between two aligned sequences Aj and Ak. When Aj ≠ “_” and Ak ≠ “_”, Cost(Aj, Ak) is determined from the point accepted mutation (PAM) matrix. When Aj = “_” and Ak = “_”, Cost(Aj, Ak) = 0. Finally, when Aj = “_” and Ak ≠ “_”, or Aj ≠ “_” and Ak = “_”, Cost(Aj, Ak) is the insertion/deletion cost given by a model with affine gap penalties, as shown in (Eq. 3.2.2). Here, Z is the gap penalty, Q is the cost of opening a gap, r is the cost of extending the gap, and A is the number of gaps. In this paper, the gap opening penalty is −5 and the gap extension penalty is −0.40.
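A minimal sketch of a sum-of-pairs score with affine gap penalties is shown below. It is illustrative rather than the authors' C implementation: the `match` and `mismatch` values stand in for a PAM matrix lookup, the score is accumulated pair by pair rather than column by column (the total is the same for additive costs), and a run of L residue-versus-gap columns is assumed to cost Q + r·L. All function and parameter names are hypothetical.

```python
from itertools import combinations

GAP = "_"


def pair_cost(a, b, gap_open=-5.0, gap_extend=-0.40,
              match=2.0, mismatch=-1.0):
    """Score two aligned rows of equal length.

    Assumption: match/mismatch replace a PAM lookup, and a gap run of
    length L costs gap_open + gap_extend * L.
    """
    total, in_gap = 0.0, False
    for x, y in zip(a, b):
        if x == GAP and y == GAP:
            continue                  # Cost = 0, gap state unchanged
        if x == GAP or y == GAP:
            if not in_gap:
                total += gap_open     # Q: charged once per gap run
            total += gap_extend       # r: charged for every gap column
            in_gap = True
        else:
            total += match if x == y else mismatch
            in_gap = False
    return total


def sum_of_pairs(rows, **kw):
    """Sum the pairwise costs over all pairs of rows in the alignment."""
    return sum(pair_cost(a, b, **kw) for a, b in combinations(rows, 2))
```

The sketch does not distinguish which of the two rows holds the gap; a fuller implementation would track gap runs per sequence.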

New solution generation

Two types of operators are used to generate new solutions: migration and mutation. To improve a solution, a low-HSI habitat accepts features from a high-HSI habitat. This process is called migration.

Migration

Migration is used to diversify the solutions, that is, to explore the search space, whereas mutation intensifies the search. In each iteration, the migration and mutation operators are applied to the habitats. In the migration process, features of a high-HSI habitat are shared with a low-HSI habitat to improve solution quality. This operator is very effective, and the resulting habitat can differ considerably from the original habitat. Two habitats are chosen according to the immigration and emigration rates. Then, one index is chosen randomly in the emigrating habitat, and the SIV/element at this index is copied to the same position of the immigrating habitat. This process is presented in Figure 3.
Figure 3

Graphical representation of migration process.

Mutation

This operator has a smaller effect: the difference between the original habitat and the resulting habitat is very small. It is applied infrequently and intensifies the search. One habitat is chosen based on the mutation probability. Then, one index of this habitat is chosen randomly, and the element at this index is replaced by a new SIV/element between 0 and 2^N (where N is the total number of sequences in the MSA). The graphical representation of this process is shown in Figure 4.
Figure 4

Graphical representation of mutation process.


Test Dataset

We tested a large number of datasets from the BAliBASE benchmark database to check the quality of our approach. BAliBASE version 1.0 (Ref. 30) contains 142 reference alignments with more than 1000 sequences. BAliBASE version 2.0 (Ref. 31) is an extended version of BAliBASE version 1.0 and contains 167 reference alignments with more than 2100 sequences. BAliBASE version 2.0 contains eight reference sets, each of which holds a different type of sequences. Reference set 1 contains a small number of equidistant sequences. Reference set 2 contains totally different or unrelated sequences. Reference set 3 contains a pair of divergent subfamilies. Reference set 4 contains sequences with long terminal extensions. Reference set 5 contains large internal insertions and deletions. Finally, reference sets 6–8 contain test cases in which the sequences are repeated and the domains are inverted. The Bali score measures the quality of an alignment produced by an algorithm. It compares the manual reference alignment (available in BAliBASE version 2.0) with the alignment produced by a given method. The Bali score ranges from 0 to 1. If the reference alignment file and our output file are the same, the score is 1; if they are totally different, the score is 0. Otherwise, the score lies between 0 and 1 according to the similarity between the two files.
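The idea behind such a reference-based score can be sketched in Python as follows. This is a simplified sum-of-pairs comparison written for illustration, not the bali_score program itself: it computes the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The function names and the "_" gap symbol are assumptions.

```python
from itertools import combinations


def aligned_pairs(rows, gap="_"):
    """Set of residue pairs that share a column.

    Each residue is identified by (sequence index, position in the
    ungapped sequence), so the result does not depend on gap placement.
    """
    pairs = set()
    counters = [0] * len(rows)
    for col in range(len(rows[0])):
        present = []
        for s, row in enumerate(rows):
            if row[col] != gap:
                present.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(test_rows, ref_rows, gap="_"):
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref = aligned_pairs(ref_rows, gap)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(test_rows, gap)) / len(ref)
```

An alignment identical to the reference scores 1.0, and one that recovers none of the reference residue pairs scores 0.0, matching the range described above.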

Experimental Analyses

In this section, we first compare IBBOMSA with recently proposed EA-based MSA algorithms, including VDGA,23 GAPAM,22 and MOMSA,24 to assess its performance. We then compare the performance of IBBOMSA with many widely used aligners. IBBOMSA is coded in C and run on a personal computer on the Linux platform.

Effect of improved operator in BBO

The BBO algorithm is modeled on the immigration and emigration of species between habitats in a multidimensional search space. Each habitat represents a solution. In traditional BBO migration, features of a good solution appear in a poor solution as new features while still remaining in the good solution. Since such a feature may then exist in several solutions, this may increase the exploitation capability and decrease the diversity of the search. In the improved migration, an updated feature appears in the poor solution, where the updated feature comes from our proposed migration operator. We use a scaling function F to balance exploration (diversity) and exploitation, and F has to be chosen properly to maintain both. If F = 0, the operator is similar to traditional BBO; hence, the diversity of the search decreases and the exploitation capability increases. If F = 1, the diversity increases and the exploitation capability decreases. To balance the two, we take F = 0.5. To analyze the effect of the proposed operator on the algorithm's performance, we designed five sets of experiments, in which GAPAM, VDGA, BBO, MOMSA, and the improved BBO were run. The fitness of each habitat is measured according to the fitness function given in the “Fitness function” section. We used eight BAliBASE datasets for these experiments (four from each of reference sets 1 and 2, which are illustrated in Figs. 5 and 6, respectively).
Figure 5

Performance of improved BBO and some existing methods per generation with respect to reference set 1. (A) Performance of proposed method and other existing methods with respect to 1ped Data. (B) Performance of proposed method and other existing methods with respect to 1amk Data. (C) Performance of proposed method and other existing methods with respect to 1fieA Data. (D) Performance of proposed method and other existing methods with respect to 1ldg Data.

Figure 6

Performance of improved BBO and some existing methods per generation with respect to reference set 2. (A) Performance of proposed method and other existing methods with respect to 1csy Data. (B) Performance of proposed method and other existing methods with respect to 1cpt Data. (C) Performance of proposed method and other existing methods with respect to 1havA Data. (D) Performance of proposed method and other existing methods with respect to 1sbp Data.

Experimental results and analysis

Comparison of IBBOMSA with MOMSA, VDGA, and GAPAM. To examine the performance of the proposed method, IBBOMSA, we compare it with well-known existing methods such as VDGA,23 GAPAM,22 and MOMSA,24 which are among the best recent methods for MSA. For a fair comparison, we use the dataset selected by the authors of MOMSA. They chose 56 test cases in BAliBASE 2.0: 18 test cases from reference set 1, 23 from reference set 2, 11 from reference set 3, and 2 from each of reference sets 4 and 5. The fitness value of each MSA is calculated as described in the “Fitness function” section. IBBOMSA is run 10 times, and the best result is recorded. Tables 1–5 show the results of IBBOMSA, MOMSA, VDGA, and GAPAM on BAliBASE reference sets 1, 2, 3, 4, and 5, respectively.
Table 1

Result of IBBOMSA, MOMSA-W, VDGA, and GAPAM on Bali base reference set 1.

NAME        SEQ NUMBER  SEQ LENGTH  GAPAM22  VDGA23  MOMSA24  IBBOMSA
1idy        5           058         0.5650   0.5730  0.2154   0.5745
1tvxA       4           69          0.3160   0.2670  0.0526   0.4234
1uky        4           220         0.4020   0.4490  0.5148   0.5879
kinase      5           276         0.4870   0.5450  0.8496   0.7834
1ped        3           374         0.4980   0.4820  0.7389   0.8269
2myr        4           474         0.3170   0.3590  0.4372   0.4678
1ycc        4           116         0.8450   0.7550  0.9345   0.8269
3cyr        4           109         0.9110   0.8210  0.8154   0.8934
1ad2        4           213         0.9560   0.9410  0.9562   0.9279
1ldg        4           675         0.9630   0.9060  0.9886   0.8256
1fieA       4           442         0.9630   0.9300  0.9820   0.9852
1sesA       5           63          0.9820   0.9620  0.9583   0.9929
1krn        5           82          0.9600   0.9600  1.0000   0.9286
2fxb        5           63          0.9700   0.9780  0.9357   0.9798
1amk        5           258         0.9980   0.9840  0.9947   0.9456
1ar5A       4           203         0.9740   0.9380  0.9604   0.9238
1gpb        5           828         0.9830   0.9840  0.9862   0.9889
1taq        5           928         0.9450   0.9590  0.9477   0.9125
Avg. score                          0.7797   0.7662  0.7926   0.8219
Table 2

Result of IBBOMSA, MOMSA-W, VDGA, and GAPAM on Bali base reference set 2.

NAME        SEQ NUMBER  SEQ LENGTH  GAPAM22  VDGA23  MOMSA24  IBBOMSA
1aboA       15          80          0.7960   0.6910  0.8398   0.8425
1idy        19          60          0.9890   0.9920  0.9743   0.9270
1csy        19          99          0.7640   0.8850  0.8536   0.8576
1r69        20          76          0.9650   0.8340  0.9450   0.9789
1tvxA       16          69          0.9200   0.9740  0.9365   0.9819
1tgxA       19          71          0.8780   0.8780  0.9522   0.9628
1ubi        15          60          0.7670   0.7780  0.9211   0.8967
1wit        20          106         0.8510   0.8150  0.9203   0.9119
2trx        18          94          0.9860   0.9860  0.9863   0.9468
1sbp        16          262         0.7650   0.7720  0.8808   0.9226
1havA       26          242         0.8790   0.8460  0.8969   0.8997
1uky        23          225         0.8080   0.8910  0.9404   0.9525
2hsdA       20          255         0.7960   0.8290  0.9192   0.9249
2pia        16          294         0.8280   0.8500  0.9733   0.9345
3grs        15          237         0.7460   0.7510  0.8492   0.8719
kinase      18          287         0.7990   0.8880  0.9397   0.9452
1ajsA       18          389         0.8990   0.9050  0.9015   0.9110
1cpt        15          434         0.8750   0.8120  0.8862   0.8943
1lvl        23          473         0.7810   0.8190  0.9462   0.9268
1pamA       18          511         0.8600   0.8630  0.9581   0.9719
1ped        18          388         0.9120   0.9470  0.9717   0.9779
2myr        17          482         0.8220   0.8300  0.9659   0.9618
4enl        17          440         0.8960   0.8890  0.9151   0.9201
Avg. score                          0.8513   0.8576  0.9249   0.9270
Table 3

Result of IBBOMSA, MOMSA-W, VDGA, and GAPAM on Bali base reference set 3.

NAME        SEQ NUMBER  SEQ LENGTH  GAPAM22  VDGA23  MOMSA24  IBBOMSA
1idy        27          60          0.6010   0.5990  0.4600   0.6025
1r69        23          78          0.7090   0.7330  0.8784   0.8879
1ubi        22          97          0.3860   0.4140  0.6606   0.7107
1wit        19          102         0.7580   0.8730  0.8895   0.7935
1uky        24          220         0.4680   0.4810  0.6393   0.6634
kinase      23          287         0.8280   0.8900  0.8912   0.8345
1ajsA       28          396         0.3110   0.4530  0.5422   0.5754
1pamA       19          511         0.8350   0.7880  0.9236   0.8689
1ped        21          388         0.8130   0.8930  0.9131   0.9240
2myr        21          482         0.5130   0.6510  0.7278   0.7464
4enl        19          427         0.8000   0.8660  0.8158   0.8698
Avg. score                          0.6383   0.6946  0.7583   0.7706
Table 4

Result of IBBOMSA, MOMSA-W, VDGA, and GAPAM on Bali base reference set 4.

NAME        SEQ NUMBER  SEQ LENGTH  GAPAM22  VDGA23  MOMSA24  IBBOMSA
1dynA       6           848         0.0330   0.0330  0.8000   0.8978
kinase2     18          468         0.3840   0.5420  1.0000   0.8426
Avg. score                          0.2085   0.2875  0.9000   0.8702
Table 5

Result of IBBOMSA, MOMSA-W, VDGA, and GAPAM on Bali base reference set 5.

NAME        SEQ NUMBER  SEQ LENGTH  GAPAM22  VDGA23  MOMSA24  IBBOMSA
2cba        8           328         0.8520   0.8350  0.9875   0.8687
s51         15          301         0.8350   0.7430  0.9814   0.9829
Avg. score                          0.8435   0.7890  0.9844   0.9258

Comparison of IBBOMSA with MOMSA

MOMSA was recently developed for MSA and is based on multiobjective optimization. It can develop more than one solution at a time. The authors of MOMSA have reported results in comparison with many alignment algorithms. The proposed method, IBBOMSA, can also develop more than one solution at a time. To assess both algorithms, we use all the datasets of BAliBASE versions 2.0 and 3.0. Tables 6 and 7 show the average SP and TC scores obtained by the two algorithms for every group of test cases of BAliBASE versions 2.0 and 3.0. The SP and TC scores obtained by MOMSA are those reported in Ref. 24. From Table 6, the proposed IBBOMSA performs better than MOMSA in most cases in terms of both SP and TC scores on BAliBASE version 2.0. From Table 7, IBBOMSA also outperforms MOMSA on most subsets in terms of SP and TC scores on BAliBASE version 3.0.
Table 6

Alignment score comparison between MOMSA and IBBOMSA on the BAliBASE version 2.0.

ALGORITHMS               MOMSA-W (SP)   MOMSA-W (TC)   IBBOMSA (SP)   IBBOMSA (TC)
Ref1 (82)                0.844          0.771          0.892          0.774
Ref2 (23)                0.925          0.557          0.947          0.637
Ref3 (12)                0.766          0.488          0.802          0.442
Ref4 (12)                0.871          0.617          0.876          0.653
Ref5 (12)                0.936          0.802          0.948          0.812
Total (141) (mean & SD)  0.861 ± 0.181  0.893 ± 0.079  0.702 ± 0.305  0.663 ± 0.290
Table 7

Alignment score comparison between MOMSA and IBBOMSA on the BAliBASE version 3.0

ALGORITHMS               MOMSA-W (SP)   MOMSA-W (TC)   IBBOMSA (SP)    IBBOMSA (TC)
BB11 (38)                0.496          0.379          0.543           0.396
BB12 (44)                0.848          0.814          0.869           0.879
BB2 (41)                 0.784          0.362          0.798           0.342
BB3 (30)                 0.694          0.371          0.793           0.396
BB4 (49)                 0.763          0.534          0.742           0.523
BB5 (16)                 0.683          0.418          0.692           0.498
Total (218) (mean & SD)  0.722 ± 0.183  0.500 ± 0.309  0.739 ± 0.2925  0.505 ± 0.436

Comparison of IBBOMSA with the state-of-the-art alignment algorithms

To assess the accuracy of the proposed IBBOMSA algorithm, we compare it with some of the widely used alignment algorithms such as MSAProbs,30 Probalign,31 MAFFT,32 ProbCons,33 Clustal Omega,34 T-Coffee,35 Kalign,36 MUSCLE,37 FSA,38 DIALIGN,39 PRANK,40 and CLUSTALW.9 Table 8 shows the average TC scores of these algorithms on the six subsets of BAliBASE 3.0. The data in Table 8 are drawn from Ref. 24, except the data for IBBOMSA. The proposed IBBOMSA is the fourth best aligner in terms of accuracy. The top aligner is MSAProbs, which reaches the highest SP and TC scores on almost all the subsets of BAliBASE version 3.0. The fastest method is Kalign2, and the slowest one is PRANK. IBBOMSA is the seventh best aligner in terms of time. This indicates that the effort to improve the accuracy and running time of the proposed IBBOMSA method is successful.
Table 8

Average TC score of several algorithms on BAliBASE version 3.0.

ALIGNMENT ALGORITHMS  AVERAGE SCORE (218)  BB11 (38)  BB12 (44)  BB2 (41)  BB3 (30)  BB4 (49)  BB5 (16)  TOTAL TIME (S)
MSAProbs              0.607                0.441      0.865      0.464     0.607     0.622     0.608     12382
Probalign             0.589                0.453      0.862      0.439     0.566     0.603     0.549     10095.2
MAFFT (auto)          0.588                0.439      0.831      0.45      0.581     0.605     0.591     1475.4
IBBOMSA               0.571                0.411      0.874      0.418     0.592     0.635     0.498     2472.6
ProbCons              0.558                0.417      0.855      0.406     0.544     0.532     0.573     13086.3
Clustal Omega         0.554                0.358      0.789      0.45      0.575     0.579     0.533     539.91
T-Coffee              0.551                0.41       0.848      0.402     0.491     0.545     0.587     81041.5
Kalign                0.501                0.365      0.79       0.36      0.476     0.504     0.435     21.88
MOMSA-W               0.500                0.379      0.814      0.362     0.371     0.534     0.418     110289
MUSCLE                0.475                0.318      0.804      0.35      0.409     0.45      0.467     89.57
MAFFT (default)       0.458                0.318      0.749      0.316     0.425     0.48      0.496     68.24
FSA                   0.419                0.258      0.818      0.187     0.259     0.474     0.398     53648.1
Dialign               0.415                0.27       0.696      0.292     0.312     0.441     0.425     3977.44
PRANK                 0.376                0.265      0.68       0.257     0.321     0.36      0.356     128355
CLUSTALW              0.374                0.223      0.712      0.22      0.272     0.396     0.308     766.47

Conclusions

In this paper, we have proposed an improved BBO algorithm for solving MSA. We designed a new migration operator to balance exploration and exploitation; however, the scaling function has to be used carefully. We compared the new algorithm with the existing BBO algorithm, and the results show that the new algorithm is superior to, or at least competitive with, the existing BBO. To test our approach, we considered a good number of benchmark datasets from BAliBASE 2.0, so as to cover all the test sets of MOMSA. The corresponding Bali score of each solution was used for comparison with other methods, as they also use the Bali score as their measure of the quality/accuracy of the MSA. The experimental results show that the proposed BBO performed better for most of the test cases. For the test cases in which the proposed method was not the best, its solution is close to the best. The proposed method performed better than the others because its improved migration operator helps to maintain the diversity of the search. From the experimental analysis, we conclude that the proposed method can effectively solve the MSA problem.
Algorithm 4

Main procedure of IBBOMSA

Begin
  Initialize the population with N habitats randomly
  Evaluate the fitness (HSI) for each habitat in the initial population
  While (termination criteria are not satisfied)
    Map the HSI to the number of species count S for each habitat
    Calculate the immigration rate and emigration rate using a migration model
    Modify habitats with the improved migration operator (Algorithm 5)
    Mutate habitats (Algorithm 6)
  End While
End
Algorithm 5

Improved migration operator

Begin
  For i = 1 to N
    If rand(0,1) < λi
      Hi is selected
    End If
    For j = 1 to N
      Generate two different integers p1 and p2 in {1, …, N}
      If rand(0,1) < µj
        Hj is selected
        Hi(SIV) = Hj(SIV) + F * (Hp1(SIV) + Hp2(SIV))
      End If
    End For
  End For
End
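The improved migration operator can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the update is taken to be Hi(SIV) = Hj(SIV) + F·(Hp1(SIV) + Hp2(SIV)), the inner loop is assumed to run only when Hi is selected, and one random SIV position is updated per accepted Hj. The source does not say how non-integer or out-of-range values are mapped back to valid column codes, so the sketch leaves them as they are. The function name `improved_migrate` is hypothetical.

```python
import random


def improved_migrate(pop, lam, mu, F=0.5, rng=random):
    """Migration with a scaled contribution from two random habitats.

    Assumed update: H_i(SIV) = H_j(SIV) + F * (H_p1(SIV) + H_p2(SIV)).
    With F = 0 this reduces to plain copying, as in traditional BBO.
    """
    new_pop = [list(h) for h in pop]
    n = len(pop)  # requires n >= 2 so that p1 and p2 can differ
    for i in range(n):
        if rng.random() >= lam[i]:
            continue  # H_i is not selected for immigration
        for j in range(n):
            p1, p2 = rng.sample(range(n), 2)  # two different habitat indices
            if rng.random() < mu[j]:
                k = rng.randrange(len(pop[i]))  # random SIV position
                new_pop[i][k] = pop[j][k] + F * (pop[p1][k] + pop[p2][k])
    return new_pop
```

Because the new feature mixes information from three habitats instead of copying one, the immigrating habitat is less likely to become a duplicate of an existing solution, which is the diversity argument made for the operator.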
Algorithm 6

Mutation operator

Begin
  For i = 1 to N
    Use µ to compute the probability Pi
    If rand(0,1) < Pi
      Hi is selected
      Hi(SIV) = Random value generated within the search space
    End If
  End For
End