Basharat Bhat1, Nazir A Ganai2, Syed Mudasir Andrabi2, Riaz A Shah2, Ashutosh Singh3.
Abstract
Membrane proteins play a significant role in living cells. Transmembrane proteins are estimated to constitute approximately 30% of proteins at the genomic scale. Developing alignment tools specific to transmembrane proteins has been difficult because of the limited number of experimentally validated protein structures. Alignment tools based on homology modeling give fairly good results, recapitulating 70-80% of the residues in the reference alignment, provided that all input sequences have known template structures. However, homology modeling tools take a substantial amount of time, so aligning large numbers of sequences becomes computationally demanding. Here we present TM-Aligner, a new tool for transmembrane protein sequence alignment. TM-Aligner is based on the Wu-Manber and dynamic string matching algorithms, which have significantly improved the accuracy and speed of its multiple sequence alignment. We compared TM-Aligner with other popular tools and benchmarked it on three separate reference sets: BAliBASE 3.0 reference set 7 of alpha-helical transmembrane proteins, structure-based alignments of transmembrane proteins from the Pfam database, and structure alignments from GPCRDB. Benchmarking against these reference datasets indicated that TM-Aligner is a more advanced method, with the shortest turnaround time and significant improvements over the most accurate methods such as PROMALS, MAFFT, TM-Coffee, Kalign, ClustalW, Muscle and PRALINE. TM-Aligner is freely available at http://lms.snu.edu.in/TM-Aligner/.
Year: 2017 PMID: 28970546 PMCID: PMC5624947 DOI: 10.1038/s41598-017-13083-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. TM-Aligner workflow on a set of input sequences. TM-Aligner predicts the transmembrane, cytoplasmic and non-cytoplasmic regions of the input sequences using TMHMM. The input sequences are then classified into groups based on the number of TMs present in each sequence. The classes with the dominant number of transmembrane sequences are chosen for alignment, and the result is used as the seed alignment for the overall alignment process.
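The grouping step in this workflow can be sketched in Python. This is a minimal illustration, not TM-Aligner's code: the function names `group_by_tm_count` and `dominant_class` are hypothetical, the TM counts are assumed to have been obtained already (for example from a TMHMM run), and picking a single largest class with ties broken toward the smaller TM count is an assumption made only for the example.

```python
from collections import defaultdict


def group_by_tm_count(tm_counts):
    """Group sequence IDs by their predicted number of TM segments.

    tm_counts maps a sequence ID to an integer TM count. The counts are
    assumed to come from a predictor such as TMHMM; no predictor is
    called here.
    """
    groups = defaultdict(list)
    for seq_id, n_tm in tm_counts.items():
        groups[n_tm].append(seq_id)
    return dict(groups)


def dominant_class(groups):
    """Return (tm_count, sequence_ids) for the largest group.

    Assumption for this sketch: ties are broken toward the smaller
    TM count. The source does not say how ties are handled.
    """
    if not groups:
        raise ValueError("no sequences to classify")
    best = max(groups, key=lambda n_tm: (len(groups[n_tm]), -n_tm))
    return best, groups[best]


if __name__ == "__main__":
    # Hypothetical TM counts for five sequences.
    counts = {"s1": 7, "s2": 7, "s3": 6, "s4": 7, "s5": 0}
    groups = group_by_tm_count(counts)
    n_tm, seed_ids = dominant_class(groups)
    print(f"seed class: {n_tm} TMs, sequences: {seed_ids}")
```

In such a scheme the sequences in the returned class would be aligned first, and the remaining sequences would be added to that seed alignment afterwards.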
Performance comparison between TM-Aligner and other MSA tools on each BAliBASE3 reference set 7 protein family: (a) Sum-of-Pairs (SP) score; (b) processing time (CPU time) in seconds. A standalone version of PRALINETM is unavailable, so PRALINE is not included in the time comparison table; however, the time taken by PRALINETM is greater than that of TM-Coffee. Every other tool, including TM-Aligner, was tested individually on a single-threaded machine with two available cores.
(a)

| No. of sequences | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| 51 | 0.652 | 0.738 | | 0.461 | 0.519 | 0.630 | 0.321 | 0.700 |
| 43 | 0.914 | 0.946 | 0.910 | 0.906 | | 0.914 | 0.916 | 0.919 |
| 14 | 0.838 | 0.839 | 0.847 | 0.864 | 0.865 | 0.829 | 0.704 | |
| 55 | 0.859 | | 0.850 | 0.786 | 0.869 | 0.829 | 0.501 | 0.870 |
| 33 | 0.897 | 0.911 | 0.905 | 0.887 | 0.901 | 0.857 | 0.501 | |
| 52 | 0.319 | | 0.500 | 0.354 | 0.514 | 0.538 | 0.285 | 0.509 |
| 59 | 0.773 | 0.718 | 0.747 | 0.630 | 0.741 | 0.644 | 0.275 | |
| 128 | 0.813 | | 0.832 | 0.847 | 0.847 | 0.806 | 0.480 | 0.815 |
| Average | 0.758 | 0.807 | 0.790 | 0.710 | 0.770 | 0.755 | 0.490 | 0.796 |

(b)

| No. of sequences | | | | | | | |
|---|---|---|---|---|---|---|---|
| 51 | 778 | 17633 | 5 | 28 | 38 | 3 | 17 |
| 43 | 1836 | 35622 | 8 | 28 | 35 | 6 | 26 |
| 14 | 17 | 1055 | 1 | 3 | 12 | 1 | 3 |
| 55 | 1443 | 21885 | 6 | 32 | 44 | 3 | 24 |
| 33 | 38 | 3962 | 1 | 3 | 26 | 1 | 7 |
| 52 | 1385 | 18521 | 4 | 78 | 45 | 6 | 26 |
| 59 | 602 | 21055 | 6 | 32 | 54 | 3 | 21 |
| 128 | 4346 | 35865 | 19 | 52 | 117 | 6 | 56 |
| Average | 1300 | 19500 | 6 | 32 | 46 | 3 | 22 |
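For readers unfamiliar with the Sum-of-Pairs (SP) score, the sketch below shows one common way to compute it: the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment. It is a simplified illustration and not the scoring program used in the benchmark. The function names are hypothetical, both alignments are assumed to list the same sequences in the same order, and any restriction to core blocks is ignored.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    Each pair is ((seq_index_i, residue_index_i), (seq_index_j, residue_index_j))
    with seq_index_i < seq_index_j. Residue indices count non-gap characters.
    """
    if not alignment:
        return set()
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    next_residue = [0] * len(alignment)
    pairs = set()
    for col in range(length):
        present = []
        for i, row in enumerate(alignment):
            if row[col] != gap:
                present.append((i, next_residue[i]))
                next_residue[i] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        raise ValueError("reference alignment contains no aligned pairs")
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


if __name__ == "__main__":
    reference = ["AC-GT", "ACAGT"]
    test = ["ACG-T", "ACAGT"]
    print(sp_score(test, reference))  # 3 of the 4 reference pairs are recovered: 0.75
```

A score of 1.0 means every residue pair aligned in the reference is also aligned in the test alignment.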
Performance comparison (in terms of SP score) between TM-Aligner and other transmembrane alignment tools on Pfam alignments. 'x' indicates that the alignment could not be completed, either because of a restriction on the number of input sequences or because of resource limitations.
| Pfam ID. | Number of Seq. | TM-Aligner | TM-Coffee | Praline | Promals |
|---|---|---|---|---|---|
| PF01036 | 1038 | | x | x | 0.708 |
| PF10316 | 434 | | x | 0.658 | 0.708 |
| PF14778 | 424 | | x | 0.706 | 0.759 |
| PF01534 | 1894 | | x | x | x |
| PF02117 | 182 | 0.812 | | 0.711 | 0.810 |
| PF10325 | 372 | 0.737 | x | 0.608 | 0.100 |
| PF10413 | 177 | | | | |
| PF02076 | 981 | | x | x | 0.557 |
| PF02714 | 3894 | 0.510 | x | x | x |
| PF02116 | 261 | 0.900 | 0.910 | 0.892 | |
| PF03383 | 78 | 0.540 | | 0.485 | 0.517 |
Performance comparison between TM-Aligner and other transmembrane alignment tools on GPCRDB structural alignments.
| Family | No. of sequences | TM-Aligner | Praline | TM-Coffee | Promals |
|---|---|---|---|---|---|
| Human GPCR protein sequences | 398 | 0.430 | 0.261 | 0.284 | 0.201 |
| ClassA GPCR protein sequences* | 194 | 0.841 | 0.797 | 0.839 | 0.802 |
*Only TM regions were used for benchmarking.
TM-Aligner compared with other available transmembrane alignment tools.
| ALIGNMENT TOOL | ALGORITHM USED | INPUT LIMITATION |
|---|---|---|
| TM-ALIGNER | TM-Prediction and Dynamic Alignment | |
| TM-COFFEE | Homology modelling | |
| PRALINE | Homology modelling | |
| PROMALS | Homology modelling | |
Figure 2. Front page of the TM-Aligner server. The main section allows the user to paste or upload sequences in FASTA format. Options to modify alignment parameters, such as the substitution matrix and the gap open and gap extension penalties, are provided. A brief description of each option is available in the tutorial section inside the navigation panel of the web server.
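Input for the server has to be in FASTA format. As a rough illustration of that format, the sketch below reads FASTA text into a dictionary before submission. It is not part of TM-Aligner; the function name `parse_fasta` is hypothetical, and taking the first whitespace-delimited token of each header line as the sequence identifier is an assumption of this example.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {identifier: sequence}.

    The identifier is the first whitespace-delimited token after '>'.
    Sequence lines belonging to one record are concatenated.
    """
    records = {}
    current_id = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without an identifier")
            current_id = fields[0]
            if current_id in records:
                raise ValueError(f"duplicate identifier: {current_id}")
            records[current_id] = []
        elif current_id is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[current_id].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


if __name__ == "__main__":
    example = ">seq1 example protein\nMKT\nLLV\n>seq2\nAAA\n"
    for seq_id, seq in parse_fasta(example).items():
        print(seq_id, len(seq))
```

Checking the input this way before upload catches missing headers and duplicate identifiers early.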
Figure 3. Colored alignment produced by the TM-Aligner server. The input sequences are cAMP receptor proteins. (A) The result page: TM-Aligner provides visualization of the multiple sequence alignment in different color schemes and with a variety of options. (B) The “TM-Info” tab on the result page provides complete information about the total number of transmembrane regions present in the input sequences.