Abstract
Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest.
Year: 2015 PMID: 25855811 PMCID: PMC4489265 DOI: 10.1093/nar/gkv317
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Flowchart of the SANSparallel web server. Computations done by the web server are blue. Results sent to the user include textual outputs (green) and alignment visualizations (orange). Multiple alignment (instantiated from Jalview Desktop) and sequence logo computations utilize third party resources in the cloud (pink).
Figure 2. Benchmark results showing the number of true positives detected in the top-1000 hits and top-500 hits binned by sequence identity.
Speed comparison of database search programs: time taken to search 4174 queries of the Dickeya solani benchmark
| Program | Hits | Cores | Time (s) | Relative speed |
|---|---|---|---|---|
| verifast | 100 | 16 | 62 | 5903 |
| fast | 100 | 16 | 65 | 5631 |
| verifast | 500 | 16 | 111 | 3298 |
| verifast | 1000 | 16 | 170 | 2153 |
| fast | 500 | 16 | 178 | 2056 |
| LAMBDA | 500 | 16 | 216 | 1695 |
| slow | 100 | 16 | 235 | 1558 |
| fast | 1000 | 16 | 324 | 1130 |
| LAST | 1000 | 16 a | 327 | 1119 |
| slow | 500 | 16 | 406 | 902 |
| DIAMOND | 1000 | 16 | 446 | 821 |
| slow | 1000 | 16 | 612 | 598 |
| verislow | 500 | 16 | 624 | 587 |
| verislow | 1000 | 16 | 792 | 462 |
| verifast | 1000 | 1 | 1009 | 363 |
| UBLAST b | 1000 | 16 a | 1310 | 279 |
| RAPSEARCH2 | 1000 | 16 | 1469 | 249 |
| LAMBDA | 500 | 1 | 2052 | 178 |
| LAST | 1000 | 1 | 2957 | 124 |
| fast | 1000 | 1 | 3297 | 111 |
| SANS c | 1000 | 1 | 3809 | 96 |
| BLAT b | 1000 | 1 | 4307 | 85 |
| slow | 1000 | 1 | 5015 | 73 |
| verislow | 1000 | 1 | 7094 | 52 |
| RAPSEARCH2 | 1000 | 1 | 18761 | 20 |
| UBLAST b | 1000 | 1 | 28399 | 13 |
| BLAST | 1000 | 16 a | 32149 | 11 |
| BLAST | 1000 | 1 | 366046 | 1 |
a GNU parallel.
b Database split into chunks (UBLAST: 19, BLAT: 5) due to the program's size limit.
c Serial implementation (9).
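
The relative speed column appears to be the single-core BLAST time (366046 s) divided by each program's time. The table does not state this definition explicitly, so it is treated here as an inference. A minimal Python sketch under that assumption, using a few rows copied from the table (the dictionary layout and the function names are illustrative and are not part of SANSparallel):

```python
# Times in seconds copied from the table; keys are (program, hits, cores).
TIMES = {
    ("BLAST", 1000, 1): 366046,
    ("BLAST", 1000, 16): 32149,
    ("verifast", 100, 16): 62,
    ("verifast", 1000, 16): 170,
    ("verifast", 1000, 1): 1009,
    ("DIAMOND", 1000, 16): 446,
}

# Assumed baseline: BLAST on a single core, the row with relative speed 1.
BASELINE = TIMES[("BLAST", 1000, 1)]


def relative_speed(key):
    """Baseline time divided by the time of the given table row."""
    return BASELINE / TIMES[key]


def parallel_speedup(program, hits, cores=16):
    """Single-core time divided by multi-core time for the same program."""
    return TIMES[(program, hits, 1)] / TIMES[(program, hits, cores)]


if __name__ == "__main__":
    for key in TIMES:
        print(key, round(relative_speed(key)))
    print("verifast 16-core speedup:", round(parallel_speedup("verifast", 1000), 1))
```

Under this reading, the recomputed values agree with the printed column to within about one unit, and the 16-core verifast run at 1000 hits is roughly 5.9 times faster than the single-core run.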
Figure 3. Example output.