
MICA: A fast short-read aligner that takes full advantage of Many Integrated Core Architecture (MIC).

Ruibang Luo, Jeanno Cheung, Edward Wu, Heng Wang, Sze-Hang Chan, Wai-Chun Law, Guangzhu He, Chang Yu, Chi-Man Liu, Dazong Zhou, Yingrui Li, Ruiqiang Li, Jun Wang, Xiaoqian Zhu, Shaoliang Peng, Tak-Wah Lam.   

Abstract

BACKGROUND: Short-read aligners have recently gained a lot of speed by exploiting the massive parallelism of GPUs. An emerging alternative to the GPU is Intel MIC; supercomputers such as Tianhe-2, currently at the top of the TOP500 list, are built with 48,000 MIC boards to offer ~55 PFLOPS. The CPU-like architecture of MIC allows CPU-based software to be parallelized easily; however, performance is often inferior to GPU counterparts, as an MIC card contains only ~60 cores (while a GPU card typically has over a thousand).
RESULTS: To better utilize MIC-enabled computers for NGS data analysis, we developed a new short-read aligner MICA that is optimized in view of MIC's limitation and the extra parallelism inside each MIC core. By utilizing the 512-bit vector units in the MIC and implementing a new seeding strategy, experiments on aligning 150 bp paired-end reads show that MICA using one MIC card is 4.9 times faster than BWA-MEM (using 6 cores of a top-end CPU), and slightly faster than SOAP3-dp (using a GPU). Furthermore, MICA's simplicity allows very efficient scale-up when multiple MIC cards are used in a node (3 cards give a 14.1-fold speedup over BWA-MEM).
SUMMARY: MICA can be readily used by MIC-enabled supercomputers for production purposes. We have tested MICA on Tianhe-2 with 90 WGS samples (17.47 Tera-bases), which can be aligned in an hour using 400 nodes. MICA shows impressive performance even though MIC is only in its initial stage of development.
AVAILABILITY AND IMPLEMENTATION: MICA's source code is freely available at http://sourceforge.net/projects/mica-aligner under GPL v3. SUPPLEMENTARY INFORMATION: Supplementary information is available as "Additional File 1". Datasets are available at www.bio8.cs.hku.hk/dataset/mica.


Year:  2015        PMID: 25952019      PMCID: PMC4423751          DOI: 10.1186/1471-2105-16-S7-S10

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Introduction

With the rapid advance of sequencing technologies, there is a continuing demand for faster and faster analysis. The recently announced Illumina HiSeq X Ten sequencing system promises to sequence 18,000 whole human genomes (30x) in one year (four such systems can sequence more genomes than in all of history), while cutting the cost to $1,000 each. To cater to such capacity, it is important to develop new analysis software that fully utilizes available acceleration hardware in addition to the CPU. For example, SOAP3-dp[1] uses a graphics processing unit (GPU) and is a few times faster than mainstream CPU aligners while delivering higher sensitivity. Besides the GPU, attention has also fallen on Intel's new product Many Integrated Core (MIC), a.k.a. the Xeon Phi co-processor. Introduced in 2011, MIC is an acceleration device whose hardware and system-software architecture support general-purpose computing. It is well suited to software in which computations on many thousands of data items can be carried out independently in parallel. The latest product has 57-61 cores and 8 GB of memory on one board, providing ~1 TFLOPS. Two of the top ten supercomputers in TOP500 (which ranks the world's 500 most powerful supercomputers) are equipped with MIC (Tianhe-2 has 48,000 MIC boards, and Stampede has 6,400).

Experience has shown, however, that it is not easy to build useful read-alignment software on massive-core architectures such as GPU and MIC. The apparent problem is that the most biologically relevant sequence-alignment algorithms[2,3] involve dynamic-programming dependencies that are awkward to compute efficiently in parallel. The fastest GPU implementations to date rely on task parallelism[1], where each thread of execution computes an entire alignment independently of all other parallel threads. This, however, requires the sequences to be aligned to have similar lengths to ensure balanced task distribution among cores.
Another typical sequence-alignment problem is that a short query sequence (100 to 250 bp) must be aligned to a comparatively long (3 Gbp or longer) reference sequence. Since a brute-force search for all candidate alignments in this setting would be computationally prohibitive, read aligners typically construct a list of candidate reference locations within which potential alignments might be discovered. The size of this list depends on the complexity of the sequence to be aligned, and this accounts for a significant proportion of the computational imbalance involved in read alignment on massive-core architectures. Several high-throughput read aligners that utilize GPU acceleration, including BarraCUDA[4], CUSHAW[5] and SOAP3-dp[1], have been developed in the past few years, but to date, limited work has been done on short-read alignment using MIC. This paper introduces MICA, a new short-read aligner designed to fully utilize the computing power of MIC.

Method

MICA accepts input reads in FASTA/FASTQ format and outputs alignment results in SAM/BAM format. In a typical setting, MICA runs on a server (host) equipped with 1 to 3 MIC cards. MICA runs each MIC in offload mode (instead of native mode) to exploit the host's memory and provide better I/O performance; the latter is important when dealing with large volumes of sequencing data. For each MIC card, MICA maintains a CPU thread in the host called an MIC-controller, which feeds the MIC with around a million reads at a time and spawns 224 MIC threads (running on 56 cores) to align the reads in parallel. At the end, the alignment results are copied back to the MIC-controller for output (Figure 1). Note that reads of different complexities vary in alignment time. To achieve maximum throughput, MICA dynamically balances the load across threads to avoid unnecessary idling.
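The dynamic load balancing described above can be sketched as follows. This is a toy illustration in Python, not MICA's actual C/C++ implementation; the chunked work-claiming scheme and all names are assumptions. The key idea is that a thread that draws "easy" reads simply claims more work instead of idling.

```python
import threading

def align_batch(reads, num_threads=4, chunk=64):
    """Toy sketch of dynamic load balancing: worker threads repeatedly
    claim the next small chunk of reads from a shared cursor, so no
    thread sits idle while others grind through complex reads."""
    results = [None] * len(reads)
    cursor = {"next": 0}
    lock = threading.Lock()

    def align(read):
        # stand-in for the real alignment kernel
        return read[::-1]

    def worker():
        while True:
            with lock:                       # atomically claim the next chunk
                start = cursor["next"]
                cursor["next"] += chunk
            if start >= len(reads):
                return
            for i in range(start, min(start + chunk, len(reads))):
                results[i] = align(reads[i])

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because each worker claims small chunks on demand, the schedule adapts to reads of varying complexity without any up-front partitioning.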
Figure 1

MICA's architecture: interaction between the host and MIC cards.

Algorithmically, MICA maintains a BWT index in each MIC; it adopts SOAP3-dp's approach to align reads shorter than 150 bp and uses a new approach to handle longer reads. SOAP3-dp was designed for HiSeq 2000 paired-end reads of 100 bp; its efficiency stems from the fact that a GPU, with a 2-way BWT index[6], can efficiently align at least one end of a paired-end read with 0-2 mismatches; the other end can then be aligned using dynamic programming on the GPU. For longer reads (150 bp or more), SOAP3-dp deteriorates in sensitivity and speed because the GPU is inefficient at index-based alignment with more mismatches. Thus, a new approach is needed. In the following, we divide the discussion into two parts: the first describes techniques for utilizing the resources of MIC to match the performance of SOAP3-dp on GPU, and the second covers the new techniques for aligning longer reads. Our experiments revealed that an MIC core is 4 to 6 times slower than a CPU core when running programs designed for the CPU, so an MIC with 57 cores is roughly comparable to a 12-core CPU when used for brute-force parallelization of a CPU program. Notably, however, each MIC core has thirty-two 512-bit registers, each allowing sixteen 32-bit values to be operated on in parallel. Such extra parallelism, if exploited properly, can boost efficiency dramatically, though it requires substantial engineering work. Our first success concerns the BWT index: MICA exploits 512-bit operations to speed up the arithmetic and memory-transaction operations involved in querying the BWT (SOAP3-dp is based on 64-bit operations). Next, we turn to dynamic programming, which is used to align reads allowing indels and soft clipping. Below we give the details of a new parallel dynamic-programming algorithm that utilizes the 512-bit registers.
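The BWT queries being vectorized here are rank (occurrence-count) operations inside a backward search. The following is a minimal scalar FM-index sketch of what those queries compute; it is illustrative Python, not MICA's 2-way BWT or its 512-bit implementation, and the helper names are ours:

```python
def bwt_index(text):
    """Build a toy BWT index: the BWT string plus the C[] array
    (C[c] = number of characters in the text smaller than c)."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    C, total = {}, 0
    for c in sorted(set(text)):
        C[c] = total
        total += text.count(c)
    return bwt, C

def occ(bwt, c, i):
    """Rank query: occurrences of c in bwt[:i]. This O(i) scan is the
    hot loop that a real index answers in O(1) with sampled counts,
    and that MICA accelerates with 512-bit vector operations."""
    return bwt[:i].count(c)

def backward_search(bwt, C, pattern):
    """Count occurrences of pattern via standard FM-index backward search."""
    lo, hi = 0, len(bwt)                  # SA interval of all suffixes
    for c in reversed(pattern):
        lo = C[c] + occ(bwt, c, lo)
        hi = C[c] + occ(bwt, c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

Each character of the pattern costs two rank queries, which is why speeding up rank directly speeds up seed alignment.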

A. Highly parallel dynamic programming

To align a read of length m against a reference region of length n by DP, the traditional approach (Figure 2) fills a 2-dimensional table (denoted T[n,m]), in which T[i,j] is determined by T[i-1,j-1], T[i-1,j] and T[i,j-1] (upper-left, upper and left dependencies). Such dependencies force us to compute consecutive entries of a row (or a column) one by one.
Figure 2

Traditional approach to Dynamic Programming.

To circumvent these sequential dependencies, we store the table in diagonal order as a single array (Figure 3), i.e., T[i,j] is followed by T[i-1,j+1], T[i-2,j+2], and so on. This representation allows us to fill 16 entries of the table in one step. To fill T[i,j], T[i-1,j+1], ..., T[i-15,j+15] in a single iteration, we need to load 16 entries from each of 3 memory locations, beginning at T[i-1,j-1], T[i-1,j] and T[i,j-1]. Another concern is that the number of memory transactions in each core is an important factor in overall efficiency. To minimize memory transactions, we pack all the information needed to calculate an entry (including the read and reference nucleotides) into a 32-bit entry of the DP table, so the memory transactions are reduced to loading only three 512-bit vectors of the DP table. The number "three" comes from the affine gap penalty model, which requires computing three DP tables M, I, D that store the optimal score of aligning up to each position with the last character being a match, insert or delete, respectively. To further reduce memory transactions by increasing locality, the M, I, D values for each (i,j) are packed into T[i,j].
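A compact sketch of the anti-diagonal fill, using NumPy vectors to stand in for the 512-bit registers: every cell on one anti-diagonal depends only on the two previous anti-diagonals, so a whole diagonal can be filled with one vector operation. We assume a linear gap penalty and global alignment for brevity, whereas MICA packs three affine-gap tables (M, I, D) per cell; the scoring values are illustrative.

```python
import numpy as np

def align_scores(read, ref, match=2, mismatch=-1, gap=-2):
    """Global-alignment DP filled anti-diagonal by anti-diagonal.
    All cells with i + j = d are independent of one another, so each
    diagonal is computed as a single vector operation."""
    n, m = len(read), len(ref)
    T = np.zeros((n + 1, m + 1), dtype=np.int32)
    T[:, 0] = gap * np.arange(n + 1)          # boundary: leading gaps
    T[0, :] = gap * np.arange(m + 1)
    a = np.frombuffer(read.encode(), dtype=np.uint8)
    b = np.frombuffer(ref.encode(), dtype=np.uint8)
    for d in range(2, n + m + 1):             # anti-diagonal index i + j = d
        i = np.arange(max(1, d - m), min(n, d - 1) + 1)
        j = d - i
        sub = np.where(a[i - 1] == b[j - 1], match, mismatch)
        # all three dependencies lie on diagonals d-1 and d-2, already done
        T[i, j] = np.maximum(T[i - 1, j - 1] + sub,
                             np.maximum(T[i - 1, j] + gap, T[i, j - 1] + gap))
    return int(T[n, m])
```

On MIC, the same wavefront is mapped onto 16-lane 512-bit registers instead of NumPy arrays, with the cell contents bit-packed as described above.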
Figure 3

MICA's approach to Dynamic Programming.


B. New seeding strategies

For longer reads, we can only afford to use the BWT to align short fragments of a read (i.e., the seeds) and then rely on dynamic programming to verify the candidate positions of the seeds. Ideally we want fewer seeds (to save time on seed alignment), and the seeds should return the correct candidate positions without too many incorrect ones (to save time on dynamic programming). MICA attempts to improve on SOAP3-dp and other aligners by using as few seeds as possible while maintaining a balance between efficiency (being fast and not producing too many candidates) and sensitivity (capturing the correct candidate positions for more reads). MICA uses a combination of the following strategies.

a) Non-branching mismatches

MICA allows more mismatches during seed alignment without sacrificing efficiency. SOAP3-dp allows only one mismatch, as the time for seed alignment increases exponentially with the number of mismatches (due to branching of the search tree). We observe that very often a mismatch is unambiguous: there is exactly one way to correct it according to the reference genome. MICA therefore allows 1-2 "non-branching" mismatches in addition to one "branching" mismatch during seed alignment. In detail, when performing alignment, we take a single character of the read and extend it character by character, checking whether the pattern exists in the reference genome. We call it a mismatch when, at an extension step, the character in the read does not match the corresponding character in the reference genome. Mismatches can be classified into two types, branching mismatches (BM) and non-branching mismatches (NBM) (Figure 4): (a) an NBM occurs when the current character of the read does not allow an extension but exactly one other character does; (b) a BM occurs when at least 2 characters allow an extension (one of them may be the character in the read) and we choose to extend with a mismatching character.
Figure 4

Illustration of branching mismatches (BM) and non-branching mismatches (NBM).

For computing purposes, a BM increases the number of possible extensions far more than an NBM and is more time-consuming to handle. Yet, as noted above, a mismatch is very often unambiguous, with exactly one way to correct it according to the reference genome (i.e., it is an NBM). Therefore, MICA finds alignments with up to 2 non-branching mismatches in addition to the 1 branching mismatch during seed alignment. This approach discovers more seeds than the "1 mismatch only" approach.
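The BM/NBM distinction can be made concrete with a small sketch. This is illustrative Python with names of our own choosing; a real implementation would query the BWT index for the extension set in constant time rather than scanning the genome.

```python
def extension_chars(genome, prefix):
    """Characters that can extend `prefix` to a longer substring of
    `genome` -- the information the BWT index reports during extension."""
    k = len(prefix)
    return {genome[i + k]
            for i in range(len(genome) - k)
            if genome[i:i + k] == prefix}

def classify_mismatch(read_char, ext):
    """Classify one extension step given the set of extending characters."""
    if read_char in ext:
        return "match"            # the read character itself extends
    if len(ext) == 1:
        return "NBM"              # exactly one correction: non-branching
    return "BM" if ext else "dead-end"   # >= 2 corrections: branching
```

An NBM costs almost nothing extra during the search, whereas each BM multiplies the number of branches to explore, which is why MICA budgets them separately (up to 2 NBMs, 1 BM).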

b) Read-sensitive seed length

MICA dynamically adjusts the seed length depending on the content of the read. If a region of the read appears to be non-repetitive, it uses a shorter seed to avoid missing critical candidate positions; for a region that appears to be repetitive, it chooses a longer seed to reduce the number of candidates. More specifically, MICA follows SOAP3-dp's design of rapidly aligning reads with < 3 mismatches and no indel, leaving reads that fail to align (with ≥ 3 mismatches or ≥ 1 indel) to the dynamic programming (DP) module. This rapid mode works efficiently for 100 bp paired-end reads, where the slower DP module handles only about 10% of the reads. However, newer sequencers such as the Illumina HiSeq X Ten produce paired-end reads 150 bp in length. The probability that a 150 bp read spans ≥ 3 mismatches or an indel is higher than for 100 bp, empirically forcing the DP module to handle 30-40% of the reads, which slows MICA down significantly. To make MICA work efficiently with reads of 150 bp or longer, we redesigned the workflow, disabling the rapid mode and using multiple rounds of seed discovery in the DP module to recover the speed. The longer seed lengths of the earlier rounds limit the number of candidate alignment regions, while the shorter seed lengths of the later rounds ensure sensitivity at the cost of much more computation, due to the larger number of candidate alignment regions to be verified. Most reads, having few mismatches and indels, can thus be aligned in the earlier rounds, leaving the complicated reads to the more computationally demanding later rounds. Table 1 gives the implementation details of each round in the latest version of MICA; the details of SOAP3-dp are provided for comparison.
Table 1

MICA and SOAP3-dp seeding details.

MICA:

Round | Seed Length | Seed Overlap | Seed # Limit
Round 0 | 140 | 0 | 1,000
Round 1 | 80 | 0 | 1,000
Round 2 | 46 | 0 | 100

SOAP3-dp DP module:

Round | Seed Length | Seed Overlap | Seed # Limit
Round 1 | 28 | 20 | 100
Round 2 | 32 | 28 | 1,000

"Seed Overlap" means the maximum overlap between two seeds when extracting seeds from a read.
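The multi-round schedule can be sketched as follows. This is illustrative Python: the (length, overlap, limit) triples are our reading of Table 1 and should be treated as assumptions, and the tiling rule is a simplified interpretation of the "seed overlap" limit (the candidate limit, which caps index hits per seed, is not simulated here).

```python
# Assumed (seed_length, max_overlap, candidate_limit) per round, per Table 1.
ROUNDS = [(140, 0, 1000), (80, 0, 1000), (46, 0, 100)]

def extract_seeds(read, seed_len, max_overlap):
    """Tile seeds of length seed_len across the read; adjacent tiled seeds
    overlap by at most max_overlap bp, with one extra seed appended so the
    tail of the read is always covered."""
    if len(read) < seed_len:
        return []
    step = max(1, seed_len - max_overlap)
    starts = list(range(0, len(read) - seed_len + 1, step))
    last = len(read) - seed_len
    if starts[-1] != last:          # cover the tail of the read
        starts.append(last)
    return [read[s:s + seed_len] for s in starts]

def seeds_by_round(read):
    """Seeds a read contributes in each round: earlier rounds use long,
    specific seeds; reads still unaligned fall through to later rounds
    with shorter, more sensitive seeds."""
    return [extract_seeds(read, length, overlap)
            for length, overlap, _limit in ROUNDS]
```

For a 150 bp read this yields only a handful of long seeds in the early rounds, matching the design goal of verifying few candidate regions for the easy majority of reads.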

c) Breaking seeds at variant positions

MICA also tries to exclude read positions suspected to be errors or variants (especially indels) from seeds. The above seeding strategies are applied over multiple rounds with different parameters to attain better speed without sacrificing sensitivity. The superior sensitivity of the seeds allows MICA to pair up the candidate positions from both ends of almost all reads before the DP step, saving a lot of time otherwise spent verifying incorrect positions.

Results

A. Software settings and machine specifications

a) MICA: version r178, compiled with Intel C/C++ Compiler 13.1; one to three 57-core MIC cards (8 GB, ECC enabled), each coupled with a CPU core; and one CPU core for output (in SAM format).
b) SOAP3-dp: version r176, compiled with GCC 4.7.2 and CUDA SDK 5.5; one NVIDIA GeForce GTX 680 (4 GB); 4 CPU cores; serial BAM output.
c) BWA-MEM: version 0.7.5a, compiled with GCC 4.7.2; 6 CPU cores; SAM output.
d) In general: apart from the acceleration devices (MIC/GPU), all hardware is the same across experiments: Intel Core i7-3930K, 6 cores @ 3.2 GHz, 64 GB memory.

B. Real data comparison with other aligners

We used 77-fold depth of 150 bp Illumina paired-end reads of the YH sample[7] (PE150) to benchmark MICA against other state-of-the-art aligners, including the GPU-based SOAP3-dp[1] and the CPU-based BWA-MEM[8]. The results are shown in Table 2. The figures are averages of three repeated runs, and a tailored setting was used for each aligner to ensure its best performance. For the PE150 dataset, MICA using one card is ~4.9 times faster than BWA-MEM and slightly faster than SOAP3-dp. MICA's speed scales up almost linearly with additional cards: with three cards, MICA is ~14.1 times faster than BWA-MEM. MICA's sensitivity is 3.1% and 0.47% higher than BWA-MEM and SOAP3-dp, respectively.
Table 2

Performance of MICA, SOAP3-dp and BWA-MEM on experimental data.

Dataset | Volume (Gbp) | Read Pairs (M) | Fold | MICA 1 card | MICA 2 cards (scale-up) | MICA 3 cards (scale-up) | MICA properly paired | SOAP3-dp 1 card | SOAP3-dp properly paired | BWA-MEM 6 cores | BWA-MEM properly paired
PE150 | 232.15 | 773.83 | 77.38 | 20,919 s (5.81 hr) | 10,618 s (2.95 hr) (1.97x) | 7,183 s (2.00 hr) (2.91x) | 95.48% | 25,878 s (7.19 hr) | 95.01% | 101,466 s (28.19 hr) | 92.32%
PE100 | 148.43 | 742.16 | 49.48 | 15,879 s (4.41 hr) | 8,093 s (2.25 hr) (1.96x) | 5,453 s (1.51 hr) (2.91x) | 97.23% | 17,982 s (5.00 hr) | 97.08% | 53,832 s (14.95 hr) | 95.74%

We also benchmarked 100 bp paired-end reads[7] (PE100, ~49-fold). Interestingly, MICA's acceleration on PE150 is slightly more pronounced than on PE100: for the latter, MICA is ~3.4 and ~9.9 times faster than BWA-MEM when using one and three MIC cards, respectively. When processing many shorter reads, memory access becomes a bottleneck for MICA.

C. Simulated data comparison with other aligners

We carried out tests on simulated data generated with the short-read simulator Mason[9] to assess the accuracy and sensitivity of MICA. We tested 2 sets (100 bp and 150 bp) of 6M Illumina-style paired-end (PE) reads with 500 bp insert size, drawn from the GRCh37 major build. Bowtie2[10], SeqAlto[11] and GEM[12] provide switches that let users trade accuracy and sensitivity for speed. We applied the "very-fast", "sensitive" and "very-sensitive" switches to Bowtie2, "fast (-f)" to SeqAlto, and "fast adaptive (--fast-mapping)" and "fastest (--fast-mapping = 0)" to GEM. We used indices with a full suffix array (SA) for SOAP3-dp. All parameters of MICA, SOAP3-dp and SOAP3 were left at their defaults, and the parameters of the other aligners, including BWA-aln and CUSHAW2[13,14], were set to favor the two read types and lengths. In total, 13 combinations of programs and parameters were compared. On both the 100 bp and 150 bp datasets, MICA is comparable to SOAP3-dp, and it has an advantage over the other tools in speed, sensitivity and FDR (Table 3 for 100 bp PE; the 150 bp table is in the supplementary material).
Table 3

Comparison on 13 sets of programs and parameters using 100 bp paired-end simulated reads.

6M 100 bp paired-end reads, 1.2 Gbp of bases; 500 bp insert size, 25 bp standard deviation. CPU threads run on the Core i7-3930K; the GPU is a GTX 680.

Program | CPU threads | GPUs | Total elapsed (s) | Fold | Index loading (s)^3 | Alignment (s)^4 | Fold | Avg. mem (GB) | Peak mem (GB) | Aligned | Properly paired | Incorrectly aligned | Sensitivity^5 | FDR^6
MICA (1 MIC card, 240 threads) | 1 | 0 | 178 | 0.76 | 83 | 95 | 0.95 | 25.2 | 29.9 | 11,999,810 | 11,999,668 | 48,184 | 99.60% | 0.40%
SOAP3-dp | 4 | 1 | 162 | 1.23 | 74 | 88 | 0.88 | 17.2 | 18.1 | 11,999,827 | 11,999,460 | 40,561 | 99.66% | 0.34%
SOAP3 | 4 | 1 | 132 | 1.00 | 74 | 58 | 0.58 | 17.3 | 19.2 | 11,870,740 | 11,742,902 | 138,655 | 97.77% | 1.17%
Bowtie2 (sensitive) | 4 | 0 | 966 | 7.32 | 38 | 928 | 9.28 | 3.3 | 3.5 | 11,999,763 | 11,998,912 | 143,012 | 98.81% | 1.19%
Bowtie2 (very-sensitive) | 4 | 0 | 1,974 | 14.95 | 38 | 1,936 | 19.36 | 3.3 | 3.5 | 11,999,936 | 11,999,344 | 141,373 | 98.82% | 1.18%
Bowtie2 (very-fast) | 4 | 0 | 672 | 5.09 | 38 | 634 | 6.34 | 3.3 | 3.5 | 11,998,226 | 11,996,528 | 147,764 | 98.75% | 1.23%
BWA^1 | 4 | 0 | 1,154 | 8.74 | 53+1+1 | 370+369+360 | 10.99 | 3.5 | 4.8 | 11,998,804 | 11,997,254 | 85,297 | 99.28% | 0.71%
SeqAlto | 4 | 0 | 495 | 3.75 | 96 | 399 | 3.99 | 7.0 | 7.2 | 12,000,000 | 11,999,976 | 95,672 | 99.20% | 0.80%
SeqAlto (fast alignment) | 4 | 0 | 379 | 2.87 | 96 | 283 | 2.83 | 6.9 | 7.2 | 11,995,872 | 11,995,410 | 99,243 | 99.14% | 0.83%
CUSHAW2 | 4 | 0 | 1,303 | 9.87 | 40 | 1,263 | 12.63 | 3.6 | 3.6 | 11,999,975 | 11,977,218 | 99,243 | 99.17% | 0.83%
GEM^2 | 4 | 0 | 416 | 3.15 | 40+1 | 199+176 | 3.75 | 4.3 | 4.3 | 11,999,763 | 11,998,994 | 56,514 | 99.53% | 0.47%
GEM^2 (fast mapping: adaptive) | 4 | 0 | 446 | 3.38 | 40+1 | 238+167 | 4.05 | 4.3 | 4.3 | 11,999,484 | 11,997,702 | 61,642 | 99.48% | 0.51%
GEM^2 (fast mapping: 0) | 4 | 0 | 298 | 2.26 | 40+1 | 90+167 | 2.57 | 4.3 | 4.3 | 11,995,422 | 11,991,992 | 61,887 | 99.45% | 0.52%

1 The time consumption of BWA is calculated as "align left reads" + "align right reads" + "sampe". The index loading times of the "align right reads" and "sampe" modules are 1 second because the index files were cached during "align left reads"; however, datasets larger than the host memory would flush the cache during alignment.

2 The alignment time of GEM is calculated as "alignment" + "convert to SAM format". The conversion module was run with 4 threads, consistent with the alignment module.

3 SOAP3-dp, SOAP3, SeqAlto and GEM explicitly report index loading time. The index loading times for Bowtie2, CUSHAW2 and BWA are estimated as the total size of the index divided by 100 MB/s, the average network file system speed of the testing environment. These times may be underestimated, because the time spent processing the index is not counted.

4 The alignment times were either reported explicitly by the aligners (including results processing and input/output time) or calculated as total elapsed time minus the estimated index loading time.

5 Sensitivity is calculated as "correctly aligned reads" / "all simulated reads". The higher, the better.

6 FDR is calculated as "incorrectly aligned reads" / "all aligned reads". The lower, the better.

For 100 bp reads, MICA takes 178 seconds to align the 6M read pairs, which is 16 seconds slower than SOAP3-dp (due to its longer index-loading time) but 1.67 to 11.09 times faster than the other tools. MICA's sensitivity is 99.60%, which is 0.06% lower than SOAP3-dp's but 0.07-1.83% higher than the other tools'. MICA's FDR is 0.40%, which is 0.07-0.79% lower than the other tools', though 0.06% higher than SOAP3-dp's.
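The sensitivity and FDR definitions from footnotes 5 and 6 can be checked numerically against the MICA column of Table 3; here we assume that correctly aligned = aligned minus incorrectly aligned, which is not stated explicitly in the table.

```python
def sensitivity(correctly_aligned, all_simulated):
    # footnote 5: correctly aligned reads / all simulated reads (higher is better)
    return correctly_aligned / all_simulated

def fdr(incorrectly_aligned, all_aligned):
    # footnote 6: incorrectly aligned reads / all aligned reads (lower is better)
    return incorrectly_aligned / all_aligned

# MICA, 100 bp dataset: 12,000,000 simulated reads,
# 11,999,810 aligned, 48,184 incorrectly aligned
aligned, incorrect, simulated = 11_999_810, 48_184, 12_000_000
```

Rounded to two decimals, these inputs reproduce the 99.60% sensitivity and 0.40% FDR reported for MICA.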

D. Production test on Tianhe-2 supercomputer

Tianhe-2 currently tops the TOP500 list of supercomputers. It has 16,000 computing nodes in total, each equipped with two 6-core Intel E5-2692 v2 CPUs @ 2.20 GHz, 64 GB of main memory and 3 MIC cards; each MIC card has 57 cores and 8 GB of on-board memory with ECC enabled. To test Tianhe-2 with the workload of a population-scale study, we used whole-genome sequencing data of 45 CHB and 45 CHS samples (Supplementary Details) from the 1000 Genomes Project; the samples averaged 64.68-fold depth. In total, we had 932 lanes of 100 bp paired-end reads, with size varying from 1.82- to 12.7-fold per lane. The 932 lanes were aligned under 5 different settings. All alignment jobs were submitted simultaneously using one node per lane, occupying 4,660 nodes at the very beginning. The results are shown in Table 4. 19 jobs failed to complete due to node or card failures and were excluded from the calculations. Using 3 cards per node and SAM output, the mean time for the 923 completed lanes is 1,303 seconds, with the largest lane finishing within an hour. By stacking up the alignment jobs to minimize the idle time of the computing nodes involved, we anticipate that all 923 jobs could be finished in an hour using fewer than 400 nodes.
Table 4

Results of experiments on Tianhe-2 using 5 different settings.

Setting | MIC cards used | Output format | Finished lanes | Longest (s) | Mean (s) | Median (s)
1 | 3 | SAM | 923 | 3,425 | 1,303 | 1,220
2 | 3 | BAM | 929 | 6,012 | 2,777 | 2,666
3 | 3 | 6-thread BAM | 928 | 4,484 | 1,371 | 1,321
4 | 2 | 6-thread BAM | 931 | 4,269 | 1,508 | 1,475
5 | 1 | 6-thread BAM | 930 | 6,915 | 2,943 | 2,879

Using 3 cards and 6-thread BAM output, the mean time is about the same as with SAM output, but the maximum time is ~18 minutes longer. Interestingly, alignment with 2 cards is almost twice as fast as with 1 card, but 3 cards are only slightly faster than 2, since the computation is throttled by the slow compression algorithm in the BAM output module. Splitting the output into two or more files to enable non-blocking parallel compression would help exploit the full power of all three cards.
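The suggested fix can be sketched as follows: split the output stream into independent chunks and compress them concurrently, so no single compressor serializes the pipeline. This is illustrative Python using plain gzip, whereas real BAM uses BGZF blocks; the function names are ours.

```python
import gzip
import threading

def compress_split(sam_chunks):
    """Compress each output chunk in its own thread. With the output
    split into separate files, compression of one chunk no longer
    blocks the others."""
    out = [None] * len(sam_chunks)

    def job(k):
        out[k] = gzip.compress(sam_chunks[k].encode())

    threads = [threading.Thread(target=job, args=(k,))
               for k in range(len(sam_chunks))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

In CPython the GIL is released inside zlib's compression routine, so the threads genuinely overlap; a C/C++ implementation would simply run one compressor per output file.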

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

S. C., J. C., E. W. and H. W. contributed equally to this work. Ruib. L. and T. L. managed the project and led the design. S. C., J. C., E. W. and H. W. implemented the software. W. L., D. Z., C. Y., C. L., Y. L., Ruiq. L., J. W., X. Z. and S. P. tested the software. S. C., J. C., E. W., Ruib. L. and T. L. wrote the paper.
References (12 in total)

1.  Mapping short DNA sequencing reads and calling variants using mapping quality scores.

Authors:  Heng Li; Jue Ruan; Richard Durbin
Journal:  Genome Res       Date:  2008-08-19       Impact factor: 9.043

2.  CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform.

Authors:  Yongchao Liu; Bertil Schmidt; Douglas L Maskell
Journal:  Bioinformatics       Date:  2012-05-09       Impact factor: 6.937

3.  The GEM mapper: fast, accurate and versatile alignment by filtration.

Authors:  Santiago Marco-Sola; Michael Sammeth; Roderic Guigó; Paolo Ribeca
Journal:  Nat Methods       Date:  2012-10-28       Impact factor: 28.547

4.  Fast and accurate read alignment for resequencing.

Authors:  John C Mu; Hui Jiang; Amirhossein Kiani; Marghoob Mohiyuddin; Narges Bani Asadi; Wing H Wong
Journal:  Bioinformatics       Date:  2012-07-18       Impact factor: 6.937

5.  An improved algorithm for matching biological sequences.

Authors:  O Gotoh
Journal:  J Mol Biol       Date:  1982-12-15       Impact factor: 5.469

6.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.

Authors:  Ben Langmead; Cole Trapnell; Mihai Pop; Steven L Salzberg
Journal:  Genome Biol       Date:  2009-03-04       Impact factor: 13.583

7.  SOAP3-dp: fast, accurate and sensitive GPU-based short read aligner.

Authors:  Ruibang Luo; Thomas Wong; Jianqiao Zhu; Chi-Man Liu; Xiaoqian Zhu; Edward Wu; Lap-Kei Lee; Haoxiang Lin; Wenjuan Zhu; David W Cheung; Hing-Fung Ting; Siu-Ming Yiu; Shaoliang Peng; Chang Yu; Yingrui Li; Ruiqiang Li; Tak-Wah Lam
Journal:  PLoS One       Date:  2013-05-31       Impact factor: 3.240

8.  Long read alignment based on maximal exact match seeds.

Authors:  Yongchao Liu; Bertil Schmidt
Journal:  Bioinformatics       Date:  2012-09-15       Impact factor: 6.937

9.  SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler.

Authors:  Ruibang Luo; Binghang Liu; Yinlong Xie; Zhenyu Li; Weihua Huang; Jianying Yuan; Guangzhu He; Yanxiang Chen; Qi Pan; Yunjie Liu; Jingbo Tang; Gengxiong Wu; Hao Zhang; Yujian Shi; Yong Liu; Chang Yu; Bo Wang; Yao Lu; Changlei Han; David W Cheung; Siu-Ming Yiu; Shaoliang Peng; Zhu Xiaoqian; Guangming Liu; Xiangke Liao; Yingrui Li; Huanming Yang; Jian Wang; Tak-Wah Lam; Jun Wang
Journal:  Gigascience       Date:  2012-12-27       Impact factor: 6.524

10.  SeqAn an efficient, generic C++ library for sequence analysis.

Authors:  Andreas Döring; David Weese; Tobias Rausch; Knut Reinert
Journal:  BMC Bioinformatics       Date:  2008-01-09       Impact factor: 3.169

