Youhei Namiki, Takashi Ishida, Yutaka Akiyama.
Abstract
BACKGROUND: Huge numbers of genomes can now be sequenced rapidly with recent improvements in sequencing throughput. However, data analysis methods have not kept up, making it difficult to process the vast amounts of available sequence data. This increased processing time is especially critical in DNA sequence clustering because of the intrinsic difficulty in parallelization. Thus, there is a strong demand for a faster clustering algorithm.
Year: 2013 PMID: 23815271 PMCID: PMC3654901 DOI: 10.1186/1471-2105-14-S8-S7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Flowchart of proposed method.
Figure 2. Computation of .
Figure 3. Computation of .
Computation time for each sequence length (1 million sequences)
| Method | 100 bases | Speedup | 150 bases | Speedup | 400 bases | Speedup |
|---|---|---|---|---|---|---|
| CD-HIT | 41m40s | | 45m15s | | 1h4m29s | |
| LCS-HIT | 7m10s | 5.8 | 13m45s | 3.3 | 31m22s | 2.1 |
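
The second figure reported with each LCS-HIT time (5.8, 3.3, 2.1) matches the ratio of the CD-HIT time to the LCS-HIT time. The following is a minimal sketch of that arithmetic for the 1 million sequence case; the helper names `to_seconds` and `speedup` are hypothetical and not part of either tool.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration such as '1h4m29s' or '41m40s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", duration)
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {duration!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def speedup(baseline: str, candidate: str) -> float:
    """Ratio of baseline time to candidate time (assumed meaning of the reported ratio)."""
    return to_seconds(baseline) / to_seconds(candidate)


# Times copied from the 1 million sequence table, keyed by read length in bases.
cd_hit = {100: "41m40s", 150: "45m15s", 400: "1h4m29s"}
lcs_hit = {100: "7m10s", 150: "13m45s", 400: "31m22s"}

for length in cd_hit:
    ratio = speedup(cd_hit[length], lcs_hit[length])
    print(f"{length} bases: {ratio:.1f}x")
```

The ratio shrinks as reads get longer, which is consistent with the trend across all three dataset sizes.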
Computation time for each sequence length (2 million sequences)
| Method | 100 bases | Speedup | 150 bases | Speedup | 400 bases | Speedup |
|---|---|---|---|---|---|---|
| CD-HIT | 2h11m47s | | 2h17m56s | | 2h50m38s | |
| LCS-HIT | 18m42s | 7.1 | 31m41s | 4.4 | 1h7m26s | 2.5 |
Computation time for each sequence length (5 million sequences)
| Method | 100 bases | Speedup | 150 bases | Speedup | 400 bases | Speedup |
|---|---|---|---|---|---|---|
| CD-HIT | 11h17m22s | | 11h28m17s | | 14h57m56s | |
| LCS-HIT | 2h11m09s | 5.2 | 3h4m43s | 3.7 | 6h42m23s | 2.2 |
Figure 4. Computation time (2 million sequences). The red line shows the computation time of LCS-HIT for each read length, and the blue line shows that of CD-HIT.
Number of clusters (2 million sequences)
| Method | 100 bases | 150 bases | 400 bases |
|---|---|---|---|
| CD-HIT | 1,242,054 | 1,015,466 | 493,384 |
| LCS-HIT | 1,185,704 | 970,419 | 480,201 |
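
To put the cluster counts on a common scale, the gap between the two tools can be expressed as a percentage of the CD-HIT count. This is only an illustrative calculation on the numbers in the table above; `relative_difference` is a hypothetical helper, and the record itself does not report these percentages.

```python
def relative_difference(baseline: int, candidate: int) -> float:
    """Percentage by which candidate falls below baseline."""
    return (baseline - candidate) / baseline * 100


# Cluster counts for 2 million sequences, keyed by read length in bases.
cd_hit_clusters = {100: 1_242_054, 150: 1_015_466, 400: 493_384}
lcs_hit_clusters = {100: 1_185_704, 150: 970_419, 400: 480_201}

for length, baseline in cd_hit_clusters.items():
    diff = relative_difference(baseline, lcs_hit_clusters[length])
    print(f"{length} bases: LCS-HIT yields {diff:.1f}% fewer clusters")
```

On these figures LCS-HIT produces between roughly 3% and 5% fewer clusters than CD-HIT at every read length.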
Computation time for real sequencing datasets
| Method | 454 | Speedup | Illumina/Solexa | Speedup |
|---|---|---|---|---|
| CD-HIT | 2m47s | | 27h2m5s | |
| LCS-HIT | 44s | 3.8 | 3h44m31s | 7.4 |