Abstract
BACKGROUND: Continued advances in next-generation short-read sequencing technologies are increasing throughput and read lengths while driving down error rates. Taking advantage of the high-coverage sampling used in many applications, several error-correction algorithms have been developed to further improve data quality. However, correcting errors in high-coverage sequence data requires significant computing resources.
Year: 2015 PMID: 26679555 PMCID: PMC4674851 DOI: 10.1186/1471-2105-16-S17-S7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Sequence datasets.
| Data Set | SRA Accession | Sequencing Platform | Reference Genome | Reference Strain | Organization | Publication Date (DD-MM-YYYY) | GC (%) | Genome Length (Mb) | Read Length (bp) | Coverage (×) |
|---|---|---|---|---|---|---|---|---|---|---|
| S1 | SRR789669 | HiSeq 2500 | D. miranda | MSH22 | UCB | 08-05-2013 | 45.6 | 136.73 | 90 | 43 |
| S2 | SRR647546 | HiSeq 2000 | E. coli | O157:H7 | UMIGS | 10-01-2013 | 51.7 | 5.59 | 101 | 369 |
| S3 | ERR036168 | HiSeq 2000 | P. falciparum | 3D7 | WTSI | 03-11-2011 | 22.5 | 23.27 | 75 | 362 |
| S4 | ERR142615 | HiSeq 2000 | B. pertussis | ST24 | WTSI | 28-08-2012 | 67.2 | 4.12 | 75 | 1000 |
| S5 | ERR142617 | HiSeq 2000 | P. falciparum | 3D7 | WTSI | 28-08-2012 | 20.2 | 23.27 | 75 | 160 |
| S6 | SRR507777 | HiSeq 2000 | S. cerevisiae | S288c | CSHL | 20-06-2012 | 39 | 12.16 | 76 | 362 |
UCB: University of California, Berkeley; WTSI: Wellcome Trust Sanger Institute; UMIGS: Institute for Genome Sciences, University of Maryland; CSHL: Cold Spring Harbor Laboratory
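The Coverage column is consistent with the usual depth estimate, reads × read length ÷ genome length. A minimal Python sketch, assuming every sequenced read counts toward coverage and taking the read counts from the Number of Reads column of the alignment table; `DATASETS` and `coverage` are illustrative names, not part of the source.

```python
# Rough sequencing depth: total sequenced bases divided by genome length.
# Assumption: every read contributes its full length to coverage.
# Read counts come from the alignment table; read lengths and genome
# lengths come from the dataset table. Names here are hypothetical.

DATASETS = {
    # name: (number_of_reads, read_length_bp, genome_length_mb)
    "S1": (65_603_904, 90, 136.73),
    "S2": (20_461_442, 101, 5.59),
    "S3": (112_418_270, 75, 23.27),
    "S4": (54_996_906, 75, 4.12),
    "S5": (49_738_806, 75, 23.27),
    "S6": (57_886_340, 76, 12.16),
}


def coverage(num_reads: int, read_length: int, genome_length_mb: float) -> float:
    """Return mean fold coverage: (reads * read length) / genome length in bp."""
    return num_reads * read_length / (genome_length_mb * 1_000_000)


for name, (n, length, genome) in DATASETS.items():
    print(f"{name}: {coverage(n, length, genome):.1f}x")
```

This gives about 43x for S1 and about 370x for S2. Every estimate lies within roughly one fold of the Coverage column; the small residual differences are presumably due to rounding of the tabulated genome lengths and coverage values.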
Alignment of sequence datasets.
| Data Set | Reference Strain | Number of Reads | Aligned (%) | Unaligned (%) | Ambiguous (%) | Perfect Reads (%) | Error Rate |
|---|---|---|---|---|---|---|---|
| S1 | DroMir2.2 | 65603904 | 43580948 (66.4) | 18010469 (27.5) | 4012487 (6.1) | 33234182 (50.7) | 0.71 |
| S2 | O157:H7 | 20461442 | 18191534 (88.9) | 1045418 (5.1) | 1224490 (6.0) | 13294580 (65.0) | 0.72 |
| S3 | 3D7 | 112418270 | 88192289 (78.5) | 19516044 (17.4) | 4709936 (4.2) | 74845636 (66.6) | 0.43 |
| S4 | CS | 54996906 | 46950786 (85.4) | 3957391 (7.2) | 4088729 (7.4) | 44437437 (80.8) | 0.24 |
| S5 | 3D7 | 49738806 | 41836006 (84.1) | 5818785 (11.7) | 2084015 (4.2) | 40472551 (81.4) | 0.18 |
| S6 | S288c | 57886340 | 46179933 (79.8) | 2213383 (3.8) | 9493023 (16.4) | 51612399 (89.2) | 0.13 |
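The percentages in the alignment table are fractions of the total read count. The sketch below recomputes them from the raw counts; `alignment_summary` is a hypothetical helper. Rather than assuming the aligned, unaligned and ambiguous classes sum exactly to the total, it reports any leftover reads.

```python
# Recompute the percentage columns of the alignment table from raw counts.
# Counts are copied from the table; the helper name is illustrative only.

ALIGNMENT = {
    # name: (total, aligned, unaligned, ambiguous, perfect)
    "S1": (65_603_904, 43_580_948, 18_010_469, 4_012_487, 33_234_182),
    "S2": (20_461_442, 18_191_534, 1_045_418, 1_224_490, 13_294_580),
    "S3": (112_418_270, 88_192_289, 19_516_044, 4_709_936, 74_845_636),
    "S4": (54_996_906, 46_950_786, 3_957_391, 4_088_729, 44_437_437),
    "S5": (49_738_806, 41_836_006, 5_818_785, 2_084_015, 40_472_551),
    "S6": (57_886_340, 46_179_933, 2_213_383, 9_493_023, 51_612_399),
}


def alignment_summary(total, aligned, unaligned, ambiguous, perfect):
    """Return each class as a percentage of all reads (one decimal place),
    plus the number of reads not covered by the three alignment classes."""

    def pct(count):
        return round(100 * count / total, 1)

    return {
        "aligned": pct(aligned),
        "unaligned": pct(unaligned),
        "ambiguous": pct(ambiguous),
        "perfect": pct(perfect),
        "unclassified_reads": total - (aligned + unaligned + ambiguous),
    }


for name, row in ALIGNMENT.items():
    print(name, alignment_summary(*row))
```

For S3 and S6 the three alignment classes sum to one read fewer than the Number of Reads column, so `unclassified_reads` is 1 for those datasets and 0 for the others.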
Results using default parameters and Rule2.
| Data Set | TP | FN | FP | TN | Precision | Specificity | Sensitivity | Time taken in seconds | |
|---|---|---|---|---|---|---|---|---|---|
| S1 | 3.3E+07 | 786 | 2.6E+07 | 6588766 | 0.563 | 0.204 | 1 | 149.786 | 217.319 |
| S2 | 1.1E+07 | 2403877 | 781985 | 6384877 | 0.933 | 0.891 | 0.819 | 31.449 | 28.394 |
| S3 | 7.5E+07 | 146338 | 4348595 | 3.3E+07 | 0.945 | 0.884 | 0.998 | 88.842 | 197.219 |
| S4 | 4.4E+07 | 45171 | 3873795 | 6685674 | 0.92 | 0.633 | 0.999 | 51.391 | 148.213 |
| S5 | 4E+07 | 51679 | 4422332 | 4843923 | 0.901 | 0.523 | 0.999 | 51.942 | 135.711 |
| S6 | 5.2E+07 | 33420 | 1681374 | 4592567 | 0.968 | 0.732 | 0.999 | 54.857 | 160.928 |
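Reading the four count columns of each row as TP, FN, FP and TN (in that order), the standard definitions reproduce the reported Precision, Specificity and Sensitivity. A minimal sketch under that assumption; the `Confusion` class is illustrative. Because this table prints the S4 true-positive count only as 4.4E+07, the exact value is taken from the S4 Rule2 row of the varying-rules table, which shares the other three counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Read-level confusion counts (assumed column order: TP, FN, FP, TN)."""

    tp: int
    fn: int
    fp: int
    tn: int

    @property
    def precision(self) -> float:
        # Fraction of reads called positive that really are positive.
        return self.tp / (self.tp + self.fp)

    @property
    def specificity(self) -> float:
        # Fraction of negative reads correctly called negative.
        return self.tn / (self.tn + self.fp)

    @property
    def sensitivity(self) -> float:
        # Fraction of positive reads correctly called positive.
        return self.tp / (self.tp + self.fn)


# Dataset S4 with default parameters and Rule2.
s4 = Confusion(tp=44_392_266, fn=45_171, fp=3_873_795, tn=6_685_674)
print(f"{s4.precision:.3f} {s4.specificity:.3f} {s4.sensitivity:.3f}")
```

The printed values, 0.920, 0.633 and 0.999, match the S4 row.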
Figure 1. Effect on coverage in dataset S5.
Figure 2.
Results on S4, S5, and S6 for varying rules.
| Data Set | Rule | TP | FN | FP | TN | Precision | Specificity | Sensitivity |
|---|---|---|---|---|---|---|---|---|
| S4 | Rule1 | 44315371 | 122066 | 3718959 | 6840510 | 0.923 | 0.648 | 0.997 |
| S4 | Rule2 | 44392266 | 45171 | 3873795 | 6685674 | 0.920 | 0.633 | 0.999 |
| S4 | Rule3 | 44392266 | 45171 | 3873932 | 6685537 | 0.920 | 0.633 | 0.999 |
| S4 | Rule4 | 44415654 | 21783 | 3905218 | 6654251 | 0.919 | 0.630 | 1.000 |
| S4 | Rule5 | 44416777 | 20660 | 3926550 | 6632919 | 0.919 | 0.628 | 1.000 |
| S5 | Rule1 | 40246703 | 225848 | 4161367 | 5104888 | 0.906 | 0.551 | 0.994 |
| S5 | Rule2 | 40420872 | 51679 | 4422332 | 4843923 | 0.901 | 0.523 | 0.999 |
| S5 | Rule3 | 40420999 | 51552 | 4422972 | 4843283 | 0.901 | 0.523 | 0.999 |
| S5 | Rule4 | 40433782 | 38769 | 4435544 | 4830711 | 0.901 | 0.521 | 0.999 |
| S5 | Rule5 | 40443582 | 28969 | 4442097 | 4824158 | 0.901 | 0.521 | 0.999 |
| S6 | Rule1 | 51341705 | 270694 | 1354288 | 4919653 | 0.974 | 0.784 | 0.995 |
| S6 | Rule2 | 51578979 | 33420 | 1681374 | 4592567 | 0.968 | 0.732 | 0.999 |
| S6 | Rule3 | 51579033 | 33366 | 1681628 | 4592313 | 0.968 | 0.732 | 0.999 |
| S6 | Rule4 | 51605360 | 7039 | 1684518 | 4589423 | 0.968 | 0.732 | 1.000 |
| S6 | Rule5 | 51605719 | 6680 | 1690727 | 4583214 | 0.968 | 0.731 | 1.000 |
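Within each dataset, the sums TP + FN and FP + TN stay fixed across rules, and for S4 the first sum (44,437,437) equals the Perfect Reads count in the alignment table. This is consistent with error-free reads being treated as the positive class, so the rules differ only in how they split the same reads. The sketch below checks that invariant for S4 and tabulates the sensitivity/specificity trade-off. It assumes the count columns are TP, FN, FP and TN in that order, an inference from the arithmetic rather than a label from the source; the function names are hypothetical.

```python
# Rows for dataset S4 from the varying-rules table: rule -> (TP, FN, FP, TN).
# The column order is an assumption inferred from the reported metrics.
S4_RULES = {
    "Rule1": (44_315_371, 122_066, 3_718_959, 6_840_510),
    "Rule2": (44_392_266, 45_171, 3_873_795, 6_685_674),
    "Rule3": (44_392_266, 45_171, 3_873_932, 6_685_537),
    "Rule4": (44_415_654, 21_783, 3_905_218, 6_654_251),
    "Rule5": (44_416_777, 20_660, 3_926_550, 6_632_919),
}


def class_totals(rows):
    """Return the set of (TP + FN, FP + TN) pairs observed across rules."""
    return {(tp + fn, fp + tn) for tp, fn, fp, tn in rows.values()}


def tradeoff(rows):
    """Yield (rule, sensitivity, specificity) for each rule, in table order."""
    for rule, (tp, fn, fp, tn) in rows.items():
        yield rule, tp / (tp + fn), tn / (tn + fp)


print(class_totals(S4_RULES))  # a single pair means the class sizes never change
for rule, sens, spec in tradeoff(S4_RULES):
    print(f"{rule}: sensitivity={sens:.3f} specificity={spec:.3f}")
```

For S4, moving from Rule1 to Rule5 raises sensitivity from about 0.997 to 1.000 while specificity falls from about 0.648 to 0.628, matching the table.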
Results on Reptile error correction.
| Data Set | Method | TP | FN | FP | TN | | Sensitivity | Specificity | Gain | |
|---|---|---|---|---|---|---|---|---|---|---|
| S1 | Reptile | 4780231 | 28675648 | 134010 | 4249799767 | 19494 | 0.143 | 1.000 | 0.139 | 0.004061 |
| S1 | Rule1 | 1299909 | 6308111 | 3427119 | 1143304758 | 18463 | 0.171 | 0.997 | -0.280 | 0.014004 |
| S1 | Rule2 | 9147518 | 24304818 | 158446 | 4249775331 | 23037 | 0.273 | 1.000 | 0.269 | 0.002512 |
| S2 | Reptile | 5665375 | 8695457 | 90366 | 1946564438 | 2788 | 0.395 | 1.000 | 0.388 | 0.000492 |
| S2 | Rule1 | 2895901 | 11462701 | 745833 | 1945908971 | 5018 | 0.202 | 1.000 | 0.150 | 0.001730 |
| S2 | Rule2 | 5371516 | 8988883 | 128454 | 1946526350 | 3221 | 0.374 | 1.000 | 0.365 | 0.000599 |
| S3 | Reptile | 18025179 | 10286737 | 605575 | 6874673712 | 44797 | 0.637 | 1.000 | 0.615 | 0.002479 |
| S3 | Rule1 | 18806026 | 9498842 | 1940111 | 6873336908 | 51638 | 0.664 | 1.000 | 0.596 | 0.002738 |
| S3 | Rule2 | 18290440 | 10019429 | 695540 | 6874583747 | 46844 | 0.646 | 1.000 | 0.622 | 0.002555 |
| S4 | Reptile | 5565750 | 3453627 | 17740 | 3810574694 | 5139 | 0.617 | 1.000 | 0.615 | 0.000922 |
| S4 | Rule1 | 6365587 | 2653084 | 50006 | 3810542428 | 5845 | 0.706 | 1.000 | 0.700 | 0.000917 |
| S4 | Rule2 | 5990892 | 3028178 | 19705 | 3810572729 | 5446 | 0.664 | 1.000 | 0.662 | 0.000908 |
| S5 | Reptile | 1690826 | 3501370 | 302635 | 3259337078 | 11016 | 0.326 | 1.000 | 0.267 | 0.006473 |
| S5 | Rule1 | 1904825 | 3286174 | 481440 | 3259158273 | 12213 | 0.367 | 1.000 | 0.274 | 0.006371 |
| S5 | Rule2 | 1802536 | 3389315 | 341530 | 3259298183 | 11361 | 0.347 | 1.000 | 0.281 | 0.006263 |
| S6 | Reptile | 4346955 | 1083849 | 16099 | 4213859720 | 1413 | 0.800 | 1.000 | 0.797 | 0.000325 |
| S6 | Rule1 | 4939773 | 490989 | 37325 | 4213838494 | 1455 | 0.910 | 1.000 | 0.903 | 0.000294 |
| S6 | Rule2 | 4449982 | 980530 | 19259 | 4213856560 | 1705 | 0.819 | 1.000 | 0.816 | 0.000383 |
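In the Reptile comparison the counts are at the base level. Reading the first four count columns as TP, FN, FP and TN, the next three metric columns match sensitivity, specificity and the gain metric commonly used to evaluate error correctors, gain = (TP − FP) / (TP + FN). Gain measures the net reduction in erroneous bases relative to the number of erroneous bases before correction, so it turns negative when a method changes more correct bases than it fixes, as Rule1 does on S1. A minimal sketch under that column reading; `base_metrics` is a hypothetical helper.

```python
def base_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Base-level sensitivity, specificity and gain for an error corrector.

    Assumed meaning of the counts: TP = erroneous bases corrected,
    FN = erroneous bases left uncorrected, FP = correct bases wrongly
    changed, TN = correct bases left unchanged.
    gain = (TP - FP) / (TP + FN) is negative when FP exceeds TP.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "gain": (tp - fp) / (tp + fn),
    }


# Dataset S1 from the Reptile comparison table.
reptile = base_metrics(4_780_231, 28_675_648, 134_010, 4_249_799_767)
rule1 = base_metrics(1_299_909, 6_308_111, 3_427_119, 1_143_304_758)
for name, metrics in (("Reptile", reptile), ("Rule1", rule1)):
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

On S1 this reproduces the reported gains of 0.139 for Reptile and -0.280 for Rule1.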