Richard Beal¹, Tazin Afrin², Aliya Farheen², Donald Adjeroh².
Abstract
BACKGROUND: The longest common subsequence (LCS) problem is a classical problem in computer science and forms the basis of the current best-performing reference-based compression schemes for genome resequencing data.
Keywords: Biology; Compression; Genome resequencing; LCS; LPF; Longest common subsequence; Longest previous factor
Year: 2016 PMID: 27556803 PMCID: PMC5001248 DOI: 10.1186/s12864-016-2793-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 LCS dynamic programming table for $S_1$ = AACCTTAA and $S_2$ = AGGTCGTA. A sample LCS trace (ACTA) is highlighted.
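The table in Fig. 1 follows the standard LCS recurrence: a cell extends its diagonal neighbour by one when the two symbols match and otherwise takes the larger of its upper and left neighbours. A minimal Python sketch of that recurrence and one possible traceback is given below. The function names are illustrative, and the traceback's tie-breaking rule is an assumption, so the LCS it returns may differ from the ACTA trace highlighted in the figure even though its length (4) is the same.

```python
def lcs_table(s1: str, s2: str) -> list[list[int]]:
    """Fill the (len(s1)+1) x (len(s2)+1) LCS dynamic-programming table."""
    n, m = len(s1), len(s2)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s1[i - 1] == s2[j - 1]:
                # Matching symbols extend the LCS of the two shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise drop the last symbol of one prefix or the other.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def lcs_trace(s1: str, s2: str) -> str:
    """Walk back from the bottom-right cell to recover one LCS."""
    table = lcs_table(s1, s2)
    i, j = len(s1), len(s2)
    out = []
    while i > 0 and j > 0:
        if s1[i - 1] == s2[j - 1]:
            out.append(s1[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            # Tie-breaking choice (an assumption): prefer moving up.
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    s1, s2 = "AACCTTAA", "AGGTCGTA"
    print(lcs_table(s1, s2)[-1][-1])  # 4
    print(lcs_trace(s1, s2))          # one LCS of length 4
```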
Fig. 2 Total bytes needed by our algorithm to compress the Arabidopsis thaliana genome, i.e. the sum of the sizes of the symbols file and the triples file
Fig. 3 Compressing the Arabidopsis thaliana genome: chromosome 4
Fig. 4 Size of the symbols file when compressing the Arabidopsis thaliana genome
Fig. 5 Size of the triples file when compressing the Arabidopsis thaliana genome
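Figs. 2, 4 and 5 split the compressed output into a symbols file and a triples file. The sketch below shows one simple way such a split can arise, assuming a greedy LZ77-style parse of the target chromosome U against a reference R: every match of length at least k is written as a hypothetical triple (position in U, position in R, length), and every other character goes to the symbols stream. This is not the authors' LCS/LPF-based algorithm; the names `encode`, `decode`, `longest_match` and the triple layout are illustrative, and the naive match search is only suitable for toy inputs.

```python
# Hypothetical triple layout: (position in U, position in R, match length).
Triple = tuple[int, int, int]


def longest_match(ref: str, target: str, i: int) -> tuple[int, int]:
    """Naively find the longest prefix of target[i:] occurring in ref (first hit wins ties)."""
    best_pos, best_len = -1, 0
    for p in range(len(ref)):
        length = 0
        while (p + length < len(ref) and i + length < len(target)
               and ref[p + length] == target[i + length]):
            length += 1
        if length > best_len:
            best_pos, best_len = p, length
    return best_pos, best_len


def encode(ref: str, target: str, k: int) -> tuple[str, list[Triple]]:
    """Greedy parse: matches of length >= k become triples, the rest become literal symbols."""
    symbols: list[str] = []
    triples: list[Triple] = []
    i = 0
    while i < len(target):
        pos, length = longest_match(ref, target, i)
        # length > 0 guards against an infinite loop when k <= 0.
        if length >= k and length > 0:
            triples.append((i, pos, length))
            i += length
        else:
            symbols.append(target[i])
            i += 1
    return "".join(symbols), triples


def decode(ref: str, symbols: str, triples: list[Triple], n: int) -> str:
    """Rebuild a target of length n: copy each triple from ref, fill gaps from symbols in order."""
    out: list[str] = []
    sym = iter(symbols)
    pending = iter(triples)
    nxt = next(pending, None)
    while len(out) < n:
        if nxt is not None and nxt[0] == len(out):
            _, pos, length = nxt
            out.extend(ref[pos:pos + length])
            nxt = next(pending, None)
        else:
            out.append(next(sym))
    return "".join(out)


if __name__ == "__main__":
    R = "ACGTACGTTTGACCA"   # toy reference
    U = "ACGTACGATTGACCA"   # toy target with one substitution
    symbols, triples = encode(R, U, k=3)
    print(symbols, triples)  # A [(0, 0, 7), (8, 8, 7)]
    assert decode(R, symbols, triples, len(U)) == U
```

In this sketch, raising k pushes short matches out of the triples stream and into the symbols stream, so the two file sizes move in opposite directions as k changes. Whether the paper's k plays exactly this role is not stated in this excerpt.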
Arabidopsis thaliana genome: Optimal k for compressing chromosome U into the smallest C (in bytes)
| U | Optimal k | Smallest C (bytes) |
|---|---|---|
| 1 | 31–35 | 1086 |
| 2 | 16–1578 | 504 |
| 3 | 24–39 | 746 |
| 4 | 18 | 4418 |
| 5 | 19–91 | 433 |
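One plausible reading of entries such as 31–35 or 16–1578 is that several values of k tie for the smallest output. A minimal sketch of such a sweep is below, assuming a caller-supplied `cost(k)` that returns the compressed size for a given k (for example, the total of the symbols and triples files). The names `optimal_k` and `as_range` are hypothetical, and the toy costs in the usage line are invented for illustration, not taken from the table.

```python
from typing import Callable, Iterable


def optimal_k(cost: Callable[[int], int], ks: Iterable[int]) -> tuple[int, list[int]]:
    """Evaluate cost(k) for every candidate k; return the minimum and every k that attains it."""
    results = {k: cost(k) for k in ks}
    best = min(results.values())
    return best, sorted(k for k, c in results.items() if c == best)


def as_range(ks: list[int]) -> str:
    """Render a non-empty list of k values, collapsing a contiguous run to 'lo–hi'."""
    if ks == list(range(ks[0], ks[-1] + 1)):
        return str(ks[0]) if len(ks) == 1 else f"{ks[0]}–{ks[-1]}"
    return ", ".join(map(str, ks))


if __name__ == "__main__":
    # Invented sizes (bytes) for a handful of k values.
    toy = {10: 1200, 16: 1100, 18: 1086, 19: 1086, 20: 1086, 24: 1090}
    best, ks = optimal_k(toy.__getitem__, toy)
    print(best, as_range(ks))  # 1086 18–20
```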
Arabidopsis thaliana genome: Results (in bytes) for compressing chromosome U into C
| U | Size of U | Our Scheme | | | GRS | GReEn |
|---|---|---|---|---|---|---|
| 1 | 30 427 671 | 1 086 | 963 | 1 037 | | 1 551 |
| 2 | 19 698 289 | 504 | 584 | 605 | | 937 |
| 3 | 23 459 830 | | 759 | 803 | 2 989 | 1 097 |
| 4 | 18 585 056 | 4 555 | 2 507 | 3 156 | | 2 356 |
| 5 | 26 975 502 | | 502 | 520 | 604 | 618 |
| Sum | 119 146 348 | 7 324 | | 6 121 | 6 644 | 6 559 |
Bold signifies the best result
Oryza sativa genome: Results (in bytes) for compressing chromosome U into C
| U | Size of U | Our Scheme | | | GRS | GReEn |
|---|---|---|---|---|---|---|
| 1 | 43 268 879 | 15 207 | 4 735 | | 1 502 040 | 4 972 |
| 2 | 35 930 381 | 4 645 | 1 649 | 1 517 | | 1 906 |
| 3 | 36 406 689 | 54 234 | 15 693 | | 47 764 | 17 890 |
| 4 | 35 278 225 | 21 474 | 6 636 | | 36 145 | 6 750 |
| 5 | 29 894 789 | 17 030 | 5 431 | | 6 177 | 5 539 |
| 6 | 31 246 789 | | 146 | 141 | 14 | 482 |
| 7 | 29 696 629 | 5 899 | 2 064 | | 4 067 | 2 448 |
| 8 | 28 439 308 | 23 126 | | 10 115 | 118 246 | 9 507 |
| 9 | 23 011 239 | | 146 | 141 | 14 | 366 |
| 10 | 23 134 759 | 175 228 | | 50 277 | 788 542 | 60 449 |
| 11 | 28 512 666 | 41 407 | | 13 351 | 2 397 470 | 14 797 |
| 12 | 27 497 214 | | 146 | 141 | 14 | 429 |
| Sum | 372 317 567 | 358 286 | | 109 553 | 4 901 902 | 125 535 |
Bold signifies the best result
Homo sapiens genome: Results (in bytes) for compressing chromosome U into C
| U | Size of U | Our Scheme | | GRS | GReEn |
|---|---|---|---|---|---|
| 1 | 247 249 719 | 2 836 652 | | 1 336 626 | 1 225 767 |
| 2 | 242 951 149 | 2 871 186 | | 1 354 059 | 1 272 105 |
| 3 | 199 501 827 | 2 115 410 | | 1 011 124 | 971 527 |
| 4 | 191 273 063 | 2 398 432 | | 1 139 225 | 1 074 357 |
| 5 | 180 857 866 | 2 064 874 | | 988 070 | 947 378 |
| 6 | 170 899 992 | 1 902 067 | | 906 116 | 865 448 |
| 7 | 158 821 424 | 2 326 721 | | 1 096 646 | 998 482 |
| 8 | 146 274 826 | 1 617 884 | | 764 313 | 729 362 |
| 9 | 140 273 252 | 1 877 509 | | 864 222 | 773 716 |
| 10 | 135 374 737 | 1 623 010 | | 768 364 | 717 305 |
| 11 | 134 452 384 | 1 586 558 | | 755 708 | 716 301 |
| 12 | 132 349 534 | 1 476 523 | | 702 040 | 668 455 |
| 13 | 114 142 980 | 1 100 576 | | 520 598 | 490 888 |
| 14 | 106 368 585 | 1 026 227 | | 484 791 | 451 018 |
| 15 | 100 338 915 | 1 055 663 | | 496 215 | 453 301 |
| 16 | 88 827 254 | 1 225 378 | | 567 989 | 510 254 |
| 17 | 78 774 742 | 1 081 739 | | 505 979 | 464 324 |
| 18 | 76 117 153 | 865 138 | | 408 529 | 378 420 |
| 19 | 63 811 651 | 862 129 | | 399 807 | 369 388 |
| 20 | 62 435 964 | 605 179 | | 282 628 | 266 562 |
| 21 | 46 944 323 | 488 340 | | 226 549 | 203 036 |
| 22 | 49 691 432 | 568 734 | | 262 443 | 230 049 |
| X | 154 913 754 | 7 525 925 | | 3 231 776 | 2 712 153 |
| Y | 57 772 954 | 1 343 260 | | 592 791 | 481 307 |
| M | 16 571 | 151 | 151(*) | 183 | **127** |
| Sum | 3 080 436 051 | 42 445 265 | | 19 666 791 | 17 971 030 |
Bold signifies the best result
Homo sapiens genome: Results (in bytes) for compressing chromosome U via decomposition, i.e. compressing the payload (ρ) into C and compressing the character-case bitstring α into C
| U | Size of U | Our Scheme | | | | | GRS | GReEn |
|---|---|---|---|---|---|---|---|---|
| 1 | 247 249 719 | 381 577 | 161 319 | 755 092 | 447 919 | | 1 336 626 | 1 225 767 |
| 2 | 242 951 149 | 356 526 | 153 805 | 756 823 | 452 338 | | 1 354 059 | 1 272 105 |
| 3 | 199 501 827 | 284 096 | 119 348 | 553 835 | 343 213 | | 1 011 124 | 971 527 |
| 4 | 191 273 063 | 330 381 | 137 301 | 619 981 | 383 882 | | 1 139 225 | 1 074 357 |
| 5 | 180 857 866 | 259 922 | 109 768 | 550 876 | 331 063 | | 988 070 | 947 378 |
| 6 | 170 899 992 | 265 222 | 110 544 | 508 662 | 310 029 | | 906 116 | 865 448 |
| 7 | 158 821 424 | 292 797 | 121 289 | 611 475 | 355 616 | | 1 096 646 | 998 482 |
| 8 | 146 274 826 | 222 972 | 93 378 | 434 420 | 261 455 | | 764 313 | 729 362 |
| 9 | 140 273 252 | 309 512 | 132 957 | 493 024 | 276 468 | | 864 222 | 773 716 |
| 10 | 135 374 737 | 245 264 | 103 115 | 436 272 | 257 895 | | 768 364 | 717 305 |
| 11 | 134 452 384 | 222 735 | 92 471 | 423 687 | 254 637 | | 755 708 | 716 301 |
| 12 | 132 349 534 | 214 123 | 88 447 | 393 764 | 239 811 | | 702 040 | 668 455 |
| 13 | 114 142 980 | 148 938 | 62 730 | 301 116 | 183 038 | | 520 598 | 490 888 |
| 14 | 106 368 585 | 141 128 | 57 354 | 286 839 | 170 916 | | 484 791 | 451 018 |
| 15 | 100 338 915 | 138 219 | 58 777 | 302 957 | 173 600 | | 496 215 | 453 301 |
| 16 | 88 827 254 | 151 606 | 62 779 | 346 282 | 191 190 | | 567 989 | 510 254 |
| 17 | 78 774 742 | 136 168 | 57 030 | 301 837 | 171 680 | | 505 979 | 464 324 |
| 18 | 76 117 153 | 113 469 | 47 122 | 241 437 | 140 909 | | 408 529 | 378 420 |
| 19 | 63 811 651 | 130 468 | 53 531 | 230 673 | 134 701 | | 399 807 | 369 388 |
| 20 | 62 435 964 | 94 273 | 38 689 | 169 584 | 99 796 | | 282 628 | 266 562 |
| 21 | 46 944 323 | 71 121 | 28 744 | 141 387 | 79 835 | | 226 549 | 203 036 |
| 22 | 49 691 432 | 81 329 | 33 663 | 164 026 | 89 961 | | 262 443 | 230 049 |
| X | 154 913 754 | 523 282 | 196 868 | 1 533 249 | 875 026 | | 3 231 776 | 2 712 153 |
| Y | 57 772 954 | 152 464 | 57 002 | 300 287 | 153 582 | | 592 791 | 481 307 |
| M | 16 571 | 64 | 64(*) | 49 | 49(*) | | 183 | 127 |
| Sum | 3 080 436 051 | 5 267 656 | 2 178 095 | 10 857 634 | 6 378 609 | | 19 666 791 | 17 971 030 |
Bold signifies the best result
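The decomposition in the last table compresses the sequence content and its character case separately. Assuming, as the caption suggests, that the payload ρ is the sequence with case information removed and the bitstring α records the case of each character, a minimal sketch is below. The upper-case normalization of ρ, the convention that '1' in α marks a lower-case character, and the run-length helper are assumptions for illustration. Run-length coding is just one plausible way to make α compact when lower-case characters come in long runs; it is not necessarily how α was compressed here.

```python
def decompose(seq: str) -> tuple[str, str]:
    """Split seq into an upper-case payload rho and a case bitstring alpha ('1' = lower case)."""
    rho = seq.upper()
    alpha = "".join("1" if c.islower() else "0" for c in seq)
    return rho, alpha


def recompose(rho: str, alpha: str) -> str:
    """Invert decompose: re-apply lower case wherever alpha has a '1'."""
    return "".join(c.lower() if bit == "1" else c for c, bit in zip(rho, alpha))


def run_lengths(alpha: str) -> list[int]:
    """Hypothetical compact form of alpha: lengths of alternating runs, starting with a '0' run."""
    runs: list[int] = []
    current, count = "0", 0
    for bit in alpha:
        if bit == current:
            count += 1
        else:
            # A leading '1' produces an initial zero-length '0' run.
            runs.append(count)
            current, count = bit, 1
    runs.append(count)
    return runs


if __name__ == "__main__":
    seq = "ACGTacgtaaNNGT"
    rho, alpha = decompose(seq)
    print(rho, alpha, run_lengths(alpha))  # ACGTACGTAANNGT 00001111110000 [4, 6, 4]
    assert recompose(rho, alpha) == seq
```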