
AC: A Compression Tool for Amino Acid Sequences.

Morteza Hosseini¹, Diogo Pratas², Armando J Pinho².

Abstract

Advances in protein sequencing technologies have produced a huge volume of data that must be stored and transmitted, a challenge that compression can address. In this paper, we propose AC, a state-of-the-art tool for lossless compression of amino acid sequences. AC relies on the cooperation of finite-context models and substitutional tolerant Markov models. Compared with several general-purpose and protein-specific compressors, AC achieves the best bit-rates, and it compresses the sequences nine times faster than its competitor, paq8l. Using AC, we also analyze the compressibility of a large number of sequences from different domains. The results show that viruses are the hardest sequences to compress, followed by archaea and bacteria, while eukaryota are the easiest.
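To make the model combination more concrete, here is a minimal Python sketch of the general idea. It is not AC's implementation. An order-k finite-context model (FCM) and a simplified substitution-tolerant model share one count table. Their predictions are mixed, and the ideal code length, the sum of -log2 p(x), is accumulated; an arithmetic coder driven by the same probabilities would approach this value. Everything here is an assumption chosen for illustration: the 20-letter alphabet, the estimator parameter `alpha`, the order `k`, the weight-update rule w_i <- w_i^gamma * p_i(x), and the hypothetical STCM parameters `window` and `max_misses`.

```python
import math
from collections import defaultdict, deque

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # assumption: only the 20 standard amino acids


class CountTable:
    """Order-k counts (context -> symbol -> count) with an additive estimator."""

    def __init__(self, k, alpha=1 / 16):  # k and alpha are illustrative choices
        self.k, self.alpha = k, alpha
        self.counts = defaultdict(lambda: defaultdict(int))

    def probs(self, ctx):
        table = self.counts.get(ctx, {})
        denom = sum(table.values()) + self.alpha * len(ALPHABET)
        return {s: (table.get(s, 0) + self.alpha) / denom for s in ALPHABET}

    def best(self, ctx):
        table = self.counts.get(ctx)
        return max(table, key=table.get) if table else None  # most frequent next symbol

    def update(self, ctx, sym):
        self.counts[ctx][sym] += 1


class STCM:
    """Simplified substitution-tolerant model (hypothetical sketch).

    It reads the shared counts but builds its context from the symbols it
    expected, so a single substitution does not break the context. After too
    many recent misses it resynchronizes with the real past symbols.
    """

    def __init__(self, table, window=16, max_misses=8):
        assert table.k >= 1
        self.table, self.max_misses = table, max_misses
        self.history = ""                    # last k "tolerant" symbols
        self.recent = deque(maxlen=window)   # hit/miss record of recent predictions
        self.expected = None

    def probs(self):
        ctx = self.history[-self.table.k:]
        self.expected = self.table.best(ctx)  # taken before the table is updated
        return self.table.probs(ctx)

    def update(self, sym, real_suffix):
        self.recent.append(self.expected == sym)
        # Follow the expected symbol when one exists (substitution tolerance).
        self.history = (self.history + (self.expected or sym))[-self.table.k:]
        if self.recent.count(False) > self.max_misses:
            self.history = real_suffix  # too many misses: resync with the real past
            self.recent.clear()


def estimate_bits(seq, k=3, gamma=0.9):
    """Ideal code length, in bits, of seq under an FCM + STCM mixture."""
    seq = seq.upper()
    if set(seq) - set(ALPHABET):
        raise ValueError("sequence contains symbols outside ALPHABET")
    table = CountTable(k)
    stcm = STCM(table)
    weights = [0.5, 0.5]
    bits = 0.0
    for i, sym in enumerate(seq):
        real_ctx = seq[max(0, i - k):i]
        preds = [table.probs(real_ctx), stcm.probs()]
        p = sum(w * pr[sym] for w, pr in zip(weights, preds))
        bits -= math.log2(p)
        # Favor the model that predicted sym well, then renormalize the weights.
        weights = [w ** gamma * pr[sym] for w, pr in zip(weights, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
        stcm.update(sym, seq[max(0, i - k + 1):i + 1])
        table.update(real_ctx, sym)
    return bits
```

For a compressibility comparison like the one in the abstract, normalize by length: `estimate_bits(s) / len(s)` gives bits per symbol. Lower values mean a more compressible sequence, and log2(20) ≈ 4.32 bits is the cost of a uniform code over 20 symbols.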

Keywords:  Compression; Finite-context model; Kolmogorov complexity; Protein; Substitutional tolerant Markov model


Year:  2019        PMID: 30721401     DOI: 10.1007/s12539-019-00322-1

Source DB:  PubMed          Journal:  Interdiscip Sci        ISSN: 1867-1462            Impact factor:   2.233


Related articles: 3 in total

1.  Smash++: an alignment-free and memory-efficient tool to find genomic rearrangements.

Authors:  Morteza Hosseini; Diogo Pratas; Burkhard Morgenstern; Armando J Pinho
Journal:  Gigascience       Date:  2020-05-01       Impact factor: 6.524

2.  AC2: An Efficient Protein Sequence Compression Tool Using Artificial Neural Networks and Cache-Hash Models.

Authors:  Milton Silva; Diogo Pratas; Armando J Pinho
Journal:  Entropy (Basel)       Date:  2021-04-26       Impact factor: 2.524

3.  Sequence Compression Benchmark (SCB) database-A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences.

Authors:  Kirill Kryukov; Mahoko Takahashi Ueda; So Nakagawa; Tadashi Imanishi
Journal:  Gigascience       Date:  2020-07-01       Impact factor: 6.524

