Textual data compression in computational biology: a synopsis.

Raffaele Giancarlo, Davide Scaturro, Filippo Utro.

Abstract

MOTIVATION: Textual data compression, and the associated techniques coming from information theory, are often perceived as being of interest mainly for data communication and storage. However, they are also deeply related to classification, data mining and data analysis. In recent years, a substantial effort has been made to apply textual data compression techniques to various computational biology tasks, ranging from the storage and indexing of large datasets to the comparison and reverse engineering of biological networks.
RESULTS: The main focus of this review is a systematic presentation of the key areas of bioinformatics and computational biology where compression has been used. When possible, a unifying organization of the main ideas and techniques is also provided.
AVAILABILITY: Most of the research results reviewed here come with software prototypes available to the bioinformatics community. The Supplementary Material provides pointers to software and benchmark datasets for a range of applications of broad interest. In addition to providing references to software, the Supplementary Material also gives a brief presentation of some fundamental results and techniques related to this paper. It is available at: http://www.math.unipa.it/~raffaele/suppMaterial/compReview/

Year:  2009        PMID: 19251772     DOI: 10.1093/bioinformatics/btp117

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  18 in total

1.  Discovering functional modules by identifying recurrent and mutually exclusive mutational patterns in tumors.

Authors:  Christopher A Miller; Stephen H Settle; Erik P Sulman; Kenneth D Aldape; Aleksandar Milosavljevic
Journal:  BMC Med Genomics       Date:  2011-04-14       Impact factor: 3.063

2.  LFQC: a lossless compression algorithm for FASTQ files.

Authors:  Marius Nicolae; Sudipta Pathak; Sanguthevar Rajasekaran
Journal:  Bioinformatics       Date:  2015-06-20       Impact factor: 6.937

3.  Data Compression Concepts and Algorithms and their Applications to Bioinformatics.

Authors:  O U Nalbantoğlu; D J Russell; K Sayood
Journal:  Entropy (Basel)       Date:  2010-01-01       Impact factor: 2.524

4.  Efficient DNA sequence compression with neural networks.

Authors:  Milton Silva; Diogo Pratas; Armando J Pinho
Journal:  Gigascience       Date:  2020-11-11       Impact factor: 6.524

5.  DNABIT Compress - Genome compression algorithm.

Authors:  Pothuraju Rajarajeswari; Allam Apparao
Journal:  Bioinformation       Date:  2011-01-22

6.  On the representability of complete genomes by multiple competing finite-context (Markov) models.

Authors:  Armando J Pinho; Paulo J S G Ferreira; António J R Neves; Carlos A C Bastos
Journal:  PLoS One       Date:  2011-06-30       Impact factor: 3.240

7.  GReEn: a tool for efficient compression of genome resequencing data.

Authors:  Armando J Pinho; Diogo Pratas; Sara P Garcia
Journal:  Nucleic Acids Res       Date:  2011-12-01       Impact factor: 16.971

8.  NGC: lossless and lossy compression of aligned high-throughput sequencing data.

Authors:  Niko Popitsch; Arndt von Haeseler
Journal:  Nucleic Acids Res       Date:  2012-10-12       Impact factor: 16.971

9.  Alignment-free phylogeny of whole genomes using underlying subwords.

Authors:  Matteo Comin; Davide Verzotto
Journal:  Algorithms Mol Biol       Date:  2012-12-06       Impact factor: 1.405

10.  Compression of FASTQ and SAM format sequencing data.

Authors:  James K Bonfield; Matthew V Mahoney
Journal:  PLoS One       Date:  2013-03-22       Impact factor: 3.240
