
E-MEM: efficient computation of maximal exact matches for very large genomes.

Nilesh Khiste, Lucian Ilie.

Abstract

MOTIVATION: Alignment of similar whole genomes is often performed using anchors given by the maximal exact matches (MEMs) between their sequences. Despite a significant amount of research, computing MEMs for large genomes remains challenging. The leading current algorithms employ full-text indexes, with the sparse suffix array giving the best results. Still, their memory requirements are high, their parallelization is not very efficient, and they cannot handle very large genomes.
RESULTS: We present a new algorithm, efficient computation of MEMs (E-MEM), that does not use full-text indexes. Our algorithm uses much less space and is highly amenable to parallelization. It can compute all MEMs of minimum length 100 between the whole human and mouse genomes on a 12-core machine in 10 min using 2 GB of memory; the required memory can be as low as 600 MB. It can efficiently process genomes of any size. Extensive testing and comparison with the currently best algorithms are provided.
AVAILABILITY AND IMPLEMENTATION: The source code of E-MEM is freely available at: http://www.csd.uwo.ca/~ilie/E-MEM/
CONTACT: ilie@csd.uwo.ca
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
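
To make the object being computed concrete, the following is a minimal brute-force sketch of what a MEM is: an exact match between two sequences that cannot be extended to the left or to the right, filtered by a minimum length. This is not the E-MEM algorithm, which the abstract does not describe in detail; the function name `naive_mems` and its parameters are hypothetical, and the quadratic scan is only practical for toy inputs.

```python
def naive_mems(ref, query, min_len):
    """Return sorted (ref_pos, query_pos, length) triples for all MEMs.

    Illustrative brute force only (hypothetical helper, not E-MEM):
    it checks every pair of start positions.
    """
    mems = []
    for i in range(len(ref)):
        for j in range(len(query)):
            # Skip starts that could be extended to the left:
            # such a match is not left-maximal.
            if i > 0 and j > 0 and ref[i - 1] == query[j - 1]:
                continue
            # Extend to the right while characters agree; stopping at a
            # mismatch or a sequence end makes the match right-maximal.
            k = 0
            while (i + k < len(ref) and j + k < len(query)
                   and ref[i + k] == query[j + k]):
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return sorted(mems)


if __name__ == "__main__":
    # Toy sequences chosen for illustration only.
    print(naive_mems("ACGTACGT", "TACGTA", 3))  # [(0, 1, 5), (3, 0, 5)]
```

In the toy run, `ACGTA` (reference position 0, query position 1) and `TACGT` (reference position 3, query position 0) are both reported, since neither can be extended in either direction. Raising `min_len` to 6 leaves no matches.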

Year:  2014        PMID: 25399029     DOI: 10.1093/bioinformatics/btu687

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  7 in total

1.  Comparing fixed sampling with minimizer sampling when using k-mer indexes to find maximal exact matches.

Authors:  Meznah Almutairy; Eric Torng
Journal:  PLoS One       Date:  2018-02-01       Impact factor: 3.240

2.  HISEA: HIerarchical SEed Aligner for PacBio data.

Authors:  Nilesh Khiste; Lucian Ilie
Journal:  BMC Bioinformatics       Date:  2017-12-19       Impact factor: 3.169

3.  The effects of sampling on the efficiency and accuracy of k-mer indexes: Theoretical and empirical comparisons using the human genome.

Authors:  Meznah Almutairy; Eric Torng
Journal:  PLoS One       Date:  2017-07-07       Impact factor: 3.240

4.  CNEFinder: finding conserved non-coding elements in genomes.

Authors:  Lorraine A K Ayad; Solon P Pissis; Dimitris Polychronopoulos
Journal:  Bioinformatics       Date:  2018-09-01       Impact factor: 6.937

5.  LASER: Large genome ASsembly EvaluatoR.

Authors:  Nilesh Khiste; Lucian Ilie
Journal:  BMC Res Notes       Date:  2015-11-24

6.  Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine. (Review)

Authors:  Hao Ye; Joe Meehan; Weida Tong; Huixiao Hong
Journal:  Pharmaceutics       Date:  2015-11-23       Impact factor: 6.321

7.  Sequence-specific minimizers via polar sets.

Authors:  Hongyu Zheng; Carl Kingsford; Guillaume Marçais
Journal:  Bioinformatics       Date:  2021-07-12       Impact factor: 6.937
