
CRAFT: Compact genome Representation toward large-scale Alignment-Free daTabase.

Yang Young Lu, Jiaxing Bai, Yiwen Wang, Ying Wang, Fengzhu Sun.

Abstract

MOTIVATION: Rapid developments in sequencing technologies have greatly increased the volume of sequence data being generated. Sequence comparison is a primary step in archiving and analyzing these data. Alignment-free sequence comparison based on k-mer frequencies offers a computationally efficient solution, yet in practice the k-mer frequency vectors for the large values of k that are of practical interest consume excessive memory and storage.
RESULTS: We report CRAFT, a general genomic/metagenomic search engine that learns compact representations of sequences and performs fast comparison between DNA sequences. Specifically, given genome or high-throughput sequencing data as input, CRAFT maps the data into a much smaller embedding space and locates the best-matching genome in massive archived sequence repositories. With a 10^2- to 10^4-fold reduction in storage space, CRAFT queries gigabytes of data within seconds or minutes, achieving performance comparable to that of six state-of-the-art alignment-free measures.
AVAILABILITY AND IMPLEMENTATION: CRAFT offers a user-friendly graphical user interface with one-click installation on Windows and Linux operating systems, freely available at https://github.com/jiaxingbai/CRAFT.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
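
Illustrative note: the storage problem described under MOTIVATION can be seen with a little arithmetic. The sketch below is not CRAFT's embedding method and is not taken from the paper; it only counts k-mers, shows how a dense frequency vector over the DNA alphabet grows as 4^k, and compares two profiles with cosine similarity, which is used here purely as a stand-in for an alignment-free measure. The function names and the 4-bytes-per-entry figure are assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows with non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def dense_vector_bytes(k, bytes_per_entry=4):
    """Size of a dense k-mer frequency vector: one entry per possible k-mer.

    bytes_per_entry=4 is an assumed storage width (e.g. a 32-bit count).
    """
    return (4 ** k) * bytes_per_entry


def cosine_similarity(a, b):
    """Cosine similarity between two sparse k-mer count profiles."""
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    for k in (6, 9, 12, 14):
        print(f"k={k}: dense vector of {4 ** k} entries, "
              f"{dense_vector_bytes(k) / 2 ** 20:.2f} MiB per sequence")
    x = kmer_counts("ACGTACGTTTGACCA", 3)
    y = kmer_counts("ACGTACGATTGACCA", 3)
    print("cosine similarity:", cosine_similarity(x, y))
```

Under these assumptions, each step of k multiplies the dense vector size by four, which is why storing one such vector per archived genome becomes costly for large k and why a compact representation is attractive.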

Year:  2021        PMID: 32766810      PMCID: PMC9431648          DOI: 10.1093/bioinformatics/btaa699

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.931


