
Indexing labeled sequences.

Tatiana Rocher, Mathieu Giraud, Mikaël Salson.

Abstract

BACKGROUND: Labels add information to a text, for example functional annotations such as genes on a DNA sequence. V(D)J recombinations are DNA recombinations involving two or three short genes in lymphocytes. Sequencing this short region (500 bp or less) produces labeled sequences and gives insight into the lymphocyte repertoire for onco-hematology or immunology studies.
METHODS: We present two indexes for a text with non-overlapping labels. They store the text in a Burrows-Wheeler transform (BWT) and a compressed label sequence in a Wavelet Tree. The label sequence is stored either in the order of the text (TL-index) or in the order of the BWT (TLBW-index). Both indexes require space related to the entropy of the labeled text.
RESULTS: These indexes allow efficient text-label queries to count and find labeled patterns. The TLBW-index has an overhead on simple label queries but is very efficient on combined pattern-label queries. We implemented the indexes in C++ and compared them against a baseline solution on pseudo-random as well as on V(D)J labeled texts.
DISCUSSION: New indexes such as the ones we propose improve the way we index and query labeled texts such as, for instance, lymphocyte repertoires for hematological and immunological studies.
© 2018 Rocher et al.
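
The two label orderings described in METHODS can be illustrated with a minimal sketch. It is not the authors' C++ implementation: it assumes a naive suffix array, plain Python lists in place of the compressed Wavelet Tree, one label (or None) per text position, and that an occurrence is labeled by its first letter. All function names are hypothetical.

```python
from collections import Counter

def build_index(text, labels):
    """text ends with the sentinel '$'; labels[i] is the label of text[i] (or None)."""
    # Naive suffix array, fine for a toy example only.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    # Labels permuted so that labels_bwt[j] is the label of the letter bwt[j]
    # (the "order of the BWT"); `labels` itself is the "order of the text".
    labels_bwt = [labels[i - 1] for i in sa]
    return sa, bwt, labels_bwt

def backward_search(bwt, pattern):
    """Return the half-open suffix-array interval [lo, hi) of suffixes starting with pattern."""
    counts = Counter(bwt)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total          # number of letters strictly smaller than c
        total += counts[c]
    lo, hi = 0, len(bwt)
    for c in reversed(pattern):
        if c not in first:
            return 0, 0
        lo = first[c] + bwt[:lo].count(c)   # naive rank query
        hi = first[c] + bwt[:hi].count(c)
        if lo >= hi:
            return 0, 0
    return lo, hi

def count_labeled_text_order(sa, bwt, labels, pattern, label):
    """Labels in text order: locate each occurrence, then read its label."""
    lo, hi = backward_search(bwt, pattern)
    return sum(1 for j in range(lo, hi) if labels[sa[j]] == label)

def count_labeled_bwt_order(bwt, labels_bwt, pattern, label):
    """Labels in BWT order: stop one step early and filter inside the interval."""
    lo, hi = backward_search(bwt, pattern[1:])
    return sum(1 for j in range(lo, hi)
               if bwt[j] == pattern[0] and labels_bwt[j] == label)

# Toy labeled text: positions 0-3 carry label "V", positions 4-7 label "J".
text = "ACGTACGA$"
labels = ["V"] * 4 + ["J"] * 4 + [None]
sa, bwt, labels_bwt = build_index(text, labels)
```

With labels in text order, every occurrence has to be located through the suffix array before its label can be read. With labels in BWT order, the labels of all candidate occurrences sit in one contiguous range, which is the kind of access a range query on a Wavelet Tree could answer without locating each occurrence.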

Keywords:  Burrows–Wheeler transform; Data structures; Text indexing; V(D)J recombination; Wavelet Tree

Year:  2018        PMID: 33816803      PMCID: PMC7924554          DOI: 10.7717/peerj-cs.148

Source DB:  PubMed          Journal:  PeerJ Comput Sci        ISSN: 2376-5992


