Playing hide and seek with repeats in local and global de novo transcriptome assembly of short RNA-seq reads.

Leandro Lima, Blerina Sinaimeri, Gustavo Sacomoto, Helene Lopez-Maestre, Camille Marchet, Vincent Miele, Marie-France Sagot, Vincent Lacroix.

Abstract

BACKGROUND: The main challenge in de novo genome assembly of DNA-seq data is certainly to deal with repeats that are longer than the reads. In de novo transcriptome assembly of RNA-seq reads, on the other hand, this problem has been underestimated so far. Even though we have fewer and shorter repeated sequences in transcriptomics, they do create ambiguities and confuse assemblers if not addressed properly. Most transcriptome assemblers of short reads are based on de Bruijn graphs (DBG) and have no clear and explicit model for repeats in RNA-seq data, relying instead on heuristics to deal with them.
RESULTS: The results of this work are threefold. First, we introduce a formal model for representing high copy-number and low-divergence repeats in RNA-seq data and exploit its properties to infer a combinatorial characteristic of repeat-associated subgraphs. We show that the problem of identifying such subgraphs in a DBG is NP-complete. Second, we show that in the specific case of local assembly of alternative splicing (AS) events, we can implicitly avoid such subgraphs, and we present an efficient algorithm to enumerate AS events that are not included in repeats. Using simulated data, we show that this strategy is significantly more sensitive and precise than the previous version of KisSplice (Sacomoto et al. in WABI, pp 99-111, 1), Trinity (Grabherr et al. in Nat Biotechnol 29(7):644-652, 2), and Oases (Schulz et al. in Bioinformatics 28(8):1086-1092, 3), for the specific task of calling AS events. Third, we turn our focus to full-length transcriptome assembly, and we show that exploring the topology of DBGs can improve de novo transcriptome evaluation methods. Based on the observation that repeats create complicated regions in a DBG, and that assemblers traversing these regions can infer erroneous transcripts, we propose a measure to flag transcripts traversing such troublesome regions, thereby giving a confidence level for each transcript. The originality of our work compared to other transcriptome evaluation methods is that we use only the topology of the DBG, and neither read nor coverage information. We show that our simple method gives better results than Rsem-Eval (Li et al. in Genome Biol 15(12):553, 4) and TransRate (Smith-Unna et al. in Genome Res 26(8):1134-1144, 5) on both real and simulated datasets for detecting chimeras, and is therefore able to capture assembly errors missed by these methods.
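
The topology-only idea behind the third result can be illustrated with a minimal sketch. This is not the measure proposed in the paper, which the abstract does not define. The function names `build_dbg` and `branching_fraction`, the node-centric graph, and the choice of "fraction of branching k-mers" as a score are all assumptions made for illustration. The sketch builds a DBG from reads and then scores a transcript using only the graph's in- and out-degrees, with no read mapping or coverage.

```python
from collections import defaultdict


def build_dbg(reads, k):
    """Build a node-centric de Bruijn graph (illustrative, not the paper's code).

    Nodes are k-mers; an edge joins two k-mers that appear consecutively
    in a read, i.e. that overlap by k-1 characters.
    """
    succ = defaultdict(set)  # k-mer -> set of successor k-mers
    pred = defaultdict(set)  # k-mer -> set of predecessor k-mers
    for read in reads:
        for i in range(len(read) - k):
            a, b = read[i:i + k], read[i + 1:i + k + 1]
            succ[a].add(b)
            pred[b].add(a)
    return succ, pred


def branching_fraction(transcript, k, succ, pred):
    """Hypothetical topology-only score for a transcript.

    Returns the fraction of the transcript's k-mers that are branching
    nodes (in-degree or out-degree greater than 1) in the graph.
    Higher values suggest the transcript crosses a more tangled region.
    """
    kmers = [transcript[i:i + k] for i in range(len(transcript) - k + 1)]
    if not kmers:
        return 0.0
    branching = sum(
        1
        for km in kmers
        if len(succ.get(km, ())) > 1 or len(pred.get(km, ())) > 1
    )
    return branching / len(kmers)


if __name__ == "__main__":
    # Two toy reads sharing the prefix ACG, which creates one branching node.
    succ, pred = build_dbg(["ACGTT", "ACGAA"], k=3)
    print(branching_fraction("ACGTT", 3, succ, pred))
    print(branching_fraction("CGTT", 3, succ, pred))
```

In this toy setting a transcript that passes through the shared k-mer gets a non-zero score, while one that avoids it scores zero. A real evaluation would need a threshold and a notion of region size, both of which are left open here.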

Keywords:  Alternative splicing; Assembly evaluation; De Bruijn graph topology; Enumeration algorithm; Formal model for representing repeats; RNA-seq; Repeats; Transcriptome assembly

Year:  2017        PMID: 28250805      PMCID: PMC5322684          DOI: 10.1186/s13015-017-0091-2

Source DB:  PubMed          Journal:  Algorithms Mol Biol        ISSN: 1748-7188            Impact factor:   1.405


  20 in total

1.  BLAT--the BLAST-like alignment tool.

Authors:  W James Kent
Journal:  Genome Res       Date:  2002-04       Impact factor: 9.043

2.  De novo assembly and analysis of RNA-seq data.

Authors:  Gordon Robertson; Jacqueline Schein; Readman Chiu; Richard Corbett; Matthew Field; Shaun D Jackman; Karen Mungall; Sam Lee; Hisanaga Mark Okada; Jenny Q Qian; Malachi Griffith; Anthony Raymond; Nina Thiessen; Timothee Cezard; Yaron S Butterfield; Richard Newsome; Simon K Chan; Rong She; Richard Varhol; Baljit Kamoh; Anna-Liisa Prabhu; Angela Tam; YongJun Zhao; Richard A Moore; Martin Hirst; Marco A Marra; Steven J M Jones; Pamela A Hoodless; Inanc Birol
Journal:  Nat Methods       Date:  2010-10-10       Impact factor: 28.547

3.  Families of transposable elements, population structure and the origin of species.

Authors:  Jerzy Jurka; Weidong Bao; Kenji K Kojima
Journal:  Biol Direct       Date:  2011-09-19       Impact factor: 4.540

4.  Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data.

Authors:  Petr Novák; Pavel Neumann; Jirí Macas
Journal:  BMC Bioinformatics       Date:  2010-07-15       Impact factor: 3.169

5.  Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels.

Authors:  Marcel H Schulz; Daniel R Zerbino; Martin Vingron; Ewan Birney
Journal:  Bioinformatics       Date:  2012-02-24       Impact factor: 6.937

6.  Evaluation of de novo transcriptome assemblies from RNA-Seq data.

Authors:  Bo Li; Nathanael Fillmore; Yongsheng Bai; Mike Collins; James A Thomson; Ron Stewart; Colin N Dewey
Journal:  Genome Biol       Date:  2014-12-21       Impact factor: 13.583

7.  TransRate: reference-free quality assessment of de novo transcriptome assemblies.

Authors:  Richard Smith-Unna; Chris Boursnell; Rob Patro; Julian M Hibberd; Steven Kelly
Journal:  Genome Res       Date:  2016-06-01       Impact factor: 9.043

8.  Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Authors:  Manfred G Grabherr; Brian J Haas; Moran Yassour; Joshua Z Levin; Dawn A Thompson; Ido Amit; Xian Adiconis; Lin Fan; Raktima Raychowdhury; Qiandong Zeng; Zehua Chen; Evan Mauceli; Nir Hacohen; Andreas Gnirke; Nicholas Rhind; Federica di Palma; Bruce W Birren; Chad Nusbaum; Kerstin Lindblad-Toh; Nir Friedman; Aviv Regev
Journal:  Nat Biotechnol       Date:  2011-05-15       Impact factor: 54.908

9.  IDBA-tran: a more robust de novo de Bruijn graph assembler for transcriptomes with uneven expression levels.

Authors:  Yu Peng; Henry C M Leung; Siu-Ming Yiu; Ming-Ju Lv; Xin-Guang Zhu; Francis Y L Chin
Journal:  Bioinformatics       Date:  2013-07-01       Impact factor: 6.937

10.  Transcriptome and genome sequencing uncovers functional variation in humans.

Authors:  Tuuli Lappalainen; Michael Sammeth; Marc R Friedländer; Peter A C 't Hoen; Jean Monlong; Manuel A Rivas; Mar Gonzàlez-Porta; Natalja Kurbatova; Thasso Griebel; Pedro G Ferreira; Matthias Barann; Thomas Wieland; Liliana Greger; Maarten van Iterson; Jonas Almlöf; Paolo Ribeca; Irina Pulyakhina; Daniela Esser; Thomas Giger; Andrew Tikhonov; Marc Sultan; Gabrielle Bertier; Daniel G MacArthur; Monkol Lek; Esther Lizano; Henk P J Buermans; Ismael Padioleau; Thomas Schwarzmayr; Olof Karlberg; Halit Ongen; Helena Kilpinen; Sergi Beltran; Marta Gut; Katja Kahlem; Vyacheslav Amstislavskiy; Oliver Stegle; Matti Pirinen; Stephen B Montgomery; Peter Donnelly; Mark I McCarthy; Paul Flicek; Tim M Strom; Hans Lehrach; Stefan Schreiber; Ralf Sudbrak; Angel Carracedo; Stylianos E Antonarakis; Robert Häsler; Ann-Christine Syvänen; Gert-Jan van Ommen; Alvis Brazma; Thomas Meitinger; Philip Rosenstiel; Roderic Guigó; Ivo G Gut; Xavier Estivill; Emmanouil T Dermitzakis
Journal:  Nature       Date:  2013-09-15       Impact factor: 49.962

  5 in total

1.  De novo transcriptome assembly: A comprehensive cross-species comparison of short-read RNA-Seq assemblers.

Authors:  Martin Hölzer; Manja Marz
Journal:  Gigascience       Date:  2019-05-01       Impact factor: 6.524

2.  Cloning of the first cDNA encoding a putative CCRFamide precursor: identification of the brain, eyestalk ganglia, and cardiac ganglion as sites of CCRFamide expression in the American lobster, Homarus americanus.

Authors:  J Joe Hull; Melissa A Stefanek; Patsy S Dickinson; Andrew E Christie
Journal:  Invert Neurosci       Date:  2020-11-26

3.  Investigation of the activity of transposable elements and genes involved in their silencing in the newt Cynops orientalis, a species with a giant genome.

Authors:  Federica Carducci; Elisa Carotti; Marco Gerdol; Samuele Greco; Adriana Canapa; Marco Barucca; Maria Assunta Biscotti
Journal:  Sci Rep       Date:  2021-07-20       Impact factor: 4.379

4.  Differential toxicity and venom gland gene expression in Centruroides vittatus.

Authors:  Thomas McElroy; C Neal McReynolds; Alyssa Gulledge; Kelci R Knight; Whitney E Smith; Eric A Albrecht
Journal:  PLoS One       Date:  2017-10-04       Impact factor: 3.240

5.  The Bellerophon pipeline, improving de novo transcriptomes and removing chimeras.

Authors:  Jesse Kerkvliet; Arthur de Fouchier; Michiel van Wijk; Astrid Tatjana Groot
Journal:  Ecol Evol       Date:  2019-08-17       Impact factor: 2.912
