De novo clustering of long reads by gene from transcriptomics data.

Camille Marchet, Lolita Lecompte, Corinne Da Silva, Corinne Cruaud, Jean-Marc Aury, Jacques Nicolas, Pierre Peterlongo.

Abstract

Long-read sequencing currently provides sequences of several thousand base pairs. It is therefore possible to obtain complete transcripts, offering an unprecedented view of the cellular transcriptome. However, the literature lacks tools for de novo clustering of such data, in particular for Oxford Nanopore Technologies reads, because of their inherently high error rate compared to short reads. Our goal is to process reads from whole-transcriptome sequencing data accurately and without a reference genome, in order to reliably group reads coming from the same gene. This de novo approach is therefore particularly suitable for non-model species, but can also serve as a useful pre-processing step to improve read mapping. Our contribution is twofold: a new algorithm adapted to clustering reads by gene, and a practical, freely accessible tool that scales to the complete processing of eukaryotic transcriptomes. We sequenced a mouse RNA sample using the MinION device. This dataset is used to compare our solution to other algorithms used in the context of biological clustering. We demonstrate that it performs best among the compared approaches for transcriptomics long reads. When a reference is available to enable mapping, we show that it stands as an alternative method that predicts complementary clusters.

Year: 2019   PMID: 30260405   PMCID: PMC6326815   DOI: 10.1093/nar/gky834

Source DB: PubMed   Journal: Nucleic Acids Res   ISSN: 0305-1048   Impact factor: 16.971


