De novo clustering of long reads by gene from transcriptomics data.

Camille Marchet¹, Lolita Lecompte¹, Corinne Da Silva², Corinne Cruaud², Jean-Marc Aury², Jacques Nicolas¹, Pierre Peterlongo¹.

Abstract

Long-read sequencing currently provides sequences of several thousand base pairs. It is therefore possible to obtain complete transcripts, offering an unprecedented view of the cellular transcriptome. However, the literature lacks tools for de novo clustering of such data, in particular for Oxford Nanopore Technologies reads, because of their inherently high error rate compared to short reads. Our goal is to process reads from whole-transcriptome sequencing data accurately and without a reference genome, in order to reliably group reads coming from the same gene. This de novo approach is therefore particularly suitable for non-model species, but can also serve as a useful pre-processing step to improve read mapping. Our contribution is twofold: a new algorithm adapted to clustering reads by gene, and a practical, freely accessible tool that scales to the complete processing of eukaryotic transcriptomes. We sequenced a mouse RNA sample using the MinION device. This dataset is used to compare our solution to other algorithms used in the context of biological clustering. We demonstrate that it is the best approach for transcriptomics long reads. When a reference is available to enable mapping, we show that it stands as an alternative method that predicts complementary clusters.
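The abstract does not describe the algorithm itself. As a rough illustration of what reference-free clustering of reads can look like, the following minimal sketch links reads that share enough k-mers and reports the connected components as clusters. This is a deliberately simple stand-in technique, not the method published in the paper; the function names `kmers` and `cluster_reads` and the parameters `k` and `min_shared` are hypothetical choices made for this example, and real long reads would likely need far more error-tolerant similarity detection.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_reads(reads, k=5, min_shared=3):
    """Group reads that share at least min_shared k-mers, transitively.

    Toy illustration only: a shared k-mer threshold plus connected
    components (union-find), NOT the algorithm from the paper.
    """
    parent = list(range(len(reads)))

    def find(i):
        # Follow parent links to the cluster representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Index: k-mer -> ids of the reads that contain it.
    index = defaultdict(set)
    for rid, seq in enumerate(reads):
        for km in kmers(seq, k):
            index[km].add(rid)

    # Count shared k-mers for every pair of reads that co-occur in the index.
    shared = defaultdict(int)
    for rids in index.values():
        for a, b in combinations(sorted(rids), 2):
            shared[(a, b)] += 1

    # Merge the pairs that pass the similarity threshold.
    for (a, b), n in shared.items():
        if n >= min_shared:
            parent[find(a)] = find(b)

    # Collect read ids by representative; each cluster lists ids in ascending order.
    clusters = defaultdict(list)
    for rid in range(len(reads)):
        clusters[find(rid)].append(rid)
    return sorted(clusters.values())
```

Calling `cluster_reads` on a list of read sequences returns a list of clusters, each a list of read indices. The pairwise counting step is quadratic in the number of reads sharing a k-mer, so this sketch is only meant for small inputs.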

Substances: RNA

Year: 2019 PMID: 30260405 PMCID: PMC6326815 DOI: 10.1093/nar/gky834

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
