De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality Value-Based Algorithm.

Kristoffer Sahlin, Paul Medvedev.

Abstract

Long-read sequencing of transcripts with Pacific Biosciences (PacBio) Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, a clustering algorithm that is greedy (to scale) and makes use of quality values (to handle variable error rates). We test isONclust on three simulated and five biological data sets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches in overall accuracy, in scalability to large data sets, or in both.
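
The abstract names two design ideas: a greedy pass over the reads (for scalability) and the use of base quality values (to cope with variable error rates). The toy sketch below illustrates how those two ideas can combine in general; it is not the isONclust implementation. The function names, the plain k-mer comparison, the Phred+33 encoding, and the parameters `k`, `max_err`, and `min_shared` are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def phred_to_error_prob(qual: str) -> List[float]:
    """Convert a Phred+33 quality string to per-base error probabilities."""
    return [10 ** (-(ord(c) - 33) / 10) for c in qual]


def confident_kmers(seq: str, qual: str, k: int, max_err: float) -> Set[str]:
    """Keep k-mers whose probability of containing an error is at most max_err."""
    probs = phred_to_error_prob(qual)
    kmers = set()
    for i in range(len(seq) - k + 1):
        p_correct = 1.0
        for p in probs[i:i + k]:
            p_correct *= 1.0 - p
        if 1.0 - p_correct <= max_err:
            kmers.add(seq[i:i + k])
    return kmers


def greedy_cluster(
    reads: List[Tuple[str, str, str]],
    k: int = 5,
    max_err: float = 0.2,
    min_shared: float = 0.5,
) -> List[List[str]]:
    """Cluster (name, sequence, quality) reads in a single greedy pass."""
    scored = [(confident_kmers(s, q, k, max_err), name) for name, s, q in reads]
    # Visit reads with the most confident k-mers first, so that cluster
    # representatives tend to be high-quality reads (assumed heuristic).
    order = sorted(range(len(scored)), key=lambda i: -len(scored[i][0]))
    representatives: List[Set[str]] = []
    clusters: List[List[str]] = []
    for i in order:
        kmers, name = scored[i]
        best, best_frac = None, 0.0
        for c, rep in enumerate(representatives):
            frac = len(kmers & rep) / len(kmers) if kmers else 0.0
            if frac > best_frac:
                best, best_frac = c, frac
        if best is not None and best_frac >= min_shared:
            clusters[best].append(name)  # join the best-matching cluster
        else:
            representatives.append(kmers)  # the read founds a new cluster
            clusters.append([name])
    return clusters
```

Each read is compared only against one representative per cluster rather than against every other read, which is what keeps a greedy scheme of this kind cheap. Discarding low-confidence k-mers means a read with a poor-quality stretch can still join a cluster on the strength of its reliable bases.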

Keywords: algorithms; clustering; long-read sequencing; sequencing data analysis; third-generation sequencing; transcriptomics

Year: 2020 PMID: 32181688 PMCID: PMC8884114 DOI: 10.1089/cmb.2019.0299

Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479
