De Novo Clustering of Long-Read Transcriptome Data Using a Greedy, Quality Value-Based Algorithm.

Kristoffer Sahlin, Paul Medvedev

Abstract

Long-read sequencing of transcripts with Pacific Biosciences (PacBio) Iso-Seq and Oxford Nanopore Technologies has proven to be central to the study of complex isoform landscapes in many organisms. However, current de novo transcript reconstruction algorithms from long-read data are limited, leaving the potential of these technologies unfulfilled. A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin. To address this challenge, we develop isONclust, a clustering algorithm that is greedy (to scale) and makes use of quality values (to handle variable error rates). We test isONclust on three simulated and five biological data sets, across a breadth of organisms, technologies, and read depths. Our results demonstrate that isONclust is a substantial improvement over previous approaches in overall accuracy, in scalability to large data sets, or in both.
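
The abstract describes isONclust only at a high level ("greedy" and "makes use of quality values"). As a rough illustration of what a greedy, quality-value-aware clustering pass can look like, the Python sketch below first ranks reads by their expected number of error-free k-mers, computed from Phred quality values under the assumption that base-call errors are independent, and then visits them in that order: each read joins the existing representative with which it shares the largest fraction of its k-mers, provided that fraction reaches `min_shared`; otherwise it founds a new cluster. Everything here is a hypothetical simplification. The names `Read`, `error_prob`, `expected_error_free_kmers`, `kmer_set`, and `greedy_cluster`, the exact k-mer overlap test, and the threshold are illustrative assumptions, not isONclust's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Read:
    name: str
    seq: str
    qual: list[int]  # Phred quality value for each base of seq


def error_prob(q: int) -> float:
    """Convert a Phred quality value to a base-call error probability."""
    return 10 ** (-q / 10)


def expected_error_free_kmers(read: Read, k: int) -> float:
    """Expected number of k-mers with no base-call error, assuming
    (as a simplification) that errors at different positions are independent."""
    p_ok = [1.0 - error_prob(q) for q in read.qual]
    total = 0.0
    for i in range(len(p_ok) - k + 1):
        prob = 1.0
        for p in p_ok[i:i + k]:
            prob *= p
        total += prob
    return total


def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def greedy_cluster(reads: list[Read], k: int = 15,
                   min_shared: float = 0.5) -> dict[str, list[str]]:
    """Hypothetical greedy clustering: visit reads from most to least
    trustworthy; a read joins the representative it shares the largest
    fraction of its k-mers with if that fraction is at least min_shared,
    otherwise it becomes the representative of a new cluster."""
    order = sorted(reads, key=lambda r: expected_error_free_kmers(r, k),
                   reverse=True)
    reps: list[tuple[str, set[str]]] = []  # (representative name, its k-mers)
    clusters: dict[str, list[str]] = {}    # representative name -> members
    for read in order:
        kms = kmer_set(read.seq, k)
        best_name, best_frac = None, 0.0
        if kms:
            for rep_name, rep_kms in reps:
                frac = len(kms & rep_kms) / len(kms)
                if frac > best_frac:
                    best_name, best_frac = rep_name, frac
        if best_name is not None and best_frac >= min_shared:
            clusters[best_name].append(read.name)
        else:
            reps.append((read.name, kms))
            clusters[read.name] = [read.name]
    return clusters


if __name__ == "__main__":
    a = "ACGTTGCAAGGCTTAC"
    b = "ACGTTGCATGGCTTAC"  # one substitution relative to a
    c = "GGGGCCCCTTTTAAAA"  # unrelated sequence
    reads = [Read("b", b, [20] * 16), Read("a", a, [40] * 16),
             Read("c", c, [30] * 16)]
    print(greedy_cluster(reads, k=4))
    # {'a': ['a', 'b'], 'c': ['c']}
```

In the usage example, read `b` comes first in the input but has the lowest quality values, so the quality-based ordering lets the higher-quality read `a` become the cluster representative and `b` is then assigned to it. The single pass over the sorted reads is what keeps such a greedy scheme cheap, at the cost of never revisiting an assignment.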

Keywords:  algorithms; clustering; long-read sequencing; sequencing data analysis; third-generation sequencing; transcriptomics

Year: 2020; Date: 2020-03-16; PMID: 32181688; PMCID: PMC8884114; DOI: 10.1089/cmb.2019.0299

Source DB: PubMed; Journal: J Comput Biol; ISSN: 1066-5277; Impact factor: 1.479
