Warning: Undefined array key "mm" in /www/wwwroot/www.ai-bt.com/si.php on line 10 Deprecated: trim(): Passing null to parameter #1 ($string) of type string is deprecated in /www/wwwroot/www.ai-bt.com/si.php on line 10 A simple guide to de novo transcriptome assembly and annotation.

Literature DB >> 35076693

A simple guide to de novo transcriptome assembly and annotation.

Venket Raghavan¹, Louis Kraft¹, Fantin Mesny², Linda Rigerte².

Abstract

A transcriptome constructed from short-read RNA sequencing (RNA-seq) is an easily attainable proxy catalog of protein-coding genes when genome assembly is unnecessary, expensive or difficult. In the absence of a sequenced genome to guide the reconstruction process, the transcriptome must be assembled de novo using only the information available in the RNA-seq reads. Subsequently, the sequences must be annotated in order to identify sequence-intrinsic and evolutionary features in them (for example, protein-coding regions). Although straightforward at first glance, de novo transcriptome assembly and annotation can quickly prove to be challenging undertakings. In addition to familiarizing themselves with the conceptual and technical intricacies of the tasks at hand and the numerous pre- and post-processing steps involved, those interested must also grapple with an overwhelmingly large choice of tools. The lack of standardized workflows, fast pace of development of new tools and techniques and paucity of authoritative literature have served to exacerbate the difficulty of the task even further. Here, we present a comprehensive overview of de novo transcriptome assembly and annotation. We discuss the procedures involved, including pre- and post-processing steps, and present a compendium of corresponding tools.

Entities: Chemical

Keywords: RNA-seq; annotation; assembly; de novo; tools; transcriptome

Mesh：

Year: 2022 PMID： 35076693 PMCID： PMC8921630 DOI： 10.1093/bib/bbab563

Source DB: PubMed Journal: Brief Bioinform ISSN： 1467-5463 Impact factor: 11.622

Keyword Cloud
References

233 in total

1. De novo assembly and analysis of RNA-seq data.

Authors: Gordon Robertson; Jacqueline Schein; Readman Chiu; Richard Corbett; Matthew Field; Shaun D Jackman; Karen Mungall; Sam Lee; Hisanaga Mark Okada; Jenny Q Qian; Malachi Griffith; Anthony Raymond; Nina Thiessen; Timothee Cezard; Yaron S Butterfield; Richard Newsome; Simon K Chan; Rong She; Richard Varhol; Baljit Kamoh; Anna-Liisa Prabhu; Angela Tam; YongJun Zhao; Richard A Moore; Martin Hirst; Marco A Marra; Steven J M Jones; Pamela A Hoodless; Inanc Birol
Journal: Nat Methods Date: 2010-10-10 Impact factor: 28.547

2. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.

Authors: Weizhong Li; Adam Godzik
Journal: Bioinformatics Date: 2006-05-26 Impact factor: 6.937

3. Differential expression in RNA-seq: a matter of depth.

Authors: Sonia Tarazona; Fernando García-Alcalde; Joaquín Dopazo; Alberto Ferrer; Ana Conesa
Journal: Genome Res Date: 2011-09-08 Impact factor: 9.043

Review 4. mRNAs, proteins and the emerging principles of gene expression control.

Authors: Christopher Buccitelli; Matthias Selbach
Journal: Nat Rev Genet Date: 2020-07-24 Impact factor: 53.242

5. rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data.

Authors: Elena Bushmanova; Dmitry Antipov; Alla Lapidus; Andrey D Prjibelski
Journal: Gigascience Date: 2019-09-01 Impact factor: 6.524

6. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches.

Authors: Baris E Suzek; Yuqi Wang; Hongzhan Huang; Peter B McGarvey; Cathy H Wu
Journal: Bioinformatics Date: 2014-11-13 Impact factor: 6.937

7. CD-HIT: accelerated for clustering the next-generation sequencing data.

Authors: Limin Fu; Beifang Niu; Zhengwei Zhu; Sitao Wu; Weizhong Li
Journal: Bioinformatics Date: 2012-10-11 Impact factor: 6.937

8. Pervasive transcription of the human genome produces thousands of previously unidentified long intergenic noncoding RNAs.

Authors: Matthew J Hangauer; Ian W Vaughn; Michael T McManus
Journal: PLoS Genet Date: 2013-06-20 Impact factor: 5.917

Review 9. Visual programming for next-generation sequencing data analytics.

Authors: Franco Milicchio; Rebecca Rose; Jiang Bian; Jae Min; Mattia Prosperi
Journal: BioData Min Date: 2016-04-27 Impact factor: 2.522

Review 10. Realizing the potential of full-length transcriptome sequencing.

Authors: Ashley Byrne; Charles Cole; Roger Volden; Christopher Vollmers
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2019-10-07 Impact factor: 6.237