Joanna Moreton, Abril Izquierdo, Richard D Emes.
Abstract
De novo assembly of a complete transcriptome without the need for a guiding reference genome is attractive, particularly where the cost and complexity of generating a eukaryote genome is prohibitive. The transcriptome should not, however, be seen as just a quick and cheap alternative to building a complete genome. Transcriptomics allows the understanding and comparison of spatial and temporal samples within an organism, and allows surveying of multiple individuals or closely related species. De novo assembly in theory allows the building of a complete transcriptome without any prior knowledge of the genome. It also allows the discovery of alternate splice forms of coding RNAs and of non-coding RNAs, which are often missed by proteomic approaches or are incompletely annotated in genome studies. The limitation of the method is that the generation of a truly complete assembly is unlikely, so methods are required for assessing the quality and appropriateness of a generated transcriptome. Whilst no single consensus pipeline or tool is agreed as optimal, various algorithms and easy-to-use software do exist, making transcriptome generation a more common approach. With this expansion of data, questions remain about how to make these datasets fully discoverable, comparable, and most useful for understanding complex biological systems.
Keywords: annotation; assessment; availability; de novo transcriptome assembly; high-throughput sequencing
Year: 2016 PMID: 26793234 PMCID: PMC4707302 DOI: 10.3389/fgene.2015.00361
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. An overview of the two transcriptome assembly pipelines. The key parts of two transcriptome assembly pipelines are shown depending on whether a reference genome is available. This review is focused on de novo transcriptome assembly; more information on the pipeline for reference-based transcriptome assembly can be found in review papers such as Martin and Wang (2011).
Figure 2. An example of a simple de Bruijn graph. (A) Read sequences. (B) All subsequence k-mers of length 5 from the reads. (C) A de Bruijn graph constructed with the unique k-mers as nodes and overlapping k-mers connected by edges (a k-mer shifted by one base overlaps another k-mer by k-1 bases). (D) Transcripts assembled by traversing the two paths in the graph.
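The steps in the Figure 2 caption can be illustrated with a minimal Python sketch. It is not the method of any particular assembler: the three toy reads, the function names (`kmers`, `build_de_bruijn`, `assemble_paths`), and the assumption that the graph is acyclic are all hypothetical choices made for illustration. Real data would additionally need handling of sequencing errors, coverage, and cycles.

```python
from collections import defaultdict


def kmers(read, k):
    """Return every length-k substring of a read, left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def build_de_bruijn(reads, k):
    """Build a graph whose nodes are the unique k-mers.

    An edge u -> v exists when the last k-1 bases of u equal the
    first k-1 bases of v, i.e. v is u shifted by one base.
    """
    nodes = sorted({km for read in reads for km in kmers(read, k)})
    by_prefix = defaultdict(list)
    for km in nodes:
        by_prefix[km[:-1]].append(km)
    return {km: by_prefix.get(km[1:], []) for km in nodes}


def assemble_paths(graph):
    """Spell out one sequence per source-to-sink path.

    Assumes the graph is acyclic (an assumption of this toy example).
    """
    targets = {v for succs in graph.values() for v in succs}
    sources = [n for n in graph if n not in targets]
    transcripts = []

    def walk(node, seq):
        if not graph[node]:          # sink: no outgoing edge
            transcripts.append(seq)
            return
        for nxt in graph[node]:
            walk(nxt, seq + nxt[-1])  # each step adds one new base

    for source in sources:
        walk(source, source)
    return transcripts


# Hypothetical toy reads sampled from two transcripts that differ at one base.
reads = ["ATGGCGTG", "GCGTGCA", "GGCGTTCA"]
graph = build_de_bruijn(reads, k=5)
print(assemble_paths(graph))
```

With these toy reads the graph has nine nodes and branches once, after `GGCGT`, so the traversal yields two sequences, mirroring the two paths described in panel (D).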