Jin Zhao, Haodi Feng, Daming Zhu, Chi Zhang, Ying Xu.
Abstract
BACKGROUND: Alternative splicing allows the pre-mRNA of a gene to be spliced into multiple mRNAs, which greatly increases protein diversity. High-throughput sequencing of mRNAs has revolutionized our ability to reconstruct transcripts. However, the massive volume of short reads makes de novo transcript assembly an algorithmic challenge.
Keywords: Alternative splicing; De novo assembly; RNA-seq
Year: 2019 PMID: 31874618 PMCID: PMC6929406 DOI: 10.1186/s12859-019-3272-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Right extension suffix tree. An example of adding the suffixes of a reverse read to the right extension suffix tree
Fig. 2 Left extension suffix tree. An example of adding the suffixes of a read to the left extension suffix tree
Fig. 3 Simplified suffix tree. An example of constructing the simplified suffix trees. In this example, there are a total of four reads. The read length is 5 bp, and the minimum overlap length is 2 bp
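The role of the minimum overlap length in the Fig. 3 example can be illustrated with a minimal sketch. It is not the authors' data structure: it uses a plain, uncompressed suffix trie, and the four 5 bp reads, the class `TrieNode`, and the functions `build_suffix_trie` and `overlapping_reads` are all hypothetical names chosen for illustration. Only suffixes at least as long as the minimum overlap are inserted, since shorter ones could never support an accepted overlap.

```python
from typing import Dict, List, Set


class TrieNode:
    """One node of a plain (uncompressed) suffix trie; illustrative only."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # IDs of reads that have a suffix ending exactly at this node.
        self.read_ids: Set[int] = set()


def build_suffix_trie(reads: List[str], min_overlap: int) -> TrieNode:
    """Insert every suffix of every read whose length is >= min_overlap."""
    root = TrieNode()
    for read_id, read in enumerate(reads):
        # Valid suffix start positions are 0 .. len(read) - min_overlap.
        for start in range(len(read) - min_overlap + 1):
            node = root
            for base in read[start:]:
                node = node.children.setdefault(base, TrieNode())
            node.read_ids.add(read_id)
    return root


def overlapping_reads(root: TrieNode, query: str, min_overlap: int) -> Dict[int, int]:
    """Map read ID -> longest overlap in which a suffix of that read
    equals a prefix of `query` (overlaps shorter than min_overlap are ignored)."""
    best: Dict[int, int] = {}
    node = root
    for depth, base in enumerate(query, start=1):
        node = node.children.get(base)
        if node is None:
            break
        if depth >= min_overlap:
            for read_id in node.read_ids:
                best[read_id] = depth  # deeper matches overwrite shorter ones
    return best


# Hypothetical input mirroring the caption: four reads of 5 bp, minimum overlap 2 bp.
reads = ["ACGTA", "GTACC", "TACCG", "CCGAT"]
trie = build_suffix_trie(reads, min_overlap=2)
hits = overlapping_reads(trie, "GTACC", min_overlap=2)
```

Here `hits` maps read 0 to an overlap of 3 (its suffix `GTA` matches the query prefix) and read 1 to 5, which is the query matching itself; a real assembler would skip that self-match.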
Comparison of memory occupied by different strategies
| Strategy | 50 bp, 0.1 million reads | 75 bp, 0.1 million reads | 100 bp, 0.1 million reads | 50 bp, 0.5 million reads | 50 bp, 1 million reads |
|---|---|---|---|---|---|
| Suffix tree | 68.3M | 143.5M | 220.5M | 85.1M | 94.3M |
| Single- | 20.8M | 34.7M | 49.2M | 78.5M | 149.3M |
| Multiple- | 286.0M | 572.9M | 899.5M | 445.9M | 640.6M |
Fig. 4 Contig extension. An example of extending contigs by reads with the help of the right extension suffix tree and the left extension suffix tree
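To make the idea of extending a contig by reads concrete, the sketch below grows a contig to the right by repeatedly appending the read whose prefix overlaps the contig's end the most. Several things here are assumptions rather than the paper's procedure: the greedy longest-overlap rule, the naive scan over all reads (standing in for a suffix-tree lookup), the example reads, and the function names `best_right_extension` and `extend_right`. Left extension would be the mirror image.

```python
from typing import List, Optional, Tuple


def best_right_extension(
    contig: str, reads: List[str], min_overlap: int
) -> Optional[Tuple[int, int]]:
    """Return (read index, overlap length) for the read whose prefix has the
    longest match with the contig's suffix, or None if no overlap reaches
    min_overlap."""
    best: Optional[Tuple[int, int]] = None
    for i, read in enumerate(reads):
        # Cap the overlap below len(read) so the read adds at least one base.
        max_len = min(len(contig), len(read) - 1)
        for k in range(max_len, min_overlap - 1, -1):
            if contig.endswith(read[:k]):
                if best is None or k > best[1]:
                    best = (i, k)
                break  # longest overlap for this read found
    return best


def extend_right(seed: str, reads: List[str], min_overlap: int) -> str:
    """Greedily extend `seed` to the right, using each read at most once."""
    contig = seed
    unused = list(reads)
    while True:
        hit = best_right_extension(contig, unused, min_overlap)
        if hit is None:
            return contig
        idx, k = hit
        contig += unused[idx][k:]  # append only the non-overlapping tail
        del unused[idx]


# Hypothetical 5 bp reads; the seed read is excluded from the candidate list.
contig = extend_right("ACGTA", ["GTACC", "TACCG", "CCGAT"], min_overlap=2)
```

With a minimum overlap of 2 bp the three reads chain onto the seed one after another; raising the threshold to 4 bp leaves the seed unextended, because the best available first overlap is only 3 bp.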
Fig. 5 Impact of read length on the performance of the assemblers
Number of full-length transcripts recovered by the de novo assemblers
| Assembler | Dog dataset: Identified | Dog dataset: Reconstructed | Dog dataset: Candidates | Human dataset: Identified | Human dataset: Reconstructed | Human dataset: Candidates |
|---|---|---|---|---|---|---|
| Trinity | 1017 | 1663 | 96018 | 1913 | 3039 | 437730 |
| BinPacker | 1149 | 2601 | 73419 | 1491 | 3449 | 192674 |
| IDBA-Tran | 598 | 1011 | 69757 | 1376 | 2196 | 182651 |
| SOAPdenovo-Trans | 1005 | 1006 | 85028 | / | / | / |
| Oases | 530 | 957 | 113361 | 1762 | 3126 | 439865 |
| IsoTree | 1354 | 2974 | 81597 | 2015 | 3821 | 218269 |
| DTA-SiST-E | 1504 | 3916 | 103461 | 2175 | 4930 | 278255 |
| DTA-SiST-H | 1370 | 2514 | 71356 | 1950 | 3959 | 199642 |
Fig. 6 The running time and peak memory of each de novo assembler on the real dataset