Xiaoyu Zhao¹˒², Ting Yu¹
Abstract
Full-length transcript reconstruction plays a pivotal role in RNA-seq data analysis. In this research, we present a new genome-guided transcriptome assembly algorithm, Tiglon, which integrates the alignments of multiple mapping tools, builds labeled splice graphs, and then applies a label-based dynamic path-searching strategy to reconstruct transcripts. We evaluate Tiglon on a simulated dataset and 12 real datasets under the Hisat2 and Star mappings. The results indicate that Tiglon's integration strategy outperforms state-of-the-art assemblers, including StringTie2 and Scallop, whether those assemblers use Hisat2 alignments, Star alignments, or the merged alignments of both. In particular, Tiglon is especially effective at recovering lowly expressed transcripts.
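The pipeline outlined in the abstract (pool junction evidence from several aligners, build a labeled splice graph, then search it for paths) can be illustrated with a small sketch. This is not Tiglon's implementation. It assumes exons are already known as (start, end) coordinates with unique boundaries, treats a junction as a (donor, acceptor) pair joining the exon that ends at the donor to the exon that starts at the acceptor, and replaces the paper's label-based dynamic path search with plain depth-first enumeration. The names `build_labeled_splice_graph` and `enumerate_paths` are hypothetical.

```python
from collections import defaultdict


def build_labeled_splice_graph(exons, junctions_by_aligner):
    """Link exons via junctions, labeling each edge with the aligners that report it.

    exons: iterable of (start, end) tuples (toy assumption: boundaries are unique).
    junctions_by_aligner: dict mapping an aligner name to a set of (donor, acceptor) pairs.
    Returns a nested dict: exon -> successor exon -> set of aligner names.
    """
    exons = list(exons)
    by_end = {exon[1]: exon for exon in exons}
    by_start = {exon[0]: exon for exon in exons}
    graph = defaultdict(lambda: defaultdict(set))
    for aligner, junctions in junctions_by_aligner.items():
        for donor, acceptor in junctions:
            # Skip junctions whose ends do not coincide with known exon boundaries.
            if donor in by_end and acceptor in by_start:
                graph[by_end[donor]][by_start[acceptor]].add(aligner)
    return graph


def enumerate_paths(graph, source, sink):
    """List every source-to-sink path by depth-first search (assumes an acyclic graph)."""
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == sink:
            paths.append(path)
            continue
        for successor in graph.get(node, {}):
            stack.append((successor, path + [successor]))
    return paths


# Hypothetical exons and junctions for a three-exon gene.
exons = [(100, 200), (300, 400), (500, 600)]
junctions = {
    "hisat2": {(200, 300), (400, 500)},  # supports only the exon-inclusion path
    "star": {(200, 300), (200, 500)},    # supports the exon-skipping junction, not 400->500
}
graph = build_labeled_splice_graph(exons, junctions)
for path in sorted(enumerate_paths(graph, (100, 200), (500, 600))):
    print(path)
# [(100, 200), (300, 400), (500, 600)]
# [(100, 200), (500, 600)]
```

Each edge keeps the set of aligners that support it. In this toy example a Hisat2-only graph yields just the inclusion path and a Star-only graph just the skipping path; only the combined graph contains both.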
Keywords: Bioinformatics; Biological sciences; Biological sciences research methodologies; Experimental models in systems biology; Systems biology
Year: 2022 PMID: 35355524 PMCID: PMC8958329 DOI: 10.1016/j.isci.2022.104067
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Figure 1. An IGV snapshot showing that the reference transcript “XR_929880.3” of the human genome GRCh38 is covered by reads from RNA-seq sample H1 (SRA: SRR307911).
All exons of this transcript are captured by both the Hisat2 and Star mappings; however, its first junction is missed by the Star mapping and its fifth junction is missed by the Hisat2 mapping. Relying on a single aligner, neither StringTie2 nor Scallop recovers this transcript, whereas Tiglon recovers it by integrating both alignments.
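The situation in Figure 1 reduces to a set question: a transcript's intron chain is fully supported only if every one of its junctions appears somewhere in the available alignment evidence. The sketch below uses a hypothetical five-junction chain with invented coordinates (not those of the transcript in Figure 1) in which Hisat2 misses the fifth junction and Star misses the first; the helper `missing_junctions` is illustrative only.

```python
def missing_junctions(intron_chain, *junction_sets):
    """Return the junctions in `intron_chain` that none of the given junction sets contains."""
    supported = set().union(*junction_sets)
    return [junction for junction in intron_chain if junction not in supported]


# Hypothetical five-junction intron chain as (donor, acceptor) pairs.
chain = [(200, 300), (400, 500), (600, 700), (800, 900), (1000, 1100)]
hisat2 = set(chain) - {chain[4]}  # Hisat2 misses the fifth junction
star = set(chain) - {chain[0]}    # Star misses the first junction

print(missing_junctions(chain, hisat2))        # [(1000, 1100)]
print(missing_junctions(chain, star))          # [(200, 300)]
print(missing_junctions(chain, hisat2, star))  # []
```

Either aligner alone leaves one gap in the chain, while the union of both covers it completely.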
Figure 2. Performance evaluation on the simulated dataset
(A) Precision and the number of correctly assembled transcripts of the assemblers on the simulated dataset.
(B) F-score of the assemblers on the simulated dataset.
(C) Comparison of detected transcripts at low, middle, and high expression levels on the simulated dataset. ST stands for StringTie2, SC for Scallop, and MA for MergedAlignments.
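Precision and F-score are the headline metrics in these panels. The captions do not spell out the exact definitions, so the sketch assumes the usual transcript-level ones: precision is correctly assembled transcripts divided by all assembled transcripts, recall (sensitivity) is correctly assembled transcripts divided by reference transcripts, and the F-score is their harmonic mean. How a transcript is judged correct (for example, by an exact intron-chain match) is also not stated here, and the counts in the example are invented.

```python
def precision_recall_f(n_correct, n_assembled, n_reference):
    """Return (precision, recall, F-score) under the usual transcript-level definitions."""
    precision = n_correct / n_assembled if n_assembled else 0.0
    recall = n_correct / n_reference if n_reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented counts: 8,000 correct out of 10,000 assembled, against 16,000 reference transcripts.
p, r, f = precision_recall_f(8000, 10000, 16000)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")  # precision=0.800 recall=0.500 F=0.615
```

Because the F-score is a harmonic mean, an assembler cannot raise it by reporting many extra transcripts (which lowers precision) or by reporting very few (which lowers recall).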
Figure 3. Performance evaluation on the eight Homo sapiens samples H1–H8
(A) Precision and the number of correctly assembled transcripts of the assemblers on the eight samples.
(B) Average F-score of the assemblers on the eight samples. Error bars show the SD (likewise in the other panels).
(C) Average number of correctly assembled transcripts at different expression levels for each assembler on the eight samples. ST stands for StringTie2, SC for Scallop, and MA for MergedAlignments.
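The (B) panels report an average F-score across samples with SD error bars. A minimal sketch of that summary follows, assuming the sample SD (`statistics.stdev`); the captions do not say whether the sample or population SD was used, and the method labels and scores below are placeholders, not the paper's results.

```python
import statistics


def summarize_f_scores(f_scores):
    """Map each method to (mean, sample SD) of its per-sample F-scores; needs >= 2 samples."""
    return {
        method: (statistics.mean(scores), statistics.stdev(scores))
        for method, scores in f_scores.items()
    }


# Placeholder per-sample F-scores (not the paper's values) for two hypothetical method labels.
f_scores = {
    "Tiglon": [0.41, 0.45, 0.39, 0.44],
    "ST-MA": [0.36, 0.40, 0.35, 0.38],
}
for method, (mean, sd) in summarize_f_scores(f_scores).items():
    print(f"{method}: {mean:.3f} ± {sd:.3f}")
```

Swapping `statistics.stdev` for `statistics.pstdev` gives the population SD instead.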
Figure 4. Performance evaluation on the four Mus musculus samples M1–M4
(A) Precision and the number of correctly assembled transcripts of the assemblers on the four samples.
(B) Average F-score of the assemblers on the four samples. Error bars show the SD (likewise in the other panels).
(C) Average number of correctly assembled transcripts at different expression levels for each assembler on the four samples. ST stands for StringTie2, SC for Scallop, and MA for MergedAlignments.
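The (C) panels split correctly assembled transcripts into low, middle, and high expression groups. The cutoffs are not given in these captions, so `low_cutoff` and `high_cutoff` below are hypothetical parameters (for instance, abundances estimated by a tool such as RSEM, which the resources table lists); the sketch only shows the bookkeeping, and the transcript IDs and values are invented.

```python
def count_by_expression(correct_ids, expression, low_cutoff, high_cutoff):
    """Count correctly assembled transcripts in low / middle / high expression bins.

    expression maps transcript ID -> abundance; the cutoffs are hypothetical parameters.
    """
    counts = {"low": 0, "middle": 0, "high": 0}
    for transcript_id in correct_ids:
        value = expression[transcript_id]
        if value < low_cutoff:
            counts["low"] += 1
        elif value < high_cutoff:
            counts["middle"] += 1
        else:
            counts["high"] += 1
    return counts


# Invented abundances and cutoffs, for illustration only.
expression = {"t1": 0.5, "t2": 3.0, "t3": 50.0, "t4": 0.1}
print(count_by_expression(["t1", "t2", "t3", "t4"], expression, low_cutoff=1.0, high_cutoff=10.0))
# {'low': 2, 'middle': 1, 'high': 1}
```

Averaging these per-sample counts across samples would give the bar heights the captions describe.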
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Fastq files for RNA-seq of H1 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR307911 |
| Fastq files for RNA-seq of H2 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR387662 |
| Fastq files for RNA-seq of H3 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR10517380 |
| Fastq files for RNA-seq of H4 | Sequence Read Archive (SRA) in NCBI | SRA accession: ERR2403203 |
| Fastq files for RNA-seq of H5 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR307903 |
| Fastq files for RNA-seq of H6 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR315323 |
| Fastq files for RNA-seq of H7 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR315334 |
| Fastq files for RNA-seq of H8 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR7536920 |
| Fastq files for RNA-seq of M1 | Sequence Read Archive (SRA) in NCBI | SRA accession: DRR205674 |
| Fastq files for RNA-seq of M2 | Sequence Read Archive (SRA) in NCBI | SRA accession: DRR205677 |
| Fastq files for RNA-seq of M3 | Sequence Read Archive (SRA) in NCBI | SRA accession: ERR3320855 |
| Fastq files for RNA-seq of M4 | Sequence Read Archive (SRA) in NCBI | SRA accession: ERR3320871 |
| Fastq files for RNA-seq of S1 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR545723 |
| Fastq files for RNA-seq of S2 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR534291 |
| Fastq files for RNA-seq of S3 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR8767255 |
| Fastq files for RNA-seq of S4 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR307905 |
| Fastq files for RNA-seq of S5 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR8759122 |
| Fastq files for RNA-seq of S6 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR315326 |
| Fastq files for RNA-seq of S7 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR315330 |
| Fastq files for RNA-seq of S8 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR8867129 |
| Fastq files for RNA-seq of S9 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR8867125 |
| Fastq files for RNA-seq of S10 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR8767256 |
| Fastq files for RNA-seq of S11 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR7478767 |
| Fastq files for RNA-seq of S12 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR7536918 |
| Fastq files for RNA-seq of S13 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR10517375 |
| Fastq files for RNA-seq of S14 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR10517379 |
| Fastq files for RNA-seq of S15 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR10517374 |
| Fastq files for RNA-seq of S16 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR8315697 |
| Fastq files for RNA-seq of S17 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR8315695 |
| Fastq files for RNA-seq of S18 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR7047912 |
| Fastq files for RNA-seq of S19 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR8867128 |
| Fastq files for RNA-seq of S20 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR8588656 |
| Fastq files for RNA-seq of S21 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR10611961 |
| Fastq files for RNA-seq of S22 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR10039475 |
| Fastq files for RNA-seq of S23 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR6013560 |
| Fastq files for RNA-seq of S24 | Sequence Read Archive (SRA) in NCBI | SRA accession: ERR3639847 |
| Fastq files for RNA-seq of S25 | Sequence Read Archive (SRA) in NCBI | SRA accession: ERR3639846 |
| Fastq files for RNA-seq of S26 | Sequence Read Archive (SRA) in NCBI | SRA accession: ERR3639851 |
| Fastq files for RNA-seq of S27 | Sequence Read Archive (SRA) in NCBI | SRA accession: ERR3639849 |
| Fastq files for RNA-seq of S28 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR8759124 |
| Fastq files for RNA-seq of S29 | Sequence Read Archive (SRA) in NCBI | SRA accession: ERR3502071 |
| Fastq files for RNA-seq of S30 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR11114714 |
| Fastq files for RNA-seq of S31 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR11171673 |
| Fastq files for RNA-seq of S32 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR11171674 |
| Fastq files for RNA-seq of S33 | Sequence Read Archive (SRA) in NCBI | SRA accession: DRR205676 |
| Fastq files for RNA-seq of S34 | Sequence Read Archive (SRA) in NCBI | SRA accession: DRR205678 |
| Fastq files for RNA-seq of S35 | Sequence Read Archive (SRA) in NCBI | SRA accession: ERR3320877 |
| Fastq files for RNA-seq of S36 | Sequence Read Archive (SRA) in NCBI | SRA accession: ERR3320873 |
| Fastq files for RNA-seq of S37 | Sequence Read Archive (SRA) in NCBI | SRA accession: SRR203276 |
| Fastq files for RNA-seq of S38 | Sequence Read Archive (SRA) in NCBI | SRA accession: ERR3320869 |
| Human reference genome, GRCh38/hg38 | Genome Reference Consortium | |
| Mouse reference genome, GRCm38/mm10 | Genome Reference Consortium | |
| Human reference transcriptome, hg38.ncbiRefSeq.gtf | Genome Reference Consortium | |
| Mouse reference transcriptome, mm10.ncbiRefSeq.gtf | Genome Reference Consortium | |
| Tiglon | This paper | |
| StringTie2 | | |
| Scallop | | |
| iPAC | | |
| Trans-Borrow | | |
| RSEM | | |
| Hisat2 | | |
| Star | | |
| Samtools | | |
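The table lists every run under NCBI SRA, but each accession prefix also records which INSDC archive first issued the run: SRR for NCBI SRA, ERR for EMBL-EBI's European Nucleotide Archive (ENA), and DRR for the DDBJ Sequence Read Archive (DRA). Because INSDC members mirror one another, all of these runs can be retrieved through NCBI. The sketch below tallies the 12 real evaluation samples (H1 to H8 and M1 to M4) by originating archive; `originating_archive` is a hypothetical helper.

```python
from collections import Counter

# INSDC run-accession prefixes and the archive that issues them.
ARCHIVE_BY_PREFIX = {"SRR": "NCBI SRA", "ERR": "EMBL-EBI ENA", "DRR": "DDBJ DRA"}

# Run accessions of the 12 real evaluation samples, copied from the table above.
SAMPLES = {
    "H1": "SRR307911", "H2": "SRR387662", "H3": "SRR10517380", "H4": "ERR2403203",
    "H5": "SRR307903", "H6": "SRR315323", "H7": "SRR315334", "H8": "SRR7536920",
    "M1": "DRR205674", "M2": "DRR205677", "M3": "ERR3320855", "M4": "ERR3320871",
}


def originating_archive(accession: str) -> str:
    """Return the archive that issued a run accession, based on its three-letter prefix."""
    try:
        return ARCHIVE_BY_PREFIX[accession[:3].upper()]
    except KeyError:
        raise ValueError(f"unrecognized run accession: {accession!r}") from None


# Tally the evaluation samples by the archive that originally received them.
print(Counter(originating_archive(acc) for acc in SAMPLES.values()))
```

Seven of the twelve runs originate in NCBI SRA, three in ENA, and two in DDBJ.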