Md Tofazzal Hossain1,2,3, Jingjing Zhang1,2, Md Selim Reza1,2, Yin Peng4, Shengzhong Feng1, Yanjie Wei1.
Abstract
Circular RNAs (circRNAs) are RNA molecules formed by joining a downstream 3′ splice donor site and an upstream 5′ splice acceptor site. Several recent studies have identified circRNAs as potential biomarkers for different diseases. A number of methods are available for identifying circRNAs, but these methods cannot provide full-length sequences. Reconstruction of the full-length sequences is crucial for downstream analyses in circRNA research, including differential expression analysis, circRNA-miRNA interaction analysis and other functional studies of circRNAs. However, only a limited number of methods are available in the literature for the reconstruction of full-length circRNA sequences. We developed a new method, circRNA-full, for full-length circRNA sequence reconstruction utilizing chimeric alignment information from the STAR aligner. To evaluate our method, we used full-length circRNA sequences produced by isocirc and ciri-long from long-read RNA-seq data. Our method achieved a better reconstruction rate, precision, sensitivity and F1 score than the existing full-length circRNA sequence reconstruction tool ciri-full for both human and mouse data.
Keywords: circular RNA; full-length sequence; reconstruction of circRNA sequence
Year: 2022 PMID: 35743218 PMCID: PMC9223815 DOI: 10.3390/ijms23126776
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Figure 1. Expression profile of circRNAs in human data. (A) Venn diagram of the circRNAs identified by CIRI and CIRCexplorer. (B) Distribution of the genomic origin of the circRNAs (%). (C) Distribution of the number of back-spliced reads spanning the circRNAs. (D) Chromosome distribution of the identified circRNAs.
Figure 2. Expression profile of circRNAs in mouse data. (A) Venn diagram of the circRNAs identified by CIRI and CIRCexplorer. (B) Distribution of the genomic origin of the circRNAs (%). (C) Distribution of the number of back-spliced reads spanning the circRNAs. (D) Chromosome distribution of the identified circRNAs.
Figure 3. Sequence reconstruction results for circRNA-full and ciri-full. (A) Venn diagram of the number of reconstructed circRNAs for the human data. (B) Venn diagram of the number of reconstructed circRNAs for the mouse data. (C) Length distribution of reconstructed circRNAs.
Number of reconstructed sequences produced by ciri-full and circRNA-full for different samples.
| Species | Sample | NSC | ciri-full NRS | ciri-full % of RS | circRNA-full NRS | circRNA-full % of RS |
|---|---|---|---|---|---|---|
| Human | SRR10612068 | 3756 | 3048 | 81.15 | 3411 | 90.81 |
| Human | SRR10612069 | 3254 | 2552 | 78.43 | 2959 | 90.93 |
| Human | SRR10612070 | 3290 | 2616 | 79.51 | 3002 | 91.25 |
| Mouse | CRR194214 | 9699 | 9209 | 94.95 | 9658 | 99.58 |
| Mouse | CRR194215 | 10,911 | 8649 | 79.27 | 10,864 | 99.57 |
Note: NSC = Number of sequences used for comparison, NRS = Number of reconstructed sequences, RS = Reconstructed sequences.
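The "% of RS" column is consistent with the ratio NRS / NSC expressed as a percentage. The short sketch below reproduces two entries of the table under that assumption; the function name `reconstruction_rate` is illustrative and is not taken from the circRNA-full software.

```python
def reconstruction_rate(nrs: int, nsc: int) -> float:
    """Return NRS as a percentage of NSC, rounded to two decimals.

    Assumption: "% of RS" in the table is NRS / NSC * 100.
    """
    if nsc <= 0:
        raise ValueError("nsc must be a positive count")
    return round(100.0 * nrs / nsc, 2)


# Counts taken from the SRR10612068 row of the table above.
ciri_full_rate = reconstruction_rate(3048, 3756)
circrna_full_rate = reconstruction_rate(3411, 3756)
print(ciri_full_rate, circrna_full_rate)
```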
Performance comparison of ciri-full and circRNA-full using different accuracy measures.
| Species | Sample | Method | NRS | TP | FN | Precision | Sensitivity | F1 Score |
|---|---|---|---|---|---|---|---|---|
| Human | SRR10612068 | ciri-full | 3048 | 1245 | 324 | 40.85% | 79.35% | 0.5393 |
| Human | SRR10612068 | circRNA-full | 3411 | 1942 | 99 | 56.93% | 95.15% | 0.7124 |
| Human | SRR10612069 | ciri-full | 2552 | 1034 | 351 | 40.52% | 74.66% | 0.5253 |
| Human | SRR10612069 | circRNA-full | 2959 | 1707 | 92 | 57.69% | 94.89% | 0.7175 |
| Human | SRR10612070 | ciri-full | 2616 | 1080 | 330 | 41.28% | 76.60% | 0.5365 |
| Human | SRR10612070 | circRNA-full | 3002 | 1769 | 94 | 58.93% | 94.95% | 0.7272 |
| Mouse | CRR194214 | ciri-full | 9209 | 1965 | 51 | 21.34% | 97.47% | 0.3501 |
| Mouse | CRR194214 | circRNA-full | 9658 | 2138 | 7 | 22.14% | 99.67% | 0.3623 |
| Mouse | CRR194215 | ciri-full | 8649 | 1798 | 420 | 20.79% | 81.06% | 0.3309 |
| Mouse | CRR194215 | circRNA-full | 10,864 | 2315 | 5 | 21.31% | 99.78% | 0.3512 |
Note: NRS = Number of reconstructed sequences, TP = True positives, FN = False negatives.
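The tabulated values are consistent with the standard definitions precision = TP / NRS, sensitivity = TP / (TP + FN), and F1 = 2 · precision · sensitivity / (precision + sensitivity). The sketch below recomputes one row under that assumption; `accuracy_measures` is a hypothetical helper name, not part of either tool.

```python
def accuracy_measures(nrs: int, tp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, sensitivity, F1) as fractions in [0, 1].

    Assumption: every reconstructed sequence that is not a true positive
    counts as a false positive, so precision = TP / NRS.
    """
    if nrs <= 0 or tp <= 0:
        raise ValueError("nrs and tp must be positive counts")
    precision = tp / nrs
    sensitivity = tp / (tp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, f1


# Counts taken from the circRNA-full row for sample SRR10612068.
precision, sensitivity, f1 = accuracy_measures(nrs=3411, tp=1942, fn=99)
print(f"{precision:.2%} {sensitivity:.2%} {f1:.4f}")
```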
Figure 4. Performance comparison of circRNA-full and ciri-full for different samples. (A) Reconstruction rate. (B) Precision. (C) Sensitivity. (D) F1 score.
Figure 5. Extracting the alignments of circRNA-spanning reads for each circRNA from the alignment BAM file.
Process of extracting exons and introns from the CIGAR value of chimeric alignments.
| CIGAR String | Corresponding Number | Start | End | Transcript Name | Intersecting Exon Start | Intersecting Exon End |
|---|---|---|---|---|---|---|
| M | 67 | 125314967 | 125315033 | NM_021964 | 125313307 | 125313656 |
| N | 8354 | 125315034 | 125323387 | | | |
| M | 57 | 125323388 | 125323444 | NM_021964 | 125323308 | 125323444 |
| N | 7713 | 125323445 | 125331157 | | | |
| M | 81 | 125331158 | 125331238 | NM_021964 | 125331157 | 125331238 |
Figure 6. Procedure for extracting exons from the CIGAR string of a chimeric read alignment.
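The Start and End columns of the table above follow from walking the CIGAR operations along the reference: each M (aligned, exonic) or N (skipped, intronic) block begins one base after the previous block ends. The following is a minimal sketch of that walk, not the circRNA-full implementation. The function name `cigar_blocks` is hypothetical, and the CIGAR string `67M8354N57M7713N81M` is reassembled here from the table rows for illustration.

```python
import re

# Operations that consume reference positions according to the SAM format.
REFERENCE_CONSUMING_OPS = "MDN=X"


def cigar_blocks(cigar: str, start: int) -> list[tuple[str, int, int, int]]:
    """Split a CIGAR string into (op, length, start, end) reference blocks.

    Coordinates are 1-based and inclusive, matching the table above.
    Operations that do not consume the reference (I, S, H, P) are skipped.
    """
    blocks = []
    pos = start
    for length_text, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length_text)
        if op in REFERENCE_CONSUMING_OPS:
            blocks.append((op, length, pos, pos + length - 1))
            pos += length
    return blocks


# Illustrative input rebuilt from the table; the start is the first M block's Start.
blocks = cigar_blocks("67M8354N57M7713N81M", 125314967)
exons = [b for b in blocks if b[0] == "M"]    # aligned segments
introns = [b for b in blocks if b[0] == "N"]  # skipped regions
for op, length, block_start, block_end in blocks:
    print(op, length, block_start, block_end)
```

The M blocks could then be intersected with annotated exon coordinates of the transcript, as in the last two columns of the table; that step is not shown here.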
Figure 7. Procedure for detecting skipped exons after extracting exons and introns from the CIGAR strings of the spanning chimeric reads.
Figure 8. Procedure for deleting skipped exons from all available exons extracted from the CIGAR strings of the spanning chimeric reads.