Abstract
BACKGROUND: Next generation transcriptome sequencing (RNA-Seq) is emerging as a powerful experimental tool for the study of alternative splicing and its regulation, but requires ad-hoc analysis methods and tools. PASTA (Patterned Alignments for Splicing and Transcriptome Analysis) is a splice junction detection algorithm specifically designed for RNA-Seq data, relying on a highly accurate alignment strategy and on a combination of heuristic and statistical methods to identify exon-intron junctions with high accuracy.
Year: 2013 PMID: 23557086 PMCID: PMC3623791 DOI: 10.1186/1471-2105-14-116
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Junction Accuracy of TopHat and PASTA. The blue bars represent TopHat predictions and the red bars represent PASTA predictions. Junction FP rates are shown in the top panel and junction FN rates are shown in the bottom panel. A total of 4 different sequencing depths were simulated.
Figure 2. Junction Accuracy of PASTA and other software. The two panels display the results of the comparison of PASTA with other junction detection pipelines on simulated datasets. In the first simulation, the frequencies of indels, substitutions and sequencing errors were 0.05%, 0.1% and 0.5% respectively, and 80% of the splice signals were from annotated splice isoforms. In the second simulation, the frequencies of indels, substitutions and sequencing errors were 0.25%, 0.5% and 1% respectively, with 65% of the splice signals coming from annotated splice forms. In addition, 25% of the trailing 10 bases are subject to a 50% sequencing error rate. Reproduced with permission from [4].
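The junction FP and FN rates reported in the two figures can be illustrated with a minimal Python sketch. The definitions used here are an assumption, since the captions do not give the exact formulas: the FP rate is taken as the fraction of predicted junctions absent from the simulated truth, and the FN rate as the fraction of true junctions that were not predicted. The function name and the example coordinates are hypothetical.

```python
def junction_error_rates(predicted, truth):
    """Return (fp_rate, fn_rate) for two collections of junctions.

    Each junction is a (chromosome, donor_position, acceptor_position) tuple.
    Assumed definitions (not taken from the paper):
      fp_rate = predicted junctions absent from truth / all predicted junctions
      fn_rate = true junctions missing from predictions / all true junctions
    """
    predicted, truth = set(predicted), set(truth)
    false_positives = len(predicted - truth)
    false_negatives = len(truth - predicted)
    fp_rate = false_positives / len(predicted) if predicted else 0.0
    fn_rate = false_negatives / len(truth) if truth else 0.0
    return fp_rate, fn_rate


# Hypothetical toy data: four simulated junctions, three predictions.
truth = {("chr1", 100, 200), ("chr1", 300, 400),
         ("chr2", 50, 150), ("chr2", 500, 900)}
predicted = {("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 160)}

fp_rate, fn_rate = junction_error_rates(predicted, truth)
print(f"FP rate: {fp_rate:.3f}, FN rate: {fn_rate:.3f}")
```

Junctions are compared by exact coordinates here; a pipeline that tolerates small offsets at the splice site would need a looser matching rule.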
Number of reads and junctions detected
| Run | Sample | Lane | Reads (millions) | PASTA junctions | TopHat junctions | Ratio |
|---|---|---|---|---|---|---|
| 1 | Control | Lane 1 | 19.2 | 165541 | 80211 | 2.064 |
| | | Lane 3 | 15.4 | 149797 | 72908 | 1.828 |
| | | Total | 34.6 | 195731 | 112581 | 1.739 |
| | Mutant | Lane 1 | 21.8 | 169493 | 72908 | 2.325 |
| | | Lane 2 | 17.9 | 157481 | 82036 | 1.920 |
| | | Lane 3 | 22.3 | 162408 | 81823 | 1.985 |
| | | Lane 4 | 39.2 | 202157 | 59014 | 3.426 |
| | | Total | 101.2 | 287568 | 152196 | 1.889 |
| 2 | Control | Lane 1 | 29.9 | 166050 | 140831 | 1.179 |
| | | Lane 2 | 8.74 | 141885 | 107399 | 1.321 |
| | | Lane 3 | 10.2 | 144879 | 110459 | 1.312 |
| | | Total | 48.84 | 210016 | 157949 | 1.330 |
| | Mutant | Lane 1 | 27.6 | 148238 | 113908 | 1.301 |
| | | Lane 2 | 10.6 | 160885 | 124606 | 1.291 |
| | | Lane 3 | 25.4 | 175240 | 133601 | 1.312 |
| | | Lane 4 | 25.6 | 177388 | 133539 | 1.328 |
| | | Total | 89.2 | 250991 | 167664 | 1.497 |
This table displays the total number of reads, the total number of junctions identified by both programs, and the ratio between these two numbers for Run 1 and Run 2 respectively.
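The ratio column can be recomputed from the two junction-count columns. The short Python sketch below does this for three rows copied from the table; the variable and function names are hypothetical, and rounding to three decimals is an assumption based on how the values are printed.

```python
# (run, sample, lane) -> (first junction count, second junction count),
# copied from three rows of the table above.
rows = {
    ("1", "Control", "Lane 1"): (165541, 80211),
    ("1", "Mutant", "Lane 4"): (202157, 59014),
    ("2", "Control", "Lane 1"): (166050, 140831),
}


def junction_ratio(first, second):
    """Ratio of the first junction count to the second, to three decimals."""
    return round(first / second, 3)


ratios = {key: junction_ratio(*counts) for key, counts in rows.items()}
for key, value in ratios.items():
    print(key, value)
```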
Number of junctions from ENSEMBL known genes
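| Run | Sample | Lane | PASTA | TopHat | Shared | Shared/PASTA | Shared/TopHat |
|---|---|---|---|---|---|---|---|
| 1 | Control | Lane 1 | 128811 | 65117 | 63063 | 0.490 | 0.968 |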
| | | Lane 3 | 120465 | 67552 | 65252 | 0.542 | 0.966 |
| | | Total | 140083 | 86148 | 83674 | 0.597 | 0.971 |
| | Mutant | Lane 1 | 129099 | 57615 | 55770 | 0.432 | 0.968 |
| | | Lane 2 | 122237 | 67038 | 64517 | 0.528 | 0.962 |
| | | Lane 3 | 123860 | 65568 | 63078 | 0.509 | 0.962 |
| | | Lane 4 | 142097 | 41084 | 39695 | 0.279 | 0.966 |
| | | Total | 163462 | 98757 | 95854 | 0.586 | 0.971 |
| 2 | Control | Lane 1 | 130899 | 115751 | 111098 | 0.849 | 0.960 |
| | | Lane 2 | 119397 | 94638 | 91672 | 0.768 | 0.969 |
| | | Lane 3 | 119950 | 96743 | 93614 | 0.780 | 0.968 |
| | | Total | 146117 | 123247 | 118544 | 0.811 | 0.962 |
| | Mutant | Lane 1 | 122889 | 99377 | 96287 | 0.784 | 0.969 |
| | | Lane 2 | 127854 | 106340 | 102840 | 0.804 | 0.967 |
| | | Lane 3 | 132544 | 111252 | 107418 | 0.810 | 0.966 |
| | | Lane 4 | 134049 | 111334 | 107571 | 0.802 | 0.966 |
| | | Total | 156339 | 126177 | 121633 | 0.778 | 0.964 |
This table displays the number of junctions in ENSEMBL known gene models identified by PASTA and TopHat for Run 1 and Run 2 respectively.
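In each row, the last two columns are consistent with dividing the third count by the first count and by the second count. This suggests that the third count is the number of junctions reported by both programs, although that reading is an inference from the arithmetic rather than something the caption states. A minimal Python sketch of the check, with a hypothetical function name:

```python
def shared_fractions(first, second, shared):
    """Fractions of the first and second junction sets covered by the shared set.

    Assumes `shared` counts junctions present in both sets (an inference
    from the table's arithmetic, not a definition given in the source).
    """
    return round(shared / first, 3), round(shared / second, 3)


# Run 1, Control, Lane 1 and Run 2, Control, Lane 1, copied from the table.
run1_lane1 = shared_fractions(128811, 65117, 63063)
run2_lane1 = shared_fractions(130899, 115751, 111098)
print(run1_lane1, run2_lane1)
```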
Number of junctions not from ENSEMBL known genes
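| Run | Sample | Lane | PASTA | TopHat | Shared | Shared/PASTA | Shared/TopHat |
|---|---|---|---|---|---|---|---|
| 1 | Control | Lane 1 | 36702 | 15094 | 3267 | 0.089 | 0.216 |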
| | | Lane 3 | 29331 | 14416 | 3098 | 0.106 | 0.215 |
| | | Total | 55647 | 26433 | 5368 | 0.096 | 0.203 |
| | Mutant | Lane 1 | 40393 | 15293 | 2589 | 0.064 | 0.169 |
| | | Lane 2 | 35243 | 14997 | 2990 | 0.085 | 0.199 |
| | | Lane 3 | 38547 | 16255 | 3110 | 0.081 | 0.191 |
| | | Lane 4 | 60059 | 17930 | 2234 | 0.037 | 0.125 |
| | | Total | 124104 | 53439 | 7947 | 0.064 | 0.149 |
| 2 | Control | Lane 1 | 35150 | 25080 | 10251 | 0.292 | 0.409 |
| | | Lane 2 | 22487 | 12716 | 4785 | 0.213 | 0.376 |
| | | Lane 3 | 24928 | 13716 | 5190 | 0.208 | 0.378 |
| | | Total | 63898 | 34702 | 15181 | 0.238 | 0.437 |
| | Mutant | Lane 1 | 25348 | 14531 | 5710 | 0.225 | 0.393 |
| | | Lane 2 | 33030 | 18266 | 7254 | 0.220 | 0.397 |
| | | Lane 3 | 42695 | 22349 | 9368 | 0.219 | 0.419 |
| | | Lane 4 | 43338 | 22205 | 9290 | 0.214 | 0.418 |
| | | Total | 94651 | 41487 | 19214 | 0.203 | 0.463 |
This table displays the number of additional junctions (not in the ENSEMBL annotation) identified by PASTA and TopHat for Run 1 and Run 2 respectively.
Average probability scores and percentages of canonical junctions
| Run | Lane | | | | | | |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 182951 | 0.208 | 0.169 | 161315 | 0.71 | 0.706 |
| | 3 | 151512 | 0.215 | 0.185 | 146821 | 0.71 | 0.72 |
| | 5 | 225167 | 0.181 | 0.143 | 168571 | 0.697 | 0.683 |
| | 6 | 214297 | 0.167 | 0.139 | 160157 | 0.665 | 0.673 |
| | 7 | 196135 | 0.176 | 0.141 | 186288 | 0.607 | 0.6 |
| | 8 | 286174 | 0.174 | 0.132 | 224290 | 0.609 | 0.569 |
| 2 | 1 | 185119 | 0.407 | 0.142 | 182407 | 0.743 | 0.698 |
| | 2 | 111081 | 0.4 | 0.256 | 121711 | 0.86 | 0.853 |
| | 3 | 128474 | 0.364 | 0.22 | 128610 | 0.833 | 0.822 |
| | 5 | 120223 | 0.424 | 0.243 | 129222 | 0.853 | 0.84 |
| | 6 | 153947 | 0.37 | 0.183 | 155789 | 0.78 | 0.754 |
| | 7 | 179484 | 0.38 | 0.151 | 199578 | 0.66 | 0.624 |
| | 8 | 185332 | 0.381 | 0.146 | 201870 | 0.656 | 0.615 |
| | | ||||||
| 1 | 1 | 219248 | 0.239 | 0.204 | 125018 | 0.801 | 0.802 |
| | 3 | 186561 | 0.25 | 0.227 | 111772 | 0.806 | 0.818 |
| | 5 | 264113 | 0.21 | 0.174 | 129625 | 0.791 | 0.783 |
| | 6 | 255218 | 0.196 | 0.172 | 119236 | 0.774 | 0.786 |
| | 7 | 247485 | 0.202 | 0.17 | 134938 | 0.724 | 0.722 |
| | 8 | 345153 | 0.196 | 0.152 | 165311 | 0.718 | 0.683 |
| 2 | 1 | 223944 | 0.413 | 0.179 | 143582 | 0.825 | 0.791 |
| | 2 | 135570 | 0.445 | 0.321 | 97222 | 0.913 | 0.912 |
| | 3 | 155550 | 0.4 | 0.275 | 101534 | 0.902 | 0.898 |
| | 5 | 145114 | 0.458 | 0.299 | 104331 | 0.908 | 0.904 |
| | 6 | 188431 | 0.387 | 0.223 | 121305 | 0.87 | 0.855 |
| | 7 | 229704 | 0.371 | 0.18 | 149358 | 0.767 | 0.739 |
| | 8 | 236287 | 0.372 | 0.175 | 150915 | 0.762 | 0.729 |
| | | ||||||
| 1 | 1 | 259337 | 0.298 | 0.268 | 84929 | 0.887 | 0.887 |
| | 3 | 225201 | 0.318 | 0.302 | 73132 | 0.892 | 0.898 |
| | 5 | 305148 | 0.262 | 0.228 | 88590 | 0.881 | 0.878 |
| | 6 | 296541 | 0.25 | 0.233 | 77913 | 0.875 | 0.881 |
| | 7 | 296642 | 0.251 | 0.224 | 85781 | 0.855 | 0.852 |
| | 8 | 397801 | 0.234 | 0.187 | 112663 | 0.83 | 0.806 |
| 2 | 1 | 263656 | 0.442 | 0.231 | 103870 | 0.907 | 0.892 |
| | 2 | 169910 | 0.527 | 0.427 | 62882 | 0.948 | 0.95 |
| | 3 | 189370 | 0.475 | 0.37 | 67714 | 0.944 | 0.945 |
| | 5 | 178189 | 0.526 | 0.394 | 71256 | 0.947 | 0.948 |
| | 6 | 223803 | 0.44 | 0.295 | 85933 | 0.931 | 0.927 |
| | 7 | 280701 | 0.397 | 0.23 | 98361 | 0.899 | 0.885 |
| | 8 | 287818 | 0.395 | 0.222 | 99384 | 0.898 | 0.88 |
| | | ||||||
| 1 | 1 | 289649 | 0.352 | 0.326 | 54617 | 0.924 | 0.922 |
| | 3 | 253947 | 0.377 | 0.365 | 44386 | 0.926 | 0.926 |
| | 5 | 335371 | 0.311 | 0.28 | 58367 | 0.921 | 0.917 |
| | 6 | 326184 | 0.3 | 0.286 | 48270 | 0.917 | 0.916 |
| | 7 | 326196 | 0.296 | 0.272 | 56227 | 0.907 | 0.902 |
| | 8 | 431187 | 0.269 | 0.224 | 79277 | 0.887 | 0.87 |
| 2 | 1 | 288984 | 0.475 | 0.278 | 78542 | 0.938 | 0.933 |
| | 2 | 197064 | 0.583 | 0.497 | 35728 | 0.958 | 0.96 |
| | 3 | 216813 | 0.532 | 0.44 | 40271 | 0.956 | 0.957 |
| | 5 | 205938 | 0.58 | 0.466 | 43507 | 0.958 | 0.96 |
| | 6 | 251758 | 0.49 | 0.36 | 57978 | 0.949 | 0.95 |
| | 7 | 308843 | 0.434 | 0.28 | 70219 | 0.936 | 0.93 |
| | 8 | 316415 | 0.432 | 0.271 | 70787 | 0.935 | 0.926 |
Average coverage (in reads/junction) and probability score of junctions in ENSEMBL known genes
| Run | Lane | Known Junctions | Avg. coverage | | | Unknown Junctions | Avg. coverage | | |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 127707 | 13.393 | 0.818 | 0.922 | 216559 | 1.872 | 0.222 | 0.125 |
| | 3 | 120058 | 11.559 | 0.817 | 0.933 | 178275 | 1.857 | 0.217 | 0.122 |
| | 5 | 127561 | 14.353 | 0.822 | 0.923 | 266177 | 1.808 | 0.200 | 0.111 |
| | 6 | 121484 | 12.362 | 0.799 | 0.924 | 252970 | 1.737 | 0.178 | 0.1 |
| | 7 | 121641 | 14.724 | 0.799 | 0.921 | 260782 | 2.033 | 0.194 | 0.106 |
| | 8 | 135760 | 18.913 | 0.814 | 0.900 | 374704 | 2.151 | 0.203 | 0.116 |
| 2 | 1 | 127102 | 22.007 | 0.896 | 0.933 | 240424 | 2.120 | 0.403 | 0.146 |
| | 2 | 119385 | 9.819 | 0.903 | 0.961 | 113407 | 1.629 | 0.364 | 0.155 |
| | 3 | 119715 | 10.806 | 0.904 | 0.960 | 137369 | 1.627 | 0.332 | 0.139 |
| | 5 | 122562 | 11.343 | 0.907 | 0.960 | 126883 | 1.676 | 0.394 | 0.158 |
| | 6 | 125888 | 14.778 | 0.909 | 0.955 | 183848 | 1.811 | 0.348 | 0.138 |
| | 7 | 126222 | 19.085 | 0.904 | 0.946 | 252840 | 2.151 | 0.339 | 0.127 |
| | 8 | 127674 | 19.007 | 0.895 | 0.932 | 259528 | 2.140 | 0.342 | 0.124 |
Known Junctions indicates predicted junctions appearing in ENSEMBL known genes, and Unknown Junctions indicates predicted junctions not appearing in ENSEMBL known genes. Only junctions with a maximum coverage of 100 were considered.
Average coverage (in reads/junction) and probability score of junctions by canonical signal
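| Run | Lane | Canonical Junctions | Avg. score | Avg. coverage | Non-canonical Junctions | Avg. score | Avg. coverage |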
|---|---|---|---|---|---|---|---|
| 1 | 1 | 144869 | 0.783 | 11.85 | 199397 | 0.196 | 2.001 |
| | 3 | 133780 | 0.787 | 10.428 | 164553 | 0.192 | 1.967 |
| | 5 | 147430 | 0.777 | 12.493 | 246308 | 0.177 | 1.909 |
| | 6 | 137620 | 0.756 | 10.955 | 236834 | 0.161 | 1.831 |
| | 7 | 139533 | 0.756 | 12.907 | 242890 | 0.174 | 2.142 |
| | 8 | 165421 | 0.753 | 15.589 | 345043 | 0.179 | 2.303 |
| 2 | 1 | 153599 | 0.871 | 18.534 | 213927 | 0.360 | 2.151 |
| | 2 | 132287 | 0.888 | 8.983 | 100505 | 0.315 | 1.678 |
| | 3 | 133965 | 0.884 | 9.781 | 123119 | 0.288 | 1.68 |
| | 5 | 137729 | 0.89 | 10.242 | 111716 | 0.346 | 1.721 |
| | 6 | 145698 | 0.882 | 12.946 | 164038 | 0.304 | 1.871 |
| | 7 | 151673 | 0.869 | 16.193 | 227389 | 0.299 | 2.185 |
| | 8 | 151274 | 0.864 | 16.272 | 235928 | 0.307 | 2.207 |
Canonical Junctions indicates predicted junctions with a canonical splicing signal, while Non-canonical Junctions indicates predicted junctions without the canonical signal. Only junctions with a maximum coverage of 100 were considered.
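As an illustration of what a canonical splicing signal means in practice, the Python sketch below classifies an intron by its terminal dinucleotides. It assumes the common convention that GT at the donor end and AG at the acceptor end is the canonical signal, and treats AT-AC as the minor class. The function name is hypothetical, the sequences are made up, and the source does not state whether PASTA applies exactly this rule.

```python
def classify_intron(intron_seq):
    """Classify an intron by its first and last two bases.

    The sequence is assumed to be given 5'->3' on the transcribed strand.
    GT...AG is treated as canonical and AT...AC as minor; everything else
    is reported as "other" (an illustrative simplification).
    """
    seq = intron_seq.upper()
    if len(seq) < 4:
        raise ValueError("intron too short to carry both splice signals")
    signal = (seq[:2], seq[-2:])
    if signal == ("GT", "AG"):
        return "canonical (GT-AG)"
    if signal == ("AT", "AC"):
        return "minor (AT-AC)"
    return "other"


# Made-up example introns.
for intron in ("GTAAGTccctttAG", "ATATCCTTTTAC", "GCAAGTTTAG"):
    print(intron, "->", classify_intron(intron))
```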
Prediction of minor splice sites using mouse RNA-Seq dataset from Mbnl2 experiment
| Lane | | | Percentage |
|---|---|---|---|
| 1 | 178 | 652 | 27.30% |
| 2 | 170 | 684 | 24.85% |
| 3 | 190 | 798 | 23.81% |
| 5 | 153 | 660 | 23.18% |
| 6 | 164 | 751 | 21.84% |
| 7 | 71 | 221 | 32.13% |
| 8 | 181 | 369 | 49.05% |
| Total | 1107 | 4135 | 26.77% |
The number of minor splice sites (AT-AC) from PASTA predictions compared against ENSEMBL mouse annotations.
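The percentage column and the Total row of this table can be reproduced from the two count columns. The Python sketch below copies the per-lane counts and recomputes both; the names are hypothetical, and the meaning of the two counts is deliberately left as given in the table.

```python
# (lane, first count, second count) copied from the table above.
lanes = [
    (1, 178, 652), (2, 170, 684), (3, 190, 798), (5, 153, 660),
    (6, 164, 751), (7, 71, 221), (8, 181, 369),
]


def percent(part, whole):
    """Format part/whole as a percentage with two decimals."""
    return f"{100 * part / whole:.2f}%"


per_lane = {lane: percent(first, second) for lane, first, second in lanes}
total_first = sum(first for _, first, _ in lanes)
total_second = sum(second for _, _, second in lanes)
total = percent(total_first, total_second)

print(per_lane)
print(total_first, total_second, total)
```

The Total percentage is the ratio of the summed counts, not the mean of the per-lane percentages.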