Sahar Abubucker, Samantha N McNulty, Bruce A Rosa, Makedonka Mitreva.
Abstract
BACKGROUND: Alternative splicing (AS) of mRNA is a vital mechanism for enhancing genomic complexity in eukaryotes. Spliced isoforms of the same gene can have diverse molecular and biological functions and are often differentially expressed across various tissues, times, and conditions. Thus, AS has important implications in the study of parasitic nematodes with complex life cycles. Transcriptomic datasets are available from many species, but data must be revisited with splice-aware assembly protocols to facilitate the study of AS in helminths.
Year: 2014 PMID: 24690220 PMCID: PMC3997825 DOI: 10.1186/1756-3305-7-151
Source DB: PubMed Journal: Parasit Vectors ISSN: 1756-3305 Impact factor: 3.876
test assemblies
| | | | | | | | |
| % Aligned reads | 97.37% | 99.09% | 97.34% | 97.36% | 97.37% | 97.37% | 96.70% |
| Isotigs | 16737 | 25776 | 16868 | 16548 | 16263 | 16130 | 16772 |
| Isotig N50 | 658 | 563 | 658 | 659 | 662 | 662 | 598 |
| Isogroups | 15403 | 24523 | 15404 | 15401 | 15380 | 15358 | 15940 |
| AS Isogroups | 824 (5.35%) | 823 (3.36%) | 824 (5.35%) | 802 (5.21%) | 741 (4.82%) | 674 (4.39%) | 691 (4.34%) |
| Ave. isotigs per AS isogroup | 2.62 | 2.52 | 2.78 | 2.43 | 2.19 | 2.15 | 2.20 |
| | | | | | | | |
| fragmentation | 9.40% | 20.70% | 9.40% | 9.40% | 9.40% | 9.40% | 9.90% |
| trans-chimeric isotigs2 | 397 (2.37%) | 407 (1.58%) | 398 (2.36%) | 398 (2.40%) | 397 (2.44%) | 385 (2.38%) | 148 (0.88%) |
| cis-chimeric isotigs3 | 165 (0.99%) | 185 (0.72%) | 209 (1.24%) | 195 (1.18%) | 148 (0.91%) | 145 (0.90%) | 104 (0.62%) |
| | | | | | | | |
| Isotigs with match4 | 6155 (36.77%) | 10604 (41.14%) | 6158 (36.51%) | 6083 (36.76%) | 6022 (37.03%) | 5996 (37.17%) | 6385 (38.07%) |
| Isogroups with match5 | 5937 (38.54%) | 10400 (42.41%) | 5940 (38.56%) | 5933 (38.52%) | 5913 (38.45%) | 5901 (38.42%) | 6294 (39.49%) |
| | 5602 (27.31%) | 8470 (41.29%) | 5604 (27.32%) | 5599 (27.29%) | 5583 (27.21%) | 5579 (27.19%) | 5727 (27.92%) |
| | | | | | | | |
| Isotigs with match4 | 11418 (68.22%) | 17031 (66.07%) | 11456 (67.92%) | 11217 (67.78%) | 11053 (67.96%) | 10984 (68.10%) | 11778 (70.22%) |
| Isogroups with match5 | 10811 (70.19%) | 16512 (67.33%) | 10816 (70.22%) | 10815 (70.22%) | 10789 (70.15%) | 10777 (70.17%) | 11540 (72.40%) |
| | 9600 (46.79%) | 12129 (59.12%) | 9600 (46.79%) | 9598 (46.79%) | 9575 (46.67%) | 9564 (46.62%) | 9748 (47.52%) |
1Newbler parameters are as follows: urt, include unaligned read tips; het, heterogeneous population; icl, isotig contig length threshold; ml, minimum overlap length; mi, minimum overlap identity.
2Trans-chimeric isotigs refer to misassembled transcripts with multiple open reading frames coding in opposite directions.
3Cis-chimeric isotigs refer to misassembled transcripts containing sequences derived from distinct regions of the genome assembly.
4Matches were required to meet a cutoff of ≥90% nucleotide sequence identity over ≥75% of the length of the isotigs in a single high-scoring segment pair.
5Matching isogroups are defined as isogroups containing ≥1 isotig matched to a C. elegans feature.
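The match criteria in footnotes 4 and 5 can be written as a small filter over alignment records. The following is a minimal sketch, not the authors' code: the `Hsp` record, its field names, and the helper functions are hypothetical, and query coverage is assumed to be measured from the query coordinates of one high-scoring segment pair (HSP).

```python
from typing import Dict, Iterable, NamedTuple, Set


class Hsp(NamedTuple):
    """One high-scoring segment pair (hypothetical record layout)."""
    query_id: str        # isotig identifier
    subject_id: str      # C. elegans feature identifier
    pct_identity: float  # nucleotide identity, in percent
    query_start: int     # 1-based, inclusive
    query_end: int       # 1-based, inclusive


def passes_cutoff(hsp: Hsp, query_length: int,
                  min_identity: float = 90.0,
                  min_coverage: float = 0.75) -> bool:
    """True if a single HSP meets both the identity and the coverage cutoff."""
    covered = abs(hsp.query_end - hsp.query_start) + 1
    return (hsp.pct_identity >= min_identity
            and covered / query_length >= min_coverage)


def matched_isotigs(hsps: Iterable[Hsp],
                    isotig_lengths: Dict[str, int]) -> Set[str]:
    """Isotigs with at least one HSP that passes the cutoff on its own."""
    return {h.query_id for h in hsps
            if passes_cutoff(h, isotig_lengths[h.query_id])}


def matched_isogroups(matched: Set[str],
                      isotig_to_isogroup: Dict[str, str]) -> Set[str]:
    """Isogroups containing at least one matched isotig (footnote 5)."""
    return {isotig_to_isogroup[i] for i in matched}
```

Because each HSP is tested independently, two shorter HSPs from the same isotig are never added together, which mirrors the "single high-scoring segment pair" wording of the footnote.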
Figure 1. Roche/454 read processing, decontamination, assembly and annotation. Raw Roche/454 reads were converted from sff to fastq format for editing and assembly. Relevant adapter sequences were trimmed, and reads failing to meet quality and complexity thresholds were removed. Reads that successfully map to rRNA, bacterial, human or host sequences were also eliminated. The remaining, high-quality, species-specific reads were assembled with Newbler’s cDNA-specific protocol using our optimized parameter combination, translated using Prot4EST [44], and annotated using InterProScan [45,46]. Statistical analyses can be carried out at the level of isotigs (unique transcripts) or isogroups (unique genetic loci) depending on the nature of the investigation.
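The quality and complexity filtering step in the caption can be illustrated with a short sketch. The caption does not state which tools or thresholds were used, so everything below is an assumption for illustration only: mean Phred quality as the quality measure, Shannon entropy of base composition as a simple stand-in for a complexity score, and the default threshold values.

```python
import math
from collections import Counter
from typing import Sequence


def mean_quality(quals: Sequence[int]) -> float:
    """Arithmetic mean of per-base Phred scores."""
    return sum(quals) / len(quals)


def shannon_entropy(seq: str) -> float:
    """Base-composition entropy in bits (0 for a homopolymer, 2 for uniform ACGT)."""
    counts = Counter(seq.upper())
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def keep_read(seq: str, quals: Sequence[int],
              min_length: int = 50,        # hypothetical threshold
              min_mean_q: float = 20.0,    # hypothetical threshold
              min_entropy: float = 1.5) -> bool:  # hypothetical threshold
    """True if the read passes the length, quality and complexity checks."""
    if len(seq) < min_length or len(seq) != len(quals):
        return False
    return (mean_quality(quals) >= min_mean_q
            and shannon_entropy(seq) >= min_entropy)
```

A dedicated low-complexity scorer would normally replace the entropy proxy; the point here is only the shape of the per-read decision.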
assembly statistics
| WS150 | Oct 2005 | 20066 | 20066 |
| WS166 | Oct 2006 | 20082 | 23207 |
| WS183 | Oct 2007 | 20155 | 23541 |
| WS196 | Oct 2008 | 20191 | 23902 |
| WS208 | Oct 2009 | 20238 | 24202 |
| WS220 | Oct 2010 | 20405 | 24842 |
| WS228 | Oct 2011 | 20484 | 25391 |
| WS234 | Oct 2012 | 20537 | 26041 |
| WS240 | Oct 2013 | 20538 | 26769 |
1Assemblies and annotations from WormBase [41].
Figure 2. Alternative splicing of gene C05B5.5. (A) Isogroup00600 from our de novo cDNA assembly contains two isotigs derived from C. elegans gene C18E3.6 (exons depicted as blue bars in top track). Alignment of Roche/454 reads (green bars with arrowheads indicating directionality) gave rise to three distinct contigs (dark, medium and light orange bars). These contigs were pieced together to form isotigs 01225 and 01226 based on read support displayed in the contig graph. Isotig01225, which contains all three contigs, corresponds perfectly to the gene model (blue bars). However, isotig01226 includes only the first (light orange) and third (dark orange) contigs, which results in a 50 bp gap with respect to isotig01225 and the gene model. (B) Illumina RNAseq reads (dark purple, horizontal bars) mapped to isotig01226 further verify the junction between the first (light orange) and third (dark orange) contigs, with proportional coverage indicated (light purple, vertical bars). This figure was adapted from alignments visualized using the Integrative Genomics Viewer [48,49].
Assembly of down-sampled read sets
| | | | |
| Reads used | 1746642 | 698656 | 809855 |
| Average read length | 403 | 403 | 383 |
| % aligned reads | 96.70% | 94.05% | 92.68% |
| Isotigs | 16772 | 12132 | 17322 |
| Isotig N50 | 598 | 569 | 599 |
| Isogroups | 15940 | 11746 | 16129 |
| AS Isogroups | 691 (4.34%) | 341 (2.90%) | 1026 (6.36%) |
| Average isotigs per AS isogroup | 2.20 | 2.13 | 2.16 |
| | | | |
| Fragmentation | 9.90% | 6.30% | 9.60% |
| | | | |
| Isotigs with match1 | 6385 (38.07%) | 4424 (36.47%) | 6422 (37.07%) |
| Isogroups with match2 | 6294 (39.49%) | 4380 (37.29%) | 6224 (38.59%) |
| 5727 (27.92%) | 4213 (20.10%) | 5628 (27.43%) |
1Matches were required to meet a cutoff of ≥90% nucleotide sequence identity over ≥75% of the length of the isotigs in a single high-scoring segment pair.
2Matching isogroups are defined as isogroups containing ≥1 isotig matched to a C. elegans feature.
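The second column of the table uses roughly 40% of the reads in the first (698,656 of 1,746,642). How the reads were down-sampled is not described here, so the sketch below simply assumes uniform random sampling without replacement, with a fixed seed for reproducibility. The function name and parameters are hypothetical.

```python
import random
from typing import List, Sequence


def downsample(read_ids: Sequence[str], fraction: float, seed: int = 0) -> List[str]:
    """Return a reproducible random subset of reads, keeping the input order.

    Assumes uniform sampling without replacement; `fraction` is the share
    of reads to keep (0 < fraction <= 1).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # local generator, so global state is untouched
    k = round(len(read_ids) * fraction)
    chosen = set(rng.sample(range(len(read_ids)), k))
    return [r for i, r in enumerate(read_ids) if i in chosen]
```

Sampling indices rather than the reads themselves keeps the subset in the original file order, which is convenient when the reads are later written back out.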
Parasitic nematode transcript assemblies
| Publication | [ | [ | [ | [ | [ | [ | [ | [ | [ |
| Genome BioProject ID | PRJNA72585 | PRJNA72571 | PRJNA72587 | PRJNA72135 | PRJNA72579 | PRJNA230512 | PRJNA72577 | PRJNA72569 | PRJNA74537 |
| Stages | Egg, L1, L2, iL3, aL3, male, female | Egg, L1, L2, iL3, aL3, L4, male, female | Egg, L1, iL3, L5, male, female | iL3, mixed sex adults | L2, iL3, L4, male, female | Mixed sex adults | Egg, L1, L2, iL3, L4, mixed sex adults | Mixed sex adults | Mixed sex adults |
| Clean reads | 4,028,728 | 6,113,083 | 4,740,349 | 1,566,641 | 2,614,527 | 1,050,204 | 7,528,633 | 1,746,999 | 2,513,840 |
| Normalized or full assembly | Normalized | Normalized | Normalized | Normalized | Full | Full | Normalized | Full | Full |
| Number of isotigs | 53,978 | 74,506 | 50,581 | 21,320 | 36,795 | 22,728 | 67,599 | 31,065 | 37,640 |
| Average isotig length | 1,029 bp | 763 bp | 964 bp | 866 bp | 815 bp | 820 bp | 889 bp | 989 bp | 535 bp |
| Number of isogroups | 35,422 | 42,785 | 29,960 | 16,233 | 23,061 | 15,828 | 37,189 | 21,780 | 31,546 |
| Number of AS Isogroups | 9,955 (28.10%) | 14,180 (33.14%) | 10,380 (34.65%) | 3,354 (20.66%) | 5,589 (24.24%) | 3,869 (24.44%) | 11,840 (31.84%) | 4,604 (21.14%) | 3,686 (11.68%) |
| Average isotigs per AS isogroup | 2.86 | 3.24 | 2.99 | 2.52 | 3.46 | 2.78 | 3.57 | 3.02 | 2.65 |
| Number of unique translations | 48,713 | 60,697 | 44,784 | 20,286 | 29,478 | 20,436 | 58,022 | 28,041 | 35,669 |
| Number of unique InterPro domains | 4,103 | 3,967 | 4,110 | 3,823 | 3,978 | 2,212 | 4,903 | 4,550 | 3,454 |
| Number of unique GO terms | 1,234 | 1,211 | 1,259 | 1,183 | 1,239 | 809 | 1,428 | 1,301 | 1,081 |
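The isotig, isogroup and AS isogroup rows of the table are related by simple arithmetic, sketched below. This assumes an AS isogroup is any isogroup containing more than one isotig, so every other isogroup contributes exactly one isotig; that definition and the function names are assumptions for illustration, not taken from the text above.

```python
from collections import Counter
from typing import Dict, Sequence


def isogroup_summary(isotig_to_isogroup: Dict[str, str]) -> Dict[str, float]:
    """Summarize an isotig -> isogroup mapping in the style of the table rows."""
    sizes = Counter(isotig_to_isogroup.values())    # isotigs per isogroup
    as_sizes = [n for n in sizes.values() if n > 1]  # assumed AS definition
    n_groups, n_as = len(sizes), len(as_sizes)
    return {
        "isotigs": len(isotig_to_isogroup),
        "isogroups": n_groups,
        "as_isogroups": n_as,
        "as_percent": 100 * n_as / n_groups if n_groups else 0.0,
        "isotigs_per_as_isogroup": sum(as_sizes) / n_as if n_as else 0.0,
    }


def mean_isotigs_per_as_isogroup(isotigs: int, isogroups: int, as_isogroups: int) -> float:
    """Recover the average from three counts, given one isotig per non-AS isogroup."""
    isotigs_in_as = isotigs - (isogroups - as_isogroups)
    return isotigs_in_as / as_isogroups


def n50(lengths: Sequence[int]) -> int:
    """Length L such that sequences of length >= L hold at least half the bases."""
    total, running = sum(lengths), 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0
```

For the first column of the table, `mean_isotigs_per_as_isogroup(53978, 35422, 9955)` rounds to the reported 2.86, which serves as a quick consistency check on the counts.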
Enrichment of InterPro protein domains among alternatively spliced isogroups
| IPR000504 | RNA recognition motif domain | 9 | 858 | 50.7% | 2.1E-06 |
| IPR016197 | Chromo domain-like | 9 | 128 | 64.1% | 3.8E-05 |
| IPR012677 | Nucleotide-binding, alpha-beta plait | 9 | 1037 | 48.7% | 4.2E-05 |
| IPR006092 | Acyl-CoA dehydrogenase, N-terminal | 8 | 53 | 73.6% | 2.0E-04 |
| IPR013786 | Acyl-CoA dehydrogenase/oxidase, N-terminal | 9 | 68 | 69.1% | 3.2E-04 |
| IPR003593 | AAA+ ATPase domain | 8 | 189 | 57.7% | 3.6E-04 |
| IPR001412 | Aminoacyl-tRNA synthetase, class I, conserved site | 8 | 43 | 74.4% | 6.6E-04 |
| IPR006091 | Acyl-CoA oxidase/dehydrogenase, central domain | 9 | 70 | 67.1% | 7.6E-04 |
| IPR009100 | Acyl-CoA dehydrogenase/oxidase, N-terminal and middle domain | 9 | 93 | 63.4% | 8.9E-04 |
| IPR011993 | Pleckstrin homology-like domain | 9 | 474 | 50.4% | 1.7E-03 |
| IPR014001 | Helicase, superfamily 1/2, ATP-binding domain | 9 | 317 | 52.1% | 3.7E-03 |
| IPR023780 | Chromo domain | 9 | 79 | 63.3% | 3.6E-03 |
| IPR000953 | Chromo domain/shadow | 9 | 85 | 62.4% | 3.7E-03 |
| IPR015421 | Pyridoxal phosphate-dependent transferase, major region, subdomain 1 | 9 | 275 | 52.7% | 3.8E-03 |
| IPR002194 | Chaperonin TCP-1, conserved site | 8 | 55 | 67.3% | 3.6E-03 |
| IPR003954 | RNA recognition motif domain, eukaryote | 9 | 41 | 70.7% | 4.5E-03 |
| IPR006020 | PTB/PI domain | 9 | 71 | 63.4% | 5.7E-03 |
| IPR001650 | Helicase, C-terminal | 9 | 298 | 51.7% | 6.8E-03 |
| IPR017998 | Chaperone tailless complex polypeptide 1 (TCP-1) | 9 | 66 | 63.6% | 7.6E-03 |
| IPR011545 | DNA/RNA helicase, DEAD/DEAH box type, N-terminal | 9 | 275 | 52.0% | 7.3E-03 |
| IPR002495 | Glycosyl transferase, family 8 | 7 | 11 | 90.9% | 7.2E-03 |
*In total, 40.5% of all isogroups associated with any InterPro domain are AS.
**Binomial test, FDR corrected, threshold value of 0.01.
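The enrichment test in the footnotes can be sketched as a binomial tail probability against the 40.5% background, followed by a false discovery rate adjustment. The footnote does not say whether the test was one-sided or which FDR procedure was applied, so the one-sided upper tail and the Benjamini-Hochberg adjustment below are assumptions, and the function names are hypothetical.

```python
from math import exp, lgamma, log
from typing import List, Sequence


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so that large n does not overflow.
    """
    def log_pmf(i: int) -> float:
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))
    return min(1.0, sum(exp(log_pmf(i)) for i in range(k, n + 1)))


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For a domain found in `n` isogroups of which `k` are AS, `binomial_upper_tail(k, n, 0.405)` gives the unadjusted p-value under these assumptions. The adjustment depends on every domain that was tested, so adjusted values cannot be reproduced from the listed rows alone.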