Marta P Castro-Ferreira, Tjalf E de Boer, John K Colbourne, Riet Vooijs, Cornelis A M van Gestel, Nico M van Straalen, Amadeu M V M Soares, Mónica J B Amorim, Dick Roelofs.
Abstract
BACKGROUND: The soil worm Enchytraeus crypticus (Oligochaeta) is an ecotoxicology model species that has, until now, lacked genome or transcriptome sequence information. The present research aims to study the transcriptome of Enchytraeus crypticus, sampled from multiple test conditions, and to construct a high-density microarray for functional genomic studies.
Year: 2014 PMID: 24758194 PMCID: PMC4234436 DOI: 10.1186/1471-2164-15-302
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Composition of the RNA pool used for transcriptome sequencing, showing RNA masses (%) from distinct test conditions. The pool included balanced RNA masses (%) from 40 distinct exposure conditions: twelve chemicals, twelve temperature treatments, five soil moisture treatments, three pH levels, four developmental stages, and four control conditions.
Sequencing results and assembly statistics (using Newbler) for transcriptome sequences
| Number of reads¹ | 1,478,792 | 1,351,016 | 1,273,500 |
| Number of bases | 619,758,522 | 569,108,879 | 663,968,932 |
| Number of sequences | 87,686 | 27,296 | 24,748 |
| Number of bases | 34,418,269 | 28,992,517 | 31,886,676 |
| Average sequence size (bp) | 392. | 1,062 | 1,288 |
| Length of N50 sequence (bp)² | 475 | 1,354 | 1,443 |
| Largest sequence size (bp) | 1,195 | 6,693 | 7,689 |
¹ After quality filtering steps and removal of outliers such as adaptor sequences and repeats.
² N50 is a weighted median statistic: 50% of all bases are contained in sequences at least as long as the N50 length.
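The N50 definition in the footnote can be illustrated with a minimal sketch. The function name `n50` and the toy lengths are hypothetical and are not taken from the Newbler assembly; the code follows the definition given above, not the assembler's own implementation.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # The first length at which half of all bases are covered is the N50.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly: five sequences, 1,500 bases in total.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # 400: the 500 bp and 400 bp sequences hold 900 of 1,500 bases
```

Because long sequences carry more bases, the N50 usually sits above the plain average sequence size, which matches the pattern in the table above.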
Comparison of annotation success between contigs and singletons
| Mean ORF length | | |
| Contigs | 349 | 197 |
| Singletons | 226 | 145 |
| Mean length | | |
| Contigs | 1,324 | 874 |
| Singletons | 447 | 393 |
| Containing start codon (%) | | |
| Contigs | 84.8 | 68.5 |
| Singletons | 67.7 | 49.8 |
| Mean GC-content (%) | | |
| Contigs | 43.4 | 37.9 |
| Singletons | 46.9 | 38.6 |
Figure 2. Distribution of the nucleotide lengths of open reading frames for annotated and unannotated contigs. Annotated contigs are shown in black and unannotated contigs in grey. Open reading frame lengths were predicted with ORFpredictor.
Summary of transcriptome annotation
| No. sequences | Total | 27,296 | | 87,686 | |
| | BLASTx match | 13,745 | (50%) | 25,347 | (29%) |
| | With GO terms | 12,133 | (44%) | 21,890 | (25%) |
| | With mapped GO terms | 9,401 | (34%) | 16,619 | (19%) |
| | With enzyme codes | 2,525 | (9%) | 4,253 | (5%) |
| No. GO terms for… | Biological processes | 24,577 | | 42,989 | |
| | Molecular functions | 17,484 | | 30,356 | |
| | Cellular components | 12,996 | | 20,726 | |
| No. orthologs | orthoMCL | 4,875 | | 11,278 | |
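As a quick arithmetic check, the percentages in the annotation summary can be recomputed from the counts and totals in the same table. This is only an illustrative sketch: the dictionary keys (`set_27296`, `set_87686`) are hypothetical labels that name each sequence set by its total, since the column headers are not preserved here.

```python
# Totals and counts copied from the annotation summary table.
TOTALS = {"set_27296": 27296, "set_87686": 87686}
COUNTS = {
    "BLASTx match": {"set_27296": 13745, "set_87686": 25347},
    "With GO terms": {"set_27296": 12133, "set_87686": 21890},
    "With mapped GO terms": {"set_27296": 9401, "set_87686": 16619},
    "With enzyme codes": {"set_27296": 2525, "set_87686": 4253},
}


def annotation_rates(counts, totals):
    """Return the share of each set (whole percent) falling in each category."""
    return {
        category: {
            name: round(100 * n / totals[name]) for name, n in per_set.items()
        }
        for category, per_set in counts.items()
    }


rates = annotation_rates(COUNTS, TOTALS)
for category, per_set in rates.items():
    print(category, per_set)
```

The recomputed values agree with the percentages reported in the table (for example, 13,745 of 27,296 sequences is 50%).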
Figure 3. Distribution of BLASTx top-hit species for singletons and contigs. The BLASTx top-hit species for a given sequence is the species of the best hit among all BLAST results for that sequence. The best alignment is the one with the highest sequence similarity and the lowest e-value. Blue represents BLASTx hits using singletons; green represents BLASTx hits using contigs.
Figure 4. Microarray hybridization data comparing zinc-exposed worms to control worms. A: Contig intensities vs. singleton intensities. Blue represents sequences below 500 bp; red represents sequences above 500 bp. B: Volcano plot. X-axis: log2 differential expression ratio; Y-axis: log odds of differential expression significance (Log(p)-Log(1-p)). The red line represents the log odds ratio for FDR-corrected p = 0.05.
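GO term enrichment analysis from zinc exposure
| GO ID | GO term | Category (P: biological process; C: cellular component) | FDR-corrected p-value¹ | No. significantly regulated transcripts² | No. transcripts on array³ |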
| GO:0006614 | SRP-dependent cotranslational protein targeting to membrane | P | 4.78E-05 | 8 | 64 |
| GO:0006613 | Cotranslational protein targeting to membrane | P | 4.78E-05 | 8 | 67 |
| GO:0045047 | Protein targeting to ER | P | 4.90E-05 | 8 | 74 |
| GO:0072599 | Establishment of protein localization in endoplasmic reticulum | P | 4.90E-05 | 8 | 74 |
| GO:0070972 | Protein localization in endoplasmic reticulum | P | 3.21E-04 | 8 | 99 |
| GO:0006612 | Protein targeting to membrane | P | 4.38E-04 | 9 | 149 |
| GO:0022625 | Cytosolic large ribosomal subunit | C | 2.05E-03 | 6 | 56 |
| GO:0019058 | Viral infectious cycle | P | 2.32E-03 | 8 | 140 |
| GO:0006415 | Translational termination | P | 2.32E-03 | 6 | 60 |
| GO:0072594 | Establishment of protein localization to organelle | P | 2.73E-03 | 8 | 147 |
| GO:0022415 | Viral reproductive process | P | 3.97E-03 | 8 | 157 |
| GO:0000184 | Nuclear-transcribed mRNA catabolic process, nonsense-mediated decay | P | 1.22E-02 | 6 | 86 |
| GO:0016032 | Viral reproduction | P | 1.49E-02 | 9 | 257 |
| GO:0015934 | Large ribosomal subunit | C | 1.59E-02 | 6 | 93 |
| GO:0019083 | Viral transcription | P | 1.85E-02 | 6 | 98 |
| GO:0019080 | Viral genome expression | P | 1.85E-02 | 6 | 98 |
| GO:0043624 | Cellular protein complex disassembly | P | 2.52E-02 | 6 | 105 |
| GO:0000956 | Nuclear-transcribed mRNA catabolic process | P | 2.75E-02 | 7 | 162 |
| GO:0022626 | Cytosolic ribosome | C | 2.75E-02 | 6 | 109 |
| GO:0043241 | Protein complex disassembly | P | 3.32E-02 | 6 | 114 |
| GO:0006402 | mRNA catabolic process | P | 4.18E-02 | 7 | 177 |
¹ False discovery rate-corrected p-value (derived from Fisher's exact test); ² number of significantly regulated transcripts containing the particular GO ID; ³ number of transcripts on the array containing the particular GO ID.
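The footnote describes p-values from a Fisher exact test followed by false discovery rate correction. The sketch below shows one common way to compute such values, under loudly stated assumptions: a one-sided (over-representation) test, Benjamini-Hochberg correction, and hypothetical values for the total number of array transcripts (`n_total`) and the total number of significantly regulated transcripts (`n_sig`), neither of which is given here. The function names are illustrative and this is not necessarily the software or settings the authors used.

```python
from math import comb


def fisher_enrichment_p(k, n_term, n_sig, n_total):
    """One-sided Fisher exact p-value: probability of seeing at least k
    term-annotated transcripts among n_sig significant ones, given n_term
    annotated transcripts among n_total on the array (hypergeometric tail)."""
    upper = min(n_term, n_sig)
    numerator = sum(
        comb(n_term, i) * comb(n_total - n_term, n_sig - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(n_total, n_sig)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# k and n_term come from the first table row (GO:0006614: 8 of 64);
# n_sig and n_total are HYPOTHETICAL placeholders, not values from the study.
p_raw = fisher_enrichment_p(k=8, n_term=64, n_sig=300, n_total=40000)
print(p_raw)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Because the placeholder totals are invented, the printed p-value is not expected to reproduce the table; only the 8-of-64 counts are taken from it.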