Miriam L Sharpe, Peter K Dearden, Gregory Gimenez, Kurt L Krause.
Abstract
BACKGROUND: The New Zealand glowworm is the larva of a carnivorous fungus gnat that produces bioluminescence to attract prey. The bioluminescent system of the glowworm is evolutionarily distinct from other well-characterised systems, especially that of the fireflies, and the molecules involved have not yet been identified. We have used high throughput sequencing technology to produce a transcriptome for the glowworm and identify transcripts encoding proteins that are likely to be involved in glowworm bioluminescence.
Year: 2015 PMID: 26486607 PMCID: PMC4617951 DOI: 10.1186/s12864-015-2006-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary statistics for reads from 454 GS-FLX sequencing
| Library | Total HQ reads | Total HQ sequence (bp) | Average read length (bp) | % Mixed | % Dots |
|---|---|---|---|---|---|
| Light organ tissue | 559 773 | 192 760 671 | 344 | 16.13 | 11.25 |
| Non-light organ tissue | 564 835 | 194 425 594 | 344 | 12.07 | 11.37 |
| Combined | 1 124 608 | 387 186 265 | 344 | 14.09 | 11.31 |
HQ = high quality; % Mixed = percentage of reads filtered out by the mixed filter, where a mixed read is the result of simultaneously sequencing a mixture of different DNA molecules; % Dots = percentage of reads filtered by the dots filter, where a dot is an instance of three successive nucleotide flows that record no incorporation
Fig. 1 Distribution of read lengths from 454 sequencing of light organ (blue) and non-light organ (red) cDNA libraries
De novo assembly statistics
| Input | Assembly software | Number of transcripts | Number of bases assembled | N50 (bp) | Number of singleton reads | Median contig length (bp) | Mean contig length (bp) | Maximum contig length (bp) |
|---|---|---|---|---|---|---|---|---|
| 454 reads (two libraries merged) | CLC Genomics Workbench | 18 794 | 14 257 486 | 897 | 94 753 | 586 | 759 | 11 217 |
| Illumina reads (six libraries merged) | Trinity | 196 766 | 187 289 921 | 1 828 | NA | 447 | 952 | 30 278 |
N50 = the contig length such that 50 % of the assembled bases lie in contigs of that length or greater; NA = not applicable to results from this assembly program
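The N50 definition in the footnote above can be made concrete with a short calculation. The following Python sketch is illustrative only: the function name and the toy contig lengths are assumptions, not data or code from the study.

```python
def n50(contig_lengths):
    """Return the N50 (in bp) of a collection of contig lengths.

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of all assembled bases.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (bp), used only to exercise the function.
toy_contigs = [100, 200, 300, 400, 500]
print(n50(toy_contigs))
```

Because N50 weights contigs by their length, it is usually larger than the median contig length, which is consistent with the two assemblies in the table.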
Summary statistics for reads from Illumina sequencing
| Sample | Raw reads | Trimmed, quality filtered reads | Trimmed, quality filtered reads (% of raw reads) |
|---|---|---|---|
| Glowworm 1 light organ | 44 776 272 | 35 671 872 | 80 % |
| Glowworm 1 non-light organ | 39 450 268 | 31 259 202 | 79 % |
| Glowworm 2 light organ | 41 122 414 | 32 769 962 | 80 % |
| Glowworm 2 non-light organ | 37 092 452 | 29 526 444 | 80 % |
| Glowworm 3 light organ | 40 295 844 | 32 042 384 | 80 % |
| Glowworm 3 non-light organ | 39 697 568 | 31 624 498 | 80 % |
| Combined | 242 434 818 | 192 894 362 | 80 % |
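The "Combined" row and the percentage column can be checked from the per-sample counts. The Python sketch below repeats that arithmetic; the dictionary layout and variable names are illustrative assumptions, while the counts are copied from the table above.

```python
# (raw reads, trimmed and quality-filtered reads) per sample, from the table above.
reads = {
    "Glowworm 1 light organ": (44_776_272, 35_671_872),
    "Glowworm 1 non-light organ": (39_450_268, 31_259_202),
    "Glowworm 2 light organ": (41_122_414, 32_769_962),
    "Glowworm 2 non-light organ": (37_092_452, 29_526_444),
    "Glowworm 3 light organ": (40_295_844, 32_042_384),
    "Glowworm 3 non-light organ": (39_697_568, 31_624_498),
}

total_raw = sum(raw for raw, _ in reads.values())
total_filtered = sum(filtered for _, filtered in reads.values())

# Percentage of raw reads retained after trimming and quality filtering.
for sample, (raw, filtered) in reads.items():
    print(f"{sample}: {100 * filtered / raw:.0f} % retained")

print(f"Combined: {total_raw} raw, {total_filtered} filtered, "
      f"{100 * total_filtered / total_raw:.0f} % retained")
```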
Fig. 2 Distribution of contig lengths for 454/CLC Genomics Workbench (green) and Illumina/Trinity (orange) assemblies
Fig. 3 Gene ontology annotations. Pictured are the top ten GO terms for each of the three GO categories
Differential expression analysis for 454 data: transcripts expressed ≥ 10-fold more highly in glowworm light organ tissue than in non-light organ tissue
| Contig number | Gene length (bp) | Light organ: total gene reads | Light organ: normalized expression values | Light organ: RPKM | Rest of body: total gene reads | Rest of body: normalized expression values | Rest of body: RPKM | Difference (normalized values) | Fold change (normalized values) | Putative identity from BLASTX search hits |
|---|---|---|---|---|---|---|---|---|---|---|
| 16656 | 711 | 955 | 2387.2 | 2876.3 | 2 | 7.4 | 5.9 | 2379.8 | 322.6 | phosphatidylethanolamine-binding protein |
| 12303 | 1832 | 1702 | 1651.2 | 1989.5 | 9 | 13 | 10.3 | 1638.2 | 127.0 | luciferin 4-monooxygenase |
| 10195 | 478 | 2588 | 9622.5 | 11594.2 | 14 | 77.4 | 61.6 | 9545.1 | 124.3 | luciferin 4-monooxygenase |
| 13441 | 1452 | 13014 | 15929.3 | 19193.3 | 87 | 158.4 | 126.0 | 15770.9 | 100.6 | luciferin 4-monooxygenase |
| 12283 | 1279 | 446 | 619.8 | 746.7 | 3 | 6.2 | 4.9 | 613.6 | 100.0 | glutathione s-transferase |
| 4637 | 1047 | 273 | 463.4 | 558.4 | 2 | 5.1 | 4.0 | 458.3 | 90.9 | sulfotransferase |
| 12325 | 1830 | 2569 | 2495 | 3006.2 | 21 | 30.3 | 24.1 | 2464.7 | 82.3 | luciferin 4-monooxygenase |
| 2377 | 1303 | 339 | 462.4 | 557.1 | 4 | 8.1 | 6.5 | 454.3 | 57.1 | aminoacylase-1 |
| 2684 | 2024 | 65 | 57.1 | 68.8 | 1 | 1.3 | 1.0 | 55.8 | 43.9 | otopetrin |
| 17552 | 746 | 513 | 1222.2 | 1472.6 | 8 | 28.4 | 22.5 | 1193.8 | 43.0 | cleavage stimulation factor 64 kDa subunit |
| 11885 | 2198 | 49 | 39.6 | 47.7 | 1 | 1.2 | 1.0 | 38.4 | 33.0 | ATP-binding cassette transporter |
| 3008 | 1206 | 49 | 72.2 | 87.0 | 1 | 2.2 | 1.7 | 70 | 32.8 | GST-containing flywch zinc-finger protein |
| 16267 | 1647 | 42 | 45.3 | 54.6 | 1 | 1.6 | 1.3 | 43.7 | 28.3 | protein lsm14 homolog b-like |
| 12345 | 961 | 293 | 541.9 | 652.9 | 7 | 19.3 | 15.3 | 522.6 | 28.1 | carboxylesterase |
| 2668 | 928 | 72 | 137.9 | 166.1 | 2 | 5.7 | 4.5 | 132.2 | 24.2 | tpa_inf: hdc07468 |
| 12310 | 1030 | 452 | 779.9 | 939.7 | 13 | 33.4 | 26.5 | 746.5 | 23.4 | carboxylesterase |
| 4414 | 2035 | 138 | 120.5 | 145.2 | 4 | 5.2 | 4.1 | 115.3 | 23.2 | sodium-dependent phosphate transporter |
| 4376 | 704 | 32 | 80.8 | 97.3 | 1 | 3.8 | 3.0 | 77 | 21.3 | carbonic anhydrase |
| 11948 | 1363 | 29 | 37.8 | 45.6 | 1 | 1.9 | 1.5 | 35.9 | 19.9 | FACR2_DROME RecName: Full = fatty acyl-CoA reductase CG8303 |
| 16727 | 1498 | 30 | 35.6 | 42.9 | 1 | 1.8 | 1.4 | 33.8 | 19.8 | short-chain dehydrogenase |
| 16747 | 1127 | 28 | 44.2 | 53.2 | 1 | 2.3 | 1.9 | 41.9 | 19.2 | cofilin actin-depolymerizing factor homolog |
| 3316 | 1969 | 21 | 19 | 22.8 | 1 | 1.3 | 1.1 | 17.7 | 14.6 | isoform a |
| 9802 | 1035 | 22 | 37.8 | 45.5 | 1 | 2.6 | 2.0 | 35.2 | 14.5 | kynurenine aminotransferase |
| 9665 | 1782 | 19 | 18.9 | 22.8 | 1 | 1.5 | 1.2 | 17.4 | 12.6 | transposable element tc3 transposase |
| 11695 | 1953 | 56 | 51 | 61.4 | 3 | 4.1 | 3.2 | 46.9 | 12.4 | protein yellow |
| 12223 | 1142 | 36 | 56 | 67.5 | 2 | 4.6 | 3.7 | 51.4 | 12.2 | ganglioside-induced differentiation-associated protein 1 |
| 10010 | 267 | 18 | 119.8 | 144.4 | 1 | 9.9 | 7.9 | 109.9 | 12.1 | hypothetical protein Sulku_2095 |
| 3263 | 1410 | 17 | 21.4 | 25.8 | 1 | 1.9 | 1.5 | 19.5 | 11.3 | multiple inositol polyphosphate phosphatase |
| 4985 | 851 | 16 | 33.4 | 40.3 | 1 | 3.1 | 2.5 | 30.3 | 10.8 | ubiquitin fusion degradation protein |
| 13509 | 589 | 32 | 96.6 | 116.3 | 2 | 9 | 7.1 | 87.6 | 10.7 | kda midgut protein |
| 16372 | 1493 | 16 | 19 | 22.9 | 1 | 1.8 | 1.4 | 17.2 | 10.6 | cytochrome p450 |
| 4731 | 700 | 31 | 78.7 | 94.8 | 2 | 7.6 | 6.0 | 71.1 | 10.4 | isoform b |
| 7053 | 817 | 15 | 32.6 | 39.3 | 1 | 3.2 | 2.6 | 29.4 | 10.2 | No significant similarity found |
| 16679 | 960 | 45 | 83.3 | 100.4 | 3 | 8.3 | 6.6 | 75 | 10.0 | hypothetical conserved protein |
Transcripts are ranked according to normalized expression values; read counts were scaled so that the median values were made equal. RPKM = reads per kilobase of exon model per million mapped reads
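The derived columns in the table above follow from simple arithmetic, sketched here in Python. The RPKM function uses the standard definition (reads × 10^9 divided by the product of gene length in bp and total mapped reads). The total number of mapped reads is not given in the table, so the value used below is hypothetical, and the function names are illustrative rather than taken from the analysis software.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def fold_change(light_organ_value, rest_of_body_value):
    """Ratio of normalized expression in light organ to rest of body."""
    return light_organ_value / rest_of_body_value


# Contig 16656: normalized expression values copied from the table above.
light, rest = 2387.2, 7.4
print(round(light - rest, 1))              # "Difference" column
print(round(fold_change(light, rest), 1))  # "Fold change" column

# RPKM for the same contig (955 reads, 711 bp) under a HYPOTHETICAL
# library size; the real number of mapped reads is not stated in the table.
print(rpkm(955, 711, total_mapped_reads=500_000))
```

The published fold changes were presumably computed from unrounded values, so recomputing them from the rounded table entries may differ in the last digit for some rows.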
Differential expression analysis for 454 data: top 30 transcripts expressed in glowworm light organ tissue but not in non-light organ tissue
| Contig number | Gene length (bp) | Total gene reads | Normalized expression values | RPKM | Putative identity from BLASTX search hits |
|---|---|---|---|---|---|
| 13443 | 427 | 347 | 1444.3 | 1740.2 | No significant similarity found |
| 10414 | 304 | 64 | 374.2 | 450.8 | luciferin 4-monooxygenase |
| 2392 | 1330 | 156 | 208.5 | 251.2 | heat shock protein 70 |
| 13448 | 565 | 65 | 204.5 | 246.4 | sugar transporter sweet1-like isoform 2 |
| 13375 | 816 | 82 | 178.6 | 215.2 | No significant similarity found |
| 14967 | 142 | 10 | 125.2 | 150.8 | No significant similarity found |
| 94 | 215 | 15 | 124 | 149.4 | No significant similarity found |
| 860 | 68 | 4 | 104.5 | 126.0 | No significant similarity found |
| 4416 | 1031 | 54 | 93.1 | 112.2 | No significant similarity found |
| 10178 | 78 | 4 | 91.1 | 109.8 | No significant similarity found |
| 10298 | 119 | 6 | 89.6 | 108.0 | hypothetical protein Sulku_2095 |
| 14957 | 504 | 25 | 88.2 | 106.2 | No significant similarity found |
| 161 | 142 | 7 | 87.6 | 105.6 | No significant similarity found |
| 17729 | 168 | 8 | 84.6 | 102.0 | No significant similarity found |
| 706 | 85 | 4 | 83.6 | 100.8 | No significant similarity found |
| 10061 | 152 | 7 | 81.8 | 98.6 | troponin i |
| 14955 | 181 | 8 | 78.6 | 94.6 | No significant similarity found |
| 2879 | 1600 | 70 | 77.8 | 93.7 | protein maelstrom homolog |
| 80 | 70 | 3 | 76.2 | 91.8 | No significant similarity found |
| 14930 | 118 | 5 | 75.3 | 90.7 | No significant similarity found |
| 15799 | 49 | 2 | 72.5 | 87.4 | No significant similarity found |
| 10252 | 74 | 3 | 72.1 | 86.8 | No significant similarity found |
| 7300 | 228 | 9 | 70.2 | 84.5 | No significant similarity found |
| 10131 | 104 | 4 | 68.4 | 82.4 | No significant similarity found |
| 1453 | 55 | 2 | 64.6 | 77.9 | No significant similarity found |
| 15071 | 83 | 3 | 64.2 | 77.4 | No significant similarity found |
| 14973 | 194 | 7 | 64.1 | 77.3 | No significant similarity found |
| 9 | 201 | 7 | 61.9 | 74.6 | No significant similarity found |
| 15207 | 173 | 6 | 61.6 | 74.3 | No significant similarity found |
| 788 | 117 | 4 | 60.8 | 73.2 | No significant similarity found |
Transcripts are ranked according to normalized expression values; read counts were scaled so that the median values were made equal. RPKM = reads per kilobase of exon model per million mapped reads
Transcripts from the Illumina sequenced samples that are significantly more highly expressed in the light organ relative to the rest of the body in the glowworm
| Transcript number | Rank | Log2 fold change | Log2 of read count per million | P value | False discovery rate | Putative identity from BLASTX search hits |
|---|---|---|---|---|---|---|
| 64201-seq1 | 1 | 9.90 | 11.88 | 1.68E-06 | 0.054 | acyl-CoA synthetase/luciferin 4-monooxygenase |
| 62762 | 2 | 10.15 | 15.44 | 3.05E-06 | 0.054 | acyl-CoA synthetase/luciferin 4-monooxygenase |
| 60014 | 3 | 9.91 | 9.09 | 3.14E-06 | 0.054 | aminoacylase |
| 51138 | 4 | 9.71 | 10.98 | 4.75E-06 | 0.054 | phosphatidylethanolamine-binding protein |
| 64201-seq2 | 5 | 10.05 | 11.12 | 5.13E-06 | 0.054 | acyl-CoA synthetase/luciferin 4-monooxygenase |
| 56768 | 6 | 9.81 | 10.63 | 5.56E-06 | 0.054 | glutathione S-transferase |
A positive value for log2 fold change indicates over-expression in light organ relative to non-light organ tissue
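To relate the log2 fold changes above to the linear fold changes reported for the 454 data, the log2 value is simply exponentiated. The helper below is a minimal Python sketch; its name is an assumption made for illustration.

```python
def log2_to_fold(log2_fold_change):
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2_fold_change


# Log2 fold changes for the six transcripts in the table above.
log2_values = {
    "64201-seq1": 9.90,
    "62762": 10.15,
    "60014": 9.91,
    "51138": 9.71,
    "64201-seq2": 10.05,
    "56768": 9.81,
}

for transcript, value in log2_values.items():
    print(f"{transcript}: about {log2_to_fold(value):.0f}-fold higher in light organ")
```

A log2 fold change of 10 corresponds to a 1024-fold difference, so every transcript in the table is several hundred-fold to roughly a thousand-fold more abundant in the light organ.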
Common differentially expressed transcripts from 454 and Illumina sequencing analyses
| Rank (Illumina) | Transcript number (Illumina) | Rank (454) | Contig number (454) | Protein size (kDa) | Putative identity from BLASTX search hits | Speculative function |
|---|---|---|---|---|---|---|
| 1 | 64201-seq1 | 7 | 12325 | 58.6 | acyl-CoA synthetase/luciferin 4-monooxygenase | bioluminescence catalysis |
| 2 | 62762 | 3, 4 | 10195, 13441 | 59.0 | acyl-CoA synthetase/luciferin 4-monooxygenase | bioluminescence catalysis |
| 3 | 60014 | 8 | 2377 | 45.0 | aminoacylase | processing of bioluminescent substrate |
| 4 | 51138 | 1 | 16656 | 20.1 | phosphatidylethanolamine-binding protein | ATP binding |
| 5 | 64201-seq2 | 2 | 12303 | 58.2 | acyl-CoA synthetase/luciferin 4-monooxygenase | bioluminescence catalysis |
| 6 | 56768 | 5 | 12283 | 25.0 | glutathione S-transferase | directly or indirectly involving glutathione in bioluminescence |
Fig. 4 Alignment of amino acid sequences encoded by transcripts 64201-seq1, 64201-seq2, and 62762. The alignment was carried out using Clustal Omega (http://www.ebi.ac.uk) and visualised using Jalview (http://www.jalview.org). Residues are colored according to conservation of sequence identity (dark blue: 100 % conserved). Black boxes represent positions of ATP-binding motifs conserved throughout the ANL superfamily [33], and red boxes represent luciferin-binding residues from the beetle luciferase [62, 63]. The residue marked with a ‘#’ plays a key role in the firefly luciferase adenylation half reaction, and the residue marked with a ‘*’ plays a key role in the oxidation (light-producing) half reaction [64]
Fig. 5 Unrooted phylogenetic tree of luciferases and related proteins. Details for each protein are provided in Table 8
Details of luciferases and related proteins included in the phylogenetic analysis (Fig. 5)
| Organism | Name/accession number of protein | Function | Catalyses bioluminescence with firefly luciferin? | Reference |
|---|---|---|---|---|
| New Zealand glowworm | 64201-seq1 | candidate luciferase | Not tested | - |
| New Zealand glowworm | 64201-seq2 | candidate luciferase | Not tested | - |
| New Zealand glowworm | 62762 | candidate luciferase | Not tested | - |
| | luciferase/P08659 | luciferase | Yes | [ |
| | luciferase/P13129-1 | luciferase | Yes | [ |
| Luciola cruciata (Japanese firefly) | LcLL1/BAE80728 | luciferase-like protein | No | [ |
| Luciola cruciata (Japanese firefly) | LcLL2/BAE80729 | luciferase-like protein | No | [ |
| | CG6178 | firefly luciferase homolog (closest homolog to firefly luciferase in | No | [ |
| | pdgy/NP_572988 | acyl-CoA synthetase | Not tested | [ |
| | CG4830 | predicted acyl-CoA synthetase | No | [ |
| | AAEL000101-PA | AMP-dependent CoA ligase homolog | Not tested | - |
| | TmLL-1/BAE95689 | luciferase-like protein | No | [ |
| | BAE95690/TM-LL2 | acyl-CoA synthase | No | [ |
| | TmLL-3/BAE95691 | luciferase-like protein | No | [ |
| | Luc-like | luciferase-like protein | Weak bioluminescence | [ |