Literature DB >> 26486607

Comparative RNA seq analysis of the New Zealand glowworm Arachnocampa luminosa reveals bioluminescence-related genes.

Miriam L Sharpe1, Peter K Dearden2, Gregory Gimenez3, Kurt L Krause4.   

Abstract

BACKGROUND: The New Zealand glowworm is the larva of a carnivorous fungus gnat that produces bioluminescence to attract prey. The bioluminescent system of the glowworm is evolutionarily distinct from other well-characterised systems, especially that of the fireflies, and the molecules involved have not yet been identified. We have used high throughput sequencing technology to produce a transcriptome for the glowworm and identify transcripts encoding proteins that are likely to be involved in glowworm bioluminescence.
RESULTS: Here we report the sequencing and annotation of the first transcriptome of the glowworm, and a differential analysis of expression from the glowworm light organ compared with non-light organ tissue. The analysis identified six transcripts encoding proteins that are potentially involved in glowworm bioluminescence. Three of these proteins are members of the ANL superfamily of adenylating enzymes, with similar amino acid sequences to that of the luciferase enzyme found in fireflies (31 to 37 % identical), and are candidate luciferases for the glowworm bioluminescent system. The remaining three transcripts encode putative aminoacylase, phosphatidylethanolamine-binding and glutathione S-transferase proteins.
CONCLUSIONS: This research provides a basis for further biochemical studies into how the glowworm produces light, and a source of genetic information to aid future ecological and evolutionary studies of the glowworm.

Entities:  

Mesh:

Substances:

Year:  2015        PMID: 26486607      PMCID: PMC4617951          DOI: 10.1186/s12864-015-2006-2

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

In caves and forested river gorges across New Zealand, an abundance of star-like lights can be seen when it is dark. The creature responsible for this light, known locally as a glowworm, is the carnivorous larva of a fungus gnat, Arachnocampa luminosa. The larva lies in a mucous hammock, hanging long sticky silk threads below it like fishing lines, and luminesces from a small light organ located at the end of its tail. Small flying insects are attracted by the light, become entangled in the sticky lines, and are then consumed by the glowworm [1, 2]. Despite their common name, glowworms are actually a type of fly (order Diptera, family Keroplatidae) [3]. Confusingly, some firefly larvae and adults are also referred to as glowworms, but the well-studied bioluminescent beetles (including fireflies, click beetles and railroad worms) are members of a different order – Coeloptera (superfamily Elateroidea) [4]. Diptera and Coleoptera diverged about 330 million years ago, with no known bioluminescent species intervening [5-7], which indicates that bioluminescence evolved independently in these insects. The ability to bioluminesce has evolved many times. One recent estimate suggests that bioluminescence has evolved at least 40 times across extant organisms, possibly more than 50 times, when counting the number of distinct light-producing chemical mechanisms across monophyletic lineages [6]. Despite their differences and separate evolutionary origins, all bioluminescent systems that have been studied produce light by oxidation of a light-emitting substrate (generically referred to as a luciferin) catalyzed by an enzyme (a luciferase). Luciferase enzymes have extremely varied structures, mechanisms and substrate specificities [8]. Researchers have studied the biochemistry used by Arachnocampa to produce light [9-11], but many details remain elusive, including the identities of the luciferase and luciferin. Although both the glowworm and firefly systems use ATP to bioluminesce, the chemistries of bioluminescence in the two creatures are distinct. When mixed, the substrate for the firefly system (luciferin) and the glowworm luciferase do not produce light [9], which implies that they use different luciferase enzymes and substrates. The physiology of the glowworm light organ is also unique. It is made up of the swollen distal tips of the four Malpighian tubules [12, 13]. Malpighian tubules are part of the insect excretory system, analogous to the vertebrate kidneys, and are not part of any other insect bioluminescence system [14]. The ability of firefly, bacteria (such as Vibrio harveyi) and sea pansy (Renilla) luciferases and luciferins to produce easily-measured light has led to the use of these systems as tools in biomedical and biological research, for example as genetic reporters, drug screening assays, bioluminescence imaging, and assays for the presence of ATP or calcium [15, 16]. Understanding the molecular basis of light generation in glowworms will not only expand our understanding of how bioluminescence works, but may also lead to novel bioluminescent applications. For example, the glowworm system uses a different luciferin substrate and produces a different bioluminescent spectra maximum than currently used bioluminescent research tools, therefore the glowworm system could be used in conjunction with existing bioluminescent applications, for example, to detect several compounds in one sample or monitor expression of multiple genes simultaneously. One approach to revealing the molecular physiology of glowworm bioluminescence is to sequence the transcriptome of the organism. Transcriptome sequencing is a relatively cheap and easy way of providing genome-wide sequence data for non-model organisms for which no genome sequence data is available [17, 18]. Sequencing on a genome-wide scale is still a new approach for investigating bioluminescence. Although reported high-throughput sequencing of bioluminescent creatures so far include four genomes (the ctenophore Mnemiopsis leidyi [19], various strains of V. fischeri [20-22], the luminous mushroom Mycena chloropho [23], and the European brittle star Amphiura filiformis [24]), and several transcriptomes (the European brittle star [24], cypridinid ostracods or seed shrimp [25], the Oplophorid shrimp [26], the luminous mushroom [23], and the dinoflagellate Lingulodinium polyedrum [27, 28]), none of these studies have reported any detailed analyses specifically looking at the differences in transcripts produced between luminous and non-luminous tissues. It should be noted that a limited transcriptional profiling study has been carried out on A. luminosa [29]. The authors of this study sequenced 537 cDNAs that were constructed from light organ expressed sequence tags using the Sanger method, and did not include a comparison between tissues. The same research group also carried out a small transcriptional survey on a cDNA library from Macrolampis sp2 firefly lanterns [30]. Here we present the first in-depth sequencing of polyadenylated RNAs from the New Zealand glowworm, and a detailed analysis at the transcriptomic level of luminescent versus nonluminescent tissue, through which we have identified six proteins that are likely to be involved in bioluminescence. This research provides a basis for biochemical studies into how the glowworm produces light, and a source of genetic information to aid future ecological and evolutionary studies of the glowworm.

Results and discussion

Experimental plan

We carried out two separate transcriptome sequencing experiments, on biological replicates, using the most appropriate bioinformatic processing and analyses for each approach. We first performed 454 GS-FLX sequencing because there was no genomics information available for the species, and then used Illumina sequencing to validate those results, and to provide greater sequencing depth to support a differential gene expression analysis. RNA was extracted from the light organ and from the rest of the body (non-light organ). The two experiments differed in the use of biological replicates, sequencing platform, and analysis software (Additional file 1: Figure S1).

Sequencing, read cleaning and de novo assembly

454

mRNA was isolated from two samples: one prepared from non-light organ tissue (~200 mg of tissue from eight glowworms) and one prepared from light organ tissue (~420 mg of tissue from 172 glowworms). After cDNA synthesis, 454 GS-FLX sequencing for each library was carried out on one-half of a pico-titer plate. A total of about 1.12 million high quality reads were obtained, amounting to almost 400 Mbp (Table 1; Fig. 1). Reads, including singletons, were merged from both libraries for de novo transcript assembly, which was carried out using CLC Genomics Workbench 5.1, and yielded a reference transcriptome of 18,794 transcripts, with an N50 of 897 (Table 2).
Table 1

Summary statistics for reads from 454 GS-FLX sequencing

Total HQ readsTotal HQ sequence (bp)Average read length% Mixed% Dots
Light organ tissue559 773192 760 67134416.1311.25
Non-light organ tissue564 835194 425 59434412.0711.37
Combined1 124 608387 186 26534414.0911.31

HQ = high quality; % Mixed = percentage of reads filtered out by the mixed filter, where a mixed read is the result of simultaneously sequencing a mixture of different DNA molecules; % Dots = percentage of reads filtered by the dots filter, where a dot is an instance of three successive nucleotide flows that record no incorporation

Fig. 1

Distribution of read lengths from 454 sequencing of light organ (blue) and non-light organ (red) cDNA libraries

Table 2

De novo assembly statistics

InputAssembly softwareNumber of transcriptsNumber of bases assembledN50 (bp)Number of singleton readsMedian contig length (bp)Mean contig length (bp)Maximum contig length (bp)
454 reads (two libraries merged)CLC Genomics Workbench18 79414 257 48689794 75358675911 217
Illumina reads (six libraries merged)Trinity196 766187 289 9211 828NA44795230 278

N50 size = the length such that 50 % of the assembled genome lies in N50 size or greater; NA = not applicable to results from this assembly program

Summary statistics for reads from 454 GS-FLX sequencing HQ = high quality; % Mixed = percentage of reads filtered out by the mixed filter, where a mixed read is the result of simultaneously sequencing a mixture of different DNA molecules; % Dots = percentage of reads filtered by the dots filter, where a dot is an instance of three successive nucleotide flows that record no incorporation Distribution of read lengths from 454 sequencing of light organ (blue) and non-light organ (red) cDNA libraries De novo assembly statistics N50 size = the length such that 50 % of the assembled genome lies in N50 size or greater; NA = not applicable to results from this assembly program

Illumina

We sequenced six cDNA libraries using the Illumina HiSeq-2000 sequencer, each prepared from either the light organ or non-light organ mRNA of three individual glowworms. The increased number of biological replicates in the second experiment provided it with increased statistical power for the subsequent RNA-seq analysis [31]. The Illumina platform also provided us with information on strand origin, i.e. from which of the two DNA strands a given RNA transcript was derived. This information can increase the percentage of alignable reads, thereby improving transcript reconstruction compared with non-strand specific data of known strand origin [32]. The Illumina machine generated 37.1 to 44.7 million pairs of 200 base length paired-end reads for each library (Table 3). In order to improve the quality of the data, we removed adapter sequences, trimmed low quality bases (Q < 20) from both ends of reads and discarded reads less than 50 bases in length. The resulting 29.5 to 35.7 million high quality reads per library (79 to 80 % of total raw reads) were merged together for de novo transcript assembly using the Trinity package, producing a de novo assembly containing 187,289,921 bases and a total of 196,766 transcripts (Table 2). A graph of the contig length distribution highlights the differences in contig size and number between the two assemblies (Fig. 2).
Table 3

Summary statistics for reads from Illumina sequencing

SampleRaw readsTrimmed, quality filtered readsTrimmed, quality filtered reads (% of raw reads)
Glowworm 1 light organ44 776 27235 671 87280 %
Glowworm 1 non-light organ39 450 26831 259 20279 %
Glowworm 2 light organ41 122 41432 769 96280 %
Glowworm 2 non-light organ37 092 45229 526 44480 %
Glowworm 3 light organ40 295 84432 042 38480 %
Glowworm 3 non-light organ39 697 56831 624 49880 %
Combined242 434 818192 894 36280 %
Fig. 2

Distribution of contig lengths for 454/CLC Genomics Workbench (green) and Illumina/Trinity (orange) assemblies

Summary statistics for reads from Illumina sequencing Distribution of contig lengths for 454/CLC Genomics Workbench (green) and Illumina/Trinity (orange) assemblies

Functional annotation and gene ontology of the glowworm transcriptome

In order to provide descriptions of the functions and properties of as many glowworm gene products as possible, BLASTX searches were performed for each transcript above 1 kb in size from the Illumina/Trinity transcriptome assembly against the Drosophila RefSeq non-redundant database at the National Centre for Biotechnology Information (NCBI). 38,259 of the 55,997 transcripts matched to a known protein within the database with a score of E < 10−6). 32 % of the transcripts (17,738) did not have a BLAST result. Some of these sequences may not have a homolog in Drosophila; others may be from non-coding RNA sequences that are polyadenylated. Some sequences may be unmatched due to assembly errors. We used Blast2GO to assign Gene Ontology (GO) terms to transcripts with BLASTX matches. 34,332 transcripts were assigned GO terms (Fig. 3).
Fig. 3

Gene ontology annotations. Pictured are the top ten GO terms for each of the three GO categories

Gene ontology annotations. Pictured are the top ten GO terms for each of the three GO categories We also carried out an analysis to find out which metabolic processes predominate in the light organ. First of all we mapped all of the Illumina light organ reads back to the full Illumina transcriptome, then used an FDR adjusted enrichment test for the light organ over the whole transcriptome with 10 % as the cutoff (so that transcripts with less than 10 % of their length mapped by fragments were counted as not expressed). We used Blast2GO to assign GO terms to this subset of light organ transcripts. The results showed that there are a large range of metabolic processes occurring in the light organ, although none that could be confidently linked to its bioluminescent function (see Additional file 2: Table S1).

Read mapping, measurement of gene expression and differential expression analyses

For each experiment we assembled a database of transcripts de novo, since there is no reference genome available for Arachnocampa. 90.78 and 87.41 % of reads from the 454 light organ and non-light organ libraries, respectively, uniquely mapped to transcripts in the 454/CLC Genomics Workbench assembly. We normalized expression values for each sample by scaling so that the median values were made equal. A comparison of normalized expression values provided us with a list of 34 transcripts found at ≥ 10-fold higher levels in the light organ sample than in the rest of the glowworm body sample (Table 4). In addition, 4616 different transcripts were expressed in the light organ sample, but not in the rest of the glowworm body sample. The top 30 of these, ranked according to normalised expression levels, are listed in Table 5.
Table 4

Differential expression analysis for 454 data: transcripts expressed ≥ 10-fold more highly in glowworm light organ tissue than in non-light organ tissue

Light organRest of body
Contig numberGene length (bp)Total gene readsNormalized expression valuesRPKMTotal gene readsNormalized expression valuesRPKMDifference (normalized values)Fold change (normalized values)Putative identity from BLASTX search hits
166567119552387.22876.327.45.92379.8322.6phosphatidylethanolamine-binding protein
12303183217021651.21989.591310.31638.2127.0luciferin 4-monooxygenase
1019547825889622.511594.21477.461.69545.1124.3luciferin 4-monooxygenase
1344114521301415929.319193.387158.4126.015770.9100.6luciferin 4-monooxygenase
122831279446619.8746.736.24.9613.6100.0glutathione s-transferase
46371047273463.4558.425.14.0458.390.9sulfotransferase
123251830256924953006.22130.324.12464.782.3luciferin 4-monooxygenase
23771303339462.4557.148.16.5454.357.1aminoacylase-1
268420246557.168.811.31.055.843.9otopetrin
175527465131222.21472.6828.422.51193.843.0cleavage stimulation factor 64 kDa subunit
1188521984939.647.711.21.038.433.0ATP-binding cassette transporter
300812064972.287.012.21.77032.8GST-containing flywch zinc-finger protein
1626716474245.354.611.61.343.728.3protein lsm14 homolog b-like
12345961293541.9652.9719.315.3522.628.1carboxylesterase
266892872137.9166.125.74.5132.224.2tpa_inf: hdc07468
123101030452779.9939.71333.426.5746.523.4carboxylesterase
44142035138120.5145.245.24.1115.323.2sodium-dependent phosphate transporter
43767043280.897.313.83.07721.3carbonic anhydrase
1194813632937.845.611.91.535.919.9facr2_drome ame: full = fatty acyl- reductase cg8303
1672714983035.642.911.81.433.819.8short-chain dehydrogenase
1674711272844.253.212.31.941.919.2cofilin actin-depolymerizing factor homolog
33161969211922.811.31.117.714.6isoform a
980210352237.845.512.62.035.214.5kynurenine aminotransferase
966517821918.922.811.51.217.412.6transposable element tc3 transposase
116951953565161.434.13.246.912.4protein yellow
122231142365667.524.63.751.412.2ganglioside-induced differentiation-associated protein 1
1001026718119.8144.419.97.9109.912.1hypothetical protein Sulku_2095
326314101721.425.811.91.519.511.3multiple inositol polyphosphate phosphatase
49858511633.440.313.12.530.310.8ubiquitin fusion degradaton protein
135095893296.6116.3297.187.610.7kda midgut protein
163721493161922.911.81.417.210.6cytochrome p450
47317003178.794.827.66.071.110.4isoform b
70538171532.639.313.22.629.410.2No significant similarity found
166799604583.3100.438.36.67510.0hypothetical conserved protein

Transcripts are ranked according to normalized expression values; read counts were scaled so that the median values were made equal. RPKM = reads per kilobase of exon model per million mapped reads

Table 5

Differential expression analysis for 454 data: top 30 transcripts expressed in glowworm light organ tissue but not in non-light organ tissue

Contig numberGene length (bp)Total gene readsNormalized expression valuesRPKMPutative identity from BLASTX search hits
134434273471444.31740.2No significant similarity found
1041430464374.2450.8luciferin 4-monooxygenase
23921330156208.5251.2heat shock protein 70
1344856565204.5246.4sugar transporter sweet1-like isoform 2
1337581682178.6215.2No significant similarity found
1496714210125.2150.8No significant similarity found
9421515124149.4No significant similarity found
860684104.5126.0No significant similarity found
441610315493.1112.2No significant similarity found
1017878491.1109.8No significant similarity found
10298119689.6108.0hypothetical protein Sulku_2095
149575042588.2106.2No significant similarity found
161142787.6105.6No significant similarity found
17729168884.6102.0No significant similarity found
70685483.6100.8No significant similarity found
10061152781.898.6troponin i
14955181878.694.6No significant similarity found
287916007077.893.7protein maelstrom homolog
8070376.291.8No significant similarity found
14930118575.390.7No significant similarity found
1579949272.587.4No significant similarity found
1025274372.186.8No significant similarity found
7300228970.284.5No significant similarity found
10131104468.482.4No significant similarity found
145355264.677.9No significant similarity found
1507183364.277.4No significant similarity found
14973194764.177.3No significant similarity found
9201761.974.6No significant similarity found
15207173661.674.3No significant similarity found
788117460.873.2No significant similarity found

Transcripts areranked according to normalized expression values; read counts were scaled so that the median values were made equal. RPKM = reads per kilobase of exon model per million mapped reads

Differential expression analysis for 454 data: transcripts expressed ≥ 10-fold more highly in glowworm light organ tissue than in non-light organ tissue Transcripts are ranked according to normalized expression values; read counts were scaled so that the median values were made equal. RPKM = reads per kilobase of exon model per million mapped reads Differential expression analysis for 454 data: top 30 transcripts expressed in glowworm light organ tissue but not in non-light organ tissue Transcripts areranked according to normalized expression values; read counts were scaled so that the median values were made equal. RPKM = reads per kilobase of exon model per million mapped reads We mapped reads for each of the six samples in this experiment onto the Illumina/Trinity reference transcriptome assembly. 91 % of the reads from all six samples were matched to transcripts from the assembly set. Since there were three separate samples for each tissue type, we performed inter-sample normalization, so that cross sample comparison could be carried out without being biased by sequencing depth. We used a TMM method (trimmed-mean of M values) to accommodate the difference in sequencing depth between replicates by finding a scaling factor for each library that minimizes the log-fold changes between the samples. The scaling factor is used to normalise the expression values for each sample. Differential expression analysis was carried out on transcript expression values from all six Illumina-sequenced samples after adjusting for library size. Only six transcripts were considered to be expressed to a significantly higher level in the light organ than in the non-light organ tissue (false discovery rate of < 0.1); these are listed in Table 6.
Table 6

Transcripts from the Illumina sequenced samples that are significantly more highly expressed in the light organ relative to the rest of the body in the glowworm

Transcript numberRankLog2 fold changeLog2 of read count per million P valueFalse discovery ratePutative identity from BLASTX search hits
64201-seq119.9011.881.68E-060.054acyl-CoA synthetase/luciferin 4-monooxygenase
62762210.1515.443.05E-060.054acyl-CoA synthetase/luciferin 4-monooxygenase
6001439.919.093.14E-060.054aminoacylase
5113849.7110.984.75E-060.054phosphatidylethanolamine-binding protein
64201-seq2510.0511.125.13E-060.054acyl-CoA synthetase/luciferin 4-monooxygenase
5676869.8110.635.56E-060.054glutathione S-transferase

A positive value for log2 fold change indicates over-expression in light organ relative to non-light organ tissue

Transcripts from the Illumina sequenced samples that are significantly more highly expressed in the light organ relative to the rest of the body in the glowworm A positive value for log2 fold change indicates over-expression in light organ relative to non-light organ tissue

Differential expression analysis validation

When comparing the sequences of differentially expressed transcripts in the two experiments, it became apparent that the six transcripts from the Illumina experiment were all found to be in the top eight ranked transcripts from the 454 experiment (Table 7), with one transcript (annotated as 62762 in the Illumina experiment) listed twice in the top eight. The close matching of the results from these two separate experiments, despite differences in samples, sequencing platforms and analytical algorithms, effectively validates these results.
Table 7

Common differentially expressed transcripts from 454 and Illumina sequencing analyses

Rank (Illumina)Transcript number (Illumina)Rank (454)Contig number (454)Protein size (kDa)Putative identity from BLASTX search hitsSpeculative function
164201-seq171232558.6acyl-CoA synthetase/luciferin 4-monooxygenasebioluminescence catalysis
2627623,410195, 1344159.0acyl-CoA synthetase/luciferin 4-monooxygenasebioluminescence catalysis
3600148237745.0aminoacylaseprocessing of bioluminescent substrate
45113811665620.1phosphatidylethanolamine-binding proteinATP binding
564201-seq221230358.2acyl-CoA synthetase/luciferin 4-monooxygenasebioluminescence catalysis
65676851228325.0glutathione S-transferasedirectly or indirectly involving glutathione in bioluminescence
Common differentially expressed transcripts from 454 and Illumina sequencing analyses The number of differentially expressed transcripts is small in both analyses. This may be because there are Malpighian tubules in both tissue types: the light organ tissue sample contains the modified Malpighian tubule tips that produce light, and the non-light organ sample contains the remaining non-luminescent parts of the Malpighian tubules. Presuming the modified Malpighian tubule tips retain some of the same functionality as the remainder of the tubules, both samples would share many of the same transcripts.

Functional annotation of genes highly expressed in the light organ

The proteins encoded by these six transcripts are likely to play important roles in the bioluminescence of the glowworm, assuming that the transcript levels are equivalent to protein levels. In order to find out what these roles might be, we searched for annotated sequence homologs in the publically available non redundant Genbank protein sequence database using the BLASTX algorithm. The resulting annotations will need to be confirmed using biochemical investigation of both the native and recombinant forms of the encoded proteins.

64201-seq1, 64201-seq2, 62762

The proteins encoded by these three transcripts all display the signature motifs of the ANL superfamily of adenylating enzymes [33] (Fig. 4). The three main subfamilies in the ANL superfamily include the Acyl-CoA synthetases, the NRPS adenylation domains, and the beetle (firefly) Luciferase enzymes. Despite catalyzing a wide range of different overall reactions, ANL enzymes all use a two-step reaction where the first step is always the activation of a carboxylate substrate with ATP to form an adenylate intermediate. These three glowworm proteins are very similar to firefly luciferase (31 to 37 % identical with luciferase from Photinus pyralis). The glowworm proteins are very similar and two appear to be isoforms. 64201_seq1 and 64201_seq2 are 79 % identical (the Trinity software labelled them with the same number and ‘seq1’ or ‘seq2’ to reflect this), and 62762 is 43 and 45 % identical to 64201_seq2 and 64201_seq1, respectively. The differences between these three proteins do not appear to be due to alternative splicing, since the differences in sequence are scattered throughout the proteins (Fig. 4).
Fig. 4

Alignment of amino acid sequences encoded by transcripts 64201-seq1, 64201-seq2, and 62762. The alignment was carried out using Clustal Omega (http://www.ebi.ac.uk) and visualised using Jalview (http://www.jalview.org). Residues are colored according to conservation of sequence identity (dark blue: 100 % conserved). Black boxes represent positions of ATP-binding motifs conserved throughout the ANL superfamily [33], and red boxes represent luciferin-binding residues from the beetle luciferase [62, 63]. The residue marked with a ‘#’ plays a key role in the firefly luciferase adenylation half reaction, and the residue marked with a ‘*’ plays a key role in the oxidation (light-producing) half reaction [64]

Alignment of amino acid sequences encoded by transcripts 64201-seq1, 64201-seq2, and 62762. The alignment was carried out using Clustal Omega (http://www.ebi.ac.uk) and visualised using Jalview (http://www.jalview.org). Residues are colored according to conservation of sequence identity (dark blue: 100 % conserved). Black boxes represent positions of ATP-binding motifs conserved throughout the ANL superfamily [33], and red boxes represent luciferin-binding residues from the beetle luciferase [62, 63]. The residue marked with a ‘#’ plays a key role in the firefly luciferase adenylation half reaction, and the residue marked with a ‘*’ plays a key role in the oxidation (light-producing) half reaction [64]

60014

The protein encoded by this transcript has 44 % amino acid sequence identity with human aminoacylase-1. Amino acids can be stored with an acyl group attached to their N-terminus, which makes them more stable. Aminoacylase-1 removes the acyl group, making the amino acid available for protein synthesis and other metabolic roles [34, 35], and acts specifically on mercapturic acids (S-conjugates of N-acetyl-L-cysteine) and neutral aliphatic N-acyl-α-amino acids. If the glowworm luciferin is revealed to be a derivative of an amino acid, as it appears to be for the unrelated Siberian luminous earthworm Fridericia heliota [36], it is possible that the glowworm might store the luciferin substrate in a stable acylated form and 60014 could deacetylate the substrate, making it available for the bioluminescent reaction. There is, however, no other evidence for this involvement, and the identity of the luciferin is unclear at this stage.

51138

This transcript encodes a member of the phosphatidylethanolamine-binding protein superfamily. Proteins in this family generally play roles in modulating cellular signaling [37]. At a molecular level they have been found to bind nucleotides, opioids and phosphatidylethanolamine. They can also bind kinases, leading to inhibition or activation of signalling pathways. From this information we can infer that 51138 may be involved in the modulation of a bioluminescence signaling pathway.

56768

This protein is a member of the glutathione S-transferase (GST) family of proteins, which play an important role in insecticide resistance and protection against oxidative stress. Members of this family catalyze the conjugation of reduced glutathione to a variety of exogenous and endogenous hydrophobic electrophiles for the purpose of detoxification [38]. 56768 has closest homology with the Delta class of insect GSTs, and has 45 % identity with a mosquito Delta GST that has DDT dehydrochlorinase activity [39]. Therefore glutathione may play a role in glowworm bioluminescence, either directly or indirectly, although it is unclear at this stage what this role might be.

Evolution of bioluminescence in glowworms

It is interesting that the firefly and glowworm luciferase enzymes belong to the same family of enzymes (assuming that one or more of the three ANL proteins from the light organ are confirmed biochemically to be the glowworm luciferases). In one way it is not unexpected as both are from the same class (Insecta), but because of the evolutionary distance between the glowworm and the firefly, and because the two bioluminescent systems use different substrates [9], it seemed likely that the proteins would differ significantly. However, these differential transcriptomic analyses, and the observation that both the glowworm and the firefly use ATP as a cofactor [10], suggest that the two luciferases may indeed have evolved from the same ancestral non-bioluminescent enzyme. Other explanations for both glowworms and fireflies having a similar luciferase enzyme are unlikely, such as horizontal gene transfer, or the possibility that the insect ancestral to both Coleoptera and Diptera was bioluminescent and passed the bioluminescent gene to both fireflies and glowworms but not to the vast majority of other non-bioluminescent Coleoptera and Diptera that exist today. The independent evolution of the two bioluminescent enzymes from a non-bioluminescent ancestral acyl-CoA synthetase enzyme is most likely, because it has been well established that the beetle luciferases evolved from non-bioluminescent acyl-CoA synthetases [40-42]. In addition, acyl-CoA synthetases from two nonluminous insects, the mealworm Zophobas morio [43] and the fruit fly Drosophila melanogaster [44], have been shown to bioluminesce faintly in the presence of either the firefly luciferin substrate or a synthetic analog of this luciferin. A firefly luciferase-like gene has also been identified from an animal unrelated to insects: the siliceous sponge Suberites domuncula [45]; however, this claim needs to be confirmed [46], especially since the native luciferase protein itself has not yet been isolated from the sponge tissue. We carried out a phylogenetic analysis of the three glowworm luciferase-like proteins along with known firefly luciferases and other luciferase-like proteins from various insects. Notably, the three glowworm proteins are grouped together and are placed closer to proteins from non-bioluminescent dipteran insects (D. melanogaster and A. aegypti) than to firefly luciferases (see Fig. 5 and Table 8).
Fig. 5

Unrooted phylogenetic tree of luciferases and related proteins. Details for each protein are provided in Table 8

Table 8

Details of luciferases and related proteins included in the phylogenetic analysis (Fig. 5)

OrganismName/accession number of proteinFunctionCatalyses bioluminescence with firefly luciferin?Reference
Arachnocampa luminosa (glowworm)64201_seq1candidate luciferaseNot tested-
Arachnocampa luminosa (glowworm)64201_seq2candidate luciferaseNot tested-
Arachnocampa luminosa (glowworm)62762candidate luciferaseNot tested-
Photinus pyralis (North American firefly)luciferase/P08659luciferaseYes[14]
Luciola cruciata (Japanese firefly)luciferase/P13129-1luciferaseYes[14]
Luciola cruciata (Japanese firefly)LcLL1/BAE80728luciferase-like proteinNo[42, 65]
Luciola cruciata (Japanese firefly)LcLL2/BAE80729luciferase-like proteinNo[42, 65]
Drosophila melanogaster (fruit fly)CG6178firefly luciferase homolog (closest homolog to firefly luciferase in D. melanogaster)No[65]
Drosophila melanogaster (fruit fly)pdgy/NP_572988acyl Co-A synthetaseNot tested[66]
Drosophila melanogaster (fruit fly)CG4830predicted acyl-CoA synthetaseNo[65]
Aedes aegypti (mosquito)AAEL000101-PAAMP-dependent CoA ligase homologNot tested-
Tenebrio molitor (mealworm)TmLL-1/BAE95689luciferase-like proteinNo[67]
Tenebrio molitor (mealworm)BAE95690/TM-LL2acyl-CoA synthaseNo[67]
Tenebrio molitor (mealworm)TmLL-3/BAE95691luciferase-like proteinNo[67]
Zophobas morio (giant mealworm)Luc-likeluciferase-like proteinWeak bioluminescence[43]
Unrooted phylogenetic tree of luciferases and related proteins. Details for each protein are provided in Table 8 Details of luciferases and related proteins included in the phylogenetic analysis (Fig. 5) It is uncertain why there are three isoforms of the firefly-luciferase-like protein expressed in the glowworm light organ. The firefly Luciola cruciata expresses three isoforms, where only one has bioluminescent activity [42], so it is possible that the glowworm may also follow a similar pattern of gene duplication. Gene duplication followed by enzyme diversification and the development of novel functions has been a feature previously observed in vertebrate acyl-CoA synthetase genes [47]. There appear to be many duplication events in the evolution of the acyl-CoA synthetase enzyme family in insects. We used tBLASTn to search the total glowworm transcriptome from the Illumina experiment and found that the three glowworm proteins have numerous paralogs: between 65 and 68 transcripts have 20 to 50 % identity with the three protein sequences. In addition, the limited transcriptional profiling studies by Silva et al. mentioned previously found about 1 % of the 537 A. luminosa light organ cDNAs and 2 % of the 572 firefly lantern cDNAs sequenced were members of the acyl-CoA synthetase enzyme family (also known as AMP/CoA-ligases) [29, 30].

Conclusions

This report describes the first high-throughput transcriptome sequencing of the New Zealand glowworm, and the use of comparative RNAseq to identify genes expressed in luminescent tissue that are involved in bioluminescence. Two separate differential expression analyses have identified six genes that are significantly more highly expressed in the light organ than in non-luminescent tissue. These genes encode putative aminoacylase, GST and phosphatidylethanolamine-binding proteins, and, most notably, three proteins that are homologs of firefly luciferase, at least one of which we expect to be the glowworm luciferase. Interestingly, in the Silva et al. study of glowworm light organ cDNAs [29], one of the members of the acyl-CoA synthetase enzyme family sequenced showed 44–45 % identity with railroad worm luciferases, and 2.1 % of the transcripts sequenced were putative GST proteins. This, combined with our research underlines the potential importance of these sequences in glowworm bioluminescence. Further biochemical studies are required to confirm that one or more of the candidate luciferases are able to produce light. These studies should include two approaches: firstly, produce these proteins in a recombinant form and assay them for bioluminescent activity using the native luciferin substrate extracted from the glowworm; secondly, isolate the native luciferase protein(s) from the light organ tissue, using the same assay to track bioluminescent activity, and identify the isolated protein(s) using mass spectrometry and the transcriptome database generated in the current study. If the candidate luciferase(s) is/are confirmed, then this will show that this enzyme has independently evolved the ability to produce light at least twice in extant organisms, in New Zealand glowworms and in fireflies, but with different substrates. Once the identity of the glowworm enzyme has been confirmed, and the chemistry of the glowworm substrate has been revealed, potential applications of the novel glowworm bioluminescence system can be explored.

Methods

Sample collection and RNA extraction

A. luminosa is an endemic species and is of particular interest to the indigenous Maori people of New Zealand. Therefore local Maori were consulted about this research through Te Komiti Rakahau ki Kāi Tahu (the Ngāi Tahu Research Consultation Committee). As glowworms are invertebrates, the approval of an ethics committee was not required. A. luminosa is not protected, and no experiment was performed on living animals. A. luminosa specimens were collected from locations around New Zealand, including Dunedin, Te Anau and Palmerston North. Live glowworms were snap frozen by being placed on foil above dry ice, and then stored until required at −80 °C. Using a razor, light organs were removed from the glowworm bodies while still frozen. Light organ samples contained only white light organ tissue, and non-light organ samples contained the rest of the glowworm body and head (darker tissue) with white light organ tissue removed entirely. Tissues were homogenised with TRIzol® Reagent (Invitrogen) and a glass dounce homogeniser; total RNA was extracted using UltraPure™ (Phenol: Chloroform:Isoamyl Alcohol; Invitrogen), and then further purified using the RNeasy Kit (Qiagen). RNA quantification and integrity assessment were performed for each sample on an RNA chip (Bioanalyzer 2100, Agilent Technologies).

cDNA library construction, sequencing and quality control

Two hundred μg of non-light organ RNA (prepared from ~200 mg of non-light organ tissue from eight glowworms) and 65 μg of light organ RNA (prepared from ~420 mg of light organ tissue from 172 glowworms) were sent to the 454 Life Sciences Sequencing Center (Roche, Branford, Connecticut, USA) for mRNA enrichment, cDNA library construction (including mRNA fragmentation using a zinc chloride solution, cDNA synthesis using random hexamer primers and adaptor ligation) and subsequent 454 sequencing on a Roche GS FLX Titanium Genome Sequencer. Each sample was sequenced on one half of a picotiter plate. Low quality reads were discarded (quality limit of 0.05). We also removed: ambigious nucleotides (maximum of 5 nucleotides allowed), terminal nucleotides (1 each from both the 5′ and 3′ ends), adapter sequences and sequences less than 20 nucleotides in length. Light organ and non-light organ total RNA was prepared separately from three individual glowworms (0.6 to 1.2 μg of RNA for each light organ sample, and 8.8 to 27.0 μg of RNA for each non-light organ sample). The six samples, each with an RNA Integrity Number (RIN) of over 6, were sent to the Otago Genomics and Bioinformatics Facility, where mRNA was isolated using oligo-dT magnetic beads, and cDNA libraries were constructed using the Illumina TruSeq Stranded mRNA Sample Preparation Kit. Sequencing was carried out on an Illumina HiSeq-2000 sequencer, generating 100 bp paired-end reads. Each sample was run on one eighth of a sequencing lane. Adaptor sequences were trimmed from reads using fastq-mcf [48], and bases with low quality phred scores trimmed (cut-off phred score of Q20). Adapter and quality trimmed reads less than 50 nucleotides in length were discarded using the SolexaQA package [49]. Reads were quality assessed using FASTQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc).

De novo assembly

We used the CLC Genomics Workbench v5.1 (http://www.clcbio.com) to de novo-assemble 454 reads from the combined light organ and non-light organ samples into a single, non-redundant contig dataset. Singletons were merged into the contig set. Assembly parameters were set at the program default settings. Reads combined from all light organ and non-light organ samples were assembled using the Trinity software package [50]. Assembly parameters were adjusted for Illumina stranded paired end sequencing (i.e. –left and –right for both R1 and R2 using --SS_lib_type RF for the stranded library type).

Read mapping and measurement of gene expression

We used the CLC Genomics Workbench to map reads from each of the 454 light organ and non-light organ samples back onto the CLC Genomics Workbench/454 assembly dataset. Expression of genes was represented by the abundance of reads uniquely mapped to each contig in the assembly. Abundance was expressed as RPKM (reads per kilobase of exon model per million mapped reads) or normalized expression values (the contig read counts were scaled so that the median values were made equal). For each of the six Illumina samples, reads were mapped onto the Trinity/Illumina assembly using Bowtie 2 [51], and transcript abundance for each sample was estimated using the RSEM package [52]. Per sample relative abundance was estimated using FPKM (fragments per kilobase of transcript per million fragments mapped), EC (expected counts) and TPM (transcripts per million). For cross-sample comparisons, we normalized raw read counts for these samples using the TMM method (trimmed-mean of M values; a weighted trimmed mean of the log expression ratios [53]) as implemented in the edgeR package [54] (http://www.bioconductor.org).

Differential expression analyses

To identify candidate bioluminescence-related genes from our datasets, we compared normalized transcript abundance values between corresponding light organ and non-light organ samples. In order to detect the differentially expressed genes between the two 454-sequenced samples, a two-group differential analysis was performed in CLC Genomics Workbench on normalized expression values. Genes with significantly different levels of expression between light organ and non-light organ samples (a single factor design) were identified using the quantile-adjusted Conditional Maximum Likelihood method (qCML) in the edgeR package [55]. We considered transcripts to be differentially expressed to a significant level at a false discovery rate at < 0.1.

Functional annotation

Transcripts that were expressed at significantly higher levels in light organ compared with non-light organ tissue were annotated by searching the GenBank non-redundant database at the NCBI (http://www.ncbi.nlm.nih.gov) using the BLASTX software [56] with default parameters. Automated annotation of the transcriptome database was carried out using BLASTX searches to the NCBI non-redundant database within the Blast2GO program v2.8.0 (http://www.blast2go.com) [57]. An E-value cut-off of 1E−6 was used to identify similar annotated proteins from which function could be inferred.

Phylogenetic analysis

Multiple alignments were produced using MUSCLE [58, 59]. A phylogenetic analysis was carried out in MrBayes [60], using the WAG model of protein evolution [61] which was found to be the most appropriate after tests with mixed models. Monte-Carlo Markov chains were run for 1 000 000 generations with the initial 25 % of trees discarded as burn-in. The consensus trees produced were visualized with CLC Genomics Workbench.

Availability of supporting data

The raw sequence data from the Illumina experiment was submitted in FASTQ format to the NCBI Sequence Read Archive (SRA) database (accessions: SRR2241413, SRR2283829, SRR2283830, SRR2283831, SRR2283975 and SRR2284246) and are also accessible through the BioProject accession PRJNA290397 (http://www.ncbi.nlm.nih.gov/bioproject/290397). The Illumina/Trinity transcriptome assembly has been deposited at the NCBI Transcriptome Shotgun Assembly (TSA) database and is available at the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI under the accession GDQV00000000.
  55 in total

1.  A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach.

Authors:  S Whelan; N Goldman
Journal:  Mol Biol Evol       Date:  2001-05       Impact factor: 16.240

Review 2.  Firefly luciferase: an adenylate-forming enzyme for multicatalytic functions.

Authors:  Satoshi Inouye
Journal:  Cell Mol Life Sci       Date:  2009-10-27       Impact factor: 9.261

Review 3.  Application of enzyme bioluminescence for medical diagnostics.

Authors:  Ludmila A Frank; Vasilisa V Krasitskaya
Journal:  Adv Biochem Eng Biotechnol       Date:  2014       Impact factor: 2.635

4.  Complete genome sequence of Vibrio fischeri: a symbiotic bacterium with pathogenic congeners.

Authors:  E G Ruby; M Urbanowski; J Campbell; A Dunn; M Faini; R Gunsalus; P Lostroh; C Lupp; J McCann; D Millikan; A Schaefer; E Stabb; A Stevens; K Visick; C Whistler; E P Greenberg
Journal:  Proc Natl Acad Sci U S A       Date:  2005-02-09       Impact factor: 11.205

5.  Two bioluminescent diptera: the North American Orfelia fultoni and the Australian Arachnocampa flava. Similar niche, different bioluminescence systems.

Authors:  Vadim R Viviani; J Woodland Hastings; Thérèse Wilson
Journal:  Photochem Photobiol       Date:  2002-01       Impact factor: 3.421

6.  A mutagenesis study of the putative luciferin binding site residues of firefly luciferase.

Authors:  Bruce R Branchini; Tara L Southworth; Martha H Murtiashaw; Henrik Boije; Sarah E Fleet
Journal:  Biochemistry       Date:  2003-09-09       Impact factor: 3.162

7.  Diversity and history of the long-chain acyl-CoA synthetase (Acsl) gene family in vertebrates.

Authors:  Mónica Lopes-Marques; Isabel Cunha; Maria Armanda Reis-Henriques; Miguel M Santos; L Filipe C Castro
Journal:  BMC Evol Biol       Date:  2013-12-12       Impact factor: 3.260

8.  edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

Authors:  Mark D Robinson; Davis J McCarthy; Gordon K Smyth
Journal:  Bioinformatics       Date:  2009-11-11       Impact factor: 6.937

9.  Genomic organization, evolution, and expression of photoprotein and opsin genes in Mnemiopsis leidyi: a new view of ctenophore photocytes.

Authors:  Christine E Schnitzler; Kevin Pang; Meghan L Powers; Adam M Reitzel; Joseph F Ryan; David Simmons; Takashi Tada; Morgan Park; Jyoti Gupta; Shelise Y Brooks; Robert W Blakesley; Shozo Yokoyama; Steven Hd Haddock; Mark Q Martindale; Andreas D Baxevanis
Journal:  BMC Biol       Date:  2012-12-21       Impact factor: 7.431

10.  MUSCLE: a multiple sequence alignment method with reduced time and space complexity.

Authors:  Robert C Edgar
Journal:  BMC Bioinformatics       Date:  2004-08-19       Impact factor: 3.169

View more
  9 in total

Review 1.  Oxytocin/vasopressin-like neuropeptide signaling in insects.

Authors:  Edin Muratspahić; Emilie Monjon; Leopold Duerrauer; Stephen M Rogers; Darron A Cullen; Jozef Vanden Broeck; Christian W Gruber
Journal:  Vitam Horm       Date:  2019-10-18       Impact factor: 3.421

2.  Detection of light and vibration modulates bioluminescence intensity in the glowworm, Arachnocampa flava.

Authors:  Rebecca Mills; Julie-Anne Popple; Martin Veidt; David John Merritt
Journal:  J Comp Physiol A Neuroethol Sens Neural Behav Physiol       Date:  2016-02-20       Impact factor: 1.836

3.  RNA-Seq analysis of the blue light-emitting Orfelia fultoni (Diptera: Keroplatidae) suggest photoecological adaptations at the molecular level.

Authors:  Danilo T Amaral; Carl H Johnson; Vadim R Viviani
Journal:  Comp Biochem Physiol Part D Genomics Proteomics       Date:  2021-05-12       Impact factor: 3.306

4.  Mass spectrometry analysis and transcriptome sequencing reveal glowing squid crystal proteins are in the same superfamily as firefly luciferase.

Authors:  Gregory Gimenez; Peter Metcalf; Neil G Paterson; Miriam L Sharpe
Journal:  Sci Rep       Date:  2016-06-09       Impact factor: 4.379

5.  Transcriptome analysis reveals candidate genes involved in luciferin metabolism in Luciola aquatilis (Coleoptera: Lampyridae).

Authors:  Wanwipa Vongsangnak; Pramote Chumnanpuen; Ajaraporn Sriboonlert
Journal:  PeerJ       Date:  2016-10-04       Impact factor: 2.984

6.  Symplectin evolved from multiple duplications in bioluminescent squid.

Authors:  Warren R Francis; Lynne M Christianson; Steven H D Haddock
Journal:  PeerJ       Date:  2017-07-31       Impact factor: 2.984

7.  De novo transcriptome analysis of the excretory tubules of Carausius morosus (Phasmatodea) and possible functions of the midgut 'appendices'.

Authors:  Matan Shelomi
Journal:  PLoS One       Date:  2017-04-06       Impact factor: 3.240

8.  Mycena genomes resolve the evolution of fungal bioluminescence.

Authors:  Huei-Mien Ke; Hsin-Han Lee; Chan-Yi Ivy Lin; Yu-Ching Liu; Min R Lu; Jo-Wei Allison Hsieh; Chiung-Chih Chang; Pei-Hsuan Wu; Meiyeh Jade Lu; Jeng-Yi Li; Gaus Shang; Rita Jui-Hsien Lu; László G Nagy; Pao-Yang Chen; Hsiao-Wei Kao; Isheng Jason Tsai
Journal:  Proc Natl Acad Sci U S A       Date:  2020-11-23       Impact factor: 11.205

9.  New Zealand glowworm (Arachnocampa luminosa) bioluminescence is produced by a firefly-like luciferase but an entirely new luciferin.

Authors:  Oliver C Watkins; Miriam L Sharpe; Nigel B Perry; Kurt L Krause
Journal:  Sci Rep       Date:  2018-02-19       Impact factor: 4.379

  9 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.