Literature DB >> 28500331

Transcriptome sequencing and phylogenetic analysis of four species of luminescent beetles.

Kai Wang1, Wei Hong1, Hengwu Jiao1, Huabin Zhao2.   

Abstract

The evolution of bioluminescence has prompted scientific attention to illuminate phylogenetic relationships of luminescent beetles. However, genomic resources are virtually lacking in rhagophthalmids (Rhagophthalmidae) and their related firefly beetles lampyrids (Lampyridae). Here, we employed the Illumina Hiseq 2000 platform and sequenced the whole-body transcriptomes of the four luminescent beetles: one rhagophthalmid (Rhagophthalmus sp.) and three fireflies (Asymmetricata circumdata, Aquatica ficta, and Pyrocoelia pectoralis). We obtained 55.4, 43.4, 38.6, and 36.7 million clean reads for the four species, respectively. All reads were assembled into contigs from which unigenes were derived. All unigenes were annotated by publicly available databases, and a total of 4325 orthologous genes were identified. Using multiple phylogenetic approaches, our transcriptome data confirmed the distinctiveness of Rhagophthalmidae from Lampyridae, which was also supported by our mitogenome analysis using three newly determined mitogenome sequences and 12 previously published ones. Together, this study is the first report of whole transcriptome sequencing data in Rhagophthalmidae and Lampyridae species, representing a valuable genomic resource for studying the origin and evolution of some remarkable traits in these beetles such as bioluminescence. Moreover, our transcriptome and mitogenome data provide useful phylogenetic information that could be of importance in future studies of phylogenetic inference.

Entities:  

Mesh:

Year:  2017        PMID: 28500331      PMCID: PMC5431921          DOI: 10.1038/s41598-017-01835-9

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

Bioluminescence is among the most spectacular features in living organisms, including numerous species of marine fishes, marine invertebrates, terrestrial invertebrates, fungi, bacteria, and protists[1, 2]. The uses of bioluminescence in nature involve a range of vital functions: camouflage, attraction, defense, warning, communication, mimicry, and illumination[2]. The chemical basis of the natural light-producing molecules has been elucidated in the last century[3], permitting discoveries of countless valuable applications in biology and medicine using luciferase-based systems[2]. In addition, industrial designers have been ambitious in utilizing the natural light-producing systems in bioluminescent organisms for decoration and street lighting[4]. Among these bioluminescent organisms, luminescent insects were primarily found in members of the three orders: Collembola (springtails), Diptera (flies), and Coleoptera (beetles)[5]. Within the superfamily Elateroidea of the order Coleoptera, several groups of beetles are able to produce and emit light such as fireflies (Lampyridae), railroad worms (Phengodidae), click beetles (Elateridae), and Rhagophthalmidae[5, 6]. The origin and evolution of bioluminescence in these beetles has prompted a number of studies to illuminate the phylogenetic relationships of these bioluminescent beetles[7-16]. Although both morphological[10–12, 14, 16] and molecular[7–9, 13, 15, 17] features were involved in these studies, molecular evidence has been expected to resolve taxonomic status and phylogenetic relationships that have been contentious based on morphological evidence[18], because the evolutionary history of morphological features is usually complex[18, 19]. Unfortunately, the incongruence of molecular evidence has also been observed in phylogenetic analyses concerning the phylogenetic relationships of these beetles[7–9, 13, 15], possibly because a single or a few genes lack sufficient phylogenetic signals[20]. For instance, Rhagophthalmidae had previously been assigned to the family Phengodidae inferred from morphological data[11]; other studies based on morphological, embryological and molecular evidence had assigned Rhagophthalmidae to be a subfamily or genus in the family Lampyridae[8, 14, 15, 21]; but more recent evidence from both morphological and molecular data had supported the familial status of Rhagophthalmidae[7, 9, 10, 13, 16, 22–24], which is distinct from both Lampyridae and Phengodidae. In addition, the taxonomic studies focusing on Lampyridae and Rhagophthalmidae have been scarce, especially for species distributed in China[15, 25, 26]. The scarcity calls for additional data of taxonomic importance. Phylogenomics, i.e. phylogenetic analysis involving genome-scale data, has been believed to outcompete single-gene phylogenetics, which frequently yielded conflicting results caused by stochastic errors from small-scale data sets[20]. However, genomic resources are extremely limited in Rhagophthalmidae and its related beetles, which may prevent their phylogenetic relationships from in-depth investigations. To provide novel genomic resources from Rhagophthalmidae and its related firefly beetles (Lampyridae), we employed the Illumina Hiseq 2000 platform and sequenced the whole-body transcriptomes of the four beetles: one rhagophthalmid beetle (Rhagophthalmus sp.) and three representatives of lampyrid beetles (Asymmetricata circumdata, Aquatica ficta, and Pyrocoelia pectoralis). To validate the phylogenetic inference based on transcriptome data, we employed the traditional Sanger sequencing to obtain two complete mitogenomes (Aquatica ficta and A. wuhana) and one nearly complete mitogenome (Lamprigera yunnana); we also identified the 13 protein coding genes (PCGs) in mitogenomes from each of the four whole-body transcriptomes (Supplementary Table S1); and we additionally retrieved 12 published complete mitogenomes from members of Rhagophthalmidae, Lampyridae, Phengodidae, Elateridae, Lycidae, Cantharidae, and Tenebrionidae (see the species names and GenBank accessions in Materials and Methods). Our transcriptome analysis recognized the distinctiveness of Rhagophthalmidae from Lampyridae, which was also supported by our mitogenome analysis.

Results

Transcriptome sequencing and assembly

Total RNA was isolated from each of the frozen whole-bodies of beetles. mRNA was purified from the total RNA using oligo-dT attached magnetic beads. Following mRNA fragmentation, cDNA synthesis, end repair, 3’ adenylation, adaptor ligation, and PCR enrichment, four RNA sequencing (RNA-Seq) libraries were constructed using the Illumina TruSeq RNA sample preparation kit. The raw sequence data were filtered by removing adaptors, low-quality reads, and ambiguous reads. We ultimately obtained 55.4, 43.4, 38.6, and 36.7 million clean reads for Rhagophthalmus sp., A. circumdata, A. ficta, and P. pectoralis, respectively (Table 1). These clean reads were separately assembled into 44883, 57254, 71424, and 80017 contigs, in which 15418, 24275, 31520, and 31356 unigenes (i.e. unique putative genes) were derived, respectively (Table 1). It appears that the transcriptome data in greater size tend to have less assembled contigs or unigenes. For example, the largest data (Rhagophthalmus sp.) contains the least contigs, while the smallest data (P.pectoralis) consists of the most contigs (Table 1). Details of clean reads, assembled contigs and unigenes of the four transcriptomes generated in this study were given in Table 1. In addition, we compared the 13 PCGs in A. ficta mitogenome identified from the Illumina-based transcriptome assembly with the same genes obtained from the traditional Sanger sequencing for another individual of the same species, and found that a total of 99.3% sequences from both sequencing approaches are identical. This finding suggests that our transcriptome assemblies are reliable and our sequencing coverages are acceptable.
Table 1

Statistics of clean reads, assembled contigs and unigenes of the four transcriptomes generated in this study. Abbreviation: nt, nucleotide(s).

Species Rhagophthalmus sp.A. circumdata A. ficta P. pectoralis
Clean reads
 Number of reads55,374,19443,437,50438,642,10236,663,932
 Number of bases (nt)5,496,107,2763,991,132,6863,909,904,6303,532,502,346
 Mean length (nt)99969696
Contigs
 Number44,88357,25471,42480,017
 Mean length (nt)1,4611,0871,1661,283
 N50 statistics (nt)2,9812,0162,4612,622
Unigenes
 Number15,41824,27531,52031,356
 Mean length (nt)1,451958838923
 N50 statistics (nt)2,1971,5561,4371,690
Statistics of clean reads, assembled contigs and unigenes of the four transcriptomes generated in this study. Abbreviation: nt, nucleotide(s).

Functional annotation of unigenes

A total of 10345 (percentage of all unigenes in a species: 67.10%), 15352 (63.24%), 17480 (55.46%),and 15342 (48.93%) unigenes of Rhagophthalmus sp., A. circumdata, A. ficta, and P. pectoralis were respectively annotated by at least one of the publicly available databases: Swiss-prot, NCBI non-redundant protein (NR), Clusters of Orthologous Groups of proteins (COG), and Gene Ontology (GO) (Table 2, Supplementary Dataset S1). We used all unigenes as query and ran the TBLASTX program to search against the Swiss-prot, NR, COG, and GO databases with an E-value of 1.0E-5. For example, we identified 10322 best blast hits in the NR database when annotating the transcriptome of Rhagophthalmus sp., comprising 66.95% of all unigenes (Table 2); while only 4845 best hits were detected in the GO database for the same species, covering 31.42% of all unigenes (Table 2). Indeed, the NR database generally annotated the highest percentages of unigenes, whereas the GO database typically annotated the lowest (Table 2). Overall, Rhagophthalmus sp. has a higher percentage of unigenes being annotated compared to the other three beetles (Table 2), possibly because a greater size of sequencing data can give a better assembly for the Rhagophthalmus beetle (Table 1). In addition, the distributions of both GO terms and COG classes were similar among the four transcriptomes, except A. circumdata showing a unique expansion in a single GO term (Fig. 1).
Table 2

Summary of unigene annotations for the four transcriptomes.

Database Rhagophthalmus sp. A. circumdata A. ficta P. pectoralis
Swiss-prot8,447 (54.79%)11,171 (46.02%)12,042 (38.20%)11,263 (35.92%)
NR10,322 (66.95%)15,315 (63.09%)17,432 (55.30%)15,304 (48.81%)
GO4,845 (31.42%)6,403 (26.38%)7,235 (22.95%)6,428 (20.50%)
COG9,027 (58.55%)12,197 (52.25%)13,357 (42.38%)12,415 (39.59%)
Total10,345 (67.10%)15,352 (63.24%)17,480 (55.46%)15,342 (48.93%)

Percentages of all unigenes in a given species were shown in parentheses. Abbreviations: NR, NCBI non-redundant proteins; COG, Clusters of Orthologous Groups of proteins; GO, Gene Ontology.

Figure 1

Functional classification of unigenes derived from the four transcriptomes based on the GO (Gene Ontology) annotation (A) and the COG (Cluster of Orthologous Groups) annotation (B).

Summary of unigene annotations for the four transcriptomes. Percentages of all unigenes in a given species were shown in parentheses. Abbreviations: NR, NCBI non-redundant proteins; COG, Clusters of Orthologous Groups of proteins; GO, Gene Ontology. Functional classification of unigenes derived from the four transcriptomes based on the GO (Gene Ontology) annotation (A) and the COG (Cluster of Orthologous Groups) annotation (B).

Ortholog identification

Genes from the reference genome of the red flour beetle Tribolium castaneum were respectively compared with all unigenes from each of the four transcriptome assemblies using the TBLASTX searches[27]. By examining reciprocal (or bi-directional) best blast hits, putative orthologs were identified. If the BLAST score ratio of the second best-hit to the first best-hit is greater than 0.8, we excluded the putative ortholog for further analysis, because the candidate ortholog is possibly a paralog due to the high level of sequence similarity. The resulting putative orthologs range from 6992 to 7612 (Table 2). Each ortholog from the red flour beetle and the four beetles with transcriptome data was aligned by PRANK version 100802[28], and poorly aligned positions and divergent regions in alignments were filtered by GBLOCKS version 0.91b[29]. After discarding the alignments with an aligned region shorter than 100 nt (nucleotides), we obtained 4325 putative orthologs with high-quality alignments for subsequent analyses.

Mitogenome sequencing

Using the traditional Sanger sequencing approach, we sequenced two complete mitogenomes (A. ficta and A. wuhana) and one nearly complete mitogenome (L. yunnana) (Table 3). We were unable to sequence the region between ND2 and 12S rRNA in L. yunnana despite multiple attempts (Table 3), possibly because this region contains the A+T-rich region which may pose technical issues in sequencing[30]. All the 13 protein coding genes (PCGs) from each of the three mitogenomes were complete except the ND2 in L. yunnana lacking a segment near its 5’ end (Table 3). All the 13 PCGs, 2 rRNAs, 22 transfer RNAs (tRNAs) and 1 A-T-rich region common to the vast majority of animal mitogenomes[31] were identified in the three newly sequenced mitogenomes, except for the three tRNAs (tRNATrp, tRNACys, tRNATyr) and the A-T-rich region in L. yunnana (Table 3). Arrangements and orientations of all genes in the three mitogenomes are identical to other beetles[25, 32–34]. Similar to other firefly beetles, all the PCGs employed traditional mitochondrial start codons ATN and terminated with TAA, TAG or single T (Table 3).
Table 3

Annotations of the three newly sequenced mitochondrial genomes.

GeneDirection Aquatica wuhana Aquatica ficta Lamprigera yunnana
FromToStart/stop codonFromToStart/stop codonFromToStart/stop codon
tRNA Ile Forward164ATA/TAA163ATA/TAGn.a./TAA
tRNA Gln F6213061129
tRNA Met Reverse130195129194
ND2 F196120919512081* 878
tRNA Trp F1211127512101274882942
tRNA Cys R134214041340140211431203
tRNA Tyr R140414661402146512031265
COI F14383003ATT/TAA14583002ATT/TAA12372802ATT/TAA
tRNA Leu F299930622988306127982861
COII F30643742ATG/T+tRNA30633741ATG/T+tRNA28353540ATA/T+tRNA
tRNA Lys F374338133742381235413610
tRNA Asp F381338763812387436103675
ATP8 F38774032ATT/TAA38754030ATT/TAA36853828ATA/TAA
ATP6 F40264700ATG/TAA40244698ATG/TAA38224485ATG/T+tRNA
COIII F47005483ATG/T+tRNA46985481ATG/T+tRNA44865266ATG/T+tRNA
tRNA Gly F548455465482554552675328
ND3 F55475900ATA/TAG55465899ATT/TAG53355682ATA/TAG
tRNA Ala F589959625898596156815744
tRNA Arg F596260265961602557435805
tRNA Asn F602660906025609058045865
tRNA Ser F609161576091615758585920
tRNA Glu F615862206158622159215983
tRNA Phe R621962816220628359826041
ND5 R62827989ATT/T+tRNA62847994ATA/T+tRNA60397749ATT/T+tRNA
tRNA His R799080527992805477507810
ND4 R80539379ATG/T+tRNA80559381ATG/T+tRNA78109135ATG/TAG
ND4L R93739663ATG/TAA93759665ATG/TAA91299416ATG/TAA
tRNA Thr F966597279667972894189480
tRNA Pro R972797909729979494819544
ND6 F979210277ATA/TAA979610284ATA/TAA954610049ATT/TAA
CYTB F1027711410ATG/TAG1028411417ATG/TAG1005011177ATG/TAG
tRNA Ser F114091147411416114811117611240
ND1 R1147912437ATT/TAG1149912443ATT/TAG1125712207ATG/TAG
tRNA Leu R124451250612451125131220912269
16SrRNA R125071377312514137761227013525
tRNA Val R137741384213777138461352513592
12SrRNA R138431458813847145961358814138*
A+T-rich region14589161861459716836n.a.

Incomplete sequenced region was indicated with an asterisk. Abbreviation: n.a., not available (due to incomplete sequencing).

Annotations of the three newly sequenced mitochondrial genomes. Incomplete sequenced region was indicated with an asterisk. Abbreviation: n.a., not available (due to incomplete sequencing).

Phylogenetic analysis based on nuclear gene sequences

Phylogenetic analysis was first undertaken with transcriptome-derived nuclear gene sequences. Given that Lampyridae was divided into two monophyletic clades, with one clade consisting of Lampyrinae and the other clade comprising Cyphonocerinae, Ototetrinae, Luciolinae[13], we selected one species P. pectoralis (Lampyrinae) from the first clade and two species A. ficta (Luciolinae) and A. circumdata (Luciolinae) from the second clade to perform whole-body transcriptome sequencing, in addition to the species Rhagophthalmus sp. The resulting transcriptome data were assembled and annotated, and a total of 4325 putative orthologs were identified after careful filtering of sequencing artifacts. A data set is considered to be mutationally saturated when multiple substitutions are dominating. Substitution saturation decreases phylogenetic information, and a highly saturated data set will produce an incorrect phylogeny[20]. To assess the impact of substitution saturation, we randomly selected 100 nuclear orthologs and performed substitution saturation tests. Results showed that the third codon sites of most nuclear orthologs have undergone substantial substitution saturation (Supplementary Table S2), we thus only used the first and second codon sites of the 4325 nuclear orthologs (Supplementary Dataset S2) for subsequent phylogenetic analysis with the concatenation method[35]. Based on the 1854774-nt concatenated alignment of nuclear orthologs from the four species with transcriptome data and the red flour beetle with reference genome data, we used both Maximum Likelihood (ML) and Bayesian Inference (BI) approaches to reconstruct phylogenetic trees. Both maximum likelihood bootstrap values and Bayesian inference posterior probability values (shown as percentages) were 100% at all nodes of the phylogenetic tree (Fig. 2). Specifically, both species from Luciolinae were first clustered into a sister clade, which next grouped with the single species from Lampyrinae (Fig. 2). All the three firefly beetles (Lampyridae) formed a monophyletic group, and the single species from Rhagophthalmidae appeared to locate outside of Lampyridae (Fig. 2).
Figure 2

Phylogenetic relationships between Rhagophthalmidae and Lampyridae inferred from the concatenated 4325 nuclear gene markers with the third codon positions removed, Tribolium castaneum was chosen to be the outgroup. Numbers at each node are the ML bootstrap values/Bayesian posterior probabilities/MP-EST bootstrap values, shown as percentages.

Phylogenetic relationships between Rhagophthalmidae and Lampyridae inferred from the concatenated 4325 nuclear gene markers with the third codon positions removed, Tribolium castaneum was chosen to be the outgroup. Numbers at each node are the ML bootstrap values/Bayesian posterior probabilities/MP-EST bootstrap values, shown as percentages. To reduce the influence of gene tree heterogeneity, we also undertook phylogenetic analysis with the coalescent model using the MP-EST method[36]. Based on the alignment of each orthologous gene with all three codon positions, the ML gene trees with 100 bootstrap replicates were respectively reconstructed by PhyML version 3.0[37] using the best-fit models generated by jModelTest version 2.1.4[38]. Because poorly supported individual gene trees could reduce phylogenetic signals in inferring species tree[39, 40], only 2,555 genes with average bootstrap support values greater than 70% were used for MP-EST tree reconstruction. Results showed that the tree topology and bootstrap values of the MP-EST tree using the coalescent method were identical to those of the ML and BI trees reconstructed using the concatenation method (Fig. 2). For comparison, we additionally reconstructed a phylogeny using a bioluminescence related gene. Because bioluminescence in beetles is produced by catalyzing the oxidation of luciferin by luciferase[41], the luciferase gene is preferable for this analysis. We used the luciferase gene of Lampyris noctiluca (Genbank accession: X89479) as the query to search against the assembled transcripts of each transcriptome using TBLASTN. The best hit from each transcriptome was selected as a candidate, and candidate luciferase genes of the four beetles were searched against NR (NCBI non-redundant proteins) to ensure accuracy in gene identification. Taking the luciferase gene of Tribolium castaneum as the outgroup, we generated an alignment using the five luciferase genes, and phylogenetic trees were built using similar approaches as described elsewhere[19, 42–47]. The resulting tree topologies (Supplementary Fig. S1) were identical to those inferred from 4325 genes (Fig. 2).

Phylogenetic analysis based on mitochondrial gene sequences

Phylogenetic analysis was also conducted with mitochondrial gene sequences. Firstly, we retrieved published mitogenomes from 12 species of related beetles: Rhagophthalmus ohbai (Rhagophthalmidae), Rhagophthalmus lufengensis (Rhagophthalmidae), Pyrocoelia rufa (Lampyridae), Luciola cruciate (Lampyridae), Luciola substriata (Lampyridae), Aquatica leii (Lampyridae), Brasilocerus sp.2 (Phengodidae), Phrixotrix hirtus (Phengodidae), Pyrophorus divergens (Elateridae), Merolycus dentipes (Lycidae), Chauliognathus opacus (Cantharidae), and Tribolium castaneum (Tenebrionidae) (Fig. 3). Secondly, we newly determined mitogenomes with the traditional Sanger sequencing from three firefly beetles: A. ficta (Lampyridae), A. wuhana (Lampyridae) and L. yunnana (Lampyridae) (Table 3). Thirdly, we identified mitochondrial gene sequences from our newly dertermined transcriptome assemblies of four beetles: Rhagophthalmus sp. (Rhagophthalmidae), A. ficta (Lampyridae), A. circumdata (Lampyridae), and P. pectoralis (Lampyridae) (Supplementary Table S1). The firefly A. ficta has sequence data from both Sanger and Illumina sequencing, but the transcriptome-derived mitogenome sequence is incomplete (Supplementary Table S1), we thus selected the Sanger-based mitogenome sequence (Fig. 3 and Table 3) for further analysis. In total, our dataset of mitogenomes contained 18 beetles. In addition to the outgroup species T. castaneum, six families of beetles were included: three rhagophthalmids, nine lampyrids, two phengodids, one elaterid, one lycid, and one cantharid (Fig. 3).
Figure 3

Phylogenetic relationships between Rhagophthalmidae and its related beetle families inferred from the concatenated 13 mitochondrial protein coding genes with the third codon positions removed. Numbers at nodes are the ML bootstrap values/Bayesian posterior probabilities, shown as percentages. Species shown in blue have mitochondrial genes generated from Sanger sequencing, while species in red have mitochondrial genes identified from Illumina-sequenced transcriptomes.

Phylogenetic relationships between Rhagophthalmidae and its related beetle families inferred from the concatenated 13 mitochondrial protein coding genes with the third codon positions removed. Numbers at nodes are the ML bootstrap values/Bayesian posterior probabilities, shown as percentages. Species shown in blue have mitochondrial genes generated from Sanger sequencing, while species in red have mitochondrial genes identified from Illumina-sequenced transcriptomes. The 13 mitochondrial PCGs from each of the 18 beetle species were aligned and concatenated. Substitution saturation analysis revealed that the third codon sites of all examined mitochondrial PCGs have experienced substantial substitution saturation, while the first and second codon sites have not (Supplementary Table S3), we thus just used the first and second codon sites of the 13 mitochondrial genes for subsequent phylogenetic analysis with the concatenation method (Supplementary Dataset S3). The concatenated alignment of the 13 mitochondrial PCGs is 7382 nt in length. Both ML and BI phylogenetic analyses recovered identical phylogenetic trees (Fig. 3). Our trees showed that Rhagophthalmus sp. clustered with two other rhagophthalmids with published mitogenomes, and the three rhagophthalmids formed a monophyletic group with a strong support of 100% in both ML and BI analyses (Fig. 3), confirming that our sampled Rhagophthalmus beetle indeed belongs to Rhagophthalmidae phylogenetically. As depicted from the trees, Rhagophthalmidae was first allied with Phengodidae as sister groups, and Rhagophthalmidae and Phengodidae formed a monophyletic group with Lampyridae (Fig. 3), confirming that Rhagophthalmidae is not a subgroup within Lampyridae. The long branches in Phengodidae species would not affect our phylogenetic analysis, because we recovered similar trees after removing the two phengodids (Supplementary Fig. S2). We did not perform the coalescent method in the phylogenetic analysis based on mitogenomes, because mitochondrial genes exhibit limited incongruence and the impact of mitochondrial gene tree heterogeneity has been believed to be minimal[48].

Discussion

In this study, we present the first whole transcriptome shotgun sequencing data in Rhagophthalmidae and Lampyridae species using massive parallel mRNA sequencing (RNA-seq), providing valuable genome resources for studying the evolution of intriguing traits in these beetles such as bioluminescence. We also newly determined two complete and one nearly complete mitogenome sequences in Lampyridae species. Using various phylogenetic approaches, our transcriptome and mitogenome data unambiguously demonstrate Rhagophthalmidae being distinct from Lampyridae and reject previous hypotheses supporting Rhagophthalmidae as a subgroup within Lampyridae. Our molecular phylogenetic study supports the familial status of Rhagophthalmidae. The phylogenetic position of Rhagophthalmidae was controversial due to conflicting results inferred from molecular and morphological data. For example, Rhagophthalmidae had been assigned to the family Phengodidae inferred from morphological data[11]; Rhagophthalmidae had been considered to be a subfamily or genus in the family Lampyridae based on morphological, embryological and molecular evidence[8, 12, 14, 15]. However, the overwhelming molecular evidence has consistently recognized the distinctiveness of Rhagophthalmidae, which appeared to be different from both Lampyridae and Phengodidae[7, 9, 13, 17, 22–24, 26, 49, 50]. As a result, the familial status of Rhagophthalmidae has been well established. In this study, we used 4325 nuclear genes derived from transcriptome data and all 13 mitochondrial protein coding genes to conduct phylogenetic inference. Our analyses based on various analytical approaches and several datasets consistently recognized the distinctiveness of Rhagophthalmidae (Figs 2 and 3). Although our study did not involve multiple individuals or populations from each species, intraspecific genetic variation should not impact the phylogenetic resolution at the family or subfamily level. Although our samples are limited due to the lack of genetic material, we included taxa that may represent basal lineages. More importantly, our phylogenetic resolution of Rhagophthalmidae remained consistent after using different analytical approaches and multiple data sets (Figs 2 and 3). In support of our findings, morphological evidence has identified the distinctiveness of Rhagophthalmidae from Lampyridae and Phengodidae[10, 16], and molecular evidence has revealed that Rhagophthalmidae is a sister group to Phengodidae[9, 13]. In contrast to our findings, two molecular studies argued that Rhagophthalmidae should be placed within Lampyridae[8, 15]. However, this argument was inferred from a short 16S rRNA gene segment with a length of 506-nt[8, 15] and cannot be supported by multi-locus data sets[9, 13]. Although broad sampling of taxa is important in molecular phylogenetic studies[8, 15], sufficient number of gene loci is also required to build a reliable phylogenetic tree[39, 51]. According to our mitogenome analysis, the three luminescent beetle families (Rhagophthalmidae, Lampyridae, and Phengodidae) formed a monophyletic group (Fig. 3), suggesting a single origin of bioluminescence in the common ancestor of these beetles. This finding is consistent with a recent phylogenetic analysis based on mitogenomes[9] but inconsistent with another study based on a combination of mitochondrial and nuclear genes[13]. The contrasting results between mitochondrial and nuclear genes call for an in-depth study by adding more taxa and more genome-scale data, which will help reconstruct a reliable phylogeny and make a conclusive inference on the evolution of bioluminescence in beetles. Phylogenomics refers to phylogenetic analysis involving genome-scale data. Phylogenomics has been believed to outcompete single-gene phylogenetics, which frequently yielded conflicting results caused by stochastic errors from small-scale data sets[20, 51]. However, systematic errors are still present after adding more data[52]. There are three major challenges that could generate strong incongruence in phylogenomic analysis: substitution saturation, nucleotide compositional bias, and different tree reconstruction methods[20]. We took the following steps to overcome these challenges. First, we undertook substitution saturation tests and removed the third codon positions of all genes that may have undergone substantial substitution saturation. Both nuclear and mitochondrial gene data sets with the first and second codon positions consistently supported Rhagophthalmidae to be an independent group that is distinct from Lampyridae (Figs 2 and 3). Second, we used the deduced protein sequences to reduce nucleotide compositional bias. Our result showed that the BI tree topologies inferred from protein sequences (Supplementary Fig. S3) were identical to those inferred from nucleotide sequences (Figs 2 and 3), suggesting that nucleotide compositional bias is not a major factor affecting our phylogenetic analysis. Third, we used both ML and BI methods with nuclear and mitochondrial gene data sets, and identified no incongruence between the two tree reconstruction methods (Figs 2 and 3). In addition, given the prevalence of gene tree heterogeneity[53-55], we also conducted phylogenetic analysis with the coalescent method, which has been proved to generate accurate and congruent phylogenies in the presence of heterogeneous gene trees[56, 57]. The coalescent method implemented in the MP-EST program[36] was applied to our transcriptome-derived nuclear gene data set of 2555 putative orthologous genes, and yielded a phylogenetic tree identical to the concatenation-based ML and BI trees (Fig. 2). Indeed, after examining the proportions of 14 possible topologies inferred from the 2555 orthologous genes, we found that most genes (75.7%) supported the topology shown in Fig. 2 and the average proportion of other 13 topologies is 1.9% (Fig. 4 and Supplementary Table S4), suggesting a low level of gene tree heterogeneity among beetles studied in this work. This study proved usefulness of our genome-scale data in phylogenetic analysis, and will help to illuminate the origin and evolution of bioluminescence in Lampyridae and other luminescent beetles.
Figure 4

A histogram depicting the proportions of 14 tree topologies with average bootstrap values above 70%, inferred from 2555 orthologous genes in the MP-EST analysis. Rha, Rhagophthalmus sp.; AC, Asymmetricata circumdata; AF, Aquatica ficta; PP, Pyrocoelia pectoralis.

A histogram depicting the proportions of 14 tree topologies with average bootstrap values above 70%, inferred from 2555 orthologous genes in the MP-EST analysis. Rha, Rhagophthalmus sp.; AC, Asymmetricata circumdata; AF, Aquatica ficta; PP, Pyrocoelia pectoralis.

Materials and Methods

Ethics statement

All beetle species used in this study were sampled in the field. No specific permits were required, and no endangered or protected species were involved. All experiments on the beetles were conformed to the rules and guidelines on animal experimentation in China.

Taxon sampling and DNA extraction

Live larvae or adults of beetles studied here were sampled at five locations in China (Supplementary Table S5). All these individuals were stored at −80 °C after freezing in liquid nitrogen. For mitogenome sequencing, genomic DNAs of the three species (A. ficta, A. wuhana and L. yunnana) were isolated from thoracic muscles of adult individuals with the Qiagen DNeasy kits.

Whole-body transcriptome sequencing (RNA-seq)

Following the manufacturer’s protocol, total RNAs were isolated using Trizol (Invitrogen) from whole bodies of four individuals: one adult individual of Rhagophthalmus sp. and three larva individuals of firefly beetles (A. ficta, A. circumdata, and P. pectoralis). Four paired-end libraries with an insert size of approximately 200 bp were constructed using the Illumina Truseq RNA sample prep kit according to the manufacturer’s protocol. Details of constructing RNA-seq libraries were previously described[19]. All libraries were sequenced commercially to generate paired-end reads of average length 101-bp on the Illumina HiSeq 2000 sequencing platform[58].

De novo assembly and unigene annotation

Sequence quality of each RNA-seq sample was assessed by FastQC version 10.1 (www.bioinformatics.bbsrc.ac.uk/projects/fastqc). Trimmomatic version 0.32[59] was used to trim out low quality sequences, ambiguous sequences and artificial sequences such as residual adaptors and Illumina specific sequences as described elsewhere[60]. Four de novo transcriptome assemblies were constructed by the Trinity program[61] with default settings. Because de novo transcriptome assemblies involve many contigs in a contig cluster, we required a minimum expression filter of one fragment per kilobase of exon per million fragments mapped (FPKM) and filtered the contigs with FPKM value smaller than one; the contig with the highest expression level in a contig cluster was selected to be a unigene for downstream analyses[60]. The resulting unigenes were used for BLASTX searches[27] and unigene annotations based on the NR, Swiss-prot and COG databases, with an E-value cutoff of 1e-5. The GO annotation was conducted by Blast2GO software[62] with default parameters using the NR blast results in XML format.

PCR amplifications and mitogenomes annotation

To amplify the three mitogenomes (A.ficta, A.wuhana and L.yunnana), dozens of primer pairs were designed according to published firefly mitogenome sequences[32-34]. Details of PCR amplification and sequencing procedure were described in our previous study[33]. We defined the 13 PCGs of the three species by multiple sequence alignments with related species using MEGA version 5.20[63]. The tRNAs were identified by tRNAscan-SE[64]. The published mitogenomes used here were downloaded from the GenBank database (http://www.ncbi.nlm.nih.gov/genbank) under accession numbers as follows: Tribolium castaneum, NC_003081; Brasilocerus sp.2., KJ938490; Pyrophorus divergens, NC_009964; Chauliognathus opacus, NC_013576; Rhagophthalmus ohbai, NC_010964; Rhagophthalmus lufengensis, NC_010969; Pyrocoelia rufa, NC_003970; Luciola cruciata, NC_022472; Luciola substriata, NC_027176; Aquatica leii, NC_025276; Phrixotrix hirtus, KM923891; Merolycus dentipes, HQ232815.

Ortholog identification and phylogenomic analysis

The cDNA sequences of the red flour beetle (Tribolium castaneum 3.0 Assembly) were downloaded from BeetleBase[65] (http://www.Beetlebase.org), and compared with the transcriptome-derived unigenes by reciprocal (or bi-directional) TBLASTX searches[27]. We identified putative orthologs by examining reciprocal best blast hits. We discarded putative orthologs with a BLAST score ratio of the second best-hit to the first best-hit greater than 0.8, which can exclude potential paralogs in our phylogenetic analysis. All putative orthologs were aligned by PRANK version 100802[28], and poorly aligned positions and divergent regions were removed by GBLOCKS version 0.91b[29]. In addition, the alignments with an aligned region shorter than 100-nt were discarded. All the 13 mitochondrial PCGs and 100 randomly selected nuclear orthologs were used to examine the substitution saturation by DAMBE[66]. According to the Akaike information criterion (AIC) and Bayesian information criterion (BIC)[67], we ran the jModelTest version 2.1.4 program[38] separately to select the best-fit models of nucleotide substitution for concatenated nuclear and mitochondrial gene alignments after removal of the saturated third codon positions. For the concatenated nuclear genes, RAxML version 7.2.6[68] was used to reconstruct the ML tree under the GTR+GAMMA model with 100 bootstrap replicates, and MrBayes version 3.2.6[69] was applied to reconstruct the BI tree using the recommended GTR+I+G model with 0.5 million generations[45, 47]. To reduce the impact of gene tree heterogeneity, we also undertook phylogenetic analysis with the coalescent model using the MP-EST method[36]. For the concatenated 13 mitochondrial PCGs, the ML tree was reconstructed by RAxML version 7.2.6 with 1000 bootstrap replicates under recommended GTR+GAMMA model, and the concatenated sequence was partitioned by different genes to estimate and optimize individual α-shape parameters, GTR-rates, and base frequencies for each gene. MrBayes version 3.2.6 was used to reconstruct the BI tree with one million generations and the concatenated sequence was partitioned according to different models: HKY+G model for ND4L, GTR+G model for ND1 and ATP6, and GTR+I+G model for the other 10 mitochondrial genes. In addition, we used deduced protein sequences in our phylogenomic analysis, aiming to reduce the impact of nucleotide compositional bias. Briefly, protein sequences of the nuclear and mitochondrial genes were deduced and aligned by MEGA version 5.20[63]. ProtTest version 3.41[70] was applied to select the best-fit model of protein sequence evolution following the Akaike information criterion (AIC). The LG+I+G model was determined to provide the best fit to the concatenated protein sequences of nuclear genes, while the MtRev+I+G model was selected for the concatenated protein sequences of mitochondrial genes. Phylogenetic trees were reconstructed by MrBayes version 3.2.6[69] with as many generations as required.
  55 in total

1.  Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis.

Authors:  J Castresana
Journal:  Mol Biol Evol       Date:  2000-04       Impact factor: 16.240

2.  RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models.

Authors:  Alexandros Stamatakis
Journal:  Bioinformatics       Date:  2006-08-23       Impact factor: 6.937

Review 3.  Coalescent methods for estimating phylogenetic trees.

Authors:  Liang Liu; Lili Yu; Laura Kubatko; Dennis K Pearl; Scott V Edwards
Journal:  Mol Phylogenet Evol       Date:  2009-06-06       Impact factor: 4.286

Review 4.  Gene tree discordance, phylogenetic inference and the multispecies coalescent.

Authors:  James H Degnan; Noah A Rosenberg
Journal:  Trends Ecol Evol       Date:  2009-03-21       Impact factor: 17.712

5.  Sequence and organization of complete mitochondrial genome of the firefly, Aquatica leii (Coleoptera: Lampyridae).

Authors:  Hengwu Jiao; Minghui Ding; Huabin Zhao
Journal:  Mitochondrial DNA       Date:  2013-12-02

6.  Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model.

Authors:  Sen Song; Liang Liu; Scott V Edwards; Shaoyuan Wu
Journal:  Proc Natl Acad Sci U S A       Date:  2012-08-28       Impact factor: 11.205

7.  Mitochondrial genomes of two luminous beetles, Rhagophthalmus lufengensis and R. ohbai (Arthropoda, Insecta, Coleoptera).

Authors:  Xueyan Li; Katsunori Ogoh; Nobuyoshi Ohba; Xingcai Liang; Yoshihiro Ohmiya
Journal:  Gene       Date:  2007-01-08       Impact factor: 3.688

8.  Why barcode? High-throughput multiplex sequencing of mitochondrial genomes for molecular systematics.

Authors:  M J T N Timmermans; S Dodsworth; C L Culverwell; L Bocak; D Ahrens; D T J Littlewood; J Pons; A P Vogler
Journal:  Nucleic Acids Res       Date:  2010-09-28       Impact factor: 16.971

9.  Organization and comparative analysis of the mitochondrial genomes of bioluminescent Elateroidea (Coleoptera: Polyphaga).

Authors:  Danilo T Amaral; Yasuo Mitani; Yoshihiro Ohmiya; Vadim R Viviani
Journal:  Gene       Date:  2016-04-07       Impact factor: 3.688

10.  The utility of the neglected mitochondrial control region for evolutionary studies in lepidoptera (insecta).

Authors:  Marta Vila; Mats Björklund
Journal:  J Mol Evol       Date:  2004-03       Impact factor: 2.395

View more
  15 in total

1.  Long-read sequence assembly of the firefly Pyrocoelia pectoralis genome.

Authors:  Xinhua Fu; Jingjing Li; Yu Tian; Weipeng Quan; Shu Zhang; Qian Liu; Fan Liang; Xinlei Zhu; Liangsheng Zhang; Depeng Wang; Jiang Hu
Journal:  Gigascience       Date:  2017-12-01       Impact factor: 6.524

2.  In silico identification and characterization of a diverse subset of conserved microRNAs in bioenergy crop Arundo donax L.

Authors:  Wuhe Jike; Gaurav Sablok; Giorgio Bertorelle; Mingai Li; Claudio Varotto
Journal:  Sci Rep       Date:  2018-11-12       Impact factor: 4.379

3.  The transcriptome analysis of Protaetia brevitarsis Lewis larvae.

Authors:  Zhongjie Li; Miaomiao Meng; Shasha Li; Bo Deng
Journal:  PLoS One       Date:  2019-03-21       Impact factor: 3.240

4.  Trehalase Gene as a Molecular Signature of Dietary Diversification in Mammals.

Authors:  Hengwu Jiao; Libiao Zhang; Huan-Wang Xie; Nancy B Simmons; Hui Liu; Huabin Zhao
Journal:  Mol Biol Evol       Date:  2019-10-01       Impact factor: 16.240

5.  Convergent reduction of V1R genes in subterranean rodents.

Authors:  Hengwu Jiao; Wei Hong; Eviatar Nevo; Kexin Li; Huabin Zhao
Journal:  BMC Evol Biol       Date:  2019-08-30       Impact factor: 3.260

6.  Reconstruction of insect hormone pathways in an aquatic firefly, Sclerotia aquatilis (Coleoptera: Lampyridae), using RNA-seq.

Authors:  Pornchanan Chanchay; Ajaraporn Sriboonlert; Wanwipa Vongsangnak; Anchana Thancharoen
Journal:  PeerJ       Date:  2019-08-02       Impact factor: 2.984

7.  Complete mitochondrial genome of the Korean endemic firefly, Luciola unmunsana (Coleoptera: Lampyridae).

Authors:  Min Jee Kim; Jeong Sun Park; Iksoo Kim
Journal:  Mitochondrial DNA B Resour       Date:  2020-08-14       Impact factor: 0.658

8.  Transcriptome sequencing and differential gene expression analysis of the schistosome-transmitting snail Oncomelania hupensis inhabiting hilly and marshland regions.

Authors:  Jin-Song Zhao; An-Yun Wang; Hua-Bin Zhao; Yan-Hong Chen
Journal:  Sci Rep       Date:  2017-11-17       Impact factor: 4.379

9.  Genome sequencing of Rhinorhipus Lawrence exposes an early branch of the Coleoptera.

Authors:  Dominik Kusy; Michal Motyka; Carmelo Andujar; Matej Bocek; Michal Masek; Katerina Sklenarova; Filip Kokas; Milada Bocakova; Alfried P Vogler; Ladislav Bocak
Journal:  Front Zool       Date:  2018-05-02       Impact factor: 3.172

10.  Genome sequences identify three families of Coleoptera as morphologically derived click beetles (Elateridae).

Authors:  Dominik Kusy; Michal Motyka; Matej Bocek; Alfried P Vogler; Ladislav Bocak
Journal:  Sci Rep       Date:  2018-11-20       Impact factor: 4.379

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.