Literature DB >> 30508200

The Mitochondrial Genomes of Phytophagous Scarab Beetles and Systematic Implications.

Nan Song1, Hao Zhang2.   

Abstract

In this study, we newly sequenced five mitogenomes of representatives of phytophagous scarab beetles (Coleoptera: Scarabaeidae) by using next-generation sequencing technology. Two species have complete (or nearly complete) mitogenome sequences, namely Popillia mutans Newman (Coleoptera: Scarabaeidae) and Holotrichia oblita Faldermann (Coleoptera: Scarabaeidae). The remaining three species have the partial mitogenomes, and the missing genes are mainly located adjacent to the control region. The complete (or nearly complete) mitogenomes have the same genome structure as most of the existing Scarabaeidae mitogenomes. We conducted phylogenetic analyses together with 24 published mitogenomes of Scarabaeoidea. The results supported a basal split of coprophagous and phytophagous Scarabaeidae. The subfamily Sericinae was recovered as sister to all other phytophagous scarab beetles. All analyses supported a non-monophyletic Melolonthinae, which included two different non-sister clades. The Cetoniinae was recovered as sister to a clade including Rutelinae and Dynastinae. Although the Rutelinae was rendered paraphyletic by Dynastinae in the Bayesian trees inferred under the site-heterogeneous CAT-GTR or CAT-MTART model, discordant patterns were given in some of ML trees estimated using the homogeneous GTR model.

Entities:  

Mesh:

Year:  2018        PMID: 30508200      PMCID: PMC6275328          DOI: 10.1093/jisesa/iey076

Source DB:  PubMed          Journal:  J Insect Sci        ISSN: 1536-2442            Impact factor:   1.857


The scarab beetles comprise a large group of insects classified as the family Scarabaeidae (Insecta: Coleoptera), with over 30,000 species of beetles worldwide. From the point of view of feeding habits, these insects can be divided into coprophagous and phytophagous (including saprophagous) lineages. The species of coprophagous Scarabaeidae are often found in dung and in excrement, which are also called as dung beetles with various nesting behaviors (e.g., the rolling and tunneling behaviors). Due to their significant economic and ecological importance, dung beetles have been one of the most intensively studied coleopteran groups (Nichols et al. 2008, Brown et al. 2010). In comparison, researches on the phytophagous Scarabaeidae are limited. In particular, there remain many controversies on the taxonomy and phylogeny of the phytophagous clade of Scarabaeidae. The phytophagous scarab beetles as an independent lineage have been recognized since Erichson (1847). The majority of phytophagous scarabs are traditionally grouped into four subfamilies such as Melolonthinae, Cetoniinae, Dynastinae, and Rutelinae (Smith 2006, Smith et al. 2006, Eberle et al. 2014). Moreover, several other smaller groups are proposed to be incorporated into this clade (e.g., the Sericinae and Trichiinae, Péringuey 1904, Browne and Scholtz 1998). All phytophagous scarabs are also called as Pleurosticti (Erichson 1847). The monophyly of Pleurosticti is often supported by morphological studies (Ritcher 1958, Balthasar 1963, Browne and Scholtz 1998). Based on the partial nuclear 18S and 28S ribosomal DNA sequences, Smith et al. (2006) recovered the phytophagous scarabs as a monophyletic group, which contained the following groups: Melolonthinae, Cetoniinae, Rutelinae, Dynastinae, Anomalini, and Adoretini. Despite the great progress made in recent years, many issues on the higher-level phylogeny of Scarabaeidae remain to be addressed (Sanmartín and Mmartín-Piera 2003, Smith et al. 2006). Ahrens (2005) recovered the Melolonthinae as a paraphyletic group based on the analysis of morphological characters. Ahrens et al. (2014) retrieved the Melolonthinae as paraphyletic again, based on a multi-locus sequence analysis. On the contrary, other authors considered the Melolonthinae as a separate clade and to be elevated to the rank of family. Cherman and Morón (2014) included the Rutelinae, Dynastinae, and Melolonthinae to form the family Melolonthidae. Moreover, the family status of Melolonthinae was favored by the subsequent study of Cherman et al. (2016). Apart from the monophyly of Melolonthidae, the phylogenetic position of Sericinae was also problematic. Machatschke (1959) recovered the monophyletic Sericinae, including the tribes Diphucephalini, Ablaberini, and Sericini. However, the sericines were more often regarded as a tribe of Melolonthinae in recent literature (Ahrens 2005, Smith 2006, Ahrens et al. 2014). Elucidating the status of subgroups included in the family Scarabaeidae may contribute to the understanding of the evolution of this group, which further improves the conservation strategies of insects and the control measures of associated pests. Thus, it is necessary to investigate the constitution and the phylogeny of Scarabaeidae using additional types of data and increased taxon sampling. The mitochondrial genome (mitogenome) is a useful maker in resolving the phylogeny at different taxonomic levels of Coleoptera (Timmermans et al. 2010, 2016; Coates 2014; Gillett et al. 2014; Crampton-Platt et al. 2015; Breeschoten et al. 2016; Nie et al. 2018). Recent mitochondrial phylogenomic studies have applied the next-generation sequencing (NGS) technologies to reconstruct the insect mitogenomes (e.g., Gillett et al. 2014, Crampton-Platt et al. 2015, Song et al. 2016). Compared with traditional PCR and Sanger sequencing, the approaches implemented by NGS are fast and cost-effective due to their capability to assemble a large number of mitogenomes from mixtures of species samples simultaneously. Herein, we reconstructed five mitogenomes of representatives of the phytophagous scarab beetles from pooled genomic DNA. Combined with existing Scarabaeoidea mitogenomes, we inferred the phylogeny of Scarabaeidae to 1) test the monophyly of Melolonthinae, 2) to examine the position of Sericinae, and 3) to investigate the relationship between Rutelinae and other scarab beetles.

Materials and Methods

Taxon Sampling and DNA Extraction

Taxa were collected from Zhengzhou, Henan province, China. No specific permits were required for the insect specimens collected for this study. The field studies did not involve endangered or protected species. All sequenced insects are common beetle species in China and not included in the ‘List of Protected Animals in China’. Two species of Melolonthinae (i.e., Holotrichia oblita and Holotrichia sp.), two of Sericinae (i.e., Serica orientalis and Serica sp.) and one of Rutelinae (i.e., Popillia mutans) were utilized for DNA extraction and further sequencing. The adult specimens were preserved in 100% ethanol. Total genomic DNA was extracted from the thoracic leg muscle tissue, with the TIANamp Micro DNA Kit (Tiangen Biotech Co., Ltd) following the manufacturer’s protocol. Vouchers are deposited at the Entomological Museum of Henan Agricultural University. DNA concentration was measured by Nucleic acid protein analyzer (Quawell Technology Inc.).

Library Construction and Illumina Sequencing

Five scarab beetle species were respectively added into five different sequencing pools. Besides scarab beetle, the separate pool included other 29 unrelated samples. Following Gillett et al. (2014), each sample in the mixture was devised to have a largely identical DNA concentration (1.5 μg) to maximize the efficiency of genome assembling. The Illumina libraries were prepared with Illumina TruSeqTM DNA Sample Prep Kit (Illumina, San Diego, CA) and insert size of 350 bp. Each library was sequenced on the Illumina HiSeq X Ten platform (Beijing Novogene Bioinformatics Technology Co., Ltd, China) generating 150 bp paired-end reads.

Sequence Quality Control and Assembly

FastQC was used for the quality control checks on raw sequence data (Andrews 2010). All reads were filtered with NGS QC Toolkit using default settings (Patel and Jain 2012). In this step, reads containing adapters and ploy-N, and low-quality reads were removed from raw data. At the same time, Q20, Q30, GC-content, and sequence duplication level of the cleaned data were calculated. All the downstream analyses were based on clean data with high quality (avg. Q20 > 90%, and avg. Q30 > 85%). De novo assembly were performed with IDBA-UD v. 1.1.1 (Peng et al. 2012). The assemblies were constructed using 200 for the setting of minimum size of contig, and an initial k-mer size of 40, an iteration size of 10, and a maximum k-mer size of 90.

Mitogenome Identification and Annotation

The mitochondrial scaffolds were identified by mitochondrial baiting, with local-blasting implemented in BioEdit (Hall 1999) against the genome data assembled by IDBA-UD. The bait gene fragments (i.e., mitochondrial cox1, cytb, and rrnS) were amplified using the universal primers identical to those of Song et al. (2016). The preliminary mitogenome annotation were conducted using the MITOS webserver, under default settings and the invertebrate genetic code for mitochondria (Bernt et al. 2013). The gene boundaries were further checked and refined by alignment with the published mitogenome sequences of Scarabaeidae (see details in Supp. Table S1). To check the quality of the mitogenome sequences assembled, the mappings to the mitochondrial contigs were performed using Geneious R11. The secondary structures of mitochondrial rrnL and rrnS were predicted, mainly with reference to the models of Apis mellifera (Gillespie et al. 2006) and of Drosophila virilism (Cannone et al. 2002), using the method of ‘Comparative sequence analysis’ in the study of Ouvrard et al. (2000).

Multiple Alignment

For the mitochondrial protein-coding genes, we first removed the stop codon of each sequence. Then, the nucleotide matrix of each protein-coding gene was translated into amino acids under the invertebrate mitochondrial genetic code in MEGA 6 (Tamura et al. 2013) and aligned based on their amino acid sequences with Muscle as implemented in MEGA. The amino acid matrix was back-translated into the corresponding nucleotide matrix. For the mitochondrial tRNA and rRNA genes, each of them was aligned using MAFFT (version 7) (Katoh and Standley 2013) under the iterative refinement method incorporating the most accurate local pairwise alignment information (E-INS-i). The resulting alignments were checked in MEGA. Gaps were striped by Gap Strip/Squeeze v2.1.0 with 40% Gap tolerance (http://www.hiv.lanl.gov/content/sequence/GAPSTREEZE/gap.html). Finally, all alignments were concatenated using FASconCAT_v1.0 (Kuck and Meusemann 2010) to construct the full dataset of PCGRNA. To reduce the bias effect introduced by synonymous change on the phylogenetic analysis, protein-coding genes were degenerated by using the Perl script Degen v1.4 to construct the dataset of PCGDegen (Regier et al. 2010, Zwick et al. 2012). The degenerated protein-coding genes were combined with RNA genes to compile the dataset of PCGDegenRNA. Possible sequence saturation at each codon position and at each gene partition of the concatenated alignment was individually examined by DAMBE 5 (Xia 2013). Estimates of nonsynonymous (dN) and synonymous (dS) substitution rates of protein-coding genes were obtained by the method of Yang and Nielsen (Yang and Nielsen 2000) using the program yn00 as implemented in PAML 4.9 (Yang 2007). The Excel 2016 was used to perform one-way analysis of variance (ANOVA) analysis in order to test for significant differences of substitution rates between different lineages, and the significance level was set to be 0.05.

Phylogenetic Analysis

We estimated the phylogeny of Scarabaeidae with two different approaches: maximum likelihood (ML) analysis under homogeneous models and Bayesian analysis under heterogeneous models. The following datasets were utilized in the phylogenetic analysis: 1) PCG: the nucleotide sequences of 13 protein-coding genes; 2) PCG_AA: the deduced amino acid sequences of the protein-coding genes; 3) PCGDegen: the nucleotide sequences of 13 protein-coding genes degenerated by Degen script; 4) PCGRNA: the nucleotide sequences of 13 protein-coding genes combined with RNA genes; and 5) PCGDegenRNA: the sequences of PCGDegen combined with RNA genes. Prior to ML analyses based on the datasets of PCG, PCGRNA and PCG_AA, the PartitionFinder (Lanfear et al. 2012) was run to select the optimal partitioning strategies and best-fitting models for each dataset under the Bayesian Information Criterion (BIC), using a greedy search with RAxML (Stamatakis 2015). ML tree searches were performed in IQ-TREE (Nguyen et al. 2015), with various data partition schemes and best-fitting models determined by PartitionFinder. The detailed information on the partitions and the best models selected are listed in Supp. Table S2. Branch support analysis was conducted using 1,000 ultrafast bootstrap replicates. For the datasets of PCGDegen and PCGDegenRNA, un-partitioned analyses were performed, due to the altered variation of alignments by Degeneracy recoding scheme. The parallel version of PhyloBayes (pb_mpi1.5a implemented on a HP server) was used for Bayesian tree calculation (Lartillot and Philippe 2004, Lartillot et al. 2009). The CAT-GTR model was used for nucleotide analyses, while the CAT-MTART model for amino acids. Two independent chains were run in parallel and started from a random topology. The ‘bpcomp’ program contained in the package of PhyloBayes was used to calculate the largest (maxdiff) and mean (meandiff) discrepancy observed across all bipartitions. The program ‘tracecomp’ was also used to summarize the discrepancies and the effective sizes estimated for each column of the trace file. When the maximum ‘maxdiff’ value was lower than 0.1 and minimum effective size was higher than 100, the Bayesian runs were recognized to be reached good convergence. A consensus tree was calculated from the saved trees by bpcomp program after checking for stationarity, with the default options.

Hypothesis Testing

For all five datasets the alternative hypotheses were tested, using the Shimodaira-Hasegawa (SH) test (Shimodaira and Hasegawa 1999) and the approximately unbiased (AU) test (Shimodaira 2002). The specific hypotheses concerning four phylogenetic issues were assessed: 1) the basal divergence between coprophagous and phytophagous Scarabaeidae, 2) the Sericinae as sister to all other phytophagous taxa, 3) the monophyly of Melolonthinae, 4) the monophyly of Rutelinae. Additionally, we compared the tree topologies resulting from phylogenetic inferences using different datasets under ML and Bayes criteria. The site-log-likelihood values were calculated under the GTR+G model using TREE-PUZZLE (Version 5.3) (Schmidt et al. 2002). And then, the obtained values were used as input for the software CONSEL (Shimodaira and Hasegawa 2001). All constraint likelihood trees were generated by IQ-TREE (Nguyen et al. 2015) using the settings described above.

Results

Mitogenome Assembly

Five Illumina HiSeq runs resulted in 72,395,049, 72,720,510, 74,104,974, 81,772,168, and 89,829,307 raw paired reads, respectively. After filtering, 66,451,916, 67,980,632, 72,278,440, 81,189,879, and 89,122,956 clean paired reads were generated for corresponding libraries. For each species, at least one Sanger sequence was obtained and utilized to identify the potential mitochondrial contig (Supp. Table S3). Local blast searching results showed that each baiting fragment had its best blast match with the targeted mitochondrial contig (i.e., Identities ≥ 99% and E-value = 0.00). In contrast, other contigs assembled had the lower identities and the higher e-values (Supp. Table S3). Moreover, the mitogenome identified was represented by a single large-contig (contig length > 13 kb in most cases). The sequencing coverages corresponding to five mitogenomes ranged from 171-fold to 370-fold.

Mitogenome Organization

The mitogenome organization of five scarab beetle species sequenced are summarized in Table 1. P. mutans and H. oblita had the complete or nearly complete mitogneomes, which contained the full 37 mitochondrial genes and the complete or partial control region. The other three mitogenomes (i.e., Holotrichia sp., S. orientalis and Serica sp.) were partial, and the missing fragments were located in the putative control region and the adjacent regions.
Table 1.

The organization of the newly sequenced mitogenomes

Gene region Popillia mutans Holotrichia oblita Serica sp. Serica orientalis Holotrichia sp.
StartStopStartStopStartStopStartStopStartStop
Partial control region----165716581179
trnI165165658721659723180242
trnQ6913763131719787721789240309
trnM137205131200787855789857319387
nad220612132011208856186085818623881395
trnW1229129912261292186419311884195114091473
trnC1292135412851347192419851944200514661527
trnY1355141913531417198520482005206915271589
cox11412295613832951204735852068360615883123
trnL-UUR2952301629523018358136463602366731303194
cox23017371830193738364743303668435131953896
trnK3705377537043774433244024353442338833951
trnD3778384437743839440344694424448739714034
atp83845400038403995446946244488464340354190
atp63994466839894660461852894637530841844858
cox34668545446605446528960765308609548585645
trnG5455551954475510607661386095615756455707
nad35520587355115864613964926158650957086061
trnA5872593758635928649165566510657560616124
trnR5937600159295993655666206575663961246190
trnN6004606759936056661866836637670261956259
trnS-AGN6068613460576124668467506703676962606326
trnE6135620261256189675168146770683163276391
trnF6201626561886253681368786830689663906456
nad56265798062547972687885936896861164578175
trnH7981804379738037859486578612867581768239
nad48043938080379374865799948675987182398967
nad4l9374966493689658998810278----
trnT96679731966197251028110345----
trnP97329796972697901034610409----
nad69798103019783102951041110908----
cytb103011144310295114371090812050----
trnS-UCN114421150611436115011204912113----
nad1115231247311519124691213213082----
trnL-CUN124751253812471125321308413146----
rrnL125391382712510138041314713866----
trnV13828138971380313872------
rrnS13897146951387314672------
Complete or partial control region14696161921467315968------

Dash (-) indicates no application or the missing genome region.

The organization of the newly sequenced mitogenomes Dash (-) indicates no application or the missing genome region. The complete or nearly complete mitogenomes reconstructed showed the same gene arrangement proposed as ancestral for insects (Cameron 2014). The nucleotide composition of new mitogenome sequences were similar to those published scarab beetle species, with the significant A + T bias in the major strand. For the protein-coding genes, five new mitogenomes used the typical ATN as start codons, except for the nad3 (using ACA) and nad4 (using TGT) in H. oblita. Ten of 13 protein-coding genes terminated with the conventional stop codons TAG or TAA, while the cox2 (in P. mutans and Holotrichia sp.), cox3 (in P. mutans, S. orientalis, Serica sp., and Holotrichia sp.), and nad3 (in S. orientalis) had an incomplete stop codon T. The secondary structures of mitochondrial tRNA genes were predicted. Supp. Fig. S1 illustrates the tRNA secondary structures of the complete mitogenome of P. mutans (Supp. Fig. S1A) and the nearly complete mitogenome of H. oblita (Supp. Fig. S1B). All tRNA genes had the typical clover-leaf structure except for the trnS-AGN. The DHU arm was reduced or missing in the trnS-AGN. For the P. mutans and H. oblita, the complete mitochondrial rRNA genes were identified between trnL-CUN and trnV and between trnV and the control region, with the length of 1,270 or 1,289 bp for rrnL and 799 or 800 bp for rrnS. The secondary structures of mitochondrial rRNA genes were inferred (Fig. 1A and B for rrnL and rrnS of P. mutans, and Supp. Fig. S2A and B for rrnL and rrnS of H. oblita). Because the P. mutans and H. oblita have the basically identical secondary structure predictions for two mitochondrial rRNA genes, the following description will focus on those of P. mutans. The secondary structure of rrnL consisted of five canonical structural domains (I-II, IV-VI) and 44 helices (Fig. 1A and Supp. Fig. S2A). Domain III was absent, which was the same as the published insect mitochondrial rrnL secondary models (Cannone et al. 2002, Gillespie et al. 2006). When compared with other scarab beetles, domains Ⅳ-Ⅵ located in the 3′ end of the rrnL molecule were highly conserved. The three domains contained the helices with more than 75% identical positions across species (e.g., H1755, H1835 and H1906). The mitochondrial rrnS secondary model predicted for P. mutans consisted of 27 helices (Fig. 1B and Supp. Fig. S2B), which formed four typical domains (labeled I, II, III and IV). Across taxa analyzed, domains III and IV were more conserved than domains I and II.
Fig. 1.

(A) The secondary structure of rrnL gene predicted for Popillia mutans. (B) The secondary structure of rrnS gene predicted for Popillia mutans.

(A) The secondary structure of rrnL gene predicted for Popillia mutans. (B) The secondary structure of rrnS gene predicted for Popillia mutans. The full control region located between rrnS and trnI was identified only in the mitogenome of P. mutans, with the length of 1,497 bp and A + T content of 82.6%. There were a series of poly-A or poly-T stretches and [T]n[A]n structures randomly scattered in this region. However, no obviously tandem repeat motifs were found.

Characteristics of Data Matrices

Saturation tests demonstrated that rRNA genes and the third codon positions of protein-coding genes were saturated when assuming an asymmetrical topology (Iss < Iss.cAsym, Supp. Table S4). Evolutionary rate analyses showed that the major lineages analyzed shared the similar dS values (Table 2). In comparison, several outgroup taxa (e.g., Glaresis sp.) and ingroup taxa (e.g., Polyphylla laticollis mandshurica and Cheirotonus jansoni) had the higher dN values, while the Rutelinae and Cetoniinae displayed the lower dN values. The dN/dS values showed the same trend as that of the dN values (P = 0.910). The statistical analyses revealed no significant differences of dS values among the species studied. However, there were significant differences for dN or dN/dS values (P < 0.05), with or without outgroup taxa.
Table 2.

Estimation of synonymous and nonsynonymous substitution rates by yn00 implemented in PAML

ItemSubfamilySpeciesGroup dN dS dN/dS
IngroupAphodiinaeAphodius sp.00.15534.35600.0356
ScarabaeinaeSarophorus sp.10.13044.24170.0307
ScarabaeinaeScarabaeidae sp. BMNH 127475010.13864.29430.0323
ScarabaeinaeScarabaeidae sp. BMNH 127475210.12944.29880.0301
ScarabaeinaeScarabaeidae sp. BMNH 127475310.14234.28960.0332
ScarabaeinaeXinidium sp.10.13904.29690.0324
CetoniinaeLeucocelis sp.20.11464.34890.0264
CetoniinaeMyodermum sp.20.12804.41220.0290
Cetoniinae Osmoderma opicum 20.13724.37910.0313
Cetoniinae Protaetia brevitarsis 20.11794.31000.0274
Dynastinae Cyphonistes vallatus 30.14634.51690.0324
MelolonthinaeAsthenopholis sp.40.12284.50310.0273
Melolonthinae Cheirotonus jansoni 40.15744.42420.0356
Melolonthinae Holotrichia oblita 40.13894.40840.0315
Melolonthinae Holotrichia sp. 40.14024.38530.0320
Melolonthinae Polyphylla laticollis mandshurica 40.16654.44300.0375
Melolonthinae Rhopaea magnicornis 40.12894.36410.0295
MelolonthinaeSchizonycha sp.40.12664.37860.0289
RutelinaeAdoretus sp.50.11434.29310.0266
Rutelinae Popillia mutans 50.12264.35000.0282
RutelinaePopillia sp.50.12584.32270.0291
SericinaePleophylla sp.60.12534.32750.0290
Sericinae Serica orientalis 60.12364.34310.0285
SericinaeSerica sp.60.12574.39920.0286
OutgroupGeotrupidae Anoplotrupes stercorosus 70.14204.27120.0332
GeotrupidaeBolboceratex sp.70.15204.52520.0336
Glaphyridae Glaphyrus comosus 80.14114.32090.0326
GlaresidaeGlaresis sp.90.15984.42480.0361
TrogidaeTrox sp.100.13094.29230.0305
Estimation of synonymous and nonsynonymous substitution rates by yn00 implemented in PAML

ML Analysis

Although results of the ML analyses based on various datasets were similar, the trees differed to a greater extent in the relationships of Melolonthinae and Rutelinae. The monophyly of Scarabaeidae was strongly supported (e.g., BP = 82 in PCGRNA-ML-tree, Fig. 2). The coprophagous scarab beetles comprising the clade Aphodiinae + Scarabaeinae were recovered as the sister group to the phytophagous lineages. The phytophagous clade contained five subfamilies, namely the Sericinae, Melolonthinae, Cetoniinae, Rutelinae, and Dynastinae. In all ML analyses but PCG_AA, the Sericinae was recovered outside of the assemblage of Melolonthinae, and as a sister lineage to all other phytophagous scarabs. In the ML tree from the PCG_AA dataset, the Sericinae was recovered as a sister group to a species from Melolonthinae (i.e., Holotrichia sp.), with weak node support value (BP = 74). The Melolonthinae as a non-monophyletic group was divided into two different non-sister clades.
Fig. 2.

Maximum likelihood tree inferred from the PCGRNA dataset using IQ-TREE under the partitions and best-fitting models selected by PartitionFinder. Branch support values (>70) are presented near each node. Scale bar represents substitutions/site. Asterisks designate the species newly sequenced in this study. The colored lines correspond to the subfamilies recovered.

Maximum likelihood tree inferred from the PCGRNA dataset using IQ-TREE under the partitions and best-fitting models selected by PartitionFinder. Branch support values (>70) are presented near each node. Scale bar represents substitutions/site. Asterisks designate the species newly sequenced in this study. The colored lines correspond to the subfamilies recovered. The following phylogenetic results changed depending on the dataset. 1) For the relationship of melolonthine lineages, Holotrichia sp. formed a sister-group to Cheirotonus jansoni in the analyses from the PCG, PCGRNA, PCGDegen and PCGDegenRNA datasets (Fig. 2 and Fig. 3A, C, and D), whereas Holotrichia sp. was sister to the clade Sericinae in the analysis from the PCG_AA dataset (Fig. 3B); 2) The second major melolonthine clade containing the other five melolonthine species was sister to a large clade including Dynastinae, Rutelinae, and Cetoniinae, the relationships among melolonthine species were different across analyses: in the ML analyses using the PCG, PCGRNA and PCGDegenRNA datasets was Rhopaea magnicornis sister to Polyphylla laticollis mandshurica, and both together sister to Holotrichia oblita (Fig. 2 and Fig. 3A and D), in contrast, in other two analyses (i.e., PCG_AA and PCGDegen) was R. magnicornis sister to a clade P. laticollis mandshurica + H. oblita (Fig. 3B and C); 3) the monophyly of Rutelinae was supported by the PCG, PCG_AA, and PCGRNA datasets, whereas the degenerated datasets (i.e., PCGDegen and PCGDegenRNA) retrieving the C. vallatus (Dynastinae) as sister to Adoretus sp. led to a non-monophyletic Rutelinae (Fig. 3C and D).
Fig. 3.

Maximum likelihood trees inferred from the datasets of (A) PCG, (B) PCG_AA, (C) PCGDegen, and (D) PCGDegenRNA using IQ-TREE. Branch support values are presented near each node. Scale bar represents substitutions/site.

Maximum likelihood trees inferred from the datasets of (A) PCG, (B) PCG_AA, (C) PCGDegen, and (D) PCGDegenRNA using IQ-TREE. Branch support values are presented near each node. Scale bar represents substitutions/site.

Bayesian Analysis

Bayesian tree reconstructions were performed under the site-heterogeneous CAT-GTR or CAT-MTART model (Figs. 4 and 5). The monophyly of Scarabaeidae were recovered in all Bayesian analyses, except for that from the PCGDegenRNA dataset. The nested position of the outgroup Glaphyrus comosus resulted in a non-monophyletic Scarabaeidae in the PCGDegenRNA Bayes tree, but branch support measures indicated that this reconstruction was weakly supported (PP = 0.57). Monophyletic groupings of taxa corresponding to the subfamily level were congruent with the relationships found in the ML topologies. The Aphodiinae was well supported as a sister-group to Scarabaeinae in the coprophagous Scarabaeidae clade, with high posterior probabilities (PP ≥ 0.98). Among the phytophagous lineages, the Sericinae was consistently recovered as sister to the remaining phytophagous lineages. The non-monophyletic Melolonthinae branched next to the Sericinae. The terminal clades still consisted of the Cetoniinae, Rutelinae, and Dynastinae. The subfamily Rutelinae was rendered paraphyletic by an internal clade comprising C. vallatus + Adoretus sp. in all Bayesian analyses but for that from the PCG_AA dataset. The PCG_AA Bayes tree retrieved Dynastinae and Rutelinae to both be monophyletic.
Fig. 4.

Bayesian tree inferred from the PCGRNA dataset using PhyloBayes under the CAT-MTART model. Node numbers show poster probability values (>0.9). Scale bar represents substitutions/site. Asterisks designate the species newly sequenced in this study. The colored lines correspond to the subfamilies recovered.

Fig. 5.

Bayesian trees inferred from the datasets of (A) PCG, (B) PCG_AA, (C) PCGDegen, and (D) PCGDegenRNA using PhyloBayes. Node numbers show poster probability values. Scale bar represents substitutions/site.

Bayesian tree inferred from the PCGRNA dataset using PhyloBayes under the CAT-MTART model. Node numbers show poster probability values (>0.9). Scale bar represents substitutions/site. Asterisks designate the species newly sequenced in this study. The colored lines correspond to the subfamilies recovered.

Alternative Hypothesis Testing

The hypothesis that the basal division of Scarabaeidae into two clades (coprophagous and phytophagous) was strongly supported by the statistical testing with different datasets (Table 3). Moreover, hypothesis testing consistently supported Sericinae as an independent lineage and sister to all other phytophagous lineages. This hypothesis received the highest likelihood values in all analyses but for that from the PCG_AA dataset. Although the monophyly of Melolonthinae cannot be addressed unambiguously, the likelihood value for a tree constraining Melolonthinae as non-monophyletic was often higher than that constraining a monophyletic Melolonthinae. The paraphyly of Rutelinae was rejected by the PCG dataset (P < 0.05 in both SH and AU tests, Table 3), whereas the monophyly of Rutelinae was consistently supported by all testing. Based on five datasets and two inference methods, a total of 10 trees were produced (Figs. 2–5). The results of statistical testing of tree topology indicated that the topology resulting from the partitioned ML analyses of the PCGRNA and PCG datasets (Fig. 2 and 3A) was more likely to present the true tree, with respect to their highest likelihood values (Supp. Table S5). Whereas, some of the Bayesian trees were rejected (e.g., PCGRNA-Bayes-tree, P < 0.05 in all SH or AU tests).
Table 3.

Hypothesis testing: results of the SH and AU tests, with rejection at P < 0.05

Hypothesis testedPCGPCG_AAPCGDegenPCGDegenRNAPCGRNA
Log-likelihood P values forLog-likelihood P values forLog-likelihood P values forLog-likelihood P values forLog-likelihood P values for
SHAUSHAUSHAUSHAUSHAU
Coprophagous + Phytophagous−163937.180.4330.274−66075.520.9070.731−54370.010.7830.459−74014.860.8360.358−194662.250.7430.427
Sericinae + all other phytophagous−163907.950.9150.786−66085.880.4760.285−54365.570.7210.611−74014.860.8650.423−194656.990.9790.796
Melolonthinae as a nonmonophyletic group−163988.420.0400.010−66098.730.1870.138−54389.190.1810.040−74033.710.0950.023−194664.160.6580.468
Melolonthinae as a monophyletic group−163968.760.1120.027−66098.730.1870.138−54393.380.1560.038−74033.710.0950.024−194668.230.5780.179
Cetoniinae + (monophyletic Rutelinae + Dynastinae)−163926.750.6010.349−66075.520.9090.725−54371.560.6320.412−74018.960.4750.353−194680.670.2920.065
Cetoniinae + (paraphyletic Rutelinae + Dynastinae)−163995.820.0310.010−66087.210.3780.132−54370.010.7830.502−74014.860.9480.758−194675.530.3950.254
Hypothesis testing: results of the SH and AU tests, with rejection at P < 0.05 Bayesian trees inferred from the datasets of (A) PCG, (B) PCG_AA, (C) PCGDegen, and (D) PCGDegenRNA using PhyloBayes. Node numbers show poster probability values. Scale bar represents substitutions/site.

Discussion

The present study adds five new mitogenome sequences to the phytophagous scarab beetles, with the goal of investigating the higher-level relationships within Scarabaeidae. Our results demonstrate that NGS techniques adopted here can generate mitogenomes in a more efficient and cost-effective way, and that the mitogenomic data can provide insights into the subfamily relationships of Scarabaeidae. Although there have been numerous studies attempting to reconstruct the evolutionary history of scarab beetles (Howden 1982; Browne and Scholtz 1995, 1996, 1998; Ahrens 2005; Smith et al. 2006; Hunt et al. 2007; Ahrens et al. 2014; Bocak et al. 2014; Eberle et al. 2014), no authors utilized the mitogenomes alone to estimate the phylogeny of Scarabaeidae. This inspired us to perform this analysis to further assess the phylogenetic usefulness of mitogenomes. The backbone relationships recovered by the current mitogenomic data are concordant with previous morphological and molecular studies, namely that the Scarabaeidae comprises the traditionally recognized coprophagous and phytophagous lineages (Browne and Scholtz 1999, Smith et al. 2006, Ahrens et al. 2014, Bocak et al. 2014). The former clade includes the subfamilies Aphodiinae and Scarabaeinae, while the latter consists of Sericinae, Melolonthinae, Rutelinae, Dynastinae, and Cetoniinae. We acknowledge the effect of the sparse taxon sampling of several lineages on the recovery of relationships within this megadiverse insect group. There is an immediate need of an extensive sequencing program of scarab beetles. This article contributes to stimulate interest in phylogenomic study of Scarabaeidae by using whole mitogenome sequences. The monophyly of Rutelinae and the placement of C. jansoni (Melolonthinae) were variable. Two ML analyses based on the degenerated datasets under the homogeneous model and four out of five Bayesian analyses under the site-heterogeneous model supported a paraphyletic Rutelinae. Recovery of the monophyly of Rutelinae depended on whether the single representative of Dynastinae (i.e., C. vallatus) clustered with one exemplar of Rutelinae (i.e., Adoretus sp.) or not. The majority of analyses retrieved a sister-group relationship of C. jansoni and Holotrichia sp., both of which were outside the main clade of Melolonthinae. Three species (i.e., C. jansoni, Holotrichia sp. and C. vallatus) leading to the unstable reconstructions exhibited the obviously long branch lengths, particularly in the Bayesian trees. In the analysis of substitution rates of protein-coding genes, we found the rate heterogeneity in the current data (Table 2). Moreover, three species motioned above had the greatly accelerated rates, compared with other scarab beetles. Long branches associated with rate heterogeneity may result in the incongruence between analyses. With regard to the monophyly of the subfamily Melolonthinae, there has been no consensus opinion among coleopterists. Based on the nuclear gene datasets, Smith et al. (2006) did not provide definitive results on the monophyly of the Melolonthinae and suggested additional phylogenetically informative characters needed to tackle this problem. Machatschke (1959) proposed the monophyly of the subfamily Sericinae, which rendered the paraphyly of the Melolonthinae. This hypothesis was confirmed by another morphological study by Ahrens (2005). In this paper, the Sericinae was consistently recovered as an independent clade and placed as the most basal branching lineage within the phytophagous scarab beetles. Thus, our data supported the point of the elevated status of Sericini to be the subfamily rank (Sericinae). Meanwhile, a non-monophyletic Melolonthinae was supported in all analyses. The sister-group relationship between Rutelinae and Dynastinae was always recovered by previous studies (Howden 1982, Browne and Scholtz 1999, Hunt et al. 2007, Bocak et al. 2014). Smith et al. (2006) indicated that some taxa of the Dynastinae might be moved to the subfamily Rutelinae. In the PCGDegen and PCGDegenRNA ML analyses and in all Bayesian analyses but PCG_AA, the Rutelinae was recovered as a paraphyletic group with respect to the single representative of Dynastinae. Despite this, the preferred trees (i.e., PCG-ML-tree and PCGRNA-ML-tree, see results in Supp. Table S5) supported the monophyly of Rutelinae (BP = 95 in both trees). Future studies should focus on adding whole mitogenome sequences of other underrepresented taxa, especially Dynastinae, in order to advance understanding of the subfamily level relationships of the phytophagous clades. The sister group relationship between the clade comprising Rutelinae and Dynastinae and the subfamily Cetoniinae was strongly supported in all analyses (BP ≥ 88 and PP ≥ 0.98). This arrangement is in line with the result of Bocak et al. (2014), but conflict with Browne and Scholtz (1999) and Hunt et al. (2007). The latter two researches recovered a close affinity of the clade Rutelinae + Dynastinae and the Melolonthinae (Browne and Scholtz 1999, Hunt et al. 2007). Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file. Click here for additional data file.
  33 in total

1.  CONSEL: for assessing the confidence of phylogenetic tree selection.

Authors:  H Shimodaira; M Hasegawa
Journal:  Bioinformatics       Date:  2001-12       Impact factor: 6.937

2.  An approximately unbiased test of phylogenetic tree selection.

Authors:  Hidetoshi Shimodaira
Journal:  Syst Biol       Date:  2002-06       Impact factor: 15.683

3.  Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses.

Authors:  Robert Lanfear; Brett Calcott; Simon Y W Ho; Stephane Guindon
Journal:  Mol Biol Evol       Date:  2012-01-20       Impact factor: 16.240

4.  IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth.

Authors:  Yu Peng; Henry C M Leung; S M Yiu; Francis Y L Chin
Journal:  Bioinformatics       Date:  2012-04-11       Impact factor: 6.937

5.  FASconCAT: Convenient handling of data matrices.

Authors:  Patrick Kück; Karen Meusemann
Journal:  Mol Phylogenet Evol       Date:  2010-04-21       Impact factor: 4.286

6.  MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors:  Kazutaka Katoh; Daron M Standley
Journal:  Mol Biol Evol       Date:  2013-01-16       Impact factor: 16.240

7.  Why barcode? High-throughput multiplex sequencing of mitochondrial genomes for molecular systematics.

Authors:  M J T N Timmermans; S Dodsworth; C L Culverwell; L Bocak; D Ahrens; D T J Littlewood; J Pons; A P Vogler
Journal:  Nucleic Acids Res       Date:  2010-09-28       Impact factor: 16.971

8.  MITOS: improved de novo metazoan mitochondrial genome annotation.

Authors:  Matthias Bernt; Alexander Donath; Frank Jühling; Fabian Externbrink; Catherine Florentz; Guido Fritzsch; Joern Pütz; Martin Middendorf; Peter F Stadler
Journal:  Mol Phylogenet Evol       Date:  2012-09-07       Impact factor: 4.286

9.  Soup to Tree: The Phylogeny of Beetles Inferred by Mitochondrial Metagenomics of a Bornean Rainforest Sample.

Authors:  Alex Crampton-Platt; Martijn J T N Timmermans; Matthew L Gimmel; Sujatha Narayanan Kutty; Timothy D Cockerill; Chey Vun Khen; Alfried P Vogler
Journal:  Mol Biol Evol       Date:  2015-05-08       Impact factor: 16.240

10.  The evolution of morphospace in phytophagous scarab chafers: no competition--no divergence?

Authors:  Jonas Eberle; Renier Myburgh; Dirk Ahrens
Journal:  PLoS One       Date:  2014-05-29       Impact factor: 3.240

View more
  1 in total

1.  The complete mitochondrial genome sequence of Oryctes rhinoceros (Coleoptera: Scarabaeidae) based on long-read nanopore sequencing.

Authors:  Igor Filipović; James P Hereward; Gordana Rašić; Gregor J Devine; Michael J Furlong; Kayvan Etebari
Journal:  PeerJ       Date:  2021-01-13       Impact factor: 2.984

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.