Literature DB >> 26106582

The complete mitochondrial genome of Papilio glaucus and its phylogenetic implications.

Jinhui Shen1, Qian Cong1, Nick V Grishin2.   

Abstract

Due to the intriguing morphology, lifecycle, and diversity of butterflies and moths, Lepidoptera are emerging as model organisms for the study of genetics, evolution and speciation. The progress of these studies relies on decoding Lepidoptera genomes, both nuclear and mitochondrial. Here we describe a protocol to obtain mitogenomes from Next Generation Sequencing reads performed for whole-genome sequencing and report the complete mitogenome of Papilio (Pterourus) glaucus. The circular mitogenome is 15,306 bp in length and rich in A and T. It contains 13 protein-coding genes (PCGs), 22 transfer-RNA-coding genes (tRNA), and 2 ribosomal-RNA-coding genes (rRNA), with a gene order typical for mitogenomes of Lepidoptera. We performed phylogenetic analyses based on PCG and RNA-coding genes or protein sequences using Bayesian Inference and Maximum Likelihood methods. The phylogenetic trees consistently show that among species with available mitogenomes Papilio glaucus is the closest to Papilio (Agehana) maraho from Asia.

Entities:  

Keywords:  Illumina sequencing; Mitochondrial genome; Papilio glaucus; Phylogeny

Year:  2015        PMID: 26106582      PMCID: PMC4475787          DOI: 10.1016/j.mgene.2015.05.002

Source DB:  PubMed          Journal:  Meta Gene        ISSN: 2214-5400


Introduction

The order Lepidoptera contains approximately 160,000 described and half a million estimated species (Kristensen et al., 2007), and it represents one of the most diverse and fascinating group of insects. Recent studies have revealed their potential as model organisms to study the genetics of interesting phenotypic traits in butterflies, such as the Batesian mimicry in swallowtails (Clarke and Sheppard, 1972, Nishikawa et al., 2013), migration in the monarch (Zhan et al., 2011, Zhan et al., 2014) and wing pattern development in longwings (Hines et al., 2012, Surridge et al., 2011). The profound diversity and the recent evolutionary radiation of Lepidoptera provide rich materials to study evolution, speciation and adaptation (Engsontia et al., 2014, Zhang et al., 2013a). These studies benefit significantly from decoding the genomes of various Lepidoptera species. Recently, we published the draft genome of Eastern Tiger swallowtail Papilio (Pterourus) glaucus using next generation sequencing techniques (Cong et al., 2015). This nuclear whole genome was the first reported from the Papilionidae family. Traditional genome assemblers failed to automatically assemble the mitogenome, probably due to the difficulty in distinguishing NGS reads of the mitogenome from those of nuclear genome, the presence of nuclear copies of mitochondrial (NUMT) DNA (which can lead to conflicts in assembly), and the poor signal-to-noise ratio caused by the high coverage of mitochondrial DNA (Hahn et al., 2013). However, a dedicated effort should be able to assemble the mitogenome from whole-genome sequencing reads. The mitogenome sequences are expected to be useful for phylogenetic studies, and they have been obtained for many species. The Papilionidae family has over 570 species worldwide (C.A. Bridges, 1988), while only 17 species have complete mitogenomes currently available in GenBank (included in Table 1, accessed on: 11/28/2014).
Table 1

List of taxa analyzed in present paper.

FamilySpeciesLength (bp)Accession numberReferences
PapilionidaePapilio glaucus15,306KR822739This study
Papilio bianor15,357NC_018040.1Unpublished
Papilio dardanus15,337JX313686.2Unpublished
Papilio maackii15,357KC433408.1Dong et al. (2013)
Papilio machaon15,185HM243594.1Unpublished
Papilio maraho16,094FJ810212.1Unpublished
Papilio polytes15,256KM014701.1Wang et al. (2014a)
Papilio syfanius15,359KJ396621.1Dong et al. (2014)
Parnassius apollo15,404KF746065.1Chen et al. (2014a)
Parnassius bremeri15,389NC_014053.1Kim et al. (2009)
Parnassius imperator15,424KM507326.1Wang et al. (2014b)
Sericinus montela15,242HQ259122.1Ji et al. (2012)
Luehdorfia taibai15,553KC952673.1Lian-Xi et al. (2014)
Teinopalpus aureus15,242HM563681.1Qin et al. (2012a)
Lamproptera curius15,277KJ141168.1Unpublished
Graphium timur15,226KJ472924.1Chen et al. (2014b)
Atrophaneura alcinous15,266KJ540880.1Chen et al. (2014c)
Troides aeacus15,263EU625344.1Unpublished
LycaenidaeCoreana raphaelis15,314DQ102703.1Kim et al. (2006)
Cupido argiades15,330KC310728.1Zhang et al. (2013b)
Curetis bulis15,162JX262888.1Zhang et al. (2013c)
Lycaena phlaeas15,280JX262887.1Zhang et al. (2013c)
Protantigius superans15,248HQ184265.1Kim et al. (2011a)
Spindasis takanonis15,349HQ184266.1Kim et al. (2011a)
RiodinidaeAbisara fylloides15,301HQ259069.1Unpublished
Apodemia mormo15,262KJ647171.1Kim and Kim (2014)
PieridaeAnthocharis bambusarum15,180KC465748.1Unpublished
Aporia crataegi15,140JN796473.1Park et al. (2012)
Aporia intercostata15,144KC461928.1Unpublished
Catopsilia pomona15,142JX274649.1Hao et al. (2014)
Delias hyparete15,186JX094279.1Shi et al. (2012)
Eurema hecabe15,160KC257480.1Sun et al. (2014)
Hebomoia glaucippe15,701KC489093.1Hao et al. (2013a)
Leptidea morsei15,122JX274648.1Hao et al. (2014)
Artogeia melete15,140EU597124.1Hong et al. (2009)
Pieris rapae15,157NC_015895.1Mao et al. (2010)
HesperiidaeAmpittia dioscorides15,313KM102732.1Unpublished
Carterocephalus silvicola15,765KJ629163.1Kim et al. (2014)
Celaenorrhinus maculosa15,282KF543077.1Wang et al. (2013a)
Choaspes benjaminii15,300KJ629164.1Kim et al. (2014)
Ctenoptilum vasava15,468JF713818.1Hao et al. (2012)
Daimio tethys15,350KJ629165.1Kim et al. (2014)
Erynnis montanus15,530KC659955.1Wang et al. (2014c)
Lobocla bifasciatus15,366KJ629166.1Kim et al. (2014)
Ochlodes venata15,622HM243593.1Unpublished
Potanthus flavus15,267KJ629167.1Kim et al. (2014)
NymphalidaeAbrota ganga15,356KF590536.1Wu et al. (2014)
Acraea issoria15,245GQ376195.1Hu et al. (2010)
Apatura ilia15,242JF437925.1Chen et al. (2012)
Apatura metis15,236JF801742.1Zhang et al. (2012)
Argynnis childreni15,131KF590547.1Wu et al. (2014)
Argynnis hyperbius15,156JF439070.1Wang et al. (2011)
Athyma asura15,181KF590542.1Wu et al. (2014)
Athyma cama15,269KF590526.1Wu et al. (2014)
Athyma kasa15,230KF590524.1Wu et al. (2014)
Athyma opalina15,240KF590551.1Wu et al. (2014)
Athyma perius15,277KF590528.1Wu et al. (2014)
Athyma selenophora15,208KF590529.1Wu et al. (2014)
Athyma sulpitia15,268JQ347260.1Tian et al. (2012)
Bhagadatta austenia15,615KF590545.1Wu et al. (2014)
Calinaga davidis15,267HQ658143.1Xia et al. (2011)
Danaus chrysippus15,236KF690637.1Gan et al. (2014a)
Danaus plexippus15,314KC836923.1Servin-Garciduenas and Martinez-Romero (2014)
Dichorragia nesimachus15,355KF590541.1Wu et al. (2014)
Dophla evelina15,320KF590532.1Wu et al. (2014)
Euploea core15,192KF590546.1Wu et al. (2014)
Euploea midamus15,187KJ866207.1Unpublished
Euploea mulciber15,166HQ378507.1Hao et al. (2013b)
Euthalia irrubescens15,365KF590527.1Wu et al. (2014)
Fabriciana nerippe15,140JF504707.1Kim et al. (2011b)
Hamadryas epinome15,207KM378244.1Cally et al. (2014)
Heliconius cydno15,367KM208636.1Qian (2014)
Heliconius hecale15,338KM068091.1Shen and Wang (2014)
Heliconius melpomene15,328KP100653.1(Heliconius Genome, 2012, Meng et al., 2014)
Heliconius pachinus15,369KM014809.1Huang et al. (2014a)
Hipparchia autonoe15,489GQ868707.1Kim et al. (2010)
Ideopsis similis15,200KJ476729.1Gan et al. (2014b)
Issoria lathonia15,172HM243590.1Unpublished
Junonia almana15,256KF590539.1Wu et al. (2014)
Junonia orithya15,214KF199862.1Shi et al. (2013a)
Kallima inachus15,183JN857943.1Qin et al. (2012b)
Lexias dirtea15,250KF590531.1Wu et al. (2014)
Libythea celtis15,164HQ378508.1Unpublished
Melanargia asiatica15,142KF906486.1Huang et al. (2014b)
Melanitis leda15,122JF905446.1Shi et al. (2013b)
Melanitis phedima15,142KF590538.1Wu et al. (2014)
Melitaea cinxia15,162HM243592.1Unpublished
Neope pulaha15,209KF590543.1Wu et al. (2014)
Neptis philyra15,164KF590552.1Wu et al. (2014)
Neptis soma15,130KF590533.1Wu et al. (2014)
Pandita sinope15,257KF590530.1Wu et al. (2014)
Pantoporia hordonia15,603KF590534.1Wu et al. (2014)
Parantica sita15,211KF590544.1Wu et al. (2014)
Pararge aegeria15,240KJ547676.1Teixeira da Costa (2014)
Parasarpa dudu15,236KF590537.1Wu et al. (2014)
Parthenos sylvia15,249KF590550.1Wu et al. (2014)
Polyura arja15,363KF590540.1Wu et al. (2014)
Sasakia charonda15,233JX119051.1Wang et al. (2012)
Sasakia funebris15,233JX131328.1Wang et al. (2013b)
Tanaecia julii15,316KF590548.1Wu et al. (2014)
Timelaea maculata15,178KC572131.1Cao et al. (2013)
Tirumala limniace15,285KJ784473.1Gan et al. (2014c)
Triphysa phryne15,143KF906487.1Zhang et al. (2014a)
Yoma sabina15,330KF590535.1Wu et al. (2014)
Ypthima akragas15,227KF590553.1Wu et al. (2014)
CossidaeEogystia hippophaecolus15,431KC831443.1Gong et al. (2014)
BombycidaeBombyx mori15,643KM279431.1Zhang et al. (2014b)
HepialidaeThitarodes renzhiensis16,173NC_018094.1Cao et al. (2012)
The insect mitogenome is a circular DNA of 14–19 kilobases (kb), containing 13 protein-coding genes (PCGs), 2 ribosomal-RNA-coding genes (rRNAs), 22 transfer-RNA-coding genes (tRNAs), and an A + T rich displacement loop (D-loop) control region (Cameron, 2014). Because of their maternal inheritance, compact structure, lack of genetic recombination, and relatively fast evolutionary rate, mitogenomes have been used widely in molecular phylogenetics and evolution studies (Cameron, 2014, Moritz et al., 1987). Here, we reconstruct and annotate the complete mitogenome of Papilio glaucus from next generation sequencing reads, and perform phylogenetic analyses of P. glaucus mitogenome with available complete mitogenomes of butterflies.

Materials and methods

Library preparation and Illumina sequencing

A male P. glaucus was caught and freshly frozen from Lake Ray Roberts State Park, Greenbelt Corridor along the Elm Fork of the Trinity River, 33.2536, − 97.0434, Denton County, Texas, USA (date 4-VIII-2013). The specimen will be deposited in the National Museum of Natural History, Smithsonian Institution, Washington, DC, USA (USNM). Detailed procedures and protocols for library preparation were described in Cong et al. (2015). Briefly, we extracted genomic DNA from a piece of muscle dissected from the butterfly thorax using the ChargeSwitch gDNA mini tissue kit (Life Technologies, Grand Island, NY, USA). 250 bp and 500 bp paired-end libraries were made following the Illumina TruSeq DNA sample preparation guide using enzymes from NEBNext Modules (New England Biolabs, Ipswich, MA, USA). These libraries were sequenced at the genomics core facility in UT Southwestern Medical Center for 150 bp from both ends with a rapid run on Illumina HiSeq1500.

Sequence assembly

Sequencing reads were processed sequentially by MIRABAIT (Chevreux et al., 1999) to remove contamination from sequence adapters, by Fastq_quality_trimmer (http://hannonlab.cshl.ed/fastx_toolkit/) to trim low-quality regions at both ends and by QUAKE to correct errors (Cong et al., 2015, Kelley et al., 2010). From either the 250 bp or 500 bp library, we used mitochondrial baiting and iterative mapping (MITObim) v1.6 (Hahn et al., 2013) to assemble the mitogenome using two approaches: (1) using mitogenomes of Papilio maackii (KC433408.1), Papilio polytes (KM014701.1) and Papilio maraho (FJ810212.1) as references to guide the assembly; (2) using a short COI barcode sequence (a segment of about 600 bp from the mitochondrial gene cytochrome oxidase I) of P. glaucus (GU090087.1) as the starting seed. We used the default parameters for MITObim except for setting the — kbait to be 35 instead of 31. The genome assemblies directly produced by the reference-guided mode in MITObim did not directly consider that the mitogenome should be a circular DNA and that the reads mapped to the N-terminus of the reference sequence could overlap with reads mapped to the C-terminus of the reference mitogenome. Therefore, it produced sequences whose C-terminal segment (usually about several hundred base pairs) is a duplication of the N-terminal segment. We detected such duplicated regions by aligning the N-terminal half and C-terminal half with BLASTN and manually removed the redundant segments. We also adjusted the linear representation of the circular DNA by circular permutation so that the sequence starts with the coding gene for ND2, which is the convention for most Lepidoptera sequences deposited in the database. We aligned the mitogenome sequences produced by different methods with MAFFT (Katoh and Standley, 2013) (Supplementary data). These assemblies mostly agree with each other, with most discrepancies located in the D-loop region. We derived our final mitogenome sequence from the alignment of these different assemblies by taking the dominant nucleotide or gap at each position.

Annotation and analysis of the mitochondrial genome

The mitogenome sequence was annotated using the MITOS web server (Bernt et al., 2013). We translated the sequences of PCGs to protein sequences using the genetic code for invertebrate mitogenomes. Secondary structures of tRNA genes were predicted using the same server.

Assembly quality assessment

We first checked if the assembly was well-supported by the sequencing reads by mapping the reads to the mitogenome using bowtie2 v2.2.3 (Langmead and Salzberg, 2012). The alignments were combined into one single SAM-format file, processed with SAMtools (Li et al., 2009) and visualized in Integrative Genomics Viewer (IGV) (Robinson et al., 2011). Number of mapped reads (coverage by reads) at each position was calculated using bedtools v2.20.1 (Quinlan and Hall, 2010) and the histogram of the coverage was prepared in IBM SPSS Statistics v21 and Microsoft Excel 2010. Second, we assessed the quality of our assembly by its consistency with other published Papilio sequences in the protein-, rRNA- and tRNA-coding regions. We aligned the rRNA- and tRNA-coding sequences directly and aligned translated sequences for PCGs to the corresponding proteins of other published Papilio mitogenomes. Alignments confirmed that our sequences are consistent with the majority of these published mitogenomes, and gaps are only in regions that are poorly conserved among other Papilio species. Finally, we confirmed the assembly by comparing with Sanger sequencing results. We compared the mitogenome sequence with a reported 2291 bp partial mitogenome sequence of P. glaucus (accession: AF044013) (Caterino and Sperling, 1999) using BLAST. In addition, we sequenced the D-loop region and one arbitrarily selected coding region that partly covers ND4 and ND5. We amplified the two fragments from genomic DNA using primers shown in Table 2 with AmpliTaq Gold® 360 Master Mix (Life Technologies, Grand Island, NY, USA), following the manufacturer's protocol. Amplified products were separated by 2% E-Gel® EX Agarose Gels (Life Technologies, Grand Island, NY, USA) and purified by Zymoclean™ Gel DNA Recovery Kit (Zymo Research, Irvine, CA, USA). The purified DNA fragments were sent for Sanger sequencing in the sequencing core facility at UT Southwestern Medical Center. Sequencing results were manually confirmed by visualizing the traces in FinchTV v1.4 and then compared with the assembled mitogenome.
Table 2

Primers used for amplification of the fragments containing D-loop and ND4 end region.

Fragment location on genomePrimerPrimer sequence (5′ to 3′)
14,387–15,306, 1–104D-loopFGCAACTGCTGGCACAAAAT
D-loopRCCAATTCAACATCCCAATCA
7428–8136ND4FCTAATCCTAACCCATCCCAACC
ND4RTAGCTGCTCCTCCTTCTATGA

Phylogenetic analysis

104 complete, non-redundant butterfly mitogenomes that are currently available were downloaded from GenBank (Table 1). Moth mitogenomes from Thitarodes renzhiensis (NC_018094.1), Bombyx mori (KM279431.1), and Eogystia hippophaecolus (KC831443.1) were also downloaded and used as outgroups. DNA sequences of the 37 protein- and RNA-coding genes were aligned by MAFFT. We manually checked the alignments of each gene, corrected sequences from some species with annotation errors based on consensus, replaced mitogenomes of poor quality (for example, sequences with frame-shift mutation that causes premature ending of proteins) with alternative mitogenome from the same species, and removed positions with large gaps and their surrounding regions with uncertain alignment. The processed alignments were analyzed for phylogeny with Bayesian Inference and Maximum likelihood methods using MrBayes v3.2 (Huelsenbeck and Ronquist, 2001) and RaxML v8.1 (Stamatakis, 2006). We built trees with a different partitioning of the data sets (unpartitioned; partitioned by genes, PCG codon positions, and the exclusion of 3rd codon positions of PCGs). In addition, the translated protein sequences were aligned with MAFFT and analyzed with MrBayes. For analyses based on DNA alignments, the most suitable nucleotide substitution model (GTR + I + G) selected by jModelTest v2.1.7 (Darriba et al., 2012) was used, and for the protein-based analyses, we used a mixed model (Poisson, Jones, Dayhoff, Mtrev, Mtmam, Wag, Rtrev, Cprev, Vt, and Blosum) provided by the MrBayes program. The resulting phylogenetic trees were visualized in FigTree v1.4.2.

Results

Coverage of the mitogenome assembly by the reads

The coverage of the assembled P. glaucus mitogenome was high by the sequencing reads, with about 6000 fold mean coverage at base pair level and 500 fold minimal coverage (Fig. 1A and B). As shown in Fig. 1A, the regions with the lowest coverage were the beginning, the end and the D-loop regions. The low coverage at the beginning and the end of the mitogenome was primarily an artifact from limiting each read to map to only one most likely position: the circular DNA was represented as a linear sequence, and reads that should map partly to the beginning and partly to the end were restricted to map to either the beginning or the end.
Fig. 1

Coverage of Papilio glaucus mitogenome by sequencing reads (250 bp and 500 bp) mapped to them by Bowtie2. (A). Coverage at each base position. (B). Histogram of coverage distribution. (C). Negative correlation between the coverage and A + T contents of 50 bp windows in the genome.

The lower coverage in the D-loop might indicate potential errors in that region. However, independent Sanger sequencing of both the D-loop region (from the end of rrnS to the beginning of ND2) and another arbitrarily selected region (from the end of ND4 to the beginning of ND5), completely matched our assembly, indicating its high quality. In addition, the mitogenome sequence also agreed with the partial mitogenome (2291 bp) sequence of P. glaucus (accession: AF044013) in the database. Our sequence showed 0.3% sequence divergence from the previous sequence (only 6 out of 2291 positions are different), which likely corresponded to sequence variations between different individuals of the same species. Instead, the low coverage of the D-loop region is probably related to its AT-rich composition, which tends to break during library preparation and is thus underrepresented in the sequencing libraries (Benjamini and Speed, 2012). Indeed, we observed that the percentage of A and T in a 50 bp window from the mitogenome is negatively correlated with the coverage for that region by the reads (Fig. 1C).

Base composition and genome structure

The P. glaucus mitogenome is a closed circular DNA of 15,306 bp. The nucleotide composition of the majority strand is A = 6117 (39.96%), T = 6193 (40.46%), G = 1167 (7.62%), and C = 1829 (11.95%), which is highly biased towards A and T. The majority strand has a negative AT-skew (− 0.0062) and GC-skew (− 0.2210) (Table 3), indicating a higher occurrence of T over A, C over G nucleotides on this strand. The P. glaucus mitogenome retains the typical insect mitogenome gene set, including 13 PCGs (ND1-6, COX1-3, ND4L, ATP8, ATP6, and CYTB), 22 tRNA genes (two for serine and leucine and one for each of the remaining amino acids), 2 ribosomal RNAs (rrnL and rrnS), and an A + T rich D-loop control region (Fig. 2 and Table 4).
Table 3

Composition and skewness of Papilio glaucus mitogenome regions.

NucleotidesWhole genomePCGstRNAsrRNAsA + T rich regionIntergenic spacer region
A%39.9639.5240.0640.7647.1339.36
T%40.4639.5040.9643.3347.5445.74
G%7.628.338.324.901.844.26
C%11.9512.6510.6711.003.4810.64
A + T%80.4379.0281.0184.1094.6785.11
G + C%19.5720.9818.9915.905.3314.89
AT-skew− 0.00620.0003− 0.0111− 0.0306− 0.0043− 0.0750
GC-skew⁎⁎− 0.2210− 0.2061− 0.1241− 0.3832− 0.3077− 0.4286

AT-skew = [A − T] / [A + T].

GC-skew = [G − C] / [G + C] (Perna and Kocher, 1995).

Fig. 2

Map of genes in the Papilio glaucus mitochondrial genome. PCGs are colored in yellow, tRNA-coding genes are in purple, rrnL and rrnS are in green. Each gene is shown as an arrow indicating the transcription direction. The black line in the middle shows the coordinate of each gene in the mitogenome. The arrows on top of the line correspond to genes coded on the majority strand, and those below show genes on the minority strand.

Table 4

Summary of the Papilio glaucus mitogenome.

GeneDirectionFromToSizeIntergenic nucleotidesAnticodonStart codonStop codon
ND2F110141014− 2ATTTAA
trnWF1013107765− 8TCA
trnCR10701131623GCA
trnYR11351199652GTA
COX1F1202273215310CGAT
trnL2F27332800680TAA
COX2F280134826820ATGT
trnKF34833552700CTT
trnDF35533619670GTC
ATP8F36203787168− 7ATTTAA
ATP6F378144586787ATGTAA
COX3F446652547892ATGTAA
trnGF5257532266− 3TCC
ND3F53205676357− 2ATATAG
trnAF5675573864− 1TGC
trnRF5738580164− 1TCG
trnNF58015866660GTT
trnS1F58675927611ACT
trnEF5929599567− 2TTC
trnFR59946060670GAA
ND5R6061779417340ATATAA
trnHR77957860660GTG
ND4R786192021342− 4ATAT
ND4LR919994892916ATGTAA
trnTF94969559640TGT
trnPR95609623642TGG
ND6F962610,1595344ATTTAA
CYTBF10,16411,31811552ATATAA
trnS2F11,32111,3856516TGA
ND1R11,40212,3409391ATGTAG
trnL1R12,34212,409680TAG
rrnLR12,41013,72813190
trnVR13,72913,791630TAC
rrnSR13,79214,5727810
A + T rich region14,57315,0604880
trnMF15,06115,129690CAT
trnIF15,13015,19364− 3GAT
trnQR15,19115,2586848TTG

Annotation of the mitogenome

The annotation of the mitogenome is illustrated in Fig. 2 and summarized in Table 4. Nine protein-coding genes (ND2, COX1, COX2, ATP8, ATP6, COX3, ND3, ND6, and CYTB) are coded on the majority strand. COX1 uses start codon CGA, which is consistent with many other insect mitogenomes (Kim et al., 2009). A recent study using an expressed sequence tag from a Lepidopteran species confirmed the presence of COXI transcripts starting from CGA (Margam et al., 2011). Each of the rest of the genes starts with the typical ATN. COX1, COX2 and ND4 use an incomplete stop codon T (Ojala et al., 1981), and a complete TAA codon will likely be formed during mRNA maturation (Boore, 1999, Ojala et al., 1981). The 13 PCGs have a total length of 11,214 bp (Table 4). 14 out of the 22 tRNA-coding genes are encoded on the majority strand. The tRNAs have a total length of 1443 bp, and their individual lengths range from 61 bp to 70 bp (Table 4). Secondary structures predicted by MITOS suggest that all tRNA genes adopt a typical cloverleaf structure except for trnS1 (Fig. 3). The dihydrouridine (DHU) arm of trnS1 does not form a stable stem-loop structure, which is very common in butterfly mitogenomes (Kim et al., 2014, Lu et al., 2013). The two rRNA genes, rrnL and rrnS, are located on the minority strand, and their lengths are 1319 bp and 781 bp, respectively (Table 4).
Fig. 3

Secondary structure of 22 tRNA-coding genes of Papilio glaucus mitogenome predicted by the MITOS web server. The tRNAs are labeled by their corresponding amino acids in abbreviations.

A 488 bp A + T rich region (A + T content: 94.7%) connects rrnS and trnM. This region contains an “ATAGA” motif located 19 bp downstream from rrnS and followed by 14 bp of poly-T stretch that is consistent with a gene regulation element commonly found in Lepidoptera (Lu et al., 2013, Salvato et al., 2008). In addition to this A + T rich region, there are 94 bp non-coding nucleotides that make up 12 intergenic spacer sequences, ranging from 1 bp to 48 bp in length. The longest 48 bp spacer is located between trnQ and ND2, and a 16 bp spacer is located between trnS2 and ND1 (Table 4). In addition, there are 33 bp of overlapping sequences at 10 locations. The longest 8 bp overlap is between trnW and trnC. There is a 7 bp “ATGATAA” overlap between ATP8 and ATP6 (Table 4), and this is a common feature for Lepidopteran mitogenomes (Lu et al., 2013).

Phylogenetic relationships

We phylogenetically analyzed 105 butterfly species from 6 families: Papilionidae, Hesperiidae, Pieridae, Lycaenidae, Riodinidae, and Nymphalidae, and used 3 species of moths as outgroups. Maximum Likelihood analysis of the DNA alignments (Fig. 4A) and Bayesian Inference of DNA (Fig. 4B and Supplemental S2) and protein sequences (Fig. 4C) correctly partitioned butterflies into 6 families and suggested the same tree topology at the family level: (Papilionidae + (Hesperiidae + (Pieridae + ((Riodinidae + Lycaenidae) + Nymphalidae)))). Tree topologies between different methods were very similar. The positions of several species vary between trees obtained with different methods, such as Carterocephalus silvicola, Hebomoia glaucippe, and Ypthima akragas. However, all the trees consistently place P. glaucus, the only available mitogenome from the subgenus Pterourus, as a sister of P. maraho, the only available mitogenome from the subgenus Agehana (Fig. 4 and Supplemental S2).
Fig. 4

Phylogeny of butterflies. (A). Phylogenetic tree obtained by RaxML with the data set of PCG and RNA genes, partitioned into 13 PCGs, 1 tRNA, and 2 rRNA groups. (B). Phylogenetic tree obtained by MrBayes based on PCG and RNA genes, partitioned into 13 PCGs, 1 tRNA, and 2 rRNA groups. (C). Phylogenetic tree obtained by MrBayes based on 13 protein sequences, unpartitioned. Number at each node shows confidence of that group by bootstrap in (A), or posterior probability in (B) and (C). Thitarodes renzhiensis (NC_018094.1), Bombyx mori (KM279431.1), and Eogystia hippophaecolus (KC831443.1) were used as outgroup.

Discussion

The traditional method to obtain the mitogenome is through Sanger sequencing of a couple of overlapping segments. Here, we describe our protocol of assembling mitogenomes from Next Generation Sequencing reads for whole genome sequencing, and report the mitogenome of Eastern tiger swallowtail, P. glaucus. We used MITObim, a published tool designed for this task. MITObim has a reference-guided mode: it finds the conserved regions of a mitogenome using related reference species and extends these regions by baiting reads that overlap with the assembled regions until no gaps are left in between. Another mode of MITObim works without a reference mitogenome: it starts with a short COI sequence and extends by baiting reads with overlaps, till the N- and C-termini of the sequence can be mapped to the same reads, indicating that a circular mitogenome has been assembled (Hahn et al., 2013). Compared with the traditional PCR method, this method of mitogenome assembly does not require multiple primer designing and optimization, especially for species with limited knowledge available for primers' design. However, MITObim could make mistakes due to (1) ambiguity in mapping and aligning the reads, (2) presence of sequencing error, and (3) difficulty in distinguishing mitochondrial DNA reads from nuclear DNA reads, especially those from nuclear copies of mitochondrial DNA or low-complexity regions. Therefore, taking the consensus of different MITObim runs with different modes and references, and careful validation of the result are needed. We produced a highly accurate assembly by integrating assemblies made by different MITObim modes (using other mitogenomes as references or the COI barcode sequence as seed). Even for the D-loop region that contains multiple short repeats, sequences obtained from Sanger sequencing showed no differences from our assembly. Our phylogenetic analysis yielded a detailed tree of butterflies with available complete mitogenomes. Traditionally, Hesperiidae were considered to be at the root of the phylogenetic tree for all butterflies due to their similarity in morphology to moths (Kristensen and Skalski, 1998). However, several recent studies with molecular evidence have suggested a different evolutionary history of butterflies: (Papilionidae + (Hesperiidae + (Pieridae + (Nymphalidae + (Lycaenidae + Riodinidae))))) (Heikkila et al., 2012, Kim et al., 2014, Regier et al., 2013). Our phylogenetic analysis contained many more taxa compared with previous analyses and supported the same result in placing Papilionidae at the base. The reason for this apparent contradiction between the traditional morphology and recent molecular-based phylogeny is still poorly understood and requires future analysis. It is notable that P. glaucus, a swallowtail native of eastern North America, is found to be confidently grouped with P. maraho based on our phylogeny analysis. P. maraho is a threatened swallowtail endemic to Taiwan in Asia (Baillie and Groombridge, 1996). Morphology, behavior and phylogenetic studies based on COI barcode suggested that P. maraho is very close to Papilio elwesi (Igarashi, 1979, Lu et al., 2009). Both P. elwesi and P. maraho are frequently attributed to the subgenus Agehana, which in some studies is included in subgenus Chilasa (Hancock, 1983, Zakharov et al., 2004). Despite a wide geographic separation, Agehana and Chilasa native to Asia are likely to be the closest relatives of the subgenus Pterourus (Zakharov et al., 2004), which is native to America. We speculate that these butterflies might have migrated between Asia (Agehana) and North America (Pterourus). Swallowtail butterflies have dispersed between continents (Condamine et al., 2012, Condamine et al., 2013). A recent report had found that Polyommatus blue butterflies traveled from Asia to North America via the Bering Strait, ultimately migrating to South America (Vila et al., 2011). Several other reports found that other butterfly species, animals, and plants followed the same route to the New World (Donoghue and Smith, 2004, Enghoff, 1995, Mullen, 2006, Peña et al., 2010). Although our phylogenetic analyses produced similar phylogenetic trees, despite being performed with different methods, on different subsets of data (without or without RNA-coding sequences, DNA and translated proteins), and with different partition schemes, minor variations in positions of some taxa were observed. These inconsistencies might be caused by the available mitogenomes not carrying sufficient information to deduce accurate phylogeny, available software not being fully adequate for the task, or our position selection strategy requiring improvement. The current sample of taxa did not provide a fully resolved, consistent and accurate phylogeny. Additional analyses, mitogenomes, and possibly information from nuclear genomes are needed to address the question of the applicability of mitogenomes to infer a more accurate phylogenetic tree of butterflies. The following are the supplementary data related to this article.

Supplemental S1

Multiple alignment of MITObim results.

Supplemental S2

More phylogenetic trees.
  86 in total

1.  RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models.

Authors:  Alexandros Stamatakis
Journal:  Bioinformatics       Date:  2006-08-23       Impact factor: 6.937

2.  Characterization of complete mitochondrial genome of the skipper butterfly, Celaenorrhinus maculosus (Lepidoptera: Hesperiidae).

Authors:  Kai Wang; Jiasheng Hao; Huabin Zhao
Journal:  Mitochondrial DNA       Date:  2013-10-09

3.  The mitochondrial genome of the Korean hairstreak, Coreana raphaelis (Lepidoptera: Lycaenidae).

Authors:  I Kim; E M Lee; K Y Seol; E Y Yun; Y B Lee; J S Hwang; B R Jin
Journal:  Insect Mol Biol       Date:  2006-04       Impact factor: 3.585

4.  The complete mitochondrial genome of Melanargia asiatica (Lepidoptera: Nymphalidae: Satyrinae).

Authors:  Dunyuan Huang; Jiasheng Hao; Wei Zhang; Tianjuan Su; Ying Wang; Xiaofeng Xu
Journal:  Mitochondrial DNA A DNA Mapp Seq Anal       Date:  2014-05-28       Impact factor: 1.514

5.  Wing pattern evolution and the origins of mimicry among North American admiral butterflies (Nymphalidae: Limenitis).

Authors:  Sean P Mullen
Journal:  Mol Phylogenet Evol       Date:  2006-02-24       Impact factor: 4.286

6.  Patterns of nucleotide composition at fourfold degenerate sites of animal mitochondrial genomes.

Authors:  N T Perna; T D Kocher
Journal:  J Mol Evol       Date:  1995-09       Impact factor: 2.395

7.  The complete mitochondrial genome of the mountainous duskywing, Erynnis montanus (Lepidoptera: Hesperiidae): a new gene arrangement in Lepidoptera.

Authors:  Ah Rha Wang; Heon Cheon Jeong; Yeon Soo Han; Iksoo Kim
Journal:  Mitochondrial DNA       Date:  2013-04-16

8.  The complete mitochondrial genomes of two ghost moths, Thitarodes renzhiensis and Thitarodes yunnanensis: the ancestral gene arrangement in Lepidoptera.

Authors:  Yong-Qiang Cao; Chuan Ma; Ji-Yue Chen; Da-Rong Yang
Journal:  BMC Genomics       Date:  2012-06-22       Impact factor: 3.969

9.  Quake: quality-aware detection and correction of sequencing errors.

Authors:  David R Kelley; Michael C Schatz; Steven L Salzberg
Journal:  Genome Biol       Date:  2010-11-29       Impact factor: 13.583

10.  The complete mitochondrial genome of the bag-shelter moth Ochrogaster lunifer (Lepidoptera, Notodontidae).

Authors:  Paola Salvato; Mauro Simonato; Andrea Battisti; Enrico Negrisolo
Journal:  BMC Genomics       Date:  2008-07-15       Impact factor: 3.969

View more
  12 in total

1.  Genomes reveal drastic and recurrent phenotypic divergence in firetip skipper butterflies (Hesperiidae: Pyrrhopyginae).

Authors:  Jing Zhang; Qian Cong; Jinhui Shen; Ernst Brockmann; Nick V Grishin
Journal:  Proc Biol Sci       Date:  2019-05-29       Impact factor: 5.349

2.  The complete mitochondrial genome of Vanessa indica and phylogenetic analyses of the family Nymphalidae.

Authors:  Youxue Lu; Naiyi Liu; Liuxiang Xu; Jie Fang; Shuyan Wang
Journal:  Genes Genomics       Date:  2018-06-14       Impact factor: 1.839

3.  Fifty new genera of Hesperiidae (Lepidoptera).

Authors:  Qian Cong; Jing Zhang; Jinhui Shen; Nick V Grishin
Journal:  Insecta mundi       Date:  2019-10-11

4.  Taxonomic changes suggested by the genomic analysis of Hesperiidae (Lepidoptera).

Authors:  Jing Zhang; Qian Cong; Jinhui Shen; Nick V Grishin
Journal:  Insecta mundi       Date:  2022-02-25

5.  The mitogenome of a Malagasy butterfly Malaza fastuosus (Mabille, 1884) recovered from the holotype collected over 140 years ago adds support for a new subfamily of Hesperiidae (Lepidoptera).

Authors:  Jing Zhang; David C Lees; Jinhui Shen; Qian Cong; Blanca Huertas; Geoff Martin; Nick V Grishin
Journal:  Genome       Date:  2020-03-06       Impact factor: 2.166

6.  The complete mitochondrial genome of Lerema accius and its phylogenetic implications.

Authors:  Qian Cong; Nick V Grishin
Journal:  PeerJ       Date:  2016-01-07       Impact factor: 2.984

7.  The complete mitogenome of Achalarus lyciades (Lepidoptera: Hesperiidae).

Authors:  Jinhui Shen; Qian Cong; Nick V Grishin
Journal:  Mitochondrial DNA B Resour       Date:  2016-08-30       Impact factor: 0.658

8.  Mitogenomes of Giant-Skipper Butterflies reveal an ancient split between deep and shallow root feeders.

Authors:  Jing Zhang; Qian Cong; Xiao-Ling Fan; Rongjiang Wang; Min Wang; Nick V Grishin
Journal:  F1000Res       Date:  2017-03-06

9.  Mitogenomes of the four Agathymus holotypes collected 55 years ago.

Authors:  Jing Zhang; Qian Cong; Jinhui Shen; Nick V Grishin
Journal:  Mitochondrial DNA B Resour       Date:  2017-09-04       Impact factor: 0.658

10.  The complete mitochondrial genome of a skipper Burara striata (Lepidoptera: Hesperiidae).

Authors:  Jing Zhang; Qian Cong; Jinhui Shen; Rongjiang Wang; Nick V Grishin
Journal:  Mitochondrial DNA B Resour       Date:  2017-03-10       Impact factor: 0.658

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.