The complete mitochondrial genome data of the Common Rose butterfly, Pachliopta aristolochiae (Lepidoptera, Papilionoidea, Papilionidae) from Malaysia.

Marylin Miga, Puteri Nur Syahzanani Jahari, Chan Vei Siang, Kamarul Rahim Kamarudin, Mohd Shahir Shamsir, Lili Tokiman, Sivachandran Parimannan, Heera Rajandas, Farhan Mohamed, Faezah Mohd Salleh.

Abstract

Here, we present the complete mitochondrial genome of the Common Rose butterfly, Pachliopta aristolochiae, from Malaysia. The sequence was generated on the Illumina NovaSeq 6000 sequencing platform. The mitogenome is 15,235 bp long and consists of 13 protein-coding genes, 22 transfer RNAs, two ribosomal RNAs, and two D-loop regions. The overall AT content was 81.6%, with a base composition of A (39.3%), T (42.3%), C (11.0%) and G (7.3%). The gene order of the three tRNAs was trnM-trnI-trnQ, which differs from the ancestral insect gene order trnI-trnQ-trnM. Phylogenetic analysis showed that the Pachliopta aristolochiae sequenced here is closely related to Losaria neptunus (NC_037868), with high support in both the ML and BI analyses. The data presented in this work provide a resource for other researchers studying the phylogenetic relationships of Lepidoptera and the diversification of Pachliopta species. In addition, because P. aristolochiae is a bioindicator species, these data can be used to assess environmental changes in terrestrial and aquatic ecosystems via environmental DNA approaches. The mitogenome of Pachliopta aristolochiae is available in GenBank under the accession number MZ781228.
© 2021 The Author(s). Published by Elsevier Inc.

Keywords:  Lepidoptera; Malaysia; Mitogenome; Pachliopta aristolochiae; Papilionidae

Year:  2021        PMID: 35141362      PMCID: PMC8813591          DOI: 10.1016/j.dib.2021.107740

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications Table

Fasta: Mitogenome sequence data
Tables: Sequencing data, gene features, base composition, list of Lepidoptera mitogenomes used for phylogenetic analyses
Figures: Circular mitogenome map, features of the D-loop regions, phylogenetic tree analysis

Value of the Data

The mitochondrial genome of the Common Rose butterfly, Pachliopta aristolochiae, sequenced here represents the Pachliopta species originating from Malaysia. Because P. aristolochiae is a bioindicator species, this mitogenome can be used to assess environmental changes in terrestrial and aquatic ecosystems via environmental DNA approaches. The additional P. aristolochiae mitogenome also provides information for other researchers studying the phylogenetic relationships of Lepidoptera and the diversification of Pachliopta species.

Data Description

The mitogenome of the Common Rose butterfly, Pachliopta aristolochiae, is a circular DNA molecule 15,235 bp in length (Fig. 1). Table 1 summarizes the sequencing read statistics. The mitogenome encodes 13 protein-coding genes (PCGs), 22 transfer RNAs, 2 ribosomal RNAs, and two D-loop regions (Table 2). The tRNA gene order between the D-loop and NAD2 in P. aristolochiae was trnM-trnI-trnQ. This arrangement has been observed in most Lepidoptera mitogenomes, but it differs from the ancestral insect gene order, trnI-trnQ-trnM [1]. The PCGs total 11,178 bp and the tRNAs total 1,452 bp, with individual tRNAs ranging from 60 bp to 71 bp. The 12S and 16S rRNAs are 719 bp and 1,280 bp long, respectively. Most of the PCGs (NAD2, COX1, COX2, ATP8, ATP6, COX3, NAD3, NAD6, CYTB) are located on the heavy strand, while NAD5, NAD4, NAD4l and NAD1 are on the light strand. Of the 13 PCGs, 12 are initiated by a typical ATN codon; the exception is COX1, which uses CGA as its start codon. Two PCGs (COX2 and NAD4) terminate with the incomplete stop codon T, and the others terminate with either TAA or TAG. Incomplete termination codons have been observed in most Lepidoptera mitogenomes and are associated with the polyadenylation process [2]. The mitogenome of P. aristolochiae has an AT content of 81.64%, with a base composition of A (39.3%), T (42.3%), C (11.0%), and G (7.3%), as shown in Table 3. The nucleotide skew statistics for the whole mitogenome indicate a higher occurrence of T than A and of C than G, with an AT skew of −0.037 and a GC skew of −0.202.
Fig. 1

Mitogenome map of Pachliopta aristolochiae generated using OGDRAW [3]. The genes scattered on the heavy strand are shown on the outer side of the circle, while the inner side shows those that are scattered on the light strand. The arrows indicate the direction of gene transcription. CR represents the control region (D-loop).

Table 1

Sequencing data of Pachliopta aristolochiae mitogenome.

Parameter              Pachliopta aristolochiae
Raw reads              10,102,746
Trimmed reads          10,102,675
Ave. read length       149.5
Mapped reads           17,890
% mapped reads         0.002
Depth of coverage (X)  175.72

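The figures in Table 1 can be related to one another with simple arithmetic. The sketch below is an illustration only, not the authors' pipeline: it assumes that the mapped fraction is mapped reads divided by trimmed reads, and it uses a common rule-of-thumb for mean depth (mapped reads times average read length, divided by genome length). The function and constant names are hypothetical.

```python
# Values copied from Table 1 and from the reported mitogenome length.
TRIMMED_READS = 10_102_675
MAPPED_READS = 17_890
AVG_READ_LENGTH = 149.5  # bp
GENOME_LENGTH = 15_235   # bp


def mapped_fraction(mapped, total):
    """Fraction (not percent) of reads that mapped to the mitogenome."""
    return mapped / total


def approx_depth(mapped, read_length, genome_length):
    """Rule-of-thumb mean depth: total mapped bases / genome length."""
    return mapped * read_length / genome_length


fraction = mapped_fraction(MAPPED_READS, TRIMMED_READS)
depth = approx_depth(MAPPED_READS, AVG_READ_LENGTH, GENOME_LENGTH)

print(f"mapped fraction: {fraction:.4f} ({fraction * 100:.2f}%)")
print(f"approximate depth: {depth:.1f}X")
```

Under these assumptions the mapped fraction rounds to the 0.002 listed in Table 1, and the estimated depth falls close to the reported 175.72X; a small difference is expected because a mapper-derived depth counts aligned bases rather than whole reads.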
Table 2

Gene features of Pachliopta aristolochiae mitogenome.

Gene (anticodon)  Start  Stop   Direction  Size (bp)  Start/Stop codon
trnM(cat)         1      67     F          67
trnI(gat)         67     130    F          64
trnQ(ttg)         128    196    R          69
NAD2              231    1244   F          1014       ATT/TAA
trnW(tca)         1243   1307   F          65
trnC(gca)         1300   1365   R          66
trnY(gta)         1368   1434   R          67
COX1              1437   2967   F          1531       CGA/TAA
trnL2(taa)        2968   3034   F          67
COX2              3035   3716   F          682        ATG/T
trnK(ctt)         3717   3787   F          71
trnD(gtc)         3787   3853   F          67
ATP8              3854   4021   F          168        ATT/TAA
ATP6              4015   4692   F          678        ATG/TAA
COX3              4692   5477   F          786        ATG/TAA
trnG(tcc)         5481   5546   F          66
NAD3              5547   5900   F          354        ATA/TAG
trnA(tgc)         5899   5963   F          65
trnR(tcg)         5963   6024   F          62
trnN(gtt)         6025   6089   F          65
trnS1(gct)        6089   6148   F          60
D-loop            6148   6192   F          45
trnE(ttc)         6178   6246   F          69
trnF(gaa)         6265   6330   R          66
NAD5              6333   8048   R          1716       ATT/TAA
trnH(gtg)         8067   8133   R          67
NAD4              8137   9472   R          1336       ATG/T
NAD4l             9474   9764   R          291        ATG/TAA
trnT(tgt)         9767   9831   F          65
trnP(tgg)         9832   9896   R          65
NAD6              9899   10432  F          534        ATT/TAA
CYTB              10432  11580  F          1149       ATG/TAA
trnS2(tga)        11593  11657  F          65
NAD1              11674  12612  R          939        ATG/TAA
trnL1(tag)        12613  12683  R          71
16S rRNA          12659  13963  R          1280
trnV(tac)         14021  14083  R          63
12S rRNA          14084  14802  R          719
D-loop            14816  15235  F          420

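The coordinates in Table 2 can be cross-checked against the sizes quoted in the text. The following sketch assumes 1-based inclusive coordinates (so size = stop − start + 1) and uses hypothetical helper names; the coordinates themselves are copied from the table.

```python
# (gene, start, stop) for the 13 protein-coding genes, copied from Table 2.
PCG_COORDS = [
    ("NAD2", 231, 1244), ("COX1", 1437, 2967), ("COX2", 3035, 3716),
    ("ATP8", 3854, 4021), ("ATP6", 4015, 4692), ("COX3", 4692, 5477),
    ("NAD3", 5547, 5900), ("NAD5", 6333, 8048), ("NAD4", 8137, 9472),
    ("NAD4l", 9474, 9764), ("NAD6", 9899, 10432), ("CYTB", 10432, 11580),
    ("NAD1", 11674, 12612),
]


def feature_size(start, stop):
    """Length of a feature given 1-based inclusive coordinates."""
    return stop - start + 1


def overlap(upstream_stop, downstream_start):
    """Number of bases shared by two adjacent features (0 if none)."""
    return max(0, upstream_stop - downstream_start + 1)


sizes = {name: feature_size(start, stop) for name, start, stop in PCG_COORDS}
total_pcg = sum(sizes.values())
atp_overlap = overlap(4021, 4015)  # ATP8 stop vs. ATP6 start

print(f"total PCG length: {total_pcg:,} bp")
print(f"ATP8/ATP6 overlap: {atp_overlap} bp")
```

With these inputs the PCG sizes sum to the 11,178 bp reported in the Data Description, and the ATP8 and ATP6 annotations share 7 bp.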
Table 3

Base composition and AT/GC skewness for each gene region of Pachliopta aristolochiae mitogenome.

Gene              Size (bp)  A%    G%    T%    C%    A+T%  AT skew  GC skew
Whole mitogenome  15,235     39.3  7.3   42.3  11.0  81.6  −0.037   −0.202
Protein coding    11,178     33.5  10.1  46.8  9.6   80.3  −0.166   0.025
tRNA              1,452      43.0  10.5  39.1  7.5   82.1  0.048    0.167
rRNA              2,024      43.6  10.4  40.8  5.2   84.4  0.033    0.333
D-loop (major)    365        46.3  1.6   49.6  2.5   95.9  −0.034   −0.220
D-loop (minor)    45         46.7  2.2   51.1  0.0   97.8  −0.045   1.000

Two D-loop regions were found in the sequenced mitogenome of P. aristolochiae. The first region was found at positions 6148 bp to 6192 bp, between trnS1 and trnE. This region is 45 bp long and contains a microsatellite-like (AT) element. The second D-loop region is 420 bp long and located between 12S rRNA and trnM. It contains a conserved ATAGA motif followed by a poly-T stretch, and the microsatellite-like elements (AT)₉ and (TA)₆ after the motif ATTTA, as commonly found in Lepidoptera mitogenomes [4]. Fig. 2 shows the features of the two D-loop regions.
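Elements of this kind can be located with simple pattern matching. The sketch below is illustrative only: the sequence is a HYPOTHETICAL toy string assembled from the elements named in the text, not the real D-loop sequence, and the minimum lengths used for the poly-T stretch and the dinucleotide repeats are assumed cutoffs rather than values taken from the article.

```python
import re

# HYPOTHETICAL toy sequence (NOT the real D-loop): flanking bases plus the
# elements named in the text, concatenated so the expected hits are obvious.
toy_dloop = (
    "CC" + "ATAGA" + "T" * 12 + "GG" + "ATTTA" + "AT" * 9 + "C" + "TA" * 6 + "GG"
)

PATTERNS = {
    "ATAGA motif": re.compile(r"ATAGA"),
    "poly-T stretch": re.compile(r"T{10,}"),  # assumed minimum length
    "ATTTA motif": re.compile(r"ATTTA"),
    "dinucleotide repeat": re.compile(r"(?:AT){4,}|(?:TA){4,}"),  # assumed cutoff
}


def scan(seq):
    """Return (start, end, label, matched text) tuples, 1-based inclusive."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(seq):
            hits.append((match.start() + 1, match.end(), label, match.group()))
    return sorted(hits)


hits = scan(toy_dloop)
for start, end, label, text in hits:
    print(f"{start:>3}-{end:<3} {label}: {text}")
```

To apply the same scan to the real regions, the sequence would first need to be extracted from the GenBank record (MZ781228) using the D-loop coordinates in Table 2.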
Fig. 2

Features of the two D-loop regions of Pachliopta aristolochiae mitogenome located between trnS1 and trnE, as well as 12S rRNA and trnM. Conserved motifs ‘ATAGA’ and ‘ATTTA’ are indicated in red and blue respectively. Poly-T stretch is indicated in green while microsatellite-like elements (TA)n and (AT)n are shown in yellow.

Maximum-Likelihood (ML) and Bayesian Inference (BI) trees were generated using the 13 PCGs of 22 Lepidoptera mitogenomes from the families Papilionidae and Lycaenidae: 21 obtained from GenBank, plus the P. aristolochiae sequenced here (Table 4). The ML and BI analyses yielded trees with identical topology (Fig. 3). Most of the nodes are highly supported, with bootstrap values above 70% in the ML analysis and Bayesian posterior probabilities above 0.95 in the BI analysis. The P. aristolochiae sequence from this study (MZ781228) clusters with the previously sequenced P. aristolochiae (NC_034280) and is closely related to Losaria neptunus (NC_037868), supported by a bootstrap value of 100% in ML and a posterior probability of 1.0 in BI. A BLASTn analysis was also conducted to compare the two P. aristolochiae mitogenomes: P. aristolochiae (MZ781228) from this study is 99.42% similar to P. aristolochiae (NC_034280) deposited in GenBank.
Table 4

Lepidoptera mitogenomes used in the phylogenetic analysis. The P. aristolochiae sequenced in this study is indicated by (*), with GenBank Accession No. MZ781228.

Family        Subfamily      Species                    GenBank Accession No.
Papilionidae  Papilioninae   Papilio paris              NC_053770
Papilionidae  Parnassiinae   Parnassius mercurius       NC_047306
Papilionidae  Papilioninae   Papilio memnon             NC_043911
Papilionidae  Parnassiinae   Parnassius apollonius      NC_041148
Papilionidae  Papilioninae   Pachliopta aristolochiae   NC_034280
Papilionidae  Papilioninae   Papilio protenor           NC_034317
Papilionidae  Papilioninae   Papilio dardanus           NC_034355
Papilionidae  Papilioninae   Papilio rex                NC_034356
Papilionidae  Papilioninae   Graphium leechi            NC_034837
Papilionidae  Papilioninae   Papilio helenus            NC_025757
Papilionidae  Papilioninae   Euryades corethrus         NC_037862
Papilionidae  Parnassiinae   Bhutanitis mansfieldi      NC_037863
Papilionidae  Papilioninae   Lamproptera meges          NC_037867
Papilionidae  Papilioninae   Losaria neptunus           NC_037868
Papilionidae  Papilioninae   Ornithoptera richmondia    NC_037869
Papilionidae  Papilioninae   Ornithoptera priamus       NC_037870
Papilionidae  Papilioninae   Mimoides lysithous         NC_037871
Papilionidae  Papilioninae   Papilio slateri            NC_037874
Papilionidae  Papilioninae   Trogonoptera brookiana     NC_037875
Papilionidae  Papilioninae   Pachliopta aristolochiae*  MZ781228
Lycaenidae    Polyommatinae  Caerulea coeligena         NC_058607
Lycaenidae    Polyommatinae  Shijimiaeoides divina      NC_029763

Fig. 3

Phylogenetic tree of Pachliopta aristolochiae (MZ781228), indicated by an asterisk (*), and 21 other Lepidoptera mitogenomes, built using the Maximum-Likelihood (ML) and Bayesian Inference (BI) approaches. Support values from the ML and BI analyses are shown at each tree node. Caerulea coeligena (NC_058607) and Shijimiaeoides divina (NC_029763) from the family Lycaenidae were used as outgroups.

Experimental Design, Materials and Methods

Sample collection, DNA extraction and pre-processing

The Pachliopta aristolochiae sample (voucher no: DIB022) was collected from Sungai Semawak, Taman Negara Endau-Rompin, Johor, Malaysia (5.62 N, 100.46 E) in March 2019. Genomic DNA was extracted from a fresh tissue sample using the Qiagen Blood and Tissue Kit (Qiagen, Valencia, CA) and fragmented using a Bioruptor® system [5]. The library was prepared using the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina®, following the manufacturer's instructions, and sequenced on the Illumina NovaSeq 6000 platform in 150 bp paired-end mode (PE150). A total of 10,102,764 raw reads were obtained and first assessed for quality using the FastQC program (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Next, the raw reads were trimmed of sequencing adapters, low-quality bases and Ns [6,7] using AdapterRemoval v2.3.2 [8]. Sequences with a quality score of 20 and above were retained. The forward and reverse reads were interleaved into a single file before being passed to PALEOMIX [9].
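For context on the quality cutoff, a Phred score Q corresponds to an error probability of 10^(−Q/10), so Q20 means roughly one expected error per 100 base calls. The sketch below shows that conversion together with a toy trailing-base trim; it assumes Phred+33 quality encoding, uses a made-up read, and is not a reimplementation of AdapterRemoval. All function names are hypothetical.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def trim_low_quality_tail(seq, quals, min_q=20):
    """Drop trailing bases whose quality is below min_q (toy trimmer)."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


# Made-up read for illustration: 'I' = Q40, '5' = Q20, '#' = Q2, '!' = Q0.
read_seq = "ACGTACGT"
read_quals = decode_phred33("IIIII5#!")

trimmed = trim_low_quality_tail(read_seq, read_quals)
print(f"P(error) at Q20: {phred_to_error_probability(20):.2f}")
print(f"trimmed read: {trimmed}")
```

In this toy read the last two bases fall below Q20 and are removed, while the Q20 base itself is kept because the threshold is inclusive.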

Mitogenome assembly, annotation and sequence analysis

The mitogenome was assembled using the NOVOPlasty v.4.2 [10] program with default parameters. The reference sequence and seed input were taken from BOLD public data (http://barcodinglife.org/), with the sequence ID BKKP127-18.COI-5P. Next, the assembled mitogenome was run through the PALEOMIX BAM pipeline [9] using default parameters to remove reads shorter than 15 bp after trimming. The mitogenome was annotated using the MITOS v2 web server [11], with the reference set ‘RefSeq 81 Metazoa’ and genetic code ‘5’ for invertebrates. The predicted proteins were then verified using the Open Reading Frame (ORF) Finder server (https://www.ncbi.nlm.nih.gov/orffinder/) and BLASTP. To improve the genome annotation, the predicted proteins from the MITOS v2 web server [11] and ORF Finder were aligned with the reference sequence of Pachliopta aristolochiae (NC_034280) in GenBank using Jalview 2 v11.1.4 [12]. Tablet software [13] was used to manually check for insertions and deletions of bases, as well as the sequence coverage. The total base compositions were calculated using BioEdit [14]. The AT/GC skewness was calculated as follows: AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C), where each letter represents the percentage of the respective base. The annotated mitogenome sequence file was converted into GenBank format using the GB2sequin web application [15]. The GenBank file was then used to generate the circular mitogenome map using OGDRAW [3].
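The skew formulas above can be checked directly against Table 3. This is a minimal sketch with hypothetical function names; the base percentages are copied from the table.

```python
def at_skew(a, t):
    """AT skew = (A - T) / (A + T), using base percentages or counts."""
    return (a - t) / (a + t)


def gc_skew(g, c):
    """GC skew = (G - C) / (G + C), using base percentages or counts."""
    return (g - c) / (g + c)


# Base composition (%) copied from Table 3.
whole_mitogenome = {"A": 39.3, "G": 7.3, "T": 42.3, "C": 11.0}
dloop_minor = {"A": 46.7, "G": 2.2, "T": 51.1, "C": 0.0}

whole_at = round(at_skew(whole_mitogenome["A"], whole_mitogenome["T"]), 3)
whole_gc = round(gc_skew(whole_mitogenome["G"], whole_mitogenome["C"]), 3)
minor_gc = round(gc_skew(dloop_minor["G"], dloop_minor["C"]), 3)

print(f"whole mitogenome: AT skew {whole_at}, GC skew {whole_gc}")
print(f"minor D-loop: GC skew {minor_gc}")
```

A negative AT skew means T outnumbers A, and a negative GC skew means C outnumbers G. The GC skew of 1.000 for the minor D-loop follows from its reported C content of 0.0%.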

Phylogenetic analysis

A total of 21 available Lepidoptera mitogenomes from the families Papilionidae and Lycaenidae were obtained from GenBank (Table 4). Caerulea coeligena (NC_058607) and Shijimiaeoides divina (NC_029763) from the family Lycaenidae were used as outgroups. The PCGs of each Lepidoptera mitogenome were first extracted using the PhyloSuite v1.2.2 [16] platform. The 13 protein-coding genes were then aligned in batches using the MAFFT program integrated into PhyloSuite [16] and concatenated. Phylogenetic analyses were performed with the Maximum-Likelihood (ML) and Bayesian Inference (BI) approaches, using the IQ-Tree [17] program implemented in PhyloSuite v1.2.2 [16] and MrBayes v3.2.7 [18], respectively. PartitionFinder v2.1.1 [19] was used to determine the best partitioning scheme for the dataset. The ML tree was built using 5,000 ultrafast bootstrap replicates with 1,000 iterations, and the best substitution model was determined by PartitionFinder v2.1.1 [19]. For the BI analysis, each partition was set to the GTR substitution model (nst=6) with gamma-distributed rate variation across sites and a proportion of invariable sites (rates=invgamma), i.e. GTR + Γ + I. The analysis was run for 10,000,000 generations with 4 chains, sampled every 1,000 generations with a burn-in of 25%, until the average standard deviation of split frequencies was less than 0.01. Tracer v1.7.2 was used to ensure sufficient parameter sampling and that the effective sample size (ESS) was more than 200 [20]. Both resulting trees were visualized using Figtree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
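The Bayesian settings described above might be expressed as a MrBayes command block along the following lines. This is a hedged sketch, not the authors' actual input file: the partition definitions are omitted, the helper name is hypothetical, and the sample-count arithmetic is an approximation (MrBayes may round the burn-in slightly differently).

```python
def mrbayes_block(ngen=10_000_000, nchains=4, samplefreq=1000, burninfrac=0.25):
    """Build a MrBayes block approximating the settings in the text.

    Assumption: GTR + gamma + invariable sites is applied to all partitions;
    the charset/partition commands from PartitionFinder are not reproduced.
    """
    lines = [
        "begin mrbayes;",
        "  lset applyto=(all) nst=6 rates=invgamma;",
        f"  mcmc ngen={ngen} nchains={nchains} samplefreq={samplefreq};",
        f"  sump burninfrac={burninfrac};",
        f"  sumt burninfrac={burninfrac};",
        "end;",
    ]
    return "\n".join(lines)


def samples_after_burnin(ngen, samplefreq, burninfrac):
    """Approximate samples kept per run (generation 0 is also sampled)."""
    total = ngen // samplefreq + 1
    return total, total - int(total * burninfrac)


block = mrbayes_block()
total_samples, kept_samples = samples_after_burnin(10_000_000, 1000, 0.25)

print(block)
print(f"samples per run: {total_samples}, kept after burn-in: ~{kept_samples}")
```

With 10,000,000 generations sampled every 1,000, each run records about 10,001 samples, of which roughly three quarters remain after the 25% burn-in.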

CRediT authorship contribution statement

Marylin Miga: Conceptualization, Methodology, Data curation, Software, Validation, Writing – original draft. Puteri Nur Syahzanani Jahari: Data curation, Conceptualization, Methodology, Software, Validation, Writing – review & editing. Chan Vei Siang: Methodology, Software. Kamarul Rahim Kamarudin: Methodology. Mohd Shahir Shamsir: Methodology, Formal analysis, Resources, Funding acquisition. Lili Tokiman: Methodology. Sivachandran Parimannan: Formal analysis, Resources, Funding acquisition. Heera Rajandas: Formal analysis, Resources, Funding acquisition. Farhan Mohamed: Methodology, Software. Faezah Mohd Salleh: Conceptualization, Methodology, Resources, Writing – review & editing, Supervision, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Subject: Genomics
Specific subject area: Lepidoptera, Papilionidae, Mitogenomics
Type of data:
Fasta: Mitogenome sequence data
Tables: Sequencing data, gene features, base composition, list of Lepidoptera mitogenomes used for phylogenetic analyses
Figures: Circular mitogenome map, features of the D-loop regions, phylogenetic tree analysis
How the data were acquired: Whole genome shotgun sequencing using the Illumina NovaSeq 6000 platform in 150 bp paired-end mode (PE150)
Data format: Raw and analyzed
Parameters for data collection: Genomic DNA was extracted from a fresh tissue sample of Pachliopta aristolochiae using the Qiagen Blood and Tissue Kit (Qiagen, Valencia, CA) and fragmented using a Bioruptor® system. The library was prepared using the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina®. The sample was then sequenced on the Illumina NovaSeq 6000 platform in 150 bp paired-end mode (PE150).
Description of data collection: The assembly was done using NOVOPlasty v.4.2 and run through the PALEOMIX BAM pipeline to assess the mitogenome mapping. Annotation was done using the MITOS v2 web server, and the predicted protein-coding genes were further verified using the Open Reading Frame (ORF) Finder. The circular mitogenome map was generated using OGDRAW. PhyloSuite v1.2.2 was used to extract, align and concatenate 13 protein-coding genes from 22 Lepidoptera mitogenomes prior to phylogenetic analysis. The IQ-Tree and MrBayes v3.2.7 programs were used to build the phylogenetic trees using the Maximum-Likelihood (ML) and Bayesian Inference (BI) methods. PartitionFinder v2.1.1 was used to set the best partitioning scheme for the dataset. The resulting phylogenetic trees were visualized using Figtree v1.4.4.
Data source location: The Pachliopta aristolochiae sample (voucher no: DIB022) was collected from Sungai Semawak, Taman Negara Endau-Rompin, Johor, Malaysia (5.62 N, 100.46 E) in March 2019.
Data accessibility:
Repository name: NCBI BioProject
Data identification number: PRJNA753627
Direct URL to data: http://www.ncbi.nlm.nih.gov/bioproject/753627
Repository name: NCBI GenBank
Data identification number: MZ781228
Direct URL to data: https://www.ncbi.nlm.nih.gov/nuccore/mz781228
Repository name: Mendeley Data
Data identification number: 10.17632/n52pmth7cc.2
Direct URL to data: https://data.mendeley.com/datasets/n52pmth7cc/2

References (10 of 18 shown; numbering here is independent of the bracketed in-text citation numbers)

1. Robert Lanfear; Paul B Frandsen; April M Wright; Tereza Senfeld; Brett Calcott. PartitionFinder 2: New Methods for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic Analyses. Mol Biol Evol. 2017-03-01.

2. Mikkel Schubert; Luca Ermini; Clio Der Sarkissian; Hákon Jónsson; Aurélien Ginolhac; Robert Schaefer; Michael D Martin; Ruth Fernández; Martin Kircher; Molly McCue; Eske Willerslev; Ludovic Orlando. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nat Protoc. 2014-04-10.

3. Jana Trifinopoulos; Lam-Tung Nguyen; Arndt von Haeseler; Bui Quang Minh. W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis. Nucleic Acids Res. 2016-04-15.

4. Andrew M Waterhouse; James B Procter; David M A Martin; Michèle Clamp; Geoffrey J Barton. Jalview Version 2--a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009-01-16.

5. Matthias Bernt; Alexander Donath; Frank Jühling; Fabian Externbrink; Catherine Florentz; Guido Fritzsch; Joern Pütz; Martin Middendorf; Peter F Stadler. MITOS: improved de novo metazoan mitochondrial genome annotation. Mol Phylogenet Evol. 2012-09-07.

6. Fredrik Ronquist; Maxim Teslenko; Paul van der Mark; Daniel L Ayres; Aaron Darling; Sebastian Höhna; Bret Larget; Liang Liu; Marc A Suchard; John P Huelsenbeck. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012-02-22.

7. Nicolas Dierckxsens; Patrick Mardulyn; Guillaume Smits. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 2017-02-28.

8. Puteri Nur Syahzanani Jahari; Nur Fatihah Abdul Malik; Mohd Shahir Shamsir; M Thomas P Gilbert; Faezah Mohd Salleh. The first complete mitochondrial genome data of Hippocampus kuda originating from Malaysia. Data Brief. 2020-05-21.

9. Marc Lohse; Oliver Drechsel; Sabine Kahlau; Ralph Bock. OrganellarGenomeDRAW--a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets. Nucleic Acids Res. 2013-04-22.

10. Puteri Nur Syahzanani Jahari; Shahfiz Mohd Azman; Kaviarasu Munian; Nor Hazwani Ahmad Ruzman; Mohd Shahir Shamsir; Stine R Richter; Faezah Mohd Salleh. Characterization of the mitogenomes of long-tailed giant rat, Leopoldamys sabanus and a comparative analysis with other Leopoldamys species. Mitochondrial DNA B Resour. 2021-02-11.