Shu-jun Wei1, Pu Tang, Li-hua Zheng, Min Shi, Xue-xin Chen. 1. Key Laboratory of Molecular Biology of Crop Pathogens and Insects, Ministry of Agriculture, Institute of Insect Sciences, Zhejiang University, 268 Kaixuan Road, 310029, Hangzhou, China.
Abstract
The apocritan Hymenoptera show extraordinary features in mitochondrial genomes, but no complete sequence has been reported for the basal lineage, Evanioidea. Here, we sequenced the complete mitochondrial genome of Evania appendigaster. This genome is 17,817 bp long, with a low A+T content (77.8%) compared with other hymenopteran species. Four tRNA genes were rearranged, among which remote inversion is the dominant gene rearrangement event. Gene shuffling is caused by tandem duplication-random loss, while remote inversion is best explained by recombination. The start codon of nad1 was found to be TTG, which might be common across Hymenoptera. trnS2 and trnK use the abnormal anticodons TCT and TTT, respectively, and the D-stem pairings in trnS2 are absent. The secondary structures of the two rRNA genes were predicted and compared with those in other insects. Five long intergenic spacers were present, including a long intergenic spacer between atp8 and atp6, two genes that overlap in previously reported animal genomes. A conserved motif was found between trnS1 and nad1, which is proposed to be associated with mtTERM. The A+T-rich region is 2,325 bp long, among the longest in insects, and contains a tandem repeat region.
Animal mitochondrial genomes are about 16 kb in size and contain 37 genes: 13 protein-coding genes, 22 transfer RNA (tRNA) genes and two ribosomal RNA (rRNA) genes [1, 2]. The genome is highly economized, with few sections of noncoding DNA, intergenic regions, or repetitive sequences [3, 4], except for an A+T-rich region, which contains essential regulatory elements for transcription and replication [5].
Gene arrangements are usually conserved within major lineages [2], but may be highly rearranged in certain groups [6-12]. Gene rearrangement events may serve as useful phylogenetic markers and models for evolutionary studies [13-16]. In apocritan Hymenoptera, frequent gene rearrangements have been observed from broad examinations of gene segments [10, 17] and whole genome sequences [18-22]. However, no informative arrangement pattern has been identified to date, for which there are two possible explanations: one is that diversified gene arrangements have arisen independently among different hymenopteran lineages, and the other is that limited sampling is concealing potentially synapomorphic rearrangements. The apocritan lineage shows other extraordinary features in the mitochondrial genome, such as high A+T content [23, 24], diversified gene rearrangement events, and the involvement of recombination in gene rearrangement [17].
Evaniidae is proposed to be one of the most basal lineages in Hymenoptera [25, 26]. Presently, no complete mitochondrial genome has been sequenced from members of this family or its presumed sister groups, the Aulacidae and Gasteruptiidae. Here, we present the complete mitochondrial genome of Evania appendigaster (Hymenoptera: Evaniidae) and give a thorough description of its genome features in comparison with other hymenopteran species.
Materials and methods
DNA extraction, PCR amplification and sequencing
Total genomic DNA was extracted using the DNeasy tissue kit (Qiagen, Hilden, Germany) from a leg of an E. appendigaster adult.
A range of universal insect mitochondrial primers [27, 28] and hymenopteran mitochondrial primers were used to amplify the regions cox1-cox2, cob-rrnL and rrnL-rrnS. Species-specific primers were designed based on the sequenced fragments and combined in various ways to bridge the gaps cox2-cob and rrnS-cox1. Six fragments of 575–8626 bp were amplified, covering the whole mitochondrial genome (Table 1). The PCR and sequencing procedures followed the methods in Wei et al. [23].
Table 1
Primers used in this study
| Region | Primer position | Product length (bp) | Primer sequence |
|---|---|---|---|
| cox1-cox2 | 2127–3634 | 1508 | TATTTTGATTYTTTGGHCAYCCWGAAGT |
| | | | CCACAAATTTCTGAACATTG |
| cox2-cob | 3339–11964 | 8626 | TCAGGTCACCAATGATATTGA |
| | | | ATTACACCTCCTAGTTTATTAGGGAT |
| cob-rrnL | 11480–13593 | 2114 | TATGTACTACCATGAGGACAAATATC |
| | | | TTACCTTAGGGATAACAGCGTWA |
| rrnL-rrnS | 13034–15118 | 2085 | CCWGGTAAAATTAAAATATAAACTTC |
| | | | AAACTAGGATTAGATACCCTATTAT |
| rrnS | 14700–15275 | 576 | GTATAYTTACTTTGTTACGACTT |
| | | | GTGCCAGCAGYYGCGGTTANAC |
| rrnS-cox1 | 15096–2334 | 5057 | ATTAGGGTATCTAATCCAACTTT |
| | | | GCTCGTGTATCCACATCTATT |
Genome annotation and secondary structure prediction
tRNA genes were initially identified using the tRNAscan-SE search server [29] with default parameters. Sequences longer than 100 bp between the identified tRNA genes were used as queries in BLAST searches in GenBank for identification of protein-coding and rRNA genes. The exact initiation and termination codons were identified in ClustalX version 2.0 [30] using reference sequences from other insects, following the criteria in Wei et al. [23]. Finally, the tRNA search was carried out again for the large intergenic regions using a reduced cutoff score. Twenty-one of the 22 typical animal mitochondrial tRNA genes were found using the previous steps, except for trnS2, which was identified by alignment. A+T content and codon usage were calculated using MEGA version 4.0 [31].
All tRNA secondary structures were predicted using the tRNAscan-SE search server [29] except for trnS2, which was predicted manually. rRNA structures were predicted by comparison and algorithm-based methods as in Wei et al. [23].
Results and discussion
Genome structure and base composition
The complete mitochondrial genome of E. appendigaster is 17,817 bp (GenBank accession No. FJ593187), which is among the largest animal mitochondrial genomes yet sequenced [1]. All of the 37 typical animal mitochondrial genes were identified (Fig. 1; Table 2).
Fig. 1
Organization of the Evania appendigaster mitochondrial genome. Gene abbreviations are as follows: cox1, cox2, and cox3 refer to the cytochrome oxidase subunits, cob refers to cytochrome b, nad1-nad6 refer to NADH dehydrogenase components, and rrnL and rrnS refer to ribosomal RNAs. Transfer RNA genes are denoted by a one-letter symbol according to the IUPAC-IUB single-letter amino acid codes. L1 and L2 denote the two leucine tRNA genes, and S1 and S2 the two serine tRNA genes. AT indicates the A+T-rich region. Gene names with lines indicate genes coded on the minority strand, while those without lines are on the majority strand
Table 2
Annotation of Evania appendigaster mitochondrial genome
| Gene | Strand | Gene position | Gene length (bp) | Anti/Start codon | Stop codon | Intergenic nucleotides |
|---|---|---|---|---|---|---|
| trnC | + | 1–63 | 63 | GCA | – | −2 |
| trnM | + | 64–129 | 66 | CAT | – | 0 |
| trnI | + | 128–194 | 67 | GAT | – | −1 |
| trnS1 | − | 194–262 | 69 | TGA | – | −2 |
| trnQ | − | 261–330 | 70 | TTG | – | 22 |
| nad2 | + | 353–1365 | 1013 | ATG | TA | −2 |
| trnY | − | 1364–1432 | 69 | GTA | – | 3 |
| cox1 | + | 1436–2980 | 1545 | ATG | TAA | −5 |
| trnL2 | + | 2976–3041 | 66 | TAA | – | 0 |
| cox2 | + | 3042–3719 | 678 | ATT | TAA | 8 |
| trnK | + | 3728–3797 | 70 | TTT | – | 534 |
| trnD | + | 4332–4393 | 62 | GTC | – | 0 |
| atp8 | + | 4394–4555 | 162 | ATG | TAA | 244 |
| atp6 | + | 4800–5474 | 675 | ATT | TAA | 1 |
| cox3 | + | 5476–6265 | 790 | ATA | T | 0 |
| trnG | + | 6266–6332 | 67 | TCC | – | 0 |
| nad3 | + | 6333–6683 | 351 | ATT | TAA | 11 |
| trnA | + | 6695–6762 | 68 | TGC | – | 20 |
| trnR | + | 6783–6848 | 66 | TCG | – | −6 |
| trnN | + | 6843–6908 | 66 | GTT | – | −3 |
| trnS2 | + | 6906–6966 | 61 | TCT | – | 0 |
| trnE | + | 6967–7032 | 66 | TTC | – | 2 |
| trnF | − | 7035–7099 | 65 | GAA | – | 0 |
| nad5 | − | 7100–8747 | 1648 | ATA | TAA | −3 |
| trnH | − | 8745–8812 | 68 | GTG | – | 0 |
| nad4 | − | 8813–10148 | 1336 | ATG | T | −7 |
| nad4l | − | 10142–10414 | 273 | ATT | TAA | 1 |
| trnT | + | 10416–10480 | 65 | TGT | – | 0 |
| trnP | − | 10481–10546 | 66 | TGG | – | 2 |
| nad6 | + | 10549–11088 | 540 | ATC | TAA | 1 |
| cob | + | 11090–12253 | 1164 | ATG | TAA | 94 |
| nad1 | − | 12348–13273 | 926 | TTG | TA | 0 |
| trnL1 | − | 13274–13342 | 69 | TAG | – | 0 |
| rrnL | − | 13343–14616 | 1274 | – | – | 0 |
| trnV | − | 14617–14680 | 64 | TAC | – | 0 |
| rrnS | − | 14681–15427 | 747 | – | – | 0 |
| trnW | − | 15428–15492 | 65 | TCA | – | 0 |
| A+T-rich region | – | 15493–17817 | 2325 | – | – | 0 |
+ Indicates the gene coded on the majority strand
− Indicates the gene coded on the minority strand
– Indicates the strand or codon not applicable; the abbreviations are as in Fig. 1
There are in total 31 overlapping nucleotides between neighboring genes in nine locations, with overlaps of 1–7 bp, and in total 943 bp of intergenic nucleotides in 13 locations, with spacers of 1–534 bp, excluding the A+T-rich region (Table 2).
The A+T content of the E. appendigaster mitochondrial genome is lower than that of all other sequenced hymenopteran species, and there are more A and C than T and G in the majority strand (Table 3). A higher A+T content was found in parasitic wasps (Apocrita) compared with nonparasitic wasps (Symphyta) in partial mitochondrial genes [24] and whole genome sequences [18–20, 22, 32, 33].
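The overlap and spacer lengths reported in Table 2 can be derived from the gene coordinates alone. The following is a minimal sketch, assuming 1-based inclusive coordinates and using six consecutive rows of Table 2 (trnI to trnL2); the function name `gaps` and the variable names are illustrative only and are not part of the original analysis.

```python
# Sketch: derive overlaps and intergenic spacers from gene coordinates.
# Coordinates are six consecutive rows of Table 2 (1-based, inclusive).
genes = [
    ("trnI", 128, 194),
    ("trnS1", 194, 262),
    ("trnQ", 261, 330),
    ("nad2", 353, 1365),
    ("trnY", 1364, 1432),
    ("cox1", 1436, 2980),
    ("trnL2", 2976, 3041),
]

def gaps(features):
    """Return (upstream, downstream, gap) for each adjacent pair.

    gap > 0 is an intergenic spacer, gap < 0 an overlap,
    and gap == 0 means the two genes are directly adjacent.
    """
    out = []
    for (name_a, _, end_a), (name_b, start_b, _) in zip(features, features[1:]):
        out.append((name_a, name_b, start_b - end_a - 1))
    return out

result = gaps(genes)
overlap_total = -sum(g for _, _, g in result if g < 0)  # summed overlap length
spacer_total = sum(g for _, _, g in result if g > 0)    # summed spacer length

for upstream, downstream, gap in result:
    print(f"{upstream}-{downstream}: {gap}")
```

For this excerpt the gaps come out as −1, −2, 22, −2, 3 and −5, matching the corresponding "Intergenic nucleotides" entries in Table 2.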
Table 3
Base composition of hymenopteran mitochondrial genomes
| Species | T% (WG) | C% (WG) | A% (WG) | G% (WG) | AT% (WG) | AT skew (WG) | GC skew (WG) | T% (PCG) | C% (PCG) | A% (PCG) | G% (PCG) | AT% (PCG) | AT skew (PCG) | GC skew (PCG) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Perga condei^a | 33.8 | 14.6 | 42.8 | 8.8 | 77.9 | 0.117 | −0.248 | 43.2 | 11.6 | 33.3 | 12.0 | 76.5 | −0.129 | 0.017 |
| Vanhornia eucnemidarum^a | 36.0 | 14.8 | 42.2 | 7.1 | 80.1 | 0.079 | −0.352 | 42.7 | 11.7 | 35.5 | 10.0 | 78.2 | −0.092 | −0.078 |
| Evania appendigaster | 37.9 | 15.0 | 39.9 | 7.2 | 77.8 | 0.026 | −0.351 | 42.7 | 13.2 | 31.8 | 12.3 | 74.5 | −0.146 | −0.035 |
| Diadegma semiclausum | 41.5 | 9.6 | 42.1 | 6.7 | 87.4 | 0.007 | −0.178 | 46.9 | 8.2 | 36.8 | 8.1 | 83.7 | −0.121 | −0.006 |
| Abispa ephippium | 39.5 | 14.6 | 39.1 | 6.7 | 80.6 | −0.005 | −0.371 | 43.5 | 11.2 | 35.2 | 10.1 | 78.7 | −0.105 | −0.052 |
| Polistes humilis^a | 41.1 | 10.7 | 42.3 | 5.9 | 84.7 | 0.014 | −0.289 | 46.6 | 8.5 | 36.8 | 8.1 | 83.4 | −0.118 | −0.024 |
| Apis mellifera | 41.2 | 10.5 | 42.1 | 6.3 | 84.9 | 0.011 | −0.250 | 46.1 | 8.5 | 37.2 | 8.2 | 83.3 | −0.107 | −0.018 |
| Bombus ignitus | 42.3 | 9.4 | 42.8 | 5.6 | 86.8 | 0.006 | −0.253 | 47.5 | 7.5 | 37.6 | 7.4 | 85.1 | −0.116 | −0.007 |
| Melipona bicolor^a | 42.5 | 8.5 | 43.8 | 5.2 | 86.7 | 0.015 | −0.241 | 48.0 | 6.9 | 38.4 | 6.8 | 86.4 | −0.111 | −0.007 |
WG, whole genome; PCG, all protein-coding genes
AT and GC skew are calculated for the majority strand
^a Indicates that no complete mitochondrial genome is available from GenBank, and corresponding values are from partial genome sequences
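The text does not state how the skew values in Table 3 were computed. The sketch below assumes the commonly used definitions, AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C), and applies them to the whole-genome percentages given for E. appendigaster; the function names are illustrative.

```python
# Sketch: strand compositional skew, assuming the commonly used definitions
# AT skew = (A - T) / (A + T) and GC skew = (G - C) / (G + C).

def at_skew(a, t):
    """Positive when A exceeds T on the strand examined."""
    return (a - t) / (a + t)

def gc_skew(g, c):
    """Negative when C exceeds G on the strand examined."""
    return (g - c) / (g + c)

# Whole-genome base percentages for E. appendigaster from Table 3.
evania = {"T": 37.9, "C": 15.0, "A": 39.9, "G": 7.2}

at = round(at_skew(evania["A"], evania["T"]), 3)
gc = round(gc_skew(evania["G"], evania["C"]), 3)
print(at, gc)
```

Under these assumed definitions the values round to 0.026 and −0.351, which agree with the whole-genome entries for E. appendigaster in Table 3 and with the statement that A and C outnumber T and G on the majority strand.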
Gene rearrangement
Gene arrangement of the E. appendigaster mitochondrial genome is similar to that of other apocritan species. Gene rearrangement events have been classified as translocation, local inversion (inverted in the local position), gene shuffling (local translocation) and remote inversion (translocated and inverted) [17]. Four tRNA genes are rearranged: trnW, trnC and trnS1 by remote inversion and trnM by gene shuffling (Fig. 1). Rearrangement of tRNA genes is common in hymenopteran mitochondrial genomes, especially in tRNA clusters such as those at the junctions of A+T-rich region-nad2, nad2-cox1, cox2-atp8 and nad3-nad5 [10, 17, 23]. However, the rearrangements in the E. appendigaster mitochondrial genome are novel. In vertebrates, gene shuffling is the dominant gene rearrangement event [34], while in Hymenoptera, equal numbers of gene shuffling, inversion and translocation events have been observed at the cox2-atp8 junction [10]. In the E. appendigaster mitochondrial genome, remote inversion was found to be the dominant gene rearrangement event.
Gene shuffling is usually explained by the tandem duplication-random loss (TDRL) model [17, 35]. Evidence for the TDRL model includes a derived pattern of gene order, pseudogenes and the position of intergenic spacers, the last two of which are the expected intermediate steps in changing mitochondrial gene order under this model. In the derived tRNA cluster between the A+T-rich region and nad2, all neighboring genes overlap or are directly adjacent except for trnQ and nad2, which are separated by a 22 bp intergenic spacer (Table 2). Under the TDRL model, random deletion of the duplicated or original genes is unlikely to produce a pattern in which the remaining adjacent genes overlap. Thus, it is unlikely that trnC and trnS1 were rearranged by TDRL, whereas trnM may have been rearranged by tandem duplication of the trnI-trnQ-trnM cluster followed by deletion of trnI-trnQ and trnM at the two boundaries, in an intermediate state before the insertion of trnC and trnS1. This region lies to one side of the A+T-rich region, which is thought to contain two replication origins [36], so illicit priming may be responsible for the duplication of the original tRNA cluster. The 22 bp intergenic spacer between trnQ and nad2 may be a remnant of the deleted second copy of trnM. Recombination may be involved in remote inversions and is the most plausible explanation for local inversions in apocritan mitochondrial genomes.
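The four event classes defined above can be written as a small decision rule. This is a minimal sketch only: the `displacement` labels ('none', 'local', 'remote') and the per-gene flags are hypothetical encodings chosen to reproduce the calls stated in the text, not values derived from sequence data, and the combination of a local move with a strand change is left unclassified because the scheme in the text does not name it.

```python
from collections import Counter

def classify(displacement, inverted):
    """Map a gene's change relative to the ancestral order to an event class.

    displacement: 'none', 'local' or 'remote' (hypothetical labels).
    inverted: True if the gene changed strand.
    """
    if displacement == "none":
        return "local inversion" if inverted else "conserved"
    if displacement == "local":
        # A local move plus strand change is not named in the text's scheme.
        return "unclassified" if inverted else "gene shuffling"
    if displacement == "remote":
        return "remote inversion" if inverted else "translocation"
    raise ValueError(f"unknown displacement: {displacement!r}")

# Flags encode the classifications stated in the text for E. appendigaster.
observed = {
    "trnW": ("remote", True),
    "trnC": ("remote", True),
    "trnS1": ("remote", True),
    "trnM": ("local", False),
}

events = {gene: classify(*flags) for gene, flags in observed.items()}
tally = Counter(events.values())
dominant = tally.most_common(1)[0][0]
print(events, dominant)
```

With three of the four rearranged tRNA genes classed as remote inversions, the tally reflects the statement that remote inversion is the dominant event in this genome.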
Protein-coding genes
The sizes of the protein-coding genes in the E. appendigaster mitochondrial genome are similar to those of their orthologs in other insects. The genes with the highest A+T content in hymenopteran mitochondrial genomes are usually nad6 or atp8. In E. appendigaster, the A+T content of atp8 is 69.1%, among the lowest, as a result of the lower A+T content in the 3′ sequence of atp8.
All protein-coding genes start with ATN codons (two with ATA, four with ATT, one with ATC, and five with ATG) except for nad1, which uses TTG as its start codon (Table 2). cox1 is usually found to use nonstandard start codons in insects, such as TCG, ACC, CGA, CTA, CCG and AAA [37, 38]. In E. appendigaster, cox1 uses the usual start codon ATG, 3 bp after the end of trnY, and the translated amino acid sequence aligned well with orthologs in other Hymenoptera. All examined species in Lepidoptera have been found to use R as the initial amino acid for cox1 [39], whereas in Hymenoptera all species use the ATN start codon [18, 19, 21–23, 32] except for Vanhornia eucnemidarum [20]. In E. appendigaster, three ATA codons lying in or 6 bp downstream from trnL1 are possible start codons for nad1. However, we propose TTG directly after trnL1 as the start codon for nad1. This would minimize the intergenic spacer and avoid overlap between trnL1 and nad1 [37, 40]. We examined nad1 start codons in the 11 previously reported hymenopteran species, and the results revealed that either the intergenic spacers or the overlapping regions would be reduced in Perga condei [32], Vanhornia eucnemidarum [20] and three Nasonia species [21] if TTG is assigned as the start codon (Fig. 2). In Diadegma semiclausum [23], Polistes humilis [22] and three bee species [18, 33], no TTG codon is found near the initial region of nad1. In two vespid species, Abispa ephippium and Polistes humilis, trnL1 is rearranged and rrnL is left upstream of nad1. In A. ephippium, a TTG codon is present 3 bp downstream of the identified start codon ATA. Since there is no standard way to define the exact boundaries of rRNAs, the criteria of reducing the intergenic spacer and overlapping region could not be applied to assign the start codon. In conclusion, our results suggest that TTG is a possible start codon for nad1 in Hymenoptera [37, 40, 41].
Fig. 2
Determination of nad1 start codons in Evania appendigaster and other reported hymenopteran mitochondrial genomes. The box indicates the newly assigned start codons, and the shaded regions the previously assigned start codons. Sequences of tRNA are marked by solid lines, intergenic spacers by dotted lines and rrnL by dashed lines
Nine protein-coding genes use the termination codon TAA. Four protein-coding genes use incomplete stop codons: nad1 and nad2 use the truncated termination codon TA, and cox3 and nad4 use T, as is commonly reported in other invertebrates [18, 42]. The relative synonymous codon usage values show a biased use of A and T nucleotides in E. appendigaster (Table 4).
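Table 4
Codon usage in Evania appendigaster mitochondrial genome
| Amino Acid | Codon | Number | RSCU |
|---|---|---|---|
| Phe | UUU | 296 | 1.67 |
| | UUC | 59 | 0.33 |
| Leu | UUA | 304 | 3.41 |
| | UUG | 46 | 0.52 |
| | CUU | 105 | 1.18 |
| | CUC | 10 | 0.11 |
| | CUA | 61 | 0.68 |
| | CUG | 9 | 0.10 |
| Ile | AUU | 372 | 1.77 |
| | AUC | 49 | 0.23 |
| Met | AUA | 282 | 1.75 |
| | AUG | 40 | 0.25 |
| Val | GUU | 87 | 1.73 |
| | GUC | 15 | 0.30 |
| | GUA | 78 | 1.55 |
| | GUG | 21 | 0.42 |
| Ser | UCU | 98 | 2.03 |
| | UCC | 44 | 0.91 |
| | UCA | 110 | 2.28 |
| | UCG | 7 | 0.15 |
| Pro | CCU | 42 | 1.37 |
| | CCC | 24 | 0.78 |
| | CCA | 50 | 1.63 |
| | CCG | 7 | 0.23 |
| Thr | ACU | 65 | 1.69 |
| | ACC | 22 | 0.57 |
| | ACA | 63 | 1.64 |
| | ACG | 4 | 0.10 |
| Ala | GCU | 47 | 2.24 |
| | GCC | 17 | 0.81 |
| | GCA | 17 | 0.81 |
| | GCG | 3 | 0.14 |
| Tyr | UAU | 132 | 1.64 |
| | UAC | 29 | 0.36 |
| His | CAU | 52 | 1.55 |
| | CAC | 15 | 0.45 |
| Gln | CAA | 50 | 1.67 |
| | CAG | 10 | 0.33 |
| Asn | AAU | 162 | 1.62 |
| | AAC | 38 | 0.38 |
| Lys | AAA | 110 | 1.79 |
| | AAG | 13 | 0.21 |
| Asp | GAU | 50 | 1.64 |
| | GAC | 11 | 0.36 |
| Glu | GAA | 52 | 1.42 |
| | GAG | 21 | 0.58 |
| Cys | UGU | 36 | 1.60 |
| | UGC | 9 | 0.40 |
| Trp | UGA | 69 | 1.60 |
| | UGG | 17 | 0.40 |
| Arg | CGU | 12 | 1.14 |
| | CGC | 6 | 0.57 |
| | CGA | 16 | 1.52 |
| | CGG | 8 | 0.76 |
| Ser | AGU | 37 | 0.77 |
| | AGC | 7 | 0.15 |
| | AGA | 71 | 1.47 |
| | AGG | 12 | 0.25 |
| Gly | GGU | 51 | 1.06 |
| | GGC | 10 | 0.21 |
| | GGA | 86 | 1.78 |
| | GGG | 46 | 0.95 |
RSCU refers to relative synonymous codon usage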
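The RSCU values in Table 4 were obtained with MEGA; the sketch below is not that implementation but illustrates the standard definition, in which each codon count is divided by the mean count of all codons encoding the same amino acid. The function name `rscu` is illustrative, and the counts are the Phe and Leu rows of Table 4.

```python
# Sketch: relative synonymous codon usage (RSCU) for one codon family.
# RSCU = observed count / (family total / number of synonymous codons).

def rscu(counts):
    """Return RSCU per codon, rounded to two decimals, for one amino acid."""
    total = sum(counts.values())
    n = len(counts)
    return {codon: round(c * n / total, 2) for codon, c in counts.items()}

# Codon counts taken from Table 4.
phe = {"UUU": 296, "UUC": 59}
leu = {"UUA": 304, "UUG": 46, "CUU": 105, "CUC": 10, "CUA": 61, "CUG": 9}

phe_rscu = rscu(phe)
leu_rscu = rscu(leu)
print(phe_rscu)
print(leu_rscu)
```

A value above 1 marks a codon used more often than expected under equal usage within its family; the A- or U-ending codons UUU and UUA dominate their families here, in line with the A+T bias noted in the text.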
tRNA genes
The length of the tRNAs ranges from 61 to 70 bp. All tRNA genes have a typical cloverleaf structure except for trnS2 (Fig. 3). trnS2 could not be identified and folded using conventional tRNA search methods such as tRNAscan-SE. We manually found the location of trnS2 by comparison with those identified in other insects and then determined the exact boundaries according to the secondary structure folded by eye. The D-stem pairings in the DHU arm are absent in E. appendigaster trnS2, as has also been reported in other insects [6, 18, 37, 43] and the rest of Metazoa [44, 45]. Since this atypical trnS2 is common in Coleoptera, Sheffield et al. [37] built an updated covariance model for automated annotation, which also performs well in other insects.
Fig. 3
Predicted secondary structures for the 22 typical tRNA genes of Evania appendigaster mitochondrial genome. Base-pairing is indicated as follows: Watson–Crick pairs by lines, wobble GU pairs by dots and other noncanonical pairs by circles
A total of 28 unmatched base pairs exist in the E. appendigaster mitochondrial tRNA secondary structures, 19 of which are G–U pairs, eight U–U and one A–A. The number of mismatches is relatively high in the E. appendigaster mitochondrial tRNAs compared with other insects, and even within Metazoa [46]. Mismatches in regions where the tRNA genes overlap with adjacent downstream genes could be corrected by 3′-RNA editing [47-50]. Editing of the 5′ parts of tRNA acceptor stems has also been found in Acanthamoeba [51] and some fungi [52]. Of the 28 mismatches, only four, in trnQ, trnR and trnS1, are located in the overlapping regions in the acceptor stem, indicating that other mechanisms might be involved in escaping the effects of Muller’s ratchet in the E. appendigaster mitochondrial genome [53].
trnS2 and trnK use the abnormal anticodons TCT and TTT, respectively, which have been found to be correlated with gene rearrangement [23].
rRNA genes
rrnL has a length of 1274 bp, with an A+T content of 79.7%. rrnS has a length of 747 bp, with an A+T content of 76.0%. The gene sizes are normal, but the A+T contents are lower than those of their counterparts in other hymenopteran species.
Both rrnL and rrnS conform to the secondary structure models proposed for these genes from other insects [23, 39, 54–56]. Forty-nine helices are present in E. appendigaster rrnL, as in D. melanogaster [55] and A. mellifera [54], belonging to six domains (Fig. 4). H837 usually forms a long stem structure with a small terminal loop [23, 37, 54], but in E. appendigaster it forms a shorter stem and a larger loop, as in D. melanogaster [55]. The deduced structures of H2347 and H2520 are variable [54, 57, 58], but in E. appendigaster they are more similar to those from A. mellifera by Gillespie et al. (2006) than to those from other insects [57, 58].
Fig. 4
Predicted rrnL secondary structure in Evania appendigaster mitochondrial genome. Tertiary interactions and base triples are shown connected by continuous lines. A 5′ half of rrnL; B 3′ half of rrnL. Symbols for base-pairings are as in Fig. 3
The secondary structure of rrnS contains the 29 helices present in D. virilis [56] and A. mellifera [54], belonging to three domains (Fig. 5). Helix H39 could not be predicted; instead, a circle was formed by H27, H47, H367 and H500 and the sequences in between. H47 is variable among lepidopteran species, but the terminal portion of this stem is conserved [37]; in E. appendigaster, two loops were formed, similar to D. virilis but different from two other hymenopteran species, D. semiclausum and A. mellifera, where a larger loop is present. H673 is well conserved in moths, where one stem with a terminal bulge is present [39, 59]; in E. appendigaster, two stem-loop structures are present, as in D. virilis [56] and D. semiclausum [23], unlike in A. mellifera [54], in which this structure is similar to that of moths. The structure of H1074 has been discussed in the honey bee [54, 60, 61], and our predicted structure in E. appendigaster is consistent with that of Page (2000) and Gillespie et al. (2006).
Fig. 5
Predicted rrnS secondary structure in Evania appendigaster mitochondrial genome. Symbols are as in Fig. 3
Non-coding regions
One of the most interesting features in the E. appendigaster mitochondrial genome is the presence of five major non-coding regions of more than 20 bp: spacer 1 is 22 bp between trnQ and nad2, spacer 2 is 534 bp between trnK and trnD, spacer 3 is 244 bp between atp8 and atp6, spacer 4 is 94 bp between cob and nad1, and spacer 5 is 2325 bp between rrnS-trnW and trnC-trnM-trnI. Long intergenic spacers have been identified in several insect mitochondrial genomes [18, 20, 23, 40, 62, 63]. Although intergenic spacers appear to be unique to individual species [37], conserved motifs have been found across all insects, and are proposed to be associated with mtTERM [37, 39, 64].
Spacer 1 shows limited conservation among the hymenopteran species that possess it. In Hymenoptera, the tRNAs directly upstream of nad2 are variable because of frequent gene rearrangements of the tRNAs between the A+T-rich region and nad2 [23]; therefore, this spacer is unlikely to have any function in translation or transcription. Instead, we suggest that it is a product of gene rearrangement, as in D. semiclausum [23].
Spacer 2 has an A+T content of 96.8% and is composed of seven tandem repeat units “GTAATTTTAT”, twelve “AATAATAATATT”, eight “AATAATAATATTAAT”, an initial sequence “TTATTAATAAACCTTAAATTAAAAATTAATTA”, and a terminal sequence “AATAATAATAT(TAA)8(TA)33AT”.
Spacer 3 has an A+T content of 76.2% and contains no repeat sequence although it is 244 bp long. As far as we know, no intergenic nucleotides between atp6 and atp8 have been found in previously reported insect mitochondrial genomes; furthermore, it is a common feature of metazoan mitochondrial genomes that atp8 and atp6 overlap [65]. It has been proposed that the secondary structure of the transcribed mRNA may facilitate cleavage between the abutting proteins [38, 66, 67]. We could map a secondary structure like those in other insects [38, 46] (Fig. 6), which indicates that the presence of spacer 3 in E. appendigaster would not affect the cleavage of atp6 and atp8.
Spacer 4 was also found in six other hymenopteran species (Fig. 7). This intergenic spacer region may correspond to the binding site of mtTERM, a transcription attenuation factor [64], as evidenced by a 7 bp motif (ATACTAA) conserved across Lepidoptera [39] and a 5 bp motif (TACTA) conserved across Coleoptera [37]. In Hymenoptera, we found a 6 bp conserved motif (THACWW), which shows high similarity to those in Lepidoptera and Coleoptera. In P. condei and D. semiclausum, although there is only a 2 bp intergenic spacer and a 7 bp overlapping region between trnS1 and nad1, respectively, we could still find the conserved motif in nearby regions between trnS1 and nad1 in both species. This may indicate wrong annotations of this region in both genomes, or the existence of the motif within genes.
Spacer 5 is proposed as the A+T-rich region because of its location between rrnS-trnW and trnC-trnM-trnI and its high A+T content (85.6%). It is one of the longest A+T-rich regions in the sequenced insect mitochondrial genomes [23, 68]. Twenty-three tandemly arranged units of “GTCATTATTTAATATAAAATA” are present in the middle of the A+T-rich region. This region, characterized by five elements [2, 5], is believed to function in the initiation of replication and control of transcription. However, these elements in the E. appendigaster mitochondrial genome are not arranged in the conserved pattern.
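A degenerate motif such as THACWW can be located in a spacer by expanding its IUPAC ambiguity codes (H = A, C or T; W = A or T) into a regular expression. The following is a minimal sketch of that idea; the helper names and the test sequence are hypothetical, and the sequence is not taken from any of the genomes discussed.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def iupac_to_regex(motif):
    """Translate a degenerate motif into a regular expression string."""
    return "".join(IUPAC[base] for base in motif.upper())

def find_motif(seq, motif):
    """Return (1-based position, matched text) for every occurrence.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]

# Hypothetical spacer sequence, for illustration only.
spacer = "AATTTACTAAGG"
hits = find_motif(spacer, "THACWW")
print(iupac_to_regex("THACWW"), hits)
```

Searching both strands would additionally require scanning the reverse complement of the sequence, which this sketch omits.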
Fig. 6
mRNA loops for genes atp8-atp6 in Evania appendigaster mitochondrial genome. The box indicates start codon of atp6
Fig. 7
Alignment of the intergenic spacers between trnS1 and nad1 (Spacer 4) across Evania appendigaster and other reported hymenopteran mitochondrial genomes. Shaded region indicates the conserved motif and some of these unconserved intergenic nucleotides are replaced by dots