Identification of Mitochondrial DNA (NUMTs) in the Nuclear Genome of Daphnia magna.

Krzysztof Kowal, Angelika Tkaczyk, Mariusz Pierzchała, Adam Bownik, Brygida Ślaska.

Abstract

This is the first study in which the Daphnia magna (D. magna) nuclear genome (nDNA) obtained from the GenBank database was analyzed for pseudogene sequences of mitochondrial origin (NUMTs). To date, there has been no information about pseudogenes located in the D. magna genome. This study aimed to identify NUMTs, their length, homology, and location for potential use in evolutionary studies and to check whether their occurrence causes co-amplification during mitochondrial genome (mtDNA) analyses. Bioinformatic analysis showed 1909 fragments of the mtDNA of D. magna, of which 1630 were located in ten linkage groups (LG) of the nDNA. The best-matched NUMTs covering >90% of the gene sequence were identified for two mt-tRNA genes, and they may be functional nuclear RNA molecules. When total DNA is isolated for mtDNA studies, co-amplification of nDNA fragments is unlikely if whole tRNA genes or fragments of other genes are amplified. TRNA-MET fragments had the highest level of sequence homology and could therefore be evolutionarily the youngest. The lowest homology was found in the D-loop-derived pseudogene, which may be the oldest NUMT incorporated into the nDNA; however, further analysis is necessary.

Keywords:  BLAST; NUMT; mtDNA; water flea

Year:  2020        PMID: 33218217      PMCID: PMC7699184          DOI: 10.3390/ijms21228725

Source DB:  PubMed          Journal:  Int J Mol Sci        ISSN: 1422-0067            Impact factor:   5.923


1. Introduction

Daphnia, commonly known as water fleas, are small crustaceans that usually inhabit freshwater ponds and lakes on all continents. They have long been used as a model for elucidating animal responses and adaptations to environmental changes [1]. They have also been used in diverse areas of biological research, such as ecology, ecotoxicology, evolution, and reproductive biology, owing to their important position in the aquatic food chain, high degree of phenotypic plasticity, and cyclical parthenogenesis in response to environmental stimuli [1,2,3,4,5,6,7]. Their sensitive behavioral and physiological responses serve as biomarkers of the effects induced by various substances [1,2,3,4,5,6,7,8]. Thus, Daphnia are used for reproduction tests, acute toxicity studies, and chronic toxicity tests in the OECD Guidelines [9,10]. The current state of knowledge of the nuclear genome of Daphnia magna is based on reports by Routtu et al. [11,12], Dukić et al. [13], and Lee et al. [14], who produced low- and high-density genetic linkage maps and assembled the whole genome sequence of D. magna. Specific genetic markers from a high-resolution genetic linkage map of D. magna Xinb3 were evaluated in toxicological studies by the Korea Institute of Toxicology (KIT) [14]. Genomic resources are steadily being developed for many species of the genus Daphnia. In particular, a database of around 12,000 expressed sequence tags (ESTs) is currently available (http://wfleabase.org) [15], providing a useful resource for isolating polymorphic genetic markers in this species. However, there is no information about pseudogenic sequences of mitochondrial origin (NUMTs) in the D. magna genome. Nuclear DNA sequences that are homologous to the mitochondrial genome are often referred to as mitochondrial pseudogenes, or NUMTs [16]. NUMTs differ in length and may be as large as the full mitochondrial genome [17].
It has been reported that the NUMT length is positively correlated with the genome size, suggesting potential roles of non-coding DNA gain and loss in NUMT accumulation [18]. NUMTs have been documented in almost all eukaryotic genomes studied [19]. The transfer of mitochondrial DNA sequences into the nuclear genome is an ongoing evolutionary process [20], which has markedly influenced the evolution and function of eukaryotic genomes [19]. Thus, NUMTs are good materials for studying the evolution of nuclear sequences without selective constraints [21]. However, because of their homology, NUMTs may confound mtDNA studies, as the NUMT co-amplification product could interfere with sequence analysis [22]. This is the first study in which the D. magna nuclear genome deposited in the GenBank database was analyzed for pseudogene sequences of mitochondrial origin. The present study aimed to identify NUMTs, their length, homology, and location for potential use in evolutionary studies and to check whether their occurrence causes co-amplification during mitochondrial genome analyses.

2. Results

Bioinformatic analysis showed 1909 fragments of the mitochondrial genome, of which 1630 were located in the ten linkage groups of the nuclear genome of D. magna. The other fragments were located in scaffolds that were used during genome sequencing. All the NUMTs found in this research are listed in the supplementary materials (Table S1). The total length of the NUMT sequences in each linkage group corresponded to the number of fragments on that LG (Table 1). The total length of NUMTs in the D. magna genome was 44,391 base pairs (bp), which accounted for 0.042% of the length of the nuclear genome. Their percentage content was 0.037% in the longest linkage group (LG2), in which 228 NUMTs were identified, and 0.047% in the shortest linkage group (LG10) (Table 1). The most frequently occurring mtDNA gene fragments in the nuclear genome were derived from ND2 (115), ND3 (113), TRNA-CYS (110), and 16S rRNA (105). However, the highest number of NUMTs was observed for the non-coding region, i.e., the D-loop (147) (Table 2). The lowest numbers of mtDNA fragments found in nDNA were observed mainly for genes encoding tRNA molecules: TRNA-PRO and TRNA-MET (6 each), TRNA-TYR and TRNA-ASN (4 each), and TRNA-SER1 (2). Among the linkage groups, the highest number of mtDNA fragments was recorded on LG2 (228) and the lowest on LG8 (134) (Table 2).
Table 1

Percentage content of NUMTs in the linkage groups of the nuclear genome.

Linkage Group (LG) | Physical Length (bp), from Lee et al., 2019 | Sum of NUMT Lengths (bp) | Percentage Content of NUMTs on LG
LG1 | 14,058,888 | 5,063 | 0.036%
LG2 | 16,351,056 | 6,071 | 0.037%
LG3 | 11,081,246 | 4,389 | 0.040%
LG4 | 10,002,879 | 4,147 | 0.041%
LG5 | 10,116,075 | 4,600 | 0.045%
LG6 | 9,588,688 | 3,997 | 0.042%
LG7 | 10,149,764 | 4,829 | 0.048%
LG8 | 9,006,911 | 3,533 | 0.039%
LG9 | 8,299,553 | 3,966 | 0.048%
LG10 | 8,061,327 | 3,796 | 0.047%
Total | 106,716,387 | 44,391 | 0.042%
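The percentages in Table 1 can be reproduced from the two length columns. The short Python sketch below is only an illustration and is not part of the original analysis; the function and variable names are hypothetical, and the values are copied from Table 1.

```python
# (physical length in bp, summed NUMT length in bp) per linkage group, from Table 1.
TABLE_1 = {
    "LG1": (14_058_888, 5_063),
    "LG2": (16_351_056, 6_071),
    "LG3": (11_081_246, 4_389),
    "LG4": (10_002_879, 4_147),
    "LG5": (10_116_075, 4_600),
    "LG6": (9_588_688, 3_997),
    "LG7": (10_149_764, 4_829),
    "LG8": (9_006_911, 3_533),
    "LG9": (8_299_553, 3_966),
    "LG10": (8_061_327, 3_796),
}


def numt_percentage(numt_bp: int, physical_bp: int) -> float:
    """Return the NUMT content as a percentage of the physical length."""
    return 100.0 * numt_bp / physical_bp


# Totals over all ten linkage groups.
total_physical = sum(physical for physical, _ in TABLE_1.values())
total_numt = sum(numt for _, numt in TABLE_1.values())

for lg, (physical, numt) in TABLE_1.items():
    print(f"{lg}: {numt_percentage(numt, physical):.3f}%")
print(f"Total: {numt_percentage(total_numt, total_physical):.3f}%")
```

The totals computed this way match the last row of Table 1 (44,391 bp of NUMTs in 106,716,387 bp).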
Table 2

Distribution of mtDNA gene fragments in the linkage groups and the sum of fragment length.

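mtDNA Sequence | LG1 | LG2 | LG3 | LG4 | LG5 | LG6 | LG7 | LG8 | LG9 | LG10 | Number of NUMTs (100% Identical) * | Sum of Gene Fragments (bp)
TRNA-GLN | 3 | 6 | 5 | 7 | 2 | 5 | 3 | 2 | 5 | 4 | 42 (15) | 966
TRNA-MET | 2 | 2 | 1 | 1 | - | - | - | - | - | - | 6 (6) | 102
ND2 | 12 | 18 | 15 | 12 | 12 | 9 | 12 | 12 | 6 | 7 | 115 (15) | 3235
TRNA-TRP | 3 | 2 | 3 | 3 | 2 | - | - | 3 | 1 | 2 | 19 (6) | 447
TRNA-CYS | 11 | 13 | 9 | 14 | 5 | 14 | 11 | 10 | 11 | 12 | 110 (12) | 2596
TRNA-TYR | - | 1 | - | - | - | - | - | 1 | 1 | 1 | 4 (1) | 100
COX1 | 1 | 1 | 2 | 2 | 5 | 3 | 7 | 3 | 3 | 4 | 31 (2) | 1175
TRNA-LEU1 | 5 | 1 | 9 | 3 | 1 | 6 | 5 | 3 | 1 | 2 | 36 (10) | 786
COX2 | 2 | 6 | 3 | 4 | 14 | 7 | 5 | 5 | 2 | 4 | 52 (9) | 1492
TRNA-LYS | - | 2 | 3 | - | 2 | - | 2 | 2 | - | 1 | 12 (3) | 290
TRNA-ASP | 3 | - | 1 | 2 | - | 2 | 1 | 3 | - | - | 12 (1) | 273
ATP8 | - | 5 | 1 | 2 | - | 1 | 3 | 2 | 1 | - | 15 (3) | 438
ATP6 | 4 | 7 | 4 | 4 | 5 | 2 | 2 | 1 | 2 | 3 | 34 (2) | 957
COX3 | 3 | 2 | 2 | - | 4 | 2 | 3 | - | 3 | 4 | 23 (3) | 717
TRNA-GLY | 2 | 3 | 4 | - | 2 | 1 | 5 | 2 | 2 | 2 | 23 (3) | 512
ND3 | 17 | 17 | 14 | 6 | 9 | 14 | 10 | 9 | 15 | 2 | 113 (21) | 2876
TRNA-ALA | 1 | 6 | 1 | 5 | 1 | 1 | 1 | 2 | - | 1 | 19 (0) | 407
TRNA-ARG | 2 | 1 | 5 | 5 | 2 | 5 | - | 1 | 1 | 3 | 25 (4) | 695
TRNA-ASN | 1 | 1 | - | 1 | - | - | - | - | 1 | - | 4 (1) | 112
TRNA-SER1 | - | - | - | - | - | - | - | 2 | - | - | 2 (0) | 52
TRNA-GLU | 5 | 7 | 4 | 2 | 4 | 3 | 4 | 4 | 5 | - | 38 (8) | 924
TRNA-PHE | - | 1 | - | - | 2 | 1 | - | 2 | 2 | 1 | 9 (2) | 187
ND5 | 17 | 15 | 7 | 11 | 7 | 10 | 10 | 3 | 10 | 8 | 98 (10) | 3244
TRNA-HIS | - | 3 | 1 | 2 | 2 | 3 | 7 | 1 | 3 | 2 | 24 (6) | 536
ND4 | 7 | 4 | 3 | 4 | 2 | 1 | 2 | 4 | 3 | 7 | 37 (1) | 1194
ND4L | 4 | 3 | 3 | 1 | 1 | 2 | 1 | 2 | - | 6 | 23 (2) | 615
TRNA-THR | 5 | 2 | - | - | 2 | - | 3 | 3 | 2 | - | 17 (9) | 355
TRNA-PRO | - | - | 1 | 3 | - | - | 1 | - | 1 | - | 6 (1) | 151
ND6 | 9 | 14 | 12 | 7 | 5 | 8 | 14 | 10 | 4 | 8 | 91 (7) | 2445
CYTB | 8 | 14 | 7 | 5 | 6 | 5 | 4 | 3 | 6 | 4 | 62 (8) | 1631
TRNA-SER2 | 9 | 7 | 9 | 3 | 9 | 6 | 5 | 4 | 5 | 11 | 68 (16) | 2644
ND1 | 15 | 12 | 9 | 13 | 6 | 11 | 13 | 6 | 5 | 6 | 96 (16) | 1486
TRNA-LEU2 | - | 3 | - | - | 1 | 1 | - | 1 | - | - | 6 (2) | 135
16S rRNA | 11 | 19 | 3 | 7 | 15 | 8 | 11 | 11 | 10 | 10 | 105 (10) | 3700
TRNA-VAL | 2 | 2 | 1 | 1 | 3 | 1 | 2 | 2 | - | - | 14 (4) | 320
12S rRNA | 12 | 16 | 6 | 7 | 5 | 6 | 12 | 5 | 13 | 10 | 92 (10) | 2617
TRNA-ILE | - | - | - | - | - | - | - | - | - | - | 0 | 0
D-LOOP | 13 | 12 | 22 | 13 | 29 | 12 | 12 | 10 | 11 | 13 | 147 (23) | 3979
SUM | 189 | 228 | 170 | 150 | 165 | 150 | 171 | 134 | 135 | 138 | 1630 (253) |
Sum of fragments in the LG (bp) | 5063 | 6071 | 4389 | 4147 | 4600 | 3997 | 4829 | 3533 | 3966 | 3796 | | 44,391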

* The counts of NUMTs that were 100% identical with the sequence from mtDNA are indicated in brackets.
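
A table of this shape can be derived from a list of BLAST hits by simple counting. The sketch below is a hypothetical illustration, not the authors' pipeline: the hit records are invented for the example, and the field layout (gene, linkage group, % identity, alignment length) is an assumption.

```python
from collections import Counter

# Hypothetical BLAST hits: (mtDNA gene, linkage group, % identity, alignment length in bp).
# These records are invented for illustration and are not taken from Table S1.
hits = [
    ("TRNA-MET", "LG1", 100.0, 18),
    ("TRNA-MET", "LG1", 100.0, 16),
    ("D-LOOP", "LG5", 91.2, 57),
    ("D-LOOP", "LG5", 100.0, 17),
    ("ND3", "LG2", 94.9, 99),
]


def tally(hit_records):
    """Count hits per (gene, LG), 100%-identical hits per gene, and summed length per gene."""
    per_gene_lg = Counter()
    identical = Counter()
    total_bp = Counter()
    for gene, lg, pident, length in hit_records:
        per_gene_lg[(gene, lg)] += 1
        total_bp[gene] += length
        if pident == 100.0:
            identical[gene] += 1
    return per_gene_lg, identical, total_bp


per_gene_lg, identical, total_bp = tally(hits)
print(per_gene_lg[("TRNA-MET", "LG1")], identical["TRNA-MET"], total_bp["TRNA-MET"])
```

Each cell of Table 2 corresponds to one `(gene, LG)` count, the bracketed value to the 100%-identical count, and the last column to the summed fragment length.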

The longest fragments of the mitochondrial genome present in the nuclear genome were observed for the D-loop (182 bp), ND4 (108 bp), ND3 (99 bp), and ND5 and COX3 (94 bp each) (Table 3). The 182-bp fragment of the D-loop constituted 63% of the entire sequence of this region. In contrast, for the protein-encoding genes, the longest fragment covered from 4% (CYTB) to 32% (ATP8) of the gene. Among the genes encoding tRNA, some NUMTs covered over 90% of the mtDNA gene sequence: TRNA-ARG and TRNA-THR (95% each) and TRNA-GLU (97%). The minimum fragment length for all the analyzed sequences was in the range of 16–25 bp (Table 3).
Table 3

Minimal and maximal length of each mtDNA gene fragments located in the nuclear genome.

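Sequence | Length (bp) | Min (bp) | % Min | Max (bp) | % Max
TRNA-GLN | 68 | 16 | 24% | 44 | 65%
TRNA-MET | 65 | 16 | 25% | 18 | 28%
ND2 | 987 | 18 | 2% | 58 | 6%
TRNA-TRP | 64 | 16 | 25% | 38 | 59%
TRNA-CYS | 64 | 16 | 25% | 40 | 63%
TRNA-TYR | 64 | 18 | 28% | 35 | 55%
COX1 | 1537 | 19 | 1% | 72 | 5%
TRNA-LEU1 | 68 | 16 | 24% | 39 | 57%
COX2 | 679 | 18 | 3% | 52 | 8%
TRNA-LYS | 70 | 17 | 24% | 35 | 50%
TRNA-ASP | 63 | 17 | 27% | 32 | 51%
ATP8 | 168 | 17 | 10% | 54 | 32%
ATP6 | 675 | 18 | 3% | 47 | 7%
COX3 | 789 | 19 | 2% | 94 | 12%
TRNA-GLY | 63 | 16 | 25% | 32 | 51%
ND3 | 354 | 17 | 5% | 99 | 28%
TRNA-ALA | 62 | 19 | 31% | 28 | 45%
TRNA-ARG | 64 | 16 | 25% | 61 | 95%
TRNA-ASN | 67 | 16 | 24% | 45 | 67%
TRNA-SER1 | 65 | 25 | 38% | 27 | 42%
TRNA-GLU | 65 | 16 | 25% | 63 | 97%
TRNA-PHE | 68 | 16 | 24% | 26 | 38%
ND5 | 1708 | 19 | 1% | 94 | 6%
TRNA-HIS | 63 | 16 | 25% | 42 | 67%
ND4 | 1315 | 19 | 1% | 108 | 8%
ND4L | 306 | 17 | 6% | 38 | 12%
TRNA-THR | 63 | 16 | 25% | 60 | 95%
TRNA-PRO | 64 | 16 | 25% | 34 | 53%
ND6 | 504 | 18 | 4% | 64 | 13%
CYTB | 1133 | 18 | 2% | 46 | 4%
ND1 | 927 | 18 | 2% | 88 | 9%
TRNA-SER2 | 69 | 16 | 23% | 35 | 51%
TRNA-LEU2 | 67 | 16 | 24% | 27 | 40%
16S rRNA | 1373 | 19 | 1% | 78 | 6%
TRNA-VAL | 72 | 16 | 22% | 38 | 53%
12S rRNA | 752 | 18 | 2% | 72 | 10%
TRNA-ILE | 64 | - | - | - | -
D-LOOP | 289 | 17 | 6% | 182 | 63%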
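
The "% Min" and "% Max" columns of Table 3 express the fragment length as a share of the full mtDNA gene length. A minimal Python sketch of that arithmetic follows; the function name is hypothetical and the inputs are taken from Table 3.

```python
def coverage_percent(fragment_bp: int, gene_bp: int) -> int:
    """Share of the mtDNA gene covered by a fragment, rounded to a whole percent."""
    return round(100 * fragment_bp / gene_bp)


# (gene, longest fragment in bp, gene length in bp) for selected rows of Table 3.
examples = [
    ("D-LOOP", 182, 289),
    ("TRNA-ARG", 61, 64),
    ("TRNA-GLU", 63, 65),
    ("TRNA-THR", 60, 63),
    ("CYTB", 46, 1133),
]

for gene, fragment_bp, gene_bp in examples:
    print(f"{gene}: {coverage_percent(fragment_bp, gene_bp)}%")
```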
Of the 1630 NUMTs (Table 2), 253 fragments, representing 16% of all NUMTs, showed 100% homology with the mtDNA gene sequences. All 6 TRNA-MET NUMTs showed 100% sequence homology (Table 2 and Table 4). For the D-loop region, 23 NUMTs with 100% sequence homology were observed, and for the ND3 gene, 21. In contrast, no NUMTs with 100% sequence homology were observed for the TRNA-SER1 and TRNA-ALA genes (Table 2). The mean values of the percentage of sequence identity ranged from 88.0% (ATP8) to 100% (TRNA-MET). The percentage identity for the individual linkage groups was in the range of 90–91%. The highest homology was recorded on LG10 (90.7%) and the lowest on LG6 (90.2%) (Table 4). At least one sequence with 100% homology was identified on each of the linkage groups.
Table 4

Mean % identity of mtDNA gene fragments located in the linkage groups.

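Sequence | LG1 | LG2 | LG3 | LG4 | LG5 | LG6 | LG7 | LG8 | LG9 | LG10 | Mean % Identity (Gene)
TRNA-GLN | 93.8 | 88.8 | 94.3 | 92.9 | 92.3 | 85.1 | 97.1 | 89.4 | 90.5 | 95.0 | 91.6
TRNA-MET | 100.0 | 100.0 | 100.0 | 100.0 | - | - | - | - | - | - | 100.0
ND2 | 88.7 | 91.4 | 90.2 | 88.0 | 89.3 | 88.4 | 91.2 | 90.0 | 87.2 | 87.2 | 89.5
TRNA-TRP | 90.9 | 95.2 | 88.3 | 90.2 | 93.8 | - | - | 89.7 | 100.0 | 94.0 | 91.8
TRNA-CYS | 92.6 | 89.7 | 89.1 | 89.9 | 88.4 | 93.3 | 91.4 | 87.6 | 88.7 | 92.4 | 90.5
TRNA-TYR | - | 85.7 | - | - | - | - | - | 100.0 | 94.7 | 97.1 | 94.4
COX1 | 79.5 | 92.3 | 82.6 | 94.6 | 87.8 | 82.2 | 88.3 | 90.7 | 92.8 | 95.4 | 89.1
TRNA-LEU1 | 91.0 | 100.0 | 92.5 | 91.9 | 100.0 | 93.6 | 91.5 | 95.2 | 86.2 | 83.3 | 92.2
COX2 | 81.6 | 95.4 | 86.5 | 87.5 | 92.1 | 90.2 | 91.1 | 89.0 | 85.0 | 89.0 | 90.2
TRNA-LYS | - | 95.2 | 86.5 | - | 98.6 | - | 95.1 | 91.9 | - | 90.5 | 92.6
TRNA-ASP | 96.8 | - | 95.0 | 84.4 | - | 85.6 | 90.9 | 91.5 | - | - | 90.9
ATP8 | - | 88.4 | 88.5 | 84.8 | - | 100.0 | 84.5 | 87.7 | 90.9 | - | 88.0
ATP6 | 86.9 | 85.1 | 90.4 | 88.4 | 93.1 | 90.3 | 86.9 | 95.2 | 85.5 | 92.7 | 88.9
COX3 | 84.9 | 90.5 | 87.6 | - | 90.5 | 91.3 | 97.1 | - | 88.3 | 94.6 | 90.9
TRNA-GLY | 90.4 | 92.1 | 92.0 | - | 95.2 | 90.9 | 94.3 | 85.0 | 95.2 | 88.3 | 92.0
ND3 | 90.2 | 91.5 | 89.2 | 88.5 | 96.1 | 91.1 | 90.3 | 89.9 | 91.5 | 95.7 | 91.0
TRNA-ALA | 90.5 | 93.7 | 94.7 | 89.8 | 94.7 | 90.9 | 88.0 | 90.7 | - | 94.7 | 91.9
TRNA-ARG | 89.0 | 90.5 | 92.0 | 94.3 | 89.7 | 87.0 | - | 90.9 | 94.7 | 97.7 | 91.7
TRNA-ASN | 84.6 | 100.0 | - | 88.0 | - | - | - | - | 95.6 | - | 92.0
TRNA-SER1 | - | - | - | - | - | - | - | 90.3 | - | - | 90.3
TRNA-GLU | 89.8 | 90.8 | 89.3 | 90.2 | 93.0 | 95.1 | 87.6 | 91.6 | 96.2 | - | 91.5
TRNA-PHE | - | 100.0 | - | - | 90.1 | 90.5 | - | 95.5 | 92.6 | 91.3 | 93.1
ND5 | 89.5 | 88.0 | 86.1 | 92.5 | 89.4 | 92.4 | 86.8 | 86.0 | 89.5 | 88.6 | 89.2
TRNA-HIS | - | 95.1 | 100.0 | 100.0 | 80.7 | 87.6 | 91.2 | 88.9 | 89.0 | 100.0 | 91.8
ND4 | 89.5 | 86.1 | 90.9 | 87.3 | 93.0 | 89.7 | 87.3 | 88.5 | 88.7 | 94.1 | 89.8
ND4L | 86.1 | 87.4 | 84.6 | 82.9 | 100.0 | 88.9 | 95.0 | 87.3 | - | 90.8 | 88.5
TRNA-THR | 98.3 | 89.5 | - | - | 95.0 | - | 98.3 | 100.0 | 94.0 | - | 96.6
TRNA-PRO | - | - | 91.3 | 90.7 | - | - | 100.0 | - | 92.6 | - | 92.7
ND6 | 92.9 | 89.0 | 89.7 | 92.6 | 87.8 | 90.9 | 89.1 | 89.2 | 88.7 | 92.6 | 90.2
CYTB | 91.1 | 89.4 | 89.8 | 93.0 | 91.1 | 87.1 | 88.9 | 93.4 | 92.2 | 90.9 | 90.5
ND1 | 90.7 | 92.7 | 93.1 | 88.3 | 92.3 | 88.3 | 91.5 | 92.5 | 90.6 | 86.1 | 90.6
TRNA-SER2 | 95.3 | 92.3 | 94.4 | 95.2 | 89.4 | 93.6 | 96.6 | 90.2 | 90.3 | 88.8 | 92.3
TRNA-LEU2 | - | 92.0 | - | - | 85.2 | 85.2 | - | 100.0 | - | - | 91.1
16S rRNA | 86.3 | 87.6 | 81.7 | 91.9 | 87.1 | 88.4 | 86.4 | 87.9 | 86.1 | 92.7 | 87.8
TRNA-VAL | 100.0 | 83.5 | 90.5 | 92.0 | 90.9 | 90.5 | 83.3 | 100.0 | - | - | 91.4
12S rRNA | 89.1 | 89.9 | 88.2 | 87.8 | 90.2 | 88.9 | 89.2 | 93.0 | 93.1 | 85.5 | 89.5
D-LOOP | 91.9 | 90.0 | 91.7 | 91.8 | 88.8 | 91.0 | 90.3 | 92.6 | 91.5 | 88.0 | 90.6
Mean % Identity (LG) | 90.6 | 90.3 | 90.3 | 90.5 | 90.4 | 90.2 | 90.4 | 90.6 | 90.6 | 90.7 | 90.4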
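
The gene-level means in the last column of Table 4 are not simple averages of the ten LG values. They appear to be consistent with a mean weighted by the number of fragments per LG (Table 2). This is an inference from the published numbers, not a procedure stated in the paper, and the Python sketch below, with hypothetical names, only checks it for two rows.

```python
def weighted_mean_identity(per_lg):
    """Count-weighted mean of per-LG identities; per_lg holds (count, mean % identity) pairs."""
    total = sum(count for count, _ in per_lg)
    return sum(count * identity for count, identity in per_lg) / total


# (fragment count from Table 2, mean % identity from Table 4); LGs without fragments are omitted.
TRNA_GLN = [(3, 93.8), (6, 88.8), (5, 94.3), (7, 92.9), (2, 92.3),
            (5, 85.1), (3, 97.1), (2, 89.4), (5, 90.5), (4, 95.0)]
TRNA_TRP = [(3, 90.9), (2, 95.2), (3, 88.3), (3, 90.2), (2, 93.8),
            (3, 89.7), (1, 100.0), (2, 94.0)]

# Unweighted average of the ten LG values for TRNA-GLN, for comparison.
unweighted_gln = sum(identity for _, identity in TRNA_GLN) / len(TRNA_GLN)

print(round(weighted_mean_identity(TRNA_GLN), 1))  # Table 4 reports 91.6
print(round(weighted_mean_identity(TRNA_TRP), 1))  # Table 4 reports 91.8
print(round(unweighted_gln, 1))
```

Because the per-LG inputs are themselves rounded, agreement to one decimal place is the most that can be expected from such a check.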
Table 5 compares the number and length of NUMTs with those of the mtDNA fragments found in pseudogenes and in the transcriptome of D. magna. The overall number of mtDNA fragments in pseudogenes was 599, and their greatest length was estimated at between 18 bp (TRNA-MET) and 99 bp (ND3). The lowest numbers of mtDNA fragments in pseudogenes were observed for TRNA-PRO (1), ATP8 (2), and TRNA-ASP and TRNA-ILE (3 each). The highest numbers were observed for 16S rRNA (63) and the D-loop (48). Among the mtDNA fragments found in the transcriptome, the highest numbers were observed for TRNA-CYS (101) and 16S rRNA (100), and the lowest for TRNA-SER1 and TRNA-PHE (1 each). Their greatest length ranged from 20 bp to 64 bp. The mtDNA fragments in the transcriptome were located mainly in different mRNA transcript variants of various protein-coding genes as well as in ncRNA and misc_RNA sequences (Table S1). The overall number of mtDNA fragments in the transcriptome was 1275.
Table 5

The comparison of the number and length of mitochondrial DNA fragments found in the nuclear genome (NUMTs), pseudogenes, and the transcriptome.

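Gene | NUMTs: Number | NUMTs: Greatest Length (bp) | mtDNA Fragments Found in Pseudogenes: Number | mtDNA Fragments Found in Pseudogenes: Greatest Length (bp) | mtDNA Fragments Found in Transcriptome: Number | mtDNA Fragments Found in Transcriptome: Greatest Length (bp)
TRNA-GLN | 46 | 44 | 13 | 19 | 23 | 35
TRNA-MET | 6 | 18 | 5 | 18 | 21 | 29
ND2 | 128 | 58 | 47 | 39 | 36 | 58
TRNA-TRP | 23 | 38 | 11 | 30 | 58 | 38
TRNA-CYS | 136 | 40 | 23 | 26 | 101 | 34
TRNA-TYR | 4 | 35 | 0 | 0 | 5 | 25
COX1 | 37 | 72 | 11 | 72 | 52 | 61
TRNA-LEU1 | 38 | 39 | 5 | 20 | 28 | 23
COX2 | 70 | 53 | 21 | 38 | 14 | 39
TRNA-LYS | 14 | 35 | 8 | 35 | 7 | 31
TRNA-ASP | 14 | 32 | 3 | 29 | 18 | 39
ATP8 | 18 | 54 | 2 | 15 | 10 | 26
ATP6 | 38 | 64 | 11 | 24 | 6 | 64
COX3 | 29 | 94 | 15 | 27 | 25 | 28
TRNA-GLY | 23 | 32 | 13 | 37 | 31 | 48
ND3 | 131 | 99 | 28 | 99 | 27 | 53
TRNA-ALA | 21 | 28 | 5 | 18 | 57 | 28
TRNA-ARG | 29 | 61 | 18 | 31 | 85 | 61
TRNA-ASN | 4 | 45 | 4 | 19 | 3 | 45
TRNA-SER1 | 3 | 27 | 0 | 0 | 1 | 20
TRNA-GLU | 46 | 63 | 15 | 29 | 27 | 30
TRNA-PHE | 10 | 26 | 5 | 24 | 1 | 26
ND5 | 107 | 94 | 19 | 65 | 70 | 43
TRNA-HIS | 26 | 42 | 7 | 21 | 28 | 27
ND4 | 46 | 108 | 25 | 33 | 73 | 33
ND4L | 25 | 38 | 10 | 38 | 5 | 30
TRNA-THR | 18 | 60 | 9 | 22 | 46 | 60
TRNA-PRO | 6 | 34 | 1 | 13 | 19 | 38
ND6 | 106 | 64 | 21 | 41 | 97 | 42
CYTB | 71 | 59 | 27 | 38 | 19 | 35
TRNA-SER2 | 74 | 35 | 29 | 37 | 33 | 31
ND1 | 107 | 88 | 45 | 88 | 24 | 34
TRNA-LEU2 | 7 | 27 | 10 | 23 | 9 | 27
16S rRNA | 123 | 78 | 63 | 59 | 100 | 46
TRNA-VAL | 19 | 38 | 9 | 20 | 8 | 32
12S rRNA | 136 | 72 | 10 | 25 | 42 | 51
TRNA-ILE | 0 | 0 | 3 | 20 | 2 | 20
D-LOOP | 170 | 182 | 48 | 34 | 64 | 50
Sum | 1909 | | 599 | | 1275 |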

3. Discussion

This is the first study in which the D. magna nuclear genome deposited in the GenBank database was analyzed for pseudogene sequences of mitochondrial origin. The first complete information about the genome of D. magna was published by Lee et al. in 2019 [14]. To date, there has been no information about pseudogenes located in the genome of the water flea. Our research provides comprehensive bioinformatic details about the location of NUMTs found in the reference genome, which may be useful for future phylogenetic, evolutionary, and/or population analyses. A computer-based search for NUMTs in the nuclear genomes of 85 species of animals, plants, fungi, and protists showed that the total length of detected NUMTs varied from 0 to 823.9 kb per nuclear genome [23]. For instance, the NUMT content was 0 in Anopheles gambiae, 263,478 bp in Homo sapiens, and more than 800 kbp in Oryza sativa [23]. In the case of D. magna (Table 1), the overall length of NUMTs in the nuclear genome was 44,391 bp. The total length of 24 NUMTs was 9,989 bp in the Pteromalus puparum genome, compared with 42,972 bp in Nasonia vitripennis [24] and more than 230 kbp in Apis mellifera [25]. There seems to be a positive correlation between the haploid genome size (C-value) and the amount/prevalence of NUMTs in eukaryotes [26]. Another explanation for the differences in the total length of NUMTs is the fact that they accumulate in genomes in a continuous evolutionary process [18]. As in other species, e.g., Myotis lucifugus [27], the percentage of NUMTs in the genome was less than 0.1%. In addition, the number and length of NUMTs may vary depending on the computer-based query for NUMTs in the BLAST tool, as in the case of the genome of Canis lupus familiaris [22,28]. In our study, the sequence search in BLASTN 2.6.0 implemented in CLC Genomics Workbench 12.0 yielded 1909 results, although 279 of them were found in various scaffolds used during the sequencing of the D. magna genome.
In this paper, however, the NUMT results from the scaffolds were excluded because of the potential occurrence of artifacts created during sequencing, as observed by Shi et al. [27], where, surprisingly, an entire mitochondrial genome was found in the scaffold AAPE02072785 in the M. lucifugus genome. It is also worth noting that, in this work, all BLAST search results were taken into account, even fragments only 16 bp long (Table 3). The NUMTs found in scaffolds and the mitochondrial DNA fragments found in pseudogenes and the transcriptome are listed in the supplementary materials (Table S1). For mtDNA gene fragments such as TRNA-MET, COX1, and ND4, whose number is higher in the transcriptome than in the nuclear genome, the difference may result from the presence of multiple mRNA transcripts or ncRNA forms in the transcriptome. The variability of mRNA and ncRNA forms may be caused by alternative splicing. Moreover, the transcriptome changes during the lifespan and may depend on the type of tissue [29]. It cannot be excluded that some NUMTs are parts of protein-coding genes and/or participate in gene regulation through ncRNA formation. This research focused on describing the presence of mitochondrial DNA in the nuclear genome, whereas the role of NUMTs in gene expression and regulation in D. magna should be verified experimentally in further analyses.

NUMTs have been used to define characteristics and to clarify phylogenetic inconsistencies suggested by paralog sequences [18,30]. The analysis performed by Mishmar et al. [31] revealed mtDNA fragments that had been integrated into the nucleus before the radiation of modern human mtDNAs, confirming that mtDNAs similar to today’s African macro-haplogroup L were the first human mtDNAs. The analysis of NUMTs in D. magna revealed that the evolutionarily most recent sequences are pseudogenes derived from TRNA-MET, as all of these fragments showed 100% homology with the gene sequence. The homology of the TRNA-MET gene fragments located in the transcriptome is below 100%; however, this could be a result of gene modifications after expression (Table S1). In addition, the mtDNA fragments in the transcriptome might differ from each other because gene expression is modulated differently depending on the tissue. The lowest homology (70.33%) was characteristic of the pseudogene derived from the mtDNA D-loop sequence located on LG5 (Table S1). It may be the oldest element of mitochondrial DNA incorporated into the nuclear genome. However, because of the high degree of mutation in the D-loop, this thesis requires further verification. Similarly, NUMTs derived from TRNA-ALA and TRNA-SER1 may have been among the first sequences derived from mtDNA. However, the order in which mitochondrial pseudogenes were incorporated into the nuclear genome cannot be determined by assessing sequence homology alone. Nevertheless, as observed by Mishmar et al. [31], the nuclear genome accumulates mutational changes at a much slower rate than mtDNA. Hence, the sequences of “recent” NUMTs can provide valuable information about the mtDNA sequences of the earliest humans.

NUMTs exhibit different degrees of homology to their mitochondrial counterparts. They are variable in size, evenly distributed within and among chromosomes, and, in some cases, highly rearranged and/or fragmented [32]. However, the size of the mitochondrial chromosome does not correlate with the NUMT frequency or size distribution [19]. The transfer to the nucleus can be influenced by the vulnerability of mitochondria to stress and by other factors that may cause the escape of mtDNA to the cytoplasm [32]. Mutations in mtDNA may occur throughout the mitochondrial genome; however, they are most frequently detected in the hypervariable regions of the D-loop [33,34,35].
The number of mutations as well as their incidence in the D-loop area may be related to the number of NUMTs occurring in the nuclear genome. The higher the mutation rate, the greater the likelihood of transfer of a D-loop fragment into the cytoplasm followed by its incorporation into the nuclear genome as a pseudogene. In the D. magna genome, the highest number of pseudogenes was derived from the D-loop (147), and only 23 of them had 100% sequence homology (Table 2). No pseudogenes derived from the TRNA-ILE gene sequence were observed in the linkage groups of the nuclear genome (Table 2). However, the detailed search for TRNA-ILE fragments in pseudogenes revealed the presence of three of them, in LOC116920659, LOC116923942, and LOC116927475, and two fragments in the transcriptome (Table S1). The difference between the number of NUMTs and the numbers of fragments in pseudogenes and the transcriptome for the TRNA-ILE gene might be the result of the search method. The pseudogenes were generated by automated computational analysis using the gene prediction method Gnomon [36]. Gnomon is an HMM-based gene prediction program. The core algorithm is based on Genscan, which uses a 3-periodic fifth-order Hidden Markov Model (HMM) for the coding propensity score and incorporates descriptions of the basic transcriptional, translational, and splicing signals, as well as length distributions and compositional features of exons, introns, and intergenic regions. For genes for which no experimental information is available, Gnomon creates conventional ab initio predictions [36]. As a result of this prediction, the proposed gene contains the fragment of TRNA-ILE. The presence of these sequence fragments derived from mtDNA cannot be excluded; however, further experimental analysis of the protein is needed. Because no other genome sequences of D. magna were available for comparison, the authors could not exclude any of the results.
The parameters of the Gnomon prediction were widely used in the description of other model organisms’ genomes i.e., Arabidopsis thaliana, Danio rerio, Mus musculus, and Drosophila melanogaster [36]. Thus, the authors recognize the possibility of occurrence of tRNA-ILE-derived NUMTs in the D. magna genome. In the human genome, NUMTs are commonly associated with repetitive elements, suggesting a possible role for transposable elements in mtDNA integration in the nuclear genome [31]. Certain NUMTs are repeated multiple times within the human genome [32,37]. In the case of the D. magna genome, some NUMTs were also observed, which were repeated many times in different linkage groups (Table S1). However, their association with repetitive elements in the nuclear genome requires additional research. The average levels of NUMT sequence homology for the individual linkage groups do not differ significantly from each other, which may indicate a random and even inclusion of sequence fragments into each of them (Table 4). The cytochrome C oxidase subunit I (COI) has possibly been the most commonly studied marker. However, its popularity is mainly associated with its use as a marker for the DNA barcoding of animal diversity [38]. There are several factors causing inadequacy of mtDNA in general and COI individually, such as male-biased gene flow, the selection on any mtDNA nucleotide(s) (as the whole genome is one linkage group), retention of ancestral polymorphism, and introgression following hybridization [39]. Presently, there are huge numbers of COI sequences in public databases, and most of them have a limited length, generally close to the length of the barcoding region. It is known that the possibility of the presence of NUMTs in the existing data should not be ignored [40]. Since the success of taxonomic differentiation is positively correlated with the barcode length, the minibarcode length is usually kept above 100 bp. 
For example, an approximately 250-bp region of 16S rRNA can be successfully amplified from various medicinal preparations and food products, and it provides correct identification of animal species [41,42]. The NUMT fragments of genes that are often used for species identification in D. magna are in the following length ranges: COX1 (19–72 bp), CYTB (18–46 bp), 12S rRNA (18–72 bp), and 16S rRNA (19–78 bp); each of them constitutes no more than 10% of the length of the entire gene (Table 3). Hence, the NUMT sequences of frequently analyzed genes are generally shorter than the respective mitochondrial sequences; thus, the possibility of NUMT co-amplification should decrease with an increased length of the targeted mitochondrial marker. However, it is worth paying attention to the coverage of NUMTs derived from the TRNA-ARG (95%), TRNA-GLU (97%), and TRNA-THR (95%) genes (Table 3). The transcriptome analysis for TRNA-ARG revealed the existence of the NUMT in lncRNA sequence transcript variants of the gene LOC116925827 (Table S1). The presence of this gene fragment in a non-coding sequence may suggest its role in gene regulation. The TRNA-THR-derived NUMT was identified in an mRNA transcript variant of the gene LOC116930084. On the other hand, the sequence fragment of TRNA-GLU was only partially found in the transcriptome, in zinc finger CCHC domain-containing protein 7-like mRNA located in LG2. Perhaps, in these cases, they are not NUMTs but functional gene fragments taking part in gene modulation. Nonetheless, to confirm this, a detailed analysis of the transcriptome in larger research groups is needed. NUMTs are highly polymorphic in terms of sequence, homo/heterozygosity status, and presence/absence at a specific locus [43]. These features facilitate the use of NUMTs as specific population markers, as proposed for the human population by Lang et al. [44]. The biological importance of NUMTs may correlate with their location on the chromosome.
Depending on the insertion location, NUMTs may perturb the function of the genes [45]. Additionally, de novo integration of NUMT pseudogenes into the nuclear genome has an adverse effect in some cases: promoting various disorders and aging, as observed in humans [46]. Chatre and Ricchetti [47] report that migratory mitochondrial DNA can also impact the replication of the nuclear region in Saccharomyces cerevisiae.
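
The length argument made above for co-amplification can be illustrated with a simple screen: a NUMT can only be co-amplified in full if it is at least as long as the targeted amplicon. The Python sketch below applies this simplified criterion (an assumption made for illustration, ignoring primer specificity and partial products) to the greatest fragment lengths of selected sequences from Table 3; the function name is hypothetical.

```python
# Greatest NUMT fragment length (bp) for selected mtDNA sequences, from Table 3.
MAX_FRAGMENT_BP = {
    "COX1": 72,
    "CYTB": 46,
    "12S rRNA": 72,
    "16S rRNA": 78,
    "ND3": 99,
    "ND4": 108,
    "D-LOOP": 182,
}


def sequences_at_risk(amplicon_bp: int) -> list:
    """Sequences whose longest NUMT fragment is at least as long as the amplicon."""
    return sorted(
        name for name, longest in MAX_FRAGMENT_BP.items() if longest >= amplicon_bp
    )


print(sequences_at_risk(100))  # amplicon at the usual minibarcode lower limit
print(sequences_at_risk(200))  # amplicon longer than any NUMT fragment found
```

Under this criterion, no sequence in the list is flagged once the amplicon exceeds 182 bp, the length of the longest NUMT fragment reported here.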

4. Materials and Methods

The whole sequence of the Daphnia magna genome, strain SK, and the annotation of the nuclear and mitochondrial genomes of D. magna were obtained from GenBank (the accession numbers for the nuclear and mitochondrial genome are GCA_003990815.1 and NC_026914.1, respectively). Each of the mitochondrial genes was aligned separately to detect plausible NUMTs in the nuclear genome. The presence of mtDNA fragments in defined pseudogenes was estimated by the alignment of mtDNA sequences with the GCF_003990815.1_ASM399081v1_pseudogene_without_product.fna sequence. The presence of mtDNA fragments in the transcriptome was established by a search in the GCF_003990815.1_ASM399081v1_rna.fna sequence. The detailed results are presented in Supplementary Table S1. The SK strain genome assembly was released on 02-Jan-2019 and was integrated with the high-density genetic linkage map of the strain Xinb3. The presence of NUMTs in the D. magna nuclear genome GCA_003990815.1 was evaluated using the BLAST (BLASTN 2.6.0) program [48] implemented within the CLC Genomics Workbench 12.0 software package (https://www.qiagenbioinformatics.com) (Table S1). In the present study, we applied a low-complexity filter to mask the query sequence segments with low compositional complexity. Such filtering was used to eliminate statistically significant but biologically uninteresting reports from the BLAST output, leaving the more biologically interesting regions of the query sequence available for specific matching against database sequences. The expectation value (E-value) threshold for reporting matches against database sequences was set to 10, which means that ten matches are expected to be found merely by chance according to the stochastic model of Karlin and Altschul (1990) [49]. If the E-value ascribed to a match was greater than this threshold, the match was not reported.
A lower threshold makes the search criteria more stringent, leading to fewer chance matches being reported. A higher threshold results in more matches being reported, but many of them may match merely by chance and not because of any biological similarity. The match/mismatch value, which assigns the scores used to build an alignment and to evaluate the quality of a pairwise sequence alignment, was set to 2/3. The gap costs were set to 5 for gap existence (opening) and 2 for gap extension. The maximum number of database sequences in which BLAST found matches to a query sequence to be included in the BLAST report was set to 100.
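
For readers who wish to run a comparable search outside CLC Genomics Workbench, the settings above can be mapped onto the command-line options of NCBI BLAST+ `blastn`. The Python sketch below only builds the command and does not run it. The mapping is an assumption, not the authors' procedure: the mismatch value is interpreted as a penalty of -3, the file and database names are hypothetical, and parameters not reported in the text (such as the word size) are left at the program defaults.

```python
import shlex


def build_blastn_command(query_fasta: str, database: str) -> list:
    """Build a blastn argument list mirroring the search settings reported in the text."""
    return [
        "blastn",
        "-task", "blastn",
        "-query", query_fasta,       # one mitochondrial gene sequence (hypothetical file)
        "-db", database,             # nuclear genome database (hypothetical name)
        "-evalue", "10",             # E-value threshold
        "-reward", "2",              # match score
        "-penalty", "-3",            # mismatch score (assumed sign)
        "-gapopen", "5",             # gap existence cost
        "-gapextend", "2",           # gap extension cost
        "-dust", "yes",              # low-complexity filtering of the query
        "-max_target_seqs", "100",   # maximum number of database sequences reported
        "-outfmt", "6",              # tabular output
    ]


cmd = build_blastn_command("mt_gene.fna", "dmagna_nuclear")
print(shlex.join(cmd))
```

A database would first have to be built from the nuclear assembly (for example with `makeblastdb`), and the results may differ from those reported here because the CLC implementation and its defaults are not reproduced.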

5. Conclusions

This article provides the first description of the occurrence of mitochondrial pseudogenic sequences (NUMTs) in the nuclear genome of D. magna. No NUMTs with full (100%) sequence homology were found for two genes, TRNA-SER1 and TRNA-ALA. The total length of NUMTs in the nuclear genome was 44,391 bp (individual fragments ranged from 16 to 182 bp), which accounted for 0.042% of the entire genome. The best-matched NUMTs covering more than 90% of the mtDNA gene sequence were identified for the TRNA-ARG (95%) and TRNA-THR (95%) genes, and they may be included in functional nuclear RNA molecules. The NUMT length varied from 16 to 63 bp for tRNA genes, from 17 to 108 bp for protein-coding genes, from 18 to 78 bp for rRNA genes, and from 17 to 182 bp for the D-loop region. Therefore, when total DNA isolates are used in mtDNA studies, co-amplification of nDNA fragments is unlikely, especially when whole tRNA genes are amplified or when the amplified fragments of other genes and of the D-loop exceed 200 bp in length. The TRNA-MET fragments (16 to 18 bp in length) had the highest level of sequence homology, which means that they could be evolutionarily the youngest. The lowest degree of homology was found in the pseudogene derived from the mtDNA D-loop sequence. It may be the oldest element of mitochondrial DNA incorporated into the nuclear genome; however, because of the high degree of mutation in the D-loop, this thesis requires further analysis and elucidation.
References

1.  Polymorphic NumtS trace human population relationships.

Authors:  Martin Lang; Marco Sazzini; Francesco Maria Calabrese; Domenico Simone; Alessio Boattini; Giovanni Romeo; Donata Luiselli; Marcella Attimonelli; Giuseppe Gasparre
Journal:  Hum Genet       Date:  2011-12-08       Impact factor: 4.132

2.  Effects of L-proline on swimming parameters of Daphnia magna subjected to heat stress.

Authors:  Adam Bownik; Aleksandra Szabelak; Magdalena Kulińska; Monika Wałęka
Journal:  J Therm Biol       Date:  2019-07-02       Impact factor: 2.902

3.  Mitochondrial pseudogenes are pervasive and often insidious in the snapping shrimp genus Alpheus.

Authors:  S T Williams; N Knowlton
Journal:  Mol Biol Evol       Date:  2001-08       Impact factor: 16.240

4.  Numt, a recent transfer and tandem amplification of mitochondrial DNA to the nuclear genome of the domestic cat.

Authors:  J V Lopez; N Yuhki; R Masuda; W Modi; S J O'Brien
Journal:  J Mol Evol       Date:  1994-08       Impact factor: 2.395

5.  Nuclear mitochondrial DNA activates replication in Saccharomyces cerevisiae.

Authors:  Laurent Chatre; Miria Ricchetti
Journal:  PLoS One       Date:  2011-03-08       Impact factor: 3.240

6.  An SNP-based second-generation genetic map of Daphnia magna and its application to QTL analysis of phenotypic traits.

Authors:  Jarkko Routtu; Matthew D Hall; Brian Albere; Christian Beisel; R Daniel Bergeron; Anurag Chaturvedi; Jeong-Hyeon Choi; John Colbourne; Luc De Meester; Melissa T Stephens; Claus-Peter Stelzer; Eleanne Solorzano; W Kelley Thomas; Michael E Pfrender; Dieter Ebert
Journal:  BMC Genomics       Date:  2014-11-27       Impact factor: 3.969

7.  Rampant nuclear insertion of mtDNA across diverse lineages within Orthoptera (Insecta).

Authors:  Hojun Song; Matthew J Moulton; Michael F Whiting
Journal:  PLoS One       Date:  2014-10-21       Impact factor: 3.240

8.  DNA barcodes for 1/1000 of the animal kingdom.

Authors:  Paul D N Hebert; Jeremy R Dewaard; Jean-François Landry
Journal:  Biol Lett       Date:  2009-12-16       Impact factor: 3.703

9.  Development and validation of a multi-locus DNA metabarcoding method to identify endangered species in complex samples.

Authors:  Alfred J Arulandhu; Martijn Staats; Rico Hagelaar; Marleen M Voorhuijzen; Theo W Prins; Ingrid Scholtens; Adalberto Costessi; Danny Duijsings; François Rechenmann; Frédéric B Gaspar; Maria Teresa Barreto Crespo; Arne Holst-Jensen; Matthew Birck; Malcolm Burns; Edward Haynes; Rupert Hochegger; Alexander Klingl; Lisa Lundberg; Chiara Natale; Hauke Niekamp; Elena Perri; Alessandra Barbante; Jean-Philippe Rosec; Ralf Seyfarth; Tereza Sovová; Christoff Van Moorleghem; Saskia van Ruth; Tamara Peelen; Esther Kok
Journal:  Gigascience       Date:  2017-10-01       Impact factor: 6.524

10.  Mitochondrial DNA-like sequences in the nucleus (NUMTs): insights into our African origins and the mechanism of foreign DNA integration.

Authors:  Dan Mishmar; Eduardo Ruiz-Pesini; Martin Brandon; Douglas C Wallace
Journal:  Hum Mutat       Date:  2004-02       Impact factor: 4.878
