Literature DB >> 34797834

A mitochondrial genome phylogeny of voles and lemmings (Rodentia: Arvicolinae): Evolutionary and taxonomic implications.

Natalia I Abramson1, Semyon Yu Bodrov1, Olga V Bondareva1, Evgeny A Genelt-Yanovskiy1, Tatyana V Petrova1.   

Abstract

Arvicolinae is one of the most impressive placental radiations with over 150 extant and numerous extinct species that emerged since the Miocene in the Northern Hemisphere. The phylogeny of Arvicolinae has been studied intensively for several decades using morphological and genetic methods. Here, we sequenced 30 new mitochondrial genomes to better understand the evolutionary relationships among the major tribes and genera within the subfamily. The phylogenetic and molecular dating analyses based on 11,391 bp concatenated alignment of protein-coding mitochondrial genes confirmed the monophyly of the subfamily. While Bayesian analysis provided a high resolution across the entire tree, Maximum Likelihood tree reconstruction showed weak support for the ordering of divergence and interrelationships of tribal level taxa within the most ancient radiation. Both the interrelationships among tribes Lagurini, Ellobiusini and Arvicolini, comprising the largest radiation and the position of the genus Dinaromys within it also remained unresolved. For the first time complex relationships between genus level taxa within the species-rich tribe Arvicolini received full resolution. Particularly Lemmiscus was robustly placed as sister to the snow voles Chionomys in the tribe Arvicolini in contrast with a long-held belief of its affinity with Lagurini. Molecular dating of the origin of Arvicolinae and early divergences obtained from the mitogenome data were consistent with fossil records. The mtDNA estimates for putative ancestors of the most genera within Arvicolini appeared to be much older than it was previously proposed in paleontological studies.

Entities:  

Mesh:

Substances:

Year:  2021        PMID: 34797834      PMCID: PMC8604340          DOI: 10.1371/journal.pone.0248198

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The subfamily Arvicolinae Gray, 1821 (Rodentia: Cricetidae), voles, lemmings and muskrats, is a highly diverse, young and fast-evolving group within the order Rodentia. Currently, representatives of the subfamily occupy most of the temperate and cold-climate terrestrial habitats across the Northern Hemisphere. Modern global fauna of Arvicolinae consists of more than 150 recent species grouped into 28 genera, with new species being constantly discovered and described [1,2]. Phylogeny of Arvicolinae has been explored using both morphological and genetic methods, allowing comparisons of reconstruction from different datasets for further cross-validation. Application of molecular phylogenetic methods resulted in a series of revisions of phylogenetic relationships and taxonomic structure of several genera and species [3-12], generally supporting the hypothesis of three successive waves of adaptive radiations in the evolutionary history of the group [7]. According to the paleontological records, the ancestors of Arvicolinae were specialised “microtoid cricetids’’, sharing the dental adaptations of increased hypsodonty. Rich fossil records of these lineages in the late Miocene sites of Eurasia and their absence in North America indicates that the origin and primary evolutionary centre for arvicolids was situated in the northern parts of Asia [13,14]. Arvicolinae sensu stricto presumably emerged during the rapid species diversification in late Miocene—early Pliocene, when global climate became cooler and drier and woodland habitats were largely replaced by open grassy landscapes. The members of this first radiation were muskrats (Ondatrini), lemmings (Lemmini and Dicrostonychini) and long-clawed mole voles (Prometheomyini). Second radiation wave is characterized by a divergence of the ancestors of modern Clethrionomyini, the compact tribe of red-backed and mountain voles [15], currently abundant in boreal forests and highlands. The third radiation wave resulted in a formation of the steppe lemmings (Lagurini), mole voles (Ellobiusini) and the richest species group—Arvicolini [7]. The branching order both within the first and the last radiation waves also remained unresolved since all attempts to untangle complex phylogenetic relationships within subfamily were made with the use of only a few mitochondrial or nuclear markers, or the study included an insufficient number of taxa in the analysis [3,5,7,9,10,12,16-21]. The recent review of Muroid rodent phylogeny based on mitochondrial cytochrome b and five nuclear genes confirmed [21] findings of previous studies [7] that the proper taxonomic sampling is crucial to build the large-scale tree reconstructions. The broad introduction of next-generation sequencing (NGS) resulted in constant accumulation of published mitochondrial genomes of Arvicolinae [22-37]. As a result, mitochondrial genome sequences of nearly one-fifth of the total species diversity of the subfamily (ca. 30 species) is already available. Mitochondrial DNA has many advantages for the phylogenetic studies, as it possesses a strict maternal inheritance [38] with high mutation rate due to a limited repair system (5–10 times that of nuclear DNA) [39] and simple conserved structure. Due to the high number of mitochondria in the cell almost the complete mitochondrial genome may be sequenced from old historical samples from museum collections thus the currently hardly available taxa may be included in the data set. Complete mitochondrial genome datasets provide a robust phylogeny with highly resolved trees having strong branch support [40]. However, the mitochondrial genomes can conflict with a phylogeny estimated from the nuclear genome due to the replacement of organelle genomes from one species or populations with those of another mediated by hybridization and introgression [41-44]. The use of mitochondrial genome datasets may also provide evidences of adaptive radiations, as signatures of selection were detected during reconstructing mitochondrial genome phylogenies for the number of vertebrate taxa including deep-sea fishes [45,46], marine turtles [47,48], and mammals [49-54]. Yet, mitochondrial genomes remain an important tool both for phylogenies resolution and for the species identification by barcoding using individual mitochondrial genes or complete genomes. Consecutive sequencing of new mitochondrial genomes recently provided new insights into many animal groups [55-59], several mammal phylogenies including e.g. fruit bats [58], carnivores [41], wood mice [60] and zokors [61]. Though molecular studies during the last decades have considerably extended and refined our knowledge of the pattern and timescale of Arvicolinae phylogeny, there are important issues that remain to be elucidated. In this study, we were aimed at estimating the phylogeny of voles and lemmings using complete mitogenomes generated using high-throughput sequencing. By significantly increasing the number of newly sequenced mitogenomes representing major tribes of voles and lemmings, we implement the phylogenetic and molecular dating analysis on a dataset consisting of almost all living genera within the subfamily. The following questions were specifically addressed during the study: (1) the order of divergence and interrelationships of taxa within the first, most ancient radiation, (2) the interrelationships of three tribe level taxa Lagurini, Ellobiusini and Arvicolini (3) relative phylogenetic placement of genera Dinaromys Kretzoi, 1955 and Lemmiscus Thomas, 1912, (4) tangled interrelationships of genera and subgenera in the most speciose tribe Arvicolini, (5) the position of Agricola agrestis Linnaeus, 1761 and Iberomys cabrerae Thomas, 1906, and (6) the timing of arvicoline divergences.

Material and methods

Ethics statement

Our study was performed using tissue and vaucher collection of the Zoological Institute Russian Academy of Sciences and the research did not require field work or live animal experimentation. DNA samples for this study were obtained following the guidelines of the Animal Care and Research Committee of the Zoological Institute RAS, Saint Petersburg and in accordance with respective national and international animal care and use policies Permission N 3-7/15-06-2021 (See S1 Table). Tissues of specimens used in the study are publicly deposited and accessible by others in a permanent repository of Zoological Institute Russian Academy of Sciences, laboratory of molecular systematics.

Taxonomic sampling

Fifty-eight species of Arvicolinae, belonging to 27 genera and all the tribe level taxa, as well as six outgroup taxa were used in this study. Complete mitochondrial genomes for 30 Arvicolinae species were sequenced in the current study (including 15 species belonging to the Arvicolini tribe, one for Dicrostonychini, two for Lagurini, three for Lemmini, six species belonging to Clethrionomyini and three crucial species without stable taxonomic position: Prometheomys schaposchnikowi Satunin, 1901 (Prometheomyini), Dinaromys bogdanovi Martino, 1922 and Lemmiscus curtatus Cope, 1868). For 28 species belonging to Arvicolini, Ellobiusini, Clethrionomyini, Dicrostonychini and Ondatrini tribes, sequences were available in the NCBI nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore). The detailed information, GenBank accession numbers, and the voucher IDs for new sequences are given in S1 Table. Hereinafter, we use the taxonomic classification following Gromov & Polyakov [62], Musser & Carleton [1], Abramson & Lissovsky [63] with amendments made in result of the current study.

DNA isolation, NGS library preparation and sequencing

Muscle tissue samples of fresh specimens were collected between 1996–2019 years and stored in 96% ethanol at -20 degrees Celcius in a tissue and DNA collection of the Group of molecular systematics of mammals (Zoological Institute RAS). Historic specimen of Lemmiscus curtatus (sampled in 1927) was obtained from the collection of the laboratory of theriology (Zoological Institute RAS), see S1 Table for details. Homogenization of tissues was performed using the Qiagen TissueLyser LT (Qiagen). For most samples, genomic DNA was extracted using the Diatom DNA Prep 200 (Isogen, Russia) except for the L. curtatus museum specimen. To reduce the potential contamination, all manipulations with the L. curtatus were carried out using a PCR workstation (LAMSYSTEMS CC, Miass, Russia) in a separate laboratory room isolated from post-PCR facilities, being used exclusively for studies of historic samples from the collection of Zoological Institute. All the working surfaces, instruments and plastics were sterilized with UV light and chloramine-T. DNA from the museum skin sample (2 × 2 mm piece from the inner side of the lip, dissected by a sterilized surgical blade) was isolated using the phenol-chloroform extraction method according to a standard protocol [64]. The following ultrasound fragmentation of the total genomic DNA was implemented using Covaris S220 focused ultrasonicator instrument (Covaris). The resulting fragmented DNA was purified and concentrated using paramagnetic bead-based chemistry AMPure XP (Beckman-Coulter) using standard workflow. DNA concentration was evaluated using a Qubit fluorometer (Thermo Fisher Scientific). NGS libraries were prepared using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs). The resulting PCR products were purified and concentrated using AMPure XP beads (Beckman-Coulter). The concentration of samples was measured using a Qubit fluorometer, and quality control of the libraries was implemented using Bioanalyzer 2100 instrument and the DNA High Sensitivity kit (Agilent). Sequencing was performed on an Illumina HiSeq 4000 system, resulting in pair-end reads of 75bp. DNA quality was checked with Qubit, the final distribution of lengths of the libraries adapter content checking was conducted using Bioanalyzer2100 (Agilent). DNA extraction (except the museum specimen of L. curtatus), library preparation and sequencing were performed using resources of the Skoltech Genomics Core Facility (https://www.skoltech.ru/research/en/shared-resources/gcf-2/). Standard precautions were taken to avoid contamination according to the Illumina’s recommendations at all stages of library preparation and sequencing [65].

Read processing, mitogenome assembly and annotation

The quality of raw reads was evaluated using FastQC [66], and parts with the quality score below 20 were trimmed using Trimmomatic-0.32 [67]. Bowtie 2.3.5.1 [68] was used to filter reads with contamination. Complete mitochondrial genomes of other Arvicolinae were used as reference sequences. Also, this was made for the museum specimen to enrich reads with mitochondrial DNA. Nucleotide misincorporation patterns that can often be observed during the studies of ancient or old museum sample DNA as a result of post-mortem DNA damage in reads from L. curtatus were achieved using mapDamage 2.0 [69]. Complete mitochondrial genome was assembled using plasmidSPAdes [70] with default settings. The resulting contigs were filtered by length, the most similar in size to mitochondrial DNA were selected (size about 16 kb for mammals). Raw reads of L. curtatus were mapped on the mitochondrial genome of Mynomes ochrogaster Wagner, 1842 with manual settings using Geneious Prime 2019.1 (Biomatters Ltd.) due to the fragmentation of the DNA extracted from the museum sample (S1 Fig). The contigs were annotated using the MITOS web server [71] (http://mitos2.bioinf.uni-leipzig.de/index.py), with default settings and the vertebrate mitochondrial genetic code. Gene boundaries were checked and refined by alignment against 28 published mitogenome sequences of Arvicolinae (see details in S1 Table). All positions of low quality, low coverage, as well as fragments that greatly differed from the reference Arvicolinae mitochondrial genomes, were replaced by N manually. Assembled sequences of protein coding genes (PCGs) were checked for internal stops manually. Raw read data are available from the a SRA database (PRJNA590630), and all assembled and annotated mitogenomes were deposited in NCBI GenBank (S1 Table).

Sequence alignment

The 30 newly obtained mitochondrial genomes were compared with 28 earlier published Arvicolinae mitogenomes mined from NCBI (see accession numbers in S1 Table), including and five mitogenomes obtained by us earlier [72-74]. All mitochondrial genomes were aligned with Mauve (http://darlinglab.org/mauve/mauve.html) in Geneious Prime 2019.1 (Biomatters Ltd.). In several studies, it has been convincingly shown that protein-coding sequences may have a strong resolving power for inferring phylogenetic interrelationships and divergence time estimates derived from PCG may be quite accurate [58,75,76]. We used this approach, however complete mt genomes will serve as the starting point for further analyses. For the subsequent analyses, the concatenated alignment of 13 PCGs using MAFFT version 7.222 [77] was produced. Third codon position has previously been shown to bias phylogenetic reconstructions [78]. The phylogeny on a smaller dataset of Arvicolinae turned out to be very poorly resolved with the exception of the third codon position. So we masked transitions in 3rd codon position by RY-coding (R for purines and Y for pyrimidines) as described in Abramson et al. [74]. Thus, two datasets were subsequently analyzed—total alignment of 13 PCGs, where all three codon positions were considered (with a length of 11,391 bp) and RY-coded alignment with transitions in third codon position masked.

Analysis of base composition

The base composition was calculated in Geneious Prime 2019.1 (Biomatters Ltd.). The strand bias in nucleotide composition was studied by calculating the relative frequencies of C and G nucleotides (CG3 skew = [C − G]/[C + G]) [58,79,80]. Both analyses were calculated using full-length mitogenomes. The PCG-alignment of 64 mitochondrial genomes was used to calculate relative frequencies of four bases (A, C, G and T) at each of three codon positions in MEGA X [81]. The 12 variables, each representing base frequency in first, second or third position, were then summarized by a Principal Component Analysis (PCA) using the PAST v.4.04 [82].

Saturation tests

The presence of phylogenetic signal was assessed with a substitution saturation analysis using the Xia test [83] in the DAMBE 7.2.1 software [84] for the whole alignment of the PCG dataset and 13 separate genes following the procedure described by Xia & Lemey [85], particularly when (a) 1st and 2nd codon position considered and 3rd position is masked from the alignment, and (b) when only 3rd codon position is included in the analysis. The analysis is based on Index of substitution saturation—Iss, and Iss.c is the critical value at which the sequences begin to fail to recover the true tree). Once Iss.c is known for a set of sequences, then we can calculate the Iss value from the sequences and compare it against the Iss.c. If Iss value exceeds the Iss.c, we can conclude that the sequence dataset consists of substitution saturation and cannot be used for further phylogenetic reconstruction. The proportion of invariant sites was specified for tests considered 1st and 2nd codon positions. The analysis was performed on a complete alignment with all sites considered. Additional analysis of saturation for each of the PCG was estimated using R packages seqinr [86] and ape [87]. P-distances were plotted against K81 distances for transitions and transversions of each codon position.

Phylogenetic analyses

We used PartitionFinder 2.1.1 [88] applying AICc and “greedy” algorithm, when an analysis is based on the a priori features of the alignment, to select the optimal partitioning scheme for each dataset. Our analysis started with the partitioning by codon positions within PCG fragments, each treated as a unique partition. For the complete 13 PCG alignment, GTR+I+G model was suggested for almost all the partitions except ND6 3rd codon position, for which the TRN+I+G model was selected. For the alignment with RY-coded 3rd codon position, two partitions were suggested—1+2nd and 3rd codon positions with GTR+I+G and GTR+G models respectively (S2 Table). Phylogenetic reconstructions using Maximum Likelihood (ML) and Bayesian Inference (BI) analyses were performed on both complete and RY-masked datasets. Trees were rooted by six species: Akodon montensis Thomas, 1913, Peromyscus megalops Merriam, 1898, Urocricetus kamensis Satunin, 1903 and three species of hamsters from genus Cricetulus Milne-Edwards, 1867 (S1 Table). Maximum Likelihood (ML) analysis was performed using IQ-TREE web server [89] with 10,000 ultrafast bootstrap replicates [90]. Bayesian Inference (BI) analysis was performed in MrBayes 3.2.6 [91] with next settings: nst = mixed, rates = invgamma and partitions as suggested with PartitionFinder results (S2 Table). Each analysis started with random trees and performed two independent runs with four independent Markov Chain Monte Carlo (MCMC) for 10 million generations with sampling every 1,000th generation, the standard deviations of split frequencies were below 0.01; potential scale reduction factors were equal to 1.0; stationarity was examined in Tracer v1.7 [92]. A consensus tree was constructed based on the trees sampled after the 25% burn-in. We also conducted ML analysis for each PCG separately with partitions by codon positions and models supposed automatically in IQ-TREE. Hyperacrius fertilis True, 1894 sequence was excluded from the ND4 alignment since this gene was highly fragmented [74]. The mitogenome of Craseomys rufocanus Sundevall, 1846 (accessed from GenBank) completely lacked the ND6 sequence (S2 Table), so this species was excluded from the analysis for this gene.

Divergence dating

Divergence times were estimated on the CIPRES Science Gateway [93] with Bayesian approach implemented in BEAST v.2.6.2 [94] using both complete PCG dataset and the one in which the transitions in the third codon position were masked with RY-coding. Datasets were partitioned according to the recommendations of PartitionFinder. All site model parameters were chosen for separate partitions with corrected Akaike’s information criterion (AICc) in JMODELTEST 2.1.1 [95]. Eight fossil calibrations were used (S3 Table). Lognormal prior distributions were applied to all the calibrations with offset values and 95% HPD intervals based on first appearance data (FAD) and stratigraphic sampling downloaded from the Paleobiology Database on 01.12.2020 using the parameters “Taxon = fossil species, Timescale = FAD” (S3 Table). To check the stability of the result, we performed an analysis with the alternate exclusion of each of the eight calibrations used (S4 Table). BEAST analyses under the birth-death process used a relaxed lognormal clock model and the program’s default prior distributions of model parameters. Each analysis was run for 100 million generations and sampled every 10 000 generations. The convergence of two independent runs was examined using Tracer v1.7 [92], and combined using LogCombiner, discarding the first 25% as burn-in. Trees were then summarized with TreeAnnotator using the maximum clade credibility tree option and fixing node heights as mean heights. Divergence time bars were obtained automatically in FigTree v1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/) from the output using the 95% highest posterior density (HPD) of the ages for each node. The comparison with an empty ‘prior run’ showed that the data were informative for estimating the divergence dates.

Results

Mitochondrial genome assembly and annotation

We sequenced, assembled and annotated mitochondrial genomes for the 30 new taxa of Arvicolinae. The mapDamage analysis implemented on the raw reads of Lemmiscus curtatus (S2 Fig) showed a low variation of deamination misincorporations values. C to T misincorporations varied from 10 to 15%, G to A from 10 to 12% and were equal to the results of similar studies [96]. Since the relative level of observed misincorporations was not significantly different from the other substitution variants, the mitogenome of L. curtatus was assembled using the same pipeline as for the rest of taxa. The assembled mitogenomes, circular double-stranded DNA of the same organization as in other mammals, contained 13 PCGs, 22 transfer RNAs (tRNA), two ribosomal RNAs (rRNAs), and a non-coding region corresponding to the control region (D-loop). Nine genes (ND6 and eight tRNAs) were oriented in the reverse direction, whereas the others were transcribed in the forward direction. All the assembled mitogenomes contained all the genes listed above, but in some species demonstrated incomplete gene sequences (see S5 Table for details). Mitochondrial genome sequences were deposited in GenBank under accession numbers indicated in S1 Table. In the subsequent analyses, the PCG dataset, containing 11,391 bp was used.

Variation in base composition

Comparison of base composition calculated using the alignment of full-length mitogenome sequences of Arvicolinae showed that mitochondrial genomes of taxa from tribes Clethrionomyini (28.36% С) and Ellobiusini (28.7% C) have the highest GC-content. Arvicolini, Dicrostonychini and Lemmini had slightly smaller values: 27.69, 28.03 and 27.77% C respectively. Lagurini were found to have the most AT-skewed base composition of mitogenomes: 31.20 and 31.30% of adenine, respectively (S1 Table). Lagurini and Arvicolini also demonstrated the highest GC-skew values (-0.32 in both cases). Ellobiusini and Clethrionomyini occupied an intermediate position in terms of it with -0.33 and -0.34, respectively. Dicrostonychini and Lemmini with equal value -0.35 have the smallest GC-skew values. The base composition (frequency of the nucleotides A, C, G, and T) was further analyzed at the three codon positions in the concatenated alignment of PCGs for each species separately (S1 Table). The 12 variables measured for 64 taxa were summarized by a PCA, based on the first two components, which contributed 73.7% and 18.8% of the total variance, respectively (Fig 1). Most of the observed variation was related to the percentage of base composition in a third codon position (S6 Table). The first component (PC1) demonstrated a high positive correlation (0.98) with the percentage of C3 (percentage of cytosine in the third position) and a high negative correlation (-0.93) with T3 (thymine in the third position). The second component (PC2) positively correlated with G3 (0.67) and negatively correlated with A3 (-0.88). Most of the Arvicolinae formed a compact group on the PCA graph. Three species of genus Cricetulus, C.longicaudatus, C. migratorius and C. griseus clustered separately from the main group of Arvicolinae along the PC1. Both species demonstrated approximately 10% lower percentage of C3 and 10% higher percentage in T3 (S6 Table). The other outgroup taxa, Urocricetus kamensis and Akodon montensis grouped with Arvicolinae, and A. montensis showed similar base composition to Hyperacrius fertilis. Among Arvicolinae, the most dissimilar base composition was observed in Mynomes longicaudus Merriam, 1888, Chionomys gud Satunin, 1909 and Arvicola amphibius Linnaeus, 1758, showing higher than group average percentage of T3 and lower than group average percentage of C3. The mitochondrial genome of Ondatra zibethicus Linnaeus, 1766 was characterized by the highest percentage of adenine in the third position (45.7%) compared to other Arvicolinae. Ellobius lutescens Thomas, 1897 demonstrated the highest percentage of С3 (37.1%) among the complete PCG dataset (Fig 1). Since most of the variation has been observed in the third codon position (S6 Table), the complete dataset was subsequently used in the following phylogenetic reconstructions.
Fig 1

Base composition in mitochondrial PCG of Arvicolinae.

The frequency of the four bases (A, C, G, and T) at each codon position (first, second and third) in concatenated alignment was used as 12 variables for PCA. Tribes are indicated by colours. Species names are given only for taxa that differ in their nucleotide composition. Details with percentages for each position are given in S6 Table.

Base composition in mitochondrial PCG of Arvicolinae.

The frequency of the four bases (A, C, G, and T) at each codon position (first, second and third) in concatenated alignment was used as 12 variables for PCA. Tribes are indicated by colours. Species names are given only for taxa that differ in their nucleotide composition. Details with percentages for each position are given in S6 Table.

Substitution saturation analysis

Substitution saturation decreases phylogenetic information contained in the sequences and plagues the phylogenetic analysis involving deep branches. According to the analysis implemented in DAMBE software (S7 Table), the observed Iss saturation index was significantly (P<0.0001) lower than critical Iss.c value for both symmetrical and asymmetrical topology tests indicating the lack of saturation in the studied Arvicolinae dataset. The results of saturation plots for separate genes (S1 File) show the same pattern of negligible saturation. As a result for all 13 PCGs, no significant saturation for the 1st and 2nd codon position, and they are all suitable for the phylogenetic inference. CYTB, ND1 and ND6 show the same for 3rd codon position in contrast with ND2, ND3, ND4, ND4L. For other genes there was no significant saturation even for the 3rd codon position considering symmetrical topology for more than 32 OTUs (number of operational taxonomic units, S7 Table).

Time-calibrated mitochondrial genome phylogeny of Arvicolinae

The maximum-likelihood (ML) and Bayesian inference (BI) trees reconstructed using complete and RY-coded alignments of PCGs had similar topology (Fig 2, S1 File). Overall, ML analysis demonstrated lower node supports compared to BI analysis. In total 70% of the nodes were highly supported by ML and BI, with Bayesian probabilities BP>0.95 and ML bootstrap support BS>95 (Fig 2, nodes with a black dot).
Fig 2

Time-calibrated mitochondrial phylogeny of Arvicolinae.

Node labels display the following supports: BI complete / BI RY-coded 3rd codon position / ML complete / ML RY-coded 3rd codon position. Black circles show nodes with 0.95–1.0 BI and 95–100 ML support. All letters at nodes correspond to fossil constraints in S3 Table. Traditional tribal designations are also given above the branches and corresponding branches distinguished by different colors.

Time-calibrated mitochondrial phylogeny of Arvicolinae.

Node labels display the following supports: BI complete / BI RY-coded 3rd codon position / ML complete / ML RY-coded 3rd codon position. Black circles show nodes with 0.95–1.0 BI and 95–100 ML support. All letters at nodes correspond to fossil constraints in S3 Table. Traditional tribal designations are also given above the branches and corresponding branches distinguished by different colors. The monophyly of subfamily Arvicolinae was strongly supported by BI and ML analyses. However, several nodes, predominantly the internal nodes representing deeper phylogenies, which were highly supported by Bayesian analysis, did not receive high BS values. The divergence time between Arvicolinae and Cricetinae was estimated as Late Miocene, ca. 11.31 (9.48–13.3) / 10.7 (8.42–13.31) Ma, based on the complete and RY-coded alignments respectively (Table 1). When one of the eight calibration was discarded by turn, the results generally remained stable (S4 Table), with the exception of the analysis with excluded calibration of MRCA for the subfamily Arvicolinae (node“A”, Fig 2). This run yielded in younger dates for most of the nodes (S4 Table)).
Table 1

Divergence time estimates for the major lineages within the subfamily Arvicolinae.

TMRCACompleteRY-masked7 calibrations (A excluded)
Arvicolinae + Cricetinae11.31 (9.48–13.3)10.7 (8.42–13.31)9.45 (7.45–11.8)
Arvicolinae7.36 (7.04–7.78)7.33 (7.05–7.73)6.09 (5.15–7.17)
Lemmini4.81 (3.68–5.97)4.37 (3.31–5.71)4.16 (3.33–5.15)
Dicrostonychini + Ondatrini5.85 (5.15–6.56)5.54 (4.41–6.59)4.92 (4.25–5.77)
Dicrostonychini4.89 (4.08–5.7)4.49 (3.23–5.86)4.13 (3.26–4.97)
Clethrionomyini4.02 (3.33–4.72)4.46 (3.35–5.64)3.4 (2.68–4.19)
(Ellobiusini + Dinaromys) / (Arvicolini + Lagurini)6.2 (5.65–6.76)6.11 (5.17–6.92)5.19 (4.09–5.85)
Ellobiusini4.97 (4.21–5.69)4.58 (3.42–5.68)4.12 (3.2–5.06)
Lagurini3.1 (2.59–3.75)3.05 (2.56–3.75)2.91 (2.55–3.38)
Hyperacrius / Arvicolini5.76 (5.2–6.34)5.79 (4.88–6.63)4.82 (4.03–5.68)
Arvicolini s.str.*4.9 (4.33–5.47)5.02 (4.12–5.89)4.11 (4.03–5.68)
Chionomys + Lemmiscus4.04 (3.3–4.74)4.28 (3.11–5.4)3.39 (2.6–4.21)
Chionomys 3.29 (2.5–4.04)3.45 (2.13–4.67)2.77 (1.95–3.55)
Proedromys /4.32 (3.81–4.86)4.52 (3.69–5.33)3.65 (3.01–4.34)
Microtus** (Microtus, Sumeriomys Terricola, Blanfordimys, Agricola, Iberomys)3.8 (3.31–4.3)3.87 (3.07–4.63)3.01 (2.44–3.64)
Mynomes 3.41 (2.89–3.91)3.32 (2.48–4.11)2.88 (2.29–3.47)
Microtus + Terricola 2.96 (2.46–3.5)3.18 (2.35–3.95)2.5 (1.97–3.13)
Microtus 1.87 (1.43–2.33)2.06 (1.31–2.8)1.58 (1.13–2.05)
Terricola 1.38 (0.89–1.93)1.46 (0.7–2.32)1.17 (0.71–1.64)
Iberomys + (Agricola + Blanfordimys)3.18 (2.66–3.71)3.11 (2.23–3.99)2.67 (2.07–3.3)
Agricola + Blanfordimys 2.82 (2.26–3.35)2.64 (1.69–3.54)2.35 (1.73–2.97)
Neodon + (Alexandromys + Lasiopodomys)3.92 (3.4–4.4)4.12 (3.31–4.9)3.3 (2.67–3.93)
Neodon 3.16 (2.58–3.76)3.3 (2.43–4.23)2.66 (2.01–3.3)
Alexandromys + Lasiopodomys 3.6 (3.12–4.09)3.65 (2.86–4.44)3.03 (2.47–3.67)
Lasiopodomys 3.09 (2.59–3.56)3.07 (2.27–3.85)2.58 (2.02–3.17)
Alexandromys 2.16 (1.55–2.81)2.2 (1.27–3.22)1.8 (1.14–2.42)

Mean node ages marked in bold and 95% highest posterior density intervals (in brackets) in million years ago (Ma) estimated with two PCG datasets—complete and with RY-masked 3rd codon position.

* excluding Arvicola and Hyperacrius.

** sensu Pardinas et al., 2017.

Mean node ages marked in bold and 95% highest posterior density intervals (in brackets) in million years ago (Ma) estimated with two PCG datasets—complete and with RY-masked 3rd codon position. * excluding Arvicola and Hyperacrius. ** sensu Pardinas et al., 2017. The earliest radiation of the proper arvicolines (tribes Lemmini, Prometheomyini, Ondatrini and Dicrostonychini) dates back to the Late Miocene with mean at 7.36 (7.04–7.78) / 7.33 (7.05–7.73) Ma. Despite the high node support for the nodes marking Lemmini and Dicrostonychini, the basal part of the phylogenetic tree remains unresolved and represents a polytomy with several nodes not receiving significant BI and ML support. Analysis based on the PCG dataset where the third codon position was not masked with RY-coding, indicated significantly high Bayesian support for node C, combining Ondatrini and Dicrostonychini (Fig 2). The time to MRCA of node C is about 5.85 (5.15–6.56) / 5.54 (4.41–6.59) Ma and the MRCA of proper Dicrostonychini at 4.89 (4.08–5.7) / 4.49 (3.23–5.86) Ma. The tribe Clethrionomyini representing second radiation of Arvicolinae received high BI and ML support, and nodes within the clade were also highly supported. The MRCA for Clethrionomyini dates back to 4.02 (3.33–4.72) / 4.46 (3.35–5.64) Ma. The cluster containing tribes Ellobiusini, Lagurini, Arvicolini and genera Dinaromys, Arvicola Lacepede, 1799 and Hyperacrius Miller, 1896, i.e. the third radiation of Arvicolinae, was robustly supported by BI using both alignments and received reliable support by ML only with RY-masked alignment (Fig 2). Within this cluster, nodes marking tribes were highly supported by BI and ML. At the level of terminal branches within this cluster Dinaromys bogdanovi, Hyperacrius fertilis and Arvicola amphibius were the only to lack a certain phylogenetic position. D. bogdanovi grouped with Ellobiusini showing high BI support and no ML support. The water vole, Arvicola amphibius, clustered with Lagurini (high BP and no BS support) thus being paraphyletic to Arvicolini. The sagebrush vole Lemmiscus curtatus was sister to snow voles, Chionomys gud and C. nivalis Martins, 1842 with a robust support obtained in all analyses. The cluster of Chionomys Miller, 1908 and Lemmiscus was the earliest derivative in the highly supported group uniting all known vole genera of Arvicolini tribe. Arvicolini sensu stricto (excluding Arvicola) was fully resolved: both node H marking the whole tribe and all nodes within the tribe received robust support in ML and BI analyses. The estimated time of the largest radiation event within the subfamily and TMRCA for the trichotomy Arvicolini—Ellobiusini—Lagurini dates back to 6.2 (5.65–6.76) / 6.11 (5.17–6.92) Ma. All following divergence events within this radiation according to the obtained estimates took place very close to each other, the 95% HPD of the diverging branches leading to MRCA of existing tribes are highly overlapping. Thus the date estimate to the MRCA of Ellobiusini was 4.97 (4.21–5.69) / 4.58 (3.42–5.68) Ma. The MRCA of the Lagurini tribe is around 3 Ma: 3.1 (2.59–3.75) / 3.05 (2.56–3.75) (Fig 2, Table 1). The estimate for the earliest split within the Arvicolini tribe radiation (not including Arvicola and Hyperacrius) with all recent genera is about 4.9 (4.33–5.47) / 5.02 (4.12–5.89) Ma, that coincides with the onset of Pliocene period, whereas the major part of recent genera, excluding early derivating Chionomys, Lemmiscus and Proedromys Thomas, 1911, according to obtained estimates appear either in the middle of the Pliocene or close to the boundary of Late Pliocene-Early Pleistocene (Fig 2, Table 1).

Gene trees

The topology of the Arvicolinae phylogeny varied between the 13 PCG trees (S1 File). While tribe level nodes received good support at most of the trees, the phylogenetic relationships between the taxa remained unresolved. The ATP8- and COX2-based trees lacked resolution at both deep and shallow nodes, and therefore, these trees resulted in a complete polytomy. The only node at the ATP8-based tree that retained its integrity with high support was the tribe Clethrionomyini. Noteworthy that this Clethrionomyini node had high support and was consistent at the majority of the gene trees, except for the ND3. The node containing the taxa of the Arvicolini tribe (excluding Arvicola amphibius) received high support on the COX1, ATP6, ND3, ND5, ND6, ND1 and CYTB gene trees. The ND4 gene tree yielded in a highly supported node grouping the semiaquatic species—Ondatra zibethicus and Arvicola amphibius, the result was not supported by any other gene tree and mitogenome BI and ML phylogenetic reconstructions (Fig 2). Positions of these two species, as well as Dinaromys bogdanovi were very unstable across the individual gene trees. These taxa often occupied different positions and clustered with other species randomly. Remarkable that even in the case when tribal support and content was consistent across various trees and with the mitogenome tree, the interrelationships between tribes at the individual gene trees were unresolved. The lack of resolution especially at the deep nodes may be related to high saturation that is demonstrated with some genes, particularly ATP6, ATP8, ND1, ND2, ND3, ND5 and ND4 (maximum saturation, S7 Table), since phylogenetic signal disappears when divergence is over 10%.

Discussion

Our phylogenetic reconstruction of the subfamily Arvicolinae is based on a PCG dataset of mitochondrial genome sequences of 58 species of voles and lemmings with the outgroup of six hamsters. The dataset included 30 original sequences, and for 10 genera the mitogenomic sequencing was implemented for the first time. To date, this is the most comprehensive mitogenome dataset aimed at the revision of the Arvicolinae phylogeny considering almost all recent genera represented by nominal species. While the monophyletic origin of Arvicolinae has always been considered indisputable, previous attempts to resolve phylogenetic relationships within the subfamily using either morphological analysis or combinations of mitochondrial and nuclear markers yielded in several hard polytomies [16] or conflicting topologies [3,5-7,9,12,21,97,98]. The more taxa and more markers were considered in the analysis, the better resolution for the nodes marking major tribes within Arvicolinae has been obtained [7,20,21]. However, the diversification events within major radiation waves remained unresolved. The phylogenetic position of the genera Prometheomys Satunin, 1901, Arvicola, Ondatra Link, 1795 and Dinaromys in reconstructions performed with mitochondrial and nuclear markers was controversial [3,5,7,20,21] and genera Hyperacrius and Lemmiscus received little attention, their phylogenetic position was arguable.

Mismatches between the Bayesian and Maximum Likelihood support for the tribes and three waves of radiation within Arvicolini

The topology of the mitochondrial genome tree of Arvicolinae obtained in this study, in general, was in good agreement with previous large-scale phylogenetic reconstructions of the group based on mitochondrial and nuclear genes [7,20,21,97,98]. Using the concatenated alignment of 13 PCG, the present reconstruction resulted in high support for the nodes marking tribes in both Bayesian and Maximum Likelihood analyses. While Bayesian analysis also provided high posterior probability support for the basal nodes, ML approach failed to recover relationships and order of divergence between the basal branches. Previously, these deep divergences were identified as three waves of rapid radiations [7]. The first radiation within the subfamily is represented by four tribes—Lemmini, Prometheomyini, Dicrostonychini (including Phenacomys Merriam, 1889) and Ondatrini. The order of divergence and phylogenetic relationships between these ancient tribes remains unresolved using mitochondrial genome data. In particular, the absence of resolution is highlighted by the unstable position of the Prometheomys or Lemmini as the basal lineages. The basal position of Lemmini and Prometheomys with Ondatrini/Dicrostonichini both postdating them was also obtained earlier [20] on a very large set of Arvicolinae taxa analyzed using several mitochondrial and nuclear markers. It is important to underline that in each case this clustering has no support. In a number of other reconstructions [3,6,21,23] Prometheomys is the earliest split within the subfamily. To check if this inconsistency may be related to nucleotide composition bias we withdrew three species from the outgroup of the genus Cricetulus, (C.longicaudatus, C. migratorius and C. griseus) that demonstrated approximately 10% lower percentage of C3 and 10% higher percentage in T3 than most Arvicolinae and left in the outgroup only Urocricetus kamensis and Akodon montensis that showed similar base composition with species of Arvicolinae and carried out 4 variants Bayesian (on full and RY-coded alignment) and ML, respectively. Only Bayesian inference on full alignment resulted in the basal position of Prometeomys, but with minimal support (S3 Fig). With both BI and ML and RY masked alignment the same topology as on the full set of taxa (Fig 2), with basal Lemmini (but again without support) was obtained. The second radiation is represented exclusively by the large monophyletic tribe Сlethrionomyini (Fig 2). These are predominantly forest-dwelling taxa that originated in Eurasia with only a few species penetrating North America during the Pleistocene. According to our data, the monophyly of Clethrionomyini was supported in analyses of either concatenated alignment or individual mitochondrial genes except for the short ND4L (S1 File). With all the nodes receiving high BP and ML support, the internal topology of branches within Clethrionomyini obtained in this study was similar to previous reconstructions of this tribe based on one mitochondrial and three nuclear loci [11]. The third radiation comprises three tribes Arvicolini, Ellobiusini and Lagurini. While these tribes, as well as most genera within the tribes, received strong support, and our reconstruction demonstrates that all the taxa of the third radiation share the same putative common ancestor, their interrelationships within this large clade also were not recovered, actually representing polytomy. According to our data, the genera Dinaromys, Hyperacrius and Lemmiscus whose assignment to certain tribes has previously been doubtful (Fig 2) also belong to the third radiation. Their taxonomic position, as well as the position of the genus Arvicola that suddenly appeared to be paraphyletic to other Arvicolini, are discussed below.

Phylogenetic relationships of the genus level taxa. Monotypic and low-diverse genera of uncertain position

The subfamily Arvicolinae includes several seriously understudied genera of unclear taxonomic position. For these genera, molecular data include either only mitochondrial CYTB sequences [5,16-18] or several additional mitochondrial and nuclear markers [3,7,9-11,21]. These genera are often the orphan genera, i.e. being represented by a single extant species. Considering such taxa is of remarkable importance for the reconstruction of high-level phylogenies, but their position on a tree can often be contradictory due to long-branch attraction [99,100]. While the resolving power of the phylogenetic reconstruction increases with the number of genes in analysis, several studies of rapid radiations based on organellar genomes pointed out the effect of long branch attraction [56,101,102] and references therein. Our study, among other, considers five genera of the unclear position either within the first (Prometheomys and Ondatra) or third (Dinaromys, Hyperacrius and Lemmiscus) radiation waves. The Balkan vole, Dinaromys bogdanovi, endemic species from Balkan Peninsula was attributed to either Ondatrini [103] or Prometheomyini [104], but conventionally to Clethrionomyini [62,105,106]. Morphologically Dinaromys is mostly close to the extinct Pliocene genus Pliomys Méhely, 1914 [62,107-109], which distinguishes it from the rest of extant vole taxa. The genus Pliomys, in turn, has generally been considered the ancestral form for the whole Clethrionomyini tribe. That was the main reason [62] to distinguish a separate subtribe Pliomyi within the latter consisting of the two genera—extant Dinaromys and extinct Pliomys. Until recently, CYTB was the only studied locus for Dinaromys, and it was placed as sister to Prometheomys, another monotypic genus, and both were close to the Ellobiusini—Arvicolini—Lagurini group [5]. This grouping was strongly rejected by the following attempts to build molecular phylogeny of Arvicolinae showing the position of Prometheomys as the earliest derivative within the subfamily [3,7,9,21], and Dinaromys within the clade uniting Ellobiusini, Lagurini and Arvicola, i.e. the third radiation [9,20,21]. According to mitochondrial genome phylogeny (Fig 2), Dinaromys does not have putative MRCA with monophyletic Clethrionomyini tribe and most likely belongs to the third radiation, yet the certain position of this genus within this large group remains unclear. The analysis of partial mitochondrial CYTB sequence [11] demonstrated that genus Hyperacrius does not seem to belong to the Clethrionomyini tribe. By analysing the set of mitogenomes of Clethrionomyini and Arvicolini it was recently suggested that Hyperacrius has the basal position within the tribe Arvicolini [74]. Here, using the broader taxonomic sampling, we confirm these previous findings showing that Hyperacrius predates the diversification of all main genera of Arvicolini. Reconstructions performed using the individual mitochondrial genes often placed genus Ondatra as sister to Arvicola [5,16] or Clethrionomyini tribe [97] with low support. In all studies involving varying sets of nuclear genes Ondatra was among early diverging lineages [7] and sister to Neofiber True, 1884 if it was included in the analysis [21,97]. Such a position better corresponds to conventional taxonomy and paleontological data. Our results placed Ondatra sister to the Dicrostonychini tribe, hence with low support (except BI with transitions in the 3rd position included). Similar topology was observed by Lv et al. [20]. Lemmiscus curtatus—the sagebrush vole—is the only extant representative of the genus, it inhabits semi-arid prairies on the western coast of North America. For a long time, Lemmiscus was considered as closely related to the steppe voles Lagurini of the Old World and even as a subgenus within Lagurus Gloger, 1841 [62,110-112]. The close affinity between Lemmiscus and the Palearctic Lagurini was then seriously criticized from the paleontological perspective. Morphological similarities among the two groups were interpreted as a result of the parallel evolution at open, steppe-like landscapes, and Lemmiscus was proposed to be close to the tribe Arvicolini, particularly the genus Microtus Schrank, 1798 [113]. These data corroborated the previous grouping of Microtus and Lemmiscus in phylogenetic reconstruction based on restriction fragment LINE-1 [114], yet their taxonomic sampling did not include Lagurini and most genera of the Arvicolini tribe. In a recent reconstruction using mitochondrial CYTB and the only nuclear gene Lemmiscus clustered with Arvicola amphibius, yet with no support [21]. Using mitochondrial genomes to reconstruct Arvicolinae phylogeny, we sensationally show that Lemmiscus appears to be sister to the snow voles genus Chionomys. This clustering was obtained in all variants of the analysis, and node support values were significant. The snow voles unite three species occurring only in the Old World, particularly mountain systems of Southwestern, Central and Southeastern Europe and Southwestern Asia. Snow voles inhabit rocky patches of a subalpine and alpine belt from 500 up to 3500 m above the sea level [62,115]. Reliable pre-Pleistocene fossil remains of Chionomys are unknown, and the origin of the genus was previously attributed to the mid-Pleistocene [116]. Our data strongly contradicts this conventional view, and both Lemmiscus and Chionomys probably are more ancient taxa. Also, Lemmiscus and Chionomys occur at different continents and occupy contrasting ecological niches; they are also very dissimilar morphologically. These findings, broadly discussed below, are important for the understanding of the migration events of Arvicolinae from Eurasia to North America.

Phylogenetic relationships within the tribe Arvicolini sensu stricto

By using the mitochondrial genome data, we obtained good support for the nodes within the tribe Arvicolini except for the Arvicola amphibius that clustered with Lagurini (Fig 2). This evident artefact may be related to nucleotide composition bias (see Fig 1, S6 Table). Note that with the third position masked as RY this clustering has weak or no support. On the other hand the evidently wrongposition of A. amphibius can be a consequence of the long-branch attraction effect, and further studies should consider including sequences of the e.g. southern water wole, A. sapidus Miller, 1908 and nuclear genome data for better phylogenetic position resolution of the genus. The other nodes within Arvicolini, hereafter called Arvicolini sensu stricto marking the genus and subgenus level taxa, were recovered as monophyletic and clearly resolved. The phylogenetic pattern indicates two major migration waves of voles to the Nearctic. The earliest derivative from the MRCA is a branch leading to Chionomys—Lemmiscus node and this gives clear indication on the first dispersal of common ancestors of the group from Palearctic to Nearctic. The only recent descendant of this lineage in North America is Lemmiscus. The next split of ancestral lineage evidently took place in Asia and is represented by poorly diversified genus Proedromys and highly diversified cluster, uniting all the rest recent vole genera. This latter cluster further splits into highly supported clade of Asian voles showing sister relationships of genus Neodon Horsfield, 1841 and genera Alexandromys Ognev, 1914 and Lasiopodomys Lataste, 1887 and a cluster uniting two sister clades: Nearctic voles with following fast radiation resulting in nearly 20 recent species (here named Mynomes Rafinesque, 1817 after the earliest name of the generic group level), and a clade that further splits into Western Palearctic (Microtus s.str., Terricola Fatio, 1867 and Sumeriomys Argyropulo, 1933) and one containing taxa distributed in Central Asia (Blanfordimys Argyropulo, 1933), Westernmost Europe (Iberomys Chaline, 1972) and wide-ranged Agricola Blasius, 1857 (from Western Europe to Siberia). It is important to note that trees uniting Nearctic “Microtus’’ species in one cluster were obtained in various studies [12,18,19,21], but for the first time this cluster receives robust support, justifying the genus level status under the name Mynomes. Another significant finding is the more clear assignment of Iberomys cabrerae and Agricola agrestis, both species conventionally assigned to Microtus [62], but whоse position at the molecular trees within the Arvicolini tribe was always uncertain. The tendency for clusterization of A. agrestis and Blanfordimys, though without support, was shown earlier [9,12,18,20,21]. In the paper where both I. cabrerae and A. agrestis were analysed in a comprehensive dataset with CYTB [18] these species appeared in different clusters: A. agrestis with Blanfordimys, while I. cabrerae within the cluster of Nearctic voles, however later on [10] in a detailed study of Asian voles came up with analogous to reported here clustering of I. cabrerae and A. agrestis with Blanfordimys. An important contribution was recently made by Barbosa et al. [34] who used a genomic approach for resolving phylogeny of speciose Microtus voles. According to their results both species appear to be monophyletic, however this study was based only on eight species and lacked most of the genera of the group. According to our results the cluster showing close relationships of these species with Blanfordimys is highly supported.

Divergence time estimates for the major Arvicolinae lineages in the context of fossil record

Dating the origin of Arvicolinae

Our data estimated the time of radiation from the MRCA of all Arvicolinae as ca. 7.3 Ma (Table 1), i.e. the Late Miocene and divergence time of Cricetinae and Arvicolinae from common ancestors around 11 Ma. These dating estimates correspond with fossil records [117] and molecular dating obtained by previous studies [21]. Between the 11.1 and 7.75 Ma (from Early Vallesian to Late Turolian) in Eurasia appeared many taxa, conventionally referred to as microtoid cricetids. These forms were characterised by the arvicoline−like prismatic dental pattern with variously pronounced hypsodonty [117-120] and are generally considered as the ancestors for arvicolids [62,117-119,121]. There is no single opinion concerning the earliest known form assigned to the true Arvicolinae. Several authors attribute Pannonicola (= Ishymomys) sp. As the first fossil representative of Arvicolinae, dated as ca. 7.3 Ma from Middle Turolian, Hungary [122], and Asia [123]. Fejfar and coauthors [117] considered Pannonicola as the starting point of arvicolid evolution, though it is far from the generally accepted point of view. According to other, more widely distributed opinions, the first fossil Arvicolinae is Promimomys Kretzoi 1955 known from the Miocene-Pliocene boundary, dated around 5.3 Ma [14,124,125]. While calibrating the offset of Arvicolinae by Pannonicola at 7 Ma, we received the mean divergence date for the clade Arvicolinae as 7.36 Ma, which is unrealistically too close to FAD of Pannonicola. By excluding this calibration point from analysis, we obtained the 1 Ma younger estimate for the Arvicolinae node (S4 Table). However, during the consistent removal of one calibration point in turn, the estimate of the Arvicolinae origin between 7.36–7.43 remained stable.

Ancient radiation of Arvicolinae and the first migration event from Palearctic to Nearctic

It is generally agreed that Arvicolinae originated from arvicolid-like cricetids that first appeared in Eurasia but not in North America [13,14,124]. Chronology of immigration events supported by fossil evidence was detaily considered earlier [14,126,127 and references therein]. Below we examine how divergence dating obtained in this study is consistent with fossil evidence. The divergence between Dicrostonychini and Ondatrini took place ca. 6 Ma according to our data, indicating that the ancestors of this group were very closely related to the first arvicolines. Feifar et al. [117] pointed that the molar pattern of Pannonicola Kretzoi, 1965, according to the opinion of the authors, the oldest known fossil Arvicolinae, show similarity with Dolomys Nehring, 1898 the putative ancestor of Ondatrini, and possibly Dicrostonychini, indicating their closer relationships. Our data corroborate this grouping. The conventional idea of Promimomys as the first true arvicoline taxon indicates the first immigration timing around 5.5 Ma (Martin, 2010, 2015) as there is no record of this taxon in the older sediments. From the mitochondrial genome data, the date for the MRCA of Lemmini was estimated as 4.81–4.37 Ma. This molecular dating consider ancestors of Lemmini almost a million years older than the fossil remains reliably attributed to Lemmini in Europe [128,129] and Asia [130] dated as the Early Villaniyan (Mammal Neogene zone MN16, 3.2 Ma), while North American fossils of Lemmini were dated at ca. 3.9 Ma or Late Blancan according to Ruez & Gensler [131]. However, these lemming fossils are characterized by very advanced unrooted teeth and masticatory patterns, close to the recent forms of lemmings. Among the potential ancestors here can be mentioned Tobenia kretzoi Fejfar, Repenning, 1998, a species with rooted molars known from the early Pliocene of Wölfersheim, Germany [132]. This record refers to MN15 that is ca. 4 Ma, close to our molecular dating. The molecular estimate of the two major radiation waves of Arvicolinae, leading to the tribes Clethrionomyini, Arvicolini, Ellobiusini and Lagurini, dates back to 6 Ma, Late Miocene, MN 13 (Late Turolian). These findings correspond to paleontological data and confirm the estimates received previously in the study based on nuclear genes [7]. Slightly later, firstly in North America [14] and then in Europe [117,125] and Western Asia [133], the most plesiomorphic arvicoline genus Promimomys Kretzoi, 1955 appeared. This form is considered ancestral to numerous species emerged in the Early Pliocene and conventionally assigned to highly mixed and species-rich genus Mimomys Forsyth-Major, 1902. According to the generally accepted view, different forms of this complex “Mimomys’’ group represented the starting point for all subsequent lineages of Arvicolines. The concept of common ancestry for these forms within this geological period does not contradict the data obtained in the present study and hypothesis proposed by paleontologists [117]. The radiation of common ancestors for all Clethrionomyini species starts later, since Late Ruscinian (MN 15), around 4 Ma.

The origin of the tribe Arvicolini sensu stricto: Second trans-Beringian dispersal

The molecular estimate for the MRCA of node H, Arvicolini s.str. (Fig 2) is ca. 5 Ma. (4.33–5.47). Considering that most primitive forms of the genus Mimomys are among the MRCA candidates for all main genera within the tribe Arvicolini s.str, the obtained time estimate i.e. the very beginning of the Pliocene, may be considered as consistent with the fossil record. However, there is a certain probability of the overestimation of the age of this node during the divergence dating analysis. One of the earliest records of Mimomys in North America was dated as 4.75 Ma [134], while fossil remains from Asia are slightly older [135]. This is the time of the second dispersal of arvicolids from Asia to the Nearctic. The only recent descendant of these immigrants in North America is Lemmiscus curtatus. According to our data, the starting point of evolutionary history for this lineage is around 4 Ma. The earliest remains assigned to the genus are known from the end of Early Pleistocene from the SAM Cave in New Mexico [136] in the sediments according to paleomagnetic and faunistic data that may be dated as 1.8 Ma. Repenning [136] deduced Lemmiscus from plesiomorphic Allophaiomys Kormos, 1932. Remains assigned to the latter taxon are widely distributed among the Early Pleistocene sites dated between 2.2 and 1.6 Ma in both the Palearctic and Nearctic. Yet, Allophaiomys is a rather lumped taxon [137] presumably accepted as ancestral to most Microtus species and associated genera. Tesakov and Kolfschoten [113] suggested the hypothesis of a Mimomys–Lemmiscus phyletic lineage. However, their hypothesis also presumes that ancestral Mimomys (Cromeromys Zazhigin, 1980), a form having rooted molars and inhabiting vast areas from Western Europe to Beringia dispersed southwards across North America in the late Early Pleistocene and evolved there into rootless Lemmiscus. Thus, our dating conflicts with both views and supports the idea of dispersal and further evolution from “Mimomys” stage [113,136] in the middle of the Pliocene, ca. 4 Ma. The fossil remains of Chionomys are known only from the Pleistocene sediments [116]. According to our dating based on mtDNA, the diversification of ancestral lineage may have started in Western Palearctic as early as in the Middle Pliocene, i.e. either late Zanclean or early Piacenzian periods.

Diversification within Arvicolini sensu stricto: Late Pliocene exchange between Palearctic and Nearctic faunas

The other genera within Arvicolini were monophyletic according to Bayesian and ML analyses with high node support. According to conventional view, this group originated from Allophaiomys [62,136], a highly complex taxon common in the Early Pleistocene (ca. 2 Ma) faunas of the Nearctic and Palearctic. Our results on divergence dating raise another hypothesis on the starting point for the group taking place at the level of “Mimomys’’ stage, i.e. in the Pliocene. The genus Proedromys is the first derivative from this common stem, most likely in the middle of the Pliocene (approx. 4 Ma). The standalone position of this genus among other genera of Arvicolini that plausibly derived from Allophaiomys has been earlier underlined by Gromov, Polyakov [62] and Repenning [136]. A further split within Arvicolini took place in the late Pliocene and resulted in the entirely Asian lineage which is currently represented by genera Neodon, Alexandromys and Lasiopodomys. The other, sister lineage emerged in the Early Pliocene, around 3.8–4 Ma, also from the pre-Allophaiomys stage and diverged into two branches. Ancestors of the first branch (Mynomes) penetrated the Nearctic during the third Nearctic immigration event, where they diversified into 20 species. Descendants of the other, Palearctic branch, further produced two lineages. Among them, the first apparently originated in Central Asia and dispersed westwards during the Late Pliocene. Iberomys cabrerae, inhabiting the Iberian Peninsula and the foothills of Pyrenees is a relict descendant of this line [138] and references therein. Our mitogenomic data supports the hypothesis of long independent evolution of Iberomys cabrerae lineage previously confirmed by several unique morphological, biological and ecological traits [138]. The first fossil remains of Iberomys were found in the Early Pleistocene [139] sediments in Spain predating the Jaramillo reversal event (approx. 1.2 Ma). According to the scenario set by Cuenca-Bescós et al. [138], the genus Iberomys is a basal sister group of Arvicolini evolved since its origins in the Early Pleistocene in the western Mediterranean region. The most probable origin and vicariant speciation according to these authors was linked to the stock of Allophaiomys taxa with plesiomorphic character states. The results reported here, in a whole are in a good agreement with this scenario although indicate an earlier time of origin and speciation from the stock predating Allophaiomys-stage and going far back to the Mimomys stage of the Late Pliocene. The other descendants of the same stock in the recent fauna are represented by Agricola and Blanfordimys. The range of Agricola covers whole Europe and stretches to the east up to the Lake Baikal and watershed between the Yenisey and Lena Rivers [62,63]. Recent studies showed that Agricola is represented by three highly divergent lineages, possibly a species level taxa [140]. Three species of Blanfordimys occur in high mountain forests and steppes of Central Asia and are characterized by a very primitive molar pattern, similar to Allophaiomys. The idea that Agricola and Iberomys represent relicts of a very early colonization of Arvicolini to Western Europe was earlier suggested by Martinkova and Moravec [19] and well agrees with the given data. The second lineage of Palearctic branch has evolved in Western Palearctic and in the modern fauna is represented by species-rich genus Microtus (with subgenera Microtus s.str and Sumeriomys) and Terricola (around 14 species, mainly found in South and Southwestern Europe). The divergence between these lineages corresponds to the Late Pliocene, however the speciation events coincided with Early Pleistocene for genus Terricola and early Middle Pleistocene for subgenera Sumeriomys and Microtus. The latter dating matches with known fossil records [109,141]. Summing up the comparison between the molecular estimates of divergence times reported here and known paleontological data, it is curious to note that while the dates for MRCA for most genera within Arvicolini s.str. are significantly older than was previously supposed [109,117,142], dating of speciation events within the genera (Lasiopodomys, Alexandromys, Terricola) are consistent with fossil records [109,134,136,141-143] and others. Although our results are consistent both with fossil record and previous molecular dating results it is worth to mention, that the lack of calibration points for terminal nodes in our analysis may be the source of underestimation of basal node ages, while ages of several youngest nodes in Arvicolini could be overestimated.

Systematic remarks

While systematic relationships of higher taxa within Arvicolinae undoubtedly require further studies involving genomic approaches, some amendments to the current taxonomic system could be made already at this step of the research. Our study provided significant input for the potential review of taxonomic structure and composition of the tribe Arvicolini (S2 File). Our data shows that the position of the genus Arvicola is still unresolved. On the contrary, genera Lemmiscus and Hyperacrius certainly should be considered as members of the tribe Arvicolini. The further grouping of species into genera and subgenera within this highly diverse tribe always was very subjective and debatable. Most arguable was the composition of the genus Microtus. The current system [62] where Blanfordimys, Neodon and Lasiopodomys have generic status, while Alexandromys, Stenocranius and Terricola are referred as subgenera within the genus Microtus is strongly outdated and contradicts the data of recent phylogenetic studies. The last checklist [2] partly modified this scheme and following Abramson and Lissovsky [63] elevated Alexandromys to full genus and Stenocranius considered as subgenus of Lasiopodomys, and Neodon as a genus. However, despite the accumulated evidence from several previous papers [9,19,144], in this reference book without substantiation the status of Blanfordimys was downgraded [145] while three well differentiated lineages (Blanfordimys, all Nearctic microtines, Terricola, Microtus and Sumeriomys) were illogically united in one genus Microtus. These well-differentiated lineages together form the sister branch to one with similar branching pattern and recognized three genera: Alexandromys, Lasiopodomys and Neodon. It is widely known, that the better is the phylogenetic resolution of any species-rich group the more complicated it matches the conventional hierarchical categories of Linnean system. Trying to keep as much stability of nomenclature retaining the already commonly used names that correspond to certain lineages from one hand and to reflect robust phylogenetic nodes in a formal classification from the other, here we suggest the following system of generic group taxa within the tribe Arvicolini sensu stricto which is given in S2 File.

Conclusions

Our phylogenetic analysis based on a complete mitochondrial genomes confirmed the monophyly of the subfamily, monophyly of the most tribes originated during the three subsequent radiation events. While order of divergence between ancient genera belonging to the first radiation were not uniformly supported by Bayesian and Maximum Likelihood analyses, our study reports the high node statistical support for the groups of genera within the highly diverse tribe Arvicolini. Mitogenome phylogeny resolved several previously reported polytomies and also revealed unexpected relationships between taxa. The robust placement of Lemmiscus as sister to the snow voles, Chionomys in the tribe Arvicolini, in contrast with a long-held belief of its affinity with Lagurini, is an essential novelty of our phylogenetic analyses. Our results resolve some of the ambiguous issues in phylogeny of Arvicolinae, but some phylogenetic relationships require further genomic studies, e.g. the evaluation of the precise positions of Arvicola, Dinaromys and Hyperacrius. Here, we provide the evidence of high informativeness of the mitogenomic data for phylogenetic reconstruction and divergence time estimation within Arvicolinae, and suggest that mitogenomes can be highly informative, when the number of extant and extinct forms are comparable (the case of Arvicolini) and insufficient when extant forms represent single lineages of once rich taxon (most cases of early radiation in the subfamily). The accuracy and precision of previous divergence time estimates derived from multigene nDNA and nDNA–mtDNA datasets are here refined and improved. The estimates for subfamily origin and early divergence are consistent with fossil record, however mtDNA estimates for putative ancestors of most genera within Arvicolini appeared to be much older than it was supposed from paleontological studies.

Mapping of raw Illumina reads of Lemmiscus curtatus against the reference mitochondrial genome of Mynomes ochrogaster using Geneious Prime.

(PDF) Click here for additional data file. Nucleotide misincorporations at 5’-termini (A) and 3’-termini (B) of the Lemmiscus curtatus calculated using mapDamage. All possible misincorporations are plotted in gray, except for guanine to adenine (G>A, blue lines) and cytosine to thymine (C>T, red lines). (PDF) Click here for additional data file.

A molecular phylogeny for the subfamily Arvicolinae reconstructed using the complete PCG dataset and Bayesian algorithm without three Cricetidae species.

Major Arvicolinae tribes are indicated by color coding (Arvicolini—light blue, Lagurini—blue, Ellobiusini—purple, Clethrionomyini—magenta, Dicrostonychini—dark green, Ondatrini—light green, Prometheomyini—yellow, Lemmini—red, nomen nudum species—black). Black circles show nodes with 0.95–1.0 BI and 95–100 ML support. (JPG) Click here for additional data file.

Material, GenBank accession numbers and mitogenome characteristics.

(XLSX) Click here for additional data file.

Optimal partitioning schemes of the concatenated full and RY-coding alignments for mitochondrial protein-coding genes obtained using PartitionFinder for the subsequent Bayesian phylogenetic analysis.

(XLSX) Click here for additional data file.

Fossil calibrations used in dating analysis.

Node labels correspond to Fig 2, ages are given in million years ago (Ma). FAD—age of the first appearance of the taxon in the fossil record (first appearance date). (XLSX) Click here for additional data file.

Additional divergence time estimates for the major lineages within the subfamily Arvicolinae.

Complete—the main variant of divergence dating analysis (Table 1); A-H—results of the analyses with alternate exclusion of calibrations (labels show what calibration was excluded). (XLSX) Click here for additional data file.

Completeness of analyzed mitogenomes.

Mitogenomes obtained in the current study are marked in bold, partial genes colored by yellow. Absent genes indicated with red color. The length of protein-coding genes is given in nucleotides. (XLSX) Click here for additional data file.

Source table for the PCA analysis.

Each codon position is indicated by numbers. For nucleotides, percentages are given. (XLSX) Click here for additional data file.

Test of substitution saturation.

Analysis performed on all sites for 1&2nd and 3rd codon position separately. Iss—index of substitution saturationIssSym is Iss.c assuming a symmetrical topology, IssAsym is Iss.c assuming an asymmetrical topology, NumOTU—number of operation taxonomic units. Red color indicates P-value < 0.05. (XLSX) Click here for additional data file.

A molecular phylogeny for the subfamily Arvicolinae reconstructed using the complete PCG dataset and each of the 13 individual genes with the corresponding saturation plots indicated.

Major Arvicolinae tribes are indicated by color coding (Arvicolini—light blue, Lagurini—blue, Ellobiusini—purple, Clethrionomyini—magenta, Dicrostonychini—dark green, Ondatrini—light green, Prometheomyini—yellow, Lemmini—red, nomen nudum species—black). Bayesian topology was used to plot the tree for the complete 13 PCGs dataset. Node labels display the following supports: BI complete / BI RY-coded 3rd codon position / ML complete / ML RY-coded 3rd codon position. Black circles show nodes with 0.95–1.0 BI and 95–100 ML support. For each of the PCGs, maximum likelihood topology is given, node labels display ultrafast ML bootstrap above 50%. Saturation plots are indicated on the side insets, where colors mark the following partitions: 1st codon position transitions (ts)—brown, 1st transversions (tv)—red, 2nd ts—blue, 2nd tv—green, 3rd ts—pink, 3rd tv—black. (PDF) Click here for additional data file.

The proposed system of generic group taxa within the tribe Arvicolini sensu stricto.

(DOCX) Click here for additional data file.
  81 in total

1.  SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing.

Authors:  Anton Bankevich; Sergey Nurk; Dmitry Antipov; Alexey A Gurevich; Mikhail Dvorkin; Alexander S Kulikov; Valery M Lesin; Sergey I Nikolenko; Son Pham; Andrey D Prjibelski; Alexey V Pyshkin; Alexander V Sirotkin; Nikolay Vyahhi; Glenn Tesler; Max A Alekseyev; Pavel A Pevzner
Journal:  J Comput Biol       Date:  2012-04-16       Impact factor: 1.479

2.  PartitionFinder 2: New Methods for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic Analyses.

Authors:  Robert Lanfear; Paul B Frandsen; April M Wright; Tereza Senfeld; Brett Calcott
Journal:  Mol Biol Evol       Date:  2017-03-01       Impact factor: 16.240

3.  ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R.

Authors:  Emmanuel Paradis; Klaus Schliep
Journal:  Bioinformatics       Date:  2019-02-01       Impact factor: 6.937

4.  Cryptic speciation in the field vole: a multilocus approach confirms three highly divergent lineages in Eurasia.

Authors:  J Paupério; J S Herman; J Melo-Ferreira; M Jaarola; P C Alves; J B Searle
Journal:  Mol Ecol       Date:  2012-11-19       Impact factor: 6.185

5.  MITOS: improved de novo metazoan mitochondrial genome annotation.

Authors:  Matthias Bernt; Alexander Donath; Frank Jühling; Fabian Externbrink; Catherine Florentz; Guido Fritzsch; Joern Pütz; Martin Middendorf; Peter F Stadler
Journal:  Mol Phylogenet Evol       Date:  2012-09-07       Impact factor: 4.286

6.  Whole mitochondrial genome of long-clawed mole vole (Prometheomys schaposchnikowi) from Turkey, with its phylogenetic relationships.

Authors:  Osman İbiş; Ahmet Yesari Selçuk; Benjamin N Sacks; Barış Yıldız; Servet Özcan; Haluk Kefelioğlu; Coşkun Tez
Journal:  Genomics       Date:  2020-06-06       Impact factor: 5.736

7.  Complete mitochondrial genomes confirm the generic placement of the plateau vole, Neodon fuscus.

Authors:  Jian-Qiu Li; Li Li; Bao-Quan Fu; Hong-Bin Yan; Wan-Zhong Jia
Journal:  Biosci Rep       Date:  2019-08-09       Impact factor: 3.840

8.  The adaptive evolution of the mammalian mitochondrial genome.

Authors:  Rute R da Fonseca; Warren E Johnson; Stephen J O'Brien; Maria João Ramos; Agostinho Antunes
Journal:  BMC Genomics       Date:  2008-03-04       Impact factor: 3.969

9.  Phylotranscriptomics: saturated third codon positions radically influence the estimation of trees based on next-gen data.

Authors:  Jesse W Breinholt; Akito Y Kawahara
Journal:  Genome Biol Evol       Date:  2013       Impact factor: 3.416

10.  Complete mitochondrial genome of the Eurasian collared lemming Dicrostonyx torquatus Pallas, 1779 (Rodentia: Arvicolinae).

Authors:  Vadim B Fedorov; Anna V Goropashnaya
Journal:  Mitochondrial DNA B Resour       Date:  2016-11-11       Impact factor: 0.658

View more
  2 in total

1.  First mitochondrial genome of the Amazonian marsh rat Holochilus sciureus Wagner 1842 (Rodentia, Cricetidae, Sigmodontinae).

Authors:  Susette Castañeda-Rico; Jesús E Maldonado; Robert C Dowler; Cody W Edwards
Journal:  Mitochondrial DNA B Resour       Date:  2022-08-29       Impact factor: 0.610

2.  Signatures of Adaptation in Mitochondrial Genomes of Palearctic Subterranean Voles (Arvicolinae, Rodentia).

Authors:  Olga Bondareva; Evgeny Genelt-Yanovskiy; Tatyana Petrova; Semen Bodrov; Antonina Smorkatcheva; Natalia Abramson
Journal:  Genes (Basel)       Date:  2021-12-02       Impact factor: 4.096

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.