Data on phylogenetic analyses of gazelles (genus Gazella) based on mitochondrial and nuclear intron markers.

Hannes Lerp, Sebastian Klaus, Stefanie Allgöwer, Torsten Wronski, Markus Pfenninger, Martin Plath.

Abstract

The data provided are related to the article "Phylogenetic analyses of gazelles reveal repeated transitions of key ecological traits and provide novel insights into the origin of the genus Gazella" [1]. The data are based on 48 tissue samples of all nine extant species of the genus Gazella, namely Gazella gazella, Gazella arabica, Gazella bennettii, Gazella cuvieri, Gazella dorcas, Gazella leptoceros, Gazella marica, Gazella spekei, and Gazella subgutturosa, and four related taxa (Saiga tatarica, Antidorcas marsupialis, Antilope cervicapra and Eudorcas rufifrons). They comprise sequence alignments of a cytochrome b data set and of six nuclear intron markers. For the latter, new primers were designed based on cattle and sheep genomes. Based on these alignments, phylogenetic trees were inferred using Bayesian Inference and Maximum Likelihood methods. Furthermore, ancestral character states (inferred with BayesTraits 1.0) and ancestral ranges based on a Dispersal-Extinction-Cladogenesis model were estimated, and the result files are provided with this article.

Year:  2016        PMID: 27054158      PMCID: PMC4802422          DOI: 10.1016/j.dib.2016.02.062

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Specifications table

Value of the data

New nuclear intron primers for phylogenetic investigations of closely related bovid species. Data provide phylogenetic insight into the genus Gazella. Ancestral character state and ancestral range information for the genus Gazella were inferred with this data.

Data

Data provided with this article are newly established primer sequences of nuclear intron markers for bovids and sequence alignments of the respective markers and Cyt b including species from the genera Gazella, Eudorcas, Antilope, Saiga and Antidorcas. Furthermore, phylogenetic tree files and result files from analyses of ancestral character state estimation and ancestral ranges estimation for the genus Gazella are shared.

Experimental design, materials and methods

PCR primer design

We designed new primers for the amplification of introns of the nuclear-encoded genes zinc finger protein 618 (ZNF618), epidermal growth factor receptor substrate 15-like 1 (EPS15L1), SPARC-related modular calcium-binding protein 1 (SMOC1), pantothenate kinase 4 (PANK4), NACHT, LRR and PYD domains-containing protein 2 (NLRP2) and chromodomain-helicase-DNA-binding protein 2 (CHD2; Table 1). We used the sheep (Ovis aries) genome, available on the website of the International Sheep Genomics Consortium (http://www.livestockgenomics.csiro.au/sheep/oar1.0.php), and the cattle (Bos taurus) genome, available from the Ensembl genome database (http://www.ensembl.org/Bos_taurus/Info/Index). We searched the sheep genome for annotated protein-coding genes and used the provided Swiss-Prot number to search for the corresponding gene sequences in the cattle genome. If those sequences contained introns between 400 and 1000 bp in length, we assembled the exons of the respective gene with the complete gene sequence of sheep using Geneious Pro 5.4.2 (Biomatters Ltd., available from http://www.geneious.com). Primers were then placed in exon regions conserved between cattle and sheep so that the resulting sequences spanned at least one intron. To avoid linkage disequilibrium, we only used genes on different chromosomes. Primers were designed using the Oligonucleotide Properties Calculator [2] and the reverse complement converter (http://www.bioinformatics.org/sms/rev_comp.html). All primers were synthesized by Eurofins MWG Synthesis GmbH.
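The checks mentioned above (intron length window, reverse complement, GC content) can be sketched in a few lines of Python. This is only an illustration and not the tools the authors used; the function names and the example sequence are hypothetical and the sequence is not one of the primers in Table 1.

```python
# Illustrative helpers only; names and the example sequence are hypothetical.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (spaces are ignored)."""
    return seq.replace(" ", "").translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (spaces are ignored)."""
    bases = seq.replace(" ", "").upper()
    return (bases.count("G") + bases.count("C")) / len(bases)


def intron_in_range(length_bp: int, lo: int = 400, hi: int = 1000) -> bool:
    """True if an intron length lies within the 400-1000 bp window used in the text."""
    return lo <= length_bp <= hi


example = "ATG CCG AA"  # hypothetical 8-mer, written in triplets like Table 1
print(reverse_complement(example))  # TTCGGCAT
print(gc_content(example))          # 0.5
print(intron_in_range(650))         # True
```

Writing the reverse primer as the reverse complement of the conserved exon stretch downstream of the intron is what lets the two primers face each other across the intron.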
Table 1

Newly designed intron primers for bovid species with chromosome number of sheep and cattle, Swiss-Prot number, melting (TM) and annealing temperatures, amplification lengths and GC contents.

| Primer name | Protein | Chromosome number sheep | Chromosome number cattle | Swiss-Prot number | Primer forward and reverse |
|---|---|---|---|---|---|
| ZNF618 | Zinc finger protein 618 | Chr 2 | Chr 8 | Q5T7W0 | TCC TAT GAG TGT GGA ATC TGT GGTCT CCT GAG GTG GCT TCA GTG |
| EPS15L1 | Epidermal growth factor receptor substrate 15-like 1 | Chr 5 | Chr 7 | A7MB30 | CAA AGA CCA GTT CGC GTT AGC TATCC CCC GAT CCA AGA GTG CT |
| Smoc1 | SPARC-related modular calcium-binding protein 1 | Chr 7 | Chr 10 | Q9H4F8 | TGG CTA CTG CTG GTG TGT GCCCTGTCCTTGAAGGGGTCCT |
| PANK4 | Pantothenate kinase 4 | Chr 12 | Chr 16 | Q4R4U1 | ACT GGG GGT GGG GCA TAC AAGGT CAT CAC ATC CTC CTT GTC AA |
| NLRP2 | NACHT, LRR and PYD domains-containing protein 2 | Chr 14 | Chr 18 | Q9NX02 | CAG TCC CTC ACA TGC TTG AACCAG TTT CAC CCC ACG ATC TC |

Sequence alignments

DNA was extracted using the Qiagen DNeasy blood and tissue kit according to the manufacturer’s protocol. Sequences were obtained by Sanger sequencing, and newly established sequences were deposited in GenBank (Table 2). We aligned sequences with MUSCLE ([3]; gapopen=−400; gapextend=−200). In total, the concatenated alignment consisted of 4,623 nucleotides. The Cyt b gene partition was translated into amino acid sequences and checked for stop codons that would indicate potential pseudogenes. The alignments for the six nuclear introns of the genes ZNF618, EPS15L1, SMOC1, PANK4, NLRP2 and CHD2 and for the mitochondrial cytochrome b gene are provided as supplements to this article (Lerp_et_al_Gazella_{gene code}_alignment.nexus).
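The stop-codon check on the Cyt b partition can be illustrated with a minimal Python sketch. It assumes the vertebrate mitochondrial genetic code (NCBI translation table 2), in which TAA, TAG, AGA and AGG are stop codons and TGA encodes tryptophan. The function name and the toy sequences are hypothetical, and the source does not state which software was used for this step.

```python
# Stop codons of the vertebrate mitochondrial genetic code (NCBI table 2).
# TGA is NOT a stop codon in this code; it encodes tryptophan.
MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def internal_stop_positions(seq: str, frame: int = 0) -> list[int]:
    """Return 0-based codon indices of in-frame stop codons.

    The final codon is skipped, because a terminal stop is expected.
    Codons containing gap characters never match and are ignored.
    """
    bases = seq.upper()
    codons = [bases[i:i + 3] for i in range(frame, len(bases) - 2, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in MITO_STOPS]


# Toy sequences (hypothetical, not taken from the alignments).
clean = "ATGACCTGACTATAA"        # ATG ACC TGA CTA TAA -> no internal stop
suspect = "ATGACCTGAAGACTATAA"   # ATG ACC TGA AGA CTA TAA -> AGA at codon 3
print(internal_stop_positions(clean))    # []
print(internal_stop_positions(suspect))  # [3]
```

A non-empty result for a Cyt b sequence would flag it as a candidate nuclear pseudogene copy that deserves a closer look before it enters the alignment.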
Table 2

Accession numbers of sequences used in this study.

| Sample ID | CHD2 | EPS15L1 | NLRP2 | PANK4 | SMOC1 | ZNF618 | Cytb |
|---|---|---|---|---|---|---|---|
| GH1 | KU560704 | KU560659 | KU560746 | KU560790 | KU560880 | KU560837 | KU560629 |
| TAUM 11861 | KU560705 | KU560660 | KU560747 | KU560791 | KU560881 | KU560838 | KC188775 |
| TAUM 12479 | KU560706 | KU560661 | KU560748 | KU560792 | KU560882 | KU560839 | KC188774 |
| TAUM 10170 | KU560707 | KU560662 | KU560749 | KU560793 | KU560883 | KU560840 | KC188740 |
| TAUM 11048 | KU560708 | KU560663 | KU560750 | KU560794 | KU560884 | KU560841 | KC188759 |
| GGF41 | KU560709 | KU560664 | KU560751 | KU560795 | KU560885 | KU560842 | KU560630 |
| OmanI | KU560710 | KU560665 | KU560752 | KU560796 | KU560886 | KU560843 | KU560648 |
| 3455 | KU560711 | KU560666 | KU560753 | KU560797 |  | KU560844 | KU560649 |
| 182 |  |  |  | KU560798 | KU560887 |  | JN410348 |
| 3463 | KU560712 | KU560667 | KU560754 | KU560799 | KU560888 |  | KU560631 |
| 3466 | KU560713 | KU560668 | KU560755 | KU560800 |  | KU560845 | KU560650 |
| 3467 | KU560714 | KU560669 | KU560756 | KU560801 |  | KU560846 | KU560632 |
| 3469 | KU560715 | KU560670 | KU560757 | KU560802 |  | KU560847 | KU560651 |
| Chad19 | KU560716 | KU560671 | KU560758 | KU560803 | KU560889 |  | JN410237 |
| Chad7 | KU560717 | KU560672 |  | KU560804 | KU560890 |  | JN410235 |
| 2866 | KU560718 | KU560673 | KU560759 | KU560805 | KU560891 | KU560848 | JN410252 |
| 3564 | KU560719 | KU560674 | KU560760 | KU560806 | KU560892 | KU560849 | JN410230 |
| AWWP 9159 | KU560720 | KU560675 | KU560761 | KU560807 | KU560893 | KU560850 | JN410319 |
| PCGD59 | KU560721 | KU560676 | KU560762 | KU560808 | KU560894 | KU560851 | JN410251 |
| PCGD1 |  | KU560677 |  | KU560809 | KU560895 | KU560852 | JN410257 |
| 3261 | KU560722 | KU560678 | KU560763 | KU560810 | KU560896 | KU560853 | JN410255 |
| Mongo | KU560723 | KU560679 | KU560764 | KU560811 | KU560897 |  | KU560652 |
| AWWP 9053 | KU560724 | KU560680 | KU560765 | KU560812 | KU560898 | KU560854 | KU560653 |
| 583 | KU560725 | KU560681 | KU560766 | KU560813 | KU560899 | KU560855 | KU560633 |
| 7 | KU560726 | KU560682 | KU560767 | KU560814 | KU560900 | KU560856 | JN410357 |
| 9 | KU560727 | KU560683 | KU560768 | KU560815 | KU560901 | KU560857 | JN410341 |
| 6 | KU560728 | KU560684 | KU560769 | KU560816 | KU560902 | KU560858 | JN410340 |
| 10 |  |  | KU560770 | KU560817 | KU560903 | KU560859 | KU560634 |
| 2887 | KU560729 | KU560685 | KU560771 | KU560818 | KU560904 | KU560860 | KU560635 |
| 2885 | KU560730 | KU560686 | KU560772 | KU560819 | KU560905 | KU560861 | KU560636 |
| 781 | KU560731 | KU560687 | KU560773 | KU560820 | KU560906 | KU560862 | JN410345 |
| 782 | KU560732 | KU560688 | KU560774 | KU560821 | KU560907 | KU560863 | JN410344 |
| 75 | KU560733 |  | KU560775 | KU560822 | KU560908 | KU560864 | KU560654 |
| 90 | KU560734 | KU560689 |  | KU560823 | KU560909 | KU560865 | KU560655 |
| 271 |  | KU560690 | KU560776 | KU560824 | KU560910 | KU560866 | KU560656 |
| AWWP 7895 |  | KU560691 | KU560777 | KU560825 |  | KU560867 | KU560657 |
| AWWP 9055 | KU560735 | KU560692 | KU560778 | KU560826 |  | KU560868 | KU560637 |
| AWWP 8397 |  | KU560693 | KU560779 | KU560827 |  | KU560869 | KU560638 |
| AWWP 7238 | KU560736 | KU560694 | KU560780 |  |  | KU560870 | KU560658 |
| OZ1 | KU560737 | KU560695 | KU560781 | KU560828 | KU560911 | KU560871 | KU560639 |
| OZ2 | KU560738 | KU560696 | KU560782 | KU560829 | KU560912 | KU560872 | KU560640 |
| OZ3 | KU560739 | KU560697 | KU560783 | KU560830 | KU560913 | KU560873 | KU560641 |
| OZ4 | KU560740 | KU560698 | KU560784 | KU560831 |  | KU560874 | KU560642 |
| S06 | KU560741 | KU560699 | KU560785 | KU560832 |  | KU560875 | KU560643 |
| S08 | KU560742 | KU560700 | KU560786 | KU560833 |  | KU560876 | KU560644 |
| S10 | KU560743 | KU560701 | KU560787 | KU560834 |  | KU560877 | KU560645 |
| S12 | KU560744 | KU560702 | KU560788 | KU560835 |  | KU560878 | KU560646 |
| SB | KU560745 | KU560703 | KU560789 | KU560836 |  | KU560879 | KU560647 |

Phylogenetic analyses

Phylogeny and divergence times were estimated with a Bayesian approach in BEAST MC3 1.7.5 [4]. Additionally, we inferred a species tree from the multiple loci using the coalescent approach implemented in the *BEAST algorithm [8]; this species tree was used for the subsequent ancestral character (1000 trees) and range (maximum clade credibility tree) estimation. Molecular clock rates and substitution schemes were unlinked between partitions. We inferred the most likely substitution model for each marker using jModelTest 2.1.3 [9], considering models with equal/unequal base frequencies and with/without rate variation among sites (base tree for likelihood calculations=ML tree; tree topology search operation=NNI; the best model was inferred based on the Akaike Information Criterion). This resulted in an HKY+G model of sequence evolution for all genes except PANK4, for which an HKY model was selected. We applied a Yule tree prior to account for independently evolving lineages. We chose an uncorrelated log-normal relaxed molecular clock using an external substitution rate for the Cytb gene (normally distributed rate with a mean of 1.50±0.15% per Ma; 5–95% interquantile range: 1.25–1.75% per Ma; [10]). This rate was estimated based on four different alignments of primate protein-coding mitochondrial sequences and fossil calibration points for six primate data sets using a Bayesian approach [10]. For the more conserved nuclear genes, reliable external rates were not available, and so we assumed a very broad exponentially distributed prior with a mean of 0.01% per Ma (5–95% interquantile range: 0.01–0.30% per Ma). We ran three chains for 50 M iterations, sampling every 10,000th iteration. Convergence of sampled parameters and potential autocorrelations (effective sample size for all parameters>200) were investigated in Tracer 1.6 [11]. We discarded the first 10% of sampled trees as burn-in.
The maximum clade credibility tree was chosen and parameter values annotated using TreeAnnotator (part of the BEAST package). The resulting substitution rates were 0.97% per Ma for Cyt b (95% credibility interval, CI: 0.05–1.45%), 0.12% per Ma for EPS15L1 (CI: 0.05–0.19%), 0.17% per Ma for NLRP2 (CI: 0.08–0.27%), 0.16% per Ma for SMOC1 (CI: 0.04–0.32%), 0.21% per Ma for ZNF618 (CI: 0.1–0.32%) and 0.11% per Ma for PANK4 (CI: 0.05–0.18%). To confirm the tree topology calculated in BEAST, we also analyzed the concatenated data set with a Maximum Likelihood (ML) approach. The ML analysis was performed with RAxML 8.0.14 [5] under a GTR+Γ model that was unlinked for all partitions. Support of nodes was assessed with 1,000 bootstrap replicates. Phylogenetic (Bayesian and ML) and species trees are provided as supplements to this article (Lerp_et_al_Gazella_phylogeny_{program}.nwk and Lerp_et_al_Gazella_Species_Tree_starBEAST.nwk).
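The interquantile ranges quoted for the clock-rate priors above, and the number of samples retained per chain, can be checked with a short Python sketch that uses only the standard library. The function names are hypothetical, and the sketch assumes the "±0.15" of the Cytb prior is a standard deviation and that the exponential prior is parameterized by its mean. Under these assumptions the normal prior reproduces the quoted 1.25–1.75% range, while the quoted 0.01–0.30% range for the nuclear genes matches an exponential with a mean of 0.1% rather than 0.01%. This is an observation about the arithmetic only, not a correction confirmed by the source.

```python
import math
from statistics import NormalDist


def normal_quantiles(mean, sd, lo=0.05, hi=0.95):
    """5% and 95% quantiles of a normal prior with the given mean and sd."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.inv_cdf(lo), dist.inv_cdf(hi)


def exponential_quantiles(mean, lo=0.05, hi=0.95):
    """Quantiles of an exponential prior: q(p) = -mean * ln(1 - p)."""
    return -mean * math.log(1 - lo), -mean * math.log(1 - hi)


def samples_after_burn_in(iterations, sample_every, burn_in_percent):
    """Number of retained samples per chain after discarding the burn-in."""
    total = iterations // sample_every
    return total - total * burn_in_percent // 100


# Rates are in % per Ma, as in the text.
print(tuple(round(q, 2) for q in normal_quantiles(1.50, 0.15)))   # (1.25, 1.75)
print(tuple(round(q, 2) for q in exponential_quantiles(0.1)))     # (0.01, 0.3)
print(tuple(round(q, 2) for q in exponential_quantiles(0.01)))    # (0.0, 0.03)
print(samples_after_burn_in(50_000_000, 10_000, 10))              # 4500
```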

Ancestral character state estimation

We estimated ancestral characters for ecological and behavioral traits using a Bayesian approach to character evolution in BayesTraits multistate 1.0 [6]. The analysis was conducted with 1000 randomly selected post-burn-in trees to account for uncertainty in phylogenetic reconstruction; outgroups were removed with the exception of Antilope cervicapra (the sister group to Gazella, see [1]). We estimated ancestral character states for three key ecological/behavioral traits: habitat type (mountainous vs. plain-dwelling), group size (small groups of <15 individuals vs. large herds), and movement patterns (sedentary vs. migratory; see input files). In addition, we reconstructed ancestral character states for the presence or absence of horns in females and for the occurrence of twinning (see Table S2 in [1]). We ran the analysis for 20 M iterations, sampling every 10,000th iteration and discarding the first 10% as burn-in. To specify the range of values used to seed the prior distribution, we applied an exponential hyperprior with a mean ranging from 0.0 to 0.5 and a rate deviation of seven (twinning=2, female horns=6), resulting in mean acceptance rates between 20% and 40%. To further corroborate the ancestral state in the most recent common ancestor (MRCA) of the genus Gazella, we additionally applied a model testing approach. In separate runs (with the general MCMC settings described above) we constrained the ancestral condition of the MRCA of Gazella to each of the alternative states and compared the harmonic means of likelihoods (as an estimator of marginal likelihoods) using the Bayes factor (BF). As harmonic means tend to be unstable, we repeated each run five times and calculated the BF from the arithmetic means. Result files of the ancestral character state estimation (ACSE) are provided as supplements to this article (Lerp_et_al_Gazella_ACSE_{trait}.txt).
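The model comparison described above can be sketched as follows. The sketch assumes that each run reports a log harmonic mean of the likelihoods, that the five values per constraint are averaged arithmetically, and that the Bayes factor is taken on the common log scale as twice the difference between the two averages. The source does not spell out the exact formula, and all numbers below are hypothetical placeholders, not results from the study.

```python
from statistics import mean


def bayes_factor(log_hm_a, log_hm_b):
    """Log-scale Bayes factor: 2 * (mean log harmonic mean A - mean log harmonic mean B).

    Each argument is a list of log harmonic-mean likelihoods, one per
    replicate run with the MRCA constrained to one state.
    """
    return 2.0 * (mean(log_hm_a) - mean(log_hm_b))


# Hypothetical log harmonic means from five replicate runs per constraint.
mrca_state_a = [-20.1, -20.3, -19.9, -20.2, -20.0]
mrca_state_b = [-22.6, -22.4, -22.5, -22.7, -22.3]

bf = bayes_factor(mrca_state_a, mrca_state_b)
print(round(bf, 2))  # positive values favor the constraint passed first
```

Averaging over replicate runs before taking the difference dampens the run-to-run instability of the harmonic-mean estimator that the text mentions.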

Biogeography

We estimated ancestral ranges under a Dispersal-Extinction-Cladogenesis (DEC) model as implemented in the software Lagrange v. 20130526 [7], using the species tree (maximum clade credibility tree with median heights) obtained through Bayesian inference as phylogenetic input. Species were assigned to one of four discrete geographic areas: (a) Africa, (b) Middle East, (c) Central Asia, and (d) India (Figure 3 in [1]). We did not take into account the distribution data of the more distant outgroups, but included the genus Antilope as the nearest extant relative of the genus Gazella. To test for the direction of dispersal, we calculated three models of range evolution: one without constrained dispersal (H0); one allowing dispersal only from Africa to Asia (i.e., Middle East, Central Asia, India; Afr→As); and one allowing dispersal only from Asia to Africa (As→Afr). We compared the resulting global maximum likelihood at the root nodes and the AIC between models (Table 1 in [1]). In all three models, Africa was assumed to be adjacent only to the Middle East, while adjacency between the three Asian ranges was not constrained. Model results can be found within this article (Lerp_et_al_Gazella_DEC_H0.txt, Lerp_et_al_Gazella_DEC_Afr→As.txt, Lerp_et_al_Gazella_DEC_As→Afr.txt).
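The AIC comparison between the three dispersal models can be illustrated with a minimal Python sketch. It uses the standard definition AIC = 2k − 2 ln L. The log-likelihoods below are hypothetical placeholders, and k = 2 free parameters per model (one dispersal and one extinction rate) is an assumption made for the illustration; the actual values are reported in Table 1 of [1].

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical global log-likelihoods; k = 2 is an assumption (see lead-in).
models = {
    "H0": -25.0,
    "Afr->As": -27.5,
    "As->Afr": -24.0,
}

scores = {name: aic(lnl, n_params=2) for name, lnl in models.items()}
best = min(scores.values())

# Print models from best to worst with their AIC difference to the best model.
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: AIC={score:.1f}, dAIC={score - best:.1f}")
```

When all models have the same number of free parameters, as assumed here, ranking by AIC is equivalent to ranking by log-likelihood; the AIC only changes the ordering when parameter counts differ.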

Subject area: Biology, genetics and genomics
More specific subject area: Phylogenetics and phylogenomics
Type of data: Tables, primer sequences, sequence alignments, phylogenetic trees, ancestral character state estimation and ancestral ranges estimation.
How data was acquired: Primers were designed using the Oligonucleotide Properties Calculator [2]. Sequences were aligned with MUSCLE [3]. Phylogenetic trees were inferred with BEAST MC3 1.7.5 [4] and RAxML 8.0.14 [5]. Ancestral character state estimation was conducted with BayesTraits multistate 1.0 [6]. Ancestral ranges were estimated based on a Dispersal-Extinction-Cladogenesis (DEC) model implemented in Lagrange v. 20130526 [7].
Data format: Analyzed
Experimental factors: Sample types used for DNA extraction were tissue, skin, blood and hair; DNA was extracted using the Qiagen DNeasy blood and tissue kit according to the manufacturer’s protocol.
Experimental features: We sampled gazelle species from a wide geographic range to cover as much of the extant diversity as possible.
Data source location: Samples were collected in Israel, Saudi Arabia, Oman, Chad, Algeria, Sudan, Tunisia, Mongolia, Pakistan, and from captive breeding stocks.
Data accessibility: Data is available within the article.

References

1.  Phylogenetic analyses of gazelles reveal repeated transitions of key ecological traits and provide novel insights into the origin of the genus Gazella.

Authors:  Hannes Lerp; Sebastian Klaus; Stefanie Allgöwer; Torsten Wronski; Markus Pfenninger; Martin Plath
Journal:  Mol Phylogenet Evol       Date:  2016-01-28

2.  OligoCalc: an online oligonucleotide properties calculator.

Authors:  Warren A Kibbe
Journal:  Nucleic Acids Res       Date:  2007-04-22

3.  MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Authors:  Robert C Edgar
Journal:  Nucleic Acids Res       Date:  2004-03-19

4.  BEAST: Bayesian evolutionary analysis by sampling trees.

Authors:  Alexei J Drummond; Andrew Rambaut
Journal:  BMC Evol Biol       Date:  2007-11-08

5.  RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies.

Authors:  Alexandros Stamatakis
Journal:  Bioinformatics       Date:  2014-01-21

6.  Bayesian estimation of ancestral character states on phylogenies.

Authors:  Mark Pagel; Andrew Meade; Daniel Barker
Journal:  Syst Biol       Date:  2004-10

7.  Maximum likelihood inference of geographic range evolution by dispersal, local extinction, and cladogenesis.

Authors:  Richard H Ree; Stephen A Smith
Journal:  Syst Biol       Date:  2008-02

8.  Bayesian inference of species trees from multilocus data.

Authors:  Joseph Heled; Alexei J Drummond
Journal:  Mol Biol Evol       Date:  2009-11-11

9.  jModelTest 2: more models, new heuristics and parallel computing.

Authors:  Diego Darriba; Guillermo L Taboada; Ramón Doallo; David Posada
Journal:  Nat Methods       Date:  2012-07-30

10.  Time dependency of molecular rate estimates and systematic overestimation of recent divergence times.

Authors:  Simon Y W Ho; Matthew J Phillips; Alan Cooper; Alexei J Drummond
Journal:  Mol Biol Evol       Date:  2005-04-06