
New molecular sequence data and species trees for North American whipsnakes.

Kyle A O'Connell, Eric N Smith.

Abstract

In this data article we present species trees based on coalescent species delimitation results for North American whipsnakes, as well as metadata pertaining to the article "The effect of missing data on coalescent species delimitation and a taxonomic revision of whipsnakes (Colubridae: Masticophis)" (MPE-2017-76-R1). Species trees were constructed using SNP data generated from double-digest RADseq, filtered to 80% completeness between species. Tables correspond with the primary manuscript and serve as a repository of genetic sequence information for whipsnakes. These data can be downloaded and combined with future whipsnake datasets.


Year:  2018        PMID: 29904706      PMCID: PMC5998179          DOI: 10.1016/j.dib.2018.04.067

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

The mitochondrial phylogeny shows the phylogenetic structure of whipsnakes, with an emphasis on the Great Plains lineage of the United States.

The included species trees reveal previously unknown phylogenetic relationships among whipsnakes.

Supplementary Table 1 shows metadata for whipsnake sequence data that could be used in future studies.

Table 1 explains missing data thresholds used for various analyses.
Table 1

Parameters for each SNP dataset utilized in this study, including the number of loci, and the percent missing data.

Dataset   N    % Missing at locus   Mean % missing individual   # Loci   Analysis used
A         14   30                   20                          365      SPLITSTREE
B         26   50                   35.3                        2077     SNAPP
C         26   20                   13.3                        325      SNAPP
D         10   50                   38.6                        1464     SNAPP
E         10   20                   16.2                        216      SNAPP
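The two missing-data columns in Table 1 can be illustrated with a minimal sketch. It assumes a simple definition (the percentage of individuals without a call at a locus, and the percentage of retained loci without a call in an individual); the source does not state how the authors' pipeline computed these values. The function names and the toy genotype matrix are hypothetical.

```python
# Minimal sketch, NOT the authors' pipeline: filter SNP loci by a per-locus
# missing-data threshold, then summarise missingness per individual.
# Assumption: a missing genotype call is represented by None.

def filter_loci(genotypes, max_missing_pct):
    """Return indices of loci whose % of missing individuals is <= threshold.

    genotypes maps a sample name to a list of calls, one per locus.
    """
    samples = list(genotypes)
    n_loci = len(genotypes[samples[0]])
    kept = []
    for locus in range(n_loci):
        n_missing = sum(genotypes[s][locus] is None for s in samples)
        if 100.0 * n_missing / len(samples) <= max_missing_pct:
            kept.append(locus)
    return kept


def mean_missing_per_individual(genotypes, loci):
    """Mean over individuals of the % of the given loci that lack a call."""
    if not loci:
        return 0.0
    per_sample = [
        100.0 * sum(calls[locus] is None for locus in loci) / len(loci)
        for calls in genotypes.values()
    ]
    return sum(per_sample) / len(per_sample)


# Hypothetical toy data: 4 individuals, 3 loci.
toy = {
    "s1": [0, 1, None],
    "s2": [0, None, None],
    "s3": [2, 1, None],
    "s4": [None, 1, 0],
}

kept = filter_loci(toy, max_missing_pct=50)
print(kept)                                    # [0, 1]
print(mean_missing_per_individual(toy, kept))  # 25.0
```

In this toy case the third locus is missing in 75% of individuals and is dropped at the 50% threshold; a stricter 20% threshold would remove every locus, which mirrors how the stricter datasets in Table 1 retain fewer loci.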

Data

This Data in Brief article presents two species trees of North American whipsnakes, together with metadata for all whipsnake sequence data used in the original article [1]. The species trees were estimated with coalescent methods from SNP data filtered to a 20% missing-data threshold. Two species trees were estimated to encompass both species complexes included in this study. The metadata cover both the mitochondrial sequence data and the ddRADseq short sequence reads (Table 1).

Experimental design, materials and methods

Mitochondrial phylogenetic analysis

We aligned all sequences with the Geneious Aligner under default settings [2]. We calculated the uncorrected average pairwise distance between lineages in MEGA v7 [3]. We selected the best-fitting model of nucleotide evolution for likelihood analyses using the Bayesian information criterion implemented in PartitionFinder [4], partitioning by codon position. We estimated a maximum likelihood phylogeny using raxmlGUI v1.3 with 1000 rapid bootstrap iterations [5] and visualized our final phylogeny in FigTree v1.4.3 [6]. We considered nodes with bootstrap values ≥70 as strongly supported (Fig. 1).
Fig. 1

Maximum likelihood phylogeny generated from mtDNA. The full clade representing Masticophis flagellum testaceus is shown. All other clades are collapsed. Nodes with at least 70% bootstrap support are shown with grey circles.
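The uncorrected average pairwise distance between lineages mentioned above can be sketched as follows. This is an illustration rather than a reproduction of the MEGA calculation: it assumes pairwise deletion of sites with gaps or ambiguity codes, and the sequences and function names are hypothetical.

```python
# Minimal sketch of an uncorrected (p) distance averaged between two lineages.
# Assumptions: sequences are already aligned to equal length, and sites where
# either sequence lacks an unambiguous base are skipped (pairwise deletion).
from itertools import product

VALID_BASES = set("ACGT")


def p_distance(seq_a, seq_b):
    """Proportion of compared sites at which two aligned sequences differ."""
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differences / compared


def mean_between_group_distance(group_a, group_b):
    """Average p-distance over every pair with one sequence from each group."""
    distances = [p_distance(a, b) for a, b in product(group_a, group_b)]
    return sum(distances) / len(distances)


# Hypothetical toy alignment, two sequences per lineage.
lineage_1 = ["ACGTACGT", "ACGTACGA"]
lineage_2 = ["ACGTTCGA", "ACG-TCGA"]

print(round(mean_between_group_distance(lineage_1, lineage_2), 4))
```

No substitution model is applied here, which is what "uncorrected" means: the value is simply the observed proportion of differing sites.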


SNP-based species tree analysis

Species trees were estimated from the datasets with ≤20% missing data using SNAPP v1.0. We assigned species identities based on the best supported model from our BFD* analyses. We allowed BEAUti to estimate the mutation rates, and confirmed that both U and V were approximately equal to one. We assigned a Gamma distribution to our Lambda prior, with an Alpha of 1 and a Beta of 77. On our SNAPP prior we assigned an Alpha of 1, a Beta of 100, and a Lambda of 77. We ran the analyses for 10,000,000 MCMC generations, sampling every 1000 generations. We removed the first 10% of trees as burn-in and visualized the complete tree sets in DensiTree v1.0 [7] (Fig. 2).
Fig. 2

Species trees generated using SNAPP based on the best-supported models from our Bayes Factor delimitation analysis from the primary manuscript for datasets C and E (< 20% missing loci). Support values are labeled for each node that is not fully supported.
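As a quick check on the MCMC settings described above, the number of posterior trees left after burn-in follows from simple arithmetic. The sketch below assumes the initial state is not counted as a sample; some MCMC software also logs state 0, which would add one tree. The function name is hypothetical.

```python
# Minimal sketch: how many sampled trees remain after discarding burn-in.
# Assumption: one tree is logged every `sample_every` generations and the
# starting state (generation 0) is not counted.

def retained_samples(chain_length, sample_every, burnin_percent):
    """Return (sampled, discarded, retained) tree counts."""
    n_sampled = chain_length // sample_every
    n_burnin = n_sampled * burnin_percent // 100
    return n_sampled, n_burnin, n_sampled - n_burnin


# Settings reported in the text: 10,000,000 generations, sampling every
# 1000 generations, first 10% removed as burn-in.
print(retained_samples(10_000_000, 1_000, 10))  # (10000, 1000, 9000)
```

Under these assumptions each species tree in Fig. 2 summarises about 9,000 posterior trees.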

Specifications table

Subject area: Biology
More specific subject area: Phylogenetics, Herpetology
Type of data: Figure, table
How data was acquired: Phylogenetic analyses
Data format: Analyzed, raw
Experimental factors:
Experimental features: Conducted phylogenetic analyses
Data source location: USA, Mexico
Data accessibility: GenBank (KT713652-KT713738), NCBI Short Read Archive SRS1047296, SRS1047267, SRS1047268, SRS1047265
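For readers who want to retrieve the mitochondrial sequences, the GenBank range above first has to be expanded into individual accession numbers. The helper below is a hypothetical convenience, not part of the original article; it assumes the range runs over consecutive numbers with a shared letter prefix and performs no download.

```python
# Minimal sketch: expand an accession range such as "KT713652-KT713738" into
# a list of individual accession strings.
# Assumption: both ends share the same letter prefix and digit width.
import re


def expand_accession_range(range_text):
    """Return every accession between the two endpoints, inclusive."""
    start, end = (part.strip() for part in range_text.split("-"))
    start_match = re.fullmatch(r"([A-Z]+)(\d+)", start)
    end_match = re.fullmatch(r"([A-Z]+)(\d+)", end)
    if not start_match or not end_match:
        raise ValueError("accessions must be letters followed by digits")
    if start_match.group(1) != end_match.group(1):
        raise ValueError("range endpoints have different prefixes")
    if len(start_match.group(2)) != len(end_match.group(2)):
        raise ValueError("range endpoints have different digit widths")
    prefix = start_match.group(1)
    width = len(start_match.group(2))
    first, last = int(start_match.group(2)), int(end_match.group(2))
    return [f"{prefix}{number:0{width}d}" for number in range(first, last + 1)]


accessions = expand_accession_range("KT713652-KT713738")
print(len(accessions), accessions[0], accessions[-1])  # 87 KT713652 KT713738
```

The resulting list can be passed to whichever sequence-retrieval tool the reader prefers.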

[1] Kyle A O'Connell; Eric N Smith. The effect of missing data on coalescent species delimitation and a taxonomic revision of whipsnakes (Colubridae: Masticophis). Mol Phylogenet Evol, 2018-03-20.

[2] Matthew Kearse; Richard Moir; Amy Wilson; Steven Stones-Havas; Matthew Cheung; Shane Sturrock; Simon Buxton; Alex Cooper; Sidney Markowitz; Chris Duran; Tobias Thierer; Bruce Ashton; Peter Meintjes; Alexei Drummond. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics, 2012-04-27.

[3] Sudhir Kumar; Glen Stecher; Koichiro Tamura. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol Biol Evol, 2016-03-22.

[4] Robert Lanfear; Brett Calcott; Simon Y W Ho; Stephane Guindon. Partitionfinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol Biol Evol, 2012-01-20.

[7] Remco R Bouckaert. DensiTree: making sense of sets of phylogenetic trees. Bioinformatics, 2010-03-12.
