Shaoyuan Wu, Scott Edwards, Liang Liu.
Abstract
We present a genomic data set comprising the coding DNA sequences of 5162 loci from 90 vertebrate species, including 82 mammals. The loci were aligned using their protein sequences, and the aligned protein sequences were then back-translated into their original DNA sequences. The alignments were further filtered to remove individual sequences exhibiting long branches or other unusual features. The data are deposited in figshare (http://figshare.com/articles/cds_5162.zip/6031190) and will be useful as a test data set for large-scale phylogenomic analysis.
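The back-translation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `back_translate` and the toy sequences are hypothetical, and the sketch assumes that gaps in the protein alignment are written as `-` and that the coding sequence is in frame, with one codon per residue.

```python
def back_translate(aligned_protein: str, cds: str) -> str:
    """Thread the codons of an unaligned CDS onto an aligned protein sequence.

    Assumptions (not from the source): gaps are '-', the CDS is in frame,
    and codon i of the CDS encodes the i-th non-gap residue.
    """
    # Split the CDS into complete codons, ignoring any trailing partial codon.
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]

    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues > len(codons):
        raise ValueError("CDS is too short for the aligned protein")

    codon_iter = iter(codons)
    pieces = []
    for aa in aligned_protein:
        # A protein gap becomes a three-nucleotide gap; a residue consumes one codon.
        pieces.append("---" if aa == "-" else next(codon_iter))
    return "".join(pieces)


# Toy example: protein alignment row "M-KL" and its original coding sequence.
aligned_dna = back_translate("M-KL", "ATGAAACTG")
print(aligned_dna)  # ATG---AAACTG
```

Because each protein column maps to exactly three nucleotide columns, the resulting DNA alignment keeps codons intact. Any codons beyond the last residue (for example a trailing stop codon) are simply dropped in this sketch.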
Keywords: Alignment; Mammal; Phylogenomics
Year: 2018 PMID: 29904704 PMCID: PMC5998303 DOI: 10.1016/j.dib.2018.04.094
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Summary of the alignments for 5162 loci. (a) Histogram of the sequence length across 5162 loci. (b) Histogram of the number of species across 5162 loci. (c) Boxplot of the proportion of missing characters across 5162 loci. Missing characters include gaps and ambiguous characters.
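The per-locus proportion of missing characters summarized in panel (c) could be computed along the following lines. This is a sketch under stated assumptions rather than the authors' code: the function name `missing_proportion` and the toy alignment are hypothetical, and every character other than A, C, G, or T (gaps, `N`, `?`, and other IUPAC ambiguity codes) is counted as missing.

```python
def missing_proportion(alignment: dict) -> float:
    """Return the fraction of characters in a DNA alignment that are missing.

    Assumption (not from the source): any character other than A, C, G, or T
    is treated as missing, which covers gaps and ambiguity codes.
    """
    unambiguous = set("ACGT")
    total = sum(len(seq) for seq in alignment.values())
    if total == 0:
        return 0.0
    missing = sum(
        1
        for seq in alignment.values()
        for ch in seq.upper()
        if ch not in unambiguous
    )
    return missing / total


# Toy locus with two species: 3 gap characters and 3 ambiguous characters out of 18.
locus = {"human": "ATG---AAA", "mouse": "ATGNNNAAA"}
proportion = missing_proportion(locus)
print(round(proportion, 3))
```

Applying such a function to each of the alignments yields one value per locus, which is the kind of distribution a boxplot would display.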
| Subject area | |
|---|---|
| More specific subject area | |
| Type of data | |
| How data was acquired | |
| Data format | |
| Experimental factors | |
| Experimental features | |
| Data source location | |
| Data accessibility | |