Serap Gonen, Stephen C Bishop, Ross D Houston.
Abstract
BACKGROUND: Restriction site-Associated DNA sequencing (RAD-Seq) is widely applied to generate genome-wide sequence and genetic marker datasets. RAD-Seq has been extensively utilised, both at the population level and across species, for example in the construction of phylogenetic trees. However, the consistency of RAD-Seq data generated in different laboratories, and the potential use of cross-species orthologous RAD loci in the estimation of genetic relationships, have not been widely investigated. This study describes the use of SbfI RAD-Seq data for the estimation of evolutionary relationships amongst ten teleost fish species, using a previously established phylogeny as a benchmark.
Year: 2015 PMID: 26152111 PMCID: PMC4495686 DOI: 10.1186/s13104-015-1261-2
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Descriptions of the RAD sequences and the studies from which they were obtained
| Species | Reference | Consensus sequence availability | Initial number of sequences | Sequence length (bp) | Post-processed number of sequences | RAD-Seq library preparation protocol | Fragment size selection window (bp) | Sequencing platform | Sequence analysis pipeline | Minimum depth coverage per locus |
|---|---|---|---|---|---|---|---|---|---|---|
| Chinook salmon ( | Brieuc et al. [ | Online (SE)ᵉ | 62,249 | 75 | 62,249 | Baird et al. [ | 200–500 | Illumina GAII/HiSeq | STACKS | Locus sequenced in 135 (85%) individuals |
| Sockeye salmon ( | Everett et al. [ | Provided by authors (SE) | 64,613 | 60 | 64,613 | Baird et al. [ | 400–800 | Illumina GAII/HiSeq | Custom-written Perl scripts, Bowtie, Novoalign | 10 reads per allele per locus per individual |
| Rainbow trout ( | Hecht et al. [ | Provided by authors (SE) | 12,073 | 67 | 32,027 | Miller et al. [ | 200–500 | Illumina GAII/HiSeq 2000 | Perl scripts from Miller et al. (2012), Novoalign | 5 reads per locus per individual |
| | Hale et al. [ | Provided by authors (SE) | 277,469 | 89 | | Miller et al. [ | 300–600 | Illumina HiSeq | Perl scripts from Miller et al. (2012), Novocraft | 5 reads per locus per individual |
| | Hohenlohe et al. [ | Online (PE)ᶠ | 77,141 | 147–552ᵃ | | Etter et al. [ | 330–400 | Illumina HiSeq | STACKS | Locus sequenced in 1/60 (2%) individuals after pooling across individuals |
| | Miller et al. [ | Online (SE)ᵍ | 40,649 | 68 | | Baird et al. [ | 200–500 | Illumina HiSeq | Custom-written Perl scripts, Novoalign | Locus sequenced in 3 individuals |
| Atlantic salmon ( | Gonen et al. [ | Provided by authors (PE) | 366,219 | 95 | 65,758 | Etter et al. [ | 250–500 | Illumina HiSeq 2000 | RADtools, STACKS | 500 reads per locus across 96 individuals |
| | Houston et al. [ | Provided by authors (PE) | 66,073ᵇ | 95 | | Baird et al. [ | 250–500 | Illumina GAIIx/HiSeq 2000 | RADtools | 5 reads per allele per locus per individual |
| Lake whitefish ( | Gagnaire et al. [ | Provided by authors (SE) | 193,258 | 69 | 193,258 | Baird et al. [ | 200–500 | Illumina HiSeq 2000 | STACKS | Locus is present in at least one mapping parent |
| Three-spined stickleback ( | Roesti et al. [ | Provided by authors (SE) | 31,118ᶜ | 64 or 138ᵈ | 31,118 | Baird et al. [ | 200–500 | Illumina HiSeq 2000 | Novoalign, SAMtools | 12 reads per locus across 284 individuals |
| Atlantic halibut ( | Palaiokostas et al. [ | Provided by authors (SE) | 83,678 | 96 | 83,678 | Baird et al. [ | 300–550 | Illumina HiSeq 2000 | STACKS | 30 reads per locus per individual |
| Baltic sea herring ( | Corander et al. [ | Online (SE)ʰ | 63,742 | 95 | 63,742 | Baird et al. [ | 200–500 | Illumina HiSeq 2000 | FLORAGENEX unitag assembler v2.0, FLORAGENEX pipeline | 5 reads per locus per individual |
| Spotted gar ( | Amores et al. [ | Provided by authors (SE) | 64,483 | 75 | 64,483 | Miller et al. [ | 200–500 | Illumina GAIIx | STACKS | Locus sequenced in 85 (90%) individuals |
| Gudgeon ( | Kakioka et al. [ | Online (SE)ⁱ | 44,109 | 70 | 44,109 | Etter et al. [ | 300–500 | Illumina GAIIx/HiSeq 2000 | STACKS | 3 reads per locus per individual |
SE: single-end RAD-Seq; PE: paired-end RAD-Seq.
ᵃ Paired-end RAD sequencing generated contigs of variable length.
ᵇ Two files from two families (sequence counts: 70,207 and 70,739), subsequently combined into one file with 66,073 common sequences.
ᶜ 46 files (one per individual), with sequence counts ranging from 25,840 to 42,618, subsequently combined into one file with 31,118 common sequences.
ᵈ Two separate sequencing studies were implemented, resulting in two different read lengths.
ᵉ http://www.g3journal.org/lookup/suppl/doi:10.1534/g3.113.009316/-/DC1
ᶠ http://datadryad.org/resource/doi:10.5061/dryad.32b88
ᵍ http://onlinelibrary.wiley.com/doi:10.1111/j.1365-294X.2011.05305.x/
ʰ doi:10.5061/dryad.jr56h
ⁱ http://www.biomedcentral.com/1471-2164/14/32/additional
Number of RAD locus clusters and interspecific variants identified for each analysis
| Species | Parameters | Analysis pipeline | Minimum taxon coverage | Number of orthologous RAD loci | Number (%) of orthologous RAD loci in genes | Number of variants for relationship estimation | Range of missing interspecific variants in included species | Percentage of missing data in RAxML matrix |
|---|---|---|---|---|---|---|---|---|
| Salmonids | Strict | BLASTN | 5 | 3,050 | 375 (12.3) | 6,959 | NA | 0 |
| Salmonids | Strict | BLASTN | ≥3 | 22,632 | 1,407 (6.2) | 39,890 | 3,135–21,480 | 25.09 |
| All ten species | Relaxed | BLASTN | 10 | 1 | 1 (100.0) | NA | NA | NA |
| All ten species | Relaxed | BLASTN | ≥7 | 137 | 106 (77.4) | 1,440 | 37–745 | 25.50 |
| All ten species | Relaxed | BLASTN | ≥5 | 452 | 321 (71.0) | 4,094 | 371–2,881 | 36.75 |
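The "Number (%) of orthologous RAD loci in genes" column can be cross-checked directly from the counts in the table. The short sketch below is only an arithmetic check, not part of the authors' pipeline; the row labels and the helper name `percent_in_genes` are illustrative assumptions.

```python
# Counts copied from the table above:
# (illustrative label, orthologous RAD loci, loci in genes, reported percentage)
rows = [
    ("Salmonids, coverage 5", 3050, 375, 12.3),
    ("Salmonids, coverage >=3", 22632, 1407, 6.2),
    ("All ten species, coverage 10", 1, 1, 100.0),
    ("All ten species, coverage >=7", 137, 106, 77.4),
    ("All ten species, coverage >=5", 452, 321, 71.0),
]


def percent_in_genes(n_loci, n_in_genes):
    """Share of orthologous RAD loci located in genes, as a percentage to one decimal."""
    return round(100.0 * n_in_genes / n_loci, 1)


# Recompute each percentage and print it next to the reported value.
recomputed = {label: percent_in_genes(n, g) for label, n, g, _ in rows}
for label, _, _, reported in rows:
    print(f"{label}: recomputed {recomputed[label]}%, reported {reported}%")
```

Each recomputed value matches the reported one, so the counts and percentages in the table are internally consistent.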
Figure 1. Expected evolutionary relationships as defined by Near et al. [49] and Shedko et al. [48]. Species images were taken from http://en.wikipedia.org/ or are published for open access use. Divergence times and branch lengths are not drawn to scale. Divergence estimates for the non-salmonid teleost fish species were obtained from Near et al. [49], and divergence estimates for the salmonid species were obtained from Shedko et al. [48].
Figure 2. Example tree of all ten fish species obtained in this study using RAxML. Evolutionary relationships obtained using RAD data in this study were congruent with those of Near et al. [49] (teleost species) and Shedko et al. [48] (salmonid species) (Figure 1). Parameters: RAD loci present in at least five of ten species; 452 loci, 4,094 between-species variants. Branch lengths (given as percentages) estimated in RAxML are given along individual branches (in blue), and node bootstrap support values (1,000 bootstrap replicates) are given at individual nodes (in red). Branch lengths are not drawn to scale.
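The "minimum taxon coverage" criterion (for example, loci present in at least five of ten species) and the resulting share of missing data can be illustrated with a toy sketch. It is an assumption-laden simplification, not the authors' implementation: the function names, species labels and locus names are hypothetical, and missing data are counted here per locus, whereas the table reports missing data for the variant matrix passed to RAxML.

```python
def filter_by_taxon_coverage(locus_presence, min_taxa):
    """Keep only loci observed in at least `min_taxa` species."""
    return {
        locus: taxa
        for locus, taxa in locus_presence.items()
        if len(taxa) >= min_taxa
    }


def percent_missing(locus_presence, all_taxa):
    """Percentage of empty (locus, species) cells in the locus-by-species matrix."""
    total_cells = len(locus_presence) * len(all_taxa)
    if total_cells == 0:
        return 0.0
    present_cells = sum(len(taxa & all_taxa) for taxa in locus_presence.values())
    return round(100.0 * (total_cells - present_cells) / total_cells, 2)


# Hypothetical toy data: which species each orthologous locus was found in.
all_taxa = {"sp1", "sp2", "sp3", "sp4"}
locus_presence = {
    "locus_A": {"sp1", "sp2", "sp3", "sp4"},
    "locus_B": {"sp1", "sp2", "sp3"},
    "locus_C": {"sp1"},
}

kept = filter_by_taxon_coverage(locus_presence, min_taxa=3)
missing = percent_missing(kept, all_taxa)
print(sorted(kept), missing)
```

In this toy case, lowering `min_taxa` retains more loci but leaves more empty cells, which is the same trade-off visible between the ≥7 and ≥5 rows of the second table.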