Matthew A Lemay, David J Donnelly, Michael A Russello.
Abstract
BACKGROUND: High-throughput next-generation sequencing technology has enabled the collection of genome-wide sequence data and revolutionized single nucleotide polymorphism (SNP) discovery in a broad range of species. When analyzed within a population genomics framework, SNP-based genotypic data may be used to investigate questions of evolutionary, ecological, and conservation significance in natural populations of non-model organisms. Kokanee salmon are recently diverged freshwater populations of sockeye salmon (Oncorhynchus nerka) that exhibit reproductive ecotypes (stream-spawning and shore-spawning) in lakes throughout western North America and northeast Asia. Current conservation and management strategies may treat these ecotypes as discrete stocks; however, their recent divergence and low levels of gene flow make in-season genetic stock identification a challenge. The development of genome-wide SNP markers is an essential step towards fine-scale stock identification, and may enable a direct investigation of the genetic basis of ecotype divergence.
Year: 2013 PMID: 23651561 PMCID: PMC3653777 DOI: 10.1186/1471-2164-14-308
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary of next-generation sequence data obtained for each ecotype of Okanagan Lake kokanee
| No. of bases | 371,876,524 | 373,169,057 |
| No. of reads | 1,343,483 | 1,406,375 |
| Mean read length | 276.8 bases | 265.3 bases |
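The mean read length in each column of the table above is simply the number of bases divided by the number of reads. A minimal Python sketch of that check follows; the labels "column 1" and "column 2" are placeholders, since the ecotype headers are not present in the table as given, and the function name is only illustrative.

```python
def mean_read_length(total_bases: int, total_reads: int) -> float:
    """Return the average read length in bases."""
    return total_bases / total_reads


# Values copied from the table; the column labels are placeholders
# because the ecotype headers are missing from the source table.
summary = {
    "column 1": (371_876_524, 1_343_483),
    "column 2": (373_169_057, 1_406_375),
}

for label, (bases, reads) in summary.items():
    print(f"{label}: {mean_read_length(bases, reads):.1f} bases")
```

Rounded to one decimal place, both results agree with the mean read lengths reported in the table (276.8 and 265.3 bases).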
Summary of the contigs present in each kokanee data set
| No. of contigs | 123,547 | 11,074 | 277 | 557 |
| Mean coverage | 7.5 | 37.0 | 6.8 | 8.4 |
| Mean length | 463.7 | 594.8 | 374.2 | 404.2 |
| Mean no. of reads | 14.2 | 77.9 | 8.5 | 12.7 |
1 In the high coverage data set each contig has a minimum length of 200 bases and a minimum of 5× coverage for each ecotype. In addition, this data set has duplicate contigs and contigs that map to pathogen DNA sequences removed.
2 Contigs composed of reads from a single ecotype. Minimum length = 200 bases; minimum coverage = 5×. Contigs that map to pathogen DNA sequences have been removed.
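The criteria in footnote 1 (minimum length of 200 bases, at least 5× coverage in each ecotype, duplicates and pathogen-matching contigs removed) can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' pipeline: the `Contig` fields, the `maps_to_pathogen` flag, and the rule that a duplicate means an identical sequence (with the first copy kept) are all hypothetical, because the source does not say how duplicates were detected or which copy, if any, was retained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contig:
    # All field names are hypothetical; they are not taken from the source.
    name: str
    sequence: str
    coverage_ecotype_1: float  # mean coverage from reads of one ecotype
    coverage_ecotype_2: float  # mean coverage from reads of the other ecotype
    maps_to_pathogen: bool = False


def high_coverage_filter(contigs, min_length=200, min_coverage=5.0):
    """Keep contigs that pass the length, per-ecotype coverage,
    pathogen, and duplicate checks described in footnote 1."""
    kept = []
    seen_sequences = set()
    for contig in contigs:
        if len(contig.sequence) < min_length:
            continue
        if min(contig.coverage_ecotype_1, contig.coverage_ecotype_2) < min_coverage:
            continue
        if contig.maps_to_pathogen:
            continue
        # Assumption: a duplicate is an exact sequence match; keep the first copy.
        if contig.sequence in seen_sequences:
            continue
        seen_sequences.add(contig.sequence)
        kept.append(contig)
    return kept


example = [
    Contig("c1", "A" * 250, 6.0, 9.0),
    Contig("c2", "A" * 150, 20.0, 20.0),        # too short
    Contig("c3", "C" * 300, 12.0, 3.0),         # low coverage in one ecotype
    Contig("c4", "G" * 300, 8.0, 8.0, True),    # matches a pathogen reference
    Contig("c5", "A" * 250, 7.0, 7.0),          # same sequence as c1
]
print([c.name for c in high_coverage_filter(example)])
```

The single-ecotype data sets in footnote 2 would use the same length and pathogen checks, with the coverage threshold applied to the one ecotype that contributed reads.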
Figure 1. Characterization of the contigs present in the high coverage data set. Histograms represent (A) average coverage of each contig (mean = 37.0), (B) number of reads (mean = 77.9), (C) contig lengths (mean = 594.8 bases), and (D) the number of SNPs for each of the high coverage contigs.
Figure 2. Functional annotation of the high coverage contigs. The frequency (%) of each observed gene ontology (GO) term is given for the three GO domains (biological process, cellular component, and molecular function).
Genetic diversity estimates from loci that were successfully genotyped using High Resolution Melt Analysis (HRMA)
| 0.13 / 0.00 * | 0.32 / 0.00 * | 0.19 / 0.05 * | 0.93 (1.00) | 0.80 (0.48) | 0.90 | |
| 0.29 / 0.29 | 0.32 / 0.40 | 0.41 / 0.48 | 0.18 (1.00) | 0.20 (0.46) | 0.28 | |
| 0.47 / 0.57 | 0.43 / 0.61 | 0.47 / 0.57 | 0.74 (1.00) | 0.70 (0.54) | 0.63 | |
| 0.20 / 0.22 | 0.38 / 0.36 | 0.19 / 0.22 | 0.79 (0.52) | 0.75 (1.00) | 0.90 | |
| 0.32 / 0.40 | 0.32 / 0.40 | 0.00 / 0.00 | 0.80 (1.00) | 0.80 (0.77) | 1.00 | |
| 0.43 / 0.52 | 0.44 / 0.47 | 0.16 / 0.17 | 0.78 (0.42) | 0.67 (0.79) | 0.91 | |
| 0.49 / 0.37 | 0.50 / 0.53 | 0.43 / 0.46 | 0.58 (0.33) | 0.58 (0.65) | 0.32 | |
| 0.22 / 0.25 | 0.35 / 0.45 | 0.44 / 0.48 | 0.88 (0.31) | 0.77 (0.85) | 0.67 | |
| 0.50 / 0.61 | 0.50 / 0.25 * | 0.50 / 0.57 | 0.57 (0.35) | 0.56 (0.70) | 0.52 | |
*Denotes significant deviation from HWE following sequential Bonferroni correction.
1 The identity of the major allele is defined as the allele with the highest frequency in the transcriptome data.
+The contig from which this locus was created had some overlap with one other contig (34452). Both contigs were subsequently removed from the high coverage data set. As the overlap did not impact the SNP site, this locus has been retained.
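The sequential Bonferroni correction mentioned in the footnote marked * is commonly implemented as Holm's step-down procedure: the p-values are sorted in ascending order, the smallest is compared with α/m, the next with α/(m − 1), and so on, stopping at the first test that fails. The sketch below assumes that interpretation; the p-values and the α of 0.05 are invented for illustration, since the source reports neither.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans (same order as p_values) marking which
    tests are significant under Holm's sequential Bonferroni procedure."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        # The threshold loosens as tests are rejected: alpha/m, alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # every remaining (larger) p-value is also non-significant
    return significant


# Hypothetical HWE test p-values, one per locus-by-population test.
p_values = [0.001, 0.04, 0.03, 0.2]
print(holm_bonferroni(p_values))
```

With these made-up values only the first test stays significant: 0.001 falls below 0.05/4, but the next smallest p-value, 0.03, exceeds 0.05/3, so the procedure stops there.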
Number of next-generation sequencing reads that aligned to reference sequences from four salmonid pathogens
| 7,393 | 219 | Complete genome | [AM398681] |
| 17 | 0 | Partial ITS1, complete 5.8S rRNA gene, partial ITS2 | [JN230351] |
| 393 | 1 | Mitochondrion, complete genome | [AY534144] |
| 285 | 327 | Glycoprotein (G) and non-virion protein (NV) genes | [IHNGNVJ] |
1 IHNV was identified in the high coverage data set containing reads from both ecotypes and was not expected to show ecotype specificity.
Figure 3. Functional annotation of contigs that were unique to each ecotype. The frequency (%) of each observed gene ontology (GO) term is presented for both ecotypes.