
Testing advances in molecular discrimination among Chinook salmon life histories: evidence from a blind test.

Michael A Banks, David P Jacobson, Isabelle Meusnier, Carolyn A Greig, Vanessa K Rashbrook, William R Ardren, Christian T Smith, Jeremiah Bernier-Latmani, John Van Sickle, Kathleen G O'Malley.

Abstract

The application of DNA-based markers toward the task of discriminating among alternate salmon runs has evolved in accordance with ongoing genomic developments and increasingly has enabled resolution of which genetic markers associate with important life-history differences. Accurate and efficient identification of the most likely origin for salmon encountered during ocean fisheries, or at salvage from fresh water diversion and monitoring facilities, has far-reaching consequences for improving measures for management, restoration and conservation. Near-real-time provision of high-resolution identity information enables prompt response to changes in encounter rates. We thus continue to develop new tools to provide the greatest statistical power for run identification. As a proof of concept for genetic identification improvements, we conducted simulation and blind tests for 623 known-origin Chinook salmon (Oncorhynchus tshawytscha) to compare and contrast the accuracy of different population sampling baselines and microsatellite loci panels. This test included 35 microsatellite loci (1266 alleles), some known to be associated with specific coding regions of functional significance, such as the circadian rhythm cryptochrome genes, and others not known to be associated with any functional importance. The identification of fall run with unprecedented accuracy was demonstrated. Overall, the top performing panel and baseline (HMSC21) were predicted to have a success rate of 98%, but the blind-test success rate was 84%. Findings for bias or non-bias are discussed to target primary areas for further research and resolution.


Keywords: Oncorhynchus tshawytscha; individual-identification; microsatellites

Substances: Genetic Markers

Year: 2014 PMID: 24628286 PMCID: PMC4112815 DOI: 10.1111/age.12135

Source DB: PubMed Journal: Anim Genet ISSN: 0268-9146 Impact factor: 3.169

Introduction

Salmon are prized globally as a source of high‐quality food. Chinook or King salmon (Oncorhynchus tshawytscha) traditionally has ranked as the most favored salmon species owing to its firm quality and high‐nutrient flesh. Indeed, Chinook salmon was ranked among the top five of 60 wildlife species in an economic valuation of biodiversity (along with elk, moose, humpback whale and bald eagle; Martin‐Lopez et al. 2008). The natural distribution of Chinook extends from Hokkaido Island (Northern Japan) up northerly through Kamchatka, Russia, the Bering Sea, Alaska, to ocean territories west of Canada, Washington, Oregon and California. Today, this species also is spawned and reared in a substantial number of hatcheries distributed across this range and in aquaculture enterprises of Chile, Brazil, Korea and New Zealand, where some naturalized populations have become established. At the southeastern extreme of Chinook's natural distribution, California's Central Valley drainage surfaces as a unique context for this species. Broad availability of extensive habitat combined with consistent cold watering from Sierra snowmelt here has supported development of the most diverse range in life‐history types found anywhere. Thus, there are four primary runs, named fall, late‐fall, winter and spring, after seasonal peaks in numbers of freshwater returns from the ocean (Fisher 1994). Although there is overlap across seasons and essentially gravid Chinook may be found in the river year round, historically the runs occupied spatially segregated spawning habitats. Winter run utilized spring‐fed headwaters, spring run utilized higher elevation streams, late‐fall run utilized mainstem rivers and fall run utilized lower elevation rivers and tributaries (Yoshiyama et al. 2001). 
Today, however, approximately 70% of previously available habitats are impounded by reservoirs or for other uses, raising questions as to how effectively these runs may be able to maintain reproductively isolated breeding groups. These four runs also often occur together during other phases of the Chinook's life cycle, for example as juvenile out‐migrants through the Sacramento/San Joaquin Delta and San Francisco estuary or during ocean‐feeding migration. As migrants through the Delta, juvenile Chinook are exposed to large water export facilities operated by the State of California (State Water Project) and the U.S. Government (Central Valley Project). Some of these salmon subpopulations are listed as endangered (winter run) or threatened (spring run), and there has thus been active interest in developing reliable methods for identifying the run of sampled fish. This motivated early development of molecular and statistical tools for individual assignment, and Central Valley Chinook salmon were among the first salmonids to be individually assigned to run using molecular genetics (Banks et al. 1999, 2000). It now has been over a decade since that baseline was published, and a central goal of our effort has been to develop and upgrade methodologies in order to provide the highest resolution for individual‐based (not population‐based) discrimination among these four runs of Central Valley Chinook salmon. Two primary approaches were addressed: (i) we sought markers directly linked to life‐history traits differing among the runs (such as run timing; O'Malley et al. 2007) and (ii) we employed statistical approaches to assess the relative power of alternate markers for run discrimination (Banks et al. 2003). Research presented here focused on improvements in molecular genetics to discriminate among Chinook salmon of California's Central Valley. Three different microsatellite loci panels were contrasted between two different baseline collections of Chinook salmon.

Methods

Baselines, subpopulation assemblages, sample collection and DNA extraction

This study compared and contrasted two baseline population genetic characterizations of Chinook salmon sampled from California's Central Valley drainage (Fig. 1), hereafter called baselines, and three different microsatellite loci panels. The first baseline collection, the Hatfield Marine Science Center (HMSC) baseline, founded on Banks et al. (2000), included samples that were divided among five reporting groups. Three of the reporting groups corresponded to primary runs (winter, fall and late‐fall), and the other two corresponded to genetically distinct assemblages of spring run: (i) spring run from Butte Creek and (ii) spring run from Deer and Mill Creeks. These samples were assembled among ten 96‐well trays (two for each primary run or reporting group) and included a total of 936 samples: comprising between six and 86 samples for each of nine years and 24 run collections taken from 1991 to 1998 by the California Department of Fish and Game (CDFG) and the U.S. Fish and Wildlife Service (Table 1). The second baseline collection, the Genetic Analysis of Pacific Salmon (GAPS) Consortium baseline, was developed and standardized among 12 fisheries genetics laboratories in the Pacific Northwest (Seeb et al. 2007; Moran et al. 2013) and included a total of nine discrete population samples from California's Central Valley drainage among a total of 166 population samples distributed from California to Alaska. These baseline collections were divided among four reporting groups (the five described in Banks et al. 2000 and depicted in Table 1, except late‐fall). To compare assignment accuracy of these baselines, it was necessary to use common reporting groups. Because the GAPS baseline did not characterize any late‐fall collections from California, fall and late‐fall results derived using the HMSC baseline in the present study were pooled into a single fall–late‐fall reporting group. 
This pooled fall–late‐fall reporting group derived from GAPS and HMSC baselines also included assignments to both spring and fall individuals from the Feather River Hatchery owing to known hybridization between these stocks and difficulty in resolving population identity between them (Banks et al. 2000; Hedgecock et al. 2001).

Figure 1

Rivers and tributaries of California's Central Valley indicating Chinook salmon sampling sites per run and hatcheries.

Table 1

Collection data for California's Central Valley Chinook baseline populations from breeding stocks separated by run timing and location. Hatfield Marine Science Center (HMSC) baselines are characterized at 16 and 21 microsatellite loci respectively; GAPS13 (from Genetic Analysis of Pacific Salmon Consortium) is a different baseline collection characterized at 13 microsatellite loci

	HMSC16 and HMSC21 baselines				GAPS13 baseline
Run	Year	Sampling location	Life stage	n	Year	Sampling location	Life stage	n
Winter	1991	Keswick & Red Bluff Dams	Adult	17	1992–1995	Keswick & Red Bluff Dams	Adult	56
	1992	Keswick Dam	Adult	29	1997	Keswick Dam	Adult	3
	1993	Keswick & Red Bluff Dams	Adult	9	1998	Keswick Dam	Adult	17
	1994	Keswick Dam	Adult	24	2001	Keswick Dam	Adult	35
	1995	Keswick Dam	Adult	25	2003	Keswick Dam	Adult	10
	1998	Keswick Dam	Adult	87	2004	Keswick Dam	Adult	15
	Total			191				136
Spring (Butte Creek)	1994	Butte Creek	Spawned carcass	50	2002	Butte Creek	Adult	61
	1996	Butte Creek	Spawned carcass	12	2003	Butte Creek	Adult	83
	1997	Butte Creek	Spawned carcass	60
	1998	Butte Creek	Spawned carcass	62
	Total			184				144
Spring (Deer & Mill Creeks)	1994	Deer Creek	Juvenile	12	2002	Deer Creek	Adult	53
	1995	Deer Creek	Spawned carcass	13	2002	Mill Creek	Adult	71
	1995	Mill Creek	Spawned carcass	10	2003	Mill Creek	Adult	20
	1996	Deer Creek	Juvenile	68
	1996	Mill Creek	Juvenile	12
	1997	Deer Creek	Spawned carcass	38
	1998	Deer Creek	Spawned carcass	26
	1998	Mill Creek	Spawned carcass	6
	Total			185				144
Fall	1995	Nimbus Hatchery	Adult	75	2002	Battle Creek	Adult	67
	1995	Mokelumne Hatchery	Adult	67	2003	Battle Creek	Adult	77
	1995	Merced Hatchery	Adult	48	2003	Feather Hatchery	Adult	144
					2002	Stanislaus River	Adult	76
					2002	Tuolumne River	Adult	68
	Total			190				432
Late‐fall	1993	Keswick Dam & Battle Creek	Adult	72		Not sampled
	1995	Coleman National Fish Hatchery	Adult	90
	1995	Keswick Dam	Adult	24
	Total			186

Although the 100%, jackknife and leave‐one‐out simulations available in population assignment applications may be useful for predicting the accuracy and precision provided by various genetic baselines, they also may provide biased or overly optimistic indications. It is thus ideal to include samples of known origin, or ‘blind samples’, when evaluating assignment power. For this purpose, a total of 750 tissue samples from Chinook salmon of known life history stored in the CDFG tissue archive were coded (to mask their identity), enabling a blind test of the assignment accuracy of three alternate microsatellite panels. DNA extraction of blind‐test samples followed a silica‐based method utilizing multichannel pipettes; PALL glass fiber filtration plates; and buffer, centrifuge and transfer protocols described in Ivanova et al. (2006).

Microsatellite loci characterization

Baseline and blind‐test samples were characterized utilizing three microsatellite panels, following the amplification protocols detailed in the references cited. GAPS13 (from Seeb et al. 2007) included: Ogo‐2, ‐4 (Olsen et al. 1998); Oki100 (Canadian Department of Fisheries and Oceans, unpublished); Omm1080 (Rexroad et al. 2001); Ots‐3M (Greig & Banks 1999); Ots‐9 (Banks et al. 1999); Ots‐201b, ‐208b, ‐211, ‐212, ‐213 (Greig et al. 2003); OtsG474 (Williamson et al. 2002); and Ssa408 (Cairney et al. 2000). HMSC16 (from Banks & Jacobson 2004) included: Ots‐104, ‐107 (Nelson & Beacham 1999); Ots‐201b, ‐208b, ‐209, ‐211, ‐212, ‐215 (Greig et al. 2003); Ots‐G78b, ‐G83b, ‐G249, ‐G253, ‐G311, ‐G422, ‐G409 (Williamson et al. 2002); and Ots‐515 (Naish & Park 2002). HMSC21 included the above 16 loci as well as an additional five microsatellites derived from research characterizing alternate copies of the circadian rhythm transcription factor cryptochrome: Cry2b.1, Cry2b.2, Cry3 (O'Malley et al. 2010), Ots‐701 (GenBank Accession no. KF163438) and Ots‐702 (GenBank Accession no. KF163440). Alternate alleles were resolved through electrophoresis utilizing an Applied Biosystems (ABI) 3730xl DNA analyzer and scored using ABI GeneMapper software (Version 4).

Standardization of the HMSC baseline with the Abernathy Fish Technology Center

The standardization methods developed by the GAPS group (Seeb et al. 2007) were employed to standardize amplification, electrophoresis, allele nomenclature and scoring between the HMSC and the Abernathy Fish Technology Center (AFTC) laboratories. Briefly, this exercise involved sharing and evaluating three independent, coded 96‐well plates containing Chinook salmon DNA samples. Bin‐definition plate 1 was passed from HMSC to AFTC along with genotype data. AFTC amplified and analyzed these samples in their laboratory using an ABI 3130 DNA Sequencer to enable AFTC allele bin calibration and scoring with HMSC allele nomenclature. Test plate 1/bin‐definition plate 2 was passed from HMSC to AFTC without any genotype data. AFTC analyzed these samples and reported results back to HMSC to assess standardization. Test plate 2/bin‐definition plate 3 was passed from HMSC to AFTC without genotype data. AFTC analyzed these samples and reported results to HMSC for final assessment of standardization among laboratories.

Assignment and statistical analysis

Given that numbers of fall and late‐fall migrants substantially exceed those from winter and spring runs in most scenarios in the lower reaches of the Sacramento River or the NW Pacific Ocean, simulations performed to test for precision and accuracy were designed to approximate these relative abundance differences. This was achieved by utilizing the ‘realistic fishery’ option within the statistical package oncor (Kalinowski 2008; www.montana.edu/kalinowski/Software/ONCOR.htm). Note that this technique utilizes a cross‐validation over gene copies method demonstrated to be less prone to providing over‐optimistic estimates of assignment power than earlier methods (Anderson et al. 2008; Anderson 2010). For HMSC baselines, parameters were set to construct 1000 hypothetical mixtures of 100 individuals each, using a 0.97 fraction for the fall–late‐fall reporting group and a 0.01 fraction each for the winter, the spring from Butte Creek and the spring from Deer and Mill Creeks reporting groups. For the GAPS13 baseline, parameters were set to construct 1000 hypothetical mixtures of 100 individuals each, using a 0.2475 fraction for Battle Creek fall, 0.2375 for Butte Creek fall, 0.2375 for Feather River Hatchery fall and 0.2375 for Stanislaus River fall. The GAPS13 simulation therefore had the same total 0.97 fraction for the fall‐run reporting group, 0.01 for the Butte Creek spring, 0.01 for the Deer Creek spring, 0.00 for the Feather River Hatchery spring and 0.01 for the winter reporting groups. Complete multilocus data were required for blind‐test samples, with an allowance of up to three missing loci for each of the three microsatellite panels. Run identities were assessed utilizing oncor's ‘assign individual to baseline population’ option, and each individual was assigned to the reporting group for which it had the greatest probability (no probability cutoff was applied). 
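
The mixture proportions, the missing‐locus allowance and the assignment rule described above can be sketched in a few lines of Python. This is an illustrative sketch only and not oncor's implementation: the names `HMSC_FRACTIONS`, `draw_mixture`, `passes_missing_filter` and `assign_run` are hypothetical, and the per‐individual group probabilities are assumed to be supplied by the assignment software.

```python
import random

# Mixture fractions stated in the text for the HMSC baselines.
HMSC_FRACTIONS = {
    "fall-late-fall": 0.97,
    "winter": 0.01,
    "spring Butte Creek": 0.01,
    "spring Deer and Mill Creeks": 0.01,
}

MAX_MISSING_LOCI = 3  # allowance stated in the text


def draw_mixture(fractions, size=100, rng=None):
    """Draw the true reporting group of each fish in one hypothetical mixture."""
    rng = rng or random.Random()
    groups = list(fractions)
    weights = [fractions[g] for g in groups]
    return rng.choices(groups, weights=weights, k=size)


def passes_missing_filter(genotype, max_missing=MAX_MISSING_LOCI):
    """genotype maps locus name -> allele pair, or None when the locus failed."""
    missing = sum(1 for alleles in genotype.values() if alleles is None)
    return missing <= max_missing


def assign_run(probabilities):
    """Assign to the reporting group with the greatest probability (no cutoff)."""
    return max(probabilities, key=probabilities.get)
```

Repeating `draw_mixture` 1000 times would mirror the number of hypothetical mixtures described above; how genotypes are then simulated for each fish is left to the assignment software and is not modelled here.
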
Lower and upper 95% confidence intervals for realistic results from simulation studies were calculated using standard methods (P ± 1.96 * standard error; Sokal & Rohlf 1995). We cross‐tabulated the counts of the 750 blind‐test samples correctly (true) versus incorrectly (false) identified by each possible pair of panels, separately for each run. Because both panels of each pair were identifying the same set of samples, their correct identification proportions were not independent. Thus, we used an exact version of McNemar's test (Agresti 2002; Zar 2010) for each pair of panels to test for the equality of those proportions.
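
The two calculations named in this paragraph are simple enough to sketch directly. The following is a minimal Python illustration, not the code used in the study: the function names are hypothetical, the standard error is taken as given (the text does not state how it was obtained), and the exact McNemar P‐value is computed as a two‐sided binomial test on the discordant counts.

```python
from math import comb


def wald_ci(p, se, z=1.96):
    """95% interval as P +/- 1.96 * standard error (same scale as p)."""
    return p - z * se, p + z * se


def mcnemar_exact(b, c):
    """Two-sided exact McNemar P-value from the two discordant counts.

    b: samples classified correctly by the first panel only
    c: samples classified correctly by the second panel only
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Applied to the discordant counts in Table 5, `mcnemar_exact(5, 0)` (winter), `mcnemar_exact(2, 0)` (spring from Butte Creek) and `mcnemar_exact(1, 4)` (fall) return 0.0625, 0.5 and 0.375, the P‐values reported there.
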

Results

Standardization results indicated that AFTC and HMSC allele scores averaged 97% identical for test plate 1 and 98% identical for test plate 2 (Table 2). One locus, Ots‐208b, consistently scored below the 90% identity threshold set by the GAPS Consortium (Seeb et al. 2007). Concordance between laboratories for the remaining loci was at least 90%, indicating that these loci had been successfully standardized.

Table 2

Percentage agreement in allele scoring between Abernathy Fish Technology Center and Hatfield Marine Science Center (HMSC) for microsatellite panel HMSC16

Locus	Test plate 1	Test plate 2
Ots‐104	95.9	99.4
Ots‐107	100	98.8
Ots‐201b	98.8	99.4
Ots‐208b	88.3	87.7
Ots‐209	97.7	97.1
Ots‐211	96	100
Ots‐212	99.4	98.9
Ots‐215	100	100
Ots‐249	99.4	97.8
Ots‐253b	92.5	98.9
Ots‐515	92.3	94.8
Ots‐G311	99.2	99.3
Ots‐G409	94.9	99.4
Ots‐G422	100	100
Ots‐G78B	94.4	100
Ots‐G83B	100	99.4
Average	96.8	98.2
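
The 90% concordance check and the plate averages can be recomputed from the values in Table 2. The short Python sketch below is illustrative only; the dictionary and function names are hypothetical, and locus labels are written with plain hyphens.

```python
# Percentage agreement per locus from Table 2: (test plate 1, test plate 2).
AGREEMENT = {
    "Ots-104": (95.9, 99.4),
    "Ots-107": (100.0, 98.8),
    "Ots-201b": (98.8, 99.4),
    "Ots-208b": (88.3, 87.7),
    "Ots-209": (97.7, 97.1),
    "Ots-211": (96.0, 100.0),
    "Ots-212": (99.4, 98.9),
    "Ots-215": (100.0, 100.0),
    "Ots-249": (99.4, 97.8),
    "Ots-253b": (92.5, 98.9),
    "Ots-515": (92.3, 94.8),
    "Ots-G311": (99.2, 99.3),
    "Ots-G409": (94.9, 99.4),
    "Ots-G422": (100.0, 100.0),
    "Ots-G78B": (94.4, 100.0),
    "Ots-G83B": (100.0, 99.4),
}

THRESHOLD = 90.0  # GAPS Consortium identity threshold cited in the text


def below_threshold(agreement, threshold=THRESHOLD):
    """Loci whose agreement falls below the threshold on any test plate."""
    return [locus for locus, plates in agreement.items()
            if any(value < threshold for value in plates)]


def plate_means(agreement):
    """Mean percentage agreement for each test plate, rounded to one decimal."""
    n = len(agreement)
    plate1 = sum(p1 for p1, _ in agreement.values()) / n
    plate2 = sum(p2 for _, p2 in agreement.values()) / n
    return round(plate1, 1), round(plate2, 1)
```

With these inputs, `below_threshold(AGREEMENT)` returns only Ots‐208b, and `plate_means(AGREEMENT)` returns the 96.8 and 98.2 averages given in the last row of the table.
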

Realistic fishery simulation results indicated strong potential for correct assignment (largely in the 90% range) for each of the three microsatellite panels (Table 3 and Fig. 2). Consistent ranking among the three panels also was apparent from simulation results, with correct assignment rates ranging from 70% to 100% (GAPS13), 90% to 100% (HMSC16) and 96% to 100% (HMSC21). Non‐overlapping 95% confidence intervals reinforce findings that (i) correct assignments for spring from Butte Creek were higher for HMSC16 and HMSC21 than for GAPS13; (ii) correct assignments for spring from Deer and Mill Creeks increased in the order GAPS13, HMSC16, HMSC21; and (iii) HMSC16 ranked lower than did GAPS13 and HMSC21 for pooled fall and late‐fall assignments. Finally, all run assignment averages for both HMSC16 and HMSC21 were higher than for GAPS13.

Table 3

Summary percentage correct results of realistic fishery simulations assessed at each of the three baselines for populations: W, winter; SB, spring from Butte Creek; SDM, spring from Deer and Mill Creeks; F‐LF, fall and late‐fall

	GAPS	HMSC16	HMSC21
W	100	100	100
SB	87.2 (83.6, 90.9)	98.4 (97.1, 99.8)	99.1 (98.1, 100.1)
SDM	69.7 (66.3, 73.2)	89.9 (86.6, 93.3)	95.8 (93.5, 98.0)
F‐LF	99.2 (99.1, 99.3)	97.9 (97.8, 98.1)	99.2 (99.1, 99.3)
Ave	89	96.6	98.5

GAPS, Genetic Analysis of Pacific Salmon Consortium; HMSC, Hatfield Marine Science Center.
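
The non‐overlap comparisons drawn from Table 3 can be checked mechanically. The Python sketch below is illustrative only and uses hypothetical names; it treats non‐overlap of two 95% intervals as the criterion, which is a conservative screen rather than a formal test of a difference.

```python
from itertools import combinations

# 95% confidence intervals (percent correct) from Table 3.
CI = {
    "SB": {"GAPS13": (83.6, 90.9), "HMSC16": (97.1, 99.8), "HMSC21": (98.1, 100.1)},
    "SDM": {"GAPS13": (66.3, 73.2), "HMSC16": (86.6, 93.3), "HMSC21": (93.5, 98.0)},
    "F-LF": {"GAPS13": (99.1, 99.3), "HMSC16": (97.8, 98.1), "HMSC21": (99.1, 99.3)},
}


def overlaps(a, b):
    """True when closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def separated_pairs(intervals):
    """Pairs of panels whose confidence intervals do not overlap."""
    return [(x, y) for x, y in combinations(intervals, 2)
            if not overlaps(intervals[x], intervals[y])]
```

For example, `separated_pairs(CI["SDM"])` lists all three panel pairs, whereas `separated_pairs(CI["SB"])` separates GAPS13 from both HMSC panels but not HMSC16 from HMSC21.
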

Figure 2

Blind‐test (n = 623) and simulation correct assignment results (n = 1000 for winter and spring reporting groups) among California Central Valley Chinook salmon calculated using oncor (Kalinowski 2008) and assessed using three different microsatellite panels. Bars on simulations indicate 95% confidence intervals. Chinook salmon runs are indicated as follows: F&LF, pooled fall and late‐fall runs; SB, spring from Butte Creek; SMD, spring from Mill and Deer Creeks; W, winter.

Blind test of actual power (inferred from 623 known‐identity samples) indicated that simulation results generally were upwardly biased but affirmed parallel relative rankings across runs and microsatellite panels (Fig. 2). Fewer winter run, spring from Butte Creek and spring from Deer and Mill Creeks assignments were correct than predicted. Fall‐run blind‐test assignments matched simulation estimates most closely. The average realistic fishery simulation rankings of the microsatellite panels (HMSC21 best at 98.5%, HMSC16 next at 96.6% and GAPS13 lowest at 87.7%) were supported by blind‐test assignment accuracies of 84.2% (HMSC21), 83.8% (HMSC16) and 79.8% (GAPS13) (Table 4). There is some evidence that HMSC16 and HMSC21 winter blind‐test assignments were more often correct than were those of GAPS13 (McNemar's test, P = 0.0625; Table 5). However, we found no differences in the classification success rates of the three panels for any of the other runs (spring from Butte Creek, fall and spring from Deer and Mill Creeks). In particular, HMSC16 and HMSC21 had identical classification success for all blind‐test fish except those in the fall run (Table 5). Allele frequency data utilized in this study are available at OSU Scholars Archive (doi: 10.7267/N9KW5CXX).

Table 4

Summary results of percentage correct assignment for each baseline from blind‐test samples (Blind) and simulations (Sims) for populations: W, winter; SB, spring from Butte Creek; SDM, spring from Deer and Mill Creeks; F‐LF, fall and late‐fall

	GAPS		HMSC16		HMSC21
	Blind	Sims	Blind	Sims	Blind	Sims
W	92.61	100.0	95.45	100.00	95.45	100.00
SB	76.92	87.24	92.31	98.46	92.31	99.09
SDM	50.00	69.75	50.00	89.92	50.00	95.76
F‐LF	99.72	93.80	97.45	97.94	99.07	99.24
Ave	79.81	87.70	83.80	96.58	84.21	98.52

GAPS, Genetic Analysis of Pacific Salmon Consortium; HMSC, Hatfield Marine Science Center.

Table 5

Comparisons of microsatellite panels in their classification success for three true runs. T denotes an accurately classified fish, and F denotes an error. P‐values are for McNemar's test of equality in the proportions accurately classified by two panels. Spring run from Deer and Mill Creeks not shown because all three panels had identical classification success

	True run winter (n = 176)				True run spring from Butte Creek (n = 13)				True run fall (n = 432)
	H16‐F	H16‐T	P		H16‐F	H16‐T	P		H16‐F	H16‐T	P
G13‐F	8	5		G13‐F	1	2		G13‐F	1	1
G13‐T	0	163	0.0625	G13‐T	0	10	0.5	G13‐T	4	426	0.375

G13, Genetic Analysis of Pacific Salmon Consortium panel; H16, Hatfield Marine Science Center 16 microsatellite panel; H21, Hatfield Marine Science Center 21 microsatellite panel.


Discussion

Noting that this study focused on discrimination among closely related Chinook salmon runs from the same primary watershed (that have lost 70% of their historic habitat for spatial segregation), a 98% overall correct assignment prediction from simulations and blind‐test affirmation at 84% correct is astonishing. Similarly, promising overall results have been obtained for Sockeye salmon (Beacham et al. 2005), cod (Glover et al. 2010), cow (Van de Goor et al. 2011), sheep (Niu et al. 2011) and cats (Kurushima et al. 2012). Indeed, HMSC21 blind‐test correct assignment averages of 99% (fall), 95% (winter) and 92% (spring from Butte Creek) are especially encouraging given the importance of accurate identification for endangered winter and threatened spring run life histories (NMFS 2009). These particular blind‐test results were in close agreement with predictions for simulations [fall: 99% (blind) and 99% (simulations); winter: 95% (blind) and 100% (simulations); spring from Butte Creek: 92% (blind) and 99% (simulations)] (Table 6). This general agreement also is very positive because previous simulation methods have suffered from upward bias in their assessment of most likely assignment power (Anderson 2010).

Table 6

Blind‐test result for 623 Chinook salmon. Rows indicate actual known identity; columns indicate where they were assigned by three microsatellite panels: G, GAPS (Genetic Analysis of Pacific Salmon Consortium) or H, HMSC (Hatfield Marine Science Center)

Run	Winter (W)			Spring from Butte Creek (SB)			Spring from Deer & Mill Creeks (SDM)			Fall (F)			Total
Run	G13	H16	H21	G13	H16	H21	G13	H16	H21	G13	H16	H21	Actual
W	163	168	168	2	1	1	0	1	1	11	6	6	176
SB	0	0	0	10	12	12	1	0	0	2	1	1	13
SDM	0	0	0	0	0	0	1	1	1	1	1	1	2
F‐LF	1	1	1	1	2	1	0	2	4	430	427	426	432
													623

W, winter; SB, spring from Butte Creek; SDM, spring from Deer and Mill Creeks; F‐LF, fall and late‐fall.
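
The per‐run percentage correct follows directly from the counts in Table 6: the diagonal count for a run divided by the number of fish of that known run. The Python sketch below is illustrative only, with hypothetical names; the counts are transcribed from the table.

```python
RUNS = ["W", "SB", "SDM", "F-LF"]

# Table 6 counts: ASSIGNED[panel][actual run] lists the number of fish
# assigned to W, SB, SDM and F, in that order.
ASSIGNED = {
    "G13": {"W": [163, 2, 0, 11], "SB": [0, 10, 1, 2],
            "SDM": [0, 0, 1, 1], "F-LF": [1, 1, 0, 430]},
    "H16": {"W": [168, 1, 1, 6], "SB": [0, 12, 0, 1],
            "SDM": [0, 0, 1, 1], "F-LF": [1, 2, 2, 427]},
    "H21": {"W": [168, 1, 1, 6], "SB": [0, 12, 0, 1],
            "SDM": [0, 0, 1, 1], "F-LF": [1, 1, 4, 426]},
}


def percent_correct(panel, run):
    """Percentage of fish of a known run that the panel assigned to that run."""
    row = ASSIGNED[panel][run]
    return 100.0 * row[RUNS.index(run)] / sum(row)


def total_fish(panel):
    """Total number of blind-test fish tabulated for a panel."""
    return sum(sum(row) for row in ASSIGNED[panel].values())
```

For the winter and both spring rows this reproduces the Blind columns of Table 4, for example `round(percent_correct("G13", "W"), 2)` gives 92.61 and `round(percent_correct("H16", "SB"), 2)` gives 92.31, and each panel's counts sum to 623.
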

The wide difference between simulation prediction (96%) and blind‐test findings for spring run from Deer and Mill Creeks (50%) for all three baselines, however, indicates that this upward bias for simulation methods has not been completely eradicated. There are only two samples of known spring Deer and/or Mill Creeks origin among the 623 samples considered in the blind test. This small sample size tempts one to suggest that the observed difference between simulation and blind‐test findings likely results from chance. We suggest, however, that tests with similarly small sample sizes are appropriate because threatened and endangered species by definition are always scarce. Identification applications commonly occur in contexts where endangered species are markedly outnumbered by their more abundant counterparts (such as the large fall and late‐fall Chinook salmon runs in the current case). Although the cross‐validation methods introduced by Anderson et al. (2008) and ‘realistic fishery’ algorithms available in oncor (Kalinowski 2008) have begun to overcome the upward bias problem, results obtained here for spring run from Mill and Deer Creeks demonstrate that shortfalls still exist in our ability to employ simulation methods to accurately predict most likely assignment power among closely related runs.

An earlier iteration of data for this blind test had a total n = 532. These 532 known‐identity fish, however, happened to contain only one sample from Deer and Mill Creeks and 12 samples from Butte Creek spring runs, yet the three baselines correctly assigned all 13 of these spring samples to their known origin, except that GAPS13 misassigned two of the 12 spring samples from Butte Creek. Thus, 100% [and 83% for Butte Creek (GAPS13)] correct blind‐test results for both spring run subpopulations were in closer agreement with simulation predictions and did not show any upward bias. Because both spring run subpopulations were represented by few samples among the first 532 blind‐test samples, we returned to the original 750 blind‐test samples to derive more data. This increased our total number (n) to 623 but did not substantially increase the numbers of spring run in the blind test. These results underscore the importance of using data that are separate from those used to train a classification process in evaluating the accuracy of that process (Anderson 2010).

No samples from any late‐fall run were included in the GAPS13 baseline; however, blind‐test and simulation results for late‐fall run in the HMSC baselines provided further information with regard to bias. The blind sample of 623 had a total of 77 samples from late‐fall run (data not shown). Simulation tests predicted a 91% success rate for late‐fall, yet the blind‐test score was only 44% correct. This was not unexpected considering that fall and late‐fall runs are the most closely related among all Central Valley population pairs (fall–late‐fall pairwise Fst = 0.02 vs. average Fst for all subpopulations = 0.08). Indeed, late‐fall‐run misassignments were largely to fall run. Note, however, that an n = 77 for late‐fall samples is no longer small, yet this run had the highest upward bias observed between simulation and blind‐test results. In contrast, this upward bias of simulation prediction was not observed for fall run. Considering fall and late‐fall runs separately, the n = 623 blind test had 157 fall‐run samples, of which 153 (97%) were correctly identified by HMSC21, in exact agreement with the simulation prediction of 97%.

Comparing results attained from different microsatellite panels, the overall increase in correct assignment ranking from GAPS13 to HMSC16 to HMSC21 paralleled the increasing number of loci, as observed in other studies (Bjørnstad & Røed 2002; Bamshad et al. 2003; Tadano et al. 2008). This is supported by consistent ranking results from simulation tests for each of the runs (except GAPS13, which switched to second place for combined fall–late‐fall simulation assignments) and by marginal McNemar support for the same increase in blind‐test assignment ranking from 13 to 16 to 21 loci. However, despite consistent top performance for HMSC21, margins separating results were not sufficient to prove this statistically. Although HMSC16 and HMSC21 panel performances are largely the same for the blind test, simulations indicate the increased value of additional loci for discrimination among fall and spring runs (Fig. 2). This and fall–late‐fall discrimination remain areas of greatest challenge in addressing accuracy for individual‐based population assignment among California's Central Valley Chinook salmon. However, fall‐run identification across all baselines and microsatellite panels (including both blind‐test and simulation results) was high (average 98% correct). This level of success is a first and likely has strong application potential.

Regionally, California's Central Valley Chinook salmon returns have been disturbingly low in recent years. Precipitously low numbers of Central Valley fall‐run Chinook salmon were the primary driving force for a complete ocean fishery closure for 2008 and 2009 (NMFS 2009). This situation had significant negative economic consequences for the region and motivates continued efforts, such as the molecular and statistical methods covered here, to better quantify accuracy for individual‐based population identity determination for improved management, monitoring and conservation.

References (17 in total; 10 listed)

1. Isolation and cross species amplification of microsatellite loci useful for study of Pacific salmon.

Authors: R J Nelson; T D Beacham
Journal: Anim Genet Date: 1999-06 Impact factor: 3.169

2. Five multiplexed microsatellite loci for rapid response run identification of California's endangered winter chinook salmon.

Authors: C Greig; M A Banks
Journal: Anim Genet Date: 1999-08 Impact factor: 3.169

3. Evaluation of factors affecting individual assignment precision using microsatellite data from horse breeds and simulated breed crosses.

Authors: G Bjørnstad; K H Røed
Journal: Anim Genet Date: 2002-08 Impact factor: 3.169

4. Human population genetic structure and inference of group membership.

5. Which genetic loci have greater population assignment power?

6. High accuracy of genetic discrimination among chicken lines obtained through an individual assignment test.

7. Economic valuation of biodiversity conservation: the meaning of numbers.

8. Assessing the power of informative subsets of loci for population assignment: standard methods are upwardly biased.

9. Thirty-five polymorphic microsatellite markers for rainbow trout (Oncorhynchus mykiss).

10. Candidate loci reveal genetic differentiation between temporally divergent migratory runs of Chinook salmon (Oncorhynchus tshawytscha).

Authors: Kathleen G O'Malley; Mark D Camara; Michael A Banks
Journal: Mol Ecol Date: 2007-10-30 Impact factor: 6.185