Guangtu Gao, Torfinn Nome, Devon E Pearse, Thomas Moen, Kerry A Naish, Gary H Thorgaard, Sigbjørn Lien, Yniv Palti.
Abstract
Single-nucleotide polymorphisms (SNPs) are highly abundant markers that are broadly distributed in animal genomes. For rainbow trout (Oncorhynchus mykiss), SNP discovery has previously been done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL), and RNA sequencing. Recently we performed high-coverage whole-genome resequencing of 61 unrelated samples representing a wide range of rainbow trout and steelhead populations, with 49 new samples added to 12 aquaculture samples from AquaGen (Norway) that we previously used for SNP discovery. Of the 49 new samples, 11 were doubled haploid lines from Washington State University (WSU) and 38 represented wild and hatchery populations from a wide geographic range and with divergent migratory phenotypes. We then mapped the sequences to the new rainbow trout reference genome assembly (GCA_002163495.1), which is based on the Swanson YY doubled haploid line. Variant calling was conducted with FreeBayes and SAMtools mpileup, followed by filtering of SNPs based on quality score, sequence complexity, read depth at the locus, and number of genotyped samples. Results from the two variant calling programs were compared, and genotypes of the doubled haploid samples were used for detecting and filtering putative paralogous sequence variants (PSVs) and multi-sequence variants (MSVs). Overall, 30,302,087 SNPs were identified on the 29 chromosomes of the rainbow trout genome and 1,139,018 on unplaced scaffolds, with 4,042,723 SNPs having high minor allele frequency (MAF > 0.25). The average SNP density on the chromosomes was one SNP per 64 bp, or 15.6 SNPs per 1 kb. Results from our phylogenetic analysis indicate that the SNP markers contain enough population-specific polymorphisms to recover population relationships despite the small sample size used. Intra-population polymorphism assessment revealed a high level of polymorphism and heterozygosity within each population. We also provide functional annotation based on the genome position of each SNP and evaluate the use of clonal lines for filtering of PSVs and MSVs. These SNPs form a new database, which provides an important resource for the design of a new high-density SNP array and for other SNP genotyping platforms used in genetic and genomic studies of this iconic salmonid fish species.
Keywords: SNP discovery; doubled haploid; genome resequencing; paralogous sequence variants; rainbow trout
Year: 2018 PMID: 29740479 PMCID: PMC5928233 DOI: 10.3389/fgene.2018.00147
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
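The 11 clonal doubled haploid rainbow trout and steelhead lines used for whole-genome resequencing and SNP discovery in this study.
| Line | Sex | Geographic origin | Life history | Domestication status |
| --- | --- | --- | --- | --- |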
| Whale rock male | YY male | Central California Coast | Landlocked steelhead | Wild |
| Whale rock female | XX female | Central California Coast | Landlocked steelhead | Wild |
| Arlee | YY male | Northern California | Resident | Domesticated |
| Hot creek | YY male | Northern California | Resident | Domesticated |
| Oregon State University | XX female | Northern California | Resident | Domesticated |
| Golden | YY male | Northern California | Resident | Domesticated |
| Skookumchuck | YY male | Chehalis River | Winter Steelhead | Semi-wild |
| Klamath | YY male | Williamson River | Possibly resident | Wild |
| Skamania | XX male | Lower Columbia River | Summer steelhead | Semi-wild |
| Touchet | YY male | Walla Walla River | Inland summer steelhead | Wild |
| Clearwater | YY male | Snake River | Inland summer steelhead | Semi-wild |
| Swanson | YY male | Kenai Peninsula, Alaska | Resident | Semi-domesticated |
Information is also provided on the Swanson line, which is the source of the current rainbow trout reference genome.
Washington tributary.
Oregon tributary.
This line is phenotypically male, but it lacks the sdy gene.
Idaho tributary.
The line was established from a fish that was in the second generation of the hatchery program, or two generations removed from the wild origin of this population.
SNP distribution on chromosomes and unplaced scaffolds.
| Chromosome | Length (bp) | Number of SNPs | bp per SNP |
| --- | --- | --- | --- |
| Chromosome 1 | 84,884,017 | 1,359,811 | 62 |
| Chromosome 2 | 85,480,851 | 1,344,028 | 63 |
| Chromosome 3 | 84,937,469 | 1,228,012 | 69 |
| Chromosome 4 | 85,056,421 | 1,418,468 | 59 |
| Chromosome 5 | 92,202,553 | 1,502,172 | 61 |
| Chromosome 6 | 82,930,723 | 1,223,031 | 67 |
| Chromosome 7 | 79,763,776 | 1,256,322 | 63 |
| Chromosome 8 | 83,778,284 | 1,337,003 | 62 |
| Chromosome 9 | 68,467,736 | 1,111,656 | 61 |
| Chromosome 10 | 71,056,191 | 1,102,388 | 64 |
| Chromosome 11 | 80,278,304 | 1,334,513 | 60 |
| Chromosome 12 | 89,655,008 | 1,323,418 | 67 |
| Chromosome 13 | 66,052,243 | 765,805 | 86 |
| Chromosome 14 | 80,358,725 | 1,123,544 | 71 |
| Chromosome 15 | 63,368,167 | 1,007,882 | 62 |
| Chromosome 16 | 70,896,079 | 1,158,190 | 61 |
| Chromosome 17 | 76,527,837 | 1,167,740 | 65 |
| Chromosome 18 | 61,719,220 | 922,176 | 66 |
| Chromosome 19 | 59,576,373 | 972,098 | 61 |
| Chromosome 20 | 41,412,012 | 729,603 | 56 |
| Chromosome 21 | 51,929,587 | 712,355 | 72 |
| Chromosome 22 | 48,550,143 | 919,605 | 52 |
| Chromosome 23 | 49,041,849 | 830,851 | 59 |
| Chromosome 24 | 40,362,479 | 642,785 | 62 |
| Chromosome 25 | 82,601,656 | 1,326,472 | 62 |
| Chromosome 26 | 40,182,520 | 485,126 | 82 |
| Chromosome 27 | 45,316,876 | 688,717 | 65 |
| Chromosome 28 | 40,943,904 | 679,933 | 60 |
| Chromosome 29 | 42,631,536 | 628,383 | 67 |
| Unplaced scaffolds | 229,020,432 | 1,139,018 | 201 |
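The last column of the table can be reproduced from the length and SNP-count columns. The Python sketch below uses four rows copied from the table. The use of floor (integer) division is an assumption inferred from the tabulated values (chromosome 4 works out to about 59.96 bp per SNP but is listed as 59); the source does not state how the values were rounded. The function names are illustrative only.

```python
# Rows copied from the table above: (name, length in bp, number of SNPs).
ROWS = [
    ("Chromosome 4", 85_056_421, 1_418_468),
    ("Chromosome 13", 66_052_243, 765_805),
    ("Chromosome 22", 48_550_143, 919_605),
    ("Unplaced scaffolds", 229_020_432, 1_139_018),
]


def bp_per_snp(length_bp: int, n_snps: int) -> int:
    """Average spacing between SNPs, truncated to whole base pairs.

    Truncation (rather than rounding) is an assumption that matches the table.
    """
    return length_bp // n_snps


def snps_per_kb(length_bp: int, n_snps: int) -> float:
    """SNP density expressed as SNPs per 1,000 bp."""
    return 1000 * n_snps / length_bp


density = {name: bp_per_snp(length, n) for name, length, n in ROWS}

for name, length, n in ROWS:
    print(f"{name}: {density[name]} bp per SNP, "
          f"{snps_per_kb(length, n):.1f} SNPs per kb")
```

The same conversion links the two figures quoted in the abstract: one SNP per 64 bp corresponds to 1000 / 64 ≈ 15.6 SNPs per kb.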
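Number of SNPs remaining after each filtering step that were called by SAMtools mpileup only, by FreeBayes only, by both pipelines at the same site with the same alternative allele (Same site and ALT), and by both pipelines at the same site with different alternative alleles (Same site different ALT).
| Filtering step | mpileup only | FreeBayes only | Same site and ALT | Same site different ALT |
| --- | --- | --- | --- | --- |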
| QUAL | 7,265,023 | 3,580,043 | 43,220,392 (80%) | 7,073 (0.016%) |
| LC | 6,525,505 | 3,014,395 | 42,444,371 (82%) | 6,083 (0.014%) |
| DP | 6,146,820 | 4,137,345 | 39,629,341 (79%) | 4,155 (0.010%) |
| NS | 7,190,818 | 3,054,380 | 34,120,907 (77%) | 2,006 (0.006%) |
| DH | 5,799,376 | 2,855,191 | 31,441,105 (78%) | 1,205 (0.004%) |
The filtering steps are described in the Methods section.
Percentage is taken out of total combined sets.
Percentage is taken out of total SNPs called by both pipelines at the same sites.
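The two percentage definitions in the notes above can be checked against the counts in the table. The following Python sketch recomputes both percentages for every filtering step; the column order follows the caption, and the function and variable names are illustrative rather than taken from the authors' pipeline.

```python
# Counts copied from the table above, per filtering step:
# (mpileup only, FreeBayes only, same site and ALT, same site different ALT)
STEPS = {
    "QUAL": (7_265_023, 3_580_043, 43_220_392, 7_073),
    "LC": (6_525_505, 3_014_395, 42_444_371, 6_083),
    "DP": (6_146_820, 4_137_345, 39_629_341, 4_155),
    "NS": (7_190_818, 3_054_380, 34_120_907, 2_006),
    "DH": (5_799_376, 2_855_191, 31_441_105, 1_205),
}


def concordance(mp_only, fb_only, same_alt, diff_alt):
    """Return (% same site and ALT, % same site different ALT).

    The first percentage is taken out of the combined set of all calls;
    the second out of the SNPs called by both pipelines at the same site.
    """
    combined = mp_only + fb_only + same_alt + diff_alt
    same_site = same_alt + diff_alt
    return 100 * same_alt / combined, 100 * diff_alt / same_site


summary = {step: concordance(*counts) for step, counts in STEPS.items()}

for step, (pct_same, pct_diff) in summary.items():
    print(f"{step}: {pct_same:.0f}% same ALT, {pct_diff:.3f}% different ALT")
```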
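Number of samples genotyped per SNP, and number of SNPs called by SAMtools mpileup and FreeBayes.
| Samples genotyped | SAMtools mpileup | FreeBayes | Both pipelines (same site and ALT) |
| --- | --- | --- | --- |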
| 61 | 31,333,183 | 29,310,209 | 27,561,442 |
| 60 | 3,768,527 | 3,100,183 | 2,733,564 |
| 59 | 1,375,968 | 1,180,917 | 792,937 |
| 58 | 764,008 | 706,192 | 353,162 |
The total number of whole-genome sequenced samples was 61. Up to three missing genotypes per SNP were permitted, but most of the SNPs were discovered without missing genotypes.
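As a consistency check, the columns of this table can be summed and compared with the final (DH) row of the pipeline comparison table. The Python sketch below does that. The column assignment used here (mpileup, FreeBayes, shared final set) is an inference from the caption and from the fact that the totals match; it is not stated explicitly in the source.

```python
# Rows copied from the table above:
# samples genotyped -> (mpileup, FreeBayes, called by both with same ALT).
# The meaning of the three columns is inferred, not stated in the source.
BY_SAMPLES = {
    61: (31_333_183, 29_310_209, 27_561_442),
    60: (3_768_527, 3_100_183, 2_733_564),
    59: (1_375_968, 1_180_917, 792_937),
    58: (764_008, 706_192, 353_162),
}

# Final (DH) row of the pipeline comparison table.
MP_ONLY, FB_ONLY, SAME_ALT, DIFF_ALT = 5_799_376, 2_855_191, 31_441_105, 1_205

# Column-wise totals over the four rows.
mpileup_total, freebayes_total, shared_total = (
    sum(column) for column in zip(*BY_SAMPLES.values())
)

# Share of the final SNP set that has no missing genotypes (all 61 samples).
fraction_complete = BY_SAMPLES[61][2] / shared_total

print(mpileup_total == MP_ONLY + SAME_ALT + DIFF_ALT)
print(freebayes_total == FB_ONLY + SAME_ALT + DIFF_ALT)
print(shared_total == SAME_ALT)
print(f"{100 * fraction_complete:.1f}% of final SNPs have no missing genotypes")
```

Under this reading, roughly 88% of the final SNPs were genotyped in all 61 samples, which agrees with the statement that most SNPs were discovered without missing genotypes.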
Figure 1. Distribution of SNP minor allele frequency (MAF) in a resequencing project of 61 rainbow trout samples (including 11 doubled haploids). The frequency shown is from a total of 122 alleles per SNP, as each sample was represented by two alleles. The exact number of SNPs in each frequency range is shown above the corresponding histogram bar.
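The allele counting behind the MAF values can be sketched as follows. This is a minimal Python illustration, assuming genotypes are coded as the number of alternative alleles per sample (0, 1, or 2) and that missing genotypes are coded as `None`; the coding scheme and function name are hypothetical and are not taken from the authors' pipeline.

```python
N_SAMPLES = 61
N_ALLELES = 2 * N_SAMPLES  # 122 alleles per SNP when no genotype is missing


def minor_allele_frequency(genotypes):
    """MAF from per-sample alternative-allele counts (0, 1, 2; None = missing).

    Each called sample contributes two alleles, including doubled haploids,
    which are expected to be homozygous (0 or 2).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this SNP")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


# Toy SNP: 10 heterozygotes, 3 homozygous-alternative, 48 homozygous-reference.
toy = [1] * 10 + [2] * 3 + [0] * 48
maf = minor_allele_frequency(toy)
print(f"MAF = {maf:.3f}, high-MAF SNP (MAF > 0.25): {maf > 0.25}")
```

With up to three missing genotypes permitted per SNP, the denominator in this sketch drops below 122 for some SNPs; how the figure handles those cases is not specified in the caption.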
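Number and percent of polymorphic SNPs per population or group of samples.
| Population | Type | N | SNPs genotyped in all fish | Polymorphic SNPs | % polymorphic | Average MAF | Average observed heterozygosity |
| --- | --- | --- | --- | --- | --- | --- | --- |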
| Dworshak | H | 4 | 31,356,857 | 8,642,206 | 28 | 0.23 | 0.40 |
| Quinault | H | 4 | 31,336,970 | 9,414,415 | 30 | 0.23 | 0.38 |
| L. Quinault | H | 4 | 31,352,692 | 9,622,312 | 31 | 0.24 | 0.38 |
| Elwha | W | 4 | 31,389,210 | 11,149,740 | 36 | 0.22 | 0.37 |
| Skamania | H | 4 | 31,346,034 | 9,408,390 | 30 | 0.23 | 0.37 |
| Big Creek | W | 4 | 31,071,820 | 11,504,243 | 37 | 0.23 | 0.35 |
| Klamath | W | 4 | 30,742,045 | 10,469,080 | 34 | 0.24 | 0.35 |
| Aquagen | A | 12 | 30,856,284 | 11,908,286 | 39 | 0.21 | 0.32 |
| DH Line | DH | 11 | 29,951,350 | 14,423,126 | 48 | 0.19 | 0.01 |
W, Wild; H, Hatchery; A, Aquaculture; DH, doubled haploid.
Number of fish that were genotyped from that population or group.
Number of SNPs with genotype data for all the fish from that population or group.
Average minor allele frequency (MAF) from all the polymorphic SNPs in each population.
Average observed heterozygosity from all the polymorphic SNPs in each population.
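The per-population statistics defined in the notes above can be illustrated with a small Python sketch. It assumes genotypes coded as alternative-allele counts (0, 1, 2) with no missing data, uses a made-up toy population of four fish, and the function name is hypothetical; only the Dworshak counts at the end come from the table.

```python
from statistics import mean


def population_summary(genotypes_by_snp):
    """Return (% polymorphic SNPs, average MAF, average observed heterozygosity).

    genotypes_by_snp: one list per SNP with an alternative-allele count
    (0, 1, or 2) for every fish in the population. MAF and heterozygosity
    are averaged over the polymorphic SNPs only, as in the table notes.
    """
    polymorphic = [g for g in genotypes_by_snp if 0 < sum(g) < 2 * len(g)]
    if not polymorphic:
        return 0.0, float("nan"), float("nan")
    pct = 100 * len(polymorphic) / len(genotypes_by_snp)
    mafs = [min(sum(g), 2 * len(g) - sum(g)) / (2 * len(g)) for g in polymorphic]
    hets = [sum(1 for x in g if x == 1) / len(g) for g in polymorphic]
    return pct, mean(mafs), mean(hets)


# Toy population of four fish genotyped at four SNPs (made-up values).
toy = [
    [0, 0, 0, 0],  # monomorphic (reference)
    [0, 1, 1, 0],  # polymorphic
    [2, 2, 2, 2],  # monomorphic (alternative)
    [0, 0, 1, 2],  # polymorphic
]
pct_poly, avg_maf, avg_het = population_summary(toy)
print(pct_poly, avg_maf, avg_het)

# Percent polymorphic for Dworshak, recomputed from the table counts.
dworshak_pct = 100 * 8_642_206 / 31_356_857
print(f"Dworshak: {dworshak_pct:.0f}% polymorphic")
```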
Figure 2. A maximum-likelihood phylogenetic tree for 40 rainbow trout sampled from eight populations with a sample size of at least four fish per population. The tree was generated with the SNPhylo pipeline using the program's default thresholds for filtering of the SNP genotype data, and using the DNAML program as implemented in the PHYLIP package. The number at each node represents the bootstrap value (percentage out of 1,000 bootstrap samples, estimated with the R package "phangorn"). The population of origin or geographic location is represented in the sample names, following the population nomenclature in Table 5.
Summary of SnpEff annotation with the number of predicted effects in each effect type specified using the sequence ontology (SO) terms.
| Effect type (SO term) | Putative impact | Number of predicted effects |
| --- | --- | --- |
| stop_gained | HIGH | 7,108 |
| splice_donor_variant | HIGH | 3,068 |
| splice_acceptor_variant | HIGH | 2,765 |
| stop_lost | HIGH | 1,537 |
| start_lost | HIGH | 720 |
| missense_variant | MODERATE | 622,938 |
| splice_region_variant | MODERATE or LOW | 163,143 |
| synonymous_variant | LOW | 733,474 |
| 5_prime_UTR_premature_start_codon_gain_variant | LOW | 58,871 |
| stop_retained_variant | LOW | 796 |
| initiator_codon_variant | LOW | 92 |
| non_coding_transcript_variant | MODIFIER | 32,353,397 |
| intron_variant | MODIFIER | 30,231,828 |
| intergenic_region | MODIFIER | 15,397,666 |
| upstream_gene_variant | MODIFIER | 12,261,411 |
| downstream_gene_variant | MODIFIER | 12,225,390 |
| 3_prime_UTR_variant | MODIFIER | 1,119,901 |
| intragenic_variant | MODIFIER | 400,609 |
| 5_prime_UTR_variant | MODIFIER | 361,306 |
| non_coding_transcript_exon_variant | MODIFIER | 248,075 |
Putative impact of each effect is also included in the table.
Figure 3. Genomic distribution of putative PSVs and MSVs from the analysis of resequencing data from doubled haploid (DH) lines. The number of putative PSVs and MSVs on each chromosome arm is plotted against the chromosome arm length in base pairs. Putative PSVs and MSVs were counted as SNPs that were heterozygous in at least two DH lines (Het > 1) (A), or heterozygous in all 11 lines (Het = 11) (B). The 14 chromosome arms with delayed re-diploidization in the rainbow trout genome are 2p, 3p, 6q, 7p, 10q, 12q, 13p, 13q, 15q, 17p, 18p, 19p, 21p, and 26. Those chromosome arms have a much higher density of putative PSVs and MSVs than the rest of the chromosome arms in the rainbow trout genome.
Effect of the threshold for the minimum number of heterozygous genotypes (Het) among the 11 doubled haploid (DH) lines on the number of putative PSVs and MSVs identified, and on the average observed heterozygosity at those loci among the 50 outbred (non-DH) rainbow trout sampled in this study.
| Het threshold | Putative PSVs and MSVs | Average observed heterozygosity | Loci deviating from HWE (%) |
| --- | --- | --- | --- |
| Het > 0 | 2,767,612 | 0.35 | 28 |
| Het > 1 | 1,733,481 | 0.44 | 39 |
| Het > 2 | 1,187,715 | 0.51 | 49 |
| Het > 3 | 836,278 | 0.57 | 58 |
| Het > 4 | 596,329 | 0.62 | 67 |
| Het > 5 | 429,862 | 0.66 | 75 |
| Het > 6 | 311,629 | 0.71 | 81 |
| Het > 7 | 224,252 | 0.75 | 85 |
| Het > 8 | 155,617 | 0.78 | 89 |
| Het > 9 | 97,433 | 0.82 | 92 |
| Het = 11 | 45,757 | 0.85 | 95 |
A high rate of observed heterozygosity indicates a high occurrence of MSVs and PSVs.
Average observed heterozygosity per locus.
Percent of loci that deviated from expected Hardy-Weinberg equilibrium (P < 0.05).
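The thresholding idea in this table can be sketched in a few lines of Python. Doubled haploid lines are expected to be homozygous at every true single-copy locus, so heterozygous calls in those lines point to reads from paralogous or multi-copy sequence mapping to one position. The genotype coding (0, 1, 2 alternative alleles; `None` for missing), the toy data, and the function names below are assumptions for illustration and do not reproduce the authors' implementation.

```python
def count_het(dh_genotypes):
    """Number of DH lines with a heterozygous call (coded 1) at a locus."""
    return sum(1 for g in dh_genotypes if g == 1)


def is_putative_psv(dh_genotypes, het_greater_than=1):
    """Flag a locus as a putative PSV/MSV when Het > threshold.

    het_greater_than=1 corresponds to the 'Het > 1' row of the table:
    heterozygous in at least two DH lines.
    """
    return count_het(dh_genotypes) > het_greater_than


def observed_heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous at a locus."""
    called = [g for g in genotypes if g is not None]
    return sum(1 for g in called if g == 1) / len(called)


# Toy locus: calls for the 11 DH lines and for the 50 outbred fish.
dh_calls = [0, 1, 0, 0, 1, 0, 2, 0, 0, 1, 0]  # three heterozygous DH lines
outbred_calls = [1] * 30 + [0] * 20           # 30 of 50 heterozygous

print(is_putative_psv(dh_calls))                      # flagged at Het > 1
print(is_putative_psv(dh_calls, het_greater_than=3))  # not flagged at Het > 3
print(observed_heterozygosity(outbred_calls))
```

Raising the threshold keeps fewer loci but enriches for those with excess heterozygosity in the outbred fish, which is the trend shown in the table.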