Michele Balik-Meisner, Lisa Truong, Elizabeth H Scholl, Robert L Tanguay, David M Reif.
Abstract
Toxicological and pharmacological researchers have seized upon the many benefits of zebrafish, including the short generation time, well-characterized development, and early maturation as clear embryos. A major difference from many model organisms is that standard husbandry practices in zebrafish are designed to maintain population diversity. While this diversity is attractive for translational applications in human and ecological health, it raises critical questions on how interindividual genetic variation might contribute to chemical exposure or disease susceptibility differences. Findings from pooled samples of zebrafish support this supposition of diversity yet cannot directly measure allele frequencies for reference versus alternate alleles. Using the Tanguay lab Tropical 5D zebrafish line (T5D), we performed whole genome sequencing on a large group (n = 276) of individual zebrafish embryos. Paired-end reads were collected on an Illumina 3000HT, then aligned to the most recent zebrafish reference genome (GRCz10). These data were used to compare observed population genetic variation across species (humans, mice, zebrafish), then across lines within zebrafish. We found more single nucleotide polymorphisms (SNPs) in T5D than have been reported in SNP databases for any of the WIK, TU, TL, or AB lines. We theorize that some subset of the novel SNPs may be shared with other zebrafish lines but have not been identified in other studies due to the limitations of capturing population diversity in pooled sequencing strategies. We establish T5D as a model that is representative of diversity levels within laboratory zebrafish lines and demonstrate that experimental design and analysis can exert major effects when characterizing genetic diversity in heterogeneous populations.
Year: 2018 PMID: 29368091 PMCID: PMC5851690 DOI: 10.1007/s00335-018-9735-x
Source DB: PubMed Journal: Mamm Genome ISSN: 0938-8990 Impact factor: 2.957
Fig. 1 Known variants. a Genome size, known variant count in dbSNP, variant effect, and consequences of transcript variants. The red box contains the variant effects for the 20.1 M SNPs found in T5D. (All other zebrafish data refer to the reference genome and publicly available data.) b Allele frequency spectrum for common human variants. c Number of models per disease category stacked by organism (from https://monarchinitiative.org). d Number of phenotype-gene associations per species (from monarchinitiative.org)
SNP count per chromosome
| Chromosome | SNP count | Chromosome length (bp) | SNP percentage |
|---|---|---|---|
| 1 | 937,216 | 58,871,917 | 1.59 |
| 2 | 992,016 | 59,543,403 | 1.67 |
| 3 | 903,306 | 62,385,949 | 1.45 |
| 4 | 593,111 | 76,625,712 | 0.77 |
| 5 | 1,123,780 | 71,715,914 | 1.57 |
| 6 | 1,010,933 | 60,272,633 | 1.68 |
| 7 | 1,071,615 | 74,082,188 | 1.45 |
| 8 | 796,793 | 54,191,831 | 1.47 |
| 9 | 928,007 | 56,892,771 | 1.63 |
| 10 | 695,209 | 45,574,255 | 1.53 |
| 11 | 664,127 | 45,107,271 | 1.47 |
| 12 | 708,967 | 49,229,541 | 1.44 |
| 13 | 789,917 | 51,780,250 | 1.53 |
| 14 | 894,307 | 51,944,548 | 1.72 |
| 15 | 756,502 | 47,771,147 | 1.58 |
| 16 | 861,924 | 55,381,981 | 1.56 |
| 17 | 860,076 | 53,345,113 | 1.61 |
| 18 | 817,743 | 51,008,593 | 1.60 |
| 19 | 783,706 | 48,790,377 | 1.61 |
| 20 | 854,683 | 55,370,968 | 1.54 |
| 21 | 726,643 | 45,895,719 | 1.58 |
| 22 | 596,605 | 39,226,288 | 1.52 |
| 23 | 763,643 | 46,272,358 | 1.65 |
| 24 | 677,988 | 42,251,103 | 1.60 |
| 25 | 577,000 | 36,898,761 | 1.56 |
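
The SNP percentage column appears to be the SNP count divided by the chromosome length, multiplied by 100. The following is a minimal Python sketch of that arithmetic under this assumption, using three rows copied from the table; the function name `snp_percentage` is illustrative and does not come from the paper.

```python
def snp_percentage(snp_count: int, length_bp: int) -> float:
    """Return SNP count as a percentage of chromosome length, to two decimals."""
    return round(100.0 * snp_count / length_bp, 2)


# (SNP count, chromosome length in bp) copied from the table for three chromosomes
rows = {
    1: (937_216, 58_871_917),
    4: (593_111, 76_625_712),
    14: (894_307, 51_944_548),
}

# Map each chromosome to its recomputed SNP percentage
percentages = {chrom: snp_percentage(*values) for chrom, values in rows.items()}
```

Recomputing the column this way makes the low SNP density of chromosome 4, relative to the other chromosomes, easy to check against the table.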
Fig. 4 Distribution of variants on chromosome 4. The y axis displays the variant count partitioned into 1 Mb bins of genomic sequence (x axis)
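
The binning described in the Fig. 4 caption can be sketched as follows. This is an illustration only: it assumes 1-based variant positions, and the example positions are hypothetical rather than taken from the T5D data.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # 1 Mb bins, as stated in the Fig. 4 caption


def bin_variant_counts(positions, bin_size=BIN_SIZE):
    """Count variants per bin; bin 0 covers 1-based positions 1..bin_size."""
    return Counter((pos - 1) // bin_size for pos in positions)


# Hypothetical 1-based variant positions on one chromosome
example_positions = [150, 999_999, 1_000_000, 1_000_001, 2_500_000, 2_500_010]
counts = bin_variant_counts(example_positions)
```

Plotting `counts` with the bin index on the x axis and the count on the y axis would give a histogram of the kind the caption describes.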
Fig. 2 Zebrafish variant comparisons. a Venn diagram of SNP sites (in millions) compared to the Zv9 reference genome. b Proportions of SNPs binned by alternate allele frequencies for the 5 lines. The T5D allele frequencies are based on 276 individual whole genome sequences. For all other lines, frequencies were determined based on the proportion of reads with non-reference base calls since no individual genotypes can be determined from pooled sequence alignment. c Venn diagram of indel sites (in millions). d Proportion of indels for discrete alternate allele frequencies
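
The two frequency estimates contrasted in the Fig. 2 caption can be sketched as below. This is a simplified illustration, not the authors' pipeline: it assumes diploid genotypes coded as alternate-allele counts (0, 1, or 2, with `None` for a missing call), and both function names are hypothetical.

```python
def alt_freq_from_genotypes(genotypes):
    """Alternate allele frequency from per-individual diploid genotypes.

    Each genotype is the number of alternate alleles (0, 1, or 2);
    None marks a missing call and is excluded from the denominator.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def alt_freq_from_pooled_reads(ref_reads, alt_reads):
    """Alternate allele frequency approximated as the share of non-reference reads."""
    total = ref_reads + alt_reads
    if total == 0:
        return None
    return alt_reads / total


# Hypothetical inputs for a single site
individual_estimate = alt_freq_from_genotypes([0, 1, 2, 1, None])  # 4 alt alleles / 8 called alleles
pooled_estimate = alt_freq_from_pooled_reads(ref_reads=30, alt_reads=10)
```

The genotype-based estimate counts alleles across individuals, whereas the pooled estimate depends on how many reads each fish contributes to the pool, which is one reason the two approaches can disagree.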
Fig. 3 Zebrafish variant comparisons after sequencing and masking a pooled subsample. a Venn diagram of SNP sites (in millions) compared to the Zv9 reference genome. b Proportions of SNPs binned by alternate allele frequencies for the 5 lines. For all lines, frequencies were determined based on the proportion of reads with non-reference base calls since no individual genotypes can be determined from pooled sequence alignment. c Venn diagram of indel sites (in millions). d Proportion of indels for discrete alternate allele frequencies