A defined zebrafish line for high-throughput genetics and genomics: NHGRI-1.

Matthew C LaFave, Gaurav K Varshney, Meghana Vemulapalli, James C Mullikin, Shawn M Burgess.

Abstract

Substantial intrastrain variation at the nucleotide level complicates molecular and genetic studies in zebrafish, such as the use of CRISPRs or morpholinos to inactivate genes. In the absence of robust inbred zebrafish lines, we generated NHGRI-1, a healthy and fecund strain derived from founder parents we sequenced to a depth of ∼50×. Within this strain, we have identified the majority of the genome that matches the reference sequence and documented most of the variants. This strain has utility for many reasons, but in particular it will be useful for any researcher who needs to know the exact sequence (with all variants) of a particular genomic region or who wants to be able to robustly map sequences back to a genome with all possible variants defined.
Copyright © 2014 by the Genetics Society of America.

Keywords:  CRISPR; SNV; genome sequence; variants; zebrafish

Year:  2014        PMID: 25009150      PMCID: PMC4174928          DOI: 10.1534/genetics.114.166769

Source DB:  PubMed          Journal:  Genetics        ISSN: 0016-6731            Impact factor:   4.562


The zebrafish (Danio rerio) is a powerful tool for understanding vertebrate biology. The usefulness of this model organism is bolstered by the availability of a “finished” sequenced and annotated genome (Howe ; Flicek ). As a natural extension of this resource, there are several high-throughput efforts to systematically mutagenize all zebrafish protein-coding genes (Moens ; Kettleborough ; Varshney ,b). In addition to such projects, the combination of a sequenced genome and developments in targeted nuclease technology means that the zebrafish community is now able to rapidly take advantage of custom genome-editing technologies (Doyon ; Bedell ; Hruscha ; Hwang ; Jao ). CRISPRs in particular provide an efficient, easy, and inexpensive means of manipulating and interrogating the genome (Jinek ; Cong ; Mali ). However, because there are very few hardy inbred zebrafish lines (overinbreeding tends to result in unhealthy stocks) and polymorphism rates are close to 1 in every 100 bases, variants frequently have the potential to interfere with target site design (Stickney ; Guryev ; Bowen ) or with regions of homology used for homologous recombination. In general, genome targeting is heavily dependent on an exact match to the primary sequence. Depending on the sequence, even a single mismatch can severely reduce the cutting efficiency (Hsu ). In addition, other techniques such as RNA-Seq or ChIP-Seq are substantially less accurate without having fully characterized variants in the background strain. Therefore, it is preferable to carry out studies in a zebrafish strain in which the regions of invariant sequence are known with a high degree of confidence and all variants are categorized to allow for robust genomic mapping. With these concerns in mind, we derived the zebrafish line NHGRI-1. 
NHGRI-1 fish were derived from an original strain known as “TAB-5” made from a hybrid cross between fish from two of the most commonly used zebrafish lines: Tübingen and AB (Streisinger ; Haffter ). The F1 fish from this cross were inbred and screened to be clear of any mutations affecting the first 5 days of development. Since the strain’s initial isolation in 1997, we have carried it in the laboratory without introducing outside genetic diversity. We selected several mating pairs from the TAB-5 pool, and the most robust mating pair was chosen as the founding pair for NHGRI-1. We are now on the third generation of NHGRI-1, and their fecundity and overall health remain strong. We carried out high-throughput sequencing to a depth of ∼50× for each parent. The male and female sequencing libraries had a combined 1,289,142,362 nonduplicate reads, with a median coverage of 52× and 47×, respectively. By doing so, we identified >10 million previously unreported single-nucleotide variants (SNVs). The raw sequence data have been deposited in the NCBI Sequence Read Archive [BioProject ID: 246102]. In addition, we have identified nearly all the regions of the genome that are invariant relative to the Zv9 reference sequence. We generated a browser extensible data (BED) file of invariant nucleotides, which indicates the regions in which there were both a lack of alternative alleles and sufficient read depth and genotype confidence to call bases as invariant (Figure 1). Seventy-one percent of the genome fits these criteria. The invariant file is hosted on the NHGRI-1 website at http://research.nhgri.nih.gov/manuscripts/Burgess/zebrafish/download.shtml; a University of California, Santa Cruz (UCSC) data hub called “ZebrafishGenomics” has been established at http://genome.ucsc.edu/cgi-bin/hgHubConnect; and data have been transferred to http://zfin.org/. 
Information on the variants themselves can be downloaded from dbSNP (submitter handle, NHGRI_DGS; submitter batch ID, NHGRI-1_founders). The invariant regions are easily identified by using the BED file, which simplifies the design of CRISPR targets, amplicon primers, and morpholinos, the selection of regions for homologous recombination, and essentially any experiment that requires high confidence in the exact sequence of the genomic region of interest.
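As an illustration of how the invariant BED file might be used during target design, the following is a minimal Python sketch that checks whether a candidate site lies entirely within invariant sequence. It assumes standard BED conventions (0-based, half-open intervals in the first three columns); the function names, the chromosome label, and the example coordinates are hypothetical and are not taken from the NHGRI-1 files.

```python
from bisect import bisect_right


def load_bed(lines):
    """Parse BED lines into {chrom: sorted, merged list of (start, end)}."""
    intervals = {}
    for line in lines:
        # Skip blank lines and optional BED header lines.
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.setdefault(chrom, []).append((int(start), int(end)))
    for chrom, ivs in intervals.items():
        ivs.sort()
        merged = [ivs[0]]
        for s, e in ivs[1:]:
            if s <= merged[-1][1]:  # overlapping or directly adjacent
                merged[-1] = (merged[-1][0], max(merged[-1][1], e))
            else:
                merged.append((s, e))
        intervals[chrom] = merged
    return intervals


def is_invariant(intervals, chrom, start, end):
    """True if [start, end) lies entirely inside one merged invariant interval."""
    ivs = intervals.get(chrom, [])
    # Index of the last interval whose start is <= the query start.
    i = bisect_right(ivs, (start, float("inf"))) - 1
    return i >= 0 and ivs[i][0] <= start and end <= ivs[i][1]


# Hypothetical invariant intervals, for illustration only.
example_bed = [
    "chr1\t100\t200",
    "chr1\t200\t260",
    "chr1\t400\t450",
]
invariant = load_bed(example_bed)
print(is_invariant(invariant, "chr1", 150, 173))  # candidate site inside an interval
print(is_invariant(invariant, "chr1", 250, 273))  # candidate site running past an interval
```

Adjacent intervals are merged on loading so that a site spanning two abutting BED records is still reported as invariant.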
Figure 1

Screenshot of the UCSC browser custom tracks for NHGRI-1. Twenty mating pairs from 6-month-old TAB-5 fish were screened to select a robust founding pair with good clutch size and healthy progeny; the most fecund pair was renamed NHGRI-1. Fin clips from the NHGRI-1 male and female were prepared as separate genomic DNA libraries and sequenced on the Illumina HiSeq 2000 by the National Institutes of Health (NIH) Intramural Sequencing Center. Both libraries were subjected to paired-end sequencing with 101-bp reads. We aligned the sequence to the zebrafish genome [Zv9 (Howe )] with Novoalign version 2.08.02 (http://www.novocraft.com/). We removed PCR duplicates via SAMtools version 0.1.18 (Li ). We used bam2mpg to identify the most probable genotype (MPG) for nucleotides in both parents (Teer ). Bases that did not have an MPG score of at least 10, coverage of at least 20×, and a ratio of MPG score to coverage >0.5 were discarded. Regions of low sequence complexity were not specifically excluded from the analysis unless they failed to meet these criteria. The bases that matched the reference and met the above criteria in both fish were used to build the BED track of invariant nucleotides. The top track indicates the bases that were invariant in both fish sequenced. The white regions indicate either variation in at least one fish or insufficient read depth to confidently call the region as invariant. The second track indicates two nonsense mutations detected in this region. The letter indicates the alternative allele, and the color indicates whether the mutation was homozygous (red) or heterozygous (blue) in the NHGRI-1 population. Both tracks are available on the ZebrafishGenomics track hub, which is hosted at http://research.nhgri.nih.gov/manuscripts/Burgess/zebrafish/downloads/NHGRI-1/hub.txt and accessible through http://genome.ucsc.edu/cgi-bin/hgHubConnect.
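The per-base criteria in this legend can be sketched in Python as follows. This is a minimal sketch, assuming each base call has already been reduced to a reference-match flag, an MPG score, and a coverage value; the function names and tuple layout are hypothetical and do not reflect the bam2mpg interface.

```python
def passes_filter(mpg_score, coverage):
    """Return True if a base call meets all three thresholds from the legend."""
    if coverage <= 0:
        return False
    return (mpg_score >= 10            # MPG score of at least 10
            and coverage >= 20         # coverage of at least 20x
            and mpg_score / coverage > 0.5)  # score-to-coverage ratio > 0.5


def invariant_in_both(male_call, female_call):
    """Each call is (matches_reference, mpg_score, coverage); hypothetical layout.

    A base is kept as invariant only if it matches the reference and
    passes the filter in both fish.
    """
    return all(matches and passes_filter(score, cov)
               for matches, score, cov in (male_call, female_call))
```

Note that the ratio threshold is strict, so a call with an MPG score of 10 at exactly 20× coverage would be discarded under this reading.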

We detected >17 million total variants upon merging the variant calls from the two libraries. 
Of that total, 236,301 were in exons of Ensembl transcripts (Table 1). Variants were called as homozygous only if they were homozygous in both fish; such variants will stably retain the variant allele in future generations.
Table 1

Raw counts of variants in NHGRI-1

Variants                   SNV         DIV       Total
Total variants      14,917,339   2,210,080  17,127,419
 Heterozygous       12,245,715   1,953,277  14,198,992
 Homozygous          2,642,908     225,347   2,868,255
 Unknown                28,716      31,456      60,172
Exon variants          233,141       3,160     236,301
 Heterozygous          190,626       2,815     193,441
 Homozygous             42,153         311      42,464
 Unknown                   362          34         396

Single-nucleotide variants and deletion and insertion variants were annotated using ANNOVAR version 2012-10-16 (Wang ). Our annotation used the ensGene track hosted on the UCSC genome browser, which corresponded to Ensembl release 74 (Flicek ). We annotated the male and female fish separately and then combined the ANNOVAR output to determine overall homozygosity and heterozygosity. Variants were considered homozygous in NHGRI-1 only if they were independently called as homozygous in both sexes. We identified a variant as unknown if it was called as (1) unknown in both sexes or (2) unknown in one fish and homozygous reference in the other. All remaining variants were considered to be heterozygous in NHGRI-1, even if they were called as homozygous in one of the sexes. In cases in which deletion or insertion variants (DIVs) of different lengths were reported at the same position, both were counted as separate variants.
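The rule for combining the two per-fish calls into a single NHGRI-1 zygosity can be written as a small Python function. This is a sketch of the rule as worded in the legend, not the authors' code; the call labels are hypothetical, and calls are assumed to be made per alternative allele.

```python
HOM_VAR, HET, HOM_REF, UNKNOWN = "hom_var", "het", "hom_ref", "unknown"


def combine_calls(male, female):
    """Combine per-fish genotype calls into one NHGRI-1 zygosity label."""
    calls = {male, female}
    if calls == {HOM_REF}:
        return None  # reference in both fish: not a variant at all
    if calls == {HOM_VAR}:
        return "homozygous"  # homozygous variant in both sexes
    if calls == {UNKNOWN} or calls == {UNKNOWN, HOM_REF}:
        return "unknown"
    # Everything else, including homozygous in only one fish.
    return "heterozygous"
```

For example, `combine_calls(HOM_VAR, HOM_REF)` yields "heterozygous", because the allele is present in the line but cannot be fixed when only one founder carries it in the homozygous state.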

To underscore the issues related to background variation in the commonly used zebrafish lines, we detected 669 variants that formed premature stop codons in at least one transcript, 105 of which were homozygous mutant in both sexes (Table 2). We have generated a BED track of these variants, indicating the location, the alternative allele, and the homo/heterozygosity. This track is available on the ZebrafishGenomics hub and the NHGRI-1 website (Figure 1). A list of affected genes can also be found in supporting information, Table S1.
Table 2

Mutations introduced by variants in NHGRI-1

Annotations                  Total
SNV annotation
 Nonsynonymous              77,791
 Synonymous                149,378
 Stop gain                     640
 Stop loss                      90
 Unknown                     5,242
DIV annotation
 Frameshift deletion           638
 Frameshift insertion          540
 Nonframeshift insertion       944
 Nonframeshift deletion        872
 Stop gain                      29
 Stop loss                       8
 Unknown                       129
We detected 3,160 deletion or insertion variants (DIVs) in exons. DIVs of a length divisible by three were highly represented, comprising ∼60% of the exonic DIVs (Figure 2A). Presumably, this is because the resultant nonframeshift mutations would be less likely to be selected against than those that produce frameshifts. A similar profile has been reported in human indels (Chen ). This trend is not present in the genome-wide set of 2,210,080 NHGRI-1 DIVs (Figure 2B).
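The ∼60% figure can be roughly cross-checked against the DIV counts in Table 2. The sketch below assumes that the "nonframeshift" annotations correspond to DIVs whose length is divisible by three, and that the stop-gain, stop-loss, and unknown DIVs could fall on either side; both assumptions are ours, so the result is a bracket rather than an exact reproduction of the reported value.

```python
# Exonic DIV annotation counts copied from Table 2.
div_counts = {
    "frameshift deletion": 638,
    "frameshift insertion": 540,
    "nonframeshift insertion": 944,
    "nonframeshift deletion": 872,
    "stop gain": 29,
    "stop loss": 8,
    "unknown": 129,
}

total = sum(div_counts.values())
in_frame = (div_counts["nonframeshift insertion"]
            + div_counts["nonframeshift deletion"])
# DIVs whose frame status is not given by the annotation label alone.
ambiguous = (div_counts["stop gain"] + div_counts["stop loss"]
             + div_counts["unknown"])

lower = in_frame / total                # none of the ambiguous DIVs in frame
upper = (in_frame + ambiguous) / total  # all of the ambiguous DIVs in frame
print(f"total exonic DIVs: {total}")
print(f"in-frame share: {lower:.1%} to {upper:.1%}")
```

Under these assumptions the in-frame share falls between roughly 57% and 63%, which is consistent with the ∼60% stated in the text.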
Figure 2

Deletion and insertion variant length distribution within exons. (A) The 3160 DIVs in exons. (B) The 2,210,080 DIVs detected genome-wide. Red bars indicate the number of deletions of a given length; blue bars represent insertions.

We compared the SNVs identified in NHGRI-1 with dbSNP (Build ID: 139) and a publicly available data set obtained from low-coverage sequencing of multiple zebrafish lines (Sherry ; Bowen ). For simplicity, we compared only biallelic SNVs for which the reference sequence is known (i.e., no “N”s). The majority of NHGRI-1 SNVs had not been previously reported in either data set (Figure 3). We find that the rate of SNVs per sequenced base in NHGRI-1 is 0.01, or ∼12.5–20× higher than the rate in humans (Kidd ). It is important to note that, while the 0.01 number is relevant for NHGRI-1, the regions of homozygosity created by inbreeding mean it certainly underestimates the SNV load in zebrafish as a whole.
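The arithmetic behind the fold range quoted above can be made explicit. The sketch below back-calculates the human SNV rate that the 12.5–20× range implies, and the number of sequenced bases implied by the 0.01 rate and the SNV total in Table 1. Neither derived quantity is stated in the text; both are inferences for illustration only.

```python
nhgri1_rate = 0.01              # SNVs per sequenced base, from the text
fold_low, fold_high = 12.5, 20.0  # fold range relative to humans, from the text
total_snvs = 14_917_339         # total SNVs, from Table 1

# Human rates implied by the fold range (back-calculated, not from the source).
implied_human_rates = (nhgri1_rate / fold_high, nhgri1_rate / fold_low)
bases_per_human_snv = tuple(round(1 / r) for r in implied_human_rates)

# Sequenced bases implied by the 0.01 rate (also back-calculated).
implied_sequenced_bases = total_snvs / nhgri1_rate

print(bases_per_human_snv)
print(f"{implied_sequenced_bases:.3e}")
```

Read this way, the comparison corresponds to roughly one SNV per 1,250 to 2,000 bases in humans against about one per 100 bases in NHGRI-1.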
Figure 3

SNV overlap with publicly available data sets. This comparison incorporates only SNVs that were biallelic and for which the reference base was an unambiguous A, C, G, or T. The Bowen SNVs were downloaded from http://fishbonelab.org/harris/Resources_files/parental_variants.tar; both data sets were downloaded on March 12th, 2014.

We also compared the mutational profile of NHGRI-1 to that reported for a zebrafish captured from the wild and sequenced at 39× coverage (Patowary ). Different cutoffs had been applied for variant calling in said study, such as a minimum of 32 reads to call an SNV and 5 reads to call a DIV, but the ratios of variant types can be compared. The differences are statistically significant, but small. Among the SNVs in the wild zebrafish, 22.3% were reported as being homozygous, compared to 17.8% in NHGRI-1 (Fisher’s exact test, P < 2.2 × 10^−16). Deletions are more prevalent than insertions in both studies, with the wild zebrafish reported as having 53.9% deletions, compared to 51.6% in NHGRI-1 (P < 2.2 × 10^−16).
This fish line will have utility in terms of automated design for targeted nucleases, as well as for studies such as ChIP-Seq or RNA-Seq where SNVs or DIVs might reduce the accuracy of mapping the raw sequence data. In addition, techniques such as homologous recombination are very sensitive to variants (te Riele ), and NHGRI-1 will allow researchers to target genomic regions that do not contain any variant nucleotides. Thus, NHGRI-1 will prove useful in a variety of circumstances where absolute knowledge of the possible sequence variation is needed. The line will be distributed by the Zebrafish International Resource Center (http://zebrafish.org) and the European Zebrafish Resource Center (http://www.ezrc.kit.edu).
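As a cross-check on the homozygosity comparison above, the 17.8% homozygous share for NHGRI-1 SNVs can be recomputed from the counts in Table 1. The sketch assumes that SNVs of unknown zygosity were left out of the denominator; that assumption is ours, and the alternative is shown alongside it. The Fisher's exact test itself is not reproduced, because the wild-zebrafish counts are not given in this text.

```python
# Genome-wide SNV counts copied from Table 1.
snv_het = 12_245_715
snv_hom = 2_642_908
snv_unknown = 28_716

# Homozygous share among SNVs with a known zygosity (assumed denominator).
share_called = snv_hom / (snv_hom + snv_het)
# Homozygous share among all SNVs, unknown zygosity included.
share_all = snv_hom / (snv_hom + snv_het + snv_unknown)

print(f"excluding unknown: {share_called:.1%}")
print(f"including unknown: {share_all:.1%}")
```

Excluding the unknown calls gives a share that rounds to the reported 17.8%, whereas including them gives 17.7%.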
  28 in total

1.  Efficient mapping and cloning of mutations in zebrafish by low-coverage whole-genome sequencing.

Authors:  Margot E Bowen; Katrin Henke; Kellee R Siegfried; Matthew L Warman; Matthew P Harris
Journal:  Genetics       Date:  2011-12-14       Impact factor: 4.562

2.  Systematic comparison of three genomic enrichment methods for massively parallel DNA sequencing.

Authors:  Jamie K Teer; Lori L Bonnycastle; Peter S Chines; Nancy F Hansen; Natsuyo Aoyama; Amy J Swift; Hatice Ozel Abaan; Thomas J Albert; Elliott H Margulies; Eric D Green; Francis S Collins; James C Mullikin; Leslie G Biesecker
Journal:  Genome Res       Date:  2010-09-01       Impact factor: 9.043

3.  Genetic variation in the zebrafish.

Authors:  Victor Guryev; Marco J Koudijs; Eugene Berezikov; Stephen L Johnson; Ronald H A Plasterk; Fredericus J M van Eeden; Edwin Cuppen
Journal:  Genome Res       Date:  2006-03-13       Impact factor: 9.043

4.  Reverse genetics in zebrafish by TILLING.

Authors:  Cecilia B Moens; Thomas M Donn; Emma R Wolf-Saxon; Taylur P Ma
Journal:  Brief Funct Genomic Proteomic       Date:  2008-11-21

5.  A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity.

Authors:  Martin Jinek; Krzysztof Chylinski; Ines Fonfara; Michael Hauer; Jennifer A Doudna; Emmanuelle Charpentier
Journal:  Science       Date:  2012-06-28       Impact factor: 47.728

6.  The Sequence Alignment/Map format and SAMtools.

Authors:  Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

7.  ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data.

Authors:  Kai Wang; Mingyao Li; Hakon Hakonarson
Journal:  Nucleic Acids Res       Date:  2010-07-03       Impact factor: 16.971

8.  Mapping and sequencing of structural variation from eight human genomes.

Authors:  Jeffrey M Kidd; Gregory M Cooper; William F Donahue; Hillary S Hayden; Nick Sampas; Tina Graves; Nancy Hansen; Brian Teague; Can Alkan; Francesca Antonacci; Eric Haugen; Troy Zerr; N Alice Yamada; Peter Tsang; Tera L Newman; Eray Tüzün; Ze Cheng; Heather M Ebling; Nadeem Tusneem; Robert David; Will Gillett; Karen A Phelps; Molly Weaver; David Saranga; Adrianne Brand; Wei Tao; Erik Gustafson; Kevin McKernan; Lin Chen; Maika Malig; Joshua D Smith; Joshua M Korn; Steven A McCarroll; David A Altshuler; Daniel A Peiffer; Michael Dorschner; John Stamatoyannopoulos; David Schwartz; Deborah A Nickerson; James C Mullikin; Richard K Wilson; Laurakay Bruhn; Maynard V Olson; Rajinder Kaul; Douglas R Smith; Evan E Eichler
Journal:  Nature       Date:  2008-05-01       Impact factor: 49.962

9.  The identification of genes with unique and essential functions in the development of the zebrafish, Danio rerio.

Authors:  P Haffter; M Granato; M Brand; M C Mullins; M Hammerschmidt; D A Kane; J Odenthal; F J van Eeden; Y J Jiang; C P Heisenberg; R N Kelsh; M Furutani-Seiki; E Vogelsang; D Beuchle; U Schach; C Fabian; C Nüsslein-Volhard
Journal:  Development       Date:  1996-12       Impact factor: 6.868

10.  Heritable targeted gene disruption in zebrafish using designed zinc-finger nucleases.

Authors:  Yannick Doyon; Jasmine M McCammon; Jeffrey C Miller; Farhoud Faraji; Catherine Ngo; George E Katibah; Rainier Amora; Toby D Hocking; Lei Zhang; Edward J Rebar; Philip D Gregory; Fyodor D Urnov; Sharon L Amacher
Journal:  Nat Biotechnol       Date:  2008-05-25       Impact factor: 54.908
