Matthew C LaFave, Gaurav K Varshney, Meghana Vemulapalli, James C Mullikin, Shawn M Burgess.
Abstract
Substantial intrastrain variation at the nucleotide level complicates molecular and genetic studies in zebrafish, such as the use of CRISPRs or morpholinos to inactivate genes. In the absence of robust inbred zebrafish lines, we generated NHGRI-1, a healthy and fecund strain derived from founder parents that we sequenced to a depth of ∼50×. Within this strain, we have identified the majority of the genome that matches the reference sequence and documented most of the variants. The strain will be particularly useful for any researcher who needs to know the exact sequence (with all variants) of a particular genomic region or who wants to robustly map sequences back to a genome in which all possible variants are defined.
Keywords: CRISPR; SNV; genome sequence; variants; zebrafish
Year: 2014 PMID: 25009150 PMCID: PMC4174928 DOI: 10.1534/genetics.114.166769
Source DB: PubMed Journal: Genetics ISSN: 0016-6731 Impact factor: 4.562
Figure 1. Screenshot of the UCSC browser custom tracks for NHGRI-1. Twenty mating pairs from 6-month-old TAB-5 fish were screened to select a robust founding pair with good clutch size and healthy progeny; the most fecund pair was renamed NHGRI-1. Fin clips from the NHGRI-1 male and female were prepared as separate genomic DNA libraries and sequenced on the Illumina HiSeq 2000 by the National Institutes of Health (NIH) Intramural Sequencing Center. Both libraries were subjected to paired-end sequencing with 101-bp reads. We aligned the sequence to the zebrafish genome [Zv9 (Howe )] with Novoalign version 2.08.02 (http://www.novocraft.com/). We removed PCR duplicates via SAMtools version 0.1.18 (Li ). We used bam2mpg to identify the most probable genotype (MPG) for nucleotides in both parents (Teer ). Bases were discarded unless they had an MPG score of at least 10, coverage of at least 20×, and a ratio of MPG score to coverage >0.5. Regions of low sequence complexity were not specifically excluded from the analysis unless they failed to meet these criteria. The bases that matched the reference and met the above criteria in both fish were used to build the BED track of invariant nucleotides. The top track indicates the bases that were invariant in both fish sequenced. The white regions indicate either variation in at least one fish or insufficient read depth to confidently call the region as invariant. The second track indicates two nonsense mutations detected in this region. The letter indicates the alternative allele, and the color indicates whether the mutation was homozygous (red) or heterozygous (blue) in the NHGRI-1 population. Both tracks are available on the ZebrafishGenomics track hub, which is hosted at http://research.nhgri.nih.gov/manuscripts/Burgess/zebrafish/downloads/NHGRI-1/hub.txt and accessible through http://genome.ucsc.edu/cgi-bin/hgHubConnect.
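The per-base quality criteria in this legend can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not reproduce bam2mpg output parsing: the `BaseCall` record, its field names, and the two-letter genotype strings are hypothetical, and only the three thresholds come from the text.

```python
from dataclasses import dataclass

# Thresholds taken from the legend; everything else here is illustrative.
MIN_MPG_SCORE = 10
MIN_COVERAGE = 20
MIN_SCORE_TO_COVERAGE = 0.5


@dataclass
class BaseCall:
    """Hypothetical per-base genotype call for one fish."""
    genotype: str      # e.g. "AA" or "AG"
    mpg_score: float   # most probable genotype score
    coverage: int      # read depth at this base


def passes_filters(call: BaseCall) -> bool:
    """Keep a base only if it meets all three criteria."""
    if call.mpg_score < MIN_MPG_SCORE or call.coverage < MIN_COVERAGE:
        return False
    return call.mpg_score / call.coverage > MIN_SCORE_TO_COVERAGE


def is_invariant(ref_base: str, male: BaseCall, female: BaseCall) -> bool:
    """A base is invariant if both fish pass the filters and are homozygous reference."""
    hom_ref = ref_base.upper() * 2
    return all(
        passes_filters(call) and call.genotype.upper() == hom_ref
        for call in (male, female)
    )
```

Under these assumptions, a base that is well covered in one parent but falls below 20× in the other is not reported as invariant, which corresponds to the white regions of the top track.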
Raw counts of variants in NHGRI-1
| Variants | SNV | DIV | Total |
|---|---|---|---|
| Total variants | 14,917,339 | 2,210,080 | 17,127,419 |
| Heterozygous | 12,245,715 | 1,953,277 | 14,198,992 |
| Homozygous | 2,642,908 | 225,347 | 2,868,255 |
| Unknown | 28,716 | 31,456 | 60,172 |
| Exon variants | 233,141 | 3,160 | 236,301 |
| Heterozygous | 190,626 | 2,815 | 193,441 |
| Homozygous | 42,153 | 311 | 42,464 |
| Unknown | 362 | 34 | 396 |
Single-nucleotide variants (SNVs) and deletion and insertion variants (DIVs) were annotated using ANNOVAR version 2012-10-16 (Wang ). Our annotation used the ensGene track hosted on the UCSC genome browser, which corresponded to Ensembl release 74 (Flicek ). We annotated the male and female fish separately and then combined the ANNOVAR output to determine overall homozygosity and heterozygosity. Variants were considered homozygous in NHGRI-1 only if they were independently called as homozygous in both sexes. We identified a variant as unknown if it was called as (1) unknown in both sexes or (2) unknown in one fish and homozygous reference in the other. All remaining variants were considered to be heterozygous in NHGRI-1, even if they were called as homozygous in one of the sexes. In cases in which DIVs of different lengths were reported at the same position, each was counted as a separate variant.
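The rules for combining the two per-fish calls into a strain-level zygosity can be written as a small decision function. The sketch below is one reading of the rules stated above, not the authors' code: the labels `"hom"`, `"het"`, `"ref"`, and `"unknown"` and the function name are hypothetical, and the case in which both fish are homozygous reference is treated as an error because such a site would not be a variant.

```python
VALID_CALLS = {"hom", "het", "ref", "unknown"}  # hypothetical per-fish labels


def combine_zygosity(male: str, female: str) -> str:
    """Combine per-fish calls into a strain-level call for one variant.

    "hom" is homozygous for the variant allele, "ref" is homozygous reference.
    """
    for call in (male, female):
        if call not in VALID_CALLS:
            raise ValueError(f"unexpected call: {call!r}")

    calls = {male, female}
    if calls == {"ref"}:
        # Not a variant in either fish; should not appear in a variant list.
        raise ValueError("site is homozygous reference in both fish")
    if calls == {"hom"}:
        return "homozygous"
    if calls == {"unknown"} or calls == {"unknown", "ref"}:
        return "unknown"
    # Everything else, including homozygous in only one sex.
    return "heterozygous"
```

Note that under this reading a variant that is homozygous in one fish and unknown in the other falls into the "all remaining variants" category and is counted as heterozygous.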
Mutations introduced by variants in NHGRI-1
| Annotations | Total |
|---|---|
| SNV annotation | |
| Nonsynonymous | 77,791 |
| Synonymous | 149,378 |
| Stop gain | 640 |
| Stop loss | 90 |
| Unknown | 5,242 |
| DIV annotation | |
| Frameshift deletion | 638 |
| Frameshift insertion | 540 |
| Nonframeshift insertion | 944 |
| Nonframeshift deletion | 872 |
| Stop gain | 29 |
| Stop loss | 8 |
| Unknown | 129 |
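As a quick consistency check, the annotation counts in this table can be summed and compared with the exon variant totals in the raw-count table. The sketch below only restates numbers from the two tables; the variable names are illustrative, and the interpretation that the annotations partition the exon variants is an inference from the totals matching.

```python
# Counts copied from the annotation table.
snv_annotations = {
    "Nonsynonymous": 77_791,
    "Synonymous": 149_378,
    "Stop gain": 640,
    "Stop loss": 90,
    "Unknown": 5_242,
}
div_annotations = {
    "Frameshift deletion": 638,
    "Frameshift insertion": 540,
    "Nonframeshift insertion": 944,
    "Nonframeshift deletion": 872,
    "Stop gain": 29,
    "Stop loss": 8,
    "Unknown": 129,
}

# Exon variant totals copied from the raw-count table.
EXON_SNVS = 233_141
EXON_DIVS = 3_160

snv_total = sum(snv_annotations.values())
div_total = sum(div_annotations.values())

print(snv_total == EXON_SNVS, div_total == EXON_DIVS)
```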
Figure 2. Deletion and insertion variant length distribution within exons. (A) The 3,160 DIVs in exons. (B) The 2,210,080 DIVs detected genome-wide. Red bars indicate the number of deletions of a given length; blue bars represent insertions.
Figure 3. SNV overlap with publicly available data sets. This comparison incorporates only SNVs that were biallelic and for which the reference base was an unambiguous A, C, G, or T. The Bowen SNVs were downloaded from http://fishbonelab.org/harris/Resources_files/parental_variants.tar; both data sets were downloaded on March 12, 2014.
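The restriction described in this legend (biallelic SNVs with an unambiguous reference base) can be sketched as a filter followed by a set intersection. This is an illustration only: the tuple layout `(chrom, pos, ref, alts)`, the function names, and the choice to match on chromosome, position, reference base, and alternative allele are assumptions, not details given in the source.

```python
UNAMBIGUOUS_BASES = {"A", "C", "G", "T"}


def comparable_snv_keys(records):
    """Return keys for SNVs that are biallelic with an unambiguous reference base.

    `records` is an iterable of hypothetical (chrom, pos, ref, alts) tuples,
    where `alts` is a list of alternative alleles observed at the site.
    """
    keys = set()
    for chrom, pos, ref, alts in records:
        if ref.upper() not in UNAMBIGUOUS_BASES:
            continue  # ambiguous reference base such as N
        if len(alts) != 1:
            continue  # not biallelic
        keys.add((chrom, pos, ref.upper(), alts[0].upper()))
    return keys


def snv_overlap(records_a, records_b):
    """SNVs shared by two data sets after applying the same filter to both."""
    return comparable_snv_keys(records_a) & comparable_snv_keys(records_b)
```

Applying the same filter to every data set before intersecting keeps the comparison symmetric, so a site is never counted as unique to one set merely because it was excluded from the other.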