Craig B Lowe, Nicelio Sanchez-Luege, Timothy R Howes, Shannon D Brady, Rhea R Daugherty, Felicity C Jones, Michael A Bell, David M Kingsley.
Abstract
We present a method to detect copy number variants (CNVs) that are differentially present between two groups of sequenced samples. We use a finite-state transducer where the emitted read depth is conditioned on the mappability and GC-content of all reads that occur at a given base position. In this model, the read depth within a region is a mixture of binomials, which in simulations matches the read depth more closely than the often-used negative binomial distribution. The method analyzes all samples simultaneously, preserving uncertainty as to the breakpoints and magnitude of CNVs present in an individual when it identifies CNVs differentially present between the two groups. We apply this method to identify CNVs that are recurrently associated with postglacial adaptation of marine threespine stickleback (Gasterosteus aculeatus) to freshwater. We identify 6664 regions of the stickleback genome, totaling 1.7 Mbp, which show consistent copy number differences between marine and freshwater populations. These deletions and duplications affect both protein-coding genes and cis-regulatory elements, including a noncoding intronic telencephalon enhancer of DCHS1. The functions of the genes near or included within the 6664 CNVs are enriched for immunity and muscle development, as well as head and limb morphology. Although freshwater stickleback have repeatedly evolved from marine populations, we show that freshwater stickleback also act as reservoirs for ancient ancestral sequences that are highly conserved among distantly related teleosts, but largely missing from marine stickleback due to recent selective sweeps in marine populations.
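The statement that read depth within a region is a mixture of binomials can be illustrated with a small sketch. This is not the authors' implementation: the function names, the equal mixture weights, the number of candidate reads (36), and the per-position probabilities are all hypothetical values chosen for illustration. The sketch assumes each position contributes a Binomial(n, p) depth, where p stands in for the combined effect of mappability and GC-content at that position.

```python
from math import comb

def binom_pmf(d, n, p):
    """P(depth = d) when n candidate reads are each retained with probability p."""
    return comb(n, d) * p**d * (1 - p) ** (n - d)

def region_depth_pmf(n, probs):
    """Equal-weight mixture of Binomial(n, p) over the per-position probabilities.

    ASSUMPTION: equal weights and a shared n are simplifications for illustration.
    """
    return [sum(binom_pmf(d, n, p) for p in probs) / len(probs) for d in range(n + 1)]

def mean_var(pmf):
    """Mean and variance of a distribution given as a list indexed by depth."""
    mean = sum(d * q for d, q in enumerate(pmf))
    var = sum((d - mean) ** 2 * q for d, q in enumerate(pmf))
    return mean, var

# Hypothetical inputs: 36 candidate reads per position, three positions whose
# retention probabilities differ because of mappability and GC-content.
n_candidate_reads = 36
per_position_p = [0.02, 0.05, 0.08]

mixture = region_depth_pmf(n_candidate_reads, per_position_p)
single = region_depth_pmf(n_candidate_reads, [0.05])  # one binomial, same mean

mix_mean, mix_var = mean_var(mixture)
one_mean, one_var = mean_var(single)
```

With these inputs the two distributions share the same mean, but the mixture has the larger variance. That extra spread is the kind of overdispersion a single binomial cannot capture and that a negative binomial is often used to approximate.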
Year: 2017 PMID: 29229672 PMCID: PMC5793789 DOI: 10.1101/gr.206938.116
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. The collection locations of the 11 freshwater stickleback (blue) and 10 marine stickleback (red). Of the 21 sites, 14 belong to marine–freshwater pairs in which the sampling sites are geographically adjacent and have likely experienced ongoing or historical gene flow. Such gene flow reduces the number of genetic differences that are not related to differing marine and freshwater habitats.
Figure 2. The transducer model. (A) The transducer reads from s · k + k tapes containing the GC-bias and mappability of all overlapping reads, where s is the total number of individuals being analyzed and k is the length of the sequencing reads. g_{n,i,j} is the GC-bias of the jth of the k read positions that would cover assembly position i for sample n of s. b_{i,j} is the mappability of the jth of the k read positions that would cover position i. There are multiple output tapes, one for each individual, containing the read depth for that sample. d_{n,i} is the depth of reads at assembly position i for sample n of s. (B) The 25 states represent the variation in canonical copy number that can be detected by the transducer. (C) One example of the emission probabilities for a single base with the state representing no change in freshwater copy number, while the marine fish have a duplication. We show the marginal probability distribution for four individuals.
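The tape bookkeeping in panel A can be made concrete with a short sketch. It is illustrative only. The values s = 21 and k = 36 are taken from the sample counts in Figure 1 and the read length mentioned in Figure 3. The helper names are hypothetical. The decomposition of the 25 states into five copy-number classes per group is an assumption: the caption gives only the total of 25, and the class labels below are invented for readability.

```python
from itertools import product

def input_tape_count(s, k):
    """s*k GC-bias tapes (one per sample and read offset) plus k mappability tapes."""
    return s * k + k

def tape_layout(s, k):
    """Enumerate input tapes as labeled tuples: ('g', sample n, offset j) or ('b', offset j)."""
    gc_tapes = [("g", n, j) for n in range(s) for j in range(k)]
    mappability_tapes = [("b", j) for j in range(k)]
    return gc_tapes + mappability_tapes

# ASSUMPTION: 25 states read as 5 freshwater classes x 5 marine classes.
# These labels are hypothetical and do not come from the caption.
LEVELS = [
    "homozygous deletion",
    "heterozygous deletion",
    "no change",
    "heterozygous duplication",
    "homozygous duplication",
]
states = list(product(LEVELS, LEVELS))  # (freshwater class, marine class)

n_tapes = input_tape_count(s=21, k=36)
layout = tape_layout(s=21, k=36)
```

Under this reading, the mappability tapes are shared across samples because all samples are aligned to the same assembly, while the GC-bias tapes are kept per sample.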
Figure 3. Detecting simulated duplications (A) and deletions (B) present in freshwater, but not marine, individuals. We simulated data analogous to our main data set of 10 marine and 11 freshwater stickleback genomes sequenced to a median coverage of 1.7×, except we randomly placed deletions and duplications that ranged from 30 to 1000 bp in the genomes of all freshwater individuals. We recorded the performance of existing methods (at 0.2 false positives) either based on the annotations of individual genomes (cn.MOPS and Genome STRiP) or after we pooled all marine samples into a pseudo-individual and all freshwater samples into a pseudo-individual (rSW-seq and CNVnator). We also tested the transducer's ability to detect heterozygous duplications and deletions. Heterozygous deletions are especially difficult to detect because, with 36-bp reads, regions with SNP divergence may exhibit reduced mapping efficiency, which results in read coverage similar to that of a heterozygous deletion.
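The random placement step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' simulation code. The function name, the genome length, the number of events, the fixed seed, and the rule that events may not overlap are all assumptions. Only the 30 to 1000 bp size range comes from the caption.

```python
import random

def place_cnvs(genome_length, count, min_len=30, max_len=1000, seed=0):
    """Randomly place non-overlapping deletions and duplications on one sequence.

    Returns a sorted list of (start, end, kind) with half-open coordinates.
    ASSUMPTION: rejecting overlapping events is a simplification; the caption
    only states that events were placed at random.
    """
    rng = random.Random(seed)
    placed = []
    attempts = 0
    while len(placed) < count and attempts < 100 * count:
        attempts += 1
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        start = rng.randrange(0, genome_length - length + 1)
        end = start + length
        kind = rng.choice(["deletion", "duplication"])
        # Keep the candidate only if it does not overlap any placed event.
        if all(end <= s or start >= e for s, e, _ in placed):
            placed.append((start, end, kind))
    return sorted(placed)

# Hypothetical example: 50 events on a 1 Mbp sequence.
cnvs = place_cnvs(genome_length=1_000_000, count=50)
```

In a simulation like the one described, the same list of events would then be applied to every freshwater genome and to none of the marine genomes before reads are sampled.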
Figure 4. Freshwater deletions in DCHS1 remove a conserved sequence that functions as a telencephalon enhancer. (A) The majority of freshwater populations do not have any reads (blue bars) mapping to a region within the first intron of DCHS1, but the marine populations (red bars) appear to universally contain this piece of DNA. The deleted region encompasses an element that shows cross-species conservation with medaka. (B) We cloned three copies of this region upstream of the HSP70 minimal promoter and GFP reporter to test its regulatory potential. (C) We injected this construct into fertilized eggs and multiple transgenic lines show GFP expression in the developing brain at 5 d post-fertilization (dotted white line shows the outline of the embryo within the egg).