
A high-resolution integrated map of copy number polymorphisms within and between breeds of the modern domesticated dog.

Thomas J Nicholas, Carl Baker, Evan E Eichler, Joshua M Akey.

Abstract

BACKGROUND: Structural variation contributes to the rich genetic and phenotypic diversity of the modern domestic dog, Canis lupus familiaris, although compared to other organisms, catalogs of canine copy number variants (CNVs) are poorly defined. To this end, we developed a customized high-density tiling array across the canine genome and used it to discover CNVs in nine genetically diverse dogs and a gray wolf.
RESULTS: In total, we identified 403 CNVs that overlap 401 genes, which are enriched for defense/immunity, oxidoreductase, protease, receptor, signaling molecule and transporter genes. Furthermore, we performed detailed comparisons between CNVs located within versus outside of segmental duplications (SDs) and find that CNVs in SDs are enriched for gene content and complexity. Finally, we compiled all known dog CNV regions and genotyped them with a custom aCGH chip in 61 dogs from 12 diverse breeds. These data allowed us to perform the first population genetics analysis of canine structural variation and identify CNVs that potentially contribute to breed specific traits.
CONCLUSIONS: Our comprehensive analysis of canine CNVs will be an important resource in genetically dissecting canine phenotypic and behavioral variation.

Year: 2011 PMID: 21846351 PMCID: PMC3166287 DOI: 10.1186/1471-2164-12-414

Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969

Background

The domestication of the modern dog from its wolf ancestors has resulted in an extraordinary amount of diversity in canine form and function. As such, dogs are poised to provide unique insights into the genetic architecture of phenotypic variation and the mechanistic basis of strong artificial selection. A number of canine genomics resources have been developed to facilitate genotype-phenotype inferences, including a high-quality whole genome sequence and a dense catalog of SNPs discovered in a wide variety of breeds [1-3]. These genomics resources have been successfully used to identify an increasing number of genes that influence hallmark breed characteristics such as size, coat texture, and skin wrinkling [4-6]. Additionally, SNP data has been used to investigate patterns of genetic variation within and between breeds, establish timing and geography of domestication, examine relatedness among breeds, and identify signatures of artificial selection [4,7-9]. In addition to SNPs, it is important to characterize additional components of canine genomic variation in order to comprehensively assess the genetic basis of phenotypic diversity. For example, structural variation in general, and copy number variants (CNVs) in particular, has emerged as an important source of genetic variation in a wide range of organisms including dogs [10-18]. Duplications and deletions of genomic sequence can have significant impacts on a wide range of phenotypes including breed-defining traits. For instance, a duplication of a set of FGF genes in Rhodesian and Thai Ridgebacks leads to the breeds' characteristic dorsal hair ridge [19]. Although the FGF duplication provides a vivid example of the phenotypic consequences of structural variation in dogs, it remains unknown whether CNVs are an appreciable source of variation in morphological, behavioral, and physiological traits within and between breeds.
Comprehensive discovery of structural variation in a diverse panel of breeds is an important first step in more systematically delimiting the contribution of CNVs to canine phenotypic variation. Previously, we used a customized aCGH chip to identify nearly 700 CNV regions located in segmental duplications (SDs) [17]. However, SDs only cover approximately 5% of the dog genome and thus a large fraction of total genomic space was unexplored. An additional study using a genome-wide tiling array from NimbleGen identified approximately 60 CNV regions outside of SDs [10]. However, the low probe density (~1 probe every 5 kb) limited the number and size of CNVs that could be identified. In an effort to more comprehensively interrogate the canine genome for CNVs, we used a high-density (~1 probe every 1 kb) genome-wide tiling array to discover additional CNVs in a panel of nine genetically and phenotypically diverse dogs. In total, we discovered over 400 new CNV regions. Moreover, we designed a custom aCGH chip to genotype all known canine CNVs in 61 dogs from 12 diverse breeds, allowing the first population genetics analysis of structural variation in dogs to be performed. The comprehensive CNV resources that we have developed will be important tools in genetically dissecting canine phenotypic variation.

Results and Discussion

Genome-wide identification of CNVs using a high-density aCGH chip

We performed aCGH using a high-density tiling array in nine breeds (Table 1), a gray wolf, and a self-self hybridization. These nine breeds and gray wolf samples were previously studied using a custom array that exclusively targeted regions containing SDs [17]. In all of the aCGH hybridizations we used the same reference sample (a female Boxer distinct from Tasha, the Boxer used for generating the canine reference sequence), which was also the reference in our prior SD experiments [17]. The aCGH chip consists of over 2.1 million probes distributed across the genome (not including the uncharacterized chromosome, chrUn) with an average probe spacing of approximately 1 kb. CNVs were identified using a circular binary segmentation algorithm implemented in the program segMNT, part of NimbleGen's NimbleScan software package. These calls were filtered by log2 values and number of probes using an adaptive threshold algorithm where the specific filtering criteria were a function of the size of the CNV (see Methods).

Table 1

Summary of CNVs identified with the genome-wide aCGH chip

	Number of CNVs

Breed	Total	Gain	Loss	Average Size (kb)	Genes
Basenji	109	45	64	54.9	114
Doberman Pinscher	107	57	50	83.8	88
German Shepherd	113	52	61	88.2	105
Labrador Retriever	77	33	44	90.9	88
Pug	97	44	53	62.3	74
Rottweiler	88	30	58	92.6	65
Shetland Sheepdog	86	35	51	123.5	91
Siberian Husky	86	47	39	61.7	91
Standard Poodle	109	37	72	64.6	127
Wolf	136	79	57	86.5	127
Self	0	0	0	0	0

Average	101	46	55	80.9	97

We identified 1,008 CNVs in 403 unique CNV regions spanning 30.5 Mb of genomic sequence (Table 1). In the self-self hybridization, no CNVs were called using the same analysis and filters. The average number of CNVs per individual was 101, ranging from 86 (Shetland Sheepdog and Siberian Husky) to 136 (Gray Wolf). The average CNV size was approximately 81 kb (Table 1), and the largest CNV region was located on CFA 34 and spans 3.9 Mb. In total, these 403 CNV regions overlap or contain 401 protein coding genes. After assigning all genes PANTHER Molecular Function terms, we found that the most enriched gene classes are similar to those identified in SDs, namely, defense/immunity, and receptor genes, but also included oxidoreductase, protease, signaling molecule, and transporter genes (Additional file 1). Figure 1 summarizes the location and characteristics of all known dog CNVs derived from this and previous studies [10,17]. In total, after merging closely spaced CNVs, 910 distinct CNV regions that cover over 49.8 Mb have been identified. Of these regions, 395 contain or overlap protein coding genes and 134 have been found in multiple experiments. Larger CNVs were more likely to be observed in multiple studies (average size of CNVs identified in multiple versus single studies was 220 kb versus 64 kb, respectively). As expected, the uncharacterized chromosome (chrUn), consisting of sequences that cannot be uniquely mapped to the genome, is particularly enriched for CNVs as it harbors approximately 65% of segmental duplications [17], which are hotspots of CNV formation.

Figure 1

An integrated map of all known CNVs in the canine genome. Gray bars represent chromosomes. Blue marks indicate the locations of 910 identified CNV regions. Red marks CNV regions that have been found in at least two different studies. Yellow stripes in the middle of the chromosomes mark CNV regions that contain or overlap known and predicted genes.

Comparison of SD vs Non-SD CNVs

We used the same individuals and reference sample as in our previous study of CNVs in segmental duplications, providing an opportunity to directly compare characteristics of CNVs between SDs and non-SD regions (Table 2). While most CNVs were not associated with SDs, on average CNVs associated with SDs were much larger (160.1 kb vs 33.6 kb; Table 2), so that the majority of CNV sequence (21.5 Mb or 70%) is associated with SDs. Similarly, the majority of genic CNVs (66%) were also associated with SDs.

Table 2

Comparison of CNVs located in SDs and outside of SDs

Breed	CNV Location	Gain	Loss	Complex	Singletons	Average Size (kb)	Genes
Basenji	SD	17	22	3	4	173.0	82
	non-SD	17	38	0	23	35.5	32
Doberman Pinscher	SD	21	16	5	3	144.2	59
	non-SD	22	25	0	15	40.8	29
German Shepherd	SD	21	28	1	6	220.5	87
	non-SD	26	29	0	22	40.2	18
Labrador Retriever	SD	15	18	1	2	184.2	48
	non-SD	14	23	0	14	91.8	40
Pug	SD	11	18	4	7	223.1	42
	non-SD	25	25	0	20	27.5	32
Rottweiler	SD	9	27	1	4	222.2	46
	non-SD	20	27	0	19	40.9	19
Shetland Sheepdog	SD	15	24	0	3	339.2	75
	non-SD	19	25	0	20	33.6	16
Siberian Husky	SD	22	14	2	10	176.1	79
	non-SD	15	21	0	15	52.2	12
Standard Poodle	SD	13	26	1	7	215.6	87
	non-SD	17	44	0	36	25.3	40
Wolf	SD	34	25	3	16	207.7	95
	non-SD	25	28	0	14	36.3	32

Of the 403 distinct CNV regions, 143 are present in multiple individuals and 260 were identified in a single individual. Interestingly, approximately 80% of these "singletons" are located outside of SDs (Table 2) as has been observed in humans [20-22]. Moreover, CNV complexity was markedly different between SD and non-SD CNVs. Specifically, we define CNV regions that exhibit both gains and losses in copy number within a single individual as complex. While only 14 complex regions were identified, they are all from segmental duplications. These observations are consistent with the dynamic nature of SDs [17,20-26], which are likely to harbor CNVs that are polymorphic within and between breeds.

CNV genotyping using a custom aCGH chip

To better understand how CNV variation is apportioned within and between breeds, we designed a custom 12-plex NimbleGen aCGH chip and genotyped 61 dogs from 12 diverse breeds (Table 3) for all known canine CNVs (Figure 1). The average probe spacing was approximately 560 bp, and all of the hybridizations were performed with the same female Boxer reference used in previous aCGH experiments. We used a hidden Markov model implemented in the software package RJaCGH [27] to call CNVs for each CNV region in each sample (see Methods). The RJaCGH software package assigns a posterior probability to each aCGH probe as being in a gain, loss, or normal copy state. A summary of the posterior probabilities of each probe across all 61 individuals is shown in Figure 2.

Table 3

Summary of CNVs identified in each breed with the genotyping aCGH chip

Breed	N^a	Total CNVs	Average^b	Range^c	Average H_e	Fixed Gains	Fixed Losses	Unique CNVs	Genic CNVs
Alaskan Malamute	4	406	188	86-306	0.194	15	22	8	194
Beagle	5	467	175	40-282	0.201	9	0	4	185
Border Collie	5	388	228	92-306	0.244	16	11	10	223
Boxer	5	403	133	72-244	0.160	6	0	7	165
Brittany	5	337	229	84-296	0.227	25	5	2	195
Dachshund	5	340	150	86-223	0.171	6	4	4	168
German Shepherd	5	382	201	144-219	0.193	26	18	5	196
Greyhound	5	394	156	40-267	0.180	7	6	6	189
Jack Russell Terrier	5	379	180	111-267	0.185	27	6	2	189
Labrador Retriever	6	409	179	119-254	0.194	8	7	7	206
Shar Pei	5	353	189	93-262	0.191	22	4	3	170
Standard Poodle	6	470	237	112-332	0.242	35	1	18	230

a Denotes the number of individuals studied.

b Average number of CNVs per individual

c Indicates the range in the number of CNVs identified per individual within each breed.

Figure 2

Heatmap representation of CNVs in all individuals. Columns represent individuals and rows represent a transformed measure of the posterior probability of each aCGH probe coming from a loss, normal copy, or gain state, denoted as PLoss, PNormal, and PGain, respectively. Specifically, for each probe, the posterior probabilities of each state obtained from RJaCGH were converted into a single value by first dividing all three posterior probabilities by the largest value and then calculating a transformed score defined as (PGain - PNormal) - (PLoss - PNormal), which results in a probe score that varies between -1 and 1. The values of -1, 0, and 1 correspond to the strongest evidence for loss, normal copy, and gains, respectively. Intermediate values reflect more uncertainty as to the state a given probe is in. Breeds are abbreviated as follows: Alaskan Malamute (AKM), Border Collie (BC), Beagle (BGL), Brittany (BRT), Boxer (BXR), Dachshund (DSH), Greyhound (GRY), German Shepherd (GSH), Jack Russell Terrier (JRT), Labrador Retriever (LBR), Shar Pei (SHP) and Standard Poodle (STP).
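The probe score described in this caption can be sketched in a few lines of Python. This is an illustrative reading of the caption only, not the authors' code; the function name `probe_score` and its argument names are hypothetical.

```python
def probe_score(p_loss, p_normal, p_gain):
    """Collapse three posterior probabilities into a single score in [-1, 1].

    Hypothetical helper: follows the caption's description, in which all
    three posteriors are divided by the largest one and then combined as
    (PGain - PNormal) - (PLoss - PNormal).
    """
    largest = max(p_loss, p_normal, p_gain)
    if largest <= 0:
        raise ValueError("at least one posterior probability must be positive")
    # Rescale so that the most probable state has the value 1.
    loss, normal, gain = (p / largest for p in (p_loss, p_normal, p_gain))
    # The normal-copy term cancels, so the score equals gain - loss.
    return (gain - normal) - (loss - normal)
```

Under this reading, a probe that is confidently normal copy and a probe with equal support for gain and loss both score 0, so intermediate values should be interpreted together with the underlying posteriors.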

Raw CNV calls from RJaCGH were filtered based on the number of data points, average posterior probabilities for probes in the putative CNV, and average log2 values (see Methods). Of the 892 regions studied, 665 (75%) had at least one individual containing a CNV. Over 95% of the CNV regions that appeared as monomorphic were previously identified in a breed not studied in the CNV genotyping panel; thus, failure to confirm CNVs in these regions is likely due to both individual- or breed-specific CNVs and false positives in previous CNV discovery experiments.
As shown in Table 3, the average number of CNVs across all individuals was 187, ranging from 40 (in a Beagle and Greyhound) to 332 (in a Standard Poodle). Before pursuing detailed population genetics inferences, we performed three analyses to assess data quality and false discovery rates. First, we performed three self-self hybridizations of a Boxer, Greyhound, and Shar Pei. Using the same criteria to identify CNVs as described above, we called 0, 1, and 6 CNVs in the Shar Pei, Boxer, and Greyhound, respectively. Thus, the self-self hybridizations suggest a low false discovery rate (< 5%). Second, we included 42 control regions on the genotyping aCGH chip selected from putatively single copy sequence defined from earlier CNV experiments [17]. Across all individuals, and thus a total of 61 × 42 = 2,562 control region genotypes, only 56 CNVs were called (located in 14 distinct control regions), which also suggests a low false discovery rate. Note, it is plausible that genuine CNVs exist in some of these putative single copy control sequences, which were not observed in previous studies that examined a smaller number of individuals. Indeed, Monte Carlo simulations demonstrate that the expected number of control regions to harbor a CNV given 56 false positives is 31 (standard deviation = 2), suggesting that the observed patterns of CNVs in control regions are more clustered than expected by chance and hence some may be genuine CNVs. Third, three of the individuals included in the genotyping panel (a German Shepherd, Labrador Retriever, and Standard Poodle) were also previously interrogated for CNVs with the SD [17] and 2.1 chips (described above). The average overlap between CNVs called in the previous aCGH experiments and the genotyping chip across all three samples was 74.9%. To interpret the observed amount of overlap, we performed extensive simulations that recapitulate characteristics of the three aCGH chips and distribution of log2 values (see Methods).
The observed overlap was similar to the simulated data (average overlap 71.9%, with a 95% confidence interval of 70.9-73.2%), and the discordances are primarily a result of different probe densities across chips, which influence the power to detect CNVs. Overall, these three analyses suggest the CNV genotype data is of high quality. Furthermore, we also examined whether CNV calls were more concordant between the genotyping chip and the SD chip or between the genotyping chip and NimbleGen 2.1 tiling array. In general, the concordances were similar, but higher for CNVs initially discovered on the SD chip (0.78) than CNVs discovered on the NimbleGen 2.1 tiling chip (0.71). Moreover, as expected, larger CNVs (≥ 100 kb) were more concordant (81.6%) than smaller (< 100 kb) CNVs (74.9%).
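The control-region argument above can be illustrated with a small simulation. The sketch below is not the authors' Monte Carlo procedure, whose details are not given; it assumes that 56 false-positive calls fall uniformly at random over the 61 × 42 individual-by-region cells, with at most one call per cell. The function names are hypothetical.

```python
import random


def expected_regions_hit(n_calls, n_regions):
    """Closed-form expectation of distinct regions hit by uniformly placed calls.

    Simplifying assumption: calls are placed independently (with replacement).
    """
    return n_regions * (1.0 - (1.0 - 1.0 / n_regions) ** n_calls)


def simulate_regions_hit(n_calls, n_regions, n_individuals, n_reps=2000, seed=1):
    """Monte Carlo estimate (mean, standard deviation) of distinct regions hit.

    Assumption: each (individual, region) cell carries at most one call, so
    calls are sampled without replacement from all cells.
    """
    rng = random.Random(seed)
    n_cells = n_regions * n_individuals
    counts = []
    for _ in range(n_reps):
        hit_cells = rng.sample(range(n_cells), n_calls)
        # Map each cell index back to its region and count distinct regions.
        counts.append(len({cell % n_regions for cell in hit_cells}))
    mean = sum(counts) / n_reps
    variance = sum((c - mean) ** 2 for c in counts) / (n_reps - 1)
    return mean, variance ** 0.5


if __name__ == "__main__":
    print(expected_regions_hit(56, 42))
    print(simulate_regions_hit(56, 42, 61))
```

Under these assumptions the closed-form expectation for 56 calls over 42 regions is roughly 31, in line with the value reported above, whereas only 14 distinct control regions were observed to carry calls.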

Patterns of CNV diversity within breeds

We estimated approximate allele frequencies for each breed and for each CNV using a simple EM algorithm [28] (see Methods). From these allele frequencies, we calculated the expected heterozygosity (H_e) for each breed at every polymorphic CNV region, and the average H_e for each breed is shown in Table 3. As expected from SNP and sequence data [1,3], Boxers were the least diverse breed studied and Border Collies were the most diverse breed (Table 3). Interestingly, we observe a significant difference (p < 10^-5) in the average H_e between CNVs from SDs and CNVs not from SDs (Figure 3) in all breeds, consistent with the dynamic nature of SDs leading to increased segregating variation.

Figure 3

Average heterozygosity of SD and non-SD CNVs. Red squares and blue diamonds denote average heterozygosity for CNV regions associated with SDs and non-SDs, respectively. Vertical lines represent 95% confidence intervals. Breed abbreviations are described in Figure 2.

To better understand how CNVs contribute to within breed diversity, we searched for CNV regions that exhibited high levels of heterozygosity. Interestingly, 45 regions were identified that exhibited high diversity in one or more breeds (H_e > 0.6). For example, a CNV region on CFA12 was identified in the Standard Poodle, which contains a number of genes, such as PSORS1C2, CDSN, and CCHCR1, that are associated with various epithelial processes and skin disorders (Figure 4). Standard Poodles are a breed marked with common occurrences of skin disorders or disorders with epithelial symptoms such as Cushing's disease (hyperadrenocorticism) [29,30] and sebaceous adenitis [31,32]. Additionally, some skin disorders, such as psoriasis in humans, have been associated with copy number polymorphisms [33]. Thus, PSORS1C2, CDSN, and CCHCR1 are excellent candidates to pursue in future association studies of skin phenotypes in Standard Poodles. Furthermore, a topoisomerase gene, TOP3B, involved in the cutting of DNA strands during transcription and recombination [34], was also found to be polymorphic in six breeds (Alaskan Malamute, Border Collie, Brittany, Labrador Retriever, Shar Pei, and Standard Poodle).

Figure 4

Patterns of CNVs in six Standard Poodles for a region on CFA12. Each bar represents the log2 value (y-axis) of a probe as a function of position (x-axis) across the region. Blue, red, and black bars indicate whether the probe was called as being in a gain, loss, or normal copy state, respectively. Highlighted in purple is a genic region corresponding to the location of the human homologs of the PSORS1C2, CDSN, and CCHCR1 genes. Note, the heterozygosity of this region in the main text is based on the entire region, and not just the purple highlighted interval.

Patterns of CNV diversity between breeds

To better understand patterns of CNV variation between breeds, we calculated FST for each polymorphic CNV region. The distribution of FST across all CNV regions is shown in Figure 5, which ranges from 0.028 to 0.86. The average FST is 0.168, which is comparable to, although slightly lower than, estimates of FST in SNP data [4,8]. No significant difference in FST was detected between SD and non-SD CNVs (p > 0.05). A number of interesting genes exist among the top 50 most differentiated CNV regions that may be relevant to phenotypic variation between breeds, such as ATBF1, a zinc finger transcription factor that regulates neuronal and muscle development [35], and NKAIN2, which is associated with susceptibility to lymphoma [36], the most common form of canine cancer [37].

Figure 5

Distribution of FST.

In addition, we also identified CNVs where all individuals within one or more breeds carried a duplication or deletion that was absent in at least one of the remaining breeds. In total, 49 regions exhibiting this pattern were identified (Figure 6, Additional file 2), 21 of which overlap the top 50 most differentiated CNVs described above. A number of these divergent regions possessed genes that potentially contribute to phenotypic differences between breeds, including genes involved in development (OBSCN, NOTCH2, and NKD2), neuronal processes (TNFRSF1B and ATBF1), olfaction (OR4S2, OR4C30, and OR52B4), and metabolism (HMGCS2).

Figure 6

Diverse CNV region on CFA 1. Each bar represents the log2 value (y-axis) of a probe as a function of position (x-axis) across the region. Blue, red, and black bars indicate whether the probe was called as being in a gain, loss, or normal copy state, respectively. All Border Collie (BC) individuals have a loss in this region, all Boxer (BXR) individuals show no evidence for a CNV, and Greyhounds (GRY) segregate both gains and losses.

Conclusions

In summary, we have compiled the most comprehensive catalog of canine structural variation described to date. Moreover, we examined patterns of variation for all known canine CNVs in a diverse panel of 12 breeds, providing the first insight into how structural variation is apportioned within and between breeds. Interestingly, we found high levels of CNV diversity within breeds, suggesting that structural variation may be an important source of genetic variation contributing to within breed patterns of phenotypic diversity. Moreover, our data is consistent with a high rate of de novo CNV formation within breeds. We anticipate that the CNV resources developed in this work will complement existing genome-wide panels of SNP markers [1,3,9] by providing the foundation for future association studies to delimit how structural variation contributes to canine phenotypic variation and disease susceptibility.

Methods

DNA samples

For the genome-wide tiling aCGH experiments, a single individual from each of the following breeds was used: Basenji, Doberman Pinscher, German Shepherd, Labrador Retriever, Pug, Shetland Sheepdog, Siberian Husky, Standard Poodle, and Rottweiler, together with a gray wolf. Samples used in the genotyping aCGH experiments included the following breeds: Alaskan Malamute, Beagle, Border Collie, Boxer, Brittany, Dachshund, German Shepherd, Greyhound, Jack Russell Terrier, Labrador Retriever, Shar Pei, and Standard Poodle. A total of 3 "self-self" hybridizations were performed using the female Boxer reference, a Greyhound, and a Shar Pei. DNA quality of all samples was assessed by taking OD260/280 and OD260/230 readings using a nanospectrometer.

aCGH and CNV identification

The high density aCGH chip was designed and produced by NimbleGen http://www.NimbleGen.com, and included 2,164,508 oligonucleotide probes with an average probe spacing of 1050 bp. All genomic DNA samples were sent to NimbleGen who performed the hybridizations. In all cases a female Boxer was used as the reference sample. Each hybridization was initially subjected to segmentation using the CGH-segMNT program within the NimbleScan software package. Segments were further partitioned if there was a gap greater than 50 kb between adjacent probes. Furthermore, segments within 5 kb of one another and with consistent log2 ratios (either both positive or both negative) were merged together to form a new segment. To define segments corresponding to gains and losses, we developed an adaptive threshold algorithm that takes advantage of the observation that segments with more data points require smaller changes in log2 ratios to be reliably called as a gain or loss whereas segments with fewer data points require larger magnitudes of log2 ratios to be accurately called as a gain or loss. We trained our algorithm on the self-self hybridization to identify parameters resulting in a low false discovery rate. Specifically, if a segment contained 5-10 data points, 11-100, or > 100 data points, we required an average log2 ratio that was 3, 2, and 1 standard deviations or greater from the mean, respectively, to be retained. Thus, a minimum of five probes was required to call a CNV. All aCGH data has been submitted to GEO http://www.ncbi.nlm.nih.gov/geo/ under accession number GSE26170.
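The adaptive threshold rule described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and it assumes that the mean and standard deviation are those of the log2 ratios in the hybridization being analyzed, which the text does not state explicitly.

```python
def required_sd_multiplier(n_points):
    """Number of standard deviations a segment mean must deviate to be retained.

    Returns None for segments with fewer than five data points, which are
    never called.
    """
    if n_points < 5:
        return None
    if n_points <= 10:
        return 3.0
    if n_points <= 100:
        return 2.0
    return 1.0


def classify_segment(mean_log2, n_points, baseline_mean, baseline_sd):
    """Label a segment 'gain', 'loss', or 'normal' under the size-dependent rule.

    baseline_mean and baseline_sd are assumed to summarize the log2 ratios
    of the whole hybridization (an assumption of this sketch).
    """
    multiplier = required_sd_multiplier(n_points)
    if multiplier is None:
        return "normal"
    deviation = mean_log2 - baseline_mean
    if deviation >= multiplier * baseline_sd:
        return "gain"
    if deviation <= -multiplier * baseline_sd:
        return "loss"
    return "normal"
```

For example, with a baseline mean of 0 and standard deviation of 0.2, a segment mean of 0.5 is retained as a gain when it is supported by 50 probes but not when it is supported by only 8.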

CNV genotyping

A custom aCGH genotyping chip was developed with NimbleGen using the CanFam2.0 assembly. The chip contains 12 individual lanes, each spotted with 136,929 oligonucleotide probes with a mean probe spacing of approximately 560 bp. These probes were primarily designed to tile over all previously identified CNVs including the 678 CNV regions identified in segmental duplications [17], 403 CNV regions identified from a genome-wide CNV detection survey using the NimbleGen 2.1 tiling arrays, and 60 CNV regions from a separate genome-wide study [10]. In addition, 42 putative single copy control regions that had never before been found to contain CNVs and were not associated with segmental duplications were included. Finally, 1,095 additional regions were included on the chip, which were derived from lower confidence CNV calls. Note, these CNV regions were excluded in all analyses described in this manuscript, but information about them is provided in Additional files 3 and 4. Coordinates from all these regions were merged and covered with aCGH probes. Hybridizations of 61 individuals from 12 different breeds were performed using a common female Boxer as a reference sample. Additionally, three self-self hybridizations were also performed. Breeds were randomized across chips to mitigate confounding factors. The raw log2 ratios were first normalized by loess regression. Next, we fit linear models to the residuals of the loess regression to account for spot position and chip number. For all samples, individual probes were grouped into sets of five consecutive probes (unless adjacent probes were more than 5 kb apart) and their log2 values were averaged. The average log2 values were then called for CNVs using a reversible jump hidden Markov model implemented in the software RJaCGH [27]. The output of RJaCGH consists of a state call for each probe (i.e., gain, normal copy, and loss) and the posterior probability of being in each state.
Using the self-self hybridizations, adaptive thresholds were established to filter these raw CNV calls based on the number of data points, average posterior probabilities for probes in the putative CNV, and average log2 value across probes in a putative CNV. Specifically, for segments consisting of three to five averaged data points (corresponding to approximately 8.4 - 14 kb), we required a posterior probability greater than 0.75 and a log2 value equal to the mean ± 0.5*standard deviation of all log2 values (note, plus for gains and minus for losses). If the segment consisted of > 5 averaged data points (corresponding to a minimum size of approximately 16.8 kb), we retained RJaCGH CNV calls with a posterior probability ≥ 0.6. All unique X-linked CNVs called as deletions in male dogs were removed since the reference was a female dog.
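The probe-averaging step that precedes RJaCGH calling can be sketched as below. The text does not say exactly how a gap of more than 5 kb is handled; this sketch assumes that such a gap closes the current group early so that no averaged data point spans it. The function name and the tuple layout of the output are hypothetical.

```python
def average_probe_windows(positions, log2_values, window=5, max_gap=5000):
    """Average log2 ratios over runs of up to `window` consecutive probes.

    Assumption of this sketch: a gap larger than `max_gap` bp between
    adjacent probes ends the current group, so groups never span a gap.
    Returns a list of (start, end, mean_log2, n_probes) tuples.
    """
    if len(positions) != len(log2_values):
        raise ValueError("positions and log2_values must have equal length")
    windows = []
    group_pos, group_val = [], []

    def flush():
        # Emit the current group as one averaged data point.
        windows.append(
            (group_pos[0], group_pos[-1], sum(group_val) / len(group_val), len(group_val))
        )

    for pos, val in zip(positions, log2_values):
        gap_too_large = bool(group_pos) and pos - group_pos[-1] > max_gap
        if group_pos and (gap_too_large or len(group_pos) == window):
            flush()
            group_pos, group_val = [], []
        group_pos.append(pos)
        group_val.append(val)
    if group_pos:
        flush()
    return windows
```

With probes spaced roughly 560 bp apart, a full group of five probes spans about 2.8 kb, which is consistent with the sizes quoted above for three to five averaged data points (approximately 8.4 - 14 kb).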

Simulations

Simulations were performed to interpret the observed amount of overlap between CNVs for the German Shepherd, Labrador Retriever, and Standard Poodle samples, which were analyzed on multiple chip platforms. The aCGH designs considered included the custom segmental duplication chip [17], the genome-wide 2.1 million feature chip, and the genotyping chip. Distributions of CNV sizes, probe spacing, and log2 values were generated for gains, normal copy, and losses conditional on the observed distributions of these quantities in each sample. Using this information, normal copy and CNV regions were simulated for each sample across all three array platforms, and subjected to the same CNV analysis as described above. For a given region, overlapping CNV calls are defined in cases where the same CNV genotype is obtained between platforms.

CNV allele frequency estimations

Exact allele frequencies are difficult to calculate because precise copy numbers are unknown. To this end, we inferred approximate allele frequencies by simplifying CNV phenotypes into three categories: normal copy, gain, or loss. The frequency of each category was estimated by a standard EM algorithm [28]. The estimated allele frequencies were used to calculate expected heterozygosity (H_e) for each breed and each CNV region as H_e = 1 - (p^2 + q^2 + r^2), where p, q, and r denote the frequencies of chromosomes carrying normal copy, gains, and losses, respectively. Similarly, for each CNV region, we calculated FST as FST = 1 - h_s/h_t, where h_s and h_t denote average heterozygosity within subpopulations (breeds) and total heterozygosity, respectively.
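The two summary statistics defined above can be computed with a short Python sketch. It takes the three category frequencies as given and does not reproduce the EM step. The function names are hypothetical, and the total heterozygosity is computed here from the unweighted mean of the breed frequencies, which is an assumption because the text does not specify how breeds were weighted.

```python
def expected_heterozygosity(p, q, r):
    """Expected heterozygosity from normal-copy (p), gain (q), and loss (r) frequencies."""
    if abs(p + q + r - 1.0) > 1e-6:
        raise ValueError("frequencies must sum to 1")
    return 1.0 - (p * p + q * q + r * r)


def fst(breed_frequencies):
    """FST = 1 - h_s / h_t for one CNV region.

    breed_frequencies: list of (p, q, r) tuples, one per breed.
    Assumption of this sketch: h_t uses the unweighted mean frequencies.
    """
    n_breeds = len(breed_frequencies)
    if n_breeds == 0:
        raise ValueError("at least one breed is required")
    # Average within-breed heterozygosity.
    h_s = sum(expected_heterozygosity(*f) for f in breed_frequencies) / n_breeds
    # Heterozygosity of the pooled population.
    pooled = tuple(sum(f[i] for f in breed_frequencies) / n_breeds for i in range(3))
    h_t = expected_heterozygosity(*pooled)
    if h_t == 0:
        raise ValueError("region is monomorphic; FST is undefined")
    return 1.0 - h_s / h_t
```

Two breeds fixed for different states give an FST of 1 under this definition, and breeds with identical frequencies give 0, which brackets the observed range of 0.028 to 0.86.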

Gene identification and PANTHER analysis

A catalog of all canine peptides, containing 25,546 peptides, was downloaded from Ensembl (ftp://ftp.ensembl.org/pub/current_fasta/canis_familiaris/pep/). For each breed, the total number of genic CNVs and associated peptides were determined and PANTHER Molecular Function terms were assigned to all peptides using the PANTHER Hidden Markov Model scoring tools (http://www.pantherdb.org/downloads/). PANTHER Molecular Function terms with fewer than five observations among the breed associated genes were not analyzed further. For each breed, we tested for overrepresentation of PANTHER terms in the CNV regions using the hypergeometric distribution. Bonferroni corrections were used to correct p-values for multiple hypothesis testing.
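The overrepresentation test described above can be sketched with the Python standard library. This is an illustration of a one-sided hypergeometric test with Bonferroni correction, not the authors' pipeline; the function names and the argument names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` items are sampled without replacement.

    population: total number of annotated peptides (hypothetical name)
    successes:  peptides in the population carrying the term of interest
    draws:      peptides overlapping CNV regions
    k:          CNV-overlapping peptides carrying the term
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]
```

In this setup each PANTHER term tested for a breed contributes one p-value, and the Bonferroni factor is the number of terms retained after the minimum-observation filter.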

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

TJN, EEE, and JMA conceived of and designed the experiments. TJN and CB performed all of the experiments. TJN and JMA analyzed the data. TJN and JMA wrote the paper. All authors read and approved the final manuscript.

Additional file 1

Enriched Panther Molecular Function Terms in CNV regions identified on the 2.1 chip. This table summarizes Gene Ontology Molecular Function terms that are significantly overrepresented in CNV regions identified on the 2.1 chip.

Additional file 2

Heterozygosities of the 49 regions where one breed was fixed for a CNV that was absent in one or more breeds. This table summarizes heterozygosities of the 49 CNV regions that exhibit interesting patterns of allele frequency variation within and between breeds.

Additional file 3

Summary of all CNV regions. This table provides information on the genomic locations and sources for all CNV regions.

Additional file 4

CNV genotypes. This table summarizes genotypes for all individuals across all CNVs.

35 in total

1. Sebaceous adenitis in standard poodles.

Authors: D H Scarff
Journal: Vet Rec Date: 2000-04-15 Impact factor: 2.695

2. Hyperadrenocorticism in a dog: a case report.

Authors: J A Mulnix; K W Smith
Journal: J Small Anim Pract Date: 1975-03 Impact factor: 1.522

3. Genome sequence, comparative analysis and haplotype structure of the domestic dog.

Authors: Kerstin Lindblad-Toh; Claire M Wade; Tarjei S Mikkelsen; Elinor K Karlsson; David B Jaffe; Michael Kamal; Michele Clamp; Jean L Chang; Edward J Kulbokas; Michael C Zody; Evan Mauceli; Xiaohui Xie; Matthew Breen; Robert K Wayne; Elaine A Ostrander; Chris P Ponting; Francis Galibert; Douglas R Smith; Pieter J DeJong; Ewen Kirkness; Pablo Alvarez; Tara Biagi; William Brockman; Jonathan Butler; Chee-Wye Chin; April Cook; James Cuff; Mark J Daly; David DeCaprio; Sante Gnerre; Manfred Grabherr; Manolis Kellis; Michael Kleber; Carolyne Bardeleben; Leo Goodstadt; Andreas Heger; Christophe Hitte; Lisa Kim; Klaus-Peter Koepfli; Heidi G Parker; John P Pollinger; Stephen M J Searle; Nathan B Sutter; Rachael Thomas; Caleb Webber; Jennifer Baldwin; Adal Abebe; Amr Abouelleil; Lynne Aftuck; Mostafa Ait-Zahra; Tyler Aldredge; Nicole Allen; Peter An; Scott Anderson; Claudel Antoine; Harindra Arachchi; Ali Aslam; Laura Ayotte; Pasang Bachantsang; Andrew Barry; Tashi Bayul; Mostafa Benamara; Aaron Berlin; Daniel Bessette; Berta Blitshteyn; Toby Bloom; Jason Blye; Leonid Boguslavskiy; Claude Bonnet; Boris Boukhgalter; Adam Brown; Patrick Cahill; Nadia Calixte; Jody Camarata; Yama Cheshatsang; Jeffrey Chu; Mieke Citroen; Alville Collymore; Patrick Cooke; Tenzin Dawoe; Riza Daza; Karin Decktor; Stuart DeGray; Norbu Dhargay; Kimberly Dooley; Kathleen Dooley; Passang Dorje; Kunsang Dorjee; Lester Dorris; Noah Duffey; Alan Dupes; Osebhajajeme Egbiremolen; Richard Elong; Jill Falk; Abderrahim Farina; Susan Faro; Diallo Ferguson; Patricia Ferreira; Sheila Fisher; Mike FitzGerald; Karen Foley; Chelsea Foley; Alicia Franke; Dennis Friedrich; Diane Gage; Manuel Garber; Gary Gearin; Georgia Giannoukos; Tina Goode; Audra Goyette; Joseph Graham; Edward Grandbois; Kunsang Gyaltsen; Nabil Hafez; Daniel Hagopian; Birhane Hagos; Jennifer Hall; Claire Healy; Ryan Hegarty; Tracey Honan; Andrea Horn; Nathan Houde; Leanne Hughes; Leigh Hunnicutt; M Husby; Benjamin Jester; Charlien Jones; Asha 
Kamat; Ben Kanga; Cristyn Kells; Dmitry Khazanovich; Alix Chinh Kieu; Peter Kisner; Mayank Kumar; Krista Lance; Thomas Landers; Marcia Lara; William Lee; Jean-Pierre Leger; Niall Lennon; Lisa Leuper; Sarah LeVine; Jinlei Liu; Xiaohong Liu; Yeshi Lokyitsang; Tashi Lokyitsang; Annie Lui; Jan Macdonald; John Major; Richard Marabella; Kebede Maru; Charles Matthews; Susan McDonough; Teena Mehta; James Meldrim; Alexandre Melnikov; Louis Meneus; Atanas Mihalev; Tanya Mihova; Karen Miller; Rachel Mittelman; Valentine Mlenga; Leonidas Mulrain; Glen Munson; Adam Navidi; Jerome Naylor; Tuyen Nguyen; Nga Nguyen; Cindy Nguyen; Thu Nguyen; Robert Nicol; Nyima Norbu; Choe Norbu; Nathaniel Novod; Tenchoe Nyima; Peter Olandt; Barry O'Neill; Keith O'Neill; Sahal Osman; Lucien Oyono; Christopher Patti; Danielle Perrin; Pema Phunkhang; Fritz Pierre; Margaret Priest; Anthony Rachupka; Sujaa Raghuraman; Rayale Rameau; Verneda Ray; Christina Raymond; Filip Rege; Cecil Rise; Julie Rogers; Peter Rogov; Julie Sahalie; Sampath Settipalli; Theodore Sharpe; Terrance Shea; Mechele Sheehan; Ngawang Sherpa; Jianying Shi; Diana Shih; Jessie Sloan; Cherylyn Smith; Todd Sparrow; John Stalker; Nicole Stange-Thomann; Sharon Stavropoulos; Catherine Stone; Sabrina Stone; Sean Sykes; Pierre Tchuinga; Pema Tenzing; Senait Tesfaye; Dawa Thoulutsang; Yama Thoulutsang; Kerri Topham; Ira Topping; Tsamla Tsamla; Helen Vassiliev; Vijay Venkataraman; Andy Vo; Tsering Wangchuk; Tsering Wangdi; Michael Weiand; Jane Wilkinson; Adam Wilson; Shailendra Yadav; Shuli Yang; Xiaoping Yang; Geneva Young; Qing Yu; Joanne Zainoun; Lisa Zembek; Andrew Zimmer; Eric S Lander
Journal: Nature Date: 2005-12-08 Impact factor: 49.962

4. Complex patterns of copy number variation at sites of segmental duplications: an important category of structural variation in the human genome.

Authors: Violaine Goidts; David N Cooper; Lluis Armengol; Werner Schempp; Jeffrey Conroy; Xavier Estivill; Norma Nowak; Horst Hameister; Hildegard Kehrer-Sawatzki
Journal: Hum Genet Date: 2006-07-13 Impact factor: 4.132

5. Mammalian DNA topoisomerase IIIalpha is essential in early embryogenesis.

Authors: W Li; J C Wang
Journal: Proc Natl Acad Sci U S A Date: 1998-02-03 Impact factor: 11.205

6. Hotspots for copy number variation in chimpanzees and humans.

Authors: George H Perry; Joelle Tchinda; Sean D McGrath; Junjun Zhang; Simon R Picker; Angela M Cáceres; A John Iafrate; Chris Tyler-Smith; Stephen W Scherer; Evan E Eichler; Anne C Stone; Charles Lee
Journal: Proc Natl Acad Sci U S A Date: 2006-05-15 Impact factor: 11.205

7. The dog genome: survey sequencing and comparative analysis.

Authors: Ewen F Kirkness; Vineet Bafna; Aaron L Halpern; Samuel Levy; Karin Remington; Douglas B Rusch; Arthur L Delcher; Mihai Pop; Wei Wang; Claire M Fraser; J Craig Venter
Journal: Science Date: 2003-09-26 Impact factor: 47.728

Review 8. Principles of treatment for canine lymphoma.

Authors: Susan N Ettinger
Journal: Clin Tech Small Anim Pract Date: 2003-05

Review 9. Hyperadrenocorticism in the dog: canine Cushing's syndrome.

Authors: J M Owens; W D Drucker
Journal: Vet Clin North Am Date: 1977-08

10. An initial comparative map of copy number variations in the goat (Capra hircus) genome.

Authors: Luca Fontanesi; Pier Luigi Martelli; Francesca Beretti; Valentina Riggio; Stefania Dall'Olio; Michela Colombo; Rita Casadio; Vincenzo Russo; Baldassare Portolano
Journal: BMC Genomics Date: 2010-11-17 Impact factor: 3.969
