
Identification of common predisposing loci to hematopoietic cancers in four dog breeds.

Benoît Hédan¹, Édouard Cadieu¹, Maud Rimbault¹, Amaury Vaysse¹, Caroline Dufaure de Citres², Patrick Devauchelle³, Nadine Botherel¹, Jérôme Abadie⁴, Pascale Quignon¹, Thomas Derrien¹, Catherine André¹.

Abstract

Histiocytic sarcoma (HS) is a rare but aggressive cancer in both humans and dogs. The spontaneous canine model, which has clinical, epidemiological, and histological similarities with human HS and specific breed predispositions, provides a unique opportunity to unravel the genetic basis of this cancer. In this study, we aimed to identify germline risk factors associated with the development of HS in predisposed dog breeds. We used a methodology that combined several genome-wide association studies in a multi-breed and multi-cancer approach, as well as targeted next-generation sequencing and imputation. We combined several dog breeds (Bernese mountain dogs, Rottweilers, flat-coated retrievers, and golden retrievers) and three hematopoietic cancers (HS, lymphoma, and mast cell tumor). We not only refined the previously identified CDKN2A HS risk locus, but also identified new loci on canine chromosomes 2, 5, 14, and 20. Capture and targeted sequencing of specific loci suggested the existence of regulatory variants in non-coding regions and methylation mechanisms linked to risk haplotypes, which lead to strong cancer predisposition in specific dog breeds. We also showed that these canine cancer predisposing loci appear to result from the additive effect of several risk haplotypes that are also involved in other hematopoietic cancers, such as lymphoma or mast cell tumors. This illustrates the pleiotropic nature of these canine cancer loci, as observed in human oncology, and reinforces the value of predisposed dog breeds for studying cancer initiation and progression.


Year: 2021 PMID: 33793571 PMCID: PMC8016107 DOI: 10.1371/journal.pgen.1009395

Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917

1. Introduction

Over the past decade, dogs have emerged as a relevant and under-used spontaneous model for analyzing cancer predisposition and progression, and for developing and testing more efficient therapies for many human cancers [1-10]. With over 4.2 million dogs diagnosed with cancer annually in the USA [8], canine cancers represent a unique source of spontaneous tumors. Canine cancers share strong similarities with human cancers in both biological behavior and histopathological features [11-14]. Spontaneous canine cancers are therefore a natural and ethical (i.e., non-experimental) model for deciphering the genetic basis of cancers. Given the incomplete penetrance and genetic heterogeneity of human cancers, identifying their genetic predisposition is complex [6], and almost impossible for rare cancers. Because of specific breed structures and artificial selection, dog breeds have accumulated numerous susceptibilities to genetic diseases, and a limited number of critical genes are involved in complex diseases such as cancers [6]. Further, numerous genome-wide association studies (GWAS) in dogs have shown that complex traits such as body size and cancer involve a smaller number of loci with stronger effects in dogs than in humans, which facilitates their identification. Owing to extensive intra-breed linkage disequilibrium (LD), cancer loci have been successfully identified even in studies with small numbers of cases and controls [15-20]. Thus, spontaneously affected pet dogs with breed-specific cancers provide efficient natural models for identifying the genetics underlying several dog-human homologous cancers. Histiocytic sarcoma (HS), which involves histiocytic cells (dendritic or monocytic/macrophagic lineages), is extremely rare in humans, and is associated with a limited response to chemotherapy and high mortality.
Due to the rarity of this cancer, there is no consensus on its prognostic factors or standard treatment [21]; models are therefore urgently needed to better understand this aggressive cancer. Across the dog species as a whole, HS is also relatively rare; however, a few popular breeds, such as Bernese mountain dogs (BMD), Rottweilers, and retrievers (especially flat-coated retrievers [FCR]), are highly predisposed to this cancer, with breed-specific clinical presentations. Interestingly, the clinical presentation and histopathology of canine HS are similar to those observed in humans [22,23]. These breed-specific predispositions have made it possible to collect numerous cases and led to the identification of somatic mutations in the mitogen-activated protein kinase (MAPK) pathway that are shared between human and canine HS [24,25]. We recently showed that the same mutations of protein tyrosine phosphatase non-receptor type 11 (PTPN11), the most frequently altered gene of the MAPK pathway, are found in both human and canine HS [25]. Most importantly, thanks to these breed predispositions, we previously showed that somatic PTPN11 mutations, found in half of canine HS cases, are linked to an aggressive HS clinical subgroup in both dogs and humans [25]. Regarding HS predisposition, a previous study of 236 cases and 228 controls highlighted the S-methyl-5′-thioadenosine phosphorylase (MTAP)–cyclin-dependent kinase inhibitor 2A (CDKN2A) genomic region as one of the main loci conferring susceptibility to HS in BMD [26]. Nevertheless, HS is a multifactorial cancer, and other loci are expected to be involved in HS predisposition. This is consistent with the fact that, despite awareness of the disease and 20 years of attempts to select against HS, breeders have not succeeded in reducing the prevalence of this devastating cancer because of its strong heritability in BMD [27].
In addition, HS-predisposed breeds (BMD, Rottweiler, and retrievers) are suspected to share common risk alleles inherited from common ancestors; thus, cases from closely related breeds can accelerate the identification of common loci by narrowing the haplotypes shared in these critical regions [28]. Finally, these HS-predisposed breeds also have a high risk of developing other cancers, such as lymphomas, mast cell tumors, hemangiosarcomas, osteosarcomas, or melanomas [26,29,30]. A high proportion of deaths in these HS-predisposed breeds is estimated to be due to neoplasms (BMD: 45–76%, golden retriever: 39–50%, Labrador retriever: 31–34%, FCR: 54%, and Rottweiler: 30–45%) [30-32]. This study aimed to extend previous studies by deciphering the genetic basis of HS with a multi-breed approach. We performed exhaustive GWAS with substantially more cases and controls from three different breeds, and with higher-density single nucleotide variation (SNV) arrays. Our results not only strengthen the crucial role of the CDKN2A locus in HS, but also shed light on secondary loci located on canine chromosomes 2, 5, 14, and 20 that contain relevant novel candidate genes. They point toward the existence of regulatory variants in non-coding regions and/or methylation mechanisms linked to risk haplotypes, which ultimately lead to strong cancer predisposition in specific dog breeds.

2. Results

To decipher the genetic basis of HS in the dog model system, we took advantage of well-known HS-predisposed breeds. We combined GWAS data, based on high-density genotyped and imputed SNVs, from BMD, FCR, and Rottweilers affected by HS, lymphomas, or mast cell tumors, together with publicly available lymphoma and mast cell tumor data from golden retrievers [15,16] (Table 1).

Table 1

Characteristics of the genome-wide association studies (GWAS) analyses performed in this study.

* Dogs genotyped on the Affymetrix Axiom Canine Genotyping Array were also available in the Illumina 173K SNV Canine HD array format.

GWAS name	Section	Cancers	Breeds	SNV array format	Number of cases after QC	Number of controls after QC
GWAS_1_HS_BMD	§2.1	HS	BMD	Illumina 173k SNV Canine HD array	172	128
GWAS_2_HS+lymphoma_BMD	§2.2a	HS and lymphoma	BMD	Illumina 173k SNV Canine HD array	252	128
GWAS_3_HS+lymphoma_BMD+golden_retriever	§2.2a	HS and lymphoma	BMD, golden retriever	Illumina 173k SNV Canine HD array	293	300
GWAS_4_HS+MCT_BMD	§2.2b	HS and mast cell tumor	BMD	Illumina 173k SNV Canine HD array	216	128
GWAS_5_HS+MCT_BMD+golden_retriever	§2.2b	HS and mast cell tumor	BMD, golden retriever	Illumina 173k SNV Canine HD array	285	202
GWAS_6_HS_BMD_with_imputed_SNV	§2.3	HS	BMD	Affymetrix Axiom Canine Genotyping array, 1.1M SNV (n = 113)*; Illumina 20k SNV Canine HD array imputed to 1.1M SNV (n = 464); Illumina 173k SNV Canine HD array imputed to 1.1M SNV (n = 300)	403	347
GWAS_7_HS_BMD+FCR_with_imputed_SNV	§2.3	HS	BMD, FCR	Affymetrix Axiom Canine Genotyping array, 1.1M SNV (n = 134)*; Illumina 20k SNV Canine HD array imputed to 1.1M SNV (n = 464); Illumina 173k SNV Canine HD array imputed to 1.1M SNV (n = 328)	416	362
GWAS_8_HS_BMD+FCR+Rottweiler_with_imputed_SNV	§2.3	HS	BMD, FCR, Rottweiler	Affymetrix Axiom Canine Genotyping array, 1.1M SNV (n = 134)*; Illumina 20k SNV Canine HD array imputed to 1.1M SNV (n = 464); Illumina 173k SNV Canine HD array imputed to 1.1M SNV (n = 388)	453	385


2.1. Identification of loci linked to the risk of developing HS in the BMD breed

Using BMD DNA from 172 HS cases and 128 controls, we performed a first round of GWAS (GWAS_1_HS_BMD), correcting for population stratification and cryptic relatedness (Fig 1). Of the 103,487 SNVs retained after filtering, 21 were significantly associated with HS: 20 SNVs on chromosome 11 spanning 40.3–47.2 Mb (strongest associated SNV CFA11:41,161,441, pcorrected = 3.11 × 10−7) and one SNV on chromosome 20 (CFA20:30,922,308, pcorrected = 3.73 × 10−5). In addition, an SNV on chromosome 5 (CFA5:30,496,048, pcorrected = 9.48 × 10−5) was close to genome-wide significance and was suspected to be associated with HS. This GWAS confirmed that the main HS locus was located on CFA11, overlapping the MTAP-CDKN2A region, a locus previously associated with HS [26]. The analysis also identified a new locus on CFA20 and suggested the existence of another locus on CFA5. Interestingly, these three regions have previously been linked to cancer predisposition in dogs: CFA11 (41.3–41.4 Mb) in osteosarcoma, CFA5 (29.6–34.1 Mb) in lymphomas and hemangiosarcomas, and CFA20 (30.9–50.1 Mb) in mast cell tumors [15-17]. Indeed, Tonomura et al. identified two independent peaks on CFA5 involved in lymphomas and hemangiosarcomas [16] that overlap the CFA5 locus suspected in HS, while Arendt et al. identified at least two independent peaks on CFA20 involved in mast cell tumor predisposition [15] that also overlap the CFA20 HS locus found in BMD. We therefore hypothesized that, because of strong breed selection, the significant associations detected for HS could reflect cumulative risk alleles/haplotypes that also confer risk for other hematopoietic cancers. This hypothesis was reinforced by the strong predisposition of BMD to hematopoietic cancers such as lymphomas [33,34] and by the fact that most BMDs (58.3% to 66.5%) die of cancer, primarily HS, lymphomas, or mast cell tumors [26,30,35].
Moreover, in HS-affected BMD families, we observed that relatives of HS-affected dogs are frequently affected by other hematopoietic cancers, such as mast cell tumors and lymphomas [36] (S1 Fig).
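The quantile-quantile plots of the GWAS figures report a genomic inflation factor λ, which summarizes how far the observed test statistics depart from their null distribution; values near 1 indicate no residual inflation from stratification or relatedness. A minimal Python sketch follows, assuming the common definition (median observed 1-df chi-square statistic divided by the median of the chi-square(1) distribution, about 0.455); the exact computation performed by the software used in this study is not described here, and the function names are hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

# Median of a chi-square distribution with 1 df (about 0.455): if Z ~ N(0, 1),
# then Z**2 ~ chi2(1), whose median is the squared 75th percentile of Z.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def p_to_chi2(p: float) -> float:
    """Convert a two-sided p-value (0 < p <= 1) into a 1-df chi-square statistic."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p-values must lie in (0, 1]")
    # Using the lower tail (p / 2) avoids 1 - p / 2 rounding to 1.0 for tiny p-values.
    z = -_STD_NORMAL.inv_cdf(p / 2.0)
    return z * z


def genomic_inflation(p_values: list[float]) -> float:
    """Hypothetical helper: median observed chi-square / median expected under the null."""
    if not p_values:
        raise ValueError("need at least one p-value")
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a null GWAS, so lambda should be very close to 1.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(f"lambda (null-like p-values) = {genomic_inflation(null_p):.3f}")

# Uniformly small p-values mimic uncorrected stratification and inflate lambda.
print(f"lambda (inflated p-values)  = {genomic_inflation([1e-3] * 10):.1f}")
```

Working from p-values rather than raw test statistics keeps the sketch independent of the association model; mixed-model software usually reports λ directly.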

Fig 1

Results of genome-wide association studies (GWAS) on Bernese mountain dogs (BMD) with 172 histiocytic sarcoma (HS) cases and 128 controls (GWAS_1_HS_BMD).

A) Quantile-quantile plot displaying a genomic inflation λ of 1.000005, indicating no residual inflation. B) Manhattan plot displaying the statistical results from the GWAS. This analysis pointed out two loci (arrows) on chromosome 11 (CFA11:41161441, pcorrected = 3.11 × 10−7) and on chromosome 20 (CFA20:30922308, pcorrected = 3.73 × 10−5).


2.2. Involvement of HS loci in other hematopoietic cancers

To test whether these HS-predisposing loci could also be involved in predisposition to lymphomas or mast cell tumors, we added lymphoma or mast cell tumor cases to the previous HS GWAS.

2.2.a. Lymphoma

We performed a second GWAS (GWAS_2_HS+lymphoma_BMD) by adding 80 lymphoma-affected BMDs to the first GWAS (GWAS_1_HS_BMD). We identified six significantly associated SNVs, one of which was located on CFA5 (best SNV CFA5:30,496,048, pcorrected = 5.88 × 10−6; Figs 2 and S2). This result confirmed that the CFA5 locus is common to the predisposition to both HS and lymphoma in BMD. SNVs in LD (R2 > 0.6) with the top CFA5 SNV delimited a locus from 28.3 Mb to 34.4 Mb (Fig 2). The top SNV of the CFA5 locus was located in an intron of the sphingolipid transporter 3 (SPNS3) gene, whose paralog SPNS2 is known to be important in immunological development and in inflammatory and autoimmune diseases [37]. Since these SNVs are located in one of the two lymphoma-predisposing peaks (29.8 Mb and 33 Mb) found by Tonomura et al. in golden retrievers, we performed a meta-analysis combining the BMD GWAS data from this study with the publicly available golden retriever GWAS data [16], resulting in a final data set of 93,100 SNVs. This GWAS (GWAS_3_HS+lymphoma_BMD+golden_retriever), obtained by adding golden retriever lymphoma cases (n = 41) and controls (n = 172) to the second BMD GWAS (GWAS_2_HS+lymphoma_BMD), contained 293 HS or lymphoma cases and 300 controls. We observed an increased signal at the CFA5 locus, strengthening the two loci previously identified by Tonomura et al. at 29.8 Mb and 33 Mb (SNV CFA5:29,836,124, pcorrected = 1.86 × 10−6; SNV CFA5:32,824,053, pcorrected = 2.2 × 10−7; Figs 2 and S2). These results show that BMDs and golden retrievers share common risk loci on CFA5 that are involved in hematopoietic cancers. Further, the CFA5 locus for both HS and lymphomas in BMD overlaps the two independent loci associated with lymphoma and hemangiosarcoma risk in golden retrievers (29.8 Mb and 33 Mb, respectively; Fig 2).
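Loci such as the CFA5 region are delimited by the SNVs in LD (R2 > 0.6) with the top SNV, with R2 computed in cases. The minimal Python sketch below illustrates that delimitation step under stated assumptions: R2 is approximated as the squared Pearson correlation of unphased allele dosages (0, 1, or 2 copies), which is one common LD estimator but not necessarily the one used in this study, and the function names, positions, and toy genotypes are hypothetical.

```python
from typing import Mapping, Sequence


def dosage_r2(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation between two allele-dosage vectors (0, 1 or 2 copies)."""
    if len(x) != len(y) or not x:
        raise ValueError("dosage vectors must be non-empty and of equal length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNV: r2 is undefined, treated here as "not in LD"
    return sxy * sxy / (sxx * syy)


def ld_delimited_locus(top_pos: int,
                       dosages: Mapping[int, Sequence[int]],
                       threshold: float = 0.6) -> tuple[int, int]:
    """(start, end) spanned by the top SNV and every SNV with r2 > threshold to it."""
    top = dosages[top_pos]
    in_ld = [pos for pos, g in dosages.items()
             if pos == top_pos or dosage_r2(top, g) > threshold]
    return min(in_ld), max(in_ld)


# Toy dosages for 8 cases at four hypothetical positions (not study data).
cases = {
    28_300_000: [0, 1, 2, 2, 1, 0, 2, 1],  # identical to the top SNV, r2 = 1
    30_500_000: [0, 1, 2, 2, 1, 0, 2, 1],  # top SNV
    34_300_000: [0, 1, 2, 2, 1, 0, 2, 2],  # r2 of about 0.84
    37_000_000: [1, 1, 0, 2, 0, 2, 1, 1],  # r2 of about 0.05
}
print(ld_delimited_locus(30_500_000, cases))  # (28300000, 34300000)
```

Passing only case genotypes mirrors the "R2 in cases" criterion; passing all dogs would give a population-wide LD estimate instead.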

Fig 2

Close-up view of the CFA5 locus.

A) Manhattan plot of the CFA5 20–40 Mb region highlighting the best p-values obtained in the three genome-wide association studies (GWAS): Bernese mountain dog (BMD) GWAS for histiocytic sarcoma (HS) with 172 cases vs. 128 controls (GWAS_1_HS_BMD); BMD GWAS for HS and lymphoma with 252 cases vs. 128 controls (GWAS_2_HS+lymphoma_BMD); meta-analysis combining the BMD GWAS of HS and lymphoma (252 cases vs. 128 controls) and the golden retriever GWAS for lymphoma (41 cases vs. 172 controls) from Tonomura et al. [16] (GWAS_3_HS+lymphoma_BMD+golden_retriever). The R2 in cases relative to the top single nucleotide variation (SNV) is depicted to show the linkage disequilibrium (LD) structure. B) Regions delimited by the SNVs in LD with the best GWAS SNVs (R2 > 0.6) in cases; the minimal region shared by the three GWAS (CFA5:28309815–34321500) is delimited by red lines. C) Close-up view of the genes (with available symbols) located in this minimal region (28–34 Mb) of CFA5.


2.2.b. Mast cell tumor

By combining the first HS BMD GWAS (GWAS_1_HS_BMD) with 44 BMDs with mast cell tumors (GWAS_4_HS+MCT_BMD), we identified six significantly associated SNVs (best SNV CFA11:41,161,441, pcorrected = 6.93 × 10−7), one of which was located on CFA20 (CFA20:30,922,308, pcorrected = 1.53 × 10−5; Figs 3 and S3). This result confirmed that the CFA20 locus is common to HS and mast cell tumor (MCT) predisposition in BMD. SNVs in LD (R2 > 0.6) with the top CFA20 SNV delimited a locus from 29.3 Mb to 32.8 Mb (Fig 3). The top SNV of the CFA20 locus lies in an intron of fragile histidine triad diadenosine triphosphatase (FHIT), a tumor suppressor involved in apoptosis and in the prevention of epithelial-mesenchymal transition [38]. Interestingly, this locus overlapped one of the three independent MCT-predisposing peaks (33 Mb, 39 Mb, and 45 Mb; canFam3) identified in golden retrievers [15]. We then performed a meta-analysis combining our BMD GWAS for HS and MCT (GWAS_4_HS+MCT_BMD) with publicly available golden retriever MCT GWAS data [15], yielding a final data set of 88,202 SNVs. The addition of European golden retriever MCT cases and controls increased the association signal at the CFA20 locus and clearly pointed to the CFA20 locus at 33 Mb (best SNV CFA20:33,321,282, pcorrected = 4.79 × 10−7), located in rho guanine nucleotide exchange factor 3 (ARHGEF3) and close to interleukin 17 receptor D (IL17RD), which corresponds to one of the three peaks identified by Arendt et al. These results show that the BMD and golden retriever breeds share common inherited risk factors on CFA20 for HS and MCT (Figs 3 and S3). These results also confirmed that the association of CFA20 with MCT in golden retrievers is due to the additive effect of at least three risk haplotypes (33 Mb, 39 Mb, and 45 Mb).

Fig 3

Close-up view of the CFA20 locus.

A) Manhattan plot of the CFA20 20–45 Mb region highlighting the best p-values obtained in the three genome-wide association studies (GWAS): Bernese mountain dog (BMD) GWAS for histiocytic sarcoma (HS) with 172 cases vs. 128 controls (GWAS_1_HS_BMD); BMD GWAS for HS and mast cell tumor with 216 cases vs. 128 controls (GWAS_4_HS+MCT_BMD); meta-analysis combining the BMD GWAS for HS and mast cell tumor with the European golden retriever GWAS for mast cell tumor (69 cases vs. 74 controls) from Arendt et al. [15] (GWAS_5_HS+MCT_BMD+golden_retriever). The R2 in cases relative to the top single nucleotide variation (SNV) is depicted to show the linkage disequilibrium (LD) structure. B) Regions delimited by SNVs in LD with the best GWAS SNVs (R2 > 0.6) in cases; the minimal region shared by the three GWAS (CFA20:31036863–32778949) is delimited by red lines. C) Close-up view of the genes (with available symbols) located in this minimal region (31–33 Mb) of CFA20.


2.3. Refining HS loci by multiple-breed analyses and imputation on a higher-density SNV array

To increase the power of the GWAS and refine the HS loci, we added HS cases and controls from the FCR and Rottweiler breeds genotyped on the Illumina 173K SNV Canine HD array. To increase marker density, 134 dogs (113 BMDs and 21 FCRs) were re-genotyped on the higher-density Affymetrix Axiom Canine Genotyping array (1.1M SNVs) and used as a reference panel to impute these SNVs into the Illumina 173K SNV Canine HD data. We also added data from previously published BMD cases and controls [26] by imputing their genotypes from the Canine SNP20 BeadChip panel (Illumina, ~22K SNVs) to the higher-density Affymetrix Axiom Canine Genotyping array (1.1M SNVs). Imputation quality was evaluated by masking the SNVs to be imputed in half of the BMDs genotyped on the high-density Affymetrix Axiom Canine Genotyping array and comparing the imputed genotypes with the masked ones. The mean concordances between the masked autosomal SNVs and the imputed SNVs were 91.97% for imputation from the Illumina ~22K SNV panel and 95.86% for imputation from the Illumina 173K SNV Canine HD panel to the Affymetrix Axiom Canine Genotyping array. These concordances are similar to those reported by Friedenberg and Meurs, who described a genotype concordance of up to 92.4% with Beagle software [39]. Adding these BMD cases and controls to the first BMD GWAS (GWAS_1_HS_BMD) resulted in a total of 403 cases and 347 controls, with a final imputed data set of 488,872 SNVs (GWAS_6_HS_BMD_with_imputed_SNV). Statistical analysis identified 1,730 SNVs significantly associated with HS (Table 2, Fig 4A and 4B). With more BMD cases and controls, this GWAS confirmed the involvement of the CFA11 locus, highlighted the role of other loci (CFA5 and CFA14), and identified a new locus on chromosome 2 involved in HS predisposition in BMD.
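The imputation check described above compares masked true genotypes with their imputed values. As a minimal sketch, assuming genotypes coded as allele dosages (0, 1, 2, or None for a missing call) and concordance averaged per SNV (whether the study averaged per SNV or pooled all genotypes is an assumption), the Python below computes such a mean concordance. It does not perform the imputation itself, and the function names and toy data are hypothetical.

```python
from typing import Mapping, Optional, Sequence

Genotype = Optional[int]  # allele dosage 0/1/2, or None for a missing call


def snv_concordance(true_calls: Sequence[Genotype],
                    imputed_calls: Sequence[Genotype]) -> float:
    """Fraction of masked genotypes (non-missing in the truth set) that imputation reproduced."""
    if len(true_calls) != len(imputed_calls):
        raise ValueError("true and imputed call vectors must have the same length")
    pairs = [(t, i) for t, i in zip(true_calls, imputed_calls) if t is not None]
    if not pairs:
        raise ValueError("no non-missing true genotypes to compare")
    return sum(t == i for t, i in pairs) / len(pairs)


def mean_concordance(
        masked: Mapping[str, tuple[Sequence[Genotype], Sequence[Genotype]]]) -> float:
    """Mean of per-SNV concordances over all masked SNVs (e.g., autosomal SNVs only)."""
    if not masked:
        raise ValueError("no masked SNVs supplied")
    values = [snv_concordance(t, i) for t, i in masked.values()]
    return sum(values) / len(values)


# Toy example: two masked SNVs in five dogs (true calls, imputed calls); not study data.
masked = {
    "snv_a": ([0, 1, 2, 1, 0], [0, 1, 2, 1, 0]),     # 5/5 concordant
    "snv_b": ([2, 2, 1, None, 0], [2, 1, 1, 1, 0]),  # 3/4 concordant (missing call skipped)
}
print(mean_concordance(masked))  # 0.875
```

Skipping dogs with a missing true call avoids counting them as imputation errors, which would otherwise bias concordance downward.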

Table 2

Significant loci identified by genome-wide association studies (GWAS) after imputation on high-density single nucleotide variation (SNV) array.

The number of associated SNVs, the best SNV, and its corrected p-value are presented for each locus and each GWAS.

Chromosome	Metric	GWAS on BMD	GWAS on BMD+FCR	GWAS on BMD+FCR+Rottweiler
CFA2	Number of associated SNVs	1	4	7
	Position of the best SNV (bp)	29716535	29716535	29716535
	p_corrected of the best SNV	3.24 × 10⁻⁴	6.02 × 10⁻⁵	3.58 × 10⁻⁵
	Region delimited by significantly associated SNVs (bp)	29716535	29653137–29978776	29507029–34223001
	Region delimited by significantly associated SNVs and SNVs in LD in cases (R2 ≥ 0.6) (bp)	29154373–30121047	29154373–30121047	29385904–35075340
CFA5	Number of associated SNVs	292	320	322
	Position of the best SNV (bp)	30496048	33823740	33823740
	p_corrected of the best SNV	8.22 × 10⁻⁶	2.52 × 10⁻⁶	2.4 × 10⁻⁶
	Region delimited by significantly associated SNVs (bp)	25628485–34513401	25522718–34477045	25566642–34477045
	Region delimited by significantly associated SNVs and SNVs in LD in cases (R2 ≥ 0.6) (bp)	25402068–37781406	25402068–34513401	25517580–34513401
CFA11	Number of associated SNVs	1145	984	930
	Position of the best SNV (bp)	41215628	41252822	41252822
	p_corrected of the best SNV	1.46 × 10⁻¹³	1.49 × 10⁻¹³	2.02 × 10⁻¹⁴
	Region delimited by significantly associated SNVs (bp)	29934486–52418087	29978631–52418087	29978631–52418087
	Region delimited by significantly associated SNVs and SNVs in LD in cases (R2 ≥ 0.6) (bp)	29047449–52471659	29315057–52471659	29341449–52418087
CFA14	Number of associated SNVs	292	250	246
	Position of best SNV 1 (bp)	6567456	6567456	6566022
	p_corrected of best SNV 1	4.05 × 10⁻⁶	1.37 × 10⁻⁶	1.09 × 10⁻⁶
	Position of best SNV 2 (bp)	10231328	10665001	11021670
	p_corrected of best SNV 2	9.19 × 10⁻⁶	2.78 × 10⁻⁶	1.52 × 10⁻⁶
	Region delimited by significantly associated SNVs (bp)	561549–11111293	561549–11111293	561549–11111293
	Region delimited by significantly associated SNVs and SNVs in LD in cases (R2 ≥ 0.6) (bp)	475090–11638599	475090–11379670	475090–11379670

Fig 4

Genome-wide association studies (GWAS) for histiocytic sarcoma (HS) in Bernese mountain dogs (BMD) and other predisposed breeds, with imputation of single nucleotide variations (SNVs) onto a higher-density SNV array.

A–B) BMD GWAS results based on 403 cases and 347 controls (GWAS_6_HS_BMD_with_imputed_SNV). A) Quantile-quantile plot displaying a genomic inflation λ of 1.000023, indicating no residual inflation. B) Manhattan plot displaying the statistical results from the GWAS. This analysis shows four loci (arrows) on chromosome 2 (best SNV at CFA2:29716535, pcorrected = 3.25 × 10−4), chromosome 5 (best SNV at CFA5:30496048, pcorrected = 8.22 × 10−6), chromosome 11 (best SNV at CFA11:41215628, pcorrected = 1.45 × 10−13), and chromosome 14 (best SNV at CFA14:6567456, pcorrected = 4.04 × 10−6). C–D) GWAS results for HS combining BMDs (403 cases vs. 347 controls) and flat-coated retrievers (FCRs; 13 cases vs. 15 controls; GWAS_7_HS_BMD+FCR_with_imputed_SNV). C) Quantile-quantile plot displaying a genomic inflation λ of 1.000018, indicating no residual inflation. D) Manhattan plot displaying the statistical results from the GWAS. This analysis shows four loci (arrows) on chromosome 2 (best SNV at CFA2:29716535, pcorrected = 6.02 × 10−5), chromosome 5 (best SNV at CFA5:33823740, pcorrected = 2.52 × 10−6), chromosome 11 (best SNV at CFA11:41252822, pcorrected = 1.49 × 10−13), and chromosome 14 (best SNV at CFA14:6567456, pcorrected = 1.37 × 10−6). E–F) GWAS results for HS combining BMDs (403 cases vs. 347 controls), FCRs (13 cases vs. 15 controls), and Rottweilers (37 cases vs. 23 controls; GWAS_8_HS_BMD+FCR+Rottweiler_with_imputed_SNV). E) Quantile-quantile plot displaying a genomic inflation λ of 1.000013, indicating no residual inflation. F) Manhattan plot displaying the statistical results from the GWAS. This analysis shows four loci (arrows) on chromosome 2 (best SNV at CFA2:29716535, pcorrected = 3.58 × 10−5), chromosome 5 (best SNV at CFA5:33823740, pcorrected = 2.4 × 10−6), chromosome 11 (best SNV at CFA11:41252822, pcorrected = 2.04 × 10−14), and chromosome 14 (best SNV at CFA14:6566022, pcorrected = 1.09 × 10−6).
G) Close-up view of the CFA11 locus highlighting the best p-values obtained in the three GWAS: BMD GWAS (GWAS_6_HS_BMD_with_imputed_SNV), BMD plus FCR GWAS (GWAS_7_HS_BMD+FCR_with_imputed_SNV), and BMD plus FCR and Rottweiler GWAS (GWAS_8_HS_BMD+FCR+Rottweiler_with_imputed_SNV). The R2 in cases relative to the top SNV is depicted to show the linkage disequilibrium (LD) structure. H) Regions delimited by SNVs in LD with the best GWAS SNVs (R2 > 0.6) in cases; the minimal region shared by the three GWAS (CFA11:38435917–41701130) is delimited by red lines. I) Close-up view of the genes (with available symbols) located in this minimal region (38–42 Mb).


The addition of 60 Rottweilers (37 cases and 23 controls) to GWAS_7_HS_BMD+FCR_with_imputed_SNV formed a final data set of 532,053 SNVs (GWAS_8_HS_BMD+FCR+Rottweiler_with_imputed_SNV) and led to the identification of 1,505 SNVs significantly associated with HS (Table 2, Fig 4E and 4F). This GWAS confirmed the involvement of the CFA2, CFA5, CFA11, and CFA14 loci in HS predisposition. SNVs in LD with significant SNVs in cases (R2 ≥ 0.6) delimited large regions spanning several Mb (up to 23 Mb for CFA11; Table 2). Analyzing these SNVs across the three breeds allowed us to narrow the CFA11 locus to 38.4–41.4 Mb (Fig 4G). Overall, these analyses identified HS-associated SNVs on at least chromosomes 2, 5, 11, and 14 that were shared among the three predisposed breeds. To determine the proportion of HS risk that could be explained by these loci, we performed a restricted maximum likelihood (REML) analysis using GCTA software [40]. Together, all chromosomes explained at least 61.8% of the phenotypic variance (p-value ≤ 4.93 × 10−29; S1 Table). SNVs of the CFA11 locus explained over 10.3% of the phenotypic variance (p-value ≤ 3.5 × 10−19), SNVs of the CFA14 locus explained a similar proportion, and the CFA5 locus explained only 4.8–6.7%.

2.4. Haplotype analyses of HS loci identified risk haplotypes shared between breeds

To identify risk haplotypes tagged by the best SNVs and shared between HS cases, we determined, in each breed, the haplotype blocks containing the best SNVs (see Materials and Methods). On CFA11, we identified a haplotype block containing the best CFA11 SNV (41,252,822) that was more frequent in HS cases than in controls in the BMD and Rottweiler breeds (0.79 vs. 0.55 and 0.77 vs. 0.54, respectively) and was significantly associated with the risk of developing HS (odds ratio = 3.03, p-value = 7.03 × 10−23 for BMDs; odds ratio = 2.61, p-value = 0.0175 for Rottweilers; Table 3). This haplotype was also frequent in the FCR breed (53.8% and 56.6% in FCR cases and controls, respectively), and 75% of FCRs carried at least one copy of it. However, this haplotype did not appear to be enriched in FCR HS cases (odds ratio = 0.89, p-value = 0.83), although the number of FCRs in the GWAS was low. In the three-breed GWAS (GWAS_8_HS_BMD+FCR+Rottweiler_with_imputed_SNV; Fig 4G), we identified a second independent CFA11 HS locus between 44 Mb and 45 Mb, which had already been identified in previous studies [26]. The best SNV in this region (CFA11:44,150,645) was not in LD with the best CFA11 SNV (CFA11:41,252,822; R2 across the three breeds = 0.41), indicating that at least two independent peaks on CFA11 are involved in HS predisposition. Haplotype analysis of the CFA11:44 Mb region revealed a common risk haplotype that was enriched in cases of the three breeds (0.66 vs. 0.44) and significantly associated with the risk of developing HS (odds ratio = 2.45, p-value = 3.2 × 10−19; Table 3).
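Table 3 reports haplotype odds ratios with confidence intervals computed by the Woolf method, i.e., a normal approximation on the logarithm of the odds ratio. As a minimal sketch, the Python function below computes an odds ratio and its Woolf interval from a 2×2 table of haplotype counts in cases and controls. The counts in the example are hypothetical, chosen only to mimic risk-haplotype frequencies of about 0.79 vs. 0.55, and are not the study's data; the test behind the table's p-values is not specified here, so no p-value is computed.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int,
                     z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio with Woolf (log-based) confidence interval for a 2x2 haplotype table.

    a: copies of the haplotype in cases     b: other haplotypes in cases
    c: copies of the haplotype in controls  d: other haplotypes in controls
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("Woolf's method needs all four cells > 0 "
                         "(adding 0.5 to each cell is a common workaround)")
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


# Hypothetical counts: 806 case and 694 control haplotypes, with risk-haplotype
# frequencies of about 0.79 and 0.55 (illustrative only, not the study's counts).
a, b = 637, 806 - 637
c, d = 382, 694 - 382
or_, lo, hi = odds_ratio_woolf(a, b, c, d)
print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}-{hi:.2f}]")
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself, as seen in the Table 3 intervals.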

Table 3

Association analysis of haplotypes of CFA11, CFA5, and CFA14 loci with the phenotype in predisposed breeds.

Locus	Breed	Haplotype	Frequency in affected	Frequency in unaffected	Odds ratio	CI (Woolf method)	p-value
CFA11:41 Mb	BMD	ATTTAAAAAGCCC{A/G}T	0.79	0.55	3.03	[2.42–3.79]	7.03 × 10⁻²³
		{A/G}GCCCGGGGAAGTGC	0.21	0.45	0.32	[0.26–0.4]	2.56 × 10⁻²³
		others	0.00	0.00	1.73	[0.32–9.47]	0.69
	Rottweiler	ATTTAAAAAGCCCAT	0.76	0.54	2.61	[1.19–5.73]	0.0175
		AGCCCGGGGAAGTGC	0.22	0.41	0.39	[0.17–0.87]	0.024
		others	0.03	0.04	0.61	[0.08–4.49]	0.63
	FCR	GTTTAAAAAGCCCAT	0.54	0.57	0.89	[0.31–2.56]	0.83
		GGCCCGGGGAAGTGT	0.38	0.2	2.5	[0.76–8.25]	0.12
		{A/G}GCCCGGGGAAGTAC	0.08	0.23	0.27	[0.05–1.44]	0.15
CFA11:44 Mb	BMD	ATGATAGGAACGGCAACT	0.65	0.44	2.41	[1.96–2.97]	7.27 × 10⁻¹⁷
		TCAGTTATTGTAATCGTC	0.12	0.25	0.41	[0.31–0.54]	4.91 × 10⁻¹¹
		TCAGTTATTGCGGTCGTC	0.09	0.12	0.68	[0.49–0.95]	0.026
		TCAGCAATTGCGGTCGTC	0.09	0.11	0.81	[0.58–1.13]	0.23
		others	0.05	0.08	0.62	[0.4–0.95]	0.03
	Rottweiler	ATGATAGGAACGGCAG{T/C}C	0.82	0.57	3.61	[1.57–8.33]	0.003
		ATGACTATTGTAATCGTC	0.07	0.17	0.34	[0.1–1.11]	0.078
		ATGATAATAGCGGCAACT	0.05	0.15	0.32	[0.09–1.16]	0.1
		others	0.05	0.11	0.47	[0.12–1.85]	0.30
	FCR	ATGATAGGAACGGCAA{T/C}T	0.35	0.17	2.65	[0.76–9.29]	0.137
		TCAGCAGGAACGGCAATC	0.54	0.70	0.5	[0.17–1.5]	0.21
		others	0.12	0.13	0.85	[0.17–4.2]	1
	All breeds	ATGATAGGAACGGCA---	0.66	0.44	2.45	[2.01–2.99]	3.2 × 10⁻¹⁹
		others	0.34	0.56	0.41	[0.34–0.5]	3.2 × 10⁻¹⁹
CFA5:33 Mb	BMD	CTTTTCACACAAGTGTCCCGGTAGATT	0.75	0.59	2.05	[1.65–2.55]	1.01 × 10⁻¹⁰
		ACACCCGGGTTGACAGATTAACGACCT	0.15	0.25	0.53	[0.41–0.69]	1.87 × 10⁻⁶
		ATTTCTGGGTTGACAGATTAACGACCT	0.08	0.13	0.59	[0.42–0.82]	0.0016
		others	0.02	0.03	0.7	[0.37–1.32]	0.69
	Rottweiler	CCACCCACACAAGTGTCCCGGTA{A/G}ATT	0.69	0.63	1.3	[0.6–2.82]	0.55
		CCACCCACACAAGTGTCCCGATAGATT	0.12	0.17	0.66	[0.23–1.85]	0.66
		others	0.19	0.20	0.96	[0.38–2.44]	0.96
	FCR	ACTTTCACACAAGTGTCCCGGTAGA{T/C}C	0.69	0.33	4.5	[1.46–13.89]	0.015
		CCTTCTACACAAGTAGACCGGTAACCT	0.04	0.23	0.13	[0.01–1.14]	0.056
		CTTTCTGGGTTGACGTATTGACAGATT	0.12	0.20	0.52	[0.12–2.33]	0.48
		others	0.15	0.23	0.6	[0.15–2.34]	0.51
	All breeds	------CACACAAGTGTCCCGGTA-------	0.75	0.59	2.05	[1.67–2.52]	6.58 × 10⁻¹²
		others	0.25	0.41	0.49	[0.4–0.6]	6.58 × 10⁻¹²
CFA14:11 Mb	BMD	ATAGGAACCCGCGT	0.74	0.61	1.79	[1.44–2.23]	1.86 × 10⁻⁷
		ACCAACGCACGCAT	0.17	0.19	0.85	[0.65–1.1]	0.25
		GCCAACGTATAAGT	0.05	0.15	0.34	[0.23–0.49]	2.65 × 10⁻⁹
		ATAGGAACCCGCAT	0.04	0.04	0.8	[0.48–1.34]	0.43
		others	0.01	0.01	0.64	[0.22–1.85]	0.43
	Rottweiler	ATAGGAACCCGCGT	0.57	0.43	1.77	[0.85–3.69]	0.14
		ATAGGAACCCGCAT	0.35	0.41	0.77	[0.36–1.64]	0.56
		GCCAACGTATAAGT	0.07	0.11	0.59	[0.16–2.16]	0.59
		others	0.01	0.04	0.2	[0.02–1.98]	0.15
	FCR	ACCAACGCACGCAC	0.46	0.20	3.43	[1.05–11.17]	0.036
		GCCAACGTATAAGT	0.23	0.53	0.26	[0.08–0.83]	0.02
		ATAGGAACCCGCAC	0.23	0.23	0.99	[0.29–3.44]	0.98
		others	0.08	0.03	2.42	[0.21–28.34]	0.59
	All breeds	GCCAACGTATAAGT	0.06	0.16	0.34	[0.24–0.47]	9.08 × 10⁻¹¹
		others	0.94	0.84	2.92	[2.09–4.08]	9.08 × 10⁻¹¹

Association analysis of haplotypes of CFA11, CFA5, and CFA14 loci with the phenotype in predisposed breeds.

The CFA11:41 Mb haplotype was determined by the genotypes of the following SNVs: 41144469, 41161357, 41161441*, 41163558, 41166847, 41176819*, 41185205, 41192676, 41196587*, 41200012, 41204074, 41215628, 41217026*, 41218376, and 41252822. The CFA11:44 Mb haplotype was determined by the genotypes of the following SNVs: 43263273, 43272660, 43290639, 43317981, 43690075*, 44070751, 44097553*, 44098601*, 44099662, 44121253, 44126921, 44129025, 44135605, 44150645*, 44152497, 44364615, 44366678, and 44367106*. The CFA5:33 Mb haplotype was determined by the genotypes of the following SNVs: 33650581, 33788841*, 33801240*, 33807500*, 33823740*, 33839057, 33872506*, 33879286, 33883533, 33883991, 33897576, 33900059, 33906211, 33916864, 34213158, 34217528, 34220300, 34225552, 34225906, 34227643, 34231901*, 34232173, 34234461, 34239783*, 34240989, 34246822, and 34321500. The CFA14:11 Mb haplotype was determined by the genotypes of the following SNVs: 11021670, 11023871, 11026224, 11028349, 11030215, 11033259, 11041572, 11067136, 11070626, 11077346, 11083217, 11092391, 11094795*, and 11108670. At-risk haplotypes in each breed are represented in bold. CI: confidence interval (Woolf Method). *SNVs from the Illumina 173K SNV Canine HD array.

For CFA5, the GWAS analysis indicated that a large locus (25–35 Mb) overlapped the two CFA5 lymphoma loci previously identified by Tonomura et al. (29 Mb and 33 Mb) [16]. Similarly, haplotype analysis of the best SNVs indicated that the common risk haplotype was delimited by 18 SNVs (33,839,057–34,234,461) in the three breeds. This haplotype was significantly enriched in BMD and FCR cases and was common in Rottweilers (69% and 63% in cases and controls, respectively; Table 3). Across the three breeds, this risk haplotype was associated with a significantly increased risk of developing HS (odds ratio = 2.05, p-value = 6.58 × 10−12).
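Tables 3 and 4 report odds ratios with confidence intervals computed by the Woolf method, i.e., a normal approximation on the log odds ratio. The sketch below assumes a 2×2 table of haplotype counts in cases and controls and z = 1.96 for a 95% interval (the coverage level is not stated in the tables); the function name and the counts in the example are hypothetical. Because the formula needs every cell to be non-zero, a table with an empty cell has no Woolf interval unless a continuity correction is applied first.

```python
import math


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a: risk-haplotype copies in cases      b: other haplotypes in cases
    c: risk-haplotype copies in controls   d: other haplotypes in controls
    """
    if min(a, b, c, d) == 0:
        raise ValueError("Woolf's method needs non-zero cells (apply a continuity correction first)")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper
```

For example, hypothetical counts of 20 risk versus 10 other haplotypes in cases and 10 versus 20 in controls give an odds ratio of 4.0 with an interval of roughly 1.37 to 11.7; the interval is symmetric on the log scale, not on the odds-ratio scale.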
For CFA14, the third main HS risk locus, GWAS analysis revealed a large region of significant SNVs spanning 0.4 to 11.3 Mb. The associated SNVs in this region clustered in at least two peaks located 4.4 Mb apart (Table 2). In the GWAS including the three predisposed breeds (GWAS_8_HS_BMD+FCR+Rottweiler_with_imputed_SNV), the top SNVs of these two peaks were located at 6,566,022 and 11,021,670 (p = 1.09 × 10−6 and 1.52 × 10−6, respectively). The r² between these two SNVs across the three breeds was 0.11, suggesting that the two peaks were independent. When the best CFA11 SNV (CFA11:41,252,822) was included as a covariate, the two best CFA14 SNVs were located at 10,665,001 (p = 1.14 × 10−7) and 11,021,670 (p = 1.16 × 10−7). Haplotype analysis of the region showed that the same risk haplotype was enriched in BMD and Rottweiler cases but not in FCR cases. Surprisingly, another haplotype, 5′-GCCAACGTATAAGT-3′, was enriched in the controls of all three breeds and was significantly associated with a decreased risk of developing HS in these predisposed breeds (odds ratio = 0.34, p-value = 9.08 × 10−11). These results suggest that the CFA14 locus contains a protective allele shared by the three predisposed breeds (Table 3). Overall, imputation to a high SNV density allowed the identification of risk loci shared by the three HS-predisposed breeds, with common risk or protective haplotypes on the major loci on CFA11, CFA14, and CFA5.
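The independence argument above rests on a low r² (0.11) between the two peak SNVs. This section does not say how r² was computed; one common approximation, sketched below under that assumption, is the squared Pearson correlation of unphased 0/1/2 allele dosages across dogs. A haplotype-based r² from phased data can differ somewhat. The function name is hypothetical.

```python
from typing import Optional, Sequence


def dosage_r2(x: Sequence[Optional[int]], y: Sequence[Optional[int]]) -> float:
    """Squared Pearson correlation between two SNVs coded as 0/1/2 allele dosages.

    Dogs with a missing call (None) at either SNV are skipped.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two dogs genotyped at both SNVs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return float("nan")  # a monomorphic SNV has no defined r²
    return sxy * sxy / (sxx * syy)
```

Values near 1 indicate SNVs that tag the same signal; values near 0, as reported for the two CFA14 peaks, are consistent with separate signals.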

2.5. HS risk results from cumulative risk haplotypes

In all three breeds, cases carried risk alleles at these three loci, especially BMDs and Rottweilers, in which most cases carried at least five risk alleles (Table 4). In this cohort, no FCR control carried more than three risk alleles. Across the three breeds, the cumulative number of risk alleles at the three main loci (CFA11, CFA5, and CFA14) strongly affected the probability of developing HS: carrying five or six of the six possible risk alleles was associated with HS with an odds ratio of 5.27 (p-value = 1.52 × 10−30). Stepwise model selection was performed on the 1,505 significant SNVs identified by GWAS_8_HS_BMD+FCR+Rottweiler_with_imputed_SNV to build generalized risk models for HS (S2 Table). Both the stepwise forward selection and the stepwise forward-and-backward selection retained markers on CFA2, CFA5, CFA11, and CFA14, with several markers on the same chromosome spaced more than 4 Mb apart. That retained markers on a single chromosome are separated by several megabases indicates a cumulative effect of several independent risk haplotypes.
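The risk score behind Table 4 is a count of risk-allele copies at three SNVs (CFA5:30496048, CFA11:41252822, and CFA14:11021670), giving 0 to 6 per dog. A minimal sketch follows. Which allele is the risk allele at each SNV is not stated in this section, so the alleles in the usage example are placeholders, and the binning shown is the one Table 4 uses for BMDs and Rottweilers (FCRs are binned as 4, 3, and ≤2). Function names are hypothetical.

```python
from typing import Dict, Tuple

Genotype = Tuple[str, str]  # the two alleles a dog carries at one SNV


def risk_allele_count(genotypes: Dict[str, Genotype], risk_alleles: Dict[str, str]) -> int:
    """Count risk-allele copies across the scored SNVs (0-6 for three loci)."""
    return sum(genotypes[snv].count(allele) for snv, allele in risk_alleles.items())


def table4_category(count: int) -> str:
    """Bin a count as Table 4 does for BMDs and Rottweilers."""
    if count >= 5:
        return ">=5"
    if count == 4:
        return "4"
    return "<=3"


if __name__ == "__main__":
    # Placeholder risk alleles: the real ones are not given in this section.
    risk = {"CFA5:30496048": "A", "CFA11:41252822": "G", "CFA14:11021670": "T"}
    dog = {"CFA5:30496048": ("A", "A"), "CFA11:41252822": ("G", "C"), "CFA14:11021670": ("T", "T")}
    n = risk_allele_count(dog, risk)
    print(n, table4_category(n))  # 5 >=5
```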

Table 4

Association of CFA5, CFA11, and CFA14 with the phenotype in predisposed breeds.

The number of risk alleles was determined with the genotype of the following SNVs: CFA5:30496048, CFA11:41252822, and CFA14:11021670. CI: confidence interval (Woolf Method).

Breed	Number of risk alleles	Frequency in affected	Frequency in unaffected	Odds ratio	CI (Woolf method)	p-value
BMD	≥5 risk alleles	0,72	0,31	5.67	[4.14–7.76]	1.44x10-29
	4 risk alleles	0,23	0,4	0.46	[0.34–0.63]	1.779x10-6
	≤3 risk alleles	0,05	0,29	0.12	[0.07–0.2]	1.34x10-20
Rottweiler	≥5 risk alleles	0,74	0,43	3.51	[1.17–10.53]	0.03
	4 risk alleles	0,18	0,26	0.66	[0.19–2.29]	0.53
	≤3 risk alleles	0,08	0,3	0.2	[0.05–0.88]	0.034
FCR	4 risk alleles	0,38	0	NA	NA	0.013
	3 risk alleles	0,54	0,53	1.02	[0.23–4.52]	1
	≤2 risk alleles	0,08	0,47	0.1	[0.01–0.98]	0.037
all breeds	≥5 risk alleles	0,70	0,31	5.27	[3.92–7.08]	1.52X10-30
	4 risk alleles	0,23	0,37	0.51	[0.38–0.69]	1.45x10-5
	≤3 risk alleles	0,07	0,32	0.15	[0.1–0.23]	8.1X10-22


2.6. Capture and targeted sequencing of the three best HS candidate loci (CFA5, CFA11, and CFA14)

Because common haplotypes were detected in the three predisposed breeds at the three main loci (CFA5, CFA11, and CFA14), we sequenced these three regions to identify putative common variants. DNA samples from 16 dogs (10 BMDs, 4 Rottweilers, and 2 FCRs), chosen to give a balanced distribution of risk and protective haplotypes, were selected for targeted sequencing. On average, 9,458 SNVs and 2,674 indels were identified per sample, with a mean depth of 142×. These variants (SNVs and indels) were then imputed into the remaining dog samples (453 HS cases and 385 controls from the BMD, Rottweiler, and FCR breeds). Statistical analysis of the imputed genotypes identified 2,608 significant variants, 1,886 of which were on CFA11. For CFA11, no coding variant was significantly associated with HS risk, and four of the top ten variants associated with HS predisposition were imputed genotypes located within 100 kb (S3 Table). The best-associated variant remained CFA11:41,252,822, which had already been identified in the previous GWAS (see section 2.3). CFA11:41,252,822 lies in a non-coding transcript (CFRNASEQ_UC_00018829) that overlaps CDKN2A and the CDKN2B antisense RNA (CDKN2B-AS1). Interestingly, RNA-seq data from Hoeppner et al. [41] showed that CFRNASEQ_UC_00018829 was more highly expressed in blood than in eight other tissues. Moreover, the human region orthologous to this SNV (chr9:21,996,622, hg38), close to a CpG island (chr9:21,994,103–21,995,911, hg38) and a DNase I hypersensitivity peak cluster (chr9:21,994,641–21,996,130, hg38), overlaps an enhancer (GH09J021996, chr9:21,996,543–21,996,791, hg38) that regulates CDKN2A and cyclin-dependent kinase 4 inhibitor B (CDKN2B). The fourth-ranked variant was an indel located in a non-coding transcript (CFRNASEQ_IGNC_00021613), 6,500 bp upstream of the CDKN2A transcript (S3 Table).
Because CDKN2A expression was previously shown to correlate with the CFA11:41 Mb risk haplotype [26], we strongly suspected that these non-coding variants could deregulate CDKN2A expression. To identify the best candidate variants in the secondary loci, a complementary association analysis was performed that included the genotypes of the best variant (CFA11:41,252,822) as a covariate (S3 Table). We identified 2,413 variants with significant residual associations in the three breeds (21 on CFA2, 1,533 on CFA5, and 859 on CFA14). Residual association was found for the CFA11:44–45 Mb locus, with the best SNV at CFA11:45,941,548 (p-value = 0.008), close to C9orf72 and not in LD with CFA11:41,252,822 (r² across the three breeds = 0.0899). This result confirmed the existence of two independent risk loci on CFA11 between 41 and 45 Mb. For the CFA14 locus, seven of the top ten variants were imputed and were located between positions 10,665,001 and 11,042,153, a region overlapping POT1 antisense RNA 1 (POT1-AS1) and protection of telomeres 1 (POT1). At the CFA5 locus, nine of the top ten HS-associated variants were imputed; they were located between positions 30,483,338 and 30,496,048, overlapping the SPNS3 gene (S3 Table and S4 Fig). Interestingly, the second (CFA5:30,489,203) and third (CFA5:30,489,217) variants were located in a region containing DNA methylation marks, one of which included the CFA5:30,489,217 variant. Moreover, the nearby CFA5:30,489,203 variant created a CpG site. We therefore hypothesized that these two neighboring SNVs could be associated with allele-specific methylation in histiocytic cells.
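Whether a single-base variant creates a CpG site can be checked directly from its sequence context. The sketch below assumes a hypothetical flanking sequence on one strand (a real check would use the canFam3 sequence around each variant); because CG is its own reverse complement, checking one strand suffices. The function name is hypothetical and handles SNVs only, not indels.

```python
def creates_cpg(seq: str, pos: int, alt: str) -> bool:
    """True if placing `alt` at 0-based `pos` forms a CpG (C immediately
    followed by G) where the reference base had none touching that position."""

    def cpg_touching(s: str, i: int) -> bool:
        # A CpG involving position i is either s[i-1:i+1] or s[i:i+2].
        return (i > 0 and s[i - 1:i + 1] == "CG") or s[i:i + 2] == "CG"

    ref = seq.upper()
    mutated = ref[:pos] + alt.upper() + ref[pos + 1:]
    return cpg_touching(mutated, pos) and not cpg_touching(ref, pos)
```

For instance, in a hypothetical context "ATCAT", a G allele at index 3 turns "CA" into "CG" and the function returns True, which is the kind of allele-dependent CpG described for CFA5:30,489,203.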
Bisulfite sequencing of HS cell lines confirmed that specific alleles of these two variants (CFA5:30,489,203-G and CFA5:30,489,217-C) create CpG sites that are methylated in histiocytic cells (S3 Fig). The same was observed for the best HS GWAS SNV on CFA2, with the CFA2:29,716,535-G allele (S5 Fig). We hypothesized that these allele-specific methylation sites could alter the regulation of neighboring genes.
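Allele-specific methylation at a CpG created by a G allele can be read from bisulfite reads because bisulfite converts unmethylated C to T but leaves G and A unchanged, so the SNV allele still tags each read. The sketch below assumes reads from the original top strand, each summarized as (base at the CpG cytosine, base at the adjacent G/A SNV); the input is synthetic and the function name hypothetical. A C/T polymorphism at the cytosine itself cannot be resolved this way on that strand, because a genomic T and a converted C look identical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def allele_specific_methylation(reads: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Per-allele methylation fraction of the cytosine preceding a G/A SNV.

    Each read is (base at the cytosine, base at the SNV). A retained C means the
    cytosine was methylated; a T means it was unmethylated and converted.
    """
    counts = defaultdict(lambda: [0, 0])  # SNV allele -> [methylated C, converted T]
    for cyt, allele in reads:
        cyt, allele = cyt.upper(), allele.upper()
        if cyt == "C":
            counts[allele][0] += 1
        elif cyt == "T":
            counts[allele][1] += 1
        # other bases (likely sequencing errors) are ignored
    return {a: c / (c + t) for a, (c, t) in counts.items() if c + t > 0}
```

With 8 methylated and 2 converted reads on the G allele and 5 converted reads on the A allele, the sketch returns 0.8 for G and 0.0 for A, the pattern expected if only the CpG-forming allele is methylated.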

2.7. Validation of major loci on independent cohorts

Because the risk haplotypes shared by the three predisposed breeds were detected using imputed data, we genotyped variants of the three major loci in an independent BMD cohort (186 cases and 176 controls) to validate the major role of these loci in HS (S4 Table). This analysis confirmed that the risk alleles of the top variants of CFA5, CFA11, and CFA14 (see section 2.4) significantly increased the risk of developing HS (odds ratios = 2.56–3.94; p-values from 6.93 × 10−16 to 2.4 × 10−6). Moreover, in this cohort, carrying five or six of the six risk alleles was strongly associated with HS, with an odds ratio of 9.71 (p-value = 3.44 × 10−23). Because a majority of BMDs die of cancer, mostly HS, lymphoma, or mast cell tumors [26,30,35], we suspected that carrying these risk alleles would affect BMD lifespan. We therefore analyzed the risk alleles in an independent cohort of 317 dogs (age <10 years and without a pathological diagnosis of HS; see Materials and methods) and correlated longevity with the number of risk alleles. This analysis confirmed that, independently of known clinical status, carrying five or six risk alleles significantly reduced longevity at the BMD population level (median = 7.5 vs 9.17 years, p-value = 0.0015, log-rank test; S6 Fig).
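The longevity comparison uses a log-rank test. A minimal two-group version is sketched below with toy data; it is not the software used by the authors (which this section does not name), and it does not estimate Kaplan-Meier medians. Each dog is represented as (age in years, whether death was observed), with censored dogs contributing only to the at-risk counts.

```python
import math
from typing import List, Tuple

Subject = Tuple[float, bool]  # (age in years, True if death observed, False if censored)


def log_rank(group_a: List[Subject], group_b: List[Subject]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    event_times = sorted({t for t, dead in group_a + group_b if dead})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for s, _ in group_a if s >= t)  # at risk just before t
        n_b = sum(1 for s, _ in group_b if s >= t)
        d_a = sum(1 for s, dead in group_a if dead and s == t)
        d_b = sum(1 for s, dead in group_b if dead and s == t)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group A
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)  # hypergeometric variance
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return chi2, p
```

Identical groups give a statistic of 0 and a p-value of 1; a group that consistently dies earlier, as reported for carriers of five or six risk alleles, pushes the statistic up and the p-value down.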

3. Discussion

3.1. Dog breeds: unique models to disentangle the genetic features of human cancers

Several genetic studies of canine cancers have shown that high-risk dog breeds can advance the genetics of rare human cancers [6,8,25]. Indeed, a limited number of critical loci have been identified in canine cancers [6,15-17,26], some of which are shared by several canine cancers and are also well known in human cancers. Moreover, somatic alterations identified to date in canine tumors through genome-wide approaches are found in the same genes as in human cancers [25,42,43]. HS affects a few dog breeds (BMDs, Rottweilers, retrievers) at strikingly high frequencies, making these breeds ideal models to study the genetics underlying such a strong HS predisposition. Furthermore, dissecting the genetic factors of rare cancers such as HS is challenging in humans. We therefore proposed that canine models of HS could help decipher the genetic predisposition factors of this rare cancer. We previously identified a relevant locus on canine chromosome 11, which is also well known in canine osteosarcoma and several human cancers (human chr9p21 locus) [17,44]. In this study, we presented a GWAS of a large cohort of several dog breeds affected by HS. The multi-breed approach not only refined the previously identified CFA11 locus but also identified additional loci on CFA2, CFA5, CFA14, and CFA20. Moreover, this study highlighted that, beyond the initial association peak found in a canine cancer GWAS, independent risk haplotypes can be cumulative and shared by several dog breeds and several cancers.

3.2. Genetic predisposition of canine cancers: cumulative risk haplotypes

In humans, the genetic architecture of cancer risk is usually described as a combination of rare variants in families with dominant inheritance patterns and common variants with small effect sizes in the population. Within-breed canine GWAS usually identify fewer variants with stronger effects [18], because breed structure and artificial selection favor variants with strong effects, which is not necessarily the rule for cancer development. Indeed, because of genetic drift in canine breeds, driven by strong selection of sires and dams on morphological criteria, deleterious alleles could be unintentionally selected and enriched in specific populations. Consequently, significant associations with cancer detected in dog GWAS could be due to cumulative risk alleles. Under such conditions, it would be surprising if HS predisposition were due to a single risk haplotype, especially since at least 33 loci have been identified by previous GWAS of canine osteosarcoma predisposition [17]. Our results confirmed the main role of the CFA11 locus, along with the cumulative effects of at least two different risk haplotypes, as suggested in our previous study [26]. Here, a multi-breed approach allowed us to refine the main risk haplotype on CFA11 to a region of ~74 kb shared by BMDs and Rottweilers. Moreover, this study identified a strong candidate variant overlapping CDKN2B-AS1, a regulator of CDKN2A. Additional GWAS peaks were also identified on CFA2, CFA5, CFA11, CFA14, and CFA20. Some of these loci were shared by the three predisposed breeds, mostly by Rottweilers and BMDs, as expected given the close phylogenetic relationship between these two breeds [45]. Because the FCR population in France is small, few FCR samples were included in the study, and further GWAS will be needed to better decipher the predispositions shared between FCRs and other breeds.
Nevertheless, in these three predisposed breeds, the cumulative number of risk alleles at the three main loci (CFA11, CFA5, CFA14) strongly affected the probability of developing HS: dogs carrying five or six of the six risk alleles had an odds ratio of 5.27 for HS. This study illustrates that a GWAS association detected between a cancer and a locus in dogs can mask the cumulative risk of several haplotypes. It also confirms previous findings in golden retrievers by Arendt et al. (2015) and Tonomura et al. (2015), who described at least two independent risk haplotypes at the CFA20 locus for mast cell tumor and at the CFA5 locus for lymphoma, respectively [15,16].

3.3. Multi-cancer loci identified through a multi-breed approach

Additionally, we confirmed that some risk haplotypes are involved in several cancers, as suggested by Tonomura et al. based on the association of the CFA5 locus with hemangiosarcoma and lymphoma [16]. Such pleiotropy at cancer risk loci has also been observed in human cancers, where one-third of the SNVs mapped to genomic loci are associated with multiple cancers [44]. Here, we confirmed that the CFA5, CFA11, and CFA20 loci have multi-cancer effects, influencing the risk of HS, lymphoma, osteosarcoma, and mast cell tumors in BMDs or golden retrievers. However, further studies with higher SNV density in golden retrievers are needed to confirm whether the same predisposing risk alleles are shared with BMDs. The CDKN2A locus was detected in Rottweilers and BMDs, and a neighboring region (CFA11:41.37 Mb; canFam3) was associated with osteosarcoma and is fixed in the Rottweiler population [17]. This shows that HS-affected Rottweilers accumulate risk haplotypes for at least two cancers (osteosarcoma and HS) at this locus. The co-occurrence of two different risk haplotypes for HS and osteosarcoma within 200 kb also illustrates that a given locus can carry multi-cancer risk through different risk haplotypes, as observed in humans. Indeed, in human GWAS, at some loci such as 8q24.21 (containing MYC), different risk SNVs are associated with different cancers, although they might ultimately converge on the same oncogenic mechanism [44]. In such situations, a multi-breed GWAS strategy can help decipher risk haplotypes when several dog breeds share several predispositions.

3.4. Pleiotropic effect of loci

Cancers are multigenic diseases in which cumulative alterations in key pathways are considered hallmarks of cancer [46]. HS involves histiocytic immune cells and is suspected to lie at the crossroads of immune dysregulation and cancer predisposition in dogs. HS-predisposed breeds, especially BMDs, are also predisposed to reactive histiocytic diseases [47] and to other immune or inflammatory diseases such as glomerulonephritis, aseptic meningitis, and inflammatory bowel disease (https://www.bmdca.org/health/diseases.php) [48,49]. Although no causal relationship between inflammation and HS has been established, inflammation is suspected to contribute to HS development [50-52]. It is therefore not surprising that HS GWAS hits overlap not only candidate tumor suppressor genes (TUSC1; tumor suppressor candidate 1) and other well-known tumor suppressors involved in the cell cycle (CDKN2A) or genome stability (telomere protection: POT1; replication stress/DNA damage: FHIT), but also genes involved in inflammation (IL17RD, SPNS3, ARHGEF3; S5 Table). The GWAS hits showed an enrichment of genes involved in cancer pathways (cell cycle, p-value = 9.28 × 10−3; cellular senescence, p-value = 0.013; bladder cancer, p-value = 0.049; aging, p-value = 0.0024; signaling pathways regulating pluripotency of stem cells, p-value = 0.0066; TP53 network, p-value = 0.03) and lipid metabolism (regulation of lipid metabolism by peroxisome proliferator-activated receptor alpha [PPARalpha], p-value = 0.016; response to leptin, p-value = 0.0066). Extending our search for HS association signals revealed clear overlaps with human GWAS signals (S5 Table). Several of these genes are not only known to be involved in predisposition to several cancers (CDKN2A, POT1, FHIT) but are also associated in humans with immune traits (monocyte, platelet, etc.), cholesterol, high-density and low-density lipoproteins, and allergens. This suggests that the pleiotropy of these loci is not limited to cancer risk.
This is also apparent in humans; for instance, loci associated with N-glycosylation of human immunoglobulin G show pleiotropy with autoimmune diseases and hematological cancers [53]. Concurrently with this work, a study by Labadie et al. (2020) confirmed the pleiotropic effect of these canine cancer loci by identifying a shared region for canine T zone lymphoma, mast cell tumors, and hypothyroidism in golden retrievers; one of these loci, on CFA14 and involved in mast cell tumors and canine T zone lymphoma, is ~800 kb downstream of the POT1 locus identified in this study [54].
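This section does not state which test or tool produced the pathway enrichment p-values. A common choice is a one-sided hypergeometric (equivalently, Fisher's exact) test of over-representation, sketched below under that assumption. The four counts are inputs the reader must supply (background genes, pathway genes, GWAS-hit genes, and hits in the pathway); no multiple-testing correction is applied, and the function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes; K: genes in the pathway;
    n: genes hit by GWAS signals; k: hits that fall in the pathway.
    """
    upper = min(K, n)
    # Sum the exact probabilities of observing k or more pathway genes among the hits.
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)
```

For example, if all 5 hits among 10 background genes fell in a 5-gene pathway, the p-value would be 1/252 (about 0.004).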

3.5. Genetic predisposition of HS in the three most at-risk dog breeds (CFA2, CFA5, CFA11, CFA14, and CFA20)

In addition to the major CFA11 locus, this study identified other HS loci on CFA2, CFA5, CFA14, and CFA20. Imputation was used to perform GWAS on a larger cohort of cases and controls from the three predisposed breeds at a higher SNV density. To avoid incorrect genotypes and potentially unreliable results after imputation, we used parameters previously shown to give accurate imputation in dogs [39]. Both BMDs and FCRs were included in the high-density reference panel, imputation was performed within breeds, and simulations showed good concordance of the imputed genotypes (91.97–95.86%). However, although BMDs and Rottweilers are closely related genetically [45,55], the absence of Rottweilers from the reference panel is a potential limitation. Nevertheless, the shared risk haplotypes identified in the three predisposed breeds contained only SNVs from the Illumina 173K SNV Canine HD array; thus, none of them were imputed in the FCR or Rottweiler cohorts (Table 3). Finally, we replicated these results in two independent BMD cohorts, demonstrating the reliability of the loci and risk alleles identified in this study and their impact on the longevity of BMDs in the whole population. The CFA5 locus is associated with HS and lymphoma risk in BMDs, and the top SNVs of this locus are located in an intron of the SPNS3 gene. Very little is known about this gene, although its paralog SPNS2 is important in immunological development: it plays a critical role in inflammatory and autoimmune diseases, influences lymphocyte trafficking and lymphatic vessel network organization, and drives defective macrophage phagocytic functions [37,56]. Moreover, this region is associated in human GWAS with the chemokine CCL2 [57]. Thus, SPNS3 is a strong candidate gene that could explain the predisposition to HS and lymphoma. Furthermore, the CFA14 locus had already been suggested in our previous study [26].
Here, the best SNVs are located in introns of POT1-AS1 and POT1. POT1 encodes a nuclear protein involved in telomere maintenance and in the predisposition to and development of numerous cancers (S5 Table). This study identified a potential protective haplotype shared by the three predisposed breeds. The differences in age at onset between cases carrying zero or one copy, and in age at death between controls carrying zero, one, or two copies of this protective haplotype, suggest that this shared protective haplotype is most probably involved in longevity and thereby in the age at onset of HS (S7 Fig). The CFA20 locus is associated with HS and mast cell tumor risk in BMDs. Its top SNV lies in an intron of FHIT, a tumor suppressor involved in apoptosis and in preventing the epithelial-mesenchymal transition, and one of the earliest and most frequently altered genes in most human cancers [38], including in breast cancer predisposition [57]. Moreover, stable nuclear localization of FHIT is a distinctive marker of histiocytes, suggesting an additional role for FHIT as a signaling molecule with antiproliferative function [58].

3.6. Causal variants

As in previous studies of canine HS predisposition [26] and human cancers [44,59], capture and targeted sequencing of the three HS candidate loci did not identify coding risk variants common to the predisposed breeds. Although the genotype concordance of SNVs imputed with the Beagle software was reliable, imputation of the variants identified by capture could have limited the detection of causal variants. However, as in humans, the majority of loci identified by cancer GWAS do not directly affect the amino acid sequence of the encoded protein, so elucidating causal variants is challenging: all closely linked variants in LD with the best GWAS SNV are relevant candidates [44]. In this study, the best SNVs were in introns of candidate genes (FHIT, SPNS3, etc.), upstream of a candidate gene (POT1), or overlapping long non-coding RNAs near a strong candidate gene (CDKN2A, POT1). Finally, we showed that the best variants linked to SPNS3 or PFKFB3 belong to CpG sites methylated in histiocytic cells. This strongly suggests that HS-predisposing variants in dogs are non-coding variants with regulatory effects. Under these conditions, the local cumulative risk haplotypes could reflect complex regulatory interactions such as those at the MYC locus, where recent Hi-C analysis of the genomic region revealed a complicated regulatory mechanism, suggesting that various large intergenic non-coding RNAs may mediate effects at the risk loci [44]. Therefore, further functional studies are needed to establish the involvement of such variants in the regulation of candidate genes.

3.7. Conclusions

In conclusion, we presented the largest GWAS of HS in dog cohorts using a multi-breed approach, confirmed the main role of the CDKN2A locus (CFA11/HSA9p21), and identified four new loci (CFA2, CFA5, CFA14, and CFA20). We used multiple breeds and cancers to highlight the cumulative effect of different risk haplotypes behind each locus and their pleiotropic nature. Finally, although capture and targeted sequencing of specific loci did not identify clear causal variants for cancer predisposition, our results point to strong candidate genes and to the existence of regulatory variants in non-coding regions and CpG islands linked to risk haplotypes, which lead to strong cancer predispositions in specific dog breeds.

4. Materials and methods

4.1 Ethics statement

The study was approved by the Centre National de la Recherche Scientifique (CNRS) ethical board, France (35-238-13).

4.2. Sample collection

Blood and tissue biopsy samples from cancer-affected and unaffected dogs were collected by a network of veterinarians through the Cani-DNA Biological Resource Center (http://dog-genetics.genouest.org), and DNA and RNA samples were extracted as previously described [10]. Samples were collected by veterinarians during the medical care of the dogs, with the informed consent of the owners. Blood and tissue samples were collected at a medical visit or during surgery and stored in tubes containing EDTA or RNAlater, respectively. Dogs were selected by breed, i.e., BMD, Rottweiler, and FCR. Cases were dogs with a pathology report confirming a diagnosis of hematopoietic cancer (HS, mast cell tumor, or lymphoma), and controls were dogs older than 10 years without cancer.

4.3. GWAS

DNA samples were genotyped on the Illumina 173K SNV Canine HD array at the Centre National de Génotypage (Evry, France) and on the Affymetrix Axiom Canine Genotyping Array Set A and B 1.1M SNV array at Affymetrix, Inc. (Santa Clara, CA, USA; Table 1). SNV genotypes were filtered for per-sample call rate >95%, per-SNV call rate >95%, and a minor allele frequency (MAF) threshold of 0.01. Breed assignment was checked with a cluster tree and genetic distance matrices computed from SNVs common to our data and publicly available data [15,16]. Sex was checked with the PLINK 1.9 option “--check-sex” [60]. For GWAS including multiple breeds, this quality control protocol was applied to each breed before merging the datasets. We used mixed linear model analyses accounting for population structure and kinship with the R package “Eigenstrat–GenABEL” 1.8 [61] in RStudio (version 1.1.463; Vienna, Austria). P-values corrected for the inflation factor λ were used, and SNVs were considered significant when their association exceeded the 95% threshold defined empirically from 1,000 random phenotype permutations with Eigenstrat–GenABEL 1.8 [61]. For imputation, Beagle [62] version 4.1 was used to impute the Illumina 20K SNV and Illumina 170K SNV array data to the Affymetrix 712K SNV array format, and to impute the 21,614 variants identified by capture and targeted sequencing of 16 dogs. In addition, 113 BMDs and 21 FCRs, representative of the BMD and FCR populations, were selected, genotyped on the Affymetrix 712K SNV array, and used as the reference panel for imputation. SNVs for imputation were filtered for MAF > 0.05 and Hardy–Weinberg equilibrium p-value > 1 × 10−7, as previously described by Friedenberg and Meurs [39]. All default Beagle settings were used except for the options niterations=50 and window=3000.
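The empirical significance threshold from phenotype permutations can be illustrated with a simplified sketch. It assumes an Armitage-style trend statistic (n·r² between 0/1/2 dosage and 0/1 case status) in place of the kinship-aware mixed model that was actually fitted with GenABEL, so it ignores population structure and relatedness; `trend_stat` and `permutation_threshold` are hypothetical names. Shuffling the phenotype breaks any genotype-phenotype link while preserving LD between SNVs, and the threshold is the 95th percentile of the maximum statistic across permutations.

```python
import random
from typing import List, Sequence


def trend_stat(dosages: Sequence[int], phenotype: Sequence[int]) -> float:
    """n * r^2 between 0/1/2 dosage and 0/1 phenotype (Armitage trend-test form)."""
    n = len(dosages)
    mx, my = sum(dosages) / n, sum(phenotype) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotype))
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in phenotype)
    return 0.0 if sxx == 0 or syy == 0 else n * sxy * sxy / (sxx * syy)


def permutation_threshold(snvs: List[Sequence[int]], phenotype: Sequence[int],
                          n_perm: int = 1000, level: float = 0.95, seed: int = 1) -> float:
    """Genome-wide threshold: the `level` quantile of the per-permutation maximum statistic."""
    rng = random.Random(seed)
    pheno = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(pheno)  # permute case/control labels across dogs
        maxima.append(max(trend_stat(g, pheno) for g in snvs))
    maxima.sort()
    return maxima[min(int(level * n_perm), n_perm - 1)]
```

An observed SNV statistic above the returned value would be called significant at the genome-wide 5% level under this simplified model.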

4.4. Capture and sequencing of targeted canine GWAS loci

In total, four loci were captured and sequenced in 16 dogs selected according to their haplotypes: one on CFA5 (29,805,467–34,459,320), two on CFA11 (41,148,019–41,237,204 and 43,520,875–46,778,525), and one on CFA14 (9,949,911–11,524,424). Capture, sequencing, variant detection, and annotation were performed by IntegraGen S.A. (Evry, France). Genomic DNA was captured with Agilent in-solution enrichment methodology using the Agilent SureSelect Target Enrichment System kit (Agilent Technologies, Santa Clara, California, USA). The SureSelect Target Enrichment workflow is a solution-based system that uses ultralong 120-mer biotinylated cRNA baits to capture regions of interest by enriching them from a next-generation sequencing genomic fragment library. Library preparation and capture were followed by paired-end 75 bp massively parallel sequencing on an Illumina HiSeq 2000 sequencer [63]. A custom SureSelect oligonucleotide probe library was designed to capture the loci of interest according to Agilent’s recommendations, with 1× and 2× tiling densities, using the eArray web-based probe design tool (https://earray.chem.agilent.com/earray). A total of 57,205 RNA probes were synthesized by Agilent Technologies (Santa Clara, CA, USA). Sequence capture, enrichment, and elution were performed according to the manufacturer’s instructions and protocols (SureSelect, Agilent) without modification, except for library preparation, which was performed with the NEBNext Ultra kit (New England Biolabs). For library preparation, 600 ng of genomic DNA was fragmented by sonication and purified to yield fragments of 150–200 bp. Paired-end adaptor oligonucleotides from the NEB kit were ligated to end-repaired, A-tailed fragments, which were then purified and enriched by eight polymerase chain reaction (PCR) cycles. Next, 1,200 ng of the purified libraries were hybridized to the SureSelect oligo probe capture library for 72 h.
After hybridization, washing, and elution, the eluted fractions were PCR-amplified for nine cycles, purified, and quantified by quantitative PCR (qPCR). Based on this quantification, an equimolar pool was prepared and quantified again by qPCR. Finally, the pool was sequenced on an Illumina HiSeq 2000 platform as paired-end 75 bp reads. Image analysis and base calling were performed using Illumina RTA software version 1.12.4.2 with default settings. Bioinformatic analyses of the sequencing data were based on the Illumina pipeline (CASAVA 1.8.2). CASAVA aligns a sequencing run to a reference genome (canFam3), calls SNVs based on allele calls and read depth, and detects variants (SNVs and indels). The alignment algorithm used was ELANDv2 (Maloney alignment and multi-seed reducing artifact mismatches). Only positions within the bait coordinates were retained. Genetic variants were annotated with IntegraGen in-house pipelines, including gene annotation (RefSeq), detection of known polymorphisms (dbSNP), and variant annotation (exonic, intronic, silent, nonsense, etc.).
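Retaining only positions within the bait coordinates amounts to an interval-membership filter. A minimal sketch follows, assuming non-overlapping intervals with 1-based inclusive coordinates; `keep_in_baits` is a hypothetical helper, not part of CASAVA or the IntegraGen pipeline. The usage example reuses the four captured regions listed above as stand-in intervals, since the individual bait coordinates are not given here.

```python
import bisect
from typing import Dict, List, Tuple

Variant = Tuple[str, int]        # (chromosome, position)
Interval = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def keep_in_baits(variants: List[Variant], baits: List[Interval]) -> List[Variant]:
    """Keep variants that fall inside any interval; intervals must not overlap."""
    by_chrom: Dict[str, Tuple[List[int], List[int]]] = {}
    for chrom, start, end in sorted(baits):
        starts, ends = by_chrom.setdefault(chrom, ([], []))
        starts.append(start)
        ends.append(end)
    kept = []
    for chrom, pos in variants:
        if chrom not in by_chrom:
            continue
        starts, ends = by_chrom[chrom]
        i = bisect.bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos <= ends[i]:
            kept.append((chrom, pos))
    return kept


if __name__ == "__main__":
    regions = [("CFA5", 29805467, 34459320), ("CFA11", 41148019, 41237204),
               ("CFA11", 43520875, 46778525), ("CFA14", 9949911, 11524424)]
    print(keep_in_baits([("CFA5", 30489203), ("CFA2", 29716535)], regions))
    # [('CFA5', 30489203)]
```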

4.5. Predictive modeling

For model selection, we extracted the data for only the 1,505 significant SNVs identified by GWAS_8_HS_BMD+FCR+Rottweiler_with_imputed_SNV, and recoded genotypes additively as 0 (AA), 1 (AB), and 2 (BB). For the first model selection method, we used stepwise forward selection with a 0.05 alpha threshold for both inclusion and exclusion, inspired by Zapata et al. [20]. Briefly, the selection process began with a model with no terms. Independent variables were added sequentially, choosing at each step the variable with the lowest p-value in the generalized model. Before the next term was added, any variable that had become non-significant after inclusion of the previous term was removed. The selection process terminated when no more terms could be added or removed. For the second model selection method, we used bidirectional (forward and backward) stepwise selection with the R library MASS [64] to select the best predictive model. To reduce the number of SNVs to test, a preliminary selection was performed by entering all significant SNVs identified in Table 2 into the generalized model and removing any variables that became non-significant. Bidirectional stepwise selection was then performed on the 44 remaining significant SNVs.
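The first selection method can be sketched as follows. This is a minimal illustration, not the authors' code: `additive_code` shows the 0/1/2 recoding, and `forward_stepwise` implements forward selection with a removal check after each addition, using the 0.05 thresholds given above. The model fit itself is abstracted as a hypothetical caller-supplied function, `fit_pvalues`, which must fit a model containing exactly the listed terms and return each term's p-value; in practice it would wrap the generalized model used in the study. The MASS-based bidirectional selection is not reproduced.

```python
def additive_code(genotype):
    """Recode a biallelic call as its count of B alleles: AA -> 0, AB -> 1, BB -> 2."""
    if genotype is None or len(genotype) != 2 or set(genotype) - {"A", "B"}:
        return None  # missing or malformed call
    return genotype.count("B")


def forward_stepwise(candidates, fit_pvalues, alpha_in=0.05, alpha_out=0.05, max_steps=1000):
    """Forward selection with a removal check after every addition.

    fit_pvalues(terms) is a hypothetical caller-supplied function that fits a
    model containing exactly `terms` and returns {term: p-value}.
    """
    selected = []
    for _ in range(max_steps):  # guard against add/remove cycles
        # Entry: the remaining candidate with the lowest p-value below alpha_in.
        best_term, best_p = None, alpha_in
        for term in candidates:
            if term in selected:
                continue
            p = fit_pvalues(selected + [term])[term]
            if p < best_p:
                best_term, best_p = term, p
        if best_term is None:
            break  # no candidate meets the inclusion threshold
        selected.append(best_term)
        # Removal: drop the least significant term while any exceeds alpha_out.
        while selected:
            pvalues = fit_pvalues(selected)
            worst = max(selected, key=lambda t: pvalues[t])
            if pvalues[worst] <= alpha_out:
                break
            selected.remove(worst)
    return selected
```

The `max_steps` guard is a design choice of this sketch: in principle a set of correlated SNVs can be added and removed in a cycle, and the loop should still terminate.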

4.6. REML analysis

The proportion of phenotypic variance in HS explained by genetic variance was estimated by REML analysis using GCTA [40] with default settings. In our analyses, the genetic variance was estimated from the genotypes of SNVs on all autosomes and within the associated regions on chromosomes 5, 11, and 14. Log-likelihood ratio tests were performed without specifying a prevalence, since the prevalence of HS is unknown in the Rottweiler and FCR breeds (it is 0.25 in BMDs).
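Because no prevalence was supplied, REML estimates of this kind remain on the observed case-control scale. For reference, the standard transformation to the liability scale, which GCTA applies when a prevalence is provided, can be sketched in Python as below. The function name is hypothetical, and `case_fraction` is the proportion of cases in the analysed sample.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, prevalence, case_fraction):
    """Convert a case-control heritability estimate to the liability scale.

    h2_liability = h2_observed * K^2 (1 - K)^2 / (P (1 - P) z^2), where K is the
    population prevalence, P the fraction of cases in the sample and z the
    standard normal density at the liability threshold.
    """
    if not (0 < prevalence < 1 and 0 < case_fraction < 1):
        raise ValueError("prevalence and case_fraction must lie in (0, 1)")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - prevalence)
    z = std_normal.pdf(threshold)
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1 - k)) ** 2 / (p * (1 - p) * z ** 2)
```

For example, with the BMD prevalence of 0.25 and a hypothetical balanced sample (case fraction 0.5), an observed-scale estimate of 0.30 corresponds to roughly 0.42 on the liability scale.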

4.7. Haplotype analyses

Minimal risk haplotypes were identified for each breed at the associated loci. First, variants in strong LD (r² > 0.8) with the top SNVs were identified in each breed using PLINK 1.9 [60] LD clumping, and used as input for haplotype phasing in each breed with fastPHASE 1.4 [65]. Risk haplotypes enriched in cases were identified based on the top SNV genotype. Starting from the top SNV and moving both upstream and downstream, we identified the SNV positions at which the risk haplotype was broken by a recombination event (i.e., where both alternative alleles were present on the risk as well as the non-risk haplotypes). This was done separately for each breed, and the minimal risk haplotype shared across breeds was then defined.
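The boundary search can be illustrated with a simplified Python sketch. It assumes phased haplotypes are already available as allele strings (one character per SNV, for example derived from fastPHASE output) for the haplotypes carrying the top-SNV risk allele, and it treats the first SNV on each side where those haplotypes no longer share a single allele as the recombination breakpoint. The function names and toy data are hypothetical; the minimal shared haplotype across breeds is then the intersection of the per-breed intervals.

```python
def risk_core(positions, risk_haplotypes, top_index):
    """Extend from the top SNV while every risk haplotype carries the same allele.

    positions: sorted SNV coordinates; risk_haplotypes: phased allele strings,
    one character per SNV. Returns (start, end) of the non-recombined core.
    """
    def shared(i):
        return len({hap[i] for hap in risk_haplotypes}) == 1

    left = right = top_index
    while left > 0 and shared(left - 1):
        left -= 1
    while right < len(positions) - 1 and shared(right + 1):
        right += 1
    return positions[left], positions[right]


def minimal_shared_interval(intervals):
    """Intersect per-breed (start, end) intervals; None if they do not overlap."""
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start <= end else None
```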

4.8. Gene-set enrichment analysis

The approved symbols of the genes closest to the strongest significant signal per chromosome and per GWAS were analyzed with the online tool WebGestalt [66] to identify pathways overrepresented among the GWAS loci.
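Overrepresentation analysis of this kind is commonly based on a one-sided hypergeometric test. The sketch below assumes that form and computes the probability of observing at least `overlap` pathway genes among the submitted genes by chance. It omits the multiple-testing correction that a tool such as WebGestalt reports alongside raw p-values, and all gene counts in the test are hypothetical.

```python
from math import comb


def ora_pvalue(total_genes, pathway_genes, query_genes, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    total_genes   N: genes in the reference set
    pathway_genes K: reference genes annotated to the pathway
    query_genes   n: genes submitted (e.g. closest genes to GWAS peaks)
    overlap       k: submitted genes that belong to the pathway
    """
    upper = min(pathway_genes, query_genes)
    tail = sum(
        comb(pathway_genes, i) * comb(total_genes - pathway_genes, query_genes - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, query_genes)
```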

4.9. Methylation analyses

Methylation analyses were performed on 12 HS cell lines: one commercial cell line (DH82; American Type Culture Collection, CRL-10389; RRID:CVCL_2018) and 11 cell lines developed from fresh tissue samples of HS-affected dogs. These eleven cell lines are available on request (S6 Table). Cells were cultured in complete Roswell Park Memorial Institute (RPMI) medium, consisting of RPMI 1640 GlutaMAX (Gibco Life Technologies) supplemented with 10% fetal bovine serum (HyClone, GE Healthcare, Life Sciences, Logan, UT) and 0.025% primocin (InvivoGen, Toulouse, France), at 37°C in a humidified 5% CO2 incubator. All cell lines were tested for mycoplasma using the MycoAlert Plus kit (Lonza, Rockland, ME) following the manufacturer’s protocol, and all were mycoplasma-free. SNVs on CFA5 and CFA2 were sequenced by Sanger sequencing as previously described [10]. In the absence of bisulfite conversion, the following primers were used: CFA2_29716535-F: 5′-GGTGTACTTTCGGGTCCAAC-3′, CFA2_29716535-R: 5′-CCCTGTCATTCGATGTCCTT-3′, CFA5_30489203–30489217_F: 5′-CCTGAGTGAGTGGAATGAGGA-3′, and CFA5_30489203–30489217_R: 5′-CTTCCTGCGACCTGCTGT-3′. After bisulfite conversion (EZ DNA Methylation-Gold Kit, Ozyme, St Cyr–l’école, France), the following primers were used: CFA2_29716242–29716795_FM: 5′-TAGGTGTTGGGTTTATATTGTTAGG-3′, CFA2_29716242–29716795_RM: 5′-CTTCCTGCGACCTGCTGT-3′, CFA5_30488738–30489339_FM: 5′-TAGGTGTTGGGTTTATATTGTTAGG-3′, and CFA5_30488738–30489339_RM: AAACCTATTCTCTTTTTCTAATTCACTTTA.
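The logic behind sequencing with and without bisulfite conversion can be shown with an in-silico conversion. In this minimal Python sketch, unmethylated cytosines read as T, while cytosines in a CpG context stay C when methylated. It models the top strand only and assumes every CpG in the fragment shares one methylation state, which is a deliberate simplification; the function name is hypothetical.

```python
def bisulfite_convert(seq, methylated_cpg=True):
    """In-silico bisulfite conversion of the top strand.

    Non-CpG cytosines always become T. CpG cytosines stay C when
    methylated_cpg is True and become T otherwise.
    """
    seq = seq.upper()
    out = []
    for i, base in enumerate(seq):
        is_cpg = base == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
        if base == "C" and not (is_cpg and methylated_cpg):
            out.append("T")  # unmethylated C is read as T after conversion and PCR
        else:
            out.append(base)
    return "".join(out)
```

At a heterozygous SNV that creates or abolishes a CpG, comparing the converted reads of the two alleles is what exposes allele-specific methylation. The same conversion explains why primers intended for bisulfite-treated DNA, such as the FM primers listed above, contain few or no cytosines.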

4.10. Genotyping for validation on independent cohorts

Genotyping of the SNVs Chr5_30488886, Chr11_41252822, and Chr14_11021670 was performed by targeted sequencing using Ion AmpliSeq technology on the Ion GeneStudio S5 Prime System (Life Technologies, Thermo Fisher Scientific, Courtaboeuf, France). Probes were designed using the AmpliSeq design service, and Torrent Suite Software (v5.12) was used for sequencing data processing. The Torrent Mapping Alignment Program (TMAP) was used for read processing and mapping to the canFam3 reference genome with default parameters, and the Variant Caller plugin was used for variant calling. Two independent cohorts were selected from BMD samples collected between 2012 and 2020. The first independent cohort, used for genetic analyses, comprised 186 cases and 176 controls. The second independent cohort, used for survival analyses, comprised 317 dogs without a pathological report of HS that were younger than 10 years (and thus independent of the first cohort). Survival probability was estimated using the Kaplan–Meier method, and differences in longevity according to the number of risk alleles were tested with the log-rank test using the R package “survival” [67].

Pedigrees of a Bernese mountain dog family showing the co-segregation of lymphoma (blue) and mast cell tumor (green) with histiocytic sarcoma (black). (TIFF)

Genome-wide association studies (GWAS) on histiocytic sarcoma (HS) and lymphoma.

A–B. Bernese mountain dog (BMD) GWAS results for HS with 172 cases and 128 controls (GWAS_1_HS_BMD). A) Quantile-quantile plot showing a genomic inflation factor λ of 1.000005, indicating no residual inflation. B) Manhattan plot of the GWAS results, showing two loci (arrows) on chromosome 11 (CFA11:41161441, corrected p = 3.11 × 10^−7) and chromosome 20 (CFA20:30922308, corrected p = 3.73 × 10^−5). C–D. BMD GWAS results for HS and lymphoma with 252 cases vs. 128 controls (GWAS_2_HS+lymphoma_BMD). C) Quantile-quantile plot showing a genomic inflation factor λ of 1.000005, indicating no residual inflation. D) Manhattan plot of the GWAS results, showing two loci (arrows) on chromosomes 11 (CFA11:41161441, corrected p = 1.5 × 10^−6) and 5 (CFA5:30496048, corrected p = 5.88 × 10^−6). E–F. Meta-analysis combining the BMD GWAS for HS and lymphoma (252 cases vs. 128 controls) and the golden retriever GWAS for lymphoma (41 cases vs. 172 controls) from Tonomura et al. [16] (GWAS_3_HS+lymphoma_BMD+golden_retriever). E) Quantile-quantile plot showing a genomic inflation factor λ of 1, indicating no residual inflation. F) Manhattan plot of the meta-analysis results, showing the locus on chromosome 5 (CFA5:32824053, corrected p = 2.2 × 10^−7). (TIFF)
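The genomic inflation factor λ reported for these Q–Q plots compares the median association statistic with its expectation under the null. A minimal Python sketch of one common definition, which converts two-sided p-values back to 1-df chi-square statistics, is shown below. The GWAS software used in the study may estimate λ differently (for example by regression), and the function name is hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation(pvalues):
    """Lambda_GC = median 1-df chi-square statistic / expected median under the null.

    Two-sided p-values in (0, 1] are mapped back to chi-square statistics
    with chi2 = (Phi^-1(1 - p / 2))^2.
    """
    std_normal = NormalDist()
    chi2 = [std_normal.inv_cdf(1 - p / 2) ** 2 for p in pvalues]
    expected_median = std_normal.inv_cdf(0.75) ** 2  # about 0.455
    return median(chi2) / expected_median
```

A value close to 1, as reported here, means the bulk of the test statistics matches the null expectation; values well above 1 usually point to residual population structure or relatedness.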

Genome-wide association studies (GWAS) on histiocytic sarcoma (HS) and mast cell tumor.

A–B. Bernese mountain dog (BMD) GWAS results for HS with 172 cases and 128 controls (GWAS_1_HS_BMD). A) Quantile-quantile plot showing a genomic inflation factor λ of 1.000005, indicating no residual inflation. B) Manhattan plot of the GWAS results, showing two loci (arrows) on chromosome 11 (CFA11:41161441, corrected p = 3.11 × 10^−7) and chromosome 20 (CFA20:30922308, corrected p = 3.73 × 10^−5). C–D. BMD GWAS results for HS and mast cell tumor with 216 cases vs. 128 controls (GWAS_4_HS+MCT_BMD). C) Quantile-quantile plot showing a genomic inflation factor λ of 1.000005, indicating no residual inflation. D) Manhattan plot of the GWAS results, showing two loci (arrows) on chromosomes 11 (CFA11:41161441, corrected p = 6.94 × 10^−7) and 20 (CFA20:30922308, corrected p = 1.53 × 10^−5). E–F. Meta-analysis combining the BMD GWAS for HS and mast cell tumor (216 cases vs. 128 controls) and the European golden retriever GWAS for mast cell tumor (69 cases vs. 74 controls) from Arendt et al. [15] (GWAS_5_HS+MCT_BMD+golden_retriever). E) Quantile-quantile plot showing a genomic inflation factor λ of 1, indicating no residual inflation. F) Manhattan plot of the meta-analysis results, showing the locus on chromosome 20 (CFA20:33321282, corrected p = 4.79 × 10^−7). (TIFF)

Identification of DNA methylation sites at the single nucleotide variations (SNVs) on the CFA5 locus included in CpG islands.

The UCSC track of chr5:30,489,183–30,489,230 with the methylation track (Dog-MDCK-Meth) and Sanger sequencing of histiocytic sarcoma cell lines are shown. Sequencing of homozygous and heterozygous histiocytic sarcoma cell lines in the absence and presence of bisulfite treatment showed allele-specific methylation at the two SNVs in histiocytic cells. (TIFF)

Identification of DNA methylation sites at the single nucleotide variations (SNVs) on the CFA2 locus included in CpG islands.

The UCSC track of chr2:29,716,519–29,716,550 with methylation tracks (Dog-MDCK-Meth, Dog-R3-Sperm-Meth) and Sanger sequencing of histiocytic sarcoma cell lines are shown. Sequencing of homozygous and heterozygous histiocytic sarcoma cell lines in the absence and presence of bisulfite treatment showed allele-specific methylation at the SNV in histiocytic cell lines. (TIFF)

Impact of risk alleles on the life span of Bernese mountain dogs (BMD).

Kaplan–Meier estimates of BMD longevity and the corresponding hazard ratio are shown according to the number of risk alleles (n ≤ 4 or n ≥ 5). Mean and median survival are 8.4 and 9.5 years, respectively, for BMDs with ≤4 risk alleles, and 7.54 and 7.92 years, respectively, for BMDs with ≥5 risk alleles. (TIFF)
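Section 4.10 states that survival was estimated with the Kaplan–Meier method and compared between risk-allele groups with a log-rank test in the R package “survival”. For readers outside R, both steps can be sketched in plain Python as below. This is an illustration only: `events` uses 1 for death and 0 for censoring, the two groups would correspond to dogs with ≤4 and ≥5 risk alleles, the function names are hypothetical, and the hazard ratio shown in the figure is not computed.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = death, 0 = censored)."""
    data = list(zip(times, events))
    survival, curve = 1.0, []
    for t in sorted({u for u, e in data if e}):
        at_risk = sum(1 for u, _ in data if u >= t)
        deaths = sum(1 for u, e in data if u == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value on 1 df)."""
    group_a = list(zip(times_a, events_a))
    pooled = group_a + list(zip(times_b, events_b))
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({u for u, e in pooled if e}):
        n_a = sum(1 for u in times_a if u >= t)
        n = sum(1 for u, _ in pooled if u >= t)
        d_a = sum(1 for u, e in group_a if u == t and e)
        d = sum(1 for u, e in pooled if u == t and e)
        # Observed minus expected deaths in group A at time t.
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a given the margins at time t.
            variance += d * (n_a / n) * ((n - n_a) / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / variance if variance > 0 else 0.0
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))
```

Censored dogs remain in the risk set up to and including their censoring time, which is why a censoring at the same time as a death still counts in the denominator.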

Age of onset or death of Bernese mountain dog (BMD) histiocytic sarcoma (HS) cases and controls according to the number of copies of protective CFA14 haplotype.

Cases with zero copies, n = 183, mean age = 6.22 years. Cases with one copy, n = 24, mean age = 7.23 years. Cases with two copies, n = 1, mean age = 6.9 years. Controls with zero copies, n = 132, mean age = 11.21 years. Controls with one copy, n = 50, mean age = 11.27 years. Controls with two copies, n = 5, mean age = 11.97 years. A one-sided Wilcoxon rank-sum test was used. (TIFF)
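A one-sided Wilcoxon rank-sum test of this kind can be sketched in Python using the normal approximation with tie and continuity corrections, for the alternative that the first sample tends to have smaller values (for instance, younger ages) than the second. The function names are hypothetical, and the grouping actually compared in the figure is not specified here. R's `wilcox.test`, for example, uses an exact distribution instead for small samples without ties, so p-values can differ slightly.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_less(x, y):
    """One-sided Wilcoxon rank-sum test, alternative: x tends to be smaller than y.

    Normal approximation with tie and continuity corrections.
    Returns (W, p), where W is the Mann-Whitney form of the statistic for x.
    """
    n_x, n_y = len(x), len(y)
    n = n_x + n_y
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    w = sum(ranks[:n_x]) - n_x * (n_x + 1) / 2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    sigma = sqrt(n_x * n_y / 12 * ((n + 1) - tie_term))
    if sigma == 0:
        raise ValueError("all observations are tied")
    # Continuity correction of +0.5 toward the null for the lower-tail test.
    z = (w - n_x * n_y / 2 + 0.5) / sigma
    return w, NormalDist().cdf(z)
```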

Variance explained by chromosomes 5, 11, 14, or all autosomes, as estimated by restricted maximum likelihood (REML) analysis.

(XLSX)

Risk modeling of histiocytic sarcoma (HS).

(XLSX)

Top 10 variants identified in chromosomes 11, 14, and 5 loci after imputation of captured variants on 455 histiocytic sarcoma (HS) cases and 408 controls from Bernese mountain dog (BMD), Rottweiler, and flat-coated retriever (FCR) breeds.

For chromosomes 14 and 5, the association analyses were conditioned on the best SNV of chromosome 11 (CFA11:41252822). (XLSX)

Association analysis of CFA11, CFA5, and CFA14 loci with the phenotype in an independent validation cohort of Bernese mountain dog (BMD; 186 cases and 176 controls).

At-risk alleles are represented in bold. CI: confidence interval (Woolf method). (XLSX)
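The Woolf interval mentioned in this table legend is computed on the log odds-ratio scale. The sketch below assumes a 2×2 table of allele counts (risk versus other allele in cases and controls) with no empty cells; the same formula applies to genotype-carrier counts, and the function name is hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_woolf(a, b, c, d, level=0.95):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a, b: risk and other allele counts in cases
    c, d: risk and other allele counts in controls
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("Woolf interval needs all four cells > 0")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_or = log(odds_ratio)
    return odds_ratio, exp(log_or - z * se), exp(log_or + z * se)
```

For example, 20 risk and 10 other alleles in cases against 10 and 20 in controls give an odds ratio of 4 with a 95% interval of roughly 1.37 to 11.7.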

List of candidate genes identified by genome-wide association studies (GWAS) performed on several dog breeds and several cancers.

Indicated genes correspond to the closest genes with the best single nucleotide variation (SNV) per GWAS experiment. The involvement of these genes in human cancer or inflammation is summarized according to the NIH gene database (https://www.ncbi.nlm.nih.gov/gene). Their association with specific traits in human GWAS is summarized according to the PheGenI database (https://www.ncbi.nlm.nih.gov/gap/phegeni). (XLSX)

List of cell lines developed from HS-affected fresh dog tissue samples.

The breed, the tumoral tissue and mutation status for PTPN11, KRAS and BRAF are indicated. (XLSX)

8 Sep 2020

Dear Drs. André and Hédan,

Thank you very much for submitting your Research Article entitled 'The tree that hides the forest: identification of common predisposing loci in several hematopoietic cancers and several dog breeds.' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review again a much-revised version. We cannot, of course, promise publication at that time.

In particular, reviewers were unable to understand the design or logical order underlying the overall narrative. They indicated there may be meaningful discovery here, but suggested a major re-write could be most appropriate if their critical technical and statistical concerns were rigorously resolved. Based on the reviews, you could reduce the scope to focus on the strongest findings or could clarify the full study. Statistical significance, possibility of imputation problems and lack of specificity in referring to genetic markers, alleles and haplotypes must be addressed satisfactorily. The issue of lack of replication or other supporting evidence was raised. While this standard for publication of human genetics may not be required for animals, this limitation must be addressed if it were relevant (with suggestions of necessary follow-on studies). Please confirm the data will be deposited in a public archive with free access. Reviewer suggestions that are desirable include consideration of effect sizes and heritability.
Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org. If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist. To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see our guidelines. Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission. While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. 
If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process. To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder. [LINK] We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Carlos E Alvarez
Guest Editor
PLOS Genetics

Gregory Barsh
Editor-in-Chief
PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:
Please note here if the review is uploaded as an attachment.

Reviewer #1: I consider focusing on germline for loci in these canine cancers and ultimately variants conferring susceptibility quite important and adding together different breeds a strength. The datasets are now becoming quite large and able to find interesting results. Q-Q plots indicate that proper corrections for populations have occurred and I have confidence that the SNP genotype data are treated properly. Here is what I understand: loci for different cancers can be shared within a breed and loci for the same cancer can be shared across breeds; a known locus has been further defined and additional loci detected; and the underlying variants are likely not in coding regions of genes.

Major comments. But I confess to having a lot of difficulty in following the author’s presentation. After struggling I ended up questioning the logical order the authors use to tell their story. The progression of 7 GWA with adding breeds and cancer types using just the 173K SNP data was really a load to try to understand and the presentation in general did not help.
Having to look for Tables 1 – 3, while the Figures were inserted in the text did not help. I suggest that the authors consider starting off right at the clearest analysis and the ones with the most strength; i.e., the large population sizes and imputed data from several low-density arrays on HS alone (GWA 8) that leads to Figure 6. And follow it with the HS GWA that adds in FCR and Rottweilers (GWA 9 and 10). With that strongest and clearest result in hand it seems that some of the figures and analyses with lower density SNPs leading up to it may not be necessary. And then go on to the stories of combining the HS with MCT and Lymphoma.

The pedigree of Figure 2 showing cosegregation of cancers may not tell the whole story. Conclusions on a single pedigree can be influenced by ascertainment bias. And, how many of the dogs have more than one form of cancer? Can the numbers of dogs with each or both conditions across the entire population be provided instead? Anyway I suggest moving Figure 2 to Supplemental.

Often relies on “top SNPs” and their p values, but converting this into understanding by the reader that these SNPs are tagging haplotypes and that haplotype distributions will be useful is not consistent. For example, is the CFA5 locus necessarily the strongest or most influential locus, or are the marker allele frequencies and its haplotype structure in the populations making it more easily discovered by GWA? Can the authors better provide and explain the haplotype information, numbers and frequencies between cases and controls and how this information resulted in simple regions defined in the figures? Better presenting the complexity of genomic landscape would be very helpful. Better explain the content and results from Tables 2 and 3 in the text. And if there is a limitation to number of Figures/Tables allowed in the main text, shift more of the allotment to the haplotype and loci tables that are now in the Supplement.
Please bring out the story around the haplotypes where at most the frequency of a cancer associated haplotype is twice that in the cases than in the controls. Anyway, the discussion of the haplotypes in the text was not enough for me. I believe this wealth of information can be used to better quantitate the heritability and genetic architecture of HS and perhaps the other 2 cancers as well. The SNP genotype data from the cases and controls could be used to estimate narrow sense heritability (Yang, J., Lee, S.H., Goddard, M.E. and Visscher, P.M. (2011) GCTA: a tool for genome‐wide complex trait analysis. Am. J. Hum. Genet. 88, 76‐82.)

Other comments: Intro on page 6. Transitioning between somatic tissue studies and mutations found there and GWA studies for underlying susceptibility loci in a very long paragraph is not smooth. Perhaps the use of the term “genetic bases” is not specific enough. Paragraph 2.1: When stating that the 3 loci are also found / or overlapping in GWA of other cancers can the authors find a way to better demonstrate to us that the regions truly overlap? As it stands they do not give the sequence coordinates routinely and just state a gene name or the chromosome number. Paragraphs are misnumbered in Table 1. It is not clear to me how the authors can claim 3 independent peaks on CFA5 for HS and Lymphoma in the combined breeds (Figure 3).

Reviewer #2: The authors present a Genome-Wide Association Study that evaluates the risk of HS in a cohort of BMDs that was collected through a collaborative network of veterinarians. Samples collected were genotyped using the Illumina 173k and Affymetrix 1.1m SNP arrays that were further imputed for uniformity. GWAS of this data was sequentially complemented with the addition of genotype data from previously collected data for other cancers including Lymphoma and Mast Cell Tumor Cancer. These data sets were used to discover common SNP variants associated to combined cancer risk.
Additional GWAS were performed where several breeds were added to refine the analysis. An evaluation of the imputed data was also included along with a sequencing effort to detect functional variants and a CNV assessment of risk of the chr14 loci.

-Overall assessment based on the PLOS GENETICS criteria for publication with a numeric scoring by the reviewer (1-5, five being best):
-Originality (2): The presented paper is very similar to others already published such as Karlsson et al 2013 and Tonomura et al 2015 (cited by the study). It does incorporate some new elements but overall they do not detect highly significant findings that are relevant to the field.
-Importance (2): The reviewer believes that the dog is a very important emerging model for studying cancer. However, this field has been stagnant for some years where discoveries are not followed up. The novel loci discovered in this study are likely going to follow that trend. This is a sad reality of the field.
-Interest to researchers (4): this type of approach is likely to be relevant to researchers in the field and is likely to be cited. Dog papers are very popular.
-Rigor (4): The reviewer believes the data was rigorously evaluated as the methods are quite standard in the field. However, the paper requires further clarification of methodological details and a more thorough evaluation of the assumptions used (i.e. imputation bias of minor allele frequency SNPs)
-Evidence (2): The study adds a couple candidates to the list of cancer risk genes but does not provide additional evaluation and critique of these new findings. In comparison to similar papers such as Karlsson and Tonomura's, this paper looks weak.

Major concerns: The paper requires a deep revision to correct grammar and typos. In addition, the paper is written in a very verbose and unnatural language that makes some arguments hard to understand. The reviewer strongly recommends a native English speaker to review and improve the syntax in the paper.
There are too many instances to enumerate but some examples are offered at the end.

The variants detected in this study are in large part the same as those already discovered for other cancers (such as osteosarcoma by Karlsson et al in 2013, cited in the paper). This main concern is even clearly stated by the authors in section 3.4 "A number of these genes are not only already known to be involved in the predisposition of several cancers (CDKN2A, POT1, FHIT,…) but they are also associated with immune traits (monocyte, platelet…) or with cholesterol, HDL/LDL or allergens traits in Human". Curiously, this phenomenon is attributed to a pleiotropic effect (see section 3.3); however, it is also discussed in the same section, a few lines later, how there are two variants within a 200kb region where one is fixed in the Rottweiler. This argument is counterintuitive because if there are two variants, how can it be pleiotropy?

Although the GWAS implicated other novel loci associations, no additional biological argument supported by any other type of technology different from GWAS was provided to support their inclusion. No independent validation cohorts were used.

It is undisclosed where the samples were collected from. Accounting for samples collected by the study and those added from previously published studies becomes complicated as they are progressively added into the argument. There are no clear descriptions of the samples in the methods and results that help account for samples included in the study, i.e. the reviewer believes the FCR samples were collected by the researchers but is not completely sure.

The imputation effort made in the study is concerning even though it has some caveats in its favor. It appears to the reviewer that a 20k genotype dataset was imputed and expanded to a 712k as described in section 4.2. In the reviewer’s opinion, the magnitude of this imputation effort is extreme and very risky.
However, in the researchers’ favor, they did this within a breed that is likely to be fixed over large portions of the genome. The reviewer believes it is impossible to determine how appropriate this approach is within the context of dog genomics and the study. In contrast, and considering the minor allele frequency exclusion of 0.01, in humans such a threshold would still leave a significant risk of biased SNPs for low-frequency SNPs (see Figure 2 in Johnson et al 2013 PMID: 23334152, attached). Although the reviewer considers this section to be plausible, it would be appreciated to have more details to evaluate imputation bias in this specific dog cancer and compare it to a more generalized context. Even though the sequencing effort was made to detect functional variants, there were no significant findings obtained from it; this is clearly stated in section 3.6. This dramatically limits the impact and relevance of the effort.

For the figures, the corrected P-values mentioned in the text do not correspond to what is represented in the figures in the respective vertical axis, i.e. Figure 1 chr5 loci is 6.36E-5 while in the figure is barely above the E-4. This occurs in all figures. For figure 2, besides being a complete chaos of vertical and horizontal mating, the color coding is not legible when printed in a high quality color laser printer. These two issues are likely to make this figure completely useless for the reader.

Some arguments in the paper that are concerning to the reviewer:

In the introduction "and models are strongly needed to better understand this dramatic cancer" --- This is a weird statement and an example of the odd writing style used in the paper. What is a dramatic cancer?
In the introduction "Furthermore, despite a strong/important heritability of HS in BMD, the awareness of this devastating cancer and attempt to selection against HS since 20 years, breeders have not succeeded to reduce the prevalence of this cancer" --- What evidence do the authors have that suggests there has been any effort to reduce prevalence?

In section 2.3 "since it is never found in other 231 Swiss dog breeds (18 Appenzeller Sennenhund, 8 Entlebucher Mountain dog and 205 greater Swiss dog, data not shown)..." --- This sentence contains an obvious mistake that makes it very confusing: "231 Swiss dog breeds" should rather be 231 dogs belonging to Swiss breeds. Also, what is the relevance of the Swiss breeds to the BMD and Rottweiler argument when the Rottweiler is not a Swiss breed?

In section 2.5 "When performing the statistical analysis with the imputed genotypes, no coding variant significantly associated with HS risk could be found and six of the ten top variants associated with HS predisposition are imputed genotypes and are localized within 100 kb on CFA11" --- Could this be due to the extreme imputation effort made in the study that is previously described as a concern by the reviewer?

In section 3.2 " the cumulative risk alleles on the 3 main loci (CFA11, 5, 14) strongly impact the probability to develop this cancer with an Odds ratio of 5.41 to be affected by HS when dogs carry 5/6 risk alleles" --- This is an important finding to the reviewer that is not described or discussed.

In section 3.6 "With multiple breeds and cancers approach, we highlight the cumulative effect of different risk haplotypes behind each locus and their pleiotropic nature" --- the reviewer is of the opinion that to make this statement a modeling approach such as the one presented by Zapata et al 2019, where cumulative effects are expressly evaluated and collinearity is considered, would have been necessary.
Reviewer #3: Comments to the author: This study aims to identify loci associated with histiocytic sarcoma in the three dog breeds most at-risk for this cancer. To increase GWAS statistical power, the authors: a) increase sample size by including phenotypes for several closely related cancers (like lymphoma and mast cell tumor), making use of previously published data, b) increase sample size by combining cases and controls from several breeds known to have a predisposed risk (like golden retriever and Rottweiler) c) increase the number of SNPs to test by using several different arrays (ranging from a 22k custom capture array to the 1.1M Affymetrix array) and imputing between these. Significantly associated loci on 8 different chromosomes are identified – some also found in previous studies – and the authors follow up on several of these by investigating haplotypes, using targeted sequence capture and performing methylation analyses. The authors find several risk (or protective) haplotypes that are mostly shared between breeds and some even between the different cancer types, and which are regulatory in nature. This is a thorough and interesting research study, which propels the field of canine complex disease genetics. The manuscript is generally written well and easy to follow. But there are many minor English grammatical errors.

Main comments:
1. Phenotyping – no details at all are provided on how cases and controls were selected. What criteria were used (apart from breed)? Was there a minimum age for controls?
2. Quality control – what other filters did you use on your genotyping data? Only MAF of 1% is mentioned. What about SNVs that failed Hardy-Weinberg? Or SNVs with high missingness? Did you do a sex check and a breed check (like a PCA) to make sure there were no sample mixups? What was the final SNV number left for analysis after these filters? More details are required.
3. Permutations – I’m a little concerned that the thresholds were all set based on only 500 permutations. 10,000 is a much more reasonable and usual number, but at least 1,000 permutations should be used. Also, more details about how these permutations were run is needed – program used, etc. And how was the threshold actually calculated? Some of the “significant” P-values are very low…certainly much lower than the Bonferroni correction (although realizing this is overly conservative).
4. There are no imputation accuracy metrics given or tests run. Or even previous papers showing canine imputation accuracy cited. How do we know the imputation worked well and provides reliable results?

Minor comments: (Note: there are no page numbers or line numbers so that makes this difficult)
1. I’m not a fan of the first part of the title
2. Write out the gene names the first time they are used
3. Introduction, 2nd paragraph – “clinical presentation, histopathology and issue of this canine cancer…” – not sure what is meant by “issue”, remove.
4. Introduction, 2nd paragraph – “attempt to selection against HS since 20 years, breeders have not succeeded to reduce the prevalence” should be “attempt to select against HS for 20 years, breeders have not succeeded in reducing the prevalence”
5. Introduction, 3rd paragraph – “showed that they are also associated to other cancers in predisposed breeds” should be “with other cancers”
6. I think fig. 2 should be supplemental – it doesn’t show your results and is more like a side-note
7. Fig. 4 – addition of 44 cases with MCT to the HS data. But the number of controls decreases from 154 to 135 – why?
8. Section 2.3 last paragraph – “specific to BMD and Rottweilers HS cases since it is never found in other 231 Swiss dog breeds” should be “specific to BMD and Rottweiler HS cases since it is never found in a further 231 dogs of other Swiss breeds”
9. Section 2.3 last paragraph, last line – should be “close” not “closed”
10. Fig 5 & 6 - remove question marks in boxes next to A, B, C, etc
11. Section 2.4, 1st paragraph – 134 dogs were genotyped on Affymetrix array – more detail about the dogs selected – what breeds? Add to methods.
12. Section 2.4, 4th paragraph “no common risk haplotype can been identified with higher SNV…” should be “can be identified”
13. Section 2.4, last three words should be “Supplementary Table 5”.
14. Section 2.5, 1st paragraph – §3.4 doesn’t seem relevant to this sentence.
15. Section 2.5 – either use commas or spaces to separate the millions and thousands in bp coordinates but not both
16. Section 2.5 – the top ten variants are shown but how many significant SNVs were actually identified for each of these GWAS?
17. Section 2.5, 2nd paragraph – where is the data from the GWAS using the CFA11 best SNV as a covariate? Add the top ten results to Supp. Table 6.
18. Discussion – small “h” on human
19. Discussion, 3.1, 2nd sentence – switch the words “common” and “loci”
20. Discussion, 3.1, 3rd sentence needs rewriting. Maybe: “Interestingly, somatic alterations identified to date in canine tumors, through genome-wide approaches, are found in the same genes”
21. Discussion, 3.2 title – remove the “of” after cumulative
22. Discussion, 3.2, 2nd sentence needs rewriting. Maybe: “Within-breed canine GWAS usually identifies fewer variants with stronger…”
23. Discussion, 3.2, 3rd sentence – replace “beauty” with “morphological”
24. Discussion, 3.2, middle of the paragraph – need a fullstop after “Rottweilers” and then on the same line remove the repeated words “a strong”
25. Discussion, 3.2, last line – add in years for Arendt et al and Tonomura et al
26. Discussion, 3.4 – please put P-values into scientific notation and remove the … in the lists in brackets
27. Discussion, 3.4, last line – start needs rewriting. Maybe: “Concomitantly to this work, a study by Labadie et al confirmed the pleiotropic effect of these canine cancer loci by identifying a shared region for canine T zone lymphoma,…”
28. Discussion, 3.5, 2nd sentence – use “The CFA5 locus…” 3rd sentence – use “Very little is known…”
29. Discussion, 3.5, 1st paragraph, reference needed for sentence ending in “associated in human GWAS with the chemokine CCL2.”
30. Discussion, 3.5, 2nd paragraph – use “difference in age” instead of “difference of the age”
31. Discussion, 3.5, 3rd paragraph – use “The CFA20 locus..” Reference needed for sentence ending in “involved in human predisposition of breast cancer.”
32. Supplementary figure 5 caption – change all occurrences of “copy” to “copies” after either zero or two, i.e., should be “zero copies” and “two copies”. Also, second occurrence of “Cases with one copy” should be “Cases with two copies” and second occurrence of “Controls with one copy” should be “Controls with two copies”.
33. Methods – give the full name for abbreviations, like Cani-DNA BRC, CNRS, CNG
34. Methods, 4.1 – change the Ulve et al citation to the corresponding reference number
35. Methods, 4.1, 3rd sentence – should be “veterinarians” and no s on “medical care”
36. Methods, 4.2 – what numbers of dogs from what breeds were genotyped on the different arrays? Details missing.
37. Methods, 4.2 – GenAbel 1.8 – is this an R package? If so, you should state this.
38. Methods, 4.3, 3rd paragraph – small q on qPCR
39. Methods, 4.3, last sentence should read: “It consists of gene annotations (RefSeq), and detection…”
40. Table 1 – paragraph numbers are wrong
41. Supplementary table 6 – please explain the last column, Pc1df
42. Supplementary table 7 – it would help if you added the chromosomes (and approx. bp) for these genes

********** Have all data underlying the figures and results presented in the manuscript been provided?
Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlie graphs or summary statistics should be provided in spreadsheet form as supporting information.
Reviewer #1: Yes
Reviewer #2: No: Just a note on privacy. The authors offered to make the data available on their own research team's web page. Although this page is public, it is still possible for the authors to track the IP addresses of people who access it. Within the close community of dog genomics researchers, it becomes very easy to determine the identity of possible competitors accessing the data, which could give the authors an unfair advantage.
Reviewer #3: No: But the authors state that the genotypes will be available upon publication.
**********
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
Submitted filename: nihms-437755.pdf
12 Jan 2021
Submitted filename: reponse-point-by-point _final.docx
3 Feb 2021
Dear Dr Hedan,
We are pleased to inform you that your manuscript entitled "Identification of common predisposing loci to hematopoietic cancers in four dog breeds" has been editorially accepted for publication in PLOS Genetics. Congratulations!
Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow-up email.
Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made. Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org. In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date. 
Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!
Yours sincerely,
Carlos Alvarez
Guest Editor
PLOS Genetics
Gregory Barsh
Editor-in-Chief
PLOS Genetics
www.plosgenetics.org
Twitter: @PLOSGenetics
----------------------------------------------------
Comments from the reviewers (if applicable):
Please note that Fig. 4.I has all genes in upper case, except for "cdkn2A". The text also refers to CDKN2B and "CDKN2A-AS", but they are absent from this figure. CDKN2B-AS1 is well known, but I am not aware of "CDKN2A-AS"; please check the symbol of the AS gene you refer to. Let us know promptly if you decide to make changes to text or figure.
-- Carlos Alvarez, Guest Editor
Reviewer's Responses to Questions
Comments to the Authors: Please note here if the review is uploaded as an attachment.
Reviewer #1: The clarity and flow have greatly improved. The authors have addressed each of my concerns. One problem is that the text in several of the Tables is messed up, perhaps during conversion to a pdf?
Reviewer #2: First of all, the authors dramatically improved the language and overall written quality of the manuscript. The language used in the current revision sounds natural and clear. The addition of clear subject numbers at the beginning of each argument and the addition of table 1 are quite helpful for following and keeping track of the exact subjects used for each study section. The authors addressed the concerns and requests for clarification for other missing or confusing arguments. Table 4 is a great addition to the manuscript, along with the additional text in section 2.5. In addition, the expanded and rearranged discussion of the pleiotropic effects in section 3.4 and of genetic predisposition in section 3.5 is much appreciated.
Although the reviewer still has some concern about the impact of such an extended imputation effort as a generalized approach in dog genomics, the rationale provided by the authors and the specific circumstances of this present study are reasonable to address such concern. The reviewer believes that such large approaches must be evaluated for appropriateness on a case-by-case basis, taking into account the specifics of the cohort and the reference panel, together with an “a posteriori” re-evaluation of the appropriateness of the imputation effort based on the number of hits relying only on imputed SNPs. As mentioned in the first revision, the reviewer believes this approach is plausible and, for this study, the new additions, corrections and additional context provided by the authors satisfy his concern sufficiently for him to recommend the manuscript for publication.
Reviewer #3: The authors have made substantial improvements to the manuscript. I thank them for taking the time to thoroughly address all my concerns. The tables (especially table 2) are hard to read because of all the question marks in boxes, but it sounds like this is due to a pdf conversion error, and will be sorted out by editorial staff.
**********
Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlie graphs or summary statistics should be provided in spreadsheet form as supporting information.
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
**********
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review?
For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
----------------------------------------------------
Data Deposition
If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.
The following link will take you to the Dryad record for your article, so you won't have to re-enter its bibliographic information, and can upload your files directly: http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-20-01160R1
More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.
Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.
----------------------------------------------------
Press Queries
If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article.
If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.
26 Feb 2021
PGENETICS-D-20-01160R1
Identification of common predisposing loci to hematopoietic cancers in four dog breeds
Dear Dr Hédan,
We are pleased to inform you that your manuscript entitled "Identification of common predisposing loci to hematopoietic cancers in four dog breeds" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.
The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.
Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.
Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!
With kind regards,
Alice Ellingham
PLOS Genetics
On behalf of:
The PLOS Genetics Team
Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
plosgenetics@plos.org | +44 (0) 1223-442823
plosgenetics.org | Twitter: @PLOSGenetics
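Reviewer #3's third main comment asks how the permutation-based thresholds were calculated and whether 500 permutations are enough. The review history does not describe the authors' exact procedure. The following is therefore only a generic sketch, assuming the widely used max-statistic approach. In each permutation, the case/control labels are shuffled and the largest association statistic across all SNVs is recorded. The genome-wide threshold is then the (1 - alpha) quantile of those maxima. The trend statistic (written as N * r^2), the toy data and all function names are illustrative assumptions, not the authors' code. The review mentions GenABEL, an R package, which this Python sketch does not reproduce.

```python
import math
import random


def trend_chi2(dosages, phenotype):
    """Cochran-Armitage trend statistic written as N * r^2, where r is the
    Pearson correlation between allele dosage (0/1/2) and case status (0/1)."""
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotype))
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotype)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNV or one-class phenotype carries no signal
    return n * sxy * sxy / (sxx * syy)


def max_stat_null(genotypes, phenotype, n_perm, seed=1):
    """Null distribution of the genome-wide maximum statistic.

    Shuffling case/control labels breaks any genotype-phenotype link while
    keeping the correlation (LD) between SNVs intact."""
    rng = random.Random(seed)
    labels = list(phenotype)
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        null.append(max(trend_chi2(snv, labels) for snv in genotypes))
    return null


def genome_wide_threshold(null, alpha=0.05):
    """Empirical (1 - alpha) quantile of the permuted maxima (FWER control)."""
    ordered = sorted(null)
    k = math.ceil((1 - alpha) * len(ordered)) - 1
    return ordered[min(max(k, 0), len(ordered) - 1)]


def empirical_p(observed, null):
    """Genome-wide permutation p-value with the (r + 1) / (B + 1) correction."""
    r = sum(1 for s in null if s >= observed)
    return (r + 1) / (len(null) + 1)


def monte_carlo_se(p, n_perm):
    """Standard error of a permutation estimate of probability p from n_perm draws."""
    return math.sqrt(p * (1 - p) / n_perm)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy data: 30 SNVs typed in 80 dogs (40 cases, 40 controls).
    phenotype = [1] * 40 + [0] * 40
    genotypes = [[rng.choice((0, 1, 2)) for _ in phenotype] for _ in range(30)]
    null = max_stat_null(genotypes, phenotype, n_perm=200)
    print("5% genome-wide threshold:", round(genome_wide_threshold(null), 2))
    print("smallest attainable p with 500 permutations:",
          empirical_p(float("inf"), [0.0] * 500))
    print("SE at p = 0.05:", round(monte_carlo_se(0.05, 500), 4),
          "vs", round(monte_carlo_se(0.05, 10_000), 4))
```

The last two lines make the reviewer's concern concrete. With 500 permutations, no empirical genome-wide p-value can fall below 1/501. The Monte Carlo standard error of a tail probability near 0.05 is also about 0.0097, compared with about 0.0022 with 10,000 permutations.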

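Two comments in the review history concern imputation quality. Reviewer #3's fourth main comment asks for imputation accuracy metrics, and Reviewer #2 later suggests an "a posteriori" check based on how many hits rely only on imputed SNPs. The review history does not state which metrics the authors finally reported. One common approach, assumed here, is to hide a subset of genotypes that were actually typed, impute them, and compare the result with the truth. Two measures are computed for each SNV: best-guess genotype concordance, and the squared correlation between imputed dosage and true genotype. The sketch also computes the share of significant hits that were never directly genotyped. All function names and numbers are hypothetical.

```python
import math


def dosage_r2(truth, dosage):
    """Squared Pearson correlation between true genotypes (0/1/2) and imputed
    dosages; undefined (NaN) if either vector is constant."""
    n = len(truth)
    mt = sum(truth) / n
    md = sum(dosage) / n
    sxy = sum((t - mt) * (d - md) for t, d in zip(truth, dosage))
    sxx = sum((t - mt) ** 2 for t in truth)
    syy = sum((d - md) ** 2 for d in dosage)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy * sxy / (sxx * syy)


def masked_accuracy(true_genotypes, imputed_dosages):
    """Per-SNV (concordance, dosage r^2) for genotypes masked before imputation.

    Both arguments hold one inner list per SNV: true genotypes as 0/1/2 and
    imputed dosages as expected alternate-allele counts in [0, 2]."""
    results = []
    for truth, dosage in zip(true_genotypes, imputed_dosages):
        # Best-guess genotype = nearest integer dosage (Python's round sends
        # exact .5 ties to the even value), clipped to the valid 0..2 range.
        best_guess = [min(2, max(0, round(d))) for d in dosage]
        concordance = sum(t == g for t, g in zip(truth, best_guess)) / len(truth)
        results.append((concordance, dosage_r2(truth, dosage)))
    return results


def imputed_only_fraction(hit_ids, typed_ids):
    """Share of significant SNVs that were never directly genotyped."""
    if not hit_ids:
        return 0.0
    return sum(1 for snv in hit_ids if snv not in typed_ids) / len(hit_ids)


if __name__ == "__main__":
    # Hypothetical masked genotypes for two SNVs in five dogs.
    truth = [[0, 1, 2, 1, 0], [0, 0, 1, 1, 2]]
    dosages = [[0.1, 0.9, 1.8, 1.2, 0.0], [0.4, 0.6, 0.9, 1.1, 1.7]]
    for i, (conc, r2) in enumerate(masked_accuracy(truth, dosages), start=1):
        print(f"SNV {i}: concordance={conc:.2f}, dosage r2={r2:.3f}")
    # Hypothetical hit list: one of four significant SNVs was imputed only.
    print(imputed_only_fraction(["a", "b", "c", "d"], {"a", "b", "c"}))
```

In this toy case the second SNV reaches 80% concordance. Concordance alone can look high for rare variants, because most dogs are homozygous for the common allele. For that reason, dosage r² is usually reported alongside it.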
62 in total

1. GCTA: a tool for genome-wide complex trait analysis.

Authors: Jian Yang; S Hong Lee; Michael E Goddard; Peter M Visscher
Journal: Am J Hum Genet Date: 2010-12-17 Impact factor: 11.025

2. GenABEL: an R library for genome-wide association analysis.

Authors: Yurii S Aulchenko; Stephan Ripke; Aaron Isaacs; Cornelia M van Duijn
Journal: Bioinformatics Date: 2007-03-23 Impact factor: 6.937

3. The MTAP-CDKN2A locus confers susceptibility to a naturally occurring canine cancer.

Authors: Abigail L Shearin; Benoit Hedan; Edouard Cadieu; Suzanne A Erich; Emmett V Schmidt; Daniel L Faden; John Cullen; Jerome Abadie; Erika M Kwon; Andrea Gröne; Patrick Devauchelle; Maud Rimbault; Danielle M Karyadi; Mary Lynch; Francis Galibert; Matthew Breen; Gerard R Rutteman; Catherine André; Heidi G Parker; Elaine A Ostrander
Journal: Cancer Epidemiol Biomarkers Prev Date: 2012-05-23 Impact factor: 4.254

4. Canine Cancer Genomics: Lessons for Canine and Human Health.

Authors: Elaine A Ostrander; Dayna L Dreger; Jacquelyn M Evans
Journal: Annu Rev Anim Biosci Date: 2018-11-12 Impact factor: 8.923

5. [Review] Comparative oncology: what dogs and other species can teach us about humans with cancer.

Authors: Joshua D Schiffman; Matthew Breen
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2015-07-19 Impact factor: 6.237

6. Fhit protein is preferentially expressed in the nucleus of monocyte-derived cells and its possible biological significance.

Authors: P Zhao; N Hou; Y Lu
Journal: Histol Histopathol Date: 2006-09 Impact factor: 2.303

7. Epidemiology, pathology, and genetics of histiocytic sarcoma in the Bernese mountain dog breed.

Authors: Jérôme Abadie; Benoit Hédan; Edouard Cadieu; Clotilde De Brito; Patrick Devauchelle; Catherine Bourgain; Heidi G Parker; Amaury Vaysse; Patricia Margaritte-Jeannin; Francis Galibert; Elaine A Ostrander; Catherine André
Journal: J Hered Date: 2009-06-16 Impact factor: 2.645

8. Risk Factors Associated with Development of Histiocytic Sarcoma in Bernese Mountain Dogs.

Authors: A Ruple; P S Morley
Journal: J Vet Intern Med Date: 2016-05-10 Impact factor: 3.333

9. Risk-modeling of dog osteosarcoma genome scans shows individuals with Mendelian-level polygenic risk are common.

Authors: Isain Zapata; Luis E Moraes; Elise M Fiala; Sara Zaldivar-Lopez; C Guillermo Couto; Jennie L Rowell; Carlos E Alvarez
Journal: BMC Genomics Date: 2019-03-19 Impact factor: 3.969

10. [Review] What animals can teach us about evolution, the human genome, and human disease.

Authors: Kerstin Lindblad-Toh
Journal: Ups J Med Sci Date: 2020-02-14 Impact factor: 2.384

6 in total

1. Increased risk of cancer in dogs and humans: a consequence of recent extension of lifespan beyond evolutionarily-determined limitations?

Authors: Aaron L Sarver; Kelly M Makielski; Taylor A DePauw; Ashley J Schulte; Jaime F Modiano
Journal: Aging Cancer Date: 2022-02-23

2. Four novel genes associated with longevity found in Cane corso purebred dogs.

Authors: Evžen Korec; Lenka Ungrová; Jiří Hejnar; Adéla Grieblová
Journal: BMC Vet Res Date: 2022-05-19 Impact factor: 2.792

3. Multi-omics approach identifies germline regulatory variants associated with hematopoietic malignancies in retriever dog breeds.

Authors: Jacquelyn M Evans; Heidi G Parker; Gerard R Rutteman; Jocelyn Plassais; Guy C M Grinwis; Alexander C Harris; Susan E Lana; Elaine A Ostrander
Journal: PLoS Genet Date: 2021-05-13 Impact factor: 5.917

4. [Review] LncRNAs in domesticated animals: from dog to livestock species.

Authors: Sandrine Lagarrigue; Matthias Lorthiois; Fabien Degalez; David Gilot; Thomas Derrien
Journal: Mamm Genome Date: 2021-11-13 Impact factor: 3.224

5. Comparison of the Clinical Characteristics of Histiocytic Sarcoma in Bernese Mountain Dogs and Flat-Coated Retrievers.

Authors: Suzanne A Erich; Jane M Dobson; Erik Teske
Journal: Vet Sci Date: 2022-09-11

6. Genome-Wide Analyses for Osteosarcoma in Leonberger Dogs Reveal the CDKN2A/B Gene Locus as a Major Risk Locus.

Authors: Anna Letko; Katie M Minor; Elaine M Norton; Voichita D Marinescu; Michaela Drögemüller; Emma Ivansson; Kate Megquier; Hyun Ji Noh; Mike Starkey; Steven G Friedenberg; Kerstin Lindblad-Toh; James R Mickelson; Cord Drögemüller
Journal: Genes (Basel) Date: 2021-12-09 Impact factor: 4.096

6 in total