Literature DB >> 26859385

A Glance at Recombination Hotspots in the Domestic Cat.

Hasan Alhaddad1, Chi Zhang2, Bruce Rannala3, Leslie A Lyons4.   

Abstract

Recombination has essential roles in increasing genetic variability within a population and in ensuring successful meiotic events. The objective of this study is to (i) infer the population-scaled recombination rate (ρ), and (ii) identify and characterize regions of increased recombination rate for the domestic cat, Felis silvestris catus. SNPs (n = 701) were genotyped in twenty-two East Asian feral cats (random bred). The SNPs covered ten different chromosomal regions (A1, A2, B3, C2, D1, D2, D4, E2, F2, X) with an average region size of 850 Kb and an average SNP density of 70 SNPs/region. The Bayesian method in the program inferRho was used to infer regional population recombination rates and hotspots localities. The regions exhibited variable population recombination rates and four decisive recombination hotspots were identified on cat chromosome A2, D1, and E2 regions. As a description of the identified hotspots, no correlation was detected between the GC content and the locality of recombination spots, and the hotspots enclosed L2 LINE elements and MIR and tRNA-Lys SINE elements.

Entities:  

Mesh:

Year:  2016        PMID: 26859385      PMCID: PMC4747544          DOI: 10.1371/journal.pone.0148710

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Recombination is a major source of genetic variation within sexually reproducing organisms, and is necessary for the proper alignment and segregation of homologous chromosomes during meiosis. New combinations of parental alleles across loci are generated via recombination and transmitted to succeeding generations. Lack of recombination may cause failure of meiotic division, or formation of gametes with chromosome number abnormalities (aneuploidy), which are often detrimental. Localized chromosomal regions with high recombination rates relative to surrounding areas are referred to as recombination “hotspots” [1]. Known hotspots in mice and human are generally 1–2 kb regions of high recombination rates surrounded by regions of low recombination [2]. In humans, recombination hotspots are distributed about every 200 Kb [3] and over 25 000 hotspots have been identified [4]. The first human recombination hotspot was identified using restriction site polymorphisms in β-globin gene cluster [5]. A higher resolution human hotspot was localized using sperm typing via PCR in a region that harbors GC-rich mini-satellite (MS32) as a molecular signature [6]. Several genomic features exhibit correlations with recombination hotspots. The GC content has been found to be positively correlated with recombination hotspots in humans [4], dog [7], pig [8], and chicken [9] but not in mice [10]. Long terminal repeats (LTR), long interspersed elements (LINE), and short interspersed elements (SINE) were also observed to be positively correlated with the locality of recombination hot spots in human [4,11], mice [10] and pigs [8]. DNA motifs (cis) have been identified to be associated with recombination hot spots both in humans [4,12,13] and in other organisms [14]. In addition to the cis elements presented by the motifs, a trans element, PRDM9, was found to be a major determinant of recombination hotspots in human and mice [15,16] but not in dog and related wild relatives [7,17,18]. PRDM9 is thought to bind to a DNA motif via a zinc finger domain, altering the chromatin structure through methylation, and recruiting recombination molecular machinery [19,20]. Coarse-scale recombination in cats has previously been investigated through linkage map analyses using sparsely distributed microsatellite markers [21,22]. The objective of this study is to investigate fine-scale recombination rates, and recombination hotspots, in the domestic cat using population-level data for dense SNPs in ten selected genomic regions on ten different chromosomes.

Materials and Methods

Samples and genotypes

SNP genotype data of twenty-two feral cats from China were obtained as previously described in [23]. The data were generated using a custom Illumina GoldenGate array that represent ten different cat chromosomal regions (A1, A2, B3, C2, D1, D2, D4, E2, F2, X) and were composed of 1536 markers nearly equally distributed across the ten regions. Markers’ positions were updated to their location in the 6.2 cat genome assembly (http://genome.ucsc.edu/). To ensure successful inference, three criteria were used to filter the SNPs: (i) SNPs mapped to a single chromosomal location on the most recent genome assembly of cat with 100% sequence match, (ii) SNPs exhibiting a genotype call success rate of ≥ 80%, and (iii) SNPs possessing a minor allele frequency of ≥ 0.1. The final dataset, included in this study, is composed of 701 markers distributed over the ten regions (S1 Table). The genomic locations of the regions and related summaries are shown in Table 1 (see also Fig 1 and S2 Fig).
Table 1

Summary of the chromosomal regions analyzed for recombination in cats.

Chr.No. SNPsPosition (bp) 1Region size (bp)Mean inter-SNP distanceMean Rho (ρ/Mb)Number bins with BF 2
StartEnd< 100≥ 100Total
A153187 299 691188 082 901783 21015 061127.68391503915
A27398 172 53598 956 464783 92910 890186.513904153919
B388126 081 886126 859 181777 2958 934202.67388603886
C2534 360 3035 085 917725 61413 950291.85362803628
D17050 348 84551 177 183828 33812 000176.724131104141
D28659 655 06160 470 312815 2519 591192.93407404074
D47345 514 02646 304 562790 53610 980259.47395103951
E29042 987 23543 794 319807 0849 068309.484002324034
F27810 965 56212 451 3501 485 78819 300185.95742707427
X37100 018 322100 740 914722 59220 07077.45361303613
Mean70--851 96412 984201425364259
Total701--8 519 637--425315742588

1 Start and end positions are based on the 2011 cat genome assembly (ICGSC Felis_catus 6.2)

2 BF: Bayes factor.

Fig 1

Recombination overview of chromosomes A2, D1, and E2 regions.

(a-c) Posterior recombination rates (population size scaled) across chromosomes A2, D1, and E2 regions, respectively. Solid line shows the whole recombination rates, while the dashed line shows the background recombination rates. (d-f) Posterior probability of hotspots along chromosomes A2, D1, and E2 regions, respectively. (g-i) Bayes factor of hotspots for chromosomes A2, D1, and E2 regions, respectively. Horizontal dotted line corresponds to Bayes factor of 100 in a log10 scale. Position and distribution of SNPs included in the recombination analysis are at the bottom.

1 Start and end positions are based on the 2011 cat genome assembly (ICGSC Felis_catus 6.2) 2 BF: Bayes factor.

Recombination overview of chromosomes A2, D1, and E2 regions.

(a-c) Posterior recombination rates (population size scaled) across chromosomes A2, D1, and E2 regions, respectively. Solid line shows the whole recombination rates, while the dashed line shows the background recombination rates. (d-f) Posterior probability of hotspots along chromosomes A2, D1, and E2 regions, respectively. (g-i) Bayes factor of hotspots for chromosomes A2, D1, and E2 regions, respectively. Horizontal dotted line corresponds to Bayes factor of 100 in a log10 scale. Position and distribution of SNPs included in the recombination analysis are at the bottom. Three SNPs residing in a recombination hotspot on chromosome E2 region (see results) were chosen for genotype validation via sequencing. Primers were designed to flank the SNPs (S2 Table) and PCR was performed using DNA Engine Gradient Cycler (MJ Research, GMI, Ramsey, MN) using the following conditions. For each reaction, 2μl of DNA was used in a 1.5mM magnesium concentration with 1μM primers in a total reaction volume of 20μl. The annealing temperature of all primers was 62°C. The PCR protocol was as follows: initial denaturation at 94°C for 5 min followed by 40 cycles of 94°Cx45 sec, 62°Cx20 sec, 72°Cx30 sec, and a final extension at 72°C for 20 min. The PCR products were purified with ExoSap (USB, Cleveland, OH) per the manufacture’s recommendations and directly sequenced using the BigDye terminator Sequencing Kit v3.1 (Applied Biosystems, Foster City, CA) as previously implemented [24]. Sequences were verified and aligned using the software sequencer version 4.10 (Gene Codes Corp., Ann Arbor, MI).

Recombination inference

The program inferRho uses the coalescent with recombination model [25-27] in a full Bayesian framework to infer recombination rates and hot spots along the chromosome regions [28,29]. The evolutionary relationship of the SNPs is represented by the ancestral recombination graph (ARG), which is an unobserved random variable that is integrated over using Markov Chain Monte Carlo (MCMC). In the variable recombination rate model, the population-scaled recombination rate is ρ = 4Nc, where N is the effective population size, c is the recombination rate per generation in cM/Mb between marker i and i +1 (i = 1,…, k − 1, and k is the total number of SNPs). The total recombination rate (c) consists of the background crossing-over rate and recombination hotspots (arising according to a Markov process) [28,29]. As genetic data (SNPs) from a population sample of unrelated individuals only provide information about evolutionary distance, the parameters in coalescent estimators are typically scaled by the effective population size (i.e. θ = 4Nμ, where μ is the substitution rate per site per generation; and ρ = 4Nc). To convert ρ into standard unit (cM/Mb), we need to divide it by the ploidy factor and effective population size (4N for the nuclear genome of cats), which typically comes from other sources of information. To make the data analysis computationally tractable, each chromosomal region was divided into blocks, with 20 SNPs per block (e.g., chromosome A1 region with 53 SNPs was divided into 3 blocks, 20 SNPs in the first and second block and 13 SNPs in the third block). Blocks from each chromosomal region are assumed to share the same population size parameter (θ) but have an independent ARG for each block. Preliminary runs were executed to determine the appropriate length and thinning interval of the MCMC chain for each chromosomal dataset. The last half of the MCMC samples (1000 samples) was used to estimate the recombination rates (ρ) and to plot them with respect to the SNP marker positions. The mean population-scaled recombination rate per site for each chromosomal region was calculated using the following equation: in which d is the distance between marker i and i+1, and k is the total number of markers in the region. The numerator is total population-scaled recombination rate of the region, while the denominator is the length of this region in Mb.

Designation of recombination hot spots

We calculated the Bayes factors [30] to locate the positions of hotspots and represent relative odds of a hotspot being present. Each chromosomal region was divided into bins (200 bp per bin) for analysis of posterior samples, which contain the start and end positions of the hotspots, to estimate the probability of having hotspot (p) for each bin. The corresponding probability of hotspot for each bin (q) from the prior was obtained by running the program without data (e.g., constant likelihood) but maintaining the same sample size and marker positions. The Bayes factor (BF) is defined as the ratio of the posterior and prior odds: The odds ratio measures the proportional change in odds favoring a hotspot in the region (versus the prior odds) that results from including the data (SNPs). The total number of bins depends on the length of the region; thus the range of j is variable among the ten chromosome regions. Recombination hotspots are defined to have at least two consecutive bins with BF ≥ 100 whereas regions of BF < 100 are denoted as “neutral” bins awaiting further investigation (cf. section 3.2 in [30]).

GC content and genomic elements analyses

The GC content was calculated for the sequences of each bin using the function CG.content of package APE in R [31]. Variation and repeat elements within each of the ten chromosomal regions were downloaded from UCSC genome browser using RepeatMasker for the v6.2 cat genome assembly. Elements were analyzed separately for bins with BF < 100 and BF ≥ 100 (hotspot bins). The elements within neutral bins (BF < 100) were to provide the general overview of the elements within each chromosomal region.

Results

The dataset is composed of 701 SNPs on ten different cat chromosomal regions with an average of 70 SNPs/region. The chromosome E2 region harbors the highest number of SNPs (n = 90) while chromosome X region has the lowest (n = 37). SNPs are distributed across the regions with an average distance between SNPs of 12 Kb and a range of 145 bp– 651 Kb (Table 1). Four SNPs were selected for genotype verification using direct sequencing. The sequencing results were concordant to the genotypes obtained previously using the genotyping platform.

Recombination rates and hotspots

The ten regions on chromosomes: A1, A2, B3, C2, D1, D2, D4, E2, F2, and X, were analyzed using inferRho. The lengths of the MCMC chains were determined by the trace plot and effective sample size (ESS) of the parameters as: 600 000 iterations for A1, C2; 800 000 iterations for A2, D1, D4, F2; 1 000 000 iterations for B3, D2, E2; and 500 000 iterations for X. Two thousand posterior samples for each chromosomal dataset were obtained after thinning, while the first half was discarded as burn-in. Using a parallel computing approach, each run could be accomplished within one to two weeks on a cluster with 2 Opteron 270 (2.0 GHz) processors per node. Population recombination rate (ρ) is plotted for each region (Fig 1a–1c and S1a Fig). The mean recombination rate (ρ) across all regions is 200 per Mb. The E2 region exhibits the highest mean rate (309 /Mb) whereas X region has the lowest (77 /Mb) (Table 1). The difference between the background recombination rate and whole recombination rate is indicative of recombination spots. This difference is most noticeable in regions of chromosomes A2, D1, and E2 (Fig 1a–1c). The posterior probability of hotspot was calculated for bins of size 200 bp in each region (Fig 1d–1f and S1b Fig). Chromosomes A2, D1, and E2 show distinctly high posterior probabilities (> 0.6) in four localized areas (Fig 1d–1f). The posterior probabilities for the other seven chromosome regions (A1, B3, C2, D2, D4, F2, and X) are less than 0.2, indicating little support for elevation of recombination rate in these areas. The Bayes factors (Fig 1g–1i and S1c Fig) are consistent in pattern with the posterior probabilities (Fig 1d–1f and S1b Fig) and recombination rates (Fig 1a–1c and S1a Fig). Approximately 99% (n = 42,531) of bins were classified as “neutral” (BF < 100) across all regions examined. The hot spots were found in only three chromosomal regions (A2, D1, and E2) and represented ~ 0.13% (n = 57) of all bins studied. Summaries of the numbers and distribution of bins within each chromosomal region are provided in Table 1. The hotspots had size of 3 Kb in A2, 1.8 Kb in D1, and 1.8 Kb for the first and 4.6 Kb for the second in E2. The distance between the two hotspots on E2 was ~37.4 Kb (Table 2).
Table 2

Summary of recombination “hotspots” in cats selected regions.

Chr.Position (bp)Size (Kb)# binsMeanElements
StartEndPPBFGCLINESINELCLTRSRDNA
A298 237 70098 240 7003150.54339.850.35L1MEgSINEC-Fc2 MIR MIRbAT-rich--MER46C
D150 430 40050 432 2001.890.56271.900.38L2aSINEC-Fc2 MIRb (2)-MLT1J(TA)n-
E242 987 50042 989 3001.890.54418.270.40L2aMIR3----
E243 026 70043 031 3004.6230.65293.300.42L2b (2) L2aSINEC-Fc2 MIR--(GA)n (TG)nMER20
Mean2.80140.57330.830.39

PP: mean posterior probability of hotspot,

BF: mean Bayes factor,

GC: mean GC content within interval,

Elements: variation and repeat elements within interval (start and end positions)

(LINE: long interspersed element,

SINE: short interspersed element,

LC: low complexity,

LTR: long terminal repeat,

SR: simple repeat,

DNA: MER—Medium reiterated repeats).

PP: mean posterior probability of hotspot, BF: mean Bayes factor, GC: mean GC content within interval, Elements: variation and repeat elements within interval (start and end positions) (LINE: long interspersed element, SINE: short interspersed element, LC: low complexity, LTR: long terminal repeat, SR: simple repeat, DNA: MER—Medium reiterated repeats).

GC content analysis

The log10 of the Bayes factor was plotted as a function of the GC content in S2 Fig. Pearson’s correlation test revealed a positive correlation between GC and log10(Bayes factor) (cor = 0.077, p < 0.0001) but was not suggestive of strong correlation. Moreover, no significant differences in the mean GC content of each class of bins were observed (t-test, p = 0.05) (S2 Fig). The mean GC contents of the hotspots are shown in Table 2.

Repeat elements analysis

The four hotspot regions contained 22 repeat and variation elements (Table 2, Fig 2a). SINE elements constitute the highest proportion (40%) of the elements present in the hotspots followed by LINE elements (27%). Within LINE elements, L2 elements were present in three of the four hotspot regions. MIR family elements were present in all hotspot regions and tRNA-Lys family elements are present in three of the four elements. Low complexity, long terminal repeats, simple repeats, and DNA elements were inconsistently present across the hotspot regions.
Fig 2

A qualitative overview of variation and repeat elements in the cat chromosomal regions studied.

a) Variation and repeat element (n = 22) within four hotspots. All elements within the hotspots are shown at the tips of the outer circle. b) Variation and repeat element (n = 16,798) within “neutral” bins. Elements present more than a hundred times within the “neutral” bins are shown at the tips of the outer circle. The inner circle represents the elements’ classes, middle circle represents elements’ families, and outer circle represent the individual elements. Note: the Fig is intended a general description rather than a quantitative comparison.

A qualitative overview of variation and repeat elements in the cat chromosomal regions studied.

a) Variation and repeat element (n = 22) within four hotspots. All elements within the hotspots are shown at the tips of the outer circle. b) Variation and repeat element (n = 16,798) within “neutral” bins. Elements present more than a hundred times within the “neutral” bins are shown at the tips of the outer circle. The inner circle represents the elements’ classes, middle circle represents elements’ families, and outer circle represent the individual elements. Note: the Fig is intended a general description rather than a quantitative comparison. Variation and repeat elements within neutral regions where investigated to get a picture of the general distribution of repeat elements (Fig 2b). In neutral regions, the SINE elements represent the highest proportion of elements, 34%. The tRNA-Lys SINE elements constitute 22% and MIR SINE elements represent 12%. The second highest repeat elements was the LINE elements, 29%, where L1 elements represent ~17% and L2 elements represent 9.5% of all elements in the neutral regions.

Discussion

Advances in population genetic theory and technology supports the estimation of recombination rates directly from genotype data on population samples, overcoming the limitations of sperm typing or using large extended families [32]. The strategy of the population genetics based approach is to use information on the number of recombination events that have occurred in the history of the population, which can be detected by modeling the patterns of genetic variation expected to be present in randomly selected individuals. The model accounts for coalescent and recombination events in an Ancestral Recombination Graph (ARG). Markers will have coalescent trees that are likely to vary across the genome. In theory, all markers in a chromosome are correlated by an ARG. However, the size of the ARG may grow much faster than linear with the increasing number of SNPs, making it computational intractable to simultaneously analyze all markers in an analysis. In practice, the data are usually partitioned into blocks. Blocks from the same chromosomal region are assumed to share the same population size parameter (θ) but have an independent ARG for each block. For fast computation, some methods use an approximate likelihood instead of the full likelihood calculation, for example, the composite-likelihood method implemented in LDhat [33] and PAC-likelihood method implemented in PHASE [34]. Approximate-likelihood methods may be feasible to apply to large genomic regions, but may lack power to detect a moderate or low rate of recombination. Full-likelihood methods, such as implemented in inferRho, use all the information in the data and should therefore provide more accurate estimates [28,29]. The analysis of recombination hotspots in cats, presented here, constitutes the first application of the program inferRho to non-human data and the first analysis of fine-scale recombination rates in cats. The population recombination rate was found to be variable between the regions analyzed and, as expected, the mean recombination rate of X chromosome regions was lower than that of any autosomal regions. This variation in recombination rates and the notable reduced rate outside of the pseudoautosomal region of the X chromosome are in agreement with observations of recombination in human [4] and dog [7]. The latter result is expected due to that fact that recombination outside of the pseudoautosomal region occurs only in females. Four decisive hot spots were identified on three chromosomes: A2, D1 and E2. The localities of the hotspots are in agreement with the localities of increased posterior probabilities and the general topography of recombination rates as expected. Acknowledging the limitation posed by the total size of the regions analyzed (total ~ 8.5 Mb) compared to the size of the genome and the lack of power to perform correlation and element enrichment analyses, the following observations have been made: (i) the cat hotspots, identified in this study, show no distinct positive correlation with GC content. (ii) The four hotspots contain at least one L2 LINE element and MIR and tRNA-Lys SINE elements. (iii) The similarity of the repeat elements in cat hotspots compared to other mammals might suggest similar recombination mechanisms. This study represents a glimpse of recombination hotspots in cats and only an initial step toward understanding recombination in cats. The markers are sparsely sampled in some of the genomic regions, which may reduce power for inferRho to detect a large number of hotspots. As cat resources develop, genome-wide analyses could be performed allowing more definitive conclusions to be reached. Nonetheless, our preliminary description of the recombination landscape, and our finding that hotspots are present, helps shed light on the mechanism of recombination in cats compared to other species, furthering our understanding of the patterns of variation generated by recombination in cats, and potentially leading to better implementation of efficient disease mapping strategies in cats.

Recombination overview of seven chromosomal regions.

(a) Estimated recombination rates (population size scaled crossing-over rate) along each region. Solid line shows the estimated recombination rates between markers, while the dashed line shows the background recombination rates. (b) Posterior probability of hotspots across each region. (c) Bayes factor of hotspots along each region. Horizontal dashed line corresponds to Bayes factor of 10 in a log10 scale. (JPG) Click here for additional data file.

Analysis of GC content of recombination spots in cats.

(a) The GC contents (x-axis, 200 bp each bin) against the log10 of Bayes factors (y-axis). Gray circles represent neutral bins with BF < 100, and black circles represent hotspot bins with BF ≥ 100. (b) Boxplot of the GC content in the two classes of bins listed in (a). (JPG) Click here for additional data file.

SNP genotype data used to infer recombination hotspots in selected regions of the cat genome.

(XLSX) Click here for additional data file.

Primers of selected SNPs located in chromosome E2 hotspot.

(DOC) Click here for additional data file.
  30 in total

1.  A coalescent-based method for detecting and estimating recombination from gene sequences.

Authors:  Gil McVean; Philip Awadalla; Paul Fearnhead
Journal:  Genetics       Date:  2002-03       Impact factor: 4.562

2.  A fine-scale map of recombination rates and hotspots across the human genome.

Authors:  Simon Myers; Leonardo Bottolo; Colin Freeman; Gil McVean; Peter Donnelly
Journal:  Science       Date:  2005-10-14       Impact factor: 47.728

Review 3.  Insights into recombination from population genetic variation.

Authors:  Garrett Hellenthal; Matthew Stephens
Journal:  Curr Opin Genet Dev       Date:  2006-10-16       Impact factor: 5.578

4.  High-resolution mapping of crossovers in human sperm defines a minisatellite-associated recombination hotspot.

Authors:  A J Jeffreys; J Murray; R Neumann
Journal:  Mol Cell       Date:  1998-08       Impact factor: 17.970

5.  Nonuniform recombination within the human beta-globin gene cluster.

Authors:  A Chakravarti; K H Buetow; S E Antonarakis; P G Waber; C D Boehm; H H Kazazian
Journal:  Am J Hum Genet       Date:  1984-11       Impact factor: 11.025

6.  A molecular map of the immune response region from the major histocompatibility complex of the mouse.

Authors:  M Steinmetz; K Minard; S Horvath; J McNicholas; J Srelinger; C Wake; E Long; B Mach; L Hood
Journal:  Nature       Date:  1982-11-04       Impact factor: 49.962

7.  A genetic linkage map of microsatellites in the domestic cat (Felis catus).

Authors:  M Menotti-Raymond; V A David; L A Lyons; A A Schäffer; J F Tomlin; M K Hutton; S J O'Brien
Journal:  Genomics       Date:  1999-04-01       Impact factor: 5.736

8.  The fine-scale structure of recombination rate variation in the human genome.

Authors:  Gilean A T McVean; Simon R Myers; Sarah Hunt; Panos Deloukas; David R Bentley; Peter Donnelly
Journal:  Science       Date:  2004-04-23       Impact factor: 47.728

9.  APE: Analyses of Phylogenetics and Evolution in R language.

Authors:  Emmanuel Paradis; Julien Claude; Korbinian Strimmer
Journal:  Bioinformatics       Date:  2004-01-22       Impact factor: 6.937

10.  PRDM9 binding organizes hotspot nucleosomes and limits Holliday junction migration.

Authors:  Christopher L Baker; Michael Walker; Shimpei Kajita; Petko M Petkov; Kenneth Paigen
Journal:  Genome Res       Date:  2014-03-06       Impact factor: 9.043

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.