Analysis of human meiotic recombination events with a parent-sibling tracing approach.

Yun-Shien Lee, Angel Chao, Chun-Houh Chen, Tina Chou, Shih-Yee Mimi Wang, Tzu-Hao Wang.

Abstract

BACKGROUND: Meiotic recombination ensures that each child inherits distinct genetic materials from each parent, but the distribution of crossovers along meiotic chromosomes remains difficult to identify. In this study, we developed a parent-sibling tracing (PST) approach from previously reported methods to identify meiotic crossover sites in the GEO GSE6754 data set. This approach requires only the single nucleotide polymorphism (SNP) data of two-generation pedigrees comprising both parents and at least two children.
RESULTS: Compared to other SNP-based algorithms (identity by descent or pediSNP), fewer uninformative SNPs were derived with the use of PST. Analysis of a GEO GSE6754 data set containing 2,145 maternal and paternal meiotic events revealed that the pattern and distribution of paternal and maternal recombination sites vary along the chromosomes. Lower crossover rates near the centromeres were more prominent in males than in females. Based on analysis of repetitive sequences, we also showed that recombination hotspots are positively correlated with SINE/MIR repetitive elements and negatively correlated with LINE/L1 elements. The number of meiotic recombination events was positively correlated with the number of shorter tandem repeat sequences.
CONCLUSIONS: The advantages of the PST approach include the ability to use only two-generation pedigrees with two siblings and the ability to perform gender-specific analyses of repetitive elements and tandem repeat sequences while including fewer uninformative SNP regions in the results.

Year:  2011        PMID: 21867557      PMCID: PMC3186786          DOI: 10.1186/1471-2164-12-434

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

Meiotic recombination is important for generating genetic diversity. Meiotic recombination occurs between homologous chromosomes during chiasmata formation, a process that is required for normal chromosomal segregation during meiosis. While variation in recombination rates is a ubiquitous feature of the human genome [1], the mechanisms governing the distribution of crossovers along meiotic chromosomes remain largely unclear, with the exception of the recent discovery that Prdm9 is involved in the activation of mammalian recombination hotspots [2-5]. Sex-specific effects [6-8] on regional meiotic recombination have been described. Recombination rates are approximately 1.7-fold higher in female meiosis than in male meiosis. In addition, crossover rates in males are 5-fold lower near centromeres but 10-fold higher near telomeres compared with those in females [9]. These differences could be related to sex-specific patterns of initiation of synapses between homologs. For example, synaptonemal complex lengths are shorter in males than in females [10], and synapses appear preferentially in subtelomeric regions in males [11]. Meiotic recombination events can be measured directly or indirectly [12]. Physical crossovers between homologous chromosomes, indicating meiotic recombination events, can be directly observed at specific time points during spermatogenesis [13]. Alternatively, crossovers may be analyzed directly in cytogenetic analysis by labeling meiosis-related proteins, such as MLH1 [14]. Despite the unequivocal value of direct analysis, these techniques are labor-intensive and precision is limited. Therefore, most analyses of human recombination currently rely on indirect approaches such as genetic linkage analysis of human pedigrees. This involves tracking the inheritance of alleles at multiple polymorphic markers (short tandem repeat polymorphisms, STRP; or single nucleotide polymorphisms, SNP) along the chromosomes across generations [15-17]. 
Molecular markers in individuals with known pedigrees can be traced to an ancestral identity using either the identity by descent (IBD) method [12] or the identity by state (IBS) method [18]. Two alleles at a particular locus in the progeny are assumed to be identical if they are derived from an identical locus in a common ancestor. The IBD method requires knowledge of the genotypes of three generations to determine if the DNA segments are identical by descent from each generation. In the IBD method, shared results between each child and his/her paternal and maternal grandparents are analyzed separately. A paternal recombination event is detected when the IBD sharing "switches" from one paternal grandparent to the other. The same procedure applies to the maternal side. For instance, a recombination event can be localized to the interval between the 2 SNP sites at which the sharing switches (Figure 1A and Additional File 1A). Therefore, application of the IBD method requires the pedigrees of three generations [12]. The IBS method was used to detect meiotic recombination sites between individuals by analyzing allele sharing between siblings [18]. Recently, Ting et al. also proposed another method for identifying meiotic recombination patterns based on two-generation pedigrees (pediSNP) [19]. In the pediSNP method, genotypes of two children are analyzed and compared with the genotype of one parent [19].
Figure 1

Different types of pedigrees are required for determining meiotic recombination sites by various methods. (A) Three-generation pedigrees are required for the identity by descent (IBD) method, and (B) a complete two-generation pedigree is required for the parent-sibling tracing (PST) method. In the IBD method, the 'A' and 'B' alleles in child 1 were required to originate from the grandmother and grandfather, respectively. In the PST approach, when the paternal genotype was 'Aa' and the maternal genotype was 'AA', children with 'Aa' and 'AA' were coded as "0: not identical between siblings". If both children were 'Aa' and 'Aa' (or 'AA' and 'AA'), they were coded as "1: identical between siblings" (identical genotype origin for both children). GF, grandfather; GM, grandmother; FA, father; MO, mother; CH1 and CH2, child 1 and child 2.

Based on the distribution of SNPs in both parents and multiple siblings, meiotic cross sites in human chromosomes can be identified. This method was first proposed by Coop et al. in 2008 to trace the "informative markers" transmitted by the father to each offspring [6]. They defined the "informative markers" as SNPs that are heterozygous in the father and homozygous in the mother. In 2009, Chowdhury et al. used two datasets, namely, the Autism Genetic Research Exchange (AGRE) and the Framingham Heart Study (FHS), to characterize the variation in recombination phenotypes [20]. They analyzed sex differences and recombination jungles across the human genome, and described the gene loci associated with recombination phenotypes [20]. In this study, we have used a parent-sibling tracing (PST) approach, which was derived from two previous reports [6,20], to analyze the Genomic Medicine Research Core Laboratory, Taiwan (GMRCL) dataset of Affymetrix SNP6.0 arrays which consists of 900 K SNP markers and the GSE6754 dataset from Gene Expression Omnibus (GEO) [21], which consists of 853 families.
Our analyses of this dataset of 2,145 meioses resulted in a 1-Mb-resolution recombination map. In addition, we were able to characterize the relationships between recombination sites and repetitive elements as well as the relationships between recombination sites and tandem repeats sequences.

Results

Comparison of two methods of detecting meiotic recombination sites

We used the GMRCL dataset of 900 K SNPs as a reference standard for comparison between the PST approach (Figure 1B) and previous approaches such as the IBD method [12] (Figure 1A). The code calling schema of PST is depicted in Figure 1B and Additional File 1B. Using chromosome 1 as an example, IBD analysis in both children could define the sites of meiotic recombination for paternal gametes. In child 1 and child 2, we observed 1 and 4 meiotic recombination events on their paternal gametes, respectively (Figures 2A and 2B). Using the PST approach, we could analyze the paternal genotypes for both children. When the paternal genotype was Aa and the maternal genotype was AA, children with Aa and AA were coded as "0: not identical between siblings". If both children were Aa and Aa [or (AA and AA)], they were coded as "1: identical between siblings" (identical genotype origin for both children). The PST approach (Figure 2C) detected the recombination sites of the combinatorial results for child 1 and child 2 as determined by IBD (Figures 2A and 2B). These results indicate that, using the SNP information of only two generations, PST can identify the origin of the recombination site. For the IBD method, information from three generations is required to determine whether the origin is from the grandfather or the grandmother. The 43 recombination sites identified in the GMRCL dataset using the IBD and PST methods are shown in Additional file 2.
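The coding rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `pst_code`, the two-character genotype strings, and the choice to return `None` for uninformative SNPs are all assumptions made for this example. It covers paternal tracing only (father heterozygous, mother homozygous).

```python
from typing import Optional


def is_heterozygous(genotype: str) -> bool:
    """True when a two-allele genotype call carries two different alleles."""
    return len(genotype) == 2 and genotype[0] != genotype[1]


def pst_code(father: str, mother: str, child1: str, child2: str) -> Optional[int]:
    """Hypothetical PST code for one SNP when tracing the paternal gamete.

    Returns 1 if the siblings share the same genotype, 0 if they differ,
    and None when the SNP is uninformative (missing call, father not
    heterozygous, or mother not homozygous).
    """
    calls = (father, mother, child1, child2)
    if any(len(g) != 2 for g in calls):
        return None  # missing or malformed genotype call
    # Sort alleles so that "aA" and "Aa" compare as equal.
    fa, mo, c1, c2 = ("".join(sorted(g)) for g in calls)
    if not is_heterozygous(fa) or is_heterozygous(mo):
        return None  # not an informative marker for paternal tracing
    return 1 if c1 == c2 else 0


print(pst_code("Aa", "AA", "Aa", "AA"))  # siblings differ
print(pst_code("Aa", "AA", "Aa", "Aa"))  # siblings identical
```

Under this sketch, a recombination event in either sibling shows up as a change of the code between consecutive informative SNPs along the chromosome. Swapping the roles of the two parents gives the corresponding code for the maternal gamete.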
Figure 2

The paternal recombination sites on chromosome 1 of child 1 and child 2 (CH1 and CH2, defined in Figure 1) in the GMRCL dataset were defined using the identity by descent (IBD) method (A, B) and the parent-sibling tracing (PST) approach (C). The grandmother and grandfather origin of paternal recombination is indicated as GM and GF, respectively. Children with identical or not identical origin are indicated as 1 and 0, respectively. Panels D and E are enlarged views of the 114.6-114.7 Mb region on chromosome 1 shown in panels B and C, respectively, which is indicated by the black arrows. D and E: the SNP sites (open circles) that could not be mapped to either GF or GM in the IBD method, or to either an identical or non-identical status using the PST approach, are indicated as uninformative SNPs. The calling schema of the IBD and PST methods is shown in Additional File 1. The chromosomal regions without any SNP site in the Affymetrix Genome-Wide Human SNP 6.0 arrays are marked as gray blocks (A to C).

Comparison of the code calling schemas between the IBD and PST methods showed that IBD identified fewer genotyping combination calls than the PST approach. For instance, when we analyzed the recombination sites in the 100-kb genomic region located at 114.6 Mb on chromosome 1 (Figures 2B and 2C, indicated with the arrow), the numbers of uninformative SNPs in the recombination site for the IBD and PST methods were 22 and 19, respectively (Figures 2D and 2E), resulting in uninformative regions of 54 kb for the IBD method (Figure 2D) and 48 kb for the PST approach (Figure 2E), respectively. The use of the IBD and PST methods in the GMRCL sample led to the identification of 43 paternal recombination sites in child 1 and child 2. The mean numbers of uninformative SNP for the 43 paternal recombination sites were 71.2 and 36.7 for the IBD and PST methods, respectively (Table 1).
The mean size of the uninformative regions for the 43 paternal recombination sites was 253 ± 349 kb (mean ± SD; median 110 kb, Q1-Q3 58-336 kb) for the IBD method and 167 ± 391 kb (median 60 kb, Q1-Q3 23-157 kb) for the PST approach (Table 1). A paired t-test showed that the PST approach resulted in significantly shorter uninformative regions than the IBD method (P < 10^-10).
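One way to measure an uninformative region is sketched below. The definition used here is an assumption for illustration: the region runs from the last informative SNP before a switch to the first informative SNP after it, and the uninformative SNPs are those lying strictly between the two. The function name `uninformative_regions` and the input layout (parallel lists of positions and codes, with `None` for uninformative SNPs) are hypothetical.

```python
from typing import Dict, List, Optional


def uninformative_regions(
    positions: List[int], codes: List[Optional[int]]
) -> List[Dict[str, int]]:
    """Locate each switch between informative SNPs and measure the gap.

    positions: SNP coordinates in bp, sorted along one chromosome.
    codes: per-SNP state (e.g. PST code 1/0), or None if uninformative.
    """
    regions = []
    last = None  # index of the most recent informative SNP
    for i, code in enumerate(codes):
        if code is None:
            continue  # uninformative SNPs cannot narrow the interval
        if last is not None and codes[last] != code:
            regions.append(
                {
                    "start": positions[last],
                    "end": positions[i],
                    "size_bp": positions[i] - positions[last],
                    "n_uninformative": i - last - 1,
                }
            )
        last = i
    return regions


positions = [100, 200, 350, 500, 900, 1200]
codes = [1, 1, None, None, 0, 0]
regions = uninformative_regions(positions, codes)
print(regions)
```

With denser arrays, the flanking informative SNPs sit closer together, which is why the region sizes depend strongly on the genotyping platform.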
Table 1

Comparison of the size and SNP numbers in uninformative regions

                           IBD                              PST
Dataset       Sibling#     Q2 (Q1 - Q3) kb†       SNP#*     Q2 (Q1 - Q3) kb†       SNP#*

900 K
  Paternal    2            110 (58 - 336)         71.2      60 (23 - 157)          36.7
  Maternal    2            -                      -         61 (19 - 189)          39.6

Autism_3117
  Paternal    4            3291 (2255 - 5738)     12.31     1751 (1270 - 3347)     7.1
  Maternal    4            2683 (1249 - 5796)     15.34     1806 (947 - 3389)      5.9

Autism_3180
  Paternal    6            3768 (1858 - 6420)     16.1      1701 (938 - 2853)      5.8
  Maternal    6            2842 (1145 - 5789)     14.6      2151 (1234 - 3712)     6.9

Autism_8071
  Paternal    4            3557 (1877 - 6415)     15.6      1892 (1195 - 3230)     7.5
  Maternal    4            -                      -         2046 (1130 - 4031)     7.8

† Q2 (Q1 - Q3): Q2 (second quartile) = 50th percentile; Q1 (first quartile) = 25th percentile; Q3 (third quartile) = 75th percentile

* SNP#: Average number of SNPs in the "uninformative" region

Analysis of the GEO dataset GSE6754 containing 11,000 SNP markers

The Affymetrix Human Mapping 10 K 2.0 Arrays (containing 10 K SNPs) were used to map autism susceptibility loci in the GSE6754 dataset [22]. Three three-generation pedigrees (family ID: 3117, 3180, 8071) were selected to compare the usefulness of the IBD and PST methods. Since the 10 K 2.0 array covered fewer SNPs, the mean size of the uninformative regions was about 20-fold higher and the number of uninformative SNPs was approximately 6-fold lower than those of the SNP 6.0 arrays. Compared to other approaches, the PST approach identified fewer uninformative SNPs and smaller uninformative genomic regions (Table 1). In the 3864 arrays (853 families, 1721 parents, 2145 siblings) analyzed using the PST approach, the mean number of maternal recombination events was approximately 1.67-fold higher than that of paternal origin, with the highest value observed on chromosome 17 (2.00-fold) and the lowest on chromosome 22 (1.31-fold) (Table 2). The distribution of recombination events of paternal origin (mean 23.8 ± 4.1, median 22.5) and maternal origin (mean 39.5 ± 5.7, median 38.0) is presented in Figure 3A. The numbers of recombination events of each chromosome (2,145 maternal and paternal meioses) are summarized in Table 2.
Table 2

Number of recombination sites in 2145 siblings from 853 families

Chromosome   Male    Female   Female/male
chr1         3990    6819     1.71
chr2         3917    6723     1.72
chr3         3507    5847     1.67
chr4         3007    5361     1.78
chr5         2864    5255     1.83*
chr6         2971    5063     1.70
chr7         2582    4560     1.77
chr8         2378    4212     1.77
chr9         2495    3883     1.56
chr10        2544    4417     1.74
chr11        2348    4017     1.71
chr12        2503    4140     1.65
chr13        1996    3162     1.58
chr14        2007    2784     1.39*
chr15        1988    2859     1.44*
chr16        1762    2902     1.65
chr17        1393    2783     2.00*
chr18        1856    2903     1.56
chr19        1210    2072     1.71
chr20        1612    2383     1.48*
chr21        1004    1444     1.44*
chr22        965     1265     1.31*
chrX         -       2932     -

* P value < 0.01 (chi square analysis under the null hypothesis that the female-to-male ratio was 1.667)
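The per-chromosome comparison against a fixed ratio can be illustrated with a goodness-of-fit test. The sketch below is an assumption about how such a test could be set up (two categories, one degree of freedom, no multiple-testing correction); the source does not spell out the exact procedure, so this sketch should not be expected to reproduce every asterisk in Table 2. The function name `ratio_chi_square` is hypothetical.

```python
import math

NULL_RATIO = 1.667  # female-to-male ratio under the null hypothesis


def ratio_chi_square(male: int, female: int, null_ratio: float = NULL_RATIO):
    """Chi-square goodness-of-fit test of female:male counts against a ratio.

    Returns (chi2, p). With two categories the statistic has one degree of
    freedom, for which the upper-tail probability is erfc(sqrt(chi2 / 2)).
    """
    total = male + female
    expected_female = total * null_ratio / (1.0 + null_ratio)
    expected_male = total - expected_female
    chi2 = ((female - expected_female) ** 2 / expected_female
            + (male - expected_male) ** 2 / expected_male)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts taken from Table 2.
chi2_chr17, p_chr17 = ratio_chi_square(male=1393, female=2783)
chi2_chr1, p_chr1 = ratio_chi_square(male=3990, female=6819)
print(f"chr17: chi2={chi2_chr17:.2f}, p={p_chr17:.2e}")
print(f"chr1:  chi2={chi2_chr1:.2f}, p={p_chr1:.2f}")
```

In this sketch chromosome 17 deviates strongly from the genome-wide ratio, whereas chromosome 1 does not, which agrees with the direction of the flags in Table 2 for these two chromosomes.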

Figure 3

Distribution of the 2,145 paternal and 2,145 maternal recombination events across all human autosomal chromosomes (A), chromosome 1 (B) and chromosome 6 (C). (A) The distribution of the numbers of the paternal (blue bar) and maternal (red bar) recombination events across autosomal chromosomes. (B) The number of recombination sites for chromosome 1 was calculated using a window width of 1 Mb. The middle and lower panel of the Figure 3B are the Marshfield recombination map and Icelandic recombination map, respectively. The maternal (red) and paternal (blue) genetic distance for each 1-Mb window was calculated on the basis of the SNP position information provided by Affymetrix. We assumed a constant crossover rate between two adjacent SNP markers. The physical position and the chromosome ideogram are shown on the top and bottom of the figure, respectively. (C) The regression lines for maternal (red) and paternal (blue) crossover rates corresponding to the distance from the centromere are shown, using chromosome 6 as an example. The slope was significantly different from zero in the p arm of male but not in female chromosomes. In contrast, both genders showed a significant correlation in the number of recombination sites towards the telomere of the q arm. The chromosomal regions without any SNP site in the Affymetrix Genome-Wide Human SNP 6.0 arrays are marked as gray blocks.

In order to identify the regions with the highest and the lowest number of recombination events, we scanned the entire human genome. We first divided the genome into 2,765 bins of 1-Mb each. We then identified the number of recombination sites in each bin separately for female and male meioses. The results obtained from chromosome 1 are shown in Figure 3B (see the Additional file 3 for the results on other chromosomes).
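The binning step can be sketched as follows. This is a minimal illustration under stated assumptions: each recombination site is represented by a single coordinate (for example the midpoint of its uninformative interval, which is a choice made here and not stated in the text), and the names `bin_counts` and `BIN_SIZE` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

BIN_SIZE = 1_000_000  # 1-Mb windows


def bin_counts(sites: Iterable[Tuple[str, int]]) -> Counter:
    """Count recombination sites per (chromosome, 1-Mb bin index).

    sites: (chromosome, position_bp) pairs, one per recombination site.
    """
    return Counter((chrom, pos // BIN_SIZE) for chrom, pos in sites)


paternal_sites = [
    ("chr1", 114_650_000),
    ("chr1", 114_700_000),
    ("chr1", 2_500_000),
    ("chr6", 999_999),
]
counts = bin_counts(paternal_sites)
print(counts[("chr1", 114)])
```

Running the same function separately on paternal and maternal sites gives the two sex-specific count vectors that are compared bin by bin.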
We also compared the recombination maps obtained from dataset GSE6754 with the Marshfield map [23] (Figure 3B, middle panel) and the Icelandic map [16] (Figure 3B, lower panel). The correlation coefficients between the GSE6754 map and the Icelandic and Marshfield maps were r = 0.49 and r = 0.31, respectively. To test the hypothesis that recombination rates are lower near the centromere but higher near the telomeres in men, we analyzed the correlation between the distances from the recombination sites to the centromere and the number of recombination sites. We found significant correlations (P < 0.00001) on chromosomes 1q, 2p, 3q, 4q, 5p, 5q, 6p, 6q, 7q, 8q, 9p, 9q, 10p, 10q, 11q, 12p, 12q, 16q, 18q, 19q, 20q, 21q in men. In contrast, similar correlations were found only on chromosomes 1q and 6q in women (Table 3). For instance, the slope of the correlation was significant in the p arm of chromosome 6 in men but not in women (Figure 3C). On the other hand, both sexes showed significant correlations in the number of recombination sites near the telomere in the q arm. SNP information was not available for the p arm of chromosomes 13, 14, 15, 21, and 22.
Table 3

Correlation of the distance from the recombination site to the centromere with the number of recombination events

             Male                    Female
Chromosome   p arm      q arm        p arm      q arm

chr1         0.03979    4.1E-10      0.63413    3.7E-10
chr2         3.1E-07    0.12         0.00783    0.01009
chr3         0.00127    1.1E-11      0.00022    3.9E-05
chr4         0.00865    2.1E-09      0.20608    0.00063
chr5         1.9E-07    5.3E-06      0.87512    0.00262
chr6         3.4E-07    6.7E-16      0.10238    5.7E-08
chr7         0.00329    2.2E-07      0.9658     0.00189
chr8         3.9E-05    2.6E-06      0.33193    0.41184
chr9         2.3E-12    1.9E-08      0.00064    0.01546
chr10        2.3E-09    1.6E-07      0.92443    0.00077
chr11        0.01294    1.1E-08      0.83831    4E-05
chr12        6.2E-06    3.6E-06      0.17744    0.00675
chr13        NA         0.88298      NA         0.41116
chr14        NA         0.42025      NA         0.0348
chr15        NA         0.10395      NA         0.00605
chr16        0.09796    6.8E-09      0.88747    0.34803
chr17        0.7478     0.00062      0.15536    0.69596
chr18        0.02079    2E-09        0.3442     0.22101
chr19        0.95524    4.7E-10      0.06423    0.11279
chr20        4.1E-05    2.6E-07      0.33546    0.04724
chr21        NA         2.3E-10      NA         0.66508
chr22        NA         0.0005       NA         0.05598

*NA: SNP information not available for the p arm of chromosomes 13, 14, 15, 21, and 22.

The values in bold indicate a P value < 1E-5
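The relationship tested in Table 3 can be illustrated with an ordinary least-squares fit of the per-bin recombination count against the distance from the centromere. This sketch is an assumption about the form of the analysis; the function name `linear_fit` and the toy numbers are hypothetical and are not data from the study.

```python
import math
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares slope and intercept of y on x, plus Pearson r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Toy example: bin midpoints (Mb from the centromere) and site counts per bin.
distance_mb = [0.5, 1.5, 2.5, 3.5, 4.5]
site_counts = [1, 2, 4, 5, 8]
slope, intercept, r = linear_fit(distance_mb, site_counts)
print(f"slope={slope:.2f} sites/Mb, r={r:.3f}")
```

A positive slope on a chromosome arm corresponds to more recombination sites toward the telomere; fitting each arm and each sex separately yields one test per cell of Table 3.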

Relation between the recombination site and repetitive elements

We compiled 57 major repetitive element classes that were characterized by RepeatMasker [24]. Twenty-three repetitive-element classes were identified in more than 6,000 sites in the human genome. After downloading the location information of the human CpG islands from the UCSC database [25], we divided the genome into 2,765 bins of 1-Mb each and determined the number of repetitive-element sites in each bin. Using the 53,487 repetitive-elements on chromosome 1 as an example, we depicted the distribution of SINE/MIR (green lines in Figure 4A) and LINE/L1 sites (green lines in Figure 4C). In addition, the distributions of meiotic recombination sites (both paternal and maternal combined) are shown as blue lines. In each 1-Mb bin, we also analyzed the correlation between the number of meiotic recombination sites and the number of SINE/MIR (plotted in Figure 4B) and LINE/L1 sites (plotted in Figure 4D). The correlation coefficients between recombination sites and SINE/MIR and the correlation coefficients between recombination sites and LINE/L1 were 0.23 (P = 0.0005) and 0.29 (P = 0.00001), respectively.
Figure 4

Correlation between the number of sex-averaged recombination sites and SINE/MIR (A, B) or LINE/L1 (C, D) repetitive sequence elements. The distribution of the number of sex-averaged recombination sites (blue) and repetitive sequence elements (green) on chromosome 1 was calculated using a window width set to 1 Mb (A, C). The scatter plot shows the number of sex-averaged recombination sites and repetitive sequences on chromosome 1 (B, D). Regression lines are marked in red. The chromosomal regions without any SNP site in the Affymetrix Genome-Wide Human SNP 6.0 arrays are marked as gray blocks.

The correlation coefficients and the corresponding P values for each of the 23 repetitive-elements, CpG island sites, and meiotic recombination sites are summarized in Table 4. The repetitive elements SINE/MIR, DNA/hAT-Charlie, DNA/hAT, LINE/L2, SINE/Alu, DNA/hAT-Tip100, DNA/hAT-Blackjack were positively correlated with meiotic recombination sites. In contrast, repetitive elements, which included LINE/L1, LTR/ERVK, and Low complexity (Table 4), showed negative correlation with meiotic recombination sites. In general, we found no significant differences in the distribution of maternal and paternal recombination sites. The scatter plots of the correlation analyses of repetitive elements SINE/MIR and LINE/L1 in the entire human genome are shown in Figure 5.
Table 4

Correlation between the recombination sites and particular repeats

                               Paternal          Maternal          Both
Repeat               Number    Corr.   P         Corr.   P         Corr.   P

SINE/MIR             510580    0.22    1E-16     0.30    1E-16     0.29    1E-16
DNA/hAT-Charlie      214901    0.18    1E-16     0.31    1E-16     0.29    1E-16
DNA/hAT              10624     0.16    3E-13     0.24    1E-16     0.23    1E-16
LINE/L2              397294    0.16    3E-13     0.22    1E-16     0.21    1E-16
SINE/Alu             926299    0.10    1E-04     0.22    1E-16     0.19    1E-16
DNA/hAT-Tip100       26087     0.10    3E-05     0.20    1E-16     0.18    1E-16
DNA/hAT-Blackjack    17019     0.15    4E-12     0.16    2E-13     0.17    1E-16
Simple_repeat        343474    0.19    1E-16     0.11    4E-06     0.16    8E-14
LINE/CR1             52244     0.09    4E-04     0.15    5E-12     0.13    9E-09
DNA/TcMar-Tc2        6979      0.11    1E-05     0.13    3E-08     0.13    3E-08
DNA/TcMar-Mariner    14046     0.10    5E-05     0.13    2E-08     0.12    2E-07
CpG                  19661     0.06    2E-01     0.10    3E-05     0.11    1E-06
LTR/ERVL-MaLR        292520    0.11    2E-06     0.09    6E-04     0.11    8E-06
LTR/Gypsy?           6912      0.05    1         0.11    4E-06     0.08    2E-03
Unknown              6174      0.05    5E-01     0.06    4E-01     0.05    1
LINE/RTE             15421     0.04    1         0.06    2E-01     0.05    1
LTR/Gypsy            9429      0.02    1         0.04    1         0.03    1
DNA/TcMar-Tigger     87328     -0.01   1         0.04    1         0.01    1
LTR/ERVL             134989    -0.03   1         -0.03   1         -0.04   1
LTR/ERV1             139204    -0.13   7E-09     -0.10   6E-05     -0.13   2E-08
Low_complexity       314872    -0.11   3E-06     -0.17   6E-15     -0.18   1E-16
LTR/ERVK             8019      -0.19   1E-16     -0.16   3E-13     -0.19   1E-16
LINE/L1              767428    -0.16   8E-13     -0.19   1E-16     -0.20   1E-16

1. Repeat classes including more than 6000 repeats were considered for the purpose of analyses.

2. Corr.: correlation coefficients between the recombination sites and specific repeats.

3. P values under the null hypothesis of an absence of correlation. The Bonferroni's correction was applied for multiple comparisons. An adjusted P value > 1 was reported as 1.

4. The values in bold indicate a P value < 1E-5
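The adjustment described in note 3 can be sketched as below. The number of comparisons used for the correction is not stated in the table notes, so `n_tests=23` in the example is an assumption for illustration only, and the function name `bonferroni` is hypothetical.

```python
from typing import List, Optional, Sequence


def bonferroni(p_values: Sequence[float], n_tests: Optional[int] = None) -> List[float]:
    """Bonferroni-adjust P values, capping the result at 1.

    n_tests defaults to the number of P values supplied.
    """
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]


raw = [0.001, 0.02, 0.5]
adjusted = bonferroni(raw, n_tests=23)  # 23 is an assumed test count
print(adjusted)
```

The cap at 1 is what produces the entries reported as exactly 1 for the weakly correlated repeat classes.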

Figure 5

Scatter plot of the number of paternal (A, D), maternal (B, E), and sex-averaged (C, F) recombination sites for the SINE/MIR (A, B, C) and LINE/L1 (D, E, F) repetitive sequences on chromosome 1. Regression lines are marked in red.

Relation between recombination sites and the length of tandem repeat sequences

Repetitive elements, including tandem repeat sequences, are distributed widely throughout the genome. Tandem DNA repeats are defined as a repeated pattern of two or more nucleotides. The pattern can range in length from 2 to ~100 base pairs (bp) (for example (CATG)n in a genomic region) [26]. In this study, a total of 947,696 tandem repeat sequences were identified using the Tandem Repeats Finder [26]. The length distribution of the tandem repeats is shown in Figure 6A, where the 25th, 50th and 75th percentiles of the length of the tandem repeats were 4, 15 and 24 bp, respectively.
Figure 6

(A) Distribution of the length of the 947,696 tandem repeat sequences. (B) Scatter plot of the number of maternal recombination sites and the number of tandem repeat sequences. When the tandem repeat sequences are grouped into 4 quartiles according to the length of the repeat sequences, scatter plots for each quartile are shown in (C) Q1, 1-4 base pairs (bp), (D) Q2, 5-15 bp, (E) Q3, 16-24 bp, and (F) Q4, larger than 25 bp, respectively. Regression lines are marked in red, and the Pearson correlation coefficients between the number of maternal recombination events and the number of tandem repeat sequences are indicated.

We divided the genome into 2,765 bins of 1-Mb each and determined the number of tandem repeats in each bin. We then analyzed the correlation between the number of maternal meiotic recombination sites and the number of tandem repeats (Figure 6B); the correlation coefficient was 0.11 (P < 2 × 10^-7). Furthermore, we grouped tandem repeats into 4 quartiles by the length of these repeat sequences, as (Q1) 1-4, (Q2) 5-15, (Q3) 16-24 and (Q4) > 25 bp. The correlation coefficients between recombination sites and the 4 quartiles were 0.25 (P < 1 × 10^-16), 0.11 (P < 2 × 10^-8), 0.04 (P = 0.08) and 0.03 (P = 0.16), respectively (Figures 6C-F). These results showed that the maternal meiotic recombination sites were positively correlated with shorter repeat sequences and less correlated with longer repeat sequences. Similarly, we analyzed the correlation between the number of paternal meiotic recombination sites and the number of tandem repeats, with r = 0.12 (P < 5 × 10^-9). The correlation coefficients for the 4 subgroups were 0.19 (P < 1 × 10^-16), 0.09 (P < 4 × 10^-6), 0.09 (P < 3 × 10^-6) and 0.05 (P = 0.004), respectively (Additional file 4).
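The quartile grouping can be sketched as follows, using the cut points 4, 15 and 24 bp given in the text. Two points are assumptions made for this example: a repeat of exactly 25 bp is placed in Q4, and each repeat is described by a hypothetical `(chromosome, start_bp, length_bp)` tuple. The function names are hypothetical as well.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

QUARTILES = ("Q1", "Q2", "Q3", "Q4")


def length_quartile(length_bp: int) -> str:
    """Assign a tandem repeat to a length group (cut points 4, 15, 24 bp)."""
    if length_bp <= 4:
        return "Q1"
    if length_bp <= 15:
        return "Q2"
    if length_bp <= 24:
        return "Q3"
    return "Q4"  # assumption: 25 bp and longer


def repeats_per_bin_by_quartile(
    repeats: Iterable[Tuple[str, int, int]], bin_size: int = 1_000_000
) -> Dict[str, Counter]:
    """Count repeats per (chromosome, bin index), separately for each group."""
    counts = {q: Counter() for q in QUARTILES}
    for chrom, start_bp, length_bp in repeats:
        counts[length_quartile(length_bp)][(chrom, start_bp // bin_size)] += 1
    return counts


toy_repeats = [
    ("chr1", 1_200_000, 2),
    ("chr1", 1_900_000, 4),
    ("chr1", 1_500_000, 15),
    ("chr1", 3_100_000, 24),
    ("chr2", 50_000, 60),
]
by_quartile = repeats_per_bin_by_quartile(toy_repeats)
print(by_quartile["Q1"][("chr1", 1)])
```

Each of the four per-bin count vectors can then be correlated with the per-bin recombination counts to obtain one coefficient per length group.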

Discussion

In this study, we used a PST approach to analyze the sites of meiotic recombination in two-generation pedigrees. We first tested it on a GMRCL dataset of the Affymetrix SNP 6.0 array consisting of 900 K SNP markers, followed by a 10 K GSE6754 dataset. In the GSE6754 dataset, which was previously used for mapping autism risk loci, most data are based on two-generation pedigrees (1,168 families) as this dataset contains only 29 three-generation pedigrees. Although the PST approach requires only pedigrees of two generations, it requires information from at least two siblings. The use of SNPs as genetic markers to identify recombination sites can often result in the inclusion of uninformative regions. However, the uninformative regions that result from the PST approach are significantly smaller than those seen with the IBD method (Table 1). We next assessed whether crossovers may alter the DNA sequence by causing de novo mutations at sites of recombination. Given that the uninformative regions of PST were relatively small, eight recombination events were identified with sizes of less than 2 kb. Notably, we did not identify any sequence variation at these recombination points (data not shown). This observation needs further validation by sequencing more datasets. The average number of recombination events observed with the PST approach was similar to the findings of other studies. The distribution of recombination events showed a mean value of 23.8 for paternal origin and 39.5 for maternal origin. Chowdhury et al. reported that genome-wide recombination events of paternal origin ranged from 25.9 to 27.3, while those of maternal origin ranged from 38.4 to 47.2 [20]. Another study by Cheung et al. demonstrated that the mean numbers of recombination events were 24.0 in male meiosis and 38.4 in female meiosis [15].
In an indirect pedigree analysis using SNPs as genetic markers, Cheung et al. [15] reported that several recombination events appeared to occur nearer to the telomeres. Using the PST approach, we analyzed the distance between the recombination site and the centromere for each gender separately (Table 3). In male meiosis, most of the crossovers were located in the q arms, and the number of recombination events increased significantly from the centromeres toward the telomeres. Interestingly, we observed fewer recombination events in the p arms of female chromosomes, resulting in a male-to-female ratio of 1.67 (Table 2). In female meiosis, only chromosomes 1q and 6q showed a significant positive correlation between the number of recombination sites and the distance from the centromere (Table 3).
To characterize sequence-context variation in recombination hotspots, Myers et al. constructed a fine-scale map of recombination rates and hotspots across the human genome based on genotypes of 1.6 million SNPs in three sample populations: 24 European Americans, 23 African Americans, and 24 Han Chinese [27]. The authors reported an increase of recombination hotspots in the regions surrounding coding genes, although these hotspots were preferentially located outside the transcribed regions. Their analysis of the relationships between recombination hotspots and repeat elements indicated that L2 and THE1B elements are unusually frequent in hotspots, whereas L1 elements are infrequent [27]. In this study, we identified a similar pattern: hotspots were frequent in L2 elements and infrequent in L1 elements (Table 4). Of note, the majority of the hotspots were similar in paternal and maternal meioses.

Conclusion

Human chromosomes are characterized by prominent differences in the pattern and rate of meiotic recombination events. Significant inter-individual and gender differences also exist. The major advantages of the PST approach include the use of two-generation pedigrees with two or more siblings, fewer uninformative SNP regions, and the ability to perform gender-specific analyses of recombination hotspots (using databases derived from high density arrays such as Affymetrix SNP6.0) and repetitive elements. An accurate determination of meiotic crossovers using this approach may prove useful to explore the biology of human chromosomes.

Methods

Identification of meiotic recombination sites

In the present study, we compared two SNP-based methods for detecting recombination points: IBD (Figure 1A) [12] and PST (Figure 1B). The calling schemas for the IBD and PST methods are shown in Additional file 1 (1A and 1B). PSTReader, a MATLAB-based program (version 7.9), was used to define the recombination sites for both methods and to export them. The MATLAB source code, example data, and a standalone application can be freely downloaded from: http://www.mcu.edu.tw/department/biotec/en_page/PSTReader/index.htm.

GMRCL Dataset

In this study, a set of the Affymetrix Genome-Wide Human SNP array 6.0 (GMRCL dataset) consisting of 900 K SNP markers was used as a template. DNA was extracted from blood collected in a study that was approved by the Chang Gung Memorial Hospital Institute Review Board (IRB#99-0229B). SNP genotyping was performed using the SNP array 6.0 (Affymetrix, Santa Clara, CA, http://www.affymetrix.com) at the Genomic Medicine Research Core Laboratory (GMRCL), Chang Gung Memorial Hospital. The GMRCL dataset includes the genotypes of an anonymous family consisting of the paternal/maternal grandfather, paternal/maternal grandmother, father, mother and two children. The identity-delinked SNP genotypes and pedigree information for each member can be downloaded from http://www.mcu.edu.tw/department/biotec/en_page/PSTReader/index.htm.

GSE6754 Dataset

The GSE6754 dataset was downloaded from the Gene Expression Omnibus (GEO) and contains information from 6,971 Affymetrix GeneChip Human Mapping 10 K 2.0 Arrays. Parental and sibling genotypes are available for 1,168 families in this dataset. To increase analytic accuracy, we excluded samples with genotyping call rates below 90%, samples lacking pedigree information, and individuals with chromosomal abnormalities (n = 22) [28]. The remaining 3,864 arrays of 853 families (1,721 parents and 2,145 siblings) were included in the PST analysis of recombination events in human meiosis. Details on the individuals, families, and pedigrees are provided in Additional file 5.
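
The exclusion steps above can be sketched in Python as follows. This is an illustrative sketch only, not the authors' pipeline: the record fields (`family`, `role`, `genotypes`, `abnormal`) and the no-call code `"NN"` are hypothetical. The final step, keeping only families with both parents and at least two children, is an assumption based on the stated requirements of the PST approach.

```python
from collections import defaultdict

MIN_CALL_RATE = 0.90  # samples below this call rate are excluded (from the text)


def call_rate(genotypes, no_call="NN"):
    """Fraction of SNPs with a genotype call; "NN" is a hypothetical no-call code."""
    if not genotypes:
        return 0.0
    return sum(g != no_call for g in genotypes) / len(genotypes)


def usable_families(samples, min_call_rate=MIN_CALL_RATE):
    """Apply the exclusion criteria, then keep families usable for PST.

    samples: list of dicts with hypothetical keys "family", "role"
    ("father", "mother" or "child"), "genotypes" and optionally "abnormal".
    """
    by_family = defaultdict(list)
    for sample in samples:
        if sample.get("family") is None or sample.get("role") is None:
            continue  # lacks pedigree information
        if sample.get("abnormal", False):
            continue  # chromosomal abnormality
        if call_rate(sample["genotypes"]) < min_call_rate:
            continue  # low genotyping call rate
        by_family[sample["family"]].append(sample)

    kept = {}
    for family, members in by_family.items():
        roles = [m["role"] for m in members]
        # Assumption: PST needs both parents and at least two children.
        if (roles.count("father") == 1 and roles.count("mother") == 1
                and roles.count("child") >= 2):
            kept[family] = members
    return kept


good = ["AA"] * 10             # call rate 1.0
poor = ["AA"] * 8 + ["NN"] * 2  # call rate 0.8, below the threshold
toy_samples = [
    {"family": "F1", "role": "father", "genotypes": good},
    {"family": "F1", "role": "mother", "genotypes": good},
    {"family": "F1", "role": "child", "genotypes": good},
    {"family": "F1", "role": "child", "genotypes": good},
    {"family": "F2", "role": "father", "genotypes": good},
    {"family": "F2", "role": "mother", "genotypes": good},
    {"family": "F2", "role": "child", "genotypes": good},
    {"family": "F2", "role": "child", "genotypes": poor},
]
kept_families = usable_families(toy_samples)
print(sorted(kept_families))
```

In this toy input, family F2 loses one child to the call-rate filter and is therefore dropped, because a single remaining child is not enough for sibling tracing.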

Mapping of the recombination sites, repetitive elements and tandem repeat sequences

The recombination sites and repetitive elements were mapped to the hg18 (NCBI Build 36) human reference assembly. The classes and characteristics of the major repetitive elements were downloaded from RepeatMasker [24], and the tandem repeat sequences were identified using the Tandem Repeats Finder program [26]. Correlations between recombination sites and repetitive elements or tandem repeat sequences were analyzed with MATLAB (version 7.9). To assess these distributions and correlations, we divided the human genome into 2,765 bins of 1 Mb each and counted the recombination sites (or repetitive elements or tandem repeat sequences) in each bin. The distance for each 1-Mb window was calculated from SNP positions according to the Affymetrix data, assuming a constant crossover rate between two adjacent SNP markers. The same 1-Mb bins were used to calculate the correlation coefficients between the recombination maps of GSE6754, the Icelandic map, and the Marshfield map.
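
The binning and correlation step can be sketched in Python. This is a minimal illustration, not the MATLAB analysis used in the study: the function names, the 1-based coordinate convention, and the toy chromosomes and positions are all assumptions, and the interpolation of crossover positions between adjacent SNP markers is left out.

```python
from math import sqrt

BIN_SIZE = 1_000_000  # 1-Mb window, as in the text


def count_per_bin(positions, chrom_lengths, bin_size=BIN_SIZE):
    """Count features in fixed-width bins, including empty bins.

    positions: iterable of (chromosome, 1-based position in bp).
    chrom_lengths: dict mapping chromosome name to its length in bp.
    """
    counts = {}
    for chrom, length in chrom_lengths.items():
        n_bins = (length + bin_size - 1) // bin_size  # ceiling division
        for index in range(n_bins):
            counts[(chrom, index)] = 0
    for chrom, pos in positions:
        counts[(chrom, (pos - 1) // bin_size)] += 1
    return counts


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy data: made-up chromosomes and positions, not taken from the study.
toy_lengths = {"chrA": 3_000_000, "chrB": 2_000_000}
toy_recombination = [("chrA", 100), ("chrA", 1_000_000),
                     ("chrA", 2_500_000), ("chrB", 5)]
toy_tandem_repeats = [("chrA", 10), ("chrA", 20), ("chrA", 30),
                      ("chrA", 1_000_001), ("chrA", 2_000_001),
                      ("chrA", 3_000_000), ("chrB", 1), ("chrB", 2),
                      ("chrB", 2_000_000)]

rec_counts = count_per_bin(toy_recombination, toy_lengths)
rep_counts = count_per_bin(toy_tandem_repeats, toy_lengths)
bins = sorted(rec_counts)  # same bin order for both count vectors
r = pearson_r([rec_counts[b] for b in bins], [rep_counts[b] for b in bins])
print(f"{len(bins)} bins, r = {r:.2f}")
```

Empty bins are kept as zero counts so that both vectors cover the same set of windows before the correlation is computed.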

Abbreviations

PST: parent-sibling tracing; IBD: identity by descent; IBS: identity by state; STRP: simple tandem repeat polymorphism; SNP: single nucleotide polymorphism.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

YSL, AC, SMW and THW designed the study and prepared the manuscript. YSL, TC and CHC carried out the statistical analysis. YSL and THW carried out the Affymetrix microarray experiments, obtained the clinical materials and analyzed clinical information. All authors read and approved the final manuscript.

Additional file 1

Calling schema. Tables with the calling schemas used for analyzing meiosis by identity by descent (IBD) and parent-sibling tracing (PST).

Additional file 2

Paternal recombination sites along the chromosomes. Figures showing the paternal recombination sites of children 1 and 2 of the GMRCL dataset (CH1 and CH2, defined in Figure 1) along the chromosomes, as identified by the identity by descent (IBD) and parent-sibling tracing (PST) methods.

Additional file 3

Distribution of recombination events. Figures illustrating the distribution of the 2,145 paternal and 2,145 maternal recombination events in humans for each chromosome.

Additional file 4

Correlation between tandem repeat sequences and paternal recombination sites. Distribution of the lengths of the tandem repeat sequences and scatter plots of the number of paternal recombination sites against the number of tandem repeat sequences.

Additional file 5

Detailed information on the GSE6754 dataset. Family ID, individual ID and pedigree relationship of the 3,864 analyzed samples, which were downloaded from GEO (GSE6754).
  26 in total

1.  A high-resolution recombination map of the human genome.

Authors:  Augustine Kong; Daniel F Gudbjartsson; Jesus Sainz; Gudrun M Jonsdottir; Sigurjon A Gudjonsson; Bjorgvin Richardsson; Sigrun Sigurdardottir; John Barnard; Bjorn Hallbeck; Gisli Masson; Adam Shlien; Stefan T Palsson; Michael L Frigge; Thorgeir E Thorgeirsson; Jeffrey R Gulcher; Kari Stefansson
Journal:  Nat Genet       Date:  2002-06-10       Impact factor: 38.330

2.  Fine-scale recombination rate differences between sexes, populations and individuals.

Authors:  Augustine Kong; Gudmar Thorleifsson; Daniel F Gudbjartsson; Gisli Masson; Asgeir Sigurdsson; Aslaug Jonasdottir; G Bragi Walters; Adalbjorg Jonasdottir; Arnaldur Gylfason; Kari Th Kristinsson; Sigurjon A Gudjonsson; Michael L Frigge; Agnar Helgason; Unnur Thorsteinsdottir; Kari Stefansson
Journal:  Nature       Date:  2010-10-28       Impact factor: 49.962

Review 3.  Distribution of meiotic recombination events: talking to your neighbors.

Authors:  Enrique Martinez-Perez; Monica P Colaiácovo
Journal:  Curr Opin Genet Dev       Date:  2009-03-26       Impact factor: 5.578

4.  PRDM9 is a major determinant of meiotic recombination hotspots in humans and mice.

Authors:  F Baudat; J Buard; C Grey; A Fledel-Alon; C Ober; M Przeworski; G Coop; B de Massy
Journal:  Science       Date:  2009-12-31       Impact factor: 47.728

5.  Drive against hotspot motifs in primates implicates the PRDM9 gene in meiotic recombination.

Authors:  Simon Myers; Rory Bowden; Afidalina Tumian; Ronald E Bontrop; Colin Freeman; Tammie S MacFie; Gil McVean; Peter Donnelly
Journal:  Science       Date:  2009-12-31       Impact factor: 47.728

6.  Prdm9 controls activation of mammalian recombination hotspots.

Authors:  Emil D Parvanov; Petko M Petkov; Kenneth Paigen
Journal:  Science       Date:  2009-12-31       Impact factor: 47.728

7.  Locations and patterns of meiotic recombination in two-generation pedigrees.

Authors:  Jason C Ting; Elisha D O Roberson; Duane G Currier; Jonathan Pevsner
Journal:  BMC Med Genet       Date:  2009-09-17       Impact factor: 2.103

8.  Genetic analysis of variation in human meiotic recombination.

Authors:  Reshmi Chowdhury; Philippe R J Bois; Eleanor Feingold; Stephanie L Sherman; Vivian G Cheung
Journal:  PLoS Genet       Date:  2009-09-18       Impact factor: 5.917

9.  The UCSC Genome Browser Database: update 2009.

Authors:  R M Kuhn; D Karolchik; A S Zweig; T Wang; K E Smith; K R Rosenbloom; B Rhead; B J Raney; A Pohl; M Pheasant; L Meyer; F Hsu; A S Hinrichs; R A Harte; B Giardine; P Fujita; M Diekhans; T Dreszer; H Clawson; G P Barber; D Haussler; W J Kent
Journal:  Nucleic Acids Res       Date:  2008-11-07       Impact factor: 16.971

10.  Broad-scale recombination patterns underlying proper disjunction in humans.

Authors:  Adi Fledel-Alon; Daniel J Wilson; Karl Broman; Xiaoquan Wen; Carole Ober; Graham Coop; Molly Przeworski
Journal:  PLoS Genet       Date:  2009-09-18       Impact factor: 5.917

