Literature DB >> 23001126

Estimating the human mutation rate using autozygosity in a founder population.

Catarina D Campbell¹, Jessica X Chong, Maika Malig, Arthur Ko, Beth L Dumont, Lide Han, Laura Vives, Brian J O'Roak, Peter H Sudmant, Jay Shendure, Mark Abney, Carole Ober, Evan E Eichler.

Abstract

Knowledge of the rate and pattern of new mutation is critical to the understanding of human disease and evolution. We used extensive autozygosity in a genealogically well-defined population of Hutterites to estimate the human sequence mutation rate over multiple generations. We sequenced whole genomes from 5 parent-offspring trios and identified 44 segments of autozygosity. Using the number of meioses separating each pair of autozygous alleles and the 72 validated heterozygous single-nucleotide variants (SNVs) from 512 Mb of autozygous DNA, we obtained an SNV mutation rate of 1.20 × 10(-8) (95% confidence interval 0.89-1.43 × 10(-8)) mutations per base pair per generation. The mutation rate for bases within CpG dinucleotides (9.72 × 10(-8)) was 9.5-fold that of non-CpG bases, and there was strong evidence (P = 2.67 × 10(-4)) for a paternal bias in the origin of new mutations (85% paternal). We observed a non-uniform distribution of heterozygous SNVs (both newly identified and known) in the autozygous segments (P = 0.001), which is suggestive of mutational hotspots or sites of long-range gene conversion.

Entities: Chemical Disease Gene Species

Mesh：

Year: 2012 PMID： 23001126 PMCID： PMC3483378 DOI： 10.1038/ng.2418

Source DB: PubMed Journal: Nat Genet ISSN： 1061-4036 Impact factor: 38.330

The rate and pattern of new mutation is critical to the understanding of human disease and evolution. We used extensive autozygosity in a genealogically well-defined population of Hutterites to estimate the mutation rate over multiple generations. We sequenced whole genomes from five parent-offspring trios and identified 44 segments of autozygosity. We computed the number of meioses separating each pair of autozygous alleles and validated 72 heterozygous single nucleotide variants (SNVs) from 512 Mbp of autozygous DNA providing an SNV mutation rate of 1.20×10−8 (95% confidence interval 0.89–1.43×10−8) mutations per basepair per generation. We observed a 9.5-fold increase for bases within CpG dinucleotides (9.72×10−8) and strong evidence (p = 2.67×10−4) for a paternal bias in the origin of new mutations (85% paternal). We observed a non-uniform distribution of heterozygous SNVs (both novel and known) in the autozygous segments (p = 0.001) suggestive of mutational hotspots or sites of long-range gene conversion. Various approaches have provided a wide range of SNV mutation rate estimates (1–3×10−8 per basepair per generation). Early studies of mutation rates in humans focused on specific loci or the de novo incidence of disease [1-4]. More recent studies have leveraged whole-genome sequencing (WGS) data on a total of three nuclear families to estimate de novo mutation rates for SNVs of about 1×10−8 [Ref. 5–6]. Comparative studies of chimpanzee and human genomes provided higher estimates (i.e. 2.5×10−8) but are highly contingent upon uncertainty in the number of generations since the human-chimpanzee divergence [7]. In contrast to studies focused on identifying new mutations arising in a single generation, populations with a small number of founding individuals provide an ideal resource for estimating mutation rates across a small number of generations. The Hutterites are a population of Anabaptist farmers living in the plains of the United States and Canada who descended from a small group of founders (<90 individuals). The genealogy is completely known and genome-wide single nucleotide polymorphism (SNP) genotype data have been collected from over 1400 individuals who are related to each other in a 13-generation pedigree descended from 64 founders [8-9]. Due to increased levels of consanguinity, Hutterite individuals carry large segments of the genome that are autozygous or homozygous by recent decent [10]. The alleles in an autozygous segment are descended from a recent common ancestor and have accumulated mutations in the generations since transmission from this individual. We selected five Hutterite parent-offspring trios for whole-genome sequencing (WGS) where the parents in each trio are related to each other within 6–8 (mean = 6.6) meiotic transmissions (Figure 1). We performed WGS on DNA isolated from whole blood using Illumina paired-end sequencing. We generated 775 gigabases (Gbp) of sequence with an average of 13-fold coverage per individual (Supplementary Table 1). We aligned the sequencing reads for each sample to the human reference genome (NCBI build 36). We identified a total of 5.4 million SNVs based on the intersection of two different algorithms [11-12] (Supplementary Table 2). The SNP genotypes from WGS were highly concordant to SNP microarray (mean genotype concordance = 99.7%) (Supplementary Table 2).

Figure 1

Relationship of sequenced individuals

A simplified pedigree showing the relationship between the 15 sequenced individuals is shown. Children in the five trios are colored in black and their parents in gray. Founders are connected by shades of blue lines denoting the number of generations separating those individuals. For clarity, only the shortest relationships between each individual and the parents of each individual are shown.

We identified extended regions of homozygosity in the offspring of the five trios and in five previously sequenced genomes (three European Americans and two Yoruba) [13] (Methods). The extent of homozygosity was correlated to the inbreeding coefficients of the Hutterite individuals (Supplementary Table 3; Supplementary Fig. 1; Supplementary Note). As expected, the five Hutterite probands showed significantly greater autozygosity (223 Mbp average per individual) than other European-American individuals (95 Mbp) or the Yoruba individuals (4 Mbp) (Figure 2; Supplementary Table 3). Although the amount of short homozygous segments was similar, we observed a 33-fold increase in autozygous basepairs in segments greater than 2 Mbp in the Hutterite individuals compared to the other European-American individuals (Figure 2; Supplementary Note). We further refined and validated segments that were longer than 5 Mbp by comparing to SNP microarray data for the same samples [10,15] to obtain a final list of 44 regions of autozygosity (6–12 segments per individual; 5–54 Mbp in length). We restricted subsequent analyses to these 512 Mbp of autozygous DNA (Supplementary Fig 2; Supplementary Table 4).

Figure 2

Elevated autozygosity in the Hutterite individuals

Autozygous segments were binned by size for the five Hutterite individuals, three European-American individuals, and two Yoruba individuals. The x-axis represents bins of autozygous segment sizes and the y-axis is the number of segments in each bin. Each individual is represented by a “bubble” for each bin where the size of the bubble represents the total amount of genomic sequence in that bin.

We determined the number of meioses separating each allele within each autozygous segment. The small founding population and complex genealogy (Figure 1) of the Hutterite population made this potentially problematic because of the large number of shared common ancestors (CAs) and multiple paths of descent between any ancestor-descendant pair. To resolve this, we combined both the pedigree structure and genome-wide SNP genotype data[9] to identify the most recent common ancestors (MRCAs) based on segregation within the Hutterite genealogy [8] (Figure 3; Supplementary Fig. 3; Supplementary Note). We estimated that the two haplotypes of the 44 autozygous segments are separated by 8–18 meioses based on the MRCAs (Supplementary Table 4).

Figure 3

Determination of the MRCA for an autozygous segment

A) A 54 Mbp autozygous segment on chromosome 2 in Individual 3. Genomic coordinates (hg18) are represented horizontally, and each individual is represented vertically: the five Hutterite individuals, followed by the three European-American individuals, and then the two Yoruba. Each SNV is represented by a vertical bar colored blue if the variant is homozygous and green if it is heterozygous. The autozygous segment in Individual 3 is boxed in orange. B) Determination of the MRCA for this autozygous segment. The pedigree containing all the haplotype carriers of the autozygous haplotype is shown. Cyan lines connect the same individuals who are represented twice in the pedigree. Individual 3 is shown in yellow. All samples with SNP microarray data are shown with red arrows, and haplotype carriers are shown in gray. These haplotype carriers have two MRCAs (boxed) as well as additional CAs further up the pedigree. The paths from these individuals to the autozygous subject are shown in red for the maternal ancestors and blue for the paternal ancestors; all ancestors of the individual are marked with a star.

To calculate the SNV mutation rate, we identified heterozygous SNVs within each autozygous segment, excluding regions of common repeats, segmental duplication, and known SNPs (dbSNP132). We validated 72 SNVs as heterozygous by Sanger-based capillary sequencing (Table 1; Supplementary Table 5; Supplementary Note). We calculated an SNV mutation rate (μ) of 1.20×10−8 mutations per basepair per generation (95% confidence intervals (CI): 0.89×10−8 – 1.43×10−8). We observed consistent μ across the five trios with a range between 0.92×10−8 and 1.51×10−8 (Figure 4; Table 1). Among these mutations, we observed an excess of transitions relative to transversions (Ti/Tv = 1.64) that was not significantly different from the genome-wide SNV ratio of 2.17 (two-tailed chi-square p = 0.27). Twelve of the 72 validated heterozygous SNVs (16.7%) mapped to CpG dinucleotides. We calculated μ for CpG sites of 9.72×10−8 per CpG basepair per generation, 9.5X greater than the μ for non-CpG bases (1.02×10−8). We also estimated mutation rate based on de novo mutations in the most recent generation (Supplementary Note; Supplementary Table 6). Based on 176 validated de novo SNVs, we calculated a mutation rate of 0.96×10−8 (95% CI: 0.82×10−8 – 1.09×10−8); although this rate is lower than calculated using autozygosity, the confidence intervals of these rates overlap (Figure 4).

Table 1

SNV mutation rates determined from segments of autozygosity

individual	Segments (>5 Mbp)	Total callable (Mbp)*	Mean meioses (MRCA)†	SNVs‡	SNV μ	95% CI§
1	7	63.4	13.8	13	1.51×10⁻⁸	0.62–2.28×10⁻⁸
2	6	55.9	13.8	7	0.92×10⁻⁸	0.17–1.72×10⁻⁸
3	9	124.8	9.9	13	1.07×10⁻⁸	0.45–1.63×10⁻⁸
4	10	147.6	12.0	19	1.09×10⁻⁸	0.56–1.55×10⁻⁸
5	12	120.8	12.0	20	1.40×10⁻⁸	0.73–1.96×10⁻⁸
ALL	44	512.4	11.9	72	1.20×10⁻⁸	0.89–1.43×10⁻⁸

Non-segmental duplication, non-simple repeat and non-dbSNP132 with at least six mapped reads.

Weighted by length of segment.

Validated novel heterozygous.

Based on a Poisson distribution.

Figure 4

SNV mutation rate estimates

The SNV mutation rate point estimates are shown for each individual and all five individuals combined with the error bars representing the 95% confidence intervals based on a Poisson distribution. μ – SNV mutations per generation per basepair. Filled diamonds are estimates from autozygous segments and open diamonds are estimates from the most recent generation.

We identified and validated one potential gene conversion event involving paralogs of segmental duplications containing the genes C4A and C4B—a region where lower copy number has been associated with lupus [16]. Although individual 4 has a total diploid copy number of six for this CNV, we determined that the sequence content of the two alleles differ (Supplementary Fig. 4) likely as a result of gene conversion between paralogous copies of, at a minimum, the gene TNX (6 kbp). Both theoretical and experimental analyses have predicted that the male germline contributes disproportionately to de novo mutations as compared to the female germline [7,17-18]. However, a recent analysis on two parent-offspring trios reported a paternal bias in mutation in one trio and a maternal bias in the other [5]. Given the complexity of the Hutterite pedigree and transmissions through multiple female and male ancestors, we focused on the putative genome-wide de novo mutations in the most recent generation. We used molecular phasing [5,12,18] to determine the parental origin of 26 of the 176 validated de novo SNVs, and we found that 84.6% (22/26) of de novo SNPs originated on the paternal haplotype confirming a male bias for new SNVs (two-tailed binominal p = 2.67×10−4; 95% CI – 70.8–98.5%). One advantage of using autozygosity is the ability to identify potential gene conversion events between homologous chromosomes. Such events could lead to clusters of heterozygous SNVs (including known SNPs) within regions of autozygosity, and we identified four clusters (two or more SNVs mapping within 10 kbp of each other) (Table 2). One of these clusters is 309 kbp suggesting that this is most likely a product of crossover events [19]. Excluding this large cluster, the average distance between heterozygous SNPs in the remaining three clusters is 2723 bp (range: 7–7839 bp). We tested this distribution by simulation (n = 10,000 replicates) and determined a significant excess of “clustered” SNVs (empirical p = 0.001).

Table 2

Clusters of heterozygous SNVs

Individual	chr	start	end	length	heterozygous SNVs (novel)*	CpG SNVs†	GC%‡	Meioses§
1	1	4403120	4414313	11193	3 (1)	0	0.46	15.5
1	1	74906779	74909810	3031	2 (2)	0	0.36	14.0
5	1	10788610	10793450	4840	5 (0)	1	0.50	9.0
5	2	211397873	211706890	309017	66 (1)	8	0.35	9.0
1	7	45928832	45930883	2051	2 (2)	0	0.47	2.0
1	16	77252309	77256230	3921	2 (2)	0	0.43	2.0
2	1	189717619	189717626	7	2 (2)	0	0.31	2.0

Total number of SNVs in the cluster with the number of novel (non-dbSNP132) SNVs in parentheses.

SNVs in CpG dinucleotides.

Percent of G and C bases in the heterozygous cluster.

Number of meioses in which event(s) occurred based on the MRCA.

We also tested whether the de novo SNVs in the most recent generation were uniformly distributed in the genome. Interestingly, we observed three clusters of validated de novo variants (Table 2; Supplementary Table 5) and a significant excess (empirical p = 6×10−6) of de novo SNVs in close proximity (<10 kbp) based on simulation (n = 1,000,000 replicates). There has recently been much interest in using massively parallel sequence data to obtain an accurate estimate of mutation rate using nuclear families [5-6]. We developed an approach using extended regions of autozygosity to discover new mutations that have emerged within a few generations. Compared to analyses focused on de novo mutations in a single generation, we significantly reduced the number of false positives and somatic mutations, since most mutations in autozygous segments are transmitted from one of the parents. In addition, given the relationship between paternal age and number of de novo mutations [18,20], our approach reduces this confounding effect by yielding an average mutation rate over 8–18 meioses. Disadvantages include uncertainties in ancestry of the autozygous segments (Supplementary Note), the smaller genomic “search space” (512 Mbp), the potential to confound new mutation and gene conversion events, and an increased potential for purifying selection to eliminate a small fraction of new mutations, although fraction of such events should be negligible [21]. We have tried to reduce the confounding effect of gene conversion by limiting our analysis to novel SNVs. We estimated an SNV mutation rate of 1.20×10−8 using autozygous segments, which is higher than the rate of 0.96×10−8 that we estimated for the most recent generation and the rate 1.1×10−8 previously published for the whole genome [5-6] yet lower than the rate estimated in a recent resequencing study [22]. While this manuscript was under review, two additional studies of mutation rate were published. First, a mutation rate of 1.2×10−8 was calculated from an analysis of WGS in over 70 trios[20], which is equal to that obtained in our analysis suggesting the accuracy of using autozygosity to estimate mutation rate. Interestingly, the quantification of the correlation between number of mutations and paternal age reported by Kong et al.[20] suggests that the relatively young age of the father of the trios analyzed here (21–30 years old at the time of the child’s birth) may provide an explanation for the lower mutation rate we observed in the most recent generation. In a second publication, an inferred mutation rate of 1.82×10−8 was calculated by modeling population genetic parameters based on the mutational properties of microsatellites[23]; the difference between this estimate and our estimates are likely due to differences in methodology. We observed a non-uniform distribution of a small fraction of mutations within autozygous segment that appear to be evidence of recent allelic gene conversion. We observed three clusters that were unlikely generated by crossover mechanisms and might represent potential allelic gene conversion events in the autozygous segments, although one cluster was larger (11 kbp) than expected for typical gene conversion events [24]. Only one of the ten SNVs in these clusters was in a CpG, and the GC content (0.36–0.50) of these three regions was not consistent with a model of recurrent mutation due to CpG methylation and demethylation. The average distance between heterozygous SNPs in these clusters was 2723 bp, ruling out compound mutation [25] as a likely mechanism. Intriguingly, one of these clusters of heterozygous SNVs is comprised of two novel SNVs (i.e. not in dbSNP) and could be further evidence of a non-uniform distribution of new mutations similar to what we observed with the de novo mutations. In addition, we observed an excess of heterozygous bases at dbSNP positions in autozygous segments (N = 22), most of which were not clustered with other heterozygous variants (16 out of 22) but may also be evidence of recent gene conversion. Notably, we observed a non-uniform distribution (three clusters with two de novo SNVs within 10 kbp; range 7–3921 bp) (empirical p = 6×10−6) among the validated de novo events. One of these clusters contained SNVs 7 bp apart suggesting a compound mutational event [25]; based on this event, 0.97% of de novo mutations were part of multi-nucleotide mutations (95% CI (Wilson method) – 0.27–3.5%). While this is somewhat lower than the estimate of 2–3% of de novo mutations in compound mutational events based on WGS of two trios [25], the confidence intervals do overlap. The remaining two clusters contained SNVs greater than 2 kbp apart suggesting that, even at greater distances, mutational mechanisms do not produce uniform distributions of new mutations. We have presented a novel approach for estimating mutation rate in humans. We based our analysis on autozygosity and WGS from whole blood DNA to remove the effects somatic mutations and cell line artifacts. In addition, we were able to detect other recent changes in the genome including gene conversion events. Furthermore, we believe that the application of this approach to additional families has the potential to elucidate the dynamics of other forms of mutation including CNVs and indels.

METHODS

DNA samples and whole-genome sequencing

We selected five parent-offspring trios for WGS from a 13-generation pedigree of Hutterites from South Dakota (Figure 1). Detailed phenotyping data are available for all individuals in this study design. None of the individuals studied have a Mendelian disorder. All individuals consented, and the project was approved by the institutional review boards (IRBs) at The University of Chicago and the University of Washington. One library for each individual was constructed from DNA isolated from whole blood using Illumina recommended protocols. Briefly, 1 to 3 ug of DNA were fragmented using sonication. The resulting fragments were end-repaired, adenosine overhangs added, and adaptors ligated. Size selection was performed by running the library on an agarose gel and cutting a band around 400 bp. The size-selected libraries were amplified using quantitative PCR. The resulting libraries were sequenced using an Illumina HiSeq 2000 to generate 51 and 101 bp paired-end reads. We generated 36–68 Gbp of sequencing data for each of the 15 individuals (10–17X coverage of the human genome) (Supplementary Table 1). Sequence data have been deposited into dbGAP; accession numbers pending.

Sequence data analysis and SNV identification

We aligned the paired-end reads to the NCBI build 36 reference of the human genome using BWA (version 0.5.9) with standard parameters [26]. The quality scores for the mapped reads were recalibrated and PCR duplicates were removed using Picard (version 1.43). Then, we realigned the reads around potential indels to reduce spurious SNV calls due to misalignment in these regions using the Genome Analysis Toolkit (GATK) software (version 1.0.5777) [12]. We used both GATK and SAMtools (version 0.1.8) [11] to identify SNVs. After generating an initial list of SNVs, we used the following approaches to generate a high-confidence variant list. First, we applied variant recalibration to the variants identified with GATK and used a variant quality score recalibration (VQSR) threshold of 2.30 (99% of known high-quality SNPs identified) to generate a final list of GATK variants [27]. In addition, we applied the standard recommended GATK filters and filtered out SNVs located near indels. For the SAMtools call set, we filtered calls with a quality score less than 10. Then, we used the intersection of the GATK and SAMtools call sets for further analysis.

Concordance with SNP microarray genotypes and false negative rate estimation

To assess the quality of our SNV genotypes from WGS, we compared these data to genotype data generated for these same individuals on Affymetrix SNP microarrays (versions 500K and 6.0) [9]. We assessed the genotype concordance for the 211,438–468,681 SNPs that passed quality control metrics on the microarray for each individual (Supplementary Table 1). We estimated the false negative rate for heterozygous variants, since our calculations of mutation rate are based on heterozygous SNVs and these variants are more likely to be missed in moderate coverage WGS data. For each individual, we determined the number of heterozygous SNPs genotyped by SNP microarray and then determined how many of these variants were missed in the WGS (i.e. called as homozygous). We calculated false negative rate as the number of missed heterozygous SNPs divided by the total number of heterozygous SNPs (Supplementary Table 1).

Identification and definition of autozygous segments

We identified long stretches of homozygosity in the genomes of the five children of the Hutterite trios, two European Americans from the Centre d’Etude du Polymorphisme Humain collection (CEU) [13], two Yoruba from Ibadan, Nigeria (YRI) [13], and one additional European-American male. We ran PLINK [14] on all filtered SNV genotypes with variants in segmental duplications removed to avoid artifacts due to paralogous sequence variation. We required a minimum homozygosity length of 600 kbp, a maximum gap in homozygosity of 100 kbp, and a maximum of three heterozygous SNPs per 5 Mbp. We merged all segments within 50 kbp, since it appeared that some regions, especially in the Hutterites, were erroneously split. We did not consider regions with >20% gaps or segmental duplications in the reference assembly. We compared the resulting list of large autozygous segments (>5 Mbp) to regions of likely autozygosity determined with SNP microarray genotypes and the 3555 member pedigree using IBDLD [15]. The intersection of regions determined by genome sequence data and those determined from SNP microarrays gave us a final list of autozygous regions greater than 5 Mbp. We trimmed all 44 segments by 100 kbp because we observed an excess of heterozygous variants at the edges (Supplementary Fig. 5).

Determination of common ancestors for autozygous segments

The 15 sequenced individuals are part of a larger 13-generation pedigree of Hutterites, and we made use of this pedigree information to trace each autozygous segment to the MRCA of all carriers as previously described [8]. We used genome-wide SNP genotypes of 1415 individuals from this large pedigree [8-9] to identify all individuals who carried at least one allele identical by state (IBS) to sequenced individuals with the autozygous segment across the majority of the SNPs in the autozygous region. We considered individuals to be a carrier of the autozygous haplotype if he/she was IBS >= 1 with the sequenced individual at >99% of the genotyped SNPs in the autozygous segment; the median number of SNPs used was 1204 (range: 84–5493). After we identified all haplotype carriers for an autozygous segment, we examined the pedigree to identify all CAs of these individuals. We defined the MRCA as the individual with the smallest mean number of meioses to all haplotype carriers. To estimate mutation rate, we used the mean number of meioses in all paths from the MRCA(s) to the autozygous individual (Supplementary Note).

Identification of heterozygous SNVs in regions of autozygosity

We intersected our list of autozygous segments in each sample with the list of quality-filtered SNVs identified in that sample. Then, for each heterozygous SNV identified in the autozygous sample, we applied an additional read-depth filter requiring a minimum of six sequencing reads at that location in the heterozygous individual. We did not consider SNVs in simple-repeats (based on the Tandem Repeats Finder track for hg18 in the UCSC Genome Browser), segmental duplications, or in dbSNP132. We identified a total of 85 novel, heterozygous SNVs in the refined autozygous regions (Supplementary Table 5), and attempted to validate these variants with Sanger sequencing (Supplementary Note).

Determination of SNV mutation rate using autozygosity

For our mutation rate calculations, we determined the number of basepairs in each autozygous segment that we were able to test for heterozygous variants. We counted the number of unique (i.e. not in segmental duplications), non-simple repeat, non-dbSNP132 bases with a read depth of at least six that were callable by GATK; these “callable” bases served as the denominator in our mutation rate calculation. To calculate mutation rate for each individual and across all five individuals, we applied the following formula: where N is the number of validated heterozygous SNVs in the autozygous regions for that individual, FNR is the false negative rate for heterozygous SNVs, G is the weighted average of the number of meioses separating the alleles of autozygous segments (weighted by length of segment) for that individual, and L is the sum of callable basepairs in autozygous segments in that individual. The same formula was applied to calculate the mutation rate across all five individuals where N is the 72 validated heterozygous SNVs, G is 11.9 — the weighted mean (by segment length) number of meioses separating the two alleles of all autozygous segments, and L is the 512.4 Mbp of total callable basepairs in autozygous segments in the five individuals. To calculate the mutation rate at CpG basepairs, we considered only the estimated number of true heterozygous SNVs at CpG bases divided by “callable” CpG basepairs and the number of generations to the MRCA.

Determination of mutation rate using de novo mutations in the most recent generation

We identified putative de novo SNVs across the whole genome as those variants observed only in a single individual (i.e. not observed in the parents or any of the other Hutterite genomes). We did not consider variants in segmental duplications, simple repeats, or known SNPs (dbSNP132), and we required a read depth of at least six for all individuals in the trio. After applying these filters, we obtained a list of 632 putative de novo variants (Supplementary Table 7). Using molecular inversion probes (MIPs) [28-29], we attempted to capture and resequence these putative de novo variants (Supplementary Note). We calculated the mutation rate using the following equation: where N is the total number of putative de novo mutations, FDR is the false discovery rate for putative de novo SNVs (Supplementary Note; Supplementary Table 6), FNR is the false negative rate, and L is the total callable basepairs with read depth of at least six in all members of the trio in the genome.

Simulations of mutation clusters

To determine whether there was a non-random distribution of heterozygous SNVs in autozygous segments, we performed the following simulations. We randomly permuted the positions of heterozygous SNVs in autozygous blocks that contained greater than one heterozygous SNV and determined the number of clusters of SNVs with less than 10 kbp. We repeated this process 10,000 times and determined an empirical p-value based on the number of simulations with at least three heterozygous SNV clusters (i.e. heterozygous SNVs within 10 kbp). We performed a similar analysis for the de novo SNVs by permuting 1,000,000 times the locations of the estimated number of true de novo SNVs in each sample within the callable regions of the genome.

28 in total

1. Quantitative-trait homozygosity and association mapping and empirical genomewide significance in large, complex pedigrees: fasting serum-insulin level in the Hutterites.

Authors: Mark Abney; Carole Ober; Mary Sara McPeek
Journal: Am J Hum Genet Date: 2002-03-04 Impact factor: 11.025

2. Estimate of the mutation rate per nucleotide in humans.

Authors: M W Nachman; S L Crowell
Journal: Genetics Date: 2000-09 Impact factor: 4.562

3. PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors: Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal: Am J Hum Genet Date: 2007-07-25 Impact factor: 11.025

Review 4. Rates of spontaneous mutation.

Authors: J W Drake; B Charlesworth; D Charlesworth; J F Crow
Journal: Genetics Date: 1998-04 Impact factor: 4.562

5. Direct estimates of human per nucleotide mutation rates at 20 loci causing Mendelian diseases.

Authors: Alexey S Kondrashov
Journal: Hum Mutat Date: 2003-01 Impact factor: 4.878

6. Gene copy-number variation and associated polymorphisms of complement component C4 in human systemic lupus erythematosus (SLE): low copy number is a risk factor for and high copy number is a protective factor against SLE susceptibility in European Americans.

Authors: Yan Yang; Erwin K Chung; Yee Ling Wu; Stephanie L Savelli; Haikady N Nagaraja; Bi Zhou; Maddie Hebert; Karla N Jones; Yaoling Shu; Kathryn Kitzmiller; Carol A Blanchong; Kim L McBride; Gloria C Higgins; Robert M Rennebohm; Robert R Rice; Kevin V Hackshaw; Robert A S Roubey; Jennifer M Grossman; Betty P Tsao; Daniel J Birmingham; Brad H Rovin; Lee A Hebert; C Yung Yu
Journal: Am J Hum Genet Date: 2007-04-26 Impact factor: 11.025

7. Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations.

Authors: Brian J O'Roak; Laura Vives; Santhosh Girirajan; Emre Karakoc; Niklas Krumm; Bradley P Coe; Roie Levy; Arthur Ko; Choli Lee; Joshua D Smith; Emily H Turner; Ian B Stanaway; Benjamin Vernot; Maika Malig; Carl Baker; Beau Reilly; Joshua M Akey; Elhanan Borenstein; Mark J Rieder; Deborah A Nickerson; Raphael Bernier; Jay Shendure; Evan E Eichler
Journal: Nature Date: 2012-04-04 Impact factor: 49.962

8. Autozygome maps dispensable DNA and reveals potential selective bias against nullizygosity.

Authors: Hanif G Khalak; Salma M Wakil; Faiqa Imtiaz; Khushnooda Ramzan; Batoul Baz; Abeer Almostafa; Samya Hagos; Fatema Alzahrani; Nada Abu-Dhaim; Leen Abu Safieh; Latifa Al-Jbali; Mohammed S Al-Hamed; Dorota Monies; Mohammed Aldahmesh; Mohammed S Al-Dosari; Namik Kaya; Hanan Shamseldin; Ranad Shaheen; May Al-Rashed; Mais Hashem; Nada Al-Tassan; Brian Meyer; Anas M Alazami; Fowzan S Alkuraya
Journal: Genet Med Date: 2012-01-05 Impact factor: 8.822

9. A direct characterization of human mutation based on microsatellites.

Authors: James X Sun; Agnar Helgason; Gisli Masson; Sigríður Sunna Ebenesersdóttir; Heng Li; Swapan Mallick; Sante Gnerre; Nick Patterson; Augustine Kong; David Reich; Kari Stefansson
Journal: Nat Genet Date: 2012-08-23 Impact factor: 38.330

10. Rate of de novo mutations and the importance of father's age to disease risk.

Authors: Augustine Kong; Michael L Frigge; Gisli Masson; Soren Besenbacher; Patrick Sulem; Gisli Magnusson; Sigurjon A Gudjonsson; Asgeir Sigurdsson; Aslaug Jonasdottir; Adalbjorg Jonasdottir; Wendy S W Wong; Gunnar Sigurdsson; G Bragi Walters; Stacy Steinberg; Hannes Helgason; Gudmar Thorleifsson; Daniel F Gudbjartsson; Agnar Helgason; Olafur Th Magnusson; Unnur Thorsteinsdottir; Kari Stefansson
Journal: Nature Date: 2012-08-23 Impact factor: 49.962

103 in total

1. The impact of accelerating faster than exponential population growth on genetic variation.

Authors: Mark Reppell; Michael Boehnke; Sebastian Zöllner
Journal: Genetics Date: 2013-12-30 Impact factor: 4.562

2. General triallelic frequency spectrum under demographic models with variable population size.

Authors: Paul A Jenkins; Jonas W Mueller; Yun S Song
Journal: Genetics Date: 2013-11-08 Impact factor: 4.562

3. A Fast and Simple Method for Detecting Identity-by-Descent Segments in Large-Scale Data.

Authors: Ying Zhou; Sharon R Browning; Brian L Browning
Journal: Am J Hum Genet Date: 2020-03-12 Impact factor: 11.025

4. Novel diet-related mouse model of colon cancer parallels human colon cancer.

Authors: Anil R Prasad; Shilpa Prasad; Huy Nguyen; Alexander Facista; Cristy Lewis; Beryl Zaitlin; Harris Bernstein; Carol Bernstein
Journal: World J Gastrointest Oncol Date: 2014-07-15

5. A Bayesian framework for de novo mutation calling in parents-offspring trios.

Authors: Qiang Wei; Xiaowei Zhan; Xue Zhong; Yongzhuang Liu; Yujun Han; Wei Chen; Bingshan Li
Journal: Bioinformatics Date: 2014-12-21 Impact factor: 6.937

6. De novo single-nucleotide and copy number variation in discordant monozygotic twins reveals disease-related genes.

Authors: Nirmal Vadgama; Alan Pittman; Michael Simpson; Niranjanan Nirmalananthan; Robin Murray; Takeo Yoshikawa; Peter De Rijk; Elliott Rees; George Kirov; Deborah Hughes; Tomas Fitzgerald; Mark Kristiansen; Kerra Pearce; Eliza Cerveira; Qihui Zhu; Chengsheng Zhang; Charles Lee; John Hardy; Jamal Nasir
Journal: Eur J Hum Genet Date: 2019-03-18 Impact factor: 4.246

7. Accuracy of Demographic Inferences from the Site Frequency Spectrum: The Case of the Yoruba Population.

Authors: Marguerite Lapierre; Amaury Lambert; Guillaume Achaz
Journal: Genetics Date: 2017-03-24 Impact factor: 4.562

Review 8. Genetic drift, selection and the evolution of the mutation rate.

Authors: Michael Lynch; Matthew S Ackerman; Jean-Francois Gout; Hongan Long; Way Sung; W Kelley Thomas; Patricia L Foster
Journal: Nat Rev Genet Date: 2016-10-14 Impact factor: 53.242

9. A Statistical Framework for Mapping Risk Genes from De Novo Mutations in Whole-Genome-Sequencing Studies.

Authors: Yuwen Liu; Yanyu Liang; A Ercument Cicek; Zhongshan Li; Jinchen Li; Rebecca A Muhle; Martina Krenzer; Yue Mei; Yan Wang; Nicholas Knoblauch; Jean Morrison; Siming Zhao; Yi Jiang; Evan Geller; Iuliana Ionita-Laza; Jinyu Wu; Kun Xia; James P Noonan; Zhong Sheng Sun; Xin He
Journal: Am J Hum Genet Date: 2018-05-10 Impact factor: 11.025

10. Causality in genetics: the gradient of genetic effects and back to Koch's postulates of causality.

Authors: Ali J Marian
Journal: Circ Res Date: 2014-01-17 Impact factor: 17.367