
De novo mutations discovered in 8 Mexican American families through whole genome sequencing.

Heming Wang, Xiaofeng Zhu

Abstract

De novo mutations enrich sequence diversity and carry clues about evolutionary selection. Recent studies suggest that de novo mutations could be a risk factor for complex diseases. We surveyed de novo mutations in Mexican American families provided by Genetic Analysis Workshop 18, using whole genome sequence data that were available only for the odd-numbered autosomes. From 20 large pedigrees, we extracted 8 three-generation families with sequencing data available. By comparing the known single nucleotide variants (SNVs) in dbSNP129 with the de novo variants transmitted within the Mexican American families, we estimated a de novo mutation rate of 1.64 (±0.42) × 10−8 per position per haploid genome. This result is consistent with estimates in the literature that required extensive validation efforts, such as genotyping and further resequencing. Our analysis suggests the importance of using family samples for studying rare variants.


Year:  2014        PMID: 25519376      PMCID: PMC4143763          DOI: 10.1186/1753-6561-8-S1-S24

Source DB:  PubMed          Journal:  BMC Proc        ISSN: 1753-6561


Background

De novo mutations enrich sequence diversity and carry clues about evolutionary selection [1]. Owing to technological advances in whole genome sequencing, genome-wide surveys of de novo mutations have become possible. Recent studies show that de novo mutations, including de novo copy number variations, are strongly associated with multiple diseases, such as autism and schizophrenia [2]. Currently, de novo mutations are often studied in family trios by comparing the whole genome sequence data of the parents and the child, together with the publicly available dbSNP database [3]. Variants observed in offspring but not in their parents are often considered potential de novo mutations. However, even highly accurate sequencing data contain unavoidable errors that lead to false variant calls and spurious Mendelian errors. Therefore, de novo mutation candidates identified by comparing the sequencing data of offspring and their parents can be false positives [4]. Researchers thus often resequence or genotype the candidates to confirm true de novo mutations [1-4], a procedure that can be time-consuming and costly. Here we propose an approach that uses 3-generation families to detect de novo mutations: (a) the parents and grandparents are used to search for de novo mutation candidates, and (b) the offspring's sequence data are used to confirm true de novo mutations. We applied this approach to the Genetic Analysis Workshop 18 (GAW18) data and found our results consistent with previous estimates that relied on genotyping and further resequencing for validation. This result suggests that our approach is reliable. With the continuously decreasing cost of whole genome sequencing, this approach should become an efficient way to detect de novo mutations.

Methods

GAW18 data include 20 large Mexican American pedigrees from the Type 2 Diabetes Genetic Exploration by Next-generation sequencing in Ethnic Samples (T2D-GENES) project. Whole genome sequence data were provided to GAW18 participants for the odd-numbered autosomes only. Our analysis focused on the 464 individuals with whole genome sequence data, in whom 12 million SNVs were identified. Of these, more than 6.1 million SNVs are novel, that is, not present in dbSNP129, and 5,086,136 of the novel SNVs have minor allele frequencies below 0.5% (Figure 1). Because our goal is to detect de novo mutations, we restricted our analysis to these novel, rare SNVs to reduce the false-positive rate. A true de novo mutation carried by an individual has a 50% probability of being transmitted to each of that individual's children, so transmission from an individual to his or her offspring can serve as a validation step for detected de novo mutations. We therefore selected families with sequence data available for at least 3 generations, giving a total of 8 three-generation families (Figure 2). For each family in Figure 2, we examined every rare, novel variant and considered it a de novo mutation candidate if it was present in a parent (the child in the dashed triangle) but absent in both grandparents. We next examined whether each candidate was transmitted from the parent to the parent's offspring; only candidates transmitted to the offspring were declared true de novo mutations. Depending on whether this parent is male or female, 4 of the 8 families (1 family of type a and 3 of type e) were used to identify de novo mutations in males, and the other 4 (2 families of type b, 1 of type c, and 1 of type d) were used to identify de novo mutations in females.
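The candidate and confirmation filters described above can be sketched with set operations. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each individual's rare, novel SNVs are already available as a set of hypothetical `(chromosome, position, ref, alt)` keys, treats presence or absence of a call as exact, and interprets the additional exclusion for families d and e as dropping candidates that the other parent also carries.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def denovo_candidates(parent: Set[Variant],
                      grandfather: Set[Variant],
                      grandmother: Set[Variant],
                      other_parent: Optional[Set[Variant]] = None) -> Set[Variant]:
    """Variants carried by the parent but absent from both grandparents.

    If the other parent of the third generation is also sequenced (as assumed
    here for families d and e), candidates that this person carries are dropped.
    """
    candidates = parent - grandfather - grandmother
    if other_parent is not None:
        candidates -= other_parent
    return candidates


def confirmed_denovo(candidates: Set[Variant],
                     children: Iterable[Set[Variant]]) -> Set[Variant]:
    """Candidates observed in at least one sequenced child (transmitted)."""
    transmitted: Set[Variant] = set()
    for child in children:
        transmitted |= candidates & child
    return transmitted
```

Keeping the two steps as separate functions mirrors the split between candidate discovery in the upper trio and confirmation in the third generation.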
We further categorized the 8 families into 2 family types according to the number of offspring: type I (one child in the third generation) included families a, b, d, and e, and type II (two children) included family c. Let $N$ be the number of de novo mutations observed in a family and $L$ be the total sequence length of all odd-numbered human autosomes. For a type I family, the total number of de novo mutations is estimated as $2N$, because only half of them are expected to be transmitted. Because humans are diploid, this total is spread over 2 haploid copies of length $L$, so the mutation rate is estimated as $\hat{\mu} = 2N/(2L) = N/L$. For a type II family, 75% of de novo mutations ($1 - (1/2)^2 = 3/4$) are expected to be transmitted to at least 1 of the 2 children, so the total is estimated as $4N/3$ and $\hat{\mu} = (4N/3)/(2L) = 2N/(3L)$. Because families d and e have sequencing data available for both parents, it is also possible to exclude any variants present in both parents, further reducing the false discovery rate.
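The two estimators above are special cases of one rule: divide the observed count by the probability that a heterozygous de novo mutation reaches at least one sequenced child, then divide by the 2 haploid copies of length L. The sketch below assumes L ≈ 1.35 × 10^9 bp (the figure quoted in Results); the function names are illustrative only.

```python
# Approximate total length of the odd-numbered autosomes (bp), as quoted in Results.
L_ODD = 1.35e9


def transmission_probability(n_children: int) -> float:
    """Probability that a heterozygous de novo mutation in a parent is passed
    to at least one of n_children (each child inherits it with p = 1/2)."""
    return 1.0 - 0.5 ** n_children


def mutation_rate(n_observed: int, n_children: int, length: float = L_ODD) -> float:
    """Per-position, per-haploid-genome de novo mutation rate.

    n_observed / P(transmitted) estimates the parent's total de novo count;
    dividing by 2 * length accounts for the two haploid copies.
    n_children = 1 gives N / L (type I); n_children = 2 gives 2N / (3L) (type II).
    """
    if n_children < 1:
        raise ValueError("need at least one sequenced child")
    total = n_observed / transmission_probability(n_children)
    return total / (2.0 * length)


if __name__ == "__main__":
    print(f"{mutation_rate(27, 1):.3g}")  # one-child family with N = 27
    print(f"{mutation_rate(27, 2):.3g}")  # two-child family with N = 27
```

With N = 27, the one-child and two-child cases give about 2.00 × 10−8 and 1.33 × 10−8, matching the Table 1 entries for Fam2_1 (type e) and Fam2_3 (type c).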
Figure 1

Comparison between the distribution of SNVs in dbSNP129 and novel SNVs.

Figure 2

A summary of selected family types. We identified 1 family of type a, 2 of type b, 1 of type c, 1 of type d, and 3 of type e. The upper trios (in the dashed triangles) are used to identify de novo mutation candidates, and the third generation is used to confirm true de novo mutations. Families a and e measure de novo mutations in males; families b, c, and d measure de novo mutations in females.


Results

We analyzed the sequencing data after the quality control performed by GAW18. By investigating the first 2 generations of the 8 families, we identified a total of 13,584 de novo mutation candidates. Of these candidates, 186 were transmitted to the grandchildren. On average, 23.25 (±5.62) de novo mutations on the odd autosomes were discovered per family (Table 1). Given that the odd autosomes span approximately 1.35 billion base pairs, we estimated an average mutation rate (µ) of 1.64 (±0.42) × 10−8 per position per haploid genome, which falls within the range of 1.1 × 10−8 to 3.8 × 10−8 reported in the literature [4-6]. We did not observe a significant difference between the de novo mutation rates in males (1.61 × 10−8) and females (1.67 × 10−8).
Table 1

Summary of de novo mutation numbers in each family.

Family ID   Family type   Paternal age   Maternal age   Observed de novo mutations (N)   De novo mutation rate (µ)
Fam2_1      e             35             31             27                               2.00 × 10−8
Fam2_2      a             26             24             25                               1.85 × 10−8
Fam2_3      c             25             23             27                               1.33 × 10−8
Fam10_1     d             29             23             33                               2.44 × 10−8
Fam10_2     b             26             29             20                               1.48 × 10−8
Fam10_3     b             21             25             19                               1.41 × 10−8
Fam16_1     e             31             27             18                               1.33 × 10−8
Fam27_1     e             26             21             17                               1.26 × 10−8

Average                                                  23.25 (±5.62)                    1.64 (±0.42) × 10−8
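The summary figures reported in Results can be recomputed from Table 1 with the type I and type II estimators from Methods. This is a consistency sketch that assumes L ≈ 1.35 × 10^9 bp and that the reported ± values are sample standard deviations (n − 1 denominator) across the 8 families; the variable names are illustrative.

```python
import statistics

L_ODD = 1.35e9  # approximate length of the odd-numbered autosomes (bp), from Results

# (family ID, family type, observed de novo mutations N), transcribed from Table 1
TABLE1 = [
    ("Fam2_1", "e", 27), ("Fam2_2", "a", 25), ("Fam2_3", "c", 27),
    ("Fam10_1", "d", 33), ("Fam10_2", "b", 20), ("Fam10_3", "b", 19),
    ("Fam16_1", "e", 18), ("Fam27_1", "e", 17),
]
TYPE_II = {"c"}          # two children in the third generation
MALE_TYPES = {"a", "e"}  # the parent in the middle generation is male


def rate(n: int, family_type: str) -> float:
    """Type I: mu = N / L; type II: mu = 2N / (3L)."""
    if family_type in TYPE_II:
        return 2 * n / (3 * L_ODD)
    return n / L_ODD


rates = [rate(n, t) for _, t, n in TABLE1]
mean_mu = statistics.mean(rates)
sd_mu = statistics.stdev(rates)  # sample SD (n - 1 denominator)
male_mu = statistics.mean([rate(n, t) for _, t, n in TABLE1 if t in MALE_TYPES])
female_mu = statistics.mean([rate(n, t) for _, t, n in TABLE1 if t not in MALE_TYPES])
counts = [n for _, _, n in TABLE1]

print(f"total confirmed: {sum(counts)}")  # 186
print(f"mu = {mean_mu / 1e-8:.2f} (SD {sd_mu / 1e-8:.2f}) x 10^-8")  # 1.64 (SD 0.42)
print(f"males {male_mu / 1e-8:.2f}, females {female_mu / 1e-8:.2f} x 10^-8")  # 1.61, 1.67
```

The recomputed values agree with the reported average rate, its spread, and the sex-specific rates.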
We used the UCSC Genome Browser (http://genome.ucsc.edu/) [7,8] and SIFT (http://sift.jcvi.org/) [9] to map the 186 de novo mutations and predict their effects on protein function. Seven of them are in exons, and 2 of these are nonsynonymous SNVs. One nonsynonymous SNV is in the gene PDZ domain containing 2 (PDZD2) on chromosome 5; the other is in the gene spastic ataxia of Charlevoix-Saguenay (sacsin) (SACS) on chromosome 13. PDZ domains are protein-protein recognition modules that play a central role in organizing diverse cell signaling assemblies, most often by binding the cytoplasmic tails of transmembrane receptors and channels. PDZD2 and its secreted form (sPDZD2) may be involved in the functional maturation of islet-like cell clusters (ICCs) derived from human fetal pancreatic progenitor cells (PPCs) and in the early stages of prostate tumorigenesis [10,11]. SACS encodes the sacsin protein, which is highly expressed in the central nervous system. Mutations in this gene cause autosomal recessive spastic ataxia of Charlevoix-Saguenay, but the details of its function are still unknown [12,13].
CpG sites are known mutation hotspots in mammals [14]. In the great apes, the de novo mutation rate at CpG sites is estimated to be 11 times higher than that at non-CpG sites [4,15]. We extracted the CpG islands from the UCSC Genome Browser and examined the locations of the identified de novo mutations. Of our 186 confirmed de novo mutations, only 1 is located in a CpG island. Given the fraction of the odd autosomes covered by CpG islands, we expect that we underestimated the number of CpG mutations. Among the remaining 185 mutations outside CpG islands, we observed 127 transitions and 58 transversions, a transition-to-transversion ratio of 2.2, similar to previous estimates [4,6].
Finally, using the first 2 generations of the 8 families, we examined the relationship between parental age and the de novo mutation rate in the child by fitting linear models.
In general, the de novo mutation rate in the child increased with the parents' ages, especially with the father's age. This is consistent with a previous report that the de novo mutation rate in offspring is positively correlated with paternal age [1]. However, no significant association was observed, owing to the small sample size of this study.
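The direction of the age effects can be illustrated with single-predictor least-squares slopes fitted to the paternal and maternal ages and the rates listed in Table 1. The exact form of the authors' linear models is not specified, so this is only a sketch under that assumption; with 8 families the slopes are highly uncertain and no significance test is attempted.

```python
import statistics

# (paternal age, maternal age, mu in units of 1e-8), transcribed from Table 1
ROWS = [
    (35, 31, 2.00), (26, 24, 1.85), (25, 23, 1.33), (29, 23, 2.44),
    (26, 29, 1.48), (21, 25, 1.41), (31, 27, 1.33), (26, 21, 1.26),
]


def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


paternal = [r[0] for r in ROWS]
maternal = [r[1] for r in ROWS]
mu = [r[2] for r in ROWS]

slope_pat = ols_slope(paternal, mu)
slope_mat = ols_slope(maternal, mu)
print(f"paternal: {slope_pat:+.3f} x 10^-8 per year")  # +0.044
print(f"maternal: {slope_mat:+.3f} x 10^-8 per year")  # +0.015
```

Both slopes are positive and the paternal slope is the larger, which matches the qualitative pattern described in the text.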

Discussion

We analyzed whole genome sequences on the odd autosomes of 8 three-generation families to identify de novo mutations. We found this 3-generation approach to be efficient, even though no further resequencing of the candidate variants was performed. In the 8 selected Mexican American families, we estimated a mutation rate of 1.64 (±0.42) × 10−8 per position per haploid human genome, which is consistent with previous estimates [4-6]. Of the 13,584 de novo mutation candidates observed in the 8 three-generation families, only 186 were observed in the grandchildren. This is far fewer than the number of transmissions expected if the candidates were genuine, suggesting that most de novo mutation candidates can be attributed to SNV calling errors. Because major goals of whole genome sequencing projects are to detect rare and possibly de novo variants and to test them for association with complex diseases, accounting for false-positive SNV calls is extremely important in association studies. Our analysis suggests that sequencing family members is an efficient way to detect these SNV calling errors. For example, our results suggest that a variant observed in offspring but not in their parents in a simple trio can usually be treated as an SNV calling error and should be excluded from downstream analyses. Previous studies suggest that family data have many statistical advantages in detecting rare disease variants [16,17]. Thus, our results suggest that whole genome sequencing of family members is worthwhile, even though most current whole genome sequencing projects focus only on unrelated subjects. It should be noted that multigeneration pedigrees are more difficult to recruit than family trios. However, many multigeneration pedigrees, such as those used here, have already been collected for traditional linkage studies. We expect the proposed method to be useful in detecting de novo mutations.
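The gap between expected and observed transmissions can be made concrete with a rough calculation from the reported counts. This back-of-the-envelope sketch assumes that third-generation calls are accurate, that every genuine candidate is heterozygous and transmitted with Mendelian probability (1/2 to one child, 3/4 to at least one of two), and it ignores the extra filtering applied in families d and e.

```python
# Counts from Results and Table 1
CANDIDATES = 13_584          # candidates found in the first 2 generations
CONFIRMED_TYPE_I = 186 - 27  # confirmed in the 7 one-child (type I) families
CONFIRMED_TYPE_II = 27       # confirmed in the two-child (type II) family c

# If every candidate were genuine, at least half should reach a child.
min_expected_if_all_real = 0.5 * CANDIDATES

# Scale the confirmed counts back up by their transmission probabilities.
est_true = CONFIRMED_TYPE_I / 0.5 + CONFIRMED_TYPE_II / 0.75
error_fraction = 1 - est_true / CANDIDATES

print(f"expected >= {min_expected_if_all_real:.0f} transmissions, observed 186")
print(f"~{est_true:.0f} genuine de novo mutations; "
      f"~{error_fraction:.1%} of candidates likely calling errors")
```

Under these assumptions, only about 354 of the 13,584 candidates (roughly 2.6%) would be genuine, in line with the conclusion that most trio-only candidates are calling errors.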

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

XZ designed the overall study; HW conducted the statistical analyses; HW and XZ drafted the manuscript. All authors read and approved the final manuscript.
References (17 in total)

1. M W Nachman, S L Crowell. Estimate of the mutation rate per nucleotide in humans. Genetics. 2000-09.

2. J C Engert, P Bérubé, J Mercier, C Doré, P Lepage, B Ge, J P Bouchard, J Mathieu, S B Melançon, M Schalling, E S Lander, K Morgan, T J Hudson, A Richter. ARSACS, a spastic ataxia common in northeastern Québec, is caused by mutations in a new gene encoding an 11.5-kb ORF. Nat Genet. 2000-02.

3. C Coulondre, J H Miller, P J Farabaugh, W Gilbert. Molecular basis of base substitution hotspots in Escherichia coli. Nature. 1978-08-24.

4. Tao Feng, Robert C Elston, Xiaofeng Zhu. Detecting rare and common variants for complex traits: sibpair and odds ratio weighted sum statistics (SPWSS, ORWSS). Genet Epidemiol. 2011-05-18.

5. Kwan Keung Leung, Po Man Suen, Tse Kin Lau, Wing Hung Ko, Kwok Ming Yao, Po Sing Leung. PDZ-domain containing-2 (PDZD2) drives the maturity of human fetal pancreatic progenitor-derived islet-like cell clusters with functional responsiveness against membrane depolarization. Stem Cells Dev. 2009-09.

6. Alexey S Kondrashov. Direct estimates of human per nucleotide mutation rates at 20 loci causing Mendelian diseases. Hum Mutat. 2003-01.

7. G S Grieco, A Malandrini, G Comanducci, V Leuzzi, M Valoppi, A Tessa, S Palmeri, L Benedetti, A Pierallini, S Gambelli, A Federico, F Pierelli, E Bertini, C Casali, F M Santorelli. Novel SACS mutations in autosomal recessive spastic ataxia of Charlevoix-Saguenay type. Neurology. 2004-01-13.

8. Jonathan Sebat, B Lakshmi, Dheeraj Malhotra, Jennifer Troge, Christa Lese-Martin, Tom Walsh, Boris Yamrom, Seungtai Yoon, Alex Krasnitz, Jude Kendall, Anthony Leotta, Deepa Pai, Ray Zhang, Yoon-Ha Lee, James Hicks, Sarah J Spence, Annette T Lee, Kaija Puura, Terho Lehtimäki, David Ledbetter, Peter K Gregersen, Joel Bregman, James S Sutcliffe, Vaidehi Jobanputra, Wendy Chung, Dorothy Warburton, Mary-Claire King, David Skuse, Daniel H Geschwind, T Conrad Gilliam, Kenny Ye, Michael Wigler. Strong association of de novo copy number mutations with autism. Science. 2007-03-15.

9. Pauline A Fujita, Brooke Rhead, Ann S Zweig, Angie S Hinrichs, Donna Karolchik, Melissa S Cline, Mary Goldman, Galt P Barber, Hiram Clawson, Antonio Coelho, Mark Diekhans, Timothy R Dreszer, Belinda M Giardine, Rachel A Harte, Jennifer Hillman-Jackson, Fan Hsu, Vanessa Kirkup, Robert M Kuhn, Katrina Learned, Chin H Li, Laurence R Meyer, Andy Pohl, Brian J Raney, Kate R Rosenbloom, Kayla E Smith, David Haussler, W James Kent. The UCSC Genome Browser database: update 2011. Nucleic Acids Res. 2010-10-18.

10. Augustine Kong, Michael L Frigge, Gisli Masson, Soren Besenbacher, Patrick Sulem, Gisli Magnusson, Sigurjon A Gudjonsson, Asgeir Sigurdsson, Aslaug Jonasdottir, Adalbjorg Jonasdottir, Wendy S W Wong, Gunnar Sigurdsson, G Bragi Walters, Stacy Steinberg, Hannes Helgason, Gudmar Thorleifsson, Daniel F Gudbjartsson, Agnar Helgason, Olafur Th Magnusson, Unnur Thorsteinsdottir, Kari Stefansson. Rate of de novo mutations and the importance of father's age to disease risk. Nature. 2012-08-23.
