Muhammad Kohailan, Waleed Aamer, Najeeb Syed, Sujitha Padmajeya, Sura Hussein, Amira Sayed, Jyothi Janardhanan, Sasirekha Palaniswamy, Nady El Hajj, Ammira Al-Shabeeb Akil, Khalid A Fakhro.
Abstract
While de novo mutations (DNMs) are key to genetic diversity, they are also responsible for a high number of rare disorders. To date, no study has systematically examined the rate and distribution of DNMs in multiplex families in highly consanguineous populations. Leveraging whole-genome sequencing (WGS) profiles of 645 individuals in 146 families, we implemented a combinatorial approach using 3 complementary tools for DNM discovery in 353 unique trio combinations. We found a total of 27,168 DNMs (median: 70 single-nucleotide variants and 6 insertion-deletions per individual). Phasing revealed that around 80% of DNMs were paternal in origin. Notably, using whole-genome methylation data of spermatogonial stem cells, we found that these DNMs were significantly more likely to occur at highly methylated CpGs (OR: 2.03; p value = 6.62 × 10⁻¹¹). We then examined the effects of consanguinity and ethnicity on DNMs, and found that consanguinity does not seem to correlate with DNM rate, and that such a correlation must be measured with particular care. Additionally, we found that Middle-Eastern families with Arab ancestry had fewer DNMs than African families, although the difference was not significant (p value = 0.16). Finally, for families with diseased probands, we examined the difference in DNM counts and putative impact across affected and unaffected siblings, but did not find significant differences between disease groups, likely owing to the enrichment for recessive disorders in this part of the world, or the small sample size per clinical condition. This study serves as a reference for DNM discovery in multiplex families from the globally under-represented populations of the Middle-East.
Year: 2022 PMID: 35718832 PMCID: PMC9510050 DOI: 10.1038/s10038-022-01054-9
Source DB: PubMed Journal: J Hum Genet ISSN: 1434-5161 Impact factor: 3.755
Description of the included families and identified DNMs in the study cohort
| Description | Count |
|---|---|
| Total cohort size | 645 samples |
| Trios (males, females) | 353 (190, 163) |
| Phenotypes | |
| Neurogenetic | 92 |
| Craniofacial | 17 |
| Endocrine | 9 |
| Multi-system | 17 |
| Other | 25 |
| Healthy | 193 |
| Sub-populations | |
| African | 33 |
| South-Asian | 67 |
| Middle-Eastern | 207 |
| Caucasian | 21 |
| Other | 25 |
| Total families | 146 |
| Consanguineous families | 47 |
| Median fathers’ age | 34 years old |
| Median mothers’ age | 29 years old |
| Total identified de novo variants | 27,168 |
| SNVs (median per individual) | 24,808 (70) |
| INDELs (median per individual) | 2,360 (6) |
| Effective genome coverage | 2.797 × 10⁹ |
| SNV rate | 1.25 × 10⁻⁸ |
| INDEL rate | 1.07 × 10⁻⁹ |
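
The two rates in the table can be reproduced from the median counts and the effective genome coverage. The sketch below is an illustration, not the authors' stated procedure: it assumes the coverage figure is in base pairs and that the rate is the median DNM count divided by twice that figure (one copy per parental haplotype). The function name `per_base_rate` and the `ploidy` parameter are hypothetical.

```python
# Values copied from the cohort table.
EFFECTIVE_GENOME = 2.797e9  # effective genome coverage (assumed: base pairs)
MEDIAN_SNVS = 70            # median de novo SNVs per individual
MEDIAN_INDELS = 6           # median de novo INDELs per individual


def per_base_rate(median_count, callable_bases, ploidy=2):
    """Assumed formula: DNMs per individual / (ploidy * callable bases)."""
    return median_count / (ploidy * callable_bases)


snv_rate = per_base_rate(MEDIAN_SNVS, EFFECTIVE_GENOME)
indel_rate = per_base_rate(MEDIAN_INDELS, EFFECTIVE_GENOME)

print(f"SNV rate:   {snv_rate:.2e}")
print(f"INDEL rate: {indel_rate:.2e}")

# Internal consistency of the tabulated counts.
trios = 353
phenotypes = [92, 17, 9, 17, 25, 193]
sub_populations = [33, 67, 207, 21, 25]
print(sum(phenotypes) == trios, sum(sub_populations) == trios)
print(24_808 + 2_360 == 27_168)
```

Under these assumptions the computed values round to the tabulated 1.25 × 10⁻⁸ and 1.07 × 10⁻⁹, and the phenotype, sub-population, and variant counts each sum to the reported totals.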
Fig. 1 Parental age effects on DNM counts. A Correlation between parental age at conception and number of phased DNMs normalized to the total number of phased DNMs in each individual, performed across all families. The blue regression line (slope = 1.36, 95% CI = 1.11–1.61) shows paternally phased DNMs, while the red line (slope = 0.33, 95% CI = 0.11–0.56) shows maternally phased DNMs. B Paternal age is plotted against the number of total autosomal DNMs for individuals in large families (number of offspring ≥4, total = 21 families), with each family analyzed separately. Families were plotted in order of ascending correlation for easier visualization. Slopes of the regression lines range from −0.54 (95% CI: −7.06 to 5.97) to +7.74 (95% CI: 3.14 to 12.33). C A Poisson regression for each large family. The plot shows the slope of each regression ± 95% confidence intervals. The vertical line indicates the average paternal age effect for all families in this model
Fig. 2 Effect of family size on DNM counts. Number of offspring per family is plotted against DNM count in individuals. The red line represents the regression line (slope = −1.15) with 95% confidence intervals shown in gray
Fig. 3 De novo mutation load and consanguinity. Parents in each trio in the dataset were categorized into 1st degree cousins (blue), 2nd degree cousins (green), and unrelated (red). Boxplots show the median and interquartile range, and p values are shown above brackets. Plots show the correlation between relatedness scores and (A) DNM count, (B) father’s age at conception, (C) family size, (D) DNM count after correcting for father’s age, and (E) DNM count after correcting for family size
Fold difference in the fraction of DNMs based on methylation levels
| Cell type | Methylation level | DNM CpGs | All CpGs | Fraction | Fold difference | p value |
|---|---|---|---|---|---|---|
| SSCs | ≤50% | 86 | 8,804,182 | 9.77 × 10⁻⁶ | – | – |
| SSCs | >50% | 389 | 19,617,481 | 1.98 × 10⁻⁵ | 2.03 | 6.62 × 10⁻¹¹ |
| Liver cells | ≤50% | 288 | 18,383,721 | 1.57 × 10⁻⁵ | – | – |
| Liver cells | >50% | 187 | 10,037,942 | 1.86 × 10⁻⁵ | 1.19 | 0.03 |
| PBMCs | ≤50% | 316 | 20,440,889 | 1.55 × 10⁻⁵ | – | – |
| PBMCs | >50% | 159 | 7,980,774 | 1.99 × 10⁻⁵ | 1.29 | 0.004 |
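
The Fraction and Fold difference columns follow directly from the counts: each fraction is DNM CpGs divided by all CpGs in that methylation bin, and the fold difference is the >50% fraction divided by the ≤50% fraction. A minimal sketch of that arithmetic is shown here; the names `COUNTS` and `fold_difference` are hypothetical, and the statistical test behind the p values is not specified in this record, so it is not reproduced.

```python
# (DNM CpGs, all CpGs) per methylation bin, copied from the table.
COUNTS = {
    "SSCs": {"low": (86, 8_804_182), "high": (389, 19_617_481)},
    "Liver cells": {"low": (288, 18_383_721), "high": (187, 10_037_942)},
    "PBMCs": {"low": (316, 20_440_889), "high": (159, 7_980_774)},
}


def fold_difference(bins):
    """Ratio of the DNM fraction at >50% methylation to that at <=50%."""
    low_dnm, low_all = bins["low"]
    high_dnm, high_all = bins["high"]
    return (high_dnm / high_all) / (low_dnm / low_all)


folds = {cell: fold_difference(bins) for cell, bins in COUNTS.items()}
for cell, fold in folds.items():
    print(f"{cell}: fold difference = {fold:.2f}")
```

Because the fractions are on the order of 10⁻⁵, this ratio of fractions is numerically almost identical to an odds ratio, which is consistent with the value of 2.03 being reported as an OR in the abstract.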
Fig. 4 DNM counts by sub-population and disease phenotypes. Boxplots show the median and interquartile range, and p values (Bonferroni corrected in A and B) are shown above brackets. Plots show the (A) DNM counts in different populations, (B) DNM counts in the populations normalized to the father’s age, and (C) DNM counts with regard to disease phenotypes