Daniel Melamed1,2, Yuval Nov3, Assaf Malik4, Michael B Yakass5,6, Evgeni Bolotin1,2, Revital Shemer7, Edem K Hiadzi6, Karl L Skorecki8, Adi Livnat1,2.
Abstract
Although it is known that the mutation rate varies across the genome, previous estimates were based on averaging across various numbers of positions. Here, we describe a method to measure the origination rates of target mutations at target base positions and apply it to a 6-bp region in the human hemoglobin subunit beta (HBB) gene and to the identical, paralogous hemoglobin subunit delta (HBD) region in sperm cells from both African and European donors. The HBB region of interest (ROI) includes the site of the hemoglobin S (HbS) mutation, which protects against malaria, is common in Africa, and has served as a classic example of adaptation by random mutation and natural selection. We found a significant correspondence between de novo mutation rates and past observations of alleles in carriers, showing that mutation rates vary substantially in a mutation-specific manner that contributes to the site frequency spectrum. We also found that the overall point mutation rate is significantly higher in Africans than in Europeans in the HBB region studied. Finally, the rate of the 20A→T mutation, called the "HbS mutation" when it appears in HBB, is significantly higher than expected from the genome-wide average for this mutation type. Nine instances were observed in the African HBB ROI, where it is of adaptive significance, representing at least three independent originations; no instances were observed elsewhere. Further studies will be needed to examine mutation rates at the single-mutation resolution across these and other loci and organisms and to uncover the molecular mechanisms responsible.
Year: 2022 PMID: 35031571 PMCID: PMC8896469 DOI: 10.1101/gr.276103.121
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Experiment overview. Sperm samples are obtained from world regions with high or low malaria infection burden (malaria impact map adjusted from the CDC map) (CDC Division of Parasitic Diseases and Malaria 2019). Whole-genome DNA is extracted and an amount equivalent to 60–80 million sperm cells per donor is subjected to Bsu36I digestion. Bsu36I cleaves the DNA at multiple sites, including the HBB and HBD ROIs, which carry a specific recognition sequence. The HbS mutation blocks Bsu36I digestion and is thus enriched over the wild-type (WT). A primary barcode is added directly to each antisense DNA strand that carries the HBB or HBD ROI via a DNA polymerase–assisted fill-in reaction. Because each barcode consists of a random sequence of nucleotides, each of the numerous target fragments has its own unique barcode, illustrated by a unique color on the left end of the representation of each barcoded fragment. Multiple single-strand copies are each generated directly from each uniquely barcoded target fragment by linear amplification. A secondary barcode composed of a random sequence of nucleotides is added to the other end of each of these copies by a single primer extension reaction, illustrated by a unique color on the right end of each barcoded fragment. Thus, only full-length fragments (i.e., mutant or WT ROI sequences that evaded Bsu36I digestion) carry both the primary and the secondary barcodes and can be amplified by PCR for high-throughput sequencing. At the sequence analysis step, sequencing reads representing the PCR products of the linearly amplified copies are grouped together into families (see boxes), where in each family, reads share the same primary barcode sequence. Sporadic sequencing errors or DNA-polymerase errors generated during linear or subsequent amplification steps are unlikely to be repeated in multiple copies and are removed. De novo mutations, such as the HbS mutation, are easily identified by their appearance in multiple reads from distinct linear-amplification events. For a complete description of the library preparation protocol, which includes additional steps, see Supplemental Figures S1–S3.
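The family-based error filtering described in the Figure 1 legend can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The read tuples, the toy ROI strings, the function names, the `min_copies` threshold, and the majority rule are all assumptions chosen for illustration. Reads are grouped by primary barcode into families, each linear-amplification copy (identified by its secondary barcode) gets its own consensus, and a sequence is accepted for the family only when it recurs across distinct copies.

```python
from collections import Counter, defaultdict

# Toy placeholder sequences (hypothetical, for illustration only).
WT = "CCTGAGG"
MUT = "CCTGTGG"

def group_into_families(reads):
    """Group (primary_barcode, secondary_barcode, roi) reads.

    Returns {primary: {secondary: [roi, ...]}}: one family per original
    fragment, one entry per linear-amplification copy of that fragment.
    """
    families = defaultdict(lambda: defaultdict(list))
    for primary, secondary, roi in reads:
        families[primary][secondary].append(roi)
    return families

def call_family(copies, min_copies=2):
    """Return the ROI sequence supported by the family, or None.

    Assumed rule: take the most common read within each copy, then accept
    the sequence seen in at least `min_copies` distinct copies and in a
    strict majority of them. Errors confined to one copy are outvoted.
    """
    per_copy = [Counter(rois).most_common(1)[0][0] for rois in copies.values()]
    seq, support = Counter(per_copy).most_common(1)[0]
    if support >= min_copies and support > len(per_copy) / 2:
        return seq
    return None

reads = [
    # Family AAA: the variant appears in three distinct copies.
    ("AAA", "s1", MUT), ("AAA", "s1", MUT), ("AAA", "s2", MUT), ("AAA", "s3", MUT),
    # Family BBB: the variant appears in one copy only (amplification error).
    ("BBB", "t1", WT), ("BBB", "t1", WT), ("BBB", "t2", MUT), ("BBB", "t3", WT),
    # Family CCC: a single copy, too little support to call anything.
    ("CCC", "u1", MUT),
]

families = group_into_families(reads)
calls = {primary: call_family(copies) for primary, copies in families.items()}
```

Under these assumptions, family AAA is called as the variant, family BBB as wild-type, and family CCC is left uncalled. A real analysis would also need to handle barcode sequencing errors and read quality, which this sketch ignores.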
Figure 2. Accuracy and yield of MEMDS compared with current cutting-edge methods for studying target regions. (A) Under a highly conservative estimate, MEMDS increases accuracy by at least 40-fold compared to duplex sequencing (DS) (Kennedy et al. 2014) and maximum depth sequencing (MDS) (Jee et al. 2016). (B) MEMDS also increases yield per sequenced base (i.e., the number of MEMDS confirmed bases divided by the number of paired-end sequenced bases) by orders of magnitude over both DS and MDS (Kennedy et al. 2014; Jee et al. 2016). Notice that in MEMDS, the yield can be higher than 1 because the mutation enrichment factor is accurately calculated (Supplemental Text S2) and the base identity is known for the ROI sequences that were digested and removed from the final sequencing libraries (they have the restriction enzyme recognition sequence). Although the accuracy of DS has been improved in the context of sequencing large parts of the genome (Abascal et al. 2021), yield considerations and other limitations preclude applying current DS-based methods to narrow ROIs and target mutations (Kennedy et al. 2014; Supplemental Text S1) with the same efficiency as that of MEMDS.
HBB and HBD ROI mutation counts