Steven Pastor, Oanh Tran, Andrea Jin, Danielle Carrado, Benjamin A Silva, Lahari Uppuluri, Heba Z Abid, Eleanor Young, T Blaine Crowley, Alice G Bailey, Daniel E McGinn, Donna M McDonald-McGinn, Elaine H Zackai, Michael Xie, Deanne Taylor, Bernice E Morrow, Ming Xiao, Beverly S Emanuel.
Abstract
The most prevalent microdeletion in humans occurs at 22q11.2, a region rich in chromosome-specific low copy repeats (LCR22s). The structure of this region has defied elucidation because of its size, regional complexity, and haplotype diversity, and it is not well represented in the human genome reference. Most individuals with 22q11.2 deletion syndrome (22q11.2DS) carry a de novo hemizygous deletion of ~3 Mbp arising by non-allelic homologous recombination (NAHR) mediated by LCR22s. In this study, optical mapping was used to elucidate LCR22 structure and variation in 88 individuals from thirty 22q11.2DS families, in order to uncover potential risk factors for germline rearrangements leading to 22q11.2DS offspring. Families were optically mapped to characterize LCR22 structures, NAHR locations, and genomic signatures associated with the deletion. Bioinformatics analyses revealed clear delineations between LCR22 structures in normal and deletion-containing haplotypes. Although no explicit whole-haplotype predisposing configurations were identified, all NAHR events contain a segmental duplication encompassing FAM230 gene members, suggesting preferred recombination sequences. Analysis of deletion breakpoints indicates that preferred recombinations occur between FAM230 and specific segmental duplication orientations within LCR22A and LCR22D, ultimately leading to NAHR. This work represents the most comprehensive analysis of 22q11.2DS NAHR events, demonstrating completely contiguous LCR22 structures surrounding and within deletion breakpoints.
Year: 2020 PMID: 32699385 PMCID: PMC7376033 DOI: 10.1038/s41598-020-69134-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Defining LCR22 features through optical mapping. DLE-1-labeled molecules (yellow), > 150 kbp in length, are assembled into contigs (blue) and aligned to the in silico labeled hg38 reference map (green) to obtain individual LCR22A and LCR22D haplotype maps. Molecules comprising haplotypes anchor outside segmental duplications (green boxes) and connect tandem duplicons through unique or polymorphic labels. Here, LCR22A has two haplotypes: the top one lacks SD22-3 (mustard) and contains one 160 kbp module (red) in reference orientation. The second LCR22A haplotype also lacks SD22-3 but differs from the first by having three 160 kbp modules. Anchored molecules connecting to the four labels in the 5′-most 160 kbp module distinguish it as a separate haplotype from the first, which contains three labels at the same reference-based locus. Likewise, the first haplotype shows clear evidence of a contig and its molecules anchoring at the 3′ end, whereas the second haplotype continues into the next 160 kbp module. LCR22D also contains two unique haplotypes: the first contains a 160 kbp module with six 5′ labels and an inversion (pink), and the second contains a 160 kbp module with four labels at the same reference-based locus and no inversion. Mapping contigs and molecules in all genomes yielded four clearly defined features, explained in the two boxes. SD22-3 (mustard), SD22-4 (red), and the frequent ~64 kbp LCR22D inversion (pink) are named according to [9] and [15].
Figure 2. Optical mapping unambiguously reveals the correct structures and haplotypes in 22q11.2. Illumina sequencing reads of the 11744C genome map to the hg38 reference 22q11.2 sequence, including reads mapping to the SD22-3 segmental duplication (mustard). Inspection of SD22-3 in the UCSC Genome Browser’s segmental duplication track shows that reads mapping to several loci within SD22-3 would also map to other segmental duplications in 22q11.2 with > 98% identity. A small section of SD22-3 is indicated in the figure, where reads mapping from 18,560,037 to 18,560,186 would also map to two LCR22D loci with 100% identity. Optical mapping of the 11744C genome indicates the complete absence of SD22-3 in both of its haplotypes (blue contigs) when aligned to the hg38 reference map (green). One haplotype consists of two copies of SD22-4, a previously characterized 160 kbp element, in inverted and reference orientations. The other haplotype consists of three copies of SD22-4, all inverted relative to the reference. Anchor regions outside the segmental duplications (green) validate correct mapping of the contigs and provide clear evidence of these two haplotypes, demonstrating that SD22-3 is not present as a gross structure in this genome, contrary to what the Illumina short reads indicated.
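The short-read ambiguity described for Figure 2 can be illustrated with a toy example: a read lying entirely inside a duplicated segment matches every copy of that segment equally well, so its true origin cannot be determined, whereas a read spanning unique flanking sequence can be placed. This is a hypothetical illustration only, not the authors' pipeline; it uses exact string search on made-up sequences, while real aligners score approximate matches and report mapping quality.

```python
def exact_hits(read: str, reference: str) -> list[int]:
    """Return every 0-based start position where `read` occurs exactly in `reference`."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        # Resume one base later so overlapping occurrences are also found.
        start = reference.find(read, start + 1)
    return hits


# Toy reference with two identical copies of a hypothetical duplicated segment.
dup = "ACGTACGGTTCA"
reference = "TTTT" + dup + "CCCCCC" + dup + "GGGG"

inside_dup = "GTACGGTT"   # lies entirely within the duplicated segment
spanning = "CACCCCCCAC"   # spans the unique sequence between the two copies

print(exact_hits(inside_dup, reference))  # [6, 24]  -> ambiguous placement
print(exact_hits(spanning, reference))    # [14]     -> unique placement
```

Long labeled molecules behave like the spanning read: because they reach unique anchor sequence on either side of a duplication, their placement is unambiguous.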
Figure 3. Unique LCR22A structures across parental genomes. LCR22A haplotypes were grouped based on the two defined segmental duplication features (see Fig. 1) and on DLE-1 label patterns. The left-most integer of each haplotype contig map (blue) indicates grouping by DLE-1 labels (61 groups), and the right-most integer indicates grouping by the two segmental duplication features (32 groups). The 160 kbp modules (red arrows) and SD22-3 modules (mustard arrows) were present, absent, or copy-number variable, in any orientation relative to the hg38 reference map. The partial 160 kbp modules (smaller red arrows) were present only in single copies and in reference orientation.
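The two-level grouping in Figure 3 can be sketched as follows, assuming (hypothetically) that each haplotype map has already been reduced to an ordered list of modules, each described by a feature name, an orientation, and a label pattern. Grouping on the full description separates haplotypes that differ only in DLE-1 labels, while grouping on feature and orientation alone merges them, which is why the feature-based grouping can only yield the same number of groups or fewer. The haplotype names, module tuples, and label strings are invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Tuple

# Hypothetical module description: (feature, orientation, label_pattern)
Module = Tuple[str, str, str]


def group_haplotypes(
    haplotypes: Dict[str, List[Module]],
    key: Callable[[List[Module]], Hashable],
) -> Dict[Hashable, List[str]]:
    """Group haplotype names by the signature returned by `key`."""
    groups: Dict[Hashable, List[str]] = defaultdict(list)
    for name, modules in haplotypes.items():
        groups[key(modules)].append(name)
    return dict(groups)


def label_signature(modules: List[Module]) -> Tuple[Module, ...]:
    # Finest signature: features, orientations, and label patterns.
    return tuple(modules)


def feature_signature(modules: List[Module]) -> Tuple[Tuple[str, str], ...]:
    # Coarser signature: segmental duplication features and orientations only.
    return tuple((feature, orientation) for feature, orientation, _ in modules)


haplotypes = {
    "hap1": [("160kbp", "+", "4 labels")],
    "hap2": [("160kbp", "+", "3 labels")],  # differs from hap1 only in labels
    "hap3": [("160kbp", "+", "4 labels"), ("SD22-3", "-", "5 labels")],
}

by_labels = group_haplotypes(haplotypes, label_signature)
by_features = group_haplotypes(haplotypes, feature_signature)
print(len(by_labels), len(by_features))  # 3 2
```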
Figure 4. Unique LCR22D structures across parental genomes. Optical maps reveal a relatively stable LCR22D configuration across haplotypes and genomes, with very few changes to the 160 kbp module (red arrows) but a frequent inversion polymorphism (pink arrows). Left-most integers indicate maps grouped by DLE-1 labels, and right-most integers indicate maps grouped by the previously defined segmental duplication features. There are also instances of partial 160 kbp duplications (left-most integer labels 6, 7, 16, and 17). Most LCR22D parental haplotypes contain the inversion, and no significant difference in inversion frequency was observed between the LCR22D haplotypes of parents of deletion origin and those of non-transmitting parents (bottom-right box).
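The caption does not name the statistical test behind the comparison of inversion frequencies. As an assumed illustration only, a comparison of this kind can be framed as a 2×2 table (transmitting vs. non-transmitting parental haplotypes, inversion present vs. absent) and evaluated with a two-sided Fisher's exact test. The counts below are hypothetical and are not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows = transmitting vs non-transmitting haplotypes,
    columns = inversion present vs absent.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x: int) -> int:
        # Hypergeometric weight (proportional to the probability) of the
        # table with x in the top-left cell, given the fixed margins.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the weights of all tables no more likely than the observed one.
    extreme = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, col1)


# Hypothetical counts only.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

Comparing integer weights avoids floating-point tolerance issues when deciding which tables are "as extreme"; for real analyses, `scipy.stats.fisher_exact` provides a tested implementation of the same test.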
Figure 5. Thirteen unique estimated NAHR events across 30 families. In each event, a parental LCR22A contig (blue) is the top-most contig and the opposing parental LCR22D contig is at the bottom. The middle contig is a proband’s deletion-containing contig from the families listed in the gray box below. Labels matched between the parental LCR22A contig (green boxes) and the proband contig indicate sequence unambiguously derived from LCR22A, before the site of NAHR. Labels matched between the parental LCR22D contig (purple boxes) and the proband contig indicate sequence unambiguously derived from LCR22D, after the site of NAHR. Red boxes mark the estimated, ambiguous site of recombination, defined by labels shared across the parental LCR22A and LCR22D contigs. All NAHR events overlapped FAM230 sequences. Estimated NAHR ranges are not to scale.
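The breakpoint logic described for Figure 5 can be sketched under strong simplifying assumptions: each contig is reduced to an ordered list of discrete label tokens, all three contigs are trimmed to common flanking anchors, and labels either match exactly or not (real optical-map alignment compares inter-label distances with a sizing tolerance). Proband labels that match only the parental LCR22A contig correspond to the green boxes, labels that match only the parental LCR22D contig to the purple boxes, and labels consistent with both to the red, ambiguous NAHR window. The function names and tokens are hypothetical.

```python
from typing import Sequence, Tuple


def common_prefix_len(x: Sequence[str], y: Sequence[str]) -> int:
    """Count leading positions where x and y carry the same label token."""
    n = 0
    for a, b in zip(x, y):
        if a != b:
            break
        n += 1
    return n


def estimate_nahr_window(
    parent_a: Sequence[str], proband: Sequence[str], parent_d: Sequence[str]
) -> Tuple[range, range, range]:
    """Split proband label indices into LCR22A-only, shared, and LCR22D-only."""
    n = len(proband)
    # Labels matching parental LCR22A, read from the proximal anchor.
    p = common_prefix_len(proband, parent_a)
    # Labels matching parental LCR22D, read back from the distal anchor.
    s = common_prefix_len(proband[::-1], parent_d[::-1])
    shared_start = n - s
    if shared_start > p:
        raise ValueError("some proband labels match neither parental contig")
    return range(0, shared_start), range(shared_start, p), range(p, n)


parent_a = ["a", "b", "c", "x", "y", "z"]
parent_d = ["p", "q", "x", "y", "r", "s"]
proband = ["a", "b", "c", "x", "y", "r", "s"]  # LCR22A start joined to LCR22D end

a_only, shared, d_only = estimate_nahr_window(parent_a, proband, parent_d)
print([proband[i] for i in shared])  # ['x', 'y']  (ambiguous NAHR window)
```

The recombination point cannot be narrowed further than the shared window, because every label in it is equally consistent with either parental contig; this mirrors why the red boxes are reported as ranges.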