Sandra Louzada1,2,3, Walid Algady4, Eleanor Weyell4, Luciana W Zuccherato5, Paulina Brajer4, Faisal Almalki4, Marilia O Scliar6, Michel S Naslavsky6, Guilherme L Yamamoto6, Yeda A O Duarte7, Maria Rita Passos-Bueno6, Mayana Zatz6, Fengtang Yang1, Edward J Hollox8.
Abstract
BACKGROUND: Approximately 5% of the human genome shows common structural variation, which is enriched for genes involved in the immune response and cell-cell interactions. A well-established region of extensive structural variation is the glycophorin gene cluster, comprising three tandemly-repeated regions about 120 kb in length and carrying the highly homologous genes GYPA, GYPB and GYPE. Glycophorin A (encoded by GYPA) and glycophorin B (encoded by GYPB) are glycoproteins present at high levels on the surface of erythrocytes, and they have been suggested to act as decoy receptors for viral pathogens. They are receptors for the invasion of the protist parasite Plasmodium falciparum, a causative agent of malaria. A particular complex structural variant, called DUP4, creates a GYPB-GYPA fusion gene known to confer resistance to malaria. Many other structural variants exist across the glycophorin gene cluster, and they remain poorly characterised.
Keywords: Copy number variation; Erythrocytes; GYPA; GYPB; GYPE; Glycophorin; Immune response; Inversion; Malaria; Structural variation
Year: 2020 PMID: 32600246 PMCID: PMC7325229 DOI: 10.1186/s12864-020-06849-8
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Glycophorin structural variants identified in this study
| Variant | Proximal breakpoint hg19 | Distal breakpoint hg19 | Variant size (kb) | Resolution of breakpoint (kb) | Index sample | Genes involved | Breakpoint identification method | In ref. [11] |
|---|---|---|---|---|---|---|---|---|
| DEL1 | chr4:144835143–144,835,279 | chr4:144945375–144,945,517 | 110 | 0.143 | NA19223 | |||
| DEL2 | chr4:144912872–144,913,001 | chr4:145016127–145,016,256 | 103 | 0.130 | NA19144 | |||
| DEL4 | chr4:144750739–144,760,739 | chr4:144950739–144,960,739 | 200 | 10 | HG01986 | |||
| DEL6 | chr4:144780045–144,780,137 | chr4:145004120–145,004,212 | 224 | 0.093 | HG04039 | |||
| DEL7 | chr4:144780111–144,780,497 | chr4:144900945–144,901,334 | 121 | 0.390 | HG02716 | |||
| DEL13 | chr4:144925739–144,935,739 | chr4:145035739–145,045,739 | 110 | 10 | NA20867 | |||
| DEL15 | chr4:144800739–144,802,739 | chr4:144920739–144,922,739 | 119 | 2 | HGDP01172 | |||
| DEL16 | chr4:144752739–144,754,739 | chr4:144952739–144,954,739 | 200 | 2 | BR1296010301 | |||
| DEL17 | chr4:144882739–144,987,739 | chr4:144984739–144,987,739 | 103 | 3 | BR1183605501 | |||
| DEL18 | chr4:144755739–144,757,739 | chr4:144875739–144,878,739 | 123 | 2 | BR1099223302 | |||
| DUP2 | chr4:145039739–145,041,739 | chr4:144919739–144,921,739 | 120 | 2 | NA18593 | |||
| DUP3 | chr4:145004465–145,004,526 | chr4:144780388–144,780,449 | 224 | 0.062 | NA19360 | |||
| DUP4 | Multiple | Multiple | n/a | n/a | HG02554 | |||
| DUP5 | Multiple, including chr4:145113700 | Multiple, including chr4:144936865 | n/a | 0.001 | HG02585 | |||
| DUP7 | chr4:144895000–144,905,000 | chr4:144775000–144,785,000 | 120 | 10 | HG02679 | |||
| DUP8 | chr4:145045739–145,048,739 | chr4:144808739–144,810,739 | 240 | 3 | I1_S_Irula1, HG03837 | |||
| DUP14 | chr4:144853613–144,853,688 | chr4:144723019–144,723,094 | 131 | 0.076 | NA18646 | |||
| DUP22 | chr4:144926739–144,929,739 | chr4:144881739–144,884,739 | 45 | 3 | BR210800138, HG02181 | |||
| DUP26 | chr4:145065739–145,075,739 | chr4:144830739–144,840,739 | 155 | 10 | HG03729 | |||
| DUP27 | chr4:145039739–145,041,739 | chr4:144919739–144,921,739 | 120 | 2 | NA12249 | |||
| DUP29 | chr4:144939393–144,939,452 | chr4:144825584–144,825,643 | 114 | 0.060 | HG03686 | |||
| DUP30 | chr4:144989739–144,991,739 | chr4:144885739–144,887,739 | 102 | 2 | HGDP00543 | |||
| DUP33 | chr4:144959739–144,962,739 | chr4:144849739–144,851,739 | 111 | 3 | BR54409051 | |||
| DUP34 | chr4:145002739–145,004,739 | chr4:144900739–144,902,739 | 102 | 2 | BR1086675791 | |||
| DUP35 | chr4:144878739–144,880,739 | chr4:144758739–144,760,739 | 120 | 2 | BR981404021 | |||
Notes: SD = sequence depth analysis of high-coverage genomic sequencing. DUP19 (NA19223), DUP25 (HG02031) and DUP28 (NA19084) show no clear 5 kb window pattern. DEL4 and DEL16, and DUP2 and DUP27, share overlapping breakpoint regions and may be the same variants. DUP23 (HG02491) and DUP24 (HG03837), identified by reference [11], share population and breakpoint regions with DUP8 and are classified as DUP8. The column titled “In ref. [11]” indicates whether the variant was previously observed by Leffler et al. (reference [11])
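The size and resolution columns in the table above can be recomputed from the breakpoint intervals. The following is a minimal sketch, not the authors' pipeline. It assumes that variant size is the distance between the start coordinates of the two breakpoint intervals and that resolution is the inclusive width of the wider interval, which is consistent with the rows refined to sub-kilobase resolution (for example DEL1 and DEL7). The function names `parse_interval` and `summarise` are hypothetical.

```python
import re

# Matches strings such as "chr4:144835143–144,835,279" (en-dash or hyphen,
# optional thousands separators, optional space after the colon).
_INTERVAL = re.compile(r"^(chr\w+):\s*([\d,]+)\s*[–-]+\s*([\d,]+)$")


def parse_interval(text):
    """Return (chromosome, start, end) as integers from a breakpoint string."""
    match = _INTERVAL.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised interval: {text!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def summarise(proximal, distal):
    """Return (size_kb, resolution_kb) under the assumptions stated above."""
    _, p_start, p_end = parse_interval(proximal)
    _, d_start, d_end = parse_interval(distal)
    # Assumed definition: distance between the two interval starts, in kb.
    size_kb = round(abs(d_start - p_start) / 1000)
    # Assumed definition: inclusive width of the wider interval, in kb.
    resolution_kb = max(p_end - p_start + 1, d_end - d_start + 1) / 1000
    return size_kb, resolution_kb


if __name__ == "__main__":
    # DEL1 row from the table above
    print(summarise("chr4:144835143–144,835,279", "chr4:144945375–144,945,517"))
```

Rows listed as "Multiple" (DUP4, DUP5) do not fit this single-interval model and would need separate handling.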
Fig. 1 Structure of the glycophorin reference allele. A representation of the reference allele assembled in the GRCh37/hg19 assembly is shown, with the three distinct paralogous ~120 kb repeats of the glycophorin region coloured green, orange and purple, carrying GYPE, GYPB and GYPA respectively. Numbers over the start and end of each paralogue represent coordinates in the chromosome 4 GRCh37/hg19 assembly. Coloured bars represent fosmids used as probes in fibre-FISH, with the fosmid identification number underneath. The lower black panel is an example fibre-FISH image of this reference haplotype (from sample HG02585). The fibre-FISH image is scaled approximately to match the reference above it, with approximate boundaries between glycophorin repeats shown as dashed lines
Fig. 2 Fibre-FISH validation of four glycophorin deletions. Sequence read depth (SRD) analysis of selected deletions (DEL1, DEL2, DEL6, DEL7) is shown on the left. The sequence read depth for each 5 kb window is shown as a point coloured according to the key on each plot, either by sample or by cohort. The solid black line is the Loess best-fit line through the points. Individuals homozygous for DEL1 or DEL2 are shown in the plot with a very low sequence read depth (~0) across the deleted region. Above each plot the coloured bars show the glycophorin repeat regions, as in Fig. 1. The smaller coloured bars represent the location of each glycophorin gene (GYPE, GYPB, GYPA), labelled above each one. Representative fibre-FISH images from the index sample of each variant are shown on the right, with clones and fluorescent labels as shown in Fig. 1. All index samples apart from NA18719 are heterozygous, with a representative reference (top) and variant (bottom) allele from that sample shown. A schematic diagram next to the corresponding fibre-FISH image shows the structure of each allele inferred from the fibre-FISH and SRD analysis
Fig. 3 Fibre-FISH validation of six glycophorin duplications. Sequence read depth (SRD) analysis of selected duplications (DUP2, DUP3, DUP7, DUP8, DUP14 and DUP29) is shown on the left. The sequence read depth for each 5 kb window is shown as a point coloured according to the key on each plot, either by sample or by cohort. The solid black line is the Loess best-fit line through the points. Above each plot the coloured bars show the glycophorin repeat regions, as in Fig. 1. The smaller coloured bars represent the location of each glycophorin gene (GYPE, GYPB, GYPA), labelled above each one. Representative fibre-FISH images from the index sample of each variant are shown on the right, with clones and fluorescent labels as shown in Fig. 1, with an additional green-labelled PCR product specific to the glycophorin E repeat for HG03686. All index samples are heterozygous, with a representative reference and variant allele from that sample shown. A schematic diagram next to the corresponding fibre-FISH image shows the structure of each allele inferred from the fibre-FISH and SRD analysis
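The read-depth plots in Figs. 2 and 3 can be read as copy-number estimates per 5 kb window. The sketch below illustrates that reading only. It assumes a diploid baseline taken as the median depth of windows outside the variable region, and it does not reproduce the Loess fit or any normalisation the authors applied. The function name and the example depths are hypothetical.

```python
from statistics import median


def estimate_copy_number(window_depths, reference_depths, ploidy=2):
    """Convert per-window read depth to an integer copy-number estimate.

    window_depths    -- mean read depth in each 5 kb window of interest
    reference_depths -- depths from windows assumed to be at normal copy number
    ploidy           -- copy number assumed for the reference windows
    """
    baseline = median(reference_depths)
    if baseline <= 0:
        raise ValueError("reference depth must be positive")
    # Depth scales linearly with copy number under this simple model.
    return [round(ploidy * depth / baseline) for depth in window_depths]


if __name__ == "__main__":
    # Hypothetical depths: normal, heterozygous deletion,
    # homozygous deletion, heterozygous duplication.
    print(estimate_copy_number([30, 15, 0, 45], [29, 30, 31]))
```

Under this model a heterozygous deletion appears as copy number 1, a homozygous deletion as 0 (the near-zero depth described for DEL1 and DEL2 homozygotes), and a heterozygous duplication as 3.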
Fig. 4 Analysis of DUP5 and DUP26 complex structures. a Sequence read depth (SRD) analysis of three individuals heterozygous for the DUP5 variant. b Representative fibre-FISH images from the DUP5 index sample HG02585. Clones and fluorescent labels as shown in Fig. 1. c Representative fibre-FISH images from the DUP5 index sample HG02585. Clones and fluorescent labels as shown in Fig. 1, except the red probe is fosmid G248P89366H1 and the pink probe is the glycophorin E repeat-specific PCR product. d Schematic showing design of PCR primers for specific amplification (black arrows) on reference and DUP5 structures. The ethidium bromide stained agarose gel shows a ~8 kb PCR product generated by these DUP5-specific primers. HG02554 is the DUP5 sample, “-” indicates a negative control with no genomic DNA and the marker, indicated by “m”, is Bioline Hyperladder 1 kb+. The triangles indicate increasing PCR annealing temperature from 65 °C to 67 °C. e Sequence read depth (SRD) analysis (left) and fibre-FISH analysis (right) of the index sample HG03729, heterozygous for the DUP26 variant. Fosmid clones for fibre-FISH are as in Fig. 1, except with the addition of the glycophorin E repeat-specific PCR product labelled in pink (c, d) or green (e)
Fig. 5 Examples of refining breakpoints of a deletion (DEL6) and a duplication (DUP14). a Sequence read depth analysis, indicating position of PCR primers (not to scale). b Variant model, showing position of primers on reference and variant. c Agarose electrophoresis of long PCR products using variant-specific primers indicated in b. “-” indicates a negative control with no genomic DNA and the marker, indicated by “m”, is Bioline Hyperladder 1 kb+. The triangles indicate increasing PCR annealing temperature from 58 °C to 67 °C. d Multiple sequence alignment of the variant-specific PCR product, with homologous sequence on the GYPA repeat and the GYPE repeat. GYPE-specific variants are in green, GYPA-repeat-specific variants are in purple. e A model of the generation of the variants by NAHR
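The alignment in panel d narrows a breakpoint by finding where the variant-specific PCR product stops matching one paralogous repeat and starts matching the other. The sketch below shows that idea on pre-aligned, equal-length strings. It is an illustration under simplifying assumptions (a single switch, no sequencing errors, gap columns ignored), not the authors' method, and the function name and toy sequences are hypothetical.

```python
def locate_switch(product, paralogue_a, paralogue_b):
    """Return the alignment columns that bracket the paralogue switch.

    All three inputs are rows of one multiple sequence alignment and must
    have equal length. Only columns where the two paralogues differ are
    informative. Returns (last column matching the first paralogue seen,
    first column matching the other), or None if no switch is found.
    """
    if not (len(product) == len(paralogue_a) == len(paralogue_b)):
        raise ValueError("aligned sequences must have equal length")

    calls = []  # (column index, "A" or "B") for each informative column
    for i, (p, a, b) in enumerate(zip(product, paralogue_a, paralogue_b)):
        if a == b or "-" in (p, a, b):
            continue  # uninformative column or alignment gap
        if p == a:
            calls.append((i, "A"))
        elif p == b:
            calls.append((i, "B"))

    # Report the first pair of adjacent informative columns that disagree.
    for (i1, s1), (i2, s2) in zip(calls, calls[1:]):
        if s1 != s2:
            return i1, i2
    return None


if __name__ == "__main__":
    # Toy sequences: the product matches paralogue B at columns 1 and 4,
    # then paralogue A at column 8.
    print(locate_switch("AGGTTCGTAC", "ACGTACGTAC", "AGGTTCGTCC"))
```

The returned pair delimits the interval within which the exchange must have occurred. Its width depends on how densely the paralogue-specific variants are spaced.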
Fig. 6 Structural variant breakpoints and meiotic recombination hotspots. The glycophorin region is shown together with the glycophorin genes. Below are the breakpoint regions for each structural variant, labelled in blue for the distal breakpoint in the variant, and red for the proximal breakpoint in the variant. Meiotic double strand break hotspots, corresponding to recombination hotspots [25], are shown in orange, labelled with the PRDM9 allele responsible for activating that hotspot
Global distribution of glycophorin structural variants
| Cohort (continental grouping) | 1000 Genomes (EUR) | 1000 Genomes (AFR) | 1000 Genomes (SAS) | 1000 Genomes (EAS) | 1000 Genomes (AMR) | Gambian (AFR) | Simons (ALL) | Brazilian (AMR) |
|---|---|---|---|---|---|---|---|---|
| Total number of chromosomes | 600 | 640 | 386 | 606 | 258 | 782 | 546 | 2650 |
| DEL1 | 0 | 53 | 0 | 1 | 1 | 55 | 7 | 19 |
| DEL2 | 0 | 26 | 0 | 0 | 2 | 2 | 4 | 12 |
| DEL4/16 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 3 |
| DEL6 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
| DEL7 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 0 |
| DUP2/27 | 0 | 1 | 1 | 11 | 1 | 0 | 0 | 7 |
| DUP3 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| DUP5 | 0 | 1 | 0 | 0 | 0 | 2 | 0 | 0 |
| DUP7 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 2 |
| DUP8 | 0 | 0 | 4 | 0 | 0 | 0 | 1 | 2 |
| DUP29 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| DUP22 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
| DUP30 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 |
| DUP35 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
Notes: Variants observed more than once are included. The full list of individuals with different glycophorin variants, together with their population of origin, is available as supplementary data
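The counts in the table above convert to allele frequencies by dividing by the total number of chromosomes in each cohort. A minimal sketch for two of the rows follows. The cohort keys are hypothetical labels, and only the DEL1 and DUP2/27 rows are transcribed.

```python
# Total chromosomes per cohort, from the table above.
TOTAL_CHROMOSOMES = {
    "1000G_EUR": 600, "1000G_AFR": 640, "1000G_SAS": 386,
    "1000G_EAS": 606, "1000G_AMR": 258,
    "Gambian_AFR": 782, "Simons_ALL": 546, "Brazilian_AMR": 2650,
}

# Non-zero allele counts for two variants, from the table above.
ALLELE_COUNTS = {
    "DEL1": {"1000G_AFR": 53, "1000G_EAS": 1, "1000G_AMR": 1,
             "Gambian_AFR": 55, "Simons_ALL": 7, "Brazilian_AMR": 19},
    "DUP2/27": {"1000G_AFR": 1, "1000G_SAS": 1, "1000G_EAS": 11,
                "1000G_AMR": 1, "Brazilian_AMR": 7},
}


def allele_frequency(variant, cohort):
    """Allele count divided by total chromosomes; missing counts are zero."""
    count = ALLELE_COUNTS[variant].get(cohort, 0)
    return count / TOTAL_CHROMOSOMES[cohort]


if __name__ == "__main__":
    for variant, cohort in [("DEL1", "1000G_AFR"), ("DEL1", "Gambian_AFR"),
                            ("DUP2/27", "1000G_EAS")]:
        print(variant, cohort, f"{allele_frequency(variant, cohort):.4f}")
```

For example, DEL1 at 53 of 640 chromosomes in the 1000 Genomes AFR grouping corresponds to a frequency of about 8.3%.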