Abstract
BACKGROUND: During gene conversion, genetic information is transferred unidirectionally between highly homologous but non-allelic regions of DNA. While germ-line gene conversion has been implicated in the pathogenesis of some diseases, somatic gene conversion has remained technically difficult to investigate on a large scale.
Year: 2011 PMID: 21291537 PMCID: PMC3048570 DOI: 10.1186/1741-7015-9-12
Source DB: PubMed Journal: BMC Med ISSN: 1741-7015 Impact factor: 8.775
Figure 1. A model of gene conversion between duplicons. Two homologous but non-allelic sequences are shown, with homology indicated by a common green color. After a double-strand break in the original sequence, the template sequence is used to form a heteroduplex DNA structure with the original sequence during repair. One possible repair outcome is shown, illustrating changes to both the template and original sequences far from the location of the break, as well as changes and a deletion in the original sequence in the vicinity of the break.
Figure 2. Effects of gene conversion on probe intensity signals. A microarray has two probes for a SNP, each 25 bp long (top). An individual with an AA homozygous genotype at the SNP locus is shown. Two examples of gene conversion are illustrated. The left example considers the case in which the duplicon contains sequence that exactly matches the B probe. The right example considers the case in which the duplicon contains sequence that does not match either probe.
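The two cases in this caption can be illustrated with a toy sequence comparison. The following is a minimal sketch, not the authors' method: the probe sequences, the flanking sequences, and the helper names `mismatches` and `best_probe` are all hypothetical, and the sketch ignores strand complementarity and hybridization chemistry by comparing probe and target as same-strand strings.

```python
def mismatches(probe: str, target: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(probe) != len(target):
        raise ValueError("sequences must have equal length")
    return sum(1 for p, t in zip(probe, target) if p != t)


def best_probe(target: str, probe_a: str, probe_b: str) -> str:
    """Return 'A' or 'B' for the probe matching target exactly, else 'neither'."""
    if mismatches(probe_a, target) == 0:
        return "A"
    if mismatches(probe_b, target) == 0:
        return "B"
    return "neither"


# Hypothetical 25-bp probes that differ only at the central SNP position.
FLANK_L = "ACGTACGTACGT"  # 12 bp, invented for illustration
FLANK_R = "TGCATGCATGCA"  # 12 bp, invented for illustration
PROBE_A = FLANK_L + "A" + FLANK_R
PROBE_B = FLANK_L + "G" + FLANK_R

# Left case: the converted tract carries the B allele with identical flanks.
converted_left = FLANK_L + "G" + FLANK_R
# Right case: the converted tract also alters a flanking base,
# so it matches neither probe exactly.
converted_right = "ACGTACGTACCT" + "G" + FLANK_R

for label, seq in [("unconverted AA", PROBE_A),
                   ("left example", converted_left),
                   ("right example", converted_right)]:
    print(label, "->", best_probe(seq, PROBE_A, PROBE_B))
```

In this simplified picture, a tract that matches the B probe would add B signal to an AA individual, while a tract that matches neither probe would be expected to lower the overall intensity.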
Figure 3. Cluster plots for SNP rs4988327 in the WTCCC data. Note the high spread for RA, and the resulting increase in no-calls (orange) relative to calls (green).
Summary of the three data filters.
| Filter | No-call rate | Min. homology | Biased |
|---|---|---|---|
| Stringent | | 300 bp, 85% | |
| Relaxed | | 1,000 bp, 90% | |
| No-call only | | 1,000 bp, 90% | - |
SNPs identified for various cohorts using the stringent filter (Part 1).
| Cohort(s) | SNP/identity | Chr | Pos. (hg17) | Duplicon length | Characterized genes |
|---|---|---|---|---|---|
| CD | rs4471699 | 16: | 30.2 M→ | 147 kb | SULT1A3, GIYD2, BOLA2, IMAA, CORO1A |
| | 99.6% | 16: | 29.4 M→ | 146 kb | SULT1A3, GIYD2, BOLA2, |
| | 98.1% | 16: | 21.8 M→ | 41 kb | [UQCRC2] |
| | 98.0% | 16: | 22.3 M← | 42 kb | [NPIPL3] |
| | 98.0% | 16: | 21.3 M→ | 42 kb | |
| | 97.1% | 16: | 18.8 M→ | 75 kb | |
| RA | rs669980 | 9: | 0.2 M→ | 193 kb | CBWD1, FOXD4, FAM138A, WASH1, [DOCK8] |
| | 98.9% (F) | 2: | 114 M← | 189 kb | CBWD2, FOXD4L1, FAM138B, WASH2P, [RABL2A] |
| CAD, T2D | rs10502407 | 18: | 10.6 M→ | 52 kb | - |
| | 97.9% | 18: | 12.2 M← | 64 kb | [CIDEA] |
| CAD | rs12134625 | 1: | 78 M→ | 931 bp | - |
| | 97.0% | 1: | 24 M← | 932 bp | |
| BD, CAD | rs9551988 | 13: | 19.2 M→ | 2.6 kb | |
| HT | 96.2% (F) | 13: | 18.7 M→ | 2.8 kb | [TUBA3C] |
| HT | rs935019 | 2: | 127,162 K→ | 3.6 kb | |
| | 95.3% (F) | 2: | 127,166 K→ | 3.5 kb | |
| HT | rs12227938 | 12: | 37 M→ | 154 kb | ALG10B |
| | 95.3% (P) | 12: | 34 M→ | 127 kb | ALG10 |
| T2D | SNP_A-1797773 | 16: | 45 M→ | 14 kb | |
| | 94.8% (F) | 16: | 34 M← | 16 kb | - |
| T1D | rs12381130 | 16: | 5 M→ | 88 kb | ALG1, FAM86A |
| | 94.7% | 3: | 127 M← | 76 kb | ALG1L |
| | 94.6% | 11: | 67 M→ | 79 kb | - |
| | 94.6% | 11: | 71 M← | 40 kb | FAM86C, [DEFB108B] |
| | 94.5% | 11: | 3 M→ | 91 kb | [ZNF195] |
| | 94.3% | 3: | 76 M→ | 44 kb | [FAM86D] |
| | 94.0% | 4: | 9 M← | 120 kb | - |
| | 93.9% | 3: | 131 M→ | 44 kb | - |
| | 93.9% | 4: | 4 M→ | 53 kb | - |
| | 93.7% | 12: | 8 M→ | 53 kb | [FAM90A1] |
| | 93.6% | 8: | 12 M→ | 41 kb | [FAM86B1] |
| | 93.5% | 8: | 8 M← | 63 kb | - |
Multiple almost-contiguous segmental duplications are treated as a single large duplicon (intervening sequence is included in the length). The table includes only duplicons with at least 85% identity to the region containing the SNP. Duplicons with identical flanking sequence to the SNP are labeled as fully degenerate (F); duplicons with partial degeneracy are labeled (P). Characterized genes are listed if they occur within a duplicon. Genes in square brackets are outside the duplicon, but the duplicon is at most 30 kb upstream of the gene. Genes for a SNP are italicized if the SNP is within that gene, or if the SNP maps to a position within that gene in the duplicon.
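The inclusion rules in this note can be sketched as a small filter. This is an illustrative sketch only: the `Duplicon` class, the function names, and the example coordinates are hypothetical, "within a duplicon" is simplified to any overlap, and "upstream" is simplified to the duplicon ending before the gene start on the forward strand.

```python
from dataclasses import dataclass
from typing import Optional

MIN_IDENTITY = 85.0       # percent identity threshold stated in the note
MAX_UPSTREAM_BP = 30_000  # bracketed genes lie at most 30 kb from the duplicon


@dataclass
class Duplicon:
    chrom: str
    start: int
    end: int
    identity: float  # percent identity to the region containing the SNP


def keep_duplicon(d: Duplicon) -> bool:
    """Keep only duplicons with at least 85% identity."""
    return d.identity >= MIN_IDENTITY


def gene_label(d: Duplicon, name: str, gene_start: int, gene_end: int) -> Optional[str]:
    """Return the table label for a gene relative to a duplicon.

    Overlapping genes are listed by name; genes starting within 30 kb
    after the duplicon end are listed in square brackets; others are omitted.
    """
    if gene_start <= d.end and gene_end >= d.start:
        return name
    gap = gene_start - d.end
    if 0 < gap <= MAX_UPSTREAM_BP:
        return f"[{name}]"
    return None


# Hypothetical coordinates, for illustration only.
dup = Duplicon(chrom="16", start=1_000, end=50_000, identity=98.1)
print(keep_duplicon(dup))                           # identity filter
print(gene_label(dup, "GENE_A", 2_000, 3_000))      # inside the duplicon
print(gene_label(dup, "GENE_B", 60_000, 70_000))    # 10 kb past the duplicon
print(gene_label(dup, "GENE_C", 90_001, 95_000))    # too far away
```

A strand-aware version would need gene orientation, which the table does not provide.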
SNPs identified for various cohorts using the stringent filter (Part 2).
| Cohort(s) | SNP/identity | Chr | Pos. (hg17) | Duplicon length | Characterized genes |
|---|---|---|---|---|---|
| CD | rs11060028 | 12: | 128 M→ | 1.5 kb | |
| | 93.4% (P) | 10: | 102 M← | 1.2 kb | [ABCC2] |
| T1D | rs3805006 | 3: | 4,775 K→ | 402 bp | |
| | 93.4% (P) | 3: | 4,773 K← | 407 bp | |
| BD, HT | rs9378249 | 6: | 31.4 M→ | 27 kb | HLA-B, DHFRP2 |
| | 92.9% (F) | 6: | 31.3 M→ | 35 kb | HLA-C |
| HT | rs841245 | 12: | 27.1 M→ | 84 kb | - |
| | 92.0% (P) | 12: | 27.6 M→ | 82 kb | |
| BD | rs12070036 | 1: | 224 M→ | 9 kb | |
| | 91.9% | 12: | 7 M← | 3.5 kb | [PEX5] |
| | 91.1% | 12: | 123 M← | 10 kb | [RILPL1], [TMED2] |
| | 90.9% | 11: | 26 M← | 2.7 kb | - |
| RA | rs4988327 | 11: | 68 M→ | 104 kb | LRP5 |
| | 91.2% | 22: | 24 M← | 64 kb | LRP5L |
| T2D | rs11010908 | 10: | 37.2 M→ | 6 kb | - |
| | 90.6% | 10: | 27.2 M← | 12 kb | - |
| | 90.0% | 10: | 27.6 M→ | 6 kb | - |
| CAD | rs295470 | 3: | 141 M→ | 1.9 kb | |
| | 89.5% | 17: | 77 M← | 2.3 kb | |
| | 89.1% | 1: | 92 M← | 866 bp | - |
| | 87.8% | X: | 53 M← | 636 bp | - |
| | 86.6% | 2: | 108 M→ | 568 bp | - |
| | 86.5% | 17: | 17 M→ | 696 bp | [FLCN] |
| | 85.9% | 3: | 12 M→ | 1.9 kb | |
| BD, HT | rs2122231 | 3: | 35 M→ | 4.9 kb | - |
| | 88.8% | 6: | 117.0 M→ | 4 kb | [NT5DC1] |
| | 88.6% | 18: | 5 M→ | 3.9 kb | - |
| | 88.5% | 2: | 194 M→ | 1 kb | - |
| | 87.9% | 1: | 96 M→ | 4.9 kb | - |
| | 87.3% | 10: | 117 M→ | 4.2 kb | - |
| | 86.3% | 20: | 24 M→ | 728 bp | - |
| BD, HT | SNP_A-1948953 | 17: | 17 M→ | 894 bp | |
| | 87.0% (P) | 4: | 54 M← | 21 kb | |
| CD | rs9839841 | 3: | 16 M→ | 110 kb | |
| | 86.8% (F) | Y: | 7.6 M→ | 100 kb | |
| BD, T2D | rs4850057 | 2: | 4 M→ | 4.7 kb | - |
| | 86.8% | 9: | 35 M→ | 4.5 kb | |
| | 86.1% | 11: | 5 M→ | 3.0 kb | [TRIM68], [OR51D1], [OR51E1] |
SNPs identified in the MHC region for T1D using the stringent filter.
| Cohort(s) | SNP/identity | Chr | Pos. (hg17) | Duplicon length | Characterized genes |
|---|---|---|---|---|---|
| T1D | rs9378249 | 6: | 31.4 M→ | 27 kb | HLA-B, DHFRP2 |
| | 92.9% (F) | 6: | 31.3 M→ | 35 kb | HLA-C |
| T1D | rs9257223 | 6: | 29 M→ | 16 kb | - |
| | 92.5% | 11: | 50 M→ | 16 kb | - |
| T1D | rs389600 | 6: | 30 M→ | 4 kb | HLA-K |
| | 87.5% | 6: | 30 M← | 4 kb | HLA-A |
| | 87.5% | 6: | 30 M← | 4 kb | HLA-H |
| | 86.2% | 6: | 30 M← | 4 kb | HLA-J |
| | 85.8% | 6: | 30 M← | 3.5 kb | HLA-G |
Numeric codes describing the strength of evidence for an association of a gene with a disease.
| Code | Kind of evidence |
|---|---|
| 6 | Known association of the gene with the disease. |
| 5 | Gene is known to interact with an intermediate, and the intermediate has a known association with the disease. |
| 4 | Known association of the gene with a function central to disease pathogenesis (for example, insulin secretion for diabetes). |
| 3 | Gene is known to interact with an intermediate, and the intermediate has a known association with a function central to disease pathogenesis. |
| 2 | Known association of a region containing the gene with the disease. |
| 1 | Gene disruption is known to have a general mutagenic effect. |
| 0 | No evidence. |
Figure 4. The structure of the homology between the HLA-B and HLA-C containing duplicons on chromosome 6. Genes and pseudogenes are shown in blue. Corresponding homologous regions are shown in matching shades of green, together with the degree of homology according to the segmental duplication track of the UCSC browser (the two rightmost segments) or BLAST (the leftmost segment). The pink region is about 91% homologous to the DHFR region on chromosome 5.
Figure 5. Proportion of each of the nine populations having low measured intensity at rs7761068. The intensity thresholds were chosen so that 50% of the combined control population would have low intensity.
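The thresholding rule in this caption amounts to using the median of the combined control intensities as the cut-off. A minimal sketch under that reading follows; the function names and all intensity values are hypothetical, and the source does not state how ties or multiple probe channels were handled.

```python
from statistics import median


def low_intensity_threshold(control_intensities):
    """Threshold below which half of the combined controls fall (the median)."""
    return median(control_intensities)


def proportion_low(intensities, threshold):
    """Fraction of a population with intensity strictly below the threshold."""
    return sum(1 for x in intensities if x < threshold) / len(intensities)


# Hypothetical intensities, for illustration only (not study data).
combined_controls = [1.0, 2.0, 3.0, 4.0]
disease_cohort = [0.5, 1.0, 2.0, 4.0]

threshold = low_intensity_threshold(combined_controls)
print("threshold:", threshold)
print("controls low:", proportion_low(combined_controls, threshold))
print("cohort low:", proportion_low(disease_cohort, threshold))
```

A cohort proportion well above one half would then indicate an excess of low-intensity samples relative to controls.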
Figure 6. Cluster plot for males at the rs9839841 locus. The populations are CD (788 males), 58C (752 males), and NBS (720 males). Note the higher spread of the data points in CD.
Comparison of the stringent and mock tests.
| Test | Total SNPs | 6 | ≥5 | ≥4 | ≥3 | ≥2 | ≥1 | 0 |
|---|---|---|---|---|---|---|---|---|
| Mock | (/70) | 6 (4) | 6 (4) | 7 (5) | 11 (8) | 13 (9) | 16 (11) | 84 (59) |
| Stringent | (/28) | 21 (6) | 36 (10) | 54 (15) | 79 (22) | 79 (22) | 79 (22) | 21 (6) |
| | (/16) | 31 (5) | 50 (8) | 62.5 (10) | 87.5 (14) | 87.5 (14) | 87.5 (14) | 12.5 (2) |
Percentage (count) of SNPs with evidence in the various categories for the mock and stringent tests. The second row for the stringent test limits the analysis to SNPs belonging to a duplicon in the segmental duplication database.
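The percentages in the table above follow directly from the counts in parentheses and the totals (70 and 28). As a quick arithmetic check, the sketch below recomputes the mock row and the first stringent row; the variable and function names are illustrative, and rounding to whole percentages is an assumption that matches these two rows.

```python
# Counts copied from the table above, in column order:
# 6, >=5, >=4, >=3, >=2, >=1, 0
COLUMNS = ["6", ">=5", ">=4", ">=3", ">=2", ">=1", "0"]
TESTS = {
    "Mock": (70, [4, 4, 5, 8, 9, 11, 59]),
    "Stringent": (28, [6, 10, 15, 22, 22, 22, 6]),
}


def percentages(total, counts):
    """Convert counts to whole-number percentages of the total."""
    return [round(100 * c / total) for c in counts]


for name, (total, counts) in TESTS.items():
    # SNPs with any evidence (>=1) plus SNPs with none (0) give the total.
    assert counts[5] + counts[6] == total
    row = ", ".join(f"{col}: {pct}%"
                    for col, pct in zip(COLUMNS, percentages(total, counts)))
    print(f"{name} (/{total}): {row}")
```

The "≥" columns are cumulative, which is why the "≥1" and "0" counts sum to the total in each row.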
Additional SNPs identified using the relaxed filter.
| SNP | Disease(s) | Characterized genes in duplicons |
|---|---|---|
| rs10147986 | CD | (40 duplicons) |
| rs10502407 | BD | [CIDEA] |
| rs10896468 | CAD | OR8U8, OR5M8 |
| rs11010995 | RA | - |
| rs11028186 | RA | ALG1L, ASNS, [ZNF195], [FAM86B2], [DEFB10P1], [DEFA5], [ZFYVE20] |
| rs11053044 | T2D | ALG10, ALG10B |
| rs11118278 | CAD | CR1L, MCP |
| rs1192923 | HT | [ORAOV1] |
| rs12227938 | BD, CAD, T1D | ALG10, ALG10B |
| rs12256867 | T2D | ZNF33A, ZNF37A, ZNF33B, ZNF37B, [ZNF25] |
| rs12413153 | CAD | DDX18, BTBD15, WDR22, [IBRDC2] |
| rs1291361 | BD | HTR7, HTR7P, [HEBP1] |
| rs1404223 | CAD | - |
| rs17080801 | T2D | PARP4, TPTE2 |
| rs17230081 | T2D | ORM1, ORM2 |
| rs17636964 | CD | IPMK |
| rs17645907 | T2D | [POMZP3] |
| rs1842055 | CAD | - |
| rs1868584 | CAD, HT, RA, T1D | ROCK2, CGGBP1, [ZNF654] |
| rs2120273 | BD | - |
| rs2236014 | BD | MTRF1L, [FBXO5] |
| rs2515832 | RA | MAGEA12, CSAG1, MAGEA2, MAGEA3, TRAG3, [MAGEA6] |
| rs2523544 | T1D | DHFRP2, DHFRL1, DHFR, PSMA8, [HLA-B], [MSH3], [NSUN3] |
| rs2617729 | CD, T2D | ZNF761, ZNF765, ZNF813, [ZNF331] |
| rs330201 | CAD | MRPL10, [LRRC46], [OSBPL7] |
| rs3858741 | BD, CD, HT | PSPC1, [TUBA3C] |
| rs4318932 | CD | TYW1, TYW1B, [STAG3L4] |
| rs4453734 | CAD, RA | - |
| rs4473816 | RA | [GSPT2] |
| rs4532803 | BD, HT | ELA3A, ELA3B, [HSPC157] |
| rs4545817 | BD | ALG1, FAM86A, [COL6A4P2] |
| rs4881702 | BD | - |
| rs500192 | BD, T1D | TBL1XR1 |
| rs5946541 | BD | [BAGE] |
| rs6427130 | RA | XCL1, XCL2 |
| rs6463213 | BD, T2D | RBAK, RNF216L, XKR8 |
| rs6744284 | BD | UGT1A3-UGT1A10 |
| rs6945984 | RA | CYP3A4, CYP3A7, [CYP3A5] |
| rs7259082 | CAD | ZNF737, M74509, ZNF66 |
| rs7549545 | BD | [IER5] |
| rs7677996 | T1D | [UGT2B7] |
| rs7808342 | BD | - |
| rs940331 | T2D | [ZNF735], [ZNF716] |
| rs9551988 | T2D | PSPC1, [TUBA3C] |
| rs9624808 | T1D | LRP5, LRP5L |
| rs9665670 | BD, CAD | [PDSS1] |
| rs9775226 | CD, HT | (40 duplicons) |
| SNP_A-1797773 | BD, CD | VPS35, [ORC6L] |
| SNP_A-1817967 | CD | FAM22A |
| SNP_A-1858955 | RA | GUSBL1, GUSBL2, SMA4, GUSBP1, [RGL4] |
Genes in square brackets are outside the duplicons, but a duplicon is at most 30 kb upstream of the gene. Genes for two SNPs having 40 duplicons each are omitted.
Additional SNPs identified using the no-call-only filter.
| SNP | Disease(s) | Characterized genes in duplicons |
|---|---|---|
| rs10238378 | BD | - |
| rs10485575 | BD | SNX5, ANO4 |
| rs10768666 | RA | HCCA2, KRTAP5-8, KRTAP5-3, [KRTAP5-2], [KRTAP5-1], [KRTAP5-5], [KRTAP5-9], [KRTAP5-10], |
| rs10811497 | BD | IFNA4, IFNA7, IFNA10, IFNA14, IFNA16, IFNA17, IFNA21, [IFNW1] |
| rs10896468 | BD, CD, T2D | OR8U8, OR5M8, [OR5M3], [OR5M9] |
| rs11228904 | BD, HT, T1D | TRIM48, TRIM53 |
| rs11583656 | HT | MYPT2, [UBE2T] |
| rs1191684 | BD | [PAX8] |
| rs12428824 | BD | ENPP3, CTAGE4, CTAGE6, [OR2A7], [OR2A20P], [OR2A4] |
| rs1421867 | T1D | - |
| rs17080801 | BD, HT | PARP4, TPTE2 |
| rs17310770 | T2D | ROPN1, ROPN1B, CCDC14 |
| rs17423694 | HT | [NBPF11] |
| rs17636964 | BD, RA, T2D | IPMK |
| rs1809667 | T1D | HCCA2, KRTAP5-2, KRTAP5-8, KRTAP5-3, KRTAP5-10, KRTAP5-11, KRTAP5-7, [KRTAP5-1], [DUSP8], [KRTAP5-5], [KRTAP5-9] |
| rs1819829 | HT | CES7, [CES1] |
| rs1820450 | RA | GPC5, [GOLGA8B] |
| rs1868584 | BD | ROCK2, CGGBP1, [ZNF654] |
| rs1930171 | T1D | PCDH15 |
| rs2039945 | T2D | - |
| rs2804672 | HT | HSD17B7, HSD17B7P2, CDC10L |
| rs3864439 | BD | DPY19L2, DPY19L2P1, DPY19L2P4, [DPY19L1], [STEAP1] |
| rs4236384 | RA | SLC29A4, TNRC18 |
| rs4318932 | T2D | TYW1, TYW1B, [STAG3L4] |
| rs4471699 | BD, T2D | SULT1A3, GIYD2, BOLA2, IMAA, CORO1A, MLAS, SMG1, [UQCRC2], [NPIPL3] |
| rs4532803 | CAD | ELA3A, ELA3B, [HSPC157] |
| rs4545817 | HT | ALG1, FAM86A, [COL6A4P2] |
| rs584630 | BD | ZNF33A, ZNF33B, ZNF37A, ZNF37B, [ZNF25] |
| rs6494831 | RA | FMN1 |
| rs6510085 | RA | ZNF419, ZNF773, [ZNF772], [ZNF549] |
| rs6512631 | CAD | - |
| rs7319991 | BD, CAD, HT, T1D, T2D | CENPI |
| rs8182488 | T1D | ZNF765, ZNF761, [ZNF813] |
| rs9948005 | BD | FAM38B |
| rs9976299 | RA | ITGB2 |
| SNP_A-1817967 | CAD, CD | - |
| SNP_A-1858955 | CD, BD | GUSBL1, GUSBL2, SMA4, GUSBP1, [RGL4] |
Genes in square brackets are outside the duplicons, but a duplicon is at most 30 kb upstream of the gene.
SNPs with anomalous cluster plots.
| Filter | SNP | Disease(s) | Cluster plot feature |
|---|---|---|---|
| Stringent, relaxed | rs10502407 | CAD, T2D, BD | 58C disparity |
| Stringent | rs11010908 | T2D | 58C disparity |
| Stringent | rs12070036 | BD | 58C disparity |
| Stringent | rs12381130 | T1D | 58C disparity |
| Stringent | rs295470 | CAD | 58C disparity |
| Stringent, relaxed | SNP_A-1797773 | T2D, BD, CD | 58C disparity |
| Stringent (MHC in T1D) | rs9257223 | T1D | 58C disparity |
| Relaxed | rs11028186 | RA | 58C disparity |
| Relaxed | rs12256867 | T2D | 58C disparity |
| Relaxed | rs1404223 | CAD | 58C disparity |
| Relaxed | rs17230081 | T2D | 58C disparity |
| Relaxed | rs1842055 | CAD | 58C disparity |
| Relaxed | rs330201 | CAD | 58C disparity |
| Relaxed, no-call | rs4318932 | T2D | NBS/CAD/RA disparity |
| Relaxed | rs4473816 | RA | 58C disparity |
| Relaxed | rs7259082 | CAD | 58C disparity |
| Relaxed | rs9665670 | BD, CAD | 58C disparity |
| Relaxed, no-call | SNP_A-1817967 | CD | 58C disparity |
| No-call | rs10238378 | BD | 58C disparity |
| No-call | rs10485575 | BD | 58C disparity |
| No-call | rs10811497 | BD | 58C disparity |
| No-call | rs12428824 | BD | 58C disparity |
| No-call | rs1421867 | T1D | more than 3 clusters |
| No-call | rs1819829 | HT | NBS/CAD/RA disparity |
| No-call | rs2039945 | T2D | 58C disparity |
| No-call | rs2804672 | HT | NBS/CAD/RA disparity |
| No-call | rs6512631 | CAD | 58C disparity |
Linkage between stringent-filter SNPs and adjacent SNPs.
| Stringent test SNP | Disease | Adjacent SNP | P value | Comments |
|---|---|---|---|---|
| rs9551988 | HT | rs3858741 | 3.2 × 10⁻¹² | 1.1 kb away, within same duplicon |
| | BD | rs3858741 | 8.9 × 10⁻⁸ | (No linkage for CAD.) |
| rs9378249 | BD | rs2596477 | 3.7 × 10⁻⁷ | 22 bp away, within same duplicon |
| | HT | rs2596477 | 6.0 × 10⁻⁴ | |
| rs841245 | HT | rs12229182 | 2.5 × 10⁻⁶ | 12 kb away, within same duplicon |
| | | rs841636 | 1.7 × 10⁻⁵ | 12 kb away, within same duplicon |
| | | | | (No linkage at intervening SNP rs10842853) |
| SNP_A-1948953 | BD | rs9893203 | 5.4 × 10⁻⁴ | 8.6 kb away |
| | HT | rs9893203 | 6.8 × 10⁻⁴ | |
| rs11010908 | T2D | rs17561365 | 1.2 × 10⁻³ | 3.2 kb away, within same duplicon |
| rs11010908 | T2D | rs17561365 | 1.2 × 10-3 | 3.2 kb away, within same duplicon |
The P value corresponds to a one-sided chi-squared test for an increased no-call rate. P values for excluded SNPs were all above 2 × 10⁻³.
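One common way to obtain such a P value is sketched below. It assumes a Pearson chi-squared statistic on a 2×2 table of no-calls versus calls in cases and controls, with one degree of freedom and no continuity correction; the source does not specify these details, and the function name and all counts are hypothetical.

```python
import math


def one_sided_chi2_p(case_nocall: int, case_called: int,
                     ctrl_nocall: int, ctrl_called: int) -> float:
    """One-sided P value for a higher no-call rate in cases than in controls.

    The two-sided P value of the 2x2 Pearson chi-squared test is halved
    when the observed difference lies in the tested direction.
    """
    a, b, c, d = case_nocall, case_called, ctrl_nocall, ctrl_called
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column total must be non-zero")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    two_sided = math.erfc(math.sqrt(chi2 / 2.0))
    if a / (a + b) > c / (c + d):
        return two_sided / 2.0
    return 1.0 - two_sided / 2.0


# Hypothetical counts, for illustration only (not taken from the study).
p = one_sided_chi2_p(case_nocall=30, case_called=970,
                     ctrl_nocall=10, ctrl_called=990)
print(f"one-sided P = {p:.2e}")
```

Halving the two-sided value reflects that only an increase in the no-call rate counts as evidence; a decrease of the same size yields a P value near 1.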
De novo conversion events in disease [1].
| Disease | Donor | Acceptor |
|---|---|---|
| Atypical haemolytic uraemic syndrome | CFHR1 | CFH |
| Congenital adrenal hyperplasia | CYP21A1P | CYP21A2 |
| Neural tube defects | FOLR1P | FOLR1 |
| Hereditary persistence of fetal haemoglobin | HBG2 | HBG1 |
| Shwachman-Diamond syndrome | SBDSP | SBDS |
Possible conversion in duplicons for genes previously observed to have undergone germ-line conversion.
| Genes | Number of SNPs | SNPs showing |
|---|---|---|
| CFHR1/CFH | 7 | rs395998, rs413979 |
| FOLR1P/FOLR1 | 5 | rs1540087 |
| HBG2/HBG1 | 3 | rs6578592 |
| SBDSP/SBDS | 38 | rs4717344, SNP_A-1849003, rs4718487, rs1465306, rs2003206 |
Events are identified by visually inspecting cluster plots for all SNPs in the region.