Rajib Dutta1,2,3, Arnab Saha-Mandal4,5, Xi Cheng1, Shuhao Qiu1,2, Jasmine Serpen6,7, Larisa Fedorova8, Alexei Fedorov9,10.
Abstract
BACKGROUND: GC-Biased Gene Conversion (gBGC) is one of the important theories put forward to explain profound long-range non-randomness in nucleotide compositions along mammalian chromosomes. Nucleotide changes due to gBGC are hard to distinguish from regular mutations. Here, we present an algorithm for analysis of millions of known SNPs that detects a subset of so-called "SNP flip-over" events representing recent gBGC nucleotide changes, which occurred in previous generations via non-crossover meiotic recombination.
Keywords: Bioinformatics; Evolution; Mutation; Polymorphism; SNP
Year: 2018 PMID: 29661137 PMCID: PMC5902838 DOI: 10.1186/s12864-018-4593-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Characterization of haplotypes of frequent genetic variants and a putative case of a BGC event. a Arrangement of computationally processed chromosomal segments for analysis of haplotypes. Autosomes have been divided into 56,328 segments, each containing 50 high-frequency (MAF > 25%) genetic variants. b An example of common haplotypes inside segment 23 of Chromosome 1. Haplotypes were constructed from 50 adjacent high-frequency (MAF > 25%) genetic variants and are represented by strings of fifty 0s and 1s, where “0” means the presence of a reference allele and “1” means an alternative allele in the haplotype. Haplotypes that occur ≥100 times in the 1092 individuals are defined as ‘common haplotypes’ and are listed in descending order of occurrence. The exemplified segment 23 has three common haplotypes. Putative BGC events were searched for only in individuals who have one common haplotype and another nearly identical rare haplotype, which differs from the common haplotype by only one allele, at the “acceptor” site (marked with a blue square). In this example, such conditions were found in individual NA20787 from the TSI population. Of the two parental haplotypes of TSI_NA20787, the first (Parent 1) is a common haplotype, which occurs 216 times in the 1092 genomes. The other haplotype (Parent 2), despite being identical to the common haplotype at 49 polymorphic sites, is a rare haplotype that occurs only once in the 1092 individuals. This rare haplotype contains the acceptor site (marked with a blue square), which represents a putative base pair conversion event at this location in one of the ancestors of this individual. The location of this acceptor site in the haplotype string, its reference and alternative alleles, and the base change due to the BGC event (purple arrow) are shown at the bottom of the figure. Detailed information about every segment and all putative BGC events is available from our web site
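The search condition described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `find_putative_events`, the input layout (a dictionary mapping a sample ID to its two phased haplotype strings for one segment), and the `rare_max` parameter are assumptions made for the example. Only the ≥100 threshold for common haplotypes and the single-allele difference come from the caption.

```python
from collections import Counter

# Threshold from the caption: a common haplotype occurs >= 100 times.
COMMON_MIN_COUNT = 100


def find_putative_events(individuals, common_min=COMMON_MIN_COUNT, rare_max=1):
    """Find individuals carrying one common and one nearly identical rare haplotype.

    individuals: dict of sample ID -> (haplotype_a, haplotype_b), where each
    haplotype is a string of '0' (reference) and '1' (alternative) alleles
    for a single segment. This layout is hypothetical.
    Returns a list of (sample_id, common_hap, rare_hap, acceptor_index).
    """
    # Count how often each haplotype string occurs across all genomes.
    counts = Counter(h for pair in individuals.values() for h in pair)
    events = []
    for sample_id, (hap_a, hap_b) in individuals.items():
        # Try both assignments of the "common" and "rare" roles.
        for common, rare in ((hap_a, hap_b), (hap_b, hap_a)):
            if counts[common] < common_min or counts[rare] > rare_max:
                continue
            if len(common) != len(rare):
                continue
            diffs = [i for i, (x, y) in enumerate(zip(common, rare)) if x != y]
            # Exactly one differing position marks the acceptor site.
            if len(diffs) == 1:
                events.append((sample_id, common, rare, diffs[0]))
    return events
```

Setting `rare_max=5` would correspond to the relaxed condition in which the rare haplotype may occur up to five times.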
Table 1 Distribution of computational segments and putative gene conversion events in all autosomes
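| Chromosome | # of Segments | AT → GC cases (rare haplotype count = 1) | GC → AT cases (rare haplotype count = 1) | No Base Change cases (rare haplotype count = 1) | Total cases (rare haplotype count = 1) | AT → GC cases (rare haplotype count <= 5) | GC → AT cases (rare haplotype count <= 5) | No Base Change cases (rare haplotype count <= 5) | Total cases (rare haplotype count <= 5) |
|---|---|---|---|---|---|---|---|---|---|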
| 1 | 4359 | 1964 | 1677 | 637 | 4278 | 8973 | 8200 | 3181 | 20,354 |
| 2 | 4660 | 2318 | 1921 | 760 | 4999 | 10,514 | 8928 | 3605 | 23,047 |
| 3 | 4048 | 1977 | 1640 | 632 | 4249 | 8785 | 7715 | 3196 | 19,696 |
| 4 | 4191 | 1945 | 1891 | 722 | 4558 | 9034 | 8442 | 3373 | 20,849 |
| 5 | 3687 | 1786 | 1513 | 629 | 3928 | 7842 | 7190 | 3052 | 18,084 |
| 6 | 3869 | 1791 | 1614 | 629 | 4034 | 8516 | 7835 | 2975 | 19,326 |
| 7 | 2838 | 1361 | 1227 | 492 | 3080 | 6133 | 5431 | 2311 | 13,875 |
| 8 | 3179 | 1631 | 1287 | 560 | 3478 | 7124 | 6170 | 2691 | 15,985 |
| 9 | 2389 | 1163 | 963 | 420 | 2546 | 4898 | 4253 | 1814 | 10,965 |
| 10 | 2830 | 1311 | 1112 | 423 | 2846 | 5897 | 5283 | 1997 | 13,177 |
| 11 | 2838 | 1352 | 1196 | 479 | 3027 | 5943 | 5444 | 2302 | 13,689 |
| 12 | 2676 | 1150 | 1035 | 391 | 2576 | 5475 | 4978 | 1953 | 12,406 |
| 13 | 2135 | 948 | 772 | 282 | 2002 | 4334 | 4000 | 1518 | 9852 |
| 14 | 1852 | 862 | 757 | 277 | 1896 | 3781 | 3368 | 1422 | 8571 |
| 15 | 1647 | 789 | 627 | 251 | 1667 | 3362 | 2822 | 1176 | 7360 |
| 16 | 1723 | 794 | 629 | 367 | 1790 | 3440 | 2787 | 1501 | 7728 |
| 17 | 1540 | 713 | 606 | 201 | 1520 | 3144 | 2709 | 1027 | 6880 |
| 18 | 1629 | 720 | 603 | 226 | 1549 | 3057 | 2728 | 1110 | 6895 |
| 19 | 1334 | 662 | 511 | 196 | 1369 | 2678 | 2314 | 868 | 5860 |
| 20 | 1272 | 559 | 449 | 146 | 1154 | 2350 | 2083 | 785 | 5218 |
| 21 | 840 | 384 | 326 | 121 | 831 | 1620 | 1411 | 574 | 3605 |
| 22 | 792 | 403 | 290 | 125 | 818 | 1643 | 1321 | 523 | 3487 |
The first two columns of Table 1 list the autosome and its number of computationally generated segments. The next four columns give the number of AT to GC cases, GC to AT cases, ‘No Base Change’ cases and total mismatch repair cases, respectively, when only rare haplotypes occurring once in the 1092 genomes were considered. The last four columns give the same four counts when rare haplotypes occurring <= 5 times in the 1092 genomes were considered.
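The three case categories in Table 1 can be illustrated with a short Python sketch. It assumes, and this is an interpretation rather than something the text states, that ‘No Base Change’ covers substitutions that leave GC content unchanged (A↔T or G↔C). The function names are hypothetical.

```python
WEAK = {"A", "T"}    # weakly pairing bases
STRONG = {"G", "C"}  # strongly pairing bases


def classify_change(before, after):
    """Classify a base change at an acceptor site into a Table 1 category.

    Assumption: 'No Base Change' means the GC content is unaffected,
    i.e. the change is A<->T or G<->C.
    """
    before, after = before.upper(), after.upper()
    if before in WEAK and after in STRONG:
        return "AT->GC"
    if before in STRONG and after in WEAK:
        return "GC->AT"
    return "No Base Change"


def at_to_gc_fraction(at_to_gc, gc_to_at):
    """Share of AT->GC cases among all cases that change GC content."""
    return at_to_gc / (at_to_gc + gc_to_at)


# Chromosome 1, rare haplotype count = 1 (values taken from Table 1).
chr1_fraction = at_to_gc_fraction(1964, 1677)
```

A fraction above 0.5 indicates an excess of AT → GC over GC → AT cases for that row.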
Fig. 2 Number of AT → GC vs GC → AT changes due to putative base pair conversion events. The number of identified base pair conversion events is presented along the horizontal axis, while the vertical axis shows the different computational conditions for registration of these events. We considered cases where the rare haplotype (with the acceptor site) occurs only once among the 1092 individuals (labeled as 1, at the bottom), twice among the 1092 individuals (labeled as 2), single and double occurrences taken together (labeled as 1 and 2) and less than or equal to 5 occurrences among the 1092 individuals (labeled as 1 to 5).
Fig. 3 Distribution of local GC content in regions surrounding AT → GC vs GC → AT conversion events. Local GC content was calculated within a 100 bp window by considering 50 nucleotides before and after each putative AT → GC or GC → AT conversion site.
The red line shows the distribution of local GC content around 22,646 AT → GC conversion events, while the blue line presents the local GC content around the same number of GC → AT cases. The yellow line (control) represents local GC content around the same number of sites selected randomly, independent of gene conversion events.
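The window calculation described for Fig. 3 could look roughly like the following Python sketch. It assumes that the conversion site itself is excluded from the window (50 + 50 = 100 bp) and that positions are 0-based indices into a plain sequence string. The caption states neither, and the function name is hypothetical.

```python
def local_gc_content(sequence, site, flank=50):
    """Return the GC fraction in the flanks around a 0-based site.

    Assumption: the site itself is excluded, so the window holds up to
    2 * flank bases. Flanks are truncated at the sequence ends.
    """
    left = sequence[max(0, site - flank):site]
    right = sequence[site + 1:site + 1 + flank]
    window = (left + right).upper()
    if not window:
        raise ValueError("no flanking bases available around the site")
    gc_count = sum(1 for base in window if base in ("G", "C"))
    return gc_count / len(window)
```

Applying the same function to randomly chosen positions would give a control distribution comparable to the yellow line.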