Alejandro Cáceres, Suzanne S Sindi, Benjamin J Raphael, Mario Cáceres, Juan R González.
Abstract
BACKGROUND: Polymorphic inversions are a source of genetic variability with a direct impact on recombination frequencies. Given the difficulty of their experimental study, computational methods have been developed to infer their existence in a large number of individuals using genome-wide data of nucleotide variation. Methods based on haplotype tagging of known inversions attempt to classify individuals as having a normal or inverted allele. Other methods, which measure differences in linkage disequilibrium, attempt to identify regions with inversions but are unable to classify subjects accurately, an essential requirement for association studies.
Year: 2012 PMID: 22321652 PMCID: PMC3296650 DOI: 10.1186/1471-2105-13-28
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Scan search for inversions in simulated data. We show the Bayesian information criterion (BIC) for each possible window (fixed size 0.4 Mb) for genotype data on a simulated inversion with 40% frequency located at position 0.75-1.25 Mb. Our method clearly favors rejecting the null (no inversion) model (when BIC > 0) in the region of the inversion.
Figure 2. Accuracy in classification of chromosomes (haplotype data) and subjects (genotype data) into normal and inverted subpopulations. We compute the classification of individuals according to Equation 7, using the sliding-window segments from Figure 1 that overlap and have BIC > t in one simulation sample. We compare results for our method on haplotypes and for three local phasing procedures on genotype data. Error bars indicate data within the second and third quartiles.
Figure 3. Average accuracy in the classification of subjects (haplotypes) into normal and inverted populations. The picture shows the average and individual classification accuracies across simulated cases (50) as a function of t, for different SNP densities.
Figure 4. Average accuracy in the classification of subjects (genotypes) into normal and inverted populations. The picture shows the average and individual classification accuracies across simulated cases (50) as a function of t, for different inversion lengths and a population frequency of 60%. The simulated regions were scanned with a window size of 60% of the real inversion and a block size of 5 SNPs.
Figure 5. Maximum accuracy in the classification of subjects (genotypes) into normal and inverted populations as a function of inversion age. The figure shows mean classification accuracy across t values for all the simulation cases (2250). Our method achieves high accuracy for large inversions and a wide range of frequencies, as simulated by invertFREGENE. We find low accuracy in a discontinuous cluster of older, high-frequency inversions, suggesting a critical behavior in the simulations at this point, which we discuss further in the main text.
Inverted sequences found in chromosome 16 of the CEU population using haplotype (phased) and genotype data
| Data type | window (Mb) | LBPmin | LBPmax | RBPmin | RBPmax | MaxBic | invFreq | Ns |
|---|---|---|---|---|---|---|---|---|
| Haplotypes | 0.4 | 28.23496 | 28.40377 | 28.67367 | 28.80467 | 117.29 | 0.60 | 118 |
| | 0.4 | 33.70440 | 34.67086 | 34.11169 | 35.07705 | 153.01 | 0.39 | 494 |
| | 0.4 | 45.68583 | 46.08754 | 46.08754 | 46.49553 | 236.34 | 0.77 | 1302 |
| | 0.4 | 66.24902 | 66.44043 | 66.65147 | 66.84146 | 189.82 | 0.40 | 350 |
| | 0.4 | 68.51016 | 68.66370 | 68.91604 | 69.06441 | 159.35 | 0.42 | 641 |
| | 0.4 | 70.99660 | 71.05418 | 71.39778 | 71.45419 | 99.61 | 0.40 | 17 |
| | 0.7 | 33.49144 | 34.31572 | 34.19297 | 35.03194 | 207.00 | 0.49 | 923 |
| | 0.7 | 45.69871 | 45.81018 | 46.40781 | 46.51082 | 186.17 | 0.74 | 259 |
| Genotypes | 0.4 | 34.07920 | 34.55067 | 34.48884 | 35.00029 | 137.04 | 0.19 | 6 |
| | 0.4 | 68.51016 | 68.66370 | 68.93968 | 69.06441 | 147.57 | 0.46 | 38 |
The segments were identified by scanning the whole chromosome with window sizes of 0.4 Mb and 0.7 Mb. Overlapping trial segments with BIC > 50 were selected for the final classification of chromosomes and to set the limits of the inverted sequences. Keys: LBPmin: minimum left breakpoint coordinate, LBPmax: maximum left breakpoint, RBPmin: minimum right breakpoint, RBPmax: maximum right breakpoint, MaxBic: maximum BIC, invFreq: frequency of the inversion within the population, Ns: number of overlapping segments in the inversion.
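The summary columns defined above can be illustrated with a minimal Python sketch. It is not the inveRsion implementation: the `Segment` class, the `summarize` function, the grouping rule (segments are chained when their intervals overlap), and the trial values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    left: float   # left breakpoint of a trial segment (Mb)
    right: float  # right breakpoint of a trial segment (Mb)
    bic: float    # BIC of the inversion model for this segment


def summarize(segments, t=50.0):
    """Group overlapping segments with BIC > t and summarize each group."""
    kept = sorted((s for s in segments if s.bic > t), key=lambda s: s.left)
    groups = []
    reach = None  # right-most coordinate covered by the current group
    for s in kept:
        if groups and s.left <= reach:
            groups[-1].append(s)          # overlaps the current group
            reach = max(reach, s.right)
        else:
            groups.append([s])            # starts a new group
            reach = s.right
    return [
        {
            "LBPmin": min(s.left for s in g),
            "LBPmax": max(s.left for s in g),
            "RBPmin": min(s.right for s in g),
            "RBPmax": max(s.right for s in g),
            "MaxBic": max(s.bic for s in g),
            "Ns": len(g),
        }
        for g in groups
    ]


# Hypothetical trial segments, not taken from the HapMap scan.
trial = [
    Segment(28.23, 28.67, 80.0),
    Segment(28.30, 28.75, 117.0),
    Segment(28.40, 28.80, 60.0),
    Segment(30.00, 30.40, 20.0),   # dropped: BIC below the threshold
    Segment(33.70, 34.11, 90.0),
]
regions = summarize(trial, t=50.0)
```

Because groups are chained, LBPmax may exceed RBPmin within a region, which is also the case for several rows of the table.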
Predicted inversions on chromosome 17 for the CEU and YRI populations on genotype data (window size 0.4 Mb and t = 50).
| Population | LBPmin | LBPmax | RBPmin | RBPmax | MaxBic | invFreq | Ns |
|---|---|---|---|---|---|---|---|
| CEU | 21.96788 | 22.09834 | 22.37082 | 22.50184 | 99.32 | 0.25 | 24 |
| | 24.96601 | 25.14566 | 25.36840 | 25.54785 | 137.18 | 0.47 | 265 |
| | 41.07267 | 41.64131 | 41.47280 | 42.09285 | 196.52 | 0.22 | 444 |
| | 53.84780 | 54.18923 | 54.25283 | 54.59168 | 322.95 | 0.28 | 802 |
| YRI | 21.88505 | 22.11686 | 22.36658 | 22.52599 | 111.94 | 0.39 | 9 |
| | 24.99465 | 25.09088 | 25.39824 | 25.49422 | 79.01 | 0.57 | 11 |
| | 53.90580 | 54.11198 | 54.30626 | 54.51477 | 235.38 | 0.46 | 12 |
One of the CEU predictions corresponds to a known inversion at 17q21 and was also found with window size 0.7 Mb (see main text). Consistent with experimental evidence that this inversion is specific to CEU, no inversions were predicted in the YRI population in this region with any window size.
Frequency of the inversion in 17q21 across all HapMap III populations
| genInv | ASW | CEU | CHB | CHD | GIH | JPT | LWK | MEX | MKK | TSI | YRI |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Hom | 81 | 59 | 100 | 100 | 82 | 99 | 100 | 73 | 89 | 44 | 100 |
| invHet | 18 | 41 | 0 | 0 | 17 | 1 | 0 | 23 | 9 | 41 | 0 |
| invHom | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 4 | 2 | 15 | 0 |
| chr freq | 0.10 | 0.26 | 0 | 0 | 0.10 | 0.01 | 0 | 0.17 | 0.06 | 0.44 | 0 |
Hom: non-inverted homozygous, invHet: inverted heterozygous, and invHom: inverted homozygous. The chromosomal frequency of the inversion is highest in the European populations (CEU + TSI); see the last row of the table. Population key: ASW: African ancestry in Southwest USA, CEU: Utah residents with Northern and Western European ancestry, CHB: Han Chinese in Beijing, CHD: Chinese in Metropolitan Denver, GIH: Gujarati Indians in Houston, JPT: Japanese in Tokyo, LWK: Luhya in Webuye, MEX: Mexican ancestry in Los Angeles, MKK: Maasai in Kinyawa, TSI: Toscani, YRI: Yoruba in Ibadan.
Tagging of the inversion in 17q21 as detected by inveRsion and SNP "rs1800547"
| | no-inversion homozygous | inversion heterozygous | inversion homozygous |
|---|---|---|---|
| rs1800547 homozygous | 985 | 0 | 0 |
| rs1800547 heterozygous | 9 | 166 | 0 |
| rs1800547 variant-homozygous | 0 | 1 | 23 |
The H1 and H2 haplotypes have traditionally been associated with the inversion status at 17q21. From those haplotypes, the SNP "rs1800547" in the MAPT gene has been used to tag the inversion in 17q21 [15]. As expected, our classification of individuals from all 11 HapMap populations agrees well with their status for this SNP.
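The agreement in the table can be quantified as the fraction of individuals on its diagonal. The following Python snippet is a small sketch of that arithmetic; the variable names are illustrative and the counts are copied from the table above.

```python
# Rows: rs1800547 genotype; columns: inveRsion call
# (no-inversion homozygous, inversion heterozygous, inversion homozygous).
counts = [
    [985, 0, 0],    # rs1800547 homozygous
    [9, 166, 0],    # rs1800547 heterozygous
    [0, 1, 23],     # rs1800547 variant-homozygous
]

total = sum(sum(row) for row in counts)               # all classified individuals
agree = sum(counts[i][i] for i in range(len(counts)))  # diagonal: matching calls
concordance = agree / total

print(f"{agree} of {total} individuals agree ({concordance:.4f})")
```

With these counts, 1174 of 1184 individuals fall on the diagonal, a concordance of about 0.992. The 10 discordant individuals are 9 SNP heterozygotes called non-inverted and 1 SNP variant-homozygote called inversion heterozygous.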
Figure 6. Scan across chromosome 17 for the joint CEU + YRI populations. We conduct an extensive local search for a known inversion on chromosome 17. We show the BIC values for each sliding window of size 0.4 Mb; a horizontal line indicates the zero level, above which the inversion model favors the mixture of inverted and non-inverted populations. The inversion at 17q21 (~40 Mb) is clearly visible. The full list of inversions is given in Additional file 1, Table S4.
Experimentally validated inversions
| CHR | Inv. Size (Mb) | Cyt.band | Source | Segment | inveRsion scan | Note |
|---|---|---|---|---|---|---|
| chr3 | 1.9 | 3q29 | Antonacci 2009 | 196886879-198874600 | 192235076-193551650 | within 3 Mb distance |
| chr4 | 5.0572 | 4p16.1-16.2 | Giglio 2002 | 3792970-9461815 | | |
| chr7 | 0.9615 | 7p22.1 | Feuk 2005 | 5832188-6899188 | | |
| chr7 | 0.0179 | 7q11.22 | Feuk 2005 | 70058906-70076823 | | |
| chr7 | 2.2186 | 7q11.23 | Osborne 2001 | 71956869-74995982 | | |
| chr8 | 4.6117 | 8p23.1 | Giglio 2002 | 6913382-12332070 | 8804291-10982030 | exten. search |
| chr9 | 23.5 | 9p12-q13 | Starke 2002 | 37000000-71000000 | | |
| chr10 | 22.6 | 10p11.21-q21.1 | Gilling 2006 | 37147500-59748500 | 37983987-57966260 | each BP separately |
| chr15 | 5.9972 | 15q11.2-13.1 | Gimelli 2003 | 20459937-27687533 | 26039213-27099713 | right BP only |
| chr15 | 2 | 15q13.3 | Antonacci 2009 | 28524207-30602466 | 27030510-27441289 | within 1 Mb distance |
| chr15 | 1.2 | 15q24 | Antonacci 2009 | 72151413-73356183 | 72449440-73651390 | |
| chr16 | 0.3052 | 16p11.2 | Martin 2004 | 28256775-28695952 | 28234960-28804670 | haplotype data only CEU |
| chr16 | 0.0011 | 16q24.1 | Feuk 2005 | 83746238-83747302 | | |
| chr17 | 1.5 | 17q12 | Antonacci 2009 | 31888441-33393152 | 34694635-35097385 | within 1 Mb distance |
| chr17 | 0.9 | 17q21.31-21.32 | Stefansson 2005 | 40930361- 1930361 | 41111654-42092850 | |
The table shows the details of 15 experimentally validated autosomal inversions and their detection by a genome-wide scan on genotype data performed with inveRsion.
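Notes such as "within 3 Mb distance" compare a reported segment with the segment found by the scan. One simple way to make that comparison explicit, sketched here in Python, is a signed gap between two intervals: positive when they are separated, negative when they overlap. The function name `gap_bp` and this particular definition are assumptions for illustration and are not necessarily the criterion used to write the notes.

```python
def gap_bp(a, b):
    """Signed distance in bp between intervals a and b, each (start, end).

    Positive: size of the gap between the intervals.
    Negative: the intervals overlap by that many bp.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    return max(a_start, b_start) - min(a_end, b_end)


# Coordinates copied from the table above: (reported segment, inveRsion scan).
chr3_3q29 = gap_bp((196886879, 198874600), (192235076, 193551650))
chr8_8p23 = gap_bp((6913382, 12332070), (8804291, 10982030))
chr16_16p11 = gap_bp((28256775, 28695952), (28234960, 28804670))

for name, gap in [("3q29", chr3_3q29), ("8p23.1", chr8_8p23), ("16p11.2", chr16_16p11)]:
    status = "overlap" if gap < 0 else "gap"
    print(f"{name}: {status} of {abs(gap) / 1e6:.2f} Mb")
```

Under this definition, the 3q29 segments are separated by about 3.3 Mb, while the scan segments at 8p23.1 and 16p11.2 overlap the reported inversions.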