Michelle Daya, Lize van der Merwe, Christopher R Gignoux, Paul D van Helden, Marlo Möller, Eileen G Hoal.
Abstract
BACKGROUND: The admixed South African Coloured population is ideally suited to the discovery of tuberculosis susceptibility genetic variants and their probable ethnic origins, but previous attempts at finding such variants using genome-wide admixture mapping were hampered by the inaccuracy of local ancestry inference. In this study, we infer local ancestry using the novel algorithm implemented in RFMix, with the emphasis on identifying regions of excess San or Bantu ancestry, which we hypothesize may harbour TB susceptibility genes.
Year: 2014 PMID: 25422094 PMCID: PMC4256931 DOI: 10.1186/1471-2164-15-1021
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Percentage of miss-called ancestry
| Type of miss-call | Percentage of total [IQR], LAMP-LD | Percentage of total [IQR], RFMix | Percentage of ancestry [IQR], LAMP-LD | Percentage of ancestry [IQR], RFMix |
|---|---|---|---|---|
| San as Bantu | 4.10 [1.65–9.76] | 1.95 [0.88–3.55] | 13.28 [5.12–35.16] | 6.19 [3.13–10.45] |
| San as non-African | 1.11 [0.05–9.58] | 0.27 [0.11–0.66] | 3.12 [0.18–36.09] | 0.89 [0.36–2.05] |
| Bantu as San | 0.08 [0.00–5.49] | 0.04 [0.00–0.13] | 0.32 [0.00–21.54] | 0.14 [0.00–0.49] |
| Bantu as non-African | 0.45 [0.02–8.63] | 0.09 [0.02–0.23] | 2.36 [0.07–33.44] | 0.37 [0.09–0.92] |
| Non-African as San | 0.14 [0.00–8.31] | 0.09 [0.03–0.19] | 0.42 [0.00–25.87] | 0.25 [0.08–0.54] |
| Non-African as Bantu | 0.95 [0.14–9.66] | 0.18 [0.07–0.33] | 3.09 [0.35–28.68] | 0.47 [0.20–0.92] |
This table reports the median and interquartile range (IQR) of the percentage of SNPs that were miss-called by LAMP-LD and RFMix for each of the six possible miss-call categories. The known ancestry of a simulated data set of 1500 SAC chromosomes was compared to the ancestry called by the software program (chromosome 1). Both the median percentage of miss-called SNPs across all SNPs and the median percentage of miss-called SNPs across SNPs of that source ancestry are shown. San ancestry can, for example, be miss-called as either Bantu or non-African ancestry. The median percentage of all SNPs that were miss-called as such is shown in the second and third columns of the first two rows, and the median percentage of San SNPs that were miss-called as such is shown in the fourth and fifth columns of the first two rows. The mean proportions of San, Bantu and non-African ancestry in the simulated data set were 0.3342, 0.2772 and 0.3885 respectively. The differences between the number of SNPs miss-called by RFMix and the corresponding number of SNPs miss-called by LAMP-LD were significant with p-values 210 for each of the six possible miss-call categories.
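The two percentages described in this caption can be illustrated for a single chromosome with a minimal Python sketch. The function name `miscall_percentages`, the ancestry labels and the toy data are assumptions made for illustration; they are not taken from the study, and the published values are medians and IQRs of such per-chromosome percentages across the 1500 simulated chromosomes.

```python
from collections import Counter

def miscall_percentages(true_anc, called_anc):
    """Map each (true, called) miss-call pair to a tuple of
    (percentage of all SNPs, percentage of SNPs of the true ancestry)."""
    if len(true_anc) != len(called_anc):
        raise ValueError("ancestry vectors must have the same length")
    total = len(true_anc)
    per_ancestry = Counter(true_anc)  # number of SNPs per true source ancestry
    miscalls = Counter(
        (t, c) for t, c in zip(true_anc, called_anc) if t != c
    )
    return {
        pair: (100.0 * n / total, 100.0 * n / per_ancestry[pair[0]])
        for pair, n in miscalls.items()
    }

# Hypothetical toy chromosome with 10 SNPs (not study data).
true_anc = ["San"] * 4 + ["Bantu"] * 4 + ["nonAfr"] * 2
called = ["San", "San", "Bantu", "nonAfr"] + ["Bantu"] * 4 + ["nonAfr"] * 2
result = miscall_percentages(true_anc, called)
```

In this toy example one of ten SNPs is San miss-called as Bantu, which is 10% of all SNPs but 25% of the four San SNPs. This is the distinction between the "percentage of total" and "percentage of ancestry" columns.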
Correlation between the number of miss-called ancestry segments and deviation in ancestry
| Number miss-called | Deviation in San ancestry | Deviation in Bantu ancestry | Deviation in non-African ancestry |
|---|---|---|---|
| San as Bantu | -0.83 | +0.89 | +0.05 |
| San as non-African | -0.39 | +0.19 | +0.43 |
| Bantu as San | +0.01 | +0.09 | -0.17 |
| Bantu as non-African | -0.02 | +0.13 | -0.21 |
| Non-African as San | +0.01 | +0.06 | -0.14 |
| Non-African as Bantu | -0.11 | +0.24 | -0.21 |
This table summarizes the correlation between the number of ancestry miss-calls that occurred at a segment of ancestry, for each of the six possible miss-call categories, and the deviation in local ancestry of the segment. Miss-called ancestry was identified by comparing the known ancestry of a simulated data set of 1500 SAC chromosomes to the ancestry called by RFMix (chromosome 1). Deviations in ancestry were calculated by subtracting the overall mean RFMix ancestry from the local ancestry of each segment, for each of the three source ancestries (San, Bantu, non-African).
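The deviation and correlation described in this caption can be sketched as follows. The caption does not state which correlation coefficient was used, so Pearson's coefficient is an assumption here, as are the function names and the toy values; the overall mean is approximated by the mean of the toy segments.

```python
from math import sqrt

def ancestry_deviations(local_ancestry, overall_mean):
    """Deviation of each segment's local ancestry from the overall mean."""
    return [value - overall_mean for value in local_ancestry]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)

# Hypothetical per-segment values (not study data).
san_local = [0.30, 0.32, 0.25, 0.35]      # mean San ancestry per segment
miscalls_san_as_bantu = [10, 5, 40, 0]    # San-as-Bantu miss-calls per segment
overall_mean = sum(san_local) / len(san_local)
deviations = ancestry_deviations(san_local, overall_mean)
r = pearson(miscalls_san_as_bantu, deviations)
```

With these toy values `r` is negative: segments with more San-as-Bantu miss-calls show lower San ancestry than average, the same direction as the -0.83 entry in the table.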
Figure 1. Mean local ancestry across the genome. The mean local ancestry estimates of TB cases and controls are shown per genomic position, for each of the source ancestries. Each panel represents a separate chromosome.
Regions of the genome with excess San ancestry in TB cases relative to controls
| Region | Begin-end SNP | Length in bp (Nr SNPs) | Mean San ancestry, TB cases | Mean San ancestry, controls | Genes |
|---|---|---|---|---|---|
| 1p31 | rs12144711-rs7554551 | 671230 (123) | 0.2902 | 0.1615 | GADD45A, GNG12, DIRAS3 |
| 9q21 | rs2309428-rs1847503 | 2080640 (323) | 0.2909 | 0.1609 | FAM189A, APBA1, PTAR1, C9orf135, MAMDC2, SMC5, KLF9 |
| 22q12 | rs16986925-rs6006426 | 1290997 (152) | 0.2850 | 0.1745 | C22orf31, KREMEN1, EWSR1, RHBDD3, EMID1, AP1B1, RASL10A, GAS2L1, NEFH, RFPL1, NF2, NIPSNAP1, THOC5, UQCR10, CABP7, ZMAT5, ASCC2, MTMR3, HORMAD2, LIF, OSM |
This table summarizes regions of the genome with excess San ancestry, found in TB cases relative to controls, after adjusting for age, gender and genome-wide San ancestry. Ancestry segments that are associated with increased San ancestry in cases compared to controls were identified and contiguous segments were merged. P-values for each of the individual ancestry segments are available in Additional file 1: Table S1. The mean RFMix genome-wide San ancestry estimates are 0.2304 and 0.1847 for cases and controls respectively, and the standard deviation of San local ancestry deviations is 0.0258 and 0.0321 in cases and controls respectively. Only regions of 500 000 base pairs or longer are shown (two short regions on chromosome 5 were excluded).
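The merging and length filtering described in this caption can be sketched in Python. This is an illustration under stated assumptions, not the authors' pipeline: segments are represented as hypothetical `(chromosome, start_bp, end_bp)` tuples that have already been flagged as associated, and two segments count as contiguous when the next one starts no later than one base pair after the previous one ends.

```python
def merge_contiguous(segments):
    """Merge flagged segments on the same chromosome that touch or overlap."""
    merged = []
    for chrom, start, end in sorted(segments):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + 1:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged

def filter_by_length(regions, min_bp=500_000):
    """Keep only regions spanning at least min_bp base pairs."""
    return [r for r in regions if r[2] - r[1] >= min_bp]

# Hypothetical flagged segments (not study data).
segments = [
    ("9", 1_000_000, 1_400_000),
    ("9", 1_400_001, 1_900_000),
    ("5", 100, 200_000),
    ("9", 5_000_000, 5_100_000),
]
regions = filter_by_length(merge_contiguous(segments))
```

Here the two adjacent chromosome 9 segments merge into one 900 000 bp region, while the short chromosome 5 segment and the isolated chromosome 9 segment fall below the 500 000 bp threshold and are dropped.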
Regions of the genome with excess African ancestry in TB cases relative to controls
| Region | Begin-end SNP | Length in bp (Nr SNPs) | Mean African ancestry, TB cases | Mean African ancestry, controls | Genes |
|---|---|---|---|---|---|
| 5q11 | rs26090-rs1382907 | 739064 (70) | 0.6480 | 0.4615 | ISL1 |
| 10q22 | rs827299-rs7083934 | 6243529 (693) | 0.6607 | 0.5030 | UNC5B, SLC29A3, CDH23, C10orf105, PSAP, B7-H5, CHST3, SPOCK2, ASCC1, DDIT4, DNAJB12, MICU1, MCU, OIT3, PLA2G12B, P4HA1, NUDT13, FAM149B1, DNAJC9, MRPS16, TTC18, ANXA7, PPP3CB, MSS51, MYOZ1, AGAP5, SYNPO2L, CAMK2G, NDST2, SEC24C, ZSWIM8, FUT11, CHCHD1, PLAU, C10orf55, VCL, AP3M1, ADK, KAT6B, DUPD1, SAMD8, DUSP13, VDAC2, COMTD1, ZNF503 |
| 15q15 | rs1712435-rs16966424 | 2669916 (182) | 0.6511 | 0.4963 | PLA2G4D, VPS39, GANC, TMEM87A, CAPN3, SNAP23, ZNF106, HAUS2, LRRC57, TTBK2, CDAN1, UBR1, EPB42, TMEM62, TGM5, TGM7, TP53BP1, LCMT2, ZSCAN29, TUBGCP4, ADAL, CKMT1B, MAP1A, PPIP5K1, STRC, CKMT1A, CATSPER2, PDIA3, MFAP1, SERF2, HYPK, ELL3, SERINC4, WDR76, FRMD5, CASC4, CTDSPL2, EIF3J, SPG11, PATL2 |
| 17q22 | rs7210845-rs9908090 | 5200677 (479) | 0.6579 | 0.4698 | ANKFN1, NOG, C17orf67, TRIM25, DGKE, COIL, SCPEP1, AKAP1, MSI2, CCDC182, MRPS23, CUEDC1, SRSF1, VEZF1, DYNLL2, EPX, OR4D1, MKS1, OR4D2, LPO, MPO, BZRAP1, SUPT4H1, RNF43, HSF5, SEPT4, MTMR4, C17orf47, TEX14, RAD51C, PPM1E, TRIM37, SKA2, GDPD1, SMG8, PRR11, CLTC, DHX40, PTRH2, VMP1, RPS6KB1, TUBD1, RNFT1, HEATR6, CA4, USP32, SCARNA20, RPL32P32, C17orf64, APPBP2, PPM1D |
This table summarizes regions of the genome with excess African ancestry (San or Bantu), found in TB cases relative to controls, after adjusting for age, gender and genome-wide African ancestry. Ancestry segments that are associated with increased African ancestry in cases compared to controls were identified and contiguous segments were merged. P-values for each of the individual ancestry segments are available in Additional file 1: Table S2. The mean RFMix genome-wide African ancestry estimates are 0.6096 and 0.5238 for cases and controls respectively, and the standard deviation of local ancestry deviations is 0.0187 and 0.0336 in cases and controls respectively. Only regions of 500 000 base pairs or longer and that contain protein coding genes are shown (one short region on chromosome 5, one short region on chromosome 6, and four short regions on chromosome 10 were excluded).
Source population data
| Population | Group | Description | Source | Platform | Size |
|---|---|---|---|---|---|
| San | | Ju\|’hoansi San from North Namibia | Private | Affymetrix 6.0 | 21 |
| Bantu | YRI | Yoruba in Ibadan, Nigeria | HapMap3 | Release 2 | 112 |
| Non-African | CEU | Utah residents with Northern and Western European ancestry, USA | HapMap3 | Release 2 | 112 |
| | GIH | Gujarati Indians from Houston, Texas, USA | HapMap3 | Release 2 | 88 |
| | JPT + CHB | Japanese in Tokyo, Japan and Han Chinese in Beijing, China | HapMap3 | Release 2 | 170 |
Data sets used to represent the source populations of the South African Coloured population. The sample size reflects the group size after relative pairs have been removed.