Conghui Qu, Johanna M Schuetz, Jeong Eun Min, Stephen Leach, Denise Daley, John J Spinelli, Angela Brooks-Wilson, Jinko Graham.
Abstract
We describe a statistical approach to predict gender-labeling errors in candidate-gene association studies, when Y-chromosome markers have not been included in the genotyping set. The approach adds value to methods that consider only the heterozygosity of X-chromosome SNPs, by incorporating available information about the intensity of X-chromosome SNPs in candidate genes relative to autosomal SNPs from the same individual. To our knowledge, no published methods formalize a framework in which heterozygosity and relative intensity are simultaneously taken into account. Our method offers the advantage that, in the genotyping set, no additional space is required beyond that already assigned to X-chromosome SNPs in the candidate genes. We also show how the predictions can be used in a two-phase sampling design to estimate the gender-labeling error rates for an entire study, at a fraction of the cost of a conventional design.
Keywords: X-chromosome SNPs; candidate-gene association study; error rates; gender-labeling errors; genotype intensities; heterozygosity; quality control; two-phase sampling design
Year: 2011 PMID: 22303327 PMCID: PMC3270323 DOI: 10.3389/fgene.2011.00031
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Stratum labels (strat), numbers within each stratum (tot), numbers within strata of known true gender (gen) and true gender counts (tf = true female, tm = true male, tf + tm = gen).
| Prob. male | Female: strat | Female: tot | Female: gen | Female: tf | Female: tm | Male: strat | Male: tot | Male: gen | Male: tf | Male: tm |
|---|---|---|---|---|---|---|---|---|---|---|
| [0, 0.2] | s1 | 534 | 42 | 42 | 0 | s2 | 6 | 5 | 5 | 0 |
| (0.2, 0.8] | s3 | 1 | 1 | 1 | 0 | s4 | 3 | 1 | 0 | 1 |
| (0.8, 1] | s5 | 12 | 10 | 0 | 10 | s6 | 654 | 131 | 0 | 131 |
Each stratum is defined by a combination of labeled gender and the probability of being labeled male (Prob. male) estimated by our approach.
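As a rough illustration of how stratum counts like these can feed an overall error-rate estimate, the sketch below scales each stratum's observed error fraction (errors among samples of known true gender) up to the stratum size and then sums over strata. This is a generic stratified estimator, not necessarily the estimator used by the authors, and it assumes the samples of known gender are representative of their stratum. The function name `stratified_error_rate` is hypothetical.

```python
# Counts copied from the stratum table above: (tot, gen, errors among gen).
# For female-labeled strata an error is a true male (tm);
# for male-labeled strata an error is a true female (tf).
female_labeled = {"s1": (534, 42, 0), "s3": (1, 1, 0), "s5": (12, 10, 10)}
male_labeled = {"s2": (6, 5, 5), "s4": (3, 1, 0), "s6": (654, 131, 0)}


def stratified_error_rate(strata):
    """Scale each stratum's observed error fraction up to the stratum size.

    Assumption: samples of known gender are representative of their stratum.
    """
    est_errors = sum(tot * err / gen for tot, gen, err in strata.values())
    total = sum(tot for tot, _, _ in strata.values())
    return est_errors / total


rate_female = stratified_error_rate(female_labeled)
rate_male = stratified_error_rate(male_labeled)
print(f"female-labeled error rate estimate: {rate_female:.4f}")
print(f"male-labeled error rate estimate:   {rate_male:.4f}")
```

Under this sketch all estimated errors come from strata s5 and s2, where the labeled gender disagrees with a confident prediction.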
Stratum labels (strat), second-phase sampling fractions (frac), and numbers in the second-phase sample (sam = frac × tot, where tot is given in the preceding table), with true gender counts (tf = true female, tm = true male).
| Prob. male | Female: strat | Female: frac | Female: sam | Female: tf | Female: tm | Male: strat | Male: frac | Male: sam | Male: tf | Male: tm |
|---|---|---|---|---|---|---|---|---|---|---|
| [0, 0.2] | s1 | 0.028 | 15 | 15 | 0 | s2 | 0.833 | 5 | **5** | 0 |
| (0.2, 0.8] | s3 | 1.000 | 1 | 1 | 0 | s4 | 0.333 | 1 | 0 | 1 |
| (0.8, 1] | s5 | 0.833 | 10 | 0 | **10** | s6 | 0.023 | 15 | 0 | 15 |
Each stratum is defined by a combination of labeled gender and the probability of being labeled male (Prob. male) estimated by our approach. Gender-labeling errors are marked in bold.
*Estimated probability of being labeled male is 0.49.
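The relation sam = frac × tot can be checked directly. The sketch below recomputes the second-phase sample sizes from the reported sampling fractions and the stratum totals; because the fractions are reported to three decimals, the products are rounded to the nearest whole sample, which is an assumption about how the table was produced.

```python
# (frac, tot) per stratum: frac from this table, tot from the stratum table.
strata = {
    "s1": (0.028, 534), "s2": (0.833, 6),
    "s3": (1.000, 1),   "s4": (0.333, 3),
    "s5": (0.833, 12),  "s6": (0.023, 654),
}

# Fractions are rounded to three decimals, so round to the nearest sample.
sam = {name: round(frac * tot) for name, (frac, tot) in strata.items()}
total_second_phase = sum(sam.values())

print(sam)
print("second-phase sample size:", total_second_phase)
```

The small, uncertain strata are sampled at high fractions, while the two large strata in which label and prediction agree (s1 and s6) are sampled at under 3%.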
Figure 1. Adjusted X-chromosome intensity versus proportion of heterozygous X-chromosome SNPs. The data fall into vertical bars because, with nine X-chromosome SNPs, a given sample may be heterozygous at zero, one, two, or in theory up to nine SNPs. Only eight main vertical bars appear because no sample genotyped at all nine SNPs was heterozygous at eight or more of them. Intermediate heterozygosities come from samples genotyped at fewer than nine SNPs. X-chromosome intensity values on the vertical axis have been adjusted for the sample mean GenCall score and the sample mean intensity across autosomal SNPs, as described in the text.
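To see why samples with missing genotypes fall between the main bars, the sketch below computes the heterozygosity proportion as heterozygous SNPs divided by called SNPs. The helper name `het_proportion` and the example of one heterozygous call out of eight called SNPs are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def het_proportion(n_het, n_called):
    """Proportion of called X-chromosome SNPs that are heterozygous."""
    return Fraction(n_het, n_called)


# Positions of the main bars: samples genotyped at all nine SNPs.
main_bars = sorted({het_proportion(k, 9) for k in range(10)})

# Hypothetical sample with one missing genotype and one heterozygous call.
off_bar = het_proportion(1, 8)

print([float(b) for b in main_bars])
print(float(off_bar), off_bar in main_bars)
```

With only eight called SNPs, the hypothetical sample lands at 1/8, strictly between the bars at 1/9 and 2/9.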
Summary of results for male-labeled samples flagged by our approach or by positive heterozygosity values.
| Sample | No. het X-chromosome SNPs | No. called X-chromosome SNPs | Heterozygosity only¹ | Prob. male² | Proposed method³ | PCR result* |
|---|---|---|---|---|---|---|
| 1 | 3 | 9 | F | 0.04 | F | F |
| 2 | 1 | 9 | F | 0.06 | F | F |
| 3 | 3 | 9 | F | 0.07 | F | U |
| 4 | 5 | 9 | F | 0.09 | F | F |
| 5 | 5 | 9 | F | 0.11 | F | F |
| 6 | 1 | 9 | F | 0.12 | F | F |
| 7 | 0 | 9 | M | 0.33 | F | M |
| 8 | 0 | 9 | M | 0.76 | M | U |
| 9 | 0 | 9 | M | 0.78 | M | U |
| 10 | 1 | 9 | F | 1 | M | M |
| 11 | 1 | 9 | F | 1 | M | M |
| 12 | 1 | 9 | F | 1 | M | M |
¹ Gender inferred based only on presence of any heterozygous genotypes.
² Fitted probability of being labeled male under the proposed method.
³ Gender call based on an arbitrary threshold of 0.5 for fitted probability of being labeled male.
* U indicates sample could not be checked due to low DNA amounts.
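The two calling rules compared in this table can be written out as a minimal sketch. The heterozygosity-only rule calls a sample female when any X-chromosome SNP is heterozygous; the proposed-method call applies the 0.5 threshold to the fitted probability, with values at or above the threshold called male. The function names are hypothetical, and the fitted probabilities are taken from the table as given rather than recomputed.

```python
def call_by_heterozygosity(n_het):
    """Heterozygosity-only rule: any heterozygous X SNP implies female."""
    return "F" if n_het > 0 else "M"


def call_by_probability(prob_male, threshold=0.5):
    """Threshold rule: male when the fitted probability reaches the threshold."""
    return "M" if prob_male >= threshold else "F"


# (sample, n_het, prob_male) for three rows of the table above.
rows = [(1, 3, 0.04), (7, 0, 0.33), (10, 1, 1.0)]
calls = {
    sample: (call_by_heterozygosity(n_het), call_by_probability(p))
    for sample, n_het, p in rows
}
for sample, (het_call, prob_call) in calls.items():
    print(sample, het_call, prob_call)
```

Samples 7 and 10 are rows where the two rules disagree, which is where the intensity information changes the call.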
Summary of results for female-labeled samples flagged by our approach.
| Sample | No. het X-chromosome SNPs | No. called X-chromosome SNPs | Prob. male¹ | Proposed method² | PCR result* |
|---|---|---|---|---|---|
| 13 | 0 | 8 | 0.49 | F | F |
| 14 | 0 | 9 | 0.91 | M | U |
| 15 | 0 | 9 | 0.97 | M | M |
| 16 | 0 | 9 | 0.92 | M | M |
| 17 | 0 | 9 | 0.92 | M | M |
| 18 | 0 | 9 | 0.93 | M | M |
| 19 | 0 | 9 | 0.94 | M | M |
| 20 | 0 | 9 | 0.94 | M | M |
| 21 | 0 | 9 | 0.98 | M | M |
| 22 | 0 | 9 | 0.99 | M | M |
| 23 | 0 | 9 | 0.99 | M | M |
| 24 | 0 | 9 | 0.97 | M | M |
| 25 | 0 | 7 | 0.93 | M | U |
¹ Fitted probability of being labeled male under the proposed method.
² Gender call based on an arbitrary threshold of 0.5 for fitted probability of being labeled male.
* U indicates sample could not be checked due to low DNA amounts.
Figure 2. Plot of gender-labeling errors predicted by our approach. The left panel shows male-labeled samples; “predfemale” marks samples with a predicted probability of being male below 0.5. The right panel shows female-labeled samples; “predmale” marks samples with a predicted probability of being male of 0.5 or greater. X-chromosome intensity values on the vertical axes have been adjusted for the sample mean GenCall score and the sample mean intensity across autosomal SNPs, as described in the text.
Method-specific numbers of flagged samples (flag) with true gender ascertained (gen), numbers of confirmed gender errors (tf = true female, tm = true male) among these, and error discovery rate (edr).
| Method | Female: flag | Female: gen | Female: tm | Female: edr | Male: flag | Male: gen | Male: tf | Male: edr |
|---|---|---|---|---|---|---|---|---|
| Our method | 12 | 10 | 10 | 1.000 | 7 | 6 | 5 | 0.833 |
| PLINK | 77 | 18 | 10 | 0.556 | 4 | 3 | 3 | 1.000 |
| Golden Helix-H | 77 | 18 | 10 | 0.556 | 9 | 8 | 5 | 0.625 |
| Golden Helix-I | 103 | 15 | 8 | 0.533 | 115 | 29 | 4 | 0.138 |
| PLATO | – | – | – | – | 9 | 8 | 5 | 0.625 |
* Golden Helix-H: Golden Helix heterozygosity-based approach.
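The error discovery rates in this table are consistent with dividing the number of confirmed gender errors by the number of flagged samples whose true gender was ascertained. The sketch below recomputes them under that reading; the function name `error_discovery_rate` is hypothetical, and `None` stands for the female-labeled PLATO cells, which the table does not report.

```python
def error_discovery_rate(gen, confirmed):
    """Confirmed gender errors divided by flagged samples of known true gender."""
    return confirmed / gen


# (gen, confirmed errors) copied from the table above; None where not reported.
table = {
    "Our method":     {"female": (10, 10), "male": (6, 5)},
    "PLINK":          {"female": (18, 10), "male": (3, 3)},
    "Golden Helix-H": {"female": (18, 10), "male": (8, 5)},
    "Golden Helix-I": {"female": (15, 8),  "male": (29, 4)},
    "PLATO":          {"female": None,     "male": (8, 5)},
}

edr = {
    method: {
        label: None if cell is None else round(error_discovery_rate(*cell), 3)
        for label, cell in cells.items()
    }
    for method, cells in table.items()
}
for method, rates in edr.items():
    print(method, rates)
```

Note that the denominator is gen rather than flag: flagged samples whose true gender could not be ascertained do not enter the rate.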