
Handling missing data in transmission disequilibrium test in nuclear families with one affected offspring.

Gulhan Bourget.

Abstract

The Transmission Disequilibrium Test (TDT) compares the frequencies with which heterozygote parents transmit each of two alleles to an affected offspring. The test requires genotypes to be known for all members of the nuclear families. However, obtaining all genotypes might not be possible for some families, in which case the data set contains missing genotypes. There are many techniques for handling missing genotypes in parents but only a few for offspring. The robust TDT (rTDT) is one of the methods that handle missing genotypes for all members of nuclear families with one affected offspring. Even though it accommodates missing genotypes for any family member, the rTDT is a conservative test with low power. We propose a new method, Mendelian Inheritance TDT (MITDT-ONE), that controls type I error and has high power. The MITDT-ONE uses Mendelian inheritance properties and incorporates the population frequencies of the disease allele and marker allele into the rTDT method. One advantage of the MITDT-ONE is that it can identify additional significant genes that are not found by the rTDT. We demonstrate the performance of both tests, along with the Sib-TDT (S-TDT), in Monte Carlo simulation studies. Moreover, we apply our method to the type 1 diabetes data from the Warren families in the United Kingdom to identify significant genes that are related to type 1 diabetes.

Year:  2012        PMID: 23056239      PMCID: PMC3466247          DOI: 10.1371/journal.pone.0046100

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The Transmission Disequilibrium Test (TDT) is the most widely used family-based test for linkage in the presence of linkage disequilibrium [1], [2]. It was first introduced to handle one affected offspring in a nuclear family, and was later extended to two or more affected offspring and to multi-allelic markers. The TDT compares the frequencies with which heterozygote parents transmit each of two alleles to an affected offspring, and it requires complete genotypes from parents and offspring. However, genotypes are sometimes unavailable.

If parental genotypes are missing, the common approaches in practice are to include only complete cases [3], [4], [5], [6], [7], or to reconstruct missing parental genotypes under a missing at random (MAR) model [8]. However, if a parent's genotype is missing because of that parent's genotype at the locus of interest, then the informatively missing model is more appropriate than the MAR model [9]. Several alternatives have also been proposed: including only complete families and families with only one parent missing when parents are informatively missing [3], [6], [7], [10]; reconstructing parental genotypes from their affected offspring [2], or from affected and unaffected siblings (Reconstruction-Combined TDT) [4], [11]; completely ignoring parental genotypes and comparing genotype frequencies of unaffected and affected offspring (S-TDT) [12], [13], [14], [15]; or combining data from families with parental genotypes and from families with missing parental genotypes but with genotyped unaffected siblings (C-TDT) [12].

The robust TDT (rTDT) was proposed to handle any missing genotypes in a nuclear family with one affected offspring and a bi-allelic marker [16]. The rTDT does not assume any missing-data model, and defines an interval estimate of the TDT by considering all possible completions of missing genotypes. Sebastiani et al. [16] claimed that rTDT has more power than TDT. However, no simulation study was performed; the claim was shown mathematically only for a specific missing pattern shared by every family [16]. That is, they assumed that all families with missing data have the same form: the genotype of one parent is missing, the other parent is heterozygous, and the affected child is homozygous [see the Discussion section for more details]. Assuming this single missing pattern for every family is not reasonable in practice. Alpargu (Bourget) [17] defined the rTDT for two affected offspring, and showed in simulation studies that rTDT was too conservative and had low power. Because of this poor performance, the Mendelian Inheritance-Transmission Disequilibrium Test (MI-TDT), which incorporates the population frequencies of the disease allele and marker allele into rTDT, was proposed [17]. The MI-TDT performed better than rTDT by controlling type I error rates and having high power. Since MI-TDT outperformed rTDT, in this paper we propose the Mendelian Inheritance-Transmission Disequilibrium Test for one affected offspring (MITDT-ONE). Like MI-TDT, the MITDT-ONE incorporates the population frequencies of the disease allele and marker allele into rTDT. A simulation study replicating real-life scenarios, such as different missing models and different genetic models, shows that MITDT-ONE outperforms rTDT by providing better control of type I error rates and producing higher power.

Methods

We demonstrate the features of rTDT and MITDT-ONE with an example. We assume that we have genotypes of nuclear families with one affected offspring, and bi-allelic markers with alleles 1 and 2. In a given data set, a genotype is either complete, (1,1), (1,2), or (2,2), or missing, (0,0). For each family there are three genotypes, the first two for the parents and the last for the offspring (e.g., (1,2)(1,1)(1,2)). If at least one of the genotypes is unknown, the trio is called incomplete; otherwise it is called complete. Hence, for a given marker, a whole data set has two parts: complete and incomplete trio genotypes.

The TDT considers transmissions from heterozygote parents, i.e., parents with genotype (1,2), to affected offspring. Let $b$ be the number of heterozygote parents that transmit allele 1 to an affected offspring, and $c$ the number that transmit allele 2. Then the TDT statistic for complete data,

$$\mathrm{TDT} = \frac{(b - c)^2}{b + c}, \qquad (1)$$

tests linkage ($\theta < 1/2$, where $\theta$ is the recombination fraction) between a disease and a marker locus in the presence of linkage disequilibrium ($\delta > 0$ or $\delta < 0$, where $\delta$ is the coefficient of disequilibrium) [1]. Under the null hypothesis of no linkage ($\theta = 1/2$), the TDT follows a central chi-square distribution with 1 degree of freedom (df).

We construct interval estimates of MITDT-ONE and rTDT as follows: (1) compute the maximum and minimum increments in $b$ and $c$ by considering all possible admissible completions of missing genotypes, (2) find the population frequencies of the disease allele, $p$, and of the marker allele, $m$, and finally, (3) compute the maximum and minimum values of $b$ and $c$. While all three steps are involved in MITDT-ONE, rTDT does not require step (2). This is the only important difference between the two methods. However, MITDT-ONE requires the value of $p$, which is difficult to know for some diseases. We can work around the lack of knowledge of $p$ by assuming $p = m$, because McGinnis (1998) [18] showed that TDT is able to detect linkage, and its power exceeds 0.5, only when $\delta$ is close to its most positive value (see the definition in the following section) and the allele frequencies $m$ and $p$ are similar in magnitude at the marker and disease locus.

For complete families, let us assume that we have 50 heterozygote parents, of which 35 transmit allele 1 ($b = 35$) and 15 transmit allele 2 ($c = 15$). Using (1), we compute $\mathrm{TDT} = (35 - 15)^2 / 50 = 8$. The critical value of the chi-square distribution with 1 df at the 5% nominal level is 3.84. Based on complete cases only, we therefore reject the null hypothesis of no linkage at the 5% nominal level. Now, for given values of $m$ and $p$, consider two families with missing genotypes, as in Table 1.
Table 1

Two missing cases.

| Family | Parents | Children |
|---|---|---|
| 1 | (0,0) (1,2) | (0,0) |
| 2 | (1,2) (1,1) | (0,0) |
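The complete-case calculation in the example can be checked with a few lines of Python. This is a minimal sketch that assumes the standard TDT form, (b − c)² / (b + c); the function name `tdt_statistic` and the argument names `b` and `c` are illustrative and not taken from the paper.

```python
def tdt_statistic(b: int, c: int) -> float:
    """Standard TDT statistic from transmission counts.

    b: heterozygote parents transmitting allele 1 (illustrative name)
    c: heterozygote parents transmitting allele 2 (illustrative name)
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygote) parents")
    return (b - c) ** 2 / (b + c)


# Critical value of the chi-square distribution with 1 df at the 5% level,
# as quoted in the text.
CHI2_1DF_5PCT = 3.84

stat = tdt_statistic(35, 15)      # 35 and 15 transmissions among 50 parents
reject = stat > CHI2_1DF_5PCT
print(stat, reject)               # 8.0 True
```

With 35 and 15 transmissions the statistic exceeds 3.84, which matches the rejection of the null hypothesis for the complete cases.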
The first step of imputing missing cases involves only possible admissible completions. The MITDT-ONE and rTDT (as does TDT) consider families with at least one heterozygote parent. For example, if the incomplete case is (1,1)(0,0)(1,2), we do not consider the completion (1,1)(2,2)(1,2) because both parents have homozygous genotypes. Moreover, in family 2 above, (1,2)(1,1)(2,2) is not a possible admissible completion because the only possible completions for offspring are (1,1) or (1,2). All possible admissible genotypes are defined in Table 2.
Table 2

Admissible cases.

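| Family | Scenario | Parents | Children | Increment in allele 1 transmissions | Increment in allele 2 transmissions |
|---|---|---|---|---|---|
| 1 | 1 | (1,1) (1,2) | (1,1) | 1 | 0 |
| | 2 | (1,1) (1,2) | (1,2) | 0 | 1 |
| | 3 | (2,2) (1,2) | (1,2) | 1 | 0 |
| | 4 | (2,2) (1,2) | (2,2) | 0 | 1 |
| | 5 | (1,2) (1,2) | (1,1) | 2 | 0 |
| | 6 | (1,2) (1,2) | (1,2) | 1 | 1 |
| | 7 | (1,2) (1,2) | (2,2) | 0 | 2 |
| 2 | 8 | (1,2) (1,1) | (1,1) | 1 | 0 |
| | 9 | (1,2) (1,1) | (1,2) | 0 | 1 |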
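The admissible completions in Table 2 can be reproduced by enumeration. The sketch below is an illustration rather than the author's implementation: it assumes that a completion is admissible when it is Mendelian-consistent and has at least one heterozygote parent, and all function names are hypothetical.

```python
from itertools import product

GENOTYPES = [(1, 1), (1, 2), (2, 2)]
MISSING = (0, 0)


def mendelian_ok(father, mother, child):
    """True if the child can inherit one allele from each parent."""
    return any(tuple(sorted((f, m))) == child for f in father for m in mother)


def increments(father, mother, child):
    """(allele 1, allele 2) transmissions coming from heterozygote parents."""
    alleles = list(child)
    for parent in (father, mother):
        if parent[0] == parent[1]:
            # A homozygote parent's allele is not an informative transmission.
            alleles.remove(parent[0])
    return alleles.count(1), alleles.count(2)


def completions(father, mother, child):
    """Mendelian-consistent completions with at least one heterozygote parent."""
    options = [GENOTYPES if g == MISSING else [g] for g in (father, mother, child)]
    out = []
    for f, m, c in product(*options):
        if (1, 2) in (f, m) and mendelian_ok(f, m, c):
            out.append(((f, m, c), increments(f, m, c)))
    return out


family_1 = completions((0, 0), (1, 2), (0, 0))
family_2 = completions((1, 2), (1, 1), (0, 0))
print(len(family_1), len(family_2))   # 7 2
```

The counts agree with the seven scenarios listed for family 1 and the two listed for family 2, although the enumeration order differs from the table.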
Under the null hypothesis, a heterozygote parent transmits allele 1 but not allele 2 to an affected offspring with one probability, and transmits allele 2 but not allele 1 with another; both probabilities depend on the coefficient of disequilibrium $\delta$, the frequency $m$ of marker allele 1, and the population relative frequency $p$ of the disease allele [19]. The TDT statistic compares the numbers of transmissions that occur with these two probabilities. It can be shown that the probabilities are the same under the null hypothesis, so the expected numbers of transmissions of the two alleles are the same. However, the probabilities differ when there is linkage, and hence so do the numbers of transmissions. This means that the TDT statistic is related to the parameters $\delta$, $m$, and $p$.

All the scenarios in Table 2 have equal probabilities of being considered under the null hypothesis of no linkage. However, MITDT-ONE and rTDT consider only the increments in the number of allele 1 transmissions and the number of allele 2 transmissions. The exact maximum and minimum values of the TDT in (1) are attained by rTDT. With 35 and 15 transmissions of alleles 1 and 2 among the complete families, the interval estimate of rTDT is [5.45, 9.98]. The minimum value is attained when families 1 and 2 contribute increments of (0, 2) and (0, 1) (scenarios 7 and 9), and the maximum value when they contribute (2, 0) and (1, 0) (scenarios 5 and 8). The interval estimate of MITDT-ONE is obtained with the same completions of the families as rTDT. Both tests use the same admissible cases and consider the lower limits to identify significant genes. Both methods reject the null hypothesis of no linkage at the 5% nominal level in the above example. The interval estimate of MITDT-ONE is always contained in the interval estimate of rTDT (see Construction of the MITDT-ONE and rTDT for more details). It is important to note that MITDT-ONE and rTDT have the same minimum values for the two transmission counts but differ in their maximum values. Therefore, MITDT-ONE will never have less power than rTDT. Since the MITDT-ONE has more power and controls type I error rates better, we suggest using the MITDT-ONE test instead of the rTDT test.
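The rTDT limits for the worked example can be recovered by brute force over the scenarios of Table 2. This is a sketch under the assumption that the statistic has the standard form (b − c)² / (b + c) and that the complete families contribute 35 and 15 transmissions; the variable names are illustrative.

```python
from itertools import product


def tdt(b, c):
    """Standard TDT statistic from the two transmission counts."""
    return (b - c) ** 2 / (b + c)


complete_b, complete_c = 35, 15   # transmissions of alleles 1 and 2, complete data

# (allele 1, allele 2) increments per scenario, copied from Table 2.
family_1 = [(1, 0), (0, 1), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]  # scenarios 1-7
family_2 = [(1, 0), (0, 1)]                                          # scenarios 8-9

values = [
    tdt(complete_b + b1 + b2, complete_c + c1 + c2)
    for (b1, c1), (b2, c2) in product(family_1, family_2)
]

tdt_min, tdt_max = min(values), max(values)
print(round(tdt_min, 2), round(tdt_max, 2))   # 5.45 9.98
```

The lower limit stays above 3.84, which is consistent with rTDT rejecting the null hypothesis in this example.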

Construction of the MITDT-ONE and rTDT

There are 17 admissible missing cases in a nuclear family with one affected offspring (Table 3). Sebastiani et al. [16] proposed an interval estimate of rTDT for one affected offspring. They proceeded in the following way: the TDT statistic in (1) is a monotone convex function on a closed domain, so it achieves its maximum and minimum values at extreme points of that domain. The maximum and minimum values of the two transmission counts were therefore used to define the maximum and minimum values of the TDT. First, all possible admissible completions were identified (Tables 4 and 5), and then the maximum and minimum increments in the two transmission counts (Table 6) were defined in terms of the number of missing families in each case $k$, $k = 1, \dots, 17$. The maximum and minimum values of the counts were then defined by adding these increments to the numbers of heterozygote parents that transmit allele 1 (2) to affected offspring in the complete data set. Finally, the interval estimate of rTDT was defined from these extreme counts.
Table 3

Number of missing cases in a family with one affected offspring.

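| Case | Parental genotype | Offspring (0,0) | Offspring (1,1) | Offspring (1,2) | Offspring (2,2) | Total |
|---|---|---|---|---|---|---|
| 1 | (0,0) (0,0) | + | + | + | + | 4 |
| 2 | (0,0) (1,1) | + | + | + | | 3 |
| 3 | (0,0) (1,2) | + | + | + | + | 4 |
| 4 | (0,0) (2,2) | + | | + | + | 3 |
| 5 | (1,1) (1,2) | + | | | | 1 |
| 6 | (1,2) (1,2) | + | | | | 1 |
| 7 | (1,2) (2,2) | + | | | | 1 |
| Total number of admissible incomplete trios | | | | | | 17 |

A plus sign (+) denotes a possible incomplete case; blank cells correspond to impossible incomplete cases or complete cases.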

Table 4

List of admissible completions for cases 1–8.

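| Case k | Incomplete parents | Incomplete offspring | Completed parents | Completed offspring | Allele 1 increment | Allele 2 increment |
|---|---|---|---|---|---|---|
| 1 | (0,0) (0,0) | (0,0) | (1,1) (1,1) | (1,1) | 0 | 0 |
| | | | (1,1) (1,2) | (1,1) | 1 | 0 |
| | | | (1,1) (1,2) | (1,2) | 0 | 1 |
| | | | (1,1) (2,2) | (1,2) | 0 | 0 |
| | | | (1,2) (1,2) | (1,1) | 2 | 0 |
| | | | (1,2) (1,2) | (1,2) | 1 | 1 |
| | | | (1,2) (1,2) | (2,2) | 0 | 2 |
| | | | (2,2) (1,2) | (1,2) | 1 | 0 |
| | | | (2,2) (1,2) | (2,2) | 0 | 1 |
| | | | (2,2) (2,2) | (2,2) | 0 | 0 |
| 2 | (0,0) (0,0) | (1,1) | (1,1) (1,1) | (1,1) | 0 | 0 |
| | | | (1,1) (1,2) | (1,1) | 1 | 0 |
| | | | (1,2) (1,2) | (1,1) | 2 | 0 |
| 3 | (0,0) (0,0) | (1,2) | (1,1) (1,2) | (1,2) | 0 | 1 |
| | | | (1,1) (2,2) | (1,2) | 0 | 0 |
| | | | (1,2) (1,2) | (1,2) | 1 | 1 |
| | | | (1,2) (2,2) | (1,2) | 1 | 0 |
| 4 | (0,0) (0,0) | (2,2) | (1,2) (1,2) | (2,2) | 0 | 2 |
| | | | (1,2) (2,2) | (2,2) | 0 | 1 |
| | | | (2,2) (2,2) | (2,2) | 0 | 0 |
| 5 | (0,0) (1,1) | (0,0) | (1,1) (1,1) | (1,1) | 0 | 0 |
| | | | (1,2) (1,1) | (1,1) | 1 | 0 |
| | | | (1,2) (1,1) | (1,2) | 0 | 1 |
| | | | (2,2) (1,1) | (1,2) | 0 | 0 |
| 6 | (0,0) (1,1) | (1,1) | (1,1) (1,1) | (1,1) | 0 | 0 |
| | | | (1,2) (1,1) | (1,1) | 1 | 0 |
| 7 | (0,0) (1,1) | (1,2) | (1,2) (1,1) | (1,2) | 0 | 1 |
| | | | (2,2) (1,1) | (1,2) | 0 | 0 |
| 8 | (0,0) (1,2) | (0,0) | (1,1) (1,2) | (1,1) | 1 | 0 |
| | | | (1,1) (1,2) | (1,2) | 0 | 1 |
| | | | (1,2) (1,2) | (1,1) | 2 | 0 |
| | | | (1,2) (1,2) | (1,2) | 1 | 1 |
| | | | (1,2) (1,2) | (2,2) | 0 | 2 |
| | | | (2,2) (1,2) | (1,2) | 1 | 0 |
| | | | (2,2) (1,2) | (2,2) | 0 | 1 |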
Table 5

List of admissible completions for cases 9–17.

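| Case k | Incomplete parents | Incomplete offspring | Completed parents | Completed offspring | Allele 1 increment | Allele 2 increment |
|---|---|---|---|---|---|---|
| 9 | (0,0) (1,2) | (1,1) | (1,1) (1,2) | (1,1) | 1 | 0 |
| | | | (1,2) (1,2) | (1,1) | 2 | 0 |
| 10 | (0,0) (1,2) | (1,2) | (1,1) (1,2) | (1,2) | 0 | 1 |
| | | | (1,2) (1,2) | (1,2) | 1 | 1 |
| | | | (2,2) (1,2) | (1,2) | 1 | 0 |
| 11 | (0,0) (1,2) | (2,2) | (1,2) (1,2) | (2,2) | 0 | 2 |
| | | | (2,2) (1,2) | (2,2) | 0 | 1 |
| 12 | (0,0) (2,2) | (0,0) | (1,1) (2,2) | (1,2) | 0 | 0 |
| | | | (1,2) (2,2) | (1,2) | 1 | 0 |
| | | | (1,2) (2,2) | (2,2) | 0 | 1 |
| | | | (2,2) (2,2) | (2,2) | 0 | 0 |
| 13 | (0,0) (2,2) | (1,2) | (1,1) (2,2) | (1,2) | 0 | 0 |
| | | | (1,2) (2,2) | (1,2) | 1 | 0 |
| | | | (2,2) (2,2) | (1,2) | 0 | 0 |
| 14 | (0,0) (2,2) | (2,2) | (1,2) (2,2) | (2,2) | 0 | 1 |
| | | | (2,2) (2,2) | (2,2) | 0 | 0 |
| 15 | (1,1) (1,2) | (0,0) | (1,1) (1,2) | (1,1) | 1 | 0 |
| | | | (1,1) (1,2) | (1,2) | 0 | 1 |
| 16 | (1,2) (1,2) | (0,0) | (1,2) (1,2) | (1,1) | 2 | 0 |
| | | | (1,2) (1,2) | (1,2) | 1 | 1 |
| | | | (1,2) (1,2) | (2,2) | 0 | 2 |
| 17 | (1,2) (2,2) | (0,0) | (1,2) (2,2) | (1,2) | 1 | 0 |
| | | | (1,2) (2,2) | (2,2) | 0 | 1 |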
Table 6

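Admissible increments of the transmission counts of alleles 1 and 2.

| Case | Parents | Offspring | (0,0) | (1,0) | (2,0) | (1,1) | (0,1) | (0,2) | Min. Inc | Max. Inc |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (0,0) (0,0) | (0,0) | + | + | + | + | + | + | 0 | 2 |
| 2 | (0,0) (0,0) | (1,1) | + | + | + | - | - | - | 0 | 2 |
| 3 | (0,0) (0,0) | (1,2) | + | + | - | + | + | - | 0 | 2 |
| 4 | (0,0) (0,0) | (2,2) | + | - | - | - | + | + | 0 | 2 |
| 5 | (0,0) (1,1) | (0,0) | + | + | - | - | + | - | 0 | 1 |
| 6 | (0,0) (1,1) | (1,1) | + | + | - | - | - | - | 0 | 1 |
| 7 | (0,0) (1,1) | (1,2) | + | - | - | - | + | - | 0 | 1 |
| 8 | (0,0) (1,2) | (0,0) | - | + | + | + | + | + | 1 | 2 |
| 9 | (0,0) (1,2) | (1,1) | - | + | + | - | - | - | 1 | 2 |
| 10 | (0,0) (1,2) | (1,2) | - | + | - | + | + | - | 1 | 2 |
| 11 | (0,0) (1,2) | (2,2) | - | - | - | - | + | + | 1 | 2 |
| 12 | (0,0) (2,2) | (0,0) | + | + | - | - | + | - | 0 | 1 |
| 13 | (0,0) (2,2) | (1,2) | + | + | - | - | - | - | 0 | 1 |
| 14 | (0,0) (2,2) | (2,2) | + | - | - | - | + | - | 0 | 1 |
| 15 | (1,1) (1,2) | (0,0) | - | + | - | - | + | - | 1 | 1 |
| 16 | (1,2) (1,2) | (0,0) | - | - | + | + | - | + | 2 | 2 |
| 17 | (1,2) (2,2) | (0,0) | - | + | - | - | + | - | 1 | 1 |

In each pair (i, j), i and j represent the increments in the numbers of allele 1 and allele 2 transmissions, respectively. The plus (minus) sign indicates that the increment is plausible (not plausible), as listed in Tables 4 and 5. The last two columns show the minimum and maximum increments in each case.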

The limits of the rTDT interval are given by one formula when the smallest possible count for one allele exceeds the largest possible count for the other, by a second formula when the roles of the two alleles are reversed, and by a third formula in all other cases. The lower limit of the interval, $\mathrm{TDT}_{\min}$ (upper limit, $\mathrm{TDT}_{\max}$), is used to decide against (in favor of) the null hypothesis. If the TDT for complete data (i.e., with missing data ignored) rejects the null hypothesis (i.e., the gene is significant) and $\mathrm{TDT}_{\min}$ also exceeds the critical value, then rTDT affirms the significant gene found in the complete data. Similarly, $\mathrm{TDT}_{\max}$ ratifies an insignificant gene if the TDT for complete data cannot reject the null hypothesis and $\mathrm{TDT}_{\max}$ is also below the critical value. In all other scenarios, rTDT cannot verify the conclusion drawn from the complete data.

Sebastiani et al. [16] did not run any simulation study to demonstrate the performance of rTDT. They showed theoretically that if all missing families are in case 9, which is not a reasonable assumption in practice, then rTDT has higher power than the classical TDT. Since the power of the TDT depends on the linkage disequilibrium $\delta$ and on the relative frequencies of the marker allele ($m$) and the disease allele ($p$) [20], we ran simulation studies that take into account different realistic disease models and missing models, involving $\delta$, $m$, and $p$. The simulation results show that rTDT overestimates the maximum values of the transmission counts (results are not shown), and hence becomes a conservative test with low power. Since these maxima do not involve $\delta$, we decided to scale them down for MITDT-ONE. One way to achieve this goal is to involve $m$ and $p$ in the scaling. These parameters appear together in the maximum linkage disequilibrium, $\delta_{\max}$ when linkage disequilibrium is positive ($\delta > 0$) and $\delta_{\min}$ when it is negative ($\delta < 0$) [18]. We scale the rTDT maximum with $\delta_{\max}$ and $\delta_{\min}$, and define the maximum for MITDT-ONE as the average of the scaled values, as given in (4). The maximum for the other allele is defined similarly, by making the corresponding replacement in (4).

Since TDT provides better power when linkage disequilibrium is at its maximum ($\delta = \delta_{\max}$) for $\delta > 0$ and $m = p$ [18], we can reformulate (4) for real sample data under these conditions. The lower limits of the interval estimates of rTDT and MITDT-ONE are used to identify significant genes. The way the interval estimate for MITDT-ONE is constructed guarantees that its lower limit is always at least as large as the lower limit of rTDT. This can be shown theoretically by treating each of the three conditions that define the interval in turn; the argument is the same for each. We claimed that rTDT is a conservative test. We have observed this through simulation studies but have not shown it theoretically. The reason rTDT becomes conservative is that its lower limit, in general, falls below the critical value of the chi-square distribution with 1 df at the nominal level $\alpha$ (for example, when $\alpha = 0.05$, this value is 3.84).
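The role of the extreme counts can be made explicit with a short derivation. This is a sketch, assuming the statistic has the standard form in terms of two transmission counts, written here as $b$ (allele 1) and $c$ (allele 2); the symbols and the box-shaped domain are assumptions made for illustration, not notation taken from [16].

```latex
T(b, c) = \frac{(b - c)^2}{b + c}, \qquad
\frac{\partial T}{\partial b} = \frac{(b - c)(b + 3c)}{(b + c)^2}, \qquad
\frac{\partial T}{\partial c} = -\frac{(b - c)(3b + c)}{(b + c)^2}.

% If b > c everywhere on the box
% [b_min, b_max] x [c_min, c_max], i.e. b_min > c_max, then T is
% increasing in b and decreasing in c, so
T_{\min} = T(b_{\min}, c_{\max}), \qquad
T_{\max} = T(b_{\max}, c_{\min}).
```

Under these assumptions, lowering the maximum count $c_{\max}$ while keeping $b_{\min}$ fixed can only raise $T_{\min}$. This is consistent with the statement that scaling down the maxima gives MITDT-ONE a lower limit that is at least as large as that of rTDT.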

Results

Simulation

We replicated the simulation study in [17] for one affected offspring. Let us assume a bi-allelic marker with alleles 1 and 2 which is linked to a bi-allelic disease locus with disease-predisposing allele $D$ and non-predisposing allele $d$. Each of the genotypes $DD$, $Dd$, and $dd$ has its own penetrance, and the population frequencies of the marker-disease haplotypes 1D, 1d, 2D, and 2d are $mp + \delta$, $m(1 - p) - \delta$, $(1 - m)p - \delta$, and $(1 - m)(1 - p) + \delta$, respectively. Here $p$ is the population relative frequency of disease allele $D$, $m$ and $1 - m$ are the frequencies of marker alleles 1 and 2, $\theta$ is the recombination fraction between the disease and marker locus, and $\delta$ is the coefficient of disequilibrium. The probability of a heterozygote parent transmitting marker allele 1 to a particular affected child is defined in (8), following [18].

Our simulation study considers realistic complex disease models. We generated 5,000 data sets for four different missing models and three genetic models (additive, dominant, and recessive). In each simulation, we generated 100 families; each family consisted of one affected and one unaffected offspring, with 50 heterozygote fathers and 50 heterozygote mothers. In the disease models, the probabilities of an affected child given homozygosity ($DD$), heterozygosity ($Dd$), and absence ($dd$) of the disease allele are the three penetrances, whose values were set separately for the dominant, additive, and recessive models. For the missing models, we consider (1) Missing Completely at Random (MCAR) for all genotypes, (2) informative missing for parental genotypes and MCAR for offspring genotypes, (3) informative missing for all genotypes, and (4) MCAR for parental genotypes and informative missing for offspring genotypes. A model is called “informatively missing” if at least two of the three genotype-specific missing rates of a family member are not equal, where the missing rates are defined for the father (f), mother (m), and offspring (o) with (1,1), (1,2), and (2,2) genotypes, respectively. In Table 7, the first column denotes the missing model (MM) and the missing rate setting (MR).
Table 7

Missing model (MM) and missing rates (MR).

| MM/MR | Missing rates: Father | Missing rates: Mother | Missing rates: Offspring |
|---|---|---|---|
| 1/1 | (0.10, 0.10, 0.10) | (0.10, 0.10, 0.10) | (0.10, 0.10, 0.10) |
| 2/1 | (0.05, 0.05, 0.10) | (0.05, 0.05, 0.10) | (0.10, 0.10, 0.10) |
| 2/2 | (0.05, 0.075, 0.10) | (0.05, 0.075, 0.10) | (0.10, 0.10, 0.10) |
| 2/3 | (0.05, 0.10, 0.10) | (0.05, 0.10, 0.10) | (0.10, 0.10, 0.10) |
| 2/4 | (0.10, 0.05, 0.05) | (0.10, 0.05, 0.05) | (0.10, 0.10, 0.10) |
| 2/5 | (0.10, 0.075, 0.05) | (0.10, 0.075, 0.05) | (0.10, 0.10, 0.10) |
| 2/6 | (0.10, 0.10, 0.05) | (0.10, 0.10, 0.05) | (0.10, 0.10, 0.10) |
| 3/1 | (0.05, 0.05, 0.10) | (0.05, 0.05, 0.10) | (0.05, 0.05, 0.10) |
| 3/2 | (0.05, 0.075, 0.10) | (0.05, 0.075, 0.10) | (0.05, 0.075, 0.10) |
| 3/3 | (0.05, 0.10, 0.10) | (0.05, 0.10, 0.10) | (0.05, 0.10, 0.10) |
| 3/4 | (0.10, 0.05, 0.05) | (0.10, 0.05, 0.05) | (0.10, 0.05, 0.05) |
| 3/5 | (0.10, 0.075, 0.05) | (0.10, 0.075, 0.05) | (0.10, 0.075, 0.05) |
| 3/6 | (0.10, 0.10, 0.05) | (0.10, 0.10, 0.05) | (0.10, 0.10, 0.05) |
| 4/1 | (0.10, 0.10, 0.10) | (0.10, 0.10, 0.10) | (0.05, 0.05, 0.10) |
| 4/2 | (0.10, 0.10, 0.10) | (0.10, 0.10, 0.10) | (0.05, 0.075, 0.10) |
| 4/3 | (0.10, 0.10, 0.10) | (0.10, 0.10, 0.10) | (0.05, 0.10, 0.10) |
| 4/4 | (0.10, 0.10, 0.10) | (0.10, 0.10, 0.10) | (0.10, 0.05, 0.05) |
| 4/5 | (0.10, 0.10, 0.10) | (0.10, 0.10, 0.10) | (0.10, 0.075, 0.05) |
| 4/6 | (0.10, 0.10, 0.10) | (0.10, 0.10, 0.10) | (0.10, 0.10, 0.05) |

Missing models: (1) Missing Completely at Random (MCAR) for all genotypes, (2) informative missing for parental genotypes and MCAR for offspring genotypes, (3) informative missing for all genotypes, and (4) MCAR for parental genotypes and informative missing for offspring genotypes. Each triple gives the missing rates for the father (f), mother (m), or offspring (o) with (1,1), (1,2), and (2,2) genotypes, respectively.
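The missing models of Table 7 can be illustrated with a short Python sketch that blanks genotypes with genotype-specific probabilities. It is not the simulation code used in the paper; the function names, the dictionary layout, and the seed are assumptions made for illustration.

```python
import random


def apply_missingness(genotypes, rates, rng):
    """Replace each genotype by (0, 0) with its genotype-specific probability."""
    return [(0, 0) if rng.random() < rates[g] else g for g in genotypes]


def is_informative(rates):
    """Informatively missing: at least two of the three rates differ."""
    return len(set(rates.values())) > 1


# Rates for genotypes (1,1), (1,2), (2,2), taken from rows of Table 7.
mcar = {(1, 1): 0.10, (1, 2): 0.10, (2, 2): 0.10}         # MM/MR 1/1
informative = {(1, 1): 0.05, (1, 2): 0.075, (2, 2): 0.10}  # fathers, MM/MR 2/2

rng = random.Random(0)                      # fixed seed, illustrative only
fathers = [(1, 2)] * 50 + [(1, 1)] * 25 + [(2, 2)] * 25
observed = apply_missingness(fathers, informative, rng)

print(is_informative(mcar), is_informative(informative))   # False True
```

Under MCAR all three rates coincide, so missingness carries no information about the genotype. Under the informative setting, a (2,2) father is twice as likely to be missing as a (1,1) father.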

The performance of the methods was assessed by validity and power analyses. The S-TDT, which ignores the genotypes of the parents and compares genotype frequencies of the affected and unaffected offspring [see 14 for the computation of S-TDT], was included to compare our methods with one of the widely used family-based methods. Since S-TDT completely ignores parental genotypes, requires unaffected offspring genotypes from these families, and assumes that affected offspring genotypes are available, none of the missing-mechanism models was applied to it. This means that, for a given setting, the type I error rates for S-TDT are the same whatever the missing-mechanism model is. In the validity and power analysis tables, the TDT ignores missing cases and considers only complete cases; S-TDT ignores parental genotypes and considers only the genotypes of affected and unaffected offspring of all 100 families (whose genotypes are all known); and MITDT-ONE and rTDT use all 100 families after construction of all possible admissible genotypes.

The most positive value of linkage disequilibrium is $\delta_{\max} = \min\{m(1 - p),\, p(1 - m)\}$ when $\delta > 0$, and the most negative value is $\delta_{\min} = -\min\{mp,\, (1 - m)(1 - p)\}$ when $\delta < 0$, where $m$ and $p$ are the frequencies of marker allele 1 and of the disease allele. Since the type I error rate and power results for $\delta < 0$ are equal to those for $\delta > 0$ at correspondingly chosen allele frequencies, we only consider $\delta > 0$. In the presence of positive linkage disequilibrium ($\delta > 0$), the hypothesis examined is no linkage in the validity analysis and complete linkage in the power analysis. The values of $\delta$ were chosen as moderate and maximum.
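The admissible range of the disequilibrium coefficient follows from requiring all four haplotype frequencies to be non-negative. The sketch below assumes the standard two-locus parameterization (frequency of haplotype 1D equal to mp + δ, and so on); the function names and the example values m = 0.3 and p = 0.2 are hypothetical and are not the settings used in the simulations.

```python
def haplotype_freqs(m, p, delta):
    """Frequencies of haplotypes 1D, 1d, 2D, 2d under the assumed model."""
    return {
        "1D": m * p + delta,
        "1d": m * (1 - p) - delta,
        "2D": (1 - m) * p - delta,
        "2d": (1 - m) * (1 - p) + delta,
    }


def delta_bounds(m, p):
    """Most negative and most positive delta keeping all frequencies >= 0."""
    d_max = min(m * (1 - p), p * (1 - m))
    d_min = -min(m * p, (1 - m) * (1 - p))
    return d_min, d_max


m, p = 0.3, 0.2                      # hypothetical allele frequencies
d_min, d_max = delta_bounds(m, p)
freqs_at_max = haplotype_freqs(m, p, d_max)
print(d_min, d_max, freqs_at_max)
```

At the most positive value of δ one of the haplotypes 1d or 2D vanishes, and at the most negative value one of 1D or 2d vanishes, which is why the bounds are minima of products of allele frequencies.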

Validity Analysis

When there is no linkage, the probability that an informative parent transmits marker allele 1 to a particular affected child becomes 0.5 because the linkage-dependent term in (8) is zero. That is, neither the value of the linkage disequilibrium nor the disease model is involved in the validity analysis, which means that type I error rates are the same for every disease model. All testing procedures (TDT, MITDT-ONE, rTDT) except S-TDT were valid tests at the 1% and 5% significance levels (Tables 8 and 9). Since TDT, MITDT-ONE, and rTDT, as opposed to S-TDT, also take information about parental genotypes into account, this information had a positive impact on the sizes of the tests. Since S-TDT had inflated type I errors, we excluded it from the power analysis. Overall, MITDT-ONE outperformed rTDT by providing type I error rates close to the corresponding significance levels. The rTDT was a conservative test; this was the main reason for us to propose a new test that controls type I error rates better. The results in Tables 8 and 9 show that MITDT-ONE achieved this goal. Since MITDT-ONE (like rTDT) does not assume any specific missing model, we suggest that MITDT-ONE be preferred over some widely used family-based testing procedures.
Table 8

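Type I error rates at 1% significance level under the null hypothesis of no linkage.

| GM | MM/MR | S-TDT | TDT | rTDT | MITDT-ONE |
|---|---|---|---|---|---|
| D, A, R | 1/1 | 0.024 | 0.009 | 0 | 0.007 |
| | 2/1 | 0.024 | 0.009 | 0 | 0.007 |
| | 2/2 | | 0.01 | 0 | 0.007 |
| | 2/3 | | 0.009 | 0 | 0.007 |
| | 2/4 | | 0.009 | 0 | 0.006 |
| | 2/5 | | 0.009 | 0 | 0.007 |
| | 2/6 | | 0.009 | 0 | 0.007 |
| | 3/1 | 0.024 | 0.01 | 0 | 0.007 |
| | 3/2 | | 0.011 | 0 | 0.007 |
| | 3/3 | | 0.01 | 0 | 0.007 |
| | 3/4 | | 0.009 | 0 | 0.007 |
| | 3/5 | | 0.01 | 0 | 0.008 |
| | 3/6 | | 0.01 | 0 | 0.009 |
| | 4/1 | 0.024 | 0.01 | 0 | 0.007 |
| | 4/2 | | 0.01 | 0 | 0.007 |
| | 4/3 | | 0.01 | 0 | 0.007 |
| | 4/4 | | 0.01 | 0 | 0.009 |
| | 4/5 | | 0.01 | 0 | 0.009 |
| | 4/6 | | 0.01 | 0 | 0.009 |
| D, A, R | 1/1 | 0.022 | 0.007 | 0 | 0.006 |
| | 2/1 | 0.022 | 0.007 | 0 | 0.004 |
| | 2/2 | | 0.007 | 0 | 0.005 |
| | 2/3 | | 0.007 | 0 | 0.006 |
| | 2/4 | | 0.008 | 0 | 0.004 |
| | 2/5 | | 0.008 | 0 | 0.004 |
| | 2/6 | | 0.008 | 0 | 0.005 |
| | 3/1 | 0.022 | 0.009 | 0 | 0.005 |
| | 3/2 | | 0.009 | 0 | 0.005 |
| | 3/3 | | 0.008 | 0 | 0.006 |
| | 3/4 | | 0.008 | 0 | 0.005 |
| | 3/5 | | 0.009 | 0 | 0.006 |
| | 3/6 | | 0.008 | 0 | 0.007 |
| | 4/1 | 0.022 | 0.008 | 0 | 0.006 |
| | 4/2 | | 0.008 | 0 | 0.006 |
| | 4/3 | | 0.008 | 0 | 0.006 |
| | 4/4 | | 0.008 | 0 | 0.007 |
| | 4/5 | | 0.008 | 0 | 0.007 |
| | 4/6 | | 0.008 | 0 | 0.007 |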
Table 9

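Type I error rates at 5% significance level under the null hypothesis of no linkage.

| GM | MM/MR | S-TDT | TDT | rTDT | MITDT-ONE |
|---|---|---|---|---|---|
| D, A, R | 1/1 | 0.104 | 0.05 | 0 | 0.04 |
| | 2/1 | 0.104 | 0.048 | 0 | 0.036 |
| | 2/2 | | 0.048 | 0 | 0.037 |
| | 2/3 | | 0.05 | 0 | 0.04 |
| | 2/4 | | 0.047 | 0.001 | 0.034 |
| | 2/5 | | 0.047 | 0.001 | 0.034 |
| | 2/6 | | 0.05 | 0.001 | 0.037 |
| | 3/1 | 0.104 | 0.052 | 0.001 | 0.035 |
| | 3/2 | | 0.052 | 0 | 0.036 |
| | 3/3 | | 0.052 | 0 | 0.041 |
| | 3/4 | | 0.048 | 0.003 | 0.036 |
| | 3/5 | | 0.053 | 0.002 | 0.04 |
| | 3/6 | | 0.053 | 0.001 | 0.042 |
| | 4/1 | 0.104 | 0.053 | 0.001 | 0.039 |
| | 4/2 | | 0.052 | 0 | 0.04 |
| | 4/3 | | 0.052 | 0 | 0.041 |
| | 4/4 | | 0.053 | 0.001 | 0.042 |
| | 4/5 | | 0.053 | 0 | 0.043 |
| | 4/6 | | 0.053 | 0 | 0.045 |
| D, A, R | 1/1 | 0.102 | 0.045 | 0 | 0.036 |
| | 2/1 | 0.102 | 0.043 | 0 | 0.031 |
| | 2/2 | | 0.043 | 0 | 0.034 |
| | 2/3 | | 0.045 | 0 | 0.036 |
| | 2/4 | | 0.042 | 0.001 | 0.028 |
| | 2/5 | | 0.042 | 0.001 | 0.03 |
| | 2/6 | | 0.045 | 0.001 | 0.034 |
| | 3/1 | 0.102 | 0.045 | 0 | 0.032 |
| | 3/2 | | 0.046 | 0 | 0.034 |
| | 3/3 | | 0.047 | 0 | 0.037 |
| | 3/4 | | 0.041 | 0.003 | 0.033 |
| | 3/5 | | 0.047 | 0.002 | 0.037 |
| | 3/6 | | 0.045 | 0.001 | 0.038 |
| | 4/1 | 0.102 | 0.046 | 0 | 0.036 |
| | 4/2 | | 0.046 | 0 | 0.037 |
| | 4/3 | | 0.047 | 0 | 0.037 |
| | 4/4 | | 0.045 | 0 | 0.038 |
| | 4/5 | | 0.046 | 0 | 0.039 |
| | 4/6 | | 0.047 | 0 | 0.039 |

D, A, and R represent dominant, additive, and recessive genetic models (GM), respectively.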

Power Analysis

In the power analysis, the alternative hypothesis is that there is complete linkage. Under linkage, the probability of an informative parent transmitting marker allele 1 to a particular affected child becomes greater than or equal to 0.5 because the linkage disequilibrium and the disease-model parameters contribute to its value. This means that information from linkage disequilibrium and from the parameters of the disease model has a positive effect on power. This theoretical fact was also observed in the simulation studies (Tables 10, 11, 12, 13, 14, 15). The pattern of power across disease models, missing rates, missing models, and strengths of linkage disequilibrium was the same at both significance levels (1% and 5%). However, power was higher at the 5% significance level than at the 1% level.
Table 10

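Power values at 1% significance level when the alternative hypothesis is complete linkage.

| LD | GM | MM/MR | S-TDT | TDT | rTDT | MITDT-ONE |
|---|---|---|---|---|---|---|
| moderate | Dominant | 1/1 | 0.291 | 0.769 | 0.072 | 0.829 |
| | | 2/1 | 0.291 | 0.782 | 0.102 | 0.833 |
| | | 2/2 | | 0.785 | 0.098 | 0.835 |
| | | 2/3 | | 0.769 | 0.072 | 0.829 |
| | | 2/4 | | 0.78 | 0.18 | 0.815 |
| | | 2/5 | | 0.782 | 0.175 | 0.819 |
| | | 2/6 | | 0.767 | 0.145 | 0.823 |
| | | 3/1 | 0.291 | 0.764 | 0.136 | 0.815 |
| | | 3/2 | | 0.776 | 0.111 | 0.826 |
| | | 3/3 | | 0.769 | 0.072 | 0.829 |
| | | 3/4 | | 0.795 | 0.449 | 0.83 |
| | | 3/5 | | 0.812 | 0.385 | 0.838 |
| | | 3/6 | | 0.778 | 0.294 | 0.833 |
| | | 4/1 | 0.291 | 0.761 | 0.094 | 0.821 |
| | | 4/2 | | 0.762 | 0.08 | 0.823 |
| | | 4/3 | | 0.769 | 0.072 | 0.829 |
| | | 4/4 | | 0.769 | 0.298 | 0.826 |
| | | 4/5 | | 0.77 | 0.266 | 0.829 |
| | | 4/6 | | 0.777 | 0.244 | 0.835 |
| moderate | Additive | 1/1 | 0.257 | 0.72 | 0.053 | 0.788 |
| | | 2/1 | 0.257 | 0.729 | 0.077 | 0.792 |
| | | 2/2 | | 0.735 | 0.074 | 0.795 |
| | | 2/3 | | 0.72 | 0.053 | 0.788 |
| | | 2/4 | | 0.724 | 0.143 | 0.772 |
| | | 2/5 | | 0.731 | 0.139 | 0.774 |
| | | 2/6 | | 0.716 | 0.113 | 0.781 |
| | | 3/1 | 0.257 | 0.714 | 0.105 | 0.77 |
| | | 3/2 | | 0.726 | 0.083 | 0.786 |
| | | 3/3 | | 0.72 | 0.053 | 0.788 |
| | | 3/4 | | 0.751 | 0.383 | 0.788 |
| | | 3/5 | | 0.767 | 0.327 | 0.795 |
| | | 3/6 | | 0.731 | 0.248 | 0.793 |
| | | 4/1 | 0.257 | 0.713 | 0.068 | 0.778 |
| | | 4/2 | | 0.715 | 0.059 | 0.781 |
| | | 4/3 | | 0.72 | 0.053 | 0.788 |
| | | 4/4 | | 0.721 | 0.253 | 0.785 |
| | | 4/5 | | 0.724 | 0.218 | 0.788 |
| | | 4/6 | | 0.729 | 0.197 | 0.795 |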
Table 11

Power analysis continues.

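| LD | GM | MM/MR | S-TDT | TDT | rTDT | MITDT-ONE |
|---|---|---|---|---|---|---|
| moderate | Recessive | 1/1 | 0.231 | 0.68 | 0.042 | 0.756 |
| | | 2/1 | 0.231 | 0.688 | 0.064 | 0.755 |
| | | 2/2 | | 0.693 | 0.061 | 0.761 |
| | | 2/3 | | 0.68 | 0.042 | 0.756 |
| | | 2/4 | | 0.686 | 0.119 | 0.734 |
| | | 2/5 | | 0.69 | 0.117 | 0.737 |
| | | 2/6 | | 0.676 | 0.097 | 0.746 |
| | | 3/1 | 0.231 | 0.671 | 0.089 | 0.731 |
| | | 3/2 | | 0.682 | 0.068 | 0.748 |
| | | 3/3 | | 0.68 | 0.042 | 0.756 |
| | | 3/4 | | 0.714 | 0.342 | 0.753 |
| | | 3/5 | | 0.728 | 0.287 | 0.76 |
| | | 3/6 | | 0.693 | 0.207 | 0.761 |
| | | 4/1 | 0.231 | 0.673 | 0.055 | 0.744 |
| | | 4/2 | | 0.676 | 0.048 | 0.747 |
| | | 4/3 | | 0.68 | 0.042 | 0.756 |
| | | 4/4 | | 0.683 | 0.218 | 0.753 |
| | | 4/5 | | 0.686 | 0.186 | 0.756 |
| | | 4/6 | | 0.69 | 0.167 | 0.764 |
| maximum | Dominant | 1/1 | 0.021 | 0.009 | 0 | 0.007 |
| | | 2/1 | 0.021 | 0.009 | 0 | 0.006 |
| | | 2/2 | | 0.009 | 0 | 0.006 |
| | | 2/3 | | 0.009 | 0 | 0.007 |
| | | 2/4 | | 0.009 | 0 | 0.005 |
| | | 2/5 | | 0.009 | 0 | 0.005 |
| | | 2/6 | | 0.009 | 0 | 0.007 |
| | | 3/1 | 0.021 | 0.01 | 0 | 0.007 |
| | | 3/2 | | 0.01 | 0 | 0.007 |
| | | 3/3 | | 0.009 | 0 | 0.007 |
| | | 3/4 | | 0.009 | 0 | 0.007 |
| | | 3/5 | | 0.01 | 0 | 0.007 |
| | | 3/6 | | 0.009 | 0 | 0.007 |
| | | 4/1 | 0.021 | 0.009 | 0 | 0.007 |
| | | 4/2 | | 0.009 | 0 | 0.007 |
| | | 4/3 | | 0.009 | 0 | 0.007 |
| | | 4/4 | | 0.01 | 0 | 0.008 |
| | | 4/5 | | 0.009 | 0 | 0.008 |
| | | 4/6 | | 0.009 | 0 | 0.008 |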
Table 12

Power analysis continues.

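| LD | GM | MM/MR | S-TDT | TDT | rTDT | MITDT-ONE |
|---|---|---|---|---|---|---|
| maximum | Additive | 1/1 | 0.015 | 0.012 | 0 | 0.016 |
| | | 2/1 | 0.015 | 0.009 | 0 | 0.014 |
| | | 2/2 | | 0.01 | 0 | 0.014 |
| | | 2/3 | | 0.012 | 0 | 0.016 |
| | | 2/4 | | 0.01 | 0 | 0.011 |
| | | 2/5 | | 0.01 | 0 | 0.011 |
| | | 2/6 | | 0.012 | 0 | 0.014 |
| | | 3/1 | 0.015 | 0.009 | 0 | 0.012 |
| | | 3/2 | | 0.01 | 0 | 0.013 |
| | | 3/3 | | 0.012 | 0 | 0.016 |
| | | 3/4 | | 0.011 | 0.001 | 0.014 |
| | | 3/5 | | 0.013 | 0 | 0.015 |
| | | 3/6 | | 0.013 | 0 | 0.016 |
| | | 4/1 | 0.015 | 0.012 | 0 | 0.015 |
| | | 4/2 | | 0.012 | 0 | 0.015 |
| | | 4/3 | | 0.012 | 0 | 0.016 |
| | | 4/4 | | 0.012 | 0 | 0.017 |
| | | 4/5 | | 0.012 | 0 | 0.018 |
| | | 4/6 | | 0.013 | 0 | 0.018 |
| maximum | Recessive | 1/1 | 0.014 | 0.014 | 0 | 0.021 |
| | | 2/1 | 0.014 | 0.012 | 0 | 0.017 |
| | | 2/2 | | 0.013 | 0 | 0.019 |
| | | 2/3 | | 0.014 | 0 | 0.021 |
| | | 2/4 | | 0.013 | 0 | 0.015 |
| | | 2/5 | | 0.013 | 0 | 0.015 |
| | | 2/6 | | 0.013 | 0 | 0.018 |
| | | 3/1 | 0.014 | 0.012 | 0 | 0.014 |
| | | 3/2 | | 0.013 | 0 | 0.017 |
| | | 3/3 | | 0.014 | 0 | 0.021 |
| | | 3/4 | | 0.013 | 0.001 | 0.017 |
| | | 3/5 | | 0.016 | 0.001 | 0.019 |
| | | 3/6 | | 0.015 | 0 | 0.021 |
| | | 4/1 | 0.014 | 0.012 | 0 | 0.02 |
| | | 4/2 | | 0.012 | 0 | 0.02 |
| | | 4/3 | | 0.014 | 0 | 0.021 |
| | | 4/4 | | 0.014 | 0 | 0.022 |
| | | 4/5 | | 0.014 | 0 | 0.023 |
| | | 4/6 | | 0.015 | 0 | 0.023 |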
Table 13

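Power values at 5% significance level when the alternative hypothesis is complete linkage.

| LD | GM | MM/MR | S-TDT | TDT | rTDT | MITDT-ONE |
|---|---|---|---|---|---|---|
| moderate | Dominant | 1/1 | 0.571 | 0.911 | 0.249 | 0.941 |
| | | 2/1 | 0.571 | 0.919 | 0.295 | 0.942 |
| | | 2/2 | | 0.919 | 0.287 | 0.943 |
| | | 2/3 | | 0.911 | 0.249 | 0.941 |
| | | 2/4 | | 0.919 | 0.484 | 0.938 |
| | | 2/5 | | 0.92 | 0.469 | 0.938 |
| | | 2/6 | | 0.913 | 0.402 | 0.937 |
| | | 3/1 | 0.571 | 0.912 | 0.366 | 0.935 |
| | | 3/2 | | 0.916 | 0.314 | 0.939 |
| | | 3/3 | | 0.911 | 0.249 | 0.941 |
| | | 3/4 | | 0.928 | 0.734 | 0.941 |
| | | 3/5 | | 0.933 | 0.67 | 0.944 |
| | | 3/6 | | 0.916 | 0.553 | 0.941 |
| | | 4/1 | 0.571 | 0.909 | 0.309 | 0.936 |
| | | 4/2 | | 0.911 | 0.274 | 0.937 |
| | | 4/3 | | 0.911 | 0.249 | 0.941 |
| | | 4/4 | | 0.911 | 0.57 | 0.939 |
| | | 4/5 | | 0.913 | 0.535 | 0.94 |
| | | 4/6 | | 0.914 | 0.507 | 0.944 |
| moderate | Additive | 1/1 | 0.524 | 0.885 | 0.198 | 0.92 |
| | | 2/1 | 0.524 | 0.894 | 0.243 | 0.92 |
| | | 2/2 | | 0.893 | 0.234 | 0.922 |
| | | 2/3 | | 0.885 | 0.198 | 0.92 |
| | | 2/4 | | 0.894 | 0.421 | 0.914 |
| | | 2/5 | | 0.894 | 0.407 | 0.916 |
| | | 2/6 | | 0.885 | 0.343 | 0.914 |
| | | 3/1 | 0.524 | 0.883 | 0.313 | 0.91 |
| | | 3/2 | | 0.89 | 0.262 | 0.916 |
| | | 3/3 | | 0.885 | 0.198 | 0.92 |
| | | 3/4 | | 0.904 | 0.681 | 0.918 |
| | | 3/5 | | 0.909 | 0.608 | 0.924 |
| | | 3/6 | | 0.89 | 0.489 | 0.918 |
| | | 4/1 | 0.524 | 0.881 | 0.254 | 0.914 |
| | | 4/2 | | 0.883 | 0.221 | 0.916 |
| | | 4/3 | | 0.885 | 0.198 | 0.92 |
| | | 4/4 | | 0.885 | 0.504 | 0.917 |
| | | 4/5 | | 0.886 | 0.468 | 0.919 |
| | | 4/6 | | 0.888 | 0.44 | 0.923 |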
Table 14

Power analysis continues.

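| LD | GM | MM/MR | S-TDT | TDT | rTDT | MITDT-ONE |
|---|---|---|---|---|---|---|
| moderate | Recessive | 1/1 | 0.498 | 0.862 | 0.168 | 0.905 |
| | | 2/1 | 0.498 | 0.873 | 0.205 | 0.907 |
| | | 2/2 | | 0.873 | 0.198 | 0.909 |
| | | 2/3 | | 0.862 | 0.168 | 0.905 |
| | | 2/4 | | 0.873 | 0.378 | 0.899 |
| | | 2/5 | | 0.872 | 0.365 | 0.9 |
| | | 2/6 | | 0.863 | 0.302 | 0.898 |
| | | 3/1 | 0.498 | 0.86 | 0.272 | 0.895 |
| | | 3/2 | | 0.87 | 0.221 | 0.901 |
| | | 3/3 | | 0.862 | 0.168 | 0.905 |
| | | 3/4 | | 0.885 | 0.633 | 0.902 |
| | | 3/5 | | 0.891 | 0.562 | 0.908 |
| | | 3/6 | | 0.869 | 0.448 | 0.904 |
| | | 4/1 | 0.498 | 0.858 | 0.22 | 0.899 |
| | | 4/2 | | 0.86 | 0.19 | 0.901 |
| | | 4/3 | | 0.862 | 0.168 | 0.905 |
| | | 4/4 | | 0.863 | 0.459 | 0.902 |
| | | 4/5 | | 0.865 | 0.425 | 0.904 |
| | | 4/6 | | 0.866 | 0.395 | 0.909 |
| maximum | Dominant | 1/1 | 0.104 | 0.04 | 0 | 0.038 |
| | | 2/1 | 0.104 | 0.043 | 0 | 0.034 |
| | | 2/2 | | 0.042 | 0 | 0.034 |
| | | 2/3 | | 0.04 | 0 | 0.038 |
| | | 2/4 | | 0.042 | 0 | 0.032 |
| | | 2/5 | | 0.042 | 0 | 0.03 |
| | | 2/6 | | 0.04 | 0 | 0.037 |
| | | 3/1 | 0.104 | 0.045 | 0 | 0.032 |
| | | 3/2 | | 0.046 | 0 | 0.033 |
| | | 3/3 | | 0.042 | 0 | 0.039 |
| | | 3/4 | | 0.041 | 0.004 | 0.033 |
| | | 3/5 | | 0.044 | 0.002 | 0.037 |
| | | 3/6 | | 0.042 | 0.001 | 0.039 |
| | | 4/1 | 0.104 | 0.042 | 0 | 0.039 |
| | | 4/2 | | 0.043 | 0 | 0.039 |
| | | 4/3 | | 0.042 | 0 | 0.039 |
| | | 4/4 | | 0.042 | 0.001 | 0.04 |
| | | 4/5 | | 0.043 | 0.001 | 0.039 |
| | | 4/6 | | 0.042 | 0 | 0.041 |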
Table 15

Power analysis continues.

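| LD | GM | MM/MR | S-TDT | TDT | rTDT | MITDT-ONE |
|---|---|---|---|---|---|---|
| maximum | Additive | 1/1 | 0.08 | 0.055 | 0 | 0.07 |
| | | 2/1 | 0.08 | 0.054 | 0 | 0.061 |
| | | 2/2 | | 0.056 | 0 | 0.065 |
| | | 2/3 | | 0.055 | 0 | 0.07 |
| | | 2/4 | | 0.053 | 0.002 | 0.055 |
| | | 2/5 | | 0.055 | 0.002 | 0.058 |
| | | 2/6 | | 0.055 | 0.001 | 0.062 |
| | | 3/1 | 0.08 | 0.052 | 0 | 0.055 |
| | | 3/2 | | 0.056 | 0 | 0.06 |
| | | 3/3 | | 0.055 | 0 | 0.07 |
| | | 3/4 | | 0.056 | 0.007 | 0.059 |
| | | 3/5 | | 0.065 | 0.005 | 0.069 |
| | | 3/6 | | 0.059 | 0.003 | 0.068 |
| | | 4/1 | 0.08 | 0.055 | 0 | 0.065 |
| | | 4/2 | | 0.055 | 0 | 0.066 |
| | | 4/3 | | 0.055 | 0 | 0.07 |
| | | 4/4 | | 0.057 | 0.003 | 0.07 |
| | | 4/5 | | 0.058 | 0.003 | 0.071 |
| | | 4/6 | | 0.058 | 0.002 | 0.075 |
| maximum | Recessive | 1/1 | 0.077 | 0.066 | 0 | 0.084 |
| | | 2/1 | 0.077 | 0.064 | 0 | 0.078 |
| | | 2/2 | | 0.066 | 0 | 0.081 |
| | | 2/3 | | 0.066 | 0 | 0.084 |
| | | 2/4 | | 0.063 | 0.002 | 0.07 |
| | | 2/5 | | 0.065 | 0.002 | 0.072 |
| | | 2/6 | | 0.065 | 0.001 | 0.075 |
| | | 3/1 | 0.077 | 0.061 | 0.001 | 0.068 |
| | | 3/2 | | 0.065 | 0 | 0.075 |
| | | 3/3 | | 0.066 | 0 | 0.083 |
| | | 3/4 | | 0.067 | 0.008 | 0.074 |
| | | 3/5 | | 0.078 | 0.006 | 0.083 |
| | | 3/6 | | 0.07 | 0.004 | 0.083 |
| | | 4/1 | 0.077 | 0.064 | 0.001 | 0.077 |
| | | 4/2 | | 0.065 | 0 | 0.079 |
| | | 4/3 | | 0.066 | 0 | 0.083 |
| | | 4/4 | | 0.067 | 0.004 | 0.084 |
| | | 4/5 | | 0.068 | 0.003 | 0.086 |
| | | 4/6 | | 0.07 | 0.002 | 0.09 |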
When the linkage disequilibrium was at its moderate level, dominant models had the highest power, followed by additive and recessive models. At the 1% (5%) significance level, the power of MITDT-ONE ranged between 0.73 (0.89) and 0.84 (0.94), whereas the power of rTDT ranged between 0.042 (0.17) and 0.45 (0.73). When linkage disequilibrium was at its maximum, all testing procedures lacked power because the transmission probability in (8) was close to 0.5 (this value was exactly 0.5 in the validity analysis). In that setting, recessive models had the highest power, followed by additive and dominant models, which is the reverse of the ordering observed at the moderate level. Overall, MITDT-ONE provided the highest power at every significance level.

Real Data: U.K. Warren Family

We illustrate the robustness of the MITDT-ONE for type 1 diabetes at the insulin-dependent diabetes mellitus 2 locus (IDDM2) on chromosome 11p15. At our request, Neil Walker of the Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory (JDRF/WT DIL) compiled data for 52 SNPs from 475 U.K. Warren families with two affected offspring. This data set was analyzed in [17] to demonstrate the MI-TDT method for two affected offspring. The author of [21] carried out extensive logistic regression studies on the same data set, and identified −23 HphI, +1,140A/C, +1428 FokI, and VNTR as significant SNPs. The same SNPs as in [21], and six more, were also identified in [17]. We considered the same U.K. Warren families but kept only the first affected child from each family, so that each family has one affected offspring, to demonstrate the performance of MITDT-ONE and rTDT.

For the MITDT-ONE, we need to know the frequencies of marker allele 1 ($m$) and of the disease allele ($p$) for each SNP. The values of $m$ were provided to us along with the data set for all SNPs except two (VNTR (DIL967) and TH micro' Z (DIL950)), but the values of $p$ were not. McGinnis (1998) [18] showed that TDT was able to detect linkage, and its power exceeded 0.5, only when the linkage disequilibrium was close to its most positive value and the allele frequencies $m$ and $p$ were similar in magnitude at the marker and disease locus. Therefore, we chose optimal values for $p$ by assuming $p = m$. The percentage of missing genotypes ranged from low (4% for DIL977) to high (52% for DIL997). Table 16 reports the 18 of 52 SNPs that are significant at the 5% significance level for complete genotypes. Since we tested 52 SNPs, we applied the Bonferroni multiple testing procedure at the 0.05% significance level (99.95% confidence level), and identified seven significant SNPs (underlined p-values). Since the percentage of missing genotypes ranged from low to high, one should be cautious in declaring SNPs significant when missing genotypes are ignored.

Since DIL950 was insignificant for complete data, we dropped it from the computation of MITDT-ONE and rTDT. DIL967 was significant for complete data but its marker allele frequency was not provided to us. Since we did not have any knowledge about the value of $m$ for this SNP, and did not want to assign any preferential value, we considered equal frequencies for marker alleles 1 and 2.
Table 16

Type I Diabetes (IDDM): The significant SNPs for complete data.

SNP       Variant   % missing   TDT statistic   p-value
DIL997    C/T       52          4               0.0455003
DIL996    C/T       28          4.2631579       0.0389475
DIL989    C/T       26          4.7407407       0.0294564
DIL985    C/T       42          6.1084337       0.0134538
DIL984    G/A       22          3.8571429       0.0495346
DIL977    G/A       4           17.386831
DIL976    T/G       36          10.940828       0.0009407
DIL975    C/T       30          11.571429       0.0006697
DIL974    A/C       30          16.568966
DIL973    T/C       16          14.069767
DIL971    G/C       20          10.971429
DIL969    A/T       6           23.027397
DIL967    VNTR      6           21.300341
DIL965    T/C       20          14.901478
DIL963    A/C       22          10.414286       0.0012504
DIL954    C/T       36          6.2857143       0.0121715
DIL3872   C/G       18          7.4745763       0.0062576
DIL2048   C/T       12          3.7815126       0.0518218

The third, fourth, and fifth columns show the percentages of missing data, the TDT statistics for complete data, and the uncorrected p-values at the 5% significance level. The significant SNPs are shown by underlined p-values for Bonferroni at the 0.05% significance level.
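The relation between the TDT statistics and the uncorrected p-values in Table 16 can be sketched as follows, assuming the statistic is referred to a chi-square distribution with one degree of freedom. The function name and the small selection of rows are illustrative choices, not part of the original analysis; the two thresholds are the 5% and 0.05% levels quoted in the text.

```python
import math


def chi2_1df_p_value(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# A few (SNP, TDT statistic) pairs copied from Table 16.
table16 = {
    "DIL997": 4.0,
    "DIL976": 10.940828,
    "DIL975": 11.571429,
    "DIL963": 10.414286,
}

ALPHA = 0.05               # uncorrected level used in the text
BONFERRONI_LEVEL = 0.0005  # the 0.05% level quoted in the text

for snp, stat in table16.items():
    p = chi2_1df_p_value(stat)
    print(f"{snp}: p = {p:.7f}, "
          f"significant at 5%: {p < ALPHA}, "
          f"significant at 0.05%: {p < BONFERRONI_LEVEL}")
```

Under this assumption, the statistic of 4 for DIL997 gives a p-value of about 0.0455, in line with the tabulated value.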

The MITDT-ONE and rTDT can verify whether the SNPs that are significant for complete data remain significant when missing genotypes are taken into account. However, if either method fails to reach the significant result obtained in the complete case, it does not mean that these SNPs are insignificant. It simply means that the method reaches an inconclusive decision. Moreover, the number of significant SNPs could be smaller when either test is employed, compared to the number of significant SNPs for complete data. Out of 18 significant SNPs in complete cases, MITDT-ONE (rTDT) verified seven (three) to be significant (Table 17). Both the MITDT-ONE and the rTDT found −23 HphI, +1,428 FokI, and VNTR to be significant SNPs, as in [21] and [17]. Furthermore, MITDT-ONE identified four more of the SNPs reported in [17] as significant; hence, we suggest that researchers investigate these SNPs as possible causal variants.
Table 17

The Significant SNPs at 5% significance level when the MITDT-ONE is applied.

SNP       Variant   Name          dbSNP       Statistic    Statistic      Type I error   Type I error
                                              (complete)   (incomplete)   (complete)     (incomplete)
DIL977*   G/A       +1,428 FokI   rs3842756   17.39        16.82          0.0000305      0.0000412
DIL973    T/C       +1,127 PstI   rs3842752   14.07        8.00           0.0001762      0.0046696
DIL971    G/C       +805 DraIII   rs3842748   10.98        6.13           0.0009253      0.0133311
DIL969*   A/T       −23 HphI      rs689       23.03        21.44          15.97×         36.48×10−5
DIL967*   VNTR      VNTR          -           21.30        18.27          3.93×          0.0000192
DIL965    T/C       −2,221 MspI   rs3842729   14.90        7.97           0.0001133      0.0047659
DIL963    A/C       −2,733A/C     rs3842727   10.41        4.68           0.0012504      0.0306104

The third and fourth columns show the name of the SNP defined in Barratt et al. (2004) and the SNP database, respectively. The fifth and sixth columns show the statistics for complete and incomplete data when MITDT-ONE is applied, respectively. The seventh and eighth columns show the type I errors of the fifth and sixth columns, respectively.

* SNPs also found in association by using rTDT.


Discussion

Sebastiani et al. [16] proposed the rTDT to handle missing genotypes of parents or offspring in a nuclear family with one affected offspring. However, the rTDT is a conservative test and lacks power. Hence, we proposed the MITDT-ONE to correct these problems. The MITDT-ONE takes the population frequencies of the marker allele and the disease allele into account in the rTDT method. With these and values, we restrict the domain of rTDT to have much better estimates for the maximum values of and . The decision against the null hypothesis of no linkage is based on the minimum values of the interval estimates of MITDT-ONE and rTDT. One advantage of MITDT-ONE is that a significant result obtained from complete data is ratified when the minimum value of the interval estimate is smaller than the value of the TDT for complete data. Another advantage is that the method can be applied at any missing rate. As discussed in the introduction, many studies deal with missing genotypes in parents but not in offspring. Moreover, these methods assume some missing mechanism (e.g., MAR) to recover parental genotypes. Thus, another strength of MITDT-ONE is that it does not assume any missing model but simply uses the Mendelian Inheritance property to define all admissible genotypes in parents or offspring. Also, MITDT-ONE and rTDT become the classical TDT when . In the construction of MITDT-ONE, we consider cases where all genotypes of family members are missing (Case 1). Intuitively, since these families carry no information, they should be excluded from the study. We suggest that these families be omitted from the data if only one SNP is studied. However, if more than one SNP is studied, we suggest keeping them in the computation of MITDT-ONE so that each SNP has the same number of families.

In summary, simulation studies show that MITDT-ONE controls type I error rates very well and produces high power when the degree of linkage disequilibrium is mild.

More than one offspring: The rTDT for two affected offspring was proposed in [17]. However, it was a conservative test and had low power. Hence, Alpargu [17] proposed the MI-TDT to remedy these problems. Motivated by Alpargu [17], we proposed the MITDT-ONE. Both MITDT-ONE and MI-TDT correct the problems arising from rTDT. Theoretically, it is possible to extend our method to families with three or more affected offspring. However, the computation will be tedious because the number of missing cases increases as the number of affected offspring increases. Moreover, in linkage studies it is very rare to have more than two affected offspring.

Multiple alleles: We proposed the MITDT-ONE for bi-allelic cases. However, it is possible to extend it to multi-allelic cases. We consider two approaches that have been used in practice [22], [23]. In the first approach, all alleles except the allele of interest are grouped as allele 2, and the MITDT-ONE for the bi-allelic case is applied [22]. In the second approach, if we have alleles, then for each allele, the first approach is applied to obtain MITDT-ONE statistics, and the largest MITDT-ONE is chosen as the test statistic [23] to make a decision about a significant gene.
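The two multi-allelic approaches can be sketched as below. This is only an illustration under stated assumptions: the classical TDT statistic in its McNemar form, (b − c)²/(b + c), is used as a stand-in for the MITDT-ONE statistic, all genotypes are taken to be complete, and the function names and the transmission counts are hypothetical.

```python
def tdt_statistic(b: int, c: int) -> float:
    """Classical bi-allelic TDT, used here as a stand-in for MITDT-ONE."""
    if b + c == 0:
        return 0.0
    return (b - c) ** 2 / (b + c)


def collapsed_tdt(transmissions, allele):
    """First approach: group every allele other than `allele` as 'allele 2'."""
    b = c = 0
    for transmitted, untransmitted in transmissions:
        if transmitted == allele and untransmitted != allele:
            b += 1  # allele of interest transmitted
        elif untransmitted == allele and transmitted != allele:
            c += 1  # allele of interest not transmitted
        # Parents that look homozygous after grouping are uninformative.
    return tdt_statistic(b, c)


def max_collapsed_tdt(transmissions):
    """Second approach: apply the first approach per allele, keep the largest."""
    alleles = {a for pair in transmissions for a in pair}
    stats = {a: collapsed_tdt(transmissions, a) for a in alleles}
    best = max(stats, key=stats.get)
    return best, stats[best]


# Hypothetical (transmitted, untransmitted) allele pairs from heterozygous parents.
transmissions = (
    [("A", "B")] * 12 + [("B", "A")] * 4
    + [("A", "C")] * 6 + [("C", "A")] * 2
    + [("B", "C")] * 5 + [("C", "B")] * 5
)

print(max_collapsed_tdt(transmissions))
```

With these made-up counts, allele A is transmitted 18 times and not transmitted 6 times after grouping, so it yields the largest statistic.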
  20 in total

1.  Comparison of tests for association and linkage in incomplete families.

Authors:  A C Cervino; A V Hill
Journal:  Am J Hum Genet       Date:  2000-06-06       Impact factor: 11.025

2.  Allowing for missing parents in genetic studies of case-parent triads.

Authors:  C R Weinberg
Journal:  Am J Hum Genet       Date:  1999-04       Impact factor: 11.025

3.  A note on power approximations for the transmission/disequilibrium test.

Authors:  M Knapp
Journal:  Am J Hum Genet       Date:  1999-04       Impact factor: 11.025

4.  The transmission/disequilibrium test and parental-genotype reconstruction: the reconstruction-combined transmission/ disequilibrium test.

Authors:  M Knapp
Journal:  Am J Hum Genet       Date:  1999-03       Impact factor: 11.025

5.  Robust transmission/disequilibrium test for incomplete family genotypes.

Authors:  Paola Sebastiani; Maria M Abad; Gülhan Alpargu; Marco F Ramoni
Journal:  Genetics       Date:  2004-12       Impact factor: 4.562

6.  A comparative study of sibship tests of linkage and/or association.

Authors:  S A Monks; N L Kaplan; B S Weir
Journal:  Am J Hum Genet       Date:  1998-11       Impact factor: 11.025

7.  Hidden linkage: a comparison of the affected sib pair (ASP) test and transmission/disequilibrium test (TDT).

Authors:  R E McGinnis
Journal:  Ann Hum Genet       Date:  1998-03       Impact factor: 1.670

8.  A sibship test for linkage in the presence of association: the sib transmission/disequilibrium test.

Authors:  R S Spielman; W J Ewens
Journal:  Am J Hum Genet       Date:  1998-02       Impact factor: 11.025

9.  General score tests for associations of genetic markers with disease using cases and their parents.

Authors:  D J Schaid
Journal:  Genet Epidemiol       Date:  1996       Impact factor: 2.135

10.  A discordant-sibship test for disequilibrium and linkage: no need for parental data.

Authors:  S Horvath; N M Laird
Journal:  Am J Hum Genet       Date:  1998-12       Impact factor: 11.025
