Literature DB >> 19812770

A unified discussion on the concept of score functions used in the context of nonparametric linkage analysis.

Abstract

In this article we try to discuss nonparametric linkage (NPL) score functions within a broad and quite general framework. The main focus of the paper is the structure, derivation principles and interpretations of the score function entity itself. We define and discuss several families of one-locus score function definitions, i.e. the implicit, explicit and optimal ones. Some generalizations and comments to the two-locus, unconditional and conditional, cases are included as well. Although this article mainly aims at serving as an overview, where the concept of score functions are put into a covering context, we generalize the noncentrality parameter (NCP) optimal score functions in Angquist et al. (2007) to facilitate--through weighting--for incorporation of several plausible distinct genetic models. Since the genetic model itself most oftenly is to some extent unknown this facilitates weaker prior assumptions with respect to plausible true disease models without loosing the property of NCP-optimality.Moreover, we discuss general assumptions and properties of score functions in the above sense. For instance, the concept of identical by descent (IBD) sharing structures and score function equivalence are discussed in some detail.

Entities: Chemical Disease Gene Species

Keywords: IBD-sharing structures; NCP-optimality; allele sharing; equivalence of score functionse; families of score function definitions; genetic disease models; genetic models; inheritance vectors; nonparametric linkage analysis; score functions

Year: 2008 PMID： 19812770 PMCID： PMC2735957

Source DB: PubMed Journal: Bioinform Biol Insights ISSN： 1177-9322

Introduction

In linkage analysis (Ott, 1999) or, in a wider sense, gene mapping (Haines and Pericak-Vance, 2006; Siegmund and Yakir, 2007) one searches for disease loci along genetic regions of interest; in other words, through what we refer to as a genome. This is done by observing so called genotypes and phenotypes of a pedigree set, i.e. a set of multigenerational families, throughout the genome. The rationale for doing this is that, at a disease locus, the genotypes and phenotypes should generally show correlation of some strength on the individual level within the pedigree, where the actual strength depends on the structure of disease, i.e. the so called genetic model. Observed present correlations, measured through some kind of test statistic, suggests localizations of loci corresponding to underlying disease genes or, at least, it narrows down the interesting genome regions to neighbourhoods of the findings. The amount of trust put into such loci actually being disease-related are generally evaluated, in a standard sense, through statistical significance calculations; preferably corrected for the multiple testing throughout the genome. An example of a small pedigree set is given in Figure 1. To further reduce the size of a plausible region for an interesting disease finding, i.e. to use a fine-mapping technique, one may, for instance, use methods from the toolbox of association analysis (see Balding, 2006).

Figure 1

A pedigree set example consisting of 5 distinct pedigrees of different structures and phenotype settings.

Basic notation and concepts

In practise the genotypes are observed as well-defined allelic types at polymorphic marker loci located along the genome of interest. Vaguely speaking, a marker locus might be seen as an, in some sense, observable short chromosomal segment and it is polymorphic if several types of genetic observations are possible, in the underlying population, with respect to this segment. Hence polymorphic markers correspond to genetic variation in the population.

Example 1 (Alleles and genotypes)

Consider a situation where we have a polymorphic locus with respect to three distinct possible allelic types-outcomes A, B and C within the population. Hence, at this locus, a specific individual will have any of the six consistent unordered genotypes; AA, AB, AC, BB, BC and CC, with certain probabilities jointly summing to one. For more information on, for instance, alleles, genotypes and genetic markers cf. Strachan and Read (2003). As a restriction or application, in nonparametric linkage (NPL) analysis (Whittemore and Halpern, 1994; Kruglyak et al. 1996; Ängquist, 2007) one searches for genetic linkage between disease and marker locus by observing and analyzing marker genotype data, without explicitly assuming a known genetic disease model. As noted above the linkage analysis approach may somewhat vaguely be described as analyzing the amount of dependence or correlation between genotypes and phenotypes among the observable individuals in the data set at hand, and hence in the nonparametric case one does not incorporate information on any disease loci in the standard analysis or search. Most oftenly such data is then taken to be representative of a homogeneous underlying population. Note that generally the phenotypes are assumed to be qualitative in the working form of indicators of disease status. In this context the prime quantity of central importance to the actual statistical analysis-procedure is the process of inheritance of alleles. Each individual inherits two alleles, i.e. a genotype, at each chromosomal locus; one from the father and one from the mother. The inherited alleles themselves originates from either the corresponding grandfather or the grandmother and this leads to the following statement: For a single pedigree, at locus x, the inheritance process may be totally described by the binary—zero-one—inheritance vector (Donnelly, 1983), where p and m correspond to the ith nonfounder’s paternal and maternal allele respectively, i.e. each value is connected to one of the m = 2(n − f) specific meioses.1 Note that, for instance, one may in practise let 0 and 1 correspond to inheriting grandpaternal and grandmaternal alleles respectively.

Example 2 (Founders and nonfounders)

Consider the pedigrees in Figure 2. In both cases the parents constitutes the set of founders, whereas the siblings are the nonfounders.

Figure 2

The pedigree structures corresponding to affected sib-pair (ASP) and affected sib-trio (AST) pedigrees.

In the same manner as (1), but somewhat more obscure, one may summarize inheritance through IBD-sharing structures, where IBD means identical-by-descent. Two alleles are IBD if they are both ancestrally inherited from the same unique founder allele2 with respect to the corresponding pedigree. Basically, forming IBD-sharing structures means grouping the elements of the set of all the 2 possible inheritance vectors, , according to some pedigree-relational symmetry rules, into distinct IBD-groups. Such symmetry rules are, at least in principle, to some extent subjective. A commonly accepted example is that inheritance vectors will fall into the same group if they correspond to, i.e. one gets the same inheritance structure, permuting the inheritance of two siblings (with corresponding offsprings). Most oftenly, this is accepted even if the siblings being of distinct sexes. For more information, see Example 3 and Appendix A. To numerically facilitate analysis of inheritance and phenotype-genotype dependence one may introduce a score function. Expressed in general terms this is just a function S giving a (numerical) score S(v) to each possible inheritance vector v ∈ , i.e. it serves as a representation of the quantification of phenotype-genotype correlation.3 Normally one searches for inheritance-wise deviations in the form of increased allele-sharing among affecteds,4 since this indicates presence of genetic linkage between the marker and disease loci. As a consequence one aims at giving inheritance vectors consistent with such increased sharing high scores. On the other hand vectors being non-consistent, in this sense, are then given low scores.

Assumption 1

We assume that score functions are invariant within IBD-sharing structures. Explicitly, this implies that each inheritance vector v corresponding to a specific structure A produces the same output (score), i.e. where is the equivalence class including all inheritance vectors corresponding to structure A. Hence considering a pedigree with m corresponding meioses leads to 2 possible scores, assuming some order of inheritance vectors.5 In this setting some scores will according to symmetry, and in some cases by—explicitly or implicitly—definition, be numerically equal. Using the context of IBD-sharing structures one may reformulate (2) as where the index corresponds to the by-score-ordered set of IBD-sharing structures, i.e. a natural restriction (order) is given by assuming s1 < s2 < … < s. Quite naturally one may note that n ≤ 2−.

Remark 1

In fact one may instantly note that n ≤ 2, where f is the number of founders, which follows from what is generally referred to as ‘founder couple reduction’ (; ). This is an inheritance symmetry property originating from the uncertainty of founder phases (ambiguity of inheritance vector interpretation).

Example 3 (Founder couple reduction)

For an affected sib-pair (ASP), see 6 the ordering (phases) of the founder alleles is unknown we, in these cases, do not really know which of the following ordered founder-genotypes that is truth-valid: The implication of this is that all inheritance vectors related through transformations between these founder-genotypes are inheritance-wise evidentially equivalent. (Hence giving rise to equivalent IBD-sharing structures; see Appendix A.) For further information on equivalent IBD-sharing structures consider Appendix A.

Aims and scope

Our primary goal with this paper is to, in such a generally accessible way as possible, formalize and discuss the structure of nonparametric linkage score functions. Oftenly, in published works, these functions are either directly applied using some of the standard instances or derived in an ad hoc or highly theoretical, or non-intuitive, fashion. Having this in mind, the text to follow is not a complete summary of suggested and published score function variants, or the most theoretical exposition out there. Rather, it aims at being a review-like overview discussing the underlying structure, contexts of derivations and interpretations (and to some extent performance) of certain families of NPL score functions. In Section 2 three distinct such families—the implicit, the explicit and the optimality-based one—are introduced and discussed, whereas Section 3 gives a new generalization of an existing optimality-based function. A small simulation study with respect to five distinct score functions of various types is performed in Section 4. The two appendices, Appendix A and Appendix B, discusses equivalence-properties with respect to structure and standardization of score functions respectively.

Approaches to Score Function Definitions

For an underlying disease to be genetically inheritable, i.e. to include a genetic component, some kind of correlation between the phenotype and the disease genotypes must exist. This is usually described by means of a genetic model λ. One may note that λ usually, at least to some extent, is unknown so, if needed, it is estimated prior to analysis using so called segregation analysis (Khoury et al. 1993; Haines and Pericak-Vance, 2006). The complete, possibly multilocus, genetic model may be summarized as, where p is the set of disease allele frequencies, f is the set of penetrance values, describing the link between phenotypes and disease genotypes, and l defines the disease loci positions. Now, to define a score function one basically has to instantiate the numerical scores corresponding to (2) or (3). This may be done in several distinct ways, which is furtherly discussed below. What truly is the core question with respect to such definitions is the evidential performance of the corresponding score function. (Most likely in the form of statistical power calculations.) One may note that the relative performance of different score functions depends on the underlying genetic model λ and the combined present pedigree-structure of the pedigree set. A score function performing well under a wide range of different λ ∈ Λ, where Λ is the set of all possible disease models, is termed a robust score function. The best score function with respect to a criterion C and disease model λ is called an optimal score function Sopt = S (v|λ).

Implicitly defined versions

Vaguely speaking, as noted above, at a true disease locus, the IBD-sharing within phenotypes should be expected to increase. This makes it possible to define functions, depending on pedigree IBD-sharing only, meeting this requirement (property). Since such functions implicitly instantiate (2) and (3) through the higher-level sharing-based function definition we call them implicitly defined score functions. Next, we will note on two distinct such definitions.

Traditional score functions

Firstly, Spairs (Weeks and Lange, 1988) is based on IBD-sharing among all pairs of affected individuals in the pedigree, where i < j, is the set of affecteds in the pedigree7 and IBD(x, y) is the number of alleles shared IBD between individuals x and y. Secondly, Sall (Whittemore and Halpern, 1994) is based on the simultaneous IBD-sharing among all the affecteds in the pedigree, where | | is the number of affecteds, ℍ is a set containing the elements corresponding to all ways of selecting one allele from each affected, 2f is the number of founder alleles in the pedigree and b (h) is the number of times the ith founder allele is present in selection h ∈ ℍ.8

Example 4 (Score function Sall)

For an ASP ( ∈ ℍ): where for instance, treating A as the 1st founder allele, Π=14 b (h1) =2!0!0!0!=2.

Remark 2

Both (5) and (6) give high (low) scores to excess (low) IBD-sharing. The difference lies in that the latter one, relatively seen, upweight increased sharing of specific founder alleles within large groups of individuals, thus reflecting a higher degree of belief in such inheritance evidence.

Extended score functions

Both functions (5) and (6) are defined, given the inheritance vector v, with respect to the set of affecteds only, which might be notationally pointed out as S(v) = S(v | ). Henceforth we refer to such score functions as traditional score functions. In fact a vast majority of the most commonly used functions are of this kind. In Ängquist (2006) several extensions to traditional score functions are given. Now, assume a traditional instance S and let S′ denote a corresponding extended version. A first-level extension is to combine information from both phenotype groups (affecteds as well as unaffecteds) through This aims at additionally searching for unusual IBD-sharing within the set of unaffecteds . Note that S(v | ) in practise means, given inheritance vector v, applying the traditional score function S to the same pedigree set, in the standard way using the same function-definition, but with phenotypes interchanged between affecteds and unaffecteds.9

Example 5 (Extended score functions; phenotype-switching)

Consider the pedigree consisting of two parents (unknown phenotypes) and four siblings (A, B, C and D) in | ) this is done with respect to Siblings A and D. After the phenotype-switching process displayed in | ) is calculated using Siblings B and C. Note that the actual score function algorithm, for instance underlying (5) or (6), is the same in both cases. A second-order extension may be formulated as where ℙ denotes the set of individuals with unknown phenotype. Here one additionally corrects for the overall sharing within the pedigree, i.e. it compares the IBD-sharing (through the traditional function S) within phenotype-groups to what is jointly given on the pedigree-level.

Remark 3

An intuitive critiscism to extensions as (7) and (8) might be that unaffecteds is not to the same extent as affecteds a secure (final) phenotype, since in time such individuals might turn into affecteds.10 However, this could be solved by letting ambiguous cases, according to some criterion, being labelled as having affection status unknown, i.e. as ℙ-individuals. Further, given a well-defined probability model for (possible) affection time one may weight the analysis (scores) with respect to this model.

Remark 4

Another objection against usage of unaffecteds in this way may be raised if a disease is not purely caused by gene mutations, but rather through a combination of genetic and environmental factors. In this case the unaffecteds in the pedigree are not obviously good representatives of the normal population in terms of genetic composition. It is then logically possible that such unaffecteds still share a common genetic background with the affected relatives, but lack certain environmental factors; or lack some trigger events in their health history. Hence, their role in gene mapping is in this case of secondary importance.

Explicitly defined versions

It is perfectly possible not to use a closed definition or high-level algorithm when calculating the vector of scores constituting the corresponding score function. We refer to such cases as explicitly defined score functions. The construction of an explicit score function reduces to (explicitly) distributing scores to all present IBD-sharing structures, thus reflecting numerically the assumed connection between these sharing structures and evidence for a present disease locus. For instance, such an approach might be interesting if one can show by some real examples, or a priori assume, that certain combination of inheritance vector states are impossible or unlikely.

Example 6 (Explicit ASP-definition)

Once more, consider an ASP. Here three IBD-sharing structures (with scores s1, s2 and s3) are possible corresponding to the sib-pair sharing 0, 1 and 2 alleles IBD respectively. Arbitrarily fixing s1 and s3 with s1 < s3 the closure of an explicit definition is reflected by the choice of s2 with the restriction of s1 ≤ s2 ≤ s3; see Section 2.4 and Appendix B. Explicit definitions, so to speak, implicitly make some (though quite vague) assumptions on the type of underlying disease structure. In this sense they are more strongly directed towards certain disease models than implicit definitions, but much less so than the family of definitions described below in Section 2.3. There explicit assumptions on true (plausible) genetic disease models λ under corresponding alternative hypotheses H1 are made.

Optimality defined versions

If having an explicit algorithm (as for implicitly defined versions) but where this algorithm is formulated with respect to, in some sense, an optimality criterion C, we say that we deal with C-optimal score functions. Given a disease model λ, define the expected score at the disease locus under this model as where P(w|λ) is the inheritance distribution under disease model λ. The expected value in (9) is referred to as the noncentrality parameter (NCP). It is showed in Ängquist et al. (2007) (based on results given in Hössjer, 2005) that optimal score functions with respect to (maximization of) NCPs may be expressed as with m equaling the number of meioses. This approach might be interpreted as basing the scores on the difference between inheritance vector-probabilities under the null and alternative hypothesis in all cases.11 The rationale for being interested in NCPs are that this concept is closely linked, but not equivalent, to statistical power (Feingold et al. 1993). Hence one may note that the optimal score function (10) depends on the true genetic model and should be interpreted as, in this sense, the best possible result that the investigator might expect when the genetic model is correctly specified. In practice though, the genetic model is often unknown. Then in a natural way, for each choice of score function and for a range of different genetic models, (10) facilitates comparisons with optimality, leading to a quantification of the apparent loss of information. The optimal score function might also serve as a form of explicit score function with respect to certain assumptions or prior information. Further, in Hössjer (2003) locally most powerful tests are outlined using specific parametric models (in the form of exponential expansions) for the inheritance distribution under alternatives. Consider also the discussions in Whittemore (1996), Kong and Cox (1997), Nicolae (1999) and McPeek (1999).

Equivalent score functions

As a way of enhancing interpretation one usually uses standardized versions of the score functions. Standardization is performed through where, for a pedigree with m meioses, are the mean and variance of S prior to standardization; under the null hypothesis H0 of no linkage and where summation is over all the 2 distinct elements w ∈ .12

Remark 5

Note that S on the right-hand side in (11) is referred to as an ‘unstandardized’ score function, whereas S on the left-hand side is a ‘standardized’ score function. Equipped with the concept of standardization one may define equivalent (unstandardized) score functions. In order to define this concept in a clear and straighforward manner we need the following additional assumption.

Assumption 2

We assume that there is a general agreement on the order of the IBD-sharing structures, i.e. that s∀i) in (3) correspond to the same structure regardless of which score function you choose. If two unstandardized score functions through standardization are transformed to equal13 standardized score functions they are referred to as being equivalent. For more detailed information and corresponding equivalence-criterions, see Appendix B.

Example 7 (Equivalence of Spairs and Sall for ASPs)

For an ASP the score functions S . Adopting the approach in (2) these scores correspond to, in turn, 4, 8 and 4 distinct inheritance vectors related to the ASP sharing 0, 1 and 2 alleles IBD respectively. (Here we have m = 4 meioses and 2 = 16 unique inheritance vectors.) Alternatively, one may use (3) where only contain these three scores (structures), which are then attained with probabilities 0.25, 0.50 and 0.25 respectively under H0. One may also note that actual numerical standardized scores corresponding to a specific score function (or several equivalent ones) are dependent on the score distribution P(s|H0) under the null hypothesis H0, which is given by the actual pedigree structure and phenotype setting.14

Real studies and data

Note that throughout this article we try to discuss score functions without explicitly mentioning the actual test statistics they are used in connection with when facing real and imperfect marker data MD.15 An exception is the use of standardization through (11) which implicitly refer to the practise of the ‘NPL score’ test statistic (Kruglyak et al. 1996; Ängquist, 2007). where the expected value, at locus x, is taken over P [v(x)|MD] which is the inheritance distribution given the observed marker data.16 Given imperfect data the variance of the NPL score V(Z) ≤ V(S), hence if decreasing leading to conservative procedures assuming V(Z) = V(S) = 1. In order to increase the actual variance in data, hence reducing the conservativeness, one usually bases real studies on so called multipoint analysis, where all inheritance information from the surrounding chromosome is used when calculating the inheritance distribution at a locus. Here the calculations are preferably performed using Hidden Markov Models (HMMs) through the Lander-Green-Kruglyak algorithm; see Lander and Green (1987), Kruglyak et al. (1995) and the expositional review in Ziegler and Koenig (2006).17 Actually, the complete marker data assumption seems fairly realistic when all pedigree members are genotyped with a density of SNP markers of at least, say, 0.1 cM. Replacing σ2 in (11) with V(Z) at each loci leads to the interpretation of the standardized score as a common statistical score function based on the derivative of a corresponding likelihood function (see Kong and Cox, 1997).18 However, note that although the choice of test statistic and possible standardization procedure are important from a testing and statistical significance perspective it is not particularily essential for the present discussion. Moreover, generally the interpretations and relative performances of the different score function variants will not change when dealing with imperfect data, hence this matter is only noted on in this specific subsection.

Two-locus score functions

One may generalize the one-locus procedure above in order to simultaneously, or sequentially, search for two distinct disease loci on the genome. The former case is referred to as an unconditional analysis, whereas the latter case is a conditional analysis performed conditioning on some kind of genetic information at one, or several, conditioning loci. One may generally use the same basic score function definitions in both cases, taking into account that the standardizations will differ. Implicitly defined score functions may in some cases be relatively easily generalized to the two-locus case, but in some cases the corresponding score-algorithm will be refrainingly more complex. As a positive example, one may generalize (5) into a two-locus score function. In Ängquist et al. (2007) the following, quite general, formulation is given where IBD (w) equals IBD(a, a) in (5) with respect to inheritance vector w, related to the ith (disease or marker) loci, and {a, a} ∈ . For k > 1 (13) may be thought of as trying to capture epistatic joint pairwise IBD-sharing within a pedigree. The case k = 1 of (13) corresponds to the additive score function used in Strauch et al. (2000), which these authors also implemented into the analysis program GENEHUNTER-TWOLOCUS. In the applications of Ängquist et al. (2007) the case k = 2 is used, which shows close to NCP-optimal performance for the one-parameter genetic disease model families used in their simulations.

Example 8 (ASP score matrix)

For ASPs one might summarize a two-locus score function completely using a 3 × 3 score matrix.19 Letting where IBDk = l means that the ASP shares l alleles IBD at the kth (marker or disease) locus, leads to the general score matrix Several instances and substructures of (14) are given, implemented and discussed in . Two-locus explicitly defined score functions are concept-wise straightforward generalizations of one-locus ones. Moreover, the NCP-optimal score function (10) of Ängquist et al. (2007), for unconditional and conditional two-locus analysis respectively, may be generalized to Note that the interpretation of these scores as being proportional to probability-based differences with respect to the null and (assumed) alternative hypotheses still hold true.

Generalization to the Optimality-Based Definition Approach

In some cases where the true disease model λ is fully or partially unknown the usage of the NCP-optimal score function (10) based on an estimate (or assumption) λ̂ may be considered to lack robustness and applicability. In order to reduce unnecessary usage-avoidance we will next try to further generalize this approach, hence increasing its practical usefulness. More explicitly, our suggested method is adapted to include prior, assumed or intuitive, information on λ in a more direct sense than what is available through using explicit versions, but still avoiding the assumption of a single plausible genetic disease model.

Algorithm

Begin with choosing d distinct genetic disease models with inheritance distributions under corresponding alternatives Now, a simple generalization to the previous score in (10) is given by where d in the denominator in principle is unnecessary (according to the standardization) but makes comparisons between (10) and (16) possible in a natural way. A further generalization arises if adopting a Bayesian perspective with respect to the prior distribution of possible disease models.20 Fix d and let π = (π1, π2,…,π), with ∑=1 π =1, be the vector of prior probabilities corresponding to the d distinct disease models. This leads to (16) being generalized into One may note that (16) is the special case of (17) where π = (1/d, 1/d,…,1/d) and that (10) correspond to d = 1 and hence π = π1 = 1 for a single disease model λ1. Finally, observe that the NCP-optimality property (Ängquist et al. 2007) is kept if (in a somewhat abstract sense) π, given the present knowledge-base, is the true probability distribution with respect to the present genetic disease model-ambiguity.

A Small Simulation Study

For illustrational purposes we include a small-scale simulation analyses in this section. We perform power calculations for various settings and present them through ROC-curves, i.e. as plots with significance levels versus power with respect to a set of underlying score thresholds (Selin, 1965; Bradley, 1996). The results are given, and graphically displayed, in Figures 4–6.

Figure 4

Power calculations for Pedigree 1 and score functions S1-S5. Presented as ROC-curves with significance levels α(T) vs. powers β(T) for score thresholds T. (Logarithmic X/Y-scales.) Upper and lower panel uses penetrance vectors f = (0.02, 0.2, 0.8) and f = (0.02, 0.8, 0.8) respectively.

Figure 6

Power calculations for Pedigree 3. See caption of Figure 4.

Simulation set-up

Consider a pedigree consisting of two parents of unknown phenotypes and M siblings. For instance, Pedigree 3 in Figure 1 is such a pedigree with M = 5. We construct three homogeneous pedigree sets, i.e. a set consisting of pedigrees with similar structure and phenotype setting only, based on three distinct such pedigrees with M = 6: (i) Pedigree 1 consisting of 4 affected and 2 unaffected siblings. (ii) Pedigree 2 consisting of 3 affected and 3 unaffected siblings. (iii) Pedigree 3 consisting of 2 affected and 4 unaffected siblings. The number of pedigrees in each pedigree set is put to N = 15. Further, for each case we use a genome consisting of a single chromosome of length G = 4 Morgans, J = 2000 simulations and score thresholds ranging from T = 3 to T = 10. The analyses are made with respect to five distinct score functions: The S1 = Spairs function in (5), the S2 = Sall function in (6), the extended version, S3, using (7) for Spairs, the extended version, S4, using (7) for Sall, the NCP-optimal score function S5 in (10). All score functions are standardized through (11) and calculations are performed using the NPL score approach (12). Finally, we used two genetic models, λ1 and λ2, where both correspond to disease allele frequency p = 0.01, but with distinct penetrance vectors, respectively. Here f denotes the probability for an individual, having a disease genotype consisting of i disease alleles and 2 − i normal alleles, of being affected.

Results and discussion

It is quite hard to draw very certain conclusions from such a small study, once more note that this section is in some sense a side-track, but a few general observations of some interest may be stated: (i) S2 performs better than S1 for Pedigree 1, whereas the opposite is true for Pedigree 2–3 under λ2. In other words their relative performance is affected by the pedigree structure as noted above. (ii) The extended versions S3 and S4 often outperforms the traditional (nonextended) versions S1 and S2. These extensions seem somewhat more favourable for Pedigree 3 than for Pedigree 1, which seems reasonable since the latter pedigree has a structure more directed towards unaffected individuals. They also seem more advantageous under λ2 than for λ1, which might be explained by the latter model having more IBD-sharing discrimination power within the subgroup of unaffecteds; according to a higher disease penetrance for disease heterozygotes. (iii) The NCP-optimal score function S5 is performance-wise much better under λ2. Probably mainly follows from similar reasoning as given in the last sentence under (ii).

14 in total

1. Parametric and nonparametric multipoint linkage analysis with imprinting and two-locus-trait models: application to mite sensitization.

Authors: K Strauch; R Fimmers; T Kurz; K A Deichmann; T F Wienker; M P Baur
Journal: Am J Hum Genet Date: 2000-05-04 Impact factor: 11.025