Marko Toplak, Tomaz Curk, Janez Demsar, Blaz Zupan.
Abstract
BACKGROUND: Computational methods that infer single nucleotide polymorphism (SNP) interactions from phenotype data may uncover new biological mechanisms in non-Mendelian diseases. In practice, however, such analyses face many problems. Current experimental studies typically use SNP arrays with hundreds of thousands of SNPs but record only hundreds of samples, so candidate SNP pairs inferred by interaction analysis may include a high proportion of false positives. Recently, Gayan et al. (2008) proposed reducing the number of false positives by combining the results of interaction analyses performed on subsets of the data (replication groups), rather than analyzing the entire data set directly. If it performs as hypothesized, replication groups scoring could improve not only interaction analysis but also any type of feature ranking and selection procedure in systems biology. Because Gayan et al. did not compare their approach to standard interaction analysis techniques, we investigate here whether replication groups indeed reduce the number of reported false positive interactions.
Year: 2010 PMID: 20092660 PMCID: PMC2823693 DOI: 10.1186/1471-2164-11-58
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Disease penetrance models. Penetrance models used to simulate epistasis between two SNPs. Allele frequencies are denoted by p and q. For example, model 1 specifies that 10% of individuals with genotypes AABb, AaBB, Aabb or aaBb, and none of the individuals with other genotypes, have the disease.
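To make the penetrance idea concrete, the following is a minimal sketch of how case status could be simulated under model 1. Only the penetrance table itself comes from the caption; everything else is an assumption for illustration: Hardy-Weinberg equilibrium at each locus, independence of the two loci, p and q taken as the frequencies of the uppercase alleles A and B, and the hypothetical helper names `penetrance`, `sample_genotype`, `expected_prevalence` and `simulate`. This is not necessarily the authors' simulation procedure.

```python
import random

# Penetrance table for model 1, as stated in the Figure 1 caption:
# genotypes AABb, AaBB, Aabb and aaBb carry a 10% disease risk; all others 0%.
MODEL_1 = {("AA", "Bb"): 0.1, ("Aa", "BB"): 0.1, ("Aa", "bb"): 0.1, ("aa", "Bb"): 0.1}


def penetrance(g1, g2, model=MODEL_1):
    """Probability of disease for the two-SNP genotype (g1, g2)."""
    return model.get((g1, g2), 0.0)


def sample_genotype(freq, major, minor, rng):
    """Draw two alleles independently (Hardy-Weinberg assumption).

    freq is assumed to be the frequency of the uppercase allele.
    """
    alleles = [major if rng.random() < freq else minor for _ in range(2)]
    # sorted() puts uppercase before lowercase, giving e.g. "AA", "Aa" or "aa"
    return "".join(sorted(alleles))


def expected_prevalence(p, q, model=MODEL_1):
    """Disease prevalence implied by the model under Hardy-Weinberg proportions."""
    def hw(f, major, minor):
        return {major * 2: f * f, major + minor: 2 * f * (1 - f), minor * 2: (1 - f) ** 2}
    return sum(pa * pb * penetrance(ga, gb, model)
               for ga, pa in hw(p, "A", "a").items()
               for gb, pb in hw(q, "B", "b").items())


def simulate(n, p, q, seed=0):
    """Simulate n individuals as (genotype1, genotype2, affected) triples."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        g1 = sample_genotype(p, "A", "a", rng)
        g2 = sample_genotype(q, "B", "b", rng)
        data.append((g1, g2, rng.random() < penetrance(g1, g2)))
    return data
```

With p = q = 0.5, each of the four risk genotypes has probability 0.125, so `expected_prevalence(0.5, 0.5)` returns approximately 0.05. Note that under this model neither SNP shows a marginal effect on its own at these frequencies, which is what makes the pair detectable only by interaction analysis.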
Figure 2. Replication groups scoring. Replication groups scoring involves three steps: (1) data partitioning, (2) assessment of the score $s_i(X, Y)$ for a given SNP pair $(X, Y)$ on a replication group $S_i$, and (3) computation of the final score $\min_{0 \le i < k} s_i(X, Y)$, where $k$ is the number of replication groups.
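The three steps in the caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the per-group score is a hypothetical stand-in (a chi-square statistic between the joint genotype of the pair and case status), samples are assumed to be `(genotype_tuple, affected)` pairs, and the groups are drawn by simple random partitioning. The partition is computed once and reused for every SNP pair so that pair scores stay comparable. Taking the minimum is conservative only because a larger chi-square value means stronger evidence; with a p-value score the analogous choice would be the maximum.

```python
import random
from collections import Counter


def chi2_score(samples, x, y):
    """Chi-square statistic between the joint genotype of SNPs x, y and case status.

    Stand-in interaction score (assumption); samples are (genotype_tuple, affected).
    """
    n = len(samples)
    n_cases = sum(1 for _, case in samples if case)
    counts = Counter((g[x], g[y], case) for g, case in samples)
    cells = {(g[x], g[y]) for g, _ in samples}
    stat = 0.0
    for gx, gy in cells:
        row = counts[(gx, gy, True)] + counts[(gx, gy, False)]
        for case, col in ((True, n_cases), (False, n - n_cases)):
            expected = row * col / n
            if expected > 0:
                stat += (counts[(gx, gy, case)] - expected) ** 2 / expected
    return stat


def partition(samples, k, seed=0):
    """Step 1: split the samples at random into k disjoint replication groups."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    return [[samples[t] for t in idx[g::k]] for g in range(k)]


def replication_score(groups, x, y, score=chi2_score):
    """Steps 2 and 3: score the pair on every group and keep the minimum."""
    return min(score(group, x, y) for group in groups)
```

Usage: `groups = partition(data, 3)` followed by `replication_score(groups, 0, 1)` scores the pair of SNPs 0 and 1 with three replication groups; `partition(data, 1)` reduces the procedure to direct scoring on the whole data set.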
Figure 3. Performance graphs. Number of false positives as a function of the number of selected top-ranked candidate interactions. Direct scoring (solid curves) is compared to scoring with two (dashed curves) or three (dotted curves) replication groups (RG). Curves closer to the lower-right corner of the graph indicate better performance. Both axes use a logarithmic scale to emphasize the results for small numbers of top candidates. The theoretically best and worst possible performance curves are shown in light gray.
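A curve of the kind shown in Figure 3 can be computed from a ranked candidate list and the set of truly interacting pairs, which is known in simulated data. The sketch below is an assumption about how such curves may be derived, not the authors' code; the helper names `rank_pairs`, `false_positive_curve` and `bound_curves` are hypothetical. The best case places all true pairs first, and the worst case places all false pairs first.

```python
def rank_pairs(scores):
    """Order candidate SNP pairs from the highest to the lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def false_positive_curve(ranked_pairs, true_pairs):
    """fp[m - 1] = number of non-interacting pairs among the top m candidates."""
    truth = {frozenset(p) for p in true_pairs}  # pair order does not matter
    fp, count = [], 0
    for pair in ranked_pairs:
        if frozenset(pair) not in truth:
            count += 1
        fp.append(count)
    return fp


def bound_curves(n_candidates, n_true):
    """Best case: all true pairs ranked first. Worst case: all false pairs first."""
    ms = range(1, n_candidates + 1)
    best = [max(0, m - n_true) for m in ms]
    worst = [min(m, n_candidates - n_true) for m in ms]
    return best, worst
```

For example, ranking `[(0, 1), (2, 3), (4, 5), (6, 7)]` against true pairs `[(1, 0), (4, 5)]` gives the curve `[0, 1, 1, 2]`, which lies between the best `[0, 0, 1, 2]` and worst `[1, 2, 2, 2]` bounds. On logarithmic axes, zero counts cannot be drawn and are typically omitted.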