Rany M Salem, Jennifer Wessel, Nicholas J Schork.
Abstract
Interest in the assignment and frequency analysis of haplotypes in samples of unrelated individuals has increased greatly as a result of the emphasis placed on haplotype analyses by, for example, the International HapMap Project and related initiatives. Although many computer programs for haplotype analysis are available for samples of unrelated individuals, many of them have limitations and/or very specific uses. This paper summarises the key features of available haplotype analysis software for unrelated individuals, as well as for pooled DNA samples from unrelated individuals. Programs were identified through keyword searches of PubMed and various internet search engines, a review of citations from retrieved papers, and personal communications, up to June 2004. Priority was given to functioning computer programs rather than theoretical models and methods. The software was considered in light of a number of factors: the algorithm(s) used, algorithm accuracy, assumptions, accommodation of genotyping error, implementation of hypothesis testing, handling of missing data, software characteristics and web-based implementations. Review papers comparing specific methods and programs are also summarised. Forty-six haplotyping programs were identified and reviewed. They were divided into two groups: those designed for individual genotype data (43 programs) and those designed for pooled DNA samples (three programs). The accuracy of the programs is assessed using various criteria, and the programs are categorised and discussed in light of: algorithm and method, accuracy, assumptions, genotyping error, hypothesis testing, missing data, software characteristics and web implementation.
Many available programs have limitations (eg some cannot accommodate missing data) and/or are designed with specific tasks in mind (eg estimating haplotype frequencies rather than assigning the most likely haplotypes to individuals). It is concluded that the selection of an appropriate haplotyping program should be guided by what is known about the accuracy of estimation, as well as by the limitations and assumptions built into each program.
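The distinction drawn above between estimating haplotype frequencies (HF) and assigning the most likely haplotype pair to each individual (HA) can be made concrete with a small expectation-maximisation (EM) sketch, the approach used by many of the programs reviewed here. This is a minimal Python illustration under stated assumptions, not the code of any listed program: biallelic SNPs coded 0/1/2 (count of the "1" allele), no missing data, Hardy-Weinberg equilibrium, and exhaustive enumeration of compatible haplotype pairs (feasible only for a handful of loci). It uses random restarts because EM can stop at a local rather than global maximum. The function names (`compatible_pairs`, `em_once`, `estimate_haplotypes`) are hypothetical.

```python
import itertools
import math
import random


def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple with the count (0, 1 or 2) of the '1' allele at each SNP.
    Haplotypes are tuples of 0/1 alleles. Missing data are not handled.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    pairs = set()
    for bits in itertools.product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def pair_prob(freq, a, b):
    """Probability of the unordered pair {a, b} under Hardy-Weinberg equilibrium."""
    return freq[a] * freq[b] * (1.0 if a == b else 2.0)


def em_once(pairs, haps, rng, max_iter=500, tol=1e-10):
    """One EM run from a random starting point."""
    n = len(pairs)
    w = [rng.random() + 0.01 for _ in haps]
    total = sum(w)
    freq = {h: x / total for h, x in zip(haps, w)}
    for _ in range(max_iter):
        counts = dict.fromkeys(haps, 0.0)
        for ps in pairs:  # E-step: expected haplotype counts per individual
            probs = [pair_prob(freq, a, b) for a, b in ps]
            s = sum(probs)
            for (a, b), p in zip(ps, probs):
                counts[a] += p / s
                counts[b] += p / s
        new = {h: c / (2 * n) for h, c in counts.items()}  # M-step
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    return freq


def log_likelihood(freq, pairs):
    return sum(math.log(sum(pair_prob(freq, a, b) for a, b in ps)) for ps in pairs)


def estimate_haplotypes(genotypes, restarts=10, seed=0):
    """Return (HF estimates, log-likelihood, HA assignment per individual)."""
    rng = random.Random(seed)
    pairs = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for ps in pairs for pair in ps for h in pair})
    best, best_ll = None, -math.inf
    for _ in range(restarts):  # keep the best of several EM runs
        freq = em_once(pairs, haps, rng)
        ll = log_likelihood(freq, pairs)
        if ll > best_ll:
            best, best_ll = freq, ll
    # HA: the most probable compatible pair for each individual
    assignments = [max(ps, key=lambda p: pair_prob(best, *p)) for ps in pairs]
    return best, best_ll, assignments
```

For example, with four (0, 0) homozygotes, two (2, 2) homozygotes and two double heterozygotes (1, 1), the unambiguous individuals carry only the 00 and 11 haplotypes, so EM drives the 01 and 10 frequencies towards zero, estimates the 00 and 11 frequencies at about 0.625 and 0.375, and resolves each double heterozygote as 00/11.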
Year: 2005 PMID: 15814067 PMCID: PMC3525117 DOI: 10.1186/1479-7364-2-1-39
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Description of unrelated haplotyping programs, divided into four classes based on method.
| Program name | Algorithm | Output^a | Missing data^b | Assumptions^c | Key features | Limitations | Max. subjects, loci and type | Platform | Ref.^d |
|---|---|---|---|---|---|---|---|---|---|
| HAPAR | Parsimony | HA | No | None | Overcomes limitations of HAPINFERX; Increasing sample size improves | May be susceptible to HWE departures | Practical limit, biallelic | PC/UNIX | [ |
| HAPINFERX | Clark's | HA | No | None | Intuitive method, fast; Reduced number; No limit on | May fail to start; Sensitive to; Unstable and | Practical limit | UNIX | [ |
| BPPH | IP | HA | No | IP | Similar to HAPH; Speed | User interface | Practical limit, biallelic | MAC | [ |
| DPPH | PP | HA | No | PP | Handles large datasets; Speed | Theoretical; Strict population assumptions | Practical limit, biallelic | MAC | [ |
| GPPH | PP | HA | No | PP | Handles large datasets; Speed | Theoretical; Strict population assumptions | Practical limit, biallelic | MAC/PC/UNIX | [ |
| HAPH | IP | HA/HF | Yes | HWE, IP | Predicts haplotype blocks; Constructs; Identifies block; Web-based | No probability for haplotype assignments | Max 500 loci, practical limit, biallelic | Web-based | [ |
| Arlequin v2.0 | EM | HA/HF | No | HWE | Includes numerous population genetic analysis tools | EM issues | EM practical limits, biallelic/multiallelic | JRE on MAC/PC/UNIX | [ |
| CHAPLIN | ECM | HF | Yes | HWE | Graphical interface; Association tests; HWE assumption relaxed in case | ECM algorithm needs to be compared with standard EM methods | Practical limits, biallelic/multiallelic | PC | [ |
| EH | EM | HF | No | HWE | Estimates haplotype frequency; Compares case-control HF | EM issues; Must specify mode | No max, 3-4 practical max, biallelic/multiallelic | PC | [ |
| EHPLUS | EM | HF | No | HWE | Improves EH; Incorporates | Long run times | Max 5 loci | PC/UNIX | [ |
| EM-DeCODER | EM | HA/HF | No | HWE | Program with | EM issues | Max 15 loci | UNIX | [ |
| FASTEHPLUS | EM | HF | No | HWE | Similar to EHPLUS | EM issues | Max 5 loci | PC/UNIX | [ |
| GENECOUNTING | EM | HA/HF | Yes | HWE | Provides posterior; Compares global | Missing data; EM issues | 10-15 loci | PC/UNIX | [ |
| GCHAP | EM | HA/HF | Yes | HWE | Haplotypes with; Similar to SNPHAP | EM issues | 20 loci practical | JRE on | [ |
| GS-EM | EM | HA/HF | Yes | HWE | Includes algorithm; Haplotypes constructed using assigned genotype probabilities; Web-based | EM issues; Limited to biallelic SNPs | Practical limit | Web-based | [ |
| HAPZ | EM | HA/HF | Yes | HWE | Modified version of | EM issues | Practical limit, biallelic/multiallelic | PC/UNIX | [ |
| HAPMAX | MLE | HF | No | HWE | Ease of use; Interface | Accommodates a limited number | 8 loci, biallelic | PC | [ |
| HAPLOH | EM | HF | Yes | HWE | Handles some missing data; Utilises pedigree data, if available; Calculates standard | EM issues | 10 loci, 40 alleles max, biallelic/multiallelic | UNIX | [ |
| HAPLOSCOPE | EM/MCMC | † | † | † | Platform program, incorporates SNPHAP and PHASE v1.0; Facilitates; Graphical interface, identifies tagging SNPs and LD | See individual programs for limitations/features | † | UNIX/Windows | [ |
| HAPLOVIEW | EM+PL | HA/HF | Yes | HWE | Calculates pairwise LD; Checks for; Identifies tagging; Accepts pedigree | EM issues | 100s, practical limit, biallelic | JRE on MAC/PC/UNIX | [ |
| HAPLO.STATS | EM | HA/HF | Yes | HWE | Incorporates; Separate: (1) assign, (2) allow linear, (3) calculates score | Requires; EM issues | Practical | S-PLUS | [ |
| HIT | EM/MCMC/ | † | † | † | Platform program; Facilitates; Graphical interface | See individual | † | * | [ |
| HPLUS | EM+EE+PL | HA/HF | Yes | HWE | Provides posterior; Compares; Utilises pedigree | Requires MATLAB; EM issues | 100 loci | MATLAB | [ |
| LDSUPPORT | EM | HA/HF | Yes | HWE | Provides posterior; Identifies LD; Examines association with disease | EM issues | * | UNIX | [ |
| LOGINSERM | EM | HA/HF | Yes | HWE | Program uses ML; Offers option to | EM issues | Practical | PC/ | [ |
| MLHAPFRE | EM | HF | Yes | HWE | Performance improves with presence of LD; Performs well with | Incorporated into Arlequin; EM issues | 16 loci, biallelic | JRE on MAC/PC/UNIX | [ |
| MLOCUS | EM | HA/HF | Yes | HWE | Provides posterior probabilities for assigned haplotypes; Notes observed vs.; Calculates pairwise LD | EM issues | 11 loci, biallelic/multiallelic | PC | [ |
| OSLEM | EM | Yes | No | HWE | Modified EM algorithm that runs 2× faster | EM issues | Practical limit, biallelic | Web-based | [ |
| PL-EM | EM+PL | HA/HF | Yes | HWE | Combines PL with EM; EM-based version; Calculates variance | EM issues | 100s, practical limit, biallelic | PC/UNIX | [ |
| SAS Genetics | EM | HA/HF | Yes | HWE | Provides posterior probabilities for assigned haplotypes; Incorporates | Requires SAS; EM issues | Practical limit, biallelic/multiallelic | SAS on PC/UNIX | [ |
| SNPEM | EM | HF | No | HWE | Estimates; Compares global | EM issues | 10 loci, biallelic | UNIX | [ |
| SNPHAP | EM | HA/HF | Yes | HWE | Uses posterior and; Provides posterior | EM issues | Practical limit | UNIX | [ |
| THESIAS | S-EM | HF | Yes | HWE | Stochastic EM; Includes tests for; Accommodates | S-EM algorithm | Practical limit | PC/UNIX | [ |
| WHAP | EM | † | † | † | Uses haplotype; Allows weighted | EM issues; Requires | † | PC/UNIX | [ |
| Zaykin | EM | HF | No | HWE | Program on; Subjects with | EM issues | Practical | PC/UNIX | [ |
| Zou and Zhao | MLE/EM | HF | Yes | HWE | Adjust haplotype; Program also | Assumes genotyping errors are; Assumes error | Practical limits | * | [ |
| 3locus.PAS | EM | HF | Yes | HWE | Handles some; Various tests; Improves with | EM issues | 3 loci, biallelic/ | PC/UNIX | [ |
| HAPLOTYPER | MC+PL | HA/HF | Yes | HWE | Uses PL algorithm; Provides posterior | Long run times; Posterior | 256 max | UNIX | [ |
| HAPLOREC | MC-VL | HA/HF | Yes | HWE | Uses variable; Handles large | Restarts avoid | Practical | Java | [ |
| Arlequin v3.0 | ELB | HA/HF | No | Adaptive | Includes numerous; Handles recombination | Long run times | 1,000s, biallelic/ | JRE on | [ |
| PHASE v2.0 | MCMC+PL | HA/HF | Yes | Coalescent/ | Improved run time; Compares haplotype frequency; Handles; Provides posterior | Departure for; Posterior | Practical limit | PC/MAC/ | [ |
| PHASE v1.0 | MCMC | HA/HF | No | Coalescent/ | Incorporates; Incorporates; Provides posterior | Departures for; Slow run times; Posterior probabilities | Practical limit | UNIX | [ |
| SLHAP v1.0 | MCMC | HA/HF | Yes | Neutral | Similar to PHASE; Missing data; Improved run time | Departures for | Practical limit | UNIX | [ |
^a Program haplotype output: individual assignment (HA), frequency estimates (HF) or both.
^b Ability of the program to accept missing data.
^c Program assumptions.
^d List of references.
^e Programs in this section make assumptions based on, or draw inference from, the coalescent model.
*Could not determine from available data.
†See incorporated programs for features and limitations.
EE: Estimating equation; ECM: Expectation conditional maximisation algorithm; ELB: Excoffier-Laval-Balding algorithm, Bayesian; EM: Expectation maximisation algorithm; EM issues: may be sensitive to HWE departures, may have long run times, and may converge to a local rather than global maximum (requiring multiple restarts); HF: Haplotype frequency estimate; HA: Individual haplotype assignment; HWE: Hardy-Weinberg equilibrium; IP: Imperfect phylogeny-based method; JRE: Java runtime environment; LD: Linkage disequilibrium; MAC: Program runs on Apple Macintosh computers; MC: Monte Carlo algorithm, Bayesian algorithm; MCMC: Markov chain Monte Carlo algorithm, Bayesian algorithm; MC-VL: Monte Carlo variable-length chain algorithm, Bayesian algorithm; MLE: Maximum likelihood estimation algorithm; PC: IBM-compatible personal computer; PL: Partition ligation algorithm; PP: Perfect phylogeny-based method; Practical limit: the program has no fixed upper limit on the number of markers and/or subjects, but computational and practical considerations limit this value; S-EM: Stochastic EM algorithm; UNIX: Runs on Unix operating systems, including Linux, Solaris and others.
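The table lists HAPINFERX as an implementation of Clark's algorithm and notes that it "may fail to start". A simplified Python rendering of Clark's inference rule shows why: resolution is seeded only by individuals whose phase is unambiguous (at most one heterozygous site), each known haplotype is then used to resolve compatible genotypes, and every newly implied complement is added to the known set. This is a sketch under assumptions (biallelic SNPs coded 0/1/2, no missing data), not the HAPINFERX code; the function names are hypothetical. Results of this greedy procedure can depend on the order in which individuals and known haplotypes are scanned, and individuals that never match a known haplotype stay unresolved.

```python
def is_compatible(hap, genotype):
    """True if `hap` can be one of the two haplotypes underlying `genotype`."""
    # Homozygous sites must match (0 -> 0, 2 -> 1); heterozygous sites accept either allele.
    return all(g == 1 or h == g // 2 for h, g in zip(hap, genotype))


def complement(hap, genotype):
    """The other haplotype implied by `genotype` once `hap` is fixed."""
    return tuple(1 - h if g == 1 else g // 2 for h, g in zip(hap, genotype))


def clark(genotypes):
    """Return ({individual index: (hap1, hap2)}, list of known haplotypes)."""
    resolved, known = {}, []
    # Seed with unambiguous individuals: at most one heterozygous site.
    for idx, g in enumerate(genotypes):
        if sum(1 for x in g if x == 1) <= 1:
            h1 = tuple(x // 2 for x in g)
            h2 = complement(h1, g)
            resolved[idx] = (h1, h2)
            for h in (h1, h2):
                if h not in known:
                    known.append(h)
    if not known:
        return resolved, known  # no unambiguous individuals: the algorithm cannot start
    changed = True
    while changed:  # repeat until no further individual can be resolved
        changed = False
        for idx, g in enumerate(genotypes):
            if idx in resolved:
                continue
            for h in known:
                if is_compatible(h, g):
                    c = complement(h, g)
                    resolved[idx] = (h, c)
                    if c not in known:
                        known.append(c)  # a newly inferred haplotype may resolve others
                    changed = True
                    break  # stop scanning `known` right after modifying it
    return resolved, known
```

For instance, a sample consisting only of double heterozygotes such as `[(1, 1), (1, 1)]` returns no resolved individuals, which is the "failure to start" case.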
Web-based haplotyping programs and related websites.
| Haplotyping program or website | Comments |
|---|---|
| GS-EM | Assigns probabilities to genotype calls; constructs haplotypes using these probabilities |
| HAPH | Imperfect phylogeny method; handles missing data |
| OSLEM | Modified EM algorithm that runs faster; does not accept missing data |
| PHASE v2.02 | MCMC method; compares HF between groups; handles missing data and recombination |
| SNPEM | EM method; compares HF between groups |
| SNPHAP | EM method; handles missing data |
| Boas Center for Genomics and Human Genetics: North Shore LIJ Research Institute | Comprehensive list of statistical genetics software; mirrored by Rockefeller site |
| International HapMap Project | HapMap project news, data and information |
| Laboratory of Statistical Genetics at Rockefeller University | Comprehensive list of statistical genetics software; mirror of North Shore LIJ site |
| MRC Rosalind Franklin Centre for Genomics Research | Registration required; numerous programs available |
| Power for Association with Errors | Calculates power and sample size in the presence of differing genotype error rates |
| PRL: Polymorphism Research Lab | Additional information and links to all reviewed programs |
| The Wellcome Trust Centre for Human Genetics | Several additional programs with links to their sources |
EM: Expectation maximisation algorithm; HF: Haplotype frequency estimate; MCMC: Markov Chain Monte Carlo algorithm, Bayesian.
Haplotyping software for hypothesis testing and analysis.
| Program name | Haplotyping algorithm | Key analysis feature(s) | Discrete outcome^a | Continuous outcome^b |
|---|---|---|---|---|
| CHAPLIN | ECM | Includes likelihood ratio and score statistics for haplotype-phenotype analysis; uses a permutation test to determine significance; includes AIC for model selection; does not accommodate covariates | Yes, case-control | No |
| EH | EM | Tests for LD in unrelated individuals and in case-control samples; tests for frequency differences between cases and controls under H1 (association) and H2 (association for all loci) | Yes, case-control | No |
| EHPLUS | EM | Improves on EH; model-free analysis and permutation test | Yes, case-control | No |
| FASTEHPLUS | EM | Implements EH and EHPLUS tests; significant speed improvements | Yes, case-control | No |
| GENECOUNTING | EM | Compares overall and specific haplotype frequencies between cases and controls | Yes, case-control | No |
| HAPH | IP | Phylogeny-based haplotyping method; uses information from the phylogeny for analysis; includes parametric and non-parametric tests for qualitative and quantitative phenotypes | Yes, case-control | Yes |
| HAPLO.STATS | EM | Score statistic for haplotype-phenotype analysis; GLM for regression of trait on haplotypes, with adjustment for covariates and interactions | Yes, binary, ordinal and Poisson | Yes |
| HPLUS | EE+PL+EM | Compares haplotype frequencies between cases and controls, with options to adjust for covariates and assess interactions; reports OR and confidence interval, and identifies haplotype blocks | Yes, case-control | No |
| LDSUPPORT | EM | Uses a likelihood method to calculate the risk of developing the disease phenotype from the diplotype configuration | Yes, case-control | No |
| PHASE v2.0 | MCMC | Allows comparison of haplotype frequencies between populations | Yes, case-control | No |
| THESIAS | S-EM | Compares haplotype frequencies between cases and controls, survival; uses chi-square statistics/t-test for analysis | Yes, case-control | Yes |
| SAS Genetics | EM | Allows comparison of haplotype frequencies between populations; haplotype trend regression (HTR) and several population genetic tests; TDT for family data | Yes, case-control | Yes |
| SNPEM | EM | Compares overall and specific haplotype frequencies between cases and controls; includes batch feature for sliding-window analysis | Yes, case-control | No |
| WHAP | EM | Uses SNPHAP for regression-based haplotype association tests on SNPs; provides beta estimates of effects; includes haplotype-weighted likelihood analysis, permutation | Yes, case-control | Yes |
| Zaykin et al. | EM | Likelihood ratio statistic for haplotype-phenotype analysis; allows sliding-window analysis | Yes, case-control | Yes |
| 3locus.PAS | EM | Tests for global disequilibrium, including pairwise and three-way disequilibrium, in an unrelated sample | No | No |
| Arlequin v2.0/3.0 | EM/ELB | Several population genetic tests | | |
| Zou and Zhao | EM | Adjusts haplotype frequency estimates for genotyping error | | |
^a Qualitative phenotype.
^b Quantitative phenotype.
AIC: Akaike's information criterion; EE: Estimating equation; ELB: Excoffier-Laval-Balding algorithm, Bayesian; ECM: Expectation conditional maximisation algorithm; EM: Expectation maximisation algorithm; GLM: Generalised linear model; IP: Imperfect phylogeny; LD: Linkage disequilibrium; MCMC: Markov chain Monte Carlo algorithm, Bayesian algorithm; OR: Odds ratio; PL: Partition ligation algorithm; S-EM: Stochastic EM algorithm; TDT: Transmission disequilibrium test.
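Several programs in the table above (for example CHAPLIN and EHPLUS) assess the significance of a haplotype-phenotype association by permutation. The Python sketch below illustrates the general idea for a case-control comparison: a likelihood ratio statistic, 2 × (log L of cases + log L of controls − log L of the pooled sample), with each log-likelihood maximised over haplotype frequencies by EM, and a permutation p-value obtained by shuffling case-control labels. It assumes biallelic SNPs coded 0/1/2, no missing data and HWE within each group, and uses a single EM run from a uniform start (so it inherits the local-maximum caveat of "EM issues"). It is not the statistic of any specific program; the function names are hypothetical.

```python
import itertools
import math
import random


def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with a 0/1/2-coded SNP genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    pairs = set()
    for bits in itertools.product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def pair_prob(freq, a, b):
    """HWE probability of the unordered haplotype pair {a, b}."""
    return freq[a] * freq[b] * (1.0 if a == b else 2.0)


def em_loglik(genotypes, max_iter=500, tol=1e-10):
    """Log-likelihood at the EM estimate of haplotype frequencies."""
    pairs = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for ps in pairs for pair in ps for h in pair})
    freq = dict.fromkeys(haps, 1.0 / len(haps))  # uniform start
    for _ in range(max_iter):
        counts = dict.fromkeys(haps, 0.0)
        for ps in pairs:  # E-step
            probs = [pair_prob(freq, a, b) for a, b in ps]
            s = sum(probs)
            for (a, b), p in zip(ps, probs):
                counts[a] += p / s
                counts[b] += p / s
        new = {h: c / (2 * len(pairs)) for h, c in counts.items()}  # M-step
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    return sum(math.log(sum(pair_prob(freq, a, b) for a, b in ps)) for ps in pairs)


def lrt(cases, controls):
    """2 * (logL_cases + logL_controls - logL_pooled)."""
    pooled = list(cases) + list(controls)
    return 2.0 * (em_loglik(cases) + em_loglik(controls) - em_loglik(pooled))


def permutation_test(cases, controls, n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value)."""
    rng = random.Random(seed)
    observed = lrt(cases, controls)
    pooled = list(cases) + list(controls)
    k = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any genotype-phenotype link
        if lrt(pooled[:k], pooled[k:]) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The `+ 1` in numerator and denominator counts the observed labelling as one of the permutations, which keeps the p-value strictly positive.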
Description of programs designed for pooled samples.
| Program name | Algorithm | Output^a | Missing data^b | Assumptions^c | Key features | Limitations | Pool size, max. loci, type | Platform | Ref.^d |
|---|---|---|---|---|---|---|---|---|---|
| Pools2 | Clark's/EM | HF/HA | N/A | None | Haplotype-tagging SNPs; Accommodates | Computationally; Need to re-calculate several times; EM issues | Pools of | PC | [ |
| LDPooled | EM | HF/HA | No | HWE | Calculates LD; SNPs or | LD impacts; EM issues | Based on pools | * | [ |
| EHP.R | EM | HF | Yes | HWE | Tests haplotype-disease; Assessment; Handles different types of | Variance increases; EM issues; Requires knowledge of S-PLUS | Pools of 4 | PC/UNIX | [ |
^a Program haplotype output: individual assignment (HA), frequency estimates (HF) or both.
^b Ability of the program to accept missing data.
^c Program assumptions.
^d List of references.
*Could not determine from available data.
EM: Expectation maximisation algorithm; EM issues: may be sensitive to HWE departures, may have long run times, and may converge to a local rather than global maximum (requiring multiple restarts); HF: Haplotype frequency estimate; HA: Individual haplotype assignment; HWE: Hardy-Weinberg equilibrium; LD: Linkage disequilibrium; PC: IBM-compatible personal computer; UNIX: Runs on Unix operating systems, including Linux, Solaris and others.
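The programs in the table above estimate haplotype frequencies from pooled DNA, where only allele counts per pool are observed. The Python sketch below illustrates the general EM idea under stated assumptions: biallelic SNPs, HWE and random pooling, error-free allele counts, equal-sized pools, and exhaustive enumeration. Each pool's allele counts are matched against every multiset of 2 × pool-size haplotypes that reproduces them, and each multiset is weighted by its multinomial probability under the current frequencies. It is not the algorithm implemented in Pools2, LDPooled or EHP.R, and the function names are hypothetical. This naive enumeration grows combinatorially with pool size and number of loci.

```python
import itertools
import math
from collections import Counter


def multinomial_coef(counter):
    """Number of orderings of a haplotype multiset: m! / prod(k!)."""
    coef = math.factorial(sum(counter.values()))
    for k in counter.values():
        coef //= math.factorial(k)
    return coef


def pooled_em(pool_counts, n_loci, pool_size, max_iter=500, tol=1e-10):
    """EM haplotype frequency estimates from pooled allele counts.

    pool_counts: one tuple per pool giving the count of the '1' allele at each
    SNP (0 .. 2 * pool_size); pool_size: individuals per pool. Counts are
    assumed to be error-free and achievable (hypothetical interface).
    """
    m = 2 * pool_size  # haplotypes per pool
    haps = list(itertools.product((0, 1), repeat=n_loci))
    configs = []  # per pool: list of (haplotype multiset, multinomial coefficient)
    for counts in pool_counts:
        compatible = []
        for combo in itertools.combinations_with_replacement(haps, m):
            if tuple(sum(h[j] for h in combo) for j in range(n_loci)) == tuple(counts):
                c = Counter(combo)
                compatible.append((c, multinomial_coef(c)))
        configs.append(compatible)
    freq = dict.fromkeys(haps, 1.0 / len(haps))  # uniform start
    for _ in range(max_iter):
        expected = dict.fromkeys(haps, 0.0)
        for compatible in configs:  # E-step: expected haplotype counts per pool
            weights = [coef * math.prod(freq[h] ** k for h, k in c.items())
                       for c, coef in compatible]
            total = sum(weights)
            for (c, _coef), w in zip(compatible, weights):
                for h, k in c.items():
                    expected[h] += k * w / total
        new = {h: e / (m * len(pool_counts)) for h, e in expected.items()}  # M-step
        delta = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if delta < tol:
            break
    return freq
```

With pools of two individuals, three pools showing no "1" alleles, one pool showing four "1" alleles at both SNPs and one pool showing two at each SNP, the estimate converges to frequencies of about 0.7 for haplotype 00 and 0.3 for haplotype 11, with 01 and 10 driven towards zero.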