Kshitij Srivastava, Kurt R Wollenberg, Willy A Flegel.
Abstract
BACKGROUND: Sequence information generated by next-generation sequencing is often computationally phased using haplotype-phasing algorithms. Utilizing experimentally derived allele or haplotype information improves this prediction, as is routinely done in HLA typing. We recently established a large dataset of long ERMAP alleles, which encode protein variants in the Scianna blood group system. We propose a phylogeny for this set of 48 alleles and identify the evolutionary steps that derive the observed alleles.
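Phasing is needed because an unphased genotype does not record which nucleotides lie on the same chromosome: with k heterozygous SNP positions (k ≥ 1), 2^(k−1) distinct haplotype pairs fit the same genotype. The Python sketch below enumerates them; the genotype encoding (one tuple per SNP position) and the toy genotype are hypothetical, chosen only to illustrate the ambiguity that experimentally confirmed alleles help resolve.

```python
from itertools import product


def candidate_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with an unphased genotype.

    `genotype` is a list of (nucleotide, nucleotide) tuples, one per SNP
    position; the order inside each tuple carries no phase information.
    """
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    pairs = set()
    # At each heterozygous site either nucleotide may lie on haplotype 1.
    for flips in product((False, True), repeat=len(het_sites)):
        hap1 = [a for a, _ in genotype]
        hap2 = [b for _, b in genotype]
        for site, flip in zip(het_sites, flips):
            if flip:
                hap1[site], hap2[site] = hap2[site], hap1[site]
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted(("".join(hap1), "".join(hap2)))))
    return pairs


# Toy genotype (hypothetical): 5 SNP positions, 3 of them heterozygous.
genotype = [("A", "A"), ("T", "C"), ("G", "G"), ("C", "T"), ("A", "G")]
pairs = candidate_haplotype_pairs(genotype)
print(len(pairs))  # 2 ** (3 - 1) = 4
for pair in sorted(pairs):
    print(pair)
```

Statistical phasing algorithms pick one of these pairs by likelihood; the count grows exponentially with the number of heterozygous positions, which is why prior knowledge of real alleles is valuable.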
Keywords: Allele prediction; ERMAP; Next generation sequencing; Phylogeny; Scianna
Year: 2019 PMID: 30744658 PMCID: PMC6371619 DOI: 10.1186/s12967-019-1791-9
Source DB: PubMed Journal: J Transl Med ISSN: 1479-5876 Impact factor: 5.531
Fig. 1 Phylogenetic tree of 48 ERMAP alleles. The phylogeny of the 48 known ERMAP alleles was determined by a standard Bayesian phylogenetic analysis. Branch width indicates posterior probability support (thick, ≥ 0.95; thin, < 0.95). The colored circles represent sampled alleles that are also the predicted ancestral alleles with the highest posterior probability. The 13 nodes are labelled A to L and B′. Nodes B and B′ share the same most probable allele, but with different posterior probabilities (see Table 1).
Table 1 Predicted alleles at internal nodes of the ERMAP phylogeny
| Node | Allele^a | Sequence^b | Posterior probability | Status | GenBank accession |
|---|---|---|---|---|---|
| Reference | Allele1 | ATTGGCACCAGGCCGCCGCCCTGCTTAAGCCCTGGCGTGGTACTCGTCACGGTCCGCCGGGGCCGGATTAAA | 1 | Observed | KX265235 |
| A | SPA18 | ------G--------------G--------T-------------T--------------------------- | 0.235 | Predicted | na |
| B | SPA03 | ------G--------------G--------T----------------------------------------- | 0.792^c | Predicted | na |
| C | SPA06 | ------G----------A---G--------T-----------T-T--------------------------- | 0.444 | Predicted | na |
| B′ | SPA03 | ------G--------------G--------T----------------------------------------- | 0.516^c | Predicted | na |
| D | SPA09 | ------G----------A---G--------T---A-A-----T-T--------------------------- | 0.608 | Predicted | na |
| E | SPA04 | ------G--------------G-------------------------------------------------- | 0.747 | Predicted | na |
| F | SPA07 | ------G-------------TG--G--G--T----------------------------------------- | 0.626 | Predicted | na |
| G | SPA10 | -C----G----------A---G--------T---A-A-----T-T--------------------------- | 0.594 | Predicted | na |
| H | SPA13 | -C----G----------A---G----T---T---A-A-----T-TC-------------------------- | 0.492 | Predicted | na |
| I | Allele12 | ---------------------G-------------------------------------------------- | 0.621 | Observed | KX265198 |
| J | Allele18 | G-----G---A--------TTG--G--G--T----------------------------------------- | 0.674 | Observed | KX265204 |
| K | Allele08 | ------G--------------G-------------T------------------------------------ | 0.888 | Observed | KX265194 |
| L | Allele17 | -C----G----------A---G----T---T---A-A---C-T-TC-------------------------- | 0.634 | Observed | KX265203 |
na, not applicable
^a Alleles 1, 8, 12, 17, and 18 are experimentally confirmed alleles, as published previously [21]. SPA03 to SPA18 are predicted alleles (see Additional file 1: Table S1)
^b The nucleotides at the 72 SNP positions with variations are shown in 5′ to 3′ orientation (Table S2 in Srivastava et al. [21]). Dashes indicate nucleotides identical to the reference allele (Allele1)
^c The posterior probabilities for SPA03 differ depending on its position in the phylogenetic tree (see Fig. 1)
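The sequence column uses a compact encoding: each row is the reference allele (Allele1) with only the differing nucleotides written out. The Python sketch below transcribes a few Table 1 rows by hand into variant-position dictionaries (a hypothetical representation; positions are 1-based among the 72 SNP positions, not genomic coordinates), expands them to full 72-nt strings, re-renders the dash notation, and lists the positions at which two alleles differ. The count of differing positions is the minimum number of single-nucleotide changes separating two alleles; the sketch does not infer tree topology or posterior probabilities, which come from the Bayesian analysis.

```python
# Reference allele (Allele1, KX265235) at the 72 SNP positions of Table 1.
REFERENCE = "ATTGGCACCAGGCCGCCGCCCTGCTTAAGCCCTGGCGTGGTACTCGTCACGGTCCGCCGGGGCCGGATTAAA"

# Non-reference nucleotides transcribed by hand from selected Table 1 rows,
# keyed by 1-based SNP position (hypothetical representation).
TABLE_1_VARIANTS = {
    "Allele12": {22: "G"},
    "SPA04": {7: "G", 22: "G"},
    "SPA09": {7: "G", 18: "A", 22: "G", 31: "T", 35: "A", 37: "A",
              43: "T", 45: "T"},
    "SPA10": {2: "C", 7: "G", 18: "A", 22: "G", 31: "T", 35: "A", 37: "A",
              43: "T", 45: "T"},
    "SPA13": {2: "C", 7: "G", 18: "A", 22: "G", 27: "T", 31: "T", 35: "A",
              37: "A", 43: "T", 45: "T", 46: "C"},
    "Allele17": {2: "C", 7: "G", 18: "A", 22: "G", 27: "T", 31: "T", 35: "A",
                 37: "A", 41: "C", 43: "T", 45: "T", 46: "C"},
}


def expand(variants, reference=REFERENCE):
    """Build the full 72-nt SNP string of an allele from its variant positions."""
    seq = list(reference)
    for pos, base in variants.items():
        if seq[pos - 1] == base:  # guards against transcription errors
            raise ValueError(f"position {pos} already carries {base} in the reference")
        seq[pos - 1] = base
    return "".join(seq)


def to_dash_notation(seq, reference=REFERENCE):
    """Render a sequence as in Table 1: '-' wherever it matches the reference."""
    return "".join("-" if s == r else s for s, r in zip(seq, reference))


def differing_positions(seq_a, seq_b):
    """1-based SNP positions at which two aligned alleles differ."""
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


seqs = {name: expand(v) for name, v in TABLE_1_VARIANTS.items()}
print(to_dash_notation(seqs["SPA04"]))
print(differing_positions(seqs["SPA04"], seqs["Allele12"]))  # [7]
print(differing_positions(seqs["SPA13"], seqs["Allele17"]))  # [41]
```

For example, the predicted allele SPA13 (node H) and the observed Allele17 (node L) differ at a single SNP position, which is the kind of one-step relationship the phylogeny is used to identify.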
Fig. 2 Algorithm to analyze genotypes and determine alleles using phylogeny data. Genotype information for a particular gene from a patient or blood donor is phased into alleles or haplotypes using statistical algorithms to support clinical decisions. We propose a novel approach in which the confidence for an inferred allele is based on verified, experimentally confirmed alleles and on predicted alleles (see Fig. 1). The posterior probability of each predicted allele is determined by a Bayesian phylogenetic analysis. Whenever a new allele is observed and experimentally confirmed, the phylogenetic analysis is repeated to predict an updated set of alleles and their posterior probabilities. As this loop continues, previously unobserved alleles will be encountered less frequently, because the set of confirmed alleles grows.
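The loop in Fig. 2 can be sketched in Python under explicit assumptions. An unphased genotype is explained by every pair of known alleles that reproduces it; here each candidate pair is scored by multiplying per-allele weights, using 1.0 for a confirmed allele and the phylogenetic posterior probability for a predicted one. This scoring rule, the function names, and the 5-site toy alleles are hypothetical illustrations, not the authors' published method, and the Bayesian re-analysis that would update posteriors after a confirmation is only noted in a comment.

```python
from itertools import combinations_with_replacement


def genotype_of(hap_a, hap_b):
    """Unphased genotype: the unordered nucleotide set at each SNP position."""
    return tuple(frozenset((a, b)) for a, b in zip(hap_a, hap_b))


def rank_explanations(genotype, library):
    """Rank pairs of library alleles that reproduce an unphased genotype.

    `library` maps allele name -> (sequence, weight). The weight is 1.0 for an
    experimentally confirmed allele and the posterior probability for a
    predicted one (a hypothetical scoring rule).
    """
    hits = []
    for n1, n2 in combinations_with_replacement(sorted(library), 2):
        (s1, w1), (s2, w2) = library[n1], library[n2]
        if genotype_of(s1, s2) == genotype:
            hits.append(((n1, n2), w1 * w2))
    # Highest combined weight first; an empty list flags a possibly novel allele.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def confirm_allele(library, name, sequence):
    """Record an experimentally confirmed allele with weight 1.0.

    In the Fig. 2 scheme the Bayesian phylogeny would then be rerun to update
    the predicted alleles; that step is not implemented here.
    """
    library[name] = (sequence, 1.0)


# Toy library with hypothetical 5-site alleles and weights.
library = {
    "conf1": ("ATGCA", 1.0),
    "conf2": ("ACGCA", 1.0),
    "pred1": ("ATGTA", 0.75),
    "pred2": ("ACGTA", 0.40),
}
donor = genotype_of("ACGCA", "ATGTA")
print(rank_explanations(donor, library))
# [(('conf2', 'pred1'), 0.75), (('conf1', 'pred2'), 0.4)]

confirm_allele(library, "pred2", "ACGTA")  # e.g. pred2 is later sequenced
print(rank_explanations(donor, library)[0])
# (('conf1', 'pred2'), 1.0)
```

Both pairs reproduce the same genotype, so the ranking depends entirely on the weights; confirming an allele shifts the preferred phasing. An empty result means no pair of known alleles explains the genotype, which in this scheme is the point at which a new allele would be confirmed experimentally and added to the library.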