Steven Ringquist, Gaia Bellone, Ying Lu, Kathryn Roeder, Massimo Trucco.
Abstract
Located on chromosome 6p21, the classical human leukocyte antigen (HLA) genes are highly polymorphic. HLA alleles are associated with a variety of phenotypes, including narcolepsy, autoimmunity, and the immunologic response to infectious disease. Moreover, high resolution genotyping of these loci is critical to achieving long-term survival of allogeneic transplants. Development of methods for high resolution analysis of HLA genotypes will lead to improved understanding of how select alleles contribute to human health and disease risk. Genomic DNAs were obtained from a cohort of n = 383 subjects recruited as part of an Ulcerative Colitis study and analyzed for HLA-DRB1. HLA genotypes were determined using sequence specific oligonucleotide probes and by next-generation sequencing using the Roche/454 GSFLX instrument. The Clustering and Alignment of Polymorphic Sequences (CAPSeq) software application was developed to analyze next-generation sequencing data. The application generates HLA sequence specific 6-digit genotype information from next-generation sequencing data, using MUMmer to align sequences and the R package diffusionMap to classify sequences into their respective allelic groups. Incorporating Bootstrap Aggregating (Bagging) to aid in sorting sequences into allele classes improved genotyping accuracy. With 60 Bagging iterations, CAPSeq genotypes showed a high rate of concordance with the 4-digit genotypes characterized by sequence specific oligonucleotide probes, matching at 759 out of 766 (99.1%) alleles.
Year: 2013 PMID: 23555798 PMCID: PMC3610899 DOI: 10.1371/journal.pone.0059835
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
HLA-DRB1 alleles.
| HLA-DRB1 | European Freq (Rank) | Worldwide Freq (Rank) | UC Cohort Freq |
| | 0.09149 (4) | 0.04123 (8) | 0.07311 |
| | 0.01703 (13) | 0.01161 (26) | 0.02872 |
| | 0.00889 (19) | 0.00329 (41) | 0.02480 |
| | 0.12916 (3) | 0.06760 (3) | 0.10183 |
| | 0.09111 (5) | 0.02896 (13) | 0.07963 |
| | 0.00972 (17) | 0.00742 (32) | 0.00392 |
| | 0.00572 (23) | 0.02659 (17) | 0.00392 |
| | 0.03634 (9) | 0.01795 (19) | 0.02219 |
| | 0.00368 (25) | 0.04776 (6) | 0.00522 |
| | 0.00947 (18) | 0.01536 (23) | 0.00783 |
| | 0.00248 (26) | 0.00188 (43) | 0.00261 |
| | 0.13767 (2) | 0.06986 (2) | 0.10574 |
| | 0.02363 (12) | 0.00875 (29) | 0.03525 |
| | 0.00025 (34) | 0.02104 (17) | 0.00131 |
| | 0.00133 (29) | 0.03864 (9) | 0.00261 |
| | 0.00089 (31) | 0.00500 (36) | 0.00392 |
| | 0.00006 (41) | 0.00101 (47) | 0.00131 |
| | 0.00000 (NA) | 0.00003 (68) | 0.00131 |
| | 0.00820 (21) | 0.05450 (5) | 0.01175 |
| | 0.00826 (20) | 0.01284 (24) | 0.01044 |
| | 0.05654 (7) | 0.05945 (4) | 0.08747 |
| | 0.00152 (28) | 0.00604 (33) | 0.00131 |
| | 0.00483 (24) | 0.00227 (42) | 0.00522 |
| | 0.03189 (10) | 0.01780 (20) | 0.04178 |
| | 0.00000 (NA) | 0.00003 (68) | 0.00131 |
| | 0.00000 (NA) | 0.00000 (NA) | 0.00131 |
| | 0.00000 (NA) | 0.00002 (69) | 0.00131 |
| | 0.01468 (14) | 0.02712 (16) | 0.03003 |
| | 0.06283 (6) | 0.03152 (12) | 0.05875 |
| | 0.04015 (8) | 0.03746 (10) | 0.03655 |
| | 0.00991 (16) | 0.00784 (30) | 0.01436 |
| | 0.00000 (NA) | 0.00000 (NA) | 0.00131 |
| | 0.02459 (11) | 0.03218 (11) | 0.02742 |
| | 0.14441 (1) | 0.07864 (1) | 0.13316 |
| | 0.00775 (22) | 0.04507 (7) | 0.00653 |
| | 0.01061 (15) | 0.01656 (21) | 0.02480 |
Allele frequencies and rankings are taken from Maiers et al. [19] for European HLA-DRB1 frequencies and from Lancaster et al. [20] for worldwide frequencies. UC cohort frequencies were determined from CAPSeq genotyping results.
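The UC cohort frequencies can be read back as allele counts. The short sketch below assumes that each of the 383 subjects contributes two HLA-DRB1 allele calls, giving the 766 alleles quoted in the abstract; that denominator is an inference, not something the table states. The variable names are illustrative only, and the frequencies are a handful of values copied from the UC cohort column.

```python
# Assumption: two HLA-DRB1 allele calls per subject (383 subjects -> 766 alleles).
N_SUBJECTS = 383
N_ALLELES = 2 * N_SUBJECTS

# A few UC cohort frequencies copied from the table above.
SELECTED_FREQS = [0.13316, 0.10574, 0.10183, 0.07311, 0.00131]

# Nearest whole number of alleles implied by each frequency.
implied_counts = {f: round(f * N_ALLELES) for f in SELECTED_FREQS}

for freq, count in implied_counts.items():
    # Recompute the frequency from the count to confirm the round trip.
    print(f"{freq:.5f} -> {count} of {N_ALLELES} alleles "
          f"(recomputed {count / N_ALLELES:.5f})")
```

Under this assumption the smallest frequency in the column, 0.00131, corresponds to an allele observed once in the cohort.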
Figure 1. The Clustering and Alignment of Polymorphic Sequences (CAPSeq) software application illustrated as a schematic.
Input Data: Next-generation sequence data are supplied as modified FASTQ files containing sequences and their corresponding Q-scores, together with an additional input file containing known HLA allele sequences. CAPSeq Application: The analysis comprises three principal steps. Step 1 aligns sequences and uses the corresponding Q-scores to generate a weighted pairwise similarity score. Step 2 analyzes these scores by diffusion mapping followed by K-means clustering to identify homogeneous sequence groups. Step 3 applies Bootstrap Aggregating (Bagging) across multiple analyses of the data to ensure genotyping precision. Output Data: HLA genotyping data are provided as a tab-delimited text file containing the most likely allelic match between the CAPSeq-generated consensus sequences and the list of known HLA alleles.
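The three steps in this legend can be illustrated with a toy sketch. This is not the CAPSeq code: the published application uses MUMmer for alignment and the R package diffusionMap, whereas the sketch below substitutes a position-wise comparison of equal-length simulated reads, a NumPy eigen-decomposition, and a hand-written K-means. The Q-score weighting, the voting scheme used for Bagging, the toy allele sequences, and every function and variable name are assumptions made for illustration.

```python
from collections import Counter
import numpy as np

# Hypothetical reference alleles (toy sequences, not real HLA-DRB1 alleles).
KNOWN = {"toy_allele_A": "ACGTACGTACGTACGTACGT",
         "toy_allele_B": "ACGAACCTAGGTTCGAAACA"}

def simulate(allele, n, rng, p_err=0.2):
    """Toy reads: at most one substitution per read, flagged by a low Q-score."""
    reads, quals = [], []
    for _ in range(n):
        seq, q = list(allele), [30] * len(allele)
        if rng.random() < p_err:
            pos = int(rng.integers(len(allele)))
            seq[pos] = str(rng.choice([b for b in "ACGT" if b != seq[pos]]))
            q[pos] = 10
        reads.append("".join(seq))
        quals.append(q)
    return reads, quals

def weighted_similarity(reads, quals):
    """Step 1 (assumed weighting): Q-score-weighted fraction of matching bases."""
    bases = np.array([list(r) for r in reads])
    acc = 1.0 - 10.0 ** (-np.asarray(quals) / 10.0)   # per-base P(correct)
    match = bases[:, None, :] == bases[None, :, :]
    w = acc[:, None, :] * acc[None, :, :]
    return (w * match).sum(-1) / w.sum(-1)

def diffusion_coords(S, n_coords=1):
    """Step 2a: leading non-trivial diffusion coordinates of the Markov matrix D^-1 S."""
    d = S.sum(axis=1)
    vals, vecs = np.linalg.eigh(S / np.sqrt(np.outer(d, d)))
    top = np.argsort(vals)[::-1][1:n_coords + 1]      # skip the trivial eigenvalue 1
    return vecs[:, top] / np.sqrt(d)[:, None] * vals[top]

def kmeans(X, k, n_iter=25):
    """Step 2b: plain Lloyd iterations with farthest-point initialisation."""
    centers = [X[0]]
    while len(centers) < k:
        dist = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels

def consensus(reads):
    """Per-position majority base across a group of reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))

def bagged_genotype(reads, quals, n_bags=20, k=2, seed=0):
    """Step 3: repeat the analysis on bootstrap resamples and keep the most frequent call."""
    rng = np.random.default_rng(seed)
    votes = Counter()
    for _ in range(n_bags):
        idx = rng.integers(0, len(reads), size=len(reads))
        sample = [reads[i] for i in idx]
        S = weighted_similarity(sample, [quals[i] for i in idx])
        labels = kmeans(diffusion_coords(S), k)
        groups = [[r for r, l in zip(sample, labels) if l == j] for j in range(k)]
        votes[tuple(sorted(consensus(g) for g in groups))] += 1
    return votes.most_common(1)[0][0]

def best_match(seq, known):
    """Output step: name of the known allele with the fewest mismatches."""
    return min(known, key=lambda name: sum(a != b for a, b in zip(seq, known[name])))

rng = np.random.default_rng(1)
reads_a, quals_a = simulate(KNOWN["toy_allele_A"], 20, rng)
reads_b, quals_b = simulate(KNOWN["toy_allele_B"], 12, rng)
genotype = bagged_genotype(reads_a + reads_b, quals_a + quals_b)
print(sorted(best_match(s, KNOWN) for s in genotype))
```

Voting on the sorted pair of consensus sequences sidesteps the problem that cluster labels are arbitrary from one bootstrap resample to the next; other aggregation schemes are equally plausible.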
Comparison of SSO based 4-digit HLA-DRB1 genotyping with CAPSeq.
| Bagging Iterations | Concordant | Sensitivity Error | Specificity Error |
| 5 | 733 (95.7%) | 31 (4.0%) | 2 (0.3%) |
| 10 | 747 (97.5%) | 17 (2.2%) | 2 (0.3%) |
| 20 | 752 (98.2%) | 11 (1.4%) | 3 (0.4%) |
| 40 | 758 (99.0%) | 6 (0.8%) | 2 (0.3%) |
| 60 | 759 (99.1%) | 5 (0.7%) | 2 (0.3%) |
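The percentages in this table follow from dividing each count by the 766 alleles genotyped. The sketch below reproduces them from the counts copied out of the table; it assumes that 766 corresponds to two allele calls for each of the 383 subjects, and the names used are illustrative.

```python
# Assumption: 383 subjects x 2 allele calls = 766 alleles compared per row.
N_ALLELES = 2 * 383

# Bagging iterations -> (concordant, sensitivity errors, specificity errors).
TABLE = {
    5: (733, 31, 2),
    10: (747, 17, 2),
    20: (752, 11, 3),
    40: (758, 6, 2),
    60: (759, 5, 2),
}

def as_percent(count, total=N_ALLELES):
    """Format a count as a percentage of all alleles, to one decimal place."""
    return f"{count / total:.1%}"

for iterations, counts in TABLE.items():
    # Every row should account for all 766 alleles.
    assert sum(counts) == N_ALLELES
    concordant, sensitivity_err, specificity_err = counts
    print(iterations, as_percent(concordant),
          as_percent(sensitivity_err), as_percent(specificity_err))
```

Each row sums to 766, so the three columns partition the full set of allele calls.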
Figure 2. CAPSeq Bagging iterations result in improved genotyping sensitivity.
Bagging iterations (x-axis) were varied from 5 to 60. The median frequency of the minor sequence detectable by CAPSeq (y-axis) was determined by interrogating the raw sequencing data obtained using the Roche/454 GSFLX instrument.
Figure 3. Frequency of HLA-DRB1 genotypes obtained using SSO (x-axis) and CAPSeq (y-axis) compared at 4-digit resolution.
The dashed line represents the theoretical identity between the two methods. Pearson’s correlation coefficient (r) exceeded 0.999.