Cao Nguyen, Michael D Varney, Leonard C Harrison, Grant Morahan.
Abstract
Evaluating the risk of developing type 1 diabetes (T1D) depends on determining an individual's HLA type, especially of the HLA DRB1 and DQB1 alleles. Individuals positive for HLA-DRB1*03 (DR3) or HLA-DRB1*04 (DR4) with DQB1*03:02 (DQ8) have the highest risk of developing T1D. Currently, HLA typing methods are relatively expensive and time-consuming. We sought to determine the minimum number of single nucleotide polymorphisms (SNPs) that could rapidly define the HLA-DR types relevant to T1D, namely, DR3/4, DR3/3, DR4/4, DR3/X, DR4/X, and DRX/X (where X is neither DR3 nor DR4), and could distinguish the highest-risk DR4 type (DR4-DQ8) as well as the non-T1D-associated DR4-DQB1*03:01 type. We analyzed 19,035 SNPs of 10,579 subjects (7,405 from a discovery set and 3,174 from a validation set) from the Type 1 Diabetes Genetics Consortium and developed a novel machine learning method to select as few as three SNPs that could define the HLA-DR and HLA-DQ types accurately. The overall accuracy was 99.3%, the area under the curve was 0.997, true-positive rates were >0.99, and false-positive rates were <0.001. We confirmed the reliability of these SNPs by 10-fold cross-validation. Our approach predicts HLA-DR/DQ types relevant to T1D more accurately than existing methods and is rapid and cost-effective.
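The abstract reports overall accuracy together with true-positive and false-positive rates for a multi-class prediction (six HLA-DR types). As a minimal sketch of how such figures can be computed, the Python below scores each type one-vs-rest. The function name `per_class_rates`, the one-vs-rest convention, and the toy labels are assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
from collections import Counter

# The six HLA-DR types named in the abstract.
DR_TYPES = ["DR3/4", "DR3/3", "DR4/4", "DR3/X", "DR4/X", "DRX/X"]


def per_class_rates(true_labels, predicted_labels, classes=DR_TYPES):
    """Return (overall accuracy, {class: (TPR, FPR)}) using one-vs-rest counts.

    Hypothetical helper for illustration only.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have equal length")
    if not true_labels:
        raise ValueError("label lists must not be empty")

    # Count every (true, predicted) pair once: a sparse confusion matrix.
    pairs = Counter(zip(true_labels, predicted_labels))
    total = len(true_labels)
    correct = sum(n for (t, p), n in pairs.items() if t == p)

    rates = {}
    for c in classes:
        tp = pairs[(c, c)]
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        fp = sum(n for (t, p), n in pairs.items() if t != c and p == c)
        tn = total - tp - fn - fp
        # A rate is undefined (NaN) when its denominator is zero.
        tpr = tp / (tp + fn) if (tp + fn) else float("nan")
        fpr = fp / (fp + tn) if (fp + tn) else float("nan")
        rates[c] = (tpr, fpr)
    return correct / total, rates


# Toy labels, invented for this example (not study data).
toy_true = ["DR3/4", "DR3/4", "DR3/X", "DR4/X", "DRX/X", "DRX/X"]
toy_pred = ["DR3/4", "DR3/X", "DR3/X", "DR4/X", "DRX/X", "DRX/X"]
accuracy, rates = per_class_rates(toy_true, toy_pred)
print(accuracy, rates["DR3/4"], rates["DR3/X"])
```

In the toy data one DR3/4 subject is miscalled as DR3/X, which lowers the true-positive rate for DR3/4 and raises the false-positive rate for DR3/X while leaving the other types unaffected.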
Year: 2013 PMID: 23378606 PMCID: PMC3661605 DOI: 10.2337/db12-1398
Source DB: PubMed Journal: Diabetes ISSN: 0012-1797 Impact factor: 9.461
FIG. 1. A regional linkage disequilibrium map and IGRs of 29 SNPs used to annotate HLA-DR types. SNPs are plotted according to their chromosome positions (National Center for Biotechnology Information [NCBI] build36/hg18) with their IGRs from the discovery phase. The three SNPs selected by the GRASPER method (rs2854275, rs6931277, rs3104413) are shown as triangles. The two SNPs found by Barker et al. (10) (rs2187668, rs7454108) are shown as circles. Linkage disequilibrium (calculated as r values) between the key SNP rs2854275 (see Fig. 2) and the other SNPs is indicated by gray shading within the SNP symbol; compare the intensity with the scale at the right of the figure. The estimated recombination rates from 1000 Genomes Pilot 1 samples are also plotted. The genes within the region containing the 29 SNPs are annotated. Display software to produce this graph was obtained from http://www.broadinstitute.org/mpg/snap/ldplot.php.
FIG. 2. Accuracy decay (AUC) in predicting HLA-DR types using smaller subsets of the selected 29 SNPs. Seven best SNPs for predicting HLA-DR types are as follows: rs2854275, rs6931277, rs3104413, rs3129716, rs2187668, rs9273327, and rs2856674. Six best SNPs are as follows: rs2854275, rs6931277, rs3104413, rs2187668, rs9273327, and rs2856674. Five best SNPs are as follows: rs2854275, rs6931277, rs3104413, rs9273327, and rs2856674. Four best SNPs are as follows: rs6931277, rs3104413, rs9273327, and rs2856674. Three best SNPs are as follows: rs2854275, rs6931277, and rs3104413. Two best SNPs are as follows: rs2854275 and rs3104413. One best SNP is rs2854275. Accuracy for the discovery dataset is reported using 10-fold cross-validation. Accuracy for the validation dataset is reported using a predictive model trained on the discovery dataset. Accuracy for the combined dataset is reported using 10-fold cross-validation. Note that there is no difference in the reported AUCs between discovery, validation, and combined datasets, suggesting our SNP selection method is not biased.
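The 10-fold cross-validation protocol mentioned in the Fig. 2 caption can be sketched as follows. This is not the GRASPER method, whose details are not given here. It is a deliberately simple stand-in classifier (a genotype lookup table with a majority vote) evaluated by k-fold cross-validation. The genotype coding (0, 1, 2 copies of an allele), the helper names, and the synthetic labels are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def fit_lookup(rows):
    """Map each genotype tuple to its most frequent label in the training rows."""
    votes = defaultdict(Counter)
    for genotype, label in rows:
        votes[genotype][label] += 1
    table = {g: c.most_common(1)[0][0] for g, c in votes.items()}
    # Fallback for genotypes never seen in training: the overall majority label.
    default = Counter(label for _, label in rows).most_common(1)[0][0]
    return table, default


def predict(model, genotype):
    table, default = model
    return table.get(genotype, default)


def k_fold_accuracy(rows, k=10, seed=0):
    """Shuffle once, split into k folds, train on k-1 folds, test on the rest."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    correct = 0
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = fit_lookup(train)
        correct += sum(predict(model, g) == y for g, y in held_out)
    return correct / len(rows)


def toy_label(genotype):
    """Arbitrary deterministic rule, invented for this example only."""
    return "type_%d" % ((genotype[0] + 2 * genotype[1]) % 3)


# Synthetic data: every two-SNP genotype (3 x 3 combinations) appears 20 times.
toy_rows = [((a, b), toy_label((a, b)))
            for a in range(3) for b in range(3) for _ in range(20)]
cv_accuracy = k_fold_accuracy(toy_rows, k=10)
print(cv_accuracy)
```

Every subject is held out exactly once, so the returned value is the fraction of all subjects classified correctly by a model that never saw them during training. The same `fit_lookup` and `predict` helpers could also be trained on one dataset and scored on a separate one, mirroring the discovery and validation split described in the caption.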
FIG. 3. Accuracy and AUC of five machine learning methods in predicting HLA-DR types using the two SNPs (rs2854275 and rs3104413) selected by the GRASPER method. SVM, support vector machines.
Comparison of the novel GRASPER method and other feature selection methods
Predictive accuracy distribution is consistent across geographically and ethnically diverse recruitment networks
FIG. 4. Simple rules for determining individuals with HLA-DR, DR4-DQ8, DR4-DQ7, DR3/4-DQ8, and DRB1*03:01-DQA1*05:01-DQB1*02:01 types with three SNPs. *SNP rs9273363 can be replaced by rs9275184, rs9275495, rs9275334, or rs9275532.
Comparison of SNPs found using the GRASPER method and the two other SNPs: breakdown by HLA-DR types