Jun Miyake, Yuhei Kaneshita, Satoshi Asatani, Seiichi Tagawa, Hirohiko Niioka, Takashi Hirano.
Abstract
Alleles of human leukocyte antigen (HLA)-A DNA are classified and displayed graphically using a deep learning method (stacked autoencoder). Nucleotide sequence data 822 bp in length, collected from the Immuno Polymorphism Database, were compressed to a 2-dimensional representation and plotted. The two-dimensional plots show that the alleles form clusters and can therefore be classified. The two-dimensional plot of HLA-A DNA gives a clear overview for characterizing the various alleles.
Keywords: Allele; Artificial intelligence; Autoencoder; Deep learning; HLA
Year: 2018 PMID: 29327117 PMCID: PMC5852191 DOI: 10.1007/s13577-017-0194-6
Source DB: PubMed Journal: Hum Cell ISSN: 0914-7470 Impact factor: 4.174
Data set of alleles: numbers of samples of alleles of HLA-A in the database
| Allele | Number |
|---|---|
| A*01 | 30 |
| A*02 | 164 |
| A*03 | 28 |
| A*11 | 60 |
| A*23 | 12 |
| A*24 | 96 |
| A*26 | 41 |
| A*29 | 15 |
| A*30 | 20 |
| A*31 | 30 |
| A*32 | 14 |
| A*33 | 30 |
| Total | 540 |
Data set of alleles: attribution of bases to digits
| Base | Attributed digit |
|---|---|
| A | 1000 |
| T | 0100 |
| C | 0010 |
| G | 0001 |
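The base-to-digit attribution above is a one-hot code, and can be sketched as below. This is a minimal illustration, not the authors' code: the function name `encode_sequence` is hypothetical, and concatenating the four digits of each base into one flat vector is an assumption about how the input layer is built. Under that assumption an 822 bp sequence yields 822 × 4 = 3288 input values.

```python
# Digit codes copied from the table above (A=1000, T=0100, C=0010, G=0001).
BASE_TO_DIGITS = {
    "A": (1, 0, 0, 0),
    "T": (0, 1, 0, 0),
    "C": (0, 0, 1, 0),
    "G": (0, 0, 0, 1),
}


def encode_sequence(seq):
    """Return a flat list of 0/1 values, four per base (hypothetical helper)."""
    vector = []
    for base in seq.upper():
        if base not in BASE_TO_DIGITS:
            # How ambiguous or missing bases are handled is not stated in the source.
            raise ValueError(f"unsupported base: {base!r}")
        vector.extend(BASE_TO_DIGITS[base])
    return vector


example = encode_sequence("ATCG")
```

Raising an error on unknown characters is a design choice for the sketch only; the record does not say how gaps or ambiguity codes were treated.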
Fig. 1 Schematic illustration of the autoencoder compression process for HLA analysis. The DNA sequence of HLA-A (822 bp) is taken as the input layer. The resulting 2-dimensional layer is plotted as a dot on a 2-D plane
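The idea of compressing a one-hot encoded sequence to a 2-dimensional code can be sketched with a very small autoencoder in NumPy. This is a simplified stand-in, not the paper's model: it uses a single 2-unit bottleneck rather than a stacked architecture, and the activations, learning rate, step count, and the random toy sequences are all assumptions, since this record does not give layer sizes or training settings.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_autoencoder(X, code_dim=2, steps=300, lr=0.5, seed=0):
    """Train a one-bottleneck autoencoder by plain gradient descent (assumed settings)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, code_dim))
    b1 = np.zeros(code_dim)
    W2 = rng.normal(0.0, 0.1, (code_dim, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(steps):
        Z = np.tanh(X @ W1 + b1)          # encoder: input -> 2-D code
        X_hat = sigmoid(Z @ W2 + b2)      # decoder: 2-D code -> reconstruction
        err = X_hat - X
        losses.append(0.5 * np.sum(err ** 2) / n)
        # Backpropagate the squared reconstruction error.
        dA2 = err * X_hat * (1.0 - X_hat) / n
        dZ = dA2 @ W2.T
        dA1 = dZ * (1.0 - Z ** 2)
        W2 -= lr * (Z.T @ dA2)
        b2 -= lr * dA2.sum(axis=0)
        W1 -= lr * (X.T @ dA1)
        b1 -= lr * dA1.sum(axis=0)
    codes = np.tanh(X @ W1 + b1)          # one 2-D point per sequence
    return codes, losses


# Toy data: 20 random sequences of 30 bases, one-hot encoded (not real HLA data).
rng = np.random.default_rng(1)
base_index = rng.integers(0, 4, size=(20, 30))
X = np.eye(4)[base_index].reshape(20, -1)
codes, losses = train_autoencoder(X)
```

Each row of `codes` plays the role of one dot on the 2-D plane; with real allele sequences the rows would be colored by allele group before plotting.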
Fig. 2 Graphical projection of HLA-A DNA onto a 2-dimensional feature space. The color of each dot corresponds to its allele. The clusters of dots are clearly related to the allele types, which are shown in different colors (A*01–A*33)
Fig. 3 Histogram-based document vector analysis of HLA-A using the autoencoder. The positions of the alleles differ from those in Fig. 2, but each cluster appears much sharper and more distinct from those of the other alleles. The meaning of the distances and directions between alleles is under investigation, but they could be correlated with genetic differences in immune characteristics. Closer positions should indicate greater mutual similarity of the sequences
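This record does not say how the histogram-based document vector is built. One plausible reading, offered purely as an assumption, is a bag-of-words count of short subsequences (k-mers), which treats each sequence like a document made of overlapping words. The function name `kmer_histogram` and the choice of `k` are hypothetical.

```python
from itertools import product


def kmer_histogram(seq, k=3):
    """Count overlapping k-mers in a fixed ACGT order (assumed construction)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # skip words containing non-ACGT characters
            counts[word] += 1
    return [counts[w] for w in kmers]


# With k=2 the vector has 4**2 = 16 entries, ordered AA, AC, AG, AT, CA, ...
hist = kmer_histogram("ATATAT", k=2)
```

A vector like this discards position information, so two sequences with similar composition land close together regardless of where the shared words occur.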
Fig. 4 Histogram-based document vector analysis of HLA-A by the PCA method. The positions of the alleles differ from those in Fig. 3; the clusters are less clearly organized, and overlaps prevent simple identification of the alleles
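For comparison, a 2-D PCA projection of feature vectors can be sketched with a singular value decomposition. This is a generic linear PCA under the usual mean-centering convention, not necessarily the exact procedure behind Fig. 4; the function name `pca_2d` and the random toy matrix are assumptions.

```python
import numpy as np


def pca_2d(X):
    """Project rows of X onto their first two principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:2].T


# Toy feature matrix: 12 samples with 5 features each (not real HLA data).
rng = np.random.default_rng(0)
features = rng.normal(size=(12, 5))
points = pca_2d(features)
```

Because PCA is restricted to linear projections, it can leave groups overlapping that a nonlinear encoder might separate.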