Jeong-An Gim, Yonghan Kwon, Hyun A Lee, Kyeong-Ryoon Lee, Soohyun Kim, Yoonjung Choi, Yu Kyong Kim, Howard Lee.
Abstract
Tacrolimus is an immunosuppressive drug with a narrow therapeutic index and large interindividual variability. We identified genetic variants that predict tacrolimus exposure in healthy Korean males using machine learning algorithms such as decision tree, random forest, and least absolute shrinkage and selection operator (LASSO) regression. rs776746 (CYP3A5) and rs1137115 (CYP2A6) are single nucleotide polymorphisms (SNPs) that can affect exposure to tacrolimus. A decision tree, when coupled with random forest analysis, is an efficient tool for predicting exposure to tacrolimus based on genotype. These tools can help determine an individualized dose of tacrolimus.
Keywords: decision tree; genotype; machine learning; random forest; tacrolimus
Year: 2020 PMID: 32260456 PMCID: PMC7178269 DOI: 10.3390/ijms21072517
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Simplified (depth: 3) decision trees for (A) the maximum plasma concentration (Cmax, μg mL−1) and (B) the area under the concentration curve from time zero to the last quantifiable time point (AUClast, h μg mL−1) of tacrolimus. The rectangles denote the branches, which contain the gene name, the single nucleotide polymorphism (SNP) accession number, the proportion (%) and frequency of subjects, and the classifying alleles. The rounded rectangles represent the final nodes, which show the mean values of Cmax and AUClast and the percentage and number of subjects. (C) Mean concentration-time profiles of tacrolimus by node for AUClast as identified in (B). Subjects in node 3 had the highest values of Cmax and AUClast.
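The splitting step behind a regression tree of this kind can be sketched in a few lines. The example below is only an illustration, not the authors' implementation: it assumes the common criterion of minimizing the within-node sum of squared errors, and the SNP identifiers (`snp_a`, `snp_b`), genotype calls, and exposure values are hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(genotypes, exposure):
    """Find the single genotype split that minimizes the total within-node SSE.

    genotypes: dict mapping a SNP id to a list of genotype strings, one per subject
    exposure:  list of exposure values (e.g. Cmax or AUClast), one per subject
    Returns (snp, genotype, mean_of_matching_subjects, mean_of_other_subjects),
    or None when no split separates the subjects into two non-empty groups.
    """
    best = None
    for snp, calls in genotypes.items():
        for g in sorted(set(calls)):
            left = [y for c, y in zip(calls, exposure) if c == g]
            right = [y for c, y in zip(calls, exposure) if c != g]
            if not left or not right:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, snp, g, mean(left), mean(right))
    return None if best is None else best[1:]


# Hypothetical data for six subjects (not taken from the study).
example_genotypes = {
    "snp_a": ["TT", "TC", "CC", "CC", "TC", "CC"],
    "snp_b": ["GG", "GG", "GC", "CC", "GC", "GG"],
}
example_exposure = [10.0, 11.0, 30.0, 29.0, 12.0, 31.0]
print(best_split(example_genotypes, example_exposure))
```

A full tree repeats this search inside each resulting node until a depth limit is reached; the final nodes then report the mean exposure of the subjects they contain.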
Genetic variants associated with tacrolimus Cmax and AUClast identified by decision tree.
| Gene | SNP | Location | Reference Allele | Variant Allele | Reference Allele Frequency (1000 Genomes *) | Reference Allele Frequency (Our Data **) | Variant Allele Frequency (1000 Genomes *) | Variant Allele Frequency (Our Data **) |
|---|---|---|---|---|---|---|---|---|
| | | Splice acceptor | T | C | 0.379 | 0.253 | 0.621 | 0.747 |
| | | Exon | T | C | 0.239 | 0.136 | 0.761 | 0.864 |
| | | 3′UTR | G | C | 0.698 | 0.370 | 0.302 | 0.630 |
Abbreviations: Cmax, maximum plasma concentration; AUClast, area under the concentration curve from time zero to the last quantifiable time point; SNP, single nucleotide polymorphism. The allele frequency was calculated using the 1000 Genomes Project * data and our data **. SNP data were retrieved from dbSNP. *** Cmax only.
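For a biallelic SNP, the reference and variant allele frequencies sum to 1, which is why each row of the table pairs values such as 0.253 and 0.747. A minimal sketch of how such frequencies can be computed from diploid genotype calls follows; the function name and the genotype calls are hypothetical, and the authors' actual pipeline is not described here.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} from diploid genotype strings such as "TC".

    Each subject contributes two alleles, so the denominator is
    twice the number of subjects.
    """
    counts = Counter()
    for g in genotypes:
        if len(g) != 2:
            raise ValueError(f"expected a two-letter diploid genotype, got {g!r}")
        counts.update(g)  # adds one count per allele letter
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical calls for eight subjects (not taken from the study).
calls = ["TT"] + ["TC"] * 2 + ["CC"] * 5
print(allele_frequencies(calls))
```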
Top four genetic variants for tacrolimus Cmax and AUClast identified in the random forest analysis.
| Gene | SNP and Genotype | Location | Reference Allele | Variant Allele | Reference Allele Frequency (1000 Genomes *) | Reference Allele Frequency (Our Data **) | Variant Allele Frequency (1000 Genomes *) | Variant Allele Frequency (Our Data **) | Importance |
|---|---|---|---|---|---|---|---|---|---|
| Cmax | | | | | | | | | |
| | | Splice acceptor | T | C | 0.379 | 0.253 | 0.621 | 0.747 | 0.28524489 |
| | | Intron | G | A | 0.517 | 0.525 | 0.483 | 0.475 | 0.14800742 |
| | | Exon | C | G | 0.627 | 0.358 | 0.373 | 0.642 | 0.13512953 |
| | | 3′UTR | G | C | 0.698 | 0.370 | 0.302 | 0.630 | 0.11857793 |
| AUClast | | | | | | | | | |
| | | Splice acceptor | T | C | 0.379 | 0.253 | 0.621 | 0.747 | 1.5377314 |
| | | Intron | G | A | 0.517 | 0.525 | 0.483 | 0.475 | 0.3333521 |
| | | Exon | T | C | 0.239 | 0.136 | 0.761 | 0.864 | 0.1921316 |
| | | Exon | C | T | 0.678 | 0.710 | 0.322 | 0.290 | 0.1419874 |
Abbreviations: Cmax, maximum plasma concentration; AUClast, area under the concentration curve from time zero to the last quantifiable time point; NA, not applicable. The allele frequency was calculated using the 1000 Genomes Project * data and our dataset **.
Genetic variants with a coefficient >0 for tacrolimus Cmax and AUClast in the least absolute shrinkage and selection operator (LASSO) models.
| Gene | SNP | Location | Reference Allele | Variant Allele | Reference Allele Frequency (1000 Genomes *) | Reference Allele Frequency (Our Data **) | Variant Allele Frequency (1000 Genomes *) | Variant Allele Frequency (Our Data **) | Coefficient |
|---|---|---|---|---|---|---|---|---|---|
| Cmax | | | | | | | | | |
| | | Splice acceptor | T | C | 0.379 | 0.253 | 0.621 | 0.747 | 0.13331 |
| | | Intron | T | C | 0.270 | 0.519 | 0.730 | 0.481 | 0.07863 |
| | | Exon | G | A, T | 0.323 | 0.025 | 0.677 | 0.975 | 0.07224 |
| AUClast | | | | | | | | | |
| | | Splice acceptor | T | C | 0.379 | 0.253 | 0.621 | 0.747 | 0.36133 |
Abbreviations: Cmax, maximum plasma concentration; AUClast, area under the concentration curve from time zero to the last quantifiable time point. The allele frequency was calculated using the 1000 Genomes Project * data and our dataset **.
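LASSO adds an L1 penalty to least squares, which shrinks some coefficients exactly to zero, so only a few variants keep a non-zero coefficient. The sketch below shows that mechanism with a plain coordinate-descent solver for the objective (1/(2n))·‖y − Xβ‖² + λ‖β‖₁. It is an illustration under stated assumptions, not the software or tuning used in the study: the 0/1/2 variant-allele coding, the penalty `lam`, and the data are all hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO on centered data (no intercept returned)."""
    n, p = len(X), len(X[0])
    # Center features and response so the intercept drops out.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]

    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(r[j] ** 2 for r in Xc) / n
            if z == 0.0:
                continue  # constant column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                r[j] * (yi - sum(r[k] * beta[k] for k in range(p) if k != j))
                for r, yi in zip(Xc, yc)
            ) / n
            beta[j] = soft_threshold(rho, lam) / z
    return beta


# Hypothetical genotypes coded as variant-allele counts for two SNPs.
X = [[0, 1], [1, 0], [2, 1], [0, 0], [1, 2], [2, 2]]
y = [1.0, 2.1, 2.9, 1.1, 2.0, 3.0]
print(lasso_coordinate_descent(X, y, lam=0.1))
```

In this toy data the first SNP keeps a non-zero coefficient while the second is shrunk to zero; raising `lam` far enough removes both.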
Figure 2. Duplexes identified by in silico analysis between a microRNA (miR) and rs1060253 of SLC7A5 (left: reference allele; right: variant allele) for hsa-miR-301a-3p (top) and hsa-miR-301b-3p (bottom). The shaded areas denote the seed regions of miR-301a-3p and miR-301b-3p. The circles mark the reference and variant nucleotides of rs1060253.