
Review on Databases and Bioinformatic Approaches on Pharmacogenomics of Adverse Drug Reactions.

Hang Tong^1,2, Nga V T Phan^1,2, Thanh T Nguyen^3, Dinh V Nguyen^4,5, Nam S Vo^3, Ly Le^1,2,3.

Abstract

Pharmacogenomics has been used effectively to study adverse drug reactions by determining the person-specific genetic factors associated with individual responses to a drug. Current approaches have revealed the importance of sequencing technologies and sequence analysis strategies for interpreting the contribution of genetic variation to the development of adverse reactions. Advances in next-generation sequencing platforms bring new opportunities to validate genetic candidates for certain reactions, and could be used to develop preemptive tests that predict how a person's genetic variation will affect their response to a drug. With the large amount of data accumulated recently, in silico approaches based on data analysis and modeling have become another important alternative, substantially supporting the final decisions in translating research into clinical applications such as the diagnosis and treatment of various types of adverse responses.


Keywords: adverse drug reactions; candidate gene approach; genome-wide association study; next generation sequencing; pharmacogenomics

Year: 2021 PMID: 33469342 PMCID: PMC7812041 DOI: 10.2147/PGPM.S290781

Source DB: PubMed Journal: Pharmgenomics Pers Med ISSN: 1178-7066

Adverse Drug Reactions

Adverse drug reactions (ADRs) are defined as adverse events that occur in patients after they take certain drugs during clinical treatment.1,2 ADRs can damage almost every organ, but the most frequent targets are the skin, blood, and liver.3–16 These reactions have been reported to affect 10–20% of inpatients and about 25% of outpatients,17–19 and they have become a major burden on healthcare globally. Edwards and Aronson classified ADRs into six types, A–F (Table 1), of which the two major types are caused by pharmacologic (type A) and immunologic (type B) effects. Some rare ADRs result from a combination of drug metabolism and immunogenic responses.20 Type A is the most common type of ADR and is driven by pharmacological processes, including drug metabolism and transport. This type is dose-dependent and predictable, so it can be managed by adjusting the drug dose. Its incidence depends largely on the activity of Phase I and Phase II liver enzymes such as cytochrome P450s and glutathione transferases. Most type B ADRs are drug hypersensitivity reactions (DHRs), which account for about 20% of all ADR cases and are driven mostly by immune-system factors such as human leukocyte antigens (HLAs). These reactions may occur at doses far below the normal dose and are classified according to their underlying mechanisms. DHRs are more often categorized as immediate or delayed reactions according to their time of onset.21 Although the other four types of ADRs were also classified according to the factors involved and the systemic response observed with monitoring methods, the ADRs most commonly reported in the literature are type A (on-target or intrinsic) and type B (off-target or idiosyncratic). ADR symptoms range from mild, such as dizziness, to severe syndromes or death, causing considerable discomfort during treatment.
In some situations, the drug must be withdrawn and treatment switched to a new therapy. Accurate diagnosis is therefore vital to save patients and reduce the financial burden. Pharmacogenomic studies of gene–drug relationships provide the knowledge needed to assess genetic factors before a drug is prescribed.

Table 1

ADR Classification

Type	Clinical Characteristics	Examples	Drugs
A	Pharmacological effect; predictable; dose-dependent	Thrombolysis; serotonin syndrome	Antiplatelets; digoxin
B	Immune-mediated or non-immune-mediated; unpredictable; dose-independent	MPE; SCARs	Beta-lactams; anticonvulsants
C	Mixed pharmacological-immunological effect; chronic; related to cumulative dose; manageable by withdrawal	Hypothalamic-pituitary-adrenal axis suppression	Corticosteroids
D	Dose-related; not manageable by withdrawal	Teratogenesis	Diethylstilbestrol
E	Withdrawal effect; manageable by slow withdrawal or reintroduction	Opiate withdrawal syndrome	Opiates
F	Failure of therapy; dose-related; often caused by drug interactions; manageable by changing the dose	Inadequate dosage of an oral contraceptive	Contraceptive drugs


Genetic Predisposition of ADRs

Advances in pharmacogenomics and immunogenomics have revealed the involvement of multiple molecular factors in drug response, giving rise to the concept of drug–gene relationships and to insight into response mechanisms. Pharmacogenomic approaches were developed to characterize the presence of genes in a particular group of samples as well as their gene products under certain conditions. Upon exposure to a drug, a particular set of genes is expressed, producing products that can be assessed. The completion of human genome sequencing has greatly helped to elucidate the relationship between a person’s genome and drug response. Drug-metabolizing cytochrome P450 enzymes (CYPs), especially members of families 1, 2, and 3, play an important role in drug metabolism and toxicity. CYP2D6, CYP2C19, and CYP2C9 are involved in the metabolism of about 80% of therapeutic drugs used today, so variation in these genes accounts for most of the type A ADRs described in the literature. In clinical practice, CYP2D6 biomarkers have been linked to about 18% of the ADR cases reported and highlighted by the FDA.22 Variation in this gene is ethnicity-specific and requires validation in each population. In drug hypersensitivity, observational studies of patients treated with certain drug groups have shown strong associations between HLA class I and II genes and the incidence of disease.
For example, HLA-B*15:11 is well known to be linked with carbamazepine-induced SCARs in Japanese and Korean people,23,24 whereas in Southeast Asian countries, SCARs to the same drug occur predominantly in carriers of the HLA-B*15:02 allele.25–28 The antiretroviral abacavir, on the other hand, is more consistently linked with hypersensitivity in patients carrying HLA-B*57:01 across multiple populations of different origins.29–38 The growing size and abundance of data accumulated from such studies now make data-mining research feasible for applications.39,40 This review presents the features of current approaches and data archives, and describes the public availability of the data obtained.

Pharmacogenomics Approaches to Study ADRs

Studies on Candidate Target Genes: Replication Approach

More than 95% of ADR investigations in Asia use replication approaches,41 and a similar situation is seen in other regions. These studies are based on a case–control design in which people using the same drug(s) are recruited. Subjects who develop adverse reactions are carefully characterized and set as cases, whereas controls are those who take the same drug and handle it normally, without adverse reactions. Key variants showing strong association parameters in a genome-wide association study (GWAS) are selected for replication. The designated genes or variants may also be chosen from previous investigations in closely related populations. The replication approach can be used to assess the prevalence of genes in certain groups of patients in an association study, and to validate a marker in multiple samples. Replication schemes are therefore designed flexibly, combining polymerase chain reaction (PCR)-based techniques, such as PCR followed by restriction fragment length polymorphism (RFLP) analysis, conventional PCR with sequencing, or RT-PCR, with in silico techniques.24,42–45 Compared with next-generation sequencing, this method is much cheaper and affordable for research, and it can be scaled up and turned into an application with little optimization.46,47 Several genetic risk factors for drug responses were discovered with the candidate gene approach. Abacavir was one of the first drugs for which the FDA issued a caution for patients carrying the HLA-B*57:01 allele (released in 2017). The allele was found more frequently in patients with abacavir-induced hypersensitivity across continents.30,34–36,48–50 The PCR method, using sequence-specific oligonucleotides followed by sequencing, was simply repeated in several sample sets. The consistent association between abacavir hypersensitivity and HLA-B*57:01 led to the FDA instruction and guideline to perform HLA-B*57:01 testing before this drug is taken.
Another example is the antiepileptic drug carbamazepine, which causes highly variable reactions in different ethnic groups. Using PCR-based genotyping, HLA-B*15:02 was shown to be associated with SJS/TEN in South and Southeast Asian populations, including Han Chinese, Thai, Vietnamese, and Indian,25–28,35,51,52 whereas in East Asians such as Koreans and Japanese, the disease is tightly linked with HLA-B*15:11.23,24 In addition to these main alleles, HLA-A*31:01 is also associated with carbamazepine-induced SCAR. A detection test for these alleles has been developed and is promising as a diagnostic method for screening carbamazepine-sensitive patients before prescription.53,54 Using PCR in case–control designs, several gene–drug associations have been revealed, providing the basis for linking syndromes to drugs with amplification methods (Table 2, Table 3).
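The case–control comparisons above reduce to a 2×2 table of allele carriers and non-carriers among cases and drug-tolerant controls. A minimal Python sketch of how such a table might be summarized with an odds ratio and a two-sided Fisher's exact test is shown below; the function names are ours, and the counts in the usage lines are hypothetical, not data from the cited studies.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 carrier table.

    a = carrier cases, b = non-carrier cases,
    c = carrier controls, d = non-carrier controls (drug tolerant).
    """
    if b * c == 0:
        return float("inf")  # zero in the denominator: report as infinite
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value with the table margins held fixed.

    Sums the hypergeometric probabilities of every possible table that is
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the carrier-case cell equals x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical counts: 40 of 50 cases and 5 of 100 tolerant controls carry the allele.
    print(odds_ratio(40, 10, 5, 95))
    print(fisher_exact_two_sided(40, 10, 5, 95))
```

In practice a library routine such as `scipy.stats.fisher_exact` would normally be used; the hand-written version only makes the calculation visible.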

Table 2

Selected Genes in Replication Approaches

Drug	Gene	Disease	Population	Methods	Ref.
Abacavir	HLA-B*57:01	Hypersensitivity	Central Costa Rica, Australia, Italy, Argentina	PCR, sequencing	30,34–36,48–50
Nevirapine	CYP2B6, TRAF3IP2	Hypersensitivity	African	PCR	55–57
Carbamazepine	HLA-B*15:02	SCAR	Han Chinese, Thai, Vietnamese	Sequence-specific oligonucleotide reverse line blots	25,26,52,54,58
	HLA-B*15:11	SJS/TEN	Japanese		23
	HLA-A*31:01	MPE, DRESS	Han Chinese, Vietnamese	PCR	26,54
	HLA-B*51:01	MPE, DRESS	Han Chinese		26
Phenytoin	HLA-B*15:02, HLA-B*15:13	SCAR	Malaysian	PCR	59
Phenytoin	HLA-B*13:01, HLA-B*56:02/04, CYP2C19*3	SCAR	Thai	PCR	60
NSAID	ALOX15	Respiratory disease	Spanish	PCR	42
Co-trimoxazole	CYP2C9*2/*3, CYP2C19*3, and NAT2	Hypersensitivity	UK	PCR	61

Table 2

Approaches of GWAS in Drug Hypersensitivity and Outcomes

Drugs	Population	Case	Control	Associated Gene – Disease	Ref.
Antiretrovirals
Nevirapine	Saharan African	151	182	HLA-C*04:01 - SJS/TEN	73
	Thai	72	77	(rs1265112 and rs746647) within CCHCR1 – skin rash	74
Antibiotics
Sulfonamide	US	91	184	None – hypersensitivity	75
Dapsone	Chinese	39	833	HLA-B*13:01 – dapsone hypersensitivity	76
Beta-lactam (penicillin)	Spain; Italy	387; 299	1124; 362	rs4958427 of ZNF300 – penicillin allergy; rs17612 of C5; rs7754768 and rs9268832 of the HLA-DRA | HLA-DRB5 interregion; rs7192 of HLA-DRA	77
NSAID
NSAID	Spanish	112	124	None, but suggestive regions of RIMS1, BICC1, and RAD51L1 – urticaria/angioedema	78
NSAID	Han Chinese	120	101	None, but suggestive regions of ABI3BP – urticaria/angioedema	78
Aspirin	Korean	117	685	HLA-DPB1 rs1042151 - Respiratory disease	79
Aspirin	Korean			SBF1 – asthma	80
Cold medicine	Japanese	117	691	rs4917014 of IKZF1 – SCAR	81
Anticonvulsant
Lamotrigine, phenytoin	UK	4644	1296	None – hypersensitivity	82
Phenytoin	Taiwan			CYP2C9*3 - SCAR	83
Lamotrigine	Korean	34	1214	rs12668095 near CRAMP1L/TMEM204/IFT140/HN1L; rs79007183 near TNS3 – skin rash	84
Carbamazepine	Japanese	53	882	HLA-A*31:01 - SCAR	85
Carbamazepine	European	65	3987	HLA-A*31:01 – immediate and delayed hypersensitivity (including SCAR)	86
Various drugs	Caucasian	96	198	None – SJS/TEN	87
Others
Allopurinol	Japanese			HLA-B*58:01 – SJS/TEN	88
Asparaginase	US	589	3308	rs6021191 in NFATC2; rs17885382 in HLA-DRB1	89

Accumulated evidence from individual studies across populations reveals geographic differences in genetic variability. The association of a gene with a drug response varies between populations, along with differences in allele frequency, haplotype frequency, and linkage disequilibrium. Many diseases are multifactorial and require the combined interpretation of many genes. A meta-analysis of whole-exome data from six ethnic populations in different parts of the world shows consistent findings in drug-related genes: about half of the functional variants in these genes are unique to one of the six populations studied.62 In addition, among the drug-related genes analyzed, CYPs and phase II enzymes show the largest differences in cumulative allele probability (CAP), indicating variation in the probability that a functional variant affects drug-response phenotypes. Although the analysis was based only on exome data, its main implication is that each country needs to build its own pharmacogenomic database. Such data would bring us closer to personalized medicine, in which doctors can more precisely predict the effect of a drug on a patient and choose the best medication for that person.
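The exact formula behind CAP is not given here. As an illustrative assumption only (not necessarily the definition used in ref 62), the sketch below shows one simple way allele frequencies from different populations could be turned into a per-person probability of carrying at least one functional variant, assuming Hardy–Weinberg equilibrium and independence between variants. The function names are ours and the frequencies in the usage lines are hypothetical.

```python
from math import prod


def carrier_probability(allele_freq: float) -> float:
    """P(person carries >= 1 copy of an allele) under Hardy-Weinberg equilibrium."""
    return 1.0 - (1.0 - allele_freq) ** 2


def prob_any_functional(allele_freqs) -> float:
    """P(person carries >= 1 of several functional alleles).

    Simplifying assumptions (ours): HWE for every variant and independence
    between variants, so the person carries none of them with probability
    prod((1 - f)^2).
    """
    return 1.0 - prod((1.0 - f) ** 2 for f in allele_freqs)


if __name__ == "__main__":
    # Hypothetical functional-allele frequencies for one gene in two populations.
    population_a = [0.10, 0.02]
    population_b = [0.01, 0.005]
    print(round(prob_any_functional(population_a), 3))
    print(round(prob_any_functional(population_b), 3))
```

Even under these crude assumptions, the same gene can give very different per-person probabilities in two populations, which is the practical argument for population-specific databases.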

Genome-Wide Association Studies

Genome-wide association study (GWAS) has proven to be one of the most effective approaches to screening for risk factors associated with a given disease.63 To date, 4054 publications and almost 140,000 associations have been published in the GWAS Catalog.64 These studies are designed with a case–control model followed by next-generation sequencing and SNP calling against the reference genome. At this stage, an enormous number of SNPs is obtained and passed on for association analysis. Because GWAS evaluates a large number of SNPs, it requires a much larger number of samples to achieve statistical reliability.65–69 The relationship between the number of SNPs tested and the required sample size can be calculated: testing a single SNP marker is estimated to require 248 cases, whereas testing 500,000 SNPs and 1 million markers requires 1206 and 1255 cases, respectively, assuming an odds ratio of 2, 5% disease prevalence, 5% minor allele frequency, complete linkage disequilibrium (LD), a 1:1 case/control ratio, and a 5% error rate in an allelic test.70 Recruiting large samples requires substantial funding as well as expertise in data analysis. For this reason, GWAS is preferably used for emerging diseases or complex traits that cannot be explained by a single gene. Specific variants have been found in different cohorts, showing that risk factors can be thoroughly scanned and validated. GWAS can be integrated with pharmacodynamic studies of drug response in either prospective or retrospective designs, in which the suspected drug is given to individuals at common or tested doses.
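The sample-size figures above come from a power calculation (ref 70). A rough Python sketch of that kind of calculation is given below, using the standard normal approximation for comparing allele frequencies between cases and controls with a Bonferroni-corrected significance level. It assumes 80% power (the text does not state the power used) and ignores disease prevalence, so it will not reproduce the 248/1206/1255 figures exactly; it is meant only to show the same trend of required sample size growing with the number of SNPs tested. The function name and parameters are ours.

```python
import math
from statistics import NormalDist


def cases_needed(maf: float, odds_ratio: float, alpha: float = 0.05,
                 power: float = 0.8, n_tests: int = 1) -> int:
    """Approximate number of cases (1:1 controls) for a two-sided allelic test.

    Assumptions (ours): the tested SNP is the causal one (complete LD), the
    control allele frequency equals the MAF, and alpha is Bonferroni-corrected
    as alpha / n_tests.
    """
    norm = NormalDist()
    p0 = maf                                            # control allele frequency
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)   # case allele frequency implied by the OR
    p_bar = (p0 + p1) / 2
    z_alpha = norm.inv_cdf(1 - (alpha / n_tests) / 2)
    z_beta = norm.inv_cdf(power)
    n_alleles = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2 / (p1 - p0) ** 2
    return math.ceil(n_alleles / 2)                     # two alleles per person


if __name__ == "__main__":
    for n in (1, 500_000, 1_000_000):
        print(n, cases_needed(maf=0.05, odds_ratio=2, n_tests=n))
```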
People are recruited based on their response; their genotypes are scanned for all SNPs, and phenotypes are observed from each individual's response to the drug.71 In the common design for ADR studies, the selected patients share a similar phenotype showing the characteristics of the same disease in response to a certain drug, and they are compared with control subjects who take the same drug and improve without adverse reactions (drug tolerant). Statistical analysis is then performed to find associations based on genome-wide significance, predictive values, and other measures.72 Anticonvulsants, antimicrobials, and NSAIDs are among the drugs that most often cause ADRs, and were among the first drugs studied by GWAS (Table 2). Several factors have been discussed for improving the significance of GWAS findings. Sample size is one of the most important: in the standard approach described above, the number of subjects recruited should be large enough to cover the number of possible variants.90 However, for many rare syndromes, recruiting an adequate sample within a given time seems impossible. Such studies therefore require consecutive research phases, in which sample collection must be completed before all other research steps. Multiple attempts have been made to improve the reliability and accuracy of genome-wide analysis even with a small reference population.91 These studies adjusted the model and, after testing it on a population reference data set, found that theory and empirical simulation results agreed well for population samples with a high degree of relatedness. The results suggested that higher significance would be obtained in subjects of the same ethnic origin than in mixed groups such as those pooled in meta-analyses.
Accordingly, accuracy and significance at the genome-wide scale can still be obtained even in cohorts with small numbers of subjects (as few as 50). Besides sample size, study design is another important factor in the success of GWAS. Some studies, such as several of the examples above, cannot reach the genome-wide significance level assigned by the International HapMap Consortium,92 which assumes an OR of 2, MAF of 5%, disease prevalence of 5%, and complete linkage disequilibrium. This outcome may be due to low allele frequency. The suggested p-value threshold of 5 × 10⁻⁸ is valid for common variants with MAF ≥ 5%. For variants with lower MAF, such as 1%, 0.5%, or 0.1%, the model gives genome-wide significance thresholds of 3 × 10⁻⁸, 2 × 10⁻⁸, and 1 × 10⁻⁸, respectively. Including LD was shown to be unnecessary, as the model can analyze all variants even when they are in complete linkage disequilibrium (r² = 1).93
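The MAF-dependent thresholds quoted above can be applied as a simple lookup. The sketch below hard-codes only the four values given in the text; how MAFs falling between those points should be handled is not stated, so rounding down to the next lower quoted MAF (which gives a stricter threshold) is our assumption, as are the function names.

```python
# (minimum MAF, genome-wide p-value threshold), as quoted in the text (ref 93).
THRESHOLDS = [(0.05, 5e-8), (0.01, 3e-8), (0.005, 2e-8), (0.001, 1e-8)]


def genome_wide_threshold(maf: float) -> float:
    """Return the threshold for the highest quoted MAF not exceeding `maf`."""
    for min_maf, threshold in THRESHOLDS:
        if maf >= min_maf:
            return threshold
    raise ValueError("MAF below 0.1% is outside the range quoted in the text")


def is_genome_wide_significant(p_value: float, maf: float) -> bool:
    return p_value < genome_wide_threshold(maf)


if __name__ == "__main__":
    print(is_genome_wide_significant(4e-8, maf=0.20))   # True: common variant, 5e-8 applies
    print(is_genome_wide_significant(4e-8, maf=0.008))  # False: 2e-8 applies
```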

Data Mining and in silico Approach

Although replication approaches have contributed to massive data accumulation by scanning many cohorts, adding to a better understanding of population genetics, their research design requires careful consideration in several respects. First, the candidate gene should be selected based on the combined results of genome scanning and/or genes tested in the whole population, because a strong association of a single gene with a disease may not fully explain the cause–effect relationship. Pan reported an HLA-B*15:02-negative case in the Taiwanese population who developed hypersensitivity to carbamazepine,47 suggesting that genetics explains only part of the development of a common disease. In a Thai cohort, HLA screening could have protected only 22% of patients with SCAR (mainly allopurinol- and carbamazepine-induced), whereas an assay of drug-induced IFN-γ-producing cells detected approximately 46% of SCAR patients as positive.94 Second, because metabolism is complex, studying the mechanism in a real system is sometimes not possible, and a virtual platform can be a good alternative. An in silico study of carbamazepine and SCAR using pooled data from Asian populations showed that the polymorphic alleles themselves may not be sufficient to explain the clinical outcomes; instead, protein-level information or a combination of multiple factors can increase the predictive value. For example, when the available HLA-B75 family proteins were aligned on the crystal structure of HLA-B*15:01, all except HLA-B*15:21 superimposed well on one another. Molecular docking of carbamazepine into the antigen-presenting sites of the tested HLAs can identify the drug-binding amino acids on the HLA protein.
This finding explains the reported cases in which HLA-B*15:02-negative but HLA-B*15:21-positive patients still developed SCAR when taking carbamazepine.39 Because both alleles belong to the HLA-B75 family, carrying either one can cause hypersensitivity. Further investigations combining gene detection and in silico studies have been carried out for dapsone and NSAIDs.95,96
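The HLA-B75 observation suggests screening at the level of an allele family rather than a single allele. The sketch below illustrates the idea; its member set contains only the two alleles named in this section and is deliberately incomplete, and the function names are hypothetical. A real screen would need a curated family list and would not replace clinical judgment.

```python
# Alleles named in this section as HLA-B75 family members.
# Deliberately incomplete: NOT a full B75 membership list.
B75_EXAMPLES = {"B*15:02", "B*15:21"}


def to_two_fields(allele: str) -> str:
    """Normalize e.g. 'HLA-B*15:21:01' to 'B*15:21' (two fields, 4-digit)."""
    name = allele[4:] if allele.startswith("HLA-") else allele
    gene, _, fields = name.partition("*")
    return gene + "*" + ":".join(fields.split(":")[:2])


def b75_alleles_carried(hla_b_genotype) -> list:
    """Return the listed B75 alleles present in a patient's HLA-B calls."""
    return sorted({to_two_fields(a) for a in hla_b_genotype} & B75_EXAMPLES)


if __name__ == "__main__":
    # Hypothetical patient: negative for B*15:02 but positive for B*15:21.
    print(b75_alleles_carried(["HLA-B*15:21:01", "HLA-B*40:01"]))  # ['B*15:21']
```

A single-allele test for B*15:02 would report this hypothetical patient as negative; the family-level check still flags them.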

Analyzing ADRs Using Next Generation Sequencing Data

Data Generation and Collection

In recent years, high-throughput technologies have accelerated genomics/pharmacogenomics studies and produced large-scale data. Next-generation sequencing (NGS) technologies, such as the Ion Torrent and Illumina platforms, are becoming the most common way to obtain genomic data. These technologies enable sequencing on a massive scale at low cost and high quality, making genome-wide studies in large cohorts possible. Recently, third-generation sequencing (TGS) technologies, such as those from Pacific Biosciences and Oxford Nanopore Technologies, have also been exploited in genomics/pharmacogenomics studies.97–101 They can produce reads tens to even hundreds of thousands of bases long, although at higher cost and with somewhat lower quality than second-generation technologies. By generating long reads, TGS has clear advantages in resolving highly repetitive or polymorphic regions, which are common in pharmacogenomics and immunogenomics studies.

Available Databases for ADRs Studies

Large-scale data from pharmacogenomics studies have been collected and managed by large consortia and networks such as the Clinical Pharmacogenetics Implementation Consortium (CPIC), the Pharmacogenomics Research Network (PGRN), and the Southeast Asian Pharmacogenomics Research Network (SEAPHARM). More details of these consortia and networks are shown in Table 3. Recent work such as the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM)102 provides clinical data sources, such as electronic health records (EHRs), from which ADR-related information can be extracted. Specific databases such as the FDA Adverse Event Reporting System (FAERS), the Side Effect Resource (SIDER), and the Healthcare Cost and Utilization Project (HCUP) provide public datasets that can be used to analyze ADRs and help control adverse drug events (ADEs).103,104 Many databases have now been integrated into larger ones such as PharmGKB.105 PharmGKB also provides datasets and annotations for drugs that have shown adverse reactions, many of which have been pharmacogenetically tested. Others, such as CTD,106 KEGG,107 and SuperTarget,108 provide non-clinical data on metabolic pathways, drug metabolism, interactions within protein molecular structures, and drug–drug and drug–gene associations. In addition, immunogenomics databases such as HLA-ADR provide allele frequencies and haplotypes of HLA genes that have been associated with ADRs.109 These databases can help minimize ADRs in drug design or patient treatment.110,111 General databases such as dbSNP and the Database of Genomic Variants can also be used as references for pharmacogenomics studies, eg, to check allele frequency, genotype, and annotation. Figure 1 shows an overview of data, methods, and resources for ADR studies.

Table 3

PGx Consortia and Networks

PGx Consortia and Networks	Description	URL
PharmGKB	PharmGKB is a database that collects and curates knowledge of human genetic variation on drug responses. The database provides information about 706 annotated drugs and 149 curated pathways. There are currently 155 annotations for clinical guidelines and 753 annotations for drug labeling. The database also provides 4,570 clinical annotations and 23,938 variant annotations (accessed on July 15th).	https://www.pharmgkb.org/
Clinical Pharmacogenetics Implementation Consortium (CPIC)	CPIC is an organization that provides updated assessments of clinical findings and laboratory data in the field of pharmacogenomics. CPIC has provided 24 guidelines covering 20 genes and 62 drugs to address and break down barriers to clinical implementation, move beyond a “one size fits all” approach, and optimize drug use in precision medicine (accessed on July 15th).	https://cpicpgx.org/
Dutch Pharmacogenetics Working Group (DPWG)	The DPWG consortium aims to provide well-established PGx clinical testing that translates genotype to phenotype. With more than 90 clinical guidelines, annotations validated by the DPWG can help formulate hypotheses that support clinical implementation or in silico pharmacogenomics studies.	https://www.pharmgkb.org/page/dpwg
Ubiquitous Pharmacogenomics (U-PGx)	U-PGx was established by European experts to implement a pre-emptive pharmacogenomics approach. A panel of 13 PGx genes with 50 variants helps to study the genetic factors that influence the patient’s response to medication, with the aim of improving the quality of life, reducing costs, and giving better results for patients.	http://upgx.eu
ClinGen PGx Working Group	A data center that supports clinical practitioners and researchers with genomic and phenotypic information to help interpret genetic factors. There are currently more than 1750 curated genes, 50 expert groups, and 11,413 experts contributing to the development of bioinformatics tools and to increasing accuracy in the healing process.	https://www.clinicalgenome.org/working-groups/
PGRN-RIKEN	PGRN-RIKEN is a collaboration between Pharmacogenomics Research Network (PGRN) and RIKEN in the use of patient samples and drug response for collaborative research, involving adverse drug response.	https://www.pgrn.org/pgrn-riken.html
Canadian Pharmacogenomics Network for Drug Safety (CPNDS)	CPNDS was founded in 2004 with the goal of building guidelines related to PGx response and ADRs (8 guidelines, updated 07/03/2020). It studies and assesses risks related to genetic factors and develops PGx clinical implementation tools to support and optimize drug use.	http://cpnds.ubc.ca/
PharmVar	PharmVar is a repository of pharmacogenomic variation that supports haplotype and allele definitions, focusing on the human cytochrome P450 families and NUDT15. It is a comprehensive database that provides information to the Pharmacogenomics Knowledgebase (PharmGKB) and the Clinical Pharmacogenetics Implementation Consortium (CPIC).	https://www.pharmvar.org/
European Pharmacogenetic Implementation Consortium (EU-PIC)	A group from many European countries working to translate pharmacogenomic guidelines into clinical care to improve treatment.	http://www.eu-pic.net
Southeast Asian Pharmacogenomics Research Network (SEAPHARM)	SEAPHARM was established to enable PGx research among the various communities within, but not limited to, countries in Southeast Asia, with the ultimate goal of supporting PGx implementation in the region.	http://www.pharmagtc.org/seapharm/
Database of Genomic Variants (DGV)	DGV provides archiving, accessioning, and distribution of publicly available genomic structural variants in all species.	https://www.ebi.ac.uk/dgva/
dbSNP	dbSNP contains human single nucleotide variations, microsatellites and small scale insertions and deletions along with publications, allele frequencies, molecular sequences and genomic mapping information for both common variation and clinical mutations.	https://www.ncbi.nlm.nih.gov/snp/

Figure 1

Pharmacogenomics for ADRs: networks, data, and pipelines.


Data Analysis

With advances in high-performance computing and big data analytics, we can take advantage of large-scale chemical, biological, and biomedical information to discover the complex genetic mechanisms of ADRs. Nevertheless, while sequencing data are now easier and cheaper to produce, analyzing such data is still the bottleneck. In the following paragraphs, we focus on two types of analyses: detecting genomic/pharmacogenomic variants and haplotypes, and annotating them for ADR studies. State-of-the-art bioinformatics tools/pipelines for such analyses and emerging machine learning techniques for classifying genomic variants into ADRs are summarized in Table 4.

Table 4

Common Tools Used for Data Analysis

Tool	Description	URL
PGx variant and haplotype calling
GATK	A standard tool for variant calling; supports whole genomes/exomes, gene panels, RNA-seq, and targeted sequencing.	https://gatk.broadinstitute.org/hc/en-us
BWA	A standard tool for aligning short genomic sequences to large reference sequences such as human genome.	http://bio-bwa.sourceforge.net/bwa.shtml
DeepVariant	A variant calling tool which applies convolutional neural network approach for identifying genomic variants.	https://github.com/google/deepvariant/
Novoalign	An accurate tool for aligning short reads to large reference genomes	http://www.novocraft.com/
Astrolabe	A tool for star allele calling which was initially developed for the CYP2D6 gene, then extended to CYP2C9 and CYP2C19 and other genes	https://www.nature.com/articles/npjgenmed201639
Stargazer	A tool for calling star alleles (haplotypes) in PGx genes using data from NGS or SNP array.	https://stargazer.gs.washington.edu/stargazerweb/
HLA typing
Seq2HLA	RNA-seq; iterative allele inference (greedy); 4-digit resolution	https://bio.tools/seq2hla
Kourami	WGS; discovery of novel alleles; up to 6-digit resolution	https://bio.tools/kourami
Polysolver	WES; k-mer seeding to get HLA reads; Bayesian inference to select the best alleles; up to 6-digit resolution	https://github.com/researchapps/polysolver
HLA-HD	WGS, WES, RNA-seq; discovery of novel alleles; up to 6-digit resolution	https://www.genome.med.kyoto-u.ac.jp/HLA-HD/
OptiType	RNA-seq, WGS, WES; integer linear programming (ILP) on aligned reads to select the best alleles; 4-digit resolution	https://github.com/FRED-2/OptiType
PGx annotation
VEP	A tool for predicting the effects of variants, including SNPs, indels, CNVs, and structural variants; works with genes, transcripts, protein sequences, and regulatory regions.	https://asia.ensembl.org/info/docs/tools/vep/index.html
Annovar	An annotation tool that identifies protein-coding changes caused by SNVs and CNVs	http://wannovar.wglab.org/
SnpEff/SnpSift	SnpEff is a tool for variant effect annotation and prediction, particularly on genes and proteins. SnpSift is a tool for genomic variant annotation, using annotated databases. The latter is often used after the former to find the most significant variants.	https://pcingola.github.io/SnpEff/
Intervar	A tool for clinical interpretation of genetic variants based on ACMG/AMP guidelines.	http://wintervar.wglab.org/
PharmCAT	A tool for translation of genotype to phenotype using genotyping and sequencing data based on CPIC guidelines.	http://pharmcat.org/


Genetic Variant Calling

The GATK best practices112 are often recommended for variant calling on NGS data, together with BWA-MEM113 for read alignment. However, DeepVariant114 can complement or, in some cases, replace the GATK variant caller, and Novoalign can likewise complement or replace BWA-MEM to improve performance. This strategy can be widely applied to WGS, WES, or gene-panel sequencing data. The called variants often serve as the starting point for downstream analyses of ADRs.
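The alignment-then-calling strategy described above can be sketched as a short command chain. The Python sketch below only builds and optionally runs shell commands; it assumes `bwa`, `samtools`, and GATK4 (`gatk`) are installed, that the reference has already been indexed (`bwa index`, `samtools faidx`, and a sequence dictionary), and that file names contain no spaces. It omits base quality score recalibration and joint genotyping, so it is a simplified outline, not the full GATK Best Practices workflow; `build_commands` and `run` are hypothetical helper names.

```python
import subprocess


def build_commands(sample: str, ref: str, fq1: str, fq2: str, threads: int = 4) -> list:
    """Build a simplified BWA-MEM + GATK4 germline calling command chain."""
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"  # bwa expands the \t itself
    return [
        # 1. Align paired-end reads and attach a read group (required by GATK).
        f"bwa mem -t {threads} -R '{read_group}' {ref} {fq1} {fq2} > {sample}.sam",
        # 2. Coordinate-sort the alignments.
        f"samtools sort -@ {threads} -o {sample}.sorted.bam {sample}.sam",
        # 3. Mark PCR/optical duplicates.
        f"gatk MarkDuplicates -I {sample}.sorted.bam -O {sample}.dedup.bam "
        f"-M {sample}.dup_metrics.txt",
        # 4. Index the BAM for random access.
        f"samtools index {sample}.dedup.bam",
        # 5. Call SNVs and small indels.
        f"gatk HaplotypeCaller -R {ref} -I {sample}.dedup.bam -O {sample}.vcf.gz",
    ]


def run(commands, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(cmd)
        if not dry_run:
            subprocess.run(cmd, shell=True, check=True)  # shell=True for the '>' redirect


if __name__ == "__main__":
    run(build_commands("S1", "ref.fa", "S1_R1.fq.gz", "S1_R2.fq.gz"))
```

Swapping in DeepVariant or Novoalign would mean replacing step 5 or step 1, respectively, while the rest of the chain stays the same.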

PGx Haplotyping

Because of the complexity of PGx regions, general haplotype callers often have limitations in determining PGx star alleles.101 As a result, specific tools such as Astrolabe,115 Cyrius,116 Aldy,117 and Stargazer118 are often used. However, most current PGx haplotyping tools are still limited to detecting star alleles in a subset of PGx genes; eg, Cyrius works only on cytochrome P450 2D6 (CYP2D6), while Stargazer covers only 51 PGx genes. More comprehensive and accurate tools are still urgently needed.
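To show what a star-allele caller does at its simplest, the sketch below assigns CYP2C19 star alleles to two already-phased haplotypes by matching core SNPs. It is a toy: it uses one core SNP per allele (rs4244285 for *2, rs4986893 for *3, rs12248560 for *17), whereas PharmVar definitions can involve more variants, and it ignores phasing, copy-number variation, and hybrid genes, which are exactly what dedicated tools such as Stargazer or Aldy must handle. The function names are ours.

```python
# Toy CYP2C19 definitions: one core SNP (ALT allele) per star allele.
CORE_SNPS = {
    "*2": {"rs4244285"},
    "*3": {"rs4986893"},
    "*17": {"rs12248560"},
}
ALL_CORE = set().union(*CORE_SNPS.values())


def call_haplotype(alt_rsids) -> str:
    """Assign a star allele to one phased haplotype from its ALT-carrying rsIDs."""
    observed = set(alt_rsids) & ALL_CORE
    if not observed:
        return "*1"  # no defining variant seen: reference (default) allele
    matches = [star for star, snps in CORE_SNPS.items() if snps <= observed]
    return matches[0] if len(matches) == 1 else "ambiguous"


def star_key(star: str) -> float:
    """Sort key so that *2 comes before *17 (numeric, not lexical, order)."""
    return int(star[1:]) if star[1:].isdigit() else float("inf")


def call_diplotype(hap1, hap2) -> str:
    alleles = sorted([call_haplotype(hap1), call_haplotype(hap2)], key=star_key)
    return "/".join(alleles)


if __name__ == "__main__":
    print(call_diplotype({"rs12248560"}, {"rs4244285"}))  # *2/*17
```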

HLA Typing

The complexity and high polymorphism of HLA regions make HLA typing challenging, especially from NGS data. HLA typing tools for NGS data such as PHLAT, Polysolver, OptiType, xHLA, and Kourami can call HLA alleles at 4-digit,119,120 6-digit,121,122 and up to 8-digit resolution from WGS/WES data. These tools are often limited to detecting known HLA alleles. For RNA-seq data, seq2HLA can achieve 4-digit resolution.123 Despite considerable effort, current HLA typing tools still have difficulty detecting novel or class II HLA alleles.
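Because tools report HLA calls at different resolutions (4-, 6-, or 8-digit, ie, two, three, or four colon-separated fields), comparing their outputs usually means truncating calls to a common resolution. A minimal sketch follows, with hypothetical function names; it does not handle expression suffixes such as N or L, nor ambiguity groups (G or P codes).

```python
def to_resolution(allele: str, digits: int) -> str:
    """Truncate an HLA allele name to 2-, 4-, 6-, or 8-digit resolution.

    Every two 'digits' correspond to one colon-separated field, so 4-digit
    keeps two fields: 'HLA-B*15:02:01:01' -> 'HLA-B*15:02'.
    """
    if digits not in (2, 4, 6, 8):
        raise ValueError("digits must be 2, 4, 6, or 8")
    gene, sep, fields = allele.partition("*")
    if not sep:
        raise ValueError(f"not an HLA allele name: {allele!r}")
    return gene + "*" + ":".join(fields.split(":")[: digits // 2])


def concordant(call_a: str, call_b: str, digits: int = 4) -> bool:
    """True if two tools' calls agree once both are truncated to `digits`."""
    return to_resolution(call_a, digits) == to_resolution(call_b, digits)


if __name__ == "__main__":
    # Hypothetical calls for the same sample from two typing tools.
    print(concordant("HLA-A*31:01:02", "HLA-A*31:01:02:01", digits=6))  # True
    print(concordant("HLA-B*15:02:01", "HLA-B*15:21", digits=4))        # False
```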

PGx Annotation

Variant annotation is the process of determining the effects of genetic variants on genes and disease.124 Annotation of PGx variants can be obtained using general variant annotation tools such as VEP, SnpEff/SnpSift, ANNOVAR, or InterVar. More informative annotation of PGx genes can be obtained using specific tools such as PharmCAT, which was built on CPIC guidelines. Such guidelines translate genotypes into phenotypes and provide prescribing recommendations based on the genotype/phenotype. Currently, the PGx guidelines provided by CPIC, DPWG, and CPNDS, together with the allele definitions curated by PharmVar, remain the gold standard. Because the number of guidelines is limited, such annotation tools can cover only a small subset of PGx genes. The current version of PharmCAT supports only 12 guidelines, whereas there are 64 very important pharmacogenes (VIPs) among more than 100 known PGx genes.
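To illustrate the genotype-to-phenotype step that guideline-based annotators such as PharmCAT automate, the sketch below translates a CYP2C19 diplotype into a metabolizer phenotype from allele function assignments. The four-allele function table and the phenotype rules are a simplified rendering of the CPIC approach, included for illustration only and not as a clinical tool; any allele outside the table is reported as "Indeterminate", mirroring the limited coverage discussed above.

```python
# Simplified allele function assignments (illustrative subset only).
ALLELE_FUNCTION = {"*1": "normal", "*2": "no", "*3": "no", "*17": "increased"}

# Phenotype for each unordered pair of allele functions (keys are sorted tuples).
PHENOTYPE = {
    ("increased", "increased"): "Ultrarapid Metabolizer",
    ("increased", "normal"): "Rapid Metabolizer",
    ("normal", "normal"): "Normal Metabolizer",
    ("increased", "no"): "Intermediate Metabolizer",
    ("no", "normal"): "Intermediate Metabolizer",
    ("no", "no"): "Poor Metabolizer",
}


def cyp2c19_phenotype(diplotype: str) -> str:
    """Map a diplotype such as '*1/*17' to a metabolizer phenotype."""
    alleles = diplotype.split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected 'A/B', got {diplotype!r}")
    functions = [ALLELE_FUNCTION.get(a) for a in alleles]
    if None in functions:
        return "Indeterminate"  # allele not covered by the table
    # Sorting makes '*17/*1' and '*1/*17' share the same lookup key.
    return PHENOTYPE[tuple(sorted(functions))]
```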

Classifying Genomic Variants into ADRs

The American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG-AMP) have developed guidelines for standardizing the classification of genomic variants (eg, as pathogenic or benign), which can also be applied to studying ADRs.125,126 Some tools aim to predict phenotype from genotype using activity scores derived from CPIC allele information.127 Others use hierarchical or k-means clustering to detect correlations between genotype and phenotype.128 Some recent tools, such as Hubble, apply deep learning techniques to predict the functions of PGx alleles.129 Nevertheless, classifying genomic variants to establish their relationship with ADRs remains challenging and needs significant improvement.
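The activity score approach can be sketched for CYP2D6 as follows: each allele receives a function value, the two values (multiplied by copy number for duplications) are summed, and the total is binned into a phenotype. The values and cut-offs below reflect our reading of the CPIC/DPWG CYP2D6 activity score consensus for a handful of alleles only; they are illustrative, should be checked against the current CPIC tables, and the "x2"-style duplication parsing is a simplifying assumption.

```python
import re

# Activity values for a small, illustrative subset of CYP2D6 alleles.
ACTIVITY_VALUE = {"*1": 1.0, "*2": 1.0, "*4": 0.0, "*5": 0.0,
                  "*10": 0.25, "*17": 0.5, "*41": 0.5}


def allele_score(allele: str) -> float:
    """Activity value of one allele; a suffix such as 'x2' gives the copy number (assumed syntax)."""
    match = re.fullmatch(r"(\*\d+)(?:x(\d+))?", allele)
    if match is None or match.group(1) not in ACTIVITY_VALUE:
        raise KeyError(f"no activity value for {allele!r}")
    copies = int(match.group(2) or 1)
    return ACTIVITY_VALUE[match.group(1)] * copies


def activity_score(diplotype: str) -> float:
    """Sum of the two allele activity values, eg '*1/*4' -> 1.0."""
    first, second = diplotype.split("/")
    return allele_score(first) + allele_score(second)


def phenotype_from_score(score: float) -> str:
    """Bin an activity score into a metabolizer phenotype."""
    if score == 0:
        return "Poor Metabolizer"
    if score < 1.25:
        return "Intermediate Metabolizer"
    if score <= 2.25:
        return "Normal Metabolizer"
    return "Ultrarapid Metabolizer"
```

Because the values are multiples of 0.25, the floating-point sums here are exact, so the threshold comparisons behave as intended.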

Put-All-Together

A number of data analysis pipelines have been developed to detect and annotate ADR-associated variants and haplotypes. A pipeline for analyzing three genes, CYP2C9, CYP2C12, and HLA, from WGS/WES/genotyping data has been described.130 The authors extracted 39 variants from the whole exome sequencing data of 1585 individuals and then assigned haplotypes based on the U-PGx translation. Currently, the pipeline developed by PGRN appears to be the gold standard for PGx data analysis.131 Meanwhile, other groups, such as the RIKEN-Pharmacogenomics Laboratory and SEAPHARM, have also built their own pipelines for pharmacogenomics data analysis (personal communication).

Nevertheless, accurate haplotype identification in pharmacogenomics is still challenging, especially in highly polymorphic regions such as HLA or CYP2D6, and many haplotypes/diplotypes of PGx genes remain unknown. Annotating PGx genes, including novel alleles, is also difficult, and better annotation tools are needed. Gene–drug interaction guidelines are still very limited; eg, the CPIC database provides only 12 specific guidelines to support optimizing drug therapy. Furthermore, genotype–phenotype associations remain uncertain in many cases because of factors such as age, gender, and drug–drug interactions. Developing tools to capture such uncertainty is necessary but very challenging.

Suggestion for Clinical Applications

We have reviewed typical work in pharmacogenomics for detecting adverse drug reactions, a key task in post-market drug safety surveillance. Data-driven131 and data integration approaches are widely recognized as essential to solving this problem. Advances in next generation sequencing technology and the abundance of genomic data have helped translate pharmacogenomic research on drug response into clinical applications. A number of drug labels now recommend that patients take a genetic test prior to prescription. Specific programs have also been implemented in particular countries where population genomics has been well analyzed; the US FDA Sentinel Initiative () and the UK CPRD () are among them. These are successful examples of translating research into clinical guidelines for patients or populations who are expected to use specific drugs in the future. For emerging countries, the introduction and improvement of technologies, together with the gradual growth of available data, open up a number of options. Our review has presented the characteristics, requirements, and possible outcomes of some popular and powerful approaches for future applications. The choice between them depends largely on the availability of data, the study design, and the expected outcomes; however, they can be combined to best interpret the mechanism of drug response or the association of genetic components with a disease, and thereby identify the most effective risk factors for diagnosis or therapeutic strategy. Improving the knowledge of and attitudes towards pharmacogenomic services among clinicians and administrative staff is crucial for clinical application. In addition, expanding the list of drug–gene pairs covered by national health policy is a feasible goal for the near future, once sufficient pharmacogenomic and health economic data are generated to aid the decision-making process.
Efforts to implement pharmacogenomics in clinical practice are proceeding at different rates in different countries. It is important to recognize the potential benefits of pharmacogenomic implementation for the healthcare system. Effective sharing of data and experience at the international level would reduce unnecessary healthcare costs from ineffective or inappropriate drug therapies, maximize the effectiveness of drugs, and minimize adverse drug events.
