Coverage rate of ADME genes from commercial sequencing arrays.

Nabil Zaid^1,2, Youness Limami^3, Nezha Senhaji^4, Nadia Errafiy^5, Loubna Khalki^5, Youssef Bakri^6, Younes Zaid^3, Saaid Amzazi^1,2.

Abstract

Pharmacogenomics offers remarkable potential for the rapid translation of discoveries into changes in clinical practice. In the present work, we evaluate the ability of commercially available genome-wide association sequencing chips to cover genes that have high pharmacogenomics potential. We used a set of 2794 variations within 369 absorption, distribution, metabolism, and elimination (ADME) genes of interest, as previously defined in collaboration with the Pharma ADME consortium. We compared the Illumina TrueSeq and both the Agilent SureSelect and HaloPlex sequencing technologies, and developed Python scripts to evaluate the coverage for each of these products. In particular, we considered a specific list of 155 allelic variants in 34 genes that present high pharmacogenomics potential. Both the theoretical and the practical coverage were assessed. Given the need for good coverage to confidently establish the functionality of an enzyme, the observed rates are unlikely to provide sufficient evidence for pharmacogenomics studies. We assessed the coverage using enrichment technology for exome sequencing with the Illumina Trueseq exome, Agilent SureSelectXT1 V4 and V5, and Haloplex exome, which offer a coverage of 96.12%, 91.61%, and 88.38%, respectively. Although pharmacogenomic advances have been limited in the past, due in part to the lack of coverage of commercial genotyping chips, it is anticipated that future studies that make use of new sequencing technologies should offer a greater potential for discovery.

Year: 2019 PMID: 30653102 PMCID: PMC6370070 DOI: 10.1097/MD.0000000000013975

Source DB: PubMed Journal: Medicine (Baltimore) ISSN: 0025-7974 Impact factor: 1.817

Introduction

Despite major recent progress in knowledge of the human genome, interindividual genetic polymorphism, drug metabolism, and molecular biology techniques, therapeutic individualization is not yet current practice. Even with promising results, conclusive data are often lacking, and statistical data on both effectiveness and tolerance are still taken into consideration before suggesting a treatment to a patient. The standardization of both protocols and doses, which is useful during the development of new therapeutic indications, now shows its limits in terms of effectiveness and tolerance.

Pharmacogenetics focuses on the study of metabolic differences between individuals as well as their absorption, distribution, metabolism, and elimination of drugs. The study of phenotypic expressions allowed genetic analysis, particularly of interindividual differences in enzymatic equipment and their effects on the metabolism of the organism. Pharmacogenomics, with the development of molecular genetics techniques, applies to the gene itself and not only to its expression. It encompasses pharmacogenetics and renews it by identifying the different variations in the genome that are responsible for a modification in the responses of the body. Pharmacogenomics, therefore, concerns the study of all the genes involved in drug response and refers to the use of genomics in the search for new therapeutic targets. Bioinformatics is an essential tool in pharmacogenomics. Thanks to its capacity to collect and process large quantities of data, it allows the use of new methods in various fields, in particular health.

The safety and efficacy of medication are of great clinical concern, and several teams worldwide are investing concerted efforts toward the identification of the genetic causes of variable drug responses in the hope of offering genetically determined personalized therapy. The effective concentration of a drug at its site of action is a key determinant of the safety and efficacy of the drug. It depends on the absorption, distribution, metabolism, and elimination (ADME) mechanisms of the drug. Genetic variations in ADME genes can lead to large differences in drug exposure between individuals. Unlike other factors, genetic variations remain stable throughout a person's lifetime and provide a valuable means to predict drug response and prevent adverse drug reactions. Pharmacogenetics has achieved impressive progress toward the personalization of pharmaceutical treatment, with over 100 drugs in the list of the U.S. Food and Drug Administration (FDA) carrying genetic testing recommendations to ensure drug safety and efficacy. The interactions of these many genes and pathways are very complex, and current commercial platforms do not allow good coverage of the ADME variants.

Single nucleotide polymorphisms (SNPs) are the most frequent type of polymorphism in the human genome. They can provide a huge number of useful genetic markers for many genetic analyses (eg, phylogenetic analysis, ultra-dense genetic mapping, and genotype/phenotype association studies) and important applications (eg, cultivar identification and marker-assisted selection), which are simplified because these markers are most often biallelic.

This analysis aims to calculate the coverage rate of ADME genes for each of the variant and amplicon lists from the different technologies. To this end, the genomic position of each genetic variation present in one of the ADME genes was used, according to the hg19 build of the human genome. Using Python scripts, the coverage rate for each list of interest was computed as the size of the intersection between the interest list and the ADME list, divided by the size of the ADME list. Coverage of the markers of interest that can be achieved through linkage disequilibrium (LD) with neighboring markers was also considered. To refine the work and make it more meaningful, the theoretical results, based on the targeted coverage of the different technologies, were compared with the practical results. The result of this analysis allows the identification of the technology with the best possible coverage of the ADME genes.
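The coverage rate defined in the Introduction is a simple set ratio. The following is a minimal Python sketch of that ratio, not the authors' script: it assumes each variant is identified by a (chromosome, position) pair on hg19, and the function name and the sample positions are hypothetical.

```python
def coverage_rate(adme_variants, platform_variants):
    """Fraction of ADME variants that also appear in the platform list."""
    adme = set(adme_variants)
    if not adme:
        raise ValueError("ADME list is empty")
    covered = adme & set(platform_variants)  # intersection of the two lists
    return len(covered) / len(adme)


# Hypothetical (chromosome, position) keys, for illustration only.
adme_list = [("chr1", 100), ("chr1", 250), ("chr2", 40), ("chr3", 7)]
platform_list = [("chr1", 100), ("chr2", 40), ("chr2", 41), ("chr3", 7)]

rate = coverage_rate(adme_list, platform_list)
print(f"{rate:.0%}")  # 3 of the 4 ADME variants are on the platform list
```

Using sets makes the result insensitive to duplicates and to the order of either list; extra platform entries that are not ADME variants do not affect the rate.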

Materials and methods

Generating data

Different technologies of interest were compared regarding their coverage of different ADME lists (Table 1). Variants of interest were extracted from ADME genes (Table 2) that were determined to be associated with drug metabolism. A total of 155 variations were extracted from the 34 genes considered to be of highest importance, the ADME Core.

Table 1

ADME and genomic platforms used in this study.

Table 2

Most important ADME genes.

In terms of enrichment platforms, 3 lists were considered. The TrueSeq list includes 20,794 genes comprising 201,121 exons and spans different databases: the NCBI CCDS (97.2%), NCBI RefSeq (96.4%), NCBI RefSeq including the non-coding DNA (88.3%), EncodeGencode of UCSC Genome Bioinformatics (93.2%), and microRNA (77.6%). The TrueSeq list is derived from the TrueSeq technology product by Illumina, which represents the latest advances in Illumina targeted sequencing by capture hybridization. The SureSelect list is a list of covered polymorphisms from the Agilent Technologies product for capture hybridization (http://www.genomics.agilent.com/). This list includes 554,751 amplicons likely to cover the variants of each of the ADME lists. The HaloPlex technology provides outstanding performance, a streamlined workflow, and low sample input requirements for next-generation sequencing of human exomes. The HaloPlex Exome has been optimized to provide comprehensive coverage of the coding regions of the human genome.

Steps

Before calculating the coverage of our lists of interest, the data were processed in 4 steps: organization into the appropriate format, completion by retrieving the position of each ADME gene, clean-up by eliminating duplicates from each list, and standardization by unifying the build on hg19.
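Two of the preparation steps above, duplicate removal and harmonization of the record format, can be sketched in a few lines of Python. This is an illustrative sketch only: the record layout (chromosome, position, name), the function names, and the sample values are assumptions, and converting coordinates between genome builds is not shown because it requires a dedicated tool.

```python
def normalize_chrom(chrom):
    """Return a 'chr'-prefixed chromosome label, e.g. '10' -> 'chr10'."""
    chrom = str(chrom).strip()
    return chrom if chrom.startswith("chr") else "chr" + chrom


def clean_variant_list(records):
    """Harmonize chromosome labels and drop duplicate (chrom, pos) entries.

    The first occurrence of each position is kept and input order is preserved.
    """
    seen = set()
    cleaned = []
    for chrom, pos, name in records:
        key = (normalize_chrom(chrom), int(pos))
        if key in seen:
            continue  # same position already recorded
        seen.add(key)
        cleaned.append((key[0], key[1], name))
    return cleaned


# Hypothetical records: the second entry duplicates the first one.
raw = [
    ("10", "1000", "variant_a"),
    ("chr10", 1000, "variant_a_dup"),
    ("22", 2500, "variant_b"),
]
cleaned = clean_variant_list(raw)
print(cleaned)
```

Normalizing the key before the duplicate check matters here: the two spellings of chromosome 10 would otherwise be counted as different variants.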

Selection of the database

The various databases offer essentially the same information. It was, therefore, necessary to select only 1 database to avoid redundancy. The 1000 Genomes Project is the first project to sequence the genomes of a large number of people in order to provide a comprehensive resource on human genetic variation. This database also makes it possible to find most genetic variants that have frequencies of at least 1% in the populations studied. Since the 1000 Genomes database allows the extraction of these data as BED format files, which is an ideal format for data mining using PLINK, we chose this database as a study tool.

Results

Theoretical coverage of enrichment platforms

For enrichment platforms, we relied on probes contained in these lists. The coverage rates of the ADME genes by enrichment platforms are recorded in Table 3.

Table 3

Summary of coverage rates of our ADME lists of interest by enrichment platforms.

The TrueSeq list can theoretically cover 137 of the 155 ADME variants, a coverage rate of 88%. The coverage of SureSelect is around 60% (version 4) and 70% (version 5). The Haloplex list can theoretically cover 143 of the 155 ADME Core variants, a rate of 92%. Among the enrichment platforms described above, the Haloplex list therefore allows the best coverage of the ADME variants. However, this coverage remains theoretical.
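Theoretical coverage of this kind amounts to asking whether each variant position falls inside at least one target region of a platform. The sketch below illustrates one way to do this in Python under stated assumptions: regions are BED-style intervals (0-based start, end excluded), variant positions are 1-based, and all names and coordinates are hypothetical rather than taken from the vendors' design files.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """Group (chrom, start, end) regions by chromosome and merge overlaps."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        merged = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= merged[-1][1]:  # overlaps or touches the previous span
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def is_covered(index, chrom, pos):
    """True if the 1-based position lies inside a 0-based half-open region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    zero_based = pos - 1
    i = bisect_right(starts, zero_based) - 1  # last region starting at or before pos
    return i >= 0 and zero_based < ends[i]


# Hypothetical target regions and variant positions.
regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
variants = [("chr1", 101), ("chr1", 300), ("chr1", 301), ("chr2", 50), ("chr3", 5)]

index = build_index(regions)
n_covered = sum(is_covered(index, chrom, pos) for chrom, pos in variants)
print(f"{n_covered} of {len(variants)} variants fall inside a target region")
```

Merging overlapping regions first lets a single binary search per variant decide coverage, which keeps the check fast even for lists with hundreds of thousands of amplicons.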

Practical results

To refine the work and make it more meaningful, we calculated the coverage of the 155 ADME core variants that was achieved from the 3 sequencing platforms (Trueseq, SureSelect, and Haloplex) by using read datasets made available in our lab according to a bioinformatics pipeline (Fig. 1).

Figure 1

Steps of calculating the practical coverage.

We demultiplexed the raw sequencing data of the different samples, which yielded the results as FASTQ files. These files were then fed into the pipeline, which aligns the sequences of the FASTQ files against a reference genome. The result of this protocol is a BAM file from which we calculated the depth (number of tags) of the practical coverage in the 3 platforms, base by base (the command used is not reproduced here). The last step of the pipeline consists of variant calling. In our study, we limited ourselves to calculating the coverage from the alignment (BAM file) because of the number of samples.
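Once a per-base depth is available, the practical coverage at a given depth threshold is a simple count. The sketch below is hedged accordingly: it assumes the depth has been exported as tab-separated text with three columns (chromosome, 1-based position, depth), which is the layout written by tools such as `samtools depth`, and it uses hypothetical positions and function names. It is not the pipeline used in the study.

```python
def parse_depth(lines):
    """Read 'chrom<TAB>pos<TAB>depth' lines into a {(chrom, pos): depth} dict."""
    depth = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        chrom, pos, value = line.split("\t")[:3]
        depth[(chrom, int(pos))] = int(value)
    return depth


def covered_at(depth, variants, min_depth):
    """Number of variants whose position reaches at least min_depth reads."""
    # Positions absent from the depth table are treated as depth 0.
    return sum(1 for variant in variants if depth.get(variant, 0) >= min_depth)


# Hypothetical per-base depth export and variant positions.
depth_text = "chr1\t100\t31\nchr1\t250\t8\nchr2\t40\t4\n"
variants = [("chr1", 100), ("chr1", 250), ("chr2", 40), ("chr3", 7)]

depth = parse_depth(depth_text.splitlines())
for threshold in (1, 5, 10, 25):
    n = covered_at(depth, variants, threshold)
    print(f"{threshold}x: {n} of {len(variants)} variants")
```

Treating missing positions as depth 0 is a design choice of this sketch; it ensures that variants with no aligned reads count against the platform instead of being silently skipped.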

Comparison of the 3 sequencing platforms

The practical coverage (Table 4) of the TrueSeq list (sequenced at 130x) is between 95% (147 of 155) at 5x and 92% (142 of 155) at 25x. SureSelect V5 has a practical coverage between 95% (147 of 155) at 5x and 69% (107 of 155) at 25x. The SureSelect platform, unlike Trueseq, was sequenced at 49x, which explains the low coverage at depths greater than 30x. The practical coverage of the Haloplex list (sequenced at 42x) is between 85% (132 of 155) at 5x and 67% (104 of 155) at 25x.
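The percentages quoted above follow directly from the variant counts over the 155 ADME core variants. As a quick arithmetic check, the short Python sketch below recomputes them; the counts are those given in the text, while the helper name and the dictionary layout are only illustrative.

```python
TOTAL_VARIANTS = 155  # size of the ADME core list

# (platform, depth threshold) -> number of covered variants, as quoted in the text
reported = {
    ("TrueSeq", "5x"): 147,
    ("TrueSeq", "25x"): 142,
    ("SureSelect V5", "5x"): 147,
    ("SureSelect V5", "25x"): 107,
    ("Haloplex", "5x"): 132,
    ("Haloplex", "25x"): 104,
}


def percent(covered, total=TOTAL_VARIANTS):
    """Coverage as a percentage rounded to the nearest integer."""
    return round(100 * covered / total)


for (platform, depth), covered in reported.items():
    print(f"{platform} at {depth}: {covered}/{TOTAL_VARIANTS} = {percent(covered)}%")
```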

Table 4

Practical coverage of ADME core variants by sequencing platforms according to the depth.

To make the comparison between these 3 platforms more meaningful, we normalized Trueseq and Sureselect to 42x. To this end, we developed a Python script that randomly removes reads from the BAM files of these platforms to bring them down to 42x. The diagram (Fig. 2) shows the coverage of the different platforms normalized to 42x, as a function of depth.
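Normalization by random read removal can be sketched as follows. This is an assumption-laden illustration, not the authors' script: it keeps each read independently with probability target depth / current depth, works on any in-memory collection of reads, and leaves out BAM parsing, which would require a dedicated library. The function name, the seed, and the synthetic read names are hypothetical.

```python
import random


def downsample(reads, current_depth, target_depth, seed=0):
    """Randomly keep reads so that the expected depth drops to target_depth."""
    if target_depth >= current_depth:
        return list(reads)  # nothing to remove
    keep_fraction = target_depth / current_depth
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return [read for read in reads if rng.random() < keep_fraction]


# Synthetic reads standing in for alignments sequenced at 130x.
reads = [f"read_{i}" for i in range(10_000)]
kept = downsample(reads, current_depth=130, target_depth=42)
print(f"kept {len(kept)} of {len(reads)} reads ({len(kept) / len(reads):.1%})")
```

The kept fraction fluctuates around 42/130 because every read is drawn independently. For paired-end data, both mates of a pair would need to be kept or dropped together, which this sketch does not handle.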

Figure 2

Comparison of the coverage for 3 sequencing platforms normalized at 42x.

As the diagram shows, after normalization, Sureselect and Trueseq allow good coverage (96% and 95%, respectively) at a depth of 1x. However, SNP calling cannot be performed at 1x. At depths from 10x to 20x, Sureselect coverage falls to 68%, while Trueseq still allows good coverage (88%). At depths greater than 20x, Trueseq allows the best coverage, even if it does not exceed the HaloPlex coverage rate significantly.

Discussion

The aim of this work is to evaluate the coverage of clinically relevant ADME genes, and of specific variants of interest in those genes, that can be obtained from commercial sequencing platforms, so that this can guide platform selection when preparing a study design. According to previous studies, the highest coverage of genotyping arrays was 67% (105 of 155 variants), taking into account an LD threshold of 0.8. Among the genotyping platforms assessed previously by Gamazon et al, no platform showed coverage of ADME genes (after accounting for LD) sufficient to conduct pharmacogenomic studies. This is particularly important as several metabolizer phenotypes can only be predicted from diplotype status. According to the FDA, the most important pharmacogenes are CYP2D6 (rs765776661, rs267608302) and CYP2C19 (rs12248560, rs17884712, rs140278421, rs138142612, rs4917623, rs192154563) (Table 5). Of the 87 FDA approved drugs (excluding biomarkers of cancerous tissues or viruses), these 2 genes are linked to 49 drugs, which represents 56% of all FDA approved drugs. In terms of coverage, the Omni 5.0 chip contains 8 more pharmacogenes, namely CYP1A1, CYP2C9, DPYD, NAT1, NAT2, TPMT, UGT1A1, and VKORC1, in addition to CYP2D6 and CYP2C19, which allows a coverage of 58 drugs, as shown in Table 5.
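Coverage "after accounting for LD" means that a variant counts as covered if it is on the platform itself or if a marker in sufficiently strong LD with it is. The Python sketch below illustrates this rule under explicit assumptions: the LD values are taken to be pairwise r² values supplied as a dictionary, the 0.8 threshold is the one quoted in the text, and all identifiers are hypothetical placeholders, not real variants.

```python
from collections import defaultdict


def covered_with_ld(targets, on_platform, ld_pairs, threshold=0.8):
    """Return the targets typed directly or tagged by a proxy on the platform."""
    proxies = defaultdict(set)
    for (a, b), value in ld_pairs.items():
        if value >= threshold:  # keep only pairs in strong LD
            proxies[a].add(b)
            proxies[b].add(a)
    covered = set()
    for target in targets:
        if target in on_platform or proxies[target] & on_platform:
            covered.add(target)
    return covered


# Hypothetical identifiers: v* are variants of interest, m* are chip markers.
targets = ["v1", "v2", "v3", "v4"]
on_platform = {"v1", "m2", "m3"}
ld_pairs = {("v2", "m2"): 0.93, ("v3", "m3"): 0.55, ("v4", "m9"): 0.99}

covered = covered_with_ld(targets, on_platform, ld_pairs)
print(sorted(covered), f"{len(covered)} of {len(targets)}")
```

In this toy example, v1 is typed directly and v2 is tagged by m2; v3 has a proxy below the threshold, and the proxy of v4 is not on the platform, so neither counts as covered.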

Table 5

Pharmacogenomic biomarkers in drug labeling.

The lack of coverage is due to the choice of a set of homologous genes. However, these genes are the most important in personalized medicine and pharmacogenomics, hence the importance of these results to those interested in the evaluation of genomic data. Previous studies have shown the limits of genome-wide methods for pharmacogenomic testing. Gamazon et al focused on a set of the most important genes in pharmacogenomics and personalized medicine, using only genotyping platforms. Their findings demonstrated that, even when SNPs in LD are taken into consideration, the coverage rate of these genes by genotyping platforms is suboptimal. In this study, we also assessed exome sequencing platforms. The HaloPlex enrichment allows the best coverage of the ADME variants. However, this coverage remains theoretical. To further evaluate the coverage that can be obtained using the sequencing technologies, we used practical sequencing results from 3 platforms, namely Trueseq, Haloplex, and Sureselect. According to the results obtained, we can conclude that, among the 4 commercial platforms tested, Trueseq offers good coverage of our ADME variants of interest.

Author contributions

Conceptualization: Nabil Zaid, Nezha Senhaji, Loubna Khalki, Younes Zaid, and Saaid Amzazi. Data curation: Nabil Zaid, Youness Limami, Loubna Khalki, Younes Zaid, and Saaid Amzazi. Formal analysis: Nabil Zaid, Nezha Senhaji, and Saaid Amzazi. Investigation: Nabil Zaid, Youness Limami, and Saaid Amzazi. Methodology: Nabil Zaid, Youness Limami, Nezha Senhaji, Nadia Errafiy, Youssef Bakri, Younes Zaid, and Saaid Amzazi. Resources: Nadia Errafiy. Software: Nabil Zaid, Youness Limami, and Nadia Errafiy. Supervision: Youness Limami, Younes Zaid, and Saaid Amzazi. Validation: Nabil Zaid, Youness Limami, Youssef Bakri, Younes Zaid, and Saaid Amzazi. Visualization: Nabil Zaid, Nadia Errafiy, and Saaid Amzazi. Writing – original draft: Nabil Zaid, Nezha Senhaji, Loubna Khalki, Youssef Bakri, and Younes Zaid. Writing – review & editing: Younes Zaid and Saaid Amzazi.

26 in total

Review 1. [Pharmacogenomics and gene expression analysis. Functional genome research for individual application to patients].

Authors: Matthias U Kassack
Journal: Med Monatsschr Pharm Date: 2003-05

Review 2. Predicting ADME properties and side effects: the BioPrint approach.

Authors: Cecile M Krejsa; Dragos Horvath; Sherri L Rogalski; Julie E Penzotti; Boryeu Mao; Frédérique Barbosa; Jacques C Migeon
Journal: Curr Opin Drug Discov Devel Date: 2003-07

3. [Phenotypic and genetic analysis of a boy with partial trisomy of 1q].

Authors: Dong Wu; Hui Zhang; Hongdan Wang; Qiaofang Hou; Tao Wang; Tao Li; Yanli Yang; Shixiu Liao
Journal: Zhonghua Yi Xue Yi Chuan Xue Za Zhi Date: 2017-06-10

Review 4. Evolution of ADME science: where else can modeling and simulation contribute?

Authors: Dennis A Smith
Journal: Mol Pharm Date: 2013-01-22 Impact factor: 4.939

5. Therapeutic inertia or individualization? Delay in clinical management of type 2 diabetes mellitus.

Authors: Nagaaki Tanaka; Takeshi Kurose; Yutaka Seino
Journal: Curr Med Res Opin Date: 2016-06-13 Impact factor: 2.580

6. The limits of genome-wide methods for pharmacogenomic testing.

Authors: Eric R Gamazon; Andrew D Skol; Minoli A Perera
Journal: Pharmacogenet Genomics Date: 2012-04 Impact factor: 2.089

7. Prevention of adverse drug reactions in intensive care patients by personal intervention based on an electronic clinical decision support system.

Authors: Thilo Bertsche; Johannes Pfaff; Petra Schiller; Jens Kaltschmidt; Markus G Pruszydlo; Wolfgang Stremmel; Ingeborg Walter-Sack; Walter E Haefeli; Jens Encke
Journal: Intensive Care Med Date: 2010-02-09 Impact factor: 17.440

8. Integrative Analysis of Genetic, Genomic, and Phenotypic Data for Ethanol Behaviors: A Network-Based Pipeline for Identifying Mechanisms and Potential Drug Targets.

Authors: James W Bogenpohl; Kristin M Mignogna; Maren L Smith; Michael F Miles
Journal: Methods Mol Biol Date: 2017

9. Human SNPs reveal no evidence of frequent positive selection.

Authors: Liqing Zhang; Wen-Hsiung Li
Journal: Mol Biol Evol Date: 2005-08-17 Impact factor: 16.240

Review 10. Cancer pharmacogenomics: DNA genotyping and gene expression profiling to identify molecular determinants of chemosensitivity.

Authors: J Todd Auman; Howard L McLeod
Journal: Drug Metab Rev Date: 2008 Impact factor: 4.518
