Gayatri Panda1, Neha Mishra1, Disha Sharma2,3, Rintu Kutum2,3,4, Rahul C Bhoyar3, Abhinav Jain2,3, Mohamed Imran2,3, Vigneshwar Senthilvel2,3, Mohit Kumar Divakar2,3, Anushree Mishra3, Parth Garg1, Priyanka Banerjee5, Sridhar Sivasubbu2,3, Vinod Scaria2,3, Arjun Ray1.
Abstract
India accounts for more than 17% of the world's population and has a diverse genetic makeup, with several clinically relevant rare mutations belonging to many sub-groups that are underrepresented in global sequencing datasets such as the 1000 Genome data (1KG), which contains limited samples of Indian ethnicity. Such databases are critical for the pharmaceutical and drug development industry, where diversity plays a crucial role in identifying genetic predisposition towards adverse drug reactions. A qualitative and comparative sequence and structural study was performed for variants belonging to the kinase-coding genes, the second most targeted group of drug targets, using variant information from the recently published, largest curated Indian genome database (IndiGen) and the 1000 Genome data. The sequence-level analysis identified similarities and differences among populations based on the nsSNVs and amino acid exchange frequencies, whereas a comparative structural analysis of IndiGen variants was performed against pathogenic variants reported in UniProtKB Humsavar data. The influence of these variations on structural features of the protein, such as structural stability, solvent accessibility, hydrophobicity, and the hydrogen-bond network, was investigated. In silico screening of known drugs against these Indian variation-containing proteins reveals critical differences in binding strength due to the variations present in the Indian population. In conclusion, this study constitutes a comprehensive investigation of the common variations present in the second largest population in the world and of their implications for the sequence, structural, and pharmacogenomic landscape.
The preliminary investigation reported in this paper, which supports the screening and detection of adverse drug reactions (ADRs) specific to the Indian population, could aid in the development of techniques for pre-clinical and post-market screening of drug-related adverse events in the Indian population.
Keywords: IndiGenome Consortium; Indian genetic variations; adverse drug reactions; docking; pharmacogenomics; single nucleotide variants
Year: 2022 PMID: 35865963 PMCID: PMC9294532 DOI: 10.3389/fphar.2022.858345
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.988
FIGURE 1. Dendrogram representation of kinase-coding genes in IndiGen data using KinMap (beta). The circle size represents the number of drug molecules available for a gene with a known drug-gene interaction. Each kinase class is highlighted with a unique color, and the color gradient in each data circle represents the number of variations present in IndiGen data for that gene.
FIGURE 2. Sequence analysis using amino-acid exchanges reported for 545 druggable kinase-coding genes in IndiGen data. (A) Amino-acid exchange matrix for reference and alternate amino acids of SNVs in IndiGen data. (B) Clustermap showing the chemical shift observed between the reference and alternate amino acids at SNV sites reported in IndiGen data. (C) Scatter plot of mutability scores for each amino acid type in IndiGen data.
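The exchange matrix in panel (A) can be illustrated with a small counting sketch. This is not the authors' pipeline: the function name `exchange_matrix`, the input format (pairs of one-letter codes), and the toy variants are all assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_SET = set(AMINO_ACIDS)


def exchange_matrix(variants):
    """Count reference -> alternate amino-acid exchanges.

    `variants` is an iterable of (ref, alt) one-letter codes. Pairs with
    non-standard letters, or with ref == alt, are skipped.
    Returns a 20 x 20 list of lists: rows are reference residues,
    columns are alternate residues, both in AMINO_ACIDS order.
    """
    counts = Counter(
        (ref, alt)
        for ref, alt in variants
        if ref in AA_SET and alt in AA_SET and ref != alt
    )
    return [[counts[(r, a)] for a in AMINO_ACIDS] for r in AMINO_ACIDS]


# Hypothetical toy input, not IndiGen data.
toy_variants = [("T", "A"), ("T", "A"), ("R", "Q"), ("A", "A"), ("X", "G")]
matrix = exchange_matrix(toy_variants)
row, col = AMINO_ACIDS.index("T"), AMINO_ACIDS.index("A")
print(matrix[row][col])  # number of T -> A exchanges in the toy input
```

Dividing each cell by the total number of counted exchanges would turn the counts into the exchange frequencies that are compared across populations.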
FIGURE 3. (A) Comparison of the trend of amino acid exchange among different populations from 1000 genome data with the Indian population. The bubble plot was generated on the basis of the FDR-corrected p-value associated with the AA-exchange frequency for a particular reference and alternate AA observed in IndiGen data with EUR and SAS populations of 1000 genome data. The size of the bubble is proportional to the −log10 (p-value) linked with the amino acid exchange. AA exchanges with p-value 0.05 are highlighted in blue. (B) An UpSet plot of statistically significant AA exchanges observed in IndiGen with respect to other populations in 1000G data. (C) A grouped bar plot showing the count of variations lying before (green), within (pink), and after (violet) the domain for variants in IndiGen and populations in 1000G data. (D) A box plot comparing the allele-frequency distribution of common IndiGen variants (AF ≥ 10%) that pass the filters used for structure data (22 variants) with different populations in 1000 genome data. (E) IndiGen-specific SNVs (22 variants in structure data) with AF ≥ 10% observed in different databases such as the 1000 genome project, gnomAD exome data, and the ExAC database, with IndiGen variations on the X-axis and their allele frequencies in the different databases on the Y-axis.
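The bubble sizing in panel (A) combines an FDR correction with a −log10 transform. The caption does not name the correction procedure, so the sketch below assumes the Benjamini-Hochberg method. The function names and the toy p-values are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the caption only says "FDR corrected"; BH is used here
    as a common choice, not as the confirmed method of the study.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bubble_size(p_adjusted, floor=1e-300):
    """-log10 of an adjusted p-value; `floor` guards against log10(0)."""
    return -math.log10(max(p_adjusted, floor))


# Hypothetical raw p-values for four amino-acid exchanges.
raw = [0.001, 0.02, 0.03, 0.5]
adj = benjamini_hochberg(raw)
sizes = [bubble_size(p) for p in adj]
for p, q, s in zip(raw, adj, sizes):
    print(f"raw={p:<6} adjusted={q:.4f} size={s:.3f}")
```

Smaller adjusted p-values map to larger bubbles, so the most strongly supported exchanges stand out in the plot.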
FIGURE 4. Comparison of structural characteristics of variants in IndiGen and Humsavar data: (A) Solvent accessibility for the variants in both datasets. (B) Secondary structure in which each of the variants occurs in both datasets. (C) Conservation score and ΔHydrophobicity distribution of variants in Humsavar and IndiGen data. (D) The area under the curve to the left of −2 (ΔHydrophobicity) corresponds to the percentage of residues for which a significant increase in hydrophobicity was observed after the mutation, while the area to the right of +2 on the X-axis corresponds to the percentage of residues for which a decrease in hydrophobicity was observed. (E) Alluvial plot representing the change in folding energy (ΔΔG, in kcal/mol) and vibrational entropy calculated by DynaMut for 22 variants. (F) Sunburst plot representing the secondary structure assignment by DSSP for mutant residues. (G) A circle packing plot showing the relative solvent accessibility of mutated residues of 22 variants corresponding to 12 proteins, calculated using Naccess. (H) HBPLUS results showing the number of hydrogen bonds made by the mutated residue before mutation (green bars), after mutation (blue bars), and ΔH-bonds (yellow bars).
FIGURE 5. (A) Bar plot showing docking results for 45 protein-drug pairs on the x-axis and the change in binding affinity observed on the y-axis. Red bars represent a decrease in binding affinity and green bars represent an increase in binding affinity after mutation. (B) Ligand interaction diagram of native 6GQ7 (PIK3CG gene) bound to Zinc Sulfate (DB09322). (C) Ligand interaction diagram of mutant T857A of the PIK3CG gene (PDB ID 6GQ7) bound to Zinc Sulfate (DB09322), and the main binding pocket (grey colour) where the majority of ligands were docked. (D) An alluvial plot showing the association of genes, variants, and diseases, where the thickness of the variant-disease line is proportional to the VDA score obtained from DisGeNET.
FIGURE 6. (A) Heatmap showing pairwise ligand dissimilarity using MACCS fingerprints. (B) Heatmap showing ligand similarity using MACCS fingerprints. (C) Heatmap showing ligand dissimilarity using Morgan fingerprints. (D) Heatmap showing ligand similarity using Morgan fingerprints. Similarity and dissimilarity (1 − similarity) scores are computed using the Tanimoto coefficient (taking a value between 0 and 1, with 1 corresponding to maximum similarity).
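The similarity and dissimilarity scores behind these heatmaps can be sketched in a few lines. This is a minimal illustration, assuming each fingerprint is represented as the set of its "on" bit positions. In practice the MACCS or Morgan bits would come from a cheminformatics toolkit, which is not used here. The ligand names and bit sets are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen for this sketch.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient |A & B| / |A | B| for two sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def pairwise_matrices(fingerprints):
    """Return (similarity, dissimilarity) matrices as lists of lists."""
    names = list(fingerprints)
    sim = [
        [tanimoto(fingerprints[x], fingerprints[y]) for y in names]
        for x in names
    ]
    dis = [[1.0 - value for value in row] for row in sim]
    return sim, dis


# Hypothetical toy fingerprints (sets of on-bit indices), not real ligands.
toy_fps = {
    "ligand_1": {1, 2, 3, 4},
    "ligand_2": {3, 4, 5, 6},
    "ligand_3": {1, 2, 3, 4},
}
similarity, dissimilarity = pairwise_matrices(toy_fps)
for name, row in zip(toy_fps, similarity):
    print(name, [round(value, 3) for value in row])
```

Identical bit sets score 1, disjoint sets score 0, and the dissimilarity matrix is the elementwise complement, which is why each fingerprint type in the figure appears as a pair of mirrored heatmaps.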