Xiaojing Cong, Wenwen Ren, Jody Pacalon, Rui Xu, Lun Xu, Xuewen Li, Claire A de March, Hiroaki Matsunami, Hongmeng Yu, Yiqun Yu, Jérôme Golebiowski.
Abstract
G protein-coupled receptors (GPCRs) conserve common structural folds and activation mechanisms, yet their ligand spectra and functions are highly diverse. This work investigated how the amino-acid sequences of olfactory receptors (ORs), the largest GPCR family, encode diversified responses to various ligands. We established a proteochemometric (PCM) model based on OR sequence similarities and ligand physicochemical features to predict OR responses to odorants using supervised machine learning. The PCM model was constructed with the aid of site-directed mutagenesis, in vitro functional assays, and molecular simulations. We found that the ligand selectivity of the ORs is mostly encoded in the residues up to 8 Å around the orthosteric pocket. Subsequent predictions using Random Forest (RF) showed a hit rate of up to 58%, as assessed by in vitro functional assays of 111 ORs and 7 odorants of distinct scaffolds. Sixty-four new OR-odorant pairs were discovered, and 25 ORs were deorphanized here. The best model demonstrated a 56% deorphanization rate. The PCM-RF approach will accelerate OR-odorant mapping and OR deorphanization.
Year: 2022 PMID: 35350604 PMCID: PMC8949627 DOI: 10.1021/acscentsci.1c01495
Source DB: PubMed Journal: ACS Cent Sci ISSN: 2374-7943 Impact factor: 14.553
Figure 1. Machine learning protocol and residue selection. (A) Machine learning workflow, in which different residue subsets were extracted from the sequence alignment to train different models. The PCM approach combined the OR sequence features, the ligand physicochemical features, and the response data (if available) of each OR–ligand pair. (B) Available site-directed mutagenesis data (including literature data, summarized in ref (24)) projected on the 3D model of mOR256-31. Residues in dark red and red belong to poc17 and poc20, respectively. (C) Matthews correlation coefficient (MCC) [28] and hit rate of the RF classifiers on the in vitro test set.
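The workflow in panel (A) can be illustrated with a minimal sketch of how one feature row per OR–ligand pair might be assembled. This is not the authors' pipeline: the one-hot residue encoding, the function names (`one_hot_residues`, `build_pcm_rows`), the receptor and ligand identifiers, the alignment strings, and the descriptor values are all hypothetical and chosen only to show the idea of concatenating receptor and ligand features.

```python
from typing import Dict, List, Sequence, Tuple

# 20 standard amino acids plus the alignment gap character.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot_residues(aligned_seq: str, columns: Sequence[int]) -> List[float]:
    """One-hot encode the residues found at the chosen alignment columns.

    Assumption: one-hot encoding stands in for whatever sequence
    featurization the study actually used.
    """
    features: List[float] = []
    for col in columns:
        residue = aligned_seq[col]
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return features


def build_pcm_rows(
    alignment: Dict[str, str],
    ligand_descriptors: Dict[str, Sequence[float]],
    responses: Dict[Tuple[str, str], int],
    columns: Sequence[int],
) -> Tuple[List[List[float]], List[int]]:
    """Return (X, y) with one row per OR-ligand pair that has a known response."""
    X: List[List[float]] = []
    y: List[int] = []
    for (receptor, ligand), label in sorted(responses.items()):
        receptor_part = one_hot_residues(alignment[receptor], columns)
        ligand_part = list(ligand_descriptors[ligand])
        X.append(receptor_part + ligand_part)  # receptor and ligand features side by side
        y.append(label)
    return X, y


# Toy, entirely hypothetical inputs.
alignment = {"OR_A": "MF-LY", "OR_B": "MYALY"}
ligand_descriptors = {"lig1": [120.15, 1.6], "lig2": [150.22, 2.7]}
responses = {("OR_A", "lig1"): 1, ("OR_B", "lig1"): 0, ("OR_A", "lig2"): 0}
pocket_columns = [1, 2]  # hypothetical residue subset taken from the alignment

X, y = build_pcm_rows(alignment, ligand_descriptors, responses, pocket_columns)
print(len(X), len(X[0]), y)
```

Changing `pocket_columns` corresponds to training a model on a different residue subset. The resulting `X` and `y` could then be passed to any supervised classifier, for example a Random Forest.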
Chemical Structure, PubChem CID, and Training Data of the Query Odorants (in Bold) and Their Analogues
P: number of responsive (positive) ORs. N: number of nonresponsive (negative) ORs. See Data File S1 for the lists of ORs.
Figure 2. In vitro evaluation of machine learning predictions of OR responses to odorants. (A) All of the OR–odorant pairs were ranked by the predicted probability to be responsive. The initial model assessments focused on four odorants. Twenty responsive and 60 nonresponsive ORs (negative controls) predicted by the poc60 model were selected for functional assays. Heatmaps show the in vitro EC50 values, in which the false predictions are labeled with ×. Assessments of the other models are provided in Figure S3. (B) In vitro assessment of the poc60 model predictivity for acyclic odorants. (C) Dose-dependent response curves of all of the responsive OR–odorant pairs identified in this study. Error bars indicate SEM (n = 3–6).
Performance of the poc60 Model in Predicting New OR–Odorant Pairs
| metrics | acetophenone | R-carvone | coumarin | 4-chromanone | citral | nonanal | nonanoic acid |
|---|---|---|---|---|---|---|---|
| test odorant set | initial | initial | initial | initial | additional | additional | additional |
| MCC | 0.47 | 0.45 | 0.43 | 0.48 | 0.24 | 0.48 | 0.40 |
| hit rate (precision) | 0.39 | 0.60 | 0.58 | 0.60 | 0.50 | 0.50 | 0.25 |
| recall (sensitivity) | 0.78 | 0.46 | 0.47 | 0.50 | 0.25 | 0.67 | 1.00 |
| F1 score | 0.52 | 0.52 | 0.52 | 0.55 | 0.33 | 0.57 | 0.40 |
| specificity | 0.85 | 0.94 | 0.92 | 0.94 | 0.93 | 0.88 | 0.65 |
| AUC | 0.84 | 0.72 | 0.72 | 0.74 | 0.58 | 0.66 | 0.74 |
| true positives | 7 | 6 | 7 | 6 | 1 | 2 | 2 |
| true negatives | 60 | 63 | 60 | 64 | 14 | 14 | 11 |
| false positives | 11 | 4 | 5 | 4 | 1 | 2 | 6 |
| false negatives | 2 | 7 | 8 | 6 | 3 | 1 | 0 |
See Data File S2C for the raw data.
See the Methods section in the SI for the definitions.
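As a worked check of the table, the sketch below recomputes the count-based metrics from the confusion-matrix counts using the standard textbook definitions. It assumes these match the definitions in the SI, which are not reproduced here. The function name `confusion_metrics` is illustrative. AUC is omitted because it needs the ranked probabilities rather than the four counts.

```python
import math
from typing import Dict


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0    # hit rate
    recall = tp / (tp + fn) if (tp + fn) else 0.0       # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Counts from the acetophenone column of the table above.
acetophenone = confusion_metrics(tp=7, tn=60, fp=11, fn=2)
for name, value in acetophenone.items():
    print(f"{name}: {value:.2f}")
```

With the acetophenone counts, the rounded values agree with the table: precision 0.39, recall 0.78, specificity 0.85, F1 0.52, and MCC 0.47.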
Figure 3. Location of the residues that best encode OR responses to ligands, illustrated with mOR256-31. Conserved motifs in ORs are boxed. The N- and C-termini are truncated for clarity.