Pharmaceuticals and industrial chemicals, both in the environment and in research settings, commonly interact with aquatic vertebrates. Due to their short life-cycles and the traits that can be generalized to other organisms, fish and amphibians are attractive models for the evaluation of toxicity caused by endocrine disrupting chemicals (EDCs) and adverse drug reactions. EDCs, such as pharmaceuticals or plasticizers, alter the normal function of the endocrine system and pose a significant hazard to human health and the environment. The selection of suitable animal models for toxicity testing is often reliant on high sequence identity between the human proteins and their animal orthologs. Herein, we compare in silico the ligand-binding sites of 28 human "side-effect" targets to their corresponding orthologs in Danio rerio, Pimephales promelas, Takifugu rubripes, Xenopus laevis, and Xenopus tropicalis, as well as subpockets involved in protein interactions with specific chemicals. We found that the ligand-binding pockets had much higher conservation than the full proteins, while the peroxisome proliferator-activated receptor γ and corticotropin-releasing factor receptor 1 were notable exceptions. Furthermore, we demonstrated that the conservation of subpockets may vary dramatically. Finally, we identified the aquatic model(s) with the highest binding site similarity, compared to the corresponding human toxicity target.
Aquatic vertebrates are targeted by pharmaceutical and industrial chemicals, both intentionally and unintentionally, in a variety of research and environmental contexts. In the wild, these animals are exposed to the pharmaceuticals and industrial chemicals present in surface waters. In research settings, aquatic vertebrates may be used to evaluate novel chemicals for toxicity, including the early identification of the adverse drug reaction (ADR) or endocrine disruption (ED) potential of pharmaceutical candidates and industrial chemicals.
Lower order vertebrates, such as amphibians and fish, are being
increasingly viewed as a replacement for rodent models. They are convenient
and cost-effective model organisms due to their short life-cycles
and the presence of traits that can be generalized to other organisms.[1] Species that are commonly used for toxicological
evaluations include Danio rerio (zebrafish), Pimephales promelas (fathead minnow), Takifugu rubripes (Japanese pufferfish), Xenopus laevis (African
clawed frog), and Xenopus tropicalis (Western clawed
frog).[1−4] Specifically, D. rerio has been widely used to
study ADRs that include reproductive toxicity, cardiotoxicity, hepatotoxicity,
and neurotoxicity,[5] as well as the evaluation
of potential endocrine disrupting chemicals (EDCs; reviewed in ref (1)). P. promelas has been used to predict the aquatic toxicity of environmental chemicals,[2] and T. rubripes has been used
to evaluate EDCs.[6,7] Amphibians are known to be good
models for studying EDCs that interact with thyroid hormone receptors[8] and X. laevis has been used
to study ADRs related to membrane transporters.[9]
For chemicals present at low concentrations in the target organism, toxicity is most frequently caused by their specificity for particular proteins in that organism. Comparing the protein sequences and structures of human toxicity targets with their orthologs in aquatic species can therefore help identify the most similar ortholog.
For the reliable prediction of pharmaceutical or environmental toxicity, robust animal models are required whose proteins are highly similar to the orthologous human ADR and toxicity targets. Additionally, in the wild, species with such highly similar proteins are more vulnerable than others to pharmaceuticals present in the environment, which have been specifically designed for high-affinity interactions with these proteins.[10]
Typically in toxicity studies, one rodent
model and one nonrodent
model are employed.[11] However, depending
on the target and the class of chemicals in question, some animal
models may be more relevant than others. The ever-increasing number
of species with fully sequenced genomes has begun to allow for druggable
genome and proteome comparisons. Recently, the genomes of eight relevant
toxicological species were compared to the human genome.[12] Target similarity has been assessed at the level
of protein sequence, with the degree of conservation of specific drug
targets in humans and model organisms evaluated by performing sequence-by-sequence
alignments,[10] and limited studies have
been conducted on the domain conservation for the androgen receptor
(AR) and estrogen receptor α (ERα).[13]
Nevertheless, the level of conservation between orthologous sequences usually varies along the sequence (Figure 1). Thus, it is important to focus on the similarity of sections of
the sequence that are most relevant to chemical interactions. The
conservation of residues directly involved in ligand binding is a
more relevant parameter for evaluation of aquatic species models than
full sequence similarity. Interspecies variations in the amino-acid
composition of the binding-pocket can sometimes have dramatic effects
on the utility of species in pharmacological assays. For example,
in the serotonin 6 receptor (5-HT6R), two residues in the
ligand-binding pocket were found to significantly change the pharmacology
of the mouse 5-HT6R (resulting in a systematic one log
unit shift of the 5-HT6R ligands), compared to the human
and rat 5-HT6R,[15,16] making the mouse model
an unfavorable choice for testing 5-HT6R-targeting pharmaceuticals,
while the rat 5-HT6R binding pocket is identical to that of humans.
Similarly, two (out of 13) minor amino-acid substitutions (Thr to Ala and Ala to Val) in the binding pocket of the rat and mouse histamine H3 receptors (H3R), compared to the human H3R, lead to systematic errors in compound potency measurements and limit the utility of both species in H3R-related studies.[17]
Figure 1
Variations in sequence conservation across the sequence
of the
AR for D. rerio, P. promelas, T. rubripes, X. laevis, and X.
tropicalis compared to the human AR (binding site residues
highlighted in cyan). All sequences were window averaged across 25
residues. Abbreviations: AF1/2, activation function 1/2; DBD, DNA
binding domain; LBD, ligand binding domain.[14]
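Figure 1 shows conservation profiles smoothed over a 25-residue window. A minimal sketch of such smoothing follows. It assumes a per-position score (here 1 for an identical residue and 0 otherwise) and a centered window that is truncated at the sequence ends; the per-residue score and the edge handling actually used for the figure are not specified in the text, so both are assumptions.

```python
def window_average(values, window=25):
    """Centered moving average; the window is truncated at both sequence ends."""
    half = window // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        start = max(0, i - half)
        stop = min(n, i + half + 1)
        segment = values[start:stop]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Hypothetical identity profile: conserved, divergent, then conserved again
profile = [1] * 30 + [0] * 20 + [1] * 30
smoothed = window_average(profile, window=25)
```

Truncating the window keeps the smoothed profile the same length as the sequence; padding or dropping the ends would be equally defensible choices.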
Because orthologous proteins in different species typically bind the same or similar endogenous ligands,[8] the conservation of their binding pockets is expected to far exceed their full-length sequence conservation, and they are also likely to bind the same exogenous chemicals. The aim of this research was to identify the aquatic organisms
(from the set of D. rerio, P. promelas, T. rubripes, X. laevis, and X. tropicalis) that share the highest binding pocket similarity
with humans in each of the 28 best-characterized toxicity targets.
X-ray crystal structures were used to identify the amino-acid residues
constituting the ligand-binding pockets, which were extrapolated to
the aquatic orthologs. Sequence similarity and identity were calculated
for the ligand-binding sites, and the most similar orthologs to the
28 human toxicity targets were identified.
Materials and Methods
Selection of Human EDC and ADR Targets
An initial set of 85 unique human proteins that have been previously characterized as side-effect and toxicity targets was compiled from the 73 protein
assays listed in the Novartis in vitro safety panels
(Table S1, Supporting Information), 11
targets from the VirtualToxLab,[18,19] and the Constitutive
Androstane Receptor (CAR; NR1I3). All 85 proteins were used for sequence
analyses. For binding pocket similarity analyses, the 85 targets were
matched against the Pocketome encyclopedia (http://pocketome.org),[20] a collated set of annotated, binding
pocket structure ensembles from the Protein Data Bank (PDB).[21] At the time of this study, 28 out of the 85
targets had Pocketome entries for their ligand-binding pockets available
(Table S1, Supporting Information) that
contained at least one cocrystallized ligand making it possible to
precisely identify the binding site residues. These 28 targets were
used for binding site similarity and identity comparisons.
Identification of Orthologs of Human EDC and ADR Targets in the Aquatic Species
The complete proteomes of D.
rerio, Mus musculus (mouse), P.
promelas, Rattus norvegicus (rat), T. rubripes, X. laevis, and X.
tropicalis were downloaded in FASTA format from the UniProt
Knowledgebase.[22] The M. musculus and R. norvegicus results have been included in
all Supporting Information for comparison
purposes. For each of the files, a BLAST search index was generated
using the bioinformatics module of the Internal Coordinate Mechanics
(ICM) software version 3.7-3a (Molsoft L.L.C., La Jolla, CA).[23,24] A BLAST search[25] was performed to identify
orthologs of the 85 human proteins in the corresponding aquatic species.
One hit per target per species was retained using the following prioritization
rules: (i) manually annotated orthologs of the toxicity and side-effect
targets were retained with the highest priority; (ii) for automatically
annotated entries, the ortholog with the same gene name as the human protein and the highest probability score to the human protein was
kept; (iii) if only sequence fragments were available, the longest
fragment was retained.
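The three prioritization rules can be expressed as a short selection function. This is a sketch under stated assumptions: the Hit record and its fields (gene name, score, length, a manual-annotation flag, and a fragment flag) are hypothetical stand-ins for whatever the BLAST and UniProt output provides, orthology is judged by gene name alone, and rule (iii) is assumed to apply only to hits carrying the human gene name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One BLAST hit in an aquatic proteome (hypothetical record layout)."""
    accession: str
    gene: str        # gene name annotated on the hit
    score: float     # BLAST score against the human query; higher is better
    length: int      # number of residues in the hit sequence
    reviewed: bool   # True for manually annotated entries
    fragment: bool   # True if the entry is only a sequence fragment


def select_ortholog(human_gene: str, hits: List[Hit]) -> Optional[Hit]:
    """Keep one hit per target per species following rules (i)-(iii)."""
    same_gene = [h for h in hits if h.gene.upper() == human_gene.upper()]
    # (i) manually annotated orthologs have the highest priority
    reviewed = [h for h in same_gene if h.reviewed]
    if reviewed:
        return max(reviewed, key=lambda h: h.score)
    # (ii) automatically annotated full-length entries: best score wins
    full_length = [h for h in same_gene if not h.fragment]
    if full_length:
        return max(full_length, key=lambda h: h.score)
    # (iii) only fragments remain: keep the longest one
    if same_gene:
        return max(same_gene, key=lambda h: h.length)
    return None  # no ortholog identified for this species
```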
Sequence Alignment and Analysis
Pairwise alignments were constructed between the full sequence of each human protein and the corresponding orthologs, and pairwise sequence scores were calculated with the Needleman–Wunsch algorithm[26] modified for zero end-gap penalties (the ZEGA algorithm[27]), as implemented in the ICM program. We used gap opening and gap extension penalties of 2.4 and 0.15, respectively.
Sequence identity was represented by the number of identical residues
over the total number of aligned residues. Sequence similarity was
calculated using the GONNET residue substitution comparison matrix.[28]
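The identity definition above (identical residues over aligned residues) can be made concrete with a short function that operates on an existing gapped pairwise alignment. It is an illustration only: it assumes "-" as the gap character, does not reproduce the ZEGA alignment itself, and omits the GONNET-based similarity, whose conversion into a percentage is not described here.

```python
from typing import Tuple


def alignment_identity(aligned_human: str, aligned_ortholog: str,
                       gap: str = "-") -> Tuple[int, int, float]:
    """Return (identical, aligned, percent identity) for two gapped sequences."""
    if len(aligned_human) != len(aligned_ortholog):
        raise ValueError("aligned sequences must have equal length")
    identical = aligned = 0
    for a, b in zip(aligned_human, aligned_ortholog):
        if a == gap or b == gap:
            continue  # a gap column is not an aligned residue pair
        aligned += 1
        if a == b:
            identical += 1
    percent = 100.0 * identical / aligned if aligned else 0.0
    return identical, aligned, percent


print(alignment_identity("MKT-LV", "MRTALV"))  # (4, 5, 80.0)
```

The first two returned values correspond to the "identical/aligned" counts shown in parentheses in Figure 2.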
Binding Site Definition and Classification Using Ligand Contact Strength Fingerprints
For each ligand in the Pocketome entry and each non-hydrogen atom in the protein, distance-dependent contact strengths were calculated using the parameters developed in the context of the GPCR Dock 2010 evaluation.[29,30] The per-atom contact strengths were aggregated into per-residue contact strength values by summing over all non-hydrogen atoms in the residue side chain. Only residue side chains were included in the calculation because, except for proline, ligand contacts with backbone atoms are unlikely to be affected by residue substitutions between species. If a ligand was cocrystallized in multiple structures, its vectors of per-residue contact strengths were averaged. To reduce noise and binding site definition artifacts associated with increased conformational variability of individual residues, the contact strength vector components were multiplied by a factor ranging from 0 to 1 and inversely proportional to the observed conformational variability of the corresponding residue
in the Pocketome ensemble.
Each unique ligand $L_i$ was characterized by a vector $FP_i$ of per-residue contact strengths ranging from 0 (no contact) to 32 (extensive close contact with Phe168 in the adenosine A2A receptor (A2AR); Table S1, Supporting Information). The normalized fingerprint distance between ligands $L_i$ and $L_j$ was calculated as
$$D_{ij} = 1 - \frac{\sum \min(FP_i, FP_j)}{\sum (FP_i + FP_j)/2}$$
where $\min(FP_i, FP_j)$ and $(FP_i + FP_j)/2$ are the vectors of element-wise minima and element-wise averages of $FP_i$ and $FP_j$, respectively, and the sums run over all residues.[30] Defined this way, ligand fingerprint distances range from 0 (for identical fingerprints) to 1 (for nonoverlapping fingerprints). Ligand interaction fingerprints were clustered at a distance cutoff of $D = 0.35$ to identify classes of ligands occupying distinct areas of the binding site. This cutoff was found to be the optimal trade-off between an excessive number of clusters and the unwanted aggregation of substantially different ligand chemotypes in multiple targets. It implies that ligands are classified as belonging to different clusters if their fingerprints differ by about one-third (or more) of their contacts.
Next, clusters of unique crystallographic ligands were ordered by size, starting with the most populated one and ending with singletons (i.e., clusters containing only a single ligand). The top clusters, together containing 80% of the ligands, were combined to define the set of residues interacting with the majority of the ligands. The remaining 20% were disregarded in the pocket definition to ensure that it is not affected by occasional or spurious contacts.
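The fingerprint distance, the 0.35 cutoff, and the 80% coverage rule can be sketched together as follows. The distance function implements the formula given above; the clustering method is not specified in the text, so single-linkage merging of ligands whose distance is at or below the cutoff is an assumption, as is treating any residue with a nonzero contact strength as part of the pocket. The per-residue contact strengths themselves (from the GPCR Dock 2010 parameters) are taken as input rather than recomputed.

```python
from typing import List, Sequence


def fingerprint_distance(fp_i: Sequence[float], fp_j: Sequence[float]) -> float:
    """D_ij = 1 - sum(min(FP_i, FP_j)) / sum((FP_i + FP_j) / 2), element-wise."""
    overlap = sum(min(a, b) for a, b in zip(fp_i, fp_j))
    mean_total = sum((a + b) / 2.0 for a, b in zip(fp_i, fp_j))
    if mean_total == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - overlap / mean_total


def cluster_ligands(fps: List[Sequence[float]], cutoff: float = 0.35) -> List[List[int]]:
    """Single-linkage clustering (assumed): ligands within the cutoff share a cluster."""
    parent = list(range(len(fps)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(fps)):
        for j in range(i + 1, len(fps)):
            if fingerprint_distance(fps[i], fps[j]) <= cutoff:
                parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(fps)):
        clusters.setdefault(find(i), []).append(i)
    # Most populated clusters first, singletons last
    return sorted(clusters.values(), key=len, reverse=True)


def core_pocket(fps, residues, cutoff=0.35, coverage=0.8):
    """Contact residues of the largest clusters that together hold >= 80% of ligands."""
    selected = set()
    covered = 0
    for members in cluster_ligands(fps, cutoff):
        if covered >= coverage * len(fps):
            break  # the remaining, smaller clusters are ignored
        covered += len(members)
        for idx in members:
            selected.update(r for r, s in zip(residues, fps[idx]) if s > 0)
    return [r for r in residues if r in selected]  # keep sequence order
```

With single linkage, a chain of ligands that are each close to the next can merge into one cluster even when its ends differ by more than the cutoff; complete or average linkage would avoid that at the cost of more clusters.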
Binding Pocket Sequence Identity and Similarity Calculations
For each subpocket in
the binding site, as determined by ligand
contact strength fingerprint clustering, a subalignment was extracted
by projecting the full sequence alignment between human and ortholog
sequences onto the corresponding residue selection. Binding pocket/subpocket
identity and similarity were calculated from these subalignments using
the same parameters as the full sequence alignments. The same was done for the set of residues forming the interaction site(s) of at least 80% of the ligands, as described above, which therefore represents the aggregate of the consistently populated regions of the pocket. The comparison of complete pockets (including the interaction fingerprints of all crystallographic ligands) is available in the Supporting Information.
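Projecting the full-length alignment onto a pocket or subpocket residue selection can be sketched as below. The sketch assumes that pocket residues are given as 1-based positions in the human sequence used for the alignment (in practice, crystal-structure numbering would first have to be mapped onto that sequence) and, as for the full sequences, counts only columns in which both sequences have a residue.

```python
from typing import Dict, List, Sequence, Tuple


def project_alignment(aligned_human: str, aligned_ortholog: str,
                      pocket_positions: Sequence[int],
                      gap: str = "-") -> List[Tuple[str, str]]:
    """Return (human residue, ortholog residue or gap) for each selected position."""
    column_of: Dict[int, Tuple[str, str]] = {}
    position = 0
    for h, o in zip(aligned_human, aligned_ortholog):
        if h == gap:
            continue  # insertion in the ortholog: no human residue here
        position += 1
        column_of[position] = (h, o)
    return [column_of[p] for p in pocket_positions if p in column_of]


def subalignment_identity(pairs: List[Tuple[str, str]], gap: str = "-") -> float:
    """Percent identity over the aligned (non-gap) pairs of the subalignment."""
    aligned = [(h, o) for h, o in pairs if o != gap]
    if not aligned:
        return 0.0
    return 100.0 * sum(h == o for h, o in aligned) / len(aligned)


pairs = project_alignment("MKTAYIAKQR", "MRTAY-AKER", pocket_positions=[2, 3, 6, 9])
print(pairs)  # [('K', 'R'), ('T', 'T'), ('I', '-'), ('Q', 'E')]
```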
Results
Orthologs of Human EDC and ADR Targets in Aquatic Vertebrates
Five fish and amphibian species frequently used in toxicological evaluations were used in this study: D. rerio, P. promelas, T. rubripes, X. laevis, and X. tropicalis. In their proteomes, we identified the orthologs of the known human side-effect and environmental target proteins. Orthologs could not be found in every case: 89% of the toxicity targets were identified in D. rerio, 20% in P. promelas, 84% in T. rubripes, 51% in X. laevis, and 85% in X. tropicalis (Table S1, Supporting Information). This may be explained by the fact that only the genomes of D. rerio,[31,32] T. rubripes,[33] and X. tropicalis[34] have been fully sequenced, while the remaining two genomes (P. promelas and X. laevis), and thus proteomes, are incomplete.
Additionally, in some cases, only protein fragments of the toxicity
target orthologs have been identified. The sequences of the human
and orthologous toxicity proteins were aligned, and the full sequence
similarity was calculated (Figure 2a, Table
S1, Supporting Information).
Figure 2
Sequence similarity
(percentage and color) and sequence identity
(number of identical residues/number of aligned residues is shown
in parentheses) for the 28 toxicity target proteins of (a) the full
sequence and (b) the ligand-contact residues conserved for 80% of
the cocrystallized ligands. White spaces indicate that no ortholog
was identified (often due to an incomplete proteome).
Full Sequence Similarity between Human EDC/ADR Targets and Their Orthologs in Aquatic Vertebrates
The relevance of a model organism for the prediction of toxicity in humans has previously been evaluated using amino acid conservation across entire protein sequences (e.g., ref (10)). In the present study, the majority of the human toxicity targets displayed 60–70% sequence similarity with their aquatic vertebrate orthologs (Figure 2a). The average full sequence similarity between the human proteins and the aquatic orthologs was 69% for D. rerio, 63% for P. promelas, 70% for T. rubripes, 71% for X. laevis, and 72% for X. tropicalis (Figure S1, Supporting Information). In some cases, the overall sequence similarity was relatively high. For example, X. tropicalis had the highest full sequence similarity for the androgen receptor (AR, 88%). However, the X. tropicalis AR sequence was only a fragment that lacked the N-terminal domain present in the other species, which artificially inflated its sequence similarity. The corticotropin-releasing factor receptor 1 (CRF1R) is highly conserved in four species (∼85% sequence similarity). The interspecies variations in full sequence similarity were more informative for the estrogen receptors α and β (ERα and ERβ, respectively) and the glucocorticoid receptor (GR), where the full sequences were similar in length. For these receptors, X. laevis and X. tropicalis shared higher conservation with the human proteins (9–24% higher sequence similarity) than did D. rerio, P. promelas, and T. rubripes. The impact of sequence-length variability on full sequence similarity illustrates the difficulty of using the full protein sequence (or the longest available sequence) in these calculations.
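The fragment effect described above can be illustrated with identity scores on toy sequences. The sequences below are made up: a 10-residue divergent N-terminal region followed by a 10-residue conserved domain. Because identity is computed over aligned residues only, an ortholog fragment that lacks the divergent region scores higher than the full-length ortholog, even though both match the same fraction of the human sequence. Normalizing by the human sequence length, shown here as an assumed alternative rather than the method used in this study, removes that inflation.

```python
from typing import Tuple


def identity_two_ways(aligned_human: str, aligned_ortholog: str,
                      gap: str = "-") -> Tuple[float, float]:
    """Percent identity normalized by aligned residues and by full human length."""
    pairs = [(h, o) for h, o in zip(aligned_human, aligned_ortholog)
             if h != gap and o != gap]
    identical = sum(h == o for h, o in pairs)
    human_length = sum(h != gap for h in aligned_human)
    per_aligned = 100.0 * identical / len(pairs) if pairs else 0.0
    per_human_length = 100.0 * identical / human_length if human_length else 0.0
    return per_aligned, per_human_length


human = "PSTPSTPSTP" + "GAVLIMFWGA"          # divergent N-terminus + conserved domain
full_ortholog = "AAAAAAAAAA" + "GAVLIMFWGA"
fragment = "-" * 10 + "GAVLIMFWGA"           # fragment lacking the N-terminal region

print(identity_two_ways(human, full_ortholog))  # (50.0, 50.0)
print(identity_two_ways(human, fragment))       # (100.0, 50.0)
```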
Ligand-Binding Pocket Similarity between Human EDC/ADR Targets and Their Orthologs in Aquatic Vertebrates
As expected, the ligand-binding pockets of the orthologous proteins generally shared higher sequence conservation with the human toxicity targets than the full protein sequences did (Figure 2b). For example, the ligand-binding site of the human AR shared ∼98% sequence similarity with all five species, whereas the full sequence similarity was only 47–88%. Likewise, the binding sites of ERα, ERβ, and GR are 92–100% conserved in all five aquatic species, while the highest full sequence conservation, observed in X. laevis and X. tropicalis, did not exceed 70–76%. The ranking of species by full sequence similarity to humans often differs from the ranking by binding pocket similarity. For example, on the basis of full sequence similarity, one would choose X. laevis or X. tropicalis as the most relevant model for testing ERα-targeting chemicals; however, our pocket similarity analysis indicates that all five species are almost equally suitable, with the fish species having a slight advantage over the frogs. Similarly, despite being the most similar to human in terms of the full β2 adrenergic receptor (β2AR) sequence, X. tropicalis is probably the least accurate of the five models for evaluating β2AR ligand pharmacology, as it has as many as 5 residue substitutions in the binding pocket (Figure S3, Supporting Information).
Surprisingly, two targets had lower sequence conservation in the binding site than in the full sequence. These were the
obesity- and stress-related targets, peroxisome proliferator-activated
receptor γ (PPARγ) and CRF1R. PPARγ displayed
lower binding-site similarity (56–85%) than full sequence similarity
(74–89%). CRF1R displayed higher sequence similarity
across the full protein sequence (∼85%) than in the peptide-binding
site in its extracellular domain (46–78%). However, GPCRs often
have a greater degree of sequence variability in the extracellular
domains; hence, the lower sequence similarity in the peptide-binding
site of CRF1R is consistent with the nature of this receptor.
Ligand-Binding Pockets in ADR/EDC Targets: One Size Does Not Fit All
On closer inspection of the ligand-binding interactions
in the X-ray crystal structures of the human EDC and ADR targets,
there were often noticeably different residue interaction fingerprints
for different ligand chemotypes. In some cases, different chemotypes
can bind to distinct ligand-binding pockets or “sub-pockets”
of the proteins.
This is exemplified by the identification of
three different subpockets of the adenosine A2A receptor
(A2AR). Promisingly, the three subpockets identified for
A2AR (Figure 3a) correspond to an
agonist-bound structure (Figure 3b), the endogenous
agonist-bound structure (Figure 3c), and the
antagonist-bound structures (Figure 3d), respectively.
All subpockets were fully conserved in X. laevis and X. tropicalis. Additionally, significant variations in the
conservation of subpockets can be observed for the ortholog of β2AR in X. tropicalis (Figure S3, Supporting Information), where subpocket 1 displays
75% conservation, yet subpocket 2 has only 48% sequence similarity.
Figure 3
(a) Sequence
similarity (percentage and color) and sequence identity
(number of identical residues/number of aligned residues is shown
in parentheses) for the three A2AR subpockets (white spaces
indicate that no ortholog was identified). A2AR crystal
structures (gray ribbons), all cocrystallized ligands (mesh), and
subpocket (solid surface); (b) subpocket 1 (agonist-bound structures),
(c) subpocket 2 (the endogenous agonist-bound structure), and (d)
subpocket 3 (antagonist-bound structures).
Because the likelihood of a chemical interacting with an
aquatic
species ortholog of its target protein largely depends on the conservation
of specific interacting residues and not the entire binding site,
we sought to identify the individual subpockets in each of the target
pockets and to separately evaluate their similarity to the corresponding
subpockets in the studied aquatic organisms. Subpockets were identified
by the clustering of contact-strength fingerprints (see Materials and Methods).
GPCR Subpocket Sequence Conservation
GPCRs are a superfamily of membrane-bound proteins characterized by seven transmembrane (TM) helices, and many have been implicated in ADRs, endocrine disruption,
and reproductive toxicity.[35] The A2AR is implicated in a number of ADRs such as palpitations
and angina.[18] ADRs for the β2 adrenergic receptor (β2AR) include tremor,
cardiac failure, and angina;[18] it has also
been implicated in ED in aquatic vertebrates.[4,36] The
serotonin 2B receptor (5-HT2BR) is linked to valvular heart
disease;[37] the histamine H1 receptor
(H1R) is involved in sedation, and the human M2 muscarinic acetylcholine receptor (M2R) is associated
with constipation.[18] The dopamine D3 receptor (D3R) is implicated in dyskinesia and
Parkinsonism[18] and shown to bind the known
endocrine disruptor BPA.[38] Two class B
GPCRs were also evaluated: CRF1R, which is implicated in
stress-related disorders,[39,40] and the gastric inhibitory
polypeptide receptor (GIPR), which is implicated in diabetes and obesity.[41]
Two subpockets were identified for β2AR (Figure S3, Supporting Information): the classical orthosteric site (subpocket 1) and the orthosteric site with some additional residues from the less conserved TM1/TM2/TM7 region (subpocket 2). Generally, X. tropicalis displayed poor ligand-binding pocket conservation with the human β2AR (75% and 48% for subpockets 1 and 2, respectively). Because multiple crystal structures were not available for many GPCRs, subpockets could not be explored for 5-HT2BR, D3R, H1R, the κ opioid receptor (κOR), and M2R; however, their binding pockets were generally well conserved (69–100%; Figures 2b and S4, Supporting Information).
At the time of this study, crystal structures were only available
for the extracellular domains of the GPCRs CRF1R and GIPR,
which contain the peptide-binding sites. These peptide-binding sites
were expected to have lower levels of conservation because it is well
established that the extracellular domains of GPCRs have a large degree
of sequence variability. Only X. tropicalis had a moderately conserved ortholog of GIPR (61%, Figure S4, Supporting Information), indicating that alternative animal models should also be investigated. For CRF1R, X. laevis and X. tropicalis displayed higher ligand-binding pocket similarity across both subpockets (60–78%, Figure S4, Supporting Information). However, it is unlikely
that peptides in the environment would result in endocrine disruption
via the peptide-binding site of CRF1R and GIPR in either
humans or the fish and amphibians evaluated in this study, as potential
ED peptides are unlikely to be readily absorbed. Consequently, this
technique should also be applied to the small molecule binding site
of GIPR when a structure becomes available and to the recently released
structure of CRF1R.[42]
Nuclear Receptor Subpocket Conservation
Nuclear receptors
are a superfamily of proteins that regulate development, growth, and
homeostasis, and they are commonly implicated in endocrine disruption.
Some classic examples of ED that occur via nuclear
receptors include the weak agonistic activity of the plasticizer bisphenol
A (BPA) against the ERα;[43] the feminization
of fish by 17α-ethinylestradiol (EE2), a synthetic
estrogen in human contraceptives;[44] and
modulation of PPARγ by EDCs, which is implicated in obesity.[45]
ERα subpockets were generally highly conserved across the aquatic species (94–100%, Figure S5, Supporting Information), with the exception of subpocket 8 in T. rubripes (88%), which is occupied by a large estradiol metal chelate ligand. The binding pocket of ERβ across
the five species, compared to the human ERβ, was generally highly
conserved (92–100%, Figure S4, Supporting
Information). However, across all the subpockets, T.
rubripes was slightly less conserved (92–95% vs. 98–100%).
ERR1 has only been cocrystallized with two unique ligands in two unique
subpockets (Figure S4, Supporting Information), with subpocket 2, cocrystallized with a thiazolidinedione, having
higher sequence conservation (82–89% vs. 54–60%). The
subpockets of the glucocorticoid receptor (GCR) were generally well
conserved with the human receptor (91–98%, Figure S4, Supporting Information). The binding sites of
the progesterone receptor (PR) for X. laevis and X. tropicalis shared slightly higher pocket conservation
with the human receptor (98–100%, Figure S4, Supporting Information). The subpockets of the androgen receptor
(AR) were highly conserved (96–100%, Figure S6, Supporting Information), and the subpockets of
the thyroid hormone receptor β (TRβ) were fully conserved
(100%, Figure S4, Supporting Information). Unlike TRβ, the thyroid hormone receptor α (TRα)
was not fully conserved across all species (86–100%;
Figure S4, Supporting Information). All
subpockets across all species (except P. promelas, for which no ortholog was identified) were fully conserved for the
Liver X Receptor (LXR; Figure S4, Supporting Information). While no subpockets were identified for the mineralocorticoid
receptor (MCR; Figure S4, Supporting Information), X. tropicalis had the lowest LBD similarity (81%).
Of the five aquatic species, T. rubripes consistently displayed the highest similarity to the human pregnane X receptor (PXR; 54–64%,
Figure S7, Supporting Information). Despite
this, the overall pocket similarity was relatively low (maximum 64%),
indicating that PXR is not well conserved in these aquatic vertebrates
and that other animal models with higher binding site conservation
should also be investigated. Similarly, low binding-pocket conservation
was observed for the Constitutive Androstane Receptor (CAR; 35–43%;
Figure S4, Supporting Information). In
15 out of the 16 subpockets, X. tropicalis had the
highest ligand-binding pocket sequence similarity to the human PPARγ
(81–100%; Figure S8, Supporting Information). Interestingly, X. laevis, a close relative of X. tropicalis, had substantially lower ligand-binding pocket
sequence similarity (50–80%).
Cytochrome P450 Subpocket Sequence Conservation
Cytochrome
P450s (CYPs) are a superfamily of enzymes that catalyze the oxidation
of a diverse range of organic compounds and are commonly involved
in the metabolism of xenobiotic compounds. CYPs typically have large
and conformationally flexible binding sites in order to accommodate
a wide range of chemically dissimilar compounds,[46,47] consistent with the diverse array of subpockets identified here.
Closely related orthologs of the human CYP1A2 were found, with D. rerio having the highest pocket similarity (96%, Figure
S4, Supporting Information). Both D. rerio and P. promelas had closely related
orthologs of CYP3A4 across five out of six subpockets (89–100%,
Figure S9, Supporting Information). X. laevis and X. tropicalis had the highest
subpocket similarities for CYP2C9 (60–78%); however, the ligand-binding
pocket conservation was moderate (Figure S4, Supporting
Information). Orthologs of CYP2D6 were only identified in X. laevis and X. tropicalis, which displayed
good conservation to the human protein (Figure S4, Supporting Information).
Subpocket Sequence Conservation of Other Enzymes
Monoamine
oxidase A (MAO-A) is involved in the catabolism of neurotransmitters
and dietary amines; inhibition can lead to neuroendocrine disruption,[48] and it is implicated in ADRs including psychosis
and hypertensive crisis.[49] ADRs associated
with cAMP-specific 3′,5′-cyclic phosphodiesterase 4D
(PDE4D) include diarrhea and nausea,[50] and
due to its role in the endocrine system, PDE4D may also be a target
for EDCs.[51] The binding site of PDE4D was
fully conserved across the identified ortholog binding sites (100%,
Figure S4, Supporting Information). The
subpockets of MAO-A, in contrast, were not fully conserved, with D. rerio and T. rubripes displaying the highest sequence similarity (95%, Figure S4, Supporting Information).
Discussion
The present study compares 28 human toxicity targets with their orthologs in five aquatic species, with the goal of identifying the aquatic organisms whose ligand-binding pockets have the highest sequence similarity to those of the human toxicity targets. The comparison was performed not only at the level of full protein sequences but also, more relevantly, at the level of the ligand-binding sites. Using the X-ray crystal structures of the human toxicity targets, residue-level interaction fingerprints were calculated for each unique cocrystallized ligand, and binding pockets and spatially distinct subpockets were identified; each residue selection was then extrapolated onto the orthologous proteins in the five aquatic vertebrates. In some cases, the contact fingerprints could also separate the toxicity target crystal structures according to the mode of action of the cocrystallized ligands (e.g., A2AR; Figure 3), providing a basis for understanding subpocket sequence conservation.
We identified the aquatic vertebrate(s) whose ligand-binding pockets share the highest sequence similarity with those of the human toxicity targets (Table 1) and determined the sequence similarity of the spatially distinct subpockets. Of the five aquatic species, X. tropicalis had the largest number of orthologs sharing the highest conservation with the human toxicity targets, with the highest ligand-binding site similarity for 21 of the 28 toxicity targets, closely followed by D. rerio (19), T. rubripes (19), and X. laevis (18). P. promelas had the fewest highly conserved ligand-binding pockets, with only 7 ligand-binding sites of high similarity, which can be partially attributed to an incomplete genome.
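The fingerprint-based steps described above (a binding pocket built from per-ligand contacts, and crystal structures separated by ligand mode of action) can be illustrated with a small sketch. It assumes, for illustration only, that a residue-level fingerprint is simply the set of human residue numbers a ligand contacts, that the pocket is the union of these sets, and that ligands are grouped by single-linkage on Jaccard (Tanimoto) similarity with an arbitrary threshold. The fingerprints, residue numbers, ligand names, and threshold are made up, and this is not the clustering procedure used in the study.

```python
from itertools import combinations

# Hypothetical residue-level contact fingerprints: the set of (made-up)
# human residue numbers contacted by each cocrystallized ligand.
fingerprints = {
    "agonist_1": {10, 12, 30, 31, 50, 53, 77, 78},
    "agonist_2": {10, 12, 30, 31, 50, 53, 77, 81},
    "antagonist_1": {30, 31, 46, 50, 53, 70, 74},
    "antagonist_2": {30, 31, 46, 50, 53, 71, 74},
}


def jaccard(a: set, b: set) -> float:
    """Jaccard (Tanimoto) similarity of two residue sets."""
    return len(a & b) / len(a | b)


def binding_pocket(fps: dict) -> set:
    """Pocket = union of all residues contacted by any ligand."""
    return set().union(*fps.values())


def group_ligands(fps: dict, threshold: float) -> list:
    """Single-linkage grouping: ligands share a group if their fingerprint
    similarity is >= threshold, directly or through a chain of ligands."""
    parent = {name: name for name in fps}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(fps, 2):
        if jaccard(fps[a], fps[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in fps:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


groups = group_ligands(fingerprints, threshold=0.6)
print(groups)  # [['agonist_1', 'agonist_2'], ['antagonist_1', 'antagonist_2']]
print(len(binding_pocket(fingerprints)))  # 13
```

With these toy sets, fingerprints within each ligand class overlap strongly (Jaccard 7/9 and 6/8), whereas cross-class pairs share only 4 of 11 residues, so the two classes separate at the chosen threshold; lowering the threshold to 0.3 merges them into one group.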
Table 1
Identification of the Aquatic Vertebrate Model(s) with the Highest Ligand-Binding Pocket Similarity (Denoted by X) Compared to the Corresponding Human Toxicity Targetᵃ
aquatic vertebrate model(s) with the highest ligand-binding pocket similarity
receptorᵃ
D. rerio
P. promelas
T. rubripes
X. laevis
X. tropicalis
5-HT2BR
X
X
A2AR
X
X
X
X
M2R
X
X
X
β2AR
X
X
X
X
AR
X
X
X
X
X
MAO-A
X
X
CYP1A2
X
X
CYP2C9*
X
X
CYP2D6
X
X
CYP3A4
X
X
CRF1R
X
X
D3R
X
X
ERR1
X
X
X
X
ERα
X
X
X
X
X
ERβ
X
X
X
X
X
GCR
X
X
X
X
X
GIPR
X
H1R
X
X
X
MCR
X
X
X
LXR
X
X
X
X
PXR*
X
X
X
CAR*
X
X
κOR
X
X
X
PDE4D
X
X
X
PPARγ
X
PR
X
X
TRα
X
X
X
TRβ
X
X
X
X
X
∗ indicates targets where
other species should be investigated due to orthologs with only low
or moderate pocket similarity.
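Table 1 allows more than one species to be marked per target, and the per-species totals quoted in the Discussion (21, 19, 19, 18, and 7) are column counts of these marks. The rule for marking several species per row is not stated in this section, so the sketch below assumes, hypothetically, that a species is marked when its pocket similarity lies within a chosen `tolerance` (in percentage points) of the best ortholog, and that missing orthologs (`None`) are never marked. The target names and similarity values are invented and do not correspond to any row of Table 1.

```python
from collections import Counter

SPECIES = ["D. rerio", "P. promelas", "T. rubripes", "X. laevis", "X. tropicalis"]


def top_species(scores: dict, tolerance: float = 0.0) -> set:
    """Species whose pocket similarity is within `tolerance` percentage
    points of the best ortholog; None marks a missing ortholog."""
    available = {sp: s for sp, s in scores.items() if s is not None}
    if not available:
        return set()
    best = max(available.values())
    return {sp for sp, s in available.items() if best - s <= tolerance}


def tally(table: dict, tolerance: float = 0.0) -> Counter:
    """Count, per species, the targets for which it would be marked with an X."""
    counts = Counter({sp: 0 for sp in SPECIES})
    for scores in table.values():
        counts.update(top_species(scores, tolerance))
    return counts


# Made-up pocket similarities (%) for three hypothetical targets.
table = {
    "target_A": {"D. rerio": 96.0, "P. promelas": 92.0, "T. rubripes": 88.0,
                 "X. laevis": 80.0, "X. tropicalis": 82.0},
    "target_B": {"D. rerio": 100.0, "P. promelas": None, "T. rubripes": 100.0,
                 "X. laevis": 95.0, "X. tropicalis": 100.0},
    "target_C": {"D. rerio": 70.0, "P. promelas": 65.0, "T. rubripes": 72.0,
                 "X. laevis": 78.0, "X. tropicalis": 77.0},
}

print(dict(tally(table)))
print(dict(tally(table, tolerance=1.0)))
```

With exact ties only, target_B marks three species and the column totals sum to five; a 1-point tolerance additionally marks X. tropicalis for target_C. The choice of tolerance therefore directly changes the per-species counts, which is why the marking rule matters when comparing totals across species.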
In this study, we demonstrated that the major difficulty in using full sequence similarity to compare toxicity target orthologs with human proteins arises from variations in the length of the amino acid sequences. For example, while X. tropicalis has the highest full sequence similarity for AR (88%), the longest available sequence of the X. tropicalis AR is actually incomplete: it lacks the N-terminal region of the protein, including the DNA-binding domain (393 residues vs. >729 residues), which artificially inflates the sequence similarity. The same issue affected some of the aquatic orthologs of MCR, PDE4D, PPARγ, PR, TRα, and TRβ. Additionally, we have shown that high full sequence similarity does not always correlate with high ligand-binding site conservation. For example, the sequence similarity of the extracellular domains of CRF1R is high for all species (∼85%), yet the peptide-binding sites have lower conservation (46–78%). In general, we have demonstrated that ligand-binding sites are more conserved between orthologs than the full sequences are (Figure 2). Consequently, we have also shown that ligand-binding site similarity is the preferred measure for identifying the most conserved orthologs, because it is more informative than full sequence similarity and is not influenced by variations in the length of the longest available amino acid sequence of an ortholog. If full sequence similarity alone is to be considered, variations in the length of the full (or longest available) sequence should also be incorporated into these assessments.
There are a few caveats that need to be taken into consideration
when using orthologous sequence comparisons to aid in the selection of animal models for toxicity evaluation. First, the proposed principles only suggest toxicity target orthologs in aquatic species on the basis of sequence similarity, without regard to possible variations in protein function or downstream pathways.[10,52] In particular, this method provides no detail on the signaling pathways of the orthologous proteins and will, of course, require a certain level of understanding of the animal model. Binding pocket similarity may be a necessary but not a sufficient condition for model utility, as exemplified by the human and rat ARs: a large-scale study of interspecies variations in the binding affinity of chemicals[17] identified this pair as having systematic one-log-unit differences in the potency of multiple diverse chemicals, even though not only the binding pockets but also the entire ligand-binding domain of AR is strictly conserved between human and rat. Second, our method relies on the availability of the proteomes of the organisms or, at the very least, of the sequences of the toxicity target orthologs. Third, calculating ligand-binding site conservation requires X-ray crystal structures of the human toxicity targets, preferably in complex with a diverse range of chemicals. Both limitations, the availability of full proteomes and of crystal structures, can be addressed in future studies as these data become increasingly available. This study could thus be expanded to a wider range of toxicity targets and species; toxicity targets that lack crystal structures could be included by using crystal structures of highly homologous proteins.
By calculating
the amino acid similarity in the ligand-binding pockets, we avoided the problem of full-sequence length variability in similarity calculations and determined the aquatic orthologs with the most similar ligand-binding pockets for 28 human toxicity targets. The method also allows the binding site similarity to be calculated for subpockets involved in specific chemical–protein interactions. We believe that this study will be a useful tool for designing target-specific assays to assess the ADR and ED potential of chemicals.
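The length artifact discussed above (an N-terminally truncated X. tropicalis AR sequence appearing more similar than it is) can be reproduced with a toy calculation. The sketch below assumes a pairwise alignment written as gapped strings and uses plain percent identity for simplicity; the function names and the ten-residue sequences are hypothetical illustrations, not the study's data or pipeline. It contrasts identity computed only over aligned columns with identity normalized by the full human length, and reports coverage as one simple way to fold sequence length into the assessment.

```python
def identity_over_aligned(aligned_human: str, aligned_ortholog: str) -> float:
    """Percent identity over columns where both sequences have a residue.
    Regions missing from a truncated ortholog are silently ignored."""
    pairs = [(h, o) for h, o in zip(aligned_human, aligned_ortholog)
             if h != "-" and o != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(h == o for h, o in pairs) / len(pairs)


def identity_over_human_length(aligned_human: str, aligned_ortholog: str) -> float:
    """Percent identity normalized by the full human sequence length,
    so residues missing from the ortholog count against it."""
    human_length = sum(aa != "-" for aa in aligned_human)
    identical = sum(h == o and h != "-"
                    for h, o in zip(aligned_human, aligned_ortholog))
    return 100.0 * identical / human_length


def coverage(aligned_human: str, aligned_ortholog: str) -> float:
    """Percent of human residues that have an aligned ortholog residue."""
    human_length = sum(aa != "-" for aa in aligned_human)
    covered = sum(h != "-" and o != "-"
                  for h, o in zip(aligned_human, aligned_ortholog))
    return 100.0 * covered / human_length


# Made-up ten-residue example.
human = "MKTAYIAKQR"
full_ortholog = "MRTAFIAKQK"       # full length, three substitutions
truncated_ortholog = "------AKQR"  # N-terminal region missing

for name, seq in [("full", full_ortholog), ("truncated", truncated_ortholog)]:
    print(name, identity_over_aligned(human, seq),
          identity_over_human_length(human, seq), coverage(human, seq))
```

Over aligned columns only, the truncated ortholog scores 100% against 70% for the full-length one, even though it covers just 40% of the human sequence. Normalizing by human length (or reporting coverage alongside identity) exposes the truncation, whereas a pocket-restricted score avoids the issue whenever the pocket residues themselves are present.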
H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne. Nucleic Acids Res, 2000-01-01.
Richard L Hauger; Dimitri E Grigoriadis; Mary F Dallman; Paul M Plotsky; Wylie W Vale; Frank M Dautzenberg. Pharmacol Rev, 2003-03.
Kathleen M Giacomini; Shiew-Mei Huang; Donald J Tweedie; Leslie Z Benet; Kim L R Brouwer; Xiaoyan Chu; Amber Dahlin; Raymond Evers; Volker Fischer; Kathleen M Hillgren; Keith A Hoffmaster; Toshihisa Ishikawa; Dietrich Keppler; Richard B Kim; Caroline A Lee; Mikko Niemi; Joseph W Polli; Yuichi Sugiyama; Peter W Swaan; Joseph A Ware; Stephen H Wright; Sook Wah Yee; Maciej J Zamek-Gliszczynski; Lei Zhang. Nat Rev Drug Discov, 2010-03.
J H Postlethwait; Y L Yan; M A Gates; S Horne; A Amores; A Brownlie; A Donovan; E S Egan; A Force; Z Gong; C Goutel; A Fritz; R Kelsh; E Knapik; E Liao; B Paw; D Ransom; A Singer; M Thomson; T S Abduljabbar; P Yelick; D Beier; J S Joly; D Larhammar; F Rosa; M Westerfield; L I Zon; S L Johnson; W S Talbot. Nat Genet, 1998-04.
Jessica J Vamathevan; Matthew D Hall; Samiul Hasan; Peter M Woollard; Meng Xu; Yulan Yang; Xin Li; Xiaoli Wang; Steve Kenny; James R Brown; Julie Huxley-Jones; Jon Lyon; John Haselden; Jiumeng Min; Philippe Sanseau. Toxicol Appl Pharmacol, 2013-04-19.
Maxwell C K Leung; Andrew C Procter; Jared V Goldstone; Jonathan Foox; Robert DeSalle; Carolyn J Mattingly; Mark E Siddall; Alicia R Timme-Laragy. Reprod Toxicol, 2017-03-04.
Daniel L Villeneuve; Doug Crump; Natàlia Garcia-Reyero; Markus Hecker; Thomas H Hutchinson; Carlie A LaLone; Brigitte Landesmann; Teresa Lettieri; Sharon Munn; Malgorzata Nepelska; Mary Ann Ottinger; Lucia Vergauwen; Maurice Whelan. Toxicol Sci, 2014-12.
Carlie A LaLone; Jason P Berninger; Daniel L Villeneuve; Gerald T Ankley. Philos Trans R Soc Lond B Biol Sci, 2014-11-19.
Rebecca H Weissinger; Brett R Blackwell; Kristen Keteles; William A Battaglin; Paul M Bradley. Sci Total Environ, 2018-05-02.
Paul M Bradley; Dana W Kolpin; Kristin M Romanok; Kelly L Smalling; Michael J Focazio; Juliane B Brown; Mary C Cardon; Kurt D Carpenter; Steven R Corsi; Laura A DeCicco; Julie E Dietze; Nicola Evans; Edward T Furlong; Carrie E Givens; James L Gray; Dale W Griffin; Christopher P Higgins; Michelle L Hladik; Luke R Iwanowicz; Celeste A Journey; Kathryn M Kuivila; Jason R Masoner; Carrie A McDonough; Michael T Meyer; James L Orlando; Mark J Strynar; Christopher P Weis; Vickie S Wilson. Environ Sci Technol, 2018-11-21.
Tony Ngo; Andrey V Ilatovskiy; Alastair G Stewart; James L J Coleman; Fiona M McRobb; R Peter Riek; Robert M Graham; Ruben Abagyan; Irina Kufareva; Nicola J Smith. Nat Chem Biol, 2016-12-19.