Gulnara Shavalieva, Stavros Papadokonstantakis, Gregory Peters.
Abstract
Early assessment of the potential impact of chemicals on health and the environment requires toxicological properties of the molecules. Predictive modeling is often used to estimate the property values in silico from pre-existing experimental data, which are often scarce and uncertain. One way to advance the predictive modeling procedure might be to use the knowledge existing in the field. Scientific publications contain a vast amount of knowledge. However, the manual work required to process the enormous volumes of information gathered in scientific articles might hinder its utilization. This work explores the opportunity of semiautomated knowledge extraction from scientific papers and investigates a few potential ways of using it for predictive modeling. The knowledge extraction and predictive modeling are applied to the field of acute aquatic toxicity, an important parameter in the safety assessment of chemicals. The extensive amount of diverse information existing in the field makes acute aquatic toxicity an attractive area for investigating the use of knowledge in predictive modeling. The work demonstrates that the knowledge collection and classification procedure could be useful in hybrid modeling studies for model and predictor selection, addressing data gaps, and evaluating model performance.
Year: 2022 PMID: 35998659 PMCID: PMC9472271 DOI: 10.1021/acs.jcim.1c01079
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 6.162
Figure 1. Knowledge extraction method for a specific domain.[37]
Figure 2. Schematic representation of the use of prior knowledge in the development of hybrid models: PK, prior knowledge; PKM, prior knowledge model (GC+QSAR).[37]
Figure 3. Correlation of data set toxicity (−ln(LC50/EC50)) values with molecular weight.
Set of Rules for Evaluation of the Performances of the Models[37]
| main toxicity trends | expressed in descriptors |
|---|---|
| Toxicity increases with hydrophobicity. | Toxicity increases with an increase of MolLogP (RDKit). |
| Toxicity increases with polarizability. | Toxicity increases with an increase of molar refractivity MR (RDKit). |
| | Toxicity decreases with an increase of GATS1p (PaDELPy). |
| | Toxicity increases with an increase of AATSC0p (PaDELPy). |
| Toxicity has a negative correlation with topological polar surface area. | Toxicity decreases with an increase in TPSA (RDKit). |
| Most of the toxic compounds act as hydrogen-bonding acceptors, while the least toxic compounds act mainly as hydrogen-bonding donors. | Toxic compounds have lower SHBd (PaDELPy). |
| | Toxic compounds have lower maxHBint2 (PaDELPy). |
| There is a positive effect of unsaturation and electronegative atom count. | Toxicity decreases with an increase of ETA_dEpsilon_A (PaDELPy). |
| Toxicity decreases with increase in ionization potential. | Toxicity decreases when Mi (PaDELPy) increases. |
| | The larger the “GATS1i” (PaDELPy), the less likely the compound will be to react and generate toxicity. |
| Molecular size and bulk have positive influences on toxicity. | With an increase of MW (RDKit), the toxicity increases. |
| | Toxicity is higher for higher values of ETA_Alpha (PaDELPy). |
| There is an inverse effect of branching on toxicity. | Toxicity decreases with an increase of ETA_EtaP_B (PaDELPy). |
| Toxicities of primary, secondary, and dimethyl tertiary amines increase with increasing chain length. | Toxicity of molecules containing N or amine group increases if the number of carbon atoms increases. |
| Toxicity increases with increasing alkyl chain length in ethoxylates. | Toxicity of molecules containing the methoxy group increases if the number of carbon atoms increases. |
| Substitution of H atom with a methyl group (−CH3) on the N atom reduces the toxicity of amine surfactants. | The toxicity of molecules decreases with the number of N–CH3 fragments. |
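Each rule in the table pairs a descriptor with an expected direction of its relationship to toxicity, which makes the rules easy to check programmatically. The following is a minimal sketch under stated assumptions: each rule is reduced to a descriptor name and an expected sign, and a model is scored by the fraction of rules for which the Spearman correlation between descriptor values and predicted toxicity has that sign. The helper names (`ranks`, `spearman`, `rule_agreement`), the three-rule subset, and the toy numbers are illustrative only, and this score is not necessarily the rule affinity measure used in the article.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant input carries no rank information
    return cov / (var_x * var_y) ** 0.5


# Expected sign of the descriptor-toxicity relationship for three table rows.
# Reducing a rule to a single sign is an assumption of this sketch.
EXPECTED_SIGN = {"MolLogP": +1, "TPSA": -1, "MW": +1}


def rule_agreement(descriptors, predicted_toxicity, expected=EXPECTED_SIGN):
    """Fraction of rules whose observed correlation sign matches the expected one."""
    hits = 0
    for name, sign in expected.items():
        rho = spearman(descriptors[name], predicted_toxicity)
        if rho * sign > 0:
            hits += 1
    return hits / len(expected)


# Toy values for five hypothetical molecules (not data from the article).
toy_descriptors = {
    "MolLogP": [0.5, 1.2, 2.0, 3.1, 4.0],
    "TPSA": [90.0, 75.0, 60.0, 40.0, 20.0],
    "MW": [100.0, 150.0, 140.0, 220.0, 300.0],
}
toy_predictions = [1.0, 2.0, 2.5, 4.0, 5.5]
score = rule_agreement(toy_descriptors, toy_predictions)
```

Rules about structural fragments, such as the amine and ethoxylate rows, would need a fragment count per molecule in place of a precomputed descriptor; that step is omitted here.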
Figure 4. Knowledge classification scheme for aquatic toxicities of chemicals (*excluding inorganics, metals and metalloorganic compounds, ionic liquids, epoxides, peroxides, and mixtures).[37]
Examples of the Quantitative Knowledge Extracted as Part of the Classification Scheme
| applicability domain | model type | end point | descriptors | performance | ref |
|---|---|---|---|---|---|
| global | linear modeling (MLR+GA) | –logLC50 | AlogP, ELUMO, S2K, nRNH2 | | Pavan et al. (2006) |
| MoA, specifically acting chemicals | linear modeling | log(1/LC50) | | | Raevsky et al. (2009) |
Selected performance indicators from the respective original article.
Examples of the Qualitative Knowledge Extracted as Part of the Classification Scheme
| applicability domain | species, end point | extracted knowledge | ref |
|---|---|---|---|
| substituted benzenes | pIGC50 | positive correlation with end point: MW, nAtomP, TopoPSA | Gupta et al. (2015) |
| | | negative correlation: SHdsCH, lipoaffinity index | |
| pharmaceuticals | LC50 | higher toxicity to […]: keto group, aasC fragment | Kar et al. (2018) |
| | | higher toxicity to fish: keto group, X=C=X fragment, R–C(=X)–X fragment, R–C≡X fragment | |
Summary of the Performances of the Models
| model | Spr_m | accuracy | precision | recall | |
|---|---|---|---|---|---|
| DESC_2 | 0.83 | 0.94 | 0.87 | 0.96 | 0.96 |
| DESC_4 | 0.85 | 0.95 | 0.87 | 0.96 | 0.96 |
| DESC_5 | 0.85 | 0.95 | 0.88 | 0.97 | 0.97 |
| DESC_6 | 0.86 | 0.95 | 0.87 | 0.96 | 0.96 |
| DESC_8 | 0.86 | 0.95 | 0.85 | 0.96 | 0.96 |
| DESC_H0_3 | | | | | |
| DESC_H0_7 | | | | | |
| DESC_H1_2 | | | | | |
| FPN_2 | 0.70 | 0.84 | 0.56 | 0.85 | 0.84 |
| FPN_5 | 0.74 | 0.86 | 0.53 | 0.84 | 0.85 |
| FPN_7 | 0.74 | 0.86 | 0.50 | 0.83 | 0.84 |
| FPN_12 | 0.73 | 0.86 | 0.51 | 0.84 | 0.85 |
| FPN_14 | 0.73 | 0.86 | 0.50 | 0.84 | 0.85 |
| FPN_H2_2 | 0.43 | 0.80 | | | |
| FPN_H2_7 | 0.45 | 0.80 | | | |
| FPN_H2_12 | 0.46 | 0.80 | | | |
| FPN_H3_12 | 0.52 | 0.76 | 0.54 | 0.87 | 0.88 |
| FPN_H3_14 | 0.53 | 0.76 | 0.53 | 0.86 | 0.88 |
DESC and FPN denote the descriptor-based and fingerprint-based models, respectively. H0, H1, H2, and H3 are the types of hybridization applied to the models. The suffixes _2 to _14 indicate the number of closest neighbors used for the prediction. The classification metrics use three labels (T, PT, and NT) and account for label imbalance.
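As a rough illustration of the setup described in the note above, the sketch below predicts a toxicity label from the k closest neighbors of a fingerprint and then computes support-weighted precision and recall over the three labels (T, PT, NT). Tanimoto similarity on sets of "on" bits, majority voting, the helper names, and the toy fingerprints are all assumptions made for illustration; the text here does not state the similarity measure, the aggregation rule, or the exact weighting scheme used in the article.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_label(query, training, k):
    """Majority label among the k training fingerprints most similar to query.

    training is a list of (fingerprint_set, label) pairs. Vote ties go to the
    label met first, i.e. the one belonging to the more similar neighbor.
    """
    nearest = sorted(training, key=lambda item: tanimoto(query, item[0]),
                     reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def weighted_precision_recall(y_true, y_pred, labels=("T", "PT", "NT")):
    """Per-label precision and recall, averaged with weights equal to the
    share of each label in y_true (one common way to handle imbalance)."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = 0.0
    for label in labels:
        true_pos = sum(1 for t, p in zip(y_true, y_pred)
                       if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        actual = support[label]
        label_precision = true_pos / predicted if predicted else 0.0
        label_recall = true_pos / actual if actual else 0.0
        weight = actual / total
        precision += weight * label_precision
        recall += weight * label_recall
    return precision, recall


# Toy fingerprints and labels (hypothetical, not data from the article).
training_set = [
    ({1, 2, 3}, "T"),
    ({1, 2, 4}, "T"),
    ({5, 6, 7}, "NT"),
    ({5, 6, 8}, "NT"),
    ({2, 5, 9}, "PT"),
]
queries = [({1, 2, 3, 4}, "T"), ({5, 6, 7, 8}, "NT"), ({2, 5, 9, 10}, "PT")]

true_labels = [label for _, label in queries]
predicted_labels = [knn_label(fp, training_set, k=2) for fp, _ in queries]
toy_precision, toy_recall = weighted_precision_recall(true_labels, predicted_labels)
```

Weighting by label share keeps a rare label from being averaged on equal footing with a dominant one; a macro average would instead treat all three labels equally.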
Figure 5. Spearman’s correlation coefficient between the descriptors (Table ) and the toxicity predictions made by the models. Red: positive correlation with toxicity; blue: negative. Only two descriptor-based and two fingerprint-based models (best and worst) are shown because the remaining standard models perform similarly to those presented.[37]
Figure 6. Spearman correlation coefficient (Spr_m) (left) and accuracy scores (right) vs rule affinity for the standard and hybrid models.