Irini Furxhi, Finbarr Murphy.
Abstract
Non-testing approaches in nanoparticle hazard assessment are necessary to identify and classify potential risks in a cost-effective and timely manner. Machine learning techniques have been applied in the field of nanotoxicology with encouraging results. This study presents a neurotoxicity classification model for diverse nanoparticles. A data set compiled from multiple literature sources, consisting of nanoparticle physicochemical properties, exposure conditions and in vitro characteristics, is used to predict cell viability. Pre-processing included normalization methods and two supervised instance methods: a synthetic minority over-sampling technique to address biased predictions, and the production of subsamples via bootstrapping. The classification model was developed using random forest, and its performance was evaluated with goodness-of-fit together with additional robustness and predictivity metrics. Information gain analysis identified the exposure dose and duration, toxicological assay, cell type, and zeta potential as the five most important attributes for predicting neurotoxicity in vitro. This is the first tissue-specific machine learning tool for predicting nanoparticle-induced neurotoxicity in in vitro systems. The model performs better than non-tissue-specific models.
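
The two supervised instance methods named in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the function names `smote` and `bootstrap`, the choice of `k`, and the example points are all assumptions made for illustration, and the interpolation only applies to numeric attributes (nominal attributes need a different treatment).

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """SMOTE-style over-sampling: interpolate between a minority-class
    point and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

def bootstrap(rows, seed=0):
    """Draw a subsample of the same size as `rows`, with replacement."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in rows]

# Hypothetical minority-class instances (two numeric attributes each).
minority = [(1.0, -30.0), (5.0, -10.0), (10.0, 5.0), (25.0, 12.0), (50.0, 20.0)]
new_points = smote(minority, n_new=4)
subsample = bootstrap(minority + new_points)
```

Because every synthetic point lies on a segment between two existing minority points, over-sampling of this kind cannot extend the range of any attribute beyond what was observed.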
Keywords: in vitro; machine learning; nanotoxicology; neurotoxicity
Year: 2020 PMID: 32722414 PMCID: PMC7432486 DOI: 10.3390/ijms21155280
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Model development workflow.
Results of toxicological endpoints from in vitro neuronal assays, extracted as outcomes for use in developing the model (list of biological endpoints not exhaustive).
| Mechanism and Results of Toxicity |
|---|
| Pro-inflammatory response |
| Mitochondrial effects |
| Cellular uptake |
| Autophagy |
| Cell viability |
| BBB permeability |
| Genotoxicity |
| Oxidative stress |
| Morphological changes |
Figure 2. Dataset I completeness (percentages) of input parameters and the identified outcomes (left). Dataset II completeness of inputs with one outcome: cellular viability (right).
Input final variables, type of input and information related to the labels and metrics.
| Variable | Type | Min-Max or Labels |
|---|---|---|
| Dose | Numeric | 0.001–800 (μg/mL) |
| Duration | Numeric | 1–168 (h) |
| Nanoparticle | Nominal | FeO, SiO2, TiO2, Ag, CuO, ZnO, GO, MnO, SWCNT, |
| Shape | Nominal | Spherical, irregular, prism, cubic, nanotube, flat, oval, rod, crystalline, unknown |
| Zeta Potential | Numeric | −49–44 (mV), unknown |
| Hydro_size | Numeric | 14–2181 (nm), unknown |
| Primary size | Numeric | 1–219 (nm) |
| Surface Area | Numeric | 17–240 (m2/g), unknown |
| Cell origin | Nominal | Human, rat, mouse |
| Cell type | Nominal | Endothelial, astrocytes, microglial, medulloblastoma, neuroblastoma, mesencephalic, pheochromocytoma, cerebellar granule, Schwann cells |
| Cell line | Nominal | HCMEC, BMEC, primary, ALT, D384, SHSY5Y, N9, BV2, PC12, N2a, CGC, RSC96, N27 |
| Assay | Nominal | MTT, MTS, XTT, AlamarBlue, LDH, Caspase 3/7, clonogenic, CCK-8, Trypan-blue, PI, BrdU, TUNEL, NRU, Annexin_V/PI |
| Cell viability | Nominal | Toxic, non-toxic |
Figure 3. Skewness of numeric inputs under different normalization methods. Four datasets were tested for their data distribution: the raw dataset (no normalization), log10, min-max, and z-score.
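
The comparison in Figure 3 can be sketched as follows. The dose values below are hypothetical, not taken from the study, and the helper names are assumptions. The sketch uses the population skewness (third central moment divided by the cubed standard deviation); the study may have used a different estimator.

```python
import math

def skewness(values):
    """Population skewness: third central moment / (variance ** 1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def log10_transform(values):
    return [math.log10(v) for v in values]

def min_max(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

# Hypothetical right-skewed dose values (μg/mL) spanning the reported range.
doses = [0.001, 0.01, 0.1, 1, 5, 10, 25, 50, 100, 200, 400, 800]

skews = {
    "raw": skewness(doses),
    "log10": skewness(log10_transform(doses)),
    "min-max": skewness(min_max(doses)),
    "z-score": skewness(z_score(doses)),
}
```

Min-max and z-score are linear rescalings with a positive scale factor, so they leave skewness unchanged; of the three methods, only the log10 transform changes the shape of the distribution.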
Internal 10-fold cross validation model performance and external validation-predictivity for the ten random samples of the dataset.
| Replication | ACC | PREC | SENS | SPEC | F1 | MCC | ROC |
|---|---|---|---|---|---|---|---|
| Internal Validation | | | | | | | |
| Replication 1 | 96.7% | 96.8% | 96.7% | 96.8% | 0.97 | 0.94 | 0.99 |
| Replication 2 | 97.6% | 97.6% | 97.6% | 97.6% | 0.98 | 0.95 | 0.99 |
| Replication 3 | 97.9% | 97.9% | 97.9% | 97.9% | 0.98 | 0.96 | 1.00 |
| Replication 4 | 97.5% | 97.5% | 97.5% | 97.5% | 0.98 | 0.95 | 0.99 |
| Replication 5 | 97.6% | 97.6% | 97.6% | 97.6% | 0.98 | 0.95 | 1.00 |
| Replication 6 | 97.4% | 97.4% | 97.4% | 97.4% | 0.97 | 0.95 | 1.00 |
| Replication 7 | 98.1% | 98.1% | 98.1% | 98.1% | 0.98 | 0.96 | 1.00 |
| Replication 8 | 98.0% | 98.0% | 98.0% | 98.0% | 0.98 | 0.96 | 1.00 |
| Replication 9 | 97.9% | 97.9% | 97.9% | 97.9% | 0.98 | 0.96 | 1.00 |
| Replication 10 | 98.2% | 98.2% | 98.2% | 98.2% | 0.98 | 0.96 | 1.00 |
| | | | | | | | |
| Replication 1 | 98.5% | 98.5% | 98.5% | 98.4% | 0.99 | 0.97 | 1.00 |
| Replication 2 | 98.4% | 98.5% | 98.4% | 98.5% | 0.98 | 0.97 | 0.99 |
| Replication 3 | 96.2% | 96.2% | 96.2% | 96.1% | 0.96 | 0.92 | 0.99 |
| Replication 4 | 99.2% | 99.2% | 99.2% | 99.3% | 0.99 | 0.98 | 1.00 |
| Replication 5 | 98.5% | 98.5% | 98.5% | 98.4% | 0.99 | 0.97 | 1.00 |
| Replication 6 | 99.2% | 99.2% | 99.2% | 99.0% | 0.99 | 0.98 | 1.00 |
| Replication 7 | 96.9% | 97.0% | 96.9% | 97.0% | 0.97 | 0.94 | 0.97 |
| Replication 8 | 98.4% | 98.5% | 98.4% | 98.4% | 0.98 | 0.97 | 1.00 |
| Replication 9 | 99.1% | 99.1% | 99.1% | 98.7% | 0.99 | 0.98 | 1.00 |
| Replication 10 | 98.3% | 98.3% | 98.3% | 98.3% | 0.98 | 0.97 | 1.00 |
| | | | | | | | |
| Replication 1 | 72% | 72% | 71% | 66% | 0.72 | 0.38 | 0.73 |
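
For reference, the threshold-based metrics in the table (ACC, PREC, SENS, SPEC, F1, MCC) can all be derived from a binary confusion matrix, as in the sketch below. The counts are hypothetical and the function name `binary_metrics` is an assumption; how the study aggregated per-class values (for example, by class-weighted averaging) is not stated here, so the sketch treats "toxic" as the single positive class. ROC is omitted because it requires ranked scores rather than counts.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (positive = toxic)."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    prec = tp / (tp + fp)
    sens = tp / (tp + fn)   # sensitivity (recall)
    spec = tn / (tn + fp)   # specificity
    f1 = 2 * prec * sens / (prec + sens)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"ACC": acc, "PREC": prec, "SENS": sens,
            "SPEC": spec, "F1": f1, "MCC": mcc}

# Hypothetical counts, not taken from the study.
m = binary_metrics(tp=90, fp=5, tn=95, fn=10)
```

MCC uses all four cells of the confusion matrix, which is why it can sit well below accuracy when the errors are unevenly distributed between the two classes.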
Figure 4. Attribute importance evaluation based on the models' information gain.
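
Information gain for a nominal attribute is the reduction in class entropy after splitting the data on that attribute. The sketch below shows the calculation on a toy example; the data and function names are hypothetical, and numeric attributes such as dose would first need to be discretized, a step not shown here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(attribute_values, labels):
    """Class entropy minus the weighted entropy within each attribute value."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical toy data, not taken from the study.
labels = ["toxic", "toxic", "non-toxic", "non-toxic"]
assay = ["MTT", "MTT", "LDH", "LDH"]          # separates the classes perfectly
origin = ["human", "rat", "human", "rat"]     # carries no class information

gain_assay = information_gain(assay, labels)
gain_origin = information_gain(origin, labels)
```

In this toy case the assay attribute removes all uncertainty about the label (a gain of one bit), while cell origin removes none.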
Percentage of test set instances falling within the model's applicability domain (AD) according to different AD methods.
| AD Method | % of Test Set Falling within the Model AD |
|---|---|
| Bounding Box | 100 |
| Bounding Box PCA | 100 |
| Leverage | 100 |
| Distance from centroid | 100 |
| Distance kNN—fixed k | 100 |
| Distance kNN—variable k | 100 |
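
Two of the simpler AD checks listed above can be sketched as follows. The training points, the function names, and in particular the centroid threshold (the largest training-set distance) are assumptions for illustration; the thresholds used in the study are not given here.

```python
import math

def bounding_box(train):
    """Per-attribute (min, max) ranges of the training set."""
    return [(min(col), max(col)) for col in zip(*train)]

def in_bounding_box(x, box):
    """True if every attribute of x lies inside the training range."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, box))

def centroid(train):
    return [sum(col) / len(col) for col in zip(*train)]

def in_centroid_domain(x, train):
    """True if x is no farther from the training centroid than the
    most distant training point (an assumed threshold)."""
    c = centroid(train)
    threshold = max(math.dist(p, c) for p in train)
    return math.dist(x, c) <= threshold

# Hypothetical training instances (two numeric attributes each).
train = [(1.0, -30.0), (10.0, 5.0), (50.0, 20.0), (100.0, 40.0)]
box = bounding_box(train)

inside = (20.0, 0.0)
outside = (500.0, 0.0)
results = {
    "inside": (in_bounding_box(inside, box), in_centroid_domain(inside, train)),
    "outside": (in_bounding_box(outside, box), in_centroid_domain(outside, train)),
}
```

A test instance that fails such a check lies outside the region covered by the training data, so a prediction for it would be an extrapolation.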
Figure 5. Physicochemical characterization completeness and experimental methodology.
In silico tools available in the literature capturing neurotoxicity. Data source and number of studies regarding data compilation, its size for all tissues, level of biological organization, and final input variables used to predict an endpoint. The algorithm implemented is also shown.
| Ref. | Data Availability | Data Source | Dataset Size and Input Variables | NPs Category | Level of Biological Organisation | Endpoint / Metric | Algorithm Category |
|---|---|---|---|---|---|---|---|
| [ | Data spreadsheet provided | Literature: 307 | 1741 rows. 14 input | Cadmium-containing quantum dots | In vitro | Cell viability and IC50 Numerical | Decision Tree, RF |
| [ | Data spreadsheet provided | Literature: 517 | 3028 rows cell viability and 837 IC50. 18 input | Cadmium-containing quantum dots | In vitro | Cell viability and IC50 Numerical | Bayesian, BN |
| [ | Data spreadsheet provided | Literature: 32 | 559 rows. 20 input | Metal, Metal oxide | In vivo, in vitro | Aggregated (cytotoxicity, neurological, pulmonary, fibrosis, etc.) (nominal) | Bayesian, BN |
| [ | Not Available | Database (S2NANO): 63 | 2005 rows. 14 input | Metal (Au, Ag) | In vitro | Cell viability (Binary) | Instance Based, Decision Tree, SVM, RF |
| [ | Not Available | Database (S2NANO): 216 | 6842 rows. 15 input | Metal, Metal oxide | In vitro | Cell viability (Binary) | Decision Tree, RF |
| [ | Data spreadsheet provided | Literature: 93 | 3 datasets: 1052 rows, 1261 rows, 540 rows. 17 features | Carbon-based, Metal, Metal Oxide, Polymeric, Dendrimers, Quantum Dots | In vitro | Cell viability (Binary) | Decision Tree, DT |
| [ | Data spreadsheet provided | Literature: 24 | 246 rows. 12 input | Metal, Metal oxide, Polymeric | In vitro | Disrupted processes (i.e., cell cycle and proliferation) (Binary) | Bayesian, BN |
In silico tools available in the literature capturing neurotoxicity (relevant features in bold). The final input variables selected for model implementation are categorized into theoretical descriptors, p-chem properties, exposure attributes and in vitro characteristics.
| Ref. | Theoretical Descriptors | P-Chem Properties | Exposure Attributes | In Vitro Characteristics |
|---|---|---|---|---|
| [ | - | Source, core, shell, diameter, surface ligand, surface charge, surface modification | Exposure dose and time | Cell anatomical type (epithelial, |
| [ | - | Shell, core, source, diameter, surface ligand, surface charge, surface modification, ligand chemical | Exposure dose and time, delivery type | Cell source species (human, rat etc.), anatomical type (epithelial, |
| [ | - | Shape, NP type, dissolution, surface area, surface charge, coating, surface reactivity, aggregation, particle size | Administration route | Study type (in vitro or in vivo) |
| [ | - | NPs type, shape, core size, hydrodynamic size (W), surface charge (W), specific surface area, coating | Exposure dose and time | Toxic assay method (MTT, MTS, etc.), cell lines (A549, |
| [ | Quantum Chemical properties | Core size, Surface charge, Hydrodynamic size, Specific surface area, Formation enthalpy, Conduction band energy, Valence band energy, Electronegativity, NP material | Exposure dose and time | Assay (MTT, MTS), cell species (human, mouse, hamster etc.), cell origin (lung, blood, skin, |
| [ | - | NP type, core, surface coating, diameter, surface charge | Exposure dose and time | Cell-type (L929, |
| [ | - | NPs type, core size, shape, coatings, surface area, zeta potential | Exposure dose and time | Cell line (HEPG2, VSMC, HACAT etc.), type (cancer, normal), tissue (kidney, |
In silico tools available in the literature capturing neurotoxicity. The attribute importance methodology is shown along with the five most and five least important attributes for relevant outcomes (in bold). The applicability domain, performance metrics and results are also shown.
| Ref. | Attribute Importance Method | Top 5 Important Attributes | Lowest 5 Important Attributes | Applicability Domain | Performance |
|---|---|---|---|---|---|
| [ | Random Permutation | - | Cell viability: R2 = 0.70. | ||
| [ | Sensitivity analysis | - | (cross) Cell viability: F1 = 0.86 | ||
| [ | Nanoparticle>surface coatings>surface area>Aggregation>Particle size>surface charge>shape>surface reactivity>dissolution | - | (no cross) ACC: 72% | ||
| [ | Weights by chi square statistic method. | Dose>cell line>core size>surface charge>specific surface area. | Shape<species<type of cells<material<time<assay | k-nearest neighbours’ algorithm—weighted Euclidean distance | (only external) Dataset with best results: ACC: 87.0%, SENS: 61.3%, F1 score: 65.2% |
| [ | Leave-one-out out-of-bag (OOB) errors | Dose>assay>time>surface area>core size. | Cell type<surface charge<3 quantum chemical properties (Ec<x<Ev) | (cross) Dataset with best results: ACC: 96%, F1: 93% | |
| [ | Gain ratio algorithm | Not specified, generation of several decision trees | - | (cross) Generation of several decision trees | |
| [ | Sensitivity analysis | Normalized information gain for each model outcome. In general NP type, exposure dose and in vitro characteristics ranked first among the other variables. | - | Three different models with 9 different outcomes. | |