Kimberly T To, Rebecca C Fry, David M Reif.
Abstract
BACKGROUND: The Toxicological Priority Index (ToxPi) is a method for prioritizing and profiling chemicals that integrates data from diverse sources. However, individual data sources ("assays"), such as in vitro bioassays or in vivo study endpoints, often contain blocks of missing data because subsets of chemicals have not been tested in every assay. To investigate the effects of missing data and recommend solutions, we designed simulation studies around high-throughput screening data generated by the ToxCast and Tox21 programs on chemicals highlighted by the Agency for Toxic Substances and Disease Registry's (ATSDR) Substance Priority List (SPL), which helps prioritize environmental research and remediation resources.
Keywords: Chemical prioritization; Imputation; Missing data; Multiple imputation; Simulation; ToxCast; ToxPi
Year: 2018 PMID: 29942350 PMCID: PMC5998548 DOI: 10.1186/s13040-018-0169-5
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Fig. 1 Conceptual overview of the simulation process and experimental design. Assays were randomly sampled from the original data based on a desired number of assays and assay sources (slices), so that each simulated dataset contained a subset of assays with arbitrarily assigned sources and all 426 chemicals present in the original dataset. Simulated datasets were imputed and ToxPi profiles were calculated, with an overall summed ToxPi score given for analysis.
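The sampling, imputation, and scoring steps described in the Fig. 1 caption can be illustrated with a small sketch. This is not the authors' code and not the ToxPi software: the function names (`sample_assays`, `impute_mean`, `summed_score`), the toy data, the round-robin slice assignment, and the min-max scaled equal-weight score are all assumptions chosen only to make the workflow concrete. Mean imputation is used because it is one of the methods named in the table below.

```python
import random

def sample_assays(data, n_assays, n_slices, rng):
    """Randomly pick assays and assign them to slices; keep every chemical."""
    assays = sorted({a for row in data.values() for a in row})
    chosen = rng.sample(assays, n_assays)
    # Arbitrary source assignment: deal the sampled assays out round-robin.
    slices = [chosen[i::n_slices] for i in range(n_slices)]
    subset = {chem: {a: row.get(a) for a in chosen} for chem, row in data.items()}
    return subset, slices

def impute_mean(subset, assays):
    """Replace missing values (None) with the per-assay mean of observed values."""
    filled = {chem: dict(row) for chem, row in subset.items()}
    for a in assays:
        observed = [row[a] for row in subset.values() if row[a] is not None]
        fill = sum(observed) / len(observed) if observed else 0.0
        for row in filled.values():
            if row[a] is None:
                row[a] = fill
    return filled

def summed_score(filled, slices):
    """Simplified stand-in for a summed score (assumption, not the ToxPi formula):
    each slice is the sum of its assays, min-max scaled across chemicals,
    and slices are added with equal weight."""
    chems = list(filled)
    total = {c: 0.0 for c in chems}
    for slice_assays in slices:
        raw = {c: sum(filled[c][a] for a in slice_assays) for c in chems}
        lo, hi = min(raw.values()), max(raw.values())
        for c in chems:
            total[c] += (raw[c] - lo) / (hi - lo) if hi > lo else 0.0
    return total

# Toy data (hypothetical): 5 chemicals, 12 assays, roughly 30% missing.
rng = random.Random(0)
data = {
    f"chem{i}": {
        f"assay{j}": (None if rng.random() < 0.3 else rng.random())
        for j in range(12)
    }
    for i in range(5)
}

subset, slices = sample_assays(data, n_assays=6, n_slices=3, rng=rng)
sampled = [a for s in slices for a in s]
filled = impute_mean(subset, sampled)
scores = summed_score(filled, slices)
```

Repeating the last four lines with fresh random draws would give one replicate per iteration, which is the kind of input the comparisons in the following figures operate on.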
Fig. 3 Comparison of Imputation Methods by ToxPi Score. (a) Root Mean Square Error (RMSE) between Imputed Simulated Data ToxPi Scores and Imputed Raw Data Chemical Scores. After imputation and ToxPi calculation, scores were compared to the ToxPi scores obtained using the standard “0” method. RMSE density distributions are separated by imputation method. The kNN distribution is centered at the lowest RMSE of all the methods. The Binomial, LLS, and Mean imputation distributions overlap heavily. SVD is centered similarly but shows a wider spread. Maximum imputation has the largest RMSE. (b) ToxPi Score Variance of Imputed Simulated Data ToxPi Scores. Across 1000 replicate simulations, the variance for each of the 426 chemicals was calculated. Compared to SVD, all other methods show relatively low variability from chemical to chemical. SVD has an extremely wide range of ToxPi score variability.
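The two quantities in this caption, the RMSE between two score vectors and the per-chemical variance across replicates, can be written down in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the use of the sample variance (rather than the population variance) is an assumption, since the caption does not say which was used.

```python
import math
import statistics

def rmse(scores_a, scores_b):
    """Root mean square error between two equal-length score lists."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    squared = [(a - b) ** 2 for a, b in zip(scores_a, scores_b)]
    return math.sqrt(sum(squared) / len(squared))

def per_chemical_variance(replicates):
    """Variance of each chemical's score across replicate simulations.

    `replicates` is a list of score lists, one per simulation, with chemicals
    in the same order in every list. Sample variance is an assumption here.
    """
    return [statistics.variance(column) for column in zip(*replicates)]

# Tiny hand-checkable example with two chemicals.
example_rmse = rmse([0.0, 0.0], [3.0, 4.0])          # sqrt((9 + 16) / 2)
example_var = per_chemical_variance([[1.0, 0.5],
                                     [3.0, 0.5]])    # per-chemical variances
```

With 1000 replicates and 426 chemicals, `per_chemical_variance` would return 426 values per imputation method, which is what a density plot like panel (b) summarizes.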
Fig. 2 Comparison of Imputation Methods Using ToxPi Priority Ranks. Mean ToxPi Rank Change between Imputed Simulated Data and Imputed Raw Data. Rank change was calculated as the magnitude of the difference between each chemical's rank in the imputed raw data and its rank in each simulated dataset. Binomial, LLS, Maximum, and Mean show small magnitudes of change in rank. kNN shows wider variation in rank change and is therefore the less stable method. Minimum value imputation and SVD present wider ranges in rank change, although the magnitude of change is smaller than for kNN.
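The rank change described in this caption, the magnitude of the difference between a chemical's rank in the reference data and its rank in a simulated dataset, could be computed along the following lines. This is a hedged sketch rather than the authors' procedure: the function names are hypothetical, rank 1 is assumed to mean the highest score, and ties are broken by input order, none of which is specified in the caption.

```python
def ranks(scores):
    """Rank chemicals by score; rank 1 = highest score (assumption).

    Ties are broken by original position because sorted() is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def mean_rank_change(reference_scores, simulated_scores):
    """Mean absolute difference between reference ranks and simulated ranks."""
    ref = ranks(reference_scores)
    sim = ranks(simulated_scores)
    return sum(abs(r - s) for r, s in zip(ref, sim)) / len(ref)

# Three chemicals whose order is fully reversed in the simulated data.
reference = [0.9, 0.5, 0.1]
simulated = [0.1, 0.5, 0.9]
change = mean_rank_change(reference, simulated)
```

In this toy case the first and last chemicals each move two places and the middle one stays put, so the mean change is 4/3.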
Values for the range of RMSE and range of Rank Change for each of 7 imputation methods, separated by the number of assays per slice
Numeric column headers are the Slices/Assay values.

| Method | Measure | 1 | 5 | 10 | 15 | 20 | 30 | 50 | 90 | 125 |
|---|---|---|---|---|---|---|---|---|---|---|
| Binomial | Rank Change | (-210,-40) | (-150,2) | (-74,4.2) | (-37,4.5) | (-26,4.6) | (-15,4.6) | (-8.8,4.6) | (3.2,4.6) | (3.1,4.6) |
| Binomial | RMSE | (0.024,0.35) | (0.058,0.25) | (0.083,0.27) | (0.086,0.24) | (0.12,0.26) | (0.12,0.25) | (0.13,0.24) | (0.15,0.24) | (0.16,0.24) |
| kNN | Rank Change | (-210,-57) | (-210,-59) | (-210,-55) | (-210,-55) | (-210,-62) | (-210,-61) | (-210,-57) | (-210,-48) | (-210,-45) |
| kNN | RMSE | (0.016,0.16) | (0.017,0.23) | (0.017,0.22) | (0.019,0.2) | (0.018,0.22) | (0.018,0.25) | (0.019,0.22) | (0.019,0.21) | (0.017,0.2) |
| LLS | Rank Change | (-190,-12) | (-56,-1.7) | (-27,-0.31) | (-25,1) | (-19,1.4) | (-14,1.6) | (-11,2) | (-8.7,2) | (-6.6,2) |
| LLS | RMSE | (0.019,0.23) | (0.043,0.43) | (0.077,0.41) | (0.09,0.45) | (0.054,0.42) | (0.1,0.45) | (0.13,0.31) | (0.14,0.24) | (0.14,0.23) |
| Maximum | Rank Change | (-210,-11) | (-54,-1.7) | (-29,0.53) | (-22,0.88) | (-18,1.4) | (-14,1.8) | (-11,2) | (-8.3,2) | (-6.4,2) |
| Maximum | RMSE | (0.031,0.78) | (0.36,0.76) | (0.45,0.68) | (0.46,0.67) | (0.5,0.66) | (0.54,0.65) | (0.55,0.64) | (0.57,0.63) | (0.57,0.64) |
| Mean | Rank Change | (-210,-11) | (-54,-1.7) | (-29,0.53) | (-22,0.88) | (-18,1.4) | (-14,1.8) | (-11,2) | (-8.3,2) | (-6.4,2) |
| Mean | RMSE | (0.019,0.26) | (0.043,0.23) | (0.069,0.28) | (0.08,0.24) | (0.097,0.26) | (0.1,0.24) | (0.12,0.24) | (0.14,0.23) | (0.15,0.23) |
| Minimum | Rank Change | (-210,-57) | (-180,-21) | (-140,-9.2) | (-110,-7.5) | (-91,-4.8) | (-73,-1.9) | (-50,-0.76) | (-31,0) | (-25,0) |
| Minimum | RMSE | (0.014,0.14) | (0.031,0.16) | (0.047,0.15) | (0.065,0.16) | (0.074,0.16) | (0.086,0.17) | (0.093,0.17) | (0.11,0.17) | (0.11,0.17) |
| SVD | Rank Change | (-210,-49) | (-160,-17) | (-130,-5.5) | (-97,-4.4) | (-81,-3.4) | (-64,-1.2) | (-51,-0.15) | (-34,0.15) | (-25,0.15) |
| SVD | RMSE | (0.02,0.46) | (0.04,0.62) | (0.073,0.54) | (0.097,0.54) | (0.096,0.57) | (0.11,0.54) | (0.1,0.54) | (0.12,0.53) | (0.11,0.53) |