William K Boyes, Bradley Beach, Gayle Chan, B Lila M Thornton, Paul Harten, Holly M Mortensen.
Abstract
The US EPA Office of Research and Development (ORD) has conducted a research program assessing potential risks of emerging materials and technologies, including engineered nanomaterials (ENM). As a component of that program, a nanomaterial knowledge base, termed "NaKnowBase", was developed containing the results of published ORD research relevant to the potential environmental and biological actions of ENM. The experimental data address issues such as ENM release into the environment; fate, transport and transformations in environmental media; exposure to ecological species or humans; and the potential for effects on those species. The database captures information on the physicochemical properties of ENM tested, assays performed and their parameters, and the results obtained. NaKnowBase (NKB) is a relational SQL database, and may be queried either with SQL code or through a user-friendly web interface. Filtered results may be output in spreadsheet format for subsequent user-defined analyses. Potential uses of the data might include input to quantitative structure-activity relationships (QSAR), meta-analyses, or other investigative approaches.
Year: 2022 PMID: 35058454 PMCID: PMC8776817 DOI: 10.1038/s41597-021-01098-0
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Overview of the NKB SQL structure. The lines indicate the nature of each relationship. Each relationship is one-to-many: the end with two lines is the “one” side and the end with a triangle is the “many” side; for example, one publication can have many mediums.
An overview of the data tables in NKB, with a brief description of the general type or category of data collated in each table.
| Table | Description |
|---|---|
| publication | Identification and metadata of the published manuscript from which the data originated |
| medium | The medium in which the nanomaterial was tested (e.g. water, saline, cell culture medium, etc.) |
| additive | Any substances that may have been added to the media (e.g. FBS, strep/pen) |
| material | The composition of the nanomaterial and any physicochemical parameters reported |
| contam | Any contaminants of the test material reported |
| materialfg | Functional groups affixed to primary test material and the method of affixation |
| functionalgroup | Identities of functional groups in database (e.g. alcohol groups) |
| assay | The type of test system used (e.g. in vitro, in vivo) |
| parameters | Parameters manipulated in the experiment (e.g. dose/concentration tested, test time, etc.) |
| result | Measured results linked to the assays and parameters employed |
| molecularresult | Pointer to results from complex assays such as genomics, proteomics, etc., that are deposited elsewhere |
Publication table data fields.
| Field | Description |
|---|---|
| DOI | Unique Digital Object Identifier |
| PubTitle | Title of the publication |
| Year | Year of the publication |
| Journal | Journal of the publication |
| Volume | Volume of the journal |
| Issue | Issue of the journal |
| PageStart | Starting page number |
| PageEnd | Ending page number |
| Keywords | Keywords provided in the publication |
| Abstract | Publication Abstract |
| FirstAuthor | First and last name of the first author; middle name or initial included if given in the publication’s author list. |
| Correspondence | Name of the author the paper indicates as handling correspondence. |
| Affiliation | Institutional affiliation of the author listed in Correspondence. |
Curation of the data began by extracting publication metadata and storing it in the publication table. DOIs served as unique identifiers for publications. Additional metadata included publication title, journal title, volume and issue numbers, page numbers, publication year, abstract, keywords, first author, the point of contact, and the affiliation of the point of contact.
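Because the DOI is the publication key, a publication can be curated only once. A minimal sketch of the publication table, assuming SQLite (through Python's standard `sqlite3` module) as a stand-in for the actual NKB SQL engine; column names follow the field table above, while the column types, the helper `add_publication`, and the placeholder DOI are illustrative assumptions.

```python
import sqlite3

# In-memory SQLite database as a stand-in for NKB (assumption: NKB's real
# SQL dialect and column types may differ).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE publication (
        DOI            TEXT PRIMARY KEY,  -- unique identifier for each publication
        PubTitle       TEXT,
        Year           INTEGER,
        Journal        TEXT,
        Volume         TEXT,
        Issue          TEXT,
        PageStart      TEXT,
        PageEnd        TEXT,
        Keywords       TEXT,
        Abstract       TEXT,
        FirstAuthor    TEXT,
        Correspondence TEXT,
        Affiliation    TEXT
    )
    """
)


def add_publication(conn, doi, title, year, journal):
    """Insert a publication (hypothetical helper); False if the DOI already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO publication (DOI, PubTitle, Year, Journal) VALUES (?, ?, ?, ?)",
                (doi, title, year, journal),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Placeholder values for illustration only.
first = add_publication(conn, "10.0000/example.1", "Example title", 2020, "Example Journal")
dup = add_publication(conn, "10.0000/example.1", "Same DOI again", 2021, "Other Journal")
print(first, dup)  # True False
```

The second insert is rejected by the primary-key constraint, so duplicate curation of the same DOI surfaces as an error rather than as a silent second row.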
Molecularresults table data fields.
| Field | Description |
|---|---|
| MolecularResultID | Unique (within publication) numerical identifier for the molecular result data. |
| assay_AssayID | Reference to the Assay ID of the assay this molecular result came from. |
| assay_publication_DOI | Reference to DOI of the assay’s source publication. |
| GEOAccession | Accession number used to retrieve the molecular result set from the NCBI Gene Expression Omnibus (GEO). |
| OrganismName | Scientific species name for the subject species. |
| SpeciesID | Unique identifier for the subject species on the NCBI Taxonomy Browser |
| AssayType | Details on the style of assay used to collect the data |
| Platform | Array, probe set, etc., used to perform the assay. |
| Series | Reference number for the assay series. |
| SampleCount | Number of samples included in the results. |
| URL | Web address of the reported dataset. |
This table was an alternative to the Results table, used to store references to results too complex for NKB to hold directly, such as those from genomic, proteomic, or metabolomic assays. Such results were typically already deposited in external data repositories. In these cases, NKB captured the ENM-specific aspects and experimental design of the studies, which could be linked to the large datasets housed elsewhere. Rows in this table catalogued the web addresses of the external sources holding the results in question.
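Since molecularresult rows only point outward, a consumer of the data has to resolve each row to a web address. A minimal Python sketch, assuming rows arrive as dictionaries keyed by the field names above: it prefers the curated URL field and otherwise falls back to the public GEO accession page built from GEOAccession. The helper name `external_link`, the dictionary form, and the fallback rule are illustrative assumptions, not part of NKB.

```python
from typing import Optional
from urllib.parse import urlencode

# Public GEO accession page; the "acc" query parameter takes an accession.
GEO_QUERY_BASE = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def external_link(row: dict) -> Optional[str]:
    """Resolve a molecularresult row (hypothetical dict form) to a web address.

    Prefers the curated URL field; if it is empty, falls back to the GEO
    accession page built from GEOAccession. Returns None if neither is set.
    """
    if row.get("URL"):
        return row["URL"]
    accession = row.get("GEOAccession")
    if accession:
        return f"{GEO_QUERY_BASE}?{urlencode({'acc': accession})}"
    return None


# Placeholder accession for illustration.
print(external_link({"GEOAccession": "GSE00000", "URL": None}))
# https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE00000
```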
| Descriptor | Value |
|---|---|
| Measurement(s) | engineered nanomaterial effects |
| Technology Type(s) | digital curation |
| Factor Type(s) | physicochemical property |
| Sample Characteristic - Environment | nanomaterial |
Medium table data fields.
| Field | Description |
|---|---|
| MediumID | Unique (within publication) numerical identifier to link a medium to entries in other tables |
| publication_DOI | Reference to DOI of source publication |
| MediumDescription | Name of test medium (e.g. water, saline, etc.) |
Data on the dispersion mediums and any additives to them were recorded. The Medium and Additive tables tracked every medium in which a nanomaterial was suspended during an experiment. Where media were changed or particles were characterized repeatedly over time, these instances were recorded together with the corresponding experimental time variables and linked experimental parameters. Mediums were uniquely identified by a combination of their source publication’s DOI and an incrementing number, MediumID, since one research publication could have studied multiple mediums. Complete medium data included the unique identification key and a description of the medium, such as a common name or a majority component (e.g. Dulbecco’s Modified Eagle Medium or water).
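The composite key described above (publication DOI plus a MediumID that restarts at 1 in each publication) can be sketched as follows, again assuming SQLite through Python's `sqlite3` as a stand-in. The helper `add_medium`, its MAX+1 numbering rule, and the placeholder DOIs are illustrative assumptions about how curation might assign IDs, not NKB's documented procedure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(
    """
    CREATE TABLE publication (DOI TEXT PRIMARY KEY);
    CREATE TABLE medium (
        MediumID          INTEGER NOT NULL,
        publication_DOI   TEXT NOT NULL REFERENCES publication (DOI),
        MediumDescription TEXT,
        PRIMARY KEY (publication_DOI, MediumID)  -- ID is unique only within a publication
    );
    """
)


def add_medium(conn, doi, description):
    """Assign the next MediumID within this publication and insert the row."""
    with conn:
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(MediumID), 0) + 1 FROM medium WHERE publication_DOI = ?",
            (doi,),
        ).fetchone()
        conn.execute(
            "INSERT INTO medium (MediumID, publication_DOI, MediumDescription) VALUES (?, ?, ?)",
            (next_id, doi, description),
        )
    return next_id


# Placeholder DOIs for illustration.
conn.executemany("INSERT INTO publication VALUES (?)", [("10.0000/pub.a",), ("10.0000/pub.b",)])
conn.commit()
print(add_medium(conn, "10.0000/pub.a", "water"))   # 1
print(add_medium(conn, "10.0000/pub.a", "DMEM"))    # 2
print(add_medium(conn, "10.0000/pub.b", "saline"))  # 1
```

Numbering restarts per publication, so the pair (DOI, MediumID), not MediumID alone, identifies a medium.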
Additive table data fields.
| Field | Description |
|---|---|
| | Unique (within publication) numerical identifier to link an additive to entries in other tables |
| | Name of additive |
| | Concentration of additive |
| | Units of additive |
| | Reference to Medium ID of the medium this additive was added to |
| | Reference to DOI of medium’s source publication |
Additives to a medium were recorded in the Additive table. Each entry consisted of the DOI and MediumID of the medium in question, the name of the added substance, the amount added, and its units. A medium could have any number of additives, including zero.
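The "zero or more additives per medium" relationship means a query over media should use an outer join so that media without additives are not dropped. A minimal SQLite sketch via Python's `sqlite3`; the additive column names `AdditiveID`, `AdditiveName`, `AdditiveAmount`, and `AdditiveUnit` are hypothetical (the field table above does not give them), and the DOI and values are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE medium (
        MediumID INTEGER, publication_DOI TEXT, MediumDescription TEXT,
        PRIMARY KEY (publication_DOI, MediumID)
    );
    -- Hypothetical column names for ID, name, concentration and unit.
    CREATE TABLE additive (
        AdditiveID INTEGER, AdditiveName TEXT, AdditiveAmount REAL, AdditiveUnit TEXT,
        medium_MediumID INTEGER, medium_publication_DOI TEXT,
        FOREIGN KEY (medium_publication_DOI, medium_MediumID)
            REFERENCES medium (publication_DOI, MediumID)
    );
    INSERT INTO medium VALUES (1, '10.0000/pub.a', 'DMEM'), (2, '10.0000/pub.a', 'water');
    INSERT INTO additive VALUES
        (1, 'FBS', 10, '%', 1, '10.0000/pub.a'),
        (2, 'pen/strep', 1, '%', 1, '10.0000/pub.a');
    """
)

# LEFT JOIN keeps media that have no additives; COUNT ignores the NULLs.
rows = conn.execute(
    """
    SELECT m.MediumDescription, COUNT(a.AdditiveID) AS n_additives
    FROM medium AS m
    LEFT JOIN additive AS a
      ON a.medium_publication_DOI = m.publication_DOI
     AND a.medium_MediumID = m.MediumID
    GROUP BY m.publication_DOI, m.MediumID, m.MediumDescription
    ORDER BY m.MediumID
    """
).fetchall()
print(rows)  # [('DMEM', 2), ('water', 0)]
```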
Material table data fields.
| Field | Description |
|---|---|
| MaterialID | Unique identification code for the tested material |
| publication_DOI | Reference to DOI of source publication |
| CoreComposition | Primary composition of the tested material |
| ShellComposition | Primary composition of a shell applied to the core substance |
| CoatingComposition | Primary composition of material applied as a coating to the core substance |
| SynthesisMethod | How the ENM was made: “Original method” if original, DOI of publication if a method is cited from a publication, or name of method if a common name is used. |
| SynthesisDate | When the ENM was made |
| CASRN | CAS Registry Number of core composition |
| Supplier | Source of the material |
| ProductNumber | Manufacturer’s product number |
| LotNumber | Production lot number |
| OuterDiameter, InnerDiameter, Length, Thickness, SurfaceArea, SizeDistribution, Purity, HydrodynamicDiameter, SurfaceCharge | Seven separate fields capture summary measurement information for each of these nine ENM characteristics, totalling 63 fields. The field “OuterDiameterValue” is used for non-nanotube particle size measurements. ApproxSymbol captures characters used to qualify measurements that lack precision, typically due to limitations of the instrumentation used for measurement (e.g. <, >, ~). Low and High are defined by Uncertainty. If Uncertainty describes a concept with two numbers (e.g. range), Low and High hold the endpoints. If Uncertainty requires a single value (e.g. standard deviation), that value is held in Low. |
| Shape | The shape of the original particle |
| medium_MediumID | Reference to Medium ID of the medium this material was examined in |
| medium_publication_DOI | Reference to DOI of source publication |
| ShapeInMedium | Particle shape in identified medium |
| Solubility | Particle solubility in medium |
Each entry in the material table was uniquely identified by the DOI of the source publication and an incrementing number to account for publications that studied multiple materials. Fields in this table address ENM composition, metadata (i.e., manufacturing information), and other physicochemical properties including, but not limited to, those addressed on EPA forms for submission of novel nanomaterials for registration under the Toxic Substances Control Act (TSCA) (https://www.regulations.gov/document?D=EPA-HQ-OPPT-2009-0686-0015). Note that companies were not required to generate data for these fields in order to submit TSCA registrations, only to report such data if available.
Core Composition was defined as the base material of the ENM, and any additions to the structure were recorded in Shell Composition or Coating Composition. Synthesis Method recorded a common method name or the DOI of a publication containing the methodology. Several fields associated with large-batch or industrial-scale ENM manufacturing are included: Synthesis Date, Supplier, Product Number, Lot Number, and, if applicable, the Chemical Abstracts Service Registry Number (CASRN). Shape recorded the typical shape of the material, which was important for materials such as carbon whose form varied widely (e.g. sheets, tubes, or a simple bulk form). If the material was suspended in a medium, that medium was referenced by DOI and MediumID. This allowed medium-specific properties, such as Shape in Medium or Solubility, to be captured.
In addition, NKB captures many quantitative characteristics for a nanomaterial, e.g. outer diameter, inner diameter, length, thickness, surface area, size distribution, purity, hydrodynamic diameter, and surface charge. Many publications report these data using summary statistics without raw data. Therefore, each ENM characteristic was described using a set of seven fields capable of capturing raw and processed data: Value, ApproxSymbol, Unit, Uncertainty, Low, High, and Method. Value contained either the raw or average numeric value reported for a measurement. ApproxSymbol captured any qualifying characters (e.g. <, >, ~) denoting measurements that lacked precision, typically due to a limitation of the instrument used for measurement. Unit contained the physical unit for the measurement, using standard scientific abbreviations when possible. Raw data were reported using these first three fields along with Method. The Uncertainty, Low, and High fields are used in combination to describe the spread or distribution of processed data.
The Uncertainty field held statistical terms such as “range” or “standard deviation”. If the term required two endpoints, Low and High held the numeric values for those respective endpoints. For example, the Low and High of an “interquartile range” would be the first and third quartile values, respectively. If the “Uncertainty” statistic term required only one value (e.g. standard deviation), the value was recorded in Low. Finally, the technique or method used to produce the raw or processed measurements was recorded in Method (e.g., transmission electron microscopy).
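The seven-field convention can be expressed as a small record type with a consistency check. A minimal Python sketch, assuming that only "range" and "interquartile range" are two-endpoint terms and that every other Uncertainty term is single-valued; the class `Measurement`, its snake_case attribute names, and the term set are illustrative assumptions rather than NKB's controlled vocabulary.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed vocabulary: Uncertainty terms that need two endpoints. Any other
# term (e.g. "standard deviation") is treated as single-valued, stored in Low.
TWO_ENDPOINT_TERMS = {"range", "interquartile range"}


@dataclass
class Measurement:
    value: Optional[float] = None        # Value: raw or average numeric value
    approx_symbol: Optional[str] = None  # ApproxSymbol: e.g. "<", ">", "~"
    unit: Optional[str] = None           # Unit, e.g. "nm"
    uncertainty: Optional[str] = None    # Uncertainty: name of the spread statistic
    low: Optional[float] = None          # Low: lower endpoint or the single value
    high: Optional[float] = None         # High: upper endpoint, else empty
    method: Optional[str] = None         # Method, e.g. "transmission electron microscopy"

    def validate(self) -> None:
        """Raise ValueError if Low/High do not match what Uncertainty implies."""
        if self.uncertainty is None:
            if self.low is not None or self.high is not None:
                raise ValueError("Low/High given without an Uncertainty term")
        elif self.uncertainty.lower() in TWO_ENDPOINT_TERMS:
            if self.low is None or self.high is None:
                raise ValueError(f"{self.uncertainty!r} needs both Low and High")
            if self.low > self.high:
                raise ValueError("Low exceeds High")
        elif self.low is None or self.high is not None:
            raise ValueError(f"{self.uncertainty!r} is single-valued: fill Low only")


# Illustrative values, not taken from NKB.
size = Measurement(value=20.0, unit="nm", uncertainty="standard deviation",
                   low=3.5, method="transmission electron microscopy")
size.validate()  # passes: a single-valued statistic stored in Low
iqr = Measurement(value=18.0, unit="nm", uncertainty="interquartile range",
                  low=15.0, high=24.0)
iqr.validate()   # passes: first and third quartiles in Low and High
```

A raw datum (Value, Unit, Method only) also passes, since it carries no Uncertainty term and leaves Low and High empty.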
Contam table data fields.
| Field | Description |
|---|---|
| ContamID | Unique identifier for the contaminant data point. |
| material_MaterialID | Reference to Material ID of the material in which this contaminant was found |
| material_publication_DOI | Reference to DOI of source publication |
| Contaminant | Chemical identity of the contaminant |
| ContamAmount | Measured numerical amount of the contaminant |
| ContamUnit | Units of measurement of contaminant (e.g. %, units of mass per volume) |
| ContamMethod | Analytical method to identify and measure the contaminant (e.g. ICP-MS, etc.) |
The contaminants table, “contam”, served as an addendum to the material table. Its primary key included the publication DOI and MaterialID of the contaminated material. The Contaminant field listed the name of the contaminating substance, while ContamAmount, ContamUnit, and ContamMethod recorded how much of the contaminant was present and how it was measured. Because each contaminant was detailed in its own row, a material could have any number of contaminants.
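The contam table hangs off a material through a composite foreign key, one row per contaminant. A minimal SQLite sketch via Python's `sqlite3`; the choice of (DOI, MaterialID, ContamID) as the full primary key, the helper `contaminants_of`, the placeholder DOI, and the contaminant values are illustrative assumptions, not NKB data.

```python
import sqlite3

DOI = "10.0000/pub.a"  # placeholder DOI for illustration

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(
    """
    CREATE TABLE material (
        MaterialID INTEGER, publication_DOI TEXT, CoreComposition TEXT,
        PRIMARY KEY (publication_DOI, MaterialID)
    );
    CREATE TABLE contam (
        ContamID INTEGER,
        material_MaterialID INTEGER,
        material_publication_DOI TEXT,
        Contaminant TEXT,
        ContamAmount REAL,
        ContamUnit TEXT,
        ContamMethod TEXT,
        -- Assumed key: the material's DOI and ID plus ContamID, so one
        -- material can carry any number of contaminant rows.
        PRIMARY KEY (material_publication_DOI, material_MaterialID, ContamID),
        FOREIGN KEY (material_publication_DOI, material_MaterialID)
            REFERENCES material (publication_DOI, MaterialID)
    );
    """
)

with conn:
    conn.execute("INSERT INTO material VALUES (1, ?, 'titanium dioxide')", (DOI,))
    conn.executemany(
        "INSERT INTO contam VALUES (?, 1, ?, ?, ?, '%', 'ICP-MS')",
        [(1, DOI, "iron", 0.02), (2, DOI, "lead", 0.001)],
    )


def contaminants_of(conn, doi, material_id):
    """List (Contaminant, ContamAmount, ContamUnit) rows for one material."""
    return conn.execute(
        """
        SELECT Contaminant, ContamAmount, ContamUnit FROM contam
        WHERE material_publication_DOI = ? AND material_MaterialID = ?
        ORDER BY ContamID
        """,
        (doi, material_id),
    ).fetchall()


print(contaminants_of(conn, DOI, 1))  # [('iron', 0.02, '%'), ('lead', 0.001, '%')]
```

With foreign keys enabled, a contaminant row that names a material absent from the material table is rejected.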
Materialfg table data fields.
| Field | Description |
|---|---|
| MaterialFGID | Unique identifier for the functional group-material link. |
| material_MaterialID | Material ID of the material which has a functional group attached |
| material_publication_DOI | Reference to DOI of source publication |
| functionalgroup_FunctionalGroup | Chemical identity of the functional group. |
| FunctionalizationProtocol | Technical method to functionalize the material (e.g. acid wash, etc.) |
The materialfg table connects specific functional group data to the broader material data. If a material had functional groups, these were tracked in the functionalgroup and materialfg tables. The functionalgroup table was a simple list of predefined functional groups. Each row in the materialfg table combined a functional group, a material ID, a publication DOI, and the name of the functionalization protocol used to attach the functional group to the material. A material could have any number of functional groups.
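In relational terms, materialfg is a junction table resolving a many-to-many relationship between materials and a fixed vocabulary of functional groups. A minimal SQLite sketch via Python's `sqlite3`; the functional-group vocabulary, materials, protocols, and DOI below are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE material (
        MaterialID INTEGER, publication_DOI TEXT, CoreComposition TEXT,
        PRIMARY KEY (publication_DOI, MaterialID)
    );
    -- Predefined vocabulary of functional groups (hypothetical entries).
    CREATE TABLE functionalgroup (FunctionalGroup TEXT PRIMARY KEY);
    -- Junction table: one row per (material, functional group) link.
    CREATE TABLE materialfg (
        MaterialFGID INTEGER,
        material_MaterialID INTEGER,
        material_publication_DOI TEXT,
        functionalgroup_FunctionalGroup TEXT REFERENCES functionalgroup (FunctionalGroup),
        FunctionalizationProtocol TEXT,
        FOREIGN KEY (material_publication_DOI, material_MaterialID)
            REFERENCES material (publication_DOI, MaterialID)
    );
    INSERT INTO functionalgroup VALUES ('carboxyl'), ('hydroxyl'), ('amine');
    INSERT INTO material VALUES (1, '10.0000/pub.a', 'carbon'), (2, '10.0000/pub.a', 'silica');
    INSERT INTO materialfg VALUES
        (1, 1, '10.0000/pub.a', 'carboxyl', 'acid wash'),
        (2, 1, '10.0000/pub.a', 'hydroxyl', 'acid wash'),
        (3, 2, '10.0000/pub.a', 'amine', 'silanization');
    """
)

# Each material appears once per attached functional group.
rows = conn.execute(
    """
    SELECT m.CoreComposition, fg.functionalgroup_FunctionalGroup
    FROM material AS m
    JOIN materialfg AS fg
      ON fg.material_publication_DOI = m.publication_DOI
     AND fg.material_MaterialID = m.MaterialID
    ORDER BY m.MaterialID, fg.MaterialFGID
    """
).fetchall()
print(rows)  # [('carbon', 'carboxyl'), ('carbon', 'hydroxyl'), ('silica', 'amine')]
```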
Assay table data fields.
| Field | Description |
|---|---|
| AssayID | Unique (within publication) numerical identifier to link an assay to entries in other tables |
| publication_DOI | Reference to DOI of source publication |
| AssayType | Type of assay performed (e.g. in vitro, in vivo) |
| AssayName | Name of the assay performed (e.g. cell viability) |
| medium_MediumID | Reference to Medium ID of the medium used in this assay |
| medium_publication_DOI | Reference to DOI of medium’s source publication |
| material_MaterialID | Reference to Material ID of the material used in this assay |
| material_publication_DOI | Reference to DOI of material’s source publication |
The experiments performed in the publication were recorded in the Assay and Parameter tables. An assay was considered to be the experiment at large, while parameters were the experimental constants (such as the species being studied) and variables (dosage concentrations or exposure durations). Rows in Assay were uniquely identified through the DOI and an incrementing ID. Assays were assigned an Assay Type from a defined list of terms like “in vitro” and “in vivo”. AssayName held the common name for the experiment being performed. Each row in the assay table referenced a material and medium by their respective DOI-ID combinations.
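Each assay row carries two composite foreign keys, one to its material and one to its medium, so reconstructing "what was tested in what" is a pair of joins. A minimal SQLite sketch via Python's `sqlite3`; the placeholder DOI and the silver/DMEM example rows are hypothetical.

```python
import sqlite3

DOI = "10.0000/pub.a"  # placeholder DOI for illustration

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the composite references below
conn.executescript(
    """
    CREATE TABLE medium (MediumID INTEGER, publication_DOI TEXT, MediumDescription TEXT,
                         PRIMARY KEY (publication_DOI, MediumID));
    CREATE TABLE material (MaterialID INTEGER, publication_DOI TEXT, CoreComposition TEXT,
                           PRIMARY KEY (publication_DOI, MaterialID));
    CREATE TABLE assay (
        AssayID INTEGER, publication_DOI TEXT, AssayType TEXT, AssayName TEXT,
        medium_MediumID INTEGER, medium_publication_DOI TEXT,
        material_MaterialID INTEGER, material_publication_DOI TEXT,
        PRIMARY KEY (publication_DOI, AssayID),
        FOREIGN KEY (medium_publication_DOI, medium_MediumID)
            REFERENCES medium (publication_DOI, MediumID),
        FOREIGN KEY (material_publication_DOI, material_MaterialID)
            REFERENCES material (publication_DOI, MaterialID)
    );
    """
)

with conn:
    conn.execute("INSERT INTO medium VALUES (1, ?, 'DMEM')", (DOI,))
    conn.execute("INSERT INTO material VALUES (1, ?, 'silver')", (DOI,))
    conn.execute(
        "INSERT INTO assay VALUES (1, ?, 'in vitro', 'cell viability', 1, ?, 1, ?)",
        (DOI, DOI, DOI),
    )

# Resolve the assay's material and medium through their DOI-ID pairs.
row = conn.execute(
    """
    SELECT a.AssayName, a.AssayType, mat.CoreComposition, med.MediumDescription
    FROM assay AS a
    JOIN material AS mat
      ON mat.publication_DOI = a.material_publication_DOI
     AND mat.MaterialID = a.material_MaterialID
    JOIN medium AS med
      ON med.publication_DOI = a.medium_publication_DOI
     AND med.MediumID = a.medium_MediumID
    """
).fetchone()
print(row)  # ('cell viability', 'in vitro', 'silver', 'DMEM')
```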
Parameters table data fields.
| Field | Description |
|---|---|
| | Unique (within publication) numerical identifier to link a parameter to entries in other tables |
| ParameterName | Parameter evaluated in the assay (e.g. dose, time, etc.) |
| ParameterNumberValue | Numerical value of the parameter. Mutually exclusive with ParameterNonNumberValue. |
| ParameterNonNumberValue | Non-numerical value of the parameter (e.g. natural light, a species, etc.). Mutually exclusive with ParameterNumberValue. |
| ParameterUnit | Unit of value (e.g. percent, millimolar) |
| assay_AssayID | Reference to Assay ID of the assay this parameter helps define |
| assay_publication_DOI | Reference to DOI of assay’s source publication |
Each assay was defined by one or more parameters, which were each stored in a row of the parameters table. All rows in the parameters table referenced an Assay by DOI and ID. Other fields included: ParameterName, ParameterNumberValue, ParameterNonNumberValue, and ParameterUnit. All parameters had a name but were restricted to either a numeric value and unit or a non-numeric value.
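The rule that a parameter has either a numeric value or a non-numeric value, but not both, maps naturally onto SQL CHECK constraints. A minimal SQLite sketch via Python's `sqlite3`; the constraints, the helper `add_parameter`, and the example values are assumptions (in particular, this sketch allows a numeric value without a unit), and the parameter ID column is omitted because its field name is not given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE parameters (
        -- The ID column is left out: its field name is not given in the table above.
        ParameterName TEXT NOT NULL,
        ParameterNumberValue REAL,
        ParameterNonNumberValue TEXT,
        ParameterUnit TEXT,
        assay_AssayID INTEGER,
        assay_publication_DOI TEXT,
        -- Exactly one of the two value columns must be filled.
        CHECK ((ParameterNumberValue IS NULL) <> (ParameterNonNumberValue IS NULL)),
        -- A unit may only accompany a numeric value (assumption: a numeric
        -- value without a unit is still allowed).
        CHECK (ParameterUnit IS NULL OR ParameterNumberValue IS NOT NULL)
    )
    """
)


def add_parameter(conn, name, number=None, text=None, unit=None,
                  assay_id=1, doi="10.0000/pub.a"):
    """Insert one parameter row (hypothetical helper); False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO parameters VALUES (?, ?, ?, ?, ?, ?)",
                (name, number, text, unit, assay_id, doi),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(add_parameter(conn, "dose", number=50, unit="ug/mL"))            # True
print(add_parameter(conn, "species", text="Danio rerio"))              # True
print(add_parameter(conn, "time", number=24, text="one day"))          # False: both values
print(add_parameter(conn, "light", text="natural light", unit="lux"))  # False: unit, no number
```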
Results table data fields.
| Field | Description |
|---|---|
| ResultID | Unique (within publication) numerical identifier for the result data |
| ResultType | Type of results reported (e.g. viability) |
| ResultDetails | Any optional notes about the result |
| ResultValue | Numeric value of the reported result |
| ResultApproxSymbol | Used to note when a measurement is above or below the physical detection limits of the methods or machinery used (e.g. >, <, =) |
| ResultUnit | The units of the reported value |
| ResultUncertainty | States what uncertainty type is reported with the value, such as standard deviation or a range. |
| ResultLow | Holds the values described by Result Uncertainty. For ranges, this field holds the lower endpoint. If the uncertainty only reports one value (such as standard deviation), this field holds that value. |
| ResultHigh | Holds the upper endpoint for values described by Result Uncertainty. Is left blank for uncertainties with only one value reported. |
| assay_AssayID | Reference to the Assay ID of the assay this result came from. |
| assay_publication_DOI | Reference to DOI of the assay’s source publication. |
This table was used to record the results of an assay. Each row in Results referenced an assay by DOI and ID. Since an assay could have multiple results, each row in Results was given an incrementing ID to serve as the primary key. The ResultType field specified what kind of result, or endpoint, was being reported (e.g. size, pH, mortality, LD50). ResultDetails included any additional information that ResultType could not capture. Finally, the fields used to capture raw and processed measurement data in the material table were reused here with a Result prefix (ResultValue, ResultApproxSymbol, ResultUnit, ResultUncertainty, ResultLow, and ResultHigh) to describe the result measurement or assessment.
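The abstract notes that filtered results can be output in spreadsheet format. A minimal sketch of such an export, assuming the SQLite stand-in via Python's `sqlite3` and a reduced subset of the result columns: each result is joined to its assay, and that assay's parameters are spread into one column per ParameterName, so each CSV row is one result. Because parameters and results are linked only through the assay, the sketch assumes every parameter of an assay applies to all of its results. The function `export_results_csv`, the placeholder DOI, and the values are hypothetical.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE assay (AssayID INTEGER, publication_DOI TEXT, AssayName TEXT,
                        PRIMARY KEY (publication_DOI, AssayID));
    CREATE TABLE parameters (ParameterName TEXT, ParameterNumberValue REAL,
                             ParameterNonNumberValue TEXT, ParameterUnit TEXT,
                             assay_AssayID INTEGER, assay_publication_DOI TEXT);
    CREATE TABLE result (ResultID INTEGER, ResultType TEXT, ResultValue REAL,
                         ResultUnit TEXT, ResultUncertainty TEXT, ResultLow REAL,
                         ResultHigh REAL, assay_AssayID INTEGER, assay_publication_DOI TEXT);
    INSERT INTO assay VALUES (1, '10.0000/pub.a', 'cell viability');
    INSERT INTO parameters VALUES
        ('dose', 10, NULL, 'ug/mL', 1, '10.0000/pub.a'),
        ('time', 24, NULL, 'h', 1, '10.0000/pub.a');
    INSERT INTO result VALUES
        (1, 'viability', 85, '%', 'standard deviation', 4, NULL, 1, '10.0000/pub.a');
    """
)


def export_results_csv(conn):
    """Return CSV text: one row per result, one column per parameter name."""
    params = {}  # (DOI, AssayID) -> {ParameterName: "value unit"}
    for doi, assay_id, name, num, text, unit in conn.execute(
        "SELECT assay_publication_DOI, assay_AssayID, ParameterName, "
        "ParameterNumberValue, ParameterNonNumberValue, ParameterUnit FROM parameters"
    ):
        value = text if num is None else f"{num:g} {unit or ''}".strip()
        params.setdefault((doi, assay_id), {})[name] = value
    param_names = sorted({n for p in params.values() for n in p})

    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["DOI", "AssayName", *param_names, "ResultType", "ResultValue",
                     "ResultUnit", "ResultUncertainty", "ResultLow", "ResultHigh"])
    for row in conn.execute(
        """
        SELECT a.publication_DOI, a.AssayID, a.AssayName, r.ResultType, r.ResultValue,
               r.ResultUnit, r.ResultUncertainty, r.ResultLow, r.ResultHigh
        FROM result AS r
        JOIN assay AS a
          ON a.publication_DOI = r.assay_publication_DOI AND a.AssayID = r.assay_AssayID
        ORDER BY a.publication_DOI, a.AssayID, r.ResultID
        """
    ):
        doi, assay_id, assay_name, *measurement = row
        p = params.get((doi, assay_id), {})
        writer.writerow([doi, assay_name, *(p.get(n, "") for n in param_names), *measurement])
    return buf.getvalue()


print(export_results_csv(conn))
```

Pivoting parameters in Python rather than in SQL keeps the query portable when the set of parameter names differs from one assay to another.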