Richard L Marchese Robinson, Mark T D Cronin, Andrea-Nicole Richarz, Robert Rallo.
Abstract
Analysis of trends in nanotoxicology data and the development of data-driven models for nanotoxicity is facilitated by the reporting of data using a standardised electronic format. ISA-TAB-Nano has been proposed as such a format. However, in order to build useful datasets according to this format, a variety of issues has to be addressed. These issues include questions regarding exactly which (meta)data to report and how to report them. The current article discusses some of the challenges associated with the use of ISA-TAB-Nano and presents a set of resources designed to facilitate the manual creation of ISA-TAB-Nano datasets from the nanotoxicology literature. These resources were developed within the context of the NanoPUZZLES EU project and include data collection templates, corresponding business rules that extend the generic ISA-TAB-Nano specification, as well as Python code to facilitate parsing and integration of these datasets within other nanoinformatics resources. The use of these resources is illustrated by a "Toy Dataset" presented in the Supporting Information. The strengths and weaknesses of the resources are discussed along with possible future developments.
Keywords: ISA-TAB-Nano; databases; nanoinformatics; nanotoxicology; quantitative structure–activity relationship (QSAR)
Year: 2015 PMID: 26665069 PMCID: PMC4660926 DOI: 10.3762/bjnano.6.202
Source DB: PubMed Journal: Beilstein J Nanotechnol ISSN: 2190-4286 Impact factor: 3.649
Figure 1: A schematic illustration of the links between ISA-TAB-Nano files. Biological or material samples are prepared for measurements in biological or physicochemical assays, respectively. Assay files link measurement values with prepared sample identifiers (“Sample Name” values). Study files describe sample preparation. Material files describe the nanomaterials obtained for testing, denoted via their “Material Source Name” identifiers. N.B. Italic font denotes generic names, e.g., “Factor Value [test material]” is replaced with “Factor Value [nanomaterial]” in the NanoPUZZLES in vitro cell-based Study file template.
Summary of challenges with the generic ISA-TAB-Nano specification which were addressed in the current work.
| no. | challenge | Applicable, in principle, to any format rather than being specific to ISA-TAB or ISA-TAB-Nano? | Applicable to ISA-TAB? | Applicable to ISA-TAB-Nano? |
|---|---|---|---|---|
| 1 | Standardised reporting of stepwise sample preparation needs to be established. | × | × | × |
| 2 | Ambiguity exists regarding where different kinds of information should be recorded. | — | × | × |
| 3 | Standardised recording of imprecisely reported experimental variables and measurements is required. | × | × | × |
| 4 | Ambiguity exists regarding the creation of “Comment […]” fields. | × | × | × |
| 5 | Statistical terms need to be clearly defined. | × | ×ᵃ | ×ᵃ |
| 6 | Ambiguity exists regarding how to link to terms from ontologies. | — | — | × |
| 7 | Ambiguity exists regarding whether “Parameter Value” column entries must be constant and whether “Factor Value” column entries must vary. | — | × | × |
| 8 | Linking to images reported in publications is challenging. | × | × | × |
| 9 | Standardised reporting of multiple component “characteristics”, “factors”, and “parameters” (e.g. mixtures) needs to be established. | — | × | × |
| 10 | A standardised means of linking multiple “external” files to a given Material file is required. | — | — | × |
| 11 | Greater clarity regarding the existence of “unused” factors, parameters and measurement names in the Investigation file is required. | — | ×ᵃ | × |
| 12 | A standardised approach for dealing with “non-applicable” metadata is required. | × | × | × |
| 13 | The concept of an “investigation” should be more tightly defined for the purpose of collecting data from the literature. | — | — | × |
| 14 | Clearly defined minimum information criteria are required. | × | × | × |
ᵃ It should be noted that ISA-TAB is not designed to record experimental measurements in Assay files, i.e., the “Measurement Value [statistic(measurement name)]” Assay file columns and the corresponding Investigation file “Study Assay Measurement Name” field are an ISA-TAB-Nano extension [17,37,39]. However, regarding the issue of clearly defining statistical terms (challenge no. 5), ISA-TAB datasets may include “external” data files (i.e., “external” to the basic Investigation, Study and Assay file types) such as “data matrix” files which may include statistical terms such as “p-value” [36,43]. Standardisation of statistical terms may be achieved by using terms from the STATistics Ontology (STATO) [44]. The challenge noted here (challenge no. 5) regarding clearly defining statistical terms concerns how to appropriately create links to ontologies for these terms in ISA-TAB-Nano datasets.
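The “Measurement Value [statistic(measurement name)]” column form mentioned in the footnote can be illustrated with a minimal Python sketch. The function name `parse_measurement_header`, the regular expression and the example header are hypothetical and are not taken from the NanoPUZZLES code; the sketch also assumes that neither the statistic nor the measurement name contains brackets or parentheses.

```python
import re

# Hypothetical pattern for column titles of the form
# "Measurement Value [statistic(measurement name)]".
# Assumption: neither part contains brackets or parentheses.
HEADER_RE = re.compile(
    r"^Measurement Value \[(?P<statistic>[^()\[\]]+)\((?P<name>[^()\[\]]+)\)\]$"
)


def parse_measurement_header(header):
    """Return (statistic, measurement name), or None if the title does not match."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        return None
    return match.group("statistic").strip(), match.group("name").strip()


# Illustrative column title, not taken from the templates.
print(parse_measurement_header("Measurement Value [mean(zeta potential)]"))
```

Separating the statistic from the measurement name in this way would make it possible to look up each part independently, for example when checking statistic names against an ontology such as STATO.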
Categories of physicochemical information which the NanoPUZZLES ISA-TAB-Nano templates were designed to capture.
| category | template(s) | comments |
|---|---|---|
| chemical composition (including surface composition, purity and levels of impurities) | “m_MaterialSourceName.xls” | Only chemical composition information associated with the original / vendor supplied nanomaterial should be reported here, i.e., not adsorption data (see below). |
| crystal structure/ | “m_MaterialSourceName.xls”; “a_InvID_PC_crystallinity_Method.xls” | — |
| shape | “m_MaterialSourceName.xls”; “a_InvID_PC_shape_Method.xls” | Both qualitative descriptions of shape or “aspect ratio” data [ |
| particle size/ | “m_MaterialSourceName.xls”; “a_InvID_PC_size_Method.xls”; “a_InvID_PC_size_DLS.xls”; “a_InvID_PC_size_TEM.xls” | Dynamic light scattering (DLS) [ |
| surface area | “m_MaterialSourceName.xls”; “a_InvID_PC_surface area_Method.xls” | This was designed to record “specific surface area” values, i.e., surface area per unit mass [ |
| surface charge/ | “m_MaterialSourceName.xls”; “a_InvID_PC_zetapotential_Method.xls” | Zeta potential is commonly used as a proxy for surface charge [ |
| adsorption | “a_InvID_PC_adsorption_Method.xls” | This was designed to record “adsorption constants” [ |
| reactivity | “a_InvID_PC_reactivity.rateofchange_of.X_SeparationTechnique_Method.xls” | The design of this template reflects the fact that, for some reactivity assays, the analysed species needs to be removed prior to making measurements [ |
| dissolution | (1) “a_InvID_PC_dissolution.conc_of.X_SeparationTechnique_Method.xls” ; | The design of these templates reflects the fact that a number of different kinds of dissolution measurement may be made for inorganic nanoparticles: (1) the (time dependent) concentrations of various species released by dissolution [ |
| molecular solubility | “a_InvID_PC_solubility_Method.xls” | In the current context, the Chemical Methods Ontology definition of “solubility” [ |
| agglomeration/ | “a_InvID_PC_AAN_BETapproach.xls” | This template was designed for recording the “average agglomeration number” derived from BET gas adsorption data, size measurements and particle density values [ |
| hydrophobicity | “m_MaterialSourceName.xls”; “a_InvID_PC_logP_Method.xls” | — |
Summary of the NanoPUZZLES business rules.
| business rule no. | short description |
|---|---|
| 1 | A new “investigation” (corresponding to a new dataset comprising a single Investigation file, a set of Study, Assay and Material files and any “external” files if applicable) should be created for each reference (e.g., journal article), unless that reference specifically states that additional information regarding experiments on the same original nanomaterial samples was reported in another reference. |
| 2 | The “Factor Value […]” columns in the Study file refer to those values which are applicable to the sample prepared immediately prior to application of an assay protocol. |
| 3 | If the entry for a “Characteristics […]”, “Factor Value […]” or “Parameter Value […]” column corresponds to multiple components (e.g., mixtures), record this as a semicolon (“;”) delimited list of the separate components. |
| 4 | If the entry for a “Characteristics […]”, “Factor Value […]” or “Parameter Value […]” column corresponds to multiple components, record the entries in corresponding columns as a semicolon (“;”) delimited list with the entries in the corresponding order. |
| 5 | Any intrinsic chemical composition information associated with a nanomaterial sample (as originally sourced) should be recorded using a Material file even if it is determined/confirmed using assay measurements reported in the publication from which the data were extracted. |
| 6 | Any suspension medium associated with the nanomaterial sample (as originally sourced) should only be described using a Material file “Material Description” column. |
| 7 | Any impurities should be described using entries in the relevant Material file “Characteristics […]” columns. |
| 8 | Any original nanomaterial components, which are neither a suspension medium nor described as “impurities” in the reference from which the data are extracted, should be described using separate rows of the Material file as per the generic ISA-TAB-Nano specification. |
| 9 | All “Sample Name” values for “true samples” should have the following form: “s_[Study Identifier]_[x]”, e.g., “s_[Study Identifier]_1”.ᵃ |
| 10 | Assay file “Measurement Value […]” column entries which correspond to concentration-response curve statistics, or similarly derived measures, should be associated with a “derived sample” identifier rather than a “true sample” identifier. |
| 11 | Imprecisely reported experimental variables should be reported using “Factor Value [statistic(original factor name)]” columns created “on-the-fly”. |
| 12 | Imprecisely reported measurement values should be reported using “Measurement Value [statistic(measurement name)]” columns created “on-the-fly”. |
| 13 | “Comment […]” columns (rows) can be added without restriction to a Study, Assay, Material (Investigation) file as long as they are appropriately positioned and as long as each new “Comment […]” column (row) has a unique name for a given file. |
| 14 | All “statistic” names must be entered in the corresponding Investigation file template “Comment [Statistic name]” row. |
| 15 | When linking to terms from ontologies, the “preferred name” should be selected and the full ID entered in the corresponding “Term Accession Number” field. |
| 16 | “Factor Value […]” column entries are allowed to be constant. |
| 17 | Only “Parameter Value […]” column entries associated with a given “Protocol REF” column entry in a Study or Assay file need to be constant. |
| 18 | Images should be linked to assay measurements using a new “ImageLink” file type, if the generic ISA-TAB-Nano approach cannot be applied. |
| 19 | Any nanomaterial structure representation files, which are not associated with specific Assay file “Measurement Value […]” entries, should be linked to the corresponding Material file using ZIP archives specified in the appropriate “Material Data File” column entry. |
| 20 | Empty “Factor Value […]”, “Parameter Value […]” or “Measurement Value […]” columns in Study or Assay files can be deleted without having to update the corresponding Investigation file “Study Protocol Parameters Name”, “Study Factor Name”, or “Study Assay Measurement Name” fields. |
| 21 | Non-applicable columns should be populated with “N/A” where this conveys information. |
| 22 | “Measurement Value [statistic(measurement name)]” columns in the templates which use a label of the form “[TO DO:…]” for the statistic or measurement name must either be updated, based on the kind of statistic and/or measurement name indicated by the label(s), or deleted. |
ᵃ Here, the “[Study Identifier]” [37] is unique to the corresponding Study file and “[x]” denotes a numeric value which is specific to a given “true sample”, meaning a prepared sample corresponding to a specific set of experimental conditions, in contrast to the “derived sample” concept introduced in NanoPUZZLES business rule no. 10.
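Business rules no. 3, 4 and 9 lend themselves to simple programmatic checks. The following Python sketch is illustrative only: the function names and example values are hypothetical, it is not the NanoPUZZLES implementation, and it assumes that the numeric “[x]” in a “true sample” name is a non-negative integer.

```python
import re


def split_components(entry):
    """Split a semicolon-delimited entry into its components (rule no. 3)."""
    return [part.strip() for part in entry.split(";")]


def pair_components(entry, linked_entry):
    """Pair components with a corresponding column's entries by position (rule no. 4)."""
    components = split_components(entry)
    linked = split_components(linked_entry)
    if len(components) != len(linked):
        raise ValueError("corresponding columns must list the same number of components")
    return list(zip(components, linked))


def is_true_sample_name(sample_name, study_identifier):
    """Check the 's_[Study Identifier]_[x]' form (rule no. 9).

    Assumption: [x] is written as one or more digits.
    """
    pattern = r"s_" + re.escape(study_identifier) + r"_\d+"
    return re.fullmatch(pattern, sample_name) is not None


# Hypothetical entries for a two-component mixture and its concentrations.
print(pair_components("NaCl;glucose", "0.9;5"))
print(is_true_sample_name("s_Study1_1", "Study1"))
```

Pairing by position relies on the ordering requirement in rule no. 4, so a length mismatch between corresponding columns is treated here as an error rather than silently truncated.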
Figure 2: Schematic overview of the steps carried out by the Python program for converting Excel (“xls”) based ISA-TAB-Nano datasets to tab-delimited text (“txt”) based ISA-TAB-Nano datasets. For simplicity, only one Investigation, Study, Assay and Material file (and no external file such as an image) is included in this hypothetical dataset. In addition to the file processing steps summarised in this schematic, basic checks are carried out on the input: (1) there should be at least one Investigation, Study, Assay and Material file; (2) there should be no duplicate column titles in a Study, Assay or Material file other than those which are explicitly allowed by the ISA-TAB-Nano specification (e.g., “Unit”).
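The two basic input checks described in the caption can be sketched as follows. This is a minimal illustration and not the authors' program: the function names are hypothetical, the “i_” and “s_” file-name prefixes are assumed from the usual ISA-TAB naming convention (only “a_” and “m_” appear in the template names listed here), and the set of column titles allowed to repeat contains only the one example given in the caption.

```python
from collections import Counter

# Assumed file-name prefixes for each required file type.
REQUIRED_PREFIXES = {
    "i_": "Investigation",
    "s_": "Study",
    "a_": "Assay",
    "m_": "Material",
}

# Only "Unit" is named in the caption; the full list of titles that the
# specification allows to repeat is not reproduced here.
ALLOWED_DUPLICATES = {"Unit"}


def missing_file_types(file_names):
    """Return the required file types with no matching file (check 1)."""
    return [
        kind
        for prefix, kind in REQUIRED_PREFIXES.items()
        if not any(name.startswith(prefix) for name in file_names)
    ]


def duplicate_columns(column_titles):
    """Return column titles that repeat without being explicitly allowed (check 2)."""
    counts = Counter(column_titles)
    return sorted(
        title
        for title, count in counts.items()
        if count > 1 and title not in ALLOWED_DUPLICATES
    )


# Hypothetical file names and column titles.
print(missing_file_types(["i_inv.xls", "s_study.xls", "a_assay.xls"]))
print(duplicate_columns(["Sample Name", "Unit", "Unit", "Sample Name"]))
```

Returning the offending file types and column titles, rather than a single pass/fail flag, would let a curator see exactly what needs to be corrected before conversion.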
Figure 5: A summary of the in vitro cell-based assay toy data in the “Toy Dataset” (Supporting Information File 4) generated via the nanoDMS system. This summary can be generated by selecting the applicable dataset entry under the "Browse" menu of the nanoDMS system.
Figure 6: A summary of the physicochemical assay toy data recorded in the “Toy Dataset” (Supporting Information File 4), generated via the nanoDMS system as per Figure 5. This does not include the hypothetical chemical composition and nominal/vendor supplied data recorded in the Material files.
Summary of some notable limitations of the NanoPUZZLES templates and business rules.
| limitation no. | brief description |
|---|---|
| 1 | Standardised reporting of stepwise sample preparation is still not handled perfectly. |
| 2 | Time dependent physicochemical characterisation data may not be perfectly captured by the templates. |
| 3 | Recording of reaction rate constants and quantum yields may need revision. |
| 4 | The manner in which chemical composition information is captured via the templates may require revision. |
| 5 | There is the possibility of information loss when mapping (raw) data reported in the literature onto predefined “Measurement Value […]” columns. |
| 6 | The current templates are not best suited to capturing experimental data for all kinds of samples. |
| 7 | The business rules regarding multiple component “characteristics”, “factors” or “parameters” (e.g., mixtures) may require revision. |
| 8 | The templates are not currently designed to capture data from in vivo toxicology studies. |
| 9 | Manually populating the Excel templates is time consuming and error prone. |