SUMMARY: As large-scale metabolic phenotyping studies become increasingly common, systematic methods for pre-processing and quality control (QC) of analytical data prior to statistical analysis have become increasingly important, both within a study and to allow meaningful inter-study comparisons. The nPYc-Toolbox provides software for the import, pre-processing, QC and visualization of metabolic phenotyping datasets, either interactively or in automated pipelines. AVAILABILITY AND IMPLEMENTATION: The nPYc-Toolbox is implemented in Python and is freely available from the Python Package Index (https://pypi.org/project/nPYc/); source code is available at https://github.com/phenomecentre/nPYc-Toolbox. Full documentation can be found at http://npyc-toolbox.readthedocs.io/, and exemplar datasets and tutorials at https://github.com/phenomecentre/nPYc-toolbox-tutorials.
Metabolic phenotyping offers a powerful window into gene-environment interactions (Nicholson ). Inter-study comparison in the field is complicated by the diversity of analytical platforms used to generate data and by the lack of standard quality criteria. Standards are emerging around the most common platforms, nuclear magnetic resonance (NMR) spectroscopy and hyphenated mass spectrometry (MS), and procedures for acquiring profiles from human biofluid samples in particular are well established (Dona ; Lewis ). However, QC in profiling studies has typically been conducted ad hoc within individual studies, although there is an increasing push towards the systematization and automation of pre-processing procedures (Giacomoni ; van Rijswijk ).
The toolbox presented here provides software for pre-processing, QC and visualization of metabolic profiling datasets, embodying the practices of the MRC-NIHR National Phenome Centre (NPC) and focusing on the interpretability of the output to both data generators and analysts (Fig. 1).
Fig. 1.
Conceptual diagram outlining the workflow embodied by the toolbox, from import of raw or feature-extracted datasets, preprocessing and filtering, QC, visualization and export
2 The nPYc-Toolbox
2.1 Implementation
The toolbox is designed to allow reproducible processing of datasets with minimal reliance on human judgement. It may be used interactively (e.g. in a Jupyter notebook, for which tutorials are provided) or as an API in automated workflows, and is written in Python 3.6. To accommodate the differing processing workflows required by the common analytical platforms outlined above, the toolbox subclasses its Dataset object: NMRDataset encapsulates methods for handling spectral NMR data, MSDataset those for discretely measured (peak-picked) hyphenated-MS profiling datasets, and TargetedDataset those for targeted, quantified datasets derived from MS, NMR or any other analytical platform.
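The subclassing design can be sketched as follows. This is a minimal illustration under stated assumptions, not the nPYc-Toolbox source: the fields (`intensity`, `sample_metadata`) and the `qc_metric_names` method are hypothetical, chosen only to show how a shared base container lets each platform subclass supply its own QC logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """Hypothetical shared container (illustrative only, not the nPYc-Toolbox class)."""
    intensity: List[List[float]]                      # rows = samples, columns = features
    sample_metadata: List[Dict] = field(default_factory=list)

    @property
    def n_samples(self) -> int:
        return len(self.intensity)

    @property
    def n_features(self) -> int:
        return len(self.intensity[0]) if self.intensity else 0

    def qc_metric_names(self) -> List[str]:
        # Each platform-specific subclass decides which QC metrics apply to its data.
        raise NotImplementedError


class NMRDataset(Dataset):
    def qc_metric_names(self) -> List[str]:
        return ["line_width", "water_suppression", "baseline_stability"]


class MSDataset(Dataset):
    def qc_metric_names(self) -> List[str]:
        return ["pooled_qc_rsd", "dilution_series_correlation"]


class TargetedDataset(Dataset):
    def qc_metric_names(self) -> List[str]:
        return ["limits_of_quantification", "batch_effects"]
```

Code written against the base class (counting samples, attaching metadata, exporting) works unchanged for every platform, while platform-specific behaviour lives only in the subclasses.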
2.2 Features
Dataset objects are initialized from raw (Bruker NMR) or feature-extracted data [outputs of software such as XCMS (Tautenhahn ), Progenesis QI™, TargetLynx™, etc.] and associated with study design parameters or metadata, read either directly from the raw data or from csv files. The csv template is structured so that each row corresponds to a sample, with columns containing a set of mandatory fields plus any other metadata the user requires. The role that each sample plays in the assay and its pre-processing is delineated using a standardized nomenclature.
Routines for pre-processing 1D NMR spectra by the automated calculation of QC metrics assessing line width, water suppression and baseline stability are implemented as described by Dona .
Current best practices in QC of LC-MS profiling (Broadhurst ; Dunn ; Lewis ; Want ) include repeated injections of pooled quality control samples and a serial dilution of the reference sample, used to calculate per-feature analytical precision and linearity of response. Correction of run-order effects follows an adapted version of the LOWESS approach proposed by Dunn .
The targeted pre-processing module contains a set of reports and data consistency checks to assist analysts in assessing the presence of batch effects, standardizing the linearity range over multiple batches, and visualizing the distribution ranges of the samples assayed and their relationships within the limits of quantification.
Exploratory data analysis with principal component analysis (PCA) is used to assess the impact of the QC choices on the final dataset and to screen for associations between acquisition parameters and study factors.
Parameter sets can be specified as JSON dictionaries, allowing simple automation and the generation of standardized workflows, with basic auditing of all manipulations of a dataset. The toolbox can therefore be used to ensure reproducible pre-processing and quality control. Processed datasets can be exported as csv files in a number of different formats.
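Line width is commonly quantified as the full width at half maximum (FWHM) of a reference peak. The sketch below assumes a single isolated peak in a 1D spectrum and interpolates linearly between data points to locate the half-height crossings; `peak_fwhm` is a hypothetical helper, and the toolbox's own line-width calculation may differ in detail.

```python
import numpy as np


def peak_fwhm(ppm, intensity):
    """FWHM of the tallest peak in a 1D spectrum, using linear interpolation (hypothetical helper)."""
    x = np.asarray(ppm, dtype=float)
    y = np.asarray(intensity, dtype=float)
    i_max = int(np.argmax(y))
    half = y[i_max] / 2.0

    # Walk outwards from the apex until the signal drops to half height on each side.
    i = i_max
    while i > 0 and y[i] > half:
        i -= 1
    j = i_max
    while j < len(y) - 1 and y[j] > half:
        j += 1
    if y[i] > half or y[j] > half:
        raise ValueError("peak does not fall to half height within the window")

    # Interpolate each crossing between the last point above and the first point below half height.
    x_left = np.interp(half, [y[i], y[i + 1]], [x[i], x[i + 1]])
    x_right = np.interp(half, [y[j], y[j - 1]], [x[j], x[j - 1]])
    return abs(x_right - x_left)  # abs() handles the usual decreasing ppm axis
```

On a ppm axis the result is in ppm; multiplying by the spectrometer frequency in MHz converts it to Hz (for example, 0.002 ppm at 600 MHz is 1.2 Hz).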
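Per-feature precision and linearity can be computed directly from the pooled-QC injections and the serial-dilution series. In this sketch, precision is the relative standard deviation (RSD) across pooled-QC injections and linearity is the Pearson correlation between intensity and dilution factor. The helper name and the default thresholds (30% RSD, correlation 0.7) are illustrative assumptions, not settings taken from the text.

```python
import numpy as np


def feature_filter(intensity, is_pooled_qc, dilution_factor, is_dilution,
                   rsd_threshold=30.0, corr_threshold=0.7):
    """Per-feature precision (RSD, %) and linearity (Pearson r); hypothetical helper.

    intensity       : (n_samples, n_features) array
    is_pooled_qc    : boolean mask marking repeated pooled-QC injections
    dilution_factor : relative concentration per sample (read only where is_dilution is True)
    is_dilution     : boolean mask marking serial-dilution samples
    """
    X = np.asarray(intensity, dtype=float)
    qc_mask = np.asarray(is_pooled_qc, dtype=bool)
    dil_mask = np.asarray(is_dilution, dtype=bool)

    # Precision: RSD of each feature across the pooled-QC injections.
    qc = X[qc_mask]
    rsd = 100.0 * qc.std(axis=0, ddof=1) / qc.mean(axis=0)

    # Linearity: correlation of each feature with the dilution factor.
    dil = X[dil_mask]
    conc = np.asarray(dilution_factor, dtype=float)[dil_mask]
    conc_c = conc - conc.mean()
    dil_c = dil - dil.mean(axis=0)
    corr = (conc_c @ dil_c) / (np.sqrt((conc_c ** 2).sum()) * np.sqrt((dil_c ** 2).sum(axis=0)))

    passing = (rsd < rsd_threshold) & (corr > corr_threshold)
    return rsd, corr, passing
```

A feature is kept only if it is both reproducible across repeated injections and responds proportionally to dilution; either check alone would let through features that are precise but not concentration-dependent, or linear but noisy.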
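The run-order correction can be illustrated with a per-feature smoother fitted to the pooled-QC injections only and then evaluated at every injection. This sketch uses a local linear fit with tricube weights (LOWESS without the robustness iterations), divides each sample by the fitted trend and rescales to the QC median. It is a simplification under stated assumptions, not the toolbox's adapted implementation, and it assumes the fitted trend stays positive.

```python
import numpy as np


def lowess_fit(x, y, x_eval, frac=0.5):
    """Local linear regression with tricube weights, evaluated at x_eval."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(frac * n))))  # neighbourhood size for each local fit
    A = np.column_stack([np.ones(n), x])         # design matrix for y = a + b * x
    x_eval = np.asarray(x_eval, dtype=float)
    fitted = np.empty(len(x_eval))
    for j, x0 in enumerate(x_eval):
        d = np.abs(x - x0)
        h = np.sort(d)[k - 1]                    # bandwidth: distance to k-th nearest QC
        h = h if h > 0 else 1.0
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3  # tricube weights, zero beyond h
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[j] = coef[0] + coef[1] * x0
    return fitted


def correct_run_order(intensity, run_order, is_qc, frac=0.5):
    """Divide one feature by its QC-derived drift trend, then rescale to the QC median."""
    y = np.asarray(intensity, dtype=float)
    order = np.asarray(run_order, dtype=float)
    qc = np.asarray(is_qc, dtype=bool)
    trend = lowess_fit(order[qc], y[qc], order, frac=frac)
    return y / trend * np.median(y[qc])
```

A library smoother such as the `lowess` function in statsmodels would normally replace the hand-written loop; it is written out here only to make the weighting explicit. Fitting on QC injections alone matters because those samples are identical, so any trend they show is attributable to acquisition rather than to the study samples.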
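Two of the targeted checks can be sketched in a few lines: labelling each concentration relative to the lower and upper limits of quantification (LLOQ, ULOQ), and harmonizing the linear range across batches. Taking the intersection of the per-batch ranges (highest LLOQ, lowest ULOQ) is an assumption made here as the most conservative choice, and both function names are hypothetical.

```python
def harmonise_loq(batch_limits):
    """Common quantification range across batches: highest LLOQ and lowest ULOQ (assumed rule)."""
    lloq = max(lo for lo, _ in batch_limits)
    uloq = min(hi for _, hi in batch_limits)
    if lloq >= uloq:
        raise ValueError("batches share no common quantification range")
    return lloq, uloq


def classify_against_loq(values, lloq, uloq):
    """Label each measured concentration relative to the limits of quantification."""
    labels = []
    for v in values:
        if v is None:
            labels.append("missing")
        elif v < lloq:
            labels.append("<LLOQ")
        elif v > uloq:
            labels.append(">ULOQ")
        else:
            labels.append("within")
    return labels


# Example: three batches with slightly different calibration ranges.
limits = harmonise_loq([(0.5, 100.0), (1.0, 120.0), (0.8, 90.0)])
labels = classify_against_loq([0.2, 5.0, 95.0, None], *limits)
```

Applying one shared range to every batch means a value reported as "within" carries the same meaning regardless of which batch it came from.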
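The PCA screen can be sketched with NumPy alone: autoscale the samples-by-features matrix, take scores from a singular value decomposition, and correlate each component with an acquisition parameter such as run order. A strong correlation suggests that a dominant source of variance may reflect acquisition rather than biology. The helpers are hypothetical and assume no zero-variance features.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """PCA scores and explained-variance ratios of an autoscaled samples x features matrix."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # mean-centre, unit variance per feature
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


def association_with(scores, covariate):
    """Pearson correlation of each component's scores with a covariate (e.g. run order, batch)."""
    c = np.asarray(covariate, dtype=float)
    return np.array([np.corrcoef(scores[:, i], c)[0, 1] for i in range(scores.shape[1])])
```

The same `association_with` call can be repeated for each acquisition parameter and each study factor; because the sign of a principal component is arbitrary, the magnitude of the correlation is what matters.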
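A JSON-driven workflow with an audit trail might look like the following. The step names, parameter keys and sample fields are hypothetical, not the toolbox's parameter schema; the point is that each manipulation is looked up by name, applied with its JSON parameters, and recorded with before and after counts.

```python
import json

# Hypothetical parameter set: an ordered list of named steps with their parameters.
CONFIG = """
{
  "steps": [
    {"name": "exclude_sample_types", "parameters": {"types": ["blank"]}},
    {"name": "exclude_low_quality", "parameters": {"min_score": 0.5}}
  ]
}
"""


def exclude_sample_types(samples, types):
    excluded = set(types)
    return [s for s in samples if s["type"] not in excluded]


def exclude_low_quality(samples, min_score):
    return [s for s in samples if s["quality"] >= min_score]


# Registry mapping step names in the JSON to the functions that implement them.
STEPS = {
    "exclude_sample_types": exclude_sample_types,
    "exclude_low_quality": exclude_low_quality,
}


def run_pipeline(samples, params_json):
    """Apply each configured step in order and log what it did."""
    config = json.loads(params_json)
    audit = []
    for step in config["steps"]:
        name, kwargs = step["name"], step.get("parameters", {})
        n_before = len(samples)
        samples = STEPS[name](samples, **kwargs)
        audit.append({"step": name, "parameters": kwargs,
                      "n_before": n_before, "n_after": len(samples)})
    return samples, audit
```

Because the audit entries are plain dictionaries, they can themselves be saved with `json.dump` alongside the processed data, so the exact parameters behind any exported dataset remain recoverable.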
3 Conclusion
The nPYc-Toolbox supports both profiling and targeted metabolic phenotyping datasets, and provides tools for pre-processing, quality control and visualization.
Funding
This work was supported by the Medical Research Council and National Institute for Health Research [grant number MC_PC_12025] through funding for the MRC-NIHR National Phenome Centre. Infrastructure support was provided by the NIHR Imperial Biomedical Research Centre and by PhenoMeNal, European Commission Horizon2020 programme, grant agreement number 654241.
Conflict of Interest: none declared.
References
Broadhurst D, Goodacre R, Reinke SN, Kuligowski J, Wilson ID, Lewis MR, Dunn WB (2018) Metabolomics.
Dona AC, Jiménez B, Schäfer H, Humpfer E, Spraul M, Lewis MR, Pearce JTM, Holmes E, Lindon JC, Nicholson JK (2014) Anal Chem.
Dunn WB, Broadhurst D, Begley P, Zelena E, Francis-McIntyre S, Anderson N, Brown M, Knowles JD, Halsall A, Haselden JN, Nicholls AW, Wilson ID, Kell DB, Goodacre R (2011) Nat Protoc.
Lewis MR, Pearce JTM, Spagou K, Green M, Dona AC, Yuen AHY, David M, Berry DJ, Chappell K, Horneffer-van der Sluis V, Shaw R, Lovestone S, Elliott P, Shockcor J, Lindon JC, Cloarec O, Takats Z, Holmes E, Nicholson JK (2016) Anal Chem.
Nicholson JK, Holmes E, Kinross JM, Darzi AW, Takats Z, Lindon JC (2012) Nature.
Want EJ, Wilson ID, Gika H, Theodoridis G, Plumb RS, Shockcor J, Holmes E, Nicholson JK (2010) Nat Protoc.