Tom J Pollard, Alistair E W Johnson, Jesse D Raffa, Roger G Mark.
Abstract
OBJECTIVES: In quantitative research, understanding basic parameters of the study population is key for interpretation of the results. As a result, it is typical for the first table ("Table 1") of a research paper to include summary statistics for the study data. Our objectives are 2-fold. First, we seek to provide a simple, reproducible method for generating summary statistics for research papers in the Python programming language. Second, we seek to use the package to improve the quality of summary statistics reported in research papers.
Keywords: descriptive statistics; python; quantitative research
Year: 2018 PMID: 31984317 PMCID: PMC6951995 DOI: 10.1093/jamiaopen/ooy012
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Example of a table produced by the tableone package when applied to a small subset of data from MIMIC-III
| Variables | Level | Is null | Overall |
|---|---|---|---|
| n | | | 1000 |
| Age (years), median (IQR) | | 0 | 68 (53–79) |
| SysABP (mmHg), mean (SD) | | 291 | 114.25 (40.16) |
| Height (cm), mean (SD) | | 475 | 170.09 (22.06) |
| Weight (pounds), mean (SD) | | 302 | 82.93 (23.83) |
| ICU type, n (%) | CCU | 0 | 162 (16.2) |
| | CSRU | | 202 (20.2) |
| | MICU | | 380 (38.0) |
| | SICU | | 256 (25.6) |
| In-hospital mortality, n (%) | 0 | 0 | 864 (86.4) |
| | 1 | | 136 (13.6) |
Warnings about inappropriate summaries of the data are raised during generation and displayed below the table.
Warning, Hartigan's Dip Test reports possible multimodal distributions for: Age, Height, SysABP.
Warning, Tukey rule indicates far outliers in: Height.
IQR: interquartile range; SysABP: systolic arterial blood pressure; ICU: intensive care unit.
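The three summary formats used in the table, median (IQR), mean (SD), and n (%), can be illustrated with a minimal sketch built on the Python standard library. This is not the tableone implementation: the helper names (`median_iqr`, `mean_sd`, `count_pct`) are hypothetical, and the quartile method (`"inclusive"`) is an assumption that may differ from the one the package uses.

```python
import math
import statistics
from collections import Counter


def _non_missing(values):
    """Drop None and NaN entries so they do not distort the summary."""
    return [v for v in values
            if v is not None and not (isinstance(v, float) and math.isnan(v))]


def median_iqr(values):
    """Format a non-normal continuous variable as 'median (Q1-Q3)'."""
    clean = sorted(_non_missing(values))
    # Assumption: inclusive quartiles; other conventions give slightly different cut points.
    q1, q2, q3 = statistics.quantiles(clean, n=4, method="inclusive")
    return f"{q2:.0f} ({q1:.0f}-{q3:.0f})"


def mean_sd(values, ndigits=2):
    """Format a continuous variable as 'mean (SD)' using the sample SD."""
    clean = _non_missing(values)
    return (f"{statistics.mean(clean):.{ndigits}f} "
            f"({statistics.stdev(clean):.{ndigits}f})")


def count_pct(values):
    """Format each level of a categorical variable as 'n (%)'."""
    counts = Counter(_non_missing(values))
    total = sum(counts.values())
    return {level: f"{n} ({100 * n / total:.1f})"
            for level, n in sorted(counts.items())}


# ICU type counts taken from the table above, expanded into one label per case.
icu = ["CCU"] * 162 + ["CSRU"] * 202 + ["MICU"] * 380 + ["SICU"] * 256
icu_summary = count_pct(icu)
```

Missing values are dropped before each summary is computed, which is why the table reports the number of null entries alongside each variable.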
Figure 1. An executable Jupyter Notebook provides worked examples for applying the tableone package to exemplar data.
Example of the data used, showing the first 5 rows
| Age | SysABP | Height | Weight | ICU | MechVent | LOS | death |
|---|---|---|---|---|---|---|---|
| 54 | NaN | NaN | NaN | SICU | 0 | 5 | 0 |
| 76 | 105.0 | 175.3 | 80.6 | CSRU | 1 | 8 | 0 |
| 44 | 148.0 | NaN | 56.7 | MICU | 0 | 19 | 0 |
| 68 | NaN | 180.3 | 84.6 | MICU | 0 | 9 | 0 |
| 88 | NaN | NaN | NaN | MICU | 0 | 4 | 0 |
Each row captures a unique case (eg a patient) and each column pertains to an observation associated with the case (eg patient age).
NaN: Not a Number; SysABP: systolic arterial blood pressure; ICU: intensive care unit; SICU: surgical ICU; CSRU: cardiac surgery recovery unit; MICU: medical ICU; MechVent: mechanical ventilation; LOS: hospital length of stay.
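As a rough illustration of the one-row-per-case layout, the sketch below loads the five rows shown above from an inline CSV string and counts missing entries per column, which is the quantity reported in the "Is null" column of a summary table. The names `RAW`, `MISSING`, `load_rows`, and `count_missing` are hypothetical, and treating the literal string "NaN" as missing is an assumption about how the data are encoded.

```python
import csv
import io

# The first 5 rows of the example data, one case per row.
RAW = """Age,SysABP,Height,Weight,ICU,MechVent,LOS,death
54,NaN,NaN,NaN,SICU,0,5,0
76,105.0,175.3,80.6,CSRU,1,8,0
44,148.0,NaN,56.7,MICU,0,19,0
68,NaN,180.3,84.6,MICU,0,9,0
88,NaN,NaN,NaN,MICU,0,4,0
"""

# Assumption: empty cells and the string "NaN" both mark a missing observation.
MISSING = {"", "NaN"}


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def count_missing(rows):
    """Return the number of missing entries in each column."""
    columns = list(rows[0].keys()) if rows else []
    return {col: sum(1 for row in rows if row[col] in MISSING)
            for col in columns}


rows = load_rows(RAW)
null_counts = count_missing(rows)
```

In these five rows, SysABP and Height are each missing for three cases and Weight for two, while the remaining columns are complete.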
Figure 2. A test for modality raises a warning message for both “Age” and “SysABP” (systolic arterial blood pressure). Upon inspection, SysABP shows clear peaks at both ∼0 and ∼120.
Figure 3. Box plot of 3 variables with whiskers located at a distance of 3 times the interquartile range. Points outside these whiskers are labeled “far outliers” and denoted by circles. A test for far outliers with Tukey’s rule raises a warning for height but not age or systolic arterial blood pressure (SysABP).
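The far-outlier check described in this caption can be sketched as follows: a point is flagged when it lies more than 3 times the interquartile range below the first quartile or above the third quartile. The function name `far_outliers` is hypothetical, the quartile method is an assumption, and the height values in the usage example are invented for illustration rather than taken from the study data.

```python
import math
import statistics


def far_outliers(values, k=3.0):
    """Return values lying beyond k * IQR from the quartiles (Tukey's rule)."""
    clean = sorted(v for v in values
                   if v is not None and not math.isnan(v))
    # Assumption: inclusive quartiles; other conventions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(clean, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in clean if v < lower or v > upper]


# Hypothetical heights in cm; 13 is far below the lower fence.
heights = [160, 165, 170, 175, 180, 13]
flagged = far_outliers(heights)

# Ages from the first 5 rows of the example data; none is flagged.
ages = [54, 76, 44, 68, 88]
no_flags = far_outliers(ages)
```

With `k=3.0` the fences match the whisker positions in the box plot; the more familiar choice of `k=1.5` would flag ordinary outliers rather than far outliers.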