Marilyn N Martinez, Mary J Bartholomew.
Abstract
Typically, investigations are conducted with the goal of generating inferences about a population (human or animal). Since it is not feasible to evaluate the entire population, the study is conducted using a randomly selected subset of that population. With the goal of using the results generated from that sample to provide inferences about the true population, it is important to consider the properties of the population distribution and how well they are represented by the sample (the subset of values). Consistent with that study objective, it is necessary to identify and use the most appropriate set of summary statistics to describe the study results. Inherent in that choice is the need to identify the specific question being asked and the assumptions associated with the data analysis. The estimate of a "mean" value is an example of a summary statistic that is sometimes reported without adequate consideration of its implications or of the underlying assumptions associated with the data being evaluated. When these critical considerations are ignored, the method of calculating the variance may be inconsistent with the type of mean being reported. Furthermore, there can be confusion about why a single set of values may be represented by summary statistics that differ across published reports. In an effort to remedy some of this confusion, this manuscript describes the basis for selecting among various ways of representing the mean of a sample, their corresponding methods of calculation, and the appropriate methods for estimating their standard deviations.
Keywords: animal pharmacology; data analysis; pharmacokinetics
Year: 2017 PMID: 28406450 PMCID: PMC5489931 DOI: 10.3390/pharmaceutics9020014
Source DB: PubMed Journal: Pharmaceutics ISSN: 1999-4923 Impact factor: 6.321
Figure 1. Best fit distribution for “Number” and for “Ln number”.
Comparison of statistics based on original values and Ln-transformed values back-transformed to the original units. Stdev: standard deviation.
| Transformation Used | Mean | Mean ± Stdev | Mean ± 2 Stdev | Mean ± 3 Stdev |
|---|---|---|---|---|
| Untransformed | 11.6 | 5.9, 17.3 | 0.2, 23 | −5.5, 28.7 |
| Back-transformed from Ln | 10.4 | 6.2, 17.3 | 3.7, 28.8 | 2.2, 47.9 |
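The comparison above can be reproduced with a short script. This is a minimal sketch, assuming the ten observations listed in the tables that follow and the sample standard deviation (n − 1 denominator); the helper name `intervals` is hypothetical and used only for illustration.

```python
import math
import statistics

# The ten observations tabulated in this article.
values = [11, 7, 9, 4, 10, 12, 23, 15, 7, 18]


def intervals(center, spread, back_transform=lambda x: x):
    """Return {k: (lower, upper)} for center ± k*spread, with k = 1, 2, 3."""
    return {
        k: (back_transform(center - k * spread),
            back_transform(center + k * spread))
        for k in (1, 2, 3)
    }


# Untransformed scale: arithmetic mean and sample stdev.
raw = intervals(statistics.mean(values), statistics.stdev(values))

# Ln scale: summarize the logs, then exponentiate the interval limits.
logs = [math.log(v) for v in values]
ln_based = intervals(statistics.mean(logs), statistics.stdev(logs), math.exp)

for k in (1, 2, 3):
    print(k, [round(x, 1) for x in raw[k]], [round(x, 1) for x in ln_based[k]])
```

On the untransformed scale the interval is symmetric and its lower limit becomes negative at 3 Stdev, whereas limits back-transformed from the Ln scale are asymmetric and always positive.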
Example of differing summary values depending upon the underlying assumptions and estimation procedures for obtaining the mean.
| | Observed value (arithmetic) | Ln of value (geometric) | Reciprocal of value (harmonic) |
|---|---|---|---|
| | 11 | 2.40 | 0.09 |
| | 7 | 1.95 | 0.14 |
| | 9 | 2.20 | 0.11 |
| | 4 | 1.39 | 0.25 |
| | 10 | 2.30 | 0.10 |
| | 12 | 2.48 | 0.08 |
| | 23 | 3.14 | 0.04 |
| | 15 | 2.71 | 0.07 |
| | 7 | 1.95 | 0.14 |
| | 18 | 2.89 | 0.06 |
| Mean | 11.60 | 10.38 | 9.20 |
| Stdev | 5.70 | 5.29 | 5.06 |
| %CV | 49.14 | 50.96 | 50.03 |
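The three means in the summary rows can be checked with Python's standard library. The following is a minimal sketch that assumes the ten observed values shown in the table; it illustrates the calculation and is not the authors' own code.

```python
import statistics

# Observed values from the table above.
values = [11, 7, 9, 4, 10, 12, 23, 15, 7, 18]

arithmetic = statistics.mean(values)           # sum(x) / n
geometric = statistics.geometric_mean(values)  # exp(mean(ln x))
harmonic = statistics.harmonic_mean(values)    # n / sum(1 / x)

print(round(arithmetic, 2), round(geometric, 2), round(harmonic, 2))
```

For any set of positive values that are not all equal, the ordering arithmetic > geometric > harmonic holds, which is why a single dataset can yield different reported "means".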
Matching the selection of mean to the nature of the distribution and the question being addressed.
| Type of mean and standard deviation | Nature of distribution (examples) | Considerations for its application |
|---|---|---|
| Arithmetic | Normal (i.e., additive) | When the estimate of interest is based upon the sum of the individual observations and the data are best presented as a normal distribution. |
| Harmonic | Reciprocal transformation of positive real values | In general, the harmonic mean (HM) is useful for expressing average rates (e.g., miles per hour; widgets per day). In clinical pharmacology, an elimination rate constant, |
| Geometric | Log transformation of positive real values | Within the realm of pharmacokinetics, geometric means are typically used when describing the means of variables such as area under the curve (AUC) and maximum concentrations ( |
| Least square means | Should be consistent with the distribution characteristics of the data collected and the model used to address the study assumptions and investigation | The use of least square means is important when there are an unequal number of observations associated with any of the terms in the statistical model. |
Comparison of arithmetic vs. harmonic standard deviation (Stdev) values and Excel codes for T1/2.
| Data Row | | Column C (T1/2) | Column D (λz) | Column E (T1/2 sq′d deviations) | Column F (λz sq′d deviations) |
|---|---|---|---|---|---|
| Row 2 | | 11 | 0.063 | (C2–C12)^2 | (D2–D12)^2 |
| Row 3 | | 7 | 0.099 | (C3–C12)^2 | (D3–D12)^2 |
| Row 4 | | 9 | 0.077 | (C4–C12)^2 | (D4–D12)^2 |
| Row 5 | | 4 | 0.17325 | (C5–C12)^2 | (D5–D12)^2 |
| Row 6 | | 10 | 0.0693 | (C6–C12)^2 | (D6–D12)^2 |
| Row 7 | | 12 | 0.05775 | (C7–C12)^2 | (D7–D12)^2 |
| Row 8 | | 23 | 0.0301 | (C8–C12)^2 | (D8–D12)^2 |
| Row 9 | | 15 | 0.0462 | (C9–C12)^2 | (D9–D12)^2 |
| Row 10 | | 7 | 0.099 | (C10–C12)^2 | (D10–D12)^2 |
| Row 11 | | 18 | 0.0385 | (C11–C12)^2 | (D11–D12)^2 |
| Row 12 | Arithmetic mean | 11.6 | 0.0753 | | |
| Row 13 | Harmonic mean of T1/2 | | 9.20 | | |
| | Arithmetic Stdev | 5.70 | | sqrt[(sum(E2:E11))/9] | |
| | Harmonic Stdev | | 5.06 | | [D13^2 × sqrt[(sum(F2:F11))/9]]/0.693 |
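The "Harmonic Stdev" formula in the last row can be sketched in Python as follows. The sketch assumes that λz = 0.693/T1/2 (the constant used in the table) and that the standard deviation of λz is carried over to T1/2 units by multiplying by HM²/0.693, as the spreadsheet formula does; the variable names are illustrative only.

```python
import statistics

LN2 = 0.693  # rounded constant used in the table; math.log(2) is more precise

half_lives = [11, 7, 9, 4, 10, 12, 23, 15, 7, 18]   # Column C
lambda_z = [LN2 / t for t in half_lives]            # Column D

# Harmonic mean of T1/2 = 0.693 / arithmetic mean of lambda_z (cell D13).
harmonic_mean = LN2 / statistics.mean(lambda_z)

# Sample stdev of lambda_z, i.e. sqrt(sum(F2:F11) / 9).
sd_lambda = statistics.stdev(lambda_z)

# Stdev expressed in T1/2 units: D13^2 * sd(lambda_z) / 0.693.
harmonic_sd = harmonic_mean ** 2 * sd_lambda / LN2

arithmetic_sd = statistics.stdev(half_lives)
print(round(harmonic_mean, 2), round(harmonic_sd, 2), round(arithmetic_sd, 2))
```

Pairing the harmonic mean with the arithmetic Stdev would mix two different sets of assumptions, which is the inconsistency this comparison is meant to expose.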
Comparison of estimating average change when expressed as an arithmetic versus as a geometric mean when there are compounded (multiplicative) changes.
| Row 2 | 0.25 | 1.25 | - | |
| Row 3 | −0.36 | 0.64 | ||
| Row 4 | 0.18 | 1.18 | ||
| Row 5 | 0.14 | 1.14 | ||
| Row 6 | −0.75 | 0.25 | ||
| Arithmetic mean | - | −0.108 | - | |
| Geometric mean | 0.769 | (D2 × D3 × D4 × D5 × D6)^(1/5) | |
| Proportion remaining | - | 0.892 | - | - |
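To see why the geometric mean is the appropriate average for compounded changes, the table's calculation can be sketched as below. The five proportional changes are taken from the table; the variable names are illustrative assumptions.

```python
import math

# Proportional change in each period, and the corresponding multiplier.
changes = [0.25, -0.36, 0.18, 0.14, -0.75]
factors = [1 + c for c in changes]

arithmetic_mean_change = sum(changes) / len(changes)
geometric_mean_factor = math.prod(factors) ** (1 / len(factors))

# What actually remains after compounding all five changes.
actual_remaining = math.prod(factors)

# Applying each "average" five times in a row.
via_geometric = geometric_mean_factor ** len(factors)
via_arithmetic = (1 + arithmetic_mean_change) ** len(factors)

print(round(actual_remaining, 3), round(via_geometric, 3), round(via_arithmetic, 3))
```

Only the geometric mean, applied once per period, reproduces the compounded result; the arithmetic mean of the changes overstates what remains.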
Estimation of bacterial growth.
| Doubling time or halving time ( | Time duration ( | Value of “ | Relative change ( | Resulting # bacteria ( |
|---|---|---|---|---|
| 10 | 20 | 100 | 4 | 400 |
| 35 | 120 | 400 | 0.0929 | 37.150 |
| 10 | 60 | 37.15 | 64 | 2377.591 |
Excel spreadsheet code for estimating bacterial growth rate.
| Data Row | Column A | Column B | Column C | Column D | Column E |
|---|---|---|---|---|---|
| | Doubling time or halving time | Time duration | Value of “ | Relative change | Resulting # bacteria ( |
| Row 1 | 10 | 20 | 100 | 2^(B1/A1) | C1 × D1 |
| Row 2 | 35 | 120 | 400 | 0.5^(B2/A2) | C2 × D2 |
| Row 3 | 10 | 60 | 37.15 | 2^(B3/A3) | C3 × D3 |
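The spreadsheet logic can be expressed as a short Python sketch. It assumes three consecutive phases (growth, decline, growth) with the doubling or halving times and durations shown in the table, starting from 100 bacteria; the function name and the `growing` flag are hypothetical names introduced for illustration.

```python
def relative_change(doubling_or_halving_time, duration, growing):
    """Relative change over `duration`: 2^(t/T) for growth, 0.5^(t/T) for decline."""
    base = 2.0 if growing else 0.5
    return base ** (duration / doubling_or_halving_time)


# (doubling/halving time in min, duration in min, growth phase?)
phases = [(10, 20, True), (35, 120, False), (10, 60, True)]

count = 100.0
history = []
for time_constant, duration, growing in phases:
    count *= relative_change(time_constant, duration, growing)
    history.append(count)

print([round(c, 2) for c in history])
```

Because the script carries full precision between phases, the final count can differ in the second decimal place from a spreadsheet that starts the last phase from the rounded value 37.15.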
Calculating the number of bacteria at 10-min intervals. GM: geometric mean.
| Column A | Column B | Column C | Column D | Column E | Column F | Column G |
|---|---|---|---|---|---|---|
| Row | Minutes | 10 min change rate change ( | Number of bacteria ( | Estimating | Number of bacteria based on geometric mean | |
| 2 | 0 | | 100 | | | |
| 3 | 10 | 2 | 200 | D2 × C3 | 117.17 | D2 × $C$26 |
| 4 | 20 | 2 | 400 | D3 × C4 | 137.28 | F3 × $C$26 |
| 5 | 30 | 0.82 | 328.13 | D4 × C5 | 160.85 | F4 × $C$26 |
| 6 | 40 | 0.82 | 269.18 | D5 × C6 | 188.46 | F5 × $C$26 |
| 7 | 50 | 0.82 | 220.82 | D6 × C7 | 220.82 | F6 × $C$26 |
| 8 | 55 | | 200.00 | D7 × (0.82^0.5) | | |
| 9 | 60 | 0.82 | 181.14 | D8 × (C9^0.5) | 258.73 | F7 × $C$26 |
| 10 | 70 | 0.82 | 148.60 | D9 × C10 | 303.14 | F9 × $C$26 |
| 11 | 80 | 0.82 | 121.90 | D10 × C11 | 355.18 | F10 × $C$26 |
| 12 | 90 | 0.82 | 100.00 | D11 × C12 | 416.16 | F11 × $C$26 |
| 13 | 100 | 0.82 | 82.03 | D12 × C13 | 487.60 | F12 × $C$26 |
| 14 | 110 | 0.82 | 67.30 | D13 × C14 | 571.31 | F13 × $C$26 |
| 15 | 120 | 0.82 | 55.20 | D14 × C15 | 669.39 | F14 × $C$26 |
| 16 | 125 | | 50.00 | D15 × (0.82^0.5) | | |
| 17 | 130 | 0.82 | 45.29 | D16 × (C17^0.5) | 784.31 | F15 × $C$26 |
| 18 | 140 | 0.82 | 37.15 | D17 × C18 | 918.96 | F17 × $C$26 |
| 19 | 150 | 2 | 74.30 | D18 × C19 | 1076.72 | F18 × $C$26 |
| 20 | 160 | 2 | 148.60 | D19 × C20 | 1261.56 | F19 × $C$26 |
| 21 | 170 | 2 | 297.20 | D20 × C21 | 1478.14 | F20 × $C$26 |
| 22 | 180 | 2 | 594.40 | D21 × C22 | 1731.90 | F21 × $C$26 |
| 23 | 190 | 2 | 1188.80 | D22 × C23 | 2029.22 | F22 × $C$26 |
| 24 | 200 | 2 | 2377.59 | D23 × C24 | 2377.59 | F23 × $C$26 |
| 25 | GM | Product (C3:C24)^(1/20) | | | | |
| 26 | GM | 1.17 | | | | |
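The geometric mean in row 26 and the two count columns can be reproduced with the sketch below. It assumes twenty 10-min steps (two doublings, twelve steps at the 35-min halving rate, six doublings) and omits the intermediate 55- and 125-min rows; the variable names are illustrative.

```python
import math

# 10-min change during the halving phase: 0.5^(10/35), about 0.82.
halving_step = 0.5 ** (10 / 35)
rates = [2.0] * 2 + [halving_step] * 12 + [2.0] * 6   # twenty 10-min steps

# Geometric mean of the twenty 10-min change rates (cell C26).
gm = math.prod(rates) ** (1 / len(rates))

actual = [100.0]     # counts driven by the observed rate in each step
gm_based = [100.0]   # counts driven by the constant geometric mean rate
for rate in rates:
    actual.append(actual[-1] * rate)
    gm_based.append(gm_based[-1] * gm)

print(round(gm, 2), round(actual[2], 2), round(gm_based[2], 2))
print(round(actual[-1], 2), round(gm_based[-1], 2))
```

The geometric mean reproduces the final count after 200 min but not the path taken to reach it: at 20 min the observed count is 400 while the GM-based count is about 137.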
Figure 2. AUC for estimating average CFD over time.
Estimation of geometric means (based upon Ln-transformation of individual observations).
| Data Row | Number | Ln number (Column E) | Excel code |
|---|---|---|---|
| Row 2 | 11 | 2.40 | - |
| Row 3 | 7 | 1.95 | |
| Row 4 | 9 | 2.20 | |
| Row 5 | 4 | 1.39 | |
| Row 6 | 10 | 2.30 | |
| Row 7 | 12 | 2.48 | |
| Row 8 | 23 | 3.14 | |
| Row 9 | 15 | 2.71 | |
| Row 10 | 7 | 1.95 | |
| Row 11 | 18 | 2.89 | |
| Arithmetic mean | 11.6 | 2.34 | sum(E2:E11)/10 |
| Geometric mean (exponentiation of arithmetic mean of Ln values) | - | 10.4 | exp[sum(E2:E11)/10] |
Values and spreadsheet code for estimating the geometric stdevs.
| Row | Number | Ln number (Column B) | Geometric sq′d dev (Column D) | Excel code |
|---|---|---|---|---|
| 1 | 11 | 2.40 | 0.00 | (B1–D12)^2 |
| 2 | 7 | 1.95 | 0.15 | (B2–D12)^2 |
| 3 | 9 | 2.20 | 0.02 | (B3–D12)^2 |
| 4 | 4 | 1.39 | 0.91 | (B4–D12)^2 |
| 5 | 10 | 2.30 | 0.00 | (B5–D12)^2 |
| 6 | 12 | 2.48 | 0.02 | (B6–D12)^2 |
| 7 | 23 | 3.14 | 0.63 | (B7–D12)^2 |
| 8 | 15 | 2.71 | 0.14 | (B8–D12)^2 |
| 9 | 7 | 1.95 | 0.15 | (B9–D12)^2 |
| 10 | 18 | 2.89 | 0.30 | (B10–D12)^2 |
| 11 | Sum | - | 2.34 | sum(D1:D10) |
| 12 | Average | - | | sum(B1:B10)/10 |
| 13 | Geometric mean | | | exp(D12) |
| 14 | Arithmetic Stdev | - | | |
| 15 | Geometric Stdev | | 5.29 | D13 × sqrt(D11/9) |
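The spreadsheet steps for the geometric Stdev can be sketched in Python as follows. The sketch follows the table's formula, geometric mean × sample stdev of the Ln values, which expresses the spread in the original units; this should not be confused with the dimensionless multiplicative factor exp(stdev of Ln values) that is also sometimes called a geometric standard deviation. Variable names are illustrative.

```python
import math
import statistics

numbers = [11, 7, 9, 4, 10, 12, 23, 15, 7, 18]
logs = [math.log(x) for x in numbers]          # Ln number column

mean_ln = statistics.mean(logs)                # cell D12
sum_sq_dev = sum((v - mean_ln) ** 2 for v in logs)   # cell D11
geometric_mean = math.exp(mean_ln)             # cell D13

# D13 * sqrt(D11 / 9): stdev of the Ln values scaled by the geometric mean.
geometric_sd = geometric_mean * math.sqrt(sum_sq_dev / (len(logs) - 1))

print(round(geometric_mean, 2), round(sum_sq_dev, 2), round(geometric_sd, 2))
```

The same result is obtained from `math.exp(statistics.mean(logs)) * statistics.stdev(logs)`, since `statistics.stdev` uses the n − 1 denominator.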
Arithmetic means versus LSmeans for the effect of exercise on body weights.
| Male | Female | Male | Female | |
| 210 | 150 | 200 | 138 | |
| 215 | 168 | 192 | 138 | |
| 189 | 145 | 176 | 144 | |
| 196 | 160 | 202 | 154 | |
| 202 | 166 | 210 | 140 | |
| 155 | 189 | |||
| 159 | 176 | |||
| 149 | 188 | |||
| 138 | 192 | |||
| 188 | ||||
| Marginal means | 202 | 158 | 192 | 143 |
| Arithmetic mean | 173 | 174 | ||
| LSmean | 180 | 167 | ||
Figure 3. The distribution of weight values for the no exercise group predicted using the LSmean and a stdev estimate based on the standard error (SE) of the LSmean when assuming that the sample was generated from a single normal population.
Figure 4. Distribution of female weights based upon the mean and variance of the sample values of body weights of females in the no exercise group. The X-axis represents body weight (lbs) and the Y-axis represents the fraction of the population of females at that body weight based upon the information contained within the subset of individuals included in this hypothetical study. The possible values of the Y-axis range from zero (no individuals expected at that body weight) to 1 (all individuals in the population have the identical body weight).
Figure 5. Distribution of male weights based upon the mean and variance of the sample values of body weights of the males in the no exercise group.
Figure 6. Distribution of population of male and female body weights based upon the mean and variance of the sample values from the no-exercise group, assuming an equal likelihood of sampling from either gender.
Comparison of stdev based upon an assumption of a single versus bimodal dataset comprising the treatment effect.
| LSmean | Description | Within/between calculated Stdev | Simulated Stdev (equal probability of sampling from each group) |
|---|---|---|---|
| No exercise (T0) | Average of T0 males and T0 females | 25.70 | 25.10 |
| Exercise (T1) | Average of T1 males and T1 females | 27.77 | 27.05 |
| Males | Average of T0 males and T1 males | 12.57 | 12.95 |
| Females | Average of T0 females and T1 females | 13.71 | 13.93 |
Example of a cross study assessment of mean and stdev. MPH: miles per hour.
| Heading | Trainer | # Runners | Average MPH | Stdev | Mean × ni | LSmean |
|---|---|---|---|---|---|---|
| | 1 | 10 | 6.2 | 1.24 | 62 | - |
| | 2 | 5 | 5.5 | 0.55 | 27.5 | - |
| | 3 | 8 | 6.1 | 0.915 | 48.8 | - |
| | 4 | 12 | 6.8 | 1.7 | 81.6 | - |
| Sum | - | 35 | - | - | 219.9 | 6.28 |
Total weighted stdev across the four trainers.
| Data Row | Trainer Col B | # Runners Col C | Average MPH Col D | Stdev Col E | Mean×ni Col F | Lsmean Col G | Col H | Col I | Col J | StdevWB Col K |
|---|---|---|---|---|---|---|---|---|---|---|
| Row 2 | 1 | 10 | 6.2 | 1.24 | 62 | - | 13.84 | 0.07 | - | - |
| Row 3 | 2 | 5 | 5.5 | 0.55 | 27.5 | | 1.21 | 3.06 | | |
| Row 4 | 3 | 8 | 6.1 | 0.915 | 48.8 | | 5.86 | 0.27 | | |
| Row 5 | 4 | 12 | 6.8 | 1.7 | 81.6 | | 31.79 | 3.21 | | |
| Row 6 | - | 35 | - | - | 219.9 | 6.28 | 52.70 | 6.61 | 1.74 | 1.32 |
Excel cell formulas.
| Data Row | Trainer Col B | # Runners Col C | Ave MPH Col D | Stdev Col E | Mean×ni Col F | Lsmean Col G | Col H | Col I | Col J | StdevWB Col K |
|---|---|---|---|---|---|---|---|---|---|---|
| Row 2 | 1 | 10 | 6.2 | 1.24 | D2 × C2 | - | (E2^2) × (C2-1) | C2 × (D2-$G$6)^2 | - | |
| Row 3 | 2 | 5 | 5.5 | 0.55 | D3 × C3 | | (E3^2) × (C3-1) | C3 × (D3-$G$6)^2 | | |
| Row 4 | 3 | 8 | 6.1 | 0.915 | D4 × C4 | | (E4^2) × (C4-1) | C4 × (D4-$G$6)^2 | | |
| Row 5 | 4 | 12 | 6.8 | 1.7 | D5 × C5 | | (E5^2) × (C5-1) | C5 × (D5-$G$6)^2 | | |
| Row 6 | - | SUM | - | - | SUM | F6/C6 | SUM | SUM | (I6 + H6)/(C6 − 1) | SQRT(J6) |
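The cell formulas above combine within-trainer and between-trainer variability into one total stdev. A minimal Python sketch of the same steps is given below, assuming only the summary statistics reported for the four trainers; the tuple layout and variable names are illustrative.

```python
import math

# (number of runners, average MPH, stdev) for trainers 1 to 4.
trainers = [(10, 6.2, 1.24), (5, 5.5, 0.55), (8, 6.1, 0.915), (12, 6.8, 1.7)]

n_total = sum(n for n, _, _ in trainers)

# Weighted mean across trainers: sum(mean * n) / sum(n), i.e. F6 / C6.
weighted_mean = sum(n * mean for n, mean, _ in trainers) / n_total

# Within-trainer sum of squares: stdev^2 * (n - 1), summed over trainers.
within_ss = sum(sd ** 2 * (n - 1) for n, _, sd in trainers)

# Between-trainer sum of squares: n * (mean - weighted mean)^2, summed.
between_ss = sum(n * (mean - weighted_mean) ** 2 for n, mean, _ in trainers)

# Total stdev: sqrt((between + within) / (N - 1)).
stdev_wb = math.sqrt((between_ss + within_ss) / (n_total - 1))

print(round(weighted_mean, 2), round(within_ss, 2),
      round(between_ss, 2), round(stdev_wb, 2))
```

This approach needs only each group's n, mean, and stdev, so it can be applied when the individual observations from each study are unavailable.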
Comparison of arithmetic vs. HM for T1/2 estimates (including both the values and the codes for estimating these values based upon the columns and rows that might be found in an Excel spreadsheet).
| Data Row | Column C (T1/2) | Column D (λz = 0.693/T1/2) | Column E (1/T1/2) | Excel code |
|---|---|---|---|---|
| Row 2 | 11 | 0.063 | 0.091 | |
| Row 3 | 7 | 0.099 | 0.143 | |
| Row 4 | 9 | 0.077 | 0.111 | |
| Row 5 | 4 | 0.173 | 0.25 | |
| Row 6 | 10 | 0.069 | 0.1 | |
| Row 7 | 12 | 0.058 | 0.083 | |
| Row 8 | 23 | 0.03 | 0.043 | |
| Row 9 | 15 | 0.046 | 0.067 | |
| Row 10 | 7 | 0.099 | 0.143 | |
| Row 11 | 18 | 0.039 | 0.056 | |
| Arithmetic mean | 11.6 | 0.075 | | sum(C2:C11)/10 |
| 0.693/(Arith mean of λz) | | 9.202 | | 0.693/(sum(D2:D11)/10) |
| Harmonic mean of T1/2 | | | 9.202 | 1/(sum(E2:E11)/10) |