Joshua P Kilborn, David L Jones, Ernst B Peebles, David F Naar.
Abstract
Clustering data continues to be a highly active area of data analysis, and resemblance profiles are being incorporated into ecological methodologies as a hypothesis testing-based approach to clustering multivariate data. However, these new clustering techniques have not been rigorously tested to determine the performance variability based on the algorithm's assumptions or any underlying data structures. Here, we use simulation studies to estimate the statistical error rates for the hypothesis test for multivariate structure based on dissimilarity profiles (DISPROF). We concurrently tested a widely used algorithm that employs the unweighted pair group method with arithmetic mean (UPGMA) to estimate the proficiency of clustering with DISPROF as a decision criterion. We simulated unstructured multivariate data from different probability distributions with increasing numbers of objects and descriptors, and grouped data with increasing overlap, overdispersion for ecological data, and correlation among descriptors within groups. Using simulated data, we measured the resolution and correspondence of clustering solutions achieved by DISPROF with UPGMA against the reference grouping partitions used to simulate the structured test datasets. Our results highlight the dynamic interactions between dataset dimensionality, group overlap, and the properties of the descriptors within a group (i.e., overdispersion or correlation structure) that are relevant to resemblance profiles as a clustering criterion for multivariate data. These methods are particularly useful for multivariate ecological datasets that benefit from distance-based statistical analyses. We propose guidelines for using DISPROF as a clustering decision tool that will help future users avoid potential pitfalls during the application of methods and the interpretation of results.
Keywords: Monte Carlo; PRIMER‐E; SIMPROF; constrained clustering; data simulation; permutation testing
Year: 2017 PMID: 28405271 PMCID: PMC5383504 DOI: 10.1002/ece3.2760
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
Figure 1. Theoretical diagram of the process flow for DISPROF clustering with UPGMA: (1) Data are pretreated and configured. (2) An appropriate resemblance metric is applied to the pretreated dataset. (3) The UPGMA site‐connection linkage is assembled. (4) DISPROF is employed in an iterative process to identify the grouping structure in the data and create breaks in the associated linkage tree. (5) DISPROF settles on a final solution, and a two‐dimensional dendrogram visualization is created
Figure 2. Two examples of Euclidean‐dissimilarity profiles: Resemblance value sort order is increasing along the x‐axis, and the sorted pairwise dissimilarity values are increasing along the y‐axis. (a) A dissimilarity profile for a simulated unstructured dataset drawn from the exponential probability distribution with [N × P] = [50 × 50]. The observed profile is within the 99% confidence envelope based on 999 permutations of the observed data. (b) A dissimilarity profile for a simulated structured dataset drawn from the normal distribution with two groups having equal variance, [N × P] = [50 × 50], and Ov = 0.01. The observed profile has many dissimilarity values that are above and below the expected mean permuted profile, and its associated 99% confidence envelope, thereby signifying the presence of structure in the dataset
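The profile comparison described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' DISPROF implementation: it assumes Euclidean dissimilarity, assumes that the null model permutes each descriptor independently across objects, and uses the summed absolute deviation from the mean permuted profile as the test statistic. The function names (`sorted_profile`, `permute_columns`, `profile_test`) and the small permutation counts are hypothetical choices made for illustration.

```python
import numpy as np

def sorted_profile(X):
    """Sorted pairwise Euclidean dissimilarities between the rows (objects) of X."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(X.shape[0], k=1)  # each pair of objects once
    return np.sort(dist[upper])

def permute_columns(X, rng):
    """Assumed null model: shuffle each descriptor (column) independently."""
    return np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])

def profile_test(X, n_mean=200, n_null=199, seed=0):
    """Return (statistic, p-value) for departure from the mean permuted profile."""
    rng = np.random.default_rng(seed)
    observed = sorted_profile(X)
    # Mean profile expected when there is no multivariate structure.
    mean_profile = np.mean(
        [sorted_profile(permute_columns(X, rng)) for _ in range(n_mean)], axis=0
    )
    pi_obs = np.abs(observed - mean_profile).sum()
    # Second, independent set of permutations gives the null distribution.
    pi_null = np.array([
        np.abs(sorted_profile(permute_columns(X, rng)) - mean_profile).sum()
        for _ in range(n_null)
    ])
    p_value = (1 + np.sum(pi_null >= pi_obs)) / (n_null + 1)
    return pi_obs, p_value

# Two well-separated normal groups, loosely analogous to panel (b).
rng = np.random.default_rng(1)
structured = np.vstack([rng.normal(0, 1, (15, 5)), rng.normal(10, 1, (15, 5))])
pi_stat, p = profile_test(structured)
print(f"statistic = {pi_stat:.2f}, p = {p:.3f}")
```

With groups this far apart, the observed profile departs strongly from the permuted mean, so the test should reject at α = .05. For unstructured data the p‐value should fall below α only about 5% of the time if the test is well calibrated.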
Details of the simulation scenarios used in the study (Sim 1–Sim 4)
| | Probability distribution | G | Parameter 1 | Parameter 2 | N | P |
|---|---|---|---|---|---|---|
| Sim 1. Unstructured data | | | | | | |
| a. | Binomial | 1 | | 0 ≤ | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} |
| b. | Chi‐square | 1 | 1 ≤ | n/a | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} |
| c. | Exponential | 1 | 0 ≤ | n/a | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} |
| d. | Log‐normal | 1 | 0 ≤ | 0 ≤ | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} |
| e. | Negative binomial | 1 | 0 ≤ | 0 ≤ | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} |
| f. | Negative binomial/Poisson | 1 | 1 ≤ | 0 ≤ | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} |
| g. | Normal | 1 | −100 ≤ | 0 ≤ | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} |
| h. | Poisson | 1 | 0 ≤ | n/a | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} |
| Sim 2. Structured data: overlapping groups | | | | | | |
| a. | Normal (OCLUS) | 2 | | | | {2, 3, 5, 10, 25, 50, 150, 225, 300} |
| Sim 3. Structured data: overdispersed descriptors | | | | | | |
| a. | Negative binomial/Poisson | 2 | | | | {2, 3, 5, 10, 25, 50, 150, 225, 300} |
| b. | Negative binomial/Poisson | 2 | | | | {2, 3, 5, 10, 25, 50, 150, 225, 300} |
| Sim 4. Structured data: correlated descriptors | | | | | | |
| a. | Normal | 2 | | | | {2, 3, 5, 10, 25, 50, 150, 225, 300} |
| b. | Normal | 2 | | | | {2, 3, 5, 10, 25, 50, 150, 225, 300} |
For each scenario, S = 1,000 datasets were simulated; mean dissimilarity profiles (DISPROF) were obtained with 1,000 permutations, and the p‐values for the test were calculated with 999 permutations (α = .05). Variables are as follows: G, total number of groups; N, total number of objects; P, total number of descriptors; T, number of successful trials; df, degrees of freedom; μᵢ, mean for all descriptors in group i; λ, Poisson rate parameter; σᵢ², variance for all descriptors in group i; q, probability of success for a trial; θᵢ, overdispersion parameter for all descriptors in group i; Σᵢ, correlation among descriptors in group i; Ov, average overlap per axis between data clouds for G₁ and G₂.
Where θ = 0, μ = σ², and the negative binomial distribution reduces to the Poisson.
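One common parameterization that is consistent with this note writes the negative binomial variance as a quadratic function of the mean. It is offered here as an assumption, because the record does not show the variance formula the authors used:

```latex
\sigma^2 = \mu + \theta\,\mu^2,
\qquad
\theta = 0 \;\Rightarrow\; \sigma^2 = \mu \quad \text{(the Poisson case)}
```

Under this form, any θ > 0 inflates the variance above the mean, which is the overdispersion that Sim 3 varies.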
Probability distributions used in Sim 1–Sim 4: The representative data type and the resemblance measure used to determine the pairwise distance between objects
| Probability distribution | Data type | Resemblance |
|---|---|---|
| Binomial | Binary, presence/absence | Jaccard |
| Chi‐square | Rational, continuous | Euclidean |
| Exponential | Rational, continuous | Euclidean |
| Log‐normal | Rational, continuous | Euclidean |
| Negative binomial | Integer, frequency with many 0's | Bray–Curtis |
| Negative binomial/Poisson | Overdispersed ecological count data | Bray–Curtis |
| Normal | Rational, continuous | Euclidean |
| Poisson | Integer, frequency with many 0's | Bray–Curtis |
No data were transformed prior to subjection to the resemblance measure.
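The pairing of data type and resemblance measure in the table above can be sketched in plain Python. These are the textbook definitions of the three dissimilarities, not code from the study. The function names, the `RESEMBLANCE` lookup, and the choice to return 0.0 for two all‐zero samples are assumptions made for illustration.

```python
import math

def euclidean(x, y):
    """Euclidean distance, used in the table for continuous descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity for non-negative counts."""
    total = sum(x) + sum(y)
    if total == 0:
        return 0.0  # assumed convention for two empty samples
    return sum(abs(a - b) for a, b in zip(x, y)) / total

def jaccard(x, y):
    """Jaccard dissimilarity for presence/absence vectors."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    if either == 0:
        return 0.0  # assumed convention for two empty samples
    return 1.0 - both / either

# Lookup mirroring the table: distribution -> resemblance measure.
RESEMBLANCE = {
    "Binomial": jaccard,
    "Chi-square": euclidean,
    "Exponential": euclidean,
    "Log-normal": euclidean,
    "Negative binomial": bray_curtis,
    "Negative binomial/Poisson": bray_curtis,
    "Normal": euclidean,
    "Poisson": bray_curtis,
}

print(RESEMBLANCE["Poisson"]([1, 2, 3], [3, 2, 1]))
print(RESEMBLANCE["Binomial"]([1, 1, 0, 0], [1, 0, 1, 0]))
```

Bray–Curtis and Jaccard both ignore joint absences, which is one reason they are commonly preferred for count and presence/absence data with many zeros.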
Descriptive statistics for DISPROF type I error based on Sim 1
| | Probability distribution | N | P | Minimum | Mean | Mode | Maximum | σ | SE |
|---|---|---|---|---|---|---|---|---|---|
| Sim 1. Type I error – S = 40,000 | | | | | | | | | |
| a. | Binomial | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} | 0.008 | 0.046 | 0.055 | 0.068 | 0.013 | .002 |
| b. | Chi‐square | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} | 0.032 | 0.050 | 0.050 | 0.067 | 0.007 | .001 |
| c. | Exponential | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} | 0.037 | 0.049 | 0.049 | 0.067 | 0.006 | .001 |
| d. | Log‐normal | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} | 0.033 | 0.050 | 0.047 | 0.070 | 0.008 | .001 |
| e. | Negative binomial | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} | 0.034 | 0.049 | 0.050 | 0.064 | 0.006 | .001 |
| f. | Negative binomial/Poisson | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} | 0.028 | 0.048 | 0.045 | 0.063 | 0.008 | .001 |
| g. | Normal | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} | 0.035 | 0.051 | 0.050 | 0.066 | 0.008 | .001 |
| h. | Poisson | {10, 25, 50, 150, 300} | {2, 3, 10, 25, 50, 150, 225, 300} | 0.036 | 0.049 | 0.043 | 0.062 | 0.007 | .001 |
Unstructured data: Type I error rate estimates and statistics were obtained from S = 40,000 datasets across all configurations of [N × P] for each probability distribution simulated. Error rate estimates for each configuration were based on S = 1,000 datasets, and all p‐values were obtained via 999 permutations with significance assessed at α = .05. N, total number of objects; P, total number of descriptors; σ, standard deviation of the mean; SE, standard error of the mean.
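The reported standard errors can be cross‐checked against the σ values with a short calculation. This sketch assumes that SE = σ/√n with n = 40, the number of [N × P] configurations (5 values of N times 8 values of P). The record does not state how SE was computed, so the formula and the helper name `standard_error` are assumptions.

```python
import math

N_CONFIGS = 40  # 5 values of N x 8 values of P in Sim 1

def standard_error(sigma, n=N_CONFIGS):
    """Standard error of a mean from a standard deviation and a sample size."""
    return sigma / math.sqrt(n)

# sigma values copied from the table above
for name, sigma in [("Binomial", 0.013), ("Chi-square", 0.007), ("Normal", 0.008)]:
    print(f"{name}: SE ~ {standard_error(sigma):.3f}")
```

Rounded to three decimals, these come out at .002 for the binomial and .001 for the other two, which agrees with the tabulated values under the stated assumption.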
Figure 3. Ratio of P:N versus the proportion of type I error: The type I error rates (α = .05) for the DISPROF hypothesis test for multivariate structure of S = 1,000 simulated unstructured datasets from eight different probability distributions simulated in scenario Sim 1. Data points represent each of the 40 different [N × P] configurations; the dotted vertical line indicates the mean type I error rate for all 40 configurations. All data were randomly parameterized and drawn from the (a) binomial, (b) chi‐square, (c) exponential, (d) log‐normal, (e) negative binomial, (f) negative binomial/Poisson, (g) normal, and (h) Poisson probability distributions. The σ and standard error for all probability distributions tested were ≤0.01 and .002, respectively
Descriptive statistics for power, mean number of groups identified, and mean clustering correspondence (ARI) for DISPROF based on Sim 2
| | Ov | Minimum | Mean | Mode | Maximum | σ | SE |
|---|---|---|---|---|---|---|---|
| Sim 2. Power | | | | | | | |
| | | 0.342 | 0.626 | 0.476 | 1.000 | 0.221 | .004 |
| | | 0.491 | 0.713 | 0.629 | 1.000 | 0.164 | .003 |
| | | 0.770 | 0.877 | 0.760 | 1.000 | 0.068 | .001 |
| | | 0.990 | 0.997 | 0.999 | 1.000 | 0.002 | <.001 |
| | | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | .000 |
| Sim 2. Mean number of groups identified | | | | | | | |
| | | 1.46 | 1.81 | 1.66 | 2.14 | 0.23 | <.01 |
| | | 1.70 | 1.95 | 2.16 | 2.19 | 0.16 | <.01 |
| | | 2.07 | 2.16 | 2.13 | 2.22 | 0.03 | <.01 |
| | | 2.08 | 2.15 | 2.15 | 2.21 | 0.02 | <.01 |
| | | 2.05 | 2.06 | 2.06 | 2.09 | 0.01 | <.01 |
| | | 2.03 | 2.06 | 2.06 | 2.09 | 0.01 | <.01 |
| | | 2.03 | 2.06 | 2.06 | 2.09 | 0.01 | <.01 |
| | | 2.04 | 2.07 | 2.06 | 2.09 | 0.01 | <.01 |
| | | 2.04 | 2.06 | 2.07 | 2.09 | 0.01 | <.01 |
| Sim 2. Mean correspondence (ARI) | | | | | | | |
| | | 0.116 | 0.347 | 0.116 | 0.927 | 0.232 | .005 |
| | | 0.198 | 0.407 | 0.198 | 0.897 | 0.190 | .004 |
| | | 0.447 | 0.591 | 0.447 | 0.883 | 0.111 | .002 |
| | | 0.846 | 0.875 | 0.846 | 0.934 | 0.019 | <.001 |
| | | 0.984 | 0.988 | 0.984 | 0.991 | 0.001 | <.001 |
| | | 0.995 | 0.997 | 0.995 | 0.998 | 0.001 | <.001 |
| | | 0.995 | 0.997 | 0.995 | 0.998 | 0.001 | <.001 |
| | | 0.996 | 0.997 | 0.996 | 0.998 | 0.001 | <.001 |
| | | 0.995 | 0.997 | 0.995 | 0.998 | 0.001 | <.001 |
Structured data (overlapping groups): Power estimates for each [N × P × Ov] configuration were based on S = 1,000 datasets with mean values based on 50 [P × Ov] configurations at each P; all p‐values were obtained via 999 permutations with significance assessed at α = .05. Estimates and statistics for the mean number of groups and the average clustering solution correspondence (ARI) were obtained from S = 50,000 datasets across all Ov for each configuration of [N × P]. N, total number of objects (nᵢ = number of objects in group i); P, total number of descriptors; Ov, average overlap per axis between data clouds for G₁ and G₂; σᵢ², variance of group i; σ, standard deviation of the mean; SE, standard error of the mean.
Figure 4. Power of the DISPROF test versus the proportion of group overlap: Statistical power of DISPROF versus Ov for all P tested under Sim 2. Each line plot represents the 50 power values for S = 1,000 datasets at each Ov level for a given P. The horizontal dashed line at power = 0.8 is the lower limit of acceptable power values
Figure 5. The relationship of the mean number of groups identified and the mean correspondence (ARI) with Ov for DISPROF clustering: (a) The mean number of groups identified versus the average data cloud overlap (Ov) for all P tested under Sim 2. Each line plot represents the 50 values for S = 1,000 datasets at each Ov level for a given P. The optimal grouping solution (G = 2) is represented by the horizontal dashed line. (b) The mean correspondence of the grouping solution (ARI) versus the average data cloud overlap (Ov) for all P tested under Sim 2. Each line plot is configured as in panel (a); the horizontal black dashed line represents the lower bound for excellent correspondence (ARI = 0.9), and the red dashed line represents the lower bound for good correspondence (ARI = 0.8). Boxplots to the right represent the distribution of standard errors for each estimate of the mean number of groups and the mean correspondence for all Ov within a noted dimensionality for P. The horizontal red line in each boxplot represents the median standard error value in the distribution, with the lower and upper edges of the box being the 25th and 75th percentiles. Whiskers extend to encompass the most extreme data points, and outliers are plotted individually as crosses
Descriptive statistics for the mean number of groups identified for DISPROF based on Sim 3
| | | Minimum | Mean | Mode | Maximum | σ | SE | | | Minimum | Mean | Mode | Maximum | σ | SE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sim 3a | | | | | | | | Sim 3b | | | | | | | |
| | | 1.00 | 1.06 | 1.00 | 4.00 | 0.28 | .01 | | | 2.00 | 2.07 | 2.00 | 5.00 | 0.32 | .01 |
| | | 1.00 | 1.10 | 1.00 | 5.00 | 0.35 | .01 | | | 2.00 | 2.07 | 2.00 | 5.00 | 0.30 | .01 |
| | | 1.00 | 1.32 | 1.00 | 5.00 | 0.62 | .02 | | | 1.00 | 2.16 | 2.00 | 5.00 | 0.62 | .02 |
| | | 1.00 | 1.75 | 1.00 | 6.00 | 0.86 | .03 | | | 1.00 | 2.51 | 2.00 | 6.00 | 0.98 | .03 |
| | | | | | | | | | | | | | | | |
| | | 1.00 | 1.07 | 1.00 | 4.00 | 0.30 | .01 | | | 2.00 | 2.06 | 2.00 | 5.00 | 0.29 | .01 |
| | | 1.00 | 1.13 | 1.00 | 5.00 | 0.42 | .01 | | | 2.00 | 2.05 | 2.00 | 5.00 | 0.27 | .01 |
| | | 1.00 | 1.84 | 1.00 | 6.00 | 0.99 | .03 | | | 2.00 | 2.36 | 2.00 | 6.00 | 0.62 | .02 |
| | | 1.00 | 3.18 | 3.00 | 8.00 | 1.44 | .05 | | | 1.00 | 3.45 | 3.00 | 7.00 | 1.03 | .03 |
| | | | | | | | | | | | | | | | |
| | | 1.00 | 1.07 | 1.00 | 6.00 | 0.34 | .01 | | | 2.00 | 2.05 | 2.00 | 4.00 | 0.24 | .01 |
| | | 1.00 | 1.25 | 1.00 | 6.00 | 0.58 | .02 | | | 2.00 | 2.06 | 2.00 | 5.00 | 0.27 | .01 |
| | | 1.00 | 3.93 | 3.00 | 10.00 | 1.73 | .05 | | | 2.00 | 2.34 | 2.00 | 5.00 | 0.55 | .02 |
| | | 3.00 | 7.27 | 7.00 | 13.00 | 1.71 | .05 | | | 2.00 | 4.23 | 4.00 | 8.00 | 1.23 | .04 |
| | | | | | | | | | | | | | | | |
| | | 1.00 | 1.06 | 1.00 | 4.00 | 0.31 | .01 | | | 2.00 | 2.07 | 2.00 | 6.00 | 0.35 | .01 |
| | | 1.00 | 1.94 | 1.00 | 8.00 | 1.14 | .04 | | | 2.00 | 2.05 | 2.00 | 4.00 | 0.24 | .01 |
| | | 4.00 | 9.71 | 10.00 | 16.00 | 1.96 | .06 | | | 2.00 | 2.24 | 2.00 | 6.00 | 0.50 | .02 |
| | | 8.00 | 12.91 | 12.00 | 18.00 | 1.65 | .05 | | | 2.00 | 3.94 | 4.00 | 10.00 | 1.22 | .04 |
| | | | | | | | | | | | | | | | |
| | | 1.00 | 1.11 | 1.00 | 7.00 | 0.57 | .02 | | | 2.00 | 2.06 | 2.00 | 5.00 | 0.28 | .01 |
| | | 1.00 | 6.01 | 6.00 | 14.00 | 2.30 | .07 | | | 2.00 | 2.06 | 2.00 | 6.00 | 0.32 | .01 |
| | | 12.00 | 17.93 | 18.00 | 23.00 | 1.66 | .05 | | | 2.00 | 2.05 | 2.00 | 6.00 | 0.28 | .01 |
| | | 14.00 | 19.70 | 20.00 | 24.00 | 1.58 | .05 | | | 2.00 | 2.64 | 2.00 | 7.00 | 0.81 | .03 |
| | | | | | | | | | | | | | | | |
| | | 1.00 | 1.10 | 1.00 | 8.00 | 0.51 | .02 | | | 2.00 | 2.09 | 2.00 | 6.00 | 0.41 | .01 |
| | | 5.00 | 12.73 | 13.00 | 20.00 | 2.37 | .08 | | | 2.00 | 2.06 | 2.00 | 7.00 | 0.35 | .01 |
| | | 18.00 | 23.12 | 23.00 | 26.00 | 1.40 | .04 | | | 2.00 | 2.07 | 2.00 | 6.00 | 0.32 | .01 |
| | | 19.00 | 23.55 | 24.00 | 26.00 | 1.30 | .04 | | | 2.00 | 2.17 | 2.00 | 6.00 | 0.45 | .01 |
| | | | | | | | | | | | | | | | |
| | | 1.00 | 1.10 | 1.00 | 10.00 | 0.61 | .02 | | | 2.00 | 2.07 | 2.00 | 9.00 | 0.41 | .01 |
| | | 18.00 | 22.75 | 23.00 | 27.00 | 1.41 | .04 | | | 2.00 | 2.05 | 2.00 | 6.00 | 0.27 | .01 |
| | | 24.00 | 25.91 | 26.00 | 27.00 | 0.31 | .01 | | | 2.00 | 2.05 | 2.00 | 7.00 | 0.28 | .01 |
| | | 24.00 | 25.92 | 26.00 | 27.00 | 0.29 | .01 | | | 2.00 | 2.05 | 2.00 | 7.00 | 0.31 | .01 |
| | | | | | | | | | | | | | | | |
| | | 1.00 | 1.11 | 1.00 | 9.00 | 0.67 | .02 | | | 2.00 | 2.07 | 2.00 | 6.00 | 0.36 | .01 |
| | | 21.00 | 24.83 | 25.00 | 27.00 | 0.95 | .03 | | | 2.00 | 2.07 | 2.00 | 5.00 | 0.32 | .01 |
| | | 25.00 | 25.99 | 26.00 | 27.00 | 0.12 | <.01 | | | 2.00 | 2.09 | 2.00 | 7.00 | 0.40 | .01 |
| | | 25.00 | 25.99 | 26.00 | 28.00 | 0.12 | .00 | | | 2.00 | 2.07 | 2.00 | 6.00 | 0.35 | .01 |
| | | | | | | | | | | | | | | | |
| | | 1.00 | 1.10 | 1.00 | 10.00 | 0.60 | .02 | | | 2.00 | 2.07 | 2.00 | 6.00 | 0.34 | .01 |
| | | 23.00 | 25.65 | 26.00 | 27.00 | 0.58 | .02 | | | 2.00 | 2.06 | 2.00 | 6.00 | 0.32 | .01 |
| | | 25.00 | 26.00 | 26.00 | 27.00 | 0.05 | <.01 | | | 2.00 | 2.08 | 2.00 | 6.00 | 0.37 | .01 |
| | | 25.00 | 26.00 | 26.00 | 27.00 | 0.07 | <.01 | | | 2.00 | 2.08 | 2.00 | 8.00 | 0.41 | .01 |
Structured data (overdispersed descriptors): Estimates of the mean number of groups identified for each [N × P × (θ₁, θ₂)] configuration were based on S = 1,000 datasets. N, total number of objects (nᵢ = number of objects in group i); P, total number of descriptors; θᵢ, overdispersion for descriptors in group i; μᵢ, mean value of descriptors in group i; σ, standard deviation of the mean; SE, standard error of the mean.
Descriptive statistics for mean correspondence (ARI) for DISPROF based on Sim 3
| | | Minimum | Mean | Mode | Maximum | σ | SE | | | Minimum | Mean | Mode | Maximum | σ | SE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sim 3a | | | | | | | | Sim 3b | | | | | | | |
| | | −0.013 | 0.000 | 0.000 | 0.060 | 0.003 | <.001 | | | 0.676 | 0.988 | 1.000 | 1.000 | 0.042 | .001 |
| | | −0.013 | 0.000 | 0.000 | 0.077 | 0.004 | <.001 | | | 0.399 | 0.909 | 1.000 | 1.000 | 0.100 | .003 |
| | | −0.004 | 0.004 | 0.000 | 0.310 | 0.019 | .001 | | | 0.000 | 0.563 | 0.000 | 1.000 | 0.269 | .009 |
| | | −0.006 | 0.014 | 0.000 | 0.326 | 0.035 | .001 | | | 0.000 | 0.316 | 0.000 | 1.000 | 0.232 | .007 |
| | | | | | | | | | | | | | | | |
| | | −0.021 | 0.000 | 0.000 | 0.021 | 0.002 | <.001 | | | 0.721 | 0.994 | 1.000 | 1.000 | 0.030 | .001 |
| | | −0.007 | 0.000 | 0.000 | 0.038 | 0.002 | <.001 | | | 0.615 | 0.970 | 1.000 | 1.000 | 0.059 | .002 |
| | | −0.007 | 0.012 | 0.000 | 0.312 | 0.032 | .001 | | | 0.000 | 0.780 | 0.920 | 1.000 | 0.161 | .005 |
| | | −0.003 | 0.063 | 0.000 | 0.555 | 0.086 | .003 | | | 0.000 | 0.539 | 0.770 | 1.000 | 0.170 | .005 |
| | | | | | | | | | | | | | | | |
| | | −0.019 | 0.000 | 0.000 | 0.028 | 0.002 | <.001 | | | 0.701 | 0.996 | 1.000 | 1.000 | 0.025 | .001 |
| | | −0.011 | 0.001 | 0.000 | 0.109 | 0.006 | <.001 | | | 0.727 | 0.992 | 1.000 | 1.000 | 0.029 | .001 |
| | | 0.000 | 0.065 | 0.000 | 0.422 | 0.075 | .002 | | | 0.527 | 0.915 | 1.000 | 1.000 | 0.088 | .003 |
| | | 0.002 | 0.264 | 0.151 | 0.573 | 0.112 | .004 | | | 0.256 | 0.705 | 0.882 | 1.000 | 0.121 | .004 |
| | | | | | | | | | | | | | | | |
| | | −0.017 | 0.000 | 0.000 | 0.017 | 0.001 | <.001 | | | 0.701 | 0.995 | 1.000 | 1.000 | 0.030 | .001 |
| | | −0.003 | 0.005 | 0.000 | 0.125 | 0.014 | <.001 | | | 0.747 | 0.997 | 1.000 | 1.000 | 0.018 | .001 |
| | | 0.026 | 0.260 | 0.219 | 0.533 | 0.097 | .003 | | | 0.708 | 0.984 | 1.000 | 1.000 | 0.035 | .001 |
| | | 0.247 | 0.451 | 0.452 | 0.558 | 0.054 | .002 | | | 0.589 | 0.860 | 0.961 | 1.000 | 0.097 | .003 |
| | | | | | | | | | | | | | | | |
| | | −0.019 | 0.000 | 0.000 | 0.106 | 0.004 | <.001 | | | 0.676 | 0.997 | 1.000 | 1.000 | 0.020 | .001 |
| | | −0.003 | 0.059 | 0.012 | 0.310 | 0.056 | .002 | | | 0.656 | 0.996 | 1.000 | 1.000 | 0.022 | .001 |
| | | 0.328 | 0.460 | 0.476 | 0.535 | 0.034 | .001 | | | 0.626 | 0.997 | 1.000 | 1.000 | 0.021 | .001 |
| | | 0.467 | 0.515 | 0.515 | 0.533 | 0.011 | <.001 | | | 0.673 | 0.966 | 1.000 | 1.000 | 0.049 | .002 |
| | | | | | | | | | | | | | | | |
| | | −0.017 | 0.000 | 0.000 | 0.029 | 0.002 | <.001 | | | 0.676 | 0.995 | 1.000 | 1.000 | 0.027 | .001 |
| | | 0.028 | 0.236 | 0.266 | 0.481 | 0.080 | .003 | | | 0.626 | 0.996 | 1.000 | 1.000 | 0.025 | .001 |
| | | 0.430 | 0.506 | 0.510 | 0.523 | 0.012 | <.001 | | | 0.665 | 0.995 | 1.000 | 1.000 | 0.028 | .001 |
| | | 0.430 | 0.509 | 0.508 | 0.520 | 0.004 | <.001 | | | 0.727 | 0.992 | 1.000 | 1.000 | 0.024 | .001 |
| | | | | | | | | | | | | | | | |
| | | −0.018 | 0.000 | 0.000 | 0.035 | 0.002 | <.001 | | | 0.631 | 0.995 | 1.000 | 1.000 | 0.027 | .001 |
| | | 0.352 | 0.454 | 0.468 | 0.517 | 0.028 | .001 | | | 0.792 | 0.997 | 1.000 | 1.000 | 0.015 | <.001 |
| | | 0.395 | 0.505 | 0.505 | 0.508 | 0.004 | <.001 | | | 0.633 | 0.997 | 1.000 | 1.000 | 0.019 | .001 |
| | | 0.428 | 0.505 | 0.505 | 0.508 | 0.003 | <.001 | | | 0.689 | 0.997 | 1.000 | 1.000 | 0.021 | .001 |
| | | | | | | | | | | | | | | | |
| | | −0.010 | 0.000 | 0.000 | 0.013 | 0.001 | <.001 | | | 0.699 | 0.996 | 1.000 | 1.000 | 0.026 | .001 |
| | | 0.424 | 0.484 | 0.464 | 0.513 | 0.021 | .001 | | | 0.714 | 0.996 | 1.000 | 1.000 | 0.022 | .001 |
| | | 0.465 | 0.505 | 0.505 | 0.507 | 0.002 | <.001 | | | 0.646 | 0.995 | 1.000 | 1.000 | 0.029 | .001 |
| | | 0.391 | 0.505 | 0.505 | 0.507 | 0.004 | <.001 | | | 0.663 | 0.996 | 1.000 | 1.000 | 0.024 | .001 |
| | | | | | | | | | | | | | | | |
| | | −0.010 | 0.000 | 0.000 | 0.019 | 0.001 | <.001 | | | 0.607 | 0.996 | 1.000 | 1.000 | 0.023 | .001 |
| | | 0.464 | 0.502 | 0.505 | 0.510 | 0.011 | <.001 | | | 0.739 | 0.997 | 1.000 | 1.000 | 0.020 | .001 |
| | | 0.465 | 0.505 | 0.505 | 0.507 | 0.001 | <.001 | | | 0.611 | 0.996 | 1.000 | 1.000 | 0.026 | .001 |
| | | 0.465 | 0.505 | 0.505 | 0.507 | 0.002 | <.001 | | | 0.610 | 0.995 | 1.000 | 1.000 | 0.027 | .001 |
Structured data (overdispersed descriptors): Estimates of mean correspondence (ARI) for each [N × P × (θ₁, θ₂)] configuration were based on S = 1,000 datasets, where correspondence is measured between the clustering solution achieved via DISPROF with UPGMA and the simulated grouping partition. N, total number of objects (nᵢ = number of objects in group i); P, total number of descriptors; θᵢ, overdispersion for descriptors in group i; μᵢ, mean value of descriptors in group i; σ, standard deviation of the mean; SE, standard error of the mean. ARI values estimate the likelihood of agreement between one randomly selected pair of objects represented in both partitions, corrected for chance, and negative values represent probabilities that are less than would be expected by random chance alone.
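The chance correction and the possibility of negative values described in this note can be made concrete with a small pair‐counting sketch of the adjusted Rand index. It follows the standard Hubert–Arabie form and is not taken from the study's code. The function name `adjusted_rand_index` and the choice to return 1.0 when the index is undefined are assumptions.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same objects")
    n = len(labels_a)
    if n < 2:
        return 1.0  # assumed convention: no pairs to disagree on
    # Pairs of objects placed together in both partitions.
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # assumed convention when both partitions are trivial
    return (pairs_joint - expected) / (maximum - expected)

truth = [0, 0, 0, 0, 1, 1, 1, 1]
print(adjusted_rand_index(truth, truth))                     # perfect agreement
print(adjusted_rand_index(truth, [0, 0, 1, 1, 2, 2, 3, 3]))  # over-split solution
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))       # worse than chance
```

The over‐split example shows how a solution with too many groups lowers correspondence even when no object is placed with members of the wrong reference group.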
Descriptive statistics for the mean number of groups identified and mean correspondence (ARI) for DISPROF based on Sim 4
| | | Minimum | Mean | Mode | Maximum | σ | SE | | | Minimum | Mean | Mode | Maximum | σ | SE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sim 4. Mean number of groups identified | | | | | | | | Sim 4. Mean correspondence (ARI) | | | | | | | |
| | | 2.000 | 2.058 | 2.000 | 5.000 | 0.294 | .009 | | | 0.691 | 0.994 | 1.000 | 1.000 | 0.033 | .001 |
| | | 2.000 | 3.620 | 3.000 | 7.000 | 0.974 | .031 | | | 0.345 | 0.769 | 1.000 | 1.000 | 0.153 | .005 |
| | | 4.000 | 6.515 | 6.000 | 10.000 | 0.986 | .031 | | | 0.254 | 0.411 | 0.353 | 0.752 | 0.071 | .002 |
| | | 2.000 | 2.844 | 3.000 | 6.000 | 0.740 | .023 | | | 0.413 | 0.881 | 1.000 | 1.000 | 0.111 | .004 |
| | | 3.000 | 4.343 | 4.000 | 7.000 | 0.743 | .023 | | | 0.398 | 0.699 | 0.684 | 0.892 | 0.053 | .002 |
| | | | | | | | | | | | | | | | |
| | | 2.000 | 2.056 | 2.000 | 4.000 | 0.247 | .008 | | | 0.731 | 0.995 | 1.000 | 1.000 | 0.027 | .001 |
| | | 2.000 | 4.553 | 4.000 | 10.000 | 1.017 | .032 | | | 0.326 | 0.637 | 0.505 | 1.000 | 0.136 | .004 |
| | | 5.000 | 7.601 | 8.000 | 11.000 | 1.074 | .034 | | | 0.193 | 0.341 | 0.306 | 0.562 | 0.058 | .002 |
| | | 2.000 | 3.349 | 3.000 | 6.000 | 0.744 | .024 | | | 0.505 | 0.812 | 1.000 | 1.000 | 0.096 | .003 |
| | | 3.000 | 4.899 | 5.000 | 8.000 | 0.798 | .025 | | | 0.381 | 0.668 | 0.650 | 0.830 | 0.044 | .001 |
| | | | | | | | | | | | | | | | |
| | | 2.000 | 2.064 | 2.000 | 5.000 | 0.307 | .010 | | | 0.691 | 0.996 | 1.000 | 1.000 | 0.025 | .001 |
| | | 3.000 | 5.335 | 5.000 | 10.000 | 0.988 | .031 | | | 0.311 | 0.537 | 0.588 | 0.923 | 0.101 | .003 |
| | | 6.000 | 8.943 | 9.000 | 13.000 | 1.077 | .034 | | | 0.168 | 0.284 | 0.257 | 0.473 | 0.047 | .001 |
| | | 2.000 | 3.731 | 4.000 | 7.000 | 0.746 | .024 | | | 0.492 | 0.766 | 0.777 | 1.000 | 0.074 | .002 |
| | | 4.000 | 5.535 | 5.000 | 9.000 | 0.823 | .026 | | | 0.365 | 0.640 | 0.630 | 0.783 | 0.036 | .001 |
| | | | | | | | | | | | | | | | |
| | | 2.000 | 2.066 | 2.000 | 5.000 | 0.316 | .010 | | | 0.709 | 0.996 | 1.000 | 1.000 | 0.024 | .001 |
| | | 4.000 | 6.248 | 6.000 | 10.000 | 1.034 | .033 | | | 0.259 | 0.446 | 0.482 | 0.823 | 0.076 | .002 |
| | | 8.000 | 10.540 | 10.000 | 15.000 | 1.221 | .039 | | | 0.136 | 0.234 | 0.222 | 0.388 | 0.036 | .001 |
| | | 3.000 | 4.196 | 4.000 | 8.000 | 0.795 | .025 | | | 0.462 | 0.719 | 0.731 | 0.925 | 0.053 | .002 |
| | | 4.000 | 6.407 | 6.000 | 11.000 | 0.908 | .029 | | | 0.309 | 0.615 | 0.616 | 0.727 | 0.030 | .001 |
| | | | | | | | | | | | | | | | |
| | | 2.000 | 2.056 | 2.000 | 6.000 | 0.266 | .008 | | | 0.729 | 0.997 | 1.000 | 1.000 | 0.014 | .000 |
| | | 5.000 | 7.640 | 7.000 | 12.000 | 1.133 | .036 | | | 0.205 | 0.355 | 0.326 | 0.588 | 0.059 | .002 |
| | | 8.000 | 12.723 | 13.000 | 17.000 | 1.282 | .041 | | | 0.120 | 0.185 | 0.161 | 0.309 | 0.029 | .001 |
| | | 3.000 | 4.911 | 5.000 | 9.000 | 0.788 | .025 | | | 0.402 | 0.676 | 0.666 | 0.925 | 0.042 | .001 |
| | | 5.000 | 7.505 | 7.000 | 10.000 | 0.905 | .029 | | | 0.455 | 0.593 | 0.583 | 0.679 | 0.021 | .001 |
| | | | | | | | | | | | | | | | |
| | | 2.000 | 2.068 | 2.000 | 5.000 | 0.302 | .010 | | | 0.775 | 0.996 | 1.000 | 1.000 | 0.018 | .001 |
| | | 6.000 | 8.792 | 9.000 | 12.000 | 1.197 | .038 | | | 0.185 | 0.303 | 0.287 | 0.518 | 0.052 | .002 |
| | | 10.000 | 14.368 | 14.000 | 21.000 | 1.468 | .046 | | | 0.098 | 0.156 | 0.146 | 0.264 | 0.024 | .001 |
| | | 3.000 | 5.499 | 5.000 | 9.000 | 0.878 | .028 | | | 0.517 | 0.650 | 0.626 | 0.823 | 0.036 | .001 |
| | | 5.000 | 8.405 | 8.000 | 14.000 | 1.078 | .034 | | | 0.393 | 0.578 | 0.573 | 0.646 | 0.021 | .001 |
| | | | | | | | | | | | | | | | |
| | | 2.000 | 2.054 | 2.000 | 4.000 | 0.247 | .008 | | | 0.889 | 0.998 | 1.000 | 1.000 | 0.011 | .000 |
| | | 7.000 | 10.652 | 10.000 | 16.000 | 1.316 | .042 | | | 0.137 | 0.235 | 0.218 | 0.371 | 0.038 | .001 |
| | | 12.000 | 17.067 | 17.000 | 24.000 | 1.578 | .050 | | | 0.073 | 0.122 | 0.119 | 0.237 | 0.019 | .001 |
| | | 4.000 | 6.476 | 6.000 | 10.000 | 0.973 | .031 | | | 0.492 | 0.616 | 0.616 | 0.731 | 0.027 | .001 |
| | | 6.000 | 9.766 | 10.000 | 14.000 | 1.166 | .037 | | | 0.453 | 0.562 | 0.555 | 0.626 | 0.015 | .000 |
| | | | | | | | | | | | | | | | |
| | | 2.000 | 2.052 | 2.000 | 6.000 | 0.282 | .009 | | | 0.716 | 0.997 | 1.000 | 1.000 | 0.016 | .000 |
| | | 8.000 | 11.348 | 11.000 | 16.000 | 1.357 | .043 | | | 0.131 | 0.217 | 0.208 | 0.328 | 0.035 | .001 |
| | | 14.000 | 18.052 | 18.000 | 23.000 | 1.550 | .049 | | | 0.076 | 0.112 | 0.110 | 0.186 | 0.017 | .001 |
| | | 4.000 | 6.769 | 7.000 | 10.000 | 0.963 | .030 | | | 0.443 | 0.609 | 0.603 | 0.712 | 0.027 | .001 |
| | | 7.000 | 10.169 | 10.000 | 14.000 | 1.139 | .036 | | | 0.405 | 0.558 | 0.552 | 0.608 | 0.014 | .000 |
| | | | | | | | | | | | | | | | |
| | | 2.000 | 2.053 | 2.000 | 6.000 | 0.317 | .010 | | | 0.646 | 0.997 | 1.000 | 1.000 | 0.018 | .001 |
| | | 8.000 | 11.973 | 12.000 | 17.000 | 1.342 | .042 | | | 0.124 | 0.203 | 0.188 | 0.321 | 0.031 | .001 |
| | | 14.000 | 18.726 | 19.000 | 24.000 | 1.659 | .052 | | | 0.070 | 0.107 | 0.104 | 0.218 | 0.017 | .001 |
| | | 4.000 | 7.107 | 7.000 | 10.000 | 1.001 | .032 | | | 0.412 | 0.602 | 0.597 | 0.717 | 0.026 | .001 |
| | | 7.000 | 10.588 | 11.000 | 14.000 | 1.175 | .037 | | | 0.378 | 0.555 | 0.552 | 0.616 | 0.015 | .000 |
Structured data (correlated descriptors): Estimates of the mean number of groups identified and mean correspondence (ARI) for each [N × P × (Σ₁, Σ₂)] configuration were based on S = 1,000 datasets, where correspondence is measured between the clustering solution achieved via DISPROF with UPGMA and the simulated partition. N, total number of objects (nᵢ = number of objects in group i); P, total number of descriptors; Σᵢ, correlation among descriptors in group i; μᵢ, mean value of descriptors in group i; σ, standard deviation of the mean; SE, standard error of the mean. ARI values estimate the likelihood of agreement between one randomly selected pair of objects represented in both partitions, corrected for chance.