Abstract
BACKGROUND: Here we present an application of the advanced registration and atlas-building framework DRAMMS to the automated annotation of mouse mandibles through a series of tests using single- and multi-atlas segmentation paradigms, and compare the outcomes to the current gold standard, manual annotation.
Keywords: Automated landmarking; Geometric morphometrics; Mandible; Multi-atlas segmentation; microCT
Year: 2015 PMID: 26628903 PMCID: PMC4666065 DOI: 10.1186/s12983-015-0127-8
Source DB: PubMed Journal: Front Zool ISSN: 1742-9994 Impact factor: 3.172
Dice similarity scores (upper triangle) and correlation coefficients (lower triangle) between atlases built from different initializing samples
| | Sample 1 | Sample 2 | Sample 3 | Sample 4 |
|---|---|---|---|---|
| Sample 1 | | 0.997 | 0.997 | 0.997 |
| Sample 2 | 0.999 | | 0.998 | 0.997 |
| Sample 3 | 0.999 | 0.999 | | 0.997 |
| Sample 4 | 0.999 | 0.999 | 0.999 | |
Dice similarity is calculated as twice the intersection of two images divided by the sum of the two images, with a score of 1 representing two identical images. Four samples were randomly chosen from the study population to initiate the atlas-building process
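The Dice definition above can be illustrated with a minimal sketch. It assumes the two images have already been binarized into equally sized masks (the caption does not say how the atlases were thresholded), and the function name `dice_score` and the toy masks are hypothetical.

```python
def dice_score(mask_a, mask_b):
    """Dice similarity of two equally sized binary masks (flat sequences of 0/1)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels that are foreground in both masks.
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    # Sum of the foreground sizes of the two masks.
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        return 1.0  # assumed convention: two empty masks count as identical
    return 2.0 * intersection / total


# Toy example: two 6-voxel masks sharing 2 of their 3 foreground voxels.
mask_a = [1, 1, 1, 0, 0, 0]
mask_b = [0, 1, 1, 1, 0, 0]
print(dice_score(mask_a, mask_b))
```

Here the score is 2 × 2 / (3 + 3), or two thirds; identical masks give exactly 1, matching the interpretation in the caption.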
Fig. 1 Visualization of the distances between the atlas surface that was landmarked (p90) and four other constructed surfaces. a 50 % probability surface (p50); b 70 % probability surface (p70); c surface thresholded at a grayscale value of 35; d surface thresholded at a grayscale value of 55. RMS: root mean square error
Fig. 2 Comparison of automated landmarking methods to the gold standard. Each point is the digitization error associated with that landmark in one sample in a given method. Horizontal tick marks are means for each landmark. Gray bars indicate ±1 SD from the mean
Digitization errors associated with each annotation technique.
| LMs | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gold Standard | 0.04 | 0.06 | 0.04 | 0.07 | 0.02 | 0.05 | 0.14 | 0.06 | 0.06 | 0.18 | 0.07 | 0.1 | 0.05 | 0.07 | 0.04 | 0.11 |
| (SD) | 0.02 | 0.02 | 0.02 | 0.05 | 0.01 | 0.05 | 0.08 | 0.08 | 0.03 | 0.15 | 0.03 | 0.08 | 0.04 | 0.08 | 0.02 | 0.10 |
| Single Atlas | 0.07* | 0.10* | 0.06* | 0.08 | 0.05* | 0.06 | 0.06a | 0.08* | 0.08* | 0.22 | 0.07 | 0.15 | 0.07 | 0.09 | 0.06* | 0.12 |
| (SD) | 0.02 | 0.03 | 0.02 | 0.04 | 0.01 | 0.03 | 0.03 | 0.03 | 0.02 | 0.13 | 0.02 | 0.11 | 0.04 | 0.05 | 0.03 | 0.07 |
| Improved Atlas | 0.06* | 0.06 | 0.07* | 0.07 | 0.04* | 0.07* | 0.07a | 0.05 | 0.09* | 0.19 | 0.05a | 0.16 | 0.08* | 0.08 | 0.04 | 0.10 |
| (SD) | 0.02 | 0.02 | 0.02 | 0.03 | 0.01 | 0.03 | 0.03 | 0.02 | 0.03 | 0.1 | 0.02 | 0.1 | 0.02 | 0.04 | 0.02 | 0.06 |
| Multi Atlas | 0.04 | 0.04a | 0.05 | 0.07 | 0.04* | 0.07* | 0.06a | 0.05 | 0.06 | 0.18 | 0.04a | 0.16 | 0.06 | 0.07 | 0.05 | 0.09 |
| (SD) | 0.02 | 0.02 | 0.03 | 0.04 | 0.02 | 0.03 | 0.03 | 0.03 | 0.04 | 0.12 | 0.03 | 0.09 | 0.03 | 0.04 | 0.04 | 0.06 |
Means (upper row) and standard deviations (lower row). Units are millimeters. A paired Mann-Whitney U test was used to test for differences in digitization error between each automated method and the gold standard (GS) at p = 0.01. An asterisk (*) indicates errors greater than those of the GS landmarks, while "a" indicates errors smaller than those of the GS landmarks; in both cases the U statistic fell in a tail of the distribution. Error distributions indistinguishable from the GS landmarks (U statistic not in the tails) are unmarked. N = 36 for all groups.
P values from statistical tests of different GM parameter estimates
| | EDMA FORM | GPA SHAPE (one sample) | GPA SHAPE (two sample) | Centroid size | Centroid size R2 |
|---|---|---|---|---|---|
| GS v Atlas | 0.010 | <0.001 | <0.001 | <0.001 | 0.96 |
| GS v Improved Atlas | 0.083 | 0.076 | 0.091 | <0.001 | 0.97 |
| GS v MAAP | 0.476 | 0.1399 | 0.157 | <0.001 | 0.95 |
For EDMA, we used the Form procedure of WinEDMA (Cole, 2002), which used a permutation test with 100,000 replicates to establish significance. For GPA, we used the testmeanshapes function from the R shapes package. A permutation test was used for the one-sample test (assuming exchangeability between groups), whereas a bootstrap procedure was used for the two-sample test; 50,000 replicates were used in both cases. Because the number of samples was too low for a true multivariate test such as Hotelling's T^2, we reported the Goodall F-test, which uses the sum of squared Procrustes distances to measure SS (Goodall, 1991). This test is also known as Procrustes ANOVA. A paired t-test was used to compare centroid size estimates. All comparisons were run as separate statistical tests. All groups contained the identical set of samples (N = 36 per group). Adjusted R2 results are from linear regressions of centroid size from the automated methods on GS centroid size
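For readers unfamiliar with the centroid size compared above, the following is a minimal sketch of its standard geometric morphometric definition: the square root of the summed squared distances of all landmarks from their centroid. The function name `centroid_size` and the toy coordinates are hypothetical, and this is not the code used in the study.

```python
import math


def centroid_size(landmarks):
    """Centroid size of one landmark configuration (list of coordinate tuples)."""
    n = len(landmarks)
    dims = len(landmarks[0])
    # Centroid: the mean of each coordinate over all landmarks.
    centroid = [sum(p[k] for p in landmarks) / n for k in range(dims)]
    # Sum of squared distances from every landmark to the centroid.
    sum_sq = sum((p[k] - centroid[k]) ** 2 for p in landmarks for k in range(dims))
    return math.sqrt(sum_sq)


# Toy 2D configuration: corners of a 2 x 2 square (not data from the study).
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(centroid_size(square))
```

Each corner lies at a squared distance of 2 from the centroid (1, 1), so the result is the square root of 8. Because centroid size scales linearly with the configuration, a systematic offset between methods can coexist with a high R2, as in the table above.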
Fig. 3 Comparison of MAAP and TINA results with respect to the gold standard. Conventions are the same as in Fig. 2. Because TINA reports values only as integers, our results from Fig. 2 were also rounded to the closest integer
Fig. 4 Comparison of the outlier detection performance in MAAP and TINA. For each landmark, the left column (M) is the result for MAAP and the right column (T) is the result for TINA. Each data point represents the difference between the estimated landmark and the corresponding GS landmark. The horizontal line at the five-voxel mark represents the threshold specified to assess outliers in both methods. For MAAP, if two or more of the templates (out of 10) were outside this threshold range, the software flagged the landmark for manual verification. Green circles indicate landmarks that were correctly flagged as outliers, red circles indicate landmarks that were true outliers but were missed by the detection software, and blue circles indicate landmarks that were incorrectly flagged because they were below the threshold
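The MAAP flagging rule described in this caption can be sketched as follows. The caption does not state what each template estimate is measured against, so this sketch assumes the reference is the per-coordinate median of the template estimates; that choice, the function names, and the toy coordinates are all assumptions rather than details of the MAAP software.

```python
import math
from statistics import median


def consensus(estimates):
    """Per-coordinate median of the template estimates (assumed reference point)."""
    dims = len(estimates[0])
    return tuple(median(p[k] for p in estimates) for k in range(dims))


def needs_manual_check(estimates, threshold=5.0, min_outside=2):
    """Flag a landmark when at least `min_outside` template estimates lie
    farther than `threshold` voxels from the consensus position."""
    center = consensus(estimates)
    outside = sum(1 for p in estimates if math.dist(p, center) > threshold)
    return outside >= min_outside


# Ten hypothetical template estimates (voxel coordinates) for one landmark.
tight = [(10.0, 10.0, 10.0)] * 9 + [(18.0, 10.0, 10.0)]
spread = [(10.0, 10.0, 10.0)] * 8 + [(18.0, 10.0, 10.0), (10.0, 17.0, 10.0)]
print(needs_manual_check(tight))   # only one template is outside the threshold
print(needs_manual_check(spread))  # two templates are outside the threshold
```

With a five-voxel threshold, the first configuration has a single distant template and is not flagged, while the second has two and is flagged for manual verification.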
Fig. 5 Landmarks used in the study. Further information on landmark definitions is provided as an online supporting document