Ilya Plyusnin, Alistair R Evans, Aleksis Karme, Aristides Gionis, Jukka Jernvall.
Abstract
The ability to analyze and classify three-dimensional (3D) biological morphology has lagged behind the analysis of other biological data types such as gene sequences. Here, we introduce the techniques of data mining to the study of 3D biological shapes to bring the analysis of phenomes closer to the efficiency of studying genomes. We compiled five training sets of highly variable morphologies of mammalian teeth from the MorphoBrowser database. Samples were labeled either by dietary class or by conventional dental types (e.g. carnassial, selenodont). We automatically extracted a multitude of topological attributes using Geographic Information Systems (GIS)-like procedures; these were then used in several combinations of feature selection schemes and probabilistic classification models to build and optimize classifiers for predicting the labels of the training sets. In terms of classification accuracy, computational time and the size of the feature sets used, non-repeated best-first search combined with a 1-nearest-neighbor classifier was the best approach. However, several other classification models combined with the same search scheme also proved practical. The current study represents a first step in the automatic analysis of 3D phenotypes, which will become increasingly valuable as 3D morphology and phenomics databases grow.
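The best-performing combination named above, a best-first search over feature subsets wrapped around a 1-nearest-neighbor (1-NN) classifier, can be sketched compactly. This is a minimal illustration under our own assumptions, not the authors' implementation. Features are plain numeric vectors, and each subset is scored by leave-one-out 1-NN accuracy with Euclidean distance. The search only adds features (forward direction) and stops after five consecutive expansions that fail to improve the best score. The names `loo_1nn_accuracy`, `best_first_forward` and `max_stale` are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Subset = Tuple[int, ...]

def loo_1nn_accuracy(X: Sequence[Sequence[float]], y: Sequence[str], features: Subset) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier using only `features`."""
    n = len(X)
    if not features or n < 2:
        return 0.0
    correct = 0
    for i in range(n):
        best_j, best_d = -1, float("inf")
        for j in range(n):
            if j == i:
                continue  # leave sample i out
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)  # squared Euclidean distance
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / n

def best_first_forward(X: Sequence[Sequence[float]], y: Sequence[str],
                       max_stale: int = 5) -> Tuple[Subset, float]:
    """Forward best-first search over feature subsets, scored by LOO 1-NN accuracy."""
    n_features = len(X[0])
    cache: Dict[Subset, float] = {}

    def score(subset: Subset) -> float:
        if subset not in cache:  # subsets reachable by several paths are scored once
            cache[subset] = loo_1nn_accuracy(X, y, subset)
        return cache[subset]

    open_heap: List[Tuple[float, Subset]] = [(0.0, ())]  # negated scores: heapq is a min-heap
    closed = set()
    best_subset: Subset = ()
    best_score = 0.0
    stale = 0
    while open_heap and stale < max_stale:
        _, subset = heapq.heappop(open_heap)  # expand the most promising subset first
        if subset in closed:
            continue
        closed.add(subset)
        improved = False
        for f in range(n_features):
            if f in subset:
                continue
            child = tuple(sorted(subset + (f,)))
            if child in closed:
                continue
            s = score(child)
            heapq.heappush(open_heap, (-s, child))
            if s > best_score:
                best_subset, best_score, improved = child, s, True
        stale = 0 if improved else stale + 1
    return best_subset, best_score

if __name__ == "__main__":
    X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 1.1], [1.2, 8.9]]
    y = ["a", "a", "a", "b", "b", "b"]
    print(best_first_forward(X, y))  # ((0,), 1.0): feature 0 alone separates the classes
```

1-NN has no training step, so scoring a candidate subset costs a single pass over pairwise distances. Caching the scores keeps the search from re-evaluating subsets that it reaches along different paths.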
Year: 2008 PMID: 18320060 PMCID: PMC2254194 DOI: 10.1371/journal.pone.0001742
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Outline of procedures for data mining of morphology.
(a) Seven main steps, from 3D data acquisition and processing through feature extraction and data mining (classifiers and feature selection) to possible applications of the method. (b–d) Illustrations of feature extraction methods. (b) Section areas and section convolutions. (i) The tooth (upper molar of Rhinolophus blasii) is divided into 10 equal sections perpendicular to the z-axis. Upper and lower bounds (relative to the zApex) for every second section are given along the z-axis. (ii) Occlusal view with every second section highlighted in red, with the area and convolution of each section. (c) Orientation patch count (OPC). (i) The surface vertices of the tooth (upper molar of Felis silvestris) are grouped according to their orientation in the xy-plane (ii). (iii) Vertices are further grouped according to their 4-cell connectivity, followed by (iv) exclusion of small patches. (v) The resulting OPC value is the final number of patches. (d) The effect of surface folding and elongation on surface relief. Relief is calculated by dividing the 3D surface area by its 2D projected area. A flat, unspecialized surface (i) has a relief of 1. If the surface is folded (ii), as in Otomys irroratus, or elongated (iii), as in Felis silvestris, its relief increases.
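Two of the descriptors in Figure 1, relief (panel d) and OPC (panel c), are defined precisely enough to sketch. The code below assumes that the tooth crown is available as a regular elevation grid `z[row][col]` with square cells of side `cell`, in the style of a GIS digital elevation model. Several other choices are also our own assumptions: triangulating each grid square into two triangles, computing orientation from finite-difference slopes, using eight orientation classes (`n_bins`), and setting the minimum patch size (`min_patch`). All function names are hypothetical. `relief` divides the 3D surface area by the 2D projected area. `opc` works in four steps: it bins each grid vertex by its downslope direction in the xy-plane, merges 4-connected vertices of the same class into patches, drops patches smaller than `min_patch`, and counts the remaining patches.

```python
import math
from typing import List, Sequence

Grid = Sequence[Sequence[float]]  # hypothetical elevation grid: z[row][col]

def _tri_area(a, b, c) -> float:
    """Area of a 3D triangle: half the norm of the cross product of two edges."""
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def relief(z: Grid, cell: float = 1.0) -> float:
    """3D surface area divided by 2D projected area; each grid square is split into two triangles."""
    rows, cols = len(z), len(z[0])
    if rows < 2 or cols < 2:
        raise ValueError("grid must be at least 2 x 2")
    surface = 0.0
    for r in range(rows - 1):
        for c in range(cols - 1):
            p00 = (c * cell, r * cell, z[r][c])
            p01 = ((c + 1) * cell, r * cell, z[r][c + 1])
            p10 = (c * cell, (r + 1) * cell, z[r + 1][c])
            p11 = ((c + 1) * cell, (r + 1) * cell, z[r + 1][c + 1])
            surface += _tri_area(p00, p01, p11) + _tri_area(p00, p11, p10)
    projected = (rows - 1) * (cols - 1) * cell * cell
    return surface / projected

def _orientation_bins(z: Grid, n_bins: int) -> List[List[int]]:
    """Bin each vertex by downslope direction in the xy-plane; flat vertices get -1."""
    rows, cols = len(z), len(z[0])
    bins = [[-1] * cols for _ in range(rows)]
    sector = 2 * math.pi / n_bins
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            # central differences inside the grid, one-sided differences at the edges
            dzdy = (z[r1][c] - z[r0][c]) / (r1 - r0) if r1 != r0 else 0.0
            dzdx = (z[r][c1] - z[r][c0]) / (c1 - c0) if c1 != c0 else 0.0
            if dzdx == 0 and dzdy == 0:
                continue  # no orientation on a flat vertex
            angle = math.atan2(-dzdy, -dzdx) % (2 * math.pi)  # downslope direction in [0, 2*pi)
            bins[r][c] = int(angle // sector) % n_bins
    return bins

def opc(z: Grid, n_bins: int = 8, min_patch: int = 3) -> int:
    """Count 4-connected patches of equal orientation that have at least min_patch vertices."""
    bins = _orientation_bins(z, n_bins)
    rows, cols = len(bins), len(bins[0])
    seen = [[False] * cols for _ in range(rows)]
    patches = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or bins[r][c] < 0:
                continue
            label, size = bins[r][c], 0
            stack = [(r, c)]
            seen[r][c] = True
            while stack:  # iterative flood fill using 4-cell connectivity
                i, j = stack.pop()
                size += 1
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not seen[ni][nj] and bins[ni][nj] == label):
                        seen[ni][nj] = True
                        stack.append((ni, nj))
            if size >= min_patch:  # exclude small patches
                patches += 1
    return patches
```

A flat grid returns a relief of exactly 1, which matches panel (d)(i). Tilting or folding the grid raises the value. A single ridge yields two orientation patches, one on each flank.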
Figure 2. Dental types illustrated by samples from the tooth-morph and toothrow-morph sets.
Dental types: Car, carnassials; Dil-Trib, dilambdodont and tribosphenic; Sel, selenodont; Loph, lophodont; Bun, bunodont; CarTr, carnassial tooth row; SelTr, selenodont tooth row; LophTr, lophodont tooth row; BunTr, bunodont tooth row. Sample morphologies: Car, Canis lupus u-p4; Dil-Trib, Rhinolophus blasii u-m2; Sel, Alcelaphus buselaphus u-m2; Loph, Berylmys bowersi u-m1; Bun, Pongo pygmaeus u-m2; CarTr, Canis lupus u-p2m12; SelTr, Alcelaphus buselaphus u-tr; LophTr, Berylmys bowersi u-m1-3; BunTr, Pongo pygmaeus u-tr. u, upper; m, molar; p, premolar. Upper right teeth and tooth rows; anterior to the right.
Table 1. Features used in the tooth-diet training set.
| ID | Feature | P |
|---|---|---|
| 99 | D2dist-100000-mean | 0.20 |
| 15 | sectionConv-10-5-5 | 0.04 |
| 42 | MROPC-5-0.1-9 | 0.04 |
| 22 | MROPC-4-0.002-9 | 0.03 |
| 25 | MROPC-4-0.008-9 | 0.03 |
| 57 | MROPC-7-0.006-9 | 0.03 |
| 16 | sectionConv-10-5-6 | 0.03 |
| 1 | sectionAreas-10-5-1 | 0.03 |
| 73 | MROPC-8-0.06-9 | 0.03 |
| 98 | relief | 0.02 |
Top ten features for the tooth-diet training set, ranked by the mean value of their probability density function. ID, feature ID; Feature, procedure name including parameters; P, mean value of the probability density function, rounded to the nearest percent.
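Several top-ranked features are named `D2dist-100000-mean` and `D2dist-100000-std`, but the tables do not define them. One plausible reading, which is an assumption here, is the D2 shape distribution: the distribution of Euclidean distances between randomly chosen pairs of surface points, sampled with 100,000 pairs and summarized by its mean and standard deviation. As a simplification, the sketch below samples pairs of distinct mesh vertices instead of area-weighted surface points. `d2_stats` is a hypothetical name.

```python
import math
import random
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

def d2_stats(vertices: Sequence[Point], n_pairs: int = 100_000, seed: int = 0) -> Tuple[float, float]:
    """Mean and population standard deviation of distances between random distinct vertex pairs."""
    n = len(vertices)
    if n < 2:
        raise ValueError("need at least two vertices")
    rng = random.Random(seed)  # fixed seed makes the feature reproducible
    dists = []
    for _ in range(n_pairs):
        i = rng.randrange(n)
        j = rng.randrange(n - 1)
        if j >= i:  # shift past i so the pair is always two different vertices
            j += 1
        dists.append(math.dist(vertices[i], vertices[j]))
    mean = sum(dists) / n_pairs
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / n_pairs)
    return mean, std
```

Because the seed is fixed, the same mesh always produces the same pair of summary values. Changing `seed` shows how much of the value comes from sampling noise.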
Features used in the toothrow-diet training set.
| ID | Feature | P |
|---|---|---|
| 50 | MROPC-6-0.04-9 | 0.10 |
| 6 | sectionAreas-10-5-6 | 0.06 |
| 9 | sectionAreas-10-5-9 | 0.06 |
| 4 | sectionAreas-10-5-4 | 0.05 |
| 93 | MROPC-10-0.02-9 | 0.04 |
| 5 | sectionAreas-10-5-5 | 0.04 |
| 61 | MROPC-7-0.04-9 | 0.04 |
| 3 | sectionAreas-10-5-3 | 0.04 |
| 13 | sectionConv-10-5-3 | 0.03 |
| 7 | sectionAreas-10-5-7 | 0.03 |
Top ten features for the toothrow-diet training set, ranked by the mean value of their probability density function. See Table 1 for definitions.
Features used in the mixed-diet training set.
| ID | Feature | P |
|---|---|---|
| 99 | D2dist-100000-mean | 0.11 |
| 7 | sectionAreas-10-5-7 | 0.10 |
| 2 | sectionAreas-10-5-2 | 0.07 |
| 42 | MROPC-5-0.1-9 | 0.04 |
| 1 | sectionAreas-10-5-1 | 0.03 |
| 6 | sectionAreas-10-5-6 | 0.03 |
| 9 | sectionAreas-10-5-9 | 0.03 |
| 54 | MROPC-7-0.001-9 | 0.03 |
| 11 | sectionConv-10-5-1 | 0.03 |
| 15 | sectionConv-10-5-5 | 0.03 |
Top ten features for the mixed-diet training set, ranked by the mean value of their probability density function. See Table 1 for definitions.
Features used in the tooth-morph training set.
| ID | Feature | P |
|---|---|---|
| 24 | MROPC-4-0.006-9 | 0.07 |
| 6 | sectionAreas-10-5-6 | 0.07 |
| 11 | sectionConv-10-5-1 | 0.06 |
| 2 | sectionAreas-10-5-2 | 0.06 |
| 100 | D2dist-100000-std | 0.05 |
| 25 | MROPC-4-0.008-9 | 0.05 |
| 4 | sectionAreas-10-5-4 | 0.05 |
| 7 | sectionAreas-10-5-7 | 0.04 |
| 99 | D2dist-100000-mean | 0.04 |
| 73 | MROPC-8-0.06-9 | 0.04 |
Top ten features for the tooth-morph training set, ranked by the mean value of their probability density function. See Table 1 for definitions.
Features used in the toothrow-morph training set.
| ID | Feature | P |
|---|---|---|
| 15 | sectionConv-10-5-5 | 0.06 |
| 3 | sectionAreas-10-5-3 | 0.06 |
| 54 | MROPC-7-0.001-9 | 0.05 |
| 65 | MROPC-8-0.001-9 | 0.05 |
| 4 | sectionAreas-10-5-4 | 0.05 |
| 95 | MROPC-10-0.06-9 | 0.05 |
| 39 | MROPC-5-0.04-9 | 0.05 |
| 62 | MROPC-7-0.06-9 | 0.04 |
| 43 | MROPC-6-0.001-9 | 0.04 |
| 80 | MROPC-9-0.008-9 | 0.03 |
Top ten features for the toothrow-morph training set, ranked by the mean value of their probability density function. See Table 1 for definitions.
Overlap of features in the five training sets.
| | tooth-diet | toothrow-diet | mixed-diet | tooth-morph | toothrow-morph |
|---|---|---|---|---|---|
| tooth-diet | x | 0 | 4 | 3 | 1 |
| toothrow-diet | | x | 3 | 3 | 2 |
| mixed-diet | | | x | 5 | 2 |
| tooth-morph | | | | x | 1 |
| toothrow-morph | | | | | x |
The number of exact matches between the top ten features of each pair of training sets.
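Every count in the overlap table can be recomputed from the feature IDs in the five top-ten tables. The short script below does this. The dictionary transcribes those IDs, and `overlap_counts` is a hypothetical helper name. Its output agrees with each entry of the matrix: for example, mixed-diet and tooth-morph share 5 features, and tooth-diet and toothrow-diet share none.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Feature IDs transcribed from the five top-ten tables.
TOP_TEN: Dict[str, List[int]] = {
    "tooth-diet": [99, 15, 42, 22, 25, 57, 16, 1, 73, 98],
    "toothrow-diet": [50, 6, 9, 4, 93, 5, 61, 3, 13, 7],
    "mixed-diet": [99, 7, 2, 42, 1, 6, 9, 54, 11, 15],
    "tooth-morph": [24, 6, 11, 2, 100, 25, 4, 7, 99, 73],
    "toothrow-morph": [15, 3, 54, 65, 4, 95, 39, 62, 43, 80],
}

def overlap_counts(top: Dict[str, List[int]]) -> Dict[Tuple[str, str], int]:
    """Number of feature IDs shared by each pair of training sets (upper triangle only)."""
    return {(a, b): len(set(top[a]) & set(top[b])) for a, b in combinations(top, 2)}

if __name__ == "__main__":
    for (a, b), n in overlap_counts(TOP_TEN).items():
        print(f"{a} vs {b}: {n}")
```

The script compares feature IDs rather than parameter strings. This works because each ID identifies exactly one procedure-and-parameter combination across all five tables.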