| Literature DB >> 27390931 |
Anne Sonnenschein1,2, David VanderZee3, William R Pitchers2,3, Sudarshan Chari2,3, Ian Dworkin4,5,6,7.
Abstract
BACKGROUND: Extracting important descriptors and features from images of biological specimens is an ongoing challenge. Features are often defined using landmarks and semi-landmarks that are determined a priori based on criteria such as homology or some other measure of biological significance. An alternative, widely used strategy uses computational pattern recognition, in which features are acquired from the image de novo. Subsets of these features are then selected based on objective criteria. Computational pattern recognition has been extensively developed primarily for the classification of samples into groups, whereas landmark methods have been broadly applied to biological inference.Entities:
Keywords: Computer vision; Drosophila; Geometric morphometrics; Mutants; Phenomics; Wing shape
Year: 2015 PMID: 27390931 PMCID: PMC4942975 DOI: 10.1186/s13742-015-0065-6
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Fig. 1 Wing landmarks and semi-landmarks. a Example wing image from D. melanogaster that has been splined using WINGMACHINE. b After the landmark and semi-landmark data are extracted, each configuration is translated (centered at the origin), scaled by centroid size, and superimposed (Procrustes superimposition for landmarks), so that all configurations lie in a common subspace. The image overlays 50 individual configurations from specimens to demonstrate some of the variation among individuals
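The translate-center-scale-superimpose steps described in the caption can be sketched in NumPy. This is a minimal illustration, not the WINGMACHINE code itself, and aligning every configuration to a single reference is a simplification of full generalised Procrustes analysis:

```python
import numpy as np

def superimpose(configs):
    """Procrustes-style alignment of 2D landmark configurations.

    configs: array of shape (n_specimens, n_landmarks, 2).
    Each configuration is centered at the origin, scaled to unit
    centroid size, then rotated to best match the first configuration.
    """
    configs = np.asarray(configs, dtype=float)
    aligned = []
    for c in configs:
        c = c - c.mean(axis=0)            # translate: center at origin
        c = c / np.sqrt((c ** 2).sum())   # scale by centroid size
        aligned.append(c)
    ref = aligned[0]
    out = []
    for c in aligned:
        # Optimal rotation via SVD (orthogonal Procrustes problem).
        u, _, vt = np.linalg.svd(c.T @ ref)
        out.append(c @ (u @ vt))
    return np.stack(out)
```

After this step, every specimen lies in the same shape space and the aligned coordinates can be used directly as features for the classifiers below.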
Drosophila allele information
| Bloomington stock number | Gene name | Gene symbol | Allele name |
|---|---|---|---|
| 10385 | | | P{lacW} |
| 14189 | | | P{SUPor-P} |
| 10418 | | | P{lacW} |
| 14403 | | | P{SUPor-P} |
Drosophila wings dissected by sex and genotype
| Genotype | Female | Male |
|---|---|---|
| | 116 | 118 |
| | 106 | 130 |
| | 107 | 100 |
| | 115 | 111 |
| | 116 | 116 |
Fig. 2 Magnitude and direction of the effect of each mutation (red) relative to the Samarkand wild type (black). Magnitudes are in units of Procrustes distance (PD), which under the tangent approximation used here is equivalent to the Euclidean distance between the mean vector of each mutant and that of the Samarkand (SAM) wild type. The vectors of shape differences are magnified three-fold to enhance the clarity of the effects
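Under the tangent approximation mentioned in the caption, the Procrustes distance between a mutant and the wild type reduces to the Euclidean distance between their mean shape vectors. A minimal sketch (the function name and example arrays are hypothetical):

```python
import numpy as np

def procrustes_distance(mutant, wildtype):
    """Tangent-space Procrustes distance between two groups of shapes.

    mutant, wildtype: arrays of shape (n_specimens, n_coordinates),
    already Procrustes-superimposed and flattened. The distance is the
    Euclidean norm of the difference between the two mean shape vectors.
    """
    diff = np.mean(mutant, axis=0) - np.mean(wildtype, axis=0)
    return float(np.linalg.norm(diff))
```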
Fig. 3 Representative images from the database. From the top right corner, counter-clockwise: mastermind, Epidermal growth factor receptor, Star and thickveins. The mastermind, Egfr and Star mutations are all homozygous lethal, and thickveins has a qualitative defect as a homozygote. As heterozygotes, they are qualitatively indistinguishable from the Samarkand (SAM) wild type (center)
Fig. 4 Left and right wings from the same female (SAM) fly, imaged four times. Top left: images taken on an Olympus BX51 microscope at 40× magnification; top right: images taken on a Leica M125 at 40× magnification. Bottom left and right: images taken at 20× magnification
Classification accuracy of machine learning algorithms using landmark and semi-landmark data
| Algorithm | Sex (± Standard error) | Genotype (± Standard error) |
|---|---|---|
| LDA | 98.2 % (±1.6) | 86.1 % (±1.5) |
| QDA | 81.5 % (±6.4) | 68.7 % (±2.2) |
| FDA | 98.2 % (±1.6) | 86.0 % (±1.5) |
| MDA | 98.1 % (±1.6) | 84.8 % (±1.6) |
| Bagging | 93.3 % (±2.9) | 57.6 % (±2.9) |
| Random forest | 94.6 % (±2.7) 100 trees | 74.9 % (±2.1) 1,000 trees |
| SVM | 96.8 % (±2.1) sigmoid | 83.8 % (±1.6) radial |
| Neural network (size 10) | 98.3 % (±1.6) | 81.2 % (±2.2) |
| KNN | 98.3 % (±1.5) k = 4 | 59.3 % (±2.1) k = 32 |
Fig. 5 Separation of specimens using landmark data with linear discriminant analysis. Separation of specimens for each of the five genotypes by linear discriminant analysis (LDA) in the training set (left panel) and testing set (right panel), plotting the first discriminant function against the second. This includes data for both males and females, with left and right wings averaged per specimen
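A two-class linear discriminant of the kind used for the sex classification above can be sketched in plain NumPy. The synthetic data here stand in for the real superimposed landmark coordinates, and this is Fisher's discriminant rather than the exact routine used in the study:

```python
import numpy as np

def lda_fit(X, y):
    """Fisher's two-class linear discriminant.

    Returns the projection vector w = Sw^{-1}(mu1 - mu0) and the
    midpoint threshold between the projected class means.
    """
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    # Pooled within-class scatter, regularised slightly for stability.
    Sw = (np.cov(X[y == 0], rowvar=False) + np.cov(X[y == 1], rowvar=False)
          + 1e-6 * np.eye(X.shape[1]))
    w = np.linalg.solve(Sw, mu1 - mu0)
    thresh = w @ (mu0 + mu1) / 2.0
    return w, thresh

def lda_predict(X, w, thresh):
    return (X @ w > thresh).astype(int)

# Hypothetical stand-in for flattened, superimposed landmark coordinates.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
y = np.repeat([0, 1], 100)          # e.g. female = 0, male = 1
X[y == 1, 0] += 3.0                 # sex effect along one shape axis

idx = rng.permutation(200)          # shuffle before splitting
X, y = X[idx], y[idx]
w, thresh = lda_fit(X[:150], y[:150])            # training set
accuracy = (lda_predict(X[150:], w, thresh) == y[150:]).mean()
```

Classification accuracy, as reported in the tables, is simply the fraction of held-out specimens assigned the correct label.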
Classification accuracy of machine learning algorithms compared with BioCAT
| Classification | Algorithm | Hessian | Shape | Shape + Size |
|---|---|---|---|---|
| Sex | Random forest (10) | 85.0 % | 92.3 % (±3.7) | 94.7 % (±2.6) |
| Sex | Random forest (1,000) | 85.0 % | 96.1 % (±2.2) | 95.9 % (±2.1) |
| Sex | SVM | 81.7 % | 99.0 % (±1.2) | 99.0 % (±1.2) |
| Genotype | Random forest (10) | 52.0 % | 43.3 % (±3.5) | 44.7 % (±3.7) |
| Genotype | Random forest (1,000) | 46.7 % | 69.1 % (±3.4) | 70.2 % (±2.8) |
| Genotype | SVM | 43.3 % | 75.1 % (±2.8) | 75.8 % (±2.7) |
The Hessian column gives the accuracy of classifications based on Hessian features extracted with BioCAT. The Shape column gives classification accuracy based on landmarks and semi-landmarks, not including centroid size. Shape + Size gives classification accuracy based on landmarks and semi-landmarks, including centroid size
Fig. 6 Confusion matrices. Heatmap of confusion matrices from classification (random forest) using features extracted with BioCAT (a) compared with landmark and semi-landmark data (b). The data in (a) and (b) are shown together in (c) to facilitate comparison. Numbers represent the percentage of correct classifications; lm_* denotes the landmark/semi-landmark data. Specimens classified from BioCAT features were more consistently mis-classified as particular genotypes, e.g. mam mutants as Star (a). This pattern is not evident in the classification using the landmark data (b). The scale represents frequency of classification
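A row-normalised confusion matrix like those in the heatmaps can be computed as follows (the labels are hypothetical; row i, column j gives the percentage of specimens of true class i assigned to predicted class j):

```python
import numpy as np

def confusion_matrix(true, pred, n_classes):
    """Row-normalised confusion matrix in percent.

    Rows are true classes, columns are predicted classes; each row
    sums to 100, as in the genotype heatmaps.
    """
    m = np.zeros((n_classes, n_classes))
    for t, p in zip(true, pred):
        m[t, p] += 1
    return 100.0 * m / m.sum(axis=1, keepdims=True)

# Hypothetical labels for five genotypes (0..4).
true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
pred = [0, 0, 1, 2, 2, 2, 3, 0, 4, 4]
cm = confusion_matrix(true, pred, 5)
```

Off-diagonal cells reveal systematic confusions between particular classes, which is the pattern contrasted between panels (a) and (b).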
Classification accuracy of machine learning algorithms compared with BioCAT for predicting sex
| Method | Training images | Testing images | Sex (± Standard error) |
|---|---|---|---|
| BioCAT | Olympus 40× | Olympus 40× | 85.0 % |
| BioCAT | Olympus 40× | Olympus 20× | 50.0 % |
| BioCAT | Olympus 40× cropped | Olympus 20× cropped | 50.0 % |
| BioCAT | Leica 40× cropped | Leica 40× cropped | 93.0 % |
| BioCAT | Olympus 40× cropped | Leica 40× cropped | 73.7 % |
| BioCAT | Olympus & Leica 40× cropped | Olympus 40× cropped | 73.3 % |
| BioCAT | Olympus & Leica 40× cropped | Leica 40× cropped | 86.0 % |
| Landmarks | Olympus 40× landmarks | Olympus 40× landmarks | 98.2 % (±1.6) |
| Landmarks | Olympus 40× landmarks | Olympus 20× landmarks | 81.2 % (±1.4) |
| Landmarks | Leica 40× landmarks | Leica 40× landmarks | 97.8 % (±0.69) |
| Landmarks | Olympus 40× landmarks | Leica 40× landmarks | 79.1 % (±1.3) |
Machine learning algorithms using landmark and semi-landmark features, compared with Hessian features extracted by BioCAT, trained and tested across microscopes and magnifications