Nathalie Jeanray, Raphaël Marée, Benoist Pruvot, Olivier Stern, Pierre Geurts, Louis Wehenkel, Marc Muller.
Abstract
Zebrafish is increasingly used to assess biological properties of chemical substances and thus is becoming a specific tool for toxicological and pharmacological studies. The effects of chemical substances on embryo survival and development are generally evaluated manually through microscopic observation by an expert and documented by several typical photographs. Here, we present a methodology to automatically classify brightfield images of wildtype zebrafish embryos according to their defects by using an image analysis approach based on supervised machine learning. We show that, compared to manual classification, automatic classification results in 90 to 100% agreement with consensus voting of biological experts for nine of the eleven defects considered in 3-day-old zebrafish larvae. Automating the analysis and classification of zebrafish embryo pictures reduces the workload and time required of the biological expert and increases the reproducibility and objectivity of the classification.
Year: 2015 PMID: 25574849 PMCID: PMC4289190 DOI: 10.1371/journal.pone.0116989
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Image preprocessing: from the original image to the one used for classification.
The original image is processed by an ImageJ script (see S1 Fig. for the code) that automatically crops the zebrafish embryo, into a square whenever possible. If the embryo lies near the sides of the original image, it is cropped into a rectangle instead. To localize the embryo, we use connected-component labeling combined with several morphological transformations and binarizations.
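The localization and cropping idea can be sketched in a few lines of Python. This is not the authors' ImageJ script (that is in S1 Fig.); it is a minimal illustration that assumes the image has already been binarized into a boolean mask, that the embryo is the largest 4-connected foreground region, and that the crop is clipped at the image border. The function names `largest_component_bbox` and `square_crop` are hypothetical.

```python
from collections import deque


def largest_component_bbox(mask):
    """Return (top, left, bottom, right), inclusive, of the largest
    4-connected True region in a 2D list of booleans, or None if empty."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best_size, best_box = 0, None
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill of one connected component.
            queue = deque([(r, c)])
            seen[r][c] = True
            size = 0
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                size += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size > best_size:
                best_size, best_box = size, (top, left, bottom, right)
    return best_box


def square_crop(box, rows, cols):
    """Grow an inclusive bounding box into a square around it and clip it to
    the image. Returns half-open bounds (top, left, bottom, right); the
    result is a rectangle when the square would cross the image border."""
    top, left, bottom, right = box
    height, width = bottom - top + 1, right - left + 1
    side = max(height, width)
    top0 = top - (side - height) // 2
    left0 = left - (side - width) // 2
    return (max(0, top0), max(0, left0),
            min(rows, top0 + side), min(cols, left0 + side))


# Toy binary mask: one 2x4 blob plus an isolated noise pixel.
mask = [[False] * 10 for _ in range(6)]
for r in (2, 3):
    for c in range(3, 7):
        mask[r][c] = True
mask[0][9] = True

box = largest_component_bbox(mask)
crop = square_crop(box, rows=6, cols=10)
```

In this sketch the clipping step is what turns a square into a rectangle for embryos located near the image border.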
Summary of the effects of the substances used to intoxicate the zebrafish embryos.
| Substance | Class | Reported effects |
|---|---|---|
| Acetaminophen | Cardioactive | Tail, Heart and Yolk Sac malformations |
| Propranolol | Cardioactive | Large pericardial edemas, weak pigmentation and tail curvatures |
| Amiodarone | Cardioactive | Failure of cardiac valve formation |
| Thallium | Heavy Metal | High toxicity but morphological effects on the zebrafish embryos are still unrecognized |
| Methylmercury (MeHg) | Heavy Metal | Causes tail fin fold defects and abolishes the tail fin primordium |
| Lead acetate (PbAc) | Heavy Metal | Induces malformations such as uninflated swim bladder, bent spine and yolk-sac edema |
| Zinc sulfate (ZnSO4) | Heavy Metal | Causes pathological alterations in isolated fish erythrocytes and induces abnormal embryogenesis, low hatchability, delayed hatching, and reduction of newly hatched larvae, and a poor survival ratio |
| Valproic acid (VPA) | Anticonvulsant and Mood-stabilizing | Causes a ventrally curved body axis and pericardial edema |
Figure 2. Examples of images representing all the analyzed phenotypes.
Embryos “Without phenotype”, also called “Normal” (A), show no defect. The “Dead” phenotype (B) corresponds to totally necrosed embryos, while the “Chorion” phenotype (C) represents embryos that are still located in their chorion. In “Down Curved Tail” (D), the tail is clearly oriented downward relative to the horizontal. “Hemostasis” (E) presents a small amount of blood, which can be located anywhere in the embryo (mainly in the head or in the pericardial area). “Necrosed Yolk Sac” (F) corresponds to a yolk that is darker than normal. In the “Edema” phenotype (G), an edema generally surrounds the anteroventral part of the fish. The “Short Tail” phenotype (H) describes a tail shorter than normal. “Up Curved Fish” (I) and “Up Curved Tail” (J) are two slightly different phenotypes: in “Up Curved Fish” the curvature is on the back of the embryo (a kind of lordosis), whereas in “Up Curved Tail” the curvature is located on the tail.
Number of images by class (+) and (-) for each phenotype, in the learning set (LS) and the test set (TS).
| Phenotype | LS: Nb of (+) images | LS: Nb of (-) images | TS: Nb of (+) images | TS: Nb of (-) images |
|---|---|---|---|---|
| Dead | 114 | 415 | 53 | 288 |
| Chorion | 18 | 511 | 5 | 336 |
| Down Curved Tail | 11 | 518 | 16 | 325 |
| Hemostasis | 57 | 472 | 83 | 258 |
| Necrosed Yolk Sac | 167 | 362 | 11 | 330 |
| Edema | 160 | 369 | 54 | 287 |
| Short Tail | 49 | 480 | 149 | 192 |
| Up Curved Tail | 32 | 497 | 17 | 324 |
| Up Curved Fish | 64 | 465 | 13 | 328 |
| UpFishTail | 96 | 433 | 29 | 312 |
| Normal | 160 | 369 | 82 | 259 |
Summary of the classification results in cross-validation on the learning sets and on the independent test sets using the “All binary” or “Two-tier” method.
| Phenotype | Mode | Subwindow size (% of image size) | Node test type | CV rate on LS | Rate on TS (All binary) | Rate on TS (Two-tier) |
|---|---|---|---|---|---|---|
| Chorion | BAGS | 25–75 | SIMPLETHRES | 99.63% Chorion: 100% “-”: 99.26% | 90.00% Chorion: 80% “-”: 100% | |
| Dead | BAGS | 10–75 | SIMPLETHRES | 99.73% Dead: 99.74% “-”: 99.74% | 99.06% Dead: 98.11% “-”: 100% | |
| Down Curved Tail | BAGS | 50–90 | DIFFNEIGHBOR | 85.13% Down: 95.38% “-”: 74.87% | 82.68% Down: 68.75% “-”: 96.61% | 85.80% Down: 75.0% “-”: 96.61% |
| Necrosed Yolk Sac | BAGS | 0–100 | SIMPLETHRES | 92.36% Necr.: 99.63% “-”: 85.09% | 95.15% Necr.: 100% “-”: 90.30% | 90.15% Necr.: 90.91% “-”: 89.39% |
| Edema | BAGS | 10–90 | DIFFNEIGHBOR | 92.07% Edem.: 95.09% “-”: 89.06% | 73.85% Edem.: 75.92% “-”: 71.78% | 75.24% Edem.: 75.92% “-”: 74.56% |
| Short Tail | BAGS | 25–90 | DIFFNEIGHBOR | 91.25% Short: 94.16% “-”: 88.33% | 89.94% Short: 89.26% “-”: 90.62% | 89.12% Short: 86.58% “-”: 91.67% |
| Up Curved Fish | C | 25–75 | SIMPLETHRES | 96.19% UpFish: 99.04% “-”: 93.33% | 92.04% UpFish: 92.31% “-”: 91.77% | 95.42% UpFish: 100% “-”: 90.85% |
| Up Curved Tail | BAGS | 25–90 | DIFFNEIGHBOR | 87.0% UpTail: 94.0% “-”: 80.0% | 86.54% UpTail: 76.47% “-”: 96.60% | 85.45% UpTail: 76.47% “-”: 94.44% |
| UpFishTail | C | 0–90 | SIMPLETHRES | 94.84% UpFishTail: 98.84% “-”: 91.56% | 80.43% UpFishTail: 72.41% “-”: 88.46% | 78.55% UpFishTail: 68.96% “-”: 88.14% |
| Hemostasis | BAGS | 25–90 | DIFFNEIGHBOR | 79.82% Hemo: 86.67% “-”: 72.98% | 54.57% Hemo: 28.91% “-”: 80.23% | 51.31% Hemo: 8.43% “-”: 94.19% |
| Normal | BAGS | 10–75 | SIMPLETHRES | 97.54% Norm.: 98.87% “-”: 96.23% | 91.09% Norm.: 98.78% “-”: 83.40% | 91.09% Norm.: 98.78% “-”: 83.40% |
For each phenotype (first column), the table gives the optimal mode of classification (“BAGS” or “C”; second column), the size of the extracted random subwindows, expressed as a percentage of the size of the original image (third column), and the type of test used at each node (SIMPLETHRES or DIFFNEIGHBOR; fourth column). “CV rate on LS” gives the results obtained for each phenotype in cross-validation on the corresponding learning set (LS) with these parameters, in the following form: global recognition rate, recognition rate of the specific phenotype, recognition rate of the corresponding “negative” phenotype. “Rate on TS” gives the results obtained with the same parameters on the corresponding independent test set (TS), with the “All binary” approach and with the two-tier approach (three-class model followed by binary classification), respectively. Recognition rates are given in % of the corresponding set.
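As a reading aid for the three-number format, the following sketch checks how the global rate relates to the two per-class rates. It rests on an observation about the tabulated values, not on a definition given in the text: for most entries, the global rate matches the unweighted mean of the phenotype rate and the “negative” rate, up to rounding. The helper `mean_class_rate` is a hypothetical name, and the four entries are copied from the test-set columns of the table.

```python
def mean_class_rate(positive_rate, negative_rate):
    """Unweighted mean of the two per-class recognition rates (in %)."""
    return (positive_rate + negative_rate) / 2


# (phenotype rate, "negative" rate, reported global rate), all in %.
reported = {
    "Chorion, TS": (80.0, 100.0, 90.00),
    "Dead, TS": (98.11, 100.0, 99.06),
    "Edema, TS, Two-tier": (75.92, 74.56, 75.24),
    "Hemostasis, TS, Two-tier": (8.43, 94.19, 51.31),
}

# Difference between the recomputed mean and the reported global rate.
differences = {
    name: abs(mean_class_rate(pos, neg) - global_rate)
    for name, (pos, neg, global_rate) in reported.items()
}
```

Under this reading, a global rate near 50% with a high “negative” rate, as for Hemostasis, reflects a classifier that rarely detects the phenotype itself.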
Figure 3. Two-tier pipeline.
Schematic overview of the Two-tier approach for classification, also showing the recognition rates observed on the test set for the three-class model.
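The control flow of a two-tier scheme (a three-class model followed by binary classifiers) can be sketched as follows. This is only an illustration of the idea, not the authors' implementation: the classifiers are passed in as plain callables, and the class names used in the toy example (“Dead”, “Other”) as well as the rule that only one first-tier class is passed on to the binary models are assumptions.

```python
from typing import Callable, Dict, List, Tuple


def two_tier_classify(
    image: object,
    three_class_model: Callable[[object], str],
    binary_models: Dict[str, Callable[[object], bool]],
    second_tier_class: str,
) -> Tuple[str, List[str]]:
    """Run the first-tier model, then apply every binary model only to
    images assigned to `second_tier_class` (an assumed routing rule)."""
    first_label = three_class_model(image)
    if first_label != second_tier_class:
        return first_label, []
    positives = [name for name, model in binary_models.items() if model(image)]
    return first_label, positives


# Toy stand-ins for trained models: each "image" is a dict of known answers.
def toy_three_class(image):
    return image["coarse"]


toy_binary = {
    "Edema": lambda image: image["edema"],
    "Short Tail": lambda image: image["short_tail"],
}

dead_result = two_tier_classify(
    {"coarse": "Dead", "edema": True, "short_tail": False},
    toy_three_class, toy_binary, second_tier_class="Other",
)
other_result = two_tier_classify(
    {"coarse": "Other", "edema": True, "short_tail": False},
    toy_three_class, toy_binary, second_tier_class="Other",
)
```

Because each binary model is applied independently in the second tier, one image can receive several phenotype labels.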
Automatic classification of “difficult” images and “a posteriori” expert agreement.
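| Phenotype | Nb of images classified (+) automatically | Nb of images with expert agreement | Nb of possible false negatives |
|---|---|---|---|
| | 0 | 0 | 0 |
| | 0 | 0 | 0 |
| | 12 | 12 | 2 |
| | 0 | 0 | 1 |
| | 11 | 11 | 0 |
| | 4 | 4 | 1 |
| | 2 | 2 | 0 |
| | 22 | 8 | 2 |
| | 8 | 4 | 8 |
| | 22 | 11 | 0 |
| | 12 | 3 | 2 |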
24 “non consensus” images were first classified using the “Two-tier” method and then evaluated by experts for possible agreement. The second column shows the number of images that automatic classification identified as having the corresponding phenotype (+). An expert then viewed these images and gave an “a posteriori” opinion on the automatic classification; the number of images for which the expert agreed with it is shown in the third column. Finally, the expert screened all images again for all phenotypes to identify possible “false negative” images; their number is given in the last column.
Chemicals used to intoxicate embryos to build the validation set.
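| Name | Concentration of stock solution | Provider | Molecular weight (g/mol) |
|---|---|---|---|
| Caffeine | 82.4 mM | Sigma Aldrich | 194.19 |
| Theophylline | 40.85 mM | Sigma Aldrich | 180 |
| DCA | 200 mg/L | Sigma Aldrich | 162.02 |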
Name, concentration of stock solution, provider, chemical formula, and molecular weight are given.
Summary of the classification results on the validation set using the Two-tier approach.
| True subset | Predicted phenotype | | | Overall prediction rate |
|---|---|---|---|---|
| | 108 | 11 | 0 | 108/119 = 90.8% |
| | 18 | 197 | 0 | 197/215 = 91.6% |
| | 1 | 18 | 838 | 838/857 = 97.8% |
The number of predicted phenotypes and overall prediction rates are given for each of the three “true” subsets.
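The prediction rates in the last column can be recomputed from the counts: each rate is the diagonal count of a row divided by the row total. The sketch below uses the counts from the table; the function name `per_subset_rates` is hypothetical, and the pooled rate over all three subsets is an extra quantity computed here for illustration, not one reported in the source.

```python
# Rows are the three "true" subsets, columns the predicted phenotypes.
confusion = [
    [108, 11, 0],
    [18, 197, 0],
    [1, 18, 838],
]


def per_subset_rates(matrix):
    """Fraction of each true subset that was predicted as itself."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


rates = per_subset_rates(confusion)
rates_percent = [round(100 * rate, 1) for rate in rates]

# Pooled fraction of correct predictions over all images (not in the source).
correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
pooled_rate = correct / total
```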
Figure 4. Dose-response curves for caffeine, theophylline and DCA.
Survival and morphological defects of larvae intoxicated from 2 dpf to 3 dpf. On each graph, the “Survival” curve (LC50 curve) represents the fraction of surviving larvae. The EC50 (teratogenicity) curve is drawn from the results for the “Normal” phenotype, expressed as a fraction of the surviving larvae. Graphs A, C and E are based on manual observations, whereas graphs B, D and F are based on automatic analysis.
Comparison of the deduced LC50, EC50 and TI for caffeine, DCA and theophylline.
| Chemical | log(LC50) | | log(EC50) | | log(TI) | | |
|---|---|---|---|---|---|---|---|
| | 0.82 ± 0.03 | 0.81 ± 0.02 | -0.9 ± 0.7 | -0.8 ± 1.0 | 1.72 ± 0.73 | 1.61 ± 1.0 | 1.26 ± 0.05 |
| | 1.74 ± 0.02 | 1.75 ± 0.02 | 0.62 ± 0.02 | 0.56 ± 0.03 | 1.12 ± 0.04 | 1.19 ± 0.05 | Not available |
| | 0.62 ± 0.07 | 0.59 ± 0.07 | -0.11 ± 0.05 | -0.16 ± 0.01 | 0.73 ± 0.12 | 0.75 ± 0.1 | 1.06 ± 0.04 |
This table presents the log(LC50) and log(EC50) values computed from the manual observations and from the output of our classification pipeline. These values are compared for each chemical, and log(TI) (TI = teratogenic index) is calculated as log(TI) = log(LC50) - log(EC50). In agreement with the literature, the results of the automatic analysis indicate that caffeine, theophylline and DCA are teratogenic (log(TI) > 0).
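The relation log(TI) = log(LC50) - log(EC50) can be checked directly against the central values in the table. The sketch below takes the first two columns of each group, row by row, without assigning chemical names to the rows; the helper `log_ti` is a hypothetical name and the uncertainties are left out.

```python
def log_ti(log_lc50, log_ec50):
    """log(TI) = log(LC50) - log(EC50)."""
    return log_lc50 - log_ec50


# (log(LC50), log(EC50), reported log(TI)): central values, one tuple per
# table row, taken from the first column of each pair.
rows = [
    (0.82, -0.9, 1.72),
    (1.74, 0.62, 1.12),
    (0.62, -0.11, 0.73),
]

computed = [log_ti(lc50, ec50) for lc50, ec50, _ in rows]

# A substance is read as teratogenic when log(TI) > 0, i.e. TI > 1.
teratogenic = [value > 0 for value in computed]
```

A positive log(TI) means that malformations appear at concentrations lower than those that kill the larvae.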