Anne E Thessen1, Skylar Marvel2, J C Achenbach3, Stephan Fischer4, Melissa A Haendel1, Kimberly Hayward5, Nils Klüver6, Sarah Könemann7, Jessica Legradi8, Pamela Lein9, Connor Leong5, J Erik Mylroie10, Stephanie Padilla11, Dante Perone5, Antonio Planchart12, Rafael Miñana Prieto13, Arantza Muriana14, Celia Quevedo15, David Reif2, Kristen Ryan16, Evelyn Stinckens17, Lisa Truong5, Lucia Vergauwen17, Colette Vom Berg7, Mitch Wilbanks10, Bianca Yaghoobi9, Jon Hamm18.
Abstract
Toxicological evaluation of chemicals using early-life stage zebrafish (Danio rerio) involves the observation and recording of altered phenotypes. Substantial variability has been observed among researchers in phenotypes reported from similar studies, as well as a lack of consistent data annotation, indicating a need for both terminological and data harmonization. When examined from a data science perspective, many of these apparent differences can be parsed into the same or similar endpoints whose measurements differ only in time, methodology, or nomenclature. Ontological knowledge structures can be leveraged to integrate diverse data sets across terminologies, scales, and modalities. Building on this premise, the National Toxicology Program's Systematic Evaluation of the Application of Zebrafish in Toxicology undertook a collaborative exercise to evaluate how the application of standardized phenotype terminology improved data consistency. To accomplish this, zebrafish researchers were asked to assess images of zebrafish larvae for morphological malformations in two surveys. In the first survey, researchers were asked to annotate observed malformations using their own terminology. In the second survey, researchers were asked to annotate the images from a list of terms and definitions from the Zebrafish Phenotype Ontology. Analysis of the results suggested that the use of ontology terms increased consistency and decreased ambiguity, but a larger study is needed to confirm this finding. We conclude that utilizing a common data standard will not only reduce the heterogeneity of reported terms but also increase agreement and repeatability between different laboratories. Thus, we advocate for the development of a zebrafish phenotype atlas to help laboratories create interoperable, computable data.
Keywords: Danio rerio; annotation; endpoint; ontology; phenotype; zebrafish
Year: 2022 PMID: 35387429 PMCID: PMC8979167 DOI: 10.3389/ftox.2022.817999
Source DB: PubMed Journal: Front Toxicol ISSN: 2673-3080
FIGURE 1. Reducing data heterogeneity with ontologies. Different laboratories test the same chemical and observe the same endpoint but report their observations differently according to each laboratory’s internal standard. Mapping these terms to an ontology reduces this heterogeneity and aids in data integration across laboratories.
FIGURE 2. Example zebrafish larva image from the Vertebrate Automated Screening Technology system. Each survey participant was asked to annotate 24 of these images for each of two surveys.
Data for example trait: abnormal tail.
| General term | Granular term | CURIE | Verbatim annotation by participant |
|---|---|---|---|
| abnormal tail | — | ZP:0001129 | Malformation tail |
| abnormal tail | — | ZP:0001129 | tail deformation |
| abnormal tail | — | ZP:0001129 | malformation tail |
| abnormal tail | — | ZP:0001129 | length of the tail |
| abnormal tail | abnormal tail fin | ZP:0004969 | caudal fin malformation |
| abnormal tail | abnormal tail fin | ZP:0004969 | caudal fin malformations |
| abnormal tail | abnormal tail fin | ZP:0004969 | malformed caudal fin |
| abnormal tail | abnormal tail fin | ZP:0004969 | Cfin |
| abnormal tail | abnormal tail fin | ZP:0004969 | cfin |
| abnormal tail | abnormal tail fin | ZP:0004969 | Malformation tail fin |
| abnormal tail | abnormal tail fin | ZP:0004969 | Tail End Vacuolization |
| abnormal tail | abnormal tail fin | ZP:0004969 | ruffled fin |
| abnormal tail | abnormal tail fin | ZP:0004969 | tail tip necrosis |
| abnormal tail | abnormal tail fin | ZP:0004969 | vacuolization in end of tail |
| abnormal tail | abnormally curved tail | ZP:0010319 | Curved tail |
| abnormal tail | abnormally curved tail | ZP:0010319 | Slightly Curved Tail |
| abnormal tail | abnormally curved tail | ZP:0010319 | bent tail tip |
| abnormal tail | abnormally curved tail | ZP:0010319 | C tail |
| abnormal tail | abnormally curved tail | ZP:0010319 | C-tail |
| abnormal tail | abnormally curved tail | ZP:0010319 | bend tail |
| abnormal tail | abnormally curved tail | ZP:0010319 | curved tail |
| abnormal tail | abnormally curved tail | ZP:0010319 | curved tail tip |
| abnormal tail | abnormally curved tail | ZP:0010319 | slightly curved tail |
| abnormal tail | abnormally curved tail | ZP:0010319 | slight tail curve |
| abnormal tail | abnormally curved tail | ZP:0010319 | tail curve |
| abnormal tail | abnormally curved tail | ZP:0010319 | tail tip curve |
| abnormal tail | abnormally curved tail | ZP:0010319 | tail bending |
| abnormal tail | abnormally curved tail | ZP:0010319 | abnormal tail curvature |
| abnormal tail | abnormally short tail | ZP:0001130 | Possible Short Tail |
| abnormal tail | abnormally short tail | ZP:0001130 | short tail |
| abnormal tail | abnormally short tail | ZP:0001130 | possible short tail |
| abnormal tail | abnormally short tail | ZP:0001130 | reduced tail length |
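
The mapping in the table above can be pictured as a curated lookup from free-text strings to ontology terms. The following is a minimal sketch, not the authors' pipeline: the names `ZPTerm`, `LOOKUP`, `normalize`, and `map_annotation` are hypothetical, the lookup holds only a handful of rows copied from the table, and the case and whitespace normalization is an assumption made for illustration.

```python
from typing import Dict, NamedTuple, Optional


class ZPTerm(NamedTuple):
    """One Zebrafish Phenotype Ontology target (general term, optional granular term, CURIE)."""
    general: str
    granular: Optional[str]
    curie: str


# Targets taken from the example table for the trait "abnormal tail".
ABNORMAL_TAIL = ZPTerm("abnormal tail", None, "ZP:0001129")
TAIL_FIN = ZPTerm("abnormal tail", "abnormal tail fin", "ZP:0004969")
CURVED_TAIL = ZPTerm("abnormal tail", "abnormally curved tail", "ZP:0010319")
SHORT_TAIL = ZPTerm("abnormal tail", "abnormally short tail", "ZP:0001130")

# Hand-curated subset of the verbatim annotations listed in the table.
LOOKUP: Dict[str, ZPTerm] = {
    "malformation tail": ABNORMAL_TAIL,
    "tail deformation": ABNORMAL_TAIL,
    "cfin": TAIL_FIN,
    "ruffled fin": TAIL_FIN,
    "curved tail": CURVED_TAIL,
    "c tail": CURVED_TAIL,
    "c-tail": CURVED_TAIL,
    "short tail": SHORT_TAIL,
    "possible short tail": SHORT_TAIL,
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace (an illustrative assumption)."""
    return " ".join(text.lower().split())


def map_annotation(text: str) -> Optional[ZPTerm]:
    """Return the ontology term for a verbatim annotation, or None if it is unmapped."""
    return LOOKUP.get(normalize(text))
```

Under this sketch, "Cfin" and "cfin" resolve to the same CURIE, while a string missing from the lookup returns `None` and would need manual curation.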
Survey 1: analytical summary.
| General trait | CURIE | Total tags | Unique terms | Granular child traits | Number of larvae | Terms per tag | Terms per trait |
|---|---|---|---|---|---|---|---|
| abnormal | ZP:0005632 | 24 | 7 | 0 | 22 | 0.29 | 7 |
| abnormal axis | ZP:0127724 | 71 | 15 | 1 | 12 | 0.21 | 8 |
| abnormal body length | ZP:0012799 | 82 | 16 | 1 | 18 | 0.20 | 8 |
| abnormal brain | ZP:0000100 | 40 | 7 | 0 | 16 | 0.18 | 7 |
| abnormal eye | ZP:0000943 | 77 | 12 | 1 | 15 | 0.16 | 6 |
| abnormal gut | ZP:0002008 | 8 | 6 | 1 | 4 | 0.75 | 3 |
| abnormal head | ZP:0001609 | 127 | 30 | 3 | 23 | 0.24 | 8 |
| abnormal heart | ZP:0000107 | 249 | 28 | 2 | 20 | 0.11 | 9 |
| abnormal jaw | ZP:0007203 | 153 | 24 | 2 | 24 | 0.16 | 8 |
| abnormal notochord | ZP:0000624 | 49 | 22 | 4 | 8 | 0.45 | 4 |
| abnormal otic vesicle | ZP:0001601 | 41 | 21 | 0 | 8 | 0.51 | 21 |
| abnormal pectoral fin | ZP:0001610 | 14 | 12 | 0 | 13 | 0.86 | 12 |
| abnormal pigmentation | ZP:0015121 | 29 | 12 | 1 | 12 | 0.41 | 6 |
| abnormal snout | ZP:0014550 | 78 | 5 | 0 | 22 | 0.06 | 5 |
| abnormal swim bladder | ZP:0127709 | 221 | 34 | 3 | 20 | 0.15 | 9 |
| abnormal tail | ZP:0001129 | 79 | 33 | 3 | 16 | 0.42 | 8 |
| abnormal trunk | ZP:0003437 | 43 | 3 | 0 | 12 | 0.07 | 3 |
| abnormal yolk | ZP:0002676 | 274 | 53 | 5 | 23 | 0.19 | 9 |
| dead | ZP:0000306 | 2 | 2 | 0 | 1 | 1.00 | 2 |
| necrosis | ZP:0000398 | 12 | 8 | 0 | 5 | 0.67 | 8 |
| normal | | 51 | 17 | 0 | 9 | 0.33 | 17 |
| hatched | | 24 | 1 | | | 0.04 | |
Annotations with green color are those with a high degree of homogeneity and annotations with a blue color are ones that had a high degree of heterogeneity.
Total tags: the number of times the trait, using a general or granular term, was tagged across all larvae and annotators.
Unique terms: the number of unique strings used to describe the trait across all larvae and annotators.
Granular child traits: the number of granular traits that fall under each general trait.
Number of larvae: the number of larvae to which the trait was applied at least once.
Terms per tag: the number of unique terms normalized to the number of times the trait was annotated.
Terms per trait: the number of unique terms normalized to the number of general and granular traits.
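
The two normalized columns can be reproduced from the raw counts. The sketch below is one reading of the definitions above; the function names are hypothetical, and it assumes that "general and granular traits" means one general trait plus its granular children. The functions return unrounded values, whereas the table appears to report terms per tag to two decimals and terms per trait as whole numbers.

```python
def terms_per_tag(unique_terms: int, total_tags: int) -> float:
    """Unique strings divided by the number of times the trait was tagged."""
    return unique_terms / total_tags


def terms_per_trait(unique_terms: int, granular_children: int) -> float:
    """Unique strings divided by (1 general trait + its granular child traits)."""
    return unique_terms / (1 + granular_children)


# Row "abnormal tail": 79 tags, 33 unique terms, 3 granular child traits.
tail_per_tag = terms_per_tag(33, 79)    # about 0.42, as in the table
tail_per_trait = terms_per_trait(33, 3)  # 8.25, reported as 8
```

A value of terms per tag near 1 (for example "abnormal pectoral fin" at 0.86) means nearly every tag used a different string, which is the heterogeneity the ontology mapping is meant to reduce.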
Intraclass correlation repeatability estimate.
| Annotation granularity | Grouping factor | Survey 1 | Survey 2 |
|---|---|---|---|
| General | Larva | 0.092 | 0.159 |
| General | Annotation | 0.208 | 0.263 |
| General | Rater | 0.043 | 0.047 |
| Granular | Larva | 0.038 | 0.068 |
| Granular | Annotation | 0.150 | 0.111 |
| Granular | Rater | 0.016 | 0.019 |
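
For readers unfamiliar with intraclass correlation, the idea is the share of total variance attributable to the grouping factor (larva, annotation, or rater). This record does not state which estimator produced the values above, so the following is only an illustrative sketch of the classical one-way ANOVA estimator for balanced groups, ICC = (MSB − MSW) / (MSB + (k − 1)·MSW); the function name `icc_oneway` and the toy data are assumptions.

```python
from typing import List


def icc_oneway(groups: List[List[float]]) -> float:
    """One-way ANOVA intraclass correlation for balanced groups.

    `groups` holds one list of observations per level of the grouping
    factor; every list must have the same length k >= 2.
    """
    n = len(groups)
    k = len(groups[0])
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need at least 2 balanced groups of size >= 2")

    grand_mean = sum(sum(g) for g in groups) / (n * k)
    group_means = [sum(g) / k for g in groups]

    # Between-group and within-group mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    msw = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    ) / (n * (k - 1))

    denom = msb + (k - 1) * msw
    if denom == 0:
        raise ValueError("no variance in the data; ICC is undefined")
    return (msb - msw) / denom


# Toy binary calls (1 = trait tagged): all variance lies between groups.
perfect = icc_oneway([[1, 1], [0, 0]])
```

Values close to 0, like most of those in the table, indicate that the grouping factor explains little of the variation in the calls.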
FIGURE 3. Mean rater concordance using general phenotype terms. These boxplots show the mean concordance (x axis and red or blue bar in shaded box) of the raters by larva (A) or by annotation (B) with interquartile range indicated by shaded area (first to third quartiles). Data from Survey 1 are in red and from Survey 2 are in blue. The dashed whiskers denote the data that are within 1.5 times the interquartile range, with circles annotating data outside that range. Please note that larvae 7 and 8 did not exist. No data were discarded.
FIGURE 4. Mean rater concordance using granular phenotype terms. These boxplots show the mean concordance (x axis and red or blue bar in shaded box) of the raters by larva (A) or by annotation (B) with interquartile range indicated by shaded area (first to third quartiles). Data from Survey 1 are in red and from Survey 2 are in blue. The dashed whiskers denote the data that are within 1.5 times the interquartile range, with circles annotating data outside that range. Please note that larvae 7 and 8 did not exist. No data were discarded.
FIGURE 5. Concordance change for general phenotype terms. Concordance here represents the frequency for which a particular rater (identified along x axis) made the same annotation as the majority of raters. The “concordance change” is calculated as the number of concordant annotations for Survey 1 subtracted from those for Survey 2 (maximum range is from −24 to 24). An increase in concordance is indicated by blue and a decrease is indicated by red. Both the annotation and rater labels have the overall mean concordance for both surveys in parentheses, with a color-coded change in mean concordance from Survey 1 to Survey 2 below. Note that the lower bound for the annotation mean concordance is 50%, but the rater lower bound is 0%. Axes are sorted by overall mean concordance values. Significant changes in concordance as determined by Fisher's exact tests are indicated by an asterisk.
FIGURE 6. Concordance change for granular phenotype terms. Concordance here means the rater made the same annotation as the majority of raters. The “concordance change” is the difference between the number of concordant calls for Survey 2 and those for Survey 1 (maximum values would range from −24 to 24). An increase in concordance is indicated by blue and a decrease is indicated by red. The annotation and rater labels have the overall mean concordance in parentheses (combines both surveys), and a color-coded change in mean concordance from Survey 1 to Survey 2 just below. Note that the lower bound for the annotation mean concordance is 50%, but the rater lower bound is 0%. Axes are sorted by overall mean concordance values. Significant changes in concordance as determined by Fisher's exact tests are indicated by an asterisk.
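The concordance change described in these two captions can be sketched for a single annotation as follows. This is an illustration under stated assumptions, not the authors' code: the function names and the data layout (one list of binary calls per rater, one entry per larva) are hypothetical, and breaking ties toward "not tagged" is an assumption because the captions do not say how ties were handled.

```python
from typing import Dict, List


def concordant_count(calls: Dict[str, List[bool]], rater: str) -> int:
    """Count larvae on which `rater` made the same call as the majority of raters.

    `calls` maps each rater ID to a list of binary calls (True = trait
    tagged), one entry per larva, for a single annotation.
    """
    count = 0
    for i, own_call in enumerate(calls[rater]):
        votes = [rater_calls[i] for rater_calls in calls.values()]
        # Strict majority; a tie resolves to False (assumption).
        majority = 2 * sum(votes) > len(votes)
        if own_call == majority:
            count += 1
    return count


def concordance_change(
    survey1: Dict[str, List[bool]], survey2: Dict[str, List[bool]], rater: str
) -> int:
    """Concordant calls in Survey 2 minus concordant calls in Survey 1."""
    return concordant_count(survey2, rater) - concordant_count(survey1, rater)


# Toy example with three raters and four larvae (made-up calls).
s1 = {
    "r1": [True, False, True, False],
    "r2": [True, True, False, False],
    "r3": [False, True, True, False],
}
s2 = {r: [True, True, True, False] for r in ("r1", "r2", "r3")}
change_r3 = concordance_change(s1, s2, "r3")
```

With 24 larvae per survey, the count per survey lies between 0 and 24, which gives the −24 to 24 range quoted in the captions.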
FIGURE 7. Mean concordance and variability in endpoint reporting. Endpoints that were described using a higher number of unique terms (A) and were observed in more larvae (B) had a lower mean concordance across both surveys (filled circles). The change in concordance from Survey 1 to Survey 2 did not share this relationship (open circles).
FIGURE 8. Expanding a data set using a knowledge graph. The zebrafish endpoint “microcephaly” can be used to query the Monarch knowledge graph to find relevant genes (rpl11 and rps3a), variants (hi3820bTg), diseases (Diamond-Blackfan anemia), and biological processes (hemopoiesis) to enrich the data set and generate new hypotheses.