Yu Li, Hongfei Cao, Carla M Allen, Xin Wang, Sanda Erdelez, Chi-Ren Shyu.
Abstract
Visual reasoning is critical in many complex visual tasks in medicine such as radiology or pathology. It is challenging to explicitly explain reasoning processes due to the dynamic nature of real-time human cognition. A deeper understanding of such reasoning processes is necessary for improving diagnostic accuracy and computational tools. Most computational analysis methods for visual attention utilize black-box algorithms which lack explainability and are therefore limited in understanding the visual reasoning processes. In this paper, we propose a computational method to quantify and dissect visual reasoning. The method characterizes spatial and temporal features and identifies common and contrast visual reasoning patterns to extract significant gaze activities. The visual reasoning patterns are explainable and can be compared among different groups to discover strategy differences. Experiments with radiographers of varied levels of expertise on 10 levels of visual tasks were conducted. Our empirical observations show that the method can capture the temporal and spatial features of human visual attention and distinguish expertise level. The extracted patterns are further examined and interpreted to showcase key differences between expertise levels in the visual reasoning processes. By revealing task-related reasoning processes, this method demonstrates potential for explaining human visual understanding.
Year: 2020 PMID: 33303770 PMCID: PMC7730148 DOI: 10.1038/s41598-020-77550-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The system architecture of visual reasoning quantification and extraction.
p-values of the Mann–Whitney test for all group comparisons, using the overall spatial distance and the Kullback–Leibler temporal distance.
| Task | Spatial: EE/EN | Spatial: EE/EJ | Spatial: EE/ES | Spatial: EJ/ES | Temporal: EE/EN | Temporal: EE/EJ | Temporal: EE/ES | Temporal: EJ/ES |
|---|---|---|---|---|---|---|---|---|
| 1 | < | < | < | < | < | |||
| 2 | < | < | < | 0.693 | 0.703 | |||
| 3 | < | < | < | < | < | |||
| 4 | < | < | 0.228 | < | < | < | < | |
| 5 | 0.074 | 0.699 | < | < | ||||
| 6 | < | < | < | 0.656 | < | < | < | 0.365 |
| 7 | < | < | < | 0.097 | < | < | < | |
| 8 | < | < | 0.498 | < | < | < | 0.502 | |
| 9 | < | < | < | < | 0.143 | |||
| 10 | 0.729 | 0.056 | 0.553 | |||||
Bold numbers highlight a p-value smaller than 0.05.
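
The two statistics behind this table can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: it assumes the temporal features of each group have already been binned into histograms over the same bins, and the function names `kl_divergence` and `mann_whitney_u` as well as the smoothing constant `eps` are hypothetical choices made for the example. The sketch computes only the U statistic; for a p-value, a statistics library such as SciPy (`scipy.stats.mannwhitneyu`) would be the usual choice.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """Kullback-Leibler divergence KL(P || Q) in nats.

    p and q are histograms (counts or probabilities) over the same bins.
    eps is an assumed smoothing constant that avoids log(0) and division
    by zero; both inputs are renormalised after smoothing.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    zp, zq = sum(p_s), sum(q_s)
    return sum((a / zp) * math.log((a / zp) / (b / zq))
               for a, b in zip(p_s, q_s))


def mann_whitney_u(xs, ys):
    """Mann-Whitney U statistic of sample xs against sample ys.

    Counts the pairs (x, y) with x > y; each tie contributes 0.5.
    This brute-force form is O(len(xs) * len(ys)), which is fine for
    small samples.
    """
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u
```

Note that the KL divergence is asymmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ; whether the paper symmetrises it is not stated in this record.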
Number of contrast patterns, average and maximum pattern lengths, average pairwise Levenshtein and ROI distances between patterns within a group, average support value, and percentage of patterns with infinite growth (subsequences unique to the group) for the contrast visual reasoning patterns extracted for the Expert and Novice groups in the Expert vs. Novice comparison.
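| Task | Expert: patterns | Expert: avg. length | Expert: max. length | Expert: avg. Levenshtein distance | Expert: avg. ROI distance | Expert: avg. support | Expert: infinite growth | Novice: patterns | Novice: avg. length | Novice: max. length | Novice: avg. Levenshtein distance | Novice: avg. ROI distance | Novice: avg. support | Novice: infinite growth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|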
| 1 | 19 | 8.42 | 16 | 8.532 | 0.604 | 0.407 | 68.4% | 28 | 7.86 | 16 | 7.780 | 0.636 | 0.357 | 75.0% |
| 2 | 101 | 7.59 | 16 | 7.671 | 0.769 | 0.272 | 76.2% | 41 | 6.20 | 13 | 6.690 | 0.806 | 0.296 | 92.7% |
| 3 | 115 | 8.64 | 16 | 7.955 | 0.536 | 0.275 | 53.9% | 71 | 7.00 | 10 | 6.601 | 0.644 | 0.326 | 71.8% |
| 4 | 28 | 8.07 | 13 | 7.722 | 0.722 | 0.281 | 46.4% | 47 | 8.98 | 13 | 6.669 | 0.313 | 0.318 | 63.8% |
| 5 | 41 | 7.95 | 16 | 7.706 | 0.589 | 0.289 | 53.7% | 36 | 6.92 | 13 | 7.148 | 0.671 | 0.326 | 75.0% |
| 6 | 50 | 8.26 | 16 | 8.607 | 0.755 | 0.269 | 42.0% | 54 | 6.33 | 13 | 6.628 | 0.776 | 0.322 | 53.7% |
| 7 | 97 | 9.41 | 19 | 8.619 | 0.565 | 0.256 | 43.3% | 34 | 6.38 | 13 | 6.783 | 0.793 | 0.321 | 73.5% |
| 8 | 9 | 4.33 | 7 | 4.528 | 0.912 | 0.252 | 77.8% | 28 | 4.54 | 7 | 4.667 | 0.836 | 0.318 | 71.4% |
| 9 | 35 | 7.26 | 13 | 8.525 | 0.616 | 0.338 | 57.1% | 65 | 5.85 | 10 | 6.974 | 0.682 | 0.362 | 70.8% |
| 10 | 65 | 9.17 | 16 | 6.793 | 0.627 | 0.288 | 38.5% | 37 | 7.24 | 13 | 6.259 | 0.796 | 0.295 | 62.2% |
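
The pattern statistics in this table can be illustrated with a short Python sketch. It is written under stated assumptions and is not the authors' code: each eye movement sequence is assumed to be a list of ROI labels, a pattern is assumed to match when its labels occur in order but not necessarily contiguously, and the helper names (`levenshtein`, `contains`, `support`, `growth_rate`) are hypothetical. A pattern has infinite growth when it occurs in the target group but never in the other group.

```python
import math


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def contains(sequence, pattern):
    """True if pattern occurs in sequence in order (gaps allowed; an assumption)."""
    it = iter(sequence)
    # Each pattern element must be found in what remains of the sequence.
    return all(any(s == p for s in it) for p in pattern)


def support(sequences, pattern):
    """Fraction of sequences in a group that contain the pattern."""
    if not sequences:
        return 0.0
    return sum(contains(s, pattern) for s in sequences) / len(sequences)


def growth_rate(target, other, pattern):
    """Support in the target group divided by support in the other group."""
    s_target = support(target, pattern)
    s_other = support(other, pattern)
    if s_other == 0:
        return math.inf if s_target > 0 else 0.0
    return s_target / s_other


# Hypothetical ROI label sequences, for illustration only.
experts = [["R", "label", "bone"], ["R", "bone", "label"]]
novices = [["bone", "edge"], ["edge", "bone"]]
expert_growth = growth_rate(experts, novices, ["R", "label"])
```

In this toy data the pattern `["R", "label"]` appears in every expert sequence and in no novice sequence, so `expert_growth` is infinite, which is the kind of pattern the last column of each group counts.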
10-Level knowledge structure with corresponding experiment questions.
| Task | Visual Knowledge Level | Questions |
|---|---|---|
| 1 | Type Technique | What is the modality of this image? |
| 2 | Global Distribution | Describe the overall photographic properties of this image. |
| 3 | Local Structure | What basic textual elements do you identify on this image? |
| 4 | Global Composition | How do you orient yourself to this image? |
| 5 | Generic Objects | What body part does this image demonstrate? |
| 6 | Generic Scene | What is the projection of this image? |
| 7 | Specific Objects | Identify 3 foreign objects on this image |
| 8 | Specific Scene | Evaluate the positioning of this image. |
| 9 | Abstract Objects | Describe this patient based on what you see in this image |
| 10 | Abstract Scene | What problem(s) do you think this patient has? |
Figure 2. Example eye movement sequences for Task 4. ROIs are marked with yellow lines. The connected blue lines and circles on each image show the entire eye movement sequence of one participant. Each circle is a fixation in the sequence, and its size represents the fixation duration. The red subsequences highlight the contrast pattern of the experts, who check the R marks and the label. The sequences are plotted with Matplotlib[43].
Figure 3. Example eye movement sequences for Task 7. The red subsequences of experts show close examination of the potential diseased area, while the novices are distracted by the technical imperfections at the left edge (green subsequences).
Figure 4. Example eye movement sequences for Task 10. Unlike novices, the experts are not distracted by the non-anatomical structures and closely examine the fracture of the humeral head as shown by the red subsequences.