Shaofu Lin, Zhe Xu, Ying Sheng, Lihong Chen, Jianhui Chen.
Abstract
Provenance is a research focus in neuroimaging resource sharing. A considerable amount of work has been done to construct high-quality neuroimaging provenance in a standardized and convenient way. However, beyond existing process-based provenance extraction methods, open research sharing in computational neuroscience still needs a way to extract provenance information from rapidly growing published resources. This paper proposes a literature mining-based approach for research sharing-oriented neuroimaging provenance construction. A group of neuroimaging events containing attributes is defined to model the whole process of neuroimaging research, and a joint extraction model based on deep adversarial learning, called AT-NeuroEAE, is proposed to perform event extraction in a few-shot learning scenario. Finally, a group of experiments was performed on a real data set from the journal PLOS ONE. Experimental results show that the proposed method provides a practical approach to quickly collect research information for neuroimaging provenance construction oriented to open research sharing.
Keywords: attribute extraction; deep adversarial learning; event extraction; neuroimaging provenance; neuroimaging text mining
Year: 2022 PMID: 35321479 PMCID: PMC8936590 DOI: 10.3389/fnins.2021.739535
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
FIGURE 1. The whole process of provenance information extraction. (1) Brain informatics provenance modeling: construct an improved BI provenance model to capture the provenance requirements of research sharing in open and FAIR neuroscience. (2) Neuroimaging event definition: define a group of neuroimaging events to transform the BI provenance model into text mining tasks. (3) Corpus extraction and annotation: construct a group of labeled corpora for model training and testing. (4) Neuroimaging event extraction: develop the NeuroEAE model to extract the defined neuroimaging events and so meet the provenance requirements in open and FAIR neuroscience.
FIGURE 2. Diverse attributes of subjects in different articles. In example 1 (Daniel et al., 2014), the attributes of subjects include medication history, age, medical history, and gender. In example 2 (Lanting et al., 2014), the attributes of subjects include health condition and medical history.
FIGURE 3. The BI provenance model. It uses six activities (rectangles), namely “BI:PerformExperiment,” “nidm:Acquisition,” “BI:PerformAnalysis,” “BI:Activate,” “BI:Deactivate,” and “BI:Effect,” to characterize experimental design, analytical process, and results. Six entities (circles), three agents (hexagons), and three attributes (diamonds) are connected to these activities to describe various key factors in the research process.
Neuroimaging event category.
| Category | Definition | Description |
| Activate | [{activate…}, <{Cognitive Function}, cause>, <{Brain Area}, affect>+] | An “Activate” event describes the appearance of active states in specific brain areas, produced by the execution of a cognitive function. |
| Deactivate | [{deactivate…}, <{Cognitive Function}, cause>, <{Brain Area}, affect>+] | A “Deactivate” event describes the appearance of inactive states in specific brain areas, produced by the execution of a cognitive function. |
| Effect | [{effect \| influence…}, <{Cognitive Function}, cause>, <{Brain Area}, affect>+] | An “Effect” event describes a change of state in specific brain areas, produced by the execution of a cognitive function, where it is unknown whether the areas are activated or deactivated. |
| Perform experiment | [{perform \| complete…}, <{Study Participant (Study Participant_A)*}, participates in>*, <{Task}, uses>+, <{Stimuli Response Mode (Stimuli Response Mode_A)*}*, by>] | A “Perform Experiment” event describes a neuroimaging research action by a group of study participants, i.e., subjects perform one or several experimental tasks through some kind of stimuli response mode. Study participants often have attributes, such as age, gender, and medical history. The stimuli response mode also has features, such as the perception channel and the stimuli task category. |
| Acquisition | [{assess \| acquire \| obtain…}, <{Acquisition Object}, produces>*, <{Data Acquisition Device (Data Acquisition Device_A)*}, uses>+, <{Study Participant (Study Participant_A)*}, from>*] | An “Acquisition” event describes a research action in which a data acquisition device with relevant parameters produces physiological and psychological data from study participants. |
| Perform analysis | [{perform \| complete…}, <{Analytical Results}, produces>+, <{Acquisition Object}, on>*, <{Analytical Tools or Methods}, uses>+] | A “Perform Analysis” event describes a research action in which an analytical tool or method is applied to physiological and psychological data to produce a group of analytical results, i.e., brain responses, such as the Default Mode Network (DMN). |
“*” indicates that the element may occur zero or more times and “+” indicates that the element may occur one or more times.
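The event definitions above can be read as a small schema: a category plus role slots, each with a cardinality marker. The following is a minimal sketch of that reading for the “Activate” category. The names `RoleSlot`, `EventSchema`, and `validate` are hypothetical, and treating an unmarked slot as "exactly once" is an assumption, since the footnote only defines “*” and “+”.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleSlot:
    role: str         # event role, e.g. "cause"
    argument: str     # argument category, e.g. "Cognitive Function"
    marker: str = ""  # "*", "+", or "" (unmarked)

    def allows(self, count: int) -> bool:
        if self.marker == "*":
            return count >= 0   # zero or more
        if self.marker == "+":
            return count >= 1   # one or more
        return count == 1       # assumption: unmarked means exactly once


@dataclass(frozen=True)
class EventSchema:
    category: str
    slots: tuple

    def validate(self, arguments):
        """arguments: list of (role, argument category) pairs.

        Returns the roles whose cardinality constraint is violated.
        """
        violated = []
        for slot in self.slots:
            count = sum(
                1 for role, category in arguments
                if role == slot.role and category == slot.argument
            )
            if not slot.allows(count):
                violated.append(slot.role)
        return violated


# "Activate": one cause (Cognitive Function), one or more affected Brain Areas.
ACTIVATE = EventSchema(
    "Activate",
    (
        RoleSlot("cause", "Cognitive Function"),
        RoleSlot("affect", "Brain Area", "+"),
    ),
)

complete = [("cause", "Cognitive Function"), ("affect", "Brain Area")]
missing_area = [("cause", "Cognitive Function")]
print(ACTIVATE.validate(complete))
print(ACTIVATE.validate(missing_area))
```

The other five categories could be encoded the same way by listing their slots with the markers given in the table.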
Argument categories.
| Argument category | Description |
| Cognitive function (COG) | A cognitive function is an ability of the human brain to process information; it denotes the brain function implied by brain responses in computational neuroscience research. |
| Brain area (BRI) | A brain area is an anatomical region of the cerebral cortex; it marks the location where a brain response occurs in computational neuroscience research. |
| Data acquisition device (ACQ) | A data acquisition device is a kind of brain testing equipment used in computational neuroscience research. |
| Stimuli response mode (SEN) | The stimuli response mode denotes the sensory channel of stimulus presentation in computational neuroscience research. |
| Study participant (STP) | A study participant is a participant in computational neuroscience research from whom behavioral or brain physiological data are recorded. |
| Task (TSK) | An experimental task is a task (e.g., questions or games) performed by the study participant in computational neuroscience research. |
| Acquisition object (AOB) | An acquisition object is a kind of physiological and psychological data collected by the data acquisition device in computational neuroscience research. |
| Analytical tools or methods (TOL) | An analytical tool or method is a mining algorithm or tool used to analyze experimental data in computational neuroscience research. |
| Analytical results (RLT) | The analytical results are a series of brain responses mined from experimental data in computational neuroscience research. |
FIGURE 4. An example of an “Acquisition” event from the article (Mutschler et al., 2016). This event consists of the trigger word “using,” two arguments of the “Data Acquisition Device” category, “fMRI” and “SCR,” and one argument of the “Study Participant” category, “infant.”
The number of terms in dictionaries.
| Element type | Element | Term number | Element | Term number |
| Trigger word | Acquisition | 65 | Effect | 145 |
| | Perform experiment | 62 | Deactivate | 20 |
| | Perform analysis | 227 | Activate | 12 |
| Argument | BRI | 576 | SEN | 52 |
| | COG | 812 | STP | 97 |
| | AOB | 48 | TOL | 171 |
| | ACQ | 57 | TSK | 814 |
| | RLT | 28 | | |
| Argument | Data acquisition device_A | 33 | Study participant_A | 7 |
| | Stimuli response mode_A | 24 | | |
FIGURE 5. The distribution of event mentions in the experimental data set. It consists of 3331 event mentions extracted from 677 neuroimaging articles. The “Activate” category includes 788 mentions and accounts for 24% of the total. The “Deactivate” category includes 128 mentions and accounts for 4% of the total. The “Effect” category includes 1169 mentions and accounts for 35% of the total. The “Perform experiment” category includes 665 mentions and accounts for 20% of the total. The “Acquisition” category includes 266 mentions and accounts for 8% of the total. The “Perform Analysis” category includes 315 mentions and accounts for 9% of the total.
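The shares quoted in this caption follow directly from the mention counts. A short arithmetic check, using only the numbers given in the caption (the variable names are illustrative):

```python
# Event mention counts as stated in the Figure 5 caption.
counts = {
    "Activate": 788,
    "Deactivate": 128,
    "Effect": 1169,
    "Perform experiment": 665,
    "Acquisition": 266,
    "Perform analysis": 315,
}

total = sum(counts.values())

# Share of each category, rounded to the nearest whole percent.
shares = {name: round(100 * n / total) for name, n in counts.items()}

print(total)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The counts sum to the stated 3331 mentions, and the rounded shares match the percentages in the caption. The class imbalance is visible here: “Deactivate” has roughly a ninth as many mentions as “Effect”.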
Event element annotation.
| Annotation example | ||||||||
| High | fMRI | Was | Obtained | Through | The | Central | Sulcus | |
| O | O | B-AOB | O | B-Acq | O | O | B-BRI | I-BRI |
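The label row above follows a BIO scheme: `B-` opens an element, `I-` continues it, and `O` marks tokens outside any element. A minimal sketch of how such a label sequence can be turned back into labeled spans is given below. The function name `bio_to_spans` is hypothetical, and spans are reported as zero-based, end-exclusive token indices rather than words.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (start, end, label) spans.

    Indices are zero-based and `end` is exclusive. An I- tag that does not
    continue an open span of the same label is ignored.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag with the same label extends the currently open span.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Anything else closes the open span, if there is one.
        if start is not None:
            spans.append((start, i, label))
            start, label = None, None
        # A B- tag opens a new span.
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


# The nine labels from the annotation example.
TAGS = ["O", "O", "B-AOB", "O", "B-Acq", "O", "O", "B-BRI", "I-BRI"]
print(bio_to_spans(TAGS))
```

For the example labels this yields three elements: a one-token AOB span, a one-token Acq span, and a two-token BRI span covering the last two positions.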
Event role and attribute annotation.
| Position | Word | Event role | Role/attribute position |
| 1 | High | ||
| 2 | Produces | ||
| 3 | fMRI | Produces | [2] |
| 4 | Was | ||
| 5 | Obtained | Trigger | [3,8] |
| 6 | Through | ||
| 7 | The | ||
| 8 | Central | From | |
| 9 | Sulcus | From |
FIGURE 6. The AT-NeuroEAE model. The text vectorization layer encodes sentences as textual vectors based on lexical units, case features, and domain terminology dictionaries. The event element prediction layer predicts the potential event elements by using the BiLSTM-CRF model. The role-attribute recognition layer identifies the role and attribute of each argument by using the sigmoid function. The adversarial learning mechanism adds small and persistent disturbances to the input of the joint model to improve the robustness and generalization of the model. BiLSTM: bi-directional long short-term memory; CRF: conditional random fields; adv: adversarial learning.
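The idea of adding a small disturbance to the model input can be illustrated with a gradient-based perturbation. The sketch below is an assumption and not the formulation confirmed by this record: it replaces the joint model with a toy linear scorer with a sigmoid output, and the names `loss_and_grad`, `adversarial_input`, and `epsilon` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(x, w, y):
    """Binary cross-entropy of a toy scorer sigmoid(w . x) and its
    gradient with respect to the input vector x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = [(p - y) * wi for wi in w]  # dL/dx = (p - y) * w
    return loss, grad


def adversarial_input(x, w, y, epsilon=0.1):
    """Shift x by a step of length epsilon along the normalized loss gradient."""
    _, grad = loss_and_grad(x, w, y)
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(x)
    return [xi + epsilon * g / norm for xi, g in zip(x, grad)]


# Toy input vector, weights, and gold label (all illustrative values).
x = [0.5, -1.0, 0.25]
w = [1.0, 2.0, -0.5]
y = 1

clean_loss, _ = loss_and_grad(x, w, y)
x_adv = adversarial_input(x, w, y)
adv_loss, _ = loss_and_grad(x_adv, w, y)
print(clean_loss, adv_loss)
```

The perturbed input has a higher loss than the clean one. In adversarial training of this kind, the model is typically optimized on both the clean and the perturbed inputs, which is one way to obtain the robustness the caption refers to.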
F1 values of event extraction.
| Event category | CNN-BiLSTM-PCNN | GCN | NeuroEAE | AT-NeuroEAE |
| Acquisition | 0.588 | 0.542 | 0.689 | 0.734 |
| Perform analysis | 0.448 | 0.596 | 0.547 | 0.671 |
| Perform experiment | 0.368 | 0.429 | 0.612 | 0.853 |
| Effect | 0.569 | 0.576 | 0.690 | 0.692 |
| Activate | 0.614 | 0.682 | 0.809 | 0.756 |
| Deactivate | 0.922 | 0.952 | 0.974 | 0.608 |
F1 values in fivefold cross validation.
| Event category | 1 | 2 | 3 | 4 | 5 | Average |
| Acquisition | 0.700 | 0.720 | 0.768 | 0.763 | 0.722 | 0.734 |
| Perform analysis | 0.676 | 0.674 | 0.699 | 0.610 | 0.696 | 0.671 |
| Perform experiment | 0.889 | 0.837 | 0.863 | 0.871 | 0.808 | 0.853 |
| Effect | 0.724 | 0.686 | 0.696 | 0.734 | 0.618 | 0.692 |
| Activate | 0.772 | 0.756 | 0.740 | 0.781 | 0.728 | 0.756 |
| Deactivate | 0.639 | 0.613 | 0.608 | 0.600 | 0.583 | 0.608 |
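The "Average" column can be checked against the five fold values. The sketch below recomputes each mean from the table; the tolerance of 0.001 is an assumption that allows for the reported averages having been computed from unrounded fold scores.

```python
from statistics import mean

# Per-fold F1 values and reported averages, copied from the table above.
FOLD_F1 = {
    "Acquisition": ([0.700, 0.720, 0.768, 0.763, 0.722], 0.734),
    "Perform analysis": ([0.676, 0.674, 0.699, 0.610, 0.696], 0.671),
    "Perform experiment": ([0.889, 0.837, 0.863, 0.871, 0.808], 0.853),
    "Effect": ([0.724, 0.686, 0.696, 0.734, 0.618], 0.692),
    "Activate": ([0.772, 0.756, 0.740, 0.781, 0.728], 0.756),
    "Deactivate": ([0.639, 0.613, 0.608, 0.600, 0.583], 0.608),
}

# Absolute gap between the recomputed mean and the reported average.
gaps = {
    category: abs(mean(folds) - reported)
    for category, (folds, reported) in FOLD_F1.items()
}
max_gap = max(gaps.values())

for category, (folds, reported) in FOLD_F1.items():
    print(f"{category}: recomputed {mean(folds):.4f}, reported {reported}")
```

Every recomputed mean lies within 0.001 of the reported average.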
Running times in fivefold cross validation.
| Experiment | 1 | 2 | 3 | 4 | 5 | Average |
| Model training (s) | 17194.8 | 17601.7 | 17331.0 | 17425.7 | 17191.0 | 17348.8 |
| Model test (s) | 7.46 | 5.95 | 6.79 | 6.98 | 6.18 | 6.7 |
Event element prediction.
| Element type | Element | CNN-BiLSTM-PCNN P | CNN-BiLSTM-PCNN R | CNN-BiLSTM-PCNN F1 | NeuroEAE P | NeuroEAE R | NeuroEAE F1 | AT-NeuroEAE P | AT-NeuroEAE R | AT-NeuroEAE F1 |
| Trigger word | Acquisition | 0.864 | 0.888 | 0.876 | 0.951 | 0.925 | 0.938 | 0.972 | 0.972 | 0.972 |
| | Perform experiment | 0.841 | 0.803 | 0.821 | 0.851 | 0.802 | 0.826 | 0.915 | 0.915 | 0.915 |
| | Analytical results | 0.879 | 0.79 | 0.832 | 0.981 | 0.879 | 0.927 | 0.843 | 0.931 | 0.885 |
| | Effect | 0.958 | 0.846 | 0.898 | 0.956 | 0.945 | 0.951 | 0.951 | 0.951 | 0.951 |
| | Deactivate | 0.821 | 1 | 0.902 | 1 | 0.958 | 0.978 | 1 | 1 | 1 |
| | Activate | 1 | 0.985 | 0.993 | 0.889 | 1 | 0.941 | 0.971 | 0.985 | 0.978 |
| Argument | BRI | 0.863 | 0.871 | 0.867 | 0.929 | 0.908 | 0.919 | 0.921 | 0.908 | 0.915 |
| | COG | 0.744 | 0.703 | 0.723 | 0.769 | 0.704 | 0.735 | 0.861 | 0.789 | 0.824 |
| | AOB | 0.941 | 0.899 | 0.92 | 0.975 | 0.91 | 0.941 | 0.988 | 0.943 | 0.965 |
| | ACQ | 0.962 | 0.962 | 0.962 | 0.897 | 1 | 0.897 | 1 | 1 | 1 |
| | RLT | 0.963 | 0.897 | 0.929 | 1 | 0.896 | 0.945 | 0.964 | 0.931 | 0.947 |
| | SEN | 0.972 | 0.648 | 0.778 | 0.895 | 0.781 | 0.834 | 0.959 | 0.854 | 0.903 |
| | STP | 0.744 | 0.727 | 0.736 | 0.851 | 0.909 | 0.879 | 0.811 | 0.977 | 0.886 |
| | TOL | 0.776 | 0.844 | 0.809 | 0.864 | 0.711 | 0.78 | 0.907 | 0.867 | 0.886 |
| | TSK | 0.982 | 0.873 | 0.924 | 0.936 | 0.936 | 0.937 | 0.953 | 0.968 | 0.961 |
| Argument attribute | ACQ_A | 0.667 | 0.905 | 0.776 | 0.944 | 0.809 | 0.871 | 0.9 | 0.857 | 0.878 |
| | SEN_A | 0.744 | 0.806 | 0.773 | 0.794 | 0.75 | 0.771 | 0.965 | 0.778 | 0.862 |
| | STP_A | 0.962 | 0.833 | 0.893 | 0.964 | 0.9 | 0.931 | 1 | 0.966 | 0.983 |
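Each model contributes three values per row. Assuming these are precision, recall, and F1 in that order, the third value should be the harmonic mean of the first two. The sketch below checks this for three rows of the table; the helper name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (description, first value, second value, third value) copied from the table.
ROWS = [
    ("Acquisition trigger, CNN-BiLSTM-PCNN", 0.864, 0.888, 0.876),
    ("Deactivate trigger, CNN-BiLSTM-PCNN", 0.821, 1.0, 0.902),
    ("STP_A attribute, AT-NeuroEAE", 1.0, 0.966, 0.983),
]

for name, p, r, reported in ROWS:
    print(f"{name}: computed {f1_score(p, r):.3f}, reported {reported}")
```

For these rows the recomputed value agrees with the third column to three decimals. Some other rows differ in the last digit, which would be expected if the published F1 values were computed from unrounded precision and recall.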
Role-attribute recognition.
| Event role | CNN-BiLSTM-PCNN P | CNN-BiLSTM-PCNN R | CNN-BiLSTM-PCNN F1 | NeuroEAE P | NeuroEAE R | NeuroEAE F1 | AT-NeuroEAE P | AT-NeuroEAE R | AT-NeuroEAE F1 |
| Acquisition-uses | 0.597 | 0.74 | 0.661 | 0.708 | 0.667 | 0.687 | 0.766 | 0.706 | 0.735 |
| Acquisition-produces | 0.558 | 0.898 | 0.688 | 0.772 | 0.809 | 0.791 | 0.8 | 0.762 | 0.781 |
| Perform analysis-on | 0.571 | 0.667 | 0.615 | 0.681 | 0.687 | 0.684 | 0.729 | 0.794 | 0.761 |
| Perform analysis-produces | 0.5 | 0.692 | 0.581 | 0.686 | 0.528 | 0.597 | 0.687 | 0.632 | 0.658 |
| Perform analysis-uses | 0.7 | 0.333 | 0.451 | 0.667 | 0.5 | 0.571 | 0.642 | 0.843 | 0.729 |
| Perform experiment-by | 0.471 | 0.696 | 0.562 | 0.56 | 0.583 | 0.571 | 0.619 | 0.541 | 0.578 |
| Perform experiment-participates in | 0.3 | 0.462 | 0.364 | 0.416 | 0.357 | 0.384 | 0.462 | 0.428 | 0.444 |
| Perform experiment-uses | 0.511 | 0.649 | 0.571 | 0.56 | 0.736 | 0.636 | 0.591 | 0.684 | 0.634 |
| Effect-affect | 0.75 | 0.788 | 0.768 | 0.711 | 0.821 | 0.762 | 0.832 | 0.806 | 0.818 |
| Effect-cause | 0.649 | 0.649 | 0.649 | 0.677 | 0.677 | 0.677 | 0.731 | 0.686 | 0.708 |
| Deactivate-affect | 0.875 | 0.966 | 0.918 | 1 | 1 | 1 | 1 | 1 | 1 |
| Deactivate-cause | 0.905 | 0.95 | 0.918 | 0.952 | 0.952 | 0.952 | 1 | 1 | 1 |
| Activate-affect | 0.862 | 0.812 | 0.836 | 0.833 | 0.864 | 0.848 | 0.861 | 0.765 | 0.81 |
| Activate-cause | 0.566 | 0.625 | 0.594 | 0.676 | 0.588 | 0.625 | 0.667 | 0.549 | 0.602 |
| ACQ-attribute | 0.594 | 0.92 | 0.867 | 0.956 | 0.709 | 0.814 | 0.806 | 0.806 | 0.806 |
| SEN-attribute | 0.385 | 0.385 | 0.385 | 1 | 0.526 | 0.689 | 0.9 | 0.346 | 0.5 |
| STP-attribute | 1 | 0.815 | 0.898 | 0.692 | 0.62 | 0.654 | 0.772 | 0.586 | 0.667 |
Comparison of pipeline models with and without cascading errors.
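| Event role | CNN-BiLSTM-PCNN (no cascading errors) P | CNN-BiLSTM-PCNN (no cascading errors) R | CNN-BiLSTM-PCNN (no cascading errors) F1 | CNN-BiLSTM-PCNN P | CNN-BiLSTM-PCNN R | CNN-BiLSTM-PCNN F1 |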
| Acquisition-uses | 0.657 | 0.98 | 0.951 | 0.597 | 0.74 | 0.661 |
| Acquisition-produces | 0.612 | 1 | 0.759 | 0.558 | 0.898 | 0.688 |
| Perform analysis-on | 0.695 | 0.842 | 0.761 | 0.571 | 0.667 | 0.615 |
| Perform analysis-produces | 0.692 | 0.857 | 0.765 | 0.5 | 0.692 | 0.581 |
| Perform analysis-uses | 1 | 0.478 | 0.647 | 0.7 | 0.333 | 0.451 |
| Perform experiment-by | 0.469 | 0.958 | 0.63 | 0.471 | 0.696 | 0.562 |
| Perform experiment-participates in | 0.423 | 0.785 | 0.55 | 0.3 | 0.462 | 0.364 |
| Perform experiment-uses | 0.692 | 0.947 | 0.799 | 0.511 | 0.649 | 0.571 |
| Effect-affect | 0.886 | 0.921 | 0.903 | 0.75 | 0.788 | 0.768 |
| Effect-cause | 0.783 | 0.775 | 0.779 | 0.649 | 0.649 | 0.649 |
| Deactivate-affect | 0.906 | 1 | 0.951 | 0.875 | 0.966 | 0.918 |
| Deactivate-cause | 0.869 | 1 | 0.93 | 0.905 | 0.95 | 0.918 |
| Activate-affect | 0.975 | 0.962 | 0.968 | 0.862 | 0.812 | 0.836 |
| Activate-cause | 0.833 | 0.882 | 0.857 | 0.566 | 0.625 | 0.594 |
| ACQ-attribute | 0.878 | 0.935 | 0.906 | 0.594 | 0.92 | 0.867 |
| SEN-attribute | 0.96 | 0.923 | 0.941 | 0.385 | 0.385 | 0.385 |
| STP-attribute | 0.96 | 0.827 | 0.889 | 1 | 0.815 | 0.898 |
FIGURE 7. Visualization results of event extraction. The top panel shows five events extracted from the article (Prehn-Kristensen et al., 2009). The bottom panel shows LDA topics extracted from the same article. To allow comparison with the results of AT-NeuroEAE, the LDA topics are manually divided into three classes: brain mechanism, experiment, and analysis.
FIGURE 8. A comparison of neuroimaging entity/label categories/spans of interest. Shardlow et al. (2018) mainly focused on brain mechanism, especially multi-level brain structures. Riedel et al. (2019) only took account of the experimental process. Sheng et al. (2019) focused on brain mechanism and two experimental factors, namely sensory stimuli or response and study participants’ medical problems. Zhu et al. (2020) paid attention to the pathology and mechanism of brain diseases. Our study covers the whole research process, and the extracted information is organized by events with rich semantics.