Alejandro de la Vega, Roberta Rocca, Ross W Blair, Christopher J Markiewicz, Jeff Mentch, James D Kent, Peer Herholz, Satrajit S Ghosh, Russell A Poldrack, Tal Yarkoni.
Abstract
Functional magnetic resonance imaging (fMRI) has revolutionized cognitive neuroscience, but methodological barriers limit the generalizability of findings from the lab to the real world. Here, we present Neuroscout, an end-to-end platform for analysis of naturalistic fMRI data designed to facilitate the adoption of robust and generalizable research practices. Neuroscout leverages state-of-the-art machine learning models to automatically annotate stimuli from dozens of fMRI studies using naturalistic stimuli, such as movies and narratives, allowing researchers to easily test neuroscientific hypotheses across multiple ecologically valid datasets. In addition, Neuroscout builds on a robust ecosystem of open tools and standards to provide an easy-to-use analysis builder and a fully automated execution engine that reduce the burden of reproducible research. Through a series of meta-analytic case studies, we validate the automatic feature extraction approach and demonstrate its potential to support more robust fMRI research. Owing to its ease of use and a high degree of automation, Neuroscout makes it possible to overcome modeling challenges commonly arising in naturalistic analysis and to easily scale analyses within and across datasets, democratizing generalizable fMRI research.
Keywords: fMRI; generalizability; human; naturalistic; neuroinformatics; neuroscience; open source; reproducibility
Year: 2022 PMID: 36040302 PMCID: PMC9489206 DOI: 10.7554/eLife.79277
Source DB: PubMed Journal: Elife ISSN: 2050-084X Impact factor: 8.713
Figure 1. Example of automated feature extraction on stimuli from the “Merlin” dataset.
Visual features were extracted from video stimuli at a frequency of 1 Hz. ‘Faces’: we applied a well-validated cascaded convolutional network trained to detect the presence of faces (Zhang et al., 2016). ‘Building’: we used Clarifai’s General Image Recognition model to compute the probability that buildings are present in each frame. ‘Spoken word frequency’ codes the lexical frequency of words in the transcript, as determined by the SubtlexUS database (Brysbaert and New, 2009). Language features were extracted from speech transcripts, with precise word-by-word timing determined through forced alignment.
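Frame-level extraction of this kind reduces each sampled video frame to a scalar feature value. A minimal NumPy sketch for the simplest such feature, mean brightness (the function, frame rate, and interface here are illustrative assumptions, not Neuroscout's implementation):

```python
import numpy as np

def brightness_series(frames, fps=30, hz=1):
    """Sample frames at `hz` and compute mean luminosity per sampled frame.

    frames: array of shape (n_frames, height, width), grayscale in [0, 1].
    Returns one brightness value per sampled frame.
    """
    step = int(fps / hz)               # e.g. every 30th frame for 1 Hz at 30 fps
    sampled = frames[::step]
    return sampled.mean(axis=(1, 2))   # average over all pixels in each frame

# Illustration on synthetic data: 90 frames (~3 s at 30 fps) of constant gray.
frames = np.full((90, 4, 4), 0.5)
print(brightness_series(frames))       # one value per second of video
```

Model-based features such as ‘Faces’ or ‘Building’ follow the same sampling pattern, but replace the pixel average with a classifier's output probability for each sampled frame.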
Figure 2. Overview schematic of analysis creation and model execution.
(a) Interactive analysis creation is made possible through an easy-to-use web application, resulting in a fully specified reproducible analysis bundle. (b) Automated model execution is achieved with little-to-no configuration through a containerized model fitting workflow. Results are automatically made available in NeuroVault, a public repository for statistical maps.
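The model-fitting workflow ultimately estimates GLMs in which extracted feature time series are convolved with a hemodynamic response function (HRF) to form regressors. A hedged sketch of that core step, using an illustrative double-gamma HRF (the parameters, TR, and kernel length are assumptions, not the exact pipeline configuration):

```python
import math
import numpy as np

def hrf(t):
    """Illustrative double-gamma HRF: a gamma-shaped peak minus a
    smaller, later gamma-shaped undershoot."""
    peak = t ** 5 * np.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * np.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

def make_regressor(feature, tr=2.0):
    """Convolve a TR-sampled feature time series with the HRF kernel."""
    t = np.arange(0, 32, tr)                   # 32 s kernel, sampled at the TR
    kernel = hrf(t)
    return np.convolve(feature, kernel)[: len(feature)]

feature = np.zeros(20)
feature[2] = 1.0                               # a single event at TR index 2
reg = make_regressor(feature)
print(int(reg.argmax()))                       # response peaks a few TRs after the event
```

The resulting regressor lags the stimulus feature by several seconds, which is why convolution with an HRF, rather than the raw feature, enters the design matrix.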
Figure 3. Meta-analytic statistical maps for GLM models targeting a variety of effects with strong priors from fMRI research.
Individual GLM models were fit for each effect of interest, and dataset-level estimates were combined using image-based meta-analysis. Images were thresholded at Z=3.29 (p<0.001) voxel-wise. Abbreviations: V1 = primary visual cortex; FEF = frontal eye fields; AG = angular gyrus; PCUN = precuneus; A1 = primary auditory cortex; PMC = premotor cortex; IFG = inferior frontal gyrus; STS = superior temporal sulcus; STG = superior temporal gyrus; PPA = parahippocampal place area; VWFA = visual word form area; IPL = inferior parietal lobule; IPS = intraparietal sulcus; LOTC = lateral occipito-temporal cortex.
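Image-based meta-analysis combines dataset-level statistical maps voxel-wise. One simple fixed-effects combiner is Stouffer's method, sketched below; this is an illustration of the general approach, not necessarily the exact estimator used in these analyses:

```python
import numpy as np

def stouffer_z(z_maps):
    """Combine per-dataset Z maps voxel-wise with Stouffer's method:
    Z_meta = sum(Z_i) / sqrt(k). Assumes independent datasets."""
    z = np.asarray(z_maps, dtype=float)     # shape (k datasets, n voxels)
    return z.sum(axis=0) / np.sqrt(z.shape[0])

# Four datasets, three voxels; only the first voxel carries a consistent effect.
z_maps = [[2.0,  0.1, -0.3],
          [1.8, -0.2,  0.4],
          [2.2,  0.3,  0.0],
          [2.0, -0.1,  0.2]]
meta = stouffer_z(z_maps)
print(meta > 3.29)   # only the consistent voxel survives the Z=3.29 threshold
```

Because the combined Z grows with sqrt(k) for consistent effects, moderate per-dataset effects that never cross the threshold individually can become significant in the meta-analysis, while idiosyncratic effects cancel out.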
Figure 4. Comparison of a sample of four single-study results with the meta-analysis (N=20) for three features: ‘building’ and ‘text’, extracted with Clarifai visual scene detection models, and sound ‘loudness’ (root mean square of the auditory signal).
Images were thresholded at Z=3.29 (p<0.001) voxel-wise. Regions with a priori association with each predictor are highlighted: PPA, parahippocampal place area; VWFA, visual word form area; STS, superior temporal sulcus. Datasets: Budapest, Learning Temporal Structure (LTS), 500daysofsummer task from Naturalistic Neuroimaging Database, and Sherlock.
Figure 5. Meta-analysis of face perception with iterative addition of covariates.
Left: a model including only a binary predictor coding for the presence of faces on screen did not reveal activity in the right fusiform face area (rFFA). Middle: controlling for speech removed spurious activations and revealed an rFFA association with face presentation. Right: additionally controlling for temporal adaptation to face identity further strengthened the association between the rFFA and face presentation. N=17 datasets; images were thresholded at Z=3.29 (p<0.001) voxel-wise.
Figure 6. Meta-analytic statistical maps for concreteness and frequency, controlling for speech, text length, number of syllables and phonemes, and phone-level Levenshtein distance.
N=33 tasks; images were thresholded at Z=3.29 (p<0.001) voxel-wise. VWFA = visual word form area.
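The Z=3.29 threshold used throughout these figures corresponds to a two-tailed p of roughly 0.001, which can be verified with the standard-normal survival function (stdlib only; this is a numerical check, not part of the analysis pipeline):

```python
import math

def two_tailed_p(z):
    """Two-tailed p-value for a standard-normal Z score, via the
    complementary error function: p = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))

print(two_tailed_p(3.29))   # approximately 0.001
```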
Neuroscout datasets included in the validation analyses.
Subj is the number of unique subjects. Scan Time is the mean scan time per subject (in minutes). AV = Audio-Visual; AN = Audio Narrative.
| Name | Subj | DOI/URI | Scan time (min) | Modality | Description |
|---|---|---|---|---|---|
| Study Forrest | 13 | | 120 | AV | Slightly abridged German version of the movie ‘Forrest Gump’ |
| Life | 19 | | 62.8 | AV | Four segments of the Life nature documentary |
| Raiders | 11 | | 113.3 | AV | Full movie: ‘Raiders of the Lost Ark’ |
| Learning Temporal Structure (LTS) | 30 | | 20.1 | AV | Three clips from the movie ‘Grand Budapest Hotel’, presented six times each. Some clips were scrambled. |
| Sherlock | 16 | | 23.7 | AV | The first half of the first episode of the ‘Sherlock’ TV series. |
| SherlockMerlin | 18 | Temporarily unavailable | 25.1 | AV | Full episode of the ‘Merlin’ TV series. Only the Merlin task was used, to avoid analyzing the Sherlock task twice. |
| Schematic Narrative | 31 | | 50.4 | AV/AN | 16 three-minute clips, including audiovisual clips and narration. |
| ParanoiaStory | 22 | | 21.8 | AN | Audio narrative designed to elicit individual variation in suspicion/paranoia. |
| Budapest | 25 | | 50.9 | AV | The majority of the movie ‘Grand Budapest Hotel’, presented in intact order |
| Naturalistic Neuroimaging Database (NNDb) | 86 | | 112.03 | AV | Movie watching of 10 full-length movies |
| Narratives | 328 | | 32.5 | AN | Passive listening to 16 audio narratives (two tasks were not analyzed due to a preprocessing error) |
Extractor name, feature name, and description for all Neuroscout features used in the validation analyses.
| Extractor | Feature | Description |
|---|---|---|
| Brightness | brightness | Average luminosity across all pixels in each video frame. |
| Clarifai | building, landscape, text, tool | Indicators of the probability that an object belonging to each of these categories is present in the video frame. |
| FaceNet | any_faces, log_mean_time_cum | For each video frame, any_faces indicates the probability that the image displays at least one face. log_mean_time_cum indicates the cumulative time (in seconds) a given face has been on screen since the beginning of the movie. If multiple faces are present, their cumulative time on screen is averaged. |
| Google Video Intelligence | shot_change | Binary indicator coding for shot changes. |
| FAVE/Rev | speech | Binary indicator coding for the presence of speech in the audio signal, inferred from word onset/offset information in force-aligned speech transcripts. |
| RMS | rms | Root mean square (RMS) energy of the audio signal. |
| Lexical norms | Log10WF, concreteness, phonlev, numsylls, numphones, duration, text_length | Logarithm of SubtlexUS lexical frequency, concreteness rating, phonological Levenshtein distance, number of syllables, number of phones, average auditory duration and number of characters for each word in the speech transcript. These metrics are extracted from lexical databases available through pliers. |
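The ‘speech’ feature above is a binary time series derived from forced-aligned word timings. A minimal sketch of that derivation (the function, sampling rate, and interface are illustrative assumptions, not the FAVE/Rev pipeline itself):

```python
import numpy as np

def speech_indicator(word_spans, duration, sr=1.0):
    """Binary speech regressor from forced-aligned word (onset, offset) pairs.

    word_spans: list of (onset, offset) times in seconds.
    duration: total stimulus duration in seconds.
    sr: sampling rate of the output series (Hz).
    """
    t = np.arange(0, duration, 1.0 / sr)         # sample times
    speech = np.zeros_like(t)
    for onset, offset in word_spans:
        speech[(t >= onset) & (t < offset)] = 1  # mark samples inside a word
    return speech

# Two words spoken during a 6 s clip, sampled at 1 Hz.
print(speech_indicator([(0.5, 1.2), (3.0, 4.5)], duration=6))
```

Sampling the indicator at a fixed rate puts it on the same time base as the continuously sampled visual and audio features, so all of them can enter the same design matrix.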