Jixing Li, Shohini Bhattasali, Shulin Zhang, Berta Franzluebbers, Wen-Ming Luh, R Nathan Spreng, Jonathan R Brennan, Yiming Yang, Christophe Pallier, John Hale.
Abstract
Neuroimaging using more ecologically valid stimuli such as audiobooks has advanced our understanding of natural language comprehension in the brain. However, prior naturalistic stimuli have typically been restricted to a single language, which limited generalizability beyond small typological domains. Here we present the Le Petit Prince fMRI Corpus (LPPC-fMRI), a multilingual resource for research in the cognitive neuroscience of speech and language during naturalistic listening (OpenNeuro: ds003643). 49 English speakers, 35 Chinese speakers and 28 French speakers listened to the same audiobook The Little Prince in their native language while multi-echo functional magnetic resonance imaging was acquired. We also provide time-aligned speech annotation and word-by-word predictors obtained using natural language processing tools. The resulting timeseries data are shown to be of high quality with good temporal signal-to-noise ratio and high inter-subject correlation. Data-driven functional analyses provide further evidence of data quality. This annotated, multilingual fMRI dataset facilitates future re-analysis that addresses cross-linguistic commonalities and differences in the neural substrate of language processing on multiple perceptual and linguistic levels.
Year: 2022 PMID: 36038567 PMCID: PMC9424229 DOI: 10.1038/s41597-022-01625-7
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Fig. 1 Schematic overview of the LPPC-fMRI data collection procedures, preprocessing, technical validation and annotation. During data collection (blue), anatomical MRI was first acquired, followed by functional MRI while participants listened to 9 sections of the audiobook. After preprocessing the data (green), behavioral and overall data quality were examined (yellow). Audio and text annotations were extracted using NLP tools.
Demographics of the participants, data collection procedures, and stimuli information for the English, Chinese, and French datasets.
| Language | Participants: Number | Participants: Mean Age | Participants: Female | Data Collection: Location | Stimuli: Material | Stimuli: Length (s) | Stimuli: N Words | Stimuli: N Sentences |
|---|---|---|---|---|---|---|---|---|
| English | 49 | 21.3 | 30 | Cornell University, United States | The little prince EN audiobook | 5632 | 15376 | 1499 |
| Chinese | 35 | 19.9 | 15 | Jiangsu Normal University, China | The little prince CN audiobook | 5954 | 16009 | 1577 |
| French | 28 | 24.4 | 15 | NeuroSpin, France | The little prince FR audiobook | 5828 | 15391 | 1480 |
List of subjects in the data collection with basic demographic information.
| English: Participant ID | English: Age | English: Sex | Chinese: Participant ID | Chinese: Age | Chinese: Sex | French: Participant ID | French: Age | French: Sex |
|---|---|---|---|---|---|---|---|---|
| sub-EN057 | 20 | F | sub-CN001 | 18 | F | sub-FR001 | 40 | M |
| sub-EN058 | 22 | M | sub-CN002 | 18 | F | sub-FR002 | 23 | M |
| sub-EN059 | 21 | F | sub-CN003 | 22 | F | sub-FR003 | 26 | F |
| sub-EN061 | 25 | F | sub-CN004 | 18 | M | sub-FR004 | 20 | M |
| sub-EN062 | 23 | M | sub-CN005 | 18 | F | sub-FR005 | 23 | F |
| sub-EN063 | 22 | M | sub-CN006 | 19 | F | sub-FR006 | 30 | M |
| sub-EN064 | 19 | M | sub-CN007 | 20 | F | sub-FR007 | 20 | M |
| sub-EN065 | 21 | F | sub-CN008 | 21 | F | sub-FR008 | 23 | M |
| sub-EN067 | 21 | F | sub-CN009 | 20 | M | sub-FR009 | 18 | F |
| sub-EN068 | 19 | M | sub-CN010 | 22 | M | sub-FR010 | 28 | F |
| sub-EN069 | 21 | F | sub-CN011 | 20 | M | sub-FR011 | 26 | F |
| sub-EN070 | 20 | F | sub-CN013 | 20 | F | sub-FR012 | 28 | F |
| sub-EN072 | 18 | F | sub-CN014 | 19 | M | sub-FR013 | 23 | F |
| sub-EN073 | 19 | F | sub-CN015 | 19 | F | sub-FR014 | 20 | F |
| sub-EN074 | 18 | F | sub-CN016 | 18 | F | sub-FR015 | 23 | F |
| sub-EN075 | 18 | M | sub-CN017 | 22 | M | sub-FR016 | 22 | M |
| sub-EN076 | 20 | M | sub-CN018 | 21 | M | sub-FR017 | 24 | M |
| sub-EN077 | 22 | M | sub-CN019 | 20 | M | sub-FR018 | 23 | F |
| sub-EN078 | 19 | F | sub-CN020 | 21 | M | sub-FR019 | 25 | F |
| sub-EN079 | 21 | F | sub-CN021 | 19 | F | sub-FR020 | 25 | F |
| sub-EN081 | 22 | F | sub-CN022 | 20 | F | sub-FR022 | 20 | F |
| sub-EN082 | 28 | F | sub-CN023 | 20 | F | sub-FR023 | 19 | M |
| sub-EN083 | 20 | F | sub-CN024 | 19 | F | sub-FR024 | 20 | M |
| sub-EN084 | 28 | F | sub-CN025 | 18 | M | sub-FR025 | 22 | M |
| sub-EN086 | 19 | M | sub-CN026 | 20 | M | sub-FR026 | 32 | F |
| sub-EN087 | 22 | M | sub-CN027 | 18 | M | sub-FR028 | 22 | M |
| sub-EN088 | 21 | M | sub-CN028 | 24 | M | sub-FR029 | 30 | F |
| sub-EN089 | 33 | M | sub-CN029 | 19 | M | sub-FR030 | 27 | M |
| sub-EN091 | 20 | M | sub-CN030 | 19 | M | |||
| sub-EN092 | 21 | M | sub-CN031 | 21 | M | |||
| sub-EN093 | 20 | F | sub-CN032 | 21 | M | |||
| sub-EN094 | 21 | F | sub-CN033 | 22 | M | |||
| sub-EN095 | 20 | F | sub-CN034 | 18 | F | |||
| sub-EN096 | 18 | F | sub-CN036 | 22 | M | |||
| sub-EN097 | 21 | F | sub-CN037 | 22 | M | |||
| sub-EN098 | 24 | F | ||||||
| sub-EN099 | 37 | F | ||||||
| sub-EN100 | 19 | F | ||||||
| sub-EN101 | 23 | M | ||||||
| sub-EN103 | 18 | F | ||||||
| sub-EN104 | 19 | F | ||||||
| sub-EN105 | 19 | F | ||||||
| sub-EN106 | 20 | M | ||||||
| sub-EN108 | 18 | M | ||||||
| sub-EN109 | 19 | M | ||||||
| sub-EN110 | 21 | F | ||||||
| sub-EN113 | 21 | F | ||||||
| sub-EN114 | 20 | M | ||||||
| sub-EN115 | 23 | F | ||||||
Fig. 2 Annotation information for the stimuli. (a) Word boundaries in the audio files, included in files: lpp
Scanner parameters for structural and functional scans across English, Chinese, and French datasets.
| Language | Scanner | Head coil | Anatomical: Pulse sequence | Anatomical: in-plane resolution | Anatomical: slice thickness | Functional: Pulse sequence | Functional: TR | Functional: TEs | Functional: Flip angle (°) | Functional: Matrix size | Functional: FoV | Functional: Image acceleration | Functional: N axial slices | Functional: in-plane resolution | Functional: slice thickness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| English | 3 T MRI GE Discovery MR750 | 32 channel | T1W MPRAGE | 1.0 mm × 1.0 mm | 1.0 mm | ME-EPI | 2000 ms | 2.8, 27.5, 43 ms | 77 | 72 × 72 | 240.0 mm × 240.0 mm | 2x | 33 | 3.75 mm × 3.75 mm | 3.8 mm |
| Chinese | 3 T MRI GE Discovery MR750 | 32 channel | T1W MPRAGE | 1.0 mm × 1.0 mm | 1.0 mm | ME-EPI | 2000 ms | 2.8, 27.5, 43 ms | 77 | 72 × 72 | 240.0 mm × 240.0 mm | 2x | 33 | 3.75 mm × 3.75 mm | 3.8 mm |
| French | 3 T Siemens Magnetom Prisma Fit 230 | 64 channel | T1W MPRAGE | 1.0 mm × 1.0 mm | 1.0 mm | ME-EPI | 2000 ms | 10, 25, 38 ms | 77 | 72 × 72 | 240.0 mm × 240.0 mm | 2x | 34 | 3.75 mm × 3.75 mm | 3.8 mm |
Fig. 3 Organization of the data collection. (a) General overview of directory structure. (b) Content of subject-specific anatomical and raw data directories. (c) Content of subject-specific preprocessed data directories. (d) Content of the stimuli directory. (e) Content of the quiz directory. (f) Content of the language-specific annotation directory.
Summary of framewise displacement information for the English, Chinese and French data.
| Language | FD (mm): Mean | FD (mm): SD | FD > 0.2 mm (%): Mean | FD > 0.2 mm (%): SD |
|---|---|---|---|---|
| English | 0.11 | 0.05 | 9.3 | 10.6 |
| Chinese | 0.08 | 0.05 | 5.0 | 8.2 |
| French | 0.10 | 0.02 | 4.6 | 5.0 |
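The table summarizes framewise displacement (FD) as a mean and as the percentage of volumes exceeding 0.2 mm. The record does not say which FD definition was used, so the following is only a minimal sketch assuming the common formulation: the sum of absolute volume-to-volume changes in the six rigid-body parameters, with rotations converted to millimetres on a 50 mm sphere. The function names, the `radius_mm` default and the toy motion values are illustrative assumptions, not part of the dataset.

```python
def framewise_displacement(motion, radius_mm=50.0):
    """Return one FD value per volume.

    motion: sequence of (tx, ty, tz, rx, ry, rz) tuples, translations in mm
    and rotations in radians (an assumed input convention).
    """
    fd = [0.0]  # the first volume has no predecessor
    for prev, cur in zip(motion, motion[1:]):
        d_trans = sum(abs(c - p) for p, c in zip(prev[:3], cur[:3]))
        # Arc length on a sphere of radius_mm converts radians to mm.
        d_rot = sum(abs(c - p) * radius_mm for p, c in zip(prev[3:], cur[3:]))
        fd.append(d_trans + d_rot)
    return fd


def summarize_fd(fd, threshold_mm=0.2):
    """Return (mean FD in mm, percentage of volumes with FD above threshold)."""
    mean_fd = sum(fd) / len(fd)
    pct_above = 100.0 * sum(v > threshold_mm for v in fd) / len(fd)
    return mean_fd, pct_above


# Toy motion parameters for four volumes (made-up values).
motion = [
    (0.00, 0.00, 0.00, 0.000, 0.000, 0.000),
    (0.05, 0.00, 0.00, 0.000, 0.000, 0.000),
    (0.05, 0.10, 0.00, 0.004, 0.000, 0.000),
    (0.05, 0.10, 0.00, 0.004, 0.000, 0.000),
]
fd = framewise_displacement(motion)
mean_fd, pct_above = summarize_fd(fd)
print(f"mean FD = {mean_fd:.4f} mm, FD > 0.2 mm in {pct_above:.1f}% of volumes")
```

Computing both numbers per subject and then averaging across subjects would give language-level summaries in the same form as the table.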
Fig. 4 Voxel-wise temporal signal-to-noise ratio (tSNR) analysis before and after preprocessing. Cohen’s d effect sizes showed an increase in tSNR after preprocessing.
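As a rough illustration of the quantities named in this caption, the sketch below computes tSNR for a single voxel as the temporal mean divided by the temporal standard deviation, and a Cohen's d between two sets of tSNR values. The exact Cohen's d variant used for the figure is not stated here; the pooled-standard-deviation form below is an assumption, as are the function names and the toy numbers.

```python
import math
from statistics import mean, pstdev, stdev


def tsnr(timeseries):
    """Temporal SNR of one voxel: mean over time / standard deviation over time."""
    sd = pstdev(timeseries)  # population SD over the run
    return mean(timeseries) / sd if sd > 0 else 0.0


def cohens_d(after, before):
    """Cohen's d with a pooled SD (assumed variant, equal weighting of groups)."""
    pooled_sd = math.sqrt((stdev(after) ** 2 + stdev(before) ** 2) / 2)
    return (mean(after) - mean(before)) / pooled_sd


# Toy voxel time series before and after preprocessing (made-up values).
raw = [100.0, 104.0, 96.0, 100.0]
preprocessed = [100.0, 102.0, 98.0, 100.0]
print(f"tSNR raw = {tsnr(raw):.2f}, tSNR preprocessed = {tsnr(preprocessed):.2f}")

# Toy tSNR values across voxels or subjects, before and after preprocessing.
print(f"Cohen's d = {cohens_d([60.0, 70.0, 80.0], [30.0, 40.0, 50.0]):.2f}")
```

In practice the same ratio would be computed for every voxel of a 4D image with an array library; plain Python is used here only to keep the sketch dependency-free.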
Fig. 5 Results of inter-subject correlation (ISC) demonstrating data quality and timing synchrony between participants. As expected, the temporal regions showed the largest correlation in brain responses across subjects.
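The caption does not specify how ISC was computed. One common variant is leave-one-out ISC, in which each subject's time series at a voxel is correlated with the average time series of all other subjects. The sketch below illustrates that variant for a single voxel; the function names and toy data are assumptions for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def leave_one_out_isc(subject_timeseries):
    """Correlate each subject's time series with the mean of the remaining subjects."""
    iscs = []
    for i, ts in enumerate(subject_timeseries):
        others = [s for j, s in enumerate(subject_timeseries) if j != i]
        mean_others = [sum(vals) / len(others) for vals in zip(*others)]
        iscs.append(pearson(ts, mean_others))
    return iscs


# Toy data: three subjects, five time points at one voxel (made-up values).
voxel_data = [
    [0.1, 0.9, 0.4, -0.3, 0.2],
    [0.0, 1.1, 0.5, -0.2, 0.1],
    [0.2, 0.7, 0.6, -0.4, 0.3],
]
for subject, r in enumerate(leave_one_out_isc(voxel_data), start=1):
    print(f"subject {subject}: ISC = {r:.3f}")
```

Because all participants within a language heard the same audio, high ISC values also act as a check that stimulus timing was aligned across scans.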
Fig. 6 GLM analyses to localize the wordrate regressor. (a) The offset of each word in the audiobook was marked 1 and convolved with the canonical hemodynamic response function. (b) The timecourse of each voxel’s BOLD signal was modeled using our design matrix at the first level. At the group level, a one-sample t-test was performed on the distribution of the beta values for the wordrate regressor across subjects at each voxel. Statistical significance was thresholded at p < 0.05 FWE with a cluster size greater than 50.
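The construction in panel (a) can be sketched as follows: place a unit impulse at each word offset on a fine time grid, convolve with a hemodynamic response function, and sample the result once per TR. The double-gamma HRF below (peak shape 6, undershoot shape 16, ratio 1/6) is a widely used approximation and is an assumption here, as are the function names, the 0.1 s grid and the toy word offsets; the 2 s TR comes from the scanner parameter table.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale, evaluated in log space for stability."""
    if t <= 0:
        return 0.0
    return math.exp((shape - 1) * math.log(t) - t - math.lgamma(shape))


def canonical_hrf(dt, duration=32.0):
    """Double-gamma HRF sampled every dt seconds (assumed parameterization)."""
    n = int(round(duration / dt))
    return [gamma_pdf(i * dt, 6.0) - gamma_pdf(i * dt, 16.0) / 6.0 for i in range(n)]


def wordrate_regressor(word_offsets_s, n_scans, tr=2.0, dt=0.1):
    """Return one regressor value per scan for the given word offsets (seconds)."""
    n_fine = int(round(n_scans * tr / dt))
    stick = [0.0] * n_fine
    for t in word_offsets_s:
        idx = int(round(t / dt))
        if 0 <= idx < n_fine:
            stick[idx] += 1.0  # unit impulse at each word offset
    hrf = canonical_hrf(dt)
    conv = [0.0] * n_fine
    for i, s in enumerate(stick):
        if s == 0.0:
            continue
        for j, h in enumerate(hrf):
            if i + j >= n_fine:
                break
            conv[i + j] += s * h
    step = int(round(tr / dt))
    return conv[::step][:n_scans]  # sample at scan onsets


# Toy word offsets in seconds (made-up values), 20 scans at TR = 2 s.
regressor = wordrate_regressor([0.4, 0.9, 1.3, 2.1, 2.6], n_scans=20)
print([round(v, 3) for v in regressor])
```

The resulting vector would form one column of the first-level design matrix, alongside any other annotation-derived regressors such as f0.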
Fig. 7 GLM results showing the significant clusters for (a) the pitch and (b) word regions in the English, Chinese and French data using f0 and wordrate annotations. Red areas in the second column of the 3D brains show meta-analyses of pitch and word regions from Neurosynth[37]. Statistical significance was thresholded at p < 0.05 FWE and k > 50.
GLM results for the f0 and wordrate regressors for the Chinese, English and French fMRI data: MNI coordinates, cluster size and their peak level statistics, thresholded at p < 0.05 FWE and k > 50.
| Condition | Language | Cluster | MNI Coordinates | k-size | Peak t-value | Peak p-value |
|---|---|---|---|---|---|---|
| Prosody | Chinese | RSTG | 62, −14, 0 | 3566 | 20.22 | <0.001 |
| | | L Heschl’s Gyrus | −56, −6, 4 | 5036 | 19.97 | <0.001 |
| | | L Frontal Lobe | −4, 0, 62 | 73 | 7.40 | <0.001 |
| | | LMFG | −52, −2, 48 | 64 | 6.15 | 0.0005 |
| | English | L Heschl’s Gyrus | −50, −18, 6 | 5330 | 22.98 | <0.001 |
| | | RSTG | 58, −20, 4 | 5053 | 22.64 | <0.001 |
| | | LIFG | −52, 26, 10 | 864 | 10.09 | <0.001 |
| | | LSFG | −8, 58, 26 | 1272 | 9.55 | <0.001 |
| | | LMFG | −34, 12, 42 | 145 | 6.88 | <0.001 |
| | | RIFG | 52, 26, −8 | 153 | 6.88 | <0.001 |
| | French | LSTG | −62, −12, 4 | 1349 | 8.92 | <0.001 |
| | | RSTG | 68, −22, 2 | 218 | 7.09 | <0.001 |
| | | L Precuneus | −4, −70, 32 | 53 | 6.63 | <0.001 |
| | | LMFG | −42, 18, 28 | 150 | 6.14 | <0.001 |
| Word | Chinese | LAG | −50, −64, 22 | 2040 | 14.45 | <0.001 |
| | | LMFG | −28, 22, 48 | 1194 | 10.39 | <0.001 |
| | | LMTG | −56, 0, −20 | 358 | 9.65 | <0.001 |
| | | LMTG | −60, −46, −6 | 511 | 8.61 | <0.001 |
| | | RAG | 54, −64, 26 | 289 | 8.03 | <0.001 |
| | English | LMTG | −52, −4, −28 | 1683 | 11.10 | <0.001 |
| | | LAG | −48, −60, 26 | 1561 | 10.54 | <0.001 |
| | | LMFG | −36, 16, 50 | 1770 | 10.09 | <0.001 |
| | | LIFG | −46, 32, −12 | 288 | 9.10 | <0.001 |
| | | RMTG | 60, −4, −30 | 171 | 6.92 | <0.001 |
| | | LIFG | −52, 26, 8 | 86 | 6.57 | <0.001 |
| | | RAG | 52, −64, 28 | 191 | 6.55 | <0.001 |
| | French | LSTG | −54, −4, −12 | 1674 | 9.45 | <0.001 |
| | | LAG | −50, −60, 24 | 516 | 8.88 | <0.001 |
| | | RSTG | 62, −2, 2 | 72 | 7.01 | <0.001 |
Example of renaming convention using symbolic links to keep run numbers consistent across participants.
| Original file | Renamed file |
|---|---|
| sub-EN084_task-lppEN_run-09_echo-1_bold.nii.gz | sub-EN084_task-lppEN_run-01_echo-1_bold.nii.gz |
| sub-EN084_task-lppEN_run-09_echo-2_bold.nii.gz | sub-EN084_task-lppEN_run-01_echo-2_bold.nii.gz |
| sub-EN084_task-lppEN_run-09_echo-3_bold.nii.gz | sub-EN084_task-lppEN_run-01_echo-3_bold.nii.gz |
| sub-EN084_task-lppEN_run-10_echo-1_bold.nii.gz | sub-EN084_task-lppEN_run-02_echo-1_bold.nii.gz |
| sub-EN084_task-lppEN_run-10_echo-2_bold.nii.gz | sub-EN084_task-lppEN_run-02_echo-2_bold.nii.gz |
| sub-EN084_task-lppEN_run-10_echo-3_bold.nii.gz | sub-EN084_task-lppEN_run-02_echo-3_bold.nii.gz |
| sub-EN084_task-lppEN_run-13_echo-1_bold.nii.gz | sub-EN084_task-lppEN_run-03_echo-1_bold.nii.gz |
| sub-EN084_task-lppEN_run-13_echo-2_bold.nii.gz | sub-EN084_task-lppEN_run-03_echo-2_bold.nii.gz |
| sub-EN084_task-lppEN_run-13_echo-3_bold.nii.gz | sub-EN084_task-lppEN_run-03_echo-3_bold.nii.gz |
| sub-EN084_task-lppEN_run-14_echo-1_bold.nii.gz | sub-EN084_task-lppEN_run-04_echo-1_bold.nii.gz |
| sub-EN084_task-lppEN_run-14_echo-2_bold.nii.gz | sub-EN084_task-lppEN_run-04_echo-2_bold.nii.gz |
| sub-EN084_task-lppEN_run-14_echo-3_bold.nii.gz | sub-EN084_task-lppEN_run-04_echo-3_bold.nii.gz |
| sub-EN084_task-lppEN_run-15_echo-1_bold.nii.gz | sub-EN084_task-lppEN_run-05_echo-1_bold.nii.gz |
| sub-EN084_task-lppEN_run-15_echo-2_bold.nii.gz | sub-EN084_task-lppEN_run-05_echo-2_bold.nii.gz |
| sub-EN084_task-lppEN_run-15_echo-3_bold.nii.gz | sub-EN084_task-lppEN_run-05_echo-3_bold.nii.gz |
| sub-EN084_task-lppEN_run-16_echo-1_bold.nii.gz | sub-EN084_task-lppEN_run-06_echo-1_bold.nii.gz |
| sub-EN084_task-lppEN_run-16_echo-2_bold.nii.gz | sub-EN084_task-lppEN_run-06_echo-2_bold.nii.gz |
| sub-EN084_task-lppEN_run-16_echo-3_bold.nii.gz | sub-EN084_task-lppEN_run-06_echo-3_bold.nii.gz |
| sub-EN084_task-lppEN_run-17_echo-1_bold.nii.gz | sub-EN084_task-lppEN_run-07_echo-1_bold.nii.gz |
| sub-EN084_task-lppEN_run-17_echo-2_bold.nii.gz | sub-EN084_task-lppEN_run-07_echo-2_bold.nii.gz |
| sub-EN084_task-lppEN_run-17_echo-3_bold.nii.gz | sub-EN084_task-lppEN_run-07_echo-3_bold.nii.gz |
| sub-EN084_task-lppEN_run-18_echo-1_bold.nii.gz | sub-EN084_task-lppEN_run-08_echo-1_bold.nii.gz |
| sub-EN084_task-lppEN_run-18_echo-2_bold.nii.gz | sub-EN084_task-lppEN_run-08_echo-2_bold.nii.gz |
| sub-EN084_task-lppEN_run-18_echo-3_bold.nii.gz | sub-EN084_task-lppEN_run-08_echo-3_bold.nii.gz |
| sub-EN084_task-lppEN_run-19_echo-1_bold.nii.gz | sub-EN084_task-lppEN_run-09_echo-1_bold.nii.gz |
| sub-EN084_task-lppEN_run-19_echo-2_bold.nii.gz | sub-EN084_task-lppEN_run-09_echo-2_bold.nii.gz |
| sub-EN084_task-lppEN_run-19_echo-3_bold.nii.gz | sub-EN084_task-lppEN_run-09_echo-3_bold.nii.gz |
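The renaming shown in the table maps whatever run numbers a participant's files carry onto consecutive numbers starting at 01, using symbolic links so the original files stay untouched. A minimal sketch of that idea follows; the function name `link_with_consecutive_runs`, the regular expression and the directory layout are hypothetical and are not taken from the dataset's own scripts.

```python
import os
import re
import tempfile
from pathlib import Path

RUN_RE = re.compile(r"_run-(\d+)_")


def link_with_consecutive_runs(src_dir, dst_dir):
    """Symlink files from src_dir into dst_dir with runs renumbered 01, 02, ...

    Returns a dict mapping each original file name to its new name.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in src_dir.iterdir() if RUN_RE.search(p.name))
    # Rank the distinct original run numbers to get the new consecutive index.
    original_runs = sorted({int(RUN_RE.search(p.name).group(1)) for p in files})
    new_index = {run: i + 1 for i, run in enumerate(original_runs)}
    mapping = {}
    for p in files:
        run = int(RUN_RE.search(p.name).group(1))
        new_name = RUN_RE.sub(f"_run-{new_index[run]:02d}_", p.name, count=1)
        os.symlink(p.resolve(), dst_dir / new_name)
        mapping[p.name] = new_name
    return mapping


# Demonstration on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    raw.mkdir()
    for run in (9, 10, 13):
        for echo in (1, 2, 3):
            (raw / f"sub-EN084_task-lppEN_run-{run:02d}_echo-{echo}_bold.nii.gz").touch()
    for old, new in link_with_consecutive_runs(raw, Path(tmp) / "renamed").items():
        print(old, "->", new)
```

Ranking the distinct run numbers, rather than subtracting a fixed offset, handles gaps such as the jump from run-10 to run-13 in the table.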
| Measurement(s) | Blood Oxygen Level-Dependent Functional MRI |
| Technology Type(s) | Magnetization-Prepared Rapid Gradient Echo MRI |
| Sample Characteristic - Organism | Homo sapiens |