
BrainIAK tutorials: User-friendly learning materials for advanced fMRI analysis.

Manoj Kumar¹, Cameron T Ellis², Qihong Lu¹, Hejia Zhang³, Mihai Capotă⁴, Theodore L Willke⁴, Peter J Ramadge¹,³, Nicholas B Turk-Browne², Kenneth A Norman¹,⁵.

Abstract

Advanced brain imaging analysis methods, including multivariate pattern analysis (MVPA), functional connectivity, and functional alignment, have become powerful tools in cognitive neuroscience over the past decade. These tools are implemented in custom code and separate packages, often requiring different software and language proficiencies. Although usable by expert researchers, these tools present a steep learning curve for novice users. These difficulties stem from the need to learn new programming languages (e.g., Python) and to apply machine-learning methods to high-dimensional fMRI data, as well as from minimal documentation and training materials. Furthermore, most standard fMRI analysis packages (e.g., AFNI, FSL, SPM) focus on preprocessing and univariate analyses, leaving a gap in how to integrate them with advanced tools. To address these needs, we developed BrainIAK (brainiak.org), an open-source Python software package that seamlessly integrates several cutting-edge, computationally efficient techniques with other Python packages (e.g., Nilearn, Scikit-learn) for file handling, visualization, and machine learning. To disseminate these powerful tools, we developed user-friendly tutorials (in Jupyter format; https://brainiak.org/tutorials/) for learning BrainIAK and advanced fMRI analysis in Python more generally. These materials cover techniques including: MVPA (pattern classification and representational similarity analysis); parallelized searchlight analysis; background connectivity; full correlation matrix analysis; inter-subject correlation; inter-subject functional connectivity; shared response modeling; event segmentation using hidden Markov models; and real-time fMRI. For long-running jobs or large memory needs, we provide detailed guidance on high-performance computing clusters. These notebooks were successfully tested at multiple sites, including as problem sets for courses at Yale and Princeton Universities and at various workshops and hackathons.
These materials are freely shared, with the hope that they become part of a pool of open-source software and educational materials for large-scale, reproducible fMRI analysis and accelerated discovery.



Year: 2020; PMID: 31940340; PMCID: PMC6961866; DOI: 10.1371/journal.pcbi.1007549

Source DB: PubMed; Journal: PLoS Comput Biol; ISSN: 1553-734X; Impact factor: 4.475

This is a PLOS Computational Biology Software paper.

Introduction

The latest methods for analyzing brain activity recorded via functional magnetic resonance imaging (fMRI) are complex to learn and execute. This is particularly true for multivariate pattern analysis (MVPA) methods, which focus on extracting information about a person’s cognitive state (e.g., percepts, thoughts, memories) from spatially and/or temporally distributed patterns of fMRI activity. Beginners and even intermediate users face a steep learning curve and uncertainty in using these complex techniques. Even expert users are hesitant to add new, more advanced techniques to their existing pipelines, and face significant software and hardware challenges in doing so. These difficulties persist even though MVPA has been used successfully for almost two decades to answer fundamental questions in cognitive neuroscience. MVPA encompasses a wide range of analyses: from pattern classifiers that map between distributed brain patterns and cognitive states [1-4], to techniques that explore the similarity structure exploited by classifiers (e.g., representational similarity analysis, RSA; [5,6]). There are also related multivariate techniques for functional connectivity and functional alignment, including: full correlation matrix analysis (FCMA; [7]), inter-subject correlation (ISC; [8,9]), inter-subject functional connectivity (ISFC; [10]), shared response modeling (SRM; [11]), and event segmentation [12]. These analyses can be run after data collection is complete or in real time for neurofeedback training or adaptive design optimization [13-16]. There exist multiple open-source packages that implement MVPA and RSA techniques. Some of these packages depend on commercial software with paid licenses and proprietary code, such as MATLAB (e.g., Princeton MVPA Toolbox, The Decoding Toolbox [17], and CoSMoMVPA [18]), while others are completely open-source (e.g., Nilearn [19] and PyMVPA [20,21]).
Although all of these packages cover a broad range of MVPA and RSA techniques, they do not cover techniques such as FCMA, ISC, ISFC, SRM, and event segmentation. One barrier to increasing the accessibility of these techniques is that, in most cases, they were created as custom code within individual labs and are thus not part of other fMRI software analysis packages. To address this, we implemented and released them in an open-source Python package called BrainIAK. The tutorials that are the focus of this article provide extensive background, code, and exercises, which serve as structured guidance for learning how to use these and other advanced fMRI analysis techniques. The tutorials also show how to use methods in other packages such as Nilearn and integrate them with the methods in BrainIAK. In a typical fMRI analysis pipeline, the data are first pre-processed, a general linear model (GLM) might be fit, and then MVPA or other more advanced analyses are performed. For pre-processing and GLM analysis of fMRI data, a number of tutorials and bootcamps are available to learn software packages such as AFNI, FSL, SPM, and fmriprep [22-25]; a recent publicly released course at http://dartbrains.org also nicely covers these topics. In contrast, for MVPA and more advanced analyses, fewer educational materials are available. We designed the present tutorials to make it easier for the novice user to learn these techniques. An expert user can use our materials to understand BrainIAK’s implementation of these techniques, to train other researchers, and to teach research methods classes. There are three main steps to learning and implementing BrainIAK methods: (1) learning to write code and scripts, (2) understanding machine learning algorithms and how to apply them to cognitive neuroscience data, and (3) executing jobs on high-performance compute clusters. We elaborate on each of these steps below. 
First, one needs to learn a programming language; for example, BrainIAK uses Python. This can present a significant challenge to a beginner, as learning to program and how to apply these skills to scientific computing is a time-consuming process. Such skills have only recently been added to the curriculum in some psychology and neuroscience departments, and have been included as components of hackathons and summer schools. As instructors tend to teach in the language they are most familiar with, different programming languages are often used to teach various techniques, making it difficult for users to switch flexibly between methods. Second, the analysis techniques in BrainIAK involve extensive use of machine learning algorithms that may be unfamiliar to cognitive neuroscientists. There are multiple tutorials on machine learning available (see examples on Scikit-learn https://scikit-learn.org/stable/auto_examples/index.html); however, only a few cover the use of machine learning in cognitive neuroscience: for example, the documentation for Nilearn [19], lectures from the MIND summer school, lectures from the Organization for Human Brain Mapping education section and hackathons, and blogs such as MVPA Meanderings. For many of the cutting-edge techniques in BrainIAK, no tutorials exist (one notable exception is the volumetric searchlight technique; tutorials for this method are included in the PyMVPA [20,21] and Nilearn [19] packages), or they are taught only as part of special workshops. Furthermore, general-purpose machine learning algorithms must be applied to cognitive neuroscience data with care, because data points are not always independent of each other in space or time; this has led to the insidious problem of circular inference or “double dipping” [26]. Third, executing these programs on high-performance compute clusters is non-trivial, even for advanced practitioners who are proficient at executing code on individual machines.
Using clusters can accelerate analyses dramatically through parallelization, but sizing the memory needed and enabling parallel code execution for optimal run-times requires an understanding of how jobs are scheduled and processed in a cluster environment. It is a challenge to find training materials on how to run fMRI analyses on a compute cluster, although resources are becoming increasingly available: for example, lectures from Neurohackademy (https://neurohackademy.org/course_type/lectures/) and forums such as NeuroStars (https://neurostars.org) for using fmriprep on clusters [25]. We have created learning materials (herein referred to as tutorials) that address each of the above challenges, making it easier for novice users to learn MVPA and for expert users to learn more advanced BrainIAK analysis techniques, such as FCMA and SRM. To aid learning to code, the tutorials provide an interactive environment to read, write, and execute Python. Specifically, for novice users, a simple way to learn programming is to study small snippets of code with a clear description of what the code accomplishes. Our use of Jupyter notebooks [27] allows for detailed explanations of the code with text and figures embedded in-line. The user can execute the code step-by-step and interact with data at each step using plotting functions. To ease users into the use of advanced analysis techniques, we first introduce them to a fully-working but simplified version of the code. After mastering this version, we encourage users to delve deeper and learn more about helper functions and input/output variables. Expert users, who may wish to examine the details of how the data are being processed, or modify the code to suit their needs, can readily do so using the open-source Python code contained in the Jupyter notebooks.
For all users, we embed background material and references, prompts for further self-study, and problem set exercises to help them learn how to generate and adapt code. The exercises for each notebook focus on neuroscientific applications of the techniques being learned; thus, by working through the exercises, students learn how to use these techniques to answer meaningful neuroscientific questions (course instructors may contact us for more information). To help users learn how to apply machine learning algorithms to cognitive neuroscience data, we build on several open-source machine learning tools in Python. For data loading we use Nibabel [28]; for data masking, normalization, dimensionality reduction, plotting, atlases, and functional connectivity we use Nilearn [19]; and for machine learning we use Scikit-learn [29]. We include detailed instructions and exercises on how to avoid problems of circular inference and double-dipping. We also use tools native to BrainIAK for applying cutting-edge machine learning to fMRI data, including parallelized searchlight analysis [30]. An important consideration is how to prepare the data in a suitable format. Publicly available datasets are often in a raw state and need to be pre-processed (e.g., motion correction, registration, and masking) before they can be used for advanced analyses. Pre-processing can take a significant amount of time and add to the burden on the learner. To circumvent this bottleneck, we supply fully pre-processed data with the tutorials, making it significantly easier for a novice user to get started and quickly perform a successful analysis. Having made it easy to access code and use machine learning algorithms, we turn to the third challenge: running the code efficiently using compute clusters. It can be difficult to take code that works on a laptop and modify it to efficiently leverage the resources of a cluster and scale performance to meet the demands of large datasets.
This is a burden on the user and requires specialized expertise to write efficient, properly parallelized code. BrainIAK has built-in tools for making the most of clusters to scale analyses easily. In fact, the same code works seamlessly from a laptop (with a few cores) to clusters (with thousands of cores). For example, searchlight analysis (see [31]) involves running the same MVPA thousands of times at different points in the brain, which can be extremely slow on a laptop or desktop. BrainIAK includes a searchlight function that distributes these jobs on a cluster to run them in parallel. This function can be invoked using a few lines of code and runs seamlessly on any computing hardware. The tutorials give example code for cluster computing that can easily be extended to novel datasets. In addition to parallelizing the code, cluster environments can present other complications for learners. In particular, the interactive nature of working on a laptop or desktop is absent when working on a cluster, making troubleshooting difficult. Cluster environments also demand resource allocations up front (i.e., number of cores and amount of memory); increasing memory or extending time during program execution is not permitted. The tutorials use the SLURM scheduler [32] and provide instructions on how to determine the resources required to execute jobs and how to monitor running jobs. In summary, we present a set of tutorials created to enable users of all skill levels to learn and deploy advanced multivariate fMRI analysis techniques. In addition to covering the latest incarnation of MVPA [1],[5], we provide recommendations on optimizing classifiers and strategies to avoid double-dipping. We also cover a range of cutting-edge techniques available in BrainIAK, including searchlight analysis, FCMA, ISC, ISFC, SRM, real-time fMRI, and event segmentation using hidden Markov models. We have released these tutorials publicly and freely. 
Users can also apply these methods to publicly available datasets from the existing literature, enabling independent validation of published results. We are hopeful that this will help increase the reproducibility of future results more broadly: when tutorial users analyze their own data, they will already be familiar with the tools necessary to share their code and data, leading to a cycle of improved data sharing and code validation.
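The searchlight idea described above can be sketched as a plain serial loop in NumPy. The analysis is run on the small neighborhood around every brain voxel, and the result is written back to that voxel. This is a minimal illustration under stated assumptions, not BrainIAK's API. To our knowledge, BrainIAK's own `Searchlight` class (in `brainiak.searchlight.searchlight`) distributes the per-center work across MPI processes, whereas this sketch runs everything in one process. The cubic neighborhood, the hypothetical kernel `condition_distance`, and the synthetic data are illustrative choices. In the tutorials, the kernel would typically be a cross-validated classifier.

```python
import numpy as np


def searchlight(data, mask, radius, kernel):
    """Serial searchlight sketch (not BrainIAK's distributed implementation).

    data:   4-D array shaped (x, y, z, time)
    mask:   3-D boolean array shaped (x, y, z); searchlights are centered here
    radius: half-width of the cubic neighborhood, in voxels
    kernel: function mapping a (time, n_voxels) array to a single number
    """
    out = np.full(mask.shape, np.nan)
    nx, ny, nz = mask.shape
    for x, y, z in zip(*np.nonzero(mask)):
        # Clip the cube at the edges of the volume.
        xs = slice(max(x - radius, 0), min(x + radius + 1, nx))
        ys = slice(max(y - radius, 0), min(y + radius + 1, ny))
        zs = slice(max(z - radius, 0), min(z + radius + 1, nz))
        # Keep only in-mask voxels; boolean indexing yields (n_voxels, time).
        local = data[xs, ys, zs][mask[xs, ys, zs]]
        out[x, y, z] = kernel(local.T)
    return out


def condition_distance(patterns, labels):
    """Hypothetical kernel: Euclidean distance between the two condition means."""
    mean_a = patterns[labels == 0].mean(axis=0)
    mean_b = patterns[labels == 1].mean(axis=0)
    return float(np.linalg.norm(mean_a - mean_b))


rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)                 # 40 time points, two conditions
data = rng.normal(size=(6, 6, 6, 40))
data[2, 2, 2, labels == 1] += 3.0              # plant a condition effect in one voxel
mask = np.ones((6, 6, 6), dtype=bool)
result = searchlight(data, mask, radius=1,
                     kernel=lambda p: condition_distance(p, labels))
print(result[2, 2, 2], result[5, 5, 5])        # near the planted effect vs. far corner
```

Each center is processed independently of every other center. This is why the loop parallelizes so naturally and why distributing it across cluster cores pays off for whole-brain analyses.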

Design and implementation

Tutorials

Our learning materials are built and integrated using freely available tools and packages. The tutorials are written in the Python programming language. They are presented as Jupyter notebooks with background, documentation, and figures for each section of the code. For data loading, masking, and writing files in NIFTI format, we use Nibabel and Nilearn. A variety of functions useful for machine learning are called from Scikit-learn. Each notebook is paired with a publicly available dataset that is analyzed using the code (see Table 1). These datasets have already been pre-processed using standard steps and parameters, allowing the user to focus on learning the analyses. We have compiled a condensed version of these datasets, reducing the number of subjects to ensure that the tutorials run quickly and have reasonable memory requirements [33]. The results from the analyses are plotted using Matplotlib [34] and Seaborn [35]. For network connectivity diagrams, Networkx [36] and Nxviz were used. To load hdf5 files, the Deepdish package was used. The Watchdog package was used to indicate when new files were created.

Table 1

The datasets used in the tutorials.

We are releasing these datasets under the Creative Commons Attribution 4.0 International License. Although some of these datasets are already publicly available, we provide condensed versions of them, along with masks, that are easier to use and may be downloaded from Zenodo (https://doi.org/10.5281/zenodo.2598755). For quicker download speeds, the datasets may also be downloaded from the BrainIAK tutorials website (https://brainiak.org/tutorials).

	Dataset	Source	Used in tutorials	Online archive of the dataset
1.	Faces, places, and objects	[37]	1–5, 7	https://openneuro.org/datasets/ds001926/versions/1.0.1; condensed version available on the BrainIAK tutorials website.
2.	Ninety-six objects	[38]	6	Not on OpenNeuro; condensed version available on the BrainIAK tutorials website.
3.	Faces and scenes	[39]	7, 9	Not on OpenNeuro; condensed version available on the BrainIAK tutorials website.
4.	Lateralized attention	[40]	8	Not on OpenNeuro; condensed version available on the BrainIAK tutorials website.
5.	Pieman story	[10]	10, 11	https://dataspace.princeton.edu/jspui/handle/88435/dsp015d86p269k; condensed version available on the BrainIAK tutorials website.
6.	Raiders movie	[41]	11	https://github.com/HaxbyLab/raiders_data; condensed version available on the BrainIAK tutorials website.
7.	Raiders images	[41]	11	https://github.com/HaxbyLab/raiders_data; condensed version available on the BrainIAK tutorials website.
8.	Sherlock movie	[42]	12	https://openneuro.org/datasets/ds001132/versions/1.0.0; condensed version available on the BrainIAK tutorials website.


BrainIAK

BrainIAK is a software library for advanced fMRI analysis co-designed by cognitive neuroscientists and computer scientists. BrainIAK offers a Python interface and is mostly written in Python, but contains optimized code written in Cython and C++. Many of the methods in BrainIAK scale from a laptop to compute clusters using OpenMP [43] and MPI [44] parallel and distributed computing technologies. BrainIAK assumes that data have already been pre-processed with other pipelines and relies on other packages for plotting. The user is free to use any preprocessing pipeline (e.g., fmriprep, AFNI). Data are exchanged with existing tools such as Nibabel or Nilearn in standard NIFTI and NumPy formats, and our tutorials show how to import data into Python structures and use BrainIAK. Most functions in BrainIAK expect data in a time × voxels format; a notable exception is the searchlight function, which takes 4-D volumes. The BrainIAK package also serves as an ecosystem for users to contribute their own methods while avoiding duplication of methods found in other packages.
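BrainIAK functions work on time × voxels arrays, so a common first step is to flatten a masked 4-D volume. The sketch below does this with NumPy alone. The arrays are synthetic stand-ins for what Nibabel's `nib.load(path).get_fdata()` would return, and the two helper names are hypothetical rather than part of BrainIAK. For NIFTI images, Nilearn's `nilearn.masking.apply_mask` and `NiftiMasker` provide the same operation.

```python
import numpy as np


def volume_to_matrix(volume, mask):
    """Flatten a 4-D (x, y, z, time) array into a (time, voxels) matrix."""
    mask = mask.astype(bool)
    if volume.shape[:3] != mask.shape:
        raise ValueError("mask shape must match the spatial shape of the volume")
    # Boolean indexing over the spatial axes gives (voxels, time); transpose it.
    return volume[mask].T


def matrix_to_volume(matrix, mask):
    """Inverse of volume_to_matrix: put a (time, voxels) matrix back into 4-D."""
    mask = mask.astype(bool)
    volume = np.zeros(mask.shape + (matrix.shape[0],), dtype=matrix.dtype)
    volume[mask] = matrix.T
    return volume


rng = np.random.default_rng(0)
bold = rng.normal(size=(4, 5, 6, 10))        # synthetic run: 4x5x6 voxels, 10 TRs
brain_mask = np.zeros((4, 5, 6), dtype=bool)
brain_mask[1:3, 1:4, 2:5] = True             # 2 * 3 * 3 = 18 voxels inside the mask
X = volume_to_matrix(bold, brain_mask)
print(X.shape)                               # (10, 18)
restored = matrix_to_volume(X, brain_mask)
```

Both helpers rely on NumPy's C-order voxel ordering for boolean masks. As a result, a matrix produced by one function can always be scattered back by the other, for example to save a searchlight or ISC map as an image.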

Hardware configurations

We have provided detailed instructions on how to configure the tutorials on different computing platforms here: https://brainiak.org/tutorials. Multiple installation or usage options are available using: Google Colaboratory for running through a web browser on the cloud, Docker for running on a Macintosh or Windows computer, and Conda for running on a Macintosh computer, Linux server, or high-performance compute cluster. We tested the tutorials on clusters using the SLURM scheduler. We provide scripts to launch Jupyter notebooks on clusters and connect to the tutorials through a web browser via an SSH tunnel. We also provide bash scripts for running the tutorials on these remote servers. For long-running jobs that need large amounts of resources on the cluster, we use Python scripts that are submitted to the cluster as batch jobs instead of the more interactive Jupyter notebooks. These scripts are also provided along with the tutorials.
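Replacing an interactive notebook with a Python script that runs as a batch job can be sketched as follows. This is a hypothetical skeleton, not one of the scripts distributed with the tutorials. It assumes a SLURM job array in which each array task analyzes one subject, selected through the `SLURM_ARRAY_TASK_ID` environment variable that SLURM sets for array jobs. The subject list, the `analyze` placeholder, and the output naming are illustrative assumptions.

```python
import argparse
import json
import os
import tempfile


def pick_subject(subjects, environ):
    """Return this task's subject, indexed by SLURM_ARRAY_TASK_ID (0 if unset)."""
    index = int(environ.get("SLURM_ARRAY_TASK_ID", 0))
    if not 0 <= index < len(subjects):
        raise IndexError(f"array index {index} is out of range for {len(subjects)} subjects")
    return subjects[index]


def analyze(subject):
    """Hypothetical placeholder for the analysis code copied from a notebook."""
    return {"subject": subject, "status": "done"}


def main(argv=None, environ=None):
    parser = argparse.ArgumentParser(description="Analyze one subject per batch task.")
    parser.add_argument("--subjects", nargs="+", required=True)
    parser.add_argument("--outdir", default=".")
    args = parser.parse_args(argv)
    subject = pick_subject(args.subjects, os.environ if environ is None else environ)
    os.makedirs(args.outdir, exist_ok=True)
    out_path = os.path.join(args.outdir, f"{subject}_result.json")
    with open(out_path, "w") as f:
        json.dump(analyze(subject), f)   # results go to disk; nothing is interactive
    return out_path


# Simulate array task 1 locally instead of on the cluster.
with tempfile.TemporaryDirectory() as tmp:
    path = main(["--subjects", "sub-01", "sub-02", "--outdir", tmp],
                environ={"SLURM_ARRAY_TASK_ID": "1"})
    with open(path) as f:
        saved = json.load(f)
print(os.path.basename(path), saved)
```

On a cluster, the same file would end with `if __name__ == "__main__": main()` instead of the local simulation. It would then be launched from a batch script submitted with an array option such as `sbatch --array=0-1`, giving one task per subject.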

Classroom deployment

These notebooks were initially developed for research methods courses taught at an advanced undergraduate/graduate level at Yale and Princeton. Each notebook was intentionally designed to be a suitable length for a weekly problem set that would take students between three and twelve hours, depending on the skill level of the student and complexity of the topic. To implement these tutorials in a classroom setting, we configured cluster resources for the class and distributed and collected assigned notebooks using GitHub Classroom. Another feature of GitHub Classroom is that it keeps student responses private from other students and yet gives the instructors easy access.

Results

Our goal was to create user-friendly educational materials (https://brainiak.org/tutorials) that can be used by novice or expert practitioners to learn how to deploy advanced fMRI analyses in their research. The execution of the notebooks on a cluster is also made simple. If the requisite software and data are installed on the cluster, a user simply needs to connect to the cluster from their laptop/desktop computer, open a web browser, and access the Jupyter notebooks. The tutorials can also be run on the cloud for free via Google Colaboratory. Each tutorial notebook has an overarching theme of a scientific question relevant to cognitive neuroscience. The accompanying notebook exercises help the user understand the method and its applicability to the scientific question by requiring that they generate answers or code. The questions and exercises can be used to formally evaluate students enrolled in a for-credit course (course instructors may contact us for more information). These questions are posed in the context of a publicly available fMRI dataset. These datasets are distributed with the tutorials in a ready-to-use (pre-processed) state. The user is also encouraged to make novel contributions using the method that they learned in the tutorial, either by enhancing the method, creating a new visualization of the data, or even using the method on another dataset, e.g., from OpenNeuro (http://openneuro.org) [45]. Once the user has acquired proficiency in executing the notebooks from a browser, we introduce running programs on clusters by submitting scripts as batch jobs. Each of the notebooks can be run independently. For the beginning and intermediate user, we recommend starting with the first notebook and working through 1–7. After this, the user can choose to focus on a particular method among notebooks 8–13. An advanced user already familiar with Python and machine learning can start with any notebook in the sequence. 
For those who are new to clusters but otherwise proficient at fMRI analysis, the searchlight notebook is a useful starting point. We describe the contents of each notebook (https://brainiak.org/tutorials) in more detail below.
Setup: An introductory notebook to help users learn how to work with Jupyter.
Data handling and normalization: Load fMRI datasets into a Python environment using the Nilearn and Nibabel packages. The importance of normalizing the data is shown via an exercise using a simulated dataset.
Classification: Once the data have been loaded and normalized, the BOLD signal is extracted with a shift to account for hemodynamic lag, and classification is performed using a linear classifier. The importance of separating training and test data is emphasized and cross-validation is introduced. The pitfalls of double-dipping are highlighted and the leave-one-run-out approach is covered. A category localizer dataset is used to examine modular vs. distributed processing in the visual system.
Dimensionality reduction: Introduce principal component analysis (PCA), explore how to select the number of dimensions, and highlight the importance of using cross-validation to perform feature selection. Determine the smallest number of components yielding the “best” decoding accuracy. Show how other dimensionality reduction techniques can be substituted into this pipeline.
Classifier optimization: Use grid search and pipelines from Scikit-learn to tune hyperparameters and perform nested cross-validation. Learn how to handle mild forms of double-dipping that are often unavoidable (e.g., “peeking” at unlabeled test data by including it in z-scoring) by performing permutation tests with randomized labels.
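The core steps of the Classification and Classifier optimization notebooks can be sketched in NumPy: shifting labels to allow for hemodynamic lag, leave-one-run-out cross-validation with normalization fitted on training runs only, and a permutation test with randomized labels. This is an illustration under assumptions, not the tutorials' code. The notebooks use Scikit-learn classifiers and cross-validation utilities, whereas here a nearest-class-mean classifier (which has linear decision boundaries) stands in for brevity. The 2-TR shift, the synthetic data, and the helper names are hypothetical.

```python
import numpy as np


def shift_labels(labels, runs, shift_trs):
    """Delay labels by `shift_trs` TRs within each run to allow for hemodynamic lag.

    TRs left without a label after the shift are marked -1 so they can be dropped.
    """
    shifted = np.full_like(labels, -1)
    for r in np.unique(runs):
        idx = np.flatnonzero(runs == r)
        shifted[idx[shift_trs:]] = labels[idx[:len(idx) - shift_trs]]
    return shifted


def loro_accuracy(X, y, runs):
    """Mean leave-one-run-out accuracy of a nearest-class-mean classifier."""
    accuracies = []
    for test_run in np.unique(runs):
        train, test = runs != test_run, runs == test_run
        # z-score using training-run statistics only, so the test run is never peeked at.
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-8
        X_train, X_test = (X[train] - mu) / sd, (X[test] - mu) / sd
        classes = np.unique(y[train])
        centroids = np.stack([X_train[y[train] == c].mean(axis=0) for c in classes])
        # Assign each test TR to the closest class mean (a linear decision rule).
        dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        predictions = classes[dists.argmin(axis=1)]
        accuracies.append(np.mean(predictions == y[test]))
    return float(np.mean(accuracies))


def permutation_test(X, y, runs, n_perm=100, seed=0):
    """Observed accuracy and a p-value from shuffling labels within each run."""
    rng = np.random.default_rng(seed)
    observed = loro_accuracy(X, y, runs)
    null = np.empty(n_perm)
    for i in range(n_perm):
        y_perm = y.copy()
        for r in np.unique(runs):
            idx = np.flatnonzero(runs == r)
            y_perm[idx] = rng.permutation(y[idx])
        null[i] = loro_accuracy(X, y_perm, runs)
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, float(p_value)


# Synthetic example: 4 runs of 40 TRs, two conditions in 20-TR blocks, 50 voxels.
rng = np.random.default_rng(1)
runs = np.repeat(np.arange(4), 40)
stimulus = np.tile(np.repeat([0, 1], 20), 4)
y = shift_labels(stimulus, runs, shift_trs=2)   # hypothetical 2-TR lag
keep = y >= 0                                   # drop TRs with no label after shifting
class_patterns = rng.normal(size=(2, 50))
X = rng.normal(size=(len(y), 50))
X[keep] += 0.5 * class_patterns[y[keep]]
accuracy, p = permutation_test(X[keep], y[keep], runs[keep])
print(accuracy, p)
```

Shuffling labels within each run keeps every run's label counts intact. The null distribution therefore reflects the same cross-validation structure as the real analysis, including any mild peeking that the real pipeline performs.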
Representational similarity analysis (RSA): Use pattern similarity and representational dissimilarity matrices to explore the neural representation of different categories of objects in a way that can be compared to behavioral judgments and computational models, and to identify unlabeled “mystery” objects.
Searchlights: Explore where in the brain local areas contain multivariate information that discriminates between faces and scenes. The notebook begins with a small mask to build proficiency and ends by running a whole-brain searchlight analysis. It demonstrates how to execute this computationally intensive analysis rapidly on a cluster using batch scripts, and covers resource planning and monitoring of large batch jobs.
Seed-based functional connectivity: To explore how large-scale brain networks, not just individual regions, contribute to cognitive processing, examine the temporal correlation (functional connectivity) between regions. Shows how connectivity changes during an attention task and how to remove stimulus-evoked responses to isolate background connectivity.
Full correlation matrix analysis (FCMA): Rather than focusing on connectivity with one or more seed regions of interest, calculate and analyze an unbiased measure of connectivity: the correlation of every voxel in the brain with every other voxel. Highlights differences between FCMA (which classifies based on connectivity) and MVPA (which classifies based on activity), including brain regions that are equally active for faces and scenes but are differentially connected.
Inter-subject correlation (ISC): Examine what is common across people by measuring correlations over time in the activity of matching voxels in their brains in response to a common stimulus (e.g., a story or movie). Measure functional connectivity across people (inter-subject functional connectivity, ISFC) by correlating non-matching voxels (e.g., between angular gyrus in one subject and hippocampus in another). Shows how these techniques can reveal stimulus-driven variance in the brain by comparing listening to intact vs. scrambled stories.
Shared response model (SRM): A common stimulus across subjects can be used to align subjects' brains functionally, rather than relying on typical anatomical registration. SRM seeks to find shared variance in the fMRI data across subjects, in a reduced-dimension feature space. This yields weights that map between voxels and features, allowing other data to be projected into the aligned space. SRM can also be viewed as a technique for isolating reliable stimulus-related responses by removing responses that are either noise or idiosyncratic to individual subjects. Shows the utility of this approach by improving time-segment matching in movie data and image classification with MVPA.
Event segmentation: Use hidden Markov models (HMMs) to identify a sequence of transitions between stable brain patterns in fMRI data. Illustrates how fitting HMMs to data from high-level brain regions (obtained during movie-watching) subdivides the time series into chunks that track events in the movie. Explores whether retrieving events from memory leads to similar neural transitions.
Real-time fMRI: Most fMRI studies involve collecting data and analyzing them days or weeks later. By analyzing data on the fly, real-time fMRI makes new kinds of experiments possible, such as neurofeedback training and adaptive designs. Demonstrates the use of an fMRI data simulator, which generates brain images at the rate of an fMRI study (every 1–2 s), and then addresses how to pre-process data online and how to complete MVPA or other advanced analyses incrementally, before the next brain image arrives.
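The shared response model covered in the SRM notebook can be illustrated with its deterministic variant. This variant minimizes the summed squared error between each subject's data X_i and W_i S, where each subject's weight matrix W_i is constrained to have orthonormal columns and S is the shared response. Given S, each W_i has a closed-form orthogonal Procrustes solution; given the W_i, S is the average of the subjects' projected data. The function below is a NumPy sketch on synthetic data, assuming one voxels × TRs array per subject. BrainIAK's `brainiak.funcalign.srm` module has its own implementations (its `SRM` class fits the probabilistic version), and this function is not that code.

```python
import numpy as np


def det_srm(data, n_features, n_iter=10, seed=0):
    """Deterministic shared response model (minimal sketch).

    data: list of per-subject arrays shaped (n_voxels_i, n_TRs); voxel counts may differ.
    Returns (weights, shared): weights[i] is (n_voxels_i, n_features) with
    orthonormal columns; shared is (n_features, n_TRs).
    Minimizes sum_i ||X_i - W_i S||_F^2 subject to W_i^T W_i = I.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal starting weights from QR decompositions.
    weights = [np.linalg.qr(rng.normal(size=(x.shape[0], n_features)))[0] for x in data]
    for _ in range(n_iter):
        # Update S: average of each subject's data projected into feature space.
        shared = np.mean([w.T @ x for w, x in zip(weights, data)], axis=0)
        # Update each W_i: orthogonal Procrustes solution U V^T of X_i S^T.
        weights = []
        for x in data:
            u, _, vt = np.linalg.svd(x @ shared.T, full_matrices=False)
            weights.append(u @ vt)
    shared = np.mean([w.T @ x for w, x in zip(weights, data)], axis=0)
    return weights, shared


def srm_loss(data, weights, shared):
    """Total squared reconstruction error that the alternating updates reduce."""
    return float(sum(np.sum((x - w @ shared) ** 2) for x, w in zip(data, weights)))


# Synthetic example: 3 subjects with different voxel counts share a 3-D response.
rng = np.random.default_rng(0)
true_shared = rng.normal(size=(3, 100))
data = []
for n_voxels in (40, 50, 60):
    w_true = np.linalg.qr(rng.normal(size=(n_voxels, 3)))[0]
    data.append(w_true @ true_shared + 0.1 * rng.normal(size=(n_voxels, 100)))

weights, shared = det_srm(data, n_features=3)
# Any data from subject 0 (here, its training data) can be projected with W_0^T.
projected = weights[0].T @ data[0]
print(srm_loss(data, weights, shared))
```

Each W_i maps one subject's voxels into the same low-dimensional feature space. Data the model was not fit on, such as a held-out part of a movie, can therefore be projected with W_i^T and compared across subjects, as in the time-segment matching analysis.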

Cluster computing

Analyses that require either a long run-time or large memory are best run in batch mode. The Jupyter notebooks for these jobs serve as templates and may be used as the starting point for a batch script. Once the contents of a notebook have been learned, the user is directed to execute the batch scripts associated with it on the cluster. Executing batch jobs on clusters is non-trivial, as it involves requesting the correct amount of memory, number of tasks, and run time. Given the non-interactive nature of most clusters, debugging performance issues can be challenging. In the Searchlight notebook we have provided step-by-step instructions for cluster execution. To make the transition to running on clusters easier, we provide recommendations such as running small samples of the analyses and extrapolating from them to estimate the memory and time needed for the entire dataset. We also provide batch scripts with parameters that can be changed to fit the needs of the user. Finally, we provide some basic tips on how to monitor the status of batch jobs on the clusters.
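The advice to run small samples of an analysis and extrapolate can be made concrete with a short helper. It fits a straight line to pilot measurements and turns the prediction into padded SLURM directives. This is a hypothetical sketch: it assumes that run time and memory grow roughly linearly with the number of searchlight centers. The pilot numbers, the 50% safety margin, and the helper names are invented for illustration. The `--ntasks`, `--time` (hours:minutes:seconds), and `--mem` (with a `G` suffix) options are standard `sbatch` directives.

```python
import math


def extrapolate(sizes, values, full_size):
    """Least-squares line through pilot (size, value) pairs, evaluated at full_size."""
    n = len(sizes)
    mean_x, mean_y = sum(sizes) / n, sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, values))
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    return mean_y + (sxy / sxx) * (full_size - mean_x)


def sbatch_header(seconds, mem_gb, ntasks=1, margin=0.5):
    """Format #SBATCH directives, padding time and memory by a safety margin."""
    total = math.ceil(seconds * (1 + margin))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return "\n".join([
        f"#SBATCH --ntasks={ntasks}",
        f"#SBATCH --time={hours:02d}:{minutes:02d}:{secs:02d}",
        f"#SBATCH --mem={math.ceil(mem_gb * (1 + margin))}G",
    ])


# Hypothetical pilot runs: number of searchlight centers vs. wall time and peak memory.
pilot_centers = [500, 1000, 2000]
pilot_seconds = [65, 125, 245]
pilot_mem_gb = [1.2, 1.4, 1.8]
full_centers = 40000

est_seconds = extrapolate(pilot_centers, pilot_seconds, full_centers)
est_mem_gb = extrapolate(pilot_centers, pilot_mem_gb, full_centers)
header = sbatch_header(est_seconds, est_mem_gb)
print(header)
# #SBATCH --ntasks=1
# #SBATCH --time=02:00:08
# #SBATCH --mem=26G
```

A linear fit is only a first guess. Peak memory in particular is often dominated by the data loaded at the start rather than by the number of centers. It is therefore worth checking how the pilot measurements actually scale before trusting the extrapolation.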

Other resources

To use the tutorials, a user will need to interact with multiple software tools. To make it easier for a new user to navigate these tools, we have created a website https://github.com/brainiak/brainiak-tutorials/wiki/Resources, where a new user can access tutorials and become familiar with Python, GitHub, and Unix. Furthermore, our goal for these tutorials was to cover advanced fMRI analysis and hence our tutorials do not cover pre-processing methods, General Linear Model analysis, or software deployment options (e.g., containers) in great detail. An exhaustive list covering multiple helpful tools and tutorials is available here: https://github.com/ohbm/hackathon2019/blob/master/Tutorial_Resources.md.

Availability and future directions

These tutorials and their associated datasets can be accessed here: https://brainiak.org/tutorials. At the time of writing there are 13 notebooks available. As time permits, we intend to produce more tutorials as needs or new methods demand. The methods/tools that are available in BrainIAK but not covered in the tutorials are: Bayesian derived methods for RSA; Topographic Factor Analysis; and an fMRI Simulator. We welcome contributions to BrainIAK from the community, in the form of code and tutorials added via GitHub.

26 Aug 2019
Dear Dr Kumar,
Thank you very much for submitting your manuscript 'BrainIAK tutorials: user-friendly learning materials for advanced fMRI analysis' for review by PLOS Computational Biology. Your manuscript has been fully evaluated by the PLOS Computational Biology editorial team and in this case also by independent peer reviewers. Your paper describes a valuable contribution, and the examples are comprehensive. The reviewers have raised some important points concerning the implementation of the tutorials (in addition to the feedback you already received from the community), and the inclusion of this set of tools in the panorama of a wider community effort. While your manuscript cannot be accepted in its present form, we are willing to consider a revised version in which the issues raised by the reviewers have been adequately addressed. We cannot, of course, promise publication at that time. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Your revisions should address the specific points made by each reviewer. Please return the revised version within the next 60 days.
If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org. Revised manuscripts received beyond 60 days may require evaluation and peer review similar to that applied to newly submitted manuscripts.
In addition, when you are ready to resubmit, please be prepared to provide the following:
(1) A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.
(2) A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible, to show clearly where changes have been made to their manuscript, e.g., by highlighting text.
(3) A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.
Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action. Some key points to remember are:
- Figures uploaded separately as TIFF or EPS files (if you wish, your figures may remain in your main manuscript file in addition).
- Supporting Information uploaded as separate files, titled Dataset, Figure, Table, Text, Protocol, Audio, or Video.
- Funding information in the 'Financial Disclosure' box in the online system.
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future.

We are sorry that we cannot be more positive about your manuscript at this stage, but if you have any concerns or questions, please do not hesitate to contact us.

Sincerely,
Daniele Marinazzo
Deputy Editor
PLOS Computational Biology

If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately.

Reviewer's Responses to Questions

Comments to the Authors:
Please note here if the review is uploaded as an attachment.

Reviewer #1: Review uploaded as attachment

Reviewer #2: By drafting the described tutorials, the authors have provided a substantial, valuable contribution to the field. I do, however, have significant concerns about their framing of this contribution in the present manuscript. In particular, the authors fail to acknowledge related efforts across open neuroimaging. I hope that this review provides some constructive feedback as to where they could better link their efforts with those of other community members so that a reader might better understand the impact of the described tutorials. The full review is uploaded as an attachment.
Reviewer #3:

# Summary and general comments

In this submission, Kumar and colleagues present a library called BrainIAK for machine learning in functional neuroimaging, and an accompanying set of tutorials. The tutorials are presented in the form of Jupyter notebooks, and are accessible either locally through containers or online on the Google Colab platform. They also include instructions for deployment on high-performance computing infrastructure. The data used in the tutorials are freely available and specially prepared for use as part of a training activity. As a strength, the material covered in the tutorials includes inter-subject correlation and representational similarity analysis, two applications which, to my knowledge, are not well covered by currently available tutorials. Overall, this new library and its tutorials are remarkably comprehensive, and I believe they will represent a very valuable resource for the community. My only major concern is that the authors did not properly position their work relative to other efforts.

# Minor comments

* Abstract: citing a specific list of trainings and hackathons will become obsolete within a few months. Maybe stay vague there.
* The introduction repeatedly claims a lack of existing educational material. There is a huge amount of general-purpose tutorials for machine learning, most notably the scikit-learn documentation. There are at least three extensive packages with many tutorials: nilearn, PyMVPA and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4956688/. The authors should briefly review these other resources and explain how BrainIAK adds to them (paragraph at line 120). Which techniques are not currently covered by tutorials?
* l. 211: would you have recommendations for resources to preprocess the data that would integrate well with BrainIAK? In particular, you may want to discuss whether detailed instructions are available for importing minimally preprocessed data, such as the data generated by fMRIPrep.
* No material is presented to demonstrate that the proposed material achieves the stated goals. Survey results from a workshop, for example, would add some support to the usefulness of the resources.

# Optional suggestions

Below are two suggestions. I (as a reviewer) do not think it is necessary to implement these suggestions prior to publication. I am providing these suggestions in the hope that the authors may find them useful and may choose to follow up on some of them.

* BrainIAK should go through a proper code review, as a library. Consider a submission to the Journal of Open Source Software (JOSS) for the library component of BrainIAK.
* I have not reviewed the tutorials themselves, but tried to evaluate whether BrainIAK adds conceptually to existing software resources. As part of the NeuroLibre platform, a detailed technical review of the notebooks has been performed by two reviewers. I would encourage the authors to address these technical issues.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Oscar Esteban
Reviewer #2: No
Reviewer #3: Yes: Pierre Bellec

Submitted filename: PCOMPBIOL-D-19-01130.pdf
Submitted filename: review-PCOMPBIOL-D-19-01130.pdf

15 Oct 2019

Submitted filename: Brainiak Tutorials Paper_response_letter_v3.pdf

17 Nov 2019

Dear Dr Kumar,

We are pleased to inform you that your manuscript 'BrainIAK tutorials: user-friendly learning materials for advanced fMRI analysis' has been provisionally accepted for publication in PLOS Computational Biology. Please make sure to update the table and references as requested by reviewer 2.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Once you have received these formatting requests, please note that your manuscript will not be scheduled for publication until you have made the required changes.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pcompbiol/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process.

One of the goals of PLOS is to make science accessible to educators and the public. PLOS staff issue occasional press releases and make early versions of PLOS Computational Biology articles available to science writers and journalists. PLOS staff also collaborate with Communication and Public Information Offices and would be happy to work with the relevant people at your institution or funding agency. If your institution or funding agency is interested in promoting your findings, please ask them to coordinate their releases with PLOS (contact ploscompbiol@plos.org).

Thank you again for supporting Open Access publishing. We look forward to publishing your paper in PLOS Computational Biology.
Sincerely,
Daniele Marinazzo
Deputy Editor
PLOS Computational Biology
Reviewer's Responses to Questions

Comments to the Authors:
Please note here if the review is uploaded as an attachment.

Reviewer #1: Please find my comments attached. Oscar Esteban.

Reviewer #2: The authors have addressed my major concerns, and the revised manuscript significantly better situates these contributions in the context of the broader field. Several minor notes and clarifications:

1. I am delighted that the authors are careful to cite supporting software, but I noticed that two technologies are missing from the reference list: Jupyter Notebooks and OpenNeuro (formerly OpenfMRI). These citations are, respectively:

Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B.E., Bussonnier, M., Frederic, J., Kelley, K., Hamrick, J.B., Grout, J., Corlay, S., Ivanov, P., Avila, D., Abdalla, S., Willing, C., & Jupyter development team (2016). Jupyter Notebooks - a publishing format for reproducible computational workflows. In Positioning and Power in Academic Publishing: Players, Agents and Agendas. doi: 10.3233/978-1-61499-649-1-87.

and

Poldrack, R.A., Barch, D.M., Mitchell, J.P., Wager, T.D., Wagner, A.D., Devlin, J.T., Cumba, C., Koyejo, O., and Milham, M.P. (2013). Toward open sharing of task-based fMRI data: the OpenfMRI project. Frontiers in Neuroinformatics, 7, 1–12.

Could the authors please update the text to include these references?

2. I appreciate the authors' clarification as to why Google Drive links were included for the datasets. I was also pleased to see that the data used in the tutorials are now directly available in Zenodo, as this provides better long-term archiving. Would the authors be willing to update their caption for Table 1 to link directly to the Zenodo archive? This would ensure better long-term access to the exact data source used in the tutorials, as the authors point out that the versions available from, e.g., OpenNeuro do not match those used in the lessons.
Reviewer #3: Thanks for appropriately addressing all of my comments, and congratulations on a very valuable contribution.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Oscar Esteban
Reviewer #2: No
Reviewer #3: Yes: Pierre Bellec

Submitted filename: PCOMPBIOL-D-19-01130_R1.pdf

9 Dec 2019

PCOMPBIOL-D-19-01130R1

BrainIAK tutorials: User-friendly learning materials for advanced fMRI analysis

Dear Dr Kumar,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Laura Mallard
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol