Teaching Computational Reproducibility for Neuroimaging
K. Jarrod Millman, Matthew Brett, Ross Barnowski, Jean-Baptiste Poline.
Abstract
We describe a project-based introduction to reproducible and collaborative neuroimaging analysis. Traditional teaching on neuroimaging usually consists of a series of lectures that emphasize the big picture rather than the foundations on which the techniques are based. The lectures are often paired with practical workshops in which students run imaging analyses using the graphical interface of specific neuroimaging software packages. Our experience suggests that this combination leaves the student with a superficial understanding of the underlying ideas, and an informal, inefficient, and inaccurate approach to analysis. To address these problems, we based our course around a substantial open-ended group project. This allowed us to teach: (a) computational tools to ensure computationally reproducible work, such as the Unix command line, structured code, version control, automated testing, and code review and (b) a clear understanding of the statistical techniques used for a basic analysis of a single run in an MR scanner. The emphasis we put on the group project showed the importance of standard computational tools for accuracy, efficiency, and collaboration. The projects were broadly successful in engaging students in working reproducibly on real scientific questions. We propose that a course on this model should be the foundation for future programs in neuroimaging. We believe it will also serve as a model for teaching efficient and reproducible research in other fields of computational science.
Keywords: FMRI; Python language; computational reproducibility; data science; education; neuroimaging; scientific computing; statistics
Year: 2018 PMID: 30405329 PMCID: PMC6204391 DOI: 10.3389/fnins.2018.00727
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
Figure 1. The initial and final repository directory templates for student projects. We gave the students a project repository copied from the initial template in week 5, from which they would write and build their initial proposal. As the course progressed, we pushed two updates to their repository. The first gave them a template for building their final report, their top-level README.md file, and a file specifying an open license for their work. The second gave them: an initial set of functions we had shown them in class, with associated tests (in the code directory, not shown); machinery to trigger automatic execution of tests on Travis-CI servers; automated code-coverage reporting; and a template for writing slides for their first progress report. (A) Initial repository template. (B) Final project template.
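The starter functions and tests are not reproduced in this record. As a rough sketch of the kind of tested code the template's code directory could hold (the function `vol_means` and its test are our own invented example, not the course's actual code), a pytest-style test might look like:

```python
# Hypothetical sketch of a starter function plus its automated test;
# pytest would discover test_* functions, and a CI service such as
# Travis-CI could run them on every push and report line coverage.
import numpy as np

def vol_means(data):
    """Return the mean signal of each 3D volume in a 4D array."""
    return data.reshape(-1, data.shape[-1]).mean(axis=0)

def test_vol_means():
    # Two 2x2x2 volumes filled with the constants 1 and 3.
    data = np.stack([np.ones((2, 2, 2)), np.full((2, 2, 2), 3.0)], axis=-1)
    assert np.allclose(vol_means(data), [1.0, 3.0])
```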
Figure 2. Class lecture material giving instructions for the first progress report. This is the text we showed and discussed in class to prepare the students for their first progress report. As with almost all our lecture material, we posted this text to the class website.
Figure 3. The first few lines of diagnosis_script.py. This file was part of the second homework, which focused on detecting outlier 3D volumes in a 4D FMRI image. Tasks included: (a) implementing functions on image arrays using NumPy, (b) exploring the FMRI data for outliers, (c) running least-squares fits on different models, and (d) making and saving plots with Matplotlib.
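The body of diagnosis_script.py is not included in this record. Purely as an illustration of task (b), a volume-wise outlier detector along these lines (the input filename and the 1.5 × IQR fence are our assumptions, not the assignment's actual specification) could be:

```python
# Sketch: flag outlier 3D volumes in a 4D FMRI image by their mean
# signal, using an interquartile-range fence. Filename is hypothetical.
import numpy as np
import nibabel as nib
import matplotlib.pyplot as plt

img = nib.load('ds114_sub009_t2r1.nii')   # hypothetical 4D image file
data = img.get_fdata()                    # shape (I, J, K, n_volumes)

# Mean signal of each 3D volume.
means = data.reshape(-1, data.shape[-1]).mean(axis=0)

# IQR fence: values beyond 1.5 * IQR from the quartiles are outliers.
q1, q3 = np.percentile(means, [25, 75])
iqr = q3 - q1
outliers = (means < q1 - 1.5 * iqr) | (means > q3 + 1.5 * iqr)

plt.plot(means, label='volume means')
plt.plot(np.nonzero(outliers)[0], means[outliers], 'rx', label='outliers')
plt.xlabel('volume index')
plt.ylabel('mean signal')
plt.legend()
plt.savefig('vol_means_outliers.png')
```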
Project grading rubric.
| Category | Check minus | Check | Check plus |
|---|---|---|---|
| Questions | Questions overly simplistic, unrelated, or unmotivated | Questions appropriate, coherent, and motivated | Questions well motivated, interesting, insightful, and novel |
| Analysis | Choice of analysis overly simplistic or incomplete | Analysis appropriate | Analysis appropriate, complete, advanced, and informative |
| Results | Conclusions missing, incorrect, or not based on analysis | Conclusions relevant, but partially correct or partially complete | Relevant conclusions tied to analysis and context |
| | Inappropriate choice of plots; poorly labeled plots; plots missing | Plots convey information but lack context for interpretation | Plots convey information correctly with adequate and appropriate reference information |
| Collaboration | Few members contributed substantial effort, or each member worked on only part of the project | All members contributed substantial effort and everyone contributed to all aspects of the project | All members contributed substantial effort to each project aspect |
| Tests | Tests incomplete, incorrect, or missing | Tests cover most of the project code | Extensive and comprehensive testing |
| Code review | Pull requests not adequately used, reviewed, or improved | Pull requests adequately used, reviewed, and improved | Code review substantial and extensive |
| Documentation | Poorly documented | Adequately documented | Well documented |
| Readability | Code readability inconsistent or poor | Code readability consistent and good quality | Code readability excellent |
| Organization | Poorly organized and structured repository | Reasonably organized and clear structure | Elegant and transparent code organization |
| Presentation | Verbal presentation illogical, incorrect, or incoherent | Verbal presentation partially correct but incomplete or unconvincing | Verbal presentation correct, complete, and convincing |
| | Visual presentation cluttered, disjoint, or illegible | Visual presentation is readable and clear | Visual presentation appealing, informative, and crisp |
| | Verbal and visual presentation unrelated | Verbal and visual presentation related | Verbal and visual presentation clearly related |
| Writing | Explanation illogical, incorrect, or incoherent | Explanation correct, complete, and convincing | Explanation correct, complete, convincing, and elegant |
| Reproducibility | Code didn't run | Makefile recipes fetch data, validate the fetched data, and generate all results and figures in the report | Makefiles generate EDA work and supplementary analysis |
An “A” was roughly two or more check pluses and no check minuses.
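The "check" level of the Reproducibility row expects Makefile recipes that fetch and validate data before building the report. As a hedged sketch of what such a recipe might call (the URL, checksum, and script name below are hypothetical, not from the paper), a Makefile target like `data: ; python fetch_data.py` could run:

```python
# fetch_data.py -- hypothetical sketch of a fetch-and-validate step that
# a Makefile recipe might invoke; the paper does not give this code.
import hashlib
import urllib.request

URL = 'https://example.org/ds114_data.tar.gz'        # hypothetical URL
EXPECTED_SHA256 = 'replace-with-published-checksum'  # hypothetical value

def fetch_and_validate(url=URL, expected=EXPECTED_SHA256,
                       filename='ds114_data.tar.gz'):
    # Download the archive, then check its SHA-256 digest against the
    # published value before any analysis is allowed to proceed.
    urllib.request.urlretrieve(url, filename)
    digest = hashlib.sha256(open(filename, 'rb').read()).hexdigest()
    if digest != expected:
        raise ValueError(f'Checksum mismatch for {filename}: {digest}')
    return filename

if __name__ == '__main__':
    fetch_and_validate()
```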
GitHub and other code metrics for student projects.
| Group | Commits | Issues | PRs | Comments | Words / comment | LoC | % covered |
|---|---|---|---|---|---|---|---|
| Alpha | 787 | 23 | 190 | 379 | 24.7 | 3,293 | 3.7 |
| Beta | 534 | 7 | 147 | 105 | 20.1 | 1,753 | 2.0 |
| Delta | 571 | 31 | 121 | 117 | 35.1 | 996 | 21.3 |
| Epsilon | 593 | 26 | 310 | 79 | 40.9 | 1,809 | 19.5 |
| Eta | 259 | 11 | 89 | 44 | 21.7 | 588 | 12.4 |
| Gamma | 329 | 4 | 79 | 35 | 22.3 | 1,040 | 37.6 |
| Iota | 414 | 26 | 113 | 144 | 24.3 | 928 | 8.4 |
| Kappa | 337 | 30 | 99 | 86 | 16.6 | 1,157 | 3.5 |
| Lambda | 365 | 22 | 67 | 82 | 17.6 | 732 | 91.5 |
| Theta | 547 | 25 | 133 | 450 | 22.1 | 1,186 | 23.1 |
| Zeta | 344 | 3 | 49 | 21 | 20.8 | 8,287 | 6.2 |
Commits: number of Git commits in the repository. Issues: number of GitHub issues. PRs: number of GitHub pull requests. Comments: number of comments on issues or PRs, not including the initial description of the issue or PR. Words / comment: mean number of words per comment. LoC: total lines of code. % covered: estimated percentage of the lines in the LoC column that were covered by tests.