Torsten Rohlfing, Kevin Cummins, Trevor Henthorn, Weiwei Chu, B. Nolan Nichols.
Abstract
The data collection infrastructure implemented by the National Consortium on Alcohol and NeuroDevelopment in Adolescence (N-CANDA) comprises several innovative features: (a) secure, asynchronous transfer and persistent storage of collected data via a revision control system; (b) two-stage import into a longitudinal database; and (c) use of a script-controlled web browser for data retrieval from a third-party, web-based neuropsychological test battery. The asynchronous operation of data transmission and import is of particular benefit, as it allowed the consortium sites to begin data collection before the receiving database infrastructure had been deployed. The first records were collected 86 days after funding began, 35 days after the set of instruments was finalized. The last instruments were added to the database import 225 days after instrument selection, by which time up to 173 records per instrument had already been collected. Thus, the concepts implemented in N-CANDA's data collection system helped reduce project start-up time by several months.
Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Keywords: data integration; informatics; longitudinal data collection; revision control system
Year: 2013; PMID: 24296908; PMCID: PMC4078281; DOI: 10.1136/amiajnl-2013-002367
Source DB: PubMed; Journal: J Am Med Inform Assoc; ISSN: 1067-5027; Impact factor: 4.497
Data captured by the N-CANDA data collection laptops. The ‘data conversion’ steps are executed automatically, with no user interaction, by the consortium server running Linux.
| Instrument | Data collection | Data conversion |
|---|---|---|
| Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA). | Legacy survey implemented with the Blaise computer-assisted interviewing system ( | Proprietary database format; raw data records extracted using Blaise's ‘Manipula’ tool, running in batch mode using the ‘Wine’ Windows Emulator ( |
| Delay discounting | Third-party Windows application | Simple tabular text file, parsed using a custom Python script, ‘dd2csv’. |
| Paced Auditory Serial-Addition Task (PASAT) | Third-party Windows application | Database file in Microsoft Access format is converted to CSV format using open-source mdb-tools ( |
| Stroop Match-to-Sample (MtS) | ePrime ( | Human-readable log file, parsed and scores computed using a custom Python script, ‘eprime2redcap’. |
| Several study-specific instrument batteries (herein referred to as: Youth Report 1, Youth Report 2, Parent Report, MRI Report) | Customized version of LimeSurvey ( | Single-record, comma-separated file, reformatted by a custom Python script, ‘lime2csv’, to remove special characters from field names, etc. |
All custom scripts referenced in this table are available for download at https://www.nitrc.org/frs/?group_id=672.
CSV, comma-separated values; N-CANDA, National Consortium on Alcohol and NeuroDevelopment in Adolescence.
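The table describes the ‘lime2csv’ step only as removing special characters from field names. The following Python sketch illustrates that kind of header clean-up; it is not the consortium's script, and the function names `sanitize_field_name` and `sanitize_csv_header`, as well as the exact replacement rule (runs of non-alphanumeric characters become a single underscore, names are lower-cased), are assumptions made for illustration.

```python
import csv
import io
import re


def sanitize_field_name(name: str) -> str:
    """Return a database-friendly field name (hypothetical rule, not lime2csv's)."""
    # Collapse every run of non-alphanumeric characters into one underscore,
    # then trim underscores from both ends and lower-case the result.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or "field"


def sanitize_csv_header(text: str) -> str:
    """Rewrite only the header row of a CSV document; data rows are untouched."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return text
    rows[0] = [sanitize_field_name(name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

For example, a header such as `Q1 [SQ001]: Age?` would become `q1_sq001_age` under this rule, while the record values in the following row pass through unchanged.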
Figure 1. Illustration of the laptop data transmission and import procedure using a networked Subversion repository and the ‘harvester’ Python script.
Figure 2. Illustration of the two-stage import procedure for data from N-CANDA data collection laptop or WebCNP system into the longitudinal REDCap project database. N-CANDA, National Consortium on Alcohol and NeuroDevelopment in Adolescence; WebCNP, Web-based Computerized Neuropsychological Testing.
Figure 3. Number of records collected over time after funding began (September 15, 2012) for different instruments of the N-CANDA protocol. The protocol was finalized during a consortium meeting on November 5 and 6, 2012 (day 51; marked by a dotted line across the graphs). In the graph for each instrument, an arrow marks the day when the first record was collected, and the dashed gray line marks the date when the REDCap database and import procedure were ready for that instrument. N-CANDA, National Consortium on Alcohol and NeuroDevelopment in Adolescence.
Instruments collected by the N-CANDA consortium (partial list as reported in this article), showing date of first collected record, date when database and import were ready for each instrument, and number of records collected by the time database and import were ready
| Instrument | First record collected | Database and import ready | Records collected prior to database ready |
|---|---|---|---|
| Delay discounting | 2013-01-18 (day 125) | 2013-04-05 (day 202) | 46 |
| PASAT | 2013-01-11 (day 118) | 2013-04-05 (day 202) | 54 |
| Stroop MtS | 2013-01-11 (day 118) | 2013-04-27 (day 224) | 74 |
| SSAGA (parent) | 2012-12-14 (day 90) | 2013-05-23 (day 250) | 98 |
| SSAGA (youth) | 2013-01-05 (day 112) | 2013-05-23 (day 250) | 132 |
| MRI report | 2013-01-13 (day 120) | 2013-06-18 (day 276) | 141 |
| Parent report | 2012-12-10 (day 86) | 2013-06-18 (day 276) | 161 |
| Youth report 1 | 2013-01-13 (day 120) | 2013-06-18 (day 276) | 173 |
| Youth report 2 | 2013-01-14 (day 121) | 2013-06-18 (day 276) | 155 |
‘Day’ is the number of days after project funding was received (September 15, 2012).
MtS, Match to Sample; N-CANDA, National Consortium on Alcohol and NeuroDevelopment in Adolescence; PASAT, Paced Auditory Serial-Addition Task; SSAGA, Semi-Structured Assessment for the Genetics of Alcoholism.
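The ‘day’ values in the table, and the intervals quoted in the abstract, follow from simple date arithmetic relative to the funding date. The Python sketch below reproduces them from dates given in the table and in the Figure 3 caption; the names `FUNDING_START` and `study_day` are illustrative and do not come from the article.

```python
from datetime import date

# Funding date as stated in the table footnote.
FUNDING_START = date(2012, 9, 15)


def study_day(d: date) -> int:
    """Number of days elapsed between the funding date and the given date."""
    return (d - FUNDING_START).days


protocol_final = date(2012, 11, 5)      # first day of the consortium meeting
first_record = date(2012, 12, 10)       # Parent report, first record collected
last_import_ready = date(2013, 6, 18)   # last database/import readiness date

print(study_day(protocol_final))                                # 51
print(study_day(first_record))                                  # 86
print(study_day(first_record) - study_day(protocol_final))      # 35
print(study_day(last_import_ready) - study_day(protocol_final)) # 225
```

The first two values match the day numbers shown in the figure caption and the table, and the last two match the 35-day and 225-day intervals reported in the abstract.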