Charles P. Schmitt, Margaret Burchinal.
Abstract
The success of research in the field of maternal-infant health, or in any scientific field, relies on the adoption of best practices for data and knowledge management. Prior work by our group and others has identified evidence-based solutions to many existing data management challenges, including cost-effective practices for ensuring high-quality data entry and for properly constructing and maintaining data standards and ontologies. Quality assurance practices for data entry and processing are necessary to ensure that data are not degraded during processing, but these practices have not been widely adopted in psychology and biology. Furthermore, collaborative research is becoming more common. Collaborative research often involves multiple laboratories, different scientific disciplines, numerous data sources, large data sets, and data sets from public and commercial sources. These factors present new challenges for data and knowledge management. Data security and privacy concerns grow when data may be accessed by investigators affiliated with different institutions. Collaborative groups must address the challenges of federating data access between the data-collecting sites and a centralized data management site. Merging the ontologies of different data sets can be a formidable task, especially in fields whose ontologies are still evolving. The increased use of automated data acquisition can yield more data, but it can also increase the risk of introducing errors or systematic biases. In addition, integrating data collected from different assay types often requires the development of new analysis tools. All of these challenges increase the cost and time spent on data management for a given project, and they raise the risk that data quality will decline.
In this paper, we review these issues and discuss theoretical and practical approaches for addressing them.
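The abstract refers to cost-effective practices for high-quality data entry without spelling them out. One widely used quality assurance practice is double data entry: two operators independently key the same source forms, and a program flags every disagreement so it can be checked against the original form. The sketch below is a minimal illustration under that assumption; the function names, the record IDs, and the field names (`birth_weight_g`, `gest_age_wk`) are hypothetical and are not taken from the paper or presented as the authors' procedure.

```python
"""Minimal double-data-entry comparison (illustrative sketch; all names hypothetical)."""
from dataclasses import dataclass
from typing import Dict, List, Optional

Record = Dict[str, str]  # field name -> keyed value


@dataclass(frozen=True)
class Discrepancy:
    record_id: str
    field: str              # "*" means the whole record is missing from one pass
    first: Optional[str]    # value from entry pass 1 (None if absent)
    second: Optional[str]   # value from entry pass 2 (None if absent)


def compare_passes(first: Dict[str, Record], second: Dict[str, Record]) -> List[Discrepancy]:
    """List every disagreement between two independent entry passes of the same forms."""
    out: List[Discrepancy] = []
    for rid in sorted(set(first) | set(second)):
        a, b = first.get(rid), second.get(rid)
        if a is None or b is None:
            # A record keyed in only one pass is reported as a whole-record discrepancy.
            out.append(Discrepancy(rid, "*",
                                   None if a is None else "present",
                                   None if b is None else "present"))
            continue
        for field in sorted(set(a) | set(b)):
            va, vb = a.get(field), b.get(field)
            # Ignore surrounding whitespace only; any other difference is a real disagreement.
            if (va or "").strip() != (vb or "").strip():
                out.append(Discrepancy(rid, field, va, vb))
    return out


def disagreement_rate(first: Dict[str, Record], second: Dict[str, Record]) -> float:
    """Share of fields in records keyed in both passes that disagree (a rough QA metric)."""
    compared = sum(len(set(first[rid]) | set(second[rid])) for rid in set(first) & set(second))
    if compared == 0:
        return 0.0
    field_level = [d for d in compare_passes(first, second) if d.field != "*"]
    return len(field_level) / compared


if __name__ == "__main__":
    pass1 = {"P001": {"birth_weight_g": "3250", "gest_age_wk": "39"},
             "P002": {"birth_weight_g": "2890", "gest_age_wk": "37"}}
    pass2 = {"P001": {"birth_weight_g": "3520", "gest_age_wk": "39"},
             "P002": {"birth_weight_g": "2890", "gest_age_wk": "37 "}}
    for d in compare_passes(pass1, pass2):
        print(d)  # flags only P001 birth_weight_g ('3250' vs '3520', a transposed keystroke)
    print(disagreement_rate(pass1, pass2))
```

Flagged items would then be resolved against the paper source rather than by choosing either pass automatically; the disagreement rate gives a simple way to monitor entry quality over time.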
Keywords: collaborative research; data entry; data integration; data management
Year: 2011 PMID: 21811476 PMCID: PMC3143734 DOI: 10.3389/fpsyt.2011.00047
Source DB: PubMed Journal: Front Psychiatry ISSN: 1664-0640 Impact factor: 4.157
Figure 1. Schematic of data exchange in a multi-site research collaboration. In such a collaboration, data coming from and going to individual laboratories and scientific cores (e.g., metabolomics, proteomics, and imaging cores) must be managed and integrated along with the annotations describing the data.
Factors to consider when developing the technical approach.
| Factors | Description |
|---|---|
| Personnel skills and resources | Identify the IT staff and technical skills already in place at the receiving and distribution sites, and determine whether they are adequate for the planned approaches. In particular, consider whether personnel with the skill sets required for every task are available. |
| Data retrieval/publishing mechanisms | Identify the in-place (or planned) mechanisms for data access that will be used for distributing and retrieving data from laboratories and other data sources. |
| Data issues | Consider the types of data that are being transferred, the formats that the data will have, and the transformations of the data that will be required. |
| Integration requirements | Consider how the data will be integrated and where the integration will take place. For example, will the data be integrated "on demand" by users at their sites, or will they be pre-computed? Will laboratories need full access to the integrated data or only to subsets? What software will be used with the integrated data, and where will that software reside? Should the integrated data be managed under the same best practices as the source data, including auditing of changes? |
| Scale | Consider the computational and storage requirements for the integrated data and for use of the data. If these requirements are substantial, can the laboratories meet them, or will they need additional disk space or computational support? |
| Policies | Consider the policies regarding access, sharing, and movement of the data for integration. Also, consider the policies regarding the integrated data. What privacy and security mechanisms need to be put in place? Does the integration of data change regulatory requirements? Are there differences in Institutional Review Board policies between institutions? |
| Provenance | Consider the requirements for tracking the integration of data and the use of the integrated data. What result sets must be reproducible? |
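The Provenance row asks which result sets must be reproducible. One common way to make that question checkable is to store, for each integration step, content hashes of the exact inputs, the software version, the parameters, and a hash of the output; a later re-run can then confirm whether the same inputs still produce the same result. The sketch below assumes that approach. `ProvenanceRecord`, `integrate`, and `is_reproducible` are hypothetical names, and the "integration" is a toy concatenation standing in for a real merge, not the authors' method.

```python
"""Hash-based provenance record for an integration step (illustrative sketch)."""
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict, Tuple


def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceRecord:
    step: str                       # hypothetical name of the integration step
    software_version: str           # version of the tool that performed the step
    parameters: Dict[str, str]      # parameters needed to repeat the step exactly
    input_hashes: Dict[str, str]    # source name -> SHA-256 of the exact input bytes
    output_hash: str                # SHA-256 of the integrated result
    created_utc: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def integrate(inputs: Dict[str, bytes], step: str, version: str,
              params: Dict[str, str]) -> Tuple[bytes, ProvenanceRecord]:
    """Toy integration: join inputs in sorted-name order and record provenance."""
    sep = params.get("separator", "\n").encode()
    merged = sep.join(inputs[name] for name in sorted(inputs))
    record = ProvenanceRecord(
        step=step,
        software_version=version,
        parameters=dict(params),
        input_hashes={name: sha256_bytes(data) for name, data in inputs.items()},
        output_hash=sha256_bytes(merged),
    )
    return merged, record


def is_reproducible(record: ProvenanceRecord, inputs: Dict[str, bytes]) -> bool:
    """True if the inputs match the recorded ones and re-running gives the same output."""
    current = {name: sha256_bytes(data) for name, data in inputs.items()}
    if current != record.input_hashes:
        return False  # an upstream data set has changed since the record was made
    merged, _ = integrate(inputs, record.step, record.software_version, record.parameters)
    return sha256_bytes(merged) == record.output_hash
```

Storing the JSON from `to_json()` alongside each integrated data set gives an audit trail of which source versions fed which result; a fuller system would also record who ran the step and where.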