Kimberlyn M McGrail, Kerina Jones, Ashley Akbari, Tellen D Bennett, Andy Boyd, Fabrizio Carinci, Xinjie Cui, Spiros Denaxas, Nadine Dougall, David Ford, Russell Kirby, Hye-Chung Kum, Rachael Moorin, Ros Moran, Christine M O'Keefe, David Preen, Hude Quan, Claudia Sanmartin, Michael Schull, Mark Smith, Christine Williams, Tyler Williamson, Grant Ma Wyper, Milton Kotelchuck.
Abstract
Information is increasingly digital, creating opportunities to respond to pressing issues about human populations using linked datasets that are large, complex, and diverse. The potential social and individual benefits of data-intensive science are large, but they raise three challenges: balancing individual privacy and the public good; building appropriate socio-technical systems to support data-intensive science; and determining whether defining a new field of inquiry might help move those collective interests and activities forward. A combination of expert engagement, literature review, and iterative conversations led to our conclusion that defining the field of Population Data Science (challenge 3) will help address the other two challenges as well. We define Population Data Science succinctly as the science of data about people and note that it is related to but distinct from the fields of data science and informatics. A broader definition names four characteristics: data use for positive impact on citizens and society; bringing together and analyzing data from multiple sources; finding population-level insights; and developing safe, privacy-sensitive, and ethical infrastructure to support research. One implication of these characteristics is that few people possess all of the requisite knowledge and skills of Population Data Science, so this is by nature a multi-disciplinary field. Other implications include the need to advance various aspects of science, such as data linkage technology, various forms of analytics, and methods of public engagement. These implications are the beginnings of a research agenda for Population Data Science, which, if approached as a collective field, can catalyze significant advances in our understanding of trends in society, health, and human behavior.
Year: 2018 PMID: 34095517 PMCID: PMC8142960 DOI: 10.23889/ijpds.v3i1.415
Source DB: PubMed Journal: Int J Popul Data Sci ISSN: 2399-4908
Figure 1: The increase in total information and percent digital, 1986-2007. Source: Hilbert M, López P. The World’s Technological Capacity to Store, Communicate, and Compute Information. Science [Internet]. 2012;332(60):60-5.
Figure 2: Relationship of Population Data Science to Data Science and Informatics

| Field | Is the focus on people? | Do data come from multiple sources? | What is the primary aim of research? | Is technical and policy infrastructure a focus? |
|---|---|---|---|---|
| Population Data Science | Focus on people, systems and population-level insights | Yes, a primary objective is linking and/or integrating data from multiple data sources and data of different types | Research must be seen to have public value, with potential for positive impact on citizens and society | Infrastructure is a key focus, in particular with respect to legal, ethical, and privacy norms and public expectations. This covers all aspects of data from collection to storage and use. |
| Data Science | The focus is on the data themselves. The field is general to all disciplines, though often focuses on “big data” | Linking can be, but is not necessarily, a focus. Data from a single source (e.g. a private company) are often the focus. | Focus is use of data for actionable information. Data science techniques are often (though not exclusively) used by private, proprietary interests. | Not generally a focus outside of legal commitments to protect privacy |
| Informatics | General (though not exclusive) focus on providers or systems as much as people represented in the data. | Sometimes there is linking, though often from an operational perspective | Public good is present but often as a secondary objective behind (for example) implementation of technology-based solutions to improve health care delivery | Infrastructure focus is on database / technical development and implementation |

| Focus | Implication(s) | Response to implication(s) |
|---|---|---|
| | Required knowledge and skills are not likely to exist in a single discipline | Population Data Science is a multi-disciplinary field that encourages collective action, including the public’s voice. |
| | Technical approach to linkage - bringing together disparate data without common identifiers, preserving privacy | Population Data Science will include advancing the science of linkage technology. |
| | Analysis of complex data | Population Data Science will develop new tools for data analysis and will promote the training of practitioners. |
| | | Population Data Science will develop methods for data analysis that do not require movement or (in some cases) direct viewing of sensitive data. |
| | Interpreting data in a secondary context | Population Data Science will develop the assessment and reporting frameworks needed to document data with sufficient detail to inform accurate assessments. |
| | Understanding the values and expectations of the public and other key stakeholders, and then building systems to meet those. | Population Data Science is committed to public and stakeholder involvement and engagement in its many forms. |
| | | Population Data Scientists will advance the science of public engagement to promote public understanding of data usage. |
| | Required knowledge and skills are not likely to exist in a single discipline | Population Data Science is a multi-disciplinary field that has a strong focus on stakeholder engagement and commitment to capacity building. |
| | Data are complex and sensitive. | Population Data Scientists commit to ethical and rigorous science within the legal framework of their jurisdictions. |