Big data has flourished in healthcare in recent years and has profoundly changed how nursing practice and research are conducted. Although it offers numerous and, to some extent, irreplaceable advantages for transforming healthcare, critics argue that a pure big data approach is insufficient to deliver on the promise of personalized care because human health conditions are complex, multi-causal, and idiosyncratic. In this article, I advocate for a small data paradigm for personalized cancer care and, more broadly, for nursing research and practice. I argue that small data has large potential for building causal explanations and facilitating rapid learning, and that small and big data should be used in an integrated and complementary fashion to promote personalized practice.
Small data has been defined in various ways across disciplines. Researchers in informatics compare it with big data in terms of volume, variety, and velocity, and consider small data to be small in volume, narrow in variety, and slow in velocity. In business, Martin Lindstrom, author of the book Small Data: The Tiny Clues That Uncover Huge Trends, defined small data as those seemingly insignificant observations we make in people's lives that lead to an understanding of causation. In healthcare, Estrin viewed small data as the n = me data that we generate every day for ourselves. In this article, I adopt the definition by Hekler et al and consider small data as “the rigorous use of data by and for a specific N-of-1 unit (i.e., a single person, clinic, hospital, healthcare system, community, city, etc.) to facilitate improved individual-level description, prediction and, ultimately, control for that specific unit”. Here the word “small” does not necessarily mean that the data volume is small, but that the data are collected from and used for a single unit. Whereas big data collects information from a set of individuals to improve description and prediction for other individuals, small data collects an individual's data for that same individual. It should be noted that although small data has been discussed predominantly in the fields of artificial intelligence, machine learning, and informatics, this article does not delve into those aspects. I consider small data a research paradigm or orientation that guides our deep understanding of complex nursing, healthcare, and social phenomena, rather than a type of analytical method.
In this article, I start with an ontological argument for the small data approach and suggest that small data, consistent with an ontological position of causal dispositionalism, has greater potential to promote personalized care than the big data approach underpinned by empiricism and Humeanism. Next, I describe two advantages of small data in healthcare research and practice: building causal explanations and facilitating rapid learning. Then, I illustrate two promising research methods that align well with the small data paradigm: the case study method in social science and the N-of-1 trial in natural science. After that, I discuss possible ways to integrate big and small data to promote personalized care. Lastly, I propose several implications of small data for personalized cancer care.
Small data, causal dispositionalism, and personalized care
Causation is one of the basic features of reality and the main foundation upon which science is built. Big data and small data can be viewed as being underpinned by different ontological positions on causation. Big data, akin to evidence-based medicine, has a solid foundation in empiricism. Empiricism reduces complex human conditions to their components and builds causal relations and universal laws from correlations between variables. It is premised on the idea that what works at the population level should guide decisions for individuals. The position on causation under this paradigm is called Humeanism. Humeanism holds four main propositions on causation: (1) constant conjunction: causality is the perfect correlation between a cause and its effect, so that every time the cause happens, the effect follows; (2) temporal priority: the cause always happens before the effect; (3) contiguity: the cause and effect must be contiguous in space and time; and (4) the same cause always produces the same effect, and the same effect never arises but from the same cause. For example, it is now well accepted that smoking is the leading cause of lung cancer. From a Humean perspective, smoking causes lung cancer because epidemiological data show that heavy smokers often (constant conjunction) develop lung cancer after a period of time (temporal priority and contiguity), and because many patients with lung cancer turn out to have a history of smoking. However, Humeanism faces serious critiques as a basis for establishing real causation because it fails to distinguish correlation from causation. The same cause does not necessarily lead to the same effect across different contexts; context matters in the causal process. Causes and effects are not linearly related: one cause might have multiple intended and unintended effects, and one effect might have multiple interacting causes. Therefore, a pure big data approach underpinned by empiricism and Humeanism can hardly deliver on the promise of personalized care, because of its limited external validity, the ecological fallacy, and the non-linearity, complexity, and multi-causality of human health conditions.
Small data, on the other hand, can be viewed as consistent with an ontological position of causal dispositionalism. Unlike Humeanism, dispositionalism holds that causation happens in a particular case and does not require repetition. A disposition is a causal power, ability, capacity, or intrinsic property of a thing that can exist unmanifested. It is a tendency toward an effect; causality is the result of complex interactions among multiple dispositions. The reason why smoking leads to lung cancer is not the correlation between these two phenomena, although such a correlation might lead us toward an understanding of causation. The real reason is that the cigarette smoke we breathe in contains chemicals that have the disposition to cause genetic mutations in lung cells. From a dispositionalist view, to establish a causal relationship between an intervention and an effect is to understand whether the intervention has dispositions that interact with other dispositions to produce the effect. The same disposition tends to produce different effects depending on which dispositions it interacts with. For example, ibuprofen, a nonsteroidal anti-inflammatory drug (NSAID), has the disposition to relieve cancer pain. However, it only works in patients who suffer from mild pain and have the disposition to respond to NSAIDs. When patients suffer from severe pain, neuropathic pain, or pain from bone metastases, they no longer have the disposition to respond to NSAIDs, and thus ibuprofen seldom works in these circumstances. Dispositionalism embraces the complex and multifactorial nature of causation, individual variation, and context sensitivity. It argues that causation is singular and each patient is unique, and thus one size does not fit all. The small data approach, which collects rich data about the specific N-of-1 unit and appreciates the heterogeneity of individuals, has large potential for establishing causation at the individual level (described below) and thus for promoting personalized care.
Advantages of the small data paradigm
Small data for building causal explanations
Small data has large potential for building causal explanations. Under dispositionalism, causation is understood as arising from the intrinsic dispositions of things and thus cannot be captured from empirical facts alone. Researchers need to go beyond investigating whether and how often things happen through statistical analysis (i.e., the empirical level) and delve into theoretical knowledge of why and how things happen or do not happen (i.e., causal explanations). The small data approach, which can also be considered an “intensive research” approach, collects different types of qualitative, quantitative, and longitudinal data, such as medical examinations, lab analyses, medical history, and patient narratives of lived experience, around a particular case or a few cases. It enables a deep understanding of the surrounding contexts and their interactions with an individual in order to build context-sensitive explanations. In other words, it helps to gain insight into how multiple dispositions surrounding the individual interact to cause effects.
Social scientists in the critical realism tradition argue that researchers can use retroductive reasoning to build causal explanations qualitatively by conducting intensive research (i.e., small data research). Retroduction advances from empirical observations of events to the identification of the causal powers of structures by combining inductive, deductive, and abductive logic. They suggest that the deep interrogation of pathological cases, extreme cases, and comparative cases is helpful for identifying causal explanations. Researchers can use a five-stage framework in an intensive study to build a causal explanation: (1) description: describe the complex phenomena that we intend to explain; (2) analytical resolution: distinguish the various components, aspects, dimensions, and levels of analysis and clarify what we aim to explain; (3) abduction: interpret and redescribe the components of the phenomena using theoretical frameworks for possible explanations; (4) retroduction: find answers to the question “What is fundamentally constitutive for the structures and what mechanisms are related to these structures?” by using the aforementioned case study strategies; and (5) comparison and contextualization: evaluate the explanatory power of different mechanisms using theoretical and empirical approaches and understand the context sensitivities.14, 15, 16, 17 These five stages are often intertwined rather than strictly chronological, and they build causal explanations by moving from the concrete (stage 1) to the abstract (stages 2–4) and then back to the concrete (stage 5). Currently, very few small data studies have drawn on critical realism to build causal understandings of nursing phenomena, whereas the approach is increasingly used in education, information science, and management.
Philosophers of causal dispositionalism have suggested a three-stage iterative causal discovery process: (1) observing the phenomenon; (2) hypothesizing causation; and (3) establishing causation. The small data approach runs through all of these stages. Following this process, we start from an observed phenomenon and, through case studies, gain a sufficiently detailed understanding of the phenomenon in its context. We then enter stage 2 to hypothesize causation. Building on existing knowledge, one can test the hypothesis through experimentation, such as N-of-1 trials, and through further observation, such as patient narratives. After that, we can try to establish causation by observing cases of causal failure, in which dispositions do not manifest in the expected way. We may have a clear understanding of causation at this stage, or we may need to go back to stage 1 to continue the process. Anjum et al, in a recent publication, proposed a Dx3 approach for assessing whether a particular medicine has caused or could have caused a certain adverse event by using small data (i.e., individual case safety reports). This approach argues that three types of dispositions can be qualitatively evaluated to understand causality in pharmacovigilance: the drug disposition, the predisposition of the patient taking the drug (vulnerability), and the disposition of the patient–drug interaction (mutuality). They also provide guidance, checklists, and examples on how to conduct such causal assessments. It should be noted that my aim here is to emphasize the significant and, at some points, irreplaceable role of small data throughout the process of establishing causation. That does not mean that other research methods, such as randomized controlled trials, cohort studies, case-control studies, and lab models, contribute little to the process. These methods all contribute to the causal process in one way or another, especially in the testing of causal hypotheses.
Small data for rapid learning
Small data can also promote rapid learning. While the big data approach uses data from a group of individuals to produce transportable knowledge primarily for other individuals, the defining feature of small data is that the data are collected from and primarily used by individuals for their own purposes. We become the consumers of our own data in the small data paradigm. As in local quality improvement projects, small data is advantageous for rapid learning at the level of the N-of-1 unit, which can learn from and respond to the findings rapidly. When the N-of-1 unit is a single patient, we can promptly adjust treatment and care plans for this individual based on their own process and outcome data; when the N-of-1 unit is a healthcare organization, small data becomes valuable for advancing a learning healthcare organization where “knowledge generation processes are embedded in daily practice to produce continual improvement in care”. In artificial intelligence, this advantage of small data is referred to as reinforcement learning, in which “an agent learns how to interact with its environment via trial and error” (p. 8).
The big data approach uses population-level data to inform individual-level practice by clustering people based on variables (i.e., statistical generalization), which makes rapid learning challenging because of the influence of complex and dynamic local contexts. The small data approach, on the other hand, produces theoretical knowledge of causal mechanisms and informs population-level practice by clustering people with similar mechanisms (i.e., theoretical generalization). When the causal mechanisms identified from small data fail to apply to another individual, we can immediately stop generalizing them and reflect on the explanatory power of the mechanisms and on the contextual differences. A single N-of-1 unit is sufficient to trigger reflection and learning. In artificial intelligence, this advantage of small data is referred to as transfer learning, which “works by first learning how to perform a task in a setting where data is abundant, then transferring what it has learned there to a task where much less data is available” (p. 6).
Small data-oriented research methods
In this section, I describe two promising research methods that are small data-oriented, namely the case study method in social science and the N-of-1 trial in natural science.
Case study
The case study is a commonly used research method in social science. It is the study of a real-life, contemporary case or cases over time through detailed and in-depth data collection involving multiple sources of information, such as observations, interviews, and documents. A case is a bounded system that can be defined or described within certain parameters. While the case study approach can be used for exploratory, descriptive, and explanatory purposes, its potential to build causal explanations is largely untapped. The case study has been recommended as a promising method for explaining complex systems because it can capture the complex and dynamic interactions among system agents through longitudinal and cross-case analysis. A well-conducted empirical case study allows for an understanding of the dynamic and evolving influence of context on complex system-level interventions, thus generating theoretical knowledge about which interventions work in which contexts to achieve the desired effects. Qualitative comparative analysis has become a popular analytical method in case studies for identifying the configurations of causal conditions that lead to an intended outcome. A recent meta-narrative review of case study approaches identified four broad research traditions in evaluating complex interventions: (1) developing and testing complex interventions in healthcare; (2) analyzing changes in organizations; (3) undertaking realist evaluations; and (4) studying complex change naturalistically.
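The idea of identifying configurations of causal conditions in qualitative comparative analysis can be made concrete with a minimal sketch. It shows only the first step of a crisp-set analysis: building a truth table and computing how consistently each configuration is associated with the outcome. The cases, the condition names (`leadership`, `staffing`), the outcome name (`improved`), and the 0.8 consistency threshold are all hypothetical and chosen purely for illustration; a full analysis would also involve calibration and logical minimization, which are omitted here.

```python
from collections import defaultdict

def truth_table(cases, conditions, outcome):
    """Group cases by their configuration of binary conditions.

    Returns {configuration: {"n": number of cases,
                             "consistency": share of those cases showing the outcome}}.
    """
    rows = defaultdict(lambda: [0, 0])  # configuration -> [cases, cases with outcome]
    for case in cases:
        config = tuple(case[c] for c in conditions)
        rows[config][0] += 1
        rows[config][1] += case[outcome]
    return {
        config: {"n": n, "consistency": with_outcome / n}
        for config, (n, with_outcome) in rows.items()
    }

def consistent_configurations(table, threshold):
    """Configurations whose consistency meets the chosen threshold."""
    return sorted(c for c, row in table.items() if row["consistency"] >= threshold)

# Hypothetical cases (e.g., hospital wards); 1 = condition/outcome present, 0 = absent.
cases = [
    {"leadership": 1, "staffing": 1, "improved": 1},
    {"leadership": 1, "staffing": 1, "improved": 1},
    {"leadership": 1, "staffing": 0, "improved": 0},
    {"leadership": 1, "staffing": 0, "improved": 1},
    {"leadership": 0, "staffing": 1, "improved": 0},
    {"leadership": 0, "staffing": 0, "improved": 0},
]
conditions = ["leadership", "staffing"]

table = truth_table(cases, conditions, "improved")
candidates = consistent_configurations(table, threshold=0.8)  # threshold is an assumption
print(candidates)
```

In this toy data set only the configuration in which both conditions are present is consistently linked to the outcome, which illustrates how the method looks for combinations of conditions rather than the isolated effect of a single variable.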
N-of-1 trial
N-of-1 trials are prospective single-patient trials with multiple crossover periods between interventions and comparators. This type of trial uses key methodological components of traditional group-based clinical trials to evaluate the effectiveness of a treatment in a single patient. The design is particularly useful in situations that cannot accommodate large-scale trials, such as patients with rare diseases, comorbid conditions, or concurrent therapies, and it is most appropriate for patients with stable chronic conditions. The ultimate goal of an N-of-1 trial is to determine the optimal evidence-based treatment for an individual patient. A major advantage of the one-person trial is therefore that the effectiveness of treatments is vetted for the participants themselves, rather than for the benefit of other people, which is the goal of traditional trials. N-of-1 trials are also good tools for establishing difference-making in a unique context (i.e., the philosophical idea that causes tend to make a difference, so that discovering the difference-makers can be a good indicator of causes), and they therefore have epistemic advantages for establishing causation. The aggregated results of multiple N-of-1 trials have the potential to inform population-level treatment. In 2015, Nature published a commentary calling for the scaling up of N-of-1 trials in everyday clinical practice to promote precision medicine and personalized care.
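As a rough illustration of how the multiple crossover periods of an N-of-1 trial can be analyzed within one patient, the following minimal sketch compares paired treatment and comparator periods with an exact sign-flip randomization test. It is one simple option among several, not a prescribed analysis. The pain scores are hypothetical, and the sketch assumes that periods are paired, that a lower score is better, and that washout makes carryover effects negligible.

```python
from itertools import product
from statistics import mean

def paired_differences(treatment, comparator):
    """Within-pair differences (treatment minus comparator) for one patient."""
    if len(treatment) != len(comparator):
        raise ValueError("each treatment period needs a matching comparator period")
    return [t - c for t, c in zip(treatment, comparator)]

def sign_flip_p_value(differences):
    """Exact two-sided sign-flip test on the mean within-pair difference.

    Under the null hypothesis of no treatment effect, the sign of each
    difference is arbitrary, so all 2**n sign patterns are equally likely.
    """
    observed = abs(mean(differences))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(differences)):
        flipped = abs(mean([s * d for s, d in zip(signs, differences)]))
        if flipped >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical pain scores (0-10) for one patient over four crossover pairs.
treatment_periods = [3, 4, 2, 3]
comparator_periods = [6, 5, 5, 6]

differences = paired_differences(treatment_periods, comparator_periods)
effect = mean(differences)
p_value = sign_flip_p_value(differences)
print(effect, p_value)
```

With only four pairs, the smallest attainable two-sided p-value is 2/16 = 0.125, which is exactly what these data yield. This shows why N-of-1 trials need enough crossover periods to support a conclusion for the individual patient.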
Complementary use of big and small data
Big data has become an irreversible trend in current healthcare. Rather than devaluing the contributions of big data and prioritizing small data, it is more meaningful to integrate the two at the doctor–patient encounter to achieve the promise of personalized care. While big data generates scientific knowledge from the statistical analysis of variables, small data produces tacit knowledge from patient narratives. Big data provides an understanding of general patterns of correlation, whereas small data gains insight into causation. Small data can build on big data findings and refine them in an individual patient to understand the context sensitivities. Big data, in turn, is established by collecting high-quality small data. Sacristan and Dilla suggested that electronic medical records are where small and big data can perfectly intersect. Small data is collected at each doctor–patient encounter and uploaded to the electronic medical record. Big data is established when those complete and standardized small data are pooled.
The complementary use of big and small data, as I propose it, can take two common forms: the bottom-up approach and the top-down approach. The bottom-up approach collects high-quality small data to build big data and informs not only individual care but also policymaking. This approach has been vividly described in a case study of fundamental nursing care. Drawing on complexity science, Conroy et al described in detail the development of data matrices, through the analysis of patient narratives, to quantify and evaluate fundamental care in a single patient. Building on the small data, they plan to standardize the procedures and use computer algorithms and digital technology to establish big data. The complementary use of small and big data in this case can not only inform fundamental care at the individual level but also help reveal predictive patterns of care and build an early warning system to prevent the deterioration of care standards. The top-down approach draws on findings from existing big data and fine-tunes them in individual cases. For example, Zhu et al described the use of big data-based network analysis to identify symptom patterns in a population to inform symptom care; Zhao argued that to achieve personalized symptom care, we should turn to narratives and medical records to refine and consolidate the symptom patterns in an individual patient.
Implications of small data for personalized cancer care
While small data is a relatively new concept, small data-oriented methods already have a long history. Nevertheless, they are still underused in nursing, and their power to promote personalized nursing care is underestimated. The small data approach has several implications for personalized cancer care and, more broadly, for nursing practice: (1) Integrate clinical practice with nursing research: transform everyday cancer care practice into solid individual-level data collection and improve the quality of the data entered in electronic medical records. (2) Embrace patient narratives: listen to the stories of patients with cancer to understand their bio-physio-social contexts, which will facilitate our understanding of the underlying causes of patient symptoms. (3) Employ a complexity science lens to uncover the interrelationships among a patient's cancer symptoms and, more broadly, health conditions. The human body is a complex adaptive system, which requires us to take a holistic rather than a reductionist view of patients’ conditions. (4) Make interdisciplinary efforts to understand the causation behind cancer symptoms in order to inform nursing practice. (5) Engage in shared decision making with patients on care plans and be ready to adjust the plans promptly based on rapid learning.
Conclusions
In this article, I distinguished the ontological positions of the small and big data approaches and illustrated two unique advantages of small data: building causal explanations and facilitating rapid learning. After that, I described two commonly used methods that are small data-oriented: the case study and the N-of-1 trial. Small data and big data should be used in a complementary fashion to promote personalized care. Such integration holds promise for facilitating close collaboration between researchers (the big data approach) and healthcare professionals (the small data approach) and is likely to help translate research evidence into practice in a timely manner. Lastly, I proposed several implications of small data for personalized cancer care practice.
Authors: Eric B Hekler; Predrag Klasnja; Guillaume Chevance; Natalie M Golaszewski; Dana Lewis; Ida Sim. Journal: BMC Med. Date: 2019-07-17. Impact factor: 8.775.
Authors: Sara Paparini; Judith Green; Chrysanthi Papoutsi; Jamie Murdoch; Mark Petticrew; Trish Greenhalgh; Benjamin Hanckel; Sara Shaw. Journal: BMC Med. Date: 2020-11-10. Impact factor: 8.775.