
Big Data in radiation therapy: challenges and opportunities.

Tim Lustberg, Johan van Soest, Arthur Jochems, Timo Deist, Yvonka van Wijk, Sean Walsh, Philippe Lambin, Andre Dekker.

Abstract

Data collected and generated by radiation oncology can be classified by the Volume, Variety, Velocity and Veracity (4Vs) of Big Data because they are spread across different care providers and not easily shared owing to patient privacy protection. The magnitude of the 4Vs is substantial in oncology, especially owing to imaging modalities and unclear data definitions. To create useful models, ideally all data from all care providers would be understood and learned from; however, this presents challenges in the form of poor data quality, patient privacy concerns, geographical spread, limited interoperability and large volume. In radiation oncology, there are many efforts to collect data for research and innovation purposes. Clinical trials are the gold standard for testing any hypothesis that directly affects the patient. Collecting data in registries with strict predefined rules is also a common approach to finding answers. A third approach is to develop data stores that can be used by modern machine learning techniques to provide new insights or test hypotheses. We believe all three approaches have their strengths and weaknesses, but they should all strive to create Findable, Accessible, Interoperable, Reusable (FAIR) data. To learn from these data, we need distributed learning techniques that send machine learning algorithms to FAIR data stores around the world and learn from trial data, registries and routine clinical data rather than trying to centralize all data. To improve and personalize medicine, rapid learning platforms must be able to process FAIR "Big Data" to evaluate current clinical practice and to guide further innovation.

Year:  2016        PMID: 27781485      PMCID: PMC5605034          DOI: 10.1259/bjr.20160689

Source DB:  PubMed          Journal:  Br J Radiol        ISSN: 0007-1285            Impact factor:   3.039


Primarily because of the ubiquity of imaging in oncology, as well as many other diagnostic and therapeutic procedures, cancer data are firmly in the realm of “Big Data”.[1] As a rough estimate, over the past 10 years approximately 140 million patients were diagnosed with cancer in about 100,000 hospitals globally. If one assumes a data volume (depending on the hospital) of 0.1–10 GB per patient, the total volume of cancer patient data in the world is estimated to be 14–1400 petabytes. Specifically, data in radiation oncology can be classified as “Big Data” because of the use of data-intensive imaging modalities (Volume), the rapid growth of imaging archives (Velocity), the increasing number of imaging and diagnostic modalities available (Variety), and the differences in interpretation and quality between care providers (Veracity).

With this deluge of data, it becomes increasingly hard to translate all these data into knowledge and subsequently leverage that knowledge to guide clinical decisions.[2] The radiation oncologist is overwhelmed with scientific literature, swiftly evolving treatment techniques and the exponentially increasing amount of clinical data.[2] To provide high-quality individualized treatments, radiation oncologists need help translating all these data into knowledge that supports decision-making in routine clinical practice.[3]

Collecting these data presents its own set of challenges. The data are spread over care providers around the world, difficult to share while protecting patient privacy, non-interoperable and varying in quality.

The gold standard for assessing the utility of innovations that directly affect patients is the clinical trial. However, clinical trials only provide information about a select patient population, which often represents only a small percentage of the actual population.
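
The global data-volume estimate quoted earlier in this section can be reproduced with a few lines of arithmetic. The sketch below uses only the figures given in the text (140 million patients, 0.1–10 GB per patient); the decimal convention of 1 petabyte = 10^6 gigabytes and the function name `total_petabytes` are assumptions made for illustration.

```python
# Back-of-the-envelope check of the global cancer data volume estimate.
# Patient count and per-patient range are the figures quoted in the text;
# the decimal unit convention (1 PB = 10**6 GB) is an assumption.

PATIENTS = 140_000_000            # patients diagnosed over roughly 10 years
GB_PER_PATIENT = (0.1, 10.0)      # assumed low and high volume per patient, in GB
GB_PER_PB = 1_000_000             # decimal convention: 1 PB = 10**6 GB


def total_petabytes(patients: int, gb_per_patient: float) -> float:
    """Return the total data volume in petabytes."""
    return patients * gb_per_patient / GB_PER_PB


low, high = (total_petabytes(PATIENTS, gb) for gb in GB_PER_PATIENT)
print(f"Estimated total: {low:.0f}-{high:.0f} PB")
```

The two orders of magnitude between the bounds come entirely from the assumed per-patient range, so the estimate is best read as an indication of scale rather than a measurement.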
Clinical trials also provide the radiation oncologist with little information when making clinical decisions for someone who does not (exactly) fit the trial population owing to age, comorbidities etc. On the other hand, clinical trials do provide high-quality reusable data owing to the clear definitions that are provided by trial protocols.

Initiatives such as IBM Watson attempt to simplify access to knowledge garnered from scientific literature for physicians. Patient characteristics provided by the physician are used to find and retrieve relevant publications (and possibly other sources), which can aid the physician in making precision decisions for that particular patient.[4]

To fill the gap in evidence between clinical trials and the common patient (i.e. one who does not fit trial inclusion criteria), data registries are being created around the world.[5] In general, the goal is to register a select set of parameters for all patients treated for a certain cancer. This results in a large patient population with high-quality data.[5] However, collecting these data requires great effort from care providers, which limits the number of elements recorded, as someone has to fill in a form, digital or paper, to provide the registries with the data specified in the registry protocol. There are some early initiatives[6] to automatically provide the registries with the data they require by data mining the Oncology Information Systems. In theory, this should work well for all structured data (e.g. the fractionation scheme or age), but it is challenging for data that are usually recorded in free text (e.g. smoking behaviour or comorbidities). Registries in general give insights into practice but are not designed to guide decisions for individual patients.
ASCO CancerLinQ is the exception; it aims to create a “super” registry with a learning approach on routine healthcare data in medical oncology.[7] Cancer screening shows promising advancements in identifying patients at high risk using data mining techniques;[8] this particular example shows the power of centralizing data.

A different approach is to use routine clinical data from around the world to transform data into knowledge. As a proof of concept, the euroCAT project created data stores at several cancer centres, which can be accessed using web technologies. The local data are mined, pseudonymized, translated, mapped to standard concepts and made available to trusted partners in the network. The trusted partners do not have direct access to the data, but they can send a machine learning algorithm to the different data stores to learn from the data without sharing them (i.e. knowledge sharing, not data sharing). To demonstrate the power of this technique, an existing model was improved by combining the data of five centres (www.eurocat.info). However, a lot of time and effort is still required to access and utilize all data generated in clinical routine and translate them into knowledge. The end result of distributed learning is prediction models that can support physicians when making patient-specific choices (www.predictcancer.org).

Another successful data mining initiative was started by Public Health England. Data from all linear accelerators in England were collected automatically using the Radiotherapy Dataset tools (http://www.ncin.org.uk/collecting_and_using_data/rtds). This data set was analyzed to examine the variation in given treatments for different regions of the country.

An important topic when discussing the collection of healthcare data is patient privacy. All three approaches handle privacy issues differently.
Registries are usually hosted in some central location by professional societies or government-related entities, which are often authorized to collect identifiable patient data and securely store them while giving researchers an anonymized view of these data. Clinical trials work with informed consent and with pseudonymized data. Distributed systems are privacy-by-design systems, as they simply do not allow data to leave the site where they were collected.

It should be noted that there is a perception among healthcare providers that data must be kept in isolation owing to privacy issues. In fact, solutions to these issues already exist. The real barrier to learning from (Big) data in healthcare is that it requires willingness, resources and expertise.[9] Healthcare data are not yet “Big” enough to apply purely data-driven machine learning approaches, and clinical expertise is needed to create useful models that make sense to clinicians.

Clinical trials, clinical registries and routine clinical data all provide unique evidence, which is currently utilized separately. Combining the three evidence sources into interoperable data stores makes them complementary to each other and will enable healthcare to move forward. However, data quality (Volume, Variety, Velocity and Veracity) and sharing issues are hindering progress. To achieve “Big Data” in healthcare, the data have to be Findable, Accessible, Interoperable and Reusable (FAIR). The FAIR Guiding Principles[10] can be applied to achieve good data management and stewardship, which will enable knowledge discovery and innovation. Eventually, when data-driven machine learning approaches have matured, they will provide a large knowledge base, and clinical trials will only be used for a small subset of studies that require a specific setup or a trial to provide proof (e.g. a new experimental treatment).
IBM have stated that there are numerous ways to improve healthcare using their technology and they provide a conclusion of utmost importance: “Information technology cannot drive change”.[4] “Big Data” can be a powerful tool to move healthcare forward, but healthcare providers need to invest resources to make this happen. Consequently, industry leaders in radiation oncology are already exploring this horizon market in anticipation of the opportunities and challenges that “Big Data” in healthcare represents, in terms of both efficiency and efficacy.

Our experience is that the technical limitations of sharing data are minimal. The practical barriers are that healthcare providers are unwilling to share their data or lack the resources and/or knowledge to do so. Sharing data can affect the reputation of a healthcare provider because it allows its performance to be compared with that of others. Furthermore, these data can be used in research that a competing institute is also working on, possibly creating unwanted competitors. By limiting access to the data to a machine and only sharing the model learned from the data, these issues are eliminated or largely mitigated. Despite the conflicting interests of healthcare providers, change may be driven by pressure from external institutions (such as government and health insurance companies) to ensure that the highest standard of care and the most cost-effective care are delivered to the patient.

Many world leaders throughout history have referenced the seventeenth-century poem by John Donne, “No man is an island”, when illustrating the need for collective responsibility and action towards a brighter future for all. This maxim rings as true in healthcare as it does in all other areas of life.
We believe that utilizing patient privacy-preserving distributed machine learning to translate and combine all data sources into knowledge will enable healthcare to move to individualized, high-quality, affordable and safe cancer treatments, ensuring the sustainability of healthcare. This will also allow moving further towards participative medicine with customized patient decision aids.
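
The distributed learning idea described above (sending a machine learning algorithm to the data stores rather than centralizing the data) can be illustrated with a minimal sketch. It assumes a logistic regression model trained by gradient descent, in which each site returns only a summed gradient and a sample count. This is one common way to learn without moving patient-level data and is not claimed to be the algorithm used in euroCAT. The names `Site`, `local_gradient`, `make_rows` and `train` are hypothetical, and the data are synthetic.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = Tuple[Sequence[float], int]  # (feature vector, binary outcome)


class Site:
    """Hypothetical data store at one centre; rows are never returned to callers."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def local_gradient(self, weights: Sequence[float]) -> Tuple[List[float], int]:
        """Return the summed log-loss gradient and the local sample count."""
        grad = [0.0] * len(weights)
        for x, y in self._rows:
            z = sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))      # predicted probability
            for j, xi in enumerate(x):
                grad[j] += (p - y) * xi
        return grad, len(self._rows)


def train(sites: Sequence[Site], n_features: int,
          rounds: int = 200, lr: float = 0.5) -> List[float]:
    """Coordinator loop: send weights out, receive aggregates, update the model."""
    weights = [0.0] * n_features
    for _ in range(rounds):
        total = [0.0] * n_features
        n = 0
        for site in sites:
            grad, count = site.local_gradient(weights)   # only aggregates cross the wire
            total = [t + g for t, g in zip(total, grad)]
            n += count
        weights = [w - lr * t / n for w, t in zip(weights, total)]
    return weights


def make_rows(rng: random.Random, n: int) -> List[Row]:
    """Synthetic patients: an intercept term plus one standardized predictor."""
    rows: List[Row] = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(-1.0 + 2.0 * u)))
        rows.append(((1.0, u), 1 if rng.random() < p else 0))
    return rows


rng = random.Random(0)
sites = [Site(make_rows(rng, n)) for n in (120, 80, 200)]   # three centres of unequal size
weights = train(sites, n_features=2)
print([round(w, 2) for w in weights])
```

Because the summed per-site gradients equal the gradient over the pooled data, this scheme gives the same model as centralized training would, up to floating-point rounding, while the coordinator only ever sees aggregates. A real deployment would also need to consider what those aggregates can reveal about small sites.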
  9 in total

1.  Delivering affordable cancer care in high-income countries.

Authors:  Richard Sullivan; Jeffrey Peppercorn; Karol Sikora; John Zalcberg; Neal J Meropol; Eitan Amir; David Khayat; Peter Boyle; Philippe Autier; Ian F Tannock; Tito Fojo; Jim Siderov; Steve Williamson; Silvia Camporesi; J Gordon McVie; Arnie D Purushotham; Peter Naredi; Alexander Eggermont; Murray F Brennan; Michael L Steinberg; Mark De Ridder; Susan A McCloskey; Dirk Verellen; Terence Roberts; Guy Storme; Rodney J Hicks; Peter J Ell; Bradford R Hirsch; David P Carbone; Kevin A Schulman; Paul Catchpole; David Taylor; Jan Geissler; Nancy G Brinker; David Meltzer; David Kerr; Matti Aapro
Journal:  Lancet Oncol       Date:  2011-09       Impact factor: 41.316

2.  Building a Rapid Learning Health Care System for Oncology: Why CancerLinQ Collects Identifiable Health Information to Achieve Its Vision.

Authors:  Alaap Shah; Andrew K Stewart; Andrej Kolacevski; Dina Michels; Robert Miller
Journal:  J Clin Oncol       Date:  2016-01-11       Impact factor: 44.544

3.  A prospective study comparing the predictions of doctors versus models for treatment outcome of lung cancer patients: a step toward individualized care and shared decision making.

Authors:  Cary Oberije; Georgi Nalbantov; Andre Dekker; Liesbeth Boersma; Jacques Borger; Bart Reymen; Angela van Baardwijk; Rinus Wanders; Dirk De Ruysscher; Ewout Steyerberg; Anne-Marie Dingemans; Philippe Lambin
Journal:  Radiother Oncol       Date:  2014-05-17       Impact factor: 6.280

4.  IBM's Health Analytics and Clinical Decision Support.

Authors:  M S Kohn; J Sun; S Knoop; A Shabo; B Carmeli; D Sow; T Syed-Mahmood; W Rapp
Journal:  Yearb Med Inform       Date:  2014-08-15

5.  Overview of the American Society for Radiation Oncology-National Institutes of Health-American Association of Physicists in Medicine Workshop 2015: Exploring Opportunities for Radiation Oncology in the Era of Big Data.

Authors:  Stanley H Benedict; Karen Hoffman; Mary K Martel; Amy P Abernethy; Anthony L Asher; Jacek Capala; Ronald C Chen; Bhisham Chera; Jennifer Couch; James Deye; Jason A Efstathiou; Eric Ford; Benedick A Fraass; Peter E Gabriel; Vojtech Huser; Brian D Kavanagh; Deepak Khuntia; Lawrence B Marks; Charles Mayo; Todd McNutt; Robert S Miller; Kevin L Moore; Fred Prior; Erik Roelofs; Barry S Rosenstein; Jeff Sloan; Anna Theriault; Bhadrasain Vikram
Journal:  Int J Radiat Oncol Biol Phys       Date:  2016-07-01       Impact factor: 7.038

6.  Initial outcomes of an integrated outpatient-based screening program for oral cancers.

Authors:  Li-Jen Liao; Hsiu-Ling Chou; Wu-Chia Lo; Chi-Te Wang; Hsu-Wen Chou; Chih-Dao Chen; Chen-Hsi Hsieh; Yu-Chin Lin; Po-Wen Cheng
Journal:  Oral Surg Oral Med Oral Pathol Oral Radiol       Date:  2014-09-28

7.  Practice-based evidence to evidence-based practice: building the National Radiation Oncology Registry.

Authors:  Jason A Efstathiou; Deborah S Nassif; Todd R McNutt; C Bob Bogardus; Walter Bosch; Jeffrey Carlin; Ronald C Chen; Henry Chou; Dave Eggert; Benedick A Fraass; Joel Goldwein; Karen E Hoffman; Ken Hotz; Margie Hunt; Marc Kessler; Colleen A F Lawton; Charles Mayo; Jeff M Michalski; Sasa Mutic; Louis Potters; Christopher M Rose; Howard M Sandler; Gregory Sharp; Wolfgang Tomé; Phuoc T Tran; Terry Wall; Anthony L Zietman; Peter E Gabriel; Justin E Bekelman
Journal:  J Oncol Pract       Date:  2013-05       Impact factor: 3.840

8.  The National Cancer Data Base: a powerful initiative to improve cancer care in the United States.

Authors:  Karl Y Bilimoria; Andrew K Stewart; David P Winchester; Clifford Y Ko
Journal:  Ann Surg Oncol       Date:  2008-01-09       Impact factor: 5.344

9.  The FAIR Guiding Principles for scientific data management and stewardship.

Authors:  Mark D Wilkinson; Michel Dumontier; I Jsbrand Jan Aalbersberg; Gabrielle Appleton; Myles Axton; Arie Baak; Niklas Blomberg; Jan-Willem Boiten; Luiz Bonino da Silva Santos; Philip E Bourne; Jildau Bouwman; Anthony J Brookes; Tim Clark; Mercè Crosas; Ingrid Dillo; Olivier Dumon; Scott Edmunds; Chris T Evelo; Richard Finkers; Alejandra Gonzalez-Beltran; Alasdair J G Gray; Paul Groth; Carole Goble; Jeffrey S Grethe; Jaap Heringa; Peter A C 't Hoen; Rob Hooft; Tobias Kuhn; Ruben Kok; Joost Kok; Scott J Lusher; Maryann E Martone; Albert Mons; Abel L Packer; Bengt Persson; Philippe Rocca-Serra; Marco Roos; Rene van Schaik; Susanna-Assunta Sansone; Erik Schultes; Thierry Sengstag; Ted Slater; George Strawn; Morris A Swertz; Mark Thompson; Johan van der Lei; Erik van Mulligen; Jan Velterop; Andra Waagmeester; Peter Wittenburg; Katherine Wolstencroft; Jun Zhao; Barend Mons
Journal:  Sci Data       Date:  2016-03-15       Impact factor: 6.444

