
Common data elements of breast cancer for research databases: A systematic review.

Esmat Mirbagheri, Maryam Ahmadi, Soraya Salmanian.

Abstract

BACKGROUND: Common Data Elements (CDEs) are data-metadata descriptors used to collect research study data. CDEs facilitate the collection, processing, and sharing of breast cancer data. This study aimed to explore the CDEs of breast cancer for research databases and primary care systems.
METHODS: This study was conducted as a systematic literature review covering PubMed, Scopus, Science Direct, SID, ISC, Web of Science, and the Google Scholar search engine. It included English-language studies with accessible full text published from the beginning of 2007 to September 2019.
RESULTS: A review of 25 studies revealed that 52 percent were carried out in the US and that most were conducted between 2013 and 2015. The most common domains for using CDEs were Pathology Report and Registry. The CDEs of breast cancer for research databases were grouped into three categories, namely clinical, research, and non-clinical, which indicates the importance of these data elements. Most of the studies focused on creating and deploying clinical CDEs such as physical examination, clinical history, and pathology data.
CONCLUSION: The integration of biomedical and clinical data relevant to breast cancer enhances the power of research variable analysis and statistical analysis, thereby facilitating improved knowledge of effective therapeutic interventions. CDEs are also used to collect, store, and retrieve patient data in various health settings such as primary care and research databases.
Copyright: © 2020 Journal of Family Medicine and Primary Care.


Keywords:  Breast cancer; common data elements (CDEs); research database

Year:  2020        PMID: 32509607      PMCID: PMC7266190          DOI: 10.4103/jfmpc.jfmpc_931_19

Source DB:  PubMed          Journal:  J Family Med Prim Care        ISSN: 2249-4863


Introduction

Breast cancer, along with lung and colorectal cancer, is one of the three leading cancer types in terms of incidence and mortality worldwide. Together, these three cancers account for one-third of all cancer cases and deaths in the world. Breast cancer ranks as the fifth leading cause of cancer death (627,000 deaths, 6.6%) because its prognosis is relatively favorable, at least in developed countries. It is the most common cancer in women (24.2%; about one in every four new cancer cases diagnosed in women worldwide is breast cancer) and also the leading cause of cancer death in women (15.0%).[1] Breast cancer is the most common cancer among Iranian women; the estimated number of cases (5 years) in Iran in 2018 was 40,825, or 32.3%.[2] Improving the efficiency of clinical trials in breast cancer research will foster innovation and shorten the time needed to bring new methods and drugs into treatment. The parties involved in a research study, including sponsors, clinical researchers, and surveillance bodies, each use different systems and software to collect and analyze data, so increasing efficiency requires integrating these programs and systems. Integration is one of the most important factors for achieving desirable goals in the field of medical research.
At present, however, the relationship between clinical research and clinical care is incomplete or sometimes entirely disconnected, because each area uses different standards and terminology systems.[3] The integration of heterogeneous datasets into clinical research is also a complex problem that requires continuous effort if data and information are to be used optimally in biomedical research.[4] The deficiencies and inefficiencies in follow-up care for breast cancer survivors in primary health care indicate the value of healthcare records and datasets in healthcare systems.[5] Improving health care for patients with breast cancer requires coordinating data from health settings and research databases to deliver high-quality primary care.[6] Integrating clinical data into electronic health records and clinical trials will increase the likelihood of intervention for disease prevention and treatment.[7] However, ensuring data integrity across studies is always complex and difficult.[8] Conventional data collection methods are slow and costly.[9] The use of CDEs is one data integration method that has increased the ability to analyze stored data and combine findings from different studies, thereby reducing the cost of clinical research.[10] Using CDEs also requires defining a specific set of features by which each element is identified. In addition, data harmonization and integration in clinical settings can facilitate the harmonization of data, including spatial data, within a specific field.[11] At present, data sharing in the clinical setting and full semantic interoperability between heterogeneous systems have not yet been realized, although significant progress has been made in this area.
Standards for controlled clinical terms, such as the International Classification of Diseases, have been widely used.[12] Integrating or combining data from different sources and presenting it to users in a unified view helps researchers coordinate data elements in a particular area or subject.[13] Data heterogeneity is a major challenge in integrating data and a major cause of the inability of health information systems to interoperate and deliver accurate and effective health care. Knowledge generation in clinical research is based on clinical data.[14,15] Therefore, solving the semantic heterogeneity problem is the key to achieving interoperability between health care systems and integrating datasets from different domains.[16] Although "Common Data Elements" was introduced as a MeSH term in 2015, common data elements had been used well before then to exchange the same data across different computer system environments. The purpose of this study was to retrieve the common data elements used in breast cancer databases in order to support the integration of data elements across heterogeneous clinical research systems.

Methods

Search strategy

Databases including PubMed, Scopus, Science Direct, SID, ISC, and Web of Science, together with the Google Scholar search engine, were searched from 2007 to 2019. The MeSH term “CDEs” and all its entry terms[17] were combined with the “OR” operator. In addition, we searched different terms for breast cancer, such as breast neoplasm, breast tumor, mammary cancer, cancer of breast, malignant neoplasm of breast, and breast malignant tumor, again combined with the “OR” operator. The term “research databases” was also searched. All three search strategies were applied to titles, abstracts, and keywords. Finally, the results of these searches were combined using the “AND” operator. The research team also checked the references of the retrieved articles to find any related articles missed by the search. We included English-language papers dealing with CDEs of breast cancer published from 2007 to 2019 (the last 12 years). Papers whose full text was not accessible or that were not available in English were excluded. After the selected articles were imported into EndNote, duplicates were removed.
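The way the three term groups are combined can be sketched in a few lines of Python. This is only an illustration of the OR/AND logic described above, not the authors' actual search string: the helper names `or_group` and `build_query` are hypothetical, the CDE entry terms are abbreviated, and field restrictions (title, abstract, keywords) are left out because their syntax differs from one database to another.

```python
def or_group(terms):
    """Join synonyms with OR, quote each term, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(*groups):
    """Combine the OR-groups with AND, mirroring the described search strategy."""
    return " AND ".join(or_group(g) for g in groups)


# Illustrative term lists (assumption: the real MeSH entry-term list is longer).
cde_terms = ["common data elements", "CDEs"]
breast_cancer_terms = [
    "breast cancer", "breast neoplasm", "breast tumor", "mammary cancer",
    "cancer of breast", "malignant neoplasm of breast", "breast malignant tumor",
]
database_terms = ["research databases"]

query = build_query(cde_terms, breast_cancer_terms, database_terms)
print(query)
```

Keeping each synonym list separate makes it easy to adapt the same three groups to the query syntax of each database.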

Study selection

The resulting list was then checked independently by two raters from the research team with respect to title, abstract, and content, according to the inclusion and exclusion criteria. In this way, 25 articles were finally included in the study. Cases of inter-rater disagreement were resolved in a joint meeting of the two raters. Inter-rater agreement, estimated using the kappa coefficient (κ), was 0.85 (statistically significant at P < 0.001). Study selection followed the steps of the PRISMA flow diagram.
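For readers unfamiliar with the kappa coefficient, the following is a minimal sketch of how Cohen's kappa can be computed for two raters. The include/exclude decisions below are invented purely for illustration; they are not the study's screening data and do not reproduce the reported value of 0.85.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


# Hypothetical screening decisions for ten records (not the study's data).
rater_1 = ["include"] * 4 + ["exclude"] * 6
rater_2 = ["include"] * 3 + ["exclude"] * 7

kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 2))
```

Kappa corrects the raw agreement rate for the agreement expected by chance, which is why it is lower than the simple proportion of matching decisions.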

Results

A total of 396 relevant studies were retrieved by the database search. After removing 184 duplicates, 212 studies remained. We excluded 127 studies based on title, another 52 studies based on abstract or full text, and 8 studies because the full-text article was not available. The remaining 25 studies were included in this review. Figure 1 depicts the details of the selection of the studies based on the PRISMA flow diagram.
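The selection counts reported above can be checked with simple arithmetic. The short sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Counts as reported in the text.
retrieved = 396
duplicates = 184
excluded = {
    "title": 127,
    "abstract or full text": 52,
    "full text not available": 8,
}

after_deduplication = retrieved - duplicates
included = after_deduplication - sum(excluded.values())

print(after_deduplication)  # 212
print(included)             # 25
```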
Figure 1

Details of selection of the studies based on PRISMA flow diagram

Trend

Figure 2 shows the trend of studies from 2007 to 2019. Most of the studies were carried out in 2013 and 2015. Ninety-six percent of the studies (24 of 25) were published in scientific journals, and only one study was published as an Australian report that used CDEs to design the national registration system.
Figure 2

Trend of studies from 2007 to 2019


Country

Most of the articles were published in the US. The extracted studies showed that 52 percent were published in the US and 32 percent in Europe. Within Europe, England had the most studies, with 20 percent of the total [Table 1]. Asia (Iran and Thailand) and Oceania (Australia and New Zealand) each accounted for 8% of studies, indicating that these regions contributed proportionally less to this domain.
Table 1

The frequency and percentage of published papers by country

Country | Number | Percentage
USA | 13 | 52%
England | 5 | 20%
The Netherlands | 2 | 8%
Iran | 1 | 4%
Germany | 1 | 4%
New Zealand | 1 | 4%
Thailand | 1 | 4%
Australia | 1 | 4%
Total | 25 | 100%
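The percentages in Table 1 follow directly from the study counts. The sketch below recomputes them; the counts come from the table, while the grouping of England, the Netherlands, and Germany under "Europe" is an assumption made here to reproduce the 32% figure quoted in the text.

```python
# Study counts per country, as listed in Table 1.
counts = {
    "USA": 13, "England": 5, "The Netherlands": 2, "Iran": 1,
    "Germany": 1, "New Zealand": 1, "Thailand": 1, "Australia": 1,
}
total = sum(counts.values())

# Percentage of all included studies per country.
percentages = {country: 100 * n / total for country, n in counts.items()}

# Assumed regional grouping used to reproduce the figure for Europe.
europe = ["England", "The Netherlands", "Germany"]
europe_share = sum(percentages[c] for c in europe)

for country, pct in percentages.items():
    print(f"{country}: {pct:.0f}%")
print(f"Europe: {europe_share:.0f}%")
```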

Domain

CDEs were used most often in the domains of pathology reporting and registry systems, with 16% each. Integration and screening and diagnosis were the next most frequent domains, with 8% each. In general, the use of CDEs of breast cancer in research centers across different domains indicates the ease of creating and using these data elements [Table 2].
Table 2

The frequency of domain CDEs in breast cancer

Domain of Studies | Number | Percentage
Pathology Report | 4 | 16%
Registry | 4 | 16%
Integration | 2 | 8%
Screening and Diagnosis | 2 | 8%
Mammography | 1 | 4%
Treatment | 1 | 4%
Data Harmonization | 1 | 4%
Biomarkers | 1 | 4%
Immunology | 1 | 4%
Big Data | 1 | 4%
Interoperability | 1 | 4%
Imaging | 1 | 4%
Documentation and Medical Forms | 1 | 4%
Minimum Data Set | 1 | 4%
Clinical Trial | 1 | 4%
Mesothelioma Virtual Tissue | 1 | 4%
Virtual Biorepository | 1 | 4%
Total | 25 | 100%

CDEs

CDEs of breast cancer for research databases were grouped into three categories: clinical, research, and non-clinical. The categories, together with the studies citing each element, indicate the importance of these data elements and their frequency of use [Table 3].
Table 3

CDEs of breast cancer

Category (Clinical CDEs) | Subcategory
Personal History [18,19,20,21,22] | Lifestyle, Physical Activity and Diet Habits, Quality of Life, Sleep Habits, Comorbid Diseases, Clinical History, Menopausal Status
Physical Examination & Clinical History [11,18,19,23,24,25,26,27,28,29,30,31,32,33,34] | Vital Signs, Size of Tumor, Side of Tumor (left, right or bilateral), Main Illness, Status of Body Systems, Morbidity, Referral Data, Body Mass Index, Main Complaint
Family History [22,33] | Morbidity
Diagnosis [18,19,20,21,22,23,25,30,33,35,36] | Mammography, Sonography, MRI
Cancer Data [31,32,37,38,39] | Type of Cancer (In Situ, Ductal Carcinoma in Situ & Metastasis), Metastasis Status, Date of First Diagnosis, Date of First Metastasis
Surgical Data [18,20,26,27,33,34,38] | Date of Surgery, Type of Surgery, Important Findings, Intraoperative Data
Pathology Data [11,18,20,21,23,24,25,26,27,29,30,32,33,34,37,39] | Surgical Pathology, Grade of Tumor (I, II, III, IV), Margin of Mass, Macroscopic & Microscopic Data, Morphology Data, Block Level Annotation, DCIS
Core Needle Biopsy [19,33,40] | Size and Type of Core Needle Biopsies
Specific Biomarkers & Hormone Therapy [11,21,28,30,31,32,33,34,37,38,41] | ER, PR, HER2
Epidemiologic Data [25,29] | Reproductive Data, Hereditary Status
Laboratory Test [19,23] | CA 15.3, CEA (carcinoembryonic antigen), CA125
Genetic & Genomic Data [21,28,33,35,38,41] | Gene Sequence Number, Cancer Phenotype
Lymph Node Status [18,20,24,27,30,31,32] | Lymph Node Involvement, Examination of Lymph Nodes, Metastasis to Lymph Node, Number of Lymph Nodes Tested
Histology Data [21,24,31,32,33,39] | Grade of Histology, Immunohistology
Type of Treatment & Results [11,19,21,22,23,26,30,31,33,35,36,38,39] | Pharmaceutical, Surgical, Chemotherapy, Radiotherapy, Palliative Care
Patient Follow-Up [18,23,25,29,30,33,35,38,39] | Follow-Up of Treatment, Recurrence, Managing and Controlling Patient Pain
Radiology, Mammography, Ultrasonography, MRI Data [19,35,36,42] | Location of Mass (left, right or bilateral), BI-RADS Imaging & Result, Calcification, Cystic Breast

Category (Non-Clinical CDEs) | Subcategory
Demographic [11,21,22,23,25,26,27,28,29,34,36,37,39] | Name, Family Name, Patient ID, Birth Date, Marital Status, Address of Home and Work
Identification Information [23,34,36] | Social Security Number, Patient Identification Number, Medical Record Number
Contact Information [11,33,36] | Contact Information for Other Treatment Providers, Address
Financial Information/Refund [11,19,36] | Government or Institutions
Managerial & Legal Information [21,22,27,31] | Patient Consent, Date of Death (if applicable), Cause of Death

Category (Research CDEs) | Subcategory
Type of Study [11,18,22,28,30] | Cohort, Clinical Trial, ID of Study
Location of Study [11,18,22] | Hospital, Palliative Center, Day Clinic
Date of Start (Study) [11,18,22,28] | Date
Date of Last Contact with Patient [18,33] | Date
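One way to work with Table 3 programmatically is to store each CDE group with its category and citing references. The sketch below is illustrative only: the class name `CdeGroup` and its fields are hypothetical, only three rows are shown, and it assumes that each cited reference counts as one of the 25 included studies when computing a share.

```python
from dataclasses import dataclass

TOTAL_STUDIES = 25  # number of studies included in the review


@dataclass(frozen=True)
class CdeGroup:
    category: str      # "clinical", "non-clinical", or "research"
    name: str          # row label from Table 3
    references: tuple  # citation numbers listed for that row

    def share_of_studies(self) -> float:
        """Percentage of included studies citing this group (assumed denominator: 25)."""
        return 100 * len(self.references) / TOTAL_STUDIES


# Three example rows transcribed from Table 3.
groups = [
    CdeGroup("clinical", "Pathology Data",
             (11, 18, 20, 21, 23, 24, 25, 26, 27, 29, 30, 32, 33, 34, 37, 39)),
    CdeGroup("non-clinical", "Demographic",
             (11, 21, 22, 23, 25, 26, 27, 28, 29, 34, 36, 37, 39)),
    CdeGroup("research", "Type of Study", (11, 18, 22, 28, 30)),
]

for g in groups:
    print(f"{g.category:>12} | {g.name}: {g.share_of_studies():.0f}%")
```

Under that assumption, the three shares come out at 64%, 52%, and 20%, which are the figures quoted for these groups in the Discussion.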

Discussion

The purpose of this study was to provide an overview of breast cancer data elements in research databases. The results fall into four categories: 1) the trend of studies over time, 2) the study site, 3) the domain of the studies, and 4) the CDEs of breast cancer derived from these articles. Studies were also conducted before 2007, but they fell outside the scope of this review; within the 2007–2019 period, most studies were conducted between 2013 and 2015. In 2013, the United Kingdom accounted for the most studies (8%), followed by the United States and the Netherlands with 4% each. In 2015, the US accounted for the most studies (8%), followed by Iran and Thailand with 4% each. Most studies on the creation and use of CDEs came from the US and European countries: 52 percent of studies were from the US and 32 percent from Europe. Within Europe, the UK accounted for the largest share of studies on breast cancer CDEs (20%), followed by the Netherlands (8%) and Germany (4%). No studies on CDEs for breast cancer were found from countries such as France, Italy, and Norway. With respect to domain, most of the studies were in pathology reporting and registry systems, with 16% each, followed by integration (8%) and screening and diagnosis (8%). Other domains in which CDEs were created and used were mammography, cancer treatment, data harmonization, biomarkers, immunology, Big Data, interoperability, imaging, documentation and medical forms, minimum data sets, clinical trials, mesothelioma virtual tissue, and virtual biorepository, each making up 4%. For the CDEs used in cancer research databases, the research team divided the elements into three categories: clinical, research, and non-clinical. Clinical CDEs recurred most often across the reviewed literature and were subdivided into subcategories. Most CDEs fell into the clinical category, which was addressed in most articles.
Among the clinical CDEs, physical examination and clinical history,[11,18,19,23,24,25,26,27,28,29,30,31,32,33,34] diagnosis,[18,19,20,21,22,23,25,30,33,35,36] pathologic data,[11,18,20,21,23,24,25,26,27,29,30,32,33,34,37,39] type and outcome of treatment,[11,19,21,22,23,26,30,31,33,35,36,38,39] and specific biomarkers of breast cancer and hormone therapy[11,21,28,30,31,32,33,34,37,38,41] are of greater importance and were recommended for use in most studies. Other CDEs include patient follow-up,[18,23,25,29,30,33,35,38,39] surgical data,[18,20,26,27,33,34,38] lymph node status,[18,20,24,27,30,31,32] genetic and genomic data,[21,28,33,35,38,41] histological data,[21,24,31,32,33,39] personal history,[18,19,20,21,22] cancer data,[31,32,37,38,39] radiology, mammography, ultrasonography and MRI data,[19,35,36,42] core needle biopsy,[19,33,40] epidemiologic data,[25,29] and laboratory tests.[19,23] The review revealed that CDEs for physical examination and clinical history, diagnosis, and pathologic data are very important in collecting and organizing CDEs in research databases and need to be used in the design and creation of registers and databases. Overall, the most frequently shared clinical CDEs of breast cancer were pathology and physical examination data items (64%). In the non-clinical category, demographic data[11,21,22,23,25,26,27,28,29,34,36,37,39] are important for designing CDEs; the other subcategories were identification information,[23,34,36] managerial and legal information,[21,22,27,31] contact information,[11,33,36] and financial information.[11,19,36] Among the non-clinical CDEs, demographic data were the most important, appearing in 52% of studies. In the category of research CDEs, the type of study[11,18,22,28,30] was the most important, and the date of start of the study,[11,18,22,28] the location of the study,[11,18,22] and the date of last visit or contact with the patient[18,33] were also identified in the relevant articles. Among the research CDEs, type of study was the most frequent, at 20 percent.
Storing and retrieving data through original data definitions using CDEs is a way of integrating clinical data from different databases.[21,28,29,30] The use of CDEs to underpin clinical research resources such as tissue banks demonstrates the efficacy and standardization of a data model that can be applied in other domains of biomarkers and bioinformatics associated with breast cancer.[22,28,29,39] CDEs are also suitable for short-term studies on large datasets, in which these data elements act as a “mediator” of a unified model for mapping biomedical ontologies.[26,39] Furthermore, CDEs facilitate the creation and use of EHR data by serving as an interface for connecting local data to an integrated EHR or a national registry.[27,34] Like the Sluijter study, this review emphasizes the importance of pathological data elements, such as those describing 'resection margins', 'DCIS size', 'location' and 'presence of calcifications'.[32] The present study had limitations, such as missing studies published in other languages or studies whose full text was not available.

Conclusion

Medical research into various diseases, including cancer, requires the collection, processing, and exchange of data among different centers. Data exchange improves data efficiency, prevents rework, saves time and cost, and ultimately enhances the quality of medical research. One of the standard methods in this field is the use of data standards, including data collection in the form of CDEs. Integrated datasets lead to integrated terminology, which facilitates data management across the mass of patient data collected. Accordingly, CDEs can be the basis for achieving higher levels of standardization and data quality and for facilitating the application of health information technology in breast cancer research centers. However, the identification and use of CDEs need to be coordinated across different healthcare systems so that standard data for breast cancer patients can be used to improve primary care.

Financial support and sponsorship

This study is part of a PhD dissertation supported by Iran University of Medical Sciences (Grant No: IUMS/SHMIS_2017/9321563004).

Conflicts of interest

There are no conflicts of interest.
  37 in total

1.  A method to map heterogeneity between near but non-equivalent semantic attributes in multiple health data registries.

Authors:  Nadine Schuurman; Agnieszka Leszczynski
Journal:  Health Informatics J       Date:  2008-03       Impact factor: 2.681

2.  Data Harmonization for a Molecularly Driven Health System.

Authors:  Jerry Ssu-Hsien Lee; Warren Alden Kibbe; Robert Lee Grossman
Journal:  Cell       Date:  2018-08-23       Impact factor: 41.582

3.  Evaluation of an Automated Information Extraction Tool for Imaging Data Elements to Populate a Breast Cancer Screening Registry.

Authors:  Ronilda Lacson; Kimberly Harris; Phyllis Brawarsky; Tor D Tosteson; Tracy Onega; Anna N A Tosteson; Abby Kaye; Irina Gonzalez; Robyn Birdwell; Jennifer S Haas
Journal:  J Digit Imaging       Date:  2015-10       Impact factor: 4.056

4.  Agreement of Iranian breast cancer data and relationships with measuring quality of care in a 5-year period (2006-2011).

Authors:  Ali Keshtkaran; Roxana Sharifian; Saeed Barzegari; Abdolrasoul Talei; Seddigheh Liu; Hui Tahmasebi
Journal:  Asian Pac J Cancer Prev       Date:  2013

5.  DW4TR: A Data Warehouse for Translational Research.

Authors:  Hai Hu; Mick Correll; Leonid Kvecher; Michelle Osmond; Jim Clark; Anthony Bekhash; Gwendolyn Schwab; De Gao; Jun Gao; Vladimir Kubatin; Craig D Shriver; Jeffrey A Hooke; Larry G Maxwell; Albert J Kovatich; Jonathan G Sheldon; Michael N Liebman; Richard J Mural
Journal:  J Biomed Inform       Date:  2011-08-22       Impact factor: 6.317

6.  Using data to effectively manage a national screening program.

Authors:  Brandie Yancy; Janet E Royalty; Steve Marroulis; Cindy Mattingly; Vicki B Benard; Amy DeGroff
Journal:  Cancer       Date:  2014-08-15       Impact factor: 6.860

7.  A metadata approach for clinical data management in translational genomics studies in breast cancer.

Authors:  Irene Papatheodorou; Charles Crichton; Lorna Morris; Peter Maccallum; Jim Davies; James D Brenton; Carlos Caldas
Journal:  BMC Med Genomics       Date:  2009-11-30       Impact factor: 3.063

8.  Electronic Health Record Phenotypes for Precision Medicine: Perspectives and Caveats From Treatment of Breast Cancer at a Single Institution.

Authors:  Matthew K Breitenstein; Hongfang Liu; Kara N Maxwell; Jyotishman Pathak; Rui Zhang
Journal:  Clin Transl Sci       Date:  2018-01       Impact factor: 4.689

9.  Adherence to quality breast cancer survivorship care in four Canadian provinces: a CanIMPACT retrospective cohort study.

Authors:  Mary L McBride; Patti A Groome; Kathleen Decker; Cynthia Kendell; Li Jiang; Marlo Whitehead; Dongdong Li; Eva Grunfeld
Journal:  BMC Cancer       Date:  2019-07-04       Impact factor: 4.430

Review 10.  Big Data: the challenge for small research groups in the era of cancer genomics.

Authors:  Aisyah Mohd Noor; Lars Holmberg; Cheryl Gillett; Anita Grigoriadis
Journal:  Br J Cancer       Date:  2015-10-22       Impact factor: 7.640
