BACKGROUND: Metadata is data that describes other data or resources. It has a defined number of named elements that convey meaning. Medical data are complex to process. For example, in the Primary Care Data Quality (PCDQ) renal programme, we need to collect over 300 variables because there are so many possible causes of renal disease. These variables are not just single columns of data: some are extracted as code plus date, while others are code-date-value. Metadata has the potential to improve the reliability of processing large datasets.

OBJECTIVE: To define unique and unambiguous metadata headings for clinical data and derived variables.

METHOD: We defined the look-up tables we would use as a controlled vocabulary to name the core clinical concepts within the metadata. We added six other elements to describe data: (1) the study or audit name; (2) the query used to extract the data; (3) the data collection number; (4) the type of data, including specifying the units; (5) the repeat number (if the variable was extracted more than once); and (6) a processing suffix that defines how the data have been processed.

RESULTS: The metadata system has enabled the development of a query library and an analysis syntax library that make data processing and analysis more efficient. Its stability means greater effort can be put into more complex data processing, and some semiautomation of processes. However, the system has had implementation problems. It has been particularly hard to stop clinicians using multiple synonyms for the same variable.

CONCLUSIONS: The PCDQ metadata system provides an auditable method of data processing. It is a method that should improve the reliability, validity and efficiency of processing routinely collected clinical data. This paper sets out to demystify our data processing method and makes the PCDQ metadata system available to clinicians and data processors who might wish to adopt it.
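As an illustration only, the seven-element heading described in the METHOD (a core clinical concept plus the six descriptive elements) might be sketched as below. The abstract does not give the element order, the separator, or any actual codes, so the underscore separator, the ordering, the class and method names, and all example values (`RENAL`, `Q12`, `CREATININE`, `RAW`) are assumptions made for this sketch and are not taken from the PCDQ system itself. The vocabulary check shows one possible way to reject synonyms for the same variable.

```python
from dataclasses import dataclass

SEP = "_"  # assumed separator; the abstract does not specify one


@dataclass(frozen=True)
class MetadataHeading:
    """Hypothetical container for one metadata heading."""
    study: str       # (1) study or audit name
    query: str       # (2) query used to extract the data
    collection: int  # (3) data collection number
    concept: str     # core clinical concept from the controlled vocabulary
    data_type: str   # (4) type of data, including units
    repeat: int      # (5) repeat number
    suffix: str      # (6) processing suffix

    def compose(self) -> str:
        """Join the elements into a single heading string."""
        parts = [
            self.study, self.query, str(self.collection), self.concept,
            self.data_type, str(self.repeat), self.suffix,
        ]
        for part in parts:
            # An empty element or an embedded separator would make the
            # heading ambiguous when it is split again.
            if not part or SEP in part:
                raise ValueError(f"invalid element: {part!r}")
        return SEP.join(parts)

    @classmethod
    def parse(cls, heading: str, vocabulary: set) -> "MetadataHeading":
        """Split a heading and check the concept against the vocabulary."""
        parts = heading.split(SEP)
        if len(parts) != 7:
            raise ValueError(f"expected 7 elements, got {len(parts)}")
        study, query, collection, concept, data_type, repeat, suffix = parts
        if concept not in vocabulary:
            # Synonyms that are not in the look-up table are rejected.
            raise ValueError(f"unknown concept: {concept!r}")
        return cls(study, query, int(collection), concept,
                   data_type, int(repeat), suffix)


# Hypothetical example values, not real PCDQ codes.
vocabulary = {"CREATININE", "SBP"}
heading = MetadataHeading("RENAL", "Q12", 1, "CREATININE",
                          "VALUE-umol/L", 1, "RAW")
text = heading.compose()
restored = MetadataHeading.parse(text, vocabulary)
```

Under these assumptions, composing and then parsing a heading returns an equal object, and a heading whose concept is a synonym missing from the look-up table raises `ValueError` rather than silently creating a second name for the same variable.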