Zahra Almowil, Shang-Ming Zhou, Sinead Brophy, Jodie Croxall.
Abstract
BACKGROUND: Big data research in the field of health sciences is hindered by a lack of agreement on how to identify and define different conditions and their medications. This means that researchers and health professionals often have different phenotype definitions for the same condition. This lack of agreement makes it difficult to compare different study findings and hinders the ability to conduct repeatable and reusable research.
Keywords: clinical codes; concept libraries; electronic health records; record linkage; reproducible research
Year: 2022 PMID: 35289755 PMCID: PMC8965669 DOI: 10.2196/31021
Source DB: PubMed Journal: JMIR Hum Factors ISSN: 2292-9495
Question guide for the one-to-one interviews.
| Introductory questions | Follow-up questions | Key questions | Final questions |
| To improve repeatable research in Swansea, a team of developers is developing a prototype concept library. This is a portal that allows access to the Read codes or International Classification of Diseases–10 codes to identify conditions. Do you think this will be a helpful resource? Is the concept library a good idea that we should continue to develop? | Do you know about other already existing concept libraries? What do you think about them? Something like this exists at UCL^a called CALIBER^b. Have you seen CALIBER? Have you used it? | Do you prefer to use ready-made algorithms or to have access to them to modify them? In your opinion, how should codes and algorithms be validated, and should they be validated? (Why should or should not?) There are often different versions of a diagnosis (eg, highly specific and suspected or likely cases). Do you think we need to collect and validate the best two versions of a diagnosis (specific or suspected)? Or do you think we should put all possible methods of identifying a condition, valid or not, and allow the researcher to choose? | What are your requirements for the concept library for it to be helpful and user-friendly? What developments would you like to see to improve repeatable research using routine data? |
^a UCL: University College London.
^b CALIBER: Clinical Disease Research Using Linked Bespoke Studies and Electronic Health Records.
Presentation of the themes and subthemes of the one-to-one interviews.
| Themes and subthemes | Examples of participant narratives |
| | |
| Positive | “If there’s a way of doing that already that is set up and is validated and is consistently applied that would be an amazingly useful resource” (researcher 2). |
| Neutral | “It will be helpful, but it needs to be extended. If they want to build something like this, and it is effectively working as a library, you need two things to be happened: (1) people are happy to feed in their constructs so it builds up, and (2) a useful library, easy to go, to browse, and to borrow phenotypes definitions” (a clinician). |
| Negative | None |
| | |
| Simplicity | “Simple plain English not in SQL or python” (a clinician). |
| Searching ability | “What is the type of search engine? Is it a search engine that just does disease phenotypes or also does the health status phenotypes or risk factor phenotypes, symptoms phenotypes?” (a clinician). |
| Data quality | “It’s really just about transparency and documentation. So, anybody can effectively do anything that can be turned into a reproducible research output. The barriers are usually not enough time to comment and document it properly and then not enough quality assurance” (a senior research manager). |
| Sharing ability | “It would be very useful to share the knowledge about codes such as read codes, ICD 10 codes, or OPCS codes, and share ideas and concepts between other users that will save lots of time” (researcher 3). |
| | |
| Interoperability | “How interoperable it is with other systems because the major failure of most of these systems is that they’re not interoperable, so people don’t use them” (a senior research manager). |
| Accessibility | “So, from a group like myself, or me as a user, we would probably like direct access to the underlying data it stores. So, whether that’s through something like SQL directly, or something like that through a statistical package, because where we do lots of bulk type work” (a senior research manager). |
| Analyzability | “I wanted to look at all health codes of my study population. Then, through machine learning, like feature selection, I tried to identify the most important list of codes, which are associated with the popular health conditions” (researcher 1). |
| | |
| Aware (used them) | “Yes, so with QOF, we definitely used QOF codes a lot, because obviously going back to the quality assurance question, they’d been assured so that the NHS can use them for remuneration of money and payments. With other systems, we tend to look online to see CALIBER of things with us, then yes we have used outputs from those systems before” (a senior research manager). |
| Aware (not used them) | “No. I have not used any of these things before so I think there is CALIBER and I think, is that part of what was set up within the previous Farr institute? so I am aware that some of these exist but I haven’t looked into them before” (researcher 2). |
| Not aware | None |
| Theme (4): user’s recommendation to improve repeatable research | “If we want reproducible research, we have to all be using these resources in a similar way or at least we need to be able to understand what previous projects have done. It is about setting things out clearly. Clear definitions, clear sets of codes that people can then either use themselves or build on I think” (researcher 2). |
A summary of general information on the participants in the focus group discussions (N=14).
| Parameters | Information |
| Current job position, n (%) | Data scientist, 13 (93); Financial planner, 1 (7) |
| Sex, n (%) | Female, 5 (36); Male, 9 (64) |
| Education, n (%) | PhD degree, 6 (43); Master’s degree, 6 (43); Bachelor’s degree, 2 (14) |
| Research interests | Data scientists: Concept libraries; Repeatable research with large health data; Phenotyping and code lists of cancer disease; Respiratory disease; Algorithm or reusable codes development; Asthma; Collaboration in research methods; Data analysis; Machine learning; Arthritis; Health informatics; Musculoskeletal disorders; Healthy aging; Gut-brain axis; Neurodegenerative conditions; Statistical methods; Epidemiology; Cancer. Financial planners: Intervention between primary care and secondary care and how they interact |