Loes C M Bertens, Berna D L Broekhuizen, Christiana A Naaktgeboren, Frans H Rutten, Arno W Hoes, Yvonne van Mourik, Karel G M Moons, Johannes B Reitsma.
Abstract
BACKGROUND: In diagnostic studies, a single and error-free test that can be used as the reference (gold) standard often does not exist. One solution is the use of panel diagnosis, i.e., a group of experts who assess the results from multiple tests to reach a final diagnosis in each patient. Although panel diagnosis, also known as consensus or expert diagnosis, is frequently used as the reference standard, guidance on preferred methodology is lacking. The aim of this study is to provide an overview of methods used in panel diagnoses and to provide initial guidance on the use and reporting of panel diagnosis as reference standard.
Year: 2013 PMID: 24143138 PMCID: PMC3797139 DOI: 10.1371/journal.pmed.1001531
Source DB: PubMed Journal: PLoS Med ISSN: 1549-1277 Impact factor: 11.069
Figure 1. Distribution of search results over time.
Dark grey columns represent the number of articles found with the search strategy, numbers displayed on right y-axis; light grey columns represent the articles included in the review after full text reading, numbers displayed on left y-axis.
Figure 2. PRISMA flowchart of the selection of relevant papers.
Study characteristics of articles assessing psychiatric disorders, n = 30.
| Study Characteristics | Panel Members | Information for Panel Diagnosis | Decision Process | Validity | ||||||||||
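| Author, Year | Study Population (n) | Study Aim | Panel Members (n) | Fields of Expertise (n) | Available Information | Original Data Available | Blinding | Disease Categories (n) | Patients Assessed by Panel (n) | Initial Evaluation | Decision Making | Disagreements | Reproducibility | Comparison to Other Reference Test |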
| Brugha, 2011 | 400 | Accuracy | 6 | 1 | Q | ? | N | 4 | 400 | Y | Consensus | ? | Y | Y |
| Carnero-Pardo, 2011 | 139 | Accuracy | 2 | 1 | PE, Q | ? | Y | 3 | 139 | N | Consensus | Additional expert | ? | N |
| Duberstein, 2011 | 191 | Accuracy | ? | ? | PH, Q | ? | ? | ? | 191 | ? | Consensus | ? | ? | ? |
| Girard, 2011 | 32 | ? | 2 | 2 | PH, I, Q | ? | ? | 2 | 32 | Y | Individual | ? | ? | ? |
| Johnson, 2011 | 173 | Accuracy | 2 | 1 | Q | ? | ? | 2 | 173 | ? | Consensus | ? | ? | N |
| Ogunniyi, 2011 | 1,733 | Prevalence | ? | ? | PE, BT, I, Q | ? | ? | 3 | 1733 | ? | Consensus | ? | Y | N |
| Plassman, 2011 | 217 | Prevalence | ? | 4 | PH, PE, BT, Q | ? | N | 2 | 217 | ? | Consensus | ? | ? | N |
| Hall, 2009 | 3,392 | Prevalence | ? | ? | PH, PE, Q | ? | Y | 3 | ? | ? | Consensus | ? | ? | ? |
| Potter, 2009 | 645 | Prediction model | ? | 2 | PH, PE, Q | ? | Y | 2 | 645 | ? | Consensus | ? | ? | N |
| Steenland, 2008 | 204 | Prediction model | 2 | 1 | PH, Q | ? | Y | 3 | 20 | Y | Individual | Consensus | Y | ? |
| Plassman, 2007 | 856 | Prevalence | 4 | ? | PH, PE, BT, Q | ? | N | 3 | 856 | ? | Consensus | Additional information | N | ? |
| Baird, 2006 | 255 | Prevalence | ? | ? | Q | ? | ? | ? | 255 | ? | ? | ? | Y | ? |
| Graff-Radford, 2006 | 128 | Accuracy | ? | 4 | ? | ? | Y | ? | 128 | ? | Consensus | ? | ? | ? |
| Sachdev, 2006 | 252 | Prediction model | 4 | 2 | PH, PE, FT, I | ? | ? | ? | 252 | ? | Consensus | ? | ? | ? |
| Boustani, 2005 | 227 | Prevalence | 4 | 4 | PH, BT, FT, I | ? | ? | 3 | 227 | ? | ? | ? | ? | ? |
| Williams, 2005 | 40 | Accuracy | 3 | ? | Q | ? | ? | 2 | 40 | ? | Consensus | ? | ? | Y |
| De Koning, 2004 | 410 | Accuracy | 3 | ? | Q | ? | Y | 5 | 410 | Y | Individual | Combined averages | ? | ? |
| Laurila, 2004 | 425 | Accuracy | 3 | 1 | PH, I, Q | ? | ? | 3 | 425 | ? | Consensus | ? | ? | ? |
| Miller, 2001 | 56 | Accuracy | 3 | ? | PH, BT, I, Q | ? | N | ? | 56 | ? | Consensus | ? | ? | ? |
| Bienvenu, 2000 | 153 | Prevalence | 2 | 1 | PH, PE, Q | Y | Y | 4 | 153 | Y | Individual | ? | N | N |
| Magaziner, 2000 | 2,285 | Prevalence | 2 | 2 | PH, Q | ? | ? | 3 | ? | Y | Individual | Additional expert | Y | ? |
| Weintraub, 2000 | 2,135 | Prediction model | 2 | 2 | PH, Q | ? | ? | 3 | 406 | Y | Individual | Additional expert | Y | ? |
| Fladby, 1999 | 40 | Accuracy | ? | ? | ? | ? | ? | 2 | 40 | ? | Consensus | ? | ? | N |
| Ogunniyi, 1998 | 77 | Prevalence | ? | 1 | PH, BT, FT, I | ? | ? | ? | 77 | ? | Consensus | ? | Y | ? |
| Gulevich, 1997 | 185 | Accuracy | 3 | 3 | PH, PE | Y | Y | 3 | 185 | Y | Consensus | ? | ? | ? |
| Wiener, 1997 | 20 | Inter-rater variability | 2 | 1 | Q | ? | ? | ? | 20 | ? | Consensus | ? | ? | ? |
| Class, 1996 | 106 | Prevalence | 3 | 2 | PH, PE, BT, FT, I | ? | ? | ? | 106 | ? | Consensus | ? | Y | ? |
| Tanenberg-Karant, 1995 | 196 | Prevalence | ? | 1 | PH, Q | ? | Y | 2 | 196 | Y | Individual | Consensus | Y | ? |
| Fennig, 1994 | 232 | Accuracy | 2 | 1 | PH, Q | ? | ? | ? | 232 | Y | Individual | Consensus | Y | ? |
| Drake, 1990 | 75 | Prevalence | ? | ? | PH, Q | ? | N | 2 | ? | Y | Consensus | Additional expert | ? | ? |
Abbreviations: ?, not reported; BT, blood test; FT, function test; I, imaging; N, no; PE, physical examination; PH, patient history; Q, questionnaire; Y, yes.
Study characteristics of articles assessing diseases from other medical domains, n = 17.
| Study Characteristics | Panel Members | Information for Panel Diagnosis | Decision Process | Validity | |||||||||||
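| Author, Year | Study Population (n) | Study Aim | Medical Domain | Panel Members (n) | Fields of Expertise (n) | Available Information | Original Data Available | Blinding | Disease Categories (n) | Patients Assessed by Panel (n) | Initial Evaluation | Decision Making | Disagreements | Reproducibility | Comparison to Other Reference Test |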
| Ham, 2012 | 127 | Accuracy | DD | 2 | 2 | PH, PE, BT, I, FU | ? | ? | 2 | 127 | N | Consensus | ? | ? | ? |
| Bisulli, 2011 | 101 | Accuracy | ND | 3 | 2 | PH, I, Q, FU | ? | Y | 2 | 101 | ? | ? | ? | ? | N |
| Gamez-Diaz, 2011 | 630 | Accuracy | BD | 3 | 3 | PH, BT, I | ? | Y | 2 | 221 | Y | Individual | Consensus | Y | Y |
| Van Randen, 2011 | 1,021 | Accuracy | DD | 3 | 2 | PH, PE, BT, I, FU | ? | N | ? | 1021 | Y | Individual | Consensus | Y | ? |
| Whiteley, 2011 | 356 | Accuracy | ND | ? | 3 | PH, PE, I, FU | ? | Y | 3 | 356 | ? | ? | ? | ? | ? |
| Hardie, 2010 | 51 | Accuracy | DD | 2 | 1 | I | ? | N | 2 | 51 | ? | Consensus | ? | ? | N |
| O'Toole, 2010 | 75 | Accuracy | MD | 4 | 1 | I | Y | Y | 2 | 75 | N | Consensus | Majority | ? | N |
| Thabut, 2010 | 242 | Accuracy | BD | 3 | ? | PE, BT | N | ? | 3 | 242 | Y | Individual | Consensus | ? | ? |
| Amour, 2008 | 276 | Accuracy | ID | 2 | ? | PH, PE, BT | ? | Y | 5 | 276 | Y | Individual | Additional expert | Y | ? |
| Humphries, 2008 | 44 | Accuracy | UD | ? | 2 | PE, I | ? | ? | ? | 3 | ? | Consensus | ? | ? | ? |
| Lin, 2007 | 72 | Accuracy | UD | 2 | 1 | PH, PE, I, FU | ? | ? | ? | 72 | ? | Consensus | ? | ? | ? |
| Tadros, 2006 | 44 | Accuracy | MD | ? | ? | I | N | N | 2 | 44 | ? | Consensus | ? | ? | ? |
| Otte, 2005 | 102 | Accuracy | GD | ? | ? | PH, FT, I | ? | N | ? | 102 | ? | Consensus | ? | ? | Y |
| Robin, 2005 | 261 | Accuracy | ED | 9 | 1 | PH, FT | ? | Y | 4 | 261 | Y | Individual | Consensus | ? | ? |
| Tepper, 2004 | 377 | Prevalence | ND | 4 | ? | PH | ? | Y | ? | 377 | ? | ? | Consensus | ? | ? |
| Penzkofer, 2002 | 80 | Accuracy | ND | 2 | 1 | I | ? | Y | 2 | 80 | ? | ? | ? | ? | ? |
| Weih, 2001 | 4,744 | Prevalence | ED | 6 | 1 | PH, PE, I | ? | ? | 3 | 4744 | Y | Individual | Consensus | ? | N |
Abbreviations: ?, not reported; BD, disorders of the blood; BT, blood test; DD, disorders of the digestive system; ED, eye disorders; FT, function test; FU, follow-up; GD, gastroenterological disorders; I, imaging; MD, musculoskeletal disorders; N, no; ND, disorders of the nervous system; PE, physical examination; PH, patient history; Q, questionnaire; UD, disorders of the genitourinary system; Y, yes.
Study characteristics of articles assessing cardiovascular disease, n = 17.
| Study Characteristics | Panel Members | Information for Panel Diagnosis | Decision Process | Validity | ||||||||||
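| Author, Year | Study Population (n) | Study Aim | Panel Members (n) | Fields of Expertise (n) | Available Information | Original Data Available | Blinding | Disease Categories (n) | Patients Assessed by Panel (n) | Initial Evaluation | Decision Making | Disagreements | Reproducibility | Comparison to Other Reference Test |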
| Assomull, 2011 | 120 | Accuracy | 3 | 1 | PH, I | ? | N | 6 | 120 | N | Consensus | Majority | ? | Y |
| Doubal, 2011 | 355 | Prediction model | 3 | 3 | BT, I, FU | Y | N | ? | ? | ? | ? | ? | ? | N |
| Kelder, 2011 | 47 | Accuracy | 3 | 3 | PH, PE, BT, FT, I, FU | ? | Y | ? | 47 | ? | ? | ? | ? | ? |
| Kelder, 2011 | 200 | Accuracy | 3 | 3 | PH, PE, BT, FT, I, FU | ? | Y | ? | 200 | ? | ? | ? | ? | ? |
| Oudejans, 2011 | 206 | Prediction model | 4 | 4 | PH, PE, BT, I, FU | ? | Y | 2 | 206 | N | Consensus | Considered absent | Y | N |
| Bosner, 2010 | 1,199 | Prediction model | 3 | 3 | PH, PE, FT, FU | ? | N | 2 | 1199 | ? | ? | ? | ? | ? |
| Gaikwad, 2008 | 33 | Accuracy | 2 | 1 | PH, I | ? | ? | 2 | 33 | N | Consensus | ? | ? | ? |
| Hoffmann, 2007 | 70 | Accuracy | 2 | 1 | PH, FT, I | Y | ? | 2 | 9 | ? | Consensus | ? | ? | ? |
| Kantarci, 2007 | 33 | Accuracy | 2 | 2 | I | Y | ? | ? | 33 | ? | Consensus | ? | ? | ? |
| Linn, 2007 | 19 | Accuracy | 3 | ? | PH, I, FU | N | ? | ? | 19 | ? | Consensus | ? | ? | ? |
| Nordenholz, 2007 | 254 | Prevalence | 2 | 1 | I, DID | ? | ? | 3 | 15 | Y | Consensus | ? | ? | ? |
| Hoffmann, 2006 | 103 | Prediction model | 2 | 2 | PH, BT, FT, DID | ? | Y | 2 | 103 | ? | Consensus | Additional expert | ? | Y |
| Hoffmann, 2006 | 40 | Accuracy | 2 | 2 | PH, BT, FT, DID | N | Y | 2 | 40 | ? | ? | Consensus | ? | ? |
| Hoffmann, 2006 | 100 | Accuracy | 2 | 1 | PH, FT, I | Y | N | 2 | 15 | Y | Consensus | ? | ? | N |
| Trevelyan, 2003 | 401 | Accuracy | 3 | 2 | PH, BT, FT | ? | ? | 4 | 401 | ? | ? | ? | ? | Y |
| Dao, 2001 | 250 | Accuracy | 2 | 1 | PH, PE, BT, I, FU | ? | Y | 3 | 250 | Y | Consensus | Additional information | ? | ? |
| Remy-Jardin, 2000 | 82 | Accuracy | 2 | 1 | I | Y | ? | 2 | 82 | ? | Consensus | Additional information | ? | ? |
Abbreviations: ?, not reported; BT, blood test; DID, discharge or preliminary diagnosis; FT, function test; FU, follow-up; I, imaging; N, no; PE, physical examination; PH, patient history; Y, yes.
Study characteristics of articles assessing respiratory disorders, n = 10.
| Study Characteristics | Panel Members | Information for Panel Diagnosis | Decision Process | Validity | ||||||||||
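| Author, Year | Study Population (n) | Study Aim | Panel Members (n) | Fields of Expertise (n) | Available Information | Original Data Available | Blinding | Disease Categories (n) | Patients Assessed by Panel (n) | Initial Evaluation | Decision Making | Disagreements | Reproducibility | Comparison to Other Reference Test |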
| Guder, 2012 | 405 | Accuracy | 2 | 2 | PH, FT, I | ? | N | 2 | 405 | ? | Consensus | ? | Y | ? |
| Mohammed Hoessein, 2012 | 342 | Accuracy | 2 | 2 | PH, PE, FT | ? | N | 2 | 342 | N | Consensus | Additional expert | Y | ? |
| Thieme, 2012 | 15 | Accuracy | 4 | 2 | I | ? | N | 2 | 15 | Y | Individual | Consensus | ? | ? |
| Broekhuizen, 2011 | 372 | Accuracy | 2 | ? | PH, PE, FT, FU | ? | Y | 2 | 372 | N | Consensus | ? | ? | ? |
| Broekhuizen, 2010 | 353 | Prevalence | 2 | 2 | PH, PE, FT, FU | ? | N | 2 | 353 | N | Consensus | Additional expert | Y | N |
| Szucs-Farkas, 2009 | 120 | Accuracy | 2 | 1 | PH, I | ? | N | 2 | 120 | ? | Consensus | Additional expert | ? | Y |
| Reinartz, 2006 | 53 | Accuracy | ? | ? | BT, I, FU, DID | ? | Y | ? | 53 | ? | Consensus | ? | ? | ? |
| Chavannes, 2004 | 12 | Accuracy | 4 | 3 | PH, PE, FT | ? | Y | 4 | 12 | N | Consensus | ? | ? | N |
| Reinartz, 2004 | 83 | Accuracy | ? | ? | BT, I, FU, DID | ? | N | ? | 83 | ? | Consensus | ? | ? | ? |
| Gauvin, 2003 | 30 | Accuracy | 3 | ? | PH, PE, BT, I | ? | Y | 2 | 30 | Y | Individual | Consensus | ? | ? |
Abbreviations: ?, not reported; BT, blood test; DID, discharge or preliminary diagnosis; FT, function test; FU, follow-up; I, imaging; N, no; PE, physical examination; PH, patient history; Y, yes.
Study characteristics of articles assessing multiple diseases, n = 7.
| Study Characteristics | Panel Members | Information for Panel Diagnosis | Decision Process | Validity | ||||||||||||
| Author, Year | Study Population (n) | Study Aim | Medical Domain(s) | | | | Available Information | Original Data Available | Blinding | Disease Categories (n) | Patients Assessed by Panel (n) | Initial Evaluation | Decision Making | Disagreements | Reproducibility | Comparison to Other Reference Test |
| Ray, 2006 | 514 | Accuracy | CD, RD | 8 | 2 | 6 | PH, PE, BT, FT, I | N | N | ? | 514 | Y | Individual | Additional expert | Y | ? |
| Rutten, 2005 | 405 | Prevalence | CD, RD | 2 | 4 | 3 | PH, PE, BT, FT, I | ? | ? | 3 | 405 | ? | Consensus | ? | ? | ? |
| White, 2005 | 69 | Accuracy | CD, RD | 6 | 3 | 3 | PH, PE, I, FU, DID | ? | N | 2 | 69 | ? | Consensus | ? | ? | Y |
| Marshall, 2004 | 107 | Accuracy | GD | 3 | 6 | 3 | PH, BT, I | N | N | 2 | 107 | N | Consensus | ? | ? | ? |
| Jorgensen, 1998 | 148 | Accuracy | CD, GD, MD, RD | 6 | 7 | 3 | PH, BT | ? | Y | 2 | 148 | ? | Consensus | ? | ? | Y |
| Geirnaerdt, 1997 | 78 | Inter-rater variability | MD | 2 | 2 | 1 | PH, I | Y | Y | ? | 78 | ? | Consensus | ? | ? | Y |
| Martinez, 1994 | 50 | ? | CD, PD, RD | 6 | 3 | ? | PH, PE, FT, FU | ? | ? | 2 | 50 | Y | Consensus | ? | ? | ? |
Abbreviations: ?, not reported; BT, blood test; CD, cardiovascular disorders; DID, discharge or preliminary diagnosis; FT, function test; FU, follow-up; GD, gastroenterological disorders; I, imaging; MD, musculoskeletal disorders; N, no; PD, psychiatric disorders; PE, physical examination; PH, patient history; RD, respiratory disorders; Y, yes.
The proportion of articles that reported on items related to panel constitution, information available and methods of decision making.
| Item | Number (%) of Articles |
| Panel constitution | |
| Number of panel members? | 63 (78%) |
| Field(s) of expertise? | 61 (75%) |
| Information available for panel diagnosis | |
| Which information was available for panel evaluation? | 79 (98%) |
| Was original/raw data available? | 10 (12%) |
| Blinding of tests to the panel? | 53 (65%) |
| Methods of decision making | |
| Was the entire study population assessed by the panel? | 71 (88%) |
| Disease classification? (e.g., present/absent) | 58 (72%) |
| How were the decisions on disease status made? | 71 (88%) |
| Handling of disagreements? | 29 (36%) |
Total number of studies is 81. The displayed items were inspired by the reporting guideline for diagnostic research. The number of articles represents those that reported something on the items concerning panel constitution, information available for panel diagnosis, and the methods of decision making. For example, 53 studies reported on blinding of tests to the panel; this could include listing the specific items that were not available for panel diagnosis (blinding) or the statement that all patient data and tests were available for panel diagnosis.
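The percentages in the table follow from the counts and the total of 81 studies. As a minimal check, assuming rounding to the nearest whole percent (the rounding rule is not stated in the source), the arithmetic can be sketched in Python. The names `TOTAL_STUDIES`, `reported`, and `percent_reported` are illustrative only.

```python
# Counts of articles reporting each item, transcribed from the table.
TOTAL_STUDIES = 81

reported = {
    "Number of panel members": 63,
    "Field(s) of expertise": 61,
    "Information available for panel evaluation": 79,
    "Original/raw data available": 10,
    "Blinding of tests to the panel": 53,
    "Entire study population assessed by the panel": 71,
    "Disease classification": 58,
    "How decisions on disease status were made": 71,
    "Handling of disagreements": 29,
}


def percent_reported(count, total=TOTAL_STUDIES):
    """Return the share of articles as a whole-number percentage.

    Assumption: nearest-integer rounding, which matches the table values.
    """
    return round(100 * count / total)


percentages = {item: percent_reported(n) for item, n in reported.items()}

for item, pct in percentages.items():
    print(f"{item}: {reported[item]} ({pct}%)")
```

Under that rounding assumption the sketch gives the same whole-number percentages as the table.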
Observed combinations of the decision process used in the reviewed articles.
| Initial Evaluation | n | Decision Process | n | Handling of Disagreements | n |
| Individual | 24 | Individual | 17 | Additional expert | 4 |
| | | | | Discussion | 10 |
| | | | | Other | 1 |
| | | | | Not reported | 2 |
| | | Plenary | 7 | Additional information | 1 |
| | | | | Additional expert | 1 |
| | | | | Not reported | 5 |
| Plenary | 11 | Plenary | 11 | Additional information | 1 |
| | | | | Additional expert | 3 |
| | | | | Voting | 2 |
| | | | | Not reported | 5 |
| Not reported | 46 | Plenary | 34 | Additional information | 1 |
| | | | | Additional expert | 2 |
| | | | | Discussion | 1 |
| | | | | Not reported | 30 |
| | | Not reported | 12 | Discussion | 2 |
| | | | | Not reported | 10 |
Initial evaluation of the information was done individually or during a plenary meeting, or no details were reported. Decisions on disease status were made by combining individual scores (Individual) or in a plenary meeting (Plenary), or no details were reported. Disagreements were handled as follows: for Additional expert, another expert was consulted to resolve disagreements; for Discussion, disagreements were resolved through discussion with all members; for Additional information, extra information was made available to members to resolve disagreements; for Voting, disagreements were resolved by choosing the opinion of the majority.
Other: averages of the panel members' assessments were calculated to decide on the disease status.
Example of one combination: the panel members first assessed the information individually, decided on the diagnosis in a plenary meeting, and resolved disagreement by consulting an additional expert.
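The voting rule mentioned in the footnote, where the opinion of the majority decides, can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any reviewed study: the function name `panel_decision` and the labels "present" and "absent" are hypothetical, and a tie is returned as unresolved so that it can be handed to discussion, an additional expert, or additional information.

```python
from collections import Counter


def panel_decision(individual_diagnoses):
    """Combine individual classifications into one panel diagnosis.

    Returns a tuple (diagnosis, unanimous). The diagnosis is None when
    the top two labels receive the same number of votes (a tie).
    """
    if not individual_diagnoses:
        raise ValueError("at least one panel member is required")

    # Rank labels by number of votes, most frequent first.
    ranked = Counter(individual_diagnoses).most_common()
    top_label, top_votes = ranked[0]

    unanimous = len(ranked) == 1
    tie = len(ranked) > 1 and ranked[1][1] == top_votes
    if tie:
        # No majority: leave the case for another disagreement procedure.
        return None, False
    return top_label, unanimous


print(panel_decision(["present", "present", "absent"]))
print(panel_decision(["present", "absent"]))
```

With two members who disagree, the sketch returns no diagnosis, which is one reason a panel that relies on voting may prefer an odd number of members.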
Figure 3. Flowchart of options to consider when planning and conducting panel diagnosis.
Options to consider when reporting or designing a study using a panel diagnosis as reference standard.
|
|
|
| Number of members | |
| Odd number for voting | |
| Background of the members | |
| One or multiple areas of expertise represented? Broad or narrow expertise of the members? Years of experience | |
| Same panel constitution for all patients? | |
| Same member(s) present in every panel? Same expertise represented in each panel? | |
|
|
|
| Sources or domains of information | |
| e.g., history taking, physical examination, previous medical history, imaging, blood tests, follow-up, working diagnoses, etc. | |
| Information presented with or without interpretation? | |
| Blinding? | |
| Blinding to what source of information? Complete or staged blinding? | |
|
|
|
| Individual assessment of information by panel members BEFORE group meeting? | |
| Selected subgroups withheld from panel assessment? | |
| Pre-specified decision rule? Agreement among members in individual assessment? | |
| Classification of the target condition | |
| Present/absent or multiple ordered categories? Probability estimations? | |
| Individual or plenary decision process? | |
| Handling of disagreements | |
| Plenary discussion? Additional expert and/or additional information? | |
|
|
|
| Agreement testing | |
| Reproducibility of plenary decision process? Inter-rater agreement? | |
| Face validity | |
| Comparing panel diagnosis to other possible reference tests: | |
| Comparison to clinical follow-up? Pre-specified decision rule? Obtain ‘gold standard’ in subgroup of patients? |
Panel diagnosis definition: diagnosis based on multiple tests, agreed on by multiple experts.
The default choice is paper-based summaries, including interpretation, of the information.
The default choice is that all patients are assessed by the panel.
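The table lists inter-rater agreement as one option for agreement testing but does not prescribe a statistic. As one commonly used possibility, Cohen's kappa for two raters can be sketched as below. The function name `cohens_kappa` and the example ratings are hypothetical and not taken from any reviewed study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters classifying the same patients.

    kappa = (observed - expected) / (1 - expected), where expected is the
    agreement expected by chance given each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of patients")

    n = len(rater_a)
    # Proportion of patients on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement from the marginal label frequencies of each rater.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum(freq_a[label] * freq_b[label] for label in labels) / n**2

    if expected == 1:
        # Both raters used a single identical label for every patient.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of four patients by two panel members.
member_1 = ["present", "present", "absent", "absent"]
member_2 = ["present", "absent", "absent", "absent"]
kappa = cohens_kappa(member_1, member_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the members agree on three of four patients (0.75) against a chance agreement of 0.5, so kappa is 0.5. Panels with more than two members or ordered categories would need a different statistic.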