Literature DB >> 31900125

Measurement properties of oral health assessments for non-dental healthcare professionals in older people: a systematic review.

Babette Everaars1,2, Linet F Weening-Verbree3, Katarina Jerković-Ćosić4, Linda Schoonmade5, Nienke Bleijenberg6,7, Niek J de Wit7, Geert J M G van der Heijden8.   

Abstract

BACKGROUND: Regular inspection of the oral cavity is required for prevention, early diagnosis and risk reduction of oral- and general health-related problems. Assessments to inspect the oral cavity have been designed for non-dental healthcare professionals, like nurses. The purpose of this systematic review was to evaluate the content and the measurement properties of oral health assessments for use by non-dental healthcare professionals in assessing older peoples' oral health, in order to provide recommendations for practice, policy, and research.
METHODS: A systematic search in PubMed, EMBASE.com, and Cinahl (via Ebsco) has been performed. Search terms referring to 'oral health assessments', 'non-dental healthcare professionals' and 'older people (60+)' were used. Two reviewers individually performed title/abstract, and full-text screening for eligibility. The included studies have investigated at least one measurement property (validity/reliability) and were evaluated on their methodological quality using "The Consensus-based Standards for the selection of health Measurement Instruments" (COSMIN) checklist. The measurement properties were then scored using quality criteria (positive/negative/indeterminate).
RESULTS: Out of 879 hits, 18 studies were included in this review. Five studies showed good methodological quality on at least one measurement property and 14 studies showed poor methodological quality on some of their measurement properties. None of the studies assessed all measurement properties of the COSMIN. In total eight oral health assessments were found: the Revised Oral Assessment Guide (ROAG); the Minimum Data Set (MDS), with oral health component; the Oral Health Assessment Tool (OHAT); The Holistic Reliable Oral Assessment Tool (THROAT); Dental Hygiene Registration (DHR); Mucosal Plaque Score (MPS); The Brief Oral Health Screening Examination (BOHSE) and the Oral Assessment Sheet (OAS). Most frequently assessed items were: lips, mucosa membrane, tongue, gums, teeth, denture, saliva, and oral hygiene.
CONCLUSION: Taken into account the scarce evidence of the proposed assessments, the OHAT and ROAG are most complete in their included oral health items and are of best methodological quality in combination with positive quality criteria on their measurement properties. Non-dental healthcare professionals, policymakers and researchers should be aware of the methodological limitations of the available oral health assessments and realize that the quality of the measurement properties remains uncertain.

Entities:  

Keywords:  Non-dental healthcare professional; Older people; Oral health; Oral health assessment

Mesh:

Year:  2020        PMID: 31900125      PMCID: PMC6942417          DOI: 10.1186/s12877-019-1349-y

Source DB:  PubMed          Journal:  BMC Geriatr        ISSN: 1471-2318            Impact factor:   3.921


Background

Nowadays, in Western countries more older people retain all or a major part of their natural teeth which brings along new challenges for the oral healthcare system. Highly complicated restorations (e.g. crowns, bridges, implants) make it more difficult to perform adequate oral self-care, especially in frail older people [1], and as such may result in (oral) health-related complications [2, 3]. Oral health problems like pain, abscesses, difficulties with eating and chewing may have a significant impact on older peoples’ self-esteem, well-being, social life, and quality of life [4, 5]. At the same time, oral problems like periodontitis are associated with for example cardiovascular diseases, diabetes and pneumonia [6, 7]. Therefore, prevention and early diagnosis of oral diseases are important for the risk reduction of developing further problems with oral and general health. Oral health prevention requires regular inspection of the oral cavity. Such inspections are traditionally performed by the dentist during preventive treatment sessions in dental practice. However, several barriers to seeking oral health care may contribute to a decrease in oral inspections. A review from Kiyak et al. (2005) concluded that barriers in seeking oral care in older people are depending on age, ethnicity, income, availability of dental insurances, type of residence (urban vs. rural), physical access and general health. Moreover, they concluded that attitude and psychosocial factors could contribute to older peoples’ oral healthcare-seeking behavior. Since (frail) older people seek less frequently dental care, the role of non-dental care professionals gained importance in contributing to screen and triage oral health problems [8-11]. Over twenty years, several oral health assessments have been developed for use by non-dental healthcare professionals like nurses and caregivers. For example, the Oral Health Assessment Tool (OHAT), the Revised Oral Assessment Guide (ROAG), The Holistic Reliable Oral Assessment Tool (THROAT), and comparable assessments have been developed for inspection and triage the oral cavity of older people [10, 12]. Such assessments may serve non-dental healthcare professionals, for example in the context of assessing oral health in older people. Moreover, specific oral assessments have been developed for cancer patients [13]. However, since this target group suffers from specific oral health issues like Mucositis, their oral healthcare demand differs from general older people and was not the focus of this review. Available oral health assessment as reported in the literature may differ in their approach and they are described as tools, instruments, guides, and sheets for oral cavity inspection or triage. In this review, we use the generic term oral health assessment for all of the approaches that aim to inspect the oral cavity of older people. Earlier studies reported that oral health assessments in practice should be: easy and simple to use, inexpensive, and only require basic equipment [10, 14]. Moreover, for evidence-based care decisions, the measurement properties of such (oral health) assessments are considered crucial and therefore should be tested. The measurement properties are divided into three domains [15, 16]: Validity, i.e. construct validity: align with the theoretical notion of oral health; content validity: include all items considered relevant by all stakeholders; criterion validity: correlates with a reference; Reliability, i.e. similar results are obtained for repeated measurements; Responsiveness, i.e. change over time is detected. Chalmers et al. (2005) performed a systematic review on oral health assessments for use by nurses and caregivers of older people with dementia [10]. They concluded that there is a lack of validated and reliable tools for oral cavity inspection by non-dental healthcare professionals. Since then, new oral health assessments have been developed. Some of these were tested on their validity and reliability [17-19], while others were not [13, 20, 21]. To date, an overview of these assessments and their measurement properties has not been published.

Objective

The purpose of this systematic review was to evaluate the content and the measurement properties of oral health assessments for use by non-dental healthcare professionals in assessing older peoples’ oral health, in order to provide recommendations for practice, policy, and research.

Methodology

Study design and strategy

To identify all relevant publications, systematic searches were performed in the bibliographic databases PubMed, EMBASE.com, and Cinahl (via Ebsco) from inception to 13 November 2017. Search terms included indexed terms from MeSH in PubMed, EMtree in EMBASE.com, Cinahl headings in Cinahl as well as free text terms. Search terms referring to ‘oral health assessments’ were used in combination with search terms comprising ‘non-dental healthcare professionals’ and ‘older people’ (60+). Duplicate studies were excluded. The full search strategies for all databases can be found in Additional file 1 (Search strategies for databases). Reference lists of included studies were screened for additional relevant studies (cross-reference check).

Selection process

Two reviewers (BE and LWV) independently screened all potentially relevant titles and abstracts for eligibility. The selection process was performed using Covidence, a Cochrane online technology platform, to fulfill this procedure at distance [22]. If necessary, the full-text article was checked for the eligibility criteria. Differences in judgment were resolved through a consensus procedure. Studies were included if they met the following criteria: (i) full text available of the original article; (ii) include oral health assessments for oral cavity inspection of older people (60+) developed for use by non-dental healthcare professionals; (iii) report original investigative data on one or more measurement properties. Moreover, they should fulfill the criteria as defined by The Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) for systematic reviews: www.database.cosmin.nl [23]. Studies were excluded if they concerned: (i) publications in other languages than English; (ii) oral health assessments developed for dental professionals; (ii) oral health-related quality of life instruments; (iii) oral screening instruments based only on questionnaires; and (iiii) oral health assessments exclusively developed for patients with cancer or another specific illnesses.

General information of the included studies

To give an overview of the included studies, information has been extracted on: authors, publication year, study design, investigated measurement property, type of non-dental healthcare professional, specification of the older people population, oral health assessment (and their items assessed), rating scale of the assessment and duration of the assessment. Data extraction was performed on all included studies.

Assessment of the methodological quality of the included studies per measurement property

When validity and reliability of an assessment tool are investigated in a study of good methodological quality, the results can be used in research or daily care. However, when the methodological quality of a study is inadequate, the results of the study cannot be trusted and the quality remains unclear [16]. Therefore, to assess the methodological quality of the included studies, The COSMIN 4-point scale checklist has been used [24]. This checklist is a tool for the assessment of the methodological quality of studies examining measurement properties and has shown good inter-rater agreement and user-friendliness [19]. The COSMIN checklist evaluates three main measurement properties: 1. Validity, 2. Reliability, and 3.Responsiveness (Fig. 1), which are further divided into nine measurement properties (Box A-I). A visualization of how these measurement properties are related is shown in Fig. 1. Within the COSMIN a separate score is assigned for the methodological quality of each of the nine measurement properties in a study. Depending on the measurement property that has been evaluated, multiple scores for the methodological quality can be assigned and the score can differ per measurement property. For example, the methodological quality investigating the content validity can be good, while at the same time, the reliability assessment was performed in a small sample size and therefore of poor methodological quality. Depending on the measurement property, the COSMIN checklist contains a minimum of 5 and a maximum of 18 questions to evaluate the methodological quality [24]. Scores per question were rated on a nominal scale (excellent, good, fair, poor). To determine the methodological quality per property ‘The worst score counts’ criterion is used, meaning that the lowest score on a question within one measurement property determines the methodological quality score. For the full assessments of all measurement properties, we refer to the original COSMIN guideline [24]. A definition of each measurement properties is given in Table 1 under the column ‘description’. Definitions are based on Terwee et al. (2007) and slightly modified in terminology to fit the content of our study.
Fig. 1

Items and boxes as used by the COSMIN checklist rated on a four-point scale: excellent, good, fair & poor

Table 1

Definitions of the measurement properties and their quality criteria

Measurement propertyDescription aQuality criteria for measurement properties b
ValidityContent validityTo which degree the construct assesses whether the items are relevant for the construct to be measured+: The target population considers all items in the instrument to be relevant AND to be complete
?: No target population involvement
-: The target population considers the items of the instrument irrelevant OR incomplete
Construct validityStructural validityTo which degree the scores of an instrument are an adequate reflection of the dimensionality+: Factors should explain at least 50% of the variance
?: Explained variance not mentioned
-: Factors explain < 50% of the variance
Hypothesizes testingTo which extent the scores of the instrument are consistent with the theoretically derived hypotheses+: Correlation with an instrument measuring the same construct ≥ 0.50 or at least 75% of the results are in accordance with the hypotheses AND correlation with related constructs is higher than with unrelated constructs
?: Solely correlations determined with unrelated constructs
-: Correlations with an instrument measuring the same construct <0.50 OR <75% of the results are in accordance with the hypotheses OR correlation with related constructs is lower than with unrelated constructs
Cross-cultural validityTo which extend the items are an adequate reflection of the original version after translation or culturally adaptation.+: no important DIF between language versions
?: DIF not assessed
-: Important DIF found between language versions
Criterion validityTo what degree the scores of the instrument are an adequate reflection of a ‘gold standard’. The gold standard should fit the purpose of the assessed instrument.+: Convincing arguments that gold standard is ‘’gold” AND correlations with gold standard ≥0.70
?: No convincing argument that gold standard is ‘’gold” OR doubtful design or method
-: Despite adequate design and method, correlation is < 0.70
ReliabilityReliabilityThe proportion of the total variance in the measurements which is because of ‘’true” differences among patients+: ICC/weighted kappa ≥ 0.70 OR Pearson’s r ≥ 0.80
?: Neither ICC/weighted kappa, nor Pearson’s r determined
-: ICC/weighted kappa <0.70 OR Pearson’s r < 0.80
Internal consistencyThe extent to which items in a sub(scale) are inter correlated, thus measuring the same construct+: Cronbach’s α (s) ≥ 0.70
?: Cronbach’s α not determined
-: Cronbach’s α < 0.70
Measurement errorThe systematic and random error of a patient’s score that is not attributed to true changes in the construct to be measured+:MIC <SDC OR MIC outside the LOA OR convincing arguments that agreement is acceptable
?: Doubtful design or method OR MIC not defined AND no convincing arguments that agreement is acceptable
-: MIC≥ SDC OR MIC equals or inside LOA, despite adequate design and method
ResponsivenessThe ability of the instrument to detect change over time+: Correlation with an instrument measuring the same construct ≥ 0.50 OR at least 75% of the results are in accordance with the hypotheses OR AUC ≥ 0.70 AND correlation with related constructs is higher than with unrelated constructs
?: Solely correlations determined with unrelated constructs
-: Correlation with an instrument measuring the same construct <0.50 OR <75% of the results are in accordance with the hypotheses or AUC <0.70 OR correlation with related constructs is lower than with unrelated constructs.

DIF Differential item functioning, MIC minimal important change, SDC Smallest detectable change, LOA Limits of agreement, ICC Intra Class Correlation

+= positive rating; ?= indeterminate rating; -= negative rating

aDescriptions of the measurement properties are based on Terwee et al (2007)

bTo fit the content of oral health assessments, we combined the quality criteria as used by Weldam et al. (2013) & Terwee (2007)

Items and boxes as used by the COSMIN checklist rated on a four-point scale: excellent, good, fair & poor Definitions of the measurement properties and their quality criteria DIF Differential item functioning, MIC minimal important change, SDC Smallest detectable change, LOA Limits of agreement, ICC Intra Class Correlation += positive rating; ?= indeterminate rating; -= negative rating aDescriptions of the measurement properties are based on Terwee et al (2007) bTo fit the content of oral health assessments, we combined the quality criteria as used by Weldam et al. (2013) & Terwee (2007) Two raters (BE & LWV) independently determined the overall methodological quality per property. A disagreement between the raters was resolved via a consensus meeting. A third reviewer (KJ) was consulted when an agreement was still not reached.

Quality criteria for the measurement properties on oral health assessments

When measurement properties were of excellent, good or fair methodological quality, an assessment of the quality of the measurement properties has been performed. Measurement properties of poor methodological quality were excluded for further quality assessment of this specific measurement property. The scores for quality of measurement property were: positive (+), negative (−) or indeterminate (?). See the column ‘Quality criteria for measurement properties’ in Table 1 for the definitions.

Results

Search results

The literature search generated a total of 879 references: 395 in PubMed, 393 in EMBASE.com and 91 in Cinahl. After removing duplicates, 557 references remained. Four hundred four studies were removed based on the screening of the title and the abstract. The flowchart of the search and selection process is presented in Fig. 2. After screening the full-text, 136 studies were removed based on the presented in-and exclusion criteria. One article which met the in-and exclusion criteria was added after reviewing the reference lists of included articles. Reasons for exclusion full-text articles are described in Fig. 2.
Fig. 2

Flowchart of in- and excluded studies

Flowchart of in- and excluded studies

Included studies

In total, 18 studies describing eight different oral health assessments were included for analysis: (1) The Revised Oral Assessment Guide (ROAG); (2) the Minimum Data Set (MDS), with oral health component; (3) the Oral Health Assessment Tool (OHAT); (4) The Holistic Reliable Oral Assessment Tool (THROAT); (5) Dental Hygiene Registration (DHR); (6) Mucosal Plaque Score (MPS); (7) the Brief Oral Health Screening Examination (BOHSE), and (8) the Oral Assessment Sheet (OAS). Table 2 gives an overview of the included studies and their investigated oral health assessments. Most non-dental healthcare professionals involved were nurses, sub-classified as Registered Nurse (RN), Licensed Vocational Nurse (LVN), Clinical Nurse (CN) or Licensed Practical Nurse (LPN). In the study of Simpelaere et al. (2016), speech pathologists were included [38]. The population on which the oral health assessment was used was heterogeneous and consisted of rehabilitation residents, nursing home residents, hospitalized older people, community-dwelling older people and older people with mental problems (Table 2).
Table 2

Data-extraction table for the included studies

AuthorsPublication yearStudy designInvestigated measurement propertyType of non-dental healthcare professional using assessmentPatient populationOral health assessmentRating scaleDuration of assessment
1Andersson et al. [18]2002Cross-sectional observationalInter-rater reliabilityRNolder people in rehabilitation wardROAG3 point scale on 8 itemsUnknown
2Andersson et al. [25]2002Cross-sectional observationalContent validityRNGeriatric rehabilitation patientsROAG3 point scale on 8 itemsUnknown
3Arvidson-Bufano et al. [26]1996Cross-sectional observationalInter-rater reliabilityRN and LPNNursing home residentsMDS-RAI (section M) and RAP summary2 Point scale on 7 items3–4 min
4Blank et al. [27]1996Cross-sectional observationalInter-rater reliabilityRN and LPNNursing home residentsMDS-RAI (section M) and RAP summary2 Point scale on 7 itemsUnknown
5Chalmers et al. [17]2005Prognostic follow-up

Content validity

Criterion validity

Intra-rater reliability

Inter-rater reliability

Test-retest reliability

PCA, RN, Enrolled Nurses and NAResidents from residential facilitiesOHAT3 point scale on 8 itemsMean: 7.8 min
6Cohen-Mansfield et al. [28]2002Cross-sectional observational studyInter-rater reliabilityGeriatriciansNursing home residents with DementiaMDS- mouth pain and inflamed gums8 items on 2 point scaleUnknown
7Dickinson et al. [19]2001Cross-sectional study

Content validity

Intra-rater reliability

Inter-rater reliability

Stroke specialist nurse, staff nurses, student nurseOlder medically Ill patientsTHROAT4 point scale on 9 itemsUnknown
8Fjeld et al. [29]2016Prognostic follow-up

Content validity

Criterion validity

Inter-rater reliability

Clinical nurseNursing home residentsDHR3 point scale on two itemsLess than 1 minute
9Hanne et al. [30]2012Cross-sectionalCross-cultural validityNursesAcute medical ward residents (mean age 76.5)ROAG3 point scale on 8 itemsUnknown
10Hawes et al. [31]1995Cross-sectionalInter-rater reliabilityLNNursing home residentsMDSUnclearUnknown
11Henriksen et al. [32]1999Cross-sectional

Intra-rater reliability

Inter-rater reliability

Medical Nurseolder people with mental disabilitiesMPS4 point scale on 2 items2–4 min
12Kayser-Jones et al. [33]1995Cross-sectional

Inter-rater reliability

Test-retest reliability

RN, LVN, CNANursing home residentsBOHSE3 point scale on 10 itemsMean time RNs, LVNS, CNAs: 7.4, 7.9 and 8.7 min
13Lin et al. [34]1999Cross-sectional

Criterion validity

Inter-rater reliability

LN and CNALTC residents with AlzheimerBOHSE3 point scale on 10 itemsUnknown
14Morris et al. [35]1997Cross-sectionalInter-rater reliabilityNursesCommunity-dwelling older people with home careMDS-HCUnclearUnknown
15Paulsson et al. [36]2008ProspectiveCriterion validityNursesPatients on medical ward (mean age 67)ROAG3 point scale on 8 itemsUnknown
16Riberio et al. [37]2014Cross-sectional

Cross-cultural validity

Criterion validity

Intra-rater reliability

CHWCommunity-dwelling older peopleROAG3 point scale on 8 items11 min
17Simpelaere et al. [38]2016Cross-sectional with two- week follow-up for test-retest

Intra-rater reliability

Inter-rater reliability

Test-retest reliability

Speech PathologistsAcute geriatric department/hospitalized, residential care settings (assisted living and nursing homes)OHAT3 point scale on 8 itemsMean time: 2.45 min
18Yanagisawa et al. [39]2017Cross-sectional

Internal consistency

Inter-rater reliability

CaregiversInstitutionalized older peopleOAS3 point scale on 9 itemsUnknown

Non-dental healthcare abbreviations: RN Registered Nurse, LVN Licensed Vocational Nurse, CN Clinical Nurse, LPN Licensed Practical Nurse, DDS Doctoral Dental Surgery, DNS Director of Nursing, CHW Community health workers, NA Nurse assistant, PCA Personal Care Attendants.

Oral health assessment abbreviations: ROAG The Revised Oral Assessment Guide, (2) MDS-RAI/RAP the Minimum Data Set-Resident Assessment Instrument/ Resident Assessment Protocol, OHAT with oral health component, (3) the Oral Health Assessment Tool, (4) THROAT The Holistic Reliable Oral Assessment Tool, (5) DHR Dental Hygiene Registration, (6) MPS Mucosal Plaque Score, (7) BOHSE the Brief Oral Health Screening Examination and the OAS Oral Assessment Sheet

Data-extraction table for the included studies Content validity Criterion validity Intra-rater reliability Inter-rater reliability Test-retest reliability Content validity Intra-rater reliability Inter-rater reliability Content validity Criterion validity Inter-rater reliability Intra-rater reliability Inter-rater reliability Inter-rater reliability Test-retest reliability Criterion validity Inter-rater reliability Cross-cultural validity Criterion validity Intra-rater reliability Intra-rater reliability Inter-rater reliability Test-retest reliability Internal consistency Inter-rater reliability Non-dental healthcare abbreviations: RN Registered Nurse, LVN Licensed Vocational Nurse, CN Clinical Nurse, LPN Licensed Practical Nurse, DDS Doctoral Dental Surgery, DNS Director of Nursing, CHW Community health workers, NA Nurse assistant, PCA Personal Care Attendants. Oral health assessment abbreviations: ROAG The Revised Oral Assessment Guide, (2) MDS-RAI/RAP the Minimum Data Set-Resident Assessment Instrument/ Resident Assessment Protocol, OHAT with oral health component, (3) the Oral Health Assessment Tool, (4) THROAT The Holistic Reliable Oral Assessment Tool, (5) DHR Dental Hygiene Registration, (6) MPS Mucosal Plaque Score, (7) BOHSE the Brief Oral Health Screening Examination and the OAS Oral Assessment Sheet

The methodological quality of the included studies per measurement property

None of the studies assessed all measurement properties included in the COSMIN checklist. Chalmers et al. (2005) investigated the most (N = 5) measurement properties of the OHAT (Table 2). In total, five studies showed good methodological quality on at least one measurement property and 14 studies showed poor methodological quality on some of their measurement properties. An overview of the reasons for poor methodological quality is shown in Table 3. Below, the results on the methodological quality per measurement property will be described. The following measurement properties were not investigated by any of the included studies: Measurement error (box C), Structural validity (box E), Hypothesis testing (box F) and Responsiveness (box I).
Table 3

Reasons for scoring poor methodological quality on the measurement property for assessing oral health per study

StudyAssessmentMeasurement propertyReason for poor methodological quality
Andersson et al. (2002b) [25]ROAGContent validity

- Target population not involved

- Not assessed if all items together comprehensively reflect the construct to be measured

Arvidson-Bufano et al. (1996) [26]MDS-RAIInter-rater reliability

- Small sample size

- Only percent agreement calculated

Blank et al. (1996) [27]MDS-RAIInter-rater reliability

- Unclear how many patients the dentist assessed

- Only percent agreement is calculated

- Other important methodological flaws in design or execution of study

Chalmers et al. (2005) [10]OHAT

Content validity

Criterion Validity

Test-retest

- Target population not involved

- Not assessed if all items together comprehensively reflect the construct to be measured

- Small sample size

- No ICC or correlation calculated

Cohen-Mansfield et al. (2002) [28]MDSInter-rater reliability

- Small sample size

- No ICC or correlations calculated

- Other important methodological flaws in design or execution of study

Dickinson et al. (2001) [19]THROATContent validity- Target population not involved
Fjeld et al. (2017) [29]DHRContent validity- Target population not involved
Hanne et al. (2012) [30]ROAGCross-cultural validity- Only forward translation
Hawes et al. (1995) [31]MDSInter-rater reliability- Only percent agreement is calculated
Henriksen et al. (1999) [32]MPS

Intra-rater reliability

Inter-rater reliability

- Small sample size
Kayser-Jones et al. (1995) [33]BOHSEContent validity- Target population not involved
Paulsson et al. (2008) [36]ROAGCriterion validity

- Other important methodological flaws in design or execution of study

- Correlations or AUC not calculated

- Sensitivity and specificity not calculated

Simpelaere et al. (2016) [38]OHATIntra-rater reliability

- Small sample size

- Only percent agreement is calculated

Yanagisawa et al. (2017) [39]OASCriterion-validity- No factor analysis performed and no reference to another study
Reasons for scoring poor methodological quality on the measurement property for assessing oral health per study - Target population not involved - Not assessed if all items together comprehensively reflect the construct to be measured - Small sample size - Only percent agreement calculated - Unclear how many patients the dentist assessed - Only percent agreement is calculated - Other important methodological flaws in design or execution of study Content validity Criterion Validity Test-retest - Target population not involved - Not assessed if all items together comprehensively reflect the construct to be measured - Small sample size - No ICC or correlation calculated - Small sample size - No ICC or correlations calculated - Other important methodological flaws in design or execution of study Intra-rater reliability Inter-rater reliability - Other important methodological flaws in design or execution of study - Correlations or AUC not calculated - Sensitivity and specificity not calculated - Small sample size - Only percent agreement is calculated

The methodological quality of the measurement property validity

Nine out of the 18 included studies investigated the domain validity of the oral health assessments (Table 4).
Table 4

Methodological quality of the measurement property “validity” by the COSMIN and quality criteria of the measurement properties per assessment

AssessmentStudyValidity
Content validityCross-cultural validityCriterion Validity
MQMQMQ
ROAGAndersson et al. (2002b) [25]PoorN.A.
Hanne et al. (2012) [30]PoorN.A.
Paulsson et al. (2008) [36]PoorN.A.
Ribeiro et al. (2014) [37]Fair?Gooda

?

(Sens: 0.17-0.80)

(Spec: 0.69-0.98)

OHATChalmers et al. (2005) [17]PoorN.A.PoorN.A.
THROATDickinson et al. (2001) [19]PoorN.A.
DHRFjeld et al. (2017) [29]PoorN.A.Fair

+

(r(s) = 0.78)

BOHSEKayser-Jones et al. (1995) [33]PoorN.A.
Lin et al. (1999) [34]Gooda

-

(r: 0.351-0.578)

M = Assessment of methodological quality: “excellent”, “good”, “fair”, “poor”’ by COSMIN. Q = criteria for measurement properties; + = positive rating;? = indeterminate rating; − = negative rating.

aFor criterion validity, a non-dental healthcare professional was the index-rater, a dentist was used as reference-rater

N.A. Not applicable was reported for the quality criteria when an article had poor methodological quality.

Methodological quality of the measurement property “validity” by the COSMIN and quality criteria of the measurement properties per assessment ? (Sens: 0.17-0.80) (Spec: 0.69-0.98) + (r(s) = 0.78) - (r: 0.351-0.578) M = Assessment of methodological quality: “excellent”, “good”, “fair”, “poor”’ by COSMIN. Q = criteria for measurement properties; + = positive rating;? = indeterminate rating; − = negative rating. aFor criterion validity, a non-dental healthcare professional was the index-rater, a dentist was used as reference-rater N.A. Not applicable was reported for the quality criteria when an article had poor methodological quality. Of those, all five studies that assessed content validity, scored poor on their methodological quality, mainly because the patient population was not involved in developing the oral health assessment and studies did not assess if the items comprehensively reflect the construct (i.e. “oral health”) to be measured [19, 25, 29, 33, 40] (see Table 3). Two studies assessed cross-cultural validity. The ROAG was translated in Portuguese by Riberio et al. (2014) using multiple forward translations and one backward translation [37]. Hanne et al. (2012) only conducted forward translation into Danish and scored therefore poor on the methodological quality [30] (Table 3). Criterion validity was assessed by five studies on the ROAG, OHAT, DHR, and BOHSE. Chalmers et al. (2005) and Paulsson et al. (2008) scored poor on their methodological quality on this property (Table 3). Riberio et al. (2014) assessed the ROAG on criterion validity with a dentist considered as “gold standard” (reference-rater) and had good methodological quality [37]. Fjeld et al. (2017), investigated the criterion validity on the DHR and Lin et al. (1999) on the BOHSE [29, 34]. They scored fair and good on the methodological quality on the measurement property respectively (Table 4). The studies investigating the MDS, MPS, and OAS were not assessed on any validity items [26–28, 31, 32, 35, 39].

The methodological quality of the measurement property reliability

For this study, the reliability was divided into intra-rater reliability, inter-rater reliability, and test-retest to assess the methodological quality. Internal consistency was only investigated by the study of Yanagisawa et al. (2017) but was of poor methodological quality [39] (Table 3).

Intra-rater reliability

The intra-rater reliability was investigated for the ROAG, OHAT, THROAT, MPS, and DHR. Good methodological quality of the intra-rater reliability assessment was performed for the ROAG and THROAT by Ribeiro et al. (2014) and Dickinson et al. (2001) respectively [19, 37] (Table 5). The studies of Chalmers et al. (2005) and Simpelaere et al. (2016) investigated the intra-rater reliability for the OHAT [17, 38]. Chalmers et al. (2005) only reported unweighted kappas and was therefore of fair methodological quality.
Table 5

Methodological quality of the measurement property “reliability” by the COSMIN and quality criteria of the measurement properties per assessment

AssessmentStudyReliability
Internal-consistencyIntra-rater reliabilityInter-rater reliabilityTest-retest reliabilityRaters
MQMQMQMQ
ROAGAndersson et al. (2002a) [18]Gooda

?/−

(κ/κw: 0.45-0.84)b

Nurse/Dental hygienist
Ribeiro et al. (2014) [37]Good

+/−

w: 0.38-0.88)

Community health workers
MDSArvidson-Bufano et al. (1996) [28]PooraN.A.Nurse/Dentist
Blank et al. (1996) [27]PooraN.A.Nurse/Dentist
Cohen-Mansfield (2002) [28]PooraN.A.Geriatricians/Dentist
Hawes et al. (1995) [31]PoorN.A.Nurses
MDS-HCMorris et al. (1997) [35]Good

+/−

w: 0.57-0.7)

Nurses
OHATChalmers et al. (2005) [17]Fair

+ (ICC = 0.78)

? (κ: 0.51-0.80)b

Fair

+ (ICC = 0.74)

? (κ: 0.48-0.80)b

PoorN.A.Nurses
Simpelaere et al. (2016) [38]PoorN.A.Fair

+ (ICC = 0.96)

? (κ: 0.83-1.00)

Fair

+ (ICC = 0.81 & 0.78)

? (κ: 0.14-0.91)

Speech pathologists
THROATDickinson et al.(2001) [19]Good

+/−

w: 0-0.96)

Gooda

+/−

w: 0.46-0.97)

Dental hygienist,/ stroke specialist nurse and staff Nurse
DHRFjeld et al. (2017) [29]Fair

+

(κ: 0.7-0.8)

Faira

?

(κ: 0.4-0.8)

Dental hygienist and Nurse
MPSHenriksen et al. (1999) [32]PoorN.A.PooraN.A.Dentist, 2 Dental Hygienist, and Nurse
BOHSEKayser-Jones et al. (1995) [33]Faira

-(r: 0.4-0.68)

? (κ: -0.02-0.82)b

Fair

+/−

(r: 0.79-0.88)

Dentist and Nurses
Lin et al. (1999) [34]Faira

?

(κ: -0.018-0.519)b

Dentist and Nurses
OASYanagisawa et al. (2017) [39]PoorN.AFair

? (κ: 0.25-0.90)

+/- (ICC: 0.54-0.98)

Dental professionals and care workers

M = Assessment of methodological quality: “excellent”, “good”, “fair”, “poor” by COSMIN. Q = criteria for measurement properties; + = positive rating;? = indeterminate rating; − = negative rating.

a Inter-rater reliability measurements have been performed by two different professions.

bOnly kappas are reported instead of percent agreement because this reflects better methodological quality according to the COSMIN criteria

N.A. Not applicable was reported for the quality criteria when an article had poor methodological quality.

Methodological quality of the measurement property “reliability” by the COSMIN and quality criteria of the measurement properties per assessment ?/− (κ/κw: 0.45-0.84)b +/− (κw: 0.38-0.88) +/− (κw: 0.57-0.7) + (ICC = 0.78) ? (κ: 0.51-0.80)b + (ICC = 0.74) ? (κ: 0.48-0.80)b + (ICC = 0.96) ? (κ: 0.83-1.00) + (ICC = 0.81 & 0.78) ? (κ: 0.14-0.91) +/− (κw: 0-0.96) +/− (κw: 0.46-0.97) + (κ: 0.7-0.8) ? (κ: 0.4-0.8) -(r: 0.4-0.68) ? (κ: -0.02-0.82)b +/− (r: 0.79-0.88) ? (κ: -0.018-0.519)b ? (κ: 0.25-0.90) +/- (ICC: 0.54-0.98) M = Assessment of methodological quality: “excellent”, “good”, “fair”, “poor” by COSMIN. Q = criteria for measurement properties; + = positive rating;? = indeterminate rating; − = negative rating. a Inter-rater reliability measurements have been performed by two different professions. bOnly kappas are reported instead of percent agreement because this reflects better methodological quality according to the COSMIN criteria N.A. Not applicable was reported for the quality criteria when an article had poor methodological quality. Simpelaere et al. (2016) and Henriksen et al. (1999) scored poor methodological quality for this property (Table 3). Fjeld et al. (2017) scored fair methodological quality on this measurement property.

Inter-rater reliability

Inter-rater reliability was assessed for all oral health assessments in 14 included studies. Inter-rater reliability was investigated between several professions: nurses, speech pathologists or a dental professional with a non-dental healthcare professional (Table 5). Only three studies scored good on the methodological quality: Andersson et al. (2002), testing the ROAG, Morris et al., testing the MDS-HC and Dickinson et al. (2001), testing the THROAT [18, 19, 35]. The MDS was assessed on inter-rater reliability by all five studies on MDS. However, the quality was rated poor for four of them because of the low quality of the statistical method and small sample size (Table 3) [26–28, 31]. Studies investigating the OHAT, DHR, BOHSE, and OAS scored fair on methodological quality on the inter-rater reliability mainly because they reported unweighted kappas for ordinal scores [17, 29, 33, 39]. The study of Henriksen et al. (1999), showed poor methodological quality (Table 3) [32].

Test-retest reliability

Simpelaere et al. (2016) and Chalmers et al. (2005) investigated the stability of the OHAT by a test-retest. Chalmers et al. (2005) did not report correlations over time and therefore scored poor on the methodological quality (Table 3). Kayser-Jones et al. (1995) (BOSHE) also looked at test-retest reliability. The methodological quality was fair because of the moderate sample size and reported unweighted kappas for the ordinal score.

Characteristics of individual oral health assessments and the quality assessment of their measurement properties

Overall, the oral health assessments include 18 items in the oral cavity. The most frequently assessed items are lips, mucosa membrane, tongue, gums, teeth, denture, saliva, and oral hygiene (Table 6). The assessments of each item can differ. For example the item “Lips”: some assessments assess it by color and moistness while others look at swelling and bleeding (Table 6).
Table 6

Items which are assessed by the different oral health assessments

ROAGaMDSbOHATb/cTHROATaDHRMPSBOHSEdOAS
1. Mucosa membraneXXXXXXX
 Color/RashXXXXXX
 MoistnessXXXX
 Swelling/glazing/granulations/HyperplasiaXXXXX
 BleedingXXXXX
 Ulcers / Spots (under dentures)XXXXXXX
2. GumsXXXXX
 ColorXXXX
 MoistnessXX
 Swelling/glazingXXXX
 BleedingXXXX
 FirmnessXX
 InflammationXX
 Ulceration/spotsXXX
 Loose teethX
3. TeethXXXX
 Decay/Cariës/Broken teethXXXX
 Number of teethXX
 Tooth erosion/wearX
4. DenturesXXXXX
 Broken partsXXX
 Does the individual wear the denturesXXX
 Fit of dentures/need for adhesiveXX
 Label on denturesX
 FunctionalityX
5. LipsXXXX
 ColorXXXX
 Surface structure/Candida infectionXXXX
 MoistnessXXXX
 UlcerationXXXX
 BleedingXXXX
 SwellingX
6. TongueXXXXX
 ColorXXXX
 Surface structureXXXX
 MoistnessXXXX
 Ulceration/coatingXXXXX
 SwellingXX
 BleedingX
7. SalivaXXXXX
 Measured as friction/adherence of mouth mirror at buccal mucosaX
 Amount/structure of salivaXXXX
 Involvement of tissuesXXX
 Experience of individualX
8. PalateXX
 ColorXX
 Surface structureXX
 MoistnessXX
 UlcerationXX
 SwellingX
 Inflammation/bleedingXX
9. Floor of mouthXX
 ColorXX
 Surface structureXX
 MoistnessXX
 Ulceration/coatingXX
 SwellingX
 Inflammation/bleedingXX
10. Oral hygine (debris and plaque)XXXXXXX
11. Referral to a dental professionalXX
12. SmellXXX
13. Pairs in chewing position (amount)XX
14. Pain (physical signs and verbal signs)X
15. Voice (deep, rasping or painful)X
16. Ability to swallow (pain/inability to swallow)X
17. Functionality (mouth opening, tong thrusting)X
18. Lymph nodes (enlargement and tenderness)X

a) The ROAG and THROAT assess the items “Teeth and Dentures”’, however, they actually look at plaque/debris and oral hygiene in this item. Therefore, we labeled these items as “Oral Hygiene”. b)The MDS and OHAT combine the items “Gums and Mucosa membrane” into one item. c) The OHAT does not have a separate item for smell. They included it in the item “Oral Hygiene”. d) The BOHSE combines the items “Mucosa Membrane”, “Floor of mouth” and “Palate” into one item.

Items which are assessed by the different oral health assessments a) The ROAG and THROAT assess the items “Teeth and Dentures”’, however, they actually look at plaque/debris and oral hygiene in this item. Therefore, we labeled these items as “Oral Hygiene”. b)The MDS and OHAT combine the items “Gums and Mucosa membrane” into one item. c) The OHAT does not have a separate item for smell. They included it in the item “Oral Hygiene”. d) The BOHSE combines the items “Mucosa Membrane”, “Floor of mouth” and “Palate” into one item. If applicable, below the validity, intra−/inter-rater reliability and test-retest of the oral health assessments will be evaluated in their context and the quality assessment of the measurement property will be reported. No studies with acceptable methodological quality of any of the measurement properties were found for the MPS, so this assessment will not be discussed.

ROAG

Andersson et al. (2002) conducted a study on the inter-rater reliability between a dental hygienist and a registered nurse [18]. The percent agreement was the lowest for teeth/dentures and tongue and the highest for swallowing and voice. Only weighted kappas (κw) were reported on items that scored a minimum and maximum on the ordinal scale. For the items “voice”’ and “gums” no maximum score (score 3) was registered and therefore unweighted kappas (K) were reported instead of weighted Kappas. The quality assessment of the measurement property scored therefor? /−. The Kappas ranged from 0.45–0.84 with a mean of 0.59 (Table 5). The lowest kappas were found for voice (κ), teeth/dentures (κw), tongue (κw), and saliva (κw) and the highest for swallowing (κw). Ribeiro et al. (2014) investigated the ROAG on validity and reliability in Portuguese [37]. Criterion validity was assessed with a dentist considered as “gold standard”(reference-rater). The measurement property was scored indeterminate (?) because sensitivity, specificity, and accuracy were reported. Sensitivity ranged from 0.17 for saliva to 1.0 for swallowing. Specificity ranged from 0.69 for teeth/dentures to 0.98 for saliva (Table 4). For intra-rater reliability for the community health workers (CHW’s), only weighted kappas were measured for the items with two or three levels of response: tongue, hygiene of teeth and dentures, and/or caries. They ranged from κw = 0.38 to κw = 0.88 and therefore scored +/− on the measurement property (Table 5). The lowest weighted kappa was found for teeth/dentures. Unweighted kappas were the lowest for saliva and the highest for voice, lips, and swallowing.

MDS

The MDS was investigated by five different studies, however as described before, four of them had poor methodological quality and will not be evaluated in-depth. Morris et al. (1997), using the MDS-HC (for community-dwelling older people) reported overall weighted kappas between nurses for the oral health component ranging from κw = 0.57 to κw = 0.60. For MDS 2.0 (nursing homes) this was κw = 0.70. Because of the spread between weighted kappas, a +/− was scored for the quality criteria (see Table 5) [35].

OHAT

Measurement properties of the OHAT were assessed by Chalmers et al. (2005) and Simpelaere et al. (2016). In the study of Chalmers et al. (2005), on individual item level, intra-rater reliability ranged from 74.4% agreement for oral cleanliness to 93.9% for dental pain and 96.6% for a referral to the dentist [17]. Unweighted kappas were moderate: 0.51–0.60 for lips, saliva, oral cleanliness and referral to the dentist. All other categories showed kappas ranging from 0.61–0.80, which indicates substantial agreement. The overall intraclass correlation coefficient on the total score was 0.78 and all results were statistically significant. The quality of measurement property was scored +/? because of its high Intra Class Correlation (ICC) and reported unweighted kappas (Table 5). For the inter-rater reliability between nurses, percent agreement ranged from 72.6% for oral cleanliness to 92.6% for dental pain and 96.8% for the referral to the dentist. Unweighted kappas varied from 0.48–0.60 for lips, tongue, gums, saliva, oral cleanliness and referral to the dentist. The other items scored between 0.61 and 0.80, indicating substantial agreement for inter-rater reliability. The correlation coefficient for the inter-rater agreement on the total score was 0.74. All statistics were statistically significant. The quality of measurement property was scored +/? because of its high ICC and unweighted kappas were reported (Table 5). Simpelaere et al. (2016) investigated the intra-, inter- and test-retest reliability in speech pathologists [38]. However, intra-rater reliability was of “poor” methodological quality as described earlier and will not be further described. The inter-rater reliability was tested between three speech pathologists on 132 individuals. The ICC on the total score was 0.96 (95% CI 0.95–0.97) and scored therefore positive (+) on the quality criteria (Table 5). The individual items varied with a Fleiss kappa from 0.83 to 1.00. No weighted kappa was calculated, therefore an indeterminate (?) rating was given. For the test-retest, a second assessment was performed on 46 individuals after two weeks. The ICC for the two raters on the total score was 0.81 (95% CI 0.68–0.89) and 0.78 (95% CI 0.64–0.87). Kappas varied between 0.14 for dental pain and 0.91 for dentures and teeth. Another slight agreement was found for gums and tissues. Because of the reported unweighted kappas, and indeterminate (?) rating was scored (Table 5).

Throat

For the intra-rater agreement investigated by Dickinson et al. (2001), the weighted kappas varied between κw = 0.69–0.96 for all items, except for the floor of the mouth and smell (κw) = 0. For the total score, intra-rater reliability was good κw = 0.95 (95% CI 0.88–1.02) [19]. Because of the large spread between kappas, the measurement property scored +/− on the quality criteria (Table 4). The Inter-rater assessment for the single items was performed between nurses and the dental hygienist reporting unweighted kappas of κ < 0.30 across the raters. Negative kappas were reported for teeth and smell. When raters were paired, the weighted kappas ranged from κw = 0.46-0.89, with the lowest values for teeth and dentures. Because of the spread between kappas a +/− was scored on the quality criteria. A positive (+) rating for the inter-rater reliability on the total score was reported because weighted kappas were κw = 0.96 (95% CI 0.90–1.02) between a stroke specialist nurse and student nurse and κw = 0.97 (95% CI 0.92–1.02) between stroke specialist nurses and dental hygienist.

DHR

Fjeld et al. (2017) developed and tested the DHR [29]. For criterion validity, a positive (+) rate was scored because correlations with their reported gold standards (Mucosal Plaque Index [32] and OHI-S [41]) was Rs = 0.78 and statistically significant (Table 4). For inter-rater reliability, the unweighted kappa between the dental hygienist and clinical nurse was κ = 0.4 (not statistically significant) and scored therefore indeterminate (?). Intra- and inter-rater reliability has also been evaluated on a series of videos. The inter-rater reliability was scored indeterminate (?) because the unweighted kappa for the dental hygienist was 0.7 and for the clinical nurse κ = 0.8 (Table 5).

BOHSE

Lin et al. (1999) investigated the criterion validity using a dentist as “gold standard”(reference-rater) [34]. For criterion validity +/− was scored because the correlation coefficients varied between 0.351 and 0.578 for the dentist and the nurses (nurse and clinical nurse assistant (CNA)). However, correlation coefficients were lower than 0.70 and therefore they scored negative (−) on the quality criteria (Table 4). Inter-rater reliability was also tested between the dentist and the nurses. An intermediate (?) score was given because only percent agreement and unweighted kappas were reported. The lowest percent agreements were found on the items lips, gums, natural teeth, and oral cleanliness: 60.7%, 37.5%, 60.7%, and 32.1% respectively. Kappas ranged from κ = 0.015 to κ = 0.519. The lowest kappas were reported for gums between the Doctor of Dental Surgery (DDS) and CNA and oral cleanliness between the DDS and the nurse. The highest kappa was reported for pairs of teeth in chewing position (Table 5). In addition, negative kappas were reported for: lymph nodes, lips, tongue and tissues/cheek and, the floor of the mouth. In the study of Kayser-Jones et al. (1995) the inter-rater reliability on the total score was rated negative (−) because correlations varied between 0.40 (RN and CAN) and 0.68 (between the DDS and LVN) and were all statistically significant [33]. For the individual items, percent agreement ranged from 50.5–98.0. With the lowest values for oral cleanliness and the highest for lymph nodes. The unweighted kappas ranged from κ = 0.09 for the item tissues and κ = 0.82 for pairs in chewing position. Negative kappas were reported for lymph nodes. The individual items of the BOHSE scored indeterminate (?) because unweighted kappas were reported (Table 5). The test-retest reliability was assessed on the total score by Kayser-Jones et al. (1995) for the DDS, RN, LVN, and CNA. The highest correlation was reported for the RN between time 1 and 2. The quality criteria scored +/− because statistically significant correlations varied between r = 0.79 and r = 0.88 between time 1 and 2 for different raters (Table 5).

OAS

Yanagisawa et al. (2017) investigated the inter-rater reliability between dental professionals and carers before and after training [39]. Between dental professionals, the Fleiss’ kappa ranged from 0.49 to 0.83 and the ICC mean was 0.93. Kappa values were low for tongue coat, bad breath, and mouth opening. The kappas between dental professionals and care workers ranged from 0.25–0.80 and were the highest for bad breath and tongue thrusting. After the training, the mean kappas increased to a mean of 0.72 and the ICC increased to 0.89, with the lowest values for the cleanliness of teeth and gums, bad breath and difficulty chewing. Indeterminate (?) score was reported because the unweighted kappas were reported and the ICC scored +/− because of the variance between the scores (Table 5).

Discussion

With this systematic review, we evaluated eighteen studies, investigating eight oral health assessments for use by non-dental healthcare professionals to assess older peoples’ oral health, on their content and measurement properties in order to give recommendations for practice, policy and research. Out of the eighteen included studies, only five of them scored good on the methodological quality of some of the measurement properties [18, 19, 34, 35, 37]. Overall, the OHAT has been most extensively investigated on its measurement properties with fair/good methodological quality and a positive(+)/indeterminate(?) quality assessment of the outcome. Similar results were found for the BOHSE (a prior version of OHAT) which was the most reliable and valid oral health assessment, according to the systematic review of Pearson and Chalmers in 2005 [10]. However, nurses concluded that the BOHSE was too long and complicated and therefore it has been simplified into the OHAT by Chalmers et al. (2005) [17, 33]. Three adaptations were made: 1. The category of lymph nodes and pairs of teeth in chewing position was eliminated; 2. The items tissue and gums were combined and 3. A category of behavioral problems and pain was added. The ROAG, MDS, OHAT, THROAT, BOHSE, and OAS contain most items to inspect the oral cavity, varying between 6 and 12 items. The results of this review show the least agreement between raters on the items: oral hygiene, lips, saliva, and natural teeth. An explanation could be that non-dental healthcare professionals lack experience in assessing these items. Results from a focus group discussion from Chalmers (2005) support these findings; nurses felt less capable of assessing gums and tissues and natural teeth. Surprisingly, the nurses felt less capable of assessing the domain ‘pain’, which also showed the lowest kappa in the study of Simpeleare et al. (2016) between three speech pathologists. Another remarkable result was the negative kappas in the study of Lin et al. (1999) for lymph nodes, lips, tongue, and tissues. In this study, they claim that a negative kappa for lymph nodes was found because the research population did not show enlarged lymph nodes during the study [34]. However, no explanation has been given for the other negative values. Literature states that a negative kappa can occur when the outcome is lower than expected or disagreement between two raters occurs [42]. However, more information on the context of the study is needed to give a reliable explanation. The study of Dickinson et al. (2001) reported negative kappas for the items teeth and smell. This study supports the explanation of too little variety between the scores [19]. Therefore they modified the THROAT by removing these items during further analysis. As far as we know, this is the first systematic review that critically appraised the methodological quality of studies investigating the measurement properties of oral health assessments for use by non-dental healthcare professionals. When the methodological quality of the studies is lacking, the validity and reliability of the outcomes remain unclear [16]. Therefore, first, the methodological quality of the measurement property per study has been assessed. For this purpose, we used the COSMIN checklist with a 4-point scale [24]. Although recent updates of COSMIN are published, we chose to use the former version instead of the update. The updated COSMIN is specially developed for Patient-Reported Outcome Measures (PROMs), with a conditional step for good content validity for further assessment of other measurement properties [43], while the version of 2012 that we used focusses in a more general context on measurement properties of measurement instruments/assessments and therefore is better suited to our objective. However, even the COSMIN version of 2012 lead to some discussion points in our study. Although developed for assessing measurement properties in a more general context, this version of COSMIN strongly emphasizes the involvement of the target population (patients) in developing a measurement instrument. As a result, content validity scored poor overall on the methodological quality in the included studies because none of the included studies involved patients in developing the oral health assessment [44]. Nevertheless, we doubt to what extent the input of patients should be highly rated in the development of an oral health assessment which is used by non-dental healthcare professionals. The input of experts and non-dental healthcare professionals, might, in this case, be more valuable. The included studies often consulted experts and non-dental healthcare professionals in the development of oral health assessments. Therefore, we think that the rating of poor methodological quality with the COSMIN on this item should be interpreted with reservations. Regarding terminology, we noticed that “validity” and “reliability” are not used consistently in the included studies. We sometimes found mixed terminology for intra-rater reliability and test-retest reliability: Intra-rater reliability was described in the study, while a time interval of the second assessment was stated. Thus, in this case, test-retest would have been more appropriate. In addition, comparisons between a dental professional and non-dental healthcare professionals were made in assessing the criterion validity in some studies, while other studies referred to this as inter-rater reliability. For inter-rater reliability, often a non-dental healthcare professional was compared to a dental care professional as the reference-rater. For criterion validity, the dental professional was referred to as the “gold standard”. The purpose of investigating the criterion validity is to compare the investigated instrument/assessments against a gold standard. However, no gold standard for oral health assessments exist. The OHAT and DHR were the only assessments in which the single items were assessed using several standardized criteria [17, 29]. However, these indices are not reported as gold standards. Since the aim of the oral health assessment is not to diagnose oral diseases but to screen and triage, we consider a dental professional as the expert in detecting oral problems and therefore we scored positive on the methodological quality of criterion validity when using a dental professional as “gold standard” (reference-rater). Finally, a remark on the “worst score counts” method should be discussed: some studies scored good or excellent on a majority of the items, except for one single item, which resulted in a “poor” overall score. For example, the study of Chalmers et al. (2005) scored poor on the validity items because of the small sample size, while all other items scored good/excellent. This makes the method very strict in its overall score and this should be taken into account when referred to as “poor” methodological quality items.

Recommendations for researchers, policymakers, and users

Based on our findings, we recommend more research on the measurement properties validity and reliability of the existing oral health assessments. This should be done in studies with good methodological quality as introduced by COSMIN. As a first step, there should be unanimity about the content of oral health assessments performed by non-dental healthcare professionals. Relevant stakeholders should determine which items assess a “healthy” versus “unhealthy” mouth. The FDI is working on a standardized set of oral health measures that could be used as background information and be adapted for this specific purpose (oral health assessment by non-dental healthcare professionals) [45]. In addition, when conducting research on the measurement properties, a proper distinction should be made between testing validity or reliability and the use of adequate statistical methods and analysis Furthermore, when investigating criterion validity, it is recommended to investigate the individual items of an oral health assessment using standardized criteria like the Mucosal Plaque Index and OHI-S, WHO oral lesions categories, Rise denture assessment and NIDR tooth status as conducted by Chalmers et al. (2005) and Fjeld et al. (2007) [17, 29]. Since research on validity and responsiveness requires “gold standards”, which are not available for all aspects of oral health, we recommend research on the standardization of oral health measures and the possibility to develop gold standards. Finally, when new oral health assessments for non-dental healthcare professionals are developed we recommend using the COSMIN guideline to minimize methodological flaws and develop highly reliable and valid oral health assessments [46]. Policymakers should take into account the level of education and proper training of the healthcare workers when implementing an oral health assessment. Training in using an oral health assessment might not be sufficient as there is a need for improvement of oral health knowledge of non-dental healthcare professionals in general [47]. Several studies concluded that non-dental healthcare professionals lack knowledge about oral health [1, 47–49]. A literature review concluded that educational programs delivered, regularly reinforced by a dental hygienist, and using several teaching formats were most effective in the improvement of oral health of patients [47]. Therefore, we recommend that a dentist or a dental hygienist is involved during the implementation of oral health assessments of older people for continues training and feedback to support non-dental healthcare professionals. For non-dental healthcare professionals, we recommend taking into account the objective of assessing the oral cavity when choosing an oral health assessment. When screening, triage or decision for a referral to a dental professional is the main objective, the OHAT (prior BOHSE) and ROAG could be suitable. However, also other oral health assessments could be relevant when: (1) it is part of a general geriatric assessment (MPS); (2) the oral health assessment is for a specific patient group (THROAT); (3) only oral hygiene will be evaluated (DHR); or (4) the objective of an assessment is to give an indication of the oral health situation and set up an oral health care plan of patients in a specific setting (ROAG, OAS).

Conclusion

In this systematic review, several oral health assessments have been evaluated on their measurement properties. Most studies suffer from methodological shortcomings (according to the COSMIN criteria). To increase the methodological quality of oral health assessments, and facilitate the investigation thereof in future research, standardization of oral health assessment is required. Taken into account the scarce evidence of the proposed oral health assessments, the OHAT and ROAG are most complete in their included oral health items (including triage and referral to a dental professional when needed) and their studies are of best methodological quality in combination with a positive quality assessment on validity and reliability. Moreover, the OHAT has been most comprehensively investigated on its measurement properties. When choosing an oral health assessment, non-dental healthcare professionals should take such evidence into account. However, when using these oral health assessments one must realize that to date its evidence base is rather limited. Policymakers should be aware of the methodological limitations of the existing assessments when implementing them in healthcare and provide sufficient education for its users. Additional file 1. Search strategies for databases. Search strategy per database
  39 in total

1.  Improving the oral health of older people in long-term residential care: a review of the literature.

Authors:  Karen Miegel; Tracey Wachtel
Journal:  Int J Older People Nurs       Date:  2009-02-05       Impact factor: 2.115

2.  Comprehensive clinical assessment in community setting: applicability of the MDS-HC.

Authors:  J N Morris; B E Fries; K Steel; N Ikegami; R Bernabei; G I Carpenter; R Gilgen; J P Hirdes; E Topinková
Journal:  J Am Geriatr Soc       Date:  1997-08       Impact factor: 5.562

3.  [The impact of frailty on the oral care behaviour and dental service use of elderly people].

Authors:  D Niesten; W J M van der Sanden; A E Gerritsen
Journal:  Ned Tijdschr Tandheelkd       Date:  2015-04

4.  The oral health assessment tool--validity and reliability.

Authors:  J M Chalmers; P L King; A J Spencer; F A C Wright; K D Carter
Journal:  Aust Dent J       Date:  2005-09       Impact factor: 2.291

5.  Validity and reproducibility of the revised oral assessment guide applied by community health workers.

Authors:  Marco Tulio F Ribeiro; Raquel C Ferreira; Andrea M D Vargas; Efigênia Ferreira e Ferreira
Journal:  Gerodontology       Date:  2013-01-07       Impact factor: 2.980

6.  Chewing ability of the long-term hospitalized elderly.

Authors:  Petteri Peltola; Miira M Vehkalahti
Journal:  Spec Care Dentist       Date:  2005 Sep-Oct

7.  Oral health assessment by nursing staff of Alzheimer's patients in a long-term-care facility.

Authors:  C Y Lin; D B Jones; K Godwin; R K Godwin; J A Knebl; L Niessen
Journal:  Spec Care Dentist       Date:  1999 Mar-Apr

8.  The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study.

Authors:  Lidwine B Mokkink; Caroline B Terwee; Donald L Patrick; Jordi Alonso; Paul W Stratford; Dirk L Knol; Lex M Bouter; Henrica C W de Vet
Journal:  Qual Life Res       Date:  2010-02-19       Impact factor: 4.147

9.  Rating the methodological quality in systematic reviews of studies on measurement properties: a scoring system for the COSMIN checklist.

Authors:  Caroline B Terwee; Lidwine B Mokkink; Dirk L Knol; Raymond W J G Ostelo; Lex M Bouter; Henrica C W de Vet
Journal:  Qual Life Res       Date:  2011-07-06       Impact factor: 4.147

10.  The life course, care pathways and elements of vulnerability. A picture of health needs in a vulnerable population.

Authors:  Iain A Pretty
Journal:  Gerodontology       Date:  2014-02       Impact factor: 2.980

View more
  6 in total

Review 1.  Validity and Reliability of questionnaires measuring attitudes to oral health: A review of the literature.

Authors:  Rocío Del Pilar Ríos-León; Jessica-Margot Salas-Huallparimache; María-Elena Díaz-Pizán; Daniel-José Blanco-Victorio
Journal:  J Clin Exp Dent       Date:  2022-09-01

2.  Psychometric evaluation of a short-form version of the Swedish "Attitudes to and Knowledge of Oral Health" questionnaire.

Authors:  Maria Snogren; Amir H Pakpour; Irene Eriksson; Malin Stensson; Kristina Ek; Maria Browall
Journal:  BMC Geriatr       Date:  2022-06-22       Impact factor: 4.070

3.  Oral disease burden of dentate older adults living in long-term care facilities: FINORAL study.

Authors:  Lina Julkunen; Kaija Hiltunen; Hannu Kautiainen; Riitta K T Saarela; Kaisu H Pitkälä; Päivi Mäntylä
Journal:  BMC Oral Health       Date:  2021-12-07       Impact factor: 2.757

4.  Assessment of oral health in older adults by non-dental professional caregivers-development and validation of a photograph-supported oral health-related section for the interRAI suite of instruments.

Authors:  Stefanie Krausch-Hofmann; Trung Dung Tran; Barbara Janssens; Dominique Declerck; Emmanuel Lesaffre; Johanna de Almeida Mello; Anja Declercq; Jan De Lepeleire; Joke Duyck
Journal:  Clin Oral Investig       Date:  2020-11-16       Impact factor: 3.573

5.  Oral Assessment and Preventive Actions within the Swedish Quality Register Senior Alert: Impact on Frail Older Adults' Oral Health in a Longitudinal Perspective.

Authors:  Lisa Bellander; Pia Andersson; Helle Wijk; Catharina Hägglin
Journal:  Int J Environ Res Public Health       Date:  2021-12-11       Impact factor: 3.390

6.  Opioids and older adults: Increasing trends in opioid usage in a dental population compared to a National Database (NHANES).

Authors:  Piedad Suarez-Durall; Maile S Osborne; Chan Chan; Reyes Enciso; Roseann Mulligan
Journal:  Spec Care Dentist       Date:  2022-03-13
  6 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.