
Reliability of the Sexual Knowledge Picture Instrument: a potential diagnostic instrument for sexual abuse in young children.

Kirsten van Ham¹, Shanti Bolt², Mariska van Doesterling², Sonja Brilleslijper-Kater³, Rian Teeuw¹, Rick van Rijn⁴, Hans van Goudoever¹, Hanneke van der Lee⁵.

Abstract

OBJECTIVE: To determine the intra-rater and inter-rater reliability of the Sexual Knowledge Picture Instrument (SKPI), a potential diagnostic instrument for young suspected victims of sexual abuse containing three scoring forms, that is, verbal responses, non-verbal reactions and red flags.
DESIGN: Video-recorded SKPI interviews with children with and without suspicion of child sexual abuse were observed and scored by two trained, independent raters. The second rater repeated the assessment at least 6 weeks after the initial rating to evaluate intra-rater reliability.
SUBJECTS: Seventy-eight children aged 3–9 years were included in the study: 39 with a known suspicion of sexual abuse and 39 without.
MAIN OUTCOME MEASURES: Intra-rater and inter-rater reliability of the scores per study group and in the total sample were assessed by Cohen's kappa and percentage of agreement (POA).
RESULTS: The median intra-rater Cohen's kappa exceeded 0.90 and the POA exceeded 95 for all three forms in both study groups, except for the red flag form (median Cohen's kappa 0.54 and POA 87 in the suspected group, and 0.84 and 92, respectively, in the total sample). For the verbal scoring form the median inter-rater Cohen's kappa and POA were 1.00 and 100, respectively, in both groups. For the non-verbal form the median inter-rater kappa and POA were 0.37 and 97, respectively, in the suspected group, and 0.47 and 100, respectively, in the control group. For the red flag form, they were 0.37 and 74, respectively, in the suspected group and 0.42 and 77, respectively, in the control group.
CONCLUSION: The reliability of the SKPI verbal form was sufficient, but there is room for improvement in the non-verbal and red flag scoring forms. These forms may be improved by adjusting the manual and improving rater training.
© Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.


Keywords: Child Abuse; Epidemiology; Forensic Medicine; Qualitative research


Year: 2022 PMID: 36053625 PMCID: PMC9280880 DOI: 10.1136/bmjpo-2022-001437

Source DB: PubMed Journal: BMJ Paediatr Open ISSN: 2399-9772

Despite its major consequences, sexual abuse in young children often remains unrecognised by medical and psychological professionals. The verbal scoring form of the Sexual Knowledge Picture Instrument has adequate intra-rater and inter-rater reliability. The reliability of the non-verbal and red flag scoring forms is suboptimal, requiring improvement of the manual and interviewer training for these forms. This study is part of the validation of an instrument that can be used in the diagnosis of sexual abuse in young children.

Introduction

Child sexual abuse (CSA) is a worldwide problem with potentially detrimental consequences for victims.1–4 Short-term and long-term health effects include depression, anxiety, post-traumatic stress disorder, eating disorders, substance abuse, and somatic syndromes such as sleeping disorders and heart and lung diseases.4–7 Early detection of signs of CSA by medical or psychological professionals is crucial to provide specialist support to victims and to protect possible future victims. However, reports by adult survivors of CSA, together with the gap between prevalence figures reported by authorities and those from self-report studies, indicate that CSA is rarely diagnosed in a timely manner.8–14 Professionals who see young children with a suspicion of CSA face several challenges. When a child presents for healthcare because of suspected CSA, the chance of finding physical evidence is very small.15 16 Owing to the nature of the abuse, there are usually no witnesses, although the abuse is sometimes recorded, either for personal use or to share on the dark web.17 Victims may struggle with feelings of dependency on, and loyalty to, the perpetrator, as well as with shame, guilt or fear of being blamed if they disclose the abuse. 
The limited verbal capacity of young children may further hamper their ability to express their experiences, thoughts and feelings.11 14 Past experience has shown that tools developed to facilitate disclosure, such as dolls and diagrams, can lead to false-positive results, even when used by professionals.18–20 This can have major consequences, especially if such findings are used in legal proceedings, as notorious cases of false allegations of CSA have shown.21–24 The current lack of scientific substantiation and the risk of improper tool use underline the importance of developing reliable, structured, evidence-based and uniform methods to support the diagnosis of CSA in clinical practice. A potential diagnostic instrument for medical and psychological professionals in cases of suspected CSA in young children (aged 3–9 years) is the Sexual Knowledge Picture Instrument (SKPI), based on previous work by Brilleslijper-Kater.25 This instrument consists of a child-friendly picture book with 15 illustrations about family routines, gender differences and identity, genitals and their functions, reproduction, intimate and sexual behaviour in adults, and normal physical intimacy in children. Following a semistructured interview technique described in a manual, a trained interviewer conducts an open conversation with the child about the topics in the pictures, which may help the child overcome the burden of disclosure. Afterwards, the video recording of each interview can be scored on three standardised scoring forms from the manual: one for the child’s verbal responses, one for non-verbal behavioural reactions and one for the overall impression and/or alarm signs (the so-called ‘red flags’). The SKPI pictures and manual are presented in online supplemental appendices 1 and 2. The aim of this study is to determine the intra-rater and inter-rater reliability of the SKPI. 
This is the first of two studies planned to validate the SKPI as a diagnostic instrument for CSA in children aged 3–9 years.26 If the diagnostic accuracy is proven to be adequate, this tool could be a valuable addition to current medical and psychological diagnostic work-up in young children with a suspicion of CSA.

Methods

Subject selection

In 2016, the Picture Instrument for Child Sexual Abuse Screening (PICAS) study started at Amsterdam University Medical Center, including children aged 3–9 years with and without suspicion of CSA. During the study, trained interviewers used the SKPI with children from two sources. The first was a group of suspected victims of CSA who had either been referred to the Department of Social Paediatrics in one of three participating Dutch university medical centres or been investigated by a vice squad of the Dutch national police. The second was a control group of children considered not to be victims of CSA. For more details on the study procedures, we refer to the protocol article.26 De Vet et al recommend a minimum sample size of 50 subjects for validation studies of measurement instruments.27 To reach this number, we included all 39 children with suspicion of CSA who had been interviewed with the latest version of the scoring forms, as well as a selected sample of 39 children from the control group with a similar age and gender distribution.

Data collection

Video-recorded interviews with the 78 children were scored three times: immediately by a first rater (one of the eight interviewers), a second time by a second rater (a forensic science master’s student) and a third time by the same second rater after an interval of at least 6 weeks, to minimise recall of the earlier scoring. All raters were either physicians or master’s students with a medical or forensic background. They were individually trained by a specialised child psychologist (SB-K) and/or the main researcher (KvH) in conducting the semistructured interviews and working with the manual. All raters were blind to participants’ medical and psychological background information, and only the first rater was aware of the study group to which each child belonged. The verbal scoring form contained all 52 interview questions from the manual; each rater scored the child’s answer by checking one of four (45 questions) or five (7 questions) answer options. The non-verbal scoring form contained a table listing 24 behavioural reactions, each of which could be checked as present while each of the 15 pictures was being discussed, giving 360 items (24 × 15). The red flag scoring form consisted of three overarching questions with binary answer options to assess the interviewer’s overall impression of the child’s verbal and non-verbal behaviour during the interview.
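The structure of the three scoring forms determines how many items are analysed per form. The sketch below is a hypothetical in-memory representation of one rater's sheets for one child; the names (`blank_sheets`, the item labels) are illustrative and are not taken from the SKPI manual, and the order of the four- and five-option verbal questions is arbitrary. It shows how 24 reactions observed across 15 pictures yield the 360 non-verbal items reported in tables 2 and 3.

```python
from itertools import product

# Form dimensions as described in the Methods.
N_VERBAL_QUESTIONS = 52
N_REACTIONS = 24          # behavioural reactions on the non-verbal form
N_PICTURES = 15           # illustrations in the SKPI picture book
N_RED_FLAG_QUESTIONS = 3  # binary overall-impression questions

# Answer options per verbal question: 45 questions with four options and
# 7 with five. The order here is arbitrary (an assumption, not the manual's).
VERBAL_OPTIONS = [4] * 45 + [5] * 7


def non_verbal_item_ids() -> list:
    """One present/absent item per (picture, reaction) combination."""
    return [
        f"picture{p:02d}_reaction{r:02d}"
        for p, r in product(range(1, N_PICTURES + 1), range(1, N_REACTIONS + 1))
    ]


def blank_sheets() -> dict:
    """Hypothetical empty scoring sheets for one child and one rater."""
    return {
        # Index of the chosen answer option, or None when missing/'other…'.
        "verbal": [None] * N_VERBAL_QUESTIONS,
        # Every reaction starts unchecked for every picture.
        "non_verbal": dict.fromkeys(non_verbal_item_ids(), False),
        # True/False once the rater has answered each red flag question.
        "red_flags": [None] * N_RED_FLAG_QUESTIONS,
    }


items = non_verbal_item_ids()
print(len(items))  # 360, the number of non-verbal items in tables 2 and 3
```

Representing each (picture, reaction) cell as its own binary item is what allows a separate kappa and POA per cell, which is why the table footnotes report kappa "for x out of 360 reactions".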

Statistical analysis

The SKPI’s intra-rater reliability was assessed by comparing the second rater’s two scorings at different time points. Inter-rater reliability was assessed by comparing, for each child, the scores of the first rater with the first scoring of the second rater. Data analysis was performed using the IBM SPSS software package (IBM SPSS Statistics for Windows, V.26.0). Descriptive statistics (percentages, median and IQR) were used to describe the demographic characteristics of the study population. On the verbal scoring form, a question with no answer option checked, with multiple answer options checked or with ‘other…’ checked was treated as a missing value. We calculated both Cohen’s kappa and percentage of agreement (POA) to assess intra-rater and inter-rater reliability. By definition, POA (expressed as a proportion) is at least as high as Cohen’s kappa, because kappa corrects for agreement expected by chance; for this reason, kappa is generally preferred over POA. However, in contrast to kappa, POA can always be calculated, even when some options have not been scored by one of the raters, as was the case for many items, in particular on the non-verbal scoring form.28 For the interpretation of Cohen’s kappa, Landis and Koch’s29 (arbitrary) grading system was applied to the median kappa per form: a Cohen’s kappa of <0 signifies poor agreement, 0.00–0.20 slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement and 0.81–1.00 almost perfect agreement. For the interpretation of POA, a median agreement between raters of ≥80% was considered acceptable.28 For each of the three scoring forms, Cohen’s kappa and POA were calculated for every item, and the median (IQR) per form was calculated in both study groups and in the total study sample.
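The per-item statistics described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the study: the function names are hypothetical, missing scorings are assumed to be coded as `None` and excluded pairwise, kappa is computed as (p_o − p_e)/(1 − p_e) with p_e taken from the two raters' marginal totals, and quartiles use Python's default `statistics.quantiles` method, which may not match SPSS's percentile definition. In this sketch kappa is reported as undefined only when chance agreement equals 1; statistical packages may impose stricter conditions.

```python
from collections import Counter
from statistics import median, quantiles
from typing import Hashable, Optional, Sequence

Scores = Sequence[Optional[Hashable]]  # one score per child; None = missing


def complete_pairs(r1: Scores, r2: Scores) -> list:
    """Pairwise deletion: keep children with a non-missing score from both scorings."""
    return [(a, b) for a, b in zip(r1, r2) if a is not None and b is not None]


def percentage_of_agreement(r1: Scores, r2: Scores) -> Optional[float]:
    """POA: percentage of children on whom both scorings agree."""
    pairs = complete_pairs(r1, r2)
    if not pairs:
        return None
    return 100 * sum(a == b for a, b in pairs) / len(pairs)


def cohens_kappa(r1: Scores, r2: Scores) -> Optional[float]:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e)."""
    pairs = complete_pairs(r1, r2)
    n = len(pairs)
    if n == 0:
        return None
    p_o = sum(a == b for a, b in pairs) / n          # observed agreement
    marg1 = Counter(a for a, _ in pairs)             # rater 1 category totals
    marg2 = Counter(b for _, b in pairs)             # rater 2 category totals
    p_e = sum(marg1[c] * marg2[c] for c in marg1) / n ** 2  # chance agreement
    if p_e == 1:
        # Both scorings used one and the same category for every child:
        # kappa is 0/0 and undefined, although POA is 100.
        return None
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Landis and Koch grading; band boundaries treated as upper-inclusive."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


def median_iqr(values) -> tuple:
    """Median and quartiles of per-item statistics, skipping undefined items."""
    vals = sorted(v for v in values if v is not None)
    q1, _, q3 = quantiles(vals, n=4)
    return median(vals), q1, q3


if __name__ == "__main__":
    # Invented scores for a single item; None marks a missing scoring.
    rater1 = ["yes", "yes", "no", "no", "yes", None]
    rater2 = ["yes", "no", "no", "no", "yes", "yes"]
    k = cohens_kappa(rater1, rater2)
    print(percentage_of_agreement(rater1, rater2), round(k, 2), landis_koch(k))
```

For the invented item, the sixth child is dropped as missing, four of the remaining five scorings agree (POA 80.0) and chance agreement is 0.48, giving a kappa of about 0.62, which the Landis and Koch scale grades as substantial. Applying `median_iqr` to the per-item kappas or POAs of one form gives the form-level summaries of the kind reported in tables 2 and 3.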

Patient and public involvement

During the course of PICAS we received input from several adult CSA survivors who lived with the burden of the abuse throughout their childhood. The aim was to carefully assess and evaluate each step of the study with them. We intend to disseminate the main results to all parents and caregivers of the included children, as well as to these CSA survivors, and will continue to seek their involvement in the development of the tool and of appropriate methods of dissemination.

Results

Baseline characteristics

The baseline characteristics of the study population are shown in table 1. The median age was 5 years (IQR: 4–7). Slightly more girls than boys were included in the total sample (55% vs 45%), particularly in the suspected group (61% vs 39%).

Table 1

Baseline characteristics of the study population

Variables	Suspected CSA group (n=39)	Control group (n=39)	Total sample (n=78)
Male, n (%)	15 (39)	20 (51)	35 (45)
Age (years), median (IQR)	5 (3-7)	5 (4-7)	5 (4-7)
Age groups, n (%)
3 years	10 (26)	7 (18)	17 (22)
4 years	8 (20)	7 (18)	15 (19)
5 years	5 (13)	7 (18)	12 (15)
6 years	6 (15)	6 (15)	12 (15)
7 years	1 (3)	6 (15)	7 (9)
8 years	9 (23)	6 (16)	15 (20)


Intra-rater and inter-rater reliability per group

Tables 2 and 3 present aggregated intra-rater and inter-rater reliability, respectively, on all items of the verbal, non-verbal and red flag scoring forms in the suspected CSA group, the control group and the total sample, represented by Cohen’s kappa and POA.

Verbal scoring form

Intra-rater and inter-rater agreement on the verbal scoring form are almost perfect in both the suspected and control groups (both median Cohen’s kappa 1.00, POA 100). For intra-rater and inter-rater agreement on each of the 52 questions on the verbal scoring form, divided per study group and for the total sample, we refer to online supplemental appendix 3.

Non-verbal scoring form

For the non-verbal form, the median intra-rater Cohen’s kappa and POA were 0.91 and 100, respectively, in the suspected group and 0.92 and 100, respectively, in the control group. The median inter-rater Cohen’s kappa and POA were 0.37 and 97, respectively, in the suspected group and 0.47 and 100, respectively, in the control group. Intra-rater and inter-rater agreement of the non-verbal scoring form for each possible reaction and each of the 15 pictures, by study group and for the total sample, are presented in online supplemental appendix 4.

Red flag scoring form

For the red flag form, the median intra-rater Cohen’s kappa and POA were 0.54 and 87, respectively, in the suspected group and 0.95 and 97, respectively, in the control group. The median inter-rater Cohen’s kappa and POA were 0.37 and 74, respectively, in the suspected group and 0.42 and 77, respectively, in the control group. Results per question, by study group and for the total sample, are presented in online supplemental appendix 5.

Discussion

The aim of this study was to evaluate the inter-rater and intra-rater reliability of the SKPI scoring method, which consists of a verbal, a non-verbal and a red flag scoring form, in a group of suspected CSA victims and a healthy control group. The intra-rater reliability of the verbal, non-verbal and red flag scoring forms is substantial to almost perfect, except for the red flag form in the suspected group, which is moderate. All median intra-rater POAs showed acceptable agreement for each of the three forms. The inter-rater reliability of the verbal scoring form is substantial to almost perfect, but the non-verbal and red flag forms show only fair to moderate reliability in both study groups. The median inter-rater POA is acceptable for the verbal and non-verbal forms, but falls below the 80% threshold for the red flag form. The cut-offs used to interpret Cohen’s kappa are arbitrary, as stated in Landis and Koch’s often-cited paper.29 Moreover, Cohen’s kappa depends on the distribution of the item scores: for the same observed agreement, more skewed distributions lead to lower kappa values, and many SKPI items have skewed distributions. Therefore, POA values may be preferable for determining SKPI reliability. The results per item (online supplemental appendices 4 and 5) show that agreement varies widely between individual items in both the non-verbal and the red flag scoring forms.30 Opportunities to improve the scoring method may therefore be found at the level of individual items. For now, simply removing the items that lacked reliability does not seem the best solution, as it may decrease the face validity of the instrument. However, once the diagnostic accuracy of the instrument has been established, this option is worth reconsidering. Another way to improve the reliability of non-verbal and red flag scoring may be to intensify rater training and to improve the manual instructions, in particular for the less reliable scoring items. 
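To illustrate why skewed item distributions depress kappa while leaving POA untouched, the sketch below compares two invented non-verbal items scored by two raters for 39 children (the size of one study group). Both items have exactly three disagreements and therefore the same POA; only the prevalence of the reaction differs. The data are hypothetical and are not SKPI results, and the simplified functions assume no missing scores and chance agreement below 1.

```python
from collections import Counter


def poa(r1, r2):
    """Percentage of agreement between two raters (no missing values assumed)."""
    return 100 * sum(a == b for a, b in zip(r1, r2)) / len(r1)


def kappa(r1, r2):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e), p_e from the raters' marginals."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1) / n ** 2
    return (p_o - p_e) / (1 - p_e)


N = 39  # children in one study group

# Hypothetical rare reaction: rater A checks it for children 0 and 1,
# rater B only for child 2, so all three checks are disagreements.
rare_a = [i in (0, 1) for i in range(N)]
rare_b = [i == 2 for i in range(N)]

# Hypothetical common reaction: also three disagreements (children 18-20),
# but the reaction is checked for about half of the children.
common_a = [i < 20 for i in range(N)]
common_b = [i < 18 or i == 20 for i in range(N)]

for label, a, b in [("rare", rare_a, rare_b), ("common", common_a, common_b)]:
    print(f"{label}: POA = {poa(a, b):.1f}%, kappa = {kappa(a, b):.2f}")
```

Running it prints a POA of 92.3% for both items, but a kappa of -0.04 for the rare reaction and 0.85 for the common one: with a rare reaction, chance agreement on "absent" is already about 93%, leaving kappa almost no room. This mirrors the combination of high POA and low kappa seen on the non-verbal form.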
On the verbal scoring form, raters were instructed to tick the box ‘other…’ if they were in doubt or, most often, if the interviewer had been unable to ask the question during the interview despite the manual instructions. This led to a considerable amount of missing data in the analysis, as can be seen in online supplemental appendix 3. Although the reliability in the suspected CSA group is slightly lower than in the control group for most verbal and non-verbal items, the intra-rater and inter-rater agreement for both forms are generally adequate. On the red flag form, however, the intra-rater reliability is markedly lower in the suspected group than in the control group. This may be because all scoring for this intra-rater analysis was performed by a single rater who had been trained only once, before she first rated the video recordings. To improve both intra-rater and inter-rater agreement, refresher courses and group training on how to work with the manual should be considered for all raters, in addition to the single individual training, to ensure consistent use of the manual and scoring of the forms. In the current training, an example interview with a child from the control group is shown, and a single practice interview is conducted with a non-abused child. More extensive experience with the SKPI, including a practice interview with a child from the suspected group, should therefore also be included in training to improve interviewer and rater skills.

Strengths and limitations

A strength of the present study is its large sample of young children with suspected CSA. The study population covered a broad spectrum of children: confirmed cases of CSA and children with high, moderate or low CSA suspicion in the suspected CSA group, and children with no suspicion in the control group. The study groups were analysed separately to evaluate the SKPI reliability in a group that is largely representative of the target population (the suspected CSA group). Another strength of this study is the blinding of the raters: only the first rater, who was also the interviewer, had some knowledge of the child’s background and whether or not CSA was suspected. A study design with one partially blinded rater and one fully blinded rater, as will be the case when the instrument is used in practice, enhances the validity of the results. A limitation is that a single, relatively inexperienced second rater performed the repeated assessments, which limits the generalisability of the intra-rater reliability results. A further limitation is that all interviewers and raters were female. This was not by design. Despite the use of a structured interview technique, children might have responded differently in interviews conducted by male interviewers.31

Recommendations for practice

When applied by experienced and trained professionals, the SKPI can be used to lower the threshold for starting a conversation with a young child on sexually related topics. However, it is very important that the video recordings of the interviews are analysed afterwards and, if necessary, that remarkable verbal and non-verbal reactions are discussed with another (independent) professional. Balancing the preservation of privacy against the need for objective assessment remains a challenge. In line with the European General Data Protection Regulation, each medical or psychological institution must develop and adhere to clear protocols on the storage and/or sharing of data.32

Recommendations for research

The diagnostic accuracy of the SKPI will be investigated as a next step in our validation study. In addition, we recommend improving the manual and interviewer training.

Conclusion

The verbal scoring form of the SKPI has adequate intra-rater and inter-rater reliability. The reliability of the non-verbal and red flag scoring forms is suboptimal, requiring improvement of the manual and interviewer training for these forms. In its current form, the instrument can be used to open a conversation with a child suspected of being sexually abused. Due to its clear structure, the SKPI is a relevant additional tool for use in the medical, psychological and forensic field.

Table 2

Intra-rater reliability

Outcome measure	Suspected CSA group	Control group	Total sample
Verbal scoring form (52 items)
Cohen’s kappa, median (IQR)	1.00 (1.00-1.00)*	1.00 (1.00-1.00)†	1.00 (0.96-1.00)
POA, median (IQR)	100 (100-100)	100 (98-100)	100 (98-100)
Non-verbal scoring form (360 items)
Cohen’s kappa, median (IQR)	0.91 (0.79-1.00)‡	0.92 (0.84-1.00)§	0.90 (0.79-1.00)¶
POA, median (IQR)	100 (97-100)	100 (100-100)	100 (99-100)
Red flag scoring form (3 items)
Cohen’s kappa, median (min-max)	0.54 (0.52-0.55)	0.95 (0.89-1.00)	0.84 (0.64-0.86)
POA, median (min-max)	87 (77-92)	97 (95-100)	92 (89-94)

*kappa could be calculated for 49 out of 52 questions.

†kappa could be calculated for 44 out of 52 questions.

‡kappa could be calculated for 204 out of 360 reactions.

§kappa could be calculated for 148 out of 360 reactions.

¶kappa could be calculated for 233 out of 360 reactions.

IQR, interquartile range; min-max, lowest and highest value; POA, percentage of agreement.

Table 3

Inter-rater reliability

Outcome measure	Suspected CSA group	Control group	Total sample
Verbal scoring form (52 items)
Cohen’s kappa, median (IQR)	1.00 (0.69-1.00)*	1.00 (0.76-1.00)†	0.91 (0.66-1.00)‡
POA, median (IQR)	100 (94-100)	100 (94-100)	98 (95-100)
Non-verbal scoring form (360 items)
Cohen’s kappa, median (IQR)	0.37 (-0.03 to 0.55)§	0.47 (0.22-0.79)¶	0.36 (-0.01 to 0.53)**
POA, median (IQR)	97 (92-100)	100 (97-100)	97 (94-100)
Red flag scoring form (3 items)
Cohen’s kappa, median (min-max)	0.42 (0.27-0.47)	(0.38-0.52)††	0.51 (0.45-0.61)
POA, median (min-max)	74 (73-87)	77 (72-97)	82 (73-83)

*kappa could be calculated for 45 out of 52 questions.

†kappa could be calculated for 41 out of 52 questions.

‡kappa could be calculated for 48 out of 52 questions.

§kappa could be calculated for 183 out of 360 reactions.

¶kappa could be calculated for 87 out of 360 reactions.

**kappa could be calculated for 206 out of 360 reactions.

††Kappa could be calculated for 2 out of 3 questions; therefore, only minimum and maximum values given.

IQR, interquartile range; min-max, lowest and highest value; POA, percentage of agreement.

Selected references

1. A global perspective on child sexual abuse: meta-analysis of prevalence around the world.

Authors: Marije Stoltenborgh; Marinus H van Ijzendoorn; Eveline M Euser; Marian J Bakermans-Kranenburg
Journal: Child Maltreat Date: 2011-04-21

2. Psychological trauma and functional somatic syndromes: a systematic review and meta-analysis.

Authors: Niloofar Afari; Sandra M Ahumada; Lisa Johnson Wright; Sheeva Mostoufi; Golnaz Golnari; Veronica Reis; Jessica Gundy Cuneo
Journal: Psychosom Med Date: 2013-12-12

3. "They were the ones that saw me and listened." From child sexual abuse to disclosure: Adults' recalls of the process towards final disclosure.

Authors: Maria Larsen Brattfjell; Anna Margrete Flåm
Journal: Child Abuse Negl Date: 2019-01-11

4. Detecting children's true and false denials of wrongdoing: Effects of question type and base rate knowledge.

Authors: Kirsten Domagalski; Jennifer Gongola; Thomas D Lyon; Steven E Clark; Jodi A Quas
Journal: Behav Sci Law Date: 2020-11-25

5. Forensic psychiatry in France: the Outreau case and false allegations of child sexual abuse.

Authors: Paul Bensussan
Journal: Child Adolesc Psychiatr Clin N Am Date: 2011-07

6. Why do child sexual abuse victims not tell anyone about their abuse? An exploration of factors that prevent and promote disclosure.

Authors: Georgia M Winters; Niki Colombino; Sarah Schaaf; Anniken L W Laake; Elizabeth L Jeglic; Cynthia Calkins
Journal: Behav Sci Law Date: 2020-11-29

7. Wrongful Acquittals of Sexual Abuse.

Authors: Thomas D Lyon; Stacia N Stolzenberg; Kelly McWilliams
Journal: J Interpers Violence Date: 2017-03