Literature DB >> 36206230

Mixed-methods analysis of select issues reported in the 2016 World Health Organization verbal autopsy questionnaire.

Erin Nichols1, Kristen Pettrone1,2, Brent Vickers3, Hermon Gebrehiwet4, Clarissa Surek-Clark5, Jordana Leitao6, Agbessi Amouzou7, Dianna M Blau8, Debbie Bradshaw9, El Marnissi Abdelilah10, Pamela Groenewald9, Brian Munkombwe1, Chomba Mwango11, F Sam Notzon12, Steve Biko Odhiambo13, Paul Scanlon3.   

Abstract

BACKGROUND: Use of a standardized verbal autopsy (VA) questionnaire, such as the World Health Organization (WHO) instrument, can improve the consistency and reliability of the data it collects. Systematically revising a questionnaire, however, requires evidence about the performance of its questions. The purpose of this investigation was to use a mixed methods approach to evaluate the performance of questions related to 14 previously reported issues in the 2016 version of the WHO questionnaire, where there were concerns of potential confusion, redundancy, or inability of the respondent to answer the question. The results from this mixed methods analysis are discussed across common themes that may have contributed to the underperformance of questions and have been compiled to inform decisions around the revision of the current VA instrument.
METHODS: Quantitative analysis of 19,150 VAs for neonates, children, and adults from five project teams implementing VAs predominantly in Sub-Saharan Africa included frequency distributions and cross-tabulations to evaluate response patterns among related questions. The association of respondent characteristics and response patterns was evaluated using prevalence ratios. Qualitative analysis included results from cognitive interviewing, an approach that provides a detailed understanding of the meanings and processes that respondents use to answer interview questions. Cognitive interviews were conducted among 149 participants in Morocco and Zambia. Findings from the qualitative and quantitative analyses were triangulated to identify common themes.
RESULTS: Four broad themes contributing to the underperformance or redundancy within the instrument were identified: question sequence, overlap within the question series, questions outside the frame of reference of the respondent, and questions needing clarification. The series of questions associated with one of the 14 identified issues (the series of questions on injuries) related to question sequence; seven (tobacco use, sores, breast swelling, abdominal problem, vomiting, vaccination, and baby size) demonstrated similar response patterns among questions within each series capturing overlapping information. Respondent characteristics, including relationship to the deceased and whether or not the respondent lived with the deceased, were associated with differing frequencies of non-substantive responses in three question series (female health related issues, tobacco use, and baby size). An inconsistent understanding of related constructs was observed between questions related to sores/ulcers, birth weight/baby size, and diagnosis of dementia/presence of mental confusion. An incorrect association of the intended construct with that which was interpreted by the respondent was observed in the medical diagnosis question series.
CONCLUSIONS: In this mixed methods analysis, we identified series of questions which could be shortened through elimination of redundancy, series of questions requiring clarification due to unclear constructs, and the impact of respondent characteristics on the quality of responses. These changes can lead to a better understanding of the question constructs by the respondents, increase the acceptance of the tool, and improve the overall accuracy of the VA instrument.

Entities:  

Mesh:

Year:  2022        PMID: 36206230      PMCID: PMC9543875          DOI: 10.1371/journal.pone.0274304

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.752


Introduction

Verbal autopsy (VA) is a method for estimating population-level cause of death information for mortality surveillance purposes, in the absence of physician certification of cause of death or full autopsy, as is the case in most low-to-middle-income countries [1, 2]. VA involves a structured interview conducted by a trained interviewer, in which family members or caregivers familiar with the deceased provide information about the signs, symptoms, medical history and circumstances experienced by the deceased at and around the time of death. From this information, a cause of death is determined. Cause of death determination from VA can be done using physician review, an expert-derived algorithm, or a computer-coded algorithm. Over the last 15 years, there has been a push to develop a standardized VA tool in an effort to improve consistency and comparability between countries and to address concerns about the validity of instruments and the comparability of data [3]. In 2007, the World Health Organization (WHO) introduced the first international technical standards and guidelines for VA [4]. The instrument was updated in 2012, 2014 and 2016; the current version is the 2016 WHO VA instrument, which is now used by more than 20 countries and includes questions addressing COVID-19 [5, 6]. When updating or revising a questionnaire, such as a VA tool, the overall objective is to create a shorter and more practical instrument while increasing the value and reliability of its individual questions [7-10]. This can increase the acceptability of VA by respondents and communities, decrease question non-response, and improve the validity and utility of the VA process [11]. Systematically revising a questionnaire, however, requires evidence about the performance of its questions. One robust approach to uncovering this evidence is to use a mixed-methods question evaluation approach. 
Mixed methods research is a process which integrates information from both qualitative and quantitative sources, leveraging the strengths of each epistemology, to produce a better understanding of findings by compensating for each method’s limitations [12, 13]. Grounded within logical empiricism, quantitative studies examine associations between key factors and outcomes in accordance with observed empirical laws but are subject to bias that can be controlled to some extent by research design. Qualitative studies, grounded within constructivism, use an interpretive process to contextualize findings. Used together, qualitative findings can help explain associations observed in quantitative findings [13]. Greene et al. have described several uses of a mixed methods approach including enhancement or clarification of results from one method with another method, the expansion of an investigation using different methods, and a discovery of new perspectives or contradictions in one method using the results of another [14]. Benitez-Baena and Padilla describe a series of approaches that can be used to mix qualitative and quantitative methods in the evaluation of survey instruments, including the use of experimental designs, within-survey probes, and multi-mode cognitive interviewing, an approach that provides a detailed understanding of the meanings and processes that respondents use to answer interview questions [15]. Others have successfully combined cognitive interviewing with psychometric approaches like Item Response Theory (IRT) to reduce and validate scales or cognitive interviews with probes allowing for greater extrapolation of cognitive findings [16-18]. The purpose of this investigation was to use a mixed methods approach triangulating cognitive interviewing and questionnaire performance findings to evaluate the performance of select questions in the 2016 version of the WHO questionnaire, as part of a broader questionnaire revision effort [1]. 
Feedback from field teams who use the WHO standard VA questionnaire was collected via a public-facing GitHub platform commencing in 2017 [19]. The platform was established for VA users to report and track issues identified during use of the standard questionnaire; the types of issues reported via this platform vary from minor issues in the electronic programming of the questionnaire or suggestions to add hints for clarity to more significant issues including incorrect skip patterns or confusion about questions that are likely to impact the quality of responses. A review of feedback in 2019 identified 14 problematic issues for which solutions could be well-informed through a mixed methods approach (Table 1). These issues relate to how questions impact the interview process (for example, causing repetition or unnecessarily lengthening the interview) and the quality of responses (for example, where there is a lack of clarity on what a question is asking or the question is not answerable by the respondent). VA results from multiple countries provided the quantitative data for this evaluation, while qualitative findings were derived from cognitive interviews of VA respondents in two countries. The results from this mixed methods analysis have identified common themes that may have contributed to the underperformance of questions and are being used to inform decisions around the revision of the VA instrument. More broadly, the application of this methodology can be used in the evaluation of other cross-cultural survey instruments.
Table 1

Series of questions identified for review from feedback from field teams and description of associated issue.

Series: Description of Issue

1. Tobacco Use: Can the series be shortened? Can the respondent provide meaningful responses when asked about the number and frequency of tobacco products used?
2. Swallowing: What is the consistency of responses to the questions of “pain” and “difficulty”? Are respondents able to differentiate the constructs of “pain” and “difficulty”? Can or should one question be eliminated?
3. Sores and Ulcers: This question series asks multiple questions about similar though not identical constructs. Are the constructs clearly understood? Can the question series be shortened?
4. Swelling, Lump, Ulcers, Pits in the Breast: There is potential confusion between the constructs of swelling or lump in the breast and ulcers (pits) in the breast. Are participants able to answer these questions? Are both questions needed? What are the response patterns by respondent characteristics? Are the response patterns different in those with greater familiarity with the deceased?
5. Other Female Health-Related Questions: Are the response patterns to these gendered questions different in those with greater familiarity with the deceased?
6. Medical Diagnosis Questions: Measurement or response error is more likely with questions on diagnosis than with questions on symptoms. Are there response patterns to the symptom questions corresponding to the medical diagnosis that provide evidence of response error? What is the respondents' understanding and interpretation of the medical diagnosis questions?
7. Vaccinations: The question “Select EPI Vaccines Done” requires the interviewer to know the complete vaccine schedule for their country and to assess the vaccination card for completion. With this complexity, there is much room for error. Also, documentation of vaccine status is required for a response to one question in the series; a concern has been reported that for many respondents this documentation may not be available, because it was thrown away, buried with the child, or otherwise lost. How has this question performed? Can it be simplified?
8. Injury Questions: Is the full VA required for those deceased who clearly died of an injury?
9. Urine: What is the consistency between a “Yes” response to the first-order question and a “Yes” response to the more detailed, second-order questions? Inconsistencies would flag potential false positives; the respondent may not know what constitutes a urine problem (e.g., blood in the urine). For important questions, is it better to ask the specific construct of interest directly and not screen out based on the response to the first-order question?
10. Abdominal Problem: There is potential for redundancy and/or inconsistency across this series of questions. Can we shorten this series in any way?
11. Lumps: Are the response patterns different in those with greater familiarity with the deceased?
12. Vomiting: The questions “Did (s)he vomit?” and “To clarify: Did (s)he vomit in the week preceding the death?” are both asked of all respondents. Can one question be eliminated? The question “How long before death did (s)he vomit?” requires clarification. Does it refer to the duration or timing of the vomiting?
13. Violence: There is a concern of under-reporting of suicide for children. What is the consistency in responses to violence and self-inflicted injury for children?
14. Baby Size: What is the consistency in responses to the series of questions about size and weight? What is the frequency and plausibility of responses to the reported birth weight?

Methods

Quantitative data collection

Data from VA questionnaires were compiled into two datasets for this analysis—the primary dataset, which included VA interview data from five project teams, and the reference dataset, which included the VA interview data in the primary dataset as well as information on the deceased’s cause of death, as determined by physician-certified VA, which was available from one project team. The primary dataset consisted of de-identified, aggregated VA results using the 2016 WHO VA questionnaire. Project teams that had completed at least 1,000 VAs conducted by field interviewers (typically, though not always, a community health employee) using the 2016 WHO VA questionnaire and who agreed to contribute their data for the exercise submitted data to the analysis team. The following countries and sources contributed to the dataset: Zambia (VAs conducted by the Department of National Registration, Passports, and Citizenship for community deaths “brought in dead” to two mortuaries in Lusaka, Zambia); South Africa (as described below); Kenya (VAs from the Kenya Medical Research Institute/U.S. Centers for Disease Control and Prevention (CDC) Health and Demographic Surveillance Site in Western Kenya); CHAMPS (Child Health and Mortality Prevention Surveillance); and COMSA (Countrywide Mortality Surveillance for Action)-Mozambique. The CHAMPS network focuses on mortality surveillance in children under 5 years of age in sub-Saharan Africa and South Asia [20]. COMSA-Mozambique is a surveillance program in Mozambique that produces and makes publicly available continuous annual data on mortality and cause of death at national and subnational levels within the country [21]. The reference dataset included the VA interview data as in the primary dataset in addition to cause of death information determined by physician review of the VA interview (PCVA or physician-certified VA) from the South Africa National Cause of Death Validation Study [22]. 
The VA interview was administered in the same way for data in the reference dataset as for the primary dataset. The reference dataset contributed to the analysis of three issues: the medical diagnosis questions, injury, and violence. De-identified data were transferred by country teams and stored via a secure share-file system with access restricted to data contributors and analysts. The primary dataset for the quantitative analysis included 19,150 verbal autopsies: 13,736 adults (persons aged 12 years and above), 2,916 children (aged 4 weeks to 11 years) and 2,498 neonates (under 4 weeks old); of these, 10,280 were males and 8,870 were females. The reference dataset, which included cause of death information, contained 5,389 verbal autopsies: 102 neonates, 187 children and 5,100 adults; 2,579 were female and 2,810 were male. Cause of death in the reference dataset was determined using physician-certified VA [22].

Qualitative data collection

Cognitive interviewing is a qualitative method whose purpose is to evaluate survey questionnaires and determine which constructs the questionnaires’ items capture. The primary benefit of cognitive interviewing over non-qualitative evaluation methods is that it provides rich, contextual data into how respondents interpret questions, apply their lived experiences to their responses, and formulate responses to survey items based on those interpretations and experiences [23]. Thus, cognitive interviewing data allows researchers and survey designers to understand whether or not a question is capturing the specific social constructs they originally wanted and gives insight into what design changes are needed to advance the survey’s overall goal. Cognitive interviews were conducted in Zambia and Morocco in 2019. These sites were selected based on their readiness, given that the VA process had been well established and the field teams were interested in evaluating the performance of the interview process using cognitive interviewing. Staff from the U.S. CDC’s National Center for Health Statistics (NCHS) Collaborating Center for Questionnaire Design and Evaluation Research (CCQDER) trained interviewers selected from the local communities on cognitive interviewing during a week-long, on-site training. Following the training in English, the interviewers recruited cognitive interviewing participants among the VA respondents, such that the selected VA respondents were also the cognitive interviewing participants; no data in the VA interview were changed as a result of cognitive interviewing discussions. In Zambia, recruitment occurred at one of three hospitals in the Lusaka area, whereas in Morocco respondents were recruited from Ministry of Interior offices in the Rabat area where relatives came to certify deaths. Written and verbal informed consent was obtained from all respondents. 
A total of 149 semi-structured interviews (n = 84 in Morocco and n = 65 in Zambia) were conducted in the native language of the respondents across the two sites via a purposive sample, with adult respondents recruited in order to examine all three of the VA questionnaires (adult, child, and neonate); as a result, 68 respondents received the adult questionnaire (45 in Morocco, 23 in Zambia), 43 received the child questionnaire (20 in Morocco, 23 in Zambia), and 38 received the neonatal questionnaire (19 in Morocco and 19 in Zambia). The interview structure consisted of respondents first answering the VA questions and then answering a series of follow-up probe questions that revealed what respondents were thinking and their rationale for a specific response. While there was a selection of questions that all the interviewers probed on, based on problematic areas identified during VA implementation in Morocco and Zambia, they were also given free rein to probe on any other questions they thought the respondents did not understand or questions they thought might have elicited response errors during the VA interview. Most interviews lasted approximately 30–60 minutes beyond the normal VA interview. Cognitive interviewers recorded their initial interview notes in the language of their preference, then entered their notes in English into CDC’s Q-Notes software, a qualitative analysis program designed specifically for the storage and analysis of data from cognitive interviews [24]. Interviews were conducted over a period of six months in Zambia (January–June 2019) and one year in Morocco (February 2019–February 2020). CCQDER researchers monitored data collection and quality via Q-Notes and communicated with the field teams when necessary to provide direction and assistance. 
Once all interviews were complete, CCQDER staff analyzed the interview notes and summarized the findings using an iterative five-step synthesis and reduction process: conducting interviews, producing summaries, comparing across respondents, comparing across subgroups of respondents, and reaching conclusions [23, 25, 26]. As is common across cognitive interviewing studies, this analysis process uncovered the patterns of interpretation used by the respondents to comprehend, judge, and respond to the various survey items under study. This activity was reviewed by CDC and conducted consistent with applicable federal law and CDC policy as an institutional review board–exempt public health surveillance evaluation. Approval for cognitive interviewing and qualitative data collection was obtained from the University of Zambia Biomedical Research Ethics Committee and the Mohammed V University Comité d’Éthique pour la Recherche Biomédical de Rabat.

Analysis

This investigation included mixed methods analysis of secondary data collected using the 2016 WHO VA questionnaire together with cognitive interviewing results. All data were de-identified by the contributing teams prior to submission for analysis. Quantitative results provided information on the performance of the series of questions related to the 14 select issues, including response pattern analysis and the association of respondent characteristics with response patterns. Frequency distributions and cross-tabulations of quantitative data were run to compare response patterns among related questions. The impact of respondent characteristics on the ability to provide substantive responses was evaluated by calculating prevalence ratios, 95% confidence intervals and p-values using the chi-squared test, with p<0.05 considered significant. “Yes” and “No” were classified as substantive responses while “Don’t Know” and “Refused” were classified as non-substantive responses. For respondent characteristics, close family members were categorized as being a sister, parent, child, or spouse. The analysis was run using the SAS v.9.4 statistical software system (SAS Institute, Cary, NC, USA). Qualitative cognitive interviewing results for the series of questions related to the 14 select issues provided insight on the ways in which each question was interpreted by various groups of respondents, the processes that respondents utilized to formulate a response, as well as any difficulties that respondents might have experienced when attempting to answer the question. Qualitative and quantitative results were triangulated iteratively and integrated into summary findings. Results and summary findings from each issue were then analyzed to identify commonalities or themes that may have contributed to the underperformance of the questions and to make recommendations for improvement.
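The response classification and prevalence-ratio comparison described above can be sketched as follows. This is an illustrative Python example, not the authors' SAS code; the counts, group sizes, and function names are hypothetical, and the confidence interval uses the standard log-normal (Katz) approximation.

```python
# Sketch of the prevalence-ratio analysis described in the text:
# responses are classified as substantive ("Yes"/"No") or non-substantive
# ("Don't Know"/"Refused"), and the prevalence of non-substantive
# responses is compared between close family members and other
# respondents. All counts below are invented for illustration.
import math

def prevalence_ratio(a, n1, b, n2, z=1.96):
    """PR of an outcome in group 1 (a/n1) vs group 2 (b/n2) with a
    log-normal 95% confidence interval (Katz method)."""
    p1, p2 = a / n1, b / n2
    pr = p1 / p2
    se = math.sqrt((1 - p1) / a + (1 - p2) / b)   # SE of log(PR)
    lo, hi = pr * math.exp(-z * se), pr * math.exp(z * se)
    return pr, lo, hi

def chi2_2x2(a, n1, b, n2):
    """Pearson chi-squared statistic (1 df) for the 2x2 table of
    outcome (non-substantive response) vs group, from the same counts."""
    c, d = n1 - a, n2 - b            # substantive responses per group
    n = n1 + n2
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * n1 * n2)

# Hypothetical counts: non-substantive responses among 1,000 close-family
# respondents vs 500 other respondents.
pr, lo, hi = prevalence_ratio(a=150, n1=1000, b=100, n2=500)
chi2 = chi2_2x2(a=150, n1=1000, b=100, n2=500)
print(f"PR = {pr:.2f} (95% CI {lo:.2f}-{hi:.2f}), chi2 = {chi2:.2f}")
```

A PR below 1 with a confidence interval excluding 1, as in the female health question results reported later, would indicate that close family members were less likely to give a non-substantive response.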

Results

From the application of the mixed methods analysis to the 14 issues (Table 1) flagged by end-users of the 2016 WHO standard questionnaire, we identified four broad themes contributing to the underperformance of or redundancy within the instrument: question sequence, overlap or redundancy within the question series, questions outside the frame of reference of the respondent, and questions or concepts needing clarification. Each of these constructs will be discussed in more detail with select examples drawn from the 14 pre-identified issues to demonstrate the application of various analysis types. A full description and analysis of each of the 14 issues is available in S1 File.

Question sequence

Question sequence addresses the ordering of questions within the questionnaire and the application of skip logic to a question series. In the WHO 2016 tool, first-order questions are questions required of all respondents, and second-order (and subsequent-order) questions are delivered dependent on the response to the first-order question. For example, a “Yes” response to a first-order question might then trigger second- and third-order questions exploring this “Yes” response in further detail, whereas those who answered “No” to this first-order question would not be subject to the second-order questions. Question series that might be overly lengthy or capture redundant information might benefit from application of skip logic or reordering of questions in order to shorten the instrument and improve both the acceptability of the interview by respondents and the accuracy of the responses. From the 14 issues, one question series identified as potentially unnecessarily lengthy is that addressing injuries. The injury series starts with the first-order question, “Did (s)he suffer any injury or accident that led to his or her death?” If “Yes”, this question is followed by a series of second- and third-order questions investigating the nature of the injury or accident. Regardless of whether the respondent indicated the presence of an injury, they will also be asked the remainder of the required questions in the questionnaire ascertaining other non-injury related signs or symptoms (such as cough, headache or vomiting). This may lead to unnecessarily lengthy interviews if the deceased clearly died of an injury with no other signs or symptoms. Some reasons, however, to ask subsequent questions after indication of death by injury include determining whether the death was maternal-related or whether the injury was caused by an underlying medical condition. From the primary dataset, 10% (n = 1,919) of respondents reported an injury. 
From the reference dataset, 85% (n = 582) of those reporting an injury were determined to have an injury as the underlying cause of death (UCoD). The median number of affirmative responses to non-injury related symptom questions was 0.8 (IQR: 0–1) among those with an injury UCoD compared to 3 (IQR: 1–4) among those without an injury UCoD. The fewer number of affirmative responses to non-injury symptom questions among those assigned an injury cause of death suggests the questionnaire might benefit from shortening or application of skip logic if the respondent indicates the presence of an injury rather than be required to complete the full questionnaire.
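The comparison of affirmative symptom counts by cause-of-death group can be sketched roughly as follows. The records, question names, and resulting numbers below are invented for illustration and do not come from the reference dataset.

```python
# Sketch of the analysis described above: count affirmative responses to
# non-injury symptom questions per VA record, then summarize (median and
# IQR bounds) by whether the underlying cause of death (UCoD) was an
# injury. Synthetic records only.
from statistics import quantiles

records = [
    # (injury_ucod, responses to non-injury symptom questions)
    (True,  {"cough": "No",  "headache": "No",  "vomiting": "Yes"}),
    (True,  {"cough": "No",  "headache": "No",  "vomiting": "No"}),
    (False, {"cough": "Yes", "headache": "Yes", "vomiting": "Yes"}),
    (False, {"cough": "Yes", "headache": "No",  "vomiting": "Yes"}),
]

def affirmative_counts(recs, injury):
    """Number of 'Yes' symptom responses per record, for one UCoD group."""
    return [sum(v == "Yes" for v in sym.values())
            for inj, sym in recs if inj == injury]

def summarize(counts):
    """Median and (Q1, Q3) bounds of the IQR."""
    q1, med, q3 = quantiles(counts, n=4, method="inclusive")
    return med, (q1, q3)

inj_med, inj_iqr = summarize(affirmative_counts(records, injury=True))
non_med, non_iqr = summarize(affirmative_counts(records, injury=False))
print(f"injury UCoD: median {inj_med}, IQR {inj_iqr}")
print(f"non-injury UCoD: median {non_med}, IQR {non_iqr}")
```

In this toy data, as in the reported results, injury deaths show fewer affirmative non-injury symptom responses, the pattern that motivates applying skip logic after an injury is reported.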

Redundancy

Within the WHO questionnaire, there are questions that may ascertain identical or redundant information. Evaluation of the series of questions related to seven of the 14 issues demonstrated similar response patterns to two or more questions within each series which captured overlapping information: tobacco use, sores, breast swelling, abdominal problem, vomiting, vaccination, and baby size. Details of the analysis for two of these issues—tobacco and vomiting—are provided below as examples. In the tobacco series, the two first-order questions “Did (s)he use tobacco?” and “Did (s)he smoke tobacco?” ask for similar information and demonstrated >95% consistency in “Yes” and “No” responses among respondents (S1 File). In the cognitive interviewing sample, all of those who reportedly used tobacco smoked it; no one used tobacco in a form other than smoking. However, answers varied for those who had quit smoking: some answered “Yes” to the two first-order tobacco questions, and others answered “No”. The qualitative cognitive interviewing results also suggested that the similarity of questions in the tobacco series was confusing or seemed repetitive to respondents; as noted by one respondent when asked whether the deceased smoked tobacco and what kind of tobacco was used: “He used to smoke cigarettes as I said”. The question series addressing the symptom vomiting begins with two first-order, required questions, “Did (s)he vomit?” and “To clarify, did (s)he vomit in the week preceding death?”, followed by the second-order question “How long before death did (s)he vomit?”. Given that the two first-order questions ascertain similar information, one of them could potentially be eliminated. Further, two of the three questions address the timing of the vomiting, suggesting an element of redundancy. 
The frequencies of affirmative and negative responses to the two first-order questions were similar: 97% of respondents who reported the deceased vomited in the week before death also answered “Yes” to the first question, “Did (s)he vomit?”; 98% of respondents who answered “No” to the first question also answered “No” to the second question (Table 2), suggesting good capture of the symptom of vomiting with only one of the questions. From the qualitative review addressing the timing of the vomiting, most respondents understood the first question, “Did (s)he vomit?”, as asking whether or not the decedent vomited in the immediate period before death, which varied from the hours before death to a few weeks prior to death.
Table 2

Crosstabulation of responses to questions, “Did (S)he vomit?” and “Did (s)he vomit in the week before death?”

                           Did (s)he vomit in the week before death?
Did (s)he vomit?           Yes          No           DK         Ref      Total
Yes        n               3,127        1,080        72         3        4,282 (38%)
           Row %           73%          25%          2%         <1%
           Col %           97%          14%          32%        50%
No         n               82           6,700        32         0        6,814 (60%)
           Row %           1%           98%          1%
           Col %           3%           86%          14%
DK         n               2            56           122        1        181 (2%)
           Row %           1%           31%          67%        1%
           Col %           <1%          <1%          54%        17%
Ref        n               0            0            0          2        2 (<1%)
           Row %                                                100%
           Col %                                                33%
Total      n               3,211 (28%)  7,836 (70%)  226 (2%)   6 (<1%)  11,279*

Data source: primary dataset (n = 11,279)

* Both questions were not asked by some countries/regions

Data source: primary dataset (n = 11,279) * Both questions were not asked by some countries/regions

Frame of reference

Level of familiarity and experience with the deceased may impact a question response. Questions that are outside the frame of reference of the respondent may affect the ability to provide an accurate answer. The level of familiarity of the respondent with the deceased was evaluated in two ways: 1) measuring the association of response patterns with the questions “What is your relationship to the deceased?” and “Did you live with the deceased?” and 2) measuring the percentage of “don’t know” responses for a relevant question. Concerns relating to frame of reference of the respondent were explored for four of the 14 issues–the female health related questions, lumps, tobacco, and baby size. In evaluating six of the female health related questions, close family member respondents were significantly less likely than other respondents to provide a “Don’t Know” or “Refused” response for two of the six questions, “When she had her period, did she have vaginal bleeding in between menstrual periods?” (PR 0.71, 95% CI 0.64–0.80) and “At the time of death was her period overdue?” (PR 0.74, 95%CI 0.61–0.90) (Table 3). Further, a respondent who lived with the deceased was also less likely than a respondent who did not live with the deceased to provide a “Don’t Know” or “Refused” response for five of the six questions. In the series of questions related to lumps, respondents who were close family members or lived with the deceased were less likely to provide a “Don’t Know” or “Refused” response to the first-order question about the presence of lumps (Close Family PR 0.51, 95%CI 0.42–0.63; Lived with Deceased PR: 0.32, 95%CI 0.26–0.41), while living with the deceased demonstrated a lower PR of non-substantive response to the question about the presence of breast swelling (PR 0.51, 95%CI 0.32–0.74) (S1 File).
Table 3

Prevalence Ratios (PR) of “Don’t Know” or “Refused” responses among respondents who were close family members of the deceased compared with other relationship to the deceased and lived with the deceased compared with did not live with the deceased.

Question                                               % DK/Ref       Close Family vs Other      Lived with Deceased vs Did Not
                                                       (all resp.)    PR (95% CI, p value)       PR (95% CI, p value)
Did she ever have a period or menstruate?              3%             0.81 (0.6–1.1, p = 0.13)   0.48 (0.35–0.65, p<0.001)
When she had her period, did she have vaginal
  bleeding in between menstrual periods?               26%            0.71 (0.64–0.80, p<0.001)  0.77 (0.67–0.89, p<0.001)
Was the bleeding excessive?                            12%            1.31 (0.7–2.4, p = 0.33)   0.56 (0.28–1.11, p = 0.1)
Was there excessive vaginal bleeding in the week
  prior to death?                                      6%             1.23 (0.9–1.5, p = 0.5)    0.54 (0.40–0.72, p<0.001)
Did her menstrual period stop naturally because of
  menopause or removal of uterus?                      6%             1.12 (0.9–1.4, p = 0.5)    0.48 (0.36–0.64, p<0.001)
At the time of death was her period overdue?           26%            0.74 (0.61–0.9, p<0.05)    0.70 (0.58–0.84, p<0.001)

Data source: primary dataset
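The prevalence ratios in Table 3 compare the proportion of non-substantive (“Don’t Know” or “Refused”) responses between respondent groups. A minimal sketch of the calculation is below; the counts are hypothetical, and since the paper does not state its interval method, this uses the common log-scale Wald confidence interval:

```python
import math

def prevalence_ratio(a, n1, b, n2, z=1.96):
    """Prevalence ratio of an outcome in group 1 (a of n1) vs group 2
    (b of n2), with a confidence interval computed on the log scale."""
    pr = (a / n1) / (b / n2)
    se = math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n2)  # SE of log(PR)
    lo = math.exp(math.log(pr) - z * se)
    hi = math.exp(math.log(pr) + z * se)
    return pr, lo, hi

# Hypothetical counts: 10/100 close-family respondents vs 20/100 other
# respondents gave a "Don't Know"/"Refused" response.
pr, lo, hi = prevalence_ratio(10, 100, 20, 100)
```

A PR below 1 with an interval excluding 1, as in several rows of Table 3, indicates that the first group was significantly less likely to give a non-substantive response.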

Similarly, the level of detail sought in a question, such as a frequency or a weight, may be unknown to or outside the frame of reference of a respondent. For example, 63% (n = 1,516) of respondents who reported that the deceased smoked could not state the number of cigarettes the deceased smoked per day. Forty-one percent (n = 1,187) of responses to the birth weight question were unknown or implausible (< 100 or > 6,000 grams); many weights were recorded as < 100 grams, suggesting they may have been recorded in kilograms rather than the requested grams (S1 File).
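The plausibility screen described above can be expressed as a simple rule. The thresholds (< 100 g, > 6,000 g) come from the text; treating sub-100-gram values as possible kilogram entries is an illustrative heuristic, not a rule from the paper:

```python
def classify_birth_weight(grams):
    """Classify a reported birth weight against the plausibility bounds
    used in the analysis (< 100 g or > 6,000 g is implausible).

    Values under 100 g are flagged separately because they may have been
    recorded in kilograms instead of grams (an assumed heuristic,
    e.g. 3.2 entered instead of 3200)."""
    if grams is None:
        return "unknown"
    if grams < 100:
        return "implausible (possible kilograms)"
    if grams > 6000:
        return "implausible (too high)"
    return "plausible"
```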

Clarity of construct

Clarity of construct refers to the respondent’s ability to understand the terminology used in a question and to correctly associate the construct the question intends with the one the respondent interprets. In the first case, quantitative analysis showed a lack of consistency in response patterns for questions seeking similar information but using different terminology. For example, 45% of respondents who reported the presence of a pit or ulcer on the foot reported the presence of sores. Nineteen percent (n = 7) of respondents who reported a birth weight > 4,500 grams answered “yes” to the question “At birth, was the baby larger than usual?”. Sixteen percent of respondents (n = 79) who reported a diagnosis of dementia in the decedent reported the presence of mental confusion (S1 File). Incorrect association of intended constructs was observed among the medical diagnosis questions, such as those inquiring whether the deceased had ever been diagnosed by a healthcare provider with dengue fever or a stroke. Qualitative cognitive interviewing results suggested two general patterns by which respondents evaluated the health conditions of the deceased: A) medical diagnoses from a health professional, or B) symptoms perceived to be related to the condition, as shown in Fig 1.
Fig 1

Respondent cognitive pattern for health condition questions.

For example, some respondents confused or conflated the disease in question with another condition, such as confusing dengue fever with yellow fever, malaria, or sickle cell anemia (pattern A2 in Fig 1). Other respondents based their answer on whether the decedent displayed any symptoms they understood to be related to the condition, such as breathlessness for a diagnosis of COPD (pattern B in Fig 1). In some cases, respondents knew neither the condition nor its symptoms but still gave a “Yes” or “No” response.

Discussion

In this investigation, we present the use of a mixed methods analysis in the evaluation of a standardized VA questionnaire. The results of both quantitative and qualitative analysis can be used to inform decisions around revising the instrument to improve its overall accuracy and utility. Questionnaire revision decisions cannot be made from this evidence alone, as many other factors must be taken into consideration when formally changing an instrument. From the application of the mixed methods methodology, however, we have described four broad themes that may contribute to the underperformance of the 14 identified issues within the instrument: question sequence, redundancy, frame of reference, and clarity of construct.

Overly lengthy questionnaires can lead to survey fatigue and decrease acceptance of the tool among respondents [27, 28]. Applying skip or branching logic can shorten a questionnaire depending on the responses. In the injury series, the median number of affirmative responses to non-injury-related symptom questions was three times lower among those assigned an injury cause of death than among those who were not. These findings suggest that deaths with an indication of injury most often have no other symptoms to report; after a report of injury, the rest of the questionnaire could be eliminated, or an abbreviated set of symptom questions could be used to screen for non-injury or maternal causes.

The inclusion of questions seeking redundant or overlapping information can affect the answers provided by the respondent. In general, when respondents are confused by similarity among questions, such as in the tobacco section, they attach meaning to this phenomenon. Possible meanings include “I must have misunderstood one of the questions” and “they are trying to trick me”, which can lead to response errors.
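The injury skip logic suggested above could be sketched as branching of this kind; the module names are hypothetical and not taken from the WHO instrument:

```python
def modules_to_administer(injury_reported, abbreviated_screen_positive=False):
    """Return the question modules to run after the injury series.

    If an injury is reported, run only a short screen for non-injury or
    maternal causes; continue to the full symptom modules only if that
    screen is positive. Module names are illustrative placeholders."""
    if not injury_reported:
        return ["full_symptom_modules"]
    if abbreviated_screen_positive:
        return ["abbreviated_screen", "full_symptom_modules"]
    return ["abbreviated_screen"]
```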
When considering eliminating a question in a series, the level of agreement between questions can yield useful information. Similar response patterns among questions ascertaining overlapping information suggest the feasibility of eliminating a question, as in the vomiting series. Likewise, questions with similar but distinct terminology can result in incorrect association of intended constructs, as was observed among the medical diagnosis questions. Clarity of the construct of interest can be supported by strategic organization of questions to provide a questionnaire flow that helps the respondent track the distinct constructs of interest. More routinely, the open narrative typically also collected during the VA, in which the respondent explains in their own words the circumstances of death, can be used to verify information reported on related constructs in the “closed” section of the questionnaire. Open narratives have been reported as a way to build rapport with respondents, improving the ability to collect quality information [29]. However, further work is needed to understand how best to optimize the use of the open narrative in the VA questionnaire and in the cause of death assignment and quality control processes.

Choosing the right respondent is key in the application of a VA [1]. Level of familiarity with the deceased can affect the accuracy of the responses provided. In our investigation, gendered questions, such as the female reproductive health questions, yielded more substantive (“Yes” or “No”) responses when provided by respondents who were either close family members of or lived with the deceased, the two respondent characteristics evaluated in this work. Understanding this difference in responses by respondent characteristics is key when addressing culturally sensitive or gender-specific topics such as deaths in women of reproductive age.
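The level of agreement between two overlapping yes/no questions, discussed above for the vomiting series, can be quantified with a chance-corrected statistic such as Cohen’s kappa. This is a sketch only; the paper does not state which agreement measure it used:

```python
def cohens_kappa(x, y):
    """Cohen's kappa for two paired binary (0/1) response vectors:
    observed agreement corrected for agreement expected by chance."""
    n = len(x)
    po = sum(a == b for a, b in zip(x, y)) / n   # observed agreement
    p1x, p1y = sum(x) / n, sum(y) / n            # marginal "yes" rates
    pe = p1x * p1y + (1 - p1x) * (1 - p1y)       # chance agreement
    return (po - pe) / (1 - pe)
```

High kappa between two questions would support dropping one of them; agreement near chance suggests the questions capture distinct constructs despite surface similarity.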
The most recent versions of the WHO VA questionnaire (v1.5.2 and v1.5.3) contain a question on the sex of the respondent, which may provide additional information about the impact of respondent characteristics on the usefulness of responses [1]. Though not explored in this work, the relationship of the interviewer to the respondent and to the community in which they conduct VA interviews is also a known factor affecting VA performance [30]. Cognitive interviewing provided valuable insight into disparities observed in the quantitative analysis. Identification of unanticipated response patterns through the quantitative analysis highlighted the need for a better understanding of respondent knowledge and interpretation of the question construct. A review of cognitive interviewing results for questions that capture similar information, such as birth weight and medical diagnoses, but show differing response patterns elucidated patterns of interpretation of the construct that may lead to false-positive response errors. This information could be used in the questionnaire revision process to rephrase constructs within a question in order to improve accuracy.

Our investigation was subject to several limitations. With the exception of some CHAMPS data from Bangladesh, the available VA data represent only sub-Saharan Africa. Considering the variation in epidemiologic patterns and cultural practices across regions (for example, the types of tobacco used most frequently), the response patterns analyzed in this investigation may not fully represent those that would be expected in other regions. There is also variation in how the final VA instrument is applied in a given setting, due to different versions of the 2016 questionnaire being used or other modifications made by teams, which may affect the electronic skip patterns and response frequencies.
When using these quantitative and qualitative findings to revise an instrument, it is imperative to consider question requirements as well as the impact of instrument changes on the question weightings used by the various automated algorithms. Altering questions or question series, or eliminating questions, can affect the input to the decision matrix and the determination of the underlying cause of death by these algorithms. Furthermore, this investigation did not explore the specific impact of linguistic differences, translation, or other culturally appropriate adaptations that are often made during VA implementation. While the open discussion between the respondent and the cognitive interviewer aims to capture the respondent’s cognitive processes relative to the intent of the questions, further work is needed to fully understand variations across culture and language. Finally, the sample size did not permit analysis of differences in results by country or age group of the deaths; additional data would facilitate deeper analysis of such comparisons.

While this work demonstrates how mixed methods analysis can be used to improve VA processes, it also highlights the many additional areas in which VA methods (including the questionnaire, the interview process, and the assignment of cause of death) can continue to be improved. In addition to further work to optimize the use of the open narrative and to understand the impact of cultural and linguistic variation on VA, VA methods can also be advanced through the ongoing collection of geographically and epidemiologically representative reference deaths, against which VA data can be evaluated and knowledge of the symptom-cause relationship improved.
Findings from this investigation provide supporting evidence for the revision of the 2016 WHO VA instrument; specific recommendations and considerations based on the complete analysis are included in the supplemental file (S1 File). Quantitative and qualitative analysis results can identify question series that could be shortened by eliminating redundancy, series requiring clarification because of unclear constructs, and the impact of respondent characteristics on the quality of responses. These changes can lead to better understanding of the question constructs by respondents, greater acceptance of the tool, and improved overall accuracy of the VA instrument. These findings also support the need to select an appropriate respondent to the questionnaire in order to maximize the accuracy of responses, particularly for culturally sensitive topics and diagnoses. The integration of quantitative and qualitative data sources in this mixed methods approach has identified areas of underperformance of the questionnaire and provided evidence to inform improvement efforts. This application has shown mixed methods to be a useful methodology that can be applied in the evaluation of multiple platforms, including questionnaires, surveys, and other information-gathering tools.

Mixed-methods analysis of select issues reported in the 2016 WHO VA questionnaire: Summary report.

(DOCX)
See above, the authors may wish to consider articulating a series of directions for future research to inform decisions on revision of this instrument.

END OF REVIEW

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: Yes: Lucia D'Ambruoso

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

5 Jan 2022

Response to Reviewers

Journal Requirements: When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.
The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

DONE

2. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

Regarding the availability of the data, indeed, we are only able to consider data sharing by request. While the data are de-identified, some of the data are sourced from sensitive data that contribute to vital statistics.
The data have been provided by teams for this analysis with the agreement that the data will only be used by the agreed collaborators for the purpose of contributing to the improvement of the WHO verbal autopsy questionnaire. Data contributors agreed to the publication of aggregate findings and are co-authors of the paper. We can address any requests for the data individually; requests can be directed to me (the corresponding author), and we will work with co-authors/data contributors to seek the appropriate permissions if and as needed.

3. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

DONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author – responses provided in italics

5. Review Comments to the Author

Reviewer #1: This is an interesting approach to trying to determine target areas for refining the WHO's 2016 VA tool, and the use of mixed methods is an appropriate design. Overall, some more detail in the methods, especially around the qualitative data, is needed, and there are some areas that would benefit from more discussion (language, and the interviewer perspective).

Abstract:
- Can the setting be included in the methods? And a clarification about the age range for the VAs in question. – details have been added
- For someone who will only read the abstract, I don't think it will be clear what you mean by item series and constructs.
Appreciate the space is limited, but adding a very brief definition of these would really help readability here. – have reworked how these points are made to be clearer (throughout)

Background:
- "Feedback from end users of the WHO 2016 VA instrument has been compiled" - this needs a bit more information, as it wasn't totally clear to me whether this had been done before, who the end users were, whether they were from different settings to the current study, etc. Given this is the entry point to the study, more clarity is needed. – Have added clarification that end users are any field teams that use the VA questionnaire; have also added detail about the platform used to compile feedback.
- Given the comment above, the background on mixed-methods could be greatly reduced (as this is a common methodology), to keep this section concise. E.g. just keeping the content about mixed-methods for survey development. – the background information included briefly describes the mixed-methods approaches that underpin this work, and thus we propose to keep it; per another reviewer's suggestion, we have added additional detail on qualitative and quantitative approaches.

Methods:
- I just wanted to clarify - so all the VA data was collated, and then cause of death was assigned independently by clinicians from South Africa? Did none of the data sources already have causes of death assigned, and if so, what did you do about conflicts? This process was not totally clear. – No; VA questionnaire data was available from 5 project teams; additional information on cause of death, as assigned by a physician, was available from 1 project team (South Africa). Clarification added in the first sentence of the quantitative data collection section.
- Can you explain how (and why) the sub-sample was generated for cause of death - was it random? Why was it not stratified by age-group? – per the above, it was not a random sample; it was based on what data was available from the project teams.
- Can you explain why Zambia and Morocco were chosen? – Clarification added in the second sentence of the qualitative data collection section ("These sites were selected based on their readiness, given that the VA process had been well established and the field teams were interested in evaluating the performance of the interview process using cognitive interviewing.")
- What country is CCQDER based in? 'National' is not so informative for international readers! – the U.S.; clarification added.
- For the cognitive interviews, can you say something about who the 'local researchers' are (and consider re-phrasing this term), and languages e.g. were notes taken in English? Or other languages and translated? How was nuance of language/translations/understanding dealt with, given the cognitive interviewing methodology? – Clarification has been added throughout the section to clarify who the researchers were (interviewers selected from the local communities) and the role of language and translation throughout the process. The cognitive interviewers were relied on to translate the local language of the respondents into English for analysis.
- "The qualitative cognitive interviewing results were analyzed using typical qualitative analysis methods" - can you be more explicit and state the approach used? Also, who conducted the analysis, and how did you think about validity? – Detail has been added in the qualitative section about the iterative 5-step synthesis and reduction process, within which results are validated through comparison and iteration of findings; results were further validated in the mixed-methods triangulation.
- For triangulation, from reading it sounds like the analysis was done iteratively, so triangulation was done during the analysis phase, and not in the interpretation phase - it would be good to state this clearly (i.e. a concurrent iterative triangulation mixed-methods approach?).
– Correct, triangulation was iterative, and clarification has been added.

Results:
- Figure 1 appeared to be distorted or missing information. – redid figure to clarify relationship of content.

Discussion:
- When considering duplication of information (or redundancy of questions), there have been previous reports from fieldworkers that they found open narratives useful as a way to cross-check information then given to closed questions. It would be worth discussing this wider point, about recall and reliability of data, and whether having some 'checks' in the questionnaire would always be negative. I.e. could these actually be turned into quality control flags? – Agree. Have added a statement about the value and potential use of the narrative in this capacity, also noting that more work is needed to understand how to make the best use of the narrative.
- It's also worth raising, in the point about who the respondent is, consideration of who the interviewer is, and how this also plays a role in data quality. There is some literature on this around insiders versus outsiders conducting VAs. – noted. This point, with a relevant reference, has been added to the section on choosing the right respondent.
- On the point that nearly all data was from the African region, the discussion on tobacco could include a clarification, given smokeless tobacco use is most common in the Asian region (https://bmcmedicine.biomedcentral.com/articles/10.1186/s12916-020-01677-9). – noted; this example has been added in the limitations section with a broader explanation about the impact of the variation in epidemiologic patterns and cultural practices across regions.

Reviewer #2: The authors present an important evaluation of the WHO VSA, identifying areas where improvement can be made in this widely used tool, given the importance of identifying causes of death as countries work to reduce preventable mortality on the road to achieving effective UHC and meeting the goals in the SDGs.
The manuscript identifies 4 areas where change is needed based on unsolicited input from users (unable to assess that data source as the link is broken) and analysis of results from 2 data sets of quantitative results. The authors however then describe a mixed-methods approach, which is a weak part of an otherwise strong paper. The description of the analysis (“typical qualitative analysis methods”) is insufficient, and the references are about cognitive interviewing (one on analysis); more detail is required (see COREQ for describing qualitative analysis). In addition, it is difficult to identify where the results were used beyond the analysis of the already submitted input. It would also be important to understand the process through which the countries undertook linguistic and cultural translation, and whether there were differences in the results based on country. More specific details are below. Also, 'data' are plural and should be corrected. – have addressed these comments as described below.

Introduction: Why is the mixed methods approach “novel”? This is hard to interpret given the paucity of description. More details on how the feedback was obtained (line 117). – The application of the mixed methods approach to improving the WHO VA questionnaire is novel, but for simplicity, this term has been removed. The link to reference 19 is broken – have updated: https://github.com/SwissTPH/WHO_VA_2016

Methods: see above. A description of how the 2 different VSA datasets were used is needed, as this does not emerge in the results. – Clarification has been added in the quantitative section regarding the composition of the primary and reference datasets. In Table 1 – was this only from the submitted comments, or also from the new qualitative data or from analysis of the VSA results to identify additional areas beyond the 14? – only from the submitted comments.
Results: Overall well described, but some language is hard to follow (such as the discussion of the vomiting questions - what was the conclusion? which to drop?). – Addressing comments from other reviewers, we have added modifications throughout to clarify the language. However, to note, for simplicity, we are not providing specific recommendations with the examples here—they are in the supplemental file, and we've added a sentence on this in the last paragraph of the discussion.

It would also be helpful, for variables such as respondent type, to know if and how much value there is in still asking (the PR shows relative value, but what % of non-close relatives, for example, did answer yes or no). – have added the overall percent of refused and don't know responses for each question for consideration—where there is a higher percentage of don't know/refused, the value of the question may be questioned (this was considered for all questions at a later phase in the broader questionnaire revision process).

Similarly (from discussion) - is there a potential difference between accuracy (correct answer) and assurance (willingness to say yes/no even when they do not know, due to a sense that they should know)? – Indeed, and that is a purpose of the quantitative comparisons made in this analysis - to compare response patterns across related questions to detect deviations from what is expected if the questions were answered as intended.

The information, for example, on responses to birth weight - what is ‘too low’ a definitive response rate to warrant dropping for that respondent type? – Have added detail on the low and high cutoffs used to determine plausibility.

For the clarity of construct - where there were differing responses, is there anything from the second VSA data set to identify which question was closer to reality?
– Agree this would be a relevant analysis to conduct, though it was not conducted at this phase of the analysis; have added a comment in the Discussion about the value of reference deaths to support further performance evaluation.

Figure 1 is also a bit hard to understand – redid figure to clarify relationship of components.

Discussion, lines 374-377: It was unclear what cognitive evaluation was done and what patterns could result in false (+). – this refers to the cognitive interviewing that was the source of the qualitative data used in the analysis; the terminology has been changed for consistency and clarity; the reference to false positives refers to the issue described in Figure 1, which shows a cognitive pattern of how respondents interpreted diagnosis questions in different ways, suggesting that some respondents may have conflated the presence of symptoms with a diagnosis of a related health condition, yielding a potential false positive response.

In the limitations, the comments about mixed methods are hard to interpret given the issues around description of methods and results noted above. – addressed as noted above. Also, there is no discussion of differences between processes for linguistic and cultural translation, as well as the cultural adaptations needed. – this point has now been addressed in the limitations section.

Reviewer #3: Mixed-Methods Analysis of Selected Issues Reported in the 2016 World Health Organization Verbal Autopsy Questionnaire

OVERALL COMMENTS

Thank you for the opportunity to review this important paper advancing the international VA standard interview towards a format that is more amenable to widespread application. The paper reports an important and original contribution and is worthy of publication. There are several overall and specific comments below, all relatively minor, but they may strengthen the paper in terms of clarity and meaning to readers unfamiliar with VA. Overall, I found several key pieces of information to be slightly opaque.
In the description of the ’14 issues’ it would be useful to summarise what these issues were – i.e., that they relate to process (generally shortening the interview) and substantive issues (confusion with constructs/meaningful responses). The selection/origins of the data and sequence of the analysis were also not entirely clear, e.g., whether the 14 issues from VA end-users are part of or separate to this analysis. These points could be introduced and explained with more clarity, and earlier in the paper. Moreover, some attention is needed to where the ’14 issues’ came from; what a cognitive interview is; what the qualitative work did (content of the interview); and, in the abstract, an overview of the substantive findings.

A statement on objectives was also missing from the abstract and paper. This could be, e.g., related to identifying ‘underperformance or redundancy [of items] within the instrument’ (page 17, line 204) and ideally connected to higher order aims ‘to shorten the instrument and improve acceptability of the interview by respondents as well as the accuracy of the responses’ (page 18, lines 218-9) or ‘to inform decisions around the revision of the instrument to improve its overall accuracy and utility’ (page 24, lines 338-9).

Consistency with key terms would also be useful. There was some variety in terms used, especially for the qualitative elements (e.g., ‘cognitive testing results’ and ‘typical qualitative analysis methods’); consistency with these would be preferable. I hope these comments are of use in revising and clarifying some key aspects. Finally, it should be noted that I am not a quantitative methodologist and a reviewer with skills in this area should also review the paper. – All comments have been addressed as described below.

SPECIFIC COMMENTS

ABSTRACT

1. Page 10, line 56-7: It would be useful to summarise what the issues are, and provide a stated objective.
– Have modified to address these issues and have named specific issues in the relevant statements in the Results section.
2. Page 10, line 59: It would be useful to indicate where the VAs were drawn from. – have added
3. Page 10, line 60: How is the quality of responses defined? – rephrased to more accurately describe the association measured
4. Page 10, line 61: For the unfamiliar reader, it would be useful to understand what a ‘cognitive interview’ is. – have added a brief explanation
5. Page 10, line 62: It would be useful to understand, in the abstract, the settings from which the data were drawn. – have added countries
6. Page 10, line 63: It would be useful to understand how identification of common themes relates to the overall aims and objectives. – While the specific findings for each question need to be considered for the questionnaire revision process, the four common themes are useful to summarize the findings overall and to consider how these broad themes may relate to other possible revisions (no change recommended)
7. Page 10, line 67: As per comment no. 1, it would be useful to have a summary of what the previously identified issues are. The sentence ‘Two of the 14 question series identified issues related to item sequence; seven demonstrated similar response patterns among questions within each series capturing overlapping information’ suggests what the issues relate to, but is not entirely clear. – have added names of specific issues for clarity
8. Page 10, lines 69-70: Is it possible to report the respondent characteristics? Similarly on constructs outside the frame of reference? – have added
9. Page 11, lines 74-8: As per comments above, the substantive results feel lacking. Some of this content could be used to develop aims and objectives. – have added detail as noted above

INTRODUCTION

10. Page 12, line 99: It might be useful, for the unfamiliar reader, to understand what is meant by ‘item and unit’.
– agree these are confusing; have changed ‘item’ to ‘question’ and deleted ‘unit’
11. Page 12, line 105: Again, for the unfamiliar reader, it might be useful to briefly summarise how different philosophical positions on truth and knowledge underpin different methodologies. – Have added the requested detail after line 105. Does this only improve reliability? – No; have broadened terms used to “produce a better understanding of findings”.
12. Page 12, line 112: As above, please include a brief description of ‘cognitive interview’. – Have added at first mention of cognitive interviewing in previous paragraph.
13. Page 12, line 117: Who are the ‘end-users’? – have added clarification that end users are any field teams that use the questionnaire
14. Page 12, line 118: As per comments on abstract, please provide a summary of the 14 issues, from whose perspectives and using what approaches these issues were identified. – additional detail has been added to clarify how the issues were identified and why they were selected for the mixed methods analysis; we have revised the background paragraph to include more detail on the platform for collecting feedback and to indicate that this paper is about the issues for which a mixed methods approach is appropriate
15. Strongly suggest that the authors articulate aims and objectives. – Have revised the final paragraph in the background section to clarify the purpose of the investigation.

METHODS

Quantitative data collection

16. Page 13, line 129: (And throughout) consistency with abbreviations needed. – reviewed/updated
17. Page 13, paragraph 2: Great to understand the settings from which data derived. It would be useful to understand how and why data from these countries were included. – added detail (field teams that had completed at least 1,000 VAs using the 2016 WHO VA questionnaire and who agreed to contribute their data for the exercise)
18. Page 14, lines 148-53: A short explanation of the relationship between the primary and reference datasets would be useful. – have added clarification (“The reference dataset included the VA data as in the primary dataset combined with cause of death information determined by physician review of the verbal autopsy.”)

Qualitative data collection

19. Page 14, line 156: As above – a) why these settings? – added (“These sites were selected based on their readiness, given that the VA process had been well established and the field teams were interested in evaluating the performance of the interview process using cognitive interviewing.”) And b) what is a cognitive interview? – has been added in the background
20. A description of what the cognitive interview sought information on would be useful to include. The authors may also wish to report on key information such as: How long did the interview take? Was it structured/semi-structured? Is the interview guide available? How many interviews were done in each setting? – Requested details have been added in the qualitative section.

Analysis

21. Page 15, line 181: As above, who are end users? – addressed above; this use has been deleted in this section, as all detail on feedback was moved to the background
22. Page 15, lines 184-5: Great to have the 14 problematic areas, a description of this could come earlier, however. It is also not clear whether the 14 issues from VA end-users are part of or separate to this analysis. – Agreed. Have moved all description of the 14 issues to the background.
23. This section might usefully be revised to state specifically the aspects being assessed, how these relate to the ’14 issues’ and how the assessment allowed the issue to be addressed. – Detail has been added in the Analysis section to clarify the aspects assessed and their relation to the 14 issues.
24. Page 16, line 193: what does ‘typical qualitative analysis methods’ mean? Details on the specifics of the analytical approach, and why the approach was appropriate, would be useful to include.
– Details have been added in the qualitative section (iterative 5-step synthesis and reduction process).
25. Page 16, lines 194-5: ‘cognitive interviewing data’ – does this mean qualitative data? It is not clear why end-users (presumably administrators of VA) would report the same or similar issues to VA respondents. Specifics of the quantitative analysis performed on the inductive analysis would be useful to include. – this specific statement has been removed; additional detail has been added in the section to clarify that qualitative data refers to cognitive interviewing data. End users (which has been clarified to mean field teams) are those that administer the VA process—they compile feedback from interviewers, who can report when respondents have issues understanding the questions; of course, specific cognitive testing assessments aim to systematically compile such information directly from respondents.
26. Page 16, line 197: ‘underperformance of the item’ gives some sense of the overall objective and how the analysis contributed to achieving it, however this could be brought out more clearly. – noted; this has been added in the last part of the intro section, with additional information in the analysis section noting that results are used to make recommendations for improvement.
27. Table 1 – please number the 14 items. In the description, it might be useful to summarise that these relate to process (repetition, response patterns, or shortening of the interview) and substantive issues (confusion with constructs/consistent and meaningful responses). As above, this could be introduced and explained with more clarity, and earlier in the paper. – numbers added in Table 1. Detail added in the background, where the 14 issues are described, about how the issues relate to the interview. Table 1 moved to after the Intro.

RESULTS

28. Page 18, line 203: Consistency with ‘concepts’ and ‘constructs’ in reference to items in the interview would be useful.
Considering much of the analysis refers to respondents’ understanding of constructs, the authors may wish to refer to ‘four broad themes’, here. – Noted. Changes have been made for clarity; “four broad concepts/constructs” was changed to “themes” throughout; where “concepts” referred to “questions,” wording was changed to “questions.”
29. Page 18, lines 204-5: The authors may wish to indicate that ‘overlap within the item series’ is understood as ‘redundancy’. – done, clarity added

Redundancy

30. Page 19, line 243: some explanation of ‘seven of the question series’ would be useful to include. – rephrased for clarity
31. This section opens with a statement about question series on tobacco, sores, breast swelling, abdominal problem, vomiting, vaccination, and birth weight. It is not clear why results of the analysis of response patterns are presented in detail for one of these (vomiting), in an appendix for another (tobacco use), triangulating with the qualitative analysis for one (tobacco use) and not for the others. – noted. Select examples are shown throughout the results section to demonstrate results of the mixed methods analysis contributing to the summary findings of the four broad themes contributing to issues in the questionnaire, with reference to the Supplemental file for the full analysis. This is noted at the end of the first results paragraph. Wording has been added in the redundancy section to clarify that details for two of the issues are included as an example. The two examples have been reordered to be consistent with how they are mentioned in Table 1 and in the list of 7 issues with problems of redundancy.
32. Page 21, line 272: does ‘cognitive testing results’ mean qualitative analysis? Various terms are used for this element of the analysis, which may not be entirely clear to readers. – yes; have switched “testing” to “interviewing” for consistency and added “qualitative” before “cognitive interviewing” in this section for clarity.
Frame of reference 33. Page 21, line 289: Again, ‘question series’ would be useful to describe to the unfamiliar reader. – noted; have made modifications for clarity in the methods/analysis section and in this section. 34. Table 3: it would be useful to understand why PRs are presented for 6 questions. What about the others? – We used two measures to evaluate frame of reference; examples provided demonstrate each; have added a sentence clarifying these two measures in the section. Clarity of construct 35. Page 23, lines 312-4: The difference between the two elements of clarity of construct is unclear. – have revised to add clarity that first element refers to the ability of the respondent to understand the terminology (versus the second, which refers to the respondent responding to the intended construct). 36. Page 23, lines 314-5: The sentence ‘In the qualitative analysis, items seeking similar information but using different terminology, or items having overlapping constructs demonstrated differing response patterns’ is slightly unclear, suggest revise in the active voice. – Have revised for clarity. 37. Page 23, line 323: Again, the term ‘cognitive testing results’ is used. This term is only introduced in the results section. It is perfectly acceptable to use the term, however it should be introduced and described in the methods section and used consistently thereafter. – modified as noted above—now using “cognitive interviewing” consistently throughout. 38. As above, the triangulation and choice of specific results presented is unclear. – Have added detail in the first paragraph of the results section to clarify that select examples are shown to demonstrate the application of different types of analysis. Have also reworked the intro statement for the first two paragraphs of this section to add clarity. DISCUSSION 39. Page 24, line 341: See point above, the authors may wish to consistently refer to themes from the mixed methods analysis. 
Various reference to constructs and concepts may be confusing for readers. – Noted, have changed as recommended. 40. Page 25, paragraphs 1-2: As above, were these the findings of note from the item sequence analyses? ‘Such as’ indicates there were others. ¬¬-Paragraphs 2-5 of the Discussion provide additional context for each of the 4 broad themes, referencing examples described in the results. 41. Page 25, paragraph 3: Was there any attempt to examine response patterns by respondent type? Or by setting? ¬¬It’s unclear what is meant by respondent type. We did not examine response patterns by setting—the analysis was conducted on the full set of data; we did examine response patterns by relationship of the respondent to the deceased (as described in “frame of reference” section). 42. Page 26, lines 371-4: As above, consistency with key terms – ‘cognitive testing’, used for the first time in the results and frequently thereafter, and here for the first time, ‘cognitive evaluation’, and prior with qualitative analysis could be confusing for readers unfamiliar with these methods. Introducing and explaining key terms in the methods section, and carrying these through the paper consistently would further strengthen the reporting of the research process and findings. – use of “cognitive testing” and “cognitive evaluation” has been removed throughout. 43. The discussion could include some attention to the wider debates on VA. How does, for example, this research contribute to the methodological transition of the method? – Have added a paragraph on this in the discussion, calling for additional work in various areas 44. Page 26, paragraph 2: The limitations are useful and relate to some comments above on how study settings were selected, and where data were drawn from, which, if raised in the methods, could be critically reflected on here. The authors may also wish to consider strengths of the approach, and future directions. 
Also, on page 27 (line 397) the approach is described as novel. It would be useful to understand what type of research or other information has informed previous iterations of the instrument, and how this approach is new/contributes to what has gone before. – have expanded details on limitations of where data were from in limitations; have added a paragraph (second to last in the Discussion) for future work; cognitive testing was also used to inform previous revisions of the instrument, but for simplicity, have removed the term “novel”. 45. Page 26, paragraph 2: While it is more customary in qualitative research, the authors may wish to reflect on their positionality and how this influenced the research process and results. –this was not included with the qualitative portion of the findings from which this work was drawn; the co-authors are happy to discuss further if needed. 46. Pages 26-7, lines 389-99: This reads as a useful conclusion. Is this section required for this type of paper, in this journal? Unclear- believe discussion sections starting with a summary of results are customary? 47. Pages 26-7, lines 389-99: Does the statement ‘Questionnaire revision decisions cannot be made from this evidence alone’ (page 24, line 339) contradict the subsequent statement ‘Findings from this investigation provide supporting evidence for the revision of the 2016 WHO verbal autopsy instrument.’ (page 26, lines 389-90)? See above, the authors may wish to consider articulating a series of directions for future research to inform decisions on revision of this instrument. –These findings were one of a set of criteria that were taken into consideration during the overall questionnaire revision process (other criteria included the significance of a symptom in assigning a cause of death, as measured by empirical data and by medical expert opinion). As noted above, we have added a paragraph on future direction. 
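As an aside for readers unfamiliar with the prevalence ratio (PR) measure referenced in items 34 and 41 above, a PR from a 2x2 cross-tabulation can be sketched as follows. The counts and the respondent grouping below are made up for illustration; they are not study data:

```python
# Minimal sketch of a prevalence ratio (PR) from a 2x2 cross-tabulation.
# All counts below are hypothetical, not drawn from the study.

def prevalence_ratio(exposed_yes, exposed_total, unexposed_yes, unexposed_total):
    """PR = prevalence of a response in one group / prevalence in the comparison group."""
    p_exposed = exposed_yes / exposed_total
    p_unexposed = unexposed_yes / unexposed_total
    return p_exposed / p_unexposed

# e.g. "don't know" responses cross-tabulated by respondent relationship
# (hypothetical): 30 of 100 non-household respondents vs 15 of 100 household
pr = prevalence_ratio(30, 100, 15, 100)
print(round(pr, 2))  # 2.0
```

A PR of 2.0 here would mean the response was twice as prevalent in the first group; the study's actual comparisons used the relationship of the respondent to the deceased.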
END OF REVIEW

Submitted filename: ResponsetoReviewers.docx

28 Mar 2022
PONE-D-21-12120R1
Mixed-Methods Analysis of Selected Issues Reported in the 2016 World Health Organization Verbal Autopsy Questionnaire

PLOS ONE

Dear Dr. Nichols,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by May 12, 2022. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

- A 'Response to Reviewers' letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
- A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
- An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Prof. Ritesh G. Menezes, M.B.B.S., M.D., Diplomate N.B.
Academic Editor
PLOS ONE

Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: (No Response)
Reviewer #4: (No Response)

2.
Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #4: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #4: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No
Reviewer #2: Yes
Reviewer #4: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #4: Yes

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above.
You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for the responses; the methods and origin of the different data sources are now much clearer – again, really interesting work! The most minor of things: in the abstract, the final sentence in the background should be removed, as it's a conclusion statement (although I think you added this on reviewer 3's request?), so maybe an editorial decision.

Reviewer #2: The authors have done a careful job in responding to the 3 reviewers' in-depth and relevant comments. This is an important manuscript which has hopefully already influenced work to continue to strengthen the VA tool. However, there remain a few targeted areas where the paper could be strengthened for reader understanding and impact.

Abstract: The explanation of mixed methods is not needed, as it is now commonly used, and can be dropped to allow for more detail (like which 2 countries were used for the qualitative work).

In the introduction the authors provide a nice description of the uses of mixed methods, but do not then state clearly which of these is being applied.

In the description of the GitHub data, they note "14 problematic issues for which solutions could be well-informed...." – I think these are 14 areas rather than groupings of issues? This should be clarified.

While edited, it is still not clear to a new reader that only one of the datasets has physician validation: “The reference dataset included the VA data as in the primary dataset combined with cause of death information determined by physician review of the VA interview (PCVA or physician certified VA) from the South Africa National Cause of Death Validation Study [22].” This should be explicitly stated, as well as any differences in how the VA may have been administered in the reference set versus the other data.
A reference on cognitive interviewing is needed. In addition, were the respondents interviewed also being interviewed for the VA? Were any changes made or requested after the cognitive interviewing when issues were identified in interpretation? As noted by a reviewer, a COREQ checklist should be completed as an appendix.

Given the importance of the qualitative work and emphasis on mixed methods, there remains a lack of description of the qualitative analysis (as was identified by a previous reviewer). This should be corrected and a reference for the methodology (and why it was chosen) added.

In the results, were there any differences in the 4 main areas based on either age group of the deaths (e.g. neonates versus older adults) or across countries? Any identified issues, for example, with linguistic or cognitive translation of questions? How was the reference dataset used beyond the use in injury?

The discussion is much stronger after revisions. I still did not see any discussion about the reference dataset (was that for accuracy of the VA versus physician diagnosis, and if so, results and discussion?). I would only add into limitations that differences based on age group of the deceased should be included.

Reviewer #4: Nichols et al. conducted a mixed-method analysis titled “Mixed-Methods Analysis of Selected Issues Reported in the 2016 World Health Organization Verbal Autopsy Questionnaire”, in which they show that the WHO VA questionnaire requires revisions and clarifications to improve the respondents' understanding of the questionnaire. In my opinion, the study can be improved by incorporating the following points:

1. The authors have not mentioned how they computed the cross-tabulations data and evaluated the significance of their results, such as with a chi-square test. Also mention the p value that was considered significant.

2. As the data was collected by the field teams, who were included in the field teams, such as doctors or nurses? More details can be mentioned.

3.
In the discussion (Lines 411-413), when the authors have compared the injury-related symptom question in deaths due to injury vs non-injury, they should elaborate more on the reason why the response was lower.

4. In WHO VA versions 1.5.2 and 1.5.3, which particular respondent characteristics affect the reliability of the response? This can be mentioned to improve this part of the discussion, as the authors have already highlighted that this is a sensitive issue.

5. Please add a reference for WHO VA 1.5.2/3.

6. The results state that clarity of construct was the ability to understand the terminologies and the intention of the question. However, the discussion lacks any explanation regarding the findings of clarity of construct and how this affects the questionnaire and the responses by the participants.

7. It is mentioned that an open narrative regarding the circumstances of death is also collected, but the benefits of this could be mentioned: for example, can this type of questioning give more detailed and qualitative data? The authors may give more references here to support this claim.

8. The manuscript needs to be proofread for grammatical mistakes.

9. The conclusion of the study needs to be improved. The authors may include particular summarized findings from the study, as a properly written conclusion affects the readers’ interest.

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #4: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
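Reviewer #4's first point concerns testing the significance of cross-tabulated response patterns, for example with a chi-square test. A minimal standard-library sketch of a Pearson chi-square test of independence on a 2x2 table is shown below; the counts are hypothetical and this is not the authors' analysis:

```python
# Pearson chi-square test of independence on a 2x2 cross-tabulation,
# illustrating the kind of test reviewer #4 suggests. Counts are made up.

def chi2_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # expected count for each cell under independence: row_total * col_total / n
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# e.g. "yes"/"no" responses to a symptom question, cross-tabulated by
# two hypothetical respondent groups of 100 each:
stat = chi2_2x2(30, 70, 15, 85)
print(round(stat, 2))  # chi-square statistic
print(stat > 3.84)     # exceeds the 0.05 critical value for 1 degree of freedom?
```

In practice one would also report the p value (e.g. via scipy.stats.chi2_contingency, which additionally applies a continuity correction for 2x2 tables by default).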
8 Jul 2022

The authors are grateful for the additional review and have addressed all comments as described in the Response to Reviewers.

Submitted filename: ResponsetoReviewers_12May2022.docx

26 Aug 2022

Mixed-Methods Analysis of Selected Issues Reported in the 2016 World Health Organization Verbal Autopsy Questionnaire

PONE-D-21-12120R2

Dear Dr. Nichols,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance.

To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Prof. Ritesh G. Menezes, M.B.B.S., M.D., Diplomate N.B.
Academic Editor
PLOS ONE

29 Sep 2022

PONE-D-21-12120R2

Mixed-Methods Analysis of Select Issues Reported in the 2016 World Health Organization Verbal Autopsy Questionnaire

Dear Dr. Nichols:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Prof. Dr. Ritesh G. Menezes
Academic Editor
PLOS ONE
  11 in total

1.  Combining the power of stories and the power of numbers: mixed methods research and mixed studies reviews.

Authors:  Pierre Pluye; Quan Nha Hong
Journal:  Annu Rev Public Health       Date:  2013-10-30       Impact factor: 21.981

2.  Comparing Cognitive Interviewing and Psychometric Methods to Evaluate a Racial/Ethnic Discrimination Scale.

Authors:  Bryce B Reeve; Gordon Willis; Salma N Shariff-Marco; Nancy Breen; David R Williams; Gilbert C Gee; Margarita Alegría; David T Takeuchi; Martha S Kudela; Kerry Y Levin
Journal:  Field methods       Date:  2011-08-25

Review 3.  Innovations in Mixed Methods Evaluations.

Authors:  Lawrence A Palinkas; Sapna J Mendon; Alison B Hamilton
Journal:  Annu Rev Public Health       Date:  2019-01-11       Impact factor: 21.981

4.  Factors influencing healthcare provider respondent fatigue answering a globally administered in-app survey.

Authors:  Vikas N O'Reilly-Shah
Journal:  PeerJ       Date:  2017-09-12       Impact factor: 2.984

5.  The WHO 2016 verbal autopsy instrument: An international standard suitable for automated analysis by InterVA, InSilicoVA, and Tariff 2.0.

Authors:  Erin K Nichols; Peter Byass; Daniel Chandramohan; Samuel J Clark; Abraham D Flaxman; Robert Jakob; Jordana Leitao; Nicolas Maire; Chalapati Rao; Ian Riley; Philip W Setel
Journal:  PLoS Med       Date:  2018-01-10       Impact factor: 11.069

6.  Verbal autopsy in health policy and systems: a literature review.

Authors:  Lisa-Marie Thomas; Lucia D'Ambruoso; Dina Balabanova
Journal:  BMJ Glob Health       Date:  2018-05-03

7.  Revising the WHO verbal autopsy instrument to facilitate routine cause-of-death monitoring.

Authors:  Jordana Leitao; Daniel Chandramohan; Peter Byass; Robert Jakob; Kanitta Bundhamcharoen; Chanpen Choprapawon; Don de Savigny; Edward Fottrell; Elizabeth França; Frederik Frøen; Gihan Gewaifel; Abraham Hodgson; Sennen Hounton; Kathleen Kahn; Anand Krishnan; Vishwajeet Kumar; Honorati Masanja; Erin Nichols; Francis Notzon; Mohammad Hafiz Rasooly; Osman Sankoh; Paul Spiegel; Carla AbouZahr; Marc Amexo; Derege Kebede; William Soumbey Alley; Fatima Marinho; Mohamed Ali; Enrique Loyola; Jyotsna Chikersal; Jun Gao; Giuseppe Annunziata; Rajiv Bahl; Kidist Bartolomeus; Ties Boerma; Bedirhan Ustun; Doris Chou; Lulu Muhe; Matthews Mathai
Journal:  Glob Health Action       Date:  2013-09-13       Impact factor: 2.640

Review 8.  Comparison of physician-certified verbal autopsy with computer-coded verbal autopsy for cause of death assignment in hospitalized patients in low- and middle-income countries: systematic review.

Authors:  Jordana Leitao; Nikita Desai; Lukasz Aleksandrowicz; Peter Byass; Pierre Miasnikof; Stephen Tollman; Dewan Alam; Ying Lu; Suresh Kumar Rathi; Abhishek Singh; Wilson Suraweera; Faujdar Ram; Prabhat Jha
Journal:  BMC Med       Date:  2014-02-04       Impact factor: 8.775

Review 9.  Prospects for automated diagnosis of verbal autopsies.

Authors:  Michel Garenne
Journal:  BMC Med       Date:  2014-02-04       Impact factor: 8.775

10.  Added value of an open narrative in verbal autopsies: a mixed-methods evaluation from Malawi.

Authors:  Patricia Loh; Edward Fottrell; James Beard; Naor Bar-Zeev; Tambosi Phiri; Masford Banda; Charles Makwenda; Jon Bird; Carina King
Journal:  BMJ Paediatr Open       Date:  2021-02-05
