
Mapping population vulnerability and community support during COVID-19: a case study from Wales.

Nina H Di Cara¹, Jiao Song², Valerio Maggio¹, Christopher Moreno-Stokoe¹,³, Alastair R Tanner¹, Benjamin Woolf¹,³, Oliver SP Davis¹,⁴, Alisha Davies².

Abstract

BACKGROUND: Disasters such as the COVID-19 pandemic pose an overwhelming demand on resources that cannot always be met by official organisations. Limited resources and human response to crises can lead members of local communities to turn to one another to fulfil immediate needs. This spontaneous citizen-led response can be crucial to a community's ability to cope in a crisis. It is thus essential to understand the scope of such initiatives so that support can be provided where it is most needed. Nevertheless, quickly developing situations and varying definitions can make the community response challenging to measure.

AIM: To create an accessible interactive map of the citizen-led community response to need during the COVID-19 pandemic in Wales, UK that combines information gathered from multiple data providers to reflect different interpretations of need and support.

APPROACH: We gathered data from a combination of official data providers and community-generated sources to create 14 variables representative of need and support. These variables are derived by a reproducible data pipeline that enables flexible integration of new data. The interactive tool is available online (www.covidresponsemap.wales) and can map available data at two geographic resolutions. Users choose their variables of interest, and interpretation of the map is aided by a linked bee-swarm plot.

DISCUSSION: The novel approach we developed enables people at all levels of community response to explore and analyse the distribution of need and support across Wales. While there can be limitations to the accuracy of community-generated data, we demonstrate that they can be effectively used alongside traditional data sources to maximise the understanding of community action. This adds to our overall aim to measure community response and resilience, as well as to make complex population health data accessible to a range of audiences. Future developments include the integration of other factors such as well-being.


Keywords: community resilience; coronavirus; data visualisation; geospatial; public health

Year: 2021 PMID: 34007892 PMCID: PMC8104153 DOI: 10.23889/ijpds.v5i4.1409

Source DB: PubMed Journal: Int J Popul Data Sci ISSN: 2399-4908

Background

Understanding the geographic distribution of need is crucial for localised and central agencies to provide relevant support. During a crisis this is particularly relevant as resources are likely to be overwhelmed. This process of vulnerability (or risk) mapping [1] is typically used in response to physical disasters, but the current COVID-19 pandemic has presented a global crisis in the field of public health. Whilst vulnerability to disease is a key risk to understand during a pandemic it is also crucial to consider that vulnerability to poor physical and mental health as a consequence of public actions (e.g. self-isolation) reflects existing social and economic inequalities such as financial security, and access to services and local support [2]. Evidence that the direct and indirect impacts of COVID-19 were greater amongst those already experiencing inequalities [3-6] was seen just months into the pandemic, including that these impacts reflected existing geographic distributions of inequality [7]. The challenges of meeting emerging needs in local communities can be somewhat mitigated by local resilience and citizen-led responses from existing or spontaneous community groups [8-10] which have the potential to improve the ability to withstand stress and survive adverse circumstances at both an individual and community level [11]. As such, it becomes crucial to understand which communities have the most need that cannot be mitigated by the available and emerging community support in each area [12, 13]. Strengthening community resilience is a global and national priority [14], set out in the United Nations Sustainable Development Goals and Well-being of Future Generations Act [15]. This emphasises the importance of curated and timely data that can capture the scale of community action. 
Data on the determinants of vulnerability, inequality, and community belonging is generally measured by annual government surveys and census data [16], but these methods are not often timely enough to capture a live assessment of localised well-being and support. During a crisis it is crucial for higher-level agencies and those organising support locally to have access to this information in order to enable more effective national and local action as well as to empower communities as partners in managing the impact of a disaster [17]. Citizen-led community support played a vital role towards the beginning of the UK lockdown, with the importance of digital communication quickly becoming apparent [18, 19]. In this situation, online platforms became hubs for spontaneous neighbourhood and community initiatives and provided a means to communicate and coordinate local resources; public support groups on Facebook, NextDoor and WhatsApp were being developed [20], alongside those led by existing third sector organisations, and community leaders [21, 22]. A survey by Supporting Communities in Northern Ireland [21] determined that 76% of community groups were communicating with local residents through social media. Previous research into environmental disasters has shown evidence that sourcing community generated information about local action from social media and crowd-sourcing platforms is possible [23-26], and has been employed as a live data source in several natural disasters [27, 28]. Post-hoc analysis has also revealed that useful data can be drawn from these sources [23, 29, 30], including levels of community resilience [31]. These findings show simultaneously the power of the internet for connecting people and understanding the workings of communities, and subsequently the potentially dire consequences of digital exclusion that exacerbates the lack of available support for those who are most in need [32].

Aims

The aims of the COVID-19 Response Map project are two-fold: (I) collate data that represents the scale of unmet need during the pandemic across geographic areas; and (II) create a bespoke data platform that would facilitate the exploration of this complex population health data. The need is represented by the populations who are most vulnerable to poor health outcomes from COVID-19, and hence its fulfilment corresponds to the level of community support, and the resources available to mitigate the impacts of those needs. The data-driven approach also aims to include non-traditional sources of data (e.g. social media and community-generated data) to supplement administrative and publicly available data. In this case study paper we set out the steps we took to fulfil these aims with specific reference to the country of Wales, in the United Kingdom. We approached this problem with a multi-disciplinary team of public health experts, statisticians, data scientists and researchers in human-computer interaction.

Approach

In this section we describe the systematic approach we took to identify, process and visualise community-level data. We start in "Definitions" by clarifying our intended definitions of the need, support, and vulnerability of local communities. Then, in "Datasets and data providers", we outline the data we identified to support these definitions and the data providers from whom we sourced them. "Data transformation pipeline" describes the subsequent data processing pipeline, focusing on how the design was developed to enable full reproducibility. Finally, "Data visualisation" describes the interactive mapping tool, emphasising the choices we made for effective data visualisation and easy data exploration.

Definitions

To approach the challenge of mapping the vulnerability of a local population across a specific geographic area, we first needed to define our interpretation of need and support. Need could be defined and measured in many ways [33]. However, in the context of the COVID-19 pandemic we primarily focused on the clinical and social vulnerability of a geographical community, as divided into three main themes. These themes are designed to cover the existing features of a community alongside the changing risks presented by the pandemic: (1) Health Vulnerabilities, (2) Transmission Risk, and (3) Deprivation and Exclusion. The first represents the proportion of the population who are vulnerable to poor health outcomes from COVID-19 [34, 35]; the second expresses the likelihood of becoming infected [36, 37]; the third considers contextual socioeconomic factors [7]. Quantifying the resilience and the support in a community can be challenging, mainly due to the many possible conceptualisations of resilience and its tendency to change over time [8]. Therefore, we defined resilience guided by known features that were likely to be expressed in available data. We again identified three themes of interest: (1) Support Resources, namely the known community assets or services that were supporting people in each area [8]; (2) Community Cohesion, the existing, measured cohesion of the community in each area; (3) Reported Support on Social Media, support being offered or reported on social media sites. Finally, we defined the vulnerability (or unmet needs [12]) of a specific area as the relative gap between local need and local support.

Datasets and data providers

After establishing the operational definitions of need and support, we underwent a process of scoping the data that were available to meet these definitions. To do so, we engaged in ongoing consultations with representatives from the public and the third sector, including the Welsh Government (WG), Third Sector Support Wales, Data Cymru, the County Voluntary Councils and the Wales Council for Voluntary Action (WCVA). After this process, we were able to characterise the list of Data Providers, as well as the individual Variables that could be captured from the data these providers could share. Data providers here refers to the organisations providing access to data. These data were either available publicly, and released under the terms of open licenses (e.g. GPL-v3 [38], Creative Commons [39]), or shared directly with us for the sake of the project. Identified data providers are (1) WCVA; (2) COVID-19 Mutual Aid UK; (3) National Health Service (NHS) Wales, through both the Informatics Service (NWIS) and Public Health Wales (PHW); (4) WG; (5) Office for National Statistics (ONS); (6) Secure Anonymised Information Linkage (SAIL); and (7) Twitter. As well as identifying available data, the scoping process also revealed challenges in sourcing data about online community support. The majority of online conversation about local support services was taking place either through private conversation channels (e.g. WhatsApp), on Facebook groups or on neighbourhood social media platforms [40]. Gathering data from some of these sources was undesirable (for instance, private messaging) or unsuccessful due to companies being unwilling to disclose commercially sensitive information, or not providing application programming interface (API) access to social media platforms. 
The final set of Variables collected were 14 indicators of the concepts we sought to capture (eight for local need (N), and six for local support (S)): (N1) COVID-19 high risk; (N2) COVID-19 moderate risk; (N3) Over 65 age; (N4) COVID-19 cases; (N5) Population density; (N6) Welsh index of deprivation; (N7) No Internet access; (N8) No online GP registration; (S1) WCVA registered volunteers; (S2) WCVA increase in volunteers; (S3) Mutual aid community support group; (S4) Sense of community belonging; (S5) Symptoms tracker: can count on someone close; (S6) Twitter community support. Each Variable has been defined to match a specific theme of interest from the definitions of need and support. In terms of data architecture, each theme represents a single logical Dataset composed of a group of Variables. Figure 1 shows a comprehensive diagram mapping each Variable to the originating Data Provider. Each Variable is also grouped by the matched Dataset. A short description of each Dataset is reported below, along with a Summary table for each of the corresponding Variables. Further information on the details of each Variable (e.g. data frequency and geographic resolution) is reported in Supplementary Appendix A.

Figure 1: Diagram representing the mapping between Data Providers, and corresponding Variables for local need (N), and local support (S). Each box represents the Dataset each variable belongs to

Local need - health and vulnerabilities

This dataset brings together existing clinical vulnerabilities in the population as expressed by the National Health Service (NHS) definitions for those at moderate risk (clinically vulnerable) and high risk (clinically extremely vulnerable) from COVID-19 [34]. While high-risk individuals are well defined (see Table 1, N1), the population of those at moderate risk is less specific. In order to index moderate risk (Table 1, N2), we constructed a proxy measure based on the NHS definition by finding the number of people who met one or more of the following criteria as reported in the National Survey for Wales 2018-19 [42]: (A) Aged 70+; (B) Asthma diagnosis; (C) Heart or circulatory illness; (D) Respiratory system illness; (E) Kidney complaints; (F) Other digestive complaints including stomach, liver, pancreas etc.; (G) Learning disability; (H) Diabetes (including hyperglycaemia). We also separately included those over age 65 (Table 1, N3), due to specific vulnerabilities around age that users of the tool may wish to explore independently from the coronavirus risk.
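
The "one or more criteria" rule behind the moderate-risk proxy (N2) can be illustrated with a minimal sketch. The field names below are hypothetical stand-ins for the survey variables, the sample records are invented, and survey weighting is deliberately ignored, so this is an illustration of the logic rather than the calculation used for the map.

```python
# Hypothetical field names for criteria (A)-(H); the real survey variables differ.
CRITERIA = (
    "aged_70_plus",
    "asthma",
    "heart_or_circulatory",
    "respiratory",
    "kidney",
    "digestive",
    "learning_disability",
    "diabetes",
)


def is_moderate_risk(respondent):
    """Return True if the respondent meets one or more of the criteria."""
    return any(respondent.get(criterion, False) for criterion in CRITERIA)


def moderate_risk_percentage(respondents):
    """Unweighted percentage of respondents flagged as moderate risk."""
    if not respondents:
        raise ValueError("at least one respondent is required")
    flagged = sum(is_moderate_risk(r) for r in respondents)
    return 100.0 * flagged / len(respondents)


# Invented records for illustration only.
sample = [
    {"aged_70_plus": True},
    {"asthma": True, "diabetes": True},  # two criteria still count once
    {},
    {"kidney": False},
]
print(moderate_risk_percentage(sample))
```

A respondent who meets several criteria is counted once, which is what distinguishes this proxy from summing the prevalence of each condition.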

Table 1: Summary description of variables included in the

ID	Variable	Data provider	Short description	Benefits and limitations
N1	COVID-19 High Risk	Public Health Wales by request, with permission from Welsh Government	The percentage of the population who are high risk, also known as “shielding” or clinically extremely vulnerable.	This information was timely and provided by an official source, but does assume that records are correct and could miss those who are not in contact with services.
N2	COVID-19 Moderate Risk	National Survey for Wales via UK Data Service	Percentage of the population who are at moderate risk from coronavirus, based on responses to the National Survey for Wales 2018–19.	This is a proxy variable, and so not an exact measure. The response rate is 54.2%, and nationally representative [41]. However, results are one year old, and may not include some “in-need” groups, e.g. elderly not living at home.
N3	Over Age 65	ONS available on statswales.gov.wales	Percentage of the population who are aged 65 years or older.	The population over 65 is based on modelled projections by the ONS.

Local need – transmission risk

This dataset characterises the risk of transmission through contact with others [36], as represented by the number of cases in a given area [43] (Table 2, N4) relative to its population density (Table 2, N5), which directly affects contact rates [37].

Table 2: Summary description of variables included in the

ID	Variable	Data provider	Short description	Benefits and limitations
N4	COVID-19 Cases	Public Health Wales publicly available	The cumulative number of confirmed cases.	Very timely data, but only includes confirmed cases, therefore an underestimate of the true no. of cases at any given time.
N5	Population Density	ONS available on statswales.gov.wales	No. people per square kilometre based on 2018 mid-year estimates.	Similarly to N3 (Table 1), these figures are based on projections made by the ONS

Local need – deprivation and exclusion

This dataset comprises information characterising the deprivation of communities. The Welsh Index of Multiple Deprivation (IMD) is the official measure for relative deprivation in small areas, and ranks every Lower Super Output Area (LSOA) in Wales from the most to the least deprived, based on factors such as income, employment, access to services and community safety [2]. We chose to include it as a well-established and high-quality index of some important environmental determinants of health and well-being (Table 3, N6). Given the importance of digital connectivity in access to support and services we also considered digital exclusion. Digital exclusion can be represented in different ways [44]; here we were able to include data on the ability to access the internet from home (Table 3, N7), and the proportion of patients registered with online services at their GP surgery (Table 3, N8).

Table 3: Summary description of variables included in the

ID	Variable	Data provider	Short description	Benefits and limitations
N6	Welsh Index of Multiple Deprivation (WIMD)	WG available on Statswales	At LSOA level this is a ranked list of all Welsh LSOAs by level of deprivation. At Local Authority (LA) level this is the percentage of LSOAs in each LA that are in the top 20% most deprived nationally.	The WIMD is measured at a small area level and is a high quality statistic of the multiple facets of deprivation.
N7	Digital Exclusion: No Internet Access	National Survey for Wales via UK Data Service	Percentage of the population without access to the internet as reported in the National Survey for Wales 2018–19.	Similarly to N2 (Table 1) the National Survey is over one year old, but was nationally representative at the time of the survey.
N8	Digital Exclusion: Not Registered with Online GP Services	NHS Wales Informatics Service by request	Percentage of total patients who are not registered with their GP’s online patient service.	This data measures the uptake of digital services across the whole of Wales at a high geographic resolution. However, it does only include people registered with an NHS practice in Wales.
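
As a rough illustration of the Local Authority level summary described for N6, the sketch below computes the percentage of LSOAs in each LA that fall within the 20% most deprived nationally. It assumes that rank 1 denotes the most deprived LSOA; the area codes, the lookup table and the function name are hypothetical.

```python
from collections import defaultdict


def pct_lsoas_in_most_deprived_fifth(lsoa_ranks, lsoa_to_la):
    """Percentage of each LA's LSOAs that are in the national top 20% most deprived.

    lsoa_ranks: {lsoa_code: rank}, assuming rank 1 is the most deprived.
    lsoa_to_la: {lsoa_code: local_authority_name}.
    """
    cutoff = 0.2 * len(lsoa_ranks)  # ranks at or below this are in the top 20%
    totals = defaultdict(int)
    deprived = defaultdict(int)
    for lsoa, rank in lsoa_ranks.items():
        la = lsoa_to_la[lsoa]
        totals[la] += 1
        if rank <= cutoff:
            deprived[la] += 1
    return {la: 100.0 * deprived[la] / totals[la] for la in totals}


# Ten invented LSOAs ranked 1..10; the first four belong to LA "A".
ranks = {f"L{i}": i for i in range(1, 11)}
lookup = {f"L{i}": ("A" if i <= 4 else "B") for i in range(1, 11)}
la_summary = pct_lsoas_in_most_deprived_fifth(ranks, lookup)
print(la_summary)
```

In this toy example two of the four LSOAs in "A" are among the two most deprived nationally, so "A" scores 50% and "B" scores 0%.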

Local support – support resources

The WCVA is a national organisation with a central record of volunteers which, whilst not representing all forms of volunteerism, gives a measure of the distribution of registered volunteers (Table 4, S1). Their monthly reports of volunteer numbers also allowed us to derive the percentage increase in volunteers between the beginning of the pandemic and June 2020 (Table 4, S2). To understand the distribution of community groups we turned to the open database collected by Police Rewired, which brings together COVID-19 Mutual Aid groups registered by community members [20], groups on the community networking app LocalHalo (www.localhalo.com), and council community hubs (see Table 4, S3).

Table 4: Summary description of variables included in the

ID	Variable	Data provider	Short description	Benefits and limitations
S1	WCVA Registered Volunteers	WCVA by request	Number of volunteers who have signed up with the WCVA to provide voluntary support (per 100 people)	Covers the whole of Wales, but does not record volunteers registered with other organisations such as directly with charities.
S2	WCVA Increase in Volunteers	WCVA by request	The percentage increase in volunteers between 13th March 2020 and 18th May 2020.	As in (S1), this will not capture all volunteers registered through other organisations, or casual support (e.g. helping neighbours).
S3	Mutual Aid Community Support Groups	COVID-19 Mutual Aid and LocalHalo via Police Rewired available openly	Locations of local community support groups submitted by the public.	This provides exact locations for community groups, but not information about the size of the organisation. Not all community groups will be registered online.
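
The two volunteer measures (S1 and S2) are simple rates. A minimal sketch of the arithmetic follows; the function names are hypothetical and the figures are invented, not WCVA data.

```python
def volunteers_per_100(volunteers, population):
    """Registered volunteers per 100 people in an area."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * volunteers / population


def percentage_increase(before, after):
    """Percentage change in volunteer numbers between two dates."""
    if before <= 0:
        raise ValueError("the starting count must be positive")
    return 100.0 * (after - before) / before


# Invented figures: 200 volunteers rising to 250 in an area of 10,000 people.
print(volunteers_per_100(250, 10_000))
print(percentage_increase(200, 250))
```

The percentage increase is undefined for an area that starts with no registered volunteers, which is why the sketch raises an error in that case rather than returning a value.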

Local support – community cohesion

This dataset includes two variables that index self-reported community cohesion. The first was a question in the National Survey for Wales 2018–19 that asks respondents how strongly they agree with the statement “I belong to my local area” (see Table 5, S4). The second was a question included as part of the sign-up process for the COVID-19 Symptom Tracker app [45], whose data was made available via the SAIL data bank (Table 5, S5). The question asks the user if they “could count on someone close to them if they need help”.

Table 5: Summary description of variables included in the

ID	Variable	Data provider	Short description	Benefits and limitations
S4	Sense of Community Belonging	National Survey for Wales via UK Data Service	The percentage of people who agreed, or strongly agreed with the statement “I belong to my local area” in the National Survey for Wales 2018–19.	As with N2 and N7 (see Tables 1 and 3), the survey results are nationally representative but now over a year old. The sample frame may also have missed some “in need” groups, such as the elderly not living at home.
S5	Symptom Tracker: Can Count on Someone Close	ZOE Symptom Tracker App via Secure Anonymised Information Linkage (SAIL) Databank	The percentage of people who agreed that they could count on someone close to them if they need help.	The sample is limited to those who have a smartphone and internet access. The response rate may not be representative of the population the respondents are from.

Local support – reported support on social media

To quantify relative local levels of support reported on social media, we collected data from Twitter (www.twitter.com), a well-known social networking platform that allows users to share public updates of under 280 characters in length known as “tweets” (see Table 6, S6).

Table 6: Summary description of variables included in the

ID	Variable	Data provider	Short description	Benefits and limitations
S6	Twitter Community Support	Twitter via the Streaming API	Number of Twitter users identified as having posted at least one tweet about community support since 9th March 2020, as a percentage of total users in each area.	If the underlying determinants of Twitter use are associated with levels of support then this variable could be misleading. Location of each tweet is not exact, so we matched the most likely LA, weighted by the approximate population.

Publicly available data from Twitter were accessed via Twitter’s Streaming API [46, 47] between 9th March and 15th June 2020, retrieving tweets whose Twitter place field was in Wales. The API returns a random sample of the total tweets from the specified area, up to a maximum of 1% of the total worldwide traffic [46]. The tweets returned by the API contain both the text of the tweet and associated meta-data. These meta-data allowed us to identify the Local Authority each tweet was most likely sent from using an automatic matching method based on the percentage overlap of a tweet’s bounding box with Local Authority geographic boundaries [48], weighted by the approximate population of the overlapping areas. To find tweets that were expressing community support we first used a keyword driven approach to obtain a shortlist of tweets that matched words relating to community (the full criteria are available in Supplementary Appendix B). We then qualitatively reviewed the shortlist of tweets to generate the set of tweets that we, as human coders, deemed to be indicative of positive community support. To test the effectiveness of human evaluation of community support indicators on Twitter, two researchers classified 3,215 tweets from the initial shortlist with an inter-rater reliability of 0.44 (Cohen’s kappa) [49], which led to refinement of our inclusion criteria. Our final qualitative review criteria are listed in full in Supplementary Appendix B. The final data we included in the map was the percentage of total unique users in each area who had posted at least one tweet coded as indicative of community support. In total 860,304 tweets from Wales were retrieved in the time frame, corresponding to 27,805 unique users. Of these, 6,640 tweets were shortlisted for coding using the keyword-based query, from which 972 tweets from 540 unique users were coded as being indicative of community support.
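
For readers unfamiliar with the agreement statistic reported above, Cohen's kappa compares the observed agreement between two coders with the agreement expected by chance from each coder's label frequencies. The sketch below is a generic textbook implementation, not the code used in the study, and the labels are invented.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's marginal frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is total")
    return (observed - expected) / (1.0 - expected)


# Invented labels: 1 = indicative of community support, 0 = not.
coder_1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
coder_2 = [1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

In the toy data the coders agree on 7 of 10 items, yet kappa is only about 0.35 because most of that agreement is expected by chance when one label dominates.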

Data transformation pipeline

Considering the multitude of data sources needed to gather Variables, as well as their different formats (e.g. JSON, CSV, TSV, HTML), we defined a fully automated approach to harmonise and aggregate the data. This idea was originally motivated by our intention to guarantee a completely reproducible complex data pipeline, and transparent data documentation. Moreover, this systematic procedure favours our requirement for easy extensibility, both in terms of processing operations and of additional data sources. A sketch of the defined transformation pipeline is represented in Figure 2. The pipeline is composed of four consecutive steps, aimed at extracting the target Variables from original data sources, aggregating them into the corresponding dataset, and finally preparing them in a format compliant with the interactive mapping tool. Most of this analysis has been carried out using the pandas library [50, 51], and the Python programming language (version 3.7.7). The source code and the technical documentation are publicly available on GitHub [52], along with specific instructions to recreate the development environment.

Figure 2: A schematic of the data processing pipeline

Extract: The input data source is processed in order to extract the data relating to the target Variable. This usually corresponds to grouping and filtering operations on the original data to retain only the information that is relevant to the target Variable. This is the only step of the whole pipeline that has to be customised and adapted to the specific format and layout of the original dataset. Nonetheless, the pipeline keeps track of all the transformations applied to the data so that they can be replicated and reproduced. The consistency of the extracted data is verified via automated testing procedures.

Harmonise: The aim of this step is to encapsulate extracted data into a tidy [53] and unified data layout. This step is crucial to allow generic transformation operations that can be re-used regardless of the specific format of the original data. To do so, a generic Variable data abstraction is generated as an output of this step.

Transform: During the transformation step, each collated Variable is subject to a series of transformations specific to the data at hand. The structure of the transformation pipeline for each Variable is dynamically defined via a series of generic and re-usable operators that build on the harmonised data layout abstractions. Examples of these operations are numeric format alignment, percentage calculation, and data pivoting and transposition. As in the extraction step, each applied transformation is logged for future replicability.

Aggregate: The last step of the pipeline aggregates the multiple Variables into their corresponding Dataset, where they are matched by frequency and corresponding geographic resolution. Aggregated data are then formatted in GeoJSON to be integrated into the mapping tool.
Detailed information about the Variables themselves, including numeric transformations applied to them is given in Supplementary Appendix A, as well as being fully documented on our code repository [52].
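
The four steps can be illustrated with a toy sketch in plain Python. This is not the project's pandas-based implementation (see the repository [52] for that): the `Variable` class, the function names and the input values below are all hypothetical, and geometries are left empty for brevity.

```python
import csv
import io
import json


class Variable:
    """Hypothetical harmonised container: one value per area plus an operation log."""

    def __init__(self, name, values):
        self.name = name
        self.values = dict(values)
        self.log = []

    def apply(self, label, func):
        """Transform every value with a re-usable operator and record the step."""
        self.values = {area: func(v) for area, v in self.values.items()}
        self.log.append(label)
        return self


def extract(raw_csv, area_col, value_col):
    """Source-specific step: keep only the columns relevant to the Variable."""
    rows = csv.DictReader(io.StringIO(raw_csv))
    return [(row[area_col], row[value_col]) for row in rows]


def harmonise(name, pairs):
    """Wrap extracted (area, value) pairs in the unified layout."""
    variable = Variable(name, pairs)
    variable.log.append("harmonise")
    return variable


def aggregate(variables):
    """Join Variables on their shared areas into a GeoJSON FeatureCollection."""
    areas = sorted(set.intersection(*(set(v.values) for v in variables)))
    features = []
    for area in areas:
        props = {"area": area}
        props.update({v.name: v.values[area] for v in variables})
        # Geometry is omitted here; a real feature would carry the area boundary.
        features.append({"type": "Feature", "geometry": None, "properties": props})
    return {"type": "FeatureCollection", "features": features}


# Invented input values for illustration only.
raw = "la,over65\nCardiff,0.14\nPowys,0.27\n"
over65 = harmonise("over_65", extract(raw, "la", "over65"))
over65.apply("to_float", float).apply("to_percent", lambda x: round(100 * x, 1))
collection = aggregate([over65])
print(json.dumps(collection, indent=2))
print(over65.log)
```

Only `extract` knows about the source layout; everything after it operates on the harmonised container, which is what allows the operators to be re-used across Variables and every step to be logged.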

Data visualisation

Vulnerability maps traditionally pinpoint the location of a natural disaster alongside information about the local area [54]. However, since our intention is to specifically identify unmet need, we adopted a bivariate approach that allowed us to combine indices of both need and support on a single map. Our approach to the visualisation is based on a choropleth map, a map whose colours represent a summary statistic relevant to each geographic area, in combination with a linked scatter plot. This plot displays the local need against local support for each considered area (see Figure 3).

Figure 3: Illustration of mapping a composite need score using the number of people at high risk, population density and deprivation against an area’s sense of community belonging

Users are able to select the Variables of interest in relation to need and support from a drop-down list. Where users select more than one Variable to index need or support, each Variable is transformed to a z-score, the z-scores are summed, and the sum is normalised using a standard scaling procedure (that is, to zero mean and unit variance). This gives an equally weighted combined score that is fast enough to calculate in the browser, and that we considered accurate enough for visualisation. Furthermore, if a user removes all Variables from one dimension (need or support), the scatter plot automatically collapses to a univariate bee-swarm plot (see Figure 4).
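
The composite score can be sketched as follows. The tool itself computes this in the browser in JavaScript; the Python version below is only an illustration of the arithmetic, it assumes the population standard deviation is used for scaling, and the input values are invented.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Scale values to zero mean and unit variance (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(v - mu) / sigma for v in values]


def composite_score(variables):
    """Equally weighted composite of several Variables measured over the same areas.

    variables: list of equal-length lists, one list per selected Variable.
    """
    standardised = [z_scores(v) for v in variables]
    summed = [sum(area_values) for area_values in zip(*standardised)]
    return z_scores(summed)  # rescale the sum to zero mean and unit variance


# Two invented need Variables measured over the same four areas.
high_risk = [1.0, 2.0, 3.0, 4.0]
density = [10.0, 10.0, 30.0, 50.0]
need = composite_score([high_risk, density])
print([round(x, 2) for x in need])
```

Standardising each Variable before summing is what makes the weighting equal: without it, a Variable measured on a larger numeric scale (such as population density) would dominate the composite.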

Figure 4: Illustration of mapping a composite need score using the number of people at high risk, population density and deprivation

Data points are coloured according to a scale based on the z-score for support minus the z-score for need. This gives a visual index of how close a data point falls to the bottom right quadrant of the plot, which corresponds to an area with high need and low support, in contrast to the top left quadrant, which corresponds to an area with low need and high support (Figure 5). These colours are mirrored on the accompanying choropleth map. We chose the colours so that red consistently represents areas of greater need, and blue consistently represents areas of greater support, with the intensity of the colours representing distance from the main bisecting line. Colours are interpolated in Hue-Chroma-Luminance colour space to maintain a perceptually constant colour scale.
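
A minimal sketch of the index behind the colouring is given below. It only computes the sign and magnitude that would drive a red-to-blue scale; the Hue-Chroma-Luminance interpolation is not reproduced, and the function names and the "neutral" label for points exactly on the bisecting line are assumptions made for illustration.

```python
import math


def balance_index(need_z, support_z):
    """Support z-score minus need z-score; negative values mean need outweighs support."""
    return support_z - need_z


def describe_point(need_z, support_z):
    """Return the hue family and the distance from the line support = need."""
    index = balance_index(need_z, support_z)
    # Perpendicular distance from the bisecting line in the need/support plane.
    distance = abs(index) / math.sqrt(2)
    if index < 0:
        hue = "red"      # greater need than support
    elif index > 0:
        hue = "blue"     # greater support than need
    else:
        hue = "neutral"  # assumption: points on the line get no hue
    return hue, distance


print(describe_point(1.5, -0.5))   # high need, low support
print(describe_point(-1.0, 1.0))   # low need, high support
```

Because the intensity depends only on the difference between the two z-scores, two areas with very different absolute levels of need receive the same colour whenever their support offsets that need to the same degree.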

Figure 5: The colouring used to indicate the level of support or need for each area

Since the Variables are available at different geographical resolutions, the results are presented at the highest resolution that is available for all the selected variables. Hovering the mouse over an area on the map, or over a data point in the scatter plot, highlights the area in both views, and labels the area in the scatter plot. Zooming in to the choropleth map reveals further geographical detail, including the location of specific community support groups (Figure 6). Clicking on one of these locations gives more information about the group, including a direct link to the group’s web site.
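
The rule for choosing the display resolution can be sketched as picking the finest resolution shared by every selected Variable. The availability table below is illustrative only (the actual resolutions for each Variable are given in Supplementary Appendix A), and the names are hypothetical.

```python
# Resolutions ordered from coarsest to finest.
RESOLUTIONS = ["LA", "LSOA"]

# Illustrative availability only; not the real table from Appendix A.
AVAILABLE = {
    "wimd": ["LA", "LSOA"],
    "no_online_gp": ["LA", "LSOA"],
    "community_belonging": ["LA"],
}


def finest_common_resolution(selected, available=AVAILABLE):
    """Return the finest resolution at which every selected Variable is available."""
    common = set(RESOLUTIONS)
    for name in selected:
        common &= set(available[name])
    for resolution in reversed(RESOLUTIONS):  # try the finest first
        if resolution in common:
            return resolution
    raise ValueError("the selected Variables share no common resolution")


print(finest_common_resolution(["wimd", "no_online_gp"]))
print(finest_common_resolution(["wimd", "community_belonging"]))
```

Adding a single Variable that exists only at Local Authority level therefore moves the whole map to that coarser level, which is the behaviour described above.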

Figure 6: Community groups are marked on the map at higher zoom levels as dark grey points

The data visualisation was programmed in JavaScript using the D3 visualisation library (version 5, www.d3js.org) and the Mapbox API (version 1.11.0; www.mapbox.com). The tool is available online at www.covidresponsemap.wales or www.mapymatebcovid.cymru, and is supported by an explanatory web page and a comprehensive user guide.

Discussion

In response to the need for an understanding of how the citizen-led response to the COVID-19 pandemic was meeting the needs of local communities, we have developed the COVID-19 Response Map project: an online interactive map, available in English (www.covidresponsemap.wales) and Welsh (www.mapymatebcovid.cymru), that measures local levels of need and community action with a novel combination of data sources. The map uses a bespoke visualisation design that allows users to explore any combination of variables of interest to them, and makes it possible for non-specialists to derive meaningful insights from complex population data. This means that important information about local well-being and needs is available to everyone involved in disaster response, from community and third sector organisers to the government. Although other efforts have created maps of the vulnerability of communities to COVID-19, notably the British Red Cross [55], we have approached this in a different way, allowing users to explore how flexibly-defined local need and support are related to each other, facilitating the identification of areas where the local need or vulnerability is not currently being met by local community support. Our approach also integrates non-traditional data sources such as Twitter and crowd-sourced data, which provide a unique perspective on how we can understand the workings of communities, both in a crisis situation and outside of it. Whilst many of the data sources we used are open (available for anyone to download), the task of sourcing and combining them is not trivial and requires access to key data owners, time and data-centric skills. This is because the original data are each available in their own format and layout, and needed to be processed and harmonised in order to be integrated into a single output for comparison.
In doing so, we have developed a systematic data processing strategy that ensures the reproducibility of our whole approach: every operation applied to the data is recorded, and automated testing is used to verify data consistency. With 42% of charities reporting that they are poor at managing, using and analysing data [56], having user-friendly tools available to combine and interpret population data is important. Beyond creating the tool, sharing our documentation, data and code openly [52, 57] makes this work available to the public sector, so that organisations can reuse elements or refine it to their needs. In turn, we are continuing to work with local and national organisations to further adapt it to their requirements.
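
The harmonisation and consistency checking described above can be illustrated with a minimal sketch. This is not the project's actual pipeline: the field names, variable names, area codes and values below are hypothetical, and the checks shown are only assumptions about the kind of consistency test such a pipeline might run.

```python
# Hypothetical raw extracts: each provider uses its own field names and layout.
source_a = [  # e.g. a CSV export where values arrive as strings
    {"LAD19CD": "W06000015", "pct_over_65": "14.2"},
    {"LAD19CD": "W06000011", "pct_over_65": "20.1"},
]
source_b = [  # e.g. a spreadsheet with untidy area codes and integer counts
    {"area": "w06000015 ", "groups": 31},
    {"area": "W06000011", "groups": 12},
]

# Illustrative subset of area codes used for validation (an assumption).
KNOWN_AREAS = {"W06000015", "W06000011"}

# Every operation applied to the data is appended here so a run can be audited.
operations_log = []


def harmonise(rows, code_field, value_field, variable):
    """Map one provider's layout onto a shared (area_code, variable, value) shape."""
    records = [
        {
            "area_code": str(row[code_field]).strip().upper(),
            "variable": variable,
            "value": float(row[value_field]),
        }
        for row in rows
    ]
    operations_log.append(
        f"harmonised {len(records)} rows for '{variable}' from field '{value_field}'"
    )
    return records


def check_consistency(records):
    """Automated checks: known areas only, no duplicate entries, no negative values."""
    seen = set()
    for rec in records:
        key = (rec["area_code"], rec["variable"])
        assert rec["area_code"] in KNOWN_AREAS, f"unknown area {rec['area_code']}"
        assert key not in seen, f"duplicate entry {key}"
        assert rec["value"] >= 0, f"negative value for {key}"
        seen.add(key)


combined = harmonise(source_a, "LAD19CD", "pct_over_65", "over_65_pct") + harmonise(
    source_b, "area", "groups", "community_groups"
)
check_consistency(combined)
```

Keeping every source in one long "area, variable, value" shape means a new dataset only needs its own small mapping step, while the same checks and the same log apply to all of them.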

Development process

This community support tool was developed in collaboration with the Welsh Government, local councils, voluntary groups and the public sector. It has been received as a welcome contribution to the challenge of democratising access to data and mapping the complexities of communities. Local councils particularly wished to overlay their own data sources, which were sometimes not suitable for public dissemination, and to directly add lists of community groups. Feedback from community organisations and local charities has highlighted the value of better understanding what other local offers of support exist so that they can work together to streamline their response. This feature of the map demonstrates its contribution to mapping and understanding community resilience more widely, as the ability to measure and visualise this complex concept will enable better support for communities who are struggling outside of the coronavirus pandemic. This is an area for further exploration.

Strengths and limitations

The strength of this tool lies in combining multiple data sources in an interpretable way, and bringing together sources of openly available data on community mobilisation and support groups to provide a novel perspective on community resilience and need. We identified a national register of community groups on a central database [20], which was helpful to provide local level information on community action, but there are limitations to community-generated data sources. Since the database relies on individuals to register their groups online it is not comprehensive, and many community groups were already known to residents through existing channels [19, 21]. It also relies on this information being maintained by individuals in order to remain up-to-date and reliable. Through the engagement exercises we undertook we also found that local authorities were holding databases of community groups that served their specific populations; these were more likely to be up to date, but were not open data. To capture informal community mobilisation and support we also drew on social media data from Twitter. Social media has the potential to offer new insights in public health with the added benefit of being extremely timely; our approach to finding community support online did reveal many explicit examples of support being offered or received, or local support groups being advertised. However, it was challenging to rigidly classify community support, as reflected in the inter-rater reliability of 0.44 that we achieved with human evaluators. This in turn made it difficult to establish a reliable automated method for assessing tweets, which would have improved timeliness. There also remains the potential for such data to be misleading or incorrect [58]. Another challenge of combining multiple sources is the differing detail available in terms of timescales and granularity.
From Twitter data that are recorded to the millisecond to census data that are collected once every ten years, the time-based variation means that the present-day accuracy of variables may be unknown. There are also differing degrees of geographic specificity available for mapping. The majority of data sources we have presented are available at a Local Authority District resolution, which in Wales corresponds to only 22 areas [48]. Data at Middle Super Output Area level would be the ideal resolution to aggregate relevant information and still have meaningful depth, but restrictions on granularity mean this is often not possible. The last challenge we faced in combining data sources is the data that do not exist. The map shows that there are potential benefits of drawing on community-generated data through Twitter and the COVID-19 Symptom Study, but these applications are inaccessible to the 13% of people in Wales who have no internet at home [32]. Of this population, more than 70% are over 70 years old, and 25% have a low level of general health [32]; as such, those without internet access represent some of the most vulnerable members of the population who are not being reached through these emerging data sources. It is for this reason that we deemed it especially important to provide information on digital exclusion.
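
To give a sense of what an inter-rater reliability such as the 0.44 reported above for tweet classification measures, the sketch below computes Cohen's kappa for two raters. This section does not state which statistic was used, so Cohen's kappa is an assumption here, and the labels are invented for illustration rather than taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for ten tweets ("support" = community support, "other" = not).
rater_a = ["support", "support", "other", "other", "support",
           "other", "other", "support", "other", "other"]
rater_b = ["support", "other", "other", "other", "support",
           "support", "other", "support", "other", "support"]

kappa = cohen_kappa(rater_a, rater_b)
print(round(kappa, 2))
```

In this invented example the raters agree on 7 of 10 tweets, but half of that agreement would be expected by chance, so kappa comes out at about 0.4. This shows how a moderate value can arise even when raw agreement looks fairly high.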

Conclusion and future directions

Granular, localised and timely information on community resilience will help to direct support to those areas most in need, which is of significant importance given the contribution of communities to general population health and well-being. We have implemented an approach that allows people at all levels of community response to explore complex population data about the distribution of the citizen response to need. Our approach identified key datasets relevant to community support and community need which extended beyond traditional data collection methods for public health; these non-traditional data sources can be timelier than official datasets and add new dimensions to our understanding of communities, but it is also important to understand the limitations in their accuracy. Future developments will include incorporating medium- to long-term impacts on communities such as mental well-being. As the pandemic progresses and the research around its direct and indirect impact on health continues to evolve, we aim to build on the approach we have developed by identifying new and existing data sources with a specific focus on community vulnerability and support. Given the need for more real-time and longitudinal information, we would like to use data from Twitter to measure mental health and mood in communities [59, 60]. We will also continue to evaluate the timeliness of our existing datasets, and intend to update the National Survey data with the 2019-20 collection when it becomes available.
