
Ethical Considerations in the Application of Artificial Intelligence to Monitor Social Media for COVID-19 Data.

Lidia Flores1, Sean D Young1,2.   

Abstract

The COVID-19 pandemic and its related policies (e.g., stay-at-home and social-distancing orders) have increased people's use of digital technology, such as social media. Researchers have, in turn, utilized artificial intelligence to analyze social media data for public health surveillance. For example, through machine learning and natural language processing, they have monitored social media data to examine public knowledge and behavior. This paper explores the ethical considerations of using artificial intelligence to monitor social media to understand the public's perspectives and behaviors surrounding COVID-19, including potential risks and benefits of an AI-driven approach. Importantly, investigators and ethics committees have a role in ensuring that researchers adhere to the ethical principles of respect for persons, beneficence, and justice in a way that moves science forward while ensuring public safety and confidence in the process.
© The Author(s), under exclusive licence to Springer Nature B.V. 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Keywords:  Artificial intelligence; COVID-19; big data; ethics; social media

Year:  2022        PMID: 36042870      PMCID: PMC9406274          DOI: 10.1007/s11023-022-09610-0

Source DB:  PubMed          Journal:  Minds Mach (Dordr)        ISSN: 0924-6495            Impact factor:   5.339


Background

The emergence of SARS-CoV-2 created a high demand for telehealth and expedited health information as the implementation of lockdowns across the globe disrupted public mobility (Williamson, 2020; Young & Schneider, 2020). Many sectors have had to increase their use of technology to minimize disruptions to their services. Since the initial days of the pandemic, the use of telemedicine and remote learning has increased. Approximately half of US adults began using social media more frequently during the pandemic, with 29.7% of social media users increasing daily use by one to two hours (Williamson, 2020). Artificial intelligence (AI) continues to play an important role in daily life in domains such as entertainment, transportation, and the criminal justice system. Within the scope of this paper, AI is defined as the algorithms and methodologies used to perform decision-making tasks in the collection and analysis of online research data (Hou et al., 2020). AI subfields, such as machine learning, have been used by health researchers to analyze data collected from social media. This use of AI has raised ethical questions about the protection of users’ online data and the need for standardized regulation of digital technologies (Young & Garett, 2018; Young et al., 2021). AI-driven health surveillance, which often relies on publicly available data, is a potentially important complement to traditional public health surveillance, and traditional methods of epidemiology and statistical analysis of case reports stand to benefit from it. By contrast, traditional health surveillance, performed by organizations such as the Centers for Disease Control and Prevention (CDC), involves the systematic collection of data from primary sources such as hospitals and handles personally identifiable information (PII) governed by the Health Insurance Portability and Accountability Act (HIPAA) (CDC, 2019). 
Applied to health data, AI can rapidly sift through large volumes of information to draw conclusions, expediting the retrieval of results. Disease-related data often require expedited interventions, especially during the COVID-19 pandemic, and AI can help accelerate their processing. This paper explores the ethical questions surrounding the use of artificial intelligence to monitor the COVID-19 pandemic through the collection and analysis of social media data on platforms such as Twitter, excluding other data sources such as online search queries (e.g., Google) and mobility patterns.

Social Media and Artificial Intelligence

Digital epidemiology, or the use of web-based data for analysis, surveillance, and prediction of diseases, may offer real-time insight into public knowledge and reaction. For researchers using web-based data, Twitter, Google, and websites/platforms have typically been the preferred sources of data, followed by blogs/forums, Facebook, and other search engines (Mavragani, 2020). For instance, in the months surrounding December 2019, investigators retrospectively searched a popular social media app for COVID-19-related keywords and concluded that monitoring social media for specific COVID-19-related keywords may have the capacity to detect outbreaks earlier than traditional surveillance systems (Wang et al., 2020). Similarly, surveillance of digital tools such as social media, search engines, and even e-commerce marketplaces provided insights into risk perceptions and emotive responses to the virus. This surveillance strategy also allowed for the detection of rumors and misinformation surrounding the pandemic that led to panic purchases. As public reactions and behaviors were displayed online, these tools afforded public officials the opportunity to counter misinformation and curtail undesirable behaviors (Hou et al., 2020). AI was also instrumental in detecting a cluster of pneumonia cases in China prior to the announcement of COVID-19 by the World Health Organization (Stieg, 2020). Machine learning facilitates rapid analysis of large amounts of data from social networking sites and search engines for surveillance purposes (Lampos et al., 2021; Sun et al., 2020) or for predicting cases (Qin et al., 2020). Machine learning and natural language processing, both facets of AI technology, were used to analyze data from sources such as public health reports, digital media, livestock reports, population demographics, and global airline ticketing data to detect early signs of a respiratory outbreak in China (Stieg, 2020). 
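The keyword-monitoring idea described above can be illustrated with a minimal sketch. The Python below counts daily posts that mention illness-related keywords and flags days that spike above a baseline; the posts, keyword list, baseline, and threshold rule are all hypothetical, and real systems such as those in the cited studies rely on trained models rather than simple substring matching.

```python
from collections import Counter
from datetime import date

# Hypothetical keyword list; studies such as Wang et al. (2020) curated
# their own COVID-19-related terms.
KEYWORDS = {"fever", "cough", "pneumonia", "shortness of breath"}

def daily_keyword_counts(posts):
    """Count posts per day that mention at least one tracked keyword.

    `posts` is an iterable of (day, text) pairs; no user identifiers
    are read, in keeping with aggregate-only analysis.
    """
    counts = Counter()
    for day, text in posts:
        lowered = text.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            counts[day] += 1
    return counts

def flag_spikes(counts, baseline, factor=3.0):
    """Return days whose keyword-post count exceeds factor x baseline."""
    return sorted(day for day, n in counts.items() if n > factor * baseline)

# Invented posts for illustration.
posts = [
    (date(2019, 12, 1), "Lovely weather today"),
    (date(2019, 12, 1), "Bad cough going around the office"),
    (date(2019, 12, 15), "fever and cough, third day now"),
    (date(2019, 12, 15), "Hospital says pneumonia cases are up"),
    (date(2019, 12, 15), "My cough will not stop"),
    (date(2019, 12, 15), "Pneumonia again in our building"),
]
counts = daily_keyword_counts(posts)
spikes = flag_spikes(counts, baseline=1.0)
```

In practice the baseline would itself be estimated from historical data, and natural language processing models would handle misspellings, negation, and context that substring matching misses.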
However, due to the rising trend of using social media data in surveillance methods, the gravity of the pandemic, and errors found in the development and application of AI models (Buolamwini & Gebru, 2018; Zhao et al., 2017), ethical concerns about using AI on social media data to monitor COVID-19 warrant examination. Due to the novelty of these methods, committees must determine protocol guidelines within their decision-making processes based on their own experience and expertise. Relatedly, there is a gap in the literature on the ethics of conducting this type of research. The United States currently has federal and state-level regulations for protecting online data, such as the California Online Privacy Protection Act (Netwrix, n.d.; Privacy Policies, n.d.). Protection laws such as these cover personally identifiable information and its connection to browsing data. It is important to note that, because laws vary across states and countries, researchers are expected to stay up to date on such policies.

Belmont Report

The Belmont Report, published in 1979, identifies basic ethical principles that serve as guidelines for human subjects research (Office for Human Research Protections [OHRP], 2010). It is based on the following three principles: respect for persons, beneficence, and justice. In this paper, we examine each of the three principles through the lens of AI-enabled social media research. The ethical principles outlined in the Belmont Report relate to biomedical and behavioral research. Advancements in technology and new avenues of conducting health research bring forth new ethical concerns and harms that the Belmont Report does not directly address. It is important for researchers, and particularly those who sit on institutional review boards (IRBs), to evaluate the principles found within the Belmont Report and determine how they may be applied to the analysis of social media data through an AI-enabled lens (Office for Human Research Protections [OHRP], 2018). This paper begins to facilitate a conversation on how Belmont Report principles may be applied to new avenues of health research to address the ethical concerns and harms that arise.

Ethical Principles

Respect for Persons

The first principle of the Belmont Report is respect for persons, which requires that participants enter research voluntarily and are provided with sufficient information about the research being conducted. One of the first ethical considerations researchers must address when conducting human subjects research is informed consent. However, analyses of social media data are typically not considered or recognized as human subjects research by the scientific research community or by institutional review boards across the US. Hence, informed consent is typically not mandatory for this research. Yet, although informed consent is not enforced for publicly available data, it should not be disregarded within the research process, especially as funders might require additional ethical review. In research that applies AI methods to publicly sourced data, validating consent for these datasets can be challenging and is often not feasible. Hence, upholding ethical research principles when informed consent is not required is of the utmost importance and a subject for discussion. Big data protocols that draw on social media posts for surveillance often exclude informed consent. Researchers who attempt to collect informed consent are often deterred by challenges unique to big data. One such challenge is contacting and gathering consent from thousands or millions of online participants; logistically, this task may take weeks or months to complete, posing a limitation for the research team. Researchers often prefer to move forward with analyzing publicly available data rather than take on this logistical task. The question remains: how can researchers exercise principles of the Belmont Report, such as respect for persons, when informed consent is not possible? 
Updating the Belmont Report or drafting new principles that consider the complexities of conducting research with public internet data may address this issue. Furthermore, the public nature of social media data often qualifies research studies that utilize it as “exempt”. Research studies classified as “exempt” do not need to go through a thorough IRB process and are viewed as “no risk” or “minimal risk” studies. However, this brings forth possible harms, since social media data on COVID-19 may contain sensitive health information that can be linked back to personally identifiable information. It is important to study the harms that may arise when classifying social media research as “exempt”. While approaches involving social media and artificial intelligence hold great potential, further evaluation is needed to assess the potential risks to PII when informed consent is absent. Investigators who have studied ethical concerns in using social media data have found that participants often felt that researchers were “eavesdropping” and invading their privacy (Reuter et al., 2019). These types of concerns provide additional questions for future research to address on the ethics of such approaches. Ultimately, it will be up to each investigator, the ethical review board, and key stakeholders to ensure that uses of publicly available data maintain ethical standards, especially in the case of COVID-19, in which findings may have negative consequences for COVID-19-positive individuals (Kim & Denyer, 2020). Notably, there is a distinction between surveillance using aggregated data and researchers scraping data to evaluate individual data points. Twitter’s API policy prohibits researchers from inferring or deriving information about an individual user’s health (“More on restricted use cases—Twitter Developers”, n.d.). Twitter only permits aggregated analyses on sensitive topics, such as health, as long as personal data and identifiers (e.g., usernames) are not stored. Aggregated data should not reveal or store personally identifiable information, but should instead depict a broader portrayal of a group. Aggregated data on sensitive characteristics such as race, ethnicity, sexual preference, and age may pose a greater threat to privacy, and hence require deidentification. Prior work has shown that characteristics such as race and sexual preference can be inferred from social media data and used to predict outcomes such as HIV status. For instance, researchers using AI to predict the likelihood of a Twitter user contracting COVID-19 based on their historical tweets may warrant concern due to the granularity of individual-level data. Using AI methods on individual-level health data in this way is prohibited under Twitter’s policy; however, it is important to note the distinctions between analyzing aggregated versus individual-level social media data.
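The aggregated-versus-individual distinction discussed above can be sketched in a few lines: the function below computes topic-level counts from hypothetical (username, text) records while deliberately discarding the usernames, so no user-level information survives aggregation. The topic vocabulary and records are invented for illustration.

```python
from collections import Counter

def aggregate_topic_counts(records, topic_terms):
    """Return per-topic post counts, discarding user identifiers.

    Only the post text is inspected; usernames are deliberately unused,
    so no per-user information survives aggregation.
    """
    counts = Counter()
    for _username, text in records:  # identifier ignored by design
        lowered = text.lower()
        for topic, terms in topic_terms.items():
            if any(term in lowered for term in terms):
                counts[topic] += 1
    return dict(counts)

# Hypothetical records and topic vocabulary, for illustration only.
records = [
    ("@alice", "Just got my test result back, feeling anxious"),
    ("@bob", "Masks required again at work"),
    ("@carol", "Vaccine appointment booked for Friday"),
]
topics = {
    "testing": ["test result"],
    "masking": ["mask"],
    "vaccination": ["vaccine"],
}
summary = aggregate_topic_counts(records, topics)
```

The output describes the group, not any individual, which is the property platform policies on sensitive topics require.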

Beneficence

Within COVID-19 social media research, new harms may arise for participants. The Belmont Report describes the principle of beneficence as follows: “persons are treated in an ethical manner not only by respecting their decisions and protecting them from harm, but also by making efforts to secure their well-being”. Due to limitations in acquiring informed consent, it may not be possible for all users to know ahead of time that their data will be used for research. Hence, users must rely on a social media platform’s terms and conditions to be aware of their data’s involvement in research. Social media platforms, such as Twitter, often provide terms and conditions that inform users of privacy features and voluntary research involvement. The following excerpt is from Twitter’s Terms and Conditions: “By submitting, posting or displaying Content on or through the Services, you grant us a worldwide, non-exclusive, royalty-free license (with the right to sublicense) to use, copy, reproduce, process, adapt, modify, publish, transmit, display and distribute such Content in any and all media or distribution methods now known or later developed (for clarity, these rights include, for example, curating, transforming, and translating)” (“Twitter Terms of Service”, n.d.). These Terms and Conditions state that by using Twitter, a user consents to their content being processed, analyzed, and published. Twitter’s privacy statement then addresses how a user may control the personal information they share with Twitter through their profile settings: “We give you control through your settings to limit the data we collect from you and how we use it, and to control things like account security, marketing preferences, apps that can access your account, and address book contacts you’ve uploaded to Twitter” (“Privacy Policy”, n.d.). 
A concern with Twitter’s Terms and Conditions being the primary way of informing participants of their research involvement is whether users actually read a company’s Terms and Conditions page, especially when these pages are lengthy. We encourage AI researchers exploring COVID-19 social media data to evaluate the ways in which the principle of beneficence can be upheld despite the circumstance that users may be unaware of their participation. For instance, privacy may be protected through the privacy options presented in Twitter’s Terms and Conditions, yet users may not be aware of these features. Social media users have the choice of making their accounts public or private on certain platforms. This feature allows users to keep their data private, hence opting out of participation in research studies such as those mentioned in this paper. However, Twitter may still collect user data on usage patterns, purchases, device information, etc. for its own research and product testing (“Twitter Privacy Policy”, n.d.). Additionally, in creating their social media profiles, users may opt to use pseudonyms rather than personally identifiable information, such as their full name. This may enhance privacy by protecting a person’s identity; however, it is still possible to discover this information by exploring a user’s tweets, profile, past conversations, followers, etc. One method of addressing the issue of privacy is thus for users to rely on privacy settings to protect their data. Confidentiality may be protected by deidentifying data and strictly using aggregated social media data in research studies. Aggregated datasets do not display personally identifiable information, hence protecting the confidentiality of health information that may be revealed within tweets. This may address the public’s concerns about health surveillance during the COVID-19 pandemic. 
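As one illustration of the deidentification step mentioned above, the sketch below scrubs common identifier patterns (handles, URLs, email addresses) from post text before analysis. The patterns are deliberately narrow, hypothetical examples; real deidentification requires a far broader treatment of personally identifiable information.

```python
import re

# Deliberately narrow, hypothetical patterns: URLs, emails, handles.
URL = re.compile(r"https?://\S+")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
HANDLE = re.compile(r"@\w+")

def deidentify(text):
    """Replace common identifier patterns with neutral placeholders.

    Order matters: emails are scrubbed before handles so that the
    '@domain' part of an address is not half-matched as a handle.
    """
    text = URL.sub("[URL]", text)
    text = EMAIL.sub("[EMAIL]", text)
    text = HANDLE.sub("[USER]", text)
    return text

cleaned = deidentify(
    "Thanks @dr_smith! Details at https://example.com or email jane.doe@example.org"
)
```

Names, locations, and rarer identifiers would still slip through, which is why deidentification is usually combined with aggregation rather than relied on alone.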
Calvo and colleagues describe the concept of “surveillance creep” as surveillance used beyond its original intended purpose, at which point it becomes more intrusive (Calvo et al., 2020). In using AI to monitor public reactions and behaviors during an unprecedented event, the following concern arises: to what degree will technologies be used in the future to safeguard public interest over individual privacy? Deidentifying and aggregating data may be one method of protecting individual privacy. This methodology may allow researchers to prioritize the well-being of the public while protecting individual-level data, especially in circumstances where COVID-19 diagnoses or symptoms are monitored. This relates to the principle of beneficence within the Belmont Report by protecting participants from the harm of having their private health information revealed. These ethical concerns present additional questions on how best to develop and implement these methods to safeguard privacy and strengthen data security. Health surveillance within the scope of government monitoring, such as tracking which citizens have contracted COVID-19, may elicit a different response than research methods that use AI to gather a larger picture of cases. For instance, in China, COVID surveillance systems have been used to target individuals and dissidents, creating concerns surrounding privacy and free will. This is one of the many possible futures that individuals fear may occur within the US if regulations and ethics are not considered when utilizing social media data (Buckley et al., 2022). The concept of “surveillance creep”, and public reactions to it, may differ depending on how health surveillance is used and which AI methods are implemented. Hence, AI methods that aggregate data may be viewed differently than methods that tie data to personally identifiable information. Concerns such as these may be addressed with guidelines that specifically address research using AI to analyze COVID-19 social media data.

Justice

The concept of justice focuses on fairness and holds that the benefits and risks of research should be equally distributed. With respect to social media, the data collected and analyzed may be influenced by a software’s capacity to extract information. Information shared by social media users, such as geolocation, may be informative for answering research questions, but may also misrepresent certain regions. For instance, if a research study utilizes tweets with geolocation metadata attached, all tweets not containing geocoordinates might be excluded from the study. Notably, the inclusion and exclusion of certain participants may vary depending on several factors. When developing research questions that utilize social media data, researchers must also account for biases and errors in data collection, including lack of representativeness. Since internet access may be limited across different regions, social media data may over-represent the ideas and experiences of people with internet access. Subsequently, those with restricted access may have a limited voice. For example, despite recent trends showing an increase in the adoption of digital technologies, rural Americans continue to lag behind nonrural Americans in home broadband internet connectivity (Perrin, 2019). Furthermore, findings showed that rural Americans spend less time on the internet than nonrural Americans, which may lead to further differences in analyses and reduce the generalizability of findings across the United States. It is important to note that traditional health surveillance and social media research both have drawbacks in relation to bias. Within the scope of this paper, we define bias as “prejudice in favor of one group compared to another” (Simpson & Weiner, 1989). Social media demographics may misrepresent communities by drawing only on data from individuals who have internet access and use social media (Cesare et al., 2019; Hargittai, 2020). 
Hence, data from individuals without internet access are left out of analyses and conclusions, leading to biased results. On the other hand, traditional health surveillance also encounters issues with bias when certain patients are targeted for diagnostic tests, leading to more diagnoses within that group (Haut & Pronovost, 2011). Data collected from incomplete information, due to issues such as limited access and biased data collection, may not reflect an accurate picture of the reality some face amidst the pandemic. Similarly, errors in the development of natural language processing models may lead to the wrongful flagging of keywords or to predictions that do not generalize across a region (Wei, 2020; Seyyed-Kalantari et al., 2021). A combination of these errors may create challenges in addressing the pandemic. Researchers must evaluate the extent to which these limitations affect the ethics of their research in order to best address them.
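The geolocation-filtering bias described above can be demonstrated with a small, hypothetical sketch: restricting a sample to geotagged posts shifts its apparent regional composition toward the groups that geotag.

```python
def shares_by_region(posts):
    """Fraction of posts contributed by each region."""
    counts = {}
    for post in posts:
        counts[post["region"]] = counts.get(post["region"], 0) + 1
    total = len(posts)
    return {region: n / total for region, n in counts.items()}

# Invented posts: geotagging is more common in the urban group.
posts = [
    {"region": "urban", "geo": (34.05, -118.24)},
    {"region": "urban", "geo": (40.71, -74.01)},
    {"region": "urban", "geo": None},
    {"region": "rural", "geo": None},
    {"region": "rural", "geo": None},
]

all_shares = shares_by_region(posts)
geo_shares = shares_by_region([p for p in posts if p["geo"] is not None])
```

In this toy sample, rural posts vanish entirely from the geotagged subset, mirroring the representativeness concern discussed in this section.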

Conclusion

As the application and utility of AI in COVID-19 social media data research continue to progress, investigators and ethics committees need to evolve alongside the technology. Social media health research has shown promising potential and is being utilized for HIV surveillance (Heerden & Young, 2020), observation of opioid usage across the US (Flores & Young, 2021), and understanding COVID-19 perceptions (Ugarte et al., 2021). The introduction of AI methods to COVID-19 social media research brings forth new ethical concerns and potential harms that require addressing (Ahmed et al., 2021; Jalil et al., 2022). Research principles from the Belmont Report may be used to establish best practices for utilizing social media data. Although the creation of standardized guidelines for the ethical conduct of research with digital tools across institutions presents challenges, it merits discussion to preserve the principles of respect for persons, beneficence, and justice. While principles found in the Belmont Report may guide this research, AI’s novelty and unique challenges present additional ethical questions. Potential harms such as constraints on informed consent, limited awareness of privacy settings, and restricted representativeness create ethical concerns that require addressing. Future research, discussion, and attention are needed on the ethics of AI outcomes and how they might be standardized and implemented as these technologies and methods continue to evolve. Evaluating AI-enabled COVID-19 social media research through a human subjects research lens, such as that provided by the Belmont Report, may be a step forward in promoting and cultivating ethical practices.
References (17 in total)

1.  Clinical Care, Research, and Telehealth Services in the Era of Social Distancing to Mitigate COVID-19.

Authors:  Sean D Young; John Schneider
Journal:  AIDS Behav       Date:  2020-07

2.  COVID-19 Related Sentiment Analysis Using State-of-the-Art Machine Learning and Deep Learning Techniques.

Authors:  Zunera Jalil; Ahmed Abbasi; Abdul Rehman Javed; Muhammad Badruddin Khan; Mozaherul Hoque Abul Hasanat; Khalid Mahmood Malik; Abdul Khader Jilani Saudagar
Journal:  Front Public Health       Date:  2022-01-14

3.  Infodemiology and Infoveillance: Scoping Review.

Authors:  Amaryllis Mavragani
Journal:  J Med Internet Res       Date:  2020-04-28       Impact factor: 5.428

4.  Using WeChat, a Chinese Social Media App, for Early Detection of the COVID-19 Outbreak in December 2019: Retrospective Study.

Authors:  Wenjun Wang; Yikai Wang; Xin Zhang; Xiaoli Jia; Yaping Li; Shuangsuo Dang
Journal:  JMIR Mhealth Uhealth       Date:  2020-10-05       Impact factor: 4.773

5.  Public Attitudes About COVID-19 in Response to President Trump's Social Media Posts.

Authors:  Dominic Arjuna Ugarte; William G Cumberland; Lidia Flores; Sean D Young
Journal:  JAMA Netw Open       Date:  2021-02-01

6.  Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations.

Authors:  Laleh Seyyed-Kalantari; Haoran Zhang; Matthew B A McDermott; Irene Y Chen; Marzyeh Ghassemi
Journal:  Nat Med       Date:  2021-12-10       Impact factor: 87.241

7.  Detecting sentiment dynamics and clusters of Twitter users for trending topics in COVID-19 pandemic.

Authors:  Md Shoaib Ahmed; Tanjim Taharat Aurpa; Md Musfique Anwar
Journal:  PLoS One       Date:  2021-08-09       Impact factor: 3.240

8.  Regional variation in discussion of opioids on social media.

Authors:  Lidia Flores; Sean D Young
Journal:  J Addict Dis       Date:  2021-02-11

9.  Prediction of Number of Cases of 2019 Novel Coronavirus (COVID-19) Using Social Media Search Index.

Authors:  Lei Qin; Qiang Sun; Yidan Wang; Ke-Fei Wu; Mingchih Chen; Ben-Chang Shia; Szu-Yuan Wu
Journal:  Int J Environ Res Public Health       Date:  2020-03-31       Impact factor: 3.390

10.  Public Concern About Monitoring Twitter Users and Their Conversations to Recruit for Clinical Trials: Survey Study.

Authors:  Katja Reuter; Yifan Zhu; Michael Zimmer; Praveen Angyan; NamQuyen Le; Akil A Merchant
Journal:  J Med Internet Res       Date:  2019-10-30       Impact factor: 5.428
