Estimation Without Representation: Early Severe Acute Respiratory Syndrome Coronavirus 2 Seroprevalence Studies and the Path Forward.

Bonnie E Shook-Sa, Ross M Boyce, Allison E Aiello.

Abstract

The recent development and regulatory approval of a variety of serological assays indicating the presence of antibodies against severe acute respiratory syndrome coronavirus 2 has led to rapid and widespread implementation of seroprevalence studies. Accurate estimates of seroprevalence are needed to model transmission dynamics and estimate mortality rates. Furthermore, seroprevalence levels in a population help guide policy surrounding reopening efforts. The literature to date has focused heavily on issues surrounding the quality of seroprevalence tests and less on the sampling methods that ultimately drive the representativeness of resulting estimates. Seroprevalence studies based on convenience samples are being reported widely and extrapolated to larger populations for the estimation of total coronavirus disease 2019 (COVID-19) infections, comparisons of prevalence across geographic regions, and estimation of mortality rates. In this viewpoint, we discuss the pitfalls that can arise with the use of convenience samples and offer guidance for moving towards more representative and timely population estimates of COVID-19 seroprevalence.

Keywords: COVID-19; address-based sampling; convenience sampling; seroprevalence; transmission

Substances: Antibodies, Viral

Year: 2020 PMID: 32750135 PMCID: PMC7454696 DOI: 10.1093/infdis/jiaa429

Source DB: PubMed Journal: J Infect Dis ISSN: 0022-1899 Impact factor: 5.226

LIMITATIONS TO THE GENERALIZABILITY OF EARLY SEVERE ACUTE RESPIRATORY SYNDROME CORONAVIRUS 2 SEROPREVALENCE STUDIES

In addition to direct health impacts, coronavirus disease 2019 (COVID-19) has caused an unprecedented level of disruption to social networks and economic systems. Phased “re-opening” policies are being guided by surrogate measures of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission, including symptom-based (ie, syndromic surveillance), test-based (ie, positivity rates), and facility-based (ie, hospitalizations) measures of disease activity [1]. The United States has experienced shortages of critical testing supplies, which have contributed to underreporting of cases and struggling mitigation efforts [2]. Given the persistent issues many individuals face in accessing testing and the high proportion of subclinical infections that do not prompt care-seeking [3, 4], these metrics are suboptimal measures of disease activity in the community.

The recent development and rapid regulatory approval of serological assays indicating the presence of antibodies against SARS-CoV-2 has led to widespread implementation of seroprevalence studies [5]. These studies have used a wide range of assays and recruitment methods. Although issues surrounding test performance have been the focus of much debate [6], there has been much less discussion around the rigor and appropriateness of sampling frames. In this viewpoint, we illustrate some of the pitfalls associated with the use of convenience samples and provide guidance on best practices for quickly generating population-based estimates of seroprevalence.

To date, several large seroprevalence studies have been published in both the preprint and peer-reviewed literature. The majority of these studies have used convenience sampling, with participants recruited from online platforms (eg, Facebook), healthcare facilities, market research databases, or shopping centers [7-10].
The benefit of convenience sampling is that recruitment can occur relatively quickly and is generally less expensive than probability-based sampling approaches. The major disadvantage is that resulting estimates will often not reflect the true seroprevalence in the underlying population due to selection bias. That is, recruitment methods and inherent factors that drive participation in convenience samples often lead to samples that do not reflect the underlying population in terms of demographic composition and risk factors for COVID-19 infection. Moreover, it is very difficult to estimate the level of bias introduced by convenience sampling, especially when there are few population-based studies available for comparison. This can make extrapolation to underlying populations and estimation of mortality rates problematic.

Selection bias inherent in convenience samples is largely a result of competing factors that may influence an individual’s participation. For example, an individual with a prior COVID-19 infection might be more likely to volunteer for a seroprevalence study due to recent symptoms compared with someone without a prior infection who has been asymptomatic, thus leading to an overestimate of seroprevalence in the underlying population. Likewise, persons in shopping centers and other public areas where recruitment occurs may be at higher risk for COVID-19 than the general population. In contrast, persons who are avoiding shopping centers, perhaps due to underlying illness or the presence of high-risk individuals in the household, are essentially excluded from participating in seroprevalence studies with this recruitment method.

Other design features of convenience samples can result in underestimating seroprevalence. Studies based on social media recruitment have reported underrepresentation of older persons and overrepresentation of non-Hispanic whites [8].
However, African American and Hispanic populations have been disproportionately impacted by the COVID-19 pandemic [11]. Thus, underrepresentation of these groups in studies may contribute not only to underestimates of community seroprevalence but also to poorly targeted policy.

Although weighting methods [8] or modeling approaches [12] can be used to improve representativeness of convenience samples, such methods rely on assumptions that cannot be validated. These methods typically assume that participants are like a random sample from the population stratified by a known set of characteristics. When this assumption holds, these methods provide unbiased estimates. Violations occur when participation is driven by the outcome itself or when participation and risk for the outcome are driven by factors not accounted for in the analysis. This leaves researchers to speculate about how these errors offset one another and whether the results are truly representative [8, 10].

To illustrate some of these biases, we examined characteristics of the underlying population for a seroprevalence study conducted in Santa Clara County, California, in early April 2020 [8]. Participants were recruited using targeted Facebook advertisements and community listservs. Although researchers tried to recruit such that the distribution of participants would accurately reflect the population geographically, persons in wealthier areas were overrepresented. The sample also underrepresented men, persons 65 and older, Hispanics, and Asians. Researchers weighted the sample such that the weighted distributions of participants by ZIP Code, sex, and race/ethnicity would reflect known county demographics, and weighted estimates of seroprevalence were produced. This approach assumes that participants in the study are like a random sample of Santa Clara residents stratified by ZIP Code, sex, and race/ethnicity. There is some evidence to question the validity of this assumption.
Age, a factor correlated with COVID-19 risk [13], was not included as a weighting variable. Although 12.9% of Santa Clara residents are reported as being aged 65 or older, this age group made up only 4.5% of the weighted sample. Furthermore, characteristics such as occupation [14] and social distancing practices [15] are likely drivers of COVID-19 infection and were not accounted for in the weighting or analysis.

The results of the Santa Clara study and other convenience samples have been publicized widely in the media [5, 7, 16], and thus the estimates from these studies have the potential to influence policymakers.
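
The weighting argument in this section can be made concrete with a minimal sketch of post-stratification on a single variable. Everything in it is an assumption made for illustration: the 20-person sample, the 50/50 population split by sex, and the function names are hypothetical and are not taken from the Santa Clara study or from any cited method. The sketch only shows the general mechanism: weighting forces the weighted sample to match the population on the weighting variable, but leaves a variable that was not weighted on (here, age) out of balance.

```python
from collections import Counter


def poststratification_weights(groups, population_shares):
    """Return one weight per participant: population share / sample share of their group."""
    n = len(groups)
    sample_counts = Counter(groups)
    return [population_shares[g] / (sample_counts[g] / n) for g in groups]


def weighted_share(flags, weights):
    """Weighted proportion of participants whose flag is True."""
    return sum(w for flag, w in zip(flags, weights) if flag) / sum(weights)


# Hypothetical convenience sample of 20 people: 8 men, 12 women,
# and only one participant (a woman) aged 65 or older.
sex = ["M"] * 8 + ["F"] * 12
aged_65_plus = [False] * 19 + [True]

# Hypothetical population margins for the single weighting variable (sex).
weights = poststratification_weights(sex, {"M": 0.5, "F": 0.5})

male_share = weighted_share([s == "M" for s in sex], weights)
older_share = weighted_share(aged_65_plus, weights)

# Sex now matches its population target; age, which was not a weighting
# variable, stays well below a population share such as 12.9%.
print(f"weighted share of men:   {male_share:.3f}")
print(f"weighted share aged 65+: {older_share:.3f}")
```

If seroprevalence differs by age within each sex, the weighted estimate from such a sample remains biased, which is the concern raised above about unmeasured or omitted drivers of participation and risk.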

BEST PRACTICES FOR GENERATING A REPRESENTATIVE SAMPLING FRAME

Classic sample surveys achieve representation by ensuring that (1) all members of the population of interest have a chance of being included in the study, (2) members of the population are randomly selected for participation, and (3) researchers can quantify the chance that each sampled person was selected. These criteria are grounded in probability theory and have long been used to provide valid inference about target populations from concrete sampling frames, which are lists of population members from which samples are selected. Although sample surveys can suffer from generalizability concerns when there are problems with the sampling frame or participation, these errors have long been recognized, and methods have been developed to minimize errors throughout the survey process [17].

Representative surveys of the general population are commonly based on sampling frames constructed from lists of addresses or telephone numbers. If the goal of a research team is to estimate SARS-CoV-2 seroprevalence within a population residing in a single municipality, a representative estimate can be obtained using household sampling methods. Frasier et al [18] propose a design for representative COVID-19 seroprevalence studies in the United States using address-based sampling (ABS) methods. With ABS, samples are randomly selected from lists of mailing addresses derived from the US Postal Service’s database [19]. Participants are recruited by mail or in person, and testing is conducted in neighboring clinics or using self-administered test kits with at-home collection [18]. The ABS methodology has been validated in numerous settings and geographies to have high coverage of the general population [19-21]. Because selection is random and not driven by the participant or the researcher, selection bias due to the sampling method is eliminated.
Guidance for sample size determination [18, 22] and the logistics of conducting seroprevalence studies using ABS [18] are available. Sampling techniques such as stratification and clustering can facilitate efficient designs, logistical feasibility, and estimation for subpopulations of interest. Designs can incorporate oversampling of at-risk and vulnerable populations to allow for robust assessments of seroprevalence in these populations. The application of ABS methods to estimate SARS-CoV-2 seroprevalence is new, so characteristics of nonresponse are still unknown. Established methods to enhance community engagement as well as minimize, measure, and adjust for nonparticipation within the sample [17, 23] should be used to ensure that those who participate are representative of the target population.

Household sampling can be a time- and cost-intensive process. Because of the urgency to obtain seroprevalence estimates quickly, an efficient approach is to partner with an existing study already collecting representative data in the geography of interest. For example, we are working with a local health department that has an established household cohort for estimating factors related to population health in the county. The sample for the cohort has been selected, and participants have already been recruited to complete annual surveys. Representative estimates of seroprevalence will be obtained relatively quickly by recruiting within the existing study sample [24]. Likewise, researchers in Switzerland obtained representative estimates of seroprevalence by sampling former participants from a representative survey of population health [25]. Collaboration between researchers across disciplines and institutions can facilitate these types of timely but representative estimates of seroprevalence.
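
The three criteria for representation, together with stratification and oversampling, can be sketched as follows. This is a simplified illustration and not the design proposed by Frasier et al [18]: the address frame, stratum names, sample sizes, and test results are all hypothetical, and the function names are invented for this example. A real ABS study would also need within-household selection and nonresponse adjustments, which are omitted here. The point is only that when every address has a known, nonzero selection probability, the inverse of that probability can be used as a design weight, so an oversampled stratum does not distort the overall estimate.

```python
import random


def stratified_address_sample(frame, n_per_stratum, seed=0):
    """Draw a simple random sample of addresses within each stratum.

    frame: dict mapping stratum name -> list of address identifiers.
    Returns (address, stratum, design_weight) tuples, where the design
    weight is the inverse of the known selection probability.
    """
    rng = random.Random(seed)
    selected = []
    for stratum, addresses in frame.items():
        n = n_per_stratum[stratum]
        selection_prob = n / len(addresses)
        for address in rng.sample(addresses, n):
            selected.append((address, stratum, 1.0 / selection_prob))
    return selected


def weighted_prevalence(results, weights):
    """Design-weighted proportion of positive results (1 = positive, 0 = negative)."""
    return sum(w * r for r, w in zip(results, weights)) / sum(weights)


# Hypothetical frame: 9000 addresses in a lower-risk stratum and 1000 in a
# higher-risk stratum that is deliberately oversampled (100 drawn from each).
frame = {
    "lower_risk": [f"L{i}" for i in range(9000)],
    "higher_risk": [f"H{i}" for i in range(1000)],
}
sample = stratified_address_sample(frame, {"lower_risk": 100, "higher_risk": 100})
weights = [w for _, _, w in sample]

# Hypothetical results: 2 of 100 positive in the lower-risk stratum,
# 10 of 100 positive in the higher-risk stratum.
positives_to_assign = {"lower_risk": 2, "higher_risk": 10}
results = []
for _, stratum, _ in sample:
    if positives_to_assign[stratum] > 0:
        results.append(1)
        positives_to_assign[stratum] -= 1
    else:
        results.append(0)

unweighted = sum(results) / len(results)
weighted = weighted_prevalence(results, weights)
print(f"unweighted: {unweighted:.3f}, design-weighted: {weighted:.3f}")
```

In this toy setup the unweighted proportion overstates prevalence because the higher-risk stratum makes up half of the sample but only a tenth of the frame; the design weights restore each stratum to its share of the frame.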

CONCLUSIONS

Although they are more time-consuming and resource-intensive, representative samples are urgently needed to quantify seroprevalence of COVID-19 and to monitor disease trends over time. These studies will also serve as benchmarks for evaluating the performance of less rigorous methodologies, including convenience samples. Not all researchers can use probability-based sampling methods due to time and cost constraints. In these circumstances, estimates based on extrapolation from convenience samples should clearly outline the assumptions being made, and, when possible, results should be compared with benchmarks from probability-based studies. Although representative studies will take time, we caution against overinterpreting the results of convenience samples in the interim.

10 in total

1. Seroprevalence of SARS-CoV-2-Specific Antibodies Among Adults in Los Angeles County, California, on April 10-11, 2020.

Authors: Neeraj Sood; Paul Simon; Peggy Ebner; Daniel Eichner; Jeffrey Reynolds; Eran Bendavid; Jay Bhattacharya
Journal: JAMA Date: 2020-06-16 Impact factor: 56.272

2. Antibody tests suggest that coronavirus infections vastly exceed official counts.

Authors: Smriti Mallapaty
Journal: Nature Date: 2020-04-17 Impact factor: 49.962

Review 3. Why we need community engagement in medical research.

Authors: Jessica K Holzer; Lauren Ellis; Maria W Merritt
Journal: J Investig Med Date: 2014-08 Impact factor: 2.895

4. Estimating the burden of United States workers exposed to infection or disease: A key factor in containing risk of COVID-19 infection.

Authors: Marissa G Baker; Trevor K Peckham; Noah S Seixas
Journal: PLoS One Date: 2020-04-28 Impact factor: 3.240

5. Social distancing during the COVID-19 pandemic: Staying home save lives.

Authors: Brendon Sen-Crowe; Mark McKenney; Adel Elkbuli
Journal: Am J Emerg Med Date: 2020-04-02 Impact factor: 2.469

6. Cumulative incidence and diagnosis of SARS-CoV-2 infection in New York.

Authors: Eli S Rosenberg; James M Tesoriero; Elizabeth M Rosenthal; Rakkoo Chung; Meredith A Barranco; Linda M Styer; Monica M Parker; Shu-Yin John Leung; Johanne E Morne; Danielle Greene; David R Holtgrave; Dina Hoefer; Jessica Kumar; Tomoko Udo; Brad Hutton; Howard A Zucker
Journal: Ann Epidemiol Date: 2020-06-17 Impact factor: 3.797

7. Covid-19 in South Korea - Challenges of Subclinical Manifestations.

Authors: Joon-Young Song; Jin-Gu Yun; Ji-Yun Noh; Hee-Jin Cheong; Woo-Joo Kim
Journal: N Engl J Med Date: 2020-04-06 Impact factor: 91.245

8. Seroprevalence of anti-SARS-CoV-2 IgG antibodies in Geneva, Switzerland (SEROCoV-POP): a population-based study.

Authors: Silvia Stringhini; Ania Wisniak; Giovanni Piumatti; Andrew S Azman; Stephen A Lauer; Hélène Baysson; David De Ridder; Dusan Petrovic; Stephanie Schrempft; Kailing Marcus; Sabine Yerly; Isabelle Arm Vernez; Olivia Keiser; Samia Hurst; Klara M Posfay-Barbe; Didier Trono; Didier Pittet; Laurent Gétaz; François Chappuis; Isabella Eckerle; Nicolas Vuilleumier; Benjamin Meyer; Antoine Flahault; Laurent Kaiser; Idris Guessous
Journal: Lancet Date: 2020-06-11 Impact factor: 79.321

9. Developing antibody tests for SARS-CoV-2.

Authors: Anna Petherick
Journal: Lancet Date: 2020-04-04 Impact factor: 79.321

10. The important role of serology for COVID-19 control.

Authors: Amy K Winter; Sonia T Hegde
Journal: Lancet Infect Dis Date: 2020-04-21 Impact factor: 25.071
