Modeling the optimization of COVID-19 pooled testing: How many samples can be included in a single test?

Abstract

Objectives: This study tries to answer the crucial question of how many biological samples can be optimally included in a single test for COVID-19 pooled testing.
Methods: It builds a novel theoretical model which links the local population to be tested in a region, the number of biological samples included in a single test, the "attitude" toward resource cost saving and time taken in a single test, as well as the corresponding resource cost function and time function, together. The numerical simulation results are then used to formulate the resource cost function as well as the time function. Finally, a loss function to be minimized is constructed and the optimal number of samples included is calculated.
Results: In a numerical example, we consider a region with a population of 1 million that needs to be tested for COVID-19 infection. The solution gives the optimal number of biological samples included in a single test as 4.254 when the time taken is given a weight of 50% under an infection probability of 10%. Other combinations of numerical results are also presented.
Conclusions: In our simulation results, given an infection probability of 10%, setting the number of biological samples included in a single test (at the integer level) within [4,6] is reasonable for a wide range of subjective attitudes between time and resource costs. Therefore, in current practice, 5-mixed samples appear preferable to the commonly used 10-mixed samples.

Keywords: COVID-19; Machine learning; Pooled testing; Resource cost function; Time function

Year: 2022 PMID: 35966127 PMCID: PMC9357440 DOI: 10.1016/j.imu.2022.101037

Source DB: PubMed Journal: Inform Med Unlocked ISSN: 2352-9148

Introduction

With the rapid spread of the COVID-19 pandemic around the globe, the need for testing for SARS-CoV-2 on a massive scale has increased dramatically. The detection of SARS-CoV-2 RNA by Polymerase Chain Reaction (PCR) technology is a common and sensitive method of COVID-19 diagnosis [1,2], even for reinfection [3]. In fact, whether such diagnostic tests are rapid and accurate enough can be the key to containing the pandemic [4,5]. After the outbreak in early 2020, to support the full-scale resumption of work in Wuhan, China, 9 million biological samples were collected from May 15 to May 24, 2020, of which more than 6.5 million were laboratory-tested [6,7]. In mid-August 2020, as school reopening was imminent, COVID-19 testing on a massive scale with fast results was expected in the US [8]. On October 11, 2020, a small wave of outbreaks took place in Qingdao, China [9]. As a rapid response, a very large population was quickly tested for COVID-19. A similar situation emerged in Chengdu, Shijiazhuang, Shenyang, and some other cities in China at the end of 2020. Mass testing has proven to be an efficient urgent response for disease control [10–14], especially at the screening stage [15–18], where frequency and turnaround time are more important than test sensitivity [19]. In practice, this would reduce the economic shock of the COVID-19 pandemic to a large extent [20,21]. In addition, widespread COVID-19 testing in both clinical and non-clinical settings has received much attention [22–25]. Unfortunately, failing to test possible COVID-19 cases early enough can have tragic consequences for the containment of outbreaks [26]. In fact, the limited availability of COVID-19 testing may be one important reason for the disastrous outcome of the pandemic in many places [27].
In the meantime, a growing number of studies examine the practical performance of, and testing strategies for, SARS-CoV-2 [28,29], and optimal and mathematics-based testing strategies have begun to be discussed [30,31]. The pooled testing technique has become increasingly popular since July 2020 in both the US and many other regions [32]. As its name suggests, COVID-19 pooled testing mixes several biological samples in a “batch” and then tests the pooled sample with a single diagnostic test. This approach increases the number of samples that can be tested with the same (or a similar) amount of resources, hence reducing the medical cost and saving time per test. The advantage of pooled testing is obvious in the sense that substantial time can be saved by running a single test on mixed biological samples from multiple individuals. It can also be quite cost-effective, since smaller quantities of scarce resources, such as test kits, other testing supplies, and medical workers, are used. In fact, cost-effectiveness plays a central role in SARS-CoV-2 testing strategies from both medical and management perspectives [33,34]. Indeed, “fast and cheap” are essential to obtaining and managing the tests [35–37]. Most importantly, pooling can boost the test capacity of a region by dozens of times. As a simple example, if 10 biological samples are mixed for a single test, then theoretically the local capacity for identifying COVID-19 cases would increase by about 10 times. In the fourth quarter of 2020, the global daily increase of confirmed COVID-19 cases remained at the one-million level, and in many places around the world million-scale biological samples needed to be tested every day. Although multi-sample pools have already been studied academically [38–41], to our knowledge the optimization of COVID-19 pooled testing has not yet received enough attention.
In practice, it appears that the “rule of thumb” prevails. Sometimes 5 biological samples are mixed, and sometimes 10 or even more are combined. How many samples, then, can be optimally included in a single test? Simple as the question appears, its answer is not yet known. This study therefore tries to answer this crucial question and to contribute to both the literature and practice. It attempts to obtain the optimal number of samples to include in a single test for COVID-19 pooled testing, balancing the trade-off between first-attempt costs and re-examination costs. From a practical perspective in particular, it addresses an important problem in dealing with COVID-19.

Methods

The basic model

Let us start from a simple setting. Suppose the local population to be tested in a region is N_Total and the number of biological samples included in a single test is G_Test; the number of tests is then N_Total/G_Test. We consider both the physical or monetary resource cost and the time taken in each test, so optimizing COVID-19 pooled testing requires a trade-off between the two. We therefore introduce a parameter ζ to represent the “attitude” toward resource cost saving; the corresponding “attitude” toward time saving is 1-ζ. The resource cost in monetary value of a single test is C_Test(G_Test), and the time taken in a single test is T_Test(G_Test). Both C_Test and T_Test are treated as functions of G_Test. This is the more general assumption; it simplifies if both are constants, but the relationship observed in reality is nonlinear, so the general form is retained. The total resource cost in monetary value of COVID-19 testing in a region, C_Total, is the number of tests multiplied by the cost of a single test:

$$C_{\mathrm{Total}} = \frac{N_{\mathrm{Total}}}{G_{\mathrm{Test}}}\, C_{\mathrm{Test}}(G_{\mathrm{Test}}) \tag{1}$$

The corresponding total time taken, T_Total, is:

$$T_{\mathrm{Total}} = \frac{N_{\mathrm{Total}}}{G_{\mathrm{Test}}}\, T_{\mathrm{Test}}(G_{\mathrm{Test}}) \tag{2}$$

Thus, we form a loss function to be minimized:

$$L = \zeta\, C_{\mathrm{Total}} + (1-\zeta)\, T_{\mathrm{Total}} \tag{3}$$

Substituting Eq. (1) and Eq. (2) into Eq. (3), and after simplification, we have:

$$L = \frac{N_{\mathrm{Total}}}{G_{\mathrm{Test}}} \left[ \zeta\, C_{\mathrm{Test}}(G_{\mathrm{Test}}) + (1-\zeta)\, T_{\mathrm{Test}}(G_{\mathrm{Test}}) \right] \tag{4}$$

Although the construction of the above loss function appears to be subjective, it is a reasonable representation of the trade-off between resource cost and time cost in COVID-19 testing. Solving the first-order condition dL/dG_Test = 0, the optimal number of samples in a single test, G_Test*, satisfies:

$$G_{\mathrm{Test}}^{*} = \frac{\zeta\, C_{\mathrm{Test}}(G_{\mathrm{Test}}^{*}) + (1-\zeta)\, T_{\mathrm{Test}}(G_{\mathrm{Test}}^{*})}{\zeta\, C_{\mathrm{Test}}'(G_{\mathrm{Test}}^{*}) + (1-\zeta)\, T_{\mathrm{Test}}'(G_{\mathrm{Test}}^{*})} \tag{5}$$

where primes denote derivatives with respect to G_Test.
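As a worked illustration of the first-order condition, suppose the loss takes the weighted form (N/G)[ζC(G) + (1-ζ)T(G)] described above, and assume quadratic forms C(G) = a + bG² and T(G) = d + eG², matching the regressors later reported in Tables 3 and 4. The shorthand A and B is introduced here only for this sketch, and the re-testing of positive pools is not included.

```latex
\begin{align*}
L(G) &= \frac{N}{G}\Bigl[\zeta\,(a + bG^{2}) + (1-\zeta)\,(d + eG^{2})\Bigr]
      = N\left(\frac{A}{G} + B\,G\right),\\
A &= \zeta a + (1-\zeta)\,d, \qquad B = \zeta b + (1-\zeta)\,e,\\
\frac{dL}{dG} &= N\left(-\frac{A}{G^{2}} + B\right) = 0
\quad\Longrightarrow\quad G^{*} = \sqrt{\frac{A}{B}},\\
\frac{d^{2}L}{dG^{2}} &= \frac{2NA}{G^{3}} > 0 \quad \text{for } A > 0,\ G > 0 .
\end{align*}
```

Under these assumptions the stationary point is a minimum whenever A and B are positive, and N drops out of the optimum because it only scales the loss.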

The extended model

Here we introduce an extended model, which fits reality better. Under the standard protocol for COVID-19 pooled testing, a positive or negative pooled result leads to different operational treatment. We therefore need to consider the outcome of a single test, which is either “Yes” (i.e., positive) or “No” (i.e., negative). If “No”, the test is passed. If “Yes”, everybody in that group of samples is tested again individually. In addition, we need to consider the “background” of the test, namely the infection intensity or severity of COVID-19 in a region. We thus introduce a probability of local COVID-19 infection, P_Infection. The probability of a positive result in a single test, P_Yes, is given by Eq. (6) as a product of probabilities: first, the “background” infection rate in the local population; second, the ratio of the sampled group to be positively tested. This formulation relies on the assumption that COVID-19 infection is uniformly distributed in the target group of the population. The corresponding probability of a negative result in a single test, P_No, is then 1-P_Yes. Notably, when the result of a single test is positive and everyone in the group needs to be re-examined, both the resource cost and the time required for that group increase dramatically. We express the resource cost of re-examining a group in terms of C_Test(G_Test=1), where “(G_Test=1)” means that the function C_Test(G_Test) is evaluated at the particular value G_Test=1; it helps to read C_Test(·) as a generalized function here. The time associated with the re-examination is expressed analogously in terms of T_Test(G_Test=1). At this point, C_Test and T_Test are no longer functions of G_Test, since G_Test takes the specific value 1. Finally, we set up the mathematical expectation of the loss function, Eq. (7).
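The two-stage protocol described above (one pooled test per group, then individual re-tests for every member of a positive group) can be sketched as a simple test counter. This is a minimal illustration, not the model's Eq. (6) or Eq. (7): it assumes a perfectly accurate test, takes known infection statuses as input, and treats a pool of size 1 as already individual (an assumption made here). The function names are hypothetical.

```python
from typing import List, Sequence


def split_into_pools(statuses: Sequence[bool], pool_size: int) -> List[List[bool]]:
    """Split individual infection statuses into consecutive pools of at most pool_size."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    return [list(statuses[i:i + pool_size]) for i in range(0, len(statuses), pool_size)]


def count_tests(statuses: Sequence[bool], pool_size: int) -> int:
    """Count the tests used by the two-stage pooled protocol on known statuses."""
    total = 0
    for pool in split_into_pools(statuses, pool_size):
        total += 1  # first-stage pooled test ("Yes" or "No")
        # Assumption: a pool of size 1 is already an individual test, so no re-test.
        if any(pool) and len(pool) > 1:
            total += len(pool)  # "Yes": everyone in the pool is re-tested individually
    return total


# Hypothetical example: 10 people, only the last one infected.
statuses = [False] * 9 + [True]
print(count_tests(statuses, pool_size=5))  # two pooled tests plus five re-tests
print(count_tests(statuses, pool_size=1))  # plain individual testing
```

The counter makes the trade-off concrete: larger pools cut first-stage tests but make each positive pool more expensive to resolve.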
Simplifying the above equation along with Eq. (6), differentiating the result with respect to G_Test to find the optimal solution, multiplying both sides by G_Test, and rearranging, we obtain the first-order condition, Eq. (10). The method used here follows the idea of artificial intelligence, and especially machine learning, in that it starts from a loss function, Eq. (7). This strand of methods has already emerged in the study of COVID-19 pandemic-related issues [42–44]. For the more general situation, without any simplification, we assume the quadratic functional form C_Test(G_Test) = a + b·G_Test², so that dC_Test/dG_Test = 2b·G_Test and C_Test(G_Test=1) = a + b. Similarly, we assume T_Test(G_Test) = d + e·G_Test², so that dT_Test/dG_Test = 2e·G_Test and T_Test(G_Test=1) = d + e. Substituting these expressions into Eq. (10) and rearranging gives a cubic equation in G_Test, Eq. (18). The closed-form solution of a cubic equation is notoriously unwieldy, and we could not find a convenient closed expression for the solution of this equation. However, numerical methods can still be used to find an answer.

Data

As shown previously, the functional forms of the resource cost of a single test, C_Test(G_Test), and of the time taken, T_Test(G_Test), are crucial. However, the available information on these functional forms is very limited, so we derive them from both real-world data and simulation-based techniques. For the resource cost of the test, we use the price of the test to stand for its overall resource cost. The difficulty is the number of valid observations. Using the latest information on the prices of single and pooled tests in China, we construct a data set of only 21 observations (Table 1).

Table 1

Constructed data set for the resource cost of COVID-19 testing.

C_Test	G_Test
Unit: RMB Yuan	Unit: Number of samples
80	1
150	5
200	10
150	10
60	1
58	1
90	5
130	10
40	1
40	5
80	10
35	1
100	5
50	5
150	10
16	1
25	5
50	10
6	1
17.5	5
35	10

In early 2021, 80 yuan was charged for a single-sample test, 30 yuan per person for a pooled test of 5 samples, and 20 or 15 yuan per person for a pooled test of 10 samples [45]. In more recent practice, the price charged has fallen substantially, which is why the associated prices of the PCR test show a downward trend. Regarding the time taken for the tests, according to a description by a medical worker [46], a single biological sample requires 2–4 h to handle, and a mix of 500 samples requires 8–10 h. We therefore construct a small data set with 6 observations from this description (Table 2).

Table 2

Constructed data set for the time of COVID-19 testing.

T_Test	G_Test
Unit: Number of hours	Unit: Number of samples
2	1
3	1
4	1
8	500
9	500
10	500

Results

As discussed above, estimating the resource cost function and the time function is challenging. We therefore take a two-step approach for each function. First, we apply Ordinary Least Squares (OLS) to the small-sample data set. Second, using the OLS results as initial values, we simulate large-sample results with the Linear Gaussian Model (LGM), using Gibbs sampling as the Markov Chain Monte Carlo (MCMC) method. These numerical simulation results are then used to formulate the resource cost function and the time function. From the MCMC results with 20 million iterations, we obtain a=15.882, b=1.043, d=2.994, and e=2.40E-05 (Table 3, Table 4). Substituting these parameter values into Eq. (18) fully specifies the functions needed for the solution.
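The first step (OLS on the small samples) can be sketched directly from Tables 1 and 2. The example below assumes the regressor is G_Test², as the row labels of Tables 3 and 4 indicate, and covers only the OLS columns, Models (3–1) and (4–1); the Gibbs-sampling step is not reproduced because its priors and settings are not stated. The helper name `ols_on_square` is hypothetical.

```python
def ols_on_square(rows):
    """Fit y = intercept + slope * g**2 by ordinary least squares.

    rows is a list of (y, g) pairs, in the same column order as Tables 1 and 2.
    """
    x = [g * g for _, g in rows]
    y = [value for value, _ in rows]
    n = len(rows)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Table 1: (price in RMB yuan, number of samples per test)
cost_rows = [
    (80, 1), (150, 5), (200, 10), (150, 10), (60, 1), (58, 1), (90, 5),
    (130, 10), (40, 1), (40, 5), (80, 10), (35, 1), (100, 5), (50, 5),
    (150, 10), (16, 1), (25, 5), (50, 10), (6, 1), (17.5, 5), (35, 10),
]
# Table 2: (hours, number of samples per test)
time_rows = [(2, 1), (3, 1), (4, 1), (8, 500), (9, 500), (10, 500)]

cost_intercept, cost_slope = ols_on_square(cost_rows)
time_intercept, time_slope = ols_on_square(time_rows)
print(f"cost: constant={cost_intercept:.3f}, G^2 coefficient={cost_slope:.3f}")
print(f"time: constant={time_intercept:.3f}, G^2 coefficient={time_slope:.2e}")
```

The printed coefficients can be compared against the OLS columns of Tables 3 and 4.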

Table 3

Empirical estimation results for resource cost function of COVID-19 testing, with dependent variable C (N = 21).

	Model (3–1) OLS	Model (3–2) LGM, MCMC draws: 10,000	Model (3–3) LGM, MCMC draws: 1,000,000	Model (3–4) LGM, MCMC draws: 10,000,000	Model (3–5) LGM, MCMC draws: 20,000,000
Constant	45.178*** (3.205)	15.923* (1.834)	15.879* (1.845)	15.878* (1.846)	15.882* (1.847)
G_Test²	0.696*** (2.938)	1.044*** (5.346)	1.043148*** (5.337)	1.043*** (5.342)	1.043*** (5.338)
Adjusted R²	0.276

Note: t statistics in parentheses. ***p ≤ 0.01, ** 0.01< p < 0.05, *0.05< p < 0.1.

Table 4

Empirical estimation results for time function of COVID-19 testing, with dependent variable T (N = 6).

	Model (4–1) OLS	Model (4–2) LGM, MCMC draws: 10,000	Model (4–3) LGM, MCMC draws: 1,000,000	Model (4–4) LGM, MCMC draws: 10,000,000	Model (4–5) LGM, MCMC draws: 20,000,000
Constant	3.000*** (5.196)	2.996*** (6.715)	2.994*** (6.693)	2.994*** (6.701)	2.994*** (6.705)
G_Test²	2.40E-05*** (7.348)	2.40E-05*** (9.562)	2.40E-05*** (9.486)	2.40E-05*** (9.486)	2.40E-05*** (9.486)
Adjusted R²	0.914

Note: t statistics in parentheses. ***p ≤ 0.01, ** 0.01< p < 0.05, *0.05< p < 0.1.

Now we can work through a numerical example of the result, i.e., the optimal number of biological samples included in a single test. We consider a region with a population of 1 million that needs to be tested for COVID-19 infection. Besides the functional parameter values above, we set N_Total=1,000,000, P_Infection=0.1, and ζ=0.5. Numerically solving Eq. (18), we obtain G_Test*=4.254024. Since the number of samples included in a single test must be an integer, the calculated result is rounded to the closest integer in practice.
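The integer restriction can be illustrated with a small grid search. This is a sketch under stated assumptions, not a solution of Eq. (18): it uses the simpler weighted loss (N/G)[ζC(G) + (1-ζ)T(G)] without the re-testing term, the quadratic forms C = a + bG² and T = d + eG², and the reported estimates a=15.882, b=1.043, d=2.994, e=2.40E-05. The function names are hypothetical.

```python
# Reported parameter estimates (20 million MCMC draws).
A_COST, B_COST = 15.882, 1.043      # assumed form: C(G) = a + b * G**2
D_TIME, E_TIME = 2.994, 2.40e-05    # assumed form: T(G) = d + e * G**2


def basic_loss(g, n_total=1_000_000, zeta=0.5):
    """Weighted loss (N/G) * [zeta*C(G) + (1-zeta)*T(G)], with no re-test term."""
    cost = A_COST + B_COST * g ** 2
    time = D_TIME + E_TIME * g ** 2
    return n_total / g * (zeta * cost + (1 - zeta) * time)


def best_integer_pool(max_size=20, **kwargs):
    """Return the integer pool size in 1..max_size with the smallest loss."""
    return min(range(1, max_size + 1), key=lambda g: basic_loss(g, **kwargs))


print("best integer pool size:", best_integer_pool())
for g in (4, 5, 10):
    print(g, round(basic_loss(g)))
```

Evaluating the loss at neighboring integers, rather than rounding the continuous optimum, guards against cases where the loss is asymmetric around its minimum.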

Discussion

The model and numerical results above show the promise of conducting COVID-19 pooled testing optimally, both in theory and in practice. Although there are already other studies on similar topics, the novelty of this study lies in the trade-off between cost and time constraints, which is examined through the loss function in the above models. When the exogenous conditions change, however, the calculated results change accordingly. In Table 5, we show the new results when N_Total and ζ take different values. Comparing N_Total=10,000 with N_Total=10,000,000, the smaller population gives a slightly larger G_Test* (4.253798 versus 4.253532), but the difference does not matter at the practical level. In contrast, a smaller ζ leads to a significantly larger G_Test*. This result confirms our intuition that when the time taken in the test is given more weight in the decision (i.e., a smaller value of ζ), more samples should be included in a single test.

Table 5

Numerical simulation of G*.

	Model (5–1) N_Total=10,000, P_Infection=0.1, ζ=0.5	Model (5–2) N_Total=10,000,000, P_Infection=0.1, ζ=0.5	Model (5–3) N_Total=1,000,000, P_Infection=0.1, ζ=0.1	Model (5–4) N_Total=1,000,000, P_Infection=0.1, ζ=0.9
G_Test*	4.253798	4.253532	6.407177	3.942798
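The sensitivity to ζ can be explored with a closed-form approximation. The sketch below is not Eq. (18): it assumes the weighted loss without the re-testing term and the quadratic forms C = a + bG², T = d + eG², for which the minimizer is the square root of [ζa + (1-ζ)d] / [ζb + (1-ζ)e]. The function name `basic_optimum` is hypothetical, and the parameter values are the rounded reported estimates.

```python
import math

# Reported parameter estimates, rounded as in the text.
A_COST, B_COST = 15.882, 1.043      # assumed form: C(G) = a + b * G**2
D_TIME, E_TIME = 2.994, 2.40e-05    # assumed form: T(G) = d + e * G**2


def basic_optimum(zeta):
    """Continuous minimizer of (N/G) * [zeta*C(G) + (1-zeta)*T(G)]."""
    numerator = zeta * A_COST + (1 - zeta) * D_TIME
    denominator = zeta * B_COST + (1 - zeta) * E_TIME
    return math.sqrt(numerator / denominator)


for zeta in (0.1, 0.5, 0.9):
    print(f"zeta={zeta}: G* = {basic_optimum(zeta):.3f}")
```

With these inputs the approximation stays within about 0.01 of the G_Test* values in Table 5 for ζ = 0.1, 0.5, and 0.9, and it decreases as ζ grows, in line with the pattern in the table.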

The discussion of P_Infection, however, is much more complex. Although we do not have a closed-form solution for G_Test*, we can still use the Implicit Function Theorem to derive the partial derivative of G_Test* with respect to P_Infection from the first-order condition, and then simplify and rearrange it into Eq. (20). We can easily locate a particular value at which this partial derivative equals zero. Please note that such a particular value only means that, at that point, P_Infection does not influence G_Test*. However, the expression in Eq. (20) also reveals the complex nature of the relationship: in a more general situation it can be either positive or negative, so its sign is ambiguous.
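The Implicit Function Theorem step can be stated generically. In the sketch below, F(G, P) is a label introduced here for the first-order condition written in the form F = 0; it is not notation from the original equations, and the specific form of F is not reproduced.

```latex
\[
F\bigl(G^{*}(P),\,P\bigr) = 0
\quad\Longrightarrow\quad
\frac{\partial G^{*}}{\partial P}
= -\,\left.\frac{\partial F/\partial P}{\partial F/\partial G}\right|_{G = G^{*}},
\qquad \frac{\partial F}{\partial G} \neq 0 .
\]
```

The derivative vanishes where ∂F/∂P = 0, and its sign depends on the signs of both partial derivatives, which is why it can go either way.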

Conclusion and future work

This study has proposed a theoretical model for the number of biological samples to include in a single test, which helps minimize resource cost and time. In our simulation results, given an infection probability of 10%, setting the number of biological samples included in a single test (at the integer level) within [4,6] is reasonable for a wide range of subjective attitudes between time and resource costs. If this criterion is relaxed slightly, the range [3,7] would also be acceptable for mixed biological samples from multiple individuals. Therefore, in current practice, 5-mixed samples appear preferable to the commonly used 10-mixed samples. The method proposed in this study is general in purpose and can be applied outside China. It can also be used to analyze other infectious diseases; one only needs to recalibrate the functional parameters with local data for the model to remain applicable. Although this paper makes some novel contributions, the study still has shortcomings. On the one hand, pooled testing raises the concern that the virus concentration decreases after a sample is mixed with others. Indeed, sensitivity matters in the test [47,48]. Since samples are diluted, false-negative results are more likely. The chance of error in test results may therefore be higher in mixed samples than in a pure sample, other things being equal, which may result in higher infection risk as well as higher re-examination cost [49]. This has been observed in practice in China: in some less developed regions (or cities), 20 samples are pooled together in a single test, which is reported to be less accurate. In fact, some cases suggest that this may be one of the reasons for the delayed detection of infected residents in the local area. The concern is therefore well founded.
On the other hand, the data used in this study are still very limited. More information is needed to obtain the resource cost function and time function in the COVID-19 testing with higher accuracy. These issues among others are left for future research.

Funding

Supported by Guanghua Talent Project of .

Data availability statement

All data used in this study are publicly available.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

29 in total

1. Failing the Test - The Tragic Data Gap Undermining the U.S. Pandemic Response.

Authors: Eric C Schneider
Journal: N Engl J Med Date: 2020-05-15 Impact factor: 91.245

2. SARS-CoV-2 testing for public health use: core principles and considerations for defined use settings.

Authors: Catharina Boehme; Emma Hannay; Rangarajan Sampath
Journal: Lancet Glob Health Date: 2021-01-28 Impact factor: 26.763

3. Change in Saliva RT-PCR Sensitivity Over the Course of SARS-CoV-2 Infection.

Authors: Zion Congrave-Wilson; Yesun Lee; Jaycee Jumarang; Stephanie Perez; Jeffrey M Bender; Jennifer Dien Bard; Pia S Pannaraj
Journal: JAMA Date: 2021-09-21 Impact factor: 56.272

4. Assessment of protection against reinfection with SARS-CoV-2 among 4 million PCR-tested individuals in Denmark in 2020: a population-level observational study.

Authors: Christian Holm Hansen; Daniela Michlmayr; Sophie Madeleine Gubbels; Kåre Mølbak; Steen Ethelberg
Journal: Lancet Date: 2021-03-17 Impact factor: 79.321

5. Swabs Collected by Patients or Health Care Workers for SARS-CoV-2 Testing.

Authors: Yuan-Po Tu; Rachel Jennings; Brian Hart; Gerard A Cangelosi; Rachel C Wood; Kevin Wehber; Prateek Verma; Deneen Vojta; Ethan M Berke
Journal: N Engl J Med Date: 2020-06-03 Impact factor: 91.245

6. A nanoluciferase SARS-CoV-2 for rapid neutralization testing and screening of anti-infective drugs for COVID-19.

Authors: Xuping Xie; Antonio E Muruato; Xianwen Zhang; Kumari G Lokugamage; Camila R Fontes-Garfias; Jing Zou; Jianying Liu; Ping Ren; Mini Balakrishnan; Tomas Cihlar; Chien-Te K Tseng; Shinji Makino; Vineet D Menachery; John P Bilello; Pei-Yong Shi
Journal: Nat Commun Date: 2020-10-15 Impact factor: 14.919

7. Insight into the practical performance of RT-PCR testing for SARS-CoV-2 using serological data: a cohort study.

Authors: Zhen Zhang; Qifang Bi; Shisong Fang; Lan Wei; Xin Wang; Jianfan He; Yongsheng Wu; Xiaojian Liu; Wei Gao; Renli Zhang; Wenfeng Gong; Qiru Su; Andrew S Azman; Justin Lessler; Xuan Zou
Journal: Lancet Microbe Date: 2021-01-19

8. Performance and operational feasibility of antigen and antibody rapid diagnostic tests for COVID-19 in symptomatic and asymptomatic patients in Cameroon: a clinical, prospective, diagnostic accuracy study.

Authors: Yap Boum; Karl Njuwa Fai; Birgit Nicolay; Akenji Blaise Mboringong; Lisa M Bebell; Mark Ndifon; Aristide Abbah; Rachel Essaka; Lucrèce Eteki; Francisco Luquero; Céline Langendorf; Nicole Fouda Mbarga; Rene Ghislain Essomba; Bongkiyung Donald Buri; Tchoula Mamiafo Corine; Bertrand Tchualeu Kameni; Nadia Mandeng; Mahamat Fanne; Anne-Cécile Zoung-Kani Bisseck; Clement B Ndongmo; Sara Eyangoh; Achta Hamadou; Jean Patrick Ouamba; Modeste Tamakloé Koku; Richard Njouom; Okomo Marie Claire; Linda Esso; Emilienne Epée; Georges Alain Etoundi Mballa
Journal: Lancet Infect Dis Date: 2021-03-25 Impact factor: 25.071

9. Rapid triage for COVID-19 using routine clinical data for patients attending hospital: development and prospective validation of an artificial intelligence screening test.

Authors: Andrew A S Soltan; Samaneh Kouchaki; Tingting Zhu; Dani Kiyasseh; Thomas Taylor; Zaamin B Hussain; Tim Peto; Andrew J Brent; David W Eyre; David A Clifton
Journal: Lancet Digit Health Date: 2020-12-11

10. Clarifying the evidence on SARS-CoV-2 antigen rapid tests in public health responses to COVID-19.

Authors: Michael J Mina; Tim E Peto; Marta García-Fiñana; Malcolm G Semple; Iain E Buchan
Journal: Lancet Date: 2021-02-17 Impact factor: 79.321