Conrad W Safranek1, David Scheinker2. 1. Department of Biology, Computational Biology, Stanford University, CA; Department of Management Science and Engineering, Stanford University School of Engineering, CA. 2. Department of Management Science and Engineering, Stanford University School of Engineering, CA; Department of Pediatrics, Stanford University School of Medicine, CA; Clinical Excellence Research Center, Stanford University School of Medicine, CA. Electronic address: dscheink@stanford.edu.
Abstract
PURPOSE: No method is available to systematically study SARS-CoV-2 transmission dynamics using the data that rideshare companies share with government agencies. We developed a proof-of-concept method for the analysis of SARS-CoV-2 transmissions between rideshare passengers and drivers. METHOD: To assess whether this method could enable hypothesis testing about SARS-CoV-2, we repeated ten 200-day agent-based simulations of SARS-CoV-2 propagation within the Los Angeles County rideshare network. Assuming data access for 25% of infections, we estimated an epidemiologist's ability to analyze the observable infection patterns to correctly identify a baseline viral variant A, as opposed to viral variant A with mask use (50% reduction in viral particle exchange), or a more infectious viral variant B (300% higher cumulative viral load). RESULTS: Simulations had an average of 190,387 potentially infectious rideshare interactions, resulting in 409 average diagnosed infections. Comparison of the number of observed and expected passenger-to-driver infections under each hypothesis demonstrated our method's ability to consistently discern large infectivity differences (viral variant A vs. viral variant B) given partial data from one large city, and to discern smaller infectivity differences (viral variant A vs. viral variant A with masks) given partial data aggregated across multiple cities. CONCLUSIONS: This novel statistical method suggests that, for the present and subsequent pandemics, government-facilitated analysis of rideshare data combined with diagnosis records may augment efforts to better understand viral transmission dynamics and to measure changes in infectivity associated with nonpharmaceutical interventions and emergent viral strains.
The emergence of novel, partially vaccine-resistant strains of SARS-CoV-2 poses a serious threat to ongoing public health efforts.[1,2] Policies to contain the spread of COVID-19 rely on understanding what such strains mean for the risk of transmission and how they may impact optimal strategies to contact trace, test, quarantine, and vaccinate.[3,4] Important factors in our understanding of viral transmission include the effectiveness of non-pharmaceutical interventions (NPIs) such as facemasks;[4,5] the prevalence of superspreaders in the population;[6] the increased infectivity of new viral strains;[7,8] and the ongoing extent to which vaccines reduce infectivity given emergent strains and waning immunity.[2,9] The emergence and rapid spread of the Delta and Omicron variants have highlighted the limitations of the current infrastructure, which, even after data-sharing improvements during the first year of the pandemic, offers only retrospective identification of emergent viral strains and their transmission dynamics.[1]
The study of the transmission of SARS-CoV-2 has been constrained primarily to specialized settings: laboratory experimentation; examination of infections in households, trains, hospitals, and military facilities; and placebo-controlled vaccine clinical trials.[5,10-14] The usefulness of these data is limited by the difficulty of translating the results to other settings and the emergence of new viral strains. Practical and ethical limitations to clinical trials and the challenges of drawing causal inference from observational data limit our ongoing understanding of real-world transmission dynamics for COVID-19. 
For example, more than nine months after the emergence of the Delta variant with over 50,000 diagnosed “breakthrough” infections in vaccinated individuals,[15] the research community still lacked consensus on whether these infected vaccinated individuals had a reduced level of transmissibility relative to infected non-vaccinated individuals,[16,17] or whether they were equally contagious.[18,19]
Analysis of data collected via digital disease surveillance may help elucidate evolving SARS-CoV-2 transmission dynamics.[20] Mobile phone applications for contact tracing (e.g. Google Apple Exposure Notification technology) gather extensive real-world data, but with limited information about the highly variable settings in which these interactions take place.[21] In contrast, in-vehicle transmission dynamics can be well characterized and are relatively consistent. Aerosolized viral dynamics within cars have been studied,[22] and rideshare companies already share data with government health agencies for contact tracing,[23] but to the best of our knowledge no statistical method is available for the use of these data for the systematic study of the transmission dynamics of COVID-19.
The interactions between passengers and drivers facilitated by rideshare platforms such as Uber and Lyft are, essentially, a series of partially controlled, standardized, pseudo-random experiments of SARS-CoV-2 transmissions. Rideshare trips are often the only connection between individuals; are governed by mask-wearing policies implemented on specific dates; occur in a relatively controlled environment with fairly consistent spatial dynamics; and are sporadic with respect to time, location, and duration. Previous studies have associated rideshare trips with up to a 10-fold increase in COVID-19 infection risk,[24] suggesting that infection patterns among rideshare trips may be substantial. 
Furthermore, rideshare location and time data are stored in a machine-readable format that could be linked to data such as diagnoses or self-reported symptoms (anonymized in accordance with local legislation). Thus, rideshare trip data may facilitate the automated identification of the emergence of new viral strains and the study of their transmission dynamics, with and without the presence of NPIs and vaccination, and based on passenger and driver characteristics.
The unknown and potentially low “signal-to-noise” ratio of detected-to-undetected viral infections in a rideshare network presents a challenge for the usefulness of a statistical method to analyze rideshare data merged with diagnosis data.[25] The potential utility of such a method depends on its performance in accurately detecting rideshare-acquired infections while accounting for undiagnosed infections as well as for “false-positive” rideshare interactions that appear to have resulted in transmission even though the potential infectee contracted the virus elsewhere.
We develop Rideshare Infection Detection (RIDE), a probabilistic method and model of rideshare transmissions designed to test hypotheses about the emergence of novel strains of SARS-CoV-2 and their transmission dynamics. Since aggregated rideshare and diagnosis data are not currently available, we simulate viral transmission data in a hypothetical rideshare network based on empirical data from a large US city. We use these simulated data to test hypotheses about transmission dynamics, while assuming access to only the kinds of data that may be available in practice. Primarily, this study attempts to assess (1) RIDE's ability to detect changes in infectivity associated with either mask use or a more infectious, emergent viral variant, and (2) RIDE's ability to detect an increase in superspreading. 
Ultimately, we aim to demonstrate that future application of RIDE with real-world data has the potential to augment understanding of viral transmission dynamics, thereby informing policy makers in their response to changing conditions during the current and future pandemics.
METHODS
Overview
We derive a mathematical model of in-vehicle viral transmission. Using large-scale, agent-based computer simulation with empirical diagnoses and rideshare data from Los Angeles County, we apply this mathematical transmission model to generate hypothetical SARS-CoV-2 rideshare infection patterns representative of a large US city during quarantine. We develop RIDE, a statistical method for the analysis of these simulated infection patterns assuming access only to the kind of data that would be available to an epidemiologist, e.g. with incomplete knowledge of infections and infection origins. As a proof-of-concept, we apply a limited Monte Carlo technique to test the power of this statistical method to differentiate between the mathematical transmission parameters corresponding to a baseline strain versus other hypothetical transmission scenarios.
Mathematical model of transmission
We estimate patient infectivity relative to symptom onset with a general mathematical model of viral transmission adapted with SARS-CoV-2-specific parameter estimates from the literature.[10,26,27] In-vehicle probability-of-infection functions incorporating ride duration and ride timing relative to the infector's symptom onset are derived for unique passenger-to-driver and driver-to-passenger transmission dynamics. We define four hypothetical scenarios of transmission (Table 1). Scenario “viral variant A” assumes no facemask use and overall viral load homogeneity across infected individuals. This serves as the baseline. In scenario “viral variant A with masks,” we represent masking with a 50% reduction in viral particle exchange between infector and potential infectee. In scenario “viral variant A with increased superspreading,” we increase the likelihood of superspreading by introducing infectivity asymmetry in the population, with 1 in 20 infected individuals having a 5-fold higher overall probability of transmitting SARS-CoV-2.[6] In scenario “viral variant B,” based on research detailing the emergent D614G SARS-CoV-2 strain, we introduce a viral variant with 300% higher viral load relative to the baseline viral variant A (Supplementary Section II).[7]
Table 1
Hypothesized sets of SARS-CoV-2 transmission parameters and probability of infection for populations with differing transmission characteristics. See Supplementary Section II for details of the mathematical model of in-vehicle viral transmission.
| Parameters | Estimates from literature (mean [95% confidence interval]) | “Viral variant A” (no masks, baseline superspreading) | “Viral variant A with masks” (masks, baseline superspreading†) | “Viral variant A with increased superspreading” (no masks, 5% superspreaders‡) | “Viral variant B” (no masks, baseline superspreading) |
|---|---|---|---|---|---|
| Viral expulsion relative to viral variant A (%) | Not available | 100% | - | - | 300% |
| Viral exchange reduction due to NPIs (%) | Not available | - | 50% | - | - |
| Infectivity of “superspreader” relative to baseline viral variant A (%) | Not available | - | - | 500% | - |
| Days infectious before symptom onset (days) | 2.3 [0.8 to 3.0] | 2.3 | 2.3 | 2.3 | 3 |
| Peak infectivity relative to symptom onset (days)* | -0.7 [-2.0 to 0.2] | -0.7 | -0.7 | -0.7 | -2 |
| Probability of passenger-to-driver infection from 20 min ride with passenger on day of symptom onset (%) | - | 1.5% | 0.7% | 7.40% | 2.8% |
| Probability of driver-to-passenger infection from 20 min ride with driver on day of symptom onset (%) | - | 2.9% | 1.5% | 14.70% | 5.6% |
* Adjusted for via gamma infectivity distribution shape and scale parameters.
† Passenger and driver both wearing masks, blocking 50% of viral particle exchange.
‡ Increased likelihood of superspreading: Assumes 5% of infected individuals are “superspreaders” with 20x increased cumulative viral load; the remaining 95% of infected individuals are 78.9% as infectious as baseline individuals with viral variant A.
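As a worked check on the superspreading footnote, the 78.9% figure can be recovered under two assumptions that are not stated explicitly here: that the 5% of superspreaders are 5 times as infectious as a baseline individual (the 500% entry in Table 1), and that the population's mean infectivity is held at the baseline level. The symbol r, denoting the relative infectivity of the remaining 95%, is introduced only for this illustration.

```latex
0.05 \times 5 + 0.95\, r = 1
\quad\Longrightarrow\quad
r = \frac{1 - 0.25}{0.95} \approx 0.789
```

Under this reading, the remaining 95% of infected individuals are about 78.9% as infectious as baseline individuals, which matches the value given in the footnote.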
Generation of synthetic rideshare transmission data
We apply agent-based modeling to generate synthetic data to represent the data that may become available if public health agencies partner with rideshare companies. The characteristics of urban rideshare networks, COVID-19 diagnoses, and estimates of the fraction of true infections represented by the diagnoses from March 17, 2020 to October 3, 2020 (200 days) in Los Angeles County (LA) were derived from the literature (Supplementary Section III).[28-30] This time period represents the “first wave” of the pandemic in LA, occurring in spring and summer of 2020. LA rideshare volume, with an estimated 75% reduction during this period of quarantine, was simulated.[30] The total number of infections in LA was assumed to be twice the number of diagnosed infections reported over the period considered.[25,31] Each network was initialized by assigning a day of symptom onset to randomly selected passengers and drivers, with the number of infections proportional to the estimated historical number of infections reported 3 days later (to account for the average delay between symptom onset and testing).[32] The time of symptom onset was chosen from a truncated normal distribution calibrated to best fit the distribution of empirical LA diagnoses during the simulated time period (Supplementary Section III).
For each rideshare interaction involving an infected individual, the probability of infection was calculated using the parameter set corresponding to the hypothetical viral scenario being tested and by applying the previously derived passenger-to-driver or driver-to-passenger transmission functions. The interaction was assumed to result in transmission based on a draw from a Bernoulli random variable with probability of success equal to the calculated probability of infection. For those infected, an incubation period was drawn from a normal distribution, and a subsequent symptom onset time was assigned.
From these simulated data, we then derive data representative of the empirical data that would be available to an epidemiologist. We consider a Partial Reporting scenario with access to diagnosis and symptom onset data for 25% of infections and a Full Reporting scenario in which data are available for all infections.
Full mathematical details for the generation of the simulated data are available in Supplementary Section III.
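The per-interaction step described above (a Bernoulli draw followed by a normally distributed incubation period) can be illustrated with a minimal Python sketch. This is not the simulation code used in the study, which was written in R and is specified in Supplementary Section III. The function name, the incubation mean and standard deviation, and the clamping of the incubation period at zero are all assumptions made for illustration; the probability 0.015 is the Table 1 value for a 20-minute passenger-to-driver interaction on the day of symptom onset under viral variant A.

```python
import random


def simulate_interaction(p_infection, ride_day, rng,
                         incubation_mean=5.0, incubation_sd=1.5):
    """Simulate one potentially infectious ride.

    Returns the infectee's symptom-onset day, or None if no transmission occurs.
    incubation_mean and incubation_sd are illustrative placeholders,
    not the values used in the study.
    """
    if not 0.0 <= p_infection <= 1.0:
        raise ValueError("p_infection must be a probability")
    # Bernoulli draw: transmission occurs with probability p_infection.
    if rng.random() >= p_infection:
        return None
    # Incubation period from a normal distribution; clamped at zero
    # (an assumption) so that onset never precedes the ride.
    incubation = max(0.0, rng.gauss(incubation_mean, incubation_sd))
    return ride_day + incubation


rng = random.Random(42)  # fixed seed so the sketch is reproducible
outcomes = [simulate_interaction(0.015, ride_day=10.0, rng=rng)
            for _ in range(100_000)]
infected = [onset for onset in outcomes if onset is not None]
attack_rate = len(infected) / len(outcomes)
```

Over many simulated rides, `attack_rate` should approach the input probability, which is the property the expected-infection calculation in RIDE relies on.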
Hypothesis testing with RIDE
Using only the data that would be available to an epidemiologist (neither undiagnosed infections nor information on the origin of each diagnosed infection), we introduce Rideshare Infection Detection (RIDE) to analyze the simulated rideshare infection patterns in order to calculate the number of expected infections and the number of observed infections (Figure 1).
Figure 1
Simplified schematic of RIDE analytical method to identify potential and observed infections. Observed infections are tabulated by counting individuals with symptom onset following a potentially infectious rideshare interaction. The expected number of infections is the sum of all potential infections, each weighted by its probability of infection. Arrow thickness for each potential infection corresponds to the probability-of-infection magnitude, which is calculated given a mathematical model of rideshare transmission that depends on ride duration, the timing of the ride relative to the potential infector's symptom onset, and the assumed SARS-CoV-2 transmission parameters given the hypothesized scenario and viral variant.
Using synthetic data produced via simulated propagation of viral variant A, we test hypotheses about which scenario (viral variant A, viral variant A with masks, or viral variant B) best corresponds to the apparent number of rideshare infections observable in the network. For each scenario, we use the parameter set corresponding to the scenario being tested and knowledge only of diagnosed infections to calculate the expected number of rideshare infections, equal to the sum of the probabilities of infection across all potentially infectious rideshare interactions (i.e. the sum of the expected values of the Bernoulli distributions). The observed number of rideshare infections is determined by counting the number of interactions in which a diagnosed individual within their infectious window shared a rideshare vehicle with a potential infectee who had a positive diagnosis with symptom onset between 1.5 and 10 days following the rideshare trip. 
This observed number is then adjusted to account for the percentage of infections diagnosed (this diagnosis percentage is assumed known) and for an estimation of the average number of “false positives” (interactions that appeared to have resulted in an infection even though the infectee was infected elsewhere) given the overall infection density in the network. For each respective level of reporting (Full vs. Partial Reporting), this simulated propagation and analysis with RIDE was repeated 10 times to determine the impact of transmission stochasticity and rideshare network variability. Across the 10 simulations, the differences in the expected number of passenger-to-driver infections (given each set of hypothesized parameters) and the observed number of passenger-to-driver infections are compared with a pairwise Kruskal-Wallis test (Supplementary Section IV).
Separately, two synthetic rideshare infection patterns were simulated and compared: propagation of viral variant A and propagation of viral variant A with increased superspreading. For each round of analysis, the differences were calculated between the number of observed passengers infected per infectious driver for the increased superspreading scenario minus the respective values for the baseline scenario. This was repeated 10 times each for both Partial and Full Reporting, and the resulting distributions were compared with the Kruskal-Wallis test. All P-values were adjusted for multiple testing (Supplementary Section IV).
All simulated viral propagation and analysis with RIDE were performed with R (Version 4.0.3, 2020; Vienna, Austria) and executed with Stanford's Sherlock High-Performance Computing Cluster.
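The expected and observed counts defined in this section can be sketched in Python as follows. This is an illustrative sketch only, not the RIDE implementation: the `Interaction` record and both function names are hypothetical, the example rides are made-up values, and the adjustment for undiagnosed infections and false positives (Supplementary Section IV) is deliberately not reproduced. Only the sum-of-probabilities definition and the 1.5 to 10 day symptom-onset window are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Interaction:
    """One potentially infectious ride (hypothetical record layout)."""
    ride_day: float                      # day the ride took place
    p_infection: float                   # model probability under the tested hypothesis
    infectee_onset_day: Optional[float]  # infectee's symptom onset; None if undiagnosed


def expected_infections(interactions):
    """Expected count: the sum of Bernoulli means over all interactions."""
    return sum(i.p_infection for i in interactions)


def observed_infections(interactions, min_lag=1.5, max_lag=10.0):
    """Count interactions whose infectee had symptom onset 1.5 to 10 days after the ride."""
    count = 0
    for i in interactions:
        if i.infectee_onset_day is None:
            continue  # no diagnosis, so nothing observable
        lag = i.infectee_onset_day - i.ride_day
        if min_lag <= lag <= max_lag:
            count += 1
    return count


# Made-up example rides, for illustration only.
rides = [
    Interaction(ride_day=3.0, p_infection=0.015, infectee_onset_day=8.0),   # lag 5.0: counted
    Interaction(ride_day=4.0, p_infection=0.010, infectee_onset_day=None),  # undiagnosed
    Interaction(ride_day=6.0, p_infection=0.020, infectee_onset_day=6.5),   # lag 0.5: too early
    Interaction(ride_day=9.0, p_infection=0.005, infectee_onset_day=25.0),  # lag 16.0: too late
]
expected = expected_infections(rides)
observed = observed_infections(rides)
```

Repeating the `expected_infections` call with the probabilities implied by each hypothesized parameter set yields one expected count per hypothesis, each of which is then compared with the same adjusted observed count.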
RESULTS
Mathematical model
Differences in viral strains and the various hypothetical propagation scenarios considered resulted in notable differences in the probability of rideshare-acquired SARS-CoV-2 infection (Figure 2). Relative to the baseline scenario probability-of-infection given a 20-minute ride without mask use with a passenger with viral variant A one day before symptom onset, the probability of driver infection was found to be 51% lower when both driver and passenger were masked (scenario viral variant A with masks), 480% higher when the passenger was one of the more infectious individuals from scenario viral variant A with increased superspreading, and 342% higher when the passenger was infected with viral variant B.
Figure 2
Modeled probability of transmission from passenger to driver. Probability of infection varies significantly depending upon interaction dynamics (time of ride relative to passenger's symptom onset and ride duration) and upon assumptions defining the hypothetical SARS-CoV-2 transmission parameters (representing the passenger's viral variant; whether the passenger is a superspreader; and whether face masks are used).
*Probability of driver infection given 20 min ride with infected passenger. †Baseline viral variant. ‡Passenger and driver wearing masks, blocking 50% of viral particle exchange. §Superspreaders with viral variant A are 500% more infectious than baseline individuals with viral variant A. ¶Alternative viral variant B is 300% more infectious than baseline and has different infectivity parameters within the previously estimated 95% confidence intervals.
Synthetic rideshare transmission data
The simulation was initiated with a baseline infection probability of 528,828 out of 10 million, assuming an infection prevalence double the 264,414 diagnosed, reported COVID-19 infections in LA during this period. Data from 10 simulated trials of SARS-CoV-2 propagation within the LA rideshare network resulted in an average of 190,387.1 (range 187,898 to 193,645) potentially infectious rideshare interactions, encompassing possible passenger-to-driver and driver-to-passenger transmissions. When these 10 simulated trials were propagated assuming SARS-CoV-2 transmission parameters corresponding to viral variant A, there were an average of 409.0 (range 384 to 424) rideshare infections resulting in a diagnosis for the Partial Reporting scenario (access to data on 25% of infections), and 1,666.4 (range 1,614 to 1,698) diagnosed rideshare infections for the Full Reporting scenario.
Across 10 trials of propagation with viral variant A followed by hypothesis testing with RIDE given Partial Reporting, the difference between the number of expected minus observed passenger-to-driver rideshare infections was 16.7 (range -54.4 to 78.8) when assuming viral variant A without masks or increased superspreading; -61.0 (range -130.3 to 0.2) when assuming viral variant A with masks without increased superspreading; and 294.9 (range 224.8 to 371.4) when assuming viral variant B without masks or increased superspreading (all adjusted P-values < 0.001). The results were qualitatively similar in the Full Reporting scenario, with greater differences, less variation, and a higher level of significance (Figure 3).
Figure 3
Differences given 10 trials in the number of passenger-to-driver infections expected and observed in the simulation based on analysis with different hypotheses of SARS-CoV-2 transmission, according to the percent of infections reported (left, Full Reporting; right, Partial Reporting). Variability in results given analyses of 10 simulated synthetic datasets resulting from SARS-CoV-2 propagation in Los Angeles County when true propagation conditions correspond to viral variant A with no masks. For each simulated dataset, and with access to only diagnosed infections, the epidemiologist assesses the difference in the expected number of passenger-to-driver infections given analysis with hypothesized parameters (assuming either viral variant A, viral variant A with masks, or viral variant B) minus the adjusted number of observed rideshare infections in the network.
Each dot within a box-plot represents results with the given hypothesis for one round of Los Angeles County simulation and analysis. Box-plot midline represents the median of analysis results across the 10 trials, box edges show interquartile range, and whisker tips show the minimum and maximum result values. *Observed infections adjusted to account for undiagnosed infections and baseline "false-positive" transmissions (Supplementary IV).
For the secondary investigation comparing analytical results given simulated propagation of viral variant A with increased superspreading relative to simulated propagation of viral variant A, the mean observed number of drivers that infected exactly one passenger was 9.2 (range -6 to 29) lower in the superspreader scenario and the mean number of drivers that infected exactly two passengers was 1.8 (range -8 to 7) higher in the superspreader scenario given Partial Reporting. The results were qualitatively similar in the Full Reporting scenario, with greater differences (Figure 4). In both the Full and Partial Reporting scenarios, the combined data from the 10 trials led to significant differences in the distributions (Full Reporting, P < 0.001; Partial Reporting, P < 0.05).
Figure 4
Differences in the number of passenger infections per driver in simulations with baseline variant versus variant with increased superspreading, according to the percent of infections reported (left, Full Reporting; right, Partial Reporting). Variability in the differences between results given analyses of 10 simulated synthetic datasets resulting from SARS-CoV-2 propagation of viral variant A in Los Angeles County given either a population with homogeneous cumulative infectivity (baseline superspreading scenario) or asymmetric infectivity (increased superspreading scenario, where 5% of infected individuals are “superspreaders”).
Each dot within a box-plot represents analysis results with the given hypothesis for one round of Los Angeles County simulation and analysis. Box-plot midline represents the median of analysis results across the 10 trials; box edges show interquartile range; whisker tips are minimum and maximum values; and black dot shows outliers, as specified by Tukey.
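The superspreading comparison rests on tallying how many drivers are linked to exactly one, exactly two, or more observed passenger infections, and then differencing those tallies between scenarios. A minimal Python sketch of that bookkeeping is given here; the function names, the driver and passenger identifiers, and the example pairs are all hypothetical and are not drawn from the simulated datasets.

```python
from collections import Counter


def infections_per_driver_histogram(observed_pairs):
    """Map k to the number of drivers linked to exactly k observed passenger infections."""
    per_driver = Counter(driver for driver, _passenger in observed_pairs)
    return Counter(per_driver.values())


def histogram_difference(scenario, baseline):
    """Scenario minus baseline, for every k present in either histogram."""
    keys = sorted(set(scenario) | set(baseline))
    return {k: scenario.get(k, 0) - baseline.get(k, 0) for k in keys}


# Hypothetical (driver, passenger) pairs flagged as observed infections.
baseline_pairs = [("d1", "p1"), ("d2", "p2"), ("d3", "p3")]
superspreading_pairs = [("d1", "p1"), ("d1", "p4"), ("d2", "p2")]

baseline_hist = infections_per_driver_histogram(baseline_pairs)
superspreading_hist = infections_per_driver_histogram(superspreading_pairs)
diff = histogram_difference(superspreading_hist, baseline_hist)
```

In this toy example the superspreading scenario has two fewer drivers with exactly one infected passenger and one more driver with exactly two, which is the direction of shift reported in the Results.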
LIMITATIONS
The primary limitations of this work stem from the use of synthetic data. Although we include sources of variability in our model by incorporating stochastic ride durations, ride timing relative to symptom onset, transmission heterogeneity, etc., real-world infection patterns would include additional sources of noise. For example, we do not consider some factors affecting in-vehicle air circulation, such as the potential for open windows,[22] seating arrangements,[33] whether the airflow system is operating,[34] potential barriers between the front and back seat, or differences in vehicle sizes. Furthermore, we do not incorporate the possibility of systematic infectivity differences between diagnosed and undiagnosed individuals, immunity from recent SARS-CoV-2 infection, non-uniform diagnostic testing and reporting rates between drivers and passengers, or systematic differences between neighborhoods within cities. Finally, our hypothesis testing with RIDE assumes accurate estimation of the total fraction of infections that are diagnosed (which is not precisely known by the scientific community), as well as accurate data about infected individuals’ symptom onset time relative to diagnosis, which may not be consistently recorded.
The simulated propagations for this proof-of-concept testing considered COVID-19 infections in LA during the first wave of the pandemic, when there were 264,414 diagnosed infections over 200 days. However, in the more recent Omicron wave, in just one month (December 25, 2021 to January 25, 2022) LA's cumulative diagnosed case count rose from 1.5 million to 2.5 million.[29] Given this higher relative rate of infections, RIDE's detection power would potentially be 24-fold greater during the Omicron wave. Moreover, given improved PCR and antigen testing infrastructure, diagnosis rates have since likely increased above the conservative 25% estimate used for the Partial Reporting scenario. 
Finally, real-world deployment of RIDE could aggregate data across multiple cities and combine analyses of passenger-to-driver and driver-to-passenger transmissions. Thus, the additional noise introduced by the aforementioned limitations would likely be mitigated by these factors given real-world deployment of RIDE (see Supplementary Section V for details).For simplicity, we did not address all possible scenario permutations (e.g. mask use new viral variant). Instead, we focused on assessing RIDE's detection ability with regards specific events of the COVID-19 pandemic, such as the start of the mask mandate and the rapid Delta variant outbreak. Other scenario permutations may represent opportunities for future research
DISCUSSION
Using simulated viral propagation patterns generated from empirical ridesharing data and COVID-19 diagnosis records from Los Angeles County (LA), we demonstrate that analysis of rideshare data may allow for the identification of emergent SARS-CoV-2 strains and the study of their transmission characteristics. Our analyses of LA's first COVID-19 wave show that such an approach may be effective given data aggregated over 6 months and when as few as 25% of infectious individuals have been identified. While additional sources of noise may confound real-world deployment of RIDE, consideration of our simulations’ conservative assumptions suggests that meaningful, even real-time, pandemic monitoring may be feasible. Together, our findings indicate that further investigation is warranted for the development of such a system at the national scale and with access to more detailed data.
We demonstrate that current research-based estimates for the range of SARS-CoV-2 transmission parameter values leave significant uncertainty in transmission modeling, leading to substantial variability in probability-of-infection predictions for a typical rideshare interaction. We demonstrate the ability of RIDE to differentiate between values within the current estimated range, potentially enabling measured appraisal of the emergence of more infectious viral strains; the effectiveness of Uber and Lyft's nationwide mask mandates for all drivers and passengers; and the extent of superspreading in the population.
Previous epidemiological modeling research combining diagnosis records and retrospective passenger transportation data from high-speed trains in China sets a precedent for RIDE.[33] The standardized format of rideshare data at an international scale presents a significant opportunity for a more extensive study of viral transmission. While our simulations have similarities with previous agent-based modeling studies predicting SARS-CoV-2 transmission patterns within other settings (e.g., universities),[35] the main contribution of our work is not estimating the number of rideshare-based infections, but rather developing a method to analyze real-world rideshare infection patterns if we were to have access to retrospective data.
Straightforward expansions of RIDE could increase real-world hypothesis-testing power via the aggregation of data across multiple cities and the integration of additional forms of data, such as vaccination records. These extensions could be used to evaluate the extent to which vaccines reduce the infectivity of vaccinated infected individuals relative to non-vaccinated infected individuals, and how this reduction may diminish over time given emergent viral variants or naturally waning immunity.
Other avenues of potential investigation include assessment of the effectiveness of NPIs beyond masks, and the prevalence and dynamics of possible passenger-to-passenger infections due to indirect contact through shared surfaces and leftover aerosols in the vehicle. Finally, this analytical method paired with the communication features of rideshare platforms could facilitate a largely automated “radar” for contact tracing within the rideshare network.
The US Centers for Disease Control and Prevention may have grounds to require rideshare dataset access so that it can be merged with the list of positive COVID-19 diagnoses, strain sequencing data, vaccination records, and other relevant data. The case has been made for digital disease surveillance that maintains considerations of ethics and patient privacy.[36] Large rideshare companies such as Uber and Lyft are already sharing data with public health officials to assist with contact tracing,[23] but no standardized analytics framework is available to aggregate and derive insights from these data. While the logistics of merging rideshare and infection data at scale are complex, and their details are beyond the scope of this article, the merge could be accomplished anonymously given careful data processing.
CONCLUSION
The rideshare network of exposure is unlike any other, with tens of millions of potentially infectious connections between individuals worldwide. Unlike more general cellphone-based contact data, rideshare contacts occur in a relatively controlled environment with fairly consistent spatial dynamics, are often the only connection between individuals, and are sporadic with respect to time, location, and duration. We demonstrated, via simulations of COVID-19 propagation through a rideshare network based on Los Angeles County, that viral strains with differing SARS-CoV-2 transmission parameters lead to detectably different patterns of infections, even in the presence of limited diagnostic information. For the present and subsequent pandemics, analysis of rideshare data combined with diagnosis records may augment efforts to better understand viral transmission dynamics and to measure the changes in infectivity associated with non-pharmaceutical interventions, vaccination, or emergent viral strains.
Author contributions
David Scheinker had a primary role in conceptualization, methodology development, supervision, validation, literature search, and writing (review and editing). Conrad Safranek had a primary role in conceptualization, methodology development, software (programming for simulation and analysis), tables, figures, literature search, and writing (original draft). Both authors (Conrad Safranek and David Scheinker) verified the underlying data.
Ethics approval
Not applicable; no private data on human subjects were used in this research.
Table S1. Previously estimated (a, b, and c) and unknown (m1 and m2) transmission parameters included in the mathematical model of probability-of-infection for a given potentially infectious rideshare interaction.
Relative magnitude variables define the NPI-dependent and strain-dependent rate of viral transfer between infector and infectee.
| Direction | Parameter | Description | Units | Previously estimated value |
|---|---|---|---|---|
| P→D | m1 | Rate of virus particle transfer from infectious passenger to susceptible driver | virus particles per hour | Not Available |
| D→P | m2 | Rate of virus particle transfer from infectious driver to susceptible passenger | virus particles per hour | Not Available |
Infector's infectivity relative to symptom onset is modeled with a shifted gamma distribution.
| Direction | Parameter | Description | Units | Previously estimated value |
|---|---|---|---|---|
| P→D and D→P | a | Shape of gamma distribution | | 2.12 [CI Not Available] |
| P→D and D→P | b | Scale of gamma distribution | | 1.45 [CI Not Available] |
| P→D and D→P | c | Shift of gamma distribution (# of days patient is infectious before symptom onset) | days | 2.3 [0.8-3.0] |
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Table S2. Independent variable inputs for mathematical modeling of potentially infectious interactions within the rideshare network.
Time between infector's symptom onset and rideshare trip — hours
P→ D and D→ P
tR
Duration of potentially infectious ride (in-person transmissions) — hours
P→D and D→P
Table S3. Assumed values of the SARS-CoV-2 transmission parameters that determine interaction probability-of-infection modeling for two hypothesized test cases: previously estimated parameters (a, b, and c) and unknown, NPI-dependent, magnitude-of-viral-transfer parameters (m1 and m2) are respectively varied to represent hypothetical SARS-CoV-2 variants and propagation scenarios.
Relative magnitude variables define the NPI-dependent and strain-dependent rate of viral transfer between infector and infectee.
| Direction | Parameter | Description | Units | Variant A (baseline) | Variant A with masks | Variant B |
|---|---|---|---|---|---|---|
| P→D | m1 | Rate of virus particle transfer from infectious passenger to susceptible driver | virus particles per hour | 1.2 | 0.6 | 3.6 |
| D→P | m2 | Rate of virus particle transfer from infectious driver to susceptible passenger | virus particles per hour | 0.4 | 0.2 | 1.2 |
Infector's infectivity relative to symptom onset is modeled with a shifted gamma distribution.
| Direction | Parameter | Description | Units | Variant A (baseline) | Variant A with masks | Variant B |
|---|---|---|---|---|---|---|
| P→D and D→P | a | Shape of gamma distribution | unitless | 2.12 | 2.12 | 2.12 |
| P→D and D→P | b | Scale of gamma distribution | unitless | 1.45 | 1.45 | 0.9 |
| P→D and D→P | c | Shift of gamma distribution (# of days patient is infectious before symptom onset) | days | 2.3 | 2.3 | 3 |
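To make the shifted gamma parameterization in Table S3 concrete, the following is a minimal Python sketch that evaluates the infectivity curve for each of the three value columns. The function names, the scenario keys, and in particular `relative_exposure` (a simple product of transfer rate, ride duration, and infectivity) are illustrative assumptions; the exact probability-of-infection formula used by RIDE is not reproduced in this section.

```python
import math

def gamma_pdf(x, shape, scale):
    """Density of a gamma distribution with the given shape and scale."""
    if x <= 0:
        return 0.0
    norm = math.gamma(shape) * scale ** shape
    return x ** (shape - 1) * math.exp(-x / scale) / norm

def relative_infectivity(days_since_onset, a, b, c):
    """Shifted gamma: the curve begins c days before symptom onset."""
    return gamma_pdf(days_since_onset + c, a, b)

def peak_day(a, b, c):
    """Day of peak infectivity relative to symptom onset.

    Uses the gamma mode (a - 1) * b, which is valid for a > 1.
    """
    return (a - 1) * b - c

def relative_exposure(m, ride_hours, days_since_onset, a, b, c):
    """HYPOTHETICAL exposure index (an assumption, not the paper's formula):
    transfer rate x ride duration x relative infectivity at ride time."""
    return m * ride_hours * relative_infectivity(days_since_onset, a, b, c)

# Values copied from the first, second, and third value columns of Table S3.
SCENARIOS = {
    "scenario_1": dict(m1=1.2, m2=0.4, a=2.12, b=1.45, c=2.3),
    "scenario_2": dict(m1=0.6, m2=0.2, a=2.12, b=1.45, c=2.3),
    "scenario_3": dict(m1=3.6, m2=1.2, a=2.12, b=0.9, c=3.0),
}

for name, p in SCENARIOS.items():
    peak = peak_day(p["a"], p["b"], p["c"])
    # Passenger-to-driver index for a 17-minute ride on the day of symptom onset.
    index = relative_exposure(p["m1"], 17 / 60, 0.0, p["a"], p["b"], p["c"])
    print(f"{name}: peak at {peak:+.2f} days from onset, P->D index {index:.4f}")
```

Under these parameters the infectivity peak falls before symptom onset in every column, and halving m1 halves the hypothetical exposure index when the gamma parameters are unchanged.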
Table S4. Relevant data from the New York City Taxi and Limousine Commission was gathered and adapted to model the Los Angeles rideshare network.
Inputs to generate a hypothetical rideshare network (estimates based on New York City):
Number of rideshare drivers = 64,000 drivers (→ 0.744% of NYC's total 8.6E6 population are drivers)
Average days/month a driver works = 20.5 days (→ 68.8% chance a driver works on a given day)
Average rides given on a day a driver works = 13.63 rides/day (→ 9.38 rides/day overall average for all days of month)
Average ride duration = 17 min (distribution data unavailable, but assumed to have a lognormal distribution with log mean 2.65 min and log standard deviation 0.6053319)
Average interim period between rides = unavailable (assumed to have overall average of 12 min represented by a lognormal distribution with log mean 2.47 min and log standard deviation 0.8523)
Average total rides per day = 600,000 rides
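The derived quantities in Table S4 can be checked, and ride durations sampled, with a short Python sketch such as the one below. The variable and function names are illustrative assumptions; the numeric inputs are taken from the table, and the lognormal parameters are interpreted as the mean and standard deviation of the log of the duration in minutes.

```python
import math
import random

# Inputs copied from Table S4 (New York City estimates).
N_DRIVERS = 64_000
NYC_POPULATION = 8.6e6
P_DRIVER_WORKS = 0.688          # chance a driver works on a given day
RIDES_PER_WORKDAY = 13.63
RIDE_LOG_MEAN, RIDE_LOG_SD = 2.65, 0.6053319
GAP_LOG_MEAN, GAP_LOG_SD = 2.47, 0.8523

# Derived quantities stated in the table.
driver_fraction = N_DRIVERS / NYC_POPULATION
rides_per_day_overall = RIDES_PER_WORKDAY * P_DRIVER_WORKS
total_rides_per_day = N_DRIVERS * rides_per_day_overall

def lognormal_mean(mu, sigma):
    """Arithmetic mean of a lognormal with log-mean mu and log-sd sigma."""
    return math.exp(mu + sigma ** 2 / 2)

def lognormal_median(mu):
    """Median of a lognormal with log-mean mu."""
    return math.exp(mu)

def sample_ride_minutes(rng):
    """Draw one ride duration in minutes."""
    return rng.lognormvariate(RIDE_LOG_MEAN, RIDE_LOG_SD)

def sample_gap_minutes(rng):
    """Draw one interim period between rides in minutes."""
    return rng.lognormvariate(GAP_LOG_MEAN, GAP_LOG_SD)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
rides = [sample_ride_minutes(rng) for _ in range(20_000)]

print(f"drivers as share of population: {driver_fraction:.3%}")
print(f"overall rides per driver per day: {rides_per_day_overall:.2f}")
print(f"total rides per day: {total_rides_per_day:,.0f}")
print(f"analytic mean ride duration: {lognormal_mean(RIDE_LOG_MEAN, RIDE_LOG_SD):.1f} min")
print(f"sampled mean ride duration: {sum(rides) / len(rides):.1f} min")
print(f"interim period median: {lognormal_median(GAP_LOG_MEAN):.1f} min")
```

With the stated parameters the ride-duration distribution has an analytic mean of about 17 min, consistent with the table. For the interim period, the stated parameters give a median of about 11.8 min and an arithmetic mean of about 17 min, so the 12 min figure appears to match the median more closely than the mean.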
Table S5. Data from a 2019 PEW survey was incorporated to model the Los Angeles rideshare network.