Literature DB >> 35641521

Disaggregated data on age and sex for the first 250 days of the COVID-19 pandemic in Bucharest, Romania.

Marian-Gabriel Hâncean1, Maria Cristina Ghiță2, Matjaž Perc3,4,5,6, Jürgen Lerner7,8, Iulian Oană2, Bianca-Elena Mihăilă2, Adelina Alexandra Stoica2, David-Andrei Bunaciu2.   

Abstract

Experts worldwide have constantly been calling for high-quality open-access epidemiological data, given the fast-evolving nature of the COVID-19 pandemic. Disaggregated high-level granularity records are still scant despite being essential to corroborate the effectiveness of virus containment measures and even vaccination strategies. We provide a complete dataset containing disaggregated epidemiological information about all the COVID-19 patients officially reported during the first 250 days of the COVID-19 pandemic in Bucharest (Romania). We give the sex, age, and the COVID-19 infection confirmation date for 46,440 individual cases, between March 7th and November 11th, 2020. Additionally, we provide context-wise information such as the stringency levels of the measures taken by the Romanian authorities. We procured the data from the local public health authorities and systemized it to respond to the urgent international need of comparing observational data collected from various populations. Our dataset may help understand COVID-19 transmission in highly dense urban communities, perform virus spreading simulations, ascertain the effects of non-pharmaceutical interventions, and craft better vaccination strategies.
© 2022. The Author(s).

Entities:  

Mesh:

Year:  2022        PMID: 35641521      PMCID: PMC9156663          DOI: 10.1038/s41597-022-01374-7

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   8.501


Background & Summary

Since the onset of the pandemic, the volume of COVID-19 data made available to the public has been unprecedented[1]. Yet, disaggregated data are still scarce, incomplete, sometimes contradictory, and with little cross-country comparability. Various reports have substantiated the need for better data. It has already been pinpointed the sheer importance of disaggregated information by gender, age, geographic location, ethnicity, and other variables relevant in a national context, especially for the developing counties[2]. Some scholars have posited that failing to acknowledge the importance of disaggregated data on gender and sex may result in significant inequalities in access to health services[3]. Others have propounded that the current lack of COVID-19 disentangled data will increase the existing sex and gender data gaps, which in turn will increase gender disparities in the health and socioeconomic effects of the pandemic, with a negative impact on females[4]. Various experts worldwide have advocated the need to produce and standardise age-disaggregated health data to improve usability and cross-country comparison[5]. They have showcased that failing to do so results in misinterpreting the patterns of SARS-CoV-2 transmission among and beyond the cohort of children, the process of prioritizing vaccination, and the inspection of the secondary effects. As of January 21st, 2022, Romania has had the second-lowest vaccination rate in Europe (41% cumulative uptake of full COVID-19 vaccination scheme)[6]. On top of that, Romania is among the top ten European countries with the highest death toll (3,106 per one million people)[7]. Located in the southeast of Romania, Bucharest is the capital city and the primary economic agglomeration of the country. Services - the most impacted sector during the COVID-19 pandemic-represent the leading supplier for the local economy[8]. With a total resident population of 1,827,390[9], Bucharest ranks third, behind Paris and Athens, by population density, among the European Union country capital cities (7,917 persons per km2)[10]. The first COVID-19 case in Bucharest (case number 12 in Romania) was officially confirmed on March 7th, 2020. The patient, a male, aged 49 years old, entered Bucharest by plane on February 27th, travelling from Rome, Italy[11]. Restrictive measures were imposed, at that time, only for travellers arriving in Romania from regions in northern Italy. Evidence suggests that returning travellers from Italy had a pivotal role in the early spread of COVID-19 in Romania[12]. As of January 21st, 2022, the number of positive cases in Bucharest has reached a tally of 335,498, accounting for 17% of all 1,983,670 cases reported in the country[13]. We present the first 250 days of the coronavirus pandemic in Bucharest (between March 7th and November 11th, 2020), comprising 46,440 COVID-19 individual cases. The dataset gives for each patient: disaggregated biological sex and age, the COVID-19 infection confirmation date, the place of residence or quarantine, and the phase (stage) corresponding to the Non-Pharmaceutical Interventions (NPIs) active in Bucharest. The official measures adopted during the first 250 days critically vary in severity: from imposing a nationwide state of emergency to almost no measures at all. The NPIs have been diverse: mask-wearing, social distancing, movement restrictions (including for people aged 65+), school closures, etc. The effect of the NPIs is likely to be conditioned on the structure of the target population, on the share of specific groups within the total population[14], and on mobility and geographic factors[15]. In effect, we divide the first 250 days into five phases given the measures’ profile (the severity). All the information (excepting the NPIs phase variable) was procured, in an anonymized format, from the Bucharest Public Health Department (the Romanian Ministry of Health). Our study received ethical approval (Decision No. 1, September 1st, 2020) from the Ethics Committee of the Department of Sociology (University of Bucharest). We complied with all relevant ethical regulations. The potential re-use of our dataset is multiple: epidemiological prevalence understanding, mathematical modelling or simulations, and public health policy design or assessment. Also, it allows for measuring the effectiveness of NPIs at the city (urban community) level, which may provide a fine-grained understanding of local health policy success. A sub-sample of our dataset has already been employed to estimate the role of age and sex in spreading COVID-19 in Bucharest[16,17].

Methods

The dataset comprises 46,440 COVID-19 patients officially confirmed and reported between March 7th and November 11th, 2020, in Bucharest (before the start of the vaccination campaign). These real-world data were procured from the Bucharest Public Health Department (the BPHD), Romanian Ministry of Health. To receive the data, our team sent an official request address to the BPHD (i.e., Address No. 14870, August 18th, 2020) on behalf of the University of Bucharest. The BPHD provided the dataset based on approval No. NT3054E (August 28th, 2020) in response to our request. Our study received ethical approval (Decision No. 1, September 1st, 2020) from the Ethics Committee of the Department of Sociology (University of Bucharest). Before their being procured, the data were anonymized by the BPHD. The 46,440 observations (patients) correspond to an investigated time window of 250 days (March 7th - November 11th, 2020). Subsequently, we qualitatively analysed the NPIs advanced by the public health authorities in Romania[18] during the 250-day timeframe. Accounting for their level of severity, we derived five stringency phases. Each of the 250 days was nested in one of the five phases. Figure 1 describes the steps of producing the “COVID-19 in Bucharest” dataset (or, briefly, the dataset). For each of the 46,440 COVID-19 patients, we have records of their biological sex, age, area of residence, or quarantine, as well as their official confirmation date. Each patient also has a unique code of identification (patient_ID) assigned by the BPHD. These records are marked in red in Fig. 1. Information about the NPIs corresponding to the first 250-day time interval was retrieved from official press releases uploaded on the webpage of the Romanian Ministry of Internal Affairs[18] and from newsletters published in the “Press releases” section on the webpage of the Romanian Ministry of Public Health[19]. This information was expressed in the form of the “phase” variable (marked in blue in Fig. 1). Additionally, our team derived new variables from the original data records provided by the BPHD, i.e., observation number, age_groups, month, week, and 14-day interval. These variables are marked in green in Fig. 1. Our team performed various technical validation checks on the “COVID-19 in Bucharest” dataset (plausibility, completeness, and conformance checks). These are presented in greater detail in the “Technical validation” section. For brevity, we only mention here that during the technical validation stage, we compared the data records in our dataset to the information enclosed in other public available datasets[20-22].
Fig. 1

The sequence of steps taken to produce the dataset.

The sequence of steps taken to produce the dataset. We used the severity degree of the NPIs to delineate the stringency phases. In Fig. 2, for informative and illustrative purposes, we represent the evolution of the COVID-19 cases in a longitudinal fashion (between March 7th and November 11th, 2020) by major NPIs, stringency phases, sex, and age groups. The stringency phases are displayed in temporal order. The information displayed in the main content and Fig. 2 represents the authors’ contribution.
Fig. 2

The evolution of COVID-19 confirmed cases between March 7th and November 11th, 2020, in Bucharest, Romania. We illustrate the new daily cases by sex (a) and age-groups (b) while accounting for the stringency phases. The rendered information represents the authors’ contribution.

The evolution of COVID-19 confirmed cases between March 7th and November 11th, 2020, in Bucharest, Romania. We illustrate the new daily cases by sex (a) and age-groups (b) while accounting for the stringency phases. The rendered information represents the authors’ contribution. Stringency “Phase 1” ranges between March 7th and March 15th, 2020, and corresponds to “low to moderate measures.” Forty-seven COVID-19 cases are confirmed during this phase. The major NPIs until March 15th are: restriction of outdoor or indoor public and private events to 1,000 participants (March 8th), cessation of flights (March 9th), bus rides (March 10th), and railway travel (March 12th) to and from Italy, suspension of face-to-face classes (March 11th) in all pre-university level schools and some universities, limitation of the number of participants in indoor cultural, scientific, religious and sports activities to 50 (March 11th). “Phase 2”, matching the most severe measures, corresponds to “the state of emergency in Romania” and ranges between March 16th and May 14th, 2020. One thousand five hundred fifty-one cases are confirmed amid these dates. The preliminary NPIs adopted during the state of emergency are: an extension of the suspension of in-person classes; permission of only takeaway and delivery services in restaurants and shopping malls; closure of hotels and clubs; prohibition of indoor cultural, scientific, religious, and sports events; the restriction of outdoor personal events to 100 participants; the cessation of flights to and from Spain. On March 24th, the military is deployed to help enforce a national lockdown. Movement outside the household is generally prohibited for non-essential purchases, with persons over 65 having their outdoor activity restricted, at first, to a two-hour interval (March 25th) and, then, to a three-hour interval (March 29th). Starting April 27th, non-essential movement outside the household is permitted for persons above the age of 65 both in the morning (7–11 am) and during the evening (7–10 pm). “Phase 3”, with moderate measures, ranges between May 15th and June 16th, 2020. Nine hundred thirty-two cases are confirmed during this interval. May 15th marks the onset of the first “state of alert”, allowing hairdressing salons, barbershops, and dental clinics to reopen. Facial masks generally become mandatory in indoor public spaces, public transportation included. From June 1st, gradually, some of the movement restrictions are lifted, outdoor concerts and cultural events are permitted, and restaurant terraces reopen. “Phase 4”, with 18,499 confirmed cases, is a phase of the least stringent measures, ranging between June 17th and October 6th, 2020. Since June 17th, shopping malls, public nurseries, and pre-schools are allowed to reopen. Pre-university level schools reopen on September 14th. Local elections are held on September 27th, with a participation rate of 37% of the 18+ Bucharest population[23]. For the entire phase, facial masks remain mandatory. “Phase 5”, with low to moderate measures, ranges between October 7th and November 11th, 2020. Authorities confirm 25,274 COVID-19 cases during this phase. As of October 7th, theatres, cinemas, and show venues are closed in Bucharest, while restaurants are restricted to outdoor service only. Opening hours of stores are limited, and a night curfew is being imposed. Starting November 9th, all in-person classes are suspended. Masks are mandatory in both outdoor and indoor places, working spaces included. Public and private gatherings are cancelled. Table 1 displays the major NPIs by stringency phase and time interval.
Table 1

GANTT table displaying the major Non-Pharmaceutical Interventions during the first 250 days of the COVID-19 pandemic in Bucharest.

Categories of measuresMeasuresTime intervalCorresponding stringency phases
12345
International travel controlsBan of flights from/to ItalyMar. 9th–Jun. 23rd
Ban of bus rides from/to ItalyMar. 10th–May 22nd
Ban of flights from/to SpainMar. 16th–Jul. 7th
Service restrictionsHotel closureMar. 16th–May 14th
Night club closureMar. 16thNov.11th a
Closure of hairdressing salons, barbershops & dental clinicsMar. 16th–May 14th
Takeaway only & delivery services (in restaurants & malls)Mar. 16th–Jun. 16th
Closure of restaurants & coffee shopsMar. 17th–Jun. 15th
Restaurants restricted to outdoor serviceOct. 7thNov. 11th
Opening hours of stores are limitedOct. 7thNov. 11th
Stay home orderNational lockdownMar. 24th–May 14th
Lockdown enforced by the armyMar. 24th–May 14th
Night curfewOct. 7thNov. 11th
Internal movement restrictionsProhibition of movementbMar. 16th–May 14th
Outdoor activity restrictions for people aged 65+cMar. 25th–Mar. 28th
Adjusted outdoor activity restrictions for people aged 65+dMar. 29th–Apr. 26th
Adjusted outdoor activity restrictions for people aged 65+eApr. 27th–May 14th
Facial coveringsFacial masks and gloves are mandatoryfMay 14thNov. 11th
Events & gatheringsRestriction of outdoor activities and eventsgMar. 8th–Mar.10th
Restriction of indoor and private events to 1,000 participantsMar. 8th–Mar. 10th
Adjusted restriction of indoor eventshMar. 11th–Mar. 16th
Restriction of outdoor activities and events to 100 participantsMar. 11th–Mar. 21st
Prohibition of indoor religious eventsMar. 16th–Jun. 16th
Prohibition of indoor sports eventsMar. 16th–Jun. 1st
Restriction of gatherings of more than three personsMar. 22nd–May 31st
Prohibition of indoor cultural & scientific eventsMar. 16th–Aug. 31st
Re-closure of theatres, cinemas, and show venuesOct. 7thNov. 11th
School closureSuspension of face to face university classesMar. 11thNov. 11th
Suspension of face to face pre-university classesMar. 11th–Sep. 14th
Suspension of face to face pre-university classesNov. 9thNov. 11th

aNovember 11th is not the date until a measure is effective, but it is the last day in our dataset. bMovement outside the household is prohibited for non-essential purchases. cOutdoor activity for persons over 65 is restricted to a two-hour interval. dOutdoor activity for persons over 65 is limited to a three-hour interval. eNon-essential movement outside the household is permitted for persons above the age of 65, both in the morning (7–11 am) and during the evening (7–10 pm). fFacial masks and gloves are mandatory in public indoor spaces (public transportation included). gRestriction of outdoor activities and events to 1,000 participants. hLimitation of the number of participants in indoor cultural, scientific, religious, and sports activities to 50. The information illustrated in the table represents the authors’ contribution.

GANTT table displaying the major Non-Pharmaceutical Interventions during the first 250 days of the COVID-19 pandemic in Bucharest. aNovember 11th is not the date until a measure is effective, but it is the last day in our dataset. bMovement outside the household is prohibited for non-essential purchases. cOutdoor activity for persons over 65 is restricted to a two-hour interval. dOutdoor activity for persons over 65 is limited to a three-hour interval. eNon-essential movement outside the household is permitted for persons above the age of 65, both in the morning (7–11 am) and during the evening (7–10 pm). fFacial masks and gloves are mandatory in public indoor spaces (public transportation included). gRestriction of outdoor activities and events to 1,000 participants. hLimitation of the number of participants in indoor cultural, scientific, religious, and sports activities to 50. The information illustrated in the table represents the authors’ contribution.

Data Records

We deposited a copy of the dataset (i.e., the Bucharest COVID-19 dataset) to the generalist repository figshare[24]. Data are available in an Excel file format (.xlsx), facilitating importation into various statistical software programs such as R, Python, SPSS, Stata, SAS, or conversion to comma-separated value format (.csv). In the database, the rows designate unique individual observations (COVID-19 confirmed cases). Each observation is assigned a number from 1 to 46,440 (the total number of cases) and ascendingly ordered. Overall, the dataset renders information about all the officially COVID-19 confirmed individual cases in Bucharest, since the onset of the pandemic in the city, on March 7th, till November 11th, 2020. Specifically, we give information on the COVID-19 confirmation date and each patient’s sex, age, and geographical (administrative) location (the administrative district in Bucharest). We also give the patients’ ID codes as provided by the BPHD. These ID codes are helpful for joining (linking) this dataset to other available datasets[16,17]. Additionally, we derive four new variables from the original information. Namely, based on its COVID-19 confirmation date, each observation is nested in a (calendar) month, week, 14-day interval, and stringency phase. We make available the dataset in the English language to increase its international usage by health professionals, policy-makers, scientists, and other interested parties. We use self-explanatory and straightforward variable labels and values for users’ convenience. Also, we mark missing data by “NA” codes. We continue this section with a detailed description of all variables (data fields) available in the dataset (Table 2).
Table 2

The data fields (variables) included in the Bucharest COVID-19 dataset.

NoCategoryData field (variable labels)Data format
1Observation numberobservation_numberNumber (e.g., “0”, “1”, “2”, …)
2Identificationpatient_IDaNumber (e.g., “12”, “14”, “16”, …)
3Epidemiologicalconfirmation_dateaDate (MM-DD-YYYY)
4monthbText (e.g., “March”, “April”, “May”…)
5weekbAlpha-numerical text (e.g., “w10”, “w11”, …)
614_day_intervalbAlpha-numerical text (e.g., “w10_w11”, “w12_w13”, …)
7Demographic informationsexaText (i.e., “male”, “female”)
8ageaNumber (e.g., “0”, “1”, “2”, …)
9age_groupsbAlpha-numerical text (e.g., “0–4 y.o.”, “5–9 y.o.”, …)
10districtaAlpha-numerical text (e.g., “Bucharest”, “District_1”, “District_2”, etc.)
11Health policystringency_phasecAlpha-numerical text (e.g., “phase_1”, “phase_2”, …)

Data sources: aBucharest Public Health Department, bAuthors, using the information retrieved from the Bucharest Public Health Department, cAuthors, using the information retrieved from the Romanian Ministry of Internal Affairs.

The data fields (variables) included in the Bucharest COVID-19 dataset. Data sources: aBucharest Public Health Department, bAuthors, using the information retrieved from the Bucharest Public Health Department, cAuthors, using the information retrieved from the Romanian Ministry of Internal Affairs.

Observation_number

There are 46,440 unique data entries (observations or patients) in the dataset. Therefore, the variable takes values from 1 to 46,440. The observation numbers are ascendingly sorted. These numbers do not reflect epidemiological progression and should not be used for that purpose. Instead, this variable uniquely identifies each observation in the dataset.

Patient_ID

The BPHD had assigned each COVID-19 patient a numerical code due to the anonymization process. We render these patient ID codes as it is useful for joining (linking) the dataset to other available datasets[16,17].

Confirmation_date

For each patient, this represents the official date of test response (i.e., COVID-19 confirmation date). The dataset includes only positive tests. A number of 137 cases are reported with missing confirmation dates (marked by NA). These missing data account for 0.3% of all 46,440 patients. We record the time related to the confirmation dates in the “MM/DD/YYYY” format (“month, day, year”, in left-to-right writing direction), e.g., “03/07/2020”. The maximum number of cases confirmed in a day is 1,203 (on November 3rd, 2020). Between March 7th and November 11th, 2020, there are 250 days (confirmation dates) in which the authorities report positive cases.

Sex

This represents the biological sex of the tested individuals (patients). The dataset comprises 24,696 female patients (53%) and 21,744 male patients (47%). We report no missing cases on this variable. The variable takes two values: “male” and “female”.

Age

This variable captures the equivalent of age in completed years, with values ranging from 0 (less than one year of age) to 101. The average value is 43.4, the median is 43.0, and the most frequent value within the dataset is 52.0. The standard deviation is 17.7. Thirty-six cases have missing data (these represent less than 0.1% of all cases). In the dataset, we mark missing values by NA.

Age_groups

We recode the “age” variable into five-year age groups (brackets), with categories ranging from 0–4 y.o. to 85+ y.o. We derive a total of 18 groups. The category with the highest absolute frequency is the 40–44 y.o. group (namely, 5,650 positive cases correspond to this age group, accounting for roughly 12% of all data entries). Thirty-six patients have missing information (these represent less than 0.1% of all cases). In the dataset, we mark missing values by NA.

District

This variable refers to the six administrative units (or sectors) forming the Municipality of Bucharest, each governed by a mayor. The six districts have a clockwise geographic arrangement. For instance, District 1 is located in the north, District 4 in the south, and District 6 in the west of the city (Fig. 3). The exact district is not available for 8,043 cases (that is 17.3% of all the data entries). For these cases, we add a generic location – “Bucharest”. The “district” marks the place of residence or quarantine for a specific individual case. Due to disclosure reasons, the BPHD did not disentangle the residence from the quarantine place. The variable takes the following values: “District_1”, “District_2”, “District_3”, “District_4”, “District_5”, “District_6”, and “Bucharest”.
Fig. 3

The administrative organization of the Municipality of Bucharest into six districts.

The administrative organization of the Municipality of Bucharest into six districts.

Month

We derive this variable from the “confirmation_date” variable. We assign each “confirmation_date” to a “month” (each date is nested in a month). Consequently, we obtain nine months of investigation (March – November 2020). Two of these months are incomplete: March and November 2020. Further, October has the highest absolute frequency of observations (19,429 cases accounting for 41.8% of all cases). Also, in the dataset, we have 137 data entries with missing data that are marked by NA. The variable takes the following values: “March”, “April”, “May”, “June”, “July”, “August”, “September”, “October”, and “November”.

Week

We derive this variable from the “confirmation_date” variable. We assign each “confirmation_date” to a “week” (each date is nested in a week). The variable takes as values the week number (e.g., w10, w11, w12, …, w46). Week counting starts from the beginning of the year (2020) – the first week of 2020 is January 1st – January 4th. We consider each week begins with Sunday. In our dataset, the pandemic onset in Bucharest is in week 10 (i.e., w10). The largest number of reported cases is in week 45 (6,105 observations accounting for 13% of all cases). Also, in the dataset, we have 137 data entries with missing data that are marked by NA.

14_day_interval

We derive this variable from the “confirmation_date” variable. We assign each “confirmation_date” to a 14-day-time-interval or two-week time window (each date is nested in a two-week time interval). We build this variable for potential analysis purposes. The variable takes as values week numbers (e.g., “w10_w11”, “w12_w13”, “w14_w15”, “w16_w17”, …, “w44_w45”, “w46_w47”). We showcase that “w10_w11” and “w46_w47” (the beginning and the end of the time window) are incomplete. We notice that the minimum number of unique COVID-19 confirmed cases (i.e., 40 representing less than 0.1% of all observations) is in weeks 10–11. Also, the maximum number of cases is in weeks 44–45 (12,045 accounting for 26% of all observations). We have 137 data entries with missing data marked by NA in the dataset.

Stringency_phase

We derive this variable from the “confirmation_date” variable. We assign each “confirmation_date” to a “stringency phase” (each date is nested in a phase). This variable takes the following values: “phase_1”, “phase_2”, “phase_3”, “phase_4”, and “phase_5”. We define these stringency phases based on the various NPIs adopted by the authorities. These five phases are consistently different in terms of the stringency of the adopted measures. “Phase_1” covers nine days: between March 7th (the first confirmed case in Bucharest) and March 15th (the last day before the state of emergency). “Phase_2” corresponds to a 60-day time window between March 16th and May 14th (the entire period of the state of emergency). “Phase_3” covers a time frame of 33 days between May 15th and June 16th (the first state of alert). “Phase_4” has a 112-day duration equivalent to the period of relaxation (between June 17th and October 6th). “Phase_5” covers 36 days and corresponds to new restrictive measures (October 7th and November 11th). Also, in the dataset, we have 137 data entries with missing data that are marked by NA.

Technical Validation

Before being transferred to our team, the data were collected, curated, and anonymized by the Bucharest Public Health Department (BPHD), The Romanian Ministry of Health. The BPHD performed data anonymization to solely safeguard personal data and not to infringe on the data reliability or correctness and comprehensiveness. The BPHD is a Romanian public authority mandated to develop public health policies and programs, devise preventive measures and collect public health statistics. In this regard, the BPHD is a reliable official source of data. The authors do not have details of the collection of the epidemiological data and, therefore, cannot assess the reliability of the dataset acquired from the BPHD. After receiving the dataset from the BPHD, the authors performed additional checks to ensure the technical quality. Firstly, we implemented plausibility checks, looking for duplicate cases. We identified 118 duplicate cases - with identical values on all variables. These cases were eventually removed from the dataset. Secondly, we performed completeness checks. Namely, we closely examined the variables in searching for unavailable information. We detected less than 0.3% missing values in relation to two variables, i.e., confirmation_date and age (specifically, 137 and 36 cases, respectively). We did not employ any imputation technique for these missing observations. Subsequently, we marked the missing data with “NA”. Thirdly, we executed conformance checks and compared the information embedded in our dataset to available alternative data. To that end, we deployed three comparisons. We compared the Bucharest COVID-19 dataset against the first 147 disaggregated records officially announced by the health authorities at the onset of the pandemic in Romania[12,22]. For each of the 147 records, the authorities provided various pieces of information: the confirmation date, patients’ age, sex, residence, probable contacts, and travel history. A human-to-human network analysis of these first 147 records is available in the literature[12]. Further, we compared our dataset to the total population of COVID-19 cases officially reported in Romania for the same time window (March 7th - November 11th, 2020). We illustrate the two-time series in Table 3. Also, we compared our dataset to the most recently updated data structure of the resident population in Bucharest (as of July 1st, 2020)[20]. Tables 4 and 5 illustrate the comparison between the Bucharest COVID-19 dataset and the structure of the Bucharest resident population. Eventually, we compared the stringency phase variable from our dataset to the stringency index developed by the University of Oxford[25].
Table 3

The distribution of the COVID-19 confirmed cases in Romania vs the distribution of the COVID-19 confirmed cases in Bucharest, by weeks and phases, between March 7th and November 11th, 2020.

Time intervalCOVID-19 cases RomaniaCOVID-19 cases Romania (%)COVID-19 cases in BucharestCOVID-19 cases in Bucharest (%)% differences
Weeks 10–11830.0400.10.1
Phase 1 (07.0315.03.)1070.0470.10.1
Weeks 12–131,2030.43210.70.3
Weeks 14–154,1751.34591.0−0.3
Weeks 16–174,9501.64030.9−0.7
Weeks 18–194,3941.43120.7−0.7
Weeks 20–212,9010.92530.5−0.4
Phase 2 (16.0314.05.)15,8895.11,5513.3−1.8
Weeks 22–232,3910.84591.00.2
Weeks 24–253,2971.04851.00.0
Phase 3 (15.0516.06.)6,1632.09322.00.0
Weeks 26–274,7661.56811.50.0
Weeks 28–297,6362.41,0592.3−0.1
Weeks 30–3115,0844.81,9744.3−0.5
Weeks 32–3317,1605.52,1674.7−0.8
Weeks 34–3516,4225.22,4265.20.0
Weeks 36–3716,6075.32,6775.80.5
Weeks 38–3918,6085.93,4377.41.5
Weeks 40–4129,2039.36,30713.64.3
Phase 4 (17.0606.10.)115,32636.718,49940.03.3
Weeks 42–4352,14616.68,53818.41.8
Weeks 44–4586,03027.412,04526.0−1.4
Weeks 46–4727,2338.72,2604.9−3.8
Phase 5 (07.1011.11.)176,80456.225,27454.6−1.6
Total cases314,289100.046,303100.0

Calculations are made per time slot (weeks and phases, respectively). Percentages are calculated per column. Percentage (%) differences represent the difference between percentages in the Bucharest COVID-19 dataset and percentages in the population of COVID-19 cases in Romania. Out of the 46,440 cases, 46,303 have complete information on the infection confirmation date variable and served for recoding purposes (column “COVID-19 cases in Bucharest”).

Table 4

The resident population of Bucharest (as of July 1st, 2020) vs the COVID-19 confirmed cases in Bucharest (March 7th - November 11th, 2020) by sex and age groups.

VariablesPopulation (freq.)Population (%)COVID-19 cases (freq.)COVID-19 cases (%)% differences
Sexmale850,00946.521,74446.80.3
female977,38153.524,69653.2−0.3
Age groups0–4 y.o.104,4425.77211.5−4.2
5–9 y.o.90,2784.96851.5−3.4
10–14 y.o.84,2334.69322.0−2.6
15–19 y.o.63,0343.41,1342.4−1.0
20–24 y.o.60,3583.32,4665.32.0
25–29 y.o.101,5645.63,7068.02.4
30–34 y.o.187,93210.35,11711.00.7
35–39 y.o.171,6099.45,00710.81.4
40–44 y.o.170,1079.35,65012.22.9
45–49 y.o.129,2657.14,62110.02.9
50–54 y.o.141,3157.75,04810.93.2
55–59 y.o.86,9224.82,8636.21.4
60–64 y.o.117,6736.52,6105.6−0.9
65–69 y.o.114,1286.22,1254.6−1.6
70–74 y.o.78,4394.31,3692.9−1.4
75–79 y.o.48,0212.68641.9−0.7
80–84 y.o.41,3432.37721.7−0.6
85+ y.o.36,7272.07141.4−0.6
NA360.10.1
Total1,827,390100.046,440100.0

Percentages (%) are calculated per column. Percentage (%) differences represent the difference between percentages in the Bucharest COVID-19 dataset and percentages in the resident population of Bucharest.

Table 5

The resident population of Bucharest (as of July 1st, 2020) vs the COVID-19 confirmed cases in Bucharest (March 7th - November 11th, 2020) by sex and age combined.

SexAge groupsPopulation (freq.)Population (%)COVID-19 cases (freq.)COVID-19 cases (%)% differences
Male0–4 y.o.53,7596.34161.9−4.4
5–9 y.o.46,6635.53261.5−4.0
10–14 y.o.43,4125.14742.2−2.9
15–19 y.o.32,7993.96162.8−1.1
20–24 y.o.30,3673.61,1705.41.8
25–29 y.o.47,0515.51,6797.72.2
30–34 y.o.88,41710.42,53811.71.3
35–39 y.o.84,5269.92,39711.01.1
40–44 y.o.83,9519.92,64912.22.3
45–49 y.o.62,6347.42,0489.52.1
50–54 y.o.66,4877.82,19410.12.3
55–59 y.o.39,3194.61,2885.91.3
60–64 y.o.50,0905.91,2785.90.0
65–69 y.o.47,9065.61,0544.8−0.8
70–74 y.o.31,4093.76633.0−0.7
75–79 y.o.17,1342.03991.8−0.2
80–84 y.o.13,2261.62961.4−0.2
85+ y.o.10,8591.32361.1−0.2
NA230.1
Total850,009100.021,744100.00.0
Female0–4 y.o.50,6835.23051.2−4.0
5–9 y.o.43,6154.53591.5−3.0
10–14 y.o.40,8214.24581.8−2.4
15–19 y.o.30,2353.15182.1−1.0
20–24 y.o.29,9913.11,2965.22.1
25–29 y.o.54,5135.52,0278.22.7
30–34 y.o.99,51510.22,57910.40.2
35–39 y.o.87,0838.92,61010.61.7
40–44 y.o.86,1568.83,00112.23.4
45–49 y.o.66,6316.82,57310.43.6
50–54 y.o.74,8287.62,85411.64.0
55–59 y.o.47,6034.91,5756.41.5
60–64 y.o.67,5836.91,3325.4−1.5
65–69 y.o.66,2226.81,0714.3−2.5
70–74 y.o.47,0304.87062.9−1.9
75–79 y.o.30,8873.24651.9−1.3
80–84 y.o.28,1172.94761.9−1.0
85+ y.o.25,8682.64781.9−0.7
NA130.1
Total977,381100.024,696100.00.0

Percentages (%) are calculated per column. Percentage (%) differences represent the difference between percentages in the Bucharest COVID-19 dataset and percentages in the resident population of Bucharest.

The distribution of the COVID-19 confirmed cases in Romania vs the distribution of the COVID-19 confirmed cases in Bucharest, by weeks and phases, between March 7th and November 11th, 2020. Calculations are made per time slot (weeks and phases, respectively). Percentages are calculated per column. Percentage (%) differences represent the difference between percentages in the Bucharest COVID-19 dataset and percentages in the population of COVID-19 cases in Romania. Out of the 46,440 cases, 46,303 have complete information on the infection confirmation date variable and served for recoding purposes (column “COVID-19 cases in Bucharest”). The resident population of Bucharest (as of July 1st, 2020) vs the COVID-19 confirmed cases in Bucharest (March 7th - November 11th, 2020) by sex and age groups. Percentages (%) are calculated per column. Percentage (%) differences represent the difference between percentages in the Bucharest COVID-19 dataset and percentages in the resident population of Bucharest. The resident population of Bucharest (as of July 1st, 2020) vs the COVID-19 confirmed cases in Bucharest (March 7th - November 11th, 2020) by sex and age combined. Percentages (%) are calculated per column. Percentage (%) differences represent the difference between percentages in the Bucharest COVID-19 dataset and percentages in the resident population of Bucharest. Supplementary, we deployed data type checks to ensure that the data entered had the correct data type. For instance, we examined whether the age variable contains only numerical values. Further, we ran a range check to verify whether the values taken by our variables fall within a predefined range. For example, whether the age variable has taken a reasonable range of values. Finally, we performed a format check to ensure that our variables had the predefined format. For instance, we assessed whether the values taken by the confirmation_date variable are all stored in the same fixed format, i.e., MM/DD/YYYY.

Usage Notes

Our data records illustrate the COVID-19 prevalence in an urban community (Bucharest) for the first 250 days by providing a high-level granularity dataset. Precisely, the dataset comprises individual-level covariates, such as the age and sex of the officially confirmed patients, in a longitudinal (daily) fashion. We hope to make a contribution to the current international efforts of coalescing disaggregated empirical evidence on the spread of COVID-19. Our dataset may prove a valuable instrument for public health experts, policy-makers, scientists, and even journalists interested in assessing and better understanding COVID-19 spread in urban communities, especially before introducing the vaccines. For example, our data may demonstrate its utility in informing the efforts of scientists to statistical model or simulate the spread of diseases, in general, and of respiratory viruses, in particular. Our disaggregated observations may assist public health experts in building a comprehensive picture of the epidemiological situation. Moreover, it may help scientists establish (confirm) causal inferences in virus circulation patterns and solve potential problems, such as Simpson’s paradox[26]. Furthermore, this dataset is suitable for European or global comparisons as it comprises individual-level cases that allow for standardization. If our dataset is supplemented with compatible information available from other sources[16,22,27], it may prove fruitful for gearing pharmaceutical interventions (e.g., vaccination efforts and strategies). For example, a subsample of this dataset has been partly used to estimate the role of age in spreading COVID-19 across a social network in Bucharest[16,17]. The age and sex of the patients confirmed positive between August 1st and October 31st, 2020, were input into relational hyperevent models[28,29]. Precisely, these two covariates were combined with network data (human-to-human transmission chains) to test for age and sex homophily effects[17]. Additionally, the variables embedded in our dataset may also be used, as covariates, in conjunction with network data, for estimating Exponential Random Graph Models (ERGMs)[15,30]. Furthermore, the current dataset may prove its utility in assessing the impact of the NPIs advanced by the local authorities in Bucharest between March 7th and November 11th, 2020. Various insights concerning the effects of the NPIs may be inferred using the sex and age covariates. For instance, the information in Table 6 implies significantly higher shares of COVID-19 cases among females when the stringency levels of the NPIs are higher. Further, the evidence exhibited in Table 6 may support the very few previous studies[31] that claim the average age of COVID-19 patients decreases and stabilizes over time.
Table 6

The distribution of COVID-19 confirmed cases in Bucharest (March 7th - November 11th, 2020), by sex and age, on weeks and phases.

Time interval% femaleaverage ageage std. deviationno. of cases
Weeks 10–1145.042.013.340
Phase 1 (07.03–15.03.)44.740.114.447
Weeks 12–1353.646.317.5321
Weeks 14–1558.445.917.6459
Weeks 16–1758.847.816.7403
Weeks 18–1957.748.918.7312
Weeks 20–2157.351.017.7253
Phase 2 (16.03–14.05.)57.547.417.61,551
Weeks 22–2361.949.119.2459
Weeks 24–2558.146.017.7485
Phase 3 (15.05–16.06.)59.448.418.4932
Weeks 26–2753.545.317.6681
Weeks 28–2951.444.118.21,059
Weeks 30–3151.942.418.01,974
Weeks 32–3351.843.118.02,167
Weeks 34–3552.643.118.32,426
Weeks 36–3751.343.117.82,677
Weeks 38–3951.842.518.23,437
Weeks 40–4153.642.717.86,307
Phase 4 (17.06–06.10.)52.343.118.118,499
Weeks 42–4352.942.717.58,538
Weeks 44–4553.543.617.312,045
Weeks 46–4753.744.417.82,260
Phase 5 (07.10–11.11.)53.443.217.425,274

Calculations are made per week and phase. The number of cases in the last column does not include missing values. In the first column, statistically significant differences (p < 0.05) were marked in bold.

The distribution of COVID-19 confirmed cases in Bucharest (March 7th - November 11th, 2020), by sex and age, on weeks and phases. Calculations are made per week and phase. The number of cases in the last column does not include missing values. In the first column, statistically significant differences (p < 0.05) were marked in bold. Our dataset may also provide a better understanding of the susceptibility to infection by biological sex. Already available scientific evidence has pointed to lower treatment efficiency[32], greater rates of hospitalization[33], higher probability of intensive treatment[34], and a higher risk of death for males[35-37]. Still, the evidence is mixed when looking at confirmed cases. Earlier reports find a sex imbalance, with COVID-19 male patients having a greater risk of infection[38,39]. However, more recent studies uncover no difference between males and females regarding susceptibility to infection[40,41]. We find in our dataset that, overall, significantly more females were confirmed with COVID-19 than males (χ2 = 187.64, df = 1, p = 0.000). Approximately 53.2% of all confirmed cases were females. The high level of detail in our data shows how males and females were affected during the first 250 days of the pandemic in Bucharest. For illustrative purposes, we report the differences between the Bucharest COVID-19 dataset and the resident population of Bucharest, by sex and age groups, at a 14-day time interval (Fig. 4). For females, differences range from −6.8 to +10.6, while for males, from −6.3 to +7.3. In Fig. 4, bright yellow colours designate high positive differences (more COVID-19 cases than we would expect if compared with the total population), and dark blue colours designate high negative differences (fewer COVID-19 cases than what we would expect if compared with the total population). Negative differences were found in the 0–19 age group and the 70+ age group, irrespective of the sex and confirmation date. Furthermore, the dataset provides evidence for a disproportionate impact of COVID-19 on sex, during the first few weeks of the pandemic. Throughout weeks 12 to 21, significantly high positive differences can be noticed for females aged between 40 and 54. More adult age females were tested positive during the state of emergency than we would expect compared to their share in the total resident population. The effect is not visible in the case of men. In sum, these results may reveal occupational segregation and, consequently, give support to existing reports about the unbalanced composition of the global health workforce (with females representing about 70%[42]).
Fig. 4

Statistically significant differences between the structure of COVID-19 confirmed cases in Bucharest and the structure of the resident population of Bucharest. We illustrate the significant differences by age groups and sex: females (a) and males (b). Brighter yellow colours designate high positive differences (more COVID-19 cases than expected when compared with the total population) and dark blue colours designate high negative differences (fewer COVID-19 cases than expected when compared with the total population).

Statistically significant differences between the structure of COVID-19 confirmed cases in Bucharest and the structure of the resident population of Bucharest. We illustrate the significant differences by age groups and sex: females (a) and males (b). Brighter yellow colours designate high positive differences (more COVID-19 cases than expected when compared with the total population) and dark blue colours designate high negative differences (fewer COVID-19 cases than expected when compared with the total population). Comparisons of the disaggregated COVID-19 data to population parameters are expected to have a critical role in designing health policies. The scientific community has already documented this need for demographically informed decisions, stressing the importance of interlinking the stringency and content of COVID-19 NPIs with key figures of the population[43]. For instance, school closures and curfew for individuals aged 65+ are expected to produce different outcomes depending on the structure of the population of interest. Moreover, sex and age disaggregated data are expected to guide crafting strategies for vaccination[44-46]. Last but not least, our dataset provides location details (for 83% of the cases) – see: the “district” variable, which coupled with the “confirmation date” records, could provide a spatiotemporal image for the first 250 days of the COVID-19 pandemic. In conclusion, we argue that the present disaggregated dataset can significantly improve the accuracy and effectiveness of NPIs, especially in countries with low vaccinations rates. Moreover, we deem that the qualitative scale of stringency (Fig. 1, Table 1 and the corresponding main content) is sufficiently justified and detailed that future researchers could use this, to extract, or modify it for their own purpose of exploration. Also, the modelling of COVID-19 spreading in this micro-context may be performed by corroborating our dataset with the detailed evidence available with the COVID-19 Border Accountability Project (COBAP)[47].
Measurement(s)COVID-19 cases • Demographic individual-level data • Nonpharmaceutical interventions
Technology Type(s)manual curation, checks for plausibility and completeness • manual curation, checks for plausibility, completeness, conformance • manual curation, data collection, checks for plausibility, completeness, conformance
Sample Characteristic - OrganismHomo sapiens
Sample Characteristic - EnvironmentUrban community
Sample Characteristic - LocationBucharest, Romania
  24 in total

1.  Data explosion during COVID-19: A call for collaboration with the tech industry & data scrutiny.

Authors:  Elizabeth M Hechenbleikner; Daniel V Samarov; Ed Lin
Journal:  EClinicalMedicine       Date:  2020-05-24

2.  Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia.

Authors:  Qun Li; Xuhua Guan; Peng Wu; Xiaoye Wang; Lei Zhou; Yeqing Tong; Ruiqi Ren; Kathy S M Leung; Eric H Y Lau; Jessica Y Wong; Xuesen Xing; Nijuan Xiang; Yang Wu; Chao Li; Qi Chen; Dan Li; Tian Liu; Jing Zhao; Man Liu; Wenxiao Tu; Chuding Chen; Lianmei Jin; Rui Yang; Qi Wang; Suhua Zhou; Rui Wang; Hui Liu; Yinbo Luo; Yuan Liu; Ge Shao; Huan Li; Zhongfa Tao; Yang Yang; Zhiqiang Deng; Boxi Liu; Zhitao Ma; Yanping Zhang; Guoqing Shi; Tommy T Y Lam; Joseph T Wu; George F Gao; Benjamin J Cowling; Bo Yang; Gabriel M Leung; Zijian Feng
Journal:  N Engl J Med       Date:  2020-01-29       Impact factor: 176.079

3.  Demographic science aids in understanding the spread and fatality rates of COVID-19.

Authors:  Jennifer Beam Dowd; Liliana Andriano; David M Brazel; Valentina Rotondi; Per Block; Xuejie Ding; Yan Liu; Melinda C Mills
Journal:  Proc Natl Acad Sci U S A       Date:  2020-04-16       Impact factor: 11.205

4.  Clinical Characteristics of Coronavirus Disease 2019 in China.

Authors:  Wei-Jie Guan; Zheng-Yi Ni; Yu Hu; Wen-Hua Liang; Chun-Quan Ou; Jian-Xing He; Lei Liu; Hong Shan; Chun-Liang Lei; David S C Hui; Bin Du; Lan-Juan Li; Guang Zeng; Kwok-Yung Yuen; Ru-Chong Chen; Chun-Li Tang; Tao Wang; Ping-Yan Chen; Jie Xiang; Shi-Yue Li; Jin-Lin Wang; Zi-Jing Liang; Yi-Xiang Peng; Li Wei; Yong Liu; Ya-Hua Hu; Peng Peng; Jian-Ming Wang; Ji-Yang Liu; Zhong Chen; Gang Li; Zhi-Jian Zheng; Shao-Qin Qiu; Jie Luo; Chang-Jiang Ye; Shao-Yong Zhu; Nan-Shan Zhong
Journal:  N Engl J Med       Date:  2020-02-28       Impact factor: 91.245

5.  Sex differences in susceptibility, severity, and outcomes of coronavirus disease 2019: Cross-sectional analysis from a diverse US metropolitan area.

Authors:  Farhaan S Vahidy; Alan P Pan; Hilda Ahnstedt; Yashasvee Munshi; Huimahn A Choi; Yordanos Tiruneh; Khurram Nasir; Bita A Kash; Julia D Andrieni; Louise D McCullough
Journal:  PLoS One       Date:  2021-01-13       Impact factor: 3.240

Review 6.  A call for standardised age-disaggregated health data.

Authors:  Theresa Diaz; Kathleen L Strong; Bochen Cao; Regina Guthold; Allisyn C Moran; Ann-Beth Moller; Jennifer Requejo; Ritu Sadana; Jotheeswaran Amuthavalli Thiyagarajan; Emmanuel Adebayo; Elsie Akwara; Agbessi Amouzou; John J Aponte Varon; Peter S Azzopardi; Cynthia Boschi-Pinto; Liliana Carvajal; Venkatraman Chandra-Mouli; Sarah Crofts; Saeed Dastgiri; Jeremiah S Dery; Shatha Elnakib; Laura Fagan; B Jane Ferguson; Julia Fitzner; Howard S Friedman; Ann Hagell; Eduard Jongstra; Laura Kann; Somnath Chatterji; Mike English; Philippe Glaziou; Claudia Hanson; Ahmad R Hosseinpoor; Andrew Marsh; Alison P Morgan; Melinda K Munos; Abdisalan Noor; Boris I Pavlin; Rich Pereira; Tyler A Porth; Joanna Schellenberg; Rizwana Siddique; Danzhen You; Lara M E Vaz; Anshu Banerjee
Journal:  Lancet Healthy Longev       Date:  2021-07

7.  Gender Differences in Patients With COVID-19: Focus on Severity and Mortality.

Authors:  Jian-Min Jin; Peng Bai; Wei He; Fei Wu; Xiao-Fang Liu; De-Min Han; Shi Liu; Jin-Kui Yang
Journal:  Front Public Health       Date:  2020-04-29

8.  Decreasing median age of COVID-19 cases in the United States-Changing epidemiology or changing surveillance?

Authors:  Dina N Greene; Michael L Jackson; David R Hillyard; Julio C Delgado; Robert L Schmidt
Journal:  PLoS One       Date:  2020-10-15       Impact factor: 3.240

9.  Gender differences in predictors of intensive care units admission among COVID-19 patients: The results of the SARS-RAS study of the Italian Society of Hypertension.

Authors:  Guido Iaccarino; Guido Grassi; Claudio Borghi; Stefano Carugo; Francesco Fallo; Claudio Ferri; Cristina Giannattasio; Davide Grassi; Claudio Letizia; Costantino Mancusi; Pietro Minuz; Stefano Perlini; Giacomo Pucci; Damiano Rizzoni; Massimo Salvetti; Riccardo Sarzani; Leonardo Sechi; Franco Veglio; Massimo Volpe; Maria Lorenza Muiesan
Journal:  PLoS One       Date:  2020-10-06       Impact factor: 3.240

10.  Male sex identified by global COVID-19 meta-analysis as a risk factor for death and ITU admission.

Authors:  Hannah Peckham; Nina M de Gruijter; Charles Raine; Anna Radziszewska; Coziana Ciurtin; Lucy R Wedderburn; Elizabeth C Rosser; Kate Webb; Claire T Deakin
Journal:  Nat Commun       Date:  2020-12-09       Impact factor: 17.694

View more
  2 in total

1.  Disaggregated data on age and sex for the first 250 days of the COVID-19 pandemic in Bucharest, Romania.

Authors:  Marian-Gabriel Hâncean; Maria Cristina Ghiță; Matjaž Perc; Jürgen Lerner; Iulian Oană; Bianca-Elena Mihăilă; Adelina Alexandra Stoica; David-Andrei Bunaciu
Journal:  Sci Data       Date:  2022-05-31       Impact factor: 8.501

2.  Occupations and their impact on the spreading of COVID-19 in urban communities.

Authors:  Marian-Gabriel Hâncean; Jürgen Lerner; Matjaž Perc; Iulian Oană; David-Andrei Bunaciu; Adelina Alexandra Stoica; Maria-Cristina Ghiţă
Journal:  Sci Rep       Date:  2022-08-18       Impact factor: 4.996

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.