Nuno Antonio (1,2), Ana de Almeida (1,3,4), Luis Nunes (1,2,4)
Abstract
This data article describes two datasets with hotel demand data. One of the hotels (H1) is a resort hotel and the other (H2) is a city hotel. Both datasets share the same structure, with 31 variables describing the 40,060 observations of H1 and the 79,330 observations of H2. Each observation represents a hotel booking. Both datasets comprise bookings due to arrive between 1 July 2015 and 31 August 2017, including bookings that effectively arrived and bookings that were canceled. Since these are real hotel data, all data elements pertaining to hotel or customer identification were deleted. Due to the scarcity of real business data for scientific and educational purposes, these datasets can play an important role in research and education in revenue management, machine learning or data mining, as well as in other fields.
Year: 2018 PMID: 30581903 PMCID: PMC6297060 DOI: 10.1016/j.dib.2018.11.126
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Diagram of the PMS database tables from which the variables were extracted.
Variables description.
| Type | Description | Source / Engineering |
|---|---|---|
| Numeric | Average Daily Rate as defined by | BO, BL and TR / Calculated by dividing the sum of all lodging transactions by the total number of staying nights |
| Integer | Number of adults | BO and BL |
| Categorical | ID of the travel agency that made the booking | BO and BL |
| Integer | Day of the month of the arrival date | BO and BL |
| Categorical | Month of the arrival date, with 12 categories: “January” to “December” | BO and BL |
| Integer | Week number of the arrival date | BO and BL |
| Integer | Year of the arrival date | BO and BL |
| Categorical | Code for the type of room assigned to the booking. Sometimes the assigned room type differs from the reserved room type for hotel operation reasons (e.g. overbooking) or by customer request. Code is presented instead of designation for anonymity reasons | BO and BL |
| Integer | Number of babies | BO and BL |
| Integer | Number of changes/amendments made to the booking from the moment the booking was entered into the PMS until the moment of check-in or cancellation | BO and BL / Calculated by adding the number of unique iterations that change some of the booking attributes, namely: persons, arrival date, nights, reserved room type or meal |
| Integer | Number of children | BO and BL / Sum of both payable and non-payable children |
| Categorical | ID of the company/entity that made the booking or is responsible for paying the booking. ID is presented instead of designation for anonymity reasons | BO and BL |
| Categorical | Country of origin. Categories are represented in the ISO 3166-1 alpha-3 format | BO, BL and NT |
| Categorical | Type of booking, assuming one of four categories: Contract – when the booking has an allotment or other type of contract associated with it; Group – when the booking is associated with a group; Transient – when the booking is not part of a group or contract, and is not associated with another transient booking; Transient-party – when the booking is transient but is associated with at least one other transient booking | BO and BL |
| Integer | Number of days the booking was on the waiting list before it was confirmed to the customer | BO / Calculated by subtracting the date the booking was entered into the PMS from the date the booking was confirmed to the customer |
| Categorical | Indication of whether the customer made a deposit to guarantee the booking. This variable can assume three categories: No Deposit – no deposit was made; Non Refund – a deposit was made in the value of the total stay cost; Refundable – a deposit was made with a value under the total cost of stay | BO and TR / Value calculated based on the payments identified for the booking in the transaction (TR) table before the booking's arrival or cancellation date: if no payments were found, the value is “No Deposit”; if the payment was equal to or exceeded the total cost of stay, the value is set as “Non Refund”; otherwise the value is set as “Refundable” |
| Categorical | Booking distribution channel. The term “TA” means “Travel Agents” and “TO” means “Tour Operators” | BO, BL and DC |
| Categorical | Value indicating if the booking was canceled (1) or not (0) | BO |
| Categorical | Value indicating if the booking was from a repeated guest (1) or not (0) | BO, BL and C / Variable created by verifying if a profile was associated with the booking customer. If so, and if the customer profile creation date was prior to the creation date of the booking in the PMS database, it was assumed the booking was from a repeated guest |
| Integer | Number of days that elapsed between the entering date of the booking into the PMS and the arrival date | BO and BL / Subtraction of the entering date from the arrival date |
| Categorical | Market segment designation. In categories, the term “TA” means “Travel Agents” and “TO” means “Tour Operators” | BO, BL and MS |
| Categorical | Type of meal booked. Categories are presented in standard hospitality meal packages: Undefined/SC – no meal package; BB – Bed & Breakfast; HB – Half board (breakfast and one other meal, usually dinner); FB – Full board (breakfast, lunch and dinner) | BO, BL and ML |
| Integer | Number of previous bookings not canceled by the customer prior to the current booking | BO and BL / If no customer profile was associated with the booking, the value is set to 0. Otherwise, the value is the number of bookings with the same customer profile created before the current booking and not canceled |
| Integer | Number of previous bookings that were canceled by the customer prior to the current booking | BO and BL / If no customer profile was associated with the booking, the value is set to 0. Otherwise, the value is the number of bookings with the same customer profile created before the current booking and canceled |
| Integer | Number of car parking spaces required by the customer | BO and BL |
| Categorical | Last status of the reservation, assuming one of three categories: Canceled – booking was canceled by the customer; Check-Out – customer has checked in and has already departed; No-Show – customer did not check in and did not inform the hotel of the reason why | BO |
| Date | Date at which the last status was set. This variable can be used in conjunction with the | BO |
| Categorical | Code of the reserved room type. Code is presented instead of designation for anonymity reasons | BO and BL |
| Integer | Number of weekend nights (Saturday or Sunday) the guest stayed or booked to stay at the hotel | BO and BL / Calculated by counting the number of weekend nights from the total number of nights |
| Integer | Number of week nights (Monday to Friday) the guest stayed or booked to stay at the hotel | BO and BL / Calculated by counting the number of week nights from the total number of nights |
| Integer | Number of special requests made by the customer (e.g. twin bed or high floor) | BO and BL / Sum of all special requests |
ID is presented instead of designation for anonymity reasons.
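Several variables in the table are engineered from raw PMS fields. The following Python sketch restates those rules (ADR, lead time, days on the waiting list, weekend/week night split, deposit type, and the customer-history variables) as plain functions. All function and parameter names are hypothetical, and the flat inputs (dates, lists of amounts, (created, canceled) pairs) stand in for the PMS tables, whose schema is not shown here. Two choices are assumptions not stated in the paper: zero-night stays receive an ADR of 0, and each night is classified as a weekend or week night by the calendar day on which it starts.

```python
from datetime import date, timedelta


def average_daily_rate(lodging_amounts, arrival, departure):
    """Sum of all lodging transactions divided by the number of staying nights."""
    nights = (departure - arrival).days
    # Assumption: zero-night stays get an ADR of 0 instead of dividing by zero.
    return sum(lodging_amounts) / nights if nights > 0 else 0.0


def lead_time(entered, arrival):
    """Days elapsed between entering the booking into the PMS and arrival."""
    return (arrival - entered).days


def days_in_waiting_list(entered, confirmed):
    """Days between entering the booking and confirming it to the customer."""
    return (confirmed - entered).days


def split_nights(arrival, departure):
    """Return (weekend_nights, week_nights) for a stay.

    Assumption: a night is classified by the calendar day on which it starts,
    so Saturday and Sunday nights are weekend nights.
    """
    weekend = week = 0
    night = arrival
    while night < departure:
        if night.weekday() >= 5:  # weekday(): Monday == 0 ... Sunday == 6
            weekend += 1
        else:
            week += 1
        night += timedelta(days=1)
    return weekend, week


def deposit_type(payments_before_arrival, total_stay_cost):
    """Classify payments made before arrival or cancellation."""
    if not payments_before_arrival:
        return "No Deposit"
    if sum(payments_before_arrival) >= total_stay_cost:
        return "Non Refund"
    return "Refundable"


def customer_history(profile_created, booking_created, profile_bookings):
    """Return (is_repeated_guest, previous_not_canceled, previous_canceled).

    profile_bookings holds (created_date, was_canceled) pairs for the same
    customer profile; profile_created is None when no profile is associated.
    """
    if profile_created is None:
        return 0, 0, 0
    is_repeated = int(profile_created < booking_created)
    earlier = [canceled for created, canceled in profile_bookings
               if created < booking_created]
    return is_repeated, earlier.count(False), earlier.count(True)


# Hypothetical booking: entered 2 May 2016, stay from Friday 1 to Tuesday 5 July.
arrival, departure = date(2016, 7, 1), date(2016, 7, 5)
print(average_daily_rate([100.0, 100.0, 120.0, 80.0], arrival, departure))  # → 100.0
print(lead_time(date(2016, 5, 2), arrival))  # → 60
print(split_nights(arrival, departure))  # → (2, 2)
print(deposit_type([100.0], 400.0))  # → Refundable
```

Keeping each rule in its own small function makes it easy to test the engineering logic separately from the database extraction.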
H1 dataset summary statistics – Date variables.
| 2014-11-18 | 2017-09-14 | 2016-07-31 | 913 |
H1 dataset summary statistics – Categorical variables.
| Categories | Most frequent categories (count) |
|---|---|
| 186 | 240: 13 095, NULL: 8 209, 250: 2 869, 241: 1 721 |
| 12 | Aug: 4 894, Jul: 4 573, Apr: 3 609, May: 3 559 |
| 11 | A: 17 046, D: 10 339, E: 5 638, C: 2 214 |
| 236 | NULL: 36 952, 223: 784, 281: 138, 154: 133 |
| 125 | PRT: 17 630, GBR: 6 814, ESP: 3 957, IRL: 2 166 |
| 4 | Tra.: 30 209, Tra.-Party: 7 791, Con.: 1 776, Gro.: 284 |
| 3 | No Dep.: 38 199, Non-Refund.: 1 719, Ref.: 142 |
| 4 | TA/TO: 28 925, Dir.: 7 865, Cor.: 3 269, Und.: 1 |
| 2 | 0: 28 938, 1: 11 122 |
| 2 | 0: 38 282, 1: 1 778 |
| 6 | Onl.: 17 729, Off.: 7 472, Dir.: 6 513, Gro.: 5 836 |
| 5 | BB: 30 005, HB: 8 046, Und.: 1 169, FB: 754 |
| 3 | C.Out: 28 938, Can.: 10 831, No-Show: 291 |
| 10 | A: 23 399, D: 7 433, E: 4 892, G: 1 610 |
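Each categorical row reports the number of distinct categories followed by the four most frequent categories and their counts. A minimal Python sketch of that summary, using `collections.Counter` on toy values (the function name `categorical_summary` is hypothetical). The final lines check the H1 figures for internal consistency, assuming the binary row “0: 28 938, 1: 11 122” is the cancellation flag: canceled bookings should equal the Canceled plus No-Show statuses, and both variables should cover all 40,060 observations.

```python
from collections import Counter


def categorical_summary(values, top=4):
    """Return (number of distinct categories, `top` most frequent (category, count) pairs)."""
    counts = Counter(values)
    # most_common keeps first-encountered order among equal counts.
    return len(counts), counts.most_common(top)


# Toy meal codes, not taken from the datasets.
meals = ["BB", "BB", "HB", "BB", "FB", "HB", "SC"]
print(categorical_summary(meals))
# → (4, [('BB', 3), ('HB', 2), ('FB', 1), ('SC', 1)])

# Consistency check on the H1 counts reported in the table above.
status = {"Check-Out": 28_938, "Canceled": 10_831, "No-Show": 291}
canceled_flag = {0: 28_938, 1: 11_122}
assert status["Canceled"] + status["No-Show"] == canceled_flag[1]
assert sum(status.values()) == sum(canceled_flag.values()) == 40_060
```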
H1 dataset summary statistics – Integer and numeric variables.
| Mean | SD | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|---|
| 94.95 | 61.44 | -6.38 | 50 | 75 | 125 | 508 |
| 1.87 | 0.7 | 0 | 2 | 2 | 2 | 55 |
| 15.82 | 8.88 | 1 | 8 | 16 | 24 | 31 |
| 27.14 | 14.01 | 1 | 16 | 28 | 38 | 53 |
| 2016.12 | 0.72 | 2015 | 2016 | 2016 | 2017 | 2017 |
| 0.014 | 0.12 | 0 | 0 | 0 | 0 | 2 |
| 0.29 | 0.73 | 0 | 0 | 0 | 0 | 17 |
| 0.13 | 0.45 | 0 | 0 | 0 | 0 | 10 |
| 0.53 | 7.43 | 0 | 0 | 0 | 0 | 185 |
| 92.68 | 97.29 | 0 | 10 | 57 | 155 | 737 |
| 0.15 | 1 | 0 | 0 | 0 | 0 | 30 |
| 0.1 | 1.34 | 0 | 0 | 0 | 0 | 26 |
| 0.14 | 0.35 | 0 | 0 | 0 | 0 | 8 |
| 1.19 | 1.15 | 0 | 0 | 1 | 2 | 19 |
| 3.13 | 2.46 | 0 | 1 | 3 | 5 | 50 |
| 0.62 | 0.81 | 0 | 0 | 0 | 1 | 5 |
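Each row of the integer and numeric summaries reports the mean, standard deviation, minimum, first quartile, median, third quartile and maximum of one variable. The sketch below computes such a row with Python's `statistics` module; the function name `numeric_summary` is hypothetical, and the quartile definition is an assumption: `method="inclusive"` interpolates linearly between order statistics (the same rule as R's default `quantile()` type), but the paper does not state which definition was used.

```python
import statistics


def numeric_summary(values):
    """Mean, sample SD, min, quartiles and max, in the column order of the tables."""
    data = sorted(values)
    # Assumption: linear interpolation between order statistics (R's default, type 7).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),  # sample standard deviation (n - 1 denominator)
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
    }


# Toy lead times in days, not taken from the datasets.
print(numeric_summary([0, 10, 57, 155, 300]))
```

With a different quantile definition (for example Python's default `"exclusive"` method), Q1 and Q3 can differ slightly on small samples, so quartiles recomputed from the raw data may not match the tables exactly.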
H2 dataset summary statistics – Date variables.
| 2014-10-17 | 2017-09-07 | 2016-08-10 | 864 |
H2 dataset summary statistics – Categorical variables.
| Variable | Categories | Most frequent categories (count) |
|---|---|---|
| Agent | 224 | 9: 31 955, NULL: 8 131, 1: 7 137, 14: 3 640 |
|  | 12 | Aug: 8 983, May: 8 232, Jul: 8 088, Jun: 7 894 |
|  | 9 | A: 57 007, D: 14 983, E: 2 168, F: 2 018 |
|  | 208 | NULL: 75 641, 40: 924, 67: 267, 45: 250 |
|  | 166 | PRT: 30 960, FRA: 8 804, DEU: 6 084, GBR: 5 315 |
|  | 4 | Tra.: 59 404, Tra.-P.: 17 333, Con.: 2 300, Gro.: 293 |
|  | 3 | No Dep.: 66 442, Non-Refund.: 12 868, Ref.: 20 |
|  | 5 | TA/TO: 68 945, Dir.: 6 780, Cor.: 3 408, GDS: 193 |
|  | 2 | 0: 46 228, 1: 33 102 |
|  | 2 | 0: 77 298, 1: 2 032 |
|  | 8 | Onl.: 38 748, Off.: 16 747, Gro.: 13 975, Dir.: 6 093 |
|  | 4 | BB: 62 305, SC: 10 564, HB: 6 417, FB: 44 |
|  | 3 | C.Out: 46 228, Can.: 32 186, No-Show: 916 |
|  | 8 | A: 62 595, D: 11 768, F: 1 791, E: 1 553 |
H2 dataset summary statistics – Integer and numeric variables.
| Mean | SD | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|---|
| 105.3 | 43.6 | 0 | 79.2 | 99.9 | 126 | 5400 |
| 1.85 | 0.51 | 0 | 2 | 2 | 2 | 4 |
| 15.79 | 8.73 | 1 | 8 | 16 | 23 | 31 |
| 27.18 | 13.4 | 1 | 17 | 27 | 38 | 53 |
| 2016.17 | 0.7 | 2015 | 2016 | 2016 | 2017 | 2017 |
| 0.0049 | 0.084 | 0 | 0 | 0 | 0 | 10 |
| 0.19 | 0.61 | 0 | 0 | 0 | 0 | 21 |
| 0.091 | 0.37 | 0 | 0 | 0 | 0 | 3 |
| 3.23 | 20.87 | 0 | 0 | 0 | 0 | 391 |
| 109.74 | 110.95 | 0 | 23 | 74 | 163 | 629 |
| 0.13 | 1.69 | 0 | 0 | 0 | 0 | 72 |
| 0.08 | 0.42 | 0 | 0 | 0 | 0 | 32 |
| 0.024 | 0.15 | 0 | 0 | 0 | 0 | 3 |
| 0.8 | 0.89 | 0 | 0 | 1 | 2 | 16 |
| 2.18 | 1.46 | 0 | 1 | 2 | 3 | 41 |
| 0.55 | 0.78 | 0 | 0 | 0 | 1 | 5 |
Fig. 2. Partial visualization of all observations of the H1 dataset.
Fig. 3. Partial visualization of all observations of the H2 dataset.
| Subject area | Hospitality Management |
|---|---|
| More specific subject area | Revenue Management |
| Type of data | Text files and R objects |
| How data was acquired | Extraction from hotels’ Property Management System (PMS) SQL databases |
| Data format | Mixed (raw and preprocessed) |
| Experimental factors | Some of the variables were engineered from other variables from different database tables. The data point time for each observation was defined as the day prior to each booking's arrival |
| Experimental features | Data was extracted via T-SQL queries executed directly in the hotels’ PMS databases, and R was employed to perform data analysis |
| Data source location | Both hotels are located in Portugal: H1 in the resort region of the Algarve and H2 in the city of Lisbon |
| Data accessibility | Data is supplied with the paper |
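The arrival date is stored as three separate variables (year, month name and day of month), while the data point time of each observation is the day prior to arrival. A minimal Python sketch that rebuilds the arrival date and derives that data point, assuming the month categories are the English names “January” to “December” listed in the variables table; the function names `arrival_date` and `data_point_date` are hypothetical.

```python
from datetime import date, timedelta

# English month names, matching the categories of the arrival-month variable.
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def arrival_date(year, month_name, day_of_month):
    """Rebuild the arrival date from its year, month-name and day components."""
    return date(year, MONTHS.index(month_name) + 1, day_of_month)


def data_point_date(year, month_name, day_of_month):
    """Day prior to arrival, i.e. the data point time of the observation."""
    return arrival_date(year, month_name, day_of_month) - timedelta(days=1)


# First arrival date covered by the datasets.
print(data_point_date(2015, "July", 1))  # → 2015-06-30
```

An explicit month tuple is used instead of locale-dependent parsing (such as `strptime` with `%B`), so the mapping does not change with the system locale.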