David Pastoriza, Thierry Warin.
Abstract
This data article describes a dataset that allows exploring the determinants of superstars' sentiment in tournaments. It consists of 1,284 press conferences given by Tiger Woods on the PGA Tour between 1996 and 2020. We used natural language processing, a form of artificial intelligence, to extract and quantitatively encode the sentiment in Tiger Woods' press conferences, both before the tournament and after the rounds played. Additionally, the dataset provides a series of variables that describe Tiger Woods' scoring and performance momentum in each round, and variables that describe health-related and off-the-course issues that could affect his performance on the course. These data can be useful to understand the sentiment that superstars experience before important tournaments, their sentiment following a major victory or defeat, how that sentiment evolves throughout their athletic career, and how sentiment is associated with performance momentum.
Keywords: Machine learning; Natural language processing; Press conferences; Sentiment analysis; Tiger Woods
Year: 2022 PMID: 35252489 PMCID: PMC8889344 DOI: 10.1016/j.dib.2022.107955
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Tiger Woods speeches’ sentiment scores.
Descriptive statistics of the variables.
| Variable | Mean | S.D. | Min | Max |
|---|---|---|---|---|
| Round_Score | 69.57 | 3.18 | 61 | 85 |
| End_of_Round_Pos_numeric_ | 21.19 | 26.24 | 1 | 152 |
| Total_Holes_Over_Par | 2.53 | 1.67 | 0 | 9 |
| Birdies | 4.26 | 1.79 | 1 | 10 |
| Birdies_Rank | 25.37 | 27.18 | 1 | 145 |
| Bogey_Avoidance_Rank | 25.86 | 27.88 | 1 | 152 |
| Driving_Distance_Rank | 20.48 | 26.62 | 1 | 153 |
| Driving_Accuracy_Rank | 43.08 | 34.56 | 1 | 175 |
| Response_Positive | 13.6 | 9.85 | 0 | 75 |
| Response_Negative | 4.364 | 3.77 | 0 | 24 |
| Response_Sentiment | 9.235 | 8.42 | −9 | 60 |
| GIR_Rank | 28.96 | 29.64 | 1 | 147 |
| Scrambling_Rank | 36.56 | 34.28 | 1 | 172 |
| Distance_to_leader_strokes | 5.61 | 4.67 | 0 | 30 |
| Distance_to_leader_ranks | 20.19 | 26.24 | 0 | 151 |
| Ranks_gained | −4.06 | 16.74 | −90 | 65 |
| Strokes_gained_v_a_v_Leader | 0.69 | 3.17 | −9 | 16 |
| Distance_to_runner_up_strokes_ | 2.35 | 2.62 | 0 | 15 |
| Strokes_gained_v_a_v_Runner_up | 1.11 | 2.30 | −4 | 7 |
| Minor_Injury | 0.04 | 0.19 | 0 | 1 |
| Major_Injury_Surgery | 0.02 | 0.16 | 0 | 1 |
| Personal_Issues | 0.03 | 0.17 | 0 | 1 |
| Major | 0.24 | 0.42 | 0 | 1 |
| Prize_Money | 5.69 | 2.46 | 1.0 | 12.7 |
| SoF | 552 | 188 | 69 | 849 |
| OWGR | 33 | 120 | 1 | 1199 |
Prize_Money is expressed in millions of dollars.
In golf, par is the predetermined number of strokes that a proficient golfer should require to complete a hole. Bogey is a score of one stroke more than par. Birdie is a score of one stroke fewer than par. For instance, if in a par-4 hole a player scored 3, he birdied the hole; if he scored 4, he made par in the hole; if he scored 5, he bogeyed the hole.
A drive is the long-distance shot intended to move the ball a great distance down the fairway towards the green, in which the target hole is located.
The green is the area of specially prepared grass around the hole. A player hits a green “in regulation” if the ball is on the surface of the green and the number of strokes taken is at least two fewer than par (e.g., by the second stroke on a par-4 hole).
Scrambling occurs when a player misses the green in regulation but still makes par or birdie on the hole.
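The golf terms defined in these notes reduce to simple comparisons against par. The following is a minimal Python sketch of those definitions, not code from the dataset; the function names and the `strokes_to_reach_green` argument are hypothetical and introduced only for illustration.

```python
def hole_result(par: int, strokes: int) -> str:
    """Label a hole score relative to par, using only the terms defined in the notes."""
    diff = strokes - par
    if diff == -1:
        return "birdie"
    if diff == 0:
        return "par"
    if diff == 1:
        return "bogey"
    # Scores further from par are not named in the notes, so they stay generic here.
    return "other"


def green_in_regulation(par: int, strokes_to_reach_green: int) -> bool:
    """True if the ball is on the green in at least two strokes fewer than par."""
    return strokes_to_reach_green <= par - 2


def scrambled(par: int, strokes_to_reach_green: int, strokes: int) -> bool:
    """True if the green was missed in regulation but the hole ended in par or birdie."""
    missed = not green_in_regulation(par, strokes_to_reach_green)
    return missed and hole_result(par, strokes) in ("par", "birdie")


# Par-4 example from the notes: scores of 3, 4 and 5.
for score in (3, 4, 5):
    print(score, hole_result(4, score))
```

On a par-4 hole, reaching the green on the third stroke and still scoring 4 counts as scrambling under this sketch, whereas reaching it on the second stroke does not.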
| Item | Details |
|---|---|
| Subject | Economics |
| Specific subject area | Behavioral Economics; Sports Economics |
| Type of data | Text corpus; Figures; RDS file |
| How data were acquired | Interview data were extracted from ASAP Sports Transcripts, the transcript supplier of the PGA Tour. PGA TOUR data were acquired through a licensing agreement with PGA TOUR, which allows the use of data for scientific purposes. The player's biographical information was manually extracted from PGA TOUR's media guides. |
| Data format | Mixed (raw and processed) |
| Description of data collection | The interview data were collected through the asapsports.com website. |
| Data source location | The data were gathered from ASAP Sports Transcripts, the PGA Tour ShotLink Database, and the Official World Golf Ranking. |
| Data accessibility | To access the data, enter in |
| Related research article | – |
Description of the variables in the dataset.
| Variable | Type | Description |
|---|---|---|
| Tournament_Year | Numeric | Yearly season |
| Tournament_Order | Ordinal | Chronological order of tournaments within a season (i.e., tournaments with smaller numbers took place earlier in the season) |
| Permanent_Tournament_Number | Numeric | Identification number that is unique to that tournament, regardless of the sponsor/name of the tournament that may change over the years |
| Course_Number | Numeric | Course number that does not change over time (i.e., different tournaments may be played on the same course) |
| Player_Number | Numeric | Player identification number. Does not change over the years |
| Round_Number | Numeric | Round number. PGA Tour tournaments generally have four rounds |
| Event_Name | Text | Name of the tournament |
| Course_Name | Text | Name of the course in which the tournament took place |
| Interview_Text | Text | The full text of the interview |
| Number_Of_Answers | Numeric | Number of answers provided by Tiger Woods during the Q&A section |
| Link | Text | Link to the original document (redirects to the ASAP Sports transcripts website) |
| Response_Negative | Numeric | The negative score computed on Tiger Woods’ responses |
| Response_Positive | Numeric | The positive score computed on Tiger Woods’ responses |
| Response_Sentiment | Numeric | The positive score minus the negative score computed on Tiger Woods’ responses |
| Round_Score | Numeric | Number of strokes of the round |
| End_of_Round_Pos_numeric_ | Ordinal | Player's rank at the end of the round (i.e., 1 means he is leading the tournament, 2 means he is the runner-up, etc.) |
| Total_Holes_Over_Par | Numeric | Number of holes in the round on which the player scored bogey or worse |
| Birdies | Numeric | Number of birdies in the round |
| Birdies_Rank | Ordinal | Player's rank in number of birdies in the round (i.e., 1 means he was the player with the highest number of birdies in the round) |
| Bogey_Avoidance_Rank | Ordinal | Player's rank in the number of holes on which he avoided a bogey in the round |
| Driving_Distance_Rank | Ordinal | Player's rank in driving distance in the round |
| Driving_Accuracy_Rank | Ordinal | Player's rank in driving accuracy in the round |
| GIR_Rank | Ordinal | Player's rank in number of greens in regulation in the round |
| Scrambling_Rank | Ordinal | Player's rank in scrambling in the round (i.e., ability to recover from difficult situations) |
| Distance_to_leader_strokes | Numeric | Distance in strokes to the interim leader at the end of the round (i.e., if Tiger is trailing by two shots, it takes value 2; if Tiger is leading, the variable takes value 0) |
| Distance_to_leader_ranks | Numeric | Distance in ranks to the interim leader at the end of the round (i.e., if Tiger is in rank 3, it takes value 2; if Tiger is leading, it takes value 0) |
| Ranks_gained | Numeric | Equal to the rank at the end of round n minus the rank at the end of round n−1. For instance, if Tiger was in position 3 at the end of round n and in position 5 at the end of round n−1, the variable's value is −2. It takes missing values for round 1 |
| Strokes_gained_v_a_v_Leader | Numeric | Equal to the distance to the leader (in strokes) at the end of round n minus the distance to the leader (in strokes) at the end of round n−1. It takes missing values for round 1 |
| Distance_to_runner_up_strokes_ | Numeric | Distance in strokes to the interim runner-up at the end of the round. For instance, if Tiger is leading by two shots, the variable takes value 2; if Tiger is co-leading, it takes value 0; if Tiger is neither leading nor co-leading, it takes a missing value |
| Strokes_gained_v_a_v_Runner_up | Numeric | Distance in strokes to the runner-up at the end of round n minus the distance to the runner-up at the end of round n−1. It takes missing values for round 1 and when Tiger is not leading |
| Minor_Injury | Categorical {1; 0} | Tiger Woods had a minor injury when he entered the event |
| Major_Injury_Surgery | Categorical {1; 0} | Tiger Woods had a major injury when he entered the event |
| Personal_Issues | Categorical {1; 0} | Tiger Woods had personal issues when he entered the event |
| Major | Categorical {1; 0} | Takes value 1 if tournament is a major (i.e., prestigious) |
| Prize_Money | Numeric | Prize money of the tournament, in millions of dollars |
| SoF | Numeric | Strength of the field of players in the tournament (i.e., the higher the number, the more competitive the field) |
| OWGR | Numeric | Official World Golf Ranking (OWGR) position of Tiger Woods at the time of the observation |
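Several variables in the table are simple functions of other columns: Response_Sentiment is a difference of two scores, Distance_to_leader_ranks is the end-of-round rank minus one, and the momentum variables are round-over-round differences that are missing in round 1. As a minimal sketch of that arithmetic in plain Python, the function below recomputes them for one tournament. The function name `add_momentum_variables`, the list-of-dicts layout, and the example values are hypothetical and used only for illustration; they are not taken from the dataset or from the authors' code.

```python
def add_momentum_variables(rounds):
    """Recompute derived columns for one tournament.

    `rounds` is a list of dicts ordered by Round_Number (an assumed layout).
    """
    out = []
    prev = None
    for r in rounds:
        row = dict(r)
        # Positive score minus negative score.
        row["Response_Sentiment"] = r["Response_Positive"] - r["Response_Negative"]
        # Rank 1 (leading) maps to 0, rank 3 maps to 2, and so on.
        row["Distance_to_leader_ranks"] = r["End_of_Round_Pos_numeric_"] - 1
        if prev is None:
            # Round 1 has no previous round, so the differences are missing.
            row["Ranks_gained"] = None
            row["Strokes_gained_v_a_v_Leader"] = None
        else:
            row["Ranks_gained"] = (
                r["End_of_Round_Pos_numeric_"] - prev["End_of_Round_Pos_numeric_"]
            )
            row["Strokes_gained_v_a_v_Leader"] = (
                r["Distance_to_leader_strokes"] - prev["Distance_to_leader_strokes"]
            )
        prev = r
        out.append(row)
    return out


# Made-up values for a three-round example (not real observations).
rounds = [
    {"Round_Number": 1, "End_of_Round_Pos_numeric_": 5,
     "Distance_to_leader_strokes": 4, "Response_Positive": 12, "Response_Negative": 5},
    {"Round_Number": 2, "End_of_Round_Pos_numeric_": 3,
     "Distance_to_leader_strokes": 2, "Response_Positive": 15, "Response_Negative": 3},
    {"Round_Number": 3, "End_of_Round_Pos_numeric_": 1,
     "Distance_to_leader_strokes": 0, "Response_Positive": 20, "Response_Negative": 2},
]
result = add_momentum_variables(rounds)
for row in result:
    print(row["Round_Number"], row["Ranks_gained"], row["Strokes_gained_v_a_v_Leader"])
```

Under these definitions, a move from position 5 to position 3 gives Ranks_gained = −2, so negative values indicate that the player moved up the leaderboard or closed the gap to the leader.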