Jasmine Nahorniak, Viktor Bovbjerg, Samantha Case, Laurel Kincl.
Abstract
BACKGROUND: Commercial fishing consistently has among the highest workforce injury and fatality rates in the United States. Data related to commercial fishing incidents are routinely collected by multiple organizations which do not currently coordinate or automatically link data. Each data set has the potential to generate a more complete picture to inform prevention efforts. Our objective was to examine the utility of using statistical data linkage methods to link commercial fishing incident data when personally identifiable information is not available.
Keywords: Fatality; Fishing; Injury; Linkage; Safety; Vessel
Year: 2021 PMID: 34218819 PMCID: PMC8256577 DOI: 10.1186/s40621-021-00323-z
Source DB: PubMed Journal: Inj Epidemiol ISSN: 2197-1714
Number of commercial fishing incidents recorded in each of the data sets used in this study
| Data Set | Date Range (YYYY-MM-DD) | Region | Total a | OR/WA a |
|---|---|---|---|---|
| Commercial Fishing Incident Database | 2000-01-04 to 2017-12-04 | All USA | 1315 (2966) | 194 (458) |
| Vessel Casualty | 2010-08-15 to 2014-12-31 | OR/WA | 524 (0) | 524 (0) |
| Nonfatal Injuries | 2002-01-12 to 2016-10-19 | OR/WA/CA | 232 (232) | 184 (184) |
| Oregon Trauma Registry | 2009-05-08 to 2016-07-06 | OR | 11 (11) | 11 (11) |
a Each incident may involve multiple personnel. The number between parentheses is the total number of personnel cases.
Fig. 1 Schematic (not to scale) of the expected overlap between commercial fishing incidents in the following data sets: Commercial Fishing Incident Database, Vessel Casualty, Nonfatal Injuries, and the Oregon Trauma Registry
Linkage metrics TP (true positives), FP (false positives), FN (false negatives), TN (true negatives), and f-score for each classifier per pair of data sets
| Classifier | Threshold | TP | FP | FN | TN | f-score |
|---|---|---|---|---|---|---|
| Match Parameters: Incident Date, Incident State; Combinations (2966 * 11): 32,626; Golden Matches: 5 | | | | | | |
| Expectation/Conditional Maximization | 0.5 | 4 | 6 | 1 | 32,615 | 0.53 |
| Support vector machine | 0.5 | 0 | 0 | 5 | 32,621 | 0 |
| Naïve-Bayes | 0.005 | 5 | 29 | 0 | 32,592 | 0.26 |
| Logistic regression | 0.005 | 5 | 29 | 0 | 32,592 | 0.26 |
| Match Parameters: Incident Date, Vessel Official Number, Latitude/Longitude; Combinations (1315 * 524): 689,060; Golden Matches: 9 | | | | | | |
| Expectation/Conditional Maximization | 0.5 | 9 | 3 | 0 | 689,048 | 0.86 |
| Support vector machine | 0.5 | 8 | 0 | 1 | 689,051 | 0.94 |
| Naïve-Bayes | 0.005 | 9 | 3 | 0 | 689,048 | 0.86 |
| Logistic regression | 0.005 | 9 | 7 | 0 | 689,044 | 0.72 |
| Match Parameters: Incident Date, Vessel Official Number, Latitude/Longitude; Combinations (2966 * 232): 688,112; Golden Matches: 12 | | | | | | |
| Expectation/Conditional Maximization | 0.5 | 12 | 52 | 0 | 688,048 | 0.32 |
| Support vector machine | 0.5 | 0 | 0 | 12 | 688,100 | 0 |
| Naïve-Bayes | 0.005 | 12 | 52 | 0 | 688,048 | 0.32 |
| Logistic regression | 0.005 | 12 | 52 | 0 | 688,048 | 0.32 |
| Match Parameters: Incident Date, Vessel Official Number, Latitude/Longitude; Combinations (232 * 524): 121,568; Golden Matches: 10 | | | | | | |
| Expectation/Conditional Maximization | 0.5 | 10 | 13 | 0 | 121,545 | 0.61 |
| Support vector machine | 0.5 | 9 | 1 | 1 | 121,557 | 0.90 |
| Naïve-Bayes | 0.01 | 10 | 2 | 0 | 121,556 | 0.91 |
| Logistic regression | 0.01 | 10 | 13 | 0 | 121,545 | 0.61 |
| Match Parameters: Incident Date, Incident State; Combinations (232 * 11): 2552; Golden Matches: 4 | | | | | | |
| Expectation/Conditional Maximization | 0.2 | 4 | 3 | 0 | 2545 | 0.73 |
| Support vector machine | 0.5 | 0 | 0 | 4 | 2548 | 0 |
| Naïve-Bayes | 0.005 | 4 | 3 | 0 | 2545 | 0.73 |
| Logistic regression | 0.005 | 4 | 7 | 0 | 2541 | 0.53 |
| Match Parameters: Incident Date, Incident State; Combinations (524 * 11): 5764; Golden Matches: 0 | | | | | | |
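The f-score column in the table above can be rechecked from the TP, FP, and FN counts. The sketch below assumes the f-score is the standard F1 measure (the harmonic mean of precision and recall), which reproduces the tabulated values; the function name `f_score` is illustrative and does not come from the study.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """F1 score from confusion counts (assumed definition of the f-score).

    Returns 0.0 when there are no true positives, matching the rows
    where a classifier found no matches.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # share of predicted links that are true
    recall = tp / (tp + fn)     # share of golden matches that were found
    return 2 * precision * recall / (precision + recall)


# Counts copied from rows of the table (TP, FP, FN)
print(round(f_score(4, 6, 1), 2))    # 0.53
print(round(f_score(8, 0, 1), 2))    # 0.94
print(round(f_score(12, 52, 0), 2))  # 0.32
```

TN does not enter the calculation, which is why the f-score stays informative even though true negatives dominate every data set pair.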
Number of true matches found for each data set combination
| | Commercial Fishing Incident Database | Oregon Trauma Registry | Vessel Casualty | Nonfatal Injuries | Matches a |
|---|---|---|---|---|---|
| Data Set Pairs | x | x | | | 5 |
| | x | | x | | 9 |
| | x | | | x | 12 (20) |
| | | x | x | | 0 |
| | | x | | x | 5 |
| | | | x | x | 10 |
| Multiple Data Sets | x | x | x | | 0 |
| | x | x | | x | 3 |
| | x | | x | x | 1 |
| | | x | x | x | 0 |
| | x | x | x | x | 0 |
a The numbers in parentheses include close matches
Fig. 2 Schematic (not to scale) illustrating the number of true and close matches found between commercial fishing incidents in the following data sets: Commercial Fishing Incident Database, Vessel Casualty, Nonfatal Injuries, and the Oregon Trauma Registry
Fig. 3 Match probabilities from the Naïve-Bayes classifier for all possible links found across different data set pairs. The threshold of 0.005 is shown as a solid line. A probability of 0.5 is indicated by a dashed line. True matches are highlighted with red circles. The same information is plotted on a log scale (upper panel) and linear scale (lower panel)
Relative accuracy and completeness of data within the 41 true match pairs
| Parameter | Relative Accuracy | Completeness | Commercial Fishing Incident Database a | Vessel Casualty a | Nonfatal Injuries a | Oregon Trauma Registry a |
|---|---|---|---|---|---|---|
| Incident Date | +/− 1 day | complete | x | x | x | x |
| Incident Time | varied from complete to no agreement | AM/PM designation and time zone often missing | x | x | x | x |
| Incident State | always agreed | complete | x | x | x | x |
| Latitude/Longitude | +/− 0.5 degrees (50 km) | often missing from trauma registry | x | x | x | x |
| Miles from Shore | never agreed | complete | x | x | x | |
| Vessel Official Number | always agreed | sometimes unavailable; state number used instead | x | x | x | |
| # People on Board | always agreed | only occasionally missing | x | x | x | |
| Narrative | consistent stories; matching provides additional details | rarely missing | x | x | x | x |
a The rightmost columns indicate the data sets that provide the listed parameters
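The agreement tolerances listed in the table above could be applied when comparing a candidate pair of records. The following is a minimal sketch only: the `Incident` class, its field names, the `compare` function, and the example records are all hypothetical, and the tolerances (±1 day, ±0.5 degrees) are taken from the table rather than from the study's code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass
class Incident:
    """Hypothetical record holding parameters shared by all four data sets."""
    incident_date: date
    state: str
    lat: Optional[float] = None  # may be missing, e.g. in a trauma registry
    lon: Optional[float] = None


def compare(a: Incident, b: Incident) -> Dict[str, Optional[bool]]:
    """Per-field agreement flags; None means the field is missing in either record."""
    flags: Dict[str, Optional[bool]] = {
        "date": abs((a.incident_date - b.incident_date).days) <= 1,
        "state": a.state == b.state,
    }
    if any(v is None for v in (a.lat, a.lon, b.lat, b.lon)):
        flags["position"] = None
    else:
        flags["position"] = (
            abs(a.lat - b.lat) <= 0.5 and abs(a.lon - b.lon) <= 0.5
        )
    return flags


# Made-up records for illustration only
r1 = Incident(date(2012, 3, 4), "OR", 44.6, -124.1)
r2 = Incident(date(2012, 3, 5), "OR")  # position missing
print(compare(r1, r2))
```

Returning `None` for a missing position, rather than `False`, keeps "no data" distinct from "disagreement", which matters when completeness varies between data sets.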