Kelly K Jones, Shannon N Zenk, Elizabeth Tarlov, Lisa M Powell, Stephen A Matthews, Irina Horoi.
Abstract
BACKGROUND: Food environment characterization in health studies often requires data on the location of food stores and restaurants. While commercial business lists are commonly used as data sources for such studies, current literature provides little guidance on how to use validation study results to decide which commercial business list to use and how to maximize the accuracy of those lists. Using data from a retrospective cohort study [Weight And Veterans' Environments Study (WAVES)], we (a) explain how validity and bias information from existing validation studies (count accuracy, classification accuracy, and locational accuracy, as well as potential bias by neighborhood racial/ethnic composition, economic characteristics, and urbanicity) was used to determine which commercial business list to purchase for retail food outlet data and (b) describe the methods used to maximize the quality of the data and the results of this approach.
Keywords: Business lists; Dun and Bradstreet; InfoUSA; Neighborhood food environment
Year: 2017 PMID: 28061798 PMCID: PMC5219657 DOI: 10.1186/s13104-016-2355-1
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Definitions of validity terms
| Term | Definition |
|---|---|
| Count accuracy | Number of outlets is neither an under- nor over-count |
| Classification accuracy | Business type is correctly identified |
| Locational accuracy | Geographic coordinates are accurate within an acceptable level of precision |
| True positive (TP) | Outlet present in business list and observed on the ground |
| False positive (FP) | Outlet present in business list and not observed on the ground |
| True negative (TN) | Outlet not present in business list and not observed on the ground |
| False negative (FN) | Outlet not present in business list and observed on the ground |
| Sensitivity | Proportion of observed outlets that are included in the business list: (TP)/(TP + FN) |
| Positive predictive value (PPV) | Likelihood that an outlet present in the business list is observed: (TP)/(TP + FP) |
| Concordance | Proportion of outlets both present in the business list and observed out of all outlets either in the business list or observed: (TP)/(TP + FP + FN) |
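The three formulas in the table can be checked with a short calculation. The sketch below is illustrative only: the function name and the counts (60 true positives, 20 false positives, 40 false negatives) are hypothetical and do not come from any of the cited validation studies.

```python
def validity_statistics(tp: int, fp: int, fn: int) -> dict:
    """Compute sensitivity, PPV, and concordance from match counts.

    tp: outlets in the business list and observed on the ground
    fp: outlets in the business list but not observed
    fn: outlets observed but missing from the business list
    """
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("need at least one observed and one listed outlet")
    return {
        "sensitivity": tp / (tp + fn),       # TP / (TP + FN)
        "ppv": tp / (tp + fp),               # TP / (TP + FP)
        "concordance": tp / (tp + fp + fn),  # TP / (TP + FP + FN)
    }


# Hypothetical counts, for illustration only
stats = validity_statistics(tp=60, fp=20, fn=40)
print(stats)
```

True negatives do not enter any of the three statistics, which is convenient because outlets that are neither listed nor observed cannot be counted in a field validation.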
Identification of data source (InfoUSA or Dun and Bradstreet) with better count accuracy statistics for food stores and restaurants
| Study | All outlets: sensitivity | All outlets: PPV | All food stores: sensitivity | All food stores: PPV | All restaurants: sensitivity | All restaurants: PPV |
|---|---|---|---|---|---|---|
| D’Angelo [ | N/A | N/A |
| InfoUSA: 0.87 (0.01) | N/A | N/A |
| Dun and Bradstreet: 0.64 (0.02) | Dun and Bradstreet: 0.91 (0.01) | |||||
| Fleischhacker [ |
|
| N/A | N/A |
|
|
| Dun and Bradstreet: 0.41 [0.37, 0.45] | Dun and Bradstreet: 0.31 [0.28, 0.34] | Dun and Bradstreet: 0.38 [0.32, 0.44] | Dun and Bradstreet: 0.29 [0.25, 0.34] | |||
| Liese [ |
|
| InfoUSA: 0.61 [0.58, 0.64] | InfoUSA: 0.82 [0.79, 0.85] |
|
|
| Dun and Bradstreet: 0.55 [0.53, 0.57] | Dun and Bradstreet: 0.78 [0.76, 0.80] | Dun and Bradstreet: 0.63 [0.60, 0.66] | Dun and Bradstreet: 0.76 [0.73, 0.79] | Dun and Bradstreet: 0.50 [0.47, 0.53] | Dun and Bradstreet: 0.79 [0.77, 0.82] | |
| Powell [ | N/A | N/A |
|
|
|
|
| Dun and Bradstreet: 0.52 (0.02) | Dun and Bradstreet: 0.45 (0.02) | Dun and Bradstreet: 0.55 (0.01) | Dun and Bradstreet: 0.66 (0.01) | |||
Italic indicates statistically higher validity statistic as compared to the other data source
Standard errors and confidence intervals are shown as reported in the cited papers
Bias findings from validation studies in InfoUSA and Dun and Bradstreet business lists
| Study | Racial/ethnic composition: InfoUSA | Racial/ethnic composition: Dun and Bradstreet | Economic characteristics: InfoUSA | Economic characteristics: Dun and Bradstreet | Urbanicity: InfoUSA | Urbanicity: Dun and Bradstreet |
|---|---|---|---|---|---|---|
| Count accuracy | ||||||
| Fleischhacker [ | N/A | N/A | N/A | N/A | No differences found | No differences found |
| Liese [ | N/A | N/A | N/A | N/A | Urban areas had highest accuracy of stores. Rural areas had lowest accuracy of stores. Suburban areas had the lowest accuracy of restaurants | Urban areas had highest accuracy for stores and restaurants. Rural areas had lowest accuracy for stores and restaurants |
| Liese [ | Majority white neighborhoods had lowest accuracy | No differences found | High income and non-poor neighborhoods had lowest accuracy | No differences found | N/A | N/A |
| Powell [ | Majority black neighborhoods had lowest accuracy for food stores and restaurants. Majority non-Hispanic neighborhoods had lower accuracy for food stores | Majority black neighborhoods had lowest accuracy for restaurants and no difference for food stores | No differences found | High income areas had lowest accuracy for food stores and no differences for restaurants | Urban areas had highest accuracy of stores and restaurants. Rural areas had lowest accuracy of stores and restaurants | Urban areas had highest accuracy of stores and restaurants. Rural areas had lowest accuracy of stores and restaurants |
| Classification accuracy | ||||||
| Han [ | Majority non-Hispanic and majority black neighborhoods had lowest classification accuracy | Majority non-Hispanic and majority black neighborhoods had lowest classification accuracy | No differences found | No differences found | N/A | N/A |
| Locational accuracy | ||||||
| Liese [ | N/A | N/A | N/A | N/A | Urban areas were located with the least distance between observed and listed location. Records in suburban areas were most likely to be allocated to the correct census tract | Urban areas were located with the least distance between observed and listed location. Records in suburban areas were most likely to be allocated to the correct census tract |
Identification of business list with better classification accuracy for selected food stores and restaurants
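| Study | Supermarkets and grocery stores: sensitivity | Supermarkets and grocery stores: PPV | Supermarkets and grocery stores: concordance | Convenience stores: sensitivity | Convenience stores: PPV | Convenience stores: concordance | Limited service restaurants: sensitivity | Limited service restaurants: PPV | Limited service restaurants: concordance |
|---|---|---|---|---|---|---|---|---|---|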
| Han [ | N/A | N/A | InfoUSA: 69–81% | N/A | N/A |
| N/A | N/A | N/A |
| N/A | N/A |
| N/A | N/A | Dun and Bradstreet: 24% | N/A | N/A | N/A | |
| Liese [ | InfoUSA: 0.61 [0.54, 0.69] |
| N/A |
|
| N/A | InfoUSA: 0.08 [0.06, 0.10] | InfoUSA: 0.81 [0.71, 0.91] | N/A |
| Dun and Bradstreet: 0.58 [0.51, 0.66] | Dun and Bradstreet: 0.39 [0.33, 0.46] | N/A | Dun and Bradstreet: 0.40 [0.36, 0.45] | Dun and Bradstreet: 0.63 [0.58, 0.69] | N/A |
| Dun and Bradstreet: 0.67 [0.62, 0.71] | N/A | |
| Powell [ | InfoUSA: 0.54 (0.04) |
|
| InfoUSA: 0.50 (0.03) | InfoUSA: 0.62 (0.03) |
| InfoUSA: 0.09 (0.01) | InfoUSA: 0.52 (0.04) | InfoUSA: 0.09 (0.01) |
| Dun and Bradstreet: 0.46 (0.04) | Dun and Bradstreet: 0.29 (0.03) | Dun and Bradstreet: 0.22 (0.02) | Dun and Bradstreet: 0.38 (0.03) | Dun and Bradstreet: 0.48 (0.03) | Dun and Bradstreet: 0.27 (0.02) |
| Dun and Bradstreet: 0.61 (0.03) |
| |
Italic indicates statistically higher validity statistic as compared to the other data source
Standard errors and confidence intervals are shown as reported in the cited papers
Sources for chain name lists
| Business type | Source | Years |
|---|---|---|
| Supermarkets/grocery stores | Supermarket News Top 75 Retailers and Wholesalers | 2010–2014 |
| Convenience stores | Convenience Store News Top 100 Convenience Store Companies | 2013 |
| Pharmacies | Chain Drug Review Top 50 Chains (pharmacy dollar value and pharmacy count) | 2013 |
| General merchandise stores | Expert opinion | N/A |
| Limited service restaurants | National Restaurant News Top 200 (quick service and fast casual) | 2007–2013 |
| Limited service restaurants | Quick Service Restaurant Top 50 | 2007–2013 |
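Chain name lists like these can be used to reassign a record's business type when its name matches a known chain, whatever its SIC code says. The following is a minimal sketch under stated assumptions: the chain names, the `normalize` and `classify` helpers, and the whole-word matching rule are hypothetical illustrations, not the authors' documented procedure.

```python
import re

# Hypothetical chain lists; real lists would be compiled from the trade
# sources named in the table above.
CHAIN_NAMES = {
    "supermarket/grocery store": ["example foods"],
    "convenience store": ["quick stop example"],
}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(cleaned.split())


def classify(name: str, sic_type: str) -> str:
    """Return the chain-based type if the name matches, else the SIC-based type."""
    padded = f" {normalize(name)} "
    for business_type, chains in CHAIN_NAMES.items():
        for chain in chains:
            # Pad with spaces so only whole-word matches count
            if f" {normalize(chain)} " in padded:
                return business_type
    return sic_type


print(classify("Example Foods #112", "convenience store"))
print(classify("Joe's Corner Market", "convenience store"))
```

Whole-word matching is one conservative choice among several; fuzzy matching would catch more spelling variants at the cost of more false matches.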
Purchased and final business counts for InfoUSA (food stores) and Dun and Bradstreet (restaurants), 2007–2014
| Source | Purchased data: (1) count, N | Purchased data: (2) by SIC code, N (%) | Purchased data: (3) by name, N (%) | Final data: (4) count | Final data: (5) unused (a) | Final data: (6) percent unused |
|---|---|---|---|---|---|---|
| InfoUSA | 2,847,339 | 2,690,245 (94.5) | 157,094 (5.5) | 2,341,030 | 506,309 | 17.8 |
| Dun and Bradstreet | 2,143,147 | 2,104,369 (98.2) | 38,778 (1.8) | 2,023,032 | 120,115 | 5.6 |
(a) Records were unused in the final dataset if they had insufficiently accurate geocoding, were duplicates, or were purchased in error
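The unused counts and percentages follow directly from the purchased and final counts, so they can be recomputed as a consistency check. The sketch below uses the counts from the table; the dictionary layout and the `unused_summary` helper are illustrative names, not part of the source.

```python
# Purchased and final record counts taken from the table above
counts = {
    "InfoUSA": {"purchased": 2_847_339, "final": 2_341_030},
    "Dun and Bradstreet": {"purchased": 2_143_147, "final": 2_023_032},
}


def unused_summary(purchased: int, final: int) -> tuple:
    """Return (unused records, percent of purchased records unused)."""
    unused = purchased - final
    return unused, round(100 * unused / purchased, 1)


summary = {name: unused_summary(**c) for name, c in counts.items()}
for name, (unused, pct) in summary.items():
    print(f"{name}: {unused:,} unused ({pct}%)")
```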
Store and restaurant counts before and after processing, overall and by store type, 2007–2014
| Processing step | Supermarkets and grocery stores | Convenience stores | Pharmacies | Liquor stores | General merchandise stores | Limited service restaurants |
|---|---|---|---|---|---|---|
| Provisionally classified by SIC code | 712,033 | 1,152,453 | 461,555 | 306,881 | 57,323 | 2,079,985 |
| Reclassified by name | 707,724 | 1,154,429 | 448,388 | 306,856 | 73,566 | 2,064,742 |
| After cleaning for locational accuracy | 643,306 | 1,038,993 | 424,842 | 286,261 | 69,592 | 2,023,032 |
| After deduplication by name | 624,845 | 977,178 | 408,979 | 280,473 | N/A | N/A |
| After deduplication by address | 624,700 | 977,165 | 408,935 | 280,473 | N/A | N/A |
| Final count after cleaning and deduplication (excluding AK, HI) | 621,343 | 972,735 | 407,270 | 278,895 | 60,787 | 2,023,032 |
General merchandise stores were deduplicated using geographic deduplication. Count changes in the final step are due to exclusion of records in Alaska and Hawaii
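The two deduplication passes in the table (by name, then by address) can be sketched as successive key-based filters. This is an assumption-laden illustration: the record fields, the example records, and the rule that duplicates share a ZIP code and year are hypothetical, since the matching criteria are not given here.

```python
def deduplicate(records: list, key_fields: tuple) -> list:
    """Keep the first record for each distinct combination of key fields."""
    seen = set()
    kept = []
    for rec in records:
        # Compare case-insensitively and ignore surrounding whitespace
        key = tuple(str(rec[field]).strip().lower() for field in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(rec)
    return kept


# Hypothetical records, for illustration only
records = [
    {"name": "Example Foods", "address": "100 Main St", "zip": "60601", "year": 2010},
    {"name": "EXAMPLE FOODS ", "address": "100 Main Street", "zip": "60601", "year": 2010},
    {"name": "Example Foods Inc", "address": "100 main st", "zip": "60601", "year": 2010},
    {"name": "Example Foods", "address": "100 Main St", "zip": "60601", "year": 2011},
]

by_name = deduplicate(records, ("name", "zip", "year"))
by_address = deduplicate(by_name, ("address", "zip", "year"))
print(len(records), len(by_name), len(by_address))
```

Running the name pass first and the address pass second mirrors the row order of the table; the 2011 record survives both passes because the assumed key includes the year.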
Fig. 1 Best practices for commercial business list data purchasing and processing