Adam N Hornsby, Thomas Evans, Peter S Riefer, Rosie Prior, Bradley C Love.
Abstract
Computational models using text corpora have proved useful in understanding the nature of language and human concepts. One appeal of this work is that text, such as from newspaper articles, should reflect human behaviour and conceptual organization outside the laboratory. However, texts do not directly reflect human activity, but instead serve a communicative function and are highly curated or edited to suit an audience. Here, we apply methods devised for text to a data source that directly reflects thousands of individuals' activity patterns. Using product co-occurrence data from nearly 1.3 million supermarket shopping baskets, we trained a topic model to learn 25 high-level concepts (or topics). These topics were found to be comprehensible and coherent by both retail experts and consumers. The topics indicated that human concepts are primarily organized around goals and interactions (e.g. tomatoes go well with vegetables in a salad), rather than their intrinsic features (e.g. defining a tomato by the fact that it has seeds and is fleshy). These results are consistent with the notion that human conceptual knowledge is tailored to support action. Individual differences in the topics sampled predicted basic demographic characteristics. Our findings suggest that human activity patterns can reveal conceptual organization and may give rise to it.
Keywords: Big data; Cognition; Computational social science; Decision making; Machine learning
Year: 2019 PMID: 32455337 PMCID: PMC7235073 DOI: 10.1007/s42113-019-00064-9
Source DB: PubMed Journal: Comput Brain Behav ISSN: 2522-0861
Fig. 1. The input in a corpus analysis is typically item counts (i.e. word counts) within some context (e.g. a sentence or document). Analogously, products (akin to words) are organized into baskets (akin to sentences). One advantage of applying these analysis techniques to baskets is that, unlike natural language, meaning is unaffected by item order
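The basket-as-sentence analogy in Fig. 1 can be sketched as a basket-by-product count matrix. This is a minimal illustration, not the authors' pipeline: the basket contents and the helper name `basket_count_matrix` are hypothetical.

```python
from collections import Counter

# Hypothetical baskets: each list holds the products from one shopping trip.
baskets = [
    ["lettuce", "cucumber", "tomatoes"],
    ["noodles", "beansprouts", "tomatoes"],
    ["lettuce", "tomatoes", "tomatoes"],
]


def basket_count_matrix(baskets):
    """Return (vocabulary, rows), where rows[i][j] counts product j in basket i."""
    vocabulary = sorted({product for basket in baskets for product in basket})
    column = {product: j for j, product in enumerate(vocabulary)}
    rows = []
    for basket in baskets:
        row = [0] * len(vocabulary)
        for product, n in Counter(basket).items():
            row[column[product]] = n
        rows.append(row)
    return vocabulary, rows


vocabulary, rows = basket_count_matrix(baskets)

# Item order inside a basket does not change the counts.
_, shuffled_rows = basket_count_matrix([list(reversed(b)) for b in baskets])
```

Because only counts are kept, reversing the order of items in every basket yields the same matrix, which is the order-invariance property the caption points to.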
Fig. 2. Latent Dirichlet Allocation (LDA) uncovers the higher-level product topics that can be viewed as generating the observed baskets purchased by consumers. LDA’s fit is driven by the co-occurrence pattern of products within baskets. In the solution, each product has a probability of occurring within each topic (shown on the left for apple). The colours illustrate which topic each product would have been labelled with if using the maximum product topic probability. Each basket is generated by a mixture of probabilities over the topics (shown on the right for this basket)
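To make the quantities in Fig. 2 concrete, the sketch below fits a tiny LDA model with collapsed Gibbs sampling, one common inference scheme; the record does not say which inference method or software the authors used. The function name `fit_lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy baskets are all assumptions made for illustration.

```python
import random


def fit_lda_gibbs(baskets, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over baskets of product names.

    Returns (vocab, phi, theta): phi[k][j] estimates P(product j | topic k),
    theta[d][k] estimates the share of topic k in basket d.
    """
    rng = random.Random(seed)
    vocab = sorted({p for b in baskets for p in b})
    index = {p: j for j, p in enumerate(vocab)}
    n_products = len(vocab)
    docs = [[index[p] for p in b] for b in baskets]

    basket_topic = [[0] * n_topics for _ in docs]
    topic_product = [[0] * n_products for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Start from a random topic assignment for every purchased item.
    assignments = []
    for d, doc in enumerate(docs):
        z = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, z):
            basket_topic[d][k] += 1
            topic_product[k][w] += 1
            topic_total[k] += 1
        assignments.append(z)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the item's current assignment from the counts.
                k = assignments[d][i]
                basket_topic[d][k] -= 1
                topic_product[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (basket_topic[d][t] + alpha)
                    * (topic_product[t][w] + beta)
                    / (topic_total[t] + n_products * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                basket_topic[d][k] += 1
                topic_product[k][w] += 1
                topic_total[k] += 1

    phi = [
        [(topic_product[k][w] + beta) / (topic_total[k] + n_products * beta)
         for w in range(n_products)]
        for k in range(n_topics)
    ]
    theta = [
        [(basket_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
         for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    return vocab, phi, theta


# Hypothetical toy baskets, far smaller than the data described in the abstract.
baskets = [
    ["lettuce", "cucumber", "tomatoes"],
    ["cucumber", "tomatoes", "cress"],
    ["noodles", "beansprouts", "rice"],
    ["noodles", "rice", "beansprouts"],
]
vocab, phi, theta = fit_lda_gibbs(baskets, n_topics=2)

# Label each product with the topic in which it is most probable.
labels = {p: max(range(2), key=lambda k: phi[k][j]) for j, p in enumerate(vocab)}
```

Here `phi` plays the role of the per-product topic probabilities on the left of the figure, `theta` the per-basket topic mixture on the right, and `labels` the colour coding by maximum probability.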
Retailer-supplied product descriptions for the 5 most relevant products within each of the 10 surveyed topics. Note that the authors had access to the full product topic relevancy matrix (see https://osf.io/tsymx/) when they labeled the topics. Brand names have been removed from this table for publication
| Topic | Description |
|---|---|
| Food for now | ITALIAN BEEF LASAGNE 450G |
| | ITAL CHICKEN & BACON PASTA BAKE 450G |
| | ITALIAN MACARONI CHEESE PASTA 450G |
| | ITAL SPAGHETTI CARBONARA 450G |
| | ITAL HAM & MUSHROOM TAGLIATELLE 450G |
| Summer salad | BUNCHED SPRING ONIONS 100G |
| | ICEBERG LETTUCE EACH |
| | WHOLE CUCUMBER EACH |
| | SALAD TOMATOES 6 PACK |
| | GROWING SALAD CRESS EACH |
| Stir fry | FRESH EGG NOODLES 375G |
| | VEGETABLE & BEANSPROUT STIR FRY 333G |
| | CHINESE STIR FRY BOWL 300G |
| | EXPRESS GOLDEN VEG RICE 250G |
| | BEANSPROUTS 370G |
| Afternoon tea | 2 EGG CUSTARD TARTS 2X90G |
| | BRS/SKIMMED MLK 1.136L/2PINTS |
| | DANISH SLICED WHITE BREAD 400G |
| | MINT HUMBUGS 200G |
| | BANANAS LOOSE |
| Loose fruit and veg | CARROTS LOOSE |
| | BANANAS LOOSE |
| | PARSNIPS LOOSE |
| | CONFERENCE PEARS LOOSE |
| | BROCCOLI LOOSE |
| Low calorie options | LIGHT FRUITS YOGURT 6X175G |
| | BRSKIMMED MILK 2.272L/4 PINTS |
| | LIGHT YELLOW FRUIT YOGURT 6X175 |
| | LIGHT TOFFEE YOGURT 175G |
| | LIGHT LIMITED EDITION YOGHURT 165G |
| Cheapest option | EDAY VALUE BAKED BEANS IN TOM SAUCE 420G |
| | EVERYDAY VALUE HAM 364G |
| | EDAY VALUE MILK CHOCOLATE DIGESTIVES 300G |
| | EDAY VALUE PENNE 500G |
| | EVERYDAY VALUE LOW FAT FRUIT YOG 4X125G |
| Cooking from scratch | COURGETTES LOOSE |
| | LOOSE BROWN ONIONS |
| | RED ONIONS LOOSE |
| | CARROTS LOOSE |
| | GARLIC EACH |
| Christmas | ORIGINAL CRISPS 190G |
| | SOUR CREAM & ONION CRISPS 190G |
| | BRUSSELS SPROUTS 500G |
| | PARSNIPS PACK 500G |
| | SALT & VINEGAR CRISPS 190G |
| Low maintenance cooking | PREPARED BABY SPROUTS 180G |
| | PREPARED CARROT CAULIFLOWER & BROCCOLI 370G |
| | PREPARED TRAD SLICED RUNNER BEANS 185G |
| | PREPARED BROCCOLI FLORETS 240G |
| | TOPSIDE OF ROASTBEEF 85G |
Fig. 3. Proportion correct with standard error bars for the study on label agreement involving retail experts and the intruder study involving typical consumers. All proportions were significantly different (p < .001) from chance levels, 25.00% (1 of 4) and 16.67% (1 of 6), respectively
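The chance levels in Fig. 3 follow directly from the number of response options, and a comparison against chance can be sketched with an exact one-sided binomial test. The caption does not state which test the authors used, so this is only one plausible choice; the counts below (30 correct out of 50) are hypothetical and are not results from the study.

```python
from math import comb


def binomial_upper_tail(successes, trials, chance):
    """Return P(X >= successes) for X ~ Binomial(trials, chance)."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Chance levels from the caption: one correct option among 4 or among 6.
expert_chance = 1 / 4
intruder_chance = 1 / 6

# Hypothetical counts for illustration only (not data from the paper).
p_value = binomial_upper_tail(successes=30, trials=50, chance=expert_chance)
```

A small `p_value` here would indicate that the observed proportion correct is unlikely under pure guessing at the stated chance level.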
Fig. 4. Topic prevalence varies by season. The proportion of baskets with a given topic label in each month of 2014, divided by the monthly mean across all topics (i.e. the index), is shown. (a) Topics that should be seasonal peak at the expected time, such as December for the Christmas topic. (b) In contrast, topics for staple products vary less in prevalence over time
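The index in Fig. 4 can be sketched as follows, under one reading of the caption: within each month, a topic's share of labelled baskets is divided by the mean share across all topics in that month. The caption is terse, so this normalization is an assumption, as are the function name `seasonal_index` and the toy data.

```python
from collections import Counter, defaultdict


def seasonal_index(labelled_baskets, topics):
    """Compute a per-month prevalence index for each topic.

    labelled_baskets: iterable of (month, topic_label) pairs, one per basket.
    Returns {month: {topic: index}}, where index is the topic's share of
    baskets in that month divided by the mean share across all topics.
    """
    counts = defaultdict(Counter)
    for month, topic in labelled_baskets:
        counts[month][topic] += 1

    index = {}
    for month, topic_counts in counts.items():
        total = sum(topic_counts.values())
        proportions = {t: topic_counts[t] / total for t in topics}
        mean_proportion = sum(proportions.values()) / len(topics)
        index[month] = {t: proportions[t] / mean_proportion for t in topics}
    return index


# Hypothetical labelled baskets: (month number, topic label).
toy = [
    (12, "Christmas"), (12, "Christmas"), (12, "Christmas"), (12, "Staples"),
    (6, "Christmas"), (6, "Staples"), (6, "Staples"), (6, "Staples"),
]
result = seasonal_index(toy, topics=["Christmas", "Staples"])
```

In this toy example the Christmas topic has an index of 1.5 in month 12 and 0.5 in month 6, so values above 1 mark months where a topic is more prevalent than the average topic.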