Gaurav Malhotra, David S. Leslie, Casimir J. H. Ludwig, Rafal Bogacz.
Abstract
The most widely used account of decision-making proposes that people choose between alternatives by accumulating evidence in favor of each alternative until this evidence reaches a decision boundary. It is frequently assumed that this decision boundary stays constant during a decision, so that the choice depends on the evidence collected but not on elapsed time. Recent experimental and theoretical work has challenged this assumption, showing that constant decision boundaries are, in some circumstances, sub-optimal. We introduce a theoretical model that facilitates identification of the optimal decision boundaries under a wide range of conditions. Time-varying optimal decision boundaries for our model are a result only of uncertainty over the difficulty of each trial and do not require decision deadlines or costs associated with collecting evidence, as assumed by previous authors. Furthermore, the shape of optimal decision boundaries depends on the difficulties of different decisions. When some trials are very difficult, optimal boundaries decrease with time, but for tasks that only include a mixture of easy and medium difficulty trials, the optimal boundaries increase or stay constant. We also show how this simple model can be extended to more complex decision-making tasks such as when people have unequal priors or when they can choose to opt out of decisions. The theoretical model presented here provides an important framework to understand how, why, and whether decision boundaries should change over time in experiments on decision-making.
Keywords: Decision-making; Decreasing bounds; Optimal decisions; Reward rate
Year: 2018 PMID: 28730465 PMCID: PMC5990589 DOI: 10.3758/s13423-017-1340-6
Source DB: PubMed Journal: Psychon Bull Rev ISSN: 1069-9384
Fig. 1a Evidence accumulation as a random walk. Gray lines show the current trajectory and black lines show possible trajectories if the decision-maker chooses to wait. b Evidence accumulation and decision-making as a Markov decision process: transitions associated with the action go are shown as dashed lines, while transitions associated with wait are shown as solid lines. The rewarded and unrewarded states are shown as C and I, respectively (for Correct and Incorrect)
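The accumulation process in panel (a) can be sketched as a simple biased random walk. In the snippet below, the up-probability u, the boundary height, and the seed are illustrative choices, not values from the paper:

```python
import random

def run_trial(u=0.7, boundary=5, rng=None):
    """Accumulate +1/-1 evidence steps until a fixed boundary is crossed.

    u is the probability of an upward step; the walk answers "up" if it
    hits +boundary and "down" if it hits -boundary.
    """
    rng = rng or random.Random()
    x, t = 0, 0
    while abs(x) < boundary:
        x += 1 if rng.random() < u else -1
        t += 1
    correct = (x > 0) == (u > 0.5)  # "up" is the correct answer when u > 0.5
    return t, correct

# Example: accuracy and mean decision time over many simulated trials
rng = random.Random(0)
trials = [run_trial(u=0.7, boundary=5, rng=rng) for _ in range(500)]
accuracy = sum(c for _, c in trials) / len(trials)
mean_rt = sum(t for t, _ in trials) / len(trials)
```

With a constant boundary, the stopping rule depends only on the accumulated evidence x, not on t, which is exactly the assumption the paper relaxes.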
Fig. 2 Each panel shows the optimal actions for different points in the state space after convergence of the policy iteration. Gray squares indicate that wait is the optimal action in that state, while black squares indicate that go is optimal. The inter-trial delays for all three computations were D = D_p = 150, and all trials in a task had the same difficulty. The up-probability for each decision in the task was drawn, with equal probability, from (a) u ∈ {0.05, 0.95}, (b) u ∈ {0.30, 0.70}, and (c) u = 0.50
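The structure of these computations can be illustrated with a simplified dynamic program. The sketch below is a stand-in for the paper's average-reward policy iteration: it uses backward induction over a finite horizon T plus a bisection search for the reward rate ρ at which the value of starting a fresh trial is zero (reward 1 for a correct response, 0 otherwise). The parameter values and the horizon are illustrative assumptions:

```python
import numpy as np

def optimal_policy(u=0.7, D=150.0, Dp=0.0, T=100):
    """Go/wait values at every state (t, x) for a single-difficulty task,
    with bisection on the reward rate rho (renewal condition V(0, 0) = 0)."""
    q = u / (1.0 - u)  # likelihood ratio of a single +1 step

    def sweep(rho):
        policy, V_next = [None] * (T + 1), None
        for t in range(T, -1, -1):
            xs = np.arange(-t, t + 1, 2)               # reachable evidence values
            p_up = q ** xs / (q ** xs + 1.0)           # P(drift is "up" | x)
            p_c = np.maximum(p_up, 1.0 - p_up)         # accuracy if we go now
            v_go = p_c - rho * (D + (1.0 - p_c) * Dp)  # reward minus delay cost
            if V_next is None:                         # horizon: forced to go
                go, V = np.ones(len(xs), bool), v_go
            else:
                p_step = p_up * u + (1.0 - p_up) * (1.0 - u)  # P(next step is +1)
                v_wait = -rho + p_step * V_next[1:] + (1.0 - p_step) * V_next[:-1]
                go = v_go >= v_wait
                V = np.where(go, v_go, v_wait)
            policy[t], V_next = go, V
        return float(V[0]), policy

    lo, hi = 0.0, 1.0
    for _ in range(60):                # bisect on the reward rate
        mid = 0.5 * (lo + hi)
        if sweep(mid)[0] > 0.0:
            lo = mid
        else:
            hi = mid
    rho = 0.5 * (lo + hi)
    return rho, sweep(rho)[1]
```

Here policy[t] is a boolean array over the evidence values x ∈ {−t, −t+2, …, t}; True marks states where go is optimal, mirroring the black squares in the figure.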
Fig. 3 Optimal actions for single and mixed difficulty tasks. The inter-trial intervals used for computing all three policies are D = D_p = 150. a Single difficulty task with up-probability for each decision drawn from u ∈ {0.30, 0.70}; b Single difficulty task with ; c Mixed difficulty task with u ∈ {0.30, 0.50, 0.70}, with both easy and difficult trials equally likely
Fig. 4 Optimal actions for mixed difficulty tasks with different difficulty levels. Each panel shows a mixed difficulty task with up-probability for each decision drawn, with equal probability, from (a) u ∈ {0.30, 0.48, 0.52, 0.70}, (b) u ∈ {0.30, 0.45, 0.55, 0.70}, and (c) u ∈ {0.30, 0.40, 0.60, 0.70}. All other parameters remain the same as in the computations shown in Fig. 3 above
Fig. 5 Optimal actions for a mixture of difficulties when the easy task has narrower bounds than the difficult task. The inter-trial delays for all three computations are D = D_p = 150. Panels (a) and (b) show optimal policies for single difficulty tasks with up-probability of each decision chosen from u ∈ {0.05, 0.95} and u ∈ {0.40, 0.60}, respectively. Panel (c) shows the optimal policy in a mixed difficulty task with up-probability chosen from u ∈ {0.05, 0.40, 0.60, 0.95}. Panels (d–f) show the change in posterior probabilities with time at the upper decision boundary for conditions (a–c), respectively
Fig. 6 Optimal actions for single and mixed difficulty tasks when inter-trial intervals are reduced to D = D_p = 50. All other parameters are the same as in Fig. 3
Fig. 7 Optimal actions remain the same if D + D_p remains the same. Each panel shows the optimal policy for up-probability drawn from the same set as in Fig. 3, but for an inter-trial delay of D = 75 for correct guesses and D_p = 150 for errors
Fig. 8 Change in optimal policy during single difficulty tasks with increasingly biased prior beliefs. a ; b ; c . For all three computations, up-probability is drawn from u ∈ {0.30, 0.70} and the inter-trial intervals are D = D_p = 150
Fig. 9 Optimal policy during mixed difficulty trials with biased prior beliefs. For all computations, the mixture of drifts involves two drift magnitudes, 𝜖 = 0.20 (easy trials) and 𝜖 = 0 (difficult trials). Three different priors are used: the left column uses , the middle column uses , and the right column uses . The first row shows optimal policies, the second row shows the posterior probability that the trial is easy given the state, and the third row shows the posterior probability that the trial has up-probability > 0.50 given the state. For all three computations, the inter-trial intervals are D = D_p = 150
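The posteriors in the second and third rows follow from Bayes' rule over the drift mixture: after t steps ending at evidence x, the number of up-steps is k = (t + x)/2, and the binomial path-count factor cancels in the normalization. A minimal sketch; the prior below is the unbiased, easy-and-difficult-equally-likely case, not the biased priors of this figure:

```python
import numpy as np

def drift_posterior(t, x, drifts, priors):
    """Posterior over up-probabilities u after t steps ending at evidence x."""
    k = (t + x) // 2                              # number of +1 steps (t + x even)
    u = np.asarray(drifts, dtype=float)
    likelihood = u ** k * (1.0 - u) ** (t - k)    # binomial coefficient cancels
    w = np.asarray(priors, dtype=float) * likelihood
    return w / w.sum()

# Mixture with easy (u = 0.30 or 0.70) and difficult (u = 0.50) trials,
# easy and difficult equally likely:
drifts = [0.30, 0.50, 0.70]
priors = [0.25, 0.50, 0.25]
post = drift_posterior(10, 6, drifts, priors)
p_easy = post[0] + post[2]   # second-row quantity: P(trial is easy | state)
p_up = post[2]               # third-row quantity: P(u > 0.50 | state)
```

A biased prior is expressed simply by changing the priors vector; the likelihood term is unaffected.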
Fig. 10 Optimal actions for all states when actions include the pass option. Gray = wait; black = go; red = pass. For all computations, 𝜖 = 0.20 (easy trials), 𝜖 = 0 (difficult trials), and D = D_p = 150. (a) The single difficulty case with . For (b) and (c), . For (a) and (b), the inter-trial interval for the pass action is 20 time steps, while for (c) it is 40 time steps. Panels (d–f) show the corresponding posterior probabilities of a drift > 0.50 for the conditions in panels (a–c)
Set of studies and drifts used to generate optimal policies
| Study | Paradigm | Conditions | Distribution | Drifts ({𝜖}) |
|---|---|---|---|---|
| PHS_05 | Motion |  | Uniform |  |
| RTM_01 | Distance | 32 values, Range: [1.7, 2.4] cm | Uniform | 17 values, Range: [0, 0.50] |
| R_07 | Brightness |  | Uniform | {0.05, 0.10, 0.20} |
| RM_08 | Motion |  | Uniform |  |
| MS_14 | Color |  | Uniform | {0, 0.05, 0.10, 0.20} |
| VRS_16: E1 | Numerosity | Range: [21, 80] | Piecewise Uniform |  |
| VRS_16: E2 | Numerosity | Range: [3, 98] | Approximately Gaussian | {0, 0.05, …, 0.50} |
| VRS_16: E3 | Numerosity | Range: [31, 70] | Uniform | {0, 0.02, …, 0.20} |
| VRS_16: E4 | Numerosity | Range: [3, 98] | Uniform | {0, 0.02, …, 0.48} |
Notes. Each row shows the set of conditions used in the experiment, the distribution of these conditions across trials, and the set of drift parameters used to compute optimal policies in Fig. 11. The value given to a condition refers to the motion coherence for the motion discrimination task, to the separation of dots for the distance judgment task, to the proportion of black pixels for the brightness discrimination task, to the percentage of cyan to magenta checkers for the color judgment task, and to the number of asterisks for the numerosity judgment task. For the computation VRS_16: E2, the probability of each drift value 𝜖 was equal to f(𝜖)/Z, where f is the probability density of the normal distribution with mean μ = 0 and standard deviation σ = 0.21, and Z is a normalization factor ensuring that the probabilities add up to 1. The names of studies are abbreviated as follows: PHS_05 (Palmer, Huk, & Shadlen, 2005), RTM_01 (Ratcliff, Thapar, & McKoon, 2001), R_07 (Ratcliff, Hasegawa, Hasegawa, Smith, & Segraves, 2007), RM_08 (Ratcliff & McKoon, 2008), MS_14 (Middlebrooks & Schall, 2014), VRS_16 (Voskuilen et al., 2016), with E1…E4 standing for Experiments 1…4, respectively.
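The VRS_16: E2 drift distribution described in the notes can be reproduced directly. The grid and σ come from the notes; the density is evaluated up to a constant, since the factor 1/(σ√(2π)) cancels in the normalization by Z:

```python
import math

# Discretized drift distribution for VRS_16: E2 - p(eps) is the N(0, 0.21^2)
# density at eps divided by Z; the 1/(sigma*sqrt(2*pi)) constant cancels.
eps_grid = [round(0.05 * i, 2) for i in range(11)]   # {0, 0.05, ..., 0.50}
sigma = 0.21
dens = [math.exp(-0.5 * (e / sigma) ** 2) for e in eps_grid]
Z = sum(dens)
probs = [d / Z for d in dens]
```

The resulting probabilities decrease monotonically from 𝜖 = 0 to 𝜖 = 0.50, so difficult trials are the most frequent in that experiment.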
Fig. 11 Optimal policies for the mixtures of difficulties used in the experiments considered by Hawkins et al. (2015) and Voskuilen et al. (2016). Insets show the slope of the optimal boundary (measured as the tangent of a line fitted to the boundary) across a range of inter-trial intervals for the mixture of drifts that maps to the experiment (solid red line) and compare it to flat boundaries (dotted black line) and to the mixture 𝜖 ∈ {0.20, 0.50}, which gives a large slope across the range of inter-trial intervals (dashed blue line). The dot along each solid (red) line indicates the value of the inter-trial interval used to generate the optimal policy shown in the main figure