
Social behavior for autonomous vehicles.

Wilko Schwarting¹, Alyssa Pierson², Javier Alonso-Mora³, Sertac Karaman⁴, Daniela Rus².

Abstract

Deployment of autonomous vehicles on public roads promises increased efficiency and safety. It requires understanding the intent of human drivers and adapting to their driving styles. Autonomous vehicles must also behave in safe and predictable ways without requiring explicit communication. We integrate tools from social psychology into autonomous-vehicle decision making to quantify and predict the social behavior of other drivers and to behave in a socially compliant way. A key component is Social Value Orientation (SVO), which quantifies the degree of an agent's selfishness or altruism, allowing us to better predict how the agent will interact and cooperate with others. We model interactions between agents as a best-response game wherein each agent negotiates to maximize their own utility. We solve the dynamic game by finding the Nash equilibrium, yielding an online method of predicting multiagent interactions given their SVOs. This approach allows autonomous vehicles to observe human drivers, estimate their SVOs, and generate an autonomous control policy in real time. We demonstrate the capabilities and performance of our algorithm in challenging traffic scenarios: merging lanes and unprotected left turns. We validate our results in simulation and on human driving data from the NGSIM dataset. Our results illustrate how the algorithm's behavior adapts to social preferences of other drivers. By incorporating SVO, we improve autonomous performance and reduce errors in human trajectory predictions by 25%.

Keywords: Social Value Orientation; autonomous driving; game theory; inverse reinforcement learning; social compliance

Year: 2019. PMID: 31757853. PMCID: PMC6911195. DOI: 10.1073/pnas.1820676116

Source DB: PubMed. Journal: Proc Natl Acad Sci U S A. ISSN: 0027-8424. Impact factor: 11.205

Interacting with human drivers is one of the great challenges of autonomous driving. To operate in the real world, autonomous vehicles (AVs) need to cope with situations requiring complex observations and interactions, such as highway merging and unprotected left-hand turns, which are challenging even for human drivers. For example, over 450,000 lane-change/merging accidents and 1.4 million right-/left-turn accidents occurred in the United States in 2015 alone (1). Currently, AVs lack an understanding of human behavior, thus requiring conservative behavior for safe operation. Conservative driving creates bottlenecks in traffic flow, especially in intersections. For example, Waymo, considered a leader in autonomous driving, still struggles with left turns and acting in predictable manners (2). This conservative behavior not only leaves AVs vulnerable to aggressive human drivers and inhibits the interpretability of intentions, but also can result in unexpected reactions that confuse and endanger others. In a recent analysis of California traffic incidents with AVs, in 57% of crashes, the AV was rear-ended by human drivers (3), with many of these crashes occurring because the AV behaved in an unexpected way that the human driver did not anticipate. For AVs to integrate onto roadways with human drivers, they must understand the intent of the human drivers and respond in a predictable and interpretable way. While planning a left turn may be trivial for an AV on an empty roadway, it remains difficult in heavy traffic. For human drivers, these unprotected left turns often occur when an oncoming driver slows down to yield, an implicit signal to the other driver that it is safe to turn. An AV must also recognize these social cues of selfishness or cooperation, and failure to do so impacts the overall flow of the traffic network and even the safety of the traffic participants. 
AVs rely on explicit communication, state machines, or geometric reasoning about the driving interactions (4–8), neglecting social cues and driver personality. These approaches cannot handle complex interactions, resulting in conservative behavior and limiting autonomy solutions to simple road interactions. Additionally, humans cannot directly quantify and communicate their actions and decisions to autonomous agents. We use game theory to capture the dynamic interactions between agents, considering an agent’s “best response” given the decisions of all other agents. Other approaches that use game-theoretic formulations model agents as selfish with homogeneous decision making (9–12). Instead, we extend the ability of AVs’ reasoning by incorporating estimates of the other drivers’ personality and driving style from social cues. This allows us to handle more complex navigation scenarios that rely on interactions, like multiple vehicles in an intersection. We present a mathematical formulation that combines control-theoretic approaches with models and metrics from the psychology literature, behavioral game theory, and machine learning.

Main Contributions.

This article proposes a system to measure, quantify, and predict human behavior to better inform an autonomous system. A game-theoretic formulation models driving as a series of social dilemmas to represent the dynamic interaction between drivers. We formulate a direct solution of the best-response game, allowing for fast, online predictions and planning, while integrating environmental and planning constraints to ensure safety. The game’s reward functions are dynamic and dependent on the vehicles’ states and the environment. Since we learn the reward functions from human driving data, we expect that our approach translates to other traffic scenarios and, broadly, human–robot interactions, where we can derive similar predictions trained on relevant data. Using Social Value Orientation (SVO), a common metric from psychology, we quantify human social preferences and their corresponding levels of cooperation. SVO measures how an individual weights their reward against the rewards of others, which translates into altruistic, prosocial, egoistic, or competitive preferences. We estimate the human drivers’ SVOs from observed motion and set the AV’s SVO based on the scenario. The main contributions of this paper are as follows: modeling driving as a dynamic game and computing its Nash equilibrium; predicting human actions from expected utility maximization; integrating SVO preferences into the utility-maximizing framework; estimating SVO online from observed driving trajectories; simulations of emerging socially compliant autonomous driving behavior; and validation on Next Generation Simulation (NGSIM) (13) driving data, a human dataset of the US Highway 101.

Driving as a Game.

We model driving as a noncooperative dynamic game (14), where the driving agents maximize their accumulated reward, or “payout,” over time. At each point in time, the agent receives a reward, which may be defined by factors like delay, comfort, distance between cars, progress to goal, and other priorities of the driver. Fig. 1 illustrates an example of a driving game: an unprotected left turn. Here, the blue car must make a left turn across the path of the black car. Depending on how the interaction is resolved, the agents accrue different rewards for decisions such as comfortable braking, waiting for others to pass, or safety. In Fig. 1, if each driver only maximizes their own reward, then the black vehicle would never brake for the blue vehicle making the unprotected left turn. However, we know that human drivers often brake for others in an act of altruism or cooperation. Similarly, in highway driving, we observe human drivers open gaps for merging vehicles. If all agents were to act in pure selfishness, the result would be increased congestion and, therefore, a decrease in the overall group’s reward. We thus conclude that driving poses a sequence of social dilemmas.

Fig. 1.

(A) Knowing a driver’s SVO helps predict their behavior. Here, the AV (blue) observes the trajectory of the other human driver (black). We can predict future motion of the black vehicle for candidate SVOs based on a utility-maximizing decision model. If the human driver is egoistic, they will not yield, and the AV must wait to turn. If the human driver is prosocial, they will yield, and the AV can safely turn. In both cases, the driver is utility-maximizing, but the utility function varies by SVO. An egoistic driver considers only its own reward in computing its utility. A prosocial driver weights its reward with the reward of the other car. The most likely SVO is the one that best matches a candidate trajectory to the actual observed trajectory. The AV predicts future motion using the most likely SVO estimate. (B) SVO is represented as an angular preference that relates how individuals weight rewards in a social dilemma game. Here, we plot the estimated SVOs for drivers merging in the NGSIM dataset. (C) The distribution of mean SVO estimates during interactions. We find merging drivers (red) to be more competitive than nonmerging drivers (blue).

Social Coordination.

Social dilemmas involve a conflict between the agent’s short-term self-interest and the group’s longer-term collective interest. Social dilemmas occur in driving, where drivers must coordinate their actions for safe and efficient joint maneuvers. Other examples include resource depletion, low voter turnout, overpopulation, the prisoner’s dilemma, or the public goods game. The autonomous control system proposed in this paper builds on social preferences of human drivers to predict outcomes of social dilemmas: whether individuals cooperate or defect, such as opening or closing a gap during a traffic merge. It allows us to better predict human behavior, thus offering a better basis for decision-making. It may also improve the efficiency of the group as a whole through emerging cooperation, for example, by reducing congestion.

Social Value Orientation.

Behavioral and experimental economics show that people have unique and individual social preferences, including interpersonal altruism, fairness, reciprocity, inequity aversion, and egalitarianism. Self-interested models, like homo economicus (15), assume that agents maximize only their own reward in a game, which fails to account for nuances in real human behavior. In contrast, SVO indicates a person’s preference of how to allocate rewards between themselves and another person. SVO can predict cooperative motives, negotiation strategies, and choice behavior (16–21). SVO preferences can be represented with a slider measure (22), a discrete-form triple dominance measure (23), or as an angle within a ring (24). We denote SVO in angular notation, shown in Fig. 1. Returning to Fig. 1, SVO helps explain when the black car yields. Here, the black car considers both its reward and the reward of the blue car, weighted by SVO. As the angular preference increases from egoistic to prosocial, the weight of the other agent’s reward increases, making it more likely that the black car will yield. Knowing a vehicle’s SVO helps an AV better predict the actions of that vehicle and allows it to complete the turn if cooperation is expected. Without SVO, it would wait conservatively until all cars cleared the intersection. An AV needs to estimate SVO, since humans cannot communicate this directly. Instead, humans observe and estimate SVO from actions and social cues (25). SVO preference distributions of individuals are largely individualistic and prosocial (22, 26–29), which emphasizes that an SVO-based model will be more accurate than a purely selfish model. We estimate SVOs of other drivers by determining the SVO that best fits predicted trajectories to the actual driver trajectories. This technique enables the estimation and study of SVO distributions of agent populations directly from trajectory data, extending beyond driving. We plot the estimated SVOs for drivers merging in the NGSIM dataset in Fig. 1.

Socially Compliant Driving.

Using SVO estimates of human drivers, we can design the control policy of the AV. We define socially compliant driving as behaving predictably to other human and autonomous agents during the sequence of driving social dilemmas. Achieving socially compliant driving in AVs is fundamental for the safety of passengers and surrounding vehicles, since behaving in a predictable manner enables humans to understand and appropriately respond to the AV’s actions. To achieve socially compliant driving, the autonomous system must behave in as human-like a manner as possible, which requires an intrinsic understanding of human behavior, as well as the social expectations of the group. Human behavior may be imitated by learning human policies from data through imitation learning (30, 31). Our autonomous system design enables social compliance by learning human reward functions through inverse reinforcement learning (IRL) (32). The optimal control policy of the best-response game with learned rewards yields a human-imitating policy (9, 33, 34). Mathematically, the imitating policy is the expectation of human behavior based on past observed actions, capable of predicting and mimicking human trajectories. Combined with SVO, this enables an AV to behave as a human driver is expected to behave in traffic scenarios, such as acting more competitively during merges, and mirroring the utility-maximization strategies of humans with heterogeneous social preferences in social dilemmas (35). When designing a cooperative AV, it may be desirable to assign the AV a prosocial SVO. Prosocials exhibit more fairness and considerateness compared to individualists (16) and engage in more volunteering, proenvironment, procommunity, and charitable efforts (17, 36–38). They also tend to minimize differences in outcomes between self and others (inequality aversion and egalitarianism) (18, 22). Additional findings suggest reciprocity in SVO and resulting cooperation (35, 39, 40).

To make the unprotected turn in Fig. 1, the AV first observes the trajectory of the oncoming car, which can be done with onboard sensors. Using the reward (payoff) structure learned from data and our utility-maximizing behavior model, it generates candidate trajectories based on possible SVO values. The most likely SVO is the one that best matches a candidate trajectory to the actual observed trajectory. With this estimated SVO, the AV then generates future motion predictions and plans when to turn safely.

Estimating Driver Behavior with SVO

Our approach integrates SVO into a noncooperative dynamic game, and we model the agents as making utility-maximizing decisions, with the optimization framework presented in Driving as a Game in Mixed Human–Robot Systems. To integrate SVO into our game-theoretic formulation, we define a utility function $g(\cdot)$ that combines the rewards of the ego agent with other agents, weighted by the ego agent’s SVO angular preference $\varphi$. For a two-agent game,

$$g_1 = \cos(\varphi_1)\, r_1(\cdot) + \sin(\varphi_1)\, r_2(\cdot), \tag{1}$$

where $r_1$ and $r_2$ are the “reward to self” and “reward to other,” respectively, and $\varphi_1$ is the ego agent’s SVO. We see that the orientation of $\varphi_1$ will weight the reward $r_1$ against $r_2$ based on the ego agent’s actions. The following definitions of social preferences (22, 24) are based on these weights:

Altruistic: Altruistic agents maximize the other party’s reward, without consideration of their own outcome, with $\varphi \approx \pi/2$.

Prosocial: Prosocial agents behave with the intention of benefiting a group as a whole, with $\varphi \approx \pi/4$. This is usually defined by maximizing the joint reward.

Individualistic/egoistic: Individualistic agents maximize their own outcome, without concern of the reward of other agents, with $\varphi \approx 0$. The term egoistic is also used.

Competitive: Competitive agents maximize their relative gain over others, i.e., $\varphi \approx -\pi/4$.

We limit our definitions to rational social preferences, with more in refs. 22 and 24. While our definitions give specific values of SVO preferences for clarity, we also note that SVO exists on a continuum. For example, values in the range $0 < \varphi < \pi/2$ all exhibit a certain degree of altruism. We denote cooperative actions as actions that improve the outcome for all agents. For example, two egoistic agents may cooperate if both benefit in the outcome. Prosocials make cooperative choices, as their utility-maximizing policy also values a positive outcome of others. These cooperative choices improve the efficiency of the interaction and create collective value.
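As a rough illustration of how a single angle changes a driver's choice, the sketch below assumes the conventional SVO ring angles (altruistic π/2, prosocial π/4, egoistic 0, competitive −π/4) and a utility that weights reward to self by the cosine and reward to other by the sine of the angle. The reward numbers for yielding are invented for the example and are not taken from the paper.

```python
import math

# Nominal SVO angles in radians (conventional ring-measure values, assumed here).
SVO_ANGLES = {
    "altruistic": math.pi / 2,
    "prosocial": math.pi / 4,
    "egoistic": 0.0,
    "competitive": -math.pi / 4,
}

def svo_utility(reward_self, reward_other, phi):
    """Two-agent utility: cos(phi) * reward to self + sin(phi) * reward to other."""
    return math.cos(phi) * reward_self + math.sin(phi) * reward_other

def nearest_preference(phi):
    """Label an SVO angle with the closest nominal preference."""
    return min(SVO_ANGLES, key=lambda name: abs(SVO_ANGLES[name] - phi))

def prefers_to_yield(phi):
    """Hypothetical dilemma: yielding costs the driver 1 and gains the other car 3;
    not yielding leaves both rewards at 0."""
    return svo_utility(-1.0, 3.0, phi) > svo_utility(0.0, 0.0, phi)

decisions = {name: prefers_to_yield(phi) for name, phi in SVO_ANGLES.items()}
```

Under these made-up rewards, the egoistic and competitive drivers keep going while the prosocial and altruistic drivers yield, which mirrors the left-turn example in Fig. 1.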

Measuring and Estimating SVO Online.

Given that other drivers maximize utility, we can predict their trajectories from observations and an estimate of their SVO. The choice of SVO changes the predicted trajectories. In Fig. 1, a prosocial SVO generates a braking trajectory prediction, while an egoistic SVO generates a nonbraking trajectory. We compute the likelihood of candidate SVOs by evaluating a Gaussian kernel on the distance between predicted and actual trajectories. We also consider a maximum-entropy model, which builds a likelihood function based on the distance of the observed trajectory to optimality given a candidate SVO. We utilize these methods to estimate SVO from human driver trajectories in Methods and Results.
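The Gaussian-kernel estimate can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mean squared point distance, the kernel width `sigma`, and the two toy candidate trajectories are assumptions made for the example.

```python
import math

def trajectory_distance(predicted, observed):
    """Mean squared Euclidean distance between two equal-length (x, y) trajectories."""
    return sum((px - ox) ** 2 + (py - oy) ** 2
               for (px, py), (ox, oy) in zip(predicted, observed)) / len(observed)

def svo_likelihoods(candidates, observed, sigma=1.0):
    """Normalized Gaussian-kernel likelihood for each candidate SVO angle.

    `candidates` maps an SVO angle to the trajectory predicted under that angle.
    """
    weights = {phi: math.exp(-trajectory_distance(traj, observed) / (2 * sigma ** 2))
               for phi, traj in candidates.items()}
    total = sum(weights.values())
    return {phi: w / total for phi, w in weights.items()}

# Toy data: the observed car slows down, as the prosocial prediction does.
observed = [(0, 0), (9, 0), (17, 0), (22, 0)]
candidates = {
    0.0: [(0, 0), (10, 0), (20, 0), (30, 0)],          # egoistic: constant speed
    math.pi / 4: [(0, 0), (9, 0), (16, 0), (21, 0)],   # prosocial: braking
}
likelihoods = svo_likelihoods(candidates, observed)
most_likely_svo = max(likelihoods, key=likelihoods.get)
```

The most likely SVO is simply the candidate whose predicted trajectory lies closest to the observation; a narrower `sigma` makes the estimate more decisive.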

Benefit of SVO.

We improve predictions of interactions by estimating the SVO of other drivers online. Incorporating SVO into the model increases social compliance of vehicles in the system, by improving predictability and blending in better. For the AVs, SVO adds the capability of nuanced cooperation with only a single variable. The AV’s SVO can be specified as user input, or change dynamically according to the driving scenario, such as becoming more competitive during merging.

Driving as a Game in Mixed Human–Robot Systems

To create a socially compliant autonomous system, our autonomous agents must determine their control strategies based on the decisions of the human and other agents. This section details how we incorporate a human decision-making model into an optimization framework. We formulate the utility-maximizing optimization problem as a multiagent dynamic game and then derive the Nash equilibrium to solve for a socially compliant control policy.

Consider a system of $m$ human drivers and $n$ autonomous agents, with states such as position, heading, and speed, at time $k$ denoted $\mathbf{x}_i^k \in \mathcal{X}$, where $i \in \{1, \dots, m+n\}$ and $\mathcal{X}$ is the set of all possible states. We denote $\mathbf{u}_i^k \in \mathcal{U}$ as the control input, such as acceleration and steering angle, of agent $i$ and $\varphi_i \in \Phi$ as SVO preference, where $\mathcal{U}$ is the set of all possible control inputs and $\Phi$ is the set of possible SVO preferences. For brevity, we write the state of all agents in the system as $\mathbf{x}$ and all control inputs as $\mathbf{u}$. The states evolve according to dynamics $\mathcal{F}_i(\mathbf{x}_i^k, \mathbf{u}_i^k)$ subject to constraints $c_i(\cdot) \le 0$, with the discrete-time transition function

$$\mathbf{x}^{k+1} = \mathcal{F}(\mathbf{x}^k, \mathbf{u}^k). \tag{2}$$

The notation $\neg i$ refers to the set of agents excluding agent $i$. For example, we can write the state vector $\mathbf{x} = [\mathbf{x}_i^\top, \mathbf{x}_{\neg i}^\top]^\top$, with $\mathbf{x}_{\neg i} = \mathbf{x} \setminus \mathbf{x}_i$. The agents calculate their individual control policies by solving a general discrete-time constrained optimization over a horizon of $N$ time steps. The set of states over the horizon is denoted as $\mathbf{x}^{0:N}$, and the set of inputs is $\mathbf{u}^{0:N-1}$.

To calculate the control policy, we formulate a utility function for each agent and then find the utility-maximizing control actions. The utility function $g_i(\cdot)$ is defined as a combination of reward functions $r_i(\cdot)$, as described in Eq. 1, and calculated from weighted features of the current state, controls, the environment, and social preference $\varphi_i$. At a given time $k$, each agent $i$’s utility function is given by $g_i(\mathbf{x}^k, \mathbf{u}^k, \varphi_i)$, and at the end of the horizon by the terminal utility $g_i^N(\mathbf{x}^N, \varphi_i)$. The utility over the time horizon is denoted $G_i(\cdot)$, written

$$G_i(\mathbf{x}^0, \mathbf{u}, \varphi_i) = \sum_{k=0}^{N-1} g_i(\mathbf{x}^k, \mathbf{u}^k, \varphi_i) + g_i^N(\mathbf{x}^N, \varphi_i). \tag{3}$$

In this paper, we learn the reward functions from the NGSIM driving data to approximate real human behavior.
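To make the bookkeeping concrete, the sketch below rolls two agents forward through a discrete-time transition and accumulates one agent's SVO-weighted utility, assuming the horizon utility is a sum of per-step utilities plus a terminal term. The 1-D point-mass dynamics, the speed-minus-effort reward, and the terminal term are hypothetical stand-ins; the paper learns its reward functions from NGSIM data.

```python
import math

def step(state, accel, dt=0.1):
    """Hypothetical 1-D point-mass transition: state = (position, speed)."""
    position, speed = state
    return (position + speed * dt, speed + accel * dt)

def reward(state, accel):
    """Hypothetical stage reward: forward speed minus a small control-effort penalty."""
    return state[1] - 0.1 * accel ** 2

def horizon_utility(x0, controls, phi, agent, dt=0.1):
    """Sum of SVO-weighted stage utilities over the horizon plus a terminal utility.

    x0: per-agent initial states; controls: one (accel_0, accel_1) pair per time step.
    """
    states = list(x0)
    other = 1 - agent
    total = 0.0
    for u in controls:
        r = [reward(states[i], u[i]) for i in (0, 1)]
        total += math.cos(phi) * r[agent] + math.sin(phi) * r[other]
        states = [step(states[i], u[i], dt) for i in (0, 1)]
    terminal = [s[1] for s in states]  # terminal reward: final speed of each agent
    total += math.cos(phi) * terminal[agent] + math.sin(phi) * terminal[other]
    return total

x0 = [(0.0, 10.0), (0.0, 10.0)]
coast = [(0.0, 0.0)] * 3  # three steps with zero acceleration for both agents
egoistic_utility = horizon_utility(x0, coast, 0.0, agent=0)
prosocial_utility = horizon_utility(x0, coast, math.pi / 4, agent=0)
```

With both cars coasting at 10 m/s, the egoistic agent counts only its own four reward terms, while the prosocial agent also counts the other car's, scaled by the cosine and sine of π/4.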

Human Decision-Making Model.

From psychology literature, we find that people are heterogeneous in their evaluation of joint rewards (18), and we can model preferences for others using utility functions that weight rewards (39–41). Murphy and Ackermann (35) model human decision making as expected utility maximizing under individual preferences. Based on these findings from behavioral decision theory, we model human agents in our system as agents that make utility-maximizing decisions. Other robotics literature (9, 33, 34) supports this case. Translating this decision making into an optimization framework for socially compliant behavior, we write the utility-maximizing policy

$$\mathbf{u}_i^*(\mathbf{x}^0, \varphi_i) = \arg\max_{\mathbf{u}_i} G_i(\mathbf{x}^0, \mathbf{u}_i, \mathbf{u}_{\neg i}, \varphi_i). \tag{4}$$

The solution to Eq. 4 also corresponds to the actions maximizing the likelihood under the maximum entropy model

$$P(\mathbf{u}_i \mid \mathbf{x}^0, \mathbf{u}_{\neg i}, \varphi_i) \propto \exp\big(G_i(\mathbf{x}^0, \mathbf{u}_i, \mathbf{u}_{\neg i}, \varphi_i)\big), \tag{5}$$

used to learn our rewards by IRL (32, 42). Under this model, the probability of actions is proportional to the exponential of the utility encountered along the trajectory. Hence, utility maximization yields actions most likely imitating human driver behavior, which is important for social compliance. Although the human driver does not explicitly calculate $\mathbf{u}_i^*$, we assume our model and formulation of $\mathbf{u}_i^*$ captures the decision-making process of the human driver based on their observations, control actions, and underlying reward function of the environment. Later, we validate on the NGSIM dataset that our learned model successfully predicts the actual trajectories driven by the human drivers.
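The link between utility maximization and maximum likelihood can be seen in a small discrete example. The sketch assumes a finite set of candidate actions with made-up utilities; the paper works with continuous trajectories, so this is only an illustration of the proportional-to-exponential rule.

```python
import math

def maxent_probabilities(utilities):
    """Action probabilities proportional to exp(utility), computed stably."""
    peak = max(utilities.values())
    weights = {action: math.exp(g - peak) for action, g in utilities.items()}
    total = sum(weights.values())
    return {action: w / total for action, w in weights.items()}

# Hypothetical horizon utilities of three candidate actions for one driver.
utilities = {"yield": 2.0, "keep_speed": 1.0, "accelerate": 0.0}
probabilities = maxent_probabilities(utilities)
most_likely_action = max(probabilities, key=probabilities.get)
```

Because the exponential is monotonic, the action with the highest utility is also the most probable one, and a utility gap of 1 corresponds to a probability ratio of e.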

Game-Theoretic Autonomous Control Policy with SVO.

To design the control policy for the AV, note that Eq. 4 formulated for all agents simultaneously defines a dynamic game (14). Given SVO estimates for all agents and a set of constraints on the system, we solve for the optimal control policy of a vehicle, $\mathbf{u}_i^*$, assuming the other agents in the system also choose an optimal policy, $\mathbf{u}_{\neg i}^*$. For an intuition on how these dynamic games work, we first start with a Stackelberg game. An example traffic scenario that can be modeled as a Stackelberg game is cars arriving at a four-way stop, where they must traverse the intersection based on the first arrival. In the traditional two-agent Stackelberg game (43), the leader makes its choice of policy, $\mathbf{u}_1$, and the follower maximizes their control given the leader policy, $\mathbf{u}_2^*(\mathbf{u}_1)$. While the Stackelberg game can model some intersections, in many traffic scenarios, it is unclear who should be the leader and the follower, thus necessitating a more symmetric and simultaneous choice game, which is the approach we use in this paper. In the two-agent case, the follower chooses $\mathbf{u}_2^*(\mathbf{u}_1)$, but the leader readjusts based on the follower, or $\mathbf{u}_1^*(\mathbf{u}_2^*(\mathbf{u}_1))$. This back and forth creates more levels of tacit negotiation and best response, such that $\mathbf{u}_1^*(\mathbf{u}_2^*(\mathbf{u}_1^*(\mathbf{u}_2^*(\dots))))$. This strategy removes the leader–follower dynamics, as well as any asymmetric indirect control, yielding a simultaneous choice game.

Nash Equilibrium.

The iterative process of exchanging and optimizing policies is also called iterative best response, a numerical method to compute a Nash equilibrium of the game defined by Eq. 4. A limitation is its iterative nature; optimizing may take an unacceptable number of steps. To make solving for the Nash equilibrium computationally tractable, we reformulate the interdependent optimization problems as a local single-level optimization using the Karush–Kuhn–Tucker conditions (14, 44). We solve the locally equivalent formulation, including all constraints, with state-of-the-art nonlinear optimizers. This preserves all safety constraints in the optimization, which is critical for guaranteeing safe operation and performance. Algorithm 1 provides an overview of our method. The Nash equilibrium yields a control law $\mathbf{u}_1^*$ for the AV as well as predicted actions $\mathbf{u}_{\neg 1}^*$ for all other agents $N$ time steps into the future. Based on learned reward functions and the maximum entropy model, Eq. 5, the predicted actions $\mathbf{u}_{\neg 1}^*$ are also maximum likelihood predictions. The Nash equilibrium is the predicted outcome of the driving social dilemma and mimics the negotiation process between agents.
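Iterative best response itself is easy to sketch on a toy problem. The example below is not the Karush–Kuhn–Tucker reformulation used in the paper: it is the plain back-and-forth iteration on a hypothetical two-agent game with scalar controls, a quadratic reward with a coupling cost, and SVO-weighted utilities. All reward terms, targets, and bounds are assumptions for illustration.

```python
import math

def reward(u_self, u_other, target, coupling=0.5):
    """Hypothetical reward: track own target, pay a cost when both controls are large."""
    return -(u_self - target) ** 2 - coupling * u_self * u_other

def utility(agent, u, phis, targets):
    """SVO-weighted utility of `agent` for the joint control vector u."""
    other = 1 - agent
    r_self = reward(u[agent], u[other], targets[agent])
    r_other = reward(u[other], u[agent], targets[other])
    return math.cos(phis[agent]) * r_self + math.sin(phis[agent]) * r_other

def best_response(agent, u, phis, targets, lo=-5.0, hi=5.0):
    """Maximize the agent's utility over its own control by ternary search.

    Valid here because the utility is concave in the agent's own control
    whenever cos(phi) > 0.
    """
    trial = list(u)

    def value(x):
        trial[agent] = x
        return utility(agent, trial, phis, targets)

    for _ in range(100):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if value(m1) < value(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

def iterative_best_response(phis, targets, iterations=50):
    """Alternate best responses until the controls settle at a fixed point."""
    u = [0.0, 0.0]
    for _ in range(iterations):
        for agent in (0, 1):
            u[agent] = best_response(agent, u, phis, targets)
    return u

both_egoistic = iterative_best_response([0.0, 0.0], [1.0, 2.0])
first_prosocial = iterative_best_response([math.pi / 4, 0.0], [1.0, 2.0])
```

In this toy game the fixed point for two egoistic agents is roughly (0.533, 1.867); making the first agent prosocial moves it to roughly (0, 2), so the prosocial agent gives way and the other agent reaches its target.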

Methods and Results

We implement our socially compliant driving algorithm in two ways: first to predict human driver behavior in highway merges, then in simulations of autonomous merging and turning scenarios. This section highlights illustrative examples of our results. We evaluate human driver predictions on the NGSIM dataset and examine highway on-ramp merges into congestion. We analyze a total of 92 unique merges from the dataset and discuss key results on a representative example. Incorporating SVO reduces errors in trajectory predictions of human drivers by up to 25%. For the AV simulations, we replicate this merging scenario and also present an unprotected left turn. Our simulations demonstrate how using SVO preferences assists the AV in choosing safe actions, adding nuanced behavior and cooperation with a single parameter.

Predicting Human Driving Behavior.

To validate our algorithm, we test its ability to predict human trajectories on highway on-ramp merges in the NGSIM dataset. We implement a noninteractive baseline algorithm, where each agent computes their optimal policy while modeling other agents as lane-keeping dynamic obstacles. Using the dataset and trajectory history, we compare the baseline prediction’s performance to the multiagent game-theoretic models with: 1) static egoistic SVO, equal to neglecting the SVO model; 2) best static SVO; and 3) estimated dynamic SVOs. The best static SVO corresponds to the best SVO estimate when holding it constant throughout the interaction. For different interactions, this may yield a different static SVO. Table 1 examines the relative position error between the true vehicle trajectory and our predictions. We find that incorporating the multiagent game-theoretic framework, but remaining egoistic, alone improves performance by 5%. Highlighting the importance of SVO, we see an 18% improvement over the baseline with static SVO and 25% with estimated dynamic SVO.

Table 1.

Trajectory prediction error

Prediction model	Baseline	Multiagent game-theoretic 1	Multiagent game-theoretic 2	Multiagent game-theoretic 3
SVO	n/a	Egoistic	Static best	Estimated
Relative MSE position	1.0	0.947	0.821	0.753

Relative position mean squared error (MSE) between predicted and actual trajectories, compared to a single-agent baseline. Our multiagent game-theoretic model reduces error, and the dynamic, estimated SVO performs best.
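The percentage improvements quoted in the text follow directly from the relative MSE values in Table 1. The short check below only restates that arithmetic; the dictionary keys are labels chosen for this example.

```python
# Relative MSE values from Table 1 (baseline normalized to 1.0).
relative_mse = {
    "egoistic": 0.947,
    "static best": 0.821,
    "estimated": 0.753,
}

def error_reduction_percent(mse, baseline=1.0):
    """Percentage reduction in MSE relative to the baseline."""
    return (1.0 - mse / baseline) * 100.0

reductions = {name: error_reduction_percent(mse) for name, mse in relative_mse.items()}
rounded = {name: round(value) for name, value in reductions.items()}
```

The unrounded reductions are about 5.3%, 17.9%, and 24.7%, which the text reports as 5%, 18%, and 25%.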

Fig. 2 shows a two-agent merge with car 1 (purple) merging into the lane with car 2 (green). We model the other cars in the dataset as obstacles for the planner. For a dynamic SVO prediction, we estimate SVO online from observed trajectories of the vehicles, then leverage SVO in predicting the trajectory. Fig. 2 shows SVO predictions and confidence bounds for both cars through the merge. Our SVO estimates help explain the interactions occurring: Initially, the first car’s SVO is egoistic while attempting to merge, but the second car is also egoistic and does not provide a sufficient gap to merge. A few seconds later, the second car drops back and increases the gap for merging, corresponding to a more prosocial estimated SVO. Once the first car has merged, the second car closes the gap, returning to an egoistic SVO.

Fig. 2.

(Upper) Snapshot of NGSIM dataset with active cars (purple and green) and obstacle cars (gray). Here, car 1 (purple) is attempting a merge and must interact with car 2 (green). The solid lines indicate the predicted trajectory from our algorithm. For SVO estimates at each frame, the blue represents the distribution, while the red line indicates our estimate. (Lower) The solid line indicates SVO estimate over time, with the shaded region representing the confidence bounds. Initially, car 2 does not cooperate with car 1 and does not allow it to merge. After a few seconds, car 2 becomes more prosocial, which corresponds to it “dropping back” and allowing the first car to merge.

The capability of estimating SVOs of humans by observing their motions allows us to investigate how SVO distributions in natural populations differ. Separating merging and nonmerging vehicles in the dataset, we find that merging cars are more likely to be competitive than nonmerging cars, as shown in the histogram of Fig. 1. This observation also withstands hypothesis testing with statistical significance (P < 0.002).

Autonomous Merging with SVO.

Employing the estimation techniques described above, we are able to measure the SVO preference of another agent in a simulated highway-merging scenario. Fig. 3 shows the AV’s (red) SVO estimates of another vehicle (blue) over time. At first, the vehicles have little interaction, and the observations of the driver’s SVO remain ambiguous, such that the estimate is inaccurate with high variance. As the AV approaches the end of its lane, both vehicles begin to interact, indicated in gray in the figure. During this time, the SVO estimate quickly converges to the true value, with high confidence. After the merge, the vehicles no longer interact, the variance of the SVO estimate increases, and the estimate drifts away from the true value. Note that estimating the characteristics of an interaction (e.g., SVO) is only possible if the interaction between agents is impactful.

Fig. 3.

(Left) Estimated distribution of SVO preference of blue car shown as polar histograms in SVO circles for premerge and during merge. The mean estimate is shown in red and the ground truth (altruistic) in black. (Right) SVO estimates over time with 1-σ uncertainty bounds. The area of strong interaction corresponds to the gray area on both sides.

Unprotected Left Turns.

In this scenario, the AV must make an unprotected left turn against numerous cars traveling in the oncoming direction. If the AV were in light traffic, it could be feasible for it to wait for all other oncoming cars to pass. However, in congested traffic, the intersection might never fully clear. Instead, the AV must predict when an oncoming car will yield, allowing the vehicle to safely make the turn. Fig. 4 shows our simulation, where the AV (red) attempts to turn across traffic. Two egoistic cars (blue) approach the intersection and do not yield for the AV, as predicted. An altruistic third car (magenta) yields for the AV by slowing down, such that the gap between itself and the blue car ahead of it increases. With this increased gap, the AV is able to safely make the turn, and the magenta car continues forward.

Fig. 4.

Unprotected left turn of an AV (red) with oncoming traffic. As the AV approaches the intersection, two egoistic cars (blue) continue and do not yield. A third altruistic car (magenta) yields by slowing down, allowing the AV to complete the turn in the gap.

Conclusions

We propose the use of SVO to measure, quantify, and predict the behavior of human drivers. We model the interactions between drivers as a dynamic game and present a computationally tractable way of finding its Nash equilibrium. Using SVO as our key factor in predicting human behavior, we present two likelihood functions to estimate the SVO of other drivers from observed trajectories. We validate our findings in simulation and on the NGSIM dataset, incorporating the human behavior into the AV planner, resulting in intelligent, socially aware maneuvers. We find that the multiagent Nash equilibrium, SVO, as well as its estimation improve predictions and prove essential assets for interactive driving. Our unified algorithm improves on human driver trajectory prediction by 25% over baseline models. For highway merges in the NGSIM data, we also find that the human drivers merging into traffic are consistently more competitive than the drivers yielding to the merging car. These insights can better inform AVs that currently struggle to make these maneuvers. The ability to estimate SVO distributions directly from observed motion instead of in laboratory conditions will prove impactful beyond autonomous driving. Overall, robotic and artificial intelligence applications where an autonomous system acts among humans will benefit from integrating SVO in their prediction and decision-making algorithms.

Data Availability Statement.

The NGSIM data are available at ref. 13.

Algorithm 1: Socially Compliant Autonomous Driving

1: $\mathbf{x}^0$ ← Update state observations of all agents
2: $\varphi_{\neg 1}$ ← Update SVO estimation of all agents
3: $\varphi_1$ ← Choose AV SVO
4: $\mathbf{u}^*$ ← Plan and predict for all agents, Eq. 4 (multiagent Nash equilibrium)
5: Execute AV’s optimal control $\mathbf{u}_1^*$
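The five steps of Algorithm 1 can be read as one pass of a receding-horizon loop. The skeleton below is only a structural sketch: every argument is a caller-supplied callable with a hypothetical name, and nothing here implements the estimator or the Nash solver.

```python
def control_step(observe, estimate_svo, choose_av_svo, solve_nash, execute):
    """One pass of the loop; index 0 in the returned controls is assumed to be the AV."""
    x0 = observe()                               # 1: state observations of all agents
    phi_others = estimate_svo(x0)                # 2: SVO estimates of the other agents
    phi_av = choose_av_svo(x0)                   # 3: SVO chosen for the AV
    u_star = solve_nash(x0, phi_av, phi_others)  # 4: plan and predict for all agents
    execute(u_star[0])                           # 5: apply only the AV's control
    return u_star

# Stub callables, just to show the data flow.
executed = []
plan = control_step(
    observe=lambda: [(0.0, 10.0), (50.0, 8.0)],
    estimate_svo=lambda x0: [0.0],
    choose_av_svo=lambda x0: 0.785,
    solve_nash=lambda x0, phi_av, phi_others: [0.5, -0.2],
    execute=executed.append,
)
```

Only the AV's own control is executed; the remaining entries of the plan serve as predictions of the other agents and are recomputed on the next pass.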
