Dynamic decision-making

Dynamic decision-making (DDM) is interdependent decision-making that takes place in an environment that changes over time, either because of the decision maker's previous actions or because of events outside the decision maker's control. Dynamic decisions, unlike simple, conventional one-time decisions, are typically more complex and occur in real time. Research on them examines the extent to which people are able to use their experience to control a particular complex system, including the types of experience that lead to better decisions over time.

Overview
Dynamic decision-making research uses computer simulations that serve as laboratory analogues for real-life situations. These simulations, also called "microworlds", are used to examine people's behavior in simulated real-world settings in which they typically try to control a complex system and later decisions are affected by earlier ones. The following features differentiate DDM research from more classical forms of decision-making research:


 * A series of decisions is used to reach a goal, rather than a single decision
 * Decisions depend on previous decisions, rather than being independent of them
 * The environment is dynamic and changing, rather than static and fixed
 * Decisions are made in real time, rather than without time pressure

In addition, using microworlds as a tool to investigate DDM gives researchers experimental control and sets the comparatively recent DDM field apart from the much older classical decision-making research.

Examples of dynamic decision-making situations include managing climate change, factory production and inventory, air traffic control, firefighting, driving a car, and military command and control on a battlefield. Research in DDM has focused on investigating the extent to which decision makers use their experience to control a particular system; the factors that underlie the acquisition and use of experience in making decisions; and the types of experience that lead to better decisions in dynamic tasks.

Characteristics of dynamic decision-making environments
The primary characteristics of dynamic decision environments are dynamics, complexity, opaqueness, and dynamic complexity. The dynamics of an environment refers to the dependence of the system's state on its state at an earlier time. Dynamics in the system can be driven by positive feedback (self-amplifying loops) or negative feedback (self-correcting loops). The accrual of interest in a savings account is an example of the former, and the easing of hunger by eating is an example of the latter.
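
The two kinds of feedback loop can be illustrated with a minimal sketch. The function names, the 5% interest rate, and the 0 to 100 hunger scale are assumptions chosen for illustration only; in both cases the next state is computed from the previous state, which is what makes the system dynamic.

```python
def simulate(state, step, periods):
    """Return the trajectory of a state that depends on its previous value."""
    trajectory = [state]
    for _ in range(periods):
        state = step(state)
        trajectory.append(state)
    return trajectory


def accrue_interest(balance, rate=0.05):
    # Positive (self-amplifying) feedback: a larger balance earns more
    # interest, which makes the balance larger still. The rate is assumed.
    return balance + rate * balance


def eat(hunger, fraction=0.5):
    # Negative (self-correcting) feedback: each meal removes an assumed
    # fraction of the remaining hunger, so the correction shrinks over time.
    return hunger - fraction * hunger


savings = simulate(100.0, accrue_interest, 3)  # grows by more each period
hunger = simulate(80.0, eat, 3)                # [80.0, 40.0, 20.0, 10.0]
```

The savings balance moves away from its starting point at an increasing pace, while hunger converges toward zero.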

Complexity largely refers to the number of interacting or interconnected elements within a system, which can make the system's behavior difficult to predict. The definition remains problematic, however, because systems can vary in the number of components, the number of relationships between them, and the nature of those relationships. Complexity may also be a function of the decision maker's ability.

Opaqueness refers to the physical invisibility of some aspects of a dynamic system and it might also be dependent upon a decision maker's ability to acquire knowledge of the components of the system.

Dynamic complexity refers to the decision maker's ability to control the system using the feedback received from it. Diehl and Sterman have further broken down dynamic complexity into three components: the opaqueness present in the system, which might cause unintended side effects; non-linear relationships between components of the system; and feedback delays between actions taken and their outcomes. The dynamic complexity of a system might eventually make it hard for decision makers to understand and control the system.

Microworlds in DDM research
A microworld is a complex simulation used in controlled experiments designed to study dynamic decision-making. Research in dynamic decision-making is mostly laboratory-based and uses computer simulation microworld tools (i.e., Decision Making Games, DMGames). The microworlds are also known by other names, including synthetic task environments, high fidelity simulations, interactive learning environments, virtual environments, and scaled worlds. Microworlds become the laboratory analogues for real-life situations and help DDM investigators to study decision-making by compressing time and space while maintaining experimental control.

DMGames compress the most important elements of the real-world problems they represent and are important tools for collecting human actions. They have helped investigate a variety of factors, such as cognitive ability, type of feedback, timing of feedback, strategies used while making decisions, and knowledge acquisition while performing DDM tasks. However, even though DMGames aim to represent the essential elements of real-world systems, they differ from the real-world task in various respects. Stakes might be higher in real-life tasks, and the expertise of the decision maker has often been acquired over many years rather than over minutes, hours, or days as in DDM tasks. Thus, DDM differs in many respects from naturalistic decision-making (NDM).

In DDM tasks, people have been shown to perform below optimal levels, where an optimum can be ascertained at all. For example, in a forest firefighting simulation game, participants frequently allowed their headquarters to be burned down. In similar DDM studies, participants acting as doctors in an emergency room allowed their patients to die while they waited for the results of tests that were actually non-diagnostic. An interesting insight into decisions from experience in DDM is that the learning is mostly implicit: although people's performance improves with repeated trials, they are unable to verbalize the strategy they followed to achieve it.

Theories of learning in dynamic decision making tasks
Learning forms an integral part of DDM research. One of the main research activities in DDM has been to use microworld simulation tools to investigate the extent to which people are able to learn to control a particular simulated system, and to identify the factors that might explain learning in DDM tasks.

Strategy-based learning theory
One theory of learning relies on the use of strategies or rules of action that relate to a particular task. These rules specify the conditions under which a certain rule or strategy will apply, and take the form: if you recognize situation S, then carry out action or strategy A. For example, Anzai implemented a set of production rules or strategies that performed the DDM task of steering a ship through a certain set of gates. The Anzai strategies mimicked the performance of human participants on the task reasonably well. Similarly, Lovett and Anderson have shown how people use production rules or strategies of the if-then type in the building-sticks task, which is an isomorph of Luchins' water jug problem. The goal in the building-sticks task is to construct a stick of a particular desired length given three stick lengths from which to build (there is an unlimited supply of sticks of each length). There are basically two strategies for solving this problem. The undershoot strategy is to take smaller sticks and build up to the target stick. The overshoot strategy is to take a stick longer than the goal and cut off pieces equal in length to a smaller stick until the target length is reached. Lovett and Anderson arranged it so that only one strategy would work for a particular problem and gave subjects problems where one of the two strategies worked on a majority of the problems (counterbalancing across subjects which strategy was the more successful).
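
The two building-sticks strategies can be written as simple if-then rules. The sketch below is an illustration only, not Lovett and Anderson's model: the greedy choice of the largest stick that fits, the rule table `RULES`, and the example stick lengths are all assumptions.

```python
def undershoot(sticks, target):
    """Start from a shorter stick and add sticks; True if the target is hit."""
    shorter = [s for s in sticks if s < target]
    if not shorter:
        return False
    current = max(shorter)
    while current < target:
        fitting = [s for s in sticks if current + s <= target]
        if not fitting:
            return False  # stuck below the target
        current += max(fitting)
    return True


def overshoot(sticks, target):
    """Start from a longer stick and cut pieces off; True if the target is hit."""
    longer = [s for s in sticks if s > target]
    if not longer:
        return False
    current = min(longer)
    while current > target:
        fitting = [s for s in sticks if current - s >= target]
        if not fitting:
            return False  # stuck above the target
        current -= max(fitting)
    return True


# Production rules: (name, condition on the situation, action to carry out).
RULES = [
    ("undershoot", lambda sticks, target: any(s < target for s in sticks), undershoot),
    ("overshoot", lambda sticks, target: any(s > target for s in sticks), overshoot),
]


def solve(sticks, target):
    """Return the names of the rules that apply and also reach the target."""
    return [name for name, applies, act in RULES
            if applies(sticks, target) and act(sticks, target)]


print(solve([7, 20, 50], 36))  # ['overshoot']  (50 - 7 - 7 = 36)
print(solve([7, 20, 50], 27))  # ['undershoot'] (20 + 7 = 27)
```

With these stick lengths each problem is solved by only one of the two strategies, which mirrors the way the experimental problems were arranged.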

Connectionism learning theory
Other researchers have suggested that learning in DDM tasks can be explained by a connectionist theory, or connectionism. In this account, knowledge resides in the connections between units, whose strength or weighting depends upon previous experience. Thus, the output of a given unit depends upon the output of the previous unit weighted by the strength of the connection. As an example, Gibson et al. have shown that a connectionist neural network model does a good job of explaining human behavior in Berry and Broadbent's Sugar Production Factory task.
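
The idea that a unit's output is a weighted function of earlier units, with weights shaped by experience, can be sketched as follows. This is a generic single-unit example using the delta rule as an assumed learning rule; it is not the network used by Gibson et al., and the input values, target, and learning rate are illustrative.

```python
def unit_output(inputs, weights):
    """Output of a unit: the previous units' outputs weighted by connection strength."""
    return sum(x * w for x, w in zip(inputs, weights))


def delta_update(inputs, weights, target, learning_rate=0.1):
    """Adjust each connection in proportion to the error it contributed to."""
    error = target - unit_output(inputs, weights)
    return [w + learning_rate * error * x for x, w in zip(inputs, weights)]


inputs = [1.0, 0.5]    # outputs of two upstream units (assumed values)
weights = [0.0, 0.0]   # no experience yet
for _ in range(50):    # repeated experience with the same situation
    weights = delta_update(inputs, weights, target=1.0)

# After training, the unit's output is close to the target of 1.0.
print(round(unit_output(inputs, weights), 2))
```

Nothing in the unit stores an explicit rule; what has been learned is carried entirely by the connection weights.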

Instance-based learning theory
The Instance-Based Learning Theory (IBLT) is a theory of how humans make decisions in dynamic tasks developed by Cleotilde Gonzalez, Christian Lebiere, and Javier Lerch. The theory has been extended to two different paradigms of dynamic tasks, called sampling and repeated-choice, by Cleotilde Gonzalez and Varun Dutt. Gonzalez and Dutt have shown that in these dynamic tasks, IBLT provides the best explanation of human behavior and performs better than many other competing models and approaches. According to IBLT, individuals rely on their accumulated experience to make decisions by retrieving past solutions to similar situations stored in memory. Thus, decision accuracy can only improve gradually and through interaction with similar situations.

IBLT assumes that specific instances (experiences or exemplars) are stored in memory. These instances have a concrete structure defined by three distinct parts: the situation, the decision, and the utility (SDU):
 * Situation refers to the environment's cues
 * Decision refers to decision maker's actions applicable to a particular situation
 * Utility refers to the correctness of a particular decision in that situation, either the expected utility (before making a decision) or the experienced utility (after feedback on the outcome of the decision has been received)
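
The SDU structure described above maps naturally onto a small record type. The sketch below is one possible representation; the class name `Instance`, the field names, and the example values are assumptions, not part of any published IBLT implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instance:
    situation: Tuple[float, ...]                 # the environment's cues
    decision: str                                # the action taken in that situation
    expected_utility: float                      # estimate held before the decision
    experienced_utility: Optional[float] = None  # filled in once feedback arrives

    @property
    def utility(self) -> float:
        """Experienced utility if feedback has been received, else expected."""
        if self.experienced_utility is not None:
            return self.experienced_utility
        return self.expected_utility


inst = Instance(situation=(0.8, 0.2), decision="open_valve", expected_utility=5.0)
before = inst.utility            # 5.0, the expected utility
inst.experienced_utility = 3.0   # outcome feedback is received
after = inst.utility             # 3.0, the experienced utility
```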

In addition to the predefined structure of an instance, IBLT relies on a global, high-level decision-making process consisting of five stages: recognition, judgment, choice, execution, and feedback. When people face a particular situation in the environment, they are likely to retrieve similar instances from memory to make a decision. In atypical situations (those that are not similar to anything encountered in the past), retrieval from memory is not possible and people need to use a heuristic (which does not rely on memory) to make a decision. In situations that are typical and where instances can be retrieved, the utility of the similar instances is evaluated until a necessity level is crossed.

Necessity is typically determined by the decision maker's “aspiration level,” similar to Simon and March's satisficing strategy. But the necessity level might also be determined by external environmental factors like time constraints (as in the medical domain with doctors in an emergency room treating patients in a time critical situation). Once that necessity level is crossed, the decision involving the instance with the highest utility is made. The outcome of the decision, when received, is then used to update the utility of the instance that was used to make the decision in the first place (from expected to experienced). This generic decision making process is assumed to apply to any dynamic decision making situation, when decisions are made from experience.
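
The stages described above can be sketched as a short loop. This is a simplified illustration, not the ACT-R based computational model: the similarity measure, the retrieval threshold, the random-choice heuristic, and the names `decide` and `give_feedback` are all assumptions.

```python
import random


def similarity(a, b):
    """Assumed similarity measure: 1 / (1 + Euclidean distance between cues)."""
    distance = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + distance)


def decide(memory, situation, options, necessity, threshold=0.8, rng=random):
    # Recognition: retrieve instances whose situation resembles the current one.
    similar = [m for m in memory if similarity(m["situation"], situation) >= threshold]
    if similar:
        # Judgment: evaluate instances until the best utility crosses necessity.
        best = similar[0]
        for instance in similar:
            if instance["utility"] > best["utility"]:
                best = instance
            if best["utility"] >= necessity:
                break
        # Choice: take the decision of the best instance evaluated so far.
        decision, expected = best["decision"], best["utility"]
    else:
        # Atypical situation: fall back on a heuristic (random choice assumed).
        decision, expected = rng.choice(options), 0.0
    new_instance = {"situation": situation, "decision": decision, "utility": expected}
    memory.append(new_instance)
    return new_instance


def give_feedback(instance, outcome):
    # Feedback: replace the expected utility with the experienced one.
    instance["utility"] = outcome


memory = [
    {"situation": (1.0, 0.0), "decision": "wait", "utility": 2.0},
    {"situation": (1.0, 0.1), "decision": "treat", "utility": 9.0},
]
choice = decide(memory, (1.0, 0.05), ["wait", "treat"], necessity=5.0)
give_feedback(choice, 7.0)  # choice["decision"] is "treat"; utility is now 7.0
```

A lower `necessity` value makes the search stop earlier, which corresponds to satisficing under a modest aspiration level or under time pressure.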

The computational representation of IBLT relies on several learning mechanisms proposed by a generic theory of cognition, ACT-R. Many decision tasks have now been implemented in IBLT, and these implementations reproduce and explain human behavior accurately.

Feedback in dynamic decision-making tasks
Although feedback interventions have been found to benefit performance on DDM tasks, outcome feedback alone has been shown to work for tasks that are simple, require lower cognitive abilities, and are repeatedly practiced. For example, IBLT suggests that in DDM situations, learning from outcome feedback alone is slow and generally ineffective.

Effects of feedback delays in DDM tasks
The presence of feedback delays in DDM tasks, and participants' misperception of them, contributes to less than optimal performance. The delay between the decision maker's actions and the resulting outcome from the dynamic system makes it harder for people to understand the relationships that govern the system dynamics of the task.

A familiar example of the effect of feedback delays is the Beer Distribution Game (or Beer Game). A time delay is built into the game between a role placing an order and receiving the ordered cases of beer. If a role runs out of beer (i.e., is unable to satisfy a customer's current demand for beer cases), there is a fine of $1 per case. This might lead people to overstock beer to satisfy any future unanticipated demand. Contrary to economic theory, which predicts a long-term stable equilibrium, results show people ordering too much. This happens because the time delay between placing an order and receiving inventory makes people think that the inventory is running out as new orders come in, so they react and place larger orders. Once they have built up the inventory and the earlier orders arrive, they drastically cut future orders, which leads the simulated beer industry to experience oscillating patterns of over-ordering and under-ordering, that is, costly cycles of boom and bust.
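
The over-ordering mechanism can be reproduced with a minimal sketch. It models a single role rather than the full multi-role game, and the function `run`, the three-week delay, the inventory target, and the step in demand from 4 to 8 cases are all assumptions. With `count_pipeline=False` the role orders from its current inventory gap alone, ignoring cases already ordered but not yet delivered.

```python
from collections import deque


def run(weeks=20, delay=3, target=20, count_pipeline=False):
    """Return weekly inventory for one role; negative values mean backlog."""
    inventory = target
    pipeline = deque([4] * delay)  # orders already in transit
    history = []
    for week in range(weeks):
        demand = 4 if week < 4 else 8             # assumed step increase in demand
        inventory += pipeline.popleft() - demand  # receive delivery, ship demand
        gap = target - inventory
        if count_pipeline:
            # Subtract what is on order beyond the normal supply line.
            gap -= sum(pipeline) - (delay - 1) * demand
        pipeline.append(max(0, demand + gap))     # place this week's order
        history.append(inventory)
    return history


naive = run()                      # ignores orders in transit
aware = run(count_pipeline=True)   # accounts for orders in transit
print(max(naive), min(naive))  # 40 4
print(max(aware), min(aware))  # 20 8
```

Under these assumptions the role that ignores the supply line swings between roughly double its target and a near stock-out, while the role that accounts for it returns to the target and stays there.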

Similar effects of feedback delay have been reported in a firefighting game called NEWFIRE, where task complexity and the delay between the firefighters' actions and their outcomes led participants to frequently allow their headquarters to be burned down.

Effects of proportional thinking in DDM tasks
Growing evidence in DDM indicates that adults share a robust problem in understanding some of the basic building blocks of simple dynamic systems, including stocks, inflows, and outflows. Many adults fail to apply a basic principle of dynamics: a stock (or accumulation) rises (or falls) when the inflow exceeds (or is less than) the outflow. This problem, termed stock-flow failure (SF failure), has been shown to persist even in simple tasks, with well-motivated participants, in familiar contexts, and with simplified information displays. The belief that the stock behaves like the flows is a common but wrong heuristic (named the "correlation heuristic") that people often use when judging non-linear systems. The use of the correlation heuristic, or proportional reasoning, is widespread across different domains and has been found to be a robust problem in both school children and educated adults (Cronin et al., 2009; Larrick & Soll, 2008; De Bock, 2002; Greer, 1993; Van Dooren et al., 2005; Van Dooren et al., 2006; Verschaffel et al., 1994).
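
The stock-flow principle is easy to state as an accumulation. In the sketch below the function name and the flow values are illustrative assumptions. The inflow falls in every period, yet the stock keeps rising for as long as the inflow exceeds the outflow, which is exactly what the correlation heuristic gets wrong.

```python
def accumulate(initial_stock, inflows, outflows):
    """Return the stock level after each period of inflow and outflow."""
    stock = initial_stock
    levels = [stock]
    for inflow, outflow in zip(inflows, outflows):
        stock += inflow - outflow  # net flow changes the stock
        levels.append(stock)
    return levels


inflows = [10, 8, 6, 4, 2]   # steadily falling inflow
outflows = [4, 4, 4, 4, 4]   # constant outflow
levels = accumulate(100, inflows, outflows)
print(levels)  # [100, 106, 110, 112, 112, 110]
```

The stock peaks when the inflow equals the outflow, not when the inflow itself is at its highest.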

Individual differences in DDM
Individual performance on DDM tasks varies considerably, which might result from differences in the skills and cognitive abilities of the individuals who interact with the tasks. Although individual differences exist and are often observed on DDM tasks, there has been debate over whether they arise from differences in cognitive abilities. Some studies have failed to find evidence of a link between cognitive abilities, as measured by intelligence tests, and performance on DDM tasks. Later studies contend, however, that this failure is due to the absence of reliable performance measures on DDM tasks.

Other studies have suggested a relationship between workload and cognitive abilities. It was found that low ability participants are generally outperformed by high ability participants. Under demanding conditions of workload, low ability participants do not show improvement in performance in either training or test trials. Evidence shows that low ability participants use more heuristics particularly when the task demands faster trials or time pressure and this happens both during training and test conditions.

DDM in the real world
Alongside the use of laboratory microworld tools to investigate decision making, there has been a recent emphasis in DDM research on decision making in the real world. This does not discount laboratory research but reflects the broad conception of the research underlying DDM. Work on DDM in the real world is more concerned with processes such as goal setting, planning, perception and attention, forecasting, comprehension, and attending to feedback, among many others. The study of these processes brings DDM research closer to research on situation awareness and expertise.

For example, DDM research has shown that motorists with more than 10 years of driving experience respond to hazards faster than drivers with less than three years of experience. Owing to their greater experience, such motorists also tend to search for hazard cues more effectively and efficiently than their less experienced counterparts. One explanation is that situation awareness in DDM tasks makes certain behaviors automatic for people with expertise. On this view, the search for cues in the environment that could signal hazards might be an automatic process for experienced motorists, whereas novice motorists, lacking situation awareness, must make a conscious, non-automatic effort to find such cues and are therefore more prone to missing hazards altogether. This behavior has also been documented for pilots and platoon commanders. Studies of novice and experienced platoon commanders in a virtual reality battle simulator have shown that more experience was associated with higher perceptual and comprehension skills. Thus, experience on different DDM tasks makes a decision maker more situationally aware, with higher levels of perceptual and comprehension skills.