Shaping (psychology)

Shaping is a conditioning paradigm used primarily in the experimental analysis of behavior. The method used is differential reinforcement of successive approximations. It was introduced by B. F. Skinner with pigeons and extended to dogs, dolphins, humans and other species. In shaping, the form of an existing response is gradually changed across successive trials towards a desired target behavior by reinforcing exact segments of behavior.

Skinner's explanation of shaping was this: "We first give the bird food when it turns slightly in the direction of the spot from any part of the cage. This increases the frequency of such behavior. We then withhold reinforcement until a slight movement is made toward the spot. This again alters the general distribution of behavior without producing a new unit. We continue by reinforcing positions successively closer to the spot, then by reinforcing only when the head is moved slightly forward, and finally only when the beak actually makes contact with the spot. ... The original probability of the response in its final form is very low; in some cases it may even be zero. In this way we can build complicated operants which would never appear in the repertoire of the organism otherwise. By reinforcing a series of successive approximations, we bring a rare response to a very high probability in a short time. ... The total act of turning toward the spot from any point in the box, walking toward it, raising the head, and striking the spot may seem to be a functionally coherent unit of behavior; but it is constructed by a continual process of differential reinforcement from undifferentiated behavior, just as the sculptor shapes his figure from a lump of clay."

Successive approximations
The successive approximations reinforced are increasingly close approximations of the target behavior set by the trainer. As training progresses, the trainer stops reinforcing the less accurate approximations. When a previously reinforced approximation is no longer reinforced, the learner typically goes through an extinction burst, performing many behaviors in an attempt to obtain reinforcement. The trainer picks one of those behaviors that is a closer approximation to the target behavior and reinforces it. The trainer repeats this process, with the successive approximations getting closer to the target response, until the learner achieves the intended behavior.

For example, to train a rat to press a lever, the following successive approximations might be applied:
 * 1) simply turning toward the lever will be reinforced
 * 2) only moving toward the lever will be reinforced
 * 3) only moving to within a specified distance from the lever will be reinforced
 * 4) only touching the lever with any part of the body, such as the nose, will be reinforced
 * 5) only touching the lever with a specified paw will be reinforced
 * 6) only depressing the lever partially with the specified paw will be reinforced
 * 7) only depressing the lever completely with the specified paw will be reinforced

The trainer starts by reinforcing all behaviors in the first category, here turning toward the lever. When the animal regularly performs that response (turning), the trainer restricts reinforcement to responses in the second category (moving toward), then the third, and so on, progressing to each more accurate approximation as the animal learns the one currently reinforced. Thus, the response gradually approximates the desired behavior until finally the target response (lever pressing) is established. At first the rat is not likely to press the lever; in the end it presses rapidly.
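The progression described above can be sketched as a small program. This is only an illustrative sketch, not a model of any published procedure: the class name `ShapingSchedule`, the `mastery_run` parameter (how many consecutive reinforced responses count as "regularly performs"), and the rule that a more accurate response also earns reinforcement are all assumptions made for the example.

```python
from dataclasses import dataclass

# Approximation steps from the rat example, ordered from least to most accurate.
STEPS = [
    "turn toward lever",
    "move toward lever",
    "move within a specified distance of lever",
    "touch lever with any part of the body",
    "touch lever with specified paw",
    "partially depress lever with specified paw",
    "completely depress lever with specified paw",
]


@dataclass
class ShapingSchedule:
    """Tracks which approximation currently earns reinforcement (hypothetical)."""

    mastery_run: int = 3  # assumed: consecutive reinforced responses needed to advance
    criterion: int = 0    # index into STEPS of the current criterion
    streak: int = 0       # current run of consecutive reinforced responses

    def observe(self, response: int) -> bool:
        """Return True if the response (an index into STEPS) is reinforced."""
        # Assumption: responses at or beyond the current criterion are reinforced.
        reinforced = response >= self.criterion
        self.streak = self.streak + 1 if reinforced else 0
        # Once the current approximation is reliable, tighten the criterion.
        if self.streak >= self.mastery_run and self.criterion < len(STEPS) - 1:
            self.criterion += 1
            self.streak = 0
        return reinforced


schedule = ShapingSchedule()
# The learner turns toward the lever four times, then moves toward it three times.
outcomes = [schedule.observe(r) for r in [0, 0, 0, 0, 1, 1, 1]]
print(outcomes)
print(STEPS[schedule.criterion])
```

In this run the fourth response (turning only) goes unreinforced, because the criterion has already advanced to "move toward lever"; that withheld reinforcement is the differential part of differential reinforcement.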

Shaping sometimes fails. An oft-cited example is an attempt by Marian and Keller Breland (students of B. F. Skinner) to shape a pig and a raccoon to deposit a coin in a piggy bank, using food as the reinforcer. Instead of learning to deposit the coin, the pig began to root it into the ground, and the raccoon "washed" the coins and rubbed them together. That is, the animals treated the coins the same way that they treated food items that they were preparing to eat, referred to as "food-getting" behaviors. The raccoon was able to learn to deposit one coin into the box to gain a food reward, but when the contingencies were changed so that two coins were required to gain the reward, it could not learn the new, more complex rule. After what could be characterized as expressions of frustration, the raccoon resorted to basic "food-getting" behaviors common to its species. These results have been interpreted as showing a limitation in the raccoon's cognitive capacity to conceive that two coins could be exchanged for food, irrespective of existing autoshaping contingencies. Since the Brelands' observations were reported, many other examples of untrained responses to natural stimuli have been described; in many contexts, the stimuli are called "sign stimuli", and the related behaviors are called "sign tracking".

Practical applications
Shaping is used in training operant responses in lab animals, and in applied behavior analysis to change human or animal behaviors considered to be maladaptive or dysfunctional. It can also be used to teach behaviors to learners who refuse to do the target behavior or struggle with achieving it. This procedure plays an important role in commercial animal training. Shaping assists in "discrimination", which is the ability to tell the difference between stimuli that are and are not reinforced, and in "generalization", which is the application of a response learned in one situation to a different but similar situation.

Shaping can also be used in rehabilitation settings. For example, training on parallel bars can serve as an approximation of walking with a walker. Shaping can also be used to teach patients to gradually increase the time between bathroom visits.

Autoshaping
Autoshaping (sometimes called sign tracking) is any of a variety of experimental procedures used to study classical conditioning. In autoshaping, in contrast to shaping, the reward comes irrespective of the behavior of the animal. In its simplest form, autoshaping is very similar to Pavlov's salivary conditioning procedure using dogs. In Pavlov's best-known procedure, a short audible tone reliably preceded the presentation of food to dogs. The dogs naturally, unconditionally, salivated (unconditioned response) to the food (unconditioned stimulus) given to them, but through learning, conditionally, came to salivate (conditioned response) to the tone (conditioned stimulus) that predicted food. In autoshaping, a light is reliably turned on shortly before animals are given food. The animals naturally, unconditionally, display consummatory reactions to the food given to them, but through learning, conditionally, come to direct those same consummatory actions at the conditioned stimulus that predicts food.

Autoshaping provides an interesting conundrum for B. F. Skinner's assertion that one must employ shaping as a method for teaching a pigeon to peck a key. After all, if an animal can shape itself, why use the laborious process of shaping? Autoshaping also appears to contradict Skinner's principle of reinforcement. During autoshaping, food comes irrespective of the behavior of the animal. If reinforcement were occurring, random behaviors should increase in frequency, because they would have been rewarded by food that arrives regardless of behavior. Nonetheless, key-pecking reliably develops in pigeons, even though this behavior has never been specifically rewarded.

The clearest evidence that autoshaping is under Pavlovian and not Skinnerian control was found using the omission procedure. In that procedure, food is normally scheduled for delivery following each presentation of a stimulus (often a flash of light), except in cases in which the animal actually performs a consummatory response to the stimulus, in which case food is withheld. If the behavior were under instrumental control, the animal would stop attempting to consume the stimulus, as that behavior is followed by the withholding of food. Instead, animals persist in attempting to consume the conditioned stimulus for thousands of trials (a phenomenon known as negative automaintenance), unable to cease their behavioral response to the conditioned stimulus even when it prevents them from obtaining a reward.
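The difference between the three procedures discussed here lies in how food delivery depends on the animal's response. A minimal sketch of those contingencies follows; the function name `food_delivered` and the string labels are hypothetical, chosen only for illustration, and the sketch ignores timing, trial structure, and learning.

```python
def food_delivered(procedure: str, responded: bool) -> bool:
    """Return whether food follows a trial under the given procedure.

    `responded` means the animal made the relevant response on that trial:
    the criterion response in shaping, or a consummatory response directed
    at the stimulus in autoshaping and in the omission procedure.
    """
    if procedure == "shaping":
        # Operant contingency: food only when the criterion response occurs.
        return responded
    if procedure == "autoshaping":
        # Food follows the stimulus irrespective of behavior.
        return True
    if procedure == "omission":
        # Food follows the stimulus unless the animal responds to it.
        return not responded
    raise ValueError(f"unknown procedure: {procedure}")


for procedure in ("shaping", "autoshaping", "omission"):
    print(procedure, food_delivered(procedure, True), food_delivered(procedure, False))
```

Under the omission contingency, responding can only lose food, so continued responding (negative automaintenance) is hard to explain by instrumental reinforcement alone.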