Artificial neural circuits for conditioned learning[1]

Hints from Life to AI, edited by Ugur HALICI, METU, 1994

The concepts of classical conditioning can be used in designing neural networks for association and expectation learning, and for behavioural conditioning in artificial systems. Classical conditioning concepts such as excitatory conditioning, inhibitory conditioning, secondary conditioning, opponent extinction and habituation have been modelled by the Recurrent Associative Dipole (READ). However, this circuit does not satisfy the experimental data on extinction in the case of the nonoccurrence of an expected event. In this work, a brief overview of the basic concepts in conditioned learning is given, the operation of the READ circuit is described, and the circuit is modified so that it models extinction as well. The changes in the performance of the READ circuit introduced by this modification are then explored.

An intelligent system using neural networks would include, among other things, networks for visual perception, pattern learning and object recognition, association learning, expectation learning, emotional states and behavioural actions [1]. Such neural networks can be incorporated in a robot, or in adaptive systems in any practical field.

The scope of this paper is only the parts related to association and expectation learning, and behavioural conditioning. Therefore, it is assumed that the objects that act as stimuli in these types of learning have already been recognized, and are presented to the network at the conceptual level rather than as patterns.

The neural network models described here aim to use the concepts of classical conditioning in animals to design neural networks that can learn associations between objects (stimuli) and between objects and responses. Some associations between stimuli and responses seem to be inborn in animals, such as the association between the sight of food and salivation for a dog; likewise, some vital stimulus-response pairs can be hard-wired or initially set by software in artificial systems. These initial associations can then be used to form new associations between stimuli and between stimuli and responses. Some classical conditioning concepts will be defined in Section 2.

The Recurrent Associative Dipole (READ) [3] is a neural circuit that models some of the classical conditioning concepts that will be described in Section 2. A number of these circuits are used as part of the cognitive system of a mobile robot called MAVIN [1], developed at the MIT Lincoln Laboratory. The operation of the READ circuit will be summarized in Section 3.

The operation of the READ circuit does not conform to the psychological data on extinction in the case of unconfirmed expectations. Yet the extinction of associations allows the system to learn new associations after forgetting ones that are no longer valid, and also prevents it from learning further associations based on those that are not valid. We have made a modification to the READ circuit and to the differential equations defining it, in order to handle the concept of extinction. In Section 4, these modifications are described, and simulation results of the READ circuit and its modified version are given.

This section describes the classical conditioning phenomena that the circuits are intended to model.

Classical or Pavlovian conditioning is the type of learning in Pavlov's well known experiment in which a dog is repeatedly presented with the sound of a tuning fork before being given food, and learns to salivate at the sound of the tuning fork alone [2]. The basic terms of conditioned learning are as follows.

US: The unconditioned stimulus which triggers a response without prior training. In Pavlov's experiment, the US is food.

UR: The unconditioned response triggered by the US; in Pavlov's experiment, the dog's salivation.

CS: The conditioned stimulus which comes to trigger a response by being repeatedly paired with the US; in Pavlov's experiment, the sound of a tuning fork.

CR: The conditioned response which arises at the occurrence of the CS, after the CS has been repeatedly paired with the US. The CR is not necessarily identical to the UR; it may vary in amplitude and latency. In Pavlov's experiment, the CR is salivation at the sound of the tuning fork alone.

The occurrence of the CR indicates that an association has been formed between the CS and the US. This association learning is meaningful in itself, regardless of the relationship between the UR and the CR [2].

In this subsection, some basic concepts of classical or Pavlovian conditioning are defined.

The type of conditioning in which the CS is conditioned to the response related to the US, as in Pavlov's experiment, is called excitatory conditioning. In this type, the CS is presented first, and the US is then presented in the presence of the CS. This pairing is repeated a number of times.

In inhibitory conditioning, the CS is repeatedly presented after the offset of the US, and is thus conditioned to the response related to this offset.

The schedules for excitatory and inhibitory conditioning are demonstrated in Figures 1(a) and (b) respectively.
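The two schedules can be written down as simple on/off time series. The following is a minimal sketch: the function name `conditioning_trial`, the trial length and the stimulus durations are illustrative assumptions, not values taken from Figure 1.

```python
def conditioning_trial(kind, steps=100, start=10, cs_len=40, us_len=20):
    """Return (cs, us) as 0/1 lists for one trial.

    All durations are hypothetical defaults chosen for illustration only.
    """
    if kind == "excitatory":
        # CS comes first; the US arrives while the CS is still present.
        cs_on = start
        us_on = cs_on + cs_len - us_len
    elif kind == "inhibitory":
        # US comes first; the CS is presented from the offset of the US.
        us_on = start
        cs_on = us_on + us_len
    else:
        raise ValueError("kind must be 'excitatory' or 'inhibitory'")
    cs = [1 if cs_on <= t < cs_on + cs_len else 0 for t in range(steps)]
    us = [1 if us_on <= t < us_on + us_len else 0 for t in range(steps)]
    return cs, us


if __name__ == "__main__":
    for kind in ("excitatory", "inhibitory"):
        cs, us = conditioning_trial(kind)
        print(kind, "CS onset:", cs.index(1), "US onset:", us.index(1))
```

Repeating such a trial a number of times gives the pairing procedure described above.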

The conditioning of a CS by repeated pairing with a US is called primary conditioning. Once a conditioned stimulus CS₁ has been conditioned by using a US, it can in turn be used in conditioning another stimulus CS₂. In Pavlov's experiment, for instance, after the sound has been paired with food a sufficient number of times, repeatedly pairing a light with the sound will cause the dog to salivate at the sight of the light, even when food is not used as a US. This phenomenon is called secondary conditioning.

Schematic diagrams of secondary excitatory conditioning and secondary inhibitory conditioning are shown in Figure 1(c) and (d) respectively.

Figure 1: Some classical conditioning schedules: (a) primary excitatory conditioning, (b) primary inhibitory conditioning, (c) secondary excitatory conditioning, (d) secondary inhibitory conditioning

Assume that a stimulus CS₁ is repeatedly paired with a US until conditioning takes place, and that CS₁ and a neutral (not yet conditioned) stimulus CS₂ are then presented together and paired with the US. When CS₂ is presented alone afterwards, it does not elicit the CR. This phenomenon, in which a previously conditioned stimulus prevents the conditioning of a neutral stimulus, is called blocking. The experimental stages of blocking [2] are thus: CS₁ is paired with the US; the compound CS₁+CS₂ is paired with the US; CS₂ is tested alone and produces no CR.

Assume that stimuli CS₁ and CS₂, both initially neutral, are presented together and paired with the US. Conditioning may occur for the compound stimulus CS₁+CS₂ and for one of the stimuli, say CS₁, but not for CS₂. This situation may arise because CS₁ is a more salient stimulus and therefore receives more attention than CS₂. This phenomenon is called overshadowing. The stages that demonstrate overshadowing [2] are thus: the compound CS₁+CS₂ is paired with the US; CS₁ tested alone produces the CR; CS₂ tested alone does not.
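Blocking and overshadowing can be illustrated numerically with the Rescorla-Wagner rule. This is a different and much simpler model than the READ circuit treated in this paper, and is used here only as a stand-in; the learning rate, the salience values and the trial counts are assumptions chosen for the sketch.

```python
def rescorla_wagner(trials, saliences, lr=0.3, lam=1.0):
    """Trial-level Rescorla-Wagner update (a stand-in model, not READ).

    trials    : list of (tuple_of_present_cues, us_present) pairs
    saliences : dict mapping cue name to salience (assumed values)
    """
    V = {cue: 0.0 for cue in saliences}
    for present, us in trials:
        # Shared prediction error: US strength minus summed prediction.
        error = (lam if us else 0.0) - sum(V[c] for c in present)
        for c in present:
            V[c] += lr * saliences[c] * error
    return V


equal = {"CS1": 1.0, "CS2": 1.0}

# Blocking: CS1 alone with the US, then the compound CS1+CS2 with the US.
blocking = rescorla_wagner(
    [(("CS1",), True)] * 30 + [(("CS1", "CS2"), True)] * 30, equal)

# Control: compound trials only, so CS2 is free to acquire strength.
control = rescorla_wagner([(("CS1", "CS2"), True)] * 30, equal)

# Overshadowing: compound trials only, but CS1 is far more salient.
overshadow = rescorla_wagner(
    [(("CS1", "CS2"), True)] * 30, {"CS1": 1.0, "CS2": 0.2})

if __name__ == "__main__":
    print("blocking   CS2:", round(blocking["CS2"], 4))
    print("control    CS2:", round(control["CS2"], 4))
    print("overshadow CS1, CS2:",
          round(overshadow["CS1"], 3), round(overshadow["CS2"], 3))
```

In this stand-in model both effects follow from the shared error term: once CS₁ already predicts the US, or absorbs most of each update because of its salience, little error is left for CS₂ to learn from.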

For conditioning to take place, the CS must be presented before the US, as shown in Figure 1(a). Otherwise, the US will attract more attention because of its relevance to drives and responses in the system, and will in a way overshadow the CS.

The time interval between the presentation of the CS and that of the US is called the interstimulus interval (ISI). The synchronization problem in conditioning arises from the requirement that, although the ISI varies across the repetitions of an experiment, the CS should become associated only with the US and not with a mixture of the US and the noise in the environment.

In animals, learned responses are dropped if they are not reinforced [2]. For example, if the dog in Pavlov's experiment, after being conditioned with the sound-food pair, is repeatedly presented with the sound but no food appears, it will salivate less and less, and eventually it will not salivate at all in response to the sound. This phenomenon is called extinction. Figure 2 gives a rough curve representing the cumulative number of responses during the extinction process. This curve has been obtained by using the well known Skinner box, in which a rat or a pigeon receives a piece of food each time it presses a lever. If no more food is dropped after the lever press, the rat gives this response less and less frequently until it does not press the lever at all. The curve in Figure 2 is the total number of lever presses in the course of an hour after the reward of food is removed [2].

Figure 2: Rough extinction curve. Total number of responses versus time during the period in which the CS is not followed by the US, i.e., the expected event does not occur.

In this section, the Recurrent Associative Dipole [3], which models most of the concepts defined in Section 2, will be described, and the simulation results for excitatory and inhibitory conditioning will be given.

The Recurrent Associative Dipole is a neural circuit that consists of two channels: one related to the on-response of a particular stimulus US, and the other to the off-response of the US in question. The on-channel and off-channel of the READ circuit in Figure 3 are the columns on the left and the right respectively.

Figure 3: The Recurrent Associative Dipole (READ) circuit and the response of each node to the J input shown on the bottom left

Both channels have modifiable synapses with nodes pertaining to other stimuli (CS's). The READ circuit models primary and secondary excitatory and inhibitory conditioning, opponent extinction and habituation. The differential equations defining the operation of the circuit are given below:

Primary excitatory conditioning takes place when a CS is presented at a node S_k, and the US pertaining to the READ circuit is presented at x₁ in the order shown in Figure 1(a), causing an increase in the weight w_k7 according to equation 11. The squares adjacent to the nodes x₃ and x₄ indicate that these have synapses with habituating and recovering transmitters. For example, w₁ decreases when x₁ is active and recovers when it is not, according to equation 3. This causes the activations x₃ and x₅ to decay as shown in Figure 3. Since the bias input I is equal for both channels, the habituation of w₁ results in a rebound in the off-channel after the offset of the US. If the CS is presented during this rebound, the weight w_k8 increases according to equation 12, and the CS is conditioned to the off-response, i.e., primary inhibitory conditioning takes place.
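The habituation of the on-response and the off-rebound at US offset can be reproduced with a strongly reduced, feedforward dipole. The sketch below is not the READ circuit: it drops the feedback path and the intermediate nodes, takes the channel activations at their steady-state values, and assumes a transmitter law of the form dw/dt = B(1 - w) - C·x·w. The function name and all parameter values are illustrative assumptions.

```python
def simulate_dipole(I=1.0, J=1.0, B=0.05, C=0.05,
                    us_on=100, us_off=300, steps=600, dt=1.0):
    """Feedforward gated dipole with habituating transmitters (toy model).

    I is the bias input shared by both channels, J the US input to the
    on-channel. Returns the rectified on- and off-responses over time.
    """
    # Start both transmitters at their equilibrium under the bias alone.
    w1 = w2 = B / (B + C * I)
    on, off = [], []
    for t in range(steps):
        j = J if us_on <= t < us_off else 0.0
        x1 = I + j          # on-channel input: bias plus US
        x2 = I              # off-channel input: bias only
        # Transmitters deplete with activity and recover towards 1.
        w1 += dt * (B * (1.0 - w1) - C * x1 * w1)
        w2 += dt * (B * (1.0 - w2) - C * x2 * w2)
        # Opponent competition between the two gated signals.
        d = x1 * w1 - x2 * w2
        on.append(max(d, 0.0))
        off.append(max(-d, 0.0))
    return on, off


if __name__ == "__main__":
    on, off = simulate_dipole()
    print("on-response at US onset :", round(on[100], 3))
    print("on-response before offset:", round(on[299], 3))
    print("off-rebound after offset :", round(off[300], 3))
```

During the US the on-response starts high and decays as w1 habituates. At US offset both channels receive only the bias I, but w1 is still depleted relative to w2, so the off-channel wins transiently until w1 recovers.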

After conditioning, the feedback path from x₇ (x₈) to x₁ (x₂) allows the CS alone to activate x₁ (x₂) and thus generate the on-response (off-response). This also makes secondary conditioning possible.

In the READ circuit, once excitatory conditioning has taken place, the weight w_k7 does not decay even if the US never arrives after the CS, i.e., if the expectation that the US will arrive is not confirmed. This can be seen from equation 11: in order for w_k7 to decay, the CS must be active while x₅ is not. After conditioning, however, the CS alone is sufficient to activate the on-channel, so this decay never takes place. Psychological data, on the other hand, suggest that the repeated nonoccurrence of the US after the CS results in the extinction of conditioned learning [2]. In the next part, this phenomenon is examined in some more detail, and a modification is made in the READ circuit to support it.
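The argument can be made explicit with a worked equation. The form below is an assumption: it is a gated learning law of the kind used in associative dipole models, chosen to be consistent with the behaviour described in the text, and the constants K (decay rate) and L (learning rate) are illustrative names.

```latex
% Assumed form of the weight law for w_{k7}; K and L are illustrative constants.
\frac{dw_{k7}}{dt} = S_k \left( -K\, w_{k7} + L\, [x_5]^{+} \right),
\qquad [x]^{+} = \max(x, 0)

% Case 1: CS active (constant S_k > 0) while the on-channel is silent, [x_5]^{+} = 0.
\frac{dw_{k7}}{dt} = -K\, S_k\, w_{k7}
\;\Longrightarrow\;
w_{k7}(t) = w_{k7}(0)\, e^{-K S_k t}

% Case 2: after conditioning, the CS itself drives x_5 > 0 through the feedback path,
% so the growth term is present whenever S_k > 0 and w_{k7} is held near
w_{k7} \approx \frac{L}{K}\, [x_5]^{+} > 0
```

Under this assumed form, decay to zero requires Case 1, which is exactly the situation that a conditioned CS rules out.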

The READ circuit supports extinction in neither excitatory nor inhibitory conditioning. Inhibitory conditioning, as defined in Section 2, does not involve expectation learning. Therefore one cannot talk about the nonoccurrence of an expected event, and extinction is not supposed to occur.

As explained in the previous section, extinction does not take place in the READ circuit because, after conditioning, the CS activates the on-channel in exactly the same way that the US does (only with a slightly smaller activation). Therefore it is not possible to differentiate between the US and a previously conditioned CS. This suggests that another level of neurons is needed to make this differentiation.

A modified version of the READ circuit is given in Figure 4, in which node X_e1 provides the possibility to make this differentiation. Node X_e2 has been added solely to preserve the symmetry of the circuit.

The additional and modified differential equations defining the operation of the modified circuit are given below:

As can be observed from the modified equations for the circuit given above, during the conditioning phase, the operation of the circuit is identical to that of the original READ circuit. However, W_k7 decays each time the activation of X_e1 is below a threshold slightly higher than the bias input; i.e., W_k7 decays when the CS is present and the US is not. This results in extinction in the case of the nonoccurrence of the expected US.

Figure 5 contains the simulation results of primary excitatory conditioning with the READ circuit in Figure 3. The on-response and off-response are identical to the positive portions of X₅ and X₆, and are therefore not separately plotted.

In part (a), 10 trials are made in which the CS is presented for 200 time units, and the US is also presented in the last 40 of these. One can observe the growth in the response of X₅ at the presentation of CS at each trial. Part (b) of the same figure shows the 10 following trials during which only the CS is presented. At each of the 10 trials the on-response has the same amplitude; in fact, there is no change in W₇ either.

Figure 6 contains the results of the same experiment performed by using the modified READ circuit in Figure 4. During the conditioning phase, the circuit experiences the nonoccurrence of the expected US at the beginning of each presentation of the CS, and therefore W₇ decays slightly, but when the US is presented, the weight recovers from this decay. This slows down the learning process slightly, but not significantly. After the learning phase, the on-response generated by the CS alone decreases each time the US does not arrive, and approaches zero at the end of 10 trials. Thus extinction takes place.
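The qualitative difference between the two circuits can be reproduced with a toy weight model that follows the protocol above (10 paired trials, then 10 CS-only trials, with the CS on for 200 time units and the US in the last 40). The sketch is not the set of differential equations of either circuit: the saturating on-channel activation, the US-only node `x_e1`, the gating rule and every rate constant are assumptions made for illustration.

```python
def run(modified, n_cond=10, n_ext=10, cs_len=200, us_len=40,
        L=0.02, D=0.002, I=0.5, eps=0.05):
    """Toy weight dynamics; returns the weight at the end of each trial.

    modified=False: the weight only tracks the on-channel activation.
    modified=True : the weight decays whenever the US-driven node x_e1
                    stays below a threshold slightly above the bias I.
    Inter-trial gaps are omitted because nothing changes while the CS is off.
    """
    w = 0.0
    history = []
    for trial in range(n_cond + n_ext):
        for t in range(cs_len):
            S = 1.0                                   # CS is on
            paired = trial < n_cond and t >= cs_len - us_len
            J = 1.0 if paired else 0.0                # US input
            x5 = min(1.0, J + w * S)                  # CS feedback or US drives x5
            x_e1 = I + J                              # driven by bias and US only
            if modified and x_e1 < I + eps:
                w += -S * D * w                       # expected US absent: decay
            else:
                w += S * L * (x5 - w)                 # assumed learning law
        history.append(w)
    return history


if __name__ == "__main__":
    original = run(modified=False)
    changed = run(modified=True)
    print("original: after conditioning %.3f, after CS-only trials %.3f"
          % (original[9], original[19]))
    print("modified: after conditioning %.3f, after CS-only trials %.3f"
          % (changed[9], changed[19]))
```

In the toy model the original rule leaves the weight untouched during the CS-only trials, because the conditioned CS reproduces the on-channel activity by itself, while the gated rule lets the weight fall towards zero.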

One phenomenon that must be mentioned here is secondary conditioning. The effect of the modifications on secondary conditioning will be as follows: Assume that CS₁ has been conditioned by using the US. When CS₁ and CS₂ are then presented as in Figure 1(c), secondary conditioning will take place to a lesser and lesser degree each time CS₁ is not followed by the US. The extinction of the weight corresponding to CS₂ will then depend on the occurrence of the US and not of CS₁. This is a desirable property, since, for instance in the case of Pavlov's experiment, the useful expectation to be learned is that the light will be followed by food and not that it will be followed by the sound of a tuning fork.

Therefore this circuit implements secondary conditioning in a much more practical way than the original READ circuit: the dog learns to salivate through the light-sound pair as long as the light is followed by the sound of the tuning fork, which is in turn mostly followed by food.

Extinction is an important property in animal learning, since it increases the adaptive capabilities of the animal by allowing it to drop responses that are not useful any more. The modification made to the READ circuit in this paper handles extinction as well as all the phenomena modelled by the READ circuit.

While the modified circuit handles extinction, there is another psychological phenomenon that must be modelled, namely spontaneous recovery [2]. In experiments performed with animals, if time elapses after an extinction session like the one in Figure 6(b), the strength of the conditioned response is found to have recovered. The amount of recovery increases with the time interval between extinction sessions. However, if more extinction sessions are carried out, spontaneous recovery decreases and finally disappears. Modelling spontaneous recovery is the next goal on this subject.

1. Baloch, A.A. and Waxman, A.M., "Visual learning, adaptive expectations, and behavioural conditioning of the mobile robot MAVIN", Neural Networks, Vol. 4, pp. 271-302, 1991

2. Hulse, H.H., Egeth, H., and Deese, J., The Psychology of Learning, McGraw-Hill, 1980

3. Grossberg, S. and Schmajuk, N.A., "Neural dynamics of attentionally modulated Pavlovian conditioning: conditioned reinforcement, inhibition, and opponent processing", Psychobiology, Vol. 15, pp. 195-240, 1987

4. Grossberg, S., "A neural network architecture for Pavlovian conditioning: reinforcement, attention, forgetting, timing", in Neural Network Models of Conditioning and Action, ed. Commons, M.L., Grossberg, S., Staddon, J.E.R., Lawrence Erlbaum Assoc., 1991

[1] This work is being supported by TUBITAK under grant EEEAG-126, Project: Modelling Cognitive Processes by Artificial Neural Networks.