Hints from Life to AI, edited by Ugur HALICI, METU, 1994
Artificial Neural Circuits for Conditioned Learning
Asli Guloksuz
Ugur Halici
Dept. of Electrical and Electronics Eng.
Middle East Technical University,
06531, Ankara, TURKEY
{guloksuz,halici}@metu.edu.tr
The concepts of
classical conditioning can be used in designing neural networks for association
and expectation learning, and behavioural conditioning in artificial systems.
Classical conditioning concepts such as excitatory conditioning, inhibitory
conditioning, secondary conditioning, opponent extinction and habituation have
been modelled by the Recurrent Associative Dipole (READ). However, this circuit
does not reproduce the experimental data on extinction when an expected event
fails to occur. In this work, a brief overview of the basic concepts in
conditioned learning is given, the operation of the READ circuit is described,
and the circuit is modified so that it models extinction as well.
The changes in the performance of the READ circuit introduced by this
modification are then explored.
1. Introduction
An intelligent system using neural networks would
include, among other things, networks
for visual perception, pattern learning and object recognition, association
learning, expectation learning, emotional states and behavioural actions [1].
Such neural networks can be incorporated in a robot, or in adaptive systems in
any practical field.
The scope of this paper is only the parts related
to association and expectation learning, and behavioural conditioning.
Therefore, it is assumed that the objects that act as stimuli in these types of
learning have already been recognized, and are presented to the network at the
conceptual level rather than as patterns.
The neural network models depicted here aim at
using the concepts of classical conditioning in animals, to design neural
networks that can learn associations between objects (stimuli) and between
objects and responses. Some associations between stimuli and responses seem
to be inborn in animals, such as the association between the sight of food and
salivation in a dog. Likewise, some vital stimulus-response pairs can be hard-wired or
initially set by software in artificial systems. These initial associations can then be used to form new
associations between stimuli and
between stimuli and responses. Some classical conditioning concepts are
defined in Section 2.
The Recurrent Associative Dipole (READ) [3] is a
neural circuit that models some of the classical conditioning concepts that
will be described in Section 2. A number of these circuits are used as part of
the cognitive system of a mobile robot called MAVIN [1], developed at
MIT Lincoln Laboratory. The operation of the READ circuit is
summarized in Section 3.
The operation of the READ circuit does not conform
to the psychological data on extinction in the case of unconfirmed
expectations. Extinction matters because it allows the system to
learn new associations after forgetting ones that are no longer valid, and also
prevents it from learning further associations based on those that are no longer
valid. We have made a modification to the READ
circuit and to the differential equations defining it, in order to handle
extinction. In Section 4, this modification is described, and simulation
results of the READ circuit and its modified version are given.
2. Conditioned Learning
In this section, the classical conditioning
phenomena that the models aim to reproduce
are described.
2.1. Classical Conditioning
Classical or Pavlovian conditioning is the type of
learning in Pavlov's well known experiment in which a dog is repeatedly
presented with the sound of a tuning fork before being given food, and learns
to salivate at the sound of the tuning fork alone [2]. In this section, the
basic concepts of conditioned learning will be discussed.
The fundamental components of classical
conditioning are the following:
US: The unconditioned stimulus, which triggers a response without prior training. In Pavlov's experiment, the US is food.
UR: The unconditioned response triggered by the US; in Pavlov's experiment, the dog's salivation.
CS: The conditioned stimulus, which comes to trigger a response by being repeatedly paired with the US; in Pavlov's experiment, the sound of a tuning fork.
CR: The conditioned response, which arises at the occurrence of the CS after the CS has been repeatedly paired with the US. The CR is not necessarily identical to the UR; it may vary in amplitude and latency. In Pavlov's experiment, the CR is salivation at the sound of the tuning fork alone.
The occurrence of the CR indicates that an
association has been formed between the CS and the US. This association learning is meaningful in
itself, regardless of the relationship between the UR and the CR [2].
2.2. Concepts in Classical Conditioning
In this subsection, some basic concepts of classical or Pavlovian conditioning are
defined.
2.2.1. Excitatory and Inhibitory Conditioning
The type of conditioning in which the CS is
conditioned to the response related to the US, as in Pavlov's experiment, is
called excitatory conditioning. In
this type, the CS is presented first, and then the US is presented while the
CS is still present. This pairing is
repeated a number of times.
In inhibitory
conditioning, the CS is repeatedly presented after the offset of the US,
and is thus conditioned to the response related to this offset.
The schedules for excitatory and inhibitory conditioning
are demonstrated in Figures 1(a) and (b) respectively.
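The two primary schedules can be sketched as square pulses on a discrete time axis. This is only an illustration: the function names, the pulse lengths and the choice that the CS and US end together in the excitatory case are assumptions, not values taken from Figure 1.

```python
def pulse(n_steps, onset, offset):
    """Return a list that is 1.0 for onset <= t < offset and 0.0 elsewhere."""
    return [1.0 if onset <= t < offset else 0.0 for t in range(n_steps)]

def excitatory_schedule(n_steps=300, cs_onset=50, us_onset=200, offset=250):
    """Excitatory trial: the CS starts first and the US arrives while the CS is on.

    All timings are hypothetical; here the CS and US are switched off together.
    """
    cs = pulse(n_steps, cs_onset, offset)
    us = pulse(n_steps, us_onset, offset)
    return cs, us

def inhibitory_schedule(n_steps=300, us_onset=50, us_offset=150, cs_length=100):
    """Inhibitory trial: the CS is presented only after the offset of the US."""
    us = pulse(n_steps, us_onset, us_offset)
    cs = pulse(n_steps, us_offset, us_offset + cs_length)
    return cs, us
```

In the excitatory schedule every time step with the US active also has the CS active, whereas in the inhibitory schedule the two pulses never overlap.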
2.2.2. Primary and Secondary Conditioning
The conditioning of a CS by repeated pairing with
a US is called primary conditioning. Once
a conditioned stimulus CS1 has been conditioned by using a US, it
can in turn be used in conditioning another stimulus CS2. In
Pavlov's experiment, for instance, after the sound has been paired with food
for a sufficient number of times, repeatedly pairing a light with the sound
will cause the dog to salivate at the sight of the light, even when food is not
used as a US. This phenomenon is called secondary
conditioning.
Schematic diagrams of secondary excitatory conditioning and secondary inhibitory conditioning are shown in Figures 1(c) and (d),
respectively.
Figure 1: Some classical conditioning schedules: (a) primary excitatory conditioning, (b) primary inhibitory conditioning, (c) secondary excitatory conditioning, (d) secondary inhibitory conditioning
2.2.3. Blocking and Overshadowing
Assume that a stimulus CS1 is
repeatedly paired with a US until conditioning takes place, and then CS1
and a neutral, not yet conditioned stimulus CS2 are presented together and
paired with the US. When CS2 is presented alone afterwards, it does
not result in the CR. This phenomenon of a previously conditioned stimulus
preventing the conditioning of a neutral
stimulus is called blocking. The
experimental stages of blocking are as follows [2]:
1. CS1 - US (pairing); CS1 -> CR
2. CS1+CS2 - US (pairing); CS1+CS2 -> CR
3. CS2 alone: no CR
Assume that stimuli CS1 and CS2,
which are both initially neutral, are presented together and paired with the US. It
may be the case that conditioning occurs for the compound stimulus CS1+CS2
and for one of the stimuli, say CS1, but does not occur for CS2.
This situation may arise because CS1 is a more salient stimulus and
therefore receives more attention than CS2. This phenomenon is
called overshadowing. Overshadowing
can be demonstrated by the following stages [2]:
1. CS1+CS2 - US (pairing); CS1+CS2 -> CR
2. CS2 alone: no CR
2.2.4. Timing Considerations in Conditioning
For conditioning to take place, the CS must be presented before the US, as
shown in Figure 1(a). Otherwise, the US will attract more attention because of its
relevance to drives and responses in the system, and will in a way overshadow
the CS.
The time interval between the presentation
of the CS and that of the US is called the interstimulus
interval (ISI). The synchronization problem in conditioning arises because the ISI
varies across the repetitions of an experiment, and yet the CS should become
associated only with the US and not with a mixture of the US and the noise in
the environment.
2.2.5. Extinction
In animals, learned responses are dropped if they
are not reinforced [2]. For example, if, after being conditioned with the
tone-food pair, the dog in Pavlov's experiment is repeatedly presented with the
tone but no food appears, it will salivate less and less, and eventually
it will not salivate at all in response to the tone. This phenomenon is called extinction. In Figure 2, a rough curve
representing the cumulative number of responses during the extinction process
is given. This curve has been obtained by using the well-known Skinner box, in
which a rat or a pigeon receives a piece of food each time
it presses a lever. If no more food is dropped after a lever press, the rat
presses the lever less and less frequently until it stops pressing it altogether.
The curve in Figure 2 shows the total number of lever presses in the course of an
hour after the food reward is removed [2].
Figure 2: Rough
extinction curve. Total number of responses versus time during the period in which
the CS is not followed by the US, i.e., the
expected event does not occur.
3. READ: A Neural Network that Models Conditioned Learning
In this section, the Recurrent Associative Dipole
[3], which models most of the concepts defined in Section 2, is described, and
the simulation results for excitatory and inhibitory conditioning are
given.
3.1. The Recurrent Associative Dipole (READ)
The Recurrent Associative Dipole is a neural circuit that consists of two
channels: one related to the on-response to
a particular stimulus US, and the other to the off-response to that US. The on-channel and off-channel
of the READ circuit in Figure 3 are the columns on the left and the right,
respectively.
Figure 3: The Recurrent Associative Dipole (READ) circuit and the response of each node to the J input shown on the bottom left
Both channels have modifiable synapses with nodes
pertaining to other stimuli (CS's). The READ circuit models primary and
secondary excitatory and inhibitory conditioning, opponent extinction and
habituation. The differential equations
defining the operation of the circuit are given below:
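Arousal + US + Feedback On-Activation:
$$\frac{dx_1}{dt} = -a x_1 + I + J + f(x_7) \tag{1}$$
where I is the arousal input and J is the input to the on-channel (US).
Arousal + Feedback Off-Activation:
$$\frac{dx_2}{dt} = -a x_2 + I + f(x_8) \tag{2}$$
Depletable On and Off Transmitters:
$$\frac{dw_1}{dt} = b(1 - w_1) - c\,g(x_1)\,w_1 \tag{3}$$
$$\frac{dw_2}{dt} = b(1 - w_2) - c\,g(x_2)\,w_2 \tag{4}$$
Gated On and Off Activations:
$$\frac{dx_3}{dt} = -a x_3 + e\,g(x_1)\,w_1 \tag{5}$$
$$\frac{dx_4}{dt} = -a x_4 + e\,g(x_2)\,w_2 \tag{6}$$
Normalized Opponent On and Off Activations:
$$\frac{dx_5}{dt} = -a x_5 + (h - x_5)\,x_3 - (x_5 + k)\,x_4 \tag{7}$$
$$\frac{dx_6}{dt} = -a x_6 + (h - x_6)\,x_4 - (x_6 + k)\,x_3 \tag{8}$$
Total On and Off Activations:
$$\frac{dx_7}{dt} = -a x_7 + m\,[x_5]^+ + p\,S_k\,w_{k7} \tag{9}$$
$$\frac{dx_8}{dt} = -a x_8 + m\,[x_6]^+ + p\,S_k\,w_{k8} \tag{10}$$
On-conditioned and Off-conditioned Reinforcer Learning:
$$\frac{dw_{k7}}{dt} = S_k\left(-q\,w_{k7} + r\,[x_5]^+\right) \tag{11}$$
$$\frac{dw_{k8}}{dt} = S_k\left(-q\,w_{k8} + r\,[x_6]^+\right) \tag{12}$$
On and Off Responses:
$$\mathrm{ON} = [x_5]^+ \tag{13}$$
$$\mathrm{OFF} = [x_6]^+ \tag{14}$$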
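A minimal sketch of how equations (1)-(14) might be integrated numerically is given here. It is not the simulation code used for the figures. The forward Euler scheme, the step size, all parameter values, the choice of half-wave rectification for both f and g, and the function names are assumptions made for illustration. The CS term in equations (9) and (10) is taken as excitatory, as required by the feedback path described in Section 3.2.

```python
def rect(x):
    """Half-wave rectification [x]^+ = max(x, 0)."""
    return x if x > 0.0 else 0.0

# Hypothetical parameter values, chosen only to keep the sketch stable.
PARAMS = {"a": 1.0, "b": 0.05, "c": 0.05, "e": 1.0, "h": 1.0,
          "k": 1.0, "m": 0.5, "p": 1.0, "q": 0.05, "r": 0.05}

def initial_state():
    """Activations and CS weights at zero, transmitters fully accumulated."""
    names = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "wk7", "wk8"]
    state = {name: 0.0 for name in names}
    state["w1"] = state["w2"] = 1.0
    return state

def read_derivatives(s, I, J, Sk, P=PARAMS, f=rect, g=rect):
    """Right-hand sides of equations (1)-(12) for a single CS node Sk."""
    a = P["a"]
    return {
        "x1": -a * s["x1"] + I + J + f(s["x7"]),                      # (1)
        "x2": -a * s["x2"] + I + f(s["x8"]),                          # (2)
        "w1": P["b"] * (1 - s["w1"]) - P["c"] * g(s["x1"]) * s["w1"],  # (3)
        "w2": P["b"] * (1 - s["w2"]) - P["c"] * g(s["x2"]) * s["w2"],  # (4)
        "x3": -a * s["x3"] + P["e"] * g(s["x1"]) * s["w1"],            # (5)
        "x4": -a * s["x4"] + P["e"] * g(s["x2"]) * s["w2"],            # (6)
        "x5": -a * s["x5"] + (P["h"] - s["x5"]) * s["x3"]
              - (s["x5"] + P["k"]) * s["x4"],                          # (7)
        "x6": -a * s["x6"] + (P["h"] - s["x6"]) * s["x4"]
              - (s["x6"] + P["k"]) * s["x3"],                          # (8)
        "x7": -a * s["x7"] + P["m"] * rect(s["x5"]) + P["p"] * Sk * s["wk7"],  # (9)
        "x8": -a * s["x8"] + P["m"] * rect(s["x6"]) + P["p"] * Sk * s["wk8"],  # (10)
        "wk7": Sk * (-P["q"] * s["wk7"] + P["r"] * rect(s["x5"])),     # (11)
        "wk8": Sk * (-P["q"] * s["wk8"] + P["r"] * rect(s["x6"])),     # (12)
    }

def euler_step(s, dt, I, J, Sk):
    """Advance the whole state by one forward Euler step."""
    d = read_derivatives(s, I, J, Sk)
    return {name: s[name] + dt * d[name] for name in s}

def simulate_us_pulse(t_on=50.0, t_after=20.0, dt=0.05, I=0.5, J=1.0):
    """Present the US alone for t_on time units, then remove it.

    Returns the peak ON response (13) during the US, the peak OFF
    response (14) after US offset, and the transmitter w1 at US offset.
    """
    s = initial_state()
    on_peak = off_peak = 0.0
    for _ in range(int(round(t_on / dt))):
        s = euler_step(s, dt, I, J, 0.0)
        on_peak = max(on_peak, rect(s["x5"]))
    w1_at_offset = s["w1"]
    for _ in range(int(round(t_after / dt))):
        s = euler_step(s, dt, I, 0.0, 0.0)
        off_peak = max(off_peak, rect(s["x6"]))
    return on_peak, off_peak, w1_at_offset
```

Calling `simulate_us_pulse()` with these assumed values gives a positive ON response while the US is on, a depleted transmitter w1 at US offset, and a positive OFF response after the offset.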
3.2. Primary and Secondary Excitatory Conditioning of the READ Circuit
Primary excitatory conditioning takes place when a
CS is presented at a node Sk, and the US pertaining to the READ
circuit is presented at x1 in the order shown in Figure 1(a),
causing an increase in the weight wk7 according to equation 11. The
squares adjacent to the nodes x3 and x4 indicate that these
have synapses with habituating and recovering transmitters. For example, w1
decreases when x1 is active and recovers when it is not, according
to equation 3. This causes the activations x3 and x5 to
decay as shown in Figure 3. Since the bias input I is equal for both channels,
the habituation of w1 results in a rebound in the off-channel after
the offset of the US. If the CS is presented during this rebound, the weight wk8
increases according to equation 12, and the CS is conditioned to the off-response,
i.e., primary inhibitory conditioning takes place.
After conditioning, the feedback path from x7 (x8)
to x1 (x2) allows the CS alone to activate x1 (x2)
and thus generate the on-response (off-response). This also makes secondary
conditioning possible.
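The habituation argument can be made concrete by looking at the steady state of equation 3. The short derivation here assumes that x1 is held at a constant value and that g is non-negative; the starred symbol is notation introduced for this sketch.

```latex
% Steady state of equation (3) for a constant activation x_1 (assumption)
\[
0 = b\,(1 - w_1^{*}) - c\,g(x_1)\,w_1^{*}
\quad\Longrightarrow\quad
w_1^{*} = \frac{b}{b + c\,g(x_1)}
\]
% Resulting gated signal that drives x_3 in equation (5)
\[
e\,g(x_1)\,w_1^{*} = \frac{e\,b\,g(x_1)}{b + c\,g(x_1)}
\]
```

A larger x1 therefore leaves less transmitter available. While the US is on, w1 settles below w2. When the US is removed, x1 and x2 receive the same bias input I but w1 is still depleted, so the gated off-signal temporarily exceeds the gated on-signal, which is the rebound described above.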
3.3. Extinction in the READ Circuit
In the READ circuit, once excitatory conditioning
has taken place, the weight wk7 does not decay even if the US never
arrives after the CS, i.e., if the expectation that the US will arrive is not
confirmed. This can be seen from equation 11: in order for wk7
to decay, the CS must be active while x5 is not. After
conditioning, however, the CS alone is sufficient to activate the on-channel, so
this decay never takes place. Psychological data, on the other hand, suggest that the
repeated nonoccurrence of the US after the CS results in the extinction of
conditioned learning [2]. In the next section, this phenomenon is examined in
more detail, and the READ circuit is modified to support it.
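The same observation can be written as a fixed-point calculation on equation 11. The sketch assumes that the CS signal Sk is constant and positive and that x5 is held constant; the starred symbol is notation introduced here.

```latex
% Fixed point of equation (11) for constant S_k > 0 and constant x_5 (assumptions)
\[
\frac{dw_{k7}}{dt} = 0
\quad\Longrightarrow\quad
w_{k7}^{*} = \frac{r}{q}\,[x_5]^{+}
\]
% Relaxation towards that fixed point
\[
w_{k7}(t) = w_{k7}^{*} + \bigl(w_{k7}(0) - w_{k7}^{*}\bigr)\,e^{-q S_k t}
\]
```

Under these assumptions the weight relaxes towards a value proportional to the on-response, so it can only fall to zero if the on-response is zero while the CS is present. A conditioned CS keeps x5 positive by itself, which is why the weight is maintained.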
4. A Modification of the READ Circuit to Model Extinction
The READ circuit supports extinction in neither
excitatory nor inhibitory conditioning. Inhibitory conditioning, as defined in
Section 2, does not involve expectation learning. In that case one cannot speak of
the nonoccurrence of an expected event, and extinction is not expected to
occur. The modification below therefore addresses excitatory conditioning only.
As explained in the previous section, extinction
does not take place in the READ circuit because, after
conditioning, the CS activates the on-channel in exactly the same way that the US
does (only with a slightly smaller activation). It is therefore not possible to
differentiate between the US and a previously conditioned CS. This suggests
that another level of neurons is needed to make this differentiation.
A modified version of the READ circuit is given in
Figure 4, in which node xe1 makes this
differentiation possible. Node xe2 has been added solely to preserve the
symmetry of the circuit.
Figure 4: Modified READ circuit that handles extinction
The additional and modified differential equations defining the operation of the
modified circuit are given below:
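New nodes:
$$\frac{dx_{e1}}{dt} = -a x_{e1} + I + J \tag{15}$$
$$\frac{dx_{e2}}{dt} = -a x_{e2} + I \tag{16}$$
Modified equations:
$$\frac{dx_1}{dt} = -a x_1 + x_{e1} + f(x_7) \tag{17}$$
$$\frac{dx_2}{dt} = -a x_2 + x_{e2} + f(x_8) \tag{18}$$
$$\frac{dw_{k7}}{dt} = S_k\left(-q\,w_{k7} + r\,[x_5\,\theta(x_{e1} - I)]^+\right) \tag{19}$$
where θ(·) is a threshold function of the difference between the activation of xe1 and the bias input I.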
As can be observed from the modified equations
given above, during the conditioning phase, while the US is present, the
circuit operates in the same way as the original READ circuit. However, wk7
decays whenever the CS is present and the activation of xe1 is below a threshold slightly
higher than the bias input; i.e., wk7 decays when the CS is present
and the US is not. This results in extinction when
the expected US fails to occur.
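The difference between the original and the modified learning law can be sketched as two small functions. The step-shaped threshold, its margin `eps`, the rate constants and the function names are hypothetical. The sketch also assumes a = 1, so that xe1 rests at the bias input I when no US is present.

```python
def rect(x):
    """Half-wave rectification [x]^+ = max(x, 0)."""
    return x if x > 0.0 else 0.0

def threshold(x, eps=0.05):
    """Hypothetical step threshold: 1 when x exceeds a small margin, else 0."""
    return 1.0 if x > eps else 0.0

def dwk7_original(Sk, wk7, x5, q=0.05, r=0.05):
    """Equation (11): growth is driven by the on-response whenever the CS is on."""
    return Sk * (-q * wk7 + r * rect(x5))

def dwk7_modified(Sk, wk7, x5, xe1, I, q=0.05, r=0.05):
    """Equation (19): growth is gated by whether xe1 exceeds the bias input I."""
    return Sk * (-q * wk7 + r * rect(x5 * threshold(xe1 - I)))

# CS alone after conditioning: xe1 sits at the bias level, so the gate is closed.
cs_alone_original = dwk7_original(Sk=1.0, wk7=0.2, x5=0.3)
cs_alone_modified = dwk7_modified(Sk=1.0, wk7=0.2, x5=0.3, xe1=0.5, I=0.5)

# CS together with the US: xe1 is raised by J, so the gate is open.
with_us_modified = dwk7_modified(Sk=1.0, wk7=0.2, x5=0.3, xe1=1.5, I=0.5)
```

With these illustrative numbers the original law keeps increasing the weight when the CS appears alone, the modified law decreases it, and the two laws agree when the US is present.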
4.1. Experimental Results
Figure 5 contains the simulation results of
primary excitatory conditioning with the READ circuit in Figure 3. The on-response
and off-response are identical to the positive portions of x5 and x6,
and are therefore not plotted separately.
In part (a), 10 trials are run, in each of which the CS is
presented for 200 time units and the US is presented during the last 40 of
these. One can observe the growth in the response of x5 at the
presentation of the CS in each trial. Part (b) of the same figure shows the 10
following trials, during which only the CS is presented. In each of these 10
trials the on-response has the same amplitude; in fact, there is no change in wk7
either.
Figure 6 contains the results of the same
experiment performed with the
modified READ circuit in Figure 4. During the conditioning phase, the circuit
experiences the nonoccurrence of an expected event at the beginning of each
presentation of the CS, and therefore wk7 decays slightly, but when the US is
presented, the weight recovers from this decay. This slows down the learning process slightly, but not significantly.
After the learning phase, the on-response generated by the CS alone
decreases each time the US does not arrive, and approaches zero by the end of
10 trials. Thus extinction takes place.
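The input protocol of this experiment can be sketched as two sequences with one value per time unit. The trial counts and durations follow the description above. The inter-trial gap and the function name are assumptions, since the spacing between trials is not stated.

```python
def build_protocol(n_cond=10, n_ext=10, cs_len=200, us_len=40, gap=100):
    """Return per-time-unit CS and US sequences for the whole experiment.

    The first n_cond trials pair the CS with the US during the last us_len
    time units of the CS; the following n_ext trials present the CS alone.
    The gap between trials is a hypothetical value.
    """
    cs, us = [], []
    for trial in range(n_cond + n_ext):
        reinforced = trial < n_cond
        for t in range(cs_len):
            cs.append(1.0)
            us.append(1.0 if reinforced and t >= cs_len - us_len else 0.0)
        cs.extend([0.0] * gap)
        us.extend([0.0] * gap)
    return cs, us

cs, us = build_protocol()
```

Feeding `cs` as the CS signal Sk and `us` as the input J into a simulation of the circuit would reproduce the structure of the two phases, conditioning followed by CS-only trials.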
One
phenomenon that must be mentioned here is secondary conditioning. The effect of
the modification on secondary conditioning is as follows. Assume that CS1
has been conditioned by using the US. When CS1
and CS2
are then presented as in Figure 1(c), secondary conditioning will take place to
a lesser and lesser degree each time CS1
is not followed by the US. The extinction of the weight corresponding to CS2
will then depend on the occurrence of the US and not of CS1.
This is a desirable property since, for instance in the case of Pavlov's
experiment, the useful expectation to be learned is that the light will be
followed by food, and not that it will be followed by the sound of a tuning fork.
This circuit therefore implements secondary conditioning in a more practical way
than the original READ circuit: the dog learns to
salivate through the light-tone pair only as long as the light is followed by the
tone, which in turn is mostly followed by food.
5. Conclusion
Extinction is an important property in animal
learning, since it increases the adaptive capabilities of the animal by
allowing it to drop responses that are not useful any more. The modification
made to the READ circuit in this paper handles extinction as well as all the
phenomena modelled by the READ circuit.
While the modified circuit handles extinction,
there is another psychological phenomenon that remains to be modelled, namely
spontaneous recovery
[2]. In experiments performed with animals, if time elapses after an extinction
session like the one in Figure 6(b), the strength of the conditioned response
is found to have recovered. The amount of recovery increases with the time
interval between extinction sessions. However, if more extinction sessions are
carried out, spontaneous recovery decreases and finally disappears. Modelling
spontaneous recovery is the next goal of this work.
References
1. Baloch, A.A. and Waxman, A.M., "Visual learning, adaptive expectations, and behavioural conditioning of the mobile robot MAVIN", Neural Networks, Vol. 4, pp. 271-302, 1991.
2. Hulse, H.H., Egeth, H., and Deese, J., The Psychology of Learning, McGraw-Hill, 1980.
3. Grossberg, S. and Schmajuk, N.A., "Neural dynamics of attentionally modulated Pavlovian conditioning: conditioned reinforcement, inhibition, and opponent processing", Psychobiology, Vol. 15, pp. 195-240, 1987.
4. Grossberg, S., "A neural network architecture for Pavlovian conditioning: reinforcement, attention, forgetting, timing", in Neural Network Models of Conditioning and Action, ed. Commons, M.L., Grossberg, S., Staddon, J.E.R., Lawrence Erlbaum Assoc., 1991.
5. Levine, D.S., Neural and Cognitive Modelling, Lawrence Erlbaum Assoc., 1991.