Hints from Life to AI, edited by Ugur HALICI, METU, 1994 ©

 

  


Artificial Neural Circuits for Conditioned Learning[1]


Asli Guloksuz                 

Ugur Halici

Dept.of Electrical and Electronics Eng.

Middle East Technical University,

06531, Ankara, TURKEY

{guloksuz,halici}@metu.edu.tr

 


The concepts of classical conditioning can be used in designing neural networks for association and expectation learning, and for behavioural conditioning in artificial systems. Classical conditioning concepts such as excitatory conditioning, inhibitory conditioning, secondary conditioning, opponent extinction and habituation have been modelled by the Recurrent Associative Dipole (READ). However, this circuit does not agree with the experimental data on extinction in the case of the nonoccurrence of an expected event. In this work, the basic concepts of conditioned learning are briefly reviewed, the operation of the READ circuit is described, and the circuit is modified so that it models extinction as well. The changes in the performance of the READ circuit introduced by this modification are then explored.

 


1. Introduction

 

An intelligent system using neural networks would include, among other things,  networks for visual perception, pattern learning and object recognition, association learning, expectation learning, emotional states and behavioural actions [1]. Such neural networks can be incorporated in a robot, or in adaptive systems in any practical field.

         

The scope of this paper is only the parts related to association and expectation learning, and behavioural conditioning. Therefore, it is assumed that the objects that act as stimuli in these types of learning have already been recognized, and are presented to the network at the conceptual level rather than as patterns. 

 

The neural network models described here use the concepts of classical conditioning in animals to design neural networks that can learn associations between objects (stimuli), and between objects and responses. Some associations between stimuli and responses seem to be inborn in animals, such as the association between the sight of food and salivation for a dog. Likewise, some vital stimulus-response pairs can be hard-wired or initially set by software in artificial systems. These initial associations can then be used to form new associations between stimuli, and between stimuli and responses. Some classical conditioning concepts are defined in Section 2.

 

The Recurrent Associative Dipole (READ) [3] is a neural circuit that models some of the classical conditioning concepts described in Section 2. A number of these circuits are used as part of the cognitive system of a mobile robot called MAVIN [1], developed at the MIT Lincoln Laboratory. The operation of the READ circuit is summarized in Section 3.

 

The operation of the READ circuit does not conform to the psychological data on extinction in the case of unconfirmed expectations. Yet the extinction of associations allows a system to learn new associations after forgetting ones that are no longer valid, and also prevents it from learning further associations based on those that are not valid. We have made a modification to the READ circuit and to the differential equations defining it, in order to handle the concept of extinction. In Section 4, this modification is described, and simulation results of the READ circuit and its modified version are given.

 

2. Conditioned Learning

 

In this section, the classical conditioning phenomena that the circuits are intended to model are described.

 

2.1. Classical Conditioning

 

Classical or Pavlovian conditioning is the type of learning in Pavlov's well known experiment in which a dog is repeatedly presented with the sound of a tuning fork before being given food, and learns to salivate at the sound of the tuning fork alone [2]. In this section, the basic concepts of conditioned learning will be discussed.

 

 

 

The fundamental components of classical conditioning are the following:

 

     US: The unconditioned stimulus which triggers a response without prior training. In Pavlov's experiment, the US is food.

 

     UR: The unconditioned response triggered by the US; in Pavlov's experiment, the dog's salivation.

 

     CS: The conditioned stimulus which comes to trigger a response by being repeatedly paired with the US; in Pavlov's experiment, the sound of a tuning fork.

 

     CR: The conditioned response which arises at the occurrence of the CS, after the CS has been repeatedly paired with the US. The CR is not necessarily identical to the UR; it may vary in amplitude and latency. In Pavlov's experiment, the CR is salivation at the sound of the tuning fork alone.

 

The occurrence of the CR indicates that an association has been formed between the CS and the US. This association learning is meaningful in itself, regardless of the relationship between the UR and the CR [2].

 

2.2. Concepts in Classical Conditioning

 

In this subsection, some basic concepts of classical or Pavlovian conditioning are defined.

 

2.2.1. Excitatory and Inhibitory Conditioning

 

The type of conditioning in which the CS is conditioned to the response related to the US, as in Pavlov's experiment, is called excitatory conditioning. In this type, firstly the CS is presented, and then the US is presented in the presence of the CS.  This pairing is repeated a number of times.

 

In inhibitory conditioning, the CS is repeatedly presented after the offset of the US, and is thus conditioned to the response related to this offset.

 

The schedules for excitatory and inhibitory conditioning are demonstrated in Figures 1(a) and (b) respectively.

 

2.2.2. Primary and Secondary Conditioning

 

The conditioning of a CS by repeated pairing with a US is called primary conditioning. Once a conditioned stimulus CS1 has been conditioned by using a US, it can in turn be used in conditioning another stimulus CS2. In Pavlov's experiment, for instance, after the sound has been paired with food a sufficient number of times, repeatedly pairing a light with the sound will cause the dog to salivate at the sight of the light, even though food is not used as a US. This phenomenon is called secondary conditioning.

 

Schematic diagrams of secondary excitatory conditioning and secondary inhibitory conditioning are shown in Figure 1(c) and (d) respectively.

 

 

Figure 1:  Some classical conditioning schedules  (a)primary excitatory conditioning   (b)primary inhibitory conditioning   (c)secondary excitatory conditioning  (d)secondary inhibitory conditioning
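The primary schedules of Figure 1(a) and (b) can be sketched as simple functions of time. This is only an illustration of the ordering described in the text: the function name `schedule` is hypothetical, the 200 and 40 time-unit durations are borrowed from the experiment in Section 4 for the excitatory case, and reusing them for the inhibitory case is an assumption.

```python
def schedule(kind, t, cs_len=200.0, us_len=40.0):
    """Return the (CS, US) levels at time t within one trial.

    Durations are illustrative assumptions, not values read from Figure 1.
    """
    if kind == "excitatory":
        # CS comes first; the US arrives while the CS is still present.
        cs = 1.0 if 0.0 <= t < cs_len else 0.0
        us = 1.0 if cs_len - us_len <= t < cs_len else 0.0
    elif kind == "inhibitory":
        # US comes first; the CS starts at the offset of the US.
        us = 1.0 if 0.0 <= t < us_len else 0.0
        cs = 1.0 if us_len <= t < us_len + cs_len else 0.0
    else:
        raise ValueError("unknown schedule: " + kind)
    return cs, us
```

The secondary schedules of Figure 1(c) and (d) follow the same pattern, with a previously conditioned CS1 taking the place of the US.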

 

2.2.3. Blocking and Overshadowing

 

Assume that a stimulus CS1 is repeatedly paired with a US until conditioning takes place, and that CS1 and a neutral (not yet conditioned) stimulus CS2 are then presented together and paired with the US. When CS2 is presented alone afterwards, it does not result in the CR. This phenomenon, in which a previously conditioned stimulus prevents the conditioning of a neutral stimulus, is called blocking. The experimental stages of blocking are as follows [2]:

 

1.   CS1 → US            2.    CS1+CS2 → US                 3.   CS2 → no CR

      CS1 → CR                  CS1+CS2 → CR

 

Assume that stimuli CS1 and CS2, which are initially both neutral, are presented together and paired with the US. It may be the case that conditioning occurs for the compound stimulus CS1+CS2 and for one of the stimuli, say CS1, but does not occur for CS2. This situation may arise because CS1 is a more salient stimulus and therefore receives more attention than CS2. This phenomenon is called overshadowing. Overshadowing can be demonstrated by the following stages [2]:

 

1.   CS1+CS2 → US           2.   CS2 → no CR

      CS1+CS2 → CR
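Blocking and overshadowing can be reproduced numerically with a simple error-driven update. The sketch below uses the Rescorla-Wagner rule, which is not part of the READ model or of this paper; it is brought in only to illustrate the two staged experiments above. The function name, the salience values in `alpha`, and the rate `beta` are all assumptions.

```python
def rescorla_wagner(trials, alpha, beta=0.2, lam=1.0):
    """Associative strengths after a list of (present_stimuli, us_given) trials."""
    V = {name: 0.0 for name in alpha}
    for present, us_given in trials:
        # Prediction error: what the US delivers minus what is already predicted.
        error = (lam if us_given else 0.0) - sum(V[n] for n in present)
        for n in present:
            V[n] += alpha[n] * beta * error
    return V

# Blocking: CS1 is conditioned first, then the compound CS1+CS2 is paired.
blocking = rescorla_wagner(
    [(("CS1",), True)] * 50 + [(("CS1", "CS2"), True)] * 50,
    alpha={"CS1": 0.5, "CS2": 0.5})

# Overshadowing: both stimuli are new, but CS1 is assumed to be more salient.
overshadowing = rescorla_wagner(
    [(("CS1", "CS2"), True)] * 50,
    alpha={"CS1": 0.8, "CS2": 0.1})
```

In the blocking run CS1 already predicts the US when CS2 appears, so almost no error is left for CS2 to learn from. In the overshadowing run the more salient CS1 absorbs most of the shared error.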

 

 

2.2.4. Timing Considerations in Conditioning

 

For conditioning to take place,  the CS must be presented before the US, as shown in Figure 1(a). Otherwise, the US will attract more attention because of its relevance to drives and responses in the system, and will in a way overshadow the CS.

 

The time interval between the presentation of the CS and that of the US is called the interstimulus interval (ISI). The synchronization problem in conditioning arises because, although the ISI varies across the repetitions of an experiment, the CS should become associated only with the US and not with a mixture of the US and the noise in the environment.

     

2.2.5. Extinction

 

In animals, learned responses are dropped if they are not reinforced [2]. For example, if the dog in Pavlov's experiment, after being conditioned with the tone-food pair, is repeatedly presented with the tone but no food, it will salivate less and less, and eventually it will not salivate at all in response to the tone. This phenomenon is called extinction. Figure 2 gives a rough curve of the cumulative number of responses during the extinction process. This curve has been obtained by using the well known Skinner box, in which a rat or a pigeon receives a piece of food each time it presses a lever. If no more food is dropped after the lever press, the animal gives this response less and less frequently until it does not press the lever at all. The curve in Figure 2 is the total number of lever presses in the course of an hour after the reward of food is removed [2].

 

 

Figure 2: Rough extinction curve. Total number of responses versus time during the period in which the CS is not followed by the US, i.e., the expected event does not occur.

 

 

 

3. READ: A Neural Network that Models Conditioned Learning

 

In this section, the Recurrent Associative Dipole [3], which models most of the concepts defined in Section 2, is described, and the simulation results for excitatory and inhibitory conditioning will be given.

 

3.1. The Recurrent Associative Dipole (READ)

         

The Recurrent Associative Dipole is a neural circuit that consists of two channels: one related to the on-response of a particular stimulus US, and the other to the off-response of the US in question. The on-channel and off-channel of the READ circuit in Figure 3 are the columns on the left and the right respectively.

 

 

 

Figure 3: The Recurrent Associative Dipole (READ) circuit and the response of each node to the J input shown on the bottom left

 

 

Both channels have modifiable synapses with nodes pertaining to other stimuli (CS's). The READ circuit models primary and secondary excitatory and inhibitory conditioning, opponent extinction and habituation. The  differential equations defining the operation of the circuit are given below:

        

          Arousal + US + Feedback On-Activation:

                    $\frac{dx_1}{dt} = -a x_1 + I + J + f(x_7)$                      (1)

                    I: arousal input         J: input to the on-channel (US)

          Arousal + Feedback Off-Activation:

                    $\frac{dx_2}{dt} = -a x_2 + I + f(x_8)$                          (2)

          Depletable On and Off Transmitters:

                    $\frac{dw_1}{dt} = b(1 - w_1) - c\,g(x_1)\,w_1$                  (3)

                    $\frac{dw_2}{dt} = b(1 - w_2) - c\,g(x_2)\,w_2$                  (4)

          Gated On and Off Activations:

                    $\frac{dx_3}{dt} = -a x_3 + e\,g(x_1)\,w_1$                      (5)

                    $\frac{dx_4}{dt} = -a x_4 + e\,g(x_2)\,w_2$                      (6)

          Normalized Opponent On and Off Activations:

                    $\frac{dx_5}{dt} = -a x_5 + (h - x_5)\,x_3 - (x_5 + k)\,x_4$     (7)

                    $\frac{dx_6}{dt} = -a x_6 + (h - x_6)\,x_4 - (x_6 + k)\,x_3$     (8)

          Total On and Off Activations:

                    $\frac{dx_7}{dt} = -a x_7 + m[x_5]^+ + p\,S_k w_{k7}$            (9)

                    $\frac{dx_8}{dt} = -a x_8 + m[x_6]^+ + p\,S_k w_{k8}$            (10)

          On-conditioned and Off-conditioned Reinforcer Learning:

                    $\frac{dw_{k7}}{dt} = S_k(-q\,w_{k7} + r[x_5]^+)$                (11)

                    $\frac{dw_{k8}}{dt} = S_k(-q\,w_{k8} + r[x_6]^+)$                (12)

          On and Off Responses:

                    $ON = [x_5]^+$                                                   (13)

                    $OFF = [x_6]^+$                                                  (14)
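As a quick check on how the stages fit together, the steady states of equations (3), (5) and (7) can be worked out by setting each derivative to zero with the inputs held fixed. This is a sketch that only rearranges the equations above; treating the faster stages as settled is an assumption.

```latex
\begin{align}
0 &= b(1 - w_1) - c\,g(x_1)\,w_1
  &\Rightarrow\quad w_1^{*} &= \frac{b}{b + c\,g(x_1)} \\
0 &= -a x_3 + e\,g(x_1)\,w_1^{*}
  &\Rightarrow\quad x_3^{*} &= \frac{e\,b\,g(x_1)}{a\,\bigl(b + c\,g(x_1)\bigr)} \\
0 &= -a x_5 + (h - x_5)\,x_3 - (x_5 + k)\,x_4
  &\Rightarrow\quad x_5^{*} &= \frac{h\,x_3 - k\,x_4}{a + x_3 + x_4}
\end{align}
```

The first line shows habituation: the larger the sustained signal g(x1), the smaller the available transmitter w1. The last line shows the normalization: for nonnegative x3 and x4 and a > 0, x5 stays between -k and h, and its sign is set by the competition between the two gated signals.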

 

         

3.2. Primary and Secondary Excitatory Conditioning of the READ Circuit

 

Primary excitatory conditioning takes place when a CS is presented at a node Sk, and the US pertaining to the READ circuit is presented at x1 in the order shown in Figure 1(a), causing an increase in the weight wk7 according to equation 11. The squares adjacent to the nodes x3 and x4 indicate that these have synapses with habituating and recovering transmitters. For example, w1 decreases when x1 is active and recovers when it is not, according to equation 3. This causes the activations x3 and x5 to decay as shown in Figure 3. Since the bias input I is equal for both channels, the habituation of w1 results in a rebound in the off-channel after the offset of the US. If the CS is presented during this rebound, the weight wk8 increases according to equation 12, and the CS is conditioned to the off-response, i.e., primary inhibitory conditioning takes place.
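The habituation of the on-response and the rebound in the off-channel can be reproduced with a short forward-Euler integration of equations (1) to (8), (13) and (14). This is a minimal sketch under stated assumptions: the feedback terms f(x7) and f(x8) are left out (no CS, no learning), g is taken to be rectification, and every parameter value, the US timing and the function name `simulate_dipole` are illustrative choices, not the authors' settings.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def simulate_dipole(us_start=50, us_end=100, t_end=200, steps_per_unit=20):
    """Integrate the on/off channels without feedback; return ON and OFF traces."""
    a, b, c, e, h, k, I = 1.0, 0.02, 0.02, 1.0, 1.0, 1.0, 1.0  # assumed values
    dt = 1.0 / steps_per_unit
    # Start at the resting equilibrium for these parameters.
    x1 = x2 = 1.0
    w1 = w2 = 0.5
    x3 = x4 = 0.5
    x5 = x6 = 0.0
    on, off = [], []
    for i in range(t_end * steps_per_unit):
        t = i * dt
        J = 1.0 if us_start <= t < us_end else 0.0
        dx1 = -a * x1 + I + J                          # (1) without f(x7)
        dx2 = -a * x2 + I                              # (2) without f(x8)
        dw1 = b * (1.0 - w1) - c * relu(x1) * w1       # (3)
        dw2 = b * (1.0 - w2) - c * relu(x2) * w2       # (4)
        dx3 = -a * x3 + e * relu(x1) * w1              # (5)
        dx4 = -a * x4 + e * relu(x2) * w2              # (6)
        dx5 = -a * x5 + (h - x5) * x3 - (x5 + k) * x4  # (7)
        dx6 = -a * x6 + (h - x6) * x4 - (x6 + k) * x3  # (8)
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
        w1, w2 = w1 + dt * dw1, w2 + dt * dw2
        x3, x4 = x3 + dt * dx3, x4 + dt * dx4
        x5, x6 = x5 + dt * dx5, x6 + dt * dx6
        on.append((t, relu(x5)))                       # (13)
        off.append((t, relu(x6)))                      # (14)
    return on, off
```

With these settings the ON trace rises at US onset and then sags while the US is still present, because w1 is being depleted. The OFF trace is silent during the US and shows a transient rebound after US offset, while w1 is still recovering.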

         

After conditioning, the feedback path from x7 (x8) to x1 (x2) allows the CS alone to activate x1 (x2) and thus generate the on-response (off-response). This also makes secondary conditioning possible.

 

3.3. Extinction in the READ Circuit

         

In the READ circuit, once excitatory conditioning has taken place, the weight wk7 does not decay even if the US never arrives after the CS, i.e., if the expectation that the US will arrive is not confirmed. This can be seen from equation 11: in order for wk7 to decay, the CS must be active while x5 is not. However, after conditioning, the CS alone is sufficient to activate the on-channel. Therefore this decay never takes place. Psychological data, on the other hand, suggest that the repeated nonoccurrence of the US after the CS results in the extinction of conditioned learning [2]. In the next section, this phenomenon is examined in more detail, and a modification is made in the READ circuit to support it.
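The argument can be made explicit by solving equation (11) for a constant CS signal. The sketch below assumes that S_k > 0 is constant and that the on-response stays at a constant level \bar{x}_5 >= 0 while the CS is on; the bar notation is introduced here for illustration only.

```latex
\begin{align}
\frac{dw_{k7}}{dt} &= S_k\bigl(-q\,w_{k7} + r\,\bar{x}_5\bigr), \\
w_{k7}(t) &= w^{*} + \bigl(w_{k7}(0) - w^{*}\bigr)\,e^{-q S_k t},
\qquad w^{*} = \frac{r\,\bar{x}_5}{q}.
\end{align}
```

The weight relaxes to w* rather than to zero. Since a conditioned CS keeps the on-channel active through the feedback path, \bar{x}_5 > 0 and therefore w* > 0, so the association is never fully dropped. When the CS is off (S_k = 0) the weight does not change at all.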

 

4. A Modification of the READ Circuit to Model Extinction

         

The READ circuit supports extinction in neither excitatory nor inhibitory conditioning. Inhibitory conditioning, as defined in Section 2, does not involve expectation learning. Therefore one cannot talk about the nonoccurrence of an expected event, and extinction is not supposed to occur.

         

As explained in the previous section, extinction does not take place in the READ circuit because, after conditioning, the CS activates the on-channel in exactly the same way that the US does (only with a slightly smaller activation). Therefore it is not possible to differentiate between the US and a previously conditioned CS. This suggests that another level of neurons is needed to make this differentiation.

         

A modified version of the READ circuit is given in Figure 4, in which node xe1 makes this differentiation possible. Node xe2 has been added solely to preserve the symmetry of the circuit.

 

             

 

Figure 4:  Modified READ circuit that handles extinction

 

 

The additional and modified differential  equations defining the operation of the modified circuit are given below:

 

          New nodes:

                    $\frac{dx_{e1}}{dt} = -a x_{e1} + I + J$                              (15)

                    $\frac{dx_{e2}}{dt} = -a x_{e2} + I$                                  (16)

          Modified equations:

                    $\frac{dx_1}{dt} = -a x_1 + x_{e1} + f(x_7)$                          (17)

                    $\frac{dx_2}{dt} = -a x_2 + x_{e2} + f(x_8)$                          (18)

                    $\frac{dw_{k7}}{dt} = S_k(-q\,w_{k7} + r[x_5\,q(x_{e1} - I)]^+)$      (19)

 

As can be observed from the modified equations given above, during the conditioning phase, the operation of the circuit is identical to that of the original READ circuit. However, wk7 decays each time the activation of xe1 is below a threshold slightly higher than the bias input; i.e., wk7 decays when the CS is present and the US is not. This results in extinction in case of the nonoccurrence of the expected US.
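The two regimes of the modified learning law can be written out case by case. The sketch assumes that the factor applied to x_5 inside the brackets of equation (19) acts as a gate G that is 1 when x_{e1} exceeds the bias input by a small margin and 0 otherwise; the symbol G and this reading are assumptions made for illustration, since the exact form of the factor is not spelled out in the text.

```latex
\begin{align}
\text{CS and US present } (G = 1,\; x_5 > 0): \quad
  \frac{dw_{k7}}{dt} &= S_k\bigl(-q\,w_{k7} + r\,x_5\bigr)
  \quad \text{(same as equation (11))} \\
\text{CS present, US absent } (G = 0): \quad
  \frac{dw_{k7}}{dt} &= -q\,S_k\,w_{k7}
  \;\;\Rightarrow\;\;
  w_{k7}(t) = w_{k7}(0)\,e^{-q S_k t}
\end{align}
```

Under this reading the weight decays exponentially toward zero for as long as the CS is presented without the US, which is the extinction behaviour the modification is meant to provide.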

 

 

4.1. Experimental Results

 

Figure 5 contains the simulation results of primary excitatory conditioning with the READ circuit in Figure 3. The on-response and off-response are identical to the positive portions of x5 and x6, and are therefore not separately plotted.

 

In part (a), 10 trials are made in which the CS is presented for 200 time units, and the US is also presented in the last 40 of these. One can observe the growth in the response of x5 at the presentation of the CS at each trial. Part (b) of the same figure shows the 10 following trials during which only the CS is presented. At each of the 10 trials the on-response has the same amplitude; in fact, there is no change in wk7 either.

         

Figure 6 contains the results of the same experiment performed by using the modified READ circuit in Figure 4. During the conditioning phase, the circuit experiences the nonoccurrence of an expected event at the beginning of each presentation of the CS, and therefore wk7 decays slightly, but when the US is presented, the weight recovers from this decay. This slows down the learning process slightly, but not significantly. After the learning phase, the on-response generated by the CS alone decreases each time the US does not arrive, and approaches zero at the end of 10 trials. Thus extinction takes place.
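The experiment described above (10 CS+US trials followed by 10 CS-only trials) can be sketched for both circuits with forward-Euler integration. This is a rough illustration under stated assumptions, not a reproduction of Figures 5 and 6: f and g are taken to be rectification, the conditioned input p·Sk·wk7 is taken as excitatory as the feedback description in Section 3.2 requires, the factor on x5 in equation (19) is read as a 0/1 gate with a small margin EPS, a 200-unit gap is inserted between trials, and all parameter values and function names are illustrative.

```python
def relu(x):
    return x if x > 0.0 else 0.0

# Illustrative parameter values (assumptions, not the authors' settings).
A, B, C, E, H, K = 1.0, 0.02, 0.02, 1.0, 1.0, 1.0
M, P, Q, R, I_BIAS, EPS = 0.5, 1.0, 0.005, 0.05, 1.0, 0.05
STEPS_PER_UNIT = 10                 # Euler steps per time unit
DT = 1.0 / STEPS_PER_UNIT

def rest_state():
    # Resting equilibrium of both circuits for the parameters above.
    return {"x1": 1.0, "x2": 1.0, "w1": 0.5, "w2": 0.5, "x3": 0.5, "x4": 0.5,
            "x5": 0.0, "x6": 0.0, "x7": 0.0, "x8": 0.0,
            "wk7": 0.0, "wk8": 0.0, "xe1": 1.0, "xe2": 1.0}

def euler_step(s, J, S, modified):
    d = {}
    if modified:
        d["xe1"] = -A * s["xe1"] + I_BIAS + J                     # (15)
        d["xe2"] = -A * s["xe2"] + I_BIAS                         # (16)
        d["x1"] = -A * s["x1"] + s["xe1"] + relu(s["x7"])         # (17)
        d["x2"] = -A * s["x2"] + s["xe2"] + relu(s["x8"])         # (18)
        # Assumed gate: open only while xe1 is clearly above the bias input.
        gate = 1.0 if s["xe1"] - I_BIAS > EPS else 0.0
    else:
        d["xe1"] = d["xe2"] = 0.0                                 # nodes unused
        d["x1"] = -A * s["x1"] + I_BIAS + J + relu(s["x7"])       # (1)
        d["x2"] = -A * s["x2"] + I_BIAS + relu(s["x8"])           # (2)
        gate = 1.0
    d["w1"] = B * (1.0 - s["w1"]) - C * relu(s["x1"]) * s["w1"]   # (3)
    d["w2"] = B * (1.0 - s["w2"]) - C * relu(s["x2"]) * s["w2"]   # (4)
    d["x3"] = -A * s["x3"] + E * relu(s["x1"]) * s["w1"]          # (5)
    d["x4"] = -A * s["x4"] + E * relu(s["x2"]) * s["w2"]          # (6)
    d["x5"] = -A * s["x5"] + (H - s["x5"]) * s["x3"] - (s["x5"] + K) * s["x4"]  # (7)
    d["x6"] = -A * s["x6"] + (H - s["x6"]) * s["x4"] - (s["x6"] + K) * s["x3"]  # (8)
    d["x7"] = -A * s["x7"] + M * relu(s["x5"]) + P * S * s["wk7"]  # (9)
    d["x8"] = -A * s["x8"] + M * relu(s["x6"]) + P * S * s["wk8"]  # (10)
    d["wk7"] = S * (-Q * s["wk7"] + R * relu(s["x5"]) * gate)      # (11) or (19)
    d["wk8"] = S * (-Q * s["wk8"] + R * relu(s["x6"]))             # (12)
    return {name: s[name] + DT * d[name] for name in s}

def run_trials(s, n_trials, with_us, modified):
    # One trial: CS for 200 units, US (if any) in the last 40, then a 200-unit gap.
    for _ in range(n_trials):
        for i in range(400 * STEPS_PER_UNIT):
            t = i / STEPS_PER_UNIT
            S = 1.0 if t < 200 else 0.0
            J = 1.0 if (with_us and 160 <= t < 200) else 0.0
            s = euler_step(s, J, S, modified)
    return s

def experiment(modified):
    """Return wk7 after conditioning and after the CS-only trials."""
    s = run_trials(rest_state(), 10, True, modified)
    w_conditioned = s["wk7"]
    s = run_trials(s, 10, False, modified)
    return w_conditioned, s["wk7"]
```

Calling `experiment(False)` and `experiment(True)` gives the weight wk7 at the end of each phase for the original and the modified circuit. In this sketch the original circuit keeps a sizeable weight after the CS-only trials, whereas in the modified circuit the weight decays towards zero, which is the qualitative difference the text describes.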

         

One phenomenon that must be mentioned here is secondary conditioning. The effect of the modifications on secondary conditioning will be as follows: Assume that CS1 has been conditioned by using the US. When CS1 and CS2 are then presented as in Figure 1(c), secondary conditioning will take place to a lesser and lesser degree each time CS1 is not followed by the US. Then, the extinction of the weight corresponding to CS2 will be dependent on the occurrence of the US and not of CS1. This is a desirable property, since, for instance in the case of Pavlov's experiment, the useful expectation to be learned is that the light will be followed by food and not that it will be followed by the sound of a tuning fork.

 

Therefore this circuit implements secondary conditioning in a much more practical way than the original READ circuit: the dog learns to salivate at the light through the light-sound pairing only as long as the light is followed by the sound, which in turn is mostly followed by food.

 

5. Conclusion

                 

Extinction is an important property in animal learning, since it increases the adaptive capabilities of the animal by allowing it to drop responses that are not useful any more. The modification made to the READ circuit in this paper handles extinction as well as all the phenomena  modelled by the READ circuit.

         

While the modified circuit handles extinction, there is another psychological phenomenon that remains to be modelled, namely spontaneous recovery [2]. In experiments performed with animals, if time elapses after an extinction session like the one in Figure 6(b), the strength of the conditioned response is found to have recovered. The amount of recovery increases with the time interval between extinction sessions. However, if more extinction sessions are carried out, spontaneous recovery decreases and finally disappears. Modelling spontaneous recovery is the next goal on this subject.

 

References

 

1.  Baloch, A.A. and Waxman, A.M., "Visual learning, adaptive expectations, and behavioural conditioning of the mobile robot MAVIN", Neural Networks, Vol.4, pp.271-302, 1991

2.  Hulse, S.H., Egeth, H., and Deese, J., The Psychology of Learning, McGraw-Hill, 1980

3.  Grossberg, S. and Schmajuk, N.A., "Neural dynamics of attentionally modulated Pavlovian conditioning: conditioned reinforcement, inhibition, and opponent processing", Psychobiology, Vol.15, pp.195-240, 1987

4.  Grossberg, S., "A neural network architecture for Pavlovian conditioning: reinforcement, attention, forgetting, timing", in Neural Network Models of Conditioning and Action, ed. Commons, M.L., Grossberg, S., Staddon, J.E.R., Lawrence Erlbaum Assoc., 1991

5.  Levine, D.S., Neural and Cognitive Modelling, Lawrence Erlbaum Assoc., 1991

 


 




[1] This work is being supported by TUBITAK under grant EEEAG-126, Project: Modelling Cognitive Processes by Artificial Neural Networks.