Adaptive bidirectional associative memories

Abstract

Bidirectionality, forward and backward information flow, is introduced in neural networks to produce two-way associative search for stored stimulus-response associations (Ai,Bi). Two fields of neurons, FA and FB, are connected by an n × p synaptic matrix M. Passing information through M gives one direction, passing information through its transpose M^T gives the other. Every matrix is bidirectionally stable for bivalent and for continuous neurons. Paired data (Ai,Bi) are encoded in M by summing bipolar correlation matrices. The bidirectional associative memory (BAM) behaves as a two-layer hierarchy of symmetrically connected neurons. When the neurons in FA and FB are activated, the network quickly evolves to a stable state of two-pattern reverberation, or pseudoadaptive resonance, for every connection topology M. The stable reverberation corresponds to a system energy local minimum. An adaptive BAM allows M to rapidly learn associations without supervision. Stable short-term memory reverberations across FA and FB gradually seep pattern information into the long-term memory connections M, allowing input associations (Ai,Bi) to dig their own energy wells in the network state space. The BAM correlation encoding scheme is extended to a general Hebbian learning law. Then every BAM adaptively resonates in the sense that all nodes and edges quickly equilibrate in a system energy local minimum. A sampling adaptive BAM results when many more training samples are presented than there are neurons in FA and FB, but presented for brief pulses of learning, not allowing learning to fully or nearly converge. Learning tends to improve with sample size. Sampling adaptive BAMs can learn some simple continuous mappings and can rapidly abstract bivalent associations from several noisy gray-scale samples.

© 1987 Optical Society of America

I. Introduction: Storing Data Pairs in Associative Memory Matrices

An n × p real matrix M can be interpreted as a matrix of synapses between two fields of neurons. The input or bottom-up field FA consists of n neurons {a1, …, an}. The output or top-down field FB consists of p neurons {b1, …, bp}. The neurons ai and bj are the units of short-term memory (STM). For convenience, we shall use ai and bj to indicate both neuron names and neuron states. Matrix entry mij is the synaptic connection from ai to bj. It is the unit of long-term memory (LTM). The sign of mij determines the type of synaptic connection: excitatory if mij > 0, inhibitory if mij < 0. The magnitude of mij determines the strength of the connection. A real n-dimensional row vector A represents a state of FA, an STM pattern of activity across the neurons a1, …, an. A real p-dimensional row vector B represents a state of FB. An associative memory is any vector space transformation T: R^n → R^p. Usually T is nonlinear. The matrix mapping M: R^n → R^p is a linear associative memory. When FA and FB are distinct, M is a heteroassociative memory. It stores vector data pairs (Ai,Bi). In the special case when FA = FB, M is an autoassociative memory. It stores data vectors Ai.

Recall proceeds through vector-matrix multiplication and nonlinear state transition. The p-vector A M is a fan-in vector of input sums to the neurons in FB: A M = (Ib1, …, Ibp). Specifically, each neuron ai fans out its numeric output ai across each synaptic pathway mij, sending the gated product ai mij to each neuron bj in FB. Each neuron bj receives a fan-in of n gated products ai mij, arriving independently and perhaps asynchronously, and sums them to compute its input Ibj = a1 m1j + … + an mnj. Neuron bj processes input Ibj to produce the output signal S(Ibj). In general the signal function S is nonlinear, usually sigmoidal or S-shaped. The associative memory M recalls the vector of output signals [S(Ib1), …, S(Ibp)] when presented with input key A. In the simplest associative memories, linear associative memories, each neuron's output signal is simply its input signal: S(Ibj) = Ibj. Then associative recall is simply vector-matrix multiplication: B = A M.

What is the simplest way to store m data pairs (A1,B1), (A2,B2), …, (Am,Bm) in an n × p associative memory matrix M? The simplest storage procedure is to convert each association (Ai,Bi) into an n × p matrix Mi, then combine the association matrices Mi pointwise. The simplest pointwise combination technique is addition: M = M1 + … + Mm. The simplest operation for converting two row vectors Ai and Bi of dimensions n and p into an n × p matrix Mi is the vector outer product Ai^T Bi. So the simplest way to store the m pairs (Ai,Bi) is to sum outer-product or correlation matrices:

$$M = A_1^T B_1 + \cdots + A_m^T B_m. \qquad (1)$$
This is the familiar storage method used in the theory of linear associative memories, studied by Kohonen[1],[2] and Anderson et al.[3] If the input patterns A1, …, Am are orthonormal—Ai Aj^T = 1 if i = j, 0 if not—perfect recall of the associated output patterns {B1, …, Bm} is achieved in the forward direction:
$$A_i M = A_i A_i^T B_i + \sum_{j \neq i} (A_i A_j^T) B_j = B_i. \qquad (2)$$
If A1, …, Am are not orthonormal, as in general they are not, the second term on the right-hand side of Eq. (2), the noise term, contributes crosstalk to the recalled pattern by additively modulating the signal term. More generally, as Kohonen[2] has shown, the least-squares optimal linear associative memory (OLAM) M is given by M = A* B, where A is the m × n matrix whose ith row is Ai, B is the m × p matrix whose ith row is Bi, and A* is the Moore-Penrose pseudoinverse of A. If {A1, …, Am} are orthonormal, the OLAM is M = A^T B, which is equivalent to the memory scheme in Eq. (1).
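As an illustration of the two storage rules above, the following sketch (written here in Python with NumPy, which is not part of the original paper) builds M by summing outer products and, alternatively, by the pseudoinverse OLAM. The tiny orthonormal patterns are hypothetical.

```python
import numpy as np

# Hypothetical orthonormal input rows A_i and paired output rows B_i.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])        # m = 2 patterns, n = 3
B = np.array([[1.0, -1.0],
              [-1.0, 1.0]])            # m = 2 patterns, p = 2

# Correlation (outer-product) storage, Eq. (1): M = sum_i A_i^T B_i.
M = A.T @ B

# With orthonormal keys, forward recall is exact, Eq. (2).
print(A[0] @ M)                        # [ 1. -1.]  = B_1

# Kohonen's least-squares OLAM uses the Moore-Penrose pseudoinverse of A.
M_olam = np.linalg.pinv(A) @ B
print(A[1] @ M_olam)                   # [-1.  1.]  = B_2
```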

II. Discrete Bidirectional Associative Memory (BAM) Stability

Suppose we wish to synchronously feed back the recalled output B to an associative memory M to improve recall accuracy. The recalled output B is some nonlinear transformation S of the input sum A M: B = S(A M) = [S(A M1), …, S(A Mp)], where Mj is the jth column of M. What is the simplest way to feed B back to the associative memory? Since M has dimensions n × p and B is a p-vector, B cannot vector multiply M, but it can multiply the matrix transpose (adjoint) M^T. Thus the simplest feedback scheme is to pass B backward through M^T. Any other feedback scheme requires more information in the form of a p × n matrix N different from M^T. Field FA receives the top-down message B M^T and produces the new STM pattern A′ = S(B M^T) = [S(B M1^T), …, S(B Mn^T)] across FA, where Mi is the ith row (column) of M (M^T). Carpenter[4] and Grossberg[5]–[9] interpret top-down signals as expectations in their adaptive resonance theory (ART). Intuitively, A′ is what the field FB expects to see across FA when it receives bottom-up input B.

If A′ is fed back through M, a new B′ results, which can be fed back through M^T to produce A″, and so on. Ideally this back-and-forth flow of distributed information will quickly equilibrate or resonate on a fixed data pair (Af,Bf):

$$
\begin{aligned}
A &\to M \to B, \\
A' &\leftarrow M^T \leftarrow B, \\
A' &\to M \to B', \\
A'' &\leftarrow M^T \leftarrow B', \\
&\;\;\vdots \\
A_f &\to M \to B_f, \\
A_f &\leftarrow M^T \leftarrow B_f.
\end{aligned}
$$
If an associative memory matrix M equilibrates in this fashion for every input pair (A,B), then M is said to be bidirectionally stable.[10],[11]

Which matrices are bidirectionally stable for which signal functions S? Linear associative memory matrices are obviously in general not bidirectionally stable. We shall limit our discussion to sigmoidal or S-shaped signal functions S, such as S(x) = (1 + e^{−x})^{−1}, or more generally, to bounded monotone increasing signal functions. Grossberg[12] long ago showed that this is not a limitation at all. He proved that, roughly speaking, a sigmoidal signal function is optimal in the sense that, in unidirectional competitive networks, it computes a quenching threshold below which neural activity is suppressed as noise and above which activity is contrast enhanced and then stored as a stable reverberation in STM. In particular, linear signal functions amplify noise as faithfully as they amplify signals. This theoretical fact reflects the evolutionary fact that real neuron firing frequency is sigmoidal.

First we consider bivalent, or McCulloch-Pitts,[13] neurons. Each neuron ai and bj is either on (+1) or off (0 or −1) at any time. Hence a state A of FA is a point in the Boolean n-cube B^n = {0,1}^n or {−1,1}^n. A state B of FB is a point in B^p = {0,1}^p or {−1,1}^p. A state of the bidirectional associative memory (BAM) (FA, M, FB) is a point (A,B) in the bivalent product space B^n × B^p. Topologically, a BAM can be viewed as a two-layer hierarchy of symmetrically connected fields: FA joined to FB forward through M and backward through M^T.

What is the simplest signal function S for a bivalent BAM (FA, M, FB)? The simplest S is a threshold function:

$$a_i = \begin{cases} 1 & \text{if } B M_i^T > 0, \\ 0 & \text{if } B M_i^T < 0, \end{cases} \qquad (3)$$
$$b_j = \begin{cases} 1 & \text{if } A M_j > 0, \\ 0 & \text{if } A M_j < 0, \end{cases} \qquad (4)$$
where once again Mi is the ith row (column) of M (M^T) and Mj is the jth column (row) of M (M^T). If the input sum to a neuron equals its threshold 0, the neuron maintains its current state: it stays on if it already is on, off if already off. For simplicity, each neuron has threshold 0 and no external inputs. In general, ai has a numeric threshold Ti and constant numeric input Ii; bj has threshold Sj and input Jj. A bivalent BAM is then specified by the vector 7-tuple (FA, T, I, M, FB, S, J), and the threshold laws (3) and (4) are modified accordingly; e.g., ai = 1 if B Mi^T + Ii > Ti.
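The threshold laws (3) and (4) are easy to state in code. The sketch below is a hedged illustration in Python/NumPy, not the paper's implementation: it runs synchronous forward and backward passes with zero thresholds, holds a neuron's previous state on a zero input sum, and uses an arbitrary zero start for FB and an arbitrary iteration cap.

```python
import numpy as np

def bam_recall(A, M, max_iters=50):
    """Synchronous bivalent BAM recall using threshold laws (3) and (4)
    with zero thresholds; a neuron holds its state when its input sum is 0."""
    A = np.asarray(A, dtype=float)
    B = np.zeros(M.shape[1])                  # FB starts at rest (illustrative choice)
    for _ in range(max_iters):
        in_b = A @ M                          # fan-in sums to FB, law (4)
        B = np.where(in_b > 0, 1.0, np.where(in_b < 0, 0.0, B))
        in_a = B @ M.T                        # fan-in sums to FA, law (3)
        A_new = np.where(in_a > 0, 1.0, np.where(in_a < 0, 0.0, A))
        if np.array_equal(A_new, A):
            return A, B                       # bidirectionally stable pair (A_f, B_f)
        A = A_new
    return A, B
```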

Which matrices M are bidirectionally stable for bivalent BAMs? All matrices. Every synaptic connection topology rapidly equilibrates, no matter how large the dimensions n and p. This surprising theorem is proved in [Refs. 11] and [14] and generalizes the well-known unidirectional stability of autoassociative networks with square symmetric M, as popularized by Hopfield[15] and reviewed below. Bidirectionality, forward and backward information flow, in neural nets produces two-way associative search for the stored pair (Ai,Bi) nearest to an input key. Since every matrix is bidirectionally stable, many more matrices can be decoded than those in which information has been deliberately encoded.

When the BAM neurons are activated, the network quickly evolves to a stable state of two-pattern reverberation, or nonadaptive resonance.[4],[7] The resonance is nonadaptive because no learning occurs. The weights mij are fixed. This behavior approximates equilibrium behavior in a learning context since changes in the synapses (LTM traces) mij are invariably slower than changes in the neuron activations (STM traces) ai and bj. Below we shall exploit this property to construct adaptive BAMs.

The stable reverberation corresponds to a system energy local minimum. Geometrically, an input pattern is placed on the BAM energy surface as a ball bearing in the bivalent product space Bn × Bp. In particular, the bipolar correlation encoding scheme described below sculpts the energy surface so that the data pairs (Ai, Bi) are stored as local energy minima. The input ball bearing rolls down into the nearest basin of attraction, dissipating energy as it rolls. Frictional damping brings it to rest at the bottom of the energy well, and the pattern is classified or misclassified accordingly. Thus the BAM behaves as a programmable dissipative dynamic system.

For completeness we review the proof[10],[11] that every matrix is bivalently bidirectionally stable. The proof technique is to show that some system functional E: B^n × B^p → R is a Lyapunov function or bounded monotone decreasing energy function for the network. The energy function decreases if state changes occur. System stability occurs when the functional E rapidly attains its lower bound, where it stays forever. Lyapunov functionals provide a shortcut to the global analysis of nonlinear dynamic systems, sidestepping the often hopeless task of solving the many coupled nonlinear difference or differential equations. The most general Lyapunov stability result is the Cohen-Grossberg theorem[16] for symmetric unidirectional autoassociators, which we extend in this and the next section to arbitrary bidirectional heteroassociators. The Lyapunov trick of the Cohen-Grossberg theorem is to substitute the neuron state-transition equations into the derivative of the appropriate energy function and then use a sign argument to show that the derivative is always nonpositive. Hopfield[15] used the discrete version of this Lyapunov trick to show that zero-diagonal symmetric unidirectional autoassociators are stable for asynchronous or serial state changes, i.e., where at any moment at most one neuron changes state. The argument we now present subsumes this case when FA = FB and M = M^T in simple asynchronous operation. An appropriate measure of the energy of the bivalent state (A,B) is the sum (average) of two energies: the energy A M B^T of the forward pass and the energy B M^T A^T of the backward pass. Taking the negative of these quadratic forms gives

$$E(A,B) = -\tfrac{1}{2} A M B^T - \tfrac{1}{2} B M^T A^T = -A M B^T = -\sum_i \sum_j a_i b_j m_{ij}, \qquad (5)$$
provided all thresholds Ti = Sj = 0 and inputs Ii = Jj = 0, which we shall assume for simplicity. In general the appropriate energy function includes thresholds and inputs linearly:
$$E(A,B) = -A M B^T - I A^T + T A^T - J B^T + S B^T.$$

BAM convergence is proved by showing that synchronous or asynchronous state changes decrease the energy and that the energy is bounded below, so the BAM monotonically gravitates to fixed points. E is trivially bounded below for all A and B:

$$E(A,B) \ge -\sum_i \sum_j |m_{ij}|.$$

Synchronous vs asynchronous state changes must be clarified. Synchronous behavior occurs when all or some neurons within a field change their state at the same clock cycle. Asynchronous behavior is a special case. Simple asynchronous behavior occurs when only one neuron per field changes state per cycle. Subset asynchronous behavior occurs when some proper subset of neurons within a field changes state per cycle. These definitions of asynchrony are cross sectional. The resultant time-series interpretation of asynchronous behavior is that each neuron in a field randomly and independently changes state, converting the BAM network into a stochastic process. In the proof below we do not assume that changes occur concurrently in the two fields FA and FB. Otherwise, in principle the energy function might increase. Examination of the argument below shows, though, that this is very unlikely in large networks since so many additive terms in the energy differential are always negative. In any event, the BAM model of back-and-forth information flow we have been developing implicitly assumes that state changes are occurring in at most one field FA or FB at a time. Further, the Lyapunov argument below shows that synchronous operation produces sums of pointwise (neuronwise) energy changes that can be large. In practice this means synchronous updates produce much faster convergence than asynchronous updates.

First we consider state changes in field FA. A similar argument will hold for changes in FB. Field FA change is denoted by ΔA = A2A1 = (Δa1, …, Δan) and energy change by ΔE = E2E1. Hence Δai = −1, 0, or +1 for a binary neuron. Then

$$\Delta E = -\Delta A\, M B^T = -\sum_i \Delta a_i \sum_j b_j m_{ij} = -\sum_i \Delta a_i\, B M_i^T.$$
We need only consider nonzero state changes. If Δai > 0, the state transition law (3) above implies B Mi^T > 0. If Δai < 0, Eq. (3) implies B Mi^T < 0. Hence state change and input sum agree in sign. Hence their product is positive: Δai B Mi^T > 0. Hence ΔE < 0. Similarly, the sign law (4) for bj implies ΔE = −A M ΔB^T < 0. Since M was an arbitrary n × p real matrix, this proves that every matrix is bivalently bidirectionally stable.
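A quick numerical spot check of this Lyapunov argument can be made by iterating laws (3) and (4) on an arbitrary matrix and confirming that the energy of Eq. (5) never increases. The sketch below is illustrative only; the dimensions and random seed are arbitrary.

```python
import numpy as np

def bam_energy(A, M, B):
    # Bivalent BAM energy, Eq. (5): E(A, B) = -A M B^T.
    return -float(A @ M @ B)

# Arbitrary matrix and start state (every matrix is bidirectionally stable).
rng = np.random.default_rng(0)
M = rng.integers(-3, 4, size=(8, 5)).astype(float)
A = rng.integers(0, 2, size=8).astype(float)
B = rng.integers(0, 2, size=5).astype(float)

E_prev = bam_energy(A, M, B)
for _ in range(20):
    in_b = A @ M
    B = np.where(in_b > 0, 1.0, np.where(in_b < 0, 0.0, B))     # law (4)
    in_a = B @ M.T
    A = np.where(in_a > 0, 1.0, np.where(in_a < 0, 0.0, A))     # law (3)
    E = bam_energy(A, M, B)
    assert E <= E_prev + 1e-9                 # the energy never increases
    E_prev = E
```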

III. BAM Correlation Encoding

Which BAM matrix M best encodes m binary pairs (Ai,Bi)? The correlation encoding scheme in Eq. (1) suggests adding the outer-product matrices Ai^T Bi pointwise, at least to facilitate forward recall. Will this work for backward recall? The linearity of the transpose operator implies that it will:

$$M^T = (A_1^T B_1)^T + \cdots + (A_m^T B_m)^T = B_1^T A_1 + \cdots + B_m^T A_m.$$
However, the additive scheme (1) implies that if we use only binary vectors, M will contain no inhibitory synapses. So the input sums B Mi^T and A Mj can never be negative. So the state transition laws (3) and (4) imply that ai = bj = 1 once ai and bj turn on, which they probably will after the first update. Exceptions can occur for initial null vectors or a null matrix M, when ai = bj = 0.

Bipolar state vectors do not produce this problem. Suppose (Xi,Yi) is the bipolar version of the binary pair (Ai,Bi), i.e., binary zeros are replaced with minus ones: Xi = 2Ai − I and Yi = 2Bi − I, where I is the vector of n or p ones. Then the ijth entry of Xk^T Yk is excitatory (+1) if the vector elements xi^k and yj^k agree in sign, inhibitory (−1) if they disagree in sign. This is simple conjunctive or Hebbian correlation learning. Thus the sum M of bipolar outer-product matrices

$$M = X_1^T Y_1 + \cdots + X_m^T Y_m \qquad (8)$$
naturally weights the excitatory and inhibitory connections. Multiplying M or M^T by binary or bipolar vectors produces input sums of differing signs, so Eqs. (3) and (4) are not trivialized.

Note that to encode m binary vectors A1, …, Am in a unidirectional autoassociative memory matrix, Eq. (8) reduces to the symmetric matrix X1^T X1 + ⋯ + Xm^T Xm, which is the storage mechanism used by Hopfield[15] (who also zeros the main diagonal to improve recall). Note also that the pair (Ai,Bi) can be unlearned or forgotten (erased) by summing −Xi^T Yi or, equivalently, by encoding (Ai^c,Bi) or (Ai,Bi^c), since bipolar complements are given by Xi^c = −Xi and Yi^c = −Yi. Equation (8) allows data to be read, written, or erased from memory. Further, (Xi^c)^T Yi^c = Xi^T Yi, so storing (Ai,Bi) through Eq. (8) implies storing (Ai^c,Bi^c) as well.
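A minimal encoding sketch along the lines of Eq. (8), with binary-to-bipolar conversion and the unlearning (erasure) rule just described; the helper names are hypothetical, not the paper's.

```python
import numpy as np

def bipolar(v):
    # Binary {0,1} -> bipolar {-1,+1}: X = 2A - I.
    return 2.0 * np.asarray(v, dtype=float) - 1.0

def encode(pairs):
    # Bipolar correlation encoding, Eq. (8): M = sum_i X_i^T Y_i.
    n, p = len(pairs[0][0]), len(pairs[0][1])
    M = np.zeros((n, p))
    for A, B in pairs:
        M += np.outer(bipolar(A), bipolar(B))
    return M

def erase(M, A, B):
    # Unlearn (A, B) by summing -X^T Y; equivalent to encoding (A^c, B) or (A, B^c).
    return M - np.outer(bipolar(A), bipolar(B))
```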

Strictly speaking, bipolar correlation learning laws such as Eq. (8) can be biologically implausible. They imply that synapses can change character from excitatory to inhibitory, or inhibitory to excitatory, with successive experience. This is seldom observed with real synapses. However, when the number of stored patterns m is fairly large, |mij| tends to be well above zero. So the addition or deletion of relatively few patterns does not on average change the sign of mij.

Is it better to use binary or bipolar state vectors for recall from Eq. (8)? In [Ref. 10] we prove that bipolar coding is better on average. Much of the argument can be seen from the properties of the bipolar signal–noise expansion

$$X_i M = (X_i X_i^T) Y_i + \sum_{j \neq i} (X_i X_j^T) Y_j = n Y_i + \sum_{j \neq i} (X_i X_j^T) Y_j = \sum_j c_{ij} Y_j, \qquad (9)$$
where cij = cji = Xi Xj^T.

The cij are correction coefficients. Ideally the cij will behave in sign and magnitude so as to move Yj closer to Yi, giving Yj more positive weight the closer Yj is to Yi. Then the right-hand side of Eq. (9) will tend to equal a positive multiple of Yi and thus threshold to Yi or Bi. When the input X is nearer Xi than all other Xj, the subsequent output Y should tend to be nearer Yi than all other Yj. When Y is fed back through M^T, the output X′ should tend to be even closer to Xi than X was, and so on. Combining this argument with the signal–noise expansion (9) and its transpose-based backward analog, we obtain an estimate of the BAM storage capacity for reliable recall: m < min(n,p). No more data pairs can be stored and accurately recalled than the lesser of the vector dimensions used.

This analysis explains much BAM behavior without Lyapunov techniques. However, such accurate decoding implicitly assumes that if stored input patterns are close, stored output patterns are close. Specifically we make the continuity assumption:

$$\frac{1}{n} H(A_i,A_j) \sim \frac{1}{p} H(B_i,B_j), \qquad (10)$$
where H(·,·) denotes Hamming or l^1 distance. This is an implicit assumption of continuous mapping networks. When a data set substantially violates it, as in the parity mapping, which indicates whether there is an even or odd number of ones in a bit vector, supervised learning techniques such as backward error propagation[17]–[20] are preferable.

Do the correction coefficients cij behave as desired? They do, when (10) holds, in the sense that they naturally connect bipolar and binary spaces:

$$c_{ij} \gtrless 0 \quad \text{iff} \quad H(A_i,A_j) \lessgtr n/2. \qquad (11)$$
Expression (11) follows from
$$c_{ij} = X_i X_j^T = (\text{number of common elements}) - (\text{number of different elements}) = [\,n - H(A_i,A_j)\,] - H(A_i,A_j) = n - 2\,H(A_i,A_j).$$
If Aj is more than half the space away, so to speak, from Ai, and thus by (10) if Bj is approximately more than half the space away from Bi, the negative sign of cij corrects Yj by converting it to Yj^c, which is a better approximation of Yi since Bj^c is approximately less than half the space away from Bi. The magnitude of cij then further corrects Yj, directly approaching the maximum signal amplification factor n as H(Bi,Bj^c) approaches 0. If Aj is less than half the space away from Ai, then cij > 0 and cij approaches n as H(Bi,Bj) approaches 0. If Aj is equidistant between Ai and Ai^c, then cij = 0. Finally, bipolar coding of state vectors is better on average than binary coding in the sense that on average
$$A_i X_j^T \lessgtr c_{ij} \quad \text{iff} \quad H(A_i,A_j) \lessgtr n/2$$
tends to hold. So on average the cij correct better in magnitude than the mixed coefficients Ai Xj^T, and sometimes the mixed coefficients can have the wrong sign.
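The identity cij = n − 2H(Ai,Aj) behind (11) is easy to confirm numerically; the sketch below checks it on random binary vectors (dimension and seed are arbitrary).

```python
import numpy as np

def hamming(a, b):
    return int(np.sum(np.asarray(a) != np.asarray(b)))

rng = np.random.default_rng(1)
n = 16
Ai = rng.integers(0, 2, size=n)
Aj = rng.integers(0, 2, size=n)
Xi, Xj = 2 * Ai - 1, 2 * Aj - 1                  # bipolar versions
assert int(Xi @ Xj) == n - 2 * hamming(Ai, Aj)   # c_ij = n - 2 H(A_i, A_j)
```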

Consider a simple example. Suppose we wish to store two pairs given by

$$A_1 = (1\ 0\ 1\ 0\ 1\ 0), \quad B_1 = (1\ 1\ 0\ 0), \qquad A_2 = (1\ 1\ 1\ 0\ 0\ 0), \quad B_2 = (1\ 0\ 1\ 0).$$
Note that the vectors are nonorthogonal and that the continuity assumption (10) holds since (1/6)H(A1,A2) = 1/3 ∼ 1/2 = (1/4)H(B1,B2). Convert these binary pairs to bipolar pairs:
$$X_1 = (1\ {-1}\ 1\ {-1}\ 1\ {-1}), \quad Y_1 = (1\ 1\ {-1}\ {-1}), \qquad X_2 = (1\ 1\ 1\ {-1}\ {-1}\ {-1}), \quad Y_2 = (1\ {-1}\ 1\ {-1}).$$
Convert the bipolar pairs (Xi,Yi) to correlation matrices Xi^T Yi:
$$X_1^T Y_1 = \begin{pmatrix} 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & 1 \end{pmatrix}, \qquad X_2^T Y_2 = \begin{pmatrix} 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & -1 \\ -1 & 1 & -1 & 1 \\ -1 & 1 & -1 & 1 \\ -1 & 1 & -1 & 1 \end{pmatrix}.$$
Then M is given by M = X1^T Y1 + X2^T Y2:
$$M = \begin{pmatrix} 2 & 0 & 0 & -2 \\ 0 & -2 & 2 & 0 \\ 2 & 0 & 0 & -2 \\ -2 & 0 & 0 & 2 \\ 0 & 2 & -2 & 0 \\ -2 & 0 & 0 & 2 \end{pmatrix}.$$
Then, using binary vectors for recall for ease of computing, we see that
$$A_1 M = (4\ 2\ {-2}\ {-4}) \to (1\ 1\ 0\ 0) = B_1, \qquad A_2 M = (4\ {-2}\ 2\ {-4}) \to (1\ 0\ 1\ 0) = B_2,$$
on using the threshold signal function (4) and, on using Eq. (3),
$$B_1 M^T = (2\ {-2}\ 2\ {-2}\ 2\ {-2}) \to (1\ 0\ 1\ 0\ 1\ 0) = A_1, \qquad B_2 M^T = (2\ 2\ 2\ {-2}\ {-2}\ {-2}) \to (1\ 1\ 1\ 0\ 0\ 0) = A_2.$$
The use of synchronous updates, combined with satisfying the continuity assumption and the memory capacity constraint [2 < min(6,4)], produced instant convergence to the local energy minima E(A1,B1) = −A1 M B1^T = −(4 2 −2 −4)(1 1 0 0)^T = −6 = E(A2,B2). Suppose we perturb A2 by 1 bit. In particular, suppose we present an input A = (0 1 1 0 0 0) to the BAM. Then
$$A M = (2\ {-2}\ 2\ {-2}) \to (1\ 0\ 1\ 0) = B_2,$$
and thus A evokes the resonant pair (A2,B2) with initial energy E(A,B2) = −4. Now suppose an input A = (0 0 0 1 1 0) is presented to the BAM. Since H(A,A1) = 3 < 5 = H(A,A2), we might expect A to evoke the resonant pair (A1,B1). In fact
$$A M = ({-2}\ 2\ {-2}\ 2) \to (0\ 1\ 0\ 1) = B_2^c,$$
and B2^c in turn recalls A2^c, which recalls B2^c, etc., with energies E(A,B2^c) = −4 > −6 = E(A2^c,B2^c), since H(A,A2^c) = 1. We recall that the bipolar correlation encoding scheme (8) stores (Ai^c,Bi^c) when it stores (Ai,Bi).
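The whole worked example can be reproduced in a few lines; the sketch below (helper names are ours, not the paper's) encodes the two pairs with Eq. (8) and replays the forward, backward, and perturbed recalls.

```python
import numpy as np

def bipolar(v):
    return 2 * np.asarray(v) - 1

def threshold(s, prev):
    # Laws (3)/(4): on if the sum is positive, off if negative, hold on a tie.
    return np.where(s > 0, 1, np.where(s < 0, 0, prev))

A1, B1 = np.array([1, 0, 1, 0, 1, 0]), np.array([1, 1, 0, 0])
A2, B2 = np.array([1, 1, 1, 0, 0, 0]), np.array([1, 0, 1, 0])

M = np.outer(bipolar(A1), bipolar(B1)) + np.outer(bipolar(A2), bipolar(B2))  # Eq. (8)

print(threshold(A1 @ M, B1))          # (4 2 -2 -4) -> (1 1 0 0) = B1
print(threshold(A2 @ M, B2))          # (4 -2 2 -4) -> (1 0 1 0) = B2
print(threshold(B1 @ M.T, A1))        # -> A1
print(threshold(B2 @ M.T, A2))        # -> A2

A = np.array([0, 0, 0, 1, 1, 0])      # closer to A1 in Hamming distance
print(threshold(A @ M, np.zeros(4, dtype=int)))   # (-2 2 -2 2) -> (0 1 0 1) = B2^c
```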

Figure 1 displays snapshots of asynchronous BAM recall. Approximately six neurons update between snapshots. The spatial alphabetic associations (S,E), (M,V), and (G,N) are stored. FA contains n = 10 × 14 = 140 neurons. FB contains p = 9 × 12 = 108 neurons. A 40% noise corrupted version (99 bits randomly flipped) of (S,E) is presented to the BAM and (S,E) is perfectly recalled, illustrating the global order-from-chaos aesthetic appeal of asynchronous BAM operation.

BAMs are also natural structures for optical implementation. Perhaps the simplest all-optical implementation is a holographic resonator with M housed in a transmission hologram sandwiched between two phase-conjugate mirrors. Figures 2 and 3 display two different optical BAMs discussed in [Ref. 21]. Figure 2 displays a simple matrix–vector multiplier BAM with M represented by a 2-D grid of pixels with varying transmittances. Figure 3 displays a BAM based on a volume reflection hologram. The box labeled threshold device accepts a weak signal image on one side and produces an intensified and contrast-enhanced version of the image on its output side. The Hughes liquid crystal light valve or two-wave mixing are two ways to implement such a device. Note that the configuration requires the hologram to be read with light of two different polarizations. Hence diffraction efficiency of holograms recorded as birefringence patterns in photorefractive crystals will be somewhat compromised.

IV. Continuous BAMs

A continuous BAM[10],[11] is specified by, for example, the additive dynamic system

$$\dot a_i = -a_i + \sum_j S(b_j)\, m_{ij} + I_i, \qquad (14)$$
$$\dot b_j = -b_j + \sum_i S(a_i)\, m_{ij} + J_j, \qquad (15)$$
where the overdot denotes time differentiation. The activations ai and bj can take on arbitrary real values. S is a sigmoid signal function. More generally, we shall only assume that S is bounded and strictly monotone increasing, so that S′ = dS(x)/dx > 0. For definiteness, we assume all signals S(x) are in [0,1] or [−1,1], so that the output (observable) state of the BAM is a trajectory in the product unit hypercube I^n × I^p, where I^n = [0,1]^n or [−1,1]^n. For example, in the simulations below we use the bipolar logistic sigmoid S(x) = 2(1 + e^{−cx})^{−1} − 1 for c > 0. Ii and Jj are constant external inputs.

The first term on the right-hand side of Eqs. (14) and (15) is the STM passive decay term. The second term is the endogenous feedback term: it sums gated bipolar signals from all neurons in the opposite field. The third term is the exogenous input, which is assumed to change so slowly relative to the STM reaction times that it is effectively constant. Of course both right-hand sides of Eqs. (14) and (15) are in general multiplied by time constants, as is each term. We omit these constants for notational convenience.
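For concreteness, a forward-Euler sketch of Eqs. (14) and (15) follows. The particular bipolar logistic signal function, the step size, the step count, and the zero initial activations are illustrative assumptions; the paper does not prescribe an integration scheme, and the helper name is hypothetical.

```python
import numpy as np

def simulate_additive_bam(M, I, J, c=1.0, dt=0.01, steps=2000):
    """Forward-Euler sketch of the additive BAM, Eqs. (14)-(15), with a
    bipolar logistic signal S(x) = 2/(1 + exp(-c x)) - 1 (illustrative)."""
    S = lambda x: 2.0 / (1.0 + np.exp(-c * x)) - 1.0
    n, p = M.shape
    a, b = np.zeros(n), np.zeros(p)            # start both fields at rest
    for _ in range(steps):
        da = -a + S(b) @ M.T + I               # Eq. (14)
        db = -b + S(a) @ M + J                 # Eq. (15)
        a, b = a + dt * da, b + dt * db
    return S(a), S(b)                          # observable signal patterns
```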

The additive model [Eqs. (14) and (15)] can be extended to a shunting[8] or multiplicative model that allows multiplicative self-excitation through the term (Ai − ai)[S(ai) + Ii^E] and multiplicative cross-inhibition through a similar term, where Ai (Bj) is the positive upper bound on the activation of ai (bj), and Ii^I (Jj^I) and Ii^E (Jj^E) are the respective constant non-negative inhibitory and excitatory inputs to ai (bj). The shunting model can then be written

$$\dot a_i = -a_i + (A_i - a_i)\,[S(a_i) + I_i^E] - a_i \Big[ \sum_j m_{ij} S(b_j) + I_i^I \Big], \qquad (16)$$
$$\dot b_j = -b_j + (B_j - b_j)\,[S(b_j) + J_j^E] - b_j \Big[ \sum_i m_{ij} S(a_i) + J_j^I \Big]. \qquad (17)$$
The inhibitory shunt ai (bj) can be replaced with Ci + ai (Dj + bj) where Ci (Dj) is a non-negative constant. Then the range of ai (bj) is the interval [−Ci,Ai] ([−Dj,Bj]). The bidirectional stability of systems (16) and (17) follows from the same source of stability as the additive model, the bidirectional/heteroassociative extension of the Cohen-Grossberg theorem.[16] The thrust of this extension is to symmetrize an arbitrary rectangular connection matrix M by forming the zero-block diagonal matrix N:
$$N = \begin{pmatrix} 0 & M \\ M^T & 0 \end{pmatrix},$$
so that N = N^T. Thus the bidirectional heteroassociative procedure is converted to a large-scale unidirectional autoassociative procedure acting on the augmented state vectors C = [A|B], for which the Cohen-Grossberg theorem applies. The subsumption of the unidirectional version of Eqs. (16) and (17) by fixed-weight competitive networks is discussed in [Ref. 16]. The Cohen-Grossberg theorem is further extended in the next section when we prove the stability of adaptive BAMs. For simplicity we shall continue to analyze only the additive model, which subsumes the symmetric unidirectional autoassociative circuit model put forth by Hopfield[22] when M = M^T.

As shown by Kosko,[10],[11] the appropriate bounded Lyapunov or energy function E for the additive BAM system [Eqs. (14) and (15)] is

$$E(A,B) = \sum_i \int_0^{a_i} S'(x_i)\, x_i\, dx_i - \sum_i \sum_j S(a_i) S(b_j) m_{ij} - \sum_i S(a_i) I_i + \sum_j \int_0^{b_j} S'(y_j)\, y_j\, dy_j - \sum_j S(b_j) J_j. \qquad (18)$$
The time derivative of E is computed term by term. The objective is to factor out S′(ai) ȧi from terms involving inputs to ai and S′(bj) ḃj from terms involving inputs to bj, regroup, and then substitute in the STM Eqs. (14) and (15). The time derivative of each integral is the time derivative of a composite function, F[ai(t)] for FA terms and G[bj(t)] for FB terms. The chain rule gives dF/dt = (dF/dai)(dai/dt) = S′(ai) ai ȧi. The FA input term gives −S′(ai) ȧi Ii. The product rule of differentiation is used to compute the time derivative of the quadratic form, which gives the sum of the two endogenous feedback terms in Eqs. (14) and (15) modulated by the respective factors S′(ai) ȧi and S′(bj) ḃj. Rearrangement then gives
$$\dot E = -\sum_i S'(a_i)\,\dot a_i \Big[ -a_i + \sum_j S(b_j) m_{ij} + I_i \Big] - \sum_j S'(b_j)\,\dot b_j \Big[ -b_j + \sum_i S(a_i) m_{ij} + J_j \Big] = -\sum_i S'(a_i)\,\dot a_i^2 - \sum_j S'(b_j)\,\dot b_j^2 \le 0, \qquad (19)$$
on substituting Eqs. (14) and (15) for the terms in brackets. Since S′ > 0, Eq. (19) implies that Ė = 0 if and only if ȧi = ḃj = 0 for all i and j. At equilibrium all activations and signals are constant. Since M was an arbitrary n × p real matrix, this proves that every matrix is continuously bidirectionally stable.

As Hopfield[22] has noted, in the high-gain case when the sigmoid signal function S is steep, the integral terms vanish from Eq. (18). Then the equilibria of the continuous energy E in Eq. (18) are the same as those of the bivalent energy E in Eq. (5), namely, the vertices of the product unit hypercube I^n × I^p or, equivalently, the binary points in B^n × B^p. Continuous BAM convergence then has an intuitive fuzzy set interpretation. A fuzzy set is simply a point in the unit hypercube I^n or I^p. Each component of the fuzzy set is a fit[14] (rather than bit) value, indicating the degree to which that element fits in or belongs to the subset. In a unit hypercube the midpoint (1/2, 1/2, …, 1/2) has maximum fuzzy entropy[14] and the binary vertices have minimum fuzzy entropy. In a continuous BAM the trajectory of an initial input pattern—an ambiguous or fuzzy key vector—is from somewhere inside I^n × I^p to the nearest product-space binary vertex. Hence this disambiguation process is precisely the minimization of fuzzy entropy.[11],[14]

V. Adaptive BAMs

BAM convergence is quick and robust when M is constant. Any connection topology always rapidly produces a stable contrast-enhanced STM reverberation across FA and FB. This stable STM reverberation is not achieved with a lateral inhibition or competitive[12],[23] connection topology within the FA and FB fields, as it is in the adaptive resonance model,[4] since there are no connections within FA and FB. The idea behind an adaptive BAM is to gradually let some of this stable STM reverberation seep into the LTM connections M. Since the BAM rapidly converges and since the STM variables ai and bj change faster than the LTM variables mij change in learning, it seems reasonable that some type of convergence should occur if the mij change gradually relative to ai and bj. Such convergence depends on the choice of learning law for mij.

In this section we show that, if mij adapts according to a generalized Hebbian learning law, every BAM adaptively resonates in the sense that all nodes (STM traces) and edges (LTM traces) quickly equilibrate. This real-time learning result extends the Lyapunov approach to the product space I^n × I^p × R^(n×p). The LTM traces mij tend to learn the associations (Ai,Bi) in unsupervised fashion simply by presenting Ai to the bottom-up field of nodes FA and simultaneously presenting Bi to the top-down field of nodes FB. Input patterns sculpt their own attractor basins in which to reverberate. In addition to simple heteroassociative storage and recall, simulation results show that a pure bivalent association (Ai,Bi) can be quickly learned, or abstracted from, noisy gray-scale samples of (Ai,Bi). Many continuous mappings, such as rotation mappings, can also be learned by sampling instantiations of the mappings, often more instantiations than permitted by the storage capacity constraint m < min(n,p) for simple heteroassociative storage.

How should a BAM learn? How should synapse mij change with time given successive experience? In the simplest case no learning occurs, so mij should decay to 0. Passive decay is most simply modeled with a first-order decay law:

$$\dot m_{ij} = -m_{ij}, \qquad (20)$$
so that mij(t) = mij(0) e^(−t) → 0 as time increases. This simple model contains two ubiquitous features of unsupervised real-time learning models: exponentiation and locality. The mechanism of real-time behavior is exponential modulation. Learning depends only on locally available information, in this case mij. These two properties facilitate hardware instantiation and increase biological plausibility.

What other information is locally available to the synapse mij? Only information about ai and bj. What is the simplest way to additively include information about ai and bj in Eq. (20)? Multiply or add ai and bj: ai bj or ai + bj. Multiplicative combination is conjunctive; learning requires signals from both neurons. Additive combination is disjunctive; learning requires a signal from only one neuron. Hence associative learning favors the product ai bj. This choice is also an approximation of the correlation coding scheme (8) and produces a naive Hebbian learning law:

$$\dot m_{ij} = -m_{ij} + a_i b_j. \qquad (21)$$
Again scale constants can be added as desired. Integration of Eq. (21) shows that, in principle, mij can be unbounded since ai and bj can, in principle, just grow and grow. This possibility is sure to occur in feedback networks. So Eq. (21) is unacceptable. Moreover, on closer examination of mij, which symmetrically connects the ith neuron in FA with the jth neuron in FB, we see that the activations ai and bj are not locally available to mij.

Only the signals S(ai) and S(bj) are locally available to mij. In Eq. (8) the bipolar vectors can be interpreted as vectors of threshold signals. So the simplest way to include the information locally available to mij is to add the bounded signal correlation term S(ai) S(bj) to Eq. (20). We call this a signal Hebb law:

$$\dot m_{ij} = -m_{ij} + S(a_i)\, S(b_j). \qquad (22)$$
Clark Guest (personal communication) notes that (22) is equivalent to the dynamic beam coupling equation in adaptive volume holography. The dynamic system of Eqs. (16), (17), and (22) defines an adaptive BAM. Suppose all nodes and edges have equilibrated. Then the equilibrium value of mij is found by setting the right-hand side of Eq. (22) equal to 0:
$$m_{ij} = S_e(a_i)\, S_e(b_j). \qquad (23)$$
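A minimal discrete-time sketch of one adaptive BAM step follows. For simplicity it couples the additive STM Eqs. (14) and (15) with the signal Hebb law (22), as in the Lyapunov proof later in this section; the caller supplies the signal function S, and the separate rate constants dt and eps merely encode the assumption that LTM changes much more slowly than STM. Their values, and the function name, are illustrative only.

```python
import numpy as np

def adaptive_bam_step(a, b, M, I, J, S, dt=0.01, eps=0.001):
    """One Euler step of an adaptive BAM sketch: fast additive STM dynamics,
    Eqs. (14)-(15), plus the slow signal Hebb LTM law, Eq. (22).
    dt and eps are illustrative; eps << dt models slow LTM, fast STM."""
    Sa, Sb = S(a), S(b)
    da = -a + Sb @ M.T + I                     # STM change across FA
    db = -b + Sa @ M + J                       # STM change across FB
    dM = -M + np.outer(Sa, Sb)                 # LTM change, signal Hebb law (22)
    return a + dt * da, b + dt * db, M + eps * dM
```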

The signal Hebb law is bounded since the signals are bounded. Suppose for definiteness that S is a bipolar signal function. Then

$$-1 \le S(a_i)\, S(b_j) \le 1.$$
The signal product is +1 if both signals are +1 or both are −1. The product is −1 if one signal is +1 and the other is −1. Thus the signal product behaves as a biconditional or equivalence operator in a fuzzy or continuous-valued logic. This biconditionality underlies the interpretation of the association (Ai,Bi) as the conjunction IF Ai THEN Bi, and IF Bi THEN Ai. Moreover, the bipolar endpoints −1 and +1 can be expected to abound with a steep bounded S.

Suppose mij is maximally increasing due to S(ai) S(bj) = 1. Then Eq. (22) reduces to the simple first-order equation

$$\dot m_{ij} + m_{ij} = 1, \qquad (24)$$
which integrates to
$$m_{ij}(t) = e^{-t} m_{ij}(0) + \int_0^t e^{s-t}\, ds = e^{-t} m_{ij}(0) + (1 - e^{-t}) \to 1 \quad \text{as } t \text{ increases, for any initial } m_{ij}(0).$$
Similarly, if mij is maximally decreasing, the right-hand side of Eq. (24) is −1 and mij approaches −1 exponentially fast, independent of initial conditions. This agrees with Eq. (23). The signal Hebb law (22) asymptotically approaches the bipolar correlation learning scheme (8) for a single data pair. So the learning BAM for simple heteroassociative storage can still be expected to be capacity constrained by m < min(n,p).

The BAM memory medium produced by Eq. (22) is almost perfectly plastic. Scaling constants in Eq. (22) must be carefully chosen. In particular, the forget term −mij in Eq. (22) must be scaled with a constant less than unity. Otherwise present learning washes away past learning mij(0). In practice this means that a training list of associations (A1,B1), …, (Am,Bm) should be presented to the adaptive BAM system more than once if each pair (Ai,Bi) is presented for the same length of time. Alternatively, the training list can be presented once if the first pair (A1,B1) is presented longer than (A2,B2) is presented, (A2,B2) longer than (A3,B3), (A3,B3) longer than (A4,B4), and so on. This holds because the general integral solution to Eq. (22) is an exponentially weighted average of sampled patterns.

In what sense does the adaptive BAM converge? We prove below that it always converges in the sense that nodes and edges rapidly equilibrate or resonate when environmentally perturbed. Recall and learning can simultaneously occur in a type of adaptive resonance.[4]–[9]

At this point it is instructive to distinguish simple adaptive BAM behavior from standard adaptive resonance theory (ART) behavior. The high-level processing behavior of the Carpenter-Grossberg[4] ART model can be sketched as follows. Only one node in FB fires at a time, the instar[8] node bj that won the competition for bottom-up activation when a binary input pattern was presented to FA. The winner bj then fans out its spatial pattern or outstar[8] to the nodes in FA. If this fan-out pattern sufficiently matches the input pattern presented to FA, a stable pattern of STM reverberation is set up between FA and FB, learning can occur (but need not), and instar bj has recognized or categorized the input pattern. Otherwise bj is shut off and another instar winner bk fans out its spatial pattern, etc., until a match occurs or, if no match occurs, until the binary input pattern trains some uncommitted node bu to be its instar. Hence each instar node bj in the ART model recognizes or categorizes a single input pattern or set of input patterns, depending on how high a degree of match is desired. Match degree can be deliberately controlled. Direct access to a trained instar is assured only if the input matches exactly, or nearly, the pattern learned by the instar. The more novel the pattern presented to FA, and the higher the desired degree of match, the longer the ART system tends to search its instars to classify it.

In the adaptive BAM every FB node bj in parallel fans out its outstar across FA when an STM pattern is active across FA. The signal Hebb law (22) distributes recognition capability across all the edges of all the bj nodes, so that most bivalent associations are unaffected by removing a particular node. The closest analog to a specifiable degree of match in a BAM is the storage-capacity relationship between pattern number and pattern dimensionality, m < min(n,p). The closer m is to the maximum reliable capacity, the greater the match between an input pattern and a stored association (Ai,Bi) required to evoke (Ai,Bi) into a stable STM reverberation. When m is small relative to the maximum capacity, there tend to be few basins of attraction in the state space I^n × I^p, the basins tend to have wide diameters, and they tend to correspond to the stored associations (Ai,Bi). Each stored association tends to recognize or categorize a large set of input stimuli. When m is large, there tend to be many basins with small diameters. When m is large enough, only the exact patterns Ai or Bi will evoke (Ai,Bi). Within capacity constraints, all inputs tend to fall into the basin of the nearest stored association and thus have direct access to the nearest stored associations. Novel patterns are classified or misclassified as rapidly as more familiar patterns.

Learning can also occur in an adaptive BAM during the rapid recall process. Familiar patterns tend to strengthen or restrengthen the reverberating associations they elicit. Novel patterns tend to misclassify to spurious energy wells (attractor basins), which in effect recognize them, or by Eq. (22) they tend to dig their own energy wells, which thereafter recognize them. As the simulation results discussed below show, many more patterns can be stably presented to the BAM than min(n,p) if they resemble stored associations. Otherwise the forgetting effects of Eq. (22) prevail and at any moment the adaptive BAM tends to remember no more than the most recent min(n,p)-many distinct inputs (elicited associations).

We now prove that the adaptive BAM converges to local energy minima. Denote the bounded energy function in Eq. (18) by F. Then the appropriate energy or Lyapunov function for the adaptive BAM dynamic system of Eqs. (16), (17), and (22) is simply

$$E(A,B,M) = F + \tfrac{1}{2} \sum_i \sum_j m_{ij}^2, \qquad (27)$$
since the time derivative of (1/2)mij^2 is mij ṁij. This new energy function is bounded since each mij is bounded. When the product rule of differentiation is applied to the time-varying triple product in the quadratic form component of F [Eq. (18)], we get the triple sum
$$\dot m_{ij}\, S(a_i) S(b_j) + S'(a_i)\,\dot a_i\, m_{ij} S(b_j) + S'(b_j)\,\dot b_j\, m_{ij} S(a_i).$$
In the nonlearning continuous BAM the first term of this triple sum was zero and the new sum of squares in Eq. (27) was constant and hence made no contribution to Eq. (19). Now the time derivative of E in Eq. (27) gives, on rearrangement,
$$\dot E = -\sum_i \sum_j \dot m_{ij}\,[S(a_i) S(b_j) - m_{ij}] - \sum_i S'(a_i)\,\dot a_i^2 - \sum_j S'(b_j)\,\dot b_j^2 = -\sum_i \sum_j \dot m_{ij}^2 - \sum_i S'(a_i)\,\dot a_i^2 - \sum_j S'(b_j)\,\dot b_j^2 \le 0, \qquad (28)$$
on substituting the signal Hebb learning law (22) for the term in brackets in Eq. (28). Hence an adaptive BAM is a dissipative dynamic system that generalizes the nonlearning continuous BAM dissipative system. When energy stability is reached, when Ė = 0, Eq. (28) and S′ > 0 imply that both edges and nodes have stabilized: ṁij = ȧi = ḃj = 0 for all i and j. Hence every signal Hebb BAM adaptively resonates. This result further generalizes in a straightforward way to any number of layered BAM fields that are interconnected, not necessarily contiguously, by Eq. (22).

Can an adaptive BAM learn and recall simultaneously? In the ART model[4] a mechanism of attentional gain control [inhibition due to the sum of FB signals S(bj)] is introduced to enable neurons ai in FA to distinguish environmental inputs I from top-down feedback patterns B. In principle, an attentional gain control mechanism can also be added to an adaptive BAM. Short of this new mechanism, how can neuron ai distinguish external input Ii from internal feedback input from FB? In Eq. (14) these terms both additively affect the time change of ai. So external and internal feedback to ai can only differ in their patterns of magnitude and duration over some short time interval. If the magnitude and duration of inputs are indistinguishable, the inputs are indistinguishable to ai. When they differ, ai can in principle learn and recall simultaneously.

Suppose a randomly fluctuating, uninformative environment confronts the adaptive BAM. Then Ii tends to have zero mean in short time intervals. This allows ai to be driven by internal feedback from FB. If learning is permitted, familiar STM reverberations, evoked perhaps by other ak (or bj), can be strengthened. When Ii remains relatively constant over an interval, a new pattern can be learned, and can be learned while FA and FB reverberate, eventually dominating those reverberations. If the reverberations are spurious, learning is enhanced by appropriately weighting Ii. In simulations, scaling Ii by p, the number of neurons in FB, has proved effective presumably because it balances the magnitude of Ii against the magnitude of the internal FB feedback sum in Eq. (14).

An extension of these ideas is the sampling adaptive BAM. There is a trade-off between learning time and learning samples. The standard learning model is to present relatively few samples for long lengths of learning time, typically until learning converges or is otherwise terminated, as in simple heteroassociative storage, or to present few samples over and over, as in backpropagation.[17]–[20] In what we shall call sampling learning several samples are presented briefly—typically many more patterns than neuron dimensionality—and the underlying patterns, associations, or mappings are better learned as sample size increases. Learning is not allowed to converge. Only a brief pulse of learning occurs for each sample. When the sampling learning technique is applied to the adaptive BAM, a sampling adaptive BAM results. For example, an adaptive BAM can rapidly learn a rotation mapping, if n = p, by simply presenting a few spatial patterns at FA and concurrently presenting the same pattern rotated some fixed degree at FB. Thereafter any pattern presented at FA produces the stable STM reverberation with the input pattern at FA and its rotated version at FB.

We note that Hecht-Nielsen[24] has developed his feedforward counterpropagation sampling learning technique for learning continuous mappings, and probability density functions that generate mappings, by applying Grossberg's outstar learning theorem[8],[9] and by applying the sampling learning technique to Grossberg's unsupervised competitive learning[2],[23]:

$$\dot m_{ij} = (i_i - m_{ij})\, b_j, \qquad (29)$$
which is also used in the ART model,[4] where (i1, …, in) is a normalized input pattern or probability distribution presented to FA and bj provides competitive modulation, e.g., bj = 1 if bj wins the FB instar competition for activation and bj = 0 otherwise. For simple autoassociative storage the competitive instar learning law (29) is also dimension bounded for non-sampling learning. No more distributions at FA can be recognized at FB than, obviously, the number p of instar nodes at FB. Yet Hecht-Nielsen[24] has demonstrated that sampling learning with Eq. (29) can learn a sine wave, which has minimal dimensionality, well with thirty neurons and a few hundred random samples, almost perfectly with a few thousand random samples.
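For comparison with the signal Hebb law, a sketch of one discrete instar update in the spirit of Eq. (29) follows; the winner index and learning rate are illustrative, and winner selection itself is outside the scope of Eq. (29).

```python
import numpy as np

def instar_update(M, x, winner, rate=0.1):
    """One discrete instar step in the spirit of Eq. (29): only the winning
    FB node (b_j = 1) moves its fan-in weights toward the normalized input x.
    The winner index and learning rate are illustrative assumptions."""
    M = M.copy()
    M[:, winner] += rate * (x - M[:, winner])
    return M
```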

Figures 4–6 display the results of a sampling BAM experiment. FA and FB each contain forty-nine gray-scale neurons arranged in a 7 × 7 pixel tray. The output of the bipolar logistic signal function S(x) is discretized to six gray-scale levels, where S(x) = −1 is white and S(x) = 1 is black. S(x) = −1 if activation x < −51, S(x) = 1 if x > 51. Forty-eight randomly generated gray-scale noise patterns are presented to the adaptive BAM. The forty-eight samples violate the storage capacity constraint m ≪ min(n,p) for simple heteroassociative storage. Figure 4 displays six of these random samples. Twenty-four of the samples are noisy versions of the bipolar association (Y,W); twenty-four are noisy versions of (B,Z). Noise was created by picking numbers in [−60,60] according to a uniform distribution, then adding them to the activation values, −52 or 52, underlying the bivalent signal values making up (Y,W) and (B,Z). Unlike in simple heteroassociative storage, no sample is presented long enough for learning to fully or nearly converge. Samples are briefly presented four at a time—four from the (Y,W) training set, then four from the (B,Z) training set, then the next four from the (Y,W) training set, and so on—to exploit the exponentially weighted averaging effects of the signal Hebb learning law (22).
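The training schedule just described can be sketched as follows. The noise model follows the description above (bivalent activations ±52 plus uniform noise in [−60,60]); the pulse length, the rate constants, and the choice to clamp each noisy sample as the constant external input are our assumptions, not specifications from the paper, and the caller supplies the bipolar signal function S.

```python
import numpy as np

def noisy_sample(pattern, rng, level=52.0, spread=60.0):
    # Bivalent activations +/-52 plus uniform noise in [-60, 60], as described above.
    return level * np.asarray(pattern, dtype=float) + rng.uniform(-spread, spread, size=len(pattern))

def sampling_train(M, samples, S, pulse=5, dt=0.01, eps=0.01):
    """Sampling-learning sketch: each noisy activation pair (A, B) is clamped
    as the external input for only a brief pulse, so learning never converges
    on any single sample. Pulse length and rates are illustrative."""
    for A, B in samples:                       # alternate blocks of samples upstream
        a, b = A.copy(), B.copy()
        for _ in range(pulse):                 # brief pulse of learning
            Sa, Sb = S(a), S(b)
            a = a + dt * (-a + Sb @ M.T + A)   # additive STM with input A
            b = b + dt * (-b + Sa @ M + B)     # additive STM with input B
            M = M + eps * (-M + np.outer(Sa, Sb))   # signal Hebb law (22)
    return M
```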

Figure 5 demonstrates recall and abstraction with the sampling adaptive BAM. A new noisy version of Y is presented to field FA. The initial STM activation across FA and FB is random. The BAM converges to the pure bipolar association (Y,W) it has never experienced but has abstracted from the noisy training samples. As in Plato's theory of ideals—and unlike the naive empiricist denial of abstraction of Locke, Berkeley, and Hume—it is as if the BAM learns redness from red things, smoothness from smooth things, triangularity from triangles, etc., and thereafter associates new red things with redness, not with most-similar old red things.

In Figure 6 the BAM is thinking about the STM reverberation (Y,W). A new noisy version of Z is presented to field FB, superimposing it on the (Y,W) reverberation. The reverberating thought is soon crowded out of STM by the environmental stimulus Z. The BAM again converges to the unobserved pure bipolar association, this time (B,Z), it abstracted from the noisy training samples.

This research was supported by the Air Force Office of Scientific Research (AFOSR F49620-86-C-0070) and the Advanced Research Projects Agency of the Department of Defense under ARPA Order 5794. The author thanks Robert Sasseen for developing all software and graphics.

Figures

Fig. 1 Asynchronous BAM recall. Approximately six neurons update per snapshot. The associated spatial patterns (S,E), (M,V), and (G,N) are stored. Field FA contains 140 neurons; FB,108. Perfect recall of (S,E) is achieved when recall is initiated with a 40% noise-corrupted version of (S,E).

Fig. 2 Matrix–vector multiplier BAM.

Fig. 3 BAM volume reflection hologram.

Fig. 4 Sampling adaptive BAM noisy training set. Forty-eight randomly generated gray-scale noise patterns are presented to the system. Unlike in simple heteroassociative storage, no sample is presented long enough for learning to fully or nearly converge. Twenty-four of the samples are noisy versions of the bipolar association (Y,W); twenty-four are noisy versions of (B,Z). Three samples are displayed from each training set. Samples are presented four at a time—four from the (Y,W) training set, then four from the (B,Z) training set, then the next four from the (Y,W) training set, etc. Both fields FA and FB contain forty-nine neurons; the forty-eight samples violate the storage capacity constraint m ≪ min(n,p) for simple heteroassociative storage.

Fig. 5 Sampling adaptive BAM associative recall and abstraction. A new noisy version of Y is presented to field FA. Initial BAM STM activation across FA and FB is random. The BAM converges to the pure bipolar association (Y,W) it has never experienced but has abstracted from the noisy training samples in Fig. 4.

Fig. 6 Sampling adaptive BAM STM superimposition and associative recall. A new noisy version of Z is presented to field FB. This time the bipolar association (Y,W) recalled in Fig. 5 is reverberating in STM. This thought is soon crowded out of STM by the environmental stimulus Z. Again the BAM converges to the unobserved pure bipolar association, this time (B,Z), it abstracted from the noisy training samples.


References

1. T. Kohonen, “Correlation Matrix Memories,” IEEE Trans. Comput. C-21, 353 (1972).

2. T. Kohonen, Self-Organization and Associative Memory (Springer-Verlag, New York, 1984).

3. J. A. Anderson, J. W. Silverstein, S. A. Ritz, and R. S. Jones, “Distinctive Features, Categorical Perception, and Probability Learning: Some Applications of a Neural Model,” Psychol. Rev. 84, 413 (1977).

4. G. A. Carpenter and S. Grossberg, “A Massively Parallel Architecture for a Self-Organizing Neural Pattern Recognition Machine,” Comput. Vision Graphics Image Process. 37, 54 (1987).

5. S. Grossberg, “Adaptive Pattern Classification and Universal Recoding, II: Feedback, Expectation, Olfaction, and Illusions,” Biol. Cybern. 23, 187 (1976).

6. S. Grossberg, “A Theory of Human Memory: Self-Organization and Performance of Sensory-Motor Codes, Maps, and Plans,” Prog. Theor. Biol. 5, 000 (1978).

7. S. Grossberg, “How Does a Brain Build a Cognitive Code?,” Psychol. Rev. 87, 1 (1980).

8. S. Grossberg, Studies of Mind and Brain: Neural Principles of Learning, Perception, Development, Cognition, and Motor Control (Reidel, Boston, 1982).

9. S. Grossberg, The Adaptive Brain, I and II (North-Holland, Amsterdam, 1987).

10. B. Kosko, “Bidirectional Associative Memories,” IEEE Trans. Syst. Man Cybern. SMC-00, 000 (1987).

11. B. Kosko, “Fuzzy Associative Memories,” in Fuzzy Expert Systems, A. Kandel, Ed. (Addison-Wesley, Reading, MA, 1987).

12. S. Grossberg, “Contour Enhancement, Short Term Memory, and Constancies in Reverberating Neural Networks,” Stud. Appl. Math. 52, 217 (1973).

13. W. S. McCulloch and W. Pitts, “A Logical Calculus of the Ideas Immanent in Nervous Activity,” Bull. Math. Biophys. 5, 115 (1943).

14. B. Kosko, “Fuzzy Entropy and Conditioning,” Inf. Sci. 40, 165 (1986).

15. J. J. Hopfield, “Neural Networks and Physical Systems with Emergent Collective Computational Abilities,” Proc. Natl. Acad. Sci. U.S.A. 79, 2554 (1982).

16. M. A. Cohen and S. Grossberg, “Absolute Stability of Global Pattern Formation and Parallel Memory Storage by Competitive Neural Networks,” IEEE Trans. Syst. Man Cybern. SMC-13, 815 (1983).

17. D. B. Parker, “Learning Logic,” Invention Report S81-64, File 1, Office of Technology Licensing, Stanford U. (Oct. 1982).

18. D. B. Parker, “Learning Logic,” TR-47, Center for Computational Research in Economics and Management Science, MIT (Apr. 1985).

19. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning Internal Representations by Error Propagation,” ICS Report 8506, Institute for Cognitive Science, U. California San Diego (Sept. 1985).

20. P. J. Werbos, “Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences,” Ph.D. Dissertation in Statistics, Harvard U. (Aug. 1974).

21. B. Kosko and C. Guest, “Optical Bidirectional Associative Memories,” Proc. Soc. Photo-Opt. Instrum. Eng. 758, (1987).

22. J. J. Hopfield, “Neurons with Graded Response Have Collective Computational Properties Like Those of Two-State Neurons,” Proc. Natl. Acad. Sci. U.S.A. 81, 3088 (1984).

23. S. Grossberg, “Adaptive Pattern Classification and Universal Recoding, I: Parallel Development and Coding of Neural Feature Detectors,” Biol. Cybern. 23, 121 (1976).

24. R. Hecht-Nielsen, “CounterPropagation Networks,” in Proceedings, First International Conference on Neural Networks (IEEE, New York, 1987).
