

EE1411

Memory

Janusz A. Starzyk

Computational Intelligence

Based on a course taught by Prof. Randall O'Reilly, University of Colorado, and Prof. Włodzisław Duch, Uniwersytet Mikołaja Kopernika

EE1412

General remarks

Memory is any persistent effect of experience.

Memory seems uniform, but in reality it is highly differentiated: spatial, visual, aural, recognition, declarative, semantic, procedural, explicit, implicit …

Here we examine mechanisms, so the primary division is:

Synaptic memory (physical changes in synapses): long-term, and requiring activation to have any influence on functioning.

Dynamic memory: active, temporary activations that affect current functioning.

Long-term priming, based on synaptic memory that is subject to fast modification; semantic and procedural memory are the result of slow processes.

Short-term priming, based on active memory.

EE1413

General remarks: memory types

[Diagram: memory types. Short-term memory (STM), including working memory, and long-term memory (LTM). LTM divides into declarative (facts, events) and nondeclarative (manual skills, conditioning, priming; emotional, motor). Brain structures labeled: neocortex, cerebellum, nuclei, parietal cortex, prefrontal cortex, limbic system.]

EE1414

3 regions

PC: posterior parietal cortex and motor cortex; distributed representations, spatial memory, long-term priming, associations, inferences, schemas.

FC: prefrontal cortex; isolated representations, control of interference, working memory.

HC: hippocampal formation; episodic memory, spatial memory, declarative memory, sparse representations, good separation of images (patterns).

Slow learning of statistically relevant relationships => procedural and semantic memory, cortical; fast learning => episodic memory, HC.

Retaining active information while simultaneously accepting new information, e.g. multiplying 12*6 in your head, requires FC.

EE1415

Slow/rapid learning

A neuron learns a conditional probability: the correlation between the desired activity and the input signals. The optimal value of 0.7 is reached stably only with a small learning rate of 0.005.

Every experience is a small fragment of uncertain, potentially useful knowledge about the world => a stable image of the world requires slow learning, and integration leads to forgetting of individual events. Yet relevant new information is learned after a single exposure. Lesions of the hippocampal formation cause anterograde amnesia. The neuromodulatory system strikes a compromise between stability and plasticity.
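The trade-off between learning rates can be sketched in a few lines. This is only an illustration, not the course's simulator: it assumes a single weight updated by a simple delta rule toward a binary target that is active with probability 0.7, and the fast rate of 0.1, the step count and the function name track_probability are illustrative choices.

```python
import random
import statistics


def track_probability(lrate, p=0.7, steps=5000, seed=0):
    """Delta-rule estimate of P(target active | input active)."""
    rng = random.Random(seed)
    w = 0.5  # arbitrary starting weight
    history = []
    for _ in range(steps):
        target = 1.0 if rng.random() < p else 0.0  # one noisy experience
        w += lrate * (target - w)  # move the weight a little toward it
        history.append(w)
    return history


# Compare the last 2000 steps, after both runs have had time to settle.
tail_fast = track_probability(lrate=0.1)[-2000:]
tail_slow = track_probability(lrate=0.005)[-2000:]

for name, tail in (("lrate=0.1", tail_fast), ("lrate=0.005", tail_slow)):
    print(name, "mean", round(statistics.mean(tail), 3),
          "spread", round(statistics.stdev(tail), 3))
```

With the small rate the weight settles close to 0.7 and fluctuates little; with the large rate it keeps jumping after every event, so it reflects the most recent experiences rather than the underlying probability.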

EE1416

Complementary learning systems

EE1417

Active memory and priming

Distributed, overlapping representations in the PC can record information about the world efficiently, but they are not very precise and blur with the passage of time.

FC: the prefrontal cortex stores isolated representations, which increases memory stability.

The effects of priming are evident in people with a damaged hippocampus, so cortical priming in the PC is possible.

We will differentiate many forms of priming:

duration (short-term, long-term), type of information (visual, lexical), similarity (repetition, semantic).

EE1418

Priming

Standard task, stem completion: after reading a list of words we are given a stem and must add the ending, e.g.

rea---

If "reaction" was on the list earlier, it is usually chosen.

The interval can be about an hour, so active memory cannot be responsible for this.

Homophones: read, reed.

Completion: in "It was found that the ...eel is on the ...", where the last word is "orange", "wagon", "shoe" or "table", the sentence is heard as:

"peel is on the orange",

"wheel is on the wagon",

"heel is on the shoe",

"meal is on the table".

EE1419

Priming model

Project wt_priming.proj, Chapter 9, from (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Wt_Priming)

View Events: the first 3 events have the same input images but different output images; in total 13 pairs x 2 outputs = 26 combinations, IA - IB.

Note: we are not yet learning the AB-AC lists, only the effect of learning.

EE14110

Exploring the model

View TrainLog and evaluate the result:

The similarity of the output image is summarized by the yellow line. The name of the most similar event is scored by sm_nm, the binary error in the name of the closest event; part of the results are not very similar to the given A or B.

In blue, both_err = 1 only if the output is not one of the two acceptable output images.

Noise helps to break through impasses, but it also slightly destabilizes already-learned images.

EE14111

Further tests

Test_logs: first we check whether there are any tendencies, and then whether we can teach the network to change its preference after the presentation of IA and then IB.

wt_update = Test; Test runs one epoch. Check Trial1_TextLog: ev_nm is either IA or IB, and sm_nm is either 0 or 1, at random. In Epoch1_TextLog we can see that the output is always one of the two results, in sum 13/26, or half the time: there is no tendency. We then check whether one exposure changes anything.

wt_update => On_Line, learning after every event; Run Test: the frequency increases significantly, to 18 and then 25 times.

Conclusions: error reduction alone gives mixed A and B outputs; a network without kWTA won't learn this task. The parietal cortex can be responsible for long-term priming.
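The idea that a single exposure can bias the choice between two equally acceptable outputs can be sketched as follows. This is a toy illustration, not wt_priming.proj: the two-weight "network", the noise level, the weight increment of 0.2 and the names choose and count_a are all assumptions.

```python
import random


def choose(weights, rng, noise=0.1):
    """Pick the output (A or B) with the larger noisy net input."""
    net = {name: w + rng.gauss(0.0, noise) for name, w in weights.items()}
    return max(net, key=net.get)


def count_a(weights, trials=1000, seed=1):
    """Number of trials (out of `trials`) on which output A wins."""
    rng = random.Random(seed)
    return sum(choose(weights, rng) == "A" for _ in range(trials))


weights = {"A": 0.5, "B": 0.5}   # both outputs equally acceptable
before = count_a(weights)

weights["A"] += 0.2              # one exposure to A: a small weight change
after = count_a(weights)

print("A chosen before exposure:", before, "of 1000")
print("A chosen after exposure: ", after, "of 1000")
```

Before the exposure A wins about half the time, which corresponds to "no tendency"; a small, lasting weight change is enough to shift the preference, which is the sense in which priming here is weight-based.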

EE14112

AB-AC Learning

People are able to learn two lists of word pairs, A-B and then A-C, e.g.

window-mind

bike-trash

....

and then:

window-train

bike-cloud

without much interference, doing well on tests for both AB and AC.

Networks with only error correction forget catastrophically!

Interference results from using the same units and weights to learn different associations.

It is necessary to use different units, or to learn with context.
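The source of the interference, and the effect of using different units, can be sketched with a one-layer linear network trained by the delta rule. This is a minimal sketch under stated assumptions, not the ab_ac_interference.proj model: items and responses are one-hot vectors, and "conjunctive" coding (a separate input unit for each item-and-list combination) is a hypothetical stand-in for learning with context.

```python
def train(weights, pairs, lrate=0.5, epochs=50):
    """Delta rule (pure error correction) for a one-layer linear network."""
    for _ in range(epochs):
        for x, target in pairs:
            for j, row in enumerate(weights):
                out = sum(w * xi for w, xi in zip(row, x))
                err = target[j] - out
                for i, xi in enumerate(x):
                    row[i] += lrate * err * xi


def sq_error(weights, pairs):
    """Mean summed squared output error per pair."""
    total = 0.0
    for x, target in pairs:
        for j, row in enumerate(weights):
            out = sum(w * xi for w, xi in zip(row, x))
            total += (target[j] - out) ** 2
    return total / len(pairs)


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def ab_error_after_ac(conjunctive, n_items=4):
    """Train on list AB, then on list AC, and return the error on AB."""
    n_out = 2 * n_items  # B_i and C_i are distinct output units
    # Shared coding: A_i drives the same input unit in both lists.
    # Conjunctive coding: each (A_i, list) combination has its own unit.
    n_in = 2 * n_items if conjunctive else n_items
    offset = n_items if conjunctive else 0
    ab = [(one_hot(i, n_in), one_hot(i, n_out)) for i in range(n_items)]
    ac = [(one_hot(i + offset, n_in), one_hot(i + n_items, n_out))
          for i in range(n_items)]
    weights = [[0.0] * n_in for _ in range(n_out)]
    train(weights, ab)
    train(weights, ac)
    return sq_error(weights, ab)


err_shared = ab_error_after_ac(conjunctive=False)
err_separate = ab_error_after_ac(conjunctive=True)
print("AB error after learning AC, shared units:  ", round(err_shared, 3))
print("AB error after learning AC, separate units:", round(err_separate, 3))
```

With shared units the AC associations overwrite the weights that stored AB, so the AB error becomes large; with separate units the two lists never touch the same weights and AB survives intact.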

EE14113

AB-AC Model

Project ab_ac_interference.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_AB-AC_List_Learning)

View Events_AB, Events_AC.

Output: either B or C; the context differentiates.

Replication of catastrophic interference:

View Train_graph_log: red = errors, yellow = tests for AB.

The test shows that after learning AC the network forgets AB; many units in the hidden layer take part in learning both lists.

EE14114

AB-AC Model

hid_kwta 12 => 4, to decrease the number of active units. Run the test again: no change.

Increase the variance of the initial weights: wt_var 0.25 => 0.4

Stronger influence of context: fm_context 1 => 1.5

Hebbian learning: hebb 0.01 => 0.05

Decrease the learning rate: lrate => 0.1, Batch

None of this clearly helps, but catastrophes become less likely...

Two learning systems are clearly necessary, a slow one and a fast one: the cortex and the hippocampus.

EE14115

Hippocampus

Anatomy and connections of the structures of the hippocampal formation: signals arrive from unimodal and multimodal association areas through the entorhinal cortex (EC).

EE14116

More anatomy

Hippocampus = king of the cortex

Bidirectional connections with the entorhinal cortex:

olfactory bulb,

cingulate cortex,

superior temporal gyrus (STG),

insula,

orbitofrontal cortex.

EE14117

More anatomy

Sparse activation

Representations in CA3 and CA1 are focused on specific stimuli, while in the subiculum and the entorhinal cortex they are strongly distributed.

EE14118

Hippocampal formation

Model contains structures:

dentate gyrus (DG),

areas CA1 and CA3,

entorhinal cortex (EC).

Pct Act = percentage of active units.

EE14119

Separation and conjunction of images

CA1 separates by conjunction of images (representations).

It is also able to recreate the original EC activation through reversible connections.

The hippocampus rapidly associates various cortical representations.

It creates episodic memory.

It completes activations recreated from memory and separates them into clearly distinct meanings.

Sparse encoding eases the separation of meanings.

EE14120

Model of the hippocampus

Project hip.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Hippocampus)

Input signals enter through the entorhinal cortex (EC_in) and pass to the dentate gyrus (DG) and the CA3 area.

DG also influences CA3, where received signals can be completed through associations.

CA3 has strong internal connections. CA1 has more distributed sparse representations => EC_out.

EC: 144 units = 4*36; 1 of 4 active.

DG: 625 units, CA3: 240 units

CA1: 384 units = 12 columns * 32 units
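Why sparse coding helps separation can be checked numerically: two random patterns share fewer active units when only a small fraction of a layer is active. The sketch below uses the EC and DG layer sizes quoted above; the 25% activity for EC follows from "1 of 4 active", while the roughly 1% activity assumed for DG (6 of 625 units) is an illustrative assumption, and mean_overlap is a hypothetical helper.

```python
import random


def mean_overlap(n_units, n_active, pairs=200, seed=0):
    """Average fraction of one pattern's active units shared with another."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        a = set(rng.sample(range(n_units), n_active))
        b = set(rng.sample(range(n_units), n_active))
        total += len(a & b) / n_active
    return total / pairs


dense = mean_overlap(n_units=144, n_active=36)   # EC-like: 1 of 4 active
sparse = mean_overlap(n_units=625, n_active=6)   # DG-like: assumed ~1% active

print("mean overlap, 25% active:", round(dense, 3))
print("mean overlap, ~1% active:", round(sparse, 3))
```

For random patterns the expected shared fraction equals the activity level, so the denser code overlaps about a quarter of the time and the sparse code almost never, which leaves room to store similar inputs as distinct memories.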

EE14121

Exploration of the hippocampal model

Learning of AB-AC associations without interference.

Autoassociation: EC_in = EC_out, reversible transformations.

BuildNet; View_Train_Trial_Log shows the statistics. The input includes information about the input and output images and the list.

StepTrain: units chosen in the previous step have white outlines.

Partial overlapping of images in EC_in, DG, CA3, CA1.

Training epoch: 10 list elements + 3 test sets: AB, AC, new.

View Test_Logs => text and graph log.

Set train_updt = no_updt and go to the test log; Run does 3 epochs, and the results are in Text_log: 70% remembered from the AB list and 100% from the AC list.

Set test_updt = no_updt and the network will finish 3 training/test epochs more rapidly.

Test analysis: test_updt = Cycle_updt, Clear Trial1_1_Text_log, StepTest: we see only A + context, and we can watch how the image completes.

EE14122

Further exploration

Targ in Network shows what image was learned (act, targ).

In TextLog, stim_er_on = proportion of units erroneously activated in EC_out, stim_er_off = proportion erroneously not activated in EC_out.

In Trial_1_GraphLog we can see these two numbers after every test: for known images they are small (correct memories); for new ones they are large, on ~0.5 and off ~0.8, so the network rarely fails.

To move to list AC, set Test_updt = Trial_updt (or no_updt) and StepTest until epc_ctrl changes to 1 in text_log. These are events for list AC: the network does not recognize them (rmbr=0) because it hasn't learned them yet.

Train_Epcs=5, train_env=Train_AC, Run and check the results.

EE14123

Summary

The hippocampal model can rapidly and sequentially learn AB-AC associations without excessive interference. For this it was sufficient to use the Hebbian contrast rule, CPCA, and the correct architecture.

Interference results from using the same units; in CA3, identical images (representations) learned in different contexts become separated.

Separation of images does not allow associations, inferences based on similarity, or efficient encoding of multidimensional information.

The conjunction of images happens in CA1.

This suggests a complementary role for the hippocampus, supplementing the slow learning mechanisms of the cortex.

The hippocampus can remember episodes, helping with spatial orientation, and can create conjunctive representations connecting different stimuli faster than the cortex.

EE14124

Memory

Memory is not uniform

1. Weights (long-term, require activation) vs. activations (short-term, already active, can influence processing)

2. Based on weights: the cortex shows priming but suffers from catastrophic interference. The hippocampus can learn fast without interference, using sparse distributed representations of images.

3. Based on activation: the cortex shows priming but isn't good for short-term memory.

4. Cooperation of activation-based and weight-based memory

5. Video: 1. Short-term memory in chimpanzees (30 sec); 2. Comparison with students (30 sec)

EE14125

Active short-term memory

Short-term priming: attention and influence on reaction speed.

Apart from its duration, the memory content and the effects of similarity are like those of long-term priming.

Project act_priming.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Act_Priming)

Stem completion or homophones, but without learning: only the influence of residual activation from the previous trial.

The network has learned the series IA-IB.

The test has a series of images with results A and B. We present the image with A on the output and the network responds A; then we present the image for B with only the minus phase run (no learning), and the network's result is sometimes A, sometimes B.

LoadNet, View TestLogs, Test.

The correlation with previous results A and B depends on how fast activation fades; check the effect of act_decay 1 => 0 and the tendency toward A. Analyze the influence on the results in test_log.

EE14126

Active maintenance

Project act_maint.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_Active_Maintenance): active maintenance of information in working memory despite interference; quickly accessible, and does not require synaptic changes. Recurrence is necessary: an attractor network with a large basin of attraction, resistant to noise.

Video: remembering with delay (30 sec)

The processes that analyse environmental data don't require such networks, because they are driven by incoming information.

Activation should spread, enabling associations and inferences; as long as we have external signals this will suffice, e.g. if we note the results of intermediate operations on paper.

Without external activations, we have to rely on actively maintained representations in working memory, which has serious limits (Miller's famous 7±2, and even 4±2 for complex objects).

First a model without attractors, which requires external signals; then distributed representations with shallow attractors, not very resistant to noise; finally deep but localised attractors, which prevent associations.
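The role of recurrence can be sketched with a single self-exciting unit. This is a toy illustration, not the act_maint.proj network: the sigmoid gain and threshold, the time step and the two self-connection strengths are assumed values chosen only to show the qualitative difference.

```python
import math


def run_unit(w_self, input_steps=50, delay_steps=100, dt=0.2):
    """One self-exciting unit: input on, then removed; returns final activity."""
    gain, threshold = 10.0, 0.5  # assumed sigmoid parameters
    act = 0.0
    for step in range(input_steps + delay_steps):
        ext = 1.0 if step < input_steps else 0.0  # external input, then none
        net = w_self * act + ext
        target = 1.0 / (1.0 + math.exp(-gain * (net - threshold)))
        act += dt * (target - act)  # leaky integration toward the target
    return act


weak = run_unit(w_self=0.3)    # weak recurrence
strong = run_unit(w_self=1.0)  # strong recurrence

print("activity after delay, weak recurrence:  ", round(weak, 3))
print("activity after delay, strong recurrence:", round(strong, 3))
```

With weak recurrence the activity collapses once the input is removed; with strong recurrence the unit stays in its high-activity attractor through the delay, which is what active maintenance without synaptic change requires.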

EE14127

Maintenance model

Project act_maint.proj.

3 objects, 3 elements (features)

r.wt, View Grid_log, Run: while there is an input, activation is maintained, but after its removal it disperses (the network blurs...). Check the influence of wt_mean = 0.5, wt_var = 0.1, 0.25, 0.4.

Net_Type Higher_order: we add combinations of feature pairs.

Defaults, Run, add noise_var=0.01: the network forgets...

EE14128

Isolated representations

Defaults: return to the initial parameters.

network = IsolatedNet

There are no connections between hidden units, but there is recurrence, so activation doesn't fade. Noise = 0.01 doesn't interfere, but with 0.02 maintenance sometimes breaks down. Is it worth learning to focus in spite of noise?

A different task: does stimulus S(t) = S(t+2)?

Parameters: input_data = MaintUpdateEnv, network Isolated, noise 0.01

Init, Run: there are two inputs, Input 1 and 2; wt_scale 1 => 2 changes the strength of local connections. The network can be switched from fast updating to long-lasting maintenance.

How to do this automatically? Dopamine and dynamic regulation of reward in the PFC.

EE14129

Working memory

The prefrontal cortex plays the central role in maintaining active working memory and has the desired properties: isolated, self-activating attractor networks with large basins of attraction.

Neuroanatomy, PFC connections and microcolumns => specialized area for active memory.

A. PR: spatial.

B. PR: spatial, self-ordered tasks.

C. PR: spatial, object and verbal, self-ordered tasks and analytical thinking.

D. PR: objects, analytical thinking.

Typical experiments require delayed choice and show the differences between PC and IT, which have only temporary stimulus representations, and PFC, which maintains them longer.

EE14130

Role of dopamine

Blocking dopamine has a negative influence on working memory, and enhancing it has a positive influence.

Dopamine (DA) arrives from the VTA (ventral tegmental area). DA strengthens internal activations, regulating access to working memory. The VTA displays such increased activity.

The basal ganglia can also regulate PFC activity.

TD: temporal difference learning in RL (reinforcement learning)

EE14131

Working memory

Project pfc_maint_updt.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_PFC_Maint_Updt)

A dynamic "gate", the AC, is added to the network with recurrence and learning based on temporal differences (TD).

Inputs: A, B, C, D

Ignore, Store, Recall decide what to do with them.

The PFC is working memory; the AC (adaptive critic) is a reward system (dopamine) controlling the updating of information in the PFC; the hidden layer represents the parietal cortex; hidden 2 maps to the output (frontal cortex). The AC learns to predict the next reward, modulating the strength of internal PFC connections.
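How a critic can learn to predict an upcoming reward from temporal differences can be sketched with tabular TD(0) on a fixed sequence of steps. This is a generic textbook sketch, not the adaptive critic of pfc_maint_updt.proj: the four-step sequence, the learning rate, the discount factor gamma and the function name td_learn are all assumptions.

```python
def td_learn(n_steps=4, episodes=500, lrate=0.2, gamma=0.9):
    """TD(0) value learning on a fixed sequence ending in a reward of 1."""
    value = [0.0] * (n_steps + 1)  # value[n_steps] is the terminal state
    for _ in range(episodes):
        for s in range(n_steps):
            reward = 1.0 if s == n_steps - 1 else 0.0
            # TD error: what happened next minus what was predicted
            delta = reward + gamma * value[s + 1] - value[s]
            value[s] += lrate * delta
    return value[:n_steps]


values = td_learn()
print([round(v, 3) for v in values])
```

After training, the predicted value at each step approaches gamma raised to the number of steps remaining before the reward, so the prediction reaches back to the earliest event in the sequence; the TD error then vanishes for fully predicted rewards, which is the property a dopamine-like gating signal can exploit.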

EE14132

PFC Model

r.wt: one-to-one connections between the input, the hidden layers and the PFC.

The AC has connections with the hidden layer and the PFC, but the reverse connections AC => PFC serve only to modulate.

Act, Step: we observe the minus and plus phases. At first the activation of the PFC and AC is zero; there are two plus steps, the first to change the PFC weights and the second to set the correct signal propagation.

When signal R (Recall) appears, the network will not act correctly at first, and the reward in the AC is 0.

At first the network doesn't know what's going on, learning only on Store and Ignore in hidden layer 2, but sometimes noise in the PFC will cause the correct result and a reward to appear.

View Epoch_log, observe the change in the weights of the AC unit, r.wt.

The weights S => AC should increase and the error will decrease; the yellow line is the number of incorrect predictions of the AC.

View, Grid_log, Clear, act, Step. Store introduces data to the PFC, but Ignore doesn't. After Recall, the PFC is zeroed.

EE14133

A-not-B

Interactions between active and synaptic memory: the weights have already changed but active memory is in a different state. Which wins? These interactions are visible in the developing brains of children at ~8 months (Piaget 1954); experiments have also been done on animals.

A toy (food) is hidden in box A and after a short delay the child (animal) can retrieve it from there. After several repetitions in A, the toy is hidden in box B; the children keep looking in A.

Active memory doesn't work as efficiently in children as synaptic memory; lesions of the prefrontal cortex cause similar effects in adult and infant rhesus monkeys. Children make fewer errors when looking toward the place where the toy was hidden than when reaching for it. There are many interesting variants of this type of experiment and explanations on different levels.

EE14134

Project A-not-B

Decision-making process model: we know that information about place and objects is processed separately, so this information is given on the input: place A, B, C, toy T1 or T2 and cover C1 or C2.

Synaptic memory is realized with standard CPCA Hebbian learning, and active memory with bidirectional connections between representations in the hidden layer.

Output layers: decisions about the direction of looking and reaching.

Looking is activated during every experience; reaching is activated less often, only after the whole set-up is moved toward the child, so these connections undergo weaker learning.

Initial tendency: agreement of looking and reaching on A (weight 0.7). All inputs are connected with hidden neurons, weight 0.3.

Project a_not_b.proj (http://grey.colorado.edu/CompCogNeuro/index.php/CECN1_A_Not_B)

EE14135

Experiment 1

rect_ws = 0.3 sets the strength of recurrent activations in the hidden layer (working memory); changing this parameter simulates a child's development.

View Events: 3 types of events, initial showing 4x, then A 2x, then B 1x. An event has 4 temporal segments: 1) start, pretrial: boxes covered; 2) presentation: toy hidden in A; 3) expectation: toy in A; 4) choice: possible reaching.

Only visible elements are active. View Grid_log; Run performs the entire experiment and turns off the display.

ViewPre shows the pretrials in Grid_log: A is activated.

ViewA shows the A tests, after learning.

ViewB shows the B tests: the network makes an error.

EE14136

Further experiments

Activation in the hidden layer flows toward the representation associated with A.

rect_ws 0.3 => 0.75 for a more mature child. Run, ViewB.

Although synaptic memory didn't change, more efficient working memory enables the correct action.

Try rect_ws = 0.47 and 0.50. What happens? There is no activity: hesitation?

The results depend on the length of the delay; with a shorter delay there are fewer errors.

Delay 3 => 1. Run the tests for rect_ws = 0.47 and 0.50.

What happens with a very young child? rect_ws = 0.15, delay = 3: weak recurrence, weak learning for A.

EE14137

Other types of memory

The traditional approach to memory assumes functional, cognitive, monolithic, canonical representations in memory.

Modeling shows that many interacting systems are responsible for memory, with different characteristics, variable representations and types of information.

Recognition memory: was an element of the list seen earlier?

A "recognition" signal is enough; recall is not necessary.

A hippocampus model is also useful here, since it allows for recall, but this is more than is needed: in recognition memory the central role seems to be played by the perirhinal cortex.

Cued recall: completion of missing information.

Free recall: effects of position in the list (best at the beginning and the end), as well as grouping (chunking) of information.

EE14138

Learning categories

Categorization in psychology: many theories. Classic experiments: Shepard et al. (1961), Nosofsky et al. (1994).

Problems of increasing complexity: division into categories C1, C2, with 3 binary properties: color (black/white), size (small/large), shape (two values).

Type I: one property defines the category.

Type II: two properties, XOR, e.g. Cat A: (black, large) or (white, small), any shape.

Type III-V: one property + increasingly more exceptions.

Type VI: no rule, enumeration.

Difficulty and speed of learning: Type I < II < III ~ IV ~ V < VI
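The first two problem types can be written out explicitly. The sketch below is only illustrative: the 0/1 coding of the features (1 = black, 1 = large), the choice of color as the Type I feature, and the helper relevant_features are assumptions made for the example.

```python
from itertools import product

# Each stimulus is (color, size, shape), every feature coded 0 or 1.
STIMULI = list(product((0, 1), repeat=3))


def type_one(stimulus):
    """Type I: a single feature (here color) decides the category."""
    color, size, shape = stimulus
    return color


def type_two(stimulus):
    """Type II: XOR-like rule on color and size; shape is irrelevant."""
    color, size, shape = stimulus
    return int(color == size)  # (black, large) or (white, small) => Cat A


def relevant_features(rule):
    """Indices of features whose flip changes the category of some stimulus."""
    relevant = []
    for i in range(3):
        for s in STIMULI:
            flipped = tuple(1 - v if j == i else v for j, v in enumerate(s))
            if rule(s) != rule(flipped):
                relevant.append(i)
                break
    return relevant


print("Type I depends on features:", relevant_features(type_one))
print("Type II depends on features:", relevant_features(type_two))
```

Both rules split the eight stimuli evenly, but Type I can be decided by attending to one feature, whereas Type II requires combining two, since neither color nor size alone predicts the category.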

EE14139

Canonical dynamics

What happens in the brain while learning category definitions based on examples? Complex neurodynamics <=> the simplest (canonical) dynamics. For all logical rules, we can write corresponding equations.

For type II problems, or XOR:

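\[
V(x,y,z) = 3xyz + \frac{1}{4}\left(x^2 + y^2 + z^2\right)^2
\]

\[
\dot{x} = -\frac{\partial V}{\partial x} = -3yz - \left(x^2 + y^2 + z^2\right)x
\]

\[
\dot{y} = -\frac{\partial V}{\partial y} = -3xz - \left(x^2 + y^2 + z^2\right)y
\]

\[
\dot{z} = -\frac{\partial V}{\partial z} = -3xy - \left(x^2 + y^2 + z^2\right)z
\]

[Figure: feature space]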
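The attractors of this canonical system can be explored numerically. The sketch assumes that the slide's type II potential reads V(x,y,z) = 3xyz + (x^2 + y^2 + z^2)^2 / 4 with gradient dynamics dx/dt = -dV/dx (and likewise for y and z); the Euler step size, the number of steps and the starting point are illustrative choices.

```python
def descend(x, y, z, dt=0.01, steps=5000):
    """Euler integration of the gradient flow of V = 3xyz + (x^2+y^2+z^2)^2/4."""
    for _ in range(steps):
        r2 = x * x + y * y + z * z
        dx = -3.0 * y * z - r2 * x
        dy = -3.0 * x * z - r2 * y
        dz = -3.0 * x * y - r2 * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
    return x, y, z


end = descend(0.5, 0.8, -0.2)
print(tuple(round(v, 3) for v in end))
```

Under this potential the only minima are the four corners (±1, ±1, ±1) whose coordinates multiply to -1, so each attractor corresponds to one row of the XOR truth table in a -1/+1 coding, and a generic starting point flows into one of them.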

EE14140

Against majority

List: diseases C or R, symptoms PC, PR, I

Disease C is associated with symptoms (PC, I), disease R with (PR, I); C occurs 3 times more often than R. (PC, I) => C, PC => C, I => C.

Predictions "against majority" (Medin, Edelson 1988): although PC + I + PR => C (60%), PC + PR => R (60%).

Neurodynamic attractor basins? PDF in areas {C, R, I, PC, PR}.

Psychological interpretation (Kruschke 1996): PR carries weight because it is a distinctive symptom, although PC is more common. Activation of PR + PC more often leads to result R because the gradient in the direction of R is greater.

EE14141

Learning

Point of view: neurodynamics vs. psychology

Neurodynamics: I+PC is more common => stronger synaptic connections, larger and deeper attractor basins.

Psychology: Symptoms I, PC are typical for C since they happen more often.

Neurodynamics: To avoid attractors around I+PC leading to C, a deeper and more localized attractor around I+PR is created.

Psychology: For rare disease R, symptom I is not distinctive, so attention focuses on PR associated with R.

EE14142

Testing

Point of view: neurodynamics vs. psychology

Neurodynamics: Activating only I leads to C, since more examples of I+PC create a larger shared attractor basin than I+PR.

Psychology: I => C, in accordance with expectations; the more frequent stimuli I+PC are recalled more often.

Neurodynamics: Activation by I+PC+PR frequently leads to C, because I+PC puts the system in the middle of the large C basin, and even with PR the gradients still lead to C.

Psychology: I+PC+PR => C because all symptoms are present and C is more frequent (base rates again).

Neurodynamics: Activation by PR+PC leads more frequently to R, because the attractor basin for R is deeper and the gradient at (PR, PC) leads to R.

Psychology: PC+PR => R because PR is a distinctive symptom, although PC is more common.

EE14143

Summary

Knowledge formed in memory is constructed, dynamic, continuous, emergent.

Behavior and inhibition of knowledge are the result of dynamic information processing rather than of interaction structures imposed from the top.

Recognition is based on the ability to differentiate earlier-learned activations from new, unknown activations.

The hippocampus ensures high-quality recognition with a high threshold guaranteeing association of earlier-learned activations.

Priming contributes to the slow building of invariant representations.

Two learning mechanisms:

Based on connection weights

Based on neuron activation

EE14144

Summary

The cortex helps recognition by priming.

The cortex leads to unstimulated associations.

The cortex is responsible for working memory, cooperating with the hippocampus.

Sequences of grouped representations are stored in long-term memory.

Memory based on activation requires combining fast updating with stable representations.

The hippocampus uses sparse distributed representations for fast learning without mixing ideas.

Priming memory can be long-term (based on weights) or short-term (based on activation).