

What is modularity good for?

Michael S. C. Thomas, Neil A. Forrester, Fiona M. Richardson ([email protected], [email protected], [email protected])

Developmental Neurocognition Lab, Birkbeck College, University of London, UK.


Abstract

Method

Results

Acknowledgements

This research was supported by UK MRC CE Grant G0300188 to Michael Thomas.

References

1. Calabretta, R., Di Ferdinando, A., Wagner, G. P., & Parisi, D. (2003). What does it take to evolve behaviorally complex organisms? Biosystems, 69, 245-262.
2. Fodor, J. A. (1983). The modularity of mind. CUP.
3. Pinker, S. (1991). Rules of language. Science, 253, 530-535.
4. Pinker, S. (1997). How the mind works. Allen Lane.
5. Pinker, S. (1999). Words and rules. London: Weidenfeld & Nicolson.
6. Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron. Cognition, 38, 43-102.

Table 1. Architectures with different modular commitments

Discussion & Conclusions

The modular solution was the least efficient use of the computational resources. How general is this finding? The results suggest that in adaptive systems, co-operative use of resources to drive outputs is better than competitive use when the mechanisms receive the same input. The modular solution may be superior when a common input must drive separate outputs, and the two output tasks rely on independent information present in the input.

Figure 1. Computational resources used to learn past tense problem. (S) selection mechanism allows components to learn domain-specific mappings. (C) competition mechanism determines which mechanism drives the output (cf. Pinker’s ‘blocking’ device). Modular solution used S+C. Emergent solution uses neither. Redundant solution uses only C.

Modularity was initially proposed as an efficient architecture for low-level perceptual processing (Fodor, 1983). Latterly it was extended as a principle that might apply to high-level cognition, with the architecture shaped by natural selection (Pinker, 1997).

How do modular systems learn their abilities? Table 1 shows some simple architectures with different modular commitments. Calabretta et al. (2003) trained a network with a common visual input to output ‘what’ and ‘where’ information about objects presented on the retina. They found modular processing channels were the optimal architecture for learning (Table 1, #7 was better than #5). ‘What’ and ‘where’ information is independent and modularity prevents interference.

Pinker (1991) proposed that modularity would aid language development. E.g., in the English past tense, there is a duality between regular verbs (talk-talked), whose rule generalises to novel forms (wug-wugged), and exceptions (hit-hit, drink-drank, go-went). When children learn the past tense, they show intermittent over-application of the rule to exceptions (e.g., *drinked). Pinker argued for a modular architecture, with a rule-learning mechanism and an exception-learning mechanism. Over-application errors arise as the child learns to co-ordinate the mechanisms. However, the model has never been implemented.

We explored the developmental trajectory of a modular approach to past tense acquisition (Table 1, #3), and contrasted it with non-modular ways of using the same initial computational resources. Does the modular solution show the predicted advantage?

Introduction

Modular systems have been proposed as efficient architectures for high-level cognition. However, such architectures are rarely implemented as developmental systems. Taking the example of past tense, we find that pre-specified modularity is less efficient for learning than using the same computational resources in different ways (to produce emergent or redundant systems).

Modularity is good when: computational components drive separate outputs and the information required by each output is independent.

Modularity is bad when: components receive information from a common input and have to drive a common output.

What’s the problem? The modules try to drive the common output in different ways, and the competition between them must be resolved. Co-operative processing is more efficient.
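The contrast between competitive and co-operative use of shared resources can be sketched in toy form (hypothetical code, not the original model). Here two simple modules jointly drive one shared output, and both are trained on the shared output error, as in the emergent architecture described in the Method:

```python
import numpy as np

# Toy sketch: two linear modules driving one shared output,
# combined co-operatively. Both modules are trained on the error of
# the *joint* output, so they divide the labour instead of competing
# to drive the output on their own.
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 4))            # toy input patterns
y = (x[:, 0] > 0).astype(float)[:, None]    # toy binary target

Wa = np.zeros((4, 1))                        # module A weights
Wb = np.zeros((4, 1))                        # module B weights

def joint_output(x, Wa, Wb):
    # co-operative combination: the modules' contributions are summed
    return 1.0 / (1.0 + np.exp(-(x @ Wa + x @ Wb)))

for _ in range(200):
    err = y - joint_output(x, Wa, Wb)        # one shared error signal
    Wa += 0.5 * x.T @ err / len(x)           # both modules adapt to
    Wb += 0.5 * x.T @ err / len(x)           # reduce the same error

acc = ((joint_output(x, Wa, Wb) > 0.5) == (y > 0.5)).mean()
```

In a competitive scheme, by contrast, each module would produce its own response and a competition mechanism would have to decide which one drives the output, which is exactly the conflict described above.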

[Figure 1 diagram labels: Input layer → Hidden layer → Output layer, with (S) selection and (C) competition mechanisms.]

Table 1 (reconstructed from the scattered labels — architectures 1-8 cross input, processing resources, and output as common or separate):

                       OUTPUT:   Common               Separate
        PROCESSING RESOURCES:   Common   Separate    Common   Separate
INPUT   Common                    1         3           5         7
        Separate                  2         4           6         8

Phonological specification of past tense problem (Plunkett & Marchman, 1991)

• Use a 2-layer connectionist network for the optimised rule-learning device (Pinker’s rule learning device not specified in sufficient detail to implement)

• Use a 3-layer connectionist network for the optimal learning of arbitrary associations

Can use these same resources in three ways:

• Pre-specified modularity => 2-layer network trained on regular verbs; 3-layer network trained on exceptions; strongest signal drives output

• Emergent specialisation => 2-layer network and 3-layer network both adapt to reduce error at output; networks demonstrate partial emergent specialisation of function (regulars+rule to 2-layer, exceptions to 3-layer)

• Redundant => 2-layer and 3-layer both separately trained on whole past tense problem; strongest signal drives output
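The three regimes can be summarised in a small configuration sketch (key names are illustrative, not from the original implementation); the selection/competition flags follow Figure 1's description (modular = S+C, emergent = neither, redundant = C only):

```python
# Summary of the three ways the same resources were deployed.
# Key names are illustrative, not from the original implementation.
REGIMES = {
    "modular": {
        "two_layer_trained_on": "regular verbs only",
        "three_layer_trained_on": "exception verbs only",
        "selection": True,    # S: routes verbs to domain-specific modules
        "competition": True,  # C: strongest signal drives the output
    },
    "emergent": {
        "two_layer_trained_on": "all verbs (shared output error)",
        "three_layer_trained_on": "all verbs (shared output error)",
        "selection": False,
        "competition": False,  # both modules co-operatively drive output
    },
    "redundant": {
        "two_layer_trained_on": "all verbs (trained separately)",
        "three_layer_trained_on": "all verbs (trained separately)",
        "selection": False,
        "competition": True,  # strongest signal drives the output
    },
}
```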

All architectures exhibited a phase of interference (*drinked) errors (Fig.2). These were not solely diagnostic of the modular solution.

Was the modular solution best? No, it was worse than both emergent and redundant solutions, and indeed failed to learn the exception verbs to ceiling (Fig.3). The modular solution struggled to resolve the competition between the different output responses of the two modules. Indeed, because the regular mechanism was learning a simpler function, it produced a stronger response than the exception mechanism and generally overrode it.

Note 1: Results were sensitive to hidden unit resource levels in the (3-layer) exception mechanism. Results for both low and high resources shown.

Note 2: Pinker (1999) proposed a Revised Dual Mechanism model, in which the regular mechanism learns regulars but the exception mechanism attempts all verbs. Results also shown for this architecture.

[Figure panels (data points not recoverable): “Low resources - Exception route” and “High resources - Exception route”, plotting % correct responses over training (epochs 1-500) for the Modular, Emergent, Redundant and Revised DM architectures; and “Over-application of the rule to exception verbs”, plotting % interference errors over training for the same architectures.]

Figure 2. Interference errors

Figure 3. Developmental trajectories

[Panel legends: Hit-ed, Drink-ed, Go-ed errors at low and high exception resources; training-set items Talked, Hit, Drank, Went; novel items Wugged, Gktted.]

[Figure 4 panels (data points not recoverable): “Training set” and “Novel items” for the Modular architecture at low and high exception resources, plotting proportion correct / proportion of responses over training (epochs 1-500) for biasing factors 1, 2, 5, 10, 50, 100, 200, 250 and 1000; items plotted: Talked, Hit, Drank, Went; novel items: Wugged, Gktted, Frinked, Frank.]

Figure 4. Developmental trajectories while boosting the signal strength from the exception mechanism (biasing factor x1 to x1000)

The modular solution had fast rule learning and strong generalisation of the rule. But when the signal from the exception mechanism was boosted to allow exceptions to drive the output, so that these verbs could be learned to ceiling, the advantage on rule learning was lost. No level of exception signal boosting gave the modular solution an advantage over emergent or redundant architectures.
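A minimal sketch of how signal boosting interacts with the competition mechanism (the function and the confidence values are illustrative, not taken from the simulations):

```python
import numpy as np

def winner(rule_signal, exc_signal, bias=1.0):
    # Competition mechanism sketch: the route with the stronger
    # (more confident) signal drives the shared output; `bias`
    # multiplies the exception route's signal strength.
    s_rule = np.abs(rule_signal - 0.5).max()
    s_exc = bias * np.abs(exc_signal - 0.5).max()
    return "rule" if s_rule >= s_exc else "exception"

# Hypothetical confidences: the rule route learns a simpler (linear)
# function, so its output is sharper than the exception route's.
rule_signal = np.array([0.95, 0.05])
exc_signal = np.array([0.70, 0.30])

for bias in (1, 10, 100, 1000):
    print(bias, winner(rule_signal, exc_signal, bias))
```

With these illustrative numbers the rule route wins at ×1, but the boosted exception route captures the output at every higher bias level, regular verbs included, which mirrors the trade-off described above: boosting lets exceptions drive the output at the cost of the rule-learning advantage.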