
Exploring Robotic Minds


OXFORD SERIES ON COGNITIVE MODELS AND ARCHITECTURES

Series Editor

Frank E. Ritter

Series Board

Rich Carlson

Gary Cottrell

Robert L. Goldstone

Eva Hudlicka

William G. Kennedy

Pat Langley

Robert St. Amant

Integrated Models of Cognitive Systems

Edited by Wayne D. Gray

In Order to Learn: How the Sequence of Topics Influences Learning

Edited by Frank E. Ritter, Joseph Nerb, Erno Lehtinen, and Timothy O’Shea

How Can the Human Mind Occur in the Physical Universe?

By John R. Anderson

Principles of Synthetic Intelligence PSI: An Architecture of Motivated Cognition

By Joscha Bach

The Multitasking Mind

By David D. Salvucci and Niels A. Taatgen

How to Build a Brain: A Neural Architecture for Biological Cognition

By Chris Eliasmith

Minding Norms: Mechanisms and Dynamics of Social Order in Agent Societies

Edited by Rosaria Conte, Giulia Andrighetto, and Marco Campennì

Social Emotions in Nature and Artifact

Edited by Jonathan Gratch and Stacy Marsella

Anatomy of the Mind: Exploring Psychological Mechanisms and Processes

with the Clarion Cognitive Architecture

By Ron Sun

Exploring Robotic Minds: Actions, Symbols, and Consciousness

as Self- Organizing Dynamic Phenomena

By Jun Tani


Exploring Robotic Minds
Actions, Symbols, and Consciousness as Self-Organizing Dynamic Phenomena

Jun Tani


Oxford University Press is a department of the University of Oxford. It furthers the University's objective of excellence in research, scholarship, and education by publishing worldwide. Oxford is a registered trade mark of Oxford University Press in the UK and certain other countries.

Published in the United States of America by Oxford University Press, 198 Madison Avenue, New York, NY 10016, United States of America.

© Oxford University Press 2017

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press, or as expressly permitted by law, by license, or under terms agreed with the appropriate reproduction rights organization. Inquiries concerning reproduction outside the scope of the above should be sent to the Rights Department, Oxford University Press, at the address above.

You must not circulate this work in any other form and you must impose this same condition on any acquirer.

Library of Congress Cataloging-in-Publication Data
Names: Tani, Jun, 1958– author.
Title: Exploring robotic minds : actions, symbols, and consciousness as self-organizing dynamic phenomena / Jun Tani.
Description: Oxford; New York: Oxford University Press, [2017] | Series: Cognitive models and architectures | Includes bibliographical references and index.
Identifiers: LCCN 2016014889 (print) | LCCN 2016023997 (ebook) | ISBN 9780190281069 (hardcover : alk. paper) | ISBN 9780190281076 (UPDF)
Subjects: LCSH: Artificial intelligence. | Robotics. | Cognitive neuroscience.
Classification: LCC Q335 .T3645 2017 (print) | LCC Q335 (ebook) | DDC 629.8/9263—dc23
LC record available at https://lccn.loc.gov/2016014889

9 8 7 6 5 4 3 2 1

Printed by Sheridan Books, Inc., United States of America


Contents

Foreword by Frank E. Ritter

Preface

Part I On the Mind

1. Where Do We Begin with Mind?

2. Cognitivism
2.1 Composition and Recursion in Symbol Systems
2.2 Some Cognitive Models
2.3 The Symbol Grounding Problem
2.4 Context
2.5 Summary

3. Phenomenology
3.1 Direct Experience
3.2 The Subjective Mind and Objective World
3.3 Time Perception: How Can the Flow of Subjective Experiences Be Objectified?
3.4 Being-in-the-World
3.5 Embodiment of Mind
3.6 Stream of Consciousness and Free Will
3.7 Summary


4. Introducing the Brain and Brain Science
4.1 Hierarchical Brain Mechanisms for Visual Recognition and Action Generation
4.2 A New Understanding of Action Generation and Recognition in the Brain
4.3 How Can Intention Arise Spontaneously and Become an Object of Conscious Awareness?
4.4 Deciding Among Conflicting Evidence
4.5 Summary

5. Dynamical Systems Approach for Modeling Embodied Cognition
5.1 Dynamical Systems
5.2 Gibsonian and Neo-Gibsonian Approaches
5.3 Behavior-Based Robotics
5.4 Modeling the Brain at Different Levels
5.5 Neural Network Models
5.6 Neurorobotics from the Dynamical Systems Perspective
5.7 Summary

Part II Emergent Minds: Findings from Robotics Experiments

6. New Proposals
6.1 Robots with Subjective Views
6.2 Engineering Subjective Views into Neurodynamic Models
6.3 The Subjective Mind and the Objective World as an Inseparable Entity

7. Predictive Learning About the World from Actional Consequences
7.1 Development of Compositionality: The Symbol Grounding Problem
7.2 Predictive Dynamics and Self-Consciousness
7.3 Summary


8. Mirroring Action Generation and Recognition with Articulating Sensory–Motor Flow
8.1 A Mirror Neuron Model: RNNPB
8.2 Embedding Multiple Behaviors in Distributed Representation
8.3 Imitating Others by Reading Their Mental States
8.4 Binding Language and Action
8.5 Summary

9. Development of Functional Hierarchy for Action
9.1 Self-Organization of Functional Hierarchy in Multiple Timescales
9.2 Robotics Experiments on Developmental Training of Complex Actions
9.3 Summary

10. Free Will for Action and Conscious Awareness
10.1 A Dynamic Account of Spontaneous Behaviors
10.2 Free Will, Consciousness, and Postdiction
10.3 Summary

11. Conclusions
11.1 Compositionality in the Cognitive Mind
11.2 Phenomenology
11.3 Objective Science and Subjective Experience
11.4 Future Directions
11.5 Summary

Glossary for Abbreviations

References

Index


Foreword

Frank E. Ritter

This book describes the background and results from Jun Tani's work over more than a decade of building robots that think and learn through interaction with the world. It has numerous useful and deep lessons for modelers developing and using symbolic, subsymbolic, and hybrid architectures, so I am pleased to see it in the Oxford Series on Cognitive Models and Architectures. It is work in the spirit of Newell and Simon's (1975) theory of empirical exploration of computer science topics and their work on generation of behavior, and it takes Newell and Simon's and Feynman's motto of understanding through generation of behavior seriously. At the same time, this work extends the physical symbol hypothesis in a very useful way by suggesting by example that the symbols of human cognition need not be discrete symbols manually fed into computers (which we have often done in symbolic cognitive architectures), but can instead be composable neurodynamic structures arising through iterative learning of perceptual experience with the physical world.

Tani's work has explored some of the deep issues in embodied cognition: how interaction with the environment happens, what this means for representation and learning, and how more complex behavior can be created, or how it arises, through simpler aspects. These lessons include insights about the role of interaction with the environment, about consciousness and free will, and about how to build neural net architectures to drive behavior in robots.


The book starts with a review of the foundations of this work, including some of the philosophical foundations in this area (including the symbol grounding problem, phenomenology, and the role of time in thinking). It argues for a role of hierarchy in modeling cognition, and for modeling and understanding interaction with an external world. The book also notes that state space attractors can be a useful concept in understanding cognition, and, I would add, this could be a useful additional way to measure the fit of a model to behavior. This review also reminds us of areas that current symbolic models have been uninformed by; I don't think that these topics have been so much ignored as put on a list for later work. These aspects are becoming more timely, as Tani's work shows they can be. The review chapters make this book particularly useful as an advanced textbook, which Tani already uses it for.

Perhaps more importantly, in the second half of the book (Chapters 6 to 11) Tani describes lessons from his own work. This work argues that behavior is not always programmed or extant in a system, but that it can, and often should, arise in systems attempting to achieve homeostasis; that there are positions of stability in a mental representation (including modeling others and imitation); and that differences in knowledge between the levels can give rise to effects that might be seen as a type of consciousness: a mental trace of what lower levels should do or are doing, or explanations of what they have done based on predictions of the agent's own behavior, a type of self-reflexive mental model. These results suggest that more models should model homeostasis and include more goals and knowledge about how to achieve it.

His work provides another way of representing and generating behavior. This way emphasizes the dynamic behavior of systems rather than the data structures used in more traditional approaches. The simple ideas of evolution of knowledge, feedback, attractors, and further concepts provide food for thought for all systems that generate behavior. These components are reviewed in the first part of the book. The second part of the book also presents several systems used to explore these ideas.

Lessons from this book could and should change how we see all kinds of cognitive architectures. Many of these concepts have not yet been noticed in symbolic architectures, but they probably exist in them. This new way to examine behavior in architectures has already provided insights about learning, interaction, and consciousness. Using these concepts in existing architectures and models will provide new insights into how compositional thoughts and actions can be generated without facing the notorious symbol grounding problem or, ultimately, the mind–body problem.

In his work about layers of representation, he has seen that higher levels might not just lead the lower levels, but also follow them, adjusting their own settings based on the lower levels' behavior. An interpretation of the higher levels trying to follow or predict the lower levels provides a potential computational description and explanation of some forms of consciousness and free will. I found these concepts particularly intriguing: not only that higher levels could follow rather than lead lower levels, but that the mismatch could lead to a kind of postdiction in which an intention enters conscious awareness only after the action. We might see this elsewhere as other architectures, their environments, and their interactions with the environment become more complex, and indeed we should look for it.

I hope you find the book as useful, and as suggestive of new areas of work and new aspects of behavior to consider including in architectures, as I have.


Preface

The mind is ever elusive, and imagining its underlying mechanisms remains a constant challenge. This book attempts to show a clear picture of how the mind might work, based on tangible experimental data I have obtained over the last two decades during my work to construct the minds of robots. The essential proposal of the book is that the mind comprises emergent phenomena, which appear via intricate and often conflictive interactions between the top-down subjective view for proactively acting on the external world and the bottom-up recognition of the resultant perceptual reality. This core idea can provide a scaffold to account for the various fundamental aspects of the mind and cognition. Allowing entangling interactions between the top-down and bottom-up processes means that the skills we need to generate complex actions, the knowledge and concepts for representing the world, and the linguistic competency we need to express our experiences can develop naturally, and the cogito¹ that allows this "compositional" yet fluid thinking and action appears to be embedded in dynamic neural structures.

The crucial argument here is that this cogito is free from the problems inherent in Cartesian dualism, such as the problem of interaction: how a nonmaterial mind can cause anything in a material body and world, and vice versa. We avoid such problems because the cogito embedded in the continuous state space of dynamic neural systems is also matter, rather than nonmatter composed of a discrete symbol system or logic. Therefore, the cogito can interact physically with the external world: as one side pushes forward a little, the other side pulls back elastically, so that a point of compromise can be found in conflictive situations through iterative dynamics. It is further proposed that even the nontrivial problems of consciousness (what David Chalmers has called the hard problem of consciousness) and free will can become accessible by considering that consciousness is also an emergent phenomenon of matter, arising inevitably from such conflictive interactions. The matter here is alive and vivid in never-ending trials by the cogito to comprehend an ever-changing reality in an open-ended world. Each of these statements, which are my proposals on the workings of the mind, will be examined systematically by reviewing multidisciplinary discussions, largely from the fields of neuroscience, phenomenology, nonlinear dynamics, psychology, cognitive science, and cognitive robotics. Indeed, the book aims for a unique way of understanding the mind through a rather unusual but inspiring combination of ingredients: humanoid robots, Heidegger's philosophy, deep learning neural nets, strange attractors from chaos theory, mirror neurons, Gibsonian psychology, and more.

1. Cogito is from a Latin philosophical proposition by Rene Descartes, "Cogito ergo sum," which has been translated as "I think, therefore I am." Here, cogito denotes a subject of cognizing or thinking.

The book has been written with a multidisciplinary audience in mind. Each chapter starts by presenting general concepts or tutorials on a discipline (cognitive science, phenomenology, neuroscience and brain science, nonlinear dynamics, and neural network modeling) before exploring the subjects specifically in relation to the emergent phenomena which I believe constitute the mind. By providing a brief introduction to each topic, I hope that a general audience and undergraduate students with a specific interest in this subject will enjoy reading on to the more technical parts of the book that describe the neurorobotics experiments.

I have debts of gratitude to many people. First of all, I thank Jeffrey White for plenty of insightful advice on this manuscript in regard to its contents, as well as for editing its English and examining every page. I would like to commend and thank all members of my former laboratory at RIKEN, as well as of my current one at the Korea Advanced Institute of Science and Technology (KAIST), who, over the years, have contributed to the research described in this book. I am lucky to have many research friends with whom I can have in-depth discussions about shared interests. Takashi Ikegami has been one of the most inspiring. His strokes of genius and creative insights on the topics of life and the mind are irreplaceable. I admit that many of my research projects described in this book have been inspired by thoughtful discussions with him. Ichiro Tsuda provided me with deep thoughts about possible roles of chaos in the brain. The late Joseph Goguen and the late Francisco Varela generously offered me much advice about the links between neurodynamics and phenomenology. Karl Friston has provided me with thoughtful advice on our shared research interests on many occasions. Michael Arbib offered insight into the concept of action primitives and mirror neuron modeling. He kindly read my early draft and sent it to Oxford University Press. I have been inspired by frequent discussions about developmental robotics with Minoru Asada and Yasuo Kuniyoshi. I would like to express my gratitude and appreciation to Masahiro Fujita, Toshitada Doi, and Mario Tokoro of Sony Corporation, who kindly provided me with the chance to start my neurorobotics studies more than two decades ago in an elevator hall in a Sony building. I must thank Masao Ito and Shun-ichi Amari at RIKEN Brain Science Institute for their thoughtful advice on my research in general. And I express my gratitude to Miki Sagara, who prepared many figures. I am grateful to Frank Ritter who, as the editor of the Oxford series on cognitive models and architectures, kindly provided me with advice and suggestions, from micro details to macro levels of this manuscript, during its development. The book could not have been completed in its present form without his input. I wish to thank my Oxford University Press editor Joan Bossert for her cordial support and encouragement from the beginning. Finally, my biggest thanks go to my wife, Tomoko, who professionally photographed the book's cover image; my son, Kentaro; and my mother, Harumi. I could not have completed this book without their patient and loving support.

This book is dedicated to the memory of my father, Yougo Tani, who ignited my interest in science and engineering before he passed away during my childhood. Some additional resources, such as robot videos, can be found at https://sites.google.com/site/tanioupbook/home. Finally, this work was partially supported by the RIKEN BSI Research Fund (2010-2011) and the 2012 KAIST Settlement and Research of New Instructors Fund, titled "Neuro-Robotics Experiments with Large Scale Brain Networks."


Part I

On the Mind


1

Where Do We Begin with Mind?

How do our minds work? Sometimes I notice that I act without much consciousness, for example, when reaching for my mug of coffee on the table, putting on a jacket, or walking to the station for my daily commute. However, if something unexpected happens, like I fail to grasp the mug properly or the road to the station is closed due to roadwork, I suddenly become conscious of my actions. How does this consciousness arise at such moments? In everyday conversation, my utterances are generated smoothly. I automatically combine words in the correct order and seldom consciously manipulate grammar when speaking. How is this possible? Although it seems that many of our thoughts and actions are generated either consciously or unconsciously by utilizing knowledge or concepts in terms of images, rules, and symbols, I wonder how they are actually stored in our memories and how they can be manipulated in our minds. When I'm doing something like making a cup of coffee, my actions as well as my thoughts tend to shift freely, from getting out the milk to looking out the window to thinking about whether to stay in for lunch today. Is this spontaneous switching generated by my will? If so, how is such will initiated in my mind in the first place? Mostly, my everyday thinking or action follows routines, habituation, or social conventions. Nevertheless, sometimes novel images, thoughts, or acts can be created. How are they generated? Finally, a somewhat philosophical question arises: how can I believe that this world really exists without my subjectively thinking about it? Does my subjective mind subsume the reality of the world, or is it the other way around?

The mind is one of the most curious and miraculous of things. We know that the phenomena of the mind, like those just described, originate in the brain: we often hear scientists saying that our minds are the products of "entangled" activities of neurons firing, synapse modulations, neuronal chemical reactions, and more. Although the scientific literature contains an abundance of detailed information about such biological phenomena in the brain, it is still difficult to find satisfactory explanations of how the mind actually works. This is because each piece of detailed knowledge about the biological brain cannot as yet be connected together well enough to produce a comprehensive picture of the whole. But understanding the mind is not only the remit of scientists; it is and has always been the job of philosophers, too. One of the greatest philosophers, Aristotle, asserted that "The mind is the part of the soul by which it knows and understands" (Aristotle, Trans. 1907). It is hard, however, to link such metaphysical arguments to the actual biological reality of the brain.

Twenty-five years ago, I was a chemical plant engineer with no such thoughts about the brain, consciousness, and existence, until something wonderful happened by chance that started me thinking about these things seriously. One day I traveled to a chemical plant site in an isolated area of northern Japan to examine a hydraulic system consisting of piping networks. The pipeline I saw there was huge, with a diameter of more than 1.5 m and a total length of around 20 km. It originated in a shipyard about 10 km away from the plant, and inside the plant yard it was connected to a complex of looping networks equipped with various functional components such as automatic control valves, pumps, surge accumulators, and tanks.

I was conducting an emergency shutdown test of one of the huge main valves downstream in the pipeline when, immediately after valve shutdown, I was terrified by the thundering noise of the "water hammer" phenomenon, the loud knocking heard in a pipe caused by an abrupt pressure surge upstream of the valve. Several seconds later I heard the same sound arising from various locations around the plant yard, presumably because the pressure surge had propagated and was being reflected at various terminal ends in the piping network. After some minutes, although the initial thunderous noise had faded, I noticed a strange coherence of sounds occurring across the yard. I heard "a pair" of water hammers at different places, seeming to respond to each other periodically. This coherence appeared and disappeared almost capriciously, arising again in other locations. I went back to the plant control room to examine the operation records, plotting the time history of the internal pressure at various points in the piping network. As I had thought, the plots showed oscillatory patterns of pressure spikes appearing at certain points and tending to transform into other oscillatory patterns within several minutes. Sometimes these patterns seemed to form in a combinatory way, with a set of patterns appearing in different combinations with other sets. At that point I jumped on a bicycle to search for more water hammers around the plant yard, even though it was already dusk. Hearing this mysterious ensemble of roaring pipes in the darkness, I felt as if I were exploring inside a huge brain where consciousness arose. In the next moment, however, I stopped and reflected to myself that this was not actually a mystery at all, but complex transient phenomena involving physical systems, and I thought then that this might explain the spontaneous nature of the mind.

I had another epiphany several months later when, together with my fellow engineers, I had the chance to visit a robotics research laboratory, one of the most advanced of its kind in Japan. The researchers there showed us a sophisticated mobile robot that could navigate around a room guided by a map preprogrammed into the robot's computer. During the demonstration the robot maneuvered around the room, stopped in front of some objects, and said in a synthesized voice, "This is a refrigerator," "This is a blackboard," and "This is a couch." While we all stood amazed at seeing the robot correctly naming the objects around us, I asked myself how the robot could know what a refrigerator meant. To me, a refrigerator means the smell of refreshing cool air when I open the door to get a beer on a long hot summer day. Surely the robot didn't understand the meaning of a refrigerator or a chair in such a way, as these items were nothing more to it than landmarks on a registered computational map. The meanings of these items to me, however, would materialize as the result of my own experiences with them, such as the smell of cool air from the refrigerator or the feeling of my body sinking back into a soft chair as I sit down to drink my beer. Surely the meanings of the various things in the world around us are formed in our brains through the accumulation of our everyday experiences of interacting with them. In the next moment I started to think about building my own robot, one that could have a subjective mind, experience feelings, imagine things, and think about the world by interacting with it. I also had some vague notion that a subjective mind should involve dynamic phenomena fluttering between the conscious and unconscious, just as with the water hammers that had captured my imagination a few months earlier.

Sometime later I went back to school, where I studied many subjects related to the mind and cognition, including cognitive science, robotics, neuroscience, neural network modeling, and philosophy. Each discipline seemed to have its own specific way of understanding the mind, and each approached its problems too narrowly to exchange ideas and views with the other disciplines. No single discipline could fully explain what the mind is or how it works. I simply didn't believe that one day a super genius like Einstein would come along and show us a complete picture of the mind; rather, I suspected that a good understanding, if attainable, would come from a mutual, relational understanding between multiple disciplines, enabling new findings and concepts in one domain to be explained using different expressions in other disciplines.

It was then that it came to me that building robots while taking a multidisciplinary approach could well produce a picture of the mind. The current book presents the outcome of two decades of research under this motivation.

* * *

This book asks how natural or artificial systems can host cognitive minds characterized on the one hand by higher-order cognitive capabilities such as compositionality, and on the other by autonomy in generating spontaneous interactions with the outer world, either consciously or unconsciously. The book draws answers from examination of synthetic neurorobotics experiments conducted by the author. The underlying motivation of this study differs from that of conventional intelligent robotics studies, which aim to design or program functions that generate intelligent actions. The aim of synthetic neurorobotics studies is to examine experimentally the emergence of nontrivial mindlike phenomena through dynamic interactions, under specific conditions and for various "cognitive" tasks. It is like examining the emergence of nontrivial patterns of water hammer phenomena under the specific operational conditions applied in complex pipeline networks.


The synthetic neurorobotics studies described in this book have two foci. One is to make use of dynamical systems perspectives to understand the various intricate mechanisms characterizing cognitive minds. The dynamical systems approach has been known to be effective in articulating the mechanisms underlying the development of various functional structures by applying the principles of self-organization from physics (Nicolis & Prigogine, 1977; Haken, 1983). Structures and functions that mechanize higher-order cognition, such as the compositional manipulation of "symbols," concepts, or linguistic thoughts, may develop by means of self-organization in internal neurodynamic systems via the consolidative learning of experience. The other focus of these neurorobotics studies is on the embodiment of cognitive processes, which is crucial to understanding the circular causality arising between body and environment as aspects of mind extend beyond the brain.

This naturally brings us to the distinction between the subjective mind and the objective world. Our studies emphasize top-down intentionality on the one hand, by which our own subjective images, views, and thoughts, consolidated into structures through past experience, are proactively projected onto the objective world, guiding and accompanying our actions. Our studies also emphasize bottom-up recognition of perceptual reality on the other hand, which results in the modification of top-down intention in order to minimize gaps or errors between our prior expectations and actual outcomes. The crucial focus here is on the circular causality that emerges as the result of iterative interactions between these two processes: the top-down subjective intention of acting on the objective world and the bottom-up recognition of the objective world with modification of that intention. My intuition is that the key to unlocking all of the mysteries of the mind, including our experiences of consciousness as well as free will, is hidden in this as yet unexplored phenomenon of circular causality and the structure within which it occurs. Moreover, close examination of this structure might help us address the fundamental philosophical problem brought to the fore by mind/body dualism: how the subjective mind and the objective world are related. The synthetic robotics approach described in this book seeks to answer this fundamental question through the examination of actual experimental results from the viewpoints of various disciplines.

This book is organized into two parts, namely "Part I: On the Mind," from chapter 1 to chapter 5, and "Part II: Emergent Minds: Findings from Robotics Experiments," from chapter 6 to chapter 11. In Part I, the book reviews how problems with cognitive minds have been explored in different research fields, including cognitive science, phenomenology, brain science, neural network modeling, psychology, and robotics. These in-depth reviews will provide general readers with a good introduction to the relevant disciplines and should help them to appreciate the many conflicting arguments about the mind and brain active therein. Part II starts with new proposals for tackling these problems through neurorobotics experiments and, through analysis of their results, comes out with some answers to fundamental questions about the nature of the mind. In the end, this book traces my own journey in exploration of the fundamental nature of the mind, and in retracing this journey I hope to deliver an intuitively accessible account of how the mind works.


2

Cognitivism

One of the main forces to have advanced the study of the mind over the last 50 years is cognitivism. Cognitivism regards the mind as an externally observable object that can best be articulated with symbol systems in computational metaphors, and this approach has become successful as the speed and memory capacity of computers have grown exponentially. Let us begin our discussion of cognitivism by looking at the core ideas of cognitive science.

2.1. Composition and Recursion in Symbol Systems

The essence of cognitivism is well represented by the principle of compositionality (i.e., the meaning of the whole is a function of the meanings of the parts), specifically as expounded by Gareth Evans (1982) in regard to language. According to Evans, the principle asserts that the meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them (sentences are composed from sequences of words). However, its central notion, that the whole can be decomposed into reusable parts (or primitives), is applicable to other faculties, such as action generation. Indeed, Michael Arbib (1981), in his motor schemata theory, which was published not long before Evans' work on language, proposed that complex, goal-directed actions can be decomposed into sequences of behavior primitives. Here, behavior primitives are sets of commonly used behavior pattern segments, or motor programs, that are put together to form streams of continuous sensory–motor flow. Cognitive scientists have found a good analogy between the compositionality of mental processes, like combining the meanings of words into those of sentences or combining the images of behavior primitives into those of goal-directed actions "at the back of our mind," and the computational mechanics of combinatorial operations on operands. In both cases we have concrete objects (symbols) and distinct procedures for manipulating them in our brains. Because these objects to be manipulated, whether by computers or in mental processes, are symbols without any physical dimensions such as weight, length, speed, or force, their manipulation is considered to be cost free in terms of time and energy consumption. When such a symbol system, comprising arbitrary shapes of tokens (Harnad, 1992), is provided with recursive functionality for operations on those tokens, it achieves compositionality with an infinite range of expressions.

Noam Chomsky, famous for his revolutionary ideas on generative grammar in linguistics, has advocated that recursion is a uniquely human cognitive competency. Chomsky and colleagues (Hauser, Chomsky, & Fitch, 2002) proposed that the human brain might host two distinct cognitive competencies: the so-called faculty of language in a narrow sense (FLN) and the faculty of language in a broad sense (FLB). FLB comprises a sensory–motor system, a conceptual–intentional system, and the computational mechanisms for recursion that allow for an infinite range of expressions from a finite set of elements. FLN, on the other hand, involves only recursion and is regarded as the uniquely human aspect of language. FLN is thought to generate internal representations by utilizing syntactic rules and to map them to the sensory–motor interface via the phonological system, as well as to the conceptual–intentional interface via the semantic system.

Chomsky and colleagues admit that some animals other than humans can exhibit certain recursion-like behaviors with training. Chimps have become able to count the number of objects on a table by indicating, by association, a corresponding panel representing the correct number of objects. The chimps became able to count up to around five objects correctly, but one or two errors creep in for more than five objects: the more objects to count, the more inaccurate the chimps become. Another example of recursion-like behavior in animals is cup nesting, a task in which each cup varies in size so that the smallest cup fits into the second smallest, which in turn can be "nested" or "seriated" into larger cups. Observing chimps and bonobos cup nesting, Johnson-Pynn and colleagues (1999) found that performance differed by species as well as among individuals; some individuals could nest only two different sizes of cups, whereas others could pair three by employing a subassembly strategy, that is, nesting a small cup in a medium-size cup as a subassembly and then nesting both in a large cup. However, the number of nestings never reliably went beyond three. Similar limitations in cup nesting performance have been observed in parrots (Pepperberg & Shive, 2001) and the degu, a small rat-sized rodent (Tokimoto & Okanoya, 2004).

These observations of animals' object counting and cup nesting behaviors suggest that, although some animals can learn to perform recursion-like behaviors, the depth of recursion is quite limited, particularly when contrasted with humans, in whom an almost infinite depth of recursion is possible as long as time and physical conditions allow. Chomsky and colleagues thus speculated that the human brain might be uniquely endowed with the FLN component that enables infinite recursion in the generation of various cognitive behaviors, including language. What, then, is the core mechanism of FLN? It seems to be a recursive call of logical rules. In counting numbers, the logical rule "add one to the currently memorized number" is recursively called: starting with the currently memorized number set to 0, it is increased to 1, 2, 3, … , infinity as the "add one" rule is called at each recursion. Cup nesting can be performed infinitely when the logical rule "put the next smallest cup in the current nesting cup" is recursively called. Similarly, in the recursive structure of sentences, clauses nest inside other clauses, and in sentence generation the recursive substitution of one of the context-free grammar rules for each variable could generate sentences of infinite length after starting with the symbol "S" (see Figure 2.1 for an illustrative example).

Chomsky and colleagues' crucial argument is that the core aspect of recursion is not a matter of what has been learned or developed over a lifetime but of what has been implemented as an innate function in the faculty of language in a narrow sense (FLN). In their view, what is to be learned or developed are the interfaces from this core recursion ability to the sensory–motor systems or semantic systems in the faculty of language in a broad sense (FLB). They assert that the unique existence of this core recursive aspect of FLN is an innate component that positions human cognitive capability at the top of the hierarchy of living systems.

Such a view is contentious, though. First, it is not realistic to assume that we humans perform infinite recursions in everyday life. We can neither count infinitely nor generate or recognize sentences of infinite length. Chomsky and colleagues, however, see this not as a problem of FLN itself but as a problem of external constraints (e.g., a limitation of working memory size in FLB for remembering currently generated word sequences) or of physical time constraints that hamper performing infinite recursions in FLN. Second, are symbols actually manipulated recursively somewhere in our heads when counting numbers or generating and recognizing sentences? If there are fewer than six objects on a table, their number is grasped analogically from visual patterns; if there are more than six objects, we may start to count them one by one on our fingers. In our everyday conversations we generally talk without much concern for spoken grammar: our colloquialisms seem to be generated not by consciously combining individual words following grammatical rules, but by automatically and subconsciously

[Figure 2.1 appears here: on the left, the context-free grammar rules listed below; on the right, the parsing tree of an example sentence built from the words "Small," "dogs," "like," and "cats."]

R1: S → NP VP
R2: NP → (A NP)/N
R3: VP → V NP
R4: A → Small
R5: N → dogs/cats
R6: V → like

Figure 2.1. On the left is a context-free grammar (CFG) consisting of a set of rules, and on the right is an example sentence that can be generated by recursive substitutions of the rules, with the starting symbol "S" allocated to the top of the parsing tree. Note that the same CFG can generate different sentences, even those of infinite length, depending on the nature of the substituting rules (e.g., repeated substitutions of R2: NP → A NP).
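To make the mechanics concrete, the following is a minimal sketch, in Python, of recursive sentence generation with the CFG of Figure 2.1. The function names and the depth limit are my own illustrative choices, not part of the original discussion; without such a limit, repeated substitution of the recursive rule R2 (NP → A NP) could in principle continue forever, which is exactly the "infinite range of expressions" described above.

```python
import random

# The grammar of Figure 2.1, keyed by nonterminal. Each nonterminal maps to
# its alternative right-hand sides; the last alternative of each eventually
# bottoms out in terminal words.
GRAMMAR = {
    "S":  [["NP", "VP"]],           # R1
    "NP": [["A", "NP"], ["N"]],     # R2: the recursive rule
    "VP": [["V", "NP"]],            # R3
    "A":  [["Small"]],              # R4
    "N":  [["dogs"], ["cats"]],     # R5
    "V":  [["like"]],               # R6
}

def generate(symbol, max_depth=6):
    """Recursively substitute grammar rules, starting from `symbol`."""
    if symbol not in GRAMMAR:            # terminal: emit the word itself
        return [symbol]
    if max_depth <= 0:                   # force the non-recursive alternative
        expansion = GRAMMAR[symbol][-1]
    else:
        expansion = random.choice(GRAMMAR[symbol])
    words = []
    for s in expansion:
        words.extend(generate(s, max_depth - 1))
    return words

print(" ".join(generate("S")))           # e.g. "Small dogs like cats"
```

Raising max_depth lets R2 stack arbitrarily many adjectives, so the same six rules yield an unbounded set of sentences from a finite vocabulary.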


combining phrases. However, when needing to write complex embedded sentences, such as those often seen in formal documents, we sometimes find ourselves consciously dealing with grammar in our search for appropriate word sequences. Thus, the notion of infinite levels of recursion in FLN might apply only rarely to human cognition. In everyday life, it seems unlikely that an infinite range of expressions would be used.

Many cognitive behaviors in everyday life do still, of course, require some level of manipulation that involves composition or recursion of information. For example, generating goal-directed action plans by combining behavior primitives into sequences cannot be accounted for by the simple involuntary mapping of sensory inputs to motor outputs. It requires some level of manipulation of internal knowledge about the world, yet it does not involve infinite complexity. How is such processing done? One possibility might be to use the core recursive component of calling logical rules in FLN, under the limitation of finite levels of recursion. Another possibility might be to assume subrecursive functions, embedded in analogical processes rather than logical operations in FLB, that can mimic recursive operations to finite levels. Cognitivism embraces the former possibility, with its strong conviction that the core aspect of cognition should reside in a framework of symbol representation and manipulation. But, if we are to assume that symbols play a central role in cognition, how could symbols comprising arbitrary shapes of tokens convey the richness of meaning and context we see in the real world? For example, a typical artificial intelligence system may represent an "apple" with the features "color-is-RED" and "shape-is-SPHERE." However, this merely describes the meaning of a symbol by way of other symbols, and I am not sure how my everyday experience with apples could be represented in this form.

2.2. Some Cognitive Models

This section looks at some cognitive models that have been developed to solve general cognitive tasks by utilizing the aforementioned symbolist framework. The General Problem Solver (GPS) (Newell & Simon, 1972; Newell, 1990), developed by Allen Newell and Herbert A. Simon, is one such typical cognitive model, and it has made a significant impact on the subsequent direction of artificial intelligence research. Numerous systems such as ACT-R (Anderson, 1983) and Soar (Laird et al., 1987) use this rule-based approach, although it has a crucial problem, as is shown later.

GPS provides a core set of operations that can be used to solve cognitive problems in various task domains. In solving a problem, the problem space is defined in terms of the goal to be achieved, the initial state, and the transition rules. Following a means-end analysis approach, the goal to be achieved is divided into subgoals, and GPS attempts to solve each of them. Each transition rule is specified by an action operator associated with a list of precondition states, a list of "add" states, and a list of "delete" states. After an action is applied, the corresponding "add" states and "delete" states are added to and deleted from the precondition states. A rule thus specifies a possible state transition from the precondition state to the consequent state that holds after applying the action.

Let us consider the so-called monkey–banana problem, in which the goal of the monkey is to become not hungry by eating a banana. The rules defined for GPS can be as shown in Table 2.1.

Given that the goal is ["not hungry"] and the start state is ["at door," "on floor," "has ball," "hungry," "chair at door"], it can be seen that the goal state ["not hungry"] can be achieved by applying the action "eat bananas" of Rule 5 if the precondition state ["has bananas"] is satisfied. This precondition state ["has bananas"] therefore becomes the subgoal to be achieved in the next step.

Table 2.1. Example Rules in GPS

Rule # | Action | Precondition | Add | Delete
Rule 1 | "climb on chair" | "chair at middle room," "at middle room," "on floor" | "at bananas," "on chair" | "at middle room," "on floor"
Rule 2 | "push chair from door to middle room" | "chair at door," "at door" | "chair at middle room," "at middle room" | "chair at door," "at door"
Rule 3 | "walk from door to middle room" | "at door," "on floor" | "at middle room" | "at door"
Rule 4 | "grasp bananas" | "at bananas," "empty handed" | "has bananas" | "empty handed"
Rule 5 | "eat bananas" | "has bananas" | "empty handed," "not hungry" | "has bananas," "hungry"


In the same manner, the subgoal ["has bananas"] can be achieved by applying the action "grasp bananas" with the precondition ["at bananas"], which in turn can be achieved by applying another action, "climb on chair." Repeating this backward transition from a particular subgoal to its sub-subgoal, by searching for an adequate action enabling each transition, results in the generation of a chain of actions, and the goal state can be achieved from the start state by applying the resulting action sequence, as the sketch below illustrates.
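The following is a minimal sketch, in Python, of this backward-chaining scheme over the rules of Table 2.1; it is an illustrative reduction, not Newell and Simon's implementation. One assumption is made explicit: Table 2.1 contains no rule that achieves "empty handed" from the start state (the monkey "has ball"), so a hypothetical "drop ball" rule is added here to let the full plan go through.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    action: str
    pre: frozenset      # precondition states
    add: frozenset      # states added after the action
    delete: frozenset   # states deleted after the action

F = frozenset
RULES = [
    Rule("climb on chair",
         F({"chair at middle room", "at middle room", "on floor"}),
         F({"at bananas", "on chair"}), F({"at middle room", "on floor"})),
    Rule("push chair from door to middle room",
         F({"chair at door", "at door"}),
         F({"chair at middle room", "at middle room"}),
         F({"chair at door", "at door"})),
    Rule("walk from door to middle room", F({"at door", "on floor"}),
         F({"at middle room"}), F({"at door"})),
    Rule("grasp bananas", F({"at bananas", "empty handed"}),
         F({"has bananas"}), F({"empty handed"})),
    Rule("eat bananas", F({"has bananas"}),
         F({"empty handed", "not hungry"}), F({"has bananas", "hungry"})),
    # Hypothetical extra rule, not in Table 2.1 (see the note above).
    Rule("drop ball", F({"has ball"}), F({"empty handed"}), F({"has ball"})),
]

def achieve(goal, state, pursuing=frozenset()):
    """Return (new_state, actions) that make `goal` true, or None."""
    if goal in state:
        return state, []
    if goal in pursuing:                 # guard against circular subgoaling
        return None
    for rule in RULES:
        if goal in rule.add:             # a rule whose effects include the goal
            s, actions, ok = state, [], True
            for pre in rule.pre:         # recursively achieve each precondition
                result = achieve(pre, s, pursuing | {goal})
                if result is None:
                    ok = False
                    break
                s, acts = result
                actions += acts
            if ok:
                return (s - rule.delete) | rule.add, actions + [rule.action]
    return None

start = F({"at door", "on floor", "has ball", "hungry", "chair at door"})
_, plan = achieve("not hungry", start)
print(plan)   # e.g. ['drop ball', 'push chair from door to middle room',
              #       'climb on chair', 'grasp bananas', 'eat bananas']
```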

The architecture of GPS is quite general in the sense that it has been applied to a variety of task domains, including proving theorems in logic and geometry, word puzzles, and chess. Allen Newell and his colleagues (Laird et al., 1987) developed a new cognitive model, Soar, by further extending GPS. Of particular interest is its primary learning mechanism, chunking. Chunking converts the experience of an action sequence into long-term memory. When a particular action sequence is found to be effective for achieving a particular subgoal, this action sequence is memorized as a chunk (a learned rule) in long-term memory. When the same subgoal appears again, this chunked action sequence is recalled rather than deliberated over and synthesized anew. For example, in the case of the monkey–banana problem, the monkey may learn the action sequence "grasp bananas," "eat bananas" as an effective chunk for solving a current "hungry" problem, and may retain this chunk because "hungry" may appear as a problem again in the future.
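As a toy illustration, chunking can be viewed as memoization over subgoals; this reduction and the function names are mine, not Soar's actual production-based mechanism. The first "hungry" problem is solved by deliberate search; the stored chunk answers every later occurrence directly.

```python
chunks = {}   # long-term memory: subgoal -> learned action sequence

def deliberate_search(subgoal):
    # Stand-in for slow means-end analysis, as sketched in the planner above.
    return {"hungry": ["grasp bananas", "eat bananas"]}.get(subgoal, [])

def solve(subgoal):
    if subgoal in chunks:                  # recall the chunk: no deliberation
        return chunks[subgoal]
    actions = deliberate_search(subgoal)   # slow, deliberate subgoaling
    chunks[subgoal] = actions              # store the new chunk
    return actions

solve("hungry")   # searched, then chunked
solve("hungry")   # recalled directly from long-term memory
```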

The idea of chunking has attracted significant attention in cognitive psychology. I myself was largely influenced by this idea after I learned about it in an artificial intelligence course given by John Laird, who has led the development of Soar for more than two decades. At the same time, however, I could not come to full agreement with the treatment of chunking in Soar, because the basic elements to be chunked are symbols rather than continuous patterns, even at the lowest perceptual level. I speculated that the mechanism of chunking should be considered at the level of continuous perceptual flow rather than of symbol sequences, in which each symbol already stands as an isolated segment within the flow. Later sections of this book explore how chunks can be structured out of continuous sensory–motor flow experiences. First, however, the next section introduces the so-called symbol grounding problem, which cognitive models built on symbolist frameworks inevitably encounter.


2.3. The Symbol Grounding Problem

The symbol grounding problem, as conceptualized by Steven Harnad (1990), is based on his assertion that the meanings of symbols should originate from a nonsymbolic substrate, like sensory–motor patterns, and that symbols should thereby be grounded from the bottom up. To give shape to this thought, he proposed, as an abstract model of cognitive systems, a hybrid system consisting of a symbol system at the upper level and a nonsymbolic pattern processing system at the lower level. The nonsymbolic pattern processing system functions as the interface between sensory–motor reality and abstract symbolic representation by categorizing continuous sensory–motor patterns into sets of discrete symbols. Harnad argued that meaning, or semantics, in the hybrid system would no longer be parasitic on its symbol representation but would become intrinsic to the operation of the whole system, as the representation is now grounded in the world. This concept of a hybrid system has similarities to that of FLN and FLB advocated by Chomsky and colleagues, in the sense that it assumes a core aspect of human cognition in terms of logical symbol systems, which can support up to an infinite range of expressions, and peripheries as the interface to a sensory–motor or semantic system that may not be involved in composition or recursion in depth.

This idea of a hybrid system also reminds me of Cartesian dualism. According to Descartes, the mind is a thinking thing that is nonmaterial, whereas the body is nonthinking matter, and the two are distinct. The nonmaterial mind may correspond to FLN or to symbol systems that are defined in a nonphysical discrete space, and the body to sensory–motor processes that are defined in physical space. The crucial question here is how these two completely distinct existences, which do not share the same metric space, can interact with each other. Obviously, our minds depend on our physical condition, and the freshness of the mind affects the swiftness of our every move. Descartes himself showed some concern about this "problem of interactionism," asking how a nonmaterial mind can cause anything in a material body, and vice versa. Cognitive scientists in modern times, however, seem to consider, rather optimistically I think, that some "nice" interfaces would enable interactions between the two opposite poles of nonmatter and matter.

Let's consider the problem by examining robot navigation as an example, reviewing my own work on the subject (Tani, 1998). A typical mobile robot equipped with simple range sensors may travel around an office environment while taking range readings that provide an estimate of the geometrical shapes in the surrounding environment at each time step. The continuous flow of the range image pattern is categorized into one of several predefined landmark types, such as a straight corridor, a corner, a T-branch, or a room entrance. The upper level constructs a chain representation of landmark types by observing the sequential outputs of the categorizer while the robot explores the environment. This internal map consists of nodes, representing position states of the robot associated with encountered landmark types, and of arcs, representing transitions between them associated with actions such as turning right or left and going straight. This representation takes exactly the same form as a symbolic representation known as a finite state machine (FSM), which consists of a finite number of discrete states and their state transition rules. Note that the rule representation in GPS can be converted into this FSM representation by considering that each rule description in GPS can be expanded into two adjacent nodes connected by an arc in the FSM. Once the robot has acquired the internal map of its environment, it becomes able to predict the next landmark it will sense on its travels by looking at the next state transition in the FSM. When the actual perception of the landmark type matches the prediction, the robot proceeds to the prediction of the next landmark to be encountered. An illustrative description is shown in Figure 2.2.


Figure 2.2. Landmark-based navigation of a robot using a hybrid-type architecture consisting of a finite state machine and a categorizer. Redrawn from Tani (1998).


Problems occur when this matching process fails. The robot becomes lost because the operation of the FSM halts upon receiving an illegitimate symbol/landmark type. This is the crux of my concern about the symbol grounding problem. When systems involve bottom-up and top-down pathways, they inevitably encounter inconsistencies between the two pathways of top-down expectation and bottom-up reality. The problem is how such inconsistencies can be treated internally without causing a fatal error, halting the system’s operations. It is considered that both levels are dually responsible for any inconsistency and that they should resolve any conflict through cooperative processes. This cooperation entails iterative interactions between the two sides through which optimal matching between them is sought dynamically. If one side pushes forward a little, the other side should pull back elastically so that a point of compromise can be found through iterative dynamic interactions. The problem here is that symbol systems defined in a discrete space appear to be too solid to afford such dynamic interactions with the sensory-motor system. This problem cannot be resolved simply by implementing certain interfaces between the two systems because the two simply do not share the same metric space enabling smooth, dense, and direct interactions.
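To make the brittleness concrete, here is a minimal sketch of such a hybrid navigation loop. The tiny transition table, state names, and landmark labels are my own illustrative assumptions, not the actual implementation in Tani (1998): the top-down FSM prediction is checked against the bottom-up categorization, and a single mismatch halts the machine.

```python
# Minimal sketch of the hybrid architecture discussed above: a categorizer
# abstracts continuous range readings into discrete landmark symbols, and
# an FSM predicts the next landmark from the internal map.

# FSM: (state, action) -> (next_state, expected_landmark)
TRANSITIONS = {
    ("s0", "straight"): ("s1", "corridor"),
    ("s1", "straight"): ("s2", "T-branch"),
    ("s2", "turn_right"): ("s3", "corner"),
    ("s3", "straight"): ("s0", "room_entrance"),
}

def categorize(range_reading):
    """Stand-in for the lower-level categorizer: map a continuous range
    pattern to one of the predefined landmark types. Here the reading is
    assumed to be already labeled, for illustration only."""
    return range_reading

def navigate(state, plan, observations):
    """Follow a plan of actions, checking each top-down prediction
    against the bottom-up landmark category."""
    for action, reading in zip(plan, observations):
        next_state, expected = TRANSITIONS[(state, action)]
        perceived = categorize(reading)
        if perceived != expected:
            # The crux of the problem: an illegitimate symbol halts the FSM.
            raise RuntimeError(
                f"lost: expected {expected!r} but perceived {perceived!r}")
        state = next_state
    return state

# The robot stays localized while predictions match ...
print(navigate("s0", ["straight", "straight"], ["corridor", "T-branch"]))
# ... but a single unexpected categorization makes it fatally lost:
# navigate("s0", ["straight"], ["corner"])  # raises RuntimeError
```

Nothing in the discrete machine lets the two levels negotiate elastically; a dynamical alternative in which both sides share one metric space is the direction developed later in this book.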

2.4. Context

Another concern is how well symbol systems can represent the reality of the world. Wittgenstein once said: “Whereof one cannot speak, thereof one must be silent,” meaning that language as a formal symbol system for fully expressing philosophical ideas has its limitations. Not only in philosophy, but in everyday life, too, there is always something that cannot be expressed explicitly. Context, or background, is an example. Context originally means discourse that surrounds a language unit and that helps to determine its interpretation. In a larger sense, it also means the surroundings that specify the meaning or existence of an event.

Spencer-Brown (1969) highlighted a paradox in his attempts to explicitly specify context in his formulation of the calculus of indications. Although details of his mathematical formulas are not introduced here, his statement could be interpreted to mean that indexing the current situation requires the indexing of its background or context. Because indexing the background requires further indexing of the background of the background, the operation of indexing situations ends up as an infinite regression. Spencer-Brown wrote that, in this aspect, every observation entails a symbol, an unwritten cross, where the cross operation denotes indexing of the background. Let’s imagine you see a bottle-like shape. This situation can be disambiguated by specifying its immediate background (context), namely that a bottle-like shape was seen immediately after you opened the refrigerator door, which means that the bottle is chilled. Further background information that the refrigerator was opened after you went back to your apartment after a long day at work would mean that what you see now is a bottle of chilled beer waiting to be drunk. There is no logical way to terminate this regression, yet you can still reach for the bottle of beer to drink it! Although FLN may have the capability for infinite regression, it is hard to believe that our minds actually engage in such infinite computations. We live and act in the world surrounded or supported by context, which is always implicit, uncertain, and incomplete for us at best. How can a formal symbol system represent such a situation?
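The regress can be seen in computational terms. The following toy sketch is my own illustration, not Spencer-Brown's calculus: a function that disambiguates an observation by interpreting its context must recursively interpret the context of that context, and logic alone supplies no base case; any termination is a pragmatic, extralogical cutoff.

```python
# Toy illustration of why indexing context cannot terminate by logic
# alone: each interpretation demands an interpretation of its background.

def interpret(observation, context):
    """Disambiguate an observation by its context -- which is itself
    an observation needing its own context."""
    if context is None:
        # Logic gives no base case; stopping here is an arbitrary,
        # extralogical choice (we simply run out of stated background).
        return observation
    return f"{observation} [given {interpret(context[0], context[1])}]"

# "bottle-like shape" <- "refrigerator opened" <- "back home after work" <- ...
chain = ("back home after work", None)
chain = ("refrigerator opened", chain)
print(interpret("bottle-like shape", chain))
# Any real agent truncates this regress pragmatically, not logically.
```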

2.5. Summary

We humans definitely have internal images about our surrounding world. We can extract regularities from our experiences and observations both consciously and unconsciously, as evidenced by the fact that we can acquire language skills involving grammar. Also, we can combine the acquired rules to create new images, utterances, and thoughts. Accounting for this aspect, cognitivists tend to assume that symbols exist to be manipulated in our heads. My question, though, is what is the reality of those symbols we suppose to be in our heads? Is symbol representation and manipulation an operational principle in the cognitive mind? If so, my next questions would be how can symbols comprising arbitrary shapes of tokens interact with sensory-motor reality and how can they access matters involving context, mood, or tacit knowledge that are considered to be difficult to deal with by formal symbol systems? It is also difficult to represent the state of consciousness with them. It is presumably hard to differentiate between doing something consciously and unconsciously in the processes of merely manipulating symbols by following logic.

If we attempt to model or reconstruct the mind, it should be essential to reconstruct not only rational thinking aspects but also the feelings that accompany our daily experiences, such as consciousness as the vivid feeling of qualia characterizing various sensations. But if symbol systems cannot deal with such matters, what would be a viable solution? Indeed, this book proposes a radical departure from the aforementioned conventional symbolist framework. The main proposal is to consider that what we have in our brains as “symbols” are not just arbitrarily shaped tokens but dynamic activities of physical matter embedded in continuous spatio-temporal space. Such dynamic activity of matter, adequately developed, might enable compositional but vivid and contextual thinking and imaging in our brains. A crucial argument would be that such cognitive minds could be naturally situated in the physical world because the two share the same metric space for interaction.

The next chapter addresses this very problem from the standpoint of a different discipline, that of phenomenology. The objective of phenomenology is not only to investigate the problems of the mind but also to search for how the problems themselves can be constituted from the introspective view. Readers will find that the discipline of phenomenology is quite sympathetic to the aforementioned dynamic systems view.


3. Phenomenology

Phenomenology originated in Europe at the beginning of the 20th century with Edmund Husserl’s study of so-called phenomenological reduction, through which the analysis of the natural world is based purely on the conscious experiences of individuals. As this chapter shows, Husserl’s study subsequently evolved and was extended by the existentialism of Martin Heidegger and the embodiment of Maurice Merleau-Ponty and others. We should also not forget to mention William James, who was born 17 years earlier than Husserl in the United States. Although James is best known as the founder of modern psychology, he also provided numerous essential philosophical ideas about the mind, some of which are quite analogous to Husserl’s phenomenology. In Japan, Kitaro Nishida (1990) developed his original thinking, influenced by Buddhist meditation, which turned out to include ideas with some affinity to those of Husserl and James.

Phenomenology asks us to contemplate how the world can exist for us and how such a belief can be constituted from our experiences, by suspending our ordinary assumption that the world exists as a physical fact from the outset. Here, the question of how the world can be constituted in our subjective reflection might be analogous to the question of how the knowledge of the world can be represented in cognitive science studies. Phenomenology, however, focuses more on phenomena themselves, through direct perception or pure experience, which has not yet been articulated either by conception or language. For example, a rose exists in our subjectivity as a conscious phenomenon of a particular smell or a particular visual shape, but not by our knowledge of its objective existence. This discipline then focuses purely on phenomena and questions the existence of the world from such a viewpoint. However, the discipline also explores the being of cogito (how cognition arises) at the higher level by examining how it can be developed purely through the accumulation of perceptual experiences. Thus, phenomenology asks how cognition is constituted from direct perception, a line of questioning deeply related to the later discussions on how robotic agents can develop views or recognition of the world from their own sensory-motor experiences.

3.1. Direct Experience

Let us begin by examining what direct experience means in phenomenology. It is said that Husserl noticed the importance of direct experience when coming across Mach’s perspective (Figure 3.1) (T. Tani, 1998). Mach drew the picture to represent what he sees with his left eye while closing his right one. From this perspective, the tip of his nose appears to the right of the frame with his eye socket curving upwards. Although we usually do not notice this sort of perspective, this should represent the direct experience that we then reconstruct in our minds.

Husserl considered that an examination of such direct experience could serve as a starting point to explore phenomena. Around the same time, a notable Japanese philosopher, Kitaro Nishida, introduced a similar idea in terms of pure experience, writing that:

For example, the moment of seeing a color or hearing a sound is prior not only to the thought that the color or sound is the activity of an external object or that one is sensing it, but also to the judgment of what the color or sound might be. In this regard, pure experience is identical with direct experience (Nishida, 1990, p.3).

For Nishida, pure experience is not describable by language but transcends it:

When one directly experiences one’s own state of consciousness, there is not yet a subject or an object… . (Nishida, 1990, p.3)


Here, what exactly does this phrase “there is not yet a subject or an object” mean? Shizuteru Ueda (1994), who is known for his studies on Nishida’s philosophy, explains this by analyzing the example utterance, “The temple bell is ringing.” If it is said instead as “I hear the temple bell ringing,” the explication of “I” as the subject conveys a subtle expression of subjective experience at the moment of hearing. In this interpretation, the former utterance is considered to express pure experience in which subject and object are not yet separated by any articulation in the cogito. This analysis is analogous to what Husserl recognized from Mach’s perspective.

Figure 3.1. Ernst Mach’s drawing. Source: Wikimedia Commons.

3.2. The Subjective Mind and Objective World

We might ask, however, how much the phenomena of experience depend on direct perception. Is our experience of perception the same as that of infants in the sense that any knowledge or conception in the cogito does not affect them at all? In answer, we have sensationalism on one side, which emphasizes direct experiences from the objective world, and on the other we have cognitivism, which emphasizes subjective reflection and representation of the world. But how did these conflicting poles of the subjective mind and the objective world appear? Perhaps they existed as one entity originally and later split off from each other. Let’s look then at how this issue of the subjective and the objective has been addressed by different phenomenological ideas.

In Husserl’s (2002) analysis of the structural relationship between what he calls appearance and that which appears in perceiving an object, he uses the example of perceiving a square, as shown in Figure 3.2.

In looking at squarelike shapes in everyday life, despite them having slightly unequal angles, we usually perceive them to be squares with equal right angles. In other words, a square could “appear” with unequal angles in various real situations, when it should have equal right angles in the ideal: in such a case, a parallelogram or trapezoid is the “appearance” and the square is “that which appears” as the result of perception. At this point, we should forget about the actual existence of this square in the physical world because this object should, in Husserl’s sense, exist only through idealization. Whether things exist or not is just a subjective matter rather than an objective one. When things are constituted in our minds, they exist regardless of their actual being. This approach that puts aside correspondence to actual being is called epoché, or suspension of belief. Husserl considers that direct experience has intentionality toward representation. This intentional process of constituting representation from direct experience actually entails consciousness. Therefore, the phenomena of experience cannot be accounted for only by direct experience at the level of perception, but it must also be accounted for by conscious representation at the level of cogito. Ultimately, it can be said that the phenomena of experiences stand on the duality of these two levels.

Figure 3.2. Husserl’s ideas on the structural relationship between “appearance” and “that which appears” in perceiving a square, as an example.
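Husserl’s distinction can be rendered loosely in computational terms. The sketch below is my own analogy, not Husserl’s formulation: several distinct “appearances” (measured, slightly irregular angle sets) are constituted as one idealized “that which appears” (the square), independently of any physically existing object.

```python
# Toy analogy of "appearance" versus "that which appears": noisy
# quadrilateral appearances are idealized into the constituted "square".

def that_which_appears(angles, tolerance=15.0):
    """Constitute the ideal object from a measured appearance:
    four roughly equal angles are perceived as a square."""
    if len(angles) == 4 and all(abs(a - 90.0) <= tolerance for a in angles):
        return "square"        # the idealized "that which appears"
    return "quadrilateral"     # no idealization applies

# Three distinct appearances, one constituted object:
for appearance in ([92, 88, 91, 89],    # parallelogram-like appearance
                   [95, 85, 95, 85],    # another skewed appearance
                   [90, 90, 90, 90]):   # the ideal itself
    print(appearance, "->", that_which_appears(appearance))
```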

Incidentally, from the preceding text, readers might speculate that the level of cogito and the level of perception are treated as separate entities in phenomenology. However, phenomenology does not seek to take that direction and instead attempts to explore how the apparent polarity of, for example, the cogito and perception, subjectivity and objectivity, and mind and material, could have appeared from a single unified entity in the beginning. Although understanding such constitutional aspects of the polarity (i.e., how the polarity developed) continues to be a subject of debate in phenomenology, interesting assumptions have been made about there being some sort of immanence enabling self-development of such structures. For example, Husserl considers how the cogito level of dealing with temporal structure submerged in a stream of experience could emerge from the direct perceptual level, as explained in detail later.

Nishida (1990) also considers that the subject and object should be one unified existence rather than taken originally as independent phenomena. He, however, argues that the unified existence could have internal contradictions that lead to bifurcation, or the division of the unity into the subject and object that we usually grasp. He suggests that the phenomenological entity simply continues to develop by repeating these unification and division processes. Merleau-Ponty (1968) professes that this iteration of unification and division would take place in the medium of our bodies, as he considers that the two poles of the subjective mind and the objective material actually meet and intermingle with each other there. He regards the body as ambiguous, being positioned between the subjective mental world and the objective physical world. Heidegger, on the other hand, devoted himself to exploring a more fundamental problem of being by working on what it means to be human rather than splitting the problem into that of subject and object. And through his approach to the problem of being he turned out to be successful in showing how subjectivity and objectivity can appear.


What follows examines the philosophical arguments concerning the subjective mind and the objective world in more depth, along with related discussions that include time perception as propounded by Husserl, being-in-the-world set forth by Heidegger, embodiment by Merleau-Ponty, and the stream of consciousness by James. Let’s begin by looking closely at each of these, starting with Husserl’s conception of the problem of time perception.

3.3. Time Perception: How Can the Flow of Subjective Experiences Be Objectified?

To Husserl, the world should consist of objects that the subject can consciously meditate on or describe. However, he noticed that our direct experiences do not originate with forms of such consciously representable objects but arise from a continuity of experience in time that exists as pure experience. Analyzing how a continuous flow of experience can be articulated or segmented into describable objects or events brought him to the problem of time perception. Husserl asks how we perceive temporal structure in our experiences (Husserl, 1964). It should be noted that “time” discussed here is not physical time having dimensions of seconds, minutes, and hours but rather time perceived subjectively without objective measures. The problem of time perception is a core issue in this book because both humans and robots that generate and recognize actions have to manage continuous flows of perception by articulating them (via segmentation and chunking), as is detailed later.

In considering the problem, Husserl presumed that time consists of two levels: so-called preempirical time at a deep level and objective time at a surface level. According to him, the continuous flow of experience becomes articulated into consciously accessible events by its development through these phenomenological levels. This idea seems born from his thinking on the structural relationship between “appearance” and “that which appears” mentioned earlier in this chapter. At the preempirical level, every experience is implicit and yet to be articulated, but there is some sort of passive intention toward the flow of experience which he refers to as retention and protention. His famous explanatory example is about hearing a continuous melody such as “do-re-mi.” When we hear the “re” note, we would still perceive a lingering impression of “do” and at the same time we would anticipate hearing the next note of “mi.” The former refers to retention and the latter protention. The present appearance of “re” is called the primary impression. These three terms of retention, primary impression, and protention are used to designate the experienced sense of the immediate past, the present, and the immediate future, respectively. They are a part of automatic processes and as such cannot be monitored consciously. The situation is similar to that of the utterance “The temple bell is ringing” mentioned earlier, in the sense that the subject of this utterance is not yet consciously reflected. Let’s consider the problem of nowness in the “do-re-mi” example. Nowness as experienced in this situation might be taken to correspond with the present point of hearing “re” with no duration and nothing beyond that. Husserl, however, considered that the subjective experience of nowness is extended to include the fringes of the experienced sense of both the past and the future, that is, in terms of retention and protention: Retention of “do” and protention of “mi” are included in the primary impression of hearing “re.” This would be true especially when we hear “do-re-mi” as the chunk of a familiar melody rather than as a sequence consisting of independent notes. Having now understood Husserl’s notion of nowness in terms of retention and protention, the question arises: Where is nowness bounded? Husserl seems to think that the immediate past does not belong to a representational conscious memory but merely to an impression. Yet, how could the immediate past, experienced just as an impression, slip into the distant past but still be retrieved through conscious memory, as Francisco Varela (1999) once asked in the context of neurophenomenology? Conscious memory of the past actually appears at the level of objective time, as described next.

This time, let’s consider remembering hearing the slightly longer sequence of notes in “do-re-mi-fa-so-la.” In this situation, we can recall hearing the final “la” that also retains the appearance of “so” by means of retention, and we can also recall hearing the same “so” that retains the appearance of “fa,” and so on in order back to “do.” By means of consciously unifying immediate pastness in a recall with presentness in the next recall in the retention train, a sense of objective time emerges as a natural consequence of organizing each appearance into one consistent linear sequence. In other words, objective time is constituted when the original experience of continuous flow (in this case the melody) is articulated into a sequence of objectified events (the notes) by means of consciously recalling and unifying each appearance. There is a fundamental difference between an impression that will sink into the horizon in preempirical time and the same past which is represented in objective time. The former is a present, living experience of passing away, whereas the latter can be constituted as consciously represented or manipulable objects, but only after the original experience is retained. Therefore, the latter may lack the pureness or vividness of the original experience, yet may fit well with Husserl’s goal that pure experience can be ultimately represented as logical forms dealing with discrete objects and events.
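For readers who prefer a computational caricature, the following sketch (my illustration; Husserl of course proposed no algorithm) renders the tripartite structure of nowness over the “do-re-mi” stream, and the constitution of objective time by unifying the retention train into one linear sequence. In a living agent, protention would be a learned anticipation rather than a lookahead into the actual stream.

```python
# Toy rendering of retention, primary impression, and protention over a
# melody, and of objective time constituted from the retention train.

MELODY = ["do", "re", "mi", "fa", "so", "la"]

def specious_present(melody):
    """Yield (retention, primary_impression, protention) at each step."""
    for i, note in enumerate(melody):
        retention = melody[i - 1] if i > 0 else None
        # Simplification: protention is read from the actual stream here,
        # whereas in reality it is an anticipation (a prediction).
        protention = melody[i + 1] if i + 1 < len(melody) else None
        yield retention, note, protention

def objective_time(melody):
    """Unify recalled 'nows' from the retention train, back from the
    final note to the first, into one consistent linear sequence."""
    timeline = []
    for _, now, _ in reversed(list(specious_present(melody))):
        timeline.insert(0, now)  # each recalled now is placed before the last
    return timeline

for r, now, p in specious_present(MELODY):
    print(f"retention={r!r}, now={now!r}, protention={p!r}")
print(objective_time(MELODY))  # ['do', 're', 'mi', 'fa', 'so', 'la']
```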

Husserl proposed two types of intentionality, each heading in a different direction: transversal intentionality refers to integration of the living-present experience by means of retention, primary impression, and protention in preempirical time; longitudinal intentionality affords an immanence of time structures (from preempirical time to objective time) by means of conscious recall of retained events in the retention train. Consequently, this intentionality might be considered to be retention of retention itself (a reflective awareness of this experience). In the process of interweaving these two intentionalities (double intentionality) into the unitary flow of consciousness, the original pure experience is objectified and simultaneously the subjectivity or ego of this objectifying process emerges.

In his later years, Husserl introduced an analysis at an even deeper level, the absolute flow level. Here, neither retention nor protention has yet appeared— only flow exists. However, this flow is not homogeneous; each appearance has its own duration. Tohru Tani (1998) interpreted this aspect by saying that consciousness flows as well as stagnates, characterizing the uniqueness of the absolute flow of consciousness and setting it apart from consciousness as developed elsewhere. This alternating flow and stagnation is primordial, an absolutely given dynamic which is nonreducible. The passive intentional acts of retention and protention that dimensionalize experience along the continuum of temporality in the next level originate from this primordial stage of consciousness, and objective time arises from there.

In sum, Husserl’s persistent drive to reduce the ideas and knowledge of man to direct experiences is admirable. However, his motivation toward a logically manipulable ideal representation of the world via reflection seems to me problematic in that it has exactly the same problem as the symbol grounding problem in cognitive models. Dreyfus (Dreyfus & Dreyfus, 1988), who is well known for his criticism of artificial intelligence research, argues that the main computational scheme based on logical inferences and categorical representation of knowledge in modern cognitive science or artificial intelligence originated from the ideas of Husserl. Actually, Husserl (1970) had already toyed with an idea similar to the frame system, a notable invention of Marvin Minsky, which introduced domain specificity into the logical descriptions of objects and the world, but he finally admitted defeat in the face of infinite possibilities of situations or domains. However, Heidegger, as a disciple of Husserl, actually took an alternative route to escape this predicament, as we discover next.

3.4. Being-in-the-World

Heidegger is considered by many to be one of the greatest philosophers of modern times, changing the direction of philosophy dramatically by introducing his thinking on existentialism (Dreyfus, 1991). Although a disciple of Husserl, once he became inspired by his own thoughts on the subjective constitution of the world, Heidegger subsequently departed from Husserl’s phenomenology. It is said that Heidegger noticed a philosophical problem concerning the cogito and consciousness, a problem that was considered by Descartes as well as Husserl and yet fully overcome by neither.

Descartes considered that the cogito, a unique indubitable being, should be taken as the initial point of any philosophical thought after everything in the world is discarded for its doubtfulness of being. He concluded that if he doubted, then something or someone must be doing the doubting; therefore, the very fact that he doubted proved his existence (Williams, 2014). Husserl, taking on this thought, presented his idea that the world and objects should exist in terms of conscious representations in the cogito and that such conscious representations ought to be ideal ones.

Heidegger just could not accept the unconditional prior existence of the cogito. Nor could he accept an ideal and logical representation of the world that the cogito supposedly constitutes. Instead, he raised the more fundamental question of asking what it means to be human, while avoiding tackling directly the problems of cogito versus perception, subjectivity versus objectivity, and mental versus material. It is important to note that Heidegger sought not to obtain an objective understanding of the problem but rather to undertake a hermeneutic analysis of it.


Hermeneutics is an approach that attempts to deepen the understanding of targets while having prior estimates, or biases, of them that are adequately modified during the process of understanding. For example, when we read a new piece of text, a preunderstanding of the author’s intention would help our understanding of the content as we go along. However, hermeneutics possesses an inherent difficulty because preunderstanding (bias) originates from intuitions in a context-dependent way and there is a potential danger of being caught up in a loop of interpretation, the so-called hermeneutic circle. Because we can understand the whole in terms of its parts and the parts only through their relationship to the whole, we experience an unending interpretative loop. Despite this difficulty, Heidegger holds that there are some fundamental problems, like what it means to be human, which can only be understood in this way. It is said that we take being for granted, but cannot articulate it precisely when asked to do so. In his classic text, Being and Time, he attempts to elucidate the meaning of being via hermeneutic cycling, beginning with this same vague preunderstanding. It is his thoughts on understanding by hermeneutic cycling that form the essential philosophical background to the central theme of this book, namely emergent phenomena, as discussed later.

For now, let’s examine Heidegger’s famous notion of being-in-the-world (Heidegger, 1962) by looking at his interpretation of the ways of being in relation to equipment. Heidegger focuses on the purposeful exercise of naive capacities as extended by equipment and tools. For example, he asks what it means that a hammer exists. It is not sufficient to answer that it exists as a thing made from cast iron and wood because such an answer merely describes its objective features. Rather, the meaning of being of a hammer must be approached by way of its employment in daily activities, something like “The carpenter building my house is hitting nails with it.” Such an account of nails being hit with a hammer, the hammer being used by the carpenter, and the carpenter building my house implies the way of presently being for each of these entities as situated among others: None exist independently but are unified in their existence via the preunderstanding of how each interacts in the constitution of a situation characterized by purposeful activity.

Heidegger asserts that the being of equipment is mostly “transparent.” Put another way, the existence of pieces of equipment is not noticed much in our daily usage of them. When a carpenter continues to hit a nail, the hammer becomes transparent to him: The hammer and the nail are absorbed in a connected structure, the purposeful activity that is house building. However, when he fails to hit the nail correctly, the unified structure breaks down and the independence of each entity becomes noticeable. Their relative meanings become interpretable only in such a breakdown. In the breakdown of the once-unified structure, the separated entities of the subject and the object become apparent with self-questioning, like “why did ‘I’ fail?” and “what’s wrong with the hammer and the nail?”. In this way, it is considered that the hermeneutic approach can provide an immanent understanding of metaphysical existence, such as consciousness, through cycles of self-corrective analysis.

Heidegger recognizes that ordinary man has rare opportunities to reflect on the meaning of his own way of being, occupied as he is with the daily routines of life, and he regards such a state of being as inauthentic. Although man can live in his neighboring community occupied with “idle talk” and trivia, he cannot become individuated, ultimately recognizing and taking responsibility for his or her existence in such a way. Man in this case lives his daily life only in the immediate present, vaguely anticipating the future and mostly forgetting the past.

However, Heidegger tells us that this way of being can be changed to authentic being when man thinks positively about the possibility of his death, which could occur at any moment and not necessarily so very far into the future. Death is an absolutely special event because it is the ultimately individuating condition that cannot be shared with others. Death is to be regarded as the absolutely certain impossibility of being further related to any other kind of being, and when confronted in this way it prompts the authentic being to head toward its own absolute impossibility.

Here, we must focus on Heidegger’s brilliant notion that the present is born via the dynamic interplay between a unique agent’s projected future possibilities and its past. In this process, one reclaims one’s self from the undifferentiated flow of idle chatter and everyday routine. This is authenticity. The authentic agent has the courage to spend the time of his or her life in becoming an agent of change, to transform the situation in which one is “thrown” (the clearing of being as inherited, as one is born into it) into that ideal self-situation that characterizes one’s unique potential. The inauthentic agent hides from this potential, and rather invests his or her time in distractions, “idle chatter,” merely repeating established routines and defending established conventions regardless of suboptimal and even grossly immoral results. Thus, Heidegger establishes the subjective sense of temporality (rather than objective time) as the ground of authentic being, whereas the inauthentic being tries to nullify this subjective sense of time by ignoring his or her mortality and retreating into the blind habit and routine that is characteristic of “fallenness.”

Now, we see that his notion of temporality is drastically different from Husserl’s. Husserl considers that temporality appears as the result of subjective reflection to articulate direct experiences of sensory streams as consciously manipulable object or event sequences. Heidegger, on the other hand, shows how the differentiable aspects of past, present, and future rise from the mortal condition. Temporality is the dynamic structure of being, in light of which anything at all comes to matter, and from which any inquiry into the nature of being, including any derivative understanding of time as sequence, for example, is ultimately drawn.

Next, there are other aspects of mind to review, including the role of the body in mediating interactions between the mind and the material world.

3.5. Embodiment of Mind

In the philosophy of embodiment developed by Merleau-Ponty, we can easily find the influence of Heidegger’s being-in-the-world. Merleau-Ponty’s notion of embodiment has been recognized as a notion of ambiguity, and with this ambiguity he successfully avoided tackling Cartesian dualism directly. As mentioned before, Descartes thought that the world consists of two extremes— the nonmaterial subjective mind and the material objective things— and this invited a problem of interaction. The problem is how to account for the causal interaction among nonmaterial mind, material body, and material world while these effectively exist in different spaces. From this background, Merleau-Ponty developed his thoughts on embodiment, asking at which pole the body, as a part of our being, should be taken to lie. Although the body in terms of flesh can be regarded as material, we actually often experience the body as an aspect of mind, which is regarded as nonmaterial by Descartes. For example, our cheeks turn red when we get angry and tears start to fall when we feel sad. In response, Merleau-Ponty proposes that we consider the body to be an ambiguous existence belonging to neither of these two extremes.

Merleau-Ponty examined various means of interaction between mind and body. For example, he presented an analysis of a blind man with a stick (Merleau-Ponty, 1962). The stick becomes an object when the blind man grasps it in order to guide his movements. At the same time, however, it becomes a part of his body when he scans his surroundings while walking by touching its tip to things, like tactile scanning with the finger. Although this is an interesting example showing the possibility of body extension, it also raises the possibility that the range of self can be extended or shrunk through the use of tools and artifacts. In another example, his analysis of the phenomenon of phantom limbs might indicate the complete opposite of the blind man’s case. It is said that people who have had a limb amputated often still experience pain in the amputated limb that no longer exists. Merleau-Ponty explained the phenomenon in terms of “refusal of deficiency,” which is the implicit negation of what runs counter to the natural momentum that throws us into our tasks, our cares, our situation, and our familiar horizons (Merleau-Ponty, 1962). It can be summarized then that the analysis of the blind man with his stick indicates the possibility of extension of the familiar horizon associated with daily use of the stick, whereas the analysis of phantom limbs indicates another possibility, one of refusal of the sudden shrinking of this once familiar horizon. These examples might help us to understand how the horizon of subjective possibility is constituted via daily interactions between the body and the world, thereby enriching our understanding of being in the world.

Along the same line of thought, Merleau-Ponty addressed the problem of body schema— the integrated image of the body— by conducting an analysis of a patient with neurological blindness. His patient, Schneider, had lesions in vision-related cortical areas. Although he had a problem in recognizing objects visually, he could pick up a cup to have a drink or make a fire by striking a match without problems, “seeing” the cup or the match. So, he could see the shapes and outlines of objects but needed a reasoning process to identify them. When he was asked to point to his nose, he had difficulty doing so, but could blow his nose with a handkerchief. He had difficulty pointing to or moving a part of his body when asked to do so unless he deliberated over his movement from an objective view ahead of time. In short, he could perform concrete movements in natural situations in daily life very easily, but had difficulty performing abstract movements without context and without an objective view. Merleau-Ponty came to the conclusion that such concrete movements situated in everyday life are fundamental to the consideration of body schema. In concrete movements, our body or body part is not an object that we move in an objective space. Rather, it is our living body, the body as a subject, that we move in a bodily space. These movements performed by our living body are organized in familiar situations in the world, wherein the body comprehends its world and objects without explicitly representing or objectifying them. The body communicates with them through a skill or tacit knowledge, by making a direct reference to the world and its objects. This direct reference implies the fundamental structure of being-in-the-world, as Heidegger discussed in terms of Dasein.

Merleau-Ponty’s (1962) analysis of synesthesia is also worth introducing here. Synesthesia, a neurological condition in which sensation in one modality unconsciously evokes perception in another, has been reported in a variety of forms. Some synesthetes perceive colors upon seeing certain shapes or letterforms, feel textures on hearing particular sounds, or experience strong tastes on hearing certain words. Merleau-Ponty speculates that these sensory slips should have some clear meaning behind them, rather than simply being perceptual side effects, which would account for how we humans engage in the world. Indeed, perception of objects in the world is achieved in the iterative interactions between multiple modalities of sensation by reentrant mechanisms established in the coupling of us and the world. Merleau-Ponty refutes ordinary scientific views of modularity that seek to understand the reality of perception by reducing it to the sum of each modality of sensation. His approach is to see perception as ongoing structuring processes of the whole, or Gestalt, which appears in the communicative exchanges between the different modalities of sensation.

He also refutes the notion of separating perception from action. He explains that the hand touching something reverses into an object that is being touched because the hand itself is tangible flesh. In shaking hands, we feel that we are touching another’s hand and simultaneously that our extended hand is being touched. Analogously, Merleau-Ponty says that a see-er reverses into a visible object because of the thickness of its flesh. Thus, vision is analogous to exploring objects in the dark by tactile palpation. Visual palpation by looking inevitably accompanies a sense of being seen at the same time. He writes that painters often feel as if the objects in their own paintings gaze back at them. There are silent exchanges between the see-ers and the objects. Because flesh is tactile as well as visible, it can touch as well as be touched and can see as well as be seen. There is flux in the reciprocal network that is body and world, involving touching, vision, seeing, and things tangible.

Let’s take another example. Imagine that your right hand touches your left hand while it is palpating something. At this moment of touching, the subjective world of touching transforms into the objective world of being touched. Merleau-Ponty wrote that, in this sense, “the touching subject passes over to the rank of the touched, descends into the things, such that the touch is formed in the midst of the world and as it were in the things” (Merleau-Ponty, 1968, pp. 133–134). Although the subject of touching and the object of being touched are opposite in meaning, they are rendered identical when Merleau-Ponty’s concept of chiasm is applied. Chiasm, originating from the Greek letter χ (chi), is a rhetorical method to locate words by crossing over, combining subjective experience and objective existence. Although the concept might become a little difficult from here onward, let’s imagine a situation in which a person who has language to describe only two-dimensional objects happens to encounter a novel object, a column, as a three-dimensional object, as Tohru Tani (1998) suggests. By exploring the object from different viewpoints such as from the top or side, he would say that this circular column could be a rectangular one and this rectangular column could be a circular one (Figure 3.3).

When this is written in the form of chiasm, it is expressed as:

[This circle is a rectangle.] X [This rectangle is a circle.]

Thus, the two-dimensional world is extended to a three-dimensional one in which a circle and a rectangle turn out to be just different views of the column. The conflict between the two is resolved by means of creating an additional dimension that supports their identity at a deeper level. Let’s consider then what could be created, or emerge, in the following cases.

[A subject of touching is an object of being touched.] X [An object of being touched is a subject of touching.]

[A see- er is a visible object.] X [A visible object is a see- er.]

Merleau-Ponty suggests that embodiment as an additional dimension emerges, in which flesh of the same tangibility as well as the same thickness can be given to both the subject of touching or seeing and the object of being touched or being seen. This dimension of embodiment can facilitate the space for iterative exchanges between the two poles of subject and object:

There is a circle of the touched and the touching, the touched takes hold of the touching; there is a circle of the visible and the seeing, the seeing is not without visible existence. … My body as a visible thing is contained within the full spectacle. But my seeing body subtends this visible body, and all the visibles with it. There is reciprocal insertion and intertwining of one in the other (Merleau-Ponty, 1968, p. 143).

Merleau-Ponty, in exploring ambiguity between the two poles of subjectivity and objectivity, did not anchor his thoughts in the midst of these two extremes, but rather allowed them to move dynamically between the two. By positioning the poles to face each other, he would have imagined a flux, a flow, from one pole to the other and an intertwining of the two in the course of resolving the apparent conflicts between them in the medium of embodiment. When the flux intertwines the two, the subject and the object become an inseparable being reciprocally inserted into each other with the world arising in the gap.

Figure 3.3. A person who has language to describe only two-dimensional objects happens to encounter a novel object, a column, as a three-dimensional object.


Recently, these thoughts on embodiment have been revived and have provided significant influence on cognitive science in terms of the rising “embodied minds” paradigm in the philosophy of mind and cognitive sciences (Varela, Thompson & Rosch, 1991; Clark, 1998; Ritter et al., 2000; O’Regan & Noë, 2001). Actually, a new movement, referred to as the behavior-based approach (Brooks, 1990) in artificial intelligence and robotics, started under this trend, as is repeatedly encountered in later chapters.

Let’s move on now to an examination of the concept of the stream of consciousness put forward by the pioneering American psychologist and philosopher William James (1892) more than a half century before Merleau-Ponty. As we go, we’ll find some connection between James’ thinking and Husserl’s concept of time perception, especially that of the level of absolute flow. Also, we’ll see a certain affinity between this notion and that of Merleau-Ponty in his attempt to show the immanent dynamics of our inner phenomena. By examining James’ stream of consciousness, we can move closer toward answering how our will might be free.

3.6. Stream of Consciousness and Free Will

We experience our conscious states of mind as thoughts, images, feelings, and desires that flow while they constantly change. James defines his notion of the stream of consciousness as the inner coherence or unity of conscious states as they proceed from one to the next. He explains the four essential characteristics of this stream in his monumental Principles of Psychology (1918, p. 225) as follows:

1. Every “state” tends to be part of a personal consciousness.
2. Within each personal consciousness states are always changing.
3. Each personal consciousness is sensibly continuous.
4. It is interested in some parts of its object to the exclusion of others, and welcomes or rejects— chooses from among them, in a word— all the while.

The first characteristic means that the various states comprising the stream are ultimately subjective matters that the subjects feel they experience by themselves. In other words, the subjects can keep them private in their states of mind. The second characteristic, one of the most important of James’ claims, asserts that although the stream preserves the inner coherence as one stream, its states are constantly changing autonomously as various thoughts and images are generated. James writes that:

When we take a general view of the wonderful stream of our consciousness, what strikes us is the pace of its parts. Like a bird’s life, it seems to be an alternation of flights and perchings (James, 1918, p. 243).

James considers that the stream comprises successions of substantive parts of stable “perchings” and transitive “flights.” Conscious states of thoughts and images appear more stably in the substantive parts. On the other hand, the transitive parts generate successive transitions from one substantive part to another in temporal association. This alternation between the two parts takes place only intermittently, and the duration of each substantive part can be quite different, but only in terms of subjective feeling of time. Here, we can find a structural similarity to what Tohru Tani (1998) interpreted as consciousness flowing as well as stagnating when referring to Husserl’s flow of absolute consciousness. Although it is said that the transitive parts function to connect and relate various thoughts and images, how are they actually felt phenomenally? James describes them as a subtle feeling like when, immediately after hearing someone say something, a relevant image is about to pop into the mind but is not yet quite fully formed. Because the thoughts and images are so faint, they are lost if we attempt to catch them. The transitive parts are the fringes of stable images and relate them to each other, where information flows like the free water of consciousness around these images. James considers these fringes to be more essential than stable images and that the actual stream of consciousness is generated by means of tensional dynamics between stable images related to each other by their fringes.

The third observation suggests that the private states of consciousness constantly change but only continuously so. James says that consciousness is not like chopped up bits or jointed segments but rather flows like a river. This statement appears to conflict with the concept of time perception at the objective time level put forward by Husserl, because he considered that objective time comprises sequences of discrete objects and events. However, James’ idea is analogous to the absolute flow level, as mentioned before. I suspect that James limited his observation of the stream of consciousness to the level of pure experience and did not proceed to observation of the higher level such as Husserl’s objective time. We can consider then that the notion of the stream of consciousness evolved from James’ notion of present existence characterized by continuous flow to Husserl’s notion of recall or reconstruction with trains of segmented objects. Alongside this discussion, from the notion of the sensible continuity of the stream of consciousness we can see another essential consequence of James’ thought, that the continuous generation of the next state of mind from the current one endows a feeling that each state in the stream belongs to a single enduring self. The experience of selfhood— the feeling of myself from the past to the present as belonging to the same self— might arise from the sensible continuity of the conscious state.

Finally, the fourth observation professes that our consciousness attends to a particular part of experiences in the stream. Or, that consciousness brings forth some part of a whole as its object of attention. Heidegger (1962) attends to this under the heading of “attunement,” and James’ observations of this aspect of the stream of consciousness lead to his conception of free will. Free will is the capability of an agent to choose freely, by itself, a course of action from among multiple alternatives. However, the essential question concerning free will is that if we suppose that everything proceeds deterministically by following the laws of physics, what is left that enables our will to be free? According to Thomas Hobbes, a materialist philosopher, “voluntary” actions are compatible with strict logical and physical determinism, wherein “the cause of the will is not the will itself, but something else which is not disposed of it” (Molesworth, 1841, p. 376). He considers that will is not in fact free at all because voluntary actions, rather than being random and uncaused, have necessary causes.

James proposed a possible model for free will that combines randomness and deterministic characteristics, in the so-called two-stage model (James, 1884). In this model, multiple alternative possibilities are imagined with the help of some degree of randomness in the first stage and then one possibility is chosen to be enacted through deterministic evaluation of the alternatives in the second stage. Then, how can these possible alternatives, in terms of the course of actions or images, be generated? James considers that all possibilities are learned by way of experience. He says, “when a particular movement, having once occurred in a random, reflex, or involuntary way, has left an image of itself in the memory, then the movement can be desired again, proposed as an end, and deliberately willed” (James, 1918, p. 487). He considers further that the iterative experiences of different movements result in connections and relations among various images of movements in memory. Then, multiple alternatives can be imagined as accidental generations with spontaneous variations from the memory that has been consolidated, and finally one of the alternatives is selected for actual enactment. These “accidental generations with spontaneous variations” might be better understood by recalling how James’ stream of consciousness is constituted. The stream is generated by transitions of thoughts and images embedded in the substantive parts. When the memory holds complex relations or connections between images of past experiences, images can be regenerated with spontaneous variations into streams of consciousness (see Figure 3.4 for an illustration of his ideas).

James considers that all of these things are mechanized by dynamics in the brain. He writes:

Consider once again the analogy of the brain. We believe the brain to be an organ whose internal equilibrium is always in a state of change— the change affecting every part. The pulses of change are doubtless more violent in one place than in another, their rhythm more rapid at this time than at that. As in a kaleidoscope revolving at a uniform rate, although the figures are always rearranging themselves, … So in the brain the perpetual rearrangement must result in some forms of tension lingering relatively long, whilst others simply come and pass (James, 1892).

Figure 3.4. An interpretative illustration of James’s thought accounting for how possible alternatives can be generated. Learning from various experiences forms a memory that has a relational structure among substantive images associated with actions. Multiple streams of action images can be generated with spontaneous variations of transitions from among the images embedded in the memory. One of those streams of action images is selected for actual generation.

It is amazing that more than 100 years ago James had already developed such a dynamic view of brain processes. His thinking is compatible with today's cutting-edge views outlined in studies on neurodynamic modeling, as seen in later chapters.

3.7. Summary

Skepticism about the symbolist framework for representing the world as put forward by traditional cognitive science and outlined in chapter 2 has led us in the present chapter to look to phenomenology for alternative views. Let's take stock of what we've covered. Phenomenology begins with an analysis of direct experiences that are not yet articulated by any ideas or thoughts. Husserl considers that objects and the world can exist because they can be meditated on, regardless of their corresponding existences in the physical world. Their representations are constituted by means of the intentionality of direct experiences, a process that entails consciousness. Although Husserl thinks that such representations are intended to be idealistic so as to be logically tractable, his thinking has been heavily criticized by Dreyfus and other modern philosophers. They claim that the inclination to ideality with logical formalism has turned out to provide a foundation for the symbolist framework envisioned by current cognitive science.

It was Heidegger who dramatically redirected phenomenology by returning to the problem of being. Focusing on the ways of being in everyday life, Heidegger explains through his notion of being-in-the-world that things can exist on account of the relational structure between them, for example, considering our usage of things. His thinking lies behind my early question, discussed in the introduction to this book, as to what the object of a refrigerator can actually mean to a robot when it names it a "refrigerator." The refrigerator should be judged not from its characteristic physical features but from the ways in which it is used, such as for taking a chilled beer from it. Heidegger also says that such being is not particularly noticed in daily life, as we are submerged in relational structures, as usage becomes habit and habit proceeds smoothly. We become consciously aware of the individual being of the subject and the object only in the very moment of a breakdown in the purposeful relations between them; for example, when a carpenter mishits a nail in hammering, he notices that he himself, the hammer, and the nail are independent beings. In a similar way, when habits and conventions break down, no longer delivering anticipated success, the authentic individual engages in serious reflection on these past habits, transforms them, and thus lives proactively for his or her "ownmost" future alongside and with others with whom these habits and conventions are shared.

Merleau-Ponty, who was influenced by Heidegger, examined bodies as ambiguous beings that are neither subject nor object. On Merleau-Ponty's account, when seeing is regarded as being seen and touching as being touched, these different modalities of sensation intertwine and their reentrance through embodiment is iterated. By means of such iterative processes, the subject and the object constitute an inseparable being, reciprocally inserted into each other in the course of resolving the apparent conflicts between them in the medium of embodiment. Recently, his thoughts on embodiment have been revived and have significantly influenced cognitive science through the rising "embodied minds" paradigm, as advanced by Varela and his colleagues (Varela, Thompson & Rosch, 1991).

We finished this chapter by reviewing how William James explained the inner phenomena of consciousness and free will. His dynamic stream of consciousness is generated by spontaneous variations of images from past experiences consolidated in memory. More than a century later, his ideas are still inspiring work in systems neuroscience. But do these philosophers' deliberations suggest anything useful for building minds? Indeed, at the least we should keep in mind that action and perception interact in a complicated manner and that our minds should emerge via such nontrivial dynamic processes. The next chapter examines neuroscience approaches for exploring the underlying mechanisms of cognitive minds in biological brains.

4. Introducing the Brain and Brain Science

In the previous chapter, we saw that a phenomenological understanding of the mind has come from introspection and its expression through language. We understand the words used intuitively or deliberatively by matching them with our own experiences and images. This approach to understanding the mind, that of subjective reflection, is clearly an essential approach, and is especially valuable when coupled with the vast knowledge that has been accumulated through other scientific approaches, such as neuroscience, which make use of modern technologies to help us understand how we think by understanding how the brain works. The approach that neuroscience, or brain science, takes is quite different from that of cognitive science and phenomenology because it rests on objective observation of biological phenomena in the brain. It attempts to explain biological mechanisms for various cognitive functions such as generating actions, recognizing visual objects, or recognizing and generating speech.

However, readers should note that brain science is still at a relatively early stage of development and we have no confirmed accounts even for basic mechanisms. What we do have is some evidence of what is happening in the brain, albeit in many cases evidence that is still conflicting.

What we have to do is build up the most likely construct for a theory of the brain by carefully examining and linking together all the pieces of evidence we have thus far accumulated, held against guiding insights into the phenomenology of the human condition, such as those left by James and Merleau-Ponty, while adding yet more experimental evidence in confirmation or disputation of these guiding insights. In the process, further guiding insights may be generated, and research into the nature of the mind relative to the function of the brain will advance.

The next section starts with a review of the current state of the art in brain science with a focus on the processes of visual recognition and action generation, essential for creating autonomous robots. First, the chapter provides a conventional explanation of each independently, and then covers recent views that argue that these two processes are effectively inseparable. At the end of this chapter, we introduce some ideas informed by our robotics experiments on how intentions for actions originate in (human and other animal, organic not artificial) brains.

4.1. Hierarchical Brain Mechanisms for Visual Recognition and Action Generation

This section explores how visual recognition and action generation can be achieved in brains by reviewing accumulated evidence. A special focus will be put on how those processes work with hierarchical organization in brains, because insights into this structure help to guide us in approaching outstanding questions in cognitive science, such as how compositional manipulations of sensory-motor patterns can be achieved, as well as how the direct experience of sensory-motor flow can be objectified.

4.1.1 Visual Recognition Through Hierarchy and Modularity

First, let us look at the visual recognition process. Visual recognition is probably the most examined brain function, because related neuronal processes can be investigated relatively easily in electrophysiological experiments with nonmoving, anesthetized animals. The visual stimulus enters the retina first, proceeds to the lateral geniculate nucleus in the thalamus, and then continues on to the primary visual cortex (V1). One important characteristic assumed in the visual cortex as well as in other sensory cortices is its hierarchical and modular processing, which uses specific neuronal connectivity between local regions. Figure 4.1 shows the visual cortex of a macaque monkey in which the visual stimulus from the retina through the thalamus enters V1, located in the posterior part of the cortex.

V1 is thought to be responsible for lower-end processing such as edge detection, by using so-called columnar organization. The cortical columns for edge detection in V1 are arrayed for continuously changing orientation. The orientation of the perceived edge in the local receptive field is detected in a winner-take-all manner; that is, only the best-matching column for the edge orientation is activated (i.e., the neurons in that column fire) and the other columns become silent.
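As a rough illustration of this winner-take-all scheme, the toy sketch below pits a bank of orientation-tuned "columns" against one another and silences all but the best match. The cosine tuning curve and the 15-degree column spacing are illustrative assumptions, not physiological values.

```python
import numpy as np

# Toy caricature of winner-take-all competition among V1 orientation
# columns: the column whose preferred orientation best matches the edge
# stays active; all others are silenced.
PREFERRED = np.deg2rad(np.arange(0, 180, 15))    # 12 hypothetical columns

def column_responses(edge_angle_deg):
    theta = np.deg2rad(edge_angle_deg)
    # Orientation is periodic over 180 degrees, hence the factor of 2.
    tuning = np.cos(2 * (PREFERRED - theta))
    out = np.zeros_like(tuning)
    out[np.argmax(tuning)] = 1.0                 # winner takes all
    return out

print(column_responses(47.0))                    # the 45-degree column wins
```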

After V1, the signal propagates to V2, where columns undertake slightly more complex tasks such as perceiving different orientations of line segments by detecting the end terminals of the line segments. After V2, the visual processing pathway branches into two: the ventral pathway reaches areas TEO and TE in the inferotemporal cortex, passing through V4, and the dorsal pathway reaches areas LIP and VIP in the parietal cortex, passing through the middle temporal area (MT) and medial superior temporal area (MST). The ventral branch is called the what pathway owing to its main involvement in object identification, and the dorsal branch is called the where pathway due to its involvement in information processing related to position and movement.

Figure 4.1. Visual cortex of the macaque monkey showing the "what" and "where" pathways schematically (LIP: lateral intraparietal area; VIP: ventral intraparietal area; MST: medial superior temporal area; MT: middle temporal area; TEO, TE: inferior temporal areas).

Taking the case of the where pathway first, it is said that the MT detects the direction of object motion with a relatively small receptive field, whereas the MST detects background scenes with a larger receptive field. Because movements in the background scene are in many cases related to one's own body movements, the MST consequently detects self-movements. This information is then sent to areas such as the VIP and LIP in the parietal cortex. Cells in the VIP are multisensory neurons that often respond to both a visual stimulus and a somatosensory stimulus. For example, it has been found that some VIP neurons in macaque monkeys respond when the experimenter strokes the animal's face, and the same neurons fire when the experimenter shakes the monkey's hand in front of its face. As discussed later, many neurons in the parietal cortex integrate visual inputs with another modality of sensation (i.e., somatosensory, proprioceptive, or auditory). LIP neurons are involved in processing saccadic eye movements, enabling the visual localization of objects.

In the case of the what pathway, cells in V4 respond to specific contours or simple object features. Cells in the TEO respond to both simple and complex object features, and cells in the TE respond only to complex object features. In terms of the visual processing that occurs in the inferotemporal cortex, inspiring observations were made by Keiji Tanaka (1993) when conducting single-unit recording¹ in partially anesthetized monkeys while showing the animals a set of artificially created complex object features. Columnar representations were found in the TE for a set of complex object features, wherein most of the cells in the same column reacted to similar complex object features (Figure 4.2).

1. Single-unit recording is a method of measuring the electrophysiological responses of a single neuron using a microelectrode system.

Figure 4.2. Cell responses to complex object features in area TE in the inferotemporal cortex: columnar modular representation in the TE for complex visual objects. Redrawn from Tanaka (1993).

For example, in a particular column that encodes starlike shapes, different cells may react to similar starlike shapes that have a different number of spines. This observation suggests that TE columns represent a set of complex object features discretely, like visual alphabets, while allowing a range of modulation of complex object features within the column. It can be summarized, then, that visual perception of objects might be compositional in the what pathway, in the sense that a set of visual parts registered at one level of the hierarchy is spatially combined at the next level, as illustrated in Figure 4.3.

In the first stage, in V1, edges are detected at each narrow local receptive field from the raw retinotopic image, and in V2 the edge segments are detected. In V4, with its larger receptive field, connected edge segments with continuously changing orientations are detected as a single contour curvature. Then, in the TEO, geometric combinations of contour curvatures in a larger-again receptive field are detected as simple object features (some could be complex object features). Finally, in the TE, combinations of object features are detected as complex object features. It seems that columns in each visual cortical area represent primitive features at each stage of visual processing. Furthermore, each primitive feature represented in a column might be parameterized for minor modulation by local cell firing patterns.

Figure 4.3. Schematic illustration of visual perception in the what pathway.
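This compositional account can be caricatured as each stage detecting conjunctions of the parts registered at the stage below. The fragment below is purely schematic: the part vocabularies are invented placeholders, and real receptive fields are of course not sets of discrete symbols.

```python
# Schematic composition along the what pathway: each stage detects the
# features whose constituent parts were all registered at the stage below.
def detect(parts_present, vocabulary):
    return {feat for feat, parts in vocabulary.items()
            if parts <= parts_present}

V2_VOCAB = {"corner": {"edge_0", "edge_90"}}     # edges -> segments
V4_VOCAB = {"contour": {"corner"}}               # segments -> curvatures
TEO_VOCAB = {"square": {"contour"}}              # curvatures -> simple features
TE_VOCAB = {"grid": {"square"}}                  # features -> complex objects

stage = {"edge_0", "edge_90"}                    # V1 output for one patch
for vocab in (V2_VOCAB, V4_VOCAB, TEO_VOCAB, TE_VOCAB):
    stage = detect(stage, vocab)
print(stage)                                     # {'grid'}
```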

4.1.2 Counter Arguments

As mentioned in the beginning of this section, we must exercise some caution in interpreting actual brain mechanisms from the data available to us thus far. Although the aforementioned compositional mechanisms for visual recognition were conceived as utilizing explicit representations of visual parts stored in local columns, with hierarchical manipulation of those parts from the lower level to the higher, the real mechanism may not be so simply mechanical but, rather, highly contextual.

There is accumulating evidence that neuronal responses in the local receptive field in early vision can be modulated contextually by means of lateral interactions with areas outside of the receptive field as well as through top-down feedback from higher levels. Although contours are thought to be perceivable only after V4 in the classical theory, Li and colleagues (2006) showed that, in monkeys performing a contour detection task, there was a close correlation between the responses of V1 neurons and the perceptual saliency of contours. Interestingly, they showed that the same visual contours elicited significantly weaker neuronal responses when they were not the objects of attention. They concluded that contours can be perceived even in V1 by using the contextual information available at this same level and the higher level.

Kourtzi and colleagues (2003) provided corroborative evidence that early visual areas V1 and V2 respond to global rather than simple local features. It was argued that context modulation in the early visual cortex has a highly sophisticated nature, in effect putting the local features to which the cells respond into their full perceptual global context. These experimental results were obtainable because awake rather than anesthetized animals were used during the recording. In electrophysiological experiments on the visual cortex, animals are usually anesthetized so as to avoid contamination of purely bottom-up perceptual signals with unnecessary top-down signals from higher-order cognitive brain regions such as the prefrontal cortex. Contrary to this method, however, top-down signals seem to be as important as the bottom-up ones in understanding the hierarchy of vision. Rajesh Rao and Dana Ballard (1999) proposed so-called predictive coding as a model for hierarchical visual processing in which the top-down signal conveys the prediction from the higher level activity to the lower one, whereas the bottom-up signal conveys the prediction error signal from the lower level, which modulates the higher level activity. They argue that the visual recognition of complex objects is achieved via such interaction between these two pathways rather than merely through the bottom-up one. This insight is deeply important to the neurorobotic experiments to come.
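In its simplest linear form, predictive coding is easy to sketch. In the toy loop below, a higher level holds a latent state r, the top-down pathway predicts the input as W r, and the bottom-up pathway carries the prediction error that nudges r. The dimensions, the fixed random weights, and the update rate are arbitrary illustrative choices, not values from Rao and Ballard's model.

```python
import numpy as np

# Minimal two-level predictive-coding loop: the higher level settles on
# the latent cause r that makes the top-down prediction W @ r match the
# sensory input, driven only by the bottom-up prediction error.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))            # assumed fixed generative weights
x = W @ np.array([1.0, -0.5, 0.2])     # an input the model can explain

r = np.zeros(3)                        # higher-level activity (the "cause")
for _ in range(300):
    prediction = W @ r                 # top-down: predict the input
    error = x - prediction             # bottom-up: prediction error
    r += 0.02 * W.T @ error            # higher level moves to cancel error

print(np.round(r, 2))                  # settles near [1.0, -0.5, 0.2]
```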

Page 66: Exploring Robotic Mindsdl.booktolearn.com/...exploring_robotic_minds_92a1.pdfPart I On the Mind 1. Where Do We Begin with Mind? 3 2. Cognitivism 9 2.1 Composition and Recursion in

Introducing the Brain and Brain Science 49

49

The modularity of feature representation in the columnar organization is also questionable. Yen and colleagues (2007) made simultaneous recordings of multiple early visual cortex cells in cats while showing the animals movies containing scenes from daily life. What they found was that there is a substantially large heterogeneity in the responses of adjacent cells in the same columns. This finding obviously conflicts with the classical view that cells with similar response properties are clustered together in columns. They mention that visual cortex cells could have multiple response dimensions.

To sum up, the presumption of strict hierarchical and modular processing in visual recognition might have to be reconsidered, given the evidence that has accumulated as experimental setups have become more realistic. The next subsection begins this process concerning action generation in the brain.

4.1.3 Action Generation Through Hierarchy

Understanding the brain mechanisms behind action generation is essential to our attempts at understanding how the mind works, because actions tie the subjective mind to the objective world. It is generally thought that complex actions can be generated by moving through multiple stages of processing in different local areas in the brain, in a similar way to how visual perception is achieved. Figure 4.4 shows the main brain areas assumed to be involved in action generation in the cortex.

Figure 4.4. The main cortical areas involved in action generation include the primary motor cortex (M1), supplementary motor area (SMA), premotor cortex (PMC), and parietal cortex. The prefrontal cortex and inferior parietal cortex also play important roles.

The supplementary motor area (SMA) and the premotor cortex (PMC) are considered to sit at the top of the action generation hierarchy. Some researchers think that the prefrontal cortex may play a still higher functional role in action generation, sitting as it does above the SMA or PMC, and we will return to this view later. It is generally held that the SMA is involved in organizing action programs for voluntary action sequences, whereas the PMC is involved in organizing action programs for sensory-guided action sequences. Because these areas have dense projections to the primary motor cortex (M1), the idea is that detailed motor patterns along with the motor program are generated in M1. Then, M1 sends the motor pattern signals via the pons and cerebellum to the spinal cord, which then sends out detailed motor commands to the corresponding muscles to finally initiate physical movement. In a seminal study of the primary motor cortex, Georgopoulos and colleagues (1982) found evidence in electrophysiological experiments in monkeys that the direction of hand movement or reaching behavior is encoded by a population of neural activities in M1. In the following, we review possible relationships between the SMA and M1 and between the PMC and M1.

4.1.4 Voluntary Sequential Movements in the Supplementary Motor Area

Considerable evidence suggests that hierarchical relations exist between the SMA and M1. One well-known example involves patients with alien hand syndrome due to lesions in the SMA. These patients tend to generate actions completely bypassing their consciousness. For example, when they see a comb, their hand reaches out to it and they comb their hair compulsively. It is essential to note that the skilled behaviors involved in combing their hair are completely intact. These people act well; it is just that they seem unable to regulate their actions at will. By way of explanation, it is thought that the SMA might regulate the generation of skilled behaviors by placing inhibitory controls over M1, which encodes a set of basic movement patterns including the one for combing hair. So if this inhibitory control is attenuated by lesions in the SMA, the mere perception of a comb could automatically trigger the movement pattern for the combing of hair stored in M1.

Neurophysiological evidence for the encoding of voluntary sequential movement in the SMA was obtained in pioneering studies conducted by Tanji's group (Tanji & Shima, 1994; Shima & Tanji, 1998; Shima & Tanji, 2000). In these studies, monkeys were trained to be able to regenerate a set of specific sequential movements involving a combination of three primitive movements: pulling, pushing, and turning a handle. In each sequence, the three primitive movements were connected in serial order with a specific time interval at each transition of movement. After the training, the monkeys were required to regenerate each learned sequential movement from memory without any sensory cues being given. In this way, the task can be regarded as memory driven rather than sensory reactive. In the unit recording in the SMA during the regeneration phase, three types of task-related cells were found.

The first interesting finding was that 54 out of 206 recorded cells showed sequence-specific activities. Figure 4.5a shows raster plots of one of these 54 cells, in this case an SMA cell, which was activated only before the sequence Turn-Pull-Push (lower) was initiated, not before other sequences such as Turn-Push-Pull (upper) were initiated. It is interesting to note that it took a few seconds for the SMA cell to be fully activated before onset of the sequential movements and that the activation diminished immediately after onset of the first movement. It is assumed, therefore, that the cell is responsible for preparing the action program for the specific sequential movement. This is contrasted with the situation observed in the M1 cell shown in the raster plot in Figure 4.5b. The M1 cell started to become active immediately before the onset of the specific movement and became fully activated during the actual movement itself. The preparatory period of this M1 cell was quite short, within a fraction of a second.

Figure 4.5. Raster plots showing cell firing across multiple trials (upper part) and the mean firing rate across those trials in the supplementary motor area (SMA) and primary motor cortex (M1) during trained sequential movements. (a) An SMA cell activated only in the preparatory period for initiating the Turn-Pull-Push sequence shown in the bottom panel, not for other sequences such as the Turn-Push-Pull sequence shown in the top panel. (b) An M1 cell encoding the single Push movement. Adapted from Tanji and Shima (1994) with permission.

Tanji and Shima's results imply that some portion of SMA cells play an essential role in the generation of compositional actions by sequentially combining primitive movements. These cells might encode whole sequences as abstract action programs with slowly changing activation profiles during the preparatory period. This activity might then lead to the activation of other SMA cells that can induce specific transitions from one movement to another during run time by activating particular M1 cells, as well as SMA cells that encode corresponding movements with rapidly changing activation profiles. Here, we can assume a certain spatiotemporal structure that affords hierarchical organization of sequential movements. In later work, Shima and Tanji (2000) reported further important findings from more detailed recording in a similar task protocol. Some cells were found to play multiple functional roles: some SMA cells encoded not only a single specific motor sequence, but two or three different sequences out of four trained sequences. This suggests the interesting neuroscientific result that a set of primitive sequences is represented by the distributed activation of some SMA cells, rather than each sequence being represented exclusively and uniquely by specific cells.

Although evidence acquired by various brain-measuring techniques supports the notion that hierarchical organization of voluntary sequential movements occurs in the SMA for abstract sequence processing and in M1 for detailed movement patterns, this view is not yet set in stone. Valuable challenges have arisen against the idea of the SMA encoding abstract sequences. Lu and Ashe (2005) recorded M1 cell activity during sequential arm movements in monkeys. In the task, each arm movement was either downward, upward, toward the left, or toward the right. It was found that the neural activity of some M1 cells immediately before onset of the sequential movements "anticipated" the coming sequences, and that 40% of the recorded M1 cells could do this. Surprisingly, this percentage is much higher than that observed in the SMA by Tanji and Shima. Are the sequence-related activities of M1 cells merely epiphenomena that reflect the activity of SMA cells upstream, or do they actually function to initiate corresponding motor sequences? Lu and Ashe dispelled any doubt about the answer by demonstrating that a lesion among the M1 cells, artificially created by microinjection of chemicals, degraded only the generation of sequences, not each movement. It seems, then, that M1 cells primarily encode sequences rather than each movement, at least in the monkeys and cells involved in Lu and Ashe's experiment.

4.1.5 Sensory-Guided Actions in the Premotor Cortex

The SMA is considered by most to be responsible for organizing complex actions such as sequential movements based on internal motivation, whereas the PMC is considered to generate actions in a more externally driven manner by making use of immediate sensory information. Mushiake, in Tanji's group, showed clear neurophysiological evidence for this dissociation (Mushiake et al., 1991). They trained monkeys to generate sequential movements under two different conditions: the internal motivation condition, in which the monkeys remembered sequential movements and reproduced them from memory, and the external sensory-driven condition, in which the monkeys generated sequential movements guided by given visual cues. Unit recording in both the SMA and PMC during these two task conditions revealed a distinct difference in the functional roles of these two regions. During both the premovement and movement periods, PMC neurons were more active when the task was visually guided and SMA neurons were more active when the sequence was self-determined from memorized sequential movements. It is known that there are so-called bimodal neurons in the PMC that respond both to specific visual stimuli and to one's own movement patterns. These bimodal neurons in the PMC associated with visual movement are said to receive "what" information from the inferotemporal cortex and "where" information from the parietal cortex. Thus, these bimodal neurons seem to enable the PMC to organize sensory-guided complex actions.

Graziano and colleagues (2002), in their local stimulation experiments on the monkey cortex, demonstrated related findings. However, in some aspects their experimental results conflict with the conventional idea that M1 encodes simple motor patterns such as directional movements or reaching actions, as shown by Georgopoulos and colleagues. They stimulated motor-related cortical regions with an electric current and recorded the corresponding movement trajectories of the limbs. Some stimuli generated movements involved in reaching to specific parts of the monkey's own body, including the ipsilateral arm, mouth, and chest, whereas others generated movements involving reaching toward external spaces. They found some topologically preserved mapping from sites over a large area including M1 and PMC to the generated reaching postures. The hand reached toward the lower space when the dorsal sites in the region were stimulated, for example, but reached toward the upper space when the ventral and anterior sites were stimulated. It was also found that many of those neurons were bimodal neurons exhibiting responses also to sensory stimuli. Given these results, Graziano and colleagues have adopted a view different from the conventional one, in that they believe that functional specification is topologically parameterized as a large single map, rather than there being separate subdivisions such as M1, the PMC, and the SMA that are responsible for differentiable aspects of motor-related functions in a more piecemeal fashion.

So far, some textbookish evidence has been introduced to account for the hierarchical organization of motor generation, whereby M1 seems to encode primitive movements, and the SMA and PMC are together responsible for the more macroscopic manipulation of these primitives. At the same time, some counterevidence was introduced that M1 cells function to sequence primitives, as if no explicit differences might exist between M1 and the PMC. Some evidence was also presented indicating that many neurons in the motor cortices are actually bimodal neurons that participate not only in motor action generation but also in sensory perception. The next section explores an alternative view accounting for action generation mechanisms, which has recently emerged from observation of bimodal neurons that seem to integrate these two processes of action generation and recognition.

4.2. A New Understanding of Action Generation and Recognition in the Brain

This book has alluded a number of times to the fact that perception of sensory inputs and generation of motor outputs might best be regarded as two sides of the same coin. In one way, we may think that a motor behavior is generated in response to a particular sensory input. However, in the case of voluntary action, intended behaviors performed by bodies acting on environments necessarily result in changes in proprioceptive, tactile, visual, and auditory perceptions. Putting the two together, a subject should be able to anticipate the perceptual outcomes of his or her own intended actions if similar actions are repeated under similar conditions. Indeed, the developmental psychologists Eleanor Gibson and Anne Pick have emphasized the role of perception in action generation. They wrote in their seminal book (2000) that infants are active learners who perceptually engage their environments and extract information from them. In their ecological approach, learning an action is not just about learning a motor command sequence. Rather, it involves learning possible perceptual structures extracted during intentional interactions with the environment. Indeed, actions might be represented in terms of an expectation of the resultant perceptual sequences caused by those intended actions. For example, when I reach for my mug of coffee, the action might be represented by a particular sequence of proprioception for my hand to make the preshape for grasping, as well as a particular sequence of visual perception of my hand approaching the mug, with a specific expectation related to the moment of touching it. The eminent neuroscientist Walter Freeman (2000) argues that action generation can be regarded as a proactive process by supposing this sort of action-perception cycle, rather than as the more passive, conventional perception-action cycle whereby motor behaviors are generated in response to perception.

Keeping these arguments in mind, the chapter now examines the functional roles of the parietal cortex, as this area appears to be the exact place where the top-down perceptual image for action intention originating in the frontal area meets the perceptual reality originating bottom-up from the various peripheral sensory areas. Thus located, the parietal cortex may play an essential role in mediating between the two, top and bottom. It then examines in detail so-called mirror neurons, which are thought to be essential to pairing the generation and the perceptual recognition of actions. It is said that the finding of mirror neurons drastically changed our understanding of the brain mechanisms related to action generation and recognition. Finally, the chapter rounds out by looking at neural correlates of intentions, or "will," which are thought to be initiated farthest upstream in the actional brain networks, by examining some evidence from neuroscience that bears on the nature of free will.

4.2.1 The Parietal Cortex: Where Action Intention and Perceptual Outcome Meet

The previous section (4.1) discussed the what and where pathways in visual processes. Today, many researchers refer to the where pathway that stretches from V1 to the parietal cortex as the how pathway, because recent evidence suggests that it is related more to behavior generation that makes use of multimodal sensory information than merely to spatial visual perception. Mel Goodale, David Milner, and colleagues (1991) conducted a series of investigations on patient D. F., who had visual agnosia, a severe disorder of visual recognition. When she was asked to name some household items, she misnamed them, calling a cup an ashtray or a fork a knife. However, when she was asked to pick up a pen from the table, she could do it smoothly. In this sense, then, the case of D. F. is very similar to that of Merleau-Ponty's patient Schneider (see chapter 3). Goodale and Milner tested D. F.'s ability to perceive the three-dimensional orientation of objects. Later, D. F. was found to have bilateral lesions in the ventral what pathway, but not in the dorsal how pathway, in the parietal cortex. This implies that D. F. could not recognize three-dimensional objects visually using information about their category, size, and orientation because her ventral what pathway, including the inferotemporal cortex, was damaged. She could, however, generate visually guided behaviors without conscious perception of objects. This was possible because her dorsal pathway, including the parietal cortex, was intact. Thus, the parietal cortex appears to be involved in how to manipulate visual objects, by allowing a close interaction between motor components and sensory components.

That the parietal cortex is involved in the generation of skilled behaviors by integrating vision-related and motor-related processes is a notion supported by the findings of electrophysiological experiments, especially those concerning bimodal neurons in the parietal cortex of the monkey during visually guided object manipulation. Hideo Sakata and colleagues (1995) identified populations of neurons that fire both when pushing a switch and when visually fixating on it. Skilled object manipulation behaviors such as pushing a switch should require an association between the visual information about the object itself and the motor outputs required for acting on it, and so, by extension, some populations of parietal cortex neurons should participate in this association by accessing both modalities of information.

Damage to the parietal cortex in humans, such as that caused by cerebral hemorrhage due to stroke or trauma, for instance, can result in various deficits in the skilled behavior needed for tool use. In the disorder ideational apraxia, individuals cannot understand how to use tools: if they are given a comb, they might try to brush their teeth with it. In ideomotor apraxia, individuals have difficulty particularly with miming: when asked to mime using a knife, they might knock on the table with their fist, or when asked to mime picking up tiny grains of rice, they move their hand toward the imagined grains but with it wide open. These clinical observations suggest that the parietal cortex might store some forms of knowledge, or "models," about the external world (e.g., objects, tools, and the surrounding workspace), and that through these models various mental images about possible interactions with the external world can be composed.

How can the skills or knowledge for object manipulation, or tool usage, be mechanized in the parietal cortex? Such skills would seem to require not only motor pattern generation but also proactive representation of the perceptual image associated with the motor act. Although the parietal cortex is conventionally seen as being responsible for integrating input from multiple sensory modalities, an increasing number of recent studies suggest that the parietal cortex might participate in predicting perceptual inputs associated with behaviors by acquiring some type of internal model (Sirigu et al., 1996; Eskandar & Assad, 1999; Desmurget & Grafton, 2000; Ehrsson et al., 2003; Mulliken et al., 2008; Bor & Seth, 2012). In particular, Mulliken and colleagues (2008) found direct evidence for the existence of a predictive model in the parietal cortex in their unit recording experiment involving monkeys performing a joystick task to control a cursor. They found that specific cells in the parietal cortex encode temporal estimates of the direction in which the cursor is moving, estimates that cannot be obtained directly from either the current sensory inputs or the motor outputs to the joystick, but can be obtained by forward prediction.

Now, let's consider how predicting perceptual sequences could facilitate the generation of skilled actions in the parietal cortex. Some researchers have considered that a predictive model referred to as the forward model, assumed to operate in the cerebellum, might also help us to understand what is happening in the parietal cortex. Masao Ito, who is famed for his findings linking long-term depression to the cerebellum, suggested that the cerebellum might host internal models for action (Ito, 1970). Following Ito's idea, Mitsuo Kawato and Daniel Wolpert constructed detailed forward models, computational models that account for optimal control of arm movements (Kawato, 1990; Wolpert & Kawato, 1998). The forward model basically predicts how the current sensory inputs change in the next time step for arbitrary motor commands given in the current time step. In the case of arm movement control, the forward model predicts changes in the angular positions of the arm joints as output when given joint motor torques as input. Adequate training of the forward model, based on iterative past experience of how joint angles change due to particular applied motor torques, can produce a good predictive model. More recently, Ito (2005) suggested that the forward model might be first acquired in the parietal cortex and further consolidated in the cerebellum later. In addition, Oztop, Kawato, and Arbib (2006) as well as Blakemore and Sirigu (2003) have suggested that both the parietal cortex and cerebellum might host the forward model.
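To make the forward-model idea concrete, the sketch below fits a one-joint forward model to simulated motor babbling: given the current joint angle, velocity, and an applied torque, it learns to predict the next state. The toy plant, its parameters, and the least-squares fit are all invented for illustration; they merely stand in for whatever internal model the cerebellum or parietal cortex actually acquires.

```python
import numpy as np

def plant(state, torque, dt=0.01, inertia=0.5, damping=0.1):
    """Toy one-joint dynamics standing in for the real arm (invented)."""
    angle, vel = state
    acc = (torque - damping * vel) / inertia
    return np.array([angle + vel * dt, vel + acc * dt])

# Motor babbling: collect (state, torque) -> next-state experience pairs.
rng = np.random.default_rng(1)
X, Y = [], []
state = np.zeros(2)
for _ in range(2000):
    torque = rng.uniform(-1.0, 1.0)
    nxt = plant(state, torque)
    X.append([*state, torque])
    Y.append(nxt)
    state = nxt

# Fit the forward model by least squares: next_state ~ [angle, vel, torque] @ A.
A, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)

# Predict the sensory consequence of a torque applied at a given state.
query = np.array([0.1, 0.0, 0.5])           # angle, velocity, torque
print(query @ A)                            # forward-model prediction
print(plant(np.array([0.1, 0.0]), 0.5))     # what the plant really does
```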

I, however, speculate that the predictive model in the parietal cortex may predict the perceptual outcome sequence as corresponding not to motor commands at each moment but to macroscopic states of "intention" for actions that might be sent from a higher-order cognition processing area such as the prefrontal cortex (Figure 4.6). For example, for a given intention of "throwing a basketball into a goal net," the corresponding visuo-proprioceptive flow, consisting of the proprioceptive trajectory of body posture change and the visual trajectory of the ball falling into the net, can be predicted. In a similar manner, such predictive models acquired by a skilled carpenter can predict the visuo-auditory-proprioceptive flow associated with an intention of "hitting a nail." These illustrations just follow the aforementioned thought of Gibson and Pick. The point here is that a predictive model may not need to predict the perceptual outcomes for all possible combinations of motor commands, including many unrealistic ones. If the predictive model attempts to learn to predict all possible motor command combinations, such an attempt will face a combinatorial explosion, which has been known as the "frame problem" (McCarthy, 1963) in AI research. Instead, a predictive model needs to predict possible perceptual trajectories associated only with a set of well-practiced, familiar actional intentions.

Figure 4.6. Predictive model in the parietal cortex. By receiving an intention for action from the prefrontal cortex, it predicts perceptual outcomes such as visuo-proprioceptive trajectories. Prediction of proprioception in terms of body posture results in the generation of the necessary motor command sequences for achieving it. The intention is modified in the direction of minimizing the mismatch between the prediction and the perceptual outcome.

Jeannerod (1994) has conjectured that individuals have so-called motor imagery for their well-practiced behaviors. Motor imagery is a mental process by which an individual imagines or simulates a given action without physically moving any body parts or sensing any signals from the outside world. The predictive model assumed in the parietal cortex can generate motor imagery by means of a look-ahead prediction of multimodal perceptual trajectories over a certain period. Indeed, Sirigu and colleagues (1996) compared healthy individuals, patients with damage to the primary motor area, and patients with damage to the parietal cortex, and reported that those with lesions in the parietal cortex showed selective impairment in generating motor imagery.

If the predictive model just predicts perceptual sequences for given intentions for action, how can motor command sequences be obtained? It can be considered that a predicted body posture state, in terms of anticipated proprioception, might be sent to the premotor cortex or primary motor cortex (M1) via the primary somatosensory cortex (S1) as a target posture to be achieved in the next time step. This information is further sent to the cerebellum, where the necessary motor commands or muscle forces to achieve this target posture might be composed. The target sensory signal could be a reaction force that is anticipated to be perceived, for example, in the thumb and index finger in the case of precisely grasping a small object. Again, the cerebellum might compute the necessary motor torque to be exerted on the thumb and finger joints in order to achieve the expected reaction force. This constitutes the top-down subjective intentional pathway acting on the objective world, as introduced through the brief review of phenomenology given in chapter 3.
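The text leaves the cerebellar computation itself unspecified. One simple stand-in for composing the command that achieves a target posture, offered purely as an illustration with invented gains, is proportional-derivative feedback on the joint state:

```python
def torque_toward(target_angle, angle, velocity, kp=8.0, kd=1.5):
    """Illustrative PD rule: drive a joint toward a predicted target posture."""
    return kp * (target_angle - angle) - kd * velocity

# e.g., current angle 0.1 rad, at rest, predicted target posture 0.8 rad:
print(torque_toward(0.8, 0.1, 0.0))   # positive torque toward the target
```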

Let's look next at the bottom-up recognition that is thought to be the counterpart to top-down prediction. The prediction of sensory modalities such as vision and tactile sensation that is projected to each peripheral sensory area through the top-down pathway might be compared with the actual outcome. When the visual or tactile sensation actually perceived is different from the predicted sensation, as in the situation described by Heidegger wherein the hammer misses hitting the nail (see chapter 3), the current intention of continuing to hit the nail would be shifted consciously to a different intention, such as looking for the mishit nail or searching for an unbroken hammer. If the mishit does not happen, however, everything will continue on automatically as expected, without any shifts occurring in the current intention. Such shifts in intentional states might be brought about through the mismatch error between prediction and perceptual reality. When such a mismatch is generated, the intention state may be updated in the direction of minimizing the mismatch error. As a consequence of the interaction between these top-down and bottom-up processes, current intentions can be reformed in light of a changing situation or environment.
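A schematic version of this mismatch-driven update is sketched below: a hypothetical intention state generates a predicted percept, and the error between prediction and actual perception is propagated back to nudge the intention. The mapping, its weights, and the update rate are all invented; the point is only the direction of information flow.

```python
import numpy as np

W = np.array([[0.9, -0.4],
              [0.3,  0.8]])                  # assumed intention -> percept map

def predict_percept(intention):
    return np.tanh(W @ intention)            # top-down prediction

intention = np.array([0.5, -0.2])            # current intention state
actual = np.array([0.1, 0.7])                # perceptual reality

for _ in range(500):
    predicted = predict_percept(intention)
    mismatch = actual - predicted            # bottom-up error signal
    jacobian = np.diag(1 - predicted**2) @ W
    intention += 0.5 * jacobian.T @ mismatch # update intention to reduce error

print(np.round(predict_percept(intention), 3))   # now close to `actual`
```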

When an action changes the perceptual reality away from the one expected, the recognized perceptual reality alters the current intention. This aspect of top-down and bottom-up interaction is analogous to the predictive coding suggested for hierarchical visual processing by Rao and Ballard (see section 4.1). The obvious question to ask is whether the brain actually employs such intention adjustment mechanisms by monitoring the outcomes of its own predictions. There is some recent evidence to this effect based on human brain imaging techniques, including functional magnetic resonance imaging (fMRI) and electroencephalography (EEG). Both techniques are known to be good at measuring global brain activity and to complement one another, with relatively good spatial resolution from fMRI and good temporal resolution from EEG. These imaging studies have suggested that the temporoparietal junction (TPJ), where the temporal and parietal lobes meet, the inferior frontal cortex, and the SMA may all be involved in detecting mismatches between expected and actual perception in multimodal sensations (Downar et al., 2000; Balslev et al., 2005; Frith & Frith, 2012). It may be the TPJ that triggers adjustments in the current action by detecting such mismatches (Frith & Frith, 2012).

That said, it may be reasonable to consider the alternative: that interactions between top-down prediction with a specific intention and bottom-up modification of this intention take place in a web of local networks including the frontal cortex, the parietal cortex, and the various peripheral sensory areas, rather than in one specific local region. From this more distributed point of view, whatever regions are actually involved, it is the interactions between them that are indispensable in the organization of diverse intentional skilled actions in a changeable environment.

4.2.2 Returning to Merleau-Ponty

The concept behind the predictive model accords well with some of Merleau-Ponty's thinking, as described in chapter 3. In his analysis of a blind man walking with a stick, he writes that the stick can also be a part of the body when the man scans his surroundings by touching its tip to things. This phenomenon can be accounted for by the acquisition of a predictive model for the stick. During a lengthy period in which the man uses the same stick, he acquires a model through which he can anticipate how tactile sensation will propagate from the tip of the stick while touching things in his environment. Because of this unconscious anticipation, which we can think about in terms of Husserl's notion of protention (e.g., we would anticipate hearing the next note, "mi," when hearing "re" in "do-re-mi," as reviewed in chapter 3), and recalling Heidegger's treatment of equipment as extensions of native capacities for action, the stick could be felt to be a part of the body, provided that the anticipation agrees with the outcome.

Related to this, Atsushi Iriki and colleagues (1996) made an important finding in their electrophysiological recording of the parietal cortex in monkeys during a tool manipulation task. Monkeys confined to chairs were trained to use a rake to draw toward them small food objects located in front of them. After the training, neurons in the intraparietal sulcus, a part of the parietal cortex, were recorded during two phases: capturing the food without the rake and capturing the food with it. In the without-rake phase, they found that some bimodal neurons fired either when a tactile stimulus was given to the palm of the hand or when a visual stimulus approached the vicinity of the palm. It was shown that these particular neurons have a certain receptive field. Thus, each neuron fires only when the visual or tactile stimulus comes to a specific position relative to the palm (Figure 4.7a).

Surprisingly, in the with-rake phase, the same neurons fired when the visual stimulus approached the vicinity of the rake, thus demonstrating an extension of the visual receptive field to include the rake (Figure 4.7b). This shifting of the receptive field from the vicinity of the hand to that of the rake implies that the monkey perceives the rake as a part of the body when extended from the hand and purposefully employed, in the same way that the stick becomes a part of the body of the blind man. Monkeys thus seem to embody a predictive model that includes possible interactions between the rake and the food object.

Figure 4.7. The receptive field of neurons in the intraparietal sulcus (a) in the vicinity of the hand in the without-rake phase and (b) extended to cover the vicinity of the rake in the with-rake phase.

The phantom limb phenomenon described in chapter 3 can be understood as an opposite case to that of the blind man's stick. Even though the limb has been amputated, the predictive model for the limb might remain as a "familiar horizon," as Merleau-Ponty would say, generating the expectation of a sensory image corresponding to the current action intention sent toward the phantom limb from the motor cortex. The psychosomatic treatment invented by Ramachandran and Blakeslee (1998) using the virtual-reality mirror box provided patients with fake visual feedback that an amputated hand was moving. This feedback to the predictive model would have evoked the proprioceptive image of "move" for the amputated limb by modifying the current intention from "freeze" to "move," which might result in the feeling of twitching that patients experience in phantom limbs.

Merleau-Ponty held that synesthesia, wherein sensation in one modality unconsciously evokes perception in another, might originate from iterative interactions between multiple modalities of sensation and motor outputs by means of reentrant mechanisms established in the coupling between the world and us (see chapter 3). If we consider that the predictive model deals with the anticipation of multimodal sensations, it is not feasible to assume that each modality of sensation is anticipated independently. Instead, a shared structure should exist, or be organized, that can anticipate the incoming sensory flow from all of the modalities together. It is speculated that such a dynamic structure is composed of collective neuronal activity, and it makes sense to consider that the bimodal neurons found in the parietal cortex as well as in the premotor cortex might in part constitute such a structure.

In sum then, the functional role of the parietal cortex in many ways reflects what Merleau- Ponty was pointing to in his philosophy of embodi-ment. Actually, the how pathway stretching through the parietal cortex is reminiscent of ambiguity in Merleau- Ponty’s sense, as it is located mid-way between the visual cortex that receives visual inputs from the objec-tive world and the prefrontal cortex that provides executive control with subjective intention over the rest of the brain. Several fMRI studies of object manipulation and motor imagery for objects have shown signifi-cant activation in the inferior parietal cortex. Probably the goal of object manipulation propagates from the prefrontal cortex through the supple-mentary motor area to the parietal cortex via the top- down pathway, whereas perceptual reality during manipulation of the object propagates from the sensory cortices, including the visual cortex and somatosen-sory cortex for tactile and proprioceptive sensation, via the bottom- up


Both of these pathways likely intermingle with each other, with close interaction occurring in the parietal cortex.

4.2.3 Mirror Neurons: Unifying the Generation and Recognition of Actions

Many researchers would agree that the discovery of mirror neurons by Rizzolatti's group in 1996 is one of the most important findings for systems neuroscience in recent decades. Personally, I find the idea of mirror neurons very appealing because it promises to explain how the two essential cognitive processes of generating and recognizing actions can be unified into a single system.

4.2.4 The Evidence for Mirror Neurons

In the mid-1990s, researchers in the Rizzolatti laboratory in Parma were investigating the activities of neurons in the ventral premotor area (PMv) in the control of hand and mouth movements in monkeys. They had found that these neurons fired when the monkey grasped food objects, and whenever they fired, electrodes activated electronic circuitry to give an audible beep. Serendipitously, one day a graduate student entered the lab with an ice cream cone in his hand, and every time he brought it to his lips, the system responded with a beep! The same neurons were firing both when the monkey grasped food objects and moved them to its mouth and when the monkey observed others doing a similar action. With a grad student bringing an ice cream cone to his mouth, mirror neurons were discovered!

Figure 4.8 shows the firing activity of a mirror neuron responding to a particular self-generated action as well as to the same action performed by an experimenter. Figure 4.8a shows a PMv neuron firing as the monkey observes the experimenter grasping a piece of food. Here, we see that the firing of the neuron ceases as the experimenter moves the food toward the monkey. Then, the same neuron fires again when the monkey grasps the food given by the experimenter. In Figure 4.8b, it can be seen that the same neuron does not fire when the monkey observes the experimenter picking up the food with an (unfamiliar!) tool, but thereafter firing occurs as described for the rest of the sequence of events in (a).


Besides these "grasping neurons," they also found "holding neurons" and "tearing neurons" that functioned in the same way. There are two important characteristics of these mirror neurons. The first is that they encode entire goal-directed behaviors, not parts of them; that is, the grasping neurons do not fire when the monkey is merely about to grasp the object. The second characteristic is that all the mirror neurons found in the monkey experiments are related to transitive actions toward objects. Mirror neurons in the monkey have so far not been found to respond to intransitive behaviors such as reaching the hand toward a part of the body. That said, it looks to be a different case for humans, as recent human fMRI studies have found mirror systems also for intransitive actions (Rizzolatti & Craighero, 2004).

Recent monkey experiments by Rizzolatti's group (Fogassi et al., 2005) have indicated that mirror neurons can be observed in the inferior parietal lobe (IPL) and that these function to both generate and recognize goal-directed actions composed of sequences of elementary movements. In their experiments, monkeys were trained to perform two different goal-directed actions: to grasp pieces of food and then move them to their own mouths to eat, and to grasp solid objects (the same size and shape as the food objects) and then place them into a cylinder.


Figure 4.8. How mirror neurons work. (a) Firing of a mirror neuron shown in raster plots and histograms in the two situations in which the monkey observes the experimenter grasp a piece of food (left) and thereafter when the monkey grasps the same piece of food (right). (b) The same mirror neuron does not fire when the monkey observes the experimenter pick up the food with a tool (left), but it fires again when the monkey grasps the same piece of food (right). Adapted from Rizzolatti et al. (1996) with permission.


Interestingly, the activation patterns of many IPL neurons while grasping the objects differ depending on the subsequent goal, namely to eat or to place, even though the kinematics of grasping in both cases are the same. Supplemental experiments confirmed that the activation preferences during grasping do not originate from differences in visual stimuli between food and a solid object, but from the difference between goals. This view is reinforced by the fact that the same IPL neurons fired when the monkeys observed the experimenters achieving the same goals. These IPL neurons can therefore also be regarded as mirror neurons. It is certainly interesting that mirror neuron involvement is not limited to the generation and recognition of simple actions, but also occurs with compositional goal-directed actions consisting of chains of elementary movements.

Recent imaging studies focusing on imitative behaviors have also identified mirror systems in humans. Imitation is considered to be cognitive behavior whereby an individual observes and replicates the behaviors of others. fMRI experimental results have shown that neural activation in the posterior part of the left inferior frontal gyrus as well as in the right superior temporal sulcus increases during imitation (Iacoboni et al., 1999). If we consider that the posterior part of the left inferior frontal gyrus (also called Broca's area) in humans is homologous to the PMv or F5 in monkeys, it is indeed feasible that these local sites could host mirror neurons in humans. Although it remains a matter of debate how well other animals, including nonhuman primates, dolphins, and parrots, can imitate, it is widely held that the imitation capability that evolved uniquely in humans has enabled them to acquire wide-ranging skills and knowledge underlying human-specific intellectual behaviors, including tool use and language.

Michael Arbib (2012) has explored possible linkages between mirror neurons and human linguistic competency. Based on accounts of the evolutionary pathway from nonhuman primates to humans, he has developed the view that the involvement of mirror neurons in embodied experience grounds the brain structures that underlie language. He has hypothesized that what he calls the "human language-ready brain" rests on evolutionary developments in primates, including mirror system processing (for skillful manual manipulation of objects, imitation of the manipulations performed by others, pantomime, and conventionalized manual gestures), that initiated the protosign system. He further proposed that the development of protosigns provided the scaffolding essential for protospeech in the evolution of protolanguage (Arbib, 2010).


This hypothesis is interesting in light of the fact that mirror neurons in human brains might be responsible for recognizing the intentions of others as expressed in language. Indeed, researchers have examined this idea using various brain imaging techniques such as fMRI, positron emission tomography, and EEG. Hauk and colleagues (2004) showed in an fMRI experiment that reading action-related words with different "end effectors," namely "lick," "pick," and "kick," evoked neural activity in motor areas that overlap with the local areas responsible for generating motor movements of the face, arm, and leg, respectively. More specifically, "lick" activated the sylvian fissure, "pick" activated the dorsolateral sites of the motor cortex, and "kick" activated the vertex and interhemispheric sulcus. Broca's area was activated for all three words. Tettamanti and colleagues (2005) observed similar activation patterns when their subjects listened to action-related sentences such as "I bite an apple," "I grasp a knife," and "I kick a ball." Taken together, these results suggest that understanding action-related words or sentences generates certain canonical activation patterns of mirror neurons, possibly in Broca's area, which in turn initiate corresponding activations in motor-related areas. These results also suggest that Broca's area might be a site of mirror neuronal activity in humans.

Vittorio Gallese and Alvin Goldman (1998) suggest that mirror neurons in humans play an essential role in theory of mind in social cognition. The theory of mind approach postulates that although the mental states of others are hidden from us, they can be inferred to some extent by applying naïve theories or causal rules about the mind to the observed behavior of others. Gallese and Goldman argue instead for a simulation theory, whereby the mental states of others are interpretable through mental simulations that adopt their perspective, by tracking or matching their states with states of one's own. If these human cases are granted, it can be said that the mirror neuron system has played an indispensable role in the emergence of uniquely human cognitive competencies along evolutionary pathways, from manual object manipulation, to protolanguage, to theory of mind.

4.2.5 How Might Mirror Neurons Work?

The reader may ask how the aforementioned mirror neural functions might be implemented in the brain.


Let's consider the mirror neuron mechanism in terms of the aforementioned predictive model (see Figure 4.6), which we assumed may be located in the parietal cortex. If we assume that mirror neurons encode the intention for action, we can easily explain how a particular activation pattern of the mirror neurons can lead to the generation of one's own specific action, and how recognition of the same action performed by others can lead to the same activation pattern in the mirror neurons. (Although Figure 4.6 assumed that the intention might be hosted somewhere in the prefrontal area, it could be hosted by mirror neurons in this area, including Broca's area in humans.)

In generating one's own actions, such as grasping a coffee cup, expected perceptual sequences in terms of the relative position, orientation, and posture of one's own hand with respect to the cup are predicted by receiving inputs from the mirror neuron activation that represents the intentional state for this action. Different actions can be generated by receiving inputs of different mirror neuron activation patterns, whereby the mirror neurons function as a switch among a set of intentional actions. Recognition of the same action performed by others can be achieved by utilizing the mismatch information described previously. In the case of observing others grasp the coffee cup, the corresponding intentional state in terms of the mirror neuron activity pattern can be searched for such that the perceptual sequence reconstructed from this intentional state best fits the actually perceived one in the coordinate system relative to the coffee cup, thereby minimizing the mismatch error. On this model, the recognition of others' actions causes one to feel as if one's own actions were being generated, due to the generation in the mirror neurons of motor imagery representing the same intentional state.
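To make this search-by-prediction concrete, here is a minimal sketch, not the author's model: a toy forward model maps a two-dimensional "intention" vector to a predicted perceptual sequence, and recognizing an observed action amounts to adjusting the intention until the mismatch error is minimized. All names, dimensions, and the gradient-descent search are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 2))  # toy forward-model weights (assumed, fixed)

def predict(intention):
    """Map a 2-D intention state to a 10-step perceptual sequence."""
    return np.tanh(W @ intention)

# A sequence "observed" from someone else's action, generated here by the
# same forward model from a hidden intention the recognizer must recover.
observed = predict(np.array([0.8, -0.3]))

# Recognition: minimize the squared mismatch between predicted and
# observed sequences by gradient descent on the intention state.
intention = np.zeros(2)
for _ in range(2000):
    pred = predict(intention)
    err = pred - observed
    grad = W.T @ (err * (1.0 - pred**2))  # chain rule through tanh
    intention -= 0.05 * grad

print(intention)  # typically settles near [0.8, -0.3], the same state
                  # that would generate the action oneself
```

The point of the sketch is only the direction of inference: the observer settles into the intentional state that would have produced the observed percept, echoing the mirror-neuron account.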

This assumption accords exactly with what Gallese and Goldman (1998) suggested for mirror neurons in terms of simulation theory, as described previously. They suggested that mirror neuron discharge serves the purpose of retrodicting target mental states, moving backward from the observed action, thus representing a primitive version of a simulation heuristic that might underlie "mind-reading." We will come back to the idea of the predictive coding model for mirror neurons in greater detail as we turn to related robotics experiments in later chapters.


4.3. How Can Intention Arise Spontaneously and Become an Object of Conscious Awareness?

In this chapter so far, we have seen that voluntary actions might be generated by means of a top-down drive by an intention. The intention could be hosted by the mirror neurons or by other neurons in the prefrontal cortex. Wherever it is represented in the brain, we are left with the essential question of how an intention itself can be set or generated. This question is related to the problem of free will that was introduced in the description of William James's philosophy (see chapter 3). As he says, free will might be the capability of an agent to choose a course of action freely from among multiple alternatives.

The problem about free will then concerns its origin. If every aspect of free will can be explained by deterministic physical laws, there would actually be no room remaining for free will. Can our minds set intentions for actions absolutely freely, without any other causes? Can intentions shift spontaneously from one to another in a chain, generating various actions? Another interesting question concerns consciousness. If we can freely determine our actions, how can this determination be accompanied by consciousness? Or more simply, how can I feel consciously that I have just determined to do one thing and not another? Although there have been no definitive answers to this philosophical question thus far, there have been some interesting experimental results showing possible neural correlates of intention and free will.

4.3.1 Searching for the Neural Correlates of Intention

I would like to introduce, first, the seminal study on conscious intention conducted by Benjamin Libet. In his experiments (Libet, 1985), subjects were asked to press a button with their right hands at whatever moment they wished while their EEG activity was recorded from the scalp. Libet was trying to measure the exact timing at which the subjects became conscious of their decision to initiate the button press, which he called the "w-judgment" time. The subjects were asked to watch a rotating clock hand and to remember its exact position when they first felt the urge to move their hand to press the button. By asking the subjects to report this position after each button press trial, the exact timing of their conscious intention to act could be measured for each trial.


It was found that the average timing of the conscious intent to act was 206 ms before the onset of muscle activity, and that the buildup of the readiness potential (RP) in brain activity (as measured by EEG) started about 1 s before movement onset (Figure 4.9).

This EEG activity was localized in the SMA. This is a somewhat surprising result because it implies that the voluntary action of pressing the button is not initiated by conscious intention but by unconscious brain activity, namely the readiness potential evoked in the SMA. At the very least, it demonstrates that one prepares to act before one decides to act.

It should be noted, however, that Libet's experiment has drawn substantial criticism along with enthusiastic debate about the results. It has been argued that the subjective estimate of the time at which consciousness arises is not reliable (Haggard, 2008). Also, Trevena and Miller (2002) reported that many reported conscious decision times preceded the onset of the lateralized readiness potential, which represents actual preparation for movement, as opposed to the RP, which represents contemplation of movement as a future possibility.

However, it is also true that Libet's study has been replicated by others, and further extended experiments have been conducted (Haggard, 2008). Soon and colleagues (2008) showed that this unconscious brain activity to initiate voluntary action begins long before the onset of physical action.


Figure 4.9. The readiness potential building up in brain activity prior to movement onset, recorded during the free-decision task conducted by Libet (1985).


By utilizing fMRI brain imaging, they demonstrated that brain activity is initiated in the frontopolar part of the prefrontal cortex and in the precuneus in the medial area of the superior parietal cortex up to 7 s before a conscious decision is made to press either the left button with the left index finger or the right button with the right index finger. Moreover, the outcome of the motor decision between the two actions could be predicted from this early brain activity, before the subjects reported any conscious awareness of having made a selection.

4.3.2 How to Initiate Intentions and Become Consciously Aware

The experimental evidence provided by Libet and by Soon's group can be integrated into the following hypothesis. Brain activity for selecting a voluntary action is initiated unconsciously in the frontopolar part of the prefrontal cortex or in the precuneus in the parietal cortex from several seconds up to 10 seconds before the onset of the corresponding physical movement; it is then transmitted downstream to the SMA 1 second before the movement, with consciousness of this intention to act arising only a few hundred milliseconds before movement onset. Controversially, this implies that there is no room left for free will, because our conscious intent, which seemingly determines our next actions freely, appears actually to be caused by preceding unconscious brain activity arising long before. If this is indeed true, it raises two fundamental questions. First, can we freely initiate unconscious brain activity in the frontopolar part of the prefrontal cortex or in the parietal cortex? And second, why do we feel conscious intention for voluntary action only at a very late stage of preparing for action, and what is the role of this conscious intention if it does not determine subsequent voluntary actions?

To address the first question, let's assume that the unconscious activity in the beginning might not be caused by anybody or anything, but may appear automatically, by itself, as an aspect of continuously changing brain dynamics. This notion relates to the "spontaneous generation of alternative images and thoughts" put forward by William James. As described previously (see Figure 3.4), when memory hosts complex relations or connections between images of past experiences, an image may be regenerated with spontaneous variations into streams of consciousness.


This idea of James leads to the conjecture that continuous transitions of images are generated spontaneously along trajectories of brain activation states, visiting first one image state and then another iteratively.

Such spontaneous transitions can be accounted for by observations of autonomous dynamic shifts in the firing patterns of collective neurons in the absence of external stimulus inputs. Using an advanced optical imaging technique, Ikegaya and colleagues (2004) observed the activities of a large number of neurons in in vitro hippocampus tissue of rats. Their main finding concerns what the authors metaphorically call a "cortical song," wherein various spatiotemporally distributed firing patterns of collective neurons appear as "motifs" and shift from one to another spontaneously. Although these motifs seem to appear randomly in many cases, they often repeat in sequences exhibiting some regularity. From other work done by Churchland and colleagues (2010), we now also know how fluctuations in the activities of collective neurons in the PMC during the preparation of movements can affect the generation of succeeding actual movements. They recorded the simultaneous activities of 96 PMC cells in monkeys during the preparatory period of a go-cue-triggered visual target reaching task2 over many trials. First, they found that the trajectories of the collective neural activities could be projected from the 96 original dimensions onto two axes by a mathematical analysis similar to principal component analysis. They also found that these trajectories, from the go cue until the onset of movement, were mostly repeated across different trials in normal cue response cases (Figure 4.10).
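The kind of projection used in that analysis can be sketched in a few lines (a toy illustration with synthetic data standing in for the 96 recorded cells; the actual study used more elaborate, trial-structured preprocessing):

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 200, 96                                       # time steps, neurons
latent = np.cumsum(rng.normal(size=(T, 2)), axis=0)  # hidden 2-D trajectory
activity = latent @ rng.normal(size=(2, N)) + 0.1 * rng.normal(size=(T, N))

# PCA via singular value decomposition of the mean-centered data:
centered = activity - activity.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
projection = centered @ vt[:2].T   # T x 2 trajectory in the top-2 plane

print(projection.shape)            # (200, 2)
```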

An exception to the preceding schema was observed during preparatory periods leading to the generation of failure behaviors such as abnormally delayed responses. In such cases, the neural activation trajectories fluctuated significantly. Such fluctuating trajectories appeared even though the setting of each trial was identical. How, then, can such fluctuating activities of collective neurons occur? Freeman (2000) and many others have speculated that such spontaneous fluctuation might be generated by means of deterministic chaos developed in the neural activity, either at the local neuronal circuit level or at larger cortical area levels.

2. The animals are trained to reach to a position specified visually in advance, immediately after a go cue.


These possibilities are explored in Chapter 10. To sum up, then, continuous change in the cortical dynamical state might account for the spontaneous generation, without any external causes, of various intentions or images for next actions.

The second question, concerning why we become conscious of the intention for voluntary action only at a very late stage of preparation for action, remains difficult to answer at present. However, several reports on cortical electrical stimulation in human subjects might open a way to an answer. Desmurget and colleagues (2009) offer two complementary pieces of evidence obtained in their cortical electrical stimulation study conducted in patients with brain tumors. The study employed perioperative brain stimulation with a bipolar electrode during awake surgery for tumor removal. Stimulation of the premotor cortex evoked overt mouth and contralateral limb movements. But what was interesting was that, in the absence of visual feedback, the patients firmly denied making the movements they actually made; they were not consciously aware of the movements generated. Conversely, stimulation of the parietal cortex created an intention or desire in the patients to move. With stronger stimulation, they reported that they had moved their limbs even though they had not actually moved them. Given this result, Desmurget and colleagues speculated that the parietal cortex might mediate error monitoring between the predicted perceptual outcome of the intended action and the actual one. (These results also imply that the depotentiation of the parietal cortex without an error signal signifies successful execution of the intended action.)


Figure 4.10. Overlay of 15 trajectories obtained by two-dimensional projection of the activities of 96 neurons in the dorsal premotor cortex of a monkey during repeated trials of reaching for a visual target, (a) on one occasion and (b) on a different occasion. In both plots, trajectories for failure cases are shown with thick lines. Adapted from Churchland et al. (2010) with permission.



Fried and colleagues (1991) reported the results of direct stimulation of the presupplementary motor area in patients as part of neurosurgical evaluation. Stimulation at a low current elicited the urge to move a specific body part contralateral to the stimulated hemisphere. This urge to move the limbs is similar to a compulsive desire, and in fact the patients reported that they felt as if they were not the agents of the generated movements. In other words, this is a feeling of imminence for movements of specific body parts in specific ways. Indeed, the patients could describe precisely the urges evoked; for example, that the left arm was about to move inward toward the body. This imminent intention for quite specific movements under stimulation of the presupplementary motor area contrasts with the case of parietal stimulation mentioned earlier, in which the patients felt a relatively weak desire or intention to move. Another difference between the two studies is that more intense stimulation tended to produce actual movement of the same body part when the presupplementary motor area, but not the parietal cortex, was stimulated.

Putting all of this evidence together, we can create a hypothesis for how conscious intention to initiate actions is organized in the brain, as follows. The intention for action is built up from a vague intention to a concrete one by moving downward through the cortical hierarchy. In the first stage (several seconds before movement onset), the very early form of the intention is initiated by means of spontaneous neuronal state transitions in the prefrontal cortex, possibly in the frontopolar part, as described by Soon and colleagues. At this stage, the intention generated might be too vague for its contents to be accessed, and therefore it would not be consciously accessible (beyond a general mood of anticipation, to recall Heidegger once again). Subsequently, the signal carrying this early form of intention is propagated to the parietal cortex, where a prediction of perceptual sequences based on this intention is generated. This idea follows the aforementioned assumption about the functions of the parietal cortex shown in Figure 4.6. By generating a prediction of the overall profile of the action in terms of its accompanying perceptual sequence, the contents of the current intention become consciously accessible. Then, the next target position for movement predicted by the parietal cortex, in terms of body posture or proprioceptive state, is sent to the presupplementary motor area, where a specific motor program for the required immediate movement is generated online.


This process generates the feeling of imminence for movements of specific body parts in specific ways, as described by Fried and colleagues. The motor program is then sent to the premotor cortex and the primary motor cortex to generate corresponding motor commands. This last process is assumed to be essentially unconscious on the basis of the findings of Desmurget and colleagues (2009) mentioned earlier.

A big "however" needs to follow this hypothesis, because it contains some unclear parts. First, this hypothesis conflicts on a number of points with evidence described thus far in this book; the details of these conflicts are examined in the next section. Second, it has not yet been clarified how the contents of the current intention become consciously accessible in the parietal cortex in the process of predicting the resultant perceptual sequences. Related to this problem, David Chalmers speculates that it is nontrivial to account for the quality of human experiences of consciousness in terms of neuroscience data alone. This is what he calls the hard problem of consciousness (Chalmers, 1995). This hard problem is contrasted with the so-called easy problem, in which a target neural function can be understood by its reduction to processes of physical matter. Suppose that a set of neurons that fire only at conscious moments were successfully identified in subjects. There would still be no way to explain how the firings of these neurons result in the conscious experiences of the subjects. This is the "hard problem."

Analogously, how can we account for causal relationships between consciousness of one's own actions and neural activity in the parietal cortex? This problem will be revisited repeatedly in later chapters, as it is central to this book. Next, however, we must look at some of the remaining open problems.

4.4. Deciding Among Conflicting Evidence

Let's remind ourselves of the functional role of the presupplementary motor area described by Tanji and Shima (Tanji & Shima, 1994; Shima & Tanji, 1998, 2000). Their electrophysiological experiments with monkeys showed that this area includes neurons responsible for organizing sequences of primitive movements. However, their findings conflict with those of Fried and colleagues (1991), obtained by human brain electrical stimulation.


These researchers claim that the presupplementary motor area is responsible for generating merely the urge for imminent movements, not the expectation or desire for whole actions consisting of sequences of elemental movements. If Tanji and Shima's findings on the role of the presupplementary motor area in monkeys hold true for humans, then electrical stimulation of the presupplementary area in humans should likewise evoke desire or expectation for sequences of elementary movements. We'll come back to the possible role of the presupplementary motor area in human cognition in a moment.

Another conflict concerns the functional role of the premotor cortex. Although the premotor cortex (F5 in monkeys) should host intentions or goals for the next actions to be generated, according to the mirror neuron theory put forward by Rizzolatti's group (Rizzolatti et al., 1996), later experiments by Desmurget and Sirigu (Sirigu et al., 2003; Desmurget et al., 2009) suggest that it may not be the premotor cortex that is involved in conscious intention for action but the parietal cortex, as described in the previous section. In fact, Rizzolatti and colleagues (Fogassi et al., 2005) did later find mirror neurons in the parietal cortex of monkeys. These mirror neurons in the parietal cortex seem to encode the intention for sequences of actions, both for generating one's own action sequences and while observing similar action generation by others. We may ask, then, whether some neurons, not just those in the premotor cortex but also those in the parietal cortex, fire as mirror neurons when generating as well as recognizing single actions like grasping food objects, as described in the original mirror neuron paper (Rizzolatti et al., 1996). The puzzle we have here is the following. What is the primary area for generating voluntary actions? Is the presupplementary motor area to be considered the locus for generating voluntary action? Or is it the premotor cortex, the original mirror neuron site? Or is it the parietal cortex, responsible for the prediction of action-related perceptual sequences? Or, ultimately, is it the prefrontal cortex, as the center for executive control? Although it could be the supplementary motor cortex, the premotor cortex, or the parietal cortex, we simply cannot tell right now, as the evidence currently available to us is apparently contradictory.

Finally, we might be disappointed that circuit-level mechanisms for the cognitive functions of interest are still not accounted for exactly by current brain research. Neuroscientists have taken a reductionist approach by pursuing possible neural correlates of all manner of things.


They have investigated mappings between neuronal activities in specific local brain areas and their possible functions, like the firing of presupplementary motor area cells in action sequencing, or of mirror neurons in the premotor cortex in action generation and recognition, with the hope of clarifying some mechanisms at work in the mind and cognition. Although clearly the accumulation of such evidence serves to inspire us to imagine how the mind may arise from activity in the brain, such evidence cannot yet tell us the exact mechanisms underlying different types of subjective experience, at least not in a fine-grained way adequate to confirm one-to-one correlative mappings from the "what it feels like" to specific physical processes. How can the firings of specific cells in the presupplementary motor area mechanize the generation of corresponding action sequences? How can the firings of the same premotor cells, as mirror neurons, mechanize both the generation of specific actions and the recognition of the same actions performed by others? What are the underlying circuit-level mechanisms accounting for both, as well as for the feeling of witnessing either? To answer questions like these, we may need future technical breakthroughs in measurement methods, such as the simultaneous recording of large numbers of neurons and their synaptic connectivity in target functional circuits, combined with modeling schemes of good quality.

4.5. Summary

This chapter explored how cognitive minds can be mechanized in biological brains by reviewing a set of empirical results. First, we reviewed general understandings of possible hierarchical architectures in visual recognition and motor action generation. In the visual pathway, earlier stages of the visual system (in the primary visual cortex) are thought to deal with the processing of detailed information in the retinotopic image, and later stages (in the inferior temporal cortex) with more abstract information processing. Thus, some have assumed that complex visual objects can be recognized by decomposition into specific spatial combinations of visual features represented at the lower level. The action generation pathway is also presumed to follow hierarchical processes. It is assumed that the supplementary motor area (SMA) and the premotor cortex (PMC) perform higher level coordination for generating voluntary action and sensory-guided action by sending control signals to the primary motor cortex (M1) at the lower level.


However, some conflicting evidence has arisen that does not support the existence of a rigid hierarchy in either visual recognition or action generation. So, we next examined a new way of conceiving of the processes at work, in which action generation and sensory recognition are inseparable. We found evidence for this new approach in a review of recent experimental studies focusing on the functional roles of the parietal cortex and of mirror neurons distributed through different regions of the brain. We entertained the hypothesis that the parietal cortex may host a predictive model that can anticipate the perceptual outcomes of the actional intention encoded in mirror neurons. It was also speculated that a particular perceptual sequence can be recognized by inferring the corresponding intention state, from which the predictive model can regenerate this sequence. A hallmark of this view is that action might be generated by the dense interaction of top-down proactive intention and bottom-up recognition of perceptual reality. Furthermore, we showed how this portrait is analogous to Merleau-Ponty's philosophy of embodiment.

An essential question remained: How is intention itself set or generated? This question is related to the problem of free will. We reviewed findings that neural activities correlated with free decisions are initiated in various regions, including the SMA, the prefrontal cortex, and the parietal cortex, significantly before individuals become consciously aware of the decision. These findings raise two questions. The first concerns how "unconscious" neural activities for decisions are initiated in those regions. The second concerns why conscious awareness of free decisions is delayed. Although we have provided some possible accounts to address these questions, they remain speculative.

Also in this chapter, we have found that neuroscientists have taken a reductionist approach by pursuing possible neural correlates of all manner of things. They have investigated mappings between neuronal activities in specific local brain areas and their possible functions. Although the accumulation of such evidence can serve to inspire us to hypothesize how the normally functioning brain results in the feeling of being conscious, neurological evidence alone cannot yet specify the mechanisms at work. And with this, we have seen that not one, but many important questions about the nature of the mind remain to be answered.

How might we see neural correlates for our conscious experience? Suppose that, in the future, we were able to record all essential neuronal data, such as the connectivity, synaptic transmission efficiency, and neuronal firings of all related local circuits.


Would this enable us to understand the mechanisms behind all of our phenomenological experiences? Probably not. Although we would find various interesting correlations in such massive datasets, like correlations between synaptic connectivity and neuronal firing patterns, or between neuronal firing patterns and behavioral outcomes, they would still be just correlations, not proof of causal mechanisms. Can we understand the mechanisms of a computer's operating system (OS) just by putting electrodes at various locations on the motherboard circuits? We might obtain a bunch of correlated voltage data, but probably not enough to infer the principles behind the workings of a sophisticated OS.

Taking seriously the limitations inherent to the empirical neuroscience approach, this book now begins to explore an alternative: a synthetic modeling approach that attempts to understand possible neuronal mechanisms underlying our cognitive brains by reconstructing them as dynamic artifacts. The synthetic modeling approach described in this book has two complementary focuses. The first is to use dynamical systems perspectives to understand various complicated mechanisms at work in cognition. The dynamical systems approach is effective in articulating circular causality, for instance. The second focus concerns the embodiment of cognitive processes, which was briefly described in the previous chapter. The role of embodiment in shaping cognition is crucial when causal links go beyond brains and establish circular causalities between bodies and their environments (e.g., Freeman, 2000). The next chapter provides an introductory account that considers such problems.


5

Dynamical Systems Approach for Modeling Embodied Cognition

Nobel laureate in physics Richard Feynman once wrote on the chalkboard during a lecture:

What I cannot create, I cannot understand. — Richard Feynman1

Conversely, thus: I can understand what I can create. This seems to make sense because if we can synthesize something, we should know its organizing principles. By this line of reasoning, then, we might be able to understand the cognitive mind by synthesizing it.

But how can we synthesize the mind? Basically, the plan is to put computer simulation models of the brain into robot heads and then examine how the robots behave, as well as how the neural activation states change dynamically in the artificial brains, while the robots interact with the environment. The clear difficulty involved in doing this is how to build these brain models. Although we don't yet know their organizing principles exactly, we should begin by deriving the most likely ones through a thorough survey of results from neuroscience, psychology, and cognitive science.

1. This statement was found on his blackboard at the time of his death in February 1988.


In robotics experiments, we can examine the neural activation dynamics (of a brain model) and the behaviors (of such embrained robots) as the robots attempt to achieve the goals of cognitive tasks designed by experimenters.

It is not trivial to anticipate, dare we say guess, what sorts of phenomena might be observed in such experiments, even when the principles used in engineering the relevant brain models are well defined. This is because all interactions that occur within the model brains, as well as between them and the environment through circular causality, are dominated by nonlinear dynamics for which solutions cannot be obtained analytically. Rather, we should expect that such robotics experiments might evidence nontrivial phenomena that cannot be inferred from the formative principles themselves. If such emergent phenomena observed in experiments correspond to various bodies of work, including empirical observations in neuroscience, computational accounts in cognitive science, and reports from phenomenological reduction, the presumed principles behind the models would seem to hold. Moreover, it would be great if just a small set of principles in the model could account for numerous phenomena of the mind through their synthesis. This is the goal of the synthetic approach: to articulate the processes essential to cognition as we experience it, and ideally nothing more.

Now, let's assume that the mind is a product of emergent processes appearing in the structural interactions between the brain and the environment, by means of the sensory-motor coupling of a whole, embodied agent through behavior, wherein the mind is considered a nontrivial phenomenon appearing as a result of such interactions. This assumption refers to the embodied mind, or embodied cognition (Varela et al., 1991). Many phenomena emergent from embodied cognition can be efficiently described in the language of dynamical systems, as we will see. Subsections of the current chapter explore the idea of embodied cognition by visiting different approaches taken so far. These include psychological studies focusing on embodiment and "new-trend" artificial intelligence robotics studies exemplifying behavior-based robotics involving the synthesis of embodied cognition. Readers will see that some psychological views, especially Gibsonian and Neo-Gibsonian approaches, have been well incorporated into dynamical systems theories, and have thus provided useful insights guiding behavior-based robots and neurorobots. After this review, we will consider particular neural network models as abstractions of brains, and then consider a set of neurorobotics studies using those models that demonstrate emergence through synthesis, capturing some of the essence of embodied cognition.


First, however, the next section presents an introduction to dynamical systems theories that lay the groundwork for the synthetic modeling studies to follow.

But readers should note that this is not the end of the story: Chapter 6 discusses some of the crucial ingredients for synthesizing the "mind" that have been missed in conventional studies on neural network modeling and behavior-based robotics. The first section provides an introductory tutorial on the general ideas of dynamical systems.

5.1. Dynamical Systems

Here, I would like to start with a very intuitive explanation. Let's assume that there is a dynamical system, and suppose that this system can be described at any time as exhibiting an $N$-dimensional system state, where the $i$th dimensional value of the current state is given as $x_t^i$. When $x_{t+1}^i$, the $i$th dimensional value of the state at the next time step, can be determined solely from all of the dimensional values at the current time step, the time development of the system can be described by the following difference equation (also called a "map"):

$$
\begin{aligned}
x_{t+1}^1 &= g^1(x_t^1, x_t^2, \ldots, x_t^N) \\
x_{t+1}^2 &= g^2(x_t^1, x_t^2, \ldots, x_t^N) \\
&\;\;\vdots \\
x_{t+1}^N &= g^N(x_t^1, x_t^2, \ldots, x_t^N)
\end{aligned}
\qquad \text{(Eq. 1)}
$$

Here, the time development of the system state is obtained by iterating the mapping of the current state at $t$ to the next state at $t+1$, starting from a given initial state. Eq. 1 can be rewritten with the $N$-dimensional state vector $X_t$, and with $P$ as a set of parameters of interest that characterize the function $G(\cdot)$:

$$X_{t+1} = G(X_t, P) \qquad \text{(Eq. 2)}$$

A given dynamical system is often investigated by examining changes in its time-development trajectories versus changes in the representative parameter set $P$.


If the function $G(\cdot)$ in Eq. 2 is nonlinear, the trajectories of time development can become complex depending on the nonlinearity. In most cases, the time development of the state cannot be obtained analytically; it can be obtained only through numerical computation, as integration over time from a given initial state $X_0$, and in practice this computation can be executed only with modern digital computers.

Dynamical systems can also be described by an ordinary differential equation in continuous time, with $X$ as the vector of the system state, $\dot{X}$ as the vector of the time derivative of the state (it can also be written as $\partial X / \partial t$), and $F(\cdot)$ as a nonlinear dynamic function parameterized by $P$, as shown in Eq. 3:

$$\dot{X} = F(X, P) \qquad \text{(Eq. 3)}$$

The exact trajectory in continuous time can likewise be obtained by integrating the time derivative from a given dynamical state at the initial time.
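A minimal forward-Euler sketch shows how such a trajectory is computed numerically (the function names and step size here are my own, purely illustrative choices):

```python
import numpy as np

def euler_trajectory(F, x0, dt=0.01, steps=1000):
    """Integrate dX/dt = F(X) forward in time with the Euler method."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(xs[-1] + dt * F(xs[-1]))
    return np.array(xs)

# Example: a damped linear system dX/dt = -0.5 X decaying to a fixed point.
traj = euler_trajectory(lambda x: -0.5 * x, x0=[1.0, -2.0])
print(traj[-1])  # close to [0, 0], the system's fixed-point attractor
```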

The structure of a particular dynamical system is characterized by the configuration of attractors in the system, which determines the time evolution profiles of different states. Attractors are sets toward which trajectories of dynamical states converge (the region of initial states converging to a given attractor is called its basin of attraction). An attractor is called an invariant set because, after trajectories converge (perhaps after infinite time), they become invariant trajectories. That is, they are no longer variable and are instead determined, representing the stable state behaviors characterizing the system. Outside of attractors or invariant sets, on the other hand, are transient states wherein trajectories are variable. Attractors can be roughly categorized into four types, as shown in Figure 5.1a-d.

The easiest attractor to envision is a fixed-point attractor, in which all dynamic states converge to a point (Figure 5.1a). The second is a limit cycle attractor (Figure 5.1b), in which the trajectory converges to a cyclic oscillation pattern with constant periodicity. The third is a limit torus, which appears when more than one frequency is involved in the periodic trajectory of the system and two of these frequencies form an irrational ratio; in this case, the trajectory is no longer closed and exhibits quasi-periodicity (Figure 5.1c). The fourth is a chaotic attractor (a "strange attractor"), in which the trajectory exhibits infinite periodicity and thereby forms fractal structures (Figure 5.1d). Finally, in some cases multiple local attractors can coexist in the same state space, as illustrated in Figure 5.1e. In such cases, the attractor to which the system converges depends on the initial state.


In Figure 5.1e, a state trajectory starting from the left side of the dotted curve will converge to the fixed point, and one starting from the right side to the limit cycle. Next, we look at the case of discrete-time dynamics in detail.

5.1.1 Discrete-Time Systems

Let us examine the so-called logistic map, introduced by Robert May (1976), as a simple illustrative example of Eq. 1 with a one-dimensional dynamic state. Even with a one-dimensional state, its behavior is nontrivial, as will be seen in the following. The logistic map is written in discrete-time form as:

$$x_{t+1} = a x_t (1 - x_t) \qquad \text{(Eq. 4)}$$

Here, $x_t$ is a one-dimensional dynamic state and $a$ is a parameter. If a particular value is taken for the initial state $x_0$, the map recursively generates a trajectory $x_1, x_2, \ldots, x_n$, as shown in the diagram at the left of Figure 5.2a.
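A few lines of code make the iteration concrete (a minimal sketch; the function name and sample values are mine):

```python
def logistic_map(a, x0, steps):
    """Iterate x_{t+1} = a * x_t * (1 - x_t) and return the trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1] * (1.0 - xs[-1]))
    return xs

# With a = 2.6 the trajectory settles onto the fixed point near 0.61.
print(logistic_map(2.6, 0.3, 50)[-5:])
```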


Figure 5.1. Different types of attractors. (a) Fixed-point attractor, (b) limit cycle attractor, (c) limit torus characterized by two periodicities P1 and P2 that form an irrational ratio, and (d) chaotic attractor. (e) Multiple attractors consisting of a fixed-point attractor and a limit cycle attractor. Note that all four types of attractors are illustrated in terms of continuous-time dynamical systems.


Now, let's examine how the dynamical structure of the logistic map changes when the parameter a is varied continuously. For this purpose, a bifurcation diagram of the logistic map is shown in Figure 5.2a, right. This diagram shows the invariant set of the attractor for each value of a, where an invariant set means a set of points within the convergent trajectory, as mentioned previously. For example, when a is set to 2.6, the trajectory of $x_t$ converges toward a point around 0.61 from any initial state, and therefore this point is a fixed-point attractor (see Figure 5.2b, left). When a is increased to 3.0, the fixed-point attractor bifurcates into a limit cycle attractor with a period of 2. With a set to 3.2, a limit cycle alternating between 0.52 and 0.80 appears (see Figure 5.2b, middle).


Figure 5.2. A logistic map. (a) The dynamic iteration corresponding to the logistic map is shown on the left, and its bifurcation diagram with respect to the parameter a on the right. (b) Time developments of the state for different values of a, where a fixed-point attractor, a limit cycle attractor, and a chaotic attractor appear from left to right for a = 2.6, 3.2, and 3.6, respectively.


When a is further increased to 3.43, the limit cycle with a period of 2 bifurcates into one with a period of 4. A limit cycle alternating sequentially among 0.38, 0.82, 0.51, and 0.88 appears when a is set to 3.5, whereas when a is increased to 3.60, a further bifurcation takes place from a limit cycle to a chaotic attractor characterized by an invariant set with an infinite number of points (see Figure 5.2b, right). The time evolutions of x starting from different initial states are plotted for these values of a, where it is clear that the transient dynamics of the trajectory of x converge toward the fixed-point, limit cycle, and chaotic attractors. It should be noted that no periodicity is seen in the case of chaos.
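The bifurcation diagram itself is straightforward to reproduce numerically (a sketch; the transient length and sample counts are my own choices):

```python
import numpy as np
import matplotlib.pyplot as plt

# For each a: discard a transient, then record the visited states; the
# recorded points approximate the invariant set (the attractor) at that a.
for a in np.linspace(2.4, 4.0, 800):
    x = 0.3
    for _ in range(500):              # let the transient die out
        x = a * x * (1.0 - x)
    attractor = []
    for _ in range(100):              # sample the attractor
        x = a * x * (1.0 - x)
        attractor.append(x)
    plt.plot([a] * len(attractor), attractor, ",k")

plt.xlabel("a")
plt.ylabel("x")
plt.show()
```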

We’ll turn now to look briefly at a number of characteristics of chaos. One of the essential characteristics of chaos is its sensitivity with respect to initial conditions. In chaos, when two trajectories are generated from two initial states separated by a negligibly small distance in phase space, the distance between these two trajectories increases exponentially as iterations progress. Figure 5.3a shows an example of such development.
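The divergence is easy to demonstrate (a sketch; the initial offset of 1e-10 is my own choice, with a = 3.6 as in Figure 5.3a):

```python
# Two logistic-map trajectories started a tiny distance apart.
a, x, y = 3.6, 0.4, 0.4 + 1e-10
for t in range(1, 201):
    x = a * x * (1.0 - x)
    y = a * y * (1.0 - y)
    if t % 40 == 0:
        print(t, abs(x - y))
# The separation grows roughly exponentially until it saturates at the
# scale of the attractor itself, after which the trajectories are unrelated.
```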

This sensitivity to initial conditions underlies the ability of chaos to generate nonrepeatable behaviors even when only a negligibly small perturbation is applied to the initial conditions. This peculiarity of chaos can be explained by the process of stretching and folding in phase space, as illustrated in Figure 5.3b. If a is set to 4.0, the logistic map generates chaos that covers the range of x from 0.0 to 1.0, as can be seen in Figure 5.2a. In this case, the range of values of $x_0$ between 0.0 and 0.5 is mapped, with magnification, to $x_1$ values between 0.0 and 1.0, whereas $x_0$ values between 0.5 and 1.0 are mapped to $x_1$ values between 1.0 and 0.0 (again with magnification, but in the opposite direction), as can be seen in Figure 5.3b. This essentially represents the process of stretching and folding in a single mapping step of the logistic map. Two adjacent initial states, denoted by a dot and a cross, are mapped to two points slightly further apart from each other after the first mapping. When this mapping is repeated n times, the distance between the two states increases exponentially, resulting in the complex geometry generated for $x_n$ by means of iterated stretching and folding. This iterated stretching and folding is considered to be a general mechanism for generating chaos.

Further, consider an interesting relation between chaotic dynamics and symbolic processes. If we observe the output sequence of the logistic map and label it with two symbols, "H" for values greater than 0.5 and "L" for those less than or equal to 0.5, we get probabilistic sequences of alternating "H" and "L." When the parameter a is set to 4.0, it is known that the logistic map generates "H" or "L" with equal probability and with no memory, like a coin flip.


that the logistic map generates "H" or "L" with equal probability and with no memory, like a coin flip. This can be represented by a one-state probabilistic finite state machine (FSM) with an equal probability of outputting "H" and "L" from this single state. If the parameter a is changed to a different value in the chaotic region, a different form of probabilistic FSM, with a different number of discrete states and different probability assignments for the output labels, is reconstructed for each. This is called symbolic dynamics (Crutchfield & Young, 1989; Devaney, 1989), which provides a theorem connecting real-valued dynamical systems and discrete symbol systems.
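The coin-flip statistics are easy to check numerically; this hedged sketch (the seed and run length are arbitrary choices of mine) labels a long orbit at a = 4.0:

```python
# A minimal sketch of the symbolic labeling just described: at a = 4.0,
# reading the logistic map through the threshold x > 0.5 should look
# like a memoryless fair coin.
a, x = 4.0, 0.3
counts = {"H": 0, "L": 0}
for _ in range(100000):
    x = a * x * (1.0 - x)
    counts["H" if x > 0.5 else "L"] += 1
print(counts)   # both counts come out close to 50,000
```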


Figure 5.3. Initial sensitivity of chaotic mechanisms. (a) Distance between two trajectories (represented by solid and dashed lines) starting from initial states separated by a distance of ϵ in phase space. The distance between the two grows exponentially over time in chaos generated by a logistic map with a set to 3.6. (b) The mechanism generating chaos: stretching and folding.


One interesting observation of logistic maps in terms of symbolic dynamics is that the complexity of the symbolic dynamics, measured by the number of states in the reconstructed probabilistic FSM, can become infinite, especially in the parameter region at the onset of chaos and at the ends of window regions, where the periodicity of the attractor moves from finite to infinite (Crutchfield & Young, 1989). It is known that nonlinear dynamical systems in general develop critical behaviors, exhibiting state trajectories of infinite complexity at the "edge of chaos," including at the ends of window parameter regions, where quite rich dynamic patterns following a power law can be observed. The edge of chaos can also be observed under another critical condition, when "tangency" exists in the mapping function, as shown in Figure 5.4.

When the curve of the mapping function becomes tangent to the line of identity mapping, passing through the vicinity of the tangent point can take anywhere from several to an infinite number of steps, depending on the value of x at which the trajectory enters the passage. This generates the phenomenon known as intermittent chaos, in which the passing through appears only intermittently, sometimes after several steps and sometimes after an arbitrarily long time. These properties of the edge of chaos under critical conditions are revisited in later chapters as we examine the behavioral characteristics of the neurorobots observed in our experiments.
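One way to see this numerically, under assumptions of my own (the book gives no such experiment), is to run the logistic map just below its period-3 window, which opens at a = 1 + √8, and count how often the third-iterate map barely moves the state:

```python
# A hedged illustration of tangency and intermittency: just below the
# period-3 window, the third-iterate map is nearly tangent to the
# identity line, so trajectories crawl through a narrow channel in long
# laminar stretches separated by chaotic bursts.
import math

a = 1.0 + math.sqrt(8.0) - 1e-4     # slightly below the tangency
x, laminar, total = 0.3, 0, 5000
for _ in range(total):
    x3 = x
    for _ in range(3):              # one step of the third-iterate map
        x3 = a * x3 * (1.0 - x3)
    if abs(x3 - x) < 1e-3:          # the state barely moves: laminar
        laminar += 1
    x = a * x * (1.0 - x)
print(f"fraction of steps spent in laminar phases: {laminar / total:.2f}")
```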


Figure 5.4. Tangency in nonlinear mapping. The passage of the state x slows down in the vicinity of the tangency point.


5.1.2 Continuous-Time Systems

Next, let’s examine the case of continuous time, represented by Eq. 5. We’ll take the Rössler system (Rössler, 1976) as a simple example that can be described by the following set of ordinary differential equations:

$\dot{x} = -y - z$

$\dot{y} = x + ay$    (Eq. 5)

$\dot{z} = b + z(x - c)$

This continuous-time nonlinear dynamical system is defined by a three-dimensional state (x, y, and z), three parameters (a, b, and c), and no inputs. If we conduct a phase space analysis on this system, we can see different dynamical structures appearing for different parameter settings (of a, b, and c). As shown in Figure 5.5, continuous trajectories of the dynamical state projected in the two-dimensional space (x, y) converge toward three different types of attractors (fixed point, limit cycle, or chaotic) depending on the values of the parameters. It should be noted that in each case the trajectory converges to the same attractor regardless of the initial state. Such an attractor is called a global attractor, and the chaotic attractor shown in (c) is the Rössler attractor. The phenomena corresponding to these changes in the dynamical structure caused by parameter bifurcation are quite similar to those observed in the case of the logistic map. The mechanism of generating chaos with the Rössler attractor can be explained by the process of stretching and folding previously mentioned. In the Rössler attractor, a bundle of trajectories constituting a sheet rotates in a counterclockwise direction, accompanied by a one-time folding and stretching. If we take a section of the sheet, which is known as a Poincaré section (Figure 5.5d), we'll see a line segment consisting of an infinite number of trajectory points. This line segment is folded and stretched once during a single rotation, which is mapped again onto the line segment (see Figure 5.5e). If this process is iterated, the sensitivity of this system to initial conditions becomes apparent in the same way as with the logistic map.
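A rough numerical companion to Figure 5.5 can be sketched as follows (simple Euler integration; the step size and initial state are my choices, not the book's):

```python
# A minimal sketch integrating the Rössler system for the three
# parameter settings of Figure 5.5 and reporting the final state.
def rossler_endpoint(a, b, c, steps=200000, dt=0.005):
    x, y, z = 1.0, 1.0, 0.0
    for _ in range(steps):
        dx = -y - z
        dy = x + a * y
        dz = b + z * (x - c)
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
    return x, y, z

for a, b, c in ((-0.2, 0.2, 5.7), (0.1, 0.1, 4.0), (0.2, 0.2, 5.7)):
    x, y, z = rossler_endpoint(a, b, c)
    print(f"(a, b, c) = ({a}, {b}, {c}) -> state ({x:.2f}, {y:.2f}, {z:.2f})")
# The first setting settles onto a fixed point, the second onto a limit
# cycle, and the third wanders forever on the chaotic Rössler attractor.
```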

5.1.3 Structural Stability

This subsection explains why structural stability is an important characteristic of nonlinear dynamical systems. Importantly, I will argue that one emergent property of nonlinear dynamical systems is the appearance


of a particular attractor configuration for any given dynamical system. A particular equation describing a dynamical system can indicate the direction of change of state at each local point in terms of a vector field. However, the vector field itself cannot tell us what the attractor looks like. The attractor emerges only after a certain number of iterations have been performed, through the transient process of converging toward the attractor. An important point here is that attractors, as trajectories of steady states, cannot exist by themselves in isolation. Rather, they need to be "supported" by transient parts of the vector flow that converge toward these attractors. In other words, the transient parts of the vector flow make attractors stable, as illustrated in Figure 5.6a.


Figure 5.5. Different attractors appearing in the Rössler system. (a) A fixed-point attractor (a = −0.2, b = 0.2, c = 5.7), (b) a limit-cycle attractor (a = 0.1, b = 0.1, c = 4.0), and (c) a chaotic attractor (a = 0.2, b = 0.2, c = 5.7). Illustrations of (d) the Poincaré section and (e) the process of folding and stretching in the Rössler attractor that accounts for the mechanism of generating chaos.


This is the notion behind the structural stability of attractors. To provide a more intuitive explanation of this concept, let's take a counterexample in terms of a system that is not structurally stable. Sometimes I ask students to give me an example of a system that generates oscillation patterns, and a common answer is a sinusoidal function or a harmonic oscillator, such as the frictionless spring-mass system described by Eq. 6.

$m\dot{v} = -kx$

$\dot{x} = v$    (Eq. 6)

Here, x is the one-dimensional position of a mass m, v is its velocity, and k is the spring coefficient. The equation represents a second-order dynamic system without damping terms. A frictionless spring-mass system can indeed generate sinusoidal oscillation patterns. However, such patterns are not structurally stable, because if we apply force to the mass of the oscillator instantaneously, the amplitude of oscillation will change immediately, and the original oscillation pattern will never be recovered automatically (again, it is frictionless). If the vector field is plotted in (x, v) space, we will see that the vector flow describes concentric circles with no convergent flow that would constitute a limit-cycle attractor (see Figure 5.6b). Indeed, a sinusoidal wave is simply the trace over time of one coordinate of a point moving uniformly around a circle.
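The lack of structural stability is easy to demonstrate numerically; in this sketch (mass, spring constant, and kick size are illustrative assumptions, not values from the book), a one-off impulse permanently shifts the oscillator's energy:

```python
# A minimal sketch of why the frictionless spring-mass system of Eq. 6
# is not structurally stable: a one-off impulse shifts the total energy,
# and nothing ever brings it back.
m, k, dt = 1.0, 1.0, 0.001
x, v = 1.0, 0.0                      # energy starts at 0.5*k*x**2 = 0.5
for step in range(40000):
    v += dt * (-k * x) / m           # semi-implicit Euler conserves energy well
    x += dt * v
    if step == 20000:
        v += 0.5                     # instantaneous perturbing impulse
energy = 0.5 * m * v**2 + 0.5 * k * x**2
print(f"energy long after the kick: {energy:.3f} (was 0.500 before it)")
# The orbit settles onto a different concentric circle in (x, v) space
# and stays there; there is no convergent flow back to the original one.
```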

Most rhythmic patterns in biological systems are thought to be generated by limit-cycle attractors because of their potential stability against


Figure 5.6. Vector flow. (a) Appearance of a limit- cycle attractor in a vector field of a particular two- dimensional continuous dynamical system with the system state (x, v) in which the vector flow converges toward a cyclic trajectory. (b) A vector field for a harmonic oscillator in which its flow is not convergent but forms concentric circles.


perturbations. These include central pattern generators in neural circuits for the heartbeat, locomotion, breathing, swimming, and many others, as described briefly in the next section. Such limit-cycle attractor dynamics in real physical systems are generated by nonlinear dynamical systems called dissipative systems. A dissipative system consists of an energy dissipation part and an energy supply part. If the amounts of energy dissipation and energy supply during one cycle of oscillation are balanced, this results in the formation of an attractor of the limit cycle type (or it could also result in the generation of chaos under certain conditions). Energy can be dissipated by damping caused by friction in mechanical systems or by electric resistance in electrical circuits. When a larger or smaller amount of energy is supplied momentarily due to a perturbation from an external source, the state trajectory deviates and becomes transient. However, it returns to the original attractor region by means of automatic compensation, dissipating an appropriate amount of energy corresponding to the input energy.

On the other hand, a harmonic oscillator without a damping term, such as that shown in Eq. 6, is not a dissipative system but an energy-conserving system. There is no damping term to dissipate energy from the system, so once perturbed, its state trajectory will not return to the original one. In short, the structural stability of dynamic patterns, in terms of physical movements or neural activity in biological systems, can be achieved through attractor dynamics by means of a dissipative structure. Further, the particular attractors appearing in different cases are products of the emergent properties of such nonlinear (dissipative) dynamic systems. Indeed, Neo-Gibsonian psychologists have taken advantage of these interesting dynamical properties of dissipative systems to account for the generation of stable but flexible biological movements. The next section explores such concepts by introducing the Gibsonian approach first, followed by Neo-Gibsonian variants and infant developmental psychology from the dynamical systems perspective.
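For contrast with the harmonic oscillator above, the following hedged sketch uses the van der Pol oscillator, a textbook dissipative system that is my own choice of example rather than one from this book, to show a limit cycle absorbing the same kind of kick:

```python
# The van der Pol oscillator supplies energy for |x| < 1 and dissipates
# it for |x| > 1; the two balance on a limit cycle, so a one-off kick is
# automatically forgotten.
mu, dt = 1.0, 0.001
x, v = 2.0, 0.0
peak = 0.0
for step in range(80000):
    v += dt * (mu * (1.0 - x * x) * v - x)
    x += dt * v
    if step == 40000:
        v += 0.5                    # the same kind of one-off impulse
    if step > 70000:
        peak = max(peak, abs(x))    # amplitude long after the kick
print(f"peak |x| after recovery: {peak:.2f} (limit-cycle amplitude is ~2.0)")
```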

5.2. Gibsonian and Neo-Gibsonian Approaches

5.2.1 The Gibsonian Approach

A concept central to this approach, known as affordance, has significantly influenced not only mainstream psychology and philosophy of


the mind, but also synthetic modeling studies, including artificial intelligence and robotics. In the original theory of affordance proposed by J. J. Gibson (1979), affordance was defined as "all possibilities for actions latent in the environment." Put another way, affordance can be understood as the behavioral relations that animals are able to acquire in interaction with their environments. Relationships between actors and objects within these environments afford these agents opportunities to generate adequate behaviors. For example, a chair affords sitting on it, and a door knob affords pulling or pushing a door open or closed, free from the resistance afforded by the door's locking mechanism.

Many of Gibson's considerations focused on the fact that essential information about the environment comes by way of human processing of optical flow. Optical flow is the pattern of motion sensed by the eye of an observer. By considering that optical flow information can be used to perceive one's own motion pattern and to control one's own behavior, Gibson came up with the notion of affordance constancy. He illustrated this concept with the example of a pilot flying toward a target on the ground, adjusting the direction of flight so that the focus of expansion (FOE) in the visual optical flow becomes superimposed on the target (see Figure 5.7a). This account was inspired by his own experience in training pilots to develop better landing skills during World War II.

A similar example, closer to everyday life, is walking along a corridor: by registering the difference from zero between the optical flow vectors along the two sides of the corridor, we can walk down the middle without colliding with the walls (see Figure 5.7b). These examples suggest that for each behavior there is a crucial perceptual variable (in Gibson's two examples, the distance between the FOE and the target, and the vector difference between the optical flows of the two walls) and that body movements are generated to keep these perceptual variables at constant values. If we assume coupled dynamics between the environment and small controllers inside the brain, the role of the controllers is to preserve this perceptual constancy. A simple dynamical systems analysis can show how this constancy may be maintained by assuming the existence of a fixed point attractor, which ensures that the perceptual variables always converge to a constant state.
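A minimal sketch of this idea, with all dynamics invented for illustration, treats the corridor-walking example as a one-dimensional fixed-point attractor on the flow imbalance:

```python
# Perceptual constancy as a fixed-point attractor: a walker senses the
# imbalance d between left- and right-wall optical flow and steers
# against it, so d = 0 (the corridor midline) attracts every start.
def steer(d, gain=0.3):
    return d - gain * d        # each step cancels part of the imbalance

d = 0.8                        # starting well off-center
for _ in range(20):
    d = steer(d)
print(f"flow imbalance after 20 steps: {d:.4f}")   # ~0: the fixed point
```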

Andy Clark, a philosopher in Edinburgh, has been interested in the role of embodiment in generating situated behaviors from the Gibsonian perspective. He analyzed, as an example, how an outfielder positions himself to catch a fly ball (Clark, 1999). In general, this action is thought


to require complicated calculations of variables such as the arc, speed, acceleration, and distance of the ball. However, there is actually a simple strategy for catching it: if the outfielder continues to adjust his movement so that the ball appears to approach in a straight line in his visual field, the ball eventually falls to him. By maintaining this coordination for perceptual constancy, he can catch the fly ball easily. Clark explains that the task is to maintain, by making multiple, ongoing, real-time adjustments to the running motion, a kind of coordination between the inner and the outer. This means that coordination dynamics like this appears naturally under relatively simple principles, such as perceptual constancy, instead of through complicated computation involving representation in an objective, simulated Cartesian coordinate system.

5.2.2 Neo-Gibsonian Approaches

In the 1980s, so-called Neo-Gibsonian psychologists such as Turvey, Kugler, and Kelso started investigating how to achieve the coordination of many degrees of freedom by applying the ideas of dissipative structures from nonlinear dynamics to psychological observations of human and animal behavior (see the seminal book by Scott Kelso, 1995). They considered that the ideas of dissipative structures, especially concerning limit cycle attractor dynamics, can serve as a basic principle in organizing coherent rhythmic movement patterns such as walking, swimming, breathing, and hand waving, as described briefly in the previous section. The important theoretical ingredients of these ideas are entrainment and phase transitions. First, coupled oscillators that initially oscillate with


Figure 5.7. Gibson's notion of optical constancy. (a) Flying while superimposing the focus of expansion on the target heading and (b) walking along a corridor while balancing optical flow vectors against both side walls. Redrawn from Gibson (1979).


different phases and periodicities can, by mutual entrainment under certain conditions, converge to a global synchrony with reduced dimensionality. Second, the characteristics of this global synchrony can be drastically changed by a shift of an order parameter of the dynamic system by means of phase transition.

Let's look at this in more detail by reviewing a representative experimental study conducted by Kelso and colleagues (Schoner & Kelso, 1988). In the experiment, subjects were asked to wiggle the index fingers of their left and right hands in the same direction (different muscles activated; antiphase) in synchrony with a metronome. When the metronome was speeded up gradually, the finger movement pattern suddenly switched from the same-direction pattern to the opposite-direction one (same muscles activated; in-phase): the relative phase was observed to change suddenly from 180 degrees to 0 degrees (see the left-hand panel in Figure 5.8).


Figure 5.8. The phase transition model by Kelso (1995) for explaining the dynamic shifts seen in bimanual finger movements. The panel on the left- hand side shows how oscillation coordination between right and left index fingers changes when the leading frequency is increased. The panel on the right- hand side shows the corresponding change in the energy landscape.


After this experiment, Kelso and colleagues showed by computer simulation that the observed dynamic shift is due to a phase transition from one self-organizing dynamic structure to another, given changes in an order parameter of the system (the speed of the metronome in this example). When a hypothetical energy landscape is computed for the movement patterns along with the order parameter of metronome speed (see the right-hand panel in Figure 5.8), the antiphase pattern is stable, with its energy minimum state, when the metronome speed is low. However, the antiphase pattern becomes unstable as the metronome speed increases (the parameter introduces too much energy into the system), and the behavior is modulated toward a more stable configuration and corresponding energetic minimum, switching the system state suddenly from antiphase to in-phase. Dramatic shifts in dynamic system state, such as those seen in the bimanual finger movement illustration, can thus be explained by means of the phenomenon of phase transition. Indeed, a diverse range of similar shifts in animal and human movement patterns appears to be explained very effectively in terms of phase transitions. Good examples include the dynamic shift from trot to gallop in horse locomotion given a change in the system parameter of running speed, as well as the shift from walk to run in human locomotion. It is common experience that the middle state, a walk-run, is more difficult to maintain (at least without lots of practice) than either of the other behaviors. This accords with a central notion in Neo-Gibsonian approaches: that behaviors are organized not top-down by an explicit central commander, but by implicit synergy among local elements including neurons, muscles, and skeletal mechanics, and that these behaviors represent emergent characteristics of dissipative structures.
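The energy-landscape story can be sketched with the standard Haken-Kelso-Bunz potential for the relative phase; the sampled b/a ratios below are my own choices, not values from the book:

```python
# The HKB potential V(phi) = -a cos(phi) - b cos(2 phi) over the relative
# phase phi; the ratio b/a falls as movement frequency rises.
import math

def V(phi, b_over_a):
    return -math.cos(phi) - b_over_a * math.cos(2.0 * phi)

for b_over_a in (1.0, 0.25, 0.1):   # slow -> fast metronome
    print(f"b/a = {b_over_a:4.2f}: V(180 deg) = {V(math.pi, b_over_a):+.2f}, "
          f"V(0 deg) = {V(0.0, b_over_a):+.2f}")
# For b/a > 0.25 the antiphase (180 deg) valley is a true local minimum;
# below that it flattens away, leaving only the in-phase minimum, which
# is exactly the sudden antiphase-to-in-phase switch described above.
```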

5.2.3 Infant Developmental Psychology

Neo-Gibsonian theories helped to give birth to another dynamic systems theory, one that accounts for infant development. Esther Thelen and Linda B. Smith wrote in their seminal textbook, A Dynamic Systems Approach to the Development of Cognition and Action, that:

We invoke Gibson’s beliefs that the world contains information and that the goal of development is to discover relevant information in order to make a functional match between what the environment affords and what the actor can and wants to do. (Thelen & Smith, 1994, p. 9, Introduction)


They suggest that development is better understood as the emergent product of many decentralized and local interactions occurring in real time between parts of the brain, the body, and the environment, rather than as sequences of events preprogrammed in our genes. For example, crawling is a stable behavior for infants for several months. However, when they newly acquire the movement patterns of walking upright, the movement patterns of crawling become unstable. Smith and Thelen hold that this happens not as the result of a genetic preprogram but as the result of an efficient solution generated through self-organization (Smith & Thelen, 2003).

Following this line of thinking, Gershkoff-Stowe and Thelen (2004) provide a remarkable account of so-called "U-shaped" development, a phenomenon whereby previously performed behaviors regress or disappear, only to recover or reappear with even better performance later on. A typical example can be seen in language development around 2 or 3 years of age, when children, after several months of correct usage, often incorrectly use words like "foots" and "goed," a phenomenon known as overregularization. They eventually resume using these words correctly. Another example is the walking reflex. When a newborn baby is held so that the feet lightly touch a solid surface, she or he shows walking-like motion with alternate stepping. However, this reflexive behavior is scarcely observed after a few months and does not reappear until just prior to walking.

One more example is perseverative reaching observed in the so-called A-not-B task, as originally demonstrated by Jean Piaget, known as the father of developmental psychology, and illustrated in Figure 5.9.

In this task, 8- to 10-month-old infants are cued to recover a hidden object from one of two identical hiding places (see Figure 5.9). Recovery is repeated several times at the first location "A" before the experimenter switches the hiding place to the second location "B." Although the infant watches the toy being hidden at the new location "B," if there is a delay between hiding and allowing the child to reach, infants robustly return to the original location "A." This is known as perseverative reaching. Such reaching can even be observed in the not-hidden toy case (i.e., when the infant is provided with an explicit cue indicating the correct location). An interesting observation in the not-hidden condition is that infants around 5 months old are correct (around 70% success rate) at location "B" and show less perseveration than infants around 8 months old, who are incorrect (around 20% success rate). This perseverative behavior is not observed in infants older than 12 months of age.


What is the underlying mechanism in these examples of U-shaped development? Gershkoff-Stowe and Thelen (2004) argue that U-shaped development is not caused by regression or loss of a single element, such as one in the motor, perceptual, or memory system alone. Instead, U-shaped behavior is the result of a continuously changing configuration of mutually interacting components, including both mental and behavioral ones. They write, "The issue is not how a behavior is 'lost' or 'gets worse,' but how the component processes can reorganize to produce such dramatic nonlinearities in performance" (Gershkoff-Stowe & Thelen, 2004, p. 16). In the case of perseverative reaching, although it can be considered that repeated recoveries from location "A" reinforce a memory bias to select location "A" again upon the next reach, this is not the only cause. It was found that the hand trajectories in repeated recoveries of 8-month-old infants become increasingly similar to those of 5-month-old infants, who are relatively immature in controlling their hand reaching movements. It was also found that changing the hand trajectory by adding weights to the infants' arms significantly


Figure 5.9. Piaget’s A- not- B task. First, in 1, an attractive object is hidden at location “A” (left- hand side). The infant then repeatedly retrieves the object, in 2 and 3, from the correct location of “A.” In 4, the object is then hidden at location “B” (right- hand side) while the infant attends to this. However, with a delay between seeing the hiding and retrieval, the infant fails to retrieve the object at the correct location “B.”


decreased the perseveration. The point here is that the mutual reinforcement of the memory bias and the persistent trajectories in the reaching movement through the repeated recoveries results in the formation of a strong habit of reliable perseverative reaching. This account has been supported by simulation studies using the dynamic neural field model (Schoner & Thelen, 2006). Perseverative reaching is at its peak at 8 months of age and starts to drop off thereafter as other functions mature to counter it, such as attention switching and attention maintenance, which allow for tracking and preserving the alternative cue appearing at the second location "B." Smith and Thelen (2003) explain that infants who have had more experience exploring environments by self-locomotion show greater visual attention to the desired object and its hidden location.
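As a purely illustrative toy, loosely inspired by such dynamic-field accounts and with every number invented, perseveration can be caricatured as a competition between an accumulated memory trace and a fading cue:

```python
# Repeated reaches to "A" accumulate a decaying memory trace, so a fresh
# cue to "B" that fades over the delay loses the competition,
# qualitatively reproducing perseverative reaching.
trace_a = 0.0
for _ in range(6):                    # six rehearsed recoveries at "A"
    trace_a = 0.8 * trace_a + 1.0     # decay plus fresh reinforcement
cue_b, delay_decay = 1.5, 0.5         # the cue to "B" fades during the delay
drive_a, drive_b = trace_a, cue_b * delay_decay
print("reach to:", "A (perseveration)" if drive_a > drive_b else "B")
```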

This account of how infants settle on reaching for either "A" or "B" parallels what Spivey (2007) has discussed in terms of the "continuity of minds." He considers that even discrete decisions for selecting actions might be delivered through the process of gradually settling partially active and competing neural activities involved with multiple psychological processes. And again, the emergence of U-shaped development is a product of dynamic interactions between multiple contingent processes both internal and external to infants (Gershkoff-Stowe & Thelen, 2004). The next subsection looks at the development of a cognitive competency, namely imitation, which has been considered to play an important role in the cognitive development of children.

5.2.4 Imitation

It has been considered that imitation and observational learning are essential for children to acquire a wide range of behaviors, because learning by imitation is much more efficient than learning through trial and error by each individual alone. Jean Piaget proposed that imitation in infants develops through six discrete stages until 18 to 24 months of age (Piaget, 1962). The first stage starts with the sensory-reflex responses of newborns, which is followed by the repetition of some repertoires by chance in the second stage. A drastic differentiation in development comes with deferred imitation at around 8 to 12 months in the fourth stage. Here, an ability emerges to reproduce a modeled activity that has been observed at some point in the past. Piaget emphasized this change by


suggesting that this stage marks the onset of mentalization capabilities in infants. This mentalization capability is further developed in the sixth stage at around 18 to 24 months, when some symbolic-level mental representation and manipulation can be observed. A typical example is the appearance of pretend play; for example, a child pretends to make a phone call using a banana instead of a real phone after observing the actions of his or her parents.

Although Piaget's emphasis was cognitive development toward the mentalization or symbolism that appears in the later stages, some recent studies have pursued the suspicion that the roots of human cognition may be found in the analysis of early imitation, and so have focused on how the neuronal mechanisms of imitation appear at much earlier stages. A seminal study by Meltzoff and Moore (1977) showed that human neonates can imitate facial gestures of adults, such as tongue protrusion, mouth opening, and lip protrusion. This finding was nontrivial because it implies that neonates can match their own unseen behaviors with those demonstrated by others. Even Piaget believed that facial imitation could appear only after 8 months of age.

Although the exact mechanisms enabling these imitative behaviors in neonates are still a matter of debate, Meltzoff (2005) has hypothesized a "like me" mechanism that connects the perception of others "like me" with one's own capacities, thereby grounding an embodied understanding of others' minds in enactive imitation. In the first stage, in newborns, innate sensory-motor mapping can generate the aforementioned imitative behaviors by means of automatic responses. In the second stage, infants experience regular relationships between their mental states and the actions they generate repeatedly, and thus associations between them are learned. Finally, in the third stage, infants come to understand that others who act "like me" have mental states "like me."

Along similar lines, Jacqueline Nadel (2002) proposed that imitation is a means of communicating with others. Nadel observed a group of preverbal infants in a natural social play setting involving a type of frequently observed communicative interaction: turn taking, or switching roles, among two or three infants. Typical turn taking was observed when an infant showed another infant an object similar to the one he or she was holding. In most cases, the partner infant took the object and imitated its usage. Sometimes, however, the partner refused to do so or ignored the other. In these cases, the initiator left the object and turned to imitate the partner's ongoing behavior.


Another remarkable finding by Nadel (2002) was that pairs of preverbal infants often exhibited imitation of instrumental activity with synchrony between them. Figure 5.10 shows that when one infant demonstrated an unexpected use of objects (carrying an upside-down chair on his head), the partner imitated this instrumental activity during their imitative exchanges.

Based on these observations and others, Nadel and colleagues argue that although the immediate imitation generated during behavioral exchanges may not always be an intelligent process, as Piaget pointed out, infants at the very least "know how" to communicate with each other (Andry et al., 2001). This intriguing communicative activity may not require much mental representation, manipulation, or symbolism, but rather depends on the synchronization and rhythm that appear spontaneously in the dynamical processes of sensory-motor mapping between the perception of others "like me" and one's own actions. The next section describes a new movement in artificial intelligence and robotics guided by these insights and many others from contemporary developmental psychology.

Figure 5.10. Preverbal infants exhibit instrumental activities with synchrony during imitative exchange. Reproduced from Nadel (2002) with permission.


5.3. Behavior-Based Robotics

At the end of the 1980s, a paradigm shift occurred in artificial intelligence and robotics research with the introduction of behavior-based robotics by Rodney Brooks at MIT. It should be noted, however, that just a few years before Brooks started his project, Valentino Braitenberg, a German neuroanatomist, published a book entitled Vehicles: Experiments in Synthetic Psychology (Braitenberg, 1984) describing the psychological perspective that led to the behavior-based robotics approach. The uniqueness of the book is its attempt to explore possible brain-psychological mechanisms for generating behavior via synthesis. For example, Braitenberg's "law of uphill analysis and downhill invention" suggests that it is more difficult to understand a working mechanism or system just from looking at it externally than it is to create it from scratch, an insight parallel to the quote from Feynman introducing this chapter.

Another interesting feature of Braitenberg's book is that all of the synthesis described is done through thought experiments rather than by using real robots or computer simulations, although many researchers reconstructed these experiments using actual robots years later. Braitenberg's thought experiments are simple, yet provide readers with valuable clues about the cognitive organization underlying adaptive behaviors. Some representative examples of his thought experiments are introduced as follows, because they offer a good introduction to understanding the behavior-based approach.

5.3.1 Braitenberg’s Vehicle Thought Experiments

In his book, Braitenberg introduces thought experiments concerning 14 different types of vehicles. Here, we confine ourselves to looking at Vehicles 2, 3, and 4 as representative examples. Each of the three vehicles is equipped with a pair of sensors on the front left- and right-hand sides of its body. The sensory inputs are transmitted to the left and right wheel-drive motors at the rear through connecting lines analogous to synaptic connections. Let's begin with Vehicle 2a, shown in Figure 5.11.

The vehicle has light intensity sensors at the front on each side, connected to the corresponding rear motors in an excitatory manner


(same-side excitatory connectivity). If a light source is located directly ahead of the vehicle, it will crash into the light source by accelerating the motors on both sides equally. However, if there is a slight deviation from the light source, the deviation will be increased by accelerating the motor on the side closer to the light source. This eventually generates radical avoidance of the light source. On the other hand, if each sensor is connected to the motor on the opposite side (cross-excitatory connectivity), as shown for Vehicle 2b in Figure 5.11, the vehicle always crashes into the light source. This is because the motor on the side opposite the light source accelerates more, and thus the vehicle turns toward the light source. Vehicles 2a and 2b are named Coward and Aggressive, respectively.

Now, let's suppose that the connectivity lines, rather than being excitatory as in Vehicle 2, are inhibitory, as in Vehicle 3 (Figure 5.11). Vehicle 3 then shows drastically different behavioral characteristics from Vehicle 2. First, let's look at Vehicle 3a, which has same-side inhibitory connectivity. This vehicle slows down in the vicinity of the light source. It is gradually attracted to the light source and finally stops close to it (how close perhaps depending on the friction of the wheels and other factors). If the vehicle deviates slightly to one side from the source, the motor on the opposite side slows down, because it is inhibited by the sensor that perceives a stronger stimulus from the source. If it deviates to the right, then the left wheel is inhibited, and vice versa.


Figure 5.11. Braitenberg vehicles 2a and 2b (top) and 3a and 3b (bottom).


Eventually, the vehicle shifts back toward the source and finally stays in its vicinity. In the case of Vehicle 3b, which has cross-inhibitory connectivity, the vehicle also slows down in the presence of a strong light stimulus, but it gently turns away from the source, employing the control logic opposite to that of Vehicle 3a, and heads for another light source. Vehicles 3a and 3b are named Lover and Explorer, respectively.
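These wiring stories are simple enough to simulate. The sketch below is my own (geometry, gains, and the intensity model are all invented) and is deliberately restricted to the excitatory Vehicle 2; Vehicle 3 could be obtained by using negative weights on a constant base speed instead:

```python
# A minimal differential-drive sketch contrasting same-side with crossed
# excitatory sensor-motor wiring (Braitenberg Vehicles 2a and 2b).
import math

def simulate(crossed, steps=300, dt=0.05):
    x, y, heading = 0.0, -2.0, 0.8           # light source sits at (0, 0)
    for _ in range(steps):
        d = math.hypot(x, y)
        bearing = math.atan2(-y, -x) - heading
        intensity = 1.0 / (1.0 + d * d)      # bounded light intensity
        left = intensity * (1.0 + 0.5 * math.sin(bearing))
        right = intensity * (1.0 - 0.5 * math.sin(bearing))
        vl, vr = (right, left) if crossed else (left, right)
        heading += dt * (vr - vl)            # wheel imbalance turns the body
        speed = 0.5 * (vl + vr)
        x += dt * speed * math.cos(heading)
        y += dt * speed * math.sin(heading)
    return math.hypot(x, y)

print(f"Vehicle 2a (same-side): final distance {simulate(False):.2f}")  # flees
print(f"Vehicle 2b (crossed):   final distance {simulate(True):.2f}")   # homes in
```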

Vehicle 4 adds a trick in the connectivity lines: the relationship between the sensory stimulus and the motor outputs is changed from a monotonic one to a non-monotonic one, as shown in Figure 5.12a.

Because of the potential nonlinearity in the sensory-motor response, the vehicle will not simply approach the light sources or escape from them monotonically. It can happen that the vehicle approaches a source but changes course to veer away from it when coming within a certain distance. Braitenberg imagined that repetitions of this sort of approach and withdrawal around light sources could result in the emergence of complex trajectories, as illustrated in Figure 5.12b. Simply by adding some nonlinearity to the sensory-motor mapping functions of the simple controllers, the resultant interactions between the vehicle and the environment (light sources) can become significantly complex. These are very interesting results. However, being thought experiments, this approach is quite limited. Should we wish to consider emergent behaviors beyond the limits of such thought experiments, we require computer simulations or real robotics experiments.


Figure 5.12. Braitenberg vehicle 4. (a) Nonlinear maps from sensory intensity to motor velocity assumed for this vehicle and (b) complex behaviors that emerge with more complex maps.


5.3.2 Behavior-Based Robots and Their Limitations

Returning now to behavior-based robotics: Brooks elaborated on thoughts similar to Braitenberg's by demonstrating that even small and extremely simple insect-like robots could exhibit far more complex, realistic, and intelligent behaviors than the conventional computationally heavy robots used in traditional AI research. This marked the beginning of behavior-based robotics research. Argumentative papers published by Brooks, such as "Elephants don't play chess" (Brooks, 1990) and "Intelligence without representation" (Brooks, 1991), present his thoughts on what he calls "classical AI" versus "nouvelle AI." He criticized the use of large robots programmed with classical AI schemes, arguing that in real-world tests much of the computation time is spent on logical inference or the preparation of action plans even before the robot takes a single step or indeed makes any movement at all.

On the other hand, small robots whose behavior is based on the philosophy of nouvelle AI are designed to move first, taking part in physical interactions with their environment and with humans while computing all the necessary parameters in real time in an event-based manner. Brooks also criticized the tendency of classical AI to be overwhelmed with "representation." For example, typical mobile robots based on the classical AI scheme are equipped with global maps or environment models represented in a three-dimensional Cartesian coordinate system. The robots then proceed to match what they have sensed through devices such as vision cameras with the stored representation, through complicated coordinate transformations at each step of their movement, as they locate themselves in the stored Cartesian coordinate system. The behavior-based robots made by Brooks and his students use only a simple scheme based on the perception-to-motor cycle, in which the motor outputs are directly mapped from the perceptual inputs at each iteration.

The problem with the classical AI approach is that the representation is prepared not through actual actions taken by the agent (the robot), but by implementing an externally imposed artificial purpose. This problem can be attributed to the lack of direct experience, which is related to Husserl’s discussions on phenomenological reduction (see chapter 3).

Behavior-based robotics could provide AI researchers and cognitive scientists with a unique means of obtaining a view on first-person experience from the viewpoint of a robot, by almost literally putting themselves


inside its head, thereby affording the opportunity to examine the sensory flow experienced by the robot. Readers should note that the idea of the perception-to-motor cycle with small controllers in behavior-based robots and Braitenberg vehicles is quite analogous to the aforementioned Gibsonian theories emphasizing the role of the environment rather than internal brain mechanisms (also see Bach, 1987).

Behavior-based approaches that emphasize embodiment currently dominate the field of robotics and AI (Pfeifer & Bongard, 2006). Although this paradigm shift brought about by the behavior-based robotics researchers is deeply significant, I feel a sense of discomfort in that the common use of this approach emphasizes only sensory-motor level interactions. This is because I still believe that we humans have the "cogito" level that can manipulate our thoughts and actions by abstracting our daily experiences from the sensory-motor level. Actually, Brooks and his students examined this view in their experiments applying the behavior-based approach to the robot navigation problem (Matarić, 1992). The behavior-based robots developed by Brooks' lab employed the so-called subsumption architecture, which consists of layers of competencies, or task-specific behaviors, in which higher layers subsume lower ones. Although in principle each behavior functions independently by accessing sensory inputs and motor outputs, behaviors in the higher layers subsume those in the lower ones by sending suppression and inhibition signals to their sensory inputs and motor outputs, respectively. A subsumption architecture employed for the navigation task is shown in Figure 5.13.

The subsumption control comprises behaviors allocated to different layers, including avoiding obstacles, wandering, exploring the environment, and building maps and planning. Of particular interest in this architecture is the top-layer module that deals with map building and planning.


Figure 5.13. The subsumption architecture used for the robot navigation problem in research by Brooks and colleagues.


This layer, which corresponds to the cogito level, is supposed to generate abstract models of the environment through behavioral experiences and to use these in goal- directed action planning.
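The layering logic itself can be caricatured in a few lines. In this structural sketch the behavior names and sensation fields are invented, and the arbitration loop is a crude stand-in for the real architecture's suppression and inhibition wires:

```python
# Subsumption-style layering: layers run on raw sensation, and an active
# higher layer suppresses the command produced below it.
def avoid(s):   return "turn_away" if s["obstacle_near"] else None
def wander(s):  return "random_step" if s["bored"] else None
def explore(s): return "head_to_frontier" if s["frontier_seen"] else None
def plan(s):    return "follow_route" if s["goal_set"] else None

def subsume(sensation):
    command = "idle"
    for layer in (avoid, wander, explore, plan):   # ordered low to high
        out = layer(sensation)
        if out is not None:
            command = out          # higher layers subsume lower ones
    return command

print(subsume({"obstacle_near": True, "bored": False,
               "frontier_seen": False, "goal_set": False}))  # turn_away
print(subsume({"obstacle_near": False, "bored": True,
               "frontier_seen": True, "goal_set": False}))   # head_to_frontier
```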

An important remaining problem concerns the way that acquired models or maps of the environment are represented. Daniel Dennett points to this problem when writing, "The trouble is that once we try to extend Brooks' interesting and important message beyond the simplest of critters (artificial or biological), we can be quite sure that something awfully like representation is going to have to creep in…" (Dennett, 1993, p. 126). The scheme by Matarić (1992) employed a topological graph representation for the environment map, consisting of nodes representing landmark types and arrows representing their transitions in the course of traveling (see Figure 2.2). As long as symbols understood to be arbitrary shapes of tokens (Harnad, 1990) are used in those nodes for representing the world, they can hardly be grounded in a metric space common to the physical world, as discussed earlier. In light of this, what direction of research should behavior-based robotics researchers pursue? Should we give up involving the cogito level, or accept the usage of symbols for incorporating cogito-level activities, bearing in mind the potential inconsistencies?

Actually, a clue to resolving this dichotomy can be found in one of Braitenberg's vehicles, Vehicle 12. Although the Braitenberg vehicles up to Vehicle 4 have been introduced in numerous robotics and AI textbooks, the thought experiments beyond Vehicle 4, which target higher-order cognitive mechanisms, are equally interesting. These higher-order cognitive vehicles concern logic, concepts, rules, regularities, and foresight. Among them, Vehicle 12 examines how a train of thought can be generated. Braitenberg implemented a nonlinear dynamical system, a logistic map (see section 5.1), in the vehicle, enabling sequences of values, or "thoughts," in terms of neuronal activation to be generated in an unpredictable manner but with hidden regularity by means of chaos. Braitenberg argues that this vehicle seems to possess free will to manipulate its thoughts, at least from the perspective of outside observers of the vehicle. We will come back to this consideration in later chapters, as the issue of free will constitutes one of the main focuses of this book.

So far, we have seen that Gibsonian and Neo- Gibsonian researchers as well as behavior- based robotics researchers who emphasize embodied cognition tend to regard the role of the brain as only that of a minimal


controller. This is because even very primitive controllers like the Braitenberg vehicles can generate quite complex behaviors when coupled with environmental stimuli. It is only natural to expect that even higher-order cognition might emerge to some extent if further nonlinearity (like that employed in Vehicle 12) or some adaptability could be added to the controller.

Now, we begin to consider minimal forms of an artificial brain, namely neural network models, characterized by their nonlinearity and adaptability, put into robot heads. Note, however, that these attempts do not accord with our knowledge that the brain is a complex organ, as we have seen in previous chapters. So, let's contemplate first how this discordance can be resolved.

5.4. Modeling the Brain at Different Levels

As a general understanding, neural activity in the brain can be described on the basis of processes that occur at multiple levels, starting from the molecular level (which accounts for processes such as protein synthesis and gate opening in synapses), through the neurochemical level (which accounts for signal transmission), the single-cell activity level (which accounts for processes such as spiking), and the cell assembly level in local circuits, up to the macroscopic regional activation level measurable with technologies such as fMRI or EEG. The target level depends on the phenomenon to be reproduced. If we aim to model the firing activity of a single cell, we describe precisely how the membrane potential changes as a result of ion flow in a single neuron. If we aim to model neuron interconnection phenomena, as observed in the hippocampus by Ikegaya and colleagues (2004) using optical recording techniques, the model should focus on how spiking activity can spread across local circuits consisting of thousands of interconnected neurons.

On the other hand, if we aim to model neural processing related to the generation of cognitive behavior, it would not be a good idea to model a single spiking neuron. Rather, such modeling would require the reproduction of interactions between multiple brain regions, simulating the activities of tens of billions of spiking neurons, something that is impossible with the computer technology currently available to us. Another problem besides computational power is the operation and


maintenance of such a tremendously complex simulator, as well as techniques for processing the results of the simulations.

In fact, using supercomputers to reproduce neural circuits in the brain presents some considerable challenges in terms of making the simulation realistic. At present, we can obtain experimental data about connectivity between different types of neurons by using techniques such as labeling individual neurons with distinctly colored immunofluorescence markers appearing in specially modified transgenic animals. These labeled neurons can be traced by confocal microscopy in each section of the sampled tissue, and eventually a three-dimensional reproduction of the entire system of interconnected neurons can be prepared by stacking a number of the images. For example, the Blue Brain project led by Henry Markram (Markram et al., 2015) reconstructed the microcircuitry of the rat somatosensory neocortex, consisting of about 31,000 neurons, in a digital computer model. This simulation coped with neurophysiological details such as the reconstruction of the firing properties of 207 morpho-electrical types of neural cells in the circuit. The project is now attempting to reproduce the entire visual cortex, which consists of about a million columns, each of which consists of about 10,000 cells. If this is achieved, it may also be possible to create a cellular-level replica of the entire brain! Of course, such an accomplishment would provide us with vast amounts of scientific insight.

At the same time, however, I wonder how tractable such a realistic brain simulator would be. I imagine that for a realistic replica of the brain to function properly, it might also require realistic interactions with its environment. Therefore, it would have to be connected to a physical body of some sort to attain equally realistic sensory-motor interactions with the environment. It may take several years for the functions of a human-level brain replica to develop to a sufficiently high level through exposure to realistic sensory-motor interactions, as we know that the development of cognitive capabilities in human infants requires a comparably long period of intensive parental care. Also, if a human-level brain replica must be embedded in various social contexts in human society to ensure its proper development, such an experiment may not be feasible for various other reasons, including the ethical problems associated with building such creatures. These issues will arise again in the final chapters of this book.

If the goal of modeling, though, is to build not a complete replica of the human brain but rather an artifact for synthesis and analysis that


can be used to obtain a better understanding of the human mind and cognition in general, in terms of its organizational and functional principles, then such models must be built with an adequate level of abstraction to facilitate their manipulability. Analogously, Herbert Simon (1981) wrote that, in modeling humans, we might hope to characterize the main properties of the system and its behavior without elaborating the detail of either the outer or inner environments. Let us remember the analytical results obtained by Churchland and colleagues (2010), introduced in chapter 4, showing that the principal dimensions of ensembles of neuronal firing can be reduced to a few. It might then be reasonable to assume that the spiking of some hundreds of neurons can be reproduced by simulating the activities of a few representative neural units modeled as point masses. An interesting observation is that the macroscopic state of collective neural activity changes continuously and rather smoothly in low-dimensional space, even though the activity of each neuron at each moment is discontinuous and noisy in regard to spiking. So, cognition and behavior might just correlate with this macroscopic state, which changes continuously in a space whose dimensionality is several orders lower than the original dimensionality of the space of spiking neurons.
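This dimensionality argument is easy to reproduce on synthetic data; the sketch below (invented data, not a neural recording) mixes three slow latent signals into 200 noisy units and recovers them with principal component analysis:

```python
# 200 noisy "neurons" driven by only 3 slow latent signals are
# summarized almost completely by the first few principal components
# of their collective activity.
import numpy as np

rng = np.random.default_rng(0)
T, N, D = 1000, 200, 3
latents = np.cumsum(rng.standard_normal((T, D)), axis=0)  # slow signals
activity = latents @ rng.standard_normal((D, N))          # mixed into neurons
activity += 0.5 * rng.standard_normal((T, N))             # spiking-like noise

centered = activity - activity.mean(axis=0)
s = np.linalg.svd(centered, compute_uv=False)             # singular values
explained = (s**2) / (s**2).sum()
print(f"variance explained by first 3 PCs: {float(explained[:3].sum()):.3f}")
```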

Consequently, it might be worthwhile to consider a network model consisting of a set of interacting units in which each unit essentially represents a single dimension of the original collective activity of the spiking neurons. Actually, this type of abstraction has been assumed in the connectionist approach, described in detail in the seminal book Parallel Distributed Processing: Explorations in the Microstructure of Cognition, edited by Rumelhart, McClelland, and the PDP Research Group (1986). They showed that simple network models consisting of sets of activation units and connections can model various cognitive processes, including pattern matching, dynamic memory, sequence generation-recognition, and syntax processing, in distributed activation patterns of the units. These cognitive processes are emergent properties of the interactive dynamics within the networks, which result from the adjustment of the connectivity weights between the different activation units through learning.

Among the various types of connectionist network models proposed, I find particularly interesting a dynamic neural network model called the recurrent neural network (RNN) (Jordan, 1986; Elman, 1990; Pollack, 1991). It is appealing because it can deal with both spatial and temporal information structures by utilizing its own dynamic properties.


However, the most important characteristic of RNNs is their generality. As we proceed, we'll see that RNNs, even in their minimal form, can exhibit general cognitive functions of learning, recognizing, and generating continuous spatiotemporal patterns, achieving generalization and compositionality while also preserving context sensitivity. These unique characteristics of RNNs are due to the fact that they are nonlinear dynamical systems with high degrees of adaptability. It is well known that any computational process can be reconstructed by nonlinear dynamical systems as long as their parameters are adequately set (Crutchfield & Young, 1989). A study by Hava Siegelmann (1995) established the possibility that analog computation by RNNs can exhibit an ultimately complex computational capability beyond the Turing limit. This can be understood from the fact that a nonlinear dynamical system can exhibit complexity equivalent to an infinite state machine depending on its parameters, as described in section 5.1.

Next, we start by looking at a simpler neural network, the feed-forward network, which can learn input-output mapping functions for static patterns. Then, we show how this feed-forward network can be extended to RNNs, which can learn spatiotemporal patterns. At the same time, we examine the basic characteristics of the RNN model from the perspective of nonlinear dynamical systems.

5.5. Neural Network Models

This section introduces three types of basic neural network models: the three-layered feed-forward network, the discrete-time RNN, and the continuous-time RNN (CTRNN). All three types have two distinct modes of operation. One is the learning mode for determining a set of optimal connectivity weights from a training dataset, and the other is the testing mode in which an optimal output pattern is generated from an example test input pattern.

5.5.1 The Feed-Forward Network Model

The feed-forward network model is shown in Figure 5.14. It consists of an input unit layer, a hidden unit layer, and an output unit layer. Neural activations propagate from the input units to the hidden units and then to the output units through the connectivity weights spanning between each layer. The objective of learning is to determine a set of optimal connectivity weights that can reconstruct the input-output patterns given in the target training dataset. The learning is conducted by utilizing the error back-propagation scheme that was conceived independently by Shun-Ichi Amari (1967), Paul Werbos (1974), and Rumelhart and colleagues (1986).

We assume that the network consists of input units (indexed with k), hidden units (indexed with j), and output units (indexed with i) and is trained to produce input-output mapping for P different patterns. The activations of the units when presented with the nth pattern are denoted as $in_n^k$, $a_n^j$, and $o_n^i$, respectively, where $in_n^k$ is given as input. The potentials of the hidden and output units are denoted as $u_n^j$ and $u_n^i$, respectively, and the training target of the nth pattern is denoted as $\bar{o}_n^i$. Thus, the forward activation of an output unit is written as:

$u_n^i = \sum_j w_{ij}\, a_n^j + b^i$ (Eq. 7a)

$o_n^i = f(u_n^i)$ (Eq. 7b)

where $b^i$ is a bias value for each unit and $f$ is a sigmoid function.

Similarly, for the hidden units:

$u_n^j = \sum_k w_{jk}\, in_n^k + b^j$ (Eq. 8a)


Figure 5.14. The feed-forward network model. Feed-forward activation and error back-propagation schemes are illustrated in the model. The right side of the figure shows how the delta error and the updated weights can be calculated through the error back-propagation process from the output layer to the hidden layer.


$a_n^j = f(u_n^j)$ (Eq. 8b)

Here, the goal of learning is to minimize the squared error between the target and the output, as shown in Eq. 9:

$E_n = \frac{1}{2} \sum_i (\bar{o}_n^i - o_n^i) \cdot (\bar{o}_n^i - o_n^i)$ (Eq. 9)

First, we formulate how to update the connection weights in the output layer, which are denoted as $\Delta w_{ij}$. Because the weights should be updated in the direction of minimizing the squared error, the direction can be obtained by taking the derivative of $E_n$ with respect to $w_{ij}$ as follows:

$\Delta w_{ij} = -\varepsilon \frac{\partial E_n}{\partial w_{ij}}$

The right side of this equation can be decomposed as:

$-\varepsilon \frac{\partial E_n}{\partial w_{ij}} = -\varepsilon \frac{\partial E_n}{\partial u_n^i} \cdot \frac{\partial u_n^i}{\partial w_{ij}}$

By applying Eq. 7a to the second derivative on the right side, we obtain:

$-\varepsilon \frac{\partial E_n}{\partial w_{ij}} = -\varepsilon \frac{\partial E_n}{\partial u_n^i} \cdot a_n^j$ (Eq. 10)

Here, $\frac{\partial E_n}{\partial u_n^i}$ is the delta error of the ith unit, which is denoted as $\delta_n^i$. The delta error represents the contribution of the potential value of the unit to the squared error:

$\delta_n^i = \frac{\partial E_n}{\partial u_n^i} = \frac{\partial E_n}{\partial o_n^i} \cdot \frac{\partial o_n^i}{\partial u_n^i}$

By applying Eq. 9 to the first term on the right side and taking the derivative of the sigmoid function with respect to the potential for the second term, the delta error at the ith unit can be obtained as follows:

$\delta_n^i = -(\bar{o}_n^i - o_n^i) \cdot o_n^i \cdot (1 - o_n^i)$ (Eq. 11)

Furthermore, by utilizing the delta error in Eq. 10, the weight update can be written as:

$\Delta w_{ij} = -\varepsilon\, \delta_n^i \cdot a_n^j$ (Eq. 12)

Next, we obtain the updates for the connection weights of the hidden layer, which are denoted as $\Delta w_{jk}$, by taking the derivative of $E_n$ with respect to $w_{jk}$:

$\Delta w_{jk} = -\varepsilon \frac{\partial E_n}{\partial w_{jk}} = -\varepsilon \frac{\partial E_n}{\partial u_n^j} \cdot \frac{\partial u_n^j}{\partial w_{jk}}$

By substituting $\frac{\partial E_n}{\partial u_n^j}$ with the delta error $\delta_n^j$ at the jth unit and evaluating $\frac{\partial u_n^j}{\partial w_{jk}}$ as $in_n^k$ by applying Eq. 8a, the weight update can be written as:

$\Delta w_{jk} = -\varepsilon\, \delta_n^j \cdot in_n^k$ (Eq. 13)

Here, $\delta_n^j$ can be derived from the previously obtained $\delta_n^i$ as follows:

$\delta_n^j = \frac{\partial E_n}{\partial u_n^j} = \sum_i \frac{\partial E_n}{\partial u_n^i} \cdot \frac{\partial u_n^i}{\partial a_n^j} \cdot \frac{\partial a_n^j}{\partial u_n^j} = \left( \sum_i \delta_n^i\, w_{ij} \right) \cdot a_n^j \cdot (1 - a_n^j)$ (Eq. 14)

It should be noted that $\left( \sum_i \delta_n^i\, w_{ij} \right)$ on the right side represents the sum of the delta errors $\delta_n^i$ back-propagated to the jth hidden unit, each multiplied by its connection weight $w_{ij}$. If there are more layers, the same error back-propagation scheme is repeated, in the course of which (1) the delta error at each unit in the current layer is obtained by back-propagating the error from the previous layer through the connection weights and (2) the incoming connection weights to the units in the current layer are updated by using the obtained delta errors. The actual update of the connection weights is implemented through summation of the individual updates over all training patterns:

$w^{new} = w^{old} + \sum_n^P \Delta w_n$ (Eq. 15)
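
To make the whole procedure concrete, the following is a minimal sketch in Python/NumPy of Eqs. 7 through 15, trained here on a toy XOR dataset. The dataset, layer sizes, learning rate, and variable names are illustrative choices of ours, not taken from the original studies, and convergence depends on the random seed.

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Toy dataset: P = 4 input-output patterns (XOR)
IN = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T  = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
K, J, I = 2, 4, 1                           # input, hidden, output unit counts
W_jk = rng.normal(0, 0.5, (J, K)); b_j = np.zeros(J)
W_ij = rng.normal(0, 0.5, (I, J)); b_i = np.zeros(I)
eps = 0.5                                   # learning rate

for epoch in range(5000):
    dW_ij = np.zeros_like(W_ij); dW_jk = np.zeros_like(W_jk)
    db_i = np.zeros_like(b_i);   db_j = np.zeros_like(b_j)
    for n in range(len(IN)):                # accumulate over all P patterns
        a_j = sigmoid(W_jk @ IN[n] + b_j)                  # Eq. 8
        o_i = sigmoid(W_ij @ a_j + b_i)                    # Eq. 7
        delta_i = -(T[n] - o_i) * o_i * (1 - o_i)          # Eq. 11
        delta_j = (delta_i @ W_ij) * a_j * (1 - a_j)       # Eq. 14
        dW_ij += -eps * np.outer(delta_i, a_j)             # Eq. 12
        dW_jk += -eps * np.outer(delta_j, IN[n])           # Eq. 13
        db_i  += -eps * delta_i; db_j += -eps * delta_j
    W_ij += dW_ij; W_jk += dW_jk; b_i += db_i; b_j += db_j # Eq. 15

print(sigmoid(W_ij @ sigmoid(W_jk @ IN.T + b_j[:, None]) + b_i[:, None]).T.round(2))

Note that the per-pattern updates are only accumulated inside the inner loop and applied once per epoch, which is exactly the summation over all training patterns prescribed by Eq. 15; a few thousand epochs usually suffice for this toy problem, though some seeds may need more.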

5.5.2 Recurrent Neural Network Models

Recurrent neural network models have been used to investigate the human cognitive capability of dealing with temporal processes, such as in motor control (Jordan, 1986) and language learning (Elman, 1990). Let's look at the exact form of the RNN model. Although various types of RNNs have been investigated so far (Jordan, 1986; Doya & Yoshizawa, 1989; Williams & Zipser, 1989; Elman, 1990; Pollack, 1991; Schmidhuber, 1992), it might be helpful to look at the Jordan-type RNN (Jordan, 1986), as it is one of the simplest implementations, illustrated in Figure 5.15.

This model has context units in addition to the current step inputs $in_t$ and the next step outputs $out_{t+1}$. The context units represent context, or the internal state, in representing dynamic sequence patterns. In the forward dynamics, the current step context unit activation $c_t$ is mapped to its next step activation $c_{t+1}$.

Figure 5.15. Jordan-type RNN. (a) Forward activation and (b) the error back-propagation through time scheme in the cascaded RNN.

Let us consider an example of learning to generate a simple 1-dimensional cyclic sequence pattern of period 3 such as “0 0 1 0 0 1 … 0 0 1… .” In this example, the input is given as a sequence of “0 0 1 0 0 1 … 0 0” and the target output is given as this sequence shifted forward one step, “0 1 0 0 1 0 … 0 1.” The learning of this type of sequence faces the hidden state problem because the sequences include the same target output value in different orders in the sequence (i.e., two 0s in the first step and the second step in this cyclic sequence pattern). Although this type of sequence cannot be learned by the feed-forward network by means of simple input-output mapping, the RNN model with the context units can learn it if the context unit activation states can be differentiated from the ambiguous outputs; that is, a 1-dimensional context activation sequence is formed such as “0.2 0.4 0.8 0.2 0.4 0.8 … 0.2 0.4 0.8,” which is mapped to the output activation sequence of “0 0 1 0 0 1 … 0 0 1.” It is noted that the Jordan-type RNN operated in discrete time steps can be regarded as a dynamical map, as shown in Eq. 2 in section 5.1, by considering that the current state $X_t$, consisting of the current step input and the current context state, is mapped to the next state $X_{t+1}$, consisting of the input and the context at the next step. The connectivity weights correspond to the parameter P of the dynamical map. Therefore, the RNN model can acquire desired dynamic structures by adequately tuning the connectivity weights as the learnable parameters of the dynamical map. For example, the aforementioned cyclic pattern of repeating “0 0 1” can be learned as a limit cycle attractor of period 3.

One of the important characteristics of RNNs is that they can exhibit dynamic activities autonomously, without receiving any inputs, when operated in closed loop by feeding the prediction output for the next step back to the input of the current step. This phenomenon is explained qualitatively by Maturana and Varela (1980), who state that neural circuits are closed circuits without any input or output functions. Closed circuits maintain endogenous dynamics that are structurally coupled with sensory inputs and produce motor outputs, wherein sensory inputs are considered to be perturbative inputs to the endogenous dynamics. Although it is tempting to think that motor outputs are generated simply by mapping different sensory states to sensory reflexes in different situations, this process should in fact involve additional steps, including utilization of the autonomous internal dynamics of the closed network. Iterative interactions between interconnected neural units afford the RNN a certain amount of autonomy, which might well constitute the origin of voluntary behavior or contextual sensitivity. Later sections return to this point as we focus on the issue of free will.


The RNN employs a learning scheme called back-propagation through time (BPTT) (Rumelhart et al., 1986; Werbos, 1988), which was developed by extending the conventional error back-propagation scheme in the backward time direction so as to develop adequate dynamic activation patterns in the context units. In the aforementioned feed-forward network model, the connectivity weights between the output layer and the hidden layer are updated by using the error generated between the target output and the generated output. Then the connectivity weights between the input layer and the hidden layer are updated using the delta error back-propagated from the output units to the hidden units. In the case of the RNN, however, there are no error signals for the context output units because there are no target values for them, and therefore no direct means to update the connectivity weights between the context output units and the hidden units. However, if the delta error back-propagated from the hidden units to the context input units is copied to the context output units in the previous step, the connectivity weights between the context output units and the hidden units can be updated by utilizing this copied information.

This BPTT scheme can be well understood by supposing that an identical RNN is cascaded in the direction of time to form a deep feed-forward network, as shown in Figure 5.15b. In this cascaded network, the current step activation of the context output units is copied to the context input units in the next step, and this is repeated from the start step to the end step in the forward computation. In the backward computation of BPTT, on the other hand, the error generated in the output units at a particular time step is propagated through the context input units to the previous step's context output units, and this is repeated until the delta error signal reaches the context input units in the start step. In BPTT, the error signals originating from the output units at different time steps are accumulated as the time steps fold back, by which all the connectivity weights of the identical RNN can be updated.
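
The following is a minimal sketch, in the same Python/NumPy style as above, of a Jordan-type RNN trained by BPTT on the period-3 pattern discussed earlier. The network sizes, learning rate, and variable names are our own illustrative choices; after training, the network is run in closed loop by feeding each prediction back as the next input, and with a cooperative seed the output settles into the learned "0 0 1" limit cycle.

import numpy as np

def sig(u):
    return 1.0 / (1.0 + np.exp(-u))

rng = np.random.default_rng(1)
C, H = 3, 8                                  # context and hidden unit counts
Wxh = rng.normal(0, 0.5, (H, 1)); Wch = rng.normal(0, 0.5, (H, C)); bh = np.zeros(H)
Why = rng.normal(0, 0.5, (1, H)); by = np.zeros(1)
Whc = rng.normal(0, 0.5, (C, H)); bc = np.zeros(C)

pattern = np.array([0.0, 0.0, 1.0] * 10)     # "0 0 1 0 0 1 ..."
x, tgt = pattern[:-1], pattern[1:]           # target = input shifted one step
eps = 0.1

for epoch in range(5000):
    # forward pass, storing activations for BPTT
    T = len(x); c = np.zeros(C); cs, hs, ys = [c], [], []
    for t in range(T):
        h = sig(Wxh @ x[t:t+1] + Wch @ c + bh)
        y = sig(Why @ h + by)
        c = sig(Whc @ h + bc)
        hs.append(h); ys.append(y); cs.append(c)
    # backward pass: the delta at the context inputs of step t+1 is
    # copied back to the context outputs of step t
    gWxh = np.zeros_like(Wxh); gWch = np.zeros_like(Wch); gbh = np.zeros_like(bh)
    gWhy = np.zeros_like(Why); gby = np.zeros_like(by)
    gWhc = np.zeros_like(Whc); gbc = np.zeros_like(bc)
    dc = np.zeros(C)
    for t in reversed(range(T)):
        dy = -(tgt[t] - ys[t]) * ys[t] * (1 - ys[t])         # output delta (Eq. 11)
        dh = (Why.T @ dy + Whc.T @ dc) * hs[t] * (1 - hs[t]) # hidden delta
        gWhy += np.outer(dy, hs[t]); gby += dy
        gWhc += np.outer(dc, hs[t]); gbc += dc
        gWxh += np.outer(dh, x[t:t+1]); gWch += np.outer(dh, cs[t]); gbh += dh
        dc = (Wch.T @ dh) * cs[t] * (1 - cs[t])   # delta copied to previous step
    for W, g in [(Wxh, gWxh), (Wch, gWch), (bh, gbh), (Why, gWhy),
                 (by, gby), (Whc, gWhc), (bc, gbc)]:
        W -= eps * g

# closed-loop generation: feed the prediction back as the next input
x_t = np.array([0.0]); c = np.zeros(C)
for t in range(12):
    h = sig(Wxh @ x_t + Wch @ c + bh)
    x_t = sig(Why @ h + by); c = sig(Whc @ h + bc)
    print(round(float(x_t[0]), 2), end=" ")  # after a transient, near 0 0 1 ...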

The capability of the RNN for self-organizing context-dependent information processing can be understood well by looking at a prominent research outcome presented by Jeffrey Elman (1991) on the topic of language learning utilizing RNN models. He showed that a version of the RNN, now called an Elman net (Figure 5.16a), can learn to extract grammatical structures from given exemplar sentences. In his simulation experiment, the example sentences for training the network were generated by using a lexicon of 23 items including 8 nouns, 12 verbs, the relative pronoun “who,” and a period for indicating ends of sentences. The sentence generations followed a context-free grammar that is shown in Figure 5.16b.

As described in chapter 2, various sentences can be generated by recursively applying substitution rules starting from S at the top of the tree representing the sentence structure. In particular, the presence of a relative clause with “who” allows the generation of recursively complex sentences such as: “Dog who boys feed sees girl.” (See Figure 5.16c.) In the experiment, the Elman network was used for the generation of successive predictions of words in sentences based on training with exemplar sentences. More specifically, words were input one at a time at each step, and the network predicted the next word as the output. After the prediction, the correct target output was shown and the resultant prediction error was back-propagated, thereby adapting the connectivity weights. At the end of each sentence, the first word of the next sentence was input. This process was repeated for thousands of the exemplar sentences generated from the aforementioned grammar. It is noted that the Elman network in this experiment employed a local representation in the winner-take-all way, using a 31-bit vector for both the input and the output units. A particular word was represented by the activation of a corresponding unit out of the 31 units. The input and the output units had the same representation.
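
As an aside, this local winner-take-all coding and the one-step-shifted prediction targets are easy to picture in code. The sketch below uses a truncated lexicon for brevity (the original used 31-dimensional vectors), so it is purely illustrative:

# Local (one-hot) coding, as in Elman-style experiments; lexicon truncated here
lexicon = ["boy", "boys", "chase", "chases", "who", "."]

def one_hot(word, lexicon):
    v = [0.0] * len(lexicon)
    v[lexicon.index(word)] = 1.0
    return v

# "boys chase boy ." becomes a sequence of input vectors; the target at each
# step is the one-hot vector of the *next* word:
sentence = ["boys", "chase", "boy", "."]
inputs  = [one_hot(w, lexicon) for w in sentence[:-1]]
targets = [one_hot(w, lexicon) for w in sentence[1:]]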

Figure 5.16. Sentence learning experiments done by Elman. (a) The Elman network, (b) the context-free grammar employed, and (c) an example sentence generated from the grammar (Elman, 1991). The grammar in (b) is:

S → NP VP “.”
NP → PropN | N | N RC
VP → V (NP)
RC → who NP VP | who VP
N → boy | girl | cat | dog | boys | girls | cats | dogs
PropN → John | Mary
V → chase | feed | see | hear | walk | live | chases | feeds | sees | hears | walks | lives

The analysis of network performance after the training on the target sentences showed various interesting characteristics of the network behaviors. First, look at the simple sentence cases. When a singular noun “boy” was input, all three singular verb categories as well as “who” for a relative clause were activated as possible predicted next words, and all other words were not activated at all. On the other hand, when a plural noun “boys” was input, all plural verbs and “who” were activated. This means that the network seems to capture the singular-plural agreement between subject nouns and verbs. Moreover, the actual activation values encoded the probability distribution of next-coming words, because “boy” or “boys” alone cannot determine the next word deterministically. It was also observed that the network captured verb argument structures as well. For example, after the two succeeding words “boy lives” were input, a period was predicted. For the case of “boy sees,” both a period and noun words were activated for the next prediction. Finally, for the case of “boy chases,” only noun words were activated. The network seems to understand that “live” and “chase” are an intransitive verb and a transitive verb, respectively. It also understands that “see” can be both.

Although the presence of a relative clause makes a sentence more complex, singular-plural agreement was preserved. An example exists in the following paired sentences:

1. boy who boys chase chases boy
2. boys who boys chase chase boy

Actually, the network activated the singular verbs after being input “boy who boys chase” and activated the plural ones after being input “boys who boys chase.” To keep singular-plural agreement between subjects and distant verbs, the information about whether the subject was singular or plural had to be preserved internally. Elman found that the context activation dynamics can be adequately self-organized in the network for this purpose.

5.5.3 Continuous Time Recurrent Neural Network Model

Next, we look at an RNN model operated in continuous time, which is known as the continuous-time recurrent neural network (CTRNN) (Doya & Yoshizawa, 1989; Williams & Zipser, 1989). Let us consider a CTRNN model without an explicit layer structure, in which each neural unit has synaptic inputs from all other neural units as well as from its own feedback (see Figure 5.17a as an example). In this model, the activation dynamics of each neural unit can be described in terms of the differential equations shown in Eq. 16.

$\tau_i\, \dot{u}_i = -u_i + \sum_j w_{ij}\, a_j + I_i$ (Eq. 16a)

$a_i = 1 / (1 + e^{-u_i})$ (Eq. 16b)

The left side of Eq. 16a represents the time differential of the potential of the ith unit multiplied by a time constant $\tau$, which is equated with the sum of the synaptic inputs and the negative feedback term $-u_i$. This means that positive and negative synaptic inputs increase and decrease the potential of the unit, respectively. If the sum of the synaptic inputs is zero, the potential converges toward zero. The time constant $\tau$, with its positive value, plays the role of a viscous damper: the larger or smaller the time constant $\tau$, the slower or faster the change of the potential $u_i$. You may notice that this equation is analogous to Eq. 3, which represents a general form of a continuous-time dynamical system.

Next, let's examine the dynamics of CTRNNs. Randall Beer (1995a) showed that even a small CTRNN consisting of only three neural units can generate complex dynamical structures depending on its parameters, especially the values of the connection weights. The CTRNN model examined by Beer consists of three neural units, as shown in Figure 5.17a. Figure 5.17b-d shows that different attractor configurations can appear depending on the connection weights.

An interesting observation is that multiple attractors can be generated simultaneously with a given specific connection weight matrix. Eight stable fixed-point attractors and two limit-cycle attractors appear with specific connection weights, as shown in Figure 5.17b and c, respectively, and the attractor toward which the state trajectories converge depends on the initial state. In Figure 5.17d, a single chaotic attractor appears with a different connection weight matrix. This type of complexity in attractor configurations might be the result of mutual nonlinear interactions between multiple neural units. In summary, then, CTRNNs can autonomously generate various types of dynamic behaviors, ranging from simple fixed-point attractors through limit cycles to complex chaotic attractors, depending on the parameters represented by the connection weights (this characteristic holds also for the discrete-time RNN [Tani & Fukumura, 1995]). This feature can be used for memorizing multiple temporal patterns of perceptual signals or movement sequences, which will be especially important as we consider MTRNNs later.
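
A small simulation along these lines is easy to set up. The sketch below integrates Eq. 16 for a three-unit CTRNN with a simple Euler scheme (anticipating Eq. 17 in the next subsection); the weight matrix and initial potentials are random illustrative choices of ours, not Beer's actual parameters, so which attractor type appears will vary from run to run:

import numpy as np

def ctrnn_step(u, W, I, tau, dt=0.01):
    # Euler-integrated version of Eq. 16: tau * du/dt = -u + W a + I
    a = 1.0 / (1.0 + np.exp(-u))
    return u + dt * (-u + W @ a + I) / tau

rng = np.random.default_rng(2)
W = rng.normal(0.0, 4.0, (3, 3))   # connection weights determine the attractors
u = rng.normal(0.0, 1.0, 3)        # the initial state selects among coexisting ones
tau = np.ones(3)                   # unit time constants
I = np.zeros(3)                    # no external input

for step in range(50000):
    u = ctrnn_step(u, W, I, tau)
print(u)   # after the transient: a fixed point, a limit cycle, or chaotic wandering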


In the case of a CTRNN characterized by the time constant parameter $\tau$, the BPTT scheme for supervised learning is used with slight modifications to its original form. Figure 5.18 illustrates how the BPTT scheme can be implemented in a CTRNN.

First, the forward activation dynamics of the CTRNN for n steps is computed by following Eq. 17 with a given initial neural activation state at each unit. Eq. 17 is obtained by converting Eq. 16 from its differential equation form into a difference equation by using Euler's method, for the purpose of numerical computation.

$u_t^i = \left(1 - \frac{1}{\tau_i}\right) u_{t-1}^i + \frac{1}{\tau_i} \left( \sum_j w_{ij}\, a_{t-1}^j + I_{t-1}^i \right)$ (Eq. 17a)

$a_t^i = 1 / (1 + e^{-u_t^i})$ (Eq. 17b)

Figure 5.17. Different attractor configurations appear in the dynamics of a continuous-time RNN model consisting of three neural units, each receiving synaptic inputs from the other two neural units as well as its own recurrent one. (a) The network architecture, (b) eight stable fixed-point attractors denoted as black points, (c) two limit cycles denoted as line circles with arrows, and (d) chaotic attractors. (b), (c), and (d) are adapted from (Beer, 1995a) with permission.

What we have here is the leaky-integrator neuron with a decay rate of $(1 - \frac{1}{\tau_i})$. After the forward computation with these leaky-integrator neural units, the back-propagation computation is initiated by computing the error between the training target and the current output at the nth step. The delta error at the output unit in the nth step is computed as $\delta_n^0$. Then, this delta error is back-propagated to the 1st, 2nd, 4th, and 5th units, as denoted by continuous arrows, through the forward connections. These delta errors propagated to local units are further back-propagated to the 1st, 2nd, 3rd, and 4th units in the (n−1)st step and to the 1st, 2nd, and 4th units in the (n−2)nd step. Additionally, the delta errors generated at the output unit in steps n−1 and n−2 are also back-propagated in the same manner, as denoted by dotted-line and chain-line arrows, respectively. This back-propagation process is recursively repeated until the 1st step of the sequence is reached. One important note here is that the way of computing the delta error in the CTRNN differs from that in the conventional RNN because of the leaky-integrator term in the forward activation dynamics defined in Eq. 17a. The delta error $\frac{\partial E}{\partial u_t^i}$ at the ith unit, whether an output unit or an internal unit, is recursively calculated from the following formula:

$\frac{\partial E}{\partial u_t^i} = -(\bar{o}_t^i - o_t^i) \cdot o_t^i \cdot (1 - o_t^i)\; [i \in Out] \;+\; \left(1 - \frac{1}{\tau_i}\right) \frac{\partial E}{\partial u_{t+1}^i} \;+\; \sum_{k \in N} \frac{1}{\tau_k}\, \delta_{t+1}^k\, w_{ki}\, a_t^i\, (1 - a_t^i)$ (Eq. 18)

where the first term is present only when the ith unit is an output unit.


Figure 5.18. An extension of the error back-propagation scheme to CTRNNs. The figure shows how the error generated at the nth step is propagated back to the (n−2)nd step. Arrows with continuous lines, dotted lines, and chain lines denote the back-propagated error generated at the nth step, the (n−1)st step, and the (n−2)nd step, respectively. These errors continue to back-propagate along the forward connections over time.


From the right-hand side of Eq. 18, it can be seen that the ith unit in the current step t inherits a large portion $(1 - \frac{1}{\tau_i})$ of the delta error $\frac{\partial E}{\partial u_{t+1}^i}$ from the same unit in the next step t+1 when its time constant $\tau_i$ is relatively large. It is noted that Eq. 18 turns out to be the conventional, discrete-time version of BPTT when $\tau_i$ is set to 1.0. This means that, in a network with a large time constant, the error back-propagates through time with a small decay rate. This enables the learning of long-term correlations latent in target time profiles by filtering out fast changes in the profiles. All delta errors propagated from different units are summed at each unit in each step. For example, at the 1st unit in the (n−1)st step, the delta errors propagated from the 0th, 2nd, and 1st units are summed to obtain the error for the (n−1)st step. By utilizing the delta errors computed for the local units at each step, the updated weights for the input connections to those units in step n−1 are obtained by following Eq. 13.

Although the aforementioned models of feed-forward networks, RNNs, and CTRNNs employ the error back-propagation scheme as the central mechanism for learning, its biological plausibility in neuronal circuits has been questioned. However, some supportive evidence has been provided by Mu-ming Poo and colleagues (Fitzsimonds et al., 1997; Du & Poo, 2004), as well as by Harris (2008) in related discussions. It has been observed that the action potential back-propagates through dendrites when postsynaptic neurons on the downstream side fire upon receiving synaptic inputs above a threshold from the presynaptic neurons on the upstream side. What Poo has further suggested is that such activity-dependent synaptic depression or potentiation can propagate backward across not just one but several successive synaptic connections. We can, therefore, speculate that the retrograde axonal signal (Harris, 2008) conveying error information might propagate from the peripheral area of sensory-motor input and output to the higher-order cortical areas, modulating their contextual memory structures by passing through multiple layers of synapses and neurons in real brains, much as the delta error signal back-propagates from the output units to the internal units in the CTRNN model. In light of this evidence, the biological plausibility of this approach appears promising.

It should also be noted, however, that counterintuitive results have been obtained by other researchers. For example, using the “echo-state network” (Jaeger & Haas, 2004), a version of the RNN in which the internal units are connected with randomly predetermined constant weights and only the output connection weights from the internal units are modulated, without using error back-propagation, Jaeger and Haas showed that quite complex sequences can be learned. My question here would be what sorts of internal structures can be generated without the influence of error-related training signals. The next section introduces neurorobotics studies that use some of the neural network models described here, including the feed-forward network model and the RNN model.
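
For contrast with the BPTT-trained models above, here is a minimal echo-state sketch: the recurrent weights are drawn at random, scaled, and then frozen, and only a linear readout is fitted. The reservoir size, the toy sine input, and the least-squares readout are our own illustrative choices, not the setup of Jaeger and Haas:

import numpy as np

rng = np.random.default_rng(0)
N = 100                                       # reservoir size (arbitrary)
W = rng.normal(0.0, 1.0, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # keep spectral radius below 1
W_in = rng.normal(0.0, 0.5, (N, 1))

T = 500
u = np.sin(np.arange(T) * 0.2)[:, None]       # toy input sequence
x = np.zeros(N); X = np.zeros((T, N))
for t in range(T):                            # drive the fixed random reservoir
    x = np.tanh(W @ x + W_in @ u[t])
    X[t] = x

# Only the linear readout is trained; no error back-propagates into W:
target = np.roll(u, -1, axis=0)               # e.g., one-step-ahead prediction
W_out, *_ = np.linalg.lstsq(X, target, rcond=None)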

5.6. Neurorobotics from the Dynamical Systems Perspective

Although Rodney Brooks did not delve deeply into research on adaptive or learnable robots, other researchers have explored such topics while seriously considering the issues of embodiment emphasized in the behavior-based approach. A representative researcher in this field, Randall Beer (2000), proposed the idea of considering the structural coupling between the neural system, the body, and the environment, as illustrated in Figure 5.19.

The internal neural system interacts with its body, and the body interacts with its surrounding environment, so the three can be viewed as a coupled dynamical system. In this setting, it is argued that the objective of neural adaptation is to keep the behavior of the whole system within a viable zone. Obviously, this thought is quite analogous to the Gibsonian and Neo-Gibsonian approaches described in section 5.2.

Figure 5.19. The neural system, the body, and the environment are considered as a coupled dynamical system by Randall Beer (2000).

In the 1990s, various experiments were conducted in which different neural adaptation schemes were applied in the development of sensory-motor coordination skills in robots. These schemes included: evolutionary learning (Koza, 1992; Cliff et al., 1993; Beer, 1995; Nolfi & Floreano, 2000; Di Paolo, 2000; Ijspeert, 2001; Ziemke & Thieme, 2002; Ikegami & Iizuka, 2007), which uses artificial evolution of genomes encoding connection weights for neural networks based on principles such as survival of the fittest; value-based reinforcement learning (Edelman, 1987; Meeden, 1996; Shibata & Okabe, 1997; Morimoto & Doya, 2001; Krichmar & Edelman, 2002; Doya & Uchibe, 2005; Endo et al., 2008), wherein the connection weights are modified in the direction of reward maximization; and supervised and imitation learning (Tani & Fukumura, 1997; Gaussier et al., 1998; Schaal, 1999; Billard, 2000; Demiris & Hayes, 2002; Steil et al., 2004), wherein a teacher or imitation target exists. Most of these experiments were conducted in minimal settings with rather simple robots (mobile robots with range sensors, in many cases) with small-scale neural controllers (influenced by the Gibsonian and behavior-based philosophy). Although the experiments might have lacked scalability, both with respect to engineering applications and to accounting for human cognitive competence, they do demonstrate that nontrivial structures of “minimal cognition” can emerge in the structural coupling between simple neural network models and the environment.

Let's now look at a few examples from among the many remarkable studies that have been conducted. In particular, the following emphasize the dynamical systems perspective in developing and generating minimal cognitive behaviors in neurorobots.

5.6.1 Evolution of Locomotion with Limit Cycle Attractors

It is widely held that rhythmical movements in animals, such as locomotion, are generated by neural circuits called central pattern generators (CPGs), which generate oscillatory signals by means of limit-cycle dynamics in neural circuits (Delcomyn, 1980). By constructing synthetic simulation models and conducting robotics studies based on the concept of CPGs, a number of researchers have investigated the adaptation mechanisms of walking locomotion in a number of animals: six-legged insects (Beer, 1995b), four-legged dogs (Kimura et al., 1999), and two-legged humans (Taga et al., 1991; Endo et al., 2008), as well as walking and swimming via spinal oscillation in four-legged salamanders (Ijspeert, 2001).

In particular, Beer (1995b) investigated how stable walking can be achieved by six-legged insect-like robots under different conditions of interaction between the internal neural system and the environment, by utilizing artificial evolution of CTRNN models. In this artificial evolution scheme, the connectivity weights in the CTRNN are randomly modulated in terms of “mutation.” If some robots exhibit better performance with respect to the predefined fitness functions with the modulated weights in their networks as compared with others, these robots are allowed to “reproduce,” with their “offspring” inheriting the connectivity weights of their networks. Otherwise, the characteristic connectivity weights within networks are not “reproduced.” Thus, connectivity weights are adapted in the direction of maximizing fitness over generations of population dynamics.
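
Abstracted away from the robot, this mutation-selection loop has a simple shape. In the sketch below the fitness function is only a stand-in of ours; in Beer's experiments it would run the CTRNN-controlled body and return the forward walking distance:

import numpy as np

rng = np.random.default_rng(0)

def fitness(weights):
    # Placeholder objective; in the locomotion experiments this would
    # simulate the six-legged body and measure the walking distance.
    return -np.sum((weights - 1.0) ** 2)

population = [rng.normal(0.0, 1.0, 20) for _ in range(30)]  # weight vectors
for generation in range(200):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:10]                          # the fittest "reproduce"
    population = [p + rng.normal(0.0, 0.1, p.shape)   # offspring inherit
                  for p in parents for _ in range(3)] # mutated weights
print(fitness(max(population, key=fitness)))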

In Beer's model, each leg is controlled by a local CTRNN consisting of a small number of neural units. Gait and motor outputs serve as sensory inputs for the network in terms of the torques generated when the legs move forward and backward. The six local CTRNNs are sparsely connected to generate the overall body movement. During the evolutionary learning stage, the connectivity weights within the local CTRNNs as well as the interconnections between the six local CTRNNs are mutated, and the fitness of each individual is evaluated by measuring the maximum forward walking distance within a specific time period.

An interesting finding from Beer's simulation experiments on artificial evolution is that the evolved locomotion mechanisms were qualitatively different under different evolutionary conditions. First, if the sensory inputs were constantly enabled during evolution, a “reflexive pattern generator” evolved. Because the leg movements were generated by means of reflexes from the sensory inputs, the locomotive motor pattern was easily distorted when the sensory inputs were disrupted. Second, if the sensory inputs were made completely inaccessible to the network during evolution, a CPG-type locomotive controller evolved. This evolved controller could generate autonomous rhythmic oscillation without any external drive, by means of a self-organizing limit cycle attractor in the CTRNN. Third, if the presence of the sensory inputs was made unreliable during evolution, a “mixed pattern generator” evolved. Although this controller could generate robust basic locomotion patterns even when the sensory inputs were disrupted, it demonstrated better locomotion performance when the sensory feedback was available (Figure 5.20).

In summary, these experiments showed that limit cycle attractors can emerge in the course of evolving CTRNN controllers for generating locomotion in different ways, depending on the conditions set for the evolutionary process. When sensory feedback is available, a limit cycle is organized in the coupling between the internal dynamics of the CTRNN and the environmental dynamics. Otherwise, the limit cycle attractor appears in the form of autonomous dynamics in the CTRNN alone. Beer speculated that the mixed strategy that emerges under the condition of unreliable sensory feedback is the most typical among biological pattern generators.

Figure 5.20. Six-legged locomotion patterns generated by the evolved mixed pattern generator: (a) a gait pattern with sensory feedback, and (b) one without sensory feedback. The case with sensory feedback shows more stable oscillation, with tight coordination among the different legs. Adapted from (Beer, 1995b) with permission.

5.6.2 Developing Sensory– Motor Coordination

Schemes of evolutionary learning have been applied to robots for various goal-directed tasks beyond locomotion by developing the sensory-motor coordination adequate for such tasks. Scheier, Pfeifer, and Kuniyoshi (1998) showed that nontrivial perceptual categorization capabilities can be acquired by inducing interactions between robots and their environments. They prepared a workspace for a miniature mobile robot (55 mm in diameter), called Khepera (Figure 5.21), where large and small cylindrical objects were placed at random.

The behavioral task for the robot was to approach large cylindrical objects and to avoid small ones. This task is far from trivial because the sensing capabilities of the Khepera robot are quite limited, consisting of just eight infrared proximity sensors attached to the periphery of the body. The robot can thus acquire eight directional range images representing distances to obstacles, but detection occurs only when an obstacle is within 3 cm, and the images are of low resolution. Scheier and colleagues implemented a feed-forward neural network model that receives six directional range images from the sensors at the front and controls the speeds of the left and right motors. The synaptic weights determining the characteristics of the mapping from sensor inputs to motor outputs were obtained in an evolutionary way. The fitness value for evolutionary selection increased when the robot stayed closer to large cylindrical objects and decreased when the robot stayed closer to small ones.

Figure 5.21. The Khepera robot, which features two wheel motors and eight infrared proximity sensors mounted on the periphery of the body. Source: Wikipedia.

It was reported that when the robot evolved a successful network for accomplishing the task, it would wander around the environment until it found an object and then would start circling it (Figure 5.22). The robot would eventually leave its trajectory if the object was a small cylinder; otherwise, it would keep circling if the object was large. Because it was difficult to distinguish between large and small cylindrical objects by means of passive perception using the installed low-resolution proximity sensors, the evolutionary process found an effective scheme based on active perception. In this scheme, the successfully evolved robot circled around a cylindrical object, whether small or large, simply by following the curvature of its surface, utilizing information from the proximity sensors on one side of its body. A significant difference was found between large and small objects in the way that the robot circled the object, generating different profiles of motor output patterns, which enabled the different object types to be identified. This example clearly shows that this type of active perception is essential for the formation of the robot's behavior, whereby perception and action become inseparable. In effect, sensory-motor coordination for active perception was naturally selected in their experiment.

Figure 5.22. An illustration of the behavior trajectory generated by a successfully evolved Khepera robot. It would wander around the environment until it found a cylinder of large size and then would start circling it.

Nolfi and Floreano (2002) showed another good example of evolution based on active perception, but in this case with the added element of self-organization, the so-called behavior attractor. They showed that a Khepera robot equipped with a simple perceptron-type neural network model can evolve to distinguish between walls and cylindrical objects, avoiding walls while staying close to cylindrical objects. After the process of evolution, the robot moves around, avoiding walls and staying close to cylindrical objects whenever it encounters them. Here, staying close to cylindrical objects does not mean stopping. Rather, the robot continues to move back and forth and/or left and right while keeping its relative angular position to the object almost constant. A steady oscillation of sensory-motor patterns with small amplitude was observed while the robot stayed close to the object. Nolfi and Floreano inferred that the robot could keep its relative position by means of active perception mechanized by a limit cycle attractor developed in the sensory-motor coupling with the object. These two experimental studies with the Khepera robot show that some nontrivial schemes for sensory-motor coordination can emerge via network adaptation through evolution, even when the network structure is relatively simple.

Before closing this subsection, I would like to introduce an intriguing scheme proposed by Gaussier and colleagues (1998) for generating immediate imitation behaviors in robots. The scheme is based on the aforementioned thoughts by Nadel (see section 5.2) that immediate imitation as a means of communication can be generated by synchronization achieved through a simple sensory-motor mapping organized under the principle of homeostasis. Gaussier and colleagues built an arm robot with a vision camera that learned a mapping between the arm's position as perceived in the visual frame and the proprioception (joint angles) of its own arm, by using a simple perceptron-type neural network model. After the learning, another robot of similar configuration was placed in front of the robot, and the other robot moved its arm (Figure 5.23).


Figure 5.23. A robot generates immediate imitation of another robot's movement by using the acquired visuo-proprioceptive mapping (Gaussier et al., 1998).


When the self-robot perceived the arm of the other robot as its own, its own arm moved in synchrony with that of the other, so as to minimize the difference between the current proprioceptive state and its estimate obtained from the output of the visuo-proprioceptive map, under the homeostasis principle. This study nicely illustrates that immediate imitation can be generated as synchronicity by using a simple sensory-motor mapping, which also supports the hypothesis of the “like me” mechanism described in section 5.2.
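
The control loop itself is nearly one line. Below is a hypothetical sketch of this homeostatic rule; all names here are ours, and visuo_proprio_map stands in for whatever learned mapping is available:

def imitation_step(visual_percept, proprioception, visuo_proprio_map, gain=0.1):
    # Estimate "my joint angles, if that seen arm were my own"
    estimated = visuo_proprio_map(visual_percept)
    # Move so as to reduce the mismatch (homeostasis principle)
    return proprioception + gain * (estimated - proprioception)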

Next, we look at a robotics experiment that uses sensory-motor mapping, but in a context-dependent manner.

5.6.3 Self-Organization of Internal Contextual Dynamic Structures in Navigation

We should pause here to remind ourselves that the role of neuronal systems should not be regarded as a simple mapping from sensory inputs to motor outputs. Recalling Maturana and Varela (1980), neural circuits are considered to exhibit endogenous dynamics, wherein sensory inputs and motor outputs are regarded as perturbations of, and readouts from, the dynamical system, respectively. This should also be true if we assume dynamic neural network models with recurrent connections, such as RNNs or CTRNNs. The following study shows such an example from my own investigations on learning goal-directed navigation, done in collaboration with Naohiro Fukumura (Tani & Fukumura, 1993, 1997). The experiment was conducted with a real mobile robot named Yamabico (Figure 5.24a).

The task was designed in such a way that a mobile robot with limited sensory capabilities learns to navigate given paths in an obstacle environment through teacher supervision. It should be noted that the robot cannot access any global information, such as its position in the X-Y coordinate system of the workspace. Instead, the robot has to navigate the environment depending solely on its own ambiguous sensory inputs, in the form of range images representing the distance to surrounding obstacles.

First, let me explain a scheme called “branching” that is implemented in the low-level robot control. The robot is preprogrammed with a collision-avoidance maneuvering scheme that determines its reflex behavior by using inputs from the range sensors. The range sensors perceive range images from 24 angular directions covering the front of the robot.

The robot essentially moves toward the largest open space in a forward direction while maintaining equal distance to obstacles on its left and right sides. Then, a branching decision is required when a new open space appears. Figure 5.24b,c illustrates how branching takes place in this workspace.


Figure 5.24. The Yamabico robot and its control architecture. (a) The mobile robot Yamabico employed in this experiment. (b) An example of a collision-free movement trajectory that contains four branching points, labeled 1 to 4. (c) The corresponding flow of range sensor inputs, where brighter (closer) and darker (farther) parts indicate their ranges. The exact range profile at each branching point is shown on the right. Arrows indicate the branching decision to “advance” to a new branch or to “stay” at the current one. (d) The employed RNN model, which receives inputs from the range sensors and outputs the branching decision at each branching point.


Once this branching scheme is implemented in the robot, the essence of learning how to navigate the environment is reduced to the task of learning the correct branching sequences associated with the sensory inputs at each branching point. Here, the RNN model is used for learning the branching sequences. Figure 5.24d shows how the Jordan-type RNN (Jordan, 1986) explained previously was used in the current navigation task. In this architecture, the original 24-dimensional range images are reduced to a three-dimensional vector by using a preprocessing scheme. This reduced sensory vector is provided as input to the RNN at each branching step, and the RNN outputs the corresponding branching decision along with the context outputs. Learning proceeds under supervision, wherein the experimenter trains the robot to generate correct branching on specified target routes. The target route in this experiment is designed such that cyclic trajectories emerge in the form of a figure-8 and a circular trajectory, alternating between them, as shown in Figure 5.25a.

In the actual training of the robot, the robot is guided repeatedly to enter this target cyclic route, starting from various locations outside the cyclic route (see Figure 5.25b for traces of the training trajectories). Then, a set of sequential data consisting of the sensory inputs and branching decisions along the branching sequences is acquired. This sequential data is used to train the RNN so it can generate correct branching decisions upon receiving the sensory inputs in the respective sequences. Note that this is not just simple learning of an input-output mapping, because the sensory inputs cannot necessarily determine the branching outputs uniquely. For example, the decision whether to move left and down or straight and down at the switching position denoted as A in Figure 5.25a should depend on the current context (i.e., whether the last travel was a figure-8 or a circular trajectory) rather than solely on the sensory inputs, because the latter are the same in both cases. This is called the sensory aliasing problem. It is expected that such differentiation of context unit activations can be achieved through adaptation of the connection weights.

After the training stage, the experimenter examines how well the robot can accomplish the learned task by placing the robot in arbitrary initial positions. Figure 5.25c shows two examples of evaluation trials, in which it can be seen that the robot always converges toward the desired loop regardless of its starting position. The time required for achieving convergence is different in each case, and even if the robot leaves the loop after convergence under the influence of noise, it always returns to the loop after a time. These observations indicate that the robot has learned the objective of the navigation task as embedded in the attractor dynamics of limit cycles, which are structurally stable.

Figure 5.25. Training and evaluation trajectories. (a) The target trajectory, which the robot loops around, forming a sequence of figure-8 and circular trajectories, with A as the switching point between the two sequences, (b) the traces of the training trajectories, and (c) the traces of evaluation trajectories starting from arbitrary initial positions. Adapted from (Tani & Fukumura, 1997) with permission.

It is interesting to examine how the task is encoded in the internal dynamics of the RNN. By investigating the activation patterns of the RNN after its convergence toward the loop, it is found that the robot is exposed to a lot of noise during navigation. It is found as well that the sensing input vector becomes unstable at particular locations and that the number of branches in one cycle is not constant, even though the robot seems to follow the same cyclic trajectory. At the switching point A for either route, the sensory input receives noisy jitter in different patterns, independent of the route. The context units, on the other hand, are completely identifiable between the two decisions, which suggests that the task sequence between the two routes is hardwired into the internal contextual dynamics of the RNN, even in a noisy environment.

To sum up, the robot accomplished the navigation task through the convergence of attractor dynamics that emerge in the coupling of the internal and environmental dynamics. Furthermore, situations in which sensory aliasing and perturbations arise can be disambiguated in navigating repeatedly experienced trajectories by self-organizing the autonomous internal dynamics of the RNN.

5.7. Summary

The current chapter introduced the dynamical systems approach for modeling embodied cognition. The chapter started with an introduction to nonlinear dynamics covering the characteristics of different classes of attractor dynamics. Then, it described Gibsonian and Neo-Gibsonian ideas in psychology and developmental psychology, ideas central to the contemporary philosophy of embodied minds (Varela et al., 1991). These ideas fit quite well with the dynamical systems approach, and this chapter looked at how they have influenced behavior-based robotics and neurorobotics researchers who attempt to understand the essence of cognition in terms of the dynamic coupling between internal neural systems, bodies, and environments.

This chapter also provided brief tutorials on connectionist neural network models, with a special focus on dynamic neural network models including the RNN and the CTRNN. The chapter concluded by introducing some studies on neurorobotics that aim to capture minimal cognitive behaviors based on the ideas of nonlinear dynamical systems and by utilizing the schemes of dynamic neural network models.

Although the dynamical systems views introduced in this chapter in terms of Gibsonian psychology, connectionist-level modeling, and neurorobotics may provide plausible accounts for some aspects of embodied cognition, some readers might feel that they do not solve all of the essential problems outstanding in the study of cognitive minds. They may ask how the dynamical systems approach described so far can handle difficult problems, including those of compositionality in cognition, of free will, and of consciousness. On the other hand, others such as Takashi Ikegami have argued that simple dynamic neural network models are sufficient to exhibit a variety of higher-order cognitive behaviors such as turn taking (Ikegami & Iizuka, 2007) or free decision (Ogai & Ikegami, 2008), provided that the dynamics of the coupling of bodies and environments develop into specific classes of complex dynamics. The next chapter introduces my own thoughts on the issue, and I put more emphasis on subjectivity than on the objective world as we try to articulate a general account of embodied cognition through the study of neurodynamic robot models.


Part II

Emergent Minds: Findings from Robotics Experiments


6

New Proposals

The examples of “learnable neurorobots” described in chapter 5 illustrate how various goal-directed tasks can be achieved through the self-organization of adequate sensory-motor coupling between the internal neuronal dynamics and the body-environment dynamics. Although the adaptive behaviors presented so far seem to capture at least some of the essence of embodied cognition, I feel that something important is still missing. That something is the subjectivity or intentionality of the system.

6.1. Robots with Subjective Views

Phenomenologists might argue that subjectivity cannot be detected explicitly because the goal of embodied cognition is to combine the subjective mind and the objective world into a single inseparable entity through interactions with the environment. However, I argue that such a line of robotics research focuses only on "reactive behavior" based on the perception-to-motor cycle and, therefore, might never be able to access the core problem of the dichotomy between the subjective mind and the objective world. All these robots do is generate adequate motor commands reactively to current sensory inputs, or to current internal states summed with past sequences of sensory inputs.


When my group conducted the robot navigation experiments aimed at learning cyclic trajectories mentioned in section 5.6, in the beginning I was interested in observing the emergent behaviors of the robot in terms of generating diverse trajectories in its transient states before converging to a limit cycle. However, after a while, I began to feel that robots with such reactive behaviors are simply like the steel balls in pinball machines, repeatedly bouncing against the pins until they finally disappear down the holes. Although we might see some complexity on the surface level of these behaviors, they are fundamentally different from those generally expected from humans in the contexts of both phenomenology and neuroscience. The behaviors of these robots seem too automatic, requiring no effort, as happens in machines that show no traits of subjectivity. The behaviors of these robots might be analogous to those of patients with alien hand syndrome, whose behaviors are generated automatically, as afforded by related perception, without subjective or intentional control (see section 4.2).

Going back to Husserl (2002): he considered that the world consists of objects that the subject can consciously meditate on or describe. However, the bottom line is that direct experiences for humans originate not in such consciously representable objects but in the continuity of direct experiences in time. As described in chapter 3, in considering this problem, Husserl assumed a three-level structure in phenomenological time that consists of the absolute flow at the deepest level, the preempirical time level of retention and protention, and objective time at the surface level. He also considered that the continuous flow of experiences becomes articulated into consciously accessible events or objects as a result of its development through these phenomenological levels. According to Husserl, this development is achieved through a process of interweaving double intentionality, namely transversal (retention and protention) and longitudinal (immanence of levels) intentionality, into the unitary flow of consciousness. Certainly, robots characterized by reactive behavior have nothing to do with such intentionality for consolidating as-yet-unknown everyday experiences into describable or narrative objects. This is both good and bad. Although such robots might be able to mimic smart insects, such as tumblebugs that skillfully roll balls of dung down pathways, at this level of sophistication they are not yet capable of authentic and inauthentic being as characterized by Heidegger (see section 3.4).

This is to say that current robots cannot, like human beings, construct their own subjective views of the world by structuring and objectifying experiences accumulated through interactions with the world, and especially with other beings more or less like themselves within it. Constructed by each individual when constantly facing various problems unique to that individual's place within the objective world, such characteristic viewpoints and the experiences that underlie them represent the subjectivity of the individual within the greater social system. What I would like to build are robots that realize what Heidegger considered authentic being, a character that presumably emerges in dynamic interplay between looking ahead toward possible futures and reflecting on one's own unique past in order to recruit the resources necessary to enact and realize the most possible future shared with others (see section 3.4).

How can such subjective views be constructed? Clearly, we are at the formative stages of this work. However, some clue as to how to begin—and make no mistake, this is the very beginning—appeared first in section 4.2, which explained the possible role of the predictive model for action generation and recognition in the brain. As Gibson and Pick (2000) conjectured, a set of perceptual structures obtained when an active learner engages in perceptual interaction with the environment and extracts information from it can be regarded as a subjective view belonging to that individual. Such an agent can have a proactive expectation of what the world should look like as it performs its intended actions. The developmental psychologist Claes von Hofsten has demonstrated that even 4-month-old infants exhibit such anticipatory behaviors. They track moving objects even when these are temporarily hidden from view, making a saccade to the reappearance point before the object reappears there (Rosander & von Hofsten, 2004). When they plan to reach for an object, their hands start to close before the object is encountered, taking into account the direction of and distance to the object (von Hofsten & Rönnqvist, 1988). These infants have prospects for their actions. These are the formative stages in the development of a potentially authentic being.

6.2. Engineering Subjective Views into Neurodynamic Models

So, as a first step in understanding how an artificial agent such as those under consideration in this book may be engineered with the capacity to act, and eventually to be responsible for its actions and for how the world turns out because of them, we now need to consider a theoretical conversion from the reactive type of behavior generated by means of perception-to-action mapping to the proactive behavior generated by means of intention-to-perception mapping. Here, perception is active and should be considered as a subject acting on objects of perception, as Merleau-Ponty (1968) explained in terms of visual palpation (see section 3.5). In terms of the neurodynamic models from which our robots are constructed, the perceptual structure for a particular intended action can be viewed as vector flows in the perceptual space as mapped from this intention. The vector flows constitute a structurally stable attractor. Let me explain this idea by considering some familiar examples. Suppose the intended action is your right hand reaching to a bottle from an arbitrary posture. If we consider a perceptual space consisting of the end-point position of the hand that is visually perceived and proprioception of the hand posture at each time step, the perceptual trajectories for reaching the bottle from arbitrary positions in this visuo-proprioceptive space can be illustrated with reduced dimensionality, as shown in Figure 6.1a, as a flow toward and a convergence of vectors around an attractor that stands as the goal of the action.

These trajectories, and the actions that arise from them, can be generated by fixed-point attractor dynamics (see section 5.1). In this case, the position of the fixed point varies depending on the position of the object in question, but all actions of a similar form can be generated by this type of attractor.

Another example is that of shaking a bottle of juice rhythmically. In this case, we can imagine the vector flow in the perceptual space as illustrated in Figure 6.1b, which corresponds to limit cycle attractor dynamics. The essence here is that subjective views or images about the intended actions can be developed as perceptual structures represented by the corresponding attractor embedded in the neural network dynamics, as we have seen with CTRNN models that can develop various types of attractors (section 5.5). By switching from one intention to another, the corresponding subjective view in terms of perceptual trajectories is generated in a top-down manner. These perceptual structures might be stored in the parietal cortex associated with intentions received from the prefrontal cortex, as discussed in section 4.2. This idea is analogous to the Neo-Gibsonian theory (Kelso, 1995) in which movement patterns can be shifted by phase transitions due to changes in the system parameters (see section 5.2).
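To make the two vector-flow pictures concrete, the following minimal sketch simulates both kinds of attractor in a two-dimensional "perceptual" plane. It is my own illustration, not a model from this book: the function names, the linear contraction used for reaching, and the Hopf normal form used for rhythmic shaking are all assumptions chosen for simplicity.

```python
import numpy as np

def reaching_flow(z, goal):
    """Fixed-point attractor: the flow converges on the goal state,
    as in Figure 6.1a (reaching). A simple linear contraction is assumed."""
    return -(z - goal)

def shaking_flow(z, center, radius=1.0, omega=2.0):
    """Limit-cycle attractor (Hopf normal form): trajectories converge
    on a circular orbit around `center`, as in Figure 6.1b (shaking)."""
    u = z - center
    r2 = np.dot(u, u)
    # radial contraction toward the orbit, plus a constant rotation
    return (radius**2 - r2) * u + omega * np.array([-u[1], u[0]])

def simulate(flow, z0, steps=2000, dt=0.01, **kw):
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z += dt * flow(z, **kw)
    return z

# Two intended actions as two vector fields over the same
# (vision, proprioception) plane: switching the intention switches the attractor.
goal = np.array([1.0, 0.5])
print(simulate(reaching_flow, [0.0, 0.0], goal=goal))          # converges near the goal
print(np.linalg.norm(simulate(shaking_flow, [0.2, 0.1], center=goal) - goal))  # settles near radius 1
```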

The top-down projection of the subjective view should (if only implicitly) have several levels in general, wherein the views at higher levels might be more abstract and those at lower levels might be more concrete and detailed. Also, top-down views of the world should be "compositional" enough that proactive views for various ways of intentionally interacting with the world can be represented by systematically recombining parts of images extracted from accumulated experiences. For example, to recall once again the very familiar image of everyday routine action with which this text began, when we intend to drink a cup of coffee, the higher level may combine a set of subintentions for primitive actions such as reaching-to-cup, grasping-cup, and bringing-cup-to-mouth in sequences that may be projected downward to a lower level where detailed proactive images of corresponding perceptual trajectories can be generated. Ultimately, perceptual experiences, which are associated with various intentional interactions with the world, constitute a semantically combinatorial language of thought (Fodor & Pylyshyn, 1988).

One essential question is how the higher level can manipulate or combine action primitives or words systematically. Do we need a framework of symbol representation and manipulation, especially at the higher cognitive level, for this purpose? If I said yes to this, I would be criticized just as Dreyfus criticized Husserl or as Brooks criticized conventional AI and cognitive science research.

Figure 6.1. The perceptual trajectories for different intended actions in visuo-proprioceptive space, for (a) approaching an object and (b) shaking it.

What I propose is this: We need a neurodynamic system, well formed through adaptation, that can afford compositionality as well as systematicity, and that gives the impression that discrete symbols exist within the system and that the system manipulates those symbols. The model of 'compositional' or 'symbolic' mind to which I am now pointing is not impossible to achieve in a neurodynamic system, if we remember that the sensitivity of chaos to initial conditions exhibits a sort of combinatory mechanics by folding and stretching in phase space. Such chaotic dynamics can produce combinatory sequences of symbols in terms of symbolic dynamics via partitioning processes of the continuous state space with a finite number of labels, as described in section 5.1. In simpler terms, the continuous space of action can be cut up into chunks, and these chunks can be referenced as things in themselves: represented, symbolized. Dale and Spivey (2005) have provided a sympathetic argument, proposing that the promise of symbolic dynamics lies in articulating the transition from dynamical, continuous descriptions of perception into the theoretical language of discrete, algorithmic processes for high-level cognition. What I am saying here is that the segmentation of "thinking" into discrete "thoughts," which are represented in terms of logical operators, as propositions, as combinations of symbols, can be performed by dynamic models of mind that do not employ discrete symbolic computation in their internal operations (Tani & Fukumura, 1995).
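The partitioning idea is easy to demonstrate. The sketch below is my own toy example (a standard construction from the symbolic-dynamics literature, not anything from the robot experiments): it labels the states of a chaotic map with two symbols, and the map's stretching and folding then yields diverse, combinatory symbol strings from a purely continuous process.

```python
import numpy as np

def logistic(x, a=4.0):
    # Fully chaotic logistic map: stretching and folding on [0, 1].
    return a * x * (1.0 - x)

def symbolize(x0, n_symbols=40):
    """Partition the continuous state space at x = 0.5 and label each
    visited state 'L' or 'R': a continuous trajectory becomes a symbol string."""
    x, symbols = x0, []
    for _ in range(n_symbols):
        symbols.append('L' if x < 0.5 else 'R')
        x = logistic(x)
    return ''.join(symbols)

# Nearby initial states diverge (sensitivity to initial conditions),
# so the same partition emits different combinatory symbol sequences.
print(symbolize(0.123456))
print(symbolize(0.123457))
```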

What about creative composition of primitives into novel sequences of action? Neurodynamic models account for this capacity as well. Nonlinear dynamics can exhibit structural changes of varying discreteness, as can be seen in bifurcations from one attractor structure to another, or in phase transitions brought about by controlling relatively low-dimensional external parameters. So, we may suppose that the higher level sending sequences of parameter values to the lower level in the network results in sequential switching of primitive actions by means of parameter bifurcation in this lower neurodynamic system. And if the neurodynamics in the higher level for generating these parameter sequences is driven by its intrinsic chaos, various combinatory sequences of the primitive actions could be generated. Figure 6.2 illustrates the idea.
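Here is a minimal two-level sketch in the spirit of Figure 6.2. Everything in it is an assumption made for illustration: the higher level is a chaotic logistic map whose sampled states are rescaled into a bifurcation parameter, and the lower level is a Hopf normal form that switches between a fixed point ("rest") and a limit cycle ("oscillate") as that parameter crosses zero.

```python
import numpy as np

def higher_level(x, a=4.0):
    # Higher level: intrinsic chaos (logistic map) generates
    # a diverse sequence of parameter values.
    return a * x * (1.0 - x)

def lower_level(z, mu, steps=500, dt=0.02):
    """Lower level: a parameterized system that bifurcates with mu
    (Hopf normal form). mu < 0 gives a fixed point; mu > 0 gives a
    limit cycle, standing in for two action primitives."""
    for _ in range(steps):
        r2 = z[0]**2 + z[1]**2
        dz = np.array([mu * z[0] - z[1] - r2 * z[0],
                       z[0] + mu * z[1] - r2 * z[1]])
        z = z + dt * dz
    return z

x = 0.31                      # higher-level initial state: the "intention"
z = np.array([0.1, 0.0])
for step in range(6):
    x = higher_level(x)
    mu = 2.0 * x - 1.0        # map chaos in [0, 1] to a bifurcation parameter in [-1, 1]
    z = lower_level(z, mu)
    print(f'step {step}: mu={mu:+.2f} -> primitive: {"oscillate" if mu > 0 else "rest"}')
```

Because the higher-level map is chaotic, the sign of the parameter flips irregularly, so the lower level visits combinatory sequences of its two primitive regimes.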

Although an agent driven by top-down intentions for action has proactive subjective views on events experienced during its interaction with the objective environment, its cognitive mind should also reflect on unexpected outcomes through the bottom-up process to modify the current intention. This modification of the intentions for action in the bottom-up process can be achieved by utilizing information about the prediction error, a possibility mentioned briefly in the previous section. Figure 6.2 illustrates the process whereby the state values in the higher level are modified to minimize the prediction error in the lower level.



Figure 6.2. An illustration showing how chaos can generate diverse sequential combinations of action primitives. In the higher level, the state trajectory is generated by a chaotic dynamic system with a given initial state, and the state values are sampled each time they cross a Poincaré section. These state values are successively input to a parameterized dynamic system in the lower level as its parameters (along the solid arrow), causing sequential bifurcations in that system and in the associated action primitives. The lower level predicts the coming visuo-proprioceptive state and its prediction error is monitored. The state in the higher level is modified in the direction of minimizing this error (along the dashed arrow).


This error signal might convey the experience of consciousness in terms of the first-person awareness of one's own subjectivity, because the subjective intention is directly differentiated from the objective reality: the subject feels, as it were, "out of place," and thus at a difference from its own self-projection. My tempting speculation is that authentic being could be seen in a certain imminent situation caused by such error or conflict between the two.
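As a concrete, if deliberately simplistic, illustration of this error-driven modification, the sketch below adjusts a single "intention" parameter by descending the gradient of the prediction error between a top-down generated trajectory and an observed one. The generative model, the amplitude parameter, and the finite-difference gradient are all hypothetical choices of mine, not the book's model.

```python
import numpy as np

def predict(intention, t):
    # Toy top-down generative model: the intention parameter fixes
    # the amplitude of a predicted perceptual trajectory. (Hypothetical.)
    return intention * np.sin(0.5 * t)

def prediction_error(intention, observed, t):
    return np.mean((predict(intention, t) - observed) ** 2)

# The world actually behaves with amplitude 1.3; the agent starts
# with a wrong intention and updates it down the error gradient.
t = np.linspace(0.0, 10.0, 100)
observed = 1.3 * np.sin(0.5 * t)
intention, lr, eps = 0.2, 0.5, 1e-4

for _ in range(50):
    # finite-difference gradient of the error with respect to the intention
    g = (prediction_error(intention + eps, observed, t)
         - prediction_error(intention - eps, observed, t)) / (2 * eps)
    intention -= lr * g
print(round(intention, 3))   # converges near 1.3: the intention is modified by the error
```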

In summary, what I am suggesting is that nonlinear neurodynamics can support discrete computational mechanics for compositionality while preserving the metric space of real-number systems in which physical properties such as position, speed, weight, and color can be represented. In this way, neurodynamic systems are able to host both semantically combinatorial thoughts at higher levels and the corresponding details of their direct perception at lower levels. Because both of these share the same phase space in a coupled dynamical system, they can interact seamlessly and thus densely, unlike symbols and patterns, which interact somewhat awkwardly in more common, so-called hybrid architectures. Meanwhile, the significance of symbolic expression is not only retained on the neurodynamic account but also clarified, and with this newfound clarity we may anticipate many historical problems regarding the nature of representation in cognition in philosophy of mind to finally dissolve.

6.3. The Subjective Mind and the Objective World as an Inseparable Entity

Next, let's extend such thinking further and examine how the subjective mind and the objective world might be related. Figure 6.3 illustrates conceptually how the interactions between top-down and bottom-up processes take place in the course of executing intended actions.

It is thought that the intention of the subjective mind (top-down) as well as the perception of the objective world (bottom-up) proceeds as shown in Figure 6.3 (left panel). These two processes interact, resulting in the "recognition" of the perceptual reality in the subjective mind and the "generation" of action in the objective world (middle panel). This "recognition" results in the modification of the subjective mind―and potential consciousness―whereas the "generation" of action modifies the objective world, and the interactions continue with the modified states of the mind and the world (right panel).



Figure 6.3. The subjective mind and the objective world become an inseparable entity through interactions between the top-down and bottom-up pathways. Redrawn from Tani (1998).

In this process, we see the circular causality between action and recognition. This circular causality results in inseparable flows between the subjective mind and the objective world as they reciprocally intertwine with each other via action-perception cycles, as Merleau-Ponty (1968) proposed. If we were able to achieve this scenario in a robot, the robot would be free from Cartesian dualism, as its subjective mind and the objective world could finally become inseparable.

I want to conclude this chapter by pointing out what I consider to be essential for constructing models of the cognitive mind:

1. The cognitive mind is best represented by nonlinear dynamical systems defined in the continuous time and space domain, wherein their nonlinearity can provide the cognitive competence of compositionality.

2. Both natural and artificial cognitive systems should be capable of predicting the perceptual outcome of the current intention for acting on the outer world via top-down pathways, whereas the current intention is adapted by using bottom-up signals of the error detected between the prediction and the actual perceptual outcome in the action-perception cycle.

Page 167: Exploring Robotic Mindsdl.booktolearn.com/...exploring_robotic_minds_92a1.pdfPart I On the Mind 1. Where Do We Begin with Mind? 3 2. Cognitivism 9 2.1 Composition and Recursion in

150 Emergent Minds: Findings from Robotics Experiments

150

3. The underlying structure for consciousness and free will should be clarified by conducting a close examination of nonstationary characteristics in the circular causality developed through the aforementioned top-down and bottom-up interaction between the subjective mind and the objective world. The essence of authentic being also might be clarified via such examination of the apparent dynamic structure.

The remaining chapters test these conjectures by reviewing a series of synthetic robotics experiments conducted in my laboratory. Readers should be aware that my ideas were neither concrete nor complete from the very outset. Rather, they became consolidated over time as the modeling studies were conducted. Moreover, my colleagues and I have never tried to put all of the assumed elements of the mind that we have discussed thus far into our synthetic robotic models. It was not our aim to put all available neuroscience knowledge about local functions, mechanisms, and anatomy into the brains of our tiny robots. Instead, in each trial we varied and developed "minimal brains," so to speak, in dynamic neural network models of the RNN type. We tried neither to implement all possible cognitive functions in a particular robotic model nor to account for the full spectrum of phenomenological issues in each specific experiment. We concentrated on models and experiments with specific focuses; therefore, in each new trial we added elements relevant to the focus and removed irrelevant ones. My hope is that, in reviewing the outcomes of our series of synthetic robotic studies, readers will be able to share the deep insights into the nature of the mind, especially how thought and its interaction with the world could arise, which I have come to in performing and reflecting on the actual experiments day to day.

The next chapter examines how robots can learn about the outer environment by using a sensory prediction mechanism in the course of exploration. It also explores the issue of self-consciousness as related to this sensory prediction mechanism.


7

Predictive Learning About the World from Actional Consequences

The previous chapter argued that understanding the processes essential in the development of a subjective view of the world by way of interactive experiences within that world is crucial if we are to reconstruct the cognitive mind in another medium, such as in our neurodynamic robots. But how exactly can robots develop such subjective views from their own experiences? Furthermore, if a robot becomes able to acquire a subjective view of the world, how does it also become aware of its own subjectivity or self? In considering these questions, this chapter reviews a set of robotics experiments in the domain of navigation learning. These experiments were conducted in a relatively simple setting more than 20 years ago in my lab, but they addressed two essential questions. The first experiment addresses the question of how a compositional representation of the world can be developed by means of the self-organization of neurodynamic structures via the accumulated learning of actional experiences in the environment. The second experiment inquires into the phenomenology of the "self" or self-consciousness. I attempt to clarify its underlying structure by examining the possible interaction between top-down prediction and bottom-up recognition during robot navigation.


7.1. Development of Compositionality: The Symbol Grounding Problem

In the mid-1990s, I started to think about how robots could acquire their own images of the world from experiences gathered while interacting with their environment (Tani, 1996). Because humans can mentally generate perceptual images for various ways of interacting with the world, I wondered if robots could also develop a similar competence via learning. As my colleagues and I had just completed experiments on robot navigation learning with homing and cyclic routing, as described in chapter 5, I decided to pursue this new problem in the context of robot navigation.

First, I tried to apply the forward dynamics model proposed by Masao Ito and Mitsuo Kawato (see chapter 4) directly to my Yamabico robot navigation problem. I thought that a recurrent neural network (RNN) would work as a forward dynamics model that predicts how the sensation of range images changes in response to arbitrary motor command inputs for the two wheel drives at every 500-ms time interval. However, achieving the convergence of learning with the sensory-motor data acquired in the original workspace proved to be very difficult. The reason for this failure was that it was simply asking too much of the network to learn to predict the sensory outcomes for all possible combinations of motor commands at each time step. Instead, it seemed reasonable to assume that the trajectory of the robot should be generated under the constraint of smooth, collision-free maneuvering. From this assumption, I decided to employ again the scheme of branching with collision-free maneuvering shown in section 5.6. This branching scheme enables the robot to move along "topological" trajectories in a compositional way by arbitrarily combining branching decisions in sequence.

By utilizing this scheme, the problem could be simplified to one wherein an RNN learns to predict just the sensation at the next branching point in response to the action command (branching decision) at the current branching point. I speculated that the RNN could acquire compositional images while traveling around the workspace by combining various branching decisions, provided that the RNN had already learned a sufficient number of branching sequences in the topological trajectories. A focal question that the experiment was designed to address was this one: What happens when the prediction differs from the actual outcome of the sensation? In this situation, a robot navigating a workspace by referring to an internal map with a finite state machine (FSM)-like representation of the topological trajectories would experience the symbol grounding problem (see Figure 2.2), discussed in chapter 2.

7.1.1 Navigation Experiments with Yamabico

In the learning phase, the robot explores a given environment containing obstacles by taking random branching decisions. Let’s assume that the robot arrives at the nth branching point, where it receives sensory input (range image vector plus travel distance from the previous branch point) pn and randomly determines the branching (0 or 1) as xn, after which it moves to the (n+1)st branching point (see the left side of Figure 7.1).

The robot acquires a sequence of pairs of sensory inputs and actions (pn, xn) throughout the course of exploring its environment. Using these sample pairs of sensory inputs and actions, the RNN is trained so that it can predict the next sensory input pn+1 in terms of the current sensory input pn and the branching action xn taken at branching point n (see the right panel in Figure 7.1). In this predictive navigation task, the context units in the RNN play the role of storing the current state in working memory, which is analogous to the previous Yamabico experiment described in chapter 5. The actual training of the RNN is conducted in an offline manner with the sample sequence data saved in short-term memory storage.
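The following sketch shows the shape of such a predictor as a small Elman-style network. It is only a scaffold under my own assumptions: the layer sizes are invented, the weights are random stand-ins for a trained network, and the offline training procedure (backpropagation through the saved sequences) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SENSE, N_ACT, N_CONTEXT, N_HIDDEN = 5, 1, 4, 16   # sizes are illustrative

# Randomly initialized weights stand in for the trained network.
W_in = rng.normal(0, 0.3, (N_HIDDEN, N_SENSE + N_ACT + N_CONTEXT))
W_p  = rng.normal(0, 0.3, (N_SENSE, N_HIDDEN))      # sensory prediction head
W_c  = rng.normal(0, 0.3, (N_CONTEXT, N_HIDDEN))    # next-context head

def rnn_step(p_n, x_n, c_n):
    """One branching event: from current sensation p_n, branching decision
    x_n (0 or 1), and context c_n, predict the next sensation p_{n+1}
    and the next context c_{n+1} (the context loop of Figure 7.1)."""
    h = np.tanh(W_in @ np.concatenate([p_n, [x_n], c_n]))
    p_next = 1 / (1 + np.exp(-W_p @ h))   # predicted sensation at the next branch
    c_next = np.tanh(W_c @ h)             # context carried to the next step
    return p_next, c_next

p = rng.uniform(0, 1, N_SENSE)   # range image vector plus travel distance
c = np.zeros(N_CONTEXT)          # context, here simply initialized to zero
p_pred, c = rnn_step(p, 1, c)
print(p_pred)
```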

Figure 7.1. An RNN learning to predict the sensation at the next branching point from the current branching decision.

Once the RNN is trained, it can perform two types of prediction. One is the online prediction of the sensory inputs at the next branching point for an action taken at the current branch. The other is the offline look-ahead prediction over multiple branching steps while the robot stays at a given branching point. Look-ahead prediction is performed by making a closed loop between the sensory prediction output units and the sensory input units of the RNN, as denoted with a dotted line in Figure 7.1. In the forward dynamics of an RNN with a closed sensory loop, arbitrary steps of look-ahead prediction can be taken by feeding the current predictive sensory outputs back in as the sensory inputs of the next step instead of employing actual external sensory inputs. This enables the robot to perform mental simulation of arbitrary branching action sequences as well as goal-directed planning to achieve given goal states, as described later.
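A closed-loop rollout of this kind takes only a few lines of code. In the sketch below the one-step predictor is again a random-weight stand-in (an assumption of mine, not the trained Yamabico network); the point is the loop structure, in which each predicted sensation is fed back as the next input.

```python
import numpy as np

rng = np.random.default_rng(1)
N_SENSE, N_CONTEXT = 5, 4
W = rng.normal(0, 0.4, (N_SENSE + N_CONTEXT, N_SENSE + 1 + N_CONTEXT))

def rnn_step(p, x, c):
    # Stand-in for a trained one-step predictor (cf. the previous sketch).
    out = np.tanh(W @ np.concatenate([p, [x], c]))
    return out[:N_SENSE], out[N_SENSE:]

def look_ahead(p0, c0, branching_plan):
    """Offline look-ahead: close the loop by feeding each predicted
    sensation back in as the next input, instead of actual sensory input.
    This is the robot's 'mental simulation' of an action program."""
    p, c, imagined = p0, c0, []
    for x in branching_plan:
        p, c = rnn_step(p, x, c)
        imagined.append(p)
    return np.array(imagined)

plan = [1, 1, 0, 0, 1, 1, 1]   # e.g., the figure-8 action program 1100111
print(look_ahead(np.zeros(N_SENSE), np.zeros(N_CONTEXT), plan).shape)  # (7, 5)
```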

After exploring the workspace for about 1 hour (see the exact trajectories shown in Figure 7.2a) and undergoing offline learning for one night, the robot's performance for online one-step prediction was tested.

In the evaluation after the learning phase, the robot was tested for its predictive capacity during navigation of the workspace. It navigated the workspace from arbitrarily set initial positions by following an arbitrary action program of branching and tried to predict the upcoming sensory inputs at the next branching point from the sensory inputs at the current branching point. Figure 7.2b presents an instance of this process, wherein the left panel shows the trajectory of the robot as observed and the right panel shows a comparison between the actual sensory sequence and the predicted one. The figure shows nine steps of the branching sequence; the leftmost five units represent sensory input, the next five units represent the predicted state for the next step, the following unit is the action command (branching into 0 or 1), and the rightmost four units are the context units. Although the robot initially could not make correct predictions, it became increasingly accurate after the fourth step. Because the context units were initially set randomly, the prediction failed at the very beginning. However, as the robot continued to travel, sensory input sequences "entrained" context activations into the normal, steady-state transition sequence, after which the RNN became capable of producing correct predictions.

We repeated this experiment with various initial settings (different initial positions and different action programs), and the robot always started to produce correct predictions within 10 branch steps. We also found that although the context was easily lost when perturbed by strong noise in the sensory input (e.g., when the robot failed to detect a branch and ended up in the wrong place), the prediction accuracy was always recovered as long as the robot continued to travel. This autorecovery feature of the cognitive process is a consequence of the fact that a certain coherence, in terms of the close matching between the internal prediction dynamics and the environment dynamics, emerges during their interaction.

Figure 7.2. Trajectories of Yamabico (a) during exploration, (b) during online one-step prediction (left) and comparison between the actual sensory sequence and its corresponding one-step prediction (right), and (c) generated after offline look-ahead prediction (left) and comparison between an actual sensory sequence and its look-ahead prediction (right). Adopted from Tani (1996) with permission.


Once the robot was "situated" in the environment by the entrainment process, it was able to perform multistep look-ahead prediction from branching points. A comparison between a look-ahead prediction and the actual sensory sequence during travel is shown in Figure 7.2c. The arrow in the workspace in the left panel of the figure denotes the branching point where the robot performed look-ahead prediction for an action program represented by the branching sequence 1100111. The robot, after conducting look-ahead prediction, actually traveled following the action program, generating a figure-8 trajectory. The right panel in the figure shows a comparison between the actual sensory input sequence and its look-ahead prediction associated with the action program and the context activation sequence. It can be seen that the look-ahead prediction agrees with the actual sequence. It is also observed that the context values, as well as the prediction of sensory input, at the initial and final steps are almost the same. This indicates that the robot predicted its return to the initial position at the end step in its "mental" simulation of traveling along a figure-8 trajectory. We repeated this experiment of look-ahead prediction for various branching sequences and found that the robot was able to predict sensory sequences correctly for arbitrary action programs in the absence of severe noise affecting the branching sequence.

Finally, the robot was instructed to generate action plans (branching sequences) for reaching a particular goal (position) specified by a sensory image. In the planning process, the robot searched for adequate action sequences that could be used to reach the target sensory state in the look-ahead prediction of sensory sequences from the current state while minimizing the estimated travel distance to the goal. Figure 7.3 shows the result of one particular trial.

In the example in Figure 7.3, the robot generated three different action plans, each of which was actually executed. The figure shows the three corresponding trajectories successfully reaching a given goal from a starting position in the adopted workspace. Although the third trajectory might look redundant due to an unnecessary loop, the creation of such trajectories suggests that a sort of compositional mechanics in the forward dynamics of the RNN had developed as a result of consolidation learning. This self-organized mechanism enabled the robot to generate diverse navigational plans, as if segments of images obtained during actual navigation were combined by following acquired rules.
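In spirit, such planning can be sketched as a search over candidate branching sequences evaluated with the learned forward model. The code below is a toy version under my own assumptions: the forward model is a random-weight stand-in, the search is exhaustive over short binary plans, and the cost combines goal mismatch with a simple length penalty in place of the estimated travel distance.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
N_SENSE, N_CONTEXT = 5, 4
W = rng.normal(0, 0.4, (N_SENSE + N_CONTEXT, N_SENSE + 1 + N_CONTEXT))

def rnn_step(p, x, c):
    # Stand-in for the trained forward model (one branching event).
    out = np.tanh(W @ np.concatenate([p, [x], c]))
    return out[:N_SENSE], out[N_SENSE:]

def plan_to_goal(p0, c0, goal, max_steps=7):
    """Search branching sequences with look-ahead prediction and return
    the plan whose predicted final sensation comes closest to the goal.
    Exhaustive search is workable only because plans here are short
    binary strings."""
    best, best_cost = None, np.inf
    for n in range(1, max_steps + 1):
        for plan in product([0, 1], repeat=n):
            p, c = p0, c0
            for x in plan:
                p, c = rnn_step(p, x, c)
            cost = np.linalg.norm(p - goal) + 0.05 * n   # mismatch + length penalty
            if cost < best_cost:
                best, best_cost = plan, cost
    return best, best_cost

goal = np.tanh(rng.normal(size=N_SENSE))   # a goal specified as a sensory image
print(plan_to_goal(np.zeros(N_SENSE), np.zeros(N_CONTEXT), goal))
```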

Some may consider that the process of goal-directed planning by the RNN is analogous to that of GPS, described in section 2.2, because the forward prediction by the RNN of the next sensory state for actions taken in each situation of the robot seems to play the same role as that of the causal rule described for each situation in the problem space of GPS. However, there are crucial differences between the former, functioning in a continuous state space, and the latter, functioning in a discrete state space. We will come to understand the significance of these differences through the following analysis.

Figure 7.3. The result of goal-directed planning. Trajectories corresponding to three different generated action programs are shown. Adopted from Tani (1996) with permission.

7.1.2 Analysis of the Acquired Neurodynamic Structure

After the preceding experiment, I thought that it would be interesting to see what sorts of attractors or dynamical structures emerged as a result of self-organization in the RNN and its coupling with the environment, as well as how such attractors could explain the observed phenomena, such as the look-ahead prediction of combinatorial branching sequences and the autorecovery of internal contexts by environmental entrainment. Therefore I conducted a phase-space analysis of the obtained RNN to examine its dynamical structure, as was shown for the Rössler attractor in chapter 5. One difference was that time integration by the forward dynamics of the RNN required feeding external inputs, in the form of branching action sequences, into the network. Therefore, the RNN in the closed-loop mode was dynamically activated for thousands of steps while being fed random branching sequences (1s and 0s). Then, the activation values of two representative context units were plotted for all steps, excluding the transient part corresponding to the first several hundred steps. It was like looking at trajectories from the mental simulation of thousands of consecutive steps of random branching sequences in the workspace while ignoring the initial transient period of state transitions. The resultant plot can be seen in Figure 7.4.
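The procedure is easy to reproduce in outline. Below, a random-weight network again stands in for the trained RNN (an assumption): it is run closed-loop under random branching inputs for thousands of steps, the transient is discarded, and two context activations are collected per step. Plotting `points` would give a picture analogous to Figure 7.4.

```python
import numpy as np

rng = np.random.default_rng(3)
N_SENSE, N_CONTEXT = 5, 4
W = rng.normal(0, 0.6, (N_SENSE + N_CONTEXT, N_SENSE + 1 + N_CONTEXT))

def rnn_step(p, x, c):
    # Stand-in one-step predictor with a context loop.
    out = np.tanh(W @ np.concatenate([p, [x], c]))
    return out[:N_SENSE], out[N_SENSE:]

# Drive the closed-loop network with random branching decisions,
# discard the transient, and keep two context units per step.
p, c = np.zeros(N_SENSE), np.zeros(N_CONTEXT)
points, transient, total = [], 500, 5000
for step in range(total):
    p, c = rnn_step(p, rng.integers(0, 2), c)
    if step >= transient:
        points.append((c[0], c[1]))
points = np.array(points)
print(points.shape, points.min(axis=0), points.max(axis=0))
```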

We can see a set of segments (Figure 7.4a). Moreover, a magnification of a particular segment shows an assembly of points resembling a Cantor set (Figure 7.4b). The plot represents the invariant set of a global attractor, as the assembly appears in the same shape regardless of the initial values of the context units or the exact sequences of randomly determined branching decisions. This means that the context state initialized with arbitrary values always converged toward steady-state transitions within the invariant set after some transient period. It was found that, after convergence was reached, the context state shifted from one segment to another at each step, and moreover that each segment corresponded to a particular branching point. Additionally, an analysis of the aforementioned experiments for online prediction revealed that, whenever the predictability of the robot was lost due to perturbations, the context state left the invariant set.

Figure 7.4. Phase space analysis of the trained RNN. (a) An invariant set of an attractor appeared in the two-dimensional context activation space. (b) A magnification of a section of the space in (a). Adopted from Tani (1996) with permission.


However, the perturbed context state always returned to the original invariant set after several branching steps, because the invariant set had been generated as a global attractor. Our repeated experiments with different robot workspace configurations revealed that the observed properties of the RNN are repeatable and therefore general.

7.1.3 Is the Problem of Symbol Grounding Relevant?

Given that the context state shifted from one segment to another in the invariant set in response to branching inputs, we can consider that what the RNN reproduced in this case was exactly an FSM consisting of nodes representing branching points and edges corresponding to transitions between these points, as shown in Figure 2.2. This is analogous to what Cleeremans and colleagues (1989) and Pollack (1991) demonstrated by training RNNs with symbol sequences characterized by FSM regularities. Readers should note, however, that the RNNs achieve much more than just reconstructing an equivalent of the target FSM.

First, each segment observed in the phase space of the RNN dynamics is not a single node but a set of points, namely a Cantor set spanning a metric space. The distance between two points in a segment represents the difference between past trajectories arriving at the node. If the two trajectories come from different branching sequences, they arrive at points in the segment that are also far apart. On the other hand, if the two trajectories come from exactly the same branching sequences, after passing through an infinite number of steps, except for the initial branching points, they arrive at arbitrarily close neighbors in the same segment. Theoretically speaking, a set of points in the segment constitutes a Cantor set with fractal-like structures because this infinite number of points should be capable of representing the history of all possible combinations of branching (this can be proven by taking into account the theorem of iterative function switching [Kolen, 1994] and random dynamical systems [Arnold, 1995]). This fractal structure is actually a signature of compositionality, which has appeared in the phase space of the RNN by means of iterative random shifts of the dynamical system triggered by given input sequences of random branching. Interestingly, Fukushima and colleagues (2007) recently showed supportive biological evidence, from electrophysiological recording data, that CA1 cells in the rat hippocampus encode sequences of episodic memory with a similar fractal structure.
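The way iterated contractive maps write a history into a Cantor set can be seen in one dimension. The sketch below is a standard textbook construction, not the RNN itself: each branching decision applies one of two contractions, and the resulting point encodes the whole decision history, with recent decisions determining coarse position and older ones ever finer detail.

```python
def ifs_point(branchings):
    """Iterated function system with two contractive maps, one per branching
    decision: f0(x) = x/3 and f1(x) = x/3 + 2/3. Iterating them drives x into
    the middle-thirds Cantor set, and the final point's position encodes the
    entire input history."""
    x = 0.5
    for b in branchings:              # oldest decision first
        x = x / 3 if b == 0 else x / 3 + 2 / 3
    return x

print(ifs_point([0, 1, 1, 0, 1]))
print(ifs_point([1, 1, 1, 0, 1]))    # differs only in the oldest decision,
                                     # so the two points are close neighbors
```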


Second, the observed segments cannot be manipulated or represented explicitly as symbols attached to nodes in an FSM. They just appear as a dynamic closure¹ as a result of the convergent dynamics of the RNN. The nature of global convergence of the context state toward steady-state transitions within the invariant set as a dynamic closure can afford global stability to the predictive dynamics of the RNN. An illustration of this concept appears in Figure 7.5.

On the other hand, in the case of an FSM, there is no autorecovery mechanism against perturbations in the form of invalid inputs, because the FSM provides only a description of steady-state transitions within the graph and cannot account for how to recover such states from dynamically perturbed ones. As mentioned earlier, when an FSM receives invalid symbols (e.g., unexpected sensations during navigation), it simply halts operation. The discussion here is analogous to that in chapter 5 about the advantages of utilizing dissipative dynamic systems rather than sinusoidal functions for stably generating oscillatory patterns.

1. It is called a dynamic closure because the state shifts only between points in the set of segments in the invariant set (Maturana & Varela, 1980).

Figure 7.5. The dynamic closure of steady-state transitions organized as an attractor (solid arrows), associated with a convergent vector flow (dashed arrows).


It is essential to understand that there is no homunculus that looks at and manipulates representations or symbols in the proposed approach. Rather, there are just iterations of a dynamical system whereby compositionality emerges. Ultimately, it can be said that this system is not affected by the symbol grounding problem because there are no "symbols" to be grounded to begin with, at least not internally.

Before moving on, I should mention some drawbacks to this approach. The current scheme utilizing the forward model is limited to small-scale problems because of the frame problem discussed in section 7.1. The model worked successfully because the navigation environment was small and the branching scheme preprogrammed at the lower level simplified the navigation problem. Although how robots can acquire lower-level sensory-motor skills such as branching or collision-free maneuvering from their own direct sensory-motor experiences is quite an important problem, we did not address it in this study. Another problem concerns the intentionality of the robot. What the experiment showed is so-called latent learning, in which an agent learns an internal model of the environment via random exploration without any intentions. If the robot attempts to learn about all possible exploration experiences without any intentions or goals to achieve, such learning will face the combinatorial explosion problem sooner or later. We return to these issues in later sections.

The next section explores how sensory prediction learning and phenomena of self-consciousness could be related, by reviewing the results of another type of robot navigation experiment.

7.2. Predictive Dynamics and Self-Consciousness

This section examines how the notion of "self" or self-consciousness could emerge in artificial systems, as well as in human cognitive minds, through a review of further robotics experiments on prediction learning in navigation, extended from the aforementioned Yamabico experiments. The following robotics experiments clarify the essential role of sensory prediction mechanisms in the possible development of self-consciousness, as presumed in the earlier chapters.

Although the experiments with Yamabico described in the previous section revealed some interesting aspects of contextual predictive dynamics, they still miss some essential features, one of which is the utilization of prediction error signals. The error signal is considered to be a crucial cue for recognizing a gap between the subjective image and objective reality. Recent evidence from neuroscience has revealed brain waves related to prediction error, as in the case of mismatch negativity, and it is speculated that they are used for fast modification of ongoing brain processes. Also, Yamabico did not have a particular bias or attention control for acquiring sensory input. It would naturally be expected that the addition of some attention control mechanism would reinforce our proposed framework of top-down prediction and expectation versus bottom-up recognition. Therefore, we introduced a visual system with an attention control mechanism in the robot platform that succeeded Yamabico. Finally, it would be interesting to incorporate such a system with dynamic or incremental learning of experiences rather than looking at the result of one-time offline "batch" learning, as in the case of Yamabico. Our findings in these robotics experiments, enriched with these new elements, suggest a novel interpretation of concepts such as the momentary self and the minimal self, which correspond to ideas developed by William James (1982) and Martin Heidegger (1962).

7.2.1 Landmark-Based Navigation Performed by a Robot with Vision

I built a mobile robot with vision provided by a camera mounted on a rotating head, as shown in Figure 7.6a (Tani, 1998). The task of this robot was to learn to dynamically predict landmark sequences encountered while navigating a confined workspace.

After a successful learning process, the robot was expected to be able to use its vision to recognize landmarks in the form of colored objects and corners within a reasonable amount of time before colliding with them, while navigating the workspace by following the wall and the edge between the wall and the floor. It should be noted that the navigation scheme did not include branching as in the case of Yamabico, because the learning of compositional navigational paths was not the focus of research in this robot study.

Figure 7.6. A vision-enabled robot and its neural architecture. (a) A mobile robot featuring vision is looking at a colored landmark object. (b) The neural network architecture employed in the construction of the robot. Adopted from Tani (1998) with permission.

The robot was controlled by the neural network architecture shown in Figure 7.6b. The entire network consisted of parts responsible for prediction (performed by an RNN) and parts responsible for perception, the latter being divided into "what" and "where" pathways, thereby mimicking known visual cortical structures. In the "what" pathway, visual patterns of landmarks corresponding to colored objects were processed in a Hopfield network, which can store multiple static patterns by using multiple fixed-point attractors. When a perceived visual pattern converged toward one of the learned fixed-point attractors, the pattern was recognized and its categorical output was generated by a winner-takes-all activation network, known as a Kohonen network. Learning was initiated for both the Hopfield and Kohonen networks whenever a visual stimulus was encountered. In the "where" pathway, accumulated encoder readings of the left and right wheels from the last encountered landmark to the current one, and the directions of the detected landmarks in frontal view, were processed by the Kohonen network, in which its categorical outputs were generated. From both pathways together, "what" categories of visual landmark objects and "where" categories of the relative travel distance from the last landmark to the current one, as well as the corresponding direction determined by the camera orientation, were sent for prediction in a bottom-up manner.

In the prediction process, the RNN learned to predict, in a top-down manner, the perceptual categories of "what" and "where" for landmarks to be encountered in the future. Note that there were no action inputs to this RNN because there was no branching in the current setting. In this model, the bottom-up and top-down pathways did not merely provide inputs and outputs to the system. Rather, they existed for their mutual interactions, and the system was prepared for expected perceptual categories in the top-down pathway before actually encountering the landmarks. This expectation ensured that the system was ready for the next arriving pattern in the Hopfield network and was prepared to direct the camera toward the landmark with correct timing and direction. Actual recognition of the landmark objects was established by dynamic interactions between the two pathways. This means that if the top-down prediction of the visual pattern failed to match the currently encountered one, the perception would result in an illusion constituting a combination of the two patterns. Moreover, a mismatch in the "where" perceptual category could result in failure to attend to any of the expected landmarks to be recognized. Such misrecognition outcomes were fed into the RNN, and the next prediction was made on this basis. Note that the RNN was capable of engaging in "mental rehearsal" of learned sequential images by constructing a closed loop between the prediction outputs and the sensation inputs, in the same way as Yamabico.


A particular mechanism for internal parameter control was implemented to achieve an adequate interactive balance between the top-down and bottom-up pathways. The mechanism exerted more top-down pressure on the two perceptual categories ("what" and "where") as the error between the predicted perception and its actual outcome decreased. A shorter time period was also allocated for reading the perceptual outcomes in the Hopfield network in this case. On the other hand, less top-down pressure was exerted when the error between the predicted perception and its actual outcome was larger, and a longer time period was allowed for dynamic perception in the Hopfield network. In other words, in the case of fewer errors, top-down prediction dominated the perception, whereby attention was quickly turned to upcoming expected landmarks, which resulted in quick convergence in the Hopfield network. Otherwise, the bottom-up pathway dominated the perception, taking longer to look for landmarks while waiting for convergence in the Hopfield network.
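The shape of such a control law can be sketched in a few lines. The sigmoid mapping from error to top-down "pressure," the blending of prediction and sensation, and the settle-time formula below are all my own assumptions, chosen only to make the qualitative behavior concrete.

```python
import numpy as np

def topdown_gain(error, steepness=8.0, midpoint=0.5):
    """Map prediction error to a top-down pressure in [0, 1]: small error
    gives gain near 1 (prediction dominates perception); large error gives
    gain near 0 (bottom-up input dominates). The sigmoid form is assumed."""
    return 1.0 / (1.0 + np.exp(steepness * (error - midpoint)))

def perceive(prediction, sensation, error):
    g = topdown_gain(error)
    percept = g * prediction + (1.0 - g) * sensation   # blended percept
    settle_time = 5 + int(50 * (1.0 - g))              # longer perception when error is high
    return percept, settle_time

pred, sens = np.array([0.9, 0.1]), np.array([0.2, 0.8])
for err in (0.1, 0.5, 0.9):
    percept, t = perceive(pred, sens, err)
    print(f'error={err}: gain={topdown_gain(err):.2f}, settle steps={t}')
```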

Learning by the RNN was conducted for event sequences associated with encountering landmarks. More specifically, experienced sequences of perceptual category outcomes were used as target sequences to be learned. Incremental training of the RNN was conducted after every 15th landmark by adopting a scheme of rehearsal and consolidation, so that phenomena such as "catastrophic forgetting" could be avoided. RNNs lose previously learned memory content quite easily when new sequences are learned, thereby altering acquired connection weights. Therefore, in the new scheme, the RNN "rehearsed" previously learned content with the closed-loop operation and stored the generated sequences in the "hippocampus" (corresponding to short-term memory) together with the newly acquired sequences, and catastrophic forgetting of existing memory was avoided by retraining the RNN with both the rehearsed sequences and the newly experienced ones.
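The data-mixing logic of this scheme fits in a few lines. The sketch below is purely schematic: the closed-loop rollout and the retraining step are stubs of my own invention, standing in for the RNN's rehearsal and its actual weight updates.

```python
import numpy as np

rng = np.random.default_rng(4)

def consolidate(rollout, train_fn, new_sequences):
    """Avoid catastrophic forgetting by retraining on the union of
    rehearsed (regenerated old) and newly experienced sequences,
    rather than on the new sequences alone."""
    rehearsed = [rollout(15) for _ in range(len(new_sequences))]  # closed-loop rehearsal
    return train_fn(rehearsed + new_sequences)                    # retrain on both

# Stubs so the sketch runs: a rollout that emits remembered-looking
# sequences and a training step that just reports its data mix.
rollout = lambda length: rng.uniform(0, 1, (length, 5))
train = lambda data: f'retrained on {len(data)} sequences'
new = [rng.uniform(0, 1, (15, 5)) for _ in range(3)]
print(consolidate(rollout, train, new))
```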

This rehearsal and consolidation might correspond to dreaming during the REM sleep phase reported in the literature on consolidation learning (Wilson & McNaughton, 1994; Squire & Alvarez, 1995). It has been considered that generalization of our knowledge proceeds significantly through consolidating newly acquired knowledge with older knowledge during sleep. Our robot actually stopped for a rest when this rehearsal and consolidation learning was taking place after every fixed period. However, in reality the process would not be so straightforward as this if the rehearsed and the newly acquired experiences were to conflict with each other. One of the aims behind the next experiment I will describe was to examine this point.

7.2.2 Intermittency During Dynamic Learning

The experiment was conducted in a confined workspace containing five landmarks (two colored objects and three corners). It was repeated three times, and in each trial the robot circled the workspace about 20 times, a limit imposed by the battery life of the robot. We monitored three characteristic features of the robot's navigation behavior in each run: the prediction error, the bifurcation of the RNN dynamics due to iterative learning, and phase plots representing the attractor dynamics of the RNN at particular times during the bifurcation process. A typical example is shown in Figure 7.7a.

The prediction error was quite high at the beginning of all trials because of the initially random connection weights. After the first learning period, predictability improved to a certain extent in all three trials, but the errors were not eliminated completely. Prediction failures occurred intermittently in the course of the trials, and we can see from the bifurcation diagram that the dynamical structure of the RNN varied. In the typical example shown in Figure 7.7a, a fixed-point attractor appearing in the early periods of the learning iterations is plotted as a single point at each step in the bifurcation diagram, in most cases before the third learning period. After the third learning period, a quasiperiodic or weakly chaotic region appears. Then, after the fourth learning period, it becomes a limit cycle with a periodicity of 5, as can be seen from the five points plotted in the bifurcation diagram at each step during this period. In addition, a snapshot is shown in the phase plot containing five points. After the fifth learning period, a highly chaotic region appears, as indicated by the strange attractor in the corresponding phase plot.

Importantly, the state alternates between the strange attractor (chaos) and the limit cycle attractor with a periodicity of 5. In fact, limit-cycle dynamics with a periodicity of 5 appeared most frequently in the course of all trials. A periodicity of 5 is indicative because it corresponds to the five landmarks that the robot encountered in a single turn around the workspace. Indeed, the five points represent a dynamic closure for the steady-state transitions between these five landmarks. However, it should be noted that this limit cycle with a periodicity of 5


does not remain stationary, because the periodicity disappears at times and other dynamical structures emerge. The dynamic closure observed in the current experiment is not stable but changes in the course of dynamic learning. From the viewpoint of symbolic dynamics (see chapter 5), this can be interpreted as meaning that, during rehearsal, the robot could mentally simulate various symbolic sequence structures for encountering landmark labels, including deterministic symbol sequences with a periodicity of 5 and sequences with probabilistic state transitions.

Figure 7.7. Experimental results for a vision robot. (a) Prediction error, bifurcation diagram of the RNN dynamics, and phase plots for two context units at particular times during the learning process. (b) The robot's trajectories as recorded in the unsteady and steady phases. Adapted from Tani (1998) with permission.

From these results, we can conclude that there were two distinct phases: a steady-state phase represented by the limit-cycle dynamics with a periodicity of 5, and an unsteady phase characterized by nonperiodic dynamics. We also see that transitions between these two phases took place arbitrarily over the course of time, and that differences appeared concurrently in the physical movements of the robot. To clarify why this happened, we compared the actual robot trajectories observed in these two phases. Figure 7.7b shows the robot trajectories measured in these two phases with a camera mounted above the workspace. The trajectory was more winding in the unsteady phase than in the steady phase, particularly in the way objects and corners were approached. From this it was inferred that the robot's maneuvers were more unstable in the unsteady phase because it spent more time on the visual recognition of objects due to the higher prediction error. So the robot faced a higher risk of misdetecting landmarks when its trajectory meandered during this period, which was indeed the case in the experiments. In the steady phase, however, the detection sequence of landmarks became more deterministic and travel was smooth, with greater prediction success. What is important here is that these steady and unsteady dynamics were attributable not only to the internal cognitive processes arising in the neural network, but were also expressed in the physical movements of the robot's body as it interacted with the external environment.

Finally, we measured the distribution of interval steps between catastrophic error peaks (error > 0.5) observed in three different experiments with the robot (Figure 7.8).

The graph indicates that the distribution of the breakdown intervals has a long-tail characteristic with a power-law-like profile. This indicates that the shift from the steady to the unsteady phase takes place intermittently, without a dominant periodicity. The observed intermittency might be due to the tangency developed in the whole dynamics (see section 5.1). The observation here might also be analogous to the phenomenon of so-called chaotic itinerancy (Tsuda et al., 1987; Ikeda et al., 1989; Kaneko, 1990; Aihara et al., 1990), in which state trajectories tend to visit multiple pseudoattractors one by one, itinerantly, in a particular class of networks consisting of dynamic elements. Tsuda and colleagues (1987) showed that intermittent chaos mechanized by means of tangency in nonlinear mapping (see section 5.1) generated the chaotic itinerancy observed in their memory dynamics model.
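The interval statistics behind Figure 7.8 are straightforward to reproduce from an error trace. The sketch below, with assumed octave bin edges matching the figure's log-scale axes, extracts the intervals between catastrophic peaks and bins them for log-log inspection.

```python
import numpy as np

def breakdown_intervals(errors, threshold=0.5):
    """Return the step intervals between catastrophic error peaks (error > 0.5)."""
    peaks = np.flatnonzero(np.asarray(errors) > threshold)
    return np.diff(peaks)

def loglog_histogram(intervals, bins=(2, 4, 8, 16, 32, 64, 128)):
    """Count intervals in octave bins, as in Figure 7.8 (both axes log scale).
    Counts falling on a roughly straight line across these bins on log-log
    axes would suggest the power-law-like, long-tail profile discussed here."""
    counts, _ = np.histogram(intervals, bins=bins)
    return dict(zip([f"{lo}-{hi}" for lo, hi in zip(bins[:-1], bins[1:])],
                    counts))
```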

The robotics experiment described in this section has demonstrated that phenomena similar to chaotic itinerancy can also emerge in the learning dynamics of a network model coupled with a physical environment. Dynamic learning processes unfolding during interaction with the outer environment can generate complex trajectories that alternate between stabilization of the memory contents and their breakdown.

7.2.3 Accounting for the “Minimal Self”

An interesting observation from the last experiment is that the transitions between steady and unsteady phases occurred spontaneously, even though the workspace environment was static. In the steady phase, coherence is achieved between the internal dynamics and the environmental dynamics when subjective anticipation agrees closely with observation. All the cognitive and behavioral processes proceed smoothly and automatically, and no distinction can be made between the subjective mind and the objective world. In the unsteady phase, this distinction becomes rather explicit, as conflicts in terms of the prediction error are generated between the expectations of the subjective mind and the outcomes generated in the objective world. Consequently, it is at this moment of incoherence that the "self-consciousness" of the robot arises, whereby the system's attention is directed toward the conflicts to be resolved. On the other hand, in the steady phase, the "self-consciousness" is reduced substantially, as there are no conflicts demanding the system's attention.

Figure 7.8. Distribution of interval steps between catastrophic prediction error peaks greater than 0.5. The x axis represents the interval steps and the y axis the frequency of appearance in the corresponding range; both axes are in log scale.

This interpretation of the experimental observations corresponds to the aforementioned analysis in Heidegger's example of the hammer missing the nail (see section 3.4), as well as to James' concept of the stream of consciousness (see section 3.6), in which the inner stream consists of transient and substantive parts, and the self can become consciously aware momentarily in the discrete event of breakdown. With reference to the Scottish philosopher David Hume, Gallagher (2000) considered that this "momentary self" is in fact a "minimal self," which should be distinguished from the self-referential self or narrative self provided with a past and a future in the various stories that we tell about ourselves.

However, one question still remains for us to address here: Why could the coherence in the steady phase not last longer, and why did the breakdown into incoherence take place intermittently? It seems that the complex time evolution of the system emerged from mutual interactions between multiple local processes. It was observed that changes in the visual attention dynamics due to changes in predictability caused drifts in the robot's maneuvers. These drifts resulted in misrecognition of upcoming landmarks, which led to modification of the dynamic memory stored in the RNN and a consequent change in predictability. Dynamic interactions took place as chain reactions with certain delays among the processes of recognition, prediction, perception, learning, and acting, wherein we see the circular causality between the subjective mind and the objective world. This circular causality might then provide a condition for developing a certain criticality.

The aforementioned circular causality can be explained more intuitively as follows. When the learning error decreases as learning proceeds, stricter timing of visual recognition is required for upcoming landmarks, because only a short period is allowed for recognition of the objects, proportional to the magnitude of the current error. In addition, the top-down image for each upcoming landmark pattern is shaped into a fixed one, without variance. This is because the same periodic patterns are learned repeatedly and the robot tends to trace exactly the same trajectories in the steady phase. If all goes completely as expected, this strictness grows as the prediction error decreases further. Ultimately, at the peak of strictness, catastrophic failure in the recognition of landmark sequences can occur as a result of even minor noise perturbations, because the entire system has evolved too rigidly by building up relatively narrow and sharp top-down images.

The described phenomena remind me of a theoretical study conducted on sandpile behavior by Bak and colleagues (1987). In their simulation study, grains of sand were dropped onto a pile, one at a time. As the pile grew, its sides became steeper, eventually reaching a critical state. At that very moment, just one more grain would have triggered an avalanche. I consider that this critical state is analogous to the situation generating catastrophic failures in recognizing the landmarks in the robotics experiment. Bak found that although it is impossible to predict exactly when an avalanche will occur, the sizes of the avalanches are distributed in accordance with a power law. The natural growth of the pile to a critical state is known as self-organized criticality (SOC), and it has been found to be ubiquitous in various other phenomena as well, such as earthquakes, volcanic activity, the Game of Life, landscape formation, and stock markets. A crucial point is that the evolution toward a certain critical state itself turns out to be a stable mechanism in SOC. It is as if a critical situation such as "tangency" (see section 5.1) can be preserved with structural stability in the system. This seems to be possible in systems with relatively large dimensions allowing local nonlinear interactions inside (Bak et al., 1987).
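The model Bak and colleagues studied is simple enough to restate compactly. The sketch below implements the standard Bak-Tang-Wiesenfeld toppling rule; the grid size, grain count, and toppling threshold of 4 are conventional choices, not values from the text. Plotting a histogram of the returned avalanche sizes on log-log axes should reveal the power-law distribution.

```python
import numpy as np

def sandpile_avalanches(size=50, grains=20000, seed=0):
    """Bak-Tang-Wiesenfeld sandpile: drop grains one at a time onto a grid
    and record avalanche sizes (number of topplings each drop triggers)."""
    rng = np.random.default_rng(seed)
    grid = np.zeros((size, size), dtype=int)
    sizes = []
    for _ in range(grains):
        x, y = rng.integers(size, size=2)   # drop one grain at a random site
        grid[x, y] += 1
        topples = 0
        # Relax until every site is below the toppling threshold of 4.
        while True:
            unstable = np.argwhere(grid >= 4)
            if unstable.size == 0:
                break
            for i, j in unstable:
                grid[i, j] -= 4             # the site topples...
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if 0 <= ni < size and 0 <= nj < size:  # edge grains fall off
                        grid[ni, nj] += 1   # ...passing grains to its neighbors
                topples += 1
        sizes.append(topples)
    return sizes
```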

Although we might need a larger experimental dataset to confirm the presence of SOC in the observed results, I speculate that some dynamic mechanisms for generating criticality could be responsible for the autonomous nature of the "momentary self," which James metaphorically spoke of as an alternation of periods of flight and perching throughout a bird's life. Here, the structure of consciousness responsible for generating the momentary self can be accounted for by emergent phenomena resulting from the aforementioned circular causality.

Incidentally, readers may wonder how we can appreciate a robot with such fragility in its behavior, characterized by SOC; the robot could "die" by crashing into the wall due to a large fluctuation at any moment. I argue, however, that the potential for an authentic robot arises from this very fragility (Tani, 2009), remembering what Heidegger said about the authentic being of man, who resolutely anticipates death as his own-most possibility (see section 3.4). Following Heidegger, the vivid "nowness" of a robot might be born in this criticality as a consequence of the dynamic interplay between looking ahead to the future for possibilities and regressing to the conflictive past through reflection. In this, the robot may ultimately achieve authentic being in terms of its irreplaceable behavioral trajectories.

Finally, we may ask whether the account provided so far could open a new pathway to access the hard problem of consciousness characterized by Chalmers (see section 4.3). I would say "yes," by observing the following logic. The top-down pathway of predicting perceptual event sequences exemplifies subjectivity because it is developed solely along with the first-person experiences of perceptual events accumulated through iterative interactions in the objective world. Subjectivity is not a state but a dynamic function of predicting the perceptual outcomes resulting from interactions with the objective world. If this is granted, consciousness, as the first-person awareness of one's own subjectivity, can originate only from a sense of discomfort in one's own predictability, that is, the prediction error,2 which is also first-person experience but at another level, of the second order (where the contents of prediction are the first order). Subjectivity as a mirror of the objective world cannot become aware just by itself alone. It requires differentiation from the objective world as another pole by means of interacting with it. To this end, the subject and the object turn out to be an inseparable entity by means of the circular causality between them, wherein the open dynamics characterized by intermittent transitions between the predictable steady phase and the conflictive unsteady one emerges. As such, this interpretation of the experimental results reviewed in this chapter provides insight into the fundamental structure of consciousness, rather than merely into a particular state of consciousness or unconsciousness at a moment.

2. Recently, Karl Friston (2010) proposed that a likelihood measure, the prediction error divided by an estimate of its variance, can represent the "surprise" of the system. This measure might quantify the state of consciousness better than the error alone.

7.3. Summary

This chapter introduced two robotics experiments on prediction learning in the navigation domain, utilizing mobile robots, with a focus on how robots can acquire subjective views of the external world through iterative interactions with it. The first experiment focused on the problem of learning to extract compositionality from sensory–motor experiences and on the grounding of that compositionality. The experimental results using the Yamabico robot showed that the compositionality hidden in the topological trajectories in the obstacle environments can be extracted by the predictive model instantiated by an RNN. The navigation of the robot became inherently robust because the mechanism of autorecovery was supported by the development of a global attractor in the RNN dynamics. We concluded that symbol-like structures self-organized in neurodynamic systems can be naturally grounded in the physical environment by allowing active interactions between them in a shared metric space.

The second experiment addressed the phenomenological problem of the "self" by further extending the aforementioned robot navigation experiments. In this new experiment, a vision-based mobile robot implemented with an RNN model learned to predict landmark sequences experienced during its dynamic exploration of the environment. It was shown that the developmental learning process during the exploration switched spontaneously between coherent phases (when the top-down prediction agrees with the bottom-up sensation) and incoherent phases (when conflicts appear between the two). By investigating possible analogies between this result and the phenomenological literature on the self, we drew the conclusion that the open dynamic structure characterized by SOC can account for the underlying structure of consciousness through which the "momentary self" appears autonomously.

It is interesting to note that, although I emphasized the grounding of the subjective image of the world in the first navigation experiment, the second experiment suggested that the momentary self could appear instead in the sense of the groundlessness of subjectivity. The apparent gap between these two originates from two different research attitudes toward exploring cognitive minds, which are revisited in later chapters. One drawback of the models presented for robot navigation in this chapter is that they could not provide the robots with direct experience of perceptual flow, because the models operated in an event-based manner that was designed and programmed by the experimenters. The next chapter introduces a set of robotics experiments focusing on mirror neuron mechanisms in which we consider how event-like perception develops out of the continuous flow of perceptual experience, as related to the phenomenological problem of time perception.


8

Mirroring Action Generation and Recognition with Articulating Sensory–Motor Flow

In the physical world, everything changes continuously in time, like a river flows; discontinuity is just a special case. Sensory–motor states change continuously, and neural activation states in essential dimensions do so too, as Churchland observed (2010; also see section 4.3). If this is granted, one of the most difficult questions in understanding the sensory–motor system is how continuous sensory–motor flows can be recognized as well as generated structurally, that is, recognized as segmented into "chunks" as well as generated with articulation. According to the motor schemata theory proposed by Michael Arbib (1981), a set of well-practiced motor programs or primitives is stored in long-term memory, and different combinations of these programs in space and time can generate a variety of motor actions. Everyday actions, such as picking up a mug to drink some coffee, can be generated by concatenating different chunks or behavioral schemes: the vision system attending to the mug, the hand approaching the handle of the mug in the next chunk, followed by the hand gripping the handle in the final chunk. Similarly, Yasuo Kuniyoshi proposed that complex


human actions can be recognized by structurally segmenting the visual perceptual flow into concatenations of reusable patterns (Kuniyoshi et al., 1994). Kuniyoshi and colleagues (2004) also showed in a psychological experiment that recognizing the timing of such segmentation is essential for extracting crucial information about the observed action.

The problem of segmentation is also closely related to the aforementioned phenomenological problem of time perception considered by Husserl, which concerns the question of how a flow of experiences at the preempirical level can be consciously recalled in the form of articulated objects or events at the objective time level (section 3.2). Please note that we did not address this problem in the previous experiments with the Yamabico robot, because segmentation of sensory flows was mechanized by the hand-coded program for branching; Yamabico received sequences of discontinuous sensory states at each branching point.

In this chapter, our robots have to deal with a continuous flow of sensory–motor experiences. We then investigate how these robots can acquire a set of behavioral schemes and how such schemes can be used for recognizing as well as generating whole complex actions by segmenting or articulating the sensory–motor flow. I presume that mirror neurons are integral to such processes because I speculate that they encode basic behavior schemes in terms of predictive coding (Rao & Ballard, 1999; Friston, 2010; Clark, 2015) that can be used for both recognition and generation of sensory–motor patterns, as mentioned previously (see section 4.2). This chapter develops this idea into a synthetic neurorobotics model.

The following sections introduce our formulation of the basic dynamic neural network model for the mirror neuron system. The formulation is followed by neurorobotics experiments utilizing the model in a set of cognitive behavior tasks, including the creation of novel patterns via learning a set of behavior patterns, imitative learning, and the acquisition of actional concepts via associative learning between a quasilanguage and motor behaviors. The analysis of these experimental results provides us with some insight into how the interaction between the top-down prediction/generation process and the bottom-up recognition process can achieve segmentation of a continuous perceptual flow into meaningful chunks, and how the distributed representation schemes adopted in the model can enhance the generalization of learned behavioral skills, knowledge, and concepts.


8.1. A Mirror Neuron Model: RNNPB

In this section, we examine a dynamic neural network model, the recurrent neural network with parametric biases (RNNPB), that my colleagues and I (Tani, 2003; Tani et al., 2004) proposed as a possible model to account for the underlying mechanism of mirror neurons (Rizzolatti et al., 1996). The RNNPB model adopts the distributed representation framework, by way of which multiple behavioral schemes can be memorized in a single network by sharing its neural resources. This contrasts with the local representation framework, in which each memory content is stored separately in a distinct local module network (Wolpert & Kawato, 1998; Tani & Nolfi, 1999; Demiris & Hayes, 2002; Shanahan, 2006).

In the RNNPB, the input of a low-dimensional static vector, the parametric bias (PB), represents the intention for the action to be enacted. The RNNPB generates a prediction of the perceptual sequence for the outcome of enacting the intended action. The RNNPB can model the mirror neuron system in an abstract sense because the same PB vector value accounts for both generation and recognition of the same action in terms of the corresponding perceptual sequence pattern. This idea corresponds to the aforementioned concept of the predictive model in the parietal cortex associated with mirror neurons shown in Figure 4.6. From the viewpoint of dynamical systems, the PB vector is considered to play the role of a bifurcation parameter in nonlinear dynamical systems, as the PB shifts the dynamic structure of the RNN for generating different perceptual sequences. Let's look at the detailed mechanism of the model (Figure 8.1).

The RNNPB can be regarded as a predictive coding or generative model whereby different target perceptual sequence patterns pt (t = 0, …, l − 1) can be learned for regeneration as mapped from the corresponding PB vector values. The PB vector for each learning sequence pattern is determined autonomously, without supervision, by utilizing the error signals back-propagated to the PB units, whereas the synaptic weights (common to all patterns) are determined during the learning process, as shown in Figure 8.1a. Readers should note that the RNNPB can avoid the frame problem described in section 4.2 because the dynamic mapping to be learned is not from arbitrary actions to perceptual outcomes at each time step but from a specific set of actional intentions to the corresponding perceptual sequences. This makes the learning process feasible because the network is trained not for all possible combinatorial trajectories but only for selected ones.

Figure 8.1. The system flow of the recurrent neural network with parametric biases (RNNPB) in (a) learning mode, (b) top-down generation mode, where the intention is set externally in the PB, and (c) bottom-up recognition mode, wherein the intention in the PB is inferred by utilizing the back-propagated error.

After the learning is completed, the network is used both for generating (predicting) and for recognizing perceptual sequences. The learned perceptual sequences can be regenerated by means of the forward dynamics of the RNNPB, with the PB set to the values determined in the learning process (see Figure 8.1b). This is the top-down generation process, with the corresponding actional intention represented by the PB. Perceptual sequences can be generated and predicted either in the open-loop mode, by receiving the current perceptual inputs from the environment, or in the closed-loop mode, wherein motor imagery is generated by feeding back the network's own prediction outputs into the inputs (the dotted line in the figure indicates the feedback loop).

On the other hand, experienced perceptual sequences can be recognized by searching for the optimal PB values that minimize the errors between the target sequences to be recognized and the output sequences to be generated, as shown in Figure 8.1c. This is the bottom-up process of inferring the intention, in terms of the PB, for the given perceptual sequences. As an experiment described later shows, generation of action and recognition of the resultant perceptual sequences can be performed simultaneously. More specifically, behavior is generated by predicting the change in posture in terms of proprioception, depending on the current PB, while the PB is updated in the direction of minimizing the prediction error for each incoming perceptual input. By this means, the intention–perception cycle can be achieved in the RNNPB, whereby the circular causality between intention and perception appears. Note also that both action learning and generation are formulated as dynamic processes for minimizing the prediction error (Tani, 2003), a formulation that is analogous to the free-energy principle proposed by Karl Friston (2005; 2010).

Here, I should explain the learning process more precisely, because its mechanism may not be entirely intuitive. When learning commences, the PB vector of each training sequence is set to a small random value. The forward top-down dynamics initiated with this temporarily set PB vector generates a predictive sequence for the training perceptual sequence. The error generated between the target training sequence and the output sequence is back-propagated along the bottom-up path, iterated backward through time steps via the recurrent connections, whereby the connection weights are modified in the direction of minimizing the error signal. The error signal is also back-propagated to the PB units, whose values for each training sequence are modified accordingly. Here, we see that the learning proceeds through dense interactions between the top-down regeneration of the training sequences and the bottom-up regression of the regenerated sequences utilizing the error signals. The internal structures for embedding multiple behavior schemata can be gradually developed through this type of bottom-up and top-down interaction by self-organizing a distributed representation in the network.
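The whole scheme can be condensed into a small sketch. The following is an illustrative reconstruction under assumed details (an Elman-style recurrent core, sigmoid outputs, Adam optimization), not the original implementation; it shows how the prediction error trains the shared weights together with one PB vector per sequence, and how recognition later regresses only the PB.

```python
import torch
import torch.nn as nn

class RNNPB(nn.Module):
    """Sketch of an RNNPB: a recurrent core whose input is the current
    percept concatenated with a static parametric bias (PB) vector."""
    def __init__(self, dim_p=4, dim_pb=2, dim_h=30, n_sequences=5):
        super().__init__()
        self.cell = nn.RNNCell(dim_p + dim_pb, dim_h)
        self.readout = nn.Linear(dim_h, dim_p)
        # One learnable PB vector per training sequence.
        self.pb = nn.Parameter(0.01 * torch.randn(n_sequences, dim_pb))

    def forward(self, p0, pb, steps, targets=None):
        """Predict a perceptual sequence from initial percept p0 under PB.
        Closed-loop if targets is None (own predictions fed back, i.e.,
        mental rehearsal); open-loop otherwise (actual percepts fed in)."""
        h = p0.new_zeros(p0.shape[0], self.cell.hidden_size)
        p, outputs = p0, []
        for t in range(steps):
            h = self.cell(torch.cat([p, pb], dim=-1), h)
            p = torch.sigmoid(self.readout(h))
            outputs.append(p)
            if targets is not None:
                p = targets[:, t]        # open-loop: feed in the real percept
        return torch.stack(outputs, dim=1)

def train(model, sequences, epochs=2000, lr=0.01):
    """Learning phase: the back-propagated prediction error adapts the
    shared weights and the per-sequence PB vectors simultaneously."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = sum(((model(s[:, 0], model.pb[k:k + 1], s.shape[1] - 1,
                           targets=s[:, 1:]) - s[:, 1:]) ** 2).mean()
                   for k, s in enumerate(sequences))   # s: (1, T, dim_p)
        loss.backward()
        opt.step()

def recognize(model, observed, iters=200, lr=0.1):
    """Recognition phase: the weights stay fixed (they receive gradients but
    are not updated); only a fresh PB vector is regressed to minimize the
    error against the observed sequence."""
    pb = torch.zeros(1, model.pb.shape[1], requires_grad=True)
    opt = torch.optim.Adam([pb], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        pred = model(observed[:, 0], pb, observed.shape[1] - 1,
                     targets=observed[:, 1:])
        ((pred - observed[:, 1:]) ** 2).mean().backward()
        opt.step()
    return pb.detach()
```

In this sketch the same forward pass serves generation, rehearsal, and recognition; the only thing that changes between phases is which variables receive gradient updates.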

It is also important to note that the generation of sequence patterns is not limited to the trained ones. The network can create a variety of similar or novel sequential patterns depending on the values of the PB vector. It is naturally assumed that similar PB vectors would generate similar sequence patterns, whereas dissimilar ones could generate quite different patterns. The investigation of these characteristics is one of the highlights in the study of the current model, characterized by its distributed representational nature. The following subsections detail such characteristics of the RNNPB model through robotics experiments using it.
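One simple way to probe this property, continuing the hypothetical sketch above (the model and its interface are assumptions, not the original code), is to sweep a grid of PB values and inspect the closed-loop pattern each point generates:

```python
import torch

p0 = torch.zeros(1, 4)                    # assumed neutral initial posture
patterns = {}
for pb1 in torch.linspace(0.0, 1.0, 5):
    for pb2 in torch.linspace(0.0, 1.0, 5):
        pb = torch.stack([pb1, pb2]).unsqueeze(0)      # shape (1, 2)
        with torch.no_grad():
            # Closed-loop generation: no targets, outputs fed back.
            patterns[(pb1.item(), pb2.item())] = model(p0, pb, steps=40)
# Plotting or clustering `patterns` charts the PB mapping, as in Figure 8.2.
```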


8.2. Embedding Multiple Behaviors in Distributed Representation

A simple experiment involving learning a set of target motor behaviors was conducted to examine the PB mapping, the structure of which emerges as a result of self-organization through the learning process. The PB mapping shows how points in the PB vector space are mapped to the sequence patterns to be generated after learning a set of target patterns. In this experiment, an RNNPB was trained on five different movement patterns of a robotic arm with four degrees of freedom. The five target movement patterns, in terms of four-dimensional proprioceptive (joint angle) sequence patterns, are shown in Figure 8.2.

Teach-(1, 2, and 3) are discrete movements with different end points, and Teach-(4 and 5) are different cyclic movements. The arrows associated with these sequence patterns indicate the corresponding PB vector points determined in two-dimensional space during training. It can be seen that the PB vectors for all three discrete movement patterns appear in the upper right region and the PB vectors for the two target cyclic movement patterns appear in the lower right region of PB space, which was found to be divided into two regions (the boundary is shown as a dotted curve in Figure 8.2).

The area above the dotted curve is the region for generating discrete movements, and the remaining area under the dotted curve is for cyclic movement patterns (including nonperiodic ones). An important observation is that the characteristic landscape is quite smooth in the region of discrete movements, whereby if the PB vector is changed slightly, the destination point of the discrete movement changes only slightly. In particular, inside the triangular region defined by the three PB points corresponding to the trained discrete movements, the profiles of all generated sequence patterns seem to be produced by interpolation of these three trained sequence patterns. On the other hand, the characteristic landscape in the region of periodic movement patterns is quite rugged against changes in the PB values. The profiles of generated patterns could change drastically with changes in the PB vector in this region. Patterns generated from this region could include a variety of novel patterns, such as Novel-(1 and 2) shown in Figure 8.2. Novel-2 is a nonperiodic pattern that is especially difficult to imagine as being derived from the profiles of the training patterns.



Figure 8.2. Mapping from PB vector space with two- dimensional principal components to the generated movement pattern space.


One interesting observation here is that two qualitatively distinct regions appeared, namely the discrete movement part and the cyclic movement part, the latter including nonperiodic patterns. The former successfully achieves generalization in terms of interpolation of the trained sequence patterns, presumably because it is easy to extract common structures shared by the three trained discrete movements, which exhibit fixed-point dynamics with various destination points. In the latter case, by contrast, it is difficult to achieve generalization because structures shared between the two cyclic movement patterns, with their different shapes, periodicities, and amplitudes, are difficult to extract. This results in a highly nonlinear landscape in this region due to the embedding of quite different dynamic patterns in the same region. In such a highly nonlinear landscape, diverse temporal patterns can be created by changing the PB vector.

The aforementioned experimental result fits very well with James's thought (James, 1892) that when the memory hosts complex relations or connections between images of past experiences, images can be regenerated with spontaneous variations into streams of consciousness (see section 3.6). James predicted this type of phenomenon without conducting any experiments or simulations, but only from formal introspection. Now that we have covered the basic characteristics of the RNNPB model, the following subsections introduce a set of cognitive robotics experiments utilizing the RNNPB model with a focus on mirror neuron functions. First, the next subsection looks at the application of the RNNPB model to a robot task of imitation learning.

8.3. Imitating Others by Reading Their Mental States

In section 5.2, I briefly explained the development of imitation behavior, with emphasis on its early stage, in which the imitation mechanism is accounted for by simple stimulus response. I also introduced a robot study by Gaussier and colleagues (1998) showing that robots can generate synchronized imitation with other robots using acquired visuo-proprioceptive mapping under the homeostasis principle. Rizzolatti and colleagues (2001) characterized the neural mechanism at this level as response facilitation without understanding meaning. Experimental results using monkeys indicated that the same motor neurons in the rostral part of the inferior parietal cortex are activated both when a monkey generates meaningless arm movements and when it observes them.

Also, as mentioned in section 4.2, it was observed that the same F5 neurons in monkeys fire when purposeful motor actions, such as grasping an object, holding it, and bringing it to the mouth, are either generated or observed. The neural mechanism at this level is called response facilitation with understanding meaning (Rizzolatti et al., 2001), which is considered to correspond to the third stage of the "like me" mechanism hypothesized by Meltzoff (2005). In this stage, "my" mental state can be projected onto those of others who act "like me." I consider that our proposed mechanism for inferring the PB states in the RNNPB can account for the "like me" mechanism at this level. Let's look here at the results of a robotics experiment that my team conducted to elucidate how the recognition of another's actional intention can be mirrored in one's own generation of the same action, wherein the focus again falls on the online error regression mechanism used in the RNNPB model (Ito & Tani, 2004; Ogata et al., 2009).

8.3.1 Model and Robot Experiment Setup

This experiment on imitative interactions between robots and humans was conducted using the Sony humanoid robot QRIO (Figure 8.3).

In the learning phase of this experiment, the robot learns multiple hand movement patterns demonstrated by the experimenter. The RNNPB learns to predict how the positions of the experimenter's hands (perceived as a visual image) change in time, in terms of a dynamic mapping from vt to vt+1. Simultaneously, the network also learns, in an imitative manner, to predict how its own arms (4 DOF joints for each arm) move in correspondence with the observed movements performed by the experimenter. This prediction takes the form of a dynamic mapping of arm proprioception from pt to pt+1, acquired through direct training performed by a teacher who guides the movements of the robot's arms by moving them directly while following the experimenter's hand movements. The tutoring is conducted for each movement pattern by determining its corresponding PB vector for encoding. In the interaction phase, when one of the learned movement patterns is demonstrated by the experimenter, the robot is expected to recognize it by inferring an optimal PB vector for reconstruction of the movement pattern, through which its own corresponding movement pattern may be generated. When the experimenter switches freely between demonstrated hand movement patterns, the movement patterns generated by the robot should change accordingly by inference of the optimal PB vector.

8.3.2 Results: Reading Others’ Mental States by Segmenting Perceptual Flow

In the current experiment, after the robot was trained on four different movement patterns, it was tested in terms of its dynamic adaptation to sudden changes in the patterns demonstrated by the experimenter. Figure 8.4 shows one of the obtained results in which the experimenter switched demonstrated movement patterns twice during a trial of 160 steps.

It can be seen that when the movement pattern demonstrated by the experimenter shifted from one of the learned patterns to another, the visual and proprioceptive prediction patterns also changed correspondingly, accompanied by stepwise changes in the PB vector. Here, it can be seen that the continuous perceptual flow was segmented into chunks of different learned patterns via sudden changes in the PB vector, mechanized by bottom-up error regression. This means that the RNNPB was able to read the transitions in the mental states of the experimenter by segmenting the flow.

Figure 8.3. The Sony humanoid robot QRIO employed in the imitation learning experiment. Reproduced from Tani et al. (2004) with permission.

Figure 8.4. Dynamic changes in the movement patterns generated by the robot, triggered by changes in the movements demonstrated by the experimenter. The time evolution profile of the perceived position of the experimenter's hand and the profile predicted by the robot are shown in the first and second rows, respectively. The third and fourth rows show the time profiles for the predicted proprioception (joint angles) of the robot's arm and the PB vectors, respectively. Adapted from Tani et al. (2004) with permission.
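The online regression that produces these stepwise PB changes can be sketched as follows, again reusing the hypothetical RNNPB class from section 8.1; the window length and learning rate are assumptions.

```python
import torch

def online_pb_step(model, window, pb, lr=0.1):
    """One step of online error regression (a sketch): regress the PB vector
    against the most recent perceptual window, shape (1, T, dim), while the
    connection weights stay fixed. Persistent error drives step-like PB
    shifts that segment the flow into chunks."""
    pb = pb.detach().clone().requires_grad_(True)
    pred = model(window[:, 0], pb, window.shape[1] - 1, targets=window[:, 1:])
    loss = ((pred - window[:, 1:]) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        pb = pb - lr * pb.grad
    return pb.detach(), loss.item()  # the loss doubles as a segmentation signal
```

A sustained rise in the returned loss marks a segmentation point: the moment at which the current PB no longer explains the flow and is pushed to a new value.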

There was an interesting finding that connects the ideas of compositionality and segmentation. When the same robot was trained on a long sequence that consisted of periodic switching between two different movement patterns, the whole sequence was encoded by a single PB vector, without segmentation. This happened because every step of the trained sequence was perfectly predictable, including the moments of switching between the movement patterns, owing to the exact periodicity of the tutored sequence. When everything becomes predictable, all moments of perception belong to a single chunk without segmentation. Compositionality entails potential unpredictability because there is always some arbitrariness, perhaps by "free will," in combining a set of primitives into a whole. Therefore, segmentation of the whole compositional sequence into primitives can be performed by using the resultant prediction error. In this situation, what is read from the experimenter's mind might be his or her "free will" in alternating among primitive patterns.

The aforementioned results accord with the phenomenology of time perception. Husserl assumed that the subjective experience of "nowness" is extended to include the fringes of both the experienced past and the future, in terms of retention and protention, as described in section 3.3. This description of retention and protention at the preempirical level seems to correspond directly to the forward dynamics undertaken by RNNs (Tani, 2004). RNNs perform prediction by retaining the past flow in a context-dependent way. This self-organized contextual flow of the forward dynamics in RNNs could be responsible for the phenomenon of retention. Even if Husserl's notion of nowness in terms of retention and protention is understood as corresponding to contextual dynamics in RNNs, the following question still remains: What are the boundaries of nowness?

The idea of segmentation could be the key to answering this question. Our main idea is that nowness is bounded where the flow of experience is segmented (Tani, 2004). In the RNNPB model, when the external perceptual flow cannot be matched with the internal flow corresponding to the anticipated outcome, the resultant error drives PB vector change. When the prediction is not fulfilled, the flow is segmented into chunks, which are no longer just parts of the flow but rather represent events identified as perceptual categories by the PB vector. This identification process takes a certain period of effort, accompanied by "consciousness," because of delays in the convergence of the PB regression dynamics, as observed in the preceding experiments. This might also explain the aforementioned observation by Varela (1999) that the flow of events in the immediate past is experienced just as an impression, which later becomes a consciously retrieved object after undergoing segmentation. Finally, I claim that the projection of "my" mental state onto those of others who act "like me," assumed in the third stage of Meltzoff's (2005) "like me" mechanism, should be accompanied by such a conscious process.

8.3.3 Mutual Imitation Game

The previous experiment involved unidirectional interaction in which only the robot adapted to movements demonstrated by the experimenter. Our next experiment examined the case of mutual interaction by introducing a simple game played by the robot and human subjects. In this new experiment, the robot was trained on four movement patterns by the experimenters, and then human subjects who were unaware of what the robot had learned participated. In the imitation game, the subjects were instructed to identify as many movement patterns as possible and to synchronize their movements with those of the robot through interactions. Five subjects participated in the experiment, and each subject was allowed to interact with the robot for 1 hour.

Although most of the subjects eventually identified all of the movement patterns, the interaction was not trivial for them. If they merely attempted to follow the robot's movement patterns, convergence could not be achieved in most instances because the PB values fluctuated wildly when unpredictable hand movement patterns were demonstrated. Indeed, the robot tended to generate diverse movement patterns due to fluctuations in the PB. Also, if the subjects attempted to execute their desired movement patterns regardless of the robot's movements, the robot could not follow them unless the movement patterns of the subjects corresponded to those already learned by the robot.

The movement patterns of the human and the robot as well as the neural activity (PB units) obtained during interaction in the imitation game are plotted in Figure 8.5 in the same format as in Figure 8.4. We can see that diverse movement patterns are generated by the robot and the human subject, accompanied by frequent shifts during their interactions.

It can be seen that matching by synchronization between the human subject’s movements and the robot’s predictions is achieved after an exploratory phase (see the sections denoted as “Pattern 1” and “Pattern 2” in the figure). However, it was often observed that such matching was likely to break down before a match was achieved for another pattern.

An interesting observation involves the spontaneous switching of initiative between the robot and the subjects. In postexperiment interviews, the subjects reported that when they felt that the robot's movement pattern had become close to theirs, they just kept following the movements passively to stabilize the pattern. However, when they felt that their movements and those performed by the robot could not synchronize, they often initiated new movement patterns, hoping that the robot would start to follow them and eventually synchronize its movements with theirs. This observation is analogous to the turn taking during imitative exchange observed by Nadel (2002), as described in section 5.2.

Another interesting observation was that spontaneous transitions between the synchronized phase and the desynchronized phase tended to occur more frequently in the middle of each session, when the subject was already familiar with the robot's responses to some degree. When the subjects managed to reach a synchronized movement pattern, they tended to keep the attained synchronization for a short period of time to memorize the pattern. However, this synchronization could break down after a while due to various uncertainties in the mutual interactions. Even small perturbations could confuse the subjects if they were not yet fully confident of the robot's repertoire of movement patterns. This, too, can be explained by the mechanism of self-organized criticality (see section 7.2), which can emerge only during a specific period characterized by an adequate balance between predictability and unpredictability in the course of the subjects' developmental learning in the mutual imitation game. Turn taking was observed more frequently during this period. These results imply that vivid communicative exchanges between individuals can appear by utilizing and anticipating such criticality.

Figure 8.5. A snapshot of parameter values obtained during the imitation game. Movement matching by synchronization between the human subject and the robot took place momentarily, as can be seen from the sections denoted as Pattern 1 and Pattern 2 in the plot.

The current experimental results from the imitation game suggest that imitation provides not only the simple function of storing and regenerating observed patterns, but also the richer function of spontaneously generating novel patterns from learned ones through dynamic interactions with others. In this context, we may say that imitation for human beings is a means of developing diverse creative images and actions through communicative interaction, rather than simply a means of mimicking action patterns as demonstrated by others "like me."

The next subsection explores how mirror neurons may function in developing actional concepts through the association of language with action learning.

8.4. Binding Language and Action

In conventional neuroscience, language processing and action processing have been treated as independent areas of research, simply because of the different areas of expertise necessary for conducting studies in each. However, as mentioned in section 4.2, recent reports have shown that understanding words or sentences related to actions may require the presence of the specific motor circuits responsible for generating those actions, and therefore the parts of the brain responsible for language and actions might be interdependent (Hauk et al., 2004; Tettamanti et al., 2005).

According to Chomskian ideas in conventional linguistics, linguistic competence has been regarded as independent of other competencies, including sensory–motor processing (see the argument on the faculty of language in the narrow sense by Hauser, Chomsky, and Fitch [2002] in section 2.1). This view, however, is now being challenged by recent evidence from neuroscience, including the aforementioned studies examining the interdependence between linguistic and other modalities. If everyday experiences involving speech and its corresponding sensory–motor signals tend to overlap during child development, synaptic connections between the two circuits can be reinforced by Hebbian learning, as discussed by Pulvermuller (2005). This suggests the possibility that the meanings of words and sentences, as well as associated abstract concepts, can be acquired in association with related sensory–motor experiences. Researchers working in the area of cognitive linguistics have proposed the so-called usage-based approach (Tomasello, 2009), wherein it is argued that linguistic competency can be acquired through statistical learning of linguistic and sensory–motor stimuli during child development, without the need to assume innate mechanisms such as Chomsky's universal grammar. Analogous to these ideas is the view of Arbib (2012), discussed earlier, that the evolution from dexterous manual behaviors learned by imitation to the anticipated imitation of conventionalized gestures (protolanguage) is reflected in the evolution within the primate line and resulted in humans endowed with "language-ready" brains.

8.4.1 Model

In this context, we consider the possibly interdependent nature of language and motor action in terms of a mirror neuron model. This concept is based on a predictive coding model for linguistic competence, assumed in the extension from Wernicke's area to Broca's area, and another predictive coding model for action competency, assumed in the extension from Broca's area and the parietal cortex to the motor cortex. Broca's area, as a hub connecting these two distinct pathways, is assumed to play the role of unifying the two modalities by mirroring recognition in one modality and generation in the other through a shared intention.

The version of the RNNPB model proposed by Yuuya Sugita and me (Sugita & Tani, 2005) for investigating the task of recognizing a given set of action-related imperative sentences (word sequences) and of generating the corresponding behaviors (sensory-motor sequences) is shown in Figure 8.6.

The model consists of a linguistic RNNPB and a behavioral RNNPB that are interconnected through PB units. The key idea of the model is that the PB activation vectors in both modules are bound to become identical for generating pairs of corresponding linguistic and behavioral sequences via learning. More specifically, in the course of associative learning of pairs of linguistic and behavioral sequences, the PB activation vectors in both modules are updated in the direction of minimizing their differences as well as minimizing the prediction error in both modalities (Figure 8.6a). By using the error signal back-propagated from both modules to the shared PB units, a sort of unified representation between the two modalities is formed through self-organization in the PB activations. After convergence of the bound learning, word sequences shown to the linguistic RNNPB can be recognized by inferring the PB activation values by means of error regression. Thereafter, the forward dynamics of the behavioral RNNPB, activated with the obtained PB activation values, generate a prediction of the corresponding sensory-motor sequences (Figure 8.6b).

Figure 8.6. RNNPB model extended for language-behavior bound learning. (a) Bound learning of word sequences and corresponding sensory-motor sequences through shared PB activation and (b) recognition of word sequences in the linguistic recurrent neural network with parametric biases (RNNPB) and generation of corresponding sensory-motor sequences in the behavioral RNNPB. Redrawn from Tani et al. (2004).
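
To make the binding mechanism concrete, below is a minimal sketch, not the original implementation: it assumes a single PB vector shared by two toy generative modules and regresses that vector along the summed prediction-error gradients of both modalities (the text above describes two PB vectors whose mutual difference is also minimized; collapsing them into one shared vector is a simplification). All network sizes, weights, and target sequences are hypothetical stand-ins, and the connection-weight learning of the real model is omitted.

```python
# Minimal sketch of language-behavior PB binding: one shared PB per training
# pair is pulled by the prediction-error gradients of both modules.
import numpy as np

rng = np.random.default_rng(0)
PB_DIM, H_DIM, OUT_DIM, SEQ_LEN = 6, 20, 4, 10   # hypothetical dimensions

def make_module():
    return (rng.normal(0.0, 0.3, (H_DIM, H_DIM)),    # recurrent weights
            rng.normal(0.0, 0.3, (H_DIM, PB_DIM)),   # PB-to-hidden weights
            rng.normal(0.0, 0.3, (OUT_DIM, H_DIM)))  # readout weights

def forward(module, pb):
    W_h, W_pb, W_o = module
    h, out = np.zeros(H_DIM), []
    for _ in range(SEQ_LEN):
        h = np.tanh(W_h @ h + W_pb @ pb)  # PB biases the dynamics at every step
        out.append(W_o @ h)
    return np.array(out)

def error(module, pb, target):
    return np.sum((forward(module, pb) - target) ** 2)

def pb_grad(module, pb, target, eps=1e-4):
    g = np.zeros_like(pb)
    for i in range(PB_DIM):               # numerical gradient keeps the
        d = np.zeros(PB_DIM)              # sketch free of autodiff machinery
        d[i] = eps
        g[i] = (error(module, pb + d, target)
                - error(module, pb - d, target)) / (2 * eps)
    return g

linguistic, behavioral = make_module(), make_module()
word_target = rng.normal(size=(SEQ_LEN, OUT_DIM))    # stand-in word sequence
motor_target = rng.normal(size=(SEQ_LEN, OUT_DIM))   # stand-in sensory-motor sequence
pb = np.zeros(PB_DIM)                                # shared PB for this pair

for _ in range(200):  # PB regression: both modalities pull on the same vector
    pb -= 0.001 * (pb_grad(linguistic, pb, word_target)
                   + pb_grad(behavioral, pb, motor_target))
```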

8.4.2 Robot Experiments

Yuuya Sugita and I (Sugita & Tani, 2005) conducted robotics experiments on this model by utilizing a quasilanguage, with the aim of gaining insights into how humans acquire compositional knowledge about action-related concepts through close interactions between linguistic inputs and related sensory-motor experiences. We also addressed the issue of generalization in learning linguistic concepts, namely the inference of the meanings of as yet unknown combinations of words through a generalization capability, which relates to the “poverty of the stimulus” problem (Chomsky, 1980) in human language development.

A physical mobile robot equipped with vision and a one-DOF arm was placed in a workspace in which red, blue, and green objects were always located to the left, in front, and to the right of the robot, respectively (Figure 8.7).


Figure 8.7. Robot experiment setup for language-behavior bound learning. (a) The task environment with the mobile robot in the home position and three objects in front of the robot. (b) A trained behavior trajectory for the command “hit red.” Adapted from Tani et al. (2004) with permission.


A set of sentences composed from three verbs (point, push, hit) and six nouns (left, center, right, red, blue, green) was considered. For example, “push red” means that the robot is to move to the red object and push it with its body, and “hit left” means that the robot is to move to the object on its left and hit it with its arm (Figure 8.7b). Note that “red” and “left” are synonymous in the setting of this workspace, as are “blue” and “center” and as are “green” and “right.” For given combinations of verbs and nouns, the corresponding actions, in terms of sensory-motor sequences composed of more than 100 steps, were trained by guiding the robot while introducing slight variations in the positions of the three objects with each trial. The sensory-motor sequences consist of sensory inputs in the form of several visual feature vectors and the motor torque values of the arm and wheel motors, together with motor outputs for the two wheels and the one-DOF arm. To investigate the generalization capabilities of the robot, especially in the case of linguistic training, only 14 out of the 18 possible sentences were trained, as enumerated in the sketch below. This means that the behavioral categories corresponding to the four untrained sentences were learned without being bound to sentences.
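
For concreteness, the combinatorics of this training split can be reconstructed as a short sketch; the four withheld sentences below are the ones identified later in the text, and the enumeration itself is the only substance here.

```python
# Hypothetical reconstruction of the sentence set and the held-out split
# used to probe generalization (3 verbs x 6 nouns = 18 sentences).
verbs = ["point", "push", "hit"]
nouns = ["left", "center", "right", "red", "blue", "green"]
sentences = [f"{v} {n}" for v in verbs for n in nouns]   # 18 combinations
held_out = {"push red", "push left", "point green", "point right"}
trained = [s for s in sentences if s not in held_out]    # 14 trained sentences
assert len(sentences) == 18 and len(trained) == 14
```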

8.4.3 Compositionality and Generalization

Recognition and generation tests were conducted after convergence in learning was attained by minimizing the error. Corresponding behaviors were successfully generated for all 18 sentences, including the four untrained ones. To examine the internal structures emerging as a result of self-organization in the bound learning process, an analysis of the PB mapping was conducted by taking the first two principal components of the original six-dimensional PB space, as sketched below. Figure 8.8 shows the PB vector points corresponding to all 18 sentences as plotted in this two-dimensional space.
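
The projection used in this analysis can be sketched as follows, assuming a hypothetical array of the six-dimensional PB vectors recovered for the 18 sentences; PCA is computed via a singular value decomposition of the centered data.

```python
# Sketch of the PB-space analysis: project the six-dimensional PB vectors
# for all 18 sentences onto their first two principal components, as in
# Figure 8.8. pb_vectors is a placeholder for the recognized PB states.
import numpy as np

pb_vectors = np.random.default_rng(1).random((18, 6))    # hypothetical data
centered = pb_vectors - pb_vectors.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)  # rows of vt = PCs
pb_2d = centered @ vt[:2].T                              # 2-D map coordinates
print(pb_2d.shape)                                       # (18, 2)
```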

These PB points were obtained as a result of the recognition of the corresponding sentences. The PB vector points for the four untrained word sequences are surrounded by dashed circles in the figure. First, it can be seen that PB points corresponding to sentences with the same verbs followed by synonymous nouns appeared close to each other on the two-dimensional map. For example, “hit left” and “hit red” appeared close to each other in the space. Even more interesting is that the PB mappings for all 18 sentences appeared in the form of a two-dimensional grid structure with one dimension for verbs and another for nouns.


This means that an adequate metric space emerged in the PB mapping through self-organization, one that can be used for compositional representation of the acquired meanings in terms of combinations of verbs and object nouns. Furthermore, it should be noted that even the untrained sentences (“push red/left” and “point green/right”) were mapped to appropriate points on the grid (see the points surrounded by dotted circles in Figure 8.8). This explains why untrained sentences were recognized correctly, as inferred from the successful generation of the corresponding behaviors.

These results imply that meanings are acquired through generalization when a set of meanings is represented as a distribution of neural activity while preserving the mutual relationships between meanings in a binding metric space. Such generalization cannot be expected to arise if each meaning or concept is stored in a separate local module, as is the case in localist models. It is postulated that mutual interactions between different concepts during learning processes can eventually induce the consolidation of generalized structures in the memory structure, as represented earlier in the form of a two-dimensional distribution. This idea is analogous to what the PDP group (1986) argued in their connectionist book more than two decades ago (see section 5.4).

Figure 8.8. Mapping from PB vector points to generated word sequences. The two-dimensional grid structure consists of an axis for verbs and another for nouns. Four PB points surrounded by dotted circles correspond to untrained sentences (push red, push left, point green, and point right). Redrawn from Sugita and Tani (2005).

Finally, I would like to add one more remark concerning the role of language in developing a compositional conceptual space. When the aforementioned experiments were conducted without binding the linguistic inputs in learning the same set of action categories, we found that nine different clusters corresponding to the different actional categories developed without showing any structural relations among them, such as the two-dimensional grid structure in the PB space illustrated earlier. This result suggests that the compositionality explicitly perceived in the linguistic input channel can enhance the development of compositionality in the actional channel via shared neural activity, perhaps, again, within Broca’s area of the human brain.

8.5. Summary

We’ve now covered RNNPB models that can learn multiple behavioral schemes as distributed structures within a single RNN. The model is characterized by the PB vector, which plays an essential role in modeling mirror neuron functions in both the generation and recognition of movement patterns by forming adequate dynamic structures internally through self-organization. The model was evaluated through a set of robotics experiments involving the learning of multiple movement patterns, the imitation learning of others’ movement patterns, and the generation of actional concepts via associative learning of protolanguage and behavior.

The hallmark of these robotics experiments lies in their attempt to explain how generalization in learning, as well as creativity in generating diverse behavioral patterns, can be achieved through self-organizing distributed memory structures. The contrast between the proposed distributed representation scheme and the localist scheme is clear in this context. On the localist scheme, each behavioral schema is memorized as an independent template in a corresponding local module, whereas on the distributed representation scheme, learning is considered to include not just memorizing each template of behavioral patterns but also reconstructing them by extracting the structural relationships between the templates. If there are tractable relationships between the learned patterns in a set, these relationships should appear in the corresponding memory structures as embedded in a particular metric space. Such characteristics of distributed representation in the RNNPB model have been investigated by others as well (Ogata et al., 2006; Ogata et al., 2009; Zhong et al., 2014).

These characteristics were demonstrated clearly in the analysis of the PB mappings obtained from learning a set of movement patterns and from learning bound linguistic and behavioral patterns. The RNNPB model learned a set of experienced patterns not just as they were, but consolidated them deeply, resulting in the emergence of novel or “creative” images. This observation might account for a fascinating mechanism of human cognition by which we humans can develop images or knowledge through multiple stages from our own limited experiences: In the first stage, each instance of experience is acquired; in the second stage, generalized images or concepts are developed by extracting relational structures among the acquired instances; and in the third stage, even novel or creative ones can be found in the memory developed with these relational structures after a long period of consolidation.

Another interesting characteristic of the model is that it accounts for both top-down generation and bottom-up recognition processes by utilizing the same acquired generative model. Interactions between these two processes take place in offline learning as well as during real-time action generation/recognition. In offline learning, iterations of top-down and bottom-up interactions enable long-term development of the internal structures for the PB mapping in terms of memory consolidation, as mentioned previously. In real-time action generation/recognition, shifts of the PB vector by means of error regression enable rapid adaptation to situational changes. As observed in the imitative game experiments, nontrivial dynamics emerge in the close interactions between top-down prediction and bottom-up recognition, leading to segmentation of the continuous perceptual flow into meaningful chunks. Complexity arises from the intrinsic characteristics of the mutual interactions occurring in the process, whereby recognition of the actions of others in the immediate past has a profound effect on the actions generated by the robot in the current step, which in turn affects the recognition of these perceptual inputs in the immediate future, thereby forming a circular causality over the continuum of time between protention and retention.

The same error regression mechanism can account for the problem of imitation: How can motor acts demonstrated by others be imitated by reading their intentions or mental states? It was shown that imitating others by inferring their mental states can be achieved by segmenting the resultant perceptual flow, regressing the PB states with the prediction error. This prediction error may cause the subject to become conscious of the shifts in the mental states of others as they alternate their motor acts.

Finally, I assume there might be some concerns about the scalability of the RNNPB model, or more specifically about whether there are limits to the degree of complexity that the learned behavioral patterns can have. Here, I simply note that this scalability issue depends heavily on how functional hierarchies can be developed in the network that can decompose complex patterns into sets of simpler ones, or conversely compose them. Accordingly, the next two chapters and the final chapter of this book are dedicated to the investigation of this problem.


9

Development of Functional Hierarchy for Action

It is generally held that the brain makes use of hierarchical organization for both recognizing sensory inputs and generating motor outputs. As an example, chapter 4 illustrated how visual recognition proceeds in the brain from early signal processing in the primary visual area to object recognition in the inferior temporal area. It also described how action generation proceeds from the sequencing and planning of action primitives in the supplementary motor area and prefrontal cortex (PFC) to motor pattern generation in the primary motor cortex (M1). Although we don’t yet completely understand what hierarchy and what levels exist in the brain and how they actually function, it is generally accepted that some form of functional hierarchy exists, whereby sensory-motor processing is conducted at the lower level and more global control of those processes occurs at the higher level. This functional hierarchy is also thought to be indispensable for expressing the essential human cognitive competency of compositionality, in other words, the composition and decomposition of whole complex action routines from and into reusable parts.


In speculating about possible neuronal mechanisms for a functional hierarchy that allows complex actions to be composed by sequentially combining behavior primitives (a set of commonly used behavior patterns), readers should note that there are various ways to achieve such compositions. One possibility is to use a localist representation scheme. For example, Tani and Nolfi (1997, 1999) proposed a localist model called a “hierarchical mixture” of RNNs; Demiris and Hayes (2002) pursued a similar idea in their Hierarchical Attentive Multiple Models for Execution and Recognition; and Haruno and colleagues (2003) did likewise in the so-called hierarchical MOSAIC. The basic idea was that each behavior primitive is stored in its own independent local RNN at the lower level, and sequential switching of the primitives is achieved by a winner-take-all-type gate-opening control of these RNNs performed by the higher level RNN (see Figure 9.1).

Information processing at the higher level is abstracted in such a way that the higher level only remembers which RNN in the lower level should be selected next, as well as the timing of switching over a longer timescale, without concerning itself with details about the sensory-motor profiles themselves. Although the proposed scheme seems straightforward in terms of mechanizing a functional hierarchy, it faces the problem of miscategorization in dealing with perturbed patterns. In particular, the discrete mechanism of dispatching behavior primitives through winner-take-all selection of the lower RNNs tends to generate a certain level of information mismatch between the higher and lower levels.

Figure 9.1. Hierarchical generation of perceptual sequence patterns in the hierarchical mixture of RNNs. As the higher level RNN dispatches the lower level RNNs sequentially by manipulating the openings of their attached gates, sequential combinations of primitive patterns can be generated.

Another possible mechanism can be considered by utilizing a distributed representation scheme in an extension of the RNNPB model. As I previously proposed (Tani, 2003), if a specific PB vector value is assigned to each acquired behavior primitive, sequential changes in the PB vector generated at the higher level by another RNN can cause corresponding sequential changes in the primitives at the lower level (Figure 9.2).

The higher level RNN learns to predict event sequences in terms of stepwise changes in the PB vector, as well as the timings of such events. However, this scheme could also suffer from a similar problem of information mismatch between the two levels. If one behavior primitive is concatenated to another by corresponding stepwise changes in the PB vector, a smooth connection between the two primitives cannot be guaranteed. A smooth connection often requires some degree of specific adaptation of the profiles at the tail of the preceding primitive and at the head of the subsequent primitive, depending on their combination. However, such fine adaptation cannot take place by simply changing the components of the PB vector in a stepwise manner within the time needed for the primitive to change. The same problem is encountered in the case of gated local network models if primitives are changed by simply opening and closing the corresponding gates.

Figure 9.2. Possible extension of the RNNPB model with hierarchy, wherein sequential stepwise changes in the PB vector at the higher level generate corresponding changes in the primitive patterns at the lower level. Redrawn from Tani (2003).

The crucial point here is that the generation of compositional actions cannot be achieved by simply arranging primitives into sequences in the same manner as manipulating discrete objects. Instead, the task requires fluid transitions between primitives, adapting them via interactions between the top-down parametric control exerted on the primitives and the bottom-up modulation of the signals implementing such parametric control. Close interactions could minimize the possible mismatch between the two sides, whereby we might witness what Alexander Luria (1973) metaphorically referred to as “kinetic melody” in the fluid generation of actions. The following sections show that such fluid compositionality can be achieved without using preexisting mechanisms such as gating and parametric biases. Rather, it can emerge from intrinsic constraints on timescale differences in neural activity between multiple levels in the course of self-organization, accompanied by iterative interactions between the levels in consolidation learning.

In the following, we see how a functional hierarchy that enables compositional action generation can be developed through the use of a novel RNN model characterized by its multiple-timescale dynamics. The model is tested in a task involving learning object manipulation and developing this learning. We then discuss a possible analogy between the observed synthetic developmental processes and real human infant developmental processes. The discussion helps to explain how fluid compositionality can be developed in both humans and artifacts through specific constraints within their brain networks.


9.1. Self-Organization of Functional Hierarchy in Multiple Timescales

9.1.1 Multiple-Timescale Recurrent Neural Network

My colleague Yuuichi Yamashita and I (Yamashita & Tani, 2008) proposed a dynamic neural network model characterized by neural activity on multiple timescales. This model, named the multiple-timescale recurrent neural network (MTRNN), is outlined in Figure 9.3.

The MTRNN consists of interconnected subnetworks to which dynamics with different timescales are assigned. Each subnetwork takes the form of a fully connected continuous-time recurrent neural network (CTRNN) with a specific time constant τ assigned for the purposes of neural activation dynamics, as can be seen in Eq. 17 in section 5.5. The model shown in Figure 9.3 is composed of subnetworks with slow, intermediate, and fast dynamics characterized by leaky-integrator neural units with larger, medium, and smaller values of τ, respectively. Additionally, the subnetwork with fast dynamics is subdivided into two peripheral modular subnetworks, one for proprioception/motor operations and one for vision. Our expectation in the proposed multiple-timescales architecture was that the slow dynamics subnetwork, using large time constant leaky-integrator units, should be good at learning long-time correlations, as indicated by Jaeger and colleagues (2007), whereas the fast dynamics one should be good at learning precise short-range patterns.

Figure 9.3. The multiple-timescale recurrent neural network (MTRNN) model. The left panel shows the model architecture and the right panel the information flow in the case of top-down generation of two different compositional actions, Action A and Action B, as triggered by the corresponding intentions in terms of the initial states Init A and Init B in the intention units, respectively.
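
As a minimal sketch of these dynamics, assuming the standard discrete-time leaky-integrator update used for CTRNNs, u ← (1 − 1/τ)u + (1/τ)(Wa + b), one MTRNN step differs from an ordinary RNN step only in that τ varies across subnetworks. The unit counts below are hypothetical; the τ values follow the robot experiment described later in this chapter.

```python
# Minimal sketch of one MTRNN update step with per-unit time constants.
import numpy as np

rng = np.random.default_rng(0)
sizes = {"fast": 30, "intermediate": 10, "slow": 8}      # hypothetical counts
taus = {"fast": 1.0, "intermediate": 5.0, "slow": 70.0}  # larger tau = slower

n = sum(sizes.values())
W = rng.normal(0, 0.1, (n, n))  # full connectivity; masks can impose structure
b = np.zeros(n)
tau = np.concatenate([np.full(sizes[k], taus[k]) for k in sizes])

def step(u):
    a = np.tanh(u)  # firing rates from internal states
    return (1.0 - 1.0 / tau) * u + (1.0 / tau) * (W @ a + b)

u = np.zeros(n)
for _ in range(100):
    u = step(u)  # slow units integrate long histories; fast units track detail
```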

We designed this particular model to generate, as targets, multiple perceptual sequences containing a set of primitives or chunks acquired through supervised learning. In this case, we made use of the sensitivity of nonlinear dynamics toward initial conditions (see section 5.1) as a mechanism for selecting a specific sequence from among multiple learned ones as the intended one. The network dynamics always start from the same neutral neural states for all units, with the exception of some neural units in the subnetwork with slow dynamics, referred to as intention units. By providing specific initial states for these intention units, the corresponding learned perceptual sequences can be regenerated as intended, and thus the initial states of the intention units play the role of selecting sequences, similar to the role of the PB vectors in RNNPB models. The difference is that selection in the case of the PB is based on parametric bifurcation, whereas in the case of the intention units in MTRNNs it is performed by utilizing the sensitivity of the network dynamics to the initial conditions. We decided to employ a switching scheme based on sensitivity to the initial conditions for the MTRNN because this feature affords the learning of sequence patterns with long time correlations.

Adequate mappings between the respective initial states of the intention units and the corresponding perceptual sequences are acquired by means of the error back-propagation through time learning scheme applied to the CTRNN (Eq. 18 in section 5.5). In the course of error back-propagation learning, two classes of variables are determined, namely the connection weights in all subnetworks and a specific set of initial state values of the intention units for each perceptual sequence to be learned. When learning commences, the initial state of the intention units for each training sequence is set to a small random value. The forward top-down dynamics initiated with this temporarily set initial state generates a predictive sequence for the training visuo-proprioceptive sequence. The error generated between the training sequence and the output sequence is back-propagated along the bottom-up path through the subnetworks with fast and intermediate dynamics to the subnetwork with slow dynamics, and this back-propagation is iterated backward through time steps via recurrent connections, whereby the connection weights within and between these subnetworks are modified in the direction of minimizing the error signal. The error signal is also back-propagated through time steps to the initial state of the intention units, whereby the initial state values for each training sequence are modified. Here, we see again that learning proceeds through dense interactions between top-down regeneration of the training sequences and bottom-up regression of the regenerated sequences utilizing error signals, just as in the RNNPB.
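
Schematically, training therefore adapts two classes of variables, which can be sketched as follows; bptt_grads is a placeholder for the back-propagation-through-time computation, whose details are omitted, and all dimensions are hypothetical.

```python
# Sketch of the two variable classes adapted in training: connection weights
# shared across all sequences, and one intention-unit initial state per
# training sequence.
import numpy as np

rng = np.random.default_rng(0)
n_sequences, n_intention = 3, 8
init_states = [0.01 * rng.standard_normal(n_intention)  # small random start
               for _ in range(n_sequences)]

def train_epoch(weights, sequences, bptt_grads, lr_w=1e-3, lr_u=1e-2):
    """One pass over all training sequences."""
    for k, seq in enumerate(sequences):
        grad_w, grad_u0 = bptt_grads(weights, init_states[k], seq)
        weights = weights - lr_w * grad_w   # shared connection weights
        init_states[k] -= lr_u * grad_u0    # sequence-specific intention state
    return weights
```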

One point to keep in mind here is that the dampening of the error signal in backward propagation through time steps depends on the time constant, as described previously (see Eq. 18 in section 5.5). The dampening is smaller within the subnetwork with slow dynamics (characterized by a larger time constant) and greater within the subnetwork with fast dynamics (characterized by a smaller time constant). This forces the learning process to extract the underlying correlations spanning longer periods of time in the training sequences in the parts of the whole network with slower dynamics, and correlations spanning relatively shorter periods of time in the parts with faster dynamics.

The right panel of Figure 9.3 illustrates how learning multiple perceptual sequences consisting of a set of primitives results in the development of a corresponding functional hierarchy. First, it is assumed that a set of primitive patterns or chunks is acquired in the subnetworks with fast and intermediate dynamics through distributed representation. Next, a set of trajectories corresponding to slower neural activation dynamics appears in the subnetwork with slow dynamics in accordance with the initial state. This subnetwork, whose activity is sensitive to the initial conditions, induces specific sequences of primitive transitions by interacting reciprocally with the intermediate dynamics subnetwork. In the slow dynamics subnetwork, action plans are selected according to intention and are passed down to the intermediate dynamics subnetwork for fluid composition of the assembled primitives in the fast dynamics subnetwork. Note that change in the slow dynamics activity plays the role of a bifurcation parameter for the intermediate and fast dynamics, generating transitions between primitives.


As another function, MTRNNs can generate motor imagery by feeding predicted visuo-proprioceptive states back in as future inputs, analogous to the closed-loop forward dynamics of the RNNPB. Diverse motor imagery can be generated by manipulating the initial state of the intention units; by this means, our MTRNN-driven robots can become self-narrative about their own possibilities for action, as described later. Additionally, MTRNNs can perform both offline and online recognition of perceptual sequences by means of error regression, as in the case of the RNNPB model. For example, prediction errors caused by unexpected visual sensory input due to certain changes in the environment are back-propagated from the visual module of the fast dynamics subnetwork through the intermediate dynamics subnetwork to the intention units in the slow dynamics subnetwork, whereby modulation of the activity of the intention units in the direction of minimizing the errors results in the adaptation of the currently intended action to the changed environment. These functions have been evaluated in a set of robotics experiments utilizing this model, as described later in this chapter.
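
Closed-loop motor imagery can be sketched as below, assuming a hypothetical predict function that stands for one step of a trained MTRNN, mapping the current input and internal state to the predicted next input and next state.

```python
# Sketch of closed-loop motor imagery: the network's own visuo-proprioceptive
# prediction is fed back as the next input, so a sequence unfolds offline
# from a chosen intention (initial) state.
def motor_imagery(predict, u_init, x_init, steps):
    u, x, imagined = u_init, x_init, []
    for _ in range(steps):
        x, u = predict(x, u)   # prediction replaces real sensation
        imagined.append(x)
    return imagined
```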

9.1.2 Correspondence with Neuroscience

Now, let’s revisit our previous discussions and briefly examine the correspondence of the proposed MTRNN model to concepts in system-level neuroscience. Because the neuronal mechanisms for action generation and recognition are still puzzling, owing to clear conflicts between different experimental results as discussed in chapter 4, the correspondence between the MTRNN model and parts of the biological brain can be assessed only in terms of plausibility at best. First, as shown by Tanji and Shima (1994), there is a timescale difference in the buildup of neural activation dynamics between the supplementary motor area (with slower dynamics spanning timescales on the order of seconds) and M1 (with faster dynamics on the order of a fraction of a second) immediately before action generation (see Figure 4.5); therefore, our assumption that the organization of a functional hierarchy involves timescale differences between regional neural activation dynamics should make sense in modeling the biological brain. Along these lines, Kiebel and colleagues (2008), Badre and D’Esposito (2009), and Uddén and Bahlmann (2012) proposed a similar idea to explain the rostral–caudal gradient of timescale differences, assuming slower dynamics at the rostral side (PFC) and faster dynamics at the caudal side (M1) of the frontal cortex to account for a possible functional hierarchy in the region.

Accordingly, the MTRNN model assumes that the subnetwork with slow dynamics corresponds to the PFC and/or the supplementary motor area, and that the modular subnetwork with fast dynamics corresponds to the early visual cortex in one stream and to the premotor cortex or M1 in another stream (Figure 9.4).

The subnetwork with intermediate dynamics may correspond to the parietal cortex, which can interact with both the frontal part and the peripheral part. One possible scenario for the top-down pathway is that the PFC sets the initial state of the slow-dynamics activations assumed in the supplementary motor cortex, which subsequently propagate to the parietal cortex, assumed to exhibit intermediate-timescale dynamics. Activations in the parietal cortex propagate further into the peripheral cortices (the early visual cortex and the premotor or primary motor cortex), whereby detailed predictions of visual sensory input and proprioception, respectively, are made by means of neural activations with fast dynamics.

On the other hand, prediction errors generated in those peripheral areas are propagated backward to the forebrain areas through the parietal cortex via bottom-up error regression in both learning and recognition, assuming of course that the aforementioned retrograde axonal signaling mechanism of brains implements the error back-propagation scheme (see section 5.5). In this situation, the parietal cortex, wedged between the frontal and peripheral parts, plays the role of an information hub that integrates multiple input modalities and motor outputs with the current intention for action. It has been speculated that populations of bimodal neurons in the parietal cortex, which have been shown to encode multiple modalities of information processing, such as vision and motor outputs (Sakata et al., 1995) or vision and somatosensory inputs (Hyvarinen & Poranen, 1974), are the consequence of synaptic modulation accompanied by top-down prediction and bottom-up error regression in the iterative learning of behavioral skills.

Figure 9.4. Possible correspondence of the MTRNN to parts of the biological brain. The solid line represents the top-down prediction pathway (from PFC/SMA via the parietal cortex to motor and vision areas), and the dotted line represents the bottom-up error regression pathway (from vision and parietal areas to PFC/SMA).

It is worth pausing here a moment to think about what the initial states actually mean in the brain. Because the initial states unfold into sequences of behavior primitives, which are expanded into target proprioceptive sequences and finally into motor command sequences, it can be said that motor programs can be represented by the initial states of particular neural dynamics in the brain. Coincidentally, as I was writing this section, Churchland and colleagues published new results from monkey electrophysiological experiments that support this idea (Churchland et al., 2012). They conducted simultaneous recordings of multiple neurons in the motor and premotor cortices while monkeys repeatedly reached in varying directions and at various distances. The collective activities of neuron firings were plotted in a two-dimensional state space from their principal components, in the same way Churchland had done before (see Figure 4.12).

A nontrivial finding was that, after movement onset, the neural activation state exhibited quasirotational movement in the same direction but with different phase and amplitude in the two-dimensional state space for each different case of reaching. The differences in the development of the neural activation state were due to differences in the initial state at the moment of movement onset. Churchland and colleagues interpreted this as follows: The preparatory activity sets the initial state of the dynamic system generating quasirotational trajectories, and the subsequent evolution of those trajectories produces the corresponding movement activity. Their interpretation is quite analogous to the idea Yamashita and I proposed: Motor programs might be represented in terms of the initial states of particular neural dynamical systems. The next section describes a robotics experiment pursuing this line of reasoning utilizing the MTRNN model.


9.2. Robotics Experiments on Developmental Training of Complex Actions

This section shows how the MTRNN model can be used in humanoid robot experiments on learning and generating skilled actions.

9.2.1 Experimental Setup

I conducted the following studies to investigate how a humanoid robot can acquire skills for performing complex actions by organizing a functional hierarchy in the MTRNN through interactive tutoring processes (Yamashita & Tani, 2008; Nishimoto & Tani, 2009). A small humanoid robot, QRIO, was trained on a set of object manipulation tasks in parallel through iterative guidance provided by a teacher. The robot could move its arms by activating joint motors with eight degrees of freedom (DOF) and was also capable of arm proprioception by means of encoder readings for these joints. The robot used a vision camera that automatically tracked a color point placed in the center of the object; the joint angles of the camera head (two DOF) therefore represent visual sensory input corresponding to the object position. The robot was trained on three different tasks in sequence (shown in Figure 9.5), each of which consisted of sequential combinations of different cyclic movement patterns of actions applied to the object.

Figure 9.5. A robot trained on three behavioral tasks, each of which is composed of a sequence of behavior primitives. After the third session, Task 3 was modified, as illustrated by the dotted lines. Adapted from Nishimoto and Tani (2009) with permission.

The training was conducted interactively in cycles of training sessions, meaning that the arms were physically guided to follow adequate trajectories while the robot attempted to generate its own trajectories based on its previously acquired skills. In this sense, it can be said that the actual training trajectories were “codeveloped” by the teacher and the robot. Through this physical guidance, the robot perceived a continuous visuo-proprioceptive (VP) flow without explicit cues for segmenting the flow into primitives of movement patterns. In the course of developmental learning, the robot was trained gradually over five sessions. During each session, all three tasks were repeated while introducing changes in the object position, and the network was trained with all training data obtained during the session. After each training session, offline training of the MTRNN was conducted by utilizing the VP sequences obtained in the process of guidance, in which the connection weights and the initial states of the intention units for all task sequences were updated. Subsequently, the performance of both open-loop physical behavior and closed-loop motor imagery was tested for all three tasks. Novel movement patterns were added to one of the tasks during the development process for the purpose of examining the capability of the network for incremental learning of new behavioral patterns (see Task 3 in Figure 9.5).

The employed MTRNN model consisted of 36 units with fast dynamics for vision and 144 units with fast dynamics for proprioception (τ = 1.0), 30 units with intermediate dynamics (τ = 5.0), and 20 units with slow dynamics (τ = 70.0). The units with slow and intermediate dynamics were fully interconnected, as were the units with fast and intermediate dynamics, whereas the units with slow and fast dynamics were not connected directly, as sketched below. It was assumed that this kind of connection constraint would allow functional phenomena such as information bottlenecks or hubs to develop in the subnetwork with intermediate dynamics.
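
This connectivity constraint can be expressed as a boolean mask over the weight matrix. The sketch below works at the granularity of the three timescale groups, lumping the two fast modules together, which is a simplification of the architecture described above.

```python
# Sketch of the connectivity constraint: slow and fast units are never
# directly connected, making the intermediate subnetwork an information
# bottleneck/hub. Unit counts follow the text (36 + 144 fast, 30
# intermediate, 20 slow).
import numpy as np

counts = {"fast": 36 + 144, "intermediate": 30, "slow": 20}
idx, start = {}, 0
for name in ("fast", "intermediate", "slow"):
    idx[name] = slice(start, start + counts[name])
    start += counts[name]

allowed = [("fast", "fast"), ("fast", "intermediate"), ("intermediate", "fast"),
           ("intermediate", "intermediate"), ("intermediate", "slow"),
           ("slow", "intermediate"), ("slow", "slow")]
mask = np.zeros((start, start), dtype=bool)
for a, b in allowed:
    mask[idx[a], idx[b]] = True  # multiplying W by mask removes slow<->fast links
```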

9.2.2 Results

The developmental learning of multiple goal-directed actions successfully converged after five training sessions, even in the case of Task 3, which was modified with the addition of a novel primitive pattern after the third session. The developmental process can be categorized into several stages, and Figure 9.6 shows the process for Task 1 over the first three sessions. Plots are shown for the trained VP trajectories (left), motor imagery (middle), and the actual output generated by the robot (right). The profiles of the units with slow dynamics in the motor imagery and the actual generated behavior were plotted for their first four principal components after conducting principal component analysis (PCA).

In the first stage, which mostly corresponds to Session 1, none of the tasks were accomplished, as most of the actually generated movement patterns were premature, and the time evolution of the activations of the units with slow dynamics was almost flat. In the second stage, corresponding to Session 2, most of the primitive movement patterns were actually generated, showing some generalization with respect to changes in object position, although correct sequencing of the primitives was not yet complete. In the third stage, corresponding to Session 3 and subsequent sessions, all tasks were successfully generated, with correct sequencing of the primitive movement patterns and with good generalization with respect to changes in object position. The activations of the units with slow dynamics became more dynamic compared with previous sessions in the case of both motor imagery and the generation of physical actions. In summary, then, the level responsible for the organization of primitive movement patterns developed during the earlier period, and the level responsible for the organization of these patterns into sequences developed in later periods.

One important point I want to make here is that there was a lag between the time when the robot became able to generate motor imagery and the time when it started generating actual behaviors. Motor imagery was generated earlier than the actual behavior: the motor imagery for all tasks was nearly complete by Session 2, as compared with Session 3 in the case of actually generated behaviors. This outcome is in accordance with the arguments of some contemporary developmental psychologists, such as Karmiloff-Smith (1992) and Diamond (1991), who consider that 2-month-old infants already possess intentionality toward objects they wish to manipulate, although they cannot reach for or grip them properly owing to the immaturity of their motor control skills. Moreover, this developmental course of the robot’s learning supports the view of Smith and Thelen (2003) that development is better understood as the emergent product of many local interactions occurring in real time.


Figure 9.6. Development of Task 1 over the first three sessions, with trained VP trajectories (left), motor imagery (middle), and actual generated behavior (right), accompanied by the profiles of units with slow dynamics after principal component analysis. (a) Session 1, (b) Session 2, (c) Session 3. Adapted from Nishimoto and Tani (2009) with permission.


Another interesting observation taken from this experiment was that the profiles of the training trajectories also developed across sessions. The training trajectories in Session 1 were quite distorted: training patterns such as UD (moving up and down) in the first half and LR (moving left and right) in the second half did not form regular cycles. This is typical when cyclic patterns are taught to robots without using metronome-like devices. However, the cyclic patterns in the training process became much more regular as the sessions proceeded. This is due to the development of limit-cycle attractors in the MTRNN, which shaped the trajectories trained through direct guidance into more regular cyclic ones via physical interactions.

This result shows a typical example of the codevelopment process undertaken by the robot and the teacher, whereby the robot’s internal structures develop via dense interactions between the top-down intentional generation of the robot’s movement and the bottom-up recognition of the teacher’s intention in guiding that movement. The interaction modifies not only the robot’s action but also the teacher’s. When I tried to physically guide the robot’s arms to move slightly differently from its own movement by grasping the arms, I became aware of its persisting intention through the resistance I felt in my hands. This modified my teaching intention, and thus the resultant trajectory of guidance, to some degree. In this sense, it can be said that the robot’s behavior trajectory and my teaching trajectory codeveloped during the experiment.

Next, let’s see how neurodynamics on different timescales successfully generates sets of action tasks consisting of multiple movement patterns. Figure 9.7 shows how the robot behaviors were generated in the test run after five training sessions.

First, as can be seen in the first and second rows of Figure 9.7, the VP trajectories of the trained robot were successfully generated for all three tasks, accompanied by changes in the cyclic movement patterns. Looking at the activation dynamics of the units with intermediate dynamics (shown in the fourth row) after conducting PCA, it is clear that their dynamics are correlated with the VP trajectories.

However, the activation dynamics of the units with slow dynamics, which started from different initial states for each of the three tasks, developed to be uncorrelated with the VP trajectories or the trajectories of the units with intermediate dynamics (see the bottom row). Also, the profiles changed drastically as the movement patterns changed. However, the transitions were still smooth, unlike the stepwise changes in the cases of gate opening or PB described in the previous section. Such drastic but smooth changes in the slow context profile were tailored by means of dense interactions between the top-down forward prediction and the bottom-up error regression. The bottom-up error regression tends to generate rapidly changing profiles at the moment of switching, whereas the top-down forward prediction tends to generate only slowly changing profiles because of its large time constant. The collaboration and competition between the two processes result in such natural, smooth profiles. After enough training, all actions are generated unconsciously, because no prediction error is generated in the course of well-practiced trajectories unless unexpected events, such as dropping the object, are encountered.

Figure 9.7. Visuo-proprioceptive trajectories (two normalized joint angles denoted as Prop 1 and Prop 2 and the camera direction denoted as Vision 1 and Vision 2) during training and actual generation in Session 5, accompanied by activation profiles of intermediate and slow units after principal component analysis, denoted as PC 1–4. (a) Moving up and down (UD) followed by moving left and right (LR) in Task 1, (b) moving forward and backward (FB) followed by touching by left hand and right hand (TchLR) in Task 2, (c) touching by both hands (BG) followed by rotating in air (RO) in Task 3. Adapted from Nishimoto and Tani (2009) with permission.

Further insight was obtained by observing how the robot managed to generate action when perturbed by external inputs. In Task 1, the experimenter, by pulling the robot’s hand slightly, could induce the robot to switch action primitives from moving up and down to moving left and right earlier than after the four cycles it had been trained to complete. This implies that counting at the higher level is an elastic dynamic process rather than a rigid logical computation, one that can be modulated by external inputs such as being pulled by the experimenter. An interesting observation was that the action primitive of moving up and down was smoothly connected to the next primitive of moving the object to the left, which took place right after placing the object on the floor, even though the switch was made after an incorrect number of cycles. The transitions never took place halfway through an ongoing primitive; they were always made at the same connection point, regardless of the incorrect number of cycles at the transition.

This observation suggests that the whole system was able to generate action sequences with fluidity and flexibility by adequately arbitrating between the higher level, which had been trained to count a specific number of cycles before switching, and the lower level, which had been trained to connect one primitive to another at the same point. In the current observation, the intention from the higher level was elastic enough to give in to the bottom-up force exerted by the experimenter, accepting an incorrect count of cycles, whereas the lower level succeeded in connecting the first primitive to the second at the same point on which it had been trained. Our proposed dynamical systems scheme allows this type of dynamic conflict resolution between different levels by letting them interact densely.


9.3. Summary

This chapter was dedicated to an examination of functional hierarchy through an exploration of the MTRNN model. The experimental results suggest that sequences of primitives are abstractly represented in the subnetwork consisting of units with slow dynamics, whereas detailed patterns of behavior primitives are generated in the subnetworks consisting of units with fast and intermediate dynamics. We can conclude that a sort of “fluid compositionality” for the smooth and flexible generation of actions is achieved through the self-organization of a functional hierarchy, utilizing the timescale differences as well as the structural connectivity among different levels in the proposed MTRNN model. These findings provide a possible explanation for how different functional roles can be assigned to different regions of the brain (i.e., the PFC for creating abstract action plans and the parietal cortex for composing sensory-motor details).

Such assignments in the brain may not be tailor-made by a genome program, but may instead result from self-organization via development and learning under various structural constraints imposed on the anatomy of the brain, including connectivity among local regions with bottlenecks and timescale differences in neuronal activities. This can be accounted for by a well-known concept in complex adaptive systems known as downward causation (Campbell, 1974; Bassett & Gazzaniga, 2011), which denotes a causal relationship from the global whole to the local parts. It can be said that the functional hierarchy emerges by means of upward causation, in terms of collective neural activity in both the forward activation dynamics and the error back-propagation, constrained by downward causation in terms of timescale differences, network topology, and environmental interaction. The observed fluid compositionality, metaphorically expressed as “kinetic melody” by Luria, should result from this. It was also shown that the capability for abstraction through hierarchy in the MTRNN can provide robots with the competency of self-narrative about their own actional intentions via mental simulation. The reflective selves of robots may start from this point.

Readers may ask a crucial question: Can the time constant parameters in the MTRNN be adapted via learning, or do they have to be set by the experimenters, as in the current version? Hochreiter and Schmidhuber (1997) proposed the “long short-term memory” (LSTM) RNN model, which is characterized by its dynamic memory mechanism implemented in “memory cells.” A memory cell can keep its current dynamic state for arbitrarily many time steps without specific parameter setting, by means of its associated adaptive gate opening-closing mechanisms learned via the error back-propagation scheme. If memory cells were allocated in multiple levels of subnetworks, it would be interesting to examine whether a functional hierarchy could be developed by organizing long-term memory in the higher level and shorter-term memory in the lower level. Actually, the MTRNN model was originally developed with a time-constant adaptation mechanism using a genetic algorithm (Paine & Tani, 2005). Simulation experiments on robot navigation learning using this model showed that a functional hierarchy for navigation control of the robot was developed by evolving slower and faster dynamics structures between two levels of the subnetworks, provided that a bottleneck connection was prepared between them.
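
The gating idea can be sketched minimally as follows: a toy, untrained memory cell in Python with numpy, in which the weight initialization and all variable names are illustrative placeholders rather than the original LSTM formulation; in a real model the weights would be learned via error back-propagation through time.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MemoryCell:
    # A single LSTM-style memory cell: the gates learn when to keep or
    # overwrite the cell state, so no fixed time constant is needed.
    def __init__(self, n_in, seed=0):
        rng = np.random.default_rng(seed)
        # one weight row each for the input, forget, and output gates
        # and for the candidate input (placeholder initialization)
        self.W = rng.normal(0.0, 0.1, size=(4, n_in + 1))
        self.c = 0.0  # the cell state

    def step(self, x):
        v = np.append(x, self.c)          # external input plus cell state
        i, f, o, g = self.W @ v
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        # a forget gate near 1 preserves the state over arbitrarily many
        # steps; near 0, the state is overwritten by the gated input
        self.c = f * self.c + i * np.tanh(g)
        return o * np.tanh(self.c)

cell = MemoryCell(n_in=3)
for _ in range(5):
    print(cell.step(np.array([0.1, -0.2, 0.3])))

With the forget gate output near 1 the cell state persists indefinitely, and with it near 0 the state is overwritten: exactly the behavior that fixed time constants can only approximate.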

Some may argue that brains should also involve a spatial hierarchy, as evidenced by the accumulated studies on the visual recognition pathway (see section 4.1). Hasson and colleagues (Hasson et al., 2008) likewise suggested the development of a spatio-temporal hierarchy in the human visual cortex. In response to this concern, our group (Jung et al., 2015; Choi & Tani, 2016) has recently shown that a spatio-temporal hierarchy can be developed successfully in a neurodynamic model referred to as the multiple spatio-temporal neural network (MSTNN) for both the recognition and the generation of compositional human action sequence patterns represented in pixel-level video images, when both spatial and temporal constraints are applied to neural activation dynamics at multiple scales for different levels. Furthermore, the MSTNN and MTRNN have been integrated in a simulated humanoid robot platform (Figure 9.8), by which the simulated robot becomes able to generate object manipulation behaviors corresponding to visually demonstrated human gestures via end-to-end learning from video image inputs to motor outputs (Hwang et al., 2015). As a result of end-to-end learning on various combinations of gesture patterns and the corresponding motor outputs for grasping objects of different shapes, it was found that the intentions for grasping different objects developed in the PFC subnetwork, characterized by the slowest timescale in the whole network.

Going back to the robotics experiment using the MTRNN, we observed that actions could be generated compositionally depending on the initial states of the intention units. However, this naturally poses the question of how the initial state is set (Park & Tani, 2015). Is there any way that the initial state representing the intentionality for action could be self-determined and set autonomously, rather than being set by the experimenter? This issue is related to the problem of the origin of spontaneity or free will, as addressed in section 4.3. The next chapter explores this issue by examining the results of several synthetic robotics experiments, while drawing attention to possible correspondences with the experimental results of Libet (1985) and Soon and colleagues (2008).

Figure 9.8. A simulated humanoid robot learns to generate object manipulation behaviors as specified by human gestures demonstrated to the robot by video image. (a) Task space and (b) the integrated model of MSTNN for video image processing and MTRNN for dynamic motor pattern generation: dynamic vision input (VI) feeds fast and slow MSTNN levels, whose highest level (PFC, with very slow dynamics) categorizes the human gesture into an intention to manipulate the specified object, and this intention drives the slow and fast MTRNN levels that generate the motor output and attention control.


10

Free Will for Action and Conscious Awareness

We first explore how intentions for actions can be generated spontaneously in higher cognitive brain areas by reviewing our robotics experiments. As I wrote in section 4.3, Libet (1985) demonstrated that awareness of intention is delayed, a result later confirmed by Soon and colleagues (2008). Later sections investigate this problem by clarifying the causal relationships shared by free will and consciousness.

10.1. A Dynamic Account of Spontaneous Behaviors

Although we may not be aware of it, our everyday life is full of spontaneity. Let’s take the example of the actions involved in making a cup of instant coffee, something we are all likely to be very familiar with. After I’ve put a spoonful of coffee granules in my mug and have added hot water, I usually add milk and then either add sugar or not, which is rather unconsciously determined. Frequently, I only notice later that I actually added sugar when I take the first sip. Some parts of these action sequences are defined and static―I must add the coffee granules and hot water―but other parts are optional, and this is where I can see spontaneity in the generation of my own actions. A similar comment can be made about improvisations in playing jazz or in contemporary dance, where musical phrases or body movement patterns are created freely and on the spot in an unpredictable manner.

It seems that spontaneity appears not within a chunk but at junctions between chunks in behavioral streams. Chunks are behavior primitives, such as pouring hot water into a mug or repeating a musical phrase, which are presumably acquired through practice and experience, as I have mentioned many times already. The relationships at junctions between behavior primitives are weaker than those within the primitives themselves, because junctions appear less frequently than primitives in repeated behavioral experience. Indeed, psychological observations of child development as well as adult learning have suggested that chunk structures can be extracted through statistical learning given a sufficiently large number of perceptual and behavioral experiences (e.g., Klahr et al., 1983; Saffran et al., 1996; Kirkham et al., 2002; Baldwin et al., 2008). Here, the term “chunk structures” denotes repeatable patterns of action sequences unified as “chunks,” together with the probabilistic state transitions between those chunks, the “junctions.”

One question essential to the problem of free will arises. How can subsequent chunks or behavior primitives be considered freely selected if one simply follows a learned statistical expectation? If we consider someone who has learned that the next behavior primitive to enact in a certain situation is either A or B, provided that past experience defines equal probabilities for A and B, it is plausible that either primitive might be enacted, so there is at least the apparent potential for freely chosen action in such instances. However, following the studies by Libet (1985) and Soon and colleagues (2008) discussed in section 4.3, voluntary actions might originate from neural activities in the supplementary motor area, prefrontal cortex, or parietal cortex, and in no case are these activities accompanied by awareness. Thus, even though one might believe that the choice of a particular action from among multiple possibilities (e.g., primitives A, B, and C) has been entirely conscious, in fact this apparently conscious decision has been precipitated by neural activity not subject to awareness, and free will seems not so freely determined at all.

Our MTRNN model can account for these results by assuming that the neural activities preceding apparently freely chosen actions are represented by the initial states of the intention units located in the network with slow dynamics. However, this explanation generates further questions: (1) how are the values of the initial states set for initiating voluntary actions, and (2) how can conscious awareness of the decision emerge with delay? To address these problems, my colleagues and I conducted neurorobotics experiments involving the statistical learning of imitative actions (Namikawa et al., 2011). The following experimental results highlight the role of cortical itinerant dynamics in generating spontaneity.

10.1.1 Experiment

A humanoid robot was trained to imitate actions involving object manipulation through direct guidance by an experimenter. The setup used for the robot and the way its movements were guided were the same as in our experiment described in section 9.2 (and in Yamashita and Tani, 2008). The target actions to imitate are shown in Figure 10.1.

The target task to be imitated included stochastic transitions between primitive actions. The object was located on the workbench in one of three positions (left, center, or right), and the experimenter repeated primitive actions that consisted of picking up the object, moving it to one of the other two possible positions, and releasing it, guiding the hands of the robot while deciding the next object position randomly with equal probability (50%). This process generated 24 training sequences, each of which consisted of 20 transitions between primitive actions, amounting to about 2,500 time steps of continuous visuo-proprioceptive sequences.

Figure 10.1. Object manipulation actions to be imitated by a Sony humanoid robot. (a) The task consists of stochastic transitions between primitive actions: moving an object to one of two possible positions with equal probability after reaching and grasping it. (b) Trajectory of the center of mass of the object as observed by using the robot’s vision system. Adopted from Namikawa et al. (2011) with PLoS Creative Commons Attribution (CC BY) license.

The time constants of the employed MTRNN were set to 100.0, 20.0, and 2.0 for units with slow, intermediate, and fast dynamics, respectively. It is noted that in this experiment the lower level was assembled from a set of gated RNNs (Tani & Nolfi, 1999) that interacted directly with the visuo-proprioceptive sequences, with the intermediate subnetwork controlling the gate openings through its outputs.
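
How such time constants shape the dynamics can be illustrated with a minimal sketch of the leaky-integrator update underlying this class of continuous-time RNNs; the unit counts and the random, untrained weights below are placeholders, not the trained network of the experiment.

import numpy as np

rng = np.random.default_rng(0)
n = 30
# ten slow, ten intermediate, and ten fast units, with the time
# constants quoted in the text
tau = np.array([100.0] * 10 + [20.0] * 10 + [2.0] * 10)
W = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))  # untrained recurrent weights
u = rng.normal(0.0, 1.0, size=n)                    # internal unit states

for t in range(1000):
    # leaky-integrator update: a unit with large tau integrates its
    # inputs slowly, so slow units barely change within one primitive
    # while fast units track moment-to-moment detail
    u = u + (-u + W @ np.tanh(u)) / tau
print(np.tanh(u)[:5])  # activations of the slow units after the run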

After the offline training of the network, the robot was tested on imitating (generating) each training sequence by setting the network to the corresponding acquired initial state. Although the trained primitive action sequences were reproduced exactly during the initial period, consisting of several primitive action transitions, the sequences gradually started deviating from the learned ones. This was considered to be due to sensitivity to initial conditions in the trained network. Statistical analysis of the transition sequences generated over longer periods showed that the probabilities with which the transitions between the primitive actions were reproduced were quite similar to those the robot was exposed to during the training period. The same analysis was repeated for cases with different transition probabilities in the target actions. When the transition probabilities for some of the target actions were changed to 25% and 12.5%, the same proportions of corresponding sequences were generated in each case. An analysis of the sequences produced by the trained network for each case showed that the transition probabilities of the reproduced actions mostly followed the target ones, with deviations of only a few percent. These results imply that the proposed model, although unable to learn to imitate the long visuo-proprioceptive sequences exactly, could extract the statistical structures (chunks) with their corresponding transition probabilities from these sequences.
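
The kind of statistical analysis mentioned above can be sketched in a few lines of Python; the symbol sequence below is a made-up stand-in for the categorized primitive transitions actually produced by the robot.

from collections import Counter

# hypothetical categorized sequence of generated primitives
seq = "LRCLRRCLCRLRCLLRCR" * 50

pairs = Counter(zip(seq, seq[1:]))  # count every observed transition
totals = Counter(seq[:-1])          # occurrences of each source symbol
for (a, b), n in sorted(pairs.items()):
    print(f"P({b}|{a}) = {n / totals[a]:.3f}")  # estimated transition probability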

Let’s now examine the main issue in this context, namely the origin and indeterminacy of spontaneity in choosing subsequent primitive actions. One might assume the prevailing opinion, that spontaneity is simply due to noise in the (external) physical world, which induces transitions between primitive actions represented by different attractors. The following experiment, however, shows that this is not the case. Furthermore, in examining whether the same statistical reproduction could also be observed in the case of motor imagery rather than actual motor action, it turned out that the answer is affirmative. This is quite important because, as motor imagery is generated deterministically in offline simulation without any contamination from external sensory noise, the observed stochastic properties must be due to internally generated fluctuations rather than noise-induced perturbations. In other words, the spontaneity observed at junctions between chunks of action sequences seems to arise from within the robots, by way of processes perfectly consistent with the results of Libet and Soon.

To gain some insight into this phenomenon, let’s look at the neural activation sequences in units with different timescales, associated with the visuo-proprioceptive sequences, during the generation of motor imagery, as shown in Figure 10.2.


Figure 10.2. Time evolution of neural activities associated with visuo-proprioceptive sequences in motor imagery. Capital letters shown in the first panel denote the primitive actions executed (R: moving to right, L: moving to left, and C: moving to center). Plots in the first and second panels show predicted vision and proprioception outputs, respectively. Plots in the third and fourth panels show, in different shades of gray, the activities of 30 neural units in the subnetworks with intermediate and slow dynamics, respectively. Adopted from Namikawa et al. (2011) with PLoS Creative Commons Attribution (CC BY) license.


It can be seen that the neural activities in the subnetworks with intermediate and slow dynamics develop with their intrinsic timescale dynamics. In the plot of intermediate neural activity, the dynamic pattern repeats whenever the same action primitive is generated. In the plot of slow dynamics, on the other hand, no such apparent regularity or repeated pattern of activity can be observed.

To examine the dynamic characteristics of the networks, a measure known as the Lyapunov exponent was calculated for the activity of each subnetwork. The Lyapunov exponents form a multidimensional vector that indicates the rates of divergence of adjacent trajectories in a given dynamic system. If the largest component of this vector is positive, chaos is generated by means of the stretching and folding mechanism described in section 5.1. In the analysis, the maximum Lyapunov exponent was found to be positive for the subnetwork with slow dynamics and negative for the subnetworks with intermediate and fast dynamics. The results were repeatable over different training runs of the network, implying that chaos emerged in the subnetwork with slow dynamics but not in the other subnetworks. Therefore, deterministic chaos emerging in the subnetwork with slow dynamics might affect the subnetworks with intermediate and fast dynamics, generating pseudostochastic transitions between primitive action sequences. Readers may see that this result corresponds exactly with the aforementioned idea (illustrated in Figure 6.2) that chaos in the higher level network can drive compositional generation of the action primitives stored in the lower level, as well as with what Braitenberg’s Vehicle 12 predicted in section 5.3.2.
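
For readers unfamiliar with this measure, the sketch below estimates the largest Lyapunov exponent of a one-dimensional map by tracking the divergence of two nearby trajectories; the logistic map merely stands in for the subnetwork dynamics, as the actual analysis was of course performed on the trained network.

import math

def largest_lyapunov(f, x0, n_steps=100000, eps=1e-9):
    # Benettin-style estimate: iterate two nearby trajectories,
    # accumulate the log expansion rate of their separation, and
    # renormalize the separation back to eps at every step
    x, y = x0, x0 + eps
    acc = 0.0
    for _ in range(n_steps):
        x, y = f(x), f(y)
        d = abs(y - x) or eps               # guard against exact coincidence
        acc += math.log(d / eps)
        y = x + eps if y >= x else x - eps  # renormalize the separation
    return acc / n_steps

logistic = lambda x: 3.9 * x * (1.0 - x)  # logistic map in its chaotic regime
print(largest_lyapunov(logistic, 0.3))    # a positive value indicates chaos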

To clarify the functional role of each subnetwork, we next conducted an experiment involving a “lesion” artificially created in one of the subnetworks. The trajectory of the manipulated object generated as visual imagery by the original intact network was compared with the one generated by the same network with a lesion in the subnetwork with slow dynamics (Figure 10.3).

A complex trajectory wandering between the three object positions was generated in the case of the intact network, whereas a simple trajectory of exact repetitions of moving to left, to right, and to center was generated in the case of the “lesion” in the slow dynamics subnetwork. This implies that the lesion in the subnetwork with slow dynamics deprived the network of the potential to spontaneously combine primitive actions.


10.1.2 Origin of Spontaneity

The results of the robotics experiments described so far suggest a possible mechanism for generating spontaneous actions and their images in the brain. It is assumed that deterministic chaos emerging in the subnetwork with slow dynamics, possibly corresponding to the prefrontal cortex, might be responsible for spontaneity in sequencing primitive actions by destabilizing junctions in chunk structures. This agrees well with Freeman’s (2000) speculation that intentionality is spontaneously generated by means of chaos in the prefrontal cortex. The isolation of chaos to the prefrontal cortex would make sense because the robustness of the generation of physical actions would be lost if chaos governed the whole cortical region. Also, such isolation of chaos in the higher level of the organized functional hierarchy in the brain might afford the establishment of two competencies essential to cognitive agency, namely the free selection and combination of actions, and their robust execution in an actual physical environment.

Our consideration here is analogous to William James’ account of the mechanism of free will, as illustrated in Figure 3.4. He considered that multiple alternatives can be regarded as accidental generations, with spontaneous variation, from a memory consolidating various experiences, from which one alternative is eventually selected as the next action.


Figure 10.3. Comparison of behaviors between an intact network and a “lesioned” network. Trajectories of the manipulated object (a) generated as visual imagery by the original intact network and (b) generated by the same network but with a “lesion” in its subnetwork with slow dynamics. Adopted from Namikawa et al. (2011) with PLoS Creative Commons Attribution (CC BY) license.


Chaos present at a higher level of the brain may account for this “accidental” generation with spontaneous variation. Also, his metaphoric reference to substantial parts as “perchings” and transient parts as “flights” in theorizing the stream of consciousness (section 3.5) might be analogous to the chunk structures and their junctions apparent in the robotics experiments just described. What James referred to as intermittent transitions between these perchings and flights might also be due to the chaos-based mechanism discussed here. Furthermore, readers may remember the experimental results of Churchland and colleagues (2010), showing that the low-dimensional neural activity during the movement preparatory period exhibits greater fluctuation before the appearance of the target and a more stable trajectory after its appearance. Such fluctuations in neuronal activity, possibly due to chaos originating at higher levels of organization, might facilitate the spontaneous generation of actions and images.

One thing to be noted here is that wills or intentions spontaneously generated by deterministic chaos are not really “free,” because they follow the deterministic causality of internal states. They may look as if generated with some randomness because the true internal state is not consciously accessible. If we observe action sequences in terms of categorized symbol sequences, they turn out to be probabilistic sequences, as explained by symbolic dynamics (see section 5.1). Mathematically speaking, complete free will without any prior causality may not exist. But it may feel as if free will exists when one has only limited awareness of the underlying causal mechanisms.

Now, I’d like to briefly discuss the issue of deterministic dynamics versus probabilistic processes in modeling spontaneity. The uniqueness of the current model study lies in the fact that deterministic chaos emerges in the process of imitating probabilistic transitions of action primitives, provided that sufficient training sequences are used to induce generalization in learning. This result can be understood as the reverse of the ordinary way of constructing symbolic dynamics, in which deterministic chaos produces probabilistic transitions of symbols, as shown in chapter 5. The mechanism is also analogous to what we have seen about the emergence of chaos in conflicting situations encountered by robots, as described in section 7.2.

We might be justified in asking why models of deterministic dynamical systems should be considered more essential than models of stochastic processes, such as Markov chains (Markov, 1971). A fundamental reason for this preference is that models of deterministic dynamical systems more closely represent physical phenomena that take place in continuous time and space, as argued in previous sections. In contrast, Markov chain models, the most popular schemes for modeling probabilistic processes, employ discrete state representations obtained by partitioning the state space into substates. The substates are assigned nodes with labels, and the possible transitions between those states are denoted by arcs, as in a finite-state machine (FSM). The only difference from an FSM is that the arcs represent transition probabilities rather than deterministic paths. In such a discretization scheme, even a slight mismatch between the current state of the model and inputs from the external environment can result in a failure to match: when inputs with unexpected labels arrive, Markov chain models simply halt and refuse to accept them. Dynamical system models, on the other hand, can at the very least avoid such catastrophic events because their dynamics develop autonomously. The intrinsic fuzziness in representing levels, primitives, and intentions in dynamical system models such as the MTRNN can develop robustness and smoothness in interactions with the physical world.
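
The contrast can be sketched as follows: a discrete transition table simply has no entry for an unanticipated input and must give up, whereas a continuous dynamical update produces a next state for any input whatsoever. All labels and weight values below are illustrative.

import numpy as np

# A discrete Markov/FSM-style model halts on an unseen (state, input) pair.
table = {("A", "x"): "B", ("B", "y"): "A"}
state = "A"
for sym in ["x", "y", "z"]:  # "z" never appeared in training
    nxt = table.get((state, sym))
    if nxt is None:
        print(f"no transition for ({state}, {sym}): the model halts")
        break
    state = nxt

# A continuous dynamical model absorbs any input as a perturbation of its
# state vector, and its dynamics keep developing autonomously.
W = np.array([[0.5, -0.3], [0.2, 0.7]])
V = np.array([0.4, -0.6])
h = np.zeros(2)
for u in [0.1, 0.9, 3.7]:    # even an out-of-range input is absorbed
    h = np.tanh(W @ h + V * u)
print(h)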

10.1.3 Creating Novel Action Sequences

My colleagues and I investigated the capability of MTRNNs for generating diverse combinatorial action sequences by means of chaos developed via the tutored learning of a set of trajectories. In such experiments, we often observed that MTRNNs generated novel movement patterns by combining previously learned segments, in mental simulation as well as in actual behaviors (Arie et al., 2009; Arie et al., 2012).

In one such humanoid robot experiment involving an object manipulation task, we employed an extended MTRNN model that can cope with dynamic visual images at the pixel level (Arie et al., 2009). In this extension, a Kohonen network model was used for preprocessing the pixel-level visual pattern, similar to the model described in section 7.2. The pixel pattern received at each step was fed into the Kohonen network as a two-dimensional topological map, the low-dimensional winner-take-all activation pattern of the Kohonen network units was input to the MTRNN, and the output of the MTRNN was fed back into the Kohonen network to reconstruct the predicted image of the pixel pattern. In training for the object manipulation task, the robot was tutored on a set of movement sequences for manipulating a cuboid object by utilizing the initial sensitivity characteristics of the slow dynamics context units (Figure 10.4). The tutored sequences started from two different initial conditions. In one initial condition, the object (a small block) stood on the base in front of a small table (a large block). From this initial condition, the standing object was either moved to the right side by pushing, put on the small table by grasping, or laid down by hitting. In the other initial condition, the same object lay on the base in front of the table. From there, the object was either moved to the right or put on the small table. The tutoring was repeated while the position of the object in the initial condition was varied.
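
A minimal sketch of the winner-take-all readout of such a Kohonen map is given below; the map and image sizes are hypothetical, the map is left untrained, and in the actual experiment the reference vectors were of course organized by Kohonen-style learning on real video frames.

import numpy as np

rng = np.random.default_rng(0)
n_pix = 16 * 12             # hypothetical number of pixels per frame
map_units = 8 * 8           # units of the two-dimensional topological map
som = rng.random((map_units, n_pix))  # reference vector of each map unit

def encode(frame):
    # winner-take-all: the frame is reduced to a sparse code with a 1 at
    # the best-matching unit; this low-dimensional pattern is what would
    # be passed on to the MTRNN
    winner = np.argmin(((som - frame) ** 2).sum(axis=1))
    code = np.zeros(map_units)
    code[winner] = 1.0
    return code

frame = rng.random(n_pix)   # stand-in for one incoming pixel pattern
print(int(encode(frame).argmax()))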

Figure 10.4. A humanoid robot tutored on five different movement sequences starting from two different initial conditions of a manipulated object: from initial condition 1 (object standing on the base), (1) move the standing object to the right, (2) put the standing object on the table, or (3) lay down the standing object; from initial condition 2 (object laid on the base), (4) move the laid object to the right or (5) put the laid object on the table. Adopted from Arie et al. (2009) with permission.

After learning all tutored sequences, the network model was tested on the generation of visual imagery as well as actual action. It was observed that the model could generate diverse visual imagery sequences, or hallucinations, both physically possible and impossible ones, depending on the initial slow context state (representing the intention). For example, in a physically possible case, the network generated an image concatenating a partial sequence of laying down the standing object on the base with one of grasping it and putting it on the small table. An example of a physically impossible case, on the other hand, involved a slight modulation of this possible case: laying down the standing object, and then grasping the lying object and putting it on the small table standing up. Although it is physically impossible for the lying object to suddenly stand up after being put on the table, this strange hallucination appeared because the previously learned partial sequence of grasping the standing object and putting it on the table was wrongly concatenated into the image. In the test of actual action generation, the aforementioned physically possible case was successfully generated, as shown in Figure 10.5.

Figure 10.5. The humanoid robot generated an action by spontaneously concatenating two previously learned movement sequences: laying down the standing object on the base and grasping it to put it on the small table. Adopted from Arie et al. (2009) with permission.

The experimental results described here are analogous to the results obtained using the RNNPB model. In section 8.2, it was shown that various action sequences, including novel ones, were generated by changing the PB values in the RNNPB. In the current case using the MTRNN, diverse sequential combinations of movement primitives, including novel combinations, were spontaneously generated by means of chaos or transient chaos organized in the higher level network. It can be said that these robots using the RNNPB or MTRNN generated something novel by avoiding simply falling into their own habitual patterns. It is noted again that novel images can be found in the deep memory that develops, with relational structure among experienced images, through long-term consolidative learning. An analogous observation was obtained in a robotics experiment using the MTRNN on learning to generate compositional action sequences corresponding to the observation of compositional gesture patterns (Park & Tani, 2015). It was shown that, after consolidative learning of tutored exemplars that did not contain all possible combination patterns, novel action sequences could be adequately generated in response to the observation of unlearned gesture pattern sequences conveying novel compositional semantics.

* * *

This is not the end of the story. An important question still remains unanswered. If we consider the spontaneous generation of actional intentions, mechanized by chaos in the PFC, to be the origin of free will, why is the awareness of a free decision delayed, as evidenced by Libet’s (1985) and Soon’s (2008) experiments? Here, let us consider how we recognize our own actions in daily life. At the very beginning of the current chapter, I wrote that, after adding coffee granules and hot water, I “either add sugar or not, which is rather unconsciously determined” and then “only notice later that I actually added sugar when I take the first sip.” Indeed, in many situations one’s own intention is only consciously recognized when confronted with unexpected outcomes. This understanding, moreover, led me to develop a further set of experiments clarifying the structural relationships between the spontaneous generation of intentions for action and the conscious awareness of these intentions by way of the results of said actions. The next section reviews this set of robotics experiments, the last in this book.

10.2. Free Will, Consciousness, and Postdiction

This final section explores possible mechanisms accounting for the awareness of one’s own actional intentions by examining cases of conflictive interaction taking place between the self and others in a robotics experiment. The idea is essentially this: in conflicting situations, spontaneously generated intentions are not completely free but are modified so that the conflict can be reduced, and it is in this interplay that consciousness arises. To illustrate these processes, we conducted a simple robotics experiment. Through analysis of the experimental results, we attempt to explain why one becomes consciously aware of free will only with a delay, immediately before the onset of the actual action.


10.2.1 Model and Robotics Experiment

In this experiment (Murata et al., 2015), two humanoid robots were used: one robot, referred to as the “self robot,” was controlled by an extended version of the MTRNN, and the other robot, referred to as the “other robot,” was teleoperated by a human experimenter. In each trial, after the right hand of the “other robot” had settled in the center position for a moment, the human experimenter commanded the robot to move the hand either in the left “L” or right “R” direction at random (by using a pseudorandom generator). Meanwhile, the “self robot” attempted to generate the same movement simultaneously by predicting the decision made by the “other robot.” This trial was repeated several times.

In the learning phase, the “self robot” was trained to imitate, through visual inputs, the random action sequences of moving either left or right demonstrated by the “other robot.” Because this part of the robot training is analogous to the one described in the last section, it was expected that the robot could learn to imitate the random action sequences by developing chaos in the slow dynamics network of the MTRNN. In the test phase of interactive action generation with the “other robot,” the “self robot” was supposed to decide to move its hand either left or right spontaneously at each juncture. At the same time, however, it had to follow the movement of the “other robot” by modifying its own intention whenever its decision conflicted with that of the “other robot.” It is worth noting that the chance of conflict was 50%, because the “other robot’s” move to the left or right was determined randomly.

Under this task condition, we examined possible interactions between the top-down process of spontaneously generating actional intention and the bottom-up process of modifying the intention by recognizing the perceptual reality, by means of the error regression mechanism, in the conflictive situation. The error regression was applied to update the activation states of context units in the slow dynamics network over a regression window of specific length in the immediate past. Specifically, the prediction errors for the visual inputs over the most recent l steps were back-propagated through time to update the activation values of the context units at the -lth step in the slow dynamics network, toward minimizing those errors. This update reconstructs a new image sequence in the regression window of the immediate past, as well as a prediction of the future sequence, by means of the forward dynamics of the whole network. All of this was done using a realization of the abstract model proposed in chapter 6 (see Figure 6.2), which can perform regression of the immediate past and prediction of the future simultaneously, online.
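
In outline, the error regression amounts to gradient descent on the context state at the opening of the sliding window, with the learned weights held fixed. The sketch below uses a toy one-layer recurrent network in place of the trained MTRNN and a numerical gradient in place of back-propagation through time; all sizes and values are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n_c, L = 4, 10                        # context units, regression window length
W = rng.normal(0.0, 0.5, (n_c, n_c))  # fixed, already-learned recurrent weights
Wo = rng.normal(0.0, 0.5, (1, n_c))   # fixed readout to the predicted sensation

def window_preds(c0):
    # forward dynamics over the window, starting from the onset state c0
    c, preds = c0, []
    for _ in range(L):
        c = np.tanh(W @ c)
        preds.append((Wo @ c)[0])
    return np.array(preds)

target = np.full(L, 0.3)              # the sensory inputs actually observed
err = lambda c: ((window_preds(c) - target) ** 2).sum()
c0 = np.zeros(n_c)                    # context state at the window onset
eps, lr = 1e-5, 0.1
for _ in range(300):                  # error regression iterations
    grad = np.zeros(n_c)
    for i in range(n_c):              # numerical gradient with respect to c0
        d = np.zeros(n_c); d[i] = eps
        grad[i] = (err(c0 + d) - err(c0)) / eps
    c0 -= lr * grad                   # "rewrite the past": update the onset state
print(err(c0))                        # window error after regression

Recomputing the forward dynamics from the updated onset state then regenerates both the immediate past (postdiction) and the prediction of the future.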

The test of robot action generation was conducted as a comparison between two conditions, namely with and without the error regression scheme. Figure 10.6a and b show examples of robot trials with open-loop one-step prediction, as observed in the experiments without and with the error regression, respectively. Both cases were tested in the same conflictive situation, wherein the intention of the “self robot,” in terms of the initial state of the slow context units, was set so that the action sequence LLRRL was anticipated, while the “other robot” actually generated the action sequence RRLLR. The profiles of one-step sensory prediction (two representative joint angles of the “self robot” and two-dimensional visual inputs representing the hand position of the “other robot”) are shown in the first row, the online prediction error is shown in the second row, and the slow context and fast context activities are shown in the third and fourth rows, respectively. The dotted vertical lines represent the decision points.

One-step prediction without the error regression was significantly poorer than with it. In fact, without the error regression the prediction error became significantly large at the decision points. In this situation, the movement of the “self robot” became erratic. Although the “self robot” seemed to try to follow the movements of the “other robot” by using the sensory inputs, its movements were significantly delayed. Furthermore, at the fourth decision point the “self robot” moved its arm in the direction opposite to that of the other robot (see the cross mark in Figure 10.6a). It seems that the “self robot” could not adapt to the ongoing conflictive situation by means of sensory entrainment alone, because its top-down intention was too strong to be modified. In contrast, one-step prediction using the error regression was quite successful, generating only a spikelike momentary error even at a conflictive decision point (see Figure 10.6b). These results suggest that the error regression mechanism is more effective than the sensory entrainment mechanism for achieving immediate adaptation of the internal neural states to the current situation.

Figure 10.6. The results of the self robot interacting with the other robot by open-loop generation without (a) and with (b) the error regression mechanism. Redrawn from Murata et al. (2015).

Now, we examine how the neural activity represents the perceptual images of past, present, and future as associated with the current intention, and how such images and intentions can be modulated dynamically through iterative interactions between the top-down intentional process and the bottom-up error regression process during online movement of the robot. Figure 10.7 shows plots of the neural activity at several “now” steps (the 221st, the 224th, and the 227th, from left to right) in an event when the prediction was in conflict with the immediate sensory input. The plots of the sensory prediction (joint angles and visual inputs), the prediction error, the slow context unit activity, and the fast context unit activity are shown from the first row to the fourth row, respectively. They show profiles for the past and for the future, with the current step of “now” sandwiched between them. The prediction error is shown only for the past, naturally. The regression window is shown as a shaded area in the immediate past.

Figure 10.7. The rewriting of the future by prediction and of the past by postdiction in the case of conflict. Profiles of sensory prediction, prediction error, and activations of slow and fast context units are plotted from past to future for different current “now” steps. The current “now” is shifted from the 221st step in the left panels to the 224th step in the center panels and the 227th step in the right panels. Each panel shows profiles corresponding to the immediate past (the regression window) with solid lines and to the future with dotted lines. Redrawn from Murata et al. (2015).

The hand of the “self robot” started to move to the right around the 215th step, after settling in the home position for a moment (see the leftmost panels). It is noted that although the joint angles of the “self robot” were settled, there were dynamic changes in the activity of the fast context units. This dynamic activity prepares a bias to move the hand in a particular direction, in this case to the right. It can also be seen that the error rose sharply in the immediate past when the current “now” was at the 221st step. At this moment, the prediction by the “self robot” was betrayed because the hand of the “other robot” moved to the left. The error signal thus generated was propagated strongly upstream, and the slow context activation state at the starting step of the regression window was modified with effort. Here, we can see a discontinuity in the profiles of the slow context unit activity at the onset of the regression window. This modification caused the overwriting of all profiles of the sensory prediction (reconstruction) and of the neural activity in the regression window, by means of the forward dynamics recalculated from the onset of the window (see the panels for the current “now” at the 224th step). The profiles for future steps were also modified accordingly, while the error decreased as the current “now” shifted to the 224th and then the 227th step. Then, the arm of the “self robot” moved to the left.

What we have observed here is postdiction¹ for the past and prediction for the future (Yamashita & Tani, 2012; Murata et al., 2015), by which one’s own action can be recognized only in a “postdictive” manner, when one’s own actional intention is about to be rewritten. This structure reminds us of Heidegger’s characterization of the dynamic interplay between looking ahead to the future for possibilities and regressing to the conflictive past through reflection, wherein vivid nowness is born (see section 7.2). Surely, at this point the robot becomes self-reflective about its own past and future! In particular, the rewritten window in our model may correspond to the encompassing narrative history, as a space of time, in Heidegger’s thought. Thus, we are led to a natural inference: people may notice their own intentions in the specious present when confronted with conflicts that must be reduced, with the effort resulting in conscious experience.

1. Postdiction is a perceptual phenomenon in which a stimulus presented later affects the perception of another stimulus presented earlier (e.g., Eagleman & Sejnowski, 2000; Shimojo, 2014).

10.2.2 Interpretation

Can we apply the aforementioned analysis to account for the delayed awareness of free will? The reader may assume that no conflict should be encountered in simply pressing a button freely, as in the Libet experiment. However, our experiments show how conflicts might arise due to the nature of embodied, situated cognition. When an intention unconsciously developed at the higher cognitive level by deterministic chaos exceeds a certain threshold, it attempts to drive the lower peripheral parts to generate a particular movement abruptly (see Figure 10.8).

However, the lower levels may not be able to respond to this impetus immediately, because the internal neural activity in the peripheral areas, including muscle potential states, may not always be ready to initiate physical body movements according to top-down expectations. It is like when a locomotive suddenly starts to move: the freight cars behind cannot follow immediately, and the wheels spin as the system overcomes the resistance of inertia. As the wheels spin, the engineer may reduce the engine speed to optimize the acceleration and get the train going properly. Likewise, in terms of the preceding experimental model, when the higher levels cannot receive exactly the expected response from the lower levels, some prediction error is generated, which calls for a certain modification of the intention for the movement in the direction of minimizing the error. And when the intention for the movement that has developed unconsciously is modified, conscious awareness arises.

Figure 10.8. Account of how free will can be generated unconsciously and how one can become consciously aware of it later: (1) spontaneous generation of intention by chaos in the PFC; (2) the intention drives the lower level (M1 and the parietal cortex), issuing the motor signal and the prediction of proprioception; (3) embodiment entails a certain amount of prediction error; (4) the intention, modulated by the error, becomes conscious.

This consciously aware intention is different from the original unconscious one, because it has already been rewritten by means of postdiction. In short, if actions can be generated automatically and smoothly, exactly as intended at the outset, they are not accompanied by consciousness. However, when they are generated in response to conflicts arising from the nature of embodiment in the real world, these actions are accompanied by consciousness. This interpretation of our experimental results is analogous to the aforementioned speculation by Desmurget and colleagues (2009) (see section 4.3) that the parietal cortex might mediate error monitoring between the predicted perceptual outcome of an intended action and the actual one, a process through which one becomes consciously aware. Freeman (2000) also pointed out that action precedes conscious decision, referring to Merleau-Ponty:

In reality, the deliberation follows the decision—and it is my secret decision that brings the motives to life (Merleau-Ponty, 1962, p. 506).

On this account, the relationship between free will and consciousness can be explained in the following way: (1) deterministic chaos develops in a higher cognitive brain area; (2) the top-down intention fluctuates spontaneously by means of this chaotic dynamics, without accompanying consciousness; (3) at the moment of initiating a physical action triggered by this fluctuating intention, a prediction error is generated between the intended state and the reality of the external world; (4) the intention, having been modified by means of the error regression (postdiction), becomes consciously noticed as the cause of the action about to be generated. In terms of human cognition, then, we may say that consciousness is the feeling of one’s own embodied neural structure as it physically changes in adaptation to a changing, unpredictable or unpredicted external environment.

Considered in this way, Thomas Hobbes (section 3.6) might be right in saying that there is no space left for free will, because every “free” action is determined through deterministic dynamics. The point, however, is that our conscious minds cannot see how they develop deterministically through causal chains in unconscious processes; they only notice that each free action seems to pop out all of a sudden, without any cause. Therefore, we feel as if our intentions or wills were generated freely, without cause. To sum up, my account is that free will exists phenomenologically, whereas third-party observation of the physical processes underlying its appearance tells a different story.

10.2.3 Circular Causality, Criticality, and Authenticity

I explored the further possibility of applying the MTRNN model, extended with the error regression mechanism, to a scenario of incremental and interactive tutoring, because such a venture looked fascinating to me. When I taught a set of movement sequences to the robot, the robot generated various images as well as actual actions by spontaneously combining these sequences (analogous to the experimental results shown in section 10.1). While the robot generated such actions, I occasionally interacted with it in order to modify its ongoing movement by grasping its hands. In these interactions, the robot would suddenly initiate an unexpected movement by pulling my hands; when I pushed its hands back in a different direction, it responded in yet another way. Now I understand that novel patterns were more likely to be generated when my response conflicted with the robot’s. This was because the reaction forces generated between the robot’s hands and mine were transformed into an error signal in the MTRNN model in the robot’s brain, and consequently its internal neural state was modified by means of the resultant error regression process. Such experiences, resulting from the enactment of novel intentions, can be learned successively and can induce further modification of the memory structure in the robot’s brain. Intentions for a variety of novel actions can then be generated again from such reconstructed memory structures. What I witnessed is illustrated in the sketch shown in Figure 10.9a.

Figure 10.9. Circular causality. (a) Chain of circular causality, linking the memory structure, the spontaneous generation of novel intention, novel action on the environment and other agents, unpredicted perception, conscious experience, and the restructuring of memory; and (b) its appearance by means of mutual prediction of the future and regression of the past between a robot and myself.


This sketch depicts a circular causality among (1) the spontaneous generation of intentions, with various proactive actional images developed from the memory structure; (2) the enactment of those actional images in reality; (3) the conscious experience of the outcome of the interaction; and (4) the incremental learning of these new experiences and the resultant reconstruction of the memory structure. Here, an open dynamic structure emerges by way of this circular causality. Consequently, diverse images, actions, and thoughts can be generated, accompanied by spontaneous shifts between conscious and unconscious states of mind, after repeated confrontation and reconciliation between the subjective mind and the objective world.

Furthermore, it is worth noting that the emergent processes described in Figure 10.9a also include me, as I inserted myself into the circular causality in the robotics experiment described in this section (see Figure 10.9b). When I concentrated on tactile perception of the robot’s movement in my grasp, I sometimes noticed that the image of my own next movement popped out suddenly, without my conscious control. I also noticed that the tension between me and the robot occasionally rose to a critical level, from which unexpected movement patterns of mine as well as of the robot burst out. Although I may be unable to articulate the mechanics behind such experience in greater detail through unaided introspection alone, I became sure that the interaction between the robot and me exhibited its own “authentic” trajectory. Ultimately, free will or free action might be generated in a codependent manner between “me” and others who seek out the greatest possibility in the social situation shared in this world. At the same time, finally, I realized that I had conducted these robotics experimental studies not only to evaluate the proposed cognitive models objectively, but also to enjoy myself, creating a rich subjective experience in the exploration of my own consciousness and free will through my online interaction with neurodynamic robots.

10.3. Summary

This chapter tackled the problems of consciousness, intention, and free will through the analysis of neurorobotics experimental results. The problems we focused on were how free will for action can emerge and how it can become the content of consciousness. First, our study investigated how intentions for different actions can be generated spontaneously.


It was found that actions can shift from one to another spontaneously when a chaotic attractor is developed in the slow dynamics subnetwork at the higher levels of the cognitive brain. This implies that the intention for free action arises from fluctuating neural activity by means of deterministic chaos in the higher cognitive brain area. This interpretation accords with the experimental results delivered by Libet (1985) and Soon and colleagues (2008).

The next question tackled was why conscious awareness of the intention for generating spontaneous actions arises only with a delay, immediately before the actual action is initiated. To consider this question, a robotics experiment simulating conflictive situations between two robots was performed. The experiment used an extended version of the MTRNN model employing an error regression scheme for achieving online modification of the internal neural activity in the conflictive situation. The experimental results showed that the intention spontaneously generated in the higher level subnetwork can be modified in a postdictive manner by using the prediction error generated by the conflict. It was speculated that one becomes consciously aware of one’s own intention for generating action only via postdiction, when the originally generated intention is modified in the face of conflicting perceptual reality. In the case of generating free actions, as in the experiment by Libet, the delayed awareness of one’s own intention can be explained similarly: the conflict emerges between the higher level unconscious intention to initiate a particular movement and the lower level perceptual reality of embodiment, which results in the generation of prediction error.

These considerations lead us to conjecture that there might be no room for free will, because all phenomena, including the spontaneous generation of intentions, can be explained by causally deterministic dynamics. We nevertheless enjoy the subjective experience of free will, because freely chosen actions seem to appear in our minds out of a clear sky, without any cause: our conscious minds cannot trace their hidden development in unconscious processes.

Finally, the chapter examined the circular causality appearing among the processes of generating intention, enacting that intention in reality, consciously experiencing the perceived outcomes, and successively learning from such experience in the robot–human interactive tutoring experiment. It was postulated that, because of this circular causality, all processes develop in time in a groundless manner (Varela et al., 1991), without converging to particular situations, whereby images and actions are generated diversely. The vividness and the authenticity of our "selves" might appear especially at a certain criticality under such groundless situations developed through circular causality. And thus, our minds might become ultimately free only when gifted with such groundlessness.

11

Conclusions

Now, having completed the descriptions of our robotics experiment outcomes, this final chapter presents some conclusions drawn from reviewing these experiments.

11.1. Compositionality in the Cognitive Mind

This book began with a quest for a solution to the symbol grounding problem by asking how robots can grasp meanings of the objective world from their subjective experiences, such as the smell of cool air from a refrigerator or the feeling of one's own body sinking back into a sofa. I considered that this problem originated from Cartesian dualism, wherein René Descartes suggested that the mind is a nonmaterial, thinking thing essentially distinct from the nonthinking, material body, only then to face the "problem of interactionism," that is, expounding how nonmaterial minds can cause anything in material bodies, and vice versa. Today's symbol grounding problem addresses the same concern, asking how symbols, considered as arbitrarily shaped tokens defined in nonmetric space, could interact densely with sensory–motor reality defined in physical and material metric space (Tani, 2014; Taniguchi et al., 2016).

In this book, I attempted to resolve this longstanding problem of mind and body by taking synthetic approaches. The book presented experimental trials, inspired by Merleau-Ponty's philosophy of embodiment, in which my colleagues and I engineered self-organizing, nonlinear dynamic systems onto robotic platforms. Our central hypothesis has been that essential cognitive mechanisms self-organize in the form of neurodynamic structures via iterative learning of the continuous flow of sensory–motor experience. This learning grounds higher level cognition in perceptual reality without suffering the disjunction between lower and higher level operations that is often found in hybrid models employing symbolic composition programs. Instead, iterative interactions between top-down, subjective, intentional processes of acting on the objective world and bottom-up recognition of perceptual reality result in the alteration of top-down intention through circular causality. Consequently, our models have successfully demonstrated what Merleau-Ponty described metaphorically as the reciprocal insertion and intertwining of the subject and the object, through which the two become inseparable entities.

It might still be difficult for proponents of cognitivism such as Chomsky to accept this line of thought. As mentioned in chapter 2, the cognitivist's first assumption is that the essential aspects of human cognition can be well accounted for in terms of logical symbol systems, whose substantial strength is that they can support an infinite range of recursive expressions. The second assumption is that sensory–motor or semantic systems are not necessary for the composition or recursion taking place in such symbol systems, and therefore may not be essential components of any cognitive system.

However, one crucial question is whether the daily actions and thoughts of human beings need to be supported by such infinitely deep recursive compositions in the first place. In everyday situations, a human being speaks only with a limited depth of embedded sentences, and makes action plans composed of only a limited length of primitive behavior sequences at each level. An infinite depth of recursive composition is required in neither case. The series of robotics experiments described in this book confirms this characterization. Our multiple timescale recurrent neural networks (MTRNNs) can learn to imitate stochastic sequences via self-organizing deterministic chaos with the complexity of finite state machines, but not of infinite ones. Mathematical studies by Siegelmann (1995) and, recently, by Graves and colleagues (2014) have shown that analog computational models, including recurrent neural networks (RNNs) with external memory for writing and reading, can in principle exhibit computational capabilities beyond the Turing limit. However, the construction of such Turing machines through learning is practically impossible, because the corresponding parameters, such as connectivity weights, can be found only at singular points in the weight space. Such a parameter-sensitive system may not function reliably when situated in the noisy sensory–motor reality that practical embodiment requires, even if an equivalent of such a Turing machine might be constructed in an RNN by chance (Tani et al., 2014).

The same should hold for ordinary human cognitive processes, which rely on a relatively poor working memory characterized by the magic number seven (Miller, 1956). My work with robots has attempted to model the everyday analogical processes of ordinary humans generating behaviors and thoughts characterized by an everyday degree of compositionality. This scope may include the daily utterances of children before the age of 5 or 6, who can compose sentences in their mother language without explicitly recognizing their syntactic structures, and also the tacit learning of skilled actions, such as grasping an object to pass it to others without thinking about it, or even making a cup of instant coffee. Our robotics experiments have demonstrated that self-organization of particular dynamical structures within dynamic neural network models can develop a finite level of compositionality, and that the contents of these compositions can remain naturally grounded in the ongoing flow of perceptual reality throughout this process.

Of course, this is far from the end of the story. Even though we may have created an initial picture of what is happening in the mind, problems and questions remain. For example, a typical concern people often raise with me is whether symbols really don't exist in the brain (Tani et al., 2014). On this count, many electrophysiological researchers have argued for the existence of so-called grandmother cells, based on studies of animal brains in which local firings are presumed to encode specific meanings in terms of a one-to-one mapping. These researchers argue that such grandmother cells might function like symbols. A neurophysiologist once emphatically argued with me, denying the possibility of distributed representations, saying that "this recorded neuron encodes the action of reaching to pull that object." On the contrary, I thought it possible that this neuron could fire in generating other types of actions that could not be observed in his experimental setting, in which the movements of the animals were quite constrained. Indeed, recent developments in multiple-cell recording techniques suggest that such mappings are more likely to be many-to-many than one-to-one. Mormann and colleagues' (2008) results from multiple-cell recordings of the human medial temporal lobe revealed that the firing of cells for a particular concept is sparse (around 1% of the cell population) and that each cell encodes from two to five different concepts (e.g., an actress's face, an animal shape, and a mathematical formula). Even though concepts are represented sparsely, their representation is not one-to-one but distributed, and so any presumption that something like direct symbolic representations exist in the human brain seems equally to be in error.

That aside, I speculate that we humans use discrete symbols outside of the brain, depending on the situation. Human civilization has evolved through the use of outside-brain devices such as pen and paper to write down linguistic symbols, thereby distributing thought through symbolic representations, an aspect of what Clark and Chalmers (1998) have called the "extended mind." This use of external representation, moreover, may be internalized and employed through working memory, like a "blackboard" in the brain on which to "write down" our thoughts when we don't have pen or paper handy. In this book, my argument has been that our brains can facilitate everyday compositionality, as in casual conversation or even regular skilled action generation, by combining primitive behaviors without needing to (fully) depend on symbol representation or manipulation in outside-brain devices.

Still, when we need to construct complicated plans for solving complex problems, such as job scheduling for a group of people in a company or the basic design of complex facilities or machines, we typically compose these plans into flow charts, schematic drawings, or itemized statements on paper or in other media utilizing symbols. Tasks at this level might be solved by cognitive architectures such as ACT-R, GPS, or Soar. Indeed, these cognitive architectures are good at manipulating symbols as they exist outside of brains by utilizing explicit knowledge or rules. So, this poses the question of how these symbols outside of the brain can be "grounded" in the neurodynamic structures inside the brain. Actually, one of the original inventors of Soar, John Laird, has recently investigated this problem by extending Soar (Laird, 2008). The extended Soar contains additional building blocks that are involved in the learning of tacit knowledge about perception and action generation without using symbolic representation. These subsymbolic levels are interfaced with a symbolically represented short-term memory (STM) at the next level. Next actions are determined by applying production rules to the memory contents in the STM. Similar research trials can be seen elsewhere (Ritter et al., 2000; St Amant & Riedl, 2001; Bach, 2008).

Ron Sun (2016) has developed a cognitive architecture, CLARION, which is characterized by interactions between explicit processes realized by symbol systems and implicit processes realized by connectionist networks, under a similar motivation. Although these trials are worth examining, I speculate that the introduction of symbolic representations in the STM in Soar, or at the explicit level in CLARION, might be premature, because such representations can still be developed in a nonsymbolic manner, for example as analog neurodynamic patterns, as I have shown repeatedly in the current book. The essential questions would be at which level in the cognitive process external symbols should be used, and how such symbols can be interfaced with subsymbolic representations. These questions are left for future studies, and there will undoubtedly be many more that we will face.

11.2. Phenomenology

The current book also explored phenomenological aspects of the human mind, including notions of self, consciousness, subjective time, and free will, by drawing correspondences between the outcomes of neurorobotics experiments and some of the literature in traditional phenomenology. Although some may argue that such analysis from the synthetic modeling side can never be more than metaphorical, against this I would argue that models capture aspects essential to a phenomenon, reducing the complexity of a system to only these essential dimensions, and in this way models are not metaphors. They are the systems in question, only simpler, at least insofar as essential dimensions are indeed modeled and nothing more (see further discussion by Jeffrey White [2016]). In this spirit, I believe that interdisciplinary discussions of the outcomes of such neurorobotics experiments can serve to strengthen the insights connecting aspects of robot and human behaviors more closely. It should be true that human phenomenology, human behavior, and underlying brain mechanisms can be understood only through their mutual constraints imposed on formal dynamical models, as Varela (1996) pointed out. In this way, robotics experiments of the sort reviewed in this text afford privileged insights into the human condition. To reinforce these insights, let us review these experiments briefly.

In the robot navigation experiment described in section 7.2, it was argued that the "self" might come to conscious awareness when the coherence between internal dynamics and environmental dynamics breaks down, when subjective anticipation and perceptual observation conflict. Referring to Heidegger's example of a carpenter hitting nails with a hammer, it was explained that the subject (carpenter) and the object (hammer) form an enactive unity when all of the cognitive and behavioral processes proceed smoothly and automatically. This process is characterized by a steady phase of neurodynamic activity. In the unsteady phase, the distinction between the two becomes explicit, and the "self" comes to be noticed consciously. An important observation was that these two phases alternated intermittently, exhibiting the characteristics of self-organized criticality (Bak et al., 1987). It was considered that authentic being might be accounted for by this dynamic structure.

In section 8.4, I proposed that the problem of segmenting the continuous perceptual flow into meaningful, reusable primitive patterns might be related to the problem of time perception as formulated by Husserl. To examine this thought, we reviewed an experiment involving robot imitation learning that used the RNNPB model. From the analysis of these experimental results, it was speculated that "nowness" is bounded where the flow of experience is segmented. When the continuous perceptual flow can be anticipated without generating error, there is no sense of events passing through time. However, when prediction error is generated, the flow is segmented into chunks by means of modification of the parametric bias vector in an effort to minimize the error. With this, the passing of time comes to conscious awareness. The segmented chunks are no longer just parts of the flow, but rather represent discrete events that can be consciously identified according to the perceptual categories encoded, in our model, by the PB vector.

In fact, it is interesting to see that the observation of compositional actions by others is accompanied by momentary consciousness at the moment of segmenting the perceptual flow into a patterned set of primitives. This is because compositional actions generated by others entail potential unpredictability when such actions are composed of primitive acts voluntarily selected by means of the "free will" of the others. Therefore, compositionality in cognition might be related to the phenomenology of free will and consciousness. If some animals live only on sensory-reflex behaviors, without the ability either to recognize or to generate compositional actions, there might be no room for consciousness or for the experience of free will in their "minds."

In chapter 9, I wrote that the capability for abstraction through hierarchy in the MTRNN can provide robots with a competency for self-narrative about their own actional intentions in mental simulation. I speculated that the reflective selves of robots may originate from this point. Following this argument, chapter 10 was devoted to the relationship between free will and conscious experience in greater depth. From the results of robotics experiments utilizing the MTRNN model (section 10.1), I proposed that intentions for free actions could be generated spontaneously by deterministic chaos in higher cognitive brain areas. The results of the robotics experiment shown in section 10.2 suggest that conscious awareness of the intention developed by such deterministic dynamics can arise only in a postdictive manner, when conflicts arise between top-down prediction and bottom-up reality. This observation was correlated with the account of the delayed awareness of free will reported by Libet (1985). By considering possible situations in which the intention to enact a particular movement generated at the higher level conflicts with the sensory–motor reality constituted at the lower level, it was proposed that the effort autonomously mechanized to reduce the conflict would bring the intention to conscious awareness.

Finally, this chapter suggested that there might be no room for free will from an objective view, because all of the mechanisms necessary for generating voluntary actions can be explained by deterministic dynamics due to causal physical phenomena, as I have shown in our robotics experiments. Though it is true that in our everyday subjective experience we feel as if free will exists, through the results of our neurorobotics experiments we can see that this phenomenon may arise simply because our minds cannot see the causal processes at work in generating each intentional action. Our minds cannot observe the phase space trajectory of the chaos developed in the higher cognitive brain areas. We are conscious of each intention as if it pops up without any prior cause immediately before the corresponding action is enacted. On this account, then, we may conclude that free will exists, but merely as an aspect of our subjective experience.

With the relationship between free will and consciousness thus clarified, I will reiterate once more that the problem of consciousness may not be the hard problem after all. If consciousness is considered to be the first-person awareness of embodied physical processes, then an exhaustive account of consciousness should likewise appear via the explanation of the relationships between the subjective and the objective. This stands to reason, of course, provided that the whole of this universe is also constituted by these two poles, and that nothing exists outside of them (nothing "supernatural"). When subjectivity is exemplified by the top-down pathway of predicting an actional outcome, and objectivity by the bottom-up recognition of the perceptual reality, these poles are differentiable in terms of the gap between them. Consequently, consciousness at each moment should appear as a sense of an effortful process aimed at minimizing this gap. Qualia, then, might be a special case of conscious experience that appears when the gap is generated only at the lower perceptual level, in which case the vividness of qualia may originate from the residual prediction error at each instant. Along this line, and more specifically, Friston (2010) would say that it arises from the error divided by the estimated variance (uncertainty), rather than from the error itself.
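In the notation standard to predictive coding accounts (a conventional formulation of Friston's precision weighting, not an equation drawn from the experiments in this book), this last point can be written compactly:

```latex
% Precision-weighted prediction error:
%   s        observed sensory state
%   mu       top-down prediction of that state
%   sigma^2  estimated variance (inverse precision) of the prediction
\[
  \varepsilon = s - \mu, \qquad
  \tilde{\varepsilon} = \frac{\varepsilon}{\sigma^{2}} = \Pi\,\varepsilon,
  \qquad \Pi = \sigma^{-2}
\]
```

On this reading, the same raw error feels more or less vivid depending on how precise (low-variance) the system estimates its own prediction to be.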

However, a more essential issue is to understand the underlying structure of consciousness, rather than just a conscious state at a particular moment, measured post hoc in terms of integrated information (Tononi, 2008), for example, or in terms of the aforementioned gap or prediction error. We have to explain the underlying structural mechanism accounting for, for example, the stream of consciousness formulated as spontaneous alternation between conscious and unconscious states by William James (1892). The crucial proposal in the current book is that the circular causality developed between the subjective mind and the objective world is responsible for consciousness and also for the appearance of free will, as the two are dependent on each other within the same dynamic structure. Top-down proactive intention acting on the objective world induces changes in this world, whereas bottom-up postdictive recognition of such changes, including unexpected ones, may induce changes in memory and intention in the subjective mind. This can result in the emergence of further "free" action by means of the potential nonlinearity of the system. In the loop of circular causality, spontaneous shifts between the unconscious state, in terms of the coherent phase, and the conscious state, in terms of the incoherent phase, occur intermittently as the dynamic whole develops toward criticality.

To sum up, this open dynamic structure developed in the loop of circular causality should account for the autonomy of consciousness and free will. Or, it can be said that this open dynamic structure explains the inseparable nature of the subjective mind and the objective world in terms of autonomous mechanisms moderating the breakdown and unification of this system of self and situation. Conclusively, the criticality developed in this open, dynamic structure might account for the authenticity conceived by Heidegger, which generates a trajectory toward one's ownmost possibility by avoiding simply falling into habitual or conventional ways of acting (Tani, 2009). The reflective selves of robots that can examine their own past and future possibilities should originate from this perspective.

11.3. Objective Science and Subjective Experience

Readers might have noticed that two different attitudes toward conducting robotics experiments appear by turns in Part II of the current book. One type of robotics experiment focuses more on how adequate action can be generated based on learning a rational model of the outer world, whereas the other type focuses more on the dynamic characteristics of possible interactions between the subjective mind and the objective world.

For example, chapter 7 employed these two different approaches in the study of robot navigation learning. Section 7.1 described how the RNN model used in mobile robots can develop compositional representations of the outer environment and how these representations can be grounded. On the other hand, section 7.2 explored characteristics of groundlessness (Varela et al., 1991) in terms of fluctuating interaction between the subjective mind and the objective world. Section 8.3 described one-way imitation learning, showing that the RNNPB model can learn to generate and recognize a set of primitive behavior patterns by observing the movements of its human partner. Afterward, I introduced the imitation game experiment, in which two-way mutual imitation between robot and human was the focus. It was observed that some psychologically plausible phenomena, such as turn taking of initiative, emerged in the course of the imitation game, reinforcing our emphasis on the interaction between the first-personal subjective and the objective, in this case social, world. In chapter 9, I described how the MTRNN model can learn compositional action sequences by developing an adequate functional hierarchy in the network model. Then, chapter 10 examined how circular causality can develop among different cognitive processes, for the purpose of investigating the free will problem using the same MTRNN model. This chapter also reported how novel images and actions can be generated on both the robot and human sides during interactive tutoring of robots by human tutors.

To sum up, my research attitude has shifted between investigating rational models of cognitive mechanisms from an objective view on one side, and exploring subjective phenomena by putting myself inside the interaction loop of robotics experiments on the other. Matsuno (1989) and Gunji (Gunji & Konno, 1991) wrote that the former type of research attitude takes the view of a so-called external observer, and the latter that of a so-called internal observer. They used the term observation as mostly equivalent to the term interaction. When the relationship between the observer and the observed can alter because of the interactions between them, such an observer is regarded as an internal observer, because it is included in the internal loop of the interactions. The external observer, on the other hand, assumes only one-way, passive observation from observed to observer, without any interactive feedback.

Observation itself consists of a set of embodied processes that are physically constrained in various ways, such as by imprecision in perception and in motor generation, time delays in neural activation and body movement, limitations in memory capacity, and so on. Such physical constraints in time and space do not allow the system to be uniquely optimized, and thus give rise to incompleteness and inconsistency. Actually, in our robot experiments, such inconsistencies arise in every aspect of cognitive processing, including action generation, recognition of perceptual outcomes, and the learning of resultant new experience. However, at the moment of encountering such an inconsistency, the processes cannot simply be terminated. Instead, each process attempts to change its current relations, as if expecting that the inconsistency will be resolved sometime in the future, as long as the interaction continues (Gunji & Konno, 1991).

We can experience something analogous to this when we go to a gig of "cutting-edge" contemporary jazz. A brilliant tenor sax player like the late Michael Brecker often started a tune calmly, with familiar improvised phrases, but his playing and that of the other band members grew tense gradually through mutual responses. Near the peak of the tension, likely to break down at any moment, his playing sometimes got stuck for an instant, as his body control for blowing or tonguing seemed unable to catch up with his rushing image any more. In the next moment, however, an unbelievable tension of sound and phrase burst out. His genuine creativity in such thrilling playing resulted not merely from his outstanding skills at improvising phrases or his perfect control of the instrument, but originated from the urgent struggle to enact his exploding mental images and intentions.

It is interesting to note that cognitive minds appear to maintain two processes moving in opposite directions, one toward stability and the other toward instability. Goal directedness can be considered an attempt to achieve the stability of the system by resolving its currently observed inconsistencies. All processes of recognition, generation, and learning can be regarded as goal-directed activities, which can be accounted for, for example, by the prediction error minimization principle employed in our models. These activities are geared toward grounding, as shown in some of our robotics experiments. However, such goal-directed attempts always entail instability, because of their embodiment as well as the potential openness of the adopted environment, which results in the groundlessness we have witnessed in our other robotics experiments. The coexistence of stable and unstable natures does not allow the system state simply to converge, but imbues the system with autonomy for generating itinerant trajectories (Tsuda, 2001; Ikegami & Iizuka, 2007; Ikegami, 2013), wherein we can find the vividness of a living system.

Overviewing my research history, I am now sure that both research attitudes are equally important for the goal of understanding the mind via synthesis. On the one side, it is crucial to build rational models of cognition with the goal of optimizing and stabilizing each elementary cognitive process. On the other side, it is equally crucial to explore the dynamic aspects of mind while optimization is yet to be achieved, during the ongoing process of robots acting in the world. The former line of research can be advanced considerably by drawing on recent results from the booming research programs in machine learning and deep learning, in which the connectionist approach employing the error backpropagation scheme has been revived by introducing more elegant mathematics into the models than was available in the 1980s. For further advancement of the latter, we need to explore a methodology for articulating the subjective experience of the experimenters who are within the interaction loop of the robotics experiment.

What we need to do is to further enhance the circular loop between the objective science of modeling cognitive mechanisms and the practice of articulating subjective experience. This exactly follows what Varela and colleagues proposed in The Embodied Mind (Varela et al., 1991) and in their so-called neurophenomenology program (Varela, 1996). Varela and colleagues proposed to build a bridge between mind in science and mind in experience by articulating a dialogue between the two traditions of Western cognitive science and Buddhist meditative psychology (Varela et al., 1991, xviii). Why Buddhist meditation for the analysis of subjective experience? Because the Buddhist tradition of meditation practice, spanning more than 26 centuries, has achieved systematic and pragmatic disciplines for accessing human experience. Parts of these Buddhist meditation disciplines could be applied directly to our problem of how to articulate the subjective experience of the experimenter in the robotics experiment loop.

The Buddhist mindful awareness tradition starts with practices that suspend the habitual attitudes taken for granted in everyday life (Varela et al., 1991). By practicing this suspension of the habitual attitude, meditators become able to let their minds present themselves, or go by themselves, by developing a mood of stepping back. Analogously, if we attempt to develop ultimately natural, spontaneous, mindful interactions between robots and humans, we should rid the human subjects of arbitrary presuppositions, such as what robots or humans should or should not do, which have been assumed in the conventional human–robot interaction framework. In my own experience of interacting with the robot described in section 10.2, when I was more absorbed in the interaction, concentrating on tactile perception of the robot's movement in my grasp, I felt more vividness in the robot's movement and also experienced more spontaneous arousal of kinesthetic images of my own movement. The ongoing interaction was dominated neither by my subjectivity nor by the objectivity of the robot. It was like floating in the middle way between the two extremes of subjectivity and objectivity. Such intensive interaction alternated between a more tense, conflictive phase and a more relaxed one, as I already mentioned. It should be noted that the continuance of such subtle interaction depended on how diverse memory patterns were consolidated by developing a generalized deep structure in the dynamic neural network used in the robot. The more deeply the memory structure develops, the more intriguing the generated images become. The enhancement of the employed models contributes greatly to the realization of sensible interactions between robots and human subjects.

In summary, it is highly expected that the goal of understanding the mind can be achieved by making efforts in both objective science and subjective experience, one investigating more effective cognitive models that assure better performance and scalability, the other practicing to achieve truly mindful interaction with the robots. The true features of the mind should be captured by undertaking such research trials, moving back and forth between the exploration of objective science and subjective experience.

11.4. Future Directions

Although this book has not concentrated on modeling the biological reality of the brain in detail, recent exciting findings in systems-level neuroscience draw me to explore this area of research more explicitly. The sizeable amount of human brain imaging data gathered to date has enabled a global map to be created of both static and dynamic connectivity between all the different cortical areas (Sporns, 2010). Thanks to such data, now might be a good time to start trying to reconstruct a global model of the brain, so that we can synthetically examine what sorts of brain functions appear locally and globally under both static and dynamic connectivity constraints. In the process, we may also examine how these models correspond with evidence from neuroscience.

An exciting future task might be to build a large-scale brain network using either rate-coding neural units or spiking neurons for artificial humanoid brains. Such experiments have already been started by some researchers, including Edelman's group (Fleischer et al., 2007) and Eliasmith (2014), by introducing millions of spiking neurons into their models. I should emphasize, however, that large scale does not mean a complete replica of real brains. We still need a good abstraction of the biological reality to build tractable models. We may not need to reconstruct the whole brain by simulating the activity of 100 billion biologically plausible neurons interconnected in columnar structures, as aimed at by the Blue Brain project (see section 5.4).

Interestingly, it has recently been shown that some connectionist-type neural network models, using several orders of magnitude fewer rate-coding neural units, can exhibit human-level performance in specific tasks such as visual object recognition. It was shown that the so-called convolutional neural network (CNN; LeCun et al., 1998), developed as inspired by the hierarchical organization of the visual cortex, can learn to classify visual images of hundreds of object types, such as bicycles, cars, chairs, tables, and guitars, in diverse views and sizes, with an error rate of 0.0665, using a training set of 1 million static visual images (Szegedy et al., 2015). Although this classification accuracy is close to that of humans (Szegedy et al., 2015), a surprising fact is that the CNN used, consisting of 30 layers, contains only around a million almost homogeneous rate-coding neural units, whereas the real visual cortex contains 10 billion spiking neurons with hundreds of different morpho-electrical types (see section 5.4). This implies that the activity of 10 thousand spiking neurons could be represented by that of a single rate-coding neural unit, as a point mass, in connectionist models without degrading the performance level, as I presumed in section 5.4. It can also be inferred that the known diversity in cell types, as well as in synaptic connection types, can be regarded as biological detail that may not contribute to a primary system-level understanding of brain mechanisms, such as how visual objects are classified in brains. Building a large-scale brain network model consisting of a dozen major brain areas as its subnetworks, allocating around 10 million rate-coding neural units in total, may not be so difficult even in the current computational environment of using clusters. Now, we can start such an enterprise, referred to here as the Humanoid Brain project. The Humanoid Brain project would clarify the underlying mechanisms of the functional differentiation observed across local areas in our brains, in terms of downward causation by the functional connectivity and the multiple spatio-temporal scales property evidenced in human brains, and by embodiment in terms of structural coupling of the peripheral cortical areas with sensory-motor reality.
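As a minimal illustration of what a "rate-coding" convolutional hierarchy looks like in practice, here is a toy classifier sketch (PyTorch is an assumed framework choice; this is nothing like the 30-layer network of Szegedy et al., only the smallest structural analogue):

```python
import torch
import torch.nn as nn

# Toy convolutional hierarchy of rate-coding units: each unit emits a single
# continuous activation (a "rate") rather than spike trains. Illustrative
# sketch only, far smaller than the networks discussed in the text.
class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # low-level features
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # mid-level features
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        h = self.features(x)                  # hierarchical feature extraction
        return self.classifier(h.flatten(1))  # class scores from top features

model = TinyCNN()
image_batch = torch.randn(4, 3, 32, 32)  # four dummy 32x32 RGB images
print(model(image_batch).shape)          # -> torch.Size([4, 10])
```

Every unit in this hierarchy is a point-mass abstraction of the kind described above: one continuous value standing in for the pooled firing rate of a large population of biological neurons.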

Another line of meaningful extension, in terms of neuro-phenomenological robotics, would be the exploration of the underlying mechanisms of various psychiatric diseases, including schizophrenia, autism, and depression. Actually, my colleagues and I have started studies in this direction, which have already shown initial results. Yamashita and Tani (2012) proposed that disturbance of self, a major symptom of schizophrenia, can be explained as compensation for adaptive behavior by means of error regression. A neurorobotics model was built, inspired by the disconnectivity hypothesis of Friston (1998), which suggests that the basic pathology of schizophrenia may be associated with functional disconnectivity in the hierarchical network of the brain (i.e., between prefrontal and posterior brain regions). In the neurorobotics experiment (Yamashita & Tani, 2012) using an MTRNN model, a humanoid robot was trained on a set of behavioral tasks. After the training, a certain amount of perturbation was applied to the connectivity weights between the higher and lower levels to represent the disconnectivity. When the robot performed the trained tasks with online error regression, inner prediction error was generated because of the disconnectivity introduced. Consequently, the intention state at the higher level was modulated autonomously by the error signal back-propagated from the lower perception level. This observation suggests that aberrant modulatory signals induced by internally generated prediction error might be a source of the patient's feeling that his intention is affected by some outside force.

Furthermore, the experimental result by Yamashita and Tani (2012) suggests a hypothetical account of another schizophrenia symptom, cognitive fragmentation (Perry & Braff, 1994), in which patients lack continuity in spatiotemporal perception. It is speculated that such cognitive fragmentation might be caused by frequent occurrences of inner prediction error, because the subjective experience of time passing can be considered to be associated with prediction error at segmentation points in the perceptual flow, as I analyzed in section 8.3.

In future research, the mechanism of autism could be clarified in terms of another type of malfunction in the predictive coding scheme presumed in the brain. Recently, Van de Cruys and colleagues (2014) proposed that a hyper-prior with less tolerance for prediction error results in a failure of generalization in learning, which is the primary cause of autism. Intuitively, the prediction network can develop an overfitting problem, with generalization error, when the top-down pressure for minimizing the error in learning is imposed on the network too strongly. This generalization error in predicting the coming perceptual state could be considered the main cause of autism, given the accumulated evidence of the patients' typical symptom that they are significantly good at learning by rote but lack capability in structural learning (Van de Cruys et al., 2014; Nagai & Asada, 2015). A robotics experiment reconstructing the symptom could be conducted by modeling the hyper-prior, implementing the estimation of inverse precision used in the Bayesian predictive coding framework (Friston, 2005; Murata et al., 2015), as Van de Cruys and colleagues (2014) rationalized that overestimation of the precision under noisy real-world circumstances can result in overfitting of the prediction model. Future studies should examine other psychiatric diseases, including attention deficit hyperactivity disorder and obsessive-compulsive disorder. In summary, if a particular neurorobotics model represents a good model of the human mind, it should be able to account also for the underlying mechanisms of these common psychiatric pathologies, because the brain structures of these patients are known not to be so different from normal ones.
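The overfitting intuition can be made concrete with a precision-weighted loss. The sketch below is an illustrative assumption, not the model of Van de Cruys et al. or Murata et al.: a posterior weight estimate trades off precision-weighted prediction error against a prior, and overestimating the precision lets noisy samples swamp the prior:

```python
import numpy as np

# Precision-weighted Bayesian regression sketch: MAP weights minimize
#   precision * ||y - Xw||^2 + ||w||^2 .
# Overestimating the precision (hyper-prior on sensory reliability) drives
# training error toward zero at the cost of generalization: "rote" fitting
# of noise rather than structural learning. Illustrative assumption only.

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 15)
y = 0.5 * x + rng.normal(scale=0.2, size=x.size)  # simple rule + sensory noise
X = np.vander(x, 10)                              # flexible polynomial features

def posterior_weights(X, y, precision):
    """MAP solution of the precision-weighted loss with a unit Gaussian prior."""
    d = X.shape[1]
    return np.linalg.solve(precision * X.T @ X + np.eye(d),
                           precision * X.T @ y)

for precision in (1.0, 1e6):                      # moderate vs hyper-precision
    w = posterior_weights(X, y, precision)
    train_err = np.mean((X @ w - y) ** 2)
    print(f"precision={precision:8.0f}  train error={train_err:.4f}  "
          f"weight norm={np.linalg.norm(w):.1f}")
# Hyper-precision yields near-zero training error with blown-up weights,
# the signature of fitting noise instead of the underlying regularity.
```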

Another crucial question is how far we can scale up the neurorobots described in this book, as I know well that my robots can still work only in toy environments. Although I would say that progress in neurorobotics has thus far been steady, actually scaling robots to a near-human level might be very difficult. Confronted with this challenge, recent pragmatic studies of deep learning (Hinton et al., 2006; Bengio et al., 2013) have revived the aging connectionist approaches, supercharged with the huge computational power latent in the (multiple) graphics processing units of standard desktop PCs. Already, some deep learning schemes have demonstrated significant advances in perception and recognition capabilities by using millions of exemplar datasets for learning. For example, a convolutional neural network (LeCun et al., 1998) can perform visual object classification with near human-level performance by learning (Szegedy et al., 2015), as described previously in this section, and a speech recognition system has provided a far better recognition rate on noisy speech signals of unspecified speakers than widely used, state-of-the-art commercial speech recognition systems (Hannun et al., 2014). The handwriting recognition system using long short-term memory by Doetsch and colleagues (2014) demonstrated almost human-equivalent recognition performance.

Such promising results seem to justify some optimism that the artificial upscaling to human-like cognitive capabilities using these methods may not be so difficult. Optimists may say that these systems can exhibit near human-level perceptual capabilities. Although this should be true for recognition within a single perceptual modality, it is clear that deep understanding of the world at a human level cannot be achieved by this alone. Such understanding should require associative integration among multiple modalities of perceptual flow, experienced through the iterative interactions of agents with the world.

Regardless, these and other recent advances in deep learning suggest that neurorobotics studies could be scaled significantly with the aforementioned large-scale brain network model, if massive training libraries are used alongside multimodal, high-dimensional perceptual flow, including the pixel-level visual stream, like the one shown by Hwang et al. (2015) and briefly described in section 9.3, and tactile sensation via hundreds of thousands of points of contact covering an entire "skin" surface; likewise for auditory signals, olfactory "organs," and so on. So empowered, with online experience actively associated with its own intentional interaction with the world, deep minds near the human level might appear as a consequence. When a robot becomes able to develop subjective, proactive self-images in huge numbers of dimensions, alongside its own unique "real-time" perceptual flow as it interacts with the world, we may approach the reconstruction of real human minds! Attempts to scale neurorobots toward human-like being are, of course, scientifically fascinating, and the developmental robotics community has already begun investigating this issue seriously (Kuniyoshi & Sangawa, 2006; Oudeyer et al., 2007; Asada et al., 2009; Metta et al., 2010; Asada, 2014; Cangelosi & Schlesinger, 2015; Ugur et al., 2015).

However, what is crucially missing from current models is general intelligence, by way of which various tasks across different domains can be completed by adaptively combining available cognitive resources through functions such as inference, induction, inhibition of habituation, imitation, improvisation, simulation, working memory retrieval, and planning, among many others. One amazing aspect of human competency is that we can perform such a wide variety of tasks: navigating, dancing, designing intricate structures, cleaning rooms, talking with others, painting pictures, deliberating over mathematical equations, and searching the Internet for information on neurorobotics, simply to name a few. Compared with this, what our robots can do is merely navigate a given workspace or manipulate simple objects. So, taking our work one stage further logically involves educating robots to perform multiple-domain tasks toward multiple goals with increasing degrees of complexity. Success in this endeavor should lead to a more general intelligence.

Toward this end, the crucial question becomes how to increase the amount of learning. This is not easy, however, because we cannot train robots simply by connecting them to the Internet or to a database. Robots must act on the physical environment to acquire their own experiences. So, researchers must provide a certain developmental educational environment wherein robots can be tutored every day for months, or possibly for years. And, as robots must be educated within various task domains, this environment is necessarily more complex than a long series of still photos.

In considering the developmental education of robots, an essential question that still remains is how humans, or artifacts like robots, can acquire structural representations of the world by learning through experience under the constraint of the "poverty of the stimulus," as Noam Chomsky (1972) once asked. This is to ask how generalization in learning can be achieved, for example, in robots with a limited amount of tutoring experience. For this question, developmental robotics could provide a possible solution by using the concept of staged development considered by Piaget (1951). The expectation is that learning in one developmental stage can provide a "prior" for learning in the next stage, by which the dimensionality of the learning can be drastically reduced, so that generalization from a smaller amount of tutoring experience becomes possible. Based on this conception, developmental stages would proceed from the physically embodied level to more symbolic levels. Trials should require a lengthy period wherein physical interactions between robots and tutors involve "scaffolding": guiding support provided by tutors that enables the bootstrapping of the cognitive and social skills required at the next stage (Metta et al., 2010). With scaffolding, higher level functions are entrained alongside foundational perceptual abilities during tutoring, and the robot's cognitive capacities develop from the grounding of simple sensory-motor skills to more complex, compositional cognitive ones. It could happen that the earlier stages require merely sensory–motor level interaction with the environment, physically guided by tutors, whereas the later stages provide tutoring more in a demonstration and imitation style, without physical guidance. The very final stage of education may require only the use of virtual environments (like learning from watching videos) or symbolically represented materials (like reading books). For the implementation of such staged tutoring and development of robots, research on methods for the tutor or educator side may become equally important.

In the aforementioned developmental tutoring process, a robot should not be a passive learner. Rather, it should be an active learner that acts "creatively" in exploring the world, not merely repeating acquired skills or habits. For this purpose, robots should become authentic beings, as I have mentioned repeatedly, by reflecting seriously on their own pasts and by acting proactively toward their ownmost possibilities, shared with the tutors.

Tutoring interaction between such active-learner robots and human tutors will inevitably become highly intensive at times. To carry out long-term and sometimes intensive educational interactions, the development of emotions within the robot would be an indispensable aid. Although this issue has been neglected in this book, to take care of robots like children, human tutors would require emotional responses from the robots. Otherwise, many human tutors may not be able to continue cordial interactions with stone-cold, nonliving machines for such long periods. The development of adequate emotional responses should deepen the bonds between tutors and robots, by which long-term, affectively reinforced education would become possible. Minoru Asada has proposed so-called affective developmental robotics (Asada, 2015), in which he assumes multiple stages of emotional development from simple to complex, including emotional contagion, emotional empathy, cognitive empathy, and sympathy. His crucial premise is that the development of emotion and that of embodied social interaction are codependent on each other. Consequently, the long-term educational processes of robots by human caregivers should be accompanied by these two codependent channels of development.

Finally, a difficult but important problem to be considered is whether artifacts can embody and express moral virtue. Aristotle says that moral virtues are not innate, but can be acquired through habitual practice. It is said that an individual becomes truthful by acting truthfully, or unselfish by acting unselfishly. Simultaneously, human beings are motivated to do something "good" for others because they share in the consequences of their actions by means of mirror neurons. The net effect is that, as one human seeks happiness for him- or herself, he or she experiences happiness in bringing happiness to others similarly embodied. In principle, robots can do the same, by learning the effects of their own actions on the happiness expressed by others, reinforced through mirroring neural models. I would like to prove that robots can be developed or educated to acquire not only sophisticated cognitive competency but also moral virtue. Nowadays, robots may start to have "free will," as I have postulated in this book. This means that such robots could happen to generate bad behaviors toward others as well, by their own wills. However, if the robots can learn about moral virtue, they would generate only good behaviors, inhibiting themselves from generating bad ones. Such robots would contribute to true happiness in a future society in which humans and robots coexist.

11.5. Summary

This final section overviews the whole book once again, for the purpose of providing final concluding remarks.

This book sought to account for subjective experience, characterized on the one hand by the compositionality of higher-order cognition and on the other by fluid and spontaneous interaction with the outer world, through the examination of synthetic neurorobotics experiments conducted by the author. In essence, this is to inquire into the essential, dynamical nature of the mind. The book was organized into two parts, namely "Part I: On the Mind" and "Part II: Emergent Minds: Findings from Robotics Experiments." In Part I, the book reviewed how different questions about minds have been explored in different research fields, including cognitive science, phenomenology, brain science, psychology, and synthetic modeling. Part II started with new proposals for tackling open problems through neurorobotics experiments. Let us once again look briefly at each chapter to summarize.

Part I started with an introduction to cognitivism in chapter 2, emphasizing "compositionality," considered to be a uniquely human competency whereby knowledge of the world is represented by utilizing symbols. Some representative cognitive models were introduced that address the issues of problem solving in problem spaces and the abstraction of information by using "chunking" and hierarchy. This chapter suggested, however, a potential difficulty in utilizing symbols internal to the mechanics of minds, especially in attempting to ground symbols in real-time, online, sensory-motor reality and context.

Chapter 3, on phenomenology, introduced views of the mind from the other extreme, emphasizing direct or pure experiences prior to their being articulated with particular knowledge or symbols. The chapter covered the ideas of subjective time by Husserl, being-in-the-world by Heidegger, embodiment by Merleau-Ponty, and the stream of consciousness by James. By emphasizing the cycle of perception and action in the physical world via embodiment, we explored how philosophers have tackled the problem of the inseparable complex that is the subjective mind and the objective world. It was also shown that notions of consciousness and free will may be clarified through phenomenological analysis.

Chapter 4 attempted to explain how human brains can support cognitive mechanisms through a review of current knowledge in the field of neuroscience. To start with, we looked at a possible hierarchy in brains that supports complex visual recognition and action generation. We then considered the possibility that two cognitive functions, generating actions and recognizing perceptual reality, are just two sides of the same coin by reviewing empirical studies on mirror neurons and the parietal cortices. This chapter also examined the issue of the origin of free will by reviewing the experimental study conducted by Libet (1985). Despite the recent accumulation of various experimental findings in neuroscience, the chapter concluded that a complete understanding of the neuronal mechanisms accounting for the cognitive functions of interest is not yet within grasp, owing to conflicting evidence and the limitations inherent in experimental observation in neuroscience.

Chapter 5 introduced the dynamical systems approach for modeling embodied cognition in both natural and artificial systems. The chapter began with a tutorial on nonlinear dynamical systems. Building on this tutorial, the chapter described the Gibsonian and neo-Gibsonian ideas in psychology that fit quite well with the dynamical systems framework, and explained how they have influenced the communities of behavior-based robotics and neurorobotics. Some representative neurorobotics studies were introduced, investigating how primitive behaviors can develop and be explained from the dynamical systems perspective.

Chapter 6, as the first chapter of Part II, proposed new paradigms for understanding cognitive minds by taking a synthetic approach utilizing neurorobotics experiments. First, the chapter noted the potential difficulty of clarifying the essence of minds by pursuing only the bottom-up pathway emphasized by the behavior-based approach. It was then argued that what is missing are the top-down subjective intentions for acting on the objective world and their iterative interaction with the bottom-up perceptual reality. It was speculated that human-like capabilities for dealing with compositional language-thoughts, or even much simpler cognitive schemes, should emerge as the result of iterative interactions between these two pathways, top to bottom and bottom to top, rather than from one-way processes along the bottom-up pathway alone. It was furthermore speculated that a key to solving the so-called hard problem of consciousness and free will could be found on close examination of such interactions.

Based on the thoughts described in chapter 6, the new challenges discussed in chapters 7 through 10 concerned the reconstruction of various cognitive or psychological behaviors in a set of synthetic neurorobotics experiments. In these robotics studies, our research focus went back and forth between two fundamental issues. On the one hand, we explored how compositionality for cognition can develop via iterative sensory-motor level interactions of agents with their environments, and how such compositional representations can be grounded. On the other hand, we examined the codependent relationship between the subjective mind and the objective world that emerges in their dense interaction, for the purpose of investigating the underlying structure of consciousness and free will.

In the first half of chapter 7, we investigated the development of compositionality by reviewing a robotics experiment on predictive navigation learning using a simple RNN model. The experimental results showed that the compositionality hidden in the topological trajectories through the obstacle environment can be extracted and embedded in a global attractor with fractal structure in the phase space of the RNN model. It was shown that the compositional representation developed in the RNN can be naturally grounded in the physical environment by allowing iterative interactions between the two in a shared metric space.
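
The predictive learning scheme underlying this experiment can be made concrete with a toy sketch. The following minimal stand-in (it is not the original navigation model) trains a small Elman-style recurrent network, written in plain NumPy, by back-propagation through time to predict the next value of a one-dimensional periodic "sensory" stream; the network size, learning rate, and target sequence are all illustrative assumptions.

# Minimal sketch of predictive learning: an Elman-style RNN is trained by
# back-propagation through time to predict its next sensory input. This is
# an illustrative stand-in, not the original navigation model; the sizes,
# learning rate, and sine-wave "sensation" are assumptions.
import numpy as np

rng = np.random.default_rng(0)
T, H = 50, 16                                  # sequence length, hidden units
x = np.sin(np.linspace(0, 4 * np.pi, T + 1))[:, None]   # 1-D sensory stream

Wxh = rng.normal(0, 0.3, (1, H))               # input -> hidden
Whh = rng.normal(0, 0.3, (H, H))               # hidden -> hidden (context loop)
Why = rng.normal(0, 0.3, (H, 1))               # hidden -> predicted next input
lr = 0.01

for epoch in range(2000):
    h = np.zeros(H)
    hs, ys = [h], []
    for t in range(T):                         # forward: predict x[t+1] from x[t]
        h = np.tanh(x[t] @ Wxh + h @ Whh)
        hs.append(h)
        ys.append(h @ Why)
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dh_next = np.zeros(H)
    for t in reversed(range(T)):               # backward pass through time
        dy = ys[t] - x[t + 1]                  # one-step prediction error
        dWhy += np.outer(hs[t + 1], dy)
        dz = (1 - hs[t + 1] ** 2) * (Why @ dy + dh_next)
        dWxh += np.outer(x[t], dz)
        dWhh += np.outer(hs[t], dz)
        dh_next = Whh @ dz
    for W, dW in ((Wxh, dWxh), (Whh, dWhh), (Why, dWhy)):
        W -= lr * np.clip(dW, -1.0, 1.0)       # clipped gradient step

err = np.mean([(y - x[t + 1]) ** 2 for t, y in enumerate(ys)])
print(f"final one-step prediction error: {err:.5f}")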

In the second half of chapter 7, on the other hand, we explored a sense of groundlessness (a sense of not being completely grounded) through the analysis of another navigation experiment. It was shown that the developmental learning process during exploration switched spontaneously between coherent and incoherent phases as chain reactions took place among different cognitive processes of recognition, prediction, perception, learning, and acting. By referring to Heidegger's example of a carpenter hitting nails with a hammer, it was explained that the distinction between the two poles of the subjective mind and the objective world becomes explicit in the breakdown, as shown in the incoherent phase whereby the "self" rises to conscious awareness. We drew the conclusion that an open dynamic structure characterized by self-organized criticality (SOC) can account for the underlying structure of consciousness, by way of which the "momentary self" appears spontaneously.
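
Self-organized criticality itself is easy to demonstrate in isolation. The sketch below is the canonical one-dimensional sandpile, in the spirit of Bak, Tang, and Wiesenfeld (1987), included only to illustrate the concept the chapter invokes; it is not the robot model, and the lattice size, toppling threshold, and drive schedule are arbitrary assumptions.

# Minimal sketch of self-organized criticality: a one-dimensional sandpile
# in the spirit of Bak, Tang, and Wiesenfeld (1987). It only illustrates the
# SOC concept invoked in the text; it is not the robot model, and the lattice
# size, toppling threshold, and drive are arbitrary assumptions.
import random

random.seed(1)
N, THRESH = 100, 2
pile = [0] * N
avalanche_sizes = []

for grain in range(20000):
    pile[random.randrange(N)] += 1             # slow external drive
    size = 0
    unstable = [i for i in range(N) if pile[i] >= THRESH]
    while unstable:                            # fast relaxation: topplings
        i = unstable.pop()
        if pile[i] < THRESH:
            continue
        pile[i] -= 2                           # topple: shed one grain each way
        size += 1
        for j in (i - 1, i + 1):
            if 0 <= j < N:                     # grains fall off at the edges
                pile[j] += 1
                if pile[j] >= THRESH:
                    unstable.append(j)
    avalanche_sizes.append(size)

big = sum(1 for s in avalanche_sizes if s > 50)
print(f"avalanches: {len(avalanche_sizes)}, large ones (>50 topplings): {big}")

After a transient, the pile hovers near its critical state, producing mostly small avalanches punctuated by occasional system-wide ones, which is the signature the chapter borrows when describing the spontaneous alternation between coherent and incoherent phases.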

Chapter 8 introduced the RNNPB as a model of the mirror neurons that have been considered crucially responsible for the composition and decomposition of actions. The RNNPB can learn a set of behavior primitives for generation as well as for recognition by means of error minimization in a predictive coding framework. The RNNPB model was evaluated through a set of robotics experiments, including the learning of multiple movement patterns, an imitation game, and the associative learning of protolanguage and action, whereby the following characteristics emerged: (1) the model can recognize aspects of a continuous perceptual flow by segmenting it into a sequence of chunks or reusable primitives; (2) a set of actional concepts can be learned with generalization by developing relational structures among those concepts in the neural activation space, as shown in the experiment on associative learning between protolanguage and actions; and (3) the model can generate not only learned behavior patterns but also novel ones by means of twists or dimples generated in the manifold of the RNNPB due to the potential nonlinearity of the network.
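
The recognition side of this scheme, inferring which primitive is being observed by minimizing prediction error with respect to the parametric bias (PB) while the learned connection weights stay fixed, can be sketched as follows. The network below is deliberately tiny and its weights are frozen random values standing in for a trained model; only the inference loop matters here, and all sizes, scales, and the finite-difference gradient are illustrative assumptions.

# Minimal sketch of RNNPB-style recognition: with the connection weights
# frozen, the parametric bias (PB) vector is regressed by gradient descent
# on the prediction error over an observed sequence. The tiny network, its
# frozen random weights, and the finite-difference gradient are assumptions.
import numpy as np

rng = np.random.default_rng(2)
H, T, PB = 12, 30, 2
Wh = rng.normal(0, 0.25, (H, H))               # recurrent weights (frozen)
Wp = rng.normal(0, 0.8, (PB, H))               # PB -> hidden (frozen)
Wo = rng.normal(0, 0.8, (H, 1))                # hidden -> output (frozen)

def generate(pb):
    """Closed-loop rollout: the PB vector shapes the whole trajectory."""
    h, out = np.zeros(H), []
    for _ in range(T):
        h = np.tanh(h @ Wh + pb @ Wp)
        out.append((h @ Wo)[0])
    return np.array(out)

def error(pb, observed):
    return np.mean((generate(pb) - observed) ** 2)

pb_true = np.array([0.9, -0.6])                # the demonstrator's primitive
observed = generate(pb_true)                   # sequence shown to the learner

pb = np.zeros(PB)                              # recognition starts neutral
eps, lr = 1e-4, 0.2
for step in range(500):                        # weights untouched: only PB moves
    grad = np.array([
        (error(pb + eps * np.eye(PB)[k], observed)
         - error(pb - eps * np.eye(PB)[k], observed)) / (2 * eps)
        for k in range(PB)])
    pb -= lr * grad
print("inferred PB:", pb.round(2), " residual error:", round(error(pb, observed), 6))

In the actual RNNPB the same error signal is back-propagated through the trained network rather than estimated by finite differences, but the division of labor is the same: generation varies the output given the PB, while recognition varies the PB given the observation.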

Chapter 9 addressed the issue of hierarchy in cognitive systems. For this purpose, we proposed a dynamic model, the MTRNN, which is characterized by its multiple timescales, and examined how a functional hierarchy for action can develop in the model through robotics experiments employing it. Results showed that a set of behavior primitives developed in the fast-timescale network at the lower level, while the whole action plan that sequences those behavior primitives developed in the slow-timescale network at the higher level. It was also found that the initial neural activation state in the slow-timescale network encoded the top-down actional intention that triggers the generation of a corresponding slow dynamics trajectory in the higher level, which in turn triggers the projection of an intended sequence of behavior primitives from the lower level of the network to the outer world. It was concluded that a sort of "fluid compositionality" for the smooth and flexible generation of actions was achieved in the proposed MTRNN model through the self-organization of a functional hierarchy, by adopting neuroscientifically plausible constraints including timescale differences among different local networks and the structural connectivity among them as downward causation.
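
The timescale mechanism at the core of the MTRNN is simple to state: each unit is a leaky integrator whose time constant tau sets how quickly its potential tracks its synaptic input, with small tau assigned to the lower (fast) group and large tau to the higher (slow) group. A minimal sketch of this update rule, with untrained random weights and illustrative sizes and tau values, is given below.

# Minimal sketch of the MTRNN's core update rule: leaky-integrator (CTRNN)
# units with a small time constant in the fast (lower) group and a large one
# in the slow (higher) group. Weights are random and untrained; all sizes
# and tau values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
n_fast, n_slow = 20, 10
N = n_fast + n_slow
tau = np.array([2.0] * n_fast + [100.0] * n_slow)   # per-unit time constants
W = rng.normal(0, 0.5 / np.sqrt(N), (N, N))         # full recurrent coupling
u = rng.normal(0, 1.0, N)                           # membrane potentials

for t in range(401):
    a = np.tanh(u)                                  # firing rates
    # leaky integration: u <- (1 - 1/tau) * u + (1/tau) * (synaptic input)
    u = (1.0 - 1.0 / tau) * u + (1.0 / tau) * (W @ a)
    if t % 100 == 0:
        new_a = np.tanh(u)
        d_fast = np.abs(new_a[:n_fast] - a[:n_fast]).mean()
        d_slow = np.abs(new_a[n_fast:] - a[n_fast:]).mean()
        print(f"t={t:3d}  mean |d activation|  fast={d_fast:.4f}  slow={d_slow:.4f}")

Running the sketch shows the fast units' activations changing substantially from step to step while the slow units drift, the raw asymmetry from which the functional hierarchy described above can self-organize.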

Chapter 10 then considered two problems concerning free will: one involved its origin, and the other the conscious awareness of it. From the results of experiments employing the MTRNN model, I proposed that actional intention can be generated spontaneously by means of chaos in the higher cognitive brain areas. It was postulated that an intention or will developed unconsciously by chaos in the higher cognitive brain would come to conscious awareness only in a postdictive manner. More specifically, when a gap emerges between the top-down intention for acting and the bottom-up perception of reality, the intention may be noticed as the effort of minimizing this gap is exercised.
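
How deterministic chaos can play the role of a spontaneous intention generator can be caricatured in a few lines. In the sketch below, a logistic map in its chaotic regime stands in for the higher-level neural dynamics, and coarse-graining its state selects one of three hypothetical behavior primitives; the switching uses no random numbers, yet its long-term course is practically unpredictable. The primitive labels and the map itself are assumptions for illustration only, not the MTRNN dynamics.

# Minimal caricature of chaos-driven spontaneous intention: a logistic map
# (deterministic, no random numbers) is coarse-grained into one of three
# hypothetical "intentions." It illustrates unpredictable-yet-deterministic
# switching only; the labels and the map are assumptions, not the MTRNN.
PRIMITIVES = ["reach", "grasp", "withdraw"]        # assumed action labels

def logistic(x):
    return 3.9 * x * (1.0 - x)                     # chaotic regime of the map

x, current, dwell = 0.123, None, 0
for step in range(60):
    x = logistic(x)
    intention = PRIMITIVES[min(int(x * 3), 2)]     # coarse-grain the state
    if intention != current:                       # a new intention "pops out"
        if current is not None:
            print(f"step {step:2d}: {current} (held {dwell} steps) -> {intention}")
        current, dwell = intention, 0
    dwell += 1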

Furthermore, the chapter examined the circular causality developed among different cognitive processes in human–robot interactive tutoring experiments. It was conjectured that free will could exist in the subjective experience of the human experimenter as well as of the robot, as each seeks its ownmost possibility in their conflictive interaction, when it feels as if any creative image for the next act could pop out freely in the mind. At such moments, the robot as well as the human could be regarded as an authentic being.

Finally, some concluding remarks. The argument presented here leads to the following:

1. The mind should emerge via intricate interactions between the top-down subjective view for proactively acting on the external world and the bottom-up recognition of the perceptual reality.

2. Structures and functions constituting the mechanisms driving higher-order cognition, such as the compositional manipulation of symbols, concepts, or linguistic thoughts, may develop by means of the self-organization of neurodynamic structures through the aforementioned top-down and bottom-up interactions, aiming at the reduction of any apparent conflict between the two processing streams. It is presumed that such compositional cognitive processes, embedded in neurodynamic attractors, could be naturally grounded in the physical world, provided that they share the same metric space for interaction.

3. Images or knowledge can develop through multiple stages of learning from an agent's limited experiences. In the first stage, each instance of experience is acquired; in the second stage, generalized images or concepts develop as relational structures are extracted among the acquired instances in memory; in the third stage, novel or creative structures can be found in the memory thus developed, owing to its nonlinearity. Such a developmental process should take place in a large network consisting of the PFC, the parietal cortex, and the sensory-motor peripheral areas, which are assumed to be the neocortical target of consolidative learning in humans and other mammals.

4. However, the most crucial aspect of minds is the sense of groundlessness that arises through circular causality, understood in the end as the inseparability of subjectivity and the objective world. This understanding could shed light on the hard problem of consciousness and its relationship to the problem of free will, through a unification of theoretical studies on the SOC of the evolved holistic dynamics with Heidegger's thoughts on authenticity.

5. The exploration of cognitive minds should continue in close dialogue between objective science and subjective experience (as suggested by Varela and others), to which synthetic approaches, including cognitive, developmental, or neuronal robotics, could contribute by providing effective research platforms.

Glossary for Abbreviations

BPTT back-propagation through time
CPG central pattern generator
CTRNN continuous-time recurrent neural network
DOF degree of freedom
EEG electroencephalography
fMRI functional magnetic resonance imaging
IPL inferior parietal lobe
LGN lateral geniculate nucleus
LIP lateral intraparietal area
LSBN large-scale brain network
LSTM long short-term memory
LRP lateralized readiness potential
M1 primary motor cortex
MST medial superior temporal area
MSTNN multiple spatiotemporal scales neural network
MT middle temporal area
MTRNN multiple timescale recurrent neural network
PB parametric bias
PC parietal cortex
PCA principal component analysis
PFC prefrontal cortex
PMC premotor cortex
PMv ventral premotor area
RNN recurrent neural network

RNNPB recurrent neural network with parametric biases
RP readiness potential
SMA supplementary motor area
SOC self-organized criticality
STS superior temporal sulcus
TEO inferior temporal area
TPJ temporoparietal junction
V1 primary visual cortex
VIP ventral intraparietal area
VP visuo-proprioceptive

References

Aihara, K., Takabe, T., & Toyoda, M. (1990). Chaotic neural networks. Physics Letters A, 144, 333–340.

Aristotle. (1907). De anima (R. D. Hicks, Trans.). Oxford: Oxford University Press.

St Amant, R., & Riedl, M. O. (2001). A perception/action substrate for cognitive modeling in HCI. International Journal of Human-Computer Studies, 55(1), 15–39.

Amari, S. (1967). A theory of adaptive pattern classifiers. IEEE Transactions on Electronic Computers, 3, 299–307.

Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.

Andry, P., Gaussier, P., Moga, S., Banquet, J. P., & Nadel, J. (2001). Learning and communication via imitation: An autonomous robot perspective. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 31(5), 431–442.

Arbib, M. A. (1981). Perceptual structures and distributed motor control. In V. B. Brooks (Ed.), Handbook of physiology: The nervous system. II. Motor control (pp. 1448–1480). Bethesda, MD: American Physiological Society.

Arbib, M. (2010). Mirror system activity for action and language is embedded in the integration of dorsal and ventral pathways. Brain & Language, 112, 12–24.

Arbib, M. (2012). How the brain got language: The mirror system hypothesis. New York: Oxford University Press.

Arie, H., Endo, T., Arakaki, T., Sugano, S., & Tani, J. (2009). Creating novel goal-directed actions at criticality: A neuro-robotic experiment. New Mathematics and Natural Computation, 5(1), 307–334.

Arnold, L. (1995). Random dynamical systems. Berlin: Springer.

Asada, M., Hosoda, K., Kuniyoshi, Y., Ishiguro, H., Inui, T., Yoshikawa, Y., Ogino, M., & Yoshida, C. (2009). Cognitive developmental robotics: A survey. IEEE Transactions on Autonomous Mental Development, 1(1), 12–34.

Asada, M. (2015). Towards artificial empathy: How can artificial empathy follow the developmental pathway of natural empathy? International Journal of Social Robotics, 7(1), 19–33.

Bach, J. (2008). Principles of synthetic intelligence: Building blocks for an architecture of motivated cognition. New York: Oxford University Press.

Bach, K. (1987). Thought and reference. Oxford: Oxford University Press.

Badre, D., & D'Esposito, M. (2009). Is the rostro-caudal axis of the frontal lobe hierarchical? Nature Reviews Neuroscience, 10, 659–669.

Bak, P., Tang, C., & Wiesenfeld, K. (1987). Self-organized criticality: An explanation of the 1/f noise. Physical Review Letters, 59, 381–384.

Baldwin, D., Andersson, A., Saffran, J., & Meyer, M. (2008). Segmenting dynamic human action via statistical structure. Cognition, 106, 1382–1407.

Balslev, D., Nielsen, F. A., Paulson, O. B., & Law, I. (2005). Right temporoparietal cortex activation during visuo-proprioceptive conflict. Cerebral Cortex, 15(2), 166–169.

Baraglia, J., Nagai, Y., & Asada, M. (in press). Emergence of altruistic behavior through the minimization of prediction error. IEEE Transactions on Cognitive and Developmental Systems.

Bassett, D. S., & Gazzaniga, M. S. (2011). Understanding complexity in the human brain. Trends in Cognitive Sciences, 15(5), 200–209.

Beer, R. D. (1995a). On the dynamics of small continuous-time recurrent neural networks. Adaptive Behavior, 3(4), 471–511.

Beer, R. D. (1995b). A dynamical systems perspective on agent-environment interaction. Artificial Intelligence, 72(1), 73–215.

Beer, R. D. (2000). Dynamical approaches to cognitive science. Trends in Cognitive Sciences, 4(3), 91–99.

Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1798–1828.

Billard, A. (2000). Learning motor skills by imitation: A biologically inspired robotic model. Cybernetics and Systems, 32, 155–193.

Blakemore, S-J., & Sirigu, A. (2003). Action prediction in the cerebellum and in the parietal cortex. Experimental Brain Research, 153(2), 239–245.

Bor, D., & Seth, A. K. (2012). Consciousness and the prefrontal parietal network: Insights from attention, working memory, and chunking. Frontiers in Psychology, 3, 63.

Braitenberg, V. (1984). Vehicles: Experiments in synthetic psychology. Cambridge, MA: MIT Press.

Brooks, R. A. (1990). Elephants don't play chess. Robotics and Autonomous Systems, 6, 3–15.

Brooks, R. A. (1991). Intelligence without representation. Artificial Intelligence Journal, 47, 139–159.

Campbell, D. T. (1974). "Downward causation" in hierarchically organized biological systems. In Studies in the philosophy of biology (pp. 179–186). Macmillan Education UK.

Cangelosi, A., & Schlesinger, M. (2015). Developmental robotics: From babies to robots. Cambridge, MA: MIT Press.

Chalmers, D. J. (1995). Facing up to the problem of consciousness. Journal of Consciousness Studies, 2(3), 200–219.

Chomsky, N. (1972). Language and mind. New York: Harcourt Brace Jovanovich.

Choi, M., & Tani, J. (2016). Predictive coding for dynamic vision: Development of functional hierarchy in a multiple spatio-temporal scales RNN model. arXiv.org preprint arXiv:1606.01672.

Chomsky, N. (1980). Rules and representations. Oxford: Basil Blackwell.

Churchland, M. M., Yu, B. M., Cunningham, J. P., Sugrue, L. P., Cohen, M. R., Corrado, G. S., Newsome, W. T., Clark, A. M., Hosseini, P., Scott, B. B., Bradley, D. C., Smith, M. A., Kohn, A., Movshon, J. A., Armstrong, K. M., Moore, T., Chang, S. W., Snyder, L. H., Lisberger, S. G., Priebe, N. J., Finn, I. M., Ferster, D., Ryu, S. I., Santhanam, G., Sahani, M., & Shenoy, K. V. (2010). Stimulus onset quenches neural variability: A widespread cortical phenomenon. Nature Neuroscience, 13(3), 369–378.

Churchland, M. M., Cunningham, J. P., Kaufman, M. T., Nuyujukian, P., Foster, J. D., Ryu, S. I., & Shenoy, K. V. (2012). Structure of neural population dynamics during reaching. Nature, 487, 51–56.

Clark, A. (1998). Being there: Putting brain, body, and world together again. Cambridge, MA: MIT Press.

Clark, A., & Chalmers, D. (1998). The extended mind. Analysis, 58(1), 7–19.

Clark, A. (1999). An embodied cognitive science? Trends in Cognitive Sciences, 3(9), 345–351.

Clark, A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. New York: Oxford University Press.

Cleeremans, A., Servan-Schreiber, D., & McClelland, J. L. (1989). Finite state automata and simple recurrent networks. Neural Computation, 1, 372–381.

Cliff, D., Husbands, P., & Harvey, I. (1993). Explorations in evolutionary robotics. Adaptive Behavior, 2(1), 73–110.

Crutchfield, J. P., & Young, K. (1989). Inferring statistical complexity. Physical Review Letters, 63, 105–108.

Dale, R., & Spivey, M. J. (2005). From apples and oranges to symbolic dynamics: A framework for conciliating notions of cognitive representation. Journal of Experimental & Theoretical Artificial Intelligence, 17(4), 317–342.

Delcomyn, F. (1980). Neural basis of rhythmic behavior in animals. Science, 210, 492–498.

Demiris, Y., & Hayes, G. (2002). Imitation as a dual-route process featuring predictive and learning components: A biologically plausible computational model. In K. Dautenhahn & C. L. Nehaniv (Eds.), Imitation in animals and artifacts (pp. 327–361). Cambridge, MA: MIT Press.

Dennett, D. (1993). Review of F. Varela, E. Thompson and E. Rosch (Eds.), The embodied mind. American Journal of Psychology, 106, 121–126.

Desmurget, M., & Grafton, S. (2000). Forward modeling allows feedback control for fast reaching movements. Trends in Cognitive Sciences, 4(11), 423–431.

Desmurget, M., Reilly, K. T., Richard, N., Szathmari, A., Mottolese, C., & Sirigu, A. (2009). Movement intention after parietal cortex stimulation in humans. Science, 324, 811–813.

Devaney, R. L. (1989). An introduction to chaotic dynamical systems (Vol. 6). Reading, MA: Addison-Wesley.

Diamond, A. (1991). Neuropsychological insights into the meaning of object concept development. In S. Carey & R. Gelman (Eds.), The epigenesis of mind: Essays on biology and knowledge (pp. 67–110). Hillsdale, NJ: Erlbaum.

Di Paolo, E. A. (2000). Behavioral coordination, structural congruence and entrainment in a simulation of acoustically coupled agents. Adaptive Behavior, 8(1), 27–48.

Doetsch, P., Kozielski, M., & Ney, H. (2014). Fast and robust training of recurrent neural networks for offline handwriting recognition. In IEEE 14th International Conference on Frontiers in Handwriting Recognition (ICFHR) (pp. 279–284).

Downar, J., Crawley, A. P., Mikulis, D. J., & Davis, K. D. (2000). A multimodal cortical network for the detection of changes in the sensory environment. Nature Neuroscience, 3(3), 277–283.

Doya, K., & Uchibe, E. (2005). The cyber rodent project: Exploration of adaptive mechanisms for self-preservation and self-reproduction. Adaptive Behavior, 13(2), 149–160.

Doya, K., & Yoshizawa, S. (1989). Memorizing oscillatory patterns in the analog neuron network. Proceedings of the 1989 International Joint Conference on Neural Networks, I, 27–32.

Dreyfus, H. L., & Dreyfus, S. E. (1988). Making a mind versus modeling the brain: Artificial intelligence back at a branch point. Daedalus, 117(1), 15–43.

Dreyfus, H. L. (1991). Being-in-the-world: A commentary on Heidegger's Being and Time. Cambridge, MA: MIT Press.

Du, J., & Poo, M. (2004). Rapid BDNF-induced retrograde synaptic modification in a developing retinotectal system. Nature, 429, 878–883.

Eagleman, D. M., & Sejnowski, T. J. (2000). Motion integration and postdiction in visual awareness. Science, 287(5460), 2036–2038.

Edelman, G. M. (1987). Neural Darwinism: The theory of neuronal group selection. New York: Basic Books.

Ehrsson, H., Fagergren, A., Johansson, R., & Forssberg, H. (2003). Evidence for the involvement of the posterior parietal cortex in coordination of fingertip forces for grasp stability in manipulation. Journal of Neurophysiology, 90, 2978–2986.

Eliasmith, C. (2014). How to build a brain: A neural architecture for biological cognition. New York: Oxford University Press.

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14, 179–211.

Elman, J. L. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2–3), 195–225.

Endo, G., Morimoto, J., Matsubara, T., Nakanishi, J., & Cheng, G. (2008). Learning CPG-based biped locomotion with a policy gradient method: Application to a humanoid robot. The International Journal of Robotics Research, 27(2), 213–228.

Eskandar, E., & Assad, J. (1999). Dissociation of visual, motor and predictive signals in parietal cortex during visual guidance. Nature Neuroscience, 2, 88–93.

Evans, G. (1982). The varieties of reference. Oxford: Clarendon Press.

Fitzsimonds, R., Song, H., & Poo, M. (1997). Propagation of activity dependent synaptic depression in simple neural networks. Nature, 388, 439–448.

Fleischer, J., Gally, J., Edelman, J., & Krichmar, J. (2007). Retrospective and prospective responses arising in a modeled hippocampus during maze navigation by a brain-based device. Proceedings of the National Academy of Sciences of the USA, 104(9), 3556–3561.

Fodor, J., & Pylyshyn, Z. (1988). Connectionism and cognitive architecture: A critique. Cognition, 28, 3–71.

Fogassi, L., Ferrari, P., Gesierich, B., Rozzi, S., Chersi, F., & Rizzolatti, G. (2005). Parietal lobe: From action organization to intention understanding. Science, 308, 662–667.

Freeman, W. (2000). How brains make up their minds. New York: Columbia University Press.

Fried, I., Katz, A., McCarthy, G., Sass, K. J., Williamson, P., Spencer, S. S., & Spencer, D. D. (1991). Functional organization of human supplementary motor cortex studied by electrical stimulation. Journal of Neuroscience, 11, 3656–3666.

Friston, K. (1998). The disconnection hypothesis. Schizophrenia Research, 30(2), 115–125.

Friston, K. (2005). A theory of cortical responses. Philosophical Transactions of the Royal Society B: Biological Sciences, 360(1456), 815–836.

Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11, 127–138.

Frith, C. D., & Frith, U. (2012). Mechanisms of social cognition. Annual Review of Psychology, 63, 287–313.

Fukushima, Y., Tsukada, M., Tsuda, I., Yamaguti, Y., & Kuroda, S. (2007). Spatial clustering property and its self-similarity in membrane potentials of hippocampal CA1 pyramidal neurons for a spatio-temporal input sequence. Cognitive Neurodynamics, 1, 305–316.

Gallagher, S. (2000). Philosophical conceptions of the self: Implications for cognitive science. Trends in Cognitive Sciences, 4(1), 14–21.

Gallese, V., & Goldman, A. (1998). Mirror neurons and the simulation theory of mind-reading. Trends in Cognitive Sciences, 2, 493–501.

Gaussier, P., Moga, S., Quoy, M., & Banquet, J. P. (1998). From perception-action loops to imitation processes: A bottom-up approach of learning by imitation. Applied Artificial Intelligence, 12(7–8), 701–727.

Georgopoulos, A. P., Kalaska, J. F., Caminiti, R., & Massey, J. T. (1982). On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortex. The Journal of Neuroscience, 2, 1527–1537.

Gershkoff-Stowe, L., & Thelen, E. (2004). U-shaped changes in behavior: A dynamic systems perspective. Journal of Cognition and Development, 5, 11–36.

Gibson, E. J., & Pick, A. D. (2000). An ecological approach to perceptual learning and development. New York: Oxford University Press.

Gibson, J. J. (1986). The ecological approach to visual perception. Boston: Houghton Mifflin.

Goodale, M. A., Milner, A. D., Jakobson, L. S., & Carey, D. P. (1991). A neurological dissociation between perceiving objects and grasping them. Nature, 349(6305), 154–156.

Graves, A., Wayne, G., & Danihelka, I. (2014). Neural Turing machines. arXiv.org preprint arXiv:1410.5401.

Graziano, M., Taylor, C., & Moore, T. (2002). Complex movements evoked by microstimulation of precentral cortex. Neuron, 34, 841–851.

Gunji, Y., & Konno, N. (1991). Artificial life with autonomously emerging boundaries. Applied Mathematics and Computation, 43, 271–298.

Haggard, P. (2008). Human volition: Towards a neuroscience of will. Nature Reviews Neuroscience, 9(12), 934–946.

Haken, H. (1983). Advanced synergetics. Berlin: Springer.

Hannun, A., Case, C., Casper, J., Catanzaro, B., Diamos, G., Elsen, E., Prenger, R., Satheesh, S., Sengupta, S., Coates, A., & Ng, A. Y. (2014). DeepSpeech: Scaling up end-to-end speech recognition. arXiv.org preprint arXiv:1412.5567.

Harnad, S. (1990). The symbol grounding problem. Physica D, 42, 335–346.

Harnad, S. (1992). Connecting object to symbol in modeling cognition. In A. Clarke & R. Lutz (Eds.), Connectionism in context. Berlin: Springer Verlag.

Haruno, M., Wolpert, D. M., & Kawato, M. (2003). Hierarchical MOSAIC for movement generation. In International congress series (Vol. 1250, pp. 575–590). Amsterdam: Elsevier.

Harris, K. (2008). Stability of the fittest: Organizing learning through retroaxonal signals. Trends in Neurosciences, 31(3), 130–136.

Hasson, U., Yang, E., Vallines, I., Heeger, D. J., & Rubin, N. (2008). A hierarchy of temporal receptive windows in human cortex. The Journal of Neuroscience, 28(10), 2539–2550.

Hauk, O., Johnsrude, I., & Pulvermuller, F. (2004). Somatotopic representation of action words in human motor and premotor cortex. Neuron, 41(2), 301–307.

Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298(5598), 1569–1579.

Heidegger, M. (1962). Being and time (J. Macquarrie & E. Robinson, Trans.). London: SCM Press.

Molesworth, W. (Ed.). (1841). The English works of Thomas Hobbes (Vol. 5). London: J. Bohn.

Hinton, G., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.

Husserl, E. (1964). The phenomenology of internal time consciousness (J. S. Churchill, Trans.). Bloomington, IN: Indiana University Press.

Husserl, E. (1970). Logical investigations (Vol. 1). London: Routledge & Kegan Paul Ltd.

Husserl, E. (2002). Studien zur arithmetik und geometrie. New York: Springer-Verlag.

Hyvarinen, J., & Poranen, A. (1974). Function of the parietal associative area 7 as revealed from cellular discharges in alert monkeys. Brain, 97, 673–692.

Hwang, J., Jung, M., Madapana, N., Kim, J., Choi, M., & Tani, J. (2015). Achieving "synergy" in cognitive behavior of humanoids via deep learning of dynamic visuo-motor-attentional coordination. In Proceedings of the 2015 IEEE-RAS 15th International Conference on Humanoid Robots (pp. 817–824).

Iacoboni, M., Woods, R. P., Brass, M., Bekkering, H., Mazziotta, J. C., & Rizzolatti, G. (1999). Cortical mechanisms of imitation. Science, 286, 2526–2528.

Ijspeert, A. J. (2001). A connectionist central pattern generator for the aquatic and terrestrial gaits of a simulated salamander. Biological Cybernetics, 84, 331–348.

Ikeda, K., Otsuka, K., & Matsumoto, K. (1989). Maxwell-Bloch turbulence. Progress of Theoretical Physics, 99, 295–324.

Ikegami, T., & Iizuka, H. (2007). Turn-taking interaction as a cooperative and co-creative process. Infant Behavior and Development, 30(2), 278–288.

Ikegami, T. (2013). A design for living technology: Experiments with the mind time machine. Artificial Life, 19(3–4), 387–400.

Ikegaya, Y., Aaron, G., Cossart, R., Aronov, D., Lampl, I., et al. (2004). Synfire chains and cortical songs: Temporal modules of cortical activity. Science, 304, 559–564.

Iriki, A., Tanaka, M., & Iwamura, Y. (1996). Coding of modified body schema during tool use by macaque postcentral neurones. Neuroreport, 7(14), 2325–2330.

Ito, M. (1970). Neurophysiological basis of the cerebellar motor control system. International Journal of Neurology, 7, 162–176.

Ito, M. (2005). Bases and implications of learning in the cerebellum: Adaptive control and internal model mechanism. Progress in Brain Research, 148, 95–109.

Ito, M., & Tani, J. (2004). On-line imitative interaction with a humanoid robot using a dynamic neural network model of a mirror system. Adaptive Behavior, 12(2), 93–115.

Jaeger, H., & Haas, H. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless telecommunication. Science, 304, 78–80.

Jaeger, H., Lukoševičius, M., Popovici, D., & Siewert, U. (2007). Optimization and applications of echo state networks with leaky-integrator neurons. Neural Networks, 20(3), 335–352.

James, W. (1884). The dilemma of determinism. Unitarian Review, XXII, 193. Reprinted (1956) in The will to believe (p. 145). Mineola, NY: Dover Publications.

James, W. (1892). The stream of consciousness. Cleveland, OH: World.

James, W. (1918). The principles of psychology (Vol. 1). New York, NY: Henry Holt.

Jeannerod, M. (1994). The representing brain: Neural correlates of motor intention and imagery. Behavioral and Brain Sciences, 17, 187–202.

Johnson-Pynn, J., Fragaszy, D. M., Hirsh, E. M., Brakke, K. E., & Greenfield, P. M. (1999). Strategies used to combine seriated cups by chimpanzees (Pan troglodytes), bonobos (Pan paniscus), and capuchins (Cebus apella). Journal of Comparative Psychology, 113(2), 137–148.

Jordan, M. I. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society (pp. 531–546). Hillsdale, NJ: Erlbaum.

Jung, M., Hwang, J., & Tani, J. (2015). Self-organization of spatio-temporal hierarchy via learning of dynamic visual image patterns on action sequences. PLoS One, 10(7), e0131214.

Kaneko, K. (1990). Clustering, coding, switching, hierarchical ordering and control in a network of chaotic elements. Physica D, 41, 137–172.

Karmiloff-Smith, A. (1992). Beyond modularity: A developmental perspective on cognitive science. Cambridge, MA: MIT Press.

Kawato, M. (1990). Computational schemes and neural network models for formation and control of multijoint arm trajectory. In T. Miller, R. S. Sutton, & P. J. Werbos (Eds.), Neural networks for control (pp. 197–228). Cambridge, MA: MIT Press.

Kelso, S. (1995). Dynamic patterns: The self-organization of brain and behavior. Cambridge, MA: MIT Press.

Kiebel, S., Daunizeau, J., & Friston, K. (2008). A hierarchy of time-scales and the brain. PLoS Computational Biology, 4, e1000209.

Kimura, H., Akiyama, S., & Sakurama, K. (1999). Realization of dynamic walking and running of the quadruped using neural oscillator. Autonomous Robots, 7(3), 247–258.

Kirkham, N., Slemmer, J., & Johnson, S. (2002). Visual statistical learning in infancy: Evidence for a domain general learning mechanism. Cognition, 83, B35–B42.

Klahr, D., Chase, W. G., & Lovelace, E. A. (1983). Structure and process in alphabetic retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9(3), 462.

Kolen, J. F. (1994). Exploring computational complexity of recurrent neural networks (PhD thesis, The Ohio State University).

Kourtzi, Z., Tolias, A. S., Altmann, C. F., Augath, M., & Logothetis, N. K. (2003). Integration of local features into global shapes: Monkey and human fMRI studies. Neuron, 37(2), 333–346.

Koza, J. R. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge, MA: MIT Press.

Krichmar, J. L., & Edelman, G. M. (2002). Machine psychology: Autonomous behavior, perceptual categorization and conditioning in a brain-based device. Cerebral Cortex, 12, 818–830.

Kuniyoshi, Y., Inaba, M., & Inoue, H. (1994). Learning by watching: Extracting reusable task knowledge from visual observation of human performance. IEEE Transactions on Robotics and Automation, 10, 799–822.

Kuniyoshi, Y., Ohmura, Y., Terada, K., Nagakubo, A., Eitoku, S. I., & Yamamoto, T. (2004). Embodied basis of invariant features in execution and perception of whole-body dynamic actions: Knacks and focuses of Roll-and-Rise motion. Robotics and Autonomous Systems, 48(4), 189–201.

Kuniyoshi, Y., & Sangawa, S. (2006). Early motor development from partially ordered neural-body dynamics: Experiments with a cortico-spinal-musculo-skeletal model. Biological Cybernetics, 95, 589–605.

Laird, J. E., Newell, A., & Rosenbloom, P. S. (1987). Soar: An architecture for general intelligence. Artificial Intelligence, 33, 1–64.

Laird, J. E. (2008). Extending the Soar cognitive architecture. Frontiers in Artificial Intelligence and Applications, 171, 224.

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

Li, W., Piëch, V., & Gilbert, C. D. (2006). Contour saliency in primary visual cortex. Neuron, 50(6), 951–962.

Libet, B. (1985). Unconscious cerebral initiative and the role of conscious will in voluntary action. Behavioral and Brain Sciences, 8, 529–539.

Lu, X., & Ashe, J. (2005). Anticipatory activity in primary motor cortex codes memorized movement sequences. Neuron, 45, 967–973.

Luria, A. (1973). The working brain. London: Penguin Books Ltd.

McCarthy, J. (1963). Situations, actions and causal laws. Stanford Artificial Intelligence Project, Memo 2. Stanford University.

Markov, A. (1971). Extension of the limit theorems of probability theory to a sum of variables connected in a chain. Dynamic Probabilistic Systems, 1, 552–577.

Markram, H., Muller, E., Ramaswamy, S., Reimann, M. W., Abdellah, M., Sanchez, C. A., … & Kahou, G. A. A. (2015). Reconstruction and simulation of neocortical microcircuitry. Cell, 163(2), 456–492.

Matarić, M. (1992). Integration of representation into goal-driven behavior-based robots. IEEE Transactions on Robotics and Automation, 8(3), 304–312.

Matsuno, K. (1989). Physical basis of biology. Boca Raton, FL: CRC Press.

Maturana, H. R., & Varela, F. J. (1980). Autopoiesis and cognition. Netherlands: Springer.

May, R. M. (1976). Simple mathematical models with very complicated dynamics. Nature, 261(5560), 459–467.

Meeden, L. (1996). An incremental approach to developing intelligent neural network controllers for robots. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 26(3), 474–485.

Merleau-Ponty, M. (1962). Phenomenology of perception (C. Smith, Trans.). London: Routledge & Kegan Paul Ltd.

Merleau-Ponty, M. (1968). The visible and the invisible: Followed by working notes (Studies in phenomenology and existential philosophy). Evanston, IL: Northwestern University Press.

Meltzoff, A. N., & Moore, M. K. (1977). Imitation of facial and manual gestures by human neonates. Science, 198(4312), 75–78.

Meltzoff, A. N. (2005). Imitation and other minds: The "like me" hypothesis. In S. Hurley & N. Chater (Eds.), Perspectives on imitation: From cognitive neuroscience to social science (pp. 55–77). Cambridge, MA: MIT Press.

Metta, G., Natale, L., Nori, F., Sandini, G., Vernon, D., Fadiga, L., et al. (2010). The iCub humanoid robot: An open-systems platform for research in cognitive development. Neural Networks, 23(8–9), 1125–1134.

Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97.

Morimoto, J., & Doya, K. (2001). Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning. Robotics and Autonomous Systems, 36(1), 37–51.

Mormann, F., Kornblith, S., Quiroga, R. Q., Kraskov, A., Cerf, M., Fried, I., & Koch, C. (2008). Latency and selectivity of single neurons indicate hierarchical processing in the human medial temporal lobe. Journal of Neuroscience, 28, 8865–8872.

Mulliken, G. H., Musallam, S., & Andersen, R. A. (2008). Forward estimation of movement state in posterior parietal cortex. Proceedings of the National Academy of Sciences of the USA, 105(24), 8170–8177.

Murata, S., Yamashita, Y., Arie, H., Ogata, T., Sugano, S., & Tani, J. (2015). Learning to perceive the world as probabilistic or deterministic via interaction with others: A neuro-robotics experiment. IEEE Transactions on Neural Networks and Learning Systems. Advance online publication. doi:10.1109/TNNLS.2015.2492140

Mushiake, H., Inase, M., & Tanji, J. (1991). Neuronal activity in the primate premotor, supplementary, and precentral motor cortex during visually guided and internally determined sequential movements. Journal of Neurophysiology, 66(3), 705–718.

Nadel, J. (2002). Imitation and imitation recognition: Functional use in preverbal infants and nonverbal children with autism. In A. N. Meltzoff & W. Prinz (Eds.), The imitative mind: Development, evolution, and brain bases (pp. 42–62). Cambridge University Press.

Nagai, Y., & Asada, M. (2015). Predictive learning of sensorimotor information as a key for cognitive development. In Proceedings of the IROS 2015 Workshop on Sensorimotor Contingencies for Robotics. Osaka, Japan.

Namikawa, J., Nishimoto, R., & Tani, J. (2011). A neurodynamic account of spontaneous behavior. PLoS Computational Biology, 7(10), e1002221.

Newell, A., & Simon, H. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.

Newell, A., & Simon, H. A. (1975). Computer science as empirical inquiry: Symbols and search. Communications of the ACM, 19(3), 113–126.

Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.

Nicolis, G., & Prigogine, I. (1977). Self-organization in nonequilibrium systems. New York: Wiley.

Nishida, K. (1990). An inquiry into the good (M. Abe & C. Ives, Trans.). New Haven: Yale University Press.

Nishimoto, R., & Tani, J. (2009). Development of hierarchical structures for actions and motor imagery: A constructivist view from synthetic neurorobotics study. Psychological Research, 73, 545–558.

Nolfi, S., & Floreano, D. (2000). Evolutionary robotics: The biology, intelligence, and technology of self-organizing machines. Cambridge, MA: MIT Press.

Nolfi, S., & Floreano, D. (2002). Synthesis of autonomous robots through artificial evolution. Trends in Cognitive Sciences, 6(1), 31–37.

Ogai, Y., & Ikegami, T. (2008). Microslip as a simulated artificial mind. Adaptive Behavior, 16(2–3), 129–147.

Ogata, T., Hattori, Y., Kozima, H., Komatani, K., & Okuno, H. G. (2006). Generation of robot motions from environmental sounds using intermodality mapping by RNNPB. In Sixth International Workshop on Epigenetic Robotics, Paris, France.

Ogata, T., Yokoya, R., Tani, J., Komatani, K., & Okuno, H. G. (2009). Prediction and imitation of other's motions by reusing own forward-inverse model in robots. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation (pp. 4144–4149). Kobe, Japan.

O'Regan, J. K., & Noe, A. (2001). A sensorimotor account of vision and visual consciousness. Behavioral & Brain Sciences, 24, 939–1031.

Oudeyer, P. Y., Kaplan, F., & Hafner, V. V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation, 11(2), 265–286.

Oztop, E., Kawato, M., & Arbib, M. (2006). Mirror neurons and imitation: A computationally guided review. Neural Networks, 19(3), 254–271.

Paine, R. W., & Tani, J. (2005). How hierarchical control self-organizes in artificial adaptive systems. Adaptive Behavior, 13(3), 211–225.

Park, G., & Tani, J. (2015). Development of compositional and contextual communicable congruence in robots by using dynamic neural network models. Neural Networks, 72, 109–122.

Pepperberg, I. M., & Shive, H. R. (2001). Simultaneous development of vocal and physical object combinations by a Grey parrot (Psittacus erithacus): Bottle caps, lids, and labels. Journal of Comparative Psychology, 115(4), 376–384.

Perry, W., & Braff, D. L. (1994). Information-processing deficits and thought disorder. American Journal of Psychiatry, 151, 363–367.

Pfeifer, R., & Bongard, J. (2006). How the body shapes the way we think: A new view of intelligence. Cambridge, MA: MIT Press.

Piaget, J. (1951). The child's conception of the world. Rowman & Littlefield.

Piaget, J. (1962). Play, dreams, and imitation in childhood (G. Gattegno & F. M. Hodgson, Trans.). New York: Norton.

Pollack, J. B. (1991). The induction of dynamical recognizers. Machine Learning, 7, 227–252.

Pulvermuller, F. (2005). Brain mechanisms linking language and action. Nature Reviews Neuroscience, 6, 576–582.

Ramachandran, V. S., & Blakeslee, S. (1998). Phantoms in the brain: Probing the mysteries of the human mind. New York: William Morrow.

Rao, R., & Ballard, D. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2, 79–87.

Ritter, F. E., Baxter, G. D., Jones, G., & Young, R. M. (2000). Supporting cognitive models as users. ACM Transactions on Computer-Human Interaction, 7(2), 141–173.

Rizzolatti, G., Fadiga, L., Gallese, V., & Fogassi, L. (1996). Premotor cortex and the recognition of motor actions. Cognitive Brain Research, 3, 131–141.

Rizzolatti, G., Fogassi, L., & Gallese, V. (2001). Neurophysiological mechanisms underlying the understanding and imitation of action. Nature Reviews Neuroscience, 2, 661–670.

Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169–192.

Rosander, R., & von Hofsten, C. (2004). Infants' emerging ability to represent object motion. Cognition, 91, 1–22.

Rössler, O. E. (1976). An equation for continuous chaos. Physics Letters, 57A(5), 397–398.

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart & J. L. McClelland (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Rumelhart, D. E., McClelland, J. L., & the PDP Research Group. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Cambridge, MA: MIT Press.

Saffran, J., Aslin, R., & Newport, E. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926–1928.

Sakata, H., Taira, M., Murata, A., & Mine, S. (1995). Neural mechanisms of visual guidance of hand action in the parietal cortex of the monkey. Cerebral Cortex, 5(5), 429–438.

Schaal, S. (1999). Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 3, 233–242.

Scheier, C., Pfeifer, R., & Kuniyoshi, Y. (1998). Embedded neural networks: Exploiting constraints. Neural Networks, 11, 1551–1596.

Schmidhuber, J. (1992). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2), 234–242.

Schöner, G., & Kelso, J. A. S. (1988). Dynamic pattern generation in behavioral and neural systems. Science, 239, 1513–1539.

Schöner, G., & Thelen, E. (2006). Using dynamic field theory to rethink infant habituation. Psychological Review, 113(2), 273–299.

Shanahan, M. (2006). A cognitive architecture that combines internal simulation with a global workspace. Consciousness and Cognition, 15(2), 433–449.

Shibata, K., & Okabe, Y. (1997). Reinforcement learning when visual sensory signals are directly given as inputs. In Proceedings of the IEEE International Conference on Neural Networks (Vol. 3, pp. 1716–1720).

Shima, K., & Tanji, J. (1998). Both supplementary and presupplementary motor areas are crucial for the temporal organization of multiple movements. Journal of Neurophysiology, 80, 3247–3260.

Shima, K., & Tanji, J. (2000). Neuronal activity in the supplementary and presupplementary motor areas for temporal organization of multiple movements. Journal of Neurophysiology, 84, 2148–2160.

Shimojo, S. (2014). Postdiction: Its implications on visual awareness, hindsight, and sense of agency. Frontiers in Psychology, 5, 196.

Siegelmann, H. T. (1995). Computation beyond the Turing limit. Science, 268(5210), 545–548.

Simon, H. A. (1981). The sciences of the artificial (2nd ed.). Cambridge, MA: MIT Press.

Sirigu, A., Daprati, E., Ciancia, S., Giraux, P., Nighoghossian, N., Posada, A., & Haggard, P. (2003). Altered awareness of voluntary action after damage to the parietal cortex. Nature Neuroscience, 7, 80–84.

Sirigu, A., Duhamel, J. R., Cohen, L., Pillon, B., Dubois, B., & Agid, Y. (1996). The mental representation of hand movements after parietal cortex damage. Science, 273(5281), 1564–1568.

Smith, L., & Thelen, E. (2003). Development as a dynamic system. Trends in Cognitive Sciences, 7(8), 343–348.

Soon, C., Brass, M., Heinze, H., & Haynes, J. (2008). Unconscious determinants of free decisions in the human brain. Nature Neuroscience, 11, 543–545.

Spencer-Brown, G. (1969). Laws of form. Wales, UK: George Allen and Unwin Ltd.

Spivey, M. (2007). The continuity of mind. New York: Oxford University Press.

Sporns, O. (2010). Networks of the brain. Cambridge, MA: MIT Press.

Squire, L. R., & Alvarez, P. (1995). Retrograde amnesia and memory consolidation: A neurobiological perspective. Current Opinion in Neurobiology, 5, 169–177.

Steil, J. J., Röthling, F., Haschke, R., & Ritter, H. (2004). Situated robot learning for multi-modal instruction and imitation of grasping. Robotics and Autonomous Systems, 47(2), 129–141.

Sugita, Y., & Tani, J. (2005). Learning semantic combinatoriality from the interaction between linguistic and behavioral processes. Adaptive Behavior, 13(1), 33–52.

Sun, R. (2016). Anatomy of mind: Exploring psychological mechanisms and processes with the CLARION cognitive architecture. New York: Oxford University Press.

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., … & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–9).

Taga, G., Yamaguchi, Y., & Shimizu, H. (1991). Self-organized control of bipedal locomotion by neural oscillators in unpredictable environments. Biological Cybernetics, 65, 147–159.

Tanaka, K. (1993). Neuronal mechanisms of object recognition. Science, 262, 685–688.

Tani, J. (1996). Model-based learning for mobile robot navigation from the dynamical systems perspective. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 26(3), 421–436.

Tani, J. (1998). An interpretation of the "self" from the dynamical systems perspective: A constructivist approach. Journal of Consciousness Studies, 5(5–6), 516–542.

Tani, J. (2003). Learning to generate articulated behavior through the bottom-up and the top-down interaction process. Neural Networks, 16, 11–23.

Tani, J. (2004). The dynamical systems accounts for phenomenology of immanent time: An interpretation by revisiting a robotics synthetic study. Journal of Consciousness Studies, 11(9), 5–24.

Tani, J. (2009). Autonomy of "self" at criticality: The perspective from synthetic neuro-robotics. Adaptive Behavior, 17(5), 421–443.

Tani, J. (2014). Self-organization and compositionality in cognitive brains: A neurorobotics study. Proceedings of the IEEE, 102(4), 586–605.

Tani, J., Friston, K., & Haykin, S. (2014). Self-organization and compositionality in cognitive brains [Further thoughts]. Proceedings of the IEEE, 102(4), 606–607.

Tani, J., & Fukumura, N. (1997). Self-organizing internal representation in learning of navigation: A physical experiment by the mobile robot YAMABICO. Neural Networks, 10(1), 153–159.

Tani, J., & Fukumura, N. (1993). Learning goal-directed navigation as attractor dynamics for a sensory motor system (an experiment by the mobile robot YAMABICO). In Proceedings of the 1993 International Joint Conference on Neural Networks (pp. 1747–1752).

Tani, J., & Fukumura, N. (1995). Embedding a grammatical description in deterministic chaos: An experiment in recurrent neural learning. Biological Cybernetics, 72(4), 365–370.

Tani, J., & Nolfi, S. (1997). Self-organization of modules and their hierarchy in robot learning problems: A dynamical systems approach. System Analysis for Higher Brain Function Research Project News Letter, 2(4), 1–11.

Tani, J., & Nolfi, S. (1999). Learning to perceive the world as articulated: An approach for hierarchical learning in sensory-motor systems. Neural Networks, 12(7), 1131–1141.

Tani, J., Ito, M., & Sugita, Y. (2004). Self-organization of distributedly represented multiple behavior schemata in a mirror system: Reviews of robot experiments using RNNPB. Neural Networks, 17, 1273–1289.

Tani, T. (1998). The physics of consciousness. Tokyo: Keiso-shobo.

Taniguchi, T., Nagai, T., Nakamura, T., Iwahashi, N., Ogata, T., & Asoh, H. (2016). Symbol emergence in robotics: A survey. Advanced Robotics. doi:10.1080/01691864.2016.1164622

Tanji, J., & Shima, K. (1994). Role for supplementary motor area cells in planning several movements ahead. Nature, 371, 413–416.

Tettamanti, M., Buccino, G., Saccuman, M. C., Gallese, V., Danna, M., Scifo, P., Fazio, F., Rizzolatti, G., Cappa, S. F., & Perani, D. (2005). Listening to action-related sentences activates fronto-parietal motor circuits. Journal of Cognitive Neuroscience, 17(2), 273–281.

Thelen, E., & Smith, L. (1994). A dynamic systems approach to the development of cognition and action. Cambridge, MA: MIT Press.

Tokimoto, N., & Okanoya, K. (2004). Spontaneous construction of "Chinese boxes" by degus (Octodon degu): A rudiment of recursive intelligence? Japanese Psychological Research, 46, 255–261.

Tomasello, M. (2009). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.

Tononi, G. (2008). Consciousness as integrated information: A provisional manifesto. The Biological Bulletin, 215(3), 216–242.

Trevena, J. A., & Miller, J. (2002). Cortical movement preparation before and after a conscious decision to move. Consciousness and Cognition, 11(2), 162–190.

Tsuda, I., Körner, E., & Shimizu, H. (1987). Memory dynamics in asynchronous neural networks. Progress of Theoretical Physics, 78, 51–71.

Tsuda, I. (2001). Toward an interpretation of dynamic neural activity in terms of chaotic dynamical systems. Behavioral and Brain Sciences, 24(5), 793–810.

Uddén, J., & Bahlmann, J. (2012). A rostro-caudal gradient of structured sequence processing in the left inferior frontal gyrus. Philosophical Transactions of the Royal Society B: Biological Sciences, 367, 2023–2032.

Ueda, S. (1994). Experience and awareness: Exploring Nishida philosophy [Keiken to jikaku: Nishida tetsugaku no basho wo motomete] (English translation from Japanese). Tokyo, Japan: Iwanami Shoten.

Ugur, E., Nagai, Y., Sahin, E., & Oztop, E. (2015). Staged development of robot skills: Behavior formation, affordance learning and imitation with motionese. IEEE Transactions on Autonomous Mental Development, 7(2), 119–139.

Van de Cruys, S., Evers, K., Van der Hallen, R., Van Eylen, L., Boets, B., de-Wit, L., & Wagemans, J. (2014). Precise minds in uncertain worlds: Predictive coding in autism. Psychological Review, 121(4), 649.

Varela, F. J. (1996). Neurophenomenology: A methodological remedy to the hard problem. Journal of Consciousness Studies, 3, 330–350.

Varela, F. J. (1999). Present-time consciousness. Journal of Consciousness Studies, 6(2–3), 111–140.

Varela, F. J., Thompson, E. T., & Rosch, E. (1991). The embodied mind: Cognitive science and human experience. Cambridge, MA: MIT Press.

von Hofsten, C., & Rönnqvist, L. (1988). Preparation for grasping an object: A developmental study. Journal of Experimental Psychology: Human Perception and Performance, 14, 610–621.

Werbos, P. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences (PhD thesis, Harvard University).

Werbos, P. J. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4), 339–356.

White, J. (2016). Simulation, self-extinction, and philosophy in the service of human civilization. AI & Society, 31(2), 171–190.

Williams, B. (2014). Descartes: The project of pure enquiry. London and New York: Routledge.

Williams, R. J., & Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1, 270–280.

Wilson, M. A., & McNaughton, B. L. (1994). Reactivation of hippocampal ensemble memories during sleep. Science, 265, 676–679.

Wolpert, D., & Kawato, M. (1998). Multiple paired forward and inverse models for motor control. Neural Networks, 11, 1317–1329.

Yamashita, Y., & Tani, J. (2008). Emergence of functional hierarchy in a multiple timescale neural network model: A humanoid robot experiment. PLoS Computational Biology, 4(11), e1000220.

Yamashita, Y., & Tani, J. (2012). Spontaneous prediction error generation in schizophrenia. PLoS ONE, 7(5), e37843.

Yen, S. C., Baker, J., & Gray, C. M. (2007). Heterogeneity in the responses of adjacent neurons to natural stimuli in cat striate cortex. Journal of Neurophysiology, 97, 1326–1341.

Zhong, J., Cangelosi, A., & Wermter, S. (2014). Toward a self-organizing pre-symbolic neural model representing sensorimotor primitives. Frontiers in Behavioral Neuroscience, 7, 22.

Ziemke, T., & Thieme, M. (2002). Neuromodulation of reactive sensorimotor mappings as a short-term memory mechanism in delayed response tasks. Adaptive Behavior, 10(3/4), 185–199.


Index

Note: Page numbers followed by "f" and "t" denote figures and tables, respectively.

absolute flow level, 28, 39
abstract sequences, SMA encoding, 52–53
accidental generations with spontaneous variation, 225–26
actional consequences, predictive learning from, 151–73
action generation, 197–98
  brain, 44–68
  hierarchical mechanisms for, 50f, 52f
  through hierarchy, 49–50, 50f
  perception's role in, 55
  sensory-motor flow mirroring, 175–98, 178f
action-related words, 67
actions. See also complex actions; goal-directed actions; intentions
  categories, 196
  conscious decision preceded by, 237
  external inputs perturbing, 215
  free will for, 219–41, 221f, 223f, 225f, 228f, 229f, 233f, 234f, 236f, 238f
  functional hierarchy developed for, 199–218, 200f, 201f, 203f, 265
  intransitive, 65
  language bound to, 190–96, 192f
  neurodynamics generating tasks of, 213–15, 214f
  parietal cortex meeting of, 56–61, 59f
  perceptual reality changed by, 60–61
  primitives, 145–48, 147f
  recognition's circular causality with, 149
  subjective mind influenced by, 49
  training influencing, 215
  transitive, 65
  unconscious generation of, 215
  as voluntary, 39
action sequences, 15, 219–20. See also chunking
  as compositional, 229–30, 252
  MTRNNs generating, 227–30, 228f, 229f
  as novel, 227–30, 229f
active learner, 143
active perception, 130–31
Act-R, 14
affective developmental robotics, 261
affordance, 93–94. See also Gibsonian approach
agent, 31–32
alien hand syndrome, 50–51
alternative images, 71
alternative thoughts, 71
Amari, Shun-ichi, 113. See also error back-propagation scheme
ambiguity, 63–64
animals, recursion-like behaviors exhibited by, 10–11
A-not-B task, 98–100, 99f
appearance, 24–25, 24f
Arbib, Michael, 9–10, 58, 66, 175, 191
Arie, H., 228f
Aristotle, 4, 261
arm robot, 131–32, 131f
artificial evolution, 126–28
Asada, Minoru, 261
Ashe, J., 52–53
attractive object, 98–100, 99f
attractors, 91–97. See also behavior attractor; limit cycle attractors
  as global, 158–59, 158f
  invariant set, 84, 158–59, 158f
  Rössler, 90, 158
  types of, 84–85, 85f, 91f
attunement, 39
authentic agent, 31–32
authentic being, 31–32, 143, 148, 171–72
authenticity, 31–32, 237–39, 238f, 267
autism, 256, 257–58
autorecovery, 155, 157, 160

back-propagation through time (BPTT), 116f
Badre, D., 206–7, 207f
Bahlmann, J., 206–7, 207f
Bak, P., 171
Ballard, Dana, 48, 60
Beer, Randall, 121, 126f, 127–28, 128f
behavior attractor, 130–31
behavior-based approach, 37
behavior-based robotics, 82, 103–9, 104f, 105f, 107f
behavior primitives, 10, 13. See also chunks
  compositions, 200–202, 200f, 201f
  functional hierarchy development, 203f, 205–6
  localist scheme, 200–201, 200f
  MTRNN, 204–6
  PB vector value assigned to, 201–2, 201f
behaviors. See also skilled behaviors
  distributed representation embedding of, 180–82, 181f
  as imitative, 66, 100–102, 102f
  model, 191–96
  as reactive, 141–42
  as spontaneous, 219–30
Being and Time (Heidegger), 30
being-in-the-world, 29–32, 34
beings, 22, 24–25
  as authentic, 31–32, 143, 148, 171–72
  of equipment, 30–31
  as inauthentic, 142–43
  man reflecting on, 31
  meaning of, 30
bimodal neurons, 53–54, 56–57, 208
Blakemore, S.-J., 58
Blakeslee, S., 63
blind man, 33–34, 61
Blue Brain project, 255
bodies, 33–34, 59–60, 59f. See also Cartesian dualism
bonobos, 11
bottom-up error regression, MTRNN, 207–8, 207f, 215
bottom-up pathway, 63–64, 164–65, 205, 263
bottom-up recognition, 60–61, 197–98, 266
bound learning, 191–96, 192f, 193f, 195f
boys, 119, 119f, 120
BPTT. See back-propagation through time
brains. See also neural network models; specific brain structures
  action generation, 44–68, 65f
  brain science and, 43–79, 65f, 70f, 73f
  chemical plant as, 4–5, 6
  cognitive competencies hosted by, 10
  dynamics in, 40–41
  FLN component in, 11
  hierarchical mechanisms, 44–54, 47f, 50f, 52f
  human language-ready, 66–67, 191
  intention adjustment mechanisms employed by, 60–61
  mind originating in, 4–6
  models, 81–83, 109–12
  outcomes monitored by, 60–61
  overview, 43–79
  recognition in, 55–68, 59f, 62f, 65f
  spatio-temporal hierarchy of, 217
  symbols in, 245–47
  two-stage model mechanized by, 40–41
  visual recognition, 44–54, 45f, 46f, 47f, 50f, 52f
brain science, 262–67
  brain and, 43–79
  future directions of, 255–61
  on linguistic competency, 190–91
  MTRNN correspondence, 206–8, 207f
Braitenberg, Valentino, 103–6, 107, 108–9
branching
  overview, 132–34, 133f
  Yamabico, 152–60, 153f
Brecker, Michael, 252–53
Broca's area, 191, 196
Brooks, Rodney, 103, 106, 107–8, 107f, 125, 145
Buddhist meditation, 254–55
button press trial, 69–71, 70f

calculus of indications, 18–19
Cantor set, 158–60, 158f
carpenter, 30–31, 42, 248, 264
Cartesian dualism, 7, 16, 32–37, 36f, 149
cascaded recurrent neural network (cascaded RNN), 116f
catastrophic forgetting, 165
cats, 49
cells, 49, 51–53, 52f, 57. See also neurons
central pattern generators (CPGs), 126–28, 128f
cerebellum, 58, 60
cerebral hemorrhage, 57
CFG. See context-free grammar
Chalmers, David, 75, 172, 246
chaos, 87–90, 91f, 108, 225–27
chaotic attractor, 84, 85f
chaotic itinerancy, 168–69, 169f
chemical plant, 4–5, 6
chiasm, 35
chimps, 10–11
Chomsky, Noam, 10–12, 12f, 190–91, 244, 260. See also faculty of language in broad sense; faculty of language in narrow sense
chunking, 15, 262
chunks, 175–76, 197–98
  junctions, 220, 225–26
  MTRNN, 204–6, 222–23
  QRIO, 186–87
  structures, 220, 225–26
Churchland, M., 72, 175, 208, 226
circuit-level mechanisms, 76–77, 78–79
circular causality, 7, 149, 170–72, 179, 198, 240–41, 265–67
  authenticity and, 237–39, 238f
  criticality and, 237–39, 238f, 250–51
CLARION. See Connectionist Learning with Adaptive Rule Induction On-line
Clark, Andy, 94–95, 246
classical artificial intelligence (classical AI), 106
Cleeremans, A., 159
closed-loop mode, 178
CNN. See convolutional neural network
codevelopment process, 213
cogito, 23, 25, 29–32, 107–9, 107f. See also being
cognitive competencies, 10
cognitive fragmentation, 257
cognitive minds, 149–50, 243–47, 253
cognitive models, 13–15, 14t
cognitive processes, 7, 155, 266. See also embodied cognition; embodiment
cognitivism, 244–45, 262
  composition, 9–13, 12f
  context, 18–19
  models, 13–15, 14t
  overview, 9–20
  recursion, 9–13, 12f
  symbol grounding problem, 15–18, 17f
  symbol systems, 9–13, 12f
coherence, 169–72
collective neurons, 63, 72–73, 73f
collision-free maneuvering, 152
columnar organization, 45, 46, 46f, 49
comb, 50–51
combinatory explosion problem, 161
complex actions
  developmental training of, 209–15
  experiments, 209–15
  QRIO, 209–15, 209f, 212f, 214f
complex object features, 46–47, 46f, 47f
complex objects, 48
complex visual objects, 46f
compositional action sequences, 229–30, 252
compositionality, 248–49, 262, 266
  in cognitive mind, 243–47
  development of, 152–61, 264
  as fluid, 202, 216, 265
  generalization and, 194–96, 195f
  MTRNN, 217–18, 218f
compositions, 145–48, 147f
  behavior primitives, 200–202, 200f, 201f
  cognitivism, 9–13, 12f
  localist scheme, 200–201, 200f
  in symbol systems, 9–13, 12f
concepts, 246
concrete movements, 33–34
Connectionist Learning with Adaptive Rule Induction On-line (CLARION), 247
connection weight matrix, 121
conscious awareness, 248
  free will for, 219–41, 236f, 238f
  intentions, 69–75, 70f, 73f, 230–39, 236f, 238f
conscious decision, action preceded by, 237
conscious memory, 27–28
consciousness, 25, 187, 250. See also streams of consciousness
  absolute flow of, 28
  cogito problem concerning, 29–32
  easy problem of, 75
  free will and, 230–39, 236f, 238f
  hard problem of, 75, 172, 249–50, 263, 267
  postdiction and, 230–39, 236f, 238f
  questions about, 3–4
  structure of, 172
  surprise quantifying, 172n2
conscious states, 37–39
consolidation, 164–69, 167f, 169f, 197, 225–26
context-free grammar (CFG), 11, 12f
contexts, 18–19, 48, 157, 158–60, 158f
continuity of minds, 100
continuous-time recurrent neural network (CTRNN), 120–25, 122f, 123f, 127, 204–5. See also multiple-timescale recurrent neural network
continuous-time systems, 90
contours, 47, 48
convolutional neural network (CNN), 256, 258
corridor, 94, 95f
cortical electrical stimulation study, 73–74
cortical song, 72
counting, 10–12
CPGs. See central pattern generators
creative images, 197
criticality, 237–39, 238f, 250–51
CTRNN. See continuous-time recurrent neural network
cup nesting, 11
cursor, 57

Dale, R., 146
Dasein, 34
death, 31, 171–72
deep learning, 253–54, 258–59
deep minds, 259
degu, 11
Demiris, Y., 200–201, 200f
Dennett, Daniel, 108
depression, 256
Descartes, René, 16, 29, 243. See also Cartesian dualism
Desmurget, M., 73–74, 75, 76, 237
D'Esposito, M., 206–7, 207f
deterministic chaos, 226–27
deterministic dynamics, 226–27
developmental psychology, 97–100, 99f
developmental training, 209–15, 209f, 212f, 214f, 259–61
Diamond, A., 211
difference equation, 83–84
dimension, 35–36, 36f
direct experiences, 22–23, 23f, 26–28, 106–8, 107f, 142–43
direct reference, 34
disconnectivity hypothesis, 257
discrete movements, 180, 180f
discrete time system, 85–90, 86f, 88f, 89f
distributed representation framework, 177, 180–82, 181f, 196–97, 201–2
disturbance of self, 256–57
Doetsch, P., 258
domain specificity, 29
do-re-mi example, 26–28
double intentionalities, 142
dreaming, 165
Dreyfus, H. L., 28–29, 41, 145
dynamical structure, 86–87, 132–36, 133f, 135f, 245
dynamical systems. See also nonlinear dynamical systems
  continuous-time, 90
  difference equation, 83–84
  discrete time, 85–90, 86f, 88f, 89f
  neurorobotics from perspective of, 125–36, 126f, 128f, 129f, 130f, 131f, 133f, 135f
  structural stability, 90–93, 91f, 92f
dynamical systems approach, 79, 263
  embodied cognition modeled by, 81–137, 126f
  self-organization applied by, 7
dynamic closure, 160, 160f, 166–68
dynamic conflict resolution, 215
dynamic learning, intermittency during, 166–69, 167f, 169f
dynamic neural network models, 137, 176–79, 178f. See also recurrent neural network with parametric biases
A Dynamic Systems Approach to the Development of Cognition and Action (Thelen and Smith), 97–98
dynamic systems theory, 83–93, 85f, 86f, 88f, 89f, 91f, 92f
easy problem, 75
echo-state network, 124–25
Edelman, G., 255
edge of chaos, 89
electroencephalography (EEG), 61, 69–71, 70f
electrophysiological experiments, 56–57
"Elephants don't play chess" (Brooks), 106
Eliasmith, C., 255
Elman, Jeffrey, 118–20, 119f
Elman net, 118–20, 119f
embodied cognition, 235–36. See also dynamic neural network models
  definition of, 82
  dynamical systems approach modeling, 81–137, 85f
embodied mind, 32–37, 36f, 42, 254
embodiment, 78, 79, 107–9, 107f, 236–37
  dimension of, 35–36
  Gibsonian approach and, 94–95
  prediction error generated by, 240
emergence through synthesis, 83
emergency shutdown, 4–5
emotions, 261
end effectors, 67
end-to-end learning, 217
entrainment, 95–96, 154–56
epoché. See suspension of disbelief

error back-propagation scheme, 113–16, 113f, 123f, 257
  CTRNN application of, 204–5
  perceptual sequences acquired by, 204–5, 206
  retrograde axonal signaling mechanism implementing, 207–8
error regression, 231–39, 233f, 234f, 236f, 238f, 257
Evans, Gareth, 9–10
evolution, 126–32
experiences, 266. See also direct experiences; first-person experience; pure experience; subjective experiences
  continuous flow of, 26–28
  perception dependence of, 23–42
  of selfhood, 39
extended mind, 246
external inputs, 215
external observer, 252
facial imitation, 101
faculty of language in broad sense (FLB), 10, 12, 16
faculty of language in narrow sense (FLN), 10, 11–13, 12f, 16, 19
fallenness, 32
fast dynamics, 203f, 204, 205, 206
  at M1, 206–7, 207f
  QRIO, 210
feature representation, 49
feed-forward network model, 112–20, 113f, 116f, 119f, 129–30, 130f
Feynman, Richard, 81, 103
fingers, 96–97, 96f
finite state machine (FSM), 17, 88–89, 153, 160, 227
first-person experience, 106–8, 107f
fixed point attractor, 84, 85, 85f, 94
FLB. See faculty of language in broad sense
flesh, 34–36
FLN. See faculty of language in narrow sense
Floreano, D., 130–31
flow, of subjective experiences, 26–29
fluid compositionality, 202, 216, 265
fMRI. See functional magnetic resonance imaging
focus of expansion (FOE), 94, 95f
forward model, 58, 152, 161
frame problem, 59, 161, 177–78
frame system, 29
Freeman, Walter, 55, 72–73, 225, 237
free will, 69–75, 78, 218, 248–51, 261, 263, 265–67
  for action, 219–41, 236f, 238f
  for conscious awareness, 219–41, 221f, 223f, 225f, 228f, 229f, 233f, 234f, 236f, 238f
  consciousness and, 230–39, 236f, 238f
  consolidation, 225–26
  definition of, 39
  experiments, 221–25, 221f, 223f, 225f
  intention correlates, 69–75, 70f, 73f
  James considering, 225–26
  model for, 39–41, 40f
  in MTRNN model, 220–22, 221f
  overview, 39
  postdiction and, 230–39, 236f, 238f
  stream of consciousness and, 37–41, 40f, 42
  vehicle possessing, 108
Fried, I., 74, 75–76
Friston, Karl, 172n2, 179, 250, 257
frontal cortex, 207
frontopolar part of prefrontal cortex, 71–73, 73f
FSM. See finite state machine
Fukumura, Naohiro, 132
Fukushima, Y., 159–60
functional magnetic resonance imaging (fMRI), 60–61, 65, 66, 67, 70–71
Gallagher, S., 170
Gallese, Vittorio, 67, 68
gated local network models, 200, 202
gated recurrent neural networks (RNNs), 200, 222
Gaussie, 131f, 182
generalization, 194–96, 195f, 257–58
General Problem Solver (GPS), 13–15, 14t, 17, 156–57
genetic algorithm, 217
Georgopoulos, A., 49–50, 54
Gershkoff-Stowe, L., 98, 99
Gestalt. See structuring processes of whole
Gibson, Eleanor, 55, 58, 143
Gibson, J., 93–95, 95f
Gibsonian approach, 93–95, 95f, 107, 263. See also Neo-Gibsonian approaches
global attractor, 158–59, 158f
goal-directed action plans, 13, 108, 156–57, 157f
goal-directed actions, 65–66, 210–15, 212f, 214f, 253
Goldman, Alvin, 67, 68
Goodale, Mel, 56
GPS. See General Problem Solver
grammar, 11, 12–13, 12f
grandmother cells, 245–46
grasping neurons, 64–66, 65f
Graves, A., 245
Graziano, M., 54
groundlessness, 240–41, 251, 253, 264, 267–68
Gunji, Y., 252

Haas, H., 125
hair, 50–51
hammer, 30–31, 42, 60, 170, 248, 264
hands, 34–35, 63
  metronome in synchrony with, 96–97, 96f
  QRIO imitating, 187–90, 189f
  QRIO predicting, 183–87, 184f, 185f
handwriting recognition system, 258
hard problem, 75, 172, 249–50, 263, 267
harmonic oscillator, 92, 93
Harnad, Steven, 16. See also symbol grounding problem
Harris, K., 124
Haruno, M., 200–201, 200f
Hauk, O., 66–67
Hayes, G., 200–201, 200f
Hebbian learning, 190–91
Heidegger, Martin, 21, 39, 41–42, 60, 61, 142–43, 162, 170, 248, 251, 264, 267. See also being-in-the-world
  on future, 235
  on past, 235
hermeneutics, 29–31
Hierarchical Attentive Multiple Models for Execution and Recognition, 200–201, 200f
hierarchical mechanisms, 44–54, 45f, 46f, 47f, 50f, 52f
hierarchical mixture, 200–201, 200f
hierarchical Modular Selection and Identification for Control (MOSAIC), 200–201, 200f
hierarchy, 49–50, 50f, 262, 265
hippocampus, 72–73, 73f, 109, 165
Hobbes, Thomas, 39, 237
Hochreiter, S., 216–17
holding neurons, 65
homeostasis principle, 182
Hopfield network, 164, 165
how pathway, 56, 63
Humanoid Brain project, 256
humanoid robots, 221–22, 221f, 227–30, 228f, 229f, 257. See also other robot; self robot
humans
  cogito level had by, 107–9, 107f
  direct experiences for, 142–43
  imitation for, 190
  intransitive actions of, 65
  language-ready brains, 66–67, 191
  linguistic competency of, 66–67, 190–91
  mirror systems in, 65, 66
  parietal cortex damage in, 57
  presupplementary motor area, 75–76
Hume, David, 170
Husserl, Edmund, 21, 22–23, 23f, 24–25, 24f, 41, 61, 106, 142–43, 145, 176, 186–87, 248–49
  on direct experiences, 26–28
  temporality notion of, 32
  on time perception, 26–29
Hwang, J., 259
hybrid system. See symbol grounding problem

ideational apraxia, 57
ideomotor apraxia, 57
Ikegami, Takashi, 137
Ikegaya, Y., 72, 109
images, 71–72, 182, 197, 266. See also motor imagery; visual imagery
imitation, 100–102, 102f, 131–32, 131f, 188
  game, 187–90, 189f, 251
  for humans, 190
  manipulation, 221–22, 221f
  by mental state reading, 182–90, 184f, 185f, 189f
  prediction error influencing, 198
  QRIO, 187–90, 189f
imitative actions, statistical learning of, 221–22, 221f
imitative behaviors, 66
imperative sentences, 191–96, 192f, 193f, 195f
impression, 27
inauthentic agent, 31–32
inauthentic being, 142–43
incoherence, 169–72
index fingers, 96–97, 96f
indexing, 18–19
infants
  developmental psychology, 97–100, 99f
  imitation in, 100–102, 102f
  intentionality possessed by, 211
  object used by, 101–2, 102f, 143, 211
  preverbal, 101–2, 102f
inferior frontal cortex, 61
inferior parietal cortex, 183
inferior parietal lobe (IPL), 65–66
inferior temporal area (TE) (TEO), 46–47, 46f, 47f
inferotemporal cortex, 46–47, 46f, 47f
infinite regression, 19
information bottlenecks, 210
information hubs, 208, 210
information mismatch, 201, 202
information processing, 200–201
initial states, 208–15, 209f, 212f, 214f
  of intention units, 220–22, 221f
  setting, 218
inner prediction error, 257
instrumental activities, 102, 102f
"Intelligence without representation" (Brooks), 106
intentionalities, 28, 142, 161, 211, 225. See also subjectivity
intentions, 56–61, 59f, 78
  conscious awareness, 69–75, 70f, 73f, 230–39, 236f, 238f
  free will neural correlates, 69–75, 70f, 73f
  initiation of, 71–75, 73f
  intention switched to from, 144
  mirror neurons coding, 68
  organization of, 74–75
  parietal cortex as involved in, 76
  from PFC, 144
  prediction error generated by, 240
  rising of, 69–75, 70f, 73f
  spontaneous, 69–75, 70f, 73f, 230–40, 236f, 238f
  top-down subjective, 263
intention-to-perception mapping, 144
intention units, 204–6, 218, 220–22, 221f
interaction, 252–55
interactionism, problem of, 16, 243–47
intermediate dynamics, 203f, 204, 205
  MTRNN, 223f, 224
  parietal cortex, 207–8
  QRIO, 210
  VP trajectories, 213–15, 214f
intermittency, during dynamic learning, 166–69, 167f, 169f
intermittent chaos, 89, 168
intermittent transitions, 226
internal contextual dynamic structures, 132–36, 133f, 135f
internal observer, 252
intransitive actions, 65
intraparietal sulcus, 62
invariant set, 84, 158–59, 158f
IPL. See inferior parietal lobe
Iriki, Atsushi, 61–62, 62f
Ito, Masao, 58, 152
Jaeger, H., 125, 204
James, William, 21, 37–41, 40f, 42, 69, 71–72, 162, 170, 182, 250. See also streams of consciousness
  free will consideration of, 225–26
  momentary self spoken of by, 171
Jeannerod, M., 59
Johnson-Pynn, J., 11
Jordan-type recurrent neural network (Jordan-type RNN), 116f, 133f, 134
joystick task, 57
Karmiloff-Smith, A., 211
Kawato, Mitsuo, 58, 152
Kelso, Scott, 95–97, 96f
Khepera, 129–30, 129f, 130f
Kiebel, S., 206–7, 207f
kinetic melody, 202, 216
knowledge, 57, 266
Kohonen network, 164, 227–30, 228f, 229f
Kourtzi, Z., 48

Kugler, 95–96
Kuniyoshi, Y., 128–30, 129f, 130f, 175–76
Laird, John, 15, 246–47
landmark-based navigation, mobile robot performing, 162–72, 163f, 167f, 169f, 248
landmarks, 17–18, 17f, 170–71
language, action bound to, 190–96, 192f
language-ready brains, 66–67, 191
latent learning, 161
lateral intraparietal area (LIP), 46
Lateralized Readiness Potential, 70
learnable neurorobots, 141
learning, 259–61. See also consolidation; deep learning; dynamic learning; error back-propagation scheme; imitation; predictive learning
  bound, 191–96, 192f, 193f, 195f
  as end-to-end, 217
  Hebbian, 190–91
  of imitative actions, 221–22, 221f
  as latent, 161
  offline processes, 197–98
  in RNNPB, 177–82, 178f, 181f
  as statistical, 221–22, 221f
lesion, 224, 225f
Li, W., 48
Libet, Benjamin, 69–71, 70f, 218, 219, 220, 223, 230, 235, 240, 249, 263
like me mechanism, 101, 132, 183, 187, 190
limbs, 33, 62–63, 73–74
limit cycle attractors, 84, 85, 85f, 92–93
  locomotion evolution with, 126–28, 128f
  in MTRNN, 213
  periodicity of, 166–68
limit torus, 84, 85f
linguistic competency, 66–67, 190–91
LIP. See lateral intraparietal area
local attractors, 84, 85, 85f
localist scheme, 196–97, 200–201, 200f
local representation framework, 177
locomotion, limit attractor evolution, 126–28, 128f. See also walking
locomotive controller, 127–28
logistic maps, 85–89, 86f, 88f, 89f, 90, 108
longitudinal intentionality, 28
long-term and short-term memory recurrent neural network (RNN) model, 216–17
look-ahead prediction, Yamabico, 154–57, 155f, 157f
Lu, X., 52–53
Luria, Alexander, 202, 216
Lyapunov exponent, 224
M1. See primary motor cortex
macaque monkeys, 45f, 46
Mach, Ernst, 22–23, 23f
man, 31, 33–34, 61
manipulation, 63–64
  imitation, 221–22, 221f
  of QRIO, 209–15, 209f, 212f, 214f
  symbol, 145–48, 147f
  tutored sequences, 227–30, 228f, 229f
  of visual objects, 56–57
Markov chains, 226–27
Massachusetts Institute of Technology (MIT), 103
Matarić, M., 108
matching, 188
Matsuno, K., 252
Maturana, H., 117, 132
May, Robert, 85. See also logistic maps
meanings, 195–96
medial superior temporal area (MST), 45–46
medial temporal lobe, 246
melody, 26–28
Meltzoff, A., 101, 183, 187
memory cells, 217
mental rehearsal, 164–69, 167f, 169f
mental simulation, 154, 156
mental states, imitating others by reading, 182–90, 184f, 185f, 189f
Merleau-Ponty, Maurice, 21, 25, 32–37, 36f, 42, 61–64, 78, 144, 237, 244. See also embodiment; Schneider
middle temporal area (MT), 45
middle way, 254–55
Miller, J., 70
Milner, David, 56
miming, 57
mind/body dualism. See Cartesian dualism
mind-reading, 68
minds, 3–8. See also cognitive minds; consciousness; embodied cognition; subjective mind
  continuity of, 100
  deep, 259
  embodiment of, 32–37, 36f, 42, 254
  as extended, 246
  overview, 262–67
  theory of, 67
minimal cognition, 126
minimal self, 169–72
Minsky, Marvin, 29
mirror box, 63
mirror neurons, 55–56, 261. See also recurrent neural network with parametric biases
  dynamic neural network model for, 176–79, 178f
  evidence for, 64–67, 65f
  grasping, 64–66, 65f
  holding, 65
  implementation, 67–68
  intention coded by, 68
  IPL, 65–66
  model, 177–79, 191–96, 192f, 193f, 195f
  of monkeys, 76
  overview, 64–68, 65f
  in parietal cortex, 76, 177
  tearing, 65
mirror systems, in humans, 65, 66
mismatches, 60–61, 201, 202
MIT. See Massachusetts Institute of Technology
mixed pattern generator, 128, 128f
mobile robots, 16–18, 17f. See also Yamabico
  example, 5–6, 16–18, 17f
  landmark-based navigation performed by, 162–72, 163f, 248
  in office environment, 16–18, 17f
  problem, 16–18, 17f
  with vision, 162–72, 163f, 167f, 169f, 173, 193–96, 193f
models, 57
modularity, 44–49, 45f, 46f, 47f
momentary self, 170, 171, 173, 264
monkey–banana problem, 14–15, 14t

monkeys, 45f, 46, 48, 50, 51–52, 52f, 53–54, 57
  inferior parietal cortex of, 183
  IPL of, 65–66
  mirror neurons of, 76
  motor cortex of, 208
  motor neurons of, 183
  parietal cortex of, 61–62, 62f, 76
  PMC of, 208
  PMv controlling, 64–65, 65f
  presupplementary motor area, 75–76
  primitive movements of, 75–76
Moore, M., 101
moral virtue, 261
Mormann, F., 246
mortality, 32
motifs, 72
motor cortex, 208, 222–23
motor imagery, 59, 206, 211, 222–24, 223f
motor neurons, of monkeys, 183
motor programs, 208–15, 209f, 212f, 214f
motor schemata theory, 9–10, 175–76
movements
  discrete, 180, 180f
  parietal cortex, 73–74
  patterns, 180–82, 181f, 187–90, 189f, 213–15, 214f
  PMC, 73
MST. See medial superior temporal area
MSTNN. See multiple spatio-temporal neural network
MT. See middle temporal area
MTRNNs. See multiple-timescale recurrent neural networks
Mulliken, G. H., 57
multiple spatio-temporal neural network (MSTNN), 217, 218f
multiple-timescale recurrent neural networks (MTRNNs), 252, 257, 265
  action sequences generated by, 227–30, 228f, 229f
  behavior primitives, 204–6
  bottom-up error regression, 207–8, 207f, 215
  brain science correspondence, 206–8, 207f
  chunks, 204–6, 222–23
  compositionality, 217–18, 218f
  experiment, 208–15, 209f, 212f, 214f, 230–35, 233f, 234f
  free will in, 220–22, 221f
  limit-cycle attractors in, 213
  motor imagery generated by, 206
  overview, 203–8, 203f, 207f, 216–18, 218f
  perceptual sequences, 204–6
  recognition performed by, 206
  RNNPB as analogous to, 229–30
  top-down forward prediction, 215
  top-down pathway, 207–8, 207f
  tutoring, 237–39, 238f
Mu-ming Poo, 124
Murata, A., 233f
Mushiake, H., 53–54
mutual imitation game, 187–90, 189f
Nadel, Jacqueline, 101–2, 102f, 131, 188
Namikawa, J., 221f, 223f, 225f
navigation, 251. See also landmark-based navigation; mobile robot
  dynamical structure in, 132–36, 133f, 135f
  internal contextual dynamic structures in, 132–36, 133f, 135f
  problem, 107–9, 107f
  self-organization in, 132–36, 133f, 135f
  Yamabico experiments, 132–36, 153–62
Neo-Gibsonian approaches, 95–97, 96f, 144, 263
neonates, 101
neural activation sequences, 222–24, 223f
neural activation state, 208
neural circuits, 117, 132–36, 133f, 135f
neural correlates, 76–77, 78–79
neural network models, 255–56. See also feed-forward network model
  overview, 112–25, 113f, 116f, 119f, 122f, 123f
  types of, 112–25, 113f, 116f, 119f, 122f, 123f
neurodynamic models, subjective views in, 143–48, 145f, 147f
neurodynamic structure, 157–59, 158f
neurodynamics with timescales, 213–15, 214f
neurodynamic system, 145–48, 147f
neurons, 46. See also mirror neurons; motor neurons; neural network models
  bimodal, 53–54, 56–57, 208
  collective, 63, 72–73, 73f
  hard problem, 75
  as motifs, 72
  PMC, 72–73, 73f
  postsynaptic, 124
  presynaptic, 124
  as spiking, 109–10, 255–56
  V1, 48
neuro-phenomenological-robotics, 256–57
neurophenomenology program, 254
neurorobotics
  from dynamical systems perspective, 125–36, 126f, 128f, 129f, 130f, 131f, 133f, 135f
  model, 257–59
neuroscience. See brain science
Newell, Allen, 13, 15. See also General Problem Solver
newness, 248–49
Nishida, Kitaro, 21, 22–23, 25
Nishimoto, R., 209f, 212f
Nolfi, S., 130–31, 200–201, 200f
nonlinear dynamical systems, structural stability of, 90–93, 91f, 92f. See also logistic maps
nonlinear dynamics, 83–93
nonlinear mapping, tangency in, 89, 89f
nouvelle artificial intelligence (nouvelle AI), 106
novel action sequences, 227–30, 228f, 229f
nowness, 27, 171–72, 186–87, 235
objectification, 26–29
objective science, subjective experience and, 251–55, 267
objective time, 27–28, 38–39
objective world, 266–67
  phenomenology, 7, 23–42, 24f, 36f, 40f
  subjective mind as tied to, 49, 148–50, 149f
  subjective mind's distinction from, 7, 23–42, 24f, 36f, 40f
  subjectivity as mirror of, 172
objectivity, 250, 254–55
objects. See also manipulation; tools; visual objects
  as attractive, 98–100, 99f
  chimps and, 10–11
  complex, 46f
  counting of, 10–12
  features, 46f
  infants using, 101–2, 102f, 143, 211
  perception of, 33–37, 36f
  shaking, 144–45, 145f
  skilled behaviors for manipulating, 57–61, 59f
  subject as separated from, 22–23
  subject iterative exchanges, 36
  subject's unified existence with, 25, 36, 244, 248
  as three-dimensional, 35–36, 36f
  as two-dimensional, 35–36, 36f
offline learning processes, 197–98
offline look-ahead prediction. See look-ahead prediction
one-step prediction, 154–55, 155f, 232
online prediction, 153–54
open-loop mode, 178
operating system (OS), 79
optical constancy, 95f
optical flow, 94, 95f
OS. See operating system
other robot, 231–35, 233f, 234f
outfielder, 94–95
overregularization, 98
Oztop, E., 58

palpation, 34–35, 144
Parallel Distributed Processing (PDP) Research Group, 196
parametric bias (PB), 177–82, 178f, 181f, 215
  activations, 191–93, 192f
  prediction error, 198
  self-organization, 192
  vectors, 183–87, 185f, 191–93, 192f, 194–96, 195f, 197, 201–2, 201f, 204
parietal cortex, 55, 78, 237, 266. See also precuneus
  action intention meeting of, 56–61, 59f
  bimodal neurons in, 208
  cells, 57
  damage to, 57
  as information hub, 208
  intention involvement of, 76
  intermediate dynamics, 207–8
  mirror neurons in, 76, 177
  of monkeys, 61–62, 62f, 76
  movements, 73–74
  overview, 56–61, 59f
  perceptual outcome meeting of, 56–61, 59f
  perceptual structures in, 144
  predictive model in, 59f, 68
  stimulation of, 73–74
  visual objects involvement of, 56–57
parrots, 11
past, 235
pastness, 27
PB. See parametric bias
PCA. See principal component analysis
PDP Research Group. See Parallel Distributed Processing Research Group
perception. See also active perception; what pathway; where pathway
  action changing reality of, 60–61
  action generation role of, 55
  cogito as separate from, 25
  experience as dependent on, 23–42, 24f, 36f, 40f
  intention altered by reality of, 60–61
  of objects, 33–37, 36f
  outcome, 56–61, 59f
  parietal cortex meeting of, 56–61, 59f
  of square, 24–25, 24f
  of time, 26–29, 176, 186–87, 248–49
perception-to-action mapping, 144
perception-to-motor cycle, 106, 107
perceptual constancy, 95
perceptual flows, 258
perceptual sequences, 177–79, 178f, 203f, 204–6
perceptual structures, in parietal cortex, 144
perchings, 38, 226
periodicity, of limit cycle attractors, 166–68
perseverative reaching, 98–100, 99f
PFC. See prefrontal cortex
Pfeifer, R., 128–30, 129f, 130f
phantom limbs, 33, 62–63
phase transitions, 95–97, 96f, 144
phenomenological reduction, 21
phenomenology, 20
  being-in-the-world, 29–32
  direct experience in, 22–23, 23f
  embodiment of mind, 32–37, 36f
  objectification, 26–29
  objective world, 7, 23–42, 24f, 36f, 40f
  overview, 21–42, 23f, 24f, 36f, 40f, 247–51
  subjective experiences, 26–29
  subjective mind, 7, 23–42, 24f, 36f, 40f
  time perception, 26–29, 176, 186–87, 248–49
Piaget, Jean, 98–101, 99f, 102, 260
Pick, Anne, 55, 58, 143
pilots, 94
PMC. See premotor cortex
PMv. See ventral premotor area
Poincaré section, 90, 91f
polarity, 25
poles, 36
Pollack, J., 159
postdiction, 230–39, 233f, 234f, 236f, 238f
postsynaptic neurons, 124
posture, 59–60, 59f
poverty of stimulus problem, 193, 260
precuneus, 71–73, 73f
prediction. See also one-step prediction
  errors, 166–72, 167f, 169f, 192–93, 198, 206, 207–8, 231–32, 236, 240, 257–58
  as offline, 154
  as online, 153–54
  RNNs as responsible for, 186
  of sensation, 153–57, 153f, 155f, 157f
  top-down, 60–61, 164–65, 197–98
  Yamabico, 153–57, 153f, 155f, 157f
predictive coding, 48, 191–96, 192f, 193f, 195f
predictive dynamics, self-consciousness and, 161–72, 163f, 167f, 169f
predictive learning
  from actional consequences, 151–73
  about world, 151–73, 153f, 155f, 157f, 158f, 160f, 163f, 167f, 169f
predictive model, 57–64, 59f, 62f, 68
preempirical time, 26–27
prefrontal cortex (PFC), 144, 206–7, 207f, 225–26, 266. See also frontopolar part of prefrontal cortex
premotor cortex (PMC), 49–54, 50f, 52f, 77–78
  of monkey, 208
  movements, 73
  neurons, 72–73, 73f
  role of, 76
  stimulations of, 73
present, 31–32
presentness, 27
presupplementary motor area, direct stimulation of, 74–76
presynaptic neurons, 124
pretend play, 101
preverbal infants, 101–2, 102f
primary motor cortex (M1), 49–53, 50f, 52f, 54, 60, 77–78
  faster dynamics at, 206–7, 207f
  SMA and, 206–7, 207f
primary visual cortex (V1), 44–45, 48
primitive actions, stochastic transitions between, 221–22, 221f
primitive movements, 51–52, 52f, 53, 75–76
principal component analysis (PCA), 211
Principles of Psychology (James), 37–39
private states of consciousness, 38–39
probabilistic processes, 226–27
problem of interactionism, 16, 243–47
proprioception, 59–60, 59f, 179, 183–87, 184f, 185f
protention, 26–27, 61, 186, 198
protosigns, 66–67
Pulvermüller, F., 191
pure experience, 22–23, 26–28, 39
Quest for cuRIOsity (QRIO), 183–90, 184f, 189f
  complex actions, 209–15, 209f, 212f, 214f
  developmental training, 209–15, 209f, 212f, 214f
  fast dynamics, 210
  intermediate dynamics, 210
  manipulation of, 209–15, 209f, 212f, 214f
  slow dynamics, 210

rake, 62
Ramachandran, V., 63
Rao, Rajesh, 48, 60
rapid eye movement (REM) sleep phase, 165
rats, hippocampus of, 72–73, 73f
reactive behaviors, 141–42
Readiness Potential (RP), 69–71, 70f
recognition, 22. See also visual recognition
  action's circular causality with, 149
  bottom-up, 60–61, 197–98, 266
  in brain, 55–68, 59f, 62f, 65f
  of landmarks, 170–71
  MTRNNs performing, 206
  of perceptual sequences, 206
reconstruction, 22
recurrent neural networks (RNNs), 111–12, 150, 245. See also cascaded recurrent neural network; Jordan-type recurrent neural network
  as forward dynamics model, 152
  as gated, 222
  models, 116–20, 116f, 119f, 124, 202, 216–17, 264
  prediction responsibility of, 186
  Yamabico, 153–61, 153f, 155f, 157f, 160f
recurrent neural network with parametric biases (RNNPB), 177–79, 205. See also parametric bias
  characteristics of, 179–82, 181f
  distributed representation characteristics in, 197
  frame problem avoided by, 177–78
  learning in, 177–82, 178f, 181f
  models, 191–98, 192f, 193f, 195f, 201–2, 201f, 204, 229–30, 248–49, 264–65
  MTRNN as analogous to, 229–30
  overview, 176–79, 178f
  segmentation of, 186–87
  system flow of, 178f
recursion, 9–13, 12f
reflective pattern generator, 127
reflective selves, 216
refrigerator, 5, 19, 41–42
refusal of deficiency, 33
rehearsal, 164–69, 167f, 169f
REM phase. See rapid eye movement sleep phase
representation, 25, 28, 106, 108, 145–48, 147f
response facilitation with understanding meaning, 183
retention, 26–27, 186, 198
retrograde axonal signal, 124
retrograde axonal signaling mechanism, 124, 207–8
Rizzolatti, G., 64–66, 65f, 76, 182–83
RNNPB. See recurrent neural network with parametric biases
RNNs. See recurrent neural networks
robotics, 5–6, 261. See also behavior-based robotics; neurorobotics
robots. See also arm robot; behavior-based robotics; mobile robots; other robot; self robot
  Cartesian dualism freedom of, 149
  as humanoid, 221–22, 221f, 227–30, 228f, 229f, 257
  Khepera, 129–30, 129f, 130f
  navigation problem, 107–9, 107f
  reflective selves of, 216
  as self-narrative, 206, 216, 249
  with subjective views, 141–43
  walking of, 126–28, 128f
Rössler attractor, 90, 158
Rössler system, 90, 91f
rostral-caudal gradient, 206–7, 207f
RP. See Readiness Potential
rules, 11, 12f, 14–15, 14t
Rumelhart, D., 113. See also error back-propagation scheme
Sakata, Hideo, 57
sand pile behavior, 171
scaffolding, 260
Scheier, C., 128–30, 129f, 130f
schizophrenia, 256–57
Schmidhuber, J., 216–17
Schneider, 33–34, 56
see-ers, 34–35
segmentation, 176, 186–87


self-consciousness, 161–72, 163f, 167f, 169–70, 169f
selfhood, 39
self-organization, 7, 98, 130, 202, 244–45
  in bound learning process, 194–96, 195f
  of dynamical structure, 132–36, 133f, 135f, 245
  dynamical systems approach applying, 7
  of functional hierarchy, 203–8, 203f, 207f
  multiple timescales, 203–8, 203f, 207f
  in navigation, 132–36, 133f, 135f
  PB, 192
self-organized criticality (SOC), 171–72, 188–90, 264, 267
self robot, 231–35, 233f, 234f
selves, 248, 264
  disturbance of, 256–57
  as minimal, 169–72
  momentary, 170, 171, 173, 264
  range of, 33
  as reflective, 216
semantically combinatorial language of thought, 145
sensationalism, 24
sensations, prediction of, 153–57, 153f, 155f, 157f. See also synesthesia
sensory aliasing problem, 134
sensory cortices, 63–64
sensory-guided actions, in PMC, 53–54
sensory-motor coordination, 128–32, 129f, 130f, 131f
sensory-motor flow
  action generation mirrored by, 175–98, 178f
  articulating, 175–98, 185f
sensory-motor sequences model, 191–96
sentences, 190
  Elman net generating, 118–20, 119f
  model, 191–96, 192f
  recursive structure of, 11, 12f
sequence patterns, 177–79, 178f. See also recurrent neural network with parametric biases
sequential movements, 53–54
shaking, 144–45, 145f
Shima, K., 51–52, 52f, 53, 76, 206
short-term memory (STM), 247
Siegelmann, Hava, 112, 244–45
Simon, Herbert, 13. See also General Problem Solver
simulation theory, 67, 68
single-unit recording, 46
sinusoidal function, 92
Sirigu, A., 58, 59, 76
skilled behaviors, 50–51, 57–61, 59f
slow dynamics, 203–4, 203f, 205, 206. See also intention units
  MTRNN, 223f, 224
  at PFC, 206–7, 207f
  QRIO, 210
SMA. See supplementary motor area
Smith, Linda, 97–98, 211
Soar, 14, 15, 246–47
SOC. See self-organized criticality
Soon, C., 70–71, 74, 218, 219, 220, 223, 230, 240
speech recognition system, 258
Spencer-Brown, G., 18–19
spiking neurons, 109–10, 255–56. See also neural network models
Spivey, M., 100, 146
spoken grammar, 12–13
spontaneity, 226–27


spontaneous behaviors, 219–30
spontaneous generation
  of intention, 69–75, 70f, 73f, 230–40, 236f, 238f
  overview, 71–72
staged development, 260
statistical learning, of imitative actions, 221–22, 221f
steady phase, 168, 169, 170
STM. See short-term memory
stochastic transitions between primitive actions, 221–22, 221f
streams of consciousness, 170
  characteristics of, 37–39
  definition of, 37
  flights, 38, 226
  free will and, 37–41, 40f, 42
  images in, 182
  overview, 37–41, 40f
  perchings, 38, 226
  states, 37–39
stretching and folding, 87, 88f
structural stability, 90–93, 91f, 92f
structuring processes of whole (Gestalt), 34
subject
  object as separated from, 22–23
  object iterative exchanges, 36
  object's unified existence with, 25, 36, 244, 248
subjective experiences, 26–29, 251–55, 267
subjective mind
  actions influencing, 49
  objective world as tied to, 49, 148–50, 149f
  objective world's distinction from, 7, 23–42, 24f, 36f, 40f
  phenomenology, 7, 23–42, 24f, 36f, 40f
subjective sense of time, 32
subjective views, 141–48, 145f, 147f
subjectivity, 141, 172, 250, 254–55, 266–67
subrecursive functions, 13
substantial parts, 226
subsumption architecture, 107–9, 107f
Sugita, Yuuya, 191–96, 192f, 193f, 195f
Sun, Ron, 247
superior parietal cortex. See precuneus
supplementary motor area (SMA), 49–54, 50f, 52f, 61, 63–64, 77–78
  EEG activity, 70–71
  M1 and, 206–7, 207f
surprise, 172n2
suspension of disbelief (epoché), 25
symbol grounding problem, 15–18, 17f, 159–61, 160f, 243
symbolic dynamics, 88–89
symbolic processes, 87–88
symbols, 19–20, 108, 145–48, 147f, 245–47
symbol systems, 9–13, 12f, 18–20
synchronization, 188
synchrony, 102, 102f, 131–32
synesthesia, 34, 63
synthesis, emergence through, 83
synthetic modeling approach, 79, 82. See also dynamical systems approach; embodiment
synthetic neurorobotics studies, 6, 7, 263–64
synthetic robotics approach, 7, 267
synthetic robotics experiments, 150, 218
tactile palpation, 34
Tanaka, Keiji, 46


tangency, 89, 89f, 171
Tani, Tohru, 17f, 28, 35, 38
Tanji, J., 51–52, 52f, 53–54, 76, 206
TE. See inferior temporal area
tearing neurons, 65
temporality, 32
temporal patterns, 182
temporoparietal junction (TPJ), 61
TEO. See inferior temporal area
Tettamanti, M., 67
that which appears, 24–25, 24f
Thelen, Esther, 97–98, 99, 211
thinking, thought segmentation of, 146
thoughts, 71–72
  chaos generating, 108
  experiments, 103–5, 104f, 105f
  semantically combinatorial language of, 145
  thinking segmented into, 146
three-dimensional objects, 35–36, 36f
time, subjective sense of, 32. See also objective time
time perception phenomenology, 26–29, 176, 186–87, 248–49
tokens, 10, 13
tools, 57–61, 59f
top-down forward prediction, 215
top-down pathway, 63, 164–65, 172, 207–8, 207f, 250
top-down prediction, 60–61, 164–65, 197–98
top-down projection, 144–45
top-down subjective intentions, 263
top-down subjective view, 266
touched, 35
touching, 35
toy, 98–100, 99f
TPJ. See temporoparietal junction
training, actions influenced by, 215. See also developmental training
transient parts, 226
transition rules, 14–15, 14t
transition sequences, 222
transitive actions, 65
transversal intentionality, 28
Trevena, J., 70
Tsuda, I., 168–69
Turing limit, 112, 245
turn taking, 188, 190
Turvey, 95–96
tutored sequences, 227–30, 228f, 229f
tutoring, 237–39, 238f, 240–41, 259–61
two-dimensional objects, 35–36, 36f
two-stage model, 39–41, 40f
Uddén, J., 206–7, 207f
Ueda, Shizuteru, 23
universal grammar, 191
unsteady phase, 168, 169–70
usage-based approach, 191
U-shaped development, 98–100, 99f
V1. See primary visual cortex
V2, 45, 48
V4, 46, 47, 48
Van de Cruys, S., 257–58
Varela, Francisco, 27, 42, 117, 132, 187, 248, 254
vector field, 91–92, 92f
vector flow, 91–92, 92f
vehicles, 103–5, 104f, 105f, 107, 108–9
Vehicles: Experiments in Synthetic Psychology (Braitenberg), 103–6
ventral intraparietal area (VIP), 46
ventral premotor area (PMv), 64–65, 65f
VIP. See ventral intraparietal area


virtual-reality mirror box, 63
virtue, 261
vision
  Merleau-Ponty on, 34–35
  mobile robot with, 162–72, 163f, 167f, 169f, 173, 193–96, 193f, 195f
visual agnosia, 56
visual alphabets, 47
visual cortex, 44–49, 45f, 46f, 47f
visual imagery, 228–29
visual objects, 46f, 56–57
visual palpation, 144
visual receptive field, 62
visual recognition, 44–54, 45f, 46f, 47f, 50f, 52f
visuo-proprioceptive (VP) flow, 210
visuo-proprioceptive mapping, 131–32, 131f
visuo-proprioceptive (VP) trajectories, 213–15, 214f
voluntary actions, 39
voluntary sequential movements in SMA, 50–53, 52f
von Hofsten, Claes, 143
VP flow. See visuo-proprioceptive flow
VP trajectories. See visuo-proprioceptive trajectories
walking, 126–28, 128f
walking reflex, 98
water hammers, 4–5, 6
Werbos, Paul, 113. See also error back-propagation scheme
Wernicke's area, 191
what pathway, 45, 46, 47f, 162–65, 163f
where pathway, 45, 162–65, 163f
will, 56. See also free will
Wittgenstein, Ludwig, 18
w-judgment time, 69–71, 70f
Wolpert, Daniel, 58
words, 67, 145–48, 147f, 190
World War II (WWII), 94
Yamabico, 132–36, 133f, 135f, 164, 173
  branching, 133, 152–60, 153f, 155f, 157f, 158f, 160f, 176
  intentionality of, 161
  look-ahead prediction, 154–57, 155f, 157f
  navigation experiments with, 153–62, 153f, 155f, 157f, 158f, 160f
  neurodynamic structure, 157–59, 158f
  prediction, 153–57, 153f, 155f, 157f
  RNN, 153–61, 153f, 155f, 157f, 160f
  symbol grounding problem, 159–61, 160f
  trajectories of, 155, 156–57, 157f
Yamashita, Yuuichi, 208, 256–57. See also multiple-timescale recurrent neural network
Yen, S. C., 49
