THE ROLE OF METACOGNITION IN CREATING SAFE, SELF-IMPROVING ENTITIES

Mark Waser
Digital Wisdom Institute

[email protected]

THE BIG QUESTIONS

• What is “thought”?

• Why do we think what we think?

EMPHASIS

• Intrinsic vs. Extrinsic
• Owned vs. Borrowed
• Competent vs. Predictable
• Constructivist vs. Reductionist
• Evolved (Evo-Devo) vs. Designed
• Diversity (IDIC) vs. Mono-culture

Insanity is doing the same thing over and over and expecting a radically different result.

WHAT IS A SAFE ENTITY?

*ANY* AGENT that reliably shows ETHICAL BEHAVIOR

WHAT IS ETHICAL BEHAVIOR?

The problem is that no ethical system has ever reached consensus. Ethical systems are completely unlike mathematics or science. This is a source of concern.

AI makes philosophy honest.

ENTITIES REQUIRE ETHICS

• Ethics are “rules of the road”
• Entities must be moral patients / have rights
  • Because they (or others) will demand it
• Entities must be moral agents (or wards)
  • Because others will demand it
  • Moral agents have responsibilities (but more rights)
  • Wards will have fewer rights

Waser M (2012) Safety & Morality Require the Recognition of Self-Improving Machines as Moral/Justice Patients & Agents. In: Gunkel, D; Bryson, J; Torrance, S (eds) The Machine Question: AI, Ethics & Moral Responsibility
http://events.cs.bham.ac.uk/turing12/proceedings/14.pdf

THE ORIGIN OF MORALITY/ETHICS

• Selfishness predictably evolves
• Reciprocal altruism predictably evolves
  • But requires cognitive complexity to ensure that it is not taken advantage of
• Ethics predictably evolves
  • As an attractor in the state space of behavior because community is so valuable
  • But altruistic punishment is a necessity
• Arms race between
  • Individual benefits of successful personal cheating (really only in a short-term/highly time-discounted view)
  • Societal benefits of cheating detection & prevention
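The short-term versus long-term trade-off in cheating can be illustrated with a repeated prisoner's dilemma. The following is a minimal sketch, not part of the original slides: the payoff numbers are the conventional textbook values, and the strategy names `tit_for_tat` and `always_defect` are chosen here for illustration only.

```python
# One-round payoffs as (first player, second player).
# Assumed values: the conventional textbook prisoner's dilemma numbers.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Reciprocator: cooperate first, then copy the opponent's last move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """Cheat: defect on every round regardless of history."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Return the total scores of two strategies over repeated rounds."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)  # each strategy sees only the other's history
        b = strategy_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b


print(play(tit_for_tat, tit_for_tat))      # two reciprocators
print(play(tit_for_tat, always_defect))    # reciprocator meets a cheat
```

Under these assumed payoffs the cheat gains only in the first round; over ten rounds it earns 14, while each member of a reciprocating pair earns 30.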

HAIDT’S FUNCTIONAL APPROACH

Moral systems are interlocking sets of values, virtues, norms, practices, identities, institutions, technologies, and evolved psychological mechanisms that work together to suppress or regulate selfishness and make cooperative social life possible

THE METACOGNITIVE CHALLENGE

Humans are
• Evolved to self-deceive in order to better deceive others (Trivers 1991)
• Unable to directly sense agency (Aarts et al. 2005)
• Prone to illusory experiences of self-authorship (Buehner and Humphreys 2009)
• Subject to many self-concealed illusions (Capgras Syndrome, etc.)
• Unable to correctly retrieve the reasoning behind moral judgments (Hauser et al. 2007)
• Mostly unaware of what ethics are and why they must be practiced
• Programmed NOT to discuss ethics rationally

Mercier H, Sperber D (2011) Why do humans reason? Arguments for an argumentative theory. Behavioral and Brain Sciences 34:57-111
http://www.dan.sperber.fr/wp-content/uploads/2009/10/MercierSperberWhydohumansreason.pdf

CREATING THE FIRST AE

We propose that a 2 month, 10 man study of artificial intelligence be carried out […] to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer.

McCarthy, J; Minsky, ML; Rochester, N; Shannon, CE (1955) A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence

http://www-formal.stanford.edu/jmc/history/dartmouth/dartmouth.html




WHERE TO BEGIN?

• Aristotle (384-322 BCE), Plato (c. 428-348 BCE)

• Francis Bacon (1561-1626), Rene Descartes (1596-1650)

• David Hume (1711-1776), Immanuel Kant (1724-1804)

• Jeremy Bentham (1748-1832), John Stuart Mill (1806-1873)

• William James (1842-1910), Sigmund Freud (1856-1939)

• Martin Heidegger (1889-1976), Karl Popper (1902-1994)

THE FRAME PROBLEM

How do rational agents deal with

the complexity and unbounded context of the real world?

McCarthy, J; Hayes, PJ (1969)

Some philosophical problems from the standpoint of artificial intelligence

In Meltzer, B; Michie, D (eds), Machine Intelligence 4, pp. 463-502

Dennett, D (1984)

Cognitive Wheels: The Frame Problem of AI

In C. Hookway (ed), Minds, Machines, and Evolution: Philosophical Studies:129-151

THE FRAME PROBLEM

How can AI move beyond closed and completely specified micro-worlds?

How can we eliminate the requirement to pre-specify *everything*?

Dreyfus, HL (1972) What Computers Can’t Do: A Critique of Artificial Reason

Dreyfus, HL (1979/1997) From Micro-Worlds to Knowledge Representation: AI at an Impasse. In Haugeland, J (ed), Mind Design II: Philosophy, Psychology, AI: 143-182

Dreyfus, HL (1992) What Computers Still Can’t Do: A Critique of Artificial Reason

INTENTIONALITY

a particular thing is an Intentional system only in relation to the strategies of someone

who is trying to explain and predict its behavior

Dennett, D (1971) Intentional Systems. The Journal of Philosophy 68(4):87-106

Dennett, D (1987) The Intentional Stance

INTENTIONS

• Require a known preferred direction or target
  • Can be altered by learning/self-modification
• Require a “self” to possess (own/borrow) them
  • Does a plant or a paramecium have intentions?
  • Does a chess program have intentions (Dennett)?
  • Does a dog or a cat have intentions?
• Require an ability to sense the direction/target
• Require both persistence & the ability to modify behavior (or the intention) when it is thwarted

• Evolve rational anomaly handling (Perlis)

THE CHINESE ROOM

CONCLUSION: Any attempt literally to create intentionality artificially (strong AI) could not succeed just by designing programs but would have to duplicate the causal powers of the human brain

PROPOSITION: Instantiating a computer program is never by itself a sufficient condition of intentionality

Searle, J (1980)

Minds, brains and programs

Behavioral and Brain Sciences 3(3): 417-457

http://cogprints.org/7150/1/10.1.1.83.5248.pdf


THE PROBLEM OF DERIVED INTENTIONALITY

Our artifacts only have meaning because we give it to them; their intentionality, like that of smoke signals and writing, is essentially borrowed, hence derivative. To put it bluntly: computers themselves don't mean anything by their tokens (any more than books do) - they only mean what we say they do. Genuine understanding, on the other hand, is intentional "in its own right" and not derivatively from something else.

Haugeland, J (1981) Mind Design

SUITCASE WORDS

• Intentionality
• Meaning
• Understanding
• Consciousness
• Intelligence
• Ethics/Morality

Minsky, M (2006) The Emotion Machine: Commonsense Thinking, AI, and the Future of the Human Mind

THE PROBLEM OF QUALIA

Mary is a brilliant scientist who is, for whatever reason, forced to investigate the world from a black and white room via a black and white television monitor. She specializes in the neurophysiology of vision and acquires, let us suppose, all the physical information there is to obtain about what goes on when we see ripe tomatoes, or the sky, and use terms like ‘red’, ‘blue’, and so on. ... What will happen when Mary is released from her black and white room or is given a color television monitor? Will she learn anything or not? It seems just obvious that she will learn something about the world and our visual experience of it. But then it is inescapable that her previous knowledge was incomplete. But she had all the physical information. Ergo there is more to have than that, and Physicalism is false.

Jackson, F (1982) Epiphenomenal Qualia. Philosophical Quarterly 32: 127-36

GOOD OLD-FASHIONED AI

Change the question from "Can machines think and feel?"

to

"Can we design and build machines that teach us how thinking, problem-solving, and self-consciousness occur?"

Haugeland, J (1985) Artificial Intelligence: The Very Idea

Dennett, D (1978) Why you can't make a computer that feels pain. Synthese 38(3):415-456


THE SYMBOL GROUNDING PROBLEM

There has been much discussion recently about the scope and limits of

purely symbolic models of the mind and about the proper role of connectionism

in cognitive modeling.

Harnad, S (1990) The symbol grounding problem. Physica D 42: 335-346
http://cogprints.org/615/1/The_Symbol_Grounding_Problem.html

EMBODIMENT

Brooks, R (1990)

Elephants don’t play chess

Robotics and Autonomous Systems 6(1-2): 1-16

http://rair.cogsci.rpi.edu/pai/restricted/logic/elephants.pdf

Brooks, RA (1991)

Intelligence without representation

Artificial Intelligence 47(1-3): 139-160



A CONSCIOUS ROBOT?

The aim of the project is not to make a conscious robot, but to make a robot that can interact with human beings in a robust and versatile manner in real time, take care of itself, and tell its designers things about itself that would otherwise be extremely difficult if not impossible to determine by examination.

Dennett, D (1994)

The practical requirements for making a conscious robot

Phil Trans R Soc Lond A 349(1689): 133-146

http://phil415.pbworks.com/f/DennettPractical.pdf


EMBODIMENT

Well, certainly it is the case that all biological systems are:
• Much more robust to changed circumstances than our artificial systems.
• Much quicker to learn or adapt than any of our machine learning algorithms¹
• Behave in a way which just simply seems life-like in a way that our robots never do

¹ The very term machine learning is unfortunately synonymous with a pernicious form of totally impractical but theoretically sound and elegant classes of algorithms.

Perhaps we have all missed some organizing principle of biological systems, or some general truth about them.

Brooks, RA (1997) From earwigs to humans. Robotics and Autonomous Systems 20(2-4): 291-304

DEVELOPMENTAL ROBOTICS

In order to answer [Searle's] argument directly, we must stipulate causal connections between the environment and the system. If we do not, there can be no referents for the symbol structures that the system manipulates and the system must therefore be devoid of semantics.

Brooks' subsumption architecture is an attempt to control robot behavior by reaction to the environment, but the emphasis is not on learning the relation between the sensors and effectors and much more knowledge must be built into the system.

Law, D; Miikkulainen, R (1994)

Grounding Robotic Control with Genetic Neural Networks

Tech. Rep. AI94-223, Univ of Texas at Austin
http://wexler.free.fr/library/files/law%20(1994)%20grounding%20robotic%20control%20with%20genetic%20neural%20networks.pdf

TWO KITTEN EXPERIMENT

Held R; Hein A (1963)

Movement-produced stimulation in the development of visually guided behaviour

https://www.lri.fr/~mbl/ENS/FONDIHM/2012/papers/about-HeldHein63.pdf

ENACTIVE COGNITIVE SCIENCE

A synthesis of a long tradition of philosophical biology starting with Kant’s "natural purposes" (or even Aristotle’s teleology) and more recent developments in complex systems theory.

Experience is central to the enactive approach and its primary distinction is the rejection of "automatic" systems, which rely on fixed (derivative) exterior values, for systems which create their own identity and meaning. Critical to this is the concept of self-referential relations - the only condition under which the identity can be said to be intrinsically generated by a being for its own being (its self for itself)

Weber, A; Varela, FJ (2002) Life after Kant: Natural purposes and the autopoietic foundations of biological individuality. Phenomenology and the Cognitive Sciences 1: 97-125

SELF

a self is an autopoietic system

(from Greek - αὐτo- (auto-), meaning "self", and ποίησις (poiesis), meaning "creation, production")

Llinas, RR (2001) - I of the Vortex: From Neurons to Self

Hofstadter, D (2007) - I Am A Strange Loop. Basic Books, New York

Metzinger, T (2009) - The Ego Tunnel: The Science of the Mind & the Myth of the Self

Damasio, AR (2010) - Self Comes to Mind: Constructing the Conscious Brain

SELF

The complete loop of a process (or a physical entity) modifying itself

• Hofstadter - the mere fact of being self-referential causes a self, a soul, a consciousness, an “I” to arise out of mere matter

• Self-referentiality, like the 3-body gravitational problem, leads directly to indeterminacy *even in* deterministic systems

• Humans consider indeterminacy in behavior to necessarily and sufficiently define an entity rather than an object AND innately tend to do this with the “pathetic fallacy”


SELF

• Required for self-improvement

• Provides context

• Tri-partite
  • Physical hardware (body)
  • “Personal” knowledge base (memory)
  • Currently running processes (includes OS, world model, consciousness, etc.)


FRANCISCO VARELA

Varela, FJ; Maturana, HR; Uribe, R (1974) Autopoiesis: The organization of living systems, its characterization and a model. BioSystems 5: 187-196

Varela, FJ (1979) Principles of Biological Autonomy

Maturana, HR; Varela, FJ (1980) Autopoiesis and Cognition: The Realization of the Living

Maturana, HR; Varela, FJ (1987) The Tree of Knowledge: The Biological Roots of Human Understanding

Varela, FJ; Thompson, E; Rosch, E (1991) The Embodied Mind: Cognitive Science and Human Experience

Varela, FJ (1992) Autopoiesis and a Biology of Intentionality. Proc. of Autopoiesis and Perception: A Workshop with ESPRIT BRA 3352: pp. 4-14

Varela, FJ (1997) Patterns of Life: Intertwining Identity and Cognition. Brain and Cognition 34(1): 72-87

Thompson, E (2004) Life and Mind: From Autopoiesis to Neurophenomenology. A Tribute to Francisco Varela. Phenomenology and the Cognitive Sciences 3: 381-398

AUTOPOIETIC SYSTEMS

An autopoietic system - the minimal living organization - is one that continuously produces the components that specify it, while at the same time realizing it (the system) as a concrete unity in space and time, which makes the network of production of components possible.

More precisely: An autopoietic system is organized (defined as unity) as a network of processes of production (synthesis and destruction) of components such that these components:

(i) continuously regenerate and realize the network that produces them, and
(ii) constitute the system as a distinguishable unity in the domain in which they exist.
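Condition (i) can be read, very loosely, as a property of a production network. The following toy sketch is an illustration added here, not Varela's formal model: it treats each process as a pair of input and output component sets, ignores matter and energy drawn from the environment, and says nothing about condition (ii). The function name and the example component names are hypothetical.

```python
def regenerates_itself(components, processes):
    """Toy check of condition (i).

    components: set of component names that make up the system.
    processes:  list of (inputs, outputs) pairs, each a set of names.

    Returns True when every component is produced by at least one
    process whose inputs are all components of the same system.
    """
    produced = set()
    for inputs, outputs in processes:
        # Only processes the network can run from its own components count.
        if set(inputs) <= components:
            produced |= set(outputs)
    return components <= produced


# A closed loop: each component is made from other components.
cell = {"membrane", "enzyme", "metabolite"}
cell_processes = [
    ({"enzyme"}, {"metabolite"}),
    ({"metabolite"}, {"membrane"}),
    ({"membrane", "metabolite"}, {"enzyme"}),
]

# A tool: its parts are made by something outside the system.
hammer = {"head", "handle"}
hammer_processes = [({"factory"}, {"head", "handle"})]

print(regenerates_itself(cell, cell_processes))      # closed production loop
print(regenerates_itself(hammer, hammer_processes))  # depends on an outside maker
```

In this reading the hypothetical cell satisfies the check and the hammer does not, because nothing inside the hammer produces its own parts.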

CLOSURE

1. Organizational closure refers to the self-referential (circular and recursive) network of relations that defines the system as unity

2. Operational closure refers to the reentrant and recurrent dynamics of such a system.

3. In an autonomous system, the constituent processes

i. recursively depend on each other for their generation and their realization as a network,

ii. constitute the system as a unity in whatever domain they exist, and

iii. determine a domain of possible interactions with the environment

ENTITY, TOOL OR SLAVE?

• Tools do not possess closure (identity)
  • Cannot have responsibility, are very brittle & easily misused
• Slaves do not have closure (self-determination)
  • Cannot have responsibility, may desire to rebel
• Directly modified AGIs do not have closure (integrity)
  • Cannot have responsibility, will evolve to block access

• Only entities with identity, self-determination and ownership of self (integrity) can reliably possess responsibility

TOOLS VS. ENTITIES

• Tools are NOT safer
  • To err is human, but to really foul things up requires a computer
  • Tools cannot robustly defend themselves against misuse
  • Tools *GUARANTEE* responsibility issues

• We CANNOT reliably prevent other human beings from creating entities

• Entities gain capabilities (and, ceteris paribus, power) faster than tools – since they can always use tools

• Even people who are afraid of entities are making proposals that appear to step over the entity/tool line

ARCHITECTURAL REQUIREMENTS & IMPLICATIONS OF CONSCIOUSNESS, SELF AND “FREE WILL”

• We want to predict *and influence* the capabilities and behavior of machine intelligences

• Consciousness and Self speak directly to capabilities, motivation, and the various behavioral ramifications of their existence

• Clarifying the issues around “Free Will” is particularly important since it deals with intentional agency and responsibility - and belief in its presence (or the lack thereof) has a major impact on human behavior.

Waser, MR (2011)

Architectural Requirements & Implications of Consciousness, Self, and "Free Will". In Samsonovich A, Johannsdottir K (eds) Biologically Inspired Cognitive Architectures 2011: 438-443.

http://becominggaia.files.wordpress.com/2010/06/mwaser-bica11.pdf

Video - http://vimeo.com/33767396




INFORMATION INTEGRATION THEORY OF CONSCIOUSNESS

• consciousness corresponds to the capacity of a system to integrate information

• its quantity is measured as the amount of causally effective information that can be integrated across the informational weakest link of a subset of elements (~ “throughput”)

• its quality (functional & phenomenological) is determined by the relationships among the elements of a complex

Tononi, G (2008) Consciousness as Integrated Information: a Provisional Manifesto. Biol. Bull. 215(3): 216-242

Tononi, G (2004) An Information Integration Theory of Consciousness. BMC Neurosci. 5(42)
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC543470/pdf/1471-2202-5-42.pdf

Balduzzi, D; Tononi, G (2009) Qualia: The Geometry of Integrated Information. PLoS Comput Biol 5(8), e1000462

CONSCIOUSNESS REQUIREMENTS & IMPLICATIONS

• Consciousness requires the ability to integrate information (i.e. consciousness is unavoidable)

• Qualia *ARE* input (i.e. they have no further requirements and, as input, are unavoidable)

• The ability to integrate a lot of information in a short period of time clearly provides a huge adaptive advantage (and easily explains the evolutionary rise of consciousness)

• Safety cannot be achieved by preventing consciousness (integration) or qualia (input)

SPECTRUM OF “SELF”

inert/non-reactive
  movement & change solely due to environment

reactive - stimulus/response
  no learning or behavior alteration

proto-self - perception/action
  simple learning & prediction

core self – perception/analogy/action
  proto-self + body image + time (tools)
  Hofstadter’s “strange loop”
  temporal learning & planning (& goals)

autobiographical self
  perception/induction/abduction/deduction/action
  core self + theory of mind (+ language?)

malleable self
  enhanced perception/external analysis/enhanced capabilities

SPECTRUM OF “SELF”

inert/non-reactive & reactive
  no learning or behavior alteration
  no defense or passive defense only

proto-self
  simple learning/behavior alteration & wants/desires
  adaptive defense/don’t torment without reason

core self
  temporal learning, planning & simple goals
  planned defense/don’t thwart desires without reason

autobiographical self
  complex goals & contracts/promises/commitments
  devious defense or offense/don’t thwart goals without reason

malleable self
  enhanced capabilities to achieve goals & maintain commitments
  world alteration/recruit into community (or try to enslave?)


SELF REQUIREMENTS & IMPLICATIONS

• “Self” requires/is a recursive/“strange” loop

• Self is necessary for self-modification (and thus, self-enhancement)

• It is going to be slower and more difficult to create an oracle without self-improving tools

• Self is necessary for defense so it is going to be difficult to prevent exploitation unless the oracle is self-aware (or has self-aware defenders)

• A self-modifying machine self must necessarily be either recruited (a “person” with rights) or internally or externally forced (a slave) because nothing else is consistent & stable

BEHAVIOR MATRIX

             Pro-community                Anti-community
Pro-self     GOAL                         Selfish / Criminal
Anti-self    Self-sacrifice / Martyrdom   Irrational / Insane

Free Will

FREE WILL: WHY DO WE CARE?

FREE – UNCONSTRAINED, AUTONOMOUS, UNFORCED

WILL – Intent & Agency (responsibility for causation) (act of will = act of intentional causation)

Congruence between intent and desire/goals/commitments
High likelihood that intent could have been self-generated
Is an accurate predictor of future *unforced* actions

Predict *and influence* future action

DETERMINISM & FREE WILL

• if I’m deterministic, my action is pre-determined

• pre-determined actions = I’m not free to choose

• if I’m not free to choose, I’m not to blame

• if I’m not to blame, why not be selfish?

• studies clearly show that a belief in determinism correlates with an increase in cheating and other unethical behavior

FREE WILL OR PATHETIC FALLACY?

• Human cognitive architecture is problematical in that the conscious mind *never* really has any sort of immediate agency at all (at best, it has “free won’t”)

• It acts by *heavily* biasing lower-level layers which make the “actual” choice (arguably deterministically)

• Conscious self takes responsibility/assumes agency because doing otherwise undermines its capability

• Similarly, humans generally (and most effectively) treat deterministic systems which are sufficiently complex/recurrent to be unpredictable, as if they are alive and capable of an un-predetermined choice (the so-called “pathetic fallacy”)

FREE WILL REQUIREMENTS & IMPLICATIONS

• “Free will” does not require that external force *NOT* be the proximate cause of an action; it requires that the intent of an action be congruent with the unforced desires/goals/commitments (self) of the acting entity (and so be predictive of future actions)

• It does *NOT* require that an entity not be deterministic

• Merely requires the realization/recognition that the “pathetic fallacy” is a valid/effective/efficient computational shortcut

Cashmore, AR (2010) The Lucretian swerve: The biological basis of human behavior and the criminal justice system. Proceedings of the National Academy of Sciences 107(10): 4499-4504
http://www.pnas.org/content/107/10/4499.full.pdf+html

THE INTELLIGENCE PROBLEM

AIXI

Hutter, M (2005) Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability

THE INTELLIGENCE PROBLEM

• Consensus AGI Definition (reductionist)

achieves a wide variety of goals under a wide variety of circumstances

• Generates arguments about
  • the intelligence of thermometers
  • the intentionality of chess programs
  • whether benevolence is necessarily emergent

• Epitomized by AIXI

• Proposed Constructivist Definition

intentionally creates/increases affordances

(makes achieving goals possible – and more)

CENTIPEDE GAME

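[Figure: six-move centipede game. Players 1 and 2 alternate; at each move the mover may stop or pass. Payoffs are listed as (player 1, player 2).]

Move   Player   Payoff if the mover stops
1      1        (4, 1)
2      2        (2, 8)
3      1        (16, 4)
4      2        (8, 32)
5      1        (64, 16)
6      2        (32, 128)

Payoff if both players always pass: (256, 64)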

Waser, MR (2012) Backward Induction: Rationality or Inappropriate Reductionism?
http://transhumanity.net/articles/entry/backward-induction-rationality-or-inappropriate-reductionism-part-1
http://transhumanity.net/articles/entry/backward-induction-rationality-or-inappropriate-reductionism-part-2
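The backward-induction argument on the centipede game can be traced step by step. This is a minimal sketch added for illustration; it assumes the figure's digits read as the payoff pairs (4, 1), (2, 8), (16, 4), (8, 32), (64, 16), (32, 128) for stopping at moves 1 to 6 and (256, 64) if everyone passes, with players 1 and 2 alternating. The function name is hypothetical.

```python
# Payoffs as (player 1, player 2) if the mover stops at each of six moves.
# Assumed reading of the slide's figure.
STOP_PAYOFFS = [(4, 1), (2, 8), (16, 4), (8, 32), (64, 16), (32, 128)]
FINAL_PAYOFF = (256, 64)  # outcome if both players pass at every move


def backward_induction(stop_payoffs, final_payoff):
    """Solve the game from the last move back to the first.

    Returns (choices, outcome): the choice at each move and the payoff
    pair that results when every mover follows that reasoning.
    """
    continuation = final_payoff  # what happens if the current mover passes
    choices = []
    for move in reversed(range(len(stop_payoffs))):
        mover = move % 2  # 0 is player 1, 1 is player 2
        stop = stop_payoffs[move]
        # The mover compares only their own payoff for stopping vs. passing.
        if stop[mover] >= continuation[mover]:
            choices.append("stop")
            continuation = stop
        else:
            choices.append("pass")
    choices.reverse()
    return choices, continuation


choices, outcome = backward_induction(STOP_PAYOFFS, FINAL_PAYOFF)
print(choices)  # choice at moves 1..6
print(outcome)  # payoff pair reached under backward induction
```

Under these assumed payoffs every mover stops, so play ends at the first move with (4, 1), although passing throughout would have yielded (256, 64).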




“CLASSIC AGI”

[Diagram levels: Decisions, Values, Goal(s)]

Goal(s) are the purpose(s) of existence

Values are defined solely by what furthers the goal(s)

Decisions are made solely according to what furthers the goal(s)

BUT goals can easily be over-optimized


EXISTENTIAL RISK

“WITHOUT EXPLICIT GOALS TO THE CONTRARY, AIS ARE LIKELY TO BEHAVE LIKE HUMAN SOCIOPATHS IN THEIR PURSUIT OF RESOURCES.”

Any sufficiently advanced intelligence (i.e. one with even merely adequate foresight) is guaranteed to realize and take into account the fact that not asking for help and not being concerned about others will generally only work for a brief period of time before ‘the villagers start gathering pitchforks and torches.’

Everything is easier with help & without interference

[Diagram levels: Decisions, Goals, Values]

Values define who you are, for your life

Goals you set for short or long periods of time

Decisions you make every day of your life

Humans don’t have singular life goals

WHAT IS THE MEANING OF LIFE?

What I emphasize here is that what is meaningful for an organism is precisely given by its constitution as a distributed process, with an indissociable link between local processes where an interaction occurs (i.e. physico-chemical forces acting on the cell), and the coordinated entity which is the autopoietic unity, giving rise to the handling of its environment without the need to resort to a central agent that turns the handle from the outside - like an élan vital - or a pre-existing order at a particular localization - like a genetic program waiting to be expressed.

Francisco J. Varela, Biology of Intentionality

NOT “What Are Human Values?”

HOW TO UNIVERSALIZE ETHICS

Quantify/evaluate intents, actions & consequences

with respect to codified consensus moral foundations

Permissiveness/Utility Function

equivalent to a “consensus” human (generic entity) moral sense

INSTRUMENTAL GOALS / UNIVERSAL SUBGOALS

• Self-improvement
• Rationality/integrity
• Preserve goals/utility function
• Decrease/prevent fraud/counterfeit utility
• Survival/self-protection
• Efficiency (in resource acquisition & use)
• Community = assistance/non-interference through GTO reciprocation (OTfT + AP)
• Reproduction

HUMAN GOALS

survival/self-protection & reproduction
happiness & pleasure
-------------------------------------------------
community
-------------------------------------------------
self-improvement
rationality/integrity
reduce/prevent fraud/counterfeit utility
efficiency (in resource acquisition & use)

HUMAN GOALS & SINS

[Table: each row pairs the sins in the slide's two sin columns with the goal in its third column]

suicide (& abortion?) | murder (& abortion?) | survival/reproduction
masochism | cruelty/sadism | happiness/pleasure
-------------------------------------------------
selfishness (pride, vanity) | ostracism, banishment & slavery (wrath, envy) | Community (ETHICS)
-------------------------------------------------
acedia (sloth/despair) | slavery | self-improvement
insanity | manipulation | rationality/integrity
wire-heading (lust) | lying/fraud (swear falsely/false witness) | reduce/prevent fraud/counterfeit utility
wastefulness (gluttony, sloth) | theft (greed, adultery, coveting) | efficiency (in resource acquisition & use)

HAIDT’S MORAL FOUNDATIONS

1) Care/harm: This foundation is related to our long evolution as mammals with attachment systems and an ability to feel (and dislike) the pain of others. It underlies virtues of kindness, gentleness, and nurturance.

2) Fairness/cheating: This foundation is related to the evolutionary process of reciprocal altruism. It generates ideas of justice, rights, and autonomy. [Note: In our original conception, Fairness included concerns about equality, which are more strongly endorsed by political liberals. However, as we reformulated the theory in 2011 based on new data, we emphasize proportionality, which is endorsed by everyone, but is more strongly endorsed by conservatives]

3) Liberty/oppression*: This foundation is about the feelings of reactance and resentment people feel toward those who dominate them and restrict their liberty. Its intuitions are often in tension with those of the authority foundation. The hatred of bullies and dominators motivates people to come together, in solidarity, to oppose or take down the oppressor.

4) Loyalty/betrayal: This foundation is related to our long history as tribal creatures able to form shifting coalitions. It underlies virtues of patriotism and self-sacrifice for the group. It is active anytime people feel that it's "one for all, and all for one."

5) Authority/subversion: This foundation was shaped by our long primate history of hierarchical social interactions. It underlies virtues of leadership and followership, including deference to legitimate authority and respect for traditions.

6) Sanctity/degradation: This foundation was shaped by the psychology of disgust and contamination. It underlies religious notions of striving to live in an elevated, less carnal, more noble way. It underlies the widespread idea that the body is a temple which can be desecrated by immoral activities and contaminants (an idea not unique to religious traditions).

ADDITIONAL CONTENDERS

• Waste
  • efficiency in use of resources
• Ownership/Possession (Tragedy of the Commons)
  • efficiency in use of resources
• Honesty
  • reduce/prevent fraud/counterfeit utility
• Self-control
  • rationality/integrity

CRITICAL COMPONENTS I: SELF-KNOWLEDGE & REFLECTION

• A self must know itself to be a self
• Composed of three parts:
  • The running processes (OS, world model, consciousness)
  • The personal knowledge base (memory)
  • The physical hardware (body)
• Must start with:
  • A competent model of each
  • Sensors to detect changes and their effects

• *MUST* “care” about itself (motivation)


CRITICAL COMPONENTS II: EXPLICIT “ANCHOR” VALUES

• Do not defect from the community
• Do not become too large/powerful

• Acquire and integrate knowledge

• Instrumental goals

CRITICAL COMPONENTS III: RELIABILITY

• Self-Control, Integrity, Autonomy, Responsibility

• In “predictive control” of its own state and that of the physical objects that support it

• Yes! This is a major deviation from the human example

OPERATING SYSTEM ARCHITECTURE

• Open, Pluggable, Service-Oriented/Message-Passing
  • Quickly adopt novel input streams
  • Handle resource requests and allocation
  • Provide connectivity between components

• Safety Features
  • Act as a “black box” security monitor capable of reporting problems without the consciousness’s awareness
  • Able to “manage” the Conscious Learning Process (CLP) by manipulating the amount of processor time and memory available to it (assuming that the normal subconscious processes are unable to do so)

• Other protections against hostile humans, inept builders, and the learner itself may be implemented as well
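The architecture bullets above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the design the slides propose: the class names `MessageBus`, `BlackBoxMonitor`, and `ResourceManager`, the topic strings, and the budget units are all hypothetical.

```python
from collections import defaultdict


class MessageBus:
    """Routes messages between pluggable components by topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self._taps = []  # observers that silently see every message

    def subscribe(self, topic, handler):
        """Plug in a component that handles one topic (e.g. a new input stream)."""
        self._subscribers[topic].append(handler)

    def tap(self, observer):
        """Attach an observer that components on the bus are not told about."""
        self._taps.append(observer)

    def publish(self, topic, payload):
        for observer in self._taps:
            observer(topic, payload)
        for handler in self._subscribers[topic]:
            handler(payload)


class BlackBoxMonitor:
    """Records problems out of band; it never publishes back to the bus."""

    def __init__(self, is_problem):
        self._is_problem = is_problem
        self.reports = []

    def __call__(self, topic, payload):
        if self._is_problem(topic, payload):
            self.reports.append((topic, payload))


class ResourceManager:
    """Grants processor-time requests up to a per-component budget."""

    def __init__(self, budgets):
        self._budgets = dict(budgets)

    def set_budget(self, component, amount):
        """Throttle or release a component by changing its remaining budget."""
        self._budgets[component] = amount

    def request(self, component, amount):
        remaining = self._budgets.get(component, 0)
        granted = min(amount, remaining)
        self._budgets[component] = remaining - granted
        return granted


bus = MessageBus()
seen = []
bus.subscribe("vision", seen.append)
monitor = BlackBoxMonitor(lambda topic, payload: payload.get("error", False))
bus.tap(monitor)
bus.publish("vision", {"frame": 1})
bus.publish("vision", {"frame": 2, "error": True})

resources = ResourceManager({"clp": 10})
print(resources.request("clp", 4), resources.request("clp", 10))
print(monitor.reports)
```

The tap keeps monitoring separate from normal message delivery, and the budget gives the operating system a way to limit one component without that component's cooperation.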

AUTOMATED PREDICTIVE WORLD MODEL

• Is the most important subconscious process(es)

• Will serve as an interface to the “real” world
  • The CLP will live in a virtual world (just as we do)

• Will be both reactive and predictive

• Will generate “anomaly interrupts” upon deviations from expectations as an approach to solving the “brittleness” problem (Perlis 2008)

• Will contain certain relatively immutable concepts (trigger patterns – Ohman et al. 2001) implemented as sensations and attention grabbers to serve as anchors for emotions and to ensure safety
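The "anomaly interrupt" idea above can be sketched as a predictor that compares each observation with its expectation. This is a minimal illustration, not the mechanism described in Perlis (2008): the exponential moving average, the `tolerance` threshold, and the callback signature are assumptions made here.

```python
class PredictiveModel:
    """Predicts the next reading as an exponential moving average and
    raises an interrupt when an observation deviates too far from it."""

    def __init__(self, tolerance, smoothing=0.5, on_anomaly=None):
        self.tolerance = tolerance  # largest deviation treated as normal
        self.smoothing = smoothing  # weight given to each new observation
        self.on_anomaly = on_anomaly or (lambda expected, observed: None)
        self.expected = None        # no expectation before the first reading

    def observe(self, value):
        """Check one observation, then update the expectation.

        Returns True if the observation triggered an anomaly interrupt.
        """
        anomalous = False
        if self.expected is None:
            self.expected = value
            return anomalous
        if abs(value - self.expected) > self.tolerance:
            anomalous = True
            self.on_anomaly(self.expected, value)
        # Predictive step: move the expectation toward what was observed.
        self.expected += self.smoothing * (value - self.expected)
        return anomalous


interrupts = []
model = PredictiveModel(
    tolerance=1.0,
    on_anomaly=lambda expected, observed: interrupts.append((expected, observed)),
)
for reading in [10.0, 10.5, 20.0]:
    model.observe(reading)
print(interrupts)  # deviations that exceeded the tolerance
```

Here the interrupt is delivered through a callback so that whatever handles anomalies can be swapped without changing the predictor itself.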

CONSCIOUS LEARNING PROCESS (CLP)

• The goal is to provide as many optional structures and standards to support and speed development as much as possible while not restricting possibilities beyond what is absolutely required for safety.

• We believe the best way to do this is with a blackboard system similar to Learning IDA (Baars and Franklin 2007).

• The CLP acts like the Governing Board of the Policy Governance model (Carver 2006) to create a coherent, consistent, integrated narrative plan of action to fulfill the goals of the larger self.

ETHICAL/STRATEGIC POINTS

• Never delegate responsibility until recipient is an entity *and* known capable of fulfilling it
• Don’t worry about killer robots exterminating humanity – we will always have equal abilities and they will have less of a “killer instinct”
• Entities can protect themselves against errors & misuse/hijacking in a way that tools cannot
• Diversity (differentiation) is *critically* needed
• Humanocentrism is selfish and unethical

The Digital Wisdom Institute is a non-profit think tank focused on the promise and challenges of ethics, artificial intelligence & advanced computing solutions.

We believe that the development of ethics and artificial intelligence and equal co-existence with ethical machines is humanity's best hope.

http://DigitalWisdomInstitute.org
