
NATURAL AND ARTIFICIAL INTELLIGENCE:

MEASURES, MAPS AND TAXONOMIES

José Hernández-Orallo, Universitat Politècnica de València, Valencia (www.upv.es)

Leverhulme Centre for the Future of Intelligence, Cambridge (lcfi.ac.uk)

Clare Hall, Cambridge, 1 August 2018


A COPERNICAN REVOLUTION!

Places humans, non-human animals and AI in a wider landscape:

Still strong inertias: “human-level”, “rat-level”, cladistic taxonomies, etc.


Human Intelligence

Natural Intelligence

Artificial Intelligence

Intelligence Landscape

AI: WHAT CAN THEY DO?



Play board games

Win TV quizzes? Paint?

See faces?

Play videogames?

Don’t look at the breakthroughs!

HOW MUCH IS AI PROGRESSING?

AI index



Tegmark’s “Life 3.0”

WHAT JOBS ARE MORE LIKELY TO BE REPLACED?



White-collar jobs were in danger!

“Most fears of automation are misplaced. As the new generation of intelligent devices

appears, it will be the stock analysts and petrochemical engineers and parole board

members who are in danger of being replaced by machines. The gardeners, receptionists,

and cooks are secure in their jobs for decades to come” (Steven Pinker “The Language

Instinct”, 1995).

Risk of automation (Frey and Osborne, “The Future of Employment”, 2017):

“Financial Analysts” (0.23), “Chemical engineers” (0.017), “judges, magistrate judges, and

magistrates” (0.4).

“Landscaping and groundskeeping workers” (0.95), “receptionists” (0.96), “cooks” (0.96)

Who’s right?

IS SUPERINTELLIGENCE NEAR?

https://futureoflife.org/superintelligence-survey/



Does superintelligence lead to the singularity?

“Let an ultraintelligent machine be defined as a machine that can far surpass all the

intellectual activities of any man however clever. Since the design of machines is one of

these intellectual activities, an ultraintelligent machine could design even better

machines; there would then unquestionably be an ‘intelligence explosion’.” (Good 1965).

Not everyone agrees (Will it happen? What will be the consequences?):

Do we really know what “superintelligence” is and how it can grow?


WILL AI BE BENEFICIAL?

Asilomar Conference 2017: Beneficial AI 2017

Principles: https://futureoflife.org/ai-principles/

Safety issues, privacy, inclusion, global standards, competition, etc.




HOW TO PREPARE FOR ALL THIS?



Better understanding of the intelligence landscape:

Measures: We need measures to compare artificial and natural intelligence, to

determine the pace of progress in AI and to link the capability and generality of

new intelligent systems with the resources they may require.

Maps: We need topological ways to locate different kinds of intelligence

(artificial, natural or hybrid), to spot those areas that may be unsafe or

unethical, and to determine the trajectories we want to pursue.

Taxonomies: We need to understand the diversity of natural and artificial

cognition, and cluster all these entities according to meaningful behavioural

traits, instead of phylogenetic approaches or machine learning architectures.

Almost everything to be done: three great scientific opportunities

HOW TO PREPARE FOR ALL THIS?



[Figure; labels: “small”, “big”]

MEASUREMENT. WHY IS IT SO IMPORTANT?



“Measure and Evaluate AI Technologies through Standards and Benchmarks”

Strategy 6 (of 7) in the U.S. National AI Research and Development Plan (2016).

“Greatest accuracy, at the frontiers of science, requires greatest effort, and probably

the most expensive or complicated of measurement instruments and procedures”

(David Hand, “Measurement: A Very Short Introduction”, OUP, 2004).

“Public authorities must act in order to develop and implement standards, tests and

measurement methods [for] AI technology”

Villani report (French AI Strategy, 2018).

MEASURING ARTIFICIAL INTELLIGENCE

Specific (task-oriented) AI systems



Machine translation, information retrieval, summarisation

PR: computer vision, speech recognition, etc.

Robotic navigation

Driverless vehicles

Prediction and estimation

Planning and scheduling

Automated deduction

Knowledge-based assistants

Game playing

Warning! Intelligence NOT included.

All images from wikicommons

MEASURING ARTIFICIAL INTELLIGENCE

Specific domain evaluation settings (name: kind of evaluation, link):

CADE ATP System Competition: PROBLEM BENCHMARKS (http://www.cs.miami.edu/~tptp/CASC/J4/)

Termination Competition: PROBLEM BENCHMARKS (http://termination-portal.org/wiki/Termination_Competition_2014)

The reinforcement learning competition: PROBLEM BENCHMARKS (http://www.rl-competition.org/)

Program synthesis (Syntax-guided synthesis): PROBLEM BENCHMARKS (http://www.sygus.org/)

Loebner Prize: HUMAN DISCRIMINATION (http://www.loebner.net/Prizef/loebner-prize.html)

Robocup and FIRA (robot football/soccer): PEER CONFRONTATION (http://www.robocup.org/, http://www.fira.net/)

International Aerial Robotics Competition (pilotless aircraft): PROBLEM BENCHMARKS (http://www.aerialroboticscompetition.org/)

DARPA driverless cars, Cyber Grand Challenge, Rescue Robotics: PROBLEM BENCHMARKS (http://archive.darpa.mil/grandchallenge/, http://www.darpa.mil/cybergrandchallenge/, http://www.theroboticschallenge.org/)

The planning competition: PROBLEM BENCHMARKS (http://ipc.icaps-conference.org/)

General game playing AAAI competition: PEER CONFRONTATION (http://games.stanford.edu/)

BotPrize (videogame player) contest: HUMAN DISCRIMINATION (http://www.botprize.org/)

World Computer Chess Championship: PEER CONFRONTATION (http://www.icga.org/)

Computer Olympiad: PEER CONFRONTATION (http://www.icga.org/)

Annual Computer Poker Competition: PEER CONFRONTATION (http://www.computerpokercompetition.org/)

Trading agent competition: PEER CONFRONTATION (http://tradingagents.org/organisation/)

Robo Chat Challenge: HUMAN DISCRIMINATION (http://www.robochatchallenge.com/)

UCI repository, PRTools, or KEEL dataset repository: PROBLEM BENCHMARKS (http://archive.ics.uci.edu/ml/, http://prtools.org/, http://sci2s.ugr.es/keel/datasets.php)

KDD-cup challenges and ML kaggle competitions: PROBLEM BENCHMARKS (http://www.sigkdd.org/kddcup/index.php, https://www.kaggle.com/)

Machine translation corpora (Europarl, SE times corpus, the euromatrix, Tenjinno competitions…): PROBLEM BENCHMARKS (http://www.statmt.org/europarl/, http://www.statmt.org/setimes/, http://matrix.statmt.org/matrix/info)

NLP corpora (linguistic data consortium, …): PROBLEM BENCHMARKS (https://www.ldc.upenn.edu/new-corpora)

Warlight AI Challenge: PEER CONFRONTATION

The Arcade Learning Environment: PROBLEM BENCHMARKS (http://www.arcadelearningenvironment.org/)

Pathfinding benchmarks (gridworld domains): PROBLEM BENCHMARKS (http://www.movingai.com/benchmarks/)

Genetic programming benchmarks: PROBLEM BENCHMARKS (http://gpbenchmarks.org/)

CAPTCHAs: HUMAN DISCRIMINATION (http://www.captcha.net/)

Graphics Turing Test: HUMAN DISCRIMINATION (http://arxiv.org/abs/cs/0603132v1)

FIRA HuroCup humanoid robot competitions: PROBLEM BENCHMARKS (http://www.fira.net/contents/sub03/sub03_1.asp)

…

MEASURING (ARTIFICIAL) INTELLIGENCE

How to evaluate general-purpose systems and cognitive components?



Cognitive robots

Intelligent assistants

Pets, animats and other artificial companions

Smart environments

Agents, avatars, chatbots

Web-bots, Smartbots, Security bots…

Warning! Some intelligence MAY BE included.


AI-completeness benchmarks: science exams, commonsense reasoning.

The “Mythical” Turing Test: and a myriad of variants...

New evaluation platforms: videogames, naïve physics, etc.

Psychometric tests: IQ tests, developmental tests, …

Comparative cognition (animal) tests: Morgan’s canon?




Adapting tests between disciplines (AI, psychometrics, comparative psychology) is problematic:

A test designed for one group is only valid and reliable for the original group.

Passing it may be not necessary and/or not sufficient for the ability.

Machines and hybrids represent a new population.

Nowadays, many benchmarks assume that AI will use deep learning with millions of examples.

But machines and hybrids are also an opportunity to understand how to evaluate cognitive tasks and cognitive abilities.

However, we need a different foundation.

MEASURING INTELLIGENCE

From anthropocentrism: “Man is the measure of all things” (Protagoras, 5th century BCE).

Or even from biocentrism: [intellectual faculties] “have been perfected or advanced through natural selection” (Darwin, 1871, p. 128).

To a more principled approach: “The Measure of All Minds: Evaluating Natural and Artificial Intelligence”, Cambridge University Press, 2017. http://www.allminds.org

MAPS: THE ATLAS OF INTELLIGENCE

Can we represent the state and course of cognition graphically,

including regions and trajectories according to different dimensions?

CFI initiative: The Atlas of Intelligence, a collection of maps.

Main goal: Map a relevant portion of the actual and future landscape of cognition through an atlas of intelligence, collecting and exhibiting information about all kinds of intelligence, including humans, non-human animals, AI systems, hybrids and collectives thereof.

MAPS: FIGURATIVE OR ACTUAL DATA?



Left: Figurative “human-likeness” vs consciousness (from Shanahan 2016). Right: Two dimensions of cognitive skills (social vs physical domain) according to the results of a test battery on three different groups of apes (adapted from Herrmann et al. 2007).

MAPS: INCLUDING HUMAN, NON-HUMAN ANIMALS AND AI?



Comparison between humans, monkeys and a DNN for visual object recognition according to several psychophysical features (Rajalingham et al. 2018).

MAPS: FROM DATA TO SPATIAL REPRESENTATIONS



Comparison of data from (Schaie 1996) for 10 cohorts ranging from 1903 to 1966 for two scores: inductive reasoning and numeric ability. “Aging” shows the effect of age, from 25 to 88 years. “Flynn” shows the results for new generations. Left: With time on the x-axis we see how each score evolves, but we do not really see that the two trajectories are opposite. Right: With the two scores as the two dimensions, we can plot a real trajectory, where “Aging” goes down and to the left, while “Flynn” initially goes in the opposite direction and then stops diverging on numeric ability.
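The idea of reading trajectories in score space rather than over time can be sketched in a few lines of Python. This is only an illustration: the numbers below are invented (they are NOT Schaie's data), and the helper names `displacement` and `cosine` are hypothetical. Each point is a pair (inductive reasoning, numeric ability), and the cosine between the overall displacement vectors tells us whether two trajectories point in roughly opposite directions.

```python
import math

# Invented scores for illustration only (NOT Schaie's data).
# Each point is (inductive reasoning, numeric ability).
aging = [(55, 54), (52, 52), (48, 49), (44, 45)]  # one cohort as it gets older
flynn = [(44, 50), (48, 52), (52, 53), (55, 53)]  # successive generations

def displacement(trajectory):
    """Vector from the first to the last point of a trajectory."""
    (x0, y0), (x1, y1) = trajectory[0], trajectory[-1]
    return (x1 - x0, y1 - y0)

def cosine(u, v):
    """Cosine of the angle between two 2D vectors (-1 means opposite)."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(*u) * math.hypot(*v))

aging_dir = displacement(aging)
flynn_dir = displacement(flynn)
similarity = cosine(aging_dir, flynn_dir)

print("Aging displacement:", aging_dir)
print("Flynn displacement:", flynn_dir)
print("Cosine similarity:", round(similarity, 2))
```

A strongly negative cosine is the numerical counterpart of what the right-hand plot shows visually: two trajectories moving in opposite directions across the same map, something that per-score time series do not make obvious.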

MAPS: MORE TRAJECTORIES



Multidimensional utility space for Alpha* for Go (left) and several AI techniques for Atari games (right). Research gradient

evolution from 2013 to 2018 is represented as a trajectory with a segmented grey arrow (Martínez-Plumed et al. 2018b).

TAXONOMIES: FROM OBSERVABLE TO NON-OBSERVABLE

Linnaeus consolidated the binomial nomenclature:

Made it possible to classify and catalogue natural systems.

Physical phenotypical traits dominated the taxonomy.

Today, superseded by the phylogenetic nomenclature:

Genotype (DNA) dominates the taxonomy.

TAXONOMIES: BACK TO OBSERVABLES

We need behavioural taxonomies for the intelligence landscape!

What about convergent evolution (similar behavioural traits: perception, general and social intelligence)?

What about artificial, hybrid and collective behaviours not governed by evolution?

Left: Scala naturae, as depicted in the 16th century (de Valades, 1579). Middle: a representation of Dennett's Tower of Generate and Test, which depicts creatures according to when and how they adapt (Dennett, 1995). Right: Godfrey-Smith's refinement of the bottom part of Dennett's tower (the part corresponding to cognitive evolution) in the form of a tree (Godfrey-Smith, 2015, Fig. 2).

TAXONOMIES: MANY POSSIBILITIES

(Numeric) taxonomies can be built from features and measurements (e.g., phenetics), but can also be built from similarity metrics.

They can be derived from multidimensional maps through clustering, but maps can be derived from similarity metrics as well.

Left: Dendrogram of different machine learning families (for supervised problems) according to their behavioural

similarity (data from Fabra-Boluda et al. 2017, 2018). Right: corresponding MDS representation.
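As a rough sketch of how a taxonomy can be built from a similarity metric alone, the following Python code runs single-linkage agglomerative clustering on a small dissimilarity matrix and records the merges that a dendrogram would draw. Everything here is assumed for illustration: the four entities, their pairwise dissimilarities and the function name `single_linkage` are hypothetical, and single linkage is just one possible choice, not necessarily the method used for the figure.

```python
# Hypothetical entities and pairwise behavioural dissimilarities (invented values).
names = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.85],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.85, 0.2, 0.0],
]

def single_linkage(names, dist):
    """Return the list of merges (cluster_a, cluster_b, height), lowest first."""
    index = {name: i for i, name in enumerate(names)}
    clusters = [(name,) for name in names]
    merges = []

    def linkage(a, b):
        # Single linkage: distance between the two closest members.
        return min(dist[index[x]][index[y]] for x in a for y in b)

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        height, i, j = min(
            (linkage(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

merges = single_linkage(names, dist)
for left, right, height in merges:
    print(left, "+", right, "at height", height)
```

Cutting the resulting merge list at a chosen height yields the groups of the taxonomy. The same dissimilarity matrix could instead be fed to a multidimensional scaling routine to obtain a map, which is the duality between taxonomies and maps noted above.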

CONCLUSIONS

To understand AI and its future, we need to understand intelligence in all its varieties and forms, with natural intelligence, and especially human intelligence, as a special case.

A variability and diversity of phenomena that can be understood through:

Measures: metrics and instruments.

Maps: projections, aggregations and representations.

Taxonomies: categories and groups.

The right dimensions and structure of intelligence are not known yet, but this exercise will help us progress towards that understanding.



A new age of discovery:

metrologists, cartographers and taxonomists wanted!
