HyperMembrane Structures for Open Source Cognitive Computing

Japanese Agency for Science and Technology, Tokyo, Japan, 3 March, 2015
Jack Park

© 2015 TopicQuests Foundation, CC by SA 4.0


The Present Situation

Upon this gifted age, in its dark hour,
Rains from the sky a meteoric shower
Of facts . . . they lie unquestioned, uncombined.
Wisdom enough to leech us of our ill
Is daily spun; but there exists no loom
To weave it into fabric

Edna St. Vincent Millay, 1939


Topics To Cover

• Discovery, learning, problem solving

• Topic Maps

• OpenSherlock

• HyperMembranes

• Open Source

• Key reasons for building open source cognitive systems


Cognitive Computing: My View

• Cognitive Computing is:

– Far less about what a computer knows

– Far more about how computers can augment human cognitive capabilities

– Based on the augmentation work of J.C.R. Licklider and Douglas Engelbart

[Images: J.C.R. Licklider and Douglas Engelbart. Source: Wikipedia]

A Domain-specific Problem Statement

• An Example:

– Do these two sentences say the same thing?

• CO2 is a causal factor in climate change.

• Climate change is caused by carbon dioxide.

• Problem Statement

– Software agents need elegant methods for reading, representing, organizing, and modeling information resources to support discovery and answering questions.


A Framing Thought

• From [1]

– The understanding of global brain organization and its large-scale integration remains a challenge for modern neurosciences.

• To

– The understanding of global conversations about topics that matter and their large-scale federation remains a challenge for modern information technology.

[1] Petri G, Expert P, Turkheimer F, Carhart-Harris R, Nutt D, Hellyer PJ, Vaccarino F. (2014) Homological scaffolds of brain functional networks. J. R. Soc. Interface 11: 20140873.


Our Goals

• Improve Human-Tool Capabilities

• Augment existing analytic methods

– Increase opportunities for discovery

– Improve already sophisticated methods

• Build Looms

– Read documents

– Map and model topics read

– Weave information fabrics

[Image: Douglas Engelbart]

Discovery

• Is it really possible for people to see everything?

– Part of discovery is connecting dots not yet connected.

– “Cognitive Agents” can help increase chances of serendipity.

“Discovery consists of seeing what everybody has seen and thinking what nobody has thought.”

–Albert Szent-Györgyi


Related Work

• Commercial

– IBM Watson

– Wolfram Alpha

– Viv

– Saffron 10

– Clueda

– Siri

– Google Now

– Cortana

– …

• Open Source

– OAQA

– DeepDive

– OpenCog

– OpenNARS

– Watsonsim

– YodaQA

– AKSW OpenQA

– AKSW QA

– AquaLog

– OpenSherlock

– OpenIRIS (CALO)

– …

• Research

– Project Aristo

– Project Halo

– FREyA

– CASIA

– NLP-Reduce

– EIS Sina

– WDAqua ITM

– Intui2

– …


Biologically Inspired Design

• Humans are blessed with:

– Memory to keep concepts organized and connected

– Internal mechanisms which map sensor data into memory for processing and storage

– The abilities of complex, adaptive, anticipatory systems


Memory: Introducing Topic Maps

• A Topic Map is like a library without all the books*

– A Topic Map is indexical

  • Like a card catalog

    – Each topic has its own representation

  • Improving on a card catalog, a topic can be identified many different ways

  • Captures metadata and optionally content

– A Topic Map is relational

  • Like a good road map

    – Topics are connected by associations (relations)

    – Topics point to their occurrences in the territory

– A Topic Map is organized

  • Multiple records on the same topic are co-located (stored as one topic) in the map

*a map is not its territory
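
The three properties above (indexical, relational, organized) can be illustrated with a minimal sketch. It is not the OpenSherlock topic map; the class `TopicMap`, its methods, and the record layout are hypothetical names chosen for illustration. It assumes that two records describe the same topic whenever they share at least one identifier.

```python
class TopicMap:
    """Toy topic map: one record per subject, reachable by any of its identifiers."""

    def __init__(self):
        self._by_identifier = {}  # identifier -> shared topic record

    def add(self, identifiers, occurrence=None):
        """Add a record; merge it with any topic that shares an identifier."""
        topic = {"identifiers": set(identifiers), "occurrences": [], "associations": []}
        if occurrence is not None:
            topic["occurrences"].append(occurrence)
        merged = []
        for identifier in identifiers:
            other = self._by_identifier.get(identifier)
            if other is not None and not any(other is m for m in merged):
                merged.append(other)
                topic["identifiers"] |= other["identifiers"]
                topic["occurrences"] = other["occurrences"] + topic["occurrences"]
                topic["associations"] = other["associations"] + topic["associations"]
        # Every known identifier now points at the single merged record.
        for identifier in topic["identifiers"]:
            self._by_identifier[identifier] = topic
        return topic

    def get(self, identifier):
        return self._by_identifier.get(identifier)

    def associate(self, subject, relation, obj):
        """Record a typed association from one topic to another."""
        self.get(subject)["associations"].append((relation, obj))


tm = TopicMap()
tm.add(["CO2"], occurrence="doc-1")
tm.add(["carbon dioxide", "CO2"], occurrence="doc-2")  # merges with the CO2 topic
tm.add(["climate change"])
tm.associate("carbon dioxide", "cause", "climate change")
```

After these calls, looking up either "CO2" or "carbon dioxide" returns the same record, which holds both occurrences and the association.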


TopicMap Structure

• Topics as Actors

• Topics as Relations

• Topics as Types

• Topics as Biographies

Processing Mechanisms

• Typically, software processes take the form of variants of NLP (natural language processing)

– Parsers

– Cluster analysis

– Entity recognition

– Relation detection

– Role recognition

– Probabilistic methods


A Key Question in My Research

• Can a Topic Map learn (construct itself) by “reading” literature?

– Relevant issues:

  • Bootstrapping

  • Machine reading

    – NLP

    – Linguistics

    – Statistics

    – Analogy & Metaphor

    – …

  • Knowledge representation

  • Model building

    – Anticipation

  • Weaving information fabrics

  • Literature-based discovery

  • Deep Question Answering

A Simple Example

• Read this sentence:

– Gene expression is caused by insoluble hormones binding to a plasma membrane hormone receptor

• Topic Map recognizes:

– Gene expression → GeneExpression

– insoluble hormones → InsolubleHormone

– plasma membrane hormone receptor → PlasmaMembraneReceptor

• Software agents transform:

– is caused by → Cause

– binding to → Binds

• Final semantic structure:

– { {InsolubleHormone, Binds, PlasmaMembraneReceptor}, Cause, GeneExpression }
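
The nesting in the final structure can be made concrete with a short sketch. The `NTuple` type, the `render` helper, and the two lookup tables are illustrative assumptions that stand in for the topic map and the software agents; they are not OpenSherlock code.

```python
from typing import NamedTuple, Union


class NTuple(NamedTuple):
    """A triple whose subject or object may itself be a triple."""
    subject: Union[str, "NTuple"]
    predicate: str
    obj: Union[str, "NTuple"]


# Hypothetical lookup tables standing in for the topic map and the agents.
TOPICS = {
    "gene expression": "GeneExpression",
    "insoluble hormones": "InsolubleHormone",
    "plasma membrane hormone receptor": "PlasmaMembraneReceptor",
}
PREDICATES = {"is caused by": "Cause", "binding to": "Binds"}


def render(part):
    """Print a possibly nested tuple in the brace notation used on the slide."""
    if isinstance(part, NTuple):
        return "{ " + ", ".join(render(p) for p in part) + " }"
    return part


# Inner tuple: insoluble hormones binding to a plasma membrane hormone receptor.
binding = NTuple(
    TOPICS["insoluble hormones"],
    PREDICATES["binding to"],
    TOPICS["plasma membrane hormone receptor"],
)
# "X is caused by Y" is passive, so the cause (the binding) becomes the subject.
structure = NTuple(binding, PREDICATES["is caused by"], TOPICS["gene expression"])
print(render(structure))
```

The inner tuple is an ordinary value, so it can appear wherever a topic can, which is what lets a whole event (the binding) act as the cause of another topic.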


Introducing OpenSherlock

• OpenSherlock is:

– A Topic Map for information resource identity and organization

– A HyperMembrane information fabric structure

– A society of agents system which can

  • Read documents

  • Process information resources

    – Maintain the topic map

    – Maintain the HyperMembrane

    – Build and maintain models

    – Perform discovery tasks

    – Answer questions

– Agents are coordinated by:

  • A blackboard system

  • A dynamic task-based agenda

  • Event propagation and handling
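
One way to picture the coordination scheme is the sketch below: a shared blackboard, a priority-ordered agenda of tasks, and handlers that react to tasks and may post new ones. The slides do not describe how OpenSherlock implements this, so the class `Blackboard`, the task names, and the two agents are all hypothetical.

```python
import heapq
from collections import defaultdict


class Blackboard:
    """Shared workspace plus a priority agenda that dispatches tasks to agents."""

    def __init__(self):
        self.entries = []                   # results posted by agents
        self.agenda = []                    # heap of (priority, sequence, task, payload)
        self.handlers = defaultdict(list)   # task name -> agent callbacks
        self._sequence = 0                  # tie-breaker keeps equal priorities in order

    def subscribe(self, task, handler):
        self.handlers[task].append(handler)

    def post(self, task, payload, priority=5):
        heapq.heappush(self.agenda, (priority, self._sequence, task, payload))
        self._sequence += 1

    def run(self):
        # The agenda is dynamic: handlers may post new tasks while it drains.
        while self.agenda:
            _, _, task, payload = heapq.heappop(self.agenda)
            for handler in self.handlers[task]:
                handler(self, payload)


def reader_agent(board, document):
    """Splits a document and schedules one reading task per sentence."""
    for sentence in document.split(". "):
        board.post("read_sentence", sentence.strip(". "), priority=1)


def sentence_agent(board, sentence):
    """Stands in for real sentence processing; it only records the sentence."""
    board.entries.append(("sentence", sentence))


board = Blackboard()
board.subscribe("read_document", reader_agent)
board.subscribe("read_sentence", sentence_agent)
board.post(
    "read_document",
    "CO2 is a cause of climate change. Climate change is caused by carbon dioxide.",
)
board.run()
```

Lower numbers run first in this sketch, so an agent can raise the urgency of follow-up work simply by posting it with a smaller priority value.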


Observations 1

• A Topic Map is central to the key question, and therefore to a thesis entailed by this research

– It serves as a kind of memory for social processes

– It provides a robust platform for subject identity

– It can also serve as a repository for domain-specific vocabularies (ontologies, taxonomies, naming conventions,…)


Observations 2

• A Topic Map is necessary but not sufficient to support discovery, learning, or problem solving

– It really only provides a powerful indexical structure related to the key artifacts in any universe of discourse:

  • Actors

  • Their relations

  • Their states

  • Rules, laws, theories,…

• To model those key artifacts, other representation strategies are required

– Conceptual Graphs

– Qualitative Process Theory

– Belief Networks

– …

A Research Question

• What processes are available which, if performed while harvesting (reading) documents, can reduce the amount of processing required later during question answering?

– The question entails

• Synthesis of ontology

• Co-reference resolution

• Re-representation during question lifting

• …


A Working Hypothesis

• Process

– Build and maintain a content-addressable memory of questions, claims, arguments, and evidence fields.

• We call that a HyperMembrane

– Note:

• Every text object passed into the system is processed by the same algorithms

– Sentences harvested from text

– Questions and responses posed by humans


Key Concept: HyperMembrane

• HyperMembrane is a key concept in the working hypothesis that OpenSherlock seeks to explore and demonstrate

– A growing graph as a collection of woven and intersecting fabrics

  • constructed from normalized tuples (n-tuples) which are designed to reduce the amount of NLP required to read documents

  • such that intersections of fabrics occur where named entities in the graph of n-tuples are the same

– Inspired by Ted Nelson’s ZigZag Architecture


Machine Reading in OpenSherlock

• Goals:

– Grow the topic map

  • Topic Map then serves to support fabrication of higher-order knowledge structures

    – Conceptual Graphs

    – Belief Networks

    – QP Theory Models

    – HyperMembrane

    – …

• Process Loop:

– For a given document

  • For every paragraph in that document

    – For every sentence in each paragraph

      » Read the sentence
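
The process loop maps directly onto three nested iterations. The sketch below assumes that paragraphs are separated by blank lines and that sentences end in ".", "!" or "?"; both splitting rules and all function names are illustrative simplifications, not the segmentation OpenSherlock uses.

```python
import re


def paragraphs(document):
    """Split on blank lines (an assumed paragraph convention)."""
    return [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]


def sentences(paragraph):
    """Naive split after sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def read_document(document, read_sentence):
    """For every paragraph, for every sentence, call the sentence reader."""
    for p_index, paragraph in enumerate(paragraphs(document)):
        for s_index, sentence in enumerate(sentences(paragraph)):
            read_sentence((p_index, s_index), sentence)


seen = []
read_document(
    "A causes B. B binds C.\n\nC causes D.",
    lambda position, sentence: seen.append((position, sentence)),
)
```

Passing the (paragraph, sentence) position along with the text gives every sentence an identifier that later stages can attach to whatever they extract from it.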


Sentence Reading

• First Step:

– Process sentence into word grams*

• Second Step:

– Where possible

• Transform word grams into n-tuples**

• n-tuples form the HyperMembrane

* A container of words, from 1 to 8 words per container

** A container of symbols based on words in word grams
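
A word gram as defined in the footnote can be sketched as every contiguous run of 1 to 8 words in a sentence. Whether OpenSherlock generates all such runs or only a subset is not stated on the slide, so the exhaustive enumeration and the function name `word_grams` below are assumptions.

```python
def word_grams(sentence, max_words=8):
    """Return every contiguous run of 1..max_words words, shortest runs first."""
    words = sentence.rstrip(".?!").split()
    grams = []
    for size in range(1, min(max_words, len(words)) + 1):
        for start in range(len(words) - size + 1):
            grams.append(tuple(words[start:start + size]))
    return grams


grams = word_grams("CO2 causes climate change.")
```

For this four-word sentence the result contains ten grams, from the single words up to the whole sentence, including the pair ("climate", "change") that a topic map could recognize as one term.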


Process Sentence into WordGrams

• Approach

– Break sentence into word grams*

  • WordGram objects are shared across sentences

    – Count of sentence identifiers associated with each object serves as basis for probabilistic models

– Either

  • TopicMap recognizes terms

– Or

  • Sentence is parsed by Link-Grammar Parser**

  • TopicMap learns from parse results

* http://en.wikipedia.org/wiki/W-shingling

** http://www.link.cs.cmu.edu/link/
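
The idea of sharing WordGram objects across sentences can be sketched as an index from each gram to the set of sentence identifiers in which it occurs. The class `WordGramIndex`, the lowercasing, and the relative-frequency measure are assumptions made for illustration; the slides only say that the count of sentence identifiers is the basis for probabilistic models.

```python
from collections import defaultdict


class WordGramIndex:
    """Maps each word gram to the ids of the sentences that contain it."""

    def __init__(self):
        self._sentences = defaultdict(set)  # gram text -> sentence ids

    def add_sentence(self, sentence_id, sentence, max_words=8):
        words = sentence.lower().rstrip(".?!").split()
        for size in range(1, min(max_words, len(words)) + 1):
            for start in range(len(words) - size + 1):
                gram = " ".join(words[start:start + size])
                self._sentences[gram].add(sentence_id)

    def count(self, gram):
        """Number of distinct sentences in which the gram was seen."""
        return len(self._sentences.get(gram.lower(), ()))

    def relative_frequency(self, gram, total_sentences):
        """Share of sentences containing the gram: a simple probabilistic signal."""
        return self.count(gram) / total_sentences


index = WordGramIndex()
index.add_sentence("s1", "CO2 is a cause of climate change.")
index.add_sentence("s2", "Climate change is caused by carbon dioxide.")
```

Here "climate change" is shared by both sentences while "carbon dioxide" occurs in only one, so their counts are 2 and 1.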

Transform WordGrams to N-Tuples

• Normalized tuple (N-Tuple)

– A structure where the subject, predicate, and object are normalized

  • Nouns and verbs transformed

    – CO2, Carbon Dioxide, … → CO2

    – causes, is caused by, … → cause

  • Two sentence example

    – CO2 is a cause of climate change.

    – Climate change is caused by carbon dioxide.

    – Result:

      » { CO2, cause, climate change }

– Normalization processes include general and domain specific lenses

  • Rule-based interpreters which detect structures

    – Taxonomy

    – Causality

    – Biomedical

    – Geophysical

    – …

  • Process models

    – Built and maintained while reading

    – Predict while reading

    – Anticipatory Reading
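
The two-sentence example can be reproduced with a small rule-based sketch in the spirit of a causality lens. The term lexicon, the three regular-expression rules, and the function `normalize` are illustrative assumptions; the real lenses work on word grams and the topic map rather than on raw strings.

```python
import re

# Hypothetical lexicon mapping surface forms to one canonical term.
TERMS = {"co2": "CO2", "carbon dioxide": "CO2", "climate change": "climate change"}

# (pattern, normalized predicate, swap subject and object?)
# Passive forms name the effect first, so they need the swap.
PATTERNS = [
    (re.compile(r"^(.+?) is caused by (.+)$"), "cause", True),
    (re.compile(r"^(.+?) is a cause of (.+)$"), "cause", False),
    (re.compile(r"^(.+?) causes (.+)$"), "cause", False),
]


def normalize(sentence):
    """Return a (subject, predicate, object) tuple, or None if no rule applies."""
    text = sentence.strip().rstrip(".").lower()
    for pattern, predicate, swap in PATTERNS:
        match = pattern.match(text)
        if match:
            left, right = (TERMS.get(group, group) for group in match.groups())
            return (right, predicate, left) if swap else (left, predicate, right)
    return None


first = normalize("CO2 is a cause of climate change.")
second = normalize("Climate change is caused by carbon dioxide.")
```

Both sentences reduce to the same tuple, which is what allows a later question about CO2 and climate change to be matched without re-parsing either sentence.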


About N-Tuples

• An N-Tuple is a structured record of

– Topics in the topic map

– Those topics are harvested from text

• An N-Tuple takes the form:

– { Subject, Predicate, Object }

– Where

  • Subject and/or Object can be one of:

    – A topic from the topic map

    – Another N-Tuple

• An N-Tuple is identified by the identities of the terms it contains

– When thinking in terms of terms (words) read from documents, the identities (numeric representations) of those terms form the identity of the N-Tuple object.

  • N-Tuples are content addressable

• Disambiguation of subjects is a topic mapping process

– Learning means continuous refinement of subject identity

– Ambiguities can also be solved through human intervention
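
Content addressability can be sketched as follows: each term receives a numeric identity, and the identity of an N-Tuple is composed from the identities of its parts, so the same content always yields the same key. The slides do not specify the actual identity scheme, so the dotted key format and the class `TupleStore` are assumptions.

```python
class TupleStore:
    """Stores tuples under a key derived only from the ids of their terms."""

    def __init__(self):
        self._term_ids = {}  # term -> numeric identity
        self._tuples = {}    # composed identity -> (subject, predicate, object)

    def term_id(self, term):
        # A new term receives the next free number; a known term keeps its id.
        return self._term_ids.setdefault(term, len(self._term_ids) + 1)

    def identity(self, part):
        """Compose an identity recursively, so nested tuples are covered too."""
        if isinstance(part, tuple):
            return "(" + ".".join(self.identity(p) for p in part) + ")"
        return str(self.term_id(part))

    def put(self, subject, predicate, obj):
        key = self.identity((subject, predicate, obj))
        self._tuples.setdefault(key, (subject, predicate, obj))
        return key

    def __len__(self):
        return len(self._tuples)


store = TupleStore()
key_first = store.put("A", "Bind", "B")
key_again = store.put("A", "Bind", "B")  # same content, same address
key_nested = store.put(("A", "Bind", "B"), "Cause", "X")
```

Because the key is a pure function of the content, reading the same statement in a second document finds the existing tuple instead of creating a duplicate.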


N-Tuples as HyperMembrane

Tuples

{A, Bind, B}

{{A, Bind, B}, Cause, X}

{X, Bind, D}

{{X, Bind, D}, Cause, Y}

[Diagram: the four tuples drawn as a graph over nodes A, B, X, D, and Y, with edges labeled Bind and Cause]
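
The way these four tuples weave together can be sketched by indexing every named entity against the tuples that mention it; an entity that appears in more than one tuple marks a point where fabrics intersect. The functions `entities` and `intersections` are hypothetical helpers for illustration, not part of the HyperMembrane Engine.

```python
from collections import defaultdict

# The four tuples from the slide, with nesting expressed as Python tuples.
TUPLES = [
    ("A", "Bind", "B"),
    (("A", "Bind", "B"), "Cause", "X"),
    ("X", "Bind", "D"),
    (("X", "Bind", "D"), "Cause", "Y"),
]


def entities(part):
    """Yield every named entity in a possibly nested tuple (predicates excluded)."""
    if isinstance(part, tuple):
        subject, _, obj = part
        yield from entities(subject)
        yield from entities(obj)
    else:
        yield part


def intersections(tuples):
    """Map each entity shared by several tuples to the positions of those tuples."""
    index = defaultdict(set)
    for position, item in enumerate(tuples):
        for entity in entities(item):
            index[entity].add(position)
    return {entity: sorted(hits) for entity, hits in index.items() if len(hits) > 1}


shared = intersections(TUPLES)
```

In this example X is the busiest intersection: it is the effect in the second tuple and a participant in the third and fourth, which is how a chain from A and B through X to Y becomes traversable.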

Current State of OpenSherlock

[Architecture diagram. Component labels: ElasticSearch; Titan or Blazegraph; Ontology Importer; Ontologies; PubMed Reader; PubMed Abstracts; HyperMembrane Engine; TellAsk; UMLS Importer; UMLS]

Observations 3

• HyperMembrane is a reminding system

– HyperMembrane is a record of federated human conversation

  • Harvested from books, papers, and recorded conversation

  • Includes statistical properties of recorded utterances

– HyperMembrane records:

  • That which is common

  • That which is novel

    – Possibly wrong

    – Possibly game changing

TellAsk Interface

[Screenshot of the TellAsk interface, with these callouts:]

• Conversation Tree: user can click a node to select as parent for any user response

• Response Type Selectors: selection required before response

• User types here

• Linear conversation flow

• Entry Forms Selector List

• Map starts a new conversation with entered topic

The Open Source Stack

• Persistence

– ElasticSearch

– Considering Titan

– Considering Blazegraph (Bigdata™ RDF Store)

• Libraries

– Many from Apache Foundation and others

– LinkGrammarParser (Java version)

– XML PullParser

– Simple JSON Parser

• Tools

– Eclipse

Summary


Current State of Development

• Aim to answer simple questions about causality

– Current focus on biomedical domain

– Current focus on two lenses

• Taxonomy

• Causality

– No Conceptual Graphs

– No Process Models

– No Probabilistic Models


Future Work

• Aim to complete an anticipatory system

– Process models for anticipation

– Conceptual graphs

– Probabilistic models

– More lenses

• Pluggable lenses

• Adaptive lenses

– More domains


Why Do This?

• Augment human capabilities in problem solving

• Participate in Open Science


Augmenting Social Sensemaking

[Diagram with three numbered steps. Labels: Creating Ideas; Refining Connections; Connecting Ideas; Cancer patient]

Participate in Open Science


Key Context for Open Science

• A planet-wide, collaborative quest for Global Thrivability*.

– Issues include

  • Sociological events

    – Health, epidemics, wars, …

  • Geophysical events

    – Climate change, earthquakes, volcanoes, …

  • Astrophysical events

    – Asteroids, our Sun, …

* Let’s call the quest: EarthMoonshot


Completed Representation

[Diagram: argument map. Node and link labels: antioxidants kill free radicals; Contraindicates; macrophages use free radicals to kill bacteria; Bacterial Infection; Antioxidants; Because; Appropriate For; Compromised Host]

Let us co-create Cognitive Agents for Discovery [email protected]

OpenSherlock documents at: http://debategraph.org/OpenSherlock

Code emerging at: https://github.com/opensherlock/

Slides online at: http://slideshare.net/jackpark/

Acknowledgments: Bob Gleichauf David Alexander Price Arun Majumdar Robert S. Stephenson Mark Szpakowski Martin Radley Sherry Jones Alexander Wenzowski Ted Kahn Patrick Durusau
