Hybrid system architecture overview

Overview of Hybrid Architecture in Project Halo Jesse Wang, Peter ClarkMarch 18, 2013

Status of Hybrid ArchitectureGoals, Modularity, Dispatcher, Evaluation

Hybrid System Near Term Goals

• Setup the infrastructure to communicate with existing reasoners

• Reliably dispatch questions and collect answers

• Create related tools and resourceso Question generation/selection, answer

evaluation, report analysis, etc.

• Experiment ways to choose the answers from available reasoners – as hybrid solver

Dispatcher

Focus Areas of Hybrid Framework (until mid 2013)

• Loose coupling, high cohesion, data exchange protocols

Modularity

• Send requests and handle the responses

Dispatching

• Ability to get ratings on answers, and report results

Evaluation

DirectQA

CYC TEQA

IR?SQDB

Retrieval

CYC SQs

SQsHybri

EVALUATIONReport

Hybrid System Core Components

SQs: suggested questions SQA: QA with suggested questions TEQA: Textual Entailment QAIR: Information Retrieval

Yellow Outline: New or Updated

Filtered Set of

Questions

In Campbe

Chapt 7

Find-A-Value

Infrastructure: Dispatchers

Dispatcher

CYC TEQA

Live Single QA

Suggested QA Batch QA

Dispatcher Features

• Asynchronous batch mode and single/experiment mode

• Parallel dispatching to reasonerso Very functional UI: Live progress indicator, view question file, logso Exception and error handling

• Retry question when server is busy

• Batch service can continue to finish even if the client dieso Cancel/stop the batch process also available

• Input and output support both XML and CSV/TSV formatso Pipeline support: accept Question-Selector input

• Configurable dispatchers, select reasonerso Collect answers and compute basic statistics

Question-Answering via Suggested Questions

• Similar features as Live/Direct QA

• Aggregate suggested questions’ answers as a solver

• Unique features:o Interactively browse suggested questions databaseo Filter on certain facetso Using Q/A concepts, question types, etc. to improve relevanceo Automatic comparison of filtered and non-filtered results by

chapters

Question and Answer Handling

• Handling and parsing reasoner’s returned resultso Customized programming

• Information on execution: details and summary

• Report generationo Automatic evaluation

• Question Selectoro Support multiple facets/filterso Question bankso Dynamic UI to pick questionso Hidden tags support

Automatic Evaluation: Status as of 2013.3

• Automatic result evaluation features• Web UI/service to use• Algorithms to score exact and variable answers

– brevity/clarity– relevance: correctness + completeness– overall score

• Generate reports – Summary & details– Graph plot

• Improving evaluation result accuracy • Using: basic text processing tricks (stop words, stemming, trigram

similarity, etc.), location of answer, length of answer, bio concepts, counts of concepts, chapters referred, question types, answer type

• Experiments and analysis (several rounds, W.I.P.)0

User overall AutoEval Overall

Hybrid PerformanceHow we evaluate and how can improve overall system performance

Caveats: Question Generation and Selection

• Generated by a small group of SMEs (senior biology students)

• In natural language, without textbook (only syllabus)

Question Set Facets

Chapter Distribution

FIND-A-VALUE46%

IS-IT-TRUE-THAT9%

HAVE-RELATIONSHIP8%

PROPERTY6%

HOW-MANY5%

WHERE5%

WHAT-DOES-X-DO3%

WHAT-IS-A3%

HAVE-SIMILARITIES2%

X-OR-Y2%FUNCTION-OF-X

1%HAVE-DIFFERENCES

Question Types

Caveat: Evaluation Criteria

• We provided a clear guideline, but still subjectiveo A(4.0) = correct, complete answer, no major weaknesso B(3.0) = correct, complete answer with small cosmetic issueso C(2.0) = partially correct or complete answers, with some big issueso D(1.0) = somewhat relevant answer or information, or poor presentationo F(0.0) = wrong or irrelevant, conflicting or hard-to-locate answers

• Only 3 users to rate the answers, under tight timeline

7 15 230

User Preferences

AuraCycText QA

Evaluation ExampleQ: What is the maximum number of different atoms a carbon atom can bind at once?

More Evaluation Samples (Snapshot)

0.00 0.33 0.67 1.00 1.33 1.67 2.00 2.33 2.67 3.00 3.33 3.67 4.000

Answer Counts Over Rating

Text QA

Reasoner Quality Overview

Performance Number

Precision Recall F10.000

Reasoner Performance on All Ratings (0..4)

AuraCycText QA

Precision Recall F10.000

Reasoner Performance on "Good" (>= 3.0)

Answers

AuraCycText QA

Answers Over Question Types

FIND-A-VALUE

HOW-MANY

PROPERTY

WHAT-DOES-X-DO

WHAT-IS-A

X-OR-Y

IS-IT-TRUE-THAT

HAVE-DIFFERENCES

HAVE-SIMILARITIES

HAVE-RELATIONSHIP

0.000.501.001.502.002.503.003.504.00

Answer Overall Rating

Text QACycAura

FIND-A-VALUE

HOW-MANY

PROPERTY

WHAT-DOES-X-DO

WHAT-IS-A

X-OR-Y

IS-IT-TRUE-THAT

HAVE-DIFFERENCES

HAVE-SIMILARITIES

HAVE-RELATIONSHIP

0 2 4 6 8 10 12 14 16 18 20

Count of Answered Questions

Text QACycAura

Answer Distribution Over Chapters

AuraAuraAura

AuraAura

CycCyc

Text QA

Text QAText QA

Text QA

0 4 5 6 7 8 9 10 11 120.00

4.00Answer Quality Over Chapters

Text QA

Answers on Questions with E/V Answer Type

Aura Cyc Text QA Average0.000.501.001.502.002.503.00

Exact/Variou Answer Quality

Aura Cyc Text QA0

1020304050

Exact/Various Answer Count

Improve Performance: Hybrid Solver – Combine!• Random selector (dumbest, baseline)

o Total question answered correctly should beat the best solver

• Priority selector (less dumb)o Pick reasoner following a good order (e.g. Aura > Cyc > Text QA) *o Expected performance: better than best individual

• Trained selector: Feature and rule-based selector (smarter)o Decision-Tree (CTree…) learning over Q-Type, Chapter, …o Expected performance: slightly better than above

• Theoretical best selector: MAX – the upper limit (smartest)o Suppose we can always pick the best performing reasoner

Performance (F1) with Hybrid Solvers

Aura Cyc Text QA Random Priority D-Tree Max0.000

Performance of Solvers on Good Answers (Good: Rating >= 3.0)

Conclusion

• Each reasoner has its own strength and weaknesso Some aspects not handled well by AURA & CYCo Low hanging: IS-IT-TRUE-THAT for all, WHAT-IS-A for CYC, …

• Aggregated performance easily beats the best individual (Text QA)o Random solver does a good job (F1: mean=0.609): F1

MAX –

F1random

~ 2.5%

• Little room for better performance via answer selectiono F1

MAX – F1

D-Tree ~ 0.5%

o Better focus on MORE and/or BETTER solvers

Future and Discussions

Near Future Plans

• Include SQDB-based answers as a “Solver”o Help alleviate question interpretation problems by reasoners

• Include Information Retrieval-based answers as a “Solver”o Help understand the extra power reasoners can have over search

• Improvement evaluation mechanism

• Extract more features from questions and answers to enable a better solver, and see how close we can get to the upper limit (MAX)

• Improve question selector to support multiple sources and automatic update/merge of question metadata

• Find ways to handle question bank evolution

Get More, Better Reasoners

• Extract and use more features to select best answers• Evidence collection and weighing

Machine learning, Evidence combination

• Easier to explore individual results and diagnose failures

• Support to tune and optimize performance over target question-answer datasets

Analytics & tuning

• Support shared data, shared answers• Subgoaling• Allow reasoners to call each other for subgoals

Inter-solver communication

Further Technical Directions (2013.6+)

Open Data

Open Services

Open Environment

Open *Data*

• Clear Semantics, Common Format (standard), Easy to Access, Persistent (available)

Requirements

• Questions bank, training sets, knowledge base, protocol for intermediate and final data exchange

Data Sources

• Design and implement protocols and services for data I/O

Open Data Access Layer

Open *Services*

• Pure machine/algorithms based• Human-computation (social, crowd sourcing)

Two Categories

• Communicate with open data, generate meta data, • More reliable, scalable, reusable

Requirements

• Convert raw, noisy, inaccurate data refined, structured, useful

Goal: Process and refine data

Open *Environment*

• AI development environment to facilitate collaboration, efficiency and scalability

Definition

• like MMPOG, each “player” gets credits: contribution, resource consumption; interests, loans; ratings…

Operation

• self-organized projects, growth potential, encourage collaboration, grand prize

Opportunities

Thank You!For having the opportunity for Q&A

Backup slides next

IBM Watson’s “DeepQA” Hybrid Architecture

DeepQA Answer Merging And Ranking Module

Wolfram Alpha Hybrid Architecture

• Data Curation

• Computation

• Linguistic components

• Presentation

Answer Distribution (Density)

0.00 0.33 0.67 1.00 1.33 1.67 2.00 2.33 2.67 3.00 3.33 3.67 4.000

Answer Distribution

Text QACycAura

Average User Rating

Data Table for Answer Quality Distribution

Work Performed

• Created web-based dispatcher infrastructureo For both Live Direct QA and Live Suggested Questionso Batch mode to handle larger amount

• Built a web UI for UW student to rate answers of questions (HEF)o Coherent UI, duplicate removal, queued tasks

• Established automatic ways for result evaluation and comparison

• Applied first versions of file exchange format and protocols

• Employed initial file and data exchange formats and protocols• Setup faceted browsing and search (retrieval) UI

o And web services for 3rd party consumption

• Carried out many rounds of relevance studies and analysis

First Evaluation via Halo Evaluation Framework• We sent individual QA result set to UW students for evaluation

• First round hybrid system evaluation:o Cyc SQA: 9 best (3 ties), 14 good, 15 / 60 answeredo Aura QA: 1 best, 9 good, 14/60 answered; o Aura SQA: 4 best (3 ties), 7 good, 8/60 answeredo Text QA: 27 best, 29 good; SQA: 3 best, 5 good, 7/60 answeredo Best scenario: 41/60 answered o Note: Cyc Live was not included

o * SQA (Answering via suggested questions)

Ask a question Waiting for answers

Answers returned?

Live Direct QA Dispatcher ServiceWhat does ribosome make?

Live Suggested QA Dispatcher Service

Batch QA Dispatcher Service

Result automatically downloaded once finished

Live solver Service Dispatchers

Direct Live QA: What does ribosome make?

Hybrid system architecture overview

Technology

Hybrid Morphologies- Infrastructure Architecture Landscape

h13189 EMC Hybrid Cloud - Reference Architecture

Bandwidth Management - cisco.com · 6-3 Preferred Architecture for Cisco Webex Hybrid Services February 11, 2019 Chapter 6 Bandwidth Management Overview Figure 6-1 Architecture for

Managing an Enterprise Class Hybrid Architecture

A TRIPARTITE HYBRID MODEL ARCHITECTURE FOR …

Overview of Hybrid vehicles

2009 Architecture Plan Overview 2009 Architecture Plan Overview

HYBRID MEDIA STREAMING ARCHITECTURE, FOCUSSING ON

Hybrid integration reference architecture

CHERI: A Hybrid Capability Architecture

D1.1: Concept of Modular Architecture for Hybrid Electric ... · for Hybrid Electric Propulsion of Aircraft ... Concept of Modular Architecture for Hybrid Electric ... AIRCRAFT EXAMPLES

Hybrid Architecture

Architecture Description Languages: An Overview Architecture Description Languages: An Overview Architecture Definition ADLs Architecture vs. Design ADLs

Hybrid cloud architecture & connectivity

A Scalable Hybrid Network Monitoring Architecture for ...nsrg.ece.illinois.edu/publications/thesis_final.pdf · A Scalable Hybrid Network Monitoring Architecture for Measuring, Characterizing,

Hybrid Access Broadband Network Architecture · PDF fileHybrid Access Broadband Network Architecture Issue: 1 Issue Date: July 2016 . Hybrid Access Broadband Network Architecture TR-348

SAL: A Hybrid Cognitive Architecture

Hybrid01 Hybrid System Overview

Hybrid Architecture FrameWork

Hybrid System Overview