Jan Šedivý - Intelligent Personal Assistants



Intelligent Personal Assistant

Jan Šedivý

Petr Baudiš, Tomáš Gogár, Tomáš Tunys

April 2016



Segments, Functionality, Use cases, Expectations, Interaction Modes, Context, Privacy, Technologies, Our Focus, Rule based IPA, YodaQA, Future


Intelligent Personal Assistants

Intelligent Personal Assistants on the market today

Apple Siri, Google Now, Microsoft Cortana and Amazon Echo applications/appliances are the best known Intelligent, Virtual or Personal Assistants (IPA).

In this presentation I will discuss the use cases, challenges and basic architecture of the future intelligent assistants.

IPA Basic Definition

An IPA predicts users' needs, helps, alerts, answers questions and ultimately acts autonomously on our behalf.

To achieve this goal, IPAs need to communicate and be connected to the cloud.

How can IPA help us?

IPA Segments

Mobile applications

Car interface


Chat bots

Home automation

Robots, appliances

Where is IPA currently used?

IPA Functionality

Alerting, reminding

Command control - mobile, car, wearables

IR front end, Search UI

Simple, factoid question answering

Simple chat bots, Avatars

Advising, Robo advisers


How can IPA help us?

How to communicate with IPA?

People met, shook hands and made a deal

Call centers, credit cards and parcel delivery

Internet commerce web sites

WeChat, Facebook, "Alexa, call me an Uber"

A short history of doing business

Relationships start with conversation …


IPA interaction channels

Input: Text, Speech, Haptic, Gestures, Image recognition

Output: Text, Voice, Graphics, Haptic

Mic, Camera, Touch screen, Wearable sensors, Brain waves

Multimodal: 5 senses, combines voice and GUI

Use not only language!

IPA Interaction Modes

IPA initiated: Alerts, Suggests

User initiated: Execute command, Answer question (voice), Carry a dialog

Directed dialog

Who starts the interaction?

IPA Interaction


Open dialog

Mixed-initiative: The initiative changes hands. Either the IPA or the user can start the dialog.

Disambiguation: IPA clarifies the question through a dialog


Topic changing system

Who is leading the dialog?
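The disambiguation mode can be illustrated with a minimal slot-filling sketch: when a required piece of information is missing, the IPA takes the initiative and asks a clarifying question; otherwise it executes the request. The intent names, slot names and question texts below are hypothetical and only serve the illustration.

```python
from typing import Dict, List

# Hypothetical intents with the slots each one needs before it can run.
REQUIRED_SLOTS: Dict[str, List[str]] = {
    "navigate": ["destination"],
    "set_alarm": ["time"],
}

# Hypothetical clarification question for each slot.
CLARIFY: Dict[str, str] = {
    "destination": "Where do you want to go?",
    "time": "What time should I set the alarm for?",
}


def next_turn(intent: str, slots: Dict[str, str]) -> str:
    """Return the IPA's next utterance for a detected intent and the slots filled so far."""
    for slot in REQUIRED_SLOTS[intent]:
        if slot not in slots:
            # The IPA takes the initiative and asks for the missing information.
            return CLARIFY[slot]
    filled = ", ".join(f"{name}={value}" for name, value in sorted(slots.items()))
    return f"Executing {intent} ({filled})"
```

Calling next_turn("navigate", {}) yields the clarifying question, and calling it again with the destination filled in yields the execution message.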

User Model - Context

Internal, embedded

Location, Time,

History - query, commands, situations, …

Future - calendar, email,...

User’s profile, preferences, usage modes, …

Affective computing - Emotional models

People know context!

User Model - Context

External - environment

Social, family, friends

Connected == IoT

Sensors, actuators, LANs

Private - with limited access

Recorded phone calls, credit card transactions, utilities ...
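One possible way to represent such a user model is a small data structure that keeps the private part behind an explicit grant. This is only a sketch under assumptions: the class name, field names and the read_private method are hypothetical and not taken from any existing system.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Set


@dataclass
class UserContext:
    """Hypothetical container for the context an IPA keeps about its user."""
    location: Optional[str] = None
    time: Optional[datetime] = None
    history: List[str] = field(default_factory=list)          # past queries, commands
    preferences: Dict[str, str] = field(default_factory=dict)  # profile, usage modes
    private: Dict[str, str] = field(default_factory=dict)      # e.g. card transactions
    granted: Set[str] = field(default_factory=set)             # private keys the user agreed to share

    def read_private(self, key: str) -> Optional[str]:
        """Return a private item only if the user has explicitly granted access to it."""
        if key in self.granted:
            return self.private.get(key)
        return None
```

Keeping the grant set next to the data makes the limited access explicit: nothing private is readable until the user adds its key to granted.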

IPA Privacy

IPA may know almost everything

Supertrust relation

How much private information do we want to share?

Users need control

Information may only be shared where there is trust (norms of human relationships)

How much private data do we need to share to let the IPA act for us?

IPA is one of the most complicated examples of AI technology

IPA Technologies

Speech recognition, speaker and language recognition

Image recognition

Haptics, gestures, face gesture, emotion recognition

TTS, automatic speech generation

Graph, picture, haptic generation

User modeling

NLP, NLU, Information retrieval, Knowledge management, Dialog management

Internet, APIs

IoT etc.

Evaluation

Affective computing, Emotional


To meet the user's expectations we need to combine many AI technologies:

IPA Architecture

Rule based: if the sentence pair match is high => intent => do this

Statistical ML: Question analysis, Knowledge base and Internet, Answer hypothesis

Answer scoring

Rule based or Statistical

Rule Based IPA

Spoken input - ASR - Text - Entity extraction - Intent detector - Normalization - Execution

These systems assume questions with a clear goal

If the question is beyond the system's capabilities, it replies “I can’t answer this question”

Or it falls back to a web search
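The pipeline above can be sketched in a few lines, starting from the text that the ASR step would produce. This is a minimal illustration, not the implementation of any product: the intent names, regular expressions and reply strings are hypothetical, and for brevity a single regular expression per intent does both intent detection and entity extraction.

```python
import re
from typing import Callable, Dict

# Hypothetical rules: one pattern per intent, named groups capture the entities.
INTENT_RULES: Dict[str, re.Pattern] = {
    "turn_on_light": re.compile(r"turn on the (?P<color>\w+) light"),
    "tune_station": re.compile(r"tune (?:in to )?(?P<station>.+)"),
}

# Hypothetical executors: what the assistant does once the intent is known.
EXECUTORS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "turn_on_light": lambda e: f"Turning on the {e['color']} light.",
    "tune_station": lambda e: f"Tuning to {e['station']}.",
}

FALLBACK = "I can't answer this question"


def normalize(entities: Dict[str, str]) -> Dict[str, str]:
    """Normalization step: trim whitespace and lowercase the entity values."""
    return {name: value.strip().lower() for name, value in entities.items()}


def handle(text: str) -> str:
    """Run recognized text through intent detection, normalization and execution."""
    cleaned = text.lower().strip(" !?.")
    for intent, pattern in INTENT_RULES.items():
        match = pattern.fullmatch(cleaned)
        if match:
            entities = normalize(match.groupdict())
            return EXECUTORS[intent](entities)
    # No rule matched: the question is beyond the system's capabilities.
    return FALLBACK
```

A real system could replace the fallback reply with a web search, as the slide suggests.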

Intent Recognition

Answer Sentence Selection

Next Utterance Ranking

Semantic Textual Similarity

Paraphrase Identification

Recognizing Textual Entailment

Basic models: TF-IDF, BM25, word, sentence embeddings

Sentence Pair Scoring
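As an illustration of the simplest of these basic models, the sketch below scores a sentence pair with TF-IDF weighted cosine similarity and picks the best matching candidate. It is a pure-Python example under assumptions: the function names and the smoothed IDF formula are choices made for this sketch, and words not seen in the corpus are simply ignored.

```python
import math
from collections import Counter
from typing import Dict, List


def tokenize(sentence: str) -> List[str]:
    """Lowercase, split on whitespace and strip simple punctuation."""
    tokens = (tok.strip(".,!?").lower() for tok in sentence.split())
    return [tok for tok in tokens if tok]


def idf_table(corpus: List[str]) -> Dict[str, float]:
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(corpus)
    doc_freq: Counter = Counter()
    for sentence in corpus:
        doc_freq.update(set(tokenize(sentence)))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(sentence: str, idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector; terms missing from the IDF table are ignored."""
    counts = Counter(tokenize(sentence))
    return {t: tf * idf[t] for t, tf in counts.items() if t in idf}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def pair_score(s1: str, s2: str, idf: Dict[str, float]) -> float:
    """Similarity of two sentences, between 0 and 1."""
    return cosine(tfidf_vector(s1, idf), tfidf_vector(s2, idf))


def best_match(query: str, candidates: List[str], idf: Dict[str, float]) -> str:
    """Return the candidate sentence that scores highest against the query."""
    return max(candidates, key=lambda c: pair_score(query, c, idf))
```

The same pair_score interface could be backed by BM25 or sentence embeddings without changing the callers.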

The YodaQA System

● Universal end-to-end QA

● Searching databases and documents

● Open source research system

● Machine learning, no manual rules!

● Java, Apache UIMA, Apache Solr

● Proof-of-concept web+mobile interface, public live demo

Factoid Question Answering

Naturally phrased question instead of keywords

Output is not a whole document, but just the snippet of information

Voice interaction

Factoid Question Answering

We cover the basic factoid questions!

When was J. R. R. Tolkien born?

What is the population of Brazil?

Who played Marge in The Simpsons?

Where was she born? (Julie Kavner)

How do I get to Wall Street?

Turn on the green light!

Tune in to BBC World News!

You are the last one, do you want me to turn on the alarm?

IPA Building Steps

Intent identification

Data collection

Feature engineering

Model building

(Active learning)

Model evaluation


Implement norms of human relationship: mutual value, respect, trust

What makes a better conversation?

How to carry an effective dialog, negotiation?

How to design an engine recognizing emotions?

How to learn habits?

How to make the IPA more human-like?

How to make IPA adaptable to the user?

How to make IPA automatically configurable and able to integrate into a new environment?

How to make IPA flexible enough?

IPA unified interface to mobile applications

Millions of mobile apps

Navigation, login-chaos, and unified bad notification

Leverage the context of consumption

Leverage sensory and multimodal inputs

Gartner: By 2020, IPAs will facilitate 40 percent of mobile interactions, and the post-app era will begin to dominate.

Thank you

Team: ČVUT FEL - Dept. of Cybernetics

Human behaviour


People ask questions







People need help everywhere

Small real estate

Navigation, cross-app API, password chaos

UI has to change

iPhone introduced in 2007

IPA Segments






Real estate.


Which industries benefit from IPAs?

IPA Development


Collect utterances

Define the answers

Label utterances

Build the model (ML)


Iterate to improve
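The steps above can be sketched with a tiny intent classifier trained on labeled utterances. The example below uses a multinomial Naive Bayes model with add-one smoothing, chosen here only because it fits in a few lines; the slides do not name a specific model, and the class name, labels and training utterances are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple

Example = Tuple[str, str]  # (utterance, intent label)


def tokenize(utterance: str) -> List[str]:
    return utterance.lower().replace("?", " ").replace("!", " ").split()


class NaiveBayesIntent:
    """Multinomial Naive Bayes over words, with add-one smoothing."""

    def fit(self, examples: List[Example]) -> "NaiveBayesIntent":
        self.label_counts = Counter(label for _, label in examples)
        self.word_counts = defaultdict(Counter)
        for utterance, label in examples:
            self.word_counts[label].update(tokenize(utterance))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, utterance: str) -> str:
        total = sum(self.label_counts.values())
        words = [w for w in tokenize(utterance) if w in self.vocab]  # skip unseen words

        def log_posterior(label: str) -> float:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            return score

        return max(self.label_counts, key=log_posterior)


def accuracy(model: NaiveBayesIntent, examples: List[Example]) -> float:
    """Share of examples whose predicted intent equals the label."""
    return sum(model.predict(u) == y for u, y in examples) / len(examples)


# Collected and labeled utterances (hypothetical data).
LABELED: List[Example] = [
    ("turn on the green light", "light_on"),
    ("switch on the light", "light_on"),
    ("what is the population of brazil", "question"),
    ("when was tolkien born", "question"),
]
```

Iterating then means evaluating with accuracy on held-out utterances, adding and labeling the ones the model gets wrong, and refitting.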

User Expectations

Must not force users to memorize commands.

It must understand natural language.

Helps solve everyday tasks.

Must be unobtrusive when giving suggestions.

Answers questions.

IPA Use Cases

Complex questions

Conversational, dialog

Complex robo advisers

Presentation commerce

Digital, enterprise, media asset management

Automatically generating documents, stats, news, tweets based on content on the web ...
