BUILDING THE WORLD’S SMARTEST KEYBOARD

Ben Medlock

Co-founder, CTO

Data Driven NYC 2015

19 August, 2015

Ben Medlock, SwiftKey // Building a Better Keyboard

19 August, 2015 | TouchType Ltd, 2014. CONFIDENTIAL - do not copy/distribute. All content for illustrative purposes only.

We believe the next generation of technology won’t just be smart, it will provide a more human experience: one that adapts to you, not the other way around.

BACKGROUND

HOW DO WE THINK?

HOW CAN WE MODEL HOW WE THINK?

WHAT IS ARTIFICIAL INTELLIGENCE?

Copyright: Warner Bros. Pictures

PROBABILITY

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
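
As a worked illustration of Bayes' rule, the sketch below applies it to a keyboard-flavoured case: inferring which key was intended from a single touch. The function name and all numbers are made up for illustration and do not come from the talk.

```python
def posterior(prior, likelihood):
    """Apply Bayes' rule over a discrete set of hypotheses.

    prior:      dict hypothesis -> P(A)
    likelihood: dict hypothesis -> P(B | A) for the observed evidence B
    returns:    dict hypothesis -> P(A | B)
    """
    # P(B) is the sum over hypotheses of P(B | A) * P(A).
    evidence = sum(likelihood[h] * prior[h] for h in prior)
    return {h: likelihood[h] * prior[h] / evidence for h in prior}


# Illustrative numbers only: how likely each key was intended (prior),
# and how likely the observed touch is under each key (likelihood).
prior = {"q": 0.2, "w": 0.8}
likelihood = {"q": 0.6, "w": 0.4}

post = posterior(prior, likelihood)
print(post)
```

Here the touch alone favours "q", but the prior is strong enough that "w" remains the more probable key after the update.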

MACHINE LEARNING

INFERENCE

MODEL

INPUT

NLP

AI

Narrow AI (Weak AI)

General AI (Strong AI)

Web search

Collaborative filtering

Voice recognition

Machine translation

Driverless cars

Image processing

SWIFTKEY

RETHINKING TYPING

$$P(s \mid e, M)$$

context, input, prior

language detection

next word prediction

error correction

tap / continuous

unseen sequences

DIFFERENT INTERPRETATIONS

• Independent probability distributions with smoothing parameters to govern level of belief

• Ranking signals as inputs to a rank preference learner

• Single distribution estimates, e.g. maximum entropy, where all evidence types can be expressed as features
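
A minimal sketch of the first interpretation, assuming two independent estimates (one from context, one from touch input) combined as a weighted product. The function name, the weight names, and the numbers are hypothetical; the slide does not say how the smoothing parameters are actually applied.

```python
import math


def combine(candidates, context_p, input_p, w_context=1.0, w_input=1.0):
    """Score candidates by a weighted product of two independent estimates.

    The weights (hypothetical names) express belief in each evidence
    source: 0 ignores the source, larger values trust it more.
    All probabilities are assumed to be strictly positive.
    """
    log_scores = {
        s: w_context * math.log(context_p[s]) + w_input * math.log(input_p[s])
        for s in candidates
    }
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    unnorm = {s: math.exp(v - top) for s, v in log_scores.items()}
    total = sum(unnorm.values())
    return {s: v / total for s, v in unnorm.items()}


candidates = ["we", "quit"]
context_p = {"we": 0.7, "quit": 0.3}   # illustrative language-model estimate
input_p = {"we": 0.4, "quit": 0.6}     # illustrative touch-model estimate

balanced = combine(candidates, context_p, input_p)
trusting_input = combine(candidates, context_p, input_p, w_input=3.0)
print(balanced, trusting_input)
```

With equal weights the context wins and "we" ranks first; raising the belief in the input estimate flips the ranking to "quit".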

LANGUAGE MODELING

• Use language models to capture domain usage:

• Background

• Conversational

• Personal

• Context-specific

• Combine multiple models using most confident, interpolation, etc.
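
The two combination strategies named above can be sketched as follows. Linear interpolation is the standard weighted average of model probabilities; "most confident" is read here, as an assumption, as picking the model whose next-word distribution has the lowest entropy. Model contents and weights are illustrative.

```python
import math


def entropy(dist):
    """Shannon entropy (in nats) of a next-word distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def interpolate(dists, weights):
    """Linear interpolation: weighted average of each model's probability."""
    vocab = set().union(*dists)
    return {
        w: sum(wt * d.get(w, 0.0) for d, wt in zip(dists, weights))
        for w in vocab
    }


def most_confident(dists):
    """One reading of 'most confident': the lowest-entropy distribution."""
    return min(dists, key=entropy)


# Illustrative next-word distributions from two models.
background = {"the": 0.5, "a": 0.3, "pizza": 0.2}
personal = {"pizza": 0.9, "the": 0.1}

mixed = interpolate([background, personal], [0.6, 0.4])
chosen = most_confident([background, personal])
print(mixed, chosen)
```

Interpolation lets the personal model lift "pizza" above "the" without discarding the background vocabulary, while the most-confident rule would use the personal model alone.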

BUILDING LANGUAGE MODELS

• Smoothed n-gram models are fast and efficient

• Work well with optimized trie search

• Smoothing: interpolative, backoff, “stupid”…

• Morphemes

• Neural nets / representation learning
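
As an illustration of the "stupid" smoothing option, here is a minimal sketch of stupid backoff over n-gram counts. It returns scores rather than normalised probabilities, and uses the commonly quoted backoff factor of 0.4. A plain `Counter` stands in for the trie to keep the example short; the toy corpus and function names are assumptions.

```python
from collections import Counter


def train(tokens, order=3):
    """Count all n-grams up to the given order."""
    counts = Counter()
    for n in range(1, order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(counts, total, context, word, alpha=0.4):
    """Relative frequency if the n-gram was seen, else back off with a penalty."""
    context = tuple(context)
    ngram = context + (word,)
    if not context:
        return counts[ngram] / total          # unigram relative frequency
    if counts[ngram] > 0:
        return counts[ngram] / counts[context]
    # Unseen n-gram: drop the oldest context word and multiply by alpha.
    return alpha * stupid_backoff(counts, total, context[1:], word, alpha)


tokens = "will we quit or will we stay".split()
counts = train(tokens)
total = len(tokens)

seen = stupid_backoff(counts, total, ("will", "we"), "quit")
unseen = stupid_backoff(counts, total, ("will", "we"), "or")
print(seen, unseen)
```

The seen trigram scores 1/2, while the unseen one backs off twice to the unigram and scores 0.4 × 0.4 × 1/7.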

DATA COLLECTION

COMBINING MODELS

INPUT MODELING

A

• Use Gaussian distributions to model interaction with the keyboard surface

• Linear Gaussians for e.g. spacebar

• Other distributions?
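
A minimal sketch of a Gaussian touch model, assuming one axis-aligned 2D Gaussian per key. The key centres, spreads, and coordinate units below are invented for illustration; the talk does not give the actual parameterisation.

```python
import math


def gaussian_2d(x, y, mean, std):
    """Density of an axis-aligned 2D Gaussian (independent x and y)."""
    (mx, my), (sx, sy) = mean, std
    zx = (x - mx) / sx
    zy = (y - my) / sy
    return math.exp(-0.5 * (zx * zx + zy * zy)) / (2 * math.pi * sx * sy)


# Hypothetical key models: (centre, standard deviation) in key-width units.
KEYS = {
    "q": ((0.5, 0.5), (0.4, 0.5)),
    "w": ((1.5, 0.5), (0.4, 0.5)),
}


def key_likelihoods(x, y):
    """Normalised likelihood of each key given a touch at (x, y)."""
    dens = {k: gaussian_2d(x, y, mean, std) for k, (mean, std) in KEYS.items()}
    total = sum(dens.values())
    return {k: d / total for k, d in dens.items()}


near_q = key_likelihoods(0.9, 0.5)    # slightly left of the Q/W boundary
boundary = key_likelihoods(1.0, 0.5)  # exactly on the boundary
print(near_q, boundary)
```

A touch near the boundary still assigns meaningful probability to both keys, which is what lets a language model later overrule the nearest key.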

INPUT MODELING

Q W

will we quit

• Train using re-parameterized online MAP

• Track keystroke-character correlations and train models on a per-session basis
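
The slide does not spell out the re-parameterization, so the sketch below substitutes the textbook online MAP update for the mean of a Gaussian with known observation variance and a Gaussian prior, applied to one coordinate of a key centre. The class name and all numbers are hypothetical.

```python
class OnlineKeyCentre:
    """Online MAP estimate of one coordinate of a key's touch centre.

    Assumes a Gaussian prior on the centre and Gaussian touch noise with
    known variance; after each touch the posterior becomes the new prior.
    """

    def __init__(self, prior_mean, prior_var, obs_var):
        self.mean = prior_mean
        self.var = prior_var
        self.obs_var = obs_var

    def update(self, x):
        # Precisions (inverse variances) add under the conjugate update.
        precision = 1.0 / self.var + 1.0 / self.obs_var
        self.mean = (self.mean / self.var + x / self.obs_var) / precision
        self.var = 1.0 / precision
        return self.mean


# Illustrative session: the user consistently taps right of the key centre.
centre = OnlineKeyCentre(prior_mean=0.5, prior_var=0.04, obs_var=0.04)
first = centre.update(0.7)
second = centre.update(0.7)
print(first, second)
```

Each touch pulls the estimate toward where the user actually taps, and the shrinking variance means later touches move it less.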

HYPERPARAMETER LEARNING

• Lots of hyperparameters!

Input confidence, prune threshold, dynamic LM, etc…

• Some can be learned automatically, e.g. prefix probability

• What kind of typist?

• Accurate and visual-led

• Fast and furious

LOTS OF OTHER LANGUAGE PROBLEMS!

• Profanity filtering

• Stochastic tokenisation (Chinese, Vietnamese etc.)

• Language detection

• Vocabulary evolution

• Clustering

TIME SAVED SO FAR

50 TRILLION CHARACTERS WRITTEN

15 TRILLION KEYSTROKES SAVED

WATCH FRIENDS 19 MILLION TIMES

3

THANK YOU