
Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration

Himabindu Lakkaraju∗, Ece Kamar+, Rich Caruana+, Eric Horvitz+
∗Stanford University, +Microsoft Research

[email protected], + {eckamar, rcaruana, horvitz}@microsoft.com

Abstract

Predictive models deployed in the real world may assign incorrect labels to instances with high confidence. Such errors or unknown unknowns are rooted in model incompleteness, and typically arise because of the mismatch between training data and the cases encountered at test time. As the models are blind to such errors, input from an oracle is needed to identify these failures. In this paper, we formulate and address the problem of informed discovery of unknown unknowns of any given predictive model where unknown unknowns occur due to systematic biases in the training data. We propose a model-agnostic methodology which uses feedback from an oracle to both identify unknown unknowns and to intelligently guide the discovery. We employ a two-phase approach which first organizes the data into multiple partitions based on the feature similarity of instances and the confidence scores assigned by the predictive model, and then utilizes an explore-exploit strategy for discovering unknown unknowns across these partitions. We demonstrate the efficacy of our framework by varying the underlying causes of unknown unknowns across various applications. To the best of our knowledge, this paper presents the first algorithmic approach to the problem of discovering unknown unknowns of predictive models.

Introduction

Predictive models are widely employed in a variety of domains ranging from judiciary and health care to autonomous driving. As we increasingly rely on these models for high-stakes decisions, identifying and characterizing their unexpected failures in the open world is critical. We categorize errors of a predictive model as known unknowns and unknown unknowns (Attenberg, Ipeirotis, and Provost 2015). Known unknowns are those data points for which the model makes low-confidence predictions and errs. On the other hand, unknown unknowns correspond to those points for which the model is highly confident about its predictions but is actually wrong. Since the model lacks awareness of its unknown unknowns, approaches developed for addressing known unknowns (e.g., active learning (Settles 2009)) cannot be used for discovering unknown unknowns.

Unknown unknowns can arise when data that is used for training a predictive model is not representative of the samples encountered at test time when the model is deployed.

Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Figure 1: Unknown unknowns in an image classification task. Training data comprised only of images of black dogs and of white and brown cats. A predictive model trained on this data incorrectly labels a white dog (test image) as a cat with high confidence.

This mismatch could be a result of unmodeled biases in the collection of training data or differences between the train and test distributions due to temporal, spatial or other factors such as a subtle shift in task definition. To illustrate, consider an image classification task where the goal is to predict if a given image corresponds to a cat or a dog (Figure 1). Let us assume that the training data is comprised of images of black dogs, and white and brown cats, and the feature set includes details such as nose shape, presence or absence of whiskers, color, and shape of the eyes. A predictive model trained on such data might learn to make predictions solely based on color despite the presence of other discriminative features because color can perfectly separate the two classes in the training data. However, during test time, such a model would classify an image of a white dog as a cat with high confidence. The images of white dogs are, therefore, unknown unknowns with regard to such a predictive model.

We formulate and address the problem of informed discovery of unknown unknowns of any given predictive model when deployed in the wild. More specifically, we seek to identify unknown unknowns which occur as a result of systematic biases in the training data. We formulate this as an optimization problem where unknown unknowns are discovered by querying an oracle for true labels of selected instances under a fixed budget which limits the number of queries to the oracle. The formulation assumes no knowledge of the functional form or the associated training data of the predictive model and treats it as a black box which outputs a label and a confidence score (or a proxy) for a given data point. These choices are motivated by real-world scenarios in domains such as healthcare and judiciary, where predictive models are being deployed in settings where end users have no access to either the model details or the associated training data (e.g., the COMPAS risk assessment tool for sentencing (Brennan, Dieterich, and Ehret 2009)). Identifying the blind spots of predictive models in such high-stakes settings is critical as undetected unknown unknowns can be catastrophic. In criminal justice, biases and blind spots can lead to the inappropriate sentencing or incarceration of people charged with crimes, or to unintentional racial biases (Crawford 2016). To the best of our knowledge, this is the first work providing an algorithmic approach to addressing this problem.

Developing an algorithmic solution for the discovery of unknown unknowns introduces a number of challenges: 1) Since unknown unknowns can occur in any portion of the feature space, how do we develop strategies which can effectively and efficiently search the space? 2) As confidence scores associated with model predictions are typically not informative for identifying unknown unknowns, how can we make use of the feedback from an oracle to guide the discovery of unknown unknowns? 3) How can we effectively manage the trade-off between searching in neighborhoods where we previously found unknown unknowns and examining unexplored regions of the search space?

To address the problem at hand, we propose a two-step approach which first partitions the test data such that instances with similar feature values and confidence scores assigned by the predictive model are grouped together, and then employs an explore-exploit strategy for discovering unknown unknowns across these partitions based on the feedback from an oracle. The first step, which we refer to as Descriptive Space Partitioning (DSP), is guided by an objective function which encourages partitioning of the search space such that instances within each partition are maximally similar in terms of their feature values and confidence scores. DSP also provides interpretable explanations of the generated partitions by associating a comprehensible and compact description with each partition. As we later demonstrate in our experimental results, these interpretable explanations are very useful in understanding the properties of unknown unknowns discovered by our framework. We show that our objective is NP-hard and outline a greedy solution which is a ln N approximation, where N is the number of data points in the search space. The second step of our methodology facilitates an effective exploration of the partitions generated by DSP while exploiting the feedback from an oracle. We propose a multi-armed bandit algorithm, Bandit for Unknown Unknowns (UUB), which exploits problem-specific characteristics to efficiently discover unknown unknowns.

The proposed methodology builds on the intuition that unknown unknowns occurring due to systematic biases are often concentrated in certain specific portions of the feature space and do not occur randomly (Attenberg, Ipeirotis, and Provost 2015). For instance, the example in Figure 1 illustrates a scenario where systematic biases in the training data caused the predictive model to wrongly infer color as the distinguishing feature. Consequently, images following a specific pattern (i.e., all of the images of white dogs) turn out to be unknown unknowns for the predictive model. Another key assumption that is crucial to the design of effective algorithmic solutions for the discovery of unknown unknowns is that available evidential features are informative enough to characterize different subsets of unknown unknowns. If such features were not available in the data, it would not be possible to leverage the properties of previously discovered unknown unknowns to find new ones. Consequently, learning algorithms designed to discover unknown unknowns would not be able to perform any better than blind search (no free lunch theorem (Wolpert and Macready 1997)).

We empirically evaluate the proposed framework on the task of discovering unknown unknowns occurring due to a variety of factors such as biased training data and domain adaptation, across diverse tasks such as sentiment classification, subjectivity detection, and image classification. We experiment with a variety of base predictive models, ranging from decision trees to neural networks. The results demonstrate the effectiveness of the framework and its constituent components for the discovery of unknown unknowns across different experimental conditions, providing evidence that the method can be readily applied to discover unknown unknowns in different real-world settings.

Problem Formulation

Given a black-box predictive model M which takes as input a data point x with features F = {f1, f2, ..., fL}, and returns a class label c′ ∈ C and a confidence score s ∈ [0, 1], our goal is to find the unknown unknowns of M w.r.t. a given test set D using a limited number of queries, B, to the oracle, and, more broadly, to maximize the utility associated with the discovered unknown unknowns. The discovery process is guided by a utility function which not only incentivizes the discovery of unknown unknowns, but also accounts for the costs associated with querying the oracle (e.g., monetary and time costs of labeling in crowdsourcing). Recall that, in this work, we focus on identifying unknown unknowns arising due to systematic biases in the training data. It is important to note that our formulation not only treats the predictive model as a black box but also assumes no knowledge about the data used to train the predictive model.

Although our methodology is generic enough to find unknown unknowns associated with all the classes in the data, we formulate the problem for a particular class c, a critical class, where false positives are costly and need to be discovered (Elkan 2001). Based on the decisions of the system designer regarding the critical class c and confidence threshold τ, our search space for unknown unknown discovery constitutes all of those data points in D which are assigned the critical class c by model M with confidence higher than τ.

Our approach takes the following inputs: 1) A set of N instances, X = {x1, x2, ..., xN} ⊆ D, which were confidently assigned to the critical class c by the model M, and the corresponding confidence scores, S = {s1, s2, ..., sN}, assigned to these points by M; 2) An oracle o which takes as input a data point x and returns its true label o(x) as well as the cost incurred to determine the true label of x, cost(x); and 3) A budget B on the number of times the oracle can be queried.
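For concreteness, the search space and its confidence scores might be assembled as in the following minimal sketch (not the authors' code); the `predict_with_confidence` method is a hypothetical black-box interface returning a (label, confidence) pair.

```python
# A minimal sketch of assembling the search space X and the score set S from a
# black-box model; the `predict_with_confidence(x)` method is an assumed,
# illustrative interface, not one prescribed by the paper.
def build_search_space(model, test_data, critical_class, tau=0.65):
    X, S = [], []
    for x in test_data:
        label, confidence = model.predict_with_confidence(x)
        # keep only the points confidently assigned to the critical class c
        if label == critical_class and confidence > tau:
            X.append(x)
            S.append(confidence)
    return X, S
```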

Our utility function, u(x(t)), for querying the label of data point x(t) at the t-th step of exploration is defined as:

u(x(t)) = \mathbb{1}_{\{o(x(t)) \neq c\}} - \gamma \times cost(x(t)) \qquad (1)

where \mathbb{1}_{\{o(x(t)) \neq c\}} is an indicator function which returns 1 if x(t) is identified as an unknown unknown, and 0 otherwise. cost(x(t)) ∈ [0, 1] is the cost incurred by the oracle to determine the label of x(t). Both the indicator and the cost functions in Equation 1 are initially unknown and observed based on the oracle's feedback on x(t). γ ∈ [0, 1] is a tradeoff parameter which can be provided by the end user.

Problem Statement: Find a sequence of B instances {x(1), x(2), ..., x(B)} ⊆ X for which the cumulative utility \sum_{t=1}^{B} u(x(t)) is maximum.
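A minimal sketch of the per-query utility in Equation (1), assuming the oracle is a callable returning (true_label, cost) with the cost already normalized to [0, 1]; the interface is illustrative.

```python
# Sketch of Equation (1): reward 1 for finding an unknown unknown, discounted
# by the (scaled) labeling cost reported by the oracle.
def query_utility(oracle, x, critical_class, gamma=0.2):
    true_label, cost = oracle(x)
    found_unknown_unknown = 1.0 if true_label != critical_class else 0.0
    return found_unknown_unknown - gamma * cost
```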

Methodology

In this section, we present our two-step framework designed to address the problem of informed discovery of unknown unknowns which occur due to systematic biases in the training data. We begin this section by highlighting the assumptions required for our algorithmic solution to be effective:

1. Unknown unknowns arising due to biases in training data typically occur in certain specific portions of the feature space and not at random. For instance, in our image classification example, the systematic bias of not including white dog images in the training data resulted in a specific category of unknown unknowns which were all clumped together in the feature space and followed a specific pattern. Attenberg et al. (Attenberg, Ipeirotis, and Provost 2015) observed this assumption to hold in practice and leveraged human intuition to find systematic patterns of unknown unknowns.

2. We also assume that the features available in the data can effectively characterize different kinds of unknown unknowns, but the biases in the training data prevented the predictive model from leveraging these discriminating features for prediction. If such features were not available in the data, it would not be possible to utilize the characteristics of previously discovered unknown unknowns to find new ones. Consequently, no learning algorithm would perform better than blind search if this assumption did not hold (no free lunch theorem (Wolpert and Macready 1997)).

Below we discuss our methodology in detail. First, we present Descriptive Space Partitioning (DSP), which induces a similarity-preserving partition on the set X. Then, we present a novel multi-armed bandit algorithm, which we refer to as Bandit for Unknown Unknowns (UUB), for systematically searching for unknown unknowns across these partitions while leveraging feedback from an oracle.

Descriptive Space Partitioning

Our approach exploits the aforementioned intuition that blind spots arising due to systematic biases in the data do not occur at random, but are instead concentrated in specific portions of the feature space. The first step of our approach, DSP, partitions the instances in X such that instances which are grouped together are similar to each other w.r.t. the feature space F and were assigned similar confidence scores by the model M. Partitioning X enables our bandit algorithm, UUB, to discover regions with high concentrations of unknown unknowns.

Algorithm 1: Greedy Algorithm for Partitioning
1: Input: Set of instances X, confidence scores S, patterns Q, metric functions {g1, ..., g5}, weights λ
2: Procedure:
3: P = ∅, E = X
4: while E ≠ ∅ do:
5:   p = argmax_{q ∈ Q} |E ∩ covered_by(q)| / g(q), where g(q) = λ1·g1(q) − λ2·g2(q) + λ3·g3(q) − λ4·g4(q) + λ5·g5(q)
6:   P = P ∪ {p}, Q = Q \ {p}, E = E \ covered_by(p)
7: end while
8: return P

The intuition behind our partitioning approach is that two instances a and a′ ∈ X are likely to be judged using a similar logic by model M if they share similar feature values and are assigned to the same class c with comparable confidence scores by M. In such cases, if a is identified as an unknown unknown, a′ is likely to be an unknown unknown as well.¹ Based on this intuition, we propose an objective function which encourages grouping of instances in X that are similar w.r.t. the criteria outlined above, and facilitates separation of dissimilar instances. The proposed objective also associates a concise, comprehensible description with each partition, which is useful for understanding the exploration behavior of our framework and the kinds of unknown unknowns of M (details in the Experimental Evaluation section).

¹ Note that this is not always the case, as we will see in the next section.

DSP takes as input a set of candidate patterns Q = {q1, q2, ...}, where each qi is a conjunction of (feature, operator, value) tuples with operator ∈ {=, ≠, ≤, <, ≥, >}. Such patterns can be obtained by running an off-the-shelf frequent pattern mining algorithm such as Apriori (Agrawal, Srikant, and others 1994) on X. Each pattern covers a set of one or more instances in X. For each pattern q, the set of instances that satisfy q is represented by covered_by(q), the centroid of such instances is denoted by x_q, and their mean confidence score is s_q.
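The following small sketch illustrates one possible pattern representation and coverage check, assuming each instance is a dict of feature values and each pattern is a list of (feature, operator, value) tuples; the representation is illustrative, not prescribed by the paper.

```python
import operator as op

# Map the pattern operators used above to Python comparison functions.
OPS = {"=": op.eq, "!=": op.ne, "<=": op.le, "<": op.lt, ">=": op.ge, ">": op.gt}

def covered_by(pattern, instances):
    """Return the instances that satisfy every (feature, operator, value) tuple."""
    def satisfies(x):
        return all(OPS[o](x[f], v) for (f, o, v) in pattern)
    return [x for x in instances if satisfies(x)]

# e.g., a hypothetical pattern describing mostly-white images without whiskers
example_pattern = [("color", "=", "white"), ("has_whiskers", "=", False)]
```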

The partitioning objective minimizes dissimilarities of instances within each partition and maximizes them across partitions. In particular, we define the goodness of each pattern q in Q as the combination of the following metrics, where d and d′ are standard distance measures defined over feature vectors of instances and their confidence scores, respectively:

Intra-partition feature distance: g_1(q) = \sum_{x \in covered\_by(q)} d(x, x_q)

Inter-partition feature distance: g_2(q) = \sum_{x \in covered\_by(q)} \sum_{q' \in Q:\, q' \neq q} d(x, x_{q'})

Intra-partition confidence score distance: g_3(q) = \sum_{s_i:\, x_i \in covered\_by(q)} d'(s_i, s_q)

Inter-partition confidence score distance: g_4(q) = \sum_{s_i:\, x_i \in covered\_by(q)} \sum_{q' \in Q:\, q' \neq q} d'(s_i, s_{q'})

Pattern length: g_5(q) = size(q), the number of (feature, operator, value) tuples in pattern q, included to favor concise descriptions.
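As a sketch of how the combined score g(q) from Algorithm 1 might be computed, assuming Euclidean distance for d, absolute difference for d′, and precomputed centroids and mean confidences for all candidate patterns; all names are illustrative.

```python
import numpy as np

def pattern_goodness(q, covered_X, covered_S, centroids, mean_confs, lam):
    """q: the pattern, as a tuple of (feature, operator, value) tuples (hashable);
    covered_X: numpy feature vectors covered by q; covered_S: their confidence
    scores; centroids[p] / mean_confs[p]: centroid and mean confidence of every
    candidate pattern p; lam: the weights (lambda_1, ..., lambda_5)."""
    x_q, s_q = centroids[q], mean_confs[q]
    others = [p for p in centroids if p != q]

    g1 = sum(np.linalg.norm(x - x_q) for x in covered_X)
    g2 = sum(np.linalg.norm(x - centroids[p]) for x in covered_X for p in others)
    g3 = sum(abs(s - s_q) for s in covered_S)
    g4 = sum(abs(s - mean_confs[p]) for s in covered_S for p in others)
    g5 = len(q)  # number of (feature, operator, value) tuples in the pattern

    l1, l2, l3, l4, l5 = lam
    return l1 * g1 - l2 * g2 + l3 * g3 - l4 * g4 + l5 * g5
```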

Given the set of instances X, the corresponding confidence scores S, a collection of patterns Q, and a weight vector λ used to combine g1 through g5, our goal is to find a set of patterns P ⊆ Q such that it covers all the points in X and minimizes the following objective:

\min \sum_{q \in Q} f_q \bigl( \lambda_1 g_1(q) - \lambda_2 g_2(q) + \lambda_3 g_3(q) - \lambda_4 g_4(q) + \lambda_5 g_5(q) \bigr) \qquad (2)

s.t. \sum_{q:\, x \in covered\_by(q)} f_q \geq 1 \;\; \forall x \in X, \qquad f_q \in \{0, 1\} \;\; \forall q \in Q

where f_q corresponds to an indicator variable associated with pattern q which determines if the pattern q has been added to the solution set (f_q = 1) or not (f_q = 0).

The aforementioned formulation is identical to that of a weighted set cover problem, which is NP-hard (Johnson 1974). It has been shown that a greedy solution provides a ln N approximation to the weighted set cover problem (Johnson 1974; Feige 1998), where N is the size of the search space. Algorithm 1 applies a similar strategy which greedily selects patterns with maximum coverage-to-weight ratio at each step, thus resulting in a ln N approximation guarantee. This process is repeated until no instance in X is left uncovered. If an instance in X is covered by multiple partitions, ties are broken by assigning it to the partition with the closest centroid.
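A sketch of the greedy selection step, assuming coverage[q] holds the set of instance indices covered by pattern q and goodness[q] the combined score g(q) (assumed positive here); names are illustrative.

```python
def greedy_partition(coverage, goodness, n_points, eps=1e-9):
    """Greedy weighted set cover as in Algorithm 1 (ln N approximation)."""
    uncovered = set(range(n_points))
    remaining = set(coverage)
    selected = []
    while uncovered and remaining:
        # pick the pattern with the best coverage-to-weight ratio
        best = max(remaining,
                   key=lambda q: len(uncovered & coverage[q]) / max(goodness[q], eps))
        if not uncovered & coverage[best]:
            break  # no remaining pattern covers an uncovered point
        selected.append(best)
        uncovered -= coverage[best]
        remaining.discard(best)
    return selected
```

Instances covered by several selected patterns would then be assigned to the pattern with the closest centroid, as described above.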

Our partitioning approach is inspired by a class of clustering techniques commonly referred to as conceptual clustering (Michalski and Stepp 1983; Fisher 1987) or descriptive clustering (Weiss 2006; Li, Peng, and Wu 2008; Kim, Rudin, and Shah 2014; Lakkaraju and Leskovec 2016).

Algorithm 2: Explore-Exploit Algorithm for Unknown Unknowns
1: Input:
2: Set of partitions (arms) {1, 2, ..., K}, oracle o, budget B
3: Procedure:
4: for t from 1 to B do:
5:   if t ≤ K then:
6:     Choose arm At = t
7:   else
8:     Choose arm At = argmax_{1 ≤ i ≤ K} [u_t(i) + b_t(i)]
9:   end if
10:  Sample an instance x(t) from partition p_{At} and query the oracle for its true label
11:  Observe the true label of x(t) and the cost of querying the oracle, and compute u(x(t)) using Equation (1)
12: end for
13: return \sum_{t=1}^{B} u(x(t))

We make the following contributions to this line of research: we propose a novel objective function whose components have not been jointly considered before, and, in contrast to previous solutions which employ post-processing techniques or use Bayesian frameworks, we propose a simple yet elegant solution which offers theoretical guarantees.

Multi-armed Bandit for Unknown Unknowns

The output of the first step of our approach, DSP, is a set of K partitions P = {p1, p2, ..., pK} such that each pj corresponds to a set of data points which are similar w.r.t. the feature space F and have been assigned similar confidence scores by the model M. The partitioning scheme, however, does not guarantee that all data points in a partition share the same characteristic of being an unknown unknown (or not being an unknown unknown). It is important to note that sharing similar feature values and confidence scores does not ensure that the data points in a partition are indistinguishable as far as the model logic is concerned. This is due to the fact that the model M is a black box and we do not actually observe the underlying functional forms and/or feature importance weights being used by M. Consequently, each partition has an unobservable concentration of unknown unknown instances. The goal of the second step of our approach is to compute an exploration policy over the partitions generated by DSP such that it maximizes the cumulative utility of the discovery of unknown unknowns (as defined in the Problem Formulation section).

We formalize this problem as a multi-armed bandit problem and propose an algorithm for deciding which partition to query at each step (see Algorithm 2). In this formalization, each partition pj corresponds to an arm j of the bandit. At each step, the algorithm chooses a partition and then randomly samples a data point from that partition without replacement and queries its true label from the oracle. Since querying the data point reveals whether it is an unknown unknown, the point is excluded from future steps.
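A sketch of the loop in Algorithm 2, parameterized by an arm-index function (for instance the UUB index sketched at the end of this section). Here `partitions` maps an arm id to the list of not-yet-queried points in that partition and `oracle(x)` returns (true_label, cost); all names are illustrative.

```python
import random

def explore_unknown_unknowns(partitions, oracle, critical_class, budget,
                             arm_index, gamma=0.2):
    history = {i: [] for i in partitions}    # per-arm list of (step, utility)
    arms = list(partitions)
    total_utility = 0.0
    for t in range(1, budget + 1):
        candidates = [i for i in arms if partitions[i]]   # non-empty partitions
        if not candidates:
            break
        if t <= len(arms):
            a = arms[t - 1]                   # play every arm once first
            if not partitions[a]:
                continue
        else:
            a = max(candidates, key=lambda i: arm_index(i, t, history))
        # sample a point without replacement and query the oracle for its label
        x = partitions[a].pop(random.randrange(len(partitions[a])))
        true_label, cost = oracle(x)
        u = (1.0 if true_label != critical_class else 0.0) - gamma * cost  # Eqn. (1)
        history[a].append((t, u))
        total_utility += u
    return total_utility, history
```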

In the first K steps, the algorithm samples a point from each partition. Then, at each step t, the exploration decisions are guided by a combination of u_t(i), the empirical mean utility (reward) of partition i at time t, and b_t(i), which represents the uncertainty over the estimate of u_t(i).

Our problem setting has the characteristic that the expected utility of each arm is non-stationary; querying a data point from a partition changes the concentration of unknown unknowns in the partition and consequently changes the expected utility of that partition in future steps. Therefore, stationary MAB algorithms such as UCB (Auer, Cesa-Bianchi, and Fischer 2002) are not suitable. A variant of UCB, discounted UCB, addresses non-stationary settings and can be used as follows to compute u_t(i) and b_t(i) (Garivier and Moulines 2008):

u_t(i) = \frac{1}{N_t(\vartheta^i_t, i)} \sum_{j=1}^{t} \vartheta^i_{j,t} \, u(x(j)) \, \mathbb{1}_{\{A_j = i\}}

b_t(i) = \sqrt{\frac{2 \log \sum_{k=1}^{K} N_t(\vartheta^k_t, k)}{N_t(\vartheta^i_t, i)}}, \qquad N_t(\vartheta^i_t, i) = \sum_{j=1}^{t} \vartheta^i_{j,t} \, \mathbb{1}_{\{A_j = i\}}

The main idea of discounted UCB is to weight recent observations more to account for the non-stationary nature of the utility function. If ϑ^i_{j,t} denotes the discounting factor applied at time t to the reward obtained from arm i at time j < t, then ϑ^i_{j,t} = γ^{t−j} in the case of discounted UCB, where γ ∈ (0, 1). Garivier et al. established a lower bound on the regret in the presence of abrupt changes in the reward distributions of the arms and also showed that discounted UCB matches this lower bound up to a logarithmic factor (Garivier and Moulines 2008).

The discounting factor of discounted UCB is designed to handle arbitrary changes in the utility distribution, whereas the way the utility of a partition changes in our setting has a certain structure: the utility estimate of arm i only changes by a bounded quantity when the arm is queried. Using this observation, we can customize the calculation of ϑ^i_{j,t} for our setting and eliminate the need to set the value of γ, which affects the quality of decisions made by discounted UCB. We compute ϑ^i_{j,t} as the ratio of the number of data points remaining in partition i at time t to the number of data points remaining in partition i at time j:

\vartheta^i_{j,t} = \frac{N_i - \sum_{l=1}^{t} \mathbb{1}_{\{A_l = i\}}}{N_i - \sum_{l=1}^{j} \mathbb{1}_{\{A_l = i\}}} \qquad (3)

The value of ϑ^i_{j,t} is inversely proportional to the number of pulls of arm i during the interval (j, t). ϑ^i_{j,t} is 1 if arm i is not pulled during this interval, indicating that the expected utility of i remained unchanged. We refer to the version of Algorithm 2 that uses the discounting factor specific to our setting (Eqn. 3) as Bandit for Unknown Unknowns (UUB).
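A sketch of the UUB index (discounted empirical mean plus exploration bonus) using the discount factor of Equation (3). `history[i]` is the list of (step, utility) pairs observed for arm i so far and `sizes[i]` the initial number of points N_i in partition i; the sketch assumes every arm has been pulled at least once and that no partition is completely exhausted.

```python
import math

def uub_index(i, t, history, sizes):
    def pulls_up_to(arm, step):
        return sum(1 for (s, _) in history[arm] if s <= step)

    def discount(arm, j):
        # Eqn. (3): points remaining in the partition at time t over those at time j
        return (sizes[arm] - pulls_up_to(arm, t)) / (sizes[arm] - pulls_up_to(arm, j))

    # discounted pull count and discounted mean utility for arm i
    n_i = sum(discount(i, j) for (j, _) in history[i])
    mean_i = sum(discount(i, j) * u for (j, u) in history[i]) / n_i

    # the exploration bonus uses the total discounted pull count across all arms
    n_total = sum(discount(a, j) for a in history for (j, _) in history[a])
    return mean_i + math.sqrt(2.0 * math.log(n_total) / n_i)
```

Bound to the initial partition sizes, e.g. via `functools.partial(uub_index, sizes=initial_sizes)`, this index could be passed as the `arm_index` argument of the loop sketched earlier.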

Experimental Evaluation

We now present details of the experimental evaluation of the constituent components of our framework as well as the entire pipeline.

Datasets and Nature of Biases: We evaluate the performance of our methodology across four different data sets in which the underlying cause of unknown unknowns varies from biases in training data to domain adaptation:

(1) Sentiment Snippets: A collection of 10K sentiment snippets/sentences expressing opinions on various movies (Pang and Lee 2005). Each snippet (sentence) corresponds to a data point and is labeled as positive or negative. We split the data equally into train and test sets. We then bias the training data by randomly removing sub-groups of negative snippets from it. We consider positive sentiment as the critical class for this data.

(2) Subjectivity: A set of 10K subjective and objective snippets extracted from Rotten Tomatoes webpages (Pang and Lee 2004). We consider the objective class in this dataset as the critical class, split the data equally into train and test sets, and introduce bias in the same way as described above.

(3) Amazon Reviews: A random sample of 50K reviews of books and electronics collected from Amazon (McAuley, Pandey, and Leskovec 2015). We use this data set to study unknown unknowns introduced by domain adaptation; we train the predictive models on the electronics reviews and then test them on the book reviews. Similar to the sentiment snippets data set, positive sentiment is the critical class.

(4) Image Data: A set of 25K cat and dog images (Kaggle 2013). We use this data set to assess whether our framework can recognize unknown unknowns that occur when semantically meaningful sub-groups are missing from the training data. To this end, we split the data equally into train and test and bias the training data such that it comprises only images of dogs which are black, and cats which are not black. We set the class label cat to be the critical class in our experiments.

Experimental Setting: We use bag-of-words features to train the predictive models for all of our textual data sets. As the features for the images, we use super-pixels obtained using standard algorithms (Ribeiro, Singh, and Guestrin 2016). Images are represented with a feature vector comprising 1's and 0's indicating the presence or absence of the corresponding super-pixels. We experiment with multiple predictive models: decision trees, SVMs, logistic regression, random forests and neural networks. Due to space constraints, this section presents results for decision trees as model M, but detailed results for all the other models are included in an online Appendix (Lakkaraju et al. 2016). We set the threshold for confidence scores τ to 0.65 to construct our search space X for each data set. We consider two settings for the cost function (refer to Eqn. 1): the cost is set to 1 for all instances (uniform cost) in the image dataset, and it is set to [(length(x) − minlength)/(maxlength − minlength)] (variable cost) for all textual data, where length(x) denotes the number of words in a snippet (or review) x, and minlength and maxlength denote the minimum and maximum number of words in any given snippet (or review). Note that these cost functions are only available to the oracle. The tradeoff parameter γ is set to 0.2. The parameters of DSP, {λ1, ..., λ5}, are estimated by setting aside 5% of the test instances assigned to the critical class by the predictive models as a validation set. We search the parameter space using coordinate descent to find parameters which result in the minimum value of the objective function defined in Eqn. 2. We set the budget B to 20% of all the instances in the set X throughout our experiments. Further, the results presented for UUB are averaged across 100 runs.

Figure 2: Evaluating partitioning strategies using entropy (smaller values are better).

Evaluating the Partitioning Scheme

The effectiveness of our framework relies on the notion that our partitioning scheme, DSP, creates partitions such that unknown unknowns are concentrated in a specific subset of partitions as opposed to being evenly spread out across them. If unknown unknowns are distributed evenly across all the partitions, our bandit algorithm cannot perform better than a strategy which randomly chooses a partition at each step of the exploration process. We, therefore, measure the quality of partitions created by DSP by measuring the entropy of the distribution of unknown unknowns across the partitions in P. For each partition p ∈ P, we count the number of unknown unknowns, U_p, based on the true labels which are only known to the oracle. We then compute the entropy of P as follows:

Entropy(P) = -\sum_{p \in P} \frac{U_p}{\sum_{p' \in P} U_{p'}} \log_2 \left( \frac{U_p}{\sum_{p' \in P} U_{p'}} \right)

Smaller entropy values are desired as they indicate higher concentrations of unknown unknowns in fewer partitions.
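A sketch of this diagnostic, assuming uu_counts[p] holds the number of unknown unknowns found in partition p (ground truth known only to the oracle) and at least one unknown unknown was found; empty partitions contribute zero by the usual 0·log 0 convention.

```python
import math

def partition_entropy(uu_counts):
    total = sum(uu_counts.values())
    probs = [c / total for c in uu_counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```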

Figure 2 compares the entropy of the partitions generated by DSP with clusters generated by k-means algorithms using only the features in F (kmeans-features), only the confidence scores in S (kmeans-conf), and both (kmeans-both) by first clustering using confidence scores and then using features. The entropy values for DSP are consistently smaller compared to the alternative approaches using k-means across all the datasets. This can be explained by the fact that DSP jointly optimizes inter- and intra-partition distances over both features and confidence scores. As shown in Figure 2, the entropy values are much higher when k-means considers only features or only confidence scores, indicating the importance of jointly reasoning about them.

We also compare the entropy values obtained for DSP as well as the other k-means based approaches to an upper bound computed with random partitioning. For each of the algorithms (DSP and the other k-means based approaches), we designed a corresponding random partitioning scheme which randomly re-assigns all the data points in the set X to partitions while keeping the number of partitions and the number of data points within each partition the same as that of the corresponding algorithm. We observe that the entropy values obtained for DSP and all the other baselines are consistently smaller than those of the corresponding random partitioning schemes. Also, the entropy values for DSP are about 32-37% lower compared to its random counterpart across all of the datasets.

Figure 3: (a) Evaluating the bandit framework on image data, (b) Evaluating the complete pipeline on image data (decision trees as predictive model).

Evaluating the Bandit Algorithm

We measure the performance of our multi-armed bandit algorithm UUB in terms of a standard evaluation metric in the MAB literature called cumulative regret. The cumulative regret of a policy π is computed as the difference between the total reward collected by an optimal policy π*, which at each step plays the arm with the highest expected utility (or reward), and the total reward collected by the policy π. Small values of cumulative regret indicate better policies. The utility function defined in Eqn. 1 determines the reward associated with each instance.

We compare the performance of our algorithm, UUB, with that of several baseline algorithms such as random, greedy, and ε-greedy strategies (Chapelle and Li 2011), UCB, UCBf (Slivkins and Upfal 2008), sliding window UCB, and discounted UCB (Garivier and Moulines 2008) for various values of the discounting factor γ = {0.2, 0.5, 0.8}. All algorithms take as input the partitions created by DSP. Figure 3(a) shows the cumulative regret of each of these algorithms on the image data set. Results for the other data sets can be seen in the Appendix (Lakkaraju et al. 2016). The figure shows that UUB achieves the smallest cumulative regret compared to the other baselines on the image data set. Similarly, UUB is the best performing algorithm on the sentiment snippets and subjectivity snippets data sets, whereas discounted UCB (γ = 0.5) achieves slightly smaller regret than UUB on the Amazon reviews data set. The experiment also highlights a disadvantage of the discounted UCB algorithm, as its performance is sensitive to the choice of the discounting factor γ, whereas UUB is parameter free. Further, both UCB and its variant UCBf, which are designed for stationary and slowly changing reward distributions respectively, have higher cumulative regret than UUB and discounted UCB, indicating that they are not as effective in our setting.

Figure 4: Illustration of the methodology on image data.

Evaluating the Overall Methodology

In the previous section, we compared the performance of UUB to other bandit methods when they are given the same data partitions to explore. In this section, we evaluate the performance of our complete pipeline (DSP + UUB). Due to the lack of existing baselines which address the problem at hand, we compare the performance of our framework to other end-to-end heuristic methods we devised as baselines. Due to space constraints, we present results only for the image dataset. Results for other data sets can be seen in the Appendix (Lakkaraju et al. 2016).

We compare the cumulative regret of our framework to that of a variety of baselines: 1) Random sampling: randomly select B instances from the set X for querying the oracle. 2) Least average similarity: for each instance in X, compute the average Euclidean distance w.r.t. all the data points in the training set and choose the B instances with the largest distance. 3) Least maximum similarity: compute the minimum Euclidean distance of each instance in X from the training set and choose the B instances with the highest distances. 4) Most uncertain: rank the instances in X in increasing order of the confidence scores assigned by the model M and pick the top B instances. The least average similarity and least maximum similarity baselines are related to research on outlier detection (Chandola, Banerjee, and Kumar 2007). Furthermore, the baseline titled most uncertain is similar to the uncertainty sampling query strategy used in the active learning literature. Note that the least average similarity and the least maximum similarity baselines assume access to the data used to train the predictive model, unlike our framework which makes no such assumptions. Figure 3(b) shows the cumulative regret of our framework and the baselines for the image data. It can be seen that UUB achieves the least cumulative regret of all the strategies across all data sets. It is interesting to note that the least average similarity and the least maximum similarity approaches perform worse than UUB in spite of having access to additional information in the form of training data.
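Two of the simpler baselines above can be sketched as follows, assuming numpy feature matrices; `conf` holds the model confidence for each candidate in X, and `train_feats` is available only to the similarity-based baseline, which (unlike our framework) assumes access to the training data.

```python
import numpy as np

def most_uncertain(conf, B):
    return np.argsort(conf)[:B]                    # B lowest-confidence points

def least_average_similarity(X_feats, train_feats, B):
    # average Euclidean distance of each candidate point to the training set
    dists = np.linalg.norm(X_feats[:, None, :] - train_feats[None, :, :], axis=2)
    return np.argsort(dists.mean(axis=1))[-B:]     # B largest average distances
```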

Qualitative Analysis: Figure 4 presents an illustrative example of how our methodology explores three of the partitions generated for the image data set. Our partitioning framework associated the super-pixels shown in the figure with each partition. Examining the super-pixels reveals that partitions 1, 2 and 3 correspond to images of white chihuahuas (dogs), white cats, and brown dogs respectively. The plot shows the number of times the arms corresponding to these partitions have been played by our bandit algorithm. The figure shows that partition 2 is chosen fewer times compared to partitions 1 and 3 because white cat images are part of the training data used by the predictive models and there are not many unknown unknowns in this partition. On the other hand, white and brown dogs are not part of the training data and our bandit algorithm explores these partitions often. Figure 4 also indicates that partition 1 was explored often during the initial plays but not later on. This is because there were fewer data points in that partition and the algorithm had exhausted all of them after a certain number of plays.

Related Work

In this section, we review prior research relevant to the discovery of unknown unknowns.

Unknown Unknowns: The problem of model incompleteness and the challenge of grappling with unknown unknowns in the real world has been coming to the fore as a critical topic in discussions about the utility of AI technologies (Horvitz 2008). Attenberg et al. introduced the idea of harnessing human input to identify unknown unknowns, but their studies left the task of exploration and discovery completely to humans without any assistance (Attenberg, Ipeirotis, and Provost 2015). In contrast, we propose an algorithmic framework in which the role of the oracle is simpler and more realistic: the oracle is only queried for labels of selected instances chosen by our algorithmic framework.

Dataset Shift: A common cause of unknown unknowns is dataset shift, which represents the mismatch between training and test distributions (Quionero-Candela et al. 2009; Jiang and Zhai 2007). Multiple approaches have been proposed to address dataset shift, including importance weighting of training instances based on similarity to the test set (Shimodaira 2000), online learning of prediction models (Cesa-Bianchi and Lugosi 2006), and learning models robust to adversarial actions (Teo et al. 2007; Graepel and Herbrich 2004; Decoste and Scholkopf 2002). These approaches cannot be applied to our setting as they make one or more of the following assumptions which limit their applicability to real-world settings: 1) the model is not a black box, 2) the data used to train the predictive model is accessible, 3) the model can be adaptively retrained. Further, the goal of this work is different as we study the problem of discovering unknown unknowns of models which are already deployed.

Active Learning: Active learning techniques aim to build highly accurate predictive models while requiring fewer labeled instances. These approaches typically involve querying an oracle for labels of certain selected instances and utilizing the obtained labels to adaptively retrain the predictive models (Settles 2009). Various query strategies have been proposed to choose the instances to be labeled (e.g., uncertainty sampling (Lewis and Gale 1994; Settles 2009), query by committee (Seung, Opper, and Sompolinsky 1992), expected model change (Settles, Craven, and Ray 2008), expected error reduction (Zhu, Lafferty, and Ghahramani 2003), expected variance reduction (Zhang and Oles 2000)). Active learning frameworks were designed to be employed during the learning phase of a predictive model and are therefore not readily applicable to our setting, where the goal is to find blind spots of a black-box model which has already been deployed. Furthermore, query strategies employed in active learning are guided towards the discovery of known unknowns, utilizing information from the predictive model to determine which instances should be labeled by the oracle. These approaches are not suitable for the discovery of unknown unknowns, as the model is not aware of unknown unknowns and it lacks meaningful information towards their discovery.

Outlier Detection: Outlier detection involves identifying individual data points (global outliers) or groups of data points (collective outliers) which either do not conform to a target distribution or are dissimilar compared to the majority of the instances in the data (Han, Pei, and Kamber 2011; Chandola, Banerjee, and Kumar 2007). Several parametric approaches (Agarwal 2007; Abraham and Box 1979; Eskin 2000) were proposed to address the problem of outlier detection. These methods made assumptions about the underlying data distribution, and characterized those points with a smaller likelihood of being generated from the assumed distribution as outliers. Non-parametric approaches (Eskin 2000; Eskin et al. 2002; Fawcett and Provost 1997), which make fewer assumptions about the distribution of the data, such as histogram-based methods and distance- and density-based methods, were also proposed to address this problem. Though unknown unknowns of any given predictive model can be regarded as collective outliers w.r.t. the data used to train that model, the aforementioned approaches are not applicable to our setting as we assume no access to the training data.

Discussion & Conclusions

We presented an algorithmic approach to discovering unknown unknowns of predictive models. The approach assumes no knowledge of the functional form or the associated training data of the predictive models, thus allowing the method to be used to build insights about the behavior of deployed predictive models. In order to guide the discovery of unknown unknowns, we partition the search space and then use bandit algorithms to identify partitions with larger concentrations of unknown unknowns. To this end, we propose novel algorithms both for partitioning the search space and for sifting through the generated partitions to discover unknown unknowns.

We see several research directions ahead, including opportunities to employ alternate objective functions. For instance, the budget B could be defined in terms of the total cost of querying the oracle instead of the number of queries to the oracle. Our method can also be extended to more sophisticated settings where the utility of some types of unknown unknowns decreases with time as sufficient examples of the type are discovered (e.g., after informing the engineering team about the discovered problem). In many settings, the oracle can be approximated via the acquisition of labels from crowdworkers, and the labeling noise of the crowd might be addressed by incorporating repeated labeling into our framework.

The discovery of unknown unknowns can help system designers when deploying predictive models in numerous ways. The partitioning scheme that we have explored provides interpretable descriptions of each of the generated partitions. These descriptions could help a system designer to readily understand the characteristics of the discovered unknown unknowns and devise strategies to prevent errors or recover from them (e.g., silencing the model when a query falls into a particular partition where unknown unknowns were discovered previously). Discovered unknown unknowns can further be used to retrain the predictive model, which in turn can recognize its mistakes and even correct them.

Formal machinery that can shine light on the limitations of our models and systems will be critical in moving AI solutions into the open world, especially for high-stakes, safety-critical applications. We hope that this work on an algorithmic approach to identifying unknown unknowns in predictive models will stimulate additional research on incompleteness in our models and systems.

Acknowledgments

Himabindu Lakkaraju carried out this research during an internship at Microsoft Research. The authors would like to thank Lihong Li, Janardhan Kulkarni, and the anonymous reviewers for their insightful comments and feedback.


References

Abraham, B., and Box, G. E. 1979. Bayesian analysis of some outlier problems in time series. Biometrika 66(2):229–236.
Agarwal, D. 2007. Detecting anomalies in cross-classified streams: a Bayesian approach. Knowledge and Information Systems 11(1):29–44.
Agrawal, R.; Srikant, R.; et al. 1994. Fast algorithms for mining association rules. In VLDB.
Attenberg, J.; Ipeirotis, P.; and Provost, F. 2015. Beat the machine: Challenging humans to find a predictive model's unknown unknowns. J. Data and Information Quality 6(1):1:1–1:17.
Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2-3):235–256.
Brennan, T.; Dieterich, W.; and Ehret, B. 2009. Evaluating the predictive validity of the COMPAS risk and needs assessment system. Criminal Justice and Behavior 36(1):21–40.
Cesa-Bianchi, N., and Lugosi, G. 2006. Prediction, Learning, and Games. Cambridge University Press.
Chandola, V.; Banerjee, A.; and Kumar, V. 2007. Outlier detection: A survey.
Chapelle, O., and Li, L. 2011. An empirical evaluation of Thompson sampling. In NIPS, 2249–2257.
Crawford, K. 2016. Artificial intelligence's white guy problem. New York Times. http://www.nytimes.com/2016/06/26/opinion/sunday/artificial-intelligences-white-guy-problem.html.
Decoste, D., and Scholkopf, B. 2002. Training invariant support vector machines. Machine Learning 46(1-3):161–190.
Elkan, C. 2001. The foundations of cost-sensitive learning. In IJCAI.
Eskin, E.; Arnold, A.; Prerau, M.; Portnoy, L.; and Stolfo, S. 2002. A geometric framework for unsupervised anomaly detection. In Applications of Data Mining in Computer Security. Springer. 77–101.
Eskin, E. 2000. Anomaly detection over noisy data using learned probability distributions. In ICML, 255–262.
Fawcett, T., and Provost, F. 1997. Adaptive fraud detection. Data Mining and Knowledge Discovery 1(3):291–316.
Feige, U. 1998. A threshold of ln n for approximating set cover. Journal of the ACM 45(4):634–652.
Fisher, D. H. 1987. Knowledge acquisition via incremental conceptual clustering. Machine Learning 2(2):139–172.
Garivier, A., and Moulines, E. 2008. On upper-confidence bound policies for non-stationary bandit problems. arXiv preprint arXiv:0805.3415.
Graepel, T., and Herbrich, R. 2004. Invariant pattern recognition by semidefinite programming machines. In NIPS, 33.
Han, J.; Pei, J.; and Kamber, M. 2011. Data Mining: Concepts and Techniques. Elsevier.
Horvitz, E. 2008. Artificial intelligence in the open world. Presidential Address, AAAI. http://bit.ly/2gCN7t9.
Jiang, J., and Zhai, C. 2007. A two-stage approach to domain adaptation for statistical classifiers. In CIKM, 401–410.
Johnson, D. S. 1974. Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences 9(3):256–278.
Kaggle. 2013. Dogs vs cats dataset. https://www.kaggle.com/c/dogs-vs-cats/data.
Kim, B.; Rudin, C.; and Shah, J. A. 2014. The Bayesian case model: A generative approach for case-based reasoning and prototype classification. In NIPS, 1952–1960.
Lakkaraju, H., and Leskovec, J. 2016. Confusions over time: An interpretable Bayesian model to characterize trends in decision making. In NIPS, 3261–3269.
Lakkaraju, H.; Kamar, E.; Caruana, R.; and Horvitz, E. 2016. Discovering blind spots of predictive models: Representations and policies for guided exploration. https://arxiv.org/abs/1610.09064.
Lewis, D. D., and Gale, W. A. 1994. A sequential algorithm for training text classifiers. In SIGIR, 3–12.
Li, Z.; Peng, H.; and Wu, X. 2008. A new descriptive clustering algorithm based on nonnegative matrix factorization. In IEEE International Conference on Granular Computing, 407–412.
McAuley, J.; Pandey, R.; and Leskovec, J. 2015. Inferring networks of substitutable and complementary products. In KDD, 785–794.
Michalski, R. S., and Stepp, R. E. 1983. Learning from observation: Conceptual clustering. In Machine Learning: An Artificial Intelligence Approach. Springer. 331–363.
Pang, B., and Lee, L. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In ACL, 271.
Pang, B., and Lee, L. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In ACL, 115–124.
Quionero-Candela, J.; Sugiyama, M.; Schwaighofer, A.; and Lawrence, N. D. 2009. Dataset Shift in Machine Learning. The MIT Press.
Ribeiro, M. T.; Singh, S.; and Guestrin, C. 2016. "Why should I trust you?": Explaining the predictions of any classifier. In KDD.
Settles, B.; Craven, M.; and Ray, S. 2008. Multiple-instance active learning. In NIPS, 1289–1296.
Settles, B. 2009. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin–Madison.
Seung, H. S.; Opper, M.; and Sompolinsky, H. 1992. Query by committee. In COLT, 287–294.
Shimodaira, H. 2000. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference 90(2):227–244.
Slivkins, A., and Upfal, E. 2008. Adapting to a changing environment: the brownian restless bandits. In COLT, 343–354.
Teo, C. H.; Globerson, A.; Roweis, S. T.; and Smola, A. J. 2007. Convex learning with invariances. In NIPS, 1489–1496.
Weiss, D. 2006. Descriptive clustering as a method for exploring text collections. Ph.D. Dissertation.
Wolpert, D. H., and Macready, W. G. 1997. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1(1):67–82.
Zhang, T., and Oles, F. 2000. The value of unlabeled data for classification problems. In ICML, 1191–1198.
Zhu, X.; Lafferty, J.; and Ghahramani, Z. 2003. Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions. In ICML Workshop on the Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining.