From W1-S16

Page 1

From W1-S16

Page 2

Node failure

• The probability that at least one node fails is: f = 1 − (1 − p)^n
• When n = 1, then f = p.
• Suppose p = 0.0001 but n = 10000; then f = 1 − (1 − 0.0001)^10000 ≈ 0.63. [Why/how? Because (1 − p)^n ≈ e^(−np) for small p, so f ≈ 1 − e^(−1) ≈ 0.63.]
• This is one of the most important formulas to know (in general). A numerical check follows this list.
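A minimal numerical check of the formula, as a Python sketch using the slide's values:

    import math

    p = 0.0001   # probability that a single node fails
    n = 10000    # number of nodes

    # Exact: probability that at least one of the n nodes fails
    f = 1 - (1 - p) ** n
    print(f)                      # ~0.632

    # Approximation: (1 - p)^n ~ e^(-np) for small p, so f ~ 1 - 1/e
    print(1 - math.exp(-n * p))   # ~0.632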

From W2-S9

Page 3

Example

• For example, suppose the hash function maps {to, Java, road} to one node. Then:
  – (to, 1) remains (to, 1)
  – (Java, 1); (Java, 1); (Java, 1) → (Java, [1, 1, 1])
  – (road, 1); (road, 1) → (road, [1, 1])
• Now the REDUCE function converts (Java, [1, 1, 1]) → (Java, 3), etc.
• Remember, this is a very simple example… the challenge is to take complex tasks and express them as Map and Reduce! (A word-count sketch follows this list.)
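A minimal word-count sketch of the same grouping-and-reducing step, as pure-Python stand-ins for the map, shuffle, and reduce phases (not an actual MapReduce runtime):

    from collections import defaultdict

    def map_phase(words):
        # Emit a (word, 1) pair for every word, as on the slide.
        return [(word, 1) for word in words]

    def shuffle_phase(pairs):
        # Group values by key: (Java,1);(Java,1);(Java,1) -> (Java, [1, 1, 1])
        grouped = defaultdict(list)
        for key, value in pairs:
            grouped[key].append(value)
        return grouped

    def reduce_phase(grouped):
        # (Java, [1, 1, 1]) -> (Java, 3)
        return {key: sum(values) for key, values in grouped.items()}

    pairs = map_phase(["to", "Java", "Java", "Java", "road", "road"])
    print(reduce_phase(shuffle_phase(pairs)))   # {'to': 1, 'Java': 3, 'road': 2}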

From W2-S15

Page 4

Similarity Example [2]

Notice that it requires some ingenuity to come up with key-value pairs. This is key to using map-reduce effectively.

From W2-S19

Page 5

K-means algorithm

Let C = initial k cluster centroids (often selected randomly)
Mark C as unstable
While <C is unstable>
    Assign all data points to their nearest centroid in C.
    Compute the centroids of the points assigned to each element of C.
    Update C as the set of new centroids.
    Mark C as stable or unstable by comparing with the previous set of centroids.
End While

Complexity: O(nkdI), where n = number of points, k = number of clusters, d = dimension, I = number of iterations. Take-away: the complexity is linear in n. (A runnable sketch follows.)
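Here is a minimal runnable sketch of the loop above, in plain Python (points and centroids are assumed to be tuples; math.dist needs Python 3.8+):

    import math

    def kmeans(points, centroids, max_iters=100):
        # Lloyd's algorithm: centroids is the initial set C from the slide.
        for _ in range(max_iters):
            # Assign every point to its nearest centroid in C.
            clusters = [[] for _ in centroids]
            for point in points:
                nearest = min(range(len(centroids)),
                              key=lambda i: math.dist(point, centroids[i]))
                clusters[nearest].append(point)
            # Compute the centroid of each cluster (keep the old one if empty).
            new_centroids = [
                tuple(sum(c) / len(cluster) for c in zip(*cluster))
                if cluster else centroids[i]
                for i, cluster in enumerate(clusters)
            ]
            # C is stable once the centroids stop moving.
            if new_centroids == centroids:
                break
            centroids = new_centroids
        return centroids, clusters

Each pass of the loop costs O(nkd) (n points, each compared against k centroids in d dimensions), giving the O(nkdI) total above.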

From W3-S14

Page 6

Example: 2 Clusters

[Figure: four points A(−1, 2), B(1, 2), C(−1, −2), D(1, −2), symmetric about the origin (0, 0); the candidate centroids are marked c.]

K-means Problem: the solution is (0, 2) and (0, −2), and the clusters are {A, B} and {C, D}.

K-means Algorithm: suppose the initial centroids are (−1, 0) and (1, 0); then {A, C} and {B, D} end up as the two clusters. (A check of this run follows.)
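Re-running this example with the kmeans sketch from Page 5 (the function name is the one assumed there) shows the algorithm converging immediately to the sub-optimal split:

    points = [(-1, 2), (1, 2), (-1, -2), (1, -2)]   # A, B, C, D
    centroids, clusters = kmeans(points, [(-1.0, 0.0), (1.0, 0.0)])
    print(centroids)   # [(-1.0, 0.0), (1.0, 0.0)] -- the centroids never move
    print(clusters)    # [[A, C], [B, D]], not the optimal {A, B}, {C, D}

This illustrates the slide's point: k-means only converges to a local optimum, so the result depends on the initial centroids.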

From W3-S16

Page 7

Bayes Rule

[Figure: the slide shows Bayes' rule with the prior and posterior labelled. In standard notation: P(H | D) = P(D | H) P(H) / P(D), where P(H) is the prior and P(H | D) is the posterior.]

From W4-S21

Page 8

Example: Iris Flower

• F = Flower; SL = Sepal Length; SW = Sepal Width; PL = Petal Length; PW = Petal Width
• Data: [training table shown on the slide; not recoverable in this preview]
• Query: SL = Large, SW = Small, PL = Medium, PW = Small, F = ?: compute the posterior for each flower class and choose the maximum. (A sketch follows.)
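A minimal naive Bayes sketch for this kind of categorical query. The training rows below are made up for illustration, since the slide's actual table is not recoverable here; the method itself (prior times the product of per-feature likelihoods, then choose the maximum) is the one from the slide:

    from collections import Counter, defaultdict

    # Hypothetical training data: (SL, SW, PL, PW) -> flower class F.
    data = [
        (("Large", "Small", "Medium", "Small"), "versicolor"),
        (("Small", "Large", "Small", "Small"), "setosa"),
        (("Large", "Small", "Large", "Large"), "virginica"),
        (("Small", "Large", "Small", "Small"), "setosa"),
    ]

    class_counts = Counter(label for _, label in data)
    feature_counts = defaultdict(int)   # (feature index, value, class) -> count
    for features, label in data:
        for i, value in enumerate(features):
            feature_counts[(i, value, label)] += 1

    def score(features, label):
        # P(F) * prod_i P(feature_i | F); the shared denominator P(data) cancels.
        # (Real uses add Laplace smoothing so zero counts don't zero the score.)
        result = class_counts[label] / len(data)
        for i, value in enumerate(features):
            result *= feature_counts[(i, value, label)] / class_counts[label]
        return result

    query = ("Large", "Small", "Medium", "Small")
    print(max(class_counts, key=lambda label: score(query, label)))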

From W4-S25

Page 9

Confusion Matrix

                       Actual Label (1)       Actual Label (-1)
Predicted Label (1)    True Positives (N1)    False Positives (N2)
Predicted Label (-1)   False Negatives (N3)   True Negatives (N4)

Label 1 is called Positive; label -1 is called Negative.

Let the number of test samples be N, so N = N1 + N2 + N3 + N4.

True Positive Rate (TPR) = N1 / (N1 + N3)
True Negative Rate (TNR) = N4 / (N4 + N2)
False Positive Rate (FPR) = N2 / (N2 + N4)
False Negative Rate (FNR) = N3 / (N1 + N3)
Accuracy = (N1 + N4) / (N1 + N2 + N3 + N4)
Precision = N1 / (N1 + N2)
Recall = N1 / (N1 + N3)

(A small sketch computing these follows.)
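A small sketch computing these rates directly from the four counts (plain Python; the argument names mirror the table, and the counts in the call are illustrative):

    def confusion_metrics(n1, n2, n3, n4):
        # n1 = TP, n2 = FP, n3 = FN, n4 = TN, exactly as in the table above.
        return {
            "TPR / recall": n1 / (n1 + n3),
            "TNR": n4 / (n4 + n2),
            "FPR": n2 / (n2 + n4),
            "FNR": n3 / (n1 + n3),
            "accuracy": (n1 + n4) / (n1 + n2 + n3 + n4),
            "precision": n1 / (n1 + n2),
        }

    print(confusion_metrics(n1=3, n2=2, n3=1, n4=3))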

From W5-S7

Page 10

ROC (Receiver Operating Characteristic) Curves

• Generally a learning algorithm A will return a real number… but what we want is a label {1 or -1}.
• We can apply a threshold T: predict 1 when the score is at least T, and -1 otherwise.

A (score):    0.7   0.6   0.5   0.2   0.1   0.09  0.08  0.02  0.01
True label:     1     1    -1    -1     1     1    -1    -1    -1
T = 0.1:        1     1     1     1     1    -1    -1    -1    -1   →  TPR = 3/4, FPR = 2/5
T = 0.2:        1     1     1     1    -1    -1    -1    -1    -1   →  TPR = 2/4, FPR = 2/5

Each threshold yields one (FPR, TPR) point; sweeping T traces out the ROC curve. (A sketch recomputing these points follows.)
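A minimal sketch that recomputes these two ROC points from the scores and labels above (plain Python; it predicts 1 when score >= T):

    scores = [0.7, 0.6, 0.5, 0.2, 0.1, 0.09, 0.08, 0.02, 0.01]
    labels = [1, 1, -1, -1, 1, 1, -1, -1, -1]

    def roc_point(threshold):
        preds = [1 if s >= threshold else -1 for s in scores]
        tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
        fp = sum(p == 1 and y == -1 for p, y in zip(preds, labels))
        return tp / labels.count(1), fp / labels.count(-1)   # (TPR, FPR)

    print(roc_point(0.1))   # (0.75, 0.4) -> TPR = 3/4, FPR = 2/5
    print(roc_point(0.2))   # (0.5, 0.4)  -> TPR = 2/4, FPR = 2/5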

From W5-S9

Page 11

Random Variable

• A random variable X can take values in a set which is:
  – Discrete and finite. Example: toss a coin and let X = 1 if it is a head and X = 0 if it is a tail; X is a random variable.
  – Discrete and infinite (countable). Example: let X be the number of accidents in Sydney in a day; then X = 0, 1, 2, …
  – Infinite (uncountable). Example: let X be the height of a Sydney-sider; X could be 150, 150.11, 150.112, …

From W5-S13

Page 12

From W7-S2

These slides are from Pang-Ning Tan, Steinbach and Kumar.

Page 13

From W7-S7

Page 14

From W7-S8

Page 15

From W9-S9

Page 16

From W9-S12

Page 17

From W9-S21

Page 18

From W9-S26

Page 19

The Key Idea

• Decompose the User × Movie rating matrix R into:
• User × Movie = (User × Genre) × (Genre × Movie)
  – The number of genres is typically small.
• Or: R ≈ UV
• Find U and V such that ||R − UV|| is minimized…
  – Almost like k-means clustering… why? (A gradient-descent sketch follows this list.)
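A minimal sketch of minimizing ||R − UV|| by gradient descent (NumPy; the matrix sizes, learning rate, and iteration count are illustrative assumptions, not values from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_movies, k = 5, 4, 2             # k plays the role of "number of genres"
    R = rng.integers(1, 6, size=(n_users, n_movies)).astype(float)

    U = rng.normal(scale=0.1, size=(n_users, k))    # user x genre
    V = rng.normal(scale=0.1, size=(k, n_movies))   # genre x movie

    lr = 0.01
    for _ in range(2000):
        E = R - U @ V                                   # residual matrix
        U, V = U + lr * (E @ V.T), V + lr * (U.T @ E)   # gradient steps on ||R - UV||^2

    print(np.linalg.norm(R - U @ V))   # error shrinks toward the best rank-k fit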

From W11-S9

Page 20

UV Computation…

From W11-S15

This example is from Rajaraman, Leskovec and Ullman: see the textbook.