Sponsored Search Acution Design Via Machine Learning

Maria-Florina Balcan04/22/2019

Semi-Supervised Learning

Readings: • Semi-Supervised Learning. Encyclopedia of Machine

Learning. Jerry Zhu, 2010

• Combining Labeled and Unlabeled Data with Co-Training. Avrim Blum, Tom Mitchell. COLT 1998.

Labeled Examples

Fully Supervised Learning

Learning Algorithm

Expert / Oracle

Data Source

Alg.outputs

Distribution D on X

c* : X ! Y

(x1,c*(x1)),…, (xm,c*(xm))

h : X ! Yx1 > 5

x6 > 2

+1 -1

+1

+

-

+++

--

--

-

Labeled Examples

Learning Algorithm

Expert / Oracle

Data Source

Alg.outputs

Distribution D on X

c* : X ! Y

(x1,c*(x1)),…, (xm,c*(xm))

h : X ! Y

Goal: h has small error over D.

errD h = Prx~ D

(h x ≠ c∗(x))

Sl={(x1, y1), …,(xml, yml)}

xi drawn i.i.d from D, yi = c∗(xi)

Fully Supervised Learning

Classic Paradigm Insufficient NowadaysModern applications: massive amounts of raw data.

Only a tiny fraction can be annotated by human experts.

Billions of webpages ImagesProtein sequences

http://images.google.com/imgres?imgurl=http://www.teachenglishinasia.net/files/u2/purple_lotus_flower.jpg&imgrefurl=http://www.teachenglishinasia.net/asiablog/asian-water-lilies-and-lotus-flowers&usg=__jQMsElfDOQGWm-hebjVtJDqL-40=&h=335&w=500&sz=28&hl=en&start=3&sig2=hmb0FR5kLCXdU2BdwjCNcA&zoom=1&itbs=1&tbnid=r_JpUcIxETAUoM:&tbnh=87&tbnw=130&prev=/images?q=flower&hl=en&gbv=2&tbs=isch:1&ei=Is1yTcKaOcGAlAe0wqGmAQ

http://images.google.com/imgres?imgurl=http://www.soccerballworld.com/images/Ball-UEFA-FINALE.jpg&imgrefurl=http://www.soccerballworld.com/UEFA-Champions-Finale.htm&usg=__IkP2FDq2F5yEyjqW0wDIthrifzc=&h=300&w=300&sz=15&hl=en&start=5&sig2=Sw_H6_M6TzTbVg5PqjP6lw&zoom=1&itbs=1&tbnid=E1Xl3YAzGDlvgM:&tbnh=116&tbnw=116&prev=/images?q=ball&hl=en&gbv=2&tbs=isch:1&ei=-8xyTbz0OISclge849RL

http://images.google.com/imgres?imgurl=http://ceoworld.biz/ceo/wp-content/uploads/2009/07/microsoft_logo.jpg&imgrefurl=http://ceoworld.biz/ceo/2009/07/13/microsoft-to-sell-razorfish-wpp-omnicom-publicis-groupe-interpublic-group-and-dentsu-possible-bidder&usg=__D-yYVRjhhUHbi2sYkh2hyP_5EK4=&h=299&w=300&sz=5&hl=en&start=2&sig2=4orOiNNbbV2i6Ii7SFMdmg&zoom=1&itbs=1&tbnid=Sk8jPaSR61onuM:&tbnh=116&tbnw=116&prev=/images?q=microsoft&hl=en&gbv=2&tbs=isch:1&ei=1MtyTbH9IYOglAeJrq2bAQ

http://images.google.com/imgres?imgurl=http://www.treehugger.com/ec-rnd-005.jpg&imgrefurl=http://www.treehugger.com/files/2008/07/17-electric-cars-overview-2005-to-2008.php&usg=__4G2QjNYXIHEYVXE6DoqIE5zzzrY=&h=322&w=468&sz=40&hl=en&start=9&sig2=P7jmw1vmKb715x8CELHezQ&zoom=1&itbs=1&tbnid=iGci7mkzT2KN2M:&tbnh=88&tbnw=128&prev=/images?q=car&hl=en&gbv=2&tbs=isch:1&ei=qMtyTcilGIOClAey9IGqCw

http://images.google.com/imgres?imgurl=http://images.nymag.com/news/features/newface080811_lede_560.jpg&imgrefurl=http://nymag.com/news/features/48948/&usg=__cimnczuBny94AVQhDb04KyNUIm4=&h=482&w=560&sz=50&hl=en&start=11&sig2=4wGA_vH60nBqlHqvekl3lA&zoom=1&itbs=1&tbnid=gg1XN95D7aTn3M:&tbnh=114&tbnw=133&prev=/images?q=face&hl=en&gbv=2&tbs=isch:1&ei=dMtyTbuKL8OqlAecwcFN

http://images.google.com/imgres?imgurl=http://www.classicsavers.com/casablanca.jpg&imgrefurl=http://www.classicsavers.com/Casablanca.html&h=600&w=800&sz=72&tbnid=wSXOd5UUibIJ:&tbnh=106&tbnw=141&start=3&prev=/images?q=casablanca&hl=en&lr=&ie=UTF-8


http://images.google.com/imgres?imgurl=http://www.flash-slideshow-maker.com/images/help_clip_image020.jpg&imgrefurl=http://www.flash-slideshow-maker.com/flash-images/&usg=__jMIGXrfx-_pXlchniUcjT2UYT9k=&h=422&w=554&sz=35&hl=en&start=1&sig2=YCfTn6kxhcufspbxJdXhrQ&zoom=1&itbs=1&tbnid=_ruOuQ2EjL3wOM:&tbnh=101&tbnw=133&prev=/images?q=images&hl=en&sa=X&gbv=2&tbas=0&tbs=isch:1&ei=n8hyTej5N8X_lgffrdVF

http://images.google.com/imgres?imgurl=http://www.dynamicdrive.com/dynamicindex4/lightbox2/flower.jpg&imgrefurl=http://www.dynamicdrive.com/dynamicindex4/lightbox2/index.htm&usg=__6G24ltZqq0UGAerNdiENN4jOqqQ=&h=509&w=500&sz=46&hl=en&start=4&sig2=P_pNfXebhTzd8n0DrpgNhw&zoom=1&itbs=1&tbnid=--IQZoA0t4jdOM:&tbnh=131&tbnw=129&prev=/images?q=images&hl=en&sa=X&gbv=2&tbas=0&tbs=isch:1&ei=n8hyTej5N8X_lgffrdVF

http://images.google.com/imgres?imgurl=http://www.nrl.navy.mil/content_images/clem.jpg&imgrefurl=http://www.nrl.navy.mil/media/popular-images/&usg=__Cj15wZbDIqpRCeVmO7SsP5XIOzM=&h=1213&w=1720&sz=1347&hl=en&start=3&sig2=TRnaOP-O9wUtTBwS0_vDZA&zoom=1&itbs=1&tbnid=EprKgSAflWdsYM:&tbnh=106&tbnw=150&prev=/images?q=images&hl=en&sa=X&gbv=2&tbas=0&tbs=isch:1&ei=n8hyTej5N8X_lgffrdVF

http://images.google.com/imgres?imgurl=http://hirise.lpl.arizona.edu/HiBlog/wp-content/uploads/2008/03/mex_phobos.jpg&imgrefurl=http://hirise.lpl.arizona.edu/HiBlog/category/hirise/news-events/special-images/&usg=__DLojkIcAQfQzGQ4oBrU4aT3OcHI=&h=400&w=400&sz=29&hl=en&start=15&sig2=cqA6Gj5KQHXSZIuOwlp0iQ&zoom=1&itbs=1&tbnid=QIVdyHeoWfEyFM:&tbnh=124&tbnw=124&prev=/images?q=images&hl=en&sa=X&gbv=2&tbas=0&tbs=isch:1&ei=n8hyTej5N8X_lgffrdVF

http://images.google.com/imgres?imgurl=http://www.mabima.net/company/images/Tucker_House_2003_640.JPG&imgrefurl=http://www.mabima.net/company/&usg=__yK2ELUzEORpde1XjSFJTrRDtFxc=&h=426&w=640&sz=69&hl=en&start=4&sig2=Q0e47SnpfYjO5jf5DQQk-Q&zoom=1&itbs=1&tbnid=GD54wp6WHlrhMM:&tbnh=91&tbnw=137&prev=/images?q=house&hl=en&gbv=2&tbs=isch:1&ei=KMtyTZnAE8aqlAegju1U

http://images.google.com/imgres?imgurl=http://www.popsci.com/files/imagecache/article_image_large/articles/ie2.jpg&imgrefurl=http://www.popsci.com/gadgets/article/2010-08/microsoft-fight-between-revenue-and-privacy-money-1-privacy-zero&usg=__wfaTvyWsRtJGRnAP5yJyb-kp3u0=&h=333&w=500&sz=22&hl=en&start=1&sig2=PTP24GqOtv3iVaxKrsl8dg&zoom=1&itbs=1&tbnid=_nE8aJVhvxfpbM:&tbnh=87&tbnw=130&prev=/images?q=internet&hl=en&gbv=2&tbs=isch:1&ei=FsxyTaDEPMWAlAePlcFS

Modern applications: massive amounts of raw data.

Modern ML: New Learning Approaches

Expert

• Semi-supervised Learning, (Inter)active Learning.

Techniques that best utilize data, minimizing need for expert/human intervention.

Paradigms where there has been great progress.

Active Learning

face

O

O

O

Expert Labeler

raw data

Classifier

not face


http://images.google.com/imgres?imgurl=http://www.socksoff.co.uk/00001/page05/Lone_Tree_1600.jpg&imgrefurl=http://www.hongkiat.com/blog/100-absolutely-beautiful-nature-wallpapers-for-your-desktop/&usg=__dsyO8OzB7rmE16btYKAa2WrNJtQ=&h=1200&w=1600&sz=693&hl=en&start=9&sig2=N22ozJ4mtT9DuMR9P-wNOg&zoom=1&itbs=1&tbnid=UU66VMJn9kPePM:&tbnh=113&tbnw=150&prev=/images?q=tree&hl=en&gbv=2&tbs=isch:1&ei=YsxyTearFsSclge1p_hf





http://images.google.com/imgres?imgurl=http://www.wallpaperbase.com/wallpapers/animals/fish/fish_4.jpg&imgrefurl=http://www.wallpaperbase.com/animals-fish.shtml&usg=__2GMQzBfy-zcwD5e3ceBpUaGJGsE=&h=768&w=1024&sz=115&hl=en&start=6&sig2=86Ql8saH12Fk6g3FoyAagw&zoom=1&itbs=1&tbnid=vPHzH3MqjP6xOM:&tbnh=113&tbnw=150&prev=/images?q=fish&hl=en&gbv=2&tbs=isch:1&ei=TstyTb3AB8Oclget89ChAw






POLL

Expert Labeler


raw data

face not face

Labeled data

Classifier

http://images.google.com/imgres?imgurl=http://static.howstuffworks.com/gif/house-selling-1.jpg&imgrefurl=http://home.howstuffworks.com/real-estate/house-selling.htm&usg=__qnE74MTnGIEKpohCSP6iTM9E8EA=&h=327&w=400&sz=24&hl=en&start=3&sig2=G6Bq69GdXS9x6hMpDk3HfA&zoom=1&itbs=1&tbnid=3fiW8K30_ygY1M:&tbnh=101&tbnw=124&prev=/images?q=house&hl=en&gbv=2&tbs=isch:1&ei=RF9sTcWkOsL88AbtremUBQ


http://images.google.com/imgres?imgurl=http://static.howstuffworks.com/gif/house-selling-1.jpg&imgrefurl=http://home.howstuffworks.com/real-estate/house-selling.htm&usg=__qnE74MTnGIEKpohCSP6iTM9E8EA=&h=327&w=400&sz=24&hl=en&start=3&sig2=G6Bq69GdXS9x6hMpDk3HfA&zoom=1&itbs=1&tbnid=3fiW8K30_ygY1M:&tbnh=101&tbnw=124&prev=/images?q=house&hl=en&gbv=2&tbs=isch:1&ei=RF9sTcWkOsL88AbtremUBQ


http://images.google.com/imgres?imgurl=http://www.socksoff.co.uk/00001/page05/Lone_Tree_1600.jpg&imgrefurl=http://www.hongkiat.com/blog/100-absolutely-beautiful-nature-wallpapers-for-your-desktop/&usg=__dsyO8OzB7rmE16btYKAa2WrNJtQ=&h=1200&w=1600&sz=693&hl=en&start=9&sig2=N22ozJ4mtT9DuMR9P-wNOg&zoom=1&itbs=1&tbnid=UU66VMJn9kPePM:&tbnh=113&tbnw=150&prev=/images?q=tree&hl=en&gbv=2&tbs=isch:1&ei=YsxyTearFsSclge1p_hf





http://images.google.com/imgres?imgurl=http://www.wallpaperbase.com/wallpapers/animals/fish/fish_4.jpg&imgrefurl=http://www.wallpaperbase.com/animals-fish.shtml&usg=__2GMQzBfy-zcwD5e3ceBpUaGJGsE=&h=768&w=1024&sz=115&hl=en&start=6&sig2=86Ql8saH12Fk6g3FoyAagw&zoom=1&itbs=1&tbnid=vPHzH3MqjP6xOM:&tbnh=113&tbnw=150&prev=/images?q=fish&hl=en&gbv=2&tbs=isch:1&ei=TstyTb3AB8Oclget89ChAw





Labeled Examples


Learning Algorithm

Expert / Oracle

Data Source

Unlabeled examples

Algorithm outputs a classifier

Unlabeled examples

Sl={(x1, y1), …,(xml, yml)}

Su={x1, …,xmu} drawn i.i.d from D

xi drawn i.i.d from D, yi = c∗(xi)Goal: h has small error over D.

errD h = Prx~ D

(h x ≠ c∗(x))

Semi-supervised Learning

Test of time

awards at ICML!

Workshops [ICML ’03, ICML’ 05, …]

• Semi-Supervised Learning, MIT 2006O. Chapelle, B. Scholkopf and A. Zien (eds)

Books:

• Introduction to Semi-Supervised Learning,Morgan & Claypool, 2009 Zhu & Goldberg

• Major topic of research in ML.• Several methods have been developed to try to use

unlabeled data to improve performance, e.g.:– Transductive SVM [Joachims ’99]

– Co-training [Blum & Mitchell ’98]

– Graph-based methods [B&C01], [ZGL03]


Test of time

awards at ICML!





Both wide spread applications and solid foundational understanding!!!


Test of time

awards at ICML!





Today: discuss these methods.

Very interesting, they all exploit unlabeled data in different, very interesting and creative ways.

Unlabeled data useful if we have beliefs not only about the form of the target, but also about its relationship

with the underlying distribution.

Key Insight

Semi-supervised learning: no querying. Just have lots of additional unlabeled data.

A bit puzzling; unclear what unlabeled data can do for us…. It is missing the most important info. How can it help us in substantial ways?

Semi-supervised SVM[Joachims ’99]

Margins based regularity

+

+

_

_

Labeled data only

+

+

_

_

+

+

_

_

Transductive SVMSVM

Target goes through low density regions (large margin).• assume we are looking for linear separator• belief: should exist one with large separation

Transductive Support Vector MachinesOptimize for the separator with large margin wrt labeled andunlabeled data. [Joachims ’99]

argminw w 2 s.t.:

• yi w ⋅ xi ≥ 1, for all i ∈ {1, … ,ml}

Su={x1, …,xmu}

• ෞyuw ⋅ xu ≥ 1, for all u ∈ {1,… ,mu}

• ෞyu ∈ {−1, 1} for all u ∈ {1, … ,mu}

0

++++-

--

0 +--

--

w’𝑤’ ⋅ 𝑥 = −1

𝑤’ ⋅ 𝑥 = 1

0

0

0

0

0

0

0

0

0

00

0

0

0

00

00

0

0

0

0

0

0

0

0

0

0

0

0

0

00

00

0

0

0

0

0

0

0

0

0

0

0

0

0

00

00

0

0

0

0

0

0

0

0

0

0

0

0

0

00

00

0

0

0

Input: Sl={(x1, y1), …,(xml, yml)}

Find a labeling of the unlabeled sample and 𝑤 s.t. 𝑤 separates both labeled and unlabeled data with maximum margin.

Transductive Support Vector Machines

argminw w 2 + 𝐶 σ𝑖 𝜉𝑖 + 𝐶 σ𝑢 𝜉𝑢

• yi w ⋅ xi ≥ 1-𝜉𝑖, for all i ∈ {1, … ,ml}

Su={x1, …,xmu}

• ෞyuw ⋅ xu ≥ 1 − 𝜉𝑢, for all u ∈ {1, … ,mu}

• ෞyu ∈ {−1, 1} for all u ∈ {1, … ,mu}

0

++++-

--

0 +--

--

w’𝑤’ ⋅ 𝑥 = −1

𝑤’ ⋅ 𝑥 = 1

0

0

0

0

0

0

0

0

0

00

0

0

0

00

00

0

0

0

0

0

0

0

0

0

0

0

0

0

00

00

0

0

0

0

0

0

0

0

0

0

0

0

0

00

00

0

0

0

0

0

0

0

0

0

0

0

0

0

00

00

0

0

0


Find a labeling of the unlabeled sample and 𝑤 s.t. 𝑤 separates both labeled and unlabeled data with maximum margin.

Optimize for the separator with large margin wrt labeled andunlabeled data. [Joachims ’99]

Transductive Support Vector MachinesOptimize for the separator with large margin wrt labeled andunlabeled data.

argminw w 2 + 𝐶 σ𝑖 𝜉𝑖 + 𝐶 σ𝑢 𝜉𝑢

• yi w ⋅ xi ≥ 1-𝜉𝑖, for all i ∈ {1, … ,ml}

Su={x1, …,xmu}

• ෞyuw ⋅ xu ≥ 1 − 𝜉𝑢, for all u ∈ {1, … ,mu}

• ෞyu ∈ {−1, 1} for all u ∈ {1, … ,mu}

0


NP-hard….. Convex only after you guessed the labels… too many possible guesses…

Transductive Support Vector MachinesOptimize for the separator with large margin wrt labeled andunlabeled data.

Heuristic (Joachims) high level idea:

Keep going until no more improvements. Finds a locally-optimal solution.

• First maximize margin over the labeled points

• Use this to give initial labels to unlabeled points based on this separator.

• Try flipping labels of unlabeled points to see if doing so can increase margin

Experiments [Joachims99]

Highly compatible +

+_

_

Helpful distribution

Non-helpful distributions

Transductive Support Vector Machines

1/°2 clusters, all partitions separable by large margin

Margin satisfiedMargin not satisfied

Different type of underlying regularity assumption: Consistency or Agreement Between Parts

[Blum & Mitchell ’98]

Co-training

Co-training: Self-consistency

My AdvisorProf. Avrim Blum My AdvisorProf. Avrim Blum

x1- Text info x2- Link infox - Link info & Text info

x = h x1, x2 i

Agreement between two parts : co-training [Blum-Mitchell98].

- examples contain two sufficient sets of features, x = h x1, x2 i

For example, if we want to classify web pages:

- belief: the parts are consistent, i.e. 9 c1, c2 s.t. c1(x1)=c2(x2)=c*(x)

as faculty member homepage or not

my advisor

Idea: Use unlabeled data to propagate learned information.

Idea: Use small labeled sample to learn initial rules.• E.g., “my advisor” pointing to a page is a good indicator it is a

faculty home page.

• E.g., “I am teaching” on a page is a good indicator it is a faculty home page.

Iterative Co-Training

Idea: Use unlabeled data to propagate learned information.

Idea: Use small labeled sample to learn initial rules.• E.g., “my advisor” pointing to a page is a good indicator it is a

faculty home page.

• E.g., “I am teaching” on a page is a good indicator it is a faculty home page.


Look for unlabeled examples where one rule is confident and the other is not. Have it label the example for the other.

hx1,x2ihx1,x2ihx1,x2i

hx1,x2ihx1,x2ihx1,x2i

Training 2 classifiers, one on each type of info. Using each to help train the other.


• Have learning algos A1, A2 on each of the two views.• Use labeled data to learn two initial hyp. h1, h2.

• Look through unlabeled data to find examples where one of hi is confident but other is not.

• Have the confident hi label it for algorithm A3-i.

Repeat

+++

X1 X2

Works by using unlabeled data to propagate learned information. hh1

Original Application: Webpage classification12 labeled examples, 1000 unlabeled

(sample run)

Iterative Co-Training A Simple Example: Learning Intervals

c2

c1

Use labeled data to learn h11 and h2

1

Use unlabeled data to bootstrap

h11

h21

Labeled examplesUnlabeled examples

h12

h21

h12

h22

+- -

Expansion, Examples: Learning Intervals

c1

c2

D+

c1

c2

Consistency: zero probability mass in the regions

Non-expanding (non-helpful)distribution

Expanding distribution

c1

c2

D+

S1

S2

Co-training: Theoretical GuaranteesWhat properties do we need for co-training to work well?We need assumptions about:

1. the underlying data distribution2. the learning algos on the two sides

[Blum & Mitchell, COLT ‘98]

1. Independence given the label2. Alg. for learning from random noise.

[Balcan, Blum, Yang, NIPS 2004]

1. Distributional expansion.2. Alg. for learning from positve data only.

𝐷1+ 𝐷2+

𝐷1− 𝐷2−

View 1 View 2

Co-training/Multi-view SSL: Direct Optimization of Agreement

argminh1,h2 l=1

2

i=1

ml

l(hl xi , yi) + Ci=1

mu

agreement(h1 xi , h2 xi )

Su={x1, …,xmu}Input: Sl={(x1, y1), …,(xml, yml)}

Each of them has small labeled error

Regularizer to encourage agreement over unlabeled dat

P. Bartlett, D. Rosenberg, AISTATS 2007; K. Sridharan, S. Kakade, COLT 2008E.g.,

Co-training/Multi-view SSL: Direct Optimization of Agreement

Su={x1, …,xmu}Input: Sl={(x1, y1), …,(xml, yml)}

• l(h xi , yi) loss function

• E.g., square loss l h xi , yi = yi − ℎ xl2

• E.g., 0/1 loss l h xi , yi = 1𝑦𝑖≠ℎ(𝑥𝑖)

P. Bartlett, D. Rosenberg, AISTATS 2007; K. Sridharan, S. Kakade, COLT 2008E.g.,

argminh1,h2 l=1

2

i=1

ml

l(hl xi , yi) + Ci=1

mu

agreement(h1 xi , h2 xi )

Original Application: Webpage classification12 labeled examples, 1000 unlabeled

(sample run)

Many Other Applications

E.g., [Levin-Viola-Freund03] identifying objects in images. Two different kinds of preprocessing.

E.g., [Collins&Singer99] named-entity extraction.– “I arrived in London yesterday”

…

What You Should Know• Unlabeled data useful if we have beliefs not only about

the form of the target, but also about its relationship with the underlying distribution.

• Different types of algorithms (based on different beliefs).

– Transductive SVM [Joachims ’99]



Additional Material on Graph Based Methods

Similarity Based Regularity[Blum&Chwala01], [ZhuGhahramaniLafferty03]

Graph-based Methods

E.g., handwritten digits [Zhu07]:

• Assume we are given a pairwise similarity fnc and that very similar examples probably have the same label.

• If we have a lot of labeled data, this suggests a Nearest-Neighbor type of algorithm.

• If you have a lot of unlabeled data, perhaps can use them as “stepping stones”.

Graph-based MethodsIdea: construct a graph with edges between very similar examples.Unlabeled data can help “glue” the objects of the same class together.

Graph-based MethodsIdea: construct a graph with edges between very similar examples. Unlabeled data can help “glue” the objects of the same class together.

Person Identification in Webcam Images: An Application of Semi-Supervised Learning. [Balcan,Blum,Choi, Lafferty, Pantano, Rwebangira, Xiaojin Zhu], ICML 2005 Workshop on Learning with Partially Classified Training Data.

Several methods:– Minimum/Multiway cut [Blum&Chawla01]– Minimum “soft-cut” [ZhuGhahramaniLafferty’03]– Spectral partitioning– …

Main Idea:

• Might have also glued together in G examples of different classes.

Often, transductive approach. (Given L + U, output predictions on U). Are alllowed to output any labeling of 𝐿 ∪ 𝑈.

• Construct graph G with edges between very similar examples.

• Run a graph partitioning algorithm to separate the graph into pieces.

Graph-based Methods

Minimum/Multiway Cut[Blum&Chawla01]Objective: Solve for labels on unlabeled points that minimize total weight of edges whose endpoints have different labels.

• If just two labels, can be solved efficiently using max-flow min-cut algorithms

- Create super-source 𝑠 connected by edges of weight ∞ to all + labeled pts.

- Create super-sink 𝑡 connected by edges of weight ∞ to all − labeled pts.

- Find minimum-weight 𝑠-𝑡 cut

(i.e., the total weight of bad edges)

Minimum “soft cut” [ZhuGhahramaniLafferty’03]

• Can be done efficiently by solving a set of linear equations.

Objective Solve for probability vector over labels 𝑓𝑖 on each unlabeled point 𝑖.

• Minimize σ𝑒=(𝑖,𝑗)𝑤𝑒 𝑓𝑖 − 𝑓𝑗2

where ‖𝑓𝑖 − 𝑓𝑗‖ is Euclidean distance.

(0100000000)

(1000000000)

(0001000000)

(0000000001)

(0000000010)

(0000000100)(labeled points get coordinate vectors in direction of their known label)

How to Create the Graph• Empirically, the following works well:

1. Compute distance between i, j

2. For each i, connect to its kNN. k very small but still connects the graph

3. Optionally put weights on (only) those edges

4. Tune V

Documents

Sponsored Search Acution Design Via Machine Learning