
Foundations of Machine Learning and Data Science

Maria-Florina (Nina) Balcan

Lecture 1, September 9, 2015

Course Staff

Instructors:

• Nina Balcan http://www.cs.cmu.edu/~ninamf

• Avrim Blum http://www.cs.cmu.edu/~avrim

TAs:

• Nika Haghtalab http://www.cs.cmu.edu/~nhaghtal

• Sarah Allen http://www.cs.cmu.edu/~srallen

Lectures in general

On the board

Occasionally, will use slides

Machine Learning is Changing the World

“A breakthrough in machine learning would be worth ten Microsofts” (Bill Gates, Microsoft)

“Machine learning is the hot new thing” (John Hennessy, President, Stanford)

“Web rankings today are mostly a matter of machine learning” (Prabhakar Raghavan, VP Engineering at Google)

The COOLEST TOPIC IN SCIENCE

• “A breakthrough in machine learning would be worth ten Microsofts” (Bill Gates, Chairman, Microsoft)

• “Machine learning is the next Internet” (Tony Tether, Director, DARPA)

• “Machine learning is the hot new thing” (John Hennessy, President, Stanford)

• “Web rankings today are mostly a matter of machine learning” (Prabhakar Raghavan, Dir. Research, Yahoo)

• “Machine learning is going to result in a real revolution” (Greg Papadopoulos, CTO, Sun)

• “Machine learning is today’s discontinuity” (Jerry Yang, CEO, Yahoo)

This course: foundations of Machine Learning and Data Science

Goals of Machine Learning Theory

Develop and analyze models to understand:

• what kinds of tasks we can hope to learn, and from what kind of data

• what types of guarantees might we hope to achieve

Algorithms:

• prove guarantees for practically successful algs (when will they succeed, how long will they take?)

• develop new algs that provably meet desired criteria (potentially within new learning paradigms)

Interesting connections to other areas including:

• Optimization

• Probability & Statistics

• Game Theory

• Information Theory

• Complexity Theory

Example: Supervised Classification

Decide which emails are spam and which are important.

Supervised classification: example → label (spam / not spam)

Goal: use emails seen so far to produce good prediction rule for future data.

Represent each message by features (e.g., keywords, spelling, etc.).

Reasonable RULES:

Predict SPAM if unknown AND (money OR pills)

Predict SPAM if 2·money + 3·pills − 5·known > 0
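As a concrete illustration, here is a minimal sketch of the second rule above (the feature extraction is a toy assumption: it just counts the slide's example keywords, with "known" standing in for a known-sender signal):

```python
# Minimal sketch of the linear rule above: represent a message by
# keyword-count features, predict SPAM when a weighted sum is positive.
# Feature names and weights are taken from the slide's toy example.

def extract_features(message: str) -> dict:
    words = message.lower().split()
    return {
        "money": words.count("money"),
        "pills": words.count("pills"),
        "known": words.count("known"),  # toy stand-in for "known sender"
    }

def predict_spam(message: str) -> bool:
    # Linear rule from the slide: 2*money + 3*pills - 5*known > 0
    f = extract_features(message)
    return 2 * f["money"] + 3 * f["pills"] - 5 * f["known"] > 0

print(predict_spam("get cheap pills send money now"))  # True (spam)
print(predict_spam("note from a known colleague"))     # False
```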

Example: Supervised Classification

[Figure: positively (+) and negatively (−) labeled points in feature space, separable by a straight line.]

Linearly separable
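The figure raises the algorithmic question of how to find such a line automatically. A minimal sketch, using the classic Perceptron update (one standard choice; the slide itself only shows the data):

```python
# Perceptron sketch: on each misclassified point, nudge the weights
# toward (label +1) or away from (label -1) that point. On linearly
# separable data like the figure's, this converges to a separator.

def perceptron(examples, passes=100):
    dim = len(examples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(passes):
        mistakes = 0
        for x, y in examples:  # y is +1 or -1
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:      # all points on the correct side
            break
    return w, b

# Toy version of the figure: +'s up and to the right, -'s down-left.
data = [((2.0, 3.0), 1), ((3.0, 2.0), 1), ((2.5, 3.5), 1),
        ((0.0, 1.0), -1), ((1.0, 0.0), -1), ((0.5, 0.5), -1)]
print(perceptron(data))  # weights and bias of a separating line
```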

Two Main Aspects of Supervised Learning

Algorithm Design. How to optimize?

Automatically generate rules that do well on observed data.

Confidence Bounds, Generalization Guarantees, Sample Complexity

Confidence for rule effectiveness on future data.

Well understood for passive supervised learning.

Using Unlabeled Data and Interaction for Learning

Application Areas

Computer Vision

Search/Information Retrieval

Computational Biology

Spam Detection

Medical Diagnosis

Robotics

Massive Amounts of Raw Data

Billions of webpages, images, protein sequences.

Only a tiny fraction can be annotated by human experts.

Semi-Supervised Learning

[Figure: raw data, a small amount of which is labeled by an expert labeler (face / not face), fed together into a classifier.]

Active Learning

[Figure: the learner selects points from the raw data and queries the expert labeler (face / not face), then trains a classifier.]

Other Protocols for Supervised Learning

• Semi-Supervised Learning

Using cheap unlabeled data in addition to labeled data.

• Active Learning

The algorithm interactively asks for labels of informative examples.

Theoretical understanding entirely lacking 10 years ago. Lots of progress recently. We will cover some of these.
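To make the active-learning protocol concrete, here is a minimal sketch. Uncertainty sampling (query the point closest to the current boundary) is assumed here for illustration, not something this slide commits to; the update is a perceptron-style step.

```python
# Active learning sketch: instead of labeling everything, repeatedly ask
# the expert labeler for the unlabeled point the current linear rule is
# least sure about (smallest |w.x + b|), then update on that one label.

def active_learn(pool, oracle, rounds=5):
    w, b = [0.0, 0.0], 0.0
    for _ in range(min(rounds, len(pool))):
        # Query the most informative (lowest-margin) unlabeled point.
        x = min(pool, key=lambda p: abs(sum(wi * xi for wi, xi in zip(w, p)) + b))
        pool.remove(x)
        y = oracle(x)  # the expert provides the label, +1 or -1
        if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
            w = [wi + y * xi for wi, xi in zip(w, x)]  # perceptron update
            b += y
    return w, b

# Toy usage: the "expert" secretly labels by the rule x1 + x2 > 3.
oracle = lambda x: 1 if x[0] + x[1] > 3 else -1
pool = [(0.0, 1.0), (1.0, 0.5), (2.0, 2.0), (3.0, 3.0), (1.5, 1.4), (4.0, 1.0)]
print(active_learn(pool, oracle))  # learned (w, b) after a few queries
```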

Distributed Learning

Many ML problems today involve massive amounts of data distributed across multiple locations. E.g., medical data, scientific data.

Often would like a low-error hypothesis wrt the overall distribution.

• Data distributed across multiple locations.

• Each has a piece of the overall data pie.

• To learn over the combined distribution D, must communicate.

Important question: how much communication?

Plus, privacy & incentives.
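A minimal sketch of the communication question (model averaging is a naive strategy assumed here for illustration, not a method from the lecture): each location trains on its private shard and ships only its weight vector to the center, rather than its raw examples.

```python
# Distributed learning sketch: each location trains locally and sends
# only its (w, b) to the center -- O(d) numbers per site, versus the
# cost of shipping every raw example. The center then averages models.

def local_train(examples, passes=50):
    w, b = [0.0] * len(examples[0][0]), 0.0
    for _ in range(passes):
        for x, y in examples:  # perceptron-style updates on local data
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def combine(models):
    # Center averages the per-site models (one simple combination rule).
    d = len(models[0][0])
    w = [sum(m[0][i] for m in models) / len(models) for i in range(d)]
    b = sum(m[1] for m in models) / len(models)
    return w, b

# Two "hospitals", each holding a private shard of the overall data.
site1 = [((2.0, 2.0), 1), ((0.0, 0.0), -1)]
site2 = [((3.0, 1.0), 1), ((1.0, 0.0), -1)]
print(combine([local_train(site1), local_train(site2)]))
```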

The World is Changing Machine Learning

Many competing resources & constraints. E.g.,

• Computational efficiency (noise tolerant algos)

• Communication

• Human labeling effort

• Statistical efficiency

• Privacy/Incentives

New approaches. E.g.,

• Semi-supervised learning

• Distributed learning

• Interactive learning

• Multi-task/transfer learning

• Never ending learning

• Deep Learning

Structure of the Class

• Simple algos and hardness results for supervised learning.

• Classic, state of the art algorithms: AdaBoost and SVM (kernel based mehtods).

• Basic models: PAC, SLT.

• Standard Sample Complexity Results (VC dimension)

• Weak-learning vs. Strong-learning

Basic Learning Paradigm: Passive Supervised Learning

• Modern Sample Complexity Results • Rademacher Complexity; localization

• Margin analysis of Boosting and SVM
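For orientation, the standard sample complexity results mentioned above take the following well-known form (classic PAC bounds in the realizable case, stated here for reference):

```latex
% Finite hypothesis class H: with probability at least 1 - \delta, every
% h in H consistent with m labeled examples has error at most \epsilon, once
m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right).

% Infinite classes: |H| is replaced by the VC dimension (up to constants and logs):
m \;=\; O\!\left(\frac{1}{\epsilon}\left(\mathrm{VCdim}(H)\,\ln\frac{1}{\epsilon} + \ln\frac{1}{\delta}\right)\right).
```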

Structure of the Class

• Incorporating Unlabeled Data in the Learning Process.

• Incorporating Interaction in the Learning Process:

• Active Learning

• More general types of Interaction

Other Learning Paradigms

• Distributed Learning.

• Transfer learning/Multi-task learning/Life-long learning.

• Deep Learning.

• Foundations and algorithms for constraints/externalities. E.g., privacy, limited memory, and communication.

Structure of the Class

• Online Learning, Optimization, and Game Theory

• connections to Boosting

Other Topics.

• Methods for summarizing and making sense of massive datasets including:

• unsupervised learning.

• spectral, combinatorial techniques.

• streaming algorithms.

Admin

• Course web page:

http://www.cs.cmu.edu/~ninamf/courses/806/10-806-index.html

Two grading schemes:

1) Project Oriented.

- Project [60%]

- Take-home final [10%]

- Hwks + grading [30%]

2) Homework Oriented.

- Hwks + grading [60%]

- Take-home final [10%]

- Project [30%]

Admin

1) Project Oriented.

- Project [60%]

• explore a theoretical or empirical question
• write-up: ideally aim for a conference submission!
• small groups OK

- Take-home final [10%]

- Hwks + grading [30%]


Admin

2) Homework Oriented.

- Hwks + grading [60%]

- Take-home final [10%]

- Project [30%]

• read a couple of papers and explain the idea.


Lectures in general

On the board

Occasionally, will use slides

Recommended