
Machine Learning For Dummies - Buch.de Learning For Dummies ... CHAPTER 8: Exploring Other Machine Learning Tools . . . . . . . . . . . . . . . . . . . . . . . . . 137 Part 3:



Machine Learning

by John Paul Mueller and Luca Massaron


Machine Learning For Dummies®

Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com

Copyright © 2016 by John Wiley & Sons, Inc., Hoboken, New Jersey

Media and software compilation copyright © 2016 by John Wiley & Sons, Inc. All rights reserved.

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit www.wiley.com/techsupport.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2016940023

ISBN: 978-1-119-24551-3

ISBN 978-1-119-24577-3 (ebk); ISBN ePDF 978-1-119-24575-9 (ebk)

Manufactured in the United States of America

10 9 8 7 6 5 4 3 2 1


Contents at a Glance

Introduction

Part 1: Introducing How Machines Learn
CHAPTER 1: Getting the Real Story about AI
CHAPTER 2: Learning in the Age of Big Data
CHAPTER 3: Having a Glance at the Future

Part 2: Preparing Your Learning Tools
CHAPTER 4: Installing an R Distribution
CHAPTER 5: Coding in R Using RStudio
CHAPTER 6: Installing a Python Distribution
CHAPTER 7: Coding in Python Using Anaconda
CHAPTER 8: Exploring Other Machine Learning Tools

Part 3: Getting Started with the Math Basics
CHAPTER 9: Demystifying the Math Behind Machine Learning
CHAPTER 10: Descending the Right Curve
CHAPTER 11: Validating Machine Learning
CHAPTER 12: Starting with Simple Learners

Part 4: Learning from Smart and Big Data
CHAPTER 13: Preprocessing Data
CHAPTER 14: Leveraging Similarity
CHAPTER 15: Working with Linear Models the Easy Way
CHAPTER 16: Hitting Complexity with Neural Networks
CHAPTER 17: Going a Step beyond Using Support Vector Machines
CHAPTER 18: Resorting to Ensembles of Learners

Part 5: Applying Learning to Real Problems
CHAPTER 19: Classifying Images
CHAPTER 20: Scoring Opinions and Sentiments
CHAPTER 21: Recommending Products and Movies

Part 6: The Part of Tens
CHAPTER 22: Ten Machine Learning Packages to Master
CHAPTER 23: Ten Ways to Improve Your Machine Learning Models

INDEX


Table of Contents

INTRODUCTION
  About This Book
  Foolish Assumptions
  Icons Used in This Book
  Beyond the Book
  Where to Go from Here

PART 1: INTRODUCING HOW MACHINES LEARN

CHAPTER 1: Getting the Real Story about AI
  Moving beyond the Hype
  Dreaming of Electric Sheep
    Understanding the history of AI and machine learning
    Exploring what machine learning can do for AI
    Considering the goals of machine learning
    Defining machine learning limits based on hardware
  Overcoming AI Fantasies
    Discovering the fad uses of AI and machine learning
    Considering the true uses of AI and machine learning
    Being useful; being mundane
  Considering the Relationship between AI and Machine Learning
  Considering AI and Machine Learning Specifications
  Defining the Divide between Art and Engineering

CHAPTER 2: Learning in the Age of Big Data
  Defining Big Data
  Considering the Sources of Big Data
    Building a new data source
    Using existing data sources
    Locating test data sources
  Specifying the Role of Statistics in Machine Learning
  Understanding the Role of Algorithms
    Defining what algorithms do
    Considering the five main techniques
  Defining What Training Means

CHAPTER 3: Having a Glance at the Future
  Creating Useful Technologies for the Future
    Considering the role of machine learning in robots
    Using machine learning in healthcare
    Creating smart systems for various needs


    Using machine learning in industrial settings
    Understanding the role of updated processors and other hardware
  Discovering the New Work Opportunities with Machine Learning
    Working for a machine
    Working with machines
    Repairing machines
    Creating new machine learning tasks
    Devising new machine learning environments
  Avoiding the Potential Pitfalls of Future Technologies

PART 2: PREPARING YOUR LEARNING TOOLS

CHAPTER 4: Installing an R Distribution
  Choosing an R Distribution with Machine Learning in Mind
  Installing R on Windows
  Installing R on Linux
  Installing R on Mac OS X
  Downloading the Datasets and Example Code
    Understanding the datasets used in this book
    Defining the code repository

CHAPTER 5: Coding in R Using RStudio
  Understanding the Basic Data Types
  Working with Vectors
  Organizing Data Using Lists
  Working with Matrices
    Creating a basic matrix
    Changing the vector arrangement
    Accessing individual elements
    Naming the rows and columns
  Interacting with Multiple Dimensions Using Arrays
    Creating a basic array
    Naming the rows and columns
  Creating a Data Frame
    Understanding factors
    Creating a basic data frame
    Interacting with data frames
    Expanding a data frame
  Performing Basic Statistical Tasks
    Making decisions
    Working with loops


    Performing looped tasks without loops
    Working with functions
    Finding mean and median
    Charting your data

CHAPTER 6: Installing a Python Distribution
  Choosing a Python Distribution with Machine Learning in Mind
    Getting Continuum Analytics Anaconda
    Getting Enthought Canopy Express
    Getting pythonxy
    Getting WinPython
  Installing Python on Linux
  Installing Python on Mac OS X
  Installing Python on Windows
  Downloading the Datasets and Example Code
    Using Jupyter Notebook
    Defining the code repository
    Understanding the datasets used in this book

CHAPTER 7: Coding in Python Using Anaconda
  Working with Numbers and Logic
    Performing variable assignments
    Doing arithmetic
    Comparing data using Boolean expressions
  Creating and Using Strings
  Interacting with Dates
  Creating and Using Functions
    Creating reusable functions
    Calling functions
    Working with global and local variables
  Using Conditional and Loop Statements
    Making decisions using the if statement
    Choosing between multiple options using nested decisions
    Performing repetitive tasks using for
    Using the while statement
  Storing Data Using Sets, Lists, and Tuples
    Creating sets
    Performing operations on sets
    Creating lists
    Creating and using tuples
  Defining Useful Iterators
  Indexing Data Using Dictionaries
  Storing Code in Modules


CHAPTER 8: Exploring Other Machine Learning Tools
  Meeting the Precursors SAS, Stata, and SPSS
  Learning in Academia with Weka
  Accessing Complex Algorithms Easily Using LIBSVM
  Running As Fast As Light with Vowpal Wabbit
  Visualizing with Knime and RapidMiner
  Dealing with Massive Data by Using Spark

PART 3: GETTING STARTED WITH THE MATH BASICS

CHAPTER 9: Demystifying the Math Behind Machine Learning
  Working with Data
    Creating a matrix
    Understanding basic operations
    Performing matrix multiplication
    Glancing at advanced matrix operations
    Using vectorization effectively
  Exploring the World of Probabilities
    Operating on probabilities
    Conditioning chance by Bayes’ theorem
  Describing the Use of Statistics

CHAPTER 10: Descending the Right Curve
  Interpreting Learning As Optimization
    Supervised learning
    Unsupervised learning
    Reinforcement learning
    The learning process
  Exploring Cost Functions
  Descending the Error Curve
  Updating by Mini-Batch and Online

CHAPTER 11: Validating Machine Learning
  Checking Out-of-Sample Errors
    Looking for generalization
  Getting to Know the Limits of Bias
  Keeping Model Complexity in Mind
  Keeping Solutions Balanced
    Depicting learning curves
  Training, Validating, and Testing
  Resorting to Cross-Validation
  Looking for Alternatives in Validation


  Optimizing Cross-Validation Choices
    Exploring the space of hyper-parameters
  Avoiding Sample Bias and Leakage Traps
    Watching out for snooping

CHAPTER 12: Starting with Simple Learners
  Discovering the Incredible Perceptron
    Falling short of a miracle
    Touching the nonseparability limit
  Growing Greedy Classification Trees
    Predicting outcomes by splitting data
    Pruning overgrown trees
  Taking a Probabilistic Turn
    Understanding Naïve Bayes
    Estimating response with Naïve Bayes

PART 4: LEARNING FROM SMART AND BIG DATA

CHAPTER 13: Preprocessing Data
  Gathering and Cleaning Data
  Repairing Missing Data
    Identifying missing data
    Choosing the right replacement strategy
  Transforming Distributions
  Creating Your Own Features
    Understanding the need to create features
    Creating features automatically
  Compressing Data
  Delimiting Anomalous Data

CHAPTER 14: Leveraging Similarity
  Measuring Similarity between Vectors
    Understanding similarity
    Computing distances for learning
  Using Distances to Locate Clusters
    Checking assumptions and expectations
    Inspecting the gears of the algorithm
  Tuning the K-Means Algorithm
    Experimenting K-means reliability
    Experimenting with how centroids converge
  Searching for Classification by K-Nearest Neighbors
  Leveraging the Correct K Parameter
    Understanding the k parameter
    Experimenting with a flexible algorithm


CHAPTER 15: Working with Linear Models the Easy Way
  Starting to Combine Variables
  Mixing Variables of Different Types
  Switching to Probabilities
    Specifying a binary response
    Handling multiple classes
  Guessing the Right Features
    Defining the outcome of features that don’t work together
    Solving overfitting by using selection
  Learning One Example at a Time
    Using gradient descent
    Understanding how SGD is different

CHAPTER 16: Hitting Complexity with Neural Networks
  Learning and Imitating from Nature
    Going forth with feed-forward
    Going even deeper down the rabbit hole
    Getting Back with Backpropagation
  Struggling with Overfitting
    Understanding the problem
    Opening the black box
  Introducing Deep Learning

CHAPTER 17: Going a Step beyond Using Support Vector Machines
  Revisiting the Separation Problem: A New Approach
  Explaining the Algorithm
    Getting into the math of an SVM
    Avoiding the pitfalls of nonseparability
  Applying Nonlinearity
    Demonstrating the kernel trick by example
    Discovering the different kernels
  Illustrating Hyper-Parameters
  Classifying and Estimating with SVM

CHAPTER 18: Resorting to Ensembles of Learners
  Leveraging Decision Trees
    Growing a forest of trees
    Understanding the importance measures
  Working with Almost Random Guesses
    Bagging predictors with Adaboost
  Boosting Smart Predictors
    Meeting again with gradient descent
  Averaging Different Predictors


PART 5: APPLYING LEARNING TO REAL PROBLEMS . . . . . . 331

CHAPTER 19: Classifying Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333WorkingwithaSetofImages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .334Extracting Visual Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .338RecognizingFacesUsingEigenfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . .340ClassifyingImages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .343

CHAPTER 20: Scoring Opinions and Sentiments . . . . . . . . . . . . . . . . . . . 349IntroducingNaturalLanguageProcessing . . . . . . . . . . . . . . . . . . . . . . .349Understanding How Machines Read . . . . . . . . . . . . . . . . . . . . . . . . . . .350

Processing and enhancing text . . . . . . . . . . . . . . . . . . . . . . . . . . . . .352Scrapingtextualdatasetsfromtheweb . . . . . . . . . . . . . . . . . . . . .357Handlingproblemswithrawtext . . . . . . . . . . . . . . . . . . . . . . . . . . .360

UsingScoringandClassification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .362Performingclassificationtasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . .362Analyzingreviewsfrome-commerce . . . . . . . . . . . . . . . . . . . . . . . .365

CHAPTER 21: Recommending Products and Movies
  Realizing the Revolution
  Downloading Rating Data
    Trudging through the MovieLens dataset
    Navigating through anonymous web data
    Encountering the limits of rating data
  Leveraging SVD
    Considering the origins of SVD
    Understanding the SVD connection
    Seeing SVD in action

PART 6: THE PART OF TENS

CHAPTER 22: Ten Machine Learning Packages to Master
  Cloudera Oryx
  CUDA-Convnet
  ConvNetJS
  e1071
  gbm
  Gensim
  glmnet
  randomForest
  SciPy
  XGBoost

CHAPTER 23: Ten Ways to Improve Your Machine Learning Models
  Studying Learning Curves
  Using Cross-Validation Correctly
  Choosing the Right Error or Score Metric
  Searching for the Best Hyper-Parameters
  Testing Multiple Models
  Averaging Models
  Stacking Models
  Applying Feature Engineering
  Selecting Features and Examples
  Looking for More Data

INDEX

Introduction

The term machine learning has all sorts of meanings attached to it today, especially after Hollywood (and other movie studios) got into the picture. Films such as Ex Machina have tantalized the imaginations of moviegoers the world over and made machine learning into all sorts of things that it really isn’t. Of course, most of us have to live in the real world, where machine learning actually does perform an incredible array of tasks that have nothing to do with androids that can pass the Turing Test (fooling their makers into believing they’re human). Machine Learning For Dummies provides you with a view of machine learning in the real world and exposes you to the amazing feats you really can perform using this technology. Even though the tasks that you perform using machine learning may seem a bit mundane when compared to the movie version, by the time you finish this book, you realize that these mundane tasks have the power to affect the lives of everyone on the planet in nearly every aspect of their daily lives. In short, machine learning is an incredible technology, just not in the way that some people have imagined.

About This Book

The main purpose of Machine Learning For Dummies is to help you understand what machine learning can and can’t do for you today and what it might do for you in the future. You don’t have to be a computer scientist to use this book, even though it does contain many coding examples. In fact, you can come from any discipline that heavily emphasizes math, because math is how this book approaches machine learning. Instead of dealing with abstractions, you see the concrete results of using specific algorithms to interact with big data in particular ways to obtain a certain, useful result. The emphasis is on useful because machine learning has the power to perform a wide array of tasks in a manner never seen before.

Part of the emphasis of this book is on using the right tools. This book uses both Python and R to perform various tasks. These two languages have special features that make them particularly useful in a machine learning setting. For example, Python provides access to a huge array of libraries that let you do just about anything you can imagine and more than a few you can’t. Likewise, R provides an ease of use that few languages can match. Machine Learning For Dummies helps you understand that both languages have their role to play and gives examples of when one language works a bit better than the other to achieve the goals you have in mind.