NEURAL NETWORKS AND DEEP LEARNING
ASIM JALIS
GALVANIZE
INTRO
ASIM JALIS
Galvanize/Zipfian, Data Engineering
Cloudera, Microsoft, Salesforce
MS in Computer Science from University of Virginia
GALVANIZE PROGRAMS
Program                     Duration
Data Science Immersive      12 weeks
Data Engineering Immersive  12 weeks
Web Developer Immersive     6 months
Galvanize U                 1 year
TALK OVERVIEW
WHAT IS THIS TALK ABOUT?
Using neural networks and deep learning to recognize images
By the end of the class you will be able to create your own deep learning systems
HOW MANY PEOPLE HERE HAVE USED NEURAL NETWORKS?
HOW MANY PEOPLE HERE HAVE USED MACHINE LEARNING?
HOW MANY PEOPLE HERE HAVE USED PYTHON?
DEEP LEARNING
WHAT IS MACHINE LEARNING?
Self-driving cars
Voice recognition
Facial recognition
HISTORY OF DEEP LEARNING
HISTORY OF MACHINE LEARNING
Input     Features  Algorithm  Output
Machine   Human     Human      Machine
Machine   Human     Machine    Machine
Machine   Machine   Machine    Machine
FEATURE EXTRACTION
Traditionally, data scientists had to define features
Deep learning systems are able to extract features themselves
DEEP LEARNING MILESTONES
Years  Theme
1980s  Backpropagation popularized, allowing multi-layer neural networks to be trained
2000s  SVMs, Random Forests, and other classifiers overtook NNs
2010s  Deep learning reignited interest in NNs
IMAGENET
AlexNet, submitted to the ImageNet ILSVRC challenge in 2012, is partly responsible for the renaissance.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton used deep learning techniques.
They combined these with GPUs and other techniques such as ReLU activations and dropout.
The result was a neural network that could classify images into 1,000 object categories.
It had a top-5 error rate of about 15%, compared to about 26% for the runner-up.
Ilya Sutskever, Alex Krizhevsky, Geoffrey Hinton
INDEED.COM/SALARY
MACHINE LEARNING
MACHINE LEARNING AND DEEP LEARNING
Deep learning fits inside machine learning
Deep learning is a machine learning technique
Both share techniques for evaluating and optimizing models
WHAT IS MACHINE LEARNING?
Inputs: vectors, or points in a high-dimensional space
Outputs: either binary vectors or continuous vectors
Machine learning finds the relationship between them
Uses statistical techniques
SUPERVISED VS UNSUPERVISED
Supervised: data needs to be labeled
Unsupervised: data does not need to be labeled
TECHNIQUES
Classification
Regression
Clustering
Recommendations
Anomaly detection
CLASSIFICATION EXAMPLE: EMAIL SPAM DETECTION
Start with a large collection of emails, labeled spam/not-spam
Convert email text into vectors of 0s and 1s: 1 if a word occurs, 0 if it does not
These are called inputs or features
Split the data set into a training set (70%) and a test set (30%)
Use an algorithm like Random Forest to build a model
Evaluate the model by running it on the test set and capturing the success rate
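A minimal sketch of these steps in plain Python, assuming a hypothetical ten-email toy corpus. To stay dependency-free it swaps in Bernoulli Naive Bayes (also a standard classification algorithm) for Random Forest; in practice a library such as scikit-learn would supply either model.

```python
import math
import random

# Hypothetical toy corpus: (email text, label), 1 = spam, 0 = not spam.
EMAILS = [
    ("win cash prize now", 1),
    ("cheap pills win big", 1),
    ("claim your free prize", 1),
    ("free cash offer now", 1),
    ("win a free cash prize", 1),
    ("meeting moved to monday", 0),
    ("lunch with the team today", 0),
    ("project report attached", 0),
    ("monday team meeting notes", 0),
    ("free lunch for the team", 0),
]

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle, then put train_fraction of the rows in the training set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def build_vocabulary(emails):
    """Sorted list of every distinct word in the given emails."""
    return sorted({word for text, _ in emails for word in text.split()})

def to_binary_vector(text, vocab):
    """1 if a vocabulary word occurs in the email, 0 if it does not."""
    words = set(text.split())
    return [1 if word in words else 0 for word in vocab]

def train_bernoulli_nb(vectors, labels):
    """Bernoulli Naive Bayes with Laplace smoothing (stand-in for Random Forest)."""
    model = {}
    for label in (0, 1):
        rows = [v for v, y in zip(vectors, labels) if y == label]
        log_prior = math.log((len(rows) + 1) / (len(vectors) + 2))
        word_probs = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2)
                      for i in range(len(vectors[0]))]
        model[label] = (log_prior, word_probs)
    return model

def predict(model, vector):
    """Return the label with the highest log-probability."""
    def log_score(label):
        log_prior, word_probs = model[label]
        return log_prior + sum(math.log(p if bit else 1 - p)
                               for bit, p in zip(vector, word_probs))
    return max((0, 1), key=log_score)

train, test = train_test_split(EMAILS)
vocab = build_vocabulary(train)  # vocabulary comes from the training set only
train_x = [to_binary_vector(text, vocab) for text, _ in train]
model = train_bernoulli_nb(train_x, [label for _, label in train])
correct = sum(predict(model, to_binary_vector(text, vocab)) == label for text, label in test)
success_rate = correct / len(test)
print(f"train={len(train)} test={len(test)} success rate={success_rate:.2f}")
```

Words that appear only in the test set are simply ignored, since the vocabulary is built from training data.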
CLASSIFICATION ALGORITHMS
Neural Networks
Random Forest
Support Vector Machines (SVM)
Decision Trees
Logistic Regression
Naive Bayes
CHOOSING AN ALGORITHM
Evaluate different models on the data
Look at their relative success rates
Use rules of thumb: some algorithms work better on some kinds of data
CLASSIFICATION EXAMPLES
Is this tumor benign or cancerous?
Is this lead profitable or not?
Who will win the presidential elections?
CLASSIFICATION: POP QUIZ
Is classification supervised or unsupervised learning?
Supervised because you have to label the data.
CLUSTERING EXAMPLE: LOCATE CELL PHONE TOWERS
Start with GPS coordinates of all cell phone users
Represent data as vectors
Locate towers in biggest clusters
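One way to find the clusters is k-means. The sketch below assumes hypothetical user coordinates around two cities and treats latitude/longitude as flat-plane coordinates, which is only a rough approximation over small areas; each resulting center is a candidate tower location.

```python
import math
import random

# Hypothetical GPS coordinates (latitude, longitude) of phone users in two areas.
USERS = [
    (37.77, -122.42), (37.78, -122.41), (37.76, -122.43),
    (34.05, -118.24), (34.06, -118.25), (34.04, -118.23),
]

def kmeans(points, k, max_iterations=100, seed=0):
    """Plain k-means: returns (centers, clusters)."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(max_iterations):
        # Assign each point to its nearest center (flat-plane distance on lat/lon).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

towers, clusters = kmeans(USERS, k=2)
for tower, members in zip(towers, clusters):
    print(f"tower at {tower[0]:.3f}, {tower[1]:.3f} serves {len(members)} users")
```

The number of towers `k` must be chosen up front; with real data one would try several values.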
CLUSTERING EXAMPLE: T-SHIRTS
What size should a t-shirt be?
Everyone’s real t-shirt size is different
Lay out all sizes and cluster them
Target large clusters with XS, S, M, L, XL
CLUSTERING: POP QUIZ
Is clustering supervised or unsupervised?
Unsupervised, because no labeling is required.
RECOMMENDATIONS EXAMPLE: AMAZON
Model looks at user ratings of books
Viewing a book triggers an implicit rating
Recommend new books to the user
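A minimal sketch of one common recommendation technique, user-based collaborative filtering, assuming hypothetical users and star ratings. This is not a description of Amazon's actual system; implicit "viewed" signals could be folded in as extra low-valued ratings, which is also an assumption.

```python
import math

# Hypothetical explicit ratings (1-5 stars). An implicit "viewed" signal
# could be stored the same way, e.g. as a low fixed score.
RATINGS = {
    "ann": {"dune": 5, "emma": 1, "neuromancer": 4},
    "bob": {"dune": 4, "neuromancer": 5, "foundation": 5},
    "cat": {"emma": 5, "persuasion": 4, "dune": 1},
    "dan": {"dune": 5, "foundation": 4},
}

def cosine_similarity(a, b):
    """Cosine similarity of two users' rating vectors (missing ratings count as 0)."""
    dot = sum(a[book] * b[book] for book in set(a) & set(b))
    if dot == 0:
        return 0.0
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, ratings, top_n=2):
    """Rank books the user has not rated by similarity-weighted average rating."""
    weighted_sum, similarity_sum = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], their_ratings)
        if sim <= 0:
            continue
        for book, rating in their_ratings.items():
            if book not in ratings[user]:
                weighted_sum[book] = weighted_sum.get(book, 0.0) + sim * rating
                similarity_sum[book] = similarity_sum.get(book, 0.0) + sim
    predicted = {book: weighted_sum[book] / similarity_sum[book] for book in weighted_sum}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

print(recommend("dan", RATINGS))
```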
RECOMMENDATION: POP QUIZ
Are recommendation systems supervised or unsupervised?
Unsupervised
REGRESSION
Like classification
Output is continuous instead of one of k choices
REGRESSION EXAMPLES
How many units of a product will sell next month?
What will a student score on the SAT?
What is the market price of this house?
How long before this engine needs repair?
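To make "continuous output" concrete, here is a minimal ordinary-least-squares sketch for the house-price question, assuming hypothetical size/price data and a single input feature. Real models would use many features and a library implementation.

```python
# Hypothetical data: house size in square feet -> sale price in dollars.
SIZES = [1000, 1500, 2000, 2500]
PRICES = [200_000, 275_000, 350_000, 425_000]

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(SIZES, PRICES)
predicted = slope * 1800 + intercept  # price estimate for an unseen 1800 sq ft house
print(f"price = {slope:.0f} * size + {intercept:.0f}; 1800 sq ft -> ${predicted:,.0f}")
```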
REGRESSION EXAMPLE: AIRCRAFT PART FAILURE
Cessna collects data from airplane sensors
Predict when a part needs to be replaced
Ship the part to the customer’s service airport
REGRESSION: QUIZ
Is regression supervised or unsupervised?
Supervised
ANOMALY DETECTION EXAMPLE: CREDIT CARD FRAUD
Train model on good transactions
Anomalous activity indicates fraud
Can pass transaction on to a human for investigation
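A deliberately simple sketch of "train on good transactions, flag what looks different": it models normal spending as a mean and standard deviation and flags amounts more than three standard deviations away. The data and the threshold of 3 are hypothetical, and real fraud systems use far richer features and models.

```python
import statistics

# Hypothetical amounts (in dollars) of known-good transactions for one card.
GOOD_TRANSACTIONS = [42.0, 38.5, 55.0, 47.25, 40.0, 51.0, 45.5, 39.0]

def fit(amounts):
    """Model 'normal' spending as a mean and standard deviation."""
    return statistics.mean(amounts), statistics.stdev(amounts)

def is_anomalous(amount, model, threshold=3.0):
    """Flag amounts more than `threshold` standard deviations from the mean."""
    mean, stdev = model
    return abs(amount - mean) / stdev > threshold

model = fit(GOOD_TRANSACTIONS)
for amount in (48.0, 2500.0):
    status = "send to investigator" if is_anomalous(amount, model) else "ok"
    print(f"${amount:.2f}: {status}")
```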
ANOMALY DETECTION EXAMPLE: NETWORK INTRUSION
Train model on network login activity
Anomalous activity indicates a threat
Can initiate alerts and lockdown procedures
ANOMALY DETECTION: QUIZ
Is anomaly detection supervised or unsupervised?
Unsupervised because we only train on normal data
FEATURE EXTRACTION
Converting data to feature vectors
Natural Language Processing
Principal Component Analysis
Auto-Encoders
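As one example of unsupervised feature extraction, Principal Component Analysis finds the direction of greatest variance and projects each point onto it, turning each 2-D point into a single feature. This sketch uses power iteration on the covariance matrix, assuming hypothetical data along the line y = 2x; libraries typically use an SVD instead.

```python
import math

# Hypothetical 2-D data that lies along the line y = 2x.
DATA = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

def first_principal_component(rows, iterations=100):
    """Top eigenvector of the covariance matrix (power iteration) and the 1-D features."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # starting guess; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    features = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, features

component, features = first_principal_component(DATA)
print("direction:", [round(x, 4) for x in component])
print("1-D features:", [round(f, 3) for f in features])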
FEATURE EXTRACTION: QUIZ
Is feature extraction supervised or unsupervised?
Unsupervised
MACHINE LEARNING WORKFLOW
DEEP LEARNING USED FOR
Feature Extraction
Classification
Regression
DEEP LEARNING FRAMEWORKS
TensorFlow: NN library from Google
Theano: Low-level GPU-enabled tensor library
Torch7: NN library, uses Lua for binding, used by Facebook and Google
Caffe: NN library by the Berkeley Vision and Learning Center (BVLC)
Nervana: Fast GPU-based machines optimized for deep learning
Keras, Lasagne, Blocks: NN libraries that make Theano easier to use
CUDA: Programming model for using GPUs in general-purpose programming
cuDNN: NN library by Nvidia based on CUDA, can be used with Torch7, Caffe
Chainer: NN library that uses CUDA
DEEP LEARNING PROGRAMMING LANGUAGES
All the frameworks support Python, except Torch7, which uses Lua as its binding language
TENSORFLOW
TensorFlow was originally developed by the Google Brain team
Allows using GPUs for deep learning algorithms
Single-machine version released in 2015
Distributed (multi-machine) version released in March 2016
KERAS
Supports Theano and TensorFlow as back-ends
Provides a deep learning API on top of TensorFlow
TensorFlow provides low-level matrix operations
TENSORFLOW: GEOFFREY HINTON, JEFF DEAN
KERAS: FRANCOIS CHOLLET
NEURAL NETWORKS
WHAT IS A NEURON?
Receives signals through its synapses
When triggered, sends a signal along its axon
MATHEMATICAL NEURON
Mathematical abstraction, inspired by the biological neuron
Either on or off, based on the sum of its inputs
MATHEMATICAL FUNCTION
A neuron is a mathematical function
It adds up its (weighted) inputs and applies a sigmoid (or another function)
This determines whether it fires or not
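A single neuron as described above fits in a few lines: a weighted sum of the inputs plus a bias, passed through a sigmoid. The weights below are hand-picked (hypothetical, not learned) so that the neuron behaves like a logical AND, firing only when both inputs are on.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, squashed by a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hand-picked (hypothetical) weights that make the neuron act like a logical AND.
AND_WEIGHTS, AND_BIAS = [20.0, 20.0], -30.0
for a in (0, 1):
    for b in (0, 1):
        out = neuron([a, b], AND_WEIGHTS, AND_BIAS)
        print(a, b, round(out, 4), "fires" if out > 0.5 else "does not fire")
```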
WHAT ARE NEURAL NETWORKS?
Biologically inspired machine learning algorithm
Mathematical neurons arranged in layers
Accumulate signals from the previous layer
Fire when signal reaches threshold
NEURAL NETWORKS
NEURON INCOMING
Each neuron receives signals from neurons in the previous layer
Signals are affected by weights
Some are more important than others
Bias is the base signal that the neuron receives
NEURON OUTGOING
Each neuron sends its signal to the neurons in the next layer
Signals are affected by weights
LAYERED NETWORK
Each layer looks at features identified by the previous layer
US ELECTIONS
ELECTIONS
Consider the elections
This is a gated system
A way to aggregate different views
HIGHEST LEVEL: STATES
NEXT LEVEL: COUNTIES
ELECTIONS
Is this a neural network?
How many layers does it have?
NEURON LAYERS
The nomination is the last layer, layer N
States are layer N-1
Counties are layer N-2
Districts are layer N-3
Individuals are layer N-4
Individual brains have even more layers
GRADIENT DESCENT
TRAINING: HOW DO WE IMPROVE?
Calculate the error relative to the desired goal
Increase the weights of neurons that voted right
Decrease the weights of neurons that voted wrong
This reduces the error
GRADIENT DESCENT
This algorithm is called gradient descent
Think of the error as a function of the weights
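Written as a formula, assuming a squared-error measure E over the network outputs y_k with targets t_k and a learning rate η (this notation is introduced here, not on the slide), each weight moves a small step against its partial derivative:

```latex
E(\mathbf{w}) = \frac{1}{2} \sum_{k} \bigl( y_k(\mathbf{w}) - t_k \bigr)^2
\qquad
w_i \leftarrow w_i - \eta \, \frac{\partial E}{\partial w_i}
```

Weights whose increase would raise the error get decreased, and vice versa, which is the "reward neurons that voted right" idea made precise.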
FEED FORWARD
Also called forward propagation or forward prop
Initialize inputs
Calculate activation of each layer
Calculate activation of output layer
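The three feed-forward steps can be sketched as a loop over layers, each computed from the previous layer's activations. The 2-2-1 network below uses hand-picked, hypothetical weights (not learned) that compute XOR: one hidden neuron acts like OR, the other like NAND, and the output like AND.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def feed_forward(layers, inputs):
    """Compute each layer's activations from the previous layer's, ending at the output.
    `layers` is a list of (weights, biases); weights[j] holds neuron j's incoming weights."""
    activation = inputs
    for weights, biases in layers:
        activation = [sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
                      for row, b in zip(weights, biases)]
    return activation

# Hand-picked (not learned) weights for a 2-2-1 network computing XOR.
XOR_NET = [
    ([[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]),  # hidden layer: OR, NAND
    ([[20.0, 20.0]], [-30.0]),                        # output layer: AND
]

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(feed_forward(XOR_NET, x)[0], 3))
```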
BACK PROPAGATION
Use forward prop to calculate the error
The error is a function of all network weights
Adjust the weights using gradient descent
Repeat with the next record
Keep going over the training set until convergence
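A minimal sketch of these steps for a tiny sigmoid network (2 inputs, 3 hidden neurons, 1 output) learning XOR one record at a time. The layer sizes, learning rate, epoch count, and squared-error loss are illustrative assumptions; with other seeds a network this small can occasionally get stuck in a poor local minimum.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_network(n_inputs, n_hidden, seed=0):
    """Small random weights for inputs -> hidden -> 1 output."""
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b_hidden": [0.0] * n_hidden,
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": 0.0,
    }

def forward(net, x):
    """Forward prop: returns hidden activations and the output activation."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(sum(w * h for w, h in zip(net["w_out"], hidden)) + net["b_out"])
    return hidden, output

def total_error(net, data):
    """Sum of 0.5 * (output - target)^2 over all records."""
    return sum(0.5 * (forward(net, x)[1] - t) ** 2 for x, t in data)

def train_on_record(net, x, target, learning_rate):
    """One back-propagation step (gradient descent) for a single record."""
    hidden, output = forward(net, x)
    # Error signal at the output neuron: dE/dz for E = 0.5 * (output - target)^2.
    delta_out = (output - target) * output * (1 - output)
    # Propagate the error back to each hidden neuron (uses the old output weights).
    delta_hidden = [delta_out * w * h * (1 - h) for w, h in zip(net["w_out"], hidden)]
    # Gradient-descent updates: move each weight against its partial derivative.
    net["w_out"] = [w - learning_rate * delta_out * h for w, h in zip(net["w_out"], hidden)]
    net["b_out"] -= learning_rate * delta_out
    for j, delta in enumerate(delta_hidden):
        net["w_hidden"][j] = [w - learning_rate * delta * xi
                              for w, xi in zip(net["w_hidden"][j], x)]
        net["b_hidden"][j] -= learning_rate * delta

XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
net = init_network(n_inputs=2, n_hidden=3)
initial_error = total_error(net, XOR)
for epoch in range(5000):      # keep going over the training set
    for x, target in XOR:      # one record at a time
        train_on_record(net, x, target, learning_rate=0.5)
final_error = total_error(net, XOR)
print(f"error before: {initial_error:.4f}, after: {final_error:.4f}")
for x, target in XOR:
    print(x, target, round(forward(net, x)[1], 3))
```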
HOW DO YOU FIND THE MINIMUM IN AN N-DIMENSIONAL SPACE?
Take a step in the direction of steepest descent.
The steepest direction is given by the gradient, the vector of all partial derivatives; step against it to go downhill.
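A minimal sketch of stepping against the gradient, assuming a hypothetical two-weight error surface with its minimum at (3, -1). In a real network the gradient comes from back propagation rather than a hand-written formula.

```python
def error(w):
    """Hypothetical error surface with its minimum at w = (3, -1)."""
    return (w[0] - 3) ** 2 + 2 * (w[1] + 1) ** 2

def gradient(w):
    """Vector of partial derivatives of error() with respect to each weight."""
    return [2 * (w[0] - 3), 4 * (w[1] + 1)]

def gradient_descent(w, learning_rate=0.1, steps=100):
    for _ in range(steps):
        g = gradient(w)
        # Step against the gradient: the direction of steepest descent.
        w = [wi - learning_rate * gi for wi, gi in zip(w, g)]
    return w

w = gradient_descent([0.0, 0.0])
print([round(x, 6) for x in w], error(w))
```

The learning rate trades speed for stability: too large a step overshoots the minimum, too small a step converges slowly.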
PUTTING ALL THIS TOGETHER
Use forward prop to activate
Use back prop to train
Then use forward prop to test
TYPES OF NEURONS
SIGMOID
TANH
RELU
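The three activation functions named above, as plain Python functions (standard definitions; the sample inputs are arbitrary):

```python
import math

def sigmoid(z):
    """Squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Squashes z into (-1, 1); zero-centered, unlike sigmoid."""
    return math.tanh(z)

def relu(z):
    """Passes positive values through unchanged and clips negatives to 0."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  tanh={tanh(z):+.3f}  relu={relu(z):.1f}")
```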
BENEFITS OF RELU
Popular
Reached the same training error about 6x faster than tanh in Krizhevsky et al.'s experiments
Faster to compute, since it is piecewise linear rather than exponential
Drawback: can die by getting stuck at zero
Pro: Sparse matrix
Con: Network can die
LEAKY RELU
Pro: Does not die
Con: Matrix is not sparse
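The pro and con trace back to the gradients. ReLU's gradient is exactly zero for negative inputs, so a neuron stuck there receives no updates (it "dies") but also outputs zeros (sparse). Leaky ReLU keeps a small slope, assumed here to be alpha = 0.01, a common but not universal choice, so its gradient never vanishes and its outputs are rarely exactly zero.

```python
def relu(z):
    return max(0.0, z)

def relu_gradient(z):
    """Zero for negative inputs: a neuron stuck there stops learning ("dies")."""
    return 1.0 if z > 0 else 0.0

def leaky_relu(z, alpha=0.01):
    """Negative inputs keep a small slope alpha instead of being clipped to 0."""
    return z if z > 0 else alpha * z

def leaky_relu_gradient(z, alpha=0.01):
    """Never exactly zero, so the neuron can still recover."""
    return 1.0 if z > 0 else alpha

for z in (-3.0, 2.0):
    print(z, relu(z), relu_gradient(z), leaky_relu(z), leaky_relu_gradient(z))
```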
SOFTMAX
Final layer of network used for classification
Turns output into a probability distribution
Normalizes output of neurons to sum to 1
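A minimal softmax sketch: exponentiate each output score and divide by the total so the results sum to 1. Subtracting the largest score first is a standard trick that avoids overflow without changing the result; the three scores are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw output-layer scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical scores for three classes
print([round(p, 3) for p in probs], sum(probs))
```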
HYPERPARAMETER TUNING
PROBLEM: OIL EXPLORATION
Drilling holes is expensive
We want to find the biggest oilfield without wasting money on duds
Where should we plant our next oilfield derrick?
PROBLEM: NEURAL NETWORKS
Testing hyperparameters is expensive
We have an N-dimensional grid of parameters
How can we quickly zero in on the best combination of hyperparameters?
HYPERPARAMETER EXAMPLE
How many layers should we have?
How many neurons should we have in hidden layers?
Should we use Sigmoid, Tanh, or ReLU?
Should we initialize
ALGORITHMS
Grid
Random
Bayesian Optimization
GRID
Systematically search the entire grid
Remember the best found so far
RANDOM
Randomly search the grid
Remember the best found so far
Bergstra and Bengio’s result and Alice Zheng’s explanation (see References)
60 random samples find a point in the top 5% of the grid with 95% probability
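A sketch contrasting the two strategies, assuming a hypothetical `validation_score` function that stands in for "train the network with these hyperparameters and measure validation accuracy". It also computes where the 60-sample rule of thumb comes from: each random sample misses the top 5% with probability 0.95, so n samples all miss with probability 0.95^n, and the smallest n with 1 - 0.95^n >= 0.95 is 59, which rounds up to the slide's 60.

```python
import itertools
import math
import random

def validation_score(learning_rate, hidden_units):
    """Hypothetical stand-in for 'train the network, return validation score'.
    Peaks (score 0) at learning_rate=0.01, hidden_units=64."""
    return -(math.log10(learning_rate) + 2) ** 2 - ((hidden_units - 64) / 64) ** 2

LEARNING_RATES = [10.0 ** -e for e in range(1, 6)]  # 0.1 down to 0.00001
HIDDEN_UNITS = [16, 32, 64, 128, 256]

def grid_search():
    """Systematically try every grid point, remembering the best so far."""
    best = None
    for lr, units in itertools.product(LEARNING_RATES, HIDDEN_UNITS):
        score = validation_score(lr, units)
        if best is None or score > best[0]:
            best = (score, lr, units)
    return best

def random_search(n_samples=60, seed=0):
    """Try n_samples random points, remembering the best so far."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        lr = 10.0 ** rng.uniform(-5, -1)  # log-uniform learning rate
        units = rng.randint(16, 256)
        score = validation_score(lr, units)
        if best is None or score > best[0]:
            best = (score, lr, units)
    return best

def samples_needed(top_fraction=0.05, confidence=0.95):
    """Smallest n with 1 - (1 - top_fraction)**n >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - top_fraction))

print("grid:  ", grid_search())
print("random:", random_search())
print("samples for top 5% with 95% probability:", samples_needed())
```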
BAYESIAN OPTIMIZATION
Balance between explore and exploit
Exploit: test spots within the explored perimeter
Explore: test new spots in random locations
Balance the trade-off
SIGOPT
YC-backed SF startup
Founded by Scott Clark
Raised $2M
Sells a cloud-based proprietary variant of Bayesian Optimization
BAYESIAN OPTIMIZATION PRIMER
Bayesian Optimization Primer by Ian Dewancker, Michael McCourt, Scott Clark
See References
OPEN SOURCE VARIANTS
Open source alternatives:
Spearmint
Hyperopt
SMAC
MOE
PRODUCTION
DEPLOYING
Phases: training, deployment
The training phase runs on back-end servers
Optimize hyperparameters on the back-end
Deploy the model to front-end servers, browsers, devices
The front-end only uses forward prop and is fast
SERIALIZING/DESERIALIZING MODEL
Back-end: Serialize model + weights
Front-end: Deserialize model + weights
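A dependency-free sketch of the round trip, assuming a hypothetical one-layer model. Keras has its own methods for this; here the standard `json` module stands in for both the architecture file and the weights file, and the front-end runs forward propagation only.

```python
import json
import math

# Hypothetical trained model: one dense layer with two inputs and a sigmoid output.
architecture = {"layers": [{"type": "dense", "units": 1, "activation": "sigmoid"}]}
weights = {"kernel": [20.0, 20.0], "bias": -30.0}

# Back-end: serialize the architecture and the weights (to strings here; files in practice).
architecture_json = json.dumps(architecture)
weights_json = json.dumps(weights)

# Front-end: deserialize both and run forward propagation only.
loaded_architecture = json.loads(architecture_json)
loaded_weights = json.loads(weights_json)

def predict(arch, params, inputs):
    """Forward pass for the single dense sigmoid layer described by `arch`."""
    if arch["layers"][0]["activation"] != "sigmoid":
        raise ValueError("this sketch only handles a sigmoid dense layer")
    z = sum(w * x for w, x in zip(params["kernel"], inputs)) + params["bias"]
    return 1.0 / (1.0 + math.exp(-z))

print(predict(loaded_architecture, loaded_weights, [1, 1]))
```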
HDF5
Keras serializes the model architecture to JSON
Keras serializes the weights to HDF5
HDF5: a file format and data model for hierarchical data
APIs for C++, Python, Java, etc.
https://www.hdfgroup.org
DEPLOYMENT EXAMPLE: CANCER DETECTION
Rhobota.com’s cancer-detecting iPhone app
Developed by Bryan Shaw after his son’s illness
Model built on back-end, deployed on iPhone
iPhone detects retinal cancer
DEEP LEARNING
WHAT IS DEEP LEARNING?
Deep learning is a learning method that can train the system with more than 2 or 3 non-linear hidden layers.
WHAT IS DEEP LEARNING?
Machine learning techniques that enable unsupervised feature learning and pattern analysis/classification.
The essence of deep learning is to compute representations of the data.
Higher-level features are defined from lower-level ones.
HOW IS DEEP LEARNING DIFFERENT FROM REGULAR NEURAL NETWORKS?
Training neural networks requires applying gradient descent in millions of dimensions.
This is intractable for large networks.
Deep learning places constraints on neural networks.
This allows them to be solved iteratively.
The constraints are generic.
AUTO-ENCODERS
WHAT ARE AUTO-ENCODERS?
An auto-encoder is a learning algorithm
It applies backpropagation and sets the target values to be equal to its inputs
In other words, it trains itself to do the identity transformation
WHY DOES IT DO THIS?
The auto-encoder places constraints on itself
E.g., it restricts the number of hidden neurons
This allows it to find a good representation of the data
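A minimal sketch of both ideas together: the training target is the input itself, and the hidden layer is restricted to a single neuron, so the network must learn a compact 1-number code. It assumes a linear 2-1-2 auto-encoder and hypothetical 2-D data lying on one line; practical auto-encoders are larger and usually non-linear.

```python
import random

# Hypothetical 2-D inputs that all lie on one line (direction (0.6, 0.8)),
# so a single hidden neuron is enough to represent them.
DATA = [(0.6 * t, 0.8 * t) for t in (-1.0, -0.5, 0.5, 1.0)]

def encode(enc, x):
    """Hidden layer: one linear neuron, 2 numbers -> 1 number."""
    return enc[0] * x[0] + enc[1] * x[1]

def reconstruction_error(enc, dec, data):
    """Mean squared difference between each input and its reconstruction."""
    total = 0.0
    for x in data:
        h = encode(enc, x)
        total += sum((h * d - xi) ** 2 for d, xi in zip(dec, x))
    return total / len(data)

def train_autoencoder(data, epochs=500, learning_rate=0.1, seed=0):
    """Backpropagation where the target for every input is the input itself."""
    rng = random.Random(seed)
    enc = [rng.uniform(-1, 1) for _ in range(2)]  # input -> hidden weights
    dec = [rng.uniform(-1, 1) for _ in range(2)]  # hidden -> output weights
    for _ in range(epochs):
        for x in data:
            h = encode(enc, x)
            residual = [h * d - xi for d, xi in zip(dec, x)]  # output minus target (= input)
            grad_dec = [2 * h * r for r in residual]
            back = 2 * sum(d * r for d, r in zip(dec, residual))  # error sent back to hidden
            grad_enc = [back * xi for xi in x]
            dec = [d - learning_rate * g for d, g in zip(dec, grad_dec)]
            enc = [e - learning_rate * g for e, g in zip(enc, grad_enc)]
    return enc, dec

initial_enc, initial_dec = train_autoencoder(DATA, epochs=0)  # untrained weights
enc, dec = train_autoencoder(DATA)
before = reconstruction_error(initial_enc, initial_dec, DATA)
after = reconstruction_error(enc, dec, DATA)
print(f"reconstruction error before: {before:.4f}, after: {after:.6f}")
print("1-number code for (0.6, 0.8):", round(encode(enc, (0.6, 0.8)), 3))
```

No labels are used anywhere: the inputs serve as their own targets.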
IS THE AUTO-ENCODER SUPERVISED OR UNSUPERVISED?
It is unsupervised.
The data is unlabeled.
WHAT ARE CONVOLUTIONAL NEURAL NETWORKS?
Feedforward neural networks
Connection pattern inspired by the visual cortex
CONVOLUTIONAL NEURAL NETWORKS
CNNS
The convolutional layer’s parameters are a set of learnable filters
Every filter is small along width and height
During the forward pass, each filter slides across the width and height of the input, producing a 2-dimensional activation map
As we slide across the input, we compute the dot product between the filter and the input
CNNS
Intuitively, the network learns filters that activate when they see a specific type of feature anywhere in the input
In this way it gains a degree of translation invariance
CONVNET EXAMPLE
Zero-padding: the boundaries are padded with 0s
Stride: how far the filter moves at each step of the convolution
Parameter sharing: each filter uses the same weights at every position in the input
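A plain-Python sketch of the sliding dot product with zero-padding and stride, assuming a hypothetical 4x4 image containing a vertical edge and a hand-made 2x2 edge filter (in a CNN the filter values would be learned). Like most CNN libraries, it computes cross-correlation, meaning the kernel is not flipped. The output width follows (W - F + 2P) / S + 1 for input width W, filter size F, padding P, and stride S.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image` and take a dot product at each position."""
    if padding:
        width = len(image[0]) + 2 * padding
        image = ([[0] * width for _ in range(padding)]
                 + [[0] * padding + list(row) + [0] * padding for row in image]
                 + [[0] * width for _ in range(padding)])
    k_rows, k_cols = len(kernel), len(kernel[0])
    output = []
    for top in range(0, len(image) - k_rows + 1, stride):
        output_row = []
        for left in range(0, len(image[0]) - k_cols + 1, stride):
            # The same kernel weights are reused at every position (parameter sharing).
            output_row.append(sum(kernel[i][j] * image[top + i][left + j]
                                  for i in range(k_rows) for j in range(k_cols)))
        output.append(output_row)
    return output

# Hypothetical 4x4 image with a vertical edge, and a 2x2 vertical-edge filter.
IMAGE = [[0, 0, 1, 1] for _ in range(4)]
EDGE_FILTER = [[-1, 1], [-1, 1]]

print(conv2d(IMAGE, EDGE_FILTER))             # strongest response where the edge is
print(len(conv2d(IMAGE, EDGE_FILTER, padding=1)))  # (4 - 2 + 2*1) / 1 + 1 rows
```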
CONVNET EXAMPLE
From http://cs231n.github.io/convolutional-networks/
WHAT IS A POOLING LAYER?
The pooling layer further reduces the resolution of the image
Max pooling tiles the input with 2x2 windows and keeps the maximum activation value in each window
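A sketch of 2x2 max pooling with stride 2 on a hypothetical 4x4 activation map; each non-overlapping window collapses to its maximum, halving width and height.

```python
def max_pool_2x2(feature_map):
    """Tile the map with non-overlapping 2x2 windows and keep each window's maximum.
    A trailing odd row or column is dropped."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]

FEATURE_MAP = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
print(max_pool_2x2(FEATURE_MAP))  # 4x4 -> 2x2
```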
REVIEW
keras/examples/mnist_cnn.py
Recognizes hand-written digits
By combining different layers
RECURRENT NEURAL NETWORKS
RNNS
RNNs capture patterns in time series data
Constrained by shared weights across time steps
Each copy of the neuron observes a different time step
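A minimal sketch of the weight sharing, assuming a single scalar recurrent neuron with a tanh activation and hypothetical weights. The same three parameters are applied at every time step, and the state carries information forward. With these weights the effect of an input spike fades step by step, which hints at why plain RNNs struggle with long time lags.

```python
import math

def rnn_forward(sequence, w_input, w_hidden, bias, h0=0.0):
    """Scalar recurrent neuron: the same three weights are shared by every time step."""
    h = h0
    states = []
    for x in sequence:
        # New state depends on the current input and on the previous state.
        h = math.tanh(w_input * x + w_hidden * h + bias)
        states.append(h)
    return states

# Hypothetical weights; a single input spike at t=0 followed by zeros.
states = rnn_forward([1.0, 0.0, 0.0, 0.0], w_input=1.0, w_hidden=0.5, bias=0.0)
print([round(s, 3) for s in states])
```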
LSTMS
Long Short-Term Memory networks
Plain RNNs struggle with long time lags between events
LSTMs can pick up patterns separated by big lags
Used for speech recognition
RNN EFFECTIVENESS
Andrej Karpathy uses LSTMs to generate text
Generates Shakespeare, Linux kernel code, mathematical proofs
See http://karpathy.github.io/
RNN INTERNALS
LSTM INTERNALS
CONCLUSION
REFERENCES
Bayesian Optimization Primer by Dewancker et al. (http://sigopt.com)
Random Search for Hyper-Parameter Optimization by Bergstra and Bengio (http://jmlr.org)
Evaluating Machine Learning Models by Alice Zheng (http://www.oreilly.com)
Dropout by Hinton et al. (http://cs.utoronto.edu)
Understanding LSTM Networks by Chris Olah (http://github.io)
Multi-scale Deep Learning for Gesture Detection and Localization by Neverova et al. (http://uoguelph.ca)
The Unreasonable Effectiveness of Recurrent Neural Networks by Andrej Karpathy (http://karpathy.github.io)
QUESTIONS