Machine Learning @ Microsoft
Stanford Scaled Machine Learning Conference
August 2nd, 2016. Qi Lu, Applications & Services Group, Microsoft
Agenda
§ What We Do
  § History
  § Going Forward
§ How We Scale
  § CNTK
  § FPGA
  § Open Mind
§ Q&A
What We Do
§ Microsoft Research formed
§ Hotmail launches
§ Bing Maps launches
§ Bing search launches
§ Kinect launches
§ Skype Translator launches
§ Azure Machine Learning GA
§ Office 365 Substrate
§ HoloLens
Timeline: 1991, 1997, 2008, 2009, 2010, 2014, 2015
Machine learning is pervasive throughout Microsoft products
ML @ Microsoft: History
Answering questions with experience
§ Which email is junk?
§ What's the best way home?
§ Which URLs are most relevant?
§ What does that motion "mean"?
§ What is that person saying?
§ What will happen next?
ML @ Microsoft: Going Forward
§ Data => Model => Intelligence => Fuels of Innovation
§ Applications & Services
  § Office 365, Dynamics 365 (Biz SaaS), Skype, Bing, Cortana
  § Digital Work & Digital Life
  § Models for: World, Organizations, Users, Languages, Context, …
§ Computing Devices
  § PC, Tablet, Phone, Wearable, Xbox, HoloLens (AR/VR), …
  § Models for: Natural User Interactions, Reality, …
§ Cloud
  § Azure Infrastructure and Platform
  § Azure ML Tools & Services
  § Intelligence Services
Machine Learning Building Blocks
Azure ML (Cloud)
§ Ease of use through visual workflows
§ Single-click operationalization
§ Expand reach with Gallery and marketplace
§ Integration with Jupyter Notebook
§ Integration with R/Python
Microsoft R Server (On-Prem & Cloud)
§ Enterprise scale & performance
§ Write once, deploy anywhere
§ R Tools for Visual Studio IDE
§ Secure/scalable operationalization
§ Works with open-source R
Computational Network Toolkit
§ Designed for peak performance
§ Works on CPU and GPU (single/multi)
§ Supports popular network types (FNN, CNN, LSTM, RNN)
§ Highly flexible network-description language
§ Used to build Cognitive APIs
Cognitive APIs (Cloud Services)
§ See, hear, interpret, and interact
§ Prebuilt APIs built with CNTK and experts
§ Vision, Speech, Language, Knowledge, …
§ Build and connect intelligent bots
§ Interact with your users on SMS, text, email, Slack, Skype
HDInsight/Spark
§ Open-source Hadoop with Spark
§ Use Spark ML or MLlib with Java, Python, Scala, or R
§ Support for Zeppelin and Jupyter notebooks
§ Includes MRS over Hadoop or over Spark
§ Train on TBs of data
§ Run large, massively parallel compute and data jobs
Azure Machine Learning Services
§ Easy-to-use tools with a drag/drop paradigm, single-click operationalization
§ Built-in support for statistical functions; data ingest, transform, feature generation/selection, train, score, and evaluate for tabular data and text, across classification, clustering, recommendation, and anomaly detection
§ Seamless R/Python integration, along with support for SQLite to filter and transform
§ Jupyter Notebooks for data exploration and Gallery extensions for quick starts
§ Modules for text preprocessing, key-phrase extraction, language detection, n-gram generation, LDA, compressed feature hashing, stats-based anomaly detection
§ Spark/HDInsight/MRS integration
§ GPU support
§ New geographies
§ Compute reservation
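An operationalized Azure ML model is consumed as a REST endpoint. The sketch below shows the general shape of such a call; the URL, API key, and column names are placeholders, and the payload layout follows the classic request-response format as best recalled, so check the service's own generated sample code for the exact schema.

```python
import json
import urllib.request

# Placeholders: substitute the endpoint and key shown on your service's
# dashboard. Column names here are purely illustrative.
URL = "https://example.azureml.net/workspaces/<id>/services/<id>/execute"
API_KEY = "<your-api-key>"

payload = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["sepal_length", "sepal_width"],
            "Values": [[5.1, 3.5]],
        }
    },
    "GlobalParameters": {},
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer " + API_KEY},
)
# urllib.request.urlopen(req)  # not executed here: no live endpoint
print(json.loads(req.data)["Inputs"]["input1"]["Values"][0])
```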
Intelligence Suite
§ Action: Web, Mobile, Bots
§ Intelligence: Cortana, Bot Framework, Cognitive Services
§ Dashboards & Visualizations: Power BI
§ Information Management: Event Hubs, Data Catalog, Data Factory
§ Machine Learning and Analytics: HDInsight (Hadoop and Spark), Stream Analytics, Data Lake Analytics, Machine Learning
§ Big Data Stores: SQL Data Warehouse, Data Lake Store
§ Data
How We Scale
Key Dimensions of Scaling
§ Data volume/dimension
§ Model/algorithm complexity
§ Training/evaluation time
§ Deployment/update velocity
§ Developer productivity/innovation agility
§ Infrastructure/platform
§ Software framework/tool
§ Dataset/algorithm
How We Scale Example: CNTK
CNTK: Computational Network Toolkit
§ CNTK is Microsoft's open-source, cross-platform toolkit for learning and evaluating models, especially deep neural networks
§ CNTK expresses (nearly) arbitrary neural networks by composing simple building blocks into complex computational networks, supporting common network types and applications
§ CNTK is production-deployed: accuracy, efficiency, and scales to multi-GPU/multi-server
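The composition idea above can be sketched in a few lines: each building block supplies a forward rule and a backward rule, and chaining blocks yields a computational network that is trained by propagating gradients back through the chain. This is a minimal NumPy illustration of the concept, not the CNTK API.

```python
import numpy as np

# Building blocks: each node knows how to compute forward and how to
# pass gradients backward (what CNTK automates via autodifferentiation).
class Linear:
    def __init__(self, n_in, n_out, rng):
        self.W = rng.standard_normal((n_in, n_out))
        self.b = np.zeros(n_out)
    def forward(self, x):
        self.x = x
        return x @ self.W + self.b
    def backward(self, grad, lr):
        grad_in = grad @ self.W.T
        self.W -= lr * self.x.T @ grad   # SGD update on the way back
        self.b -= lr * grad.sum(axis=0)
        return grad_in

class Sigmoid:
    def forward(self, x):
        self.y = 1.0 / (1.0 + np.exp(-x))
        return self.y
    def backward(self, grad, lr):
        return grad * self.y * (1.0 - self.y)

def predict(net, X):
    h = X
    for node in net:
        h = node.forward(h)
    return h

# Compose four blocks into a small feed-forward net and train it on XOR.
rng = np.random.default_rng(0)
net = [Linear(2, 8, rng), Sigmoid(), Linear(8, 1, rng), Sigmoid()]
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([[0], [1], [1], [0]], dtype=float)

loss_before = np.mean((predict(net, X) - t) ** 2)
for _ in range(4000):
    grad = predict(net, X) - t           # d(MSE)/d(output), up to a constant
    for node in reversed(net):
        grad = node.backward(grad, lr=0.5)
loss_after = np.mean((predict(net, X) - t) ** 2)
print(loss_before, "->", loss_after)     # loss drops as the net trains
```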
CNTK Development
§ Open-source development model inside and outside the company
  § Created by Microsoft Speech researchers 4 years ago; open-sourced in early 2015
  § On GitHub since Jan 2016 under a permissive license
  § Nearly all development is out in the open
§ Driving applications: Speech, Bing, HoloLens, MSR research
  § Each team has full-time employees actively contributing to CNTK
  § CNTK-trained models are tested and deployed in production environments
§ External contributions
  § e.g., from MIT and Stanford
§ Platforms and runtimes
  § Linux, Windows, .Net, Docker, cuDNN 5
  § Python, C++, and C# APIs coming soon
CNTK Design Goals & Approach
§ A deep learning framework that balances
  § Efficiency: can train production systems as fast as possible
  § Performance: can achieve best-in-class performance on benchmark tasks for production systems
  § Flexibility: can support a growing and wide variety of tasks such as speech, vision, and text; can try out new ideas very quickly
§ Lego-like composability
  § Supports a wide range of networks
  § E.g. feed-forward DNN, RNN, CNN, LSTM, DSSM, sequence-to-sequence
§ Evolve and adapt
  § Design for emerging prevailing patterns
Key Functionalities & Capabilities
§ Supports
  § CPU and GPU, with a focus on GPU clusters
  § Automatic numerical differentiation
  § Efficient static and recurrent network training through batching
  § Data parallelization within and across machines, e.g., 1-bit quantized SGD
  § Memory sharing during execution planning
§ Modularization with separation of
  § Computational networks
  § Execution engine
  § Learning algorithms
  § Model description
  § Data readers
§ Model descriptions via
  § Network definition language (NDL) and model editing language (MEL)
  § BrainScript (beta) with easy-to-understand syntax
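The 1-bit quantized SGD mentioned above compresses gradient exchange in data-parallel training: each worker transmits only the sign of each gradient element plus one scale per tensor, and keeps the quantization error as a residual added back before the next quantization (error feedback). A small sketch of the idea, not CNTK's implementation:

```python
import numpy as np

def one_bit_quantize(grad, residual):
    g = grad + residual                  # error feedback from the last step
    scale = np.abs(g).mean()             # one scalar sent with the bit mask
    q = np.where(g >= 0, scale, -scale)  # 1 bit per element, reconstructed
    return q, g - q                      # transmitted value, new residual

rng = np.random.default_rng(1)
grad = rng.standard_normal(8)
q, residual = one_bit_quantize(grad, np.zeros_like(grad))

# Only two values (+/- scale) cross the wire, and the leftover error is
# strictly smaller than the gradient: its squared norm shrinks by
# n * scale**2, so nothing is lost on average across many steps.
print(np.linalg.norm(residual) < np.linalg.norm(grad))  # → True
```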
Architecture
Roadmap
§ CNTK as a library
  § More language support: Python/C++/C#/.Net
§ More expressiveness
  § Nested loops, sparse support
§ Finer control of the learner
  § SGD with non-standard loops, e.g., RL
§ Larger models
  § Model parallelism, memory swapping, 16-bit floats
§ More powerful CNTK service on Azure
  § GPUs soon; longer term with cluster, container, new HW (e.g., FPGA)
How We Scale Example: FPGA
Catapult v2 Architecture
§ Gives substantial acceleration flexibility
  § Can act as a local compute accelerator
  § Can act as a network/storage accelerator
  § Can act as a remote compute accelerator
WCS Gen4.1 blade with Mellanox NIC and Catapult FPGA
[Hardware diagram: WCS 2.0 server blade (Mt. Hood) with the Catapult v2 WCS mezzanine card (Pikes Peak) on the option-card mezzanine connectors of the WCS tray backplane. Two CPUs with DRAM connect over QPI; the FPGA, with its own DRAM, sits between the CPUs (PCIe Gen3 2x8 / Gen3 x8) and the NIC, exposing QSFP ports at 40 Gb/s toward the switch.]
Configurable Clouds
§ Cloud becomes network + FPGAs attached to servers
§ Can continuously upgrade/change datacenter HW protocols (network, storage, security)
§ Can also use as an application acceleration plane (Hardware Acceleration as a Service, HaaS)
§ Services communicate with no SW intervention (LTL)
§ Single workloads (including deep learning) can grab 10s, 100s, or 1000s of FPGAs
§ Can create service pools as well for high throughput
[Datacenter diagram: ToR switches over servers (CS), with pooled FPGA services spanning machines: network acceleration, Bing Ranking HW, text to speech, and large-scale deep learning, alongside Bing Ranking SW.]
Scalable Deep Learning on FPGAs
§ ScaleML Engine: a flexible DNN accelerator on FPGA
  § Fully programmable via software and a customizable ISA
  § Over 10x improvement in energy efficiency, cost, and latency versus CPU
§ Deployable as large-scale DNN service pools via HaaS
  § Low-latency communication in a few microseconds/hop
  § Large-scale models at ultra-low latencies
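"Programmable via software and a customizable ISA" means the host compiles a network into a short instruction stream, and the FPGA's functional units decode and execute it. The toy interpreter below illustrates that split; the instruction names and register scheme are invented for illustration, not the actual ScaleML ISA.

```python
import numpy as np

def run(program, regs, weights):
    """Decode-and-dispatch loop standing in for the FPGA's control logic."""
    for op, *args in program:
        if op == "MVM":                  # matrix-vector multiply unit
            dst, w, src = args
            regs[dst] = weights[w] @ regs[src]
        elif op == "RELU":               # activation functional unit
            dst, src = args
            regs[dst] = np.maximum(regs[src], 0.0)
        elif op == "ADD":
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
    return regs

# A two-layer network "compiled" to the toy ISA.
weights = {"W0": np.array([[1.0, -1.0], [0.5, 2.0]]),
           "W1": np.array([[1.0, 1.0]])}
regs = {"v0": np.array([2.0, 1.0])}
program = [("MVM", "v1", "W0", "v0"),
           ("RELU", "v1", "v1"),
           ("MVM", "v2", "W1", "v1")]

out = run(program, regs, weights)["v2"]
print(out)  # → [4.]
```

Retargeting the engine to a new model means emitting a new instruction stream, not resynthesizing the FPGA, which is the flexibility the slide claims.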
[Diagram: an NN model mapped layer by layer (L0, L1) onto pools of FPGAs over HaaS; each ScaleML Engine contains an instruction decoder & control logic driving neural functional units.]
How We Scale Example: Open Mind
Open Mind Studio: the "Visual Studio" for machine learning
§ Data, model, algorithm, pipeline, experiment, and life-cycle management
Programming abstractions for machine learning / deep learning
Computation frameworks
§ CNTK
§ Other deep learning frameworks (e.g., Caffe, MXNet, TensorFlow, Theano, Torch)
§ Open-source computation frameworks (e.g., Hadoop, Spark)
§ Specialized, optimized computation frameworks (e.g., SCOPE, ChaNa)
§ The next new framework …
Federated infrastructure: data storage, compliance, resource management, scheduling, and deployment
Heterogeneous computing platform (CPU, GPU, FPGA, RDMA; cloud, client/device)
ChaNa: RDMA-Optimized Computation Framework
§ Focus on a faster network
  § Compact memory representation
  § Balanced parallelism
  § Highly optimized RDMA-aware communication primitives
  § Overlapping communication and computation
§ An order-of-magnitude improvement in early results
  § Over existing computation frameworks (with TCP)
  § Against several large-scale workloads in production
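One of the techniques named above, overlapping communication and computation, can be sketched with ordinary threads: a receiver thread (standing in for the RDMA transport) pre-fetches the next data partition into a bounded buffer while the main thread computes on the current one. The partition sizes and sleep are illustrative only.

```python
import queue
import threading
import time

def fetch(partitions, q):
    """Stand-in for the network: stream partitions into a bounded buffer."""
    for p in partitions:
        time.sleep(0.01)        # stands in for transfer latency
        q.put(p)
    q.put(None)                 # end-of-stream marker

partitions = [list(range(i, i + 4)) for i in range(0, 12, 4)]
q = queue.Queue(maxsize=2)      # double buffering: at most 2 in flight
threading.Thread(target=fetch, args=(partitions, q), daemon=True).start()

total = 0
while True:
    part = q.get()              # blocks only if compute outran the network
    if part is None:
        break
    total += sum(part)          # compute overlaps with the next fetch
print(total)                    # → 66
```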
Programming Abstraction for Machine Learning
§ Graph engines for distributed machine learning
§ Automatic system-level optimizations
  § Parallelization and distribution
  § Layout for efficient data access
  § Partitioning for balanced parallelism
§ Promising early results
  § Simplification of distributed ML programs via high-level abstractions
  § About 70-80% reduction in code, relative to ML systems such as Petuum, Parameter Server
  § Matrix factorization for recommendation systems
  § Latent Dirichlet allocation for topic modeling
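The matrix-factorization workload mentioned above illustrates why high-level abstractions shrink code so much: the per-rating math is just a few lines, and the graph engine's job is to partition and distribute exactly this update. A hedged single-machine NumPy sketch on synthetic data:

```python
import numpy as np

# Factor a ratings matrix R ≈ U @ V.T by SGD. Dimensions, learning rate,
# and regularization are illustrative; a graph engine would run the same
# per-(user, item) update over partitioned data.
rng = np.random.default_rng(0)
n_users, n_items, k = 20, 15, 4
R = rng.standard_normal((n_users, k)) @ rng.standard_normal((n_items, k)).T

U = 0.1 * rng.standard_normal((n_users, k))
V = 0.1 * rng.standard_normal((n_items, k))
lr, reg = 0.02, 0.01
for _ in range(300):
    for u in range(n_users):
        for i in range(n_items):
            err = R[u, i] - U[u] @ V[i]          # prediction error
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * U[u] - reg * V[i])

print(np.abs(R - U @ V.T).mean())                # small reconstruction error
```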
Q&A
Thank You!