What is Intel Nervana Graph?
@Vengineer
2017/05/22 (updated 2017/07/01, 08/12)
As usual, I dug into the source code.
Blog: Vengineerの戯言 (Vengineer's Musings)
http://blogs.yahoo.co.jp/verification_engineer
Twitter: @Vengineer
FPGA Magazine (No. 16/17): An Introduction to the FPGA Community
http://fpga.cqpub.co.jp/
About Me
SlideShare: https://www.slideshare.net/ssuser479fa3
This document is a compilation of publicly available information from various companies, found via Google searches.
Use it at your own risk.
On August 9, 2016, Intel acquired Nervana Systems for a reported $350+ million.
A two-year-old startup that had raised close to $25 million from investors; in other words, the investors exited at roughly 10x in two years.
Over $300 million in two years.
SoftBank Group's ARM acquisition was £24 billion, so this is roughly 1/100 of that.
Source: http://jp.techcrunch.com/2016/08/10/20160809intel-buys-deep-learning-startup-nervana-systems-for-a-reported-350-million/
Nervana Graph Compiler
Source: https://www.nervanasys.com/intel-nervana-graph-preview-release/
・Frontends: neon / TensorFlow / Caffe / Caffe2 / CNTK / MXNet
・Nervana Graph
・Transformers: CPU / GPU (CUDA)
Lowering
TensorFlow graph
Converted to an XLA graph
Code generation
JIT or AOT
Uses LLVM
Lowering
TensorFlow XLA
CPU / GPU (CUDA)
Nervana Graph Compiler and
TensorFlow XLA:
aren't these basically the same thing?
And then this came out:
https://www.intelnervana.com/intel-nervana-graph-and-neon-3-0-updates/
The connection between the XLA and Intel Nervana Graph APIs was quite straightforward given the similar projects’ intent for a compact and explicit intermediate representation.
While today the XLA/Intel Nervana Graph integration is at a pre-alpha level, we’d love for people to take it for a spin and kick the tires. We’re working on ironing out known performance issues and improving op and backend support.
Intel Nervana Graph Beta : 2017/6/22
Intel neon
neon
https://github.com/NervanaSystems/neon
The latest version is v1.9. (Same name as ARM's NEON, but unrelated.)
neon is Intel Nervana's reference deep learning framework committed to best performance on all hardware
Datasets
Images: MNIST, CIFAR-10, ImageNet 1K, PASCAL VOC, Mini-Places2
Text: IMDB, Penn Treebank, Shakespeare Text, bAbI, Hutter-prize
Video: UCF101
Others: flickr8k, flickr30k, COCO
neon vs cuDNN 4
“Not so fast, FFT”: Winograd (March 3, 2016)
Source: https://www.nervanasys.com/winograd/
cuDNN 5
Optimizing Recurrent Neural Networks in cuDNN 5 (April 6, 2016)
https://devblogs.nvidia.com/parallelforall/optimizing-recurrent-neural-networks-cudnn-5/
Faster forward and backward convolutions
using the Winograd convolution algorithm;
Accelerated with Winograd!
Fast Algorithms for Convolutional Neural Networks
Andrew Lavin, Scott Gray
https://arxiv.org/abs/1509.09308
Going beyond full utilization: The inside scoop on Nervana's Winograd kernels (June 29, 2016)
https://www.nervanasys.com/winograd-2/
neon v1.3 vs cuDNN v5.1
Still not slowing down: Benchmarking optimized Winograd implementations (July 25, 2016)
Source: https://www.nervanasys.com/winograd-3/
vs cuDNN v4 vs cuDNN v5.1
Scott Gray
https://twitter.com/scottgray76
High-Performance GPU kernels for deep learning
• Fast matrix multiply for small minibatches
• Direct convolution leveraging GEMM advances
• Even faster convolution with Winograd
Nervana (October 2014 – July 2017); now at OpenAI (since July 2017)
Source: http://on-demand.gputechconf.com/gtc/2016/presentation/s6485-scott-gray-gpu-programming-deep-learning.pdf
Intel Nervana Graph Compiler
Positioning of the Graph Compiler
Source: http://pc.watch.impress.co.jp/docs/news/1034408.html
MKL-DNN Support
Mar 23, 2017: after the acquisition by Intel
To install with Intel MKL-DNN support, first download MKL-DNN from here (https://github.com/01org/mkl-dnn) and follow the installation instructions there to install MKL-DNN. Set environment variable MKLDNN_ROOT to point to the installed location and follow the rest of the steps to install Ngraph.
Source: https://github.com/NervanaSystems/ngraph/commit/f3b7306214f40b4c1b4c40e3e223080797afb382
Transformer API
・Supports CPU and GPU
  Memory usage optimization passes
  Transformers allow users to register an included set of optional compiler passes for debug and visualization.
・GPU
  Automatic kernel fusion/compounding for increased performance
・A mechanism similar to LLVM passes
Source: https://github.com/NervanaSystems/ngraph/blob/master/README.md
Building graphs
・Nervana Graph structure
  Data Dependencies
  Initializers
  Non-data Control Dependencies
・General properties of ops
・Op Hierarchy
・Ops influencing evaluation
・Derivatives
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/building_graphs.rst
Future support?
・Nervana Graph serialization/deserialization
・Further improvements/abstractions to graph composability for usability/optimization
・Distributed, heterogeneous backend target support
・C APIs for interoperability to enable other languages to create/execute graphs
・Better debugging
・Support for model deployment
Source: https://github.com/NervanaSystems/ngraph/blob/master/README.md
From here on, let's dig into the source code of the Intel Nervana Graph Compiler.
ngraph
https://github.com/NervanaSystems/ngraph
Sample code
import ngraph as ng
import ngraph.transformers as ngt

x = ng.placeholder(())
x_plus_one = x + 1

transformer = ngt.make_transformer()
plus_one = transformer.computation(x_plus_one, x)

for i in range(5):
    print(plus_one(i))

Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/overview.rst
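Conceptually, the graph is a deferred expression and the transformer compiles it into a callable. To make that idea concrete, here is a toy analogue in plain Python (not ngraph code; the names are invented for illustration):

```python
# A toy "graph": nodes are closures evaluated lazily, mirroring how
# ngraph defers evaluation until a transformer compiles the graph.

def placeholder():
    # A node that reads its value from the environment at call time.
    return lambda env: env["x"]

def add_const(node, c):
    # Builds a new node on top of an existing one.
    return lambda env: node(env) + c

def make_computation(root):
    # "Compiles" the graph into a callable taking the placeholder value.
    return lambda value: root({"x": value})

x = placeholder()
x_plus_one = add_const(x, 1)
plus_one = make_computation(x_plus_one)

print([plus_one(i) for i in range(5)])  # [1, 2, 3, 4, 5]
```

The real transformer does much more (storage allocation, pass pipelines, code generation), but the user-facing shape is the same: build a graph, compile it, call it.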
Caffe example

from __future__ import print_function
import ngraph.transformers as ngt
from ngraph.frontends.caffe.cf_importer.importer import parse_prototxt

model = "sum.prototxt"
op_map = parse_prototxt(model, verbose=True)
op = op_map.get("D")

res = ngt.make_transformer().computation(op)()
print("Result is:", res)

Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/caffe.rst
TensorFlow example

x = tf.constant(1.)
y = tf.constant(2.)
f = x + y

importer = TFImporter()
importer.import_graph_def(tf.Session().graph_def)

f_ng = importer.get_op_handle(f)

transformer = ngt.make_transformer()
f_result = transformer.computation(f_ng)()
print(f_result)

Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/tensorflow.rst
Transformers & Computations
Transformers
Transformers convert a graph into a backend-specific executable format. From a graph, a transformer generates one or more Computations. The executable objects generated by a transformer are manipulated through those Computations.
Every transformer must implement a common abstract interface so that users can switch backends.
Supported backends:
・CPUs (via NumPy)
・NVIDIA GPUs (via PyCUDA)
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
Creating a transformer

1) Default

from ngraph.transformers import make_transformer
transformer = make_transformer()

2) Using a factory

import ngraph.transformers as ngt

available_transformers = ngt.transformer_choices()
if 'gpu' in available_transformers:
    factory = ngt.make_transformer_factory('gpu')
    ngt.set_transformer_factory(factory)

transformer = ngt.make_transformer()

Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
Computations
A Computation is created by a transformer and provides an interface for evaluating a subset of the graph.
The format of the generated Computation depends on the transformer that executes it.
For example:
・CPU: NumPy
・GPU: PyCUDA
Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
Creating a Computation

import ngraph as ng

a = ng.constant(4)
b = ng.placeholder(())
c = ng.placeholder(())
d = ng.multiply(a, b)
e = ng.add(d, c)

example_comp = transformer.computation(e, b, c)

Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
Executing a Computation

example_comp = transformer.computation(e, b, c)

result_e = return value of e
b = first argument
c = second argument

result_e = example_comp(2, 7)   # b = 2, c = 7
result_e = (4 * b) + c => (4 * 2) + 7 = 15

Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
Executing a Computation

Multiple return values

example_comp2 = transformer.computation([d, e], b, c)

result_d = return value of d, result_e = return value of e
b = first argument
c = second argument

result_d, result_e = example_comp2(2, 7)
result_d = (4 * b) = (4 * 2) = 8
result_e = (4 * b) + c => (4 * 2) + 7 = 15

Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_usage.rst
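The arithmetic above can be checked with plain-Python stand-ins for the two computations (illustration only, not the ngraph API; the function bodies just mirror the graph d = 4*b, e = d + c):

```python
def example_comp(b, c):
    # Stand-in for transformer.computation(e, b, c):
    # a = ng.constant(4); d = ng.multiply(a, b); e = ng.add(d, c)
    d = 4 * b
    return d + c

def example_comp2(b, c):
    # Stand-in for transformer.computation([d, e], b, c):
    # a list of ops yields a tuple of values.
    d = 4 * b
    e = d + c
    return d, e

result_e = example_comp(2, 7)
result_d, result_e2 = example_comp2(2, 7)
print(result_e, result_d, result_e2)  # 15 8 15
```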
Implementing a transformer

・Creating the transformer
・Creating computations
・Initializing the transformer
  Transformer Passes
  Initialization Computation
  Tensor Description Initialization
  Computation Transformation
・Executing computations

Source: https://github.com/NervanaSystems/ngraph/blob/master/doc/source/transformer_implementation.rst
Transformer implementation

base.py : Transformer_ABC_Meta
base.py : Transformer (base class)
cputransform.py : CPUTransformer
gputransform.py : GPUTransformer
hetrtransform.py : HetrTransformer

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers
Transformer_ABC_Meta class

class Transformer_ABC_Meta(abc.ABCMeta):
    """
    metaclass for the backend objects
    takes care of registering all the backend subclasses
    """
    def __init__(cls, name, bases, dict_):
        if not hasattr(cls, 'transformers'):
            # First possible transformer class sets things up
            cls.transformers = {}

        # If this transformer has a transformer_name, register it
        transformer_name = getattr(cls, 'transformer_name', None)
        if transformer_name is not None:
            cls.transformers[transformer_name] = cls
        super(Transformer_ABC_Meta, cls).__init__(name, bases, dict_)

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
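The registration trick here is a standard Python pattern: a metaclass's `__init__` runs at class-definition time, so every subclass that declares a name adds itself to a shared registry. A self-contained sketch of the same pattern, with simplified invented names (not the ngraph classes):

```python
import abc

class RegisteringMeta(abc.ABCMeta):
    """Metaclass that collects named subclasses into a registry,
    the same way Transformer_ABC_Meta registers transformers."""
    def __init__(cls, name, bases, dict_):
        if not hasattr(cls, 'registry'):
            # The first class in the hierarchy creates the shared dict.
            cls.registry = {}
        reg_name = getattr(cls, 'backend_name', None)
        if reg_name is not None:
            cls.registry[reg_name] = cls
        super(RegisteringMeta, cls).__init__(name, bases, dict_)

class Base(metaclass=RegisteringMeta):
    pass

class CPUBackend(Base):
    backend_name = 'cpu'

class GPUBackend(Base):
    backend_name = 'gpu'

print(sorted(Base.registry))  # ['cpu', 'gpu']
```

This is presumably how `ngt.transformer_choices()` can enumerate available backends: defining a transformer subclass is enough to make it discoverable.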
Transformer class

class Transformer(with_metaclass(Transformer_ABC_Meta, object)):
    """
    Produce an executable version of op-graphs.

    Computations are subsets of Ops to compute. The transformer
    determines storage allocation and transforms the computations
    and allocations into functions.

    Arguments:
        fusion (bool): Whether to combine sequences of operations
            into one operation.
        **kwargs: Args for related classes.

    Attributes:
        computations (:obj:`set` of :class:`Computation`):
            The set of requested computations.
        all_results (:obj:`set` of :class:`ngraph.op_graph.op_graph.Op`):
            A root set of Ops that need to be computed.
        finalized (bool): True when transformation has been performed.
        initialized (bool): True when variables have been initialized/restored.
        fusion (bool): True when fusion was enabled.
        device_buffers (set): Set of handles for storage allocations.
    """

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
Computation class

class Computation(NameableValue):
    """
    A handle for a computation function.

    Arguments:
        transformer (obj:`Transformer`): The associated transformer.
        returns: If an Op, return the value of the Op, if sequence of Ops,
            return the sequence of values, if a set return a map, if None,
            return None.
        *args: AllocationOps marked input will be arguments to the function.
        **kwargs: Args for related classes.
    """

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers
Computation class

    def __init__(self, transformer, computation, **kwargs):
        super(Computation, self).__init__(**kwargs)
        self.transformer = transformer
        self.computation = computation
        self.computation_name = None
        self.executor = None

        self.send_nodes = []
        self.recv_nodes = []
        self.scatter_send_nodes = []
        self.scatter_recv_nodes = []
        self.gather_send_nodes = []
        self.gather_recv_nodes = []
        self.allreduce_nodes = []

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
Computation implementation

base.py : Computation (base class)
cputransform.py : CPUComputation
base.py : GPUComputation
hetrtransform.py : HetrComputation

When make_computation is executed, the Computation
corresponding to each transformer is created.

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers
Computation implementation

cputransform.py : CPUComputation
    def make_computation(self, computation):
        return CPUDeviceComputation(self, computation)

base.py : GPUComputation
    def make_computation(self, computation):
        return Computation(self, computation)

hetrtransform.py : HetrComputation
    def make_computation(self, computation):
        return HetrComputation(self, computation)
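make_computation is a classic factory method: shared logic lives in the base class, and each transformer subclass decides which Computation type to build. A minimal standalone sketch of the pattern (all names invented for illustration, not the ngraph classes):

```python
class Computation:
    """Base handle for a compiled computation."""
    def __init__(self, transformer, op):
        self.transformer = transformer
        self.op = op

class CPUDeviceComputation(Computation):
    """CPU-specific computation handle."""
    pass

class Transformer:
    def computation(self, op):
        # Template method: shared setup would go here; the backend
        # choice is deferred to the make_computation factory method.
        return self.make_computation(op)

    def make_computation(self, op):
        return Computation(self, op)

class CPUTransformer(Transformer):
    def make_computation(self, op):
        # CPU backend overrides the factory to return its own type.
        return CPUDeviceComputation(self, op)

comp = CPUTransformer().computation("x + 1")
print(type(comp).__name__)  # CPUDeviceComputation
```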
Computation class

class Computation(NameableValue):

    def __init__(self, transformer, computation_op, **kwargs):
        super(Computation, self).__init__(**kwargs)
        logging.info("Creating computation with computation_op: %s",
                     computation_op)
        self.transformer = transformer
        self.computation_op = computation_op
        self.computation_name = None
        self.executor = None

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
CPUDeviceComputation class

class CPUDeviceComputation(Computation):

    def __init__(self, transformer, computation, **kwargs):
        super(CPUDeviceComputation, self).__init__(transformer, computation,
                                                   **kwargs)
        self.pool_params = dict()
        self.pool_slices = dict()
        self.conv_params = dict()
        self.conv_slices = dict()

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py
HetrComputation class

class HetrComputation(Computation):

    def __init__(self, hetr, computation_op):
        self.child_computations = dict()
        self.transformer = hetr
        self.send_nodes = hetr.send_nodes
        self.computation_op = computation_op

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/hetrtransform.py
Pass implementation (part 1)

passes.py GraphPass (base class)
passes.py GraphBuildingPass
passes.py GraphRewritePass
passes.py PeepholeGraphPass
passes.py RequiredTensorShaping
passes.py CPUTensorShaping
passes.py SimplePrune
flexpass.py FlexDtypePass
flexpass.py FlexDECPass
flexpass.py ClearTensorDescriptions
nviz.py JSONPass(GraphPass)
nviz.py VizPass(GraphPass)

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/passes/base.py
Pass implementation (part 2)

layout.py PruneContiguousPass
layout.py GenerateLayoutDomains
layout.py GenerateLayoutConstraints
layout.py AssignLayouts
layout.py AddLayoutConversions

cpufusion.py FusionPass
cpulayout.py CPUTensorLayout
gpusimplification.py GPUSubstitution
hetrpasses.py DeviceAssignPass
hetrpasses.py CommunicationPass
hetrpasses.py DistributedPass

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/passes
Pass implementation (part 3): passes used with MKL-DNN

mkldnnpasses.py MklCreateOpDescriptors
mkldnnpasses.py MklAddLayoutConversions

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/passes
ComputationGraphTransformer class

class ComputationGraphTransformer(Transformer):

    def run_registered_graph_passes(self, ops, **kwargs):
        for graph_pass in self.graph_passes:
            graph_pass.wrapped_do_pass(ops=ops, **kwargs)
        return ops

gputransform.py
class GPUTransformer(ComputationGraphTransformer):

hetrtransform.py
class HetrTransformer(ComputationGraphTransformer):

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
ExecutionGraphTransformer class

extransformer.py
class ExecutionGraphTransformer(Transformer):

    def run_registered_graph_passes(self, computation_decl, **kwargs):
        op_accessor = ExOpGraphOpAccessor()
        for graph_pass in self.graph_passes:
            graph_pass.wrapped_do_pass(
                op_accessor=op_accessor,
                computation_decl=computation_decl,
                **kwargs)

cputransform.py
class CPUTransformer(ExecutionGraphTransformer):

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/extransformer.py
GraphPass class

class GraphPass(with_metaclass(abc.ABCMeta, DelegateOpAccessor)):

    def wrapped_do_pass(self, **kwargs):
        self.begin_pass(**kwargs)
        self.do_pass(**kwargs)
        self.end_pass(**kwargs)

    @abc.abstractmethod
    def do_pass(self, **kwargs):
        pass

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/base.py
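The begin_pass/do_pass/end_pass protocol means every pass runs through the same wrapper, and a transformer simply iterates its configured pass list (as run_registered_graph_passes does). A standalone sketch of that pipeline, simplified so that each pass takes and returns a list of op names rather than mutating a graph in place:

```python
import abc

class GraphPass(abc.ABC):
    """Minimal pass with the same begin/do/end protocol."""
    def wrapped_do_pass(self, ops):
        self.begin_pass()
        ops = self.do_pass(ops)
        self.end_pass()
        return ops

    def begin_pass(self):
        pass  # hook for setup (logging, state reset, ...)

    def end_pass(self):
        pass  # hook for teardown / verification

    @abc.abstractmethod
    def do_pass(self, ops):
        ...

class SimplePrune(GraphPass):
    # Toy pass: drop ops marked dead.
    def do_pass(self, ops):
        return [op for op in ops if op != "dead"]

class Uppercase(GraphPass):
    # Toy pass: rewrite every remaining op.
    def do_pass(self, ops):
        return [op.upper() for op in ops]

def run_registered_graph_passes(ops, passes):
    for graph_pass in passes:
        ops = graph_pass.wrapped_do_pass(ops)
    return ops

result = run_registered_graph_passes(["add", "dead", "mul"],
                                     [SimplePrune(), Uppercase()])
print(result)  # ['ADD', 'MUL']
```

Ordering matters, just as in the real pass lists above: pruning before rewriting avoids wasting work on ops that will be removed.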
CPUTransformer class

class CPUTransformer(Transformer):

    def __init__(self, **kwargs):
        super(CPUTransformer, self).__init__(**kwargs)
        self.current_computation = None
        self.conv_engine = CPUConvEngine()
        self.init_code = CPUCodeGenerator(self)
        self.allocate_storage_code = CPUCodeGenerator(self)
        self.allocate_code = CPUCodeGenerator(self)
        self.compute_code = CPUCodeGenerator(self)
        self.code = CPUCodeGenerator(self)
        .....

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py
CPUTransformer class

Adding passes

        self.graph_passes = []
        if self.mkldnn.enabled:
            self.graph_passes.append(CPUFusion())
        self.graph_passes += [
            # ExVizPass(view=True, filename="initial"),
            CPUTensorLayout(),
            SimplePrune(),
            RequiredTensorShaping(),
            CPUTensorShaping(),
            DeadCodeEliminationPass(),
        ]

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py
CPUTransformer class

        if self.mkldnn.enabled:
            self.graph_passes.append(
                MklCreateOpDescriptors(mkldnn=self.mkldnn))
            self.graph_passes.append(DeadCodeEliminationPass())
            self.graph_passes.append(
                MklAddLayoutConversions(mkldnn=self.mkldnn,
                                        layoutpass=add_layout_conversion))
            self.graph_passes.append(DeadCodeEliminationPass())
        self.graph_passes += [
            SSAConversion(),
            IndexElision(),
            # DeadCodeEliminationPass(),
            LivenessPass(),
            MemOptimizePass(),
            LivenessPass(),
            MemLayoutPass()
        ]

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py
CPUTransformer class

        # from ngraph.transformers.passes.dumpgraphpass import DumpGraphPass
        # self.graph_passes += [DumpGraphPass()]
        # from ngraph.transformers.passes.visualizemem import VisualizeMemPass
        # self.graph_passes += [VisualizeMemPass()]

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py
GPUTransformer class

class GPUTransformer(Transformer):

    def __init__(self, device_id=None, comm=None, **kwargs):
        super(GPUTransformer, self).__init__(**kwargs)
        GPUTransformer.gpu_transformers.add(self)
        .....
        self.graph_passes = [
            SimplePrune(),
            PruneContiguousPass(),
            GPUSubstitution(),
            layout_domain_pass,
            layout_constraints_pass,
            layout_assign_pass,
            layout_convert_pass
        ]

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/gputransform.py
HetrTransformer class

class HetrTransformer(Transformer):

    def __init__(self, device_id=None, comm=None, **kwargs):
        super(HetrTransformer, self).__init__(**kwargs)
        .....
        self.graph_passes = [
            DeviceAssignPass(hetr=self,
                             default_device=device,
                             default_device_id=0),
            CommunicationPass(self.send_nodes),
            DistributedPass(self.send_nodes)
        ]

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/hetrtransform.py
Code generation
CPUCodeGenerator class

class CPUCodeGenerator(PyGen):

    def __init__(self, transformer, **kwargs):
        super(CPUCodeGenerator, self).__init__(prefix="op", **kwargs)
        self.transformer = transformer

    def name(self, x):
        if isinstance(x, CPUDeviceBufferStorage):
            return x.ref_str
        if isinstance(x, CPUDeviceTensor):
            return x.ref_str
        return x

Source: https://github.com/NervanaSystems/ngraph/tree/master/ngraph/transformers/cputransform.py
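CPUCodeGenerator builds Python source text that is later executed against NumPy. The generate-then-exec idea can be sketched in a few lines (illustration only; the names and the kernel shape are invented, not the PyGen API):

```python
import numpy as np

def generate_add_kernel(name):
    # Emit Python source for an elementwise add, then compile it
    # into a callable: roughly the shape of what a PyGen-style
    # CPU transformer produces for NumPy execution.
    src = (
        "def {name}(a, b, out):\n"
        "    np.add(a, b, out=out)\n"
    ).format(name=name)
    namespace = {"np": np}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace[name]

op0 = generate_add_kernel("op0")  # prefix="op" suggests names like op0, op1, ...
a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
out = np.empty(2)
op0(a, b, out)
print(out)  # [4. 6.]
```

Generating code per-op like this lets the transformer bake tensor references (the ref_str values resolved by name() above) directly into the emitted source instead of dispatching at runtime.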
Thank you!

Blog: Vengineerの戯言 (Vengineer's Musings)
http://blogs.yahoo.co.jp/verification_engineer
Twitter: @Vengineer
Study groups organized:
  Xilinx Zynq MPSoC (2016/02/20)
  Altera SDK for OpenCL (2016/06/10)
  Xilinx SDSoC (2017/01/28)
  PYNQ Matsuri (2017/03/04)
  FPGA Deep Learning Hands-on Meetup (2017/05/20)