簡便エミュレーションによる実験計画のスマート化coop-math.ism.ac.jp/files/220/02 MI2I数協160226_樋口.pdf · 簡便エミュレーションによる実験計画のスマート化

簡便エミュレーションによる実験計画のスマート化

樋口知之情報・システム研究機構統計数理研究所

数学協働プログラムワークショップ MI2（情報統合型物質・材料開発）と数学連携による新展開平成28年2月26日(金)

1/28

アウトライン

1. データ同化2. 簡便エミュレータ

a. 必要な理由と動向b. 構成法

3. スパース回帰4. GPR: Gaussian Process Regression

2/28

システムモデルとしてのシミュレーションモデル

),( 1 tttt f vxx

)( 1 ttt f xx

tv

2cxt

x

（simplified meteorological model around Japan）

PDE : Partial differential equation

(time varying)

θ

ξ

ξ

ξ

ξ

x

tM

tm

tm

t

t

,

,1

,

,1

State Vector

代入計算

3/28

)|(),,(

)|(),,(

obs

sys1

θwwwxy

θvvvxx

ph

pf

ttttt

ttttt

～

～

データ同化と一般状態空間モデル

State Vector（Simulation variables）

Stochastic simulation model

Observation modelMeasurement model

tt Missing value (Outlier)

time integration

tt

t

t

1

step timesimulation:

nsobservatio of timesampling:

),0(, obsRNH ttttt ～wwxy 気象・海洋のデータ同化の枠組み

観測モデル

システムモデル

4/28

)()|(

)()|()|(

xxy

xxyyx

pp

ppp

　　

順解析：シミュレーション等

:

:

y

x 興味のある対象

データ

ベイズの反転公式

逆解析

ベイズの定理。等号の右側と左側で，赤と青で示した変数部分の縦棒との相対関係が反転していることがわかる。この事実により，ベイズの反転公式と呼ばれる。

ベイズの反転公式

5/28

i

kiz

G

y

)(

0

0

　

“連成”ベイズ（ベイズシミュレーション）の普及

■ベイズモデルの階層がますます高層化 (HDP: 階層ディリクレプロセス)

■ノンパラメトリックベイズ法の浸透

・ディリクレプロセスの利用が自然言語の分野ですすんだ。

・モデル選択からモデル混合へ。

■ミクロからマクロへ生成モデルの連続結合（“連成”）による、いわばフォワード推論がベイズ計算の主流

上田＆山田 (応用数理, 2007)から

i

i

G

G

y

00

K-mixture model vs. DP mixture model6/28

7

Niwayama et al. [PNAS 2011]

Assimulation Observation

Data Assimilation on Intracellular Fluid in C.elegans embryo(Collaboration with Dr. Kimuta at NIG)

2 1dp

dt

vvNavier-Stokes equation

mass conservation lawD

Dt

v

MPS method

PIV (Particle Image

Velocimetry) imaging

7/28

Data assimilation works powerfully also in the cell biology.

A large dynamic intra-cellular flow occurs in a fertile egg right

before a cell division, but actually its mechanism is yet

unknown.

We assume that proteins “myosin” in the cell wall drive

physically such intra-cellular flow, and we perform data

assimilation by combining a numerical simulation that solves

simultaneously these physical equations and the observation

obtained by the image analyses.

Tracking result

tracker

minimum spanning tree

Cell tracking

（Thanks to Dr. Tokunaga, 徳永＠統数研＆九工大から借用、改変）8/28

Quantitative analysis of neural activities with Ca2+ imaging of C.elegans whole brain

C. elegans

• The neuronal nuclei of adult C. elegans are labeled with a red fluorescent

protein sensor called mCherry.

• A combination of the fluorescent protein sensor for calcium ion visualizes the

neuronal activity of the multiple neurons in the living state.

• The temporal changes of the neuronal activities are measured by a confocal

microscope.

4D imaging

system

データ同化を生かした設計プロセス

CAE: Computer-Aided Engineering

設計プロセスへのCAE導入により試験コストの低減を実現してきた

データ同化の導入

データ同化によるCAE解析と模型試験の統合的アプローチ

データ同化による設計技術の革新

模型試験

得られた結果の統合（人為的プロセス）

性能評価

CAE解析

性能評価

再設計/最適設計再設計/

最適設計

センサーネットワークを擁した超サイバー試験CAE解析

統一的視点での性能評価

ビッグデータ統合解析

CAEモデル高度化・最適化

試験システムの高度化

データ同化による設計プロセスで利用される試験・CAEデータの信頼性・精度向上

時間コストの低減製品の高度化による競争力強化

9/28

UQ: Uncertainty Quantification• 欧米では、計算機シミュレーション結果の信頼性を具体的に確立するための方法論の研究が急速に熱を帯びてき

ており、ASME(The American Society of Mechanical Engineers)がVerification and Validation （通常V&Vと呼称）の標準化に大きな力を注いでいる。例えば、2006年には固体力学に対して、2009年には流体力学および熱解析に関する計算機シミュレーションのV&Vが公表されている。

• 欧州においては流体力学の分野で同種の研究活動が2012年から活発化しており、Uncertainty Quantification (UQ) in Industrial Analysis and Design の名のプロジェクト研究が現在進行中である。

• NASAでは、NASA UQ challenge 2014と題して、スパースな限定されたパラメータセットに関するシミュレーションの結果データから、UQをモデル化するコンペを開始した。

• 米国統計コミュニティは、2011-12年に、NSFのサポートを受ける機関SAMSI（Statistical and Applied Mathematical Sciences Institute）にてUQを集中的に研究するプログラムを立ち上げた。

• 米国統計学会はSIAM(Society for Industrial and Applied Mathematics)と共同でJournal on UQの刊行を2014年に開始した。その雑誌の取り扱う主たる分野としてsensitivity analysis, model validation, model calibration, data assimilationの4つがあげられている。最新号の論文（4本掲載）は、感度解析、ガウス過程回帰、モデル較正、ギプスサンプラーの解析のテーマとなっており、ほぼ統計学の範疇である。

重要な技術：ガウス過程回帰や、その古典版とも言えるクリギング次元削減を目的としたスパース回帰

中野慎也、樋口知之、地球科学におけるシミュレーションとビッグデータ─データ同化とエミュレーション─、電子情報通信学会誌、Vol.97(10), pp.869-875, 2014. 樋口知之、中村和幸、データ同化によるオンラインセンシングの高度化, 計測自動制御学会誌, Vol.51(9), 2012.長尾大道、佐藤光三、樋口知之, マルコフ連鎖モンテカルロ法を利用したトレーサー試験からフラクチャーの物理パラメータを推定する方法, 石油技術協会誌, Vol.78(2), pp.197-209, 2013.

Iba, Y. and Akaho, S., Gaussian process regression with measurement error, IEICE Trans. E93-D(10), 2010.

10/28

通常のエミュレータの概念

11/28

エミュレータ (Emulator)とは、コンピュータや機械の模倣装置あるいは模倣ソフトウェアのことである。

概要コンピュータ分野で使われることが多い用語だが、もともとは機械装置全般に使う言葉である。判りやすく言えば、機械を真似る機械である。

語源エミュレーションやエミュレータは、模倣対象のシステムにおいて、予測できる現象より予測できない現象が支配的である場合に使われる。また、非常に高い安全性が要求される場合にも良く使われる。予測できる現象が支配的な場合や、完全に模倣することが難しい場合はシミュレーション技術を使う。

(Wikipediaより）

計算コストと予測精度

12/28

• 高い計算コスト（オンライン計算が難しい）• 不確実性の取り扱い方

“不確実性を、世界を理解する人間の能力に付随するものではなく、実験に付随するものとしてとらえている。”

計算コスト（小）

インサンプル再現度（良い）

計算対象のシナリオ数の多様性（大）

通常型エミュレータ

簡便エミュレータ

シミュレータ

原理への忠実度（大）

Simulation → Data Assimilation → Emulator：Statistical model for predicting an output given input parameters

N

ii 1z

Input Parameter

N

iiy1

Result（Experimental Data）

Simulation

N

iim1

)(

z

Result（Prediction）

)(zm

Emulator

Model Improvement

)(zm

Adjustment and

Interpolation

13/28

Monte Carlo Experiments

→Experimental Design, UQ

x )(zgSystem

Black Box

Scalar OutputHuge dimensional

state vector

zSmall dimensional

feature vector

Temperature averaged over entire

surface (Global mean temperature)CO2 emission

Rainfall (or temperature)

at some location

①

Sparsing

Feature extraction

Sparse Regression②

)()g(E zz mStatistical Model

KrigingGaussian Process Regression

③

14/28

エミュレータの設計法

１．目的変数（スカラー値）を決める。

２．シミュレーションを複数回走らせる。

３．スパース回帰により、入力パラメータベクトルを同定する。

４．GPRにより、出力関数（応答局面）をもとめる。

②

③

①

Emulator and Emulation

Emulate an output of simulation given parameters

x )(zgSystem

Black Box

Scalar OutputSimulation (DA) Result

Huge Dimensional Vector

zSmall Dimension

N

ii 1z

Parameter Input

N

iiy1

Sparse Regression

Sparsing

Feature Extraction

Result（regarded as an experiment）Simulation

N

iim1

)(

z

Output（Prediction）

)(zm

Statistical ModelEmulation

Model Improvement

)(zm

Adjustment and

Interpolation

15/28

Monte Carlo experimentExperimental Design

GPR②

設計パラメータ

Sample

Sparse regression

Prediction

Climate Prediction：Climate Emulator

tL

tm

t

t

y

y

y

,

,

,1

yttitm eHy

1,

~x

Tttt 1

,

xy

tmt y ,1x

Simulation Result（）(or Assimilation Result)Observation Vector（）

Scalar Variable: Rainfall (Temperature) at some point

New input from a simulation system

Sparse Learning

Ensemble prediction of by using many samples

tmy ,

16/28

53 1010 ～74 1010 ～

*

1tx

53 1010 ～

0

),|( 1, zx

ttiyp

zxzzxx dpypyp tttmttm )|(),|()|( 11,1,

② N

ii 1z{ , }

izFor given

m,t

Sample

Regressiont

i

t

tm eHy

z

x 1

,

Tttt 1

,

xy

17/28

②

N

ii 1z

Tttt 1

,

xy

ttitm eHy

1,

~x

N

ii 1z

t

t

itm eHHy ~~,ˆ 1

,

z

x

2 stepAugmentation

Sparse Regression

Regression

zi

For given

Sparse Regression

HL1ノルムの拘束度合を変えるなどで異質性を反映

i

スパース（疎性を利用した）最適化

18/28

eay

H

Niexaxaxaay iPiPiii ),,1(,2,21,10 例：多変量回帰

= +

新NP問題 ( )

aayaa

2

min H

PN

aaya

tosubj..min2

H

Ny

y

1

0a

1a

Pa

1 1,ix Pix ,

Ne

e

1

モデルでは表現できない部分（誤差というのは不適切）

P

N

0

解の一意性と解のロバスト化

2

2

2

2.min aay

c Hリッジ回帰

1ny 1ma

mnH

サンプル数

独立変数の数

ih

iy

■ n > mの場合はたいてい大丈夫 ■ m > nの場合は解を一意に定められない。

1

2

2.min aay

c H

11.min aayc

H

ロバストな解を求めるために

2

/

2

2.min

RQUH aay

a初期のベイズ型

逆問題一般型

iii wy ah通常の線形回帰モデル

19/28

LASSO: Least Absolute Shrinkage and Selection Operator

R.Tibshirani (JRSS B, 1996)

1

2

2 tosubj..min aay

aH

■ m > nの場合は解を一意に定められない。

121 aa

1a

2a

(1,0)

12

2

1

22

2

2

2

2

1

aa

(2,2)

1

2

2.min aay

a H

具体的解法は，二次計画法問題に帰着

実際にはαをいろいろ変える必要があるため，効率的なアルゴリズムが提案されている．

LARS: Least Angle Regression (Efron et al., 2004)

20/28

Emulator and Emulation

Emulate an output of simulation given parameters

x )(zgSystem

Black Box

Scalar OutputSimulation (DA) Result

Huge Dimensional Vector

zSmall Dimension

N

ii 1z

Parameter Input

N

iiy1

Sparse Regression

Sparsing

Feature Extraction

Result（regarded as an experiment）Simulation

N

iim1

)(

z

Output（Prediction）

)(zm

Statistical ModelEmulation

Model Improvement

)(zm

Adjustment and

Interpolation

21/28

Monte Carlo experimentExperimental Design

GPR

③

Emulator：Gaussian Process Regression (GPR)

22/28

要は内挿の一手法データ：シミュレーション結果入力：設計パラメータ（多次元）出力：目的変数を出力とする確率的応答関数

1次元 2次元)(zg

z

)(* zm

z)(zg

z

)(* zm の等高線

http://gpytest2.readthedocs.org/en/latest/tuto_GP_regression.html

iz

iy

N

iiy1

)(zm

Emulator：Statistical Model（Linear Regression＋GP）

)),(),(())(()( 2 zzzz kmNgpgp

z

xzz

*

1)()g(E tHm

)'()'(exp)',( zzzzzz Rk T

2)',()'(),(Cov zzzz kgg

Kernel function：Mutual

Distance between input

parameters

Regression Coefficient

Common variance

)(zm

z

)(zgg

Gaussian Process：g is a continuous function of z

),0(, 2Neegy ii ～

Kernel function

Observation Model

)()|()|( gpgYpYgp 1g

Covariance function

System Model

2g

1z

1y

g),,1( Ni

Statistical model is given

NyyY ,,1

T

Posterior Distribution

23/28

2y

2z

g is a continuous function

Emulator：Design of Kernel function

)'()',()'(),(Cov 22zzzzzz kggy

)()g(E zz my

Kernel function is NOT related

to a mean function m(z).

)'(2

)'(exp

2

)'(exp 2

2

22

22

22

1 zzzzzz

SL

)(zgg SL 6

z

)'())'((sin2exp2

)'(exp 222

22

22

1 zzzzzz

z

)(zgg

Gaussian Processes for Regression:A Quick Introduction

M. Ebden, August 2008

24/28

Emulator：Similar structure to SSM if discretizing

)(zm

z

)(zgg

1z2z

1y

2y1g

2g

g

)(nm

t

ng

n0

nyIf g is defined

only on the

discrete point

),0(,

)~,0(,)(

2

2

1

Neegy

Nvvgnmg

nn

nnnn

～

～

)1()1(exp)1,( Rnnk TKernel function should positive definite

25/28

1ny

ng

0g

1ng

))',(),(()|)(( 2zzzz kmNYgp ～

)'()()',()',( 1Tztztzzzz

Kkk

Emulator：Online Adjustment（Calibration）and Interpolation

),(,),,()(

)()()()(

1

T

1

Nkk

KMYmm

zzzzzt

ztzz

Posterior distribution

),(),(

),(),(

1

111

NNN

N

kk

kk

K

zzzz

zzzz

Gram Matrix

N

N

yyY

mmM

,,

)(,),(

1

T

1

T

zz

02

2

In a case for

Difference between Data

and Statistical Model Normalization factor

1y

2yInformation from data always reduce uncertainty

For simplicity

26/28

takes a finite value2

Gaussian Processes for Regression:A Quick Introduction

M. Ebden, August 2008

Difference between input z and some

points

Emulator：Obtain a full Bayes model

))',(),((),,|)(( 22 zzzz kmNHYgp ～

1)(,1

)(2

2 Hpp

dHdHppHYgpYgp 222 )()(),,|()|(

Posterior distribution

t distribution with a degree of freedom N1)(dim z

Number of samples

For a part of Regression model

Emulator

参考文献J. Sacks et al., “Design and analysis of computer experiments,” Statistical Science, 1989.

M. C. Kennedy and A. O’hagan, “Bayesian calibration of computer models,” J. Roy. Statist. Soc. Ser. B, 2001.

中野、樋口、“地球科学におけるシミュレーションとビッグデータ―データ同化とエミュレーション―、” 信学会会報、Vol. 97, No.10. 869-875, 2014.

Rasmussen, C. and C. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.

Gamma distribution

27/28

02

2

In a case for

For simplicity

28/28

エミュレータの設計法（再掲）

１．目的変数（スカラー値）を決める。

２．シミュレーションを複数回走らせる。

３．スパース回帰により、入力パラメータベクトルを同定する。

４．GPRにより、出力関数（応答局面）をもとめる。

②

③

①

Documents

簡便エミュレーションによる 実験計画のスマート化coop-math.ism.ac.jp/files/220/02 MI2I数協160226_樋口.pdf · 簡便エミュレーションによる 実験計画のスマート化

簡便エミュレーションによる実験計画のスマート化coop-math.ism.ac.jp/files/220/02 MI2I数協160226_樋口.pdf · 簡便エミュレーションによる実験計画のスマート化