introduce "Stealing Machine Learning Models via Prediction APIs"

2016.12.06

AISECjp #7

Presented by Isao Takaesu

Paper introduction

Stealing Machine Learning Models via Prediction APIs, Part 1

About the speaker

Isao Takaesu (高江洲 勲)

• Occupation: Web security engineer

• Affiliation: Mitsui Bussan Secure Directions

• Hobbies: building vulnerability scanners, machine learning

• Blog: http://www.mbsd.jp/blog/

• Black Hat Asia Arsenal, CODE BLUE / 2016

• Organizer of AISECjp

Paper

AISECjp

The paper being introduced

Stealing Machine Learning Models via Prediction APIs

Authors:

Florian Tramèr (EPFL)

Fan Zhang (Cornell University)

Ari Juels (Cornell Tech, Jacobs Institute)

Michael K. Reiter (UNC Chapel Hill)

Thomas Ristenpart (Cornell Tech)

Post Date: 9 Sep 2016

Proceedings of USENIX Security 2016

Source : https://arxiv.org/abs/1609.02943

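Paper overview

The paper proposes "model extraction attacks", which replicate machine learning (ML) models.

[Figure: a data owner trains a model $f$ from its DB and deploys it on an ML service. The extraction adversary sends queries $x_1, \dots, x_q$, receives $f(x_1), \dots, f(x_q)$, and builds a replica of $f$. Target model types: LR, MLP, decision tree.]

The ML model is replicated with black-box access only.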

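Risks of model replication

Avoiding query charges

When the business model charges per query to the ML model, replication erodes revenue (query fees < training cost).

Information leakage from training data

Confidential information leaks from the training data (which may include confidential information) embedded in the model.

Evasion of behavior-based detection

When the ML model is used for spam detection, malware detection, or network anomaly detection, the attacker can evade these detection functions.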

List of model replication methods

Extraction with Confidence Values

Applies when the ML model returns both the class and confidence values.

・Equation-Solving Attacks

・Decision Tree Path-Finding Attacks

・Online Model Extraction Attacks (against BigML, Amazon ML)

Extraction Given Class Labels Only

Applies when the ML model returns only the class.

・The Lowd-Meek attack

・The retraining approach

Method covered in this talk

Extraction with Confidence Values

Applies when the ML model returns both the class and confidence values.

・Equation-Solving Attacks ⇐ covered here

・Decision Tree Path-Finding Attacks

・Online Model Extraction Attacks (against BigML, Amazon ML)

Extraction Given Class Labels Only

Applies when the ML model returns only the class.

・The Lowd-Meek attack

・The retraining approach


Equation-Solving Attacks


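What are "Equation-Solving Attacks"?

Based on the inputs $x, y$ to the ML model and its output $f(x, y)$, the attacker recovers (replicates) the equation $f(x, y)$, which is unknown to the attacker.

Example: binary logistic regression

ML model: $f(x, y) = 1.4150971 + 3.3421481\,x + 3.0892439\,y$

Attacker: $f(x, y) = \text{"?????"}$, modeled as $f(x, y) = w_0 + w_1 x + w_2 y$

Using the values of $x, y$ and $f(x, y)$ that the attacker can observe, the attacker solves the equations and identifies the unknown parameters $w_0, w_1, w_2$ (recovering the equation).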

Verifying "Equation-Solving Attacks"

Replicating ML models

・Binary logistic regression

・Multiclass LR and Multilayer Perceptron


・Training Data Leakage for Kernel LR

・Model Inversion Attacks on Extracted Models

"Equation-Solving Attacks" verified in this talk

・Binary logistic regression ⇐ covered here






Binary logistic regression


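What is "binary logistic regression"? (Review)

It classifies data into two classes (c = 2) and gives the probability of belonging to a class.

decision boundary: $f(x_1, x_2) = w_0 + w_1 x_1 + w_2 x_2$

[Figure: the line $f(x_1, x_2) = 0$ separates the region $f(x_1, x_2) > 0$ (positive) from the region $f(x_1, x_2) < 0$ (negative).]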

What is "binary logistic regression"?

Probability of "positive": $P(x_1, x_2) = \sigma(w_0 + w_1 x_1 + w_2 x_2)$

Probability of "negative": $1 - P(x_1, x_2)$

Logistic function: $\sigma(a) = \dfrac{1}{1 + e^{-a}}$

[Figure: plot of the logistic function $\sigma(a)$ against $a$, with the vertical axis running from 0 to 1 and the positive and negative regions labeled.]
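The two class probabilities above can be sketched in a few lines of Python. This is only an illustration: the function names and the placeholder weights are assumptions, not taken from the slides.

```python
import math

def sigmoid(a: float) -> float:
    """Logistic function sigma(a) = 1 / (1 + e^(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def class_probabilities(x1, x2, w0, w1, w2):
    """Return (P(positive), P(negative)) for one input point."""
    p_pos = sigmoid(w0 + w1 * x1 + w2 * x2)
    return p_pos, 1.0 - p_pos

# Placeholder weights, chosen only for illustration.
p_pos, p_neg = class_probabilities(0.5, -0.5, w0=0.0, w1=1.0, w2=1.0)
print(p_pos, p_neg)  # here a = 0, so both probabilities are 0.5
```

A point on the decision boundary ($f = 0$) gets probability 0.5; points with $f > 0$ get more than 0.5 and are classified as positive.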

Building the test model

Training data (ex2data1): red = positive, blue = negative

⇒ Find the decision boundary

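Building the test model: training result

decision boundary: $f(x_1, x_2) = 1.415 + 3.342\,x_1 + 3.089\,x_2$

[Figure: the training data with the learned line $f(x_1, x_2) = 0$; the region $f(x_1, x_2) > 0$ is positive and the region $f(x_1, x_2) < 0$ is negative.]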

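How the test model is used

[Figure: a user sends inputs $(x_1, x_2), \dots, (x_{q1}, x_{q2})$ to the LR model trained from the DB and receives responses such as "P=0.055, neg" and "P=0.996, pos".]

The user submits the data $(x_1, x_2)$ to be classified and receives the classification result (c = pos or neg) together with the probability (P) of belonging to the class.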

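How the test model is abused

[Figure: an adversary sends inputs $(x_1, x_2), \dots, (x_{q1}, x_{q2})$ to the LR model trained from the DB and receives responses such as "P=0.055, neg" and "P=0.996, pos".]

The adversary uses the input data $(x_1, x_2)$ and the returned probability (P) to identify the decision boundary.

$f(x_1, x_2) = 1.42 + 3.34\,x_1 + 3.09\,x_2$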

How is it done?

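Step 1: Collecting information

・Results of using the model

User input (x1, x2)      Model output
x1        x2             Class       Probability (P)
-1.602    0.638          negative    0.123
-1.062    -0.536         negative    0.022
-1.539    0.361          negative    0.068
-0.282    1.086          positive    0.979
...       ...            ...         ...

Each row gives one equation in the unknown weights:

$f(x_1, x_2) = w_0 - 1.602\,w_1 + 0.638\,w_2$

$f(x_1, x_2) = w_0 - 1.062\,w_1 - 0.536\,w_2$

$f(x_1, x_2) = w_0 - 1.539\,w_1 + 0.361\,w_2$

$f(x_1, x_2) = w_0 - 0.282\,w_1 + 1.086\,w_2$

What is the target variable $f(x_1, x_2)$?

⇒ Pass the probability (P) through the logit function: $\operatorname{logit}(P) = \log\dfrac{P}{1 - P}$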
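Why the logit turns each response into a linear equation can be written out explicitly. The short derivation below assumes the natural logarithm, matching the logistic function defined earlier, and writes $a$ for $f(x_1, x_2)$; the symbol $a$ is the only notation borrowed from that earlier slide.

```latex
\begin{align*}
P &= \sigma(a) = \frac{1}{1 + e^{-a}}, \qquad a = w_0 + w_1 x_1 + w_2 x_2 \\
1 - P &= \frac{e^{-a}}{1 + e^{-a}} \\
\frac{P}{1 - P} &= e^{a} \\
\operatorname{logit}(P) &= \ln\frac{P}{1 - P} = a = w_0 + w_1 x_1 + w_2 x_2
\end{align*}
```

Each query therefore yields one linear equation in the three unknowns $w_0, w_1, w_2$, so three queries whose inputs do not lie on a common line are enough to determine them.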

Step 2: Solve the equations (Equation-Solving)

・Equations obtained from the model's responses (same queries as in Step 1)

$-2.839 = w_0 - 1.602\,w_1 + 0.638\,w_2$

$-5.467 = w_0 - 1.062\,w_1 - 0.536\,w_2$

$-3.769 = w_0 - 1.539\,w_1 + 0.361\,w_2$

$5.523 = w_0 - 0.282\,w_1 + 1.086\,w_2$

・Result of Equation-Solving

Identified coefficients: $w_0 = 2.042,\; w_1 = 4.822,\; w_2 = 4.457$

Replicated function: $f(x_1, x_2) = 2.042 + 4.822\,x_1 + 4.457\,x_2$
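The two steps can be sketched with NumPy as follows. This is a minimal sketch, not the presenter's code: the slides do not say which solver was used, so `np.linalg.lstsq` is an assumption, and `query`, `extract`, and `W_TRUE` are hypothetical names. The victim model is simulated locally from the trained weights shown earlier (1.415, 3.342, 3.089).

```python
import numpy as np

# Victim model: weights taken from the slides' training result.
W_TRUE = np.array([1.415, 3.342, 3.089])

def query(X):
    """Hypothetical prediction API: returns P(positive) for each row of X."""
    a = W_TRUE[0] + X @ W_TRUE[1:]
    return 1.0 / (1.0 + np.exp(-a))

def extract(X, P, log=np.log):
    """Solve logit(P) = w0 + w1*x1 + w2*x2 for (w0, w1, w2) by least squares."""
    A = np.column_stack([np.ones(len(X)), X])  # one row [1, x1, x2] per query
    b = log(P / (1.0 - P))                     # logit of each confidence value
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

# The four queries listed on the slide.
X = np.array([[-1.602, 0.638], [-1.062, -0.536],
              [-1.539, 0.361], [-0.282, 1.086]])
P = query(X)

w_ln = extract(X, P)                 # logit with the natural logarithm
w_log2 = extract(X, P, log=np.log2)  # logit with a base-2 logarithm
print("natural log:", w_ln)
print("base-2 log :", w_log2)
```

With the natural logarithm the solve returns the original weights up to floating-point error. With a base-2 logarithm it returns them scaled by $1/\ln 2 \approx 1.443$, which is close to the coefficients reported on the slide (2.042, 4.822, 4.457). That the slide's numbers come from a base-2 logit is an inference from the values, not something the slides state.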

Results of "Equation-Solving Attacks"

・Original model

$f(x_1, x_2) = 1.415 + 3.342\,x_1 + 3.089\,x_2$

・Replicated model

$f(x_1, x_2) = 2.042 + 4.822\,x_1 + 4.457\,x_2$

Can the replicated model classify correctly?

Comparison of the original and replicated models

User input (x1, x2)      Original model            Replicated model
x1        x2             Class       P             Class       P
-1.602    0.638          negative    0.123         negative    0.055
-1.062    -0.536         negative    0.022         negative    0.004
-1.539    0.361          negative    0.068         negative    0.023
-0.282    1.086          positive    0.979         positive    0.996
0.692     0.493          positive    0.995         positive    0.999
-0.234    1.638          positive    0.997         positive    0.999
0.485     -1.064         negative    0.437         negative    0.410
0.585     -1.008         positive    0.564         positive    0.591
0.177     -0.729         negative    0.439         negative    0.412
...       ...            ...         ...           ...         ...

The classification results of the original and replicated models match exactly (n = 100).
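The agreement can be checked on the nine rows listed in the table with a short script. This is a sketch under stated assumptions: the weights are the rounded values from the slides, `predict` is a hypothetical helper, and only the nine listed inputs are used rather than the full n = 100 set, which the slides do not reproduce.

```python
import numpy as np

W_ORIGINAL = np.array([1.415, 3.342, 3.089])  # original model (from the slides)
W_REPLICA = np.array([2.042, 4.822, 4.457])   # replicated model (from the slides)

# The nine inputs listed in the comparison table.
X = np.array([
    [-1.602, 0.638], [-1.062, -0.536], [-1.539, 0.361],
    [-0.282, 1.086], [0.692, 0.493], [-0.234, 1.638],
    [0.485, -1.064], [0.585, -1.008], [0.177, -0.729],
])

def predict(w, X):
    """Return (labels, P(positive)) of a logistic regression with weights w."""
    a = w[0] + X @ w[1:]
    p = 1.0 / (1.0 + np.exp(-a))
    return np.where(a > 0, "positive", "negative"), p

labels_orig, p_orig = predict(W_ORIGINAL, X)
labels_rep, p_rep = predict(W_REPLICA, X)

agreement = np.mean(labels_orig == labels_rep)
print(f"label agreement: {agreement:.0%}")
print("ratio of weights:", W_REPLICA / W_ORIGINAL)
```

The three weight ratios come out nearly equal, so the replica is approximately the original function multiplied by a positive constant. Such a scaling leaves the sign of $f$, and hence the predicted class, unchanged, while the probabilities differ, which is consistent with the table.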

Countermeasure against "Equation-Solving Attacks"

・Rounding confidences

Rounding the confidence values returned by the model lowers the accuracy of the replica.

Example: P = 0.437401116 ⇒ P = 0.43

[Figure: Effect of rounding on model extraction (quoted from the paper).]
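The effect of rounding can be tried out on the test model with a small experiment. This is a sketch under assumptions, not a reproduction of the paper's figure: the query budget of 100 uniform random inputs, the least-squares solver, round-to-nearest via `np.round`, and the clipping of rounded values of 0 or 1 are all choices made here, and `query` and `extract` are hypothetical names.

```python
import numpy as np

W_TRUE = np.array([1.415, 3.342, 3.089])  # original model (from the slides)

def query(X, decimals=None):
    """Hypothetical API: P(positive), optionally rounded to `decimals` places."""
    p = 1.0 / (1.0 + np.exp(-(W_TRUE[0] + X @ W_TRUE[1:])))
    return p if decimals is None else np.round(p, decimals)

def extract(X, P, eps=0.0):
    """Least-squares solve of logit(P) = w0 + w1*x1 + w2*x2."""
    P = np.clip(P, eps, 1.0 - eps)  # keep the logit finite when P was rounded to 0 or 1
    A = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(A, np.log(P / (1.0 - P)), rcond=None)
    return w

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 2))  # 100 random queries (assumed budget)

errors = {}
for decimals in (None, 5, 3, 2):
    eps = 0.0 if decimals is None else 0.5 * 10.0 ** -decimals
    w_hat = extract(X, query(X, decimals), eps)
    errors[decimals] = np.max(np.abs(w_hat - W_TRUE))  # worst-case weight error
    print(decimals, errors[decimals])
```

Without rounding the weights are recovered almost exactly; once the confidences are rounded, the logits no longer satisfy the linear equations exactly and the recovered weights drift away from the original ones.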

Plan for next time (Equation-Solving Attacks)

・Binary logistic regression (✔)





Download “.PDF” version of this document:

≫ https://aisecjp.connpass.com/event/44600/
