
Optimization Beyond Prediction: Prescriptive Price Optimization

Shinji Ito
NEC Corporation

[email protected]

Ryohei Fujimaki
NEC Corporation

[email protected]

May 25, 2016

Abstract

This paper addresses a novel data science problem, prescriptive price optimization, which derives the optimal price strategy to maximize future profit/revenue on the basis of massive predictive formulas produced by machine learning. Prescriptive price optimization first builds sales forecast formulas for multiple products, on the basis of historical data, which reveal complex relationships between sales and prices such as price elasticity of demand and cannibalization. Then, it constructs a mathematical optimization problem on the basis of those predictive formulas. We show that the optimization problem can be formulated as an instance of binary quadratic programming (BQP). Although BQP problems are NP-hard in general and computationally intractable, we propose a fast approximation algorithm using a semi-definite programming (SDP) relaxation, which is closely related to the Goemans-Williamson Max-Cut approximation. Our experiments on simulation and real retail datasets show that our prescriptive price optimization simultaneously derives the optimal prices of tens to hundreds of products within practical computational time, potentially improving the gross profit of those products by 8.2%.

1 Introduction

Recent advances in machine learning have had a great impact on maximizing business efficiency in almost all industries. In the past decade, predictive analytics has become a particularly notable emergent technology. It reveals inherent regularities behind Big Data and provides forecasts of future values of key performance indicators, making it possible to conduct proactive decision making in a data-scientific manner. Along with the growth of predictive analytics, prescriptive analytics [4] has been recognized in the market as the next generation of advanced analytics. Advances in predictive analytics, w.r.t. both algorithms and software, have made it considerably easy to produce a massive number of predictions purely from data. The key question in prescriptive analytics is then how to benefit from those massive numbers of predictions, i.e., how to automate complex decision making with algorithms empowered by predictions. This raises a technical issue regarding the integration of machine learning with relevant theories and algorithms in mathematical optimization, numerical simulation, etc.

Predictive analytics usually produces two important outcomes: 1) predictive formulas revealing inherent regularities behind data, and 2) forecasted values of key performance indicators. It would seem straightforward to integrate the latter with mathematical optimization by treating the forecasted values as inputs, and indeed there are many such studies, covering inventory management [1], energy purchase portfolio optimization [13], smart water management [2, 6], etc. The focus of this paper is instead on problems in which the decision variables (e.g., prices) of a target optimization problem (e.g., profit/revenue maximization) are explanatory variables of a prediction problem (e.g., sales forecasting). Suppose we obtain regression formulas to forecast sales of multiple products, formulas which reveal complex relationships between sales and prices, such as price elasticity of demand [14] and cross-price effects (a.k.a. cannibalization) [15, 17]. The problem is then to find the optimal price strategy that maximizes future profit/revenue from such massive predictive formulas. We refer to this problem as prescriptive price optimization.

Prescriptive price optimization is a variant of revenue management [16, 8], which has been actively studied in marketing, economics, and operations research. Traditional revenue management literature has focused on problems such as markdown optimization (a.k.a. dynamic pricing), where a perishable product is priced over a finite selling horizon. Our focus is more on static but simultaneous optimization of many products using machine-learning-based predictions. There are several existing studies on, e.g., fast-fashion retail [3], online retail [5], and hotel rooms [9, 11] (a comprehensive survey is given by [10]). However, existing methods have strong restrictions in demand modeling capability: one does not consider cross-price effects, another is domain specific and hard to generalize across industries. Further, most existing studies employ mixed-integer programming for optimizing prices, whose computational cost increases exponentially with the number of products. Prescriptive price optimization aims at more machine-learning-based (and therefore flexibly modeled) revenue management, which enables simultaneous price optimization of tens to hundreds of products.

This paper addresses prescriptive price optimization, and our contributions can be summarized as follows:

Prescriptive Price Optimization Using Massive Regression Formulas: We establish a mathematical framework for prescriptive price optimization. First, multiple predictive formulas (i.e., sales forecasting models for individual products) using non-linear price features are derived with a regression technique from historical data. These are then transformed into a profit (or revenue) function, and the optimal price strategy is obtained by maximizing the profit function under business constraints. We show that the problem can be formulated as a binary quadratic programming (BQP) problem.

Fast BQP Solver by SDP Relaxation: BQP problems are, in general, NP-hard, and we need to use an approximation (or relaxation) method. Although BQP problems are often solved using mixed-integer programming, the computational costs of mixed-integer relaxation methods increase exponentially with problem size, and they are not applicable to large-scale problems. This paper proposes an alternative relaxation method that employs semi-definite programming (SDP) [22], adopting the idea of the Goemans-Williamson MAX-CUT approximation [7]. Although our target is prescriptive price optimization, we note that our SDP relaxation algorithm is a fast approximation solver for general BQP problems and can be utilized in a wide range of applications.

Experiments on a Real Retail Dataset: We evaluated prescriptive price optimization on a real retail dataset with respect to 50 beer products as well as on a simulation dataset. The results indicate that the derived price strategy could improve the gross profit of these products by 8.2%. Further, our detailed empirical evaluation reveals the risk of overestimated profits caused by estimation errors in machine learning, and a way to mitigate such an issue using sparse learning.


2 Prescriptive Price Optimization

2.1 Problem and Pipeline Descriptions

Let us define terminology for three types of variables: decision, target, and external variables. Decision variables are those we wish to optimize, i.e., product prices. Target variables are the ones we predict, i.e., sales quantities. External variables consist of the other information we can utilize, e.g., weather, temperature, and product information. We assume we have historical observations of all of them. The goal, then, is to derive the optimal values of the decision variables, with given external variables, so as to maximize a predefined objective function. In prescriptive price optimization, the decision and target variables are product prices and sales quantities, respectively; the external variables might include weather, temperature, and product information; and the objective function is future profit or revenue, which is ultimately the measure of business efficiency.

Our prescriptive price optimization is conducted in two stages, which we refer to as the modeling and optimization stages. In the modeling stage, using regression techniques, we build predictive formulas for the target variables by employing the decision and external variables (or their transformations) as features, on the basis of historical data relevant to them. This stage reveals complex relationships between sales and prices of multiple products, such as price elasticity of demand and cannibalization, while taking the effect of external variables into account. In the optimization stage, with given values of the external variables, we transform the multiple regression formulas into a mathematical optimization problem. Business requirements, expressed as linear constraints, are input by users of the system and reflected in the problem. By solving the optimization problem, we obtain optimal values of the decision variables (i.e., an optimal price strategy).

2.2 Modeling Predictive Formulas

Suppose we have M products, with product index m ∈ {1, . . . , M}. We employ linear regression models to forecast the sales quantity $q_m$ of the m-th product on the basis of its price, denoted by $p_m$, and the external variables. This modeling stage has two tunable areas: 1) the feature transformations and 2) the learning algorithm for linear regression.

For the feature transformations, suppose we have D arbitrary univariate transformations of $p_m$, denoted by $f_d$ (d = 1, . . . , D). $f_d$ might be designed to incorporate a domain-specific relationship between price and demand, such as the law of diminishing marginal utility [14], as well as to achieve high prediction accuracy. Further, the external variables might be transformed into features denoted by $g_d$ (d = 1, . . . , D'). On the basis of these features, the regression model of the m-th product can be expressed as follows:

$$q_m^{(t)}(\mathbf{p}, \mathbf{g}) = \alpha_m^{(t)} + \sum_{m'=1}^{M} \sum_{d=1}^{D} \beta_{mm'd}^{(t)} f_d(p_{m'}) + \sum_{d=1}^{D'} \gamma_{md}^{(t)} g_d, \qquad (1)$$

where $\alpha_m^{(t)}$, $\beta_{mm'd}^{(t)}$, and $\gamma_{md}^{(t)}$ are the bias, the coefficient of $f_d(p_{m'})$, and the coefficient of $g_d$, respectively. Also, $\mathbf{p}$ and $\mathbf{g}$ are defined as $\mathbf{p} = [p_1, \ldots, p_M]^\top$ and $\mathbf{g} = [g_1, \ldots, g_{D'}]^\top$. The superscript (t), a time index, is introduced for optimization over multiple time steps. For example, in order to optimize prices for the next week, we might need seven regression models (one model per day) for a single product.

For the learning algorithm, in principle, any standard algorithm, such as least squares regression, ridge regression (L2-regularized), Lasso (L1-regularized) [19], or orthogonal matching pursuit (OMP, L0-regularized) [21], is applicable with our methodology. The choice of learning algorithm depends on the way relationships among multiple products are to be modeled. Experience shows that these relationships are complicated yet usually sparse in practice¹, so sparse learning algorithms might be preferable. More detailed discussions are presented in Section 6.2.

¹For example, the price of a rice ball might be related to the sales of green tea, but not to those of milk.


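To make the modeling stage concrete, below is a minimal sketch of fitting the per-product regression models (1) with a sparse learner, assuming scikit-learn is available; the feature transformations, array shapes, and toy data are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of the modeling stage (Section 2.2), assuming scikit-learn.
# Feature choices, array shapes, and the toy data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso

M, D_EXT = 3, 4                                                  # toy sizes
price_maps = [lambda p: p, lambda p: p ** 2, lambda p: 1.0 / p]  # example f_d

def make_features(P, G):
    """P: (N, M) price history, G: (N, D_EXT) external features.
    Returns the design matrix [f_d(p_m')] for all (m', d), concatenated with G."""
    cols = [f(P[:, m]) for m in range(M) for f in price_maps]
    return np.column_stack(cols + [G])

rng = np.random.default_rng(0)
P = rng.uniform(0.8, 1.0, size=(200, M))          # toy historical prices
G = rng.normal(size=(200, D_EXT))                 # toy external variables
sales = rng.normal(size=(200, M))                 # toy sales quantities q_m

X = make_features(P, G)
# One sparse regression model per product; models[m].coef_ then holds the
# beta_{mm'd} (price part) followed by gamma_{md} (external part) of (1).
models = [Lasso(alpha=0.01).fit(X, sales[:, m]) for m in range(M)]
```

Replacing Lasso with scikit-learn's OrthogonalMatchingPursuit would give the L0-regularized variant mentioned above.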

2.3 Building Optimization Problem

Suppose the values of $g_d$ at time step $t$ are given (e.g., from a weather forecast), denoted by $g_d^{(t)}$, where $\mathbf{g}^{(t)} = [g_1^{(t)}, \ldots, g_{D'}^{(t)}]^\top$. Using the predictive formulas obtained in the modeling stage, and given costs $\mathbf{c} = [c_1, \ldots, c_M]^\top$, the gross profit can be represented as:

$$\ell(\mathbf{p}) = \sum_{t=1}^{T} \sum_{m=1}^{M} (p_m - c_m)\, q_m^{(t)}(\mathbf{p}, \mathbf{g}^{(t)}) = (\mathbf{p} - \mathbf{c})^\top \mathbf{q}, \qquad (2)$$

where $\mathbf{q} = \big[\sum_{t=1}^{T} q_1^{(t)}(\mathbf{p}, \mathbf{g}^{(t)}), \ldots, \sum_{t=1}^{T} q_M^{(t)}(\mathbf{p}, \mathbf{g}^{(t)})\big]^\top$. Note that $\mathbf{c} = \mathbf{0}$ gives the sales revenue for $\mathbf{p}$. For later convenience, let us introduce $\xi_m$ and $\zeta_{mm'}$ as follows:

$$\xi_m(p_m) = \sum_{t=1}^{T} (p_m - c_m)\Big(\alpha_m^{(t)} + \sum_{d=1}^{D'} \gamma_{md}^{(t)} g_d^{(t)}\Big), \qquad (3)$$

$$\zeta_{mm'}(p_m, p_{m'}) = \sum_{t=1}^{T} (p_m - c_m) \sum_{d=1}^{D} \beta_{mm'd}^{(t)} f_d(p_{m'}). \qquad (4)$$

Then, (2) can be rewritten as:

$$\ell(\mathbf{p}) = \sum_{m=1}^{M} \xi_m(p_m) + \sum_{m=1}^{M} \sum_{m'=1}^{M} \zeta_{mm'}(p_m, p_{m'}). \qquad (5)$$

In practice, $p_m$ is often chosen from a set $\{P_{m1}, \ldots, P_{mK}\}$ of K price candidates, where $P_{m1}$ might be a list price and $P_{mk}$ (k > 1) might be discounted prices such as 3%-off, 5%-off, or $1-off. Hence, the problem of maximizing the gross profit can be formulated as follows:

$$\text{Maximize } \ell(\mathbf{p}) \quad \text{subject to } p_m \in \{P_{m1}, \ldots, P_{mK}\} \ (m = 1, \ldots, M). \qquad (6)$$

An exhaustive search for this problem would require $\Theta(K^M)$-time computation, and hence would be computationally intractable when M is large. Further, the price strategy might have to satisfy certain business requirements. Consider a situation in which we can discount only L products at the same time. Assume that $P_{m1}$ is the list price of the m-th product and $P_{mk}$ (k > 1) are discounted prices. The requirement can then be expressed in terms of the following constraint:

$$|\{m \in \{1, \ldots, M\} \mid p_m = P_{m1}\}| \geq M - L. \qquad (7)$$

The system allows users to input such business requirements, which are then transformed into mathematical constraints, as shown above. Problem (6) is solved with such constraints taken into account. The types of requirements we can deal with are discussed in the next section.

3 BQP Formulation

3.1 Derivation of BQP Problem

The general form (6) is intractable due to the combinatorial nature of the optimization and the non-linear mappings $\xi_m$ and $\zeta_{mm'}$; a naive method would require unrealistic computational cost. In order to solve (6) efficiently, we here convert it into a more tractable form.

Let us first introduce binary variables $z_{m1}, \ldots, z_{mK} \in \{0, 1\}$ satisfying $\sum_{k=1}^{K} z_{mk} = 1$. Here, $z_{mk} = 1$ and $z_{mk} = 0$ correspond to $p_m = P_{mk}$ and $p_m \neq P_{mk}$, respectively, which gives

$$p_m = \sum_{k=1}^{K} P_{mk} z_{mk} \quad (m = 1, \ldots, M). \qquad (8)$$

For an arbitrary function $\varphi$, the following equality holds:

$$\varphi(p_m) = \sum_{k=1}^{K} \varphi(P_{mk}) z_{mk}. \qquad (9)$$

Using (9), $\zeta_{mm'}(p_m, p_{m'})$ can be rewritten as follows:

$$\zeta_{mm'}(p_m, p_{m'}) = \mathbf{z}_m^\top Q_{mm'} \mathbf{z}_{m'}, \qquad (10)$$


where $\mathbf{z}_m = [z_{m1}, \ldots, z_{mK}]^\top$ and we define $Q_{ij} \in \mathbb{R}^{K \times K}$ by

$$Q_{ij} = \begin{bmatrix} \zeta_{ij}(P_{i1}, P_{j1}) & \zeta_{ij}(P_{i1}, P_{j2}) & \cdots & \zeta_{ij}(P_{i1}, P_{jK}) \\ \zeta_{ij}(P_{i2}, P_{j1}) & \zeta_{ij}(P_{i2}, P_{j2}) & \cdots & \zeta_{ij}(P_{i2}, P_{jK}) \\ \vdots & \vdots & \ddots & \vdots \\ \zeta_{ij}(P_{iK}, P_{j1}) & \zeta_{ij}(P_{iK}, P_{j2}) & \cdots & \zeta_{ij}(P_{iK}, P_{jK}) \end{bmatrix}. \qquad (11)$$

Similarly, $\xi_i(p_i)$ can be rewritten as follows:

$$\xi_i(p_i) = \mathbf{r}_i^\top \mathbf{z}_i, \quad \mathbf{r}_i := [\xi_i(P_{i1}), \ldots, \xi_i(P_{iK})]^\top. \qquad (12)$$

By substituting (10) and (12) into (6), it can be rewritten as follows:

$$\text{Maximize } f(\mathbf{z}) := \mathbf{z}^\top Q \mathbf{z} + \mathbf{r}^\top \mathbf{z} \qquad (13)$$
$$\text{subject to } \mathbf{z} = [z_{11}, \ldots, z_{1K}, z_{21}, \ldots, z_{MK}]^\top \in \{0, 1\}^{MK}, \quad \sum_{k=1}^{K} z_{mk} = 1 \ (m = 1, \ldots, M), \qquad (14)$$

where $Q \in \mathbb{R}^{MK \times MK}$ and $\mathbf{r} \in \mathbb{R}^{MK}$ are defined by

$$Q = \begin{bmatrix} Q_{11} & Q_{12} & \cdots & Q_{1M} \\ Q_{21} & Q_{22} & \cdots & Q_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{M1} & Q_{M2} & \cdots & Q_{MM} \end{bmatrix}, \quad \mathbf{r} = \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \vdots \\ \mathbf{r}_M \end{bmatrix}. \qquad (15)$$

The terms $\mathbf{z}^\top Q \mathbf{z}$ and $\mathbf{r}^\top \mathbf{z}$ correspond to the second and first terms of (5), respectively. Problem (13) is referred to as a BQP problem and is known to be NP-hard. Although it is hard to find a globally optimal solution of (13), relaxation methods for it have been well studied; further, in Section 4, we propose a relaxation method which empirically obtains an accurate solution.

Using $\mathbf{z}$, the constraint (7) can be expressed as follows:

$$\sum_{m=1}^{M} z_{m1} \geq M - L. \qquad (16)$$

This is a linear constraint on $\mathbf{z}$, and such linear constraints can be naturally incorporated into (13) (the problem remains a BQP).

For simplicity, we redefine the indices of the entries of the vectors and matrices as follows:

$$\mathbf{z} = (z_i)_{1 \leq i \leq KM}, \quad \mathbf{r} = (r_i)_{1 \leq i \leq KM} \in \mathbb{R}^{KM}, \qquad (17)$$
$$Q = (q_{ij})_{1 \leq i,j \leq KM} \in \mathbb{R}^{KM \times KM}. \qquad (18)$$

Then the equality constraints (14) can be expressed in the following general form:

$$\sum_{i \in I_m} z_i = 1 \quad (m = 1, \ldots, M), \qquad (19)$$

where $\{I_m\}_{m=1}^{M}$ is a partition of $\{1, 2, \ldots, KM\}$. In summary, we solve the following BQP problem to obtain a price strategy satisfying the business requirements:

$$\text{Maximize } f(\mathbf{z}) := \mathbf{z}^\top Q \mathbf{z} + \mathbf{r}^\top \mathbf{z} \qquad (20)$$
$$\text{subject to } \mathbf{z} = [z_1, \ldots, z_{KM}]^\top \in \{0, 1\}^{KM}, \quad \sum_{i \in I_m} z_i = 1 \ (m = 1, \ldots, M),$$
$$\mathbf{a}_u^\top \mathbf{z} = b_u \ (u = 1, \ldots, U), \quad \mathbf{c}_v^\top \mathbf{z} \leq d_v \ (v = 1, \ldots, V),$$

where U and V are the numbers of equality and inequality constraints, respectively, and $\mathbf{a}_u$, $b_u$, $\mathbf{c}_v$, and $d_v$ are the coefficients of the linear constraints. Although we restrict business constraints to those expressed as linear constraints, we emphasize that linear constraints can cover a wide variety of practical business requirements.
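As an illustration of how the BQP data could be assembled in practice, the following hedged sketch builds Q and r of (13)/(20) from the functions $\xi_m$ and $\zeta_{mm'}$ in (3)-(4), following the block structure of (11), (12), and (15); all function and variable names are ours, not the paper's.

```python
# A hedged sketch of assembling the BQP data (Q, r) of problem (13)/(20).
import numpy as np

def build_bqp(xi, zeta, P):
    """xi[m](p) and zeta[m][m2](p, p2) follow (3)-(4); P[m][k] is the k-th
    price candidate of product m. Returns (Q, r) with z indexed as
    z = [z_11, ..., z_1K, z_21, ..., z_MK]."""
    M, K = len(P), len(P[0])
    Q = np.zeros((M * K, M * K))
    r = np.zeros(M * K)
    for i in range(M):
        for k in range(K):
            r[i * K + k] = xi[i](P[i][k])                 # entries of r_i, cf. (12)
            for j in range(M):
                for l in range(K):
                    Q[i * K + k, j * K + l] = zeta[i][j](P[i][k], P[j][l])  # (11)
    return Q, r
```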

3.2 MIP Relaxation Method

Problem (13) is a kind of mixed-integer quadratic programming called binary quadratic programming. One of the most well-known relaxation techniques for efficiently solving it is mixed-integer linear programming [12].

By introducing auxiliary variables $z_{ij}$ ($1 \leq i < j \leq KM$) corresponding to $z_{ij} = z_i z_j$, and also introducing

$$\sum_{i=mK+1}^{j} z_{ij} + \sum_{i=j+1}^{mK+K} z_{ji} = \Big(\sum_{i=mK+1}^{mK+K} z_i - 1\Big) z_j = 0, \qquad (21)$$


we can transform (20) into the following MILP problem [12]:

$$\text{Maximize } \sum_{i=1}^{KM} (r_i + q_{ii}) z_i + \sum_{i=1}^{KM} \sum_{j=i+1}^{KM} (q_{ij} + q_{ji}) z_{ij} \qquad (22)$$
$$\text{subject to } \sum_{i=mK+1}^{mK+K} z_i = 1 \quad (0 \leq m \leq M-1),$$
$$z_{ij} \leq z_i, \quad z_{ij} \leq z_j \quad (1 \leq i < j \leq KM),$$
$$\sum_{i=mK+1}^{j} z_{i,j} + \sum_{i=j+1}^{mK+K} z_{j,i} = 0 \quad (mK < j \leq mK+K, \ 0 \leq m \leq M-1),$$
$$z_{ij} \geq 0 \quad (1 \leq i < j \leq KM), \qquad z_i \in \{0, 1\} \quad (1 \leq i \leq KM).$$

Note that, though the objective function and constraints are linear, integer variables still exist. The worst-case complexity is therefore still exponential, which means the computational cost might increase rapidly with problem size even with a modern commercial MILP solver.
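For concreteness, here is a sketch of the MILP (22) written with PuLP and its default CBC solver (our choice of tooling, not the paper's). Consistent with the discussion above, the model mirrors (22) as stated, so its optimal value upper-bounds that of (20).

```python
# A sketch of the MILP relaxation (22); variable and helper names are ours.
import pulp

def solve_milp(Q, r, M, K):
    n = M * K
    prob = pulp.LpProblem("bqp_milp", pulp.LpMaximize)
    z = [pulp.LpVariable(f"z_{i}", cat="Binary") for i in range(n)]
    w = {(i, j): pulp.LpVariable(f"w_{i}_{j}", lowBound=0)   # w_ij ~ z_i * z_j
         for i in range(n) for j in range(i + 1, n)}
    prob += (pulp.lpSum((r[i] + Q[i][i]) * z[i] for i in range(n))
             + pulp.lpSum((Q[i][j] + Q[j][i]) * w[i, j] for (i, j) in w))
    for m in range(M):                       # one-of-K per product, cf. (14)
        prob += pulp.lpSum(z[m * K + k] for k in range(K)) == 1
    for (i, j) in w:                         # linearization bounds in (22)
        prob += w[i, j] <= z[i]
        prob += w[i, j] <= z[j]
    for m in range(M):                       # within-block products vanish, cf. (21)
        for j in range(m * K, m * K + K):
            prob += pulp.lpSum(w[min(i, j), max(i, j)]
                               for i in range(m * K, m * K + K) if i != j) == 0
    prob.solve()
    return [int(pulp.value(v)) for v in z]
```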

4 SDP Relaxation Using Goemans-Williamson's Approximation

In order to efficiently solve our prescriptive price optimization formulated as the BQP problem, this section proposes a fast approximation method. Our idea is closely related to the Goemans-Williamson MAX-CUT approximation algorithm [7], abbreviated below as the GW algorithm. The GW algorithm solves the MAX-CUT problem, achieving the best approximation ratio among existing polynomial-time algorithms. Noticing that a MAX-CUT problem is a special case of BQP problems, we generalize it to an approximation algorithm for BQPs.

The proposed algorithm consists of the following two steps:

1. Transform the original BQP problem (13) into a semidefinite programming (SDP) problem (38), borrowing the relaxation technique used in the GW algorithm.

2. Construct a feasible solution of the original BQP problem on the basis of the optimal solution to the SDP problem.

The optimal solution of the SDP problem can be computed globally and efficiently by recent advanced solvers such as SDPA [24], SDPT3 [20], and SeDuMi [18]. In our experiments, the empirical computational time fits a cubic order of the problem size.

4.1 Notations

Let us here introduce a few additional notations. Let $\mathrm{Sym}_n$ denote the set of all real symmetric matrices of size n:

$$\mathrm{Sym}_n = \{X \in \mathbb{R}^{n \times n} \mid X^\top = X\}. \qquad (23)$$

Let us also define an inner product over $\mathrm{Sym}_n$ by $X \bullet Y = \sum_{i=1}^{n} \sum_{j=1}^{n} X_{ij} Y_{ij}$ for $X, Y \in \mathrm{Sym}_n$. Further, let $S^n$ denote the set of all vectors on the unit $\ell_2$ ball in n+1 dimensions:

$$S^n = \{\mathbf{x} \in \mathbb{R}^{n+1} \mid \|\mathbf{x}\|_2 = 1\}. \qquad (24)$$

4.2 Derivation of SDP Relaxation

Let us first define $\bar{Q}$ by $\bar{Q} := (Q + Q^\top)/2$, which satisfies $\mathbf{x}^\top \bar{Q} \mathbf{x} = \mathbf{x}^\top Q \mathbf{x}$. Let us also consider variables transformed from $\{0, 1\}$ to $\{-1, 1\}$:

$$\mathbf{t} = -\mathbf{1} + 2\mathbf{z} \in \{-1, 1\}^{KM}, \qquad (25)$$

where $\mathbf{t} = [t_1, \ldots, t_{KM}]^\top$ and $\mathbf{1} = (1, 1, \ldots, 1)^\top$. The objective function of (13) can then be transformed as follows:

$$\mathbf{z}^\top Q \mathbf{z} + \mathbf{r}^\top \mathbf{z} = [1 \ \ \mathbf{t}^\top]\, A \begin{bmatrix} 1 \\ \mathbf{t} \end{bmatrix}, \qquad (26)$$

where we define $A \in \mathrm{Sym}_{KM+1}$ by

$$A = \frac{1}{4} \begin{bmatrix} \mathbf{1}^\top \bar{Q} \mathbf{1} + 2\mathbf{r}^\top \mathbf{1} & (\mathbf{r} + \bar{Q}\mathbf{1})^\top \\ \mathbf{r} + \bar{Q}\mathbf{1} & \bar{Q} \end{bmatrix}. \qquad (27)$$


Further, the one-of-K constraint of (13), i.e., $\sum_{k=1}^{K} z_{mk} = 1$, can be transformed as follows:

$$\sum_{k=1}^{K} t_{Km+k} = -K + 2 \quad (m = 0, \ldots, M-1). \qquad (28)$$

The central idea of the GW algorithm is to relax $\{1, -1\}$-valued variables into $S^n$-valued ones. In order to apply the GW algorithm, we first define the following auxiliary variables:

$$\mathbf{x}_0 = [1, 0, \ldots, 0]^\top, \qquad (29)$$
$$\mathbf{x}_i = [t_i, 0, \ldots, 0]^\top \quad (i = 1, \ldots, KM). \qquad (30)$$

On the basis of this transformation, we obtain the following relaxed problem:

$$\text{Maximize } \mathrm{tr}\!\left([\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_{KM}]\, A\, [\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_{KM}]^\top\right) \qquad (31)$$
$$\text{s.t. } \mathbf{x}_i \in \mathbb{R}^{KM+1}, \ \|\mathbf{x}_i\|_2 = 1 \ (i = 0, \ldots, KM),$$
$$\sum_{k=1}^{K} \mathbf{x}_{Km+k} = (-K + 2)\,\mathbf{x}_0 \quad (m = 0, \ldots, M-1).$$

It is easy to confirm that (31) is a relaxation of (13).

Next, in order to derive an SDP form, we transform the objective as follows:

$$g(Y) := \mathrm{tr}\!\left([\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_{KM}]\, A\, [\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_{KM}]^\top\right) = A \bullet Y, \qquad (32)$$

by introducing a new variable $Y = (y_{ij})_{0 \leq i,j \leq KM} \in \mathrm{Sym}_{KM+1}$ defined as the Gram matrix

$$Y = [\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_{KM}]^\top [\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_{KM}]. \qquad (33, 34)$$

From the definition, Y is positive semidefinite and satisfies

$$y_{ij} = \mathbf{x}_i^\top \mathbf{x}_j \quad (i, j = 0, 1, \ldots, KM). \qquad (35)$$

Conversely, there exist $\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_{KM} \in \mathbb{R}^{KM+1}$ satisfying Eqs. (33)-(34) and (35) if Y is positive semidefinite.

By using the matrix Y, we can express the constraint conditions $\|\mathbf{x}_i\|_2 = 1$ by $y_{ii} = 1$. Since $\mathbf{x}_0$ is a unit vector, the condition $\sum_{k=1}^{K} \mathbf{x}_{Km+k} = (-K+2)\,\mathbf{x}_0$ holds if and only if

$$\mathbf{x}_0^\top \sum_{k=1}^{K} \mathbf{x}_{Km+k} = -K + 2, \qquad \Big\|\sum_{k=1}^{K} \mathbf{x}_{Km+k}\Big\|_2^2 = (-K + 2)^2, \qquad (36)$$

which are expressed as follows:

$$\sum_{k=1}^{K} y_{0,Km+k} = -K + 2, \qquad \sum_{k=1}^{K} \sum_{l=1}^{K} y_{Km+k,Km+l} = (-K + 2)^2. \qquad (37)$$

Summarizing the above arguments, we obtain the following SDP problem:

$$\text{Maximize } g(Y) \qquad (38)$$
$$\text{s.t. } Y = (y_{ij})_{0 \leq i,j \leq KM} \in \mathrm{Sym}_{KM+1}, \quad Y \succeq O, \quad y_{ii} = 1 \ (i = 0, \ldots, KM),$$
$$\sum_{k=1}^{K} y_{0,Km+k} = -K + 2 \quad (m = 0, \ldots, M-1),$$
$$\sum_{k=1}^{K} \sum_{l=1}^{K} y_{Km+k,Km+l} = (-K + 2)^2 \quad (m = 0, \ldots, M-1).$$

This problem is equivalent to Problem (31), and hence is a relaxation of (13). Consequently, the optimal value of (38) gives an upper bound on that of (13). Due to space limitations, we omit the derivation of the SDP problem for (20). We denote the optimal solution of our SDP relaxation problem by $\bar{Y}$.
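The SDP (38) can be stated almost verbatim in a modeling language. Below is a minimal sketch using CVXPY with the SCS solver, which is an assumption on our part (the paper's experiments used SDPA); A is the matrix from (27).

```python
# A minimal sketch of the SDP relaxation (38) in CVXPY; the solver choice (SCS)
# is our assumption -- the paper used SDPA. A is the (KM+1)x(KM+1) matrix of (27).
import cvxpy as cp

def solve_sdp(A, M, K):
    n = K * M + 1
    Y = cp.Variable((n, n), PSD=True)            # Y symmetric, Y >= O
    cons = [cp.diag(Y) == 1]                     # y_ii = 1
    for m in range(M):
        blk = range(1 + K * m, 1 + K * (m + 1))  # indices of product m's block
        cons.append(sum(Y[0, i] for i in blk) == -K + 2)            # (37), first
        cons.append(sum(Y[i, j] for i in blk for j in blk)
                    == (-K + 2) ** 2)                               # (37), second
    prob = cp.Problem(cp.Maximize(cp.trace(A @ Y)), cons)           # g(Y) = A . Y
    prob.solve(solver=cp.SCS)
    return Y.value, prob.value                   # relaxed solution and g(Y_bar)
```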

4.3 Rounding

Once we obtain the optimal solution $\bar{Y}$, we construct a feasible solution of the original BQP problem by using rounding techniques. In the derivation of Problem (31), 1 is replaced by $\mathbf{x}_0$ and $t_i$ is replaced by $\mathbf{x}_i$ for $i = 1, \ldots, KM$. Accordingly, in expectation, the following relationship between $\mathbf{z}$ and $\bar{Y}$ might hold:

$$2z_i - 1 = t_i = 1 \cdot t_i \approx \mathbf{x}_0^\top \mathbf{x}_i = y_{0i} \quad (i = 1, \ldots, KM). \qquad (39)$$

A simple rounding is then to construct $\bar{z}_i$, for each $i$ in a block $I_m$, as follows:

$$\bar{z}_i = \begin{cases} 1 & \text{if } y_{0i} > y_{0j} \text{ for all } j \in I_m \setminus \{i\} \\ 0 & \text{otherwise}, \end{cases} \qquad (40)$$

where we denote the rounded solution by $\bar{\mathbf{z}}$. On the basis of the above observation, this paper applies two heuristics to explore a better feasible solution.

The first is a deterministic search, summarized in Algorithm 1. The key idea behind the deterministic search is to explore feasible solutions by combining elements with higher values of the relaxed solution $y_{0,Km+k}$: the algorithm first collects indices, as shown in Line 3, and then simply evaluates the objective values of all possible combinations of the collected indices, as shown in Line 5. Note that it restricts the size of the search space by T to avoid combinatorial explosion.

The second is a randomized search, summarized in Algorithm 2. The key idea behind the randomized search is to interpret $y_{0,Km+k}$ as a probability, by virtue of the constraint $\sum_{k=1}^{K} y_{0,Km+k} = -K + 2$. We pick an index i with probability $(y_{0,i} + 1)/2$ and set $\bar{\mathbf{z}}$ as follows:

$$\bar{z}_j = \begin{cases} 1 & j = i \\ 0 & j \in I_s \setminus \{i\}. \end{cases} \qquad (41)$$

The higher the value of $y_{0,Km+k}$, the more likely the corresponding value of $\bar{z}_j$ is to be 1. We repeat this procedure until it returns a feasible solution of the original problem. In our empirical evaluation, the deterministic search performed slightly better for (13). On the other hand, it often failed to find a feasible solution for (20), for which the randomized search is therefore preferable.

Algorithm 1 Deterministic Search Rounding
Input: $\bar{Y}$, T
Output: $\bar{\mathbf{z}}$
1: For each m, initialize the index set
$$C_m = \{\arg\max_k \{y_{0,Km+k} \mid k = 1, \ldots, K\}\}. \qquad (42)$$
2: while $\prod_{m=1}^{M} |C_m| < T$ do
3: Update an index set such that
$$(\hat{m}, \hat{k}) = \arg\max_{m \in \{1,\ldots,M\},\, k \in \{1,\ldots,K\},\, k \notin C_m} \{y_{0,Km+k}\}, \qquad (43)$$
$$C_{\hat{m}} \leftarrow C_{\hat{m}} \cup \{\hat{k}\}. \qquad (44)$$
4: end while
5: Let $C_z$ be the set of all combinatorial candidates of rounded solutions w.r.t. $C_m$ for all m, formally the set of vectors that place, in each m-th chunk, a single 1 at some position $k \in C_m$ and zeros elsewhere, so that $|C_z| = \prod_{m=1}^{M} |C_m| \geq T$. Then, compute the rounded solution as
$$\bar{\mathbf{z}} = \arg\max_{\mathbf{z} \in C_z \cap Z} f(\mathbf{z}), \qquad (45)$$
where Z is the feasible region of the original problem.


Algorithm 2 Randomized Search Rounding
Input: $\bar{Y}$, T
Output: $\bar{\mathbf{z}}$
1: For each s, randomly pick i from $I_s$ with probability $(y_{0,i} + 1)/2$, and set $\bar{\mathbf{z}}$ by (41).
2: do
3: Let $I_{\mathrm{vio}} \subseteq \{1, \ldots, n\}$ be defined by
$$I_{\mathrm{vio}} = \{i \mid \exists \text{ violated constraint w.r.t. } \bar{z}_i\}. \qquad (46)$$
4: Randomly pick s such that $I_{\mathrm{vio}} \cap I_s \neq \emptyset$, and pick i from $I_s$ with probability $(y_{0,i} + 1)/2$. Then, set $\bar{\mathbf{z}}$ by (41) for the given $I_s$.
5: while $|I_{\mathrm{vio}}| > 0$
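To illustrate the rounding step, here are hedged sketches of the simple rounding (40) and of the per-block randomized pick used in (41); within a block the probabilities $(y_{0,i}+1)/2$ sum to one by (37). The helper names are ours.

```python
# Sketches of the simple rounding (40) and a randomized per-block pick as in (41).
import numpy as np

def round_simple(Y_bar, M, K):
    y0 = Y_bar[0, 1:]                         # y_{0,1}, ..., y_{0,KM}
    z = np.zeros(M * K, dtype=int)
    for m in range(M):
        z[m * K + int(np.argmax(y0[m * K:(m + 1) * K]))] = 1
    return z                                  # one-of-K holds by construction

def round_random(Y_bar, M, K, rng=np.random.default_rng()):
    y0 = Y_bar[0, 1:]
    z = np.zeros(M * K, dtype=int)
    for m in range(M):
        p = (y0[m * K:(m + 1) * K] + 1) / 2   # sums to 1 within a block by (37)
        p = np.clip(p, 0, None)               # guard against numerical noise
        z[m * K + rng.choice(K, p=p / p.sum())] = 1
    return z
```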

4.4 Approximation Quality

It is practically important to evaluate the quality of the solution obtained by the SDP relaxation method. The SDP relaxation method yields $\bar{\mathbf{z}}$ and $\bar{Y}$, which immediately give $f(\bar{\mathbf{z}})$ and $g(\bar{Y})$. Consider then the following inequality:

$$f(\bar{\mathbf{z}}) \leq f(\mathbf{z}^*) \leq g(\bar{Y}), \qquad (47)$$

where $\mathbf{z}^*$ is the optimal solution of the original problem. The first inequality holds by the optimality of $\mathbf{z}^*$, and the second holds because the relaxed problem always gives an upper bound on the original one. Eq. (47) gives us a lower bound on the approximation ratio of the obtained solution:

$$\delta(\bar{\mathbf{z}}, \bar{Y}) := \frac{f(\bar{\mathbf{z}})}{g(\bar{Y})} \leq \frac{f(\bar{\mathbf{z}})}{f(\mathbf{z}^*)} \leq 1. \qquad (48)$$

Although we cannot obtain the true optimal solution $\mathbf{z}^*$ since the problem is NP-hard, we can estimate the quality of the obtained solution $\bar{\mathbf{z}}$ by checking the value of $\delta(\bar{\mathbf{z}}, \bar{Y})$. Note that the approximation ratio can be calculated by taking the ratio between the original objective value and the relaxed objective value, and hence it can also be defined for other relaxation methods such as the MILP relaxation.
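In code, this a-posteriori quality check is a one-liner once $f(\bar{\mathbf{z}})$ and $g(\bar{Y})$ are available, e.g., from the sketches above (names are illustrative):

```python
# The bound (48): the ratio f(z_bar)/g(Y_bar) lower-bounds the approximation
# ratio f(z_bar)/f(z*) <= 1. Inputs are numpy arrays and a float, per the sketches.
def approx_ratio(Q, r, z_bar, g_Y_bar):
    f_z = float(z_bar @ Q @ z_bar + r @ z_bar)   # f(z_bar) in (13)
    return f_z / g_Y_bar                          # delta(z_bar, Y_bar)
```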

5 Simulation Study

This section investigates the detailed behavior of the proposed method in artificial simulations. We used GUROBI Optimizer 6.0.4², a state-of-the-art commercial solver for mathematical programming, to solve the MIQP and MILP problems, and SDPA 7.3.8³, an open-source solver for SDP problems. All experiments were conducted on a machine equipped with an Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz (72 cores), 768GB RAM, and CentOS 7.1. We limited all processes to a single CPU core.

5.1 Simulation Model

The sales quantity $q_m$ of the m-th product was generated from the following regression model:

$$q_m = \alpha_m^* + \sum_{m'=1}^{M} \sum_{d=1}^{D} \beta_{mm'd}^* f_d(p_{m'}) + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2), \qquad (49)$$

where $\{f_d(x)\} = \{x, x^2, 1/x\}$ and the true coefficients $\{\alpha_m^*\}$ and $\{\beta_{mm'd}^*\}$ were generated by Gaussian random numbers, so that $\alpha_m^* \sim N(4M, 1)$, $\beta_{mm'd}^* \sim N(0, 1)$ ($m \neq m'$), and $\beta_{mm'd}^* \sim N(-1, 1)$ ($m = m'$). The price $p_m$ was uniformly sampled from the fixed price candidates $\{0.8, 0.85, 0.9, 0.95, 1\}$ (K = 5), and the cost was fixed to $c_m = 0.7$.

Let us denote the gross profit function (2) computed with the true parameters by $\ell^*(\mathbf{p})$, and denote its expectation by

$$f^*(\mathbf{z}) := \mathbb{E}_\varepsilon[\ell^*(\mathbf{p})], \qquad (50)$$

where $\mathbb{E}_\varepsilon$ denotes expectation with respect to $\varepsilon$. Its maximizer is then denoted by

$$\mathbf{z}^* = \arg\max_{\mathbf{z} \in Z} f^*(\mathbf{z}). \qquad (51)$$
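For reference, here is a sketch of the data-generating process (49) with the distributions stated above; the sizes and random seed are arbitrary choices of ours.

```python
# A sketch of the simulation model (49); sizes and seed are illustrative.
import numpy as np

def simulate(M=5, N=1000, sigma=0.1, K=5, seed=0):
    rng = np.random.default_rng(seed)
    feats = [lambda x: x, lambda x: x ** 2, lambda x: 1.0 / x]   # {f_d}
    alpha = rng.normal(4 * M, 1, size=M)                # alpha*_m ~ N(4M, 1)
    beta = rng.normal(0, 1, size=(M, M, 3))             # beta*_{mm'd} ~ N(0, 1)
    beta[np.arange(M), np.arange(M), :] -= 1            # self terms ~ N(-1, 1)
    cand = np.array([0.8, 0.85, 0.9, 0.95, 1.0])        # price candidates, K = 5
    prices = cand[rng.integers(0, K, size=(N, M))]      # uniform over candidates
    F = np.stack([f(prices) for f in feats], axis=-1)   # (N, M, 3) price features
    noise = rng.normal(0, sigma, size=(N, M))           # eps ~ N(0, sigma^2)
    sales = alpha + np.einsum('nmd,kmd->nk', F, beta) + noise   # model (49)
    return prices, sales
```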

5.2 Scalability Comparison of BQP Solvers

We compared the SDP relaxation method with the MIQP solver implemented in GUROBI, which can directly solve the BQP problem, and with the MILP relaxation method described in Section 3.2. We denote them by SDPrelax, MIQPgrb, and MILPrelax, respectively. For each solver, we obtain the relaxed objective value $\bar{f}^*$ and the original objective value $f^*(\bar{\mathbf{z}})$, which satisfy:

$$f^*(\bar{\mathbf{z}}) \leq f^*(\mathbf{z}^*) \leq \bar{f}^*. \qquad (52)$$

Given a problem, the performance of the solvers was measured by computational efficiency and by the difference between $f^*(\bar{\mathbf{z}})$ and $\bar{f}^*$. Note that $f^*(\bar{\mathbf{z}}) = \bar{f}^*$ implies $\bar{\mathbf{z}} = \mathbf{z}^*$.

²http://www.gurobi.com/
³http://sdpa.sourceforge.net/

Fig. 1 shows the results with a small number of products, i.e., M = 1, 2, . . . , 15. We observed the following:

• In the top figure, SDPrelax obtained the optimal solution in only several seconds at M = 15, confirming the computational-efficiency advantage of SDPrelax over the others.
• In the top figure, the computational cost of MIQPgrb and MILPrelax increased exponentially with the problem size, and both reached the maximum time limit (one hour) at M = 11, confirming that they cannot scale to large problems.
• In the bottom figure, $\bar{f}^*$ and $f^*(\bar{\mathbf{z}})$ of SDPrelax were almost the same, implying that SDPrelax obtained nearly optimal solutions.
• In the bottom figure, $\bar{f}^*$ for MIQPgrb and MILPrelax rapidly increased from M = 11, because we terminated the optimization at the one-hour limit. Further, the upper bound of MILPrelax was looser than the others. On the other hand, $f^*(\bar{\mathbf{z}})$ for MIQPgrb and MILPrelax were close to that of SDPrelax (nearly optimal), so they might be able to obtain a practical solution with heuristic early stopping, though it is not trivial to determine when to stop the algorithms.

Next, we conducted experiments on large problems, aiming to verify 1) the scalability and solution quality of SDPrelax on larger problems, and 2) the solution qualities of MIQPgrb and MILPrelax under a fixed computational time budget. The second point was investigated because the bottom figure of Fig. 1 indicates that MIQPgrb and MILPrelax might reach nearly optimal solutions much earlier than termination. In order to evaluate this, we aborted MIQPgrb and MILPrelax with the same computational time budget (i.e., we terminated them when their computational time reached that of SDPrelax).

Figure 1: Comparisons of SDPrelax, MIQPgrb, and MILPrelax with a small number of products. The horizontal axis represents the number of products M. The vertical axes represent computational time (top) and the objective values $f^*(\bar{\mathbf{z}})$ and $\bar{f}^*$ (bottom). For the bottom, values are normalized such that $f^*(\bar{\mathbf{z}}) = 1$ for SDPrelax.

Fig. 2 shows the results with a large number of products. We observed the following:

• In the top figure, the computational time of SDPrelax fits a cubic curve in M well, so its practical computational order might be O(M³). For 250 products, it took only 6 minutes to obtain the solution. This is not real-time processing, but it is sufficiently fast for scenarios such as price planning for retail stores. Further, let us emphasize that we used only a single core for this comparison; the computational time could be significantly reduced by taking advantage of recent advances in parallel linear algebra processing.

• In the bottom figure, the solution of SDPrelax was still nearly optimal even when the number of products increased up to M = 250, i.e., $f^*(\bar{\mathbf{z}})/\bar{f}^*$ of SDPrelax was at least 0.98. This indicates that the SDP relaxation is tight enough to obtain practically good solutions.

• In the bottom figure, under the computational budget constraint, the solutions of MIQPgrb and MILPrelax were significantly worse than that of SDPrelax, and they became even worse as the problem size grew.

Figure 2: Comparisons of SDPrelax, MIQPgrb, and MILPrelax with a large number of products. The top figure shows the computational time of SDPrelax over the number of products M, together with a cubic fit y = cx³. The bottom figure compares the values of $f^*(\bar{\mathbf{z}})$ for the three methods, restricting their computational time to that of SDPrelax; values are normalized such that $f^*(\bar{\mathbf{z}}) = 1$ for SDPrelax.

These results show that, for solving our BQPs (13), SDPrelax significantly outperforms the other state-of-the-art BQP solvers in both scalability and optimization accuracy. Furthermore, SDPrelax returns a smaller upper bound $\bar{f}^*$ on the exact optimal value, which means that it gives better guarantees on the accuracy of the computed solution.

5.3 Influence of Parameter Estimation

In practice, we do not know the true parameters and have to estimate them from a training dataset denoted by $D = \{\mathbf{p}_n, \mathbf{q}_n\}_{n=1}^{N}$. In this experiment, given D, we estimated the regression coefficients, denoted by $\{\hat{\alpha}_m\}$ and $\{\hat{\beta}_{mm'd}\}$. The gross profit function with the estimated parameters is then denoted by $\hat{f}(\mathbf{z})$.

Let us define the solution for the estimated objective as follows:

$$\hat{\mathbf{z}} = \arg\max_{\mathbf{z} \in Z} \hat{f}(\mathbf{z}). \qquad (53)$$

This section investigates how the optimization results are affected by estimation. Note that the problem is NP-hard and we can obtain neither $\mathbf{z}^*$ nor $\hat{\mathbf{z}}$ exactly. However, the results in the previous subsection indicate that SDPrelax obtains nearly optimal solutions, so this section treats the solutions of SDPrelax as $\mathbf{z}^*$ and $\hat{\mathbf{z}}$.

We have three quantities of practical interest: 1) the ideal gross profit $f^*(\mathbf{z}^*)$, 2) the actual gross profit $f^*(\hat{\mathbf{z}})$, and 3) the predicted gross profit $\hat{f}(\hat{\mathbf{z}})$. It is worth noting that, if the true model is a linear regression as in this setting, the following relationship holds:

$$f^*(\hat{\mathbf{z}}) \leq f^*(\mathbf{z}^*) \leq \mathbb{E}_D[\hat{f}(\hat{\mathbf{z}})], \qquad (54)$$

where $\mathbb{E}_D$ is the expectation w.r.t. D. We omit the proof due to space limitations. This result yields two natural questions:
• How close are $f^*(\hat{\mathbf{z}})$ and $f^*(\mathbf{z}^*)$? In other words, how well does our estimated optimal strategy perform?
• How close are $f^*(\hat{\mathbf{z}})$ and $\hat{f}(\hat{\mathbf{z}})$? In other words, can we predict the actual profit in advance?

In the following, let δ stand for the relative magnitude of the noise in the data: $\delta := \sqrt{\sigma^2 / \mathbb{E}[q_m^2]}$, where $\sigma^2$ is the variance of the noise $\varepsilon$ in (49). Roughly speaking, δ is the level of prediction or estimation error that we cannot avoid.

Fig. 3 illustrates the behavior of $f^*(\hat{\mathbf{z}})/f^*(\mathbf{z}^*)$ and $\hat{f}(\hat{\mathbf{z}})/f^*(\mathbf{z}^*)$ in different settings. We observed the following:


• In the top figure, the overestimation of the predicted gross profit $\hat{f}(\hat{\mathbf{z}})/f^*(\mathbf{z}^*)$ grew linearly with increasing M, exceeding 15% (i.e., $\hat{f}(\hat{\mathbf{z}})/f^*(\mathbf{z}^*) \geq 1.15$) with fifty products (M = 50) under δ = 0.2. On the other hand, the actual gross profit (red line) remained nearly optimal, $f^*(\hat{\mathbf{z}})/f^*(\mathbf{z}^*) \approx 1.0$. This means that over-flexible models (in terms of the number of products used in the prediction models and available for optimization) significantly overestimate the gross profit. Although the obtained price strategy stays nearly optimal, this is not preferable, since users cannot appropriately assess the risk of machine-generated price strategies. In order to mitigate this issue, the next section investigates incorporating a sparse learning technique into learning the regression models, so that the effective model flexibility stays reasonable even when M is large.

• In the middle figure, the gap between $f^*(\hat{\mathbf{z}})/f^*(\mathbf{z}^*)$ and $\hat{f}(\hat{\mathbf{z}})/f^*(\mathbf{z}^*)$ increased with the noise level. This verifies a natural intuition: if the estimation errors of the regression models are large, the modeling error in the BQP problem becomes large and the solution becomes unreliable. Therefore, achieving fairly good predictive models (say, with an error rate below 20% in this setting) is critical in this framework.

• In the bottom figure, the gap between $f^*(\hat{\mathbf{z}})/f^*(\mathbf{z}^*)$ and $\hat{f}(\hat{\mathbf{z}})/f^*(\mathbf{z}^*)$ decreased with increasing training data, confirming that a larger data size makes the estimation, and eventually the optimization, accurate.

Figure 3: Values of $f^*(\hat{\mathbf{z}})/f^*(\mathbf{z}^*)$ (red) and $\hat{f}(\hat{\mathbf{z}})/f^*(\mathbf{z}^*)$ (blue) in different settings. Dots and error bars show the average and standard deviation over 100 trials. Top: δ = 0.2, N = 1000. Middle: M = 10, N = 1000. Bottom: M = 10, δ = 0.2.

6 Real World Retail Data

6.1 Data and Experimental Settings

We applied our prescriptive price optimization method to real retail data from a middle-size supermarket located in Tokyo⁴ [23]. We selected 50 regularly-sold beer products⁵, which vary in brand and package, as shown in Table 1. The data covers approximately three years, from 2012/01 to 2014/12; we used the first 35 months (1065 samples) to train the regression models and simulated the best price strategy for the following week. In addition to the 50 linear price features, we employed "day of the week" features (g1-g7) for weekly trends, "month" features (g8-g19) for seasonal trends, weather and temperature forecast features (g20-g24), and auto-correlation features (g25-g31) as external features. The price candidates $\{P_{mk}\}_{k=1}^{5}$ were generated by equally splitting the range $[P_{m1}, P_{m5}]$, where $P_{m1}$ and $P_{m5}$ are the highest and lowest prices of the m-th product in the historical data. We can regard $P_{m1}$ as the list price of the m-th product and the others as discounted prices. Further, we assumed that the cost $c_m$ of selling one unit of the m-th product is $0.3 P_{m1}$.

⁴The data was provided by KSP-SP Co., LTD, http://www.ksp-sp.com.

6.2 Influence of Model Complexity

As discussed in the previous section, if we use all 50 products in the predictions, the predicted gross profit might significantly overestimate the actual one. However, it can reasonably be assumed that the sales of a given product are affected by only a limited number of products, not by all of them. Hence, we can expect to mitigate the overestimation issue by learning such a sparse cross-price structure. In addition, there are influential variables that must be taken into account, e.g., the prices of the top-selling products; in practice, however, we observed that such variables could be omitted by OMP because of multicollinearity. In order to manage this issue, we first applied standard least squares estimation (LS) to the top-5 products and then applied OMP to the residual to extract 10 additional variables, including external variables. In our experiment, 10 price features and 5 external features were selected on average. We denote this procedure by LS-OMP. Both standard LS and LS-OMP produced fairly good predictive models, with approximately 20% relative error on average.

⁵The data contains sales histories of beer, bakery, milk, and tofu products; we chose beers since they generally have larger cross-price effects than the others.
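Below is a minimal sketch of the LS-OMP procedure described above, assuming scikit-learn; the design matrices and the split into top-5 price features versus the remaining candidate features are illustrative.

```python
# A hedged sketch of LS-OMP: least squares on the top-5 sellers' price features,
# then OMP on the residual. top5_X, rest_X, and y are illustrative inputs.
from sklearn.linear_model import LinearRegression, OrthogonalMatchingPursuit

def fit_ls_omp(top5_X, rest_X, y, n_extra=10):
    ls = LinearRegression().fit(top5_X, y)       # LS on must-have features
    residual = y - ls.predict(top5_X)
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_extra).fit(rest_X, residual)
    return ls, omp                                # combined forecast: ls + omp
```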

Fig. 4 illustrates the predicted gross profits (solid lines) when the number of discounted products L is fixed by $\sum_{m=1}^{M} z_{m1} = M - L$. Further, we estimated the overestimation of LS and LS-OMP to be 35% and 10%, respectively, on the basis of the observations⁶ in Fig. 3. The dashed lines subtract these 35% and 10% from the solid lines to take the overestimation risk into account. We observed the following:

• LS achieved a much higher "predicted profit" (green solid) than the actual profit (red line). By this estimate, the price strategy achieves a 33.2% profit improvement at the maximum point (L = 20), which is unrealistically high. On the other hand, taking the overestimation into account (green dashed), this strategy could even decrease profit. These results illustrate the risk of machine-based price optimization and the necessity of appropriately managing estimation errors in machine learning.

• LS-OMP achieved a 19.1% profit improvement in the "predicted profit" (blue solid), which again must be higher than reality. However, by restricting the number of price variables using OMP, LS-OMP still achieved an 8.2% profit improvement in the "worst" case. Although this number itself needs more careful inspection, the result implies the importance of controlling model complexity to derive a realistic and profitable price strategy.

6.3 Interpretation of Derived Price Strategy

Table 1 shows all 50 products with their prices, sales quantities [units], and sales revenues [yen]. The table provides rich insights into how the machine tried to maximize the profit of this supermarket. Let us summarize the notable points:

• Prices of 18 out of the 50 products were increased or decreased by the prescriptive price optimization. In particular, the major impact is on the top-10 discounted products, meaning that the prices of 20% of the products dominated revenues/profits.

• We observed sales decreases for only 4 minor products (id = 3, 18, 44, 47) out of 50, while the remaining 46 products increased their sales.

⁶We roughly estimated the overestimation of LS and LS-OMP by relating them to the cases (M = 50, N = 1000, δ = 0.2) and (M = 10, N = 1000, δ = 0.2), respectively.


Figure 4: Estimated profits computed for LS (green) and LS-OMP (blue). The vertical dotted lines indicate the numbers of discounted products that achieved the maximum profits. The actual profit is the gross profit in the data during this period.


• Asahi Superdry and Kirin Ichibanshibori are the most popular products. In particular, their 350ml * 6 packages (id = 28, 34) accounted for 23% of total sales [yen]. With the optimal prices, this trend was enhanced: by discounting their prices, their share grew to 27%. Overall, the strategy can be interpreted as enhancing sales of popular products, which sounds natural from a domain point of view.

• It is interesting to note that the price of Asahi Superdry 500ml * 6 (id = 14) was increased, yet its sales quantity [units] did not change, resulting in an increase in its sales [yen]. This product might have low price elasticity of demand, so that a small price increase does not affect its demand but increases its sales.

7 Summary

This paper presented prescriptive price optimization, which models complex demand-price relationships based on massive regression formulas produced by machine learning and then finds the optimal prices maximizing the profit function. We showed that the problem can be formulated as a BQP problem, and we presented a fast solver using an SDP relaxation. Simulation experiments confirmed that the proposed algorithm performs much better than state-of-the-art optimization methods in terms of both scalability and quality of output solutions. Empirical evaluations were conducted on a real retail dataset with respect to 50 beer products as well as on a simulation dataset. The results indicate that the derived price strategy could improve the gross profit of these products by 8.2%. Further, our detailed empirical evaluation revealed the risk of overestimated profits caused by estimation errors in machine learning, and a way to mitigate such an issue using sparse learning. A challenging future direction is to avoid the effects of estimation error in real applications. When there are unobserved variables affecting prices or sales, we cannot estimate the parameters accurately, which might cause large errors in the estimated gross profit function and make the result unreliable. To cope with such situations, optimization frameworks that take estimation errors into account, such as robust optimization, may be needed.

References

[1] D. Bienstock and N. Özbay. Computing robust basestock levels. Discrete Optimization, 5(2):389-414, 2008.

[2] J. Burgschweiger, B. Gnädig, and M. C. Steinbach. Optimization models for operative planning in drinking water networks. Optimization and Engineering, 10:43-73, 2009.

[3] F. Caro and J. Gallien. Clearance pricing optimization for a fast-fashion retailer. Operations Research, 60(6):1404-1422, 2012.

[4] C. Dziekan. The analytics journey. Analytics, 2010.

[5] K. J. Ferreira, B. H. A. Lee, and D. Simchi-Levi. Analytics for an online retailer: Demand forecasting and price optimization. Manufacturing & Service Operations Management, pages 69-88, 2015.


Table 1: List of beer products and their optimized prices/sales. Each row lists the original price, optimal price, and price increase rate; the original and optimal sales [units] and increase rate; and the original and optimal sales [yen] and increase rate. Notable parts were highlighted in boldface in the original.

id | product name | price: original, optimal, rate | sales[unit]: original, optimal, rate | sales[yen]: original, optimal, rate
1 | Kirin lager beer can 350ml | 255, 204, -20% | 20, 30, 50% | 5100, 6110, 20%
2 | Kirin lager beer can 350ml * 6 | 1120, 1120, 0% | 17, 26, 53% | 19000, 29000, 53%
3 | Suntory the premium malts 500ml | 285, 204, -28% | 81, 94, 16% | 23100, 19200, -17%
4 | Kirin Ichibanshibori draft beer 250ml * 6 | 956, 1000, 5% | 3, 7, 133% | 2870, 6730, 134%
5 | Kirin lager beer can 350ml | 188, 189, 1% | 29, 38, 31% | 5450, 7100, 30%
6 | Budweiser can 350ml | 180, 180, 0% | 53, 98, 85% | 9540, 17700, 86%
7 | Asahi Oriondraft can 350ml | 188, 189, 1% | 19, 24, 26% | 3570, 4470, 25%
8 | The premium malts tumbler 350ml * 6 | 1190, 1290, 8% | 52, 53, 2% | 61800, 67700, 10%
9 | Kirin Ichibanshibori draft beer 135ml * 6 | 543, 543, 0% | 8, 15, 88% | 4340, 8170, 88%
10 | Sapporo can draft black label 135ml | 91, 63, -31% | 8, 14, 75% | 728, 857, 18%
11 | Asahi Superdry can 500ml | 255, 235, -8% | 143, 199, 39% | 36500, 46700, 28%
12 | Corona extra bottle bin 355ml | 256, 230, -10% | 13, 23, 77% | 3330, 5240, 57%
13 | Kirin Ichibanshibori draft beer 250ml | 162, 146, -10% | 22, 26, 18% | 3560, 3810, 7%
14 | Asahi Superdry can 500ml * 6 | 1400, 1510, 8% | 34, 34, 0% | 47500, 51700, 9%
15 | Echigobeer Pilsner can 350ml | 265, 265, 0% | 4, 12, 200% | 1060, 3180, 200%
16 | Sapporo Ebisu beer can 350ml | 208, 208, 0% | 91, 120, 32% | 18900, 25000, 32%
17 | The premium malts can 500ml * 6 | 1570, 1700, 8% | 22, 24, 9% | 34500, 41300, 20%
18 | Kirin Ichibanshibori draft beer can 135ml | 91, 76, -16% | 43, 45, 5% | 3910, 3410, -13%
19 | Kirin Ichibanshibori draft beer can 350ml | 185, 189, 2% | 99, 136, 37% | 18300, 25700, 40%
20 | Asahi Superdry can 135ml * 6 | 543, 490, -10% | 10, 13, 30% | 5430, 6410, 18%
21 | Sapporo Ebisu beer can 350ml * 6 | 1190, 1130, -5% | 33, 47, 42% | 39200, 52800, 35%
22 | Sapporo Ebisu beer can 250ml | 172, 154, -10% | 15, 27, 80% | 2580, 4190, 62%
23 | Sapporo draft beer black label can 350ml | 185, 189, 2% | 51, 56, 10% | 9440, 10500, 11%
24 | Asahi Superdry can 350ml | 187, 188, 1% | 132, 199, 51% | 24700, 37400, 51%
25 | Asahi Superdry can 250ml | 162, 145, -10% | 16, 21, 31% | 2590, 3090, 19%
26 | Kirin Hartland beer bin 500ml | 267, 267, 0% | 41, 56, 37% | 10900, 14900, 37%
27 | Sapporo draft beer black label can 350ml * 6 | 1040, 1120, 8% | 27, 26, -4% | 28200, 29300, 4%
28 | Asahi Superdry can 350ml * 6 | 1050, 993, -5% | 115, 182, 58% | 120000, 181000, 51%
29 | Gingakougen beer of wheat 350ml | 246, 204, -17% | 25, 50, 100% | 6150, 10100, 64%
30 | Sapporo draft beer black label can 500ml | 250, 255, 2% | 77, 85, 10% | 19300, 21700, 12%
31 | Asahi Superdry can 250ml * 6 | 972, 972, 0% | 5, 6, 20% | 4860, 5940, 22%
32 | Suntory the premium malts 350ml | 218, 218, 0% | 92, 105, 14% | 20100, 23000, 14%
33 | Asahi Superdry can 135ml | 88, 91, 3% | 23, 30, 30% | 2020, 2690, 33%
34 | Kirin Ichibanshibori draft beer can 350ml * 6 | 1100, 1120, 2% | 77, 119, 55% | 84500, 133000, 57%
35 | Suntory the premium malts can 250ml | 178, 159, -11% | 37, 53, 43% | 6590, 8410, 28%
36 | Kirin Ichibanshibori draft beer can 500ml * 6 | 1500, 1320, -12% | 26, 33, 27% | 38900, 44100, 13%
37 | Asahi Superdry Dryblack 500ml | 255, 255, 0% | 25, 26, 4% | 6380, 6570, 3%
38 | Kirin Ichibanshibori draft beer bin 633ml | 313, 297, -5% | 9, 10, 11% | 2820, 2960, 5%
39 | Budweiser bin LNB 330ml | 188, 189, 1% | 10, 34, 240% | 1880, 6400, 240%
40 | Asahi Superdry bin 633ml | 293, 295, 1% | 31, 38, 23% | 9080, 11100, 22%
41 | Heineken bin 330ml | 218, 205, -6% | 9, 20, 122% | 1960, 4150, 112%
42 | Kirin Ichibanshibori stout can 350ml | 168, 189, 13% | 0, 5, - | 0, 852, -
43 | Kirin lager beer 500ml * 6 | 1500, 1510, 1% | 8, 9, 13% | 12000, 14200, 18%
44 | Sapporo Ebisu beer can 500ml | 275, 222, -19% | 78, 79, 1% | 21500, 17400, -19%
45 | Sapporo Ebisu beer can 500ml * 6 | 1570, 1600, 2% | 24, 31, 29% | 37700, 48900, 30%
46 | Heineken can 350ml | 218, 229, 5% | 23, 34, 48% | 5010, 7830, 56%
47 | Asahi Superdry Dryblack 350ml | 188, 189, 1% | 29, 18, -38% | 5450, 3410, -37%
48 | Echigo Premium red ale 350ml | 265, 265, 0% | 9, 19, 111% | 2390, 4910, 105%
49 | Sapporo draft beer black label can 500ml * 6 | 1480, 1510, 2% | 12, 16, 33% | 17800, 23600, 33%
50 | Kirin Ichibanshibori draft beer can 500ml | 255, 255, 0% | 119, 129, 8% | 30300, 32800, 8%

[6] D. Fooladivanda and J. A. Taylor. Optimal pump scheduling and water flow in water distribution networks. In IEEE 54th Annual Conference on Decision and Control, pages 5265-5271, 2015.

[7] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115-1145, 1995.

[8] R. Klein. Revenue Management. Springer, 2008.

[9] D. Koushik, J. A. Higbie, and C. Eister. Retail price optimization at InterContinental Hotels Group. Interfaces, 42(1):45-57, 2012.

[10] T. P. Kunz and S. F. Crone. Demand models for the static retail price optimization problem: a revenue management perspective. SCOR, pages 101-125, 2014.

[11] S. Lee. Study of demand models and price optimization performance. PhD thesis, Georgia Institute of Technology, 2011.


[12] R. M. Lima and I. E. Grossmann. On the solution of nonconvex cardinality Boolean quadratic programming problems, 2012.

[13] Y. Liu and X. Guan. Purchase allocation and demand bidding in electric power markets. IEEE Transactions on Power Systems, 18(1):106-112, 2003.

[14] A. Marshall. Principles of Economics. Library of Economics and Liberty, 1920.

[15] M. Natter, T. Reutterer, and A. Mild. Dynamic pricing support systems for DIY retailers: a case study from Austria. Marketing Intelligence Review, 1:17-23, 2009.

[16] R. L. Phillips. Pricing and Revenue Optimization. Stanford University Press, 2005.

[17] G. van Ryzin and S. Mahajan. On the relationship between inventory costs and variety benefits in retail assortments. Management Science, 45(11):1496-1509, 1999.

[18] J. F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optimization Methods and Software, 11(1-4):625-653, 1999.

[19] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267-288, 1996.

[20] K.-C. Toh, M. J. Todd, and R. H. Tütüncü. SDPT3: a MATLAB software package for semidefinite programming, version 1.3. Optimization Methods and Software, 11(1-4):545-581, 1999.

[21] J. A. Tropp and A. C. Gilbert. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Transactions on Information Theory, 53(12):4655-4666, 2007.

[22] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38:49-95, 1996.

[23] J. Wang, R. Fujimaki, and Y. Motohashi. Trading interpretability for accuracy: Oblique treed sparse additive models. In KDD, pages 1245-1254, 2015.

[24] M. Yamashita, K. Fujisawa, and M. Kojima. Implementation and evaluation of SDPA 6.0. Optimization Methods and Software, 18(4):491-505, 2003.
