George Boulougaris, Kostas Kolomvatsos, Stathes Hadjiefthymiades
Building the Knowledge Base of a Buyer Agent Using Reinforcement
Learning Techniques
Pervasive Computing Research Group, Department of Informatics
and Telecommunications, University of Athens, Greece
WCCI – IJCNN 2010
Barcelona - Spain
Outline
- Introduction
- Market Members
- Scenario
- Buyer Q-Table
- Buyer Purchase Behavior
- Results
Introduction
- Intelligent Agents
  - Autonomous software components
  - Represent users
  - Learn from their owners
- Electronic Markets
  - Places where entities not known in advance can negotiate over the exchange of products
- Reinforcement Learning
  - A general framework for sequential decision making
  - Leads to the maximum long-term reward at every state of the world
Market Members
- Buyers
- Sellers
- Middle entities (matchmakers, brokers, market entities)
- Intelligent agents may represent each of these entities
- Entities do not have any information about the other entities in the market
Scenario (1/2)
- Buyers:
  - could interact with sellers
  - could interact with brokers or matchmakers (matchmakers cannot sell products)
  - want to buy the most appropriate product at the most profitable price
- We focus on the interaction between buyers and selling entities (sellers or brokers)
- Most research efforts focus only on the reputation of entities
- We utilize Q-Learning, which is well suited to deriving actions that lead to the maximum long-term reward (based on a number of parameters) at every state of the world
Scenario (2/2)
- The product parameters for each selling entity are:
  - ID
  - Time validity
  - Price
  - Time availability
  - Relevance
- Each selling entity represents a state that the buyer can be in
Buyer Q-Table (1/3)
- The buyer has one Q-Table for each product
- Rows represent states and columns represent actions
- There are M+1 columns (M is the number of selling entities)
- Actions [1..M] represent the transition to the [1..M] entity (row of the Q-Table)
- The transition to another entity corresponds to a 'not-buy-from-this-entity' action
- Action M+1 represents the purchase action (from the specific entity)
- The buyer's final Q-Table is a 3D table
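The layout described above can be sketched as follows; M and P are example values, and stacking the per-product tables gives the 3D table the slides mention:

```python
M = 5   # number of selling entities (example value)
P = 3   # number of products (example value)

# One (M x (M+1)) Q-table per product: rows are states (the entity the
# buyer currently negotiates with), columns 0..M-1 are the
# 'not-buy-from-this-entity' transitions to entity j, and column M is
# the purchase action from the current entity.
q_tables = [[[0.0] * (M + 1) for _ in range(M)] for _ in range(P)]

BUY = M  # index of the purchase action in every row
```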
Buyer Q-Table (2/3)
- The buyer takes the following information into consideration in order to build the Q-Table:
  - Relevancy factor
  - Price
  - Response time
  - Number of transitions
- The update equation used is:

  Q(s_t, a_t) <- Q(s_t, a_t) + l * [ r + γ * max_{a' ∈ A} Q(s', a') - Q(s_t, a_t) ]

  where l is the learning rate, r is the reward, γ is the future reward discount factor, and s_t and a_t are the state and the action at time t
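The standard Q-learning update above can be written as a one-step function; the default values of l and γ are illustrative, not from the slides:

```python
def q_update(Q, s, a, r, s_next, l=0.1, gamma=0.9):
    """One Q-learning step:
    Q(s,a) <- Q(s,a) + l * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    Q is a table of rows (states) of action values, s/a the current
    state and action, r the reward, s_next the resulting state.
    """
    Q[s][a] += l * (r + gamma * max(Q[s_next]) - Q[s][a])
```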
Buyer Q-Table (3/3)
- Issues concerning the reward:
  - it is decremented by 5% when the buyer deals with entities that do not have the product
  - it is based on:
    - the reward for the relevancy
    - the reward for the price
    - the reward for the response time
    - the reward for the required transitions
  - the greater the relevancy, the greater the reward
  - the smaller the price, the greater the reward
  - the smaller the response time, the greater the reward
  - the smaller the number of transitions, the greater the reward
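One way to combine these components with the stated monotonicity is sketched below; the exact weighting and functional form are assumptions, only the directions and the 5% decrement come from the slides:

```python
def reward(relevance, price, response_time, transitions, has_product=True,
           w_rel=1.0, w_price=1.0, w_time=1.0, w_trans=1.0):
    # Grows with relevance; shrinks with price, response time and
    # number of transitions, as the slides describe.
    r = (w_rel * relevance
         + w_price / (1.0 + price)
         + w_time / (1.0 + response_time)
         + w_trans / (1.0 + transitions))
    if not has_product:
        r *= 0.95  # 5% decrement when the entity does not have the product
    return r
```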
Buyer Purchase Behavior
- The buyer relies on the Q-Table for the purchase action
- There are two phases in its behavior
- First Phase
  - It creates the Q-Table
  - It uses a specific number of episodes in the training phase
- Second Phase
  - It utilizes the Q-Table for its purchases
  - At first, it randomly selects an entity (row) for a specific product
  - Then, it selects the action with the highest reward
  - If the best action is to return to a previously visited entity that is unable to deliver, the purchase is not feasible
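The second phase can be sketched as a greedy walk over one product's Q-table; the function name, the step cap, and the delivery flags are assumptions for illustration:

```python
import random

def purchase(Q, can_deliver, max_steps=50):
    """Second-phase sketch: follow the learned Q-table until a buy action.

    Q is an M x (M+1) table whose last column is the purchase action;
    can_deliver[j] marks entities able to deliver the product. Returns
    the entity bought from, or None if the purchase is not feasible.
    """
    M = len(Q)
    buy = M
    state = random.randrange(M)   # start at a randomly selected entity
    visited = set()
    for _ in range(max_steps):
        action = max(range(M + 1), key=lambda a: Q[state][a])
        if action == buy:
            return state          # buy from the current entity
        if action in visited and not can_deliver[action]:
            return None           # revisiting an entity unable to deliver
        visited.add(state)
        state = action            # 'not-buy' action: move to another entity
    return None
```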
Results (1/4)
- We consider a dynamic market where the number and the characteristics of entities are not static
- In our experiments we take into consideration the following probabilities:
  - 2% that a new product becomes available in an entity
  - 5% that a product is totally new in the market
  - 5% that a product is no longer available in an entity
  - 2% that an entity is totally new in the market
  - 1% that an entity is no longer available for negotiations
- We examine the purchases of 400 products in each experiment
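The market dynamics above can be sampled independently per round, as in this sketch (the event names and the independence assumption are illustrative; the probabilities are those listed):

```python
import random

# Per-round market events with the probabilities used in the experiments.
EVENTS = [
    ("new product available in an entity",   0.02),
    ("totally new product in the market",    0.05),
    ("product removed from an entity",       0.05),
    ("totally new entity in the market",     0.02),
    ("entity leaves negotiations",           0.01),
]

def sample_events(rng=random):
    """Return the events that fire this round, each sampled independently."""
    return [name for name, p in EVENTS if rng.random() < p]
```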
Results (2/4)
Table creation time results:

Entities Number (5 Products each) | First Table creation time (ms) | Average table creation time, except first (ms)
4   | 15     | 0
15  | 125    | 17.86
50  | 1685   | 402.73
100 | 16520  | 3546.44
200 | 208088 | 41846.64

Entities Number (40 Products each) | First Table creation time (ms) | Average table creation time, except first (ms)
6   | 156    | 15.50
15  | 561    | 114.33
30  | 3510   | 453.73
60  | 32667  | 2386.50
100 | 191303 | 8254.22

Products Number (15 Entities) | First Table creation time (ms) | Average table creation time, except first (ms)
5    | 125    | 17.86
40   | 561    | 114.33
80   | 1029   | 210.60
150  | 1731   | 374.57
500  | 6155   | 1319.14
1000 | 14917  | 2940.86
5000 | 193644 | 14914.00
Results (3/4)
Q-Learning reduces the required purchase steps:

Entities Number (5 Products each) | Total moves for 400 Products | Total moves for 400 Products (without Q-learning) | Moves reduction using Q-learning
4   | 653 | 2000  | -67.35%
15  | 716 | 6400  | -88.81%
50  | 714 | 20400 | -96.50%
100 | 732 | 40400 | -98.19%
200 | 768 | 80400 | -99.04%

Entities Number (40 Products each) | Total moves for 400 Products | Total moves for 400 Products (without Q-learning) | Moves reduction using Q-learning
6   | 718 | 2800  | -74.36%
15  | 705 | 6400  | -88.98%
30  | 703 | 12400 | -94.33%
60  | 693 | 24400 | -97.16%
100 | 712 | 40400 | -98.24%
Results (4/4)
- Q-Learning reduces the average price and the average response time as the number of entities increases
- Q-Learning does not affect these basic parameters as the number of products increases
Thank you
http://p-comp.di.uoa.gr