Scalable and transparent parallelization of
multiplayer games
Bogdan Simion, MASc thesis
Department of Electrical and Computer Engineering
Multiplayer games
- Long playing times:
  1. World of Warcraft: "I've been playing this Mage for 3 and a half years now, and I've invested too much time, blood, sweat and tears to quit now."
  2. Halo2: "My longest playing streak was last summer, about 19 hours playing Halo2 on my XBox."
- More than 100k concurrent players
- Game server is the bottleneck
Server scaling
- Game code parallelization is hard:
  - Complex and highly dynamic code
  - Concurrency issues (data races) require conservative synchronization
  - Deadlocks
State-of-the-art
- Parallel programming paradigms:
  - Lock-based (pthreads)
  - Transactional memory
- Previous parallelizations of Quake:
  - Lock-based [Abdelkhalek et al. '04] shows that false sharing is a challenge
Transactional Memory vs. Locks
- Advantages:
  - Simpler programming task
  - Transparently ensures correct execution
  - Shared data access tracking: detects conflicts and aborts conflicting transactions
- Disadvantages:
  - Software (STM) access tracking overheads
  - Never shown to be practical for real applications
Contributions
- Case study of parallelization for games: synthetic version of Quake (SynQuake)
- We compare 2 approaches: lock-based and STM parallelizations
- We showcase the first realistic application where STM outperforms locks
Outline
- Application environment: SynQuake game
  - Data structures, server architecture
- Parallelization issues
  - False sharing
  - Load balancing
- Experimental results
- Conclusions
Environment: SynQuake game
- Simplified version of Quake
- Entities: players, resources (apples), walls
- Emulated quests
SynQuake
- Players: can move and interact (eat, attack, flee, go to quest)
- Apples: food objects, increase life
- Walls: immutable, limit movement
- Contains all the features found in Quake
Server frame
[Diagram: one server frame runs in three stages: (1) receive & process client requests, (2) form & send replies (client updates), (3) admin, on a single thread]
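As a minimal illustration of this frame structure, the loop might look as follows in C; the helper names are ours, not from the SynQuake source, and the bodies are stubs.

/* One server frame, sketched with stub helpers. */
#define MAX_REQUESTS 64

typedef struct { int player_id; int action; } request_t;

/* Stubs standing in for the real network and game logic. */
static int  receive_requests(request_t *out) { (void)out; return 0; }
static void process_request(const request_t *r) { (void)r; }
static void send_replies(void)    { /* form & send client updates */ }
static void run_admin_tasks(void) { /* single-threaded admin work */ }

static void server_frame(void)
{
    request_t reqs[MAX_REQUESTS];

    /* Stage 1: receive & process client requests
       (the stage that later gets parallelized). */
    int n = receive_requests(reqs);
    for (int i = 0; i < n; i++)
        process_request(&reqs[i]);

    /* Stage 2: form & send replies. */
    send_replies();

    /* Stage 3: admin work, kept on a single thread. */
    run_admin_tasks();
}

int main(void)
{
    for (int frame = 0; frame < 3; frame++)  /* the real server loops forever */
        server_frame();
    return 0;
}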
Parallelization: request processing
[Diagram: the same server frame; parallelization is applied in stage 1, receive & process requests]
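One way to parallelize stage 1, sketched below with pthreads, is to partition the frame's requests across worker threads and join them before forming replies; the partitioning and names are illustrative, and synchronization between conflicting requests, which the following slides address, is deliberately omitted.

/* Stage 1 parallelized: split the frame's requests across
 * NTHREADS workers and join before forming replies. */
#include <pthread.h>

#define NTHREADS 4

typedef struct { int player_id; int action; } request_t;

static void process_request(const request_t *r) { (void)r; }

typedef struct {
    request_t *reqs;
    int begin, end;              /* half-open range [begin, end) */
} work_t;

static void *worker(void *arg)
{
    work_t *w = arg;
    for (int i = w->begin; i < w->end; i++)
        process_request(&w->reqs[i]);
    return NULL;
}

static void process_requests_parallel(request_t *reqs, int n)
{
    pthread_t tid[NTHREADS];
    work_t    w[NTHREADS];

    for (int t = 0; t < NTHREADS; t++) {
        w[t] = (work_t){ reqs, t * n / NTHREADS, (t + 1) * n / NTHREADS };
        pthread_create(&tid[t], NULL, worker, &w[t]);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    /* Stage 2 (replies) starts only after all workers have joined. */
}

int main(void)
{
    static request_t reqs[8];    /* dummy frame */
    process_requests_parallel(reqs, 8);
    return 0;
}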
Outline
- Application environment: SynQuake game
- Parallelization issues
  - False sharing
  - Load balancing
- Experimental results
- Conclusions
Parallelization overview
- Synchronization problems
- Synchronization algorithms
- Load balancing issues
- Load balancing policies
Player assignment
[Figure: players P1, P2, P3 on the map, handled by threads T1, T2, T3]
- If players P1, P2, P3 are assigned to distinct threads → synchronization required
- Long-range actions have a higher probability to cause conflicts
False sharing
[Figure: a player's move range and shoot range; the action bounding box with locks vs. the action bounding box with TM]
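To make the contrast concrete: with locks, the server must acquire everything under the action's conservative bounding box (e.g. the whole shoot range) up front, so two boxes that merely overlap are serialized; with TM, only actual overlapping reads and writes conflict. A small sketch with illustrative types and ranges:

/* Conservative action bounding box as used for lock-based
 * synchronization; illustrative types and values. */
#include <stdbool.h>

typedef struct { int x0, y0, x1, y1; } box_t;   /* inclusive AABB */

/* Everything a player at (x, y) might touch within `range`. */
static box_t action_box(int x, int y, int range)
{
    return (box_t){ x - range, y - range, x + range, y + range };
}

/* With locks, overlapping boxes force serialization even if the
 * two actions never touch the same data; with TM, a conflict is
 * raised only on an actual overlapping access. */
static bool boxes_overlap(box_t a, box_t b)
{
    return a.x0 <= b.x1 && b.x0 <= a.x1 &&
           a.y0 <= b.y1 && b.y0 <= a.y1;
}

int main(void)
{
    box_t shooter = action_box(100, 100, 64);  /* long shoot range */
    box_t mover   = action_box(150, 150, 8);   /* short move range */
    return boxes_overlap(shooter, mover) ? 0 : 1;  /* overlap: exit 0 */
}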
Parallelization overview
- Synchronization problems
- Synchronization algorithms
- Load balancing issues
- Load balancing policies
Synchronization algorithm: Locks
- Hold locks on parents as little as possible
- Deadlock-free algorithm (see the sketch below)
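The deck does not spell out how deadlock freedom is obtained; a standard way, assumed in this sketch, is to acquire the overlapped leaf locks in one fixed global order (here, ascending leaf index), so that no two threads can wait on each other in a cycle.

/* Deadlock-free leaf locking via a fixed global acquisition order
 * (a standard technique, assumed here; the thesis may differ). */
#include <pthread.h>
#include <stdlib.h>

#define NLEAVES 4                /* e.g. A1, A2, B1, B2 */

static pthread_mutex_t leaf_lock[NLEAVES] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};

static int cmp_int(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

/* Lock the leaves overlapped by an action, always in ascending
 * index order: two threads can then never hold locks in opposite
 * orders, which rules out deadlock. */
static void lock_leaves(int *leaves, int n)
{
    qsort(leaves, n, sizeof *leaves, cmp_int);
    for (int i = 0; i < n; i++)
        pthread_mutex_lock(&leaf_lock[leaves[i]]);
}

static void unlock_leaves(const int *leaves, int n)
{
    while (n-- > 0)
        pthread_mutex_unlock(&leaf_lock[leaves[n]]);
}

int main(void)
{
    int leaves[] = { 2, 0 };     /* overlapped leaves, in any order */
    lock_leaves(leaves, 2);
    /* ... process the locked leaves ... */
    unlock_leaves(leaves, 2);
    return 0;
}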
Synchronization algorithm: Locks
[Figure: area-of-interest tree with Root, internal nodes A and B, and leaves A1, A2, B1, B2 holding players P1–P6; the leaves overlapped by a player's area of interest are locked, while parent nodes are locked only temporarily]
Synchronization: Locks vs. STM

Locks:
1. Determine overlapping leaves (L)
2. LOCK(L)
3. Process L
4. For each node P in overlapping parents:
   LOCK(P); Process P; UNLOCK(P)
5. UNLOCK(L)

STM:
1. BEGIN_TRANSACTION
2. Determine overlapping leaves (L)
3. Process L
4. For each node P in overlapping parents:
   Process P
5. COMMIT_TRANSACTION

- STM acquires ownership gradually → reduced false sharing
- Consistency ensured transparently by the STM
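In C, the two columns might look roughly as follows. The leaf/parent queries and the game logic are stubs, and GCC's transactional-memory extension (__transaction_atomic, built with -fgnu-tm) stands in for the STM actually used in the thesis.

/* Locks vs. STM for one request, following the slide's pseudocode.
 * Compile with: gcc -fgnu-tm -pthread */
typedef struct { int id; } node_t;

#define MAX_NODES 8

/* Stubs for the real game logic, marked transaction-safe so they
 * may be called inside an atomic transaction. */
__attribute__((transaction_safe))
static void process_node(node_t *n) { (void)n; }
__attribute__((transaction_safe))
static int overlapping_leaves(node_t **out) { (void)out; return 0; }
__attribute__((transaction_safe))
static int overlapping_parents(node_t **out) { (void)out; return 0; }

/* Ordered mutex wrappers, as in the earlier locking sketch. */
static void lock_node(node_t *n)   { (void)n; }
static void unlock_node(node_t *n) { (void)n; }

static void handle_request_locks(void)
{
    node_t *L[MAX_NODES], *P[MAX_NODES];
    int nl = overlapping_leaves(L);                   /* 1. find leaves  */
    for (int i = 0; i < nl; i++) lock_node(L[i]);     /* 2. LOCK(L)      */
    for (int i = 0; i < nl; i++) process_node(L[i]);  /* 3. process L    */
    int np = overlapping_parents(P);                  /* 4. parents held */
    for (int i = 0; i < np; i++) {                    /*    only briefly */
        lock_node(P[i]);
        process_node(P[i]);
        unlock_node(P[i]);
    }
    for (int i = nl; i-- > 0; ) unlock_node(L[i]);    /* 5. UNLOCK(L)    */
}

static void handle_request_stm(void)
{
    node_t *L[MAX_NODES], *P[MAX_NODES];
    __transaction_atomic {                            /* 1. BEGIN        */
        int nl = overlapping_leaves(L);               /* 2. find leaves  */
        for (int i = 0; i < nl; i++) process_node(L[i]);   /* 3.         */
        int np = overlapping_parents(P);
        for (int i = 0; i < np; i++) process_node(P[i]);   /* 4.         */
    }                                                 /* 5. COMMIT       */
}

int main(void)
{
    handle_request_locks();
    handle_request_stm();
    return 0;
}

The structural difference is where ownership is acquired: the lock version must lock every overlapped leaf before processing begins, while the transaction claims data only as it actually touches it.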
Parallelization overview
- Synchronization problems
- Synchronization algorithms
- Load balancing issues
- Load balancing policies
Assign tasks to threads
[Figure: map divided among threads T1–T4; players P1–P4 near the borders, with a move action and a shoot action crossing thread boundaries]
- Cross-border conflicts → synchronization
- Load balancing issues
Load balancing policies
[Figure: three partitionings of the 1024×1024 map (axes 0–1024, ticks every 256) among threads 1–4: (a) round-robin, (b) spread, (c) static locality-aware]
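Reading the panels as assignment formulas, one plausible rendering is sketched below; the 256-pixel region size is inferred from the axis ticks, the spread policy is omitted, and the thesis's exact policies may differ.

/* Illustrative thread-assignment formulas for a 1024x1024 map cut
 * into 4x4 regions of 256x256; inferred from the figure, not taken
 * from the thesis. */
#define REGION   256
#define NTHREADS 4

/* (a) Round-robin: deal regions out to threads 0..3 in row-major
 * order, so adjacent regions land on different threads. */
static int assign_round_robin(int x, int y)
{
    int col = x / REGION, row = y / REGION;
    return (row * 4 + col) % NTHREADS;
}

/* (c) Static locality-aware: one contiguous quadrant per thread,
 * so most interactions stay within a single thread. */
static int assign_locality_aware(int x, int y)
{
    return (y / (2 * REGION)) * 2 + (x / (2 * REGION));
}

int main(void)
{
    int rr = assign_round_robin(300, 100);     /* region row 0, col 1 -> thread 1 */
    int la = assign_locality_aware(300, 100);  /* top-left quadrant   -> thread 0 */
    return (rr == 1 && la == 0) ? 0 : 1;
}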
Locality-aware load balancing
- Dynamically detect player hotspots and adjust workload assignments (sketched below)
- Compromise between load balancing and reducing synchronization
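A minimal sketch of the dynamic idea, entirely illustrative (the thesis's policy is more involved): count players per region each frame, then cut the row-major sequence of regions into contiguous, roughly equal-load chunks, one per thread.

/* Dynamic locality-aware balancing sketch: contiguous chunks keep
 * nearby players on the same thread (less synchronization), while
 * roughly equal chunk loads keep threads balanced. */
#define NREGIONS 16
#define NTHREADS 4

/* players[i]: players currently in region i (row-major order).
 * owner[i]:   thread that will process region i next frame.     */
static void rebalance(const int players[NREGIONS], int owner[NREGIONS])
{
    int total = 0;
    for (int i = 0; i < NREGIONS; i++)
        total += players[i];

    int per_thread = (total + NTHREADS - 1) / NTHREADS;  /* load target */
    int t = 0, acc = 0;
    for (int i = 0; i < NREGIONS; i++) {
        owner[i] = t;
        acc += players[i];
        if (acc >= per_thread && t < NTHREADS - 1) {
            t++;            /* this thread's quota is filled;      */
            acc = 0;        /* start assigning to the next thread  */
        }
    }
}

int main(void)
{
    int players[NREGIONS] = { 40, 1, 1, 1, 1, 1, 1, 1,
                               1, 1, 1, 1, 1, 1, 1, 1 };  /* hotspot in region 0 */
    int owner[NREGIONS];
    rebalance(players, owner);
    return owner[0];   /* the hotspot region stays on thread 0 */
}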
Experimental results
- Test scenarios
- Scaling: with and without physics computation
- The effect of load balancing on scaling
- The influence of locality-awareness
Quest scenarios
[Figure: 1024×1024 map (ticks every 128) with four quest locations: Quest 1, Quest 2, Quest 3, Quest 4]
Conclusions
- First application where STM outperforms locks:
  - Overall performance of STM is better at 4 threads in all scenarios
  - Reduced false sharing through on-the-fly collision detection
- Locality-aware load balancing reduces true sharing, but only for STM