Using CodeAnalyst on Red Hat Enterprise Linux to Understand Performance on AMD Servers

Name: Sanjay Rao, D John Shakshober
Date: May 10, 2007




AMD CodeAnalyst (CA) profiling of various user applications running on RHEL5 GA

System Configurations
● Tyan AMD server: 8 CPUs (4 sockets, dual core), one dual-port QLA2342 Fibre Channel adapter, 28 15k RPM disks on an HP Enterprise Virtual Array 4000, dual-path MPIO

McCalpin Stream Benchmark
● Copy bandwidth: 1 GB per stream, with 1, 2, 4 and 8 streams
● With and without NUMA
● Measure IPC, L2 cache and bus traffic

Oracle OLTP workload
● Random 2 KB I/Os (50% read/50% write), sequential writes to logs, EXT3
● Vary user count and tune the SGA to saturate all 8 CPUs, using EXT3 direct and async I/O
● Metric: number of transactions per minute (tpm)
● Run with and without large pages (hugetlbfs)
● Measure IPC and Translation Lookaside Buffer (TLB) misses


Tyan AMD64 NUMA Memory Layout

[Figure: four sockets (S1 to S4), each with two cores (C0, C1) and its own memory bank; 1 hop to any memory bank. Panels: Interleaved Memory and Non-Interleaved (NUMA), each showing a process on S1C0.]
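To make the difference between the interleaved and non-interleaved layouts concrete, here is a simplified Python model of where a process's pages land. It is a sketch under loud assumptions, not how the kernel implements placement: 4 nodes, a strict round-robin interleave by page index, and a "local" policy that puts every page on the process's own node, loosely modeled on `numactl --interleave` and local allocation.

```python
"""Toy model of page placement on a 4-socket NUMA system (hypothetical sketch).

Assumptions (not from the slides): 4 nodes, interleave = strict round-robin
by page index, local = every page on the node of the running CPU. Real
Linux memory policies have more detail than this.
"""

NUM_NODES = 4  # S1..S4 in the figure, numbered 0..3 here


def place_pages(num_pages: int, policy: str, home_node: int = 0) -> list[int]:
    """Return the node each page lands on under a simplified policy."""
    if policy == "interleave":
        # Round-robin: page i goes to node i mod NUM_NODES.
        return [i % NUM_NODES for i in range(num_pages)]
    if policy == "local":
        # Local allocation: every page sits on the process's own node.
        return [home_node] * num_pages
    raise ValueError(f"unknown policy: {policy}")


def remote_fraction(placement: list[int], home_node: int = 0) -> float:
    """Fraction of pages that live on a node other than the process's own."""
    remote = sum(1 for node in placement if node != home_node)
    return remote / len(placement)


if __name__ == "__main__":
    # Process on S1C0 -> home node 0.
    for policy in ("interleave", "local"):
        pages = place_pages(1024, policy)
        print(policy, remote_fraction(pages))  # interleave 0.75, local 0.0
```

With 4 nodes, interleaving leaves three quarters of the pages on a remote bank for a process on S1C0, while local allocation keeps them all local; under this model, that remote share is one plausible reason a NUMA-aware run can beat an interleaved one on a bandwidth test.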


McCalpin Streams Copy Bandwidth (1, 2, 4, 8 streams)

[Chart: copy rate (MB/s, 0 to 16,000) versus number of streams (1, 2, 4, 8) for Non-NUMA and NUMA runs, with the % difference (0 to 22.5) on a secondary axis.]
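The copy rates in this chart follow STREAM-style accounting. The Python sketch below shows that arithmetic under stated assumptions: 8-byte `double` elements, Copy (`c[i] = a[i]`) counted as one read plus one write per element, and 1 MB = 10^6 bytes. The `pct_difference` formula, taken relative to the non-NUMA run, is an assumption about how the chart's third series was computed, and the sample numbers are invented.

```python
"""STREAM-style Copy rate and NUMA % difference (illustrative sketch).

Assumptions: 8-byte elements, Copy moves 2 * 8 bytes per element
(one read, one write), 1 MB = 10**6 bytes. Example inputs are made up,
NOT measurements from the slides.
"""

BYTES_PER_ELEMENT = 8  # sizeof(double)


def copy_rate_mb_s(n_elements: int, seconds: float) -> float:
    """Copy bandwidth in MB/s: bytes read plus bytes written, over time."""
    bytes_moved = 2 * BYTES_PER_ELEMENT * n_elements
    return bytes_moved / seconds / 1e6


def pct_difference(numa: float, non_numa: float) -> float:
    """Percent gain of the NUMA run relative to the non-NUMA run (assumed definition)."""
    return (numa - non_numa) / non_numa * 100.0


if __name__ == "__main__":
    # Hypothetical: 10 million elements copied in 0.04 s.
    print(round(copy_rate_mb_s(10_000_000, 0.04), 1))  # 4000.0
    print(round(pct_difference(4400.0, 4000.0), 1))    # 10.0
```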


IPC Comparison – McCalpin Streams


Data Access Comparison – McCalpin Streams


Instruction Comparison – McCalpin Streams


L2 Cache Comparison – McCalpin Streams


CA used to monitor CPU and data access stalls with a complex database workload, Oracle 10g

Oracle OLTP workload
● Random 2 KB I/Os (50% read/50% write), sequential writes to logs, EXT3
● Vary user count and tune the SGA to saturate all 8 CPUs, using EXT3 direct and async I/O
● Metric: number of transactions per minute (tpm)
● Run with and without large pages (hugetlbfs)
● Measure IPC and Translation Lookaside Buffer (TLB) misses
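For the runs with large pages, one way to confirm the huge page pool is sized for the SGA is to read the huge page fields from `/proc/meminfo`. The Python sketch below parses `HugePages_Total`, `HugePages_Free` and `Hugepagesize` from meminfo text and computes how many huge pages an SGA needs. The helper names and the sample text are hypothetical; which `HugePages_*` fields appear depends on the kernel version, and on kernels that report `HugePages_Rsvd`, reserved but untouched pages still count as free.

```python
"""Check the huge page pool for a hugetlbfs run (hypothetical sketch).

Parses huge page fields from /proc/meminfo text; only fields that are
present are returned. Hugepagesize is reported in kB. The SAMPLE text
is illustrative, not taken from the slides.
"""


def parse_hugepages(meminfo_text: str) -> dict[str, int]:
    """Return HugePages_* counters and Hugepagesize (kB) found in the text."""
    fields: dict[str, int] = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key.startswith("HugePages_") or key == "Hugepagesize":
            fields[key] = int(rest.split()[0])
    return fields


def pages_needed(sga_bytes: int, hugepage_kb: int) -> int:
    """Huge pages required to back an SGA of the given size, rounded up."""
    page_bytes = hugepage_kb * 1024
    return (sga_bytes + page_bytes - 1) // page_bytes


SAMPLE = """MemTotal:       16384000 kB
HugePages_Total:  1100
HugePages_Free:     76
Hugepagesize:     2048 kB
"""

if __name__ == "__main__":
    # On a Linux host, pass open("/proc/meminfo").read() instead of SAMPLE.
    info = parse_hugepages(SAMPLE)
    print(info["HugePages_Total"] - info["HugePages_Free"])  # 1024 pages in use
    print(pages_needed(2 * 1024**3, info["Hugepagesize"]))   # 1024 for a 2 GB SGA
```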


The Translation Lookaside Buffer (TLB) is a small CPU cache of recently used virtual-to-physical address mappings

TLB misses are extremely expensive on today's very fast, pipelined CPUs

Large-memory applications can incur high TLB miss rates

HugeTLBs permit memory to be managed in very large segments

AMD64

● Standard page: 4 KB
● Default huge page: 2 MB
● 512:1 difference (roughly 500:1)

File system mapping interface, ideal for databases

● E.g., the TLB can fully map a 2 GB Oracle SGA with 1024 huge-page entries (1024 × 2 MB = 2 GB)
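The 2 GB SGA example is a TLB reach calculation: entries multiplied by page size. The Python sketch below reproduces it with the slide's page sizes and its 1024-entry example. Real AMD64 TLBs are split into levels and by page size, so treat these as back-of-the-envelope numbers rather than a model of a specific CPU.

```python
"""TLB reach: how much memory a fixed number of TLB entries can map.

Uses the slide's page sizes (4 KB standard, 2 MB huge) and its 1024-entry
example; real TLBs have multiple levels and per-page-size entry counts.
"""

KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes covered when every TLB entry maps one page of page_size bytes."""
    return entries * page_size


if __name__ == "__main__":
    print(tlb_reach(1024, 4 * KB) // MB)  # 4   (4 KB pages: 4 MB of reach)
    print(tlb_reach(1024, 2 * MB) // GB)  # 2   (2 MB pages: 2 GB of reach)
    print((2 * MB) // (4 * KB))           # 512 (page-size ratio)
```

With 4 KB pages the same 1024 entries cover only 4 MB, so a 2 GB SGA would need 524,288 mappings and would miss in the TLB far more often.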

[Figure: HugeTLBFS mapping from the virtual address space through the TLB (128 data, 128 instruction entries) to physical memory.]


Oracle 10g OLTP Performance (tpm k): 4 KB vs 2 MB huge pages

[Chart: RHEL5 vs RHEL5 with huge pages for transactions/min, DTLB accesses, IC misses and L2 misses (left axis 0 to 400,000), with the % difference on a secondary axis (-17.5 to 7.5).]


Data Access – DTLB Assessment Comparison – Oracle Workload


Instruction Cycle Comparison – Oracle Workload


L2 Cache Comparison – Oracle Workload


IPC Comparison – Oracle Workload


RHEL and AMD CodeAnalyst with OProfile

● Runs with the standard RHEL oprofile (install sysstat)
● Download the CA RPM from the AMD Developer page
● The GUI allows easy data collection of:
  ● Cycles and retired instructions profile (IPC calculation)
  ● Cache accesses (both I and D)
  ● Memory subsystem performance
  ● numactl at the OS level, L2 references
  ● Translation buffer (TLB) analysis
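The IPC calculation from the cycles and retired-instructions profile can be sketched as below. A sampling profiler such as oprofile records one sample every N events (the configured count), so the estimated event total is samples × interval, and IPC is the ratio of the two totals. The `EventSamples` class, the per-1000-instruction normalization (useful for DTLB or L2 misses) and all sample counts are hypothetical illustrations, not CodeAnalyst output.

```python
"""Derived metrics from sampled event counts (illustrative sketch).

Assumption: each sample represents `interval` occurrences of its event,
so the estimated total is samples * interval. Counts are hypothetical.
"""

from dataclasses import dataclass


@dataclass
class EventSamples:
    samples: int   # number of samples recorded for this event
    interval: int  # events per sample (the configured sample count)

    @property
    def total(self) -> int:
        """Estimated total number of events."""
        return self.samples * self.interval


def ipc(retired_inst: EventSamples, cycles: EventSamples) -> float:
    """Instructions per cycle = retired instructions / CPU cycles."""
    return retired_inst.total / cycles.total


def per_1000_instructions(events: EventSamples, retired_inst: EventSamples) -> float:
    """Event rate normalized per 1000 retired instructions (e.g. DTLB or L2 misses)."""
    return 1000.0 * events.total / retired_inst.total


if __name__ == "__main__":
    inst = EventSamples(samples=40_000, interval=100_000)
    cyc = EventSamples(samples=50_000, interval=100_000)
    dtlb = EventSamples(samples=2_000, interval=10_000)
    print(ipc(inst, cyc))                     # 0.8
    print(per_1000_instructions(dtlb, inst))  # 5.0
```

Multiplying by each event's own interval matters when events are sampled at different rates; comparing raw sample counts across events with different intervals would skew the ratio.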