Jeremy Main, Senior Solutions Architect, GRID. GRID Technical Session: vGPU Top 10 PoC Survival Tips

1102: GRID Technical Session 3: vGPU Top 10 PoC Survival Tips

Page 1: 1102: GRID Technical Session 3: vGPU Top 10 PoC Survival Tips

Jeremy Main, Senior Solutions Architect, GRID

GRID Technical Session: vGPU Top 10 PoC Survival Tips

Page 2

Most Common Mistakes During POCs

Not defining PoC success criteria with stakeholders

Define measurable metrics

Use actual applications and data

Don’t use GPU-centric benchmarks to simulate multiple users
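
As a rough illustration of what "measurable metrics" could look like once agreed with stakeholders, the sketch below records each criterion with a target and checks measured values against it. The criterion names and target values are hypothetical and are not taken from the session; real ones would come from the actual applications and data used in the PoC.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str               # what is measured
    target: float           # threshold agreed with stakeholders
    higher_is_better: bool  # direction in which the metric improves

    def passed(self, measured: float) -> bool:
        """Return True if the measured value meets the agreed target."""
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


# Hypothetical criteria and targets, for illustration only.
CRITERIA = [
    Criterion("model_open_seconds", 30.0, higher_is_better=False),
    Criterion("viewport_fps", 25.0, higher_is_better=True),
]


def evaluate(measurements: dict) -> dict:
    """Map each criterion name to a pass/fail result."""
    return {c.name: c.passed(measurements[c.name]) for c in CRITERIA}


if __name__ == "__main__":
    print(evaluate({"model_open_seconds": 24.5, "viewport_fps": 22.0}))
```

Writing the criteria down as data keeps the pass/fail decision explicit and repeatable across test runs.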

Page 3

Most Common Mistakes During POCs

Attempting to add the PoC to existing IT infrastructure

Use an isolated and controlled environment

Retain PoC environment for tuning and troubleshooting after deployment

Set up a gateway for license server access if required

Page 4

Most Common Mistakes During POCs

Not understanding application resource requirements

During typical user workloads, what is the performance-limiting factor?

Is the application CPU- or memory-bound?

Is it GPU frame buffer- or rendering-bound?

Use Perfmon on existing workstations : “NVIDIA_GPU” counters
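
One way to turn such counter logs into an answer is sketched below: average the utilization samples per resource and report the one that is saturated. This is only a sketch under stated assumptions. The resource names, the sample values, and the 90% saturation threshold are hypothetical; the samples would in practice be exported from Perfmon on an existing workstation.

```python
from statistics import mean


def limiting_factor(samples, threshold=90.0):
    """Return the most heavily used resource whose average utilization
    is at or above the threshold, or None if nothing is saturated.

    samples maps a resource name to a list of utilization percentages.
    The 90% default threshold is an assumption, not a published rule.
    """
    averages = {name: mean(values) for name, values in samples.items()}
    saturated = {name: avg for name, avg in averages.items() if avg >= threshold}
    if not saturated:
        return None
    return max(saturated, key=saturated.get)


if __name__ == "__main__":
    # Hypothetical samples collected during a typical user workload.
    workload = {
        "cpu": [95, 98, 97],
        "memory": [60, 62, 61],
        "gpu": [40, 45, 42],
        "frame_buffer": [70, 72, 71],
    }
    print(limiting_factor(workload))
```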

Page 5

Most Common Mistakes During POCs

Not using all available sources of information

NVIDIA deployment guides, application sizing guides

Citrix and VMware reviewer guides and best practices

Page 6

Most Common Mistakes During POCs

Attempting to use non-GRID-certified servers

There are many versions of GRID / Tesla cards

Not every card works in every server

Page 7

NVIDIA GRID™ Certified Platforms

UCS C240 M3, M4

UCS C460 M4

PowerEdge R720, R730, T620, T630

PowerEdge C4130, VRTX

PowerEdge C8220X GPU Sled

Precision R7610, Rack 9710

Celsius C620, R940, M740

Primergy CX400M1, RX2540M1, RX350S8, TX300S8

ProLiant WS460c Gen8

ProLiant DL380p Gen8 and Gen9, DL580 Gen8

ProLiant SL250s Gen8, SL270s Gen8 SE

iDataPlex dx360 M4

NeXtScale nx360 M4, M5

Flex System

ThinkStation D30

System x3650M4/M5, x3850X6, x3950X6

For more information on GRID-enabled servers, visit www.nvidia.com/buygrid

Page 8

Most Common Mistakes During POCs

Optimal CPUs for the workload are not used

Most CAD applications are largely single-threaded

Focus on higher CPU frequency, not number of cores

Page 9

Most Common Mistakes During POCs

BIOS power profile is set incorrectly

Set power profile to “Maximum Performance”

Ensure CPUs can reach their highest clock speeds

Page 10

Most Common Mistakes During POCs

Servers don’t have enough memory

Memory overcommit does not work with vGPU

4GB : Power User, Entry Level Engineering

8GB : Mid-range Engineering, Video

16GB : Advanced Engineering

32GB : CAD/CAM

64GB : Digital Mock Up
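
Because memory cannot be overcommitted, host RAM puts a hard ceiling on the number of VMs per server. The arithmetic can be sketched as below. The per-VM sizes are the ones listed above; the 256 GB host and the 8 GB hypervisor reserve are assumptions chosen only for illustration.

```python
# Per-VM memory sizes from the list above, in GB.
PROFILE_RAM_GB = {
    "Power User, Entry Level Engineering": 4,
    "Mid-range Engineering, Video": 8,
    "Advanced Engineering": 16,
    "CAD/CAM": 32,
    "Digital Mock Up": 64,
}


def max_vms_by_memory(host_ram_gb, vm_ram_gb, hypervisor_reserve_gb=8):
    """Number of VMs that fit when every VM's memory is fully backed.

    hypervisor_reserve_gb is an assumed allowance for the hypervisor
    itself, not a vendor-published figure.
    """
    if vm_ram_gb <= 0:
        raise ValueError("vm_ram_gb must be positive")
    usable_gb = host_ram_gb - hypervisor_reserve_gb
    if usable_gb < 0:
        raise ValueError("reserve exceeds host memory")
    return usable_gb // vm_ram_gb


if __name__ == "__main__":
    host_ram_gb = 256  # hypothetical host
    for profile, ram_gb in PROFILE_RAM_GB.items():
        print(profile, max_vms_by_memory(host_ram_gb, ram_gb))
```

Memory is only one ceiling; GPU frame buffer and CPU capacity limit density independently, so the smallest of those results applies.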

Page 11

Most Common Mistakes During POCs

Insufficient storage IOPS

Workstation-class users expect…

SSD-level performance, since they already use SSDs in their local workstations

Page 12

Most Common Mistakes During POCs

Inadequate network environment for VDI

Don’t use a legacy network adapter type in the VM : prefer VMXNET3

Confirm network’s ability to deliver enough bandwidth

“iperf” may be used to simulate single and parallel TCP/UDP network streams to confirm that sufficient bandwidth is available
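
A minimal sketch of how such iperf client runs might be assembled is shown below. It only builds the argument list, using the standard iperf client options `-c` (server), `-t` (duration in seconds), `-P` (parallel streams), `-u` (UDP) and `-b` (UDP target bandwidth). The server name, stream count, duration, and bandwidth values are placeholders, and a host running `iperf -s` is assumed to exist at the other end.

```python
def iperf_client_args(server, streams=1, seconds=30, udp=False, udp_bandwidth="100M"):
    """Build an iperf client command line as an argument list.

    All default values are illustrative assumptions, not recommendations.
    """
    args = ["iperf", "-c", server, "-t", str(seconds)]
    if streams > 1:
        # -P runs several client streams in parallel.
        args += ["-P", str(streams)]
    if udp:
        # -u switches to UDP; -b sets the target bandwidth for UDP.
        args += ["-u", "-b", udp_bandwidth]
    return args


if __name__ == "__main__":
    # "iperf-server.example" is a hypothetical host name.
    print(" ".join(iperf_client_args("iperf-server.example")))
    print(" ".join(iperf_client_args("iperf-server.example", streams=4)))
    print(" ".join(iperf_client_args("iperf-server.example", udp=True)))
```

The returned list can be passed to `subprocess.run` from a client on the same network segment as the VDI endpoints.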

Page 13

Most Common Mistakes During POCs

Not enough vCPUs assigned to a VM

Assign at least 4 vCPUs to a vGPU-enabled VM

Two vCPUs for application

One vCPU for OS and system calls

One vCPU for remoting protocol compression
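
The vCPU breakdown above can be expressed as a small check that flags undersized VMs in an inventory. This is a sketch only: the VM names and vCPU counts are hypothetical, and in practice the inventory would come from the hypervisor's management tools.

```python
# vCPU budget per vGPU-enabled VM, as broken down above.
VCPU_BUDGET = {
    "application": 2,
    "os_and_system_calls": 1,
    "remoting_protocol_compression": 1,
}
MIN_VCPUS = sum(VCPU_BUDGET.values())


def undersized_vms(inventory):
    """Return the names of VMs with fewer vCPUs than the minimum.

    inventory maps a VM name to its assigned vCPU count.
    """
    return sorted(name for name, vcpus in inventory.items() if vcpus < MIN_VCPUS)


if __name__ == "__main__":
    # Hypothetical inventory for illustration.
    print(undersized_vms({"cad-01": 4, "cad-02": 2, "cad-03": 8}))
```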

Page 14

Most Common Mistakes During POCs

Not optimizing the virtual machine base image

Eliminate OS-level performance inhibitors

Citrix : “TargetOSOptimizer” tool

VMware : “VMwareOSOptimizationTool”

Page 15

Resources on www.nvidia.com/grid

White papers

Application guides

Deployment guides

Success stories

GRID 2.0 Datasheet and FAQ

Videos

Blogs