Copyright © 2007, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Best Practices for Setting Up Computer Hardware in a Grid Environment
Tom Keefer, Performance Analyst, SAS
Cheryl Doninger, R&D Director, SAS
Recipe for Success
review different grid architectures
• different OSs, network connectivity, storage solutions
show scalable throughput and sustained I/O as the number of grid nodes increases
create reference architectures of successful grid configurations to help answer your questions
SAS Grid Computing lots of SAS users
What is Grid Computing?
“Grid computing integrates, virtualizes, and manages resources (software and hardware) to provide a much larger, powerful distributed computing infrastructure.”
Benefits of SAS on a Grid
increases scalability
increases availability
facilitates provisioning
increases flexibility
reduces costs
= Virtual Data Center
Running SAS on a Grid
SAS Grid Manager
Distributed Enterprise Scheduling
• Distribute jobs within workflows to a range of hosts.
• Automatically find and use the best available resource for each job.
Workload Balancing
• Distribute workloads to a shared pool of resources.
• Automatically find and use the best available resource.
Parallelized Workload Balancing
• Distribute parallelized SAS workloads to a shared pool of resources.
• Automatically find and use the best available resource.
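As a rough illustration of "find and use the best available resource", the sketch below sends each job to whichever host has the most free job slots at that moment. It is a toy model, not Platform LSF's actual scheduling policy; the host names, slot counts, and the `dispatch` helper are all hypothetical.

```python
def dispatch(jobs, free_slots):
    """Assign each job to the host with the most free slots at that moment.

    jobs: list of job names (hypothetical).
    free_slots: dict mapping host name -> number of free job slots.
    Returns a dict mapping job -> host; a job that cannot be placed maps to None.
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is unchanged
    placement = {}
    for job in jobs:
        # Most free slots wins; ties go to the host name that sorts first.
        host = min(slots, key=lambda h: (-slots[h], h)) if slots else None
        if host is None or slots[host] <= 0:
            placement[job] = None  # grid is full; a real scheduler would queue the job
            continue
        slots[host] -= 1
        placement[job] = host
    return placement


if __name__ == "__main__":
    print(dispatch(["etl1", "etl2", "etl3"], {"node1": 2, "node2": 1}))
```

A production scheduler would also weigh CPU load, memory, and queue priorities; the sketch only shows the basic "pick the least busy host" idea.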
What products can leverage SAS Grid Manager?
SAS Grid Manager
Distributed Enterprise Scheduling
• SAS Data Integration Studio
• SAS Web Report Studio
• SAS Marketing Automation
• SAS Marketing Optimization
• Any SAS program
Workload Balancing
• Any SAS program (with wrapper), including stored processes and SAS Enterprise Guide programs
Parallelized Workload Balancing
• SAS Data Integration Studio
• SAS Enterprise Miner
• SAS Risk Dimensions
• Any SAS program (with modification)
SAS Grid Architecture Topology
[Diagram: a grid client (DIS or EM, or a SAS program), a metadata server, a management console with the Grid Manager plug-in, a grid control machine, and grid nodes 1 through n. Component labels: Base SAS, SAS/Connect, SAS Workspace Server, SAS Grid Server, SAS Data Step Batch Server, SASApp, Platform LSF, Platform Process Mgr, Platform Grid Management Service.]
Central File Server for:
• Job Deployment Directories
• Source and Target Data
• SAS Log files
Keys To Success – Areas To Focus
node configuration
• heterogeneous or homogeneous
number and type of processors
memory
storage/data access
no different than single server - just more systems.
Data Storage is The Key
sharable
throughput across the grid
scalable
locality of data
• input files
• output files
• temporary files
• external data access
Shared File System Testing Efforts
Operating System | File Sharing Technology
Red Hat Linux (RHEL 4) | EMC Celerra Multi-Path File System on iSCSI (MPFSi)
Red Hat Linux (RHEL 4) | Network Appliance (NFS)
Sun Solaris 10 | Sun StorageTek QFS
Red Hat Linux (RHEL 4)* | Global File System (GFS)
Windows* | Polyserve / HP Matrix
AIX* | IBM General Parallel File System (GPFS)
HP-UX* | Veritas Clustered File System (CFS)
*Efforts ongoing
Steps to Success With Grid
determine your system requirements
• what does your application do?
• data flow diagram
architect your system
test throughput outside of SAS first
• third party tools
• replicate your application's behavior (I/O pattern)
run single node SAS tests, then scale out
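One way to test throughput outside of SAS is a small script that mimics a sequential I/O pattern on the shared file system. The sketch below is a minimal example, not a substitute for a dedicated third party tool; the function name, file path, and default sizes are hypothetical, and the block size should be set to match your application's I/O pattern.

```python
import os
import time


def write_read_throughput(path, total_mb=64, block_kb=64):
    """Sequentially write then read a file; return (write_mb_s, read_mb_s)."""
    block = b"\0" * (block_kb * 1024)
    n_blocks = (total_mb * 1024) // block_kb
    actual_mb = n_blocks * block_kb / 1024  # size really written, in MB

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push data to storage so the timing is not just cache
    write_s = time.perf_counter() - start

    # Note: this read may be served from the OS page cache; use a file larger
    # than node memory (or read from another node) for a realistic number.
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(block_kb * 1024):
            pass
    read_s = time.perf_counter() - start

    os.remove(path)
    return actual_mb / write_s, actual_mb / read_s


if __name__ == "__main__":
    w, r = write_read_throughput("throughput_test.bin")  # hypothetical path
    print(f"write {w:.1f} MB/s, read {r:.1f} MB/s")
```

Run it first on a single node, then on several nodes at once against the same shared file system, to see whether aggregate throughput holds up as the node count grows.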
EMC MPFSi Architecture
[Diagram labels: Switch, “The Directory”, EMC Storage, Conversion, IP Traffic, Fiber Channel; shared file systems /data and /work]
Notes:
• NAS
• MPFSi client on nodes
• network “managers”
• leverage existing network
EMC MPFSi Discussion Points
based on previous “Highroad” product
SAS data integration benchmarking scenario
40 Linux grid nodes
• dual core, dual Ethernet per node for data
• up to 160 simultaneous SAS processes
performance tips:
• analyze throughput from node to storage – data flow!!
• watch placement of disk volumes for performance
• don’t allow non-grid activity on network
• separate client and admin network
• monitor director and data mover throughput
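The benchmark figures above imply a per-node and per-core load that is worth working out before sizing storage. The short sketch below does that arithmetic using only the numbers from the slide; the `load_per_node` helper is a hypothetical name.

```python
def load_per_node(nodes, cores_per_node, processes):
    """Return (processes per node, processes per core) at peak load."""
    per_node = processes / nodes
    return per_node, per_node / cores_per_node


# Figures from the benchmark: 40 dual-core nodes, up to 160 SAS processes.
per_node, per_core = load_per_node(nodes=40, cores_per_node=2, processes=160)
print(per_node, per_core)  # 4.0 2.0
```

At peak, each node's two Ethernet data links are therefore shared by four simultaneous SAS processes.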
Network Appliance NFS Architecture
[Diagram labels: Linux Nodes, Network Switch, NetApp FAS6030 (network storage); shared file systems /data and /work; ALL Ethernet]
Notes:
• NAS
• NFS client on nodes
• leverage existing network
• NFS everywhere
NetApp NFS Discussion Points
pure network file system implementation (NFS)
SAS data integration benchmarking scenario
10 Linux grid nodes
• quad core*, single Ethernet per node for data
performance tips:
• check throughput from node to storage – data flow!!!
• don’t allow non-grid activity on network
• separate client and admin network
• watch placement of disk volumes for performance
* important note: core to throughput per node ratio
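The core to throughput ratio can be estimated by dividing a node's network bandwidth by its core count. The sketch below compares the two Linux node layouts described in these slides. The 1 Gbit/s link speed is an assumption for illustration only (the slides do not state it), the result is a theoretical ceiling with no protocol overhead, and `mb_per_sec_per_core` is a hypothetical helper.

```python
def mb_per_sec_per_core(links, link_gbit, cores):
    """Theoretical network bandwidth per core in MB/s (decimal units, no overhead)."""
    total_mb_s = links * link_gbit * 1000 / 8  # Gbit/s -> MB/s across all links
    return total_mb_s / cores


# ASSUMPTION: 1 Gbit/s Ethernet links; substitute your own link speed.
quad_core_single_link = mb_per_sec_per_core(links=1, link_gbit=1.0, cores=4)
dual_core_dual_link = mb_per_sec_per_core(links=2, link_gbit=1.0, cores=2)
print(quad_core_single_link, dual_core_dual_link)  # 31.25 125.0
```

Under that assumption, a quad-core node with a single data link has a quarter of the per-core bandwidth of a dual-core node with two links, which is why adding cores without adding I/O paths can leave SAS processes waiting on data.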
Sun QFS Architecture
[Diagram labels: server nodes, FC Switch, Sun storage, fibre channel links; shared file systems /data and /work]
Notes:
• SAN
• QFS software on nodes
• QFS server “master”
• fibre channel – node to disk
Sun QFS Discussion Points
pure fibre channel (SAN)
SAS data integration benchmarking scenario
up to 4 Solaris server nodes
• 48 to 64 core grid nodes (144 total on grid)
• up to 180 simultaneous SAS processes
• up to 20 fibre channel connections per server
performance tips:
• check throughput from node to storage – data flow!!!
• watch placement of disk volumes for performance
• setup of QFS master server
Other Shared File System Technologies
SAN based – fibre channel
• Multi-Path File System (MPFS), NOT iSCSI
• IBM General Parallel File System (GPFS)
• Polyserve / HP Matrix
− only one available for Windows!!
• Linux Global File System (GFS)
• Veritas Clustered File System (CFS)
NAS - Ethernet
• NetApp with iSCSI
SAS is continuing its testing efforts with various partners.
Overall Best Practices for Shared File Systems
data flow diagram
• understand your application's throughput requirements before you talk to a storage vendor
monitoring and management tools are a must!
test throughput OUTSIDE of SAS first!
some technologies have volume placement limitations!
• e.g., can you span all the arrays with a single volume?
analyze throughput per $ before you buy
consider availability, backups, and future scalability
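Two of the points above, knowing your throughput requirement and analyzing throughput per $, can be combined into a simple screening step. The sketch below is illustrative only: every number in it (per-process MB/s, option names, sustained rates, prices) is hypothetical and not taken from any benchmark, and `rank_options` is a made-up helper.

```python
def rank_options(options, required_mb_s):
    """Keep options that meet the requirement, best MB/s per dollar first.

    options: list of (name, sustained_mb_s, price) tuples.
    """
    viable = [o for o in options if o[1] >= required_mb_s]
    return sorted(viable, key=lambda o: o[1] / o[2], reverse=True)


# HYPOTHETICAL requirement: 160 processes at 15 MB/s each = 2400 MB/s.
required = 160 * 15
# HYPOTHETICAL storage options: (name, measured sustained MB/s, price in $).
options = [("A", 3000, 600_000), ("B", 2500, 400_000), ("C", 2000, 250_000)]
print([name for name, _, _ in rank_options(options, required)])  # ['B', 'A']
```

Option C is dropped because it cannot sustain the required rate, however cheap it is; the sustained figures should come from your own tests run outside of SAS.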
SAS Scalable Performance Data Server on a Grid
[Diagram: server / grid nodes attached to shared file systems on SAN or NAS; SPDS directories /spds/data1, /spds/data2, /spds/meta, /spds/index]
each server / grid node runs its own instance of SAS and SPDS Server
bottom line: myspdslib.mysastable is available on any server!
SAS Really Scales in a Grid
scalable I/O throughput
lots of choices for OS, storage solution, etc.
our work will continue...
More to See and Do...
“A Throughput-Intensive Compute and Storage Grid Using SAS® Grid Manager”
• Somantak Chanda, American Express
• Tues 1:30-2:20, Northern Hemisphere E-2
SAS Grid – demo booth #16
IT Intelligence for Grid Optimization – demo booth #53
Platform Computing – Alliance Café booth #87
various storage partners – Alliance Café
For More Information...
scalability website:
http://support.sas.com/rnd/scalability/grid
today’s presentation
http://support.sas.com/rnd/scalability/grid/gridpapers.html