
To Compress or not to Compress?

Chuck Hopf

What is your precious?

• Gollum says every data center has something that is precious or hard to come by:
– CPU Time
– DASD Space
– Run Time
– IO
– Memory

Lots of talk

• On the LISTSERV – does compression use more CPU? Does it save DASD space?

• On the LISTSERV – what is the best BUFNO= to use with MXG?

Testing the theories

• Built two tests:
– COMPRESS=NO, varying BUFNO from 2, 10, 15, 20
– COMPRESS=YES, again varying BUFNO

An Epiphany!

• What if you run with COMPRESS=NO, send the output PDB to a temporary dataset, and then at the end turn on COMPRESS=YES and do a PROC COPY INDD=PDB OUTDD=PERMPDB NOCLONE;? That would eliminate all of the compression during the reading and writing of the interim datasets but still create a compressed PDB.
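The idea above can be sketched in SAS as follows (a minimal sketch; the PERMPDB libref and its dataset name are illustrative, not from the deck):

```
/* Build the interim PDB uncompressed, so no compress/decompress
   work is done on every read and write of the working datasets */
OPTIONS COMPRESS=NO;

/* ... BUILDPDB processing writes datasets to the PDB libref ... */

/* At the end, copy the finished PDB into a compressed permanent
   library.  NOCLONE lets the output library's COMPRESS=YES take
   effect instead of cloning the input datasets' (uncompressed)
   attributes. */
LIBNAME PERMPDB 'YOUR.PERMANENT.PDB' COMPRESS=YES;
PROC COPY IN=PDB OUT=PERMPDB NOCLONE;
RUN;
```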

So there are now 3 Tests!

• TEST=NO – COMPRESS=NO

• TEST=NO/YES - COMPRESS=NO but final PDB is compressed

• TEST=YES – COMPRESS=YES

Result charts (graphs omitted from this text extraction): CPU Time, Elapsed Time, Low Memory, High Memory, EXCP DASD, DASD IO Time, DASD Space, DASD Space by DDNAME.

Conclusions?

• Running with COMPRESS=NO and then copying to a compressed PDB optimizes permanent DASD space and uses very little additional CPU.

• Even better, use the LIBNAME option to turn it on only where you want it:
– LIBNAME PDB COMPRESS=YES; /* zOS only */

• Memory requirements increase with BUFNO, but not dramatically, and BUFNO greater than 10 shows very little additional benefit.
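The conclusions above combine into a single options setup (a sketch; the library dataset name is illustrative, not from the deck):

```
/* Cheap CPU during the build: no compression on interim datasets,
   BUFNO=10 for run time (higher values buy little extra) */
OPTIONS COMPRESS=NO BUFNO=10;

/* Compress only where it pays: the permanent PDB library */
LIBNAME PDB 'YOUR.PDB.LIBRARY' COMPRESS=YES;
```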

Caveats!

• BLKSIZE matters. SAS procs sometimes build WORK datasets with a BLKSIZE of 6160, which radically inflates the IO counts. Use the recommended BLKSIZE(DASD)=OPT option and leave the DCB attributes off of SAS datasets.

• REGION may have to be increased – use REGION=0M and be sure you are using the MXG defaults for MEMSIZE.

• This all applies to zOS, not to ASCII platforms.

So What About ASCII?

• Using the same data, tests were run with SAS 9.2 on a Win 7 system

• 1.5GB memory

• Dell 4600 – P4 2.7GHz

ASCII Results

Test          BUFNO    Elapsed  CPU      Memory
COMPRESS=NO   DEFAULT  06:45.4  03:25.0  95713K
COMPRESS=YES  DEFAULT  06:12.5  02:51.6  95721K
COMPRESS=NO   16K      07:35.1  03:56.8  275537K
COMPRESS=YES  16K      05:57.8  02:49.1  179769K
COMPRESS=NO   32K      07:39.2  02:58.0  275537K
COMPRESS=YES  32K      06:05.4  02:51.0  179679K
COMPRESS=NO   40K      08:28.6  04:17.0  275537K
COMPRESS=YES  40K      06:20.4  02:59.1  179769K
COMPRESS=NO   80K      07:44.1  04:02.8  275537K
COMPRESS=YES  80K      05:59.2  02:54.5  179769K
COMPRESS=NO   16M      07:42.1  04:01.1  275537K
COMPRESS=YES  16M      06:09.2  02:53.0  179769K
COMPRESS=NO   32M      07:43.4  03:54.9  275537K
COMPRESS=YES  32M      05:57.4  02:51.1  179769K
COMPRESS=NO   64M      08:02.7  03:58.7  275537K
COMPRESS=YES  64M      06:37.8  02:55.5  179769K
COMPRESS=NO   128M     08:14.2  03:55.0  275537K
COMPRESS=YES  128M     06:30.0  02:58.0  179769K
COMPRESS=NO   10       07:11.5  03:16.1  96259K
COMPRESS=YES  10       05:56.2  02:37.1  96649K
COMPRESS=NO   40       07:17.5  03:20.9  97603K
COMPRESS=YES  40       06:00.1  02:41.1  98892K
COMPRESS=NO   80       07:13.0  03:24.1  99529K
COMPRESS=YES  80       05:57.6  02:36.2  102095K
COMPRESS=NO   160      07:16.1  03:24.0  103379K
COMPRESS=YES  160      05:44.6  02:26.5  108825K

Wow!

• COMPRESS=YES outperforms COMPRESS=NO!

• BUFNO makes some difference but not a lot, and BUFNO=10 looks to be optimal
– Difference is in seconds, not minutes
– But… there is something we don’t understand in the memory numbers

• Runs faster under Win 7 than under zOS
– But does not include download time

So What Should You Do?

• It depends on what your ‘precious’ is
– Running zOS

• Optimal for CPU and DASD is COMPRESS=NO with a copy to a compressed dataset at the end, or with COMPRESS=YES set via the LIBNAME option

• Optimal for CPU is COMPRESS=NO

• Optimal for DASD is COMPRESS=YES

• BUFNO=10 is optimal for run time

– Running ASCII
• Optimal for CPU and DASD is COMPRESS=YES

JCL

//* SAMPLE JCL TO RUN BUILDPDB WITH COMPRESS=NO AND COMPRESS AT
//* THE END USING PROC COPY
//S1       EXEC MXGSASV9
//PDB      DD DSN=MXG.PDB(+1),SPACE=(CYL,(500,500)),
//            DISP=(,CATLG,DELETE)
//SPININ   DD DSN=MXG.SPIN(0),SPACE=(CYL,(500,500)),
//            DISP=(,CATLG,DELETE)
//SPIN     DD DSN=MXG.SPIN(+1),DISP=OLD
//CICSTRAN DD DSN=MXG.CICSTRAN(+1),SPACE=(CYL,(500,500)),
//            DISP=(,CATLG,DELETE)
//DB2ACCT  DD DSN=MXG.DB2ACCT(+1),SPACE=(CYL,(500,500)),
//            DISP=(,CATLG,DELETE)
//SMF      DD DSN=YOUR.SMF.DATA,DISP=SHR
//SYSIN    DD *
  OPTIONS COMPRESS=NO BUFNO=10;
  LIBNAME PDB COMPRESS=YES;
  LIBNAME SPIN COMPRESS=YES;
  %LET SPININ=SPININ;
  %UTILBLDP(
    MACKEEPX=
      MACRO _LDB2ACC DB2ACCT.DB2ACCT %
      MACRO _KDB2ACC COMPRESS=YES %
      MACRO _KCICTRN COMPRESS=YES %
    ,
    SPINCNT=7,
    SPINUOW=2,
    OUTFILE=INSTREAM);
  %INCLUDE INSTREAM;

JCL is in the 27.10 SOURCLIB as JCLCMPDB

Why UTILBLDP?

• Allows you to add data sources to BUILDPDB without having to edit the macros in the SOURCLIB.

• Allows you to suppress data sources like 110, DB2, and TYPE74 and process them in other jobs, again without editing the macros.

• Flexibility

Example

OPTIONS COMPRESS=NO BUFNO=10;
LIBNAME PDB COMPRESS=YES;
LIBNAME SPIN COMPRESS=YES;
%LET SPININ=SPININ;
%UTILBLDP(
  USERADD=42,
  SUPPRESS=110 DB2,
  SPINCNT=7,
  OUTFILE=INSTREAM);
%INCLUDE INSTREAM;
RUN;

MXG User Experience

• Running MXG with WPS instead of SAS

• Data from multiple platforms

• Processed under two Virtual products

• Also, Comparison of SAS/PC and WPS on zLinux

PC/SAS VMWARE/Windows versus PC/SAS Hyper-V/Windows (four platforms’ data, three installation “groups”: PROD/QA/DEV)

Data From        VMWARE(PROD)  Hyper-V(PROD)
Unix             00:05:30      00:10:56
zOS              00:01:30      00:04:54
zVM/Linux        00:03:07      00:08:08
Windows Servers  02:43:08      09:32:57

Data From        VMWARE(QA)    Hyper-V(QA)
Unix             00:00:31      00:04:18
zOS              00:01:27      00:02:46
zVM/Linux        00:01:02      00:07:06
Windows Servers  00:41:24      02:34:19

Data From        VMWARE(DEV)   Hyper-V(DEV)
Unix             00:00:43      00:02:42
zOS              00:00:21      00:01:42
zVM/Linux        00:01:08      00:03:34
Windows Servers  00:09:06      00:38:47

Processing of performance data collected from Unix, zVM/Linux, zOS and Windows.

PC/SAS versus LNX/WPS

• PC/SAS VMWARE/Windows versus WPS zVM/Linux
• PC/SAS VMWARE takes 2:43:08 to process the data from “Windows Servers” that the WPS zVM/Linux environment can process in 1:30:00 (hh:mm:ss).

• That is, the mainframe WPS zVM/Linux run is a 45% improvement over PC/SAS VMWARE/WIN.

• This is most likely due to the extra I/O bandwidth the mainframe has compared to the Windows environment.

• The results for Windows would probably be better if WIN2008 had been used.

PC/SAS versus WPS on z

• PC/SAS under Hyper-V
• WPS under zVM/Linux on z10

Z10: SAS versus WPS

• zOS/SAS versus zOS/WPS to run MXG

• 30% more I/Os for SAS

• TCB for WPS = 551,423

• TCB for SAS = 551,273

• NOTES:

• WPS version 2.4.0.1 and SAS 9.1.3

• MXG from FEB 2009