
1

Chapter 1: Introduction

1.1: Course Logistics

1.2: Measuring Efficiencies

1.3: SAS DATA Step Processing

2

Chapter 1: Introduction

1.1: Course Logistics

1.2: Measuring Efficiencies

1.3: SAS DATA Step Processing

3

Objectives
– List the tasks in the SAS Programming 3 course.
– Explain the naming convention that is used for the course files.
– Compare the three levels of exercises that are used in the course.
– Describe, at a high level, how data is used and stored at Orion Star Sports & Outdoors.
– Navigate to the Help facility.

4

Tasks in the SAS Programming 3 Course
The course topics include techniques for the following data management tasks:
– compressing SAS data sets
– creating indexes for quick retrieval of subsets
– performing table lookups using arrays, hash objects, or formats
– combining data by merging, using the SQL procedure, or using multiple SET statements
– combining summary and detail data
– sorting and grouping data
– developing a program quickly

5

Resource Utilization
As a programmer, you want to perform these tasks as efficiently as possible and optimize the use of the following resources:
– programmer time
– I/O
– CPU
– memory
– data storage space
– network bandwidth

6

Business Scenarios
The business scenarios are opportunities to compare multiple techniques for performing the tasks.

For example:
Task: Table Lookups
Possible Techniques:
– DATA step MERGE statement
– PROC SQL joins
– Formats in PUT functions or in FORMAT statements
– DATA step arrays
– DATA step hash objects

7

8

1.01 Multiple Answer Poll
What type(s) of SAS programs do you write?

a. Data manipulation with the DATA step

b. Data analysis with procedures

c. Report writing

d. A combination of the above

e. SAS training only; no programs written

f. Other

9

Filename Conventions

p304a01

p304a02

p304a02s

p304d01

p304d02

p304e01

p304e02

p304s01

p304s02

p304d01x

Each file name is built from these parts, in order: course ID, chapter #, type, item #, placeholder.

Example: The SAS Programming 3 course ID is p3, so p304d01 = SAS Programming 3, Chapter 4, Demo 1.

Code  Type
a     Activity
d     Demo
e     Exercise
s     Solution
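The naming convention can be checked with a short DATA step. The following is only a sketch: it assumes the layout implied by the example (a two-character course ID, a two-digit chapter, a one-character type code, and a two-digit item number), and the data set name work.decoded and all variable names are hypothetical.

```sas
/* Sketch: split course file names into their parts.               */
/* Assumed layout: 2-char course ID, 2-digit chapter,              */
/* 1-char type code, 2-digit item number (names are hypothetical). */
data work.decoded;
   length filename $ 8 course $ 2 type $ 1 type_name $ 8;
   input filename $;
   course  = substr(filename, 1, 2);
   chapter = input(substr(filename, 3, 2), 2.);
   type    = substr(filename, 5, 1);
   item    = input(substr(filename, 6, 2), 2.);
   /* Translate the type code by using the Code/Type table. */
   select (type);
      when ('a') type_name = 'Activity';
      when ('d') type_name = 'Demo';
      when ('e') type_name = 'Exercise';
      when ('s') type_name = 'Solution';
      otherwise  type_name = 'Unknown';
   end;
   datalines;
p304a01
p304d01
p304e02
p304s02
;
run;

proc print data=work.decoded noobs;
run;
```

Any trailing placeholder character (as in p304d01x) is ignored by this sketch.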

10

Three Levels of Exercises
Level 1: The exercise mimics an example presented in the section.
Level 2: Less information and guidance are provided in the exercise instructions.
Level 3: Only the task you are to perform or the results to be obtained are provided. Typically, you will need to use the Help facility.

You are not expected to complete all of the exercises in the time allotted. Choose the exercise or exercises that are at the level with which you are most comfortable.

11

Orion Star Sports & Outdoors

Orion Star Sports & Outdoors is a fictitious global sports and outdoors retailer with traditional stores, an online store, and a large catalog business.

The corporate headquarters is located in the United States with offices and stores in many countries throughout the world.

Orion Star has about 1,000 employees and 90,000 customers, processes approximately 150,000 orders annually, and purchases products from 64 suppliers.

12

Orion Star Data
As is the case with most organizations, Orion Star has a large amount of data about its customers, suppliers, products, and employees. Much of this information is stored in transactional systems in various formats.

Using applications and processes such as SAS Data Integration Studio, this transactional information was extracted, transformed, and loaded into a data warehouse.

Data marts were created to meet the needs of specific departments such as Marketing.

13

The SAS Help Facility

14

15

1.02 Quiz
– Start your SAS session.
– Open the Help facility.
– Determine the path to use to obtain information about the SAS component objects.

16

1.02 Quiz – Correct Answer
Determine the path to use to obtain information about the SAS component objects.

Information relevant to this course can be found by following this path in the SAS Help facility:

Contents tab > SAS Products > Base SAS > SAS 9.2 Language Reference: Dictionary > Dictionary of Component Object Language Elements

17

SAS OnlineDoc
You can also obtain information from SAS OnlineDoc.

Information relevant to this course can be found by following this path in SAS OnlineDoc:

Contents tab > Products Documentation A-Z > Base SAS > SAS 9.2 Language Reference: Dictionary > Dictionary of Component Object Language Elements

18

19

Chapter 1: Introduction

1.1: Course Logistics

1.2: Measuring Efficiencies

1.3: SAS DATA Step Processing

20

Objectives
– Identify the resources used by a SAS program.
– Report computer resource usage using SAS system options.
– Interpret resource usage statistics in your operating environment.
– Benchmark resource usage.

21

Running a SAS Program
What resources are required to run a SAS program?

The programmer must perform the following tasks:
– determine program specifications
– write the program
– test the program
– execute the program
– maintain the program

22

Running a SAS Program
The computer must perform the following actions:
– load the required SAS software into memory
– compile the program
– read the data
– execute the compiled program
– store output data files
– store output reports

23

What Resources Are Used?

Resources used:
– programmer time
– network bandwidth
– CPU
– I/O
– memory
– data storage space

24

25

1.03 Multiple Answer Poll
Which of the following resources do you need to conserve?

a. CPU

b. I/O

c. Memory

d. Data storage space

e. Network bandwidth

f. Your time

26

Understanding Efficiency Trade-offs
When you decrease the use of one resource, the use of other resources might increase.

Resource usage is dependent on your data. A specific technique might be more efficient with one data set and less efficient with another.

27

Understanding Efficiency Trade-offs

Decreasing the size of a SAS data set can result in an increase in CPU usage.

[Diagram: less data storage space often implies more CPU usage.]

28

Understanding Efficiency Trade-offs

Decreasing the number of I/O operations comes at the expense of increased memory usage.

[Diagram: less I/O often implies more memory usage.]

29

Deciding What Is Important for Efficiency

Your Site

Your Programs

Your Data

30

Understanding Efficiency at Your Site

– SAS Environment
– Hardware
– Operating Environment
– System Load

31

32

1.04 Multiple Choice Poll
This class uses SAS 9.2.

What is the latest version of SAS that you are running?

a. SAS 8.2

b. SAS 9.1

c. SAS 9.2

d. Other

33

Knowing How Your Program Will Be Used
The importance of efficiency increases with the following:
– the complexity of the program and/or the size of the files being processed
– the number of times that the program will be executed

34

Knowing Your Data

35

36

1.05 Multiple Answer Poll
What type(s) of data do you use?

a. SAS data sets

b. External files

c. Data from a relational database – for example, Oracle, Teradata, or SQL Server

d. Excel spreadsheets

e. OLAP cubes

f. Information maps

g. Other

37

Considering Trade-Offs
In this class, many tasks are performed using one or more techniques.

To decide which technique is most efficient for a given task, benchmark (that is, measure and compare) the resource usage of each technique.

You should benchmark with the actual data to determine which technique is the most efficient.

The effectiveness of any efficiency technique depends greatly on the data with which you use the technique.

38

Running Benchmarks: Guidelines
To benchmark your programming techniques, do the following:
– Turn on the appropriate options to report resource usage.
– Test each technique in a separate SAS session.
– Test only one technique or change at a time, with as little additional code as possible.
– Run your tests under the conditions that your final program will use (for example, batch execution, large data sets, and so on).

39

Running Benchmarks: Guidelines (continued)
– Run each program several times and base your conclusions on averages, not on a single execution. (This is more critical when you benchmark elapsed time.)
– Exclude outliers from the analysis, because outlying measurements might lead you to tune your program to run less efficiently than it should.
– Turn off the options that report resource usage after testing is finished, because they consume resources.

In a multi-user environment, other computer activities might affect the running of your program.

40

41

1.06 Multiple Choice Poll
Which of the following SAS programs should be benchmarked?

a. A report that shows all the customers in the United Kingdom in March 2006

b. A report that calculates trends in sales at the end of every day for every department

c. A report showing the projected total cost of a 5% cost-of-living increase in employee salaries for a Human Resources project conducted on January 1, 2007

d. A yearly report that calculates the average sales of a line of apparel for the clothing manager

42

1.06 Multiple Choice Poll – Correct Answer
Which of the following SAS programs should be benchmarked?

a. A report that shows all the customers in the United Kingdom in March 2006

b. A report that calculates trends in sales at the end of every day for every department (correct answer: it runs most often)

c. A report showing the projected total cost of a 5% cost-of-living increase in employee salaries for a Human Resources project conducted on January 1, 2007

d. A yearly report that calculates the average sales of a line of apparel for the clothing manager

43

Tracking Resource Usage

SAS options:
– STIMER
– FULLSTIMER
– MEMRPT (z/OS only)
– STATS (z/OS only)

44

Tracking Resources with SAS Options

Windows, UNIX

OPTIONS NOFULLSTIMER | FULLSTIMER;
OPTIONS STIMER | NOSTIMER;

z/OS

OPTIONS STATS | NOSTATS;
OPTIONS MEMRPT | NOMEMRPT;
STIMER | NOSTIMER (invocation option only)
OPTIONS NOFULLSTIMER | FULLSTIMER;
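As an illustration of the benchmarking guidelines, the sketch below turns on FULLSTIMER, runs one step, and turns the option off again. The DATA step in the middle is only a placeholder (the loop and the variables i and x are hypothetical); in practice, it would be replaced by the technique that is being measured.

```sas
/* Turn on detailed resource reporting before the step under test. */
options fullstimer;

/* Placeholder step to be measured (hypothetical workload). */
data _null_;
   do i = 1 to 1000000;
      x = sqrt(i);
   end;
run;

/* Turn reporting off after testing, because it consumes resources. */
options nofullstimer;
```

The resource statistics for the step are written to the SAS log in a NOTE that follows the step.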

45

Business Scenario
You should benchmark to determine the most efficient technique for creating a new variable based on a condition.

The following methods can be used:
– IF-THEN with an assignment statement
– IF-THEN/ELSE with an assignment statement
– SELECT/WHEN with an assignment statement
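To make the second and third methods concrete, the steps below sketch how the same assignment might be written with IF-THEN/ELSE and with SELECT/WHEN. These are not the course files; the value ranges and variable names are borrowed from the IF-THEN program in the sample logs, and the loop count is reduced arbitrarily. For a real benchmark, each step would be submitted in its own SAS session, as the guidelines recommend.

```sas
options fullstimer;

/* IF-THEN/ELSE: testing stops at the first true condition. */
data _null_;
   length var $ 30;
   do x = 1 to 1000000;
      var1 = 10000000 * ranuni(x);
      if var1 > 1000000 then var = 'Greater than 1,000,000';
      else if var1 >= 500000 then var = 'Between 500,000 and 1,000,000';
      else if var1 >= 100000 then var = 'Between 100,000 and 500,000';
      else if var1 >= 10000 then var = 'Between 10,000 and 100,000';
      else if var1 >= 1000 then var = 'Between 1,000 and 10,000';
      else var = 'Less than 1,000';
   end;
run;

/* SELECT/WHEN: the first WHEN expression that is true is used. */
data _null_;
   length var $ 30;
   do x = 1 to 1000000;
      var1 = 10000000 * ranuni(x);
      select;
         when (var1 > 1000000) var = 'Greater than 1,000,000';
         when (var1 >= 500000) var = 'Between 500,000 and 1,000,000';
         when (var1 >= 100000) var = 'Between 100,000 and 500,000';
         when (var1 >= 10000)  var = 'Between 10,000 and 100,000';
         when (var1 >= 1000)   var = 'Between 1,000 and 10,000';
         otherwise             var = 'Less than 1,000';
      end;
   end;
run;

options nofullstimer;
```

Because the conditions are tested in descending order, each ELSE or WHEN branch needs only a lower bound.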

46

47

1.07 Quiz
1. Open and submit p301a01a.

Record the user CPU: ____________

Exit SAS.

2. Start SAS.

Open and submit p301a01b.

Record the user CPU: ____________

Exit SAS.

3. Start SAS.

Open and submit p301a01c.

Record the user CPU: ____________

4. Which technique is most efficient?

In z/OS, record the CPU.

48

Sample Windows Log
Partial SAS Log (p301a01a)

5    options fullstimer;
6    data _null_;
7       length var $ 30;
8       retain var2-var50 0 var51-var100 'ABC';
9       do x=1 to 100000000;
10         var1=10000000*ranuni(x);
11         if var1>1000000 then var='Greater than 1,000,000';
12         if 500000<=var1<=1000000 then var='Between 500,000 and 1,000,000';
13         if 100000<=var1<500000 then var='Between 100,000 and 500,000';
14         if 10000<=var1<100000 then var='Between 10,000 and 100,000';
15         if 1000<=var1<10000 then var='Between 1,000 and 10,000';
16         if var1<1000 then var='Less than 1,000';
17      end;
18   run;

NOTE: DATA statement used (Total process time):
      real time           1.26 seconds
      user cpu time       0.98 seconds
      system cpu time     0.04 seconds
      Memory              278k
      OS Memory           4976k
      Timestamp           6/29/2010 12:39:21 PM

49

Sample UNIX Log
Partial SAS Log (p301a01a)

1    options fullstimer;
2    data _null_;
3       length var $30;
4       retain var2-var50 0 var51-var100 'ABC';
5       do x=1 to 10000000;
6          var1=10000000*ranuni(x);
7          if var1>1000000 then var='Greater than 1,000,000';
8          if 500000<=var1<=1000000 then var='Between 500,000 and 1,000,000';
9          if 100000<=var1<500000 then var='Between 100,000 and 500,000';
10         if 10000<=var1<100000 then var='Between 10,000 and 100,000';
11         if 1000<=var1<10000 then var='Between 1,000 and 10,000';
12         if var1<1000 then var='Less than 1,000';
13      end;
14   run;

NOTE: DATA statement used (Total process time):
      real time                     6.62 seconds
      user cpu time                 5.14 seconds
      system cpu time               0.01 seconds
      Memory                        526k
      OS Memory                     5680k
      Timestamp                     6/29/2010 11:55:32 AM
      Page Faults                   82
      Page Reclaims                 0
      Page Swaps                    0
      Voluntary Context Switches    91
      Involuntary Context Switches  48
      Block Input Operations        91
      Block Output Operations       0

50

Sample z/OS LogPartial SAS Log

p301a01a

51

52

Chapter 1: Introduction

1.1: Course Logistics

1.2: Measuring Efficiencies

1.3: SAS DATA Step Processing

53

Objectives
– List the attributes of a data set page and define how it relates to the structure of SAS data sets.
– Describe how SAS reads and writes data.

54

SAS Data Set Pages
A SAS data set page has the following attributes:
– It is the unit of data transfer between the operating system buffers and SAS buffers in memory.
– It includes the number of bytes used by the descriptor portion, the data values, and any operating system overhead.
– It is fixed in size when the data set is created, either to a default value or to a value specified by the programmer.
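One way a programmer can specify the page size is the BUFSIZE= data set option on the output data set. The sketch below is a minimal illustration: the data set name work.page_demo and the value 32768 are arbitrary assumptions, and the operating environment might adjust the requested value to a size that it supports.

```sas
/* Sketch: request a 32,768-byte page size when the data set is created. */
/* work.page_demo and the BUFSIZE= value are hypothetical choices.       */
data work.page_demo(bufsize=32768);
   do id = 1 to 1000;
      output;
   end;
run;

/* The page size that is actually in effect appears in the */
/* Engine/Host Dependent Information section of the output. */
proc contents data=work.page_demo;
run;
```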

55

Using PROC CONTENTS to Report Page Size

proc contents data=orion.sales_history;
run;

Partial PROC CONTENTS Output

Engine/Host Dependent Information

Data Set Page Size          16384
Number of Data Set Pages    18
First Data Page             1
Max Obs per Page            92
Obs in First Data Page      72
Number of Data Set Repairs  0
File Name                   S:\workshop\sales_history.sas7bdat
Release Created             9.0201M0
Host Created                XP_PRO

Page size * number of pages: 16,384 * 18 = 294,912 bytes

56

57

1.08 Quiz
Use one of the following to determine the page size of the orion.customer_dim SAS data set:
– the CONTENTS procedure
– the DATASETS procedure
– the SAS Explorer window

What is the page size of the SAS data set orion.customer_dim?

p301a02

58

1.08 Quiz – Correct Answer
Use one of the following to determine the page size of the orion.customer_dim SAS data set:
– the CONTENTS procedure
– the DATASETS procedure
– the SAS Explorer window

What is the page size of the SAS data set orion.customer_dim?

16,384 bytes in Windows

24,576 bytes in UNIX

18,432 bytes in z/OS

p301a02
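For the DATASETS procedure route, a sketch along the following lines could be used. It assumes that the orion libref is already assigned, as it is in the course environment; this is not necessarily the content of p301a02.

```sas
/* Sketch: report data set attributes, including page size, */
/* with PROC DATASETS. Assumes the orion libref is assigned. */
proc datasets library=orion nolist;
   contents data=customer_dim;
quit;
```

The page size is listed as Data Set Page Size under Engine/Host Dependent Information, as in the PROC CONTENTS output.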

59

Reading External Files

[Diagram, built up over several slides: Input Raw Data -> Caches -> Buffers (memory) -> Input Buffer -> PDV (ID, Gender, Country, Name) -> Buffers (memory) -> Caches -> Output SAS Data]

– I/O is measured on the input side and again on the output side, where data moves between storage and the buffers in memory.
– Data might be cached in storage devices. On UNIX and Windows, data can also be cached by the OS file system.
– Data is converted from external format to SAS format as it is read into the PDV.
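A DATA step that follows this path might look like the sketch below. To keep it self-contained, the raw records are read in-stream with DATALINES rather than from an external file; the data set name work.customers and the two records are hypothetical, and only the variable names come from the PDV in the diagram.

```sas
/* Sketch: read raw, comma-delimited records into a SAS data set.  */
/* Each record is loaded into the input buffer, converted to SAS   */
/* format in the PDV, and then written to the output data set.     */
data work.customers;
   infile datalines dlm=',' dsd truncover;
   length ID 8 Gender $ 1 Country $ 2 Name $ 30;
   input ID Gender $ Country $ Name $;
   datalines;
1001,F,US,Hypothetical Customer One
1002,M,AU,Hypothetical Customer Two
;
run;
```

With an external file, the INFILE statement would name a fileref or a quoted path instead of DATALINES.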

65

Reading a SAS Data Set with a SET Statement

[Diagram, built up over several slides: Input SAS Data -> Caches -> Buffers (memory) -> PDV (ID, Gender, Country, Name) -> Buffers (memory) -> Caches -> Output SAS Data]

– I/O is measured on the input side and again on the output side, where data moves between storage and the buffers in memory.
– Data might be cached in storage devices. On UNIX and Windows, data can also be cached by the OS file system.
– No data conversion is necessary.
– Sequential processing continues until the pointer reaches the end of the file.
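The corresponding DATA step reads observations with a SET statement. In the sketch below, the first step only builds a small hypothetical input data set (work.customers) so that the example runs on its own; the second step is the one that matches the diagram.

```sas
/* Build a small, hypothetical input data set. */
data work.customers;
   length ID 8 Gender $ 1 Country $ 2 Name $ 30;
   ID = 1001; Gender = 'F'; Country = 'US'; Name = 'Hypothetical Customer One'; output;
   ID = 1002; Gender = 'M'; Country = 'AU'; Name = 'Hypothetical Customer Two'; output;
run;

/* Read the SAS data set sequentially. Each observation is moved  */
/* into the PDV without conversion, and the step ends when the    */
/* end of the input file is reached.                              */
data work.us_customers;
   set work.customers;
   if Country = 'US';   /* subsetting IF: keep only US customers */
run;
```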

73

74

Exercise

These exercises reinforce the concepts discussed previously.

75

Chapter Review
1. What are the six resources consumed by SAS programs?

2. What is the correct way to benchmark SAS programs?

3. What is a SAS data set page size?

76

Chapter Review Answers
1. What are the six resources consumed by SAS programs?
– programmer time
– network bandwidth
– CPU
– memory
– I/O
– disk storage space

77

Chapter Review Answers
2. What is the correct way to benchmark SAS programs?

a. Turn on the system options to report resource usage.

b. Test each technique in a separate SAS session.

c. Test only one technique or change at a time.

d. Run the test under final conditions.

e. Run each program three to five times and average the results.

f. Exclude outliers.

g. Turn off the resource usage reporting options.


78

Chapter Review Answers
3. What is a SAS data set page size?

A SAS data set page is the unit of data transfer between the system buffers and the SAS buffers in memory, and the page size is the size of that unit. The default transfer is one data set page at a time.

The page size determines the amount of memory that is used when data is read and written. The number of pages affects the I/O.