System Infra-Structure

By James Lum

Foreword:
• Infra-structure concepts developed in response to IBM objectives for FS (Future System) systems:
– 24/7 in-service operation of hardware, microcode, and software
– Fixes applied while the system is active with customer applications

• Full infra-structure concepts and details are in:
– IBM Technical Report TR 03.443, July 1992: "System Infra-Structure for Software" by James Lum, Santa Teresa Laboratory, San Jose, CA

• All the design and development was done in the SAK (System Assurance Kernel) testing system at IBM DSD Product Assurance, Poughkeepsie, NY, from 1971 through 1979. SAK is still in use, primarily for its test programs.

IBM DSD Engineering Product Assurance Experience:

Approx Yr      System       Op Sys                   Comments
1964-1969      S/360        PTMR (Cross)             Uni-processor, multi-tasking
1968-1969      40I          MPTMD (Brand)            2 CPUs, task stacks, spin locks
1969           40I          PTSJ (Lum)               16 CPUs, suspend locks
1969-1975      S/370 FAA    PTSJ (Lum)               Above plus: N virtual spaces, modular function
1971-1974      FS           -                        Un-Do recovery
1978-Present   S/370/390    SAK (J Lum, T Bohizic)   Above plus: N CPUs, module loading

Overview of System Basics:

• Mechanics (discussed in this presentation):
– Structure
– Linkage
– Recovery
– Serialization
– Packaging

• Not discussed in this presentation:
– Architecture, Conventions, Language, Macros
– Function, Logic, Control Blocks
– Documentation: Internals, User Guides
– Development: Design, Coding, Testing, Releases, Maintenance

Contents:
• Objectives
• Overview
• Structure
– Call Directory
• Linkage
– Stacks, Module Layout
• Recovery
– Theory, Method, Cascading, System
• Serialization
– Locks, Promotion, Deadlock Recovery
• Dynamic Module Replacement
• Dynamic Module Loading
– Boot & Load List
• Overall Summary

Objectives:

Define the basic structures and concepts that are valid for any programming application, especially system control programs and subsystems, that provide:

• Good recovery
• Design flexibility
• Good performance
• Ease of implementation
• Ease of making changes and applying fixes
• A minimal set of global rules

Method:

• Combine selected solutions to basic design problems in:
– Structure
– Linkage
– Recovery
– Serialization
– Development and maintenance
– Flow of control

• And mesh them together so that each selected solution also helps solve problems in the other areas as effectively as possible

Method: Continued

• Thereby producing system solutions that:
– Allow non-infinite recursion
– Can nest interrupts (events)
– Support a non-layered design
– Use in-line locking for high-performance serialization
– Support an imperfect lock hierarchy
– Provide module- and system-level recovery
– Need no mainline code for recovery hooks
– Support multi-tasking and multiple CPUs
– Stress modularity
– Make it easy to add a system-wide module call/return trace
– Make it easy to change and/or add new functions

Structures:

Call Directory (contains anchors for):
– Code Modules
– Data Space(s)
– Task Stacks
– Task Control Blocks
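A minimal Python sketch of the Call Directory idea, under stated assumptions: Python callables stand in for code modules, and the class and method names (`CallDirectory`, `set_entry`, `call`) are hypothetical, not from the original system. Modules hold no references to one another; every call goes indirectly through a numbered slot, which is what later makes module replacement a single pointer store.

```python
from typing import Any, Callable, Dict, List, Optional


class CallDirectory:
    """Hypothetical Call Directory: one anchor (slot) per code module.

    Slot numbers are fixed when the directory is compiled and never reused;
    modules refer to each other only through these slots.
    """

    def __init__(self, names: List[str]) -> None:
        self.slot: Dict[str, int] = {name: i for i, name in enumerate(names)}
        self.entries: List[Optional[Callable[..., Any]]] = [None] * len(names)

    def set_entry(self, name: str, module: Callable[..., Any]) -> None:
        # Called from a module's initialization to anchor its entry point.
        self.entries[self.slot[name]] = module

    def call(self, name: str, *args: Any) -> Any:
        # Indirect call: fetch the current pointer from the slot, then run it.
        module = self.entries[self.slot[name]]
        if module is None:
            raise LookupError(f"module {name} is not loaded")
        return module(*args)


# Usage: Module A reaches Module B only through the directory.
cd = CallDirectory(["A", "B"])
cd.set_entry("B", lambda x: x * 2)
cd.set_entry("A", lambda x: cd.call("B", x) + 1)
print(cd.call("A", 20))  # 41
```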

Structure Problems:
• Logic Concerns (what programmers do):
– What are the arguments?
– Where are the control blocks?
– What algorithm should be used?
– What other functions are needed?

• Non-Logic Concerns (what system designers do):
– How is the work area acquired?
– Is there enough work area space?
– Where are the arguments?
– What is the interface?
– Can interrupts occur?
– Is recursion allowed?
– What is the execution state?
– Can Function X be invoked?

Structure – Task Stacks

• Each task has its own stack, which:
– Provides a structured flow of control
– Supports recursion
– Contains the work area header
– Contains the register save area
– Contains the call arguments
– Contains the interrupt status
– Is managed by PUSH/POP operations
– Requires end-of-stack handling

Figure: Task Code Flow. The task's stack holds a work area for each active module: Module A, Module B, the Interrupt Handler (with its saved interrupt status), and Module F.
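The task stack can be sketched in Python as a bounded list of work areas, assuming a fixed capacity models the end of the stack; `TaskStack`, `WorkArea`, `EndOfStackError`, and the recursive `factorial` module are hypothetical. Each call PUSHes its own work area, so recursion works, and the capacity check is the end-of-stack handling that keeps recursion non-infinite.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class EndOfStackError(Exception):
    """Raised when a PUSH would run past the end of the task stack."""


@dataclass
class WorkArea:
    # Hypothetical work area: header (module name), register save area,
    # call arguments, and interrupt status (used by interrupt handlers).
    module: str
    saved_regs: Dict[str, int] = field(default_factory=dict)
    args: tuple = ()
    int_status: Optional[str] = None


class TaskStack:
    """One stack per task; work areas are PUSHed on call and POPped on return."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.frames: List[WorkArea] = []

    def push(self, area: WorkArea) -> WorkArea:
        if len(self.frames) >= self.capacity:
            raise EndOfStackError(area.module)   # end-of-stack handling
        self.frames.append(area)
        return area

    def pop(self) -> WorkArea:
        return self.frames.pop()


def factorial(stack: TaskStack, n: int) -> int:
    """A recursive module: each activation gets its own work area."""
    stack.push(WorkArea("FACT", args=(n,)))
    try:
        return 1 if n <= 1 else n * factorial(stack, n - 1)
    finally:
        stack.pop()


stack = TaskStack(capacity=8)
print(factorial(stack, 5))  # 120
```

Because every PUSH is paired with a POP in a `finally` clause, an end-of-stack error unwinds cleanly and leaves the stack empty again.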

Structure – Module Function Problems:
• All code modules have four functions that are classically dispersed and written by different people:
– Initialization, grouped with other initialization code
– Mainline code, grouped with its other component code
– Recovery code, grouped with other recovery code
– Termination, grouped with other termination code

How about NOT dispersing these functions?

Structure – Module Layout:

Module Header Contains:
• Main: main entry, which jumps to the mainline code
• Init:, Term:, Recovery Table: pointers to:
– Initialization code
– Termination code
– Recovery table
• Name, Date, Version, Size:
– Module name
– Release date
– System/module version
– Module size, etc.

Module Body:
• Mainline:
– Function
– Un-Do recovery for mainline
• Initialization:
– Insert in Call Directory
– Build structures
– Un-Do recovery for initialization
• Termination:
– Release structures
– Un-Do recovery for termination
• Recovery Re-Entry Point Table

Programmer Writes:
• All code in the module
• Structured programming rules
• Un-do recovery
• Re-entrant code
• Single CSECT
• Single entry point

Recovery Philosophy:

• When I write and deliver code, there are no bugs in it, therefore I don’t need to write recovery for my code. Besides, if I find any bugs, I fix them immediately.

• But management and team leaders say that there must be recovery for my code.

• So any error must be in the code I called, not in my code. All I can do is undo my changes, retry the operation once, and, if unsuccessful, pass the error to whoever called me.

• Recovery will be invoked if data values are incorrect or if an unexpected interrupt occurs

Recovery Methods:
• In-line conditional checks:

    if good = 'yes' then call xyz(p,d,q); if returncode = 'bad' then good = 'no'

• Invoke a checkpoint routine
• Put all retry in a separate module

Is there another way? How about backing out (un-do)… and then retrying???

Recovery Objectives:

• Software retry
• Maintain consistent system state
• No mainline code overhead
• Insensitive to external changes
• Insensitive to changes in inter-module flow

• The next three slides show some low-level detail so that you get the idea of how un-do recovery works.

• After that, there will be slides that show how to make the programmer's job easier and how to ensure that the module structures for un-do recovery are generated automatically and correctly.

Recovery Method Step 1 of 3

Module M:
    LK X
    Call GET
    UNLK X
    Return;

Step 1:
- Write function code
- Structured programming
- Note downward code flow
- Note call to module GET
- Note serialization on lockword X

Module M:
    T:
    A:   LK X
    B:   Call GET
    C:   UNLK X
         Return;
    CC:  LK X
    BB:  Call FREE
    AA:  UNLK X

Step 2:
- Split function into major pieces
- Label each major piece
- Write "undo" for each major piece
- Label each "undo" piece
- Place "undo" pieces in opposite order

Module M:
    Do 1 to 2
    T:
    A:   LK X
    B:   Call GET
    C:   UNLK X
         Return;
    CC:  LK X
    BB:  Call FREE
    AA:  UNLK X
    END
    R-Return

Step 3:
- Put labels in recovery table
- Surround all code with a DO UNTIL loop
- Insert recovery return at end
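The three steps can be approximated in Python, with the caveat that Python has no labels: here a counter of completed mainline pieces stands in for the recovery table lookup, and a loop of two passes stands in for `Do 1 to 2`. `module_m`, `Lockword`, `Percolate`, and the `get` callback are hypothetical names, and the `pool` list stands in for whatever GET and FREE manage. This is a sketch of the un-do/retry idea under those assumptions, not the original macro-generated code.

```python
from typing import Callable, List, Optional


class Percolate(Exception):
    """Hypothetical signal that the error is passed back to this module's caller."""


class Lockword:
    def __init__(self) -> None:
        self.owner: Optional[str] = None


def module_m(task: str, x: Lockword, pool: List[str],
             get: Callable[[List[str]], str]) -> str:
    """Module M: LK X; Call GET; UNLK X; Return, with un-do recovery.

    `done` counts the mainline pieces (A, B, C) that completed. The undo
    pieces are kept in opposite order (CC, BB, AA), so starting at index
    len(undo) - done and falling through undoes exactly the completed pieces.
    """
    state: dict = {"block": None}

    def lk_x() -> None:
        x.owner = task

    def unlk_x() -> None:
        x.owner = None

    def call_get() -> None:
        state["block"] = get(pool)

    def call_free() -> None:
        if state["block"] is not None:
            pool.append(state["block"])
            state["block"] = None

    mainline = [lk_x, call_get, unlk_x]   # A: LK X   B: Call GET   C: UNLK X
    undo = [lk_x, call_free, unlk_x]      # CC: LK X  BB: Call FREE  AA: UNLK X

    for _attempt in (1, 2):               # "Do 1 to 2": first try plus one retry
        done = 0
        try:
            for piece in mainline:
                piece()
                done += 1
            return state["block"]
        except Exception:
            for piece in undo[len(undo) - done:]:
                piece()                   # back out completed pieces, newest first
    raise Percolate("Module M")           # R-PERC: retry failed, tell the caller
```

A transient error inside GET is absorbed by the single retry; a permanent one leaves lockword X unlocked and the pool unchanged before `Percolate` reaches the caller, which is the consistent-state property the recovery objectives ask for.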

Recovery Method:

• Wow! That's a lot of non-main-function work for a programmer to do!

• Let's provide some macros that generate these labels, the recovery table, and the module header.

• The system designer is responsible for providing these macros.

• The programming language and macro preprocessor must:
– Allow constants and address labels to be placed before and within the executable code, for module header and recovery table generation
– Allow macro arguments to be collected and then expanded within the recovery redirection table

Recovery Macros

Module M:
    M-HDR
    R-DO
    S-L(T)
    S-L(A)   LK X
    S-L(B)   Call GET
    S-L(C)   UNLK X
             Return;
    R-L(C)   LK X
    R-L(B)   Call FREE
    R-L(A)   UNLK X
    R-END
    R-PERC
    R-TBL

M-HDR: Builds module header containing name, date, entry points, and re-direction table anchor
S-L(xx): Creates main-line labels
R-L(xx): Creates undo labels
R-DO: Creates DO UNTIL statement
R-END: Creates End for DO UNTIL statement
R-PERC: Creates Call to error percolation routine
R-TBL: Creates recovery re-direction table using labels from S-L and R-L macros

Recovery Return Percolation:

• After an unsuccessful retry, control is passed back to the calling module in a “return to” fashion.

• The argument passed to the return percolation function is either:
– The normal return address location in the calling module
– Or the location at which an interrupt occurred

• The percolation function locates the calling module’s header via the task stack for security reasons.

• The module header found contains a pointer to that module’s recovery redirection table.

• The recovery redirection table entries are searched for an entry range that contains the location argument.

• The module’s status and registers are then loaded and control is given to the location found in the recovery table.

• The module will then do “undo” operations and percolate to its caller if unsuccessful to repeat the above process.
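A sketch of the table search, assuming the failing module's work area has already been popped so the top of the task stack belongs to the calling module, and that locations are plain integer offsets. `ModuleHeader`, `Frame`, and `percolate` are hypothetical names; the real system reloads registers and branches, whereas this function only returns where control would go.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ModuleHeader:
    """Hypothetical module header with its recovery redirection table.

    Each entry (start, end, reentry) says: a location in [start, end)
    re-enters this module at the undo label `reentry`.
    """
    name: str
    recovery_table: List[Tuple[int, int, int]]


@dataclass
class Frame:
    header: ModuleHeader                          # found via the task stack
    saved_regs: Dict[str, int] = field(default_factory=dict)


def percolate(task_stack: List[Frame], location: int) -> Tuple[str, int, Dict[str, int]]:
    """Return (module name, re-entry point, registers to reload) for `location`.

    `location` is either the normal return address in the caller or the
    address at which an interrupt occurred.
    """
    caller = task_stack[-1]
    for start, end, reentry in caller.header.recovery_table:
        if start <= location < end:
            return caller.header.name, reentry, caller.saved_regs
    raise LookupError(f"{caller.header.name}: no recovery entry for {location:#x}")


# Usage: Module M's table maps three mainline ranges to undo labels.
m = ModuleHeader("M", [(0x10, 0x20, 0x80), (0x20, 0x30, 0x70), (0x30, 0x40, 0x60)])
stack = [Frame(m, {"R14": 0x24})]
print(percolate(stack, 0x24))  # ('M', 112, {'R14': 36})
```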

Recovery Cascading for System Recovery

Figure: Normal call/return flow among Modules M, N, and O (note downward time flow and call nesting depth).

Figure: Recovery call/return flow among Modules M, N, and O, with Module I (Log Error) and Module P:
1. Error occurs in Module N
2. Log the error
3. Percolate to Module N
4. Undo Module N code
5. Retry Module N one time
6. Error re-occurs; log and repeat undo in Module N
7. Percolate to Module M
8. Undo Module M code
9. Retry Module M one time
10. Call Module N a 2nd time
11. Okay if retry is successful
12. Permanent error if not

Note: The number of actual retries is based on the nesting depth of the error.
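The cascade can be simulated to see why the number of actual retries depends on nesting depth. In this hypothetical sketch each module wraps its body in the same log/undo/retry-once/percolate routine (`with_undo_retry`), and `fail_budget` is a made-up knob that makes Module N fail a given number of times. With two nesting levels, a permanent error in N is attempted 2 × 2 = 4 times before it is reported as permanent.

```python
from typing import Callable, List


class Percolate(Exception):
    """Hypothetical signal: retry failed, error passed to the caller."""


def with_undo_retry(name: str, body: Callable[[], None], log: List[str]) -> None:
    """One module's recovery: try, log and undo on error, retry once, then percolate."""
    for attempt in (1, 2):
        try:
            body()
            return
        except Exception as err:
            log.append(f"{name} attempt {attempt}: {err!r}")
            # module-specific undo code would run here
    raise Percolate(name)


def module_n(fail_budget: List[int], log: List[str]) -> None:
    def body() -> None:
        if fail_budget[0] > 0:          # simulated error inside Module N
            fail_budget[0] -= 1
            raise RuntimeError("error in N")
    with_undo_retry("N", body, log)


def module_m(fail_budget: List[int], log: List[str]) -> None:
    # Module M's mainline is just the call to Module N.
    with_undo_retry("M", lambda: module_n(fail_budget, log), log)


log: List[str] = []
module_m([2], log)   # N fails twice and percolates; M's retry calls N again
print(len(log))      # 3
```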

Recovery Summary:
• Advantages:
– No mainline code for recovery hooks; labels are not executable
– Insensitive to external code flow changes
– Module recovery cascades into system recovery
– Promotes modularity and top-down structured programming
– Coding rules are the same for all modules

• Disadvantages:
– Asynchronous interrupts can be lost if recovery progresses through I/O, External, or Machine Check interrupt handlers

• Experience:
– Recovery involves:
  • Unlocking locks that were locked by the main code
  • Releasing resources that were acquired by the main code
– Recovery can also be done on recovery and undo code
– Recovery must be at the end of each internal subroutine
– Losing a control block from a free chain is acceptable
– Recovery is less than 10% of a module's code

Serialization Objectives:
• Manage resources:
– In a multi-programming system (multi-tasking)
– In a multi-processing system (multiple CPUs)
– While enabled for interrupts
– While allowing recursion
– While unexpected interrupts are occurring

Serialization Observations:
• There are different kinds of resources:
– Storage, I/O, CPUs, time, etc.

• All resources are defined by control blocks
• A lockword can be assigned to each group of control blocks, and each control block can contain its own lockword
• All resources are acquired on behalf of a task
• A CPU is NOT a task; it is a resource
• If a lockword is locked, the task must wait, but how?

Serialization Methods:
• Disable interrupts
• Special instructions (atomic operations):
– TS, CS, CDS
– Spin on the lockword
– Lock with the task ID or the CPU ID

• Also consider:
– Lock hierarchies to avoid deadlocks
– “Design is not done until a lock hierarchy is defined, and a lock hierarchy is not defined until design is done”
– Deadlock detection or avoidance?
– Exclusive locks
– Shared locks

Serialization Guidelines:
• All resources are acquired on behalf of a task
• A locked lockword is associated with a task via a pointer to the task's control block. NEVER with a CPU ID!
• A task can hold as many lockwords as needed at the same time
• Exclusive locking only; no locking for read-only operations, which relies on these rules:
– Control blocks are filled in before they are enqueued
– Single-threaded (singly linked) chains ensure atomic enqueues
– Free chain pointers are NOT the same control block field as the active chain pointer
– In unused control blocks, the active chain pointer points to the head of the active chain, to steer any search code back to the active chain
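A single-threaded Python sketch of the chain rules above, assuming one writer at a time (the chain's lockword, not shown, would serialize writers) and readers that take no lock. `ControlBlock`, `Chain`, `search`, and the `start` parameter are hypothetical. The point is the last rule: a reader that was positioned on a block when it was freed follows that block's active pointer back to the head and rescans, instead of wandering down the free chain.

```python
from typing import Optional


class ControlBlock:
    def __init__(self, key: int = 0) -> None:
        self.key = key
        self.active_next: Optional["ControlBlock"] = None  # active chain link
        self.free_next: Optional["ControlBlock"] = None    # separate free chain link


class Chain:
    """Singly linked active chain anchored at a head block, plus a free chain."""

    def __init__(self) -> None:
        self.head = ControlBlock()   # anchor only; first real block is head.active_next
        self.free: Optional[ControlBlock] = None

    def enqueue(self, cb: ControlBlock, key: int) -> None:
        cb.key = key                           # 1. fill in the block completely
        cb.active_next = self.head.active_next
        self.head.active_next = cb             # 2. publish with one pointer store

    def remove(self, cb: ControlBlock) -> None:
        prev = self.head                       # assumes cb is on the active chain
        while prev.active_next is not cb:
            prev = prev.active_next
        prev.active_next = cb.active_next      # unlink with one pointer store
        cb.free_next, self.free = self.free, cb
        cb.active_next = self.head             # steer stray readers back to the head


def search(chain: Chain, key: int,
           start: Optional[ControlBlock] = None) -> Optional[ControlBlock]:
    """Lock-free reader; `start` lets a test resume a reader parked on some block."""
    cb = chain.head.active_next if start is None else start
    while cb is not None:
        if cb is chain.head:                   # came off a freed block: rescan
            cb = chain.head.active_next
            continue
        if cb.key == key:
            return cb
        cb = cb.active_next
    return None
```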

Serialization – Lockwords and Tasks:

Figure: Lockword XX holds a pointer to Task AA, the current lockword owner. Task control blocks BB, CC, and DD are the 1st, 2nd, and 3rd waiters on the lockword; each waiting task also holds a pointer back to lockword XX.

Note:
- The lockword waiter chain is within the Task control blocks
- Waiting Tasks contain a pointer to the lockword they are waiting on
- Tasks can lock many lockwords, one at a time, but a task waits on the first lockword it finds locked by another task
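A single-threaded sketch of the lock manager data structures. It assumes (this is not shown on the slide) that the lockword keeps a pointer to its first waiter as well as its owner; after that, the chain runs through the waiters' task control blocks as described, which is safe because a task waits on only one lockword at a time. `TCB`, `Lockword`, `lock`, and `unlock` are hypothetical; a returned `"waiting"` marks the point where a real task would be suspended, and a real system would set the owner with an atomic instruction such as CS.

```python
from typing import Optional


class TCB:
    """Hypothetical task control block fields used by the lock manager."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.waiting_on: Optional["Lockword"] = None  # lockword this task waits on
        self.next_waiter: Optional["TCB"] = None      # next task on the same waiter chain


class Lockword:
    def __init__(self, name: str) -> None:
        self.name = name
        self.owner: Optional[TCB] = None         # owning task's TCB, never a CPU ID
        self.first_waiter: Optional[TCB] = None  # assumption: chain head kept here


def lock(task: TCB, lw: Lockword) -> str:
    if lw.owner is None:
        lw.owner = task              # a real system would use an atomic CS here
        return "locked"
    if lw.owner is task:
        return "locked already"      # recursion: this level must not unlock
    task.waiting_on = lw             # a task waits on only one lockword at a time,
    if lw.first_waiter is None:      # so one next_waiter field per TCB is enough
        lw.first_waiter = task
    else:
        tail = lw.first_waiter
        while tail.next_waiter is not None:
            tail = tail.next_waiter
        tail.next_waiter = task
    return "waiting"                 # the task would now be suspended


def unlock(task: TCB, lw: Lockword) -> Optional[TCB]:
    """Release lw; ownership passes directly to the first waiter, if any."""
    if lw.owner is not task:
        raise RuntimeError(f"{task.name} does not own {lw.name}")
    first = lw.first_waiter
    lw.owner = first
    if first is not None:
        lw.first_waiter = first.next_waiter
        first.next_waiter = None
        first.waiting_on = None      # the new owner exits its wait
    return first
```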

Serialization - Process Promotion:

Figure: Lockword XX owned by Task AA, with Tasks BB and CC on its waiter chain.

– Task AA locks lockword XX
– Tasks BB and CC are waiters
– Task AA is dispatched (promoted) whenever it is Task BB's or CC's turn to run, independent of any priorities

Note:
- Promotion reduces lockword contention (waiter) time spans
- Promotion avoids long waiter queues
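Process promotion can be sketched as a dispatcher hook: when a waiting task's turn comes up, the dispatcher follows its waiting-on pointer to the lockword owner and runs that task instead. The names (`promote`, `dispatch_order`) and the simple ready queue are hypothetical, and priorities are deliberately ignored, as the slide states.

```python
from typing import List, Optional, Set


class Lockword:
    def __init__(self, owner: Optional["TCB"] = None) -> None:
        self.owner = owner


class TCB:
    def __init__(self, name: str) -> None:
        self.name = name
        self.waiting_on: Optional[Lockword] = None


def promote(task: TCB) -> TCB:
    """Return the task to dispatch when it is `task`'s turn to run.

    A waiting task gives its turn to the lockword owner, whatever that
    owner's priority; if the owner is itself waiting, the chain is followed.
    """
    seen: Set[int] = set()
    while task.waiting_on is not None and task.waiting_on.owner is not None:
        if id(task) in seen:
            raise RuntimeError("deadlock cycle")   # the lock manager prevents this
        seen.add(id(task))
        task = task.waiting_on.owner
    return task


def dispatch_order(ready_queue: List[TCB]) -> List[str]:
    """Names of the tasks actually run, one dispatch per ready-queue turn."""
    return [promote(t).name for t in ready_queue]


aa, bb, cc, dd = TCB("AA"), TCB("BB"), TCB("CC"), TCB("DD")
xx = Lockword(owner=aa)               # Task AA locks lockword XX
bb.waiting_on = cc.waiting_on = xx    # Tasks BB and CC are waiters
print(dispatch_order([bb, dd, cc, aa]))  # ['AA', 'DD', 'AA', 'AA']
```

Every turn that would have been wasted on a waiter goes to the owner, which shortens the time the lockword stays held.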

Serialization – Deadlock Detection:

Figure: Task AA owns lockword XX, on which Task BB waits; Task BB owns lockword YY, on which Task AA is about to wait.

– Task AA locks lockword XX and Task BB locks lockword YY
– Task BB attempts to lock lockword XX and becomes a waiter
– Task AA's attempt to lock lockword YY would cause a deadlock if allowed
– The deadlock is detected as part of the Process Promotion algorithm:
  – Is the lockword owner waiting on a lockword? (Task control block field)
    – If yes, locate that lockword and check whether its owner is this Task
      – If yes, then a deadlock will occur if this Task becomes a waiter
      – If no, then repeat the above steps
    – If no, then queue this Task as a waiter on the lockword and return
– The Lock Manager uses Process Promotion to detect/prevent deadlocks
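The detection loop above can be sketched as follows. The lock manager walks from the lockword's owner through each owner's waiting-on field; if the walk comes back to the requesting task, queuing it would close a cycle, so the request is refused with an error that the caller's undo/retry recovery can handle. `TCB`, `Lockword`, `DeadlockError`, `would_deadlock`, and `lock` are hypothetical names, and the walk assumes no cycle already exists (the same check, applied to every earlier wait, guarantees that).

```python
from typing import Optional


class Lockword:
    def __init__(self, name: str) -> None:
        self.name = name
        self.owner: Optional["TCB"] = None


class TCB:
    def __init__(self, name: str) -> None:
        self.name = name
        self.waiting_on: Optional[Lockword] = None


class DeadlockError(Exception):
    """Raised instead of queuing a waiter whose wait would close a cycle."""


def would_deadlock(task: TCB, lw: Lockword) -> bool:
    owner = lw.owner
    while owner is not None:
        if owner is task:
            return True                    # the owner chain leads back to this task
        if owner.waiting_on is None:
            return False                   # owner is not waiting: safe to queue
        owner = owner.waiting_on.owner     # owner is waiting: follow its lockword
    return False


def lock(task: TCB, lw: Lockword) -> str:
    if lw.owner is None:
        lw.owner = task
        return "locked"
    if lw.owner is task:
        return "locked already"
    if would_deadlock(task, lw):
        raise DeadlockError(f"{task.name} waiting on {lw.name}")
    task.waiting_on = lw                   # queue as a waiter (chain not shown)
    return "waiting"
```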

Serialization – Deadlock Recovery:

Figure: Lock (LK) and unlock (UNLK) operations of Task AA and Task BB on lockwords XX and YY over time, including the lock manager call (LK MGR), error logging (Log Err), and the retry.

- Task AA locks XX and Task BB locks YY
- Task BB attempts to lock XX and waits
- Task AA attempts to lock YY and calls the lock manager to become a waiter
- Lock manager detects deadlock and calls error logging module
- Error is percolated back to Task AA
- Task AA's undo retry code unlocks XX
- Task BB now owns XX and exits wait
- Task AA's retry attempts to lock XX and waits
- Task BB unlocks XX; Task AA exits wait
- Tasks now execute normally
- Deadlock resolved as a temporary error

Serialization Summary:
Deadlock conditions:
- At least two Tasks
- At least two lockwords locked in opposite order
- Conflicting relationship in time
Remove any condition to eliminate the deadlock.

Classic solutions: dead wait state, terminate task, lock hierarchies. But undo retry recovery can change timing relationships!

Design Notes:
- A Task waits on only one lockword at a time
- Locking is done with inline code
- No infinite spin locks; a finite spin followed by a suspend can be used
- Lock manager is called only to put the Task on the lockword waiter chain
- A lock hierarchy is needed only for performance reasons
- A “locked already” lockword test is needed to support recursion
- Control blocks must be filled in before being enqueued
- One lockword per control block and one lockword per control block chain
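The "finite spin, then suspend" design note can be sketched with Python's standard `threading.Lock`, using `acquire(blocking=False)` for the in-line attempts and a plain blocking `acquire()` to suspend. This is a stand-in only: `threading.Lock` does not record its owner, so it cannot implement the task-pointer lockwords, waiter chains, or "locked already" test described above; the function name and the spin count are assumptions.

```python
import threading


def acquire_spin_then_suspend(lock: threading.Lock, spins: int = 100) -> bool:
    """Try the lock `spins` times without blocking, then suspend until it is free.

    Returns True if the lock was obtained while spinning, False if the
    caller had to suspend (block) to get it.
    """
    for _ in range(spins):
        if lock.acquire(blocking=False):   # in-line fast path
            return True
    lock.acquire()                          # give up the CPU and wait to be woken
    return False
```

Bounding the spin keeps a contended lock from burning a CPU indefinitely, while the uncontended case still costs only one atomic attempt.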

Serialization Results:
– Imperfect lock hierarchy is acceptable

– Lock hierarchy can evolve naturally and is not a concern

– Design and implementation can occur concurrently

– Very good performance in non-deadlock case

– Code and modules can be added or changed as functionally needed

– Lockwords can be defined and locked as needed

Experience:

– We had one lockword per control block and one lockword per control block chain; we never really counted or kept track of them

– Deadlock error logs showed that deadlocks occurred only in stressed, forced, excessively recursive situations, not in normal operation

– Systems seem to have a natural lock hierarchy based on code flow

– We never even bothered to define a lock hierarchy

Dynamic Module Replacement Objectives:
– Add new functions by module without recompiling the system

– Apply fixes without shutting down and re-booting the system

– Back out bad fixes without shutting down/re-booting the system

– Add or remove debug tracking aids as needed

– Operator (System Administrator) controlled

– Optionally allow system to update itself

Dynamic Module Replacement:

Classic Load Module (External References):
– Physical modularity lost
– Pathological relationships between modules
– Requires compile and link edit
– Only local external references resolved
– Difficult to uncouple a module

Call Directory (Object Modules Only):
– No external references; all such data is located in the Call Directory
– Physical modularity preserved
– Needs only a compile
– Easy to uncouple a module

Figure: Code and data external references in each approach.

Dynamic Module Replacement – Module Structure:

Module Header Contains:
• Main: main entry, which jumps to the mainline code
• Init:, Term:, Recovery Table: pointers to:
– Initialization code
– Termination code
– Recovery table
• Name, Date, Version, Size:
– Module name
– Release date
– System/module version
– Module size, etc.

Module Body:
• Mainline:
– Function
– Un-Do recovery for mainline
• Initialization:
– Insert in Call Directory
– Build structures
– Un-Do recovery for initialization
• Termination:
– Release structures
– Un-Do recovery for termination
• Recovery Re-Entry Point Table

Module Characteristics:
• No external references
• Entire function encapsulated
• Structured programming rules
• Re-entrant code
• Single CSECT
• Single entry point

Dynamic Module Replacement Guidelines:
– A module is a single encapsulated unit

– Physical modularity as well as logical modularity

– Apply fixes without shutting down and re-booting the system

– Single entry point modules

– Hardware provides pointer atomicity (four-byte word)

Easy to Difficult Module Replacements:

– No external interface changes, internal changes only

– Calls to new modules; load new modules first

– Interface changes; assign unused Call Directory slot first

– Data structure changes; recompile and reboot recommended

Dynamic Module Replacement: One Module

Figure: Call Directory; Module BB (calls FF), Module FF (calls WW), Module KK (calls FF), and Module FFn (calls XX); task stacks for Task A and Task B holding the work spaces of the modules each task is running.

– Task A calls Module BB
– Module BB calls Module FF
– Task B calls Module KK
– Module FFn is loaded
– Module FFn initialization replaces Module FF's ptr in the Call Directory
– Module KK calls Module FFn
– Task A continues to use Module FF
– Task B continues to use Module FFn
– All future calls will call Module FFn
– Module FF space reclaimed later
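A sketch of the one-module case, assuming a Python dict store stands in for the hardware's atomic four-byte pointer store and a `pause` callback stands in for Task A being interrupted inside Module FF. `directory`, `call`, `ff`, `ffn`, and `ffn_init` are hypothetical names. Because a caller fetches the directory pointer once per call, work already under way keeps running the old code while every later call gets the new module.

```python
from typing import Callable, Dict, List

# Hypothetical Call Directory: module name -> current entry point.
directory: Dict[str, Callable[[Callable[[], None]], str]] = {}


def call(name: str, pause: Callable[[], None]) -> str:
    module = directory[name]      # fetch the pointer once ...
    return module(pause)          # ... and run that code to completion


def ff(pause: Callable[[], None]) -> str:
    pause()                       # Task A is suspended somewhere inside FF
    return "FF"


def ffn(pause: Callable[[], None]) -> str:
    pause()
    return "FFn"


def ffn_init() -> None:
    # Module FFn initialization: a single pointer store replaces FF's entry.
    directory["FF"] = ffn


directory["FF"] = ff
task_b_results: List[str] = []


def while_task_a_is_in_ff() -> None:
    ffn_init()                                         # FFn is loaded
    task_b_results.append(call("FF", lambda: None))    # Task B (via KK) calls FF


task_a_result = call("FF", while_task_a_is_in_ff)
print(task_a_result, task_b_results)  # FF ['FFn']
```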

Dynamic Module Replacement: New Modules

Figure: Call Directory; Module BB (calls HH), Module BBn (calls PP), Module PP (calls QQ), Module QQ (calls HH), and Module HH (calls DD).

– Module BB to be restructured with new modules PP and QQ
– Modules PP and QQ assigned unused Call Directory entries
– Deepest nested Module QQ loaded first; QQ sets its Call Directory ptr
– Module PP loaded next and sets its Call Directory ptr
– Module BBn is loaded last and replaces Module BB's Call Directory ptr
– Module BB space reclaimed later

Module load sequence is important so that new modules are not called before being loaded
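The load-order rule (callees before callers, deepest nested first) is a post-order depth-first walk of the call graph. The sketch below uses a hypothetical `load_order` function and the call graph from the New Modules figure, and assumes the new modules do not call each other cyclically.

```python
from typing import Dict, List, Set


def load_order(calls: Dict[str, List[str]], new: Set[str]) -> List[str]:
    """Order new/replacement modules so each is loaded after everything it calls.

    Post-order depth-first search: a module is emitted only after all of its
    callees, so no new module can be called before it is loaded. Modules
    already in the system (not in `new`) are walked through but not emitted.
    """
    order: List[str] = []
    visited: Set[str] = set()

    def visit(module: str) -> None:
        if module in visited:
            return
        visited.add(module)
        for callee in calls.get(module, []):
            visit(callee)
        if module in new:
            order.append(module)

    for module in sorted(new):
        visit(module)
    return order


new_modules = load_order({"BBn": ["PP"], "PP": ["QQ"], "QQ": ["HH"], "HH": ["DD"]},
                         {"BBn", "PP", "QQ"})
print(new_modules)  # ['QQ', 'PP', 'BBn']
```

The same walk gives HHn, BBn, QQn for the interface-change case, matching the sequence the slides prescribe there.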

Dynamic Module Replacement: New Interface

Figure: Call Directory; Modules BB and QQ (call HH), Modules BBn and QQn (call HHn), and Modules HH and HHn (call DD).

– Module HH's call argument interface is changed and becomes Module HHn
– Module HHn is assigned an unused Call Directory entry
– Modules HH, BB, and QQ are recompiled. Module HHn is loaded first.
– Module BBn is loaded next and replaces Module BB's Call Directory ptr
– Module QQn is loaded last and replaces Module QQ's Call Directory ptr
– Module BB's, QQ's, and HH's space is reclaimed later

Module load sequence is important so that the new module, HHn, is not called before being loaded

Dynamic Module Replacement Summary:
• Guidelines:
– Operations encapsulated in one re-entrant module:
  • Main function, initialization, termination, and recovery
– Call Directory entries are NOT reused
– System administrator controls sequence and timing:
  • No external interface changes, internal changes only: easy
  • Calls to new modules or new module interface: assign unused Call Directory slots and load new modules first
  • Data structure changes: recompile and reboot recommended
– For replacement, module initialization just replaces its pointer in the Call Directory and does not initialize any structures. Termination is NOT called.

• Experience:
– Interface macros used to determine which modules to re-compile
– Concept and method easy to explain & understand
– Module replacement also used to back out bad changes

Dynamic Module Loading Objectives:
– Improve storage space management by loading modules as needed
– Activate functions dynamically as needed:
  • Virtual address spaces (paging support)
  • Multiple CPU support (2 to N processors)
  • Various I/O devices
– Dynamically adjust system based on existing hardware
– Dynamically adjust system based on Engineering Do's and Don'ts
– Non-operator control of module loading, driven by system/program needs

Dynamic Module Loading: Operation

Figure: Call Directory and Module Directory (entries AA, BB, CC, DD, ..., EE); Module JJ calls Module EE, Module II handles the resulting interrupt, and Module EE is loaded.

– Module Directory is a one-to-one Module name map of the Call Directory
– Call Directory pointers initialized to point to Module Directory entries
– An interrupt occurs when Module JJ calls Module EE
– Module II verifies the interrupt and loads the indicated module, Module EE
– Module EE's initialization is called and sets its ptr in the Call Directory
– Module II restores Module JJ's interrupt status and re-invokes the call

Performance penalty only when a module is called the first time

The system will automatically adjust to the needs of the programs
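A sketch of load-on-first-call, assuming a Python exception stands in for the program interrupt raised when a call lands on a Module Directory entry, and a dict of initialization routines stands in for the system disk. `ModuleDirEntry`, `NotLoaded`, `library`, `call_dir`, `ee_init`, and `call` are hypothetical; `call` plays the role of Module II.

```python
from typing import Any, Callable, Dict, List


class NotLoaded(Exception):
    """Stands in for the interrupt taken when a call reaches a Module Directory entry."""


class ModuleDirEntry:
    """Module Directory entry: holds only the module name; it is not executable."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __call__(self, *args: Any) -> Any:
        raise NotLoaded(self.name)


library: Dict[str, Callable[[], None]] = {}   # hypothetical "disk": name -> init routine
call_dir: Dict[str, Callable[..., Any]] = {}
load_log: List[str] = []


def ee_init() -> None:
    # Module EE initialization sets its pointer in the Call Directory.
    call_dir["EE"] = lambda x: x + 1


library["EE"] = ee_init
call_dir["EE"] = ModuleDirEntry("EE")         # initially points at its Module Directory entry


def call(name: str, *args: Any) -> Any:
    entry = call_dir[name]
    try:
        return entry(*args)
    except NotLoaded:
        if not isinstance(entry, ModuleDirEntry):
            raise                             # raised deeper inside a loaded module
        load_log.append(entry.name)           # verify the interrupt, load the module
        library[entry.name]()                 # run its initialization
        return call_dir[name](*args)          # re-invoke the original call


# Usage: Module JJ calls EE twice; only the first call pays the load cost.
print(call("EE", 1), call("EE", 5), load_log)  # 2 6 ['EE']
```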

Dynamic Module Loading: Boot & Load List

Figure: System Disk holding the Boot Program, the Data File (Call Directory, Module Directory), the Load List, and the Code Modules.

Load List is a text file containing names of main system modules, for example:

    AA UU CC GG SS

– Boot Program is loaded:
  - Storage is scanned
  - Locations defined
– Data file is loaded:
  - Call Directory
  - Module Name Directory
– Load List is loaded
– Code Modules, specified in the Load List, are loaded
– Each module initialization is called to set Call Directory pointers

Data File is a compiled Call Directory with a Module Name Directory

Dynamic Module Loading Summary:
• Guidelines:
– No external references in modules
– Operations encapsulated in one re-entrant module:
  • Main function, initialization, termination, and recovery
– System activity determines which modules are loaded
– Module initialization code checks whether its Call Directory pointer points at an entry in the Module Directory or at a code module:
  • If a code module, then this is Module replacement. Set the module pointer in the Call Directory
  • If not a code module, then this is Module loading. Build structures, load other needed modules, set values in the Call Directory, and finally, set the module pointer in the Call Directory
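The initialization check can be sketched together with a minimal boot loop over the Load List. Everything here is hypothetical (`ModuleDirEntry`, `make_init`, `boot`, `structures`, `events`); loading other needed modules and setting other Call Directory values are left out, and a Python `isinstance` test stands in for checking whether the pointer addresses the Module Directory or a code module.

```python
from typing import Callable, Dict, List


class ModuleDirEntry:
    """Placeholder a Call Directory slot points at until the module is loaded."""

    def __init__(self, name: str) -> None:
        self.name = name


call_dir: Dict[str, object] = {}
structures: Dict[str, list] = {}      # hypothetical per-module data structures
events: List[str] = []


def make_init(name: str, code: Callable) -> Callable[[], None]:
    """Build a hypothetical initialization routine for module `name`."""
    def init() -> None:
        if isinstance(call_dir[name], ModuleDirEntry):
            # Module loading: build structures first, publish the pointer last.
            structures[name] = []
            events.append(f"load {name}")
        else:
            # Module replacement: structures already exist; just swap the pointer.
            events.append(f"replace {name}")
        call_dir[name] = code
    return init


def boot(load_list: str, inits: Dict[str, Callable[[], None]]) -> None:
    """Point every slot at its Module Directory entry, then init each listed module."""
    for name in inits:
        call_dir[name] = ModuleDirEntry(name)
    for name in load_list.split():
        inits[name]()


inits = {n: make_init(n, lambda n=n: n) for n in ["AA", "UU", "CC", "GG", "SS", "EE"]}
boot("AA UU CC GG SS", inits)
make_init("CC", lambda: "CCn")()       # operator loads a replacement for CC
print(events[-2:])  # ['load SS', 'replace CC']
```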

Experience:
– Concept and method easy to explain & understand

Overall Summary:
• Guidelines:
– No external references allowed!
– Each code module contains function mainline, initialization, termination, and undo recovery code
– Programmer writes/fixes all code in a code module

Experience:
– Concepts and methods easy to explain & understand
– Chief designer must own/write the documentation and the:
  • Call Directory
  • Module name list
  • Module header structure
  • Task Control block structure
– Not needing a lock hierarchy allowed concurrent design and implementation, and made it easy to add new functions and to make code flow changes for performance

Application to Other Systems:
• Problem:
– Existing investment in old software
– Designers & programmers used to current procedures
– No desire to redo existing code

• Steps:
– Set up Call Directory
– Remove external references from code modules

• Experience:
– Concepts and methods easy to explain & understand
– Build times shortened
– Storage requirements reduced
– New functions possible due to allowing recursion

Documents
