13
RAS Round-up From the IBM Linux Technology Centre IBM Linux Technology Centre Richard J. Moore C.Eng, MIEE, MIEEE, BSc [email protected] 14th June 2002 (v8) UKUUG 2002 Bristol 5th July 12:40 Topics 1. Dynamic Probes 2. Kernel Hooks (GKHI) 3. Linux Event Logging for the Enterprise 4. Flexible Dump 5. System Trace 6. Community Adoption 7. Miscellaneous 8. What's Next 9. The Team - Contacts

RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre IBM Linux Technology Centre Richard J. Moore C.Eng, MIEE, MIEEE, BSc [email protected]

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre IBM Linux Technology Centre Richard J. Moore C.Eng, MIEE, MIEEE, BSc richardj_moore@uk.ibm.com

RAS Round-up From the IBM Linux Technology Centre

IBM Linux Technology Centre

Richard J. Moore C.Eng, MIEE, MIEEE, BSc

[email protected]

14th June 2002 (v8)

UKUUG 2002Bristol

5th July 12:40

Topics1. Dynamic Probes2. Kernel Hooks (GKHI)3. Linux Event Logging for the Enterprise4. Flexible Dump5. System Trace6. Community Adoption7. Miscellaneous8. What's Next9. The Team - Contacts

Page 2: RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre IBM Linux Technology Centre Richard J. Moore C.Eng, MIEE, MIEEE, BSc richardj_moore@uk.ibm.com

Dynamic Probes (DProbes)

1.1 Dynamic Probes - What is it?

Low-level Universal DebuggerOperates in extreme conditionsKernel/User, Interrupt/Task, Code/DataFor Service/Support Engineers on Production SystemsMonitors Low-level System ResourcesDynamic & Fully AutomatedTrigger/Enabler for:KDB, LKCD,LTT,evlog, Core Dump, Syslog, etc

1.2 Dynamic Probes - Where Used?

Page 3: RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre IBM Linux Technology Centre Richard J. Moore C.Eng, MIEE, MIEEE, BSc richardj_moore@uk.ibm.com

1.2 Dynamic Probes - Where Used?

Service/Support Engineer's FacilityLive SystemsNon-recreatable ProblemsNo source modification requiredTiming Sensitive Problems

Developer's ToolAlternative to temporary printk/printfApplication, Driver, Kernel etc.Timing Sensitive Problems

Test ToolFault Injection

1.3 Dynamic Probes - Mechanics

1.3 Dynamic Probes - Mechanics

Global Breakpoint ProbesIn-core code modificationTrack by Inode-OffsetAvoid COW page privatization using physical addressUnlimited Concurrent Probes

Global Watchpoint ProbesUses Debug RegistersTrack by Virtual Address

Pre-probe Script Driven Probe HandlerRPN assembler language interpreterHLL C-like Compiler

2 Kernel Hooks (GKHI)

Page 4: RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre IBM Linux Technology Centre Richard J. Moore C.Eng, MIEE, MIEEE, BSc richardj_moore@uk.ibm.com

Kernel Hooks (GKHI)

112

2

3

4

567

89

1011

2.1 Kernel Hooks - What are they?

Code locations where added function may be inserted

Supplement or replace standard function - subclassing

Function may not be known at build or run time

Function may load later therefore simple call cannot be used

Kernel has a particular need to implement hooks

Used by DProbes

2.2 Why not Patch?

Page 5: RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre IBM Linux Technology Centre Richard J. Moore C.Eng, MIEE, MIEEE, BSc richardj_moore@uk.ibm.com

2.2 Why not Patch?

InconvenientMultiple patches may require manual rework

InflexibleCannot select additional functions at run-time

Code BloatAdditional functions always presentObscures the prime function

2.3 Basic Requirements

2.3 Basic Requirements

Multiple hooks to co-exist within a module

Shared use of a hook by multiple exits

Sole use of a hook by a specific exit

Minimal impact to performance when a hook is unused

Exit must be able to operate as if inserted:Have access to local variablesTerminate the function

Group of exits able to insert atomically

Need a Managed Interface2.4 The Managed Interface

Page 6: RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre IBM Linux Technology Centre Richard J. Moore C.Eng, MIEE, MIEEE, BSc richardj_moore@uk.ibm.com

2.4 The Managed Interface

For Hooked Code:A HOOK macro - indicate the hook locationhook_initialise - allows use of the hookhook_terminate - disallows use of the hook

For Hook Exits:hook_register - identifies exit routine and priorityhook_arm - activates group of exitshook_disarm - deactivate group of exitshook_deregister - removes exit from interface

3 Linux Event Logging

Linux Event Loggingfor the

Enterprise(evlog)

Page 7: RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre IBM Linux Technology Centre Richard J. Moore C.Eng, MIEE, MIEEE, BSc richardj_moore@uk.ibm.com

3.1 evlog - What is it?

Comprehensive Logging CapabilityComplies with draft POSIX SRASS standardPOSIX APIsStructured Event RecordsOptionally Captures Syslog and Klog messagesLogs Binary and Text MessagesUser and Kernel SpaceTask, Init & Interrupt TimeEvent Notification - Automation, System ManagementEvent FilteringLog ManagementAfter-the-fact Formatting

3.2 evlog - Where Used

112

2

3

4

567

89

1011

3.2 evlog - Where Used

Device Driver HardeningAutomated RecoveryOn-line Diagnostic ActionSystem Management

Instrumentation SchemesWrapper macrosEase of Implementation

4 Flexible Dump

Page 8: RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre IBM Linux Technology Centre Richard J. Moore C.Eng, MIEE, MIEEE, BSc richardj_moore@uk.ibm.com

Flexible Dump

4.1 Flexible Dump - What is it?

Goals for a Comprehensive System Dump Non-disruptive - Snapshot CapableSystem and (multiple) User Context VisibilityMinimal System DependenceStand-alone CapableCustomisable Dumping - Virtual & Physical Memory Ranges, Objects, Processor Resources etc.Multiple triggers: Exception Kernel/User, API, NMI/KBD InterruptAccess to Swapped DataDump Space/Repository ManagementProgrammable formatterSMP CapableSupport for Alternative Dump Devices (Telco)

4.2 Flexible Dump - Where is it?

Page 9: RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre IBM Linux Technology Centre Richard J. Moore C.Eng, MIEE, MIEEE, BSc richardj_moore@uk.ibm.com

4.2 Flexible Dump - Where is it?

Contributions to LKCDSnapshot Dump - DProbes interfaceNon-disruptiveCustom Dump ObjectsMinimal System DependenceSMP fixes + multiple CPU status saving

Working with LKCD Community

5 System Trace

System Trace

Page 10: RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre IBM Linux Technology Centre Richard J. Moore C.Eng, MIEE, MIEEE, BSc richardj_moore@uk.ibm.com

5.1 System Trace - What is it?

Generic Trace Recording Mechanism

Community contributions to:Linux Trace Toolkit (Opersys)Dynamic Trace - DProbes interfaceFormatting exit for RAW trace data

Supporting Similar efforts in:Linux Kernel State Trace (LKST) - Hitachi

5.2 Tracing Initiatives

112

2

3

4

567

89

1011

5.2 Tracing InitiativesBuffering:

Per CPUPer ComponentZero LockingVariable Length

ControlSuspend/ResumeGlobal Activate/Deactivate

InstrumentationKernelDriversUser-space SubsystemsFine-grained Dynamic Trace

Formatting

6 Community Adoption

112

2

3

4

567

89

1011

Page 11: RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre IBM Linux Technology Centre Richard J. Moore C.Eng, MIEE, MIEEE, BSc richardj_moore@uk.ibm.com

Community Adoption

6.1 Adoption Initiatives

Establishing a RAS Community - OLS RAS BoFs

Minimise Fragmentation - Maximise Contribution

Canvassing Distributors

POSIX

Instrumentation - standards, aids, implementation

Porting & Currency

Preparation for 2.5 Kernel

7 Miscellaneous

Page 12: RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre IBM Linux Technology Centre Richard J. Moore C.Eng, MIEE, MIEEE, BSc richardj_moore@uk.ibm.com

7 Miscellaneous

KDBComplex Breakpoints - DProbes InterfaceTwo Patches Accepted

KernelDebug Register Allocation Patch (Dprobes/KDB/gdb)

8 What's Next?

8 What's Next?

Log/Trace Instrumentation of Kernel and Device DriversWe need participation from the Community

DProbes ports for IA64 and zSeriesTurbo Linux release of RAS UtilitiesSampler Probe type for ProfilingDProbes HLL CompilerDump User ContextsKDB User ContextsMission Critical mcore Integration with LKCDOn-line Diagnostics FrameworkFirst Failure System TechnologyPerformance Co-PilotRAS Community BOF at OLS 9 The Team - Contacts

Page 13: RAS Round-up From the IBM Linux Technology CentreRAS Round-up From the IBM Linux Technology Centre IBM Linux Technology Centre Richard J. Moore C.Eng, MIEE, MIEEE, BSc richardj_moore@uk.ibm.com

9 The Team - Contacts

End of Presentation

India:

Suparna BhattacharyaS. VamsikrishnaSubodh SoniBharata B. Rao

USA:

Larry KesslerJames KenistonHaren MyneniHien Q NguyenMike SullivanMichael MasonThomas ZanussiDaniel StekloffDavid OleszkiewiczTom Hanrahan (Manager)

Mailing List: [email protected] Page: http://systemras.sourceforge.net/Richard J Moore: [email protected]

UK:

Richard J Moore (Architect)

Trademarks

Trademarks

IBM, zSeries and S/390 are trademarks of the International Business Machines Corporation in the United States and other countries.

IA32 and IA64 are abbreviations Pentium 32-bit and Itanium 64-bit architectures of the Intel Corporation.