
ERO EA Database - NERC 2012 Cause Analysis and D… · Sam Holeman, Duke Energy, Chair of EA Subcommittee



Event Analysis: ERO Event Analysis Data Base

August 1, 2012

Sam Holeman, Duke Energy, Chair of EA Subcommittee


Stuff Happens

2 RELIABILITY | ACCOUNTABILITY


Stuff Happens

My Daughter’s Wedding



Goals of the EA Process

• Promoting Reliability
• Developing a Culture of Reliability Excellence
• Collaboration
• Being a Learning Organization
• HELPING SYSTEM OPERATORS IMPROVE
• IT IS NOT COMPLIANCE



History

• EA Field Test began in October 2010
• Phase II Field Test began in May 2011
• ERO Process approved in February 2012



Categorization of Events

Five categories: CAT 1, CAT 2, CAT 3, CAT 4, CAT 5

• Based on varying levels of significance
• Impacts on the interconnected system



Brief Report Template

• Draft within 5 business days
• Final Report in 10 days
• Quality steadily improving
• Focus on collaboration with Regions
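To make the Brief Report timeline concrete, here is a minimal sketch that computes due dates from an event date. The helper names are hypothetical, holidays are ignored, and because the slide does not say whether the 10 days for the final report are business days or where they are counted from, the sketch assumes business days counted from the event.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Step forward `days` weekdays (Mon-Fri) from `start`; holidays are ignored."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # weekday(): Monday=0 ... Sunday=6
            remaining -= 1
    return current


def brief_report_due_dates(event_day: date) -> dict[str, date]:
    """Hypothetical helper returning draft and final due dates for a Brief Report."""
    return {
        # Draft within 5 business days (as stated on the slide)
        "draft": add_business_days(event_day, 5),
        # Final report in 10 days: assumed business days, counted from the event
        "final": add_business_days(event_day, 10),
    }


print(brief_report_due_dates(date(2012, 8, 3)))  # event on a Friday
```

For an event on Friday, August 3, 2012, this gives a draft due August 10 and a final report due August 17. Swapping in a holiday calendar would only change the weekday check.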



Event Analysis Report

• CAT 3 and above require more information
• Timeline established but negotiable
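The category threshold can be expressed as a small lookup. This is a sketch under the assumption that every qualifying event receives a Brief Report and that CAT 3 and above additionally need an Event Analysis Report (EAR) with more information; `EventCategory` and `required_reports` are hypothetical names.

```python
from enum import IntEnum


class EventCategory(IntEnum):
    """The five EA categories, ordered by significance."""
    CAT1 = 1
    CAT2 = 2
    CAT3 = 3
    CAT4 = 4
    CAT5 = 5


def required_reports(category: EventCategory) -> list[str]:
    """Assumption: Brief Report always; an EAR is added for CAT 3 and above."""
    reports = ["Brief Report"]
    if category >= EventCategory.CAT3:
        reports.append("Event Analysis Report")
    return reports


print(required_reports(EventCategory.CAT2))
print(required_reports(EventCategory(4)))
```

Using an `IntEnum` keeps the "and above" comparison a plain integer comparison while rejecting category numbers outside 1 to 5.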



Events by Category



Event Analysis Report (EAR) Submittals

[Bar chart: EAR submittals by Region (FRCC, MRO, NPCC, RFC, SERC, SPP, TRE, WECC), October 25, 2010 - February 25, 2012; 28 EARs submitted since end of Field Trial, 121 total]


Candidate Lessons Learned

Not every event on the bulk power system (BPS) has a quality “Lesson” to share

• NERC looked at 230 qualifying events (Category 1 and above) and received 119 “candidates” for Lessons Learned; 55 of these came from the Cold Snap event of 2011

• Excluding the Cold Snap event, there were 64 other events which resulted in a Lesson Learned being submitted for consideration

• Twenty-two Lessons Learned published in 2011, and thirteen to date in 2012


Lessons Learned – Published (2012)

Region | Lesson Learned – Brief Description | Date
TRE | TRE-LL-05 – Plant Onsite Material and Personnel Needed for a Winter Weather Event | 1/06/2012
TRE | TRE-LL-06 – Plant Operator Training to Prepare for a Winter Weather Event | 1/06/2012
TRE | TRE-LL-07 – Transmission Facilities and Winter Weather Operations | 1/06/2012
NPCC | LL-54 – DC Supply and AC Transients | 3/06/2012
WECC | LL-58 – Saturated Bus Auxiliary Current Transformer Causes Bus Differential Operations during Line Fault | 3/06/2012
TRE | TRE-LL-34 – Rotational Load Shed | 3/06/2012
WECC | LL-59 – Auxiliary Relay Contact Contamination | 6/19/2012
WECC | LL-60 – Remote Terminal Units not on DC Sources | 6/19/2012
WECC | LL-61 – EMS Database Corruption Problem | 6/19/2012
WECC | LL-62 – Unmanned Forklift Contact with Energized Bus | 6/19/2012
RFC | LL-65 – Excessive Resource Utilization | 6/19/2012
TRE | LL-66 – Alarm Interpretation Leads to Generator Stator Coil Failure | 6/19/2012
NPCC | LL-67 – Protective Relaying Digital Input Board Loading | 6/19/2012
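As a quick cross-check, the 2012 table can be tallied by Region. This is a minimal sketch that transcribes the Region and Lesson Learned ID columns from the table into a list (the variable names are illustrative only); the total should equal the thirteen Lessons Learned published to date in 2012.

```python
from collections import Counter

# (Region, Lesson Learned ID) pairs transcribed from the 2012 table
PUBLISHED_2012 = [
    ("TRE", "TRE-LL-05"), ("TRE", "TRE-LL-06"), ("TRE", "TRE-LL-07"),
    ("NPCC", "LL-54"), ("WECC", "LL-58"), ("TRE", "TRE-LL-34"),
    ("WECC", "LL-59"), ("WECC", "LL-60"), ("WECC", "LL-61"),
    ("WECC", "LL-62"), ("RFC", "LL-65"), ("TRE", "LL-66"),
    ("NPCC", "LL-67"),
]

# Count published Lessons Learned per Region
by_region = Counter(region for region, _ in PUBLISHED_2012)
print(sorted(by_region.items()))
print(sum(by_region.values()))
```

TRE and WECC account for five each, NPCC for two, and RFC for one.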


Event Trending *

[Control chart: qualified events per month, October 25, 2010 - June 25, 2012; monthly average = 14.05 events; control limits at 8.30 and 19.80]

* Control chart of monthly events, with control limits calculated using the 3-month Moving Average method
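The footnote names a 3-month moving average method but does not say how the process spread was estimated. The sketch below assumes a common textbook approach: the center line is the mean of the monthly counts, sigma is estimated from the average two-point moving range divided by the constant d2 = 1.128, and the limits for a 3-month moving average are the center ± 3·sigma/√3. The function names and the sample counts are hypothetical and are not the data behind the chart.

```python
import math
from statistics import mean

# d2 bias-correction constant for moving ranges of two consecutive points
D2_SPAN2 = 1.128


def moving_averages(counts, window=3):
    """Trailing moving averages over `window` consecutive monthly counts."""
    return [mean(counts[i - window + 1 : i + 1]) for i in range(window - 1, len(counts))]


def moving_average_limits(counts, window=3):
    """Return (center, lcl, ucl) for a moving-average control chart.

    Assumption: sigma is estimated as MR-bar / d2 from two-point moving ranges.
    """
    if len(counts) < max(window, 2):
        raise ValueError("need at least `window` (and at least 2) monthly counts")
    center = mean(counts)
    moving_ranges = [abs(b - a) for a, b in zip(counts, counts[1:])]
    sigma_hat = mean(moving_ranges) / D2_SPAN2
    half_width = 3 * sigma_hat / math.sqrt(window)
    # Event counts cannot be negative, so the lower limit is floored at zero.
    return center, max(0.0, center - half_width), center + half_width


monthly = [10, 14, 12, 16, 18, 14]  # hypothetical monthly event counts
print(moving_average_limits(monthly))
print(moving_averages(monthly))
```

A 3-month average that falls outside the limits would then flag a month worth investigating. With a different sigma estimate the limits change, so this should not be read as a reproduction of the chart's 8.30 and 19.80.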


Cause Code Definitions

Short Title | Definition
Design/Engineering Problem | An event or condition that can be traced to a defect in design or other factors related to configuration, engineering, layout, tolerances, calculations, etc.
Equipment/Material Problem | An event or condition resulting from the failure, malfunction, or deterioration of equipment or parts, including instruments or material.
Individual Human Performance LTA | An event or condition resulting from the failure, malfunction, or deterioration of the individual human performance associated with the process.
Management Problem | An event or condition that could be directly traced to managerial actions or methodology (or lack thereof).
Communications LTA | Inadequate presentation or exchange of information.
Other Problem | The problem was caused by factors beyond the control of the organization.

LTA = Less Than Adequate


A-Level Cause Code

Root Cause Determinations (of 127 total "Qualified" events with cause codes entered)

[Pie chart: share of events by A-level cause code: Design/Engineering Problem, Equipment/Material Problem, Individual Human Performance LTA, Management Problem, Communication LTA, Other Problem, No Causes Found, Information to Determine Cause LTA]

37% of the reports did not contain sufficient information to determine causal factors.

NERC has “Cause Coded” 174 Qualified Events (as of 6-25-2012). Of these events, we were able to assign some type of “Root Cause” coding for 127 events (~73%).
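The counts on this slide can be checked against the next one. This is a small worked sketch, assuming the 37% "insufficient information" share is taken over the 127 events with cause codes entered; under that assumption about 47 events lack enough information, leaving the 80 events whose root causes are broken out on the following slide. Variable names are illustrative.

```python
cause_coded = 174             # Qualified Events cause coded as of 6-25-2012
with_root_cause_coding = 127  # events assigned some type of root-cause coding
insufficient_share = 0.37     # assumed to be a share of the 127 events

coverage_pct = 100 * with_root_cause_coding / cause_coded
insufficient = round(insufficient_share * with_root_cause_coding)
identified = with_root_cause_coding - insufficient

print(f"{coverage_pct:.1f}% of cause-coded events received root-cause coding")
print(f"{insufficient} lacked information; {identified} had identified root causes")
```

127 of 174 is 72.99%, and 37% of 127 rounds to 47 events.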


Identified Root Causes

[Pie chart: identified root causes for 80 events by A-level cause code: Design/Engineering Problem, Equipment/Material Problem, Individual Human Performance LTA, Management Problem, Communication LTA, Other Problem; two slices are broken out in the deeper dive charts]

Root causes identified for 80 events.


Deeper Dive into Management

"Management Problem" Cause Factors

[Bar chart: "Management Problem" cause factors by code; x-axis: A4B3C08, A4B5C04, A4B1C04, A4B1C05, A4B1C06, A4B5C05, A4B1C03, A4B1C08, A4B1C09, A4B3C09, A4B5C02, A4B5C03]

• A4B3C08 = Job scoping did not identify special circumstances or conditions
• A4B5C04 = Risks/consequences associated with change not adequately reviewed
• A4B1C04 = Management follow-up did not identify problems
• A4B1C05 = Management assessment did not determine cause of previous event or known problem
• A4B1C06 = Previous industry or in-house experience was not effectively used to prevent recurrence
• A4B5C05 = System interactions not considered


Deeper Dive into Equipment

"Equipment/Material Problem" Cause Factors

[Bar chart: "Equipment/Material Problem" cause factors by code; x-axis: A2B6C01, A2B6C07, A2B3C03, A2B6C04, A2B6C06, A2B5C02, A2B3C02, A2B5C04]

• A2B6C01: Defective or failed part
• A2B6C07: Software failure
• A2B3C03: Post-maintenance/post-modification testing LTA
• A2B6C04: End-of-life failure
• A2B6C06: Contaminant
• A2B5C02: Fabricated item did not meet requirements
• A2B3C02: Inspection/testing LTA
• A2B5C04: Product acceptance requirements LTA
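The cause codes in the two deeper dive charts share a hierarchical shape (A-level, then B-level, then C-level). The sketch below assumes that layout from the codes shown here (for example A4B3C08) and maps only the two A-level families these slides confirm, A2 for Equipment/Material Problem and A4 for Management Problem; the regex, dictionary, and function names are hypothetical. It groups the Management codes by B-level, which counts distinct codes rather than events.

```python
import re
from collections import Counter

# Assumed layout inferred from codes such as A4B3C08:
# one-digit A-level, one-digit B-level, two-digit C-level.
CAUSE_CODE = re.compile(r"A(\d)B(\d)C(\d{2})")

# Only the A-level families confirmed by these slides; others stay unmapped.
A_LEVEL_TITLES = {"A2": "Equipment/Material Problem", "A4": "Management Problem"}


def parse_cause_code(code: str) -> tuple[str, str, str]:
    """Split a code into its A-, B- and C-level prefixes, e.g. ('A4', 'A4B3', 'A4B3C08')."""
    match = CAUSE_CODE.fullmatch(code.strip())
    if match is None:
        raise ValueError(f"unrecognized cause code: {code!r}")
    a, b, _ = match.groups()
    return f"A{a}", f"A{a}B{b}", match.group(0)


def a_level_title(code: str) -> str:
    return A_LEVEL_TITLES.get(parse_cause_code(code)[0], "unmapped A-level")


management_codes = (
    "A4B3C08 A4B5C04 A4B1C04 A4B1C05 A4B1C06 A4B5C05 "
    "A4B1C03 A4B1C08 A4B1C09 A4B3C09 A4B5C02 A4B5C03"
).split()

# Distinct codes per B-level group (not event counts)
by_b_level = Counter(parse_cause_code(c)[1] for c in management_codes)
print(by_b_level.most_common())
print(a_level_title("A2B6C07"))
```

Grouping at the B-level is one way to see which branch of a family the listed factors cluster in before looking at event counts.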


NERC Alert-Advisory

Configuration Control Practices – Advised industry of events resulting from human performance errors during protection system maintenance. Event examples of inadequate control procedures:

1. Relay technician did not follow proper procedure to return the protection system to its normal state, resulting in a remote trip

2. Construction team failed to use the latest construction document, resulting in incorrect calibration of equipment

3. Relay technician leaves the work site, then returns to resume work but does so at the wrong cabinet and trips the substation

4. Technician trips a transformer by opening the wrong current shorting switch


NERC Alert-Advisory

• EMS Alert Advisory Analysis: During the Event Analysis (EA) field trial, 28 Category 2b events occurred in which a complete loss of SCADA/EMS lasted for more than 30 minutes. Analysis is currently being conducted to identify emerging trends for the industry.

• Current analysis of these events has shown:

  - Software failure is a major contributing factor in 50 percent of the events.
  - Equipment testing has been a factor in over 40 percent of the failures:
    o Test environment did not match the production environment
    o Product design (less than adequate)
  - Change management has had an impact in over 50 percent of the failures:
    o Risks and consequences associated with change not properly managed
    o Identified changes not implemented in a timely manner
  - Individual operator skill-based error was involved in 15 percent of the events.
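The Category 2b description used in this advisory reduces to a simple check. This is a minimal sketch based only on the criterion quoted on this slide (a complete loss of SCADA/EMS lasting more than 30 minutes); the helper name is hypothetical, and the other EA categorization criteria are not modeled.

```python
from datetime import timedelta

# Threshold quoted on the slide: complete loss of SCADA/EMS for more than 30 minutes
EMS_LOSS_THRESHOLD = timedelta(minutes=30)


def is_category_2b_ems_loss(complete_loss: bool, duration: timedelta) -> bool:
    """Hypothetical helper: True when an outage matches the slide's 2b description."""
    # "More than 30 minutes" is strict, so exactly 30 minutes does not qualify.
    return complete_loss and duration > EMS_LOSS_THRESHOLD


print(is_category_2b_ems_loss(True, timedelta(minutes=45)))
print(is_category_2b_ems_loss(False, timedelta(hours=2)))
```

A partial loss of SCADA/EMS, however long, does not meet this particular description.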


Solving Problems: Untying the Knot


Malcolm K. Sparrow John F. Kennedy School of Government, Harvard University


The Way Ahead

• Process must continually improve
• Need to combine processes when possible
• Better follow-up as needed
• Tie in other data sources
• Provide not just data but information to industry

Not every event results in a succinct lesson learned, but we learn from every event. 




Safety Check

Peer Check



The Way Ahead

• EAS Focus
• EA Process Document Annual Update
• EMS SCADA Task Force
• Registered Entity Reports to OC
• Summary of current lessons learned for OC
• Human Performance/Cause Code Task Force

Goal – HELP OPERATOR ON SHIFT GET BETTER 




Questions and Answers
