Direct Cause vs Root Cause
“A Problem Solving Concept”
INCOSE Enchantment Chapter Meeting
March 14, 2007
Dr David E. Peercy
Sandia National Laboratories
Department 12341, Weapon System and Software Quality
March 14, 2007 Direct Cause vs Root Cause INCOSE Chapter Meeting
Presentation Objective
Events have many potential “causes”. We tend to think of “causes” as related mostly to “unwanted” events – but in effect, all events that occur have “causes” – that is, the reason that the event occurs.
The objective of this short presentation/discussion is to gain a better understanding of why it is important to understand the difference between “direct” causes and “root” causes of events.
In so doing, we enhance our capability to influence a much larger class of events – both in preventing unwanted events and ensuring wanted events actually do occur.
An Example of a Problem
USAF F-22A jets grounded by software glitch
(Jeremy Epstein <jepstein@webmethods.com>, Fri, 23 Feb 2007 15:55:52 -0500)
Navigational systems failed, planes forced to return to Hawaii [visually having to follow their tankers to safety].
The problem turns out to be software (no surprise there). Fix created, "verified", installed, and they're off again.
[Direct or Root Cause addressed?]
A spokesman for Lockheed Martin this week insisted that the navigation software problem was minor. 'The issue was quickly identified in a matter of days and a fix installed in the airplanes, which were flown successfully to Japan,' he said. 'There are 87 of these exceptional fighters and they are out there performing exceptionally well, and their pilots continue to fly them in new and greater ways.'
Examples to Test Our Understanding
Army Training Accident, June 2002
Friendly Fire Deaths, March 2002
Medical “Direct/Root” Cause Determinations
RESOURCE: http://catless.ncl.ac.uk/Risks
Peter Neumann, Stanford University Professor
The RISKS site provides a voluminous list of risks, many of which are computer/software related. It is primarily interested in security and safety risks; summaries are provided with links to more detail.
A Simple Example
Assume each of these factors is as described below:
e: car will not start
d: battery is dead
c: alternator does not function
b: alternator is well beyond its designed service life
a: car is not being maintained according to recommended service schedule
Direct Cause?
Intermediary Causes?
Root Cause?
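One way to work the exercise is to write the factors down as a chain and read the labels off by position. The following is a minimal Python sketch, not part of the original presentation: `CAUSED_BY` and `classify_chain` are hypothetical names, and the sketch assumes each effect has exactly one known cause, which real events rarely do.

```python
# Hypothetical encoding of the slide's factors: each effect maps to its cause.
CAUSED_BY = {
    "e: car will not start": "d: battery is dead",
    "d: battery is dead": "c: alternator does not function",
    "c: alternator does not function":
        "b: alternator is well beyond its designed service life",
    "b: alternator is well beyond its designed service life":
        "a: car is not being maintained according to recommended service schedule",
}


def classify_chain(event, caused_by):
    """Walk back from the event and label each cause by its position."""
    chain = []
    seen = {event}
    current = event
    while current in caused_by:
        current = caused_by[current]
        if current in seen:
            raise ValueError("causal chain contains a cycle")
        seen.add(current)
        chain.append(current)
    if not chain:
        return {"direct": None, "intermediate": [], "root": None}
    # First cause found is the direct cause; the last one is the root
    # (as far as this data goes); everything in between is intermediate.
    return {"direct": chain[0], "intermediate": chain[1:-1], "root": chain[-1]}


result = classify_chain("e: car will not start", CAUSED_BY)
for role, value in result.items():
    print(role, "->", value)
```

Under this reading, d is the direct cause, c and b are intermediate causes, and a is the root cause, although asking why the car is not being maintained might push the root deeper still.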
Error, Fault/Defect, Failure
Error
– a human action or lack of action that results in the inclusion of a fault in a product or the way it is used
– the variance between expected and actual results
Fault/Defect
– an accidental condition that causes a product to fail to perform its required function if encountered during operational use
Failure
– an event in which a product does not perform a required function within its specified limits during operational use
ERROR may lead to FAULT/DEFECT, which may lead to FAILURE or, with FAULT TOLERANCE, may lead to NO FAILURE or REDUCED EFFECT.
Direct Cause
Causes of events may be natural or man-made, active or passive, initiating or permitting, obvious or hidden.
Those causes that lead immediately to the effect are often called direct or proximate causes.
Examples of direct/proximate causes:
Equipment
• Arced
• Leaked
• Over-loaded
• Over-heated
Human
• Pushed incorrect button
• Fell
• Dropped tool
• Connected wires
Root Cause
Direct causes often result from another set of causes, which could be called intermediate causes, and these may be the result of still other causes.
When a chain of cause and effect is followed from a known end-state, back to an origin or starting point, root causes are found.
The process used to find root causes is called root cause analysis, a form of systematic problem solving.
A root cause is an initiating cause of a causal chain which leads to an outcome or effect of interest.
The Benefits of Problem Solving!
The usual purpose of attempting to find root causes is to solve a problem that has actually occurred, or to prevent a less serious problem from escalating to an unacceptable level (e.g., near-miss safety for aircraft).
The basic concept is that solving a problem by addressing root causes is ultimately more effective than merely addressing symptoms or direct causes.
That is, a “class” of problems may be solved/prevented by addressing root causes rather than just direct causes.
Basic Process - Continue to Ask Why!
Continue to ask “why” until you have reached:
1. Direct, Intermediate, and Root cause(s) - including all organizational factors that exert control over the design, fabrication, development, maintenance, operation, and disposal of the system.
2. A problem/cause that is not correctable by your organization; it may be promoted to a higher responsible organization.
3. Insufficient data to continue.
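The three stopping conditions above can be sketched as a loop. This is an illustrative Python sketch under stated assumptions, not a procedure from the presentation: the `why` mapping (cause to deeper cause, with `None` meaning no deeper cause was found), the `correctable_here` predicate, and the `Stop` names are all hypothetical.

```python
from enum import Enum


class Stop(Enum):
    ROOT_CAUSE_REACHED = "no deeper cause identified"
    NOT_CORRECTABLE_HERE = "not correctable by this organization; promote it"
    INSUFFICIENT_DATA = "insufficient data to continue"


def ask_why(event, why, correctable_here):
    """Keep asking "why" from `event` until one stopping condition is met.

    why: dict mapping an effect to its cause; a value of None means the
         analysis found no deeper cause; a missing key means no data.
    correctable_here: function(cause) -> bool (hypothetical judgment call).
    Returns (chain_of_causes, stop_reason).
    """
    chain = []
    current = event
    while True:
        if current not in why:
            return chain, Stop.INSUFFICIENT_DATA
        cause = why[current]
        if cause is None:
            return chain, Stop.ROOT_CAUSE_REACHED
        if cause == event or cause in chain:
            raise ValueError("causal chain contains a cycle")
        chain.append(cause)
        if not correctable_here(cause):
            return chain, Stop.NOT_CORRECTABLE_HERE
        current = cause


# Hypothetical data for illustration only.
why = {
    "car will not start": "battery is dead",
    "battery is dead": "alternator does not function",
    "alternator does not function": None,
}
chain, reason = ask_why("car will not start", why, lambda cause: True)
print(chain, reason.name)
```

Returning the reason for stopping alongside the chain keeps a record of whether the analysis ended at a root cause or was merely cut short.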
Example
Why-Causal Tree
[Figure: why-causal tree. Box labels: Undesired Outcome; Event #1; Event #2; Condition; Failed or Exceeded Barrier or Control. Each is followed by a WHY question (WHY Event #1 Occurred; WHY Event #2 Occurred; WHY Condition Existed or Changed; WHY Failed/Exceeded Barrier or Control), and each answer by further WHY questions. Levels are labeled PROXIMATE CAUSES, INTERMEDIATE CAUSES, and ROOT CAUSES.]
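A why-causal tree like this one can be held in a small recursive data structure. The sketch below is a hedged illustration in Python: the `Node` class, the function names, and the placeholder labels are hypothetical, and it assumes that the children of the outcome are the proximate causes and that the leaves are the deepest causes reached so far.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    whys: List["Node"] = field(default_factory=list)  # answers to "why?"


def proximate_causes(outcome):
    """Causes directly under the undesired outcome."""
    return [child.label for child in outcome.whys]


def root_causes(outcome):
    """Leaves of the tree: causes with no recorded deeper 'why'."""
    found = []

    def visit(node):
        if not node.whys:
            found.append(node.label)
        for child in node.whys:
            visit(child)

    for child in outcome.whys:
        visit(child)
    return found


# Placeholder labels for illustration only.
tree = Node("Undesired outcome", [
    Node("Event #1", [Node("Why A", [Node("Why A1")])]),
    Node("Failed or exceeded barrier or control", [Node("Why B")]),
])
print(proximate_causes(tree))
print(root_causes(tree))
```

A leaf is only a root cause to the extent that the analysis was carried far enough; a shallow branch may simply mean that nobody asked the next why.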
Example
[Figure: example cause diagram. Box labels: Lost High Speed Data Stream From Satellite (Mission Failure); Satellite Failed To Deploy Antenna; Poor Line of Sight; Thrusters Oriented Space Craft; Power Supply Failed; Battery Dead; Beyond Shelf Limit; Installed Improperly; Technician Used Wrong Method to Correct. Annotation: Root Cause is Much Deeper, Keep Asking Why.]
Potential Problem Analysis Tools
Failure Modes and Effects Analysis (FMEA)
– an inductive engineering technique used at the component level to define, identify, and eliminate known and/or potential failures, problems, and errors from the system, design, process, and/or service before they reach the customer
Fault Tree Analysis (FTA)
– FTA is a deductive analytical technique of reliability and safety analyses and generally is used for complex dynamic systems
Probabilistic Risk Assessment (PRA)
– PRA is a systematic, logical, and comprehensive discipline that uses tools like FMEA, FTA, Event Tree Analysis (ETA), Event Sequence Diagrams (ESD), Master Logic Diagrams (MLD), Reliability Block Diagrams (RBD), and so forth to quantify risk.
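To make "quantify risk" concrete, here is a minimal Python sketch of how a fault tree is commonly evaluated. It is an illustration, not a method taken from the presentation: it assumes basic events are statistically independent, it supports only AND and OR gates, the tuple representation and the `top_event_probability` name are hypothetical, and the probabilities are made up.

```python
import math


def top_event_probability(node):
    """Evaluate a fault tree bottom-up.

    node is either a number (probability of a basic event) or a tuple
    (gate, children) where gate is "AND" or "OR".
    Assumes all basic events are independent.
    """
    if isinstance(node, (int, float)):
        return float(node)
    gate, children = node
    probs = [top_event_probability(child) for child in children]
    if gate == "AND":
        # All inputs must occur.
        return math.prod(probs)
    if gate == "OR":
        # At least one input occurs: complement of "none occur".
        return 1.0 - math.prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate: {gate!r}")


# Made-up numbers for illustration only.
tree = ("OR", [("AND", [0.1, 0.2]), 0.05])
print(top_event_probability(tree))
```

Real fault trees often share basic events between branches, in which case the independence assumption breaks down and this simple product rule no longer applies.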
Summary
Direct Cause vs Root Cause
– Issue: level of problem solving
Problem Solving
– Direct Cause: objective is to solve an instance of a potential class of problems
– Root Cause: objective is to solve a class of problems
– Both are useful
Analysis Methods
– Methods exist to analyze events; the goal is to eliminate occurrence of unwanted events and ensure wanted events do occur
– FMEA, FTA, PRA
Q&A?
Examples
Army Training Accident
Incident
– Thu, 13 Jun 2002: two soldiers were killed in training at Ft Drum. They were firing artillery shells, and were relying on the output of the Advanced Field Artillery Tactical Data System. When they forgot to enter the target altitude, the system assumed an altitude of zero. (Ft Drum is at 676 ft.)
Direct Cause
– Soldiers forgot to enter the target altitude
Potential Root Cause(s)
– Software should not default to a valid altitude
– Software/System analysis and modeling/testing inadequate
– Software requirements not adequately specified
– System CONOPS not adequate
– Soldier training inadequate
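The first potential root cause, that a missing entry should not silently become a valid value, can be illustrated with a short Python sketch. This is a generic example and does not reflect how the actual fire-control software works: `parse_target_altitude_ft` and `MissingInputError` are hypothetical names.

```python
from typing import Optional


class MissingInputError(ValueError):
    """Raised when a required operator input was never entered."""


def parse_target_altitude_ft(entered: Optional[str]) -> float:
    """Convert the operator's altitude entry to a number.

    A blank or absent entry is rejected instead of being replaced by 0,
    because 0 ft looks like a legitimate altitude and hides the omission.
    """
    if entered is None or entered.strip() == "":
        raise MissingInputError("target altitude was not entered")
    return float(entered)


print(parse_target_altitude_ft("676"))  # an explicit entry is accepted
try:
    parse_target_altitude_ft("")        # a forgotten entry is refused
except MissingInputError as err:
    print("rejected:", err)
```

The design choice is to keep "not entered" distinct from "entered as zero", so that an operator who really means zero must say so explicitly.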
Friendly Fire Deaths
Incident
– A U.S. Special Forces air controller was calling in GPS positioning from some sort of battery-powered device. He had used the GPS receiver to calculate the latitude and longitude of the Taliban position in minutes and seconds for an airstrike by a Navy F/A-18. The bomber crew "required" a seconds calculation in degree decimals. The crew did not have equipment to perform the minutes-seconds conversion themselves.
– The air controller had recorded the correct value in the GPS receiver when the battery died. Upon replacing the battery, he called in the degree-decimal position the unit was showing, without realizing that the unit is set up to reset to its *own* position when the battery is replaced.
– The 2,000-pound bomb landed on the air controller position, killing three Special Forces soldiers and injuring 20 others.
Direct Cause
– Taliban position was incorrectly transmitted to the Navy F/A-18 bomber crew
Potential Root Cause(s)
– GPS system default was a valid, not an invalid, position
– Lack of battery backup to hold values in memory during battery replacement
– Not equipping users to translate one coordinate system to another (reminiscent of the Mars Climate Orbiter slamming into the planet when ground crews confused English with metric units)
– Using a device with such flaws in a combat situation without adequate testing
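The coordinate translation mentioned above is simple arithmetic: decimal degrees = degrees + minutes/60 + seconds/3600, negated for southern or western hemispheres. The Python sketch below illustrates it; the `dms_to_decimal` name and the sample coordinates are hypothetical and are not taken from the incident.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    hemisphere is one of "N", "S", "E", "W"; south and west are negative.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    if degrees < 0 or not 0 <= minutes < 60 or not 0 <= seconds < 60:
        raise ValueError("degrees, minutes, or seconds out of range")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# Made-up coordinates for illustration only.
print(dms_to_decimal(34, 30, 0, "N"))   # 34 degrees 30 minutes north
print(dms_to_decimal(10, 30, 0, "W"))   # 10 degrees 30 minutes west
```

Rejecting out-of-range inputs instead of guessing follows the same principle as the invalid-default root cause: a bad value should fail loudly, not produce a plausible position.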
Medical Direct/Root Cause Example 1 - Questions?
Sentinel event: A patient was given the wrong medication and the patient experienced an adverse reaction. As a result, the patient's length of stay was extended for an additional 10 days.
Direct cause: The nurse who administered the medication did not compare the name on the patient's armband to the name on the medication order. The nurse did not follow the patient identification policy.
Root cause (thoughts?): Registration staff placed the wrong armband on the patient's arm to begin with.
Medical Direct/Root Cause Example 2 - Questions?
Sentinel event: Doctor prescribes an anti-seizure drug (phenytoin) and the patient develops a severe allergic reaction known as anaphylaxis. The symptoms were itching, hives, swelling in the throat, wheezing, light-headedness from low blood pressure, nausea, and abdominal cramping.
Direct cause: Patient is allergic to phenytoin.
Root cause (thoughts?): The doctor did not do a thorough background check on the patient's medical history, or the patient did not inform the doctor of his/her previous medical history.
Medical Direct/Root Cause Example 3 - Questions?
Sentinel event: A Lasix drip was hung for the wrong patient. The two patients had the same last name.
Direct cause: Interruption during medication administration. The nurse had a very heavy patient assignment and skipped the medication administration double check with another RN.
Root cause (thoughts?): Missed the double-check process on patient identification and medication administration. All hospital medication should be double checked by two nurses.
Medical Direct/Root Cause Example 4 - Questions?
Sentinel event: A patient slips and falls on a slippery floor that had just been mopped after another patient had an upset stomach.
Direct cause: The janitor was not able to put caution signs down before the patient walked down the hall because he was interrupted by a cafeteria worker who needed him to clean up a spill.
Root cause (thoughts?): The caution sign was not down.