
Risks (and Rewards)


Quinn Chapter 8 Overview

• Introduction
• Data entry or data retrieval errors
• Software and billing errors
• Notable software system failures
• Therac-25
• Computer simulations
• Software engineering
• Software warranties and vendor liability

Is Technology Necessary?

The Industrial Revolution and its consequences have been a disaster for the human race. - Theodore Kaczynski

Risks – Who Cares?

• Peter Neumann: Computer-Related Risks, Addison-Wesley/ACM Press. 1995

• ACM Risks Forum: http://www.risks.org

20 Mishaps That Might Have Started

Accidental Nuclear War

1) November 5, 1956: Suez Crisis Coincidence
2) November 24, 1961: BMEWS Communication Failure
3) August 23, 1962: B-52 Navigation Error
4) August–October 1962: U-2 Flights into Soviet Airspace
5) October 24, 1962 – Cuban Missile Crisis: A Soviet Satellite Explodes
6) October 25, 1962 – Cuban Missile Crisis: Intruder in Duluth
7) October 26, 1962 – Cuban Missile Crisis: ICBM Test Launch
8) October 26, 1962 – Cuban Missile Crisis: Unannounced Titan Missile Launch
9) October 26, 1962 – Cuban Missile Crisis: Malmstrom Air Force Base
10) October 1962 – Cuban Missile Crisis: NATO Readiness
11) October 1962 – Cuban Missile Crisis: British Alerts
12) October 28, 1962 – Cuban Missile Crisis: Moorestown False Alarm
13) October 28, 1962 – Cuban Missile Crisis: False Warning Due to Satellite
14) November 2, 1962: The Penkovsky False Warning
15) November 1965: Power Failure and Faulty Bomb Alarms
16) January 21, 1968: B-52 Crash near Thule
17) October 24-25, 1973: False Alarm During Middle East Crisis
18) November 9, 1979: Computer Exercise Tape
19) June 1980: Faulty Computer Chip
20) January 1995: Russian False Alarm

http://www.nuclearfiles.org/menu/key-issues/nuclear-weapons/issues/accidents/20-mishaps-maybe-caused-nuclear-war.htm

From The Limits of Safety by Scott D. Sagan, as quoted by Alan F. Phillips, M.D.

Odds of Dying in One Year from Leading Causes

One-year odds (1 in N)     Cause
1 in 1,756     All Causes
1 in 4,591     Nontransport Unintentional (Accidental) Injuries
1 in 6,197     Transport Accidents
1 in 6,535     Motor-Vehicle Accidents
1 in 14,017    Accidental poisoning by and exposure to noxious substances
1 in 15,614    Falls
1 in 17,532    Intentional self-harm by firearm
1 in 18,953    Other and unspecified land transport accidents
1 in 19,216    Car occupant
1 in 25,263    Assault by firearm
1 in 29,971    Accidental poisoning by narcotics and psychodysleptics [hallucinogens]
1 in 40,030    Intentional self-harm by hanging, strangulation, and suffocation
1 in 49,139    Pedestrian

Source: National Safety Council – 2004 data

Cause of Death – Lifetime Odds in US

Cause                                          Chance of Dying
Heart Disease                                  1-in-5
Cancer                                         1-in-7
Stroke                                         1-in-23
Accidental Injury                              1-in-36
Motor Vehicle Accident                         1-in-100
Intentional Self-harm (suicide)                1-in-121
Falling Down                                   1-in-246
Assault by Firearm                             1-in-325
Fire or Smoke                                  1-in-1,116
Natural Forces (heat, cold, storms, quakes)    1-in-3,357
Electrocution                                  1-in-5,000
Drowning                                       1-in-8,942
Air Travel Accident                            1-in-20,000
Flood (included also in Natural Forces)        1-in-30,000
Legal Execution                                1-in-58,618
Tornado (included also in Natural Forces)      1-in-60,000
Snake, Bee or other Venomous Bite or Sting     1-in-100,000
Earthquake (included also in Natural Forces)   1-in-131,890
Dog Attack                                     1-in-147,717
Asteroid Impact                                1-in-200,000 **
Tsunami                                        1-in-500,000
Fireworks Discharge                            1-in-615,488

** Perhaps 1-in-500,000

Source: National Center for Health Statistics

Fanciful, But You Get the Idea

http://www.youtube.com/watch_popup?v=jEjUAnPc2VA#t=20

Why is Software Risky?

System                Lines of Code    Developers
OpenOffice            9 million
Android OS            (see http://www.gubatron.com/blog/2010/05/23/how-many-lines-of-code-does-it-take-to-create-the-android-os/)
GNU/Linux             30 million
Windows Vista         50 million       2000
Mac OS X 10.4         86 million
Lucent 5ESS Switch    100 million      5000

Risk of Failure

• Software error

• Hardware error

• Interaction between software design and hardware failure

• User error
  – User interface design
  – Training the user

Why might a complex system fail?


Two Kinds of Data-related Failure

• A computerized system may fail because wrong data is entered into it

• A computerized system may fail because people incorrectly interpret the data they retrieve


Disfranchised Voters

• November 2000 general election
• Florida disqualified thousands of voters
• Reason: People identified as felons
• Cause: Incorrect records in voter database
• Consequence: May have affected election's outcome


False Arrests

• Sheila Jackson Stossier mistaken for Shirley Jackson
  – Arrested and spent five days in detention
• Roberto Hernandez mistaken for another Roberto Hernandez
  – Arrested twice and spent 12 days in jail
• Terry Dean Rogan arrested after someone stole his identity
  – Arrested five times, three times at gunpoint


Accuracy of NCIC Records

• March 2003: Justice Dept. announces FBI not responsible for accuracy of NCIC information

• Exempts NCIC from some provisions of Privacy Act of 1974

• Should government take responsibility for data correctness?


Dept. of Justice Position

• Impractical for FBI to be responsible for data’s accuracy

• Much information provided by other law enforcement and intelligence agencies

• Agents should be able to use discretion
• If provisions of Privacy Act strictly followed, much less information would be in NCIC
• Result: fewer arrests


Position of Privacy Advocates

• Number of records is increasing
• More erroneous records → more false arrests
• Accuracy of NCIC records more important than ever


Act Utilitarian Analysis: Database of Stolen Vehicles

• Over 1 million cars stolen every year
• Just over half are recovered, say 500,000
• Assume NCIC is responsible for at least 20% → 100,000 cars recovered because of NCIC
• Benefit of $5,000 per car (owner gets car back; effects on national insurance rates; criminal doesn't profit)
• Total value of NCIC stolen vehicle database: roughly $500 million/year
• Only a few stories of false arrests
• Assume 1 false arrest per year (probably high)
• Assume harm caused by a false arrest is $55,000 (size of award to Rogan)
• Benefit surpasses harm by roughly $500 million/year (see the tally sketched below)
• Conclusion: Good to have NCIC stolen vehicles database
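A quick way to check the arithmetic behind this tally is sketched below; every number in it is one of the slide's assumptions, not measured data.

    # Act-utilitarian tally for the NCIC stolen-vehicle database,
    # using the assumed figures from the slide above.
    recoveries_per_year = 500_000        # just over half of ~1 million thefts
    ncic_share = 0.20                    # assume NCIC responsible for at least 20%
    benefit_per_car = 5_000              # dollars per recovered car (assumed)
    false_arrests_per_year = 1           # assumed, probably high
    harm_per_false_arrest = 55_000       # size of the award to Rogan

    benefit = recoveries_per_year * ncic_share * benefit_per_car  # $500 million
    harm = false_arrests_per_year * harm_per_false_arrest         # $55,000
    print(f"net benefit: ${benefit - harm:,.0f} per year")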


8.3 Software and Billing Errors

Comair Cancelled All Flights on Christmas Day, 2004


Analysis: E-Retailer Posts Wrong Price, Refuses to Deliver

• Amazon.com in Britain offered iPaq for £7 instead of £275
• Orders flooded in
• Amazon.com shut down site, refused to deliver unless customers paid true price
• Was Amazon.com wrong to refuse to fill the orders?


Rule Utilitarian Analysis

• Imagine rule: A company must always honor the advertised price
• Consequences
  – More time spent proofreading advertisements
  – Companies would take out insurance policies
  – Higher costs → higher prices
  – All consumers would pay higher prices
  – Few customers would benefit from errors
• Conclusion
  – Rule has more harms than benefits
  – Amazon.com did the right thing


Kantian Analysis

• Buyers knew 97.5% markdown was an error

• They attempted to take advantage of Amazon.com’s stockholders

• They were not acting in “good faith”
• Buyers were in the wrong, not Amazon.com


Notable Software System Failures


Patriot Missile

• Designed as anti-aircraft missile
• Used in 1991 Gulf War to intercept Scud missiles
• One battery failed to shoot at Scud that killed 28 soldiers
• Designed to operate only a few hours at a time
• Kept in operation > 100 hours
• Tiny truncation errors added up
• Clock error of 0.3433 seconds → tracking error of 687 meters (see the sketch below)
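The clock-drift figure can be reproduced with a back-of-the-envelope calculation. The sketch below models the 24-bit register as holding 23 fractional bits, following the widely published analysis of the failure; it is an illustration, not the actual Patriot code.

    # Truncating 0.1 s to a fixed-point register makes the system clock drift.
    FRACTION_BITS = 23
    stored_tenth = int(0.1 * 2**FRACTION_BITS) / 2**FRACTION_BITS
    error_per_tick = 0.1 - stored_tenth          # ~9.5e-8 s lost per 0.1 s tick

    ticks_in_100_hours = 100 * 60 * 60 * 10
    clock_drift = error_per_tick * ticks_in_100_hours
    print(clock_drift)                           # ~0.3433 seconds

    scud_speed = 1676                            # m/s, approximate
    print(clock_drift * scud_speed)              # ~575 m; GAO's fuller range-gate analysis gives 687 m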

Patriot Missile Failure


Figure from SCIENCE 255:1347. Copyright ©1992 by The American Association for the Advancement of Science. Reprinted with permission.


Ariane 5

• Satellite launch vehicle
• 40 seconds into maiden flight, rocket self-destructed
  – $500 million of uninsured satellites lost
• Statement assigning floating-point value to integer raised exception (see the sketch below)
• Exception not caught and computer crashed
• Code reused from Ariane 4
  – Slower rocket
  – Smaller values being manipulated
  – Exception was impossible
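The conversion failure is easy to reproduce in miniature. The sketch below is illustrative Python, not the original Ada: it shows why a value that always fit in 16 bits on the slower Ariane 4 overflows on Ariane 5.

    # A 64-bit float converted to a 16-bit signed integer fails once the value
    # exceeds what 16 bits can hold (illustrative, not the Ariane flight code).
    def to_int16(x: float) -> int:
        n = int(x)
        if not -32768 <= n <= 32767:
            # In the flight software the analogous Operand Error went unhandled
            # and shut down the inertial reference system.
            raise OverflowError(f"{x} does not fit in a signed 16-bit integer")
        return n

    print(to_int16(1234.5))    # fine: an Ariane 4-sized value
    print(to_int16(40000.0))   # raises OverflowError: the Ariane 5 case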


AT&T Long-Distance Network

• Significant service disruption
  – About half of telephone-routing switches crashed
  – 70 million calls not put through
  – 60,000 people lost all service
  – AT&T lost revenue and credibility
• Cause
  – Single line of code in error-recovery procedure (see the sketch below)
  – Most switches running same software
  – Crashes propagated through switching network
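Post-mortems of the outage widely attribute it to a misplaced break statement in the C error-recovery code of the 4ESS switches. The sketch below is a deliberately simplified, hypothetical analogue of that kind of one-line flaw, not the real switch software: one wrong early exit leaves shared state inconsistent for the next message.

    # Hypothetical analogue of a one-line error-recovery bug.
    def handle_recovery_message(switch: dict, another_message_arrived: bool) -> None:
        switch["status"] = "recovering"
        if another_message_arrived:
            return                        # bug: skips the cleanup below
        switch["status"] = "in service"   # cleanup that should always run

    switch = {"status": "in service"}
    handle_recovery_message(switch, another_message_arrived=True)
    print(switch)                         # {'status': 'recovering'} -- left in a bad state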


Robot Missions to Mars

• Mars Climate Orbiter
  – Disintegrated in Martian atmosphere
  – Lockheed Martin design used English units
  – Jet Propulsion Lab design used metric units (see the sketch below)
• Mars Polar Lander
  – Crashed into Martian surface
  – Engines shut off too soon
  – False signal from landing gear
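The Climate Orbiter mismatch comes down to a single conversion factor. The sketch below uses made-up impulse numbers purely for illustration; only the conversion factor is real.

    # Mars Climate Orbiter unit mismatch: one side produced thruster impulse in
    # pound-force seconds, the other read the same number as newton-seconds.
    LBF_S_TO_N_S = 4.448222            # 1 pound-force second in newton-seconds

    impulse_reported = 100.0           # produced in lbf*s (illustrative value)
    impulse_assumed = 100.0            # read as N*s by the trajectory software

    true_impulse = impulse_reported * LBF_S_TO_N_S
    print(true_impulse / impulse_assumed)   # thrust ~4.45x what the model assumed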

Diebold Electronic Voting Machine


Issues with DRE Voting Machines

• Voting irregularities
  – Failure to record votes
  – Overcounting votes
  – Misrecording votes
• Lack of a paper audit trail
• Vulnerability to tampering
• Source code a trade secret, can't be examined
• Possibility of widespread fraud through malicious programming


8.5 Therac-25


Genesis of the Therac-25

• AECL and CGR built Therac-6 and Therac-20
• Therac-25 built by AECL
  – PDP-11 an integral part of system
  – Hardware safety features replaced with software
  – Reused code from Therac-6 and Therac-20
• First Therac-25 shipped in 1983
  – Patient in one room
  – Technician in adjoining room


Chronology of Accidents and AECL Responses

• Marietta, Georgia (June 1985)
• Hamilton, Ontario (July 1985)
• First AECL investigation (July–Sept. 1985)
• Yakima, Washington (December 1985)
• Tyler, Texas (March 1986)
• Second AECL investigation (March 1986)
• Tyler, Texas (April 1986)
• Yakima, Washington (January 1987)
• FDA declares Therac-25 defective (February 1987)


Software Errors

• Race condition: the order in which two or more concurrent tasks access a shared variable can affect the program's behavior (see the sketch below)

• Two race conditions in Therac-25 software
  – Command screen editing
  – Movement of electron beam gun
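A minimal, generic demonstration of the definition above (ordinary Python threads, nothing to do with the Therac-25 code itself):

    # Two threads update a shared variable without locking; the final value
    # depends on how their reads and writes interleave.
    import threading

    counter = 0

    def bump(times: int) -> None:
        global counter
        for _ in range(times):
            value = counter           # read
            counter = value + 1       # write: the other thread may have run in between

    threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)   # frequently less than 200000 because updates were lost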

Race Condition Revealed by Fast-typing Operators


Race Condition Caused by Counter Rolling Over to Zero
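The slide title above refers to the one-byte counter problem described in Leveson and Turner's published analysis of the Therac-25. A rough sketch of the effect (illustrative Python, not the original PDP-11 software; the variable name comes from that analysis):

    # A one-byte counter wraps to 0 every 256th pass, and a check that runs
    # only when the counter is nonzero is silently skipped on those passes.
    checks_skipped = 0
    class3 = 0                            # one-byte shared counter
    for setup_pass in range(1, 1001):
        class3 = (class3 + 1) % 256       # 8-bit increment, wraps to zero
        if class3 != 0:
            pass                          # safety check would run here
        else:
            checks_skipped += 1           # every 256th pass the check is skipped
    print(checks_skipped)                 # 3 skipped checks in 1000 passes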


Post Mortem

• AECL focused on fixing individual bugs
• System not designed to be fail-safe
• No devices to report overdoses
• Software lessons
  – Difficult to debug programs with concurrent tasks
  – Design must be as simple as possible
  – Documentation crucial
  – Code reuse does not always lead to higher quality
• AECL did not communicate fully with customers


Moral Responsibility of the Therac-25 Team

• Conditions for moral responsibility
  – Causal condition: actions (or inactions) caused the harm
  – Mental condition
    • Actions (or inactions) intended or willed, OR
    • Moral agent is careless, reckless, or negligent
• Therac-25 team morally responsible
  – They constructed the device that caused the harm
  – They were negligent

Postscript

• Computer errors related to radiation machines continue to maim and kill patients

• Investigation by The New York Times
  – Scott Jerome-Parks, New York (2006)
  – Alexandra Jn-Charles, New York (2006)


Therac-25

Denver airport baggage system -- 1993

Mars Climate Orbiter - 1999

AT&T switch failure -- 1990

20 Famous Software Disasters

http://www.devtopics.com/20-famous-software-disasters/

Some Other Famous Bugs

http://en.wikipedia.org/wiki/List_of_software_bugs

The Failure of the Software in the Patriot Missile System

What Really was the Bug?

1. The incident of February 23, 1991

2. Getting the information - the background of Patriot

3. The official explanation

4. Contradictions in the official explanation

5. A broader view of the development process

Causes of Error

• Data entry
  – E.g., the Florida general election in 2000
• Data retrieval
  – Sheila Jackson Stossier arrested instead of Shirley Jackson
• Programming errors
  – Patriot missile, Ariane 5, and Mars probe failed because software was reused or run under conditions for which it wasn't designed
  – Therac-25 failed because of a race condition

Causes of Error contd.

• Lack of redundancy and other checks
  – Therac-25 problem
    • Hardware interlock of earlier models removed
    • Nurse could not see patient from nearby room
  – AT&T switch recovered because of redundancy
• Poor handling of bugs
  – Bugs solved one by one instead of overhaul of system, as in Denver airport

Causes of error contd.

• Miscommunication among people
  – Mars Climate Orbiter failed because Colorado team used English units, but California team used metric

Electronic Voting

February, 2012:

Academy of Motion Picture Arts and Sciences to switch to electronic ballots in 2013.

Electronic Voting

http://homepage.mac.com/rcareaga/diebold/adworks.htm

Trust or accuracy in a game?

A close call in a game that only technology can resolve.

A. Trust in the referee is more important than what actually happened in the game.

B. It is important to get the call right, even if that means overriding the referee's call.

Trust or accuracy in an election?

A close call in an election that only technology can resolve. Is trust in the election process important enough to possibly ignore mistakes?

A. YES

B. NO

Electronic Voting

• It’s complicated. Can we get it right?

• What about the bad guys?

Electronic Voting

http://www.cs.utexas.edu/~ear/cs349/slides/DCVotingMachineBug.html


[Diagram: BALLOT.pdf opens in the Safari browser; the voter saves it ("save as"), marks "My votes", and the result is a new BALLOT.pdf]

Electronic Voting

Rating Financial Instruments

http://www.soxfirst.com/50226711/moodys_subprime_error_bug.php

Risks and Rewards

http://finance.fortune.cnn.com/2012/08/02/knight-high-frequency-loss/

Knight Capital Group installed new trading software, but a glitch caused it to trade wildly; in 45 minutes on August 1, 2012, the firm lost $440 million.

When Technologies Collide

Risks and Rewards

http://www.youtube.com/watch?v=GrfXtAHYoVA

Risks and Rewards

http://www.youtube.com/watch?v=t3TAOYXT840

Risk and Trust

• 2010: Got recall notice for software patch.

• 2011: Government report clears electronic components of blame for accelerator problems.

Verification

• Does the software or hardware product meet specifications?
  – Specifications in English
  – Hardware tested by electrical signal inputs, and whatever outputs the system is supposed to generate
  – Software tested by logical inputs and logical outputs

Commonality between HW and SW verification

• Define all possible inputs
• Calculate required output
• Provide inputs to the system, and measure output
• Compare actual output to required output (see the sketch below)
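A bare-bones version of that loop, with an assumed toy "system" and specification chosen only for illustration:

    # Compare the system's actual output against the required output
    # for every input in a (toy) input space.
    def absolute_value(x: int) -> int:       # the system under verification
        return x if x >= 0 else -x

    def required_output(x: int) -> int:      # derived from the specification
        return abs(x)

    inputs = range(-1000, 1001)              # "all possible inputs" for this toy case
    failures = [x for x in inputs if absolute_value(x) != required_output(x)]
    print("verified" if not failures else f"mismatches on {failures[:5]}")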

Kinds of verification

• Simulation
  – Wide range of inputs provided, as if from a real-life surrounding system
    • Test benches used
  – Slow, not exhaustive set of inputs, inaccurate output calculation
• Testing
  – Real-life systems
  – Fast for hardware compared to simulation
  – May not have test harness available
  – May not be exhaustive

Kinds of verification contd.

• Deductive reasoning
  – Used when writing initial versions
  – Correct by construction
  – Not formal or reproducible

Kinds of verification contd.

• Model checking
  – Given a model of a system, exhaustively and automatically check whether this model meets a given specification (see the sketch below)
  – Model of the system and the specification have to be in a formal language
  – State space explosion problem
  – Model construction problem
  – False positives problem
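A toy version of the idea is sketched below: exhaustively explore every reachable state of a tiny, made-up transition system and check a safety property in each one. Real tools (SPIN, NuSMV, and the like) work symbolically and at far larger scale.

    from collections import deque

    def transitions(state: int):
        # Hypothetical model: a counter 0..3 that can increment or reset.
        yield (state + 1) % 4
        yield 0

    def safe(state: int) -> bool:
        return state != 3                # specification: state 3 is never reached

    def model_check(initial: int) -> bool:
        seen, frontier = {initial}, deque([initial])
        while frontier:
            s = frontier.popleft()
            if not safe(s):
                return False             # counterexample found
            for nxt in transitions(s):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return True                      # property holds in every reachable state

    print(model_check(0))                # False: 0 -> 1 -> 2 -> 3 violates the property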

Clicker

• Given an FSM with 1000 states interacting with a 1 kbit memory, how many states must be considered in model checking?

A. 2^10

B. 20

C. 2^20

D. 2^1010

E. Other

Risk and Trust

2010 Intro: http://www.youtube.com/watch?v=Atmk07Otu9U
2013 Update: http://www.youtube.com/watch?v=u6Ui_0PPw78
Helping the blind: http://www.youtube.com/watch?v=_JP-WTT1y3U

Risk and Trust

http://www.washingtontimes.com/news/2011/mar/8/self-driving-car-on-road-out-of-science-fiction/

2012: GM announces a self-driving Cadillac by 2015.

Risk and Trust

Summer, 2011

Risk and Trust

Intersection management

http://www.cs.utexas.edu/~aim/?p=video

Risk and Trust

Plane or planet? Sleepy pilot can’t tell.

Risk and Trust

In the meantime:


The Android pothole app

Risk and Reward: Email

Risk and Reward

http://www.youtube.com/watch?v=uE7Yf4bw41E

Risk and Reward – A Case Study: Linear Accelerator Radiation Machines

• Social Benefit
• Risk
• Software Quality
• Security
• Ethics
• Free Speech
• Privacy
• Law
• Government Policy

http://www.nytimes.com/2010/01/24/health/24radiation.html?pagewanted=1&partner=rss&emc=rss

Problems Waiting to Happen?


Y2K Problem

• Attempt to save storage

• Did programmers imagine their code being used 30 years later?

• Will there be a “Year 2038 Problem” when UNIX system time, if stored as seconds since Jan 1, 1970 in a 32-bit signed integer, overflows? (see the sketch below)
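The overflow date follows directly from the width of the counter; a quick check using only the standard library:

    # Largest time representable as a signed 32-bit count of seconds since
    # Jan 1, 1970 (the "Year 2038" limit).
    from datetime import datetime, timezone, timedelta

    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    print(epoch + timedelta(seconds=2**31 - 1))   # 2038-01-19 03:14:07+00:00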

Unix 2038 Problem

http://xkcd.com/607/

Microsoft Windows Security

• 106 security updates in 2010 – one per 3.4 days

• 17 security updates from Jan 1, 2011 through March 29, 2011 – one per 5.1 days

• 22 security updates from Jan 1, 2012 through March 31, 2012 – one per 4.1 days

• 7 security updates in one month ending March 12, 2013 – one per 4.4 days.

Some Database Errors: Entry and Misinterpretation

• A large population – many with similar names

Meet Mikey Hicks

http://www.nytimes.com/2010/01/14/nyregion/14watchlist.html

Some Database Errors: Entry and Misinterpretation

• A large population – many with similar names
• Automated processing lacking human/common sense or recognition of special cases
• Overconfidence in the accuracy of computer data
• Errors – often carelessness – in data entry
• Failure to update information and correct errors
• Lack of accountability for errors

…and in Texas

CVS, Texas settle lawsuit over dumping customers' records

HOUSTON — CVS Caremark Corp. will overhaul its information security system and pay the state of Texas $315,000 to settle a lawsuit that accused the drugstore operator of dumping credit card numbers, medical information and other material from more than 1,000 customers into a garbage container.

Texas Attorney General Greg Abbott, who sued CVS in April, announced the agreement Wednesday.

Yeah, but is that a computer system error?

Some High-Level Causes of Computer Systems Failures

• Lack of clear, well-thought-out goals and specifications
• Poor management and poor communication among customers, designers, programmers, and so on
• Institutional or political pressures that encourage unrealistically low bids, unrealistically low budget requests, and underestimates of time requirements

• Use of very new technology, with unknown reliability and problems, perhaps for which software developers have insufficient experience and expertise

• Refusal to recognize or admit that a project is in trouble

Some Factors in Computer-System Errors and Failures - 1

1. Design and development
   – Inadequate attention to potential safety risks
   – Interaction with physical devices that do not work as expected
   – Incompatibility of software and hardware, or of application software and the operating system
   – Not planning and designing for unexpected inputs or circumstances
   – Insufficient testing
   – Insufficient/unclear documentation
   – Reuse of software from another system without adequate checking
   – Overconfidence in software
   – Carelessness

Some Factors in Computer-System Errors and Failures - 2

2. Management and use
   – Data-entry errors
   – Inadequate training of users
   – Errors in interpreting results or output
   – Failure to keep information in databases up to date
   – Overconfidence in software by users
   – Insufficient planning for failures, no backup systems or procedures

Some Factors in Computer-System Errors and Failures – 3, 4

3. Misrepresentation, hiding problems, and inadequate response to reported problems

4. Insufficient market or legal incentives to do a better job.

Can we ensure quality and reliability?

• Criminal and civil penalties
• Warranties for consumer software
• Regulation and safety-critical applications
• Professional licensing
• Insurance companies
• Taking responsibility

Software Engineering


Specification

• Determine system requirements
• Understand constraints
• Determine feasibility
• End products
  – High-level statement of requirements
  – Mock-up of user interface
  – Low-level requirements statement


Development

• Create high-level design
• Discover and resolve mistakes, omissions in specification
• CASE tools to support design process
• Object-oriented systems have advantages
• After detailed design, actual programs written
• Result: working software system


Validation (Testing)

• Ensure software satisfies specification
• Ensure software meets user's needs
• Challenges to testing software
  – Noncontinuous responses to changes in input
  – Exhaustive testing impossible
  – Testing reveals bugs, but cannot prove none exist (see the sketch below)
• Test modules, then subsystems, then system
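A module test in the spirit of that list, using an assumed example function: passing tests can reveal bugs on the inputs we chose, but say nothing about inputs we did not try.

    # Hypothetical module under test.
    def leap_year(year: int) -> bool:
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

    test_cases = {2000: True, 1900: False, 2024: True, 2023: False}
    for year, expected in test_cases.items():
        assert leap_year(year) == expected, f"failed on {year}"
    print("selected cases pass -- which proves nothing about untested inputs")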


Software Quality Is Improving

• Standish Group tracks IT projects
• Situation in 1994
  – 1/3 of projects cancelled before completion
  – 1/2 of projects had time and/or cost overruns
  – 1/6 of projects completed on time and on budget
• Situation in 2009
  – 1/4 of projects cancelled
  – 5/12 of projects had time and/or cost overruns
  – 1/3 of projects completed on time and on budget

Success of IT Projects Over Time


8.8 Software Warranties and Vendor Liability


Shrinkwrap Warranties

• Some say you accept software “as is”
• Some offer 90-day replacement or money-back guarantee
• None accept liability for harm caused by use of software


Are Software Warranties Enforceable?

• Mass-marketed software and software included in sale of hardware likely to be considered a good by a court of law

• Uniform Commercial Code applies to goods, despite what warranties may say

Key Court Cases

• Step-Saver Data Systems v. Wyse Technology and the Software Link
  – Court ruled that provisions of UCC held
• ProCD v. Zeidenberg
  – Court ruled shrinkwrap licenses are enforceable
• Mortenson v. Timberline Software
  – Court ruled in favor of Timberline and licensing agreement that limited consequential damages


Moral Responsibility of Software Manufacturers

• If vendors were responsible for harmful consequences of defects
  – Companies would test software more
  – They would have to purchase liability insurance
  – Software would cost more
  – Start-ups would be affected more than big companies
  – Less innovation in software industry?
  – Software would be more reliable?

• Making vendors responsible for harmful consequences of defects may be a bad idea, but…

• Consumers should not have to pay for bug fixes

Recommended