CS 5150 1

CS 5150 Software Engineering

Lecture 22

Reliability 3

CS 5150 2

Administration

Final presentations

Sign up for your presentations now.

Weekly progress reports

Remember to send your progress reports to your TA

CS 5150 3

The Heisenbug

CS 5150 4

Failures and Faults

Failure: Software does not deliver the service expected by the user (e.g., mistake in requirements, confusing user interface)

Fault (BUG): Programming or design error whereby the delivered system does not conform to specification (e.g., coding error, interface error)

CS 5150 5

Faults and Failures

Actual examples

(a) An application crashes with an emulator, even though the emulator is bug free. (Compensating bug problem.)

(b) A mathematical function never converges with badly formulated data. (Round off error.)

(c) After a network is hit by lightning, it crashes on restart. (Problem of incremental growth.)

(d) The head of an organization is paid $5 a month instead of $10,005 because the maximum salary allowed by the program is $10,000. (Requirements problem.)

(e) An operating system fails because of a page-boundary error in the firmware. (Different operating environment problem.)
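Example (d) is consistent with a stored value that silently wraps around at its limit. The sketch below assumes the salary was kept modulo 10,000; the slide does not say how the limit was implemented, and the names `SALARY_LIMIT`, `store_salary_wrapping`, and `store_salary_checked` are hypothetical.

```python
SALARY_LIMIT = 10_000  # limit from the slide; the wrap-around mechanism is an assumption


def store_salary_wrapping(amount: int) -> int:
    """Faulty version: values at or above the limit wrap around silently."""
    return amount % SALARY_LIMIT


def store_salary_checked(amount: int) -> int:
    """Defensive version: reject values the field cannot represent."""
    if not 0 <= amount < SALARY_LIMIT:
        raise ValueError(f"salary {amount} is outside the supported range")
    return amount


print(store_salary_wrapping(10_005))  # prints 5
```

The checked version conforms to the same specification but fails loudly, which exposes the requirements problem instead of hiding it.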

CS 5150 6

Terminology

Fault avoidance

Build systems with the objective of creating fault-free (bug-free) software

Fault tolerance

Build systems that continue to operate when faults (bugs) occur

Fault detection (testing and validation)

Detect faults (bugs) before the system is put into operation.

CS 5150 7

Fault Avoidance

Software development process that aims to develop zero-defect software.

• Careful requirements and detailed specification

• Incremental development with customer input

• Constrained programming options

• Static verification

• Statistical testing

It is always better to prevent defects than to remove them later.

Example: The four color problem.

CS 5150 8

Defensive Programming

Murphy's Law:

If anything can go wrong, it will.

Defensive Programming:

• Redundant code is incorporated to check system state after modifications.

• Implicit assumptions are tested explicitly.

• Risky programming constructs are avoided.

CS 5150 9

Defensive Programming: Error Avoidance

Avoid risky programming constructs except where really necessary

• Pointers

• Dynamic memory allocation

• Floating-point numbers

• Parallelism

• Recursion

• Interrupts

• Multi-threading

All are valuable in certain circumstances, but should be used with discretion

CS 5150 10

Defensive Programming Examples

• Use boolean variable not integer

• Test i <= n not i == n

• Assertion checking (e.g., validate parameters)

• Build debugging code into program with a switch to display values at interfaces

• Error checking codes in data (e.g., checksum or hash)
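Several of these practices can be sketched together. This is a minimal illustration, not a prescribed implementation; the function names `mean_of_first`, `pack`, and `unpack` are hypothetical, and CRC-32 is just one possible checksum.

```python
import zlib


def mean_of_first(values: list[float], n: int) -> float:
    """Average the first n values, testing implicit assumptions explicitly."""
    # Assertion checking: validate parameters instead of trusting the caller.
    assert n > 0, "n must be positive"
    assert n <= len(values), "n exceeds the number of values"

    total = 0.0
    i = 0
    # Inequality test rather than i != n: the loop still ends if i ever skips past n.
    while i < n:
        total += values[i]
        i += 1
    return total / n


def pack(payload: bytes) -> bytes:
    """Append a CRC-32 checksum so that corruption can be detected later."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unpack(record: bytes) -> bytes:
    """Return the payload, or raise if the stored checksum does not match."""
    payload, stored = record[:-4], int.from_bytes(record[-4:], "big")
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: record is corrupt")
    return payload
```

Python drops `assert` statements when run with `-O`, so checks that must survive in production would need explicit `if` tests that raise.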

CS 5150 11

Some Notable Bugs

Even commercial systems may have serious bugs

• Built-in function in Fortran compiler (e^0 = 0)

• Japanese microcode for Honeywell DPS virtual memory

• The microfilm plotter with the missing byte (1:1023)

• The Sun 3 page fault that IBM paid to fix

• Left handed rotation in the graphics package

• The preload system with the memory leak

Good people work around problems. The best people track them down and fix them!

CS 5150 12

Fault Tolerance

Aim:

A system that continues to operate even when problems occur.

Examples:

• Invalid input data (e.g., in a data processing application)

• Overload (e.g., in a networked system)

• Hardware failure (e.g., in a control system)

General Approach:

• Failure detection

• Damage assessment

• Fault recovery

• Fault repair

CS 5150 13

Fault Tolerance

Basic Techniques:

• Timers and timeout in real-time and networked systems

• After error continue with next transaction (e.g., discard data record, drop packet)

• Allow user break options (e.g., force quit, cancel)

• Use error correcting codes in data (e.g., RAID)

• Spare hardware (e.g., bad block tables on disk drives)

• Redundancy in data structures (e.g., forward and backward pointers)

Report all errors for quality control
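The "continue with next transaction" and "report all errors" points can be sketched as follows. The name `process_all` and the choice of which exceptions count as a bad record are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("transactions")


def process_all(records, handler):
    """Apply handler to each record; a bad record is logged and skipped."""
    processed, failed = [], []
    for number, record in enumerate(records, start=1):
        try:
            processed.append(handler(record))
        except (ValueError, KeyError) as exc:
            # Report the error for quality control, then move on to the next record.
            logger.warning("record %d discarded: %s", number, exc)
            failed.append(number)
    return processed, failed


results, rejected = process_all(["10", "oops", "30"], int)
print(results, rejected)  # prints [10, 30] [2]
```

Returning the list of rejected record numbers keeps the failures visible to the caller rather than burying them in a log.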

CS 5150 14

Fault Tolerance

Backward Recovery:

• Record system state at specific events (checkpoints). After failure, recreate state at last checkpoint.

• Combine checkpoints with system log (audit trail of transactions) that allows transactions from last checkpoint to be repeated automatically.

• Test the restore software!
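A minimal in-memory sketch of checkpoint plus log replay. The `Ledger` class is hypothetical, and a real system would write both the checkpoint and the log to stable storage rather than keep them in the same process.

```python
import copy


class Ledger:
    """Toy system whose state is a dict of account balances."""

    def __init__(self):
        self.balances = {}
        self._checkpoint = {}
        self._log = []  # transactions applied since the last checkpoint

    def apply(self, name, delta):
        # Log first, then change state, so the log never misses a transaction.
        self._log.append((name, delta))
        self.balances[name] = self.balances.get(name, 0) + delta

    def checkpoint(self):
        """Record the current state and start a fresh log."""
        self._checkpoint = copy.deepcopy(self.balances)
        self._log.clear()

    def recover(self):
        """Recreate state at the last checkpoint, then replay the log."""
        self.balances = copy.deepcopy(self._checkpoint)
        for name, delta in self._log:
            self.balances[name] = self.balances.get(name, 0) + delta


ledger = Ledger()
ledger.apply("alice", 100)
ledger.checkpoint()
ledger.apply("alice", -30)
ledger.balances = {}      # simulate a failure that destroys the live state
ledger.recover()
print(ledger.balances)    # prints {'alice': 70}
```

The last four lines are the kind of restore test the slide asks for: destroy the state deliberately and confirm that recovery reproduces it.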

CS 5150 15

Fault Tolerance

Google and Hadoop File Systems

• Clusters of commodity computers (1,000+ computers, 1,000+ TB)

"Component failures are the norm rather than the exception. ... We have seen problems caused by application bugs, operating system bugs, human errors, and the failures of disks, memory, connectors, networking, and power supplies."

• Data is stored in large chunks (64 MB).

• Each chunk is replicated, typically with three copies.

• If component fails, new replicas are created automatically.

Ghemawat, et al., The Google File System. 19th ACM Symposium on Operating Systems Principles, October 2003
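The automatic re-creation of replicas can be illustrated with a toy placement table. This is only a sketch of the idea and not the algorithm used by the Google File System or Hadoop; `re_replicate`, the server names, and the "pick the first spare alphabetically" rule are all assumptions.

```python
REPLICAS = 3  # "typically with three copies", per the slide


def re_replicate(placement, live_servers):
    """Return a placement in which every chunk again has REPLICAS copies.

    placement maps a chunk id to the set of server names holding a copy.
    """
    repaired = {}
    for chunk, holders in placement.items():
        survivors = holders & live_servers
        if not survivors:
            raise RuntimeError(f"chunk {chunk} has no surviving replica")
        # Candidate servers that are alive and do not yet hold this chunk.
        spares = sorted(live_servers - survivors)
        needed = max(0, REPLICAS - len(survivors))
        repaired[chunk] = survivors | set(spares[:needed])
    return repaired


placement = {"c1": {"s1", "s2", "s3"}, "c2": {"s2", "s3", "s4"}}
live = {"s1", "s3", "s4", "s5"}  # server s2 has failed
repaired = re_replicate(placement, live)
print(all(len(holders) == REPLICAS for holders in repaired.values()))  # prints True
```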

CS 5150 16

Fault Tolerance

N-version programming

• Execute independent implementations in parallel, compare results, accept the most probable.

• Used when extreme reliability is required with no opportunity to repair (e.g., spacecraft).

• Difficulty is to ensure that the implementations are truly independent (e.g., separate power supplies, sensors, algorithms).
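The compare-and-accept step can be sketched as a majority vote. For simplicity the versions below run one after another rather than in parallel, and all names (`vote`, `version_a`, and so on) are hypothetical.

```python
from collections import Counter


def vote(implementations, *args):
    """Run every implementation and return the result most of them agree on."""
    results = []
    for impl in implementations:
        try:
            results.append(impl(*args))
        except Exception:
            # A version that crashes simply loses its vote.
            continue
    if not results:
        raise RuntimeError("all versions failed")
    winner, count = Counter(results).most_common(1)[0]
    if count <= len(implementations) // 2:
        raise RuntimeError("no majority among versions")
    return winner


def version_a(values):
    return max(values)


def version_b(values):
    return sorted(values)[-1]


def version_c(values):
    return values[0]  # deliberately faulty version


print(vote([version_a, version_b, version_c], (3, 9, 4)))  # prints 9
```

The vote only helps if the versions fail independently; three implementations that share the same misunderstanding of the specification will agree on the wrong answer.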

CS 5150 17

Software Engineering for Real Time

The special characteristics of real time computing require extra attention to good software engineering principles:

• Requirements analysis and specification

• Special techniques (e.g., locks on data, semaphores, etc.)

• Development of tools

• Modular design

• Exhaustive testing

Heroic programming will fail!

CS 5150 18

Software Engineering for Real Time

Testing and debugging need special tools and environments

• Debuggers, etc., cannot be used to test real time performance

• Simulation of environment may be needed to test interfaces -- e.g., adjustable clock speed

• General purpose tools may not be available
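One way to simulate the environment is to pass the clock in as a parameter, so that a test controls how fast time passes. A minimal sketch, with hypothetical `SimulatedClock` and `Watchdog` classes:

```python
class SimulatedClock:
    """Stand-in for the real clock; the test decides how fast time passes."""

    def __init__(self):
        self.now = 0.0

    def advance(self, seconds):
        self.now += seconds


class Watchdog:
    """Reports a timeout if no heartbeat arrives within `limit` seconds."""

    def __init__(self, clock, limit):
        self.clock = clock
        self.limit = limit
        self.last_beat = clock.now

    def heartbeat(self):
        self.last_beat = self.clock.now

    def expired(self):
        return self.clock.now - self.last_beat > self.limit


clock = SimulatedClock()
dog = Watchdog(clock, limit=2.0)
clock.advance(1.5)
print(dog.expired())  # prints False
clock.advance(1.0)
print(dog.expired())  # prints True
```

This tests the timeout logic deterministically and without waiting, but it says nothing about real time performance, which still has to be measured on the target system.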

CS 5150 19

Validation and Verification

Validation: Are we building the right product?

Verification: Are we building the product right?

In practice, it is sometimes difficult to distinguish between the two.

That's not a bug. That's a feature!

CS 5150 20

Security in the Software Development Process

The security goal

The security goal is to make sure that the agents (people or external systems) who interact with a computer system, its data, and its resources, are those that the owner of the system would wish to have such interactions.

Security considerations need to be part of the entire software development process. They may have a major impact on the architecture chosen.

Example. Integration of Internet Explorer into Windows

CS 5150 21

Agents and Components

A large system will have many agents and components:

• each is potentially unreliable and insecure

• components acquired from third parties may have unknown security problems

• commercial off-the-shelf (COTS) problem

The software development challenge:

• develop secure and reliable components

• protect whole system so that security problems in parts of it do not spread to the entire system

CS 5150 22

Techniques: Barriers

Place barriers that separate parts of a complex system:

• Isolate components, e.g., do not connect a computer to a network

• Firewalls

• Require authentication to access certain systems or parts of systems

Every barrier imposes restrictions on permitted uses of the system

Barriers are most effective when the system can be divided into subsystems with simple boundaries

CS 5150 23

Techniques: Authentication & Authorization

Authentication establishes the identity of an agent:

• What the agent knows (e.g., password)

• What the agent possesses (e.g., smart card)

• Where the agent has access (e.g., Ctrl-Alt-Del)

• What the physical properties of the agent are (e.g., fingerprint)

Authorization establishes what an authenticated agent may do:

• Access control lists

• Group membership
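The two steps can be sketched separately: a password check for authentication ("what the agent knows") and an access control list keyed by group for authorization. The users, groups, resource name, and iteration count below are invented for illustration.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None):
    """Derive a salted hash; only the salt and hash are stored, never the password."""
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest


def authenticate(password, salt, expected):
    """Authentication: does the agent know the password?"""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


GROUPS = {"staff": {"alice", "bob"}, "admins": {"alice"}}

# Access control list: resource -> action -> groups allowed to perform it.
ACL = {"grades.csv": {"read": {"staff"}, "write": {"admins"}}}


def is_authorized(user, action, resource):
    """Authorization: may this authenticated user perform the action?"""
    allowed_groups = ACL.get(resource, {}).get(action, set())
    return any(user in GROUPS.get(group, set()) for group in allowed_groups)


salt, stored = hash_password("correct horse")
print(authenticate("correct horse", salt, stored))  # prints True
print(is_authorized("bob", "write", "grades.csv"))  # prints False
```

Anything not listed in the ACL is denied by default, so a forgotten entry fails closed rather than open.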

CS 5150 24

Example: An Access Model for Digital Content

[Figure: access model for digital content. Labels: Digital material, Attributes, User, Roles, Actions, Operations, Access, Policies]

CS 5150 25

Techniques: Encryption

Allows data to be stored and transmitted securely, even when the bits are viewed by unauthorized agents

• Private key and public key

• Digital signatures

[Figure: Encryption transforms X into Y; Decryption transforms Y back into X]

CS 5150 26

Security and People

People are intrinsically insecure:

• Careless (e.g., leave computers logged on, use simple passwords, leave passwords where others can read them)

• Dishonest (e.g., stealing from financial systems)

• Malicious (e.g., denial of service attack)

Many security problems come from inside the organization:

• In a large organization, there will be some disgruntled and dishonest employees

• Security relies on trusted individuals. What if they are dishonest?

CS 5150 27

Design for Security: People

• Make it easy for responsible people to use the system

• Make it hard for dishonest or careless people (e.g., password management)

• Train people in responsible behavior

• Test the security of the system thoroughly and repeatedly, particularly after changes

• Do not hide violations

CS 5150 28

Suggested Reading

Trust in Cyberspace, Committee on Information Systems Trustworthiness, National Research Council (1999)

http://www.nap.edu/readingroom/books/trust/

Fred Schneider, Cornell Computer Science, was the chair of this study.