CS 5150 2
Administration
Final presentations
Sign up for your presentations now.
Weekly progress reports
Remember to send your progress reports to your TA
Failures and Faults
Failure: Software does not deliver the service expected by the user (e.g., mistake in requirements, confusing user interface)
Fault (BUG): Programming or design error whereby the delivered system does not conform to specification (e.g., coding error, interface error)
Faults and Failures
Actual examples
(a) An application crashes with an emulator, even though the emulator is bug free. (Compensating bug problem.)
(b) A mathematical function never converges with badly formulated data. (Round-off error.)
(c) After a network is hit by lightning, it crashes on restart. (Problem of incremental growth.)
(d) The head of an organization is paid $5 a month instead of $10,005 because the maximum salary allowed by the program is $10,000. (Requirements problem.)
(e) An operating system fails because of a page-boundary error in the firmware. (Different operating environment problem.)
Terminology
Fault avoidance
Build systems with the objective of creating fault-free (bug-free) software
Fault tolerance
Build systems that continue to operate when faults (bugs) occur
Fault detection (testing and validation)
Detect faults (bugs) before the system is put into operation.
Fault Avoidance
Software development process that aims to develop zero-defect software.
• Careful requirements and detailed specification
• Incremental development with customer input
• Constrained programming options
• Static verification
• Statistical testing
It is always better to prevent defects than to remove them later.
Example: The four color problem.
Defensive Programming
Murphy's Law:
If anything can go wrong, it will.
Defensive Programming:
• Redundant code is incorporated to check system state after modifications.
• Implicit assumptions are tested explicitly.
• Risky programming constructs are avoided.
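A minimal sketch of the defensive techniques above, in Python. The account-transfer scenario, the function name transfer, and the dictionary of balances are hypothetical illustrations, not part of the original slides. Amounts are integer cents, which sidesteps the floating-point risk listed under error avoidance and lets the redundant post-modification check use exact equality.

```python
def transfer(balances: dict[str, int], src: str, dst: str, amount: int) -> None:
    """Move `amount` (integer cents) from src to dst, defensively (hypothetical example)."""
    # Implicit assumptions tested explicitly
    if not isinstance(amount, int) or amount <= 0:
        raise ValueError(f"amount must be a positive integer, got {amount!r}")
    if src not in balances or dst not in balances:
        raise KeyError(f"unknown account: {src!r} or {dst!r}")
    if balances[src] < amount:
        raise ValueError("insufficient funds")

    total_before = sum(balances.values())
    balances[src] -= amount
    balances[dst] += amount

    # Redundant code that checks system state after the modification:
    # a transfer must never create or destroy money.
    if sum(balances.values()) != total_before:
        raise RuntimeError("invariant violated: total balance changed")
```

The final check can never fail in this version of the code; its value is that it catches a future edit that breaks the invariant.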
Defensive Programming: Error Avoidance
Avoid risky programming constructs except where really necessary
• Pointers
• Dynamic memory allocation
• Floating-point numbers
• Parallelism
• Recursion
• Interrupts
• Multi-threading
All are valuable in certain circumstances, but should be used with discretion
Defensive Programming Examples
• Use boolean variables, not integers
• Test i <= n, not i == n
• Assertion checking (e.g., validate parameters)
• Build debugging code into program with a switch to display values at interfaces
• Error checking codes in data (e.g., checksum or hash)
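Several of these examples can be combined in one short sketch. It assumes a hypothetical DEBUG switch and trace helper for the debugging code, uses a relational loop test instead of equality, validates parameters with assertions, and uses CRC-32 from Python's zlib module as the error-checking code in data. All function names are illustrative.

```python
import zlib

DEBUG = False  # hypothetical switch: set True to display values at interfaces


def trace(name, value):
    """Debugging code built into the program, controlled by the DEBUG switch."""
    if DEBUG:
        print(f"[debug] {name} = {value!r}")


def sum_every(values: list[int], step: int) -> int:
    """Sum values[0], values[step], values[2*step], ..."""
    # Assertion checking: validate parameters (note: python -O strips asserts)
    assert step > 0, "step must be positive"
    trace("sum_every.step", step)
    total, i, n = 0, 0, len(values)
    # Relational test (i < n), not equality (i != n): with step=2 and odd n,
    # i jumps from n-1 to n+1, so an equality test would miss n and run
    # past the end of the list.
    while i < n:
        total += values[i]
        i += step
    trace("sum_every.total", total)
    return total


def load_record(data: bytes, stored_crc: int) -> bytes:
    """Error-checking code in data: reject a record whose CRC-32 does not match."""
    if zlib.crc32(data) != stored_crc:
        raise ValueError("checksum mismatch: record is corrupted")
    return data
```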
Some Notable Bugs
Even commercial systems may have serious bugs
• Built-in function in Fortran compiler (e0 = 0)
• Japanese microcode for Honeywell DPS virtual memory
• The microfilm plotter with the missing byte (1:1023)
• The Sun 3 page fault that IBM paid to fix
• Left handed rotation in the graphics package
• The preload system with the memory leak
Good people work around problems. The best people track them down and fix them!
Fault Tolerance
Aim:
A system that continues to operate even when problems occur.
Examples:
• Invalid input data (e.g., in a data processing application)
• Overload (e.g., in a networked system)
• Hardware failure (e.g., in a control system)
General Approach:
• Failure detection
• Damage assessment
• Fault recovery
• Fault repair
Fault Tolerance
Basic Techniques:
• Timers and timeout in real-time and networked systems
• After error continue with next transaction (e.g., discard data record, drop packet)
• Allow user break options (e.g., force quit, cancel)
• Use error correcting codes in data (e.g., RAID)
• Spare hardware (e.g., bad block tables on disk drives)
• Redundancy in data structures (e.g., forward and backward pointers)
Report all errors for quality control
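Two of the basic techniques, continuing with the next transaction after an error and using a timeout, are sketched below. The "name,amount" record format and the function names are hypothetical. Bad records are discarded but logged, so errors are still reported for quality control; the timeout uses the standard queue.Queue.get(timeout=...) call rather than a network socket so the sketch runs offline.

```python
import logging
import queue

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("batch")


def parse_record(line: str) -> tuple[str, int]:
    """Parse a hypothetical 'name,amount' record; raises ValueError if malformed."""
    name, amount = line.split(",")    # wrong field count -> ValueError
    return name.strip(), int(amount)  # non-numeric amount -> ValueError


def process_batch(lines: list[str]) -> tuple[list[tuple[str, int]], list[int]]:
    """After an error, continue with the next transaction instead of halting."""
    accepted, rejected = [], []
    for lineno, line in enumerate(lines, start=1):
        try:
            accepted.append(parse_record(line))
        except ValueError as exc:
            # Discard the bad record, but report it for quality control
            log.warning("record %d rejected: %r (%s)", lineno, line, exc)
            rejected.append(lineno)
    return accepted, rejected


def next_message(inbox: queue.Queue, timeout_s: float):
    """Timeout: wait at most timeout_s for a message rather than blocking forever."""
    try:
        return inbox.get(timeout=timeout_s)
    except queue.Empty:
        return None  # caller decides: retry, fail over, or alert
```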
Fault Tolerance
Backward Recovery:
• Record system state at specific events (checkpoints). After failure, recreate state at last checkpoint.
• Combine checkpoints with system log (audit trail of transactions) that allows transactions from last checkpoint to be repeated automatically.
• Test the restore software!
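Backward recovery can be sketched as a toy key-value store that snapshots its state at each checkpoint and logs every transaction since then; recovery restores the snapshot and replays the log. Everything here is held in memory for illustration (the class and method names are hypothetical); a real system would write the checkpoint and the log to stable storage.

```python
import copy


class RecoverableStore:
    """Toy key-value store using checkpoints plus a transaction log."""

    def __init__(self) -> None:
        self.state: dict[str, int] = {}
        self.checkpoint: dict[str, int] = {}  # snapshot at the last checkpoint
        self.log: list[tuple[str, int]] = []  # transactions since that checkpoint

    @staticmethod
    def _apply(state: dict[str, int], key: str, delta: int) -> None:
        state[key] = state.get(key, 0) + delta

    def apply(self, key: str, delta: int) -> None:
        self.log.append((key, delta))  # audit trail first, then the change
        self._apply(self.state, key, delta)

    def take_checkpoint(self) -> None:
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()  # older transactions are covered by the snapshot

    def crash(self) -> None:
        self.state = {}  # simulate losing in-memory state

    def recover(self) -> None:
        # Recreate the state at the last checkpoint ...
        self.state = copy.deepcopy(self.checkpoint)
        # ... then repeat the logged transactions automatically
        for key, delta in self.log:
            self._apply(self.state, key, delta)
```

In the spirit of "test the restore software", a test can crash the store on purpose and check that recovery reproduces the pre-crash state.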
Fault Tolerance
Google and Hadoop File Systems
• Clusters of commodity computers (1,000+ computers, 1,000+ TB)
"Component failures are the norm rather than the exception. ... We have seen problems caused by application bugs, operating system bugs, human errors, and the failures of disks, memory, connectors, networking, and power supplies."
• Data is stored in large chunks (64 MB).
• Each chunk is replicated, typically with three copies.
• If a component fails, new replicas are created automatically.
Ghemawat, et al., The Google File System. 19th ACM Symposium on Operating Systems Principles, October 2003
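The replication idea can be sketched as a toy "master" that records which servers hold each chunk and, when a server failure is detected, adds replicas on other live servers until each chunk is back to three copies. The placement policy (fewest replicas first) and all names are hypothetical simplifications; GFS's actual placement also weighs factors such as disk utilization and racks, and real re-replication copies chunk data between servers.

```python
REPLICATION = 3  # "typically with three copies"


class ChunkMaster:
    """Toy master that tracks which servers hold replicas of each chunk."""

    def __init__(self, servers: list[str]) -> None:
        self.live = set(servers)
        self.replicas: dict[int, set[str]] = {}

    def _load(self, server: str) -> int:
        return sum(server in holders for holders in self.replicas.values())

    def _least_loaded(self, exclude: set[str]) -> list[str]:
        # Hypothetical placement policy: fewest replicas first, ties by name
        return sorted(self.live - exclude, key=lambda s: (self._load(s), s))

    def add_chunk(self, chunk_id: int) -> None:
        # Gets fewer than REPLICATION copies if too few servers are live
        self.replicas[chunk_id] = set(self._least_loaded(set())[:REPLICATION])

    def server_failed(self, server: str) -> None:
        """On a detected failure, drop the server and re-replicate its chunks."""
        self.live.discard(server)
        for holders in self.replicas.values():
            holders.discard(server)
            for candidate in self._least_loaded(holders):
                if len(holders) >= REPLICATION:
                    break
                holders.add(candidate)  # stands in for copying the chunk data
```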
Fault Tolerance
N-version programming
• Execute independent implementations in parallel, compare their results, and accept the most probable (e.g., the majority result).
• Used when extreme reliability is required with no opportunity to repair (e.g., spacecraft).
• The difficulty is to ensure that the implementations are truly independent (e.g., separate power supplies, sensors, algorithms).
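A minimal sketch of N-version voting, assuming majority voting as the rule for "most probable". Three integer square root routines stand in for independent implementations: the standard library's math.isqrt (Python 3.8+), an integer Newton iteration, and a binary search with a deliberately planted off-by-one fault. For simplicity the versions run sequentially in one process; they are not truly independent, which is exactly the difficulty noted above.

```python
import math
from collections import Counter
from typing import Callable


def isqrt_library(n: int) -> int:
    return math.isqrt(n)  # version 1: standard library


def isqrt_newton(n: int) -> int:
    # version 2: integer Newton iteration
    if n < 2:
        return n
    x, y = n, (n + 1) // 2
    while y < x:
        x, y = y, (y + n // y) // 2
    return x


def isqrt_bisect_buggy(n: int) -> int:
    # version 3: binary search with a planted fault (hi should be n + 1)
    lo, hi = 0, n
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    return lo


def n_version(versions: list[Callable[[int], int]], arg: int) -> int:
    """Run every version, then accept the majority result."""
    results = []
    for version in versions:
        try:
            results.append(version(arg))
        except Exception:
            pass  # a crashed version simply casts no vote
    counts = Counter(results)
    if not counts:
        raise RuntimeError("all versions failed")
    value, votes = counts.most_common(1)[0]
    if 2 * votes <= len(versions):
        raise RuntimeError("no majority agreement")
    return value
```

For n = 1 the buggy version returns 0, but it is outvoted by the other two versions.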
Software Engineering for Real Time
The special characteristics of real-time computing require extra attention to good software engineering principles:
• Requirements analysis and specification
• Special techniques (e.g., locks on data, semaphores)
• Development of tools
• Modular design
• Exhaustive testing
Heroic programming will fail!
Software Engineering for Real Time
Testing and debugging need special tools and environments
• Debuggers and similar tools cannot be used to test real-time performance
• Simulation of the environment may be needed to test interfaces (e.g., adjustable clock speed)
• General purpose tools may not be available
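One way to simulate the environment is to inject the clock into time-dependent code, so tests can substitute an adjustable clock for the real one. In this sketch the Watchdog, which declares a sensor failed when readings stop arriving, and the three clock classes are hypothetical; ScaledClock runs faster than real time, and SimulatedClock advances only when the test tells it to.

```python
import time


class RealClock:
    def now(self) -> float:
        return time.monotonic()


class ScaledClock:
    """Runs `rate` times faster than real time (adjustable clock speed)."""

    def __init__(self, rate: float) -> None:
        self.rate = rate
        self.origin = time.monotonic()

    def now(self) -> float:
        return (time.monotonic() - self.origin) * self.rate


class SimulatedClock:
    """Fully controlled clock: time moves only when the test advances it."""

    def __init__(self, start: float = 0.0) -> None:
        self.t = start

    def now(self) -> float:
        return self.t

    def advance(self, seconds: float) -> None:
        self.t += seconds


class Watchdog:
    """Declares a sensor failed if no reading arrives within `deadline` seconds."""

    def __init__(self, clock, deadline: float) -> None:
        self.clock = clock
        self.deadline = deadline
        self.last_reading = clock.now()

    def reading_received(self) -> None:
        self.last_reading = self.clock.now()

    def expired(self) -> bool:
        return self.clock.now() - self.last_reading > self.deadline
```

With SimulatedClock a test can check timing logic deterministically and instantly; production code would pass RealClock().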
Validation and Verification
Validation: Are we building the right product?
Verification: Are we building the product right?
In practice, it is sometimes difficult to distinguish between the two.
That's not a bug. That's a feature!
Security in the Software Development Process
The security goal
The security goal is to make sure that the only agents (people or external systems) who interact with a computer system, its data, and its resources are those the owner of the system wishes to permit.
Security considerations need to be part of the entire software development process. They may have a major impact on the architecture chosen.
Example: Integration of Internet Explorer into Windows
Agents and Components
A large system will have many agents and components:
• each is potentially unreliable and insecure
• components acquired from third parties may have unknown security problems
• commercial off-the-shelf (COTS) problem
The software development challenge:
• develop secure and reliable components
• protect whole system so that security problems in parts of it do not spread to the entire system
Techniques: Barriers
Place barriers that separate parts of a complex system:
• Isolate components, e.g., do not connect a computer to a network
• Firewalls
• Require authentication to access certain systems or parts of systems
Every barrier imposes restrictions on permitted uses of the system
Barriers are most effective when the system can be divided into subsystems with simple boundaries
Techniques: Authentication & Authorization
Authentication establishes the identity of an agent:
• What the agent knows (e.g., password)
• What the agent possesses (e.g., smart card)
• Where the agent has access from (e.g., Ctrl-Alt-Del)
• What physical properties the agent has (e.g., fingerprint)
Authorization establishes what an authenticated agent may do:
• Access control lists
• Group membership
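Authorization with access control lists and group membership might be sketched as follows. The users, groups, resource name, and the "group:" prefix convention are all hypothetical. The check assumes the user has already been authenticated, and it denies by default: anything not explicitly granted is refused.

```python
# Hypothetical group membership
GROUPS: dict[str, set[str]] = {
    "payroll-staff": {"alice", "bob"},
    "auditors": {"carol"},
}

# Access control list per resource: principal (user or "group:<name>") -> actions
ACL: dict[str, dict[str, set[str]]] = {
    "salary-table": {
        "group:payroll-staff": {"read", "update"},
        "group:auditors": {"read"},
        "dave": {"read"},
    },
}


def groups_of(user: str) -> set[str]:
    return {g for g, members in GROUPS.items() if user in members}


def is_authorized(user: str, action: str, resource: str) -> bool:
    """Authorization for an already-authenticated user; default deny."""
    entries = ACL.get(resource, {})
    principals = {user} | {f"group:{g}" for g in groups_of(user)}
    return any(action in entries.get(p, set()) for p in principals)
```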
Example: An Access Model for Digital Content
[Figure: diagram relating Digital material (with Attributes), User, Roles, Actions, Operations, Access, and Policies]
Techniques: Encryption
Allows data to be stored and transmitted securely, even when the bits are viewed by unauthorized agents
• Private key and public key
• Digital signatures
Encryption: X → Y
Decryption: Y → X
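Public-key encryption and digital signatures can be illustrated with textbook RSA on tiny primes (p = 61, q = 53, a common teaching example). This toy is for illustration only and is not secure: real systems use large keys, padding schemes, and a vetted cryptographic library, and they sign a hash of the message. Encrypting with the public key and decrypting with the private key maps X to Y and back; signing reverses the roles of the two keys. The helper name apply_key is hypothetical.

```python
# Textbook RSA with tiny primes: for illustration ONLY, not secure.
p, q = 61, 53
n = p * q                # 3233, the modulus shared by both keys
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # private exponent, modular inverse of e (Python 3.8+)

public_key = (e, n)
private_key = (d, n)


def apply_key(value: int, key: tuple[int, int]) -> int:
    """Modular exponentiation with one half of the key pair."""
    exp, mod = key
    assert 0 <= value < mod, "textbook RSA only handles values smaller than n"
    return pow(value, exp, mod)


x = 65
y = apply_key(x, public_key)          # encryption: X -> Y
recovered = apply_key(y, private_key) # decryption: Y -> X

message = 123
signature = apply_key(message, private_key)            # sign with private key
valid = apply_key(signature, public_key) == message    # verify with public key
```

Here y is 2790, recovered is 65, and valid is True.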
Security and People
People are intrinsically insecure:
• Careless (e.g., leave computers logged on, use simple passwords, leave passwords where others can read them)
• Dishonest (e.g., stealing from financial systems)
• Malicious (e.g., denial-of-service attack)
Many security problems come from inside the organization:
• In a large organization, there will be some disgruntled and dishonest employees
• Security relies on trusted individuals. What if they are dishonest?
Design for Security: People
• Make it easy for responsible people to use the system
• Make it hard for dishonest or careless people (e.g., password management)
• Train people in responsible behavior
• Test the security of the system thoroughly and repeatedly, particularly after changes
• Do not hide violations
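One concrete part of password management is never storing passwords in the clear. A minimal sketch, assuming salted PBKDF2-HMAC-SHA256 from Python's hashlib: each user gets a random salt, only the salt and digest are stored, and verification uses a constant-time comparison. The iteration count is an assumption and should be taken from current guidance for your hardware; the function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional

ITERATIONS = 600_000  # work factor: an assumption, tune from current guidance


def hash_password(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the password itself."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    _, digest = hash_password(password, salt)
    # Constant-time comparison avoids leaking how many bytes matched
    return hmac.compare_digest(digest, stored_digest)
```

The deliberately slow hash makes guessing attacks on a stolen password file expensive, which helps protect careless users who choose simple passwords.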