Chapter 1: Data Storage

Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley

Computer Science: An OverviewTenth Edition

by J. Glenn Brookshear

Chapter 1:Data Storage

1-2

1-2


Chapter 1: Data Storage

• 1.1 Bits and Their Storage• 1.2 Main Memory• 1.3 Mass Storage• 1.4 Representing Information as Bit Patterns• 1.5 The Binary System

1-3

1-3


Chapter 1: Data Storage (continued)

• 1.6 Storing Integers• 1.7 Storing Fractions• 1.8 Data Compression• 1.9 Communications Errors

1-4

1-4


Bits and Bit Patterns

• Information is encoded as patterns of 0s and 1s. These digits are called bits (short for binary digits)

• Bit: Binary Digit (0 or 1)• Bit Patterns are used to represent information.

– Numbers– Text characters– Images– Sound– And others

1-5

1-5


Boolean Operations

• George Boole (1815-1864). Pioneer in the field of math called logic.

• Boolean Operation: An operation that manipulates one or more true/false values

• Specific operations– AND– OR– XOR (exclusive or)– NOT

1-6

1-6


Figure 1.1 The Boolean operations AND, OR, and XOR (exclusive or)

1-7

1-7


Gates

• Gate: A device that computes a Boolean operation

• A gate can be constructed from a variety of technologies such as gears, relays and optic devices.

• In today’s computers, gates are usually implemented as small electronic cirucits in which digits 0 and 1 are represented as voltage levels.

1-8

1-8


Figure 1.2 A pictorial representation of AND, OR, XOR, and NOT gates as well as their input and output values

1-9

1-9


Flip-flops

• Flip-flop: A circuit built from gates that can store one bit.

• Flip-flop produces an output value of 0 or 1, which remains constant until a temporary pulse from another circuit causes it to shift tot he other value.

• The output will flip or flop between two values under the control of external stimuli.– One input line is used to set its stored value to 1– One input line is used to set its stored value to 0– While both input lines are 0, the most recently stored

value is preserved

1-10

1-10


Figure 1.3 A simple flip-flop circuit

1-11

1-11


Figure 1.4 Setting the output of a flip-flop to 1

1-12

1-12


Figure 1.4 Setting the output of a flip-flop to 1 (continued)

1-13

1-13


Figure 1.4 Setting the output of a flip-flop to 1 (continued)

1-14

1-14


Figure 1.5 Another way of constructing a flip-flop

1-15

1-15


Number Systems

• Decimal (10-base): 1,2,3,4...• Binary (2): 0 , 1• Octal (8-base), 0...7• Hexadecimal (16-base). 0....F

• When considering the internal structure, the activities of a computer, we must deal with strings of bits, some of which can be quite long. – A long string of bits is often called stream. – Streams are difficult for human to comprehend. – a shorthand notation called hexadecimal notation.

1-16

1-16


Hexadecimal Notation

• Hexadecimal notation: A shorthand notation for long bit patterns– Divides a pattern into groups of four bits each– Represents each group by a single symbol

• Example: 10100011 becomes A3

1-17

1-17


Figure 1.6 The hexadecimal coding system

1-18

1-18


Main Memory Cells

• Main memory is organized in manageable units called cells with a typical cell size being 8 bits.

• Cell: A unit of main memory (typically 8 bits which is one byte). Although there is no left or right within a computer, we normally envision the bits arranged in a row

– Most significant bit: the bit at the left (high-order) end of the conceptual row of bits in a memory cell

– Least significant bit: the bit at the right (low-order) end of the conceptual row of bits in a memory cell

• Large computers may have billions of cells in their main memory.

1-19

1-19


Figure 1.7 The organization of a byte-size memory cell

1-20

1-20


Main Memory Addresses

• To identify individual cells, each cell is assigned a unique name.

• Address: A “name” that uniquely identifies one cell in the computer’s main memory– The names are actually numbers.– These numbers are assigned consecutively

starting at zero.– Numbering the cells in this manner associates

an order with the memory cells.

1-21

1-21


Figure 1.8 Memory cells arranged by address

1-22

1-22


Memory Terminology

• Random Access Memory (RAM): Memory in which individual cells can be easily accessed in any order

• Dynamic Memory (DRAM): RAM composed of volatile memory

1-23

1-23


Measuring Memory Capacity

• Kilobyte: 210 bytes = 1024 bytes– Example: 3 KB = 3 times1024 bytes– Sometimes “kibi” rather than “kilo”

• Megabyte: 220 bytes = 1,048,576 bytes– Example: 3 MB = 3 times 1,048,576 bytes– Sometimes “megi” rather than “mega”

• Gigabyte: 230 bytes = 1,073,741,824 bytes– Example: 3 GB = 3 times 1,073,741,824 bytes– Sometimes “gigi” rather than “giga”

• Terabyte• Petabyte• Exabyte• Zettabyte• Yottabyte• Brontobyte• Nisabyte (rumors)• Zotzabyte (rumors)

1-24

1-24


Mass Storage

• Hard disks, CDs, DVDs, Flash Drives etc.• Advantages over main memory

– On-line(attached to the device) versus off-line (detached from the device)

– Typically larger than main memory– Typically less volatile than main memory– Typically slower than main memory– Low cost– Large storage capacity– Less volatility

• Disadvantage– Requires mechanical motion and therefore require more time to

store and retrieve data

1-25

1-25


Mass Storage Systems

• Magnetic Systems– Disk– Tape

• Optical Systems– CD– DVD

• Flash Drives

1-26

1-26


Figure 1.9 A magnetic disk storage system

Locations of tracks and sectors are not permanent part of the disk’s physical structure.

The capacity of a disk storageDepends on the # of disks used and the density in which the tracks and sectors are placed.

1-27

1-27


Measurements for Disk Performance

• Seek time: the time required to move the read/write heads from one track to another

• Rotation delay (latency time): half the time required for the disk to make a compete rotation

• Access time: seek time + latency time• Transfer Rate: the reate at which data can

be transferred to or from the disk

1-28

1-28


Figure 1.10 Magnetic tape storage

1-29

1-29


Figure 1.11 CD storage

1-30

1-30


Files

• File: A unit of data stored in mass storage system– Fields and keyfields

• Physical record versus Logical record• Buffer: A memory area used for the

temporary storage of data (usually as a step in transferring the data)

1-31

1-31


Figure 1.12 Logical records versus physical records on a disk

1-32

1-32


Representing Text

• Each character (letter, punctuation, etc.) is assigned a unique bit pattern.– ASCII (American Standard Code for Infromation

Interchange): Uses patterns of 7-bits to represent most symbols used in written English text

– Unicode: Uses patterns of 16-bits to represent the major symbols used in languages world side

• Developed through cooperation of several leading manufactures of hardware and software

– ISO (International Standard Organization) standard: Uses patterns of 32-bits to represent most symbols used in languages world wide

1-33

1-33


Figure 1.13 The message “Hello.” in ASCII

1-34

1-34


Try encoding the following

1010100 1001000 1001001 10100110100000 1001001 1010011 01000001000001 0100000 1010011 10101001010101 1010000 1001001 10001000100000 1010000 1010010 1001111

1000010 1001100 1000101 10011010100001

1-35

1-35


Representing Numeric Values

• Storing information in terms of encoded characters is inefficient when the info is numeric.

• Binary notation is a way of representing numeric values• Binary notation: Uses bits to represent a number in base

two• Limitations of computer representations of numeric

values– Overflow – occurs when a value is too big to be represented– Truncation – occurs when a value cannot be represented

accurately

1-36

1-36


Representing Images

• Popular techniques for representing images can be classified into two:– Bit map techniques

• An image represented as a collection of dots, each of which is called pixel (picture element)

• Each pixel is represented by a combination of bits indicating the appearance of that pixel. There are two approaches:

– RGB– Luminance and chrominance

• Disadvantage is that an image cannot be reshaped to any arbitrary size

– Vector techniques• An image is represented as a collection of lines an curves• Scalable• TrueType and PostScript

1-37

1-37

Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley37

...Representation of Images

• Pictures:

– A picture must be transformed into numeric form before it can be stored or manipulated by the computer.

– Each picture is subdivided into a grid of squares called pixels (picture elements).

1-38

1-38



– An image on paper can be converted into pixels using a scanner.

– Digital cameras store their images as digital images, i.e. the picture is already stored in the cameras memory as pixels.

1-39

1-39



A picture with only black and white pixels:1 represents black.0 represents white.

1-40

1-40



=

010101010101010101010110101101001001000111110000

011010101010101010101001011010010110010100000110

100101010101010101010110110001010000101001010100

101101101011011010110101100110010110100010001001

011010010110100101101010001001100100101101010010

100101101100101011010101110110011001010010101100

011010010011010110010010001001100110101010010001

010101101100101100100101110110011001010100100101

010101010101010011011010001001100010100001010100

101010101010101100010010110010001101001110100001

010101010101010001000101000101101000010000001101

110110101010010100110100011010010011100101101000

101001010100100010100101100101101100001010000010

101011010001001001001001011110101011010100101100

101010000100010010010111110101111100101001001001

010100101001000100101010101110101011010010010000

101001000010011001101111101011101010101000100101

010010010100100011011000011110111011010110101000

000100000001001100100111111111110110111000000010

101000101010010011011000010101011101000010101000

000010000100101101010011111111111111011101000101

001000101001101010100100011101111110100010010000

010010010110001001001001111011110101101100100101

100100100000111010010010010111111111011001001000

1-41

1-41


Representation of Images (5)

• Photographic quality images have a gray-scale.

Several shades between black and white are used.

1-42

1-42



• 4 level gray-scale means 4 shades are used.

Each pixel needs 2 bits: 00 - represents white01 - represents light gray10 - represents dark gray11 - represents black

This picture has 4 levels of gray (this uses four bits).

1-43

1-43



256 level gray-scale means

8 bits per pixel are needed for 256 shades of gray.

This picture has 256 levels of gray(this uses eight bits).

1-44

1-44



We could also use 8 bits (known as a byte) to represent the colour of a pixel.

A byte can represent 256 different numbers, so we can have 256 different colours in the image.

This picture uses 256 colours.

1-45

1-45


Graphics Colours

• Red, Green, Blue (RGB)– The primary light colours use three values

per pixel.– One number is used for each of the amounts

of Red, Green and Blue on the computer screen.

Red Green Blue

Colour of pixel

1-46

1-46


Representation of Images (again)

This is a full colour image.

1-47

1-47


Representing Sound

• Sampling techniques– The most generic method– Sample the amplitude of the sound wave at regular

intervals and record the series of values obtained.– Used for high quality recordings– Records actual audio

• MIDI– Used in music synthesizers– Records “musical score”– Wave Table Synthesis

1-48

1-48


Sampling: Analog-to-Digital Converter (ADC)

Each dot in the figure above represents one audio sample. There are two factors that determine the quality of a digital recording: Sample rate: The rate at which the samples are captured, measured in Hertz (Hz), or samples per second. An audio CD has a sample rate of 44,100 Hz (44 KHz)Sample format or sample size: the number of digits in the digital representation of each sample.

1-49

1-49


...Representing Sound

– These voltages are sent down the speaker wires to produce sound.

A sound wave represented by the sequence:

0, 1.5, 2.0, 1.5, 2.0, 3.0, 4.0, 3.0, 0 (Amps)

1-50

1-50


...Representing Sound

1-51

1-51


The Binary System

• A means of representing numeric values using only the digit 0 and 1.

• The traditional decimal system is based on powers of ten.

• The Binary system is based on powers of two.

1-52

1-52


Figure 1.15 The base ten and binary systems

1-53

1-53


Decoding the Binary representation 100101

1-54

1-54


Figure 1.17 An algorithm for finding the binary representation of a positive integer

1-55

1-55


Figure 1.18 Applying the algorithm in Figure 1.15 to obtain the binary representation of thirteen

1-56

1-56


Figure 1.19 The binary addition facts

• 0 + 0 = 0 • 1 + 0 = 1• 0 + 1 = 1 • 1 + 1 = 10 • 1 + 1 + 1 = 11

1-57

1-57


Fractions in Binary Numbers

1-58

1-58


Storing Integers

• Two’s complement notation: The most popular means of representing integer values– it is common to use a two’s complement system in which each

value is represented by apatterns of 32 bits.– Left most bit of a bit pattern indicates the sign of the value

• 1: negative• 0: positive

• Excess notation: Another means of representing integer values

• Both can suffer from overflow errors.

1-59

1-59


Figure 1.21 Two’s complement notation systems

1-60

1-60


Figure 1.22 Coding the value -6 in two’s complement notation using four bits

1-61

1-61


Figure 1.23 Addition problems converted to two’s complement notation

A major benefit of two’sComplement is that addition of any combination of signed numbers can be accomplished using the same algorithm and thusthe same circuit7 - 5 = 7 + (-5)

1-62

1-62


Figure 1.24 An excess eight conversion table

Another notation for representing integer values is excess notation.

It takes longer time than 2’s complement. It plays an important part in computing floating point representation.

1-63

1-63


Figure 1.25 An excess notation system using bit patterns of length three

1-64

1-64


Storing Fractions

• Storage of fractions requires store of the value and also the position of the radix point.

• Floating-point Notation: Consists of a sign bit, a mantissa field, and an exponent field.

• Related topics include– Normalized form– Truncation errors

• Part of the value being stored is lost because mantissa field is not large enough.

1-65

1-65


Figure 1.26 Floating-point notation components

1-66

1-66


Figure 1.27 Encoding the value 2 5⁄8

1-67

1-67


Data Compression

• Lossy versus lossless• Run-length encoding (lossless method)

– Replaces sequences of identical data elements with a code indicating the element that is repeated and the number of times it occurs in the sequence.

– Example: 253 ones 118 zeros 87 ones.

• Frequency-dependent encoding

(Huffman codes)• Relative encoding

– Record the differences between consecutive data units rahter than entire units

• Dictionary encoding (Includes adaptive dictionary encoding such as LZW encoding.)

1-68

1-68


Frequency-dependent encoding Huffman Codes

• Suppose we have a message consisting of 5 symbols, e.g. [►♣♣♠☻►♣☼►☻]

• How can we code this message using 0/1 so the coded message will have minimum length (for transmission or saving!)

• 5 symbols at least 3 bits • For a simple encoding,

length of code is 10*3=30 bits

1-69

1-69


Frequency-dependent encoding Huffman Codes (cont.)

• Those symbols that are more frequent should have smaller codes, yet since their length is not the same, there must be a way of distinguishing each code

• For Huffman code,

length of encoded message

will be ►♣♣♠☻►♣☼►☻

=3*2 +3*2+2*2+3+3=24bits

1-70

1-70


Dictionary Encoding (LZW)

• Lempel-Ziv-Welsh (LZW)– Algorithm works by scanning through the input

string for successively longer substrings until it finds one that is not in the dictionary.

– If such a string is found, it is added to the dictionary– Example: xyx xyx xyx xyx

• 1-x• 2-y• 3 (single space)• 4 xyx

– Message will be 12343434

1-71

1-71


Compressing Images

• GIF (Graphic Interchange Format): Good for cartoons– Reduces the number of colors that can be assigned to

a pixel to only 251 (lossy compression)

• JPEG (Joint Photographic Express Group): Good for photographs

• TIFF: Good for image archiving– Not a data compression but a standard format for

storing photo along with related info such as date, time.

1-72

1-72


Compressing Audio and Video

• MPEG (Motion Picture Experts Group)– High definition television broadcast– Video conferencing

• MP3 (MPEG Layer 3)– Temporal masking

• For a short period after a loud sound, the human ear cannot detect softhen sounds that would otherwise be available

– Frequency masking• A sound at one frequency tends to mask softer sounds at

near by frequencies.

1-73

1-73


Communication Errors

• Parity bits (even versus odd)• Checkbytes• Error correcting codes

1-74

1-74


Figure 1.28 The ASCII codes for the letters A and F adjusted for odd parity

1-75

1-75


Figure 1.29 An error-correcting code

1-76

1-76


Figure 1.30 Decoding the pattern 010100 using the code in Figure 1.30

Documents

Chapter 1: Data Storage