File Organization, Chapter 4: Data Management in Files
Prof. Yousef B. Mahdy, 2012-2013, Assuit University, Egypt

1-2 - Prof Yousef B. Mahdy- 04/19/23 File Organization

Data Representation in Memory

Persistent: retained after execution of the program that created it.
» When we build file structures, we are making it possible to make data persistent. That is, one program can store data from memory to a file and terminate. Later, another program can retrieve the data from the file and process it in memory.

The basic logical unit of data is the field which contains a single data value.

Fields are organized into aggregates, either as many copies of a single field (an array) or as a list of different fields (a record).

When a record is stored in memory, we refer to it as an object and refer to its fields as members.

When a record is stored in a file, we call it simply a record. In this chapter, we look at file structures that can be used to organize the data within the file, and at the algorithms that can be used to store and retrieve the data sequentially.

Record: a subdivision of a file, containing data related to a single entity.

Field: a subdivision of a record containing a single attribute of the entity that the record describes.

Stream of bytes: a file that is regarded as being without structure beyond separation into a sequential set of bytes.

Key: a subset of the fields in a record used to identify (uniquely) the record.

Within a program, data is temporarily stored in variables. Individual values can be aggregated into structures, which can be treated as a single variable with parts. In C++, classes are typically used as an aggregate structure.

C++ Person class

class Person {
public:
    char FirstName[11];
    char LastName[11];
    char Address[21];
    char City[21];
    char State[3];   // 2 characters + null terminator
    char ZIP[6];     // 5 digits + null terminator
};

With this class declaration, variables can be declared to be of type Person. The individual fields within a Person can be referred to by the name of the variable and the name of the field, separated by a period (.).
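As a small illustration of the variable.member notation, the sketch below declares the same Person class and fills two of its members. The helper name setName and the sample values are hypothetical; strncpy plus explicit termination is used so that a value longer than its array is truncated instead of overflowing it.

```cpp
#include <cstring>

// Person class with fixed-size char arrays, each sized for the
// longest expected value plus the terminating null character.
class Person {
public:
    char FirstName[11];
    char LastName[11];
    char Address[21];
    char City[21];
    char State[3];
    char ZIP[6];
};

// Hypothetical helper: members are reached as variable.member.
// At most size-1 characters are copied, then the array is terminated.
void setName(Person& p, const char* first, const char* last) {
    std::strncpy(p.FirstName, first, sizeof(p.FirstName) - 1);
    p.FirstName[sizeof(p.FirstName) - 1] = '\0';
    std::strncpy(p.LastName, last, sizeof(p.LastName) - 1);
    p.LastName[sizeof(p.LastName) - 1] = '\0';
}
```

For example, after `Person p; setName(p, "Mary", "Ames");` the expression `p.FirstName` holds "Mary".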

In memory, each Person will appear as an aggregate, with the individual values being parts of the aggregate:

The output of this program will be:

Obviously, this output could be improved.  It is marginally readable by people, and it would be difficult to program a computer to read and correctly interpret this output.

In stream files, the information is written as a stream of bytes containing no added information:

Problem: There is no way to get the information back in the organized record format.

The question: when we write records, how do we organize the fields in the records
» so that the information can be recovered,
» so that we save space,
» so that we can process the records efficiently, and
» so that record structure flexibility is maximized?

We must add structure to the file to maintain the identity of fields.

Simple representation: a file organized as a stream of bytes. It is simple, but it suffers from the Reverse Humpty-Dumpty problem:

» If all the information is written as a single stream of bytes, there is no way to take it apart again.

» Solution: use a field structure.
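The Reverse Humpty-Dumpty problem can be seen in a few lines: when fields are written with nothing between them, the field boundaries are gone. This is a minimal sketch; the three-field Person struct and the sample names are hypothetical, and an ostringstream stands in for the file.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical minimal record with three string fields.
struct Person {
    std::string last, first, city;
};

// Write a person as a raw stream of bytes: the fields are simply
// concatenated, with no lengths or separators recording where one
// field ends and the next begins.
void writeStream(std::ostream& out, const Person& p) {
    out << p.last << p.first << p.city;
}
```

Writing {"Ames", "Mary", "Stillwater"} and then {"Mason", "Alan", "Ada"} produces AmesMaryStillwaterMasonAlanAda, and nothing in those bytes tells a reader where Ames ends and Mary begins.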

Data Management in Files

The study of file structures has two main concerns: first, devising better ways to organize data in files, and second, designing methods to access data from files. This part deals with the first question; all subsequent parts deal with the second. The method used to organize data in a file affects the methods designed to access it.

Organizational Hierarchy of Data in Files

A file is a collection of records. A record contains related data items, which are called fields. Therefore, to organize data in files, both fields and records have to be organized.

Key: a subset of the fields in a record used to identify (usually uniquely) the record.

Consider storing student records in a file, where each record consists of fields such as University Seat Number (USN), Name, Branch and Semester. The different methods of organizing this file are discussed in the following part.

Field Structures

The first method of organizing fields is to limit the maximum size of each field (Fig. a). This is called Fixed Length Fields. The advantage of this method is that, since the size of each field is fixed, the entire field can be read at once.

The disadvantage is that a great deal of space is wasted when the value stored in a field does not use all the space reserved for it.

This is a good method for organizing fields if their sizes are known well in advance.

The main problems to be addressed when organizing fields are how the application in hand uses each field and how to distinguish one field from the next.
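A minimal sketch of Fixed Length Fields, assuming values are padded with blanks (so any trailing blanks in a real value would be lost) and truncated if too long. The function names packFixed and unpackFixed are illustrative only.

```cpp
#include <cstddef>
#include <string>

// Pad (or truncate) a value to exactly `width` characters so every
// field occupies the same number of bytes in the record.
std::string packFixed(const std::string& value, std::size_t width) {
    std::string field = value.substr(0, width);  // truncate if too long
    field.resize(width, ' ');                     // pad with blanks
    return field;
}

// Read back the field that starts at `offset`, stripping the blank
// padding. Because widths are fixed, every field's offset is known in
// advance, so each field can be read in one operation.
std::string unpackFixed(const std::string& record, std::size_t offset,
                        std::size_t width) {
    std::string field = record.substr(offset, width);
    std::size_t end = field.find_last_not_of(' ');
    return end == std::string::npos ? "" : field.substr(0, end + 1);
}
```

Packing "Ames" and "Mary" into 10-byte fields gives a 20-byte record of which 12 bytes are padding, which is exactly the wasted space described above.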

To overcome the problem of distinguishing one field from the next, several methods can be used. In one method, known as Length Indicator Fields, the length of each field is specified as a prefix to the actual data (Fig. b).

Another method of distinguishing between two fields is to place a separator between them. (In the previous method, the length of each field acts as the separator.) Any special character that is not part of the actual data can be used as a separator (Fig. c). This method is known as Delimited Fields.

In the Self-Describing Fields method, as shown in Fig. 1d, every field is preceded by metadata describing the data that follows it.
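The first two methods can be sketched side by side. Assumptions: the length indicator is two decimal characters (so each field must be under 100 bytes), and the delimiter is a character that never occurs in the data, such as '|'. The function names are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Length Indicator Fields: each field is prefixed by its length,
// written as two decimal digits.
std::string packLengthIndicated(const std::vector<std::string>& fields) {
    std::string out;
    for (const std::string& f : fields) {
        if (f.size() > 99) throw std::length_error("field too long");
        if (f.size() < 10) out += '0';            // keep the prefix 2 digits wide
        out += std::to_string(f.size()) + f;
    }
    return out;
}

std::vector<std::string> unpackLengthIndicated(const std::string& data) {
    std::vector<std::string> fields;
    std::size_t pos = 0;
    while (pos + 2 <= data.size()) {
        std::size_t len = std::stoul(data.substr(pos, 2));  // read the prefix
        fields.push_back(data.substr(pos + 2, len));         // then the data
        pos += 2 + len;
    }
    return fields;
}

// Delimited Fields: each field is followed by the delimiter.
std::string packDelimited(const std::vector<std::string>& fields, char delim) {
    std::string out;
    for (const std::string& f : fields) out += f + delim;
    return out;
}

std::vector<std::string> unpackDelimited(const std::string& data, char delim) {
    std::vector<std::string> fields;
    std::size_t start = 0, end;
    while ((end = data.find(delim, start)) != std::string::npos) {
        fields.push_back(data.substr(start, end - start));
        start = end + 1;
    }
    return fields;
}
```

For the fields Ames, Mary and Stillwater, the two encodings are 04Ames04Mary10Stillwater and Ames|Mary|Stillwater|.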

Representing Record or Field Length

Record or field length can be represented in either binary or character form. The length can be considered another hidden field within the record.

This length field can be either fixed length or delimited.

When character form is used, a space can be used to delimit the length field.

A two-byte fixed-length field could be used to hold lengths of 0 to 65535 bytes in binary form.

A two-byte fixed-length field could be used to hold lengths of 0 to 99 bytes in decimal character form.

A variable-length field delimited by a space could be used to hold effectively any length.
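The three representations can be compared in a short sketch. The byte order of the binary form (high byte first) is an assumption; any fixed order works as long as reader and writer agree. Function names are illustrative.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Binary form: 2 bytes hold any length from 0 to 65535.
std::string encodeBinaryLength(std::uint16_t len) {
    std::string out(2, '\0');
    out[0] = static_cast<char>(len >> 8);    // high byte
    out[1] = static_cast<char>(len & 0xFF);  // low byte
    return out;
}

std::uint16_t decodeBinaryLength(const std::string& two) {
    return static_cast<std::uint16_t>(
        (static_cast<unsigned char>(two[0]) << 8) |
         static_cast<unsigned char>(two[1]));
}

// Character form in 2 bytes: two decimal digits hold only 0 to 99.
std::string encodeCharLength(unsigned len) {
    if (len > 99) throw std::out_of_range("needs more than 2 digits");
    std::string out(2, '0');
    out[0] = static_cast<char>('0' + len / 10);
    out[1] = static_cast<char>('0' + len % 10);
    return out;
}

// Character form delimited by a space: any length fits.
std::string encodeDelimitedLength(unsigned long len) {
    return std::to_string(len) + ' ';
}
```

The same two bytes that can express 65535 in binary can express at most 99 as decimal characters, which is the trade-off between compactness and human readability.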

This metadata helps in understanding the meaning of the data.

In the previous methods, all the fields had to be stored in a particular order. In this method, the fields can be stored in any order. The application program is able to interpret the data because the actual data is preceded by metadata.
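One common way to realize self-describing fields is keyword=value pairs. In the sketch below, the '=' and '|' syntax, the function names and the sample USN value are illustrative assumptions. Because each value carries its own keyword, the pairs can appear in any order.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Write each field as keyword=value followed by '|'.
std::string packKeyed(const std::map<std::string, std::string>& rec) {
    std::string out;
    for (const auto& kv : rec) out += kv.first + "=" + kv.second + "|";
    return out;
}

// Parse keyword=value| pairs in whatever order they appear.
std::map<std::string, std::string> unpackKeyed(const std::string& data) {
    std::map<std::string, std::string> rec;
    std::size_t start = 0, end;
    while ((end = data.find('|', start)) != std::string::npos) {
        std::string pair = data.substr(start, end - start);
        std::size_t eq = pair.find('=');
        if (eq != std::string::npos)
            rec[pair.substr(0, eq)] = pair.substr(eq + 1);
        start = end + 1;
    }
    return rec;
}
```

The price of this flexibility is space: every record repeats the keywords.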

Having seen different methods of organizing fields, the focus now shifts to organizing records. Since records are not structurally different from fields, most of the methods used for organizing fields can also be used for records.

Comparison

Record Structures

The first method is the fixed-length record structure, where each record is stored in a fixed-size slot (Fig. a). The size can be determined by adding the maximum space occupied by each field and some space reserved for header data. Although the size of the entire record is fixed, the fields inside the record can be of either fixed or varying size.

The second method is a simple variant of the first. Whereas in the first method the length of the records is fixed, here the number of fields in each record is fixed (Fig. b). This fixed field count structure is helpful because it combines the flexibility of using any field structure with the ability to read a record's data, since the number of fields in each record is known a priori.
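A fixed field count needs no record delimiter: the reader counts fields and starts a new record after every N of them. This sketch assumes '|'-delimited fields and the four student fields (USN, Name, Branch, Semester); the function name and sample data are hypothetical.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Every record has exactly `fieldCount` '|'-delimited fields, so
// counting fields is enough to find where each record ends.
std::vector<std::vector<std::string>> readRecords(std::istream& in,
                                                  std::size_t fieldCount) {
    std::vector<std::vector<std::string>> records;
    std::vector<std::string> current;
    std::string field;
    while (std::getline(in, field, '|')) {
        current.push_back(field);
        if (current.size() == fieldCount) {  // record complete
            records.push_back(current);
            current.clear();
        }
    }
    return records;
}
```

With four fields per record, the stream 1AB10CS001|Mary|CS|5|1AB10CS002|Alan|EC|3| splits into two records.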

The third method is the familiar technique of specifying the length of each record (Fig. c). The length indicator record structure has the same advantages discussed earlier for the corresponding field structure.

The Delimited Record structure uses a separator between two records (Fig. d).

The final organization is to create an index structure for the records (Fig. e). An index is a collection of entries, each made of a key field and a reference field. The key field is a member of the record that can uniquely identify it, and the reference field contains the address of the corresponding record in the file.

Fixed length record: a record that is predetermined to be the same length as the other records in the file.

Examples:

Advantage: the offset of each record can be calculated from its record number. This makes direct access possible.
Advantage: there is no space overhead.
Disadvantage: there will probably be internal fragmentation (unusable space within records).

Algorithms for Fixed Length Records

Reading:
» While the number of characters read is less than the record length, read a character into the next element of the array.

Writing:
» While the number of characters written is less than the record length, write a character from the next element of the array.
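The two loops above translate directly into C++. This sketch assumes a hypothetical 16-byte record length and blank padding for short data; it moves one character at a time to mirror the algorithm, although istream::read and ostream::write would do the same in one call.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

const std::size_t RECLEN = 16;  // hypothetical record length in bytes

// Writing: emit exactly RECLEN characters; short data is padded with
// blanks so every record has the same length.
void writeFixed(std::ostream& out, const std::string& data) {
    std::string rec = data.substr(0, RECLEN);
    rec.resize(RECLEN, ' ');
    for (std::size_t written = 0; written < RECLEN; ++written)
        out.put(rec[written]);
}

// Reading: read characters until RECLEN have been read.
// Returns false if the file ends before a full record is read.
bool readFixed(std::istream& in, std::string& rec) {
    rec.clear();
    char c;
    while (rec.size() < RECLEN && in.get(c)) rec.push_back(c);
    return rec.size() == RECLEN;
}
```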

Delimited Variable Length Records

Variable length record:
» A record that can differ in length from the other records of the file.

Delimited record:
» A variable length record that is terminated by a special character or sequence of characters.

Delimiter:
» A special character or group of characters stored after a field or record, which indicates the end of the preceding unit.

Disadvantage: the offset of each record cannot be calculated from its record number. This makes direct access impossible.
Disadvantage: there is space overhead for the delimiter suffix.
Advantage: there will probably be no internal fragmentation (unusable space within records).

Algorithms for Delimited Variable Length Records

Reading:
» While the last character read is not the delimiter, read a character into the next element of the array.

Writing:
» While the number of characters written is less than the record length, write a character from the next element of the array.
» Write the delimiter.
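A minimal sketch of both algorithms, assuming '#' as the record delimiter (a hypothetical choice that must never occur inside a record). std::getline with a delimiter argument performs the "read until the delimiter" loop: it consumes the delimiter but does not store it.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

const char REC_DELIM = '#';  // hypothetical record delimiter

// Writing: copy the record's characters, then write the delimiter.
void writeDelimited(std::ostream& out, const std::string& rec) {
    out << rec << REC_DELIM;
}

// Reading: collect characters until the delimiter has been read.
bool readDelimited(std::istream& in, std::string& rec) {
    return static_cast<bool>(std::getline(in, rec, REC_DELIM));
}
```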

Length Prefixed Variable Length Records

Disadvantage: the offset of each record cannot be calculated from its record number. This makes direct access impossible.
Disadvantage: there is space overhead for the length prefix.
Advantage: there will probably be no internal fragmentation (unusable space within records).

Algorithms for Prefixed Variable Length Records
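A minimal sketch of writing and reading length-prefixed records, assuming a 2-byte binary length stored high byte first (so records of up to 65535 bytes). Writing emits the length and then the data; reading takes the length first and then reads exactly that many bytes.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Writing: 2-byte length prefix, then the record's bytes.
void writePrefixed(std::ostream& out, const std::string& rec) {
    if (rec.size() > 0xFFFF) throw std::length_error("record too long");
    out.put(static_cast<char>(rec.size() >> 8));
    out.put(static_cast<char>(rec.size() & 0xFF));
    out.write(rec.data(), static_cast<std::streamsize>(rec.size()));
}

// Reading: read the 2-byte length, then exactly that many bytes.
bool readPrefixed(std::istream& in, std::string& rec) {
    unsigned char hdr[2];
    if (!in.read(reinterpret_cast<char*>(hdr), 2)) return false;
    std::size_t len = (std::size_t(hdr[0]) << 8) | hdr[1];
    rec.assign(len, '\0');
    if (len == 0) return true;
    return static_cast<bool>(in.read(&rec[0], static_cast<std::streamsize>(len)));
}
```

Unlike the delimited form, the reader never has to inspect every character to find the end of a record, but it still has to visit records in order.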

Indexed Variable Length Records

An auxiliary file can be used to point to the beginning of each record. In this case, the data records can be contiguous. If the records are contiguous, the only access is through the index file.

Advantage: the offset of each record is contained in the index and can be looked up from its record number. This makes direct access possible.

Disadvantage: there is space overhead for the index file.
Disadvantage: there is time overhead for the index file.
Advantage: there will probably be no internal fragmentation (unusable space within records).

The time overhead for accessing the index file can be minimized by reading the entire index file into memory when the files are opened.
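A minimal sketch of indexed variable length records with the index held in memory. Assumptions: a std::string stands in for the data file and a std::vector of byte offsets stands in for the index file; saving and loading the index are omitted. The records are stored back to back, and record n spans from index[n] to the next record's offset.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct IndexedFile {
    std::string data;                // stands in for the data file
    std::vector<std::size_t> index;  // index[n] = byte offset of record n

    void append(const std::string& rec) {
        index.push_back(data.size());  // new record starts at current end
        data += rec;                   // records are contiguous
    }

    // Direct access by record number through the index.
    std::string read(std::size_t rrn) const {
        std::size_t start = index.at(rrn);  // throws if rrn is out of range
        std::size_t end = rrn + 1 < index.size() ? index[rrn + 1] : data.size();
        return data.substr(start, end - start);
    }
};
```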

Packing and Buffering

strcat

The C++ strcat function is short for "string concatenate." strcat appends a copy of a source string to a destination string. The null terminator of the destination is overwritten by the first character of the source, and a new null character is appended to the end of the resulting string.

Example

char string1[80];
char string2[80];
char string3[80];
strcpy(string1, "This string ");
strcpy(string2, "Rose");
strcpy(string3, "s smell like old shoes.");
strcat(string1, "is concatenated.");  // string1 is now "This string is concatenated."
strcat(string2, string3);             // string2 is now "Roses smell like old shoes."
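In the context of packing, strcat can build a record buffer field by field. This is a sketch assuming '|' as the field delimiter; the function name packRecord and the sample fields are hypothetical. strcat performs no bounds checking, so the caller must supply a buffer large enough for all fields, all delimiters and the final null character.

```cpp
#include <cstring>

// Pack the fields of a record into one buffer, following each field
// with a '|' delimiter.
void packRecord(char* buffer, const char* const fields[], int count) {
    std::strcpy(buffer, "");             // start with an empty string
    for (int i = 0; i < count; ++i) {
        std::strcat(buffer, fields[i]);  // append the field ...
        std::strcat(buffer, "|");        // ... then its delimiter
    }
}
```

Packing Ames, Mary, 123 Maple, Stillwater, OK and 74075 yields the 40-character buffer Ames|Mary|123 Maple|Stillwater|OK|74075|, ready to be written to the file in a single operation.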

Direct Access Using RRN

A relative file, or relative record file, has an organization in which each record in the file can be accessed by specifying the record's position relative to the position of the first record of the file. This is an exact analogue of an array, where an element is accessed by specifying a "subscript" value, that is, the position of the element relative to the position of the first element of the array.

In a relative file access operation, the position value of a record is called the record's relative record number (RRN), or just its record number.

The record's byte position in the file is based on these parameters:
» the byte address of the first byte of the first record of the file
» the size of the records in the file (recsize)
» the RRN value of the target record
» the RRN of the first record
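Combining the four parameters gives offset = firstByte + (rrn - firstRRN) * recsize. The sketch below computes that offset and seeks straight to it; the parameter names are illustrative, and an istringstream with an 8-byte header stands in for a real file in the usage.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Byte offset of a record in a relative file.
long long recordOffset(long long firstByte, long long recsize,
                       long long rrn, long long firstRRN) {
    return firstByte + (rrn - firstRRN) * recsize;
}

// Direct access: seek to the computed offset and read one
// fixed-length record; no earlier records have to be read.
bool readByRRN(std::istream& in, long long firstByte, long long recsize,
               long long rrn, long long firstRRN, std::string& rec) {
    in.clear();  // reset any EOF state left by an earlier read
    in.seekg(recordOffset(firstByte, recsize, rrn, firstRRN));
    rec.assign(static_cast<std::size_t>(recsize), '\0');
    return static_cast<bool>(in.read(&rec[0], recsize));
}
```

For example, with 64-byte records starting at byte 0 and the first RRN equal to 0, record 5 begins at byte 5 * 64 = 320.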

Example-2: Write a C++ program to read and write student objects with fixed-length records and the fields delimited by "|".

Solution: the records are fixed length but the field lengths vary, so the delimiter "|" is used.
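One possible sketch of such a program, not the original solution. Assumptions: the Student fields are the USN, Name, Branch and Semester named earlier, the record length is a hypothetical 48 bytes, and blank padding fills each record after its last delimiter. Fixed-length records also allow direct access by RRN.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

struct Student {
    std::string usn, name, branch, semester;
};

const std::size_t RECLEN = 48;  // assumed fixed record length in bytes

// Pack the fields into a '|'-delimited buffer, then pad with blanks
// to RECLEN so every record occupies the same number of bytes.
void writeStudent(std::ostream& out, const Student& s) {
    std::string buf = s.usn + '|' + s.name + '|' + s.branch + '|' + s.semester + '|';
    if (buf.size() > RECLEN) throw std::length_error("record does not fit");
    buf.resize(RECLEN, ' ');
    out.write(buf.data(), static_cast<std::streamsize>(RECLEN));
}

// Read one fixed-length record and split it on '|'; the blank padding
// after the fourth delimiter is ignored.
bool readStudent(std::istream& in, Student& s) {
    std::string buf(RECLEN, '\0');
    if (!in.read(&buf[0], static_cast<std::streamsize>(RECLEN))) return false;
    std::istringstream fields(buf);
    std::getline(fields, s.usn, '|');
    std::getline(fields, s.name, '|');
    std::getline(fields, s.branch, '|');
    std::getline(fields, s.semester, '|');
    return true;
}

// Because every record is RECLEN bytes, record n starts at n * RECLEN.
bool readStudentByRRN(std::istream& in, std::size_t rrn, Student& s) {
    in.clear();
    in.seekg(static_cast<std::streamoff>(rrn * RECLEN));
    return readStudent(in, s);
}
```

In a real program the stream would be an std::fstream opened with std::ios::in | std::ios::out | std::ios::binary; the stringstream in the usage only keeps the sketch self-contained.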
