Sorting Large Files Part One: Why even bother? And a simple solution.



Sorting Large Files

Part One: Why even bother? And a simple solution.


Starter Questions

Why sort a large data file? To speed up searching.

Why not sort a large data file? Because adding and deleting data becomes difficult.


Searching Unsorted Files

Algorithm: Sequential Search

Start at the top of the file and inspect each record until the target is found.

Efficiency (number of compares):
    best case: 1
    worst case: N
    average case: N / 2

The average search of 1,000,000 records takes 500,000 compares.

Big O: O(N)
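The sequential search described above can be sketched in a few lines. This is a minimal illustration, assuming the records are short strings held in a Python list rather than read from disk; the function name `sequential_search` and the compare counter are hypothetical, added only to make the best and worst cases visible.

```python
from typing import Iterable, Optional, Tuple


def sequential_search(records: Iterable[str], target: str) -> Tuple[Optional[int], int]:
    """Return (index of target or None, number of compares made)."""
    compares = 0
    for index, record in enumerate(records):
        compares += 1
        if record == target:
            return index, compares
    # Target is absent: every one of the N records was inspected.
    return None, compares


# Unsorted sample data (assumed for illustration).
records = ["dog", "aardvark", "hippopotamus", "cat", "bat"]
print(sequential_search(records, "dog"))  # best case: found on the first compare
print(sequential_search(records, "bat"))  # worst case: found on the last compare
```

An unsuccessful search also costs N compares, which is why an unsorted file is expensive to search when many lookups miss.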


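Searching Sorted Files

Example 1: Sequential Search

Example 2: Binary Search

Basic algorithm:

    look at the middle record
    if (target == current record)
        found
    else if (target < current record)
        look at the front half
    else
        look at the end half
    repeat on the chosen half until the record is found or the half is empty

Big O: O(log2(N))

A search of 1,000,000 records takes at most about 20 compares.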
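The binary search outline above can be made concrete as follows. This is a sketch under the assumption that the sorted records are already in memory as a Python sequence; the name `binary_search` and the probe counter are hypothetical and exist only to show how few records are inspected.

```python
from typing import Any, Optional, Sequence, Tuple


def binary_search(records: Sequence[Any], target: Any) -> Tuple[Optional[int], int]:
    """Return (index of target or None, number of records inspected)."""
    low, high = 0, len(records) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2      # look at the middle record
        probes += 1
        if target == records[mid]:
            return mid, probes
        if target < records[mid]:
            high = mid - 1           # look at the front half
        else:
            low = mid + 1            # look at the end half
    return None, probes              # the remaining half is empty


words = ["aardvark", "bat", "cat", "dog", "giraffe", "hippopotamus"]
print(binary_search(words, "dog"))
```

With 1,000,000 sorted records the loop can halve the range at most 20 times, which matches the estimate on the slide.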


Editing Unsorted Files

How do you add data? Append the new data to the end of the file.

How do you delete data? Mark over the deleted records with Xs and 0s, and periodically clean the file.
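The add, delete, and clean steps for an unsorted file might look like the sketch below. Everything specific here is an assumption for illustration: fixed-width 16-byte ASCII records, a deletion marker made only of Xs, and the hypothetical helper names `add_record`, `delete_record`, `clean_file`, and `read_records`.

```python
import os

RECORD_SIZE = 16                # assumed fixed record width, in bytes
DELETED = b"X" * RECORD_SIZE    # assumed marker written over a deleted record


def add_record(path: str, key: str) -> None:
    """Append a new record to the end of the file."""
    with open(path, "ab") as f:
        f.write(key.encode("ascii").ljust(RECORD_SIZE))


def delete_record(path: str, key: str) -> bool:
    """Mark over the first matching record in place; return True if found."""
    wanted = key.encode("ascii").ljust(RECORD_SIZE)
    with open(path, "r+b") as f:
        while True:
            position = f.tell()
            record = f.read(RECORD_SIZE)
            if len(record) < RECORD_SIZE:
                return False
            if record == wanted:
                f.seek(position)
                f.write(DELETED)
                return True


def clean_file(path: str) -> None:
    """Periodic cleanup: rewrite the file without the marked records."""
    tmp_path = path + ".tmp"
    with open(path, "rb") as src, open(tmp_path, "wb") as dst:
        while True:
            record = src.read(RECORD_SIZE)
            if len(record) < RECORD_SIZE:
                break
            if record != DELETED:
                dst.write(record)
    os.replace(tmp_path, path)


def read_records(path: str) -> list:
    """Return every record in the file, including marked ones."""
    with open(path, "rb") as f:
        data = f.read()
    return [data[i:i + RECORD_SIZE].decode("ascii").rstrip()
            for i in range(0, len(data), RECORD_SIZE)]
```

Deleting in place keeps every other record at the same offset, so the file only has to be rewritten during the occasional cleanup.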


Editing Sorted Files

To delete records, we cannot put Xs over the key field of records, because that would break the sort order.

Maintain 3 sorted files:
    working data
    data to delete
    data to add

To update: merge the three files all at once.


Example Update of Sorted File

Working Data: aardvark, bat, cat, dog, giraffe, hippopotamus

Data to Delete: cat

Data to Add: elephant, ferret

New Working Data: aardvark, bat, dog, elephant, ferret, giraffe, hippopotamus
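The update in this example can be reproduced with a single pass over the three sorted inputs. The sketch below is one possible reading, not the slides' own code: it assumes the inputs are sorted sequences of string keys, that deletions apply only to the existing working data, and the function name `merge_update` is hypothetical.

```python
from typing import Iterable, Iterator


def merge_update(working: Iterable[str],
                 to_delete: Iterable[str],
                 to_add: Iterable[str]) -> Iterator[str]:
    """Yield the new working data from three sorted inputs in one pass."""
    w_iter, d_iter, a_iter = iter(working), iter(to_delete), iter(to_add)
    w = next(w_iter, None)
    d = next(d_iter, None)
    a = next(a_iter, None)
    while w is not None or a is not None:
        if a is None or (w is not None and w <= a):
            current = w
            w = next(w_iter, None)
            # Advance the delete list past keys smaller than this record.
            while d is not None and d < current:
                d = next(d_iter, None)
            if d == current:
                continue            # scheduled for deletion: drop it
            yield current
        else:
            yield a                 # the next new record comes first
            a = next(a_iter, None)


working = ["aardvark", "bat", "cat", "dog", "giraffe", "hippopotamus"]
print(list(merge_update(working, ["cat"], ["elephant", "ferret"])))
```

Because each input is read front to back exactly once, the same loop would work on files far larger than memory if the lists were replaced by line-by-line file readers.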


Question

Why would we ever need to sort a file? Wouldn't we build it sorted to begin with and just keep it sorted?

    to sort a big block of new data, e.g., the list of transactions from today

    to sort a huge file by a different key


File Sorting Algorithms

Internal Sorts: when the whole file will fit in main memory

Algorithm:

1. read the unsorted file into memory

2. sort all at once

3. write to new file
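The three steps above map almost directly onto code. This is a minimal sketch, assuming a UTF-8 text file with one record per line and using Python's built-in sort as the internal sort; the function name `internal_sort` is hypothetical.

```python
def internal_sort(src_path: str, dst_path: str) -> None:
    """Sort a file that fits entirely in main memory."""
    # 1. read the unsorted file into memory
    with open(src_path, "r", encoding="utf-8") as src:
        records = src.read().splitlines()
    # 2. sort all at once
    records.sort()
    # 3. write to a new file
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(record + "\n" for record in records)
```

The approach stops working once the file no longer fits in memory at step 1, which is the case the external sort addresses.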


File Sorting Algorithms

External Sorts: when the file is too big to fit in memory

Oversimplified algorithm:

    while not eof
        read a big block of the data into memory
        sort that portion
        write into a temp file
    merge all those temp files
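The oversimplified algorithm above could be sketched as follows. The assumptions are a UTF-8 text file with one record per line and a block size measured in records; the function name `external_sort` and the `block_size` parameter are hypothetical. The final merge step uses the standard library's `heapq.merge` as a stand-in, since the slides do not say how the temp files are merged.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(src_path: str, dst_path: str, block_size: int = 100_000) -> None:
    """Sort a large line-oriented file using sorted temp files."""
    temp_paths = []
    try:
        with open(src_path, "r", encoding="utf-8") as src:
            while True:
                # read a big block of the data into memory
                block = list(itertools.islice(src, block_size))
                if not block:
                    break                       # eof
                # Ensure every record ends with a newline so merged output stays line-based.
                block = [r if r.endswith("\n") else r + "\n" for r in block]
                block.sort()                    # sort that portion
                fd, path = tempfile.mkstemp(suffix=".run", text=True)
                with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                    tmp.writelines(block)       # write into a temp file
                temp_paths.append(path)

        # merge all those temp files (stdlib merge used as a stand-in)
        runs = [open(p, "r", encoding="utf-8") for p in temp_paths]
        try:
            with open(dst_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*runs))
        finally:
            for run in runs:
                run.close()
    finally:
        for p in temp_paths:
            os.remove(p)
```

Only one block is held in memory while the temp files are created, and the merge reads each temp file one record at a time.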


2-Way Merge Sort

Create 2 sorted files
    Read 1st half of file W into memory
    sort it, then write to file X
    Read 2nd half of W into memory
    sort it, then write to file Y

Merge the 2 files
    Read record x from X
    Read record y from Y
    While both X and Y contain records
        if x < y
            write x to Z
            read x from X
        else
            write y to Z
            read y from Y
    If X is empty
        write remainder of Y to Z
    else
        write remainder of X to Z
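The two phases of the 2-way merge sort above can be written out as a short sketch. It assumes UTF-8 text files with one record per line; the function names `create_sorted_halves` and `two_way_merge` are hypothetical, and counting the records with an extra pass to find the midpoint is one simple choice among several.

```python
import itertools


def create_sorted_halves(w_path: str, x_path: str, y_path: str) -> None:
    """Split W into two halves, sort each in memory, write them to X and Y."""
    with open(w_path, "r", encoding="utf-8") as w:
        total = sum(1 for _ in w)              # extra pass to find the midpoint
    first_half = (total + 1) // 2
    with open(w_path, "r", encoding="utf-8") as w:
        for out_path, count in ((x_path, first_half), (y_path, total - first_half)):
            block = [r if r.endswith("\n") else r + "\n"
                     for r in itertools.islice(w, count)]
            block.sort()
            with open(out_path, "w", encoding="utf-8") as out:
                out.writelines(block)


def two_way_merge(x_path: str, y_path: str, z_path: str) -> None:
    """Merge sorted files X and Y into Z, holding one record from each."""
    with open(x_path, "r", encoding="utf-8") as x_file, \
         open(y_path, "r", encoding="utf-8") as y_file, \
         open(z_path, "w", encoding="utf-8") as z_file:
        x = x_file.readline()
        y = y_file.readline()
        while x and y:                         # both X and Y contain records
            if x < y:
                z_file.write(x)
                x = x_file.readline()
            else:
                z_file.write(y)
                y = y_file.readline()
        # One input is exhausted: copy the held record and the rest of the other.
        if not x:
            z_file.write(y)
            z_file.writelines(y_file)
        else:
            z_file.write(x)
            z_file.writelines(x_file)
```

Calling `create_sorted_halves("W.txt", "X.txt", "Y.txt")` followed by `two_way_merge("X.txt", "Y.txt", "Z.txt")` would leave the sorted result in `Z.txt`; the file names are placeholders. Note that the "remainder" includes the record already read but not yet written.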


Next Time

Good internal sorts

Merging a small amount of unsorted new data into a Big Sorted File

N-Way Merge Sort