nora-miller
Sorting Large Files
Part One: Why even bother? And a simple solution.
Starter Questions
Why sort a large data file? Searching becomes much faster.
Why not sort a large data file? Adding and deleting data becomes difficult.
Searching Unsorted Files
Algorithm: Sequential Search
Start at the top of the file and inspect each record until the target is found (or the end of the file is reached).
Efficiency:
  best case: 1 compare
  worst case: N compares
  average case (target present): N / 2 compares
An average search of 1,000,000 records takes 500,000 compares.
Big O: O(N)
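A minimal Python sketch of sequential search that also counts compares, assuming the file's records have already been read into a list of strings. The function name `sequential_search` and the compare counter are illustrative, not taken from the slides.

```python
from typing import List, Optional, Tuple


def sequential_search(records: List[str], target: str) -> Tuple[Optional[int], int]:
    """Inspect records top to bottom; return (index or None, number of compares)."""
    compares = 0
    for i, record in enumerate(records):
        compares += 1
        if record == target:
            return i, compares  # found after i + 1 compares
    return None, compares  # not found: every record was inspected


records = ["dog", "cat", "hippopotamus", "bat"]
print(sequential_search(records, "dog"))  # best case: (0, 1)
print(sequential_search(records, "bat"))  # worst case: (3, 4)
```

An unsuccessful search always costs N compares, which is why the worst case matches the "not found" case.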
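Searching Sorted Files
Example 1: Sequential Search (still O(N), but an unsuccessful search can stop as soon as it passes the place where the target would be)
Example 2: Binary Search
Basic algorithm:
  repeat until the target is found or no records remain:
    look at the middle record
    if (target == middle record) stop: found
    else if (target < middle record) look at the front half
    else look at the back half
Big O: O(log2 N); a search of 1,000,000 records takes about 20 compares, since 2^20 = 1,048,576.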
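A minimal sketch of the binary search loop above, assuming the sorted keys are available as an indexable Python sequence (here a `range`, standing in for 1,000,000 sorted records). The name `binary_search` and the probe counter are illustrative.

```python
from typing import Optional, Sequence, Tuple


def binary_search(records: Sequence[int], target: int) -> Tuple[Optional[int], int]:
    """Search a sorted sequence; return (index or None, number of probes)."""
    lo, hi = 0, len(records) - 1
    probes = 0
    while lo <= hi:                  # some records remain
        mid = (lo + hi) // 2         # look at the middle record
        probes += 1
        if records[mid] == target:
            return mid, probes
        if target < records[mid]:
            hi = mid - 1             # keep the front half
        else:
            lo = mid + 1             # keep the back half
    return None, probes              # range is empty: not found


keys = range(1_000_000)  # behaves like a sorted sequence of 1,000,000 keys
for k in (0, 123_456, 999_999, -1):
    print(k, binary_search(keys, k))  # never more than 20 probes
```

Each probe halves the remaining range, so at most floor(log2 N) + 1 probes are needed; for N = 1,000,000 that is 20.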
Editing Unsorted Files
How do you add data? Append the new data to the end of the file.
How do you delete data? Mark deleted records by overwriting them with Xs or 0s, and periodically clean the file to remove them.
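A minimal sketch of the append / mark / clean cycle for an unsorted file, modeling the file as a Python list of fixed-width records. `WIDTH`, `TOMBSTONE`, `append_record`, `delete_record`, and `clean` are hypothetical names chosen for illustration. Overwriting with a run of Xs keeps each record's length, which is what lets a real file be rewritten in place.

```python
from typing import List

WIDTH = 12               # hypothetical fixed record width
TOMBSTONE = "X" * WIDTH  # a deleted record is overwritten with Xs


def append_record(file: List[str], record: str) -> None:
    """Adding is cheap: write the new record at the end of the file."""
    file.append(record.ljust(WIDTH))


def delete_record(file: List[str], record: str) -> bool:
    """Find the record sequentially and overwrite it in place with Xs."""
    padded = record.ljust(WIDTH)
    for i, existing in enumerate(file):
        if existing == padded:
            file[i] = TOMBSTONE
            return True
    return False


def clean(file: List[str]) -> List[str]:
    """Periodic cleanup: copy every record that is not marked deleted."""
    return [r for r in file if r != TOMBSTONE]


data: List[str] = []
for animal in ["dog", "cat", "bat"]:
    append_record(data, animal)
delete_record(data, "cat")
print([r.strip() for r in clean(data)])  # ['dog', 'bat']
```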
Editing Sorted Files
To delete records, we cannot simply put Xs over their key fields: that would break the sort order that binary search depends on.
Instead, maintain 3 sorted files:
  working data
  data to delete
  data to add
To update: merge the three files all at once, in a single pass.
Example Update of Sorted File
Working data: aardvark, bat, cat, dog, giraffe, hippopotamus
Data to delete: cat
Data to add: elephant, ferret
New working data: aardvark, bat, dog, elephant, ferret, giraffe, hippopotamus
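A minimal sketch of the single-pass, three-file update, assuming each sorted file can be read as a stream of string keys (lists stand in for the files here). `update_sorted` is a hypothetical name; the logic is an ordinary merge of the working and add streams that skips any key also present in the delete stream.

```python
from typing import Iterable, Iterator


def update_sorted(working: Iterable[str], to_delete: Iterable[str],
                  to_add: Iterable[str]) -> Iterator[str]:
    """Yield (working + to_add) minus to_delete, reading each sorted input once."""
    w, d, a = iter(working), iter(to_delete), iter(to_add)
    cur_w, cur_d, cur_a = next(w, None), next(d, None), next(a, None)
    while cur_w is not None or cur_a is not None:
        # Take the smaller of the next working record and the next added record.
        if cur_a is None or (cur_w is not None and cur_w <= cur_a):
            rec, cur_w = cur_w, next(w, None)
        else:
            rec, cur_a = cur_a, next(a, None)
        # Advance the delete stream past keys smaller than rec.
        while cur_d is not None and cur_d < rec:
            cur_d = next(d, None)
        if cur_d is not None and cur_d == rec:
            continue  # marked for deletion: do not copy it
        yield rec


working = ["aardvark", "bat", "cat", "dog", "giraffe", "hippopotamus"]
new_working = list(update_sorted(working, ["cat"], ["elephant", "ferret"]))
print(new_working)
# ['aardvark', 'bat', 'dog', 'elephant', 'ferret', 'giraffe', 'hippopotamus']
```

Because all three inputs are sorted, only one record from each needs to be in memory at a time, which is what makes the "merge all at once" update practical for a large file.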
Question
Why would we ever need to sort a file? Wouldn't we build it sorted to begin with and just keep it sorted?
  To sort a big block of new data, e.g., a list of today's transactions.
  To sort a huge file by a different key.
File Sorting Algorithms
Internal Sorts
When the whole file fits in main memory.
Algorithm:
1. Read the unsorted file into memory.
2. Sort it all at once.
3. Write the result to a new file.
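The three internal-sort steps can be sketched in Python, assuming a text file with one record per line and plain string ordering; `internal_sort` and the temporary file names are illustrative.

```python
import os
import tempfile


def internal_sort(in_path: str, out_path: str) -> None:
    """Sort a one-record-per-line text file entirely in memory."""
    with open(in_path) as f:
        records = f.read().splitlines()  # 1. read the whole unsorted file into memory
    records.sort()                       # 2. sort all at once
    with open(out_path, "w") as out:     # 3. write the result to a new file
        out.writelines(r + "\n" for r in records)


# Usage: sort a small file in a temporary directory.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "unsorted.txt")
dst = os.path.join(tmp, "sorted.txt")
with open(src, "w") as f:
    f.write("dog\ncat\nbat\n")
internal_sort(src, dst)
with open(dst) as f:
    print(f.read().split())  # ['bat', 'cat', 'dog']
```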
External Sorts
When the file is too big to fit in memory.
Oversimplified algorithm:
while not eof
    read a big block of the data into memory
    sort that portion
    write it into a temp file
merge all those temp files
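A minimal Python sketch of the oversimplified external sort, assuming one record per line and plain string ordering; `external_sort` and its `block_size` parameter (standing in for "a big block") are hypothetical. For the final step it swaps in the standard library's `heapq.merge`, which merges all the sorted temp files at once with a heap rather than two at a time.

```python
import heapq
import itertools
import os
import tempfile
from typing import List


def external_sort(in_path: str, out_path: str, block_size: int) -> None:
    """Sort a one-record-per-line file, holding at most block_size records per run."""
    run_paths: List[str] = []
    with open(in_path) as f:
        while True:                                           # while not eof
            block = list(itertools.islice(f, block_size))     # read a big block into memory
            if not block:
                break
            block = [line.rstrip("\n") + "\n" for line in block]  # newline-terminate every record
            block.sort()                                      # sort that portion
            fd, path = tempfile.mkstemp(suffix=".run", text=True)
            with os.fdopen(fd, "w") as run:                   # write it into a temp file
                run.writelines(block)
            run_paths.append(path)
    runs = [open(p) for p in run_paths]
    try:
        with open(out_path, "w") as out:                      # merge all those temp files
            out.writelines(heapq.merge(*runs))
    finally:
        for r in runs:
            r.close()
        for p in run_paths:
            os.remove(p)


tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "big.txt")
dst = os.path.join(tmp, "sorted.txt")
with open(src, "w") as f:
    f.write("\n".join(["giraffe", "cat", "hippopotamus", "aardvark", "dog", "bat", "elephant"]))
external_sort(src, dst, block_size=3)  # 3 runs of at most 3 records each
with open(dst) as f:
    print(f.read().split())
```

Opening every temp file at once is part of what makes this "oversimplified": with very many runs it can exceed the operating system's limit on open files.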
2-Way Merge Sort
Create 2 sorted files:
  Read the 1st half of file W into memory, sort it, then write it to file X.
  Read the 2nd half of W into memory, sort it, then write it to file Y.
Merge the 2 files into Z:
  read record x from X
  read record y from Y
  while neither X nor Y is exhausted
    if x < y
      write x to Z
      read x from X
    else
      write y to Z
      read y from Y
  if X is exhausted
    write y and the remainder of Y to Z
  else
    write x and the remainder of X to Z
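A minimal sketch of the 2-way merge sort in Python, with lists standing in for the files W, X, Y, and Z (an assumption; real files would be read one record at a time in the same way). `two_way_merge` and `two_way_merge_sort` are illustrative names.

```python
from typing import Iterable, List, Optional


def two_way_merge(xs: Iterable[str], ys: Iterable[str]) -> List[str]:
    """Merge sorted inputs X and Y into Z, following the loop above."""
    x_it, y_it = iter(xs), iter(ys)
    z: List[str] = []
    x: Optional[str] = next(x_it, None)  # read record x from X
    y: Optional[str] = next(y_it, None)  # read record y from Y
    while x is not None and y is not None:
        if x < y:
            z.append(x)                  # write x to Z
            x = next(x_it, None)         # read x from X
        else:
            z.append(y)                  # write y to Z
            y = next(y_it, None)         # read y from Y
    # One input is exhausted: copy the pending record and the rest of the other.
    if x is None:
        if y is not None:
            z.append(y)
            z.extend(y_it)
    else:
        z.append(x)
        z.extend(x_it)
    return z


def two_way_merge_sort(w: List[str]) -> List[str]:
    """Sort each half of W separately (files X and Y), then merge them into Z."""
    mid = len(w) // 2
    x = sorted(w[:mid])  # 1st half of W, sorted in memory
    y = sorted(w[mid:])  # 2nd half of W, sorted in memory
    return two_way_merge(x, y)


print(two_way_merge_sort(["dog", "cat", "hippopotamus", "bat", "aardvark"]))
# ['aardvark', 'bat', 'cat', 'dog', 'hippopotamus']
```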
Next Time
Good internal sorts
Merging a small amount of unsorted new data into a big sorted file
N-Way Merge Sort