Topic 5: Hashes
CSE2395/CSE3395 Perl Programming
Learning Perl 3rd edition chapter 5, pages 73-85
Programming Perl 3rd edition pages 76-78, 697-700, 703-704, 733-734
perldata manpage
Original Slides by Debbie Pickett, Modified by David Abramson, 2006, Copyright Monash University
In this topic
Hashes
► aka associative arrays
Hash variables
Functions which use hashes
Uses of hashes
Accessing Perl’s environment
Arrays
Arrays are
► ordered
► indexed by a number (integer)
► dense
– if element n exists, so do elements 0 to n-1
@array
index: 0   1      2     3      4   5
value: 42  "dog"  -0.2  undef  42  0
Arrays
Arrays aren’t always the best data structure
Imagine an array of students’ marks
► indexed by 8-digit student ID number
@marks
index: 0 ... 12345678  12345679  12345680  12345681
value:   ... 89        43        undef     70
Over ten million empty elements between index 0 and the first mark!
Arrays
Student ID numbers aren’t really numbers anyway
► can’t do arithmetic on them
► order of two student IDs not really important
► really just strings that happen to contain digits
Want some data structure where indices are strings
► usually called associative arrays
– or dictionary
– or (lookup) table
– or hash table
Associative arrays
An associative array is an array where
► an element’s value can be located given its index
► indices are strings
► indices are unique
► indices are unordered
For example, to look up capital cities of countries
index: Peru  Japan  UK      Russia  Canada  Egypt
value: Lima  Tokyo  London  Moscow  Ottawa  Cairo
In Perl, associative arrays are called “hashes” (because they’re implemented using hash tables)
Hashes in Perl
Indices called keys
► strings
► must be unique
► e.g., country names
Contents called values
► any scalar
► may be duplicated
► e.g., capital city names
Can look up value given key, but not vice versa
► What’s the capital of Egypt? (easy)
► What country is Monrovia the capital of? (hard)
Unordered
► You can’t sort a hash!
► Perl stores elements in an order optimized for fast lookup
Llama3 pages 73-74; Camel3 pages 51, 76-77; perldata manpage
Hash elements
Hash key written inside { curly braces }
► contrast with normal arrays using [ square brackets ]
► $capital{"Egypt"} # Equal to "Cairo"
► $capital{$nation} # Depends on $nation
Can assign to a hash element
► overwrites the old value, if there was one
– or creates a new element, if there wasn’t
► doesn’t change any other element
► $capital{"Australia"} = "Canberra";
Using nonexistent key returns undef
► $capital{"Atlantis"} # No such country
Llama3 pages 76-78; Camel3 page 67
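The rules on this slide can be tried out in a few lines. This is a minimal sketch; the variable names ($by_literal, $by_variable, $missing) and the temporary "Sydney" value are made up for illustration only.

```perl
use strict;
use warnings;

my %capital = (Egypt => "Cairo");
my $nation  = "Egypt";

my $by_literal  = $capital{"Egypt"};   # key written as a string literal
my $by_variable = $capital{$nation};   # key taken from a scalar variable

$capital{"Australia"} = "Sydney";      # no such key yet: creates the element
$capital{"Australia"} = "Canberra";    # key exists now: overwrites the value

my $missing = $capital{"Atlantis"};    # nonexistent key: undef

print "Egypt: $by_literal, $by_variable\n";
print "Australia: $capital{'Australia'}\n";
print "Atlantis: ", (defined $missing ? $missing : "undef"), "\n";
```

Reading the nonexistent key does not create it, so the hash still holds only two elements afterwards.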
Testing hash elements
Can determine if hash key exists using exists function
► exists $capital{"Canada"} # True
► exists $capital{"Atlantis"} # False
Not same as using defined
► key can exist, but value can be undefined
► exists $capital{"Vatican City"} # True
► defined $capital{"Vatican City"} # False
Llama3 page 83; Camel3 pages 697-698, 710-711; perlfunc manpage
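A short sketch of the difference between exists and defined. It assumes, as the slide does, that "Vatican City" is stored as a key whose value is undef; the @report array is only there to collect the results.

```perl
use strict;
use warnings;

# "Vatican City" is present as a key, but its value is undef.
my %capital = (Canada => "Ottawa", "Vatican City" => undef);

my @report;
foreach my $key ("Canada", "Vatican City", "Atlantis") {
    my $exists  = exists  $capital{$key} ? "yes" : "no";
    my $defined = defined $capital{$key} ? "yes" : "no";
    push @report, "$key: exists=$exists defined=$defined";
}

print "$_\n" foreach @report;
```

Only the middle key gives different answers for the two tests, which is the case where choosing the right function matters.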
Deleting hash elements
To remove an entry from a hash, use delete function
► delete $capital{"Czechoslovakia"};
► exists will now return false for that key
To clear a hash, assign empty list to entire hash
► %capital = (); # World anarchy
Llama3 pages 76-77, 83-84; Camel3 pages 699-700; perlfunc manpage
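The following sketch shows delete and clearing side by side. The sample data and the counter variables are illustrative assumptions, not part of the original slides.

```perl
use strict;
use warnings;

my %capital = (Czechoslovakia => "Prague", Peru => "Lima", Japan => "Tokyo");

# delete removes the key and returns the value that was stored there.
my $removed     = delete $capital{"Czechoslovakia"};
my $still_there = exists $capital{"Czechoslovakia"} ? 1 : 0;

# keys in scalar context gives the number of elements.
my $count_after_delete = keys %capital;

# Assigning the empty list removes every element at once.
%capital = ();
my $count_after_clear = keys %capital;

print "Removed $removed; $count_after_delete left, then $count_after_clear\n";
```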
Entire hashes
To refer to an entire hash, use %hash
► % instead of $
► no curly braces
Can copy hashes
► %clone = %hash;
Can initialize hash with many elements by assigning list to it
► for each element, write key followed by value
► order of key/value pairs not important
► %capital = ("Peru", "Lima", "Japan", "Tokyo", "UK", "London", "Russia", "Moscow", "Canada", "Ottawa", "Egypt", "Cairo");
Hashes flatten back into lists when used in list context
► e.g., when passed to a subroutine
Llama3 pages 78-79; Camel3 pages 76-78
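To see copying and flattening in action, here is a small sketch. The subroutine count_args is a hypothetical helper written for this example; it simply reports how many arguments it received.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number of arguments passed in.
sub count_args { return scalar @_; }

my %capital = (Peru => "Lima", Japan => "Tokyo", UK => "London");

# Copying makes an independent hash; changing the copy leaves the original alone.
my %clone = %capital;
$clone{"Peru"} = "changed";

# In list context the hash flattens to key, value, key, value, ...
my @flat     = %capital;
my $received = count_args(%capital);

print "Original Peru: $capital{'Peru'}\n";
print "Subroutine received $received arguments\n";
```

Three key/value pairs arrive in the subroutine as six separate scalars, in whatever order the hash happens to store them.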
Hash elements
Hashes, subroutines, arrays and scalars occupy different namespaces
► %x, $x{...} refer to hash %x
► @x, $x[...] refer to array @x
► &x, x(...) refer to subroutine &x
► $x refers to scalar $x
Hash elements interpolate into double-quoted strings
► print "The capital of $nation is $capital{$nation}\n";
Entire hashes don’t interpolate at all.
► print "%capital"; # Prints "%capital"
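A quick sketch of the separate namespaces, with four unrelated things all named x. The contents are placeholders chosen so that the output shows which one was used.

```perl
use strict;
use warnings;

my $x = "scalar";
my @x = ("array element");
my %x = (key => "hash element");
sub x { return "subroutine"; }

# Scalars, array elements and hash elements interpolate; the sub is called explicitly.
my $line = "$x / $x[0] / $x{key} / " . &x();

# A whole hash does not interpolate: this is just the two characters % and x.
my $literal = "%x";

print "$line\n";
print "$literal\n";
```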
Functions that use hashes
How do you print out the contents of a hash?
► need to know what keys a hash has
– from each key, can get value with $hash{key}
keys function returns a list of all keys in a hash
► order is indeterminate, but stays the same for as long as the hash is not modified
► every key is unique
– by definition of hash
► keys %capital # Returns list ("Canada", "UK", "Egypt", "Japan", "Peru", "Russia") (maybe)
values function returns a list of all values in a hash
► order is same as from keys function
► values may be duplicated
– values may be any scalar
► values %capital # Returns list ("Ottawa", "London", "Cairo", "Tokyo", "Lima", "Moscow")
Llama3 pages 80-81; Camel3 pages 733-734, 824; perlfunc manpage
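Because keys and values return their lists in matching order, the two lists can be paired up by position. The sketch below checks that claim on a small sample hash; the $aligned flag is an illustrative name.

```perl
use strict;
use warnings;

my %capital = (Peru => "Lima", Japan => "Tokyo", UK => "London");

my @nations = keys %capital;     # some indeterminate order
my @cities  = values %capital;   # same order as @nations (hash not modified in between)

# Check that position i in one list matches position i in the other.
my $aligned = 1;
for my $i (0 .. $#nations) {
    $aligned = 0 unless $capital{ $nations[$i] } eq $cities[$i];
}

print "keys and values line up: ", ($aligned ? "yes" : "no"), "\n";
```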
Timeout
# Printing an entire hash using keys function.

# Initialize the hash.
# The => notation is just a pretty-looking
# synonym for the , (comma) operator that also quotes
# the word on the left side. Great for hashes.
%capital = (Peru => "Lima", Japan => "Tokyo",
            UK => "London", Russia => "Moscow",
            Canada => "Ottawa", Egypt => "Cairo");

# Iterate over the hash, once per nation.
# Order is indeterminate.
foreach $nation (keys %capital)
{
    print "Capital of $nation is $capital{$nation}\n";
}
Timeout
# Printing an entire hash, sorted by country.

# Initialize the hash.
%capital = (Peru => "Lima", Japan => "Tokyo",
            UK => "London", Russia => "Moscow",
            Canada => "Ottawa", Egypt => "Cairo");

# Iterate over the hash, once per nation.
# Note that this isn't sorting the hash,
# nor even iterating over the hash, but
# iterating over a sorted list of the hash's keys.
foreach $nation (sort keys %capital)
{
    print "Capital of $nation is $capital{$nation}\n";
}
Functions that use hashes
keys may return a very large list
► perhaps inefficient if you need only one hash element at a time
each function iterates over a hash
► one element at a time
► on first call, returns a two-element list containing one key/value pair
► subsequent calls return other key/value pairs
– order indeterminate, but guaranteed not to repeat any pairs
► when all key/value pairs have been returned once, returns empty list
► state is kept by Perl with hidden attribute on hash variable
► much more space-efficient than using keys
► typical use
– while (($key, $value) = each %hash) { ... }
Llama3 pages 81-82; Camel3 pages 703-704; perlfunc manpage
Timeout
# Printing an entire hash, using each function.

# Initialize the hash.
%capital = (Peru => "Lima", Japan => "Tokyo",
            UK => "London", Russia => "Moscow",
            Canada => "Ottawa", Egypt => "Cairo");

# Iterate over the hash, once per nation.
# No provision for sorting the output here,
# because order returned by each function
# is indeterminate.
while (($nation, $city) = each %capital)
{
    print "Capital of $nation is $city\n";
}
Uses of hashes
Hashes useful for
► implementing sparse arrays
► implementing lookup tables/databases
► counting strings
► removing duplicates from a list
► passing named parameters to subroutines
Llama3 pages 75-76
Hashes: sparse arrays
Normal arrays are dense
► creating $a[10000] creates @a[0..9999] too.
Hash keys are independent
► creating $h{"10000"} creates no other elements
– only elements that exist need to take up memory
► just have to pretend that keys (really strings) are integers
– like student ID numbers
► may have to write some code to fake “order” of elements
– foreach $element (sort {$a <=> $b} keys %h)
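The sparse-array idea can be sketched with the student marks from earlier. The 7-digit ID 9999999 is a made-up entry, added only to show why the numeric comparison matters when faking element order.

```perl
use strict;
use warnings;

my %marks;
$marks{12345678} = 89;
$marks{12345679} = 43;
$marks{12345681} = 70;
$marks{9999999}  = 55;   # hypothetical shorter ID, for the ordering demo

# Only the elements actually stored exist; nothing is created in between.
my $elements = keys %marks;

# Keys are strings, so a plain sort compares them character by character.
my @string_sorted = sort keys %marks;

# Comparing with <=> treats the keys as numbers instead.
my @ordered = sort { $a <=> $b } keys %marks;

print "$elements elements stored\n";
print "String order:  @string_sorted\n";
print "Numeric order: @ordered\n";
```

With the default string sort the 7-digit ID lands last because "9" sorts after "1"; the numeric sort puts it first.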
Hashes: lookup table
Using hash, can look up string (value) given string (key)
► look up the capital of a country
– capital of Malaysia is Kuala Lumpur
► look up a word in a dictionary
– definition of dog is “domestic canine”
► look up the IP address of machine
– slashdot.org’s IP address is 66.35.250.150
► look up the value of a variable in an interpreter
– value of variable x is 5
► look up the title of a book
– book with ISBN 0-596-00027-8 is “Programming Perl”
► look up the real name of a student
– student 11111111 is Bart Simpson
Any many-to-one relationship (many keys may share a value, but each key has exactly one value) is perfect for a hash
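A lookup table answers questions in one direction only. The sketch below contrasts the direct lookup with two ways of going backwards from value to key; the Liberia entry is added here purely so that the Monrovia question from the earlier slide has an answer, and the variable names are illustrative.

```perl
use strict;
use warnings;

my %capital = (Peru => "Lima", Egypt => "Cairo", Liberia => "Monrovia");

# Key to value: a single direct lookup.
my $easy = $capital{"Egypt"};

# Value to key: no direct lookup, so every key has to be examined.
my $hard;
foreach my $nation (keys %capital) {
    $hard = $nation if $capital{$nation} eq "Monrovia";
}

# Alternative: build an inverted hash once. This is only safe when
# the values are unique, otherwise some pairs are silently lost.
my %nation_of = reverse %capital;
my $also_hard = $nation_of{"Monrovia"};

print "The capital of Egypt is $easy\n";
print "Monrovia is the capital of $hard\n";
```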
Timeout
# Using the program's environment
# All processes have a set of names and values which
# they inherit from their parents. These can be
# set in the shell by typing NAME=VALUE.

print "Your home directory is $ENV{'HOME'}\n";

if ($ENV{'SHELL'} eq "/bin/csh")
{
    # Commiserate with user.
    print "Your shell is csh. Yuck!";
}

print "Commands are looked for in these dirs:\n";
print " $_\n" foreach (split /:/, $ENV{'PATH'}); # split: Topic 7
Hashes: counting strings
Use hash to count frequency of strings
► key is the string (“dog”)
► value (integer) is the count (has been seen 3 times so far)
► increment the value every time a key is read
Can be used to find intersection (common elements) between two arrays
► iterate over first array: count elements found
► iterate over second array: include element in result only if it was seen in the first array
► can compute union and difference similarly
Timeout
# Counting strings.

%seen = (); # Nothing has been seen so far.

while (<>) # Read words from input, one per line.
{
    chomp;
    # Increment the counter with line's text as key.
    $seen{$_}++;
    print "$_ has been seen $seen{$_} times so far\n";
}

# Final report.
while (($line, $count) = each %seen)
{
    print "$line was seen $count times overall\n";
}
Timeout
# Intersection of two arrays.

%seen = ();
@intersection = ();

foreach (@one) # Iterate through first array.
{
    # Remember which elements have been seen.
    $seen{$_} = 1; # Any true value will do.
}

foreach (@two) # Now iterate through second array.
{
    # Only add to result if was seen in @one.
    push @intersection, $_ if $seen{$_};
}
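Union and difference follow the same pattern as the intersection above. This is one possible sketch, not the only way to do it; the sample arrays and the hash names %in_two and %in_either are assumptions made for the example.

```perl
use strict;
use warnings;

my @one = qw(dog cat emu);
my @two = qw(cat emu yak);

my (%in_two, %in_either);
$in_two{$_}    = 1 foreach @two;           # membership test for @two
$in_either{$_} = 1 foreach (@one, @two);   # duplicate keys collapse into one

# Union: every element that appears in at least one array.
my @union = sort keys %in_either;

# Difference: elements of @one that do not appear in @two.
my @difference = grep { !$in_two{$_} } @one;

print "Union: @union\n";
print "Difference: @difference\n";
```

The union relies on hash keys being unique, while the difference reuses the "was it seen?" lookup from the intersection code.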
Hashes: removing duplicates
An extension of counting elements in a list
► if this is the first time element seen, include in result
► otherwise, skip this element
Timeout
# Simple implementation of Unix sort and sort -u

# Was -u (unique) switch given?
if ($ARGV[0] eq "-u")
{
    $unique = 1;
    shift; # Remove -u argument.
}

# Read all input lines and sort them.
@result = sort <>;

if ($unique)
{
    # Filter out anything already seen.
    @result = grep { !$seen{$_}++ } @result;
}

print @result; # Output remaining lines.
Hashes: named parameters
Calling subroutines with many parameters is messy
► printformatted(56, '$', 8, 2, "decimal");
– what did the 8 mean again?
► especially when some parameters are optional and have a reasonable default anyway
Can use hash to identify optional parameters and give them values
► printformatted(56, prefix => '$', format => "decimal", precision => 8, places => 2);
– self-documenting code
– order of parameters no longer matters
► printformatted(56, format => "hex");
– only need to name the parameters with non-default values
► subroutines require a little code to handle this
Timeout
# Map formats to printf percent-things.
%format = (decimal => "d", hex => "x", octal => "o");

# Print a number with a certain format.
sub printformatted
{
    my $number = shift; # Value to print.
    my %param = (
        format => "decimal", # Defaults.
        precision => "6",
        @_ # Rest of sub params.
    );
    printf( # Build up printf format string.
        ($param{"prefix"} . "%" . $param{"precision"} . "." .
         $param{"places"} . $format{$param{"format"}}),
        $number);
}
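The defaults trick in printformatted works because, when a list is assigned to a hash, a later pair with the same key replaces an earlier one. The sketch below isolates just that step; merge_params is a hypothetical helper name, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: defaults come first, caller's pairs come last and win.
sub merge_params {
    my %param = (
        format    => "decimal",   # default
        precision => 6,           # default
        @_,                       # caller-supplied name => value pairs
    );
    return %param;
}

my %got = merge_params(format => "hex");

print "format=$got{format} precision=$got{precision}\n";
```

The caller named only format, so precision keeps its default while format is overridden.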
Covered in this topic
Hashes
Hash variables
► $hash{key}, %hash
Functions which use hashes
► keys, values
► each
Uses of hashes
► data lookup
► sparse arrays
► counting elements in a list
► removing duplicates from a list
► accessing a process’ environment
► subroutines with optional parameters
Going further
Tying
► treat an external file (or any other object) like an internal hash (or any other type)
► Camel3 pages 363-398
Databases
► talking to databases with Perl
► Programming the Perl DBI by Alligator Descartes and Tim Bunce, O’Reilly 2000
Shells
► the Unix command-line interface
► man sh
Next topic
Regular expressions
► pattern matching
Llama3 chapters 7-9, pages 98-127; Camel3 pages 139-195; perlre manpage