References and Data Structures

References

• Just as in C, you can create a variable that is a reference (or pointer) to another variable. That is, it contains the address in memory where the other variable is stored.

• In Perl, the backslash is used to create a reference:

    my $var = 5;
    my $var_ref = \$var;

• To dereference a simple reference, put it inside curly braces with another $ in front of it. Thus, ${$var_ref} is the same as $var, that is, the value “5”.

• The curly braces de-reference what is inside them. I like to say that {$var_ref} “generates” the scalar variable.

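• In many cases you can leave the curly braces out: $$var_ref works just as well as ${$var_ref}. In complicated expressions, however, leaving them out can cause havoc because of precedence: for example, $$aref[0] means ${$aref}[0] (element 0 of the array that $aref refers to), not ${$aref[0]}.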
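A small script makes the scalar case concrete. It is a sketch using only core Perl, and the value 42 is arbitrary. It shows both dereferencing forms and that assigning through the reference changes the original variable.

```perl
use strict;
use warnings;

my $var     = 5;
my $var_ref = \$var;        # backslash creates a reference to $var

print "Value via braces: ${$var_ref}\n";   # braces form
print "Value via \$\$:    $$var_ref\n";     # braces omitted

# Assigning through the reference changes the original variable.
${$var_ref} = 42;
print "var is now $var\n";

# ref() reports what kind of thing the reference points to.
print "ref type: ", ref($var_ref), "\n";
```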

More References

• This same trick works for arrays and hashes too:

    my @arr = qw(cow horse pig chicken);
    my $arr_ref = \@arr;
    print "Farm animals include @{$arr_ref}\n";    # can leave out {} here

    my %hash = ("red" => "stop", "yellow" => "caution", "green" => "go");
    my $hash_ref = \%hash;
    foreach my $key (keys %{$hash_ref}) {
        print "$key means ${$hash_ref}{$key}\n";
    }
    # The {} can be left out in the "foreach" line; in the print line,
    # keeping them makes it clearer what is being dereferenced.
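The fragments above can be combined into one runnable script. The extra "goat" element and the sorted key order are additions for illustration: hash order in Perl is not predictable, so this sketch sorts the keys before printing.

```perl
use strict;
use warnings;

my @arr     = qw(cow horse pig chicken);
my $arr_ref = \@arr;
print "Farm animals include @$arr_ref\n";   # @$arr_ref is the same as @{$arr_ref}

my %hash = (red => "stop", yellow => "caution", green => "go");
my $hash_ref = \%hash;

# Sort the keys so the output order is predictable.
my @lines;
foreach my $key (sort keys %$hash_ref) {
    push @lines, "$key means ${$hash_ref}{$key}";
}
print "$_\n" for @lines;

# The reference points at the original array, so changes are visible through @arr.
push @{$arr_ref}, "goat";
print "The array now has ", scalar(@arr), " animals\n";
```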

Arrow Notation

• Perl provides an alternative notation for use with array and hash references. The small arrow (hyphen followed by greater-than: ->) de-references. To access individual array or hash elements, follow the arrow with [] or {}.

• For example:

    my @arr = (1, 3, 5, 7);
    my $arr_ref = \@arr;
    for (my $i = 0; $i <= $#{$arr_ref}; $i++) {
        print "Element $i is $arr_ref->[$i]\n";
    }

• Similarly, hash keys are placed inside curly braces after the arrow to access hash values from a hash reference: $hash_ref->{$key}.
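As a sketch of both arrow forms side by side (the %lights hash is a made-up example), the loop below uses a foreach over 0 .. $#{$arr_ref} instead of the C-style loop; both visit the same indices.

```perl
use strict;
use warnings;

my @arr     = (1, 3, 5, 7);
my $arr_ref = \@arr;

# $#{$arr_ref} is the last index of the referenced array (3 here).
for my $i (0 .. $#{$arr_ref}) {
    print "Element $i is $arr_ref->[$i]\n";
}

my %lights   = (red => "stop", green => "go");
my $hash_ref = \%lights;
print "Red means $hash_ref->{red}\n";

# The arrow form and the brace form reach the same element.
my $same = ($arr_ref->[2] == ${$arr_ref}[2]) ? "yes" : "no";
print "Same element: $same\n";
```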

Passing Arrays In and Out of Subroutines

• One important use of references is passing arrays, hashes, and very long strings into and out of subroutines.

• The elements of @_ are aliases to the caller's variables, but the usual idiom, my ($x, $y) = @_;, copies each value into a new variable for use by the subroutine. If that value is a very long string, such as the DNA sequence of a chromosome, the copy uses a large amount of memory.

• However, if you pass a reference to that string to the subroutine, the string itself is not copied.

• Recall that variables are passed into a subroutine through the @_ array. For example:

    process($var1, $var2, @arr);
    sub process {
        my ($x, $y, @z) = @_;
        ...
    }

• If you try to pass in 2 arrays, they both end up together in the first array inside the subroutine. That is, Perl “flattens” multiple arrays into the single @_ array.

• The way around the problem of passing multiple arrays in or out of subroutines is to pass in references, which are just scalar variables:

    process($var1, @arr2, @arr3);      # DOESN'T WORK
    process($var1, \@arr2, \@arr3);    # GOOD
    sub process {
        my ($x, $arr_ref2, $arr_ref3) = @_;
        ...
    }
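The flattening problem and the reference fix can be checked directly. In this sketch, count_args and gc_count are hypothetical helper names, process here simply reports the array sizes, and the DNA string is made-up sample data.

```perl
use strict;
use warnings;

# Receives three scalars: a label and two array references (nothing flattened).
sub process {
    my ($label, $arr_ref2, $arr_ref3) = @_;
    my $n2 = scalar @{$arr_ref2};
    my $n3 = scalar @{$arr_ref3};
    return "$label: first has $n2 elements, second has $n3";
}

# Hypothetical helper: reports how many scalars arrived in @_.
sub count_args { return scalar @_ }

# Hypothetical helper: counts G and C through a string reference,
# so the long string is never copied into a new variable.
sub gc_count {
    my ($seq_ref) = @_;
    return ($$seq_ref =~ tr/GC//);   # tr with an empty replacement only counts
}

my @arr2 = (1, 2, 3);
my @arr3 = (4, 5);

my $flat = count_args("x", @arr2, @arr3);   # the arrays were flattened
my $msg  = process("x", \@arr2, \@arr3);
print "count_args saw $flat arguments\n";
print "$msg\n";

my $dna = "ATGCGC" x 3;
my $gc  = gc_count(\$dna);
print "GC count: $gc\n";
```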

More on Subroutines

• Similarly, arrays are generally returned from subroutines in the form of array references.

• Note in this example that the array @arr is created within the subroutine, but returned as a reference. The name “@arr” doesn’t exist outside the subroutine.

    sub add_to {
        my @arr;
        for (my $i = 0; $i < 10; $i++) {
            $arr[$i] = $i + 2;
        }
        return \@arr;
    }
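A short usage sketch for add_to: because my @arr is executed on every call, each call returns a reference to a separate array, so changing one result leaves the other alone.

```perl
use strict;
use warnings;

sub add_to {
    my @arr;                      # a new lexical array on every call
    for (my $i = 0; $i < 10; $i++) {
        $arr[$i] = $i + 2;
    }
    return \@arr;                 # the array outlives the sub through this reference
}

my $first  = add_to();
my $second = add_to();

print "Values: @{$first}\n";      # Values: 2 3 4 5 6 7 8 9 10 11
$first->[0] = 100;                # change the first array only
print "Second still starts with $second->[0]\n";
```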

Multidimensional Arrays

• Arrays are one-dimensional: a linear set of elements.

• Suppose you want a two-dimensional array, to keep track of positions on a grid, for instance. Say, a tic-tac-toe game.

• Each row can be represented as a single array:

    @row1 = qw(X O O);
    @row2 = qw(O X O);
    @row3 = qw(X O X);

• Since the elements of an array are scalars, you can’t just put the row arrays together in a big array to represent the whole game board: (@row1, @row2, @row3) would flatten into a single nine-element list.

• However, array references are scalars, so the game board could be represented by an array of references to the sub-arrays:

@game = (\@row1, \@row2, \@row3);
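To see why the references are needed, this sketch builds the board both ways and counts the top-level elements.

```perl
use strict;
use warnings;

my @row1 = qw(X O O);
my @row2 = qw(O X O);
my @row3 = qw(X O X);

my @flat = (@row1, @row2, @row3);        # flattened: 9 scalars, row boundaries lost
my @game = (\@row1, \@row2, \@row3);     # 3 scalars, each an array reference

print "flat has ", scalar(@flat), " elements\n";   # 9
print "game has ", scalar(@game), " rows\n";       # 3
```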

More on Multidimensional Arrays

• To access a row, you need to de-reference it:

    print "Row 2 is @{$game[1]} \n";

• Note the position of the curly braces, which do the de-referencing: they surround $game[1], which is an array reference, \@row2.

• To access an individual element, say the first square in row 2:

    print " ${$game[1]}[0] \n";

• You see that the index value [0] for the individual element is OUTSIDE the curly braces. The array reference is inside them; once the braces return the array, the $ at the beginning of the expression and the [0] at the end of it access the individual element of that row.
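The brace syntax for rows and squares, assembled into a runnable sketch that also prints the whole board row by row:

```perl
use strict;
use warnings;

my @row1 = qw(X O O);
my @row2 = qw(O X O);
my @row3 = qw(X O X);
my @game = (\@row1, \@row2, \@row3);

my $row_text = "@{$game[1]}";      # whole row 2
my $square   = ${$game[1]}[0];     # first square in row 2

print "Row 2 is $row_text\n";
print "First square in row 2 is $square\n";

# Print the whole board, one row per line.
foreach my $row_ref (@game) {
    print "@{$row_ref}\n";
}
```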

Arrow Notation with Multidimensional Arrays

• You could also use arrow notation:

    print "$game[1]->[0] ";

• Here, the arrow causes $game[1] to be dereferenced, at which point you can access the individual element [0].

• Perl, in its helpful fashion, allows you to omit the arrows between indices. Thus, this also works:

    print "$game[1][0]";

• In this case, @game is an actual array. If you instead used a reference to an array here:

    $game_ref = \@game;

you would need to use the arrow between the variable name and the first index value:

    print "$game_ref->[1][0]";

• You can leave the arrows out between the indexes, but not between the initial array reference and the first index: without that arrow, $game_ref[1][0] would refer to a different array named @game_ref.
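This sketch checks that the arrow forms described above all reach the same square, the first square of row 2 (indices [1][0]).

```perl
use strict;
use warnings;

my @row1 = qw(X O O);
my @row2 = qw(O X O);
my @row3 = qw(X O X);
my @game     = (\@row1, \@row2, \@row3);
my $game_ref = \@game;

my $with_arrow = $game[1]->[0];        # arrow after an array element
my $no_arrow   = $game[1][0];          # arrow between indices omitted
my $from_ref   = $game_ref->[1][0];    # arrow required after the reference variable
my $all_arrows = $game_ref->[1]->[0];  # fully spelled out

print "$with_arrow $no_arrow $from_ref $all_arrows\n";
```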

Anonymous Arrays

• We have been creating an array such as @arr = (1, 3, 5, 7), then creating a reference to that array: $arr_ref = \@arr.

• It isn’t necessary to do this in 2 steps. If we only want to use the array reference, we can create an anonymous array and create an array reference variable to refer to it. The anonymous array never gets its own name; it is always referred to by its reference.

• Recall that to construct an array you put the array values within parentheses:

    @arr = (1, 3, 5, 7);

• The anonymous array constructor is square brackets: [].

    $arr_ref = [1, 3, 5, 7];

• Using square brackets instead of parentheses generates a reference to an anonymous array, which you assign to a variable. In contrast, the parentheses generate the array itself, which must be given an array designation starting with @.
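A sketch contrasting a reference to a named array with a reference to an anonymous array. Writing through each reference shows which storage it points to.

```perl
use strict;
use warnings;

my @arr      = (1, 3, 5, 7);   # named array
my $named    = \@arr;          # reference to the named array
my $anon_ref = [1, 3, 5, 7];   # reference to an anonymous array, in one step

print "anonymous: @{$anon_ref}\n";
print "last element: $anon_ref->[-1]\n";

# The named reference reaches @arr; the anonymous array is separate storage.
$named->[0]    = 99;   # changes @arr
$anon_ref->[0] = 42;   # changes only the anonymous array
print "arr: @arr\n";   # arr: 99 3 5 7
```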

More Anonymous Arrays

• We could create the tic-tac-toe game thus:

    my @game = ( ["X", "O", "O"], ["O", "X", "O"], ["X", "O", "X"] );

• That is, we generate 3 anonymous arrays inside the parentheses that create the top-level array @game.

• Or, we could generate an anonymous array containing 3 references to other anonymous arrays, and assign the whole mess to an array reference scalar:

    my $game_ref = [ ["X", "O", "O"], ["O", "X", "O"], ["X", "O", "X"] ];

• Here we use nested sets of anonymous array generators (square brackets) to produce the array references we need.
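The two nested constructions above can be exercised with a short sketch; the X-counting loop is an added illustration of walking both levels of the structure.

```perl
use strict;
use warnings;

my @game     = ( ["X", "O", "O"], ["O", "X", "O"], ["X", "O", "X"] );
my $game_ref = [ ["X", "O", "O"], ["O", "X", "O"], ["X", "O", "X"] ];

# @game is a real array, so no arrow is needed before the first index;
# $game_ref is a reference, so the arrow is required there.
print "center (array):     $game[1][1]\n";
print "center (reference): $game_ref->[1][1]\n";

# Count the X's on the board by walking both levels.
my $x_count = 0;
foreach my $row (@{$game_ref}) {
    $x_count += grep { $_ eq "X" } @{$row};   # grep in scalar context returns a count
}
print "X appears $x_count times\n";
```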

Using Temporary Arrays in a Loop

• Another way to create a 2-dimensional array is to create each row as a temporary named array, then convert it to an anonymous array reference and push it onto a larger array.

    my @big_arr;
    for (my $i = 0; $i <= 3; $i++) {
        my @temp_arr = ($i, $i*2, $i*$i);
        push @big_arr, [ @temp_arr ];
    }

• The @temp_arr gets used repeatedly, but the values put into it are placed in separate locations when it gets converted to an anonymous array with [ @temp_arr ].

• There is a temptation to rewrite the “push” line as:

    push @big_arr, \@temp_arr;    # WRONG if @temp_arr is declared outside the loop

• This fails when @temp_arr is declared once, outside the loop (or is a global): @temp_arr then changes with every pass through the loop, and \@temp_arr always refers to the same place in memory, so every row of @big_arr ends up holding the values from the last pass. In contrast, [ @temp_arr ] copies the values in @temp_arr to a new location with each pass through the loop. (When my @temp_arr is declared inside the loop body, as above, Perl creates a fresh array on each pass, so \@temp_arr happens to work; [ @temp_arr ] is safe either way.)
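The pitfall can be reproduced by declaring the temporary array once, outside the loop. In this sketch @shared is that outside array (a name chosen for illustration), and the expected output follows from every element of @wrong pointing at it.

```perl
use strict;
use warnings;

my @shared;   # declared once, OUTSIDE both loops

# Pitfall: push a reference to the same array on every pass.
my @wrong;
for my $i (0 .. 3) {
    @shared = ($i, $i * 2, $i * $i);
    push @wrong, \@shared;          # every element is the same reference
}

# Fix: copy the current values into a new anonymous array each time.
my @right;
for my $i (0 .. 3) {
    @shared = ($i, $i * 2, $i * $i);
    push @right, [ @shared ];       # fresh copy per pass
}

print "wrong: ", join(" | ", map { "@$_" } @wrong), "\n";
# wrong: 3 6 9 | 3 6 9 | 3 6 9 | 3 6 9
print "right: ", join(" | ", map { "@$_" } @right), "\n";
# right: 0 0 0 | 1 2 1 | 2 4 4 | 3 6 9
```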

Auto-vivification

• You don’t need to pre-declare anything about a multidimensional array. Perl takes care of this by creating the needed structures as soon as they are needed. Thus, you could say something like:

    my @arr;
    $arr[5][0][1][4] = 17;

• This would cause a 4-dimensional array to come into being: Perl creates the array references along the path to the element you assigned, and the other elements it had to create along the way (such as $arr[0] through $arr[4]) are set to “undef”.
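A sketch showing what auto-vivification actually creates. The second part adds a caution not stated above: even testing a nested element with exists creates the intermediate levels.

```perl
use strict;
use warnings;

my @arr;
$arr[5][0][1][4] = 17;     # intermediate array references spring into existence

print "top level has ", scalar(@arr), " elements\n";
print "\$arr[0] is ", (defined $arr[0] ? "defined" : "undef"), "\n";
print "\$arr[5] is a ", ref($arr[5]), " reference\n";
print "innermost row has ", scalar(@{ $arr[5][0][1] }), " slots\n";

# Caution: testing a nested element autovivifies the levels above it.
my @probe;
my $exists = exists $probe[2][3];     # false, but $probe[2] is now an array ref
print "probe now has ", scalar(@probe), " elements\n";
```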

Hash of Arrays

• A hash stores a value that is indexed by its key. Sometimes you want to store an array of values indexed by the same key. This can be done using the anonymous array composer to create an array for each individual hash key.

• For example, various data about students could be stored in a single hash whose keys are the student ID numbers.

    my %students = ( "z12345" => ["Schmoe", "Joe", "freshman", "F"],
                     "z67890" => ["Smith", "Harold", "sophomore", "C"],
                     "z13579" => ["Vicious", "Nancy", "senior", "A"] );

• To access a student’s info:

    print "@{$students{z12345}} \n";

• To access an individual piece of information, any of these will work:

    print "${$students{z12345}}[3] ";
    print "$students{z12345}->[3] ";
    print "$students{z12345}[3] ";

• Note that $students{z12345} is a reference to an anonymous array.
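A sketch of updating and reading the hash of arrays. The extra student record and the "probation" field are made-up data for illustration.

```perl
use strict;
use warnings;

my %students = (
    "z12345" => ["Schmoe",  "Joe",    "freshman",  "F"],
    "z67890" => ["Smith",   "Harold", "sophomore", "C"],
    "z13579" => ["Vicious", "Nancy",  "senior",    "A"],
);

# Add a student (hypothetical record), then append a field to an existing entry.
$students{"z24680"} = ["Doe", "Jane", "junior", "B"];
push @{ $students{"z12345"} }, "probation";   # hypothetical extra field

# List last names, ordered by student ID.
my @names = map { $students{$_}[0] } sort keys %students;
print "@names\n";
```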

Anonymous Hashes

• The anonymous hash generator is the curly braces {}. When used instead of parentheses, they generate a scalar reference to an anonymous hash.

• For example:

    my %hash = ("green" => "go", "yellow" => "caution", "red" => "stop");
    my $hash_ref = {"green" => "go", "yellow" => "caution", "red" => "stop"};

• Hash references are de-referenced just like array references:

    print "A red light means $hash_ref->{red} \n";
    print "A red light means ${$hash_ref}{red} \n";
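A sketch of working with an anonymous hash through its reference; the "flashing" entry is a made-up addition.

```perl
use strict;
use warnings;

my $hash_ref = { "green" => "go", "yellow" => "caution", "red" => "stop" };

print "A red light means $hash_ref->{red}\n";
print "A red light means ${$hash_ref}{red}\n";

# Add and remove entries through the reference.
$hash_ref->{flashing} = "proceed with care";
delete $hash_ref->{yellow};

my @colors = sort keys %{$hash_ref};
print "Colors: @colors\n";   # Colors: flashing green red
```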

Array of Hashes

• The anonymous hash composer can be used to create various data structures. An array that contains a set of hash references is an example.

• An example: an array of genes on a chromosome, where the position of the gene in the array corresponds to its relative position on the chromosome. Information about each gene is stored in a hash.

• For example, assume that INFILE contains information about genes, one gene per line, in a “key = value” format, with each attribute separated by commas.

    my @gene_arr;
    while (<INFILE>) {
        chomp;                                   # drop the trailing newline
        my @attributes = split /\s*,\s*/;
        my %temp_hash;
        foreach my $pair (@attributes) {
            my ($key, $value) = split /\s*=\s*/, $pair, 2;
            $temp_hash{$key} = $value;
        }
        push @gene_arr, { %temp_hash };          # copy into a new anonymous hash
    }
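To try the parsing loop without a real file, this sketch reads from an in-memory filehandle (open on a reference to a string, supported since Perl 5.8). The gene names and attribute values are hypothetical sample data.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for INFILE.
my $data = "name=abcA,length=1200,strand=+\n"
         . "name=defB, length = 850, strand=-\n";
open(my $infile, '<', \$data) or die "Cannot open in-memory file: $!";

my @gene_arr;
while (my $line = <$infile>) {
    chomp $line;
    my %temp_hash;
    foreach my $pair (split /\s*,\s*/, $line) {
        my ($key, $value) = split /\s*=\s*/, $pair, 2;
        $temp_hash{$key} = $value;
    }
    push @gene_arr, { %temp_hash };   # each element is a new hash reference
}
close $infile;

print "Gene at index 1 is $gene_arr[1]{name}, length $gene_arr[1]{length}\n";
```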

Printing from Array of Hashes

• To print an individual element, say the length of gene 1. print “$gene_arr[1]{length} \n”;

• To print the whole thing: foreach my $i (0 .. $#gene_arr) { foreach my $key (sort keys %{$gene_arr[$i]} ) { print “$key = $gene_arr[$i]{$key}\n”; } }
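A sketch of the printing loops over a small hand-built array of hashes (hypothetical gene records), plus the common alternative of iterating over the hash references directly.

```perl
use strict;
use warnings;

# Hypothetical gene records, shaped like the ones built from INFILE.
my @gene_arr = (
    { name => "abcA", length => 1200 },
    { name => "defB", length => 850  },
);

my @out;
foreach my $i (0 .. $#gene_arr) {
    foreach my $key (sort keys %{ $gene_arr[$i] }) {
        push @out, "gene $i: $key = $gene_arr[$i]{$key}";
    }
}
print "$_\n" for @out;

# Iterating over the references directly avoids the index variable.
foreach my $gene (@gene_arr) {
    print "$gene->{name} has length $gene->{length}\n";
}
```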

Hash of Hashes

• Here’s a hash of hashes example, based on the previous example of genes on the chromosome. Here we are using a top-level hash whose keys are the gene names.

• The input file has the gene name followed by a colon, followed by a comma-separated list of key=value pairs.

    my %gene_hash;
    while (<INFILE>) {
        chomp;
        my ($gene, $rest) = split /\s*:\s*/, $_, 2;
        my @pairs = split /\s*,\s*/, $rest;
        my %temp_hash;
        foreach my $pair (@pairs) {
            my ($key, $value) = split /\s*=\s*/, $pair, 2;
            $temp_hash{$key} = $value;
        }
        $gene_hash{$gene} = { %temp_hash };
    }
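The same in-memory filehandle approach works for the hash-of-hashes parser. This sketch uses hypothetical gene data and prints the structure with two nested sorted loops.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for INFILE: "gene: key=value,..."
my $data = "abcA: length=1200, strand=+\n"
         . "defB: length=850, strand=-\n";
open(my $infile, '<', \$data) or die "Cannot open in-memory file: $!";

my %gene_hash;
while (my $line = <$infile>) {
    chomp $line;
    my ($gene, $rest) = split /\s*:\s*/, $line, 2;
    my %temp_hash;
    foreach my $pair (split /\s*,\s*/, $rest) {
        my ($key, $value) = split /\s*=\s*/, $pair, 2;
        $temp_hash{$key} = $value;
    }
    $gene_hash{$gene} = { %temp_hash };
}
close $infile;

# Two levels of keys: gene name, then attribute name.
foreach my $gene (sort keys %gene_hash) {
    foreach my $key (sort keys %{ $gene_hash{$gene} }) {
        print "$gene: $key = $gene_hash{$gene}{$key}\n";
    }
}
```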

Further

• All kinds of data structures are possible, with as many levels as you like, mixing arrays and hashes freely. All you have to do is not get yourself confused by your own cleverness.

• Also, remember that someone else will probably have to read your code someday, so document the structures and avoid needless complications.