Extended Perl5 Tutorial
Now that you've been introduced to the new features in Perl5, you're ready to embark on an extended tutorial on references and modules. You need to understand how these elements work so that you can make use of the examples to follow in this tutorial. There's a lot to cover here, so grab a cup of coffee, and I'll try to avoid monotony. You might find it helpful to be sitting at your computer with your copy of Perl5 ready to run so that you can try out the sample code as you go along.
In the past, the Perl programmer had to go through some contortions to implement various complex data types, such as arrays of arrays. The notion of a variable that "pointed" to another data type did not exist. With the advent of Perl5, you now have the reference variable type. References are actually just standard Perl scalar variables, which are assigned or initialized to allow them to be used to refer or "point" to some other Perl data type. References give you powerful new capabilities when writing Perl programs.
To create a new real reference variable, you use the following general syntax:
$variable = \datatype;
Here, you set $variable to be a reference to datatype by preceding datatype with a backslash. $variable can now be used to refer or assign to datatype using an explicit form of a dereference, depending on what datatype is. Table 2.2 illustrates the syntax for using real references. A number of other types of references also exist, each with its own assignment syntax, but I won't explain those types just yet.
Table 2.2. References: Data types and assignment/dereference syntax.
Data Type | Assignment Syntax | Dereference Syntax |
Scalar | $ref = \$var | $$ref |
Scalar Array | $ref = \@array | @{$ref}, or ${$ref}[0] for individual elements |
Hash | $ref = \%hash | %{$ref}, or ${$ref}{key} for individual elements |
Reference | $refref = \$ref | $$refref for the inner reference, $$$refref for the scalar it refers to |
Subroutine (CODE) | $ref = \&sub | &$ref |
Package | bless $ref, Package | $ref->method(), or $ref->{variable} when the object is a hash reference |
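The dereference forms in Table 2.2 can be tried side by side. The following is a minimal sketch rather than anything from the original text: the names $var, @array, %hash, and greet are invented for illustration, and the blessed (Package) row is left out because it is covered later.

```perl
use strict;
use warnings;

# Illustrative data; none of these names come from the tutorial itself.
my $var   = "hello";
my @array = (10, 20, 30);
my %hash  = (key => "value");
sub greet { return "hi, $_[0]" }

my $sref   = \$var;      # reference to a scalar
my $aref   = \@array;    # reference to an array
my $href   = \%hash;     # reference to a hash
my $refref = \$sref;     # reference to a reference
my $cref   = \&greet;    # reference to a subroutine

my $s     = $$sref;              # the scalar itself
my @all   = @{$aref};            # the whole array
my $first = ${$aref}[0];         # one array element
my $value = ${$href}{key};       # one hash element
my $deep  = $$$refref;           # two levels of dereference
my $reply = &$cref("world");     # call through the code reference

print join(" | ", $s, "@all", $first, $value, $deep, $reply), "\n";
# prints: hello | 10 20 30 | 10 | value | hello | hi, world
```

The built-in ref() function reports what kind of thing a reference points to, so ref($aref) returns the string ARRAY and ref($cref) returns CODE.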
In the following sections, you'll look at each of these data types in depth, and I'll demonstrate their use with some examples. These examples should provide you with some general insight as to how each type of reference can be used, but note that they are not comprehensive. You can study the full power and capabilities of references by reading PERLREF and PERLDSC. Another potentially useful document for studying how references work is the test script for references in the Perl distribution, called ref.t. It contains a complete test suite for all types of Perl references. Under UNIX, you can find it in the t/op directory under the Perl build directory. Under Macintosh, the t/op directory should be located within the installation directory. Under Windows 95 (ntperl), this directory is named ntt, and the file has been given the .ntt extension, so look for ntt/op/ref.ntt instead of t/op/ref.t.
References to Scalars
Scalar variables, the simplest type of Perl variable, can be referenced, as can all other types. Although the usefulness of references to simple scalars may not be immediately evident, referencing is certainly an option.
Consider the following example:
    $foo = "Initial value";
    &update_scalar();
    print $foo, "\n";

    sub update_scalar {
        $foo = "Updated";
    }
    # prints: Updated
This example takes a global variable, $foo, and sets it to an initial value; then it calls the update_scalar subroutine to set it to a new value. Simple enough, but if the update_scalar subroutine lives in a package, you're out of luck. Observe the following:
    $foo = "Initial value";
    &test::update_scalar();
    print $foo, "\n";

    package test;
    sub update_scalar {
        $foo = "Updated";
    }
    # prints: Initial value
Here, the $foo variable doesn't get changed because the test package has its own namespace and its own $foo variable, and can't reach the $foo in main unless it names it explicitly, for example as $main::foo. When you work with modules and packages, you'll be faced with this restriction.
So, what to do? Well, you could pass in the $foo from main as a parameter to the subroutine and try to update it within the subroutine like this:
    $foo = "Initial value";
    &test::update_scalar($foo);
    print $foo, "\n";

    package test;
    sub update_scalar {
        ($foo) = @_;
        $foo = "Updated";
    }
    # prints: Initial value
Alas, the $foo that gets updated in the update_scalar subroutine is just a copy of the $foo that is passed in. You're still dealing with two specific variables, in different packages, and you're essentially passing by value when you make a reassignment within the subroutine. The experienced Perl4 programmer will recognize that there's also the option of modifying $_[0] directly, but references provide a cleaner solution.
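For comparison, the $_[0] option mentioned above can be sketched as follows. It relies on the fact that the elements of @_ are aliases for the caller's arguments, so assigning to $_[0] changes the caller's variable. The subroutine name update_in_place is hypothetical and not part of the original example.

```perl
use strict;
use warnings;

package test;

# Hypothetical helper: assigns through the alias in @_ rather than a copy.
sub update_in_place {
    $_[0] = "Updated";
}

package main;

our $foo = "Initial value";
test::update_in_place($foo);
print $foo, "\n";
# prints: Updated
```

This works, but nothing at the call site signals that $foo may be modified, which is one reason passing an explicit reference tends to read more clearly.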
The solution I've chosen, using a reference to a scalar, is to create a reference to main's $foo, pass it into the update_scalar subroutine, and then dereference for the assignment, as follows:
    $foo = "Initial value";
    &test::update_scalar(\$foo);
    print $foo, "\n";

    package test;
    sub update_scalar {
        ($foo) = @_;
        $$foo = "Updated";
    }
    # prints: Updated
Notice how you explicitly pass the reference to the subroutine by using the backslash operator on main's $foo variable in the subroutine call. You thus pass main's $foo by reference to the update_scalar subroutine, and when you assign it to the $foo in the subroutine, test's $foo becomes a real reference to the $foo in main. Using the dereferencing syntax described in Table 2.2, you can then change $main::foo indirectly through the reference, using the $$foo syntax.
References to scalar types have many uses; this simple example describes only one.
References to Scalar Arrays
Scalar arrays are arrays of Perl scalar types. You declare them using the @name syntax. Using a reference to the scalar array enables you to access the elements of an array individually or refer to the entire array, as shown in Table 2.2. Of course, you also can use the reference anywhere an array is expected, such as within a foreach() loop. The following example again illustrates the usefulness of references when passing arguments to subroutines. Consider the following code:
    @array1 = (1, 3, 5);
    @array2 = (2, 4, 6);
Now, what if you want to pass these arrays to a Perl subroutine and then access them within the subroutine, possibly modifying their values? If you've ever tried to pass two or more arrays to a Perl subroutine, then you know that it can't be done easily, because there's no way to determine where the first array ends and the next one begins. (Recall that the parameters passed to a Perl subroutine are accessible only through the @_ array and thus appear to be a single array to the subroutine that receives them.)
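The flattening described above is easy to observe. In this minimal sketch, the subroutine name count_args is made up for illustration; it simply reports how many elements arrive in @_ and how they are distributed when you try to unpack them into two arrays.

```perl
use strict;
use warnings;

my @array1 = (1, 3, 5);
my @array2 = (2, 4, 6);

# Hypothetical helper: shows that both arrays arrive as one flat list.
sub count_args {
    my (@first, @second) = @_;   # @first absorbs everything; @second stays empty
    return (scalar(@_), scalar(@first), scalar(@second));
}

my ($total, $n_first, $n_second) = count_args(@array1, @array2);
print "args: $total, first: $n_first, second: $n_second\n";
# prints: args: 6, first: 6, second: 0
```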
Using references, you can circumvent this limitation. If you create references to each of the preceding arrays, you can easily pass two scalars to the subroutine and then dereference the arrays those scalars have been assigned to, like this:
    @array1 = (1, 3, 5);
    @array2 = (2, 4, 6);
    $ref1 = \@array1;
    $ref2 = \@array2;
    @sum = &array_adder($ref1, $ref2);
    print "\@sum = (", join(',', @sum), ")\n";

    sub array_adder {
        my($ref1, $ref2) = @_;
        my $i = 0;
        my @sum;
        for ($i = 0; $i <= $#{$ref1}; $i++) {
            $sum[$i] = ${$ref1}[$i] + ${$ref2}[$i];
        }
        return @sum;
    }
    # prints: @sum = (3,7,11)
Here, you've created a new array whose elements are the sums of the corresponding elements of two equal-length arrays. That's easy, but you do it by passing the arrays to a subroutine using references and thus make a formerly difficult, or at least nonintuitive, task easier. In Perl4, you would have had to either use typeglobs or pass in the length of the arrays as the first or last argument and then split @_ appropriately. Not pretty.
Note how you are able to use the reference within the subroutine in the $#array context (the highest index of the array, counting from zero), as well as access the individual elements of the arrays that are being referred to. Again, this is just one use for references to arrays. See the documentation mentioned previously, and PERLLOL in particular, for many more examples.
References to Hashes (Associative Arrays)
When you create a reference to an associative array (hash), you can access all the keys and values of the associative array through the reference. You can also use the reference in place of the hash, using the syntax in Table 2.2, within any given Perl function that operates on associative arrays or their elements, such as keys(), each(), and delete().
Hash references are extremely powerful. Using them, you can build up complex data structures containing all the Perl data types or references to those types. In the following example, you use the standard assignment/dereference syntax described previously and one of the other dereferencing syntaxes:
    @array = (1, 2, 3);
    %Hash = (
        'foo'          => 'bar',
        'aref'         => \@array,
        'internalhash' => {
            'birds'  => 'duck',
            'plants' => 'tomato',
        },
    );

    # print out the simple scalar element
    print $Hash{'foo'}, "\n";
    # print out the elements of @array
    print join(' ', @{$Hash{'aref'}}), "\n";
    # print out the elements of the %internalhash
    foreach $key (keys( %{$Hash{'internalhash'}} )) {
        print "Key is $key, value is $Hash{'internalhash'}->{$key} \n";
    }
    # prints (the order of the hash keys may vary):
    #   bar
    #   1 2 3
    #   Key is plants, value is tomato
    #   Key is birds, value is duck
Note how you dereference the value corresponding to the internalhash key of %Hash by using the -> dereferencing operator. You can use -> because the value is itself a reference to an anonymous hash. Because it's a reference, you can store it as a regular scalar element of the hash and still emulate a multidimensional hash.
Caution:
Remember, in spite of appearances, arrays are always one dimensional within Perl. You're only emulating multidimensionality here by using references. See PERLDSC for more details.
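To make the caution concrete, here is a minimal sketch of an emulated two-dimensional array. The name @matrix and its contents are invented for illustration. Each element of @matrix is a reference to a separate anonymous array, which is why the rows can end up with different lengths.

```perl
use strict;
use warnings;

# A one-dimensional array whose elements are references to anonymous arrays.
my @matrix = (
    [1, 2, 3],
    [4, 5, 6],
);

my $elem  = $matrix[1]->[2];         # explicit arrow: row 1, column 2
my $same  = $matrix[1][2];           # the arrow is optional between subscripts
my $nrows = scalar @matrix;          # number of row references
my $ncols = scalar @{ $matrix[0] };  # length of the first row only

push @{ $matrix[0] }, 99;            # rows are independent, so row 0 now has 4 elements

print "$elem $same $nrows $ncols ", scalar @{ $matrix[0] }, "\n";
# prints: 6 6 2 3 4
```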
References to References
References, like any other Perl data type, can themselves be referred to by other references. You may, for instance, have several references to various types or data structures that you want to group together under a single reference. Here's an example:
    $scalar = "a string";                           # a regular scalar
    $array  = [1, 2, 3];                            # anonymous array
    $hash   = {"foo" => "bar", "baz" => "blech"};   # anonymous hash
    $scalarref = \$scalar;                          # ref to scalar
    $refref = [\$scalarref, \$array, \$hash];       # anon array of refs to refs

    # print out the contents of the ref to ref to scalar
    print $${$refref->[0]}, "\n";
    # print out the elements of the ref to ref to array
    print join(' ', @{${$refref->[1]}}), "\n";
    # print out the elements of the ref to ref to hash
    foreach $key (keys( %{${$refref->[2]}} )) {
        print "Key is $key, value is ${${$refref->[2]}}{$key} \n";
    }
    # prints (the order of the hash keys may vary):
    #   a string
    #   1 2 3
    #   Key is foo, value is bar
    #   Key is baz, value is blech
You can go as deep as you like with references to references. Of course, at some point, readability may suffer. I recommend readability over complexity in most cases, especially in a public interface to a module or library, or within code that may require modifications by someone other than yourself in the future.
References to Subroutines
The last type of standard reference to look at, before getting to blessed references and object-oriented Perl, is the reference to code. Specifically, let's look at how to set up and use a reference to a subroutine.
References to subroutines may be useful in a number of situations. You can use them to implement closures, as previously described, or as subroutine parameters, or as part of complex data types. Subroutine references are also useful within packages, but note that a reference to a subroutine in another package, such as \&Package::sub, refers to that one subroutine directly; calling through it does not search for a method through inheritance. See PERLSUB for more details.
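As a reminder of what a closure looks like in practice, here is a minimal sketch. The name make_counter is hypothetical. Each call returns a reference to an anonymous subroutine that keeps its own private copy of $count, and the sketch calls it with the $ref->() form, which is equivalent to &$ref().

```perl
use strict;
use warnings;

# Hypothetical generator: returns a code reference that remembers $count.
sub make_counter {
    my $count = shift;
    return sub { return $count++ };
}

my $from_one = make_counter(1);
my $from_ten = make_counter(10);

# Each counter advances independently of the other.
my @seen = ($from_one->(), $from_one->(), $from_ten->(), $from_one->());
print "@seen\n";
# prints: 1 2 10 3
```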
The following example illustrates a simple case in which you set up a hash of references to subroutines in the same package. Here, you use two types of references to access the subroutines:
    sub foo {
        return "I'm in foo now\n";
    }
    sub bar {
        return "Here I am in bar\n";
    }

    %subrefs = (
        'foo' => \&foo,
        'bar' => 'bar',    # bar is a "fake" reference
    );
    while (($key, $ref) = each(%subrefs)) {
        print $key, " : ", &$ref;
    }
    # prints (the order of the hash keys may vary):
    #   foo : I'm in foo now
    #   bar : Here I am in bar
You set up a single hash to contain multiple references to subroutines. The potential for dynamic runtime decision trees should be evident. You could arbitrarily assign references to subroutines at runtime, based on some input parameters, for instance, then execute the subroutines using the references. You could have easily done the same with a regular scalar array, emulating a C-style array of pointers to functions--not as powerful as a hash of function pointers, but potentially useful.
Note how, in the preceding example, when setting up the reference to the bar subroutine in the %subrefs declaration, I didn't prefix it with the \ operator. Instead, I used a fake reference (PERLREF calls this a symbolic reference), which is another way to access a given data type by its name. This technique could also be used in Perl4; in Perl5, however, it works for any data type. Be aware that symbolic references are disallowed when use strict 'refs' is in effect. You can easily declare another (fake) reference to a fake reference, to any depth. The ref.t test suite has a nice example of using fake references. I won't discuss them much more, because they're not widely used, but they're worth noting.
References to Packages (When Blessed)
When the reference variable refers to a Perl package and has been blessed into the package, as shown in Table 2.2, it is known as a Perl object. It can then be used to store, and allow access to, any Perl data type that is used by the package, akin to public instance variables being accessed by a reference in C++.
You also can use the blessed reference to invoke the methods of the package, also analogous to a C++ class reference. When you use a blessed reference in this way, the method that gets invoked automatically has access to the object. This technique is very common in the Perl modules, and it's very powerful. You need to have a clear understanding of it in order to use and reuse the examples and modify them to suit your needs.
Let me give you a simple example to illustrate the concepts. In this example, the object is a reference to hash. You could simplify it to a reference to array or scalar, if it were appropriate. A hash reference gives the most flexibility to access and grow the object dynamically. Consider the following:
    package Customer;

    sub new {
        my $type = shift;
        my %args = @_;
        my $self = {};
        # Fall back to defaults when an argument is missing or empty
        $self->{'Name'} = (defined($args{'Name'}) && length($args{'Name'}))
            ? $args{'Name'} : "No name given";
        # defined(@array) is rejected by current perls, so test the reference type
        $self->{'Vitals'} = (ref($args{'Vitals'}) eq 'ARRAY')
            ? $args{'Vitals'} : ["No vitals"];
        return bless $self, $type;
    }

    sub dumpcust {
        my $self = shift;
        # Print out the values for the object
        print $self->{Name}, "\n";
        print join(' ', @{$self->{Vitals}}), "\n";
    }

    package main;

    # Create a new customer object
    $cust = Customer->new(
        'Name'   => "Billy T. Kid",
        'Vitals' => ["Age : 42", "Sex : M"],
    );
    # Invoke the method to print out the values for the object
    # (dumpcust does its own printing, so its return value is not printed)
    $cust->dumpcust;
    # prints:
    #   Billy T. Kid
    #   Age : 42 Sex : M
In this simple example, you see how to initialize the Perl object, which is a reference to hash, with both scalar elements and a reference to a scalar array. Then you invoke the dumpcust() method from the Customer package or class, using the blessed reference.
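As a separate, smaller sketch of how the invoked method "automatically has access to the object": the arrow syntax passes the blessed reference as the first argument, so an ordinary subroutine call with the object supplied by hand gives the same result. The Greeter package and its greeting method are invented for illustration and are not part of the Customer example.

```perl
use strict;
use warnings;

package Greeter;    # hypothetical class, for illustration only

sub new {
    my ($class, %args) = @_;
    my $self = { Name => $args{Name} };
    return bless $self, $class;
}

sub greeting {
    my ($self, $salutation) = @_;    # $self is the object the method was called on
    return "$salutation, $self->{Name}";
}

package main;

my $obj       = Greeter->new(Name => "Billy");
my $via_arrow = $obj->greeting("Hello");             # method call syntax
my $via_sub   = Greeter::greeting($obj, "Hello");    # same call, object passed by hand

print ref($obj), "\n";    # prints: Greeter
print "$via_arrow\n";     # prints: Hello, Billy
print "same result\n" if $via_arrow eq $via_sub;
```

The two forms differ once inheritance is involved: the arrow form looks the method up through the object's class, while the fully qualified call always runs the named subroutine.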
I'll continue to develop this example as we progress into the extended study of the Perl module, and the object-oriented features of Perl programming.