Perl Programming: Huge Overview of Perl Language


Context
Lots of programs have been written in Perl. The name Perl probably sounds familiar to you, at least if you are familiar with the CGI part of the Internet or work with shell scripts on UNIX.

If you don't know anything about CGI, you may want to read this article as well. That article explains what a CGI script can do for your website.

Many people don't know why Perl is so popular, and what you can do with it. This article shows a lot of coding tricks in Perl.

Introduction
Why Perl? Many companies that develop software use Perl. Perl is used for financial, construction, genetics, military and, of course, CGI programs on the World Wide Web. Perl programs can be very robust, and easy to use and implement. It is also worth mentioning that many Linux distributions depend on Perl for their system scripts and tools!

Large software programs are also written in other languages, like C. What does Perl add? Perl is so widely used because it's ideal for linking things together. Theoretically, you can write large software in Perl, but most of that software already exists.

The scalability of Perl and the flexibility of its code mean that Perl programs can run on a dozen platforms. The same task can be written in various styles, each producing the same result.

The fun of Perl
Why do I like Perl that much? Well, here are some styles you can write your code in. Most of the styles in Perl are based on English grammar, not on earlier C-like languages. A friend of mine described Perl as 'Programming fun for the whole family'. The control structures in particular are worth a look in the remaining part of this document.
Here is a huge table containing a summary of Perl styles, for those who already have some programming experience.

Perl Statements Description



Scalar Variables  
$Variable1 = 2;
$Variable2 = 3.5;
$Variable3 = "Hello";
$TenStars = "*" x 10;
$Var1 = $Var2 = $Var3 = 0;

my $YourInput = <STDIN>;
print $YourInput;
chomp $YourInput; # Remove line breaks;
Assign a value to a 'scalar' variable
$String1 = "The value $Variable1 is assigned";
$String2 = 'literal string, no $ vars';
$String3 = qq[Var String];
$String4 = qq~Var String~;
$String5 = q[Literal String];
$String6 = q^Literal String 2^;
Different string quotes. Double quotes and the qq operator can include variables, which are replaced by their values. Single quotes and the q operator take the text literally. With q or qq you can choose your own quote characters!
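The difference between interpolating and literal quotes can be sketched as follows. This is a minimal example; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $Count = 3;

my $Interpolated = "Count is $Count";      # double quotes interpolate
my $Literal      = 'Count is $Count';      # single quotes do not
my $WithQuotes   = qq[She said "$Count"];  # qq behaves like double quotes
my $LiteralQ     = q{It's $Count};         # q behaves like single quotes

print "$Interpolated\n$Literal\n$WithQuotes\n$LiteralQ\n";
```

Choosing your own delimiters with q and qq is mostly useful when the text itself contains quote characters, as in the last two lines.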
{
  my $NewVar = 'value'; # Only in this block
  {
    # Here I also know $NewVar
    my $NewVar = $NewVar; # New copy
    local $_; # Makes $_ local here
  }
}

# Here I don't know $NewVar
# and $_ has its previous value again.
Declare variables in BLOCKs with my. You can start a BLOCK at any location in your program. A variable with the same name can be declared within a sub BLOCK, and the value of that variable in the enclosing BLOCK can be assigned to it. You actually make a copy of that variable under the same name.

You can do the same with variables that cannot be declared with my, like $_, using local. In that case, the value the variable has in the enclosing BLOCK is not altered; it is restored when the inner BLOCK ends.
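A small sketch of both scoping forms described above. The names $Global, ShowGlobal and $Name are assumptions made for the example; a package variable is used instead of $_ to keep the output easy to follow.

```perl
use strict;
use warnings;

our $Global = 'outer';              # package variable, can be localized
sub ShowGlobal { return $Global }

my $Name = 'outer';
my ($InnerName, $InnerGlobal);
{
    my $Name = $Name . '-copy';     # new variable, initialised from the outer one
    local $Global = 'inner';        # temporary value until the block ends
    $InnerName   = $Name;
    $InnerGlobal = ShowGlobal();    # called subs see the localized value
}

# Outside the block both variables have their old values again.
print "$InnerName $InnerGlobal $Name ", ShowGlobal(), "\n";
```

This should print `outer-copy inner outer outer`.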



Arrays and Lists  
@Array = ("item0", "item1", "item2");
@SameArray1 = qw(item0 item1 item2);
@SameArray2 = qw/item0 item1 item2/;
@TwelveNines = (9) x 12;

@Items_1to3 = @Array[1,2,3];
@Items_3to5 = @Array[3..5];
($Item0, $Item1, $Item2) = @Array;

$Item1 = $Array[1];
$LastItem = $Array[-1];

$Items = @Array;
print scalar @Array;

$LowerBound = $[;   # First index; deprecated, always 0 in modern Perl
$UpperBound = $#Array;

my @AllYourInput = <STDIN>;
print @AllYourInput;
chomp @AllYourInput; # Remove line breaks
Working with arrays. qw works like qq in that you can choose the quote characters. With qw, whitespace is used as the element separator in the assigned list.

In that print statement, the scalar call forces scalar context instead of list context, so the number of elements is printed (a plain print prints the contents).

Something else, not demonstrated here, is that you can give an array the same name as a scalar variable. Perl will always see the difference from the context the variable is used in. The prefix doesn't tell what type of variable it is. It only adds more context information.
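To make the context rules a bit more concrete, here is a minimal sketch. All variable names are chosen for illustration only.

```perl
use strict;
use warnings;

my @Array = qw(item0 item1 item2);
my $Array = 'a separate scalar';   # same name, but a different variable

my $Count     = @Array;            # scalar context: number of elements
my ($First)   = @Array;            # list context: first element
my $LastIndex = $#Array;           # highest index
my $Last      = $Array[-1];        # element access uses the $ prefix

print "$Count $First $LastIndex $Last\n";
print "$Array\n";
```

The first line printed should be `3 item0 2 item2`; the scalar $Array is untouched by anything done to @Array.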

@Sorted = sort @Array;
@Reverse1 = reverse(sort(@Array));
@Reverse2 = reverse sort @Array;

@SortWithAlgorithm = sort {
  return 1 if $a > $b;
  return 0 if $a == $b;
  return -1;
} @Array;
Sorting arrays. A comparison algorithm can be passed to the sort function.
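The comparison block above can usually be written more compactly with Perl's built-in <=> (numeric) and cmp (string) operators. A short sketch with made-up data:

```perl
use strict;
use warnings;

my @Numbers = (10, 9, 100, 1);

my @Default    = sort @Numbers;                # default: string comparison
my @Numeric    = sort { $a <=> $b } @Numbers;  # numeric, ascending
my @Descending = sort { $b <=> $a } @Numbers;  # numeric, descending

print "@Default\n@Numeric\n@Descending\n";
```

Note that the default sort compares strings, so it yields `1 10 100 9` here, while the numeric version yields `1 9 10 100`.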



Hashes  
%Hash = ("key0" => "value0", "key1" => "value1");
$Value = $Hash{'key0'};

@Hash{@keys} = @values;
@Hash{ qw(key1 key2 key3) } = qw(v1 v2 v3);

@Keys   = keys %Hash;
@Values = values %Hash;

if(exists $Hash{'key0'}) {
  code
}

delete $Hash{'key1'};
Working with hashes. A hash contains values linked to keys (like a HashMap object in Java). Many Perl programs use them because of their (relative) efficiency and usability.

Looking up a value by its key is a lot faster than iterating through an array and comparing each item. That's one of the powers of Perl. For example, to check for duplicate elements, simply use a hash: $seen{$item} = 1; and if($seen{$item}) ... or if(exists $seen{$item}) ....

Strange as it might seem, the @ prefix is used on the third and fourth lines, because a list of values is being assigned. The { } characters tell Perl that it's actually a hash.

Iterating through hashes is discussed further down in this document.
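The duplicate check and the @Hash{...} slice mentioned above could look like this in practice. This is only a sketch; the fruit names and prices are invented.

```perl
use strict;
use warnings;

my @Items = qw(apple pear apple plum pear);

# Keep only the first occurrence of each item.
my (%Seen, @Unique);
foreach my $Item (@Items) {
    next if $Seen{$Item};
    $Seen{$Item} = 1;
    push @Unique, $Item;
}

# Hash slice: assign several keys at once.
my %Price;
@Price{ qw(apple pear plum) } = (3, 2, 5);

print "@Unique\n";
print join(',', map { "$_=$Price{$_}" } sort keys %Price), "\n";
```

The keys are sorted before printing because a hash has no guaranteed order.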



References  
$Ref = \$Variable;
$ValueOfRef = $$Ref;

$Ref2 = \@Array;
($Item1, $Item2) = @$Ref2[1..2];
$Item1 = $$Ref2[1];
$Item2 = $Ref2->[2];

$Ref3 = \%Hash;
$Value2 = $$Ref3{'key2'};

$Ref4 = [ 'item0', 'item1', 'item2' ];
$Item1 = $Ref4->[1];

$Ref5 = { key1 => 'value1', key2 => 'value2' };
$Value2 = $Ref5->{'key2'};

$Ref6 = \&subroutine;
&$Ref6('arg1', 'arg2');
$Ref6->('arg1', 'arg2');

$Ref1 = \$Variable;
$Ref1b = $Ref1;
$$Ref1 = 'new value';
print $$Ref1b; # $Ref1 and $Ref1b point to
               # the same memory location
Taking a reference to a variable or subroutine.

A reference is a variable that doesn't hold the actual data, but knows where to find it. A reference is sometimes easier for Perl to handle, since you don't need to copy the entire data, only the memory location stored in the reference.

That also means that when you assign a reference variable to another variable, that second variable will also be a reference to the same location. However, by dereferencing the variable first, you assign (actually copy) the data to the new variable. Dereferencing is done by putting an extra $ (or @ or %, matching the kind of data) before the variable name.

Special list characters can be used to create the reference directly.

  • A [ ... ] creates a reference to an array.
  • A { ... } creates a reference to a hash.
  • A sub { ... } creates a new subroutine accessible through the reference. You have to use the sub keyword as a right-hand value, without giving the subroutine a name, to make this work.

Unlike C, Perl cleans up unreferenced data by itself, using a garbage collector based on reference counting.
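The difference between copying a reference and copying the data, together with the [ ], { } and sub { } constructors, can be sketched like this. All names are hypothetical.

```perl
use strict;
use warnings;

my @Array = (1, 2, 3);
my $Ref   = \@Array;
my $Same  = $Ref;        # second reference to the same array
my @Copy  = @$Ref;       # dereferenced first: an independent copy

push @$Same, 4;          # changes @Array
$Copy[0] = 'changed';    # does not change @Array

# Anonymous structures built directly as references.
my $Square = sub { return $_[0] ** 2 };
my $Data   = { list => [10, 20], code => $Square };

print scalar(@Array), " $Array[0] $Data->{list}[1] ", $Data->{code}->(5), "\n";
```

This should print `4 1 20 25`: the push through $Same is visible in @Array, while the change to @Copy is not.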



Conditional Statements  
if(expression) { code }
The normal if statement.
$X = (expression) ? 23 : 12;
(expression ? $X : $Y) = 2;
Short if statement. The second version, with the condition on the left-hand side of the assignment, is not supported in C.
unless(expression) { code }
An unless statement in Perl.
code if(expression);
code if expression;

code unless(expression);
code unless expression;
Turn it around, just as you would write it in grammatically correct English!
if(expression) {
    code line 1;
    code line 2
}
elsif(expression) {
    alternative code
}
else {
    other code
}
Full, complete if statement. The last statement in a { } block never requires a closing ;
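A small runnable sketch of the conditional forms listed above, with illustrative variable names:

```perl
use strict;
use warnings;

my ($X, $Y) = (0, 0);
my $UseX = 1;

($UseX ? $X : $Y) = 2;                   # conditional on the left-hand side

my @Log;
push @Log, 'x set'       if $X == 2;     # statement modifier
push @Log, 'y untouched' unless $Y;      # unless: runs when $Y is false

my $Size = ($X > 1) ? 'big' : 'small';   # short if as a value

print join(', ', @Log, $Size), "\n";
```

This should print `x set, y untouched, big`.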



Iterating Loops  
for ($I = 0; $I < 9; $I++) {
    repetitive code
}
A pretty normal C-like for statement. I call it a look-alike while statement.

However, I never use it in Perl. There are nicer loop structures available. Even converting an array to a hash first and iterating through that appeared to be faster than iterating through the array using an increment of $I+=2.

foreach (0,1,2,3,4,5,6,7,8,9) {
    print   # the var $_ is used now
}

foreach my $I (0,1,2,3,4,5,6,7,8,9) {
    print $I;
}
A much more readable for statement (foreach, actually). When no iterator variable is specified, $_ is used. The print statement also uses that variable when no arguments are passed.
foreach (0..9) { print }
for (0..9) { print }
foreach my $I (0..9) { print $I }
Even more compact statements. The for and foreach keywords can both be used; they are synonyms.
for (my $I=0; $I < @Array; $I++) {
    my $Item = $Array[$I];
    repetitive code
}

foreach my $Item (@Array) {
    repetitive code
}
Iteration through arrays.

Isn't that second statement much more readable?

foreach my $Key (keys %Hash) {
  my $Value = $Hash{$Key};
  code
}

while(my($key, $value) = each(%Hash)) {
  code
}
Iteration through hashes. Very simple actually.
print foreach @Items;
Also grammatically correct English. I find this unreadable in most cases, but sometimes I use it. The typical situation is: "replace some text for each of the following variables: ...". This is possible when both statements use the $_ variable.
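The loop styles above, including the "replace some text for each of the following variables" case, might be sketched like this. The data is invented.

```perl
use strict;
use warnings;

my %Hash = (one => 1, two => 2, three => 3);

# foreach over the keys of a hash
my $Total = 0;
foreach my $Key (sort keys %Hash) {
    $Total += $Hash{$Key};
}

# Statement modifier: both s/// and foreach use $_,
# and $_ is an alias, so the array itself is modified.
my @Words = qw(alpha beta gamma);
s/a/A/g foreach @Words;

# C-like for with a step of two
my @Evens;
for (my $I = 0; $I < 10; $I += 2) {
    push @Evens, $I;
}

print "$Total @Words @Evens\n";
```

This should print `6 AlphA betA gAmmA 0 2 4 6 8`.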



Conditional Loops  
while(expression) {
    code
}

until(expression) {
    code
}
Normal while and until statements. The code is repeated while the expression is true (or, with until, until it becomes true).
do {
    code
} while(expression);

do {
    code
} until expression;
A while statement, but the block is evaluated once before the condition is tested. In this case the ( ) characters may be left out, since Perl already knows where the statement ends.
code while(expression);
code until(expression);

code while expression;
code until expression;
Also grammatically correct English
$FileName = qq[It's a "Cool" file.txt];

# Three-argument open with a lexical handle: safe for odd file names
open(my $File, '<', $FileName) or die "Open fail: $!";
while(my $Line = <$File>) {
    do something with $Line
}
close($File);
A pretty cool way of reading a file. A line is assigned to $Line, which is visible only within the while BLOCK. The assigned value is then tested by the while, so the loop stops at the end of the file.

However, please note that we removed the file locking statements, which are very important in multi-user systems!
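For completeness, here is a self-contained sketch of the same loop with a shared lock added. It assumes the core modules Fcntl and File::Temp are available; the temporary file and its contents exist only to make the example runnable.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempfile);

# Create a small file to read back (illustration only).
my ($Out, $FileName) = tempfile(UNLINK => 1);
print {$Out} "first\n# comment\nsecond\n";
close($Out) or die "Close fail: $!";

open(my $In, '<', $FileName) or die "Open fail: $!";
flock($In, LOCK_SH) or die "Lock fail: $!";   # shared lock for reading

my @Lines;
while (my $Line = <$In>) {
    chomp $Line;
    next if $Line =~ /^#/;    # skip comment lines
    push @Lines, $Line;
}
close($In);                   # closing the handle also releases the lock

print "@Lines\n";
```

A writer would take LOCK_EX instead of LOCK_SH; how strictly locks are enforced depends on the operating system.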

LINE: foreach my $Line (@FileContents) {
    find/replace text in a line

    next LINE if it is a comment-line;
    redo LINE if more text found;
    last LINE if everything we need is found;
}
Useful statements to jump within BLOCKs.
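A runnable version of the labelled loop could look like the sketch below. The contents of @FileContents are made up, and redo is left out to keep the example short.

```perl
use strict;
use warnings;

my @FileContents = ('# header', 'name=perl', '', 'done', 'ignored=1');

my @Kept;
LINE: foreach my $Line (@FileContents) {
    next LINE if $Line =~ /^#/ || $Line eq '';  # skip comments and blank lines
    last LINE if $Line eq 'done';               # stop: everything we need is found
    push @Kept, $Line;
}

print "@Kept\n";
```

Only `name=perl` is kept; the line after `done` is never reached.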



Subroutines  
sub NewSub {
  my($Arg1, $Arg2, $Arg3) = @_; # Copy args

  subroutine code

  return @Results if wantarray;
  return $ScalarResult;
}
Quite some subroutine. The parameters are stored in @_. First I copy them into local variables.

Strangely, the elements in @_ are aliases for the variables that were passed in. So changing one of those elements will also change the original variable. This no longer applies to the copies made from @_.

The subroutine can even determine whether its caller expects a list or a scalar return value.
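Both effects, wantarray and the aliasing of @_, can be sketched in a few lines. MinMax and Double are hypothetical subroutines written for this example.

```perl
use strict;
use warnings;

sub MinMax {
    my @Sorted = sort { $a <=> $b } @_;
    return ($Sorted[0], $Sorted[-1]) if wantarray;  # caller wants a list
    return "$Sorted[0]..$Sorted[-1]";               # caller wants a scalar
}

sub Double {
    $_[0] *= 2;     # $_[0] is an alias: this changes the caller's variable
    return;
}

my ($Min, $Max) = MinMax(4, 1, 9);
my $Range       = MinMax(4, 1, 9);

my $Value = 21;
Double($Value);

print "$Min $Max $Range $Value\n";
```

This should print `1 9 1..9 42`.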

sub NewSub ($) { more code }
sub NewSub ($;$$%) { more code }
sub NewSub ($@) { more code }
sub NewSub (*&) { more code }
sub NewSub ($\$$$) { more code }
Subroutines prototyped to receive a number of (optional) arguments of a certain data type. This can be a scalar, array, hash, handle, subroutine pointer or a reference to any of those types.
NewSub;
NewSub();

&NewSub;     # Current @_ contains args
&NewSub();   # Sub not declared yet

&NewSub($Arg1, @OtherArgs, $LastArg);
NewSub(@AllTheArgs);

SubThatExpectHash(%Hash);
SubThatExpectHash('key1'=>"value1", 'key2'=>"value2");
Some of the various ways for calling subroutines. There are even more possibilities. As you can see, it doesn't matter whether you pass an array, hash or scalar variable. They are all mixed and concatenated into the @_ variable for the subroutine.
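How the arguments end up mixed together in @_ can be seen in this small sketch. CountArgs and Describe are invented names.

```perl
use strict;
use warnings;

sub CountArgs { return scalar @_ }

sub Describe {
    my %Options = @_;    # rebuild a hash from the flat argument list
    return join ', ', map { "$_=$Options{$_}" } sort keys %Options;
}

my @OtherArgs = (2, 3);
my %Hash      = (key1 => 'value1', key2 => 'value2');

print CountArgs(1, @OtherArgs, 4), "\n";   # scalars and array become one list
print CountArgs(%Hash), "\n";              # a hash becomes keys and values
print Describe(%Hash), "\n";
```

Both CountArgs calls should report 4 arguments. To pass an array or hash as a single argument, pass a reference to it instead.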



Regular Expressions  
$_ = "Hello there";
if( /Hello/ ) { code } # Hello found in string
if( m(Hello) ) { code }

$SearchIn = qq[...lots of text...];
if( $SearchIn =~ /Hello/ ) { code }
@Matches = ($SearchIn =~ m/Hello/);
Pattern matching in Strings. The m operator allows you to use different separators. Using an =~ or !~ operator makes m search in that string. Otherwise, $_ is used.

$URL = qq[http://www.website.com];

# Is it a http or https URL?
if( $URL =~ m[^http(s?)://(.+)\.(.+)$] ) { code }

if( $Email =~ m/^(.+)\@(.+)$/) {
  print qq[Name:  $1\n];
  print qq[Domain: $2\n];
}

if( $SearchIn =~ m/\QHello\E/) { code }

$Is_PC_IP =~ m#^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$#;

@StartWithF = m/\W(f\w+)/i;

# Finds a telephone number in an entire document
@TelNumberElem = m/(\d{3})-(\d{3})-(\d{4})/;

$HeOrHi = m/h[ie]/;
$HelloOrHi = m/hello|hi/i;
Regular expressions can contain meta characters for searching. This might seem slow, but I have never written an algorithm with BASIC-like search functions that executes faster than a regexp.

After the match, the variables $1 to $9 contain the strings matched between the parentheses. In list context, all results are returned as a list. Otherwise, a boolean success value is returned.

Actually, you need to use \Q in the regexp to make a literal string search.
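A short sketch of captures, list-context matching and \Q. The e-mail address, the phone numbers and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Captures returned as a list
my $Email = 'larry@example.org';
my ($Name, $Domain) = $Email =~ m/^(.+)\@(.+)$/;

# With /g in list context, every match in the text is returned
my $Document = 'Call 555-123-4567 or 555-987-6543.';
my @Numbers  = $Document =~ m/(\d{3}-\d{3}-\d{4})/g;

# Without \Q the dot in $Needle is a meta character matching any character
my $Needle  = 'a.b';
my $Literal = ('axb' =~ m/\Q$Needle\E/) ? 'match' : 'no match';
my $Pattern = ('axb' =~ m/$Needle/)     ? 'match' : 'no match';

print "$Name $Domain\n@Numbers\n$Literal, $Pattern\n";
```

The last line should read `no match, match`, which shows why \Q matters when the search text comes from a variable.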

$FindReplaceIn =~ s#\s(albert|zedd)\s#Welcome $1#;
Substitutions. By using the s/findregexp/replace/ operator instead of a plain match, you can replace text.
  • The s/find/text/g replaces all occurrences.
  • The s/find/text/i does the replacement case-insensitively.
  • The s/find/function(args)/e uses a function to determine what the replacement should be.
  • s/find/function(args)/egi combines those, of course.
  • Some other, rarer flags can be found in the perldoc manpages.
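The g, i and e flags from the list above can be tried out with a sketch like this. The sample strings are invented.

```perl
use strict;
use warnings;

my $Text = ' albert and ZEDD are here ';

# /g replaces every occurrence, /i ignores case
(my $Greeted = $Text) =~ s#\s(albert|zedd)\s# Welcome $1 #gi;

# /e evaluates the replacement as Perl code
my $Prices = 'tea 2, cake 5';
(my $Doubled = $Prices) =~ s/(\d+)/$1 * 2/ge;

print "[$Greeted]\n[$Doubled]\n";
```

The (my $Copy = $Original) =~ s/.../.../ form copies the string first, so the original stays unchanged.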


Some Final Notes
As you can see, Perl makes programming fun. With easy statements based on English grammar, Perl is my favorite language. Please also check the Perl Programming Style documentation. Although you can program things the way you want, there is still a style preferred by Larry Wall, the creator of Perl.

Written by Diederik van der Boor on 13 October 2001