Coding Domain

Perl Programming: Samples


The program
One day, I've heared a wonderful story about someone trying to parse a virus check log file of 8 MB. Using the programming knowledge he had, and trying all kinds of things in kixtart, and Word (VBA).

The Word VBA program was very slow. That program tried to access each line individually using the paragraphs collection object. However, when you delete a line (paragraph), Word has to update that entire collection. This is a terrible breakdown.

Using kixtart, a script could also parse the log file. Eventually, that took 3 hours! I started wondering if there is a better method to this all...

Welcome to Perl

Withing half a hour I wrote the script, while my computer was downloading the file. I added some Benchmark tests to the script, and started it up. Guess what? The script needed 1.5 seconds to parse the entire file ;)

Here it is!

P.S. UNIX people would properly laugh about this story, knowing AWK provides a simple way of handing this problem.
Running awk '/^\QDirectory / {print $0}' bk2_Vr.log would be my first guess.

logremove.pl
#!/usr/bin/perl -w    ## UNIX: Put path to perl here.


#####
# Remove lines from a logfile.
#
# Copyright (c) 2001 - All rights reserved.
# Diederik van der Boor ("vdboor" at "hotmail.com")
#
# Permission to use/edit this code as long
# as the copyright notice remains intact.
#
#
# Note that...
#
# There is no command-line, or user-iterface.
# Below here some settings van be defined.
#
# Building this core took about 15 minutes.
# Fine-tuning (+variable-setting-features)
# another 15 minutes ;-) Go go go Perl!
#
####



####################################################################
## Requirements

use strict;
use Benchmark;
use Fcntl qw(:flock);
use constant KEEP   => 1;
use constant REMOVE => 0;
use constant TRUE   => 1;
use constant FALSE  => 0;




####################################################################
## settings

my $File      = 'bk2_Vr.log';       # The File
my $Match     = qq[^\QDirectory ];  # Regular expression what to match (see `perldoc perlre` manpage)
my $Action    = REMOVE;             # What to do when match                  (KEEP | REMOVE)
my $Display   = FALSE;              # Should those lines be displayed?       (TRUE | FALSE)
my $Overwrite = FALSE;              # Overwrite the original file when done? (TRUE | FALSE)















####################################################################
## Regexp test

eval{ "aa" =~ m[$Match] };
if($@)
{
  $@ =~ s/at .+ line \d+\.\n$//;
  die "Bad regexp: '$Match': $@\n";
}


####################################################################
## Opening the files

my $Temp = "$File.new";

open(LOG, $File)     or die "Can't open $File: $!";
open(TEMP, ">$Temp") or die "Can't open $Temp: $!";

flock(LOG, LOCK_EX);
flock(TEMP, LOCK_EX);


####################################################################
## Benchmark test

my $BM_StartTime = new Benchmark;

END
{
  if(defined $BM_StartTime)
  {
    my $TimeString = timestr(timediff(new Benchmark, $BM_StartTime));
    print qq[Debugging: Execution time is $TimeString\n];
  }
}


####################################################################
## Finding the data.

{
  $SIG{'__WARN__'} = $SIG{'__DIE__'};
  my $Keep   = ($Action == KEEP);
  my $Remove = ($Action == REMOVE);  # ! $Keep

  my $NumKept    = 0;
  my $NumRemoved = 0;

  while(my $Line = <LOG>)
  {
    if($Line =~ m[$Match])
    {
      if($Keep)
      {
        # FOUND=KEEP
        $NumKept++;
        print TEMP $Line;
      }
      else
      {
        # FOUND=REMOVE
        $NumRemoved++;
        print "  $Line" if $Display;
      }
    }
    else
    {
      if($Keep)
      {
        # NOT_FOUND=NOT_KEEP
        $NumRemoved++;
        print "  $Line" if $Display;
      }
      else
      {
        # NOT_FOUND=NOT_REMOVE
        $NumKept++;
        print TEMP $Line;
      }
    }
  }
  print qq[\nSearch complete; $NumKept line(s) kept, $NumRemoved line(s) removed\n];
}



####################################################################
## File close and update

flock(TEMP, LOCK_UN);
flock(LOG, LOCK_UN);

close(TEMP);
close(LOG);


if($Overwrite)
{
  unlink $File or die "Can't unlink: $!";
  rename $Temp => $File or die "Can't rename: $!";
}


####################################################################

print qq[\n\nDone.\n\n];
print qq[File saved as $Temp\n\n] unless $Overwrite;


Written by Diederik van der Boor at 30 January 2002