Perl Programming: Working with Files


Working with files in Perl
This tutorial shows you how to work with files in Perl. File handling is one of the fundamentals of the language, since Perl was originally designed to produce readable reports from large amounts of data.

Opening a file
Working with files is very easy in Perl. To open a file, you assign a handle to a filename. If that operation succeeds, your Perl program can read or modify the file through that file-handle. This is the general syntax used to open a file named file.txt. We don't specify the path where the file is located, so the file is assumed to reside in the current directory. That is the directory the program was started from, which is often, but not always, the directory where the Perl program itself is located.

open(FH, "file.txt") or die("Can't open 'file.txt': $!");

What happens here?
The open function attempts to open a file named file.txt. If that works, the handle named FH refers to the open file and stays connected to it until the handle is closed. The open function then returns a true value, so there is no reason to execute anything on the right side of the or operator: an or expression is true as soon as one of its sides is true, which is what just happened.

However, if the open function fails, it returns a false value and Perl executes the part on the other side of the or operator. The die function terminates the program with the specified error message. When a system call such as open fails, the $! variable contains the error message reported by the operating system.
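As a minimal sketch of the failure case, the helper below returns the text that die would have printed instead of terminating the program. The name describe_open and the file name are hypothetical, chosen only for this illustration, and the exact wording of $! depends on the operating system and locale.

```perl
use strict;
use warnings;

# describe_open is a hypothetical helper, not part of Perl itself.
sub describe_open {
    my ($Name) = @_;
    # "<" asks for read access; on failure, $! holds the system's reason.
    open(FH, "<", $Name) or return "Can't open '$Name': $!";
    close(FH);
    return "Opened '$Name'";
}

# The file name is assumed not to exist in the current directory.
print describe_open("no-such-file-here.txt"), "\n";
```

Returning the message keeps the program running, which is useful when a missing file is an expected situation rather than a fatal one.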

It's not a good idea to omit the error checking. Without it, Perl just continues executing your script after a failed open, which is probably not what you want. If you don't want to terminate your program when open fails, it's better to test the success of open in an if statement, like this:

if(open(FH, "file.txt"))
{
  # more code in the if BLOCK
}

You may be tempted to replace the or operator with || (the C-style or operator). However, that causes problems when you omit the parentheses, which Perl allows around the arguments of open. The || operator has a higher precedence than the comma that separates those arguments, so it is evaluated before open is called: in open FH, "file.txt" || die "...", the || applies to "file.txt" alone. Because that string is always true, die never runs, and the return value of open is never tested. Either use or, or keep the parentheses: open(FH, "file.txt") || die "..." works as intended.
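The difference can be made visible with a small experiment. This is only a sketch: the file name is hypothetical and assumed not to exist, and eval is used to catch the die so that both variants can run in one program.

```perl
use strict;
use warnings;

# Hypothetical file name, assumed not to exist in the current directory.
my $Missing = "no-such-file-here.txt";

# ||  binds to the last argument: open(FH, "<", ($Missing || die ...)).
# $Missing is a true value, so die never runs and the failure goes unnoticed.
my $PipesNoticed = eval { open FH, "<", $Missing || die "Can't open: $!\n"; 1 } ? 0 : 1;

# or  binds to the whole statement: open(FH, "<", $Missing) or die ...
my $OrNoticed = eval { open FH, "<", $Missing or die "Can't open: $!\n"; 1 } ? 0 : 1;

print "|| noticed the failure: ", ($PipesNoticed ? "yes" : "no"), "\n";
print "or noticed the failure: ", ($OrNoticed    ? "yes" : "no"), "\n";
```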

Writing a file to the console
Now let's write the contents of a file to the console. The listing below is a complete Perl program.

#!/usr/bin/perl -w
# UNIX: the first line holds the path to the Perl interpreter.

print "Enter filename: ";     # Ask the user what file should be displayed
my $FileName = <STDIN>;       # Read a line from the standard input (normally keyboard)
chomp $FileName;              # Remove the trailing \n

# "<" opens the file for reading only, whatever characters the name contains
open FH, "<", $FileName or die "Can't open $FileName: $!\n";
  print <FH>;                 # Read all lines from the file, and print the list of lines
close FH;                     # This is very important. Close the file-handle connection

exit;

Other methods for Reading
The diamond <> operator can be used to read from a handle. This could be the standard input, but also a file or a network socket connection. I've left out all the parentheses. You can add them around all the parameter lists, of course.

The print function does something special here: it accepts a list of values. In list context, the diamond operator on the handle, <FH>, produces a list containing all the lines of the file. The print function then prints all those lines.

Although this is very fast, it could use a lot of memory if the file is very large. Using a different approach, we can read one line at a time and process it. The diamond operator reads only one line when its result is assigned to a scalar variable.

open(FH, "<", $FileName) or die "Can't open $FileName: $!\n";
  while( my $Line = <FH> )    # Read a line while there still are lines to read
  {                           # $Line is declared in this BLOCK, using my
    print $Line;              # The result of the <FH> is assigned to $Line
  }
close FH;
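The two ways of reading can be put side by side. The sketch below first creates a small scratch file (context-demo.txt is a hypothetical name) so that it is self-contained, then reads one line in scalar context and the rest in list context.

```perl
use strict;
use warnings;

my $FileName = "context-demo.txt";    # hypothetical scratch file

# Create a three-line file to read back.
open(FH, ">", $FileName) or die "Can't write $FileName: $!\n";
print FH "one\n", "two\n", "three\n";
close FH;

open(FH, "<", $FileName) or die "Can't open $FileName: $!\n";
my $First = <FH>;                     # scalar context: exactly one line
my @Rest  = <FH>;                     # list context: every remaining line
close FH;
unlink $FileName;                     # remove the scratch file again

print "first line: $First";
print "remaining lines: ", scalar(@Rest), "\n";
```

Each read continues where the previous one stopped, so @Rest holds only the lines after the first.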

For anyone preferring a very short style, this does the same trick, using the special $_ variable as a placeholder for the data. The while modifier stores each line it reads in $_, and the print function uses $_ if you don't provide an argument list.

open FH, "<", $FileName or die "Can't open $FileName: $!\n";
  print while <FH>;
close FH;

Writing to files
If this is all clear to you, we can start writing data to files. There is just one other thing that should be mentioned first: locking! Normally, forgetting to lock a file isn't that bad. However, what happens if two programs try to access the file at the same time? This is a serious problem for web sites with lots of visitors, where several copies of the same program run in parallel, each handling a different request. Something needs to tell one of the programs that the file is in use.
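What "in use" looks like can be sketched within a single program. Two handles on the same scratch file (lock-demo.txt, a hypothetical name) stand in for two programs. The sketch assumes a system where Perl's flock is built on the native flock call, such as Linux; where locks are emulated differently, the second handle may be granted the lock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

my $File = "lock-demo.txt";                 # hypothetical scratch file

open(FIRST, ">>", $File) or die "Can't open $File: $!\n";
my $FirstLock = flock(FIRST, LOCK_EX);      # waits until the lock is free

# A second handle stands in for a second program. LOCK_NB makes flock
# return false at once instead of waiting for the lock.
open(SECOND, ">>", $File) or die "Can't open $File: $!\n";
my $SecondLock = flock(SECOND, LOCK_EX | LOCK_NB);
print "second handle: ", ($SecondLock ? "got the lock" : "file is in use"), "\n";

close FIRST;                                # closing releases the lock
my $RetryLock = flock(SECOND, LOCK_EX | LOCK_NB);
print "after close:   ", ($RetryLock ? "got the lock" : "file is in use"), "\n";

close SECOND;
unlink $File;                               # remove the scratch file again
```

Without LOCK_NB, the second flock call would simply wait until the first handle lets go of the lock, which is usually what a short-running script wants.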

Below, you can see 4 samples of working with files. I don't think they need much more explanation.

Reading a file:

use Fcntl qw(:flock);

# "<" opens the file only if it exists
open FH, "<", $File or die $!;
  flock(FH, LOCK_SH);   # Shared lock: other readers are still allowed
  @Lines = <FH>;

  # Optional: closing the handle releases the lock as well
  flock(FH, LOCK_UN);
close FH;

Write new contents into a file:

use Fcntl qw(:flock);

# ">" erases (truncates) or creates the file,
# and does so before the lock below is obtained
open FH, ">", $File or die $!;
  flock(FH, LOCK_EX);   # LOCK_EX required!!
  print FH @NewContentLines;
close FH;

Append new contents to a file:

use Fcntl qw(:flock);

# ">>" opens (for append) or creates the file
open FH, ">>", $File or die $!;
  flock(FH, LOCK_EX);
  print FH @LinesToAppend;
close FH;

Read and write without re-opening:

use Fcntl qw(:DEFAULT :flock);

# This code increments a number
# on the first line of a file.
# The :DEFAULT tag in the use line adds the O_*
# constants to the Perl program

# Create file and open for read/write
sysopen(FH, "numfile.txt", O_RDWR|O_CREAT) or die $!;
  flock(FH, LOCK_EX);

  # Read all the lines
  my @Lines = <FH>;

  # Update the first line
  $Lines[0]++; # the numeric increment drops the trailing \n
  $Lines[0] = "$Lines[0]\n";

  # Erase the contents of the file
  seek(FH, 0, 0) or die $!;
  truncate(FH, 0) or die $!;

  # Write the new 'data stream' back
  print FH @Lines;
  flock(FH, LOCK_UN);
close FH;
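Opening a file with ">" erases it before flock has a chance to run, so another program could see an empty file for a moment. One way around that, sketched here under the assumption that the sysopen approach from the last sample is acceptable, is to open without truncating and erase the old contents only after the lock is held. The names replace_contents and replace-demo.txt are hypothetical.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);

# replace_contents is a hypothetical helper name.
sub replace_contents {
    my ($File, @NewContentLines) = @_;

    # O_WRONLY|O_CREAT opens (or creates) the file without erasing it.
    sysopen(FH, $File, O_WRONLY | O_CREAT) or die "Can't open $File: $!\n";
    flock(FH, LOCK_EX)                     or die "Can't lock $File: $!\n";

    # Only now, while holding the lock, erase the old contents.
    truncate(FH, 0)                        or die "Can't truncate $File: $!\n";
    print FH @NewContentLines;
    close(FH)                              or die "Can't close $File: $!\n";
}

replace_contents("replace-demo.txt", "first\n", "second\n");
```

Checking the return value of close is worthwhile when writing, because buffered data is only written out at that point and a full disk would otherwise go unnoticed.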

Advanced: Binary Files
If you're unfortunate enough to be running Perl on a system that distinguishes between text files and binary files (UNIX-like systems don't), then you should use a binmode FH; line to deal with this. The key distinction between systems that need binmode() and those that don't is their text file formats. Systems like UNIX, MacOS, and Plan9, which delimit lines with a single character, and which encode that character in C as "\n", do not need binmode(). The rest, like Windows, where a line ends in the two characters "\r\n", need it. If you work with binary files, always use binmode(); it does no harm on systems that don't need it, and Perl does the rest for you. In binary files, you pack your data (using pack) to store it as bytes, and not as text.
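A minimal sketch of a binary round trip follows. The record layout (one 16-bit and one 32-bit unsigned integer, both little-endian) and the file name record-demo.bin are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $File = "record-demo.bin";               # hypothetical scratch file

# "v" packs a 16-bit and "V" a 32-bit unsigned integer, both little-endian.
my $Record = pack("vV", 513, 70000);

open(FH, ">", $File) or die "Can't write $File: $!\n";
binmode FH;                                 # write the bytes untranslated
print FH $Record;
close FH;

open(FH, "<", $File) or die "Can't open $File: $!\n";
binmode FH;                                 # read the bytes untranslated
read(FH, my $Bytes, 6) == 6 or die "Short read from $File\n";
close FH;
unlink $File;                               # remove the scratch file again

my ($Small, $Large) = unpack("vV", $Bytes);
print "stored ", length($Bytes), " bytes: $Small and $Large\n";
```

The two numbers take six bytes on disk, where the text form "513 70000" would take nine characters and would need parsing when read back.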

Some misc things about files

Written by Diederik van der Boor on 11 November 2001