How do you read an entire file into a string?
Method #1: Shell out to an external command.
Sample code:
# Read entire file into string:
$output = `cat sample_file.txt`;
Comment:
This is the simplest method, but it is NOT recommended. It has two main problems:
- It is platform-dependent: cat is a Unix command, so a stock Windows machine won’t be able to run this.
- It launches a separate process (cat, possibly through a new shell) just to get the output, which is bad programming practice.
Method #2: Open the file through Perl.
Sample code:
# Read entire file into string:
open(FILE, '<', "sample_file.txt") or die "Error: cannot open sample_file.txt: $!";
$output = do { local $/; <FILE> };
close(FILE);
Comment:
* Recommended.
This is the recommended method. It works because of $/, the special variable Perl uses as its input record separator.
Normally, reading <FILE> returns one line from the file. This is because <FILE> reads until it hits the separator defined by $/, which is "\n" by default. Declaring a local $/ inside the do block leaves $/ undefined for the duration of that block, so the read continues until it hits the end of the file. When the block exits, $/ gets its previous value back.
Examples:
# $/ = "\n", reads until end of line.
$output = <FILE>;
# $/ = "c", reads until it hits 'c'.
$output = do {local $/="c"; <FILE> };
# $/ is undefined, reads until eof.
$output = do {local $/ = undef; <FILE> };
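To try the three separator settings without preparing a file by hand, here is a minimal self-contained sketch. It assumes a throwaway file created with the core File::Temp module, and first_record() is a hypothetical helper name, not something Perl provides. It also uses a lexical filehandle ($fh) instead of the bareword FILE, which behaves the same way for this purpose.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway file so the example is self-contained.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "abc\ndef\n";
close($tmp) or die "Cannot close $path: $!";

# Hypothetical helper: read the first record using the given separator.
sub first_record {
    my ($file, $separator) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    local $/ = $separator;    # restored automatically when the sub returns
    my $record = <$fh>;
    close($fh);
    return $record;
}

my $line  = first_record($path, "\n");     # reads up to the first newline
my $to_c  = first_record($path, "c");      # reads up to the first 'c'
my $whole = first_record($path, undef);    # reads the entire file
```

With this sample content, $line should hold the first line, $to_c should stop right after the first 'c', and $whole should hold everything in the file.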
Method #3: Use File::Slurp.
Sample code:
# Read entire file into string:
use File::Slurp qw( slurp );
$output = slurp("sample_file.txt");
Comment:
* Highest performance.
This relies on the File::Slurp Perl module to read an entire file efficiently. Though one could use the read_file() function that File::Slurp defines, it’s better to use slurp() because it will be supported in Perl 6 as part of the standard package.
Since File::Slurp isn’t part of the standard Perl distribution, it has to be installed on every machine that uses this code. Because of this issue, method 2 is preferred. However, if performance is crucial in the program, then use this method.
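One way to soften the installation problem is to use File::Slurp only when it happens to be present and fall back to method 2 otherwise. The following is a rough sketch of that idea, not an established recipe; read_whole_file() is a hypothetical name chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: use File::Slurp when it is installed,
# otherwise fall back to the plain-Perl approach from method 2.
sub read_whole_file {
    my ($path) = @_;

    # The eval block yields false (instead of dying) if the module is missing.
    if (eval { require File::Slurp; 1 }) {
        my $content = File::Slurp::read_file($path);   # scalar context: whole file
        return $content;
    }

    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $content = do { local $/; <$fh> };
    close($fh);
    return $content;
}
```

Calling read_whole_file("sample_file.txt") should then return the file contents on any machine, with or without the module installed.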
Installing this module is very easy. Download it through the link, extract it, and run these commands (as root):
$ perl Makefile.PL
$ make
$ make install
Nowadays Linux distributions have easy package installers. Glancing at Ubuntu, I found this command to install File::Slurp:
apt-get install libfile-slurp-perl
Final comment:
The first two methods are NOT a good way to read really large files. Some claim that the 3rd method can handle large files efficiently, though in my own experiments I haven’t been able to reproduce that result. In theory it makes sense: Perl’s regular buffered I/O is not as efficient, and File::Slurp tries to bypass it by using sysread().
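To make the sysread() idea concrete, here is a minimal sketch of reading a whole file in large unbuffered chunks. This only illustrates the general technique; it is not File::Slurp’s actual implementation. The helper name sysread_whole_file() and the 1 MB chunk size are assumptions picked for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file with unbuffered sysread() calls.
sub sysread_whole_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);                    # read raw bytes

    my $content    = '';
    my $chunk_size = 1024 * 1024;    # 1 MB per call; an arbitrary choice
    while (1) {
        # The 4th argument is the offset: append at the current end of $content.
        my $got = sysread($fh, $content, $chunk_size, length $content);
        die "Read error on $path: $!" unless defined $got;
        last if $got == 0;           # 0 bytes read means end of file
    }
    close($fh);
    return $content;
}
```

Note that the whole file still ends up in memory, so this changes how the data is read, not how much RAM the result takes.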
Running a quick test, I measured how long each of the three methods took to read a 100 MB text file:
Method #1: 1.450 seconds
Method #2: 0.754 seconds
Method #3: 0.744 seconds
The test ran too quickly for me to check memory usage, but reading a 500 MB file showed the program using up to 70% of the memory on my laptop, which has 1.25 GB of RAM.
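For anyone who wants to repeat this kind of timing, here is a rough sketch using the core Time::HiRes module. It is not the script behind the numbers above. It assumes a generated test file of roughly 1 MB rather than 100 MB, skips method 1, and only times method 3 if File::Slurp is installed. time_it() is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Build a test file of roughly 1 MB (far smaller than the 100 MB file above).
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "some sample text\n" x 65_536;
close($tmp) or die "Cannot close $path: $!";

# Hypothetical helper: run a code reference once and return elapsed seconds.
sub time_it {
    my ($code) = @_;
    my $start = time();
    $code->();
    return time() - $start;
}

my %methods = (
    'Method #2' => sub {
        open(my $fh, '<', $path) or die "Cannot open $path: $!";
        my $output = do { local $/; <$fh> };
        close($fh);
        return length $output;
    },
);

# Only time method 3 if File::Slurp is actually installed.
if (eval { require File::Slurp; 1 }) {
    $methods{'Method #3'} = sub {
        my $output = File::Slurp::read_file($path);
        return length $output;
    };
}

my %elapsed;
for my $name (sort keys %methods) {
    $elapsed{$name} = time_it($methods{$name});
    printf "%s: %.3f seconds\n", $name, $elapsed{$name};
}
```

A single run like this is noisy, since the operating system caches the file after the first read, so it is worth repeating each measurement a few times before comparing methods.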
For more details on this topic, this is a decent article about trying to increase performance for large files. Though it is out of date on some points (the File::Slurp implementation has improved since the article was written), it has some good data.
Addendum added Aug 8th, 2009 (3rd method and expanding on final comment).
I prefer using File::Slurp.
I thought of adding that option, but you would need to install the module since it isn’t part of the regular perl package.
For a single user, it’s easy to install the needed module, but when maintaining multiple computers, it gets annoying having to install a customized Perl environment on each machine.
Granted, it isn’t too bad nowadays with easy package installers.
I’ve added an addendum to this post with more info.