PHP Stream Filters: Compress, transform, and transcode on the fly.
Your task: In PHP code, open a file compressed with BZ2, convert its contents from one character set to another, convert the entire contents to uppercase, run ROT-13 over it, and then write the output to another file. And do it as efficiently as possible.
Oh, and do it without any loops. Just for fun.
Actually, this task is exceptionally easy to do. Just make use of an often-overlooked feature of the PHP stream API: stream filters. Here's how.
Stream Filters In Theory
PHP uses the concept of "streams" as an abstraction layer for I/O. Reading from and writing to files can be done with streams. Sockets, too, can be read from and written to as streams. PHP also ships with stream wrappers for HTTP and FTP, which is why you can write this:
<?php $contents = file_get_contents('http://example.com'); ?>
The code above fetches the entire contents of the web page at the given URL, reading it as if it were a file on the local filesystem.
You can also open compressed files and archives, such as Gzip, BZip2, and Phar files, and have their contents decompressed on the fly.
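For instance, the compress.zlib:// wrapper decompresses Gzip data transparently. The sketch below assumes the zlib extension is loaded; it writes a throwaway Gzip file with gzencode() (the temporary path from tempnam() exists only for the demo) and reads it back through the wrapper.

```php
<?php
// Assumes the zlib extension is available (bundled with most PHP builds).
// Create a temporary Gzip file just for this demo.
$path = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($path, gzencode('Hello World.'));

// The compress.zlib:// wrapper decompresses the file as it is read.
$contents = file_get_contents('compress.zlib://' . $path);
echo $contents, PHP_EOL; // Hello World.

unlink($path);
```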
Stream filters provide one more layer on top of this: They allow you to open a stream, and then have one or more tasks (filters) run on the stream as the data is read from or written to the stream.
For example, you can open a stream to a gzipped file at a remote URL and have it decompressed as it is read.
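Since a reachable remote URL can't be assumed here, this minimal sketch uses a local in-memory php://temp stream instead. It stores raw DEFLATE data produced by gzdeflate(), then attaches the zlib.inflate filter in read mode, so decompression happens only as the data is read. Note that zlib.inflate expects raw DEFLATE data by default rather than a full Gzip file with its header; for an actual .gz file, the compress.zlib:// wrapper is the simpler route.

```php
<?php
// Assumes the zlib extension is available.
// Put some raw DEFLATE data (what gzdeflate() produces) into a temporary stream.
$fp = fopen('php://temp', 'w+b');
fwrite($fp, gzdeflate('Hello World.'));
rewind($fp);

// Decompress on the fly: the filter runs as the data is read.
stream_filter_append($fp, 'zlib.inflate', STREAM_FILTER_READ);
$plain = stream_get_contents($fp);
fclose($fp);

echo $plain, PHP_EOL; // Hello World.
```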
Stream Filters In Practice
By way of reminder, here's our task:
We want to read a file compressed with BZip2 and transform it into a file whose data has been capitalized and run through the ROT-13 obfuscator (which "rotates" each letter 13 places in the alphabet).
As we go, we will also re-encode the file from ISO-8859-1 to UTF-8.
Broken down into a sequence, we will do the following:
- Open a stream for reading
- Open a (plain) stream for writing
- Uncompress the input stream
- Transcode from ISO-8859-1 to UTF-8
- Convert the contents of the stream to uppercase
- Rotate the characters by ROT-13
- Write the result to a plain text file
- Clean up
With stream filters, this is accomplished by creating a pair of streams, and then assigning filters to each stream. When we copy the data from one stream to another, the filters will be run internally. Other than assigning the filters, we do not have to intervene.
We will begin with the file test.txt.bz2, which is a bzip2-compressed text file whose contents are "Hello World." From it, we will generate a file called test-uppercase.txt.
Here's how we do it:
<?php
/**
 * Example of stream filtering.
 */

// Open two file handles.
$in = fopen('test.txt.bz2', 'rb');
$out = fopen('test-uppercase.txt', 'wb');

// Add a decode filter to the first.
stream_filter_prepend($in, 'bzip2.decompress', STREAM_FILTER_READ);

// Change the charset from ISO-8859-1 to UTF-8.
stream_filter_append($out, 'convert.iconv.ISO-8859-1/UTF-8', STREAM_FILTER_WRITE);

// Uppercase the entire string.
stream_filter_append($out, 'string.toupper', STREAM_FILTER_WRITE);

// Run ROT-13 on the output.
stream_filter_append($out, 'string.rot13', STREAM_FILTER_WRITE);

// Now copy. All of the filters are applied here.
stream_copy_to_stream($in, $out);

// Clean up.
fclose($in);
fclose($out);
?>
Now if we take a look at test-uppercase.txt, we will see that its contents look like this:
URYYB JBEYQ.
What the code does
The code above basically does the following:
- Open an input and an output file.
- On the input file, assign the bzip2.decompress filter in READ mode, which decompresses the input stream as it is read.
- On the output file, assign three filters in WRITE mode:
  - Use the convert.iconv filter to transcode from ISO-8859-1 to UTF-8.
  - Use the string.toupper filter to transform the data to uppercase where applicable.
  - Use the string.rot13 filter to obfuscate the contents.
- Then copy the input stream ($in) to the output stream ($out) in one step.
- Finally, close the files.
It is important to note that none of the filters are actually applied until the streams are processed. So it is only when stream_copy_to_stream() is executed that all four filters are applied.
This method is generally more efficient than performing the same operations in a loop, because the copying is done at a lower level (in PHP's C stream layer), where data does not have to be passed into and out of PHP userland code. So in addition to being easier to code, it is also faster and less memory intensive.
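Taking the "no loops" idea one step further, the same kind of pipeline can be expressed with the php://filter wrapper and a single copy() call. This is a sketch under two stated assumptions: zlib.inflate is swapped in for bzip2.decompress (so it runs where ext/bz2 isn't installed), and the input is a temporary file created for the demo. The iconv filter is written in its dot form, convert.iconv.ISO-8859-1.UTF-8, because a "/" would be read as a separator inside the php://filter path. With ext/bz2 available, using bzip2.decompress on a .bz2 file should give the article's exact pipeline.

```php
<?php
// Assumes the zlib and iconv extensions are available.
// zlib.inflate stands in for bzip2.decompress so the sketch runs without ext/bz2.
$src = tempnam(sys_get_temp_dir(), 'in');
$dst = tempnam(sys_get_temp_dir(), 'out');
file_put_contents($src, gzdeflate('Hello World.'));

// Read side: decompress. Write side: transcode, uppercase, then ROT-13,
// applied in the order listed (filters are separated by "|").
copy(
    'php://filter/read=zlib.inflate/resource=' . $src,
    'php://filter/write=convert.iconv.ISO-8859-1.UTF-8|string.toupper|string.rot13/resource=' . $dst
);

$result = file_get_contents($dst);
echo $result, PHP_EOL; // URYYB JBEYQ.

unlink($src);
unlink($dst);
```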
Some Important Details
Why doesn't stream filtering get used more often? One reason is that the documentation is sparse. To figure out how to use it, in fact, I had to read part of the C source code for PHP. (The unit tests helped a lot, too.)
Here are some useful tips, though:
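- To find out (roughly) what filters are supported, you can use stream_get_filters().
- The order of filters can be managed using stream_filter_prepend() and stream_filter_append().
- You can even write your own filters (by extending php_user_filter and registering the class with stream_filter_register()), should you so desire.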
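As a minimal sketch of a custom filter, the class below extends php_user_filter and swaps the case of ASCII letters. The filter name example.swapcase and the class name SwapCaseFilter are hypothetical, chosen only for illustration. Because the transformation works byte by byte, it is safe no matter how PHP splits the data into buckets.

```php
<?php
// Hypothetical filter: "example.swapcase" is not built into PHP.
class SwapCaseFilter extends php_user_filter
{
    public function filter($in, $out, &$consumed, $closing): int
    {
        // Take each incoming bucket, transform its data, and pass it on.
        while ($bucket = stream_bucket_make_writeable($in)) {
            $bucket->data = strtr(
                $bucket->data,
                'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
                'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
            );
            $consumed += $bucket->datalen;
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }
}

stream_filter_register('example.swapcase', 'SwapCaseFilter');

// Attach it as a write filter, just like the built-in ones.
$fp = fopen('php://memory', 'w+b');
stream_filter_append($fp, 'example.swapcase', STREAM_FILTER_WRITE);
fwrite($fp, 'Hello World.');
rewind($fp);
$result = stream_get_contents($fp);
fclose($fp);

echo $result, PHP_EOL; // hELLO wORLD.
```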
But one of the most frustrating aspects of the filtering library was figuring out which particular filters are supported. Running stream_get_filters() returns data like this:
Array
(
    [0] => zlib.*
    [1] => bzip2.*
    [2] => convert.iconv.*
    [3] => string.rot13
    [4] => string.toupper
    [5] => string.tolower
    [6] => string.strip_tags
    [7] => convert.*
    [8] => consumed
    [9] => dechunk
)
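Because some entries are wildcards, checking for a specific filter takes a little extra work. The helper below, is_filter_available(), is hypothetical (not a PHP built-in) and only a rough check: it matches a name against the reported patterns, so a made-up name such as zlib.bogus would also pass, and it cannot tell whether a given iconv charset pair actually works.

```php
<?php
// Hypothetical helper: matches a filter name against stream_get_filters() patterns.
function is_filter_available(string $name): bool
{
    foreach (stream_get_filters() as $pattern) {
        if ($pattern === $name) {
            return true;
        }
        // Wildcard entries such as "zlib.*" cover every name with that prefix.
        if (substr($pattern, -2) === '.*'
            && strpos($name, substr($pattern, 0, -1)) === 0) {
            return true;
        }
    }
    return false;
}

print_r(stream_get_filters());
var_dump(is_filter_available('string.rot13'));   // bool(true)
var_dump(is_filter_available('no.such.filter')); // bool(false)
```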
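But what do we do with zlib.*? Here's what I found:
zlib.*
These filters (zlib.deflate and zlib.inflate) support compressing and decompressing with DEFLATE, the algorithm used by GZip. By default they read and write raw DEFLATE data, without a GZip header.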
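The compressing direction works as a write filter, and zlib.deflate accepts an options array (here only the level key is set). This sketch writes through the filter into a temporary file created just for the demo, then verifies the round trip with gzinflate(), which decodes raw DEFLATE data.

```php
<?php
// Assumes the zlib extension is available.
$original = str_repeat('Hello World. ', 100);

// Temporary file used only for this demo.
$path = tempnam(sys_get_temp_dir(), 'zlib');
$fp = fopen($path, 'wb');

// Compress everything written to $fp at level 9 (raw DEFLATE, no GZip header).
stream_filter_append($fp, 'zlib.deflate', STREAM_FILTER_WRITE, ['level' => 9]);
fwrite($fp, $original);
fclose($fp); // Closing flushes the filter and finishes the compressed data.

$compressed = file_get_contents($path);
unlink($path);

// gzinflate() decodes raw DEFLATE data, so it can check the round trip.
$restored = gzinflate($compressed);
printf("%d bytes -> %d bytes\n", strlen($original), strlen($compressed));
```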
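bzip2.*
These filters (bzip2.compress and bzip2.decompress) support writing to and reading from a BZip2-compressed stream.
convert.*
Base-64 and Quoted Printable seem to be the two formats supported by the convert.* filters: convert.base64-encode, convert.base64-decode, convert.quoted-printable-encode, and convert.quoted-printable-decode.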
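A small sketch of both formats as read filters. The helper apply_read_filter() is hypothetical, written only to avoid repeating the open/rewind/read steps; the Quoted Printable example uses the UTF-8 bytes of "Grüße" and only checks that decoding restores the input.

```php
<?php
// Hypothetical helper (not a PHP built-in): run $data through one read filter.
function apply_read_filter(string $data, string $filter): string
{
    $fp = fopen('php://temp', 'w+b');
    fwrite($fp, $data);
    rewind($fp);
    stream_filter_append($fp, $filter, STREAM_FILTER_READ);
    $result = stream_get_contents($fp);
    fclose($fp);
    return $result;
}

$b64 = apply_read_filter('Hello World.', 'convert.base64-encode');
echo $b64, PHP_EOL;                                              // SGVsbG8gV29ybGQu
echo apply_read_filter($b64, 'convert.base64-decode'), PHP_EOL; // Hello World.

// "Grüße" in UTF-8; the 8-bit bytes get escaped as =XX sequences.
$word = "Gr\xC3\xBC\xC3\x9Fe";
$qp = apply_read_filter($word, 'convert.quoted-printable-encode');
echo $qp, PHP_EOL;
$back = apply_read_filter($qp, 'convert.quoted-printable-decode');
```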
convert.iconv.*
The filter name format for these is different from the others. It looks something like this: convert.iconv.ISO-8859-13/ISO-8859-15 would convert from ISO-8859-13 into ISO-8859-15.
Presumably, any character sets recognized by iconv are supported by the filter.
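To see the format in action, this sketch transcodes "café" from ISO-8859-1 (where é is the single byte 0xE9) to UTF-8 (where it becomes the two bytes 0xC3 0xA9), assuming the iconv extension is installed. bin2hex() is used only to make the byte change visible.

```php
<?php
// Assumes the iconv extension is available.
$latin1 = "caf\xE9"; // "café" encoded as ISO-8859-1

$fp = fopen('php://temp', 'w+b');
fwrite($fp, $latin1);
rewind($fp);

// Source charset, then "/", then target charset.
stream_filter_append($fp, 'convert.iconv.ISO-8859-1/UTF-8', STREAM_FILTER_READ);
$utf8 = stream_get_contents($fp);
fclose($fp);

echo bin2hex($utf8), PHP_EOL; // 636166c3a9
```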
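string.*
These perform simple string manipulations:
- string.rot13 (rotates each letter 13 places in the alphabet)
- string.toupper (converts to uppercase)
- string.tolower (converts to lowercase)
- string.strip_tags (removes HTML-like tags)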
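Filters on the same stream run in the order they were appended. This short sketch chains string.tolower with string.rot13 twice on a read stream; since ROT-13 is its own inverse, the two rotations cancel out and only the lowercasing remains visible.

```php
<?php
$fp = fopen('php://temp', 'w+b');
fwrite($fp, 'Hello World.');
rewind($fp);

// Applied in order: lowercase, then ROT-13, then ROT-13 again.
stream_filter_append($fp, 'string.tolower', STREAM_FILTER_READ);
stream_filter_append($fp, 'string.rot13', STREAM_FILTER_READ);
stream_filter_append($fp, 'string.rot13', STREAM_FILTER_READ);

$text = stream_get_contents($fp);
fclose($fp);

echo $text, PHP_EOL; // hello world.
```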
dechunk
This decodes data sent using the HTTP chunked transfer encoding.
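As a sketch, the body below is hand-written in chunked form (hex chunk size, CRLF, the chunk data, CRLF, with a zero-size chunk marking the end) and decoded with the dechunk filter on a read stream.

```php
<?php
// Two chunks ("Hello" = 5 bytes, " World." = 7 bytes), then the terminating 0 chunk.
$chunked = "5\r\nHello\r\n7\r\n World.\r\n0\r\n\r\n";

$fp = fopen('php://temp', 'w+b');
fwrite($fp, $chunked);
rewind($fp);

stream_filter_append($fp, 'dechunk', STREAM_FILTER_READ);
$body = stream_get_contents($fp);
fclose($fp);

echo $body, PHP_EOL; // Hello World.
```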
consumed
I am not sure what this filter is for. The C code looks like it counts the number of bytes consumed during a particular filter run, but testing it returns nothing.
If you know what this one is for, let me know in the comments.