By Matt Butcher
Compressing PHP source code
I've long been toying with the idea of creating a PHP compressor that would combine multiple source files, remove comments, and minimize whitespace. The point of such a compressor isn't obfuscation, just reducing the size of library code. (Hypothetically, it should also cut down on runtime, but I suspect the performance impact is minimal.)
A few readers suggested a better way of doing this. See the update!
Last night I spent a few hours building a proof of concept. It's not yet a generalized tool, and it has some known issues (such as an inability to deal with inline C-style comments), but it does an admirable job of doing the following:
- Removing multi-line C-style /* */ comments.
- Removing // comments.
- Compressing whitespace and newlines.
- Combining all required/included files into one file.
The example code I have developed is successfully compacting QueryPath from multiple files to a single file. Of course, no special decompressing code needs to be written -- the output is still 100% valid PHP code.
Here's the rough code:
<?php
/**
 * Compact PHP code.
 *
 * Strip comments, combine entire library into one file.
 */

$source = '../src/QueryPath/QueryPath.php';
$target = 'QueryPath.compact.php';

// Load the library so that get_included_files() lists every file it pulls in.
include $source;

$files = get_included_files();

$out = fopen($target, 'w');
fwrite($out, '<?php' . PHP_EOL);
fwrite($out, '// QueryPath. Copyright (c) 2009, Matt Butcher.' . PHP_EOL);
fwrite($out, '// This software is released under the LGPL, v. 2.1 or an MIT-style license.' . PHP_EOL);
fwrite($out, '// http://opensource.org/licenses/lgpl-2.1.php' . PHP_EOL);
fwrite($out, '// http://querypath.org.' . PHP_EOL);

foreach ($files as $f) {
  // Skip this script itself; it is also in the included-files list.
  if ($f !== __FILE__) {
    $in = fopen($f, 'r');
    while (!feof($in)) {
      $line = fgets($in);
      $line = clean_line($line);
      fwrite($out, $line);
    }
    fclose($in);
  }
}
fclose($out);

/**
 * Main cleaning function.
 */
function clean_line($line) {
  // Tracks whether we are inside a multi-line comment across calls.
  static $inComment = FALSE;

  if ($inComment) {
    $inComment = strpos($line, '*/') === FALSE;
    return '';
  }
  // The last condition rules out XPath with //* in them.
  // Known issue: a /* */ comment that opens and closes on one line
  // leaves $inComment set until a later line contains the closing marker.
  elseif (!$inComment && strpos($line, '/*') !== FALSE && strpos($line, '//*') === FALSE) {
    $inComment = TRUE;
    return '';
  }
  // Drop require lines, since everything ends up in one file.
  elseif (strpos($line, 'require') === 0) {
    return '';
  }
  // Cut the line off at a // comment.
  elseif (($p = strpos($line, '// ')) !== FALSE) {
    $line = substr($line, 0, $p);
    $line = trim($line);
    return empty($line) ? '' : $line . PHP_EOL;
  }
  // Drop opening tags; one was already written at the top of the output.
  elseif (strpos($line, '<?php') !== FALSE) {
    return '';
  }
  // Remove the trailing newline after characters that safely end a line.
  elseif (strpos($line, '//') === FALSE && in_array(substr($line, strlen($line) - 2, 1), array('}', ':', ',', ';'))) {
    $line = rtrim($line);
  }
  return ltrim($line);
}
?>
While this code is not ready for general use, you may be able to take it and build your own compressor.
Update: Added code download since the input filters are screwing up the code above.








Code errors
Having a little Markdown vs. GeSHi formatting showdown in the code sample above. I'll attach a cleaned-up version of the source code shortly.
PHP command line
Hi
What about '#' comments ?
What about if there is no closing '?>' ?
The PHP binary will do this too.
php -w filename.php
(-w Display source with stripped comments and whitespace.)
But as you said "I suspect the performance impact is minimal".
Cheers
Didn't know about -w
I didn't know about the -w flag.
Good points on '#' and '?>'. Since I don't use either in QueryPath, I didn't think of those.
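For anyone who wants the same effect from inside a script, PHP's php_strip_whitespace() function does something similar to `php -w`. Here is a minimal sketch, assuming a throwaway sample file written to the system temp directory. The sample source and the variable names are made up for illustration and are not part of QueryPath. The sample deliberately includes a '#' comment and has no closing '?>'.

```php
<?php
// Hypothetical sample source: '#', '//' and '/* */' comments, no closing tag.
$sample = <<<'SRC'
<?php
# A hash-style comment.
// A line comment.
/* A block comment. */
function add($a, $b) {
    return $a + $b; // trailing comment
}
SRC;

// php_strip_whitespace() reads from a file, so write the sample to disk first.
$path = tempnam(sys_get_temp_dir(), 'strip');
file_put_contents($path, $sample);

// Returns the source with comments and extra whitespace removed.
// The file is only read, not executed.
$stripped = php_strip_whitespace($path);
unlink($path);

echo $stripped, PHP_EOL;
```

This only covers the stripping step. Combining several files into one would still need something like the get_included_files() loop from the post.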
Tokenizer
Parsing this is a silly approach. Use the PHP tokenizer @ http://php.net/tokenizer
Thanks for the pointer -
Thanks for the pointer - there's even an example on how to strip all comments :) - http://il2.php.net/manual/en/tokenizer.examples.php
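To give a rough idea of what the tokenizer approach looks like, here is a minimal sketch that drops comment tokens and keeps everything else. The function name strip_comments() and the sample source are assumptions made up for illustration; this is not the code from the manual or from any released tool. Each comment is replaced with a single space so that the tokens on either side cannot run together.

```php
<?php
/**
 * Remove comments from PHP source using the tokenizer.
 * strip_comments() is a hypothetical helper name.
 */
function strip_comments(string $code): string {
    $out = '';
    foreach (token_get_all($code) as $token) {
        if (is_string($token)) {
            // Single-character tokens such as ';' or '{' arrive as plain strings.
            $out .= $token;
            continue;
        }
        // Other tokens are arrays: [token id, source text, line number].
        [$id, $text] = $token;
        if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            // Covers '//', '#' and '/* */' comments alike.
            $out .= ' ';
            continue;
        }
        $out .= $text;
    }
    return $out;
}

// Usage: the '//*' inside the string literal is not a comment, so it survives.
$source = "<?php\n# hash comment\n\$xpath = '//*'; // trailing comment\necho \$xpath;\n";
echo strip_comments($source);
```

Because the tokenizer knows the difference between a string literal and a comment, the XPath special case from the line-based version is no longer needed. The sketch leaves whitespace untouched. Collapsing it safely takes extra care around heredocs.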
Thanks!
Thanks to Amitaibu and an anonymous poster for teaching me something new. I've now re-written the compressor using the tokenizer.
Code is available at http://github.com/technosophos/PHPCompactor/tree/master