Compressing PHP source code

Jun 10 2009

I've long been toying with the idea of creating a PHP compressor that would combine multiple source files, remove comments, and minimize whitespace. The point of such a compressor isn't obfuscation, but simply reducing the size of library code. (Hypothetically, it should also cut down on runtime, but I suspect the performance impact is minimal.)

A few readers suggested a better way of doing this. See the update!

Last night I spent a few hours building a proof of concept. It's not yet a generalized tool, and it has some known issues (such as an inability to deal with inline C-style comments), but it does an admirable job of doing the following:

  • Removing multi-line C-style /* */ comments.
  • Removing // comments.
  • Compressing whitespace and newlines.
  • Combining all required/included files into one file.

The proof of concept successfully compacts QueryPath from multiple files into a single file. Of course, no special decompression code needs to be written: the output is still 100% valid PHP code.

Here's the rough code:

<?php
/**
 * Compact PHP code.
 *
 * Strip comments, combine entire library into one file.
 */


$source = '../src/QueryPath/QueryPath.php';
$target = 'QueryPath.compact.php';

include $source;

$files = get_included_files();

$out = fopen($target, 'w');
fwrite($out, '<?php' . PHP_EOL);
fwrite($out, '// QueryPath. Copyright (c) 2009, Matt Butcher.' . PHP_EOL);
fwrite($out, '// This software is released under the LGPL, v. 2.1 or an MIT-style license.' . PHP_EOL);
fwrite($out, '// http://opensource.org/licenses/lgpl-2.1.php' . PHP_EOL);
fwrite($out, '// http://querypath.org.' . PHP_EOL);
foreach ($files as $f) {
  if ($f !== __FILE__) {
    $in = fopen($f, 'r');
    while (!feof($in)) {
      $line = fgets($in);
      $line = clean_line($line);
      fwrite($out, $line);
    }
    fclose($in);
  }
}

fclose($out);

/**
 * Main cleaning function.
 */
function clean_line($line) {
  static $inComment = FALSE;
  if ($inComment) {
    $inComment = strpos($line, '*/') === FALSE;
    return '';
  }
  // The last condition rules out XPath with //* in them.
  elseif (!$inComment && strpos($line, '/*') !== FALSE && strpos($line, '//*') === FALSE) {
    $inComment = TRUE;
    return '';
  }
  elseif (strpos($line, 'require') === 0) {
    return '';
  }
  elseif (($p = strpos($line, '// ')) !== FALSE) {
    $line = substr($line, 0, $p);
    $line = trim($line);
    return empty($line) ? '' : $line . PHP_EOL;
  }
  elseif (strpos($line, '<?php') !== FALSE) {
    return '';
  }
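  // If the character just before the newline is }, :, a comma, or ;
  // then drop the trailing whitespace, newline included.
  elseif (strpos($line, '//') === FALSE && in_array(substr($line, strlen($line) - 2, 1), array('}', ':', ',', ';'))) {
    $line = rtrim($line);
  }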
  return ltrim($line);
}
?>

While this code is not ready for generalized use, you may be able to use it as a starting point for your own compressor.
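The inline-comment limitation mentioned earlier comes from scanning the source one line at a time. One possible direction for a more general tool is PHP's tokenizer, which reports comments as separate tokens. The following is only a rough sketch and not part of the script above: strip_comments_sketch() is a hypothetical name, and it assumes the tokenizer extension (token_get_all()) is available.

```php
<?php
/**
 * Sketch only: remove comments from a string of PHP source.
 *
 * strip_comments_sketch() is a hypothetical helper, not part of the
 * script above. It assumes the tokenizer extension is loaded.
 */
function strip_comments_sketch($code) {
  $out = '';
  foreach (token_get_all($code) as $token) {
    if (!is_array($token)) {
      // Single-character tokens such as ; or { arrive as plain strings.
      $out .= $token;
      continue;
    }
    list($id, $text) = $token;
    if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
      // Replace the comment with a space so neighboring tokens never fuse.
      $out .= ' ';
      continue;
    }
    // Everything else, including strings such as '//*', is copied verbatim.
    $out .= $text;
  }
  return $out;
}

// Usage: an XPath string plus inline and trailing comments on one line.
$sample = "<?php\n\$xpath = '//*'; /* inline */ \$n = 1 + /* mid */ 2; // trailing\n";
echo strip_comments_sketch($sample);
```

Because the tokenizer knows the difference between a comment and a string literal, the '//*' special case is no longer needed. This sketch leaves whitespace alone. PHP also ships php_strip_whitespace(), which returns a file's source with comments and whitespace stripped, so it may be worth comparing against that before building anything larger.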

Update: Added code download since the input filters are screwing up the code above.