Good PHP: Coding Standards and Why You Should Follow Them

Oct 29 2011

In hindsight, I'm surprised how long it took me to develop a strong appreciation of code formatting standards. It's not that I haven't followed them all along (most IDEs and editors do the lion's share of that for you). What surprises me is that I never really appreciated the value of following them. But managing a codebase of over a million lines makes it readily apparent that coding standards are a big boon -- and that lapses in those standards adversely impact the entire team.

The primary reason for coding standards is this: humans are worse at syntax parsing than machines are. Coding standards exist to make the code easier for humans to work with, and they do this by making the code more amenable to visual scanning.

There are four benefits to be gained from following coding standards: reduction of bugs, preventing new bugs, lowering the learning curve, and easing long term maintenance. I discuss them below. <!--break-->

Reduction of Bugs

Clarity is the enemy of bugs. Where the code is murky, bugs flourish.

Here's an example of a code practice that seems to creep into PHP code from time to time, further obfuscated in a very real-world way by Drupal-ish data structures:

<?php
function mymodule_get_version($data) {
  return $data['field_version_number'][0]['value'] ? $data['field_version_number'][0]['value'] : $data['field_old_version_number'][0]['value'] ?     $data['field_old_version_number'][0]['value'] : 0;
}
?>

The code above is supposed to return the version number if it exists. If no version number exists, it returns the old version number if it exists. And if neither a new nor an old version number exists, it returns 0.

There is a bug in that code. What is it? To find it, let's reformat the code to follow good code formatting practices. Specifically, we will apply the following:

  • Use variables instead of repeated array dereferencing.
  • Remove the nested ternary.
  • Eliminate (at least some of) the ambiguity of the tests being done in conditionals.

And here's what we get:

<?php
function mymodule_get_version($data) {
  $version = $data['field_version_number'][0]['value'];
  $old_version = $data['field_old_version_number'][0]['value'];

  $final_version = 0;

  if (!empty($version)) {
    $final_version = $version;
  }
  elseif (!empty($old_version)) {
    $final_version = $old_version;
  }

  return $final_version;
}
?>

But wait! The bug is gone! Why? Because what we were actually doing in the original code is this:

<?php
function mymodule_get_version($data) {
  $version = $data['field_version_number'][0]['value'];
  $old_version = $data['field_old_version_number'][0]['value'];

  $final_Version = 0;

  // THIS is the first check run...
  if (!empty($old_version)) {
    $final_version = $old_version;
  }
  // And the new version is only checked if an old version wasn't found.
  elseif (!empty($version)) {
    $final_version = $version;
  }

  return $final_version;
}
?>

But that was obscured by the poorly formatted code. (If you don't believe me, test it.)

Prevent New Bugs

I see this simple mistake all the time, and I've even accidentally committed it several times:

Original code:

<?php
if ($foo)
  _do_something();
?>

Now a new feature request comes in... and we need to add another step:

<?php
if ($foo)
  _add_new_step();
  _do_something();
?>

Oops! Now _do_something() is run regardless of whether $foo is TRUE.

Maybe it's Python syntax haunting me, or maybe I really am that obtuse... but I've been guilty of introducing bugs by simply not noticing that there are no curly braces. And since that code is perfectly valid, I'm not going to get a syntax error or anything.

Of course, the easiest way to prevent this is to follow the standard that any if statements should always use curly braces, even for simple tests:

<?php
if ($foo) {
  _do_something();
}
?>

Drupal coding standards require this, and I believe that this is the reason why.

Lower Learning Curve

Learning a new codebase can be daunting. One must learn new patterns, overall architecture, classes, methods, and functions, terminology, and so on.

Why make it harder by requiring that the new developer also spend time figuring out what a particular piece of code is supposed to be doing?

Here's an example of code whose formatting significantly decreases understandability:

<?php
$i = 0;
$total = array();
$list = array(1, 7, NULL, 4);

if ($list) while ($val = $list[$i++]) if ($val) $total += range(
  0,
  $val,
  1
);
print_r($total);
?>

Yes, this is taken from several real-world examples of code formatting that I have seen. I've seen variations of the above that span hundreds of columns and (I kid you not) 20+ lines.

There are a several things that are confusing about the above:

  • Superficially, it looks like a simple conditional. (Imagine scanning down the left-hand side of a large piece of code. Would you notice that there are three control structures here?)
  • The if/while/if all on one line requires the reader to pay close attention.
  • There are multiple operations all happening on that one line.
  • The range function is split onto four lines (presumably to keep the other line from getting too long), making a standard (short) function call jarringly long.
  • The indentation is actually misleading, as it makes the function arguments appear to be the body of the outer if's block. In fact, the code should be indented 3-4 indentations -- one for each control structure plus one for the function call args.

Because of all of this, a very simple operation is being obfuscated. The above could actually be done far more efficiently in other ways, but in order to even see that, one must cut through the formatting issues. Why make something hard that could be so much easier?

Ease Long-Term Maintenance

Finally, when you follow coding standards, you're doing yourself (and your team) a favor. You're making it easier on future-you. When you come back to the code six months later, you won't be thrown off by your own past-cleverness. Your life will be easier. And your team (should you have one) won't curse you for making their lives harder.

Few people like maintaining code. So why make that part of your job any harder than it is?

Misconceived Counterarguments

These are arguments against coding standards that I have heard:

"Well formatted" code introduces bloat

Presumably, "bloat" here is supposed to mean non-essential curly braces, whitespace, or line count.

But that's not what we usually mean by bloat. This code isn't adversely impacting either the readability or the execution time of the code. If files are longer because of coding standards, they are longer in a good way. They are longer so that the code is more manageable and less prone to bugs. That is a standard goal in software development, and most certainly is not "needless bloat".

All that extra stuff slows down the parser

Do a few lines of poorly formatted code make the code parse faster?

I have never seen benchmarks that support this theory. When I tested some using compressed and uncompressed versions of QueryPath (and we're talking about removing ALL nonessentials from tens of thousands of lines of PHP code) I found no substantial differences in loading/execution time.

Furthermore, since most sites use opcode caches, even if there was a runtime slowdown the first time a file was parsed, it would vanish after that initial load. Subsequent loads would use the cached opcodes.

Writing complex lines of code shows that I am a better developer than others

I know some developers think this, but several years ago a developer said this to me. That was his justification for writing massive series of nested ternaries.

I came away from the conversation (and the code) thinking precisely the opposite -- that he wasn't a terribly good coder, and that he tried to hide this fact with needless complexity. He wanted others to see the code and say "hmmm... that looks complex. It must have been written by a master." Instead, the code was just frustrating.

Conclusion

Sometimes it feels like a needless pain. Sometimes it feels like wasted effort. Sometimes all those extra spaces and parens feel somehow wasted. But they're not. Code is an intermediate language -- one designed for both humans and machines. Making the code machine-friendly is only half of the equation. Paying attentio to the other half isn't a waste. It's a long-term investment in continued quality.