Documenting PHP with Doxygen: The Pros and Cons

Feb 1 2012

It's been a few years now since I gave up using PHPDocumentor to document my PHP projects. I switched to [Doxygen](http://www.stack.nl/~dimitri/doxygen), an automated documentation tool that supports a wide variety of languages, including PHP. While PHPDocumentor enjoys broad support in the PHP community, Doxygen, too, is well entrenched. (Drupal uses it.)

I recently began a new project from scratch, and it gave me an opportunity to once again turn a hard gaze upon Doxygen. After some careful reflection on my experiences developing this new medium-sized library and documenting it with Doxygen, here are what I see as Doxygen's strong and weak points when it comes to PHP API documentation.

## The Pros

### Speed!

Proofing documentation is enough of a pain as it is. But waiting minutes for phpdoc to run was just downright aggravating. I often found myself kicking off a process and then going to do something else, and in the process forgetting all about my documentation tasks. It was a colossal waste of time.

Doxygen is blazingly fast. In my current setup, it generates graphics (class diagrams) along with the main documentation, and still does this in just a few seconds. Usually, regenerating documentation is so quick that I can't hit my browser's reload button before the generation has finished.

### Autolinking

One of the most time-saving features in Doxygen is its autolinking support. As it analyzes comments, it looks for specific patterns -- camelCasing, parens(), and name::spaces to name a few -- and attempts to determine whether those strings are actually references to existing classes, functions, namespaces, variables, etc. If it determines that they are, it automatically generates a link to the appropriate generated documentation.

This is a fantastic feature, saving developers the time it would otherwise take to make this relation explicit.

While Doxygen sometimes gets false positives (and, occasionally, misses what seems to be an obvious case), this feature saves me copious amounts of time. I'm willing to accept a few misses.
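
To make the autolinking patterns concrete, here is a minimal sketch. The names TemplateCache and render_template() are hypothetical, invented only for this illustration. Assuming both are documented and included in Doxygen's input, the plain-text mentions of TemplateCache and TemplateCache::flush() in the comments should come out as links, with no explicit linking command.

```php
<?php
/**
 * A tiny in-memory cache. (Hypothetical class, for illustration only.)
 */
class TemplateCache {
  /** Cached templates, keyed by name. */
  protected $items = array();

  /**
   * Store a rendered template.
   *
   * Use TemplateCache::flush() to discard everything stored here.
   */
  public function set($name, $value) {
    $this->items[$name] = $value;
  }

  /**
   * Fetch a rendered template, or NULL if it has not been stored.
   */
  public function get($name) {
    return isset($this->items[$name]) ? $this->items[$name] : NULL;
  }

  /**
   * Empty the cache.
   */
  public function flush() {
    $this->items = array();
  }
}

/**
 * Render a template, consulting a TemplateCache first.
 *
 * The result is stored with TemplateCache::set() so that later calls
 * can skip the rendering step.
 */
function render_template(TemplateCache $cache, $name) {
  $html = $cache->get($name);
  if ($html === NULL) {
    // Stand-in for real rendering work.
    $html = '<h1>' . $name . '</h1>';
    $cache->set($name, $html);
  }
  return $html;
}
```

Nothing in those comments is marked up as a reference. The class name and the Class::method() pattern are all Doxygen needs to go looking for a match.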

### Rich Command (Tag) Library

Doxygen has an astounding number of available commands. You're not limited to things like @param and @author. It has support for syntax-highlighted code blocks (@code and @endcode), subgrouping (@defgroup, @ingroup, and so on), callouts (@todo, @attention, @remark, @bug), and extra stand-alone documentation that is not bound to a piece of source (@mainpage, @page, and so on). If you have a few hours to kill, you can peruse the entire list here: http://www.stack.nl/~dimitri/doxygen/commands.html.

In addition to this command goodness, Doxygen supports a subset of HTML tags, too. And this includes not just bold, italics, and fixed fonts, but also tables and images.
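
Here is a rough sketch of how a few of these commands combine in a single doc block. The slugify() function is hypothetical and exists only to give the comment something to describe. The block uses @code/@endcode, two callouts, and one HTML tag.

```php
<?php
/**
 * Convert a title into a URL-friendly slug.
 *
 * Doxygen should render the lines between the code commands as a
 * highlighted block:
 *
 * @code
 * echo slugify('Hello, World!');
 * @endcode
 *
 * @attention Only <b>ASCII</b> letters and digits are kept.
 * @todo Transliterate accented characters instead of dropping them.
 *
 * @param string $text
 *  The text to convert.
 * @return
 *  The slug, as a string.
 */
function slugify($text) {
  // Lowercase, then collapse every run of other characters into one dash.
  $slug = strtolower($text);
  $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);
  // Drop any dash left over at either end.
  return trim($slug, '-');
}
```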

### The Source Should Be as Readable as the Generated Docs

I like HTML, PDF, and man page documentation. I use these frequently. But when I'm actively working on a piece of code, it's far more likely that I will read the documentation in the source, rather than in an external web browser. So in my mind it is important that the source code be readable.

How many times do you find yourself writing lists -- ordered or unordered -- in your documentation? I do it quite frequently. And I don't want to have to write HTML tags for my lists. I want these lists to be just as easy to read in the source.

Doxygen makes writing lists as easy as it is in Markdown:

- this
- will
- be
- a 
- bullet
- list

and

-# this
-# will
-# be
-# a
-# numbered
-# list
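
In context, the two list styles above sit inside an ordinary doc block. This is only a sketch, and normalize_tags() is a hypothetical function made up for the example:

```php
<?php
/**
 * Normalize a list of tags.
 *
 * The following steps are applied, in order:
 * -# Surrounding whitespace is trimmed.
 * -# Tags are lowercased.
 * -# Empty tags and duplicates are removed.
 *
 * Things this function does not do:
 * - Validate characters.
 * - Sort the result.
 */
function normalize_tags(array $tags) {
  $clean = array();
  foreach ($tags as $tag) {
    $tag = strtolower(trim($tag));
    // Keep only non-empty tags that have not been seen yet.
    if ($tag !== '' && !in_array($tag, $clean, TRUE)) {
      $clean[] = $tag;
    }
  }
  return $clean;
}
```

The comment reads fine right there in the editor, which is the whole point.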

### Advanced Support for Objects and Classes

So much PHP is now done using its bolted-on object and class structure. Whether or not this feels bolted on in the language itself (and it does), it should feel natural in the documentation.

Doxygen accomplishes this with some great features:

  • When a class inherits from another class or interface, documentation is inherited too. No need to explain the same method twice, even if you did override a superclass method.
  • Classes can be navigated (in generated documentation) by their class hierarchy, by class members, by namespace, and alphabetically by name.
  • Methods are sorted into categories -- constructors/destructors, public, private, protected, static, and so on. Properties and constants are sorted thus, too.
  • AND Doxygen can generate graphical class diagrams. I know, it's eye candy. But beautiful functional eye candy.
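
The documentation inheritance in the first point can be sketched like this. Renderable and Paragraph are hypothetical names, and I'm assuming Doxygen's INHERIT_DOCS setting is left at its default. Paragraph::render() carries no doc comment of its own, so the generated page for it should pick up the description written on the interface.

```php
<?php
/**
 * Describes anything that can be rendered to markup. (Hypothetical.)
 */
interface Renderable {
  /**
   * Render this object.
   *
   * @return
   *  The rendered markup, as a string.
   */
  public function render();
}

/**
 * A paragraph of text.
 */
class Paragraph implements Renderable {
  /** The raw text of the paragraph. */
  protected $text;

  /**
   * Construct a new paragraph.
   *
   * @param string $text
   *  The raw, unescaped text.
   */
  public function __construct($text) {
    $this->text = $text;
  }

  // No doc comment here: the documentation for Renderable::render()
  // is expected to be inherited.
  public function render() {
    return '<p>' . htmlspecialchars($this->text) . '</p>';
  }
}
```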

### Elegant UI Out-of-the-Box

Okay, I know this is kinda whiny, but I hate ugly documentation. I want API documentation to be attractive and I want findability to be top priority.

Doxygen has a very nice default theme (you can change it, of course). The colors are attractive, the fonts are clear, there aren't numerous cases of markup overflow... it's nice. Customizing the color scheme, logo, and so on is very easy -- it's done via configuration parameters. As I understand it, if you would like to write a full set of HTML templates to replace the defaults, you can do this too.

But findability is clearly the top priority, as it should be.

  • There's a built-in JavaScript-based search engine.
  • Navigation is done with a JavaScript-enabled tree structure.
  • There are no fewer than eight different paths to navigate into the documentation, ranging from by-file to full alphabetical index.

## The Cons

### The Dastardly Backslash

One of the worst decisions I think the PHP community has ever made is adopting the backslash (\) character as a namespace separator. Already used for escape sequences and Windows paths, backslash has become a source of frustration in more than one aspect of my PHP coding (for starters, my IDE keeps thinking I'm trying to escape stuff). I abhor using it.

But it gets worse! Doxygen, too, assigns special meaning to the backslash: Doxygen recognizes it as initiating a command. Yes, either @ or \ can be used to declare a command. @param and \param are semantically equivalent to Doxygen. And this is the root of much documentation confusion. Consider any case where you want to reference a PHP namespace in your documentation.

You write:

<?php
/**
 * @see \Foo\Bar
 */
?>

And Doxygen sees:

<?php
/**
 * @see @Foo @Bar
 */
?>

This not only causes Doxygen to emit errors, but it also munges up the output. The generated documentation will display something like "See: Foo" or sometimes just "See:".

#### The simple solution to the backslash problem

Doxygen does have a simple solution. It abstracts the concept of namespacing at a fairly high level, so you can use alternate namespacing separators instead of the backslash. Both :: (double-colon) and # (hash) seem to work in this capacity. Thus I have now developed the habit of documenting namespace references like this:

<?php
/**
 * @see Foo::Bar
 */
?>
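
Here is the same habit in a slightly fuller sketch, with the PHP code using real backslash namespacing while the comments stick to double colons. Bar, greet(), and make_bar() are hypothetical names, and whether each reference resolves to a link will depend on your Doxygen version and configuration.

```php
<?php
namespace Foo;

/**
 * A hypothetical class living in the Foo namespace.
 */
class Bar {
  /**
   * Return a greeting that names this class.
   */
  public function greet() {
    return 'Hello from ' . __CLASS__;
  }
}

/**
 * Build a Bar instance.
 *
 * In PHP code this class is written with backslashes. In the comments
 * the double colon is used instead, so Doxygen never sees a backslash
 * that it could mistake for a command.
 *
 * @see Foo::Bar
 * @see Foo::Bar::greet()
 */
function make_bar() {
  return new Bar();
}
```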

### PHP Ambiguity

Let's be honest, PHP isn't exactly a first-class citizen in the Doxygen world, and the language's ambiguities sometimes keep it from working in exactly the expected way.

The place where this really shows is in the way @param and @return are processed. Sometimes PHP developers include type information in these directives, and sometimes they don't.

Here is documentation with type information:

<?php
/**
 * @param string $foo
 *  An input string.
 * @return string
 *  An output string.
 */
?>

Here is the same documentation without:

<?php
/**
 * @param $foo
 *  An input string.
 * @return
 *  An output string.
 */
?>

From a parsing perspective, the @return tag is problematic, for there are no clear lexical markers indicating whether there is type information. (Some of us conventionally use a newline, but this is not a standard.)

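To work around problems like this, Doxygen offers additional commands. For example, for typed return information in PHP, you can use @retval instead of @return. (Doxygen's own documentation describes @retval as documenting a named return value, so treat this as a convention rather than the command's stated purpose.)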

<?php
/**
 * @param string $foo
 *  An input string.
 * @retval string
 *  An output string.
 */
?>

Doxygen will correctly parse @retval in such a way that it preserves the type info.

A related issue is that Doxygen limits its types to the PHP primitives, including resource, object, and array. But it doesn't support setting namespaced class names as the return type.

## Conclusion

Doxygen is feature rich. We didn't even cover some of its advanced options, like generating PDF files or man pages. Nor did we look at its ability to provide supplemental documentation. But it should be clear that it is an amazing tool for API documentation generation.

It has its glitches and drawbacks, but in my mind these are outweighed by its benefits.