PHP Arrays are NOT Arrays

Jul 26 2011

I'm continually surprised by PHP programmers who argue tooth and nail that a PHP "array" is a "real array". These same programmers, who often program in at least one other language (JavaScript), seem confused over what an array is and how it ought to work. (One even referred to collections classes in other languages as "needless bloat," a telling symptom of this misunderstanding.)

Conversely, new PHP developers who come from other languages are often confused by the fact that PHP arrays don't work as expected. Elements turn up in an order they did not anticipate, for instance.

The confusion can be cleared up at the terminological level. The simple fact is that PHP arrays are not arrays in the traditional sense. They are ordered hash tables. I will explain with several examples.

Among PHP's built-in types, there is a widely used type called array. It is the only native collection-like type. It's constructed like this: array(). It comes with a wide variety of supporting functions, such as array_walk(), array_values(), array_diff() and array_combine().

At face value, the syntax of an array is similar to other languages:

<?php

// Create an array and assign the first two values.
$foo = array();
$foo[0] = 'First slot';
$foo[1] = 'Second slot';

// Create an array with five elements, and then iterate over the list.
$bar = array(1, 2, 3, 4, 5);
for ($i = 0; $i < count($bar); ++$i) {
  // Outputs '12345'.
  print $bar[$i];
}
?>

Yet despite its trappings, a PHP array is not at all an array. It's an ordered hash table (ordered hash map, order-preserving dictionary). For that reason, you can do things like this:

<?php
$foo = array();
$foo['a'] = 'First slot';
$foo['b'] = 'Second slot';
$foo['c'] = 'Third slot';


// This will result in 'abc', because order is preserved.
foreach ($foo as $key => $value) {
  print $key;
}

?>

I have heard people argue "No, it's both! Here's why: The keys can be ints or any other scalar. When they're ints, the data structure works like an array."

That is not exactly true. It is true that if you supply only values, integer keys will be automatically assigned. And it's true that there is a wide variety of shortcuts to make PHP arrays act like arrays. But they are never really arrays at all. (Nor are they linked lists or array lists.) Here's a simple example illustrating why this is the case:

<?php
$bar = array();
for ($i = 7; $i >= 0; --$i) {
  $bar[$i] = $i;
}

// Print the array as a space-separated string of values.
print implode(' ', $bar);

?>

The above loop creates an array with eight elements, assigning them by integer key. However, it assigns values in reverse order. What should the final contents of the array be (and in what order)?

If it were an array, then the final line should print this:

0 1 2 3 4 5 6 7

In fact, it prints this:

7 6 5 4 3 2 1 0

Why? Because this is not an array. It's an ordered hash map. The "real" index of the ordering list is (apparently) inaccessible to us, but it assigns the pair 7 => 7 to the first (0) position, and the pair 0 => 0 to the eighth position (7). Write the same code in JavaScript, Python, Ruby, Perl, Java, or C# using arrays or indexed lists (pre-sized to eight elements where the language requires it) and you will get the first result, a list from 0 to 7.
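One way to see that ordering for yourself is to ask for the keys in iteration order. The following is a minimal sketch that rebuilds the array from the example above and inspects it with the standard functions array_keys(), reset() and key(). The variable names ($keys, $byKey, $firstKey, $firstValue) are just for illustration.

```php
<?php
// Rebuild the array from the example above: keys 7 down to 0.
$bar = array();
for ($i = 7; $i >= 0; --$i) {
  $bar[$i] = $i;
}

// Keys come back in insertion order, not numeric order.
$keys = array_keys($bar);
print implode(' ', $keys) . "\n"; // 7 6 5 4 3 2 1 0

// Lookup by key still works the way an array user would expect...
$byKey = $bar[0];

// ...but the pair sitting in the first position is 7 => 7.
$firstValue = reset($bar);
$firstKey = key($bar);
print "$firstKey => $firstValue\n"; // 7 => 7
```

So the integer you put in the brackets is a hash key, and the position of the pair is a separate thing that depends only on when it was inserted.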

In case you think I've pulled out some odd edge case, here's another example that creates an array and initializes it with values, missing only the value for the index 4. 4 is later added:

<?php
$bar = array(0 => 'a', 1 => 'a', 2 => 'a', 3 => 'a', 5 => 'a', 6 => 'a', 7 => 'a');
$bar[4] = 'b';

print implode(' ', $bar);
?>

What we would expect the above to print is 'a a a a b a a a'. What it actually prints is 'a a a a a a a b'. Why? Because the index position of 4 is occupied by 5, and when we inserted $bar[4], it was actually slotted into the eighth spot. Even though we've ordered the keys numerically, this is still a hash table.
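The gap between key and position can be made explicit with array_slice(), which works on positions rather than keys. This is a small sketch using the same array. Passing true as the fourth argument (preserve_keys) keeps the original keys so both are visible. The names $fifth and $position are only for illustration.

```php
<?php
$bar = array(0 => 'a', 1 => 'a', 2 => 'a', 3 => 'a', 5 => 'a', 6 => 'a', 7 => 'a');
$bar[4] = 'b';

// Position 4 (the fifth element) holds the pair 5 => 'a'.
$fifth = array_slice($bar, 4, 1, true);

// Key 4 lives at position 7 (the eighth element).
$position = array_search(4, array_keys($bar), true);

print key($fifth) . ' => ' . current($fifth) . "\n"; // 5 => a
print $position . "\n";                              // 7
```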

The confusion between array-ness and the PHP array type can cause some quirky bugs. I recently found code that effectively did this:

<?php

$foo = array();

$foo[0] = 'First';
$foo[1] = 'Second';
$foo[2] = 'Third';
$foo[3] = 'Last';

// Under some conditions, an item was deleted like this:
unset($foo[1]);

// Under other conditions, an item was changed like this.
$foo[1] = 'New second';

print implode(', ', $foo);
?>

It should now be unsurprising that the final key order of the array above is 0, 2, 3, 1, so the code prints 'First, Third, Last, New second'. Yet for those used to working with arrays in other languages, that final order is surprising. To get things back in order, you must call ksort() (a sort on keys) to re-order the list.
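As a rough sketch of that fix, the snippet below replays the delete-then-reassign sequence and then calls ksort(), which sorts the array in place by key. The variable names $before and $after are only for illustration.

```php
<?php
$foo = array('First', 'Second', 'Third', 'Last');

unset($foo[1]);          // Removes the pair 1 => 'Second'.
$foo[1] = 'New second';  // Adds a new pair at the end of the ordering.

$before = implode(', ', $foo);  // First, Third, Last, New second

ksort($foo);  // Re-order the pairs by key: 0, 1, 2, 3.

$after = implode(', ', $foo);   // First, New second, Third, Last

print $before . "\n" . $after . "\n";
```

If the keys themselves don't matter and you only want a gap-free numbering starting from 0, array_values() returns a re-indexed copy in the current order instead.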

If you're going to implement only one collection type in the language, I think an ordered hash table is a pretty darn good choice... I just don't think it should have been called an array.


