PHP and curl_multi_exec

Oct 26 2012

This post explains how to get data off of a curl_multi handle. Some time back I posted this snippet of code inside of a larger sample of code:

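<?php
  $active = NULL;
  // Start the transfers and process whatever is ready right now.
  do {
    $ret = curl_multi_exec($multi, $active);
  } while ($ret == CURLM_CALL_MULTI_PERFORM);

  // Wait for more data until every transfer is done or an error occurs.
  while ($active && $ret == CURLM_OK) {
    if (curl_multi_select($multi) != -1) {
      do {
         // Keep the status in $ret so the outer loop can see an error.
         $ret = curl_multi_exec($multi, $active);
      } while ($ret == CURLM_CALL_MULTI_PERFORM);
    }
  }
?>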

I didn't really document or explain it, and it seems that this code snippet has caused some confusion. Let me explain what it does.

First, here's the high-level view. There are two outer loops. The first one is responsible for clearing out the curl buffer right now. The second one is responsible for waiting for more information, and then getting that information. This is an example of what is called blocking I/O: we block execution of the rest of the program until the network I/O is done. While this isn't generally the preferred way to handle network I/O, it's really our only choice in single-threaded, synchronous PHP.

So let's take a look at the first loop:

<?php
 $active = NULL;
 do {
   $ret = curl_multi_exec($multi, $active);
 } while ($ret == CURLM_CALL_MULTI_PERFORM);
?>

curl_multi_exec() tries to load some data off of the multi handle. $multi is the handle generated by some previous call to curl_multi_init(). $active and $ret are both integer values. curl_multi_exec() sets $active to the number of individual handles it is currently working with. In other words, if you are hitting 5 URLs with this handle, curl_multi_exec() will set $active to 5 while it's working on all 5, and then as each one finishes, that number will be reduced by one until it is at 0.
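To watch $active count down, here is a minimal sketch that records every value curl_multi_exec() writes into it. The function name trace_active() is made up for this illustration, and it uses a single loop with a short pause when curl_multi_select() returns -1, which is my own simplification rather than the two-loop snippet this post is about.

```php
<?php
// Hypothetical helper: run a multi handle to completion and record every
// value that curl_multi_exec() writes into $active.
function trace_active($multi) {
  $seen = array();
  $active = NULL;
  do {
    $ret = curl_multi_exec($multi, $active);
    $seen[] = $active;
    // While transfers are still running, wait for network activity.
    // If the wait itself fails (-1), pause briefly so we do not spin.
    if ($active && $ret == CURLM_OK && curl_multi_select($multi) == -1) {
      usleep(1000);
    }
  } while ($active && ($ret == CURLM_OK || $ret == CURLM_CALL_MULTI_PERFORM));

  return $seen;
}
?>
```

The exact sequence depends on the network, so I won't promise particular numbers. What you should see is that no value is larger than the number of handles you added, and that the last value is 0 when every transfer finished without an error.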

$ret is one of the following:

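  • CURLM_CALL_MULTI_PERFORM (-1): This means you should call curl_multi_exec() again because there is still data available for processing. (libcurl 7.20.0 and later no longer return this code, but checking for it does no harm.)
  • CURLM_OK (0): In the words of the docs: "Things are fine." Gee, that's nice. What it means is that the call itself succeeded. If $active is still greater than 0, there is more data to come, but it hasn't arrived yet.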
  • One of the error codes: CURLM_BAD_HANDLE, CURLM_OUT_OF_MEMORY, CURLM_INTERNAL_ERROR, or CURLM_BAD_SOCKET. All of these indicate that we need to stop processing.
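As a rough sketch of how those three cases might be told apart in code, here is a small helper. The name describe_multi_code() and the strings it returns are made up for this example. It also assumes curl_multi_strerror(), which needs PHP 5.5 or later.

```php
<?php
// Hypothetical helper: turn a curl_multi_exec() return code into a short
// description of what the caller should do next.
function describe_multi_code($ret) {
  if ($ret === CURLM_CALL_MULTI_PERFORM) {
    // More data is ready right now: call curl_multi_exec() again.
    return 'call again';
  }
  if ($ret === CURLM_OK) {
    // No error. Check $active to see whether transfers are still running.
    return 'ok';
  }
  // Anything else is an error code, and processing should stop.
  return 'error: ' . curl_multi_strerror($ret);
}
?>
```

For example, describe_multi_code(CURLM_OK) gives 'ok', while an error code such as CURLM_BAD_HANDLE gives a string starting with 'error: ' followed by libcurl's own message.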

So in the first loop, the only condition that should keep us iterating is CURLM_CALL_MULTI_PERFORM.

Now, for smallish results, one pass through the loop may be all you need. Often, however, the first loop will end with CURLM_OK while $active is still greater than 0. That indicates that there is more data, but that the data has not yet arrived on the network.

We need to wait.

That's where the second loop comes in:

<?php
 while ($active && $ret == CURLM_OK) {
   if (curl_multi_select($multi) != -1) {
     do {
        // Keep the status in $ret so the outer loop can see an error.
        $ret = curl_multi_exec($multi, $active);
     } while ($ret == CURLM_CALL_MULTI_PERFORM);
   }
 }
?>

This loop says…

  (while): As long as there are active connections and everything looks OK…
    (if) Wait for activity on any of the sockets, and if that wait did not fail…
      (do/while) Process the data for as long as the system tells us to keep getting it

So the second loop is responsible for checking up on the socket until it is all done.
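To put the pieces together, here is a minimal sketch of a complete fetch, from adding the individual handles to reading the data back off of them with curl_multi_getcontent(). The function name fetch_all() is made up for this example. The second loop is also a slight variation on the one above: it calls curl_multi_exec() after every wait and pauses briefly if curl_multi_select() returns -1. That change is my assumption about how to avoid spinning, not something the original snippet does.

```php
<?php
// Hypothetical helper: fetch several URLs at once and return their bodies,
// keyed the same way as the input array.
function fetch_all(array $urls) {
  $multi = curl_multi_init();
  $handles = array();

  foreach ($urls as $key => $url) {
    $ch = curl_init($url);
    // Return the body as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_multi_add_handle($multi, $ch);
    $handles[$key] = $ch;
  }

  // First loop: process whatever is ready right now.
  $active = NULL;
  do {
    $ret = curl_multi_exec($multi, $active);
  } while ($ret == CURLM_CALL_MULTI_PERFORM);

  // Second loop: wait for more data, then process it.
  while ($active && $ret == CURLM_OK) {
    if (curl_multi_select($multi) == -1) {
      // The wait failed; pause briefly rather than spin.
      usleep(1000);
    }
    do {
      $ret = curl_multi_exec($multi, $active);
    } while ($ret == CURLM_CALL_MULTI_PERFORM);
  }

  // Read the data off of each handle, then clean up.
  $results = array();
  foreach ($handles as $key => $ch) {
    $results[$key] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($multi, $ch);
  }
  curl_multi_close($multi);

  return $results;
}
?>
```

Calling fetch_all(array('a' => $url_a, 'b' => $url_b)) should give back an array with the same keys, where each value is the body of the corresponding response. A real version would also want to check each handle for errors, which this sketch skips.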

The PHP manual is a little light on details for this stuff, but the libcurl C documentation is much more complete.