QueryPath: Replacing text in one document with text from another

May 14 2009

I received a good question in the QueryPath Drupal module's issue queue. The answer is not specific to the Drupal module, and the question touches on a few QueryPath nuances, so I thought it would make a good post here.

To frame the question, here's some sample data:

<?php 
$a = '<div>
  <label>1</label>
  <label>2</label>
  <label>3</label>
</div>';

$b = '<div>
  <p>Second set.</p>
  <label>a</label>
  <label>b</label>
  <label>c</label>
</div>';
?>

The task is to take the label text from $a and write it into the corresponding labels in $b.

Here's the solution:

<?php 
require 'QueryPath/QueryPath.php';

$a = '<div>
  <label>1</label>
  <label>2</label>
  <label>3</label>
</div>';

$b = '<div>
  <p>Second set.</p>
  <label>a</label>
  <label>b</label>
  <label>c</label>
</div>';

$qpa = qp($a, 'label');
$qpb = qp($b, 'label');

// You might want to check that both sets contain the same number of labels:
if ($qpa->size() != $qpb->size()) {
  // Do something...
  print "Warning...";
}

$i = 0;
foreach ($qpb as $label) {
  $label->text($qpa->branch()->eq($i++)->text());
}

$qpb->writeHTML();
?>

To begin, we create two QueryPath objects: one pointing to $a, and one pointing to $b. In both, we search for just the labels, since those are what we are going to replace.

Next, we check that both documents contain the same number of labels. We should be able to drop this check without causing QueryPath to err, but it is worth keeping: if the counts differ, the labels no longer line up one-to-one.
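The script above leaves the mismatch handling as "Do something...". One possible policy, sketched here as an assumption rather than a recommendation, is to replace only as many labels as both documents have and leave the rest of $b alone. The shortened sample data and the $limit variable are made up for this illustration.

```php
<?php
require 'QueryPath/QueryPath.php';

// Illustrative data: $a has two labels, $b has three.
$a = '<div><label>1</label><label>2</label></div>';
$b = '<div><label>a</label><label>b</label><label>c</label></div>';

$qpa = qp($a, 'label');
$qpb = qp($b, 'label');

// Only replace as many labels as both documents have.
$limit = min($qpa->size(), $qpb->size());

$i = 0;
foreach ($qpb as $label) {
  if ($i >= $limit) {
    // $a has run out of labels; leave the remaining labels in $b untouched.
    break;
  }
  $label->text($qpa->branch()->eq($i++)->text());
}

$qpb->writeHTML();
```

Whether truncating like this is right depends on the data. Aborting with an error is just as reasonable when a mismatch means the input is wrong.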

Most of the "work" happens on these three lines:

foreach ($qpb as $label) {
  $label->text($qpa->branch()->eq($i++)->text());
}

This loops through all of the labels in $qpb and sets each label's text to the text of the label at the same index in $qpa.

The latter part is done with $qpa->branch()->eq($i++)->text(). Why do we branch() here? Because we don't want to modify the $qpa QueryPath, and eq() is a "destructive" function. Branching clones the $qpa QueryPath object. When we run eq(), only the branched version is modified. This prevents us from having to do anything fancy with queries. The eq() function selects just the label at the given index. Then we fetch that label's text with text().
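To see the effect of branch() in isolation, here is a small sketch that compares the size of $qpa before and after calling eq(), with and without branching. It assumes the destructive eq() behavior described above. The variable names ($second, $stillAll, $afterEq) are made up for this illustration.

```php
<?php
require 'QueryPath/QueryPath.php';

$a = '<div><label>1</label><label>2</label><label>3</label></div>';

$qpa = qp($a, 'label');

// With branch(): eq() narrows only the clone, so $qpa keeps all of its matches.
$second = $qpa->branch()->eq(1)->text(); // eq() is zero-based, so this is the second label.
$stillAll = $qpa->size();

// Without branch(): eq() is called on $qpa itself.
// If eq() is destructive, as described above, $qpa now matches a single label.
$qpa->eq(1);
$afterEq = $qpa->size();

print "Second label: $second\n";
print "Size after branched eq(): $stillAll\n";
print "Size after direct eq(): $afterEq\n";
```

If the last number printed is smaller than the one before it, the loop in the main script would have run out of labels after its first pass had it not branched.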

The output of this code will look something like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div>
  <p>Second set.</p>
  <label>1</label>
  <label>2</label>
  <label>3</label>
</div></body></html>