QueryPath can be used to work with XML or HTML. Here, I will introduce the typical tasks involved in working with HTML documents.
We will look at the following:
- Loading HTML documents
- Modifying documents
- Sending a document to the web browser
- Creating new documents from scratch
Loading documents
The first common task we will look at is loading an existing document. In most cases, we will be loading documents straight from the file system. Sometimes we may load them from a string of existing HTML, too. Here, we will look at each.
Loading from a file
To load a document from a file, all we need is the path to the file. Let's say we have an HTML document located at /var/www/html/index.html. Here's how we can load that file using QueryPath:
<?php
require 'QueryPath/QueryPath.php';
qp('/var/www/html/index.html');
?>
The code above will load the file from the file system. The last line of code will create a new QueryPath object that wraps the content of index.html. A little later, we will build on this example.
Loading from a string of HTML
Often, HTML is generated on the fly and then sent to the web browser. QueryPath can be used as a filter for altering such dynamically generated HTML content.
For example, consider a case where we have a string, $html, that contains some HTML. Here's how we would load that string:
<?php
require 'QueryPath/QueryPath.php';
$html = '<html>
<head>
<title>Existing HTML</title>
</head>
<body>
<h1>The title</h1>
</body>
</html>';
qp($html);
?>
In the example above, $html has an entire HTML document. Note that the document is not technically correct -- it is missing a document type declaration. While QueryPath is strict about XML formatting, it is much more forgiving for HTML markup.
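To make the strict-versus-forgiving distinction concrete, here is a minimal sketch that uses PHP's bundled DOM extension directly rather than QueryPath. It assumes the DOM extension is enabled, and the deliberately broken $markup string is made up for illustration. As far as I know, QueryPath relies on the same underlying parser, but treat that as an assumption rather than a statement about its internals.

```php
<?php
// Illustration only: this uses PHP's DOM extension, not QueryPath itself.
// The markup is intentionally sloppy: <p> and <br> are never closed.
$markup = '<html><body><p>Unclosed paragraph<br></body></html>';

// Collect parser errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Parsed as XML, the mismatched tags are a fatal error and loading fails.
$asXml = new DOMDocument();
$xmlOk = $asXml->loadXML($markup);
libxml_clear_errors();

// Parsed as HTML, the parser repairs the markup and loading succeeds.
$asHtml = new DOMDocument();
$htmlOk = $asHtml->loadHTML($markup);
libxml_clear_errors();

var_dump($xmlOk, $htmlOk);
```

The same string is rejected by the XML loader and accepted by the HTML loader, which is the kind of leniency described above.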
Again, the last line of the example, qp($html), creates a new QueryPath object, this time wrapping the contents of the string.
Note that both the file example and the string example used the qp() function to build the document, and both called it with the same signature: qp($string). QueryPath is "smart enough" to determine whether a string is an HTML document, an XML document, or a path to a file in the filesystem.
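How QueryPath makes that decision is not covered here. Purely to illustrate the idea, a hypothetical check might look at the first non-whitespace character of the string. The function name looks_like_markup() is my own invention, and this sketch should not be read as QueryPath's actual logic.

```php
<?php
// Hypothetical helper for illustration only; NOT QueryPath's real detection.
// Assumption: markup begins with '<', while a file path normally does not.
function looks_like_markup(string $input): bool
{
    // Ignore leading whitespace, then compare the first character with '<'.
    return strncmp(ltrim($input), '<', 1) === 0;
}

$fromString = looks_like_markup('<html><body><h1>The title</h1></body></html>');
$fromPath   = looks_like_markup('/var/www/html/index.html');

var_dump($fromString, $fromPath);
```

A real implementation would also need to tell HTML from XML, which this sketch does not attempt.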
Tip: In some cases, you can get HTML from the output buffer (see the 'ob' functions in PHP) and then pass the markup on to QueryPath. In this way, you can do post-processing on data that has been output from the application already.
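Here is a minimal sketch of that output-buffer approach, using only PHP's built-in ob_start() and ob_get_clean(). The capture_output() helper and the echoed markup are hypothetical and exist only for this illustration. The final hand-off to QueryPath is left as a comment so the snippet runs without the library.

```php
<?php
// Hypothetical helper (not part of QueryPath): run a callback and return
// everything it printed, instead of letting it reach the browser.
function capture_output(callable $render): string
{
    ob_start();                  // start buffering output
    $render();                   // the application prints as usual
    return (string) ob_get_clean(); // stop buffering and return the contents
}

$html = capture_output(function () {
    echo '<html><head><title>Existing HTML</title></head>';
    echo '<body><h1>The title</h1></body></html>';
});

// $html is now an ordinary string. With QueryPath loaded, it could be passed
// on in the same way as the string example, e.g. qp($html);
```

Because the buffer holds the output back, nothing is sent to the browser until the post-processed markup is printed explicitly.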
Modifying documents
Let's build on our last example to see how QueryPath can be used to process existing HTML.
In our new example, we will change the title (in the document head) and add a new paragraph beneath the h1.
<?php
require 'QueryPath/QueryPath.php';
$html = '<html>
<head>
<title>Existing HTML</title>
</head>
<body>
<h1>The title</h1>
</body>
</html>';
$qp = qp($html, 'title') // Load doc and find <title>
->text('A new title') // Set the new title
->top() // Go back to the top of the document
->find('h1') // Find the <h1>
->after('<p>This is the new paragraph</p>'); // Add a new paragraph after.
?>
The example above is considerably denser than our previous examples. Again, it begins with a string of HTML. But this time, when qp() is called, a query is passed in as the second argument. The query title instructs QueryPath to search for any title elements. It will, as we can see, find one: the title inside the document head.
The second line of our QueryPath chain will set the text (text()) of the title to A new title.
The third line will navigate back to the top of the document. This is necessary because we are not going to do any more manipulation of the head. We want to start looking for new content from the top of the document.
Next, we need to find our H1 tag. This is done with find('h1'). At this point, the QueryPath object is pointing to the h1 tag. We want to add some content after this tag.
The final step of the QueryPath chain adds a new paragraph after the h1 tag: after('<p>This is the new paragraph</p>'). QueryPath's after() method is one of the dozen or so tools for inserting or updating content in a document. Check out the article at IBM DeveloperWorks for an overview of the other methods.
Here, we've seen two methods, text() and after(), that can be used to modify the document. Next, let's see how to get the results of our modification.
Sending the results to a browser
Again, let's continue from our previous example.
At any point, we can get the current state of the HTML using the html() method. For example, we can do something like this:
<?php
require 'QueryPath/QueryPath.php';
$html = '<html>
<head>
<title>Existing HTML</title>
</head>
<body>
<h1>The title</h1>
</body>
</html>';
$content = qp($html)->html();
?>
Now, $content will be a string that should look basically the same as $html (except that it will be cleaned up by QueryPath).
The html() method always works from the local context, though. So if we wanted to get just the body of the above HTML, we could do this:
<?php
require 'QueryPath/QueryPath.php';
$html = '<html>
<head>
<title>Existing HTML</title>
</head>
<body>
<h1>The title</h1>
</body>
</html>';
print qp($html, 'body')->html();
?>
This would output the following:
<body>
<h1>The title</h1>
</body>
See how the html() method is only grabbing the contents that are currently selected? Since we queried for body, only the body is shown.
Most of the time, though, we are more interested in printing the entire document. A clumsy way of doing this looks like this:
<?php
require 'QueryPath/QueryPath.php';
$html = '<html>
<head>
<title>Existing HTML</title>
</head>
<body>
<h1>The title</h1>
</body>
</html>';
print qp($html, 'title')->text('New title')->top()->html();
?>
There are three steps here (all on the one line) involved in getting the entire HTML document to print.
First, there is an explicit print statement. Second, there is a call to the top() method to get us back to the top of the document. Third, there is the call to html() to get the HTML string.
The above can be further condensed using a different method, writeHTML(). This basically bundles the three steps above into just one step. Thus, we could rewrite the above like this:
<?php
require 'QueryPath/QueryPath.php';
$html = '<html>
<head>
<title>Existing HTML</title>
</head>
<body>
<h1>The title</h1>
</body>
</html>';
qp($html, 'title')->text('New title')->writeHTML();
?>
As a result of running this code, the following document would be shipped to the browser:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head><title>New title</title></head>
<body><h1>The title</h1></body>
</html>
(Notice that the doctype has been automatically added.)
Creating new documents
The last thing we will look at is creating a new HTML document with QueryPath. There are two ways to do this. You can either create the entire document from scratch, or you can use a built-in HTML stub document.
Building from scratch
The first way of creating a new HTML document is to build the document from scratch. We've actually seen this method already in our string handling examples above. There, we created a document as a string and then passed it into QueryPath:
<?php
require 'QueryPath/QueryPath.php';
$html = '<html>
<head>
<title>Existing HTML</title>
</head>
<body>
<h1>The title</h1>
</body>
</html>';
qp($html);
?>
Building documents that way is always acceptable. Should you so choose, you can build it up in an even more piecemeal fashion:
<?php
require 'QueryPath/QueryPath.php';
qp()->append('<html>')->children()->append('<head/><body/>'); // etc.
?>
This, however, is not a terribly efficient method of document building, and is generally only useful in rare cases.
Building from a stub
The easiest method is to use the HTML stub document included with QueryPath. This stub provides a skeleton XHTML document that you can then build using the methods we talked about above.
Here is a quick example:
<?php
require 'QueryPath/QueryPath.php';
qp(QueryPath::HTML_STUB, 'title')->text('New title')->writeHTML();
?>
This example creates a new stub document, sets the title, and writes the output. Here is what the output looks like:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta>
<title>New title</title>
</head>
<body></body>
</html>
You may notice that here even more work is done for you. The content type is set, as is the doctype. This last method is the quickest way for you to author HTML documents.
Conclusion
We have quickly covered the basics of using QueryPath to work with HTML. We have looked at loading existing documents, changing documents, writing documents to the web browser, and creating documents from scratch.
Of course, QueryPath can be used for many other things. For more information, head over to http://querypath.org. And check back here for more articles like this one.