By Matt Butcher
May 2009
The Amplify Module: Developer's Guide
OpenAmplifyThe Amplify module is a Drupal module that provides a semantic analysis of your nodes. It is powered by OpenAmplify and QueryPath.
This guide provides basic configuration instructions for setting up the Amplify module. It also provides examples of how to retrieve the semantic information from an already amplified document. The Amplify module uses QueryPath for much of its work. If you are planning on doing significant integration with Amplify, you may want to familiarize yourself with the QueryPath API documentation.
A Set of Objects in PHP: Arrays vs. SplObjectStorage
Submitted by matt on Fri, 2009-05-29 15:50One of my projects, QueryPath, performs many tasks that require maintaining a set of unique objects. In my quest to optimize QueryPath, I have been looking into various ways of efficiently storing sets of objects in a way that provides expedient containment checks. In other words, I want a data structure that keeps a list of unique objects, and can quickly tell me if some object is present in that list. The ability to loop through the contents of the list is also necessary.
Recently I narrowed the list of candidates down to two methods:
- Use good old fashioned arrays to emulate a hash set.
- Use the SPLObjectStorage system present in PHP 5.2 and up.
Before implementing anything directly in QueryPath, I first set out designing the two methods, and then ran some micro-benchmarks (with Crell's help) on the pair of methods. To say that the results were surprising is an understatement. The benchmarks will likely change the way I structure future code, both inside and outside of Drupal.
SQLite "Database table is locked" errors in PDO
Submitted by matt on Thu, 2009-05-28 22:37While working on various PHPUnit tests, I was running some database queries. During the tear-down for my tests, I repeatedly received the following error:
General error: 6 database table is locked
The line of code generating this error (from my unit test's tearDown()) was this:
$db->exec('DROP TABLE qpdb_test');
Here, $db was a PDO object connected to a SQLite database. This line of code simply attempts to drop the table that the unit test has been working with.
Initially I could not find the root of the problem. A Google search just about convinced me that the problem was actually with the SQLite driver. However, I managed to temporarily work around the error by skipping the table drop and running a DELETE FROM qpdb_test instead. This worked as expected. After deleting, I discovered the problem.
One of my unit tests was failing. Because PHPUnit (correctly) traps all errors and catches all exceptions, the failed unit test did not stop the execution of the program. Instead, it began the tear-down procedure. However, the premature failure did prevent the PDOStatement's cursor from being closed. With the cursor left open, the DROP TABLE could not successfully be run. And when the DROP failed, the program prematurely exited (as this was in tear-down, not in a test).
A Google search turned up all sorts of theories about why the database tables were locked. (Apparently, others have had this same problem in the context of unit testing.) In the end, though, it was simply a matter of PDO performing as documented, PHPUnit performing as documented, and my code not performing as documented.
Executing a SPARQL Query from QueryPath
Submitted by matt on Thu, 2009-05-28 18:07The Semantic Web. It is a concept that has sparked heated debate for years. While the debate may continue to rage for some time, there are already a host of technologies that can be used to build advanced applications based on XML technology. In this article, we will see how the SPARQL query language can be used to retrieve XML information from remote semantic databases (usually called SPARQL endpoints).
QueryPath already contains all of the tools necessary for running a SPARQL query and handling the results. This is not because QueryPath has been specially fitted to the task, but because SPARQL uses technologies that are widely supported: XML and HTTP. Since QueryPath can be used to make HTTP requests and then digest the XML results, we can use it to execute SPARQL queries and handle the results.
In this article, we will look at a basic SPARQL query, and see how we can use QueryPath to execute it and parse the returned results.
While SPARQL will be introduced here, it is far too robust a language to be explained in a short article. One starting point is the SPARQL Working Group home page.
The queries presented in this chapter will be run against DBPedia, a semantic version of Wikipedia. It makes all of the content from Wikipedia available as semantic content.
The SPARQL Query: A Brief Anatomy
Let's begin by looking at the SPARQL query that we will be running:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?uri ?name ?label
WHERE {
?uri foaf:name ?name .
?uri rdfs:label ?label
FILTER (?name = "The Beatles")
FILTER (lang(?label) = "en")
}The query above begins by defining two prefixes:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
A prefix is a convenient method for representing a namespace URI with a short string. Above, we create one for the Friend of a Friend namespace (foaf:) and one for the RDF Schema namespace (rdfs). Now, whenever we need to represent entities from those two schemata, we can just use the short prefix instead of the full URL.
The next part of the code above is the actual query:
SELECT ?uri ?name ?label
WHERE {
?uri foaf:name ?name .
?uri rdfs:label ?label
FILTER (?name = "The Beatles")
FILTER (lang(?label) = "en")
}We are going to use the URI a lot, and it is easy to get hung up on the URI as a URL expressing a location. However, you are better off thinking of the URI as a unique identifier for an object -- a unique identifier that just happens to also be "dereferenceable". We can, in fact, use the URI to access information over the network (in this case).
If you have developed SQL before, this should look vaguely familiar. It functions similarly to a SQL SELECT operation. Here's what the code above does, phrased in plain English:
- Select the uri, name, and label
- where...
- the uri has the name ?name (or, where the uri's name is stored in ?name)
- the uri has a label ?label
- the name is "The Beatles"
- the language of the label is English
There are a few things to note about the structure of the query.
First, remember that the URI (?uri), is just a unique identifier. It is functioning sort of like a primary key for each object we query.
Second, the items that begin with question marks (?) are variables. Their value is assigned when the query is being executed.
Third, the items in the WHERE clause are not simply restrictive, as they are in SQL. In fact, the purpose of lines 3 and 4 isn't so much to limit the items returned, but to express a relationship between items. The general pattern of lines 3 and 4 is:
?subject ?relationship ?object
So ?uri foaf:name ?name can be understood to mean "Some object ID (subject) named (relationship) Some name(object)". As you may have guessed, foaf:name expresses the relationship "is named". Likewise, rdfs:label expresses the relationship "is labeled".
Assuming that we did not have the two FILTER functions, the query would simply return all objects (together with their names and labels) that had a name and a label.
The FILTER function is used to limit what content is returned. Above, we used two filters:
FILTER (?name = "The Beatles") FILTER (lang(?label) = "en")
The first filter says that the value of ?name must match (exactly) the string "The Beatles". Keep in mind that a given item may have multiple foaf:name items. The filter need only match one of the items.
The second filter requires that the label's language be in English. RDFS labels in the DBPedia database tend to have attributes indicating the language of the label. We are only interested in the English language content. In the query above, if we omit this, we will see results in Chinese, German, and Spanish, as well as other languages.
Putting this all together, then, our query will return the URI, the name, and the label for any URIs in the database that...
- Have a name
- Have a label
- Have a name that is "The Beatles"
- Have a label that is in English.
Next, we're ready to see how this query can be run against a remote, publicly available SPARQL endpoint (server) from QueryPath.
Running the Query
The query is, by far, the most complex aspect of our sample code. Here's what the entire code looks like:
<?php require '../src/QueryPath/QueryPath.php'; // We are using the dbpedia database to execute a SPARQL query. // URL to DB Pedia's SPARQL endpoint. $url = 'http://dbpedia.org/sparql'; // The SPARQL query to run. $sparql = ' PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT ?uri ?name ?label WHERE { ?uri foaf:name ?name . ?uri rdfs:label ?label FILTER (?name = "The Beatles") FILTER (lang(?label) = "en") } '; // We first set up the parameters that will be sent. $params = array( 'query' => $sparql, 'format' => 'application/sparql-results+xml', ); // DB Pedia wants a GET query, so we create one. $data = http_build_query($params); $url .= '?' . $data; // Next, we simply retrieve, parse, and output the contents. $qp = qp($url, 'head'); // Get the headers from the resulting XML. $headers = array(); foreach ($qp->children('variable') as $col) { $headers[] = $col->attr('name'); } // Get rows of data from result. $rows = array(); $col_count = count($headers); foreach ($qp->top()->find('results>result') as $row) { $cols = array(); $row->children(); for ($i = 0; $i < $col_count; ++$i) { $cols[$i] = $row->branch()->eq($i)->text(); } $rows[] = $cols; } // Turn data into table. $table = '<table><tr><th>' . implode('</th><th>', $headers) . '</th></tr>'; foreach ($rows as $row) { $table .= '<tr><td>'; $table .= implode('</td><td>', $row); $table .= '</td></tr>'; } $table .= '</table>'; // Add table to HTML document. qp(QueryPath::HTML_STUB, 'body')->append($table)->writeHTML(); ?>
While the code may look complex at first blush, it is actually a straightforward tool.
We will begin by taking a quick glance at the first dozen lines:
// URL to DB Pedia's SPARQL endpoint. $url = 'http://dbpedia.org/sparql'; // The SPARQL query to run. $sparql = ' PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> SELECT ?uri ?name ?label WHERE { ?uri foaf:name ?name . ?uri rdfs:label ?label FILTER (?name = "The Beatles") FILTER (lang(?label) = "en") } '; // We first set up the parameters that will be sent. $params = array( 'query' => $sparql, 'format' => 'application/sparql-results+xml', ); // DB Pedia wants a GET query, so we create one. $data = http_build_query($params); $url .= '?' . $data;
The snippet above shows all of the preparation we must make to run the query.
We begin with a base $url, which points to the DBPedia SPARQL endpoint. Next we write our SPARQL query. The query above is the same as the one we saw earlier in this article.
With the query and the base URL, we need to build a full URL to access the remote server. This is done with the $params array. There we create the name/value pairs that will be condensed into a GET string by http_build_query(). Note that we set the MIME type as the value of the $params['format'] entry. This is to tell the remote server what kind of data we expect to have returned.
A SPARQL query need not return information encoded as XML. Other data formats are equally capable of representing SPARQL query results. XML is probably the most widely used, though, and is the easiest for us to parse.
In the last line of the snippet above, we assemble our base URL and query params into a complete URL.
Next, we need to execute the query and handle the results.
<?php // Next, we simply retrieve, parse, and output the contents. $qp = qp($url, 'head'); // Get the headers from the resulting XML. $headers = array(); foreach ($qp->children('variable') as $col) { $headers[] = $col->attr('name'); } // Get rows of data from result. $rows = array(); $col_count = count($headers); foreach ($qp->top()->find('results>result') as $row) { $cols = array(); $row->children(); for ($i = 0; $i < $col_count; ++$i) { $cols[$i] = $row->branch()->eq($i)->text(); } $rows[] = $cols; } ?>
We begin by creating a new QueryPath object, stored in $qp. Based on the CSS query, we can see that it will be pointed to the header element in the returned results. This element will contain the names of each of the returned variables of data.
From there, we build an array of $headers, getting the name of each returned variable. These we will use to generate the headers in our table. The headers come back in variable elements, and each variable has a name attribute. To fetch them, then, we select the variables and loop through them, retrieving the name attribute of each.
Next comes the fancy part. We need to loop through each result and fetch each variable out of each result. Or, to use the table metaphor we SQL developers are familiar with, we loop through each row, and fetch each column of data. This is al accomplished in the this foreach loop:
foreach ($qp->top()->find('results>result') as $row) { $cols = array(); $row->children(); for ($i = 0; $i < $col_count; ++$i) { $cols[$i] = $row->branch()->eq($i)->text(); } $rows[] = $cols; }
When this loop is finished, there will be an array of rows, each of which will have an array of columns. The index of the columns should match the index of the headers array. That is how we correlate headers to columns. You may also notice that we use QueryPath's 'branch() method in combination with eq() so that we can (relatively cheaply) get the text for each column.
With this complete, the next thing to do is format the table output:
<?php // Turn data into table. $table = '<table><tr><th>' . implode('</th><th>', $headers) . '</th></tr>'; foreach ($rows as $row) { $table .= '<tr><td>'; $table .= implode('</td><td>', $row); $table .= '</td></tr>'; } $table .= '</table>'; // Add table to HTML document. qp(QueryPath::HTML_STUB, 'body')->append($table)->writeHTML(); ?>
The code above is straightforward. We are taking the data returned from the SPARQL query and formatting it into an HTML table, looping through each row of data.
On the final line, we create a new QueryPath object using the HTML_STUB HTML stub document. We add our new table to that, write the HTML document to the web browser.
Conclusion
This article illustrates how QueryPath can be used to execute SPARQL queries against remote semantic databases, and how QueryPath can then use the results. SPARQL is a complex language, and the introduction here has been brief. However, with such a robust query language at your disposal, and with QueryPath's HTTP, XML, and HTML capabilities, you can make use of the semantic web from your web applications.
Updated Drupal module checkout package
Early this year I posted a short shell script for checking out modules from Drupal.org's CVS repository. After doing several merges and branches, I decided I'd better re-write the script to avoid using -r HEAD when checking out from HEAD.
The revised version of the script now looks like this. You can grab the attached zip file if you want to use the script yourself.
#!/bin/bash ##################################################################### # This script checks out a contrib module. # It sets up the environment first, and then does a CVS login and # a CVS checkout. ##################################################################### # Uncomment and change this if you don't want to set the env var. #DRUPAL_USER='mbutcher' if [[ $# == 0 ]]; then echo "Usage: $0 module_name [branch_or_tag]" echo " " echo " module_name: " echo " Module name must be the name of the module in Drupal's CVS repo." echo " branch_or_tag: " echo " If no branch/tag is specified, HEAD will be used." echo " " echo "Example: " echo " $0 cacheexclude DRUPAL-6--2" echo " " exit 1; fi TAG='HEAD'; if [[ $# > 1 ]]; then TAG=$2; fi # This directory will be created, and then used to house the # new code. dest="./$1/$TAG" mkdir -p $dest cd $dest export CVSROOT=:pserver:$DRUPAL_USER@cvs.drupal.org:/cvs/drupal-contrib cvs login if [[ $# == 1 ]]; then echo "Checking out $1 from HEAD (no tag used)" cvs -z6 checkout -d $1 contributions/modules/$1 else echo "Checking out $1 with revision $TAG" cvs -z6 checkout -r $TAG -d $1 contributions/modules/$1 fi
The main change is this last conditional, which now avoids using -r if no branch is explicitly set.
Pseudo-class and Pseudo-element Reference for QueryPath
Submitted by matt on Sun, 2009-05-24 19:14The CSS 3 Selector specification defines pseudo-classes and pseudo-elements. QueryPath implements as many of these as is possible.
This is a draft reference that will probably be included in QueryPath 2.0. Read on to view the entire document.
SplObjectStorage vs. isset() on array.
<?php /** * Object hashing tests. */ $sos = new SplObjectStorage(); $docs = array(); $iterations = 100000; for ($i = 0; $i < $iterations; ++$i) { $doc = new DOMDocument(); //$doc = new stdClass(); $docs[] = $doc; } $start = $finish = 0; $mem_empty = memory_get_usage(); $start = microtime(TRUE); foreach ($docs as $d) { $sos->attach($d); } $finis = microtime(TRUE); $time_to_fill = $finis - $start; $start = microtime(FALSE); foreach ($docs as $d) { $sos->contains($d); } $finis = microtime(FALSE); $time_to_check = $finis - $start; $
Perl Pie: If you only learn how to do one thing with Perl, this is it.
Submitted by matt on Thu, 2009-05-21 16:20Ever run into that situation where you need to run a find-and-replace inside of multiple documents? Ever wanted to do this with a regular expression, without having to use any complex sed redirection? Here's a pattern for doing this in perl.
No, you don't need to be a Perl guru to use this command. in fact, you don't really need to know Perl at all. If you know regular expressions and can type a command in a shell prompt, you are good to go.
Here's the pattern:
$ perl -pi -e '<some replacement regular expression>' <file or files>
The pattern above is used for running an inline replacement using regular expressions. Given the -pi -e flags, this pattern is sometimes called "perl pie".
Here is an example that replaces MY_NAME with Matt in any files in the current directory that end with the txt extension:
$ perl -pi -e 's/MY_NAME/Matt/g' ./*.txt
When executed, this will seek all files matching ./*.txt and run the replacement pattern s/MY_NAME/Matt/g. This regular expression simply replaces all occurrences of MY_NAME with Matt. This is a simple example, but in your own scripts you can draw freely from the hefty regular expression syntax for which Perl is famous. (Check out man perlre or the wealth of regular expressions websites for more.)
So let's take a look at what happens when the above is run. Here's one of the text files before the above command is executed:
Hello, my name is MY_NAME. Have a nice day.
After running the Perl command, the above file looks like this:
Hello, my name is Matt. Have a nice day.
And that, in a nutshell, is the one line of Perl that you really ought to know.
TweetyPants Video
Submitted by matt on Wed, 2009-05-20 16:03Here's a screencast walk-through of the TweetyPants intelligence analyzer.
Enter a user's screen name and TweetyPants does an analysis of recent tweets by that user. It then rates the user's style, smarts, and shizzle. No, it doesn't do anything useful.
Strange Arrays PHP Puzzler: Dynamic typing, strings, and array indexes
Submitted by matt on Wed, 2009-05-20 12:30Take a look at this snippet of code:
What does it do? By all appearances, it declares an array ($a) with one value in it (first). Then it checks to see if $a's first entry has an index some string.
If it does find such an index, it prints What?. Otherwise the program silently exits.
So what happens when you run this code? If you give it a try, you may be surprised to see that it prints What? every time. How come? After all, we would expect that in order for isset() to return true, we would need an array that looked something like this:
In the case above, we create an array that has an array at the first index position. We could certainly reference $a[0]['some string'] in this example!
But why is the array array('first') evaluating to TRUE?
The reason is somewhat related to an early post here about how PHP deals with characters and strings in array contexts. What appears to be happening is this:
$a[0] returns the string some string. When $a[0]['some string'] is evaluated, PHP converts some string to a numeric value. Effectively, it is evaluated as geshifilter-code'some string'[/geshifilter-code], which will always evaluate to .
So $a[0]['some string'] is essentially performing the same task as $a[0][0]. And in this case, $a[0] is referencing the, and the last [0] is interpreted as being a reference to the first character in the string first
With this behind us, we can now make a trivial change to our sample code and see how this all works:
<?php $a = array('first'); if (isset($a[0]['some string'])) { // print "What?" . PHP_EOL; print $a[0]['some string'] . PHP_EOL; } ?>
Now instead of printing What?, we are going to print the actual value of $a[0]['some string'].
Given what we now know, we should be able to predict the output of the code above. If $a[0]['some string'] is interpreted as $a[0][0], and the second index is calculated against the string in the array, then what we would expect is that the first letter of the first item in the array would be printed.
In fact, that is what happens. Running the code above will output the letter f, which is the first character in $a[0].
Special thanks to Heyrocker for finding this interesting tidbit.
Learning Drupal 6 Module Development
Learning Drupal 6 Module Development
By Matt Butcher
Packt Publishing, 2008
Python LDAP Applications
Around the time the book Mastering OpenLDAP by Matt Butcher was released, Packt published a series of Matt's articles entitled, collectively, Python LDAP Applications.
Packt: Creating Our First Module using Drupal 6 (Part2)
Submitted by matt on Tue, 2009-05-19 08:23
Learning Drupal 6 Module Development
Today Packt has published the second part in my series on creating a first Drupal 6 module. In this installment, I walk though the process of implementing hook_block(). Here's Packt's summary:
In the first part of this 2-part article series we had created a basic module that uses hook_block() to add block content and installed this basic module. In this article by Matt Butcher, we are going to extend the module's functionality. We will add a few new functions that retrieve and format data from Goodreads.
As with the previous part, this has been abstracted from Chapter 2 of Learning Drupal 6 Module Development.
If you missed the first part, start here.
About TweetyPants.com
TweetyPants.com is a fun tool that examines Twitter messages for a particular user and then runs some diagnostics on those messages to determine various characteristics of that user's writing style.

TweetyPants: How smart are your Twitter friends?
Submitted by matt on Mon, 2009-05-18 20:42Are your Twitter friends smarty-pantses? Today I am releasing TweetyPants, a tool that uses QueryPath, OpenAmplify, and Twitter to evaluate a user's posts for three things:
- Style: The more verbose and flowery one's prose is, the higher the score.
- Shizzle: The more slang a post has, the higher the score.
- Smarts: The more educated one's tweets are, the higher the score.
Along with scoring a Twitter account against these three categories, TweetyPants also tells you what that person has been recently tweeting about.
So enter the name of a Twitter friend and see how smart your friends are.
TweetyPants
Drupal Quiz 3.0 Alpha 2 released
Submitted by matt on Fri, 2009-05-15 16:18Drupal Quiz 3.0 Alpha 2 is now released.
The major news for this release is that it has gotten three big new features:
- A new Matching question type has been added.
- A new Short answer question type has been added (and it offers fill-in-the-blank-like functionality)
- QTI (Question & Test Interoperability) XML, version 1.2, can now be imported by Question Import.
Here are some screenshots of the new question types:
Matching
Short answer
Packt: Creating Our First Module using Drupal 6 (Part1)
Submitted by matt on Fri, 2009-05-15 09:54Packt has published an article of mine on beginning Drupal 6 module development. In the article, I cover the process of writing a first module (it's based on Chapter 2 of Learning Drupal 6 Module Development). Here's the intro from the article:
Creating Our First Module using Drupal 6 (Part1)
In this two-part article series by Matt Butcher, we will create our first Drupal module. This article assumes basic knowledge of Drupal module development. Our first module will make use of an existing web service to pull in some XML data, format it, and display it as a block in the site's layout.
We will cover the following topics in this article:
- Creating the .info and .module files
- Creating a new module
- Using basic hooks
- Installing and configuring the module
- Using important Drupal functions
Read it all: http://www.packtpub.com/article/creating-our-first-module-using-drupal6-...
QueryPath: Replacing text in one document with text from another
Submitted by matt on Thu, 2009-05-14 09:43I received a good question in the QueryPath Drupal module's issue queue. Since the answer is not at all specific to Drupal's QueryPath module, and since the question is one that involves a few QueryPath nuances, I thought it would make a good post here.
To frame the question, here's some sample data:
<?php $a = '<div> <label>1</label> <label>2</label> <label>3</label> </div>'; $b = '<div> <p>Second set.</p> <label>a</label> <label>b</label> <label>c</label> </div>'; ?>
The task is to take the labels from $a and write then into $b.
Here's the solution:
<?php require 'QueryPath/QueryPath.php'; $a = '<div> <label>1</label> <label>2</label> <label>3</label> </div>'; $b = '<div> <p>Second set.</p> <label>a</label> <label>b</label> <label>c</label> </div>'; $qpa = qp($a, 'label'); $qpb = qp($b, 'label'); //Might want to check to make sure they are the same length: if ($qpa->size() != $qpb->size()) { // Do something... print "Warning..."; } $i = 0; foreach ($qpb as $label) { $label->text($qpa->branch()->eq($i++)->text()); } $qpb->writeHTML(); ?>
To begin, we create two QueryPath objects -- one pointing to $a, and one pointing to $b. And in both, we search for just the labels, since that is what we are going to replace.
We make sure that there are the same number of labels in each. We should be able to drop this section without causing QueryPath to err, but we do want to do this check.
Most of the "work" happens on these three lines:
foreach ($qpb as $label) { $label->text($qpa->branch()->eq($i++)->text()); }
This loops through all of the labels in $qpb and sets the label's text to whatever is in the corresponding indexed position in $qpa.
The latter part is done with $qpa->branch()->eq($i++)->text(). Why do we branch() here? Because we don't want to modify the $qpa QueryPath, and eq() is a "destructive" function. Branching clones the $qpa QueryPath object. When we run eq(), only the branched version is modified. This prevents us from having to do anything fancy with queries. The eq() function selects just the label at the given index. Then we fetch that label's text with text().
The output of this code will look something like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <html><body><div> <p>Second set.</p> <label>1</label> <label>2</label> <label>3</label> </div></body></html>
Using QueryPath to interact with a SQL database, Part 3
Submitted by matt on Wed, 2009-05-13 21:54This is the third part in a series on how to use QueryPath to work with databases. In this part, we will see how the QPDB extension to QueryPath can be used to extract data from an XML or HTML document and store it in a database.
In the first part of this series, we looked at the basics of QueryPath and databases. There we connected to a database and created some new tables.
In the second part we took a detailed look at the process of retrieving data from the database and then using that data within QueryPath. Specifically, we selected data from a database and then merged it into existing documents and templates.
Now we will look at the opposite process. We will extract data from an XML or HTML document and insert it into a database.
Importing Data
The focus of these articles has thus far been on taking information from a database and putting it into an HTML document. But QueryPath can be used the other way. It can query a document and place that data into a database. In this article, we will see how.
Creating Tables
Our example document will be an RSS 2.0 feed. First, let's set up a very simple database for storing information. We will need two tables: channel to store information about a channel and item to store information about each item in a feed.
The two tables will be created with SQL like this:
CREATE TABLE IF NOT EXISTS channel (channel_id INTEGER PRIMARY KEY AUTOINCREMENT, title, link); CREATE TABLE IF NOT EXISTS item (channel_id INTEGER, guid, title, link, description);
Keep in mind that we are working with SQLite. Other databases, such as MySQL, will have different data definition syntax.
These are both very minimalist tables. Only a fraction of the information in the original RSS document is going to be stored here. But for our purposes, this will do. The channel table will contain three fields: channel_id, which is just an auto-incremented integer, title and link. These last two will come from the RSS channel information.
The second table, item, will be used to store an RSS item. Its channel_id field will be used to tie an item back to the original channel. The other fields will map to information from the RSS feed.
As we did before, we can execute these queries inside of QueryPath like this:
<?php require 'QueryPath/QueryPath.php'; require 'QueryPath/Extension/QPDB.php'; QPDB::baseDB('sqlite:./qpExample.db'); qp() ->query('CREATE TABLE IF NOT EXISTS channel (channel_id INTEGER PRIMARY KEY AUTOINCREMENT, title, link)') ->query('CREATE TABLE IF NOT EXISTS item (channel_id INTEGER, guid, title, link, description)'); exit; ?>
Now we have our tables ready.
Inserting Channel Information
Once we have some tables defined we can set up a simple tool for reading the remote RSS feed and injecting the contents into our newly defined tables.
Let's begin with a simple script that adds a channel to the database:
<?php require 'QueryPath/QueryPath.php'; require 'QueryPath/Extension/QPDB.php'; QPDB::baseDB('sqlite:./qpExample.db'); $url = 'http://localhost:8888/drupal/rss.xml'; $qp = qp($url); $values[':title'] = $qp->find('channel>title')->text(); $values[':link'] = $qp->siblings('link')->text(); $qp->query('INSERT INTO channel (title, link) VALUES (:title, :link)', $values); ?>
This short script does a lot of things. After requiring the two necessary libraries and setting the database connection, the script defines the URL ($url) that points to the RSS feed.
Next, a new QueryPath object is created and stored in $qp. As soon as we pass QueryPath the URL, it will retrieve and parse the remote document. In this case, the document is a standard RSS 2.0 feed.
Our plan is to query that RSS document and then store the retrieved information in our newly defined database. In the code above, we first retrieve the two values we want, storing them in $values. Note that these are keyed using the PDO placeholder syntax (:title and :link).
$values[':title'] = $qp->find('channel>title')->text(); $values[':link'] = $qp->siblings('link')->text();
Next, we use $qp->query() to insert the results into the database, passing in $values as the substitution array.
By the end of the script above, the main channel data will be written to the database.
Getting an Insert ID
One thing we often need to do after an insert is get the auto-generated ID back from the database. Most databases support this operation. In our application we need to do this because channel_id, our primary key, is a parameter that we will need when inserting into the next table.
In QueryPath 1.2, retrieving the last insert ID required just a little bit of fancy footwork. It was done like this:
<?php $channel_id = $qp->getBaseDB()->lastInsertId(); ?>
This essentially gets the PDO database connection (with getBaseDB()) and then calls the PDO function lastInsertId() on that.
In QueryPath 1.3, this has been streamlined slightly, and you will be able to do something like this:
<?php $channel_id = $qp->getLastInsertID(); ?>
Now that we have the last insert ID, we have all we need to iterate through the RSS document and add items to our item table.
Iterating through Matches and Inserting Data
An RSS document has multiple item elements. What we want to do is iterate through those items and store some of the contained data inside of our newly defined database.
Let's continue building on the previous examples. This may make our code snippet longer, but it will help keep things in context.
<?php require 'QueryPath/QueryPath.php'; require 'QueryPath/Extension/QPDB.php'; QPDB::baseDB('sqlite:./qpExample.db'); $url = 'http://localhost:8888/drupal/rss.xml'; $qp = qp($url); $values[':title'] = $qp->find('channel>title')->text(); $values[':link'] = $qp->siblings('link')->text(); $qp->query('INSERT INTO channel (title, link) VALUES (:title, :link)', $values); // Get the channel ID // In QueryPath <= 1.2: //$channel_id = $qp->getBaseDB()->lastInsertId(); // In QueryPath 1.3+ $channel_id = $qp->getLastInsertID(); $sql = 'INSERT INTO item (channel_id, guid, title, link, description) VALUES (:channel_id, :guid, :title, :link, :description)'; foreach($qp->find('item') as $item) { $data = array( ':channel_id' => $channel_id, ':title' => $item->children('title')->text(), ':link' => $item->siblings('link')->text(), ':description' => $item->siblings('description')->text(), ':guid' => $item->siblings('guid')->text(), ); // Notice that we execute from $qp, not $item. $qp->query($sql, $data)->doneWithQuery(); } ?>
The new code begins abut halfway down, where $sql is defined. That SQL template will be used for each of our inserts.
The foreach loop beneath is the most important part. Here, QueryPath iterates through each <item/> element in the RSS feed. Each item is wrapped in a QueryPath object assigned to $item.
Inside the loop, we first build up a list of values to insert. These are stored in $data. Since $data is going to used to substitute values into the SQL, we use the placeholder syntax, :channel_id, :title, and so on.
The $channel_id will be the same for every item, since they all belong to the same RSS channel. All other values come from the RSS item, and so we query the item to find the title, link, description, and GUID.
Finally, we perform the INSERT:
$qp->query($sql, $data)->doneWithQuery();
As a slight performance optimization, we run the query using the $qp object instead of the $item object.
That's all there is to our RSS importing script. Now our new database should contain information retrieved from the RSS feed.
Retrieving the Data
Now that we have the data in the database, we can retrieve it using the methods we studied in the previous part of this series. For example, let's add one more part to our script: Let's dump a list of item titles into an HTML document.
<?php require 'QueryPath/QueryPath.php'; require 'QueryPath/Extension/QPDB.php'; QPDB::baseDB('sqlite:./qpExample.db'); qp(QueryPath::HTML_STUB, 'body') ->query('SELECT title FROM item') ->withEachRow() ->appendColumn('title', '<p/>') ->doneWithQuery() ->writeHTML(); ?>
This little snippet of code will select all titles from the item table. Then each title will be wrapped in a paragraph and added to the stub HTML document. Finally, the document will be written out. This will produce HTML that looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta>
<title>Untitled</title>
</head>
<body>
<p>Chicago Bears players warm up to Jay Cutler</p>
<p>Broncos trade Cutler to Bears</p>
<p>Thanks to benefactors, fountain will offer summer respite</p>
<p>Learning Drupal 6 Module Development</p>
<p>Story with lots of tags</p>
<p>A Sticky Node</p>
<p>De natura mentis humanae: quod ipsa sit notior quàm corpus.</p>
<p>De iis quae in dubium revocari possunt.</p>
<p>This is a node</p>
</body>
</html>At this point in the article, we have created some tables, read a remote document into these tables, and then retrieved data back out of the tables.
Conclusion
This is the final article in our three-part series on using QueryPath to work with databases. In the first article, we performed basic database tasks like connecting and running basic queries. The second article focused on using SELECT to retrieve data, and also using QueryPath's built in tools for working with the returned results. This third part has focused on using QueryPath's tools to retrieve information from an XML document, and then store that information inside of a database.
The major features of QueryPath's QPDB extension have been explained and used in this series, but there is still more. You can, for example, share a database connection with non-QueryPath tools, or work more directly with the PDO result set. But this series has, I hope, equipped you to make use of the most commonly used QPDB features.
Favorite Programmer Cartoons
Submitted by matt on Tue, 2009-05-12 16:02The "Ballmer Peak" was explained to me at DrupalCon DC earlier this year... moments after I checked in Kitten.module.

A friend of mine sent me a link to a hilarious collection of programmer-oriented comics and cartoons, including the Ballmer Peak, Bobby Tables, the Dilbert Token Ring, and a few dozen that I hadn't seen before.
Check out the entire collection:
http://stackoverflow.com/questions/84556/whats-your-favorite-programmer-...








Recent comments
1 day 19 hours ago
2 days 19 hours ago
3 days 5 hours ago
3 days 5 hours ago
4 days 12 hours ago
5 days 7 hours ago
6 days 8 hours ago
1 week 5 hours ago
1 week 2 days ago
2 weeks 1 day ago