Using QueryPath to interact with a SQL database, Part 3

May 13 2009

This is the third part in a series on how to use QueryPath to work with databases. In this part, we will see how the QPDB extension to QueryPath can be used to extract data from an XML or HTML document and store it in a database.

In the first part of this series, we looked at the basics of QueryPath and databases. There we connected to a database and created some new tables.

In the second part we took a detailed look at the process of retrieving data from the database and then using that data within QueryPath. Specifically, we selected data from a database and then merged it into existing documents and templates.

Now we will look at the opposite process. We will extract data from an XML or HTML document and insert it into a database.

Importing Data

The focus of these articles has thus far been on taking information from a database and putting it into an HTML document. But QueryPath can be used the other way. It can query a document and place that data into a database. In this article, we will see how.

Creating Tables

Our example document will be an RSS 2.0 feed. First, let's set up a very simple database for storing information. We will need two tables: channel to store information about a channel and item to store information about each item in a feed.

The two tables will be created with SQL like this:

CREATE TABLE IF NOT EXISTS channel
  (channel_id INTEGER PRIMARY KEY AUTOINCREMENT, title, link);
CREATE TABLE IF NOT EXISTS item 
  (channel_id INTEGER, guid, title, link, description);

Keep in mind that we are working with SQLite. Other databases, such as MySQL, will have different data definition syntax.

These are both very minimalist tables. Only a fraction of the information in the original RSS document is going to be stored here. But for our purposes, this will do. The channel table will contain three fields: channel_id, which is just an auto-incremented integer, title and link. These last two will come from the RSS channel information.

The second table, item, will be used to store an RSS item. Its channel_id field will be used to tie an item back to the original channel. The other fields will map to information from the RSS feed.

As we did before, we can execute these queries inside of QueryPath like this:

<?php
require 'QueryPath/QueryPath.php';
require 'QueryPath/Extension/QPDB.php';

QPDB::baseDB('sqlite:./qpExample.db');


qp()
  ->query('CREATE TABLE IF NOT EXISTS channel (channel_id INTEGER PRIMARY KEY AUTOINCREMENT, title, link)')
  ->query('CREATE TABLE IF NOT EXISTS item (channel_id INTEGER, guid, title, link, description)');
exit;
?>

Now we have our tables ready.

Inserting Channel Information

Once we have some tables defined we can set up a simple tool for reading the remote RSS feed and injecting the contents into our newly defined tables.

Let's begin with a simple script that adds a channel to the database:

<?php
require 'QueryPath/QueryPath.php';
require 'QueryPath/Extension/QPDB.php';

QPDB::baseDB('sqlite:./qpExample.db');

$url = 'http://localhost:8888/drupal/rss.xml';
$qp = qp($url);

$values[':title'] = $qp->find('channel>title')->text();
$values[':link'] = $qp->siblings('link')->text();
$qp->query('INSERT INTO channel (title, link) VALUES (:title, :link)', $values);
?>

This short script does a lot of things. After requiring the two necessary libraries and setting the database connection, the script defines the URL ($url) that points to the RSS feed.

Next, a new QueryPath object is created and stored in $qp. As soon as we pass QueryPath the URL, it will retrieve and parse the remote document. In this case, the document is a standard RSS 2.0 feed.

Our plan is to query that RSS document and then store the retrieved information in our newly defined database. In the code above, we first retrieve the two values we want, storing them in $values. Note that these are keyed using the PDO placeholder syntax (:title and :link).

$values[':title'] = $qp->find('channel>title')->text();
$values[':link'] = $qp->siblings('link')->text();

Next, we use $qp->query() to insert the results into the database, passing in $values as the substitution array.

By the end of the script above, the main channel data will be written to the database.

Getting an Insert ID

One thing we often need to do after an insert is get the auto-generated ID back from the database. Most databases support this operation. In our application we need to do this because channel_id, our primary key, is a parameter that we will need when inserting into the next table.

In QueryPath 1.2, retrieving the last insert ID required just a little bit of fancy footwork. It was done like this:

$channel_id = $qp->getBaseDB()->lastInsertId();
?>~~~


This essentially gets the PDO database connection (with <code>getBaseDB()</code>) and then calls the PDO function <code>lastInsertId()</code> on that.

In QueryPath 1.3, this has been streamlined slightly, and you will be able to do something like this:

~~~ php<?php
$channel_id = $qp->getLastInsertID();
?>~~~


Now that we have the last insert ID, we have all we need to iterate through the RSS document and add items to our <code>item</code> table.

<h3>Iterating through Matches and Inserting Data</h3>

An RSS document has multiple <code>item</code> elements. What we want to do is iterate through those items and store some of the contained data inside of our newly defined database.

Let's continue building on the previous examples. This may make our code snippet longer, but it will help keep things in context.


~~~ php<?php
require 'QueryPath/QueryPath.php';
require 'QueryPath/Extension/QPDB.php';

QPDB::baseDB('sqlite:./qpExample.db');

$url = 'http://localhost:8888/drupal/rss.xml';
$qp = qp($url);
$values[':title'] = $qp->find('channel>title')->text();
$values[':link'] = $qp->siblings('link')->text();
$qp->query('INSERT INTO channel (title, link) VALUES (:title, :link)', $values);

// Get the channel ID

// In QueryPath <= 1.2:
//$channel_id = $qp->getBaseDB()->lastInsertId();

// In QueryPath 1.3+
$channel_id = $qp->getLastInsertID();

$sql = 'INSERT INTO item (channel_id, guid, title, link, description) 
  VALUES (:channel_id, :guid, :title, :link, :description)';

foreach($qp->find('item') as $item) {
  $data = array(
    ':channel_id' => $channel_id,
    ':title' => $item->children('title')->text(),
    ':link' => $item->siblings('link')->text(),
    ':description' => $item->siblings('description')->text(),
    ':guid' => $item->siblings('guid')->text(),
  );
  // Notice that we execute from $qp, not $item.
  $qp->query($sql, $data)->doneWithQuery();
}
?>

The new code begins abut halfway down, where $sql is defined. That SQL template will be used for each of our inserts.

The foreach loop beneath is the most important part. Here, QueryPath iterates through each element in the RSS feed. Each item is wrapped in a QueryPath object assigned to $item.

Inside the loop, we first build up a list of values to insert. These are stored in $data. Since $data is going to used to substitute values into the SQL, we use the placeholder syntax, :channel_id, :title, and so on.

The $channel_id will be the same for every item, since they all belong to the same RSS channel. All other values come from the RSS item, and so we query the item to find the title, link, description, and GUID.

Finally, we perform the INSERT:

$qp->query($sql, $data)->doneWithQuery();

As a slight performance optimization, we run the query using the $qp object instead of the $item object.

That's all there is to our RSS importing script. Now our new database should contain information retrieved from the RSS feed.

Retrieving the Data

Now that we have the data in the database, we can retrieve it using the methods we studied in the previous part of this series. For example, let's add one more part to our script: Let's dump a list of item titles into an HTML document.

<?php
require 'QueryPath/QueryPath.php';
require 'QueryPath/Extension/QPDB.php';

QPDB::baseDB('sqlite:./qpExample.db');

qp(QueryPath::HTML_STUB, 'body')
  ->query('SELECT title FROM item')
  ->withEachRow()
  ->appendColumn('title', '<p/>')
  ->doneWithQuery()
  ->writeHTML();
?>

This little snippet of code will select all titles from the item table. Then each title will be wrapped in a paragraph and added to the stub HTML document. Finally, the document will be written out. This will produce HTML that looks like this:

  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta>
        <title>Untitled</title>
    </head>
    <body>
      <p>Chicago Bears players warm up to Jay Cutler</p>
      <p>Broncos trade Cutler to Bears</p>
      <p>Thanks to benefactors, fountain will offer summer respite</p>
      <p>Learning Drupal 6 Module Development</p>
      <p>Story with lots of tags</p>
      <p>A Sticky Node</p>
      <p>De natura mentis humanae: quod ipsa sit notior qu&agrave;m corpus.</p>
      <p>De iis quae in dubium revocari possunt.</p>
      <p>This is a node</p>
    </body>
    </html>

At this point in the article, we have created some tables, read a remote document into these tables, and then retrieved data back out of the tables.

Conclusion

This is the final article in our three-part series on using QueryPath to work with databases. In the first article, we performed basic database tasks like connecting and running basic queries. The second article focused on using SELECT to retrieve data, and also using QueryPath's built in tools for working with the returned results. This third part has focused on using QueryPath's tools to retrieve information from an XML document, and then store that information inside of a database.

The major features of QueryPath's QPDB extension have been explained and used in this series, but there is still more. You can, for example, share a database connection with non-QueryPath tools, or work more directly with the PDO result set. But this series has, I hope, equipped you to make use of the most commonly used QPDB features. <!--break-->