Loading Drupal Nodes into MongoDB with Drush

Mar 26 2010

To do some prototyping, I wanted to load all 32k of our Drupal nodes into MongoDB. At first, the thought of doing this seemed daunting. Then I realized that with Drush I could use a very simple script to perform an entire migration.

The result: With a 14-line PHP script, I transferred all of the nodes (CCK, taxonomy, and all) without a glitch.

Read on for the full explanation.

The technologies

Nodes: Drupal stores its text content in nodes, where a node is a complex data structure that (in SQL) spans any number of tables. Writing SQL queries to retrieve nodes is anything but simple; however, Drupal provides an API that makes this pretty easy.

MongoDB: MongoDB is a schemaless document database. A little while back, I wrote a quick article explaining five things every PHP developer should know about MongoDB. I also wrote a quick getting started guide for Mac users.

Drush: Drush is the "Drupal Shell" -- a command-line interface to Drupal. Among other things, it allows you to run ad hoc code from the command line, and have that code execute in Drupal.

How to load nodes into MongoDB

All I wanted to do was load a copy of all of my nodes into MongoDB. At some later point, I may want to clean up the data, but for now all I need is an object representation of all nodes.

It turns out that this is pretty easy to do, because...

  • Drupal's node_load() function can load a node as an object
  • Drush can let me write a quick import/export script to run on the command line
  • And MongoDB can store just about any old array of primitives
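To make the first and last points concrete, here is a minimal sketch of what the (array) cast does to a node-like object. The $node below is a hand-built stand-in rather than a real node_load() result, and its field values are made up for illustration. It runs without Drupal or MongoDB.

```php
<?php
// Hypothetical stand-in for a node_load() result; the values are
// illustrative only.
$node = new stdClass();
$node->nid = 42;
$node->title = 'Example node';

// A nested object, similar in spirit to a taxonomy term on a node.
$term = new stdClass();
$term->tid = 7;
$term->name = 'Example term';
$node->taxonomy = array(7 => $term);

// The cast turns the object's public properties into array keys...
$doc = (array) $node;
print implode(', ', array_keys($doc)) . PHP_EOL;

// ...but it is shallow: nested objects are still objects afterwards.
print gettype($doc['taxonomy'][7]) . PHP_EOL;
```

The cast only touches the top level, so anything nested inside the node is handed to the MongoDB driver as-is.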

So here's the code:

<?php
// Connect
$mongo = new Mongo();

// Get the database (it is created automatically)
$db = $mongo->testDatabase;

// Get the collection for nodes (it is created automatically)
$collection = $db->nodes;

// Get a listing of all of the node IDs
$r = db_query('SELECT nid FROM {node}');

// Loop through all of the nodes...
while($row = db_fetch_object($r)) {
  print "Writing node $row->nid\n";

  // Load each node and convert it to an array.
  $node = (array)node_load($row->nid);

  // Store the node in MongoDB
  $collection->save($node);
}
?>
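One caveat with the script above: as far as I can tell, save() inserts a brand-new document with a generated ObjectId whenever the array has no _id, so running the import a second time would probably duplicate every node. A possible workaround is to reuse the node ID as the document's _id. The helper below is a sketch of that idea; node_to_document() is a hypothetical name, not part of Drupal or the MongoDB driver, and the $node is a stand-in so the snippet runs on its own.

```php
<?php
/**
 * Hypothetical helper (not part of the original script): convert a node
 * object to an array and reuse the node ID as the MongoDB _id.
 */
function node_to_document($node) {
  $doc = (array) $node;
  // Cast to int so the _id has a consistent type across all documents.
  $doc['_id'] = (int) $node->nid;
  return $doc;
}

// Stand-in for a loaded node, so the sketch runs without Drupal.
$node = new stdClass();
$node->nid = '42';
$node->title = 'Example node';

$doc = node_to_document($node);
print $doc['_id'] . PHP_EOL;
```

Inside the import loop, the result of node_to_document(node_load($row->nid)) would be passed to $collection->save(), which should then update the existing document on a re-run instead of adding a new one.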

To execute this script inside of a Drupal context, I just ran Drush:

$ drush script mongoimport.php

I let that run (it takes some time), and then tested it from the mongo command line client:

> use testDatabase;
> db.nodes.find( {title: /about/i} , {title: true}).limit(4);

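The query above returns the first four documents in the nodes collection whose title contains "about" (case-insensitive). The second argument restricts the output to the title, plus the _id that MongoDB includes by default. The shell prints each document in its JSON-like notation, so the output looks like this: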

{ "_id" : ObjectId("4ba92e797f8b9a813c5c0000"), "title" : "Myths about Back Pain and Neck Pain" }
{ "_id" : ObjectId("4ba92e797f8b9a813c700000"), "title" : "Myths about Causes of Back Pain and Back Problems" }
{ "_id" : ObjectId("4ba92e797f8b9a813c710000"), "title" : "Myths about Treatment for Back Pain and Back Problems" }
{ "_id" : ObjectId("4ba92e7a7f8b9a813cb90000"), "title" : "Insights and Advice About Herniated Discs" }

Of course, I can run the same kind of query from PHP and get the results back as PHP arrays:

<?php
// Connect
$mongo = new Mongo();

// Write our search filter (same as shell example above)
$filter = array(
  'title' => new MongoRegex('/about/i'),
);

// Run the query against the database the import script wrote to,
// getting only 5 results.
$res = $mongo->testDatabase->nodes->find($filter)->limit(5);

// Loop through and print the title of each article.
foreach ($res as $row) {
  print $row['title'] . PHP_EOL;
}
?>

The data stored in MongoDB is essentially the same data you would get from calling (array)node_load(). (The exceptions are cases where a node contains non-primitive values.)
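If those non-primitive values turn out to be a problem, one option is to flatten nested objects into arrays before saving. The function below is a sketch of that approach, not something the import script above does; to_plain_array() is a hypothetical helper name, and the sample node is hand-built for illustration.

```php
<?php
/**
 * Hypothetical helper: recursively convert objects to arrays.
 * Only public properties are kept; non-object, non-array values are
 * returned unchanged.
 */
function to_plain_array($value) {
  if (is_object($value)) {
    $value = get_object_vars($value);
  }
  if (is_array($value)) {
    foreach ($value as $key => $item) {
      $value[$key] = to_plain_array($item);
    }
  }
  return $value;
}

// Stand-in node with a nested object, so the sketch runs without Drupal.
$term = new stdClass();
$term->tid = 7;
$term->name = 'Example term';

$node = new stdClass();
$node->nid = 42;
$node->taxonomy = array(7 => $term);

$doc = to_plain_array($node);
print gettype($doc['taxonomy'][7]) . PHP_EOL;
```

Unlike a plain (array) cast, this walks the whole structure, so the nested term ends up as an array too.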

The copy of the nodes that we've stored using this method isn't perfect, and it would certainly be nice to clean it up. But for the most part, this simple method gives a rough-and-ready copy of all of the node content. I've already been able to use it as a quick but capable tool for querying our data.


