programming

A QueryPath script for checking on a sitemap

I've been tuning our sitemap during the last few months, and one thing I needed was a quick tool to check on the effectiveness of various sitemap generation strategies.

To do this, I wrote a quick QueryPath script (see a full-sized image of the output). The script is explained below.

The code is pretty straightforward. It retrieves the sitemap from a URL, parses its entries, and sorts them by priority. Finally, it displays the top 100 entries. I've tested it on sitemaps with over 20,000 items. While it is a little slow on such a large document, it works fine.

#!/usr/bin/env php
<?php
require 'QueryPath/QueryPath.php';

define('MAX_ITEMS', 100);

$sitemap = 'http://example.com/sitemap.xml';

$urls = array();
print "Parsing sitemap...\n";
$qp = qp($sitemap, ':root>url>loc');
$size = $qp->size();
$max = $size > MAX_ITEMS ? MAX_ITEMS : $size;
printf("Found %d entries; printing top %d\n\n", $size, $max);

try {
  // Map each URL to the priority declared alongside it.
  foreach ($qp as $url) {
    $loc = $url->text();
    $score = $url->nextAll('priority')->text();
    // Store the score as a number so the sort below is numeric.
    $urls[$loc] = (float) $score;
  }
} catch (Exception $e) {
  print $e->getMessage();
}

// Sort by score, highest first, keeping each URL attached to its score.
arsort($urls);

$filter = "%d: %0.5f  %s\n";

$i = 0; // Rank counter.
foreach ($urls as $uri => $score) {
  if ($i++ == $max) break;
  printf($filter, $i, $score, $uri);
}
?>

Basically, the script above simply fetches all of the URLs out of the sitemap, and then sorts them by their corresponding score. Only the top MAX_ITEMS (100) are shown.
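
The same parse, sort, and truncate steps can also be sketched without QueryPath, using PHP's built-in SimpleXML. This is only an illustration: the top_sitemap_entries() function and the inline sample sitemap are hypothetical, and entries with no priority are assumed to take the sitemap protocol's default of 0.5.

```php
<?php
// Hypothetical sample data; a real sitemap would be loaded from a URL or file.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc><priority>1.0</priority></url>
  <url><loc>http://example.com/about</loc><priority>0.3</priority></url>
  <url><loc>http://example.com/blog</loc><priority>0.8</priority></url>
  <url><loc>http://example.com/contact</loc></url>
</urlset>
XML;

/**
 * Return up to $max sitemap entries as loc => priority, highest first.
 */
function top_sitemap_entries(string $xml, int $max): array {
  $doc = simplexml_load_string($xml);
  if ($doc === false) {
    throw new RuntimeException('Could not parse sitemap XML.');
  }

  $scores = array();
  foreach ($doc->url as $url) {
    // Assumption: fall back to the protocol default when priority is absent.
    $priority = isset($url->priority) ? (float) $url->priority : 0.5;
    $scores[(string) $url->loc] = $priority;
  }

  // Sort by priority, highest first, preserving the URL keys.
  arsort($scores);
  return array_slice($scores, 0, $max, true);
}

foreach (top_sitemap_entries($xml, 2) as $loc => $priority) {
  printf("%0.5f  %s\n", $priority, $loc);
}
```

Sorting a loc => priority array with arsort() keeps each URL attached to its score, which is the same trick the script above relies on.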

LibRIS: A PHP library for RIS parsing and writing

LibRIS is a library for parsing and writing RIS data.

Learn all about it at the official GitHub repository.

RIS is a data file format for handling reference metadata for scholarly resources. It is used by Reference Manager, EndNote, and other such tools. For that reason, it is broadly supported by online scholar-centered sites.

This library provides a simple interface for parsing and writing RIS data for bibliography management.
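
To give a feel for the format, here is a minimal hand-rolled sketch of reading RIS records. It is not LibRIS's API: the parse_ris() function and the sample reference are made up for illustration, and the sketch assumes each field sits on one line in the usual "two-character tag, two spaces, hyphen, space, value" layout, with ER closing a record.

```php
<?php
// Hypothetical sample record, not a real reference.
$ris = <<<RIS
TY  - JOUR
AU  - Doe, Jane
AU  - Roe, Richard
TI  - An Example Title
PY  - 2009
ER  - 
RIS;

/**
 * Parse RIS text into a list of records.
 * Each record maps a tag (e.g. "AU") to the list of values seen for it.
 */
function parse_ris(string $text): array {
  $records = array();
  $current = array();

  foreach (preg_split('/\R/', $text) as $line) {
    // Assumption: lines that do not look like "XX  - value" are skipped.
    if (!preg_match('/^([A-Z][A-Z0-9])  - ?(.*)$/', $line, $m)) {
      continue;
    }
    list(, $tag, $value) = $m;

    if ($tag === 'ER') {
      // End of record: store it and start a fresh one.
      $records[] = $current;
      $current = array();
      continue;
    }

    // Tags such as AU may repeat, so every tag holds a list of values.
    $current[$tag][] = trim($value);
  }

  return $records;
}

$records = parse_ris($ris);
print implode('; ', $records[0]['AU']) . "\n";
```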

OpenAmplify Drupal Series: Part 2 - Building a Mini Portal

The second in my three-part series on Drupal and OpenAmplify has been published on their community site. If you missed the first part, you may want to start there. Part three, coming soon, will cover the API, and will focus on development instead of configuration.

In part two, I walk through the process of building a "mini portal" by taking semantic information returned from an OpenAmplify analysis of a node, and using that information in conjunction with other web services. For this demonstration, I released a new version of the module, and added support for Shopping.Com and Bloglines, both of which can return some impressively rich content.

QueryPath on WebMonkey

It just came to my attention that a WebMonkey article (Parsing HTML? There's an App for That) from a few months ago suggested using QueryPath as an alternative to attempting to parse HTML by hand.

Appropriately, last week I wrote a QueryPath script to analyze a site and extract all links so that I could feed them to Siege and simulate something like a real load against a server. It's nice to be able to easily extract data from HTML.
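
That script is not reproduced here, but the general idea of pulling links out of a page can be sketched with PHP's built-in DOM extension rather than QueryPath. The extract_links() function and the sample markup below are hypothetical, and the sketch simply skips in-page anchors and duplicates.

```php
<?php
/**
 * Collect the unique href values of all <a> elements in an HTML string.
 */
function extract_links(string $html): array {
  $doc = new DOMDocument();

  // Real-world markup is rarely valid; collect parser warnings quietly.
  $previous = libxml_use_internal_errors(true);
  $doc->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $links = array();
  foreach ($doc->getElementsByTagName('a') as $anchor) {
    $href = trim($anchor->getAttribute('href'));
    // Skip anchors with no href and links that only jump within the page.
    if ($href !== '' && $href[0] !== '#') {
      $links[] = $href;
    }
  }

  return array_values(array_unique($links));
}

// Hypothetical sample page.
$html = '<html><body>'
  . '<a href="/one">One</a> <a href="/two">Two</a> '
  . '<a href="/one">One again</a> <a href="#top">Top</a>'
  . '</body></html>';

// One link per line, ready to be written to a file.
print implode("\n", extract_links($html)) . "\n";
```

Siege can read its targets from a file with one URL per line, so relative links like these would still need to be resolved against the site's base URL before being used that way.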

OpenAmplify Drupal Series: Part 1 - The Amplify Module

Over at OpenAmplify's Community site, they are running Part 1 of a three-part series I've written about using OpenAmplify with Drupal.

The first part covers the basics of using Acquia Drupal and the Amplify module to perform semantic analysis of your content.

Acquia Webinar: "Playing Nicely with Others"

In our webinar Playing Nicely With Others: Integrating Drupal with Third-Party Data, Ken, George, Larry, and I talk about integrating various web services with Drupal. We talk about SOAP, content importing, digital asset management systems, and QueryPath (surprisingly, I'm not the one plugging QueryPath in this vid).

Thanks to Acquia for doing a fantastic job putting together their webinar series.

Fortissimo and Pilaster: Two projects

I have released two projects today:

  • Fortissimo: A PHP framework with a twist. It's scalable, it's not MVC, it's fast, and it's NSFW!
  • Pilaster: A pure PHP document database that provides similar services to MongoDB or CouchDB -- only without the server.

Both are still under heavy development, but they are now at the point where others can start testing them and playing with them.

PHP Developer's Snow Leopard Upgrade Notes

I'm upgrading to Snow Leopard, and I intend to switch from MAMP to the built-in PHP/Apache 2 configuration. As a PHP developer, there are several notable things that I wanted to track as I performed my upgrades. This article tracks those changes.

My OS X 10.5 toolchain for PHP was this:

  • PHP 5.2.6 (MAMP)
  • Apache 2 (MAMP)
  • MySQL 5 (MAMP)
  • TextMate
  • Git
  • XDebug
  • Several PEAR packages installed into MAMP's PHP 5, including PHPUnit, PhpDocumentor, Phing, and XDebug

One of the desired outcomes was to switch to the OS X version of PHP and Apache, which is tenable now that PHP is more robust (and now that I know how to use PEAR with the OS X version). It's also desirable because Snow Leopard is now running PHP 5.3. Here are my notes on the upgrade.

Update: Problems with PHPUnit and Phing.

QueryPath Performance Optimizations on Reduncery

Continuing a trend on the non-evilness of optimization, this article discusses some methods of improving performance in QueryPath.

Early this week, a Twitter analysis tool called Reduncery was launched by a friend of mine. Reduncery calculates how much of a "redunce" a particular user is -- that is, what percentage of a user's tweets are retweets (RT). It can also calculate how ineffective it is for one person to retweet another. In this case, it calculates the overlap in the followers of the original tweeter with the followers of the retweeter. In what follows, we will look at the ways Reduncery optimizes QueryPath to keep page load times down.
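
The two metrics are simple ratios. The sketch below is not Reduncery's code: the function names are hypothetical, a tweet is assumed to count as a retweet when its text starts with "RT ", and the overlap is assumed to be measured as the share of the retweeter's followers who already follow the original tweeter.

```php
<?php
/**
 * Percentage of tweets that are retweets.
 * Assumption: a retweet is any tweet whose text starts with "RT ".
 */
function retweet_percentage(array $tweets): float {
  if (count($tweets) === 0) {
    return 0.0;
  }
  $retweets = 0;
  foreach ($tweets as $text) {
    if (strncmp(ltrim($text), 'RT ', 3) === 0) {
      $retweets++;
    }
  }
  return 100.0 * $retweets / count($tweets);
}

/**
 * Percentage of the retweeter's followers who already follow the
 * original tweeter. A high value suggests the retweet reached few new people.
 */
function follower_overlap(array $originalFollowers, array $retweeterFollowers): float {
  $retweeterFollowers = array_unique($retweeterFollowers);
  if (count($retweeterFollowers) === 0) {
    return 0.0;
  }
  $shared = array_intersect($retweeterFollowers, $originalFollowers);
  return 100.0 * count($shared) / count($retweeterFollowers);
}

// Hypothetical sample data.
$tweets = array('RT @a: hi', 'hello', 'RT @b: yo', 'lunch');
printf("Retweets: %0.1f%%\n", retweet_percentage($tweets));
printf("Overlap: %0.1f%%\n", follower_overlap(
  array('ann', 'bob', 'cy'),
  array('bob', 'cy', 'dee', 'eve')
));
```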

"The Fallacy of Premature Optimization": A must-read

Sir Tony Hoare famously remarked that "premature optimization is the root of all evil." Have we let this view (or a misapplication or misinterpretation of it) dictate too much of our programming methodology? In his article The Fallacy of Premature Optimization, Randall Hyde argues that we have indeed.

I feel it my philosophic duty to point out that there is in fact no fallacy in the statement. But I think that Hyde makes a very good argument, terminological shortcomings aside. Of particular interest are his nine observations in the middle.
