By Matt Butcher
querypath
A QueryPath script for checking on a sitemap
Submitted by matt on Mon, 2010-02-15 17:23
I've been tuning our sitemap over the last few months, and one thing I needed was a quick tool to check the effectiveness of various sitemap generation strategies.
To do this, I wrote a quick QueryPath script (see a full-sized image of the output). The script is explained below.
The code is pretty straightforward. It simply retrieves a URL, parses the sitemap contents, and then sorts them. Finally, it displays the top 100 entries. I've tested it on sitemaps with over 20,000 items. While it is a little slow on such a large document, it works fine.
#!/usr/bin/env php
<?php
require 'QueryPath/QueryPath.php';

define('MAX_ITEMS', 100);

$sitemap = 'http://example.com/sitemap.xml';
$urls = array();

print "Parsing sitemap...\n";
// Select every <loc> element inside a <url> under the root element.
$qp = qp($sitemap, ':root>url>loc');
$size = $qp->size();
$max = $size > MAX_ITEMS ? MAX_ITEMS : $size;
printf("Found %d entries; printing top %d\n\n", $size, $max);

try {
  // Map each location to the value of its sibling <priority> element.
  foreach ($qp as $url) {
    $loc = $url->text();
    $score = $url->nextAll('priority')->text();
    $urls[$loc] = $score;
  }
}
catch (Exception $e) {
  print $e->getMessage();
}

// Sort by score, highest first, preserving the URL keys.
arsort($urls);

$filter = "%d: %0.5f %s\n";
$i = 0; // Rank counter.
foreach ($urls as $uri => $score) {
  if ($i++ == $max) break;
  printf($filter, $i, $score, $uri);
}
?>
The script above fetches all of the URLs out of the sitemap and sorts them by their corresponding score (the value of each entry's priority element). Only the top MAX_ITEMS (100) are shown.
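For trying the same extract-and-sort idea without QueryPath or a network fetch, here is a minimal sketch using PHP's built-in DOM extension instead. The inline sitemap, the function name sitemap_scores(), and the fallback of 0.5 for entries with no priority element (the default named in the sitemap protocol) are my assumptions for illustration, not part of the script above.

```php
<?php
// Hypothetical inline sitemap standing in for a fetched document.
$sitemap_xml = <<<XML
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc><priority>1.0</priority></url>
  <url><loc>http://example.com/about</loc><priority>0.3</priority></url>
  <url><loc>http://example.com/blog</loc><priority>0.8</priority></url>
  <url><loc>http://example.com/contact</loc></url>
</urlset>
XML;

// Hypothetical helper: returns array(url => priority), highest priority first.
function sitemap_scores($xml) {
  $doc = new DOMDocument();
  $doc->loadXML($xml);

  $scores = array();
  foreach ($doc->getElementsByTagName('url') as $url) {
    $loc = $url->getElementsByTagName('loc')->item(0);
    if ($loc === null) {
      continue; // Skip entries without a location.
    }
    $priority = $url->getElementsByTagName('priority')->item(0);
    // Assumption: fall back to 0.5 when no priority is given.
    $scores[trim($loc->textContent)] = $priority === null
      ? 0.5
      : (float) trim($priority->textContent);
  }

  // Sort by priority, highest first, keeping the URL keys.
  arsort($scores);
  return $scores;
}

$scores = sitemap_scores($sitemap_xml);
$rank = 0;
foreach ($scores as $uri => $score) {
  printf("%d: %0.5f %s\n", ++$rank, $score, $uri);
}
```

Casting the priority to a float keeps the sort numeric, where the QueryPath script stores the text value as-is.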
QueryPath on WebMonkey
Submitted by matt on Tue, 2010-01-19 10:03
It just came to my attention that a WebMonkey article (Parsing HTML? There's an App for That) from a few months ago suggested using QueryPath as an alternative to attempting to parse HTML by hand.
Appropriately, last week I wrote a QueryPath script to analyze a site and extract all links so that I could feed them to Siege and simulate something like a real load against a server. It's nice to be able to easily extract data from HTML.
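The link-extraction idea can be sketched roughly as follows. This is not the script I wrote: it uses PHP's built-in DOMDocument in place of QueryPath, and the function name extract_links() and the sample markup are made up for illustration. It keeps only absolute http(s) links and prints one URL per line, which, as far as I know, is the format Siege accepts for its URL file.

```php
<?php
// Hypothetical helper: collect unique absolute http(s) links from an HTML string.
function extract_links($html) {
  $doc = new DOMDocument();

  // Real-world HTML is rarely valid, so collect parser warnings quietly.
  $previous = libxml_use_internal_errors(true);
  $doc->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $links = array();
  foreach ($doc->getElementsByTagName('a') as $anchor) {
    $href = trim($anchor->getAttribute('href'));
    if (preg_match('#^https?://#i', $href)) {
      $links[$href] = true; // Array keys de-duplicate repeated links.
    }
  }
  return array_keys($links);
}

// Hypothetical sample page.
$html = '<html><body>'
  . '<a href="http://example.com/a">A</a>'
  . '<a href="/relative">Relative</a>'
  . '<a href="http://example.com/a">A again</a>'
  . '<a href="http://example.com/b">B</a>'
  . '</body></html>';

$links = extract_links($html);
print implode("\n", $links) . "\n";
```

Redirecting the output to a file gives a list that can be handed to a load tester. Relative links are skipped here for brevity; a fuller version would resolve them against the page's base URL.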
Acquia Webinar: "Playing Nicely with Others"
Submitted by matt on Wed, 2010-01-06 10:57
In our webinar Playing Nicely With Others: Integrating Drupal with Third-Party Data, Ken, George, Larry, and I talk about integrating various web services with Drupal. We talk about SOAP, content importing, digital asset management systems, and QueryPath (surprisingly, I'm not the one plugging QueryPath in this vid).
Thanks to Acquia for doing a fantastic job putting together their webinar series.
Streamlining Iterators in QueryPath 3.x
Submitted by matt on Tue, 2009-12-01 22:21
Work has officially begun on QueryPath 3.x. The upcoming release is focused on implementing and supporting many of the new features introduced in PHP 5.3, including enhanced SPL support, namespaces, closures, and phar archives.
In an earlier article, I examined the performance of various iteration strategies in QueryPath. After taking a hard look at the patterns I observed there, I revisited QueryPath's QueryPathIterator class to see if I could make a sizable performance improvement.
Iteration Techniques and Performance in QueryPath
Submitted by matt on Thu, 2009-11-26 12:41
QueryPath provides multiple methods of iterating. This article demonstrates the performance impact of various looping types. In this article, we are going to look at four different ways of iterating through the items wrapped by a QueryPath object:
- Using QueryPath's iterator
- Looping through DOMNode objects
- Using each() and a callback
- Using each() and an anonymous function
This last item is specific to PHP 5.3 and later, and offers intriguing possibilities when paired with closures.
Finally, at the end of the article, I will show some representative performance numbers.
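To make the difference between these looping styles concrete, here is a rough, self-contained sketch in plain PHP. It does not use QueryPath: each_node() is a hypothetical stand-in for an each()-style method, and I am assuming a jQuery-like callback signature of (index, item). The anonymous function needs PHP 5.3 or later.

```php
<?php
// Hypothetical stand-in for an each()-style method: calls $callback($index, $node).
function each_node($nodes, $callback) {
  $i = 0;
  foreach ($nodes as $node) {
    call_user_func($callback, $i++, $node);
  }
}

// A named callback has no access to the caller's scope, so it uses a global.
function collect_upper($index, $node) {
  global $named;
  $named[] = strtoupper($node->textContent);
}

$doc = new DOMDocument();
$doc->loadXML('<ul><li>one</li><li>two</li><li>three</li></ul>');
$items = $doc->getElementsByTagName('li');

// 1. Plain loop over the DOMNode objects.
$looped = array();
foreach ($items as $node) {
  $looped[] = $node->textContent;
}

// 2. each-style iteration with a named callback.
$named = array();
each_node($items, 'collect_upper');

// 3. each-style iteration with an anonymous function (PHP 5.3+).
//    The closure captures $lengths by reference, so no global is needed.
$lengths = array();
each_node($items, function ($index, $node) use (&$lengths) {
  $lengths[$index] = strlen($node->textContent);
});
```

The closure version shows why the pairing is attractive: state lives in the enclosing scope rather than in globals, at the cost of one function call per item.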
QueryPath Performance Optimizations on Reduncery
Submitted by matt on Fri, 2009-11-20 10:40
Continuing a trend on the non-evilness of optimization, this article discusses some methods of improving performance in QueryPath.
Early this week, a Twitter analysis tool called Reduncery was launched by a friend of mine. Reduncery calculates how much of a "redunce" a particular user is -- that is, what percentage of a user's tweets are retweets (RT). It can also calculate how ineffective it is for one person to retweet another. In this case, it calculates the overlap in the followers of the original tweeter with the followers of the retweeter. In what follows, we will look at the ways Reduncery optimizes QueryPath to keep page load times down.
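The two calculations can be illustrated with a small sketch. This is not Reduncery's code; the function names, the "starts with RT" test, and the choice to express overlap as a share of the retweeter's followers are all assumptions made for the example.

```php
<?php
// Hypothetical: percentage of a user's tweets that are retweets.
// Assumption: a retweet is any tweet whose text starts with "RT".
function retweet_percentage(array $tweets) {
  if (count($tweets) === 0) {
    return 0.0;
  }
  $retweets = 0;
  foreach ($tweets as $text) {
    if (preg_match('/^RT\b/', ltrim($text))) {
      $retweets++;
    }
  }
  return 100.0 * $retweets / count($tweets);
}

// Hypothetical: percentage of the retweeter's followers who already
// follow the original tweeter, and so see the same tweet twice.
function follower_overlap(array $original_followers, array $retweeter_followers) {
  if (count($retweeter_followers) === 0) {
    return 0.0;
  }
  $shared = count(array_intersect($retweeter_followers, $original_followers));
  return 100.0 * $shared / count($retweeter_followers);
}

// Made-up sample data.
$tweets = array('RT @a: hi', 'hello world', 'RT @b: yo', 'lunch time');
$redunce = retweet_percentage($tweets);

$overlap = follower_overlap(array(1, 2, 3, 4), array(3, 4, 5, 6));

printf("Retweets: %.1f%%, follower overlap: %.1f%%\n", $redunce, $overlap);
```

A high overlap means the retweet mostly reaches people who already saw the original.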
Reduncery: Calculating retweet idiocy
Submitted by matt on Wed, 2009-11-18 10:54
Ever get irritated by reading the same tweet multiple times, retweeted by the same old people? Ever wondered how effective re-tweeting is? Are new people really reading the tweet, or are the same people just being notified multiple times? You can now find out for sure with Reduncery.
Reduncery
Reduncery was built on QueryPath and Drupal. In a future blog post, I'll tell you about some of the performance optimizations Reduncery uses to speed up searches of 200k+ users.
QueryPath slides from DrupalCon Paris, 2009
Submitted by matt on Thu, 2009-10-22 11:05
I finally posted my slides for DrupalCon Paris on SlideShare.
Feel free to download a copy and use it in conjunction with the video from Paris. The slides, though, cover some information that I did not have time to cover in the video. Conversely, the video features Ken and David each talking about QueryPath projects they worked on.
SPL in PHP 5.3
Submitted by matt on Wed, 2009-10-21 14:12
Here is a great slideshow that explains what is so important (and so interesting) about the SPL libraries included in PHP 5.3.
And for additional reading, head over to the PHP manual: http://us3.php.net/manual/en/spl.datastructures.php
When I switched QueryPath from an array to an SplObjectStorage object, I noticed tremendous speed improvements. And the 5.3 random access extensions to SplObjectStorage will continue to speed up QueryPath's engine.
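As a small illustration of why SplObjectStorage suits a set of matched nodes, the sketch below stores DOM nodes in one and contrasts it with a plain array. It is a toy example with made-up variable names, not QueryPath's internals.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<root><a/><b/><c/></root>');
$a = $doc->getElementsByTagName('a')->item(0);
$b = $doc->getElementsByTagName('b')->item(0);
$c = $doc->getElementsByTagName('c')->item(0);

// SplObjectStorage behaves as a set keyed by object identity:
// attaching the same node twice stores it once.
$matches = new SplObjectStorage();
$matches->attach($a);
$matches->attach($b);
$matches->attach($a); // No effect; $a is already present.

// With a plain array, de-duplication needs an explicit linear scan.
$list = array();
foreach (array($a, $b, $a) as $node) {
  if (!in_array($node, $list, true)) {
    $list[] = $node;
  }
}

printf("storage: %d nodes, array: %d nodes\n", count($matches), count($list));

// Membership checks do not scan the collection.
$has_b = $matches->contains($b);
$has_c = $matches->contains($c);

// Iteration yields the stored objects themselves.
$names = array();
foreach ($matches as $node) {
  $names[] = $node->nodeName;
}
```

The array version has to compare against every stored node on each insert, which is where the cost adds up on large documents.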
QueryPath at DrupalCamp Atlanta?
Submitted by matt on Thu, 2009-09-24 08:58
I was happy to see that QueryPath made the hallway track at DrupalCamp Atlanta. I assume I have Ken to thank for that. Ken co-presented with me twice at DrupalCon Paris -- once on how we did Foreign Affairs, and once on QueryPath (video of session).