By Matt Butcher
sitemap
A QueryPath script for checking on a sitemap
Submitted by matt on Mon, 2010-02-15 17:23
I've been tuning our sitemap during the last few months, and one thing I needed was a quick tool to check on the effectiveness of various sitemap generation strategies.
To do this, I wrote a quick QueryPath script (see a full-sized image of the output). The script is explained below.
The code is pretty straightforward. It simply retrieves a URL, parses the sitemap contents, and then sorts them. Finally, it displays the top 100 entries. I've tested it on sitemaps with over 20,000 items. While it is a little slow on such a large document, it works fine.
#!/usr/bin/env php
<?php
require 'QueryPath/QueryPath.php';

define('MAX_ITEMS', 100);
$sitemap = 'http://example.com/sitemap.xml';
$urls = array();

print "Parsing sitemap...\n";
$qp = qp($sitemap, ':root>url>loc');

$size = $qp->size();
$max = $size > MAX_ITEMS ? MAX_ITEMS : $size;
printf("Found %d entries; printing top %d\n\n", $size, $max);

// Map each URL to the priority score that follows its <loc> element.
try {
  foreach ($qp as $url) {
    $loc = $url->text();
    $score = $url->nextAll('priority')->text();
    $urls[$loc] = $score;
  }
}
catch (Exception $e) {
  print $e->getMessage();
}

// Sort by score, highest first, keeping the URL keys attached.
arsort($urls);

$filter = "%d: %0.5f %s\n";
$i = 0; // Initialize the counter so PHP does not warn about an undefined variable.
foreach ($urls as $uri => $score) {
  if ($i++ == $max) break;
  printf($filter, $i, $score, $uri);
}
?>
The script pulls every URL out of the sitemap, pairs each one with its priority score, and sorts the list by score from highest to lowest. Only the top MAX_ITEMS (100) are shown.
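For anyone without QueryPath installed, the same parse, sort, and truncate flow can be sketched with PHP's built-in SimpleXML extension. This is only a minimal sketch, not the script above: the function name top_sitemap_entries and the inline sample sitemap are made up for illustration, and it assumes PHP 7 or later. It also differs in one detail: a missing priority element is treated as 0.5, the default defined by the sitemap protocol.

```php
<?php
// Hypothetical helper (not part of the original script): returns the
// top $max sitemap entries as [loc => priority], highest priority first.
function top_sitemap_entries(string $xml, int $max): array {
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        throw new RuntimeException('Could not parse sitemap XML.');
    }

    $urls = array();
    foreach ($doc->url as $url) {
        // The sitemap protocol defines 0.5 as the default priority.
        $priority = isset($url->priority) ? (float) $url->priority : 0.5;
        $urls[(string) $url->loc] = $priority;
    }

    arsort($urls); // Sort by priority, descending, keeping the URL keys.
    return array_slice($urls, 0, $max, true);
}

// Made-up sample data standing in for a fetched sitemap.
$sample = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/a</loc><priority>0.3</priority></url>
  <url><loc>http://example.com/b</loc><priority>0.9</priority></url>
  <url><loc>http://example.com/c</loc></url>
</urlset>
XML;

$i = 0;
foreach (top_sitemap_entries($sample, 2) as $loc => $priority) {
    printf("%d: %0.5f %s\n", ++$i, $priority, $loc);
}
```

With the sample data, this prints http://example.com/b first (priority 0.9) and http://example.com/c second (default 0.5). To run it against a live sitemap, the XML string could come from file_get_contents() on the sitemap URL instead of the inline sample.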
5 Differences: Moving from XML Sitemap module to Google's Sitemap Generators
Submitted by matt on Mon, 2010-02-15 16:54
For a large site that I maintain, we recently disabled the XML Sitemap module (we're using the 1.x branch) and switched to the Google Sitemap Generators tool (the Python one). We have noticed a few unsurprising things, and a few very surprising things.
We identified five big differences (all positive) that we have seen since moving to the Google Sitemap Generators Python tool.







