A QueryPath script for checking on a sitemap
I've been tuning our sitemap over the last few months, and one thing I needed was a quick tool to check the effectiveness of various sitemap generation strategies.
To do this, I wrote a quick QueryPath script (see a full-sized image of the output). The script is explained below.
The code is pretty straightforward. It retrieves the sitemap from a URL, parses its contents, and sorts the entries by priority. Finally, it displays the top 100 entries. I've tested it on sitemaps with over 20,000 items. It is a little slow on a document that large, but it works fine.
#!/usr/bin/env php
<?php
require 'QueryPath/QueryPath.php';

// Maximum number of entries to print.
define('MAX_ITEMS', 100);
$sitemap = 'http://example.com/sitemap.xml';
$urls = array();

print "Parsing sitemap...\n";

// Fetch the sitemap and select every <loc> element.
$qp = qp($sitemap, ':root>url>loc');
$size = $qp->size();
$max = $size > MAX_ITEMS ? MAX_ITEMS : $size;
printf("Found %d entries; printing top %d\n\n", $size, $max);

try {
  // Map each URL to the priority listed alongside it.
  foreach ($qp as $url) {
    $loc = $url->text();
    $score = $url->nextAll('priority')->text();
    $urls[$loc] = $score;
  }
}
catch (Exception $e) {
  print $e->getMessage();
}

// Sort by score, highest first, keeping the URL keys.
arsort($urls);

$filter = "%d: %0.5f %s\n";
$i = 0;
foreach ($urls as $uri => $score) {
  if ($i++ == $max) break;
  printf($filter, $i, $score, $uri);
}
?>
Basically, the script above fetches all of the URLs out of the sitemap and then sorts them, highest first, by their corresponding score (the value of each entry's priority element). Only the top MAX_ITEMS (100) are shown.
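For comparison, here is a minimal sketch of the same fetch, sort, and truncate idea using PHP's bundled SimpleXML extension instead of QueryPath. This is not the script above: `top_sitemap_entries()` is a hypothetical helper name, the sample sitemap is made up for illustration, and falling back to 0.5 when an entry has no priority is an assumption based on the default stated in the sitemaps.org protocol.

```php
<?php
// Sketch only: swaps QueryPath for PHP's built-in SimpleXML extension.
// top_sitemap_entries() is a hypothetical helper, not part of the script above.

function top_sitemap_entries(string $xml, int $max = 100): array
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        throw new RuntimeException('Could not parse sitemap XML');
    }

    $scores = array();
    foreach ($doc->url as $url) {
        $loc = trim((string) $url->loc);
        if ($loc === '') {
            continue; // Skip entries without a location.
        }
        // Assumption: use the protocol's default priority (0.5) when none is given.
        $scores[$loc] = isset($url->priority) ? (float) $url->priority : 0.5;
    }

    // Highest priority first, keeping the URL => score association.
    arsort($scores);

    return array_slice($scores, 0, $max, true);
}

// Made-up sample data for illustration.
$sample = <<<XML
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/a</loc><priority>0.3</priority></url>
  <url><loc>http://example.com/b</loc><priority>0.9</priority></url>
  <url><loc>http://example.com/c</loc></url>
</urlset>
XML;

$rank = 0;
foreach (top_sitemap_entries($sample, 2) as $uri => $score) {
    printf("%d: %0.5f %s\n", ++$rank, $score, $uri);
}
```

With the sample data, this lists http://example.com/b (0.9) first and http://example.com/c (the assumed 0.5 default) second. Reading the priority from inside each url element keeps a URL paired with its own score even when some entries omit the priority.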








