By Matt Butcher
querypath
Data URLs and QueryPath: How to embed images into XML or HTML
Submitted by matt on Thu, 2010-08-26 17:16
QueryPath 2.1 is adding support for writing files directly into URLs using Data URLs. What this means is that you can encode and embed images or other documents straight into your HTML or XML.
Here's a simple example from the QueryPath 2.1 unit tests:
<?php
$xml = '<?xml version="1.0"?><root><item/></root>';
qp($xml, 'item')->dataURL('secret', 'Hi!', 'text/plain');
?>
The above will generate an XML fragment that looks like this:
<?xml version="1.0"?>
<root>
  <item secret="data:text/plain;base64,SGkh"/>
</root>
The important part there is the attribute secret="data:text/plain;base64,SGkh". This attribute includes an embedded text document with the contents Hi!. What we've done is encode the data and inject it as a document inside of the XML.
Sure, that's novel... but what would we want to use that for? How about adding images directly into a document?
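To make the encoding step concrete, here is a minimal sketch in plain PHP of how such a data URL can be assembled and placed in an image's src attribute. It does not use QueryPath: make_data_url() is a hypothetical helper written for this illustration, and the image bytes are a stand-in for real file contents (for example, the result of file_get_contents() on a PNG).

```php
<?php
// Hypothetical helper (not part of QueryPath): build a base64 data URL
// from raw bytes and a MIME type.
function make_data_url($data, $mime = 'application/octet-stream') {
  return 'data:' . $mime . ';base64,' . base64_encode($data);
}

// Same payload as the unit test example: the text "Hi!".
$textUrl = make_data_url('Hi!', 'text/plain');

// Placeholder bytes standing in for real image data.
$fakeImage = "\x89PNG\r\n\x1a\n";

// Build an <img> element whose src carries the embedded data.
$doc = new DOMDocument('1.0');
$img = $doc->createElement('img');
$img->setAttribute('src', make_data_url($fakeImage, 'image/png'));
$doc->appendChild($img);
$html = $doc->saveXML($img);

print $textUrl . PHP_EOL;
print $html . PHP_EOL;
```

The first line printed is data:text/plain;base64,SGkh, matching the attribute value in the XML above. Base64 makes the payload roughly a third larger than the original bytes, so this approach suits small images better than large ones.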
Reflections on Google Summer of Code
Submitted by matt on Thu, 2010-08-26 08:20
This was the second year that I have been involved as a mentor for Google's Summer of Code program. And in both cases, I've worked as a mentor for Drupal. Last year, I worked with sivaji on a project involving the Quiz module. This year, I worked with eabrand on QueryPath and the QueryPath module.
In both cases, the projects were highly successful. I'm thrilled to have had the opportunity to work with two very gifted up-and-coming developers.
I think one of the most critical questions to ask of any program like GSOC is whether or not it produces the results (pedagogical and professional) that it is after. With both Sivaji and Emily, the answer is a resounding yes.
- Since finishing his GSOC project, Sivaji has begun his professional life as a web developer focused on Drupal. Recently, he and his colleagues started E-ndicus, a Drupal-focused software development company in his home town of Chennai.
- Emily is now a software engineer at HP. She continues to contribute to QueryPath, and was just this week featured on Google's blog. Last week, she joined me on the Drupal Dojo QueryPath session, too.
I doubt either of these individuals learned much from me during our GSOC projects. More than anything, it just takes hard work, persistence, and attention to detail to finish a GSOC project. But I've certainly learned a lot from them. And both Quiz and QueryPath have benefited enormously from the work of these two.
Slides for my Dojo presentation: "QueryPath: It's like PHP jQuery in Drupal!"
Submitted by matt on Wed, 2010-08-18 09:11
I posted the slides from yesterday's Drupal Dojo presentation. These should be much more readable than the video feed.
Drupal Dojo: "QueryPath: It's like PHP jQuery in Drupal!"
Submitted by matt on Mon, 2010-08-16 22:35
On August 17th at 12 PM EDT (9 AM PDT), I will be doing the Drupal Dojo session, "QueryPath: It's like PHP jQuery in Drupal!". To sign up, head over to the webinar signup.
I'm particularly excited about this for three reasons:
- Emily will be joining me to talk about her GSoC project.
- We will be discussing QueryPath 2.1 and the new Drupal 7 QueryPath module.
- The totally gorgeous new QueryPath logo (designed by Michael Mesker) will be unveiled.
This has been an exciting summer for QueryPath, and this webinar will preview many of the QueryPath technologies that are on the cusp of being released.
A PHP jQuery Library: QueryPath Overview
Submitted by matt on Sun, 2010-08-08 14:54
jQuery is a JavaScript library for efficiently working with HTML and CSS. Its chainable and compact API has made it a popular choice for web developers seeking to quickly build rich web applications. But did you know there is a PHP jQuery library? QueryPath is a PHP implementation of jQuery's interface. It provides all of the DOM manipulation functions, a full CSS selector engine, and as many of jQuery's other features as can practically be implemented server-side. But that's not all. This powerful library delivers many server-side features designed to make working with XML services simple, robust, and reliable.
QueryPath and Character Sets: Converting content with mb_convert_encoding()
Submitted by matt on Mon, 2010-05-03 10:43
QueryPath can be used to crawl the web, parsing web pages and gleaning information. But the HTML of remote websites is not always as pristine and standards compliant as we would like, and one thing that can be particularly frustrating is determining the encoding of a document. (This gets substantially more complicated when HTTP headers list one encoding and HTML meta tags list another -- a common configuration error.)
QueryPath is primarily a library for working with XML and HTML, but it assumes that you know from the outset what character set your document uses. This is not always a good assumption to make. Here is one way to circumvent the problem: rather than write code to find out a document's character set, use PHP's built-in functions (assuming you have the mbstring extension compiled in) to detect and convert it for you.
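<?php
require 'QueryPath/QueryPath.php';

$url = 'http://mopy.fr/';

// Let mbstring detect the source encoding ('auto') and convert to ISO-8859-1.
$contents = mb_convert_encoding(file_get_contents($url), 'iso-8859-1', 'auto');

// Ask QueryPath to ignore parser warnings caused by sloppy markup.
$opts = array('ignore_parser_warnings' => TRUE);
print @qp($contents, 'title', $opts)->text() . PHP_EOL;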
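The detect-then-convert step can be tried without fetching anything. The sketch below is a variation on the script above, not a copy of it: it uses a made-up sample string, names an explicit list of candidate encodings instead of 'auto', and normalizes to UTF-8 rather than ISO-8859-1. It assumes the mbstring extension is available.

```php
<?php
// Sample markup with one non-ASCII character ("e" with acute accent),
// written as UTF-8 byte escapes so the file's own encoding does not matter.
$utf8 = "<title>Caf\xC3\xA9</title>";

// Simulate a page served as ISO-8859-1.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// Detect the encoding from a short candidate list. Strict mode (third
// argument) rejects UTF-8 here because the lone 0xE9 byte is not valid UTF-8.
$detected = mb_detect_encoding($latin1, array('UTF-8', 'ISO-8859-1'), true);

// Convert from whatever was detected to UTF-8.
$normalized = mb_convert_encoding($latin1, 'UTF-8', $detected);

print $detected . PHP_EOL;
print $normalized . PHP_EOL;
```

Detection is a heuristic. ISO-8859-1 accepts any byte sequence, so it works best as the last candidate in the list, behind stricter encodings such as UTF-8.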
A QueryPath script for checking on a sitemap
Submitted by matt on Mon, 2010-02-15 17:23
I've been tuning our sitemap during the last few months, and one thing I needed was a quick tool to check on the effectiveness of various sitemap generation strategies.
To do this, I wrote a quick QueryPath script (see a full-sized image of the output). The script is explained below.
The code is pretty straightforward. It simply retrieves a URL, parses the sitemap contents, and then sorts them. Finally, it displays the top 100 entries. I've tested it on sitemaps with over 20,000 items. While it is a little slow on such a large document, it works fine.
#!/usr/bin/env php
<?php
require 'QueryPath/QueryPath.php';

define('MAX_ITEMS', 100);
$sitemap = 'http://example.com/sitemap.xml';
$urls = array();

print "Parsing sitemap...\n";
$qp = qp($sitemap, ':root>url>loc');

$size = $qp->size();
$max = $size > MAX_ITEMS ? MAX_ITEMS : $size;
printf("Found %d entries; printing top %d\n\n", $size, $max);

// Map each URL to the priority score listed alongside it.
try {
  foreach ($qp as $url) {
    $loc = $url->text();
    $score = $url->nextAll('priority')->text();
    $urls[$loc] = $score;
  }
}
catch (Exception $e) {
  print $e->getMessage();
}

// Sort by score, highest first, keeping the URL keys.
arsort($urls);

$filter = "%d: %0.5f %s\n";
$i = 0;
foreach ($urls as $uri => $score) {
  if ($i++ == $max) break;
  printf($filter, $i, $score, $uri);
}
?>
Basically, the script above simply fetches all of the URLs out of the sitemap, and then sorts them by their corresponding score. Only the top MAX_ITEMS (100) are shown.
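The sort-and-truncate step can be seen in isolation with a small hand-made array. This is only a sketch: the URLs and scores are invented, MAX_ITEMS is lowered to 2 so the cut-off is visible, and array_slice() is used in place of the loop counter from the script above.

```php
<?php
define('MAX_ITEMS', 2);

// Invented sample data in the same shape the script builds: URL => priority.
$urls = array(
  'http://example.com/' => '1.0',
  'http://example.com/about' => '0.5',
  'http://example.com/blog' => '0.8',
);

// Sort by score, highest first, comparing the values as numbers.
arsort($urls, SORT_NUMERIC);

// Keep only the first MAX_ITEMS entries (string keys are preserved).
$top = array_slice($urls, 0, MAX_ITEMS, true);

$lines = array();
$i = 0;
foreach ($top as $uri => $score) {
  $lines[] = sprintf("%d: %0.5f %s", ++$i, $score, $uri);
}

print implode("\n", $lines) . "\n";
```

Passing SORT_NUMERIC makes the numeric comparison explicit, which matters because the scores come out of the XML as strings.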
QueryPath on WebMonkey
Submitted by matt on Tue, 2010-01-19 10:03
It just came to my attention that a WebMonkey article (Parsing HTML? There's an App for That) from a few months ago suggested using QueryPath as an alternative to attempting to parse HTML by hand.
Appropriately, last week I wrote a QueryPath script to analyze a site and extract all links so that I could feed them to Siege and simulate something like a real load against a server. It's nice to be able to easily extract data from HTML.
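For a sense of what a link extractor of that kind might look like, here is a minimal sketch using PHP's built-in DOM extension rather than QueryPath. It is not the script mentioned above: the sample HTML and base URL are made up, and only root-relative links are resolved against the base.

```php
<?php
// Made-up sample page and base URL, for illustration only.
$base = 'http://example.com';
$html = '<html><body>'
  . '<a href="/a">A</a>'
  . '<a href="/b">B</a>'
  . '<a name="top">anchor without href</a>'
  . '<a href="/a">duplicate</a>'
  . '</body></html>';

$doc = new DOMDocument();
// Suppress warnings that real-world, malformed HTML tends to trigger.
@$doc->loadHTML($html);

$seen = array();
foreach ($doc->getElementsByTagName('a') as $a) {
  $href = $a->getAttribute('href');
  if ($href === '') {
    continue; // Skip anchors that have no href attribute.
  }
  // Resolve root-relative links against the base URL.
  if (strpos($href, '/') === 0) {
    $href = $base . $href;
  }
  $seen[$href] = true; // Array keys de-duplicate the list.
}

$links = array_keys($seen);

// One URL per line, suitable for saving to a file for a load-testing tool.
print implode("\n", $links) . "\n";
```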
Acquia Webinar: "Playing Nicely with Others"
Submitted by matt on Wed, 2010-01-06 10:57
In our webinar Playing Nicely With Others: Integrating Drupal with Third-Party Data, Ken, George, Larry, and I talk about integrating various web services with Drupal. We talk about SOAP, content importing, digital asset management systems, and QueryPath (surprisingly, I'm not the one plugging QueryPath in this vid).
Thanks to Acquia for doing a fantastic job putting together their webinar series.
Streamlining Iterators in QueryPath 3.x
Submitted by matt on Tue, 2009-12-01 22:21
Work has officially begun on QueryPath 3.x. The upcoming release is focused on implementing and supporting many of the new features introduced in PHP 5.3, including enhanced SPL support, namespaces, closures, and phar archives.
In an earlier article, I examined the performance of various iteration strategies in QueryPath. After taking a hard look at the patterns I observed there, I revisited QueryPath's QueryPathIterator class to see if I could make a sizable performance improvement.








