03 May

QueryPath and Character Sets: Converting content with mb_convert_encoding()

in encoding, querypath, utf-8

QueryPath can be used to crawl the web, parsing web pages and gleaning information from them. But the HTML of remote websites is not always as pristine and standards-compliant as we would like, and one thing that can be particularly frustrating is determining a document's encoding. (This gets substantially more complicated when the HTTP headers declare one encoding and the HTML meta tags declare another, a common configuration error.)

QueryPath is primarily a library for working with XML and HTML, and it assumes that you know from the outset what character set your document uses. That is not always a safe assumption. Here is one way around the problem: rather than writing code to work out a document's character set yourself, let PHP's built-in multibyte string functions (available when PHP is built with the mbstring extension) do it for you.

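<?php
require 'QueryPath/QueryPath.php';

$url = 'http://mopy.fr/';
// Fetch the raw page; file_get_contents() returns FALSE on failure.
$raw = file_get_contents($url);
if ($raw === FALSE) {
  die("Could not fetch $url" . PHP_EOL);
}
// 'auto' asks mbstring to detect the source encoding from its candidate list
// (which depends on the mbstring.language setting), then re-encodes as ISO-8859-1.
$contents = mb_convert_encoding($raw, 'iso-8859-1', 'auto');
// Tell QueryPath to ignore warnings from the parser; messy real-world HTML triggers many.
$opts = array('ignore_parser_warnings' => TRUE);
// @ suppresses any warnings that still get through.
print @qp($contents, 'title', $opts)->text() . PHP_EOL;
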
16 Apr

Video: A Developer's Introduction to Drupal

in drupal, php, programming

A few weeks ago I did a webinar for PHP|Architect and the Tek-X conference. The webinar was recorded, and is now available as a video. You can watch the presentation on A Developer's Introduction to Drupal at their site.
[Image: Drupal Intro Webinar]

The presentation is aimed at developers who are just getting started with Drupal and want to know how things work and where they can dive in.

Lots of thanks to Cal Evans and the PHP|Architect crew, all of whom were awesome to work with.

16 Apr

Linux/UNIX/OS X: How to find and combine multiple files

in linux, mac, os x, system administration, unix

This post explains how to use the find command on a UNIX-like command line (including Linux and OS X) to search through a directory and all of its subdirectories, find every file with a certain extension, and combine them all into one file. Surprisingly, this isn't a difficult task. It can be accomplished with a single command:

$ find ./src -name '*.txt' -exec cat '{}' \; > test.txt

The command above looks through everything in the ./src directory (including all subdirectories) for files with the .txt extension. For each file it finds, find runs cat on that file, and because the shell redirects all of find's output into test.txt, the contents of every matching file end up concatenated in test.txt once the command finishes. You can use this strategy to easily combine lots of files into one.
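Here is a small, self-contained sketch you can run to watch the command work. It builds a scratch directory with made-up sample files (src/a.txt, src/sub/b.txt, and a src/notes.md that should be skipped; these names are only for illustration) and uses a slightly adjusted form of the command: -type f restricts matches to regular files, and ending -exec with + instead of \; lets find hand many files to a single cat call rather than starting one cat per file. Both are standard find options.

```bash
#!/usr/bin/env bash
set -eu

# Scratch directory (from mktemp) so the demo doesn't touch your current directory.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: two .txt files at different depths, one non-matching file.
mkdir -p src/sub
printf 'alpha\n' > src/a.txt
printf 'beta\n' > src/sub/b.txt
printf 'ignored\n' > src/notes.md

# Same idea as the original command:
#  -type f      only match regular files
#  -exec ... +  pass as many paths as possible to each cat invocation
# test.txt is written in the current directory, outside ./src, so find never matches it.
find ./src -type f -name '*.txt' -exec cat {} + > test.txt

cat test.txt
```

The files are concatenated in find's traversal order, which is not necessarily alphabetical. Keep the output file outside the directory you are searching (or give it a non-matching extension): if test.txt lived inside ./src, find would match it and cat would end up reading the file it is writing to.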

Using find, it's easy to customize the command above to do all kinds of things with files. I gave a few examples in an earlier post about the UNIX find command.

15 Apr

mkdir: Creating multiple subdirectories in one command

in linux, system administration

Often, I want to create a full directory structure, and I'd like to do it with just one call to mkdir. That is, I want to create a root directory and multiple subdirectories all at once. Here's how to do this.

mkdir -p myProject/{src,doc,tools,db}

The above creates the top-level directory myProject, along with all of the subdirectories (myProject/src, myProject/doc, etc.). How does it work? There are two things of note about the command above:

  • The -p flag: This tells mkdir to create any leading directories that do not already exist. Effectively, it makes sure that myProject gets created before creating myProject/src.
  • The {} lists: The technical name for this shell feature is "brace expansion". The shell (bash or zsh, for example; a strictly POSIX sh such as dash does not support it) expands the list into separate words, attaching each item to the preceding path, before mkdir ever runs. Thus, a/{b,c} is expanded into a/b a/c.
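To see exactly what the shell hands to mkdir, you can put echo in front of the command: brace expansion happens before any command runs, so echo prints the expanded argument list. The sketch below does that, then creates the tree and lists it. It assumes bash, and it works in a scratch directory created with mktemp (an addition for the demo, so nothing lands in your current directory).

```bash
#!/usr/bin/env bash
set -eu

# Scratch directory so the demo doesn't touch your current directory.
workdir=$(mktemp -d)
cd "$workdir"

# echo prints the arguments mkdir would receive after brace expansion.
echo mkdir -p myProject/{src,doc,tools,db}
# prints: mkdir -p myProject/src myProject/doc myProject/tools myProject/db

# Create the tree for real; -p creates the parent myProject as needed.
mkdir -p myProject/{src,doc,tools,db}

# List every directory that now exists under myProject.
find myProject -type d | LC_ALL=C sort
```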

You can nest brace expansion lists. That means you can create more complex sets of subdirectories like this:

mkdir -p myProject/{src,doc/{api,system},tools,db}

Notice that this creates two directories inside of doc/.
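The nested version can be previewed the same way. In this sketch (again assuming bash, and again using a mktemp scratch directory that exists only for the demo), the inner list {api,system} is expanded inside the doc/ item of the outer list, so mkdir receives five paths. doc/ itself is never named on its own; -p creates it as the parent of doc/api.

```bash
#!/usr/bin/env bash
set -eu

# Scratch directory so the demo doesn't touch your current directory.
nested_dir=$(mktemp -d)
cd "$nested_dir"

# Preview: the inner {api,system} list is expanded within the doc/ item.
echo mkdir -p myProject/{src,doc/{api,system},tools,db}
# prints: mkdir -p myProject/src myProject/doc/api myProject/doc/system myProject/tools myProject/db

mkdir -p myProject/{src,doc/{api,system},tools,db}

# doc/ should now contain the two nested directories.
ls myProject/doc
```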

11 Apr

Blackboard UX Fail! How not to label buttons.

in user interface and ux

Several times while using Blackboard, I have clicked the wrong button by accident. Blackboard's forum thread editing screen has both a Save button and a Submit button.
[Image: Blackboard Save and Submit]

There are three problems with this display:

  • Both button terms are ambiguous (Save to what? Submit for what?)
  • These terms are often used interchangeably
  • The buttons are right next to each other with no contextual distinction
02 Apr

Configuring Static IPs on a Comcast SMC Router

in system administration

I have recently been working on configuring a business-class Comcast SMC router to make use of a group of 5 static IPs. The documentation I found was sparse, and I spent a few days figuring out how to do this.

It turns out that it is very simple.
[Image: Configuring the Firewall]

26 Mar

Loading Drupal Nodes into MongoDB with Drush

in drupal, mongodb, php, programming

To do some prototyping, I wanted to load all 32k of our Drupal nodes into MongoDB. At first, the thought of doing this seemed daunting. Then I realized that with Drush I could use a very simple script to perform an entire migration.

The result: with a 14-line PHP script, I transferred all of the nodes (CCK, taxonomy, and all) without a glitch.

Read on for the full explanation.

26 Mar

MapReduce as a Star Trek Episode

in mongodb

Kristina Chodorow, a member of the MongoDB development team and maintainer of the Mongo PHP driver, wrote a great blog post explaining MapReduce as a Star Trek episode. It's a quick and humorous read.

Kristina is also doing a Tek-X webinar on MongoDB today. I'm encouraging my dev team to attend.

22 Mar

A 53,900% speedup: Nginx, Drupal, and Memcache bring concurrency up and page load time way down

in drupal, memcached, nginx, performance

With a clever hack utilizing Memcache, Nginx, and Drupal, we have been able to speed up delivery of many of our major pages by 53,900% (from 8,100 msec to 15 msec, according to siege and AB benchmarks). Additionally, we went from being able to handle 27 concurrent requests to being able to handle 3,334 concurrent requests (a 12,248% increase).
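Both percentages follow from the benchmark numbers above. Going from 8,100 msec to 15 msec is a factor of 8,100 / 15 = 540, i.e. pages are delivered 540 times as fast; expressed as a percentage increase, that is (540 - 1) × 100 = 53,900%. Going from 27 to 3,334 concurrent requests is (3,334 - 27) / 27 × 100 ≈ 12,248%. A quick check with awk, using only the figures quoted in this post:

```bash
#!/usr/bin/env bash
# Percentage increase = (new - old) / old * 100.
# Speed is the inverse of page time, so the speed ratio is 8100 / 15 = 540.
speedup_pct=$(awk 'BEGIN { printf "%.0f", (8100 / 15 - 1) * 100 }')
# Concurrency: 27 requests before, 3,334 after.
concurrency_pct=$(awk 'BEGIN { printf "%.0f", (3334 - 27) / 27 * 100 }')

echo "page delivery: ${speedup_pct}% faster"    # page delivery: 53900% faster
echo "concurrency: ${concurrency_pct}% higher"  # concurrency: 12248% higher
```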

While we performed a long series of performance optimizations, this article is focused primarily on how we managed to serve data directly from Memcached, via Nginx, without invoking PHP at all.
[Image: Nginx, Memcached, and Drupal]

Read on for the full explanation of how we achieved this huge speedup.

17 Mar

Heating Water: UX principles from your microwave

in user interface and ux

When I was at WebMD back in the late '90s, I often worked on pieces of the site that were designed by Tog (Bruce Tognazzini; if you haven't heard of him, check out his Wikipedia page). I developed a sincere respect for UX engineering because of him.

Today, I was looking for a decent set of UX guidelines to give to people who were struggling to understand why UX matters. I found Tog's, and this particular portion of the article really struck me.