Blogs

MongoDB: 5 Things Every PHP Developer Should Know About MongoDB

2010 will be remembered as the year SQL died; the year relational databases were moved off of the front line; the year that developers discovered that they no longer had to force every single object into a tabular structure in order to persist the data.

2010 is the year of the document database. Momentum has been building steadily over the last seven years or so, and there are now many stable document databases, from cloud-based offerings by Amazon and Google to a wide variety of open source tools, most notably CouchDB and MongoDB.

So what is MongoDB? Here are five things every PHP developer should know about it:

  1. MongoDB is a stand-alone server
  2. It is document based, not table-based
  3. It is schemaless
  4. You don't need to learn another query language
  5. It has great PHP support

Read on to learn a little about each of these.

Tek-X Webcast: A Developer's Intro to Drupal

On March 12, 2010, I will be online with the folks from Tek-X giving a webcast on A Developer's Intro to Drupal. If you're just getting your feet wet with Drupal and are still a little confused about hooks, modules, themes, nodes, or even why Drupal isn't (fully) Object-Oriented, then this session is for you.

Using BetterAWStats in Drupal

Our current environment uses AWStats to analyze our HTTP server log files and build reports. Because it has privileged access to our data, and because it is open source, we can glean more information out of it than we could from proprietary hosted analytics platforms.

It turns out that there is a PHP front-end to AWStats (called BetterAWStats) that comes complete with a Drupal module. Here, I explain how we've installed and configured this module to get our AWStats data imported into our Atrium server.

Why does Nginx return 499 errors?

I noticed something unexpected in my nginx logs today: There were a bunch of 499 HTTP codes in the access log. Oddly, these didn't show up in Google Analytics, there were no corresponding errors in the error log, but they did show up in my AWStats. What's the deal?

The answer is pretty simple: Nginx uses 499, a non-standard code, as the status when the client unexpectedly terminates a connection. (Thus the client may have already received a 200 in the header, AFAIK.) This is consistent with the use of 4xx codes to indicate a client error condition.

A quick calculation showed me that the 499s accounted for only 0.2% of our total traffic. Not bad. And in fact, I sorta like the ability to see how many times clients terminated connections to my server.
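The calculation itself is easy to script. Here is a minimal sketch in PHP, assuming the access log uses the common "combined" format, where the status code follows the quoted request line. The function name `status_share` is made up for this example.

```php
<?php
/**
 * Return the fraction of log entries that carry the given status code.
 *
 * Assumes each entry looks like the "combined" log format:
 *   1.2.3.4 - - [date] "GET / HTTP/1.1" 499 0 "-" "agent"
 */
function status_share(array $lines, $status = 499) {
  $total = 0;
  $hits = 0;
  foreach ($lines as $line) {
    // The status is the first three-digit field after a closing quote.
    if (!preg_match('/" (\d{3}) /', $line, $m)) {
      continue; // Skip lines that don't look like access log entries.
    }
    $total++;
    if ((int) $m[1] === $status) {
      $hits++;
    }
  }
  return $total > 0 ? $hits / $total : 0.0;
}
```

To use it, read the log into an array with `file($path, FILE_IGNORE_NEW_LINES)` and multiply the result by 100 to get a percentage. The log path depends on your setup.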

Large MySQL Imports with GoDaddy: How to get your database imported

Every once in a while, I have some project that requires working with one of GoDaddy's servers. By far, the biggest frustration for me when dealing with GoDaddy is getting MySQL databases uploaded. I've tried all kinds of crazy tricks, from exporting MySQL databases in "bite sized chunks" to writing SQL processors that break large imports into smaller ones.

But today I think I found the Right Solution (TM) to the problem: Use GoDaddy's database restoration tool to load a large SQL file. Basically, instead of treating this as an initial import, we treat it as a restoration from a backup file.

Here's how you do it.

A QueryPath script for checking on a sitemap

I've been tuning our sitemap during the last few months, and one thing I needed was a quick tool to check on the effectiveness of various sitemap generation strategies.

To do this, I wrote a quick QueryPath script (see a full-sized image of the output). The script is explained below.

The code is pretty straightforward. It simply retrieves a URL, parses the sitemap contents, and then sorts them. Finally, it displays the top 100 entries. I've tested it on sitemaps with over 20,000 items. While it is a little slow on such a large document, it works fine.

#!/usr/bin/env php
<?php
require 'QueryPath/QueryPath.php';

// Maximum number of entries to print.
define('MAX_ITEMS', 100);

$sitemap = 'http://example.com/sitemap.xml';

$urls = array();
print "Parsing sitemap...\n";
// Select every <loc> inside a top-level <url> element.
$qp = qp($sitemap, ':root>url>loc');
$size = $qp->size();
$max = $size > MAX_ITEMS ? MAX_ITEMS : $size;
printf("Found %d entries; printing top %d\n\n", $size, $max);

try {
  foreach ($qp as $url) {
    $loc = $url->text();
    // <priority> is a sibling of <loc> within the same <url>.
    $score = $url->nextAll('priority')->text();
    $urls[$loc] = $score;
  }
} catch (Exception $e) {
  print $e->getMessage();
}

// Sort by score, highest first, keeping URL keys intact.
arsort($urls);

$filter = "%d: %0.5f  %s\n";

$i = 0;
foreach ($urls as $uri => $score) {
  if ($i++ == $max) break;
  printf($filter, $i, $score, $uri);
}
?>

Basically, the script above simply fetches all of the URLs out of the sitemap, and then sorts them by their corresponding score. Only the top MAX_ITEMS (100) are shown.

5 Differences: Moving from XML Sitemap module to Google's Sitemap Generators

For a large site that I maintain, we recently disabled the XML Sitemap module (we're using the 1.x branch) and switched to the Google Sitemap Generators tool (the Python one). We have noticed a few unsurprising things, and a few very surprising things.

We identified five big differences (all positive) that we have seen since moving to the Google Sitemap Generators Python tool.

Downtime-free Drupal Migration

In January we migrated a Drupal site that routinely has 40k+ hits per day. We moved the site from servers in the Pacific Northwest to a datacenter in Virginia. As if that wasn't enough, we moved the servers from Apache to Nginx, as well. But what makes this remarkable to me is that we managed to pull this off without so much as a minute of downtime. This blog explains how we did it (and it uses lots of pretty diagrams, too!).

Google Scholar and RefMan: Configuring Scholar to give downloadable RIS references

Did you know that you can configure Google Scholar to provide RIS download links?

RIS is an industry-standard format for importing and exporting bibliography information. Recently I posted a PHP library for working with RIS files. I wanted to find a good search tool that would allow me to find articles, and then download them into Lantern (a project I will release soon).

RefMan is a popular tool that also uses the RIS format. So to enable RIS downloads, simply tell Google Scholar to provide RefMan support.

Here are the steps to do this:

  1. Log into Google Scholar (http://scholar.google.com)
  2. Click on scholar preferences next to the Search button.
  3. Scroll to the bottom of the configuration screen to Bibliography Manager and choose RefMan

Here's a screenshot showing the last step.

Once you have saved those preferences, every article in your search results should have an Import into RefMan link next to it.

LibRIS: A PHP library for RIS parsing and writing

LibRIS is a library for parsing and writing RIS data.

Learn all about it at the official GitHub repository.

RIS is a data file format for handling reference metadata for scholarly resources. It is used by Reference Manager, EndNote, and other such tools. For that reason, it is broadly supported by online scholar-centered sites.

This library provides a simple interface for parsing and writing RIS data for bibliography management.
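To give a feel for the format, here is a minimal sketch of an RIS reader in plain PHP. This is not LibRIS's API (see the repository for that); `parse_ris` is a hypothetical function written just for illustration. It assumes the usual RIS layout, where each line starts with a two-character tag, two spaces, and a hyphen, and each record ends with an `ER` tag.

```php
<?php
/**
 * Parse RIS text into an array of records.
 *
 * Each record maps a tag (e.g. 'AU') to a list of values, since tags
 * such as AU (author) may repeat within one record.
 */
function parse_ris($text) {
  $records = array();
  $current = array();
  foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
    // Tag lines look like "AU  - Smith, J."
    if (!preg_match('/^([A-Z][A-Z0-9])  -\s?(.*)$/', $line, $m)) {
      continue; // Ignore blank or malformed lines in this sketch.
    }
    list(, $tag, $value) = $m;
    if ($tag === 'ER') {
      // End of record: store it and start a fresh one.
      if ($current) {
        $records[] = $current;
      }
      $current = array();
      continue;
    }
    $current[$tag][] = trim($value);
  }
  return $records;
}
```

Given a record with two `AU` lines, `$records[0]['AU']` holds both authors in order. A real parser also has to deal with continuation lines and character encodings, which this sketch skips.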