NoSQL No More

Apr 11 2014

I work at Revolv, and right now we're just about finished with a major migration off of a NoSQL database. We're moving onto a traditional RDBMS (PostgreSQL, to be precise). Many people seem surprised that we, a tech-savvy startup, would be moving to an "old" technology. But in fact our reasoning is pretty simple: A relational system better meets our needs and is more cost-effective.

There are three main reasons why I say this.

1. Our data is relational

NoSQL databases are notorious for not handling relations gracefully. NoSQL design eschews relational modeling by suggesting that objects ought to be stored in such a way that relations are not needed as often. For example, instead of storing a "chair" with a relation to four "legs", each leg should simply be stored on the chair's record. There is no separate "legs" table. At times, there is definite value in this approach.

But what about data that really is relational? When it comes to authors and articles, for example, copying author data onto every single article object results in severe data de-normalization. Copies of author bios everywhere! Conversely, storing articles as attributes of an author adds unnecessary complexity when, say, retrieving a list of the most recent articles for all authors. And the more complex and numerous the relationships, the harder it becomes to model and efficiently query in a NoSQL database. Our data is far more relational than this example. It is filled with many-to-many relationships where no particular record type takes precedence as a "first class" type under which others may be subservient.

Some data fits the NoSQL model well. That's great. But some data simply does not fit the assumptions of NoSQL design. Shoe-horning data into a database--making concessions on structure, query-ability, and performance--is a good indication that you're using the wrong tool for the job.

2. We need better querying

Our NoSQL-based application was getting pretty heavy on a specific kind of logic: building up and executing queries. With no intermediary query language, we ended up writing special code for every single query. And each language's API was different to the point of requiring different techniques when executing database operations on different tools.

And then there were security considerations, type checking, query performance, and so on. We had to write code for each language we used.

This whole scenario got more problematic when we needed to run ad hoc queries. When asked "How many X's do we have that are running Y?", we could only respond with, "Give me a few days to code up a query, test it, deploy it, and then I'll let you know." And sometimes we'd come back with the response that writing the query was too time consuming to justify the expense.

At one point, things got so bad that it was faster to dump the entire database into CSV files and use Excel! Maybe this really is the right solution for ad hoc queries, but it just seems dirty and inelegant.

Now before anyone gets too bent out of shape, it is true that part of this is an attribute of the particular NoSQL database that we selected. The query API for our old NoSQL database is primitive and there is no higher-level query language. The same does not hold true for all NoSQL databases. But few NoSQL databases have complete query languages built-in. (Surprisingly few have command-line monitors that support ad hoc queries.) Switching to a SQL-based database completely alleviates this problem.

3. We have access to better resources

Frankly, it's frustrating to have to build and rebuild even trivial tools for NoSQL databases. And it's equally frustrating to find so many knowledge gaps and documentation shortages. So much time is spent on figuring out and doing the mundane. The SQL database world just looks much better on this front.

There are SQL workbenches. Their are data visualizers. Backup tools. Import tools. Fantastic libraries in all major languages. There are 60 years of theory, hundreds of books, and thousands of articles. There is a degree of consistency (though not perfect) across most SQL-based RDBM systems. There are tried and true methodologies and patterns to follow. There are techniques for scaling, for segmenting, and for replicating -- techniques that work.

And yet with all of this, SQL databases don't come with the kind of enterprise cruft or cognitive overhead that most technologies accrue over such a long lifetime. Most RDBM systems remain carefully focused on providing ACID reliability with high performance storage, not on adding crazy new fringe feature sets.

Even while there are several good NoSQL databases, none really boast the kind of fostering resource platform that RDBM systems have. The field is still so fragmented that I don't see this situation improving in the near future.

The bottom line: choose the right tool

I am at pains to emphasize that I am not bashing NoSQL in theory or practice. I am not claiming that NoSQL databases are valueless, or bad, or necessarily inferior. But right now the landscape is fragmented, inconsistent, and still immature. Great things can be built atop NoSQL databases, and NoSQL solutions are indeed the best for some classes of problems.

But there is a danger in assuming that just because NoSQL is newer it's better. Likewise, there is a danger in claiming that NoSQL solves some problems that RDBM systems do not solve, therefore NoSQL is always better than SQL-based solutions. I was, and still am, a big fan of some of the NoSQL databases out there (MongoDB maybe most of all). But I've found myself more inclined to choose the tool that naturally matches a data model. (This means, of course, that I do the data model before selecting the database.)

Finally, I've realized the value of an established ecosystem. It was fun to write a cutting edge document database using a shiny beta NoSQL storage engine. It was decidedly less fun to write tools to export data, automate tasks, run complex queries, and manage security. And it was downright painful to right one-off programs every time sales, marketing or somebody else needed to "see some numbers". SQL may not be sexy, but it sure is useful.