It's Time for an Alternative to PHP

Aug 22 2015

PHP has had a good run, and it is still the basis for a huge percentage of the websites out there. But it looks like its popularity may be waning. The thing is, PHP meets a very real need in the web development world. It's time for a replacement: something that meets the same needs, but is updated for the modern web.

First I'll briefly explain why I think PHP needs a replacement, then I'll sketch the features I think any suitable replacement must have.

Update: I changed Reason 1 from Apache to the more broadly descriptive Legacy, and added a short paragraph to the end of it.

Why Not PHP?

I developed using PHP for many years, following it from its last days in the 3.x versions to the middle of the 5.x versions. I wrote it concurrently with many other languages, including Java, Perl, Python, Ruby, and Go. I've had a great opportunity to see PHP's strengths and weaknesses over the years.

Right now, the weaknesses are strong enough to suggest that it's time to move to a new language.

Reason 1: Legacy

PHP is a legacy system. This shows in a few places, such as its continued entrenchment with Apache.

Apache is a phenomenal web server, and the simple fact of the matter is that it led the way for many emerging web technologies in the past. But it is now large and complicated compared to the newer HTTP servers on the market.

There remain two reasons to run the mammoth Apache server:

  1. You run a large website with highly sophisticated needs for supporting a huge variety of new and aged web technologies from only one type of server, and to do this, you need Apache's huge library of modules.
  2. You run PHP.

Running the complicated, large, and slow Apache server just to get PHP language support is suboptimal, to say the least. Yet PHP developers tend to have to do this in production to get "first class" support for PHP. And in many cases, they end up having to run a full Apache server locally for development.

Conversely, if you want to run PHP on a lighter web server, like nginx, be prepared to learn a lot about '90s technologies like CGI and FastCGI. And prepare to make some code changes.

Updated: PHP shows its age elsewhere, too. It still does not have even rudimentary Unicode support. It retains quaint attachments to the CGI style of programming. It remains page-request oriented, generally incapable of retaining state across requests. It's a language for Web 1.0.

Reason 2: Security

Security remains the bane of PHP. As continued vulnerabilities in top-tier PHP platforms like Drupal and WordPress show us, it's hard to prevent SQL injection attacks in PHP. And it has been the platform for many other types of exploits, including those involving remote code execution, cross-site scripting, and form injection.

To be fair, it's not simply a matter of PHP having fewer security features than other languages. Were Python to achieve PHP's success, and were it to be as web-centric, doubtless we would see many similar errors.

But this is where a new alternative could shine. A language designed for server-side web programming on the modern web could and should make it much, much harder to make security mistakes.

Reason 3: Language

PHP is a huge gateway language. Many developers cite it as their first real programming language (along with JavaScript). But as a programming language, PHP is slapdash. There's the long-running joke about needles and haystacks (or is it haystacks and needles?!). There's the ubiquitous non-array type called array. There is a mishmash of C-style procedural functions and later OO classes. I could go on and on, but the bottom line is that it is not a clean and straightforward language. Some of the coding habits one acquires in PHP development take considerable retraining to shake off when learning other languages.

Interlude: Why Others Have Failed To Oust PHP

Other languages have come and gone during PHP's reign. mod_perl. CFML. Ruby on Rails. JSP. Node.js. Why can't even these decent technologies bump PHP off the map? The reason is that PHP makes web development ridiculously easy. It is 100% web oriented.

Unlike CFML and JSP, it doesn't require a buy-in to some bigger enterprisey framework, with supporting Java code to do anything interesting.

Unlike Ruby, Node, and Perl, it's not a web add-on for an existing language.

PHP is a highly successful domain specific language. And that is a Really Good Thing (TM).

What Could Replace It?

If we were to build a new language to compete with PHP, what features would it have?

Small Is Beautiful

From the ground up, the language would be built for running on small web servers. PHP's inclusion of a built-in webserver was just a little too late, but it is a fantastic idea. Any new language should be able to do that.

And configuring it to run on production servers should be easy, too.
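
To make the "built-in web server" idea concrete, here is a minimal sketch using Python's standard http.server module as a stand-in. Python is not the language I am proposing, and the names HelloHandler and make_server are hypothetical, chosen only for illustration. The point is how little ceremony a development server should need.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a small HTML page."""

    def do_GET(self):
        body = b"<!doctype html><title>Hello</title><p>Hello, web.</p>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the sketch quiet; the default implementation logs to stderr.
        pass


def make_server(host="127.0.0.1", port=8000):
    """Build (but do not start) a development server; port 0 picks a free port."""
    return ThreadingHTTPServer((host, port), HelloHandler)
```

Calling make_server().serve_forever() would start serving on port 8000. A language built for the web could reasonably shrink even this down to a single command.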

First-Class Cloud Citizen

PHP nailed its market in the late '90s. People routinely ran their web apps through hosting providers. And the huge hosting industry grew up around the LAMP stack.

Today, hosting is fading to the background, and cloud is the new target platform. Think virtual machines, containers, object storage, and microservices.

How easy could we make it to run a "cloud native" web language? Anything that could fill PHP's philosophical shoes needs to be as easy as possible to run in the cloud -- whether that's in containers, VMs, or even unikernels. (In fact, there may be an argument to be made for writing a unikernel-first language.)

Web Specific Libraries

The standard installation of the language must provide abundant support for common web functionality. Above all else, this is to be the focal point of the language.

Standard libraries can be broken into two categories: Basic web, and advanced web/internet.

Basic Web:

  • HTML templating like PHP's Twig.
  • Strong support for CSS Selectors as a method for querying HTML.
  • Styles written in the language, and CSS generated as output.
  • Browser-side scripts written in the language, and JavaScript generated as output. (But also an ability to use raw JavaScript.)
  • Forms written in the language, HTML forms generated as output.
  • Native support for sessions.
  • Abstraction for client-server out-of-band communications (e.g. WebSockets or HTTP/2 channels). This should be strongly linked to the script generator, but be easy for pure JavaScript (etc.) to interact with.
  • An unambiguous Request/Response model.
  • Rich string processing, oriented toward working with common Web strings like URLs.
  • Transparent HTTP/2 support.
  • A generic model library (CRUD-oriented) that makes it easy to define a model, and easy to implement a model backend. (Diatribe: How many model layers does every language actually need? Maybe we could build one in and start with the premise that the answer is "we only need one.")
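
As one small illustration of the "rich string processing" item in the list above, here is a sketch built on Python's urllib.parse. The helper with_query is a hypothetical name of my own, not part of any library. It shows the kind of URL manipulation that should be a one-liner in a web-first language, with encoding handled for you.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_query(url, **params):
    """Return `url` with `params` merged into its query string."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)          # existing parameters, as lists
    for key, value in params.items():
        query[key] = [str(value)]          # new values replace old ones
    # urlencode percent-encodes values, so callers never hand-build queries.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))
```

For example, with_query("https://example.com/search?q=php&page=1", page=2) keeps the q parameter and replaces page, and a value such as "a b&c" is encoded rather than being allowed to corrupt the query string.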

Advanced Web/Internet

  • Each request is handled on its own "thread" (concurrency unit), but a master "context" for a program can be accessed to store and retrieve "globals" that persist across requests. Other than initialization (start-up of app) and shutdown (tear-down of app), code cannot be executed in the global context.
  • A full TCP/IP library for supporting other protocols.
  • A Codec trait, with several common encoders and decoders included. JSON, XML, and YAML are obvious candidates.
  • Ability to access low-level HTTP/2 details, like data frames vs. header frames.
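
The first item in that list, one concurrency unit per request plus a master context for shared state, can be sketched roughly as follows. This is Python with plain threads, used only as an approximation. AppContext, handle_request, and serve_all are hypothetical names, and a real design would likely hide the locking entirely.

```python
import threading


class AppContext:
    """Hypothetical master context: shared values that outlive a single request."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def get(self, key, default=None):
        with self._lock:
            return self._values.get(key, default)

    def update(self, key, func, default=None):
        """Atomically replace the value at `key` with func(current value)."""
        with self._lock:
            self._values[key] = func(self._values.get(key, default))
            return self._values[key]


def handle_request(ctx, path):
    """Hypothetical request handler; each call may run in its own thread."""
    hits = ctx.update("hits", lambda n: n + 1, default=0)
    return f"{path} served, {hits} total hits"


def serve_all(ctx, paths):
    """Run one thread per request and wait for all of them to finish."""
    threads = [threading.Thread(target=handle_request, args=(ctx, p)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Handlers only touch shared state through the context, so the counter stays correct no matter how the requests interleave. That is the property PHP's page-per-request model cannot offer without an external store.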

Security Starts In The Language

Most languages are constructed to give the programmer as much leeway as possible. That is great for a general purpose or system programming language. But for a web-specific language, we can list out a number of things that we would rather be able to do brainlessly and quickly than trade off for deep functionality.

Here are some ready examples:

  • HTML/JS/CSS sanitization: By default, the language should make it easy to send all HTML, JS, and CSS through a sanitization layer en route to printing. An example of this is Go's html/template package. In contrast, it should be hard (but not impossible) to send unfiltered HTML, CSS, and JS to the browser.
  • Preparing and sanitizing database statements: The easy first step is to require all statements to be executed like prepared statements. String-building should not be supported at all.
  • Using SSL/TLS: Enabling SSL in the engine should be dead simple, and accessing security information inside the language should be even easier. For starters, the program ought to be able to easily determine whether or not the current request/response is protected by SSL.
  • Secure sessions and authentication: There is no reason that today's web developer should have to write code to generate secure session tokens, or to write basic auth handlers. The language should provide decent secure authentication out of the box, and should continue to evolve along with emerging best practices.
  • Form processing: Input validation and protection from XSS and various injection attacks are both features that should just come with the language engine. This includes supporting secure file uploads.
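
Three of the defaults above (escaped output, bound query parameters, and generated session tokens) can be sketched with nothing but Python's standard library. This only approximates what I have in mind: here the safe path is a convention the programmer must follow, whereas the language I am describing would make it the path of least resistance. All function names and the users table are hypothetical.

```python
import html
import secrets
import sqlite3


def render_comment(author, text):
    """Escape every interpolated value before it reaches the page."""
    return f"<p><b>{html.escape(author)}</b>: {html.escape(text)}</p>"


def find_user(conn, name):
    """Look up a user with a bound parameter; `name` is never spliced into the SQL."""
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cursor.fetchone()


def new_session_token():
    """Generate an unguessable, URL-safe session token from 32 random bytes."""
    return secrets.token_urlsafe(32)


def demo_db():
    """Build a throwaway in-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
    return conn
```

With this in place, a classic injection string such as alice' OR '1'='1 is treated as an odd-looking name that matches nobody, and a comment containing a script tag is rendered as inert text.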

A Gateway Language

Because web development is such a common entry to computer programming, this language should be conducive to teaching good programming techniques and terminology, while also being straightforward.

A few such features come readily to mind for me:

  • Unicode (with UTF-8) is default and is supported to the core.
  • A flexible type system. Python's type system is a good example of a loose type system that is also flexible. Statically typed languages are not great for typical web development, and practically require generics and collections if they are to make web development convenient. That trade-off is not worth it.
  • Strong list and map collections, decent extended collections. Today's web development doesn't need low-level fixed arrays, nor is it imperative that the language support the rich collections libraries you see in languages like Java. But... really flexible list and map implementations form a CS foundation that is practical. For a first pass, I would even argue that a language like this should have only TWO collections: A growable list and an ordered hash map.
  • Syntax that is simple, but not whitespace-oriented. JavaScript, Ruby, Elixir and Go are all quite elegant (in different ways) in this regard.
  • Conceptually simple OO-like language. Procedural and functional programming languages each have their strong points. But OO is a good common ground when it comes to straddling the language's dual goals. It is good for web development, and it is good as a gateway for introducing other languages. I think favoring a Class/Object/Trait/Composition style language would be great.
  • Memory managed, garbage collected, and defaulting to pass-by-reference -- all in the name of ease of use.
  • Built-in dependency management system that is similar to Ruby Bundler, NPM, or Composer.
  • Concurrency via message passing. I love Elixir's actor-based concurrency model. Go's CSP-style channels are great, too. Both make it easy to write concurrent programs while avoiding many of the difficulties imposed by thread-based languages.
  • It should be trivially easy to write an entry point to the application that simply contains HTML. This does not entail embedding the language inside of HTML like PHP. But achieving a similarly quick HTML page is desirable.
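
To show how far just two collections can go, here is a small sketch in Python, whose list is growable and whose dict preserves insertion order (guaranteed since Python 3.7). The functions tally and top_pages are hypothetical examples of everyday web bookkeeping, not a proposal for a standard library.

```python
def tally(words):
    """Count items, remembering the order in which each first appeared."""
    counts = {}                            # ordered hash map
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def top_pages(hits, limit=3):
    """Return the `limit` most-requested paths as a list, busiest first."""
    counts = tally(hits)
    ranked = sorted(counts, key=counts.get, reverse=True)   # growable list
    return ranked[:limit]
```

Nothing here needs a set, a deque, or a fixed array, which is the argument for starting with two collections and adding more only when real programs demand them.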

But should it be a scripting language? Honestly, I think people make too much of this point. The fact is that for a scripting language of significant complexity (like Python, PHP, Ruby, and Perl), the distinction between interpretation time and runtime introduces as much frustration as skipping compilation is supposed to save. And even in a weakly typed language, many errors can be caught at compile time.

Elixir and Go both have modes that are more script-like and modes that compile. Perhaps this is the way of any future web language.

Non-Goals

The following things should not be goals for this language:

  • Providing a "general purpose" language, and a web add-on.
  • Focusing on high-performance, scientific, enterprise, or distributed uses of the language.
  • Building a "pure" (academic) language.

Where To Start?

I can honestly say that I haven't seen any language that I think is a great basis for such a project.

Elixir, Go, and Rust each have some desirable elements to be extracted, and PHP itself should indeed be the bar which any such language must surpass. But this may honestly be a case where a new language is necessary.