TechnoSophos

Writing a Kubernetes CRD Controller in Rust

2019-08-07T13:58:00+00:00

In this post, we'll define a Kubernetes Custom Resource Definition (CRD) and then write a controller (or operator) to manage it -- all in 60 lines of Rust code.

Over the last several months, I have been writing more and more Kubernetes-specific code in Rust. Even though Kubernetes itself was written in Go, I am finding that I can typically write more concise, readable, and stable Kubernetes code in Rust. For example, I recently wrote functionally equivalent CRD controllers in Rust and in Go. The Go version was over 1700 lines long and was loaded with boilerplate and auto-generated code. The Rust version was only 127 lines long. It was much easier to understand and debug... and definitely faster to write. Here, we'll write one in just 60 lines.

Getting Started

You should have the latest stable Rust release. You'll also need kubectl, configured to point to an existing Kubernetes cluster.

A controller runs as a daemon process, typically inside of a Kubernetes cluster. So we'll create a new Rust program (as opposed to a library). Our aim here is to provide a basic model for writing controllers, so we won't spend time breaking things down into modules. We also won't cover things like building a Rust Docker image or creating a Deployment to run our controller. All of that is well documented elsewhere.

Let's start by creating our new project:

$ cargo new k8s-controller
     Created binary (application) `k8s-controller` package

Before we start writing code, let's create two YAML files. The first is our CRD definition, and the second is an instance of that CRD. We'll create a directory in k8s-controller/ called docs/ and put our YAML files there.

The Custom Resource Definition looks like this:

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: books.example.technosophos.com
spec:
  group: example.technosophos.com
  versions:
    - name: v1
      served: true
      storage: true
  scope: Namespaced
  names:
    plural: books
    singular: book
    kind: Book

Stepping through this file is beyond the scope of this tutorial, but you can learn all about this file format in the official docs. (Recent versions of Kubernetes added more fields to the definition, but we're going to stick with a basic version.) A CRD is just a manifest that declares a new resource type and expresses the names that are associated with this new resource type. The full name of ours is books.example.technosophos.com/v1.

Next, let's make an instance of our Book CRD.

apiVersion: example.technosophos.com/v1
kind: Book
metadata:
  name: moby-dick
spec:
  title: Moby Dick
  authors:
    - Herman Melville

As with most Kubernetes resource types, our example above has two main sections:

metadata, which is a predefined metadata section
spec, which holds our custom body

We can quickly test to make sure that things are working:

$ kubectl create -f docs/crd.yaml
customresourcedefinition.apiextensions.k8s.io "books.example.technosophos.com" created
$ kubectl create -f docs/book.yaml
book.example.technosophos.com "moby-dick" created
$ kubectl delete book moby-dick
book.example.technosophos.com "moby-dick" deleted

We now have everything we need to start coding our new controller.

Setting up our `Cargo.toml` file

Rather than incrementally adding dependencies to our Cargo.toml file as we go, we'll just set up all of the dependencies now. As the text progresses, we'll see how these are used.

[package]
name = "k8s-controller"
version = "0.1.0"
edition = "2018"

[dependencies]
kube = "0.14.0"
serde = "1.0"
serde_derive = "1.0"
serde_json = "1.0"

The serde serialization libraries are likely already familiar to you. And kube is the Kubernetes library for writing controllers. (Another library, k8s_openapi, is useful for working with existing Kubernetes resource types, but we don't need it here.)

Part 1: Create the Book Struct

The first piece of code we'll write is a struct that represents our book CRD. And the easiest way to start with that is to write the basic struct that defines the body (spec). In our book.yaml we had two fields in spec:

title: The book's title
authors: A list of authors

Since we're just writing a quick example, we'll go ahead and create this struct inside of main.rs:

#[macro_use]
extern crate serde_derive;

// This is our new Book struct
#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct Book {
    pub title: String,
    pub authors: Option<Vec<String>>,
}

// This was the boilerplate that Cargo generated:
fn main() {
    println!("Hello, world!");
}

By making the title a string and the authors an Option, we're stating that the title is required, but the authors are not. So now we have:

A title string
An optional vector of authors as strings
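To make the required/optional distinction concrete, here is a minimal sketch of how code might consume such a struct. The Book below is a stand-in for the one above with the Serde derives left out so that it compiles with the standard library alone, and describe() is a hypothetical helper that is not part of the controller.

```rust
// Stand-in for the article's Book struct, minus the Serde derives so the
// sketch compiles with the standard library alone.
#[derive(Clone, Debug)]
pub struct Book {
    pub title: String,
    pub authors: Option<Vec<String>>,
}

// Hypothetical helper (not in the original post): build a one-line
// description, covering both the "authors present" and "authors missing" cases.
fn describe(book: &Book) -> String {
    match &book.authors {
        Some(authors) if !authors.is_empty() => {
            format!("{} by {}", book.title, authors.join(", "))
        }
        // Either the field was omitted or the list was empty.
        _ => format!("{} (no authors listed)", book.title),
    }
}

fn main() {
    let with_authors = Book {
        title: "Moby Dick".to_string(),
        authors: Some(vec!["Herman Melville".to_string()]),
    };
    let without_authors = Book {
        title: "Beowulf".to_string(),
        authors: None,
    };
    println!("{}", describe(&with_authors));
    println!("{}", describe(&without_authors));
}
```

Because authors is an Option, the compiler forces us to say what happens when it is missing. The title, being a plain String, is always there.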

We've also used macros to generate the Serde serializer and deserializer features as well as clone and debug support.

If we look again at our book.yaml, we will see that the body of the book has two sections:

metadata with the name
spec with the rest of the data

Some Kubernetes objects have a third section called status. We don't need one of those.

The kube library is aware of this metadata/spec/status pattern. So it provides a generic type called kube::api::Object that we can use to create a Kubernetes-style resource. To make our code easier to read, we'll create a type alias for this new resource type:

// Describes a Kubernetes object with a Book spec and no status
type KubeBook = Object<Book, Void>;

A kube::api::Object already has the metadata section defined. But it gives us the option of adding our own spec and status fields. We add Book as the spec, but we don't need a status field, so we set it to Void.

Here's the code so far:

#[macro_use]
extern crate serde_derive;

use kube::api::{Object, Void};

#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct Book {
    pub title: String,
    pub authors: Option<Vec<String>>,
}

// This is a convenience alias that describes the object we get from Kubernetes
type KubeBook = Object<Book, Void>;

fn main() {
    println!("Hello, world!");
}

Now we're ready to work on main().

Part 2: Connecting to Kubernetes

Next, we'll create the controller in the main() function. We'll take this in a few steps. First, let's load all of the information we need in order to work with Kubernetes.

#[macro_use]
extern crate serde_derive;

use kube::{
    api::{Object, Void, RawApi},
    client::APIClient,
    config,
};


#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct Book {
    pub title: String,
    pub authors: Option<Vec<String>>,
}

// This is a convenience alias that describes the object we get from Kubernetes
type KubeBook = Object<Book, Void>;

fn main() {
    // Load the kubeconfig file.
    let kubeconfig = config::load_kube_config().expect("kubeconfig failed to load");

    // Create a new client
    let client = APIClient::new(kubeconfig);

    // Set a namespace. We're just hard-coding for now.
    let namespace = "default";

    // Describe the CRD we're working with.
    // This is basically the fields from our CRD definition.
    let resource = RawApi::customResource("books")
        .group("example.technosophos.com")
        .within(&namespace);

}

If we run this program it won't do anything visible. But here's what's happening in the main() function:

First we load the kubeconfig file (or, in cluster, read the secrets out of the volume mounts). This loads the URL to the Kubernetes API server, and also the credentials for authenticating.
Second, we create a new API client. This is the object that will communicate with the Kubernetes API server.
Third, we set the namespace. Kubernetes segments objects by namespaces. In a normal program, we'd provide a way for the user to specify a particular namespace. But for this, we'll just use the default built-in namespace.
Fourth, we create a resource that describes our CRD. We'll use this in a bit to tell the informer which things it should watch for.

So now we have sufficient information to run operations against the Kubernetes API server for our particular namespace and watch for our particular CRD.
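As noted in the third step, a real program would let the user choose the namespace rather than hard-coding it. One possible way to do that with only the standard library is sketched below. The pick_namespace() helper and the BOOK_NAMESPACE environment variable are names invented for this example; nothing in kube requires them.

```rust
use std::env;

// Hypothetical helper: prefer a command-line value, then an environment
// value, and fall back to the built-in "default" namespace.
// Empty strings are treated as "not set".
fn pick_namespace(cli_arg: Option<String>, env_value: Option<String>) -> String {
    cli_arg
        .filter(|ns| !ns.is_empty())
        .or_else(|| env_value.filter(|ns| !ns.is_empty()))
        .unwrap_or_else(|| "default".to_string())
}

fn main() {
    // First positional argument, e.g. `cargo run -- library`.
    let cli_arg = env::args().nth(1);
    // BOOK_NAMESPACE is an invented variable name for this sketch.
    let env_value = env::var("BOOK_NAMESPACE").ok();

    let namespace = pick_namespace(cli_arg, env_value);
    println!("Watching namespace '{}'", namespace);
}
```

The resulting string could then be passed to .within() in place of the hard-coded value.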

Next, we can create an informer.

Part 3: Creating an Informer

In Kubernetes parlance, an informer is a special kind of agent that watches the Kubernetes event stream and informs the program when a particular kind of resource triggers an event. This is the heart of our controller.

There is a second kind of watching agent that keeps a local cache of all objects that match a type. That is called a reflector.

In our case, we're going to write an informer that tells us any time anything happens to a Book.

Here's the code to create an informer and then handle events as they come in:

#[macro_use]
extern crate serde_derive;

use kube::{
    api::{Object, RawApi, Informer, WatchEvent, Void},
    client::APIClient,
    config,
};

#[derive(Serialize, Deserialize, Clone, Debug)]
pub struct Book {
    pub title: String,
    pub authors: Option<Vec<String>>,
}

// This is a convenience alias that describes the object we get from Kubernetes
type KubeBook = Object<Book, Void>;

fn main() {
    // Load the kubeconfig file.
    let kubeconfig = config::load_kube_config().expect("kubeconfig failed to load");

    // Create a new client
    let client = APIClient::new(kubeconfig);

    // Set a namespace. We're just hard-coding for now.
    let namespace = "default";

    // Describe the CRD we're working with.
    // This is basically the fields from our CRD definition.
    let resource = RawApi::customResource("books")
        .group("example.technosophos.com")
        .within(&namespace);

    // Create our informer and start listening.
    let informer = Informer::raw(client, resource).init().expect("informer init failed");
    loop {
        informer.poll().expect("informer poll failed");

        // Now we just do something each time a new book event is triggered.
        while let Some(event) = informer.pop() {
            handle(event);
        }
    }
}

fn handle(event: WatchEvent<KubeBook>) {
    println!("Something happened to a book")
}

In the code above, we've added a new informer:

let informer = Informer::raw(client, resource).init().expect("informer init failed");

This line creates a raw informer. A raw informer is one that does not use the Kubernetes OpenAPI spec to decode its contents. Since we are using a custom CRD, we don't need the OpenAPI spec. Note that we give this informer two pieces of information:

A Kubernetes client that can talk to the API server
The resource that tells the informer what we want to watch for

Based on these pieces of information, our informer will now connect to the API server and watch for any events having to do with our Book CRD. Next, we just need to tell it to keep listening for new events:

loop {
    informer.poll().expect("informer poll failed");

    // Now we just do something each time a new book event is triggered.
    while let Some(event) = informer.pop() {
        handle(event);
    }
}

The above tells the informer to poll the API server. While events remain in the queue, pop() takes the next one off and we pass it to handle(). Right now, our handle() function is unimpressive:

fn handle(event: WatchEvent<KubeBook>) {
    println!("Something happened to a book")
}
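If the poll-then-drain shape of that loop is unfamiliar, the toy below mimics it with a plain queue. To be clear, ToyInformer is an invented stand-in, not the kube crate's Informer: it reads canned batches instead of talking to an API server, and unlike the real loop it stops when the batches run out.

```rust
use std::collections::VecDeque;

// Toy stand-in for an informer. It only mimics the poll-then-drain
// shape of the loop above.
struct ToyInformer {
    // Batches of events the pretend server has not delivered yet.
    pending: Vec<Vec<String>>,
    // Events fetched by poll() and waiting to be handled.
    queue: VecDeque<String>,
}

impl ToyInformer {
    fn new(mut batches: Vec<Vec<String>>) -> Self {
        // Reverse so that Vec::pop() hands batches back in their original order.
        batches.reverse();
        ToyInformer { pending: batches, queue: VecDeque::new() }
    }

    // Fetch the next batch, if any, and queue its events.
    // Returns false once the pretend server has nothing left.
    fn poll(&mut self) -> bool {
        match self.pending.pop() {
            Some(batch) => {
                self.queue.extend(batch);
                true
            }
            None => false,
        }
    }

    // Take the oldest queued event.
    fn pop(&mut self) -> Option<String> {
        self.queue.pop_front()
    }
}

// Drive the loop to completion and return the handled events in order.
fn run(mut informer: ToyInformer) -> Vec<String> {
    let mut handled = Vec::new();
    while informer.poll() {
        while let Some(event) = informer.pop() {
            handled.push(event);
        }
    }
    handled
}

fn main() {
    let informer = ToyInformer::new(vec![
        vec!["added moby-dick".to_string()],
        vec!["modified moby-dick".to_string(), "deleted moby-dick".to_string()],
    ]);
    for event in run(informer) {
        println!("handled: {}", event);
    }
}
```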

In a moment, we'll add some features to handle(), but first let's see what happens if we run this code.

In one terminal, start cargo run and leave it running.

$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 7.28s
     Running `target/debug/k8s-controller`

Make sure your local environment is pointed to a Kubernetes cluster! Otherwise neither cargo run nor the kubectl commands will work. And make sure you installed docs/crd.yaml.

Now, with that running in one terminal, we can run this in another:

$ kubectl create -f docs/book.yaml
# wait for a bit
$ kubectl delete book moby-dick

In the cargo run console, we'll see this:

    Finished dev [unoptimized + debuginfo] target(s) in 7.28s
     Running `target/debug/k8s-controller`
Something happened to a book
Something happened to a book

In the final section, we'll add a little more to the handle() function.

Part 4: Handling Events

In this last part, we'll add a few more things to the handle() function. Here is our revised function:

fn handle(event: WatchEvent<KubeBook>) {
    // This will receive an event each time something happens to a book.
    match event {
        WatchEvent::Added(book) => {
            println!("Added a book {} with title '{}'", book.metadata.name, book.spec.title)
        },
        WatchEvent::Deleted(book) => {
            println!("Deleted a book {}", book.metadata.name)
        }
        _ => {
            println!("another event")
        }
    }
}

Note that the function signature says that it accepts event: WatchEvent<KubeBook>. The informer emits WatchEvent objects that describe the event that it saw occur on the Kubernetes event stream. When we created the informer, we told it to watch for a resource that described our Book CRD.

So each time a WatchEvent is emitted, it will wrap a KubeBook object. And that object will represent our earlier YAML definition:

apiVersion: example.technosophos.com/v1
kind: Book
metadata:
  name: moby-dick
spec:
  title: Moby Dick
  authors:
    - Herman Melville

So we would expect that a KubeBook would have fields like book.metadata.name or book.spec.title. In fact, all of the attributes of our earlier Book struct will be available on the book.spec.

There are four possible WatchEvent events:

WatchEvent::Added: A new book CRD instance was created
WatchEvent::Deleted: An existing book instance was deleted
WatchEvent::Modified: An existing book instance was changed
WatchEvent::Error: An error having to do with the book watcher occurred

In our code above, we use a match event to match on one of the events. We explicitly handle Added and Deleted, but capture the others with the generic _ match.

To look closer, in the first match we simply print out the book object's name and the book's title:

WatchEvent::Added(book) => {
    println!("Added a book {} with title '{}'", book.metadata.name, book.spec.title)
},
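The same pattern extends to the other two events. Here is a minimal sketch that handles all four kinds without a catch-all arm. BookEvent and BookSummary are local stand-ins invented for this example (they are not kube types, and the payload shapes are assumptions), so it runs with the standard library alone.

```rust
// Local stand-ins so the sketch compiles without the kube crate. The variant
// names mirror the four events listed above; the payload types are invented.
#[derive(Debug)]
struct BookSummary {
    name: String,
    title: String,
}

#[derive(Debug)]
enum BookEvent {
    Added(BookSummary),
    Modified(BookSummary),
    Deleted(BookSummary),
    Error(String),
}

// Turn each event into a log line. With no `_` arm, the compiler rejects
// this match if a variant is ever left unhandled.
fn describe(event: &BookEvent) -> String {
    match event {
        BookEvent::Added(book) => {
            format!("Added a book {} with title '{}'", book.name, book.title)
        }
        BookEvent::Modified(book) => {
            format!("Modified a book {} (title is now '{}')", book.name, book.title)
        }
        BookEvent::Deleted(book) => format!("Deleted a book {}", book.name),
        BookEvent::Error(message) => format!("Watch error: {}", message),
    }
}

fn main() {
    let moby = || BookSummary {
        name: "moby-dick".to_string(),
        title: "Moby Dick".to_string(),
    };
    let events = vec![
        BookEvent::Added(moby()),
        BookEvent::Modified(moby()),
        BookEvent::Deleted(moby()),
        BookEvent::Error("connection reset".to_string()),
    ];
    for event in &events {
        println!("{}", describe(event));
    }
}
```

Dropping the `_` arm is a design choice: it trades a little verbosity for a compile-time reminder whenever a new kind of event needs handling.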

If we execute cargo run and then run our kubectl create and kubectl delete commands again, this is what we'll see in the cargo run output:

$ cargo run
   Compiling k8s-controller v0.1.0 (/Users/technosophos/Code/Rust/k8s-controller)
    Finished dev [unoptimized + debuginfo] target(s) in 5.33s
     Running `target/debug/k8s-controller`
Added a book moby-dick with title 'Moby Dick'
Deleted a book moby-dick

From here, we might want to do something more sophisticated with our informer. Or we might want to instead experiment with a reflector. But in just 60 lines of code we have written an entire Kubernetes controller with a Custom Resource Definition!

Conclusion

That is all there is to creating a basic controller. Unlike writing these in Go, you won't need special code generators or annotations, gobs of boilerplate code, and complex configurations. This is a fast and efficient way of creating new Kubernetes controllers.

From here, you may want to take a closer look at the kube library's documentation. There are dozens of examples, and the API itself is well documented. You will also learn how to work with built-in Kubernetes types (also an easy thing to do).

The code for this post is available at github.com/technosophos/rust-k8s-controller

You can find the final code in the GitHub copy of main.rs

The TRUE hardest programming problem is tight vs. weak coupling

2018-08-19T22:16:00+00:00

A few months ago, I claimed that naming is the hardest programming problem. I was wrong. The true hardest problem is one that impacts every developer at every skill level, across all programming languages, regardless of experience. It appears on multiple levels, from language details to large scale distributed computing. It is equally applicable across all programming disciplines. And its impacts are monetary, hedonic, and cognitive.

The hardest problem in programming is this:

Should X and Y be tightly coupled or weakly coupled?

I was trained as a philosopher. Philosophers are a motley bunch, but if there is any generalization that is fair to level at all philosophers, it's this: Philosophers revel in questions that appear to have easy answers, but which, upon reflection, might just be intractable. This appears to be one of those.

What Do We Mean by Tight and Weak Coupling?

When we talk about coupling, we're talking about establishing a relationship between two things. While programming, we might couple two functions like this:

function x() {
  //do something
}

function y() {
  x()
}

By making y() dependent on x(), we've coupled them. But coupling is a more generic term. We could reverse the dependency (have x() call y()). We could have a mutual dependency (x() calls y(), which calls x()). We could add a layer of indirection, such as have y() call an implementation of interface X, and have x() be an implementation of X that is available to y(), and so on.
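Here is one way the direct call and the "layer of indirection" might look side by side, sketched in Rust. All of the names (Summer, PlainSum, PositiveOnlySum, total_direct, total_via) are invented for illustration; total_direct plays the role of y() calling x() directly, while total_via plays the role of y() calling any implementation of interface X.

```rust
// The role of x(): a concrete function that does the work.
fn sum_slice(numbers: &[i64]) -> i64 {
    numbers.iter().sum()
}

// The role of y(), directly coupled: it can only ever call sum_slice().
fn total_direct(numbers: &[i64]) -> i64 {
    sum_slice(numbers)
}

// The role of interface X: anything that can sum a slice.
trait Summer {
    fn sum(&self, numbers: &[i64]) -> i64;
}

// One implementation simply wraps the original function.
struct PlainSum;
impl Summer for PlainSum {
    fn sum(&self, numbers: &[i64]) -> i64 {
        sum_slice(numbers)
    }
}

// A second implementation that skips negative inputs, to show that the
// caller no longer decides which behavior it gets.
struct PositiveOnlySum;
impl Summer for PositiveOnlySum {
    fn sum(&self, numbers: &[i64]) -> i64 {
        numbers.iter().filter(|n| **n > 0).sum()
    }
}

// The role of y(), coupled through the interface: it depends on Summer,
// not on any particular implementation.
fn total_via(summer: &dyn Summer, numbers: &[i64]) -> i64 {
    summer.sum(numbers)
}

fn main() {
    let numbers = [1, -2, 3];
    println!("direct:        {}", total_direct(&numbers));
    println!("plain:         {}", total_via(&PlainSum, &numbers));
    println!("positive only: {}", total_via(&PositiveOnlySum, &numbers));
}
```

Neither version is "right." The second one buys flexibility at the cost of an extra abstraction that someone now has to design and maintain.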

This particular domain is especially interesting because it admits of multiple levels and abstractions. We can talk not just about functions being coupled, but other common things:

Libraries can be coupled
Programs can be coupled (program x calls program y...)
Services can be coupled (which is the basis for microservice architecture)
Data formats can be coupled (such as SGML, HTML, XML, SVG, and so on)
Protocols can be coupled
User interface elements can be coupled
Hardware and software can be coupled
Network stacks can be coupled
We can even get into high level techs like websites, phones, relations between datacenters, and so on.

And to really drive the point home, the new FaaS (Functions as a Service) paradigm is all about coupling "functions" that are each stand-alone services. The hype generated around this technology revolves around this idea that FaaS provides a desirable boundary across which "functions" couple.

So part of what makes this problem interesting is that it is likely to be hit by everyone from CS101 students to datacenter operators to architects to data scientists. Everyone in our space deals with coupling, and anyone making design decisions will have to make decisions about coupling.

Next, we need to distinguish between two different kinds of coupling, which many computer scientists refer to as tight and weak coupling.

Tight Coupling

It is easier to start with the more restrictive case, and that is tight coupling.

When establishing a relationship from X to Y, only the mutual needs of X and Y are considered.

What this says is that tight coupling consists in reducing a coupling problem to only two questions: What does X need to be successfully related to Y? And what does Y need to be successfully related to X?

In programming, this might play out as deciding how to break up programming logic into different functions, and then make calls from one function to another. At a higher level, two microservices are tightly coupled when one service's sole role is to fulfill the needs of one other service.

There's actually a very interesting question to ask at this point: Is coupling about intention or about reality? That is, when we say X is tightly coupled to Y, do we mean "X was designed in such a way as to be related to Y in an exclusive way?" Or do we mean, "X happens to be related to Y in an exclusive way"? It might be easiest to explain that in light of the microservice example above:

Option A: My microservice Y was designed only to interoperate with microservice X.

Option B: As it happens, microservice Y only interoperates with microservice X.

Clearly, there are many cases where Option B arises in the wild. But those cases are not particularly... how should we say it... philosophically interesting. There are no engaging problems to solve.

But when it comes to intentions, then we are onto something interesting. For in this case we can ask what ought to be done, and how we should make decisions.

At this point, we are talking about making design decisions that involve coupling X to Y by considering only the needs of X and Y.

Loose Coupling

Defining loose coupling now appears to be a boring exercise:

It is not the case that when establishing a relationship from X to Y, only the mutual needs of X and Y are considered.

But we can really ignore a bunch of logical cases we don't care about (in which we're focusing on the antecedent), do a little rewording, and give an account of loose coupling that is not a negation of strong coupling, but does provide a relevant alternative to strong coupling:

When establishing a relationship between X and Y, the mutual needs of X and Y are considered along with additional needs.

But "additional needs" is frustratingly vague.

Given a system of abstraction S, composed of individual component parts, when establishing the relationship between X (a component of S) and Y (a component of S), the needs of each component in S are considered as they pertain to the relationship between X and Y.

Note that since X and Y are both components of S, their needs are each considered. But have we just kicked the ambiguity up a level? Because now we are talking about components as they pertain to.... To get rid of this is going to be tedious.

Given a system of abstraction S, composed of individual component parts, and where X and Y are each component parts, when establishing a relationship from X to Y, the needs of each component's relationship to Y, and the needs of X's relationship to each component must be considered.

The problem with this definition is that we don't need to consider every possible relationship. That is, when looking at how function X calls function Y to sum a few numbers, we shouldn't also have to look at how X calls Z to check whether a string contains a substring. We just care about the particular relationship that is under scrutiny. (That is, what we are really asking is whether Y should or could be used by other components rather than just X, and, vice versa, whether X should be able to use components other than Y to perform the required summing task.)

So now we're onto a seriously frustrating definition:

Given a system of abstraction S, composed of individual component parts, and where X and Y are each component parts, when establishing a particular relationship R from X to Y, the needs of each component's relationship R to Y, and the needs of X's relationship R to each component must be considered.

Limiting the definition to just a particular relationship makes things less generic.

At this point, one might argue that we've gotten overzealous. Do we really need to consider each component in the system? The short answer is yes. Really, it's a yes, because.... It is perfectly legitimate to apply broad heuristics and say, "When considering the relationship R, I simply ruled out a whole bunch of components because they weren't directly relatable."

But wait! There's more! We need to pull off a very dangerous philosophical move and leave the realm of the existing system, entering the realm of the possible. Because we also need to say, "what if at some point I write new code that does Z... will it, too, need a relationship R with X?"

(If you're keeping track... we started with predicate logic, worked our way into set logic, and are now in modal logic. This problem is a massive pain in the butt.) So we need to revise our statement somewhat:

Given a system of abstraction S, composed of all possible individual component parts, and where X and Y are each possible component parts, when establishing a particular relationship R from X to Y, the needs of each component's relationship R to Y, and the needs of X's relationship R to each possible component must be considered.

Now the problem we skirted earlier might actually be a real problem. For while we can, with some justification, rule out a broad number of actual cases in our code, the set of possible components is highly likely to be substantially larger. Which means, I'm afraid, that we are going to have to cheat... err... be instrumental.

Given a system of abstraction S, composed of all possible individual component parts, and where X and Y are each possible component parts, when establishing a particular relationship R from X to Y, the needs of each component's relationship R to Y, and the needs of X's relationship R to each relevant possible component must be considered.

And now we use "relevance" to give us a cognitive safety net, fleshing it out "within scope of system S at time T (when the decision is being made)" and then declaring "within scope" to include a cognitive boundary. Or, to put it in plain English, "relevant" is shorthand for "stuff that seemed to me to be likely at the time."

Were this a proper philosophy paper, we would now revisit our definition of strong coupling, and would discover that we needed to enfancificate it as well. We'd add our systems wording, and our revised relationship wording, but it would still mean the same thing.

Instead, let's bump our definitions from the realm of set logic back to a simple grokkable definition:

When establishing a particular relationship between X and Y...

Strong coupling says we only considered the mutual needs of X and Y
Weak coupling says we consider the other relevant components in the system as well

How is this a Problem?

We've spent some serious wordcount just trying to explain the terms. But does any of this justify claiming that this distinction is at the heart of the hardest problem in programming?

Let's lay down the problem plainly: The hardest problem in programming is assessing, in any given circumstances, the myriad problems associated with coupling. Here's a two-pronged approach for illustrating just how deep the problem is. First, I'll pick a particular programming challenge, and show the breadth of issues associated with coupling. Then I'll list out a variety of broader circumstances, each of which will admit a similar breadth of issues. In other words, we're doing something like tracing the perimeter of an issue in order to assess the area of the issue.

XXX

Now we can enumerate the areas in which problems like the above might manifest:

Do I write generalized classes (getters and setters for all the things?) or just do what's necessary for now?
Do I make the class/function/interface/variable public or private?
Do I expose this part of the library as part of the public API/SDK or leave it internal?
Do I expose this information on the REST API?
Do I allow this data to be mutated, or just accessed?

All of these questions have at their core the question of whether the implementation is designed to tightly couple ("This API is private because only the internals should use it") or loosely couple ("This API is public, and thus I have to design for possible use cases").
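To see how small these decisions look in code, here is a hypothetical Rust module (the catalog module, Catalog type, and normalize() helper are all invented for this sketch). Each pub, or its absence, is an answer to one of the questions above.

```rust
mod catalog {
    pub struct Catalog {
        // Private field: outside code cannot mutate the list directly.
        titles: Vec<String>,
    }

    impl Catalog {
        // Public: part of the surface we have committed to supporting.
        pub fn new() -> Self {
            Catalog { titles: Vec::new() }
        }

        // Public, and the only way to change the data.
        pub fn add(&mut self, title: &str) {
            self.titles.push(normalize(title));
        }

        // Public, read-only access: callers can look but not mutate.
        pub fn titles(&self) -> &[String] {
            &self.titles
        }
    }

    // Private helper: designed only around the needs of add(), so we are
    // free to change or delete it without thinking about other callers.
    fn normalize(title: &str) -> String {
        title.trim().to_string()
    }
}

fn main() {
    let mut books = catalog::Catalog::new();
    books.add("  Moby Dick ");
    println!("{:?}", books.titles());

    // catalog::normalize("x"); // would not compile: normalize() is private
}
```

Making normalize() public later is easy. Making it private again after others depend on it is not, which is part of why the choice is hard.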

From Go To Rust - Advanced Testing

2018-07-25T01:08:00+00:00

For the fifth installment of this series, we'll take a look at benchmarking, documentation testing, and integration testing. As usual, we'll start with an example in Go and see how it translates to Rust. Along the way, we'll be learning about the Rust language.

If you want to catch up on the series:

We started with the basics of working with Rust, and how that compared to Go.
Then we focused on web services for the second post.
The third post focused on working with file formats like JSON and YAML
And in the previous post we took a look at unit testing.

As I compared Go and Rust in the last post, I noted that Go has support for testing beyond mere unit tests. And that is where we'll start today.

Go Goes Beyond Unit Tests

Go's built-in testing package defines three classes of tests:

Unit tests, which we looked at last week
Benchmarks for testing performance of your code
Documentation functions, which can be tested automatically

So let's kick things off with an example of benchmarking and documentation functions. We'll continue where we left off last time. Here's the base library that we are writing tests for.

wordutils.go:

package wordutils

import (
    "bufio"
    "strings"
)

// Initials returns a string with the first letter of each word in the given string.
func Initials(phrase string) (string, error) {
    wrds, err := words(phrase)
    if err != nil {
        return "", err
    }
    initials := ""
    for _, word := range wrds {
        initials += word[0:1]
    }
    return strings.ToUpper(initials), nil
}

func words(str string) ([]string, error) {
    wordList := []string{}
    scanner := bufio.NewScanner(strings.NewReader(str))
    scanner.Split(bufio.ScanWords)
    for scanner.Scan() {
        wordList = append(wordList, scanner.Text())
    }
    return wordList, scanner.Err()
}

I am not going to reproduce the unit tests we covered last week. Instead, we are going to dive right into the new testing material, which will also be in wordutils_test.go.

package wordutils

import (
    "fmt"
    "testing"
)

func BenchmarkInitials(b *testing.B) {
    text := "I have measured my life in coffee spoons"
    for i := 0; i < b.N; i++ {
        if _, err := Initials(text); err != nil {
            panic(err)
        }
    }
}

func ExampleInitials() {
    text := "J. Alfred Prufrock"
    out, err := Initials(text)
    if err != nil {
        panic(err)
    }
    fmt.Print(out)
    // Output:
    // JAP
}

The BenchmarkInitials function will be called when we execute go test -bench. It will run the test multiple times until suitable benchmarks can be generated:

$ go test --bench .
goos: darwin
goarch: amd64
pkg: github.com/technosophos/wordutils
BenchmarkInitials-4       500000          2286 ns/op
PASS
ok      github.com/technosophos/wordutils   1.180s

The ExampleInitials function is part of the documentation. So if we run godoc on our library, we will see the example: godoc -html github.com/technosophos/wordutils Initials. (Unfortunately, the examples are not printed in the plain-text version of godoc help.)

But to make sure that our examples stay current and accurate, Go will automatically run them as tests during regular unit testing:

$ go test -v .
=== RUN   TestWords
--- PASS: TestWords (0.00s)
=== RUN   TestInitials
--- PASS: TestInitials (0.00s)
=== RUN   ExampleInitials
--- PASS: ExampleInitials (0.00s)
PASS
ok      github.com/technosophos/wordutils   (cached)

As with unit tests, Go determines the kind of test based on the function signature.

BenchmarkXXX(b *testing.B) is a benchmark
ExampleXXX() is an example

There are a few other variations of these patterns that you can use, but the basic idea is that the testing tool reflects over the code to determine what to execute during a testing cycle.

As we'll see with Rust, there are four supported classes of tests:

Unit tests (again, covered last time)
Benchmarks, which are new and still marked unstable
Documentation examples
Integration tests

Rust Benchmarks

Rust is introducing benchmark testing. It is available in the unstable builds of Rust, but not yet in the official stable build.

Enabling Benchmarking

So to test this out, we need to enable unstable features. Inside of the wordutils project, we need to run rustup override add nightly to switch us over to using the nightly build for this particular project.

$ rustup update nightly
$ rustup override add nightly
info: using existing install for 'nightly-x86_64-apple-darwin'
info: override toolchain for '/Users/mbutcher/Code/Rust/wordutils' set to 'nightly-x86_64-apple-darwin'

  nightly-x86_64-apple-darwin unchanged - rustc 1.29.0-nightly (6a1c0637c 2018-07-23)

Now we can use the benchmarking features.

Writing Benchmark Tests

Last time we created a wordutils library with Cargo. By the end, we were experimenting with several features of package organization. But let's start off with a simplified version of the code we used last time, and add just the benchmark.

#![feature(test)]
extern crate test;

pub fn initials(phrase: &str) -> String {
    phrase.split_whitespace().map(
        |word: &str| word.chars().next().unwrap()
    ).collect::<String>().to_uppercase()
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[bench]
    fn bench_initials(b: &mut Bencher) {
        let input = "I have measured my life in coffee spoons";
        b.iter(|| initials(input));
    }
}

Since we are using an unstable feature, we need to tell Rust explicitly that we know what we're doing when we use the test module:

#![feature(test)]
extern crate test;

By enabling the test feature, we indicate that we are using features that would otherwise be disabled because of stability flags.

The stability mechanism of Rust is a cool way of gradually introducing features, while letting people like us kick the tires and report errors.

Inside of our own mod tests, we add just one benchmarking test:

use test::Bencher;

#[bench]
fn bench_initials(b: &mut Bencher) {
    let input = "I have measured my life in coffee spoons";
    b.iter(|| initials(input));
}

Instead of using the #[test] attribute, we use #[bench] to indicate that this is a benchmark. (Unlike Go, Rust function names have no impact on whether this is considered a benchmark test.)

The Bencher is Rust's equivalent of testing.B in Go. We've seen in previous posts how Rust uses iterators and anonymous functions. And in the example above, Rust is doing basically the same thing that Go does, only more compactly.

In Go, we wrote a benchmark like this:

text := "I have measured my life in coffee spoons"
for i := 0; i < b.N; i++ {
    if _, err := Initials(text); err != nil {
        panic(err)
    }
}

Looking at the for loop, we can see that we ran the test as many times as b.N indicates. But we did have to explicitly create the for loop for this.

Conceptually, Rust is doing the same thing. It is running the || initials(input) closure as many times as b.iter() dictates.

Recall from previous posts that |params| body is the syntax for Rust closures. b.iter() takes a closure with zero parameters.
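To make the "closure in a loop" idea concrete, here is a rough sketch that runs on stable Rust. The time_it helper is hypothetical (my own name, not part of test::Bencher). It only mimics the basic shape of b.iter(): call a zero-parameter closure repeatedly and time it. The real Bencher picks its own iteration count and does more careful measurement, so treat this as an illustration rather than a replacement.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

pub fn initials(phrase: &str) -> String {
    phrase
        .split_whitespace()
        .map(|word| word.chars().next().unwrap())
        .collect::<String>()
        .to_uppercase()
}

/// Hypothetical helper (not part of `test::Bencher`): calls `f` exactly
/// `runs` times and returns the average wall-clock time per call.
pub fn time_it<T, F: FnMut() -> T>(runs: u32, mut f: F) -> Duration {
    assert!(runs > 0, "runs must be at least 1");
    let start = Instant::now();
    for _ in 0..runs {
        // black_box discourages the optimizer from throwing the result away.
        black_box(f());
    }
    start.elapsed() / runs
}
```

Calling time_it(1000, || initials("I have measured my life in coffee spoons")) returns a Duration that you could print with {:?}. Unlike the Go version, the loop lives inside the helper, so the caller only supplies the closure.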

Now we can run the benchmark with cargo bench:

$ cargo bench
   Compiling wordutils v0.1.0 (file:///Users/mbutcher/Code/Rust/wordutils)
    Finished release [optimized] target(s) in 2.33s
     Running target/release/deps/wordutils-0770f6e0ea5ebd16

running 1 test
test tests::bench_initials ... bench:         512 ns/iter (+/- 53)

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured; 0 filtered out

What we see from the result is that our initials benchmark took about 512ns per iteration, give or take 53ns. (I'm actually a little surprised. This is 4x faster than my Go implementation.)

As benchmarking is still fairly new, to learn more you'll need to take a peek at the nightly documentation on this feature.

Documentation Testing

While benchmarking is still a new feature of Rust, writing and testing examples is a stable feature.

I'm not going to beat around the bush about this: I think Rust's documentation testing feature is a thing of beauty. Why? Because examples are embedded in the documentation blocks and executed automatically. When I'm in the process of writing code, this fits my coding practice much better.

My biggest complaint with Go examples is that because they require a context switch, a special method signature, and special markup at the end, they are weird to write. Most Go developers simply don't write examples this way. Also, because Go's example methodology is biased toward string testing, it's hard to make these examples reflective of real usage.

Rust goes the other direction: Examples are written as part of source code comments, and are executed almost like mini-programs.

But to understand how this testing works, we need to spend a moment on Rust source code documentation.

Documenting Rust

Like Go (and many other languages), Rust supports extracting documentation from the source code. Rust uses Markdown as the documentation format, which means we can write docs that are a little richer than Go's when it comes to formatting.

Let's document our initials() function:

/// Given a string, extract the initials.
/// 
/// Initials are composed of the first letter of each word, capitalized.
/// They are then joined together with no spaces.
pub fn initials(phrase: &str) -> String {
    phrase.split_whitespace().map(
        |word: &str| word.chars().next().unwrap()
    ).collect::<String>().to_uppercase()
}

Comments use three slashes (///). The first line is a complete, punctuated sentence. Unlike Go documentation, this does not begin with the name of the item being documented.

After a line break, we can add a more complete description. Then, to generate the documentation, we run cargo doc. The resulting documentation will be written into the target/doc/wordutils directory as HTML. We'll see an example shortly.

Adding an example in Markdown

The idea of Rust's example documentation is that it should accurately replicate how the function would be called in context. So we just embed a snippet of code right into the Markdown, using the sort of conventions you would normally use:

/// Given a string, extract the initials.
/// 
/// Initials are composed of the first letter of each word, capitalized.
/// They are then joined together with no spaces.
/// 
/// # Example
/// 
/// ```rust
/// let out = wordutils::initials("hello beautiful world");
/// assert_eq!(out, "HBW");
/// ```
pub fn initials(phrase: &str) -> String {
    phrase.split_whitespace().map(
        |word: &str| word.chars().next().unwrap()
    ).collect::<String>().to_uppercase()
}

Unlike our unit tests, we do need to call the package by its full name (or use a use).
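For comparison, here is a sketch of the same example written with a use line instead of the fully qualified name. It assumes, as in the rest of this post, that the crate is named wordutils; with a different crate name the use line would need to change.

```rust
/// Given a string, extract the initials.
///
/// # Example
///
/// ```rust
/// // Assumes the crate is named `wordutils`, as elsewhere in this post.
/// use wordutils::initials;
///
/// assert_eq!(initials("hello beautiful world"), "HBW");
/// ```
pub fn initials(phrase: &str) -> String {
    phrase
        .split_whitespace()
        .map(|word: &str| word.chars().next().unwrap())
        .collect::<String>()
        .to_uppercase()
}
```

Either form reads the way an outside user of the library would write it, which is the point of putting the example in the docs.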

Rust will run the inlined examples as unit tests during the testing phase:

$ cargo test
  Compiling wordutils v0.1.0 (file:///Users/mbutcher/Code/Rust/wordutils)
    Finished dev [unoptimized + debuginfo] target(s) in 1.07s
     Running target/debug/deps/wordutils-e01fe756921f1114

running 1 test
test tests::bench_initials ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests wordutils

running 1 test
test src/lib.rs - initials (line 11) ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

In the Doc-tests section, we can see that it ran our documentation test. (Interestingly, it also appears to have run bench_initials to make sure that it didn't fail.)

A documentation test is considered successful if it doesn't panic. (It does not, however, check whether one knows where one's towel is.)

You can also show examples of failures by annotating the markdown code block with should_panic. This test is totally contrived, but shows the basic idea:

/// Given a string, extract the initials.
/// 
/// Initials are composed of the first letter of each word, capitalized.
/// They are then joined together with no spaces.
/// 
/// # Example
/// 
/// ```rust
/// let out = wordutils::initials("hello beautiful world");
/// assert_eq!(out, "HBW");
/// ```
/// 
/// # Panics
/// 
/// ```rust,should_panic
/// let out = wordutils::initials("");
/// assert_eq!(out, "hello");
/// ```
pub fn initials(phrase: &str) -> String {
    phrase.split_whitespace().map(
        |word: &str| word.chars().next().unwrap()
    ).collect::<String>().to_uppercase()
}

In the second example, the assert_eq! will panic because the initials of "" will not be "hello". But when we run the test, it will succeed because it was expecting that example to panic.

$ cargo test
   Compiling wordutils v0.1.0 (file:///Users/mbutcher/Code/Rust/wordutils)
    Finished dev [unoptimized + debuginfo] target(s) in 2.55s
     Running target/debug/deps/wordutils-e01fe756921f1114

running 1 test
test tests::bench_initials ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests wordutils

running 2 tests
test src/lib.rs - initials (line 11) ... ok
test src/lib.rs - initials (line 18) ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Now if we generate the documentation with cargo doc, the rendered page for initials() shows the description followed by the Example and Panics sections, each with its code block.

At this point, we've seen both benchmarking and documentation testing. And as with Go, we haven't explored every nook and cranny of the framework, but we've gotten the basic idea.

However, we have one more thing to cover. And this is a feature that Go doesn't support in its core toolset: Integration testing.

Integration Tests in Rust

The last category of tests to cover is integration tests. Typically, while unit tests cover individual functions and are "close to the source", integration tests are designed to show that, from an outside perspective, things work as advertised.

Often, unit tests will use fixtures and mocks to test just very specific parts of the code. Integration testing more often forgoes mocks (at least mocks of internal things), and tests that the code is functioning together as a whole.

Rust has a top-level concept of integration tests. Inside of your Cargo project, they are placed in the tests/ directory adjacent to src/ and target/:

.
├── Cargo.lock
├── Cargo.toml
├── src
│   ├── lib.rs
│   └── tests.rs
├── target
│   ├── debug
│   ├── doc
│   └── release
└── tests
    └── integration_tests.rs

Integration tests are constructed the way the unit tests and documentation tests are: Write some code, use some asserts to make sure it does what it is supposed to.

Unfortunately for us, we don't have a whole lot of "integration" to do. But we'll still see the main pattern for integration tests, and see how this differs from unit tests.

I created an integration test in tests/ named integration_tests.rs. Zero points for originality.

extern crate wordutils;

use wordutils::initials;

#[test]
fn do_initials() {
    assert_eq!(initials("j. alfred prufrock"), "JAP");
}

The main thing to notice about this is that integration tests are structured the same way that an external tool would use the library.

Pro Tip: That means that a crate's tests/ directory is a great place to figure out how to use a library.

So we use extern crate to declare that we are using wordutils, and we use use to import the initials function into our current namespace.

However, we still have to use the #[test] attribute to declare that do_initials() is a test function.

As with unit and documentation tests, Cargo will do us the favor of running these tests as part of cargo test:

$ cargo test
   Compiling wordutils v0.1.0 (file:///Users/mbutcher/Code/Rust/wordutils)
    Finished dev [unoptimized + debuginfo] target(s) in 0.72s
     Running target/debug/deps/wordutils-e01fe756921f1114

running 1 test
test tests::bench_initials ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/integration_tests-543eeef678725b84

running 1 test
test do_initials ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests wordutils

running 2 tests
test src/lib.rs - initials (line 11) ... ok
test src/lib.rs - initials (line 18) ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

To be completely honest, as a Rust neophyte, I am not entirely sure how beneficial it will be to have integration test support built-in like this. But I do like the idea that I can look in a crate's tests/ directory and get an idea of how the public API is supposed to work.

Conclusion

In this fifth post of the series, we have looked at three additional types of testing in Rust:

Benchmarking
Examples in documentation
Integration tests

For the most part, Rust's testing strategy is not much different than Go's. We don't see drastically different paradigms or conventions. But what we do see is perhaps a greater "ergonomic" in Rust, where examples are inside of the documentation instead of in the unit tests, and where integration tests are separated from unit tests, and designed to mirror the user experience of developers who use your libraries.

Using Azure Static Websites

2018-07-21T20:06:00+00:00

Over the years I've chronicled the technical changes to this blog and its hosting provider. Years ago, I moved it from Drupal to Middleman to cut down on the maintenance. Later I containerized my Ruby environment to get rid of RVM madness.

For a fun weekend project, I moved from S3 to the brand new (still in preview) Static Websites service from Azure.

Essentially, I moved the hosted portion of my blog, but didn't actually change the underlying blog software. I'm still using Middleman to generate the blog.

In this post, I'll cover the first two steps of my move:

Move my blog's files from S3 to Azure Storage
Use Azure Static Websites to serve the site

In the future, I want to add a couple new features, though:

Add Azure CDN to speed things up
Finally start using SSL for this site

So I will cover those in a follow-up post.

Step 1: Setting Up Azure Storage

Really, the process of "migration" is more about uploading a bunch of static files, then redirecting DNS. Because Middleman can generate the entire site from my source code (which lives in a private BitBucket repo), there's no data migration necessary. I don't have to get data out of S3. I can just upload a fresh copy.

So to kick things off, I logged into the Azure web portal and then created a new storage account. And I turned on the Static Website feature.

Rather than repeat the exact steps I did, I'll point you straight to the (frequently updated) official documentation. That team does a stellar job of keeping up with changes, and for a preview service that is important. The entire process documented there seriously took me only a couple of minutes.

A few quick notes:

When the docs tell you to enter index.html, which seems to override the default of index.html, you do actually need to do that. I suspect that will be fixed in the future.
I did not create a $web container at this point. I let the tooling do it for me later.

Step 2: Upload My Site

I have a handy Makefile that I use to do various blog tasks. It looks like this:

APPROOT=/usr/src/myapp
UPLOAD_FLAGS ?= -o table

.PHONY: post
post: TITLE ?= Untitled
post: COMMAND = bundle exec middleman article "$(TITLE)"
post: dockerize

.PHONY: build
build: COMMAND = bundle exec middleman build
build: dockerize

.PHONY: serve
serve: EFLAGS = -p 4567:4567
serve: COMMAND = bundle exec middleman serve
serve: dockerize

.PHONY: docker-build
docker-build:
    docker build -t $(IMAGE) .

.PHONY: dockerize
dockerize:
    docker run -it --rm --name $(NAME) -v "$(CURDIR)":$(APPROOT) -w $(APPROOT) $(EFLAGS) $(IMAGE) $(COMMAND)

.PHONY: dist
dist:
  # Code to send this to AWS

In a nutshell:

post creates a new post
serve starts a local testing server
build generates a static version of the site
dist sends the static site to S3

To this Makefile I just added a new target:

.PHONY: upload
upload:
    az storage blob upload-batch -d '$$web' -s build/ $(UPLOAD_FLAGS)

(UPLOAD_FLAGS just gives me a way to override the flags from the command line, like $ UPLOAD_FLAGS="--dry-run" make upload.)

The new command does a bulk upload of my static site (az storage blob upload-batch) sending it to the destination (-d) container named $web (note that we escaped this for Make by doing $$). And it reads the sources from the build/ folder.

To authenticate az to my account, I set the env var AZURE_STORAGE_CONNECTION_STRING.

Now running make build upload builds my site from source, then uploads it to my new Azure static website.

The first time I ran the upload, it created the $web container, but it seemed to take about three minutes to get everything synced. In particular, mapping the index.html file to the document root seemed to take a bit. But from there, everything worked as expected.

Where Next?

At this point, I can hit my Technosophos blog at the URL provided by Azure. There are two possible routes to go from here:

I could set up Azure's DNS service to point directly to this endpoint. This process is actually pretty easy. But that's not what I want to do.
I would like to set up Azure's CDN service to cache my blog, then add an SSL certificate on the CDN service (something not supported by static websites yet) so that the blog will be fully TLS.

That second option is what I am exploring now, and will document in a future post.

From Go to Rust - Unit Testing

2018-07-07T22:46:00+00:00

In this fourth installment of the series, we'll transform some Go tests into Rust tests.

Go Did Testing Right... Mostly

I am a big fan of Go's approach to testing. Tests are easy to write, easy to run, and live alongside the stuff that they test. Adding benchmarking support was cool. And I like the way that documentation functions get automatically tested (though the implementation is limited to nearly trivial functions).

I also like the fact that Go makes it easy to test private (unexported) functions. I know there's some dogma involved here. I've heard people adamantly claim that private functions should not be tested. But for purely pragmatic reasons, my view is that we should be able to test whatever we want.

But there are a few things about Go's built-in testing that I'm not terribly keen on.

I'll never understand why the Go developers didn't just add an assertions library to the testing library. I've heard the "asserts get abused" line, but I don't find it convincing. That said, it's a shortcoming easily remedied by a decent assert library.

One thing I'm not a fan of in general, though, is using "magic" prefixes or suffixes to determine how to execute a function. I think function name scanning sets a dangerous precedent for how reflection ought to be used. And I find that in practice it results in an arbitrary limitation on what I can actually name my functions. But in spite of my general disdain for that pattern, I've been happy with the way it works in Go's testing suite.

Overall, I think Go makes it amazingly simple to write tests. And compared to languages like Java, Python, JavaScript, and PHP, working with Go tests is a breeze.

So when diving into Rust (reminder: This really is my first go-around with the language), I've been interested to see how Rust's testing stacks up. In this article, we'll focus on unit tests.

Something To Test: Wordutils

For the past few posts, I've been writing small programs. But with testing as the focus, it seems like the right time to try my hand at writing a library. So here's a small wordutils library:

package wordutils

import (
    "bufio"
    "strings"
)

// Initials returns a string with the first letter of each word in the given string.
func Initials(phrase string) (string, error) {
    wrds, err := words(phrase)
    if err != nil {
        return "", err
    }
    initials := ""
    for _, word := range wrds {
        initials += word[0:1]
    }
    return strings.ToUpper(initials), nil
}

func words(str string) ([]string, error) {
    wordList := []string{}
    scanner := bufio.NewScanner(strings.NewReader(str))
    scanner.Split(bufio.ScanWords)
    for scanner.Scan() {
        wordList = append(wordList, scanner.Text())
    }
    return wordList, scanner.Err()
}

The library above declares two functions: words, which splits a string into words, and Initials, which returns a string that is the capitalized version of the first letter of each word in the given string.

Go's standard library already has a function that does what words() does (strings.Fields), but I decided to write my own as an internal (non-exported) function to give me a point of comparison with Rust. When re-implementing, we'll follow the same scoping rules.

These two functions are easy enough to test. So alongside wordutils.go, here are the contents of wordutils_test.go:

package wordutils

import (
    "testing"
)

func TestWords(t *testing.T) {
    in := "this is the way\n the world ends"
    out, err := words(in)
    if err != nil {
        t.Fatal(err)
    }

    expect := []string{"this", "is", "the", "way", "the", "world", "ends"}
    if len(out) != len(expect) {
        t.Fatal("expected same length")
    }
    for i, word := range out {
        if word != expect[i] {
            t.Errorf("expected word %d to be %q, got %q", i, expect[i], word)
        }
    }
}

func TestInitials(t *testing.T) {
    in := "not with a bang   but a whimper"
    expect := "NWABBAW"
    out, err := Initials(in)
    if err != nil {
        t.Fatal(err)
    }
    if out != expect {
        t.Errorf("expected %q, got %q", expect, out)
    }
}

There's a simple unit test for each of the functions we wrote. These tests aren't terribly robust (they don't test the error cases), but they're good enough for us to start modeling a Rust implementation.

Creating a Rust Library

Instead of creating an executable with cargo new --bin, we're going to create a new library. And as a bonus... I just learned that we can initialize a Git repo as part of package creation:

$ cargo new --vcs git --lib wordutils
     Created library `wordutils` project

The --lib flag sets up the package as a library:

wordutils
├── Cargo.lock
├── Cargo.toml
├── src
│   └── lib.rs
└── target
    └── debug
        └── ...

In previous articles, we worked on src/main.rs. Note that with --lib, the file created for us is src/lib.rs. If we take a look inside of it, we'll see that some code was already created for us:

#[cfg(test)]
mod tests {
    #[test]
    fn it_works() {
        assert_eq!(2 + 2, 4);
    }
}

Huh... it's a test scaffold. It's like they knew what I was planning. Let's run the tests just to see what happens. From within the wordutils directory, we just run cargo test:

$ cargo test
   Compiling wordutils v0.1.0 (file:///Users/mbutcher/Code/Rust/wordutils)
    Finished dev [unoptimized + debuginfo] target(s) in 4.24 secs
     Running target/debug/deps/wordutils-9a757f9d84faff12

running 1 test
test tests::it_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests wordutils

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Alright, lest I get ahead of myself... we know that the initial test works. Let's get about the business of writing the library, and we'll return to testing later.

Wordutils: The Rust Version

Again, this is my first attempt at writing a Rust library. So my initial reaction to the contents of lib.rs was, "Wait... if the tests are in there, where do I put my code?" It turns out that for simple modules like ours, the answer is: put the code and the tests in the same file.

We'll do that first, then later look at ways of breaking things up. To start, we'll leave the stub test alone and add our new functions.

fn words(phrase: &str) -> std::str::SplitWhitespace {
    return phrase.split_whitespace()
}

pub fn initials(phrase: &str) -> String {
    words(phrase).map(
        |word| word.chars().next().unwrap()
    ).collect::<String>().to_uppercase()
}


#[cfg(test)]
mod tests {
    #[test]
    fn it_works() {
        assert_eq!(2 + 2, 4);
    }
}

We've replicated our wordutils.go functionality in just a few lines of code. And it's exciting, because we've encountered a few new concepts that haven't appeared in previous installments of the series.

The `words()` function

The words() function is a one-liner because Rust's standard library already has a word splitter.

fn words(phrase: &str) -> std::str::SplitWhitespace {
    return phrase.split_whitespace()
}

There are two things to note about this function.

First, we return a type called std::str::SplitWhitespace. The SplitWhitespace type is the type returned by split_whitespace. It has several traits associated with it that make it useful as a type.

In Go, it is common to keep the number of types sparse. For something like split_whitespace, Go developers would likely return a []string (slice of strings). But in Rust, it is common to return specialized types that can then implement multiple traits. In doing this (as we will see in a moment), we gain added flexibility that leads to concise and readable code.

Second, I chose to return std::str::SplitWhitespace instead of adding a use std::str::SplitWhitespace at the top, and then just returning SplitWhitespace. Either way works, but it is probably more common to add the use line rather than use a fully qualified name in a return value.
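As an aside, there is a third option that this post does not use: hiding the concrete type behind impl Trait. The sketch below is only an illustration. The name words_iter is hypothetical, and it assumes a Rust version with impl Trait in return position (1.26 or later).

```rust
/// Hypothetical variant of `words()` that does not name `SplitWhitespace`.
/// Callers only learn that they get *some* iterator over `&str` slices.
pub fn words_iter(phrase: &str) -> impl Iterator<Item = &str> {
    phrase.split_whitespace()
}
```

The trade-off is that callers can no longer rely on anything specific to SplitWhitespace, only on the Iterator trait. For a private helper like ours, either style is fine.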

The `initials()` function

The second function declared is initials(). Normally, we'd actually write this as a one-liner, too. But since we're all new to Rust, I figured it would be more readable to expand it into a three-line body:

pub fn initials(phrase: &str) -> String {
    words(phrase).map(
        |word| word.chars().next().unwrap()
    ).collect::<String>().to_uppercase()
}

Notice that this function definition begins with pub. That tells the Rust toolchain that this function is a public (exported) function. Unlike Go, capitalization makes no difference as to the visibility of a function (or anything else). In Rust, modules must use pub to mark an item as visible outside of the module.

Like Go, though, Rust has two basic visibilities: public (with pub) and private (the default). (Rust also offers restricted forms such as pub(crate), but we won't need them here.) But the rules for privacy are slightly different in Rust than in Go. According to the Rust visibility documentation:

If an item is private, it can be accessed only by its immediate parent module and any of the parent’s child modules.

In contrast, privacy in Go dictates that only the present package may access a private item. We'll see in a moment why this nuance makes a difference when we write Rust tests.

Now let's look more closely at the function chain we run inside of initials():

words(phrase).map(
    |word| word.chars().next().unwrap()
).collect::<String>().to_uppercase()

We start by running the words() function we saw above. The SplitWhitespace object returned from that implements Iterator. Because of this, we could toss the result into a for loop:

for word in words("Mary had a little lamb") {
    println!("Word: {}", word);
}

See what we did? The SplitWhitespace type is an iterator (implements std::iter::Iterator), so we can use it in a for loop without doing anything special. Go does not have a concept of an iterator, so this code might look surprising.

Rust iterators are more than just a convenience for for loops, though. An Iterator has dozens of useful methods attached to it. And one of them is map(). The map() method takes a closure (inline function), runs it on every item in the iterator, and returns the results as a new iterator.

In Rust, closures look different than regular functions. They take the form:

|param1, param2, ...| function_body

We can spread the function body out into a block, if we'd like:

|param1| {
   stuff;
   more_stuff;
   return_val
}

In our code, we call the map() function and give it a transformation: It takes a word, and returns the first character of that word.

|word| word.chars().next().unwrap()

Note that here (as in many cases in Rust), we let the compiler infer the type of word. The function is still type safe because Rust can determine at compile time that anything assigned to word will be a string. (If the type were ambiguous for some reason, we could annotate it |word: &str| ...)
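Here is a small sketch of the block form of a closure, with the parameter and return types written out. The shout_words function is a made-up example, not part of wordutils. It just shows a closure being stored in a variable and then handed to map().

```rust
/// Hypothetical example: upper-cases each word and appends a '!'.
pub fn shout_words(phrase: &str) -> Vec<String> {
    // Block-form closure with explicit parameter and return types.
    let shout = |word: &str| -> String {
        let mut loud = word.to_uppercase();
        loud.push('!');
        loud // last expression is the return value
    };

    // A closure stored in a variable can be passed to map() by name.
    phrase.split_whitespace().map(shout).collect()
}
```

In practice you would usually leave off the annotations and let the compiler infer them, as we did in initials(). They are written out here only to show where they go.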

So word.chars().next() essentially says "convert this word string to a sequence of characters, then use next() to pop the first character." But next() is safe: Instead of returning a character, it returns an Option<char>. If there is no character to return, it will send back a None. We happen to know that all of the words returned from split_whitespace() have at least one character in them. So instead of testing whether the result of next() was a Some(char) or a None, we can use unwrap() to just get the char value.

Note that if for some reason we did get a None, unwrap() would cause a panic.
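If the unwrap() makes you nervous, one possible alternative is filter_map(), which keeps the Some values and silently drops any None. This is a sketch of a variant, not the version used in the rest of this post, and the name initials_no_unwrap is hypothetical.

```rust
/// Hypothetical variant of `initials()` that never calls `unwrap()`.
pub fn initials_no_unwrap(phrase: &str) -> String {
    phrase
        .split_whitespace()
        // next() yields Option<char>; filter_map keeps only the Some values.
        .filter_map(|word| word.chars().next())
        .collect::<String>()
        .to_uppercase()
}
```

For our input the two versions behave the same, since split_whitespace() never yields an empty word. The difference is only in what would happen if that assumption were ever wrong: a skipped word instead of a panic.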

At this point, the words().map() combo has returned an iterator of chars. We want to turn that into a String. Believe it or not, this is really easy in Rust because an Iterator has a function called collect<T>() that takes an iterator and transforms it into some other collection type (T).

We could, for example, use collect::<Vec<char>> to collect our iterator into a vector (list) of characters. And in Rust, a String happens to be... wait for it... a collection of characters! So all we need to do to transform our character iterator into a String is call collect::<String>(). (Recall that ::<T>, the turbofish, tells a function that takes a generic how to fill out that generic.)
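To see the turbofish doing its job, here is a sketch of a generic helper whose caller picks the collection type. The first_chars name is hypothetical. The only real requirement is that the target type implements FromIterator<char>, which both String and Vec<char> do.

```rust
/// Hypothetical helper: collects the first character of each word into
/// whatever collection type `C` the caller asks for.
pub fn first_chars<C: std::iter::FromIterator<char>>(phrase: &str) -> C {
    phrase
        .split_whitespace()
        .map(|word| word.chars().next().unwrap())
        .collect()
}
```

Calling first_chars::<String>("hello beautiful world") builds a String, while first_chars::<Vec<char>>("hello beautiful world") builds a vector of chars from the very same iterator chain.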

That's it for our two functions. Essentially, we've now reimplemented our Go wordutils library. It's time to do some testing.

Writing the Tests

We already got a hint of how to write tests when we took our initial look at lib.rs. We saw a basic test that looked like this:

#[cfg(test)]
mod tests {
    #[test]
    fn it_works() {
        assert_eq!(2 + 2, 4);
    }
}

So let's just roll with it and try to flesh out some actual tests based on that pattern. Here I'm replicating the tests from wordutils_test.go and also adding a few tests for handling empty strings:

use std::str::SplitWhitespace;

pub fn initials(phrase: &str) -> String {
    words(phrase).map(
        |word: &str| word.chars().next().unwrap()
    ).collect::<String>().to_uppercase()
}

fn words(phrase: &str) -> SplitWhitespace {
    return phrase.split_whitespace()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_words() {
        let input = "this is the way\n the world ends";
        let expect = vec!["this", "is", "the", "way", "the", "world", "ends"];
        assert_eq!(words(input).collect::<Vec<&str>>(), expect)
    }

    #[test]
    fn test_initials() {
        let input = "not with a bang   but a whimper";
        assert_eq!(initials(input), "NWABBAW");
    }

    #[test]
    fn empty_words() {
        let input = "";
        let expect: Vec<&str> = Vec::new();
        assert_eq!(words(input).collect::<Vec<&str>>(), expect);
    }

    #[test]
    fn empty_initials() {
        let input = "";
        assert_eq!(initials(input), "");
    }
}

If we run cargo test with the above, we'll see four tests pass. Let's take a quick look at the organization of tests.

Rust Modules, and the `tests` Module

In Go, packages are determined largely by the package keyword at the top of each file in a directory. The usual idiom in Go is that a directory name and a package name match. (The history of this is a little complex, as the original version of Go suggested that a directory contains two packages: the base package (foo) and the testing package (foo_test). But that idiom was deprecated shortly after Go 1.0.)

In Rust, when people say "package", they usually mean a crate, which is an entire library or application. Libraries are broken up not into packages, but into modules.

While Go mixes testing and non-testing stuff into the same package, and uses compiler magic to distinguish based on file names, Rust is a little different. In Rust, you store your tests (by convention) in the tests module, decorate the modules with attributes, and then let the toolchain sort and run the tests.

There are noteworthy similarities that Rust and Go share here (especially when contrasted with other popular languages): Tests are stored alongside the code they test. Tests are sorted and executed by the toolchain. Testing support is first-class in the language.

With that in mind, we can take a look at the tests module:

#[cfg(test)]
mod tests {
    use super::*;

    // Tests go here
}

So we've created a module inside of wordutils that is named wordutils::tests. And we've annotated it with #[cfg(test)], which (if I understand things correctly) is the attribute that tells the compiler to only compile this module during a test run. (Compare this with Go build flags.) Like go test, cargo test compiles a testing binary and then executes it.

See the line use super::*? We've seen use in previous installments of this series. It is used to import names from other modules into the current namespace. Here, we are importing the names from the parent (super) module into the current tests module. So instead of calling super::words(), we can simply call words().

Remember that in Rust, a private function can be accessed from the module in which it is defined, and from all of that module's submodules. Because of that rule, we can import the words function, which is private to the parent module, into the tests module.

At this point, we've created our testing module and imported the functions we want to test. Let's look at the tests.

Test Functions

Here's the first test function:

#[test]
fn test_words() {
    let input = "this is the way the world ends";
    let expect = vec!["this", "is", "the", "way", "the", "world", "ends"];
    assert_eq!(words(input).collect::<Vec<&str>>(), expect)
}

My Go naming habits have me prefixing the test function with test_, but as far as I can tell, that's not an idiom in Rust. Maybe calling it words_equal would have been just as acceptable.

But to make this function a test, we need to prefix it with the #[test] attribute. This is what indicates that the function is a testing target. If I omit this attribute, the function is not run as a test, and I'll see a warning:

warning: function is never used: `test_words`
  --> src/lib.rs:18:5
   |
18 |     fn test_words() {
   |     ^^^^^^^^^^^^^^^
   |
   = note: #[warn(dead_code)] on by default

Inside of a test module I can add utilities, mocks, etc. and as long as I don't label them with #[test] they will not be mistaken for tests.

In the four test functions I created, I used an assertion macro (assert_eq!). There are actually four macros that are useful for testing:

assert!, which asserts that the value is true
assert_eq!, which asserts that two values are equal
assert_ne!, which asserts that two values are not equal
panic!, which causes the test to fail

Go has two classes of failure: t.Error and t.Fatal. Standard Rust only has one type of failure.

There's also a #[should_panic] annotation. Decorate a function with this to indicate that a test is considered passing if and only if it panics.
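Here is a minimal sketch of how the assertion macros and #[should_panic] fit together. The first_char function and the panic_tests module are hypothetical names invented for this illustration; they are not part of wordutils.

```rust
/// Hypothetical helper (not part of wordutils): returns the first
/// character of `word`, and panics when `word` is empty.
pub fn first_char(word: &str) -> char {
    word.chars().next().expect("word must not be empty")
}

#[cfg(test)]
mod panic_tests {
    use super::*;

    #[test]
    fn first_char_of_word() {
        // The three assertion macros, side by side.
        assert_eq!(first_char("bang"), 'b');
        assert_ne!(first_char("bang"), 'w');
        assert!(first_char("bang").is_alphabetic());
    }

    // This test passes only if the body panics, and only if the panic
    // message contains the `expected` text.
    #[test]
    #[should_panic(expected = "word must not be empty")]
    fn first_char_of_empty_panics() {
        first_char("");
    }
}
```

The expected argument is optional. Including it guards against the test passing because of some unrelated panic.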

From here, our tests are straightforward. We merely compare the output of our two functions to the expected results.

Breaking Out Tests into a Separate File

To be honest, I'm not totally sure what the accepted idioms are for breaking tests out into separate files, but it turns out that it is relatively easy to do.

Modules can be split into separate files, and tests are organized into modules. So we can split tests into a separate file like this.

First, here's the main file for the library in lib.rs:

use std::str::SplitWhitespace;

pub fn initials(phrase: &str) -> String {
    words(phrase).map(
        |word: &str| word.chars().next().unwrap()
    ).collect::<String>().to_uppercase()
}

fn words(phrase: &str) -> SplitWhitespace {
    return phrase.split_whitespace()
}

#[cfg(test)]
mod tests;

On the last two lines of that file, we declare a testing module, but we don't put anything in the module. This will cause the compiler to look for a file named tests.rs next to lib.rs. We can oblige it by putting all the tests in tests.rs:

use super::*;

#[test]
fn test_words() {
    let input = "this is the way\n the world ends";
    let expect = vec!["this", "is", "the", "way", "the", "world", "ends"];
    assert_eq!(words(input).collect::<Vec<&str>>(), expect)
}

#[test]
fn test_initials() {
    let input = "not with a bang   but a whimper";
    assert_eq!(initials(input), "NWABBAW");
}

#[test]
fn empty_words() {
    let input = "";
    let expect: Vec<&str> = Vec::new();
    assert_eq!(words(input).collect::<Vec<&str>>(), expect);
}

#[test]
fn empty_initials() {
    let input = "";
    assert_eq!(initials(input), "");
}

Now if we run cargo test, we'll see the usual testing output:

$ cargo test
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running target/debug/deps/wordutils-9a757f9d84faff12

running 4 tests
test tests::empty_words ... ok
test tests::empty_initials ... ok
test tests::test_initials ... ok
test tests::test_words ... ok

test result: ok. 4 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests wordutils

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Again, I'm not sure if this is the preferred way to write tests, but it seems like a nice way to organize things.

Specifying the Test to Run

Here's one final note on running tests with cargo. In Go, we can select specific tests to run by regular expression: go test -run REGEXP. With Cargo, you can select tests by name, and the name acts as a filter (any test whose name contains the given string is run):

$ cargo test empty_words

There are several flags (like --exclude) that can impact which tests are run. But as far as I know, there's no equivalent regular expression version.

Conclusion

The focus of this article has been on how to unit test Rust code, compared to Go. As a Rust neophyte, I have been pleasantly surprised by the similarities. The things I love about Go's testing are also present in Rust's testing. And I find Rust's built-in assertions, module-based tests, and attribute annotations to be elegant.

But unit testing is only one of the kinds of tests we care about. In the next installment of the series, we will see how Rust's testing stands up against Go's when it comes to benchmarks, documentation tests, and functional tests.

Be Nice And Write Stable Code

2018-07-04T15:13:00+00:00

Stop rearchitecting your code! The professional developer values stability over "code purity." Instead of pursuing a Shangri-La vision of code perfection with each and every release, just be nice and write stable APIs. In this post, I talk about taking practical steps toward writing code that remains stable over time.

The Non-Goal

It is completely unhelpful to begin with a suggestion like this:

When you write code, make it future-proof.

If history teaches us anything, it's that we are lousy predictors of what the future holds. Of course it is a good practice to strive toward clean APIs, flexible design, and thoughtful defaults. But inevitably, your users and emerging requirements will surprise you.

What I want to talk about is how to best deal with those surprises, and how to avoid introducing surprises (and frustration) to those who use your code.

Versions and SemVer

In software, we use version numbers to signal that something has changed. Version numbering schemes go from dead simple (integers that increment with every release, or date stamps) to surprisingly complex (1.0~pre3+dfsg-0.1+b2, 2.1.1+git20160721~8efc468-2, and 1.2.0+LibO5.2.7-1+deb9u4 are a few versions spotted in the wild).

The current leader among version numbering schemes is SemVer (or Semantic Versioning). Don't be fooled, though! Many people claim to know how SemVer works, but have never read the specification. Since it is central to the rest of this post, here is a summary of the spec:

Version numbers take the form X.Y.Z, sometimes augmented with additional pre-release and build information: X.Y.Z-AAA+BBB. And each of those fields means something well defined and specific.

X is the major number. Changes in this indicate breaking changes to the API (and/or behavior).
Y is the minor number. Changes to this number indicate that new features were added, but that no APIs are broken as a result.
Z is the patch version. Changes to this indicate that internal changes were made, but that no changes (even compatible changes) were made to the API.

These three are the important ones for us. Again, I suggest taking 15 minutes to read the entire spec.
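To make the three fields concrete, here is a small sketch in Rust. The Version and Change types and the parse and bump functions are hypothetical names made up for this illustration. The sketch deliberately ignores parts of the spec, such as pre-release ordering and the ban on leading zeros.

```rust
/// A minimal, hypothetical model of the X.Y.Z core of a version.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

/// The kind of change a release contains.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Change {
    Breaking, // existing API or behavior changed
    Feature,  // new API added, nothing broken
    Fix,      // internal change only
}

impl Version {
    /// Parses "X.Y.Z", discarding any "-prerelease" or "+build" suffix.
    pub fn parse(s: &str) -> Option<Version> {
        let core = s.split('+').next()?; // drop build metadata
        let core = core.split('-').next()?; // drop pre-release tag
        let mut parts = core.split('.');
        let major = parts.next()?.parse::<u64>().ok()?;
        let minor = parts.next()?.parse::<u64>().ok()?;
        let patch = parts.next()?.parse::<u64>().ok()?;
        if parts.next().is_some() {
            return None; // more than three fields
        }
        Some(Version { major, minor, patch })
    }

    /// Returns the next version number for the given kind of change.
    /// Lower-order fields reset to zero when a higher one increments.
    pub fn bump(self, change: Change) -> Version {
        match change {
            Change::Breaking => Version { major: self.major + 1, minor: 0, patch: 0 },
            Change::Feature => Version { minor: self.minor + 1, patch: 0, ..self },
            Change::Fix => Version { patch: self.patch + 1, ..self },
        }
    }
}
```

The point of the bump function is that the next version number is decided by the nature of the change, not by how the release feels.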

Countless projects use a format that looks like SemVer, but many of them ignore the semantics behind the version number. Often, it seems that version numbers are incremented by "gut feel" instead of any consistent semantic: "This feels like a minor version update."

The intention of this post is to explain how to write software in a way that actually adheres to semantic versioning. Get rid of "gut feel" version numbers and give your users some peace of mind.

But Why?

Why bother using a semantic versioning scheme? What's wrong with just updating numbers arbitrarily? The reason is simple: Version numbers help your users understand something about the nature of the changes they can expect. If you don't follow a pattern, they are left guessing. And this frustrates people.

Following SemVer introduces rigor on two fronts:

It sends clear signals to users about the depth of changes they can expect in a release.
It sends a clear signal to your developers about what is, and what is not, allowed when it comes to changing the code.

I cannot overstate the importance of the second point. SemVer helps us impose self-discipline, which in turn minimizes both internal and external disruption.

Patterns of Change

With the SemVer discussion out of the way, we can now talk about the actual patterns of change.

Remember, the usability of code is a focal point of the professional software developer. Predictable patterns of change are a boon to usability.

Reorganizing, Refactoring, and Renaming

There is no clearer way to state the point than this: If you reorganize the package structure of your public API, or if you do a major renaming, or if you choose to change the methods/structs/classes/etc of your public API, you must increment the major version number.

That's it. There is no grey area here. Such changes mean that anyone who's using your code will experience breakage.

When it comes to working on minor updates, dealing with this means exercising discipline. Yes, the package structure might be poor. Yes, the code might be ugly. But you must wait until the right moment to fix that.

Of course, it's okay to make internal changes that don't touch any public API items. So minor internal-only refactoring can be done in minor, and even patch, releases (though we don't recommend doing it in patch releases).

Note: Stop trying to justify your refactoring with the "public but internal" argument. If the language spec says it's public, it's public. Your intentions have nothing to do with it.

So in effect, the following are not to be changed except during major updates:

Package structure
Names of public classes, structs, enums, traits, interfaces, etc., and the names of any of their members
Constants or public variable names or values
Function/method names
Function/method signatures for existing functions except where the change is additive and the added argument is optional. Return value types and exceptions must also not change.

The bottom line: Refactoring, renaming, and reorganizing are a sweet temptation. But this is a temptation that must be resisted when doing minor/patch releases. Part of being a professional software developer is creatively coping with imperfect code.

But, you might be saying, how do you add new features without changing any of these? That is the subject of the next few sections.

Introducing New Features

Minor versions may introduce new features, but features must be introduced without breaking existing APIs.

Features are additive in nature: They bring new things, but do not modify or delete existing things. To that end, these are safe as part of a feature release:

Adding a field or method to a struct/class/enum/etc.
Adding a new struct/class/enum/etc. or adding new variables, constants, functions, packages, etc.
Adding new configuration options (but see the next section)
Making something that was non-public into something that is public (e.g. exposing a private API as a public API)

However, there are a few changes that are sometimes done under the guise of a feature, but which are breaking changes that must be avoided:

Changing values of constants, variables, etc.
Changing a function or method signature (e.g. adding more params or changing the return type)
- There is an exception here if a language supports adding optional parameters in a way that lets old calls to the function work exactly as they did before.
Changing an item from public-scoped to non-public (hiding an API)

Modifying by Adding Alternatives

Consider the case where you begin with code like this:

func ListItems(query Query) Items {
  // Code to fetch and list the items
}

The code above might, for example, do a database query and fetch all of the results.

Now a feature request rolls in: "We need to add paging to the list functions." The temptation is to do this:

func ListItems(query Query, limit int, offset int) Items {
  // ...
}

But that is an API breaking change. The correct way to handle this is to introduce a new function:

func ListItems(query Query) Items {
  // A limit of 0 means "no limit", which preserves the old behavior.
  return ListItemsWithLimit(query, 0, 0)
}

func ListItemsWithLimit(query Query, limit int, offset int) Items {
  // ...
}

Note that we have adjusted the internals of the old function so that it replicates its exact previous behavior, but does so by calling into the new function.

It is fine to mark functions as deprecated when you do this. In fact, this is why languages like Java cleverly added built-in support for deprecation. Professional-grade development involves a strategy of deprecation with eventual removal, even if that removal is years down the road.

Importantly, if the newly introduced function cannot replicate the old feature set, you are obligated (unless security concerns dictate otherwise) to provide the old API's functionality to the greatest extent possible.
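For comparison, here is roughly how the same pattern might look in Rust, which has a built-in #[deprecated] attribute. Everything here is a sketch: the Query and Items types, the STORE data, and the version number in the attribute are assumptions made up so that the example is self-contained.

```rust
/// Hypothetical stand-ins for the Query and Items types in the text.
pub struct Query {
    pub prefix: String,
}
pub type Items = Vec<String>;

// Pretend data store, assumed only so the sketch can run.
static STORE: [&str; 4] = ["apple", "apricot", "avocado", "banana"];

/// The original function keeps its exact signature and behavior.
/// The attribute makes the compiler warn callers without breaking them.
#[deprecated(since = "1.1.0", note = "use list_items_with_limit instead")]
pub fn list_items(query: &Query) -> Items {
    // A limit of 0 means "no limit", which matches the old behavior.
    list_items_with_limit(query, 0, 0)
}

/// The new function carries the paging feature.
pub fn list_items_with_limit(query: &Query, limit: usize, offset: usize) -> Items {
    let matches = STORE
        .iter()
        .filter(|item| item.starts_with(query.prefix.as_str()))
        .skip(offset)
        .map(|item| item.to_string());
    if limit == 0 {
        matches.collect()
    } else {
        matches.take(limit).collect()
    }
}
```

Existing callers of list_items still compile and get the same results. They simply see a deprecation warning that points them at the new function.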

Beware The Dubious Work-Around

There is an accepted pattern that works well for many things, but which some developers employ to "get around" the SemVer constraints on modifying function signatures:

func ListItems(query Query, options Map) Items {
   // ...
}

In this pattern, the options map is an arbitrary set of key/value options. This pattern itself is fine unless the default behavior changes when a new option is introduced. When adding a new option, the professional developer ensures that when that option is not present, the code behaves the same as it did in the last release.
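A minimal sketch of the well-behaved version of this pattern, written in Rust. The function name, the "limit" key, and the items argument are all hypothetical. The important part is the fallback: when the option is missing, the function does what it always did.

```rust
use std::collections::HashMap;

/// Hypothetical option key, introduced in a later minor release.
pub const OPT_LIMIT: &str = "limit";

/// Lists `items`, honoring any optional settings found in `options`.
pub fn list_items_with_options(
    items: &[&str],
    options: &HashMap<String, String>,
) -> Vec<String> {
    // When the option is absent (or cannot be parsed), fall back to the
    // behavior of the previous release: return everything.
    let limit = options
        .get(OPT_LIMIT)
        .and_then(|value| value.parse::<usize>().ok())
        .unwrap_or(items.len());

    items.iter().take(limit).map(|item| item.to_string()).collect()
}
```

A caller who passes an empty map gets the full list, exactly as before the option existed. Only callers who opt in by setting the key see the new behavior.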

Deprecating

We touched on deprecation above. But I want to summarize the deprecation strategy:

Mark a thing as deprecated as soon as it is considered deprecated, even if that is a patch or minor release. Deprecation, after all, is a warning condition, not an error condition.
Do not change the behavior of the deprecated thing during minor or patch releases.
Remove deprecated things only at major version changes. Until that time, you're still on the hook for supporting them.

Deprecation is a signal that in the future a thing will be removed. But it is not an excuse to change, delete, or ignore the functionality of that bit of code outside of the SemVer constraints.

Errors and Exceptions

One of the most frustrating outages I ever experienced occurred because of a seemingly innocuous change to an upstream library: During a minor release update, the library changed the exception type that it threw on a particular error.

One of the functions in the library threw an IOException whenever a network error occurred, and other exceptions for other problems.

We used the throwing of an IOException to kick off our retry logic. Given that network failures were a frequent occurrence for our particular conditions, this was an important feature.

But during a minor version change, the developers decided to simplify the API by catching all of the different exceptions (including the IOException) and wrapping them in a single generic exception. (Incidentally, the API itself did not change because it was something like func Read(in Reader) error, where error was a parent of all exceptions).

When we upgraded, all our tests passed (because our test fixtures emulated the old behavior and our network was not unstable enough to trigger bad conditions), and production rolled out just fine. But our customers began complaining that the product was much less reliable. Why? Because the retry logic was never triggered. So our app suddenly was as unstable as the network it was on.

The bottom line: Even error handling is part of your public API.
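The dependency can be sketched in Rust. This is not the library from the story; ReadError and read_with_retry are hypothetical names. The sketch shows how retry logic ends up keyed to a specific error variant, which is why collapsing errors into one generic type silently disables it.

```rust
use std::fmt;

/// Hypothetical error type for a read operation. The variants are part
/// of the public API, because callers match on them.
#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    Network(String),
    Malformed(String),
}

impl fmt::Display for ReadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReadError::Network(msg) => write!(f, "network error: {}", msg),
            ReadError::Malformed(msg) => write!(f, "malformed input: {}", msg),
        }
    }
}

impl std::error::Error for ReadError {}

/// Calls `op` until it succeeds, retrying only on network errors and
/// giving up after `max_attempts` calls. `op` is always called at
/// least once.
pub fn read_with_retry<F>(max_attempts: u32, mut op: F) -> Result<String, ReadError>
where
    F: FnMut() -> Result<String, ReadError>,
{
    let mut attempt = 1;
    loop {
        match op() {
            // Only this variant triggers a retry.
            Err(ReadError::Network(_)) if attempt < max_attempts => attempt += 1,
            // Success, a non-network error, or the final failed attempt.
            other => return other,
        }
    }
}
```

If a later release of the upstream code started reporting network failures as some other variant, this function would still compile, but it would never retry.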

Resisting Subtle Changes

Sometimes subtle but ill-planned changes can cause major breakages for your users.

Here's a short story of how a trivial change in one of our dependencies led to a series of production catastrophes for our users:

We depended on a library that provided a client/server RPC-like protocol. This library had long been marked stable, and indeed stability is one of the touted features of this library. But the developers introduced a very subtle change that appeared to follow the stability requirements, but which actually introduced a serious compatibility flaw. The change went something like this:

The library allowed us to set a maximum message size. The default was 256k, but we wanted it to be significantly larger. So we set this option:

config.MaxMessageSize = 1024

This made it possible for client and server to send each other messages up to 1M. But at some point, a minor release of the package made a very subtle, but killer, change: They split upstream and downstream message sizes into two variables. And here's the killer: They did this by merely creating a second variable (config.MaxInboundMessageSize) and changing the behavior of the first to impact only outbound message size.

When we upgraded the package, all seemed to work well. Code compiled. Tests passed. Early users saw no problems. Then we shipped our new version with this updated dependency. And suddenly angry users started filing issues. Stuff that worked yesterday was broken today.

Why? Because behind the scenes, the inbound message size had dropped from 1M to its default 256k. And while nothing in our early testing sent messages larger than 256k, there were plenty of production instances that did.

The upstream library maintainers had introduced a serious bug into our code by silently changing the behavior of their code, even though they didn't (in a pedantic sense) "break" SemVer.

What should they have done?

I would argue that breaking the size limit into an inbound limit and an outbound limit is a totally legitimate thing to do during a minor release. It was just done wrong.

The right way to address a configuration change like this is to introduce two new variables and then add default support for the old one.

Thus, in practice, it would look like this:

type Config struct {
   // Old one
   MaxMessageSize int
   // New settings
   MaxInboundMessageSize int
   MaxOutboundMessageSize int
}

And then the internal code in the library would handle the legacy case while still offering the new functionality:

// Support clients who set the old config
if config.MaxMessageSize > 0 {
    config.MaxInboundMessageSize = config.MaxMessageSize
    config.MaxOutboundMessageSize = config.MaxMessageSize
}

With something like this in the base library, any tools that use it would get the old behavior if they used the old configuration param, but could opt into using the newer options instead.
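Here is one way the same fallback could be sketched in Rust. This is a variation on the snippet above rather than a translation: as an assumption of mine, an explicitly set new option takes precedence over the legacy one. The field names, the effective_limits method, and the use of 0 to mean "unset" are all hypothetical.

```rust
/// Hypothetical default, in kilobytes (the 256k from the story).
pub const DEFAULT_MAX_MESSAGE_SIZE: usize = 256;

/// Sizes are in kilobytes. A value of 0 means "not set".
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Config {
    /// Old setting, kept so that existing callers keep working.
    pub max_message_size: usize,
    /// New settings.
    pub max_inbound_message_size: usize,
    pub max_outbound_message_size: usize,
}

impl Config {
    /// Returns the effective (inbound, outbound) limits.
    ///
    /// Precedence for each direction: the new setting if present, then
    /// the legacy setting, then the built-in default.
    pub fn effective_limits(&self) -> (usize, usize) {
        let pick = |new_value: usize| {
            if new_value > 0 {
                new_value
            } else if self.max_message_size > 0 {
                self.max_message_size
            } else {
                DEFAULT_MAX_MESSAGE_SIZE
            }
        };
        (
            pick(self.max_inbound_message_size),
            pick(self.max_outbound_message_size),
        )
    }
}
```

A client that only ever set max_message_size to 1024 gets 1024 in both directions, which is the behavior it had before the split.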

Bugs and Security Fixes: How To Handle Real Life

Guidance that you should never change certain things is all well and good until reality comes a-callin'. But what if that public constant or variable introduces a security issue, or causes the server to crash?

When the real world comes crashing in, we make exceptions. But professional software developers make them wisely and carefully.

The important concept here is the minimally invasive change. That is, when patching bugs or security issues, we may need to change the API, but we should do it by changing the absolute minimum number of things we can get away with. And we do that even if it means sacrificing our "architectural purity".

I will plead guilty to introducing a global variable as a stop-gap to fix the internals of a function without changing the function signature. It was ugly. I was ashamed. But it ensured backward compatibility, and that was the important thing. If I had changed the function call, thousands of users would have had to change their code. But with the ugly global, only the few who needed to tune that particular parameter were impacted (and only by having the option to set something they could not previously control).

But a security issue or major bug is a legitimate reason to change things like default values or even larger macro behaviors. If the change is big enough, you're still obligated to change the major version of your code. SemVer doesn't give a free pass on that, and failing to do so still undermines user confidence.

But for less intrusive changes, I personally feel like you can make some minor SemVer transgressions provided:

You make this very clear in your release notes.

"The value of MaxBufferSize was adjusted downward to 2048 because we discovered a buffer overflow in a lower level library for any larger buffer size. See issue #4144"

The code is clearly commented:

// MaxBufferSize sets the maximum size of the network buffer.
// Prior to version 2.5.1, this was 4096. Due to a security flaw
// reported in #4144 that resulted in a buffer overflow, we
// lowered this to 2048.
MaxBufferSize = 2048

Conclusion

The professional software developer has long-term usability and stability as a goal. Yes, well-architected code is important. But there is a time and place for making that your focus. And maintenance releases (minor and patch versions) are not an occasion to refactor, re-organize, or make sweeping modifications.

Be conscientious about how much effort the users of your code put into using your code. I can tell you from experience what we do when the maintenance burden you impose on us gets wearying: We stop using your tools (or we fork them).

SemVer is a communications tool. But to use it well, we must use it accurately. And that means writing code focused on stability.
