How Brigade Shares Data Between Containers

Mar 26 2018

Brigade provides a way to script multiple containers to perform a task. With Brigade, you can build things like CI systems, ETL pipelines, and distributed batch processors. One of the critical capabilities of Brigade is its ability to share data between containers. This article describes the two main ways of sharing data.

Brigade's Purpose

In a previous article in this series, I explained why we created Brigade. I described Brigade as an event-based scripting environment for Kubernetes. A second way of looking at Brigade is as a serverless platform for scripting containers.

Both descriptions share a central feature: Brigade is about linking multiple containers together to create powerful processing pipelines.

To do this well, Brigade must make it easy to share data between containers. It provides two built-in ways of doing this, and you can also add your own.

Two Ways of Sharing Data

In the previous article, I explained how UNIX shell scripting influenced our design of Brigade. UNIX also influenced the way we share data.

We think of containers as roughly analogous to processes. Brigade scripts work like shell scripts: They provide an environment for grouping and executing several programs in order to accomplish one larger task. Thus, sharing data between containers should feel approximately like sharing data between processes.

Brigade shares data in two ways:

  1. By passing output (like UNIX pipes)
  2. By sharing access to a common filesystem location

UNIX Pipes and Passing Output

Since the early days of UNIX, it has been possible to pass data from one process to another via a pipe. The basic UNIX pipes were half-duplex, or one-directional. In shell scripts, we pipe data like this:

$ cat animals.txt | grep "rhino"

The output of the cat command is directed into the grep command as input.

We wanted a similar feel for Brigade pipelines. Since we were working with an existing language (JavaScript), we couldn't make the syntax quite as compact, but the basic idea persists:

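const { events, Job } = require("brigadier")

events.on("exec", () => {
  // Job "one" prints the file; job "two" greps whatever it receives in $DATA
  const one = new Job("one", "alpine:3.7", ["cat animals.txt"])
  const two = new Job("two", "alpine:3.7", ['echo "$DATA" | grep "rhino"'])

  one.run().then(data => {
    // Pass one's output into two's environment, much like a pipe
    two.env = { DATA: data.toString() }
    two.run().then(matches => {
      console.log(matches.toString())
    })
  })
})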

The above script does approximately the same thing as our earlier one-line shell script, though it does so in a distributed way, running each command in a different container. It works like this:

  1. It creates two jobs, named one and two. Job one runs the cat command, and job two runs the grep command.
  2. It first executes one.run(), which starts the container, runs the cat command, and returns a Promise for the output.
  3. Then we use the Promise's then() callback to capture that output (then( data => { /* ... */ })), so data holds the output of the cat command.
  4. Inside the callback, we set up two to receive the data by placing it in two's environment as the DATA variable. Then we run() it.
  5. Finally, we capture the output of two (which is the output of grep) and print that to the log.
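The following minimal in-process sketch traces the same data flow without a cluster. FakeJob is a hypothetical stand-in for Brigade's Job. It runs a JavaScript function instead of a container, and its run() resolves to a plain string rather than to a Brigade result object. The sketch also uses async/await, which flattens the nested then() callbacks.

```javascript
// Hypothetical FakeJob: stands in for a Brigade Job, but runs a JavaScript
// function in-process instead of starting a container.
class FakeJob {
  constructor(name, task) {
    this.name = name
    this.task = task // (env) => string; plays the role of the container's command
    this.env = {}
  }

  // Like Job.run() in the script above, returns a Promise for the output.
  async run() {
    return this.task(this.env)
  }
}

const animals = "zebra\nrhino\nlion\nblack rhino\n"

// Stands in for `cat animals.txt`
const one = new FakeJob("one", () => animals)
// Stands in for `grep "rhino"` applied to $DATA
const two = new FakeJob("two", env =>
  env.DATA.split("\n").filter(line => line.includes("rhino")).join("\n"))

async function main() {
  const data = await one.run()
  two.env = { DATA: data } // hand one's output to two, like a pipe
  const matches = await two.run()
  console.log(matches)
  return matches
}

main()
```

Running it logs the two matching lines, rhino and black rhino.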

The idea expressed above is that we can easily pass chunks of data from one job to another, just as with UNIX pipes.

This method of sharing works best in two cases:

  1. You need to share a small amount of data from one job to the job(s) directly afterward.
  2. You need to use that data in the outer JavaScript code itself (for example, to log it).
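The same idea extends to longer chains, and the final output stays available to the surrounding script (case 2 above). This is an illustrative sketch, not Brigade's API. Each "job" is a hypothetical async function that receives an env object and returns a string. The hypothetical pipe() helper feeds each job's output into the next job's DATA variable, much like a | b | c in a shell.

```javascript
// Hypothetical helper: run jobs in order, passing each job's output
// to the next job as env.DATA, like `a | b | c` in a shell.
async function pipe(jobs, input = "") {
  let data = input
  for (const job of jobs) {
    data = await job({ DATA: data })
  }
  return data
}

// Hypothetical stand-ins for containerized commands
const cat = async () => "zebra\nrhino\nlion\nblack rhino"
const grepRhino = async env =>
  env.DATA.split("\n").filter(line => line.includes("rhino")).join("\n")
const countLines = async env =>
  String(env.DATA === "" ? 0 : env.DATA.split("\n").length)

pipe([cat, grepRhino, countLines]).then(count => {
  // The outer script can act on the final output directly
  if (Number(count) > 0) {
    console.log(`found ${count} rhino line(s)`)
  } else {
    console.log("no rhinos")
  }
})
```

Because every value passes through the script, this style suits small outputs. Large data is better handled by the shared-file approach.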

Sometimes you want to share larger chunks of data, and keep those chunks around for multiple jobs to share. In that case, you may prefer sharing files.

Shared Files

The second way to share data in Brigade is through shared files. This is better in two circumstances:

  • When the data you are dealing with is large, and passing it through the JavaScript layer would be inefficient
  • When the data needs to be shared with multiple jobs

Brigade will share files on an ephemeral filesystem accessible to different jobs in a build. But this facility is only available if you request it in your script. Here's a revised version of our last script to demonstrate:

const { events, Job } = require("brigadier")

events.on("exec", () => {
  const one = new Job("one", "alpine:3.7", ["cp animals.txt /mnt/brigade/share/animals.txt"])
  const two = new Job("two", "alpine:3.7", ['grep "rhino" /mnt/brigade/share/animals.txt'])

  // Request ephemeral shared storage for both jobs
  one.storage.enabled = true
  two.storage.enabled = true

  one.run().then(() => {
    two.run().then(matches => {
      console.log(matches.toString())
    })
  })
})

The example above works this way:

  1. Job one copies its animals.txt file onto a shared filesystem, then...
  2. Job two runs grep on that newly shared file, then...
  3. Brigade prints the result of two to the console

As this event runs, both jobs will have access to a shared filesystem. But as soon as the event completes, the shared filesystem is discarded. In fact, if you were to run this same script twice concurrently, each instance would have access to a separate shared filesystem (though the mount paths would be the same). In other words, the shared filesystem is isolated to just the jobs inside each build.
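The lifecycle described above can be mimicked locally with Node's fs module: each build gets one scratch area, its jobs share it, and it is discarded when the build ends. This is a hypothetical sketch, not how Brigade provisions its storage. The names withBuildStorage, jobOne, and jobTwo are illustrative. Each build here also gets a different temporary path, whereas in Brigade every build sees the same /mnt/brigade/share mount path backed by separate storage.

```javascript
const fs = require("fs").promises
const os = require("os")
const path = require("path")

// Hypothetical helper: create a scratch directory for one build, let the
// build's jobs use it, then delete it when the build ends (even on error).
async function withBuildStorage(buildId, handler) {
  const share = await fs.mkdtemp(path.join(os.tmpdir(), `build-${buildId}-`))
  try {
    return await handler(share)
  } finally {
    await fs.rm(share, { recursive: true, force: true })
  }
}

// Stand-in for job "one": copy data into the shared location.
async function jobOne(share) {
  await fs.writeFile(path.join(share, "animals.txt"), "zebra\nrhino\nlion\n")
}

// Stand-in for job "two": read the shared file and keep matching lines.
async function jobTwo(share) {
  const text = await fs.readFile(path.join(share, "animals.txt"), "utf8")
  return text.split("\n").filter(line => line.includes("rhino"))
}

async function build(buildId) {
  let sharePath
  const matches = await withBuildStorage(buildId, async share => {
    sharePath = share
    await jobOne(share)
    return jobTwo(share)
  })
  return { matches, sharePath }
}

// Two concurrent builds each get their own directory.
Promise.all([build("a"), build("b")]).then(([a, b]) => {
  console.log(a.matches, b.matches, a.sharePath !== b.sharePath)
})
```

The try/finally removes the directory even if a job throws. This matches the way the shared filesystem is discarded once the event completes.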

Do-It-Yourself Sharing

We built two ways of sharing data, focusing on the methods we believed would be easiest for developers to use. Earlier in our development cycle, we experimented with other methods, including storing data in a key/value store (Redis, in our case) and sharing data directly over socket connections.

While we opted for the simpler methods shown above, we can attest that it is possible to create your own sharing methods for Brigade. Typically this is done by adding client tools to your container images and then using those tools to store and retrieve data.

In one case, we tested adding the mongo client to our container images and then invoking that client directly to store and query results in a CosmosDB database (through its MongoDB-compatible API).
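An external store outlives the build, so with this approach your script becomes responsible for isolating and cleaning up each build's data. The sketch below assumes a hypothetical FakeStore class that stands in for Redis or a MongoDB-compatible database. Keys are prefixed with a build ID so that concurrent builds do not collide, and each build deletes its own keys when it finishes.

```javascript
// Hypothetical stand-in for an external key/value store such as Redis.
// In a real setup, each job container would talk to the store through a
// client tool; here a Map plays that role so the sketch runs on its own.
class FakeStore {
  constructor() {
    this.data = new Map()
  }
  async set(key, value) {
    this.data.set(key, value)
  }
  async get(key) {
    return this.data.get(key)
  }
  async deletePrefix(prefix) {
    for (const key of [...this.data.keys()]) {
      if (key.startsWith(prefix)) this.data.delete(key)
    }
  }
}

// Prefix keys with the build ID so concurrent builds don't overwrite each other.
const keyFor = (buildId, name) => `${buildId}:${name}`

async function build(store, buildId, animals) {
  // job "one" stores its output
  await store.set(keyFor(buildId, "animals"), animals)
  // job "two" fetches it and filters
  const text = await store.get(keyFor(buildId, "animals"))
  const matches = text.split("\n").filter(line => line.includes("rhino"))
  // clean up, since the store outlives the build
  await store.deletePrefix(`${buildId}:`)
  return matches
}

const store = new FakeStore()
Promise.all([
  build(store, "build-1", "rhino\nlion"),
  build(store, "build-2", "zebra\nblack rhino"),
]).then(results => console.log(results))
```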

Feel free to experiment with your own methods of sharing data. Brigade is designed to be open-ended and flexible.

Configuring Build Storage

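If you are using shared filesystem storage, it is worth taking the time to configure Brigade's build storage for your cluster.

I recommend using an NFS server for shared storage, though other options exist. Whatever backend you choose, jobs in the same build may run on different nodes at the same time. The storage must therefore support being mounted read-write by multiple pods (the ReadWriteMany access mode in Kubernetes).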

Conclusion

Brigade is designed not just to make it possible to run multiple containers, but to share data between those containers. We did our best to design a system that felt natural and usable. We also tried to leave it open for further expansion.

In this article, I showed how to share data using UNIX pipe-like passing, and also using shared files.
