How Brigade Shares Data Between Containers
Brigade provides a way to script multiple containers to perform a task. With Brigade, you can build things like CI systems, ETL pipelines, and distributed batch processors. One of the critical capabilities of Brigade is its ability to share data between containers. This article describes the two main ways of sharing data.
Brigade's Purpose
In a previous article in this series, I explained why we created Brigade. I described Brigade as an event-based scripting environment for Kubernetes. A second way of looking at Brigade is as a serverless platform for scripting containers.
Both descriptions share a central feature: Brigade is about linking multiple containers together to create powerful processing pipelines.
To do this well, it needs to be easy to share data between containers. Brigade provides two ways of doing this, though you can also add your own.
Two Ways of Sharing Data
In the previous article, I explained how UNIX shell scripting influenced our design of Brigade. UNIX also influenced the way we share data.
We think of containers as roughly analogous to processes. Brigade scripts work like shell scripts: They provide an environment for grouping and executing several programs in order to accomplish one larger task. Thus, sharing data between containers should feel approximately like sharing data between processes.
Brigade shares data in two ways:
- By passing output (like UNIX pipes)
- By sharing access to a common filesystem location
UNIX Pipes and Passing Output
From the earliest versions of UNIX, it has been possible to pass data from one process to another via a pipe. The basic pipes in UNIX were half-duplex, or one-directional. In shell scripts, we pipe data like this:
```shell
$ cat animals.txt | grep "rhino"
```
The output of the `cat` command is directed into the `grep` command as input.
We wanted a similar feel for Brigade pipelines. Since we were working with an existing language (JavaScript), we couldn't make the syntax quite as compact, but the basic idea persists:
```javascript
events.on("exec", () => {
  var one = new Job("one", "alpine:3.7", ['cat animals.txt']);
  var two = new Job("two", "alpine:3.7", ['grep "rhino" <<< $DATA']);

  one.run().then( data => {
    two.env = { DATA: data }
    two.run().then( matches => {
      console.log(matches)
    })
  })
})
```
The above script does approximately the same thing as our earlier one-line shell script, though it does so in a distributed way, running each command in a different container. It works like this:
- It creates two jobs: `one` and `two`. `one` does the `cat` command, and `two` does the `grep` command.
- It first executes `one.run()`, which will start the container and run the `cat` command, returning the output.
- Then we use a JavaScript Promise to capture the output (`then( data => { /* ... */ })`). So `data` will have the output of the `cat` command.
- Inside of the Promise callback, we set up `two` to receive the `data` by passing it into `two`'s environment. Then we `run()` it.
- Finally, we capture the output of `two` (which is the output of `grep`) and print that to the log.
The idea expressed above is that we can easily pass chunks of data from one job to another, just as with UNIX pipes.
This method of sharing is optimal for a few cases:
- You need to share a small amount of data from one job to the job(s) directly afterward.
- You need to share that data with the outer JavaScript script itself
Sometimes you want to share larger chunks of data, and keep those chunks around for multiple jobs to share. In that case, you may prefer sharing files.
Shared Files
The second way to share data in Brigade is through shared files. This is better in two circumstances:
- When the data you are dealing with is large, and passing it through the JavaScript layer would be inefficient
- When the data needs to be shared with multiple jobs
Brigade will share files on an ephemeral filesystem accessible to different jobs in a build. But this facility is only available if you request it in your script. Here's a revised version of our last script to demonstrate:
```javascript
events.on("exec", () => {
  var one = new Job("one", "alpine:3.7", ['cp animals.txt /mnt/brigade/share/animals.txt']);
  var two = new Job("two", "alpine:3.7", ['grep "rhino" /mnt/brigade/share/animals.txt']);

  // request ephemeral storage for both jobs
  one.storage.enabled = true
  two.storage.enabled = true

  one.run().then( () => {
    two.run().then( matches => {
      console.log(matches)
    })
  })
})
```
The example above works this way:
- Job `one` copies its `animals.txt` file onto a shared filesystem, then...
- Job `two` runs `grep` on that newly shared file, then...
- Brigade prints the result of `two` to the console
As this event runs, both jobs will have access to a shared filesystem. But as soon as the event completes, the shared filesystem is discarded. In fact, if you were to run this same script twice concurrently, each instance would have access to a separate shared filesystem (though the mount paths would be the same). In other words, the shared filesystem is isolated to just the jobs inside each build.
Do-It-Yourself Sharing
We built two ways of sharing data, focusing on methods we believed would be the easiest for developers to use. Earlier in our development cycle, we experimented with other methods, including putting data inside of a key/value storage (Redis, in our case) or directly sharing data over socket connections.
While we opted for the easier methods showcased above, we can attest that it is possible to create your own sharing methods for Brigade. Typically this is done by adding tools inside of your containers, and then using these tools to store and retrieve data.
In one case we tested adding the `mongo` client to our container images, and then directly invoking that client to store and query results from a CosmosDB instance.
Feel free to experiment with your own methods of sharing data. Brigade is designed to be open-ended and flexible.
Configuring Build Storage
If you are using shared filesystem storage, it is worth taking the time to configure Brigade to optimize this.
I recommend considering an NFS server for shared storage, though there are other alternatives as well.
Conclusion
Brigade is designed not just to make it possible to run multiple containers, but to share data between those containers. We did our best to design a system that felt natural and usable. We also tried to leave it open for further expansion.
In this article, I showed how to share data using UNIX pipe-like passing, and also using shared files.