How To Create Large Files for Testing
Sometimes you need to create a large file for testing. The command line tool
dd is an easy way to create large files filled with random data.
I recently found myself needing to test uploading and downloading files of various sizes. So I wanted a quick way to create several files, each of very specific size. Almost all UNIX-like systems, including Linux and macOS, provide a tool called
dd that makes this job easy.
The dd command is a low-level copying tool. Apparently, its original purpose was to convert between ASCII and EBCDIC encodings, but most of the time we use it as an efficient way to copy data. Here, we'll take advantage of a couple of built-in UNIX devices to create large files.
For the examples below, I'm running macOS's version of
dd. Different versions provide different output.
Creating a Large "Empty" File
Let's create a file whose content is just a long series of null characters. The file will have a size, but if we look at its contents, we'll see nothing.
$ dd if=/dev/zero of=data.bin count=400 bs=1024
400+0 records in
400+0 records out
409600 bytes transferred in 0.001991 secs (205722299 bytes/sec)
The arguments are as follows:
- if is the input source
- of is the output file
- count is the number of times to repeat a copy
- bs is the size of the chunk that is copied on each step (bs stands for block size)
Now we've created a file that is 400k. In
dd, you specify the size by multiplying the block size, 1024 (or 1k), by the count (400).
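Because the final size is just the block size multiplied by the count, different combinations of bs and count can produce an identical file. A quick sketch (the file names a.bin and b.bin are arbitrary examples of my own):

```shell
# Create 400k (409600 bytes) two ways:
# 400 blocks of 1024 bytes, and 100 blocks of 4096 bytes.
dd if=/dev/zero of=a.bin count=400 bs=1024 2>/dev/null
dd if=/dev/zero of=b.bin count=100 bs=4096 2>/dev/null

# Both files are exactly 409600 bytes.
wc -c a.bin b.bin
```

Larger block sizes generally mean fewer read/write cycles, so for very big files a larger bs is usually faster.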
$ ls -lah data.bin
-rw-r--r--  1 mbutcher  staff   400K Apr 18 09:11 data.bin
But if we were to cat the file, it would appear to be empty:
$ cat data.bin
What's happening above is that
dd is copying 400k of data off of the special
/dev/zero device, which produces a stream of null characters. Essentially we're making a big "empty" file.
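If you want to convince yourself that the file really is all null bytes, one way (a sketch using the standard od tool) is:

```shell
# Recreate the 400k zero-filled file.
dd if=/dev/zero of=data.bin count=400 bs=1024 2>/dev/null

# od -c prints each byte as a character; with all-zero input it collapses
# the repeated lines of \0 into a single line followed by a '*'.
od -c data.bin | head

# Deleting every null byte should leave nothing behind.
tr -d '\0' < data.bin | wc -c
```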
Creating a Large File of Random Data
Sometimes we don't want a large empty file, but a large file of random data. For example, while I was testing these large uploads and downloads, I realized that the data was being compressed in transit. And a file full of null characters compresses very efficiently:
$ ls -lah data.bin.gz
-rw-r--r--  1 mbutcher  staff   441B Apr 18 09:11 data.bin.gz
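You can check this compressibility without even writing the intermediate file; this sketch pipes the zero stream straight through gzip and counts the bytes that come out:

```shell
# 400k of null bytes shrinks to a few hundred bytes of gzip output.
dd if=/dev/zero count=400 bs=1024 2>/dev/null | gzip | wc -c
```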
So instead, I wanted to fill the file with random data that would not be particularly efficient to compress. To do this, use
/dev/random instead of
/dev/zero. Here's how we create a 5m file of random data with dd:
$ dd if=/dev/random of=data.bin count=5k bs=1024
5120+0 records in
5120+0 records out
5242880 bytes transferred in 0.410073 secs (12785234 bytes/sec)
There are two important details that changed in the example above:
- We use if=/dev/random. This fills the file with random (binary) data.
- We use count=5k. This tells dd to copy 5120 blocks of 1024 bytes each, for 5m of data: (5 * 1024) * 1024 bytes.
$ ls -lah data.bin
-rw-r--r--  1 mbutcher  staff   5.0M Apr 18 09:19 data.bin
If I were to run
cat on this file, my console would display a huge stream of garbled characters, since our file is full of binary data.
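Rather than dumping raw binary to the terminal with cat, a hex dump keeps the console readable. A sketch using od (the file name rand.bin is my own example, and I use /dev/urandom here, the non-blocking variant available on Linux and macOS):

```shell
# Make a small random file.
dd if=/dev/urandom of=rand.bin count=1 bs=1024 2>/dev/null

# Show the first few lines as hex bytes with a printable-character column.
od -A x -t x1z rand.bin | head -n 4
```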
And now, if I compress the data, I see that it compresses much less efficiently:
$ ls -lah data.bin.gz
-rw-r--r--  1 mbutcher  staff   5.0M Apr 18 09:19 data.bin.gz
The one trade-off with using
/dev/random instead of
/dev/zero is speed: the random number generator can take a while to produce a sufficient amount of data. But most of the time, this difference in speed isn't all that relevant.
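Worth knowing: on Linux, /dev/random has historically blocked while waiting for entropy, while /dev/urandom never blocks and is perfectly fine for generating test data. A rough timing sketch comparing the sources (file names are my own examples):

```shell
# Compare how long 5m takes from each source.
time dd if=/dev/zero    of=zero.bin count=5k bs=1024 2>/dev/null
time dd if=/dev/urandom of=rand.bin count=5k bs=1024 2>/dev/null
```

Both commands produce a 5m file; the random one simply takes longer.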