How To Create Large Files for Testing
Sometimes you need to create a large file for testing. The command-line tool dd is an easy way to create large files filled with empty or random data.
I recently found myself needing to test uploading and downloading files of various sizes, so I wanted a quick way to create several files, each of a very specific size. Almost all UNIX-like systems, including Linux and macOS, provide a tool called dd that makes this job easy.
The dd command is a low-level copying tool. Apparently, its original purpose was to convert between ASCII and EBCDIC encodings, but most of the time we use it as an efficient way to copy data. Here, we'll take advantage of a couple of built-in UNIX devices to create large files.
For the examples below, I'm running macOS's version of dd. Different versions produce slightly different output.
Creating a Large "Empty" File
Let's create a file whose contents are just a long series of null characters. This file will have a size, but if we look at its contents, we'll see nothing.
$ dd if=/dev/zero of=data.bin count=400 bs=1024
400+0 records in
400+0 records out
409600 bytes transferred in 0.001991 secs (205722299 bytes/sec)
The arguments are as follows:
- if is the input source
- of is the output file
- count is the number of times to repeat a copy
- bs is the size of the chunk that is copied on each step (bs stands for block size)
Now we've created a file that is 400k. In dd, you specify the size by multiplying the block size, 1024 (or 1k), by the count (400).
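Since the final size is just the block size multiplied by the count, any combination that multiplies out to the same number of bytes works, and dd understands size suffixes like k (1024 bytes) in both arguments. As a quick sketch, with filenames chosen purely for illustration, either of these should produce the same 409600-byte result:
$ dd if=/dev/zero of=data2.bin count=100 bs=4k
$ dd if=/dev/zero of=data3.bin count=1 bs=400k
Back to the file created above; listing it confirms the 400k size: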
$ ls -lah data.bin
-rw-r--r-- 1 mbutcher staff 400K Apr 18 09:11 data.bin
But if we were to cat the file, it would appear to be empty:
$ cat data.bin
What's happening above is that dd is copying 400k of data from the special /dev/zero device, which produces a stream of null characters. Essentially, we're making a big "empty" file.
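If you want to convince yourself that the null bytes are really in there, dump the file through od, which is part of any POSIX toolset:
$ od -c data.bin | head
The output should be a single line of \0 characters, a * marking that every following line is identical, and then the final offset.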
Creating a Large File of Random Data
Sometimes we don't want a large empty file, but a large file of random data. For example, while I was testing these large uploads and downloads, I realized that the data was being compressed in transit. And a file full of null characters compresses very efficiently:
$ ls -lah data.bin.gz
-rw-r--r-- 1 mbutcher staff 441B Apr 18 09:11 data.bin.gz
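A compressed copy like the one above can be produced with gzip. As a sketch: the -k flag, where your gzip supports it (most modern versions do), keeps the original file around instead of replacing it:
$ gzip -k data.bin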
So instead, I wanted to fill the file with random data that would not be particularly efficient to compress. To do this, use /dev/random instead of /dev/zero. Here's how we create a 5m file of random data with dd:
$ dd if=/dev/random of=data.bin count=5k bs=1024
5120+0 records in
5120+0 records out
5242880 bytes transferred in 0.410073 secs (12785234 bytes/sec)
There are two important details that changed in the example above:
- We use /dev/random instead of /dev/zero. This fills the file with random (binary) data.
- We use count=5k instead of count=400. This tells dd to copy 5m of data, or (5 * 1024) * 1024 = 5242880 bytes, which matches the byte count reported above. (More on these size suffixes just below.)
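Since dd reads whole blocks at a time, you can also flip this around and ask for five 1m blocks instead of 5k one-kilobyte blocks. A sketch, with an illustrative filename: macOS's BSD dd accepts the lowercase m suffix, GNU dd on Linux wants 1M, and with large blocks read from a random device GNU dd may also need iflag=fullblock to guard against short reads:
$ dd if=/dev/random of=random.bin count=5 bs=1m
Checking the file produced by the original command: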
$ ls -lah data.bin
-rw-r--r-- 1 mbutcher staff 5.0M Apr 18 09:19 data.bin
If I were to run cat on this file, my console would display a huge stream of garbled characters, since our file is full of binary data.
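A less messy way to peek at binary data is to pipe a hex dump through head. This assumes hexdump is available, as it is by default on macOS and most Linux distributions:
$ hexdump -C data.bin | head
Each line shows an offset, sixteen bytes in hex, and their printable representation, which is enough to confirm the bytes are random without flooding the terminal.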
And now, if I compress the data, I see that it compresses much less efficiently:
$ ls -lah data.bin.gz
-rw-r--r-- 1 mbutcher staff 5.0M Apr 18 09:19 data.bin.gz
The one trade-off with using /dev/random instead of /dev/zero is speed: the random generator can take a while to create a sufficient amount of data. But most of the time, this difference in speed isn't all that relevant.
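The dd output above already hints at the gap: the /dev/zero example reported roughly 205 MB/sec, while the /dev/random example reported about 13 MB/sec. If you want to measure it on your own machine, timing both variants is an easy check (the filenames here are just for illustration):
$ time dd if=/dev/zero of=zero.bin count=5k bs=1024
$ time dd if=/dev/random of=random.bin count=5k bs=1024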