Using the UNIX find Command
Today I needed to run a simple script against thousands of identically named files nested in a huge directory structure. What I needed was a quick way to recurse through all of the directories, ignore all of the files I didn't care about, but run a specified command on any files with a particular name.
While this sounds like the sort of thing that would require a couple dozen lines of shell scripting, it can actually be accomplished on a single line with the find command. find has been around for decades, and can be found on almost any UNIX-like system. It is a simple tool used for searching for files within the UNIX directory hierarchy. Honestly, there's not much more to it. Like all good UNIX tools, it does one thing well.
Here is a basic example of using find to recurse through a directory of HTML files and list all of the ones named index.html:
$ find . -name index.html
When this command is run, it will start from the current working directory (.) and search through all files and subdirectories for anything named index.html. (Note that long options, like -name, use only one dash, not two.)
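To try this without touching real data, here is a minimal sketch that builds a throwaway tree and runs the same search against it. The directory names are borrowed from the sample output later in this post; about.html and the demo variable are made up for illustration.

```shell
# Build a small scratch tree to experiment with (layout is hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/foo/bar" "$demo/baz/burble"
touch "$demo/foo/bar/index.html" "$demo/baz/burble/index.html" "$demo/foo/about.html"

# Run the search from the top of the scratch tree. The subshell keeps
# our own working directory unchanged; sort makes the order predictable.
matches=$(cd "$demo" && find . -name index.html | sort)
printf '%s\n' "$matches"

# Clean up the scratch tree.
rm -rf "$demo"
```

Only the two index.html paths are printed; about.html is skipped because its name does not match.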
There is room for an inaccuracy to creep in here, though. This will find anything named index.html, including directories and symbolic links. We might prefer to restrict our search to regular files only. That can be done by adding another argument to our command:
$ find . -name index.html -type f
The -type f argument instructs find to match only regular files.
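The difference is easiest to see when something other than a file happens to share the name. The sketch below is only an illustration: it creates a directory that is (oddly) named index.html alongside a real file, then compares the two searches. All paths and variable names here are invented.

```shell
# Scratch tree containing one regular file and one directory,
# both named index.html (hypothetical layout).
demo=$(mktemp -d)
mkdir -p "$demo/site" "$demo/odd/index.html"
touch "$demo/site/index.html"

# Without -type f, the directory matches too.
all=$(cd "$demo" && find . -name index.html | sort)
# With -type f, only the regular file is reported.
files_only=$(cd "$demo" && find . -name index.html -type f | sort)

printf 'without -type f:\n%s\n' "$all"
printf 'with -type f:\n%s\n' "$files_only"

rm -rf "$demo"
```

The first search reports both ./odd/index.html and ./site/index.html, while the second reports only ./site/index.html.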
Now let's say we want to find out how many lines each index.html file contains. While find cannot determine the number of lines itself, it can execute a command to do that for us. For example, we know that we can find the number of lines in a file by executing this command:
$ wc -l FILENAME
The above will use the word count program (wc) to print the number of lines in the file named FILENAME.
With a little bit of additional work, we can have find execute wc for us. Here's an example:
$ find . -name index.html -type f -exec wc -l {} \;
10 ./foo/bar/index.html
256 ./baz/burble/index.html
The snippet above shows our full find command, as well as the first two lines of returned output.
Let's take a closer look at the newest material in our find command: -exec wc -l {} \;.
The -exec argument tells find that it should execute the following command. find will treat everything following -exec as part of that command until it encounters \; (the backslash is required to prevent the shell from interpreting the semicolon).
So between -exec and \; we have the code that find will execute: wc -l {}. We have already seen the wc -l command, and we know that it takes a filename as an argument. The find command will execute this command on every item it finds, placing the name of the item where the curly braces ({}) appear. In other words, the pair of braces is a placeholder for the file name.
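One way to see the placeholder at work is to put echo in front of the command, so that find prints the command line it would have run instead of running it. This is a rough sketch on an invented scratch tree. The last step also shows a variant not covered above: ending -exec with + instead of \; (a form specified by POSIX), which passes many file names to a single invocation.

```shell
# Scratch tree with two index.html files (hypothetical layout).
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
printf 'one\ntwo\n' > "$demo/a/index.html"
printf 'one\n' > "$demo/b/index.html"

# With \; find runs the command once per match, replacing {} each time.
# echo prints the resulting command line rather than executing wc.
per_file=$(cd "$demo" && find . -name index.html -type f -exec echo wc -l {} \; | sort)
printf '%s\n' "$per_file"

# With + find appends as many names as fit to one command line,
# so echo is run only once here.
batched=$(cd "$demo" && find . -name index.html -type f -exec echo wc -l {} +)
printf '%s\n' "$batched"

rm -rf "$demo"
```

The \; form prints one "wc -l" line per file, while the + form prints a single line listing both files (in whatever order find visited them). For a command like wc that accepts several file names, the + form saves starting a new process for every match.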
We have created a simple find command for finding files and executing a command on each file. There are many other things you can do with find. Of course, man find will give you detailed information on the command. Here's an interesting set of examples that provide more complex filtering logic:
http://www.athabascau.ca/html/depts/compserv/webunit/HOWTO/find.htm
Thanks to sdboyer for reminding me how amazingly awesome find is.