Using the UNIX find Command

Aug 3 2009

Today I needed to run a simple script against thousands of identically named files nested in a huge directory structure. What I needed was a quick way to recurse through all of the directories, ignore all of the files I didn't care about, but run a specified command on any files with a particular name.

While this sounds like the sort of thing that will require a couple dozen lines of shell scripting, it can actually be accomplished on a single line with the find command. find has been around for decades, and can be found on almost any UNIX-like system. It is a simple tool used for searching for files within the UNIX directory hierarchy. Honestly, there's not much more to it. Like all good UNIX tools, it does one thing well.

Here is a basic example of using find to recurse through a directory of HTML files and list all of the ones named index.html:

$ find . -name index.html

When this command is run, it will start from the current working directory (.) and search through all files and subdirectories for anything named index.html. (Note that the long options, like -name, use only one dash, not two.)

There is room for an inaccuracy to creep in here, though. This will find anything named index.html, including directories and symbolic links. We might prefer to restrict our search to regular files only. That can be done by adding another argument to our command:

$ find . -name index.html -type f

The -type f argument instructs find to match only regular files, so directories and symbolic links with the same name are skipped.
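
To see the difference that -type f makes, here is a small sketch. It builds a throwaway directory tree in which one index.html is a regular file and another is a directory, then runs both versions of the command. The names demo, site, and docs are made up for illustration and are not part of the original example.

```shell
#!/bin/sh
# Build a throwaway tree (all names here are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/site/docs" "$demo/site/index.html"   # a directory named index.html
printf 'hello\n' > "$demo/site/docs/index.html"      # a regular file

# Without -type, both the directory and the file match.
all_matches=$(cd "$demo" && find . -name index.html | sort)

# With -type f, only the regular file matches.
file_matches=$(cd "$demo" && find . -name index.html -type f | sort)

printf 'all matches:\n%s\n' "$all_matches"
printf 'regular files only:\n%s\n' "$file_matches"

# Clean up the temporary tree.
rm -rf "$demo"
```

The first listing should contain two paths and the second only one, since the directory named index.html is filtered out.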

Now let's say we want to find out how many lines each index.html file contains. While find cannot determine the number of lines itself, it can execute a command to do that for us. For example, we know that we can find the number of lines in a file by executing this command:

$ wc -l FILENAME

The above will use the word count program (wc) to print the number of lines in the file named FILENAME.

With a little bit of additional work, we can have find execute wc for us. Here's an example:

$ find . -name index.html -type f -exec wc -l {} \;
      10 ./foo/bar/index.html
     256 ./baz/burble/index.html

The snippet above shows our full find command, as well as the first two lines of returned output.

Let's take a closer look at the newest material in our find command: -exec wc -l {} \;.

The -exec argument tells find that it should execute the following command. find treats everything between -exec and the terminating \; as the command to run (the semicolon must be escaped with a backslash, or quoted, to prevent the shell from interpreting it).
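
As a quick check on the escaping, the sketch below runs the same command twice, once with the semicolon escaped by a backslash and once with it in single quotes. In both cases the shell hands find a literal ; so the results should match. The demo directory and its contents are made up for illustration, and awk is used only to strip the padding that wc adds on some systems.

```shell
#!/bin/sh
# Throwaway directory with a single two-line file (hypothetical contents).
demo=$(mktemp -d)
printf 'one\ntwo\n' > "$demo/index.html"

# Semicolon escaped with a backslash.
escaped=$(cd "$demo" && find . -name index.html -type f -exec wc -l {} \; \
  | awk '{print $1, $2}')

# Semicolon protected by single quotes instead.
quoted=$(cd "$demo" && find . -name index.html -type f -exec wc -l {} ';' \
  | awk '{print $1, $2}')

printf 'escaped: %s\nquoted:  %s\n' "$escaped" "$quoted"

# Clean up the temporary directory.
rm -rf "$demo"
```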

So between -exec and \; we have the code that find will execute: wc -l {}. We have already seen the wc -l command, and we know that it takes a filename as an argument. The find command will execute this command once for every item it finds, placing the name of the item where the curly braces ({}) appear. In other words, the pair of braces is a placeholder for the file name.
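
If you would like to try this without a directory full of HTML files, the following sketch sets up a tiny tree and runs the full command against it. The directory names echo the sample output above, but the file contents and line counts are invented for the demonstration. The awk step is an addition that normalizes the spacing of wc's output so it looks the same across systems.

```shell
#!/bin/sh
# Build a small tree with known line counts (contents are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/foo/bar" "$demo/baz"
printf 'a\nb\nc\n' > "$demo/foo/bar/index.html"   # 3 lines
printf 'a\n'       > "$demo/baz/index.html"       # 1 line
printf 'x\ny\n'    > "$demo/baz/other.html"       # ignored: name does not match

# find runs "wc -l <path>" once per matching file; {} is replaced by the path.
counts=$(cd "$demo" && find . -name index.html -type f -exec wc -l {} \; \
  | awk '{print $1, $2}' | sort)

printf '%s\n' "$counts"

# Clean up the temporary tree.
rm -rf "$demo"
```

Each line of output pairs a line count with the path that find substituted for the braces. other.html never appears because it fails the -name test before -exec is reached.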

We have created a simple find command for finding files and executing a command on each file. There are many other things you can do with find. Of course, man find will give you detailed information on the command. Here's an interesting set of examples that provide more complex filtering logic: http://www.athabascau.ca/html/depts/compserv/webunit/HOWTO/find.htm

Thanks to sdboyer for reminding me how amazingly awesome find is.


