Command Line Searching with grep, find, and ag

Apr 3 2017

There are lots of tools for searching files on the UNIX (macOS, Linux) command line. Which one do you use? Let's look at grep, find, and ag to understand which tool is the best for a particular search job.

Three Search Tools

There are three tools we'll look at here:

  • grep: This tool is for using a regular expression to search the content of one or more files.
  • find: This is a tool for searching directory trees to find files that match certain criteria.
  • ag: Called The Silver Searcher (Ag is the chemical symbol for silver), ag is a tool optimized for searching source code files for particular regular expressions.

The grep and find tools have been around since the early days of UNIX, both originally written in the 1970s. ag is a newcomer to the field, created around 2012 as an alternative to ack.

I use all three fairly frequently. What I've learned is that each has its strong points.

When to Use ag

I use ag primarily in an interactive shell (or from vim). Its support for colorized output and highlighting, combined with its stunning speed, makes it great for hunting through large code bases for a particular thing.

I usually use it when I'm in "coding mode", trying to find a pesky bug or refactor something in my code.

For example, I can use ag "func Test" to search for all of the test functions in my Go project.

One of the cool features of ag is that it will respect the settings from .gitignore files. It will skip searching files and directories ignored by that file.

But I rarely use ag in shell scripts. Really, I only script with it if I am sure that the script I am writing is just for me. Since ag is not installed on most systems by default, it's not a good idea to assume its presence in shell scripts.
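If a script should still benefit from ag where it happens to be installed, one option is to check for it at run time and fall back to grep. The sketch below is only an illustration: search_code is a hypothetical helper name, and the two tools are not perfectly interchangeable (ag uses PCRE-style patterns and honors .gitignore, while grep -r does neither, and -r itself is a common extension rather than a POSIX option).

```shell
#!/bin/sh
# search_code PATTERN [DIR]
# Hypothetical helper: use ag when it is on the PATH, otherwise fall back to grep.
search_code() {
    pattern=$1
    dir=${2:-.}
    if command -v ag >/dev/null 2>&1; then
        # --nocolor keeps the output plain so it is safe to pipe or parse.
        ag --nocolor "$pattern" "$dir"
    else
        # -r recurses into directories, -n prints line numbers.
        grep -rn -- "$pattern" "$dir"
    fi
}

# Usage: search_code "func Test" .
```

For simple literal patterns like "func Test" both branches should find the same lines, though the set of files searched can differ.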

When to Use grep

While ag is specialized for code, grep is a general purpose search tool for text files. I can roughly approximate the same results from the ag example above by doing this:

$ grep -n "func Test" *.go
sqlite_pod_test.go:44:func TestPlainStructInsert(t *testing.T) {
sqlite_pod_test.go:66:func TestPlainStructLoad(t *testing.T) {
sqlite_pod_test.go:91:func TestPlainStructLoadWhere(t *testing.T) {
sqlite_pod_test.go:120:func TestPlainStructUpdate(t *testing.T) {
sqlite_pod_test.go:149:func TestPlainStructDelete(t *testing.T) {
sqlite_pod_test.go:175:func TestPlainStructExists(t *testing.T) {

The output might not be as attractive, but several attributes of grep make it easier to script with:

  • Each line contains all the information we need to identify file, line number, and the matched line
  • The format is relatively easy to parse
  • The results of grep can be easily piped to other UNIX commands (grep "Test" *.go | grep -v "Plain")
  • The grep command is a "common denominator" for UNIX-like systems
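To make the "easy to parse" point concrete, here is a minimal sketch that splits each file:line:text record into its three fields. list_matches is a hypothetical helper name, and the sketch assumes that file names contain no colons.

```shell
#!/bin/sh
# list_matches PATTERN FILE...
# Hypothetical helper: reformat grep -n output one record at a time.
list_matches() {
    pattern=$1
    shift
    # /dev/null is an extra (empty) input, so grep prefixes every match
    # with its file name even when only one real file is given.
    grep -n -- "$pattern" /dev/null "$@" |
    while IFS=: read -r file line text; do
        # With IFS set to ":", read splits on the first two colons and
        # puts the rest of the record into $text.
        printf '%s line %s: %s\n' "$file" "$line" "$text"
    done
}

# Usage: list_matches "func Test" *.go
```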

The grep command is included in just about every UNIX environment I've ever worked in. Even tiny Linux distributions like Alpine Linux include an implementation of grep.

On one hand, this is great. It means grep is a great choice for performing searches from within shell scripts.

But on the other hand, it is important to be aware that there are multiple implementations of grep. The BSD and GNU grep implementations support a superset of the standard set of features, while smaller versions like BusyBox grep support a more limited set.
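One way to cope with these differences in scripts is to stick to options that POSIX specifies for grep, such as -E, -F, -c, -i, -l, -n, -q, and -v. The sketch below does that; count_test_funcs is a hypothetical helper name, and the pattern is just an example.

```shell
#!/bin/sh
# count_test_funcs FILE...
# Hypothetical helper: count Go test and benchmark functions using only
# options that POSIX defines for grep.
count_test_funcs() {
    # cat merges the inputs so grep prints one total, not per-file counts.
    # -E selects extended regular expressions; -c counts matching lines.
    cat -- "$@" | grep -c -E '^func (Test|Benchmark)'
}

# Usage: count_test_funcs *_test.go
```

Extensions such as GNU grep's -P (Perl-compatible patterns) are convenient interactively, but they are the kind of feature that may be missing on another system.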

When to Use find

While ag and grep are really optimized for searching the contents of files, the find command is optimized for searching directory trees to find files.

For example, I can search my source code to find all of the .go files:

$ find . -name "*.go" -print
./example/fence.go
./example/users.go
./schema2struct/schema2struct.go
...

One of the nice features of find is that it supports some elaborate filtering features designed to take advantage of filesystem information. For example, the -user mbutcher filter will only return files owned by the user mbutcher.
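Filters can be combined, and find only returns files that satisfy all of them. As a small sketch (recent_go_files is a hypothetical helper name, and the seven-day window is an arbitrary choice for illustration):

```shell
#!/bin/sh
# recent_go_files DIR OWNER
# Hypothetical helper: list .go files under DIR that belong to OWNER and
# were modified within the last week.
recent_go_files() {
    # -type f       regular files only
    # -name '*.go'  name matches the glob (quoted so the shell leaves it alone)
    # -user OWNER   file is owned by OWNER (a user name or numeric ID)
    # -mtime -7     modified less than 7 days ago
    find "$1" -type f -name '*.go' -user "$2" -mtime -7
}

# Usage: recent_go_files . mbutcher
```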

Another great feature of find is its ability to run an operation on each match it identifies. The find command makes it easy to execute another command on each match.

For example, we can search for all .go files owned by mbutcher and then run each through grep to count the lines in which the string const appears:

$ find . -name "*.go" -user mbutcher -print -exec grep -c "const " {} \;
./example/fence.go
1
./example/users.go
1
./schema2struct/schema2struct.go
5
./schema2struct/test.go
0

The example above is doing these things:

  • Search all files and directories, starting from the current directory (.)
  • Return only files that match the name pattern *.go
  • Filter only files that are owned by mbutcher
  • Print the name of each matched file
  • And for each matched file, run: grep -c "const " {} (the -c flag prints the count of matching lines within that file)

The special name {} tells find to substitute the name of the matched file.
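Ending -exec with \; runs the command once per file. Ending it with + instead lets find pass many file names to a single invocation, which is usually faster on large trees. Here is a sketch of the same count written that way. count_const is a hypothetical helper name, and -H (always print the file name) is supported by GNU, BSD, and BusyBox grep but is not specified by POSIX.

```shell
#!/bin/sh
# count_const [DIR]
# Hypothetical helper: print "file:count" for every .go file under DIR.
count_const() {
    # {} + appends as many matched file names as fit onto one grep command line.
    # -c counts matching lines; -H prefixes each count with its file name,
    # even if grep happens to receive only one file.
    find "${1:-.}" -type f -name '*.go' -exec grep -c -H -- 'const ' {} +
}

# Usage: count_const .
```

Because grep prints the file name itself here, the separate -print is no longer needed.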

It's worth noting that the syntax for a find command varies a little from the UNIX norm. It is:

find [FLAGS] [PATH] [FILTER EXPRESSION]

This can be confusing because filters often look like (Plan 9 formatted) flags: -name, -exec, -user, etc.
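A short sketch may help show which part is which. Here -L is a true flag (follow symbolic links), the directory is the path, and everything after it is the filter expression, including the grouping parentheses and the -o (or) operator. list_sources is a hypothetical helper name, and the two extensions are arbitrary examples.

```shell
#!/bin/sh
# list_sources [DIR]
# Hypothetical helper: list .go and .sh files under DIR, following symlinks.
list_sources() {
    #    FLAG PATH        FILTER EXPRESSION
    find -L "${1:-.}" -type f \( -name '*.go' -o -name '*.sh' \)
}

# Usage: list_sources .
```

The parentheses are escaped so that the shell passes them through to find instead of interpreting them itself.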

To dive a little deeper into find, you might enjoy these articles:

Wrapping Up

Each of these tools is useful in a particular context:

  • ag is for interactive code searching
  • grep is for searching inside files
  • find is for searching for files inside directory trees

All three are good to know for your day-to-day UNIXy work.


