Command Line Searching with grep, find, and ag
There are lots of tools for searching files on the UNIX (macOS, Linux) command line. Which one do you use? Let's look at grep, find, and ag to understand which tool is the best for a particular search job.
Three Search Tools
There are three tools we'll look at here:
- grep: This tool uses a regular expression to search the content of one or more files.
- find: This is a tool for searching directory trees to find files that match certain criteria.
- ag: Called The Silver Searcher (Ag is the chemical symbol for silver), ag is a tool optimized for searching source code files for particular regular expressions.
The grep and find tools have been around since the early days of UNIX, both originally written in the 1970s. ag is a newcomer to the field, created around 2012 as an alternative to ack.
I use all three fairly frequently. What I've learned is that each has its strong points.
When to Use ag
I use ag primarily in an interactive shell (or from vim). Its support for colorized output and highlighting, combined with its stunning speed, makes it great for hunting through large code bases for a particular thing.
I usually use it when I'm in "coding mode", trying to find a pesky bug or refactor something in my code.
For example, I can use ag "func Test" to search for all of the test functions in my Go project.
One of the cool features of ag is that it respects the settings in .gitignore files: it skips any files and directories that those files tell Git to ignore.
But I rarely use ag in shell scripts. Really, I only script with it if I am sure that the script I am writing is just for me. Since ag is not installed on most systems by default, it's not a good idea to assume its presence in shell scripts.
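If a script might run on someone else's machine, one option is to check for ag at run time and fall back to grep. Here is a minimal sketch. The helper name search_code is made up for illustration, and the fallback assumes a grep that supports -r (GNU, BSD, and Busybox grep do, but POSIX does not require it):

```shell
#!/bin/sh
# search_code is a hypothetical helper name used only for this sketch.
# Usage: search_code PATTERN [DIR]
search_code() {
    pattern=$1
    dir=${2:-.}
    if command -v ag >/dev/null 2>&1; then
        # ag is installed: ask for plain, uncolored output with line numbers.
        ag --nocolor --numbers -- "$pattern" "$dir"
    else
        # Fall back to a recursive grep (-r is common but not POSIX).
        grep -rn -- "$pattern" "$dir"
    fi
}
```

Calling search_code "func Test" . should then work with or without ag installed. The two branches are not identical: ag skips ignored files and uses Perl-style regular expressions, so the results only line up for simple literal patterns like this one.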
When to Use grep
While ag is specialized for code, grep is a general purpose search tool for text files. I can roughly approximate the same results from the ag example above by doing this:
$ grep -n "func Test" *.go
sqlite_pod_test.go:44:func TestPlainStructInsert(t *testing.T) {
sqlite_pod_test.go:66:func TestPlainStructLoad(t *testing.T) {
sqlite_pod_test.go:91:func TestPlainStructLoadWhere(t *testing.T) {
sqlite_pod_test.go:120:func TestPlainStructUpdate(t *testing.T) {
sqlite_pod_test.go:149:func TestPlainStructDelete(t *testing.T) {
sqlite_pod_test.go:175:func TestPlainStructExists(t *testing.T) {
The output might not be as attractive, but several attributes of grep make it easier to script with:
- Each line contains all the information we need to identify file, line number, and the matched line
- The format is relatively easy to parse
- The results of grep can be easily piped to other UNIX commands (grep "Test" | grep -v "Plain")
- The grep command is a "common denominator" for UNIX-like systems
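To make the "easy to parse" point concrete, here is a small sketch that splits each file:line:text record on its first two colons. The helper name list_matches is made up for illustration, and the sketch assumes no file name contains a colon:

```shell
#!/bin/sh
# list_matches is a hypothetical helper name used only for this sketch.
# It reads "file:line:text" records on stdin, the format grep -n prints
# when it is given more than one file, and reformats each record.
list_matches() {
    # With IFS set to ":", read puts the first two fields in file and line;
    # everything after the second colon (colons included) lands in text.
    while IFS=: read -r file line text; do
        printf '%s (line %s): %s\n' "$file" "$line" "$text"
    done
}
```

A pipeline such as grep -n "func Test" *.go | grep -v "Plain" | list_matches would feed it. Note that grep leaves out the file name when it is given a single file, so the three-field split only holds when several files (or an extra /dev/null operand) are passed.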
The grep command is included in just about every UNIX environment I've ever worked in. Even tiny Linux distributions like Alpine Linux include an implementation of grep.
On one hand, this is great. It means grep is a great choice for performing searches from within shell scripts.
But on the other hand, it is important to be aware that there are multiple implementations of grep. The BSD and GNU grep implementations support a superset of the original standard set of features, while versions like Busybox grep support a more limited set.
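One way to cope with the differences is to stick to options and regular expression syntax that POSIX defines. The sketch below uses only -E and -n plus POSIX character classes. The helper name find_tests is made up for illustration:

```shell
#!/bin/sh
# find_tests is a hypothetical helper name used only for this sketch.
# Usage: find_tests FILE...
find_tests() {
    # -E (extended regex) and -n (line numbers) are POSIX grep options.
    # [[:space:]] and [[:alnum:]] are POSIX character classes; shortcuts
    # such as \s are extensions that POSIX does not define.
    grep -En '^func[[:space:]]+Test[[:alnum:]_]*\(' "$@"
}
```

Run as find_tests *.go, this should behave the same way on any grep that follows the standard, at the cost of a slightly wordier pattern.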
When to Use find
While ag and grep are really optimized for searching the contents of files, the find command is optimized for searching directory trees to find files.
For example, I can search my source code to find all of the .go files:
$ find . -name "*.go" -print
./example/fence.go
./example/users.go
./schema2struct/schema2struct.go
...
One of the nice features of find is that it supports some elaborate filtering features designed to take advantage of filesystem information. For example, the -user mbutcher filter will only return files owned by the user mbutcher.
Another great feature of find is its ability to run another command on each match it identifies.
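Filters can be stacked, and find only returns files that pass all of them. As a rough sketch, the following combines four standard filters (-type, -name, -mtime, and -size). The helper name recent_go_files and the seven-day window are arbitrary choices for illustration:

```shell
#!/bin/sh
# recent_go_files is a hypothetical helper name used only for this sketch.
# Usage: recent_go_files [DIR]
recent_go_files() {
    # -type f    regular files only (no directories or symlinks)
    # -name      shell-style match against the file's base name
    # -mtime -7  modified less than 7 days ago
    # -size +0   larger than zero 512-byte blocks, i.e. not empty
    find "${1:-.}" -type f -name "*.go" -mtime -7 -size +0
}
```

Calling recent_go_files . lists the non-empty .go files under the current directory that changed in the last week.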
For example, we can search for all .go files owned by mbutcher and then run each through grep to count the lines on which the string const appears:
$ find . -name "*.go" -user mbutcher -print -exec grep -c "const " {} \;
./example/fence.go
1
./example/users.go
1
./schema2struct/schema2struct.go
5
./schema2struct/test.go
0
The example above is doing these things:
- Search all files and directories starting with .
- Return only files that match the name pattern *.go
- Filter only files that are owned by mbutcher
- Print the name of each matched file
- And for each matched file, run grep -c "const " {} (the -c flag prints the count of matching lines within that file)
The special name {} tells find to substitute the name of the matched file.
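With \; at the end, find starts one grep process per file. Ending the -exec clause with + instead tells find to replace {} with as many file names as fit on one command line. Here is a sketch of a recursive search built that way. The helper name grep_go is made up for illustration:

```shell
#!/bin/sh
# grep_go is a hypothetical helper name used only for this sketch.
# Usage: grep_go PATTERN DIR
grep_go() {
    # "{} +" batches many matched files into each grep invocation.
    # /dev/null is an extra operand that never matches; it makes grep
    # print the file name even when find passes along only one file.
    find "$2" -type f -name "*.go" -exec grep -n -- "$1" /dev/null {} +
}
```

For example, grep_go "const " . prints file:line:text for every matching line in the .go files under the current directory, without relying on grep having a recursive mode of its own.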
It's worth noting that the syntax for a find command varies a little from the UNIX norm. It is:
find [FLAGS] [PATH] [FILTER EXPRESSION]
This can be confusing because filters often look like (Plan 9 formatted) flags: -name, -exec, -user, etc.
To dive a little deeper into find, you might enjoy these articles:
- My earlier blog on Using find
- 25 Simple Linux Find Command Examples
Wrapping Up
Each of these tools is useful in a particular context:
- ag is for interactive code searching
- grep is for searching inside files
- find is for searching for files inside directory trees
All three are good to know for your day-to-day UNIXy work.