What’s the DIFF?

One of the basic tasks Integration Engineers do is comparing the files we use or receive. There are some interesting and useful tools out there for diffing files, but Linux and Unix machines around the world ship with a native tool that is almost always present. Amazingly, it is called diff.
Like some other command-line tools, its interface is not really intuitive. Let’s walk through the basics of getting some use out of this handy file-comparison tool. (If you are working with and comparing EDI files, you might want to look at the post on how to unwrap your EDI file so that the line-by-line comparison is more meaningful.)
How to use “diff”
You can get the real basics by executing “diff --help”, which prints the basic help and options for this application. But in short, here is the thumbnail: “diff” is followed by some options. Each option is designated by a “-” and then a letter indicating the option. The options are then followed by the names of the two files being compared. Let’s look at an example.
Example:
We have two files, file1.txt and file2.txt
| file1.txt | file2.txt |
|---|---|
| This is a test file: | This is a test file: |
| And this is the first line of the first file. | And this is the first line of the second file. |
| Thanks. | Thanks. |
| | Again. |
When we issue this command: “diff file1.txt file2.txt” we get this result.
2c2
< And this is the first line of the first file.
---
> And this is the first line of the second file.
3a4
> Again.
- The first thing we see is “2c2”: line 2 of the first file was changed (c) into line 2 of the second file.
- Next we have a < indicating the first file, followed by the line itself.
- Then comes “---”, a separator between the two versions of the line.
- Next we have > indicating the second file, followed by that file’s version of the line.
- Together, these four lines report one pair of lines that were found to be different.
- The next change is “3a4”: after line 3 of the first file, a line was added (a), and it is line 4 of the second file.
- Finally, > indicates the second file, followed by the added line.
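The walkthrough above can be reproduced with a short shell sketch. It assumes a POSIX-style shell with `mktemp -d` available; the scratch directory and the `result` variable are just illustrative names, and the file contents are the ones from the example.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two example files, one printf argument per line.
printf '%s\n' 'This is a test file:' \
  'And this is the first line of the first file.' \
  'Thanks.' > file1.txt
printf '%s\n' 'This is a test file:' \
  'And this is the first line of the second file.' \
  'Thanks.' 'Again.' > file2.txt

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script running under "set -e" from stopping here.
result=$(diff file1.txt file2.txt || true)
printf '%s\n' "$result"
```

Each block of the output starts with a change command such as 2c2 or 3a4: the numbers on the left are line numbers in the first file, the letter is the action, and the numbers on the right are line numbers in the second file.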
If we were to compare them in the other order, we end with these two lines:
4d3
< Again.
- Here, “4d3” means that line 4 of the first file was deleted (d); had it been kept, it would come after line 3 of the second file.
- Following this is < indicating the first file, followed by the deleted line.
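To see the reversed comparison for yourself, here is a minimal sketch under the same assumptions as before (a POSIX-style shell with `mktemp -d`; `reversed` is just an illustrative variable name). It swaps the argument order and prints only the last two lines of the result.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the two example files.
printf '%s\n' 'This is a test file:' \
  'And this is the first line of the first file.' \
  'Thanks.' > file1.txt
printf '%s\n' 'This is a test file:' \
  'And this is the first line of the second file.' \
  'Thanks.' 'Again.' > file2.txt

# Same comparison with the arguments swapped: file2.txt is now the
# "first" file, so its extra line shows up as a delete, not an add.
reversed=$(diff file2.txt file1.txt || true)

# Print only the last two lines: the delete command and the line it removes.
printf '%s\n' "$reversed" | tail -n 2
```

The point of the swap is that “added” and “deleted” are always relative to the first file named on the command line, so it pays to be consistent about which file you treat as the original.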
Regular Options
Here is the list of options that “--help” gives you, with maybe some more explanation.
diff [-b] [-i] [-t] [-w] [-c] [-C] [-e] [-f] [-h] [-n] [-D string] [-l] [-r] [-s] [-S name] [fileone filetwo ] [directoryone directorytwo]
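| Option | Description |
|---|---|
| -b | Ignores differences in the amount of white space: one run of spaces or tabs compares equal to any other run, and trailing white space is ignored. This is useful when spacing doesn’t matter in what you are comparing. |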
| -i | Ignores case. This is useful when case doesn’t matter in what you are comparing. |
| -t | Expands TAB characters in output lines. Normal or -c output adds character(s) to the front of each line that may adversely affect the indentation of the original source lines and make the output lines difficult to interpret. This option will preserve the original source’s indentation. |
| -w | Ignores all spaces and tabs, so “a b” compares equal to “ab”. Again, for when we don’t want changes in the white space reported. |
| -c | Produces a listing of differences with three lines of context. With this option the output format changes slightly: it begins by identifying the files involved and their modification times, then each change is separated by a line of asterisks. Lines removed from file1 are marked with ‘-’; those added to file2 are marked with ‘+’. Lines that changed from one file to the other are marked in both files with ‘!’. With our two files the output begins: *** file1.txt 2009-11-17 10:20:38.000000000 -0700 |
| -C number | Produces a listing of differences identical to that produced by -c, but with number lines of context. Our example files are so short that supplying a number such as 1 gives the same output as plain -c, e.g. diff -C 1 file1.txt file2.txt |
| -e | Outputs an ed script. I have looked at these, but haven’t really used this feature for anything real. I may later if I have time. With our files the script begins: 3a |
| -f | Produces a similar script, not useful with ed, in the opposite order. (Really, this is exactly like -e except in reverse order.) |
| -h | Does a fast, half-hearted job. It works only when changed stretches are short and well separated, but does work on files of unlimited length. Options -c, -e, -f, and -n are unavailable with -h. diff does not descend into directories with this option. With our example files it produces the same output as with no options. |
| -n | Produces a script similar to -e, but in the opposite order and with a count of changed lines on each insert or delete command. |
| -D string | Creates a merged version of file1 and file2 with C preprocessor controls included so that a compilation of the result without defining string is equivalent to compiling file1, while defining string will yield file2. |
| -l | Produce output in long format. Before the diff, each text file is piped through ‘pr’ to paginate it. Other differences are remembered and summarized after all text file differences are reported. |
| -r | Applies diff recursively to common subdirectories encountered. Just like you would expect if you have ever used this with any other command line tools like grep or rm. |
| -s | Reports files that are identical; these would not otherwise be mentioned. |
| -S name | Starts a directory diff in the middle, beginning with the file name. Basically this is a compare directory after a supplied file name. Make sure this file exists in both directories or you will be disappointed. |
| fileone | File one for comparing. |
| filetwo | File two for comparing. |
| directoryone | Directory one for comparing. |
| directorytwo | Directory two for comparing. |
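The directory-oriented options from the table can be tried out on a throwaway tree. This is a sketch only: the directory and file names (dir1, dir2, a.txt, only.txt, sub/b.txt) are made up for illustration, `mktemp -d` is assumed to be available, and the exact wording of the report lines may vary between diff implementations.

```shell
# Build two small directory trees in a scratch location.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p dir1/sub dir2/sub

printf 'same\n' > dir1/a.txt        # identical in both trees
printf 'same\n' > dir2/a.txt
printf 'old\n' > dir1/sub/b.txt     # differs, and lives in a subdirectory
printf 'new\n' > dir2/sub/b.txt
printf 'extra\n' > dir1/only.txt    # exists on one side only

# -r descends into common subdirectories; -s also reports identical files.
# As with plain files, diff exits non-zero when anything differs.
report=$(diff -r -s dir1 dir2 || true)
printf '%s\n' "$report"
```

With -r, diff walks into sub on both sides and compares b.txt; with -s it also mentions a.txt, which is identical in both trees; only.txt is reported as present in just one directory.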
For comparing files, the first four options (-b, -i, -t, -w) are the most useful. I start without any options and add them as I need them to reduce the amount of change noise reported in the result set.
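Here is a small sketch of adding options one at a time to cut down the noise. The files left.txt and right.txt and their contents are invented for this illustration (they differ only in spacing and case), and `mktemp -d` is assumed to be available.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# left.txt has two spaces on line 1; right.txt has one space plus a
# trailing space, a lowercase line 2, and spaces around "=" on line 3.
printf 'alpha  beta\nGamma\nkey=value\n' > left.txt
printf 'alpha beta \ngamma\nkey = value\n' > right.txt

# Start with no options, then add them as needed.
plain=$(diff left.txt right.txt || true)          # all three lines differ
with_b=$(diff -b left.txt right.txt || true)      # line 1 now matches
with_bi=$(diff -b -i left.txt right.txt || true)  # line 2 now matches too
with_wi=$(diff -w -i left.txt right.txt || true)  # nothing left to report

# Show just the change command (the first line) from each run.
for output in "$plain" "$with_b" "$with_bi" "$with_wi"; do
  printf '%s\n' "$output" | head -n 1
done
```

The first three runs report 1,3c1,3, then 2,3c2,3, then 3c3; the last run prints an empty line because -w and -i together make the files compare equal. Note that -b is not enough for line 3: it treats runs of white space as equivalent, but “key=value” has no white space at all, so only -w hides that difference.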
diff is your friend
Like many basic tools, “diff” is almost always there. And if you know how to use it effectively, it can really save time and frustration. Sure, there are other cool file comparison tools. Some are even embedded into other products. But knowing how to use the basic tools that are always there will be a lifesaver in a crisis situation. And the only way to know how to use them is to actually use them sometimes.
Do you use “diff” with a set of options that does a specific task for you? If so, please share what they are. And what other basic tools do you use?


November 18th, 2009 at 10:29 am
Some gotchas that come to mind include whitespace (including the evil end-of-line whitespace; consider -b and -w above) and non-printing characters.
Another console/Unix tool that might be of interest, depending on the changes you are looking for, is vimdiff, which presents a graphical, color-highlighted diff that you can edit if needed (or just :set list to find the pesky hidden characters).