git diff across renamed directories

Consider the following scenario:

While porting the Spring Framework build from Ant to Gradle, we decided to rename the subproject directories.  The layout used to look like this:

org.springframework.aop
org.springframework.asm
org.springframework.aspects
org.springframework.beans
org.springframework.context
org.springframework.context.support
...

It now looks like this:

spring-aop
spring-asm
spring-aspects
spring-beans
spring-context
spring-context-support
...

This directory structure is more intuitive, because the subprojects are named according to the actual jars that we publish into Maven Central, e.g. spring-aop.jar, spring-context.jar, etc.

However, in the process of developing the Gradle version of the build, certain sources had been temporarily modified -- usually just @Ignore'ing a JUnit test when it broke for an unknown reason.  Each of these changes needed to be revisited before the porting process could be called complete -- of course, all tests need to work both before and after the build system change.

Prior to the directory rename, these diffs were easy to detect with a more or less vanilla `git diff`, e.g.:

# while on the 'gradle' development branch
$ git diff master -- org.springframework.*/src/*/java

This would return a nicely paged, in-color set of results, showing what changes had occurred to source files.  For example, I had @Ignore'd JibxMarshallerTests, intending to come back to it later. It would show up in the results like this:

index 7d43dcd..70bd337 100644
--- org.springframework.oxm/src/test/java/org/springframework/oxm/jibx/JibxMarshallerTests.java
+++ spring-oxm/src/test/java/org/springframework/oxm/jibx/JibxMarshallerTests.java
@@ -29,6 +29,7 @@ import static org.custommonkey.xmlunit.XMLAssert.*;
 import static org.junit.Assert.assertFalse;
 import static org.junit.Assert.assertTrue;
 
+@org.junit.Ignore // TODO fix this issue https://gist.github.com/1174579
 public class JibxMarshallerTests extends AbstractMarshallerTests {
 
        @Override

I expected that, after the directory rename, this kind of diffing would work in a similar way, because I know Git is quite smart about renames.  For example, after a rename, `git log <path>` by default returns only the history of the file under its new path, but you can follow the history of the file across renames using `git log --follow`:

# while on the 'gradle' development branch
$ git log --oneline --name-only -4 --follow spring-oxm/src/test/java/org/springframework/oxm/jibx/JibxMarshallerTests.java
4743aa8 Rename modules {org.springframework.*=>spring-*}
spring-oxm/src/test/java/org/springframework/oxm/jibx/JibxMarshallerTests.java
c34ab00 Initial work on Gradle build
org.springframework.oxm/src/test/java/org/springframework/oxm/jibx/JibxMarshallerTests.java
fa4f90e SPR-7805 - Add support for package binding in the JibxMashaller
org.springframework.oxm/src/test/java/org/springframework/oxm/jibx/JibxMarshallerTests.java
9d1c3fa SPR-6907 - JibxMarshaller - provide access to jibx's writeDocType
org.springframework.oxm/src/test/java/org/springframework/oxm/jibx/JibxMarshallerTests.java

In the results above, you can see that the most recent log entry is against the renamed file (spring-oxm/...), while the previous three are against the original path (org.springframework.oxm).  This is exactly what one wants.
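
To see the `--follow` behavior in isolation, here is a minimal sketch that builds a throwaway repository and counts the log entries with and without the flag.  The module names (org.example.oxm, example-oxm) and the file Notes.txt are made up for illustration; they are not part of the Spring build.

```shell
set -e

# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is 'master'
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit 1: the file under its old directory name.
mkdir org.example.oxm
printf 'line %s\n' 1 2 3 4 5 > org.example.oxm/Notes.txt
git add .
git commit -q -m "Initial layout"

# Commit 2: rename the directory without touching the content.
git mv org.example.oxm example-oxm
git commit -q -m "Rename module"

# Count the log entries for the new path, without and with --follow.
plain=$(git log --oneline -- example-oxm/Notes.txt | wc -l | tr -d ' ')
followed=$(git log --oneline --follow -- example-oxm/Notes.txt | wc -l | tr -d ' ')

echo "without --follow: $plain commit(s)"
echo "with --follow:    $followed commit(s)"
```

Without `--follow`, the history stops at the rename commit; with it, the commit that created the file under its old path shows up as well.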

However, `git diff` doesn't have a --follow flag, and it's not immediately obvious how to get similar functionality.  I essentially want to see the same diff as I originally showed above, even across the renamed directories.

After 30 minutes or so of manpages, googling, and trial-and-error, I arrived at the following command, which provides me exactly the results I'm looking for:

$ git diff --color --diff-filter=R -M master -- \
  org.springframework.* spring-* | egrep -v \
  '^....(diff|similarity|rename)' | less -R

Whew!  Let's break that down:

$ git diff master -- org.springframework.* spring-*

This one is pretty clear: I'm saying that I want to see all changes between the 'master' branch and my current branch ('gradle'), for all paths matching org.springframework.* or spring-* -- this should catch all the renamed files, and any actual diffs between them.

Unfortunately, however, it shows me the deletion of the org.springframework.* files, followed by the addition of the spring-* files -- this is not what I want!  The information is all there, yes, but I need it in a more concise way -- just show me what content changed across the rename!

$ git diff -M master -- org.springframework.* spring-*

Now we're getting somewhere!  "-M" (also "--find-renames") detects that files were renamed and shows only the content differences between the two, across renames.  Brilliant.
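
Here is a small sketch of the difference `-M` makes, again in a throwaway repository with made-up module names.  One assumption to flag: since Git 2.9, rename detection is on by default for `git diff` (the `diff.renames` setting), so the sketch passes `--no-renames` explicitly to reproduce the delete-plus-add output described above.  The pathspecs are quoted so that git, not the shell, expands them; that is my choice for the example rather than something the original command does.

```shell
set -e

# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"

# 'master' keeps the old directory layout.
mkdir org.example.oxm
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > org.example.oxm/Notes.txt
git add .
git commit -q -m "Initial layout"

# 'gradle' renames the directory and changes one line of content.
git checkout -q -b gradle
git mv org.example.oxm example-oxm
echo "line 11" >> example-oxm/Notes.txt
git commit -q -a -m "Rename module and touch one file"

# Rename detection off: the file appears as a deletion plus an addition.
without_renames=$(git diff --no-color --no-renames master -- 'org.example.*' 'example-*')

# Rename detection on: one rename entry, with only the changed line in the hunk.
with_renames=$(git diff --no-color -M master -- 'org.example.*' 'example-*')

echo "=== --no-renames ==="
echo "$without_renames"
echo "=== -M ==="
echo "$with_renames"
```

The second output should carry "similarity index", "rename from" and "rename to" header lines, followed by a hunk containing just the added line.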

However, I'm still seeing quite a bit of extra noise in the output -- all files that were added or removed, as well as lots of information about renamed files that did not have any content differences between them.  So the next step is adding in the --diff-filter flag:

$ git diff --diff-filter=R -M master -- \
  org.springframework.* spring-*

This tells git to show me only information about renamed files (R).  Now I'm getting less noise, but it's not all gone -- I still see thousands of entries for files that were renamed with a "similarity index 100%", meaning that they had no content changes.  I want only the files that have material differences, so now we'll just pipe all that to egrep:

$ git diff --diff-filter=R -M master -- \
  org.springframework.* spring-* | \
  egrep -v '^(diff|similarity|rename)'

It's looking much cleaner now.  I'm seeing only the renamed files across the two branches that actually have changes, but I've lost colored output from `git diff` because its output is being piped.  No problem, we can fix this with `git diff --color`, right?

$ git diff --color --diff-filter=R -M master -- \
  org.springframework.* spring-* | \
  egrep -v '^(diff|similarity|rename)'

Almost -- now that ANSI color codes are being output, my egrep regex no longer matches, because each header line now starts with an escape sequence instead of the word itself.  Let's touch that up by anticipating a four-character ANSI color sequence at the beginning of each of these lines.  The regex changes from '^(diff|similarity|rename)' to '^....(diff|similarity|rename)':

$ git diff --color --diff-filter=R -M master -- \
  org.springframework.* spring-* | \
  egrep -v '^....(diff|similarity|rename)'
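
The four-dot prefix relies on git's header color being a four-character sequence such as ESC[1m (bold), which as far as I know is the default.  If the colors have been customized, the sequence can be longer.  Here is a hedged sketch of a more tolerant pattern, run against hand-written sample lines rather than real git output; the "1;36" color on one line is an invented customization to show the difference.

```shell
set -e

# A literal ESC character, built portably.
esc=$(printf '\033')

# Zero or more SGR color sequences (ESC [ digits/semicolons m), then the keyword.
pattern="^(${esc}\\[[0-9;]*m)*(diff|similarity|rename)"

# Hand-written sample resembling colored 'git diff -M' output (not real output).
sample=$(printf '%s\n' \
  "${esc}[1mdiff --git a/old/A.java b/new/A.java${esc}[m" \
  "${esc}[1msimilarity index 98%${esc}[m" \
  "${esc}[1mrename from old/A.java${esc}[m" \
  "${esc}[1;36mrename to new/A.java${esc}[m" \
  "${esc}[32m+@org.junit.Ignore${esc}[m")

# The fixed four-character prefix misses the line with the longer color code.
filtered_original=$(printf '%s\n' "$sample" | grep -Ev '^....(diff|similarity|rename)')

# The tolerant pattern drops every header line regardless of sequence length.
filtered=$(printf '%s\n' "$sample" | grep -Ev "$pattern")

echo "fixed-width prefix keeps:"
printf '%s\n' "$filtered_original"
echo "tolerant pattern keeps:"
printf '%s\n' "$filtered"
```

With default colors the two patterns should behave the same, so the simpler '^....' version is a reasonable choice for a one-off command.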

Excellent.  I'm now getting exactly the output that I want, just like it looked before the directory rename.  There's only one thing missing -- the content is not paged.  Again, this is because of the piping involved.  Of course, one can just pipe this to `less`, but by default, `less` won't handle the color sequences gracefully.  It'll print them in literal form so you get ugly "ESC[31m" in the output of every line, etc.  The solution here?  Supply the `-R` switch, telling less to output ANSI escape sequences in raw form, i.e. "show me the color, not the escape sequence".  Here we arrive at the final command:

$ git diff --color --diff-filter=R -M master -- \
  org.springframework.* spring-* | \
  egrep -v '^....(diff|similarity|rename)' | \
  less -R

You may have to work for it sometimes, but git rarely fails to provide the tools you need.

In this particular scenario, it would have been nice if git had a way to find renames (-M), but also to exclude those whose similarity index is over a certain threshold.  -M can take a numeric argument (e.g. -M90%), signifying that only pairs with a similarity of at least that percentage should be treated as renames, but in this case I would have liked just the opposite: show me all renames *under* 100% -- that would have saved the egrepping.
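
One possible workaround, offered as a sketch rather than a built-in feature: `git diff --name-status` prints each rename as R followed by its similarity score (R100 for identical content), so the exact renames can be dropped before asking for the full diff.  The repository and file names below are hypothetical, and the sketch assumes paths contain no whitespace, since the path list is passed back to git unquoted.

```shell
set -e

# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "demo@example.com"
git config user.name "Demo"

mkdir org.example.oxm
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > org.example.oxm/Changed.txt
printf 'row %s\n' 1 2 3 4 5 6 7 8 9 10 > org.example.oxm/Same.txt
git add .
git commit -q -m "Initial layout"

# Rename the directory; only Changed.txt gets a content change.
git checkout -q -b gradle
git mv org.example.oxm example-oxm
echo "line 11" >> example-oxm/Changed.txt
git commit -q -a -m "Rename module and touch one file"

# List renames with their scores, keep those below 100%, and emit
# both the old and the new path of each so -M can pair them up again.
changed=$(git diff --no-color -M --diff-filter=R --name-status master |
  awk -F'\t' '$1 != "R100" { print $2; print $3 }')

echo "paths with content changes across the rename:"
echo "$changed"

# Full diff restricted to those paths (unquoted on purpose: one path per word).
git --no-pager diff --no-color -M master -- $changed
```

The final diff still includes the diff/similarity/rename header lines for each changed file, but the entries for files renamed without changes never appear.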