git diff across renamed directories

Consider the following scenario:

While porting the Spring Framework build from Ant to Gradle, we decided to rename the subproject directory structure.  It used to look like this:

org.springframework.aop
org.springframework.asm
org.springframework.aspects
org.springframework.beans
org.springframework.context
org.springframework.context.support
...

It now looks like this:

spring-aop
spring-asm
spring-aspects
spring-beans
spring-context
spring-context-support
...

This directory structure is more intuitive, because the subprojects are named according to the actual jars that we publish into Maven Central, e.g. spring-aop.jar, spring-context.jar, etc.

However, in the process of developing the Gradle version of the build, certain sources had been temporarily modified, usually just @Ignore'ing certain JUnit tests when one broke for an unknown reason.  Each of these changes needed to be revisited before the porting process could be called complete; of course, all tests need to keep passing both before and after the build system change.

Prior to the directory rename, these diffs were easy to detect with a more or less vanilla `git diff`, e.g.:

# while on the 'gradle' development branch
$ git diff master -- org.springframework.*/src/*/java

This would return a nicely paged, in-color set of results, showing what changes had occurred to source files.  For example, I had @Ignore'd JibxMarshallerTests, intending to come back to it later. It would show up in the results like this:

index 7d43dcd..70bd337 100644
--- org.springframework.oxm/src/test/java/org/springframework/oxm/jibx/JibxMarshallerTests.java
+++ spring-oxm/src/test/java/org/springframework/oxm/jibx/JibxMarshallerTests.java
@@ -29,6 +29,7 @@ import static org.custommonkey.xmlunit.XMLAssert.*;
 import static org.junit.Assert.assertFalse;
 import static org.junit.Assert.assertTrue;
 
+@org.junit.Ignore // TODO fix this issue https://gist.github.com/1174579
 public class JibxMarshallerTests extends AbstractMarshallerTests {
 
        @Override

I expected that, after the directory rename, this kind of diff'ing would work in a similar way, because I know Git is quite smart about renames.  For example, after a rename, `git log <path>` will return only the history of the file under its new path, but you can follow the history of the file across renames using `git log --follow`:

# while on the 'gradle' development branch
$ git log --oneline --name-only -4 --follow spring-oxm/src/test/java/org/springframework/oxm/jibx/JibxMarshallerTests.java
4743aa8 Rename modules {org.springframework.*=>spring-*}
spring-oxm/src/test/java/org/springframework/oxm/jibx/JibxMarshallerTests.java
c34ab00 Initial work on Gradle build
org.springframework.oxm/src/test/java/org/springframework/oxm/jibx/JibxMarshallerTests.java
fa4f90e SPR-7805 - Add support for package binding in the JibxMashaller
org.springframework.oxm/src/test/java/org/springframework/oxm/jibx/JibxMarshallerTests.java
9d1c3fa SPR-6907 - JibxMarshaller - provide access to jibx's writeDocType
org.springframework.oxm/src/test/java/org/springframework/oxm/jibx/JibxMarshallerTests.java

In the results above, you can see that the most recent log entry is against the renamed file (spring-oxm/...), while the previous three are against the original path (org.springframework.oxm).  This is exactly what one wants.

However, `git diff` doesn't have a --follow flag, and it's not immediately obvious how to get similar functionality.  I essentially want to see the same diff as I originally showed above, even across the renamed directories.

After 30 minutes or so of manpages, googling, and trial-and-error, I arrived at the following command, which provides me exactly the results I'm looking for:

$ git diff --color --diff-filter=R -M master -- \
  org.springframework.* spring-* | egrep -v \
  '^....(diff|similarity|rename)' | less -R

Whew!  Let's break that down:

$ git diff master -- org.springframework.* spring-*

This one is pretty clear: I'm saying that I want to see all changes from my current branch ('gradle') compared to the 'master' branch, for all paths starting with org.springframework.* or spring-* -- this should catch all the renamed files, and any actual diffs between them.

Unfortunately, however, it shows me the deletion of the org.springframework.* files, followed by the addition of the spring-* files -- this is not what I want!  The information is all there, yes, but I need it in a more concise way -- just show me what content changed across the rename!

$ git diff -M master -- org.springframework.* spring-*

Now we're getting somewhere!  "-M" (also "--find-renames") detects that files were renamed and shows only the content differences between the two, across renames.  Brilliant.
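If you'd like to watch -M at work on something smaller than the Spring codebase, here's a minimal bash sketch that builds a throwaway repository, renames a directory while changing one line in the moved file, and compares `git diff --name-status` with rename detection switched off and on.  The function name and the org.example.oxm / example-oxm directories are made up for illustration, and it assumes git, mktemp and seq are on your PATH.  Without detection, the move shows up as a D (deleted) plus an A (added) entry; with -M, it collapses into a single R entry whose score is below 100.  One caveat from the present day: Git 2.9 and later turn rename detection on by default for `git diff` (the `diff.renames` setting), so the sketch passes `--no-renames` explicitly to get the old behavior.

```shell
#!/usr/bin/env bash
# Sketch: compare `git diff --name-status` with and without rename detection
# on a throwaway repo. Function and directory names are hypothetical.
rename_demo() {
  local repo
  repo=$(mktemp -d)
  (
    cd "$repo" || exit 1
    # Commit with a throwaway identity so no global git config is required.
    commit() {
      git -c user.name=demo -c user.email=demo@example.com \
          -c commit.gpgsign=false commit -q -m "$1"
    }

    git init -q .
    mkdir org.example.oxm
    seq 1 20 | sed 's/^/line /' > org.example.oxm/Foo.java
    git add -A
    commit "before rename"
    base=$(git rev-parse HEAD)

    # Rename the directory and change one line of the moved file,
    # much like @Ignore'ing a single test during the port.
    git mv org.example.oxm example-oxm
    seq 1 20 | sed 's/^/line /; s/^line 7$/line 7 changed/' > example-oxm/Foo.java
    git add -A
    commit "after rename"

    echo "rename detection off:"
    git diff --no-renames --name-status "$base" HEAD
    echo "rename detection on (-M):"
    git diff -M --name-status "$base" HEAD
  )
  rm -rf "$repo"
}

rename_demo
```

The number after the R is git's estimate of how much content survived the move; anything below 100 means the file really changed across the rename.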

However, I'm still seeing quite a bit of extra noise in the output -- all files that were added or removed, as well as lots of information about renamed files that did not have any content differences between them.  So the next step is adding in the --diff-filter flag:

$ git diff --diff-filter=R -M master -- \
  org.springframework.* spring-*

This tells git to show me only information about renamed files (R).  Now I'm getting less noise, but it's not all gone: I still see thousands of entries about files that were renamed with a 100% similarity index, meaning that they had no content changes.  I want only the files that have material differences, so now we'll just pipe all that to egrep:

$ git diff --diff-filter=R -M master -- \
  org.springframework.* spring-* | \
  egrep -v '^(diff|similarity|rename)'

It's looking much cleaner now.  I'm seeing only the renamed files across the two branches that actually have changes, but I've lost colored output from `git diff` because its output is being piped.  No problem, we can fix this with `git diff --color`, right?

$ git diff --color --diff-filter=R -M master -- \
  org.springframework.* spring-* | \
  egrep -v '^(diff|similarity|rename)'

Almost.  Now that the ANSI color codes are being output, my egrep regex no longer matches, because each header line starts with an escape sequence rather than with "diff", "similarity" or "rename".  Let's touch that up by anticipating that there will be four characters representing the ANSI color sequence at the beginning of every line.  The regex changes from '^(diff|similarity|rename)' to '^....(diff|similarity|rename)':

$ git diff --color --diff-filter=R -M master -- \
  org.springframework.* spring-* | \
  egrep -v '^....(diff|similarity|rename)'

Excellent.  I'm now getting exactly the output that I want, just like it looked before the directory rename.  There's only one thing missing -- the content is not paged.  Again, this is because of the piping involved.  Of course, one can just pipe this to `less`, but by default, `less` won't handle the color sequences gracefully.  It'll print them in literal form so you get ugly "ESC[31m" in the output of every line, etc.  The solution here?  Supply the `-R` switch, telling less to output ANSI escape sequences in raw form, i.e. "show me the color, not the escape sequence".  Here we arrive at the final command:

$ git diff --color --diff-filter=R -M master -- \
  org.springframework.* spring-* | \
  egrep -v '^....(diff|similarity|rename)' | \
  less -R
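The '....' works because git's default color for these header lines (the `color.diff.meta` setting) is bold, and the escape sequence for bold, ESC[1m, happens to be exactly four characters long.  If you've customized that color, or color is off, the assumption breaks.  Here's a slightly sturdier sketch of the same filter as a bash function (the name `drop_rename_headers` is my own invention): it skips any number of color escape sequences, including none, before looking for the keywords.

```shell
#!/usr/bin/env bash
# Sketch: drop git's "diff --git", "similarity index" and "rename from/to"
# header lines from (possibly colored) `git diff` output read on stdin.
# The function name is hypothetical.
drop_rename_headers() {
  local esc=$'\033'
  # Allow zero or more ANSI color (SGR) sequences such as ESC[1m or
  # ESC[1;31m before the keyword, so colored and plain output both work.
  grep -Ev "^(${esc}\[[0-9;]*m)*(diff|similarity|rename)"
}

# Usage, mirroring the final command above:
#   git diff --color --diff-filter=R -M master -- \
#     org.springframework.* spring-* | drop_rename_headers | less -R
```

Content lines always begin with '+', '-' or a space (after any color codes), so they can never be mistaken for one of these headers.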

You may have to work for it sometimes, but git rarely fails to provide the tools you need.

In this particular scenario, it would have been nice if git had a way to find renames (-M) but also exclude those that have a similarity index over a certain threshold.  -M can take a numeric argument (e.g. -M90%), so that only delete/add pairs with a similarity *over* that percentage are treated as renames, but in this case I would have liked just the opposite: show me all renames *under* 100%.  That would have saved the egrepping.
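That said, the "under 100%" filter can be approximated in two passes.  With `--name-status`, each rename is printed as R plus its similarity score, followed by the old and new paths, all tab-separated, so an unchanged rename shows up as R100.  A sketch (again a bash function with a made-up name) that keeps only renames scoring below 100 and prints their old and new paths, which can then be handed back to `git diff` as pathspecs:

```shell
#!/usr/bin/env bash
# Sketch: read `git diff -M --name-status` output on stdin and print the old
# and new path of every rename whose similarity score is below 100%.
# The function name is hypothetical.
changed_rename_paths() {
  # Fields are tab-separated: status+score (e.g. R087), old path, new path.
  awk -F'\t' '$1 ~ /^R/ && $1 != "R100" { print $2; print $3 }'
}

# Usage: feed the surviving pairs back to `git diff` as pathspecs, so only
# renames that actually changed are diffed. Assumes paths contain no
# whitespace and that the list is short enough for xargs to pass it to a
# single git invocation (otherwise an old/new pair could be split up).
# `xargs -r` skips running git at all when nothing changed.
#   git diff -M --diff-filter=R --name-status master -- \
#     org.springframework.* spring-* | changed_rename_paths |
#     xargs -r git diff --color -M master -- | less -R
```

The headers for those few files are still there, but the thousands of 100% renames never make it into the output in the first place.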

email++

I was surprised recently to discover the significance of the '+' character in email addresses.  It turns out that anything in the local part of an email address following a '+' should be ignored by the mail server during routing.

For example, an email addressed to foo+bar@gmail.com will be routed to user 'foo' just as if the email had been addressed to foo@gmail.com.  I don't recall how I came across this, but a bit of googling pointed me to the Wikipedia page on Email Sub-Addressing.  The page is informative, but strangely it doesn't mention RFC 3598, which introduced the sub-addressing concept in late 2003 (I guess I should get busy editing that article).  Perhaps the fact that the RFC is relatively new compared to the age of email itself explains why this feature is not so well known or exploited by users.
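For the curious, here's roughly what a server that supports '+' sub-addressing does when routing, as a minimal bash sketch.  The function name is hypothetical, and it assumes '+' as the separator and an address containing exactly one '@'; real mail servers can be configured to use a different separator.

```shell
#!/usr/bin/env bash
# Sketch of '+' sub-address handling at routing time: drop everything from
# the first '+' in the local part. Assumes '+' is the separator and the
# address contains exactly one '@'. The function name is hypothetical.
strip_subaddress() {
  local local_part=${1%@*}   # text before the '@'
  local domain=${1##*@}      # text after the '@'
  printf '%s@%s\n' "${local_part%%+*}" "$domain"
}

strip_subaddress foo+bar@gmail.com   # foo@gmail.com
strip_subaddress foo@gmail.com       # unchanged: foo@gmail.com
```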

A few use cases for sub-addressing came to mind once I knew of it: 

 
1) A native message tagging system
 
 user+tagname@host.com can be filtered via rules on the client side into different folders, etc.  I don't care so much about this personally, as I tend to have very few folders and generally adhere to the "Inbox Zero" approach to email management.
 
However, I do use (and recommend) Evernote, where tagging is an important organizational tool.  One can create notes by sending mail to an Evernote email address, and the company recently announced support for tagging notes sent via email by using #hash #tags in the subject line.
 
This support is nice and I use it frequently, but a more elegant approach might have been to support the same functionality via +tags in the Evernote address.  This would leave the subject line uncluttered, which is important in cases where Evernote is not the only recipient.  For example, it's a bit strange to send an email like the following:
 
Re: Your mail #tag1 #tag2
 
but the alternative would be natural and transparent to all involved:
 
 
I tried sending mail to my Evernote account with +tags in the address just to see what would happen, and they failed even to route it to my account.  I received the following bounce notification:
 
Evernote was unable to submit your note for the following reason:
Emailed note was received, but Evernote did not understand the email address. It may be mis-typed, or the user may not exist. Please check to make sure the address was properly entered.

Too bad, but there's nothing conceptually stopping them from adding this functionality tomorrow (or at least letting the message through!).  Sending messages to different Evernote notebooks could even be supported with hash tags in the address (per RFC 3598, sub-addressing is not limited to the plus character).  For example, an email to me.4el72#london+meal+expenses@evernote.com would create a new note in the notebook named 'london' having tags 'meal' and 'expenses'.
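To make the proposal concrete, here's a sketch of how such an address might be split up.  To be clear, this is purely hypothetical: Evernote doesn't support it, and the '#notebook' followed by '+tag' convention is just the scheme suggested above.  It's written as a bash function with a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical parser for the addressing scheme proposed above:
#   <user>#<notebook>+<tag>+<tag>@evernote.com
# Evernote does not support this; it only illustrates the idea.
parse_note_address() {
  local local_part=${1%@*}
  local user=${local_part%%[#+]*}        # text before the first '#' or '+'
  local suffix=${local_part#"$user"}     # e.g. "#london+meal+expenses"
  local notebook=""
  local -a tags=()

  if [[ $suffix == '#'* ]]; then
    notebook=${suffix:1}                 # drop the leading '#'
    notebook=${notebook%%+*}             # notebook name ends at the first '+'
    suffix=${suffix#"#$notebook"}        # leave only "+meal+expenses"
  fi
  # Split the remaining "tag+tag" on '+' into an array.
  IFS='+' read -r -a tags <<< "${suffix#+}"

  printf 'user=%s notebook=%s tags=%s\n' "$user" "$notebook" "${tags[*]}"
}

parse_note_address 'me.4el72#london+meal+expenses@evernote.com'
# user=me.4el72 notebook=london tags=meal expenses
```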
 
 
2) Lightweight disposable email addresses
 
 Disposable email services like Mailinator have existed for ages, but one could use sub-addressing to achieve a similar result.  Specify a +qualified version of your email address when signing up for a service online, and then filter it at a later time if mail from that service is no longer wanted.
 
Pro:  Zero creation overhead.  'Creating' the disposable address is as simple as adding "+disposable" to your usual address, e.g. me+disposable@mail.com.  Some services make it nearly this easy, in that you can send an email to whatever42@disposablemailservice.com and then go check the account, but that still requires knowing about the service and then checking the 'inbox' via that service's website, negotiating their UI, and so on.
 
Con:  Those with malicious intent can easily subvert the scheme by truncating the + and everything that follows it.  Your actual email address is in no way hidden.
 
Con:  You have to manually add a filter to 'dispose of' the email address, which undercuts the zero creation overhead described above.  This can be mitigated somewhat by always using the same +disposable tag and routing it to a 'disposable' folder in your email client.
 
You might say this approach is convenient but insecure.  If you wish to truly hide, it's not the way to go; it may be ideal if you basically trust the service in question but want to hedge your bets against potentially unwanted email in the future.
 
 
3) Spam tracing
 
 A slight variation on the above.  Supply a different +qualifier each time one's email address is provided to a merchant or service online.  If mail is received with that qualifier from anyone other than that original merchant or service, you know exactly who is selling your data.  Unfortunately, as mentioned above, any smart spammer would know to shave off the + and everything after so as to hide the origin of the list.  This is probably a common spamming 'best practice' already (there's a book title I'd love to see).
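The bookkeeping is simple enough to sketch in bash: one hypothetical helper builds the address to hand a given merchant, and another pulls the qualifier back out of the address a message was delivered to.  If that message didn't come from the merchant named in the tag, you know who leaked the address; an empty tag is the case described above, where the sender has shaved off the + and everything after it.  Both helpers assume '+' as the separator and a single '@'.

```shell
#!/usr/bin/env bash
# Sketch of per-merchant tagging for spam tracing. Function names are
# hypothetical; assumes '+' as the sub-address separator and one '@'.

# Build the address to give a particular merchant.
tag_address() {
  printf '%s+%s@%s\n' "${1%@*}" "$2" "${1##*@}"
}

# Recover the tag from the address a message was delivered to.
# Prints an empty line when there is no tag (e.g. a spammer stripped it).
subaddress_tag() {
  local local_part=${1%@*}
  case $local_part in
    *+*) printf '%s\n' "${local_part#*+}" ;;
    *)   printf '\n' ;;
  esac
}

tag_address me@mail.com acme-store      # me+acme-store@mail.com
subaddress_tag me+acme-store@mail.com   # acme-store
```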
 
 
Caveats
 
* Apparently, not all mail servers implement sub-addressing when receiving messages.  Gmail certainly does, and I would presume that most modern servers maintained since 2003 (the date of RFC 3598) provide support.
 
* Certain mail servers will reject attempts to send messages to addresses containing a '+' or other characters that look non-standard but are in fact valid.  Hotmail, for example, is notoriously restrictive.
 
* Web forms are often too restrictive in validating email addresses and will refuse to submit a +qualified address.  I actually ran into this the same day I found out about sub-addressing.

OmniFocus Feature Request: Provide 1st-class support for items in a 'Waiting For' state

I just sent this request via email to OmniFocus support.  After I wrote it, I searched around and found a thread on the OF support forum proposing a similar idea.

 
As a user of OmniFocus, I want to be able to filter out projects that are currently waiting on an action from another person or occurrence of an external event.
 
Many OmniFocus users, including myself, create a context named 'Waiting For', but doing so is not particularly useful.  What I need is to be able to filter out projects that are in a 'wait state' from my current list of projects in order to have less noise and avoid being overwhelmed.
 
Today OmniFocus has 'Active', 'Stalled', 'Pending', and 'On Hold' project states.  I suggest adding a 'Waiting For' project state and a project filter called 'Actionable'.  Actionable projects are technically 'Active' (they are not indefinitely 'On Hold', and I may not wish to set a future date in order to make them 'Pending').
 
 
Consider the following use case which assumes the feature has been implemented:
 
* I have my OmniFocus Project Filter set to 'Remaining'
 
* I have a project called 'Sell my house in Seattle'.  The current task in this project is to email my real estate agent and ask him to do a market assessment.
 
* I complete this task and check it off.  I create a second task entitled 'Waiting for agent to respond re: market assessment'.
 
* During the creation of this task, I indicate via a new control on the UI that this is an item in a 'Waiting For' state.
 
* Because this task is the next task to be completed for this project, the project state changes to 'Waiting For'.  An appropriate overlay shows up on the project icon, just as happens when a project becomes 'Pending', 'On Hold' or otherwise transitions state.
 
* I switch my Project Filter to 'Actionable', and the 'Sell my house in Seattle' project disappears because it is in a 'Waiting For' state, and is thus not 'actionable' by me right now.
 
* OmniFocus is now significantly more useful to me, because I know that the projects in front of me are all things that I can actually work on immediately.