Overview

Git can diff binary files and include the results in its diff output when passed the --binary option. Without that option, it only reports that the binary files differ.

This is a non-standard extension to the Unified Diff format and is not widely supported outside of Git (at the time of this writing, neither Subversion nor Mercurial supports it).

This guide explains how Git Binary Diffs work. It is based in part on information gathered from Stack Overflow and other write-ups, and completed with in-house investigation of generated binary diffs. Every effort was made to avoid using any existing code for this analysis.

Types of Binary Diffs

Binary diffs can be generated in two forms:

  1. Literal Binary Diffs
  2. Delta Binary Diffs

Literal Binary Diffs contain the full (zlib-compressed, encoded) contents of both the original and the modified binary files.
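
As a rough illustration of the "zlib-compressed, encoded" step, here is a minimal Python sketch that round-trips a file's full contents through compression and a text encoding. It assumes the text encoding is Base85 (Python's base64.b85encode is documented as the variant used in git-style binary diffs) and it ignores any per-line framing in the diff itself. The function names are hypothetical and exist only for this example.

```python
import base64
import zlib


def encode_literal_payload(data: bytes) -> str:
    """Compress the full file contents with zlib, then Base85-encode them.

    Assumption: Base85 is the text encoding. Per-line framing of the
    payload inside a diff is deliberately left out of this sketch.
    """
    compressed = zlib.compress(data)
    return base64.b85encode(compressed).decode("ascii")


def decode_literal_payload(text: str) -> bytes:
    """Reverse the steps: Base85-decode, then zlib-decompress."""
    compressed = base64.b85decode(text)
    return zlib.decompress(compressed)


# A literal payload carries the whole file, so decoding needs no other input.
original = bytes(range(256)) * 4
payload = encode_literal_payload(original)
restored = decode_literal_payload(payload)
```

Because the payload is self-contained, a literal diff can be applied even when the original file is not available.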

Delta Binary Diffs contain (zlib-compressed, encoded) instructions on applying patches to binary files. Those instructions are based on the results from the XDelta algorithm.
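
To make the idea of patch instructions concrete, here is a simplified Python sketch of applying a delta. It assumes the instructions reduce to two kinds: copy a byte range from the original file, or insert new bytes. The tuple representation and the apply_delta name are hypothetical and used only for illustration; this is not Git's byte-level encoding of those instructions.

```python
def apply_delta(source: bytes, ops) -> bytes:
    """Build the modified file from the original plus a list of instructions.

    Each instruction is a tuple (a simplified, hypothetical representation):
      ("copy", offset, length)  -> reuse bytes from the original file
      ("insert", data)          -> add bytes that are new in the modified file
    """
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            if offset < 0 or length < 0 or offset + length > len(source):
                raise ValueError("copy range outside the original file")
            out += source[offset:offset + length]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError(f"unknown instruction: {op[0]!r}")
    return bytes(out)


source = b"The quick brown fox"
ops = [
    ("copy", 0, 4),          # b"The "
    ("insert", b"slow "),    # new bytes not present in the original
    ("copy", 10, 9),         # b"brown fox"
]
result = apply_delta(source, ops)  # b"The slow brown fox"
```

Unlike a literal payload, a delta is only meaningful alongside the original file, since copy instructions refer to byte ranges within it.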

In both cases, the formatted structure is pretty much the same: