When exactly does a Git merge conflict arise?

 

Question


I’m using Git to track changes to my LaTeX documents. I tend to keep feedback from co-authors in a separate branch and merge it in later. So far things seem to merge properly, as if by magic, but I would like to know exactly when a merge conflict occurs, so that I can have some real trust in the merging process (I would not like the text to come out funky, of course).

There are a number of questions on StackOverflow that seem to ask the same thing, but none of the answers get very specific. For example, this answer says that a conflict occurs if changes were made to the same region, which makes me wonder what exactly those regions are. Is it just changes made to the same line, or is some context taken into account?

 

 

————————————————-

Answer

It’s on a line-by-line basis, and the answer is sort of both no and yes: context does matter, but how much it matters is tricky. It’s both more and less than you might think at first.

You might want to skim through this answer to a related question first, for background. I will now assume that we have base as the (single) merge base (perhaps we set the name base by tagging that commit, e.g., git tag base $(git merge-base HEAD other)), HEAD as our commit, and some other branch name, other, naming the other commit.
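Everything below hinges on the merge base, so here is a minimal sketch of what git merge-base computes for a simple history. It assumes a hypothetical toy history stored as a dict from each commit name to its parents’ names. Real Git walks commit objects and is far more optimized. A merge base is a best common ancestor: a common ancestor that is not itself an ancestor of some other common ancestor.

```python
from typing import Dict, List, Set

# Hypothetical toy history: commit name -> list of parent names.
History = Dict[str, List[str]]

def ancestors(history: History, commit: str) -> Set[str]:
    """All commits reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(history.get(c, []))
    return seen

def merge_bases(history: History, a: str, b: str) -> Set[str]:
    """Best common ancestors: common ancestors that are not an ancestor
    of any other common ancestor."""
    common = ancestors(history, a) & ancestors(history, b)
    return {c for c in common
            if not any(c != d and c in ancestors(history, d) for d in common)}

# Toy version of the setup below: b1, b2 and b3 all fork from "base".
history: History = {
    "root": [], "base": ["root"],
    "b1": ["base"], "b2": ["base"], "b3": ["base"],
}
print(merge_bases(history, "b1", "b2"))  # {'base'}
```

Returning a set rather than a single commit leaves room for criss-cross histories, where there can be more than one best common ancestor.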

Next, we look at the two diffs:

git diff base HEAD
git diff base <other>

If we see that all three versions of file F are different (so that F appears in both outputs, and the changes differ), we must then work, in essence, diff-hunk-by-diff-hunk. Where diff hunks overlap, but make different changes, Git declares a conflict. But—this seems to be your question—what, exactly, does “make different changes” mean?
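To make “diff-hunk-by-diff-hunk” concrete, here is a minimal sketch that describes one side as a list of changes against the base. Each change is a (base_start, base_end, new_lines) triple in 0-based, half-open base line numbers. It uses Python’s difflib.SequenceMatcher as a stand-in for Git’s own diff algorithm. That is an assumption: the two can pick different, equally valid hunks for the same input.

```python
import difflib
from typing import List, Tuple

# (base_start, base_end, new_lines): replace base[base_start:base_end]
# with new_lines. A pure insert has base_start == base_end; a pure
# delete has an empty new_lines list.
Hunk = Tuple[int, int, List[str]]

def hunks(base: List[str], side: List[str]) -> List[Hunk]:
    """Describe `side` as a list of changes against `base`."""
    matcher = difflib.SequenceMatcher(a=base, b=side, autojunk=False)
    return [(i1, i2, side[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

base = ["a", "b", "c", "d"]
ours = ["a", "b", "X", "c", "d"]   # insert X before base line "c"
theirs = ["a", "c", "d"]           # delete base line "b"
print(hunks(base, ours))    # [(2, 2, ['X'])]
print(hunks(base, theirs))  # [(1, 2, [])]
```

Both lists use base line numbers, which is what lets the two sides be compared with each other at all.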

I think this is best shown by example. Here HEAD is the tip of branch b1:

$ git diff base HEAD
diff --git a/basefile b/basefile
index df781c1..e4f9e4b 100644
--- a/basefile
+++ b/basefile
@@ -4,6 +4,7 @@
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
 # are met:
+# added line in b1
 # 1. Redistributions of source code must retain the above copyright
 #    notice, this list of conditions and the following disclaimer.
 # 2. Redistributions in binary form must reproduce the above copyright

and:

$ git diff base b2
diff --git a/basefile b/basefile
index df781c1..c96620e 100644
--- a/basefile
+++ b/basefile
@@ -4,7 +4,6 @@
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
 # are met:
-# 1. Redistributions of source code must retain the above copyright
 #    notice, this list of conditions and the following disclaimer.
 # 2. Redistributions in binary form must reproduce the above copyright
 #    notice, this list of conditions and the following disclaimer in the

Note that in one sense these changes don’t touch the same line, but in another sense they do. I added a line 7 (pushing old line 7 down to line 8), and in b2 I deleted the old line 7. Apparently these count as the “same” line. So:

$ git merge b2
Auto-merging basefile
CONFLICT (content): Merge conflict in basefile
Automatic merge failed; fix conflicts and then commit the result.
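Here is a minimal sketch of the rule this example exposes. It assumes a (base_start, base_end, new_lines) hunk representation with 0-based, half-open base line ranges. That is a modelling choice for illustration, not Git’s internal data structure. Two hunks are treated as conflicting when their base ranges overlap or abut, with no unchanged base line between them, unless both sides made the identical change. This approximates the check in Git’s xdiff merge code rather than transcribing it.

```python
from typing import List, Tuple

# (base_start, base_end, new_lines), 0-based and half-open in the base.
Hunk = Tuple[int, int, List[str]]

def conflicts(ours: Hunk, theirs: Hunk) -> bool:
    """True when two different changes overlap or touch in the base."""
    if ours == theirs:
        return False                 # identical change on both sides
    o_start, o_end, _ = ours
    t_start, t_end, _ = theirs
    # Separate only if at least one unchanged base line lies between them.
    return not (o_end < t_start or t_end < o_start)

# b1: insert a line before old line 7 (index 6); nothing deleted.
b1 = (6, 6, ["# added line in b1"])
# b2: delete old line 7 (index 6); nothing added.
b2 = (6, 7, [])
print(conflicts(b1, b2))  # True
```

The insertion point (between base indexes 5 and 6) and the deleted range [6, 7) share a boundary with no untouched line between them, so the sketch reports a conflict, matching what Git did.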

Let’s abort this merge and consider the tip of branch b3 instead (in my setup, the merge base of b1 and b3 is the same as the merge base of b1 and b2).

$ git merge --abort
$ git diff base b3
diff --git a/basefile b/basefile
index df781c1..e2b8567 100644
--- a/basefile
+++ b/basefile
@@ -5,7 +5,6 @@
 # modification, are permitted provided that the following conditions
 # are met:
 # 1. Redistributions of source code must retain the above copyright
-#    notice, this list of conditions and the following disclaimer.
 # 2. Redistributions in binary form must reproduce the above copyright
 #    notice, this list of conditions and the following disclaimer in the
 #    documentation and/or other materials provided with the distribution.

$ git merge --no-edit b3
Auto-merging basefile
Merge made by the 'recursive' strategy.
 basefile | 1 -
 1 file changed, 1 deletion(-)

This time there was no conflict, even though both diff hunks touched the same general area. The b3 diff deleted a line that was not “touching” the added line: old line 7 still sits, unchanged, between b1’s insertion point and b3’s deletion of old line 8, so Git considered this safe.
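Continuing with the same simplified model (hunks as (base_start, base_end, new_lines) in base coordinates), here is a minimal sketch of a clean three-way merge. When no pair of hunks conflicts, it walks the base once and splices in each side’s replacement lines. The merge3 name and the use of ValueError for conflicts are illustrative choices. Real Git instead writes conflict markers into the file.

```python
from typing import List, Tuple

Hunk = Tuple[int, int, List[str]]  # (base_start, base_end, new_lines)

def merge3(base: List[str], ours: List[Hunk], theirs: List[Hunk]) -> List[str]:
    """Apply both sides' hunks to `base`; raise if any pair conflicts."""
    for o in ours:
        for t in theirs:
            separate = o[1] < t[0] or t[1] < o[0]
            if o != t and not separate:
                raise ValueError(f"conflict: {o} vs {t}")
    # Drop exact duplicates (same change on both sides), then apply in base order.
    changes = sorted(set((s, e, tuple(new)) for s, e, new in ours + theirs))
    result: List[str] = []
    pos = 0
    for start, end, new in changes:
        result.extend(base[pos:start])  # unchanged base lines before the hunk
        result.extend(new)              # this hunk's replacement lines
        pos = end                       # skip the base lines it replaced
    result.extend(base[pos:])
    return result

base = [f"line {n}" for n in range(1, 10)]  # stand-in for basefile
b1 = [(6, 6, ["added line in b1"])]         # insert before line 7
b3 = [(7, 8, [])]                           # delete line 8
print(merge3(base, b1, b3))
# ['line 1', 'line 2', 'line 3', 'line 4', 'line 5', 'line 6', 'added line in b1', 'line 7', 'line 9']
```

Calling merge3(base, b1, [(6, 7, [])]) instead, which is the b2 deletion of line 7, raises ValueError, mirroring the earlier conflict.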

If you experiment more in this same fashion, you will find exactly which seemingly overlapping changes are combined successfully and which result in a conflict. Obviously, changes that directly overlap, e.g., where both delete original line 42 and insert a different new line 42, will conflict. But every change is always represented as “delete some existing line(s), though maybe zero of them” followed by “add some new line(s), though maybe zero of them”. A change to part of a line, even just one word, deletes that whole line and adds a new replacement line. A pure delete (of one or more complete lines) adds zero lines, and a pure insert deletes zero lines. In the end, it comes down to: “Did both the ours and theirs changes touch the same line number?” More precisely, the two changes conflict when the base-file lines they replace overlap or are directly adjacent, with no unchanged base line left between them, unless both sides made exactly the same change. The context becomes almost irrelevant, except that when deleting zero lines, or inserting zero lines, the context itself “is” the lines, in a sense. (I’m not sure how much sense this claim makes, so if it’s incomprehensible, that’s my fault. 😉)
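To see the “delete some lines, then add some lines” representation concretely, this minimal sketch reports, for each hunk, how many base lines are deleted and how many new lines are added. It again uses difflib as a stand-in for Git’s diff (an assumption), and the change_shape helper is hypothetical, for illustration only.

```python
import difflib
from typing import List, Tuple

def change_shape(base: List[str], side: List[str]) -> List[Tuple[int, int]]:
    """For each hunk, return (lines deleted from base, lines added)."""
    matcher = difflib.SequenceMatcher(a=base, b=side, autojunk=False)
    return [(i2 - i1, j2 - j1)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

base = ["alpha", "the quick fox", "omega"]
# A one-word edit still deletes one whole line and adds one new line.
print(change_shape(base, ["alpha", "the slow fox", "omega"]))          # [(1, 1)]
# A pure delete adds zero lines.
print(change_shape(base, ["alpha", "omega"]))                          # [(1, 0)]
# A pure insert deletes zero lines.
print(change_shape(base, ["alpha", "the quick fox", "new", "omega"]))  # [(0, 1)]
```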

(Remember also that if you are modifying the “merged so far” file as you work, you must use the original base file’s line numbers when checking whether a change touched “the same” lines. Since both “ours” and “theirs” have the same base version, that’s an easy shortcut we can use here.)
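The bookkeeping in that parenthetical can be sketched as follows, assuming the same hypothetical (base_start, base_end, new_lines) hunks. Each hunk already applied above an unchanged base line shifts that line by len(new_lines) - (base_end - base_start) in the merged-so-far file. That drift is why it is simpler to compare hunks in base coordinates.

```python
from typing import List, Tuple

Hunk = Tuple[int, int, List[str]]  # (base_start, base_end, new_lines)

def merged_index(base_index: int, applied: List[Hunk]) -> int:
    """Where unchanged base line `base_index` (0-based) now sits in the
    merged-so-far file, given hunks already applied to the base."""
    shift = 0
    for start, end, new in applied:
        if end <= base_index:        # hunk lies entirely above this line
            shift += len(new) - (end - start)
    return base_index + shift

# After b1's one-line insertion before base index 6,
# base index 7 (old line 8) has moved down to index 8.
print(merged_index(7, [(6, 6, ["added line in b1"])]))  # 8
```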

A three-way merge is not a patch

Note that this differs from applying a patch, which starts without a common base version. In the case of a patch, the context is used much more heavily. The diff hunk header says where to start searching for the context, but since the patch might be applied to a different version of the file, the context allows us (and Git) to make the same change at a different line, as long as the context still matches.

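The patch utility uses a somewhat different algorithm here. It, too, starts near the line the hunk header suggests and scans forward and backward for matching context. If no exact match turns up, it retries while ignoring up to a “maximum fuzz” factor’s worth of context lines at the start and end of the hunk. By default, git apply uses no fuzz: every context line must match (its -C<n> option relaxes this). It will, however, search all the way to the beginning or end of the file if it has to, and it has the usual options for ignoring white-space differences before deciding that context fails to match.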
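Here is a minimal sketch of the search described above. It tries the hinted position first, then steps outward one line back and one line forward at a time until both ends of the file are reached, with optional white-space-insensitive comparison. It assumes that what must match is the hunk’s “preimage” (its context plus deleted lines). The find_hunk name and the whitespace normalization are illustrative simplifications, not git apply’s actual code or exact whitespace rules.

```python
from typing import List, Optional

def find_hunk(lines: List[str], preimage: List[str], hint: int,
              ignore_space: bool = False) -> Optional[int]:
    """Return the index where `preimage` matches in `lines`, searching
    outward from `hint`, or None if it matches nowhere."""
    def norm(s: str) -> str:
        # Simplified: collapse runs of white space when asked to.
        return " ".join(s.split()) if ignore_space else s

    want = [norm(s) for s in preimage]
    last = len(lines) - len(preimage)  # last index where preimage fits
    if last < 0:
        return None
    hint = min(max(hint, 0), last)

    def matches(pos: int) -> bool:
        return [norm(s) for s in lines[pos:pos + len(preimage)]] == want

    # Alternate backward and forward until the whole file is covered.
    for offset in range(max(hint, last - hint) + 1):
        for pos in (hint - offset, hint + offset):
            if 0 <= pos <= last and matches(pos):
                return pos
    return None

target = ["new header", "a", "b", "c", "d"]  # file gained a line at the top
preimage = ["b", "c"]                         # hunk's context + deleted lines
print(find_hunk(target, preimage, hint=1))    # 2
```

The header said index 1, but because a line was added at the top the context is found one line further down, which is the “same change at a different line” behaviour.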

(When using git apply to apply a patch, you can add -3 or --3way to allow Git to read the index lines, which provide partial or full hash IDs of file blobs. The left-hand hash ID is the hash of the previous version of the file: note that in all the diffs above, the “base” version of basefile has ID df781c1. If Git can find a unique blob from that ID, it can pretend that that blob is the merge base, diff just that merge base against HEAD, treat the patch itself as the other diff, and do a three-way merge that way. This sometimes allows git apply to succeed where patch would fail.)
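The IDs on an index line are Git blob IDs. In a default (SHA-1) repository, a blob’s ID is the SHA-1 of the header “blob <size>” plus a NUL byte, followed by the file’s bytes. Here is a minimal sketch of computing that ID and resolving an abbreviated one. The known list and the resolve helper are hypothetical stand-ins for Git’s object-database lookup.

```python
import hashlib
from typing import Iterable, Optional

def blob_id(content: bytes) -> str:
    """Git's object ID for a blob (SHA-1 object format)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def resolve(prefix: str, known_ids: Iterable[str]) -> Optional[str]:
    """Return the unique known ID starting with `prefix`, or None if
    zero or several match (an ambiguous prefix can't be used)."""
    hits = {i for i in known_ids if i.startswith(prefix)}
    return hits.pop() if len(hits) == 1 else None

print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
known = [blob_id(b""), blob_id(b"test content\n")]
print(resolve("e69de29", known))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The uniqueness requirement matters: if an abbreviated ID like df781c1 matched several blobs, there would be no single base version to fall back on.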
