Absorbing Commit Changes in Mercurial 4.8
November 05, 2018 at 09:25 AM | categories: Mercurial, Mozilla
Every so often a tool you use introduces a feature that is so useful
that you can't imagine how things were before that feature existed.
The recent 4.8 release of the
Mercurial version control tool introduces
such a feature: the hg absorb
command.
hg absorb is a mechanism to automatically and intelligently incorporate
uncommitted changes into prior commits. Think of it as hg histedit or
git rebase -i with auto squashing.
Imagine you have a set of changes to prior commits in your working
directory. hg absorb figures out which changes map to which commits
and absorbs each of those changes into the appropriate commit. Using
hg absorb, you can replace cumbersome and often merge-conflict-ridden
history editing workflows with a single command that often just works.
Read on for more details and examples.
Modern version control workflows often entail having multiple unlanded commits in flight. What this looks like varies heavily by the version control tool, standards and review workflows employed by the specific project/repository, and personal preferences.
A workflow practiced by a lot of projects is to author your changes as a sequence of standalone commits, with each commit representing a discrete, logical unit of work. Each commit is then reviewed/evaluated/tested on its own as part of a larger series. (This workflow is practiced by Firefox, the Git and Mercurial projects, and the Linux kernel, to name a few.)
A common task that arises when working with such a workflow is the need to incorporate changes into an old commit. For example, let's say we have a stack of the following commits:
$ hg show stack
@ 1c114a ansible/hg-web: serve static files as immutable content
o d2cf48 ansible/hg-web: synchronize templates earlier
o c29f28 ansible/hg-web: convert hgrc to a template
o 166549 ansible/hg-web: tell hgweb that static files are in /static/
o d46d6a ansible/hg-web: serve static template files from httpd
o 37fdad testing: only print when in verbose mode
/ (stack base)
o e44c2e (@) testing: install Mercurial 4.8 final
Contained within this stack are 5 commits changing the way that static files are served by hg.mozilla.org (but that's not important).
Let's say I submit this stack of commits for review. The reviewer spots a problem with the second commit (serve static template files from httpd) and wants me to make a change.
How do you go about making that change?
Again, this depends on the exact tool and workflow you are using.
A common workflow is to not rewrite the existing commits at all: you simply create a new fixup commit on top of the stack, leaving the existing commits as-is. e.g.:
$ hg show stack
o deadad fix typo in httpd config
o 1c114a ansible/hg-web: serve static files as immutable content
o d2cf48 ansible/hg-web: synchronize templates earlier
o c29f28 ansible/hg-web: convert hgrc to a template
o 166549 ansible/hg-web: tell hgweb that static files are in /static/
o d46d6a ansible/hg-web: serve static template files from httpd
o 37fdad testing: only print when in verbose mode
/ (stack base)
o e44c2e (@) testing: install Mercurial 4.8 final
When the entire series of commits is incorporated into the repository,
the end state of the files is the same, so all is well. But this strategy
of using fixup commits (while popular - especially with Git-based tooling
like GitHub that puts a larger emphasis on the end state of changes rather
than the individual commits) isn't practiced by all projects.
hg absorb
will not help you if this is your workflow.
A popular variation of this fixup commit workflow is to author a new commit then incorporate this commit into a prior commit. This typically involves the following actions:
<save changes to a file>
$ hg commit
<type commit message>
$ hg histedit
<manually choose what actions to perform to what commits>
OR
<save changes to a file>
$ git add <file>
$ git commit
<type commit message>
$ git rebase --interactive
<manually choose what actions to perform to what commits>
Essentially, you produce a new commit. Then you run a history editing command and tell it what to do (e.g. to squash or fold one commit into another). That command performs the work and produces a set of rewritten commits.
In simple cases, you may make a simple change to a single file. Things are pretty straightforward: you only need to know which two commits to squash together. This is often trivial, although it can be cumbersome if there are several commits and it isn't clear which one should receive the new changes.
In more complex cases, you may make multiple modifications to multiple files. You may even want to squash your fixups into separate commits. And for some code reviews, this complex case can be quite common. It isn't uncommon for me to be incorporating dozens of reviewer-suggested changes across several commits!
These complex use cases are where things can get really complicated for version control tool interactions. Let's say we want to make multiple changes to a file and then incorporate those changes into multiple commits. To keep it simple, let's assume 2 modifications in a single file squashing into 2 commits:
<save changes to file>
$ hg commit --interactive
<select changes to commit>
<type commit message>
$ hg commit
<type commit message>
$ hg histedit
<manually choose what actions to perform to what commits>
OR
<save changes to file>
$ git add <file>
$ git add --interactive
<select changes to stage>
$ git commit
<type commit message>
$ git add <file>
$ git commit
<type commit message>
$ git rebase --interactive
<manually choose which actions to perform to what commits>
We can see that the number of actions required by users has already increased
substantially. Not captured by the number of lines is the effort that must go
into the interactive commands like hg commit --interactive,
git add --interactive, hg histedit, and git rebase --interactive. For
these commands, users must tell the VCS tool exactly what actions to take.
This takes time and requires some cognitive load. This ultimately distracts
the user from the task at hand, which is bad for concentration and productivity.
The user just wants to amend old commits: telling the VCS tool what actions
to take is an obstacle in their way. (A compelling argument can be made that
the work required with these workflows to produce a clean history is too much
effort and it is easier to make the trade-off favoring simpler workflows
versus cleaner history.)
These kinds of squash fixup workflows are what hg absorb is designed to
make easier. When using hg absorb, the above workflow can be reduced to:
<save changes to file>
$ hg absorb
<hit y to accept changes>
OR
<save changes to file>
$ hg absorb --apply-changes
Let's assume the following changes are made in the working directory:
$ hg diff
diff --git a/ansible/roles/hg-web/templates/vhost.conf.j2 b/ansible/roles/hg-web/templates/vhost.conf.j2
--- a/ansible/roles/hg-web/templates/vhost.conf.j2
+++ b/ansible/roles/hg-web/templates/vhost.conf.j2
@@ -76,7 +76,7 @@ LimitRequestFields 1000
# Serve static files straight from disk.
<Directory /repo/hg/htdocs/static/>
Options FollowSymLinks
- AllowOverride NoneTypo
+ AllowOverride None
Require all granted
</Directory>
@@ -86,7 +86,7 @@ LimitRequestFields 1000
# and URLs are versioned by the v-c-t revision, they are immutable
# and can be served with aggressive caching settings.
<Location /static/>
- Header set Cache-Control "max-age=31536000, immutable, bad"
+ Header set Cache-Control "max-age=31536000, immutable"
</Location>
#LogLevel debug
That is, we have 2 separate uncommitted changes to
ansible/roles/hg-web/templates/vhost.conf.j2.
Here is what happens when we run hg absorb:
$ hg absorb
showing changes for ansible/roles/hg-web/templates/vhost.conf.j2
@@ -78,1 +78,1 @@
d46d6a7 - AllowOverride NoneTypo
d46d6a7 + AllowOverride None
@@ -88,1 +88,1 @@
1c114a3 - Header set Cache-Control "max-age=31536000, immutable, bad"
1c114a3 + Header set Cache-Control "max-age=31536000, immutable"
2 changesets affected
1c114a3 ansible/hg-web: serve static files as immutable content
d46d6a7 ansible/hg-web: serve static template files from httpd
apply changes (yn)?
<press "y">
2 of 2 chunk(s) applied
hg absorb automatically figured out that the 2 separate uncommitted changes
mapped to 2 different changesets (Mercurial's term for commits). It
printed a summary of which lines would be changed in which changesets and
prompted me to accept its plan for how to proceed. The human effort involved
is a quick review of the proposed changes and answering a prompt.
At a technical level, hg absorb
finds all uncommitted changes and
attempts to map each changed line to an unambiguous prior commit. For
every change that can be mapped cleanly, the uncommitted changes are
absorbed into the appropriate prior commit. Commits impacted by the
operation are rebased automatically. If a change cannot be mapped to an
unambiguous prior commit, it is left uncommitted and users can fall back
to an existing workflow (e.g. using hg histedit).
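To make the mapping step concrete, here is a toy sketch in Python. It is not how Mercurial implements hg absorb; it only illustrates the idea described above. The function name plan_absorb, the blame list, and the hunk tuples are all made up for illustration, and the rule for what counts as ambiguous (a hunk touching lines from more than one commit, a pure insertion, or a line owned by a commit outside the stack) is a simplifying assumption. The real command's rules may differ.

```python
def plan_absorb(blame, hunks, stack):
    """Toy model: assign each hunk to the one stack commit that owns its lines.

    blame: blame[i] is the id of the commit that last changed line i
    hunks: list of (start, end, new_lines) replacing lines [start, end)
    stack: set of commit ids that are allowed to be rewritten
    Returns (plan, leftover). plan maps commit id -> hunks to absorb;
    leftover lists the hunks that stay uncommitted.
    """
    plan = {}
    leftover = []
    for start, end, new_lines in hunks:
        owners = set(blame[start:end])
        # Absorb only when exactly one commit owns every replaced line
        # and that commit is part of the rewritable stack.
        if len(owners) == 1 and owners <= stack:
            (owner,) = owners
            plan.setdefault(owner, []).append((start, end, new_lines))
        else:
            leftover.append((start, end, new_lines))
    return plan, leftover


# Hypothetical 5-line file; the ids reuse the short hashes from the stack above.
blame = ["e44c2e", "d46d6a", "166549", "d46d6a", "1c114a"]
stack = {"37fdad", "d46d6a", "166549", "c29f28", "d2cf48", "1c114a"}
hunks = [
    (3, 4, ["AllowOverride None"]),           # owned by d46d6a only
    (4, 5, ["Header set Cache-Control ..."]),  # owned by 1c114a only
    (0, 1, ["changed base line"]),            # owner is below the stack
    (1, 3, ["two", "lines"]),                 # spans two different commits
]
plan, leftover = plan_absorb(blame, hunks, stack)
```

In this toy run the first two hunks each land on a single commit, while the last two are left for a manual workflow, which mirrors the fall-back behavior described above.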
But wait - there's more!
The automatic rewriting logic of hg absorb is implemented by following
the history of lines. This is fundamentally different from the approach
taken by hg histedit or git rebase, which tend to rely on merge
strategies based on the 3-way merge
to derive a new version of a file given multiple input versions. This
approach combined with the fact that hg absorb
skips over changes with
an ambiguous application commit means that hg absorb
will never
encounter merge conflicts! Now, you may be thinking if you ignore
lines with ambiguous application targets, the patch would always apply
cleanly using a classical 3-way merge. This statement logically sounds
correct. But it isn't: hg absorb
can avoid merge conflicts when the
merging performed by hg histedit
or git rebase -i
would fail.
The above example attempts to exercise such a use case. Focusing on the initial change:
diff --git a/ansible/roles/hg-web/templates/vhost.conf.j2 b/ansible/roles/hg-web/templates/vhost.conf.j2
--- a/ansible/roles/hg-web/templates/vhost.conf.j2
+++ b/ansible/roles/hg-web/templates/vhost.conf.j2
@@ -76,7 +76,7 @@ LimitRequestFields 1000
# Serve static files straight from disk.
<Directory /repo/hg/htdocs/static/>
Options FollowSymLinks
- AllowOverride NoneTypo
+ AllowOverride None
Require all granted
</Directory>
This patch needs to be applied against the commit which introduced it. That commit had the following diff:
diff --git a/ansible/roles/hg-web/templates/vhost.conf.j2 b/ansible/roles/hg-web/templates/vhost.conf.j2
--- a/ansible/roles/hg-web/templates/vhost.conf.j2
+++ b/ansible/roles/hg-web/templates/vhost.conf.j2
@@ -73,6 +73,15 @@ LimitRequestFields 1000
{% endfor %}
</Location>
+ # Serve static files from templates directory straight from disk.
+ <Directory /repo/hg/hg_templates/static/>
+ Options None
+ AllowOverride NoneTypo
+ Require all granted
+ </Directory>
+
+ Alias /static/ /repo/hg/hg_templates/static/
+
#LogLevel debug
LogFormat "%h %v %u %t \"%r\" %>s %b %D \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\""
ErrorLog "/var/log/httpd/hg.mozilla.org/error_log"
But after that commit was another commit with the following change:
diff --git a/ansible/roles/hg-web/templates/vhost.conf.j2 b/ansible/roles/hg-web/templates/vhost.conf.j2
--- a/ansible/roles/hg-web/templates/vhost.conf.j2
+++ b/ansible/roles/hg-web/templates/vhost.conf.j2
@@ -73,14 +73,21 @@ LimitRequestFields 1000
{% endfor %}
</Location>
- # Serve static files from templates directory straight from disk.
- <Directory /repo/hg/hg_templates/static/>
- Options None
+ # Serve static files straight from disk.
+ <Directory /repo/hg/htdocs/static/>
+ Options FollowSymLinks
AllowOverride NoneTypo
Require all granted
</Directory>
...
When we use hg histedit or git rebase -i to rewrite this history, the VCS
would first attempt to reorder commits before squashing 2 commits together.
When it attempts to move the fixup diff immediately after the commit that
introduced the line, there is a good chance the VCS tool would encounter a
merge conflict. Essentially, the VCS is thinking: "you changed this line, but
the lines around the change in the final version are different from the lines
in the initial version. I don't know if those other lines matter and therefore
I don't know what the end state should be, so I'm giving up and letting the
user choose for me."
But since hg absorb operates at the line history level, it knows that this
individual line wasn't actually changed (even though the lines around it were),
assumes there is no conflict, and offers to absorb the change. So not only
is hg absorb significantly simpler than today's hg histedit or
git rebase -i workflows in terms of VCS command interactions, but it can
also avoid time-consuming merge conflict resolution!
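The difference between the two approaches can be sketched with a small Python toy. Two loud assumptions: the context matcher below is deliberately strict (real tools use a 3-way merge and are more forgiving than this), and the integer line ids are a hypothetical stand-in for whatever line-history tracking the real command performs. The text is simplified from the diffs shown earlier.

```python
def apply_with_context(lines, before, old, new, after):
    """Strict context matcher: replace `old` only where it sits directly
    between the exact `before` and `after` lines. Returns None on failure."""
    for i in range(1, len(lines) - 1):
        if lines[i - 1] == before and lines[i] == old and lines[i + 1] == after:
            return lines[:i] + [new] + lines[i + 1:]
    return None


def apply_by_identity(version, line_id, new):
    """Replace the text of the line carrying `line_id`, ignoring neighbours."""
    return [(lid, new if lid == line_id else text) for lid, text in version]


# File content right after the commit that introduced the typo.
intro_commit = [
    (1, "<Directory /repo/hg/hg_templates/static/>"),
    (2, "Options None"),
    (3, "AllowOverride NoneTypo"),
    (4, "Require all granted"),
]
# A later commit rewrote the neighbouring lines; lines 3 and 4 kept their ids.
later_commit = [
    (5, "<Directory /repo/hg/htdocs/static/>"),
    (6, "Options FollowSymLinks"),
    (3, "AllowOverride NoneTypo"),
    (4, "Require all granted"),
]

# The fix was written against the latest version, so its context comes from there.
context_result = apply_with_context(
    [text for _, text in intro_commit],
    before="Options FollowSymLinks",
    old="AllowOverride NoneTypo",
    new="AllowOverride None",
    after="Require all granted",
)

# Following the line's identity does not care that the neighbours changed.
fixed_intro = apply_by_identity(intro_commit, 3, "AllowOverride None")
fixed_later = apply_by_identity(later_commit, 3, "AllowOverride None")
```

Here context_result is None because the old version never contained the "Options FollowSymLinks" context, while the identity-based replacement updates the line in both versions and leaves every neighbour untouched.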
Another feature of hg absorb
is that all the rewriting occurs in memory
and the working directory is not touched when running the command. This means
that the operation is fast (working directory updates often account for a lot
of the execution time of hg histedit
or git rebase
commands). It also means
that tools looking at the last modified time of files (e.g. build systems
like GNU Make) won't rebuild extra (unrelated) files that were touched
as part of updating the working directory to an old commit in order to apply
changes. This makes hg absorb
more friendly to edit-compile-test-commit
loops and allows developers to be more productive.
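The build system point can be illustrated with a minimal Python sketch of a Make-like freshness check. The function needs_rebuild and the file names are hypothetical, and the rule is a simplification of what real build systems do: a target is considered stale when any source has a newer modification time, regardless of whether the content changed.

```python
import os
import tempfile


def needs_rebuild(target, sources):
    """Make-like rule: rebuild if the target is missing or any source is newer."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "unrelated.c")
    target = os.path.join(tmp, "unrelated.o")
    for path in (source, target):
        with open(path, "w") as handle:
            handle.write("same content\n")

    # The target was built after the source was last written.
    os.utime(source, (1000, 1000))
    os.utime(target, (2000, 2000))
    fresh = needs_rebuild(target, [source])

    # Simulate a working directory update that rewrites the file with
    # identical content: only the modification time moves forward.
    with open(source, "w") as handle:
        handle.write("same content\n")
    os.utime(source, (3000, 3000))
    stale = needs_rebuild(target, [source])
```

The second check reports a rebuild even though the bytes are unchanged. A history rewrite that never touches the working directory avoids triggering this kind of spurious rebuild.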
And that's hg absorb
in a nutshell.
When I first saw a demo of hg absorb
at a Mercurial developer meetup, my
jaw - along with those all over the room - hit the figurative floor. I thought
it was magical and too good to be true. I thought Facebook (the original authors
of the feature) were trolling us with an impossible demo. But it was all real.
And now hg absorb
is available in the core Mercurial distribution for anyone
to use.
From my experience, hg absorb
just works almost all of the time: I run
the command and it maps all of my uncommitted changes to the appropriate
commit and there's nothing more for me to do! In a word, it is magical.
To use hg absorb, you'll need to activate the absorb extension. Simply
put the following in your hgrc config file:
[extensions]
absorb =
hg absorb is currently an experimental feature. That means there is
no commitment to backwards compatibility and some rough edges are
expected. I also anticipate new features (such as hg absorb --interactive)
will be added before the experimental label is removed. If you encounter
problems or want to leave comments, file a bug,
make noise in #mercurial on Freenode, or
submit a patch.
But don't let the experimental label scare you away from using it:
hg absorb
is being used by some large install bases and also by many
of the Mercurial core developers. The experimental label is mainly there
because it is a brand new feature in core Mercurial and the experimental
label is usually affixed to new features.
If you practice workflows that frequently require amending old commits, I
think you'll be shocked at how much easier hg absorb makes these workflows.
I think you'll find it to be a game changer: once you use hg absorb, you'll
soon wonder how you managed to get work done without it.