Some notes on differences between unifdef version 2 and version 3.
unifdef 3

The main new features are macro expansion and expression simplification.
Newer C++ features such as binary integer literals, raw string literals, and user-defined literals are now handled properly.
Instead of working one line at a time, unifdef now loads the whole
file into memory, so it can handle preprocessor directives that span
multiple lines.
The expression evaluator is more correct: for instance, it uses
intmax_t instead of long, and it knows about signed vs unsigned
and undefined behaviour. It now supports ?: and character constants.
unifdef 3

Support for non-C-like languages has been dropped. This was the -iD,
-iU, and -t options, which (selectively) disabled lexing of strings
and comments.
Complement mode, -c, which reversed which lines were deleted and
which were kept, has been dropped. (You can script it up with comm(1).)
Short-circuit evaluation of && and || cannot be disabled.
unifdef 3

Diagnostics are different. Error exit codes are more specific.
The -d debugging/diagnostics option now takes an argument.
unifdef?

Version 2 works line-at-a-time, which makes it hard to handle C well.
So old unifdef has a bunch of limitations related to preprocessor
directives that span multiple lines. And there's a load of extra
complexity involved in detecting when these limitations are triggered,
and handling them gracefully. The unifdef parser state machine is at
least twice as big because of this.
One of the worst parts of C's lexical syntax is that backslash-newline
can occur anywhere. Old unifdef's line-at-a-time design means it has
to give up when it encounters backslash-newline, so the rest of its
lexer does not have to deal with the full implications. This means
that making the old code work better with multi-line preprocessor
directives isn't just a matter of better buffering: lots of other code
can't support it either.
And there are some embarrassing shortcuts. I think the worst one is
that strings are treated the same as comments, because they can't
legitimately occur in #if directives, so it's convenient to pretend
strings don't exist in a similar way to comments.
But despite being a bit crappy, unifdef is successful and its
limitations don't stop it being useful on real-world code. And it's
economical, about 1300 lines of code (not counting comments and blank
lines).
The main features I want are under the headline idea of a "partial preprocessor", i.e. macro expansion and expression simplification. They both require infrastructure that the old code lacks.
My other aim is a bit more esoteric: to make unifdef conform much
more closely to the standards (de facto as well as de jure). The
success of old unifdef shows that this isn't necessary for a tool to
be useful, but old unifdef definitely needs manual help in difficult
situations. Other authors of C source analysis tools have written
about the difficulties of getting a tool that works in the lab to be
sufficiently trouble-free in the real world. Maybe unifdef is too
simple for it to have this problem, and the effort to improve it will
be a waste; or maybe it's so obviously limited that it doesn't get
pushed hard. Maybe we'll find out which...
In 2002 I started working on unifdef, using CVS because that was the
version control system used by the various BSDs. In the first few
years, its release version number was 1.NNN, which was just the CVS
revision of unifdef.c. There's a tradition (going back to SCCS) of
embedding the version control revision number in the source file, and
unifdef used this to include a version string in the binary that
could be read by SCCS what or RCS ident.
In 2010, I uplifted unifdef to git, which does not have CVS-style
revision numbers. So I replaced the CVS $Keyword$ tags with
manufactured ones containing the output from git describe, and I
decided to bump the major version to 2.N.
The v1/v2 major version bump was partly for administrivial reasons,
but it also made better historical sense. Dave Yost's pre-ANSI-C
unifdef was clearly version 1, and my rewrite was version 2. But I
was 8 years late in applying that logic, partly because unifdef v2
evolved directly from unifdef v1.
Version 3 is a complete rewrite, and deprecates some command line
options, so the major version number bump is fully justified.
(And you can still see it using what or ident.)