dotat unifdef: expand macros, reduce #ifdefs

UNIFDEF(1) General Commands Manual UNIFDEF(1)

unifdef
expand macros, reduce #ifdefs

unifdef [-hV] [-dflags] [-wflags] [-x {012}] [-m | -Mext | -o outfile] [-Dsym[=val]] [-Usym] [-f header] ... [source ...]

The unifdef utility is a partial C preprocessor. It can expand just the macros you want expanded, selectively process #ifdef conditional directives, and simplify #if controlling expressions. It does not process #include directives. It preserves whitespace and comments.

It processes one or more source files according to a collection of macro names which you can specify on the command line with the -D or -U options or using preprocessor directives in -f header files.

-Dsym=val
The symbol is defined to the given value, like #defval sym val in a -f header file.
-Dsym
The symbol is defined to the value 1, like #defval sym 1 in a -f header file.
-Usym
The symbol is explicitly undefined, like #undef sym in a -f header file.
-f header
Read symbol definitions and undefinitions from the header file.

The -D, -U, and -f header options are processed in order, and later definitions and undefinitions override earlier ones.
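The override rule can be pictured as a table in which each option overwrites whatever was recorded for its symbol before. The following is only a sketch of that idea: the names sym_state, sym_slot, apply_define, apply_undef, and sym_lookup, and the fixed table size, are invented for illustration and are not taken from the unifdef source.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the symbols collected from -D, -U, and -f. */
enum sym_kind { SYM_UNKNOWN, SYM_DEFINED, SYM_UNDEFINED };

struct sym_state {
    const char *name;
    enum sym_kind kind;
    const char *value;          /* only meaningful for SYM_DEFINED */
};

#define MAX_SYMS 16
struct sym_state sym_table[MAX_SYMS];
size_t sym_count = 0;

/* Find the entry for name, creating it if there is room. */
struct sym_state *sym_slot(const char *name)
{
    for (size_t i = 0; i < sym_count; i++)
        if (strcmp(sym_table[i].name, name) == 0)
            return &sym_table[i];
    if (sym_count == MAX_SYMS)
        return NULL;
    sym_table[sym_count].name = name;
    sym_table[sym_count].kind = SYM_UNKNOWN;
    sym_table[sym_count].value = NULL;
    return &sym_table[sym_count++];
}

/* -Dsym=val (or -Dsym, meaning the value "1"): replaces earlier state. */
void apply_define(const char *name, const char *value)
{
    struct sym_state *s = sym_slot(name);
    if (s != NULL) {
        s->kind = SYM_DEFINED;
        s->value = value;
    }
}

/* -Usym: replaces earlier state. */
void apply_undef(const char *name)
{
    struct sym_state *s = sym_slot(name);
    if (s != NULL) {
        s->kind = SYM_UNDEFINED;
        s->value = NULL;
    }
}

/* Symbols that were never mentioned stay unknown. */
enum sym_kind sym_lookup(const char *name)
{
    for (size_t i = 0; i < sym_count; i++)
        if (strcmp(sym_table[i].name, name) == 0)
            return sym_table[i].kind;
    return SYM_UNKNOWN;
}
```

In this model, processing -DFOO followed by -UFOO leaves FOO explicitly undefined, while -UFOO followed by -DFOO leaves it defined.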

-o outfile
Write output to the file outfile instead of the standard output when processing a single file.
-Mext
Modify multiple source files in place, and keep backups of the originals by appending the ext to the source file names.
-m
Modify multiple input files in place. If an input file is not modified, the original is preserved instead of being overwritten with an identical copy.
-dflags
Set diagnostic options. See the DIAGNOSTICS section below.
-wflags
See the Write options subsection below.
-x {012}
See the EXIT STATUS section below.
-h
Print help.
-V
Print version details.

The -wflags option allows you to control certain details of how source files are processed, request optional features in the output, or enable special output modes.

Source files can be specified on the command line after the options. They are expected to be written in C or C++, or some other language that has a sufficiently similar lexical syntax.

The -f header files contain a limited set of preprocessor directives that define or undefine symbols as described in the EXTENDED DESCRIPTION section below.

The output of unifdef is the partially preprocessed source file(s). Unlike the standard preprocessor, unifdef preserves comments and whitespace.

The -w flags control features of the output such as #line directives. When there is one source file, you can use -o outfile to specify the output file name. A -Mext or -m option tells unifdef to modify multiple source files in place, with or without backups.

The source is read from the standard input if no source file names are provided on the command line. If a -f header file name or source file name is ‘-’ the file is read from the standard input.

When there is one input file, and no -M, -m, or -o option, the partially preprocessed source is written to the standard output. The -o - option explicitly tells unifdef to write to the standard output.

Errors and warnings are written to the standard error as described in the DIAGNOSTICS section below. The -d flags adjust diagnostic messages.

If the exit mode is -x0 (the default), unifdef exits with status 0 if the output is an exact copy of the input, or with status 1 if the output differs.

If the exit mode is -x1, then the status is inverted.

If the exit mode is -x2, then the status is 0 in either case.

The -h and -V options exit with status 0.

If there is an error, unifdef exits with status 2.
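The exit status rules can be summarised as a small decision function. This is a sketch of the behaviour described above, not code from unifdef; the function name and parameters are invented, and it assumes that an error takes precedence over the exit mode.

```c
/* Hypothetical helper: map the -x mode and the outcome to an exit status.
 * x_mode is 0, 1, or 2, as given to the -x option. */
int unifdef_exit_status(int x_mode, int output_differs, int had_error)
{
    if (had_error)
        return 2;                       /* assumed to override the mode */
    switch (x_mode) {
    case 0:
        return output_differs ? 1 : 0;  /* default, like diff(1) */
    case 1:
        return output_differs ? 0 : 1;  /* inverted */
    default:
        return 0;                       /* -x2: 0 in either case */
    }
}
```

A build script that stops on any non-zero status would see a changed file as a failure under the default mode, so -x2 may be the more convenient choice there.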

You use the -D or -U options or preprocessor directives in -f header files to specify a set of “symbols”. Symbols can be C preprocessor macros or they can have a special unifdef meaning:
Function-like macros
created by #define macro(params) body in -f header files
Object-like macros
created by #define macro value in -f header files
Defined values
affect preprocessor conditions, but are not expanded in the program text like macros; they are set using the -Dsym or -Dsym=val options or by #defval sym val in -f header files
Explicitly undefined
using the -Usym option or with #undef sym in -f header files
Unknown symbols
are other macros, identifiers, or keywords that you did not specify a meaning for.

Each -f header file can only contain #defval, #define, and/or #undef directives, with comments and extra white space. No other preprocessor directives or program text is allowed.
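As an illustration, a -f header covering each kind of symbol might look like the following. All the symbol names are invented. The defined value uses the #pragma unifdef defval spelling mentioned later in this manual so that a standard compiler can also read the file; the argument form shown (symbol then value, as for #defval) is an assumption.

```c
/* symbols.h: a hypothetical -f header file */

/* Function-like macro: expanded in program text and in #if expressions. */
#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Object-like macro: expanded in the same places. */
#define BUFSZ 4096

/* Defined value: affects conditions but is not expanded in program text.
 * Assumed to be equivalent to: #defval HAVE_MMAP 1 */
#pragma unifdef defval HAVE_MMAP 1

/* Explicitly undefined symbol. */
#undef LEGACY_API
```

A standard compiler ignores the unrecognised pragma (possibly with a warning), so the same file can be included by the program if that is useful.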

Given these symbols, unifdef partially pre-processes one or more source files as described in the following sub-sections.

Macros (from #define) are expanded in source files in the usual way, in program text and in #if, #elif, and #include preprocessor directives. Defined values (from #defval) are only substituted into #if and #elif expressions, not the program text.
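The difference between a macro and a defined value shows up in program text. The sketch below uses the invented names LIMIT and FEATURE. The two #define lines are stand-ins so that the file compiles on its own; with unifdef the definitions would come from a -f header instead, and the expected output in the final comment is derived from the rules in this manual rather than from a real run.

```c
/* Stand-ins for a hypothetical -f header containing
 *     #define LIMIT 64
 *     #defval FEATURE 1
 * unifdef ignores #define lines that appear in source files. */
#define LIMIT 64
#define FEATURE 1

#if FEATURE
int buf[LIMIT];
#endif

/* FEATURE is a defined value, so this use in program text is left alone. */
int feature_level = FEATURE;

/* Expected effect on the lines after the stand-ins:
 *
 *     int buf[64];
 *
 *     int feature_level = FEATURE;
 *
 * The #if and #endif disappear because FEATURE makes the condition
 * nothing-but-1, and LIMIT is expanded because it is a macro. */
```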

Macros, defined values, and explicitly undefined symbols can be tested by the defined(sym) operator and #ifdef, #ifndef, #elifdef, and #elifndef conditions. Unknown symbols produce an unknown result from the defined operator, instead of being treated as undefined.

Any #defval, #define, and #undef directives are ignored in source files. #include directives can be changed by macro expansion but unifdef does not do any file inclusion.

The unifdef utility got its name from selectively deleting #ifdef, #ifndef, #if, #elif, #elifdef, #elifndef, #else, and #endif lines, and the sections of program text they control.

#ifdef sym is equivalent to #if defined(sym), and #ifndef sym is equivalent to #if !defined(sym), but they are not simplified like #if when they are retained.

The controlling expression of a #if is evaluated as described in the Expression simplification subsection below. When the whole expression can be completely evaluated and it simplifies to nothing, the #if is deleted. Otherwise the simplified expression is written to the output.

When the #if expression evaluates to nothing-but-1 then just the directive is deleted and the following lines under its control are retained. The #if nothing-but-1 group can be followed by #elif or #else directives which are deleted along with all the program text under their control. The matching #endif is also deleted.

When the #if expression evaluates to nothing-but-0 then the following lines under its control are deleted too. When the #if nothing-but-0 group is followed by #else or #endif, then the #else and #endif lines are deleted and any lines between them are retained. When the #if nothing-but-0 group is followed by #elif expr, it is rewritten to #if expr and it is processed as if it were the start of a group. Similarly, #elifdef sym and #elifndef sym are rewritten to #ifdef sym and #ifndef sym respectively.
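To illustrate the nothing-but-0 case, consider a conditional chain where the first symbol is explicitly undefined and the second is unknown. The symbol and function names are invented, and the expected result in the final comment follows from the rules above rather than from a real run.

```c
/* Hypothetical run: unifdef -UUSE_EPOLL backend.c
 * USE_KQUEUE is not mentioned anywhere, so it is an unknown symbol. */
#undef USE_EPOLL    /* stand-in for -UUSE_EPOLL when compiling directly */

#if defined(USE_EPOLL)
int backend(void) { return 1; }
#elif defined(USE_KQUEUE)
int backend(void) { return 2; }
#else
int backend(void) { return 3; }
#endif

/* Expected result for the conditional group:
 *
 *     #if defined(USE_KQUEUE)
 *     int backend(void) { return 2; }
 *     #else
 *     int backend(void) { return 3; }
 *     #endif
 *
 * The first group is deleted and the #elif is promoted to #if. The
 * remaining condition is kept because its value is unknown. */
```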

In #if and #elif lines in source files, unifdef expands macros and evaluates and simplifies the controlling expression as follows:
  • Macros and defined values are expanded. Any defined(sym) operators are replaced by 1 if the symbol is defined, by 0 if it is explicitly undefined, or left alone if the symbol is unknown.
  • The expression is evaluated. If there is a syntax error, the value of the whole expression is unknown. If a subexpression still contains an unexpanded symbol or defined operator, the value of just that subexpression is unknown.
  • The ‘!’, ‘not’, ‘&&’, ‘and’, ‘||’, ‘or’, and ‘?:’ operators are simplified. When an operand has a known value, part or all of the subexpression is deleted (but unifdef remembers its value). Brackets around a subexpression that simplifies to nothing are also deleted.
  • A subexpression is only deleted if it contained an expanded symbol or defined operator.
  • If a deleted subexpression is an argument of another operator then a 0 or 1 is inserted instead.
expression              simplification
! true                  nothing but 0
! false                 nothing but 1
! other                 unchanged
false && any            nothing but 0
any && false            nothing but 0
true && true            nothing but 1
left && true            left
true && right           right
left && right           unchanged
true || any             nothing but 1
any || true             nothing but 1
false || false          nothing but 0
left || false           left
false || right          right
left || right           unchanged
true ? then : else      then
false ? then : else     else
other ? then : else     unchanged
truthy ?: any           truthy
false ?: any            any
other ?: any            unchanged
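The && and || rows of the table amount to a three-valued logic in which each operand is true, false, or unknown. The sketch below encodes those rows directly. The type and function names are invented, and the model ignores the extra condition that a subexpression is only deleted if it contained an expanded symbol or defined operator.

```c
/* What is known about one operand. */
enum tri { TRI_FALSE, TRI_TRUE, TRI_UNKNOWN };

/* What remains of a binary subexpression after simplification. */
enum outcome {
    NOTHING_BUT_0,  /* subexpression deleted, value remembered as 0 */
    NOTHING_BUT_1,  /* subexpression deleted, value remembered as 1 */
    KEEP_LEFT,      /* only the left operand is written out */
    KEEP_RIGHT,     /* only the right operand is written out */
    UNCHANGED       /* both operands are unknown */
};

enum outcome simplify_and(enum tri left, enum tri right)
{
    if (left == TRI_FALSE || right == TRI_FALSE)
        return NOTHING_BUT_0;
    if (left == TRI_TRUE && right == TRI_TRUE)
        return NOTHING_BUT_1;
    if (right == TRI_TRUE)
        return KEEP_LEFT;
    if (left == TRI_TRUE)
        return KEEP_RIGHT;
    return UNCHANGED;
}

enum outcome simplify_or(enum tri left, enum tri right)
{
    if (left == TRI_TRUE || right == TRI_TRUE)
        return NOTHING_BUT_1;
    if (left == TRI_FALSE && right == TRI_FALSE)
        return NOTHING_BUT_0;
    if (right == TRI_FALSE)
        return KEEP_LEFT;
    if (left == TRI_FALSE)
        return KEEP_RIGHT;
    return UNCHANGED;
}
```

For example, with FOO defined and BAR unknown, the condition defined(FOO) && BAR > 2 corresponds to simplify_and(TRI_TRUE, TRI_UNKNOWN), which keeps only the right operand, so the directive would be written as #if BAR > 2.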

It is not possible for unifdef to straightforwardly conform to the C and/or C++ standards, because unlike the standard preprocessor it retains comments and whitespace and it has a different treatment of unknown preprocessor symbols.

This section describes unifdef's implementation-defined behaviour, some aspects of undefined behaviour, and details of how unifdef differs from the various C and C++ standards. Differences from the standards are marked (diff), implementation-defined behaviours are marked (impl), variations between standards and/or popular implementations are marked (vary), and undefined behaviour that is defined by unifdef is marked (undef).

(impl) There is no limit on the size of tokens or depth of nesting in source files processed by unifdef other than available memory.

(diff) The unifdef utility passes whitespace and comments through unchanged. The standard C preprocessor eliminates comments and backslash-newline sequences, and may collapse horizontal whitespace.

(vary) Some implementations (such as gcc(1) and clang(1)) allow (and warn about) horizontal space inside a backslash-newline sequence, but unifdef does not.

(impl) Newlines in source files can be any of ‘\n’ (line feed), ‘\r\n’ (CRLF), or bare ‘\r’ (carriage return).
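A scanner that accepts all three newline conventions only needs one byte of lookahead. The following is a minimal sketch; the function name is invented and this is not how unifdef itself is necessarily written.

```c
#include <stddef.h>

/* Length in bytes of the newline sequence at p, or 0 if there is none.
 * p must point into a NUL-terminated buffer. */
size_t newline_len(const char *p)
{
    if (p[0] == '\n')
        return 1;                       /* line feed */
    if (p[0] == '\r')
        return p[1] == '\n' ? 2 : 1;    /* CRLF, or bare carriage return */
    return 0;
}
```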

(diff) The ‘\v’ (vertical tab) and ‘\f’ (form feed) characters are always treated as horizontal whitespace. (In C and C++ they are forbidden in preprocessor lines, and in C++ they are also forbidden in // comments.)

(vary) C did not support // line comments before C99.

(vary) C requires a newline at the end of a source file; C++ appends any missing newline; unifdef behaves as if the newline were not missing, but does not append it to the output.

(impl) Source files are expected to be UTF-8, but unifdef allows any ASCII-compatible character set.

(vary) In C++11 translation phase 1 (when trigraphs are processed), characters that are not in the basic source character set are translated into universal character names (the ‘\uXXXX’ notation). C and unifdef do not do this.

(diff) C and C++ restrict universal character names in various ways but unifdef does not.

(vary) Universal character names appeared in C++98 and C99.

(undef) Unexpected characters, such as control characters or ' or " outside a constant or literal, are passed through by unifdef but cause undefined behaviour in C and C++.

(impl) C allows identifiers to contain non-standard characters. Like gcc(1), unifdef allows ‘$’ in identifiers.

(impl) UTF-8 characters and other source bytes with the top bit set are also allowed in identifiers.

(diff) C and C++ allow a subset of universal character names in identifiers, whereas unifdef does not check them.

(vary) C++ does not allow implementations to extend the set of valid identifiers. Multibyte characters are allowed in C++ identifiers via the phase 1 translation to universal character names.

(vary) In C++14, GNU C, and unifdef you can use binary literals like ‘0b0101010’.

(vary) In C++14 you can separate digit groups in numbers with apostrophes, like ‘1'048'576’. C does not have digit separators. Digit separators can be read and passed through by unifdef, but it cannot evaluate integers containing them.

(diff) Backslash-newline sequences inside integers also prevent unifdef from evaluating them.

(vary) The long long type and its ‘ll’ and ‘LL’ integer constant suffixes were introduced in C99 and C++11. The standard preprocessor and unifdef recognise length suffixes but always evaluate expressions using maximum-width integers.

(vary) The standard preprocessor and unifdef do not evaluate floating constants, which may also contain apostrophes in C++.

(vary) The standard preprocessor and unifdef do not evaluate user-defined numeric literals, which were added in C++11.

(diff) The following kinds of character constants can be evaluated by unifdef. Other character constants are passed through unevaluated.
  • A single universal character name that does not overflow
  • A single hex or octal escape that does not overflow
  • One simple escape like '\n', with the usual ASCII value, including the GNU extension '\e' for '\033' ESC
  • A single ASCII or UTF-8 character, whose value is the corresponding Unicode code point
  • A single byte with the top bit set (when the source is non-UTF-8 extended ASCII)

(impl) Hex and octal escape sequences can be up to 8 bits for bare character constants, 7 bits for u8'X' constants, 16 bits for u'X' constants, or 32 bits for U'X' or L'X' constants.
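The width limits can be expressed as a simple range check on the numeric value of the escape. This sketch uses invented names and only models the limits listed above; an escape that does not fit is one that unifdef would pass through without evaluating.

```c
#include <stdint.h>

/* Encoding prefix of a character constant. */
enum char_prefix {
    PREFIX_NONE,    /* 'X'   */
    PREFIX_U8,      /* u8'X' */
    PREFIX_U16,     /* u'X'  */
    PREFIX_U32,     /* U'X'  */
    PREFIX_WIDE     /* L'X'  */
};

/* Maximum number of bits in a hex or octal escape for each prefix. */
int escape_bits(enum char_prefix prefix)
{
    switch (prefix) {
    case PREFIX_NONE: return 8;
    case PREFIX_U8:   return 7;
    case PREFIX_U16:  return 16;
    default:          return 32;    /* U'X' and L'X' */
    }
}

/* Non-zero if the escape value fits, so the constant can be evaluated. */
int escape_fits(enum char_prefix prefix, uint64_t value)
{
    return (value >> escape_bits(prefix)) == 0;
}
```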

(impl) Bare character constants without an encoding prefix are sign-extended if their value is between 128 and 255, unless they are UTF-8.

(impl) Wide character constants containing a non-UTF8 extended ASCII byte with an ‘L’ encoding prefix are also sign-extended.
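Sign extension of a single byte can be sketched as follows. The function name is invented; it models the rule for a one-byte constant such as '\xE9' and does not apply to UTF-8 characters, which evaluate to their code point instead.

```c
/* Value of a one-byte character constant whose byte is sign-extended:
 * bytes 128..255 map to -128..-1, and bytes 0..127 are unchanged. */
int sign_extended_char(unsigned char byte)
{
    return byte >= 128 ? (int)byte - 256 : (int)byte;
}
```

Under this rule '\xE9' has the value -23 and '\xFF' has the value -1, which matches compilers where plain char is signed.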

(impl) The full range of UTF-8 values is allowed in most character constants. The standards limit u'X' to 16 bits, and u8'X' to 7 bits.

(vary) The ‘U’ and ‘u’ encoding prefixes were introduced in C11 and C++11. They are recognised by unifdef.

(vary) The ‘u8’ character prefix was introduced in C++14 and C2x. It is not supported by gcc(1) nor clang(1) but is supported by unifdef.

(vary) User-defined character constants were introduced in C++11. They are recognised and passed through by unifdef but they cannot be evaluated in #if preprocessor control expressions.

(vary) The ‘U’, ‘u’, and ‘u8’ string encoding prefixes were introduced in C11 and C++11. They are all recognised by unifdef.

(vary) C++11 added multiline raw string literals like R"delim(contents)delim" and user-defined string literals like "string"tag. Both are recognised by unifdef.

(vary) In C99, you can use wordy versions of certain operators after #include <iso646.h>. In C++ and unifdef the wordy operators are built in.

(vary) In C99, you can use true and false as constants after #include <stdbool.h>. In C++ and unifdef, true and false are built in.

(vary) Digraphs appeared in C++98 and C99. The ‘%:’ and ‘%:%:’ aliases for ‘#’ and ‘##’ are supported by unifdef. (The other digraphs are not relevant to the C preprocessor.)

(vary) C2x added the #elifdef and #elifndef directives, which are supported by unifdef.

(diff) The #defval directive is specific to unifdef -f header files. For compatibility with standard compilers, it can be written #pragma unifdef defval.

(impl) The C preprocessor is required to evaluate #if controlling expressions using intmax_t and uintmax_t. The details of these types are determined by the compiler that unifdef was built with.

(diff) When there is an error during evaluation, unifdef produces an unknown result, so that the problem subexpression is not simplified.

(diff) Unknown symbols evaluate to 0 in the standard preprocessor, but produce an unknown result in unifdef.

(diff) A wordy operator that appears in an operator position, like ‘5 xor 10’, acts as an operator, for compatibility with C++. A wordy operator that appears in an operand position after macro expansion, like ‘-xor’, is treated as an unknown symbol for compatibility with C. However ‘not’ is always parsed as an operator.

(vary) GNU C and unifdef support a binary variant of the ‘?:’ operator, where then ?: else is like then ? then : else except the then operand is written and evaluated once.

(undef) Signed integer overflow in multiplication, addition, subtraction, or negation produces an unknown result.

(undef) When evaluating the / or % operators, division by zero produces an unknown result, as does division of INTMAX_MIN by -1 (which causes an overflow because -INTMAX_MIN > INTMAX_MAX).
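One way to get an unknown result instead of undefined behaviour is to test the operands before performing the operation. The sketch below shows the idea for addition, negation, and division; the type and function names are invented and this is not claimed to be unifdef's own code.

```c
#include <stdbool.h>
#include <stdint.h>

/* An evaluation result that may be unknown. */
struct eval_result {
    bool known;
    intmax_t value;     /* only meaningful when known is true */
};

static struct eval_result eval_known(intmax_t v)
{
    struct eval_result r = { true, v };
    return r;
}

static struct eval_result eval_unknown(void)
{
    struct eval_result r = { false, 0 };
    return r;
}

struct eval_result eval_add(intmax_t a, intmax_t b)
{
    /* Compare against the limits before adding, so the addition
     * itself can never overflow. */
    if ((b > 0 && a > INTMAX_MAX - b) || (b < 0 && a < INTMAX_MIN - b))
        return eval_unknown();
    return eval_known(a + b);
}

struct eval_result eval_neg(intmax_t a)
{
    if (a == INTMAX_MIN)            /* -INTMAX_MIN is not representable */
        return eval_unknown();
    return eval_known(-a);
}

struct eval_result eval_div(intmax_t a, intmax_t b)
{
    if (b == 0 || (a == INTMAX_MIN && b == -1))
        return eval_unknown();
    return eval_known(a / b);
}
```

Subtraction and multiplication can be guarded in the same style, and % needs the same two checks as division.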

(vary) Shift operators in C and C++ consist almost entirely of undefined behaviour. In C++20 shifts are more tightly specified, and unifdef implements the new requirements, as follows.

(undef) Negative shifts or shifts of more than the word size produce an unknown result.

(impl) A right shift of a negative value is an arithmetic shift, so left >> right is always left / 2^right rounded towards negative infinity.

(undef) A left shift is the same for signed and unsigned values. Before C++20, a left shift of a negative value is undefined, as is a left shift of a positive value that overflows into the sign bit.
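The shift rules can be sketched without relying on the host compiler's behaviour for negative operands. The names below are invented. The sketch treats a shift count equal to the width as unknown as well, which is an assumption based on the C++20 rules, and the left shift assumes that converting an out-of-range unsigned value back to intmax_t wraps around as it does with gcc(1) and clang(1).

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* A shift result that may be unknown. */
struct shift_result {
    bool known;
    intmax_t value;     /* only meaningful when known is true */
};

#define SHIFT_WIDTH ((int)(sizeof(intmax_t) * CHAR_BIT))

/* Arithmetic right shift: rounds towards negative infinity. */
struct shift_result eval_shr(intmax_t left, intmax_t count)
{
    struct shift_result r = { false, 0 };
    if (count < 0 || count >= SHIFT_WIDTH)
        return r;
    r.known = true;
    /* For a negative value, shift its complement (which is not
     * negative) and complement the result again. */
    r.value = left < 0 ? ~(~left >> count) : left >> count;
    return r;
}

/* Left shift: same bit pattern for signed and unsigned values. */
struct shift_result eval_shl(intmax_t left, intmax_t count)
{
    struct shift_result r = { false, 0 };
    if (count < 0 || count >= SHIFT_WIDTH)
        return r;
    r.known = true;
    r.value = (intmax_t)((uintmax_t)left << count);
    return r;
}
```

With these definitions -5 >> 1 gives -3, whereas the C expression -5 / 2 gives -2 because integer division truncates towards zero.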

(diff) There are no predefined macros in unifdef.

(diff) The __VA_ARGS__ and __VA_OPT__ identifiers may occur only in the body of a variadic macro, but unifdef does not check this.

(undef) Other reserved symbols (starting with underscore uppercase or double underscore) and standard predefined macros are not restricted by unifdef.

(vary) Variadic macros were introduced in C99 and C++11. The __VA_OPT__ pseudo-macro was introduced in C++20. In GNU C you can use the ## operator before the variable arguments instead of __VA_OPT__. You can use either in unifdef.

(vary) The __VA_ARGS__, __VA_OPT__, and defined keywords may not be defined or undefined.

(undef) The standards say that defined(sym) operators are evaluated before macro expansion, and macros that expand to defined operators cause undefined behaviour. In practice, implementations (such as gcc(1) and clang(1) and unifdef) evaluate defined operators like a special kind of macro, so it works as expected when used in a macro body.

(vary) In C++, a user-defined literal suffix is part of the token, but unifdef and the gcc(1) and clang(1) preprocessors treat it as a separate identifier token that may be expanded as a macro. If you define a macro and write its name immediately after a string or character literal, a standard preprocessor should not expand the macro (unless its spelling is the same as a keyword).

cpp(1), diff(1).

The unifdef home page

ISO/IEC 9899 (C), ISO/IEC 14882 (C++).

The C standards committee, ISO/IEC JTC1/SC22/WG14

The C++ standards committee, ISO/IEC JTC1/SC22/WG21

The unifdef command appeared in 4.1cBSD. ANSI C support was added in unifdef version 2 which appeared in FreeBSD 4.7. Support for C++ and macro expansion was added in unifdef version 3.

Dave Yost ⟨Dave@Yost.com⟩ wrote the original K&R C implementation.
Tony Finch ⟨dot@dotat.at⟩ rewrote unifdef to support ANSI C, and again to support C++ and macro expansion.

Please refer to the unifdef home page to report bugs or request features.

Trigraphs are not recognised.
March 29, 2021 UNIFDEF(1)