- William of Occam
Command-line processing packages multiply because, despite its apparent simplicity, unrestricted command-line processing is a complex and specialized parsing task. Solutions may be optimized for execution speed, library size, flexibility, expressive power, level of automation, ease of use, or conciseness of specification, but not for all of these at once.
Thus the CPAN offers: small and fast (but unsophisticated) Getopt:: packages; large and powerful (but harder-to-use) Getopt:: packages; and middle-sized and easy-to-use (but restrictive) Getopt:: packages. Getopt::Declare is targeted at still another niche in this multidimensional manifold, offering a large, powerful, flexible, easy-to-use, unrestrictive, highly-automated (but less concisely-specified and somewhat slower) package.
More significantly, Getopt::Declare represents a quite different approach to specifying the nature and meaning of command-line parameters. Most Getopt:: packages take a list of the allowed parameters in some form, possibly annotated with corresponding parameter descriptions, or lists of subarguments, or other flags which control the command-line processing. In contrast, to use Getopt::Declare, the programmer simply specifies the complete "usage" string they wish to have implemented. Getopt::Declare then parses this specification and builds a command-line processor to match.
Thus, when using the standard Getopt::Long, one might write:
    GetOptions('foo|f=s', \$foo, 'bar=i', \&proc, 'ar=s@', \@ar) or die;
    print "foo = $foo, ar = ", @ar;

whereas, using Getopt::Declare, one would write:
    $args = new Getopt::Declare q{
        -foo <str>	Peeking option
        -f <str>	[ditto]
        -bar <num:i>	Drinking option
        	{ proc($_PARAM_,$num) }
        -ar <str>...	Pirate option [repeatable]
    };
    print "foo = $args->{-foo}, ar = ", @{$args->{-ar}};

which is considerably more verbose, but also much clearer and easier to get right. Note that the Getopt::Declare version also provides full automatic usage and version enquiry parameters (-h and -v, respectively) and detailed error messages.
    $args = new Getopt::Declare($specification);

The specification is a single string in which the syntax of each parameter is declared, along with a description and (optionally) one or more actions to be performed when the parameter is encountered. The specification string may also include other usage formatting information (such as group headings or separators) as well as standard Perl comments (which are ignored).
Calling Getopt::Declare::new() parses the contents of the array @ARGV, extracting any arguments that match the parameters defined in the specification string, and storing the parsed values as hash elements within the new Getopt::Declare object being created.
Other features of the Getopt::Declare package include:
The parameter definition consists of a leading flag or parameter variable, followed by any number of parameter variables or punctuators, optionally separated by spaces. The parameter definition is terminated by one or more tabs (at least one trailing tab must be present).
For example, all of the following are valid Getopt::Declare parameter definitions:
    -v	
    in=<infile>	
    +range <from>..<to>	
    --lines <start> - <stop>	
    ignore bad lines	
    <outfile>	

Note that each of the above examples has at least one trailing tab (even if you can't see it). Note too that this hodge-podge of parameter styles is certainly not recommended within a single program, but is shown so as to illustrate some of the range of parameter syntax conventions that Getopt::Declare supports.
The spaces between components of the parameter definition are optional,
but significant. If two components are separated by a space in the definition,
then there may be optional spaces at the same point in a matching argument.
If there is no space between two components, then there may not be any
space at the same point in a matching argument. Hence, as specified above,
the --lines parameter would match any of the following:
    --lines1-10
    --lines 1-10
    --lines 1 -10
    --lines 1 - 10
    --lines1- 10
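The whitespace rule above can be pictured with hand-written regular expressions. This is only a sketch of the idea, not how Getopt::Declare is implemented: the helper names `match_lines` and `match_range` are hypothetical, and the variables are assumed to hold digits only, whereas an untyped parameter variable accepts any string.

```perl
use strict;
use warnings;

# Hand-written regular expressions mimicking two parameter definitions.
# Assumption: the variables are restricted to digits to keep the patterns
# short; an untyped Getopt::Declare variable accepts any string.

# "--lines <start> - <stop>": each space in the definition becomes \s*
my $lines_re = qr/^--lines\s*(\d+)\s*-\s*(\d+)$/;

# "+range <from>..<to>": no space around "..", so none is tolerated there
my $range_re = qr/^\+range\s*(\d+)\.\.(\d+)$/;

# Each helper returns the captured values, or an empty list on failure
sub match_lines { my ($arg) = @_; return $arg =~ $lines_re ? ($1, $2) : (); }
sub match_range { my ($arg) = @_; return $arg =~ $range_re ? ($1, $2) : (); }

for my $arg ('--lines1-10', '--lines 1 - 10', '--lines1- 10') {
    my @m = match_lines($arg);
    print "$arg => (@m)\n";
}

my @r = match_range('+range 1 .. 10');
print "'+range 1 .. 10' does not match\n" unless @r;
```

A space in the definition turns into "zero or more spaces" in the argument, while adjacent components must stay adjacent.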
    -val <str>

would match any of the following arguments:
    -value            # <str> <- "ue"
    -val abcd         # <str> <- "abcd"
    -val "a value"    # <str> <- "a value"

It is also possible to restrict the types of values which may be matched by a given parameter variable. For example:
    -limit <threshold:n>	Set threshold to some (numerical) value
    -count <N:i>	Set count to <N> (must be integer)

See "Parameter variable types" for details of this mechanism.
Parameter variables are treated as scalars by default, but this too can be altered. Any parameter variable immediately followed by an ellipsis (...) is treated as a list variable, and matches its specified type sequentially as many times as possible. For example, the parameter specification:
    -pages <pages:i>...

would match either of the following arguments:
    -pages 1
    -pages 1 2 7 20

Note that both scalar and list parameter variables "respect" the flags of other parameters, as well as their own trailing punctuators. For example, given the specifications:
    -a	
    -b <b_list>...	
    -c <c_list>... ;	

The following argument lists will be parsed as indicated:
    -b -d -e -a    # <b_list> <- ("-d", "-e")
    -b -d ;        # <b_list> <- ("-d", ";")
    -c -d ;        # <c_list> <- ("-d")
    +range <from> [..] [<to>]

which now matches any of:
    +range 1..10
    +range 1..
    +range 1 10
    +range 1
    -list [<pages>...]

Two or more parameter components may be made jointly optional, by specifying them in the same pair of brackets. Optional components may also be nested. For example:
    -range <from> [.. [<to>] ]

Scalar optional parameter variables (such as [<to>]) are given undefined values if they are skipped during a successful parameter match. List optional parameter variables (such as [<pages>...]) are assigned an empty list if unmatched.
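The effect of nested optional components on the resulting values can be sketched with an ordinary regular expression. The function `match_dash_range` is a hypothetical stand-in written for this illustration, and it assumes digit-only values; it is not the module's matching code.

```perl
use strict;
use warnings;

# Hand-written analogue of "-range <from> [.. [<to>] ]".
# Assumption: <from> and <to> are digits only, for brevity.
sub match_dash_range {
    my ($arg) = @_;
    # The outer optional group holds "..", the inner one holds <to>
    return unless $arg =~ /^-range\s*(\d+)\s*(?:\.\.\s*(\d+)?\s*)?$/;
    return +{ from => $1, to => $2 };    # 'to' is undef when it was skipped
}

for my $arg ('-range 1..10', '-range 1..', '-range 1', '-range 1 10') {
    my $m = match_dash_range($arg);
    if ($m) {
        my $to = defined $m->{to} ? $m->{to} : 'undef';
        print "$arg => from=$m->{from}, to=$to\n";
    }
    else {
        print "$arg => no match\n";
    }
}
```

Because `<to>` is nested inside the brackets that hold `..`, it can only be supplied when the `..` punctuator is present, which is why the last argument fails to match.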
One important use for optional punctuators is to provide abbreviated versions of specific flags. For example:
    -num[eric]            # Match "-num" or "-numeric"
    -lexic[ographic]al    # Match "-lexical" or "-lexicographical"
    -b[ells+]w[histles]   # Match "-bw" or "-bells+whistles"

Note that the actual flags for these three parameters are -num, -lexic and -b, respectively.
Descriptions may be placed after the tab(s) following the parameter definition and may be continued on subsequent lines, so long as those lines do not contain any tabs after the first non-whitespace character (because any such line will instead be treated as a new parameter specification). The description is terminated by a blank line, an action specification (see "Actions") or another parameter specification.
For example:
    -v	Verbose mode
    in=<infile>	Specify input file
    		(will fail if file does not exist)
    +range <from>..<to>	Specify range of columns to consider
    --lines <start> - <stop>	Specify range of lines to process
    ignore bad lines	Ignore bad lines :-)
    <outfile>	Specify an output file

The parameter description may also contain special directives which alter the way in which the parameter is parsed. These are described in later sections.
    -v	Verbose mode
    	{ $::verbose = 1; }
    -q	Quiet mode
    	{ $::verbose = 0; }

Each action is executed as soon as the corresponding parameter is successfully matched in the command-line (but see "Deferred actions" for a means of delaying this response). Actions are executed (as "strict" do blocks) in the package in which the Getopt::Declare object containing them was created. In addition, each parameter variable belonging to the corresponding parameter is made available as a (block-scoped) Perl variable with the same name. For example:
    +range <from>..<to>	Set range
    	{ setrange($from, $to); }
    -list <page:i>...	Specify pages to list
    	{ foreach (@page) { list($_) if $_ > 0 } }

Note that scalar parameter variables become scalar Perl variables, and list parameter variables become Perl arrays.
    --	Traditional argument list terminator
    	{ finish }
    ##	Non-traditional terminator (only valid Wednesdays)
    	{ finish (localtime)[6] == 3 }

It is also possible to reject a successful parameter match from within its associated action (and then continue trying other candidates), by using the reject operator. This allows actions to be used to perform more sophisticated tests on the value of a parameter variable, or to implement complicated parameter interdependencies. The reject operator takes an optional parameter. If the parameter is true (or is omitted) the current parameter match is immediately rejected. For example:
    -ar <R:n>	Set aspect ratio (must be in the range (0..1])
    	{
    	    $::sawaspect++;
    	    reject ( $R <= 0 or $R > 1 );
    	    setaspect($R);
    	}

Note that any actions performed before the call to reject will still have effect (for example, the variable $::sawaspect remains incremented even if the aspect ratio parameter is subsequently rejected).
The reject operator may also take a second parameter, which is used as an error message if the rejected argument subsequently fails to match any other parameter. For example:
    -q	Quiet option (not available on Wednesdays)
    	{
    	    reject ((localtime)[6]==3 => "Not today!");
    	    $::verbose = 0;
    	}
To support this, Getopt::Declare provides a local operator (defer) which delays the execution of a particular action until the command-line processing is finished. The defer operator takes a single block, the execution of which is deferred until the command-line is fully and successfully parsed (the block is converted to a closure, which is stored and executed only when parsing is finished). If command-line processing fails for some reason, deferred blocks are never executed.
For example:
    $args = new Getopt::Declare q{
        <files>...	Files to be processed
        	{ defer { foreach (@files) { proc($_); } } }
        -rev[erse]	Process in reverse order
        -rand[om]	Process in random order
    };

With the above specification, the -rev and/or -rand flags can be specified after the list of files, but still affect the processing of those files (assuming that proc() consults $args->{'-rev'} and $args->{'-rand'}).
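The deferral idea itself (queue closures while parsing, run them only after a successful parse) can be sketched in a few lines of plain Perl. The function `process_args` below is a hypothetical toy parser written for this illustration; it does not use Getopt::Declare, and its only flag, `-rev`, reverses each file name rather than the processing order.

```perl
use strict;
use warnings;

# Toy stand-in for deferred actions: file arguments queue a closure,
# and the queue runs only if every argument was recognised.
sub process_args {
    my @argv = @_;
    my (%opt, @deferred, @output);

    for my $arg (@argv) {
        if ($arg eq '-rev') {
            $opt{rev} = 1;
        }
        elsif ($arg =~ /^-/) {
            return;    # parse failed: the queued closures are never run
        }
        else {
            # The closure reads %opt when it runs, so it sees later flags
            push @deferred, sub {
                push @output, $opt{rev} ? scalar reverse($arg) : $arg;
            };
        }
    }

    $_->() for @deferred;    # parsing succeeded: run the queued actions
    return \@output;
}

my $out = process_args('abc', 'def', '-rev');
print "@$out\n";
```

The flag appears after the file names, yet it still affects them, because nothing is processed until the whole argument list has been accepted.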
:+i | which restricts a parameter variable to matching positive, non-zero integers.
:+n | which restricts a parameter variable to matching positive, non-zero numbers (integer or floating point).
:0+i | which restricts a parameter variable to matching non-negative integers.
:0+n | which restricts a parameter variable to matching non-negative numbers.
:id | which requires a parameter variable to match an identifier (that is, a sequence of characters matching /[A-Za-z_]\w*/).
:s | which allows a parameter variable to match any quote-delimited or whitespace-terminated string. Note that this is the default behaviour.
:if | which is used to match input file names, and requires that the matched argument be either - (indicating standard input) or the name of a readable file.
:of | which is used to match output file names. It is exactly like type :if except that it requires that the string be either - (indicating standard output) or the name of a file that is either writable or non-existent.
    -repeat <n:+i>	Repeat <n> times (must be > 0)
    -scale <f:0+n>	Set scaling factor (cannot be negative)
    -o <file:of>	Specify output file

Alternatively, parameter variables can be restricted to matching a specific regular expression, by providing the required pattern explicitly (in matched '/' delimiters after the colon). For example:
    -parity <p:/even|odd|both/>	Set parity
    -file <name:/\w*\.[A-Z]{3}/>	File name (with extension)
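To make the type constraints above concrete, here is a rough approximation of a few of them as plain Perl predicates. Only the `:id` pattern is taken from the text; the numeric patterns, the `%check` table and the `accepts` helper are assumptions made for this sketch and may differ from what the module does internally.

```perl
use strict;
use warnings;

# Rough, hand-written approximations of some type constraints.
# Assumption: only the :id pattern comes from the text; the numeric
# patterns are illustrative and may differ from the module's own.
my $unsigned_int = qr/^\+?\d+$/;
my $unsigned_num = qr/^\+?(?:\d+\.?\d*|\.\d+)$/;

my %check = (
    '+i'     => sub { $_[0] =~ $unsigned_int && $_[0] > 0 },
    '0+i'    => sub { $_[0] =~ $unsigned_int },
    '+n'     => sub { $_[0] =~ $unsigned_num && $_[0] > 0 },
    '0+n'    => sub { $_[0] =~ $unsigned_num },
    'id'     => sub { $_[0] =~ /^[A-Za-z_]\w*$/ },
    'parity' => sub { $_[0] =~ /^(?:even|odd|both)$/ },    # like <p:/even|odd|both/>
);

# Returns 1 if $value satisfies the named constraint, 0 otherwise
sub accepts {
    my ($type, $value) = @_;
    return $check{$type}->($value) ? 1 : 0;
}

for my $case (['+i', '0'], ['0+i', '0'], ['id', 'max_len'], ['parity', 'even']) {
    my ($type, $value) = @$case;
    printf "%-6s %-8s %s\n", $type, $value,
        accepts($type, $value) ? 'ok' : 'rejected';
}
```

The difference between `:+i` and `:0+i` comes down to whether zero is admitted, which is the usual reason to pick one over the other for counts and offsets.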
To declare a new parameter variable type, the [type:...] directive is used. A [type:...] directive specifies the name, matching pattern, and action for the new parameter variable type (though both the pattern and action are optional).
The name string may be any whitespace-terminated sequence of characters which does not include a ">". The name may also be specified within a pair of quotation marks (single or double) or within any Perl quotelike operation. The pattern is used in initial matching of the parameter variable. Patterns are normally specified as a '/'-delimited Perl regular expression:
    [type: num     /\d+/         ]    # <v:num> matches digits
    [type: q{nbr}  /\d+(\.\d*)/  ]    # <v:nbr> matches decimals
    [type: "a num" /[+-]?\d+/    ]    # <v:a num> matches signed digits

Alternatively the pattern associated with a new type may be specified as a ":" followed by the name of another parameter variable type. In this case the new type matches the same pattern (and action! - see below) as the named type. For example:
    [type: posnum :+i ]    # <v:posnum> is the same as <v:+i>

As a third alternative, the pattern may be omitted altogether, in which case the new type matches whatever the inbuilt pattern :s matches.
The optional action which may be included in any [type:...] directive is executed after the corresponding parameter variable matches the command line but before any actions belonging to the enclosing parameter are executed. Typically, such type actions will call the reject operator (see "Termination and rejection") to test extra conditions, but any valid Perl code is acceptable. For example:
    [type: num     /\d+/      { reject {(localtime)[6]==3}} ]
    [type: 'a num' :n         { print "a num!" } ]
    [type: q{nbr}  :'a num'   { reject {$::no_nbr} } ]

If a new type is defined in terms of another (for example, :a num and :nbr above), any action specified by that new type is prepended to the action of that other type. Hence:
As a special case, if a parameter consists of a single parameter variable (optionally preceded by a flag), then the value for the corresponding hash key is not a hash reference, but the actual value matched.
For example, given the following specification:
    $args = new Getopt::Declare q{
        -v <value> [exact]	Specify search value
        <infile>	Input file
        -o <outfiles>...	Output files
    };

the object $args would have the following members (assuming that all parameters were matched):
$args->{'-v'}{'<value>'} | The argument matched by the <value> parameter variable of the -v parameter.
$args->{'-v'}{'exact'} | The argument (if any) matched by the optional [exact] punctuator of the -v parameter.
$args->{'<infile>'} | The argument matched by the <infile> parameter.
$args->{'-o'} | A reference to an array of the arguments matched by the <outfiles> list parameter variable of the -o parameter.
    $args = new Getopt::Declare q{
        ar = <R:n>	Set aspect ratio (will be clipped to [0..1])
        	{ $R = 0 if $R < 0; $R = 1 if $R > 1; }
    };

then the value of $args->{'ar'}{'<R>'} will always be between zero and one.
Getopt::Declare::parse() takes an optional parameter which specifies the source of the text to be parsed (it parses @ARGV if the parameter is omitted). This parameter takes the same set of values as the optional second parameter of Getopt::Declare::new().
Getopt::Declare::parse() returns true if the source is located and parsed successfully. It returns a defined false (zero) if the source is not located. An undef is returned if the source is located, but not successfully parsed.
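Since parse() has three distinct outcomes, code that needs to tell them apart has to test definedness as well as truth. The helper `describe_parse_result` below is hypothetical, and the values fed to it are stand-ins for real return values rather than actual calls to parse().

```perl
use strict;
use warnings;

# Hypothetical helper: maps the three documented outcomes of parse()
# to a label. The inputs below are stand-ins, not real parse() calls.
sub describe_parse_result {
    my ($result) = @_;
    return 'source located, but not successfully parsed' unless defined $result;
    return 'source not located'                          unless $result;
    return 'source located and parsed';
}

print describe_parse_result($_), "\n" for 1, 0, undef;
```

A plain `or` chain treats both kinds of failure alike, so it also moves on to the next source when a file was found but could not be parsed; testing `defined` first is one way to stop at that point instead.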
Thus, the following code first constructs parsers for a series of alternate configuration files and for the command line, and then parses them:
    # BUILD PARSERS
    my $config = new Getopt::Declare ($config_grammar, [-BUILD]);
    my $args   = new Getopt::Declare ($cmdline_grammar, [-BUILD]);

    # TRY STANDARD CONFIG FILES
    $config->parse([-CONFIG])
    # OTHERWISE, TRY GLOBAL CONFIG
        or $config->parse(['/usr/local/config/.demo_rc'])
    # OTHERWISE, TRY OPENING A FILEHANDLE (OR JUST GIVE UP)
        or $config->parse(new FileHandle (".config"));

    # NOW PARSE THE COMMAND LINE
    $args->parse() or die;
Apart from allowing for "secret" parameters (a dubious benefit), this feature enables the programmer to specify some (undocumented) action which is to be taken on encountering an otherwise unknown argument. For example:
    <unknown>	[undocumented] last resort
    	{ handle_unknown($unknown); }
    -v	Verbose mode
    --verbose	[ditto] (long form)

Furthermore, if the "dittoed" parameter has no action(s) specified, the actions of the preceding parameter are reused. For example, the specification:
    -v	Verbose mode
    	{ $::verbose = 1; }
    --verbose	[ditto]

would result in the --verbose option setting $::verbose just like the -v option. On the other hand, the specification:
    -v	Verbose mode
    	{ $::verbose = 1; }
    --verbose	[ditto]
    	{ $::verbose = 2; }

would give separate actions to each flag.
Getopt::Declare allows flag clustering at any point where the remainder of the command-line being processed starts with a non-whitespace character and where the remaining substring would not otherwise immediately match a parameter flag. This means that multiple-character flags can be clustered, as can flags with parameter variables and punctuators.
If the idea of such unconstrained flag clustering is too libertarian
for a particular application, the feature may be restricted (or removed
entirely), by including a [cluster:<option>]
directive anywhere in the specification string. The clustering options
are:
any | The [cluster:any] directive allows any suitable flags to be clustered (that is, it simply makes explicit the default behaviour).
flags | The [cluster:flags] directive restricts clustering to parameters which are "pure flags" (that is, those which have no parameter variables or punctuators - not even optional ones).
singles | The [cluster:singles] directive restricts clustering to parameters which are "pure flags", and which consist of a flag prefix followed by a single alphanumeric character.
none | The [cluster:none] directive turns off clustering completely.
However, if a new Getopt::Declare object is created with a specification string containing the [strict] directive (at any point in the specification):
    $args = new Getopt::Declare <<'EOSPEC';
        [strict]
        -a	Append mode
        -b	Back-up mode
        -c	Copy mode
    EOSPEC

then the command-line is parsed "strictly". In this case, any unrecognized command-line argument (such as "-q") will cause an error message to be written to STDERR, and command-line processing to fail (after the entire command-line has been parsed). On such a failure, the call to Getopt::Declare::new() returns undef instead of the usual hash reference.
The only concession that "strict" mode makes to the unknown is that, if command-line processing is prematurely terminated via the finish operator, any command-line arguments which have not yet been examined are left in @ARGV and do not cause the parse to fail (of course, if any unknown arguments are encountered before the finish was executed, those earlier arguments will cause command-line processing to fail).
Each directive specifies a particular set of conditions that a command-line must fulfil. If any such condition is violated, an appropriate error message is printed. Furthermore, once the command-line is completely parsed, if any condition was violated, the call to Getopt::Declare::new() dies.
The directives are:
However, it is often useful to allow a particular parameter to match more than once. Any parameter whose description includes the directive [repeatable] is never excluded as a potential argument match, no matter how many times it has matched previously:
    -nice	Increment nice value [repeatable]
    	{ $::nice++; }

If the [repeatable] directive appears outside the description of any parameter (usually at the start of a specification), then all parameters are marked as repeatable.
    -case	set to all lower case
    -CASE	SET TO ALL UPPER CASE
    [mutex: -case -CASE]

The interaction of the [mutex:...] and [required] directives is potentially awkward in the case where two "required" arguments are also mutually exclusive (since the [required] directives insist that both parameters must appear in the command-line, whilst the [mutex:...] directive expressly forbids this).
Getopt::Declare resolves such contradictory constraints by relaxing the meaning of "required" slightly, so that an argument that matches any flag in a [mutex:...] set is implicitly considered to have matched all of that flag's mutually exclusive alternatives as well. Hence the specifications:
    -case	set to all lower case [required]
    -CASE	SET TO ALL UPPER CASE [required]
    [mutex: -case -CASE]

mean that exactly one of these two flags must appear on the command-line, but that the presence of either of them will suffice to satisfy the "requiredness" of both.
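The relaxed rule can be written down as a small constraint checker. This is a sketch of the rule as described, not code from the module; `check_constraints` and its argument layout are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical checker for the relaxed "required" rule.
#   $required: array ref of required flags
#   $mutex:    array ref of mutex sets (each an array ref of flags)
#   $seen:     array ref of flags found on the command line
# Returns a list of error messages (empty if all constraints hold).
sub check_constraints {
    my ($required, $mutex, $seen) = @_;
    my %seen      = map { $_ => 1 } @$seen;
    my %satisfied = %seen;
    my @errors;

    for my $set (@$mutex) {
        my @present = grep { $seen{$_} } @$set;
        push @errors, "mutually exclusive: @present" if @present > 1;
        # One member present counts as matching every alternative in the set
        @satisfied{@$set} = (1) x @$set if @present;
    }

    for my $flag (@$required) {
        push @errors, "missing required: $flag" unless $satisfied{$flag};
    }
    return @errors;
}

my @required = ('-case', '-CASE');
my @mutex    = (['-case', '-CASE']);

for my $seen ([], ['-case'], ['-case', '-CASE']) {
    my @errors = check_constraints(\@required, \@mutex, $seen);
    print "(@$seen): ", (@errors ? join('; ', @errors) : 'ok'), "\n";
}
```

Only the middle case passes: no flag at all violates both [required] directives, and both flags together violate the [mutex:...] directive.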
    -num	Use numeric sort order
    -len	Sort on length of line (or field)
    -field <N:+i>	Sort on value of field <N>
    -rev	Reverse sort order
    	[requires: -num || (-len && ! -field)]

means that the -rev flag is valid only if the -num parameter has matched, or if the -len parameter has been found but not the -field parameter. Note that the operators &&, || and ! retain their normal Perl precedences.
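Read as ordinary Perl, the condition in this [requires:...] directive is a boolean expression in which each flag stands for "this parameter has matched". The function `rev_allowed` below is a hypothetical hand translation of that one condition, written only to show how the precedence works out.

```perl
use strict;
use warnings;

# Hand translation of [requires: -num || (-len && ! -field)].
# Takes the list of flags that matched; returns 1 if -rev is permitted.
sub rev_allowed {
    my %matched = map { $_ => 1 } @_;
    return ( $matched{'-num'}
          || ( $matched{'-len'} && !$matched{'-field'} ) ) ? 1 : 0;
}

for my $flags (['-num'], ['-len'], ['-len', '-field'], ['-num', '-len', '-field'], []) {
    printf "%-22s %s\n", "(@$flags)",
        rev_allowed(@$flags) ? '-rev allowed' : '-rev not allowed';
}
```

Because `!` binds more tightly than `&&`, which in turn binds more tightly than `||`, the parentheses in the directive are there for readability and do not change the result.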
    $args = new Getopt::Declare (-PERL);

declares a command-line parser which is functionally equivalent to the one installed by the perl -s option (except, of course, the Getopt::Declare version also provides automated usage and version information, and allows flags to appear anywhere on the command-line). Likewise:
    $args = new Getopt::Declare (-AWK);

allows the program to use awk-like arguments (of the form: "var=val") to create run-time variables (of the form: $::var = 'val').
It is also possible to specify both predefined grammars together, by concatenating the keywords:
$args = new Getopt::Declare ('-AWK-PERL');
In addition to this information, Getopt::Declare displays three sample command-lines: one indicating the normal usage (including any required parameter variables), one indicating how to invoke help, and one indicating how to determine the current version of the program.
Note however that, if a parameter with any of these flags is explicitly specified in the string passed to Getopt::Declare::new(), that flag (only) is removed from the list of possible help flags. For example:
    -h <pixels:+i>	Specify height in pixels

would cause the -h help parameter to be removed (although help would still be accessible by specifying any of the arguments "-H", "-help", "-Help", "--HELP", etc).
    my $csv = new Text::CSV;
    open CSV_FILE, $datafile or die;
    while (defined(my $line = <CSV_FILE>)) {
        if ($csv->parse($line)) {
            my ($ID, $name, $score) = $csv->fields();
            process_marks($ID, $name, $score) && next
                if $ID =~ /^[A-Z]\d{7}$/ && $score eq 0+$score;
        }
        print STDERR "Invalid data: $line\n";
    }

Getopt::Declare can mimic this behaviour, somewhat more compactly:
    my $format = q{
        [repeatable]
        <ID:/[A-Z]\d{7}/> , <name:qs> , <score:n>	VALID FORMAT
        	{ process_marks($ID, $name, $score); }
        <line:/.*/>	ELSE ERROR
        	{ print STDERR "Invalid data: $line\n"; }
    };
    new Getopt::Declare ($format, [$datafile]) or die;

More importantly, Getopt::Declare makes it simple to handle variant formats of comma-separated values in the same input stream:
    my $format = q{
        [repeatable]
        <ID:/[A-Z]\d{7}/> , <name:qs> , <score:n>	FORMAT 1
        	{ process_marks($ID, $name, $score); }
        <name:qs> , <ID:/[A-Z]\d{7}/> , <score:n>	FORMAT 2
        	{ process_marks($ID, $name, $score); }
        <ID:/[A-Z]\d{7}/> , <score:n>	FORMAT 3
        	{ process_marks($ID, '???', $score); }
        <line:/.*/>	ELSE ERROR
        	{ print STDERR "Invalid data: $line\n"; }
    };
    new Getopt::Declare ($format, [$datafile]) or die;
    my $interpolator = new Getopt::Declare (<<'EOINTERP',[-BUILD]);
        [repeatable]
        [cluster:none]
        [type: NOTDELIM /(?:(?!}}).)+/ ]
        [type: WS /\s*/ ]
        \{{ <cmd:NOTDELIM> }}[<ows:WS>]	
        	{ $self->{result} .= (eval "no strict; $cmd") || "";
        	  $self->{result} .= $ows if $ows; }
        <othertext>[<ows:WS>]	
        	{ $self->{result} .= $othertext;
        	  $self->{result} .= $ows if $ows; }
    EOINTERP
    sub interpolate($) {
        $interpolator->{result} = '';
        $interpolator->parse($_[0]);
        return $interpolator->{result};
    }
    print interpolate 'Average mark: {{ sum(@marks[1..$n])/$n }}';
    print interpolate 'Expected cost: ${{ commify($cost) }}';
    my $commands = q{
        [type: ID /[A-Z]\d{7}/ ]
        [repeatable]
        f[ind] <id:ID>	Find by student ID
        	{ $marks->find($id)->print() }
        f[ind] <name:/.*/>	Find by student name
        	{ $marks->find_name($name)->print() }
        d[elete] <id:ID>	Delete record
        	{ $marks->find($id)->del() }
        m[ark] <id:ID> <score:0+n>	Update mark for student
        	{ $marks->find($id)->set($score) }
        h[elp]	
        	{ $self->usage(); }
    };
    new Getopt::Declare ($commands, [-STDIN]);

Prompting is also easy to incorporate, by using a subroutine reference as a source:
    my $prompt = sub { print "> " if -t; return <> };
    new Getopt::Declare ($commands, $prompt);
Moreover, the approach is easily generalized to provide declarative solutions to a range of similar parsing tasks, where the full power of a recursive parser is not required.
Getopt::Declare is freely available from the author at: