


Obsolete Settings from Legacy JavaCC (and JJTree)

As a result of quite a bit of forward evolution, some of the settings from legacy JavaCC (and JJTree) are obsolete in JavaCC21. We don't anticipate that any of them will be missed.

  • STATIC: JavaCC21 does not support static parsers. In other words, this is always effectively set to false (and thus ignored).
  • LOOKAHEAD: Legacy JavaCC allowed you to specify a default numerical lookahead other than 1 token. In JavaCC21, this setting is gone and the default is always effectively 1. Of course, you can still specify a numerical lookahead other than 1 at individual choice points as needed.
  • CHOICE_AMBIGUITY_CHECK: This was a parameter in legacy JavaCC that allowed you to specify how far to scan ahead when checking for “ambiguities” in the grammar. For now, the whole concept has been removed from JavaCC21. Most of the conditions reported as “choice ambiguities” were not really ambiguities in the grammar anyway, since the logic of JavaCC is that if more than one choice matches, the first one wins. At some point, we may put in some code to check for unreachable code (at least the simple cases that can be statically proven), but it is not a high priority since the whole thing is of very marginal value.
  • OTHER_AMBIGUITY_CHECK: Basically the same comments apply here as to CHOICE_AMBIGUITY_CHECK. The code for these so-called “ambiguity checks” has been ripped out. In real-world practice, nobody was using these settings anyway.
  • FORCE_LA_CHECK: Frankly, we are unsure what this setting ever did. At least in this case, ignorance is bliss, so the setting is gone. Besides, lookahead was always fundamentally broken in legacy JavaCC, so all of these sophisticated checks were surely for nothing!
  • UNICODE_INPUT: Effectively, this is now always set to true, so it is superfluous (and thus ignored).
  • USER_CHAR_STREAM: This was a setting that allowed you to define your own implementation of the CharStream interface. In default usage of JavaCC21, this whole concept is irrelevant, since by default the generated parser just slurps the whole file into memory at once anyway. See The Gigabyte is the new Megabyte.
  • BUILD_LEXER: It is rather hard to fathom what the point of this setting ever was. Presumably, the case where you don't build a lexer is the one in which you define your own XXXLexer implementation. However, the USER_DEFINED_LEXER setting (previously called USER_TOKEN_MANAGER) always existed, so it is not clear why this setting was ever needed.
  • BUILD_PARSER: Another bizarrely pointless setting really. If all you want to do is build a lexer, and not a parser, then just don't define any grammatical productions in your grammar and all we build is a lexer!
  • KEEP_LINE_COL: JavaCC21 always puts location information in Tokens and Node objects. (Really, why would you ever want to throw away location info?) For more thoughts on this issue, see The Gigabyte is the new Megabyte.
  • ERROR_REPORTING: This was an option that was true by default, but you could turn it off in order to generate a somewhat smaller .class file, except that error messages would be much less informative because of information being thrown away. I did some experimenting and found that the generated XXXParser.class was typically about 10% smaller with ERROR_REPORTING off. The tradeoff looks terrible and, as with KEEP_LINE_COL, it looks utterly foolish to ever turn this off. So, the setting is now gone and the option is always effectively on. (Further note. All the legacy error reporting code is practically rewritten anyway. The prior comment applies in any case. There is no reason for any sane person to want to turn it off.)
  • SANITY_CHECK: By default, the parser generator performs various sanity checks before generating its output files. This setting in the legacy JavaCC tool allowed you to turn them off. (Why would anybody turn this off?) This setting is gone and is now effectively always true.
  • CACHE_TOKENS: I never even understood what the point of this setting was. It must have been some kind of speculative peephole optimization, except I don't think it was even correct. There would be problems with switches of lexical state in some cases. Also, I doubt it offered any noticeable performance gain. The setting is now gone and is always effectively false. (Which was the default before, which everybody was using anyway.)
  • TRACK_TOKENS: There is no real reason for this setting to exist any more, since, by default, Tokens are added to the AST and they have their line/column information. In fact, all Node objects have line/column information.
  • COMMON_TOKEN_ACTION: This feature is still supported but the configuration setting is no longer necessary, since JavaCC21 deduces it from the presence (or absence) of the appropriately named method in your generated lexer class. If you have a method with the signature void CommonTokenAction(Token t), it will be called at the appropriate point. However, you would be better off using the newer alternative, which has the signature Token tokenHook(Token t). It is more flexible since it allows you to instantiate a new Token object (of whatever subclass) and return it. In any case, there is no need for the configuration setting, since the method is used if present and if not, not. (Duh!)
  • NODE_SCOPE_HOOK: As with COMMON_TOKEN_ACTION, the feature is still supported but the configuration option is no longer necessary, since JavaCC21 deduces it from the presence or absence of the appropriately named method or methods in your generated parser class. See Node Life Cycle Hooks for more information.
  • NODE_EXTENDS: Since JavaCC21 has INJECT, there is no need for this configuration option to exist. If you want to specify that your BaseNode class extends some specific class, simply use code injection to specify this. Something like:
   INJECT(BaseNode) : {extends SomeClass}

In general, code injection can be used to specify that any generated class should extend a given class or implement whatever interface(s). There is no need for a plethora of configuration settings for this.
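The difference between the two hook signatures described under COMMON_TOKEN_ACTION can be illustrated in plain Java. This is only a sketch under stated assumptions: the nested Token and KeywordToken classes are hypothetical stand-ins, not the classes that JavaCC21 generates, and the hooks are called by hand here rather than by a generated lexer.

```java
// Illustration only: Token and KeywordToken are hypothetical stand-ins,
// NOT the classes generated by JavaCC21.
class HookDemo {
    static class Token {
        final String image;
        Token(String image) { this.image = image; }
    }

    // A hypothetical Token subclass that a hook might want to substitute.
    static class KeywordToken extends Token {
        KeywordToken(String image) { super(image); }
    }

    static int tokensSeen = 0;

    // Legacy-style hook: returns nothing, so it can only inspect
    // (or mutate) the token it is handed.
    static void CommonTokenAction(Token t) {
        tokensSeen++;
    }

    // Newer-style hook: returns a Token, so it may hand back a
    // different object, for example an instance of a subclass.
    static Token tokenHook(Token t) {
        if (t.image.equals("if")) {
            return new KeywordToken(t.image);
        }
        return t;
    }

    public static void main(String[] args) {
        CommonTokenAction(new Token("if"));
        System.out.println(tokensSeen); // prints 1
        System.out.println(tokenHook(new Token("if")).getClass().getSimpleName()); // prints KeywordToken
        System.out.println(tokenHook(new Token("x")).getClass().getSimpleName());  // prints Token
    }
}
```

The point of the sketch is only the return type: a void hook has no way to replace the token, whereas a hook that returns a Token can.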

The following configuration options are still supported but are deprecated in JavaCC21:

  • OUTPUT_DIRECTORY: This is deprecated in favor of the new BASE_SRC_DIR option. See convention over configuration for more information on the preferred way to specify your directory layout when using JavaCC21.
  • NODE_PREFIX: Use of this is not encouraged in JavaCC21. By default, it is simply the empty string. (In legacy JavaCC, or JJTree to be precise, it was “AST” by default.) I guess that prefixing all the Node classes with “AST” is a (crude) way of defining a namespace. However, one would think these people noticed that Java has this thing called packages.

The following option has been renamed for consistency, but the older name is still supported:

  • USER_TOKEN_MANAGER is now USER_DEFINED_LEXER.

The use of both PARSER_BEGIN...PARSER_END and TOKEN_MGR_DECLS is deprecated in favor of the new code injection feature. Injecting code into the generated parser and lexer is simply a specific case of code injection, so there is no need for these separate constructs. However, they will continue to work for the foreseeable future.

To specify the parser and lexer class names, you may use the PARSER_CLASS and LEXER_CLASS configuration options. However, it is not mandatory, since a Foo.javacc file will automatically generate a parser class called FooParser and a lexer class called FooLexer. There will rarely be any practical value in overriding that.
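The naming convention just described amounts to taking the grammar file's base name and appending a suffix. A minimal sketch in Java follows. The DefaultNames class and its methods are hypothetical helpers written for illustration, not part of JavaCC21, and how the tool treats unusual file names is not covered here.

```java
// Illustration only: DefaultNames is a hypothetical helper, not JavaCC21 code.
class DefaultNames {
    // Strips the directory part and the extension: "grammars/Foo.javacc" -> "Foo".
    static String baseName(String grammarFile) {
        String name = grammarFile.substring(grammarFile.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    // Default parser class name, as described in the text: Foo.javacc -> FooParser.
    static String parserClass(String grammarFile) {
        return baseName(grammarFile) + "Parser";
    }

    // Default lexer class name, as described in the text: Foo.javacc -> FooLexer.
    static String lexerClass(String grammarFile) {
        return baseName(grammarFile) + "Lexer";
    }

    public static void main(String[] args) {
        System.out.println(parserClass("Foo.javacc")); // prints FooParser
        System.out.println(lexerClass("Foo.javacc"));  // prints FooLexer
    }
}
```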

A host of settings were added to legacy JavaCC after the FreeCC fork, which was in mid-2008. See ancient history for more information on all this. None of the settings added to legacy JavaCC after about 2008 are currently in JavaCC21. Most of them are of very marginal value. Moreover, it is safe to say that nobody uses them, because they are not documented anywhere that I can find! Just for example, the GRAMMAR_ENCODING option was added at some point after 2008 (I don't know exactly when) to specify what encoding your grammar file is in. I am certain that nobody uses this. (Or just about nobody, surely.) Everybody stores their grammar files in the system default encoding, which is UTF-8 on any remotely modern system that any serious developer would be working on. Adding these kinds of options that nobody uses is actually very typical of a nothingburger project. (Adding all these options and not even documenting them is nothingburger-ism squared!)

See new settings in JavaCC 21 for information on settings introduced in JavaCC21 that were not present in legacy JavaCC.