Tree Building Enhancements in JavaCC 21

In JavaCC 21, there is no separate program analogous to legacy JavaCC's JJTree preprocessor. Building an AST is assumed to be what most people want from the tool, so parsers generated by JavaCC 21 build a tree *by default*. If you really don't want to build a tree (or you want to build one yourself in your own Java code actions), you can set TREE_BUILDING_ENABLED to false. (The default value is true.) Assuming that you leave TREE_BUILDING_ENABLED at its default of true, JavaCC 21 supports various tree-building options. In addition to most of the ones that were available in JJTree, the following settings are available:

  • TREE_BUILDING_DEFAULT
  • TOKENS_ARE_NODES
  • SPECIAL_TOKENS_ARE_NODES
  • SMART_NODE_CREATION
  • FREEMARKER_NODES

TREE_BUILDING_DEFAULT

A parser built by JavaCC 21 builds a tree by default. In other words, TREE_BUILDING_DEFAULT is true unless you explicitly turn it off. If you do turn it off, your code needs to explicitly turn on tree building at runtime. Note that this is different from disabling tree building altogether via TREE_BUILDING_ENABLED=false. If you set TREE_BUILDING_ENABLED to false, no tree-building code is inserted in your parser. If you set TREE_BUILDING_DEFAULT to false, the tree-building code is present but must be toggled on, like so:

   MyParser parser = new MyParser(....); 
   parser.setBuildTree(true);

Of course, in the opposite case, where TREE_BUILDING_DEFAULT is on and you want to parse input without building a tree, you would call parser.setBuildTree(false) in the above snippet instead. This is also the situation if you do not specify TREE_BUILDING_DEFAULT at all, since tree building is on by default.

TOKENS_ARE_NODES

This option indicates that Tokens should be treated as (terminal) nodes in the abstract syntax tree. For this reason, the Token.java generated by JavaCC 21 contains extra methods that allow it to implement the Node interface. TOKENS_ARE_NODES is on by default. If you want the older JJTree behavior, you must explicitly set TOKENS_ARE_NODES=false in your options block.

SPECIAL_TOKENS_ARE_NODES

If you set this option along with TOKENS_ARE_NODES, then any “special” tokens are also treated as terminal nodes and are added to the tree as preceding siblings of the regular Token they are associated with. This is off by default. As a practical matter, SPECIAL_TOKENs are usually used for comments in the source code, so this typically amounts to deciding whether you want comments included in your AST.
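The sibling ordering described above can be pictured with a small, self-contained Java sketch. It is only a model under stated assumptions: the names SpecialTokenSketch, Node, Tok, precedingSpecials and children are hypothetical, and none of this is the code JavaCC 21 actually generates. It shows how a parent's child list differs when special tokens are, or are not, added as nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SpecialTokenSketch {
    /** Hypothetical minimal node type; NOT the Node interface generated by JavaCC 21. */
    interface Node {
        String label();
    }

    /** A token that doubles as a terminal node and remembers the special tokens before it. */
    static final class Tok implements Node {
        final String image;
        final List<Tok> precedingSpecials = new ArrayList<>();

        Tok(String image) {
            this.image = image;
        }

        @Override
        public String label() {
            return image;
        }
    }

    /** Lists the children a parent node would receive for the given regular tokens. */
    static List<String> children(List<Tok> regularTokens, boolean specialTokensAreNodes) {
        List<String> out = new ArrayList<>();
        for (Tok t : regularTokens) {
            if (specialTokensAreNodes) {
                // Special tokens become preceding siblings of their regular token.
                for (Tok s : t.precedingSpecials) {
                    out.add(s.label());
                }
            }
            out.add(t.label());
        }
        return out;
    }

    /** Models the input "/* note *&#47; x ;" where the comment is a special token. */
    static List<String> demo(boolean specialTokensAreNodes) {
        Tok x = new Tok("x");
        x.precedingSpecials.add(new Tok("/* note */"));
        Tok semi = new Tok(";");
        return children(Arrays.asList(x, semi), specialTokensAreNodes);
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // comment left out of the tree
        System.out.println(demo(true));  // comment precedes "x" as a sibling
    }
}
```

With the flag off, the model yields the children x and ; only. With it on, the comment appears immediately before x.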

SMART_NODE_CREATION

This option indicates that, by default, a production results in a new node only if there is more than one node on the stack. This is the default behavior, since it seems to be what most people would want out of the box. If you want the older default, in which every production creates a definite node, you must explicitly turn it off via SMART_NODE_CREATION=false in your options block.
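To make the difference concrete, here is a minimal Java model of a node stack. It is a sketch, not JavaCC 21's generated code: the class SmartNodeSketch and its mark, push and close methods are hypothetical, and it assumes that "more than one node on the stack" means more than one node pushed while the production was open (similar in spirit to a conditional node in JJTree).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class SmartNodeSketch {
    /** Hypothetical tree node used only for this illustration. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return children.isEmpty() ? name : name + children;
        }
    }

    private final Deque<Node> stack = new ArrayDeque<>();
    private final boolean smart;

    SmartNodeSketch(boolean smart) {
        this.smart = smart;
    }

    /** Records the stack depth when a production starts. */
    int mark() {
        return stack.size();
    }

    /** Pushes a terminal node (for example, a token). */
    void push(String name) {
        stack.push(new Node(name));
    }

    /** Closes a production that was opened at the given mark. */
    void close(String production, int mark) {
        int pushed = stack.size() - mark;
        if (smart && pushed <= 1) {
            return; // assumed "smart" rule: no wrapper node around zero or one child
        }
        Node parent = new Node(production);
        for (int i = 0; i < pushed; i++) {
            parent.children.add(0, stack.pop()); // restore left-to-right order
        }
        stack.push(parent);
    }

    /** Simulates parsing "1 + 2" with productions Sum -> Term "+" Term and Term -> number. */
    static String demo(boolean smart) {
        SmartNodeSketch b = new SmartNodeSketch(smart);
        int sum = b.mark();
        int term1 = b.mark();
        b.push("1");
        b.close("Term", term1);
        b.push("+");
        int term2 = b.mark();
        b.push("2");
        b.close("Term", term2);
        b.close("Sum", sum);
        return b.stack.peek().toString();
    }

    public static void main(String[] args) {
        System.out.println(demo(true));  // prints Sum[1, +, 2]
        System.out.println(demo(false)); // prints Sum[Term[1], +, Term[2]]
    }
}
```

In this model, the smart setting drops the single-child Term wrappers, while the definite setting keeps one node per production.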

FREEMARKER_NODES

This option means that extra code is added so that your Node objects implement core FreeMarker APIs, in particular freemarker.template.TemplateScalarModel and freemarker.template.TemplateNodeModel. This means that if you expose the tree you build to a FreeMarker template, you can walk the tree using a very natural syntax. Note, however, that using the FREEMARKER_NODES option creates a runtime dependency on freemarker.jar.

By default, JavaCC 21 will generate subclasses of Token that correspond to the various lexical elements in your grammar. Consider the following specification:

    TOKEN #Operator : 
    {
       <PLUS : "+">
       |
       <MINUS : "-">
       |
       <TIMES : "*">
       |
       <DIVIDE : "/"> 
    }

For the above, JavaCC 21 will define a Token subclass called Operator and the Tokens for “+”, “-”, “*”, and “/” will be instantiated as instances of Operator.

There is also the possibility of defining further subclasses. For example, in the following:

    TOKEN #Operator : 
    {
       <PLUS : "+"> #Plus
       |
       <MINUS : "-"> #Minus
       |
       <TIMES : "*"> #Times
       |
       <DIVIDE : "/"> #Divide
    }

Here, as before, the Operator class is defined as a subclass of Token, and the classes Plus, Minus, Times, and Divide are subclasses of Operator. In many cases, this arrangement lets you write code using much more natural object-oriented idioms than was possible with legacy JavaCC.
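As a rough illustration of the kind of object-oriented code this hierarchy makes possible, the following self-contained Java sketch hand-writes a comparable class structure. It is not the code JavaCC 21 generates: the stand-in Token class, the constructors, and the apply and evaluate methods are all assumptions made for the example, and the generated classes would not contain such methods unless you added them yourself.

```java
public class OperatorSketch {
    /** Stand-in for the generated Token class; only the image is modeled here. */
    static class Token {
        final String image;

        Token(String image) {
            this.image = image;
        }
    }

    /** Mirrors "TOKEN #Operator"; apply() is a hypothetical addition. */
    static abstract class Operator extends Token {
        Operator(String image) {
            super(image);
        }

        abstract int apply(int left, int right);
    }

    static final class Plus extends Operator {
        Plus() { super("+"); }
        @Override int apply(int left, int right) { return left + right; }
    }

    static final class Minus extends Operator {
        Minus() { super("-"); }
        @Override int apply(int left, int right) { return left - right; }
    }

    static final class Times extends Operator {
        Times() { super("*"); }
        @Override int apply(int left, int right) { return left * right; }
    }

    static final class Divide extends Operator {
        Divide() { super("/"); }
        @Override int apply(int left, int right) { return left / right; }
    }

    /** Dispatches on the token's class instead of switching on an integer token kind. */
    static int evaluate(int left, Token op, int right) {
        if (!(op instanceof Operator)) {
            throw new IllegalArgumentException("Not an operator: " + op.image);
        }
        return ((Operator) op).apply(left, right);
    }

    public static void main(String[] args) {
        System.out.println(evaluate(6, new Times(), 7));  // prints 42
        System.out.println(evaluate(9, new Minus(), 4));  // prints 5
    }
}
```

The design point is that the type test and the polymorphic call replace a switch over token kinds, so adding a new operator means adding a subclass rather than editing every switch.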