Differences

This shows you the differences between two versions of the page.

new_settings_in_javacc_21 [2021/02/09 11:16] revusky
new_settings_in_javacc_21 [2022/02/15 05:28] (current) chamberlain
Line 1:
 Here is a list of settings that exist in JavaCC 21 but are not (and never were) present in the legacy JavaCC tool:
  
 +  * **BASE_NAME** This option is used to set the name of the parser, lexer, constants, and NFA data files instead of using the default naming convention that prefixes these files with the name of the grammar file. If this option is set to an empty string (BASE_NAME="";) then these filenames will not receive a prefix. See the PARSER_CLASS, LEXER_CLASS, and CONSTANTS_CLASS settings to individually specify the names for these classes.
 +  * **BASE_NODE_CLASS** When defined, this option lets you change the name of the base node class from BaseNode to a name that you prefer, such as SimpleNode. This class implements the Node interface and is extended by the production classes generated by JavaCC.
   * **BASE_SRC_DIR** This supersedes the older OUTPUT_DIRECTORY setting. Files are generated //relative// to the BASE_SRC_DIR, i.e. taking into account the package naming. If this is unset, BASE_SRC_DIR is assumed to be the directory where the grammar is.
-  * **ENSURE_FINAL_EOL** With this setting turned on (it is off by default) the generated parser makes sure that the input ends with a newline character. Some grammars are actually quite hard to write if you can't be sure that every line (including the last one) terminates with a newline!
 +  * **CONSTANTS_CLASS** This option is used to set the name of the generated constants file instead of using the default file name based on the grammar filename.
 +  * **DEACTIVATE_TOKENS** This setting allows you to indicate that certain token types are de-activated by default when you instantiate the parser. Something like: ''DEACTIVATE_TOKENS=OPEN_PAREN,CLOSE_PAREN;'' would mean that those two tokens are not //active// by default when you instantiate the parser. Of course, you could use ''ACTIVE_TOKENS(....)'' to activate them when needed in the parse.
 +  * **DEFAULT_LEXICAL_STATE** This option allows you to specify the default lexical state for a grammar instead of relying on the legacy value of DEFAULT. Setting a meaningful DEFAULT_LEXICAL_STATE is extremely desirable for grammars that may be INCLUDEd by other grammars because it will prevent accidental duplication of lexical productions.
 +  * **ENSURE_FINAL_EOL** With this setting turned on (it is off by default) the generated parser ensures that the input file ends with a newline character. (It tacks one on if it is not present.) This is a nitpicking detail but it is surprisingly difficult to write certain grammars (ones that are very line-oriented) if you cannot be sure that every line (including the last one!) ends with a newline.
 +  * **EXTRA_TOKENS** This setting allows you to indicate some additional token types that are not defined with regular expressions in the lexical grammar. This can be useful particularly in token hook routines.
   * **FAULT_TOLERANT** This turns on the experimental support for building a [[fault tolerant]] parser. It is off by default.
-  * **HUGE_FILE_SUPPORT** Since we believe that the normal usage of the tool is simply to build a tree, it makes little sense to have any qualms about reading in the entire input into memory. So this is the default. This option allows you to turn on the legacy behavior of only maintaining a (fairly small) buffer in memory of the input file. See [[https://javacc.com/2020/05/05/gigabyte-is-the-new-megabyte/|The Gigabyte is the new Megabyte]] for more information on the reasoning behind all this. Note that the experimental fault-tolerant parsing features only work with HUGE_FILE_SUPPORT off. Also, having TREE_BUILDING_ENABLED set to true (which is the default) means that HUGE_FILE_SUPPORT is automatically turned off.
-  * **LEGACY_API** If you turn on this setting, the tool generates code that is more compatible with legacy JavaCC. One example is that JavaCC 21 removes publicly visible fields like Token.kind and Token.image and replaces them with getter/setter methods. If you have LEGACY_API set, it leaves these fields as publicly visible. Also, it generates static final int constants for your various token types, as a convenience to keep older code working. (JavaCC 21 uses type-safe enums in these cases.) Note, however, that the existence of this setting is not guaranteed to keep all older code working. It simply makes it less work to get legacy JavaCC code working with JavaCC 21. Projects that migrate to JavaCC 21 should, as soon as they reasonably can, refactor their code so that the LEGACY_API setting can be turned off. In other words, it is just meant to provide a temporary stopgap, not any sort of permanent solution for people migrating their projects.
-  * **PRESERVE_LINE_ENDINGS** This is true by default (though this could change in the future based on user feedback). If you turn this setting off, all Windows/DOS style line endings (\r\n) are converted to UNIX/MacOS style (\n) internally when the file is read in. Note, by the way, that one advantage of this and the TABS_TO_SPACES option is that if you convert tabs to spaces and line endings to \n then your grammar's lexical specification can be a bit simpler. The same goes for your own code that runs over Tokens and Nodes. Your code can just assume that any line endings are a simple \n and your horizontal whitespace is just spaces, not a mix of tabs and spaces, independently of what platform the generated code is running on.
 +  * **FREEMARKER_NODES** Defining this option inserts extra code into Node objects that implement FreeMarker (template) APIs. This allows you to use FreeMarker templates to walk the node tree using a more natural syntax.
 +  * **LEXER_CLASS** This option sets the name of the generated lexer file instead of using the default file name based on the grammar filename. This option also sets the name of the NFA data file to use this same prefix. For example, ''LEXER_CLASS="FooLexer";'' generates FooLexer.java and FooNfaData.java.
 +  * **MINIMAL_TOKEN** Default is not set (false). Token chaining is supported by adding two fields to Tokens. If token chaining is //not// required, define this option to skip adding these two additional fields.
 +  * **PARSER_CLASS** This option is used to set the name of the generated parser file instead of using the default file name based on the grammar filename. 
 +  * **PARSER_PACKAGE** By default, all classes are generated with no package (default package). When this option is defined, all generated classes will have the specified package inserted at the top of the files. This option improves the organization of source code in large non-trivial projects.
 +  * **PRESERVE_LINE_ENDINGS** This is now off by default. That means that all Windows/DOS style line endings (\r\n) are converted to UNIX/MacOS style (\n) internally when the file is read in. Note, by the way, that one advantage of this and the TABS_TO_SPACES option is that if you convert tabs to spaces and line endings to \n then your grammar's lexical specification can be a bit simpler. The same goes for your own code that runs over Tokens and Nodes. Your code can just assume that any line endings are a simple \n and your horizontal whitespace is just spaces, not a mix of tabs and spaces, independently of what platform the generated code is running on.
   * **SMART_NODE_CREATION** This is the default behavior, so you would have to explicitly turn it off. It means that if no JJTree-style tree-building annotation is used, then a new Node will be created if there is more than one Node on the stack. So, a production like '' A (B)* '' will create a new Node if there are one or more B's after the A. If there is only an ''A'' then the production will just leave it on the stack. It is our belief that this is the behavior that most people would want most of the time.
   * **SPECIAL_TOKENS_ARE_NODES** This sets whether to add so-called "special tokens" to the AST. By default, it is set to false. (Note that this option and TOKENS_ARE_NODES are meaningless if TREE_BUILDING_ENABLED is set to false.)
-  * **TABS_TO_SPACES** This is an integer (typically from 1 to 8, in practice) that defines how many spaces a tab stop is. This is off by default, but if you use this setting, all TAB characters (\t) are converted to spaces when the file is read in. Note that, if you do not have this turned on, all reported error locations simply treat a tab character as one horizontal offset. If you want JavaCC to report errors as if a TAB stop is 4 spaces, say, you need to set TABS_TO_SPACES=4 in your settings.
 +  * **TABS_TO_SPACES** This is an integer (typically from 1 to 8, in practice) that defines how many spaces a tab stop is. This is now set to 8 by default. (Until very recently, the default was that the option was off.) This means that all TAB characters (\t) are converted to spaces when the file is read in. Note that, if you turn this off, all reported error locations simply treat a tab character as one horizontal offset. If you want JavaCC to report errors as if a TAB stop is 4 spaces, say, you need to set TABS_TO_SPACES=4 in your settings.
   * **TOKENS_ARE_NODES** This sets whether we add Tokens as terminal nodes to the AST. By default, it is true.
 +  * **TREE_BUILDING_DEFAULT** Default is true. A parser generated by JavaCC 21 automatically includes the code to build an AST (Abstract Syntax Tree). When set to false, the code is still available in the generated code but the AST is not built until you turn it on in the code.
 +  * **TREE_BUILDING_ENABLED** Default is true. A parser generated by JavaCC 21 automatically builds a tree. If you don't want a tree or want to build a tree in your own code actions, set TREE_BUILDING_ENABLED=false; and no tree building code is inserted into your parser.
 +  * **USE_PREPROCESSOR** When defined, JavaCC 21 will process preprocessor statements. The statements conform to the behavior of Microsoft's C# #define/#undef and #if/#elif/#else/#endif constructs that are used to conditionally turn on and off ranges of lines in the grammar file.
  
 See [[deprecated settings]] for a list of legacy JavaCC options that no longer exist in JavaCC 21.
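
As a rough illustration of how several of the settings listed above might be combined, here is a minimal sketch of a settings block at the top of a grammar file. It follows the ''NAME=value;'' form used in the inline examples above. The names ''Foo'', ''org.example.foo'' and ''FOO_CODE'' are hypothetical, and whether a given value should be quoted or written bare is an assumption here, so check it against your JavaCC 21 version.

```
// Hypothetical settings for a grammar file named Foo.javacc.
// All concrete values below are illustrative assumptions.

// Prefix used for the generated parser, lexer, constants and NFA data files.
BASE_NAME="Foo";

// Package inserted at the top of every generated class (assumed package name).
PARSER_PACKAGE="org.example.foo";

// A meaningful default lexical state, useful if this grammar is INCLUDEd elsewhere.
DEFAULT_LEXICAL_STATE=FOO_CODE;

// These two token types start out inactive when the parser is instantiated.
DEACTIVATE_TOKENS=OPEN_PAREN,CLOSE_PAREN;

// Guarantee that the input ends with a newline (off by default).
ENSURE_FINAL_EOL=true;

// Treat a tab stop as 4 spaces instead of the current default of 8.
TABS_TO_SPACES=4;

// Tree building is on by default; set to false to generate no tree-building code.
TREE_BUILDING_ENABLED=true;
```

With ''BASE_NAME'' set this way, the individual ''PARSER_CLASS'', ''LEXER_CLASS'' and ''CONSTANTS_CLASS'' settings are only needed if one of the generated classes should deviate from the common prefix.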