include [2020/02/09 19:25] revusky / include [2023/03/03 16:20] (current) revusky
# The INCLUDE Statement
  
CongoCC's **INCLUDE** statement allows you to break up your grammar file into multiple physical files. Typically, it looks like this:

    INCLUDE "IncludedGrammar.javacc"

*This feature is not present in legacy JavaCC.*

The motivation behind **INCLUDE** should be obvious. Because you can reuse a base grammar or a generally useful fragment in various files, you avoid the copy-paste-modify *antipattern* that would have been necessary with legacy JavaCC. More generally, being able to organize a large grammar into multiple physical files can be a big win in terms of maintainability.
  
Still, as they say, the devil is in the details, and there are various wrinkles that need to be covered here.
## The DEFAULT_LEXICAL_STATE setting
  
In legacy JavaCC, if you defined a token production without specifying a lexical state, its definitions belonged to a lexical state called "DEFAULT". This is obviously a problem once you use INCLUDE, because the including and the included grammar are both liable to have a "DEFAULT" lexical state and, typically, we don't want their respective definitions to clobber one another.
  
Thus, CongoCC has a setting called **DEFAULT_LEXICAL_STATE**. Any lexical specifications that do not name a lexical state are placed in that state. A JSON grammar, for example, would likely have something like this at the top:

    DEFAULT_LEXICAL_STATE=JSON;

In that case, any grammar for a language that wants to handle embedded JSON data would presumably define its own "default" lexical state and, when it wants to embed JSON data, would have to switch explicitly to the JSON lexical state, which is the *default* only in the included grammar.
  
At the moment, **DEFAULT_LEXICAL_STATE** is actually the only setting you can put in an **INCLUDE**d grammar that has any effect. All of the other options are simply ignored, since they are presumably set in the top-level *including* grammar. Because both grammars would otherwise share a lexical state called "DEFAULT", the solution is that the *default* lexical state (the one used when none is explicitly specified) should differ between the *including* grammar and the *included* one.
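
The clobbering problem can be made concrete with a toy Java model. This is only an illustrative sketch, not CongoCC's implementation: the class `LexicalStateModel`, the records `TokenDef` and `Grammar`, and the token names are all hypothetical. Each grammar carries its own **DEFAULT_LEXICAL_STATE**, and any token definition that names no state is filed under that default.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of lexical-state assignment; NOT CongoCC's implementation.
// LexicalStateModel, TokenDef, Grammar and the token names are hypothetical.
public class LexicalStateModel {

    // A token definition; a null state means "no lexical state specified".
    record TokenDef(String name, String state) {}

    // A grammar file together with its DEFAULT_LEXICAL_STATE setting.
    record Grammar(String defaultState, List<TokenDef> tokens) {}

    // Groups the token names of all grammars by lexical state. A definition
    // without an explicit state goes into its own grammar's default state.
    static Map<String, List<String>> assignStates(List<Grammar> grammars) {
        Map<String, List<String>> byState = new LinkedHashMap<>();
        for (Grammar g : grammars) {
            for (TokenDef t : g.tokens()) {
                String state = (t.state() != null) ? t.state() : g.defaultState();
                byState.computeIfAbsent(state, k -> new ArrayList<>()).add(t.name());
            }
        }
        return byState;
    }

    public static void main(String[] args) {
        List<TokenDef> fooTokens = List.of(new TokenDef("FOO_KEYWORD", null));
        List<TokenDef> jsonTokens = List.of(new TokenDef("OPEN_BRACE", null));

        // Distinct default states: the two grammars' definitions stay apart.
        System.out.println(assignStates(List.of(
                new Grammar("FOO", fooTokens), new Grammar("JSON", jsonTokens))));
        // prints {FOO=[FOO_KEYWORD], JSON=[OPEN_BRACE]}

        // Both grammars fall back to "DEFAULT": their definitions are mixed together.
        System.out.println(assignStates(List.of(
                new Grammar("DEFAULT", fooTokens), new Grammar("DEFAULT", jsonTokens))));
        // prints {DEFAULT=[FOO_KEYWORD, OPEN_BRACE]}
    }
}
```

With distinct defaults (FOO and JSON) the two grammars' definitions stay in separate states; when both fall back to "DEFAULT", they land in a single state, which is exactly what the setting is meant to avoid.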
  
## Wrinkles with Code Injection
  
You can *inject* code into the generated parser or lexer class from within an included grammar, but you need to write something like:

    INJECT PARSER_CLASS :
    {
       ...
    }

or:
  
    INJECT LEXER_CLASS :
    {
       ...
    }
    
CongoCC replaces the **PARSER_CLASS** and **LEXER_CLASS** aliases with the appropriate names, i.e. the actual class names of the XXXParser or XXXLexer being generated. Suppose you have a Foo language in which you want to embed JSON expressions, so you include a JSON grammar. If that JSON grammar needs to inject some code into the generated parser, the injection cannot be written as:
  
    INJECT JSONParser :
    {
       ...
    }
  
because the parser class we are generating is not JSONParser; it is FooParser! However, the person writing a generally useful JSON grammar that can be embedded in other grammars does not know the name of the parser (or lexer) class that will be generated. So, he needs to use the alias **PARSER_CLASS** (or possibly **LEXER_CLASS**) for the injected code to end up in the right class.
  
-Sodo not be surprised when the code within PARSER_BEGIN...PARSER_END is ignored if it is within an INCLUDEd grammar. You need to write INJECT(PARSER_CLASSto achieve the desired result.+In fact, the aliases **PARSER_CLASS**, **LEXER_CLASS**, **CONSTANTS_CLASS**, and **PARSER_PACKAGE** can be used in code injections and java actions to make an included grammar (or grammar fragmentmore generally useful.
  
To see a concrete example of **INCLUDE** in use, you can take a look at https://github.com/congo-cc/congo-parser-generator/tree/master/src/grammars. Specifically, you can see that the CongoCC.ccc grammar **INCLUDE**s Java.ccc. Note also that this Java.ccc grammar file is, on its own, quite generally usable. And useful!
  
</markdown>

===== INCLUDE with Java Source files =====

Note also that if the name of the INCLUDEd file ends in .java or in .jav, then the file is assumed to contain only Java source code. Thus, writing:

    INCLUDE "SomeJavaCode.java"

is exactly the same as if you wrote:

    INJECT : {
       (contents of the SomeJavaCode.java file here)
    }

In other words, it is equivalent to the second sort of code injection described [[code_injection_in_javacc_21##the_inject_block_with_no_type_specified|here]].
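
The rule can be summarized in a few lines of Java. This is a hypothetical illustration, not CongoCC's source: `JavaIncludeRule`, `isJavaSource`, and `expandJavaInclude` are made-up names, and the sketch assumes the extension test is a plain, case-sensitive suffix check on the file name.

```java
// Hypothetical illustration of the INCLUDE rule for Java source files; not CongoCC's code.
public class JavaIncludeRule {

    // A file whose name ends in .java or .jav is treated as pure Java source.
    // Assumption: the check is a case-sensitive suffix test.
    static boolean isJavaSource(String fileName) {
        return fileName.endsWith(".java") || fileName.endsWith(".jav");
    }

    // Builds the untyped INJECT block that such an INCLUDE is equivalent to.
    static String expandJavaInclude(String fileName, String fileContents) {
        if (!isJavaSource(fileName)) {
            throw new IllegalArgumentException(fileName + " would be included as a grammar, not as Java");
        }
        return "INJECT : {\n" + fileContents + "\n}";
    }

    public static void main(String[] args) {
        System.out.println(isJavaSource("SomeJavaCode.java"));      // true
        System.out.println(isJavaSource("IncludedGrammar.javacc")); // false
        System.out.println(expandJavaInclude("SomeJavaCode.java", "class Helper {}"));
    }
}
```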
  