''up_to_here'': current revision 2023/03/03 14:26 by revusky (previous revision: 2020/09/25 18:39 by revusky, edit to "Niggles (Annoying Details)")
===== The "Up-to-here" Marker =====
  
The //up-to-here// marker ''=>||'' and the //up-to-here-plus// marker ''=>|+n'' (where n is an integer) were introduced to express [[lookahead]] more succinctly and in a less error-prone manner than the legacy JavaCC ''LOOKAHEAD'' construct or even the terser ''SCAN'' statement.
  
Where in legacy JavaCC, you would write:

<code>
…
</code>

or in CongoCC (//without up-to-here//) you could write:

<code>
…
</code>

…
  
(I refer to these as //simple// because each of them is just a sequence of three string literal tokens. I use them as examples, but the same concept applies, of course, to much more complex expansions.)
  
Suppose further that we have a choice construct:

<code>
…
</code>
  
Now, if no ''LOOKAHEAD'' (or ''SCAN'') is specified, then the ''Foobaz'' production is going to be unreachable. This is because, by default, CongoCC looks ahead exactly one token to decide which of the two expansions to enter, and simply goes with the first one that matches. So, if the next token is "foo", the parser will //always// enter ''Foobar'', and the second choice, ''Foobaz'', is unreachable.
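Here is the problem in concrete form: a minimal hand-written recursive-descent sketch in Java of a choice point that decides on exactly one token of lookahead. It is not the code CongoCC generates. The definitions of ''Foobar'' and ''Foobaz'' are not shown in this section, so their token sequences below (''"foo" "bar" "baz"'' and ''"foo" "baz" "bar"'') are assumptions for illustration. The class and method names are hypothetical too.

<code java>
import java.util.List;

// Hypothetical sketch (not CongoCC output): a choice point that looks at
// exactly one token and takes the first alternative whose first token matches.
class OneTokenChoice {

    // Assumed definitions, since the real productions are not shown here:
    //   Foobar : "foo" "bar" "baz" ;
    //   Foobaz : "foo" "baz" "bar" ;
    static final List<String> FOOBAR = List.of("foo", "bar", "baz");
    static final List<String> FOOBAZ = List.of("foo", "baz", "bar");

    // Returns the name of the alternative the parser would enter.
    static String choose(List<String> tokens) {
        String next = tokens.isEmpty() ? null : tokens.get(0);
        if (FOOBAR.get(0).equals(next)) {
            return "Foobar";   // always taken when the next token is "foo"
        }
        if (FOOBAZ.get(0).equals(next)) {
            return "Foobaz";   // dead code: same first token as Foobar
        }
        return "no match";
    }

    public static void main(String[] args) {
        System.out.println(choose(List.of("foo", "bar", "baz"))); // Foobar
        System.out.println(choose(List.of("foo", "baz", "bar"))); // Foobar, not Foobaz
    }
}
</code>

Both alternatives begin with "foo", so the second test can never succeed. Input that really is a ''Foobaz'' gets routed into ''Foobar'' and fails there on its second token.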
  
The classic solution would be to specify two tokens of lookahead, so that we need to match "foo" followed by "bar" to enter ''Foobar''. Like so:

<code>
…
</code>
  
In CongoCC, you have the option of putting an //up-to-here// marker in the ''Foobar'' production like so:

<code>
…
</code>
  
This means that we scan up to (and including) the "bar" token when the expansion at a [[choice points|choice point]] starts with ''Foobar''.
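The sketch below shows what the marker buys us at this choice point. It is hand-written Java, not the code CongoCC generates, and the names are hypothetical. The productions are not shown in this section, so ''Foobar'' is assumed to be ''"foo" "bar" =>|| "baz"'' and ''Foobaz'' to be ''"foo" "baz" "bar"''. Before entering ''Foobar'', the parser checks that the upcoming tokens match everything up to the marker. If that scan fails, it falls through to ''Foobaz'' with the default one-token check.

<code java>
import java.util.List;

// Hypothetical sketch of an up-to-here scan at a choice point.
// Assumed productions (not shown in this section):
//   Foobar : "foo" "bar" =>|| "baz" ;
//   Foobaz : "foo" "baz" "bar" ;
class UpToHereChoice {

    // The part of Foobar that lies before the up-to-here marker.
    static final List<String> FOOBAR_SCAN_PREFIX = List.of("foo", "bar");

    // True if the upcoming tokens start with the given prefix.
    static boolean scanAhead(List<String> tokens, List<String> prefix) {
        if (tokens.size() < prefix.size()) {
            return false;
        }
        return tokens.subList(0, prefix.size()).equals(prefix);
    }

    static String choose(List<String> tokens) {
        // Enter Foobar only if we can scan up to (and including) "bar".
        if (scanAhead(tokens, FOOBAR_SCAN_PREFIX)) {
            return "Foobar";
        }
        // Otherwise try the next alternative with the default one-token check.
        if (!tokens.isEmpty() && tokens.get(0).equals("foo")) {
            return "Foobaz";
        }
        return "no match";
    }

    public static void main(String[] args) {
        System.out.println(choose(List.of("foo", "bar", "baz"))); // Foobar
        System.out.println(choose(List.of("foo", "baz", "bar"))); // Foobaz
    }
}
</code>

With the scan in place, the input "foo" "baz" "bar" no longer matches the ''Foobar'' prefix, so ''Foobaz'' becomes reachable.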
  
Now, the choice construct above can be written simply as:

<code>
…
</code>

…
The above basically summarizes how the new //up-to-here// construct works. There are, however, a few annoying little details to keep in mind.
  * If an expansion has an //up-to-here// marker, then any nested //up-to-here// is ignored.
  * If an expansion has an explicit //numerical// or //syntactic// lookahead, then any //up-to-here// marker in that expansion or in a nested production is ignored.
  
What this means, for example, is that if we have:

<code>
…
</code>
  
then any //up-to-here// inside the ''Foo'' production is ignored because, at a higher level, we specified that we want to scan past it, up to and including the ''Bar'' that follows.
  
By the same token, if we specify an explicit numerical limit, that overrides any nested //up-to-here// as well. So, in the following:

<code>
…
</code>

…
  
Though it could conceivably be counter-intuitive (I'm not really sure...), the //up-to-here// marker in the ''Bar'' above is //not// used. In other words, the ''( Foo )*'' expansion above only scans //one// token ahead on each iteration of the loop. This is the current implementation, and it may be revisited at a later point. Perhaps not, though, since respecting the //up-to-here// marker when recursing more than one level deep seems like it could be problematic, both for the implementation and for the readability of grammars.

Note also that the only //nested// up-to-here that applies to an expansion is one in a NonTerminal that //starts// the expansion. Thus, if we have the expansion:

<code>
   ( Foo Bar Baz )*
</code>

The //up-to-here// marker inside the ''Bar'' production is not used. Only a marker inside ''Foo'' would be, assuming ''Foo'' has one. In the above case, if we did want to scan past ''Foo'' and into ''Bar'', stopping two tokens into it, we could write:

<code>
   ( Foo =>|+2 Bar Baz )*
</code>
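The rules in this section can be condensed into a small decision procedure. The following Java sketch is only an illustrative model. ''LookaheadRules'', ''Expansion'', ''Production'' and ''resolve'' are hypothetical names, not part of CongoCC. The model only answers //which// rule decides the lookahead for an expansion, not how many tokens end up being scanned. It uses records, so it assumes Java 16 or later.

<code java>
// Hypothetical model of the lookahead precedence rules described above.
// None of these names are CongoCC API; they exist only for illustration.
class LookaheadRules {

    // Which rule ends up deciding the lookahead for an expansion.
    enum Source { EXPLICIT, OWN_MARKER, LEADING_NONTERMINAL_MARKER, ONE_TOKEN }

    // A production, reduced to one fact: does its own top-level expansion
    // contain an up-to-here marker? Markers nested deeper are not modeled,
    // which mirrors the fact that they are ignored.
    record Production(String name, boolean hasUpToHere) {}

    // An expansion at a choice point. 'hasUpToHere' means the expansion itself
    // contains =>|| or =>|+n; 'leading' is the NonTerminal that starts the
    // expansion, or null if it starts with a terminal.
    record Expansion(boolean explicitLookahead, boolean hasUpToHere, Production leading) {}

    static Source resolve(Expansion e) {
        if (e.explicitLookahead()) {
            // Explicit numerical or syntactic lookahead wins over any marker,
            // whether in this expansion or in a nested production.
            return Source.EXPLICIT;
        }
        if (e.hasUpToHere()) {
            // The expansion's own marker; nested markers are ignored.
            return Source.OWN_MARKER;
        }
        if (e.leading() != null && e.leading().hasUpToHere()) {
            // Only a NonTerminal that starts the expansion is consulted,
            // and only one level deep.
            return Source.LEADING_NONTERMINAL_MARKER;
        }
        // Default: look ahead exactly one token.
        return Source.ONE_TOKEN;
    }

    public static void main(String[] args) {
        Production foo = new Production("Foo", false);
        // ( Foo Bar Baz )* where only Bar has a marker: Bar is never consulted.
        System.out.println(resolve(new Expansion(false, false, foo))); // ONE_TOKEN
        // ( Foo =>|+2 Bar Baz )*
        System.out.println(resolve(new Expansion(false, true, foo)));  // OWN_MARKER
    }
}
</code>

The ''( Foo )*'' case discussed earlier, where the marker sits in a ''Bar'' inside ''Foo'', also resolves to ''ONE_TOKEN''. The model never looks past the leading NonTerminal's own top level.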

As of this writing, there is no way to get the lookahead machinery to respect the //up-to-here// marker inside a production (like ''Bar'' above) if that production is not at the very beginning of the expansion. This may be addressed later, but it is probably not at all urgent. Usually, when you want to scan past a production and then a bit further into the next one, the extra distance is exactly one token. You will typically have something like:

<code>
   MethodDeclaration : [ Modifiers ] ReturnType <IDENTIFIER> =>|+1 Args Block ;
</code>

What the above means is that we scan to the end of the ''<IDENTIFIER>'' (which would be the method name) and then one more token, to check for the opening parenthesis "(" of the ''Args'' production. We anticipate that the vast majority of //up-to-here-plus// markers will indicate one extra token. (Though that remains to be seen in actual practice, admittedly...)
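Here is a Java sketch of the check that the ''=>|+1'' in ''MethodDeclaration'' implies, run over a list of already-lexed tokens. It is a simplification, not CongoCC's generated code. The modifier set is a small hypothetical subset. ''ReturnType'' is assumed to be a single identifier-like token, with no generics, arrays or qualified names. Keywords such as ''void'' are not distinguished from identifiers. The class and method names are hypothetical.

<code java>
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the scan implied by
//   MethodDeclaration : [ Modifiers ] ReturnType <IDENTIFIER> =>|+1 Args Block ;
// Tokens are plain strings; the classification below is deliberately simplified.
class MethodDeclarationScan {

    // A small, assumed subset of Java modifiers.
    static final Set<String> MODIFIERS =
        Set.of("public", "protected", "private", "static", "final", "abstract");

    // Identifier-like token that is not one of the modifiers above.
    static boolean isIdentifier(String t) {
        return !t.isEmpty()
            && Character.isJavaIdentifierStart(t.charAt(0))
            && t.chars().allMatch(Character::isJavaIdentifierPart)
            && !MODIFIERS.contains(t);
    }

    // True if the tokens look like the start of a method declaration:
    // scan up to the end of <IDENTIFIER>, then exactly one more token.
    static boolean scanMethodDeclaration(List<String> tokens) {
        int i = 0;
        // [ Modifiers ]
        while (i < tokens.size() && MODIFIERS.contains(tokens.get(i))) {
            i++;
        }
        // ReturnType (assumed to be a single token such as "void" or "String")
        if (i >= tokens.size() || !isIdentifier(tokens.get(i))) {
            return false;
        }
        i++;
        // <IDENTIFIER>: the method name, where the =>| marker sits
        if (i >= tokens.size() || !isIdentifier(tokens.get(i))) {
            return false;
        }
        i++;
        // +1: one more token, the opening parenthesis of Args
        return i < tokens.size() && tokens.get(i).equals("(");
    }

    public static void main(String[] args) {
        System.out.println(scanMethodDeclaration(
            List.of("public", "static", "void", "main", "(", "String", "[", "]", "args", ")"))); // true
        System.out.println(scanMethodDeclaration(
            List.of("private", "int", "count", ";"))); // false
    }
}
</code>

A field declaration such as ''private int count ;'' fails only on the token after the identifier. That is exactly the case the extra token is there to catch.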