Differences

This shows you the differences between two versions of the page.

lookbehind [2020/09/25 10:04] revuskycontextual_predicates [2023/03/03 14:23] (current) revusky
Line 1: Line 1:
-====== LOOKBEHIND ======+====== Contextual Predicates ======
  
-A //lookbehind predicate// allows you to add conditions at [[choice points]] based on scanning back in the call/lookahead stack. The easiest way to describe this is with some actual examples.+A //contextual predicate// allows you to add conditions at [[choice points]] based on scanning back in the call/lookahead stack. We are not aware of any other parser generator tool that has this feature.
  
-Probably the most typical usage will be to guarantee that a production is not //re-entrant//, i.e. that it is not allowed to nest recursively. This can now be expressed as follows:+The easiest way to describe this is with some actual examples. 
 + 
 +==== Specifying that a production is non-reentrant ==== 
 + 
 +Probably the most typical usage will be to guarantee that a production is not //re-entrant//, i.e. that it is not allowed to nest recursively. This can now be expressed very cleanly with a //contextual predicate// as follows:
  
 <code> <code>
Line 9: Line 13:
 </code> </code>
  
-First of all, the tilde "~" character that starts the predicate indicates negation. The above predicate indicates that we scan backward in the call stack to see whether we have previously entered a ''Foo'' production. If that is //not// the case (because the condition is negated with the "~") then we can enter the ''Foo'' production. Note that a //lookbehind predicate// starts with either a backslash "\" or a forward slash "/". The above predicate uses a backslash and that means that we scan backwards from the current production up towards the root; a forward slash means that we are scanning forward from the root. +First of all, the tilde "~" character that starts the predicate indicates negation. The above predicate indicates that we scan backward in the call stack to see whether we have previously entered a ''Foo'' production. If that is //not// the case (because the condition is negated with the "~") then we can enter the ''Foo'' production.  
 + 
 +The above sort of predicate will probably be the most commonly used pattern. However, more complex conditions can be formed. 
 + 
 +==== Scanning Forward vs. Backward, Ellipsis and Wild-card  ==== 
 + 
 +Note that the elements in a //contextual predicate// are separated either with a backslash "\" or a forward slash "/". The previous example used a backslash and that means that we scan backwards from the current production up towards the root; a forward slash means that we are scanning forward from the root. 
  
-In the above example, the ellipsis "..." that follows the backslash means that there can be an arbitrary number of intervening productions in the call stack. If, for example, we wrote:+In the above example, the ellipsis "..." that follows the backslash means that there can be an arbitrary number of intervening productions in the call stack. The //wild-card//, or simply //dot// ".", matches exactly one production of any kind. If, for example, we wrote:
  
 <code> <code>
-  [ SCAN ~\.\Foo => Foo] +  [ SCAN ~\.\Bar => Foo] 
 </code> </code>
  
-this would mean that we enter the ''Foo'' production //only if// the direct parent of the current production is a ''Foo''+this would mean that we enter the ''Foo'' production //only if// the direct parent of the current production //is not// a ''Bar''.
  
 Or alternatively, Or alternatively,
  
-[ SCAN \.\Bar => Foo ]+<code> 
 +   [ SCAN \.\Bar => Foo ] 
 +</code>
  
-would mean that we enter the ''Foo'' production if the parent of the current production is a ''Bar''. (Note that this predicate does not start with a "~", so thus is //not// negated.+would mean that we enter the ''Foo'' production if the parent of the current production //is// a ''Bar''. (Note that this predicate does not start with a "~", so it is //not// negated.)
  
-So, consider the following predicate:+Now, consider the following predicate that uses a forward slash:
  
 <code> <code>
Line 31: Line 43:
 </code>     </code>    
  
-This means that we enter the Baz production only if the root production is a ''Foo'' and we then entered a ''Bar''.+This means that we enter the ''Baz'' production only if the root production is a ''Foo'' and we then directly entered a ''Bar''. 
 + 
 +==== Optional Ending Slash ====
  
 If the predicate begins with a forward slash, it may end //optionally// with a backslash. And vice versa. If a predicate begins with a backslash, it may //optionally// end with a forward slash. For example, consider the following predicate: If the predicate begins with a forward slash, it may end //optionally// with a backslash. And vice versa. If a predicate begins with a backslash, it may //optionally// end with a forward slash. For example, consider the following predicate:
Line 51: Line 65:
 </code> </code>
  
 +==== Summary ====
  
-===== Recap ===== +A //contextual predicate// starts optionally with a tilde "~" to indicate negation. The first character after the tilde (or simply the first character if there is no tilde) must be either a backslash or a forward slash. The backslash indicates that we are scanning backwards from the current production and the forward slash means that we are scanning forward from the root.
- +
-A //lookbehind predicate// starts optionally with a tilde "~" to indicate negation. The first character after the tilde (or simply the first character if there is no tilde) must be either a backslash or a forward slash. The backslash indicates that we are scanning backwards from the current production and the forward slash is that we are scanning forward from the current production.+
  
 An ellipsis "..." means that we can have an arbitrary number (including zero) of intervening productions. A dot "." means that we have exactly one production of any type.  An ellipsis "..." means that we can have an arbitrary number (including zero) of intervening productions. A dot "." means that we have exactly one production of any type. 
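
To make the matching rules concrete, here is a small Python model of how such a predicate could be evaluated against a call stack. It is only a sketch and not the code that the tool generates. The names ''matches'' and ''check_predicate'' are made up for illustration. It assumes the call stack is a list of production names with the root first and the current production last, it assumes the far end of the scan is left unanchored, and it does not model the optional ending slash at all.

```python
def matches(elements, productions):
    """Return True if the pattern elements match the start of the
    productions list, both given in scan order."""
    def go(e, p):
        if e == len(elements):
            return True  # assumption: the far end of the scan is unanchored
        elem = elements[e]
        if elem == "...":
            # Ellipsis: skip zero or more productions, then match the rest.
            return any(go(e + 1, k) for k in range(p, len(productions) + 1))
        if p == len(productions):
            return False  # pattern wants a production but the stack ran out
        if elem == "." or elem == productions[p]:
            # Dot matches exactly one production of any kind;
            # a name matches only that production.
            return go(e + 1, p + 1)
        return False
    return go(0, 0)


def check_predicate(predicate, call_stack):
    """Evaluate a predicate such as r"~\\...\\Foo" against call_stack,
    a list of production names, root first, current production last."""
    negated = predicate.startswith("~")
    body = predicate[1:] if negated else predicate
    if not body or body[0] not in ("\\", "/"):
        raise ValueError("predicate must start with a backslash or forward slash")
    sep = body[0]
    other = "/" if sep == "\\" else "\\"
    if other in body:
        raise ValueError("optional ending slash is not modeled in this sketch")
    elements = body[1:].split(sep)
    # Backslash: scan from the current production towards the root.
    # Forward slash: scan from the root towards the current production.
    productions = call_stack[::-1] if sep == "\\" else list(call_stack)
    return matches(elements, productions) != negated
```

Under these assumptions, ''check_predicate(r"~\...\Foo", stack)'' is true exactly when no ''Foo'' appears anywhere in the stack, while ''check_predicate(r"\.\Bar", stack)'' is true exactly when the parent of the current production is a ''Bar''.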
Line 64: Line 77:
 </code> </code>
  
-The above would mean that we check that we aren't already inside a ''Foo'' production AND we scan ahead up to 2 tokens of lookahead when deciding whether to enter Foo. Otherwise, we break out of the loop.+The above would mean that we check that we aren't already inside a ''Foo'' production AND we also scan ahead up to 2 tokens of lookahead when deciding whether to enter Foo. Otherwise, we break out of the loop.
  
 Or alternatively, we can specify a //syntactic// and/or //semantic// lookahead: Or alternatively, we can specify a //syntactic// and/or //semantic// lookahead:
  
 <code> <code>
-   ( SCAN ~\...\Foo +   ( SCAN ~\...\Foo "bar" "baz" => Foo )*
 </code> </code>
 +
 +In the above we specify that Foo must be //non-reentrant// and also that the next 2 tokens must be "bar" followed by "baz", or else we jump out of the loop.
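
The combined condition can be pictured with a short Python sketch. This is an illustration only, not generated parser code. The function names are hypothetical, the call stack is assumed to be a list of production names, and ''Foo'' is assumed to consume exactly the two tokens "bar" and "baz" each time it is entered.

```python
def may_enter_foo(call_stack, tokens, pos):
    """Both conditions must hold: Foo is not already on the call stack,
    and the next two tokens are "bar" followed by "baz"."""
    not_reentrant = "Foo" not in call_stack
    next_two_match = tokens[pos:pos + 2] == ["bar", "baz"]
    return not_reentrant and next_two_match


def count_foo_iterations(call_stack, tokens):
    """Model the (...)* loop: keep entering Foo while the condition holds."""
    pos = 0
    count = 0
    while may_enter_foo(call_stack, tokens, pos):
        pos += 2  # assumption: Foo consumes exactly "bar" "baz"
        count += 1
    return count
```

As soon as either condition fails, the loop stops, which corresponds to jumping out of the loop in the grammar.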
 +
 +NB. If you have a ''SCAN'' statement that does not specify either numerical or syntactic lookahead, then the generated code will scan ahead an //unlimited// number of tokens. (Unless the expansion to be parsed is constrained by an [[up to here]] marker.) This is a key characteristic of the newer [[scan statement]].
 +
 +Note also that //contextual predicates//, like syntactic lookahead in CongoCC, can be nested arbitrarily and work in an arbitrarily nested scanahead routine.
 +
 +