meta data for this page
  •  

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
choice_points [2020/09/26 23:52] – [More than one way to skin a cat...] revuskychoice_points [2021/02/08 18:09] (current) – ↷ Links adapted because of a move operation revusky
Line 7: Line 7:
   * The expansions within a //choice construct//, i.e. n (two or more) expansions, separated by the "|" character, i.e. '' Foo | Bar | Baz ''   * The expansions within a //choice construct//, i.e. n (two or more) expansions, separated by the "|" character, i.e. '' Foo | Bar | Baz ''
  
-(Perhaps needless to say, all of the above constructs can arbitrarily combine and nest.)+(Perhaps needless to say, all of these constructs arbitrarily combine and nest within one another to an arbitrary level of complexity.)
  
-I think the simplest way to think about the above cases (for most people, anyway) is just in terms of their analogues in a procedural programming language. The first three cases are effectively binary choices, a choice between entering the expansion inside the parentheses and jumping directly to what follows it. In all of these cases, the expansion within the parentheses is a choice point. The last case is a choice between n options and each of the expansions is a choice point. +I think the simplest way to think about the above cases (for most people, anyway) is just in terms of their analogues in a procedural programming language. The first three cases are effectively binary choices, a choice between entering the expansion inside the parentheses and jumping directly to what follows it. In all of these cases, the expansion within the parentheses is a choice point. The last case is a choice between n options and each of those n sub-expansions is a choice point in the grammar
  
-==== Zero Or More ====+==== Zero Or One ====
  
-A //zero-or-more// is a single (non-looping) choice. If the enclosed expansion matches, we enter it, and if not, we jump directly to whatever follows it. So, if we write:+A //zero-or-one// is a single (non-looping) choice. If the enclosed expansion matches, we enter it, and if not, we jump directly to whatever follows it. So, if we write:
  
 <code> <code>
Line 83: Line 83:
     ....     ....
     else     else
-      expansionN();+      final_expansion();
 </code>     </code>    
  
Line 104: Line 104:
 Now, to be clear, looking ahead one token might be sufficient or not in any given spot, but if we don't specify any extra information, that is the //default assumption//. Another fancy, theoretical way of expressing this //default// would be: Now, to be clear, looking ahead one token might be sufficient or not in any given spot, but if we don't specify any extra information, that is the //default assumption//. Another fancy, theoretical way of expressing this //default// would be:
  
-//We enter an expansion at a choice point if the next token is in that expansion'**first set**.//+//We enter an expansion at a choice point if the next token is in that expansion'[[first set]].//
  
-Again, no need to be intimidated by the lingo. An expansion'//first set// is simply the set of tokens with which an expansion can begin. Depending how your brain is wired, it might be easier to think about it the other way round. Like so:+Again, no need to be intimidated by the lingo. An expansion'[[first set]] is simply the set of tokens with which an expansion can begin. Depending how your brain is wired, it might be easier to think about it the other way round. Like so:
  
-//If the next token is **not** in an expansion'**first set**, the expansion cannot possibly be matched, so we skip ahead to the following choice.//+//If the next token is **not** in an expansion'[[first set]], the expansion cannot possibly be matched, so we skip ahead to the following choice.//
  
 Now, regardless, it may be the case that more than one of the expansions at a given choice point matches this condition. Well, in that case, we have a secondary rule: //the first one gets it//. And this is really no different, by the way, from how if-elseif-else in a procedural programming language works. Now, regardless, it may be the case that more than one of the expansions at a given choice point matches this condition. Well, in that case, we have a secondary rule: //the first one gets it//. And this is really no different, by the way, from how if-elseif-else in a procedural programming language works.
Line 141: Line 141:
 The answer is that we do this by defining some sort of //lookahead//. (Note that //lookahead// is legacy JavaCC terminology but is frequently inaccurate. Really, we're talking about defining some sort of condition or //predicate// that overrides the default resolution algorithm, and that could well involve looking ahead more than the default of 1 token, or it could involve something else.) Probably the term //predicate// is more accurate, but that is a bit of a fancy word. Maybe just //condition// is better, but I may well use the term //lookahead// frequently in these spots because, yes, resolving the choice very often does involve scanning ahead more than one token. (//But not always.//) The answer is that we do this by defining some sort of //lookahead//. (Note that //lookahead// is legacy JavaCC terminology but is frequently inaccurate. Really, we're talking about defining some sort of condition or //predicate// that overrides the default resolution algorithm, and that could well involve looking ahead more than the default of 1 token, or it could involve something else.) Probably the term //predicate// is more accurate, but that is a bit of a fancy word. Maybe just //condition// is better, but I may well use the term //lookahead// frequently in these spots because, yes, resolving the choice very often does involve scanning ahead more than one token. (//But not always.//)
  
-==== Definite Numerical Lookahead ====+====Numerical Lookahead =====
  
 In the case given above, where we have that unreachable second expansion, the simplest solution would simply be to specify that we should scan ahead two tokens on the first expansion. So we could write the above choice as: In the case given above, where we have that unreachable second expansion, the simplest solution would simply be to specify that we should scan ahead two tokens on the first expansion. So we could write the above choice as:
Line 165: Line 165:
 (//Huh?//) (//Huh?//)
  
-It's really quite simple, you see. The second token, "baz", in the second expansion choice is //not a choice point//! So, yes, the generated parser at this point is expecting a "baz" but encountering a "bat". So yes, it will complain! (*Okay, when I say "complain", that is an anthropomorphic way of saying that an exception will be thrown.*)+It's really quite simple, you see. The second token, "baz", in the second expansion choice is //not a choice point//! So, yes, the generated parser at this point is expecting a "baz" but encountering a "bat". So yes, it will complain! (//Okay, when I say "complain", that is an anthropomorphic way of saying that an exception will be thrown.//)
  
-But that is all right and proper. Look at it this way... the above grammar snippet says that we look ahead 2 tokens to decide whether to enter the first expansion. Since we didn't specify anything for the second expansion, it now just looks ahead one token, and if an error occurs after that, then so be it. The parser is doing what it is supposed to do, right? You see, in this case, none of the three given choices can match the "foo" "bat" sequence anyway! So, end result: it throws an exception, **which is what it should do**. But the point is that we can have '' "foo" "bar" '' here or '' "foo" "baz" '' and the generated parser will handle both cases. Our first go at this, without the lookahead specified, it would spuriously reject a "foo" "bar", **even thought that is valid input**! If it rejects "foo" "bat" that is not really an issue //in terms of the problem we have set ourselves here// because **that is not valid input anyway**!+But that is all right and proper. Look at it this way... the above grammar snippet says that we look ahead 2 tokens to decide whether to enter the first expansion. Since we didn't specify anything for the second expansion, it now just looks ahead one token, and if an error occurs after that, then so be it. The parser is doing what it is supposed to do, right? You see, in this case, none of the three given choices can match the "foo" "bat" sequence anyway! So, end result: it throws an exception, **which is what it should do**. But the point is that we can have '' "foo" "bar" '' here or '' "foo" "baz" '' and the generated parser will handle both cases. Our first go at this, without the lookahead specified, it would spuriously reject a "foo" "bar", **even though it is valid input**! If it rejects "foo" "bat" that is not really an issue //in terms of the problem we have set ourselves here// because **that is not valid input anyway**!
  
 Well, if you don't currently understand my point there, then all I can say is just think about it. It will eventually become clear... Well, if you don't currently understand my point there, then all I can say is just think about it. It will eventually become clear...
Line 173: Line 173:
 ===== More than one way to skin a cat... ===== ===== More than one way to skin a cat... =====
  
-The //definite numerical lookahead// of two tokens worked okay in the above example, but generally speaking, it is a rather crude disposition. The legacy JavaCC tool provides two other ways to specify how we resolve a choice -- when the default resolution is not good enough. In the original, somewhat inaccurate terminology, they are called //syntactic// and //semantic// lookahead. JavaCC 21, meanwhile, provides two other tools, [[lookbehind]] and [[up to here]]. +The //definite numerical lookahead// of two tokens worked okay in the above example, but generally speaking, it is a rather crude disposition. The legacy JavaCC tool provides two other ways to specify how we resolve a choice -- when the default resolution is not good enough. In the original, somewhat inaccurate terminology, they are called //syntactic// and //semantic// lookahead. JavaCC 21, meanwhile, provides two other tools, [[contextual_predicates]] and [[up to here]]. 
  
 ==== Syntactic Lookahead ==== ==== Syntactic Lookahead ====
Line 207: Line 207:
 </code> </code>
  
-The code in the braces is just any Java expression, a black box from the point of view of our parser generator.+The code in the braces is just any Java expression that resolves to a boolean (true/false) value, a black box from the point of view of our parser generator.
  
  
Line 233: Line 233:
 In very many common usage cases, the [[up to here]] syntax removes the need to write separate //syntactic// or //numerical// lookaheads.     In very many common usage cases, the [[up to here]] syntax removes the need to write separate //syntactic// or //numerical// lookaheads.    
  
-By the same token, //semantic lookahead//, i.e. using arbitrary Java code to express conditions should always be a last resort. The new [[lookbehind]] feature in JavaCC 21 was designed to drastically reduce the cases in which it is necessary.) +By the same token, //semantic lookahead//, i.e. using arbitrary Java code to express conditions should always be a last resort. The new [[contextual_predicates]] feature in JavaCC 21 was designed to drastically reduce the cases in which it is necessary.) 
  
-A general rule of thumb would be to use [[up to here]] and [[lookbehind]] constructs whenever possible instead of //syntactic// and //semantic// lookahead respectively. (Though, granted, if it is not possible, then it's not possible.)+A general rule of thumb would be to use [[up to here]] and [[contextual_predicates]] constructs whenever possible instead of //syntactic// and //semantic// lookahead respectively. (Though, granted, if it is not possible, then it's not possible.)