Making the parser more flexible

Another thought here.

This would be a big tweak, but it just occurred to me.

put 'gorillas in the mist' DVD in jack in the box

Neither greedy nor non-greedy patterns will allow that. But I think I can see a way it could be done.
I'm thinking of a function, GetAllCommandMatches (), which returns a list. Each element in the list is a dictionary with values for 'command', 'object1', and 'object2'. (In the case of multiple commands having patterns that match the same player input, they would be ordered by match strength)

The parser could then loop over the list, matching the "object1" and "object2" strings to objects in scope, and accept the first one where object1 and object2 are both sane object names.

It would need a lot of fiddling around with the details, but it basically comes down to:

  1. Modify the pattern. After each named group, add a lookahead: (?=(?<afterobject1>.*$))
    • This causes Populate to provide a string "afterobject1" which contains everything after the string that's in "object1"
  2. Repeat the match, but with the regex modified again. Between object1 and afterobject1 patterns, you add "(?!" + Join (previous_afterobject1_values, "$|") + "$)"
    • This forces the regex to match any way of splitting up the string that isn't the same as the previous one.
  3. Repeat until you get a non-match
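
As a rough illustration of steps 1 to 3, here is a minimal sketch in Python rather than Quest script. It assumes Python's re module, which writes named groups as (?P<name>...) instead of the (?<name>...) form used in Quest patterns. The function name all_splits and the hard-coded "put" pattern are made up for the example; a real version would build the pattern from the command.

```python
import re

def all_splits(text):
    """Collect every way the 'put X in/on Y' pattern can divide the input."""
    seen_remainders = []   # values of afterobject1 from earlier matches
    results = []
    while True:
        # Step 2: forbid any remainder that an earlier match already produced.
        if seen_remainders:
            escaped = [re.escape(r) for r in seen_remainders]
            exclude = "(?!" + "$|".join(escaped) + "$)"
        else:
            exclude = ""
        # Step 1: the lookahead after object1 captures everything that follows it.
        pattern = (
            r"^put (?P<object1>.+)"
            + exclude
            + r"(?=(?P<afterobject1>.*$))"
            + r" (?:in|on) (?P<object2>.+)$"
        )
        match = re.match(pattern, text)
        if match is None:
            # Step 3: stop at the first non-match.
            return results
        results.append((match["object1"], match["object2"]))
        seen_remainders.append(match["afterobject1"])

for obj1, obj2 in all_splits("put gorillas in the mist DVD in jack in the box"):
    print(repr(obj1), "->", repr(obj2))
```

With that input the loop should find three splits, starting with the greedy one ("gorillas in the mist DVD in jack" / "the box") and ending with the shortest object1 ("gorillas").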

That is a good idea, but besides the coding, it would mean updating every command pattern across over a dozen languages, so it may not be happening soon, unless anyone wants to do a fork and do it themselves?


No it wouldn't. The changes to the patterns would have to be made programmatically, because you're changing the regex at each iteration of the match. It would be a slightly ugly piece of code, because you would have to effectively split the pattern on the ) that corresponds to each (?<; so you'd have to walk through the string character by character to find them. But I don't think that should be a problem. You'd end up with a list in which each element is either a string (part of the pattern) or a dictionary containing the name of the parameter that ends here. Then loop over that list putting it back together into a pattern, with the extra bits in.

Obviously, that's likely to be an expensive operation; but it only needs to be done once for each pattern.
To make it more efficient, you could add a flag on commands specifying if it's necessary to use this method.


I really need to cut my coding time down so I'm not staring at this for more than an hour at a time. So I'll try to come up with an implementation one function at a time.

<function name="SplitRegexParts" parameters="pattern" type="stringlist">
  result = NewStringList()
  stack = NewList()
  i = 0
  building = ""
  while (i < LengthOf(pattern)) {
    i = i + 1
    char = Mid (pattern, i, 1)
    if (char = "\\") {
      char = Mid (pattern, i, 2)
      i = i + 1
    }
    building = building + char
    if (char = "(") {
      // is there a neater way to add an element to the beginning of a list?
      push = NewList()
      list add (push, IsRegexMatch ("^\\(\\?<\\w+>", Mid (pattern, i), "startswithnamedsubgroup"))
      stack = ListCombine (push, stack)
    }
    else if (char = ")") {
      if (ListCount (stack) = 0) {
        error ("Unmatched ')' in regular expression")
      }
      isPattern = ListItem (stack, 0)
      list remove (stack, isPattern)
      if (isPattern) {
        if (Mid (pattern, i) = ")$") {
          i = i + 2
          building = building + "$"
        }
        list add (result, building)
        building = ""
      }
    }
  }
  if (not building = "") {
    list add (result, building)
  }
  if (not EndsWith (pattern, "$")) {
    list add (result, "")
  }
  if (not ListCount (stack) = 0) {
    error ("Unmatched '(' in regular expression")
  }
  return (result)
</function>

This should split up a regular expression at the points where the extra (?=(?<afterobject1>.*$)) clauses need to be added.
For example, if you gave it "^put (?<object1>.+) (in|on) (?<object2>.+)$", it should return a stringlist with 2 elements: "^put (?<object1>.+)" and " (in|on) (?<object2>.+)$".
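
To check that example outside Quest, here is a rough Python port of the same character walk. It is only a sketch: the name split_regex_parts is mine, and like the function above it does not handle parentheses inside character classes such as [(].

```python
import re

# A "(" only counts as a named group if it looks like (?<name>
NAMED_GROUP = re.compile(r"\(\?<\w+>")

def split_regex_parts(pattern):
    """Cut the pattern just after the ")" that closes each named group."""
    parts = []
    stack = []       # one flag per open "(": does it start a named group?
    building = ""
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Copy an escaped character as a pair so "\(" is never counted.
            building += pattern[i:i + 2]
            i += 2
            continue
        building += ch
        if ch == "(":
            stack.append(NAMED_GROUP.match(pattern, i) is not None)
        elif ch == ")":
            if not stack:
                raise ValueError("Unmatched ')' in regular expression")
            if stack.pop():
                if pattern[i + 1:] == "$":
                    # Keep a trailing "$" attached to the final part.
                    building += "$"
                    i += 1
                parts.append(building)
                building = ""
        i += 1
    if stack:
        raise ValueError("Unmatched '(' in regular expression")
    if building:
        parts.append(building)
    if not pattern.endswith("$"):
        parts.append("")
    return parts

print(split_regex_parts("^put (?<object1>.+) (in|on) (?<object2>.+)$"))
```

Only string handling is involved, so the Quest-style (?<name>...) syntax can be passed in unchanged.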

The extra regexen I mentioned earlier can be added between the elements of this list (but not at the end), to identify the point at which the break matched, and to force it to break at different points.


Actually, I'm probably an idiot. You could achieve the same effect by splitting on the string "(?<", at the cost of slightly reduced efficiency for the final expression.
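
A minimal sketch of that simpler approach, written in Python instead of Quest script. The function name add_split_markers and the marker parameter are assumptions for the example. The marker defaults to Quest's "(?<" but can be set to "(?P<" so the result can be tried with Python's re module. The capture names split1, split2 and so on are just illustrative.

```python
import re

def add_split_markers(pattern, marker="(?<"):
    """Insert a lookahead capturing 'the rest of the input' before each named group."""
    pieces = pattern.split(marker)
    out = ""
    for i, piece in enumerate(pieces):
        # Skip the marker before the first piece when the pattern is anchored with ^,
        # because that position can never move.
        if i > 0 or not pattern.startswith("^"):
            out += "(?=" + marker + "split" + str(i) + ">.*$))"
        if i > 0:
            out += marker   # put back the text that split() removed
        out += piece
    return out

# Quest-style syntax: string handling only, nothing is compiled here.
print(add_split_markers("^put (?<object1>.+) (in|on) (?<object2>.+)$"))

# Python-style syntax, so the result can actually be matched.
py_pattern = add_split_markers("^put (?P<object1>.+) (in|on) (?P<object2>.+)$", "(?P<")
print(re.match(py_pattern, "put a in b").groupdict())
```

Because the markers now sit in front of each named group rather than after it, each one captures the remainder of the input starting at that group.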


And if you ever needed proof that anxiety can fry your brain… what was I even thinking of?

Really bad code:

OK… that makes the whole thing a bit simpler, especially as we don't really need to track which "after" variable (let's call them "split" because they're no longer after) belongs to which real parameter.
<function name="GetPopulateVariants" type="list" parameters="pattern, input">
  results = NewList()
  if (not IsRegexMatch (pattern, input)) {
    return (results)
  }
  // TODO - split pattern into parts
  // make an (empty) list of found options for each split part
  // pseudocode follows
  current_subpattern = ListCount (patternparts) - 1
  while (current_subpattern > 0 or (current_subpattern = 0 and not StartsWith(pattern, "^"))) {
    // TODO - build an expression based on the current found parts
    if (IsRegexMatch (currentexpression, input)) {
      // TODO - run Populate; split results into split-subpatterns and actual-subpatterns
      // TODO - if the actual subpatterns don't match a set in results, add them
      // TODO - add the split subpattern value equal to current_subpattern to the current expression
      //   or while there isn't one, decrement current_subpattern
    }
    else {
      // remove all previous matches from current_subpattern and later
      current_subpattern = current_subpattern - 1
    }
  }
  return (results)
</function>

I'm not feeling so great today. I know there's stupid mistakes in there, I just can't think straight. Will edit later.


I wonder if this is any better:

<function name="GetPopulateVariants" type="list" parameters="pattern, input">
  results = NewList()
  if (not IsRegexMatch (pattern, input)) {
    return (results)
  }
  excludes = NewDictionary()
  currentpattern = pattern
  patternparts = Split (pattern, "(?<")
  while (true) {
    matchparts = Populate (currentpattern, input)
    newsplits = "*END*"
    newresult = NewStringDictionary()
    foreach (key, matchparts) {
      if (StartsWith (key, "split")) {
        newsplits = key + ";" + newsplits
      }
      else {
        dictionary add (newresult, key, DictionaryItem (matchparts, key))
      }
    }
    list add (results, newresult)
    newsplits = Split (newsplits)
    currentpattern = "^(?!.)x"
    while (not IsRegexMatch (currentpattern, input)) {
      key = ListItem (newsplits, 0)
      if (key = "*END*") {
        return (results)
      }
      else if (not DictionaryContains (excludes, key)) {
        lst = NewStringList()
        dictionary add (excludes, key, lst)
      }
      else {
        lst = DictionaryItem (excludes, key)
      }
      if (ListContains (lst, DictionaryItem (matchparts, key))) {
        dictionary remove (excludes, key)
        list remove (newsplits, key)
        currentpattern = "^(?!.)x"
      }
      else {
        list add (lst, EscapeRegex (DictionaryItem (matchparts, key)))
        currentpattern = ""
        for (i, 0, ListCount (patternparts) - 1) {
          // the name depends on i, so it has to be built inside the loop
          partname = "split" + i
          if (i > 0 or not StartsWith (pattern, "^")) {
            if (DictionaryContains (excludes, partname)) {
              excl = DictionaryItem (excludes, partname)
              // anchor each excluded remainder with $ so only exact matches are rejected
              excl = Join (excl, "$|")
              if (not excl = "") {
                currentpattern = currentpattern + "(?!" + excl + "$)"
              }
            }
            currentpattern = currentpattern + "(?=(?<" + partname + ">.*$))"
          }
          piece = ListItem (patternparts, i)
          if (i > 0 and not piece = "") {
            currentpattern = currentpattern + "(?<"
          }
          currentpattern = currentpattern + piece
        }
      }
    }
  }
</function>

The Pixie, can you remove buy and purchase from the basic commands, so I can use and change them in my games again?
...worth a shot.


Just thought... with this, you could also do patterns like:

^ask( (?<object_npc>.+))? about ((?<object_topic>.+)|(?<text_topic>.+))$

I think I see a potential edge case here, but I can't work out any examples that would actually trigger it.

But basically, GetPopulateVariants in this case would return a two-membered list:

  1. stringdictionary
    • object_npc: "bob"
    • object_topic: "his garden"
  2. stringdictionary
    • object_npc: "bob"
    • text_topic: "his garden"

If "his garden" is an object, it's passed to the ask command as an object. If it can't be resolved, it gets passed as a text parameter, because the parser would take the first variant in which all the object* and exit* parameters can be resolved.
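
Here is a small Python sketch of that selection rule, under the assumption that variants arrive as a list of dictionaries like the one above. The names pick_variant and resolve are hypothetical: resolve stands in for whatever scope lookup the parser really uses, and returns None when a name can't be matched.

```python
def pick_variant(variants, resolve):
    """Return the first variant whose object*/exit* parameters all resolve."""
    for variant in variants:
        resolved = {}
        for key, value in variant.items():
            if key.startswith(("object", "exit")):
                target = resolve(value)
                if target is None:
                    break            # this variant fails, try the next one
                resolved[key] = target
            else:
                resolved[key] = value    # text parameters pass through as-is
        else:
            return resolved          # no break: everything resolved
    return None

variants = [
    {"object_npc": "bob", "object_topic": "his garden"},
    {"object_npc": "bob", "text_topic": "his garden"},
]

# Pretend scope in which only bob is a known object.
scope = {"bob": "npc_bob"}
print(pick_variant(variants, scope.get))
```

With only bob in scope the second variant wins, so "his garden" reaches the command as text. Adding "his garden" to the scope would make the first variant win instead.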

And now you've got me thinking about how you'd do the scope for an "ask about (object)" command. Presumably the scope for the second object should include all objects the player has seen. How would you track that?
Easy: have the backdrop scope script do game.seenobjects = ListCompact (ListCombine (game.seenobjects, items)).
Then you have an ask command that will easily work with objects with alternate aliases, and incomplete names, and will allow the player to ask about anything they've seen.

