If I only had a brain. . . (I need some Regex help)

K.V.

15 May 2021 22:13 (edited)

Hello.

First, I know I've been all over the place, asking about all sorts of stuff, and I haven't responded to some of those threads in a day or two, but I will. I'm working on a big port, and all those things must be included. So, I'm not just all over the place for no reason. I have good reason to be all over the place. :o)

Okay. . .

I know there's an easy way to do this with RegEx, but I'm not good at that.

Let's say I have this "formula":

  <PLTABLE
  "THE GAME"
  <LTABLE 3
  "How do I play this game?"
  "Try typing things on your keyboard."
  "If that doesn't work, go ask your mom.">
  <LTABLE 3
  "How do I stop the bunny people from stealing my carrots?"
  "You can't.">>
  <PLTABLE
  "THE AUTHOR"
  <LTABLE 3
  "Why does K.V. suck at regular expression stuff?"
  "Try asking, \"K.V., why do you suck?\""
  "No one knows."
  "Maybe he was born that way.">>

There's no telling how many topics there will be, no telling how many answers there will be, and no telling how many questions there will be, but there will be at least one of each.

There's also no telling how much whitespace there will be outside of the "". There's always only one line break, though.

What's the best way to turn this into a dictionary of lists or something?

K.V.

16 May 2021 04:18

I'm getting close to something, but the code isn't worth posting.

Can we use a global flag with regex in Quest?

Like, if my regex pattern was /red/g and the string was:

I like red. My favorite color is red. Do you like red?

If it was JS, it would match all three instances of "red".

...but in Quest, I can only do

regex = "red"
s = "I like red. My favorite color is red. Do you like red?"
params = Populate (regex, s)

...and that would only capture the first instance of red.

This one time, in band camp, there was a thing, it was called Javascript, and all the people used it, and one of them showed me how to do this:

let regex = new RegExp('red', 'g');

Can we do something similar in Quest?

mrangel

16 May 2021 08:37

Unfortunately not. To match multiple parts you'd probably need to remove each one as you find it.
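
For anyone following along, the "remove each one as you find it" idea can be sketched roughly like this. It is written in JavaScript, deliberately without the g flag, so it can be run outside Quest; findAll is a made-up helper name, not a Quest or JavaScript built-in, and a Quest version would have to do the equivalent trimming with its own string functions.

```javascript
// Hypothetical helper: collect every match without a global flag by
// cutting the matched text (and everything before it) off the front
// of the string, then searching what is left.
function findAll(pattern, text) {
  const regex = new RegExp(pattern); // note: no "g" flag
  const found = [];
  let rest = text;
  let match = regex.exec(rest);
  // The length check stops an empty match from looping forever.
  while (match !== null && match[0].length > 0) {
    found.push(match[0]);
    rest = rest.slice(match.index + match[0].length);
    match = regex.exec(rest);
  }
  return found;
}

const s = "I like red. My favorite color is red. Do you like red?";
console.log(findAll("red", s).length); // 3
```

Each pass only ever sees the first remaining match, which is the same limitation described above; the loop is what turns that into "all matches".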

Is this the text storage in ZIL format, and you're storing it in a string so Quest can parse it? Might be easier to build a simple script that will transform it into Quest-readable XML dictionaries.

mrangel

16 May 2021 09:10

Or split the text on newlines, and loop over them handling them in a way determined by the start of the line.

Off the top of my head, typing on my phone, and not 100% sure if this is your intended output format… but this would be my first guess:

parents = NewList()
current = NewDictionary()
pltable = false
foreach (line, Split (source, chr(13))) {
  if (IsRegexMatch ("^\\s*\"(.+)\"\\s*(>*)\\s*$", line)) {
    parts = Populate  ("^\\s*\"(.+)\"\\s*(>*)\\s*$", line)
    if (TypeOf (current) = "stringlist") {
      list add (current, ListItem (parts, 1))
    }
    else {
      key = ListItem (parts, 1)
      if (pltable) {
        value = NewDictionary()
      }
      else {
        value = NewStringList()
      }
      dictionary add (current, key, value)
      list add (parents, current)
      current = value
    }
    for (i, 1, LengthOf(ListItem (parts, 2))) {
      if (ListCount (parents) = 0) {
        return (current)
      }
      else {
        current = ListItem (parents, ListCount (parents) - 1)
        list remove (parents, current)
      }
    }
  }
  else if (IsRegexMatch ("^\\s*<PLTABLE", line)) {
    pltable = true
  }
}

K.V.

16 May 2021 23:55

Yep. It's ZIL.

In ZIL, a table is a list.

I have already converted one section to Quest in a few ways. I use a dictionary. Dictionaries, actually.

In this file, I left out the main table. It actually has:

<PLTABLE
; The hints object
  <PLTABLE
  "THIS IS A HINTS SECTION"
    <LTABLE 3
    "QUESTION"
    "HINT 1"
    "HINT 2">
    <LTABLE 3
    "QUESTION NUMBER TWO"
    "HINT 1">>
  <PLTABLE
  "THIS IS A SECOND HINTS SECTION."
    <LTABLE 3
    "QUESTION"
    "HINT ONE">
    <LTABLE 3
    "QUESTION TWO"
    "HINT ONE"
    "HINT TWO">>>

Yeah. They use a list of lists of lists.

So, I have a dictionary named "clues". Its keys are the text from the sections. The section key "unlocks" the string list comprised of the question and the hints.

I used that to make rooms, and... I'll just post some code in a little while.

Anyway, I'm going to test your code. Right now, I'm just making section rooms inside of a room called "clues". Each section room's description is just a copy and paste of the LTABLE 3 text. Then, I have a script that converts that text to a string list, and so forth.

I was just thinking while coding the bit to replace \" with @@@ then remove " then replace @@@ with \", that it would be cool if I could just parse the damned ZIL into dictionaries without having to muck about with the section bits.

Anyway, I've gone completely loopy whilst coding this for . . . however many days it's been, and I must (try to) take a break for the evening.

Tomorrow, after I get my first dose of the vaccine, I shall return and see what all we've got going on here.

Thanks, mrangel!

mrangel

17 May 2021 00:26

the bit to replace \" with @@@ then remove " then replace @@@ with \", that

The usual way to do that would be to use a regexp like "(?<somename>(?:\\.|[^\\"]+)*)" which captures the contents of a quoted string, including any number of quotes escaped by an odd number of backslashes but not a non-escaped quote. Replacing shouldn't be necessary.

In your example I thought that would be more than you need, as you never have more than one string on a line so you can just use a single regex to extract the part between the first and last " on the line (assuming only whitespace before it, and whitespace or > after)

I didn't actually think about unescaping the quotes… presumably you'd do something like Replace (Replace (ListItem (parts, 1), "\\\\", "\\"), "\\\"", "\"")
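
A rough JavaScript sketch of that approach, for illustration only. The pattern is the same idea as the one quoted above (a backslash always consumes the next character, so an escaped quote never ends the string). The unescapeZil helper is a hypothetical name, and it swaps the nested Replace calls for a single-pass replace, which is an assumption about what is wanted rather than anything Quest provides.

```javascript
// Matches one double-quoted string; inside it, a backslash always
// consumes the character that follows, so \" never ends the string.
const quoted = /"((?:\\.|[^\\"])*)"/;

// Hypothetical helper: single-pass unescape. Each backslash is dropped
// and the character after it is kept, so \" becomes " and \\ becomes \.
function unescapeZil(raw) {
  return raw.replace(/\\(.)/g, "$1");
}

const line = String.raw`  "Try asking, \"K.V., why do you suck?\""  `;
const m = quoted.exec(line);
const text = m === null ? null : unescapeZil(m[1]);
console.log(text); // Try asking, "K.V., why do you suck?"
```

Doing the unescape in one pass avoids having to think about which of the two replacements should run first.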

mrangel

17 May 2021 00:29

Ah, my bad. I forgot that Quest's Populate is weird. You'd need to name the groups, I think. I'm used to languages where you can pretend the collection of captured groups is a numbered array.

K.V.

17 May 2021 01:58 (edited)

You'd need to name the groups, I think.

The documentation page concerning Populate confuses me.

Use a cache ID for improved performance if you repeatedly test strings against the same regular expression. The compiled regular expression will be cached and used again for subsequent calls to Populate (or GetMatchStrength or IsRegexMatch ) using the same cache ID.

Okay. . .

Also, this bit:

The “cache ID” parameter

All the above functions take an optional third parameter, the “cache ID”. If you supply a cache ID, the regex will be saved under that name. The next time you use that cache ID for any of the above functions, Quest will ignore the regex you supply, and use the one it created earlier instead.

Continuing with the example before:
IsRegexMatch(regex, s1, "my regex")
=> true
IsRegexMatch("nonsense", s1, "my regex")
=> true
The original regex is given a cache ID here (the string “my regex”). When IsRegexMatch is called a second time, Quest ignores the nonsense regex, because it already has a regex with that cache ID.

Every time the player types some input, Quest has to compare that against the regex for every command, and using cache IDs makes that process considerably faster (and it does that for any custom command you add yourself). It is doubtful if cache IDs are of significant use outside of that, and are more likely to be a source of obscure bugs, so my advice is to not use them.

If I am grasping this, this just means that we can use #object#, #text#, etc. (or their (?<object>.*) counterparts) because the regex Cache IDs are set up?

Like, if I did:

regex = "(heck|darn|shoot|dang)"
s = "get the darn lamp"
bsbool = IsRegexMatch(regex, s, "profanity")

After doing that in a game, does that mean I could do this at any point afterwards in a command pattern:

examine #profanity# #object# or ^(get|take|grab) (?<profanity>.*) (?<object>.*)$

???

The Pixie

17 May 2021 06:48

Cache IDs are just for performance. Rather than convert the string to a regex every time, Quest does it once, and can then remember it using its cache ID. It will not help here, I am afraid.
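
A toy model of the behaviour the documentation describes may make this clearer. It is JavaScript, not Quest's actual implementation; the isRegexMatch function and regexCache map below are stand-ins invented for the illustration.

```javascript
// Toy model of the documented behaviour, not Quest's actual code:
// a compiled regex is stored under its cache ID, and later calls
// with the same ID reuse it and ignore the pattern they were given.
const regexCache = new Map();

function isRegexMatch(pattern, text, cacheId) {
  if (cacheId === undefined) {
    return new RegExp(pattern).test(text); // no caching requested
  }
  if (!regexCache.has(cacheId)) {
    regexCache.set(cacheId, new RegExp(pattern)); // compile once
  }
  return regexCache.get(cacheId).test(text);
}

const first = isRegexMatch("(heck|darn|shoot|dang)", "get the darn lamp", "profanity");
const second = isRegexMatch("nonsense", "oh shoot", "profanity"); // pattern ignored
const third = isRegexMatch("nonsense", "oh shoot"); // no cache ID, pattern used
console.log(first, second, third); // true true false
```

In this model the cache ID names a compiled pattern and nothing else, so it creates no new #profanity# placeholder for command patterns.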

mrangel

17 May 2021 08:20

After doing that in a game, does that mean I could do this at any point afterwards in a command pattern:

No; it just means that later after using that test, another piece of code could do:

if (IsRegexMatch ("", s2, "profanity")) {

to test using the same regex, or do:

if (IsRegexMatch (some_regex, some_string, "name of regex")) {
  parts = Populate ("", some_string, "name of regex")

It's also why you can't easily use script to change a command's pattern (you'd have to clone the command in order to change its name as well).

What I meant for the code I posted above was change:

  if (IsRegexMatch ("^\\s*\"(.+)\"\\s*(>*)\\s*$", line)) {
    parts = Populate  ("^\\s*\"(.+)\"\\s*(>*)\\s*$", line)

to:

  if (IsRegexMatch ("^\\s*\"(?<contents>.+)\"\\s*(?<closing>>*)\\s*$", line)) {
    parts = Populate  ("^\\s*\"(?<contents>.+)\"\\s*(?<closing>>*)\\s*$", line)

and then

change ListItem (parts, 1) to DictionaryItem (parts, "contents") to get the contents of the quotes
change ListItem (parts, 2) to DictionaryItem (parts, "closing") to get the closing > at the end of the line

I'm used to languages where the regex functions deal with numbered groups; but in Quest I can't treat the dictionary parts as a list as far as I'm aware.
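
For comparison, here is how the same pattern behaves in JavaScript, where a match exposes both numbered and named groups. This is only an illustration of the difference, not Quest code, and it assumes the closing group is meant to match the > characters at the end of the line.

```javascript
// Same shape as the pattern above: quoted contents, then any closing ">".
const pattern = /^\s*"(?<contents>.+)"\s*(?<closing>>*)\s*$/;
const match = pattern.exec("  \"You can't.\">>");

// Named access, the rough equivalent of DictionaryItem (parts, "contents").
const contents = match.groups.contents;
const closing = match.groups.closing;

// Numbered access, which is what the first version of the code assumed.
const sameContents = match[1] === contents;

console.log(contents);       // You can't.
console.log(closing.length); // 2
console.log(sameContents);   // true
```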

mrangel

17 May 2021 16:39 (edited)

So, I have a dictionary named "clues". Its keys are the text from the sections. The section key "unlocks" the string list comprised of the question and the hints.

I wasn't so sure about that to start with. It seems a little odd for the first element of a list to be a question, and the rest answers. So I had it making the question a dictionary key pointing to a stringlist of answers.

If that's the desired format, it makes the code a little more elegant (and more robust).

<function name="ParseDataDictionary" parameters="input" type="dictionary">
  parents = NewList()
  current = null
  key = null
  open = false
  pattern = "^\\s*(?:<(?<list>LTABLE\\s*\\d*)|<(?<dictionary>PLTABLE)|\"(?<string>([^\\\\\"]+|\\\\.)*|(?<unknown>.+))\")(?<space>[\\s>]*)?(?<remainder>.*)$"
  while (not input = "") {
    if (not IsRegexMatch (pattern, input)) {
      error ("Can't happen: "+input)
    }
    parse = Populate (pattern, input)
    input = StringDictionaryItem (parse, "remainder")
    if (DictionaryContains (parse, "list")) {
      value = NewStringList()
      open = true
    }
    else if (DictionaryContains (parse, "dictionary")) {
      value = NewDictionary()
      open = true
    }
    else if (DictionaryContains (parse, "string")) {
      value = Replace (Replace (StringDictionaryItem (parse, "string"), "\\\"", "\""), "\\\\", "\\")
      open = false
    }
    else if (DictionaryContains (parse, "unknown")) {
      error ("Unexpected data: \"" + DictionaryItem (parse, "unknown") + "\"")
    }
    if (TypeOf (current) = "dictionary") {
      if (not key = null) {
        dictionary add (current, key, value)
        key = null
      }
      else if (open) {
        error ("Can't use a " + TypeOf (value) + " as a dictionary key.")
      }
      else {
        key = value
      }
    }
    else if (not current = null) {
      list add (current, value)
    }
    if (open) {
      if (not current = null) {
        list add (parents, current)
      }
      current = value
    }
    if (DictionaryContains (parse, "space")) {
      padding = StringDictionaryItem (parse, "space")
      for (i, 1, LengthOf (padding)) {
        if (Mid (padding, i, 1) = ">") {
          if (key = null) {
            if (ListCount (parents) = 0) {
              if (input = "") {
                if (TypeOf (current) = "dictionary") {
                  return (current)
                }
                else {
                  wrapper = NewDictionary()
                  dictionary add (wrapper, "result", current)
                  return (wrapper)
                }
              }
              else {
                error ("Unexpected data at end of string: "+input)
              }
              }
            }
            else {
              current = ListItem (parents, ListCount (parents) - 1)
              list remove (parents, current)
            }
          }
          else {
            error ("Odd number of elements in dictionary; spare key \"" + key + "\" has no value.")
          }
        }
      }
    }
  }
  error ("Missing `>`")
</function>

This version assumes that a dictionary starts with <PLTABLE, a list starts with <LTABLE, and allows them to be nested inside each other in any configuration.

The pair of Replace()s used to turn \" into " and \\ into \ is not robust, but it doesn't need to be, because the big regexp at the top will always catch any unescaped quotes; therefore it is safe to assume that a quote remaining in the string is escaped.

I've called it ParseDataDictionary because it assumes the top-level item is a dictionary. Quest doesn't allow functions to return "list or dictionary", so if your data structure starts with <LTABLE it will get wrapped inside a dictionary with a single element, key "result".

This code is a bit bigger than the last one; most of that is because of else cases or extra checks to ensure it gives a sensible error message if there's a character missing from your data or something. It's simpler really, it just handles errors instead of ignoring them and returning garbage.

This version also completely ignores whitespace outside of strings. As you're using a LISP-inspired language, this seems entirely logical. (Your sample files have one string per line, but I think that's probably a stylistic choice to make the code more human readable.) I think my function would be equally happy to parse your code if you had entered it as:

<PLTABLE"THE GAME"<LTABLE
3"How do I play this game?"
  "Try typing things on your keyboard.""If that doesn't work, go ask your mom."><LTABLE3"How do I stop the bunny people from stealing my carrots?"
  "You can't.">
><PLTABLE
  "THE AUTHOR"<LTABLE3"Why does K.V. suck at regular expression stuff?"
  "Try asking, \"K.V., why do you suck?\"""No one knows.""Maybe he was born that way.">>

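To illustrate the whitespace-insensitive idea outside Quest, here is a rough JavaScript sketch. It is not a port of ParseDataDictionary: parseTables and toClues are hypothetical names, both table types are read as plain nested arrays, and the final shape (section title mapped to a list of question-plus-hints lists) is an assumption based on the "clues" dictionary described earlier in the thread. It also assumes the input has no outer wrapping PLTABLE and no ZIL comments.

```javascript
// One token: "<PLTABLE" or "<LTABLE" (with an optional count),
// a quoted string, or a closing ">". Leading whitespace is skipped.
const TOKEN = /\s*(?:<(PLTABLE|LTABLE)\s*\d*|"((?:\\.|[^\\"])*)"|(>))/y;

function parseTables(source) {
  const top = [];      // values found outside any table
  const stack = [top]; // the innermost open table is last
  TOKEN.lastIndex = 0;
  while (source.slice(TOKEN.lastIndex).trim() !== "") {
    const at = TOKEN.lastIndex;
    const m = TOKEN.exec(source);
    if (m === null) throw new Error("Unexpected data at offset " + at);
    const current = stack[stack.length - 1];
    if (m[1] !== undefined) {        // table opener: start a nested array
      const table = [];
      current.push(table);
      stack.push(table);
    } else if (m[2] !== undefined) { // quoted string: unescape \" and \\
      current.push(m[2].replace(/\\(.)/g, "$1"));
    } else {                         // ">": close the innermost table
      if (stack.length === 1) throw new Error("Unmatched >");
      stack.pop();
    }
  }
  if (stack.length > 1) throw new Error("Missing >");
  return top;
}

// Assumed output shape: { sectionTitle: [[question, hint, ...], ...] }.
function toClues(tables) {
  const clues = {};
  for (const [title, ...questions] of tables) {
    clues[title] = questions;
  }
  return clues;
}

const sample = String.raw`<PLTABLE"THE GAME"<LTABLE
3"How do I play this game?"
  "Try typing things on your keyboard.""If that doesn't work, go ask your mom."><LTABLE3"How do I stop the bunny people from stealing my carrots?"
  "You can't.">
><PLTABLE
  "THE AUTHOR"<LTABLE3"Why does K.V. suck at regular expression stuff?"
  "Try asking, \"K.V., why do you suck?\"""No one knows.""Maybe he was born that way.">>`;

const clues = toClues(parseTables(sample));
console.log(Object.keys(clues));         // [ 'THE GAME', 'THE AUTHOR' ]
console.log(clues["THE AUTHOR"][0][1]);  // Try asking, "K.V., why do you suck?"
```

Keeping the question as the first item of each list matches the original ZIL layout; moving it out into a dictionary key would be a small change to toClues.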
K.V.

17 May 2021 17:41

It seems a little odd for the first element of a list to be a question, and the rest answers. So I had it making the question a dictionary key pointing to a stringlist of answers.

That's the same thing I did. After I got it working, I did see how it makes sense to have the question as the first in the string list, though. The way things are handled in the screen which actually lists the hints shows only the question first. Each time you press enter, another hint is appended. (I still just like the question as a key, though. At least I'm not alone.)

Also: Ooh! Ooh! I'm gonna try that code!

Thanks!!!

This topic is now closed. Topics are closed after 60 days of inactivity.
