Help with string manipulation

Help!

So far, I have this:

  <function name="GetOutputList" parameters="s" type="stringlist">
    a = NewStringList()
    i = 72
    list add (a, Left(s,i))
    s = Mid(s, i + 1)
    while (not s = "") {
      list add (a, Left(s,i))
      s = Mid(s, i + 1)
    }
    return (a)
  </function>

I pasted a bit of Lorem Ipsum into a room description to test it with this:

s = lorem.description
list = GetOutputList(s)
i = 0
foreach (line, list) {
  msg (i + ": " + line)
  i = i + 1
}

This is the output:

0: Lorem ipsum dolor sit amet, consectetur adipiscing elit. In non interdum
1: felis. Vestibulum rhoncus vel felis a maximus. Quisque fringilla semper
2: lacus, at fermentum velit tempor quis. In nec ipsum a sapien mattis bla
3: ndit. Maecenas id neque lacus. In magna nibh, blandit a luctus nec, auct
4: or at quam. Aenean volutpat sapien risus, at tincidunt massa bibendum ac
5: . Ut tristique ex a nibh hendrerit, in posuere turpis auctor. Quisque ur
6: na risus, fermentum vitae gravida non, venenatis a nisl. Nulla commodo d
7: olor at urna tristique, at gravida nulla imperdiet.

Nam dolor n
8: isl, rutrum eu mattis at, gravida ut libero. Nulla facilisi. Vestibulum
9: tempus auctor odio, vitae consectetur justo condimentum in. Pellentesque
10: aliquam augue duis.


JS Version

(I can return data via ASLEvent, if easier to do this in JS.)

function getOutputArray(s) {
	var a = [];
	var i = 72;
	a.push(s.substring(0, i))
	while( (s = s.substring(i, s.length)) != "" ){
		a.push(s.substring(0, i))
	}
	return a;
}
let s = `Lorem ipsum dolor sit amet, consectetur adipiscing elit. In non interdum felis. Vestibulum rhoncus vel felis a maximus. Quisque fringilla semper lacus, at fermentum velit tempor quis. In nec ipsum a sapien mattis blandit. Maecenas id neque lacus. In magna nibh, blandit a luctus nec, auctor at quam. Aenean volutpat sapien risus, at tincidunt massa bibendum ac. Ut tristique ex a nibh hendrerit, in posuere turpis auctor. Quisque urna risus, fermentum vitae gravida non, venenatis a nisl. Nulla commodo dolor at urna tristique, at gravida nulla imperdiet.

Nam dolor nisl, rutrum eu mattis at, gravida ut libero. Nulla facilisi. Vestibulum tempus auctor odio, vitae consectetur justo condimentum in. Pellentesque aliquam augue duis. `


That gives me this array:

  • Lorem ipsum dolor sit amet, consectetur adipiscing elit. In non interdum
  • felis. Vestibulum rhoncus vel felis a maximus. Quisque fringilla semper
  • lacus, at fermentum velit tempor quis. In nec ipsum a sapien mattis bla
  • ndit. Maecenas id neque lacus. In magna nibh, blandit a luctus nec, auct
  • or at quam. Aenean volutpat sapien risus, at tincidunt massa bibendum ac
  • . Ut tristique ex a nibh hendrerit, in posuere turpis auctor. Quisque ur
  • na risus, fermentum vitae gravida non, venenatis a nisl. Nulla commodo d
  • olor at urna tristique, at gravida nulla imperdiet. Nam dolor nisl, rut
  • rum eu mattis at, gravida ut libero. Nulla facilisi. Vestibulum tempus a
  • uctor odio, vitae consectetur justo condimentum in. Pellentesque aliquam
  • augue duis.

I want this to sort of behave like word-wrap, though.

How can I split a string at the 72nd character or at the space (or line break) before the 72nd character, without splitting in the middle of a word?


One way:

<function name="GetOutputList" parameters="s" type="stringlist">
  pattern = "^((?<line>.{1,72})((\\s+|(?<=-))(?<remainder>.+))?|(?<line>.{72})(?<remainder>.*))$"
  output = NewStringList()
  if (s = "") {
    return (output)
  }
  while (true) {
    parts = Populate (pattern, s, "wordwrap")
    list add (output, DictionaryItem (parts, "line"))
    if (DictionaryContains (parts, "remainder")) {
      s = DictionaryItem (parts, "remainder")
    }
    else {
      return (output)
    }
  }
</function>

Note that this uses the pattern (\s+|(?<=-)), which matches one or more whitespace characters, or a zero-length string immediately after a hyphen. This means it can break a line in the middle of a hyphenated word if necessary. You could replace that with \s if you want to match a single whitespace character, or a literal space if you want it to be an actual ASCII space and not some other whitespace character.

The alternate pattern at the end grabs exactly 72 characters in case the first alternative doesn't match, which can only happen if the text starts with a single unbroken word of more than 72 characters.
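
Since a JS route was mentioned earlier in the thread, here is a minimal sketch of the same idea in JavaScript. The name wrapWithRegex and the width parameter are made up for illustration, and it assumes the input is a single paragraph with no line breaks in it (the dot does not match them).

```javascript
// Hypothetical helper: regex-based wrap for one paragraph (no line breaks).
function wrapWithRegex(s, width = 72) {
  // Up to `width` characters, followed by whitespace, a position just
  // after a hyphen, or the end of the string.
  const soft = new RegExp(`^(.{1,${width}})(?:\\s+|(?<=-)|$)`);
  const output = [];
  while (s !== "") {
    const m = soft.exec(s);
    if (m) {
      output.push(m[1]);
      // Drop the line plus whatever whitespace ended it.
      s = s.slice(m[0].length);
    } else {
      // No break point in range: fall back to a hard split.
      output.push(s.slice(0, width));
      s = s.slice(width);
    }
  }
  return output;
}
```

The hard split in the else branch plays the same role as the second alternative in the Quest pattern.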

Alternatively, if you don't want to wrestle with regexen, you could do a simpler but less flexible pure-Quest version:

<function name="GetOutputList" parameters="s" type="stringlist">
  line = ""
  output = NewStringList()
  foreach (word, Split (s, " ")) {
    while (LengthOf (word) > 71) {
      if (line = "") {
        list add (output, Left (word, 71) + "-")
        word = Mid (word, 72)
      }
      else {
        word = line + " " + word
        line = ""
      }
    }
    if (LengthOf (line) + LengthOf (word) > 72) {
      list add (output, line)
      line = word
    }
    else {
      line = line + " " + word
    }
  }
  if (not line = "") {
    listadd (output, line)
  }
  return (output)
</function>
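
For comparison, a rough JavaScript sketch of the same word-by-word approach. The name wrapWords and the width parameter are hypothetical. It differs from the Quest function above in two small ways: the separating space is only added between words (so lines don't start with a space), and that space is counted when checking the line length.

```javascript
// Hypothetical helper: greedy word wrap for one paragraph.
function wrapWords(s, width = 72) {
  const output = [];
  let line = "";
  // Split on runs of spaces so doubled spaces don't produce empty words.
  for (let word of s.split(/ +/)) {
    if (word === "") continue;
    // A word longer than the width is chopped into hyphenated pieces.
    while (word.length > width) {
      if (line !== "") {
        output.push(line);
        line = "";
      }
      output.push(word.slice(0, width - 1) + "-");
      word = word.slice(width - 1);
    }
    if (line === "") {
      line = word;
    } else if (line.length + 1 + word.length > width) {
      // The word (plus its separating space) won't fit: start a new line.
      output.push(line);
      line = word;
    } else {
      line += " " + word;
    }
  }
  if (line !== "") output.push(line);
  return output;
}
```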

The first code does this:

Error running script: Error evaluating expression 'Populate (pattern, s, "wordwrap")': String '' is not a match for Regex '^((?<line>.{1,72})((\s+|(?=<-))(?<remainder>.+))?|(?<line>.{72})(?<remainder>.*))$'
Error running script: Cannot foreach over '' as it is not a list

The second code seems to work quite well, after changing the one instance of listadd to list add. (NOTE: I do that all the time in my code, too!)


Hmm... If I wanted to put each line break in a list item by itself, I should probably split at those before anything else and figure out how to build a list with everything including the line breaks in order...
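
One possible way to do that split on the JS side, as a sketch only: the function name is hypothetical, and treating "<br/>" as a line break alongside real newline characters is an assumption about what the text contains.

```javascript
// Hypothetical helper: split text into chunks, keeping every line break
// as its own item, in order.
function splitKeepingBreaks(s) {
  // The capturing group makes split() keep each separator in the result.
  return s
    .split(/(\r\n|\r|\n|<br\s*\/?>)/)
    .filter(part => part !== "");
}
```

Each non-break item could then be wrapped on its own, with the break items passed through unchanged.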


Here's a modified version of the first one, which should (unless I'm missing a typo) also handle newlines.

I'm not sure where the error came from; is it required to call RegexMatch before Populate?

<function name="GetOutputList" parameters="s" type="stringlist">
  pattern = "^((?<line>[^\\n\\r]{1,72})(([\\r\\n]+|\\s+|(?<=-))(?<remainder>.+))?|(?<line>.{72})(?<remainder>.*))$"
  output = NewStringList()
  if (s = "") {
    return (output)
  }
  while (IsRegexMatch (pattern, s, "wordwrap")) {
    parts = Populate (pattern, s, "wordwrap")
    list add (output, DictionaryItem (parts, "line"))
    if (DictionaryContains (parts, "remainder")) {
      s = DictionaryItem (parts, "remainder")
    }
    else {
      return (output)
    }
  }
</function>

Second version with linebreaks:

<function name="GetOutputList" parameters="s" type="stringlist">
  output = NewStringList()
  foreach (paragraph, Split (s, Chr(13))) {
    line = ""
    foreach (word, Split (paragraph, " ")) {
      while (LengthOf (word) > 71) {
        if (line = "") {
          list add (output, Left (word, 71) + "-")
          word = Mid (word, 72)
        }
        else {
          word = line + " " + word
          line = ""
        }
      }
      if (LengthOf (line) + LengthOf (word) > 72) {
        list add (output, line)
        line = word
      }
      else {
        line = line + " " + word
      }
    }
    if (not line = "") {
      list add (output, line)
    }
  }
  return (output)
</function>

Hmm.

Error running script: Function did not return a value
Error running script: Cannot foreach over '' as it is not a list

I don't know. The second one works good, though.


What string are you feeding it to get that error? I assume the regex is failing to match, though I can't see how that can happen.
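
One possible cause, offered as a guess: in most regex engines, a dot does not match a newline unless a single-line/dotall option is set, so a "remainder" group written as .+ cannot run past a line break to reach the end of the string. Whether Quest sets such an option is an assumption here. The effect can be shown in JavaScript with a cut-down pattern (both patterns and the sample string are made up for the demonstration):

```javascript
// Cut-down stand-ins for the wordwrap pattern (width 10 instead of 72).
const dotPattern = /^(.{1,10})(\s+(.+))?$/;        // remainder uses .
const anyPattern = /^(.{1,10})(\s+([\s\S]+))?$/;   // remainder uses [\s\S]

// Three lines of text: the remainder after "hello" contains a newline.
const sample = "hello\nworld again\nmore";

const dotMatches = dotPattern.test(sample); // false: . stops at the 2nd newline
const anyMatches = anyPattern.test(sample); // true: [\s\S] matches newlines too
```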


It says I can't post it here. Hehehe.

https://gist.githubusercontent.com/KVonGit/07ac3eb8d6dc416598bee19fae70be2c/raw/a9dcae8f89e3918a6e76525f1569816fd801f64a/jack_long_text_sample.txt


I got the second example working like I want it. The Chr(13) wasn't pushing each line break to the array separately (I guess?). I changed it to "<br/>", and success!!!

<function name="GOL2_1" parameters="s" type="stringlist"><![CDATA[
    // by mrangel
    output = NewStringList()
    // foreach (paragraph, Split (s, Chr(13))) {
     foreach (paragraph, Split (s, "<br/>")) {
        line = ""
        foreach (word, Split (paragraph, " ")) {
          while (LengthOf (word) > 71) {
            if (line = "") {
              list add (output, Left (word, 71) + "-")
              word = Mid (word, 72)
            }
            else {
              word = line + " " + word
              line = ""
            }
          }
          if (LengthOf (line) + LengthOf (word) > 72) {
            list add (output, line)
            line = word
          }
          else {
            line = line + " " + word
          }
        }
        if (not line = "") {
          list add (output, line)
        }
      }
      return (output)
  ]]></function>

I added some extra line breaks, just to test it out:

image


Awesome!

Thanks!


I added a return after the while loop in the code with regexen, and it works now, but it doesn't push each line break to the list like the other code does now.

<function name="GOL3" parameters="s" type="stringlist"><![CDATA[
    // by mrangel
    pattern = "^((?<line>[^\\n\\r]{1,72})(([\\r\\n]+|\\s+|(?<=-))(?<remainder>.+))?|(?<line>.{72})(?<remainder>.*))$"
    output = NewStringList()
    if (s = "") {
      return (output)
    }
    while (IsRegexMatch (pattern, s, "wordwrap")) {
      parts = Populate (pattern, s, "wordwrap")
      list add (output, DictionaryItem (parts, "line"))
      if (DictionaryContains (parts, "remainder")) {
        s = DictionaryItem (parts, "remainder")
      }
      else {
        return (output)
      }
    }
    return (output)
  ]]></function>

38: dull boy. All work and no play makes Jack a dull boy. All work and no
39: play makes Jack a dull boy. All work and no play makes Jack a dull boy.
40: All work and no play makes Jack a dull boy. All work and no play makes
41: Jack a dull boy. All work and no play makes Jack a dull boy. All work
42: and no play makes Jack a dull boy.

All work and no play makes
43: Jack a dull boy.

All work and no play makes
44: Jack a dull boy.
All work and no play makes Jack a dull boy.
45:
All work and no play makes Jack a dull boy. All work and no play
46: makes Jack a dull boy. "All work and no play makes Jack a dull
47: boy."

All work and no play makes Jack a dull boy. All work and
48: no play makes Jack a dull boy. All work and no play makes Jack a dull
49: boy. All work and no play makes Jack a dull boy.


The code with no regex works just right.

Here's the full game's code:
https://gist.github.com/KVonGit/e5ecf3e457884b55e209ae05c7cdbfdf

I'm back on trying to make Quest use a MORE pager instead of automatically scrolling past text the player hasn't had a chance to read yet.

When play begins, I use JS and ASLEvents to get the actual line height and the window's height, then divide to figure how many lines of text will fit on the screen before scrolling past new text. (Haven't implemented anything past getting measurements just yet.)
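
A rough sketch of that arithmetic, plus one way the wrapped lines might later be grouped into screens. Nothing past the measurements is implemented in the game, so both function names are hypothetical, and the browser calls appear only in a comment because they are an assumption about how the measurements are taken.

```javascript
// Hypothetical helper: how many whole lines fit in the window.
// In a browser the inputs might come from something like
//   window.innerHeight
//   parseFloat(getComputedStyle(someElement).lineHeight)
// (assumption: the computed line height is a pixel value, not "normal").
function linesPerScreen(windowHeight, lineHeight) {
  if (!(lineHeight > 0)) return 0;
  return Math.floor(windowHeight / lineHeight);
}

// Hypothetical helper: group wrapped lines into pages of `perPage` lines,
// so a MORE prompt could be shown between pages.
function paginate(lines, perPage) {
  const size = Math.max(1, perPage);
  const pages = [];
  for (let i = 0; i < lines.length; i += size) {
    pages.push(lines.slice(i, i + size));
  }
  return pages;
}
```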

If I understand correctly, old terminals printed to the screen line by line like this, which must be how BASH knows when to pop up MORE once the screen fills with new text. (???)


That being said, I think "split a string at the 72nd character or at the space (or line break) before the 72nd character," wasn't actually that common.

In the days of fixed-width displays, you were more likely to find something like:

<function name="WordWrap" parameters="s" type="stringlist">
  output = NewStringList()
  while (not s="") {
    linebreak = Instr (s, "<br/>")
    if (linebreak > 0 and linebreak < 80) {
      list add (output, Left (s, linebreak - 1))
      s = Mid (s, linebreak + 5)
    }
    else {
      space = Instr (72, s, " ")
      if (space > 0 and space < 81) {
        list add (output, Left (s, space - 1))
        s = Mid (s, space + 1)
      }
      else {
        list add (output, Left (s, 78) + "-")
        s = Mid (s, 79)
      }
    }
  }
  return (output)
</function>

That is, copying over 72 characters blindly, and only looking for a space at the end of the line. This version looks for the first space after character 72, and if it doesn't find one soon enough it fills the line out to 79 characters, the last of which is an inserted hyphen. This method seems to have been pretty common in text-heavy games on some of the old CBM systems.
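
The same approach could be sketched in JavaScript roughly as follows. The function and parameter names are hypothetical. One thing here is not in the Quest version: a check that pushes the final short remainder as-is, so the last line doesn't pick up a hyphen.

```javascript
// Hypothetical helper: copy `softWidth` characters blindly, then break at
// the first space found before `hardWidth`; otherwise hyphenate.
function wordWrapLate(s, softWidth = 72, hardWidth = 79) {
  const output = [];
  while (s !== "") {
    const br = s.indexOf("<br/>");
    if (br !== -1 && br < hardWidth) {
      // An explicit line break inside this line: break there.
      output.push(s.slice(0, br));
      s = s.slice(br + 5); // skip the 5 characters of "<br/>"
      continue;
    }
    if (s.length <= hardWidth) {
      // Added check (not in the Quest version): the rest fits on one line.
      output.push(s);
      break;
    }
    // First space at or after the soft width (0-based index).
    const space = s.indexOf(" ", softWidth - 1);
    if (space !== -1 && space <= hardWidth) {
      output.push(s.slice(0, space));
      s = s.slice(space + 1);
    } else {
      // No space in range: hard split with a hyphen.
      output.push(s.slice(0, hardWidth - 1) + "-");
      s = s.slice(hardWidth - 1);
    }
  }
  return output;
}
```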


Ah. Cool.

You are a fountain of wisdom, good sir.

