Do any rex wizards want to give me the recipe for how to find the first alphanum/punct character in a string that is not enclosed in a tag of some kind? And correspondingly, the last character?
These latest attempts are still failing…
lastWordEnd = static R'<AlphaNum|Punct>%s*((<langle>.+<rangle>)%s*)*$'
firstWordStart = static R'%s*((<langle>.+<rangle>)%s*)*<AlphaNum|Punct>'
In this nonsense string
' <.p> <.roomname>abc def<.p>lm <.p> '
it is finding the 'l' to be the first printable letter,
and the 'f' to be the last printable letter.
I might be able to tinker my way through this(?) but if anyone can see an obvious fix, that’d be much appreciated!
I’m by no means a wizard. A quick hack would be something like this:
local trimmed = rexReplace(R'<langle><^rangle>*<rangle>', str, '');
The regex searches str for runs of characters that start with a left angle bracket and end at the nearest right angle bracket. Those runs are replaced with an empty string. The result is all the characters outside the tags.
It’s a coarse expression. Even invalid HTML will be trapped. Hopefully this gets you started in the right direction, though.
Edit: So, to answer the question, you would combine the above with this:
local first = rexSearch(R'<alphanum|punct>', trimmed);
local last = rexSearchLast(R'<alphanum|punct>', trimmed);
Thanks Jim! My application actually needs to leave the tags and instead find index numbers, but the thing I was missing was to exclude the right angle bracket within my tag expression. When I added ‘[^>]’ it fixed it, so you had the right idea!
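For anyone who finds this thread later, here is a rough sketch of what the fixed patterns presumably look like. It is a reconstruction from the description above, not the actual final code: the original patterns are kept as posted, with `.+` swapped for `<^rangle>+` so that one tag can no longer run on to a later `>` in the string. The `tagScan` object and the `firstIndex`/`lastIndex` methods are made-up names for illustration, and the sketch is untested. It relies on `rexSearch` returning nil on failure, or a list of [index, length, matched text] on success.

```
#include <tads.h>

/* Hypothetical helper object; all names here are illustrative only. */
tagScan: object
    /*
     *   Same patterns as in the question, except that the tag body is
     *   "one or more characters that are not a right angle bracket"
     *   instead of ".+".
     */
    firstWordStart = static R'%s*((<langle><^rangle>+<rangle>)%s*)*<AlphaNum|Punct>'
    lastWordEnd = static R'<AlphaNum|Punct>%s*((<langle><^rangle>+<rangle>)%s*)*$'

    /* Index of the first character outside the leading tags, or nil. */
    firstIndex(str)
    {
        local m = rexSearch(firstWordStart, str);

        /* the character we want is the last one in the match */
        return m == nil ? nil : m[1] + m[2] - 1;
    }

    /* Index of the last character outside the trailing tags, or nil. */
    lastIndex(str)
    {
        local m = rexSearch(lastWordEnd, str);

        /* the character we want is the first one in the match */
        return m == nil ? nil : m[1];
    }
;
```

With the sample string from the question, `tagScan.firstIndex(str)` should land on the 'a' and `tagScan.lastIndex(str)` on the 'm'. One caveat that has not been checked: if the punct class includes the angle brackets themselves, a string with no text outside its tags could match on a bracket.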