Rex help

Do any rex wizards want to give me the recipe for how to find the first alphanum/punct character in a string that is not enclosed in a tag of some kind? And correspondingly, the last character?
These latest attempts are still failing…

    lastWordEnd = static R'<AlphaNum|Punct>%s*((<langle>.+<rangle>)%s*)*$'
    firstWordStart = static R'%s*((<langle>.+<rangle>)%s*)*<AlphaNum|Punct>'

In this nonsense string

    ' <.p> <.roomname>abc def<.p>lm <.p> '

it is finding the 'l' to be the first printable letter, and the 'f' to be the last printable letter.

I might be able to tinker my way through this(?) but if anyone can see an obvious fix, that’d be much appreciated!


I’m by no means a wizard. A quick hack would be something like this:

    local trimmed = rexReplace(R'<langle><^rangle>*<rangle>', str, '');

The regex searches str for runs of characters starting with a left-angle and ending with a right-angle. Those runs are replaced with an empty string. The result is all the characters outside the tags.

It’s a coarse expression: anything between a pair of angle brackets gets removed, even if it isn’t a valid tag. Hopefully this gets you started in the right direction, though.

Edit: So, to answer the question, you would combine the above with this:

    local first = rexSearch(R'<alphanum|punct>', trimmed);
    local last = rexSearchLast(R'<alphanum|punct>', trimmed);
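
For anyone who wants to try the strip-then-search idea outside TADS, here is a rough Python sketch of the same two steps. It uses Python's `re` module rather than the TADS regex functions, and `[^\s<>]` (any character that is neither whitespace nor an angle bracket) is only an assumed stand-in for `<alphanum|punct>`, so treat it as an illustration rather than a drop-in equivalent.

```python
import re

s = ' <.p> <.roomname>abc def<.p>lm <.p> '

# Step 1: delete every run that starts with '<' and ends at the next '>'.
trimmed = re.sub(r'<[^>]*>', '', s)

# Step 2: find the first and last "printable" characters in what is left.
# [^\s<>] is an assumed stand-in for TADS's <alphanum|punct>.
hits = list(re.finditer(r'[^\s<>]', trimmed))
first, last = hits[0], hits[-1]

print(repr(trimmed))                 # '  abc deflm  '
print(first.start(), first.group())  # 2 a
print(last.start(), last.group())    # 10 m
```

Note that the positions are offsets into the trimmed copy, not into the original string, so this only helps when the tags do not need to stay in place.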

Thanks Jim! My application actually needs to leave the tags and instead find index numbers, but the thing I was missing was to exclude the right angle bracket within my tag expression. When I added ‘[^>]’ it fixed it, so you had the right idea!

The final expressions would be

    lastWordEnd = static R'<AlphaNum|Punct>%s*((<langle>[^>]+<rangle>)%s*)*$'
    firstWordStart = static R'%s*((<langle>[^>]+<rangle>)%s*)*<AlphaNum|Punct>'
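
The reason the earlier versions failed is that `.+` is greedy: a single `<langle>.+<rangle>` can run from one '<' to a much later '>', swallowing the text in between. Below is a rough Python sketch of the difference. It uses Python's `re` syntax rather than TADS's, the `find_first` and `find_last` helpers are names made up for the illustration, and `[^\s<>]` is only an assumed stand-in for `<AlphaNum|Punct>`.

```python
import re

s = ' <.p> <.roomname>abc def<.p>lm <.p> '

CH = r'([^\s<>])'  # assumed stand-in for <AlphaNum|Punct>

def find_first(tag):
    # Skip leading whitespace and tags, then capture the first real character.
    m = re.match(r'\s*(?:' + tag + r'\s*)*' + CH, s)
    return m.start(1), m.group(1)

def find_last(tag):
    # Capture a character followed only by whitespace and tags up to the end.
    m = re.search(CH + r'\s*(?:' + tag + r'\s*)*$', s)
    return m.start(1), m.group(1)

greedy = r'<.+>'     # '.' can match '>', so one "tag" may span several
fixed = r'<[^>]+>'   # stops at the first '>'

print(find_first(greedy), find_last(greedy))  # (28, 'l') (23, 'f')
print(find_first(fixed), find_last(fixed))    # (17, 'a') (29, 'm')
```

The greedy version reproduces the 'l' and 'f' results from the original question, and the negated class gives 'a' and 'm'. Python offsets are 0-based, whereas TADS string indexes start at 1.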