(I wasn’t sure quite what category to put this in. Technically it’s for a Dialog work but it’s the JavaScript I’m struggling with, and this forum is a lot kinder to amateurs than Stack Overflow.)
I’m trying to write a regex to recognize words. A word consists of one or more letters, plus any number of separators, as long as each separator appears between letters. For example, example.com. should match example.com, including the dot in the middle, but not the dot at the end.
This seems like it should be easy enough, right? Something like ([letter]+[sep])*[letter]+ should do it, I think.
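For plain ASCII letters, that pattern could be sketched roughly like this. The separator set (dot and hyphen) and the name asciiWord are assumptions for illustration only; the real separators may differ.

```javascript
// ASCII-only sketch of ([letter]+[sep])*[letter]+
// Assumption: "." and "-" are the separators; adjust to the real set.
const asciiWord = /(?:[a-z]+[.-])*[a-z]+/i;

// The trailing dot is left out because no letter follows it.
const asciiMatch = "example.com.".match(asciiWord);
console.log(asciiMatch[0]); // example.com
```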
Except, these words aren’t in English, which means some of the “letters” aren’t ASCII. Which means I can’t use built-in classes like \w to recognize them; JavaScript doesn’t consider á to be a letter.
Is there some easy way to define the “letter” class only once, so that I don’t need two copies of the whole alphabet in the regex? Duplicating the whole thing seems like a recipe for disaster if I have to edit them both later. In Python I would define a string containing that class, and then interpolate it into the regex, but in JavaScript regexes aren’t strings so I don’t know how to interpolate into them.
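One possible approach, sketched under assumptions: the RegExp constructor accepts a pattern as a string, so the letter class can be kept in one string and interpolated with a template literal. The names LETTER, SEP, and wordRe, the short letter list, and the separator set are all placeholders, not the real alphabet.

```javascript
// Define the letter class once, as a plain string.
// Assumption: this short list stands in for the full alphabet.
const LETTER = "[a-záéíóúñ]";
// Assumption: hypothetical separator set (dot, apostrophe, hyphen).
const SEP = "[.'-]";

// Build the regex from the pieces; "g" finds every word, "i" ignores case.
// Note: any backslash inside these strings would need to be doubled.
const wordRe = new RegExp(`(?:${LETTER}+${SEP})*${LETTER}+`, "gi");

const words = "está.bien.".match(wordRe);
console.log(words); // [ 'está.bien' ]
```

With this layout, editing LETTER changes both occurrences at once. If the target environment supports Unicode property escapes, \p{L} together with the u flag might replace the hand-written list, though whether it matches exactly the letters needed here is something to check.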