How do I make my own regex character class?

(I wasn’t sure quite what category to put this in. Technically it’s for a Dialog work but it’s the JavaScript I’m struggling with, and this forum is a lot kinder to amateurs than Stack Overflow.)

I’m trying to write a regex to recognize words. A word consists of any number of letters and any number of separators, as long as each separator appears between letters. For example, in “example.com.” the regex should match “example.com”, including the dot in the middle but not the one at the end.

This seems like it should be easy enough, right? Something like ([letter]+[sep])*[letter]+ should do it, I think.
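Here is a minimal sketch of that pattern, using plain ASCII `[a-z]` (with the `i` flag) as a stand-in for the real alphabet, which is an assumption just to check the shape of the regex. It uses a non-capturing group `(?:…)` instead of a capturing one, since only the overall match is needed:

```javascript
// Stand-in classes (assumption): ASCII letters, and "-" or "." as separators.
// (?:letters sep)* then letters: separators can only sit between letters.
const word = /(?:[a-z]+[-.])*[a-z]+/i;

const m = "example.com.".match(word);
console.log(m[0]); // "example.com" (the trailing dot is not included)
```

The trailing dot is excluded because the pattern must end with `[a-z]+`, so the regex engine backtracks out of the last `com.` repetition.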

Except these words aren’t in English, so some of the “letters” aren’t ASCII, which means I can’t use built-in classes like \w to recognize them: in a JavaScript regex, \w only matches [A-Za-z0-9_], so á doesn’t count.
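A quick way to confirm this in Node or a browser console (`isWordChars` is just a hypothetical helper name for the demo):

```javascript
// \w is ASCII-only: it is shorthand for [A-Za-z0-9_].
const isWordChars = (s) => /^\w+$/.test(s);

console.log(isWordChars("cafe")); // true
console.log(isWordChars("café")); // false: "é" is not in [A-Za-z0-9_]
```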

Is there some easy way to define the “letter” class only once, so that I don’t need two copies of the whole alphabet in the regex? Duplicating the whole thing seems like a recipe for disaster if I have to edit them both later. In Python I would define a string containing that class, and then interpolate it into the regex, but in JavaScript regexes aren’t strings so I don’t know how to interpolate into them.

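```javascript
// Build the pattern as a string, then compile it with the RegExp constructor.
const cc = "[a-z]";  // the "letter" class, defined only once
const sep = "[-.]";  // separators allowed between letters
const r = new RegExp(`(${cc}+${sep})*${cc}+`);
```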
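If you would rather keep each piece as a regex literal, a RegExp object’s `.source` property gives back its pattern text (without the slashes or flags), so it can be interpolated the same way. A sketch, where the `letter` alphabet is a hypothetical placeholder:

```javascript
// Keep each piece as a RegExp literal, then stitch them together
// through their .source strings (pattern text without slashes or flags).
const letter = /[a-zá]/; // hypothetical alphabet; extend as needed
const sep = /[-.]/;

const word = new RegExp(
  `(?:${letter.source}+${sep.source})*${letter.source}+`,
  "g" // find every word in a longer string
);

console.log("example.com. and más.allá.".match(word));
// [ 'example.com', 'and', 'más.allá' ]
```

One advantage over plain strings is that backslashes don’t need doubling: `/\./.source` is the two-character string `\.`, whereas a string literal would have to be written `"\\."`. Flags are not carried over by `.source`, so any flags the pieces need (such as `u`) must be passed to the combined `RegExp` again.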

Might /\p{Alpha}/u, which matches every character with the Unicode Alphabetic property, be useful for your purposes?
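Combining that with the string-interpolation approach gives something like the sketch below. It assumes the separator set from the question (`-` and `.`); note that `\p{…}` escapes only work when the `u` flag is set.

```javascript
// String.raw keeps the backslash, so this is the text \p{Alpha}.
const letter = String.raw`\p{Alpha}`;
const sep = "[-.]"; // separator set assumed from the question
// "u" enables Unicode property escapes; "g" finds every word.
const word = new RegExp(`(?:${letter}+${sep})*${letter}+`, "gu");

console.log("Voilà: café.com. niño-a.".match(word));
// [ 'Voilà', 'café.com', 'niño-a' ]
```

Be aware that Alphabetic covers every script, not just Latin, and it excludes digits, so something like “example2.com” would be split. If the input might contain decomposed accents (a base letter followed by a combining mark), calling `.normalize("NFC")` on it first is a cheap safeguard.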


Aha! That is, I think, exactly what I need! Thanks!
