I had the idea to use ^L, the Form Feed character, commonly used as a page separator, to add some structure to my source files. Dialog gives the “Ignoring control character” message, but it would be nice to treat this as whitespace instead. It could be as simple as an extra else-if branch in lexer_getc that returns 0x20 (space) when it sees 12 (0x0C, form feed).
There’s an open feature request to treat U+00A0 (non-breaking space) as whitespace as well; it shouldn’t be hard to add U+000C at the same time. Are there any other characters that people commonly use as whitespace?
I specifically don’t want to commit to handling every Unicode whitespace character, because I worry about edge cases, and Dialog currently doesn’t rely on any external libraries for Unicode support. But dealing with the more common ones seems reasonable enough.
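A minimal sketch of that "common ones only" approach, for illustration. The helper name fold_common_whitespace is hypothetical and this is not Dialog's actual lexer code; it also assumes the lexer already has a decoded code point in hand rather than raw UTF-8 bytes (U+00A0 is two bytes in UTF-8, so a byte-level check would not be enough for it):

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the Dialog sources: maps a small,
 * fixed set of code points to an ASCII space and passes everything else
 * through unchanged. Assumes cp is an already-decoded Unicode code point. */
static uint32_t fold_common_whitespace(uint32_t cp)
{
    switch (cp) {
    case 0x000C: /* form feed (^L) */
    case 0x00A0: /* no-break space */
        return 0x20;
    default:
        return cp;
    }
}
```

Keeping the list as an explicit switch means each additional character is a one-line, deliberate decision, which fits the goal of not committing to the full Unicode whitespace set.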
This is what I use in various projects:
#include <stdbool.h>

// `uchar_t` is the project's own type for a decoded Unicode code point.
bool char_is_space (uchar_t ch)
{
    // note that `NUL` is a control char, hence it is considered
    // whitespace here.
    if (ch == ' ') return true;
    if (ch == 0x7F) return true; // DEL
    if (ch == 0xA0) return true; // nbsp
    if (ch < 0x0020) return true; // C0 control chars
    if (0x0080 <= ch && ch <= 0x009F) return true; // C1 control chars
    if (0x2000 <= ch && ch <= 0x200B) return true; // en quad .. zero width space
    if (ch == 0x2028) return true; // line separator
    if (ch == 0x2029) return true; // paragraph separator
    if (ch == 0x205F) return true; // mathematical space
    if (ch == 0x3000) return true; // ideographic space
    return false;
}
Assuming that you are doing this because you are using Emacs, the page delimiter is configurable. You could just use the same convention that the standard library uses (a line of % characters spanning the full fill-column width) and then navigate/narrow based on that:
(add-hook 'dialog-mode-hook
          (lambda ()
            (setq-local page-delimiter
                        (concat "^" (make-string fill-column ?%)))))
(If you modify the value of fill-column using the same hook then you just need to make sure that setting the page-delimiter value runs after that.)