Dialog syntax highlighting

Draconis · February 4, 2025, 2:35am

I threw together a really quick syntax highlighter in Pygments.

Sadly, I have no way to incorporate this into any of the editors I use. But it should make it easier to release Dialog code in an easy-to-read way.

Draconis · February 4, 2025, 2:38am

Current version:

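from pygments.lexer import RegexLexer, include
from pygments.token import Comment, Name, Number, Punctuation, String, Whitespace

WORD = r'[^\*\%\#\$\@\(\)\[\]\{\}\|\s]+'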

# TODO: slashes in rule heads, ~ in rule heads and bodies, * in nested rule heads

class DialogLexer(RegexLexer):
	tokens = {
		'everywhere' : [ # The only thing that's not context-dependent in the whole language: comments
			(r'\%\%.*', Comment.Single),
		],
		'literals' : [ # Literals of all sorts
			(r'[ \t]+', Whitespace),
			(r'\#[-\<\>\w]+', Name.Constant.Object), # Object `#word`
			(r'\$[-\<\>\w]+', Name.Variable.Named), # Variable `$Word`
			(r'\$', Name.Variable.Anonymous), # Anonymous variable `$`
			(r'\@[-\<\>\w\\]+', String.Symbol.At), # Dictionary word `@word`
			(r'\*', Name.Builtin.Topic), # Current topic `*`
			(r'0\b|[1-9]\d*', Number), # Number (without leading zeroes)
			(r'\[', Punctuation.List.Start, 'list'), # List literal [...|...]
			(r'\{', Punctuation.Block.Start, 'body'), # Closure literal {...}
		],
		'list' : [ # List literal
			include('everywhere'),
			(r'\\\S', String.Escape.Bare), # Escaped punctuation
			include('literals'),
			(r'\|', Punctuation.List.Mid), # List continuation marker
			(r'\]', Punctuation.List.End, '#pop'), # End of list
			(WORD, String.Symbol.Bare), # Bare dictionary word `word`
		],
		'root' : [ # Main file
			include('everywhere'),
			(r'\n', Whitespace),
			(r'[ \t]+', Whitespace, 'body'),
			(r'\@?\(', Punctuation.Head.Start, 'rulehead'),
			(r'\#[-\<\>\w]+', Name.Constant.Topic),
		],
		'body' : [ # Anything inside a rule or closure
			include('everywhere'),
			(r'\\\S', String.Escape.Double), # Escaped punctuation
			(r'\n[ \t]+', Whitespace), # Don't pop if we see a newline with whitespace after it
			include('literals'),
			(r'\n', Whitespace, '#pop'), # Do pop if we see a newline *without* whitespace after it
			(r'\(', Punctuation.Pred.Start, 'predicate'),
			(r'\{', Punctuation.Block.Start, 'body'), # Blocks `{...}`
			(r'\}', Punctuation.Block.End, '#pop'),
			(WORD, String.Double), # Bare words are strings to print
		],
		'rulehead' : [ # Rule head
			include('everywhere'),
			include('literals'),
			(r'\n', Whitespace, '#pop'), # Should never happen, but it's good to be able to recover from errors
			(r'\*?\(', Punctuation.Pred.Start, 'predicate'), # Nested predicate
			(r'\)', Punctuation.Head.End, ('#pop', 'body')), # Anything else on the same line as the rule head is taken as body text
			(WORD, Name.Function.Declare), # Bare words are part of the pred name
		],
		'predicate' : [ # Predicate invocation
			include('everywhere'),
			include('literals'),
			(r'\n', Whitespace, 'root'),
			(r'\)', Punctuation.Pred.End, '#pop'),
			(WORD, Name.Function.Use), # Bare words are part of the pred name
		],
	}
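
A minimal sketch of how a lexer along these lines can be run, assuming Pygments is installed. `MiniDialogLexer` and `token_pairs` are hypothetical names: this is a cut-down stand-in with only a handful of rules, not the full lexer above, and it only shows the state push/pop idea and how to read the tokens back.

```python
from pygments.lexer import RegexLexer
from pygments.token import Comment, Name, Punctuation, String, Whitespace


class MiniDialogLexer(RegexLexer):
    """Hypothetical, heavily reduced lexer for illustration only."""
    name = 'MiniDialog'
    tokens = {
        'root': [
            (r'%%.*', Comment.Single),          # comment to end of line
            (r'\s+', Whitespace),
            (r'\(', Punctuation, 'predicate'),  # push the predicate state
            (r'#[-\w]+', Name.Constant),        # object `#word`
            (r'[^\s()#%]+', String.Double),     # bare word: text to print
        ],
        'predicate': [
            (r'\)', Punctuation, '#pop'),       # back to the enclosing state
            (r'\s+', Whitespace),
            (r'#[-\w]+', Name.Constant),
            (r'[^\s()#]+', Name.Function),      # bare word: part of the pred name
        ],
    }


def token_pairs(source):
    """Return (token type, text) pairs, skipping whitespace."""
    lexer = MiniDialogLexer()
    return [(tok, text) for tok, text in lexer.get_tokens(source)
            if tok is not Whitespace]


for tok, text in token_pairs('(name #apple) red apple %% fruit'):
    print(tok, repr(text))
```

The same `get_tokens` call should work on the full `DialogLexer`, and `pygments.highlight` with a formatter such as `HtmlFormatter` is the usual next step for producing coloured output.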

hlship · February 4, 2025, 2:47am

I think I can adapt this for use in the documentation.

andrewj · February 4, 2025, 3:17am

Here is my Vim syntax file for Dialog. It’s been through a few major revisions and works pretty well now, but it still lacks some keywords.

dialog_vim.zip (1.5 KB)

Obligatory screenshot:

jwalrus · February 4, 2025, 6:06am

Might want to make it an optional toggle or something, if we’re going with respecting the author’s original vision:

https://www.linusakesson.net/programming/syntaxhighlighting/

(Edit: can’t figure out why Discourse isn’t oneboxing the link, but the title of the linked page is “A Case Against Syntax Highlighting” by Linus Akesson)

jwalrus · February 4, 2025, 7:18am

To add to the above: I don’t know if I agree with Linus’ thesis that syntax highlighting is actively harmful, but it did prompt me to reflect that I find it much more useful when writing code (when it will reveal errors such as mismatched brackets and unterminated strings) than when reading it. I never had difficulty parsing the code in the Dialog docs even as a novice (you can recognise words beginning with # or $ without syntax highlighting because they begin with # or $!).

The two cases I can think of where syntax highlighting in the docs might improve understanding for a relatively new Dialog user:

- Making rule heads a different colour to rule bodies
- Making sure that dictionary words inside lists (with or without @) appear in the same colour as dictionary words outside lists (with @)
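
On the second point, the Pygments lexer posted earlier in the thread tags both forms under `String.Symbol` (as `String.Symbol.At` and `String.Symbol.Bare`). A small sketch, assuming Pygments is installed, of how its token types nest; `same_family` is a hypothetical helper, and whether a given style actually colours the two alike depends on that style.

```python
from pygments.token import String

# Pygments creates token subtypes on attribute access; `At` and `Bare` are
# the custom subtypes used by the lexer earlier in this thread.
at_word = String.Symbol.At
bare_word = String.Symbol.Bare


def same_family(a, b, family=String.Symbol):
    """True when both token types fall under the given parent type."""
    return a in family and b in family


print(same_family(at_word, bare_word))        # both are dictionary words
print(same_family(String.Double, bare_word))  # printed text is a different family
print(at_word.parent is String.Symbol)
```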

Pacian · February 4, 2025, 8:38am

In Dialog is it “else if”, “elif” or “elseif”? Oh, the last one went purple, that’s it then.

Although I agree that it’s more useful when writing than reading, looking at my current file in VSCode using the Dialog extension by sideburns3000, I feel that it does help readability to see Dialog code, comments and literal text in three different colours.

averyhiebert · February 4, 2025, 8:41am

I definitely strongly disagree with that argument against syntax highlighting, but going into that further stands a strong chance of totally derailing the thread.

As far as the Dialog documentation is concerned, I don’t see why we should elevate the author’s personal opinions on the broad concept of syntax highlighting to the level of “original vision for the Dialog software”. But I also agree that it doesn’t matter as much for the documentation, where you’re just reading (as opposed to writing). EDIT: And making syntax highlighting optional is probably better from an accessibility perspective, as I can definitely understand why some people would find it distracting.

Draconis · February 4, 2025, 4:34pm

I’d almost forgotten about that. So that’s why the compiler code is almost entirely uncommented and undocumented—to help people understand the code by making them read it instead of just looking at the comments!

Snark aside, I think the syntax highlighting here serves a useful purpose. There are a few different things a bare word can mean in Dialog source: inside parentheses it’s part of the predicate name, inside brackets it’s a dictionary word, after %% it’s a comment, and anywhere else it’s a string to print. When you’re searching for occurrences of a word—e.g. when I was changing my American spellings to British ones in Miss Gosling—it’s nice not to have to spend the extra seconds figuring out which one each usage is. That’s generally less of a problem in languages like C, but Linus specifically calls out the one place it is: inside multiline comments.
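
A rough sketch of those four contexts, using only the standard library. `classify_words` is a hypothetical helper, not part of any Dialog tool, and it makes simplifying assumptions: it looks at a single line, ignores escapes and closures, and treats words starting with `#`, `$` or `@` like any other word.

```python
import re

# One token per match: a comment, a single bracket, or a run of other
# non-space characters.
TOKEN = re.compile(r'%%.*|[()\[\]]|[^\s()\[\]]+')


def classify_words(line):
    """Label each bare word in one line of Dialog-like source by context."""
    stack = []   # currently open '(' and '[' brackets, innermost last
    labels = []
    for match in TOKEN.finditer(line):
        text = match.group()
        if text.startswith('%%'):
            labels.append((text, 'comment'))
        elif text in ('(', '['):
            stack.append(text)
        elif text in (')', ']'):
            if stack:
                stack.pop()
        elif stack and stack[-1] == '(':
            labels.append((text, 'predicate name'))
        elif stack and stack[-1] == '[':
            labels.append((text, 'dictionary word'))
        else:
            labels.append((text, 'printed text'))
    return labels


for word, kind in classify_words('(describe apple) red apple [apple pie] %% apple'):
    print(f'{kind:16} {word}')
```

Here the word `apple` comes back with four different labels on one line, which is the ambiguity a highlighter resolves at a glance.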

That’s also one reason I’ve found syntax highlighting incredibly useful while working on the Dialog compiler. Linus likes to give variables the same name as their type (e.g. an array of datatables named datatable; C is okay with this), and syntax highlighting tells me whether the compiler is going to read sizeof(datatable) as the variable (getting the size of the whole array) or the type (getting the size of an individual struct).

dragonlurk · May 4, 2025, 8:31pm

Hi, making a post to let people know that I’ve written a tree-sitter-dialog module to enable syntax highlighting in Helix. I believe it should work in other editors too, but I have only confirmed that it works in Helix.

A big thanks to @Draconis for sharing the Pygments snippets! I was able to reuse a good number of the regexes in the tree-sitter-dialog module.

Unfortunately I can’t share a link to it at the moment as I’ve just barely created this account, but I’ll make a new post or edit this one when I’ve gotten to a high enough trust level to add a link.

EDIT: The restrictions have been lifted, so here’s a link: desttinghim/tree-sitter-dialog - Codeberg.org

And a picture for good measure: