Accounting for interpreter support for the em dash

bg · July 8, 2024, 3:03am

I learned on another thread that you can print an em dash by saying [unicode 8212], for example:

Say "Some text[unicode 8212]and some more text.";

If I do this, will there be any Glulx interpreters that won’t display the em dash? If so, is there a way to account for interpreters that don’t? Something along the lines of “if this interpreter supports this unicode character, display the em dash, and otherwise display -- instead”?

zarf · July 8, 2024, 4:05am

In theory there’s a way to check this (a Glk gestalt selector). But in practice it’s never hooked up right.

The game hands the character to the interpreter, the interpreter hands it to the OS (or browser if it’s a web interpreter). The OS or browser tries to render the character in the current font (which may involve some amount of fallback to other fonts). There’s essentially no way to dig all the way down to the font layer and determine whether the character can “really” be displayed.

If I had known in 1997 what I know now, I would have left out the gestalt selector and just said “print the characters you want to print and have faith.”

As it happens, the General Punctuation range of Unicode (8192-8303, i.e. U+2000 to U+206F) was one of the first groups defined. Support for that is pretty much universal for systems that support Unicode at all.
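Most Unicode references list code points in hex rather than the decimal numbers that `[unicode N]` takes, so a small conversion helper can be handy when looking characters up. This is a Python sketch, not Inform code; `describe` is a hypothetical helper name, and it relies only on the standard `unicodedata` module.

```python
import unicodedata

def describe(code_point: int) -> str:
    """Format a decimal code point as 'U+XXXX NAME'."""
    # unicodedata.name() raises ValueError for unnamed code points unless a
    # default is supplied, so pass a placeholder.
    name = unicodedata.name(chr(code_point), "<unnamed>")
    return f"U+{code_point:04X} {name}"

# The two decimal values discussed in this thread.
for cp in (8211, 8212):
    print(describe(cp))
# U+2013 EN DASH
# U+2014 EM DASH
```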

bg · July 8, 2024, 4:21am

Thank you!

It sounds like unicode 8212 is a pretty safe bet, then. (Except for systems that don’t support unicode at all–I’m not sure what systems those would be.)

Draconis · July 8, 2024, 4:51am

Nowadays it depends more on the available fonts than on the system—web interpreters use whatever the web browser has access to, and Gargoyle recently added font fallback—and it’s hard to imagine anyone not having any fonts with em-dashes. The only ones where I’d expect trouble are the ones that run in a terminal.
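For the terminal case, the closest thing to the "use the em dash if supported, otherwise `--`" idea is an encoding check. The sketch below is Python rather than Glulx code, and `dash_for` is a hypothetical helper. It only tests whether a text encoding can represent the character; it says nothing about whether the font can draw it, which is exactly the limitation described above.

```python
EM_DASH = "\u2014"

def dash_for(encoding: str) -> str:
    """Return an em dash if `encoding` can represent it, else two hyphens."""
    try:
        EM_DASH.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # UnicodeEncodeError: the character is not in this encoding.
        # LookupError: the encoding name itself is unknown.
        return "--"
    return EM_DASH

print(repr(dash_for("utf-8")))
print(repr(dash_for("ascii")))
```

In a real script the argument would typically be the output stream's encoding, for example `sys.stdout.encoding`.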

jkj_yuio · July 8, 2024, 3:52pm

On a slightly related note, there is also 8211, the “en-dash”. The two look mostly the same to me. Does anyone know the difference, i.e. when you’d use one and when the other? Thanks.

zarf · July 8, 2024, 4:07pm

How to Use Em Dashes (—), En Dashes (–), and Hyphens (-) | Merriam-Webster, at a quick search.

Draconis · July 8, 2024, 5:21pm

They’ll look exactly the same in a monospaced font, but an em-dash is nominally the width of an M (M —), and an en-dash is nominally the width of an N (N –). Nowadays, though, em-dashes tend to be even wider than M’s, since they look better that way.

Tl;dr: em-dashes indicate a break in the flow of a sentence; en-dashes are sometimes used in ranges (pp 73–7), but most people just use hyphens for that instead. Nowadays en-dashes are mostly relegated to LaTeX users.

jkj_yuio · July 8, 2024, 6:25pm

OK, thanks. According to the linked article, that’s what they do: en dashes are much like regular hyphens but specialised for ranges.

Dannii · July 8, 2024, 9:35pm

Em dashes are one em wide, which is a unit based on the height of the capital M, not its width. An en dash is usually half an em wide, but could be longer, and may or may not be the width of an N.

Edit: that’s not quite right. The em dash is meant to be one em wide, and the em is based on the height of the font. But the em height is not the height of an M (the cap height), but more like the line height, but maybe excluding leading. So an em dash could be up to 40% longer than the height of an M, and even more so than its width! (depending on whether it’s a portrait or landscape M)

In any case, any font can break the ‘rules’, and the em dash here in the forum’s font doesn’t look one em wide (and it’s even narrower on mobile)… —

bg · July 8, 2024, 9:49pm

Thanks, everyone!

Mewtamer · July 9, 2024, 7:17pm

This thread is reaffirming my stance to just stick with ASCII characters wherever possible.

jkj_yuio · July 10, 2024, 3:10am

From my investigation of the en dash, a regular ASCII hyphen does the same thing. And an em dash is acceptably approximated by two regular hyphens.

I have a converter that changes my easier-to-write ASCII into fancy characters: things like curly opening and closing quotes and the corresponding apostrophes. There’s no real need to author with them.

Mewtamer · July 10, 2024, 5:03am

Funnily enough, I have a bash script for converting UTF-8 encoded text files to ASCII.
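Neither converter was posted, so the following is only a guess at the general shape of the UTF-8-to-ASCII direction, written in Python rather than bash. The mapping table and the `to_ascii` name are assumptions made for illustration; the dash substitutions follow the approximations mentioned earlier in the thread (two hyphens for an em dash, one for an en dash).

```python
# Hypothetical table of typographic characters and their ASCII stand-ins.
ASCII_FALLBACKS = str.maketrans({
    "\u2014": "--",   # em dash
    "\u2013": "-",    # en dash
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
})

def to_ascii(text: str) -> str:
    """Replace the characters in ASCII_FALLBACKS; leave everything else alone."""
    return text.translate(ASCII_FALLBACKS)

print(to_ascii("Some text\u2014and some more text."))
print(to_ascii("pp 73\u20137"))
```

Characters outside the table (accented letters, for instance) pass through unchanged, so a full converter would need a policy for those as well.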