Story description and Unicode

I’d like to include an em-dash (U+2014) in my story description, but Inform is flattening it to an ordinary (ASCII) dash (I checked, it really is ASCII and not some weird formatting thing). I think that under the Treaty of Babel, which states that iFiction is a UTF-8 format (Section 5.2, “Encoding”), it should be possible to encode any Unicode character into such bibliographic data. So how do I do it? I can’t use “[unicode 1234]” because that doesn’t get substituted at all.

There are a number of bugs in the processing of the metadata lines. This may have been fixed as part of bug 553 (fix not yet released) but I’m not positive.

inform7.com/mantis/view.php?id=553

I must wonder, however, how it managed to take an em dash and produce a “hyphen minus” (ASCII dash) without human intent. I suppose I’ll just chain 3 dashes and/or use the babel metadata editor to fix it after release (which is a long way away, so I’ll make a note of it).

There’s obviously some (wrong) level of charset filtering going on in the current compiler. I tried a test case; it left “ö” alone, converted “—” to a hyphen, and “€” wound up as “[unicode 8364]”.

Writing with Inform, section 5.10: “As it reads in the text, Inform silently converts all kinds of dash (en-rules, em-rules, etc.) to simple hyphens, all kinds of space other than tabs (em-spaces, non-breaking spaces, etc.) to simple spaces, and all kinds of quotation marks to “straight” (non-smart) marks.”

Makes me think that it was originally intentional, maybe for compatibility reasons? It’s always bugged me, though.
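For illustration, the conversion quoted from Writing with Inform 5.10 can be approximated in a few lines of Python. This is only a sketch of the described behaviour, not Inform's actual code: the function name and the particular character lists are assumptions, and it does not model the “[unicode 8364]” rewriting of “€” seen in the test case above.

```python
# Hypothetical sketch of the WI 5.10 flattening; not Inform's real implementation.
DASHES = "\u2010\u2011\u2012\u2013\u2014\u2015"  # hyphen variants, en rule, em rule, bar
SPACES = "\u00a0\u2002\u2003\u2009"              # non-breaking, en, em and thin spaces
QUOTES = {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}

TABLE = {}
for ch in DASHES:
    TABLE[ord(ch)] = "-"          # every kind of dash becomes a simple hyphen
for ch in SPACES:
    TABLE[ord(ch)] = " "          # every listed space becomes a simple space (tabs untouched)
for curly, straight in QUOTES.items():
    TABLE[ord(curly)] = straight  # curly quotes become straight ones


def flatten_typography(text: str) -> str:
    """Return text with dashes, odd spaces and curly quotes flattened."""
    return text.translate(TABLE)


print(flatten_typography("A story \u2014 with \u201cquotes\u201d"))
```

Characters outside the three lists, such as “ö”, pass through unchanged in this sketch.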

This is why many, me included, are forced to use the double dash (--) in place of the extra-long one (—).
So, is there a way around this?

One of my pet peeves is the use of typewriter conventions such as double hyphens in place of em-dashes, and single hyphens in place of en-dashes and minus signs. It pains me every time I have to do that in Inform 7. It is ironic, indeed, that technology has completely supplanted the typewriter (to the point where a large percentage of the population has never seen one except perhaps in movies), yet typographical relics which came into existence solely as a result of the mechanical limitations of the typewriter continue to plague us.

Robert Rothman

You can always use “[unicode 8212]” in game text. It’s only in the metadata lines (such as story title and story description) that this poses problems.
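The number in the substitution is the decimal code point, so 8212 is the same character as U+2014. A quick way to check or derive such numbers, using only Python's standard library (the helper names here are made up for the example):

```python
import unicodedata


def unicode_substitution(ch: str) -> str:
    """Build the "[unicode N]" text for a single character (N is decimal)."""
    return f"[unicode {ord(ch)}]"


def describe(code: int) -> str:
    """Show a decimal code point in U+XXXX form with its Unicode name."""
    return f"U+{code:04X} {unicodedata.name(chr(code))}"


print(unicode_substitution("\u2014"))  # [unicode 8212]
print(describe(8212))                  # U+2014 EM DASH
print(describe(8364))                  # U+20AC EURO SIGN
```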

Gargoyle systematically displays “--” as a single en-dash (not the wider em-dash). Letting the interpreter handle it is not ideal, but it does have one advantage; it works for every game.

If it were up to me, I’d drop this transform on dashes, but keep it for spaces. (Whitespace in source code can contain all sorts of garbage that the user almost never intends to carry through to the game, and which interpreters almost never have consistent display rules for.)

Quotes are tricky because I7 already rejigs them from single to double, according to a complex but usually successful set of rules (which are easy for the user to override by saying [']). Trying to preserve curly-vs-straightness through this process would probably be a disaster. I can see an argument for the compiler generating curly quotes all the time, and also an argument for the interpreter curlifying them (as Gargoyle does).

Gargoyle will convert “---” to an em-dash.

Ben, it seems like it would be productive to change Gargoyle’s behavior so that the double hyphen is expanded to the em dash. The double dash is pretty much universally used to stand in for the em dash--know what I mean?--whereas most people would never think to use a triple dash for this purpose. (Moreover, since the triple dash isn’t so converted by other interpreters, it would look silly anywhere but in Gargoyle.)

I’m not sure there is a need for en dash conversion. If you need to use an en dash in standard English text, you should probably rewrite so you don’t need it (it’s generally used for complicated compound modifiers). It might be safe, though, to autoconvert a single hyphen to an en dash when it is surrounded by spaces. This would accord with 19th century style typography as well as the use of the en dash in printed equations, e.g. 9 - 5 = 4.

I agree with ektemple that en-dashes are not likely to come up that often in IF. Basically they are used in two situations. One is where there is a reference to a range of numbers, such as in a footnote which refers to “pp. 2[en-dash]10”. Obviously this is not common in fiction, interactive or otherwise. The other situation is where one would ordinarily use a hyphen, but the material on one (or both) sides of the hyphen consists not of a single word but of a multi-word phrase. Again, not all that common (although I’ve used it in technical articles, invariably leading to a conversation with an editor who does not understand proper typography).

As for numerical equations, a minus sign is different than either an en-dash or a hyphen.

While it would be nice to be able to render en-dashes and minus signs in I7, I would be wary of any system which tried to do that conversion automatically (analogous to an automatic conversion of a double hyphen to an em-dash). My sense is that, absent manual intervention, it would be difficult to identify whether the author intends a hyphen, an en-dash or a minus sign. In particular, where the (unconverted) text reads, for example “2-10” it is not apparent on its face whether the reference is to a range of numbers (in which case the proper character is an en-dash) or to a numerical expression “two minus ten” (in which case the proper character is a minus sign).

Robert Rothman

Very true; they are often the same for a font that provides both (which is why I suggested that), but they are not necessarily equivalent--and certainly aren’t logically equivalent.

Wait a minute, aren’t en dashes used for numerical ranges (e.g. 6–10 bananas)? It is also typographically correct to translate “This is a sentence -- with a parenthetical.” into an en dash (and not an em dash), since there’s spacing around the “--”; you don’t put spacing around an em dash. Using two hyphens as an en dash (and 3 for an em) is consistent with how TeX renders them, and more flexible than just flattening “--” into an em dash directly. So IMHO, Gargoyle does things correctly as-is.

En-dashes are correctly used for numerical ranges. It is also correct that spaces are not used around em-dashes (although I must say that to my eye, this looks rather strange in many fonts).
However, it would not be correct to use an en-dash in “This is a sentence – with a parenthetical.” A break in the flow of text, as in this context, properly takes an em-dash, not an en-dash. Putting spaces around it doesn’t change that; it just adds a second typographical error to the first one.

As to what conventions might be used to translate regular keyboard input into the proper characters, that’s a different issue, and one on which I don’t think there is any absolute right or wrong. There’s no reason that a particular program can’t adopt a convention which basically says to the writer, “if you want an em-dash, type three consecutive hyphens, and if you want an en-dash, type two consecutive hyphens.” I was not aware that one could use that convention to generate em-dashes in I7, although, if I understand correctly what you’re saying, it doesn’t work with all interpreters. I’ve been using the old typewriter convention of typing two hyphens in lieu of an em-dash, which, with the interpreter I use, gets rendered as typed, rather than translated either into an (incorrect) en-dash or a (correct) em-dash.

Robert Rothman

Yes, an en dash is used for numerical ranges, as Robert pointed out above. I’m not sure that would be a comfortable context for auto-substitution, though, since there are situations (e.g. a serial number) where you might want mere hyphens: take “I am Hydrobot serial number A1-24-Z2”, for example--if we assumed that a hyphen surrounded by digits should be an en dash, one of the hyphens in that serial number would be replaced by an en dash, and the other wouldn’t be.
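The serial-number problem is easy to reproduce. The sketch below implements the naive rule “a hyphen with a digit on each side becomes an en dash” as a regular expression; the function name is invented for the example, and no interpreter is claimed to work this way.

```python
import re

EN_DASH = "\u2013"


def naive_en_dash(text: str) -> str:
    """Replace a hyphen with an en dash only when digits sit on both sides."""
    return re.sub(r"(?<=\d)-(?=\d)", EN_DASH, text)


# The first hyphen sits between "1" and "2" and is converted;
# the second sits between "4" and "Z" and is left alone.
print(naive_en_dash("I am Hydrobot serial number A1-24-Z2"))

# A range and a subtraction look identical to the rule, so both get an en dash.
print(naive_en_dash("pp. 2-10"), naive_en_dash("2-10 = -8"))
```

The rule cannot tell a range from a subtraction or a serial number, which is the ambiguity raised earlier for “2-10”.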

TeX should not be a model, as it does not reflect how people use double dashes in the real world; it’s a text-rendering environment that’s full of artificial symbols. No one other than a TeX user is going to use triple dashes naturally. (As I mentioned in my earlier post, space en-dash space is 19th-century style typography; any modern style guide will tell you to use an em dash--without spaces--instead.)

The triple dash substitution will simply not be used by IF authors unless (1) all other interpreters follow the same convention, or (2) the author wants to target only Gargoyle. It seems clear to me that Gargoyle might as well not be offering this feature at all if it uses the triple dash.

Given that a double dash is the most conventional typographic substitution for an em dash (spaces surrounding a hyphen being the next most common), the fact that Gargoyle replaces the double dash with an en dash is just bad design and confusing to boot.

I agree that the triple dash conversion probably doesn’t see much action. I mentioned it merely to point out that the feature was there; it would be problematic for authors to depend on this as a convention.

I’ve changed the default behavior to do two levels of dash replacement instead of three, and added a configuration option to the .ini file to control this:

code.google.com/p/garglk/source/detail?r=599

After reading through the arguments in the Dash article on Wikipedia, I decided someone would be unhappy regardless of what I picked. I like TeX and I am partial to the TeX usage, but I don’t think it makes sense as the default behavior.
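As a rough illustration of the two conventions being weighed here, the following Python sketch contrasts a two-level scheme (hyphen, em dash) with the TeX-style three-level scheme (hyphen, en dash, em dash). It is not Gargoyle's code: the function and its `levels` parameter are hypothetical, and the reading of “two levels” as “-- becomes an em dash” is an assumption.

```python
EN_DASH = "\u2013"
EM_DASH = "\u2014"


def replace_dashes(text: str, levels: int = 2) -> str:
    """Expand runs of hyphens into dashes.

    levels=2: "--" becomes an em dash (assumed default).
    levels=3: TeX style, "---" becomes an em dash and "--" an en dash.
    """
    if levels == 2:
        return text.replace("--", EM_DASH)
    if levels == 3:
        # Handle the longest run first so "---" is not split into "--" + "-".
        return text.replace("---", EM_DASH).replace("--", EN_DASH)
    raise ValueError("levels must be 2 or 3")


print(replace_dashes("know what I mean?--whereas"))        # em dash
print(replace_dashes("pages 6--10", levels=3))             # en dash
print(replace_dashes("a break---like this", levels=3))     # em dash
```

Note that in the two-level mode of this sketch a triple hyphen comes out as an em dash followed by a stray hyphen, one more reason an author could not rely on “---” across interpreters.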

I am not prepared to tackle the question of spaces around dashes at this time. Properly supporting the correct typographical behavior would require some changes in the way Gargoyle breaks and justifies lines. (Patches welcome!)

That’s a matter of opinion, though for US usage you’re mostly right—but Inform defaults to UK usage, which might prefer a spaced en dash there (or, bizarrely enough, a spaced em dash), depending on who you ask.

Hmm, I never knew that UK style was different in this regard. I was just going by Chicago.

Now I’ll be able to use a new variant on the old joke:

[Please mentally substitute em-dashes for the double hyphens above, in accordance with US style.]

Robert Rothman

Getting back to the topic:

As the dash conversion is spec (WI 5.10) I think it’s unlikely that they’ve “fixed” it in any newer version of Inform. Furthermore, WI explicitly says that you can’t use text substitutions in bibliographic metadata.

However, I feel that [unicode 1234] has a sufficiently well-defined meaning that it makes more sense to allow it than to ignore it (and even if the I7 people disagree, we still need some way to put dashes in), so I’m reporting this as bug 913.

Thanks, Ben!

(And apologies to Britishers for forgetting that space+en+space is not 19th century practice for you!)