I’ve mentioned before my plan for adding arbitrary CSS support to Parchment; I’ve now written it up as a formal proposal:
While most of this is already supported on the GlkOte side (and by my additions to the GlkOte protocol), I haven’t added any of the functions to GlkApi yet. As I am still unsure about the security implications of allowing arbitrary CSS, I am thinking that this will only be enabled for custom Parchment installs (such as the Inform 7 template or the sitegen service). I do want to enable it in all versions of Parchment eventually. And subsets of CSS may be supported by other non-HTML interpreters too; my hope is that we can use this API to support more of HTML TADS.
So it’s based on buffers: arrays with explicit lengths (not null-terminated arrays). You can easily define these in I6, so if you didn’t need to worry about dynamic properties, you could do this:
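For example (a sketch only: I’m using glk_css_inline_set here since that’s the function name from the proposal, but the argument order — window, then property pointer/length, then value pointer/length — is just an assumption; check the proposal for the real signature):

```inform6
! Buffer arrays: the first word holds the length, followed by the bytes.
Array css_property buffer "text-decoration";
Array css_text_decoration buffer "underline";

[ SetUnderline win;
    ! Assumed signature: window, property ptr/len, value ptr/len.
    glk_css_inline_set(win,
        css_property + WORDSIZE, css_property-->0,
        css_text_decoration + WORDSIZE, css_text_decoration-->0);
];
```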
If you haven’t seen them before, a buffer array in I6 defines one word with the length, followed by the bytes. So we pass css_text_decoration + WORDSIZE to point to the bytes of the text buffer, and css_text_decoration-->0 to get the length of the buffer. (In this example the arrays could also be specified as static.) Don’t want a plain underline? Why not try “green wavy underline”!
Obviously dynamic properties and values are more complicated, as is converting from I7 texts to I6 buffers.
This isn’t part of the proposal, but it wouldn’t be hard to allow for more than the 11 styles supported by Glk. The only question is whether style 11 should be “user3” or “user11”. I lean towards the latter.
The main use I can see for new styles is colors. Let’s say you have a game with color-coded objects; with your current setup, would each color be its own ‘style’, or is there separate code for handling color?
If you need a different style for each color, I’d advocate for many styles. If not, I think that a few new styles are good, and the numbering system seems good to me (if it corresponds closely to what’s used ‘under the hood’).
For that use you’d probably just use inline CSS (like you can with the Garglk extensions).
Extra styles would be for when you regularly need more user-defined styles than just user1 and user2. Right now you can also use alert or note and hope that you can cancel out any default stylehints the interpreter has, but you can never be sure that you have. It would be better to have user11 and so on.
Two questions I asked myself that may be worth clarifying:
Is it possible for a Glk implementation to support this feature but not the UTF-8 encoding/decoding? If yes, what is a game author using non-ASCII CSS supposed to do in that case (assuming they are a good citizen and check all the gestalts)? If not, please make that explicit so everyone is on the same page.
What’s the interaction and/or precedence between (individual properties coming from) span style hints, paragraph style hints, and inline CSS when multiple apply to the same piece of output? I suppose the answer is “whatever CSS does in that situation” but from my superficial knowledge of CSS and Web Glk implementations, it’s not obvious to me what that will end up as.
(I also have some thoughts on the UTF-8 extension per se, but that feels off-topic here. Is there a thread for that extension I can necro?)
Well, I guess it’s possible, but I don’t expect anyone would implement this without the UTF-8 functions. The UTF-8 conversions will be almost free to implement considering you need UTF-8 support for the file functions anyway.
Stylehints and window paragraph styles should have equivalent precedence. Then window span styles. Then inline styles. The proposal should specify these precedence rules. If you use the selector functions, the precedence would have to be whatever the natural CSS precedence rules say.
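To illustrate what “natural CSS precedence” ends up meaning (the class names here are invented for the sketch, not what GlkOte actually emits): with rules of equal specificity on the paragraph and the span, the span rule wins for text inside the span simply because it applies directly to the inner element, while the paragraph’s color is only inherited.

```css
/* Hypothetical class names, not actual GlkOte output. */
div.Style_emphasized  { color: blue; } /* paragraph-level rule: inherited by children */
span.Style_emphasized { color: red; }  /* span-level rule: wins inside the span,
                                          since directly-applied properties beat
                                          inherited ones */
/* An inline style="color: green" attribute on the span would beat both. */
```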
Though I just realised that I had meant for there to be inline paragraph styles, but the API doesn’t support them yet. I’ll have to add that in.
There’s no other thread about the UTF-8 proposal so you can just comment here.
This makes sense to me. If the “CSS yes, UTF-8 no” case is practically irrelevant, perhaps the spec should just rule it out? There’s a bit of precedent of one gestalt implying others (the most clear-cut example is Sound2), and it would free authors from checking a second gestalt and having to worry about what to do in the “CSS yes, UTF-8 no” case.
On to the UTF-8 conversions proper. The proposed spec addresses too-small destination buffers, but what about other errors? There are a lot of ways for byte sequences to be invalid UTF-8, and in the other direction, 32-bit integers >= 2^21 can’t be encoded in UTF-8. There are also some integers that could be encoded but according to the standards must be rejected because they’re not valid code points: low and high surrogate halves, as well as U+110000 and anything larger (which will never exist because UTF-16 couldn’t handle them). The strictly correct behavior would be to reject all these cases as errors; any API that decodes UTF-8 could alternatively use replacement characters (I suppose this aspect is also relevant for the CSS feature).
From a quick spot check, I don’t think current Glk implementations agree on what subset of these errors they detect and how exactly they handle them. Perhaps that is an argument for the spec staying silent on it? These are not entirely new issues, but more APIs using UTF-8 might mean there’s more ways to stumble over it.
For handling UTF-8 errors I think there’s a simple solution: change the functions to return a glsi32, and then return -1 if there is an encoding/decoding error. I don’t think we need to distinguish between the types of errors. Do you think that would be adequate?
Yes, I don’t think distinguishing different kinds of encoding errors is useful or actionable. In contrast, “the output buffer is too small, it needs to be at least this big” is very actionable (just try again with a larger buffer). For encoding/decoding errors, you usually only care whether there are any, not what they look like exactly or where they occurred, outside of low-level APIs only used by experts.
However, note that encoding errors can occur in all functions that deal with text, not just the conversions between UTF-8 and other encodings. For example, what should glk_css_inline_set and friends do if the property is not valid UTF-8? For that matter, what happens if you write invalid unicode (not possible to encode into valid UTF-8) to a text file stream?
The two main options are:
1. Encoding errors are errors; the program shouldn’t continue as if nothing happened. This can be important for detecting logic bugs and for avoiding silent data corruption. It’s also sometimes relevant for security: if untrusted input is first “sanitized” with one way of handling/ignoring invalid encodings, and then passed to another component that handles invalid encodings differently, this may subvert whatever rules the sanitizer was intended to enforce.
2. Encoding errors are unfortunate, but should not stop the human at the other end from getting mostly-legible text output. So you throw out the invalid bytes or “code points”, insert a replacement character or several, and soldier on.
For explicit conversions from one encoding to another, most programming environments offer both options in some form. The CSS use case probably falls more into the first bucket: encoding errors are unlikely to result in valid and meaningful CSS, so it’s better to catch them early than to have to make sense of whatever the browser(s) do with the end result. But I/O with text intended for end users will often want the latter behavior: something like a transcript should not be stopped dead in its tracks just because the game tried to output invalid unicode at some point.
I don’t think an epic side quest to nail down these questions for all Glk APIs would be a good use of anyone’s time. But for new functions introduced now and in the future, it’s worth spending a little thought on it. It may make sense for the CSS extension to not validate anything and just throw the allegedly-UTF-8 strings directly at the browser, but this should be a conscious decision IMHO. And for the explicit conversions to/from UTF-8, I think both options (errors reported as you described, and replacement characters) make sense, either as two sets of functions, or as an extra flag passed to the proposed functions.