Best way to specify tagged unions in Glk extension

Dannii · December 14, 2022, 12:07am

I’m designing a Glk extension that will allow for arbitrary CSS to be specified, mostly to help with formatting of VMs like TADS, but possibly also for authors, depending on the security implications. The details don’t matter at this stage, I’ll post about it more at a later time. For now I’m wondering what the best API design would be.

The basic API would be like this:

glk_set_css(property, value);

But the issue is that the property could be one of four things: ASCII buffer (with length), ASCII null-terminated, UTF-32 buffer, UTF-32 null-terminated. The value can be any of those four, plus also a uint32. If this was Rust or TypeScript then I’d represent this as an enum/tagged union, but as a C API it’s harder.

The traditional Glk way would be to have separate functions for each way of providing the arguments, but that would be 20 functions just for this one API (4 types of properties times 5 types of values). There’d also have to be 4 functions for the corresponding glk_clear_css, and I also intend to have a glk_set_css_stylehint for adding extra stylehints before a window is created. All in all I think there would be at least 48 functions!

But there are other ways I could design the C API. So I’d like your advice on what to do, from the perspective of a user of the API in either C or Inform 6, or from the perspective of the Dispatch system, whose details I’ve never really looked at before. Some of these options may not really be feasible in the current Dispatch system.

Here are the options I can think of:

1. The full explicit API as described above, with a unique C function for each parameter combination. (48 functions total)
2. Have different functions for each type of parameter, but don’t allow mismatched types, so that the property and value must both be an ASCII buffer or both UTF-32 null-terminated etc. But also have separate functions for uint32 values. (24 functions total)
3. The same as 2 except that uint32 values are scrapped and must be represented as strings instead. (16 functions total)
4. Use a C struct for each parameter, so that they each carry their own type with them. (4 functions total) (Whether a true C union struct or just a Glk struct with 3 members for type, addr, len I’m not sure.)
5. Specify the type in the function signature: glk_set_css(property_type, property_addr, property_len, value_type, value_addr, value_len); (The len arguments would only be for buffers.) (4 functions total)

   The stylehint API would look like glk_set_css_stylehint(wintype, stylenum, par_or_char, property_type, property_addr, property_len, value_type, value_addr, value_len), 9 parameters long. I’m not sure if that would be a problem.
6. As unicode values won’t be very common (and probably there would never be unicode property names?) the API could just be restricted to ASCII, with JSON style escapes if any unicode is necessary. Helper functions can be used to convert any unicode prior to the Glk function being called. For #1 this would bring the function count down to 16, for #2 it would bring it down to 12, for #3 down to 8.
7. Be even more restricted and only allow ASCII buffers (or only ASCII null-terminated strings, I don’t know which would be easier for users). (6 functions, or 4 if no uint32s)
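
To make the C struct option above more concrete, here is a rough sketch of what a tagged argument might look like. Everything in it is hypothetical: the css_arg and css_arg_type names, the CSSARG_* constants, and both helper functions are invented for illustration and are not part of Glk or of any proposed extension. It also uses uint32_t where a real Glk header would presumably use glui32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; none of this is part of the Glk spec. */
typedef enum {
    CSSARG_ASCII_BUF,  /* char buffer with an explicit length */
    CSSARG_ASCII_STR,  /* null-terminated char string */
    CSSARG_UNI_BUF,    /* UTF-32 buffer with an explicit length */
    CSSARG_UNI_STR,    /* zero-terminated UTF-32 string */
    CSSARG_UINT        /* plain uint32 (allowed for values only) */
} css_arg_type;

typedef struct {
    css_arg_type type;       /* the tag: says which union member is live */
    union {
        const char *ascii;   /* CSSARG_ASCII_BUF, CSSARG_ASCII_STR */
        const uint32_t *uni; /* CSSARG_UNI_BUF, CSSARG_UNI_STR */
        uint32_t num;        /* CSSARG_UINT */
    } u;
    uint32_t len;            /* only meaningful for the two buffer types */
} css_arg;

/* Number of characters the argument holds; 0 for CSSARG_UINT. */
static uint32_t css_arg_length(const css_arg *arg)
{
    assert(arg != NULL);
    switch (arg->type) {
    case CSSARG_ASCII_BUF:
    case CSSARG_UNI_BUF:
        return arg->len;
    case CSSARG_ASCII_STR:
        return (uint32_t)strlen(arg->u.ascii);
    case CSSARG_UNI_STR: {
        uint32_t n = 0;
        while (arg->u.uni[n] != 0)
            n++;
        return n;
    }
    case CSSARG_UINT:
        return 0;
    }
    return 0;
}

/* A property may be any of the four string types, but never a uint32. */
static int css_arg_valid_property(const css_arg *arg)
{
    assert(arg != NULL);
    return arg->type != CSSARG_UINT;
}
```

With something like this, a single setter could take two css_arg pointers and switch on each tag, which is where the low function count comes from. The cost is that the caller has to fill in a struct for every call, and whether Dispatch can describe a union like this is exactly the open question raised above.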

Edit: Oh no, I forgot that I’d also been thinking of an API function for testing whether the interpreter supports the requested style (because the plan is that the new functions could be used outside of HTML-based interpreters, where they might only support a small number of properties.) I’m not sure if a measure function would also be helpful. So, just approximately double all the function counts above.

Zed · December 14, 2022, 12:57am

This sounds most palatable to me.

Draconis · December 14, 2022, 4:00am

Only allowing a single type seems reasonable, for the sake of keeping the API small. And even if standard CSS uses only ASCII in its names, is there much reason to not support Unicode in this day and age?

Dannii · December 14, 2022, 5:50am

Well, C has historically not been very Unicode-friendly. It’s probably better today, and anyone compiling against a Glk library with these extension functions would no doubt be using a modern compiler. But ASCII/Latin-1 is still a bit simpler.

UTF-8 could even be an option, as for most CSS values there’d be no difference between ASCII and UTF-8, but the rest of Glk doesn’t use any UTF-8 so there’d be a big difference between these functions’ char *buf and the rest of the Glk API, so I didn’t consider it. But it’s an option too.

We could add UTF-8 en/decoding functions to Glk to make it easier for Inform.
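
As a rough idea of what the encoding half of such a helper might involve, here is a minimal sketch that converts a UTF-32 buffer to UTF-8. The function name utf32_to_utf8 and its signature are assumptions made for illustration, not an existing Glk call, and replacing invalid code points with U+FFFD is just one possible policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not an existing Glk function.
   Encodes len UTF-32 code points from src as UTF-8 into dst, which has
   room for cap bytes. Returns the number of bytes the complete encoding
   needs, so a result greater than cap means the output was truncated.
   Never writes more than cap bytes and never writes a partial sequence.
   Surrogates and values above U+10FFFF are replaced with U+FFFD. */
static size_t utf32_to_utf8(const uint32_t *src, size_t len,
                            char *dst, size_t cap)
{
    size_t out = 0;
    int full = 0;  /* set once a sequence no longer fits in dst */

    assert(src != NULL || len == 0);
    assert(dst != NULL || cap == 0);

    for (size_t i = 0; i < len; i++) {
        uint32_t c = src[i];
        unsigned char b[4];
        size_t n;

        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            c = 0xFFFD;

        if (c < 0x80) {
            b[0] = (unsigned char)c;
            n = 1;
        } else if (c < 0x800) {
            b[0] = (unsigned char)(0xC0 | (c >> 6));
            b[1] = (unsigned char)(0x80 | (c & 0x3F));
            n = 2;
        } else if (c < 0x10000) {
            b[0] = (unsigned char)(0xE0 | (c >> 12));
            b[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            b[2] = (unsigned char)(0x80 | (c & 0x3F));
            n = 3;
        } else {
            b[0] = (unsigned char)(0xF0 | (c >> 18));
            b[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            b[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            b[3] = (unsigned char)(0x80 | (c & 0x3F));
            n = 4;
        }

        if (!full && out + n <= cap)
            memcpy(dst + out, b, n);
        else
            full = 1;  /* keep counting, but stop writing */
        out += n;
    }
    return out;
}
```

Returning the required size rather than the written size lets a caller pass a cap of 0 first to measure, then allocate and call again.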

cas · December 27, 2022, 12:42am

I think this is the best approach, though that’s from a “lazy unix developer” standpoint. On the Unix side of things, at least, UTF-8 is effectively the de facto standard for text encoding. Like you note, it’s got the advantage of transparently working with ASCII. Combined with its ubiquity, I don’t have any problems putting the onus on the API user to convert from a “weird” encoding to UTF-8, given that what they have will probably be compatible directly anyway.

Character set encoding/decoding in C is still less than an afterthought. There’s no standard facility for managing character sets, and the standard doesn’t impose any requirements at all (on specific character sets). So unfortunately, C’s still basically useless (by itself) when it comes to Unicode. What you really have to do is just take a char * and tell the user what encoding you expect. And you may as well tell them it’s UTF-8, as it’s the best we have, IMO.
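
If the API did simply take a char * and declare it to be UTF-8, a library would probably still want to check that claim before handing the text to a CSS engine. Here is a minimal sketch of such a check; utf8_is_valid is a made-up name, not something from Glk or the C standard library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. Returns 1 if the null-terminated string s is
   well-formed UTF-8, otherwise 0. Rejects stray continuation bytes,
   truncated sequences, overlong forms, surrogates, and code points
   above U+10FFFF. */
static int utf8_is_valid(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    assert(s != NULL);
    while (*p != 0) {
        unsigned long cp;   /* decoded code point */
        unsigned long min;  /* smallest value allowed for this length */
        int need;           /* continuation bytes still expected */

        if (*p < 0x80) {            /* plain ASCII byte */
            p++;
            continue;
        } else if ((*p & 0xE0) == 0xC0) {
            need = 1; cp = *p & 0x1F; min = 0x80;
        } else if ((*p & 0xF0) == 0xE0) {
            need = 2; cp = *p & 0x0F; min = 0x800;
        } else if ((*p & 0xF8) == 0xF0) {
            need = 3; cp = *p & 0x07; min = 0x10000;
        } else {
            return 0;               /* stray continuation or invalid lead */
        }
        p++;

        for (; need > 0; need--) {
            /* The terminator fails this test too, so a truncated
               sequence at the end of the string is rejected. */
            if ((*p & 0xC0) != 0x80)
                return 0;
            cp = (cp << 6) | (unsigned long)(*p & 0x3F);
            p++;
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
    }
    return 1;
}
```

Pure ASCII input takes the first branch on every byte, which reflects the point made above: most CSS text that callers already have would pass through unchanged.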

I would have zero problems with this.