Babel for HTML

Dan Fabulich pointed out that the current babel tool returns “twine” as the format of Twine files, which isn’t really right. The doc says that the <format> tag (and the output of babel -format) should describe the file format, not the dev tool. For a web-playable HTML file, that should be “html”. (An old thread discussing this is here: HTML in Babel)

Changing this means turning the Twine-parsing module in babel into an HTML parsing tool. However, that means we should think about how a general HTML IF game should express its IFID. Twine games have a <tw-storydata ifid="..."> tag, but it doesn’t make sense to demand that every HTML game use a Twine-specific tag like that.

Dan’s suggestion is straightforward: put a <meta> tag in the header.

<meta property="ifiction:ifid" content="...">

This is similar to the way web pages declare bibliographic info like the title:

<meta property="og:title" content="...">

(There’s some additional declaration you need to establish the ifiction: prefix, but I won’t get into it here.)

The idea is that ChoiceScript games, hand-rolled HTML IF, and so on could provide this <meta> tag as an aid to bibliographers. Future releases of Twine tools would also add this tag. (Twine should continue to use <tw-storydata> as it does now, too – no reason to give that up.)

Make sense?


That sort of makes sense, but it doesn’t help with the huge number of existing files that are out there.

I’m also interested in the extraction of the IFID from Adventuron games. All properly written Adventuron games have an IFID, but it is compressed/scrambled/encoded in some way so that you can’t see it in the compiled html file. Would it be possible to add extraction of the Adventuron IFID to the current babel tool? I’m sure @adventuron could provide details on how to find and decode it.


I’ve also seen HTML games that just put the UUID:// ... // format IFID in an HTML comment, which has the virtue (I think?) of working with the current tooling.

Edit: Ah, looks like it’s Tweego which does that (though of course it also has the attribute on tw-storydata).

The general rule for legacy games (no IFID included) is that you compute the MD5 checksum and use “HTML-checksum” as the IFID. We were already doing this for old Twine games. (Although there it was “TWINE-checksum”, which we’re proposing to change. That’s a slight inconsistency, but oh well; it only applied for part of this past year.)
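For illustration, here’s a minimal Python sketch of that legacy fallback. The real babel tool is written in C, and the function name below is hypothetical. It assumes the checksum is written as 32 upper-case hex digits, which is worth checking against the Treaty text.

```python
import hashlib


def legacy_html_ifid(data: bytes) -> str:
    """Hypothetical helper: IFID for an HTML game with no embedded IFID.

    'HTML-' followed by the MD5 checksum of the entire file.
    Assumption: the checksum is written as upper-case hex digits.
    """
    return "HTML-" + hashlib.md5(data).hexdigest().upper()


if __name__ == "__main__":
    print(legacy_html_ifid(b"<html></html>"))
```

Because the checksum covers the whole file, any change to the HTML (even whitespace) produces a different legacy IFID.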

Good point. We can check for that too, but I think <meta> is a better policy going forward.


Link me a game that does that, please?

As I said in my edit, it seems to just be a thing that Tweego does. Trying to think whether I know of anyone who’s made actual games with it… Oh, there’s my little thing for Fortnightly Fiction Jam #8. But itch puts things in iframes, so they’re a nuisance to download. Let me put the actual HTML file somewhere you can get it directly.

This is the unofficial advice that ifwiki gives about what to do.

For example, in an Undum, ChoiceScript, or other Web browser game, you may insert

<!-- UUID://XXXXXXXXXXXXXXXX// -->

as a comment anywhere in your HTML file.

By putting UUID:// in front of your IFID, and // after your IFID, you make it possible for automated tools like Babel to find it.

https://ifwiki.org/index.php/IFID
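A tool can pick up that marker with a plain text search, without parsing the HTML at all. Here is a minimal Python sketch. The pattern is an assumption (it accepts letters, digits, and hyphens between the markers), and the function name is hypothetical.

```python
import re

# Assumption: the IFID between the markers is letters, digits and hyphens.
UUID_MARKER = re.compile(r"UUID://([0-9A-Za-z-]+)//")


def find_uuid_marker(text: str):
    """Return the first IFID wrapped in UUID://...//, or None if absent.

    This is a plain text search, so the marker is found whether it sits in
    an HTML comment, a JavaScript string, or anywhere else in the file.
    """
    match = UUID_MARKER.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    page = "<html><!-- UUID://448E73DF-2D2F-47E7-A494-A46B40D4CFB3// --></html>"
    print(find_uuid_marker(page))  # 448E73DF-2D2F-47E7-A494-A46B40D4CFB3
```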


So, if I get this right, the UUID is created by calculating an MD5 hash of the released game. There may be collisions. And there’s only one ID per game, regardless of version.

With ISBNs, even the same book in a different format gets a different number, let alone a different version.

Perhaps a second identifying number, hashing together author, publisher, format, release date + version, platform, tool + version, and library + version?

No. Old IFIDs are calculated with an MD5 hash; new IFIDs are random UUIDs. MD5 collisions are possible, but it would be extremely difficult to come up with one that is still a valid story file. Random UUIDs have such a large numerical space (122 random bits) that it’s almost impossible for the same one to be chosen twice.
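To put numbers on that, here’s a small Python sketch using the standard uuid module. The upper-casing just matches the IFIDs quoted in this thread, and the collision estimate is the usual birthday-bound approximation, not anything specified by Babel.

```python
import math
import uuid

# A version-4 UUID has 128 bits; 4 version bits and 2 variant bits are fixed.
RANDOM_BITS = 128 - 4 - 2


def new_ifid() -> str:
    """Generate a random (version 4) UUID, upper-cased like the IFIDs above."""
    return str(uuid.uuid4()).upper()


def collision_probability(n: int) -> float:
    """Birthday-bound estimate of the chance that n random v4 UUIDs
    include at least one duplicate: 1 - exp(-n(n-1) / (2 * 2**122))."""
    space = 2 ** RANDOM_BITS
    # expm1 keeps precision when the probability is tiny
    return -math.expm1(-n * (n - 1) / (2 * space))


if __name__ == "__main__":
    print(new_ifid())
    print(collision_probability(10**9))  # roughly 9.4e-20
```

Even a billion randomly generated IFIDs leave the chance of any duplicate far below anything worth worrying about.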


If this gets officially changed/decided, I’d be interested to know…we should probably update the advice on ifwiki to reflect whatever the new guidance is.


The UUID://XXXXXXXXXXXXXXXX// plan (embedding that string literally in the game file) is used by Glulx, Z-code, Hugo, and Alan. It makes sense to permit it as a general fallback plan for any game file, including future formats.

(This is not meant to change the original proposal above.)

Wow, let’s not add any more identifiers!

(We already use TUID, IFID, MD5 file checksum, and IFArchive pathname in different parts of the existing ecosystem.)


Please let me know when a global universal identifier is available. And how to calculate it. Thanks.

Yes, it’s an old xkcd comic: xkcd: Standards


I’ve added two sections to the Babel draft doc:


The IFID for an HTML story file

A number of design systems generate output in HTML format, including
Twine, ChoiceScript, Adventuron, Ink, Texture, and others.

Design systems may integrate an IFID into the output HTML by adding a
<meta> tag to the <head> section of the output:

<meta property="ifiction:ifid" content="448E73DF-2D2F-47E7-A494-A46B40D4CFB3">

(If the game comprises several HTML files, apply this to the start file.)

You may optionally include an RDFa prefix or XML xmlns for this meta
tag, ensuring that your HTML will be valid RDFa. This is not required.
Some examples of this (other arrangements are possible):

<meta prefix="ifiction: http://babel.ifarchive.org/protocol/iFiction/"
	property="ifiction:ifid" content="448E73DF-2D2F-47E7-A494-A46B40D4CFB3">

<html xmlns:ifiction="http://babel.ifarchive.org/protocol/iFiction/">
	<head>
		<meta property="ifiction:ifid" content="448E73DF-2D2F-47E7-A494-A46B40D4CFB3">
	</head>
</html>
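As a rough illustration of the reader side, this Python sketch uses the standard html.parser module to pull the IFID out of such a tag. It is not Babel’s code, and the class and function names are hypothetical. It matches the literal string ifiction:ifid and ignores any prefix or xmlns declaration, so both examples above yield the same IFID.

```python
from html.parser import HTMLParser


class IfidMetaFinder(HTMLParser):
    """Remembers the content of the first <meta property="ifiction:ifid">.

    Sketch only: matches the literal property value, ignores prefix/xmlns
    declarations (no real RDFa prefix resolution), and does not check that
    the tag sits inside <head>.
    """

    def __init__(self):
        super().__init__()
        self.ifid = None

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us
        if tag != "meta" or self.ifid is not None:
            return
        attributes = dict(attrs)
        if attributes.get("property") == "ifiction:ifid" and attributes.get("content"):
            self.ifid = attributes["content"].strip()


def find_meta_ifid(html: str):
    """Return the IFID from the ifiction:ifid <meta> tag, or None."""
    finder = IfidMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.ifid


if __name__ == "__main__":
    page = ('<html xmlns:ifiction="http://babel.ifarchive.org/protocol/iFiction/">'
            '<head><meta property="ifiction:ifid" '
            'content="448E73DF-2D2F-47E7-A494-A46B40D4CFB3"></head></html>')
    print(find_meta_ifid(page))
```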


The IFID for a legacy HTML story file

HTML games that lack the <meta> tag described above may include the
text UUID://...// in a literal string or comment in the HTML.

Older Twine games may incorporate an IFID in a <tw-storydata> tag in
the HTML:

<tw-storydata name="Title" creator="Twine" ifid="8665FC08-15CD-4BEC-B15A-7F72E34F4F51" ...>

Otherwise, the IFID for a legacy HTML story file is “HTML-” followed by
the MD5 checksum of the file.
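Putting the two sections together, a lookup might try each source in turn. This is a hypothetical Python sketch, not the code in the Babel branch. The draft doesn’t say how to rank a UUID://...// marker against a <tw-storydata> attribute, so checking the marker first is an assumption, as are the marker’s character set and the upper-case checksum.

```python
import hashlib
import re
from html.parser import HTMLParser

# Assumption: the IFID between the markers is letters, digits and hyphens.
UUID_MARKER = re.compile(r"UUID://([0-9A-Za-z-]+)//")


class _IfidTags(HTMLParser):
    """One pass over the HTML, remembering the first ifiction:ifid <meta>
    content and the first <tw-storydata ifid="..."> value."""

    def __init__(self):
        super().__init__()
        self.meta_ifid = None
        self.twine_ifid = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and self.meta_ifid is None:
            if a.get("property") == "ifiction:ifid" and a.get("content"):
                self.meta_ifid = a["content"].strip()
        elif tag == "tw-storydata" and self.twine_ifid is None:
            if a.get("ifid"):
                self.twine_ifid = a["ifid"].strip()


def html_ifid(data: bytes) -> str:
    """Hypothetical lookup: <meta>, then UUID marker, then Twine, then MD5."""
    text = data.decode("utf-8", errors="replace")
    tags = _IfidTags()
    tags.feed(text)
    tags.close()
    if tags.meta_ifid:
        return tags.meta_ifid
    marker = UUID_MARKER.search(text)
    if marker:
        return marker.group(1)
    if tags.twine_ifid:
        return tags.twine_ifid
    # Legacy fallback: "HTML-" plus the MD5 of the raw file bytes.
    return "HTML-" + hashlib.md5(data).hexdigest().upper()
```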


Here’s a branch for the Babel tool which handles all these possibilities:

Thanks to Dan F for getting these changes started.


What if we only standardize on the UUID://...// syntax, allowing authors to embed it anywhere in a file? The Treaty could recommend that HTML authors put it in a <meta property="ifiction:ifid"> tag like this:

<meta property="ifiction:ifid" content="UUID://448E73DF-2D2F-47E7-A494-A46B40D4CFB3//">

The Babel tool could just search for the UUID://...// syntax in HTML files, allowing the UUID://...// to appear anywhere, including in a comment.

This would have the benefit of getting us out of the business of parsing HTML by regex! (Except for <tw-storydata>, but that’s a particularly trivial case to parse, and we have working code already.)

(Also, your draft version of the Treaty proposes that the UUID://...// syntax would be valid for HTML games, but upthread, you were musing that perhaps it would/should be supported for all story files. If so, do we wanna amend the Treaty to say that?)

I don’t think that xmlns is right in this case.

xmlns was the XHTML way to add new tags and attributes to XHTML, and it might make sense to use it if we were asking people to use a colon-prefixed custom tag (like <ifiction:ifid>) or a custom attribute (<meta ifiction:ifid="xxxx">). But xmlns is irrelevant in our case, because the <meta> tag is already intended to carry custom metadata in its attribute values, like this:

<meta name="whatever you like" content="custom value">

Now, the subtlety here is that there are two competing ways to put custom metadata in <meta> tags. The WHATWG HTML standard recommends that you just make up a custom name attribute, like <meta name="ifiction.ifid">, but W3C recommends RDFa, in which you use a property attribute instead of a name attribute, with a colon-prefixed value, like <meta property="ifiction:ifid">. RDFa then adds a prefix attribute that associates the ifiction: prefix with a URL.

RDFa’s prefix attribute is philosophically similar to xmlns, but since prefix is namespacing the value of an attribute, xmlns doesn’t actually apply.

(Note that WHATWG, the group that maintains the HTML standard, has been feuding with W3C for at least 10 years; the two groups aren’t talking to each other, and they even contradict one another. The RDFa property and prefix attributes, as well as vocab, resource, and typeof, appear in a W3C standard, https://www.w3.org/TR/rdfa-lite/, but none of the RDFa attributes are even documented in the WHATWG HTML standard, https://html.spec.whatwg.org/multipage/, or on MDN. Also, RDFa is intended to apply to any element in the file, kinda like microdata, but HTML already has the microdata attributes itemscope, itemprop, and itemtype.)

When I wrote my initial proposal, I picked RDFa because that’s what Facebook used for their Open Graph tags https://ogp.me/

<html prefix="og: https://ogp.me/ns#">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />

It seemed easy and lightweight to me, especially since the prefix attribute is optional.

Regardless, I claim that the prefix should use the https:// version of the URL, https://babel.ifarchive.org/protocol/iFiction/, and not the http:// version, http://babel.ifarchive.org/protocol/iFiction/.

(Many major historical xmlns values used http://, e.g. xmlns="http://www.w3.org/1999/xhtml", but I claim that’s because they were defined in the 20th century. In hindsight, they should have used https:// even then.)


What I figure is that web pages are an important special case. Any kind of web cataloging or indexing system is going to be geared to use RDFa, so we should support that first and foremost.

(The babel tool is almost a corner case, really. Does anybody use it besides David Kinder when he’s filing ifarchive uploads? This is why I’m not too concerned about the ickiness of C-code HTML parsing!)

It would make sense. But I haven’t worked up either the doc change or the code change to make it happen.

I admit that I just polled some web sites like IMDB and looked at what they were doing!

I feel like we can’t really dictate how people template their web sites, especially if there are feuding standards. Maybe it’s not useful to give examples at all, but I don’t want to give the impression that people should omit the prefix entirely.

The http URL is in the original Babel doc and I can’t see any reason to change it now. Nothing about the process involves fetching that URL; it’s just a unique string, so there’s no added security to the https version. Changing it just means having two different prefixes in common use. I see that the OpenGraph people changed from http://opengraphprotocol.org/schema/ to https://ogp.me/ns#, which must have been some kind of headache.
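Here is why two prefixes would be a headache. An RDFa processor expands ifiction:ifid into a full IRI by concatenating the declared prefix IRI with the part after the colon, so the http and https declarations produce different IRIs, and consumers would see two distinct properties. Below is a minimal Python sketch of that expansion. The helper names are hypothetical, and real RDFa processing has more rules, such as default vocabularies, than this handles.

```python
def parse_prefix_attr(value: str) -> dict:
    """Parse an RDFa-style prefix attribute, e.g.
    'ifiction: http://babel.ifarchive.org/protocol/iFiction/',
    into {'ifiction': 'http://babel.ifarchive.org/protocol/iFiction/'}.
    Minimal sketch: assumes well-formed 'name: IRI' pairs."""
    tokens = value.split()
    return {name.rstrip(":"): iri for name, iri in zip(tokens[0::2], tokens[1::2])}


def expand_curie(curie: str, prefixes: dict) -> str:
    """Expand a compact URI such as 'ifiction:ifid' by concatenating the
    prefix's IRI with the part after the colon."""
    prefix, _, reference = curie.partition(":")
    return prefixes[prefix] + reference


if __name__ == "__main__":
    old = parse_prefix_attr("ifiction: http://babel.ifarchive.org/protocol/iFiction/")
    new = parse_prefix_attr("ifiction: https://babel.ifarchive.org/protocol/iFiction/")
    print(expand_curie("ifiction:ifid", old))
    print(expand_curie("ifiction:ifid", new))
    # The two IRIs differ, so an RDFa consumer treats them as different properties.
```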


That makes sense, and so asking RDFa crawlers to manually strip the UUID://...// off seems like a hassle.

I’ve filed a PR https://github.com/iftechfoundation/ifarchive-if-specs/pull/3 to remove the xmlns remark from the treaty. Since we’re going with RDFa, the prefix is the (only) thing to use.

Otherwise, I think https://github.com/iftechfoundation/babel-tool/pull/25 is ready to merge; once the Treaty PR merges, I think it’s ready to ship.

I tried to use an RDFa parser to verify that the HTML files are correct, but I couldn’t get it to work.

I’ve probably just set something up wrong. I will try again later.

Seems fine to me. I just went to https://github.com/iftechfoundation/babel-tool/blob/twine-html/test/html/Test-Game-Meta-Prefix.html and uploaded it to https://www.w3.org/2012/pyRdfa/Validator.html and it passed.

The Python one that I installed (pyRdfa3) refuses to return anything, even on the sample files shown at https://www.w3.org/TR/html-rdfa/. However, the validator you linked to is more helpful. I also tested with a browser extension (https://osds.openlinksw.com/). So I am reassured.

They all accept the xmlns:ifiction=... syntax, by the way, but mark it as deprecated.