Extracting IFID from Hugo story file

The current Hugo detection code in Babel-tool looks for the plaintext string “UUID://”, which seems a bit futile, since there don’t seem to be any existing Hugo games that contain such a string.

However, most recent Hugo games actually do contain an IFID string. For example, here is what the story file of Storm over London (storm.hex) contains at offset 0x15F83:

B7142FD1-4BE4-4BE8-A3BF-E854D90F3540Hugo v3.1 / Library 31031v3.1itandarehereinison  RoodyLib Version 4.04.0[ press any key to exit ]waswerethereThe mansion was old, built perhaps in the 19th century. It seemed to be relatively well-kept, not counting a few missing roof tiles, rusty rain gutters about to fall apart and the need for a new layer of paint.gagaingowalkintoinsidethroughoffouttotowardtowardsoutsidelooklexaminexwatch

Of course, to read that, you have to subtract 20 from every character code to reverse the Hugo text obfuscation. Anyway, if you know which IFID you’re looking for, it is easy to find. But what if we don’t know the IFID? What if we are trying to extract it from the story file, the way Babel-tool is supposed to work?
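A minimal sketch of that decoding step in C, assuming the obfuscation is nothing more than the per-byte shift of 20 described above. HUGO_TEXT_SHIFT and hugo_decode are names made up for this example, not existing babel-tool code:

```c
#include <assert.h>
#include <stddef.h>

/* Offset Hugo adds to each character of obfuscated text (assumption:
   a plain per-byte shift, as described above, with nothing else). */
#define HUGO_TEXT_SHIFT 20

/* Decode n bytes of obfuscated Hugo text from src into dst.
   dst must have room for n + 1 bytes; a NUL is appended for convenience.
   The subtraction wraps modulo 256 because of the unsigned char cast. */
void hugo_decode(const unsigned char *src, size_t n, char *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = (char)(unsigned char)(src[i] - HUGO_TEXT_SHIFT);
    dst[n] = '\0';
}
```

A source byte with value 20 decodes to NUL, so the output is best treated as an n-byte buffer rather than a C string; the trailing NUL is only there for convenience.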

Is there a way to know beforehand where to look? It is at a different offset in every game, and the surrounding text is always different.

Or do we have to look through the entire file for a sequence of between 8 and 63 characters, each of which shall be a digit, a capital letter or a hyphen?
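For comparison, the broad search from the question might look roughly like this in C, working on text that has already been decoded. find_ifid_run is a hypothetical helper; the 8 to 63 length limit and the character set follow the rule quoted above:

```c
#include <assert.h>
#include <stddef.h>

/* Characters allowed in an IFID: digits, capital letters and hyphens. */
static int is_ifid_char(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || c == '-';
}

/* Find the first run of 8..63 IFID-legal characters in a decoded buffer.
   Returns the run's start offset and stores its length in *len,
   or returns -1 if there is none. Longer runs are skipped entirely. */
long find_ifid_run(const char *buf, size_t n, size_t *len)
{
    assert(buf != NULL && len != NULL);
    size_t i = 0;
    while (i < n) {
        if (!is_ifid_char((unsigned char)buf[i])) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < n && is_ifid_char((unsigned char)buf[i]))
            i++;
        size_t run = i - start;
        if (run >= 8 && run <= 63) {
            *len = run;
            return (long)start;
        }
    }
    return -1;
}
```

On the Storm over London excerpt, such a run does not stop at the end of the IFID: it continues into the “H” of “Hugo”, giving 37 characters instead of 36. That is one argument for matching a stricter pattern.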

Does anyone have a good idea of how to implement this search? As far as I can tell, all Hugo IFIDs are 36 characters long, in the form XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX (8-4-4-4-12), all upper case, which should make things a little easier.


I have no special insight into the Hugo file structure, but from your description it sounds like Hugo uses a UUID for its IFID. (Perfectly reasonable: that’s a prime example of what UUIDs were designed for.) If so, we can narrow the search criteria further: not just any capital letter, but only A-F (since we’re looking at hex digits), and the hyphens are in fixed positions.
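With those restrictions, checking a single candidate position is cheap. A sketch in C, assuming upper-case hex only (as in the Hugo IFIDs seen so far) and not checking the UUID version or variant bits; is_uuid_at is an illustrative name:

```c
#include <assert.h>
#include <stddef.h>

#define UUID_LEN 36

/* Upper-case hexadecimal digit. */
static int is_upper_hex(unsigned char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F');
}

/* Return 1 if the 36 bytes at p have the form
   XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX (8-4-4-4-12), else 0.
   The caller must guarantee that 36 bytes are readable at p. */
int is_uuid_at(const char *p)
{
    assert(p != NULL);
    for (int i = 0; i < UUID_LEN; i++) {
        unsigned char c = (unsigned char)p[i];
        if (i == 8 || i == 13 || i == 18 || i == 23) {
            if (c != '-')          /* hyphens only at fixed positions */
                return 0;
        } else if (!is_upper_hex(c)) {
            return 0;
        }
    }
    return 1;
}
```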

My intuition says Hugo files shouldn’t be so huge that scanning for these would be a painful task. Let me see if I can throw together a quick proof of concept in Python.

Yes, that is what my question should have been:
a) Is there a way to know where in the Hugo file structure the IFID (in UUID format) is stored?
If not (which seems likely, now that I think about it)
b) what is a good way to search through the entire file for it?

Apparently, I’m not the first one with the second problem. Here are a number of solutions using regex: Searching for UUIDs in text with regex - Stack Overflow

But preferably, the algorithm should be in plain C without any dependencies, so that it can be easily incorporated into the babel-tool.

Ah, gotcha. Makes sense. My C is so minimal that I’ve already grossly overrepresented it by calling it “minimal,” but I guess there’s no reason to think that C has a UUID library, UUIDs being newer than C itself, and it looks like there’s no standard regular expression library in C either; C++ only got one with C++11. I have a half-written implementation in Python that uses regular expressions, which doesn’t help here.

I guess you could read the file byte by byte. (a) If you’re in the middle of building a match, check whether the current character continues the UUID pattern; if it can’t, discard the current attempt and resume at the character after the one where that attempt started. (b) If you’re not currently building a match, check whether this character could plausibly be the first character of a UUID, and if so, store it and start checking whether the following characters continue the pattern.
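Since a UUID has a fixed length, that approach reduces to trying every start offset and abandoning an attempt at the first character that doesn’t fit. A dependency-free C sketch of the idea, decoding on the fly with the subtract-20 rule; find_hugo_uuid and its signature are hypothetical, not existing babel-tool code:

```c
#include <assert.h>
#include <stddef.h>

#define HUGO_TEXT_SHIFT 20  /* per-byte obfuscation offset described above */
#define UUID_LEN 36

/* Undo the obfuscation of one byte (wraps modulo 256). */
static unsigned char decode(unsigned char b)
{
    return (unsigned char)(b - HUGO_TEXT_SHIFT);
}

/* Does decoded character c fit position i of an upper-case
   8-4-4-4-12 UUID? */
static int fits(unsigned char c, size_t i)
{
    if (i == 8 || i == 13 || i == 18 || i == 23)
        return c == '-';
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F');
}

/* Scan raw story-file bytes for the first obfuscated UUID.
   On success, copy the decoded UUID (NUL-terminated) into out and
   return its offset in the file; return -1 if none is found. */
long find_hugo_uuid(const unsigned char *data, size_t n, char out[UUID_LEN + 1])
{
    assert(data != NULL && out != NULL);
    for (size_t start = 0; start + UUID_LEN <= n; start++) {
        size_t i = 0;
        /* Extend the current match attempt one character at a time. */
        while (i < UUID_LEN && fits(decode(data[start + i]), i))
            i++;
        if (i == UUID_LEN) {
            for (i = 0; i < UUID_LEN; i++)
                out[i] = (char)decode(data[start + i]);
            out[UUID_LEN] = '\0';
            return (long)start;
        }
        /* Mismatch: abandon this attempt and retry from start + 1. */
    }
    return -1;
}
```

The caller would read the whole story file into memory (for example with fopen/fread) and pass the buffer in. Each start offset costs at most 36 comparisons, and most attempts fail on the first character. The sketch deliberately doesn’t require a word boundary after the match: in the excerpt above the UUID runs straight into “Hugo”, so such a check would reject it. It returns the first match only.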

This might be painful on large files, but the largest Hugo file on my hard drive is Fallacy of Dawn, which is less than a megabyte, so it shouldn’t be prohibitively slow on a reasonably fast computer.

Alternatively, if the Babel tool only ever needs to run on Unix-like computers, it might be easier to just use the standard Unix regex tools.
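On Unix-like systems the same search can be handed to the POSIX regex library (<regex.h>), which is not part of standard C and so gives up the no-dependencies goal. A sketch, assuming the program runs in the default “C” locale so that [A-F] means exactly those six letters; regex_find_uuid is a made-up name:

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdlib.h>

#define HUGO_TEXT_SHIFT 20  /* per-byte obfuscation offset described above */

/* Decode the story file into a NUL-terminated string and search it with a
   POSIX extended regular expression. Returns the match offset and copies
   the UUID into out, or returns -1 on no match or error. */
long regex_find_uuid(const unsigned char *data, size_t n, char out[37])
{
    static const char pattern[] =
        "[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}";
    regex_t re;
    regmatch_t m;
    long result = -1;

    assert(data != NULL && out != NULL);
    char *text = malloc(n + 1);
    if (text == NULL)
        return -1;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)(data[i] - HUGO_TEXT_SHIFT);
        /* regexec() stops at NUL, so map decoded NULs to newlines. */
        text[i] = (c == '\0') ? '\n' : (char)c;
    }
    text[n] = '\0';

    if (regcomp(&re, pattern, REG_EXTENDED) == 0) {
        if (regexec(&re, text, 1, &m, 0) == 0) {
            for (int i = 0; i < 36; i++)   /* the pattern is always 36 chars */
                out[i] = text[m.rm_so + i];
            out[36] = '\0';
            result = (long)m.rm_so;
        }
        regfree(&re);
    }
    free(text);
    return result;
}
```

The NUL replacement matters because decoded story files can contain zero bytes well before the IFID, and regexec() would otherwise treat the first one as the end of the input.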

Never mind; it looks like there are at least some Hugo games that don’t use UUIDs as IFIDs. Just a couple from a quick scan through IFDB:

Sigh. So much for scanning for UUIDs.

I think it is still fine to scan for them. The Treaty of Babel distinguishes between the present IFID standard, which uses the UUID format, and legacy formats, which do not. For most formats, the IFID detection code in babel-tool first looks for a UUID and, if it doesn’t find one, generates a format-specific legacy IFID.

I think that a couple of IFIDs listed on IFDB might in fact be mistakes, entered by people who ran the current (non-functional) babel-tool on the Hugo story files and got a legacy IFID, even though a present-standard UUID was hidden in there. So if we get the UUID scan working, we can correct this.
