There’s a nice Python library called unidecode that converts Unicode text to ASCII, including approximate romanization of non-Latin scripts (so, e.g., 北京 becomes “Bei Jing”, and فارسی, “farsi”, becomes “frsy”).
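Here is a minimal sketch of calling unidecode from Python, assuming the package is installed (`pip install Unidecode`). The wrapper name `to_ascii` and the accent-stripping fallback are hypothetical additions so the snippet still runs without the package. The fallback uses the standard library's `unicodedata.normalize` and only handles accented Latin letters, so it drops scripts like Chinese or Arabic instead of romanizing them.

```python
import unicodedata

try:
    # Third-party package: pip install Unidecode (module name is lowercase)
    from unidecode import unidecode
except ImportError:  # hypothetical fallback so the sketch runs without the package
    unidecode = None


def to_ascii(text: str) -> str:
    """Return an ASCII approximation of `text`.

    With unidecode installed, non-Latin scripts are romanized as well.
    Without it, this only strips accents via NFKD decomposition and drops
    anything that still isn't ASCII (so 北京 would simply disappear).
    """
    if unidecode is not None:
        return unidecode(text)
    decomposed = unicodedata.normalize("NFKD", text)
    # After NFKD, accents are separate combining code points; encoding with
    # errors="ignore" discards them along with any other non-ASCII characters.
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    for word in ["café", "北京", "فارسی", "Αθηνά"]:
        print(word, "->", repr(to_ascii(word)))
```

The fallback is much weaker than unidecode itself; it exists only to keep the example self-contained.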
Man, I am an idiot. Not that long ago I spent a week struggling with this exact problem. I don’t know why I didn’t connect the dots and realize this was what you were talking about.
Anyway, I sincerely appreciate you going to the effort. Something in my brain just wasn’t clicking.
I like Hanon’s suggestion of keeping a fixed number, with the oldest rolling off once that number is reached.
I think at least some interpreters continue transcribing across a restore (I spot-checked Frotz and Lectrote), so I don’t think doing the right thing has to be a problem. Open/close would probably be enough, maybe with the story name and date/time in the filename.
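A rough sketch of the scheme discussed above: one transcript file per session, named with the story name and date/time, keeping only the newest N per story. The limit of 10, the filename layout, and the helper names (`transcript_path`, `prune_transcripts`) are all hypothetical; Frotz, Lectrote, or any other interpreter may handle this differently.

```python
import re
import tempfile
from datetime import datetime
from pathlib import Path
from typing import List, Optional

# Hypothetical cap; the suggestion was "a fixed number" without naming one.
MAX_TRANSCRIPTS = 10


def _safe_name(story_name: str) -> str:
    """Reduce a story name to filename-safe characters.

    Hyphens are replaced too, so the first '-' in a filename always
    separates the story name from the timestamp.
    """
    return re.sub(r"[^A-Za-z0-9]+", "_", story_name).strip("_") or "story"


def transcript_path(directory: Path, story_name: str,
                    when: Optional[datetime] = None) -> Path:
    """Build a path like 'Zork_I-20240131-154500.txt'.

    The zero-padded timestamp sorts lexicographically in chronological
    order. Two sessions started in the same second would share a name.
    """
    when = when or datetime.now()
    return directory / f"{_safe_name(story_name)}-{when:%Y%m%d-%H%M%S}.txt"


def prune_transcripts(directory: Path, story_name: str,
                      keep: int = MAX_TRANSCRIPTS) -> List[Path]:
    """Delete the oldest transcripts for a story so at most `keep` remain.

    Returns the paths that were removed.
    """
    files = sorted(directory.glob(f"{_safe_name(story_name)}-*.txt"))
    doomed = files[:-keep] if keep > 0 else files
    for path in doomed:
        path.unlink()
    return doomed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        # Simulate twelve sessions an hour apart, then roll off the oldest.
        for hour in range(12):
            transcript_path(d, "Zork I", datetime(2024, 1, 31, hour)).write_text("> look\n")
        removed = prune_transcripts(d, "Zork I", keep=10)
        print("removed:", [p.name for p in removed])
```

Putting the timestamp in the filename means a plain sort is enough to find the oldest files, with no need to rely on filesystem modification times.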
Unidecode as a Firefox extension would be a godsend for wiki articles on subjects with names of Greek, Cyrillic, Arabic, Hebrew, or East Asian origin, and for browsing AO3. So many fandom tags put the original-language title before the romanized and official English titles (thank the patron saint of fanfiction that characters are only tagged by, at most, romanized name and dub name). Oh, and did I mention that my screen reader announces which foreign alphabet each character comes from, character by character? So the five-letter Arabic word above is read as “Arabic” followed by the name of the letter, five times over. Even if I knew the Arabic alphabet, I wouldn’t be able to pick out the letter names among all the repeats of the word “Arabic” (thank Athena it doesn’t read Greek letters with the language identifier)… And as annoying as the thorn for unrecognized characters is in the console, the way unrecognized characters are handled in the GUI is worse: it reads out the entire Unicode code point in hexadecimal. At least it isn’t decimal; decimal numbers get super wordy really quickly, whereas hexadecimal is at least kept short by being read digit by digit.
And no worries, Jess; I’ve had my share of moments where I’ve forgotten something that should have been obvious given my experience.