As the title says… I need an example game with a custom-defined Unicode translation table for testing a Z-machine disassembler.
I think the z5 versions of Garry’s PunyInform games typically (always?) use a copyright symbol. I’m not sure if that means it has to be in the Unicode translation table (this is probably the part of the Z-machine that I’m the least familiar with).
Does this count?
My ZDevtools includes an assembler which will add a Unicode table whenever you make use of non-ASCII/ZSCII characters. For example:
start
print "Dvořák"
quit
This’ll create a 2-entry Unicode table. I’ve attached the output of this if a single example is enough.
unicode.z5 (728 Bytes)
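A minimal sketch of how an assembler could build such a table. It assumes each distinct non-ASCII character gets a ZSCII code from 155 upward in order of first appearance. This is not ZDevtools' actual code, and `build_unicode_table` is a hypothetical name. The output uses the Unicode translation table layout: one count byte followed by one big-endian 16-bit Unicode value per entry.

```python
from __future__ import annotations

FIRST_EXTRA = 155   # first ZSCII code available for extra characters
MAX_ENTRIES = 97    # ZSCII 155..251


def build_unicode_table(text: str) -> tuple[dict[str, int], bytes]:
    """Hypothetical sketch: give each distinct non-ASCII character a ZSCII code
    (in order of first appearance) and serialize the Unicode translation table."""
    mapping: dict[str, int] = {}
    for ch in text:
        if ord(ch) > 0x7E and ch not in mapping:
            if len(mapping) == MAX_ENTRIES:
                raise ValueError("more than 97 extra characters")
            if ord(ch) > 0xFFFF:
                raise ValueError(f"{ch!r} does not fit in a 16-bit table entry")
            mapping[ch] = FIRST_EXTRA + len(mapping)
    table = bytearray([len(mapping)])          # count byte
    for ch in mapping:                         # dicts keep insertion order
        table += ord(ch).to_bytes(2, "big")    # one big-endian word per entry
    return mapping, bytes(table)


mapping, table = build_unicode_table("Dvořák")
print(mapping)           # {'ř': 155, 'á': 156}
print(table.hex(" "))    # 02 01 59 00 e1
```

For "Dvořák" this yields two entries, which fits the 2-entry table described above.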
That’s true, but I’ve been thinking of changing that.
The z5 version of The Mystery of Winchester High uses D@'un @'Eideann.
The z5 version of Search for the Lost Ark uses Andr@'e, resum@'e and sl@aegan.
I have no idea how or where these are stored. All I know is that it works. (The z3 versions use ASCII characters without diacritical marks.)
These characters are in the default Unicode table (see §3.8.7 of The Z-Machine Standards Document), so they don’t require a game-specific Unicode table. The copyright symbol isn’t, though.
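To see which characters actually force a game-specific table, here is a small sketch that compares text against the default table. The helper name is hypothetical; the 69 code points are the ZSCII 155 to 223 values from §3.8.7, copied in order from the decoded table later in this thread.

```python
from __future__ import annotations

# Default extra characters (§3.8.7), ZSCII 155 to 223 in order.
DEFAULT_UNICODE = (
    0xE4, 0xF6, 0xFC, 0xC4, 0xD6, 0xDC, 0xDF, 0xBB, 0xAB, 0xEB, 0xEF, 0xFF, 0xCB, 0xCF,
    0xE1, 0xE9, 0xED, 0xF3, 0xFA, 0xFD, 0xC1, 0xC9, 0xCD, 0xD3, 0xDA, 0xDD,
    0xE0, 0xE8, 0xEC, 0xF2, 0xF9, 0xC0, 0xC8, 0xCC, 0xD2, 0xD9,
    0xE2, 0xEA, 0xEE, 0xF4, 0xFB, 0xC2, 0xCA, 0xCE, 0xD4, 0xDB,
    0xE5, 0xC5, 0xF8, 0xD8, 0xE3, 0xF1, 0xF5, 0xC3, 0xD1, 0xD5,
    0xE6, 0xC6, 0xE7, 0xC7, 0xFE, 0xF0, 0xDE, 0xD0,
    0xA3, 0x153, 0x152, 0xA1, 0xBF,
)
DEFAULT_CHARS = {chr(cp) for cp in DEFAULT_UNICODE}


def chars_needing_custom_table(text: str) -> list[str]:
    """Return the distinct non-ASCII characters in text that the default table lacks."""
    missing: list[str] = []
    for ch in text:
        if ord(ch) > 0x7E and ch not in DEFAULT_CHARS and ch not in missing:
            missing.append(ch)
    return missing


print(chars_needing_custom_table("André, resumé, slægan"))  # []
print(chars_needing_custom_table("Dún Éideann"))            # []
print(chars_needing_custom_table("Copyright © 2023"))       # ['©']
```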
Thanks. As I said, I didn’t know where they were stored. I suspected they weren’t anything special.
This is the unz output for Search for the Lost Ark:
C:\Users\heasm\Downloads>unz search-for-the-lost-ark.z5 -m -i -u
***** ANALYZING *****
Filename: search-for-the-lost-ark.z5
Compiled With: Inform 6.41
Z-machine version: 5
Calculated checksum: 0x9EA3, checksum ok
IFID: UUID://20c07a3d-6486-4e59-a560-a606edecf9e0//
Object count: 151
Unique verbs count: 105
Grammar table version: 2
Verb action count: 113
Dictionary word count: 630
Scanning for routines from: 0x03EA8
Found first routine at address: 0x03F00
Lowest routine address (immediate call) : 0x03F64
Highest routine address (immediate call): 0x0B528
Lowest string address (immediate address): 0x0B59C
Strings start at address: 0x0B59C
Highest used global in z-code: 240
Number of used globals in z-code: 107
Number of unique properties: 26
***** MEMORY MAP *****
00000-02191 DYNAMIC MEMORY
00000-0003F Header table, 64 bytes.
00040-001F9 Abbreviation strings, 442 bytes.
001FA-002B9 Abbreviation table, 192 bytes.
002BA-002C1 Header extension table, 8 bytes.
002C2-0034F Unidentified data, 142 bytes.
00350-003CD Object defaults table, 126 bytes.
003CE-00C0F Object tree table, 2,114 bytes.
00C10-0192A Object properties tables, 3,355 bytes.
0192B-01B6E Unidentified data (Class, indiv. prop & symbol table), 580 bytes.
01B6F-01D4E Global variables, 480 bytes.
01D4F-01D7C IFID, 46 bytes.
01D7D-02190 Unidentified data (Arrays), 1,044 bytes.
02191-02191 Terminating characters table, 1 byte.
02192-03EFF STATIC MEMORY
02192-02263 Syntax/Grammar table, 210 bytes.
02264-02A0B Syntax/Grammar table data, 1,960 bytes.
02A0B-02AEC Action table, 226 bytes.
02AED-02AEE Preposition/Adjective table, 2 bytes.
02AEF-03EA5 Vocabulary/Dictionary, 5,047 bytes.
03EA6-03EFF Unidentified data (Static arrays), 90 bytes.
03F00-12D73 HIGH MEMORY
03F00-0B59B Z-code, 30,364 bytes.
0B59C-12D73 Static strings, 30,680 bytes.
***** HEADER (00000-0003F, 64 bytes) *****
00000 05 VERSION Z-machine version: 5
00001 00 MODE Flags 1: 0x00
00002 00 01 ZORKID Release number: 1
00004 3F 00 ENDLOD Base of high memory: 0x3F00
00006 3F 01 START Initial value of pc: 0x3F01
00008 2A EF VOCAB Dictionary: 0x2AEF
0000A 03 50 OBJECT Object table: 0x0350
0000C 1B 6F GLOBALS Global variables table: 0x1B6F
0000E 21 92 PURBOT Base of static memory: 0x2192
00010 00 50 FLAGS Flags 2: 0x0050
00012 32 33 30 36 32 39 SERIAL Serial number: 230629
00018 01 FA FWORDS Abbreviations table: 0x01FA
0001A 4B 5D PLENTH Length of file: 0x12D74
0001C 9E A3 PCHKSM Checksum of file: 0x9EA3
0001E 00 INTWRD Interpreter number: 0
0001F 00 Interpreter version: 0
00020 00 SCRWRD Screen height (lines): 0
00021 00 Screen width (chars): 0
00022 00 00 HWRD Screen width in units: 0x0000
00024 00 00 VWRD Screen width in units: 0x0000
00026 00 FWRD Font width/height: 0
00027 00 Font width/height: 0
00028 00 00 FOFF Routines offset: 0x0000
0002A 00 00 SOFF Static strings offset: 0x0000
0002C 00 CLRWRD Default backgr. color: 0
0002D 00 Default foregr. color: 0
0002E 21 91 TCHARS Terminating chars table: 0x2191
00030 00 00 TWID Output loc for DIROUT: 0x0000
00032 00 00 Standard revision number: 0x0000
00034 00 00 CHRSET Alphabet table address: 0x0000
00036 02 BA EXTAB Header extension address: 0x02BA
00038 00 00 00 00 36 2E 34 31 USRNM Username: ....6.41
***** HEADER EXTENSION TABLE (002BA-002C1, 8 bytes) *****
002BA 00 03 Number of further words: 3
002BC 00 00 MSLOCX X-coord of mouse after a click: 0x0000
002BE 00 00 MSLOCX Y-coord of mouse after a click: 0x0000
002C0 02 C2 Unicode tranlation table address: 0x02C2
***** UNIDENTIFIED DATA (002C2-0034F, 142 bytes) *****
002C0 46 00 E4 00 F6 00 FC 00 C4 00 D6 00 DC 00 F.............
002D0 DF 00 BB 00 AB 00 EB 00 EF 00 FF 00 CB 00 CF 00 ................
002E0 E1 00 E9 00 ED 00 F3 00 FA 00 FD 00 C1 00 C9 00 ................
002F0 CD 00 D3 00 DA 00 DD 00 E0 00 E8 00 EC 00 F2 00 ................
00300 F9 00 C0 00 C8 00 CC 00 D2 00 D9 00 E2 00 EA 00 ................
00310 EE 00 F4 00 FB 00 C2 00 CA 00 CE 00 D4 00 DB 00 ................
00320 E5 00 C5 00 F8 00 D8 00 E3 00 F1 00 F5 00 C3 00 ................
00330 D1 00 D5 00 E6 00 C6 00 E7 00 C7 00 FE 00 F0 00 ................
00340 DE 00 D0 00 A3 01 53 01 52 00 A1 00 BF 00 A9 00 ......S.R.......
As you can see in the header extension table, there’s a Unicode translation table at 0x02C2 with 0x46 (70) word entries. The first 69 are the defaults defined in §3.8.7 (0xE4 to 0xBF), and the last is the ‘©’ (0xA9) that has been added to the array.
(There’s an odd extra byte between the Unicode table and the object defaults table that needs investigation…)
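The lookup described above can be sketched as follows. The header word at 0x36 points to the header extension table; its first word is the number of further words, and its word 3 (here at 0x02C0) holds the Unicode table address. If there is no extension table, it is too short, or that word is 0, the default table applies. The function names are hypothetical, and the test image is a synthetic byte array that only mimics the offsets in the dump, not the real game file.

```python
from __future__ import annotations


def read_word(story: bytes, addr: int) -> int:
    """Z-machine words are big-endian 16-bit values."""
    return (story[addr] << 8) | story[addr + 1]


def read_unicode_table(story: bytes) -> dict[int, str] | None:
    """Return {ZSCII code: character} from the game's Unicode translation table,
    or None when the game relies on the default table."""
    ext = read_word(story, 0x36)           # header extension table address
    if ext == 0 or read_word(story, ext) < 3:
        return None                        # no extension table, or no word 3
    table = read_word(story, ext + 6)      # word 3: Unicode table address
    if table == 0:
        return None
    count = story[table]                   # first byte: number of entries
    return {
        155 + i: chr(read_word(story, table + 1 + 2 * i))  # entries start at ZSCII 155
        for i in range(count)
    }


# Synthetic image mimicking the layout in the dump (not the real game file).
story = bytearray(0x400)
story[0x36:0x38] = (0x02BA).to_bytes(2, "big")              # EXTAB -> 0x02BA
story[0x2BA:0x2C2] = bytes([0, 3, 0, 0, 0, 0, 0x02, 0xC2])  # 3 words; word 3 = 0x02C2
story[0x2C2:0x2C7] = bytes([2, 0x00, 0xE4, 0x00, 0xA9])     # 2 entries: ä, ©
print(read_unicode_table(bytes(story)))  # {155: 'ä', 156: '©'}
```

Run against the actual search-for-the-lost-ark.z5 (read with `open(path, "rb").read()`), this should return 70 entries ending with `224: '©'`.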
Excellent, thank you, all!
The mystery of the extra byte is solved. I looked in the Inform 6 source code and found this comment:
/* The object table must be word-aligned. The Z-machine spec does not
require this, but the RA__Pr() veneer routine does.
*/
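The arithmetic behind the extra byte: the table is one count byte plus two bytes per entry, so 1 + 2 × 70 = 141 bytes, an odd length. Starting at the even address 0x02C2 it ends at 0x034E, which leaves the next free address 0x034F odd. One padding byte is then needed before the word-aligned object table at 0x0350. A sketch, assuming the compiler simply rounds up to the next even address (the function name is hypothetical):

```python
from __future__ import annotations


def unicode_table_span(start: int, entries: int) -> tuple[int, int, int]:
    """Return (last byte of the table, next word-aligned address, padding bytes)."""
    end = start + 1 + 2 * entries   # 1 count byte + 2 bytes per entry
    aligned = (end + 1) & ~1        # round up to an even address
    return end - 1, aligned, aligned - end


last, next_table, pad = unicode_table_span(0x02C2, 70)
print(f"{last:05X} {next_table:05X} {pad}")  # 0034E 00350 1
```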
Now I have the table decoded. Here is the example from Search for the Lost Ark:
***** UNICODE TRANSLATION TABLE (002C2-0034E, 141 bytes) *****
002C2 46 Number of entries: 70
002C3 00 E4 ZSCII #155 = U+00E4 'ä'
002C5 00 F6 ZSCII #156 = U+00F6 'ö'
002C7 00 FC ZSCII #157 = U+00FC 'ü'
002C9 00 C4 ZSCII #158 = U+00C4 'Ä'
002CB 00 D6 ZSCII #159 = U+00D6 'Ö'
002CD 00 DC ZSCII #160 = U+00DC 'Ü'
002CF 00 DF ZSCII #161 = U+00DF 'ß'
002D1 00 BB ZSCII #162 = U+00BB '»'
002D3 00 AB ZSCII #163 = U+00AB '«'
002D5 00 EB ZSCII #164 = U+00EB 'ë'
002D7 00 EF ZSCII #165 = U+00EF 'ï'
002D9 00 FF ZSCII #166 = U+00FF 'ÿ'
002DB 00 CB ZSCII #167 = U+00CB 'Ë'
002DD 00 CF ZSCII #168 = U+00CF 'Ï'
002DF 00 E1 ZSCII #169 = U+00E1 'á'
002E1 00 E9 ZSCII #170 = U+00E9 'é'
002E3 00 ED ZSCII #171 = U+00ED 'í'
002E5 00 F3 ZSCII #172 = U+00F3 'ó'
002E7 00 FA ZSCII #173 = U+00FA 'ú'
002E9 00 FD ZSCII #174 = U+00FD 'ý'
002EB 00 C1 ZSCII #175 = U+00C1 'Á'
002ED 00 C9 ZSCII #176 = U+00C9 'É'
002EF 00 CD ZSCII #177 = U+00CD 'Í'
002F1 00 D3 ZSCII #178 = U+00D3 'Ó'
002F3 00 DA ZSCII #179 = U+00DA 'Ú'
002F5 00 DD ZSCII #180 = U+00DD 'Ý'
002F7 00 E0 ZSCII #181 = U+00E0 'à'
002F9 00 E8 ZSCII #182 = U+00E8 'è'
002FB 00 EC ZSCII #183 = U+00EC 'ì'
002FD 00 F2 ZSCII #184 = U+00F2 'ò'
002FF 00 F9 ZSCII #185 = U+00F9 'ù'
00301 00 C0 ZSCII #186 = U+00C0 'À'
00303 00 C8 ZSCII #187 = U+00C8 'È'
00305 00 CC ZSCII #188 = U+00CC 'Ì'
00307 00 D2 ZSCII #189 = U+00D2 'Ò'
00309 00 D9 ZSCII #190 = U+00D9 'Ù'
0030B 00 E2 ZSCII #191 = U+00E2 'â'
0030D 00 EA ZSCII #192 = U+00EA 'ê'
0030F 00 EE ZSCII #193 = U+00EE 'î'
00311 00 F4 ZSCII #194 = U+00F4 'ô'
00313 00 FB ZSCII #195 = U+00FB 'û'
00315 00 C2 ZSCII #196 = U+00C2 'Â'
00317 00 CA ZSCII #197 = U+00CA 'Ê'
00319 00 CE ZSCII #198 = U+00CE 'Î'
0031B 00 D4 ZSCII #199 = U+00D4 'Ô'
0031D 00 DB ZSCII #200 = U+00DB 'Û'
0031F 00 E5 ZSCII #201 = U+00E5 'å'
00321 00 C5 ZSCII #202 = U+00C5 'Å'
00323 00 F8 ZSCII #203 = U+00F8 'ø'
00325 00 D8 ZSCII #204 = U+00D8 'Ø'
00327 00 E3 ZSCII #205 = U+00E3 'ã'
00329 00 F1 ZSCII #206 = U+00F1 'ñ'
0032B 00 F5 ZSCII #207 = U+00F5 'õ'
0032D 00 C3 ZSCII #208 = U+00C3 'Ã'
0032F 00 D1 ZSCII #209 = U+00D1 'Ñ'
00331 00 D5 ZSCII #210 = U+00D5 'Õ'
00333 00 E6 ZSCII #211 = U+00E6 'æ'
00335 00 C6 ZSCII #212 = U+00C6 'Æ'
00337 00 E7 ZSCII #213 = U+00E7 'ç'
00339 00 C7 ZSCII #214 = U+00C7 'Ç'
0033B 00 FE ZSCII #215 = U+00FE 'þ'
0033D 00 F0 ZSCII #216 = U+00F0 'ð'
0033F 00 DE ZSCII #217 = U+00DE 'Þ'
00341 00 D0 ZSCII #218 = U+00D0 'Ð'
00343 00 A3 ZSCII #219 = U+00A3 '£'
00345 01 53 ZSCII #220 = U+0153 'œ'
00347 01 52 ZSCII #221 = U+0152 'Œ'
00349 00 A1 ZSCII #222 = U+00A1 '¡'
0034B 00 BF ZSCII #223 = U+00BF '¿'
0034D 00 A9 ZSCII #224 = U+00A9 '©'
And the strings print correctly:
***** STATIC STRINGS (0B59C-12D73, 30,680 bytes) *****
0B59C S0001 "S{ear}ch{ for}{ the }Lost Ark"
0B5AC S0002 "^Copyr{ight} © 2023 Garry Francis^Type ABOUT{ for} fur{the}r{ in}fo{ and }credits{.^^}"
0B5E4 S0003 "André"
0B5EC S0004 "resumé"
0B5F4 S0005 "slægan"
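For these strings to come out right, a disassembler follows é through two steps. In the Z-string it is a 10-bit ZSCII escape (shift to A2 with Z-char 5, then Z-char 6, then two 5-bit halves). The resulting ZSCII code (170 for é) is then looked up in the Unicode table. Below is a minimal v5 sketch with the default alphabets. It skips abbreviations (Z-chars 1 to 3), so it cannot decode strings like S0001 above. The function names and the packed bytes are hand-made for illustration, not read from the game file.

```python
from __future__ import annotations

# Default v5 alphabets, indexed by Z-char minus 6.
A0 = "abcdefghijklmnopqrstuvwxyz"
A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# In A2, Z-char 6 is the ZSCII escape (placeholder "\0") and Z-char 7 is newline.
A2 = "\0\n0123456789.,!?_#'\"/\\-:()"


def unpack_words(data: bytes) -> list[int]:
    """Split big-endian 16-bit words into 5-bit Z-chars; bit 15 marks the last word."""
    zchars: list[int] = []
    for k in range(0, len(data) - 1, 2):
        word = (data[k] << 8) | data[k + 1]
        zchars += [(word >> 10) & 31, (word >> 5) & 31, word & 31]
        if word & 0x8000:
            break
    return zchars


def zscii_to_text(code: int, unicode_table: dict[int, str]) -> str:
    """Map one ZSCII output code to text (only the cases needed here)."""
    if 32 <= code <= 126:
        return chr(code)                   # ZSCII 32-126 matches ASCII
    if code == 13:
        return "\n"
    return unicode_table.get(code, "?")    # 155+ goes through the Unicode table


def decode_zchars(zchars: list[int], unicode_table: dict[int, str]) -> str:
    """Decode v5 Z-chars with the default alphabets; abbreviations are not supported."""
    out: list[str] = []
    alphabet = 0
    i = 0
    while i < len(zchars):
        z = zchars[i]
        i += 1
        if z == 0:
            out.append(" ")
        elif z in (1, 2, 3):
            raise NotImplementedError("abbreviations need the abbreviation table")
        elif z in (4, 5):
            alphabet = z - 3               # one-character shift to A1 (4) or A2 (5)
            continue
        elif alphabet == 2 and z == 6:
            if i + 1 >= len(zchars):
                break                      # incomplete escape at end of string
            code = (zchars[i] << 5) | zchars[i + 1]  # top 5 bits, then bottom 5
            i += 2
            out.append(zscii_to_text(code, unicode_table))
        else:
            out.append((A0, A1, A2)[alphabet][z - 6])
        alphabet = 0                       # shifts only last for one character
    return "".join(out)


# "André", hand-packed: 4 6 19 | 9 23 5 | 6 5 10 (é = ZSCII 170 = 5 << 5 | 10)
packed = bytes.fromhex("10d326e598aa")
print(decode_zchars(unpack_words(packed), {170: "\u00e9"}))  # André
```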