Apparently Level-9’s A-Code system had better text compression than Infocom’s Z-Code system, managing to compress the text to about half the size of the uncompressed text according to reports. How did this text compression work?
V1 and V2 games used a simple system where text was stored as ASCII (with the values offset), and certain bytes could refer to abbreviations, which are just messages to insert. This system is not great at compressing text, but it definitely helps.
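To make the idea concrete, here is a minimal sketch of that kind of expansion. The byte values are my own assumptions, not the real A-code layout: `CHAR_OFFSET`, `ABBREV_BASE` and the helper `encode_plain` are hypothetical, and I'm also assuming an abbreviation may itself contain abbreviations.

```python
# Hypothetical byte layout, NOT the actual A-code values.
CHAR_OFFSET = 0x20   # assumed: stored byte = ASCII code - 0x20
ABBREV_BASE = 0x60   # assumed: bytes >= this select an abbreviation


def encode_plain(text: str) -> bytes:
    """Store printable ASCII with the assumed offset (no abbreviations)."""
    return bytes(ord(c) - CHAR_OFFSET for c in text)


def expand(data: bytes, abbreviations: list[bytes]) -> str:
    """Turn stored bytes back into text, inserting abbreviations."""
    out = []
    for b in data:
        if b >= ABBREV_BASE:
            # An abbreviation is just another stored message to insert.
            out.append(expand(abbreviations[b - ABBREV_BASE], abbreviations))
        else:
            out.append(chr(b + CHAR_OFFSET))
    return "".join(out)


abbreviations = [encode_plain("the ")]
message = encode_plain("Take ") + bytes([ABBREV_BASE]) + encode_plain("lamp")
print(expand(message, abbreviations))
```

Here a 13-character message is stored in 10 bytes, which shows why this helps a bit but can't compete with a scheme that shares whole words.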
V3 and V4 used a more sophisticated system. The main innovation was using the dictionary to store words for both parsing and printing messages. The dictionary itself is also compressed, using a 5-bit code to cover the letters A-Z and code extensions for other things. Also words which shared a common prefix didn’t need to repeat that prefix.
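The prefix sharing can be sketched like this. It is a generic "front coding" illustration, not Level 9's actual on-disk format: each entry is assumed to hold a count of letters shared with the previous word plus the remaining letters, and the function names are mine.

```python
def front_encode(words: list[str]) -> list[tuple[int, str]]:
    """Replace each word by (shared prefix length, remaining suffix)."""
    prev = ""
    entries = []
    for word in words:
        shared = 0
        while (shared < min(len(prev), len(word))
               and prev[shared] == word[shared]):
            shared += 1
        entries.append((shared, word[shared:]))
        prev = word
    return entries


def front_decode(entries: list[tuple[int, str]]) -> list[str]:
    """Rebuild the word list by reusing the previous word's prefix."""
    prev = ""
    words = []
    for shared, suffix in entries:
        word = prev[:shared] + suffix
        words.append(word)
        prev = word
    return words


words = ["LAMP", "LAMPS", "LAND", "LANTERN"]
print(front_encode(words))
# [(0, 'LAMP'), (4, 'S'), (2, 'ND'), (3, 'TERN')]
```

The four words hold 20 letters in total, but only 11 letters need to be stored after encoding. With 5 bits per letter on top of that, the dictionary gets quite small, at the cost of having to walk it from the start to rebuild any given word.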
@heasm66 has a document describing all the gory details (I have forgotten the link).
This is output text. It typically ends up at around 55% of the original size.
It’s harder to say how efficiently the dictionary words are stored. For Z-code, the designers have chosen a compromise between fast access and small size. Each word is stored with 6 or 9 characters, in four or six bytes - the word is truncated if it’s longer than this, padded if it’s shorter, and this storage can’t use abbreviations. Then there’s two or three bytes of extra data for each word. For some words, some of these bytes are empty - it depends on the role the word can have in commands. For any given game, these sizes are fixed, so there’s a fixed length that’s required for storing each entry in the dictionary. This, and the fact that encoding a dictionary word can only be done in one way (hence no abbreviations), allows the Z-code interpreter to encode the word it’s looking for, and look it up in the dictionary quickly.
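Roughly, the lookup goes like the sketch below. It only handles the letters a-z (the real encoding also has shift codes for digits and punctuation, which I've left out), and `build_table` and `lookup` are names I made up for the illustration, with the data bytes simply left as zeros.

```python
def encode_word(word: str, zchars: int = 6) -> bytes:
    """Encode a lowercase a-z word as fixed-length Z-characters.

    zchars=6 gives 4 bytes, zchars=9 gives 6 bytes. Letters map to
    codes 6..31, short words are padded with code 5, and the top bit
    of the last 16-bit word marks the end.
    """
    codes = [ord(c) - ord("a") + 6 for c in word.lower()[:zchars]]
    codes += [5] * (zchars - len(codes))
    out = bytearray()
    for i in range(0, zchars, 3):
        value = (codes[i] << 10) | (codes[i + 1] << 5) | codes[i + 2]
        if i + 3 >= zchars:
            value |= 0x8000          # end-of-text marker
        out += value.to_bytes(2, "big")
    return bytes(out)


def build_table(words: list[str], data_len: int = 3) -> bytes:
    """Sorted fixed-length entries: encoded text plus (empty) data bytes."""
    encoded = sorted({encode_word(w) for w in words})
    return b"".join(e + bytes(data_len) for e in encoded)


def lookup(table: bytes, entry_len: int, key: bytes) -> int:
    """Binary search over fixed-length entries; returns index or -1."""
    lo, hi = 0, len(table) // entry_len
    while lo < hi:
        mid = (lo + hi) // 2
        start = mid * entry_len
        probe = table[start:start + len(key)]
        if probe == key:
            return mid
        if probe < key:
            lo = mid + 1
        else:
            hi = mid
    return -1


table = build_table(["take", "lamp", "north", "lantern"])
print(lookup(table, 7, encode_word("lanterns")))  # truncated like "lantern"
```

Because every entry has the same length, the interpreter can jump straight to entry number `mid` with a multiplication, which is what makes the binary search possible.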
We could instead start each dictionary entry with a byte saying how many bytes are used for storing the text, how many bytes are used for data, and maybe we can squeeze in some other flags there as well. This might save space, but then searching for words gets a lot slower, as it has to be done in a linear manner. Also note that a linear search is less of a disaster if you write games where you know the full dictionary will always be available in RAM, and stored in one piece, in the right order. This is true for Level 9’s games, but not for Infocom’s games.
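As a sketch of what such a variable-length layout might look like (the header bit layout here is purely my assumption: low 4 bits for text length, next 2 bits for data length, and `pack_entry` and `find` are hypothetical names):

```python
def pack_entry(text: bytes, data: bytes) -> bytes:
    """Header byte: bits 0-3 = text length, bits 4-5 = data length."""
    assert len(text) < 16 and len(data) < 4
    return bytes([(len(data) << 4) | len(text)]) + text + data


def find(table: bytes, key: bytes):
    """Linear scan: each header tells us how far to skip to the next entry."""
    pos = 0
    index = 0
    while pos < len(table):
        header = table[pos]
        text_len = header & 0x0F
        data_len = (header >> 4) & 0x03
        text_end = pos + 1 + text_len
        if table[pos + 1:text_end] == key:
            return index, table[text_end:text_end + data_len]
        pos = text_end + data_len
        index += 1
    return None


table = (pack_entry(b"go", b"\x01")
         + pack_entry(b"lantern", b"\x02\x03")
         + pack_entry(b"x", b""))
print(find(table, b"lantern"))
```

There is no way to jump to the middle of this table, because you only learn where entry N+1 starts by reading the header of entry N. That is the speed cost mentioned above.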
Different ways to compress and store data will of course be more or less relevant depending on the CPU and memory available, and in some cases the storage medium; e.g. the tar archive format was designed for handling archives stored on tape, including files larger than the available RAM. I don’t think it’s a factor for these text adventure formats though. While all but the latest Level 9 games could be loaded from tape, they wouldn’t load any game data while running.
Yeah, when you’re producing a game for a single-load on a tape-driven 48K machine, you do really need some decent text compression. I think the PAW’s token system compressed text down to about 40% of the original size. We couldn’t customise the tokens back then, but I know there is a modern tool out there for the successor system DAAD that allows you to define an optimal set of tokens for a specific game’s text. Oh, to have access to tools like that back then!
PAW’s tokens as used in the CP/M version are listed below. An underscore represents a space, iirc.
_the_
_you_
_are_
ing_
_to_
_and
_is_
You_
and_
The_
n't_
_of_
_you
ing
ed_
_a_
_op
ith
out
ent
_to
_in
all
_th
_it
ter
ave
_be
ver
her
and
ear
You
_on
en_
ose
no
ic
ap
_b
gh
__
ad
is
_c
ir
ay
ur
un
oo
_d
lo
ro
ac
se
ri
li
ti
om
bl
ck
I_
ed
ee
_f
ha
pe
e_
t_
in
s_
th
,_
er
d_
on
to
an
ar
en
ou
or
st
._
ow
le
at
al
re
y_
ch
am
el
_w
as
es
it
_s
ll
do
op
sh
me
he
bo
hi
ca
pl
il
cl
_a
of
_h
tt
mo
ke
ve
so
e.
d.
t.
vi
ly
id
sc
_p
em
r_
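For anyone curious how a token table like this gets applied, here is a minimal sketch of greedy token substitution. It is not PAW's actual encoder: `TOKEN_BASE`, the function names, and the longest-match-first strategy are my assumptions, the text is assumed to be plain ASCII, and I've used only a handful of tokens from the list, with underscores written as real spaces.

```python
# A small subset of the tokens above, underscores replaced by spaces.
TOKENS = [" the ", " you ", " are ", "ing ", " to ", "th", "er", "e ", "s "]
TOKEN_BASE = 0x80  # assumed: bytes >= 0x80 are token numbers


def compress(text: str, tokens: list[str] = TOKENS) -> bytes:
    """Greedy substitution: at each position try the longest tokens first."""
    order = sorted(range(len(tokens)), key=lambda i: -len(tokens[i]))
    out = bytearray()
    pos = 0
    while pos < len(text):
        for i in order:
            if text.startswith(tokens[i], pos):
                out.append(TOKEN_BASE + i)
                pos += len(tokens[i])
                break
        else:
            out.append(ord(text[pos]))   # no token fits: plain ASCII byte
            pos += 1
    return bytes(out)


def expand_tokens(data: bytes, tokens: list[str] = TOKENS) -> str:
    """Reverse the substitution."""
    return "".join(
        tokens[b - TOKEN_BASE] if b >= TOKEN_BASE else chr(b) for b in data
    )


text = "You are in the garden. Paths lead to the north."
packed = compress(text)
print(len(text), len(packed))
```

A greedy match is not guaranteed to give the smallest output, and the ratio depends heavily on how well the tokens fit the game's text, which is presumably why a tool that picks an optimal token set per game is such a win.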