How did A-Code text compression work?

P-Tux7 · November 20, 2025, 1:52am

Apparently Level-9’s A-Code system had better text compression than Infocom’s Z-Code system, managing to compress the text to about half the size of the uncompressed text according to reports. How did this text compression work?

andrewj · November 20, 2025, 9:47am

There were two main ways of compressing text:

V1 and V2 games used a simple system: text was stored as ASCII (with the values offset), and certain bytes could refer to abbreviations, which are just messages to insert. This system is not great at compressing text, but it definitely helps.
V3 and V4 used a more sophisticated system. The main innovation was using the dictionary to store words for both parsing and printing messages. The dictionary itself is also compressed, using a 5-bit code to cover the letters A-Z and code extensions for other things. Also, words that shared a common prefix didn’t need to repeat that prefix.
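
To make the two V3/V4 ideas concrete, here is a minimal Python sketch of prefix sharing between adjacent dictionary words and of packing A-Z into 5-bit codes. It is an illustration only, not the actual A-Code byte layout: the function names, the (shared length, suffix) representation, and the code assignment A=0 to Z=25 are all assumptions made for the example.

```python
def front_encode(words):
    """Encode a sorted word list as (shared_prefix_length, suffix) pairs."""
    encoded = []
    previous = ""
    for word in words:
        shared = 0
        limit = min(len(previous), len(word))
        while shared < limit and previous[shared] == word[shared]:
            shared += 1
        # Only the part that differs from the previous word is stored.
        encoded.append((shared, word[shared:]))
        previous = word
    return encoded


def front_decode(encoded):
    """Rebuild the word list from (shared_prefix_length, suffix) pairs."""
    words = []
    previous = ""
    for shared, suffix in encoded:
        word = previous[:shared] + suffix
        words.append(word)
        previous = word
    return words


def pack_5bit(text):
    """Pack letters A-Z as 5-bit codes into bytes (hypothetical A=0..Z=25)."""
    bits = 0
    count = 0
    out = bytearray()
    for ch in text:
        code = ord(ch) - ord("A")
        if not 0 <= code < 26:
            raise ValueError("only A-Z is handled in this sketch")
        bits = (bits << 5) | code
        count += 5
        while count >= 8:
            count -= 8
            out.append((bits >> count) & 0xFF)
            bits &= (1 << count) - 1  # keep only the bits not yet written
    if count:
        out.append((bits << (8 - count)) & 0xFF)  # zero-pad the last byte
    return bytes(out)


words = ["LAMP", "LAMPS", "LANTERN", "LOOK"]
pairs = front_encode(words)
print(pairs)
print(front_decode(pairs) == words)
```

With a sorted dictionary, most entries only need a short suffix, and each stored letter costs 5 bits instead of 8.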

@heasm66 has a document describing all the gory details (I have forgotten the link).

heasm66 · November 20, 2025, 10:59am

This is the incomplete document that I have: Level 9 Game Format - Google Docs

And here is a link to the thread that initiated that document: Specification of Level 9 A-Code

fredrik · November 20, 2025, 1:14pm

Z-code games store the text compressed to about half the original size.

It was not quite as good in the 80s, because the tools they had for choosing the best abbreviations were less sophisticated.

Angstsmurf · November 20, 2025, 2:27pm

That is just the output text, right? Or does it take the input dictionary into account?

This sounds plausible, but it would be interesting to see the actual numbers. What is the source of this?

fredrik · November 20, 2025, 3:24pm

This is output text. It typically ends up at around 55% of the original size.

It’s harder to say how efficiently the dictionary words are stored. For Z-code, the designers have chosen a compromise between fast access and small size. Each word is stored with 6 or 9 characters, in four or six bytes - the word is truncated if it’s longer than this, padded if it’s shorter, and this storage can’t use abbreviations. Then there’s two or three bytes of extra data for each word. For some words, some of these bytes are empty - it depends on the role the word can have in commands. For any given game, these sizes are fixed, so there’s a fixed length that’s required for storing each entry in the dictionary. This, and the fact that encoding a dictionary word can only be done in one way (hence no abbreviations), allows the Z-code interpreter to encode the word it’s looking for, and look it up in the dictionary quickly.
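
A rough Python sketch of why fixed-size entries make lookup fast: the typed word is encoded once, then binary-searched against entries that all have the same length. This is a simplification under stated assumptions: it handles only lowercase a-z (no shift characters for digits or punctuation), uses the 6-character, 4-byte form, and assumes 3 data bytes per entry. The names `encode_word`, `build_dictionary` and `lookup` are made up for the example.

```python
ENTRY_TEXT = 4   # bytes of encoded text per entry (6 Z-characters)
ENTRY_DATA = 3   # assumed number of data bytes per entry
ENTRY_SIZE = ENTRY_TEXT + ENTRY_DATA


def encode_word(word, zchars=6):
    """Encode a lowercase word as fixed-size packed 5-bit characters."""
    if not all("a" <= c <= "z" for c in word):
        raise ValueError("only a-z is handled in this sketch")
    codes = [ord(c) - ord("a") + 6 for c in word[:zchars]]  # truncate
    codes += [5] * (zchars - len(codes))                    # pad
    out = bytearray()
    for i in range(0, zchars, 3):
        a, b, c = codes[i:i + 3]
        value = (a << 10) | (b << 5) | c   # three 5-bit codes per 16 bits
        if i + 3 == zchars:
            value |= 0x8000                # top bit marks the last word
        out += value.to_bytes(2, "big")
    return bytes(out)


def build_dictionary(words):
    """Build a sorted blob of fixed-size entries (data bytes left empty)."""
    keys = sorted({encode_word(w) for w in words})
    return b"".join(key + bytes(ENTRY_DATA) for key in keys)


def lookup(blob, word):
    """Binary search; returns the entry index, or None if not found."""
    key = encode_word(word)
    lo, hi = 0, len(blob) // ENTRY_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        start = mid * ENTRY_SIZE
        candidate = blob[start:start + ENTRY_TEXT]
        if candidate == key:
            return mid
        if candidate < key:
            lo = mid + 1
        else:
            hi = mid
    return None


blob = build_dictionary(["take", "lantern", "north", "open", "drop"])
print(lookup(blob, "lanterns"), lookup(blob, "xyzzy"))
```

Because every entry starts at a multiple of `ENTRY_SIZE`, the search can jump straight to the middle entry without reading the ones before it. Note that "lanterns" finds the entry for "lantern", since both are truncated to the same six characters.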

We could instead start each dictionary entry with a byte saying how many bytes are used for storing the text, how many bytes are used for data, and maybe we can squeeze in some other flags there as well. This might save space, but then searching for words gets a lot slower, as it has to be done in a linear manner. Also note that a linear search is less of a disaster if you write games where you know the full dictionary will always be available in RAM, and stored in one piece, in the right order. This is true for Level 9’s games, but not for Infocom’s games.
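
As a sketch of that hypothetical alternative (this is not a format either company used, as far as this thread says): a header byte holds the text length in the high nibble and the data length in the low nibble, and the only way to find a word is to walk the entries from the start. The nibble layout and the names `pack_entry` and `find` are assumptions for illustration.

```python
def pack_entry(text: bytes, data: bytes) -> bytes:
    """Header byte = text length (high nibble) | data length (low nibble)."""
    if len(text) > 15 or len(data) > 15:
        raise ValueError("lengths must fit in a nibble in this sketch")
    return bytes([(len(text) << 4) | len(data)]) + text + data


def find(blob: bytes, text: bytes):
    """Linear scan; returns (entry index, data bytes), or None if absent."""
    pos = 0
    index = 0
    while pos < len(blob):
        header = blob[pos]
        text_len, data_len = header >> 4, header & 0x0F
        start = pos + 1
        if blob[start:start + text_len] == text:
            data_start = start + text_len
            return index, blob[data_start:data_start + data_len]
        # Entry boundaries are only known by reading each header in turn.
        pos = start + text_len + data_len
        index += 1
    return None


blob = pack_entry(b"go", b"\x01") + pack_entry(b"lantern", b"\x02\x03")
print(len(blob), find(blob, b"lantern"))
```

Short words no longer pay for padding and long words are not truncated, but there is no way to jump to the middle entry, so the scan costs time proportional to the dictionary size.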

Dissolved · November 20, 2025, 4:00pm

Isn’t this sort of design also strongly encouraged by the nature of tape-based storage?

fredrik · November 20, 2025, 5:11pm

Different ways to compress and store data will of course be more or less relevant depending on the CPU and memory available, and in some cases the storage media. For example, the tar archive format was designed for handling archive files stored on tape, including files larger than the available RAM. I don’t think it’s a factor for these text adventure formats though. While all but the latest Level 9 games could be loaded from tape, they wouldn’t load any game data while running.

Dissolved · November 20, 2025, 5:41pm

That’s sort of what I meant, though? Since tape storage discourages loading game data while running, it encourages storing everything in RAM.

8bitAG · November 20, 2025, 8:28pm

Yeah, when you’re producing a game for a single-load on a tape-driven 48K machine, you do really need some decent text compression. I think PAW’s token system compressed text down to about 40% of the original size. We couldn’t customise the tokens back then, but I know there is a modern tool out there for the successor system DAAD that allows you to define an optimal set of tokens for a specific game’s text. Oh, to have access to tools like that back then!

PAW’s tokens, as used in the CP/M version, are listed below. An underscore represents a space, iirc.


_the_
_you_
_are_
ing_
_to_
_and
_is_
You_
and_
The_
n't_
_of_
_you
ing
ed_
_a_
_op
ith
out
ent
_to
_in
all
_th
_it
ter
ave
_be
ver
her
and
ear
You
_on
en_
ose
no
ic
ap
_b
gh
__
ad
is
_c
ir
ay
ur
un
oo
_d
lo
ro
ac
se
ri
li
ti
om
bl
ck
I_
ed
ee
_f
ha
pe
e_
t_
in
s_
th
,_
er
d_
on
to
an
ar
en
ou
or
st
._
ow
le
at
al
re
y_
ch
am
el
_w
as
es
it
_s
ll
do
op
sh
me
he
bo
hi
ca
pl
il
cl
_a
of
_h
tt
mo
ke
ve
so
e.
d.
t.
vi
ly
id
sc
_p
em
r_
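
For anyone curious how a token table like this gets applied, here is a minimal Python sketch of greedy longest-match substitution using a handful of the tokens above (with the underscores turned back into spaces). How PAW actually assigns byte values to tokens is not covered in this post, so the choice of 128 plus the table index is purely an assumption for the example, as are the function names.

```python
# A small subset of the tokens listed above, underscores replaced by spaces.
TOKENS = [" the ", " you ", " are ", "ing ", " to ", " and", " is ", "You ",
          "and ", "The ", "ing", "ed ", "th", "er", "ou", "in", "e ", "s ",
          ". "]


def compress(text, tokens=TOKENS):
    """Replace the longest matching token at each position with one byte."""
    ordered = sorted(tokens, key=len, reverse=True)
    # Assumed encoding: token i becomes byte 128 + i (not PAW's real layout).
    code = {tok: 128 + i for i, tok in enumerate(tokens)}
    out = bytearray()
    pos = 0
    while pos < len(text):
        for tok in ordered:
            if text.startswith(tok, pos):
                out.append(code[tok])
                pos += len(tok)
                break
        else:
            ch = ord(text[pos])
            if ch >= 128:
                raise ValueError("only 7-bit text is handled in this sketch")
            out.append(ch)
            pos += 1
    return bytes(out)


def expand(data, tokens=TOKENS):
    """Reverse of compress: bytes >= 128 expand to their token."""
    return "".join(tokens[b - 128] if b >= 128 else chr(b) for b in data)


sample = "You are in the cave."
packed = compress(sample)
print(len(sample), len(packed), expand(packed) == sample)
```

Greedy matching is simple but not optimal: taking the longest token now can block a better combination a few characters later, which is one reason a tool that picks tokens for a specific game’s text can do better.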

8bit_era · December 2, 2025, 9:23pm

In that context I’d like to draw your attention to this thread. You may find what you are looking for there. Enjoy. A-Code Compiler, Level 9 Archive, Specs, game sources: made public!