Dead code elimination

I was curious about how well dialogc detected and eliminated dead and unreachable code, so I did a little experiment, compiling an absolutely minimal program with and without a bunch of library code included:

(current date [24 4 2026]) %% time.dg falls over if this isn't set

(current time 0) %% time.dg falls over if this isn't set

(program entry point) (quit)

I set up a little Makefile:

COMPILE=dialogc -t z5

all: hello1.z5 hello2.z5

hello1.z5: hello.dg dice-lite.dg time.dg utils.dg stdlib.dg
	$(COMPILE) $^ -o $@

hello2.z5: hello.dg stdlib.dg
	$(COMPILE) $^ -o $@

…and running make gave me this:

sue@solfar dialog-extensions % ls -l hello*
-rw-r--r--@ 1 sue  staff     75 Apr 23 23:57 hello.dg
-rw-r--r--  1 sue  staff  79156 Apr 24 00:03 hello1.z5
-rw-r--r--  1 sue  staff  78324 Apr 24 00:03 hello2.z5
sue@solfar dialog-extensions % 

Building .aastory files resulted in output about half this size. Unless my extensions really do compile down into less than 1K of Z-code (the two builds differ by only 832 bytes), the compiler does seem to be leaving them out. But a lot of code from stdlib.dg is still being kept, despite never being called. So I guess we really do need to go and yank unused features out of stdlib.dg to get down to a small footprint, after all?

I suspect the compiler is not properly eliminating the library’s (program entry point), and if that rule is live, basically every bit of library code is live too. If your entry point succeeds instead of quitting, will that do it?

Weirdly, doing that adds eight bytes to each file:

sue@solfar dialog-extensions % ls -l hello*
-rw-r--r--@ 1 sue  staff     68 Apr 24 01:33 hello.dg
-rw-r--r--  1 sue  staff  79164 Apr 24 01:33 hello1.z5
-rw-r--r--  1 sue  staff  78332 Apr 24 01:33 hello2.z5
sue@solfar dialog-extensions % 

Now I’m curious how much library code actually fits into that many bytes. I should poke at this tomorrow and see! If you want to try before then, though, my first step is going to be compiling with -vvv (and redirecting the output to a file, since it’s enormous). That prints information on every predicate compiled, and I believe it omits dead ones. So with a minimal “hello world” like this, it should give a list of which things from the standard library are considered reachable.

(It’s also possible it prints predicate information before doing dead code elimination, in which case I may need to go in and add some extra diagnostics to get proper results. Adding a never-called (dead code) predicate and checking if it appears in the -vvv output is a good sanity check.)

In the ideal case, it’s just (error $ entry point), but that shouldn’t take a whole kilobyte, right?

That’s why I threw in three entire never-called extensions in one of the builds. I’ll have to see how it looks under -vvv.

All right! Doing a bit of poking.

Using this code:

(program entry point)
	(hello)

(hello)
	Hello world!

(interface (dead alpha))

(dead alpha)
	(dead beta)

(dead beta)
	This should never happen

Compiling with -vvv prints, along with many other things:

Predicate ---M-- ($ < $) of arity 2
Predicate ---M-- ($ > $) of arity 2
Predicate ---M-- ($ plus $ into $) of arity 3
Predicate ---M-- ($ minus $ into $) of arity 3
[...]
Predicate PN---- (hello) of arity 0
	helloworld.dg:4: ---> Hello world!
Predicate ------ (dead alpha) of arity 0 (interface declared at helloworld.dg:7)
	helloworld.dg:9: ---> (dead beta)
Predicate ------ (dead beta) of arity 0
	helloworld.dg:12: ---> This should never happen

Those flags after the word “Predicate” are the key:

  • P = invoked by program
  • N = invoked normally (not for words)
  • W = invoked for words (within a (collect words))
  • M = invoked in a multi-query
  • G/F/D = global variable / fixed flag / dynamic flag
  • S = might (stop)

The first one, “P”, is what matters here. Any predicate without “P” is considered dead and will be eliminated by the compiler. (Annoyingly, this works differently on Z-machine and Å-machine. But unz confirms that the string “This should never happen” does not appear in the compiled Z-machine file, at least.)
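Since the -vvv output is enormous, it can help to filter it mechanically. Here is a minimal Python sketch that splits predicates by the leading “P” flag. The header pattern is an assumption inferred only from the excerpts quoted in this thread, and the sample log is stitched together from those excerpts rather than taken from a full run.

```python
import re

# Sample lines copied from the -vvv excerpts quoted in this thread.
SAMPLE_LOG = """\
Predicate ---M-- ($ < $) of arity 2
Predicate ---M-- ($ plus $ into $) of arity 3
Predicate PN-M-- (program entry point) of arity 0
\thelloworld.dg:1: ---> (just) (hello)
Predicate PN---- (hello) of arity 0
\thelloworld.dg:4: ---> Hello world!
Predicate ------ (dead alpha) of arity 0 (interface declared at helloworld.dg:7)
\thelloworld.dg:9: ---> (dead beta)
Predicate ------ (dead beta) of arity 0
\thelloworld.dg:12: ---> This should never happen
"""

# Assumed shape of a predicate header line: "Predicate FLAGS (name) of arity N".
HEADER = re.compile(r"^Predicate (\S+) (\(.*?\)) of arity (\d+)")


def split_by_p_flag(log_text):
    """Return (live, dead) lists of predicate names, keyed on a leading P flag."""
    live, dead = [], []
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match is None:
            continue  # rule bodies and any other output are skipped
        flags, name = match.group(1), match.group(2)
        (live if flags.startswith("P") else dead).append(name)
    return live, dead


live, dead = split_by_p_flag(SAMPLE_LOG)
print("live:", live)
print("dead:", dead)
```

To use it on a real build, read the redirected -vvv output from a file and pass its contents to split_by_p_flag in place of SAMPLE_LOG.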

If we include stdlib.dg when compiling, then we see that a huge list of library predicates have the “P” flag. It seems like everything involved in parsing and the turn sequence is included! And unfortunately, I can’t find any way of cutting it down by altering helloworld.dg. It seems like Dialog always considers every rule for (program entry point) to be live, even if an earlier one features (just) or (stop) or the like.

Why does this happen? It comes from trace_invocations in frontend.c, which starts by marking (program entry point), (error $ entry point), and (object $) live, then propagates liveness from there. It marks them live for both normal queries and multi-queries, which means every rule needs to be compiled, not just the first one. After all, if you (exhaust) *(predicate), every rule might matter! It doesn’t seem to notice (just) or (quit) at this stage.

Predicate PN-M-- (program entry point) of arity 0
	helloworld.dg:1: ---> (just) (hello)
	stdlib.dg:4909: ---> (exhaust) *(startup) (div @initial-spacer) {} (update environment around player) (stoppable) (intro) *(repeat forever) (read-parse-act) (fail)
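The propagation described above can be modelled in a few lines of Python. This is only a sketch of the idea, not the actual code in frontend.c: it ignores the separate query kinds (the N/W/M flags), the toy program is abridged from the excerpts here, "(parse)" is a made-up placeholder, and first_rule_only is a hypothetical switch added for comparison, not something dialogc offers.

```python
from collections import deque

# Roots named in the post above.
ROOTS = ["(program entry point)", "(error $ entry point)", "(object $)"]

# Toy program: predicate name -> list of rules, each rule a list of callees.
PROGRAM = {
    "(program entry point)": [
        ["(hello)"],                                   # rule from helloworld.dg
        ["(startup)", "(intro)", "(read-parse-act)"],  # rule from stdlib.dg (abridged)
    ],
    "(hello)": [[]],
    "(startup)": [[]],
    "(intro)": [[]],
    "(read-parse-act)": [["(parse)"]],  # "(parse)" is a made-up placeholder
    "(parse)": [[]],
    "(dead alpha)": [["(dead beta)"]],
    "(dead beta)": [[]],
}


def mark_live(program, roots, first_rule_only=False):
    """Worklist reachability: a live predicate makes its rules' callees live."""
    live = set()
    queue = deque(roots)
    while queue:
        pred = queue.popleft()
        if pred in live:
            continue
        live.add(pred)
        rules = program.get(pred, [])  # roots without rules here contribute nothing
        if first_rule_only:
            rules = rules[:1]          # hypothetical variant, for comparison only
        for rule in rules:
            for callee in rule:
                if callee not in live:
                    queue.append(callee)
    return live


all_rules = mark_live(PROGRAM, ROOTS)
first_only = mark_live(PROGRAM, ROOTS, first_rule_only=True)
print(sorted(all_rules))
print(sorted(first_only))
```

With every rule scanned, the library's entry point rule drags (startup), (intro), and (read-parse-act) into the live set, while (dead alpha) and (dead beta) stay out because nothing live calls them. In the comparison variant only (hello) is reached from the entry point.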

So…why do we mark (program entry point) as live for multi-queries? I honestly have no idea! Maybe it has something to do with how (program entry point) is queried to launch the program, which is not documented in the manual? Or maybe it’s simpler than that—it looks like every builtin is automatically marked as “invoked simply” and “invoked multi”, no matter what. And that includes the entry points.

The answer ends up being: dead code usually gets eliminated, but anything reachable from any (program entry point) rule, including the standard library’s, is never marked dead. If you want to use the standard library’s utilities without its main parsing loop, comment out (or just rename) its entry point rule.

PS: What happens if we don’t mark the entry points as multi-queried? No idea! I might give that a shot this weekend and see what happens.