Next steps for Inform 6 compiler

I think that support for Adam Cadre’s drawing code would need to come from the library.

Could we have the ability to Replace Objects? This would help with PunyInform, where replacing the TheDark or the Player objects is not really possible.

I’m pretty sure that would be a library function, not a compiler function.

Use the “Optimal Parse” way to apply abbreviations. The thread here has code examples in C, C# and Python.

Note: This is not to calculate the abbreviations. It is just how to apply them in an optimal way.
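
For context, “optimal parse” here just means deciding where to apply the abbreviations with a dynamic programming pass rather than greedily: at each position you either emit one plain character or any abbreviation that matches there, and keep whichever choice gives the lowest total cost for the rest of the string. A rough C sketch of the general idea (this is not the code from that thread, and the cost constants are simplified placeholders):

#include <string.h>

/* Simplified costs: a plain character is 1 unit, an abbreviation reference
   is 2 units. Real Z-char costs depend on which alphabet a character is in;
   these numbers are placeholders. */
#define CHAR_COST 1
#define ABBR_COST 2

/* Optimal parse by dynamic programming, scanning right to left:
   cost[i]   = cheapest cost of encoding text[i..len)
   choice[i] = -1 to emit text[i] as a plain character, otherwise the index
               of the abbreviation to emit at position i.
   cost needs room for len+1 entries, choice for len entries. */
void optimal_parse(const char *text, int len,
                   const char *abbrs[], int num_abbrs,
                   int cost[], int choice[])
{
    cost[len] = 0;
    for (int i = len - 1; i >= 0; i--) {
        cost[i] = CHAR_COST + cost[i + 1];
        choice[i] = -1;
        for (int a = 0; a < num_abbrs; a++) {
            int alen = (int)strlen(abbrs[a]);
            if (alen <= len - i && memcmp(text + i, abbrs[a], alen) == 0) {
                int c = ABBR_COST + cost[i + alen];
                if (c < cost[i]) {
                    cost[i] = c;
                    choice[i] = a;
                }
            }
        }
    }
    /* To emit the encoding, walk forward from i = 0: if choice[i] is -1,
       output text[i] and advance by 1; otherwise output abbreviation
       choice[i] and advance by strlen(abbrs[choice[i]]). */
}

In the real compiler the per-character cost varies (not every character is a single Z-char), but the structure of the search stays the same.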

When you say “Replace Objects”, you mean define a new object with the same name as an old one, whose properties and attributes completely replace the previous version?

I can see the use for that, but it’s not easy to implement. (The way the compiler accumulates property tables would have to be rewritten.)

Can you set up the library with a pattern like this?

#ifndef TheDark;
Object TheDark ...;
#endif;

That thread is really long. You’re referring to this C# code sample, right?

That’s the one.

Speaking from a background of decades of algorithm development, I just wanted to say that (a) I welcome this proposed series of improvements to the compiler, and (b) doubling buffer sizes during reallocation is a robust and almost always superior approach, compared to growing by a constant amount.
You probably already knew that! E.g.:
https://blog.mozilla.org/nnethercote/2014/11/04/please-grow-your-buffers-exponentially/
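
For anyone who hasn’t seen the analysis: growing by a constant amount makes a sequence of n appends cost O(n²) in total copying, while doubling makes appends amortized O(1). A minimal, generic C sketch of the doubling pattern (illustrative only, not taken from the Inform 6 source):

#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *data;
    size_t used;
    size_t capacity;
} Buffer;

/* Append n bytes, doubling the capacity whenever it runs out. Growing
   geometrically means each byte is moved O(1) times on average, whereas
   growing by a constant chunk leads to O(n^2) total copying. */
int buffer_append(Buffer *buf, const char *src, size_t n)
{
    if (buf->used + n > buf->capacity) {
        size_t newcap = buf->capacity ? buf->capacity : 64;
        while (buf->used + n > newcap)
            newcap *= 2;
        char *p = realloc(buf->data, newcap);
        if (!p)
            return -1;   /* out of memory */
        buf->data = p;
        buf->capacity = newcap;
    }
    memcpy(buf->data + buf->used, src, n);
    buf->used += n;
    return 0;
}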

Speaking of algorithms, the example given is purely appended, cumulative data. In that case, repeated realloc() is extremely wasteful, and doubling the allocation each time may run into memory fragmentation problems.

It would be better to have a series of linked memory pages, keeping the maximum page size small and skipping the memcpy() calls inherent in realloc().

Granted, implementing linked memory pages is a pain, but one that yields superior performance, assuming the data never changes and is only ever appended to.

Note: linked pages can be serial (linked list), or parallel (indexed).
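
To make that concrete, here’s a rough C sketch of an append-only chain of fixed-size pages; the names and the page size are made up for illustration:

#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

typedef struct Page {
    struct Page *next;
    size_t       used;
    char         data[PAGE_SIZE];
} Page;

typedef struct {
    Page *head;
    Page *tail;
} PageList;

/* Append n bytes, allocating a new fixed-size page when the current one
   fills up. Existing pages are never moved, so there is no memcpy() of old
   data and no ever-growing contiguous allocation. */
int pages_append(PageList *list, const char *src, size_t n)
{
    while (n > 0) {
        Page *tail = list->tail;
        if (tail == NULL || tail->used == PAGE_SIZE) {
            Page *p = calloc(1, sizeof(Page));
            if (p == NULL)
                return -1;
            if (tail != NULL)
                tail->next = p;
            else
                list->head = p;
            list->tail = tail = p;
        }
        size_t room = PAGE_SIZE - tail->used;
        size_t take = n < room ? n : room;
        memcpy(tail->data + tail->used, src, take);
        tail->used += take;
        src += take;
        n -= take;
    }
    return 0;
}

Reading the data back then means either walking the chain (the serial case) or keeping an array of page pointers (the indexed case).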

Speaking of text compression, I keep thinking about LZ77 (stream) and LZ78 (lookup table). In fact, one of my ideas is to just pack all the text into one big LZ77 blob. Then, when I print a string, it’d be like this:

pstrz(start, len)
where I’d do an LZ77 stream decompress and print the characters from start for len characters.

Then I figured that a simplistic implementation might take a long time, since string locations aren’t necessarily accessed in sequence, so I decided not to do it.

Still, LZ77 gives you table entries automatically, so I figured I’d mention it for lookup table generation. I’d be interested to see how the entries compare to the comprehensive brute-force algorithm used in some implementations.

Inform 6 is a compiler which can compile to the Z-code format and the Glulx format.

The Inform 6 compiler can’t start using new text compression algorithms at will - it can only use what each format supports.

Huh? Inform 6 supports character arrays, in which case you can do anything.

But I’m not talking about that. In LZ77, compression is done by referencing existing text. Pointers to text snippets, if you will. The text snippet pointers can be sorted and put into tables that Inform 6 supports.

In the Z-machine, strings are stored in memory which is not directly addressable - it can only be accessed by printing a string.

Yes, you can store data in a character array in the first 64 KB of a story file, which is the directly addressable part. But if you need custom code to decompress a string, that code has to be written in Z-code, which is about 100-500 times slower than the code built into the interpreter that supports the default compression scheme. Thus, this kind of compression would be useless for low-end targets like 8-bit computers, due to both speed and the limited storage, and useless on modern computers due to the limited storage alone.

For the Glulx target specifically, it may be useful, I’m not sure.

500 times slower?! Are you sure about that?
Anyway, as long as there’s String2CharArray conversion available, there is no problem.

And if there isn’t, I have a way to adapt the LZ77 decompression algorithm onto Inform 6 Strings/Routines. But it’s not easy, and I have no desire to explain the concept where there’s no appreciation for it, anyway. It’s one of those niche things. But here’s a hint: instead of printing a String, print the String returned by a Routine.

You’re right. 500 may be a bit high. The simplest operations in C64 assembler take 2 microseconds. The simplest operations in Z-code on Ozmoo take ~400 microseconds. I think 100x slower on average is closer to the truth.

What I mean to say is: Computationally intensive decompression routines would not be feasible on low-end platforms.

Maybe I misunderstand you completely. Do write up a working example or give more details, and maybe it will all get clearer.

The LZ77 decompression routine is not CPU intensive. All the heavy lifting is in the compression stage.

HOW MUCH WOOD CAN A WOODCHUCK CHUCK IF A WOODCHUCK CAN CHUCK WOOD

HOW MUCH WOOD CAN A (-11,4)CHUCK (-6,6) IF …
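
To spell out how cheap the decoding side is, here’s a toy C sketch that expands a stream of literals and (distance, length) back-references. The token format is invented for illustration; the core step, copying from text you have already produced, is all a decompressor has to do:

#include <stdio.h>
#include <string.h>

/* A token is either a literal run of characters or a (distance, length)
   back-reference into the text produced so far. */
typedef struct {
    const char *literal;   /* NULL for a back-reference */
    int distance;          /* how far back to copy from */
    int length;            /* how many characters to copy */
} Token;

static void lz77_expand(const Token *tokens, int ntokens, char *out)
{
    int pos = 0;
    for (int i = 0; i < ntokens; i++) {
        if (tokens[i].literal != NULL) {
            int n = (int)strlen(tokens[i].literal);
            memcpy(out + pos, tokens[i].literal, n);
            pos += n;
        } else {
            /* Copy byte by byte so overlapping references also work. */
            for (int k = 0; k < tokens[i].length; k++, pos++)
                out[pos] = out[pos - tokens[i].distance];
        }
    }
    out[pos] = 0;
}

int main(void)
{
    /* HOW MUCH WOOD CAN A (-11,4)CHUCK (-6,6)IF ... */
    Token tokens[] = {
        { "HOW MUCH WOOD CAN A ", 0, 0 },
        { NULL, 11, 4 },     /* go back 11, copy 4: "WOOD"  */
        { "CHUCK ", 0, 0 },
        { NULL, 6, 6 },      /* go back 6, copy 6: "CHUCK " */
        { "IF ...", 0, 0 },
    };
    char out[128];
    lz77_expand(tokens, 5, out);
    printf("%s\n", out);
    return 0;
}

Run, this reproduces the encoded example above with the back-references expanded.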

The problem is if you want to build custom-made routines in order to avoid the STR2SZ conversion. Something like this:

[ PZ n;
    switch (n) {
        0: return "HOW MUCH ";
        1: return "WOOD";
        2: return " CAN A ";
        3: return PZ(1);    ! in practice you'd skip this and print PZ(1) directly
        4: return "CHUCK";
        ! etc.
    }
];

If you want to print WOODCHUCK, you can do print (string) PZ(1), (string) PZ(4); for example. These can be used as entries for the built-in string substitution, the ones with the ampersand. Note: in an actual implementation, you’d skip case 3, because you’d just print PZ(1) directly.

The tricky part is where entries overlap. Should you do "WOOD", " WOOD", or "WOODCHUCK"? This isn’t a problem in a pure LZ77 implementation, thanks to the buffer window. Doing it with routines will, by necessity, require careful chopping of the text.

Ok, I get the idea. You assign some special escape sequence in strings which is used to reference these string components, and when printing a string you instead print to a buffer and print from the buffer while checking for the escape sequence?

Yes. That’s right.

With the internal Abbreviation/lookup table you do ampersand-number, with numbers 0-63.

With LZ77 you do marker-#charskip-len. You don’t need a separate table, because the text entry is in the preceding text stream. This is usually done with a "rolling window" buffer, most commonly 64 KB.
The key takeaway is that each LZ77 marker is a potential Abbreviation lookup table entry.

But this means you have to uncompress the preceding x KB to have access to the string you will need to insert, or you can only refer to substrings in the current string?

Yes, the former. Remember that LZ77 is designed as a stream, so you need to fill in the buffer. But to do that, you need the previous buffer contents as well, all the way back to the beginning.

You can have multiple partitions, each with its own beginning, but you really need to start from the beginning in each case.

LZ78 deals with lookup tables and avoids such issues, I believe.

Edit:
For 8-bit computers with limited storage, I wouldn’t implement pure LZ77 decompression. Instead, I would implement an LZ77 compressor that outputs custom Routines as described above. That way, decompression is fast and dynamic, but that’s because it acts as a lookup table.