What is a scaled address?

I’m trying to make sense of Inform Technical Manual section 8.8 Packed address decoding (https://www.inform-fiction.org/source/tm/chapter8.txt). It makes use of the term “scaled address” – a term which isn’t mentioned in the Z-Machine Standards or explained anywhere else in the Inform Technical Manual, and which doesn’t show up on a search of the forums.

What is a scaled address?

EDIT: Forgot to mention that I saw the Wikipedia article on addressing modes that does mention scaled addresses (Addressing mode - Wikipedia), but in the example given all of the elements seem to have the same size. Since strings and routines would have different sizes, I’m not sure that it has the same meaning here.

Looks like ‘scaled address’ is an Inform-specific term, referring to version 6-7 addresses that are relative to the routines/strings offsets. They look like a temporary address mode that gets corrected before the final bytecode is written.

The reason for having these is that when compilation was actually happening, Inform did not know enough to calculate packed addresses, and instead wrote scaled addresses starting from 0 for each area. In backpatching, then, Inform adds the above offsets to each packed address, and then they come right.

Thank you for the reply, @Dannii, but I’m not sure that I fully understand.

What I thought I understood from the passage that you referenced was that “each area” meant each of the two separate high memory areas respectively dedicated to routines and strings, and that “writing scaled addresses from [zero]” essentially meant that each routine or string was assigned a temporary zero-indexed identifier that is unique within its respective group. Is that correct?

I’m not certain this is how it works, but I think it’s not a 0-based identifier but a 0-based address (possibly a packed address, i.e., divided by 4 or 8). In versions 6-7 you could just use those addresses, because the VM would add the offsets itself. In other versions Inform has to patch the addresses before writing out the file.

OK, that makes a little more sense. Thank you. So then I think that means:

  • 8R means the byte address that is 8 times the routine offset value (which is stored at $28?)
  • 8S means the byte address that is 8 times the strings offset value (which is stored at $2a?)

… so that (for Z6/Z7) any temporary relative address P for a routine or string would be relative to one of those two byte addresses (each of which can be thought of as the relative zero value of their respective assigned ranges), and translate to the (4 * P) byte of that range. So, for example, if the routine offset is $1000 and a given routine’s assigned P is $33, then the start of the high memory for routines would be $8*$1000=$8000 and the relative byte address within the range for routines would be $33*$4=$CC. So that means that the byte address for the routine in question would end up at $80CC? (Edit: Fixed the example in the question so that it makes sense.)
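As a sanity check on that arithmetic, here is a minimal Python sketch of the packed address decoding rules as given in ZMS 1.2.3 (2P for V1-3, 4P for V4-5, 4P plus 8 times the relevant offset for V6-7, 8P for V8). The function name `unpack` and its parameters are made up for illustration only.

```python
def unpack(p, version, offset=0):
    """Translate packed address p into a byte address.

    offset is the routines offset (R_O) or strings offset (S_O) from the
    header; it only matters for versions 6 and 7.
    """
    if not 0 <= p <= 0xFFFF:
        raise ValueError("packed addresses are 16-bit values")
    if version <= 3:
        return 2 * p
    if version <= 5:
        return 4 * p
    if version <= 7:
        return 4 * p + 8 * offset
    return 8 * p


# The example from the post: Z6, routine offset $1000, packed address $33.
example = unpack(0x33, version=6, offset=0x1000)
print(hex(example))  # 0x80cc
```

Under these rules the example works out to $80CC, as calculated above.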

… but if that’s the case, then I’m really not understanding the passage at the beginning of the section:

In versions 6 and 7, R and S can be chosen fairly freely in the range 0 to
M/8, where M is the minimum address of a routine or static string as
appropriate.  The point is to expand the address space by making higher B
values available (whereas P has to lie within 0 and 65535): so one wants to
make R, S larger rather than smaller.  On the other hand, it's necessary to
the Inform language that the following are all different:

    (a) the number 0;
    (b) valid object numbers;
    (c) packed addresses of routines;
    (d) packed addresses of strings.

To be sure that (c) and (d) differ, Inform sets S = R.

I don’t understand that last sentence, in particular. If R (routine offset) and S (strings offset) are both the same value, then it seems like any routine that was assigned the same temporary P value as a string would end up with the same byte address. (And even if not at the exact same byte address, probably with a byte address such that its byte code would overlap a string’s byte storage.) And it seems like this would happen if each is being assigned a temporary P starting from zero in their own independent relative range.
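The worry can be made concrete with a short Python sketch. All values here are hypothetical and chosen only to illustrate the V6/V7 formula (byte address = 4P + 8 times the offset); the helper name `z6_byte_address` is invented for this example.

```python
def z6_byte_address(p, offset):
    """V6/V7 decoding: byte address = 4P + 8 * offset."""
    return 4 * p + 8 * offset


shared = 0x1000   # hypothetical value used for both R and S
p = 0x33          # hypothetical packed address

routine_addr = z6_byte_address(p, shared)
string_addr = z6_byte_address(p, shared)
same_when_equal = routine_addr == string_addr

# With a different (again hypothetical) S, the same P lands elsewhere.
other_s = 0x2000
differs_when_unequal = z6_byte_address(p, shared) != z6_byte_address(p, other_s)

print(same_when_equal, differs_when_unequal)  # True True
```

So if S = R, a routine and a string can only be kept apart by never giving them the same P value.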

… unless for Z6/Z7 routines and strings share the same assigned high memory range and also share the same pool of unique P values? Then they would be intermixed but presumably not overlapping or colliding. It doesn’t seem like that’s what’s going on, though, because in Z-Machine Standards section 1.2.3 (The Z-Machine Standards Document) it states the same formulae for packed address decoding along with:

R_O and S_O are the routine and strings offsets (specified in the header as words at $28 and $2a, respectively).

and the example memory map definitely shows different values for “Z-code” (with a start address the same as routines offset times 8?) and “static strings” (with a start address the same as strings offset times 8?)

As you can tell, I’m really quite lost as to what ITM 8.8 is intended to explain. Although each individual piece makes sense in isolation (given enough assumptions), I can’t make any sense out of the system as a whole as presented.

Maybe I can present some easier-to-answer questions, the answer to which will hold the key detail(s) I’m missing or misunderstanding:

[EDIT: My own answers to the questions have been added in italics.]

  1. Is the entirety of the discussion in ITM 8.8 specific to Z6/Z7 format? Per zarf (below), the answer is: “No. Only the section that starts ‘In versions 6 and 7…’ and continues through the selection of S and R.”

  2. In ITM 8.8, does “code_offset” mean the same thing as “R”? Apparently not. Later in the section, it defines these terms as "code_offset = 4X" and "strings_offset = 4X + size of code area", so unless the value of "size of code area" is zero, it can’t be the case that code_offset == R == S == strings_offset.

  3. In ITM 8.8, does “strings_offset” mean the same thing as “S”? See previous answer.

  4. In ITM 8.8, is the “X” as first defined (as “slightly more than the largest object number”) the meaning that applies in all formulae that use X? Apparently yes. X translates into compiler code as extend_offset, as mentioned. Inspecting 6.34 compiler code, extend_offset is assigned in tables.c via: extend_offset=256; if (no_objects+9 > extend_offset) extend_offset=no_objects+9; . Since no_objects is defined in objects.c with comment "/* Number of objects made so far */", then [the greater of (no_objects+9) and (256)] == extend_offset == "slightly more than the largest object number".

  5. In ITM 8.8, what defines the “slightly more” part of the phrase “slightly more than the largest object number”? Is it just padding to ensure that the value to be used is divisible by 8? See previous answer. The value 9 in no_objects+9 is fixed, so it is not padding for divisibility.

  6. In ITM 8.8, when it says “Inform therefore nudges up M and X to ensure that M and 4X are both divisible by 8”, does this mean that once X has been adjusted for padding if necessary, 4X will be divisible by 8 and therefore M as calculated from the formula will be divisible by 8? Or is M somehow adjusted independently of any adjustments to X? M is not directly related to X at all, only indirectly related via the formula used to calculate R. X is adjusted independently of M in a separate part of the compiler code (see answer to #4). The code to insert the padding being discussed in ITM 8.8 is in tables.c. Following comment "/* ----------------- A gap before the code area ----------------------- */" is the line while ((mark%length_scale_factor) != 0) p[mark++]=0;. Note that length_scale_factor appears to be set based on the Z-machine version target in inform.c: For Z6/Z8, it is set to 8. For Z5, it is set to 4. For Z3, it would be 2. It appears that only M (Write_Code_At) is “nudged up” with padding as required to 1) align with length_scale_factor divisibility, 2) be increased to a minimum of scale_factor*0x100, then 3) if the -B switch is in use, make a further adjustment to be divisible by scale_factor*2, if needed. (Note that scale_factor is separate from length_scale_factor; scale_factor also varies by Z-Machine version: Z3 = 2, Z5/Z6 = 4, Z8 = 8.) For Z6, where 2*scale_factor == length_scale_factor == 8, step 3 should always be skipped, as 256 is divisible by 8.

  7. In ITM 8.8 is R an offset from byte 0 (i.e. true bottom of memory) or is it an offset from the end of object memory? Unclear. ITM 8.8 claims that it is calculated as a scaled offset (i.e. should be interpreted as 8 times its nominal value in bytes) from a seemingly-arbitrary value, via the formula given, which is R = (M - 4X)/8. Perhaps the answer is neither.

  8. In ITM 8.8 what are the “areas” referred to in the phrase “... and instead wrote scaled addresses starting from 0 for each area”? They appear to be the two high memory areas labeled "Z-Code" and "static strings" in the output of compilation using the -z switch, which are synonymous with the same-labeled blocks in the example diagram found in ZMS 1.2.3. This makes sense if it is assumed that the backpatching process will interpret the #code_offset and #string_offset values (which are not the same as R and S) as the relative zero address of each area, respectively.

  9. In ITM 8.8 is the “code area” referred to in the formula “strings_offset = 4X + size of code area” the same as the “Z-code” block of the example memory map in ZMS 1.2.3? Probably yes. This assumption fits with the answer to the previous question.

  10. In ZMS 1.2.3, does the example memory map apply to only specific versions of the Z-machine? If so, which? Based on the memory addresses given for the start of the "Z-Code" and "static strings" sections, which are $101A and $5D56, respectively, this example memory map would have to be for Z3 (because the only divisible scale factor value is 2).

  11. Is the “R_O” referenced in ZMS 1.2.3 the same as the “R” referenced in ITM 8.8? It seems likely, given that the first word of ITM 8.8 (“Recall…”) probably refers to what’s written in this section.

  12. Is the “S_O” referenced in ZMS 1.2.3 the same as the “S” referenced in ITM 8.8? See previous answer.

  13. Can a routine and a string have the same P value for the purposes of the formulae presented in both ITM 8.8 and ZMS 1.2.3? No, at least with respect to Inform-compiled story files. A P value means a packed address value. It is always in the range $0000 to $FFFF. Although the method of translating packed addresses to high memory addresses varies by Z-machine version as explained in ITM 8.8 and ZMS 1.2.3, within Inform (as alluded in ITM 8.8) the initial packed addresses must still be unique for any two distinct items within addressable memory (i.e. P range of $0000-$FFFF), regardless of what type said items are. While the different packed address translation formulae given for Z6 imply that a string and a routine might theoretically have the same P value (because they use different Z6 offset values R and S, respectively), it seems that the central point that ITM 8.8 is trying to get across is that Inform will always set R and S to be the same value when compiling for Z6. (By implication this is unlike the way that Infocom’s ZIL compiler for Z6 would handle the situation.) There is a strong implication that it is possible for the Z6 architecture to access different high memory addresses in two different requests that use the same packed address value (P) – one a call to a routine at P and one a call to print a string at P – but for this to be possible the values of R and S (stored at $28 and $2A, respectively) would have to be different from each other, which by design won’t occur when compiling for Z6 with Inform.

  14. Is ITM 8.8 intended as a compare-and-contrast exercise between the way that Infocom/ZIL puts together the Z-machine’s memory map vs. the way that Inform does it? In other words, is it trying to illustrate that in Inform the P values won’t collide/overlap even if R and S are the same because Inform assigns routines and strings to their own discrete P ranges, while Infocom-compiled Z6 machines can have what would be colliding/overlapping P addresses because it somehow relies on different values for R and S? Perhaps yes. This assumption fits with the answer to the previous question.
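The padding loop quoted in the answer to question 6 (`while ((mark%length_scale_factor) != 0) p[mark++]=0;`) can be mirrored in a few lines of Python. This is only a sketch of that one loop, not of the whole Write_Code_At calculation; the function name `pad_to_multiple` and the sample mark value are assumptions made for illustration.

```python
def pad_to_multiple(mark, length_scale_factor):
    """Advance mark until it is divisible by length_scale_factor.

    Returns the new mark and the number of zero bytes that would be written.
    """
    padding = 0
    while mark % length_scale_factor != 0:
        mark += 1
        padding += 1
    return mark, padding


# Hypothetical mark of $33F4, padded for a Z6/Z8 target (factor 8)
# and for a Z5 target (factor 4).
z8_mark, z8_padding = pad_to_multiple(0x33F4, 8)
z5_mark, z5_padding = pad_to_multiple(0x33F4, 4)
print(hex(z8_mark), z8_padding, hex(z5_mark), z5_padding)
# 0x33f8 4 0x33f4 0
```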

I cannot answer all of that. However:

No. Only the section that starts “In versions 6 and 7…” and continues through the selection of S and R.

A scaled address is a relative address within a memory area, divided by scale_factor.
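Put as a minimal Python sketch, that definition and the backpatching step described in ITM 8.8 look roughly like the following. The function names are invented, the numbers are illustrative only, and the sketch assumes the area offset is expressed in the same scaled units as the address it is added to.

```python
def to_scaled(relative_byte_offset, scale_factor):
    """Scaled address: offset within the area, divided by scale_factor."""
    if relative_byte_offset % scale_factor != 0:
        raise ValueError("offset must be a multiple of scale_factor")
    return relative_byte_offset // scale_factor


def backpatch(scaled, area_offset):
    """Add the area's offset (in scaled units) to get the packed address."""
    return scaled + area_offset


# Illustrative values only: a routine 8 bytes into the code area of a
# version 5 file (scale_factor 4), with an assumed code-area offset of 3325.
scaled = to_scaled(8, 4)
packed = backpatch(scaled, 3325)
print(scaled, packed)  # 2 3327
```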

Thank you, @zarf – even a straw at which to grasp is helpful at the moment.

I did the exercise of taking a copy of Ruins from working through DM4 (which has plenty of objects [67], routines and strings) and adding the following to its Initialise() routine:

print "<header word $28 (routine offset) = ", 0-->$28, ">^";
print "<header word $2A (strings offset) = ", 0-->$2A, ">^";
print "<#code_offset = ", #code_offset, ">^";
print "<#strings_offset = ", #strings_offset, ">^";
print "<location of Main() = ", Main, ">^";
print "<location of Main__() = ", Main__, ">^";

[EDIT: The above includes an addressing error, so the wrong offset values for the header block addresses $28 and $2A are reported. See zarf’s reply and revised output values below.]

Then I compiled it as Z5, Z6 and Z8 (using Inform 6.31) and looked at the produced output.

compiled as Z5:

<header word $28 (routine offset) = 32>
<header word $2A (strings offset) = 32>
<#code_offset = 3325>      ! $0CFD
<#strings_offset = 18858>  ! $49AA
<location of Main() = 3327>
<location of Main__() = 0>
! per -z output, Z-code begins $33F4 and strings begin $126A8, #*_offset values exhibit scale_factor 4

compiled as Z6:

<header word $28 (routine offset) = 32>
<header word $2A (strings offset) = 32>
<#code_offset = 256>       ! $0100
<#strings_offset = 15872>  ! $3E00
<location of Main() = 258>
<location of Main__() = 0>
! per -z output, Z-code begins $33F8 and strings begin $127F8, no apparent correlation to reported #*_offset values

compiled as Z8:

<header word $28 (routine offset) = 32>
<header word $2A (strings offset) = 32>
<#code_offset = 1663>      ! $067F
<#strings_offset = 9545>   ! $2549
<location of Main() = 1664>
<location of Main__() = 0>
! per -z output, Z-code begins $33F8 and strings begin $12A48, #*_offset values exhibit scale_factor 8

[Note that output when compiled with Inform 6.34 (but StdLib 6/11) is essentially identical, although values for #strings_offset are very slightly (<= 10 bytes difference) lower.]

From this, it seems that R and S (at least as stored in the header block) are always identical (and always 32?), regardless of Z-machine version. Main() always seems to be just after #code_offset. (I don’t know what’s going on with the reported location of Main__, which is the lowest-address routine cited in ITM 8.8. The reported address seems incorrect.)

In the Z6 case, the $28 routine offset value times 8 does match up with the value of #code_offset, and the reported location of Main() is in accordance with this value. However, the starting address for the Z-code block ($33F8 as reported in the memory map at compilation) doesn’t correlate. Also, the $2A strings offset value times 8 clearly does NOT match up with the value of #strings_offset. So… it does not seem possible that R == S, as described.

In the Z5 and Z8 cases, there is a clear correlation (at different scale factors, per the version-specific formulae in ITM 8.8) between the reported #*_offset values and the reported-at-compilation starting addresses for the memory blocks for Z-code and strings. However, there is no correlation between reported #*_offset values and what’s stored in the header block at $28 and $2A.
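The Z5 and Z8 correlation can be checked mechanically. The Python sketch below simply multiplies the reported #code_offset and #strings_offset values by the version's scale factor and compares the results with the start addresses from the -z output quoted above; the dictionary layout and the name `matches` are just for illustration.

```python
# Values copied from the Z5 and Z8 output above.
observed = {
    5: {"scale": 4, "code_offset": 3325, "strings_offset": 18858,
        "code_start": 0x33F4, "strings_start": 0x126A8},
    8: {"scale": 8, "code_offset": 1663, "strings_offset": 9545,
        "code_start": 0x33F8, "strings_start": 0x12A48},
}


def matches(row):
    """True if both offsets, multiplied by the scale, hit the area starts."""
    return (row["code_offset"] * row["scale"] == row["code_start"]
            and row["strings_offset"] * row["scale"] == row["strings_start"])


results = {version: matches(row) for version, row in observed.items()}
print(results)  # {5: True, 8: True}
```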

So I’m back to square one now in making sense of this section. (Though some of the details underlying packed addressing are becoming clearer.)

print "<header word $28 (routine offset) = ", 0-->$28, ">^";
print "<header word $2A (strings offset) = ", 0-->$2A, ">^";

Whoops, I see the mistake. 0-->$28 is the $28th word in memory, which is at address $50.

(Note to onlookers: these are hex values.)

Try this:

print "<header word $28 (routine offset) = ", $28-->0, ">^";
print "<header word $2A (strings offset) = ", $2A-->0, ">^";

(0-->$14 and 0-->$15 would work equally well.)
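For onlookers, the addressing rule behind this fix is that `base-->index` reads the 16-bit word at byte address base + 2 * index. A small Python sketch (the helper name `word_byte_address` is invented here) shows which byte each expression touches:

```python
def word_byte_address(base, index):
    """Byte address of the word read by Inform's base-->index."""
    return base + 2 * index


wrong = word_byte_address(0, 0x28)        # 0-->$28
routines = word_byte_address(0x28, 0)     # $28-->0
routines_alt = word_byte_address(0, 0x14) # 0-->$14
strings = word_byte_address(0x2A, 0)      # $2A-->0
strings_alt = word_byte_address(0, 0x15)  # 0-->$15
print(hex(wrong), hex(routines), hex(strings))  # 0x50 0x28 0x2a
```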

Much obliged, @zarf. With those changes in place, the reported offset values for Z5 and Z8 are all zero (here meaning the offsets stored in the two header words $28 and $2A; #*_offset values report as before). For Z6, they are:

<header word $28 (routine offset) = 1535> ! $05FF
<header word $2A (strings offset) = 1535> ! $05FF
<#code_offset = 256>        ! $0100
<#strings_offset = 15872>   ! $3E00
<location of Main() = 258>  ! $0102
<location of Main__() = 0>
! per -z output, Z-code begins $33F8 and strings begin $127F8, see below

It seems like all the pieces are present to be able to reconcile the numbers in the calculation for R, shown as:

R  =  (M - 4X)/8

[EDIT: OK, this didn’t look like it was working, but now I’ve got it. The issue was that the value of X (extend_offset) is NOT defined as no_objects+9 in my example case; X gets “nudged up” to 256 ($0100) as a minimum value prior to calculating R. This step is hiding in the definition of “slightly above” as the phrase is used in ITM 8.8. Answers above updated.]

When compiling for Z6, in tables.c the internal compiler variable code_offset (which is presumably the value reported for system constant #code_offset) is initially set via:

code_offset = extend_offset*scale_factor;

This occurs immediately after extend_offset is determined per the answer to question #4 above. The following code is used to populate the value of R as stored in $28:

j=(Write_Code_At - extend_offset*scale_factor)/length_scale_factor;
p[40]=j/256; p[41]=j%256;

Substituting M for Write_Code_At and X for extend_offset, and using the Z6 values for scale_factor (4) and length_scale_factor (8), this accords exactly with the given formula.

In my example scenario:

  • there are 67 objects per compiler’s -j output
  • no_objects+9 is less than 256, so X (extend_offset) is set at $0100.
  • M is $33F8, as reported by compiler’s -z output and Initialise().
  • 4X == 4 * $0100 = $0400
  • M - 4X == $33F8 - $0400 == $2FF8
  • (M - 4X)/8 == $2FF8 / 8 == $05FF == 1535 == $28-->0
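The same reconciliation as a short Python sketch, using only the numbers reported above for the Z6 build. The variable names are illustrative; the last lines also check that the Z6 decoding formula (4P + 8 times the offset) takes the reported #code_offset and #strings_offset values to the area starts shown in the -z output.

```python
no_objects = 67                  # from the compiler's -j output
X = max(no_objects + 9, 256)     # extend_offset, per the tables.c excerpt
M = 0x33F8                       # start of Z-code, from the -z output

R = (M - 4 * X) // 8
S = R                            # header words $28 and $2A were identical

code_start = 4 * 256 + 8 * R       # #code_offset was 256
strings_start = 4 * 15872 + 8 * S  # #strings_offset was 15872

print(X, R, hex(code_start), hex(strings_start))
# 256 1535 0x33f8 0x127f8
```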

For fun, I added to my Initialise() output:

e = "egg";
print "<location of an instance of ~egg~ = ", e, ">^";

and got output:

<location of an instance of "egg" = 19339>

One virtual Easter egg hunt with a hex editor later, I found the sequence $A9 $8C at position $015E24 in the Z6 story file, so all of the above seems to be tying out.
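The egg hunt can be double-checked with the Z6 decoding formula. This Python sketch assumes S = 1535 (the value read from $2A-->0 above) and the printed packed address 19339; the variable names are illustrative.

```python
S = 1535              # strings offset read from header word $2A
packed_egg = 19339    # value printed for the "egg" string

egg_byte_address = 4 * packed_egg + 8 * S
strings_start = 0x127F8   # start of static strings, from the -z output

in_strings_area = egg_byte_address >= strings_start
print(hex(egg_byte_address), in_strings_area)  # 0x15e24 True
```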

@Dannii, in this response above, did the phrase “routines/strings offsets” map to #code_offset and #strings_offset (which now seems to be the best way to understand it), or to R and S (as I had initially interpreted it)?

I don’t know. I’ve never looked into the compiler internals like this.

So. That was a journey. For anyone trying to follow the same path, I strongly recommend reading ZMS section 1 and the contents of compiler source file tables.c in detail before even trying to make sense of ITM 8.8. Before that, you may wish to review the following as a guide to items of interest.

First and foremost: despite certain similarities in name and pairing, there is no direct connection between the concepts behind R and S (from the Z6 architecture) and the concepts of code_offset and strings_offset (from the Inform compiler). R is not the same thing (conceptually) as code_offset, and S is not the same as strings_offset.

Second: The presentation of equation M = 4X + 8R in ITM 8.8 can be (mis)taken to imply that it is the formula used by the compiler to determine M, but it is not – rather, the formula seems to be the expression of an internal relationship holding true for the Z6 and Z7 architectures only: a quirk that seems intended to create more finely-addressable high memory areas for routines and strings than would otherwise be possible with the targeted addressable-to-virtual memory scaling factor of 8. (Compare the minimum distance between translated packed addresses within virtual memory as seen in Z6 versus Z8.) M is determined solely by the algorithm as described in the answer to question #6 above. The important equation is the rearranged version R = (M-4X)/8, which gives the compiler’s formula for determining R.

Third: The phrase “where X is slightly more than the largest object number” in ITM 8.8 leaves things unsaid that might be better laid out explicitly. Most importantly, X has a minimum value of 256, which is used whenever the number of objects (including those from the Standard Library) is 247 or fewer, since X only rises above 256 once no_objects+9 exceeds it. Compiler switch -j will output objects as identified, and this can be used to see the true object count that will be used by the compiler when determining X.
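A minimal Python sketch of that rule, written to mirror the tables.c excerpt quoted earlier (`extend_offset=256; if (no_objects+9 > extend_offset) extend_offset=no_objects+9;`). The function is only a restatement for illustration, not compiler code.

```python
def extend_offset(no_objects):
    """X: the larger of no_objects + 9 and the floor of 256."""
    return max(no_objects + 9, 256)


at_67 = extend_offset(67)     # the Ruins example
at_247 = extend_offset(247)   # 247 + 9 == 256, still the floor
at_248 = extend_offset(248)   # first count that pushes X above 256
print(at_67, at_247, at_248)  # 256 256 257
```

So the first object count at which X rises above 256 is 248.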

Fourth: If scanning or searching through the contents of tables.c, be aware that not every use of the key variables named applies to the discussion in ITM 8.8. Sometimes, they are used within the Glulx output function.

As a compact summary of equivalencies when compiling for Z6:

  • scale_factor == 4 == minimum difference between byte addresses translated from packed addresses
  • length_scale_factor == 8 == ratio of virtual memory size (per Z-Machine version) to addressable memory (fixed at 64K)
  • X == extend_offset == max((no_objects+9), 256)
  • code_offset == (scale_factor*extend_offset) == (scale_factor*max((no_objects+9), 256)) == 4X
  • M == high memory starting mark == whatever byte address the compiler was at after writing out “static arrays” (which are written after the dictionary and are not to be confused with the memory block labeled “arrays” in -z output) and then padding for divisibility by length_scale_factor == $04-->0 == Z-code lower bound in compiler’s -z output
  • Write_Code_At == max((scale_factor*256), M)
  • R == $28-->0 == ((Write_Code_At - (extend_offset*scale_factor))/length_scale_factor)

At any rate, this post was theoretically intended to answer the question of what a scaled address is. The answer provided by zarf is a succinct response.

If you want to dig one layer deeper, Inform has a -B option which does allow R and S to be different values, thus permitting it to compile larger V6/7 games.

The option is described here: Inform - Support - Patches