Format of Inform 6 Debugging Information Files
Version 1.0 Rough Draft (2012 October 16)

0: Introduction
This is a specification of the Version 1 format for the debugging information files emitted by the Inform 6 compiler. It replaces Version 0, which is documented in Section 12.5 of the Inform Technical Manual.

1: Overview
Debugging information files are written in XML and encoded in UTF-8. They therefore begin with the following declaration:
<?xml version="1.0" encoding="UTF-8"?>
Special characters in strings are written as entity references (&quot;, &amp;, etc.) rather than enclosed in CDATA sections, numbers are written in decimal, and excerpts from binary files are base64-encoded.

2: The Top Level
The root element is given by the tag with three attributes: the version of the debug file format being used, the name of the program that produced the file, and that program's version. For instance:
...
Underneath the root is an element for the story file header and then the bulk of the content, organized by source file.

2.1: The Story File Header
The story file header contains a base64 encoding of the story file's first bytes so that a debugging tool can easily check whether the story file and the debug information file are mismatched. For example, the header for a Glulx story might appear as
R2x1bAADAQEACqEAAAwsAAAMLAAAAQAAAAAAPAAIo2Jc
6B2XSW5mbwABAAA2LjMyMC4zOAABMTIxMDE1wQAAMA==
The story file header is mandatory, but its length is unspecified. Version 6.33 of the Inform compiler records 64 bytes, which seems sufficient.

2.2: Source Files
Source files are encoded as in the example below. Each file's path is recorded in two forms: first as it was given to the compiler (the form suitable for presentation to a human) and second after resolution to a relative or absolute path (the form suitable for loading the file contents). All paths are written with forward slashes separating directory and file components, regardless of the host OS.
example.inf
directory/example.inf
...
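The two path forms can be sketched in Python. This is a minimal illustration, not the compiler's behaviour: the function `source_path_forms` and its parameters are hypothetical, and it assumes "resolution" simply means joining the given path onto a base directory and normalizing it (a real compiler also consults its include search paths). Backslashes are rewritten only for Windows hosts, because on POSIX systems a backslash is an ordinary filename character.

```python
import ntpath
import posixpath
from typing import Tuple


def source_path_forms(given: str, base_dir: str, host_is_windows: bool) -> Tuple[str, str]:
    """Return the (given, resolved) path pair with '/' separators.

    Hypothetical helper: resolution is modelled as joining `given` onto
    `base_dir` and normalizing; include-path searching is not modelled.
    """
    pathmod = ntpath if host_is_windows else posixpath
    resolved = pathmod.normpath(pathmod.join(base_dir, given))
    if host_is_windows:
        # Windows treats '\\' as a separator; the debug file wants '/'.
        return given.replace("\\", "/"), resolved.replace("\\", "/")
    # On POSIX '\\' is a legal filename character, so leave paths alone.
    return given, resolved
```

Apart from its separators, the given form is left exactly as typed, since it is the one meant for human eyes.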
The other child elements, represented by the ellipsis, describe the file's contents, as outlined in the next section. Some code, such as the veneer, has no associated source file. All such code is grouped under a single source-file tag with both path elements omitted:
...

3: The Source File Level
Every element that may appear under a source file's tag contains one child element holding its I6 identifier and another indicating the section of the source code that declared that identifier, unless the identifier was declared internally:
<...> foo ... ...

Most source code locations take the following format, which records the line and column where they begin, the line and column where they end, and the file positions (in bytes) corresponding to those endpoints:
1024 4 44153 1025 1 44186

However, when the endpoints coincide, as happens with sequence points, the end elements may be omitted:
1024 4 44153

Line numbers begin at one, but column numbers and file positions count from zero. This is consistent with the majority of text editors.

3.1: Named Values; Constants, Attributes, Properties, Actions, and Fake Actions
Apart from identifier and location, named values have only one other child element, the value itself. For instance,
MAX_SCORE 40 ...
records a named constant. Attributes, properties, actions, and fake actions are also names for numbers and differ only in their use; they are represented in the same format, each under its own tag.

Note that the system constants defined by the compiler and tabulated in Section 12.2 of the Inform Technical Manual are included in the debug information file, even though they are not created by Constant directives. These entries supersede the MAP_DBR entries from the Version 0 format [1]. As another exception, constants that are created with a Constant directive might not appear if they are #undefed at a later source location.
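The record shapes described in this section might be modelled as below. Everything here is hypothetical: the class names, the `kind` labels, and the assumption that a column counts bytes rather than characters (the draft measures file positions in bytes but does not say what unit a column uses). The sketch shows the numbering conventions (lines from one, columns and positions from zero), an end position that falls back to the start when omitted, a missing location for internally declared names, and why a bare number can only be named once its kind is known.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Position:
    line: int           # counts from one
    column: int         # counts from zero (assumed here to be a byte count)
    file_position: int  # byte offset into the source file, from zero


@dataclass(frozen=True)
class SourceLocation:
    start: Position
    end: Optional[Position] = None  # omitted when it coincides with start

    @property
    def effective_end(self) -> Position:
        # An omitted end means the location is a single point.
        return self.end if self.end is not None else self.start


def position_at(source: bytes, offset: int) -> Position:
    """Derive the line and column of a byte offset into a source file."""
    line_start = source.rfind(b"\n", 0, offset) + 1  # 0 if no earlier newline
    line = source.count(b"\n", 0, offset) + 1
    return Position(line, offset - line_start, offset)


@dataclass(frozen=True)
class NamedValue:
    kind: str        # hypothetical labels: "constant", "attribute", "property", ...
    identifier: str
    value: int
    location: Optional[SourceLocation]  # None when declared internally


def names_for(records: List[NamedValue], kind: str, value: int) -> List[str]:
    # Attributes, properties, actions, etc. are all plain numbers, so a number
    # can only be turned back into a name once its kind is known.
    return [r.identifier for r in records if r.kind == kind and r.value == value]
```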
3.2: Named Addresses; Global Variables, Classes, and Objects
Globals and objects are a similar case, except that the value they associate with a name is specifically an address. They therefore carry an address element in place of the value element, as in:
darkness_witnessed
1520
...
Some objects represent classes; in that case they are given under a separate class tag rather than the object tag, and they include an additional child element indicating their class number:
lamp 5
1560
...
3.3: Named Memory Regions; Arrays and Routines
Arrays and routines are likewise represented by memory addresses, but the debugging information additionally records the extent of the region they point to. A further child element indicates this size:
<...> ...
...
... ... ...
Array records also track how they were declared, specifically the number of bytes allocated per element and whether their zeroth element was set equal to the array length:
route
1500
20 4 true ...
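Given a memory image, these array fields are enough to recover an array's entries. This is a minimal sketch with a hypothetical `read_array` function, assuming (the draft does not say so) that the recorded size is the region's extent in bytes, that entries are unsigned big-endian integers (both the Z-machine and Glulx store multi-byte values big-endian), and that `memory` is indexed by the recorded addresses. It treats the length slot as an ordinary entry, which fits table and string arrays; buffer arrays, whose length occupies a full word ahead of byte-sized entries, are not handled.

```python
from typing import List, Optional, Tuple


def read_array(memory: bytes, address: int, size: int, entry_bytes: int,
               zeroth_is_length: bool) -> Tuple[Optional[int], List[int]]:
    """Return (declared length or None, remaining entries) for one array record."""
    if entry_bytes <= 0 or size % entry_bytes != 0:
        raise ValueError("size must be a whole number of entries")
    region = memory[address:address + size]
    if len(region) != size:
        raise ValueError("array extends past the end of the memory image")
    # Decode each entry as an unsigned big-endian integer.
    entries = [int.from_bytes(region[i:i + entry_bytes], "big")
               for i in range(0, size, entry_bytes)]
    if zeroth_is_length:
        if not entries:
            raise ValueError("array has no slot for its length")
        return entries[0], entries[1:]
    return None, entries
```

With the values from the example (address 1500, size 20, 4 bytes per entry, zeroth entry holding the length), a region holding 4, 10, 20, 30, 40 decodes to a declared length of 4 followed by four entries.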
Routines, on the other hand, have child elements for their local variables and sequence points. The format for local variables mimics the format for global variables, except that locals are identified by a zero-based index and do not have their source code location recorded:
rulebook 0
As for sequence points, each is stored as an instruction address and the corresponding location in the source code:
1628
...
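Together, routine extents and sequence points let a debugger map a program counter back to source. A sketch under stated assumptions: routines do not overlap, and the best available source location for an instruction is that of the nearest sequence point at or before it within the same routine. `Routine`, `AddressMap`, and the `(line, column)` tuples standing in for source positions are all hypothetical.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

SourcePos = Tuple[int, int]  # hypothetical stand-in for a position: (line, column)


@dataclass
class Routine:
    identifier: str
    address: int
    size: int  # extent of the routine's region in bytes
    sequence_points: List[Tuple[int, SourcePos]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Keep sequence points ordered by instruction address for bisection.
        self.sequence_points.sort(key=lambda sp: sp[0])


class AddressMap:
    def __init__(self, routines: List[Routine]) -> None:
        self.routines = sorted(routines, key=lambda r: r.address)
        self.starts = [r.address for r in self.routines]

    def routine_at(self, pc: int) -> Optional[Routine]:
        """Find the routine whose region [address, address + size) holds pc."""
        i = bisect.bisect_right(self.starts, pc) - 1
        if i >= 0:
            r = self.routines[i]
            if pc < r.address + r.size:
                return r
        return None

    def source_at(self, pc: int) -> Optional[SourcePos]:
        """Source position of the last sequence point at or before pc."""
        r = self.routine_at(pc)
        if r is None:
            return None
        addrs = [a for a, _ in r.sequence_points]
        j = bisect.bisect_right(addrs, pc) - 1
        return r.sequence_points[j][1] if j >= 0 else None
```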
For sequence points, the source code location will always be a single position, not a range. Sequence points are defined as in Section 12.4 of the Inform Technical Manual, but with the further stipulation that labels do not influence their source code locations, as they did in Version 0 of the debug information format. For instance, in code like
say__p = 1; ParaContent(); .L_Say59; .LSayX59; t_0 = 0;
the sequence points are to be placed like this:
<*> say__p = 1; <*> ParaContent(); .L_Say59; .LSayX59; <*> t_0 = 0;
rather than like this:
<*> say__p = 1; <*> ParaContent(); <*> .L_Say59; .LSayX59; t_0 = 0;

--------------------------------------------------------------------------------
[1] Well, almost. Not all of the MAP_DBR values are available as system constants under both targets. But is there any reason not to make them so?