IF Text Emission as a Service

DavidC · January 6, 2024, 4:47am

Okay IF platformers. In my experimenting with a C#-based platform, one of my interests is in a completely separate text emission process.

The general logic is:

each turn
- report things that happen
- report location description
- report scenery
- report items lying about
- report NPCs
- report list of exits and possibly their destinations

All of this would get sent to the Text Service. When the turn is complete, the game loop will call the TextService (probably from the World Model, but unclear) and get the text and emit it using some kind of logic or template.
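A minimal sketch of that flow, written in Python rather than C# purely for brevity. The names `TextService`, `report`, and `end_turn`, and the channel labels, are hypothetical and not taken from any existing IF platform.

```python
from collections import defaultdict


class TextService:
    """Collects contextual reports during a turn and hands them back at the end."""

    # Hypothetical channel order, mirroring the per-turn list above.
    ORDER = ["events", "location", "scenery", "items", "npcs", "exits"]

    def __init__(self):
        self._reports = defaultdict(list)

    def report(self, channel, text):
        # Game code only says what happened; no punctuation or line-break glue.
        self._reports[channel].append(text)

    def end_turn(self):
        # Return (channel, [texts]) pairs in a fixed order, then reset for the next turn.
        result = [(c, self._reports[c]) for c in self.ORDER if self._reports[c]]
        self._reports = defaultdict(list)
        return result


service = TextService()
service.report("items", "a brass key")
service.report("location", "You are in a dusty attic.")
service.report("items", "a coin")
turn = service.end_turn()  # reports come back grouped by channel, not in call order
```

How the grouped reports become prose is left open here, since that is the undefined part.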

That’s the tricky undefined part. In our current IF platforms like TADS and Inform, text emission is a stream of text spliced together with punctuation and line breaks. This has always been annoying to me.

I want to create an entirely new paradigm where the author simply emits contextual text and the text service puts it all together into some standard format.

I’m not worried about Fonts or Types or Styles. Those are well-known and can be easily managed.

This is more about structure. In our current platforms, we often will loop through objects and delimit their output with commas or in other ways. Sometimes we leave text without punctuation, leaving that for some final text handling.

What if we didn’t worry about any of that and simply reported things to the service and the service used the list of contents (and their context) to structure the output properly?

A template might be something like (pseudo code):

emit before-turn actions with two line breaks
emit location name in bold with one line break
embed scenery and list of exits in location description
emit location description with two line breaks
emit npc names
end

Of course, the whole line break thing is up in the air too, since we could be emitting HTML or be targeting some other kind of client.
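One way to keep the line-break question open is to treat the template as data and let each target decide what a "break" or "bold" means. The sketch below assumes a hypothetical `TEMPLATE` structure and `render` function; it leaves out the step that embeds scenery and exits in the description, and upper-casing stands in for bold on a plain terminal.

```python
import html

# Hypothetical template: (channel, style, number of trailing breaks) per step.
TEMPLATE = [
    ("events", "plain", 2),
    ("location_name", "bold", 1),
    ("location_description", "plain", 2),
    ("npcs", "plain", 0),
]


def render(template, content, target="text"):
    """Apply the template to a dict of channel -> text for one output target."""
    out = []
    for channel, style, breaks in template:
        text = content.get(channel)
        if not text:
            continue  # nothing was reported on this channel this turn
        if target == "html":
            text = html.escape(text)
            if style == "bold":
                text = f"<b>{text}</b>"
            out.append(text + "<br>" * breaks)
        else:
            if style == "bold":
                text = text.upper()  # stand-in for bold on a plain terminal
            out.append(text + "\n" * breaks)
    return "".join(out)


content = {
    "location_name": "Attic",
    "location_description": "Dust hangs in the air. A ladder leads down.",
}
plain = render(TEMPLATE, content)
as_html = render(TEMPLATE, content, target="html")
```

The same template and the same content produce both outputs; only the target changes.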

Thoughts anyone?

pinkunz · January 6, 2024, 5:53am

I sincerely read this in its entirety no less than 3 times. I still don’t fully understand.

I gather that you are unhappy with either the order various outputs are listed, how they are formatted internally and in relation to each other, or the positioning of line breaks and punctuation. Which is fine, I suppose. What I’m having a harder time seeing is what you’d prefer.

May I make a suggestion? Show us two full outputs. One using either TADS or Inform, displaying the things you are unhappy with, and the other an ideal output from this new system. They would convey the same information, but differ in their presentation, making it easier for smoothbrains like me to better ascertain your intent.

Draconis · January 6, 2024, 6:22am

Having struggled a lot to get line breaks in the proper place with Inform, or to get results reported in the right order, I can see the appeal!

My main thought is that you’d end up with a bunch of lists, each of which will be joined together according to some rule (commas and “and”, double line breaks…) at the printing side of things. Lists of lists and all that.
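A small sketch of the "lists of lists" idea, with one joining rule per level. The function names are hypothetical, and the serial comma is just one possible convention.

```python
def join_items(items):
    """Join names with commas and a final "and" (serial comma)."""
    items = list(items)
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def join_paragraphs(paragraphs):
    """Join non-empty paragraphs with a blank line between them."""
    return "\n\n".join(p for p in paragraphs if p)


# Inner lists become sentences; the outer list becomes paragraphs.
on_floor = ["a lamp", "a key", "a coin"]
on_shelf = ["a book"]
text = join_paragraphs([
    f"On the floor you see {join_items(on_floor)}.",
    f"On the shelf you see {join_items(on_shelf)}.",
])
```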

Hanna · January 6, 2024, 9:39am

I’ve not fought with Inform’s approach to text output as much, but I’ve seen enough to take the idea of a more structured approach seriously. The big question marks to me revolve around how structured it should be, how deep this structure goes, and what restrictions this entails.

On its face, the sort of template DavidC sketched addresses the highest level: all of the output in a turn is sorted into a few buckets, to enforce a convention about grouping and ordering. This much is sensible for most parser games, and it’s easy to imagine the template being flexible enough to add, remove and rearrange these buckets. But what is going into those buckets? In other words, how structured is the data flowing from game code to the template?

If it’s just a bunch of sentences or paragraphs to be concatenated, then authors remain very flexible in how they build those text snippets, but common tasks like collecting a list of objects for a room description, collating them, and gluing together the individual descriptions remain firmly in the land of game and library code (as is currently the case in Inform). The flip side is that the template has very limited ability to intervene, e.g., to insert or modify HTML markup. A new system in this style could still have structured markup, but the game code would make most of the markup decisions before sending anything to the final stage.

The other extreme is a very detailed “data model” for text output. In the same way that a typical parser IF system has preconceived ideas about a world model, this style of system may have very concrete ideas of what concepts the text output deals with and accept output in those terms. Done well, this could be very convenient for games that fit the mold, but just as for world models, it’s a nuisance or roadblock whenever an author wants to do something a little different. Whereas Inform authors have to learn how to manipulate line breaks and the standard rules’ many printing activities, authors using this kind of system may have to learn how to customize this text emission service (which, handling many of the same concerns, would also have a fair amount of inherent complexity).

There are countless design points in between these extremes. I’m not experienced enough as an author to say for sure which of these would be the most interesting to me. I am, however, pretty sure that there’s no way to please everyone in this respect. Which doesn’t mean “nobody should try” but rather that I’d recommend targeting specific pain points, needs and preferences. Exploring more options is a good thing!

jbg · January 6, 2024, 9:59am

I understand the argument for this sort of thing, but I don’t think the solution is to have the game object constantly emitting events to the UI. The way I’d expect a modern-ish engine to work is that the game object keeps track of the game state, which the UI can then query via some standard API whenever, however, and about whatever it needs.

So on any given turn the UI does the equivalent of asking: Where is Alice? What’s the name of that location? What turn number is it? And so on.

With a simple game state model you could handle commands via an input queue and command results via an output queue. Either “simple” queues (i.e., the output queue contains text literals describing the output of the current turn’s command) or something that handles things like containment, occlusion, and so on (i.e., the output queue consists of some kind of event object, which can be filtered and sorted and re-written to reflect the UI player’s viewpoint).
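A rough sketch of the second variant, where the output queue holds event objects that the client filters from the player’s viewpoint. `Game`, `Event`, and their fields are hypothetical, and the turn logic is a stub rather than a real parser.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    kind: str       # e.g. "echo", "move"
    location: str   # where the event happened
    text: str       # author-supplied description


class Game:
    """Hypothetical game object with an input queue and an output queue."""

    def __init__(self):
        self.commands = deque()
        self.output = deque()
        self.player_location = "attic"

    def submit(self, command):
        self.commands.append(command)

    def run_turn(self):
        # A real engine would parse and execute; this stub only records events.
        while self.commands:
            command = self.commands.popleft()
            self.output.append(Event("echo", self.player_location, f"You {command}."))
        # Something happening out of sight, to show client-side filtering.
        self.output.append(Event("move", "cellar", "A rat scurries past."))

    def drain(self):
        events = list(self.output)
        self.output.clear()
        return events


game = Game()
game.submit("take the coin")
game.run_turn()
# The client decides what to keep: here, only events in the player's location.
visible = [e.text for e in game.drain() if e.location == game.player_location]
```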

But yeah. I’d absolutely like an IF engine that I could, for example, embed in an arbitrary javascript widget by just instancing it, submitting commands to it via some method on the object, and then handle the output however the hell I wanted (for my WIP one of the things I wanted to do was to have the IF part running on an old 8-bit computer in a 3d environment, but none of the TADS3 interpreters are really set up to handle that sort of thing).

DavidC · January 6, 2024, 10:12am

An important point to make is that this is a fun exercise. I’m not at all concerned about any future popularity or usage of my eventual system. This is just an accumulation of ideas from using Inform 6 and 7. I love IF from a creation perspective. The aspects of Grammar, Parser, and World Model are an extremely unique aspect of any programming I have done in 40 years. There really isn’t anything like it (that’s fun). One could make a case for processing insurance claims having a very similar set of overlapping rules, but I assure you, that code is nega-fun.

I’m not trying to solve any specific problem. I’m trying to reframe the problem under a different model.

What if the text service was really smart? Maybe it understands sentence structure, paragraph structure, punctuation, spacing, and given a bunch of contextual items, it can interpret all of it and cohesively produce a pleasant output.

Maybe as an author I could emit the awareness of some object that’s very important or suggest it remain opaque? The rules for such things could be very interesting. In one case the text would stand out and in the other a single word might be embedded in some other text.

The important object leans against the wall.

There is a four foot high wall filled with painted graffiti obscured by odd shadows.

This is purely experimental. I don’t necessarily have a vision. But I think this path has interesting potential.

More controversially, maybe there’s a language specific LLM that combines all the text into something interesting.

DavidC · January 6, 2024, 10:20am

There is definitely value in querying the game state for UI constructs, but there’s a competing aspect of reporting things that are an immediate result of a game state change.

You could query for physical changes, but as an author you want to interpret those state changes into something tangible and readable. I think this is more logically done as it happens than being later identified and handled.

Excellent ideas I will combine on this journey.

jnelson · January 6, 2024, 7:24pm

I don’t like the programmatic text emissions in TADS, such as automatically listing surface and container contents (which have parallels in Inform and elsewhere). I mean, it’s important that TADS have them, but I very much like sculpting my room descriptions rather than have a set of rules dump bland expositive statements to output.

An example from a WIP:

Beside him is a hospital nightstand with a note on it. Against the far wall is a broad table with an assortment of items scattered across its top.

I don’t claim this is superlative prose, but it’s better (to my eyes) than:

Beside him is a hospital nightstand. Against the far wall is a broad table.

On the nightstand you see a note.

On the table you see a pocketbook, a handkerchief, a coin, and a wristwatch.

Note that the first example will mutate if the contents of the nightstand and table change. In particular, if the table is stripped of all but one or two items, it will list them rather than use the generic “assortment of items scattered across it.”
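The mutation rule described here can be sketched roughly as follows. This is not the author’s TADS code; `describe_surface` and its parameters are hypothetical, and the threshold of two follows the “one or two items” wording above.

```python
def join_items(items):
    """Join names with commas and a final "and"."""
    items = list(items)
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def describe_surface(lead_in, contents, generic, threshold=2):
    """List the contents when there are few; otherwise use a generic phrase."""
    if not contents:
        return lead_in + "."
    if len(contents) <= threshold:
        return f"{lead_in} with {join_items(contents)} on it."
    return f"{lead_in} with {generic}."


table = ["a pocketbook", "a handkerchief", "a coin", "a wristwatch"]
lead_in = "Against the far wall is a broad table"
generic = "an assortment of items scattered across its top"

full = describe_surface(lead_in, table, generic)
sparse = describe_surface(lead_in, table[2:3], generic)  # only the coin is left
```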

I do rely on generic output for dealing with things like the user dropping random items in a room (versus setting them on a surface). I suppose I could arch over backwards and provide custom handling for that in every location too, but I’m not a madman.

So, I’m not saying mechanical output should be banished. This hospital room is an important room in the game, and I want that kind of control to “sculpt” the prose to my liking. I may want to list exits before scenery, for example, or blend the descriptions together. Or, don’t list NPCs here, because my room description will handle that based on variable game state. I might want to add a field to the emissions list for every location in a region (“emit ambient noises heard”), or depending on an event tied to a timed fuse. And so on.

I do think the direction you’re going is interesting. There’s plenty of assumptions baked into parser systems that deserve a fresh reappraisal.

Important point about TADS output streams: Although it’s pretty much as you described (blocks of text spliced together), the stream does contain markup, some of which provides structural information (room title, room description, etc.), and not merely style information. If the library and game authors were stricter in providing this structural information, I could see, in theory, a system coarsely like what you’re describing being achieved.

jbg · January 6, 2024, 10:00pm

But unless the system is realtime (like a MUD, or a very small number of parser IF works, like Madness and the Minotaur) there’s no need to notify the client asynchronously. Because everything happens in discrete, turn-sized chunks, and they happen in response to user input.

And even if turns happen independent of user input (for example, if the atomic time interval is a multiple of the length of in-game time it takes for the player to take a turn, so some mobs can move multiple times in a single player turn) that’s still amenable to handling via event queues without any need for asynchronous notifications to the client.

DavidC · January 7, 2024, 1:15am

None of the things I mention notify the client directly. The client doesn’t know anything until the turn is completed and the text service emits its results (or the client asks the text service “hey what should I show now?”).

Alianora_La_Canta · January 7, 2024, 10:59am

I’m sorry, but I feel like this missed an explanation of what “the Text Service” is in the first place (this is not a term I’ve ever seen used in IF before, and I’ve only previously heard of it as a Windows-specific natural language tool that, among other things, requires C or C++ to run - immediately ruling out the majority of IF projects).

Without a clarification of what you mean by Text Service, none of the rest of the post makes sense to me.

DavidC · January 7, 2024, 12:17pm

I’m definitely pushing my IF ideas towards modern software architecture patterns, so in this case Text Service might be thought of as a discrete piece of software that handles IF text emissions. During the turn process, code will need to emit text in response to story state changes. These “reports” normally get “printed” to some standard output in a collective stream. Instead, I’m suggesting we “report” to the Text Service and at the end of the turn process, ask it for the combined results. Within the Text Service is some logic that combines all the reported text as well as World Model state and sends this to the UI (or the UI subscribes to a TurnCompleted event and asks the Text Service for its results).

The hard part is, what can be reported and what intelligence is needed to effectively combine the reported text into something useful.

jbg · January 7, 2024, 9:54pm

In which case why is the service end deciding what to show the client at all? Unless you want to intentionally limit what the client has to look/behave like (which is exactly orthogonal to the reason I’d want such a system in the first place), there’s no reason to bake queries into the service end. That kind of thing is just a specific query the client can run if it wants to.

In slightly different terms, you’re about 90% of the way to an MVC design pattern, only the controller forces a particular view. Which kinda defeats the point. Unless line spacing (to use an example from the OP) is somehow or other intrinsic to the game world. And put in slightly different terms (again): imagine that instead of going to the equivalent of a generic web browser or a terminal emulator, your output stream is going to a screen reader. Would it ever request a location description followed by two line breaks? If not, why would you force it to accept output formatted that way, instead of in some format better suited to its needs? Similarly, say that the client is graphical, and is rendering all the text to a virtual display in a 3D environment (which is one of my use cases). Why insist on outputting in a format designed for…whatever your implementation target is…instead of letting the client handle it?

DavidC · January 8, 2024, 9:28pm

There are two things going on here.

First, we’re doing IF things with our reported text. We may be merging inventory items into some larger text, or identifying different aspects of the text for UI usage.

The UI would then take that text (it may be a dynamic list of context+text similar to how I implemented FyreVM). Then the UI would choose to lay out the text in its own way.

We’re trying to separate those concerns. One is IF things, the other is UI things. The Text Service in this case is mostly concerned with the IF things.

Things like punctuation would be handled in the Text Service. But we might also have line breaks in an atomic piece of text. We might use markup for that, so the UI understands it. These are definitely open questions about the line between what is “IF text” and what is “text to be displayed by the UI”.
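To illustrate the split, here is a sketch in which the Text Service hands over a list of context+text pairs and two different UIs lay it out. The context names and both UI functions are hypothetical; this does not reproduce FyreVM’s actual channel format.

```python
# Hypothetical output of the Text Service for one turn: (context, text) pairs.
turn_output = [
    ("location_name", "Attic"),
    ("location_description", "Dust hangs in the air."),
    ("exits", "A ladder leads down."),
]


def terminal_ui(pairs):
    """Lay out for a terminal: title on its own line, followed by a blank line."""
    lines = []
    for context, text in pairs:
        if context == "location_name":
            lines.append(text.upper())
            lines.append("")
        else:
            lines.append(text)
    return "\n".join(lines)


def screen_reader_ui(pairs):
    """Lay out for speech: no visual spacing, just labelled sentences."""
    formats = {"location_name": "Location: {}.", "exits": "Exits: {}"}
    return " ".join(formats.get(c, "{}").format(t) for c, t in pairs)


on_terminal = terminal_ui(turn_output)
spoken = screen_reader_ui(turn_output)
```

Punctuation inside each piece of text stays with the Text Service; spacing and labels are decided by each UI.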

tundish · January 8, 2024, 11:07pm

This is (sort of) how my Python library Balladeer operates.
A Story object maintains a number of contexts, called Drama. Drama objects encapsulate both state and behaviour.

In response to user input, the current Drama applies appropriate state transitions.
The Drama generates objects of speech (Prologue, Dialogue and Epilogue).
Using Drama state as the conditional criteria, the Story object selects more dialogue from scene files.
Static dialogue is reformatted with variable substitution from the Drama object.
The static dialogue is interleaved with the programmatic speech.
The resulting stream (optionally) invokes event handlers on the Drama object.
The Story selects one Drama to represent the next context in scope.
The new Drama publishes the user commands now permitted by current state.

Both static and programmatic dialogue are expressed in SpeechMark format.

riidom · January 8, 2024, 11:24pm

Just to understand. The input could be something like:
room: attic; atmosphere: dusty; lightsource: window (west side); items: coin, coin, coin, drawer

And then the result would be like, assuming the attic gets visited in the afternoon on a sunny day: “It’s dark and dusty, only where the lightbeams, well visible in the dusty air, coming through the window are hitting the ground, you can spot items. After some investigation you notice there are several coins.”

(Sorry for bad writing, that grammar level is slightly above my payroll:) )

This feels a bit like a 1st- or 3rd person 3D game to me, just with a text renderer as output. In a 3D game you give up control over the final image (in contrast to, let’s say, a point and click game or a VN), instead “indirectly” control via items, scenery and lighting what the player will see.

In the same way you would provide only fragments and have the text service assemble the final output. Is that what you meant?

jbg · January 9, 2024, 12:18am

Right, and that totally makes sense. My point is that once you start heading down the road of separating the game state from its representation, I don’t know why you’d want to start peppering presentation-level stuff into the backend.

If you 100% unquestionably know everything the client is ever going to do and how they’re going to do it then maybe it makes sense (in terms of implementation cleanliness, ease of use, performance, or whatever). But if you don’t (and spoiler alert, you don’t) then it almost always makes more sense to bake your canned queries (or whatever you want to call them) into the system as API usage cases or whatever. Instead of being behaviors hardwired into the service end.

It’s your idea and so you can of course implement it in any way you want. But it is a lot easier to bake raw data into a new format than it is to parse baked data to reformat it, and I don’t think I’ve ever been in a situation where I’d prefer more prescribed behaviors on the service end instead of better configurability on the client end.

As I understand your idea, you’ve got a game object that keeps track of the game state. In most IF systems, the UI gets user input, submits it to the game object, and in response gets a block of formatted(-ish) text to display. Your idea is to add structure to this communication channel in one direction: instead of a block of text, the game object provides something like an XML document with tags for various types of output (location name, room description, and so on). That makes sense. But if that’s what you’re doing, I don’t see the advantage of baking presentation details (like adding a line break to the location name) instead of just providing the semantic markup.
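A sketch of what semantic-only markup might look like, using Python’s standard `xml.etree.ElementTree`. The tag names and the `turn_to_xml` helper are made up for illustration; no line breaks or styling are baked in.

```python
import xml.etree.ElementTree as ET


def turn_to_xml(reports):
    """Turn (tag, text) pairs into an XML string carrying semantic tags only."""
    root = ET.Element("turn")
    for tag, text in reports:
        ET.SubElement(root, tag).text = text
    return ET.tostring(root, encoding="unicode")


doc = turn_to_xml([
    ("location_name", "Attic"),
    ("room_description", "Dust hangs in the air."),
])

# A client parses the document and decides on presentation itself.
parsed = ET.fromstring(doc)
title = parsed.findtext("location_name")
```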

And if you’re already doing this at all, I don’t see any argument for why the client shouldn’t be querying via something that looks like an XML document (or SQL query) identifying what it needs to display. I guess if the client is intended to be super lightweight then it could request specific formatting, delimiting, and so on.

Perhaps I’m missing or misunderstanding something.

DavidC · January 9, 2024, 5:04am

I’ve already solved the contextual UI problem with FyreVM and I’ll probably reuse those concepts in this system.

But that still leaves a sizable area of text emission that centers around reporting things within the turn loop. Like I said, Inform 6/7 implement a stream of reporting that auto-concatenates text as it’s emitted. The author has some control over this, but some of it is embedded in the old Inform 6 “ways of working”. And at its root, this streaming text emission avoids almost all contextual control.

I want to alter the text emission process completely. I want to change the way we think about it as well as how it’s implemented. I want to remove the idea of Say or Print statements. Text emission will be about activities, objects, scenery, and NPCs. My proposed Text Service will be in charge of accepting emissions about those things, be able to query the World Model, and provide a complete set of text to the UI.

The UI can do whatever it needs to do.

Another aspect of this system is that it will be wide open for replacements. I will make it so you can swap out my Text Service for your own. I’ll make a UI that acts as a terminal and is compatible with text readers. I’ll make a UI that’s web based. I’ll make a Windows UI.
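Swappability of this kind usually comes down to coding against an interface. The sketch below uses a Python `Protocol` where C# would use an interface; every name in it is hypothetical, and the upper-casing replacement is only there to show that the turn loop does not care which implementation it gets.

```python
from typing import Protocol


class TextService(Protocol):
    """The contract the turn loop relies on (hypothetical)."""

    def report(self, context: str, text: str) -> None: ...

    def results(self) -> list[tuple[str, str]]: ...


class PlainTextService:
    """Default implementation: returns reports unchanged, in arrival order."""

    def __init__(self):
        self._items = []

    def report(self, context, text):
        self._items.append((context, text))

    def results(self):
        items, self._items = self._items, []
        return items


class ShoutingTextService(PlainTextService):
    """A drop-in replacement that changes how text is produced."""

    def results(self):
        return [(c, t.upper()) for c, t in super().results()]


def run_turn(service: TextService):
    # The turn loop only uses the two methods named in the protocol.
    service.report("location_name", "Attic")
    return service.results()


default_output = run_turn(PlainTextService())
replaced_output = run_turn(ShoutingTextService())
```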

That’s the other vision. Because I’m using a platform like .NET Core and C#, modification and enhancement are very easy.