Glk/Glulx threads

Have you thought about how the code of a game that does background processing might look? That’s an important consideration, since the low-level details will have consequences for compiler authors, library/extension authors, and game authors.

That is essentially what Ben is suggesting, an interrupt system. While it would work, I think it’s a much bigger change than what I’m suggesting. And I don’t think it would really provide much more than what I’m proposing either…

And to keep the UI responsive, you would need the VM to run in a separate thread.

Good question.

My idea is for a tasks extension. You could register a task, or a function to run, with a certain priority. When it's time for the background processing, the task manager would run through the tasks in order of the priority you give them. Each task should be broken down into small steps, so that when the function returns glk_select_poll_all() can be called. Once a task decides it's complete, it removes itself from the list. If all the tasks get completed then glk_select() is called.
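
To make the shape of that loop concrete, here's a rough C sketch. Everything in it is illustrative: task_t, task_list and run_background_tasks() are made-up names, and glk_select_poll_all() is the proposed call from this idea (with a guessed signature), not standard Glk.

```c
#include "glk.h"

/* Hypothetical, proposal-only call; signature is a guess. */
extern void glk_select_poll_all(void);

typedef struct task {
    int priority;               /* tasks are kept sorted, highest priority first */
    int (*step)(void *state);   /* does one small chunk; returns nonzero when finished */
    void *state;
    struct task *next;
} task_t;

static task_t *task_list;

static void run_background_tasks(event_t *ev)
{
    while (task_list) {
        task_t *t = task_list;
        if (t->step(t->state))
            task_list = t->next;    /* task reports it is complete: remove it */
        glk_select_poll_all();      /* proposed: dispatch any UI events that queued up */
    }
    glk_select(ev);                 /* nothing left to do: block for input as usual */
}
```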

If an input event returns before all the tasks are completed, or if threads/background processing isn’t available, then the game can look at the list and decide which ones are essential, and run them then. If threads aren’t available then you’re really no worse off than you are now.

What could be turned into a task? One example would be path-finding, which can be noticeably slow. The algorithm would need changes to be more predictive, but it would mean that all "go to …" commands would be very snappy. I don't know the specifics of Alabaster's conversation model, but I'm sure some of its computations could be performed as a task. Possibly even some table initialisations (I think the startup of Fate suffers from this?). If the conversation tables aren't needed from turn one, then why initialise them then?

Even if you have a UI thread separate from your VM thread, you still can't guarantee responsiveness during heavy calculations on the VM side. While the VM is busy, for example:

  1. The player could press return, queueing a line event.
  2. The player could resize a graphics window, queueing a redraw event.
  3. The player could click the mouse, queueing either a hyperlink or mouse event.

All of these events need to be claimed within about 10 ms to give you time to process them before the user perceives a delay. They must be handled by the VM; the library lacks the information necessary to provide a response.

You could do as you propose and run glk_select_poll_all() many times in quick succession, but that is poor application design: it is comparatively expensive to go out to the OS and request events in a one-off manner.

My proposal does nothing to address the responsiveness concern. The only way around that is to either have separate VM threads, or have the library run the code on behalf of the VM. But the callback / notification model does greatly reduce the expensive calls.

I also don’t think it’s a big change. If it works at all, it would work today at the VM level. Libraries would have to adapt, but I would rather make those changes and wind up with a more efficient, performant implementation.

I'm much more concerned with UI responsiveness than VM responsiveness. You should be able to scroll, type a command, copy text, etc. while the VM is doing heavy computations.

The VM may not respond, but that's the same as it is now, with the delay between entering a command and whatever it outputs next. In many games the delay is perceptible. I want to reduce that by doing as much calculation as possible while the player is reading. Now, unlike input events, which can and do trigger large conversation computations and the like, resizing a window isn't likely to do that. So I guess resizing windows may be less responsive than before… but that's surely not a common occurrence. I don't think we should worry if it takes 100 ms, 200 ms or even more to redraw the window.

What do you mean that glk_select_poll_all() would have to go to the OS to request events? Isn’t it Glulx/whatever going to Glk to request events? How is the OS involved?

Does garglk already start its VMs in new threads?

Can you please explain what you meant by this?

Well, the question is where and when does Glk get those events?

On Windows, Gargoyle registers a callback function to handle messages when it opens the game window. glk_select() retrieves UI events using the GetMessage call. glk_select_poll() uses the PeekMessage call instead. The former will suspend the Glk thread until a UI message comes in; the latter will return immediately, whether or not a message exists. The message is then handed back to the OS, which invokes the callback function.
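For anyone unfamiliar with the Win32 side, the pattern looks roughly like this. It's a sketch of the standard message-loop idiom rather than Gargoyle's actual source, and wait_for_message()/poll_for_messages() are illustrative names:

```c
#include <windows.h>

/* Blocking variant, behind glk_select(): suspends until a message arrives. */
static void wait_for_message(void)
{
    MSG msg;
    if (GetMessage(&msg, NULL, 0, 0) > 0) {
        TranslateMessage(&msg);
        DispatchMessage(&msg);   /* hands the message back to the OS, which calls the window proc */
    }
}

/* Polling variant, behind glk_select_poll(): returns immediately either way. */
static void poll_for_messages(void)
{
    MSG msg;
    while (PeekMessage(&msg, NULL, 0, 0, PM_REMOVE)) {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
}
```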

On Linux, Gargoyle registers various callback functions for the OS events it wants (keystrokes, clicks, resizes, etc). Then glk_select() and glk_select_poll() call gtk_main_iteration, which checks if events are waiting and invokes the appropriate callback function to dispatch those events.
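The GTK equivalent is similarly small. Again, this is a sketch of the standard idiom with illustrative names (wait_for_event()/poll_for_events()), not Gargoyle's code:

```c
#include <gtk/gtk.h>

/* Blocking variant: gtk_main_iteration() waits for the next event if none are queued. */
static void wait_for_event(void)
{
    gtk_main_iteration();
}

/* Polling variant: only iterate while events are actually pending, then return. */
static void poll_for_events(void)
{
    while (gtk_events_pending())
        gtk_main_iteration();   /* dispatches via the registered callback functions */
}
```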

On Mac, Gargoyle has a completely separate launcher process that hosts the UI and stores all events for all game windows. (This allows me to keep the same execution model from the other two platforms, at the cost of some self-respect and the occasional sleepless night.) When new events come in, the launcher signals the appropriate interpreter process to let it know that it’s OK to ask for events. glk_select() and glk_select_poll() then retrieve the OS event and handle it in much the same way as the other platforms, minus the callback functions.

In all three cases, Gargoyle doesn’t handle any events until the VM tells it to, by calling glk_select() or glk_select_poll(). Apart from internally spawned events like sound notifications and perhaps timers, it doesn’t even know whether events are available until it polls the OS to ask.

Windows Glk and Cocoa Glk use native text controls, so quite probably they don’t have to be inside a glk_select() or glk_select_poll() call to display characters. I’m not sure if there are restrictions around this; presumably there needs to be some way to ensure that the OS and the VM are not updating the same window at the same time. But perhaps the execution model is different enough that this never comes into play.

Every VM has its own process, and therefore at least one thread. Possibly more, if sounds are playing.

Having to break the tasks into small steps seems like a weakness of the polling model. Something that takes one or two lines of I7 code to do all at once – like a huge object loop or pathfinding request – might take much more code to do in small steps, split across multiple phrases or rules, which could discourage authors from using this feature.

On the whole, I think we might be better served by a VM-based threading model, where the main VM starts a parallel VM to handle some background task while it’s waiting for line input, and is then notified through a Glk event when the task is complete. The background task wouldn’t need to be rewritten, and the input loop would only need to handle a single additional event (rather than switching between glk_select and glk_select_poll based on whether a task is pending).

We could avoid the complexity of thread safety by placing some restrictions on the parallel VM to make it more like fork() than true threading: e.g. memory is copied rather than shared, except for a known region used to return values once the task is finished.
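As an analogy only (the proposal is for a parallel VM, not an OS process), plain fork() plus a pipe shows the memory model: the child gets its own copy of memory, and a single agreed-upon channel is used to hand the result back when the task finishes.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    int fd[2];
    if (pipe(fd) < 0) {
        perror("pipe");
        return 1;
    }

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: private copy of memory; run the background task. */
        long result = 0;
        for (long i = 0; i < 100000000L; i++)
            result += i;                    /* stand-in for pathfinding, table setup, etc. */
        write(fd[1], &result, sizeof result);
        _exit(0);
    }

    /* Parent: would normally keep handling input; here it just waits for the result. */
    long result;
    read(fd[0], &result, sizeof result);    /* plays the role of the "task complete" event */
    waitpid(pid, NULL, 0);
    printf("background task returned %ld\n", result);
    return 0;
}
```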

So I’m looking at your source code, and the crucial parts are I think: gtk_events_pending(), winpoll() and PeekMessage(). You said that it was expensive to go to the OS, so are these calls really expensive? What sort of times are we talking about here?

You make a good point in regard to splitting up a task. I’m sure there will be some where that will be hard and awkward. I see the value in a further proposal for VM-level threads or an interrupt system.

But as I keep saying, that will be a huge change, while what I'm proposing is both small and achievable now. It won't be suitable for everything, but I think it would be beneficial enough. The only concerns I see about it are possible dangerous interactions (printing to a window with a pending input request, etc.) and the polling functions being too slow.

Also, even if you had to use a big single task function, the user wouldn’t be able to tell the difference between a delay in calling glk_select() and the processing that would take place afterwards. And if possible we should keep the I7 code simple, and make these changes at the I6 level.

I am somewhat ashamed to admit that I don't have precise numbers, just a gut sense that it is slow: expensive in terms of execution time rather than processor usage, since you have to call out to the window manager each time.

I think I understand your point better now, though. If you just look at Cocoa Glk / Windows Glk behavior, where the user can type merrily away in a standard text control that the OS manages while the interpreter chugs along, then your proposal makes a lot more sense.

The only time the user would notice a delay is if they happened to press return before you were done, or if they resized the window, and then the delay would last only until the next glk_select_poll_all() call.

However, with the way Gargoyle does it, the individual keys composing the line are not processed until the interpreter requests events. So there would potentially be delays after each keystroke, inconsistent delays depending on how close they were to the next glk_select_poll_all() call.

My objection was that you’d have to call glk_select_poll_all() a lot to keep the event processing smooth. You could get away with a much looser polling interval for the other Glk libraries, though, because you are really only doing it to catch the odd line event that happens before you’re finished processing.

With Gargoyle, you’d have to do it to catch every keystroke, because the big text buffer you sent to keep the user distracted might toss up a MORE prompt, and you need to hurry back to glk_select_poll_all() every few milliseconds or the user will notice that the window hasn’t responded to his spacebar press.

But when you call that poll function a lot, 99 times out of 100 the event queue is empty and you are just burning time and CPU in useless context switches. This is the expense I was referring to: the accumulated cost of all those wasted calls.