The aspect of it I’m thinking of never really made it into the cultural osmosis view, and that’s the free mouse cursor (as opposed to mouselook), along with the fairly complicated posture system. Of course first-person RPGs were doing the free mouse cursor thing too, but System Shock brought it all together into a very physical game: leaning around corners, crawling, fiddling with relatively complicated objects. It just had more affordances than a Doom clone, and deployed them in a more adventurey way than RPGs. That said, the choice to make it playable using only the mouse limited how freely arcade action could mix with clicking on stuff in the world.
What I think is interesting about putting a parser into an action game is that it makes it possible to interact with things that the engine isn’t really capable of representing. You’re adding another layer to the world model. In a text adventure, you move through space on a room level, and interact with objects on a finer level. In Doom there’s more nuance to the movement level, but the object interaction level isn’t really included. This would be a way to “zoom in” textually on things that couldn’t really be discerned or manipulated in Doom.
I think it’s fairly common in older RPGs to “zoom in” in this way, although they don’t model the level in detail. I’m also thinking of a moment in Cave Story where you find a small object in the grass (a ring or a key or something). What if, instead of just walking past it and being told you picked it up, you had to go find a stick and explicitly fish it out of a hole or something? I think you’d have access to some moods you can’t create just by printing, “You found a ring!”
Really the interactions could be fairly standard adventure stuff. The difference would be that rather than getting from interaction to interaction via NSEW, they would be connected by arcade maneuvering. So you could ask: what kind of actions can you come up with that will motivate non-standard arcade maneuvers? In Doom your movements are governed almost entirely by tactical needs, with the occasional switch that must be reached. It can’t go much beyond that because it can only use level features the engine is capable of representing. A parser would be a way to abstract everything the engine doesn’t support well, tying it into the engine by determining parser scope (what you can currently refer to) from your position in the engine’s world.
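To make the "scope from position" idea a bit more concrete, here's a rough Python sketch. Everything in it is made up for illustration (`Thing`, `reach`, `in_scope`, `parse`); it isn't how Doom or any real engine does anything, just one way the parser could ask the engine "what is the player standing next to?" before resolving nouns.

```python
import math
from dataclasses import dataclass

@dataclass
class Thing:
    # Hypothetical parser-level object that the engine itself never draws.
    name: str
    x: float
    y: float
    reach: float = 1.5  # assumed distance within which the player can interact

def in_scope(things, px, py):
    """Return the things close enough to the player's position to be typed at."""
    return [t for t in things if math.hypot(t.x - px, t.y - py) <= t.reach]

def parse(command, things, px, py):
    """Resolve a simple 'verb ... noun' command against the current scope."""
    words = command.lower().split()
    if len(words) < 2:
        return "I need a verb and a noun."
    verb, noun = words[0], words[-1]
    for t in in_scope(things, px, py):
        if t.name == noun:
            return f"You {verb} the {t.name}."
    return f"You can't see any {noun} here."

# A tiny world: a ring in the grass and a stick somewhere else on the map.
world = [Thing("ring", 10.0, 4.0), Thing("stick", 2.0, 2.0)]

print(parse("take ring", world, 10.5, 4.0))  # player is standing next to the ring
print(parse("take ring", world, 0.0, 0.0))   # player is across the map
```

The point is that the arcade layer only has to supply a position; getting into range (while dodging whatever is shooting at you) is the maneuvering part, and everything finer than that lives in the text layer.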
Just making the player type under pressure is fairly novel too.