You seem to be saying that since ChatGPT produces plausible output, it must work like an IF engine, in two stages. First parse the input into a representation of a command; then generate output based on that command. If that were true, we could make use of the parsing stage by itself.
But that’s not true. It’s not two stages. There’s just one stage, which is “put the input through a statistical blender.”
This was basically the goal of the SpiritAI project which Emily Short started several years ago. Aaron Reed and I worked on it for a few years. Unfortunately the project didn’t wind up going anywhere and got shut down in 2019-ish.
This was just before the AI chatbot explosion, but even if we’d kept it going, it’s much harder to train. The reason ChatGPT works so outrageously well is that you can train it on any arbitrary text. Every bit of text leads to the next bit of text. You can feed in as many gigabytes of text as you can find (and the Internet has plenty).
To train a classifier for IF input, you’d need a corpus of IF commands and their parsed representation. (Not just the output of the command, but the action/noun data from inside the parser.) This basically doesn’t exist.