That’s a good plan, and I look forward to seeing it executed. But asking an LLM to generate the solutions for you isn’t going to be a very effective way to start. It’s also unnecessary, since walkthroughs have long since been written by humans. Here’s the one I used for Zork I, for example.
Keep in mind that these games contain random elements, so following any script verbatim is unlikely to work unless you know the RNG is seeded the same way. In Zork, you can do this by starting with commands like:
#RECORD - start recording your commands to a file
#RANDOM 123 - seed the RNG
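The seeding point is worth dwelling on, since it applies to any replay tooling you build: two generators given the same seed produce the same sequence of "random" events, which is exactly what makes a recorded script replayable. A minimal Python illustration (Python only because your project uses it; the 123 just mirrors the #RANDOM 123 example above):

```python
import random

# Two independent generators seeded identically produce identical
# sequences -- the property that makes a recorded walkthrough
# deterministic. 123 mirrors the #RANDOM 123 example above.
a = random.Random(123)
b = random.Random(123)

rolls_a = [a.randint(1, 100) for _ in range(5)]
rolls_b = [b.randint(1, 100) for _ in range(5)]

assert rolls_a == rolls_b  # same seed, same "random" rolls
print(rolls_a)
```

An unseeded (or differently seeded) run would diverge at the first random event, which is why a verbatim script breaks there.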
I would recommend recording the solution yourself, typing commands into a Z-machine interpreter, rather than trying to generate it non-interactively.
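Once you have a recorded script, replaying it non-interactively is straightforward. Here is a sketch under some loud assumptions: the recording format (meta commands prefixed with `#`, one game command per line) is a guess at what your interpreter writes, and `dfrotz` and the `zork1.z3` filename are stand-ins for whatever dumb-terminal interpreter and story file you actually use:

```python
import shutil
import subprocess

def replay_commands(script_text):
    """Split a recorded walkthrough into individual commands,
    dropping blank lines and interpreter meta commands (lines
    starting with '#', e.g. #RECORD or #RANDOM -- an assumption
    about the recording format; adjust for your interpreter)."""
    return [
        line.strip()
        for line in script_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]

# Hypothetical recorded session: meta commands plus game input.
recording = """\
#RECORD
#RANDOM 123
north
take lamp
turn on lamp
"""

commands = replay_commands(recording)
print(commands)  # ['north', 'take lamp', 'turn on lamp']

# Pipe the commands into a dumb-terminal interpreter, if one is
# installed (dfrotz and zork1.z3 are placeholders here).
if shutil.which("dfrotz"):
    subprocess.run(
        ["dfrotz", "zork1.z3"],
        input="\n".join(commands) + "\nquit\ny\n",
        text=True,
    )
```

The catch, as above, is that this only reproduces a playthrough if the RNG state matches the recording, which is why seeding first matters.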
Also, as Daniel mentioned, the full set of ZILF test cases may prove useful to you. Since your project is licensed under the GPLv3, you’re free to convert those tests to Python and incorporate them, if you want: see the integration tests, which execute compiled Z-code and thus need a Z-machine interpreter, and the interpreter tests, which test the facilities used to execute macros at compile time.