TADS 3 Documentation

I was looking for something I could do with Claude Code that might be beneficial to the IF community.

So I pointed it at the tads-sources repo and asked it to produce a comprehensive, single-point set of documentation for TADS 3.

Should I submit a PR?

2 Likes

This is very cool. I do this all the time.

Quick suggestion: You might want to do a deep thinking pass to check whether the implementation reflects everything described in the new docs.

I have done this recently on the blender repo and it came up with quite a few errata and omissions. It also tends to add clarifying comments that are really nice.

I did three deep thinking passes. There are inaccurate comments in the code. Claude found them on subsequent passes.

Just for clarification: Is this just the underlying engine docs, or does it include docs for Adv3 and/or Adv3Lite? Because without a library, TADS 3 is just a language spec for terminal I/O.

1 Like

It includes everything related to TADS 3 in tads-sources.

I don’t think Michael J Roberts is taking PRs, and that site just gathers the docs together. You might be able to have your own fork, but I haven’t checked the licence.

It might be better to have your own, to separate out the work since it’s not official. Plus it’s a way to quarantine any mistakes the LLM might make (it doesn’t have a great knowledge of TADS 3).

2 Likes

I made sure Claude didn’t invent anything. Pure translation. There are six deep dive add-ons and those are distinct and separate.

Sincere question: how do you make sure Claude doesn’t invent something without having enough experience to check it all over yourself? Or do you have the TADS knowledge to verify it?

2 Likes

This is where there’s a misunderstanding of how LLMs work. If you ask it to do something new, it may hallucinate. If you ask it to do empirical tasks, it’s just going to do what you ask. Plus this was a specific requirement with multiple check passes to verify accuracy. I’m very confident it just reformatted the TADS 3 docs into markdown and organized them together.

Sure, but even if it’s only doing what you ask, it can still get things wrong. It’s entirely possible for an LLM to misinterpret the meaning of a sentence, just like it’s possible for a human.

This isn’t an anti-LLM thing; I wouldn’t want a human without knowledge of TADS 3 to be writing the docs either, because if they misinterpreted a phrase, they wouldn’t have the domain knowledge to realize it. Human language is ambiguous in a thousand tiny ways and the only way to resolve that ambiguity is proper understanding. And as you yourself have pointed out, Claude’s understanding of some programming languages is much worse than others.

2 Likes

For a concrete example, imagine a Winograd schema. “Warning: a foo expression can’t be used inside a bar loop, because it’s purely a quux construct.” Which is a quux construct, the foo expression or the bar loop? The answer is obvious to someone who’s fluent in the language, but not at all obvious without that fluency. And if the LLM needed to make a list of all the quux constructs in the language, it would have to pick one or the other.

Well, there’s the kicker - the docs repo stays in my GitHub account, and people can try it out or not. I offer it up as a moderate effort to make something better. The build can be rerun at any time, since the process is idempotent. (I actually discovered a translation error with ticks, so there IS one issue.)

I think it’s worthwhile to check it out. YMMV

2 Likes

It doesn’t actually seem to include anything from Learning TADS 3, for example.

1 Like

Right, but translation between formats isn’t exactly “empirical” for an LLM. It’s not quoting the original content directly; it’s still producing a new kind of output.

Though markdown is considerably less complex than HTML, so. (squints at Claude)

Curious about your Q/A tests.

Is there a reason why an HTML-to-md toolkit couldn’t handle this? I know the original docs don’t exactly have clean formatting, but I think there was already a task completed to convert the docs to markdown.

Also, I’m curious why you didn’t have Claude simply identify the non-standard HTML formatting. That would be easier for you to review as an overseer with the diff (to ensure the textContent remained unchanged). From there, traditional algorithms could convert the repaired HTML to markdown, no?

Like, sure, I usually come in with my typical skepticism of LLMs, but in this case I’m actually confused by the chosen workflow.

EDIT: My thinking here is that you do know HTML, but you don’t know TADS. However, you don’t need to know TADS to verify, from a diff report, that the text between HTML elements hasn’t changed.
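To make the verification step above concrete, here is a minimal sketch of how an overseer could check that an LLM’s HTML repairs touched only markup, not content: extract the text content of the original and the repaired file and diff the two. This is an illustration of the idea, not any existing tool; the helper names are made up.

```python
# Sketch: confirm an LLM-repaired HTML file has the same visible text
# as the original, so only the markup (not the content) changed.
import difflib
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text content, skipping script and style bodies."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def text_content(html):
    parser = TextExtractor()
    parser.feed(html)
    # Normalize whitespace so reflowed markup doesn't register as a change
    return " ".join("".join(parser.chunks).split())

def diff_text(original_html, repaired_html):
    """Word-level diff of the visible text; empty list means no change."""
    a = text_content(original_html).split()
    b = text_content(repaired_html).split()
    return list(difflib.unified_diff(a, b, lineterm=""))

# Same text, different markup -> empty diff, so the repair is safe
assert diff_text("<p>Hello <b>world</b></p>",
                 "<p>Hello <em>world</em></p>") == []
```

If the diff is empty, a traditional HTML-to-markdown converter can then take over from the repaired file, as suggested above.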

EDIT 2: My memory is bad so I struck a sentence from this post for being wrong.

2 Likes

Of course, people are welcome to use it or not. But I wouldn’t recommend anyone submit a PR to the official repositories without actually knowing the language, whether LLM or human. I personally wouldn’t accept a PR like that for Dialog, because it would take me so much more effort to verify than it took you to generate it in the first place, and if I’m putting in that much effort, I might as well just redo the documentation myself.

Text processing is what LLMs are best at, and it’s a reasonable task to apply them to. But they’re not infallible either, and the fact that it’s so much easier to generate reams of text than it is to verify that text’s correctness is a legitimate logistical problem.

4 Likes

To be clear, please don’t take this as a “LLMs are bad and should never be used for anything” post. LLMs are incredibly powerful tools for text processing and there are so many things in that domain that they can be good for—especially since code is text! If I want to find the part of the Inform 7 documentation where it talks about “parsing numbers with words attached to them without a space in between”, a traditional search tool will come up with nothing, because that’s not how Inform phrases it. An LLM scanning the documentation can figure out that “numbers with units” are what I’m looking for. That kind of documentation processing is incredibly valuable.

My concern is that, in this particular case, you (and I!) have no way of verifying that the output is correct, because—correct me if I’m wrong, but—I don’t think you have much experience with TADS? I certainly don’t myself. If you do, then please disregard every single one of my critiques here, because none of them apply.

The upside of LLMs is that they can process natural language with all its exceptions and irregularities, but the downside is that natural language is full of ambiguity, so it’s possible to just fundamentally get it wrong in a way that grep can’t. So I’m a firm believer in never using LLM-generated text for anything if you can’t verify it first.

Make sure the code passes the test cases, make sure the resume is formatted correctly, make sure the documentation summary properly understood what it’s talking about, because Claude knows more about TADS than I do, but (from the sound of things) far, far less than this forum as a whole. And I’m not convinced its understanding is good enough to handle every single ambiguity the right way. Are you?

4 Likes

I actually asked Claude to identify something that might improve the repo. It suggested a unified set of documentation and the plan to do it.

The Learning TADS 3 doc is a PDF, and although it doesn’t have a license, I’d be reluctant to convert it to markdown without asking Eric.

1 Like

Sure, but the unified documentation you linked claims:

Apart from the initial idea, I’m not sure why you’d get Claude to do this rather than pandoc. And I have a lot of experience with this: deep learning, LLMs, and TADS 3. You might feel confident that it does a good job on the many, many pages of text here, but if this is to be a source of truth for TADS 3 developers, you might need a higher bar of quality.

I like the sentiment of helping the community, but maybe ask the community rather than Claude.

2 Likes