Creating a fair comparison

I’m trying to create some comparisons of my system Rez vs the systems I see it mainly competing with: Twine (Harlowe), Twine (SugarCube), Twine (Snowman), Ink, and ChoiceScript.

– Update: Based on feedback so far I’ve decided the most pragmatic thing is to just drop the idea until such time as I can create something rigorous.

I can only speak for SugarCube, but hum… you’re often going to get some redundancy or actual errors within your code. Like $args[0] being used in widgets, even though it’s deprecated (and the proper form, _args[0], is indicated in the documentation)…

For example, a simpler and more correct widget could be reduced to this:

<<widget "ordinal">>
    <<if [11, 12, 13].includes(_args[0] % 100) or not [1, 2, 3].includes(_args[0] % 10)>>
        <<set _suffix to "th">>
    <<elseif _args[0] % 10 is 1>><<set _suffix to "st">>
    <<elseif _args[0] % 10 is 2>><<set _suffix to "nd">>
    <<elseif _args[0] % 10 is 3>><<set _suffix to "rd">>
    <</if>>
    <<print _args[0] + _suffix>>
<</widget>>

And if you’ve set $greetingCounts as a full object with all the relevant properties earlier (like in StoryInit), these lines would be unnecessary:

<<if not $greetingCounts>><<set $greetingCounts to {}>><</if>>
<<if not $greetingCounts[$greeting]>><<set $greetingCounts[$greeting] to 0>><</if>>
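For reference, a minimal StoryInit sketch (the greeting keys here are hypothetical, just to show the shape) might look like this:

:: StoryInit
<<set $greetingCounts to {
    "Hello": 0,
    "Howdy": 0
}>>

With the object fully initialized up front, every key already exists, so neither of those guard checks ever fires.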

As well, this line is useless:

<<set $count to $greetingCounts[$greeting]>>

When you could simply call the widget directly:

<<ordinal $greetingCounts[$greeting]>>

In short: please don’t use LLMs like Claude or GPT to create bits of code. Because 99% of the time, they give you spaghetti code that works, but by a thin thread that breaks if you try to edit it, uses deprecated or removed macros/functions, or is just plain wrong.

If you want a fair comparison in your examples, maybe ask outright. Debugging AI code is suuuuuuuch a pain.


Oh, and maybe another piece of advice: test the code you get??

Cause your Harlowe code seemed like nonsense… so I went and tested it in Twine, just in case. And I was right:


Error all the way.


Yeah, and so does your Ink code…

Because you don’t have a starting point for the code to run. The code essentially ends at setting the variable and THAT’S IT! It can’t even make a connection with your === start === knot (the -> start divert is missing after the VAR declaration).
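For anyone unfamiliar with Ink, here’s a minimal sketch of what the fixed flow might look like (the variable name and content are hypothetical, just to illustrate the missing divert):

VAR greeting = "Hello"
-> start // without this divert, the story stops right here

=== start ===
{greeting}, world!
-> END

The divert after the VAR declaration is what actually hands control to the start knot; without it, Ink declares the variable and then has nothing to run.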


I thought I might as well do ChoiceScript, since I’m at it…
And surprise surprise… it doesn’t work either.

EDIT:
Even when fixing your missing variable, it still doesn’t work.
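For context: ChoiceScript requires every global variable to be declared with *create in startup.txt before it’s referenced anywhere. A minimal sketch, with a hypothetical variable name:

*create greeting "Hello"

*label start
${greeting}, world!
*finish

Referencing ${greeting} without that *create line is exactly the kind of error that stops the game on the spot.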


Ok one final comment:

Why Snowman? That format isn’t used by most Twine users, and it isn’t even recommended to them unless they know what they’re doing.


Well, first of all, thanks for your efforts… but I hadn’t actually expected anyone to try to run them; none of those snippets are meant to be complete. E.g. the Rez example isn’t directly runnable either.

Because I arrived at building Rez after going through SugarCube and then Snowman.

Are you… serious?
You’re trying to compare your engine with other ones, and you don’t expect people who might have some experience with those engines to try those examples? Or people who might stumble on your page and think of testing those other engines to see for themselves?

Then what’s the point of this whole thing?


In order to test it I’d have to learn rather more about Ink and ChoiceScript - I’ve never used either. I was really just hoping someone experienced with those systems would look it over and give a view about whether they were representative examples of an idiomatic solution in those systems. Perhaps I should drop the idea.

Yes, I’m serious. I think perhaps I created the wrong impression. I wasn’t trying to create a rigorous comparison with runnable examples but rather to give a flavour that might suggest something. Perhaps I should amend the wording or drop it altogether.

You’re contradicting yourself there…


I don’t see a contradiction in wanting to be fair without necessarily creating something full-blown that might obscure the point. I’m not trying to teach people those systems. But anyway, I take your general point, thank you.

In most situations, code snippets are expected to work with minimal adjustment.

Work, as in, “be a valid example” — yes. That’s what I was trying to get at. Work, as in, “you can copy & paste this and run it straight away” — no, that’s not what I was intending.

I’m not sure if I should just try and phrase this differently, learn more about those systems (I probably don’t have time for that), or just drop the idea entirely.

Since you offered already, let’s drop the idea!

edit: I see you added that you’re dropping it in your op. :+1:

As someone who primarily uses SugarCube and is very interested in Rez – AI generated or not, if I’m looking at these code snippets and spot obvious errors, it makes me a lot less likely to switch. (Whether or not they run as-is is tangential.)

Dropping the AI code snippets is a good idea, but if you want to keep them, I think there are people here who would happily help write them in a language they’re familiar with. Maybe that’s worth considering?


Absolutely, that’s why I was hoping to get feedback from users of those systems to ensure that I am comparing apples to apples rather than comparing a “good” Rez example to, say, a “crappy” SugarCube one. This was a misguided effort. Maybe in future I will try again with a different angle.


I would happily write a SugarCube example for you, to save you from touching AI for this.


FWIW, the LLMs are notoriously bad at generating code for most of the popular IF systems. Putting aside my feelings on AI for a second, there’s just not enough people using them for the training data to be reliable. Add in the fact that most of these languages are still in active development so said training data is going to be fragmented over several versions and you see the problem.

There’s also been a steady stream of people flowing into IF community spaces asking people to help them troubleshoot their (badly) AI-generated code, to the point where it was driving regular helpers away and the community had to ban that behavior, so people are already predisposed to having strong negative associations with reviewing LLM code snippets. On the flip side, people here will bend over backwards to help others out organically. Hope that helps navigate the issue going forward!
