So... how good does a parser have to be these days?

I don’t remember much about minimalism now, but the name comes from abstracting all syntax transformations to a single “merge” operation. And while it may have a Euro-bias, I don’t think it’s really fair to say it treats all language as English at a fundamental level, as minimalist accounts of English still have to do lots of move/merge operations!

If minimalism has a bias I’d expect it would be more towards agglutinative/isolating languages. Fusional, incorporating, or discourse configurational languages might be harder to analyse with minimalism? (They’d also be much harder to process in an Inform-style parser!)

My thesis used distributed morphology, which was developed around the same time as minimalism (and some say it is a development or application of minimalism). I think it might perhaps be able to account for some things better than pure minimalism (though I haven’t looked at either for 15 years.)

3 Likes

Yeah, I’m vastly oversimplifying to make a point here. My issue with minimalism is that it assumes things like “subjects of verbs are Merged in later than objects” are true for all languages. But then there are languages like Hittite where all verb arguments seem to be on the same level, so you can put subjects, direct objects, and indirect objects in whatever order you want…so minimalism needs to add more complications like scrambling, where the elements are merged in an underlying SOV order and then scrambled into the surface form, instead of just allowing different languages to treat subjects differently.

That’s what I mean by its English bias—even when there’s no evidence that subjects are Merged in later in Hittite, the structure of subjects has to be universal across all languages, because English does it that way. (The hierarchy of projections is a particular offender here.) Alternatives like head-driven phrase structure grammar (HPSG) get to keep all the best parts of minimalism (I will admit I’m very fond of constituency structures over dependency structures, even if they’re falling out of fashion nowadays) while also allowing each language to have its own set of rules.

I’m not very familiar with distributed morphology, but from a quick search it looks like it agrees with HPSG in moving a lot of detail from a posited Universal Grammar into a particular language’s grammar-slash-lexicon, so I’m all for that!

6 Likes

You could take the newspeak method of eliminating words and paring the language down. You could only include words that have clear and unambiguous translations into words in the set of languages you wish to use, and then form a working dictionary out of those words. Once you have that, you pick an arbitrary simplified grammar that isn’t a one-to-one with any of the existing grammatical systems, meaning everyone is equally disadvantaged. The included dictionary includes the direct translations in other languages as synonyms, forming a single consistent system you could build a parser around.
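The working-dictionary idea above can be sketched in a few lines. This is a hypothetical illustration, not a real proposal: the words, languages, and the fixed VERB NOUN grammar are all made up for the example.

```python
# Each canonical concept maps to its accepted surface forms across the
# chosen languages (all hypothetical examples); the parser normalizes
# input to canonical tokens before matching.

LEXICON = {
    "take": {"take", "prendre", "tomar", "nehmen"},
    "lamp": {"lamp", "lampe", "lámpara"},
}

# Invert the lexicon: surface form -> canonical token.
SURFACE = {
    form: concept
    for concept, forms in LEXICON.items()
    for form in forms
}

def normalize(command: str) -> list[str]:
    """Translate each word to its canonical token; reject unknown words."""
    tokens = []
    for word in command.lower().split():
        if word not in SURFACE:
            raise ValueError(f"Unknown word: {word!r}")
        tokens.append(SURFACE[word])
    return tokens

# With an arbitrary, language-neutral VERB NOUN grammar, "prendre lámpara"
# and "take lamp" normalize to the same command.
print(normalize("prendre lámpara"))  # ['take', 'lamp']
print(normalize("take lamp"))        # ['take', 'lamp']
```

The point being that once every included word has exactly one canonical token, the parser proper only ever sees one "language", however awkward it sounds to everyone.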

ETA: Note, this isn’t a project I’d be interested in, to be clear, but, in theory, it should be workable. It’s going to be awkward and alien sounding to all users, but it should remain somewhat intelligible.

1 Like

I confess most of the linguistics discussion is going over my head, but I always thought ASL was just a method for encoding spoken English as hand gestures, just as the Braille I learned as a kid was a way of encoding written English as a pattern of raised dots… I do remember being pissed in high school upon learning that the deaf and hearing impaired could count sign language towards their foreign language requirement, while the blind and vision impaired couldn’t get foreign language credit for Braille. (I was also pissed that high-school-level courses taken in middle school didn’t count towards my HS GPA or graduation requirements. I would have taken two years of maths beyond the four that were standard for high school at the time, but it still feels unfair that two of my best grades didn’t count, and I like to think I’d be annoyed on behalf of any hypothetical children of mine who found themselves in a similar situation.)

1 Like

Nope, it’s entirely its own language. (Not one I know, but I’ve read a tiny bit about it.)

It’s considered to be a naturally-evolved language, too – not a “conlang”.

3 Likes

I learned a little bit of ASL and a little bit of Braille. Braille is just an encoding like you said, but ASL is significantly different. It actually is a lot more similar to Chinese in word structure than to English. In Chinese, a big chunk of characters consists of two parts: one that indicates the meaning, and one that indicates the pronunciation. So, for instance, 们 (a word that changes pronouns into plural) has radicals indicating ‘person’ and ‘gate’, where ‘person’ indicates its use in pronouns and ‘gate’ provides the pronunciation (both are pronounced ‘men’, with the e sounding like a schwa). Similarly, a lot of sign language signs are the combination of the first letter of the word and another motion, sign, or position indicating the meaning of the word (like ‘King’ being the letter K moved in a line like a sash).

Outside of sign language ‘spelling’, the grammar is quite a bit different, but not in ways I’m able to communicate well due to my lack of experience. Words like ‘a’ and ‘the’ are often omitted when they can be implied, and ‘is’ can be omitted before adjectives when they’re placed after the noun (both of which are again similar to Chinese, which is interesting).

1 Like

Sign languages have their own grammar, sentence order, vocabulary, etc. separate from their spoken “counterparts” (visual/spatial instead of auditory/sequential). Every signed language is different, and they also have dialects/accents.

Information is coded within location, movement, and facial expressions beyond just the hand signs (e.g. relative time, emotion, speed). I found it particularly fascinating that instead of conjugations for time, there’s an imaginary timeline running from behind the body to beyond the front, representing the past and future, respectively. ASL also uses a topic-comment structure, so the topic comes first, then its descriptors.

| English | ASL |
| --- | --- |
| I went to the library yesterday. | yesterday library me go |
| I will help you. | future me help you |
| The big brown dog barks. | dog brown big bark |

What Mewtamer is talking about, in the case of American Sign Language (ASL), is Manually Coded English (MCE), which is ASL signs with the grammatical structure of English. There are various systems for this, including Signing Exact English (SEE) and Signed English. When people are speaking and signing at the same time (sim-com), they’re using one of these systems, not ASL.

Signed,
A hearing person with a passing interest in linguistics

6 Likes

In my experience, it’s not uncommon for that grammatical structure to sneak in when an ASL speaker is speaking or writing in English. I found this to be most common among older speakers.

It probably sounds like a very stock answer, but I think it depends on the type of game you are making. The traditional Verb/Noun input can be very effective if used properly and it’s explained that the game only uses verb and noun input.

But if the parser is more complicated, make sure every option has a reason to exist, or it can just add complexity to the interface and frustration for the user for no real reason.

If you are just writing the parser so it can interpret a single combined command that can easily be two commands, ask yourself if you are adding functionality or just complexity!
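For anyone who hasn’t built one, the traditional two-word parser really is about this simple. A minimal sketch, with a made-up vocabulary and made-up error messages:

```python
# A minimal two-word Verb/Noun parser. The vocabulary and the
# player-facing messages are hypothetical examples.

VERBS = {"take", "drop", "look", "open"}
NOUNS = {"lamp", "door", "key"}

def parse(command: str):
    """Return a (verb, noun) pair, or raise with a player-facing message."""
    words = command.lower().split()
    if len(words) == 1 and words[0] in VERBS:
        return (words[0], None)          # intransitive, e.g. "look"
    if len(words) != 2:
        raise ValueError("I only understand VERB NOUN commands.")
    verb, noun = words
    if verb not in VERBS:
        raise ValueError(f"I don't know the verb '{verb}'.")
    if noun not in NOUNS:
        raise ValueError(f"I don't know the word '{noun}'.")
    return (verb, noun)

print(parse("take lamp"))   # ('take', 'lamp')
print(parse("look"))        # ('look', None)
```

Everything beyond this (adjectives, multiple objects, “then” chains) is where the complexity-versus-functionality question starts to bite.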

2 Likes

I get the impression that people are more open to minimalist parsers of various kinds than they were in the fairly recent past, but maybe less accepting of inconsistencies in a game’s parser than they had to be in the early 80s.

1 Like

I think this is exactly it. People are willing to play by the game’s rules if they know what they are; but if the game implies one rule and does something different, they get frustrated.

This is true in a narrow sense (e.g., if the game typically rejects >DIG as nonsensical, the player will be annoyed when one puzzle requires it) but also in the broad sense that the implied promise of maximalist parsers—”just type anything!”—is easily broken. If the player cares more about fairness than gee-whiz complexity, minimalist parsers start to look appealing. In this reading, the trend towards minimalist parsers has the same root causes as the trend towards short, merciful games.

The tradeoff is that you lose some of the illusion of free choice.

5 Likes

Right. And we can at least consider the reductio ad absurdum case of accepting any input and just outputting “No”, or the equivalent generic word of negation in various languages, assuming they have one. But even then there’s the question of whether blank negation is equivalent in all languages. As in, do some languages have polite versus casual ways of indicating negation, and so in choosing one have you introduced a feature for one language that isn’t present in the others? And that kind of thing.

Or to approach this from a slightly different angle: I’m typing this on a QWERTY keyboard. It is quite well-suited to entering the Latin alphabet, Arabic numerals, and common English punctuation marks. With slightly greater effort it can be used to enter accents and so on: sauté. And with somewhat more effort than that I can enter more or less any character supported by whatever I’m typing things into: 叉烧包 (char siu bao, pork bun).

But if I was doing most of my typing in Chinese I wouldn’t want to be using this keyboard. Because although I can use this keyboard to produce essentially anything I could produce with any other keyboard, this doesn’t mean that a QWERTY keyboard is equivalent to every other kind of keyboard.

That is, in some narrow technical sense I can accept that I have a keyboard that can type any language (angels and ministers of UTF-8 permitting) but I’m extremely skeptical of the claim that this is a universal keyboard for any language in any practical sense.

Of course this keyboard wasn’t designed to be a universal keyboard, it was designed for US English. So you could sit down and start making modifications to it to make it better suited to other languages. Perhaps just sorta bolting together a bunch of keyboards for different languages. But even once you’ve done that, my assumption is that you’re still going to have a different experience in some languages versus others. And I don’t just mean in abstract qualia or something like that, but in terms of basic nuts and bolts mechanics. As in some languages are just going to be better suited to keyboard-like input semantics than others.

And that’s the kind of thing I’m talking about with parsers. I’m skeptical (although willing to be convinced otherwise) that the whole parser paradigm or whatever you want to call it maps equivalently onto all languages. And that’s to say nothing of the tacit assumptions built into the typical parser-based game world model.

To be clear, this isn’t me trying to terminate discussion or anything like that. I’m just trying to clarify what I meant when I said that I think any argument about whether or not you could build a universal parser eventually boils down, in practical terms, to an argument about synonymy. As in, because both my QWERTY keyboard and a pinyin keyboard can produce arbitrary UTF-8 characters, are they equivalent? Is a romaji keyboard equivalent to a kana keyboard? And so on.

My point being that these kinds of things tend to end up not being questions of narrow technical capabilities, but squishy just-draw-a-line-somewhere definitional things.

4 Likes

Okay. :+1:

And even then, QWERTY wasn’t really designed for optimized English input, but to minimize mechanical issues on early typewriters; it’s one of those things we’re now stuck with because of societal inertia.

I knew there was more to sign language than just the signed version of the alphabet, but I didn’t know there was that much to it.

Also, on another tangent, the latest Rob Words is on the NATO Phonetic Alphabet (which wasn’t made by NATO and isn’t a phonetic alphabet). It goes into the difficulty of finding words that are easily understood across several languages and not easily mixed up, and why some of those words need to be given non-standard spellings to ensure consistent pronunciation across languages.

Which was already a tricky problem, yet much simpler than a multilingual parser.

2 Likes

Okay, that is just bizarre. I just shared the same YouTube video on a different intfiction post less than 12 hours ago:

2 Likes

Well, clearly several people on this forum have at least a passing interest in linguistics and etymology, and Rob Words is a YouTube channel dedicated to that subject, so it makes sense there would be overlap between users of this forum and watchers of that channel. That video was posted in the last few days, so it would logically be on the minds of anyone who both watches Rob Words and reads this forum if they saw something similar discussed here… though I’ll admit I haven’t looked at that thread recently, so I hadn’t read your post recommending it outside of you quoting it here.

1 Like

I’d never seen that man in my life until that exact video popped up on my recommended…so I saved it to my Watch Later list…and now we are here? This is truly strange.

2 Likes

I’m pleased to note that ZILF mostly supports this:

Every pronoun provides a filter for deciding which objects it refers to, and whenever an object is noticed, it sets every pronoun whose filter it passes.

The pronouns can also be overridden for a specific object: (PRONOUN HIM HER THEM) will make the object set those pronouns when it’s noticed, regardless of the filter. The bear in Advent is both “it” and “him”, the set of keys is both “it” and “them”, etc.
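A rough sketch of that mechanism, in Python rather than actual ZILF (the filter conditions and property names here are my own illustration of the idea, not ZILF’s internals): each pronoun has a filter, every noticed object sets whichever pronouns accept it, and a per-object override list bypasses the filters entirely.

```python
# Each pronoun's filter decides which objects it can refer to.
# These particular conditions are hypothetical examples.
PRONOUN_FILTERS = {
    "it":   lambda obj: not obj.get("animate"),
    "him":  lambda obj: obj.get("gender") == "male",
    "her":  lambda obj: obj.get("gender") == "female",
    "them": lambda obj: obj.get("plural"),
}

pronouns = {}  # pronoun -> the object it currently refers to

def notice(obj):
    """When an object is noticed, set every pronoun it qualifies for."""
    override = obj.get("pronoun_override")
    for name, accepts in PRONOUN_FILTERS.items():
        if override is not None:
            if name in override:     # override wins over the filter
                pronouns[name] = obj
        elif accepts(obj):
            pronouns[name] = obj

# The bear in Advent is both "it" and "him"; the keys are "it" and "them".
bear = {"name": "bear", "animate": True, "pronoun_override": ["it", "him"]}
keys = {"name": "keys", "plural": True, "pronoun_override": ["it", "them"]}
notice(bear)
notice(keys)
print(pronouns["him"]["name"])   # bear
print(pronouns["them"]["name"])  # keys
print(pronouns["it"]["name"])    # keys (noticed most recently)
```

The nice property is that “it” quietly follows whatever was noticed last, while “him” and “them” can lag behind on different objects, which is exactly how players tend to use them.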

Although I haven’t tried it, I think this works in ZILF as well, because orders are implemented by changing the player-character before parsing the order as a second command (as with “then”).

6 Likes

Not only is ASL not a form of English, it’s also a totally different language from the signed languages in many other English speaking countries. ASL is used in the US and Canada, and is part of the Francosign family. The UK, Australia, NZ, and South Africa have sign languages that are in the BANZSL family which are unrelated to ASL, though they have borrowed some vocab, especially in recent decades.

6 Likes

This is the kind of discussion that makes me run back to writing games with a Verb Coin.

Is a real parser worth it? Mostly. It’s a lot of work for the illusion of choice. The game can’t really do everything you type, but the parser can sometimes kind of fool you into thinking it’s possible.

I have to applaud the cleverness/ingenuity of parser designers, past and present. It’s a deceptively difficult engineering challenge.

4 Likes