Now of course, if I want to do a language puzzle…then I’m going to need a language.
Not much of a language. Just enough to make the puzzles work. 20 words or so, and basic grammar. But still! How does one even get started on any of that? And this is where we get into the fun of conlanging!
I’ll be keeping the grammar a secret for the moment. But still, coming up with 20 words of vocabulary is hard, especially for someone like me who struggles to name anything! I could just grab some Sumerian words (nobody’s likely to recognize those), but I want this alien language to look alien: not too alien to pronounce, just alien enough to feel weird. (Plus, someone actually pulled up an Akkadian dictionary and grammar to decipher the “Kishadu” dialogue in Death on the Stormrider.)
So when I need a bunch of words with a consistent aesthetic, I turn to context-free grammars. A CFG is basically a set of rules, each of which says “when you have an X, you can turn it into a Y or a Z”. They’re generally used to recognize patterns rather than generate them, but with a bit of dice-rolling added, they work just as well for generating.
For example, let’s say I wanted to generate a bunch of nonsense words that look like Japanese. A syllable in Japanese consists of four parts:
- An initial consonant (optional)
- A y sound after the initial consonant (optional)
- A long or short vowel (required)
- A final n (optional)
We can encode that in a context-free grammar like this:
# A syllable has four parts
S : CYVN
# 0 stands for "nothing here"
C : p b t d k g s z h r 0
# Let's make "nothing" be twice as common as "y"
Y : y 0 0
# And short vowels twice as common as long ones
# (S is already taken, so short is H)
V : L H H
L : aa ei ii ou uu
H : a e i o u
# Similarly, "nothing" twice as common as "n"
N : n 0 0
# Finally, a word will be some pattern of syllables
# Let's use a bell curve
@ : S SS SS SSS SSS SSS SSSS SSSS SSSSS
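A grammar written in this notation is easy to load into a program. Here is a minimal sketch in Python, not the actual tool mentioned later: the names `GRAMMAR_TEXT` and `parse_grammar` are made up for illustration, and it assumes one rule per line, a colon after the symbol, spaces between alternatives, and `#` for comments.

```python
# The grammar above, pasted in as plain text.
GRAMMAR_TEXT = """
# A syllable has four parts
S : CYVN
# 0 stands for "nothing here"
C : p b t d k g s z h r 0
Y : y 0 0
V : L H H
L : aa ei ii ou uu
H : a e i o u
N : n 0 0
@ : S SS SS SSS SSS SSS SSSS SSSS SSSSS
"""

def parse_grammar(text):
    """Turn 'X : a b c' lines into {'X': ['a', 'b', 'c']}, skipping comments and blanks."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon; whitespace around it is optional.
        symbol, _, alternatives = line.partition(":")
        rules[symbol.strip()] = alternatives.split()
    return rules

rules = parse_grammar(GRAMMAR_TEXT)
print(rules["Y"])  # ['y', '0', '0']
```

Repeating an alternative (like the two 0s for Y) is all it takes to weight it, since every entry in the list is equally likely to be picked.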
Now, we start with “@”. At every step, we go through our string, look for any character that appears on the left side of a rule, and replace it with a random result from the right side. Once there’s nothing left to replace, we delete the 0s. So over a few steps:
@
SS
CYVNCYVN
t0LnkyL0
t0ounkyaa0
tounkyaa
And if we run this a bunch of times:
gokyo hiinhuguunkya zin kyunrihin dibukyigounkyiin pipuu syeinbyin ontehu tyanzaa sanzeitu hounduuzei pyukyo uukanbyun zinbei hodeizin roobya giinzinheyi zyuubeto yonzya pyan raenru huyagon buzison
Reasonably Japanese-looking nonsense! This could be improved by running a few replacements on the result (for example, hu in Japanese is actually pronounced fu, di is pronounced ji, yi is forbidden, and so on), and the probabilities are way off too. But I think people would agree that words like zinbei sanzeitu pyan gokyo have a Japanese look to them, even if they’re meaningless gibberish.
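The replace-until-done loop described above fits in a few lines of Python. This is only a sketch under some assumptions, not the real program: the `JAPANESE` dict, the `generate` function, and its `rng` parameter are names invented here, and it relies on the grammar having no rule that can lead back to itself, so the loop always finishes.

```python
import random

# The Japanese-like grammar, written as symbol -> list of alternatives.
JAPANESE = {
    "@": "S SS SS SSS SSS SSS SSSS SSSS SSSSS".split(),
    "S": ["CYVN"],
    "C": "p b t d k g s z h r 0".split(),
    "Y": "y 0 0".split(),
    "V": "L H H".split(),
    "L": "aa ei ii ou uu".split(),
    "H": "a e i o u".split(),
    "N": "n 0 0".split(),
}

def generate(rules, start="@", rng=random):
    """Expand `start` until no rule symbols remain, then delete the 0 placeholders."""
    word = start
    while any(ch in rules for ch in word):
        # One pass: every rule symbol gets a random alternative,
        # every other character is copied through unchanged.
        word = "".join(rng.choice(rules[ch]) if ch in rules else ch for ch in word)
    return word.replace("0", "")

print(" ".join(generate(JAPANESE) for _ in range(10)))
```

Passing a seeded `random.Random` as `rng` makes a batch repeatable, which helps if a favorite word needs to be found again later.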
And this sort of thing turns out to be very useful when you’re making up a language—or even just a handful of names. It’s quite handy when building a D&D campaign to have names from the Tphaki region look like apsheksha ophe okkhepshema enopho esapkhe i pphatepkhomisho thomathiphi tshepho while the dwarves’ names look like pazam qidtupir p’r anpi ku palirpud tiinquz ulk’ azka kitim titumpa. So a few years back I wrote a simple Python program that takes a grammar like this and produces a long stream of nonsense from it.
(The program is actually a bit more powerful than a CFG: it can run replacements on the result, like turning di into ji in Japanese, or eliminate any words where certain sequences appear, like yi. At some point I’ll document it well enough to toss up on Bitbucket.)
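Those two extras are easy to picture as a post-processing pass. The sketch below is an assumption about how such a pass might look, not the program’s real interface: `REWRITES`, `BANNED`, and `polish` are hypothetical names, and filtering before rewriting is an arbitrary choice made here.

```python
import re

# Hypothetical rewrite rules, applied in order: (pattern, replacement).
REWRITES = [(r"hu", "fu"), (r"di", "ji")]
# Hypothetical banned sequences: a word containing any of them is thrown out.
BANNED = [r"yi"]

def polish(words, rewrites=REWRITES, banned=BANNED):
    """Drop words containing a banned sequence, then apply the rewrites to the rest."""
    kept = []
    for word in words:
        if any(re.search(pattern, word) for pattern in banned):
            continue
        for pattern, replacement in rewrites:
            word = re.sub(pattern, replacement, word)
        kept.append(word)
    return kept

print(polish(["ontehu", "huyagon", "giinzinheyi", "gokyo"]))
# ['ontefu', 'fuyagon', 'gokyo']
```

Here giinzinheyi is discarded for containing yi, while ontehu and huyagon have their hu turned into fu.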
To make this language sound alien, I’m removing a bunch of sounds that are possible in human languages. These aliens don’t have noses, so they can’t make nasal sounds like m or n. Their larynx also doesn’t have cricoarytenoid muscles like ours does, so they can’t make voiceless sounds (sounds where the vocal cords are left slack and don’t vibrate), like s or k. To compensate for the loss, I’ll throw in a couple of sounds that English doesn’t have, but which English-speaking readers can easily imagine, like gh and q.
# Curve of syllable distributions
@ : IF IF ISF ISF ISSF ISSSF
# Initial syllables can lack onsets, sometimes
I : S S V
# Medial syllables are onset-nucleus
S : ON
# Final syllables are either that or onset-nucleus-consonant
F : S ONC
# Onsets are either a consonant or an R-compatible consonant plus r
O : C C C Rr
# Nuclei are either a vowel or a vowel plus a glide
N : V V YV
Y : u i
V : a e o
# And now our consonant inventory: voiced sounds only, no nasals
C : b d j g q v z zh gh l r
# R-compatible consonants: don't allow lr, jr, etc
R : b d g q z
The result?
dezhog eduobio ozioved vuodroqio qrozioqruega luezoze ziezholiob aviazrazelue qroqrie aqria zhezhuo oqred godruoroj giejuajiequazh luaqroraquoq quoqoboq ejiorie zobejieb aquaze buazovueb ojorod eghiad odradreborie luavad zhuabebuezro eqrojav zruelo ezuoq azazhagh abrelo egoqrogiobag ojueva logreghia gheqobraqrelo abrej zhazruezhev zravabeb ghieqovajogh zhozrogedav gueqale logruaq eqrog ojazabadre aghuaghuezriada ejajiogro gobiaghua abiol rajabuebruogheq edrodred ghuazaziodezoz baquel riejor vegedrarazrazh
I only need like 20 or so good ones, so it’s okay if the CFG also produces some bad ones. From this batch I might grab dezhog, aqria, zhezhuo, oqred, ojorod, eghiad, zruelo, ezuoq, logruaq, or abiol as particularly nice-looking. If I cared about fine-tuning this, I’d notice that it’s producing a lot of intimidatingly long words, and reduce the number of syllables in the first line of the grammar. But I don’t. I’m not likely to be using it again after today.
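For anyone who did want to tune it, the effect of editing that first line is simple arithmetic: each symbol in an “@” alternative (I, S, or F) turns into exactly one syllable, so an alternative’s length is its syllable count. A small sketch follows; `TRIMMED` is a made-up alternative rule for comparison, and the function names are invented here too.

```python
from collections import Counter

# The "@" rule from the alien grammar above.
CURRENT = "IF IF ISF ISF ISSF ISSSF".split()
# A hypothetical shorter variant, made up for comparison.
TRIMMED = "IF IF IF ISF ISF ISSF".split()

def syllable_odds(alternatives):
    """Map syllable count -> probability, given equally likely alternatives."""
    counts = Counter(len(alt) for alt in alternatives)
    return {n: counts[n] / len(alternatives) for n in sorted(counts)}

def mean_syllables(alternatives):
    """Average number of syllables per generated word."""
    return sum(len(alt) for alt in alternatives) / len(alternatives)

for name, rule in (("current", CURRENT), ("trimmed", TRIMMED)):
    print(name, syllable_odds(rule), round(mean_syllables(rule), 2))
```

The current rule averages about 3.17 syllables per word and the trimmed one about 2.67. That is not far from the Japanese-like grammar’s average of 3, so much of the extra length likely comes from the syllables themselves: glides, r-clusters, and two-letter consonants like zh and gh.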
And voilà! Alien words!
Drela oraqe adruagria zhozozra. Oziaz vozrazalio ghiadar zevogh doqradab. Viabrieve obegh biagev. Zhozregrua!
Now all I have to do is pick a handful of favorites, adjust them to my liking, ensure they look distinct enough that people won’t get confused between them, and a language is born! Let’s call it, uh…
Zhozrogedav
That sounds good. Let’s go with that.
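As for making sure the chosen words look distinct enough, a rough similarity check can flag pairs worth a second look. This sketch uses `difflib` from Python’s standard library; the `confusable_pairs` name and the 0.7 threshold are arbitrary choices made for illustration, and a spelling-based ratio is only a stand-in for how confusable two words really feel to a reader.

```python
from difflib import SequenceMatcher
from itertools import combinations

# The shortlist picked from the batch above.
FAVORITES = ["dezhog", "aqria", "zhezhuo", "oqred", "ojorod",
             "eghiad", "zruelo", "ezuoq", "logruaq", "abiol"]

def confusable_pairs(words, threshold=0.7):
    """Return (a, b, ratio) for every pair whose spelling similarity reaches the threshold."""
    pairs = []
    for a, b in combinations(words, 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    return pairs

for a, b, ratio in confusable_pairs(FAVORITES, threshold=0.5):
    print(f"{a} / {b}: {ratio}")
```

Lowering the threshold casts a wider net; anything it flags is just a candidate for tweaking by hand.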