Seeking advice on LLM use

I generally avoid LLMs, because for the sorts of things I do (programming, writing, research), either they’re very bad at them or I enjoy doing them myself. I imagine at this point my stance on their use for IF is well-known.

But if I’m teaching students about how they work and what they’re good and bad at, I figure I should also have firsthand experience using them for actual tasks, not just basic demos. I currently have some tedious language processing that I would love to automate, and in this case, it seems like LLMs would be the right tool for the job.

Specifically, I want to take an enormous list of words (4.5 megabytes of plain text) and make a spreadsheet giving semantic properties of each one. For example:

  • Can it be used as a common noun (that is, not a name)?
  • If so, can it refer to a concrete object (that is, not something abstract like “joy”)?
  • If so, is that concrete object human-sized or smaller?
  • Can it be used as a common adjective?
  • If so, can it apply to concrete objects?
  • And so on…
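The nested "if so" questions above could be modelled as one spreadsheet row per word, with dependent answers left blank when the parent answer is "no". This is only a sketch: the column names (`common_noun`, `noun_concrete`, and so on) are invented for illustration and are not from the original post.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class WordRow:
    """One spreadsheet row. Field names are hypothetical."""
    word: str
    common_noun: bool
    # None means "not applicable": the parent question was answered "no".
    noun_concrete: Optional[bool] = None
    noun_human_sized: Optional[bool] = None
    common_adjective: bool = False
    adj_applies_to_concrete: Optional[bool] = None

    def normalized(self) -> "WordRow":
        """Blank out any answer whose parent question was answered 'no'."""
        concrete = self.noun_concrete if self.common_noun else None
        sized = self.noun_human_sized if concrete else None
        adj = self.adj_applies_to_concrete if self.common_adjective else None
        return WordRow(self.word, self.common_noun, concrete, sized,
                       self.common_adjective, adj)


def to_csv(rows) -> str:
    """Render rows as CSV text; None becomes an empty cell."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(WordRow)],
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row.normalized()))
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv([WordRow("joy", True, False), WordRow("apple", True, True, True)]))
```

Normalizing before writing means a model that answers "human-sized: yes" for an abstract noun cannot leave a contradictory cell in the sheet.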

This seems like a promising job to automate. It involves analyzing words based on their contexts of use, and it’s tedious enough that no human would want to do this for four hundred thousand individual words. And if there are mistakes, it’s not the end of the world.

But…I don’t really know where to start. Are any of the LLM companies still offering use for free at this point, or are the serious models now all locked behind paid APIs? OpenAI’s “compare plans” page is frustratingly vague on what’s actually possible on a free plan. Firefox claims to have a sidebar for easy LLM use; can I turn that on without inviting endless privacy violations? Which of them will accept a 4.5MiB text file?

Basically, if I want to (as a complete novice) try using LLMs for some language-processing tasks…where’s the best place to start, in mid-2026?

The LLM won’t do all of the work in this case. It will build a program (probably Python) to organize the output, and spend tokens only on the individual classification tasks. Claude does this all the time.

In fact, one of the secrets of successful LLM use is that they already know all the normal tools and use them fast. Read a PDF: there’s a tool. Create a Mermaid diagram: tool. SVG: tool.

In your case, after describing your intentions, I’d even suggest telling the LLM (use a terminal version or VS Code) to put all the data in a SQLite (sqlite3) database.
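For what it's worth, the database idea might look roughly like this using Python's built-in `sqlite3` module. The table layout, column names, and helper functions are assumptions made up for this sketch. The point is that a NULL answer doubles as a "not processed yet" marker, so a run can be interrupted and resumed.

```python
import sqlite3

# Hypothetical schema: one row per word, NULL until the word is classified.
SCHEMA = """
CREATE TABLE IF NOT EXISTS words (
    word TEXT PRIMARY KEY,
    is_common_noun INTEGER,
    is_concrete INTEGER,
    is_human_sized_or_smaller INTEGER
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def load_words(conn, words):
    """Insert words, silently skipping duplicates."""
    conn.executemany("INSERT OR IGNORE INTO words (word) VALUES (?)",
                     ((w,) for w in words))
    conn.commit()


def save_result(conn, word, is_common_noun, is_concrete, is_human_sized):
    """Store one classification; None is stored as NULL."""
    conn.execute(
        "UPDATE words SET is_common_noun = ?, is_concrete = ?, "
        "is_human_sized_or_smaller = ? WHERE word = ?",
        (is_common_noun, is_concrete, is_human_sized, word),
    )
    conn.commit()


def pending(conn, limit=100):
    """Return up to `limit` words that have not been classified yet."""
    rows = conn.execute(
        "SELECT word FROM words WHERE is_common_noun IS NULL "
        "ORDER BY word LIMIT ?", (limit,))
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = open_db()
    load_words(db, ["apple", "joy"])
    save_result(db, "apple", True, True, True)
    print(pending(db))
```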

Your vision will likely be carried out in one 15-30 minute session.

It’s pretty low-powered, but you can still get GitHub Copilot for free. With the VS Code extension, it’s pretty convenient.

The best way to go is probably to get Copilot to write you a Python script that can do it. Otherwise the token usage will be prohibitive.

Mm, unfortunate. If it comes down to writing a Python script, I can more easily do that myself; I was hoping the LLM would help with actually processing the text, since “is this word a concrete noun?” sounds like the sort of problem they’d be best at, without getting bored like a human annotator would.

On the other hand, you might be able to feed Copilot a huge file now, before they switch to token-based pricing.

The free models on Copilot are mediocre at best, so I’d suggest spending the $10 for a month of Pro, but they’ve paused new signups for now.

Also, I’m no expert in this at all, but there are Python libraries for natural language processing. In fact, I just had Copilot build me a conlang builder app using some Python plumbing.

I sent ChatGPT a copy of /usr/share/dict/words and asked it to do this task, and it wrote a Python script that used some simple heuristics to classify the words. Laughable. When I told it what I thought of that solution, it suggested sending batches of 50-200 words at a time, with a prompt like:

Classify each word by its possible English common-noun usage.

Definitions:
- common noun: can be used generically, not only as a proper name.
- concrete: can refer to a physical object/entity, not an abstract idea, event, language, place, or institution.
- human-sized-or-smaller: a typical concrete referent is no larger than an adult human.

For each word, return strict JSON:
[
  {
    "word": "...",
    "lemma": "...",
    "is_common_noun": true/false,
    "common_noun_sense": "... or null",
    "is_concrete": true/false/null,
    "is_human_sized_or_smaller": true/false/null,
    "confidence": 0.0-1.0,
    "notes": "brief"
  }
]

If a word has both proper and common uses, answer true and classify the common use.
If only a proper-name use is known, answer false.

For a megabyte-sized file, you’d probably want to write a script to submit requests to the API rather than doing it by hand with a chatbot, but I suspect that means having to sign up and pay per token.
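A rough sketch of what such a batching script could look like, leaving out the actual API call. The `send` argument is a hypothetical stand-in for whatever client you end up using (it takes a prompt string and returns the model's reply text), and `header` is the instruction text quoted above. Neither name comes from a real library.

```python
import json


def batches(words, size=100):
    """Yield consecutive slices of at most `size` words."""
    for start in range(0, len(words), size):
        yield words[start:start + size]


def build_prompt(batch, header):
    """Append one word per line to the fixed instruction text."""
    return header + "\n\nWords:\n" + "\n".join(batch)


def parse_reply(reply_text, batch):
    """Parse a strict-JSON reply; keep only entries for words we sent."""
    data = json.loads(reply_text)
    wanted = set(batch)
    by_word = {
        item["word"]: item
        for item in data
        if isinstance(item, dict) and item.get("word") in wanted
    }
    missing = [w for w in batch if w not in by_word]
    return by_word, missing


def classify_all(words, send, header, size=100):
    """Run every batch through `send`; collect results and words to retry."""
    results, retry = {}, []
    for batch in batches(words, size):
        try:
            found, missing = parse_reply(send(build_prompt(batch, header)), batch)
        except (json.JSONDecodeError, TypeError):
            # The reply was not usable JSON, so queue the whole batch again.
            found, missing = {}, list(batch)
        results.update(found)
        retry.extend(missing)
    return results, retry
```

Returning the skipped words separately matters at this scale: models occasionally drop or rename an item in a long list, and a retry list is cheaper than rerunning everything.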

As I skim the various services, the impression I’m getting is that the free lunch period is just about over, and doing anything serious (like processing megabytes of data) is going to require paid subscriptions. But also, most of those paid subscriptions seem like massive overkill (or a complete mismatch) for this sort of basic language-processing task.

For the moment, I think I’m going to set up a small local model with Ollama, then write my own Python wrapper that takes each word, wraps it in an appropriate prompt, and feeds it to the system. This shouldn’t be a very hard task for any sort of LLM (the (pre-chat) GPT models could handle this five or six years ago), and writing a Python script to iterate through the lines of a file is barely a minute of work.
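A minimal sketch of that kind of wrapper, assuming Ollama is serving its HTTP API on the default local port and that a model has already been pulled. The model name `llama3.2`, the prompt wording, and all function names here are placeholders rather than recommendations.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "llama3.2"  # assumption: substitute whichever model you have pulled


def build_prompt(word):
    """Wrap one word in a fixed instruction (placeholder wording)."""
    return (
        "Answer with a JSON object with boolean keys is_common_noun, "
        "is_concrete, is_human_sized_or_smaller.\n"
        f"Word: {word}"
    )


def ollama_generate(prompt, model=MODEL, url=OLLAMA_URL):
    """Send one non-streaming request to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt,
                          "stream": False, "format": "json"}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


def classify_word(word, generate=ollama_generate):
    """Return the model's answer as a dict, or None if it is not a JSON object."""
    raw = generate(build_prompt(word))
    try:
        answer = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return answer if isinstance(answer, dict) else None


def classify_file(in_path, out_path, generate=ollama_generate):
    """Read one word per line and append one JSON line per word."""
    with open(in_path, encoding="utf-8") as src, \
            open(out_path, "a", encoding="utf-8") as dst:
        for line in src:
            word = line.strip()
            if not word:
                continue
            answer = classify_word(word, generate)
            dst.write(json.dumps({"word": word, "answer": answer}) + "\n")
```

Passing `generate` as a parameter lets you test the loop with a fake function before pointing it at the real model, and appending one JSON line per word means a crash partway through loses nothing already written.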

Oh, definitely! But I chose this task in particular because it’s something that traditional (non-neural) NLP systems are really bad at. Understanding the semantics of words is a lot harder than understanding their syntax, to the point that it was used as a successor to the Turing Test until LLMs started reliably beating it in the 2020s.

Which is why…

But—

This is exactly what I was hoping for, thanks! I’ve never really needed to come up with prompts before, and this looks like what I’ll need.

That’s why I’m leaning toward a local model. Aside from all the environmental and privacy benefits, this is a task where I want economy, not power. Leaving my laptop running overnight is a lot faster and easier than going through each word by hand!

You are so right.

Gemini’s free API tier might work for this: https://ai.google.dev/gemini-api/docs/pricing

The app builder in Google AI Studio even cranked out a slick web app to wrap the API:


Its query code:

      const client = getGeminiClient();

      const prompt = `Linguistically classify and analyze the following English words based on their noun qualities. For each word, determine:
1. isCommonNoun: Can it be used as a common noun (e.g. "dog", "idea", "water", but NOT proper names like "John", "London", "Google", and NOT strictly non-nouns like "very" or "quickly" or "the")?
2. isConcrete: If it is a common noun, does it refer to a concrete object (something physical that can be touched, seen, or physically detected, e.g., "chair", "air", "water", "atom" vs abstract ideas like "joy", "grief", "culture")? Set to false if it's not a common noun.
3. isHumanSizedOrSmaller: If it is a concrete object, is that object typically human-sized or smaller (e.g., "apple", "cat", "chair", "car" or "house", vs a "mountain", "ocean", "galaxy", "planet")? Set to false if not concrete.
4. explanation: A brief, single-sentence linguistic justification of the analysis.

Words to classify:
${words.map((w, index) => `${index + 1}. ${w}`).join("\n")}`;

      const response = await client.models.generateContent({
        model: "gemini-3.5-flash",
        contents: prompt,
        config: {
          systemInstruction: "You are a professional computational linguist analyzing lexical semantics.",
          responseMimeType: "application/json",
          responseSchema: {
            type: Type.ARRAY,
            description: "Array of linguistic word classifications",
            items: {
              type: Type.OBJECT,
              properties: {
                word: { type: Type.STRING, description: "The original word" },
                isCommonNoun: { type: Type.BOOLEAN, description: "Can it be used as a common noun (not proper named entity/adverb/conjunction)?" },
                isConcrete: { type: Type.BOOLEAN, description: "Is it a touchable/visible physical entity? (Set to false if isCommonNoun is false)" },
                isHumanSizedOrSmaller: { type: Type.BOOLEAN, description: "Is the physical entity human-sized or smaller in scale? (Set to false if isConcrete is false)" },
                explanation: { type: Type.STRING, description: "A highly concise, one-sentence lexical reason." }
              },
              required: ["word", "isCommonNoun", "isConcrete", "isHumanSizedOrSmaller", "explanation"],
            }
          },
        },
      });
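Even with a response schema, it seems worth checking what comes back. Here is a sketch in Python (not part of the generated app) that assumes the reply text is a JSON array shaped like the schema above. It drops malformed or unexpected items, reports words that went missing, and re-applies the "set to false if..." rules from the prompt. The function name `check_response` is made up for this example.

```python
import json

# Expected keys and types, mirroring the responseSchema above.
REQUIRED = {
    "word": str,
    "isCommonNoun": bool,
    "isConcrete": bool,
    "isHumanSizedOrSmaller": bool,
    "explanation": str,
}


def check_response(response_text, words):
    """Return (cleaned_items, problems) for one batch of submitted words."""
    items = json.loads(response_text)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    sent = set(words)
    cleaned, problems = [], []
    for item in items:
        if not isinstance(item, dict) or any(
                not isinstance(item.get(key), kind)
                for key, kind in REQUIRED.items()):
            problems.append(f"malformed item: {item!r}")
            continue
        if item["word"] not in sent:
            problems.append(f"unexpected word: {item['word']}")
            continue
        fixed = dict(item)
        # Enforce the dependencies stated in the prompt.
        if not fixed["isCommonNoun"]:
            fixed["isConcrete"] = False
        if not fixed["isConcrete"]:
            fixed["isHumanSizedOrSmaller"] = False
        cleaned.append(fixed)
    returned = {item["word"] for item in cleaned}
    problems.extend(f"missing word: {w}" for w in words if w not in returned)
    return cleaned, problems
```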

I’ve had lots of conversations with fellow faculty about this very problem. How do we teach students to be critical of AI tools? One thing that has proven helpful is to have some idea of how things work, as you are doing, but also to do experiments in the classroom, ask students to play with free tools where possible, and then discuss the results as a class or in small groups.

I often interact with more artist-oriented students who, in many cases, actively dislike LLM-based services and tools because they see them as taking away jobs they might otherwise have had. Part of what I have found helpful in these cases is to show that, yes, these tools may end up taking some jobs in the short term, but that certain fields are less at risk. Doing hands-on experiments has often helped cut through the hype, prepare students to recognize works potentially created by generative AI, and give them an idea of how to present themselves, if they want, as someone working against, with, or within existing AI-driven workflows.

I know you asked about a specific task, but if you have findings on classroom usage in the future, I’d be interested to hear how it went. I know many on this forum actively hate generative AI, and that’s great, but my students are using it. I’d rather they use it safely and, where possible, ethically, with an understanding of its potential benefits and costs, than think it can “magically” create answers for them in design, writing, and programming tasks without knowing the risks involved.
