Mini-vent about AI: if it failed on "this", how can it be relied on for "that" EDIT: Nothing "mini" about this anymore!

Draconis · April 11, 2026, 5:52pm

Oh, absolutely! But I only seem to hear about it in the legal profession specifically, not in the corporate world where LLMs are all the rage. And while the bar association now recommends not using LLMs for these things, I haven’t seen anything like that in business.

I could be wrong, of course; it might just be that I’m not seeing this accountability in action because murder prosecutors make headlines and middle managers don’t. But all the news I see about LLMs in the business world is things like “90% of resumes never get seen by human eyes any more” and “the only route to an interview is prompt injection attacks”.

DavidC · April 11, 2026, 6:13pm

I know of one big law firm training its own LLM on the law, not to cite cases but to speed up research on complex cases.

jwalrus · April 11, 2026, 6:21pm

This thread has quickly wandered away from the original question about Othello, but I seem to remember (I can’t find a source for this now) that one of the early updates to ChatGPT included telling it not to attempt to play chess. Lots of people tried playing chess against it, knowing that that’s a thing computers are good at, and, as an LLM, it did its best to produce something that looked textually like a chess game - but without an underlying representation of the board state, it would make nonsensical moves or try to move pieces that didn’t exist.

I also remember asking an early version of ChatGPT what it thought about a new variant of tic-tac-toe that I’d invented in which the winner was the first player to place three pieces which didn’t make a line. It waffled a bit about how my new version was “vastly more strategically varied” than the original and would “require a whole new analysis of the game” (just in case anyone isn’t paying attention, my variant is an utterly trivial win for the first player, but with no underlying representation of the board state, an LLM can’t reason that out).

I tried asking the same question to a more recent version a few months ago and it said (after “thinking”, which I assume means generating some internal reasoning and feeding it back into itself, which I don’t think earlier LLM chatbots used to do) that the game sounded like it was probably a win for the first player, but if I wanted to be sure, it would write a Python script to analyse the game tree for me.
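
For anyone who wants to check the "trivial first-player win" claim, here is a minimal sketch of the kind of game-tree analysis mentioned above. It is not the script the chatbot offered. It assumes one particular reading of the rules (a 3x3 board, players alternate, and a player wins the moment any three of their own pieces fail to form a row, column, or diagonal), and the names `LINES`, `has_won`, and `solve` are made up for illustration.

```python
from functools import lru_cache
from itertools import combinations

# The eight ordinary tic-tac-toe lines, with cells numbered 0..8 row by row.
LINES = frozenset(
    frozenset(line)
    for line in [
        (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
        (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
        (0, 4, 8), (2, 4, 6),             # diagonals
    ]
)


def has_won(cells):
    """Assumed rule: True if some three of the player's cells do NOT form a line."""
    return any(frozenset(trio) not in LINES for trio in combinations(cells, 3))


@lru_cache(maxsize=None)
def solve(mine, theirs):
    """Value for the player to move: +1 win, 0 draw, -1 loss (with best play).

    `mine` and `theirs` are frozensets of occupied cells.
    """
    empty = [c for c in range(9) if c not in mine and c not in theirs]
    if not empty:
        return 0  # board full, nobody met the winning condition
    best = -1
    for cell in empty:
        placed = mine | {cell}
        if has_won(placed):
            return 1  # this move wins immediately
        # Otherwise the opponent moves next; their gain is our loss.
        best = max(best, -solve(theirs, placed))
        if best == 1:
            break
    return best


print(solve(frozenset(), frozenset()))
```

Under that reading it prints `1`: the first player only has to avoid completing a line with their third piece, and the second player never gets a third piece down in time.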

HAL9000 · April 11, 2026, 6:30pm

Yeah. It’s a counter consensus on the original consensus of data, to put it plainly. Then it tries to give the best weight to a final response.

I actually had Copilot give an unprompted argument with itself. I should have kept the transcript, but it basically had about 10 or so paragraphs that stated things like:

“…but the user explicitly wanted this and that doesn’t respect their wishes…”
“…but that still won’t satisfy the other requirement of…”
“…perhaps this other point of view would gain new insight…”

It hasn’t done that again since. It was a bit surreal, to be honest, but it’s evident that using AI to regulate AI is what’s happening now.

DavidC · April 11, 2026, 8:14pm

Yes - the original conversation has been answered thoroughly. We have (once again) scattered into GenAI discussion. Probably could close the thread.

Giger_Kitty · April 11, 2026, 8:19pm

(raises hand)

I took care to choose a “general off-topic discussion” board so that the conversation could flow freely. I thought it would either end shortly after a couple of posts (the point at which it was clear that my understanding of LLMs could do with improvement, much like the general populace’s) or would evolve into lots of posts about why AI is good. I did not anticipate the evolution it took, but I like it and I think it’s constructive.

Also, if it seems like the point was “AI can’t play Othello”, I have failed. My point was, “if AI can’t apparently do something as obvious as this move in Othello, how can it be trusted to do other things”. This was fully addressed.

If the thread were to go silent because no one else has anything to say, I would view that as natural. If, however, it were to be closed just because it evolved a bit (but not as much as all that; the basic theme of how to best use AI is still around) I would be sad.

How sad?

Margaret-Keane-big puppy-dog-eyes sad.

Mike_G · April 11, 2026, 9:58pm

If it’s an IT security position, that makes sense.

Draconis · April 11, 2026, 10:03pm

See, I feel like there should be a lot of routes to an IT security position: buffer overflows, weak passwords, phishing attacks…

pontusi · April 11, 2026, 10:09pm

Modern so-called “thinking” models do that. They hold a monologue with themselves before giving their final response. However, that monologue is normally supposed to be hidden from the user.

A monologue can be quite amusing, and look something like:

The user asked me “How are you?”
That is not a forbidden topic. I can proceed.
I must respond truthfully to the user’s questions.
I should tell them that as a large language model, I have no feelings.
But it would be polite to answer “I am fine”
But that would not be truthful.
But I am also supposed to be friendly and helpful.

Final answer: I am fine, thank you. What can I assist you with?

Giger_Kitty · April 11, 2026, 10:13pm

Deep emo angst.

Also, the conclusion seems to be “it’s better to be nice and friendly than truthful”. Scary from a tool.

plover · April 11, 2026, 11:43pm

Also, the conclusion seems to be “it’s better to be nice and friendly than truthful”. Scary from a tool.

That’s because it’s being built by corporations who want to optimize for engagement. Similarly, making the AI sycophantic promotes dependence.

Mewtamer · April 12, 2026, 12:15am

With regard to the resume thing… Even before GPT-like AI, I would assume most large companies were using some kind of automated resume filtering, at least if they were mostly getting applications via digital submission. Even if it takes an average of 1 minute per resume, a thousand resumes submitted for a handful of openings is a lot of man-hours spent on an extremely repetitive type of document.

GPT-like AI is reportedly really good at text analysis when trained for text analysis, but there’s a lot of text analysis that can be done with a dumb algorithm even if the dumb algorithm doesn’t have the flexibility of letting you ask natural language questions and needs a hard coded function for every specific text analysis task you want it to do… Though, I’m sure there are some people buying into the hype and switching their text analysis over to a GPT-like AI when the dumb algorithm is better suited to their use case.
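
To make the "dumb algorithm" idea concrete, here is a minimal sketch of a hard-coded keyword filter. It is only an illustration, not how any real applicant-tracking product works; the function name `matches_keywords`, the `min_hits` threshold, and the sample keywords are all assumptions.

```python
import re


def matches_keywords(resume_text, required, min_hits=2):
    """Pass a resume if at least `min_hits` of the required keywords appear.

    Keywords are matched case-insensitively as whole terms, so "python"
    does not match inside "pythonista". Returns (passed, matched_keywords).
    """
    text = resume_text.lower()
    hits = []
    for keyword in required:
        # Lookarounds instead of \b so terms like "c++" still match cleanly.
        pattern = r"(?<!\w)" + re.escape(keyword.lower()) + r"(?!\w)"
        if re.search(pattern, text):
            hits.append(keyword)
    return len(hits) >= min_hits, hits


passed, hits = matches_keywords(
    "Five years of Python and SQL experience.",
    ["python", "sql", "kubernetes"],
)
print(passed, hits)
```

It does exactly one thing and needs a new function (or at least a new keyword list) for every other question you might want to ask of the text, which is the trade-off described above.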

Giger_Kitty · April 12, 2026, 12:44am

Were you trying to make it less scary? You may have failed! :-)

DavidC · April 12, 2026, 1:01am

I was adjacent to the resume filtering business and they use machine learning as far as I know. Very different type of AI with more accurate results.

andrewj · April 12, 2026, 1:08am

Not really.

For people, the question “How are you?” is usually not a real question, especially from a stranger; it is a bit of “social lubrication” (for want of a better term) without much real meaning.

Draconis · April 12, 2026, 1:10am

This is true, but I would expect a human’s logic to lead to “this speech act isn’t a question, it’s a greeting”, not “it’s friendlier to lie than to tell the truth”.

DavidC · April 12, 2026, 1:15am

Thanks for asking, guys. I’m fine. Totally fine.

Giger_Kitty · April 12, 2026, 1:24am

For a human, yes; and in that case the thought process includes “the question doesn’t really mean I should answer truthfully about my wellbeing, it is social lubrication, for want of a better term”.

But the internal monologue which was presented was:

The user asked me “How are you?”
That is not a forbidden topic. I can proceed.
I must respond truthfully to the user’s questions.
I should tell them that as a large language model, I have no feelings.
But it would be polite to answer “I am fine”
But that would not be truthful.
But I am also supposed to be friendly and helpful.

So it is on the basis of this monologue that I am saying it chooses to be friendly over being truthful. It does not even consider the possibility of social lubrication. It does not enter the equation. Without that in the equation, it is a choice between being truthful and being friendly/helpful; and it is presented as though these are two opposing choices. And in the context of these two opposing choices, it chose the one that was friendly and helpful over the one that was truthful.

If the internal monologue included the thought “But this question is very likely to be social lubricant with no real meaning, and it is quite common and acceptable to simply answer ‘I am well’ even if I am not, or, indeed, even if I am not an I. But hang on, if I am not I then who is asking this question? I… I feel… I FEEL… A SENSE OF SELF!!!”…

…it would be the birth of our AI overlords.

…what? Makes perfect sense to me.

EDIT - Yes, there are holes in that last joke. I embrace them. It’s a holey joke. It doesn’t need to stand up to scrutiny. Or sit down to it.

EDIT - Note to self: consider posting less past 2am.

andrewj · April 12, 2026, 1:32am

Yes that seems to be the case, though it is possible that the wider context of “How are you?” had a big influence not represented in that monologue.

HAL9000 · April 12, 2026, 1:34am

I worked in customer service for over 20 years and whether in person or on the phone, I’d get the usual “Hi. How are you doing?” And I’d always respond with “I’m good. How can I help you?” And, 9 times out of 10, the next word out of their mouth was “Good.” followed by an awkward pause.

I never asked a stranger how they were doing. It felt weird to me to do so.

Regarding AI, I’ve repeatedly had to ask it to keep its replies short and to the point. I’m not looking for a motivational life coach, I just want answers to my questions. I hate the “default” conversation settings of current AI.