The Pitfalls of AI

This is a perfect example of why people who follow Domain-Driven Design principles are finding more success with GenAI than people who don’t.

We have never assumed we understand the business. We start by modeling it through conversations before we write any code. These models are tested through workflows and scenarios until the model is agreed upon by both the business and us, the technologists.

So when we’ve written or directed code to be written, the logic is already understood and tested. The actual code matches the model and the language used to describe the model (and there is usually more than one model).

So when a DDD architect/engineer uses GenAI to build an application, it’s just painting what we’ve already decided.

And yes, when you write the code, you often run into irregularities. On my teams, we stop, bring the irregularity to the business, fix it in the model, and then continue coding with the gap addressed. With GenAI, these gaps are identified through the tests derived from the workflows and scenarios used when working on the model. Good guardrails also help identify gaps.

When you build software from a DDD perspective, you don’t create technical debt OR comprehension debt.

I think I get it. Trying to get AI to understand an existing code base is problematic, but generating it from a clean slate with business use cases works much better. It sounds like comprehension of the code base is not required (or even desired) when adhering to DDD best practices.

No, that’s not it. Because we’re modeling the behavior of the business, that behavior and its language are in the code.

I can often put code on a projector and the SMEs can see their application language there.

Here’s a simple library behavior from a DDD perspective. It may seem simple, but a lot of code would instead use database-style terminology like addBookToCart. That makes sense on the surface, but many logic issues arise from it.

interface Member {
  readonly id: MemberId;
  readonly tier: MembershipTier;
  readonly currentLoans: ReadonlyArray<Loan>;

  checkout(book: Book, today: Date): Loan;
  returnBook(loan: Loan, today: Date): ReturnReceipt;
  renewLoan(loan: Loan, today: Date): Loan;
  payFine(fine: Fine): Receipt;

  hasOverdueBooks(): boolean;
  canCheckout(book: Book): boolean;
}

type MembershipTier = "standard" | "faculty" | "senior";
type LoanPeriod = 14 | 21 | 28;
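
To make the point about language concrete, here is a minimal sketch of how part of the Member interface above might be implemented. Everything not in the original post is an assumption for illustration: the LibraryMember class, the shapes of Book and Loan, the tier-to-period mapping, the MAX_LOANS limit, and the addDays helper. It also passes today into hasOverdueBooks and canCheckout, which the interface above does not do, to keep the sketch deterministic.

```typescript
// Hypothetical supporting types; the original post does not define them.
type MemberId = string;
type MembershipTier = "standard" | "faculty" | "senior";
type LoanPeriod = 14 | 21 | 28;

interface Book { readonly isbn: string; readonly title: string; }
interface Loan { readonly book: Book; readonly checkedOutOn: Date; readonly dueOn: Date; }

// Assumed mapping from tier to loan period; the real one would come from the SMEs.
const LOAN_PERIOD_BY_TIER: Record<MembershipTier, LoanPeriod> = {
  standard: 14,
  senior: 21,
  faculty: 28,
};

const MAX_LOANS = 5; // assumed limit, for illustration only

// Returns a new Date that is `days` calendar days after `date` (UTC).
function addDays(date: Date, days: number): Date {
  const result = new Date(date.getTime());
  result.setUTCDate(result.getUTCDate() + days);
  return result;
}

class LibraryMember {
  readonly id: MemberId;
  readonly tier: MembershipTier;
  private loans: Loan[] = [];

  constructor(id: MemberId, tier: MembershipTier) {
    this.id = id;
    this.tier = tier;
  }

  get currentLoans(): ReadonlyArray<Loan> {
    return this.loans;
  }

  // A loan is overdue once `today` is past its due date.
  hasOverdueBooks(today: Date): boolean {
    return this.loans.some((loan) => loan.dueOn.getTime() < today.getTime());
  }

  // The business rules live here, in the business's words.
  canCheckout(book: Book, today: Date): boolean {
    const alreadyOnLoan = this.loans.some((loan) => loan.book.isbn === book.isbn);
    return !alreadyOnLoan && !this.hasOverdueBooks(today) && this.loans.length < MAX_LOANS;
  }

  checkout(book: Book, today: Date): Loan {
    if (!this.canCheckout(book, today)) {
      throw new Error(`Member ${this.id} cannot check out "${book.title}"`);
    }
    const dueOn = addDays(today, LOAN_PERIOD_BY_TIER[this.tier]);
    const loan: Loan = { book, checkedOutOn: today, dueOn };
    this.loans.push(loan);
    return loan;
  }
}
```

Nothing in the class mentions rows, carts, or tables. An SME reading checkout and canCheckout can argue with the rules directly.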

@DavidC Right, but do you go into the code that’s generated at all?

I verify contracts and my guardrails check variants and invariants. But the above example is already begging for more questions:

  • what if the member is checking out a DVD or Vinyl album?
  • what if the member loses an item?
  • what if the member is an organization and not a person?

This is where DDD insists we ask all the stupid questions and the SMEs tell us what’s what.

  • there are actually many kinds of items you can check out
  • we have a replacement cost
  • this happens outside of our membership system, but it is documented

Then I’d ask more questions:

  • List all of the items that can be checked out and their limitations (days, age range)
  • Do you store the replacement cost for every item?
  • How do organization checkouts work?

You can see this is extremely iterative, but eventually you exhaust all of the stupid questions and you have a model everyone agrees is “mostly complete”. Then you implement it, and the code exactly represents the model, including the language the SMEs used.
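
As a rough sketch of where one round of those questions might leave the model, the item kinds below become a discriminated union and the replacement cost becomes part of every item. The kinds, loan lengths, age limit, and all names here are hypothetical placeholders, since the real answers would come from the SMEs. Organization checkouts are deliberately left out, because the SMEs said they happen outside the membership system.

```typescript
// Hypothetical item kinds; the real list would come from the SMEs.
type LendableItem =
  | { readonly kind: "book"; readonly title: string; readonly replacementCost: number }
  | { readonly kind: "dvd"; readonly title: string; readonly replacementCost: number; readonly minimumAge: number }
  | { readonly kind: "vinyl"; readonly title: string; readonly replacementCost: number };

interface Person {
  readonly name: string;
  readonly age: number;
}

// Assumed loan lengths in days, per item kind. Adding a new kind to
// LendableItem without adding it here is a compile-time error.
const LOAN_DAYS: Record<LendableItem["kind"], number> = {
  book: 21,
  dvd: 7,
  vinyl: 14,
};

function loanDaysFor(item: LendableItem): number {
  return LOAN_DAYS[item.kind];
}

// Only DVDs carry an age range in this sketch.
function mayBorrow(person: Person, item: LendableItem): boolean {
  if (item.kind === "dvd") {
    return person.age >= item.minimumAge;
  }
  return true;
}

// "We have a replacement cost": a lost item is charged at that cost.
function lostItemCharge(item: LendableItem): number {
  return item.replacementCost;
}
```

The next round of questions then shows up as a change to the union or the table, not as a scattered set of if statements.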

And with GenAI, we do the same process, but just hand off the model definition to the AI with proper guardrails and it generates exactly what we modeled and prompted (varying programming language, data storage, microservice, any other implementation criteria).
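
One way to picture those guardrails is as scenarios that can be executed against whatever the AI generates. This is only a sketch under stated assumptions: MemberState, Scenario, holdsInvariants, and failedScenarios are hypothetical names, and checkout here is a stand-in for the generated code under test.

```typescript
// Hypothetical, deliberately tiny model state used only for this sketch.
interface MemberState {
  readonly onLoan: ReadonlyArray<string>;
  readonly hasOverdueItems: boolean;
}

// Stand-in for generated code under test.
function checkout(state: MemberState, title: string): MemberState {
  if (state.hasOverdueItems || state.onLoan.includes(title)) {
    return state; // refused: state is unchanged
  }
  return { ...state, onLoan: [...state.onLoan, title] };
}

// A scenario agreed with the business, in given/when/then form.
interface Scenario {
  readonly name: string;
  readonly given: MemberState;
  readonly when: (state: MemberState) => MemberState;
  readonly then: (state: MemberState) => boolean;
}

// Invariant that must hold after every scenario: no title is on loan twice.
function holdsInvariants(state: MemberState): boolean {
  return new Set(state.onLoan).size === state.onLoan.length;
}

// Returns the names of scenarios whose expectation or invariants failed.
function failedScenarios(scenarios: ReadonlyArray<Scenario>): string[] {
  return scenarios
    .filter((scenario) => {
      const result = scenario.when(scenario.given);
      return !(scenario.then(result) && holdsInvariants(result));
    })
    .map((scenario) => scenario.name);
}

const scenarios: Scenario[] = [
  {
    name: "member in good standing checks out a book",
    given: { onLoan: [], hasOverdueItems: false },
    when: (state) => checkout(state, "Dune"),
    then: (state) => state.onLoan.includes("Dune"),
  },
  {
    name: "member with overdue items is refused",
    given: { onLoan: ["Emma"], hasOverdueItems: true },
    when: (state) => checkout(state, "Dune"),
    then: (state) => !state.onLoan.includes("Dune"),
  },
];

console.log(failedScenarios(scenarios));
```

A non-empty result points either at a bug in the generated code or at a gap in the model, which is the cue to go back to the business.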

Non-DDD technical people do this too, but they almost always pre-select a relational database as the default data store and they think in terms of tables and columns. When you do that, you immediately diverge from the business model. DDD folks avoid the data store conversation until implementation time.

There’s a great deal of prior art on this from when machine automation began to spread, starting with “Ironies of Automation” (Bainbridge, 1983), which discusses the challenges of turning operators into supervisors.

I wrote about this for the WONRO (Welcome Our New Robot Overlords) substack that my friend, the philosopher Chris Bateman, and I started a little while ago, in a piece titled Four Problems with Robot Slaves.

A good example is self-driving cars. At least in certain circumstances these have reached a level of competence that allows daring couples to video themselves having sex as their car flies along the road. Yet they go wrong and when they do it’s very hard to recover. This is brilliantly outlined by Uber’s former head of self-driving, Raffi Krikorian, talking about his car crashing.

My own view is that the real dangers lie in long-feedback, policy-related automation. We’re not there yet, but I fear we will be soon.

@sandbags Great site you’ve started there! Subscribed! Reading further on the site, I saw a mention of addiction to AI usage. This is a trend I’ve started seeing in AI commentary now.

There’s a certain gambling-like pattern I’ve experienced with AI.

| :bell: | :bell: | :bell: |

It makes me very happy when AI gives me a programming solution that works. Like, it feels rewarding in a way, but I know that’s not what actual reward is. The only saving grace I have is that I don’t work with AI (just a casual hobbyist), but if its use were dictated by my employer… I could see how AI can be addictive.

Addictions replace the good parts in people. I want to keep my good parts in working order. :wink:

Thank you :grinning_face:

Vibe coding addiction is very much a thing. I am following other developers’ experiences closely and there are very worrying mental health trends. I think it will not be long before we see people needing to be treated for disorders related to supervising coding agents.

I am in two minds about coding agents. Yes they have some productivity wins. Yes they democratise the means of producing software (there has probably never been a better time to be a domain expert with an idea) but… I can’t help thinking that we’re generating a lot of technical debt that someone is gambling will never need to be paid off… that the models will improve faster than the debt piles up (that is 100% Anthropic’s bet).

I’m not convinced that, when the dust settles, we’re going to be on the winning side of that bet.

I’m convinced the debt is piling up faster than any model will be able to fix it.

Seems to me the bottleneck is people’s awareness and willingness to fix it, not models’ ability to fix it. When I notice that Copilot has put a week’s worth of work into a single 10,000 line file, or suspect that the model has written the same helper functions three times already, I skim through the code to see what stands out, and then take some time away from feature work to refactor and clean up - which the model is easily able to do, once given the task.

I would add “ability” to that list! Some people using Claude are experienced developers who understand concepts like technical debt and refactoring, but a lot more aren’t.

I find that when doing low-level coding, Claude will often add new fields with slightly different meanings rather than simply computing the values from existing fields. Over time, without directives to clean up existing code (and sometimes in spite of such directives), this proliferation leads to Claude getting confused and breaking things in weird ways.
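
A small sketch of the pattern, with hypothetical names throughout (LoanWithStoredFlags, Loan, isOverdue, daysOverdue are illustrations, not code from any real project). The first shape stores values that the due date already implies, so every edit has to remember to keep three fields in sync. The second stores one fact and computes the rest.

```typescript
// Drift-prone shape: `isOverdue` and `daysOverdue` duplicate what `dueOn`
// already implies, and each can silently fall out of sync with it.
interface LoanWithStoredFlags {
  dueOn: Date;
  isOverdue: boolean;
  daysOverdue: number;
}

// Leaner shape: store the single fact and derive everything else on demand.
interface Loan {
  readonly dueOn: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True once `today` is past the due date.
function isOverdue(loan: Loan, today: Date): boolean {
  return today.getTime() > loan.dueOn.getTime();
}

// Whole days past the due date, never negative.
function daysOverdue(loan: Loan, today: Date): number {
  const elapsedMs = today.getTime() - loan.dueOn.getTime();
  return Math.max(0, Math.floor(elapsedMs / MS_PER_DAY));
}
```

With the derived version there is nothing to get out of sync, which also gives a model less stale state to base wrong assumptions on.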

Any time features are added or the design shifts, Claude lets cruft build up in ways that I don’t think even moderately skilled human coders would. It’s especially bad about adding superfluous comments that quickly get out of sync with the actual code, and Claude clearly sometimes bases its assumptions for future edits on those same comments.

I haven’t tried using AI generated code, but for every coder who knows how to properly manage AI code, I’m sure there are probably a dozen coders oblivious to the errors accumulating in their projects and a hundred non-coders who think the AI is as good as hiring an experienced programmer… and it’s those non-coders we need to worry about, especially since some of them are profit maximizing C-suite types who don’t fully understand the value of their human developers… and that 1:12:100 ratio is probably overly optimistic.

And if I’m honest, part of why I haven’t tried AI coding, aside from not having the spare funds for any paid models, is a lack of confidence that I could properly double check the AI’s work.

You’d be amazed how much technical debt is knowingly and unknowingly built up by human developers too. I’ve worked in companies where dev churn (people quitting and leaving with undocumented knowledge, new people coming in without proper onboarding) leads to decades-old monoliths that people are afraid to touch because they’re core underlying systems but they’re written in ancient languages and nobody understands them.

Smaller startups are no better - developers are often told to not waste time deprecating features or fixing technical debt because the company hasn’t found product-market fit yet, and until you do, refactoring just eats into the funding runway.

Whether these are good arguments and business practices or not is beside the point - even before AI came along, most code repos were in an awful state.

In fact, since AI is trained on existing code, I wouldn’t be surprised if many of its bad habits are learned from code written by lazy human engineers or due to managerial decisions that deliberately pile on technical debt and kick the can down the road.

How it arrives at writing bad code is not relevant. The thing is it will get you from greenfield to hopelessly obfuscated bubble-gum and duct-tape covered code with large blocks of misleading comments in world-record time.

I think there’s an old XKCD strip that’s a flowchart, which can be summarized like this: coding well leads to code that half works, but by then the requirements have changed and you need to start over; coding quickly leads to barely functional spaghetti code; and the box for good code is disconnected and surrounded by question marks.

But yeah, I’m not sure AI’s failure modes are all that different from the mistakes humans make, but the rate at which the AI can churn stuff out means they can pile up exponentially faster… And to be fair to human programmers, sometimes, less than great code isn’t the product of laziness, but a case of having already spent an entire work day trying to fix a bug and deciding the duct tape patch is good enough…

I agree. It feels to me that a lot of things that are framed as flaws of AI are really just things that humans have been doing anyway, but humans can now do even more easily with the aid of AI: write bad code, create incomprehensible systems, destroy the world with the excesses of capitalism, or whatever else. I am personally intrigued by the existence of generative AI as a technology but would be pleased to see the current bubble collapse.

I am jumping in as someone who has benefited greatly from the generative AI boom but is also very wary of the marketing. I agree with the overall sentiment that this is a tool with its own pros and cons.

Ultimately it will be the “how” of using it that will be the differentiator. That said, the impact it has had on my field of research is pretty alarming. Bioinformatics is inundated with AI Slop projects. Luckily, peer review is doing its thing, but reviewers have finite bandwidth. On the other hand, I have used Claude for certain issues in some tools I have built and maintain as an active scientist. Giving it a specific problem, guardrails, and precise instructions, particularly for making small changes in an established codebase, has been incredibly useful.

For creative endeavours though, I do hate that almost all of my students’ work sounds exactly the same. Even giving them the benefit of the doubt, and assuming they only used ChatGPT or Claude for fixing some grammatical issues, I really do miss the unique voices I used to see in people’s essays or papers.

That’s very unfortunate. I’d be pushing back on that with your students.

It’s understood now that relying on AI reduces comprehension and information retention. Even if the ideas are our own to begin with, using AI to make us sound smarter about them means we spend less time thinking critically about them (and that includes wrestling with how to convey those ideas)… which is the opposite of the intent of academic learning. Communicating ideas in one’s own words is the only real proof we have of our understanding of things.

If the students are starting to sound the same… they’re not trying hard enough to fully understand their own ideas. When you think clearly, you write well. Well thought out ideas don’t require AI assistance.

The theme of this video is: small things scale up.

[!warning] CW: Animal experimentation images and discussion
https://www.youtube.com/watch?v=-7RDU-piOVA

[Edited by Mod]
