Marco Vermeulen

Software Engineer
AI

January 19, 2026

From madness to method with AI coding: Part 5 – Back at the clubhouse

Welcome to the final part in our series on getting the best possible results from AI-assisted coding. If you haven’t already done so, start with Part 1 of this series, which begins with meta-prompting, and follow the course from there.

Like a golfer after 18 holes, it’s now time to head back to the clubhouse, take a breath, and reflect on what we’ve learned. Throughout this series, we’ve been exploring how AI-assisted coding can move from chaotic experimentation to a disciplined, repeatable method. In this final instalment, we’ll reflect on what’s worked, what hasn’t, and how these lessons are shaping Equal Experts’ approach to AI-assisted delivery at scale.

First, we’ll think about the size and scope of prompts:

Downsizing the prompt

We’ve gone through the process of setting up the Driving Shot, which perhaps landed us halfway along the fairway of a long par-6 hole. What do we do now? Do we switch to putting? If we do, we might be putting for a long time! Or do we switch to a different club to get closer to the green before moving on to the putting phase? Should we even attempt such long stretches, or should we stick to safe pitch-and-putt games?

In our recent work at John Lewis & Partners (JL&P) we faced this exact problem. We needed to deliver a seemingly small new feature – our first in the API backend for a Product Incidents application. We had a walking skeleton with a health-check endpoint running, but none of the core building blocks you’d expect in a production-ready service: a core domain, repository layer, database migrations, and other essential components to enable deployment of the app.

Our prompt seemed simple enough, but when we attempted to execute it we received highly inconsistent results between executions. After making numerous changes to the prompt and watching it ignore many of the pointers we gave it, we realised that the LLM was overwhelmed by this seemingly simple task.

In reality, what looked small through a behavioural lens was actually too complex to interpret. The guardrails we’d applied were asking the LLM to design and build a complete core domain following domain-driven design (DDD) principles, in a strict hexagonal architecture. The tech stack was equally complex. We’d specified a repository layer using Spring JPA for database interactions, Flyway for managing database schema migrations, plus service and controller layers. The testing requirements were comprehensive, including acceptance tests to validate critical end-to-end flows, integration tests that ran against real databases in a Docker container, and unit tests that covered the rich domain logic and service behaviours.

That’s a lot to ask, even of an experienced engineer. For an AI model, it was simply too much.

Smaller slices are crucial for better delivery

The breakthrough came when we stopped thinking in full vertical slices and started thinking in smaller, horizontal ones. Instead of asking for a complete feature end-to-end, we broke the task into a series of smaller, well-defined prompts:


First, we generated a domain model using our DDD rule and a legacy specification we had produced earlier. Immediately, we saw the consistency between prompt executions restored. Not only that – the quality of results improved. The result was a rich core domain [1] complete with an aggregate root at the centre, containing all the behaviours you would expect to model the state machine of a product incident throughout its lifecycle.
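To give a flavour of what such an aggregate root can look like, here is a minimal sketch in plain Java rather than our project’s Kotlin. The class, states, and transitions are hypothetical illustrations, not the actual JL&P model:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical aggregate root modelling a product incident's lifecycle.
// The state machine lives inside the aggregate, not in a service layer:
// invalid transitions are rejected at the domain boundary.
class ProductIncident {

    enum Status { OPEN, INVESTIGATING, RESOLVED, CLOSED }

    private final String id;
    private Status status = Status.OPEN;
    private final List<String> notes = new ArrayList<>();

    ProductIncident(String id) {
        this.id = id;
    }

    void startInvestigation(String note) {
        requireStatus(Status.OPEN);
        notes.add(note);
        status = Status.INVESTIGATING;
    }

    void resolve(String resolution) {
        requireStatus(Status.INVESTIGATING);
        notes.add(resolution);
        status = Status.RESOLVED;
    }

    void close() {
        requireStatus(Status.RESOLVED);
        status = Status.CLOSED;
    }

    // Guard clause that makes every lifecycle rule explicit and testable.
    private void requireStatus(Status expected) {
        if (status != expected) {
            throw new IllegalStateException(
                "Cannot transition from " + status + "; expected " + expected);
        }
    }

    Status status() { return status; }
    String id() { return id; }
}
```

Because all transitions go through guarded methods, an invalid move (say, closing an incident that was never resolved) fails fast with a domain error instead of silently corrupting state.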


Every subsequent layer we added surprised us with the quality of implementation the LLM provided. Clearly, we were onto something: our agent excelled at smaller, well-defined tasks as opposed to hugely complex, nebulous ones.

This method of “slicing up” is not new to us; we’ve been working in kanban-style teams for years, applying this technique to maintain a steady flow of delivery. We identify larger features and chunk them into smaller, similarly sized stories. This approach makes complex tasks less overwhelming and accelerates progress.

Once we had the groundwork in place, we could start implementing full vertical slices again. By then, the model had strong infrastructure and components to build on (the LLM tends to perpetuate the patterns it sees in your codebase by following what you already have). So, zooming back out from those horizontal slices, we could now start building entirely new features more rapidly.

AI-assisted coding FAQ – Lessons from the fairway

As we wrap up our From Madness to Method with AI Coding series, a lot of great questions have come in from engineers experimenting with these techniques.

Here are some of the most common, and our thoughts on each:

How complex should my TODOs be? Is anything ever ‘too big for a TODO’?

There’s no strict rule, but the key is to keep the list of corrective actions as short and specific as possible. Your driving prompt should do most of the heavy lifting; the closer you can get with the driving shot, the better – and easier – your final corrective actions will be.

If you notice your TODO list getting long or individual items getting large, it’s a signal that your driving prompt was too ambitious. At that point, consider tweaking and rerunning the driving shot rather than trying to fix everything retrospectively with a lengthy TODO list.

As in golf, it’s better to get close to the hole with your first shot than to rely on a long, complicated journey with a putter.

What counts as an “imperfection”? How many are too many?

Imperfections are the small things your corrective actions can take care of quickly: extracting common logic across classes, adding additional test coverage, or cleaning up a nullable type you’d rather not see (we all have our pet peeves).

If you’re spotting large-scale structural issues with the generated code, that’s a red flag that your initial prompt wasn’t optimal. It’s far more economical to get close to the hole with the driving shot than to apply many corrective actions retrospectively. Just like in golf, you want to get as close to par as possible!

Do you still code manually? Or let the LLM handle everything?

Absolutely, we still code! I’m a programmer first; I’m just selective about which pieces I pick up, using the AI to do what it’s best at: the boring bits. I recently enjoyed reworking some imperative service layer logic into a beautiful piece of functional Kotlin. That’s the fun stuff! Mundane tasks, on the other hand, such as refactoring numerous tests to use a common test helper, I left to the AI.
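As an illustration of the kind of imperative-to-functional rework described above – shown here in Java streams rather than the original Kotlin, with invented names – both methods below do the same job, but the functional version states the intent in one pipeline:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical example: counting incidents by severity, first imperatively,
// then as an equivalent functional pipeline.
class IncidentStats {

    // Imperative version: a mutable accumulator updated inside a loop.
    static Map<String, Long> countBySeverityImperative(List<String> severities) {
        Map<String, Long> counts = new HashMap<>();
        for (String severity : severities) {
            counts.merge(severity, 1L, Long::sum);
        }
        return counts;
    }

    // Functional version: the same behaviour expressed as a single
    // stream pipeline with no mutable state in sight.
    static Map<String, Long> countBySeverity(List<String> severities) {
        return severities.stream()
                .collect(Collectors.groupingBy(s -> s, Collectors.counting()));
    }
}
```

This is exactly the kind of rework I find satisfying to do by hand; the AI’s time is better spent on the repetitive cleanup around it.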

Think about what’s efficient, too. If your IDE can refactor or rename methods in seconds, there’s no point wasting time and money on an LLM process. AI assistance works best when it complements your flow, not when it replaces it.

What model do you typically use when generating prompts and code?

At the time of writing, Claude 4.0 has been our go-to model. It’s a strong step up from 3.7 and performs consistently well for structured code generation and multi-step prompting.

That said, we remain model-agnostic; we test across leading LLMs regularly to find the right fit for each context.

Does this approach remove the need for specialist developers?

Definitely not. This method amplifies expertise, it doesn’t replace it.

If you have expert-level skills and knowledge of a specific technology, AI will make you faster and sharper. If you don’t, it will have the polar opposite effect and could lead to disaster. As with any power tool, it takes someone with experience and skill to get the best results from it. The famous Spider-Man quote applies here: “With great power comes great responsibility”.

A good example: our team of backend developers built a React/Next.js frontend with only minimal guidance from a frontend developer. We had high velocity at first, but as we progressed to non-trivial features we started slowing down. Our lack of deep frontend knowledge meant we didn’t know what ‘good’ looked like, so we couldn’t intelligently challenge the LLM.

In short: AI needs expert gatekeepers. Without them, you may unknowingly let through suboptimal code. Eventually, the overall code quality deteriorates, leaving you with an unmaintainable application.

Should AI-assisted coding be used in solo programming or do you pair/mob?

Both can work.

In our JL&P project, we mostly mob-programmed (because the cognitive load was high), breaking into pairs at times to do parallel streams of work.

But the same approach worked just as well in solo open-source work. The key is to adapt to your context; AI doesn’t change the value of good collaboration.

Will this methodology work with a trunk-based development approach?

It fits just fine. While some of us prefer short-lived feature branches, the process doesn’t depend on your branching model. Prompting, reviewing, and integrating AI-generated code follows the same principles either way: small changes, frequent integration, consistent quality checks.

Do you always use the putting approach with the TODO.md?

Mostly, yes. We’ve found that methodical prompting through a TODO.md file gives clearer, more consistent results than chat-based back-and-forth, which often leads to tangents.
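For context, a putting-phase TODO.md might look something like this – the items below are invented examples in the spirit of our process, not a real list from the project:

```markdown
# TODO.md – corrective actions after the driving shot

- [ ] Extract the duplicated validation logic shared by the service and controller layers
- [ ] Replace the nullable `resolvedAt` field with an explicit lifecycle state check
- [ ] Add an integration test covering the happy path from OPEN to CLOSED
- [ ] Rename `doStuff()` in the repository adapter to describe its actual behaviour
```

Each item is small, specific, and independently verifiable – exactly the shape of task the LLM handles well.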

That said, we’re pragmatic; if it’s faster to fix something directly, we do it. Small changes often aren’t worth prompting for.

Does AI-assisted coding feel less satisfying than traditional coding?

There’s a unique rhythm to prompting, reviewing, and refining AI-generated code. It can feel like having “coding wings” – you’re still in control, just moving faster.

And yes, there’s a dopamine hit every time you execute a prompt and see what the model comes up with. The mix of predictability and surprise keeps it engaging – a bit like a good game of golf.

Final thought

These questions show just how much curiosity and experimentation there is among engineers adopting AI. As we wrap up our series, we don’t claim to have all the answers, but we’ve found that applying good engineering practices and a progressive precision refinement process yields good results when working with AI to write code.

At Equal Experts, we’re continuing to refine our methods for AI-assisted delivery at scale; we’d love to hear what you’re learning too. If you’d like to learn more, or talk to us about how we can support your own work, get in touch.

About the author

Marco is a Principal Consultant at Equal Experts with over 20 years of experience in backend development on the JVM. A specialist in functional programming and distributed systems, he is the co-author of Functional Programming in Kotlin (Manning Publications) and is the creator and maintainer of SDKMAN!, a widely adopted tool for managing parallel versions of Software Development Kits.

At Equal Experts, he focuses on the intersection of disciplined engineering and emerging technology, currently exploring the practical application of Generative AI within the software development lifecycle. His work is defined by a commitment to well-crafted, maintainable code and a pragmatic approach to solving complex technical challenges at scale.

Disclaimer

This blog is a record of our experiments and experiences with AI. It reflects what we tried, learned, and observed, but does not represent Equal Experts’ official practices or methodologies. The approaches described here may not suit every context, and your results may vary depending on your goals, data, and circumstances.


[1] Rich vs anaemic domain model: An anaemic domain model specifies state but no behaviour – for example, a domain made up of POJOs with no behavioural methods. A rich domain model, by contrast, keeps state and the behaviour that governs it together.

You may also like

From madness to method with AI coding: Part 1 – Meta-prompting

From madness to method with AI coding: Part 2 – Guardrails

From madness to method with AI coding: Part 3 – The Driving Shot

From madness to method with AI coding: Part 4 – Corrective Actions
