Steven Webber

Software Engineer

October 6, 2025

Prompts, pitfalls, and progress: What I learned building an app with AI agents

I recently set out to test a hypothesis: Could I build a reasonably complex application using only AI agents and behaviour-driven development (BDD), without writing a single line of code myself?

The goal wasn’t just about automation; it was about understanding how effective and collaborative AI development environments really are.

I chose to test this hypothesis by building an API-based game that allows users to control robots in a battle. You can also access the Git repo I created. However, this blog isn’t about what I built, but rather what I learned along the way.

The setup

My hypothesis

I can build a complete application, including backend APIs, a React frontend, and CI/CD pipelines, using AI agents by:

  • Driving development via BDD feature files
  • Not writing any code myself
  • Focusing on outcomes, not implementation
  • Reviewing and refining what the agent produces
  • Enforcing quality through linting and tests

The tech stack

  • Backend: Java, Quarkus (native builds), Cucumber, deployed to AWS EC2
  • Frontend: React, TypeScript, BDD with Cucumber, deployed to AWS S3
  • CI/CD: GitHub Actions with linting, testing, and deployment

The agents

I started with Junie, moved to Warp, and occasionally used Amazon Q. Each brought its own strengths and limitations. Warp’s “agentic” workflow stood out for its collaborative, review-first approach, while Junie was capable and nicely integrated into the IDE. I ended up preferring the terminal-based workflow offered by Warp and Amazon Q. Amazon Q was functional, but it struggled with prompt context size and lacked fine-grained control over changes.

I used the agents all day, every day, for just over a week. One thing to note when working with agents this intensively is that the standard paid subscriptions aren’t enough. You need the higher-tier offerings, or you will run out of credits/tokens/requests very quickly.

What I learned

1. You’re not out of the loop unless you choose to be

While it’s possible to let an AI do everything, I learned that active involvement is key. I noticed that the agents can:

  • Misinterpret requirements
  • Undo working code
  • Introduce side effects or regressions
  • Miss subtle design intent

The most successful work came when I reviewed each change, refined prompts, and treated the process like pair programming, not delegation.

2. Guidelines matter, but agents may ignore them

Even though I set up a .junie/guidelines.md file with explicit instructions (like always updating prompts.txt), the agents often ignored the instructions or followed them inconsistently. I had to:

  • Reiterate key rules
  • Evolve the guidelines structure over time
  • Build “check your work” steps into each task

The lesson here is not just to write the guidelines, but to monitor and enforce them.
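
To give a flavour of what I mean, here’s an illustrative sketch of the kind of rules the guidelines file contained (paraphrased, not the actual contents of the repo’s .junie/guidelines.md):

  • After every change, append the prompt you acted on to prompts.txt.
  • Drive every new feature from a Cucumber feature file before touching implementation code.
  • Run the linter and the full test suite before declaring a task complete.
  • Never remove or weaken an existing passing test to make a new change pass.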

3. Context is fragile

Something I noticed was that agents struggle as project complexity grows:

  • Features introduced early conflict with new ones
  • Changing a test can break others unexpectedly
  • Large prompts often lead to loops or partial implementations

The longer the session, the harder it became to maintain context. Saving and restoring context (either manually or via prompts files) became critical.

4. BDD is a great fit, but not a silver bullet

Using BDD helped me frame outcomes clearly. It allowed the agent to:

  • Understand expectations
  • Validate implementation with real tests
  • Keep development focused on behaviour

But I also saw:

  • Tests that passed but didn’t reflect the intended logic
  • New APIs added that allowed “cheating” in tests (e.g., placing robots directly)
  • Test flakiness from randomness (robot/wall placement)

BDD is powerful, but only if the test design is intentional and precise.
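
To make that concrete, the step definitions behind a scenario might look something like the sketch below. This is illustrative only; the class, the step wording and the ArenaTestClient helper are hypothetical rather than taken from the repo, but it shows the shape of a test that exercises the public API instead of “cheating” by placing robots directly.

  import io.cucumber.java.en.Given;
  import io.cucumber.java.en.When;
  import io.cucumber.java.en.Then;
  import static org.junit.jupiter.api.Assertions.assertEquals;

  // Illustrative Cucumber step definitions; all names are hypothetical.
  public class RobotBattleSteps {

      // Hypothetical test client that talks only to the game's public API.
      private final ArenaTestClient arena = new ArenaTestClient();

      @Given("a new battle with a {int} by {int} arena")
      public void aNewBattle(int width, int height) {
          arena.createBattle(width, height);
      }

      @When("a robot named {string} joins the battle")
      public void aRobotJoins(String name) {
          // Register through the same endpoint a player would use,
          // not through a back-door API that places the robot directly.
          arena.registerRobot(name);
      }

      @Then("the battle contains {int} robot(s)")
      public void theBattleContains(int expected) {
          assertEquals(expected, arena.robotCount());
      }
  }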

5. Warp’s workflow was a game-changer

Warp’s approach of showing each proposed change with reasoning was the closest thing to working with a human teammate. It meant I could:

  • Review before applying
  • See why changes were made
  • Refine prompts without restarting

This dramatically improved trust and control, especially compared to agents that “just do stuff” without showing their thinking.

6. Human creativity still has a role

As much as I loved seeing features come to life without writing code, I began to miss creative problem-solving. Prompting all day started to feel more like coordination than development.

Some questions that arose around this time were:

  • Am I just a prompter now?
  • Where’s the joy of architecting a system?
  • Is reviewing AI output enough to call this “development”?

The answer might be yes in some cases. But for me, the balance of human insight and AI generation is where the magic lies.

Final thoughts

In just over a week, I achieved:

  • A working backend and frontend app with real features
  • CI/CD pipelines deploying to AWS (S3, EC2)
  • All features driven by BDD
  • Fully documented APIs (Swagger)
  • README files and developer scripts
  • A demo project with real-time interaction

I couldn’t have achieved this working alone in the timeframe I had. The final app was nowhere near perfect, but this was a powerful collaboration and an interesting experiment.

Advice to others exploring AI agents

  • Set clear guidelines, and make sure they’re enforced.
  • Review everything. Trust, but verify.
  • Be patient, because iterative work pays off.
  • Expect surprises, and build in time for debugging.
  • Use agents as collaborators, not magicians.

More about the app

The app is an API-based game that allows users to control a robot in a battle against other robots. If you’d like to have a go at building a robot and battling it against others, please do. I’d love to hear about it, and about all the bugs you find.
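
If you want a starting point, a minimal Java client could look something like the sketch below. The base URL, endpoint paths and JSON payloads are placeholders I’ve invented for illustration; the real endpoints are in the Swagger documentation in the repo.

  import java.net.URI;
  import java.net.http.HttpClient;
  import java.net.http.HttpRequest;
  import java.net.http.HttpResponse;

  // Illustrative only: host, paths and payloads are placeholders,
  // not the real API - see the Swagger docs in the repo.
  public class RobotClient {

      private static final String BASE_URL = "http://localhost:8080"; // placeholder

      public static void main(String[] args) throws Exception {
          HttpClient client = HttpClient.newHttpClient();

          // Register a robot (hypothetical endpoint and payload).
          HttpRequest register = HttpRequest.newBuilder()
                  .uri(URI.create(BASE_URL + "/robots"))
                  .header("Content-Type", "application/json")
                  .POST(HttpRequest.BodyPublishers.ofString("{\"name\":\"my-robot\"}"))
                  .build();
          System.out.println(client.send(register, HttpResponse.BodyHandlers.ofString()).body());

          // Send a move command (hypothetical endpoint and payload).
          HttpRequest move = HttpRequest.newBuilder()
                  .uri(URI.create(BASE_URL + "/robots/my-robot/move"))
                  .header("Content-Type", "application/json")
                  .POST(HttpRequest.BodyPublishers.ofString("{\"direction\":\"NORTH\"}"))
                  .build();
          System.out.println(client.send(move, HttpResponse.BodyHandlers.ofString()).body());
      }
  }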

The Git repo with all three projects (backend, frontend and robo-demo) is available. It includes all the prompts I used; the prompts file is the only file I updated manually, as the agents didn’t always update it, or didn’t update it correctly.

About the author

Steven Webber is an experienced Java developer, Technical Lead, and Software Architect. He drives engineering excellence by implementing DevOps concepts and cloud solutions while mentoring development teams to improve code quality and delivery. He is passionate about advancing software engineering practices and fostering continuous improvement across technical organisations.

Disclaimer

This blog is a record of our experiments and experiences with AI. It reflects what we tried, learned, and observed, but does not represent Equal Experts’ official practices or methodologies. The approaches described here may not suit every context, and your results may vary depending on your goals, data, and circumstances.
