I tried AI coding - I still don't get it...
NB: I have a considerably longer version of this article which goes more deeply into things. I may post that in the future after a bit more research and time.
I try to be a pragmatic developer. I believe there's deep value in practicing your craft so that you can better make decisions and employ tried and proven techniques to solve problems. After all, that's what our work really comes down to: solving problems.
On top of this, I'm a natural sceptic. Things that claim to change or revolutionise any aspect of my life will be met with scrutiny.
That said, I also try to be open minded. Curiosity and learning are wonderful things. After all, how do we know what we should be practicing if we don't explore new and shifting landscapes?
AI development
It goes without saying that LLMs and related tools are the hot topic right now. It seems like almost everyone (especially in tech) is talking about, using, or evangelising tools built on these systems.
In the last while, especially with agentic coding, the hype has been reaching fever pitch.
"I one-shotted a mobile app and made $20k this month." "I made a flight-sim in 3 hours." "If you're not using AI and doing XYZ you're NGMI." "I'm doing a whole team's work with agents in hours, not weeks."
The hype surrounding all this feels to me the same as the hype that surrounded NFTs, and there is so much of it.
But, if everyone is talking about this, if it's revolutionising their teams and their lives, it would behove me to investigate. Maybe this is as good as everyone is claiming.
It's been some time since I last had a proper look at the tools that are out there, and the landscape has changed a lot in a very short period of time.
Last week I set some time aside to build a small project where most of the work was done by LLM-powered tools.
The project
I chose to assess Junie (PhpStorm) and Claude CLI.
I've been on the hunt for an alternative to Postman for some time. A simple app running HTTP requests and displaying results would make a good test project. If the tools are as good as people say they are, this should be no problem.
Requirements/setup in brief:
- A fresh Laravel + React starter kit to build on
- A simple UI:
  - Panel on the left listing servers/APIs
    - Users can add/remove servers/APIs
  - Panel on the right listing environments for simple variables
  - A main panel in the middle to configure and send requests to the selected server/API
- Filesystem storage of config/servers/etc (a sketch of the sort of structure I mean follows this list)
  - I want people to be able to use their own file sync processes with their own teams
- No tabs or nested/layered UIs
- A comprehensive and detailed set of instructions: REQUIREMENTS.md, STANDARDS.md, etc. These defined rules for React, Laravel, and other patterns of the application
- Several sessions throughout the week for each agent
- The end result should be a well structured project which other devs could pick up and maintain, bug solve, and expand on easily
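To give a concrete sense of the filesystem storage requirement, here's a minimal sketch of the kind of on-disk structure I had in mind. To be clear, none of this is from the actual project: the paths, names, and fields below are hypothetical, purely to illustrate plain files that a team's own sync process (git or otherwise) could carry.

```typescript
// Hypothetical shapes only - one plain JSON file per server/API and per
// environment, e.g.:
//   ~/.http-client/servers/billing-api.json
//   ~/.http-client/environments/staging.json
// Plain files stay diffable and mergeable, so teams can sync them however
// they like instead of relying on an app-managed database.

interface ServerDefinition {
  name: string;                    // display name in the left-hand panel
  baseUrl: string;                 // e.g. "https://billing.staging.example.com"
  headers: Record<string, string>; // default headers sent with every request
  environment?: string;            // optional link to an environment by name
}

interface EnvironmentDefinition {
  name: string;                      // listed in the right-hand panel
  variables: Record<string, string>; // simple variables, e.g. {{token}}
}

const example: ServerDefinition = {
  name: "Billing API",
  baseUrl: "https://billing.staging.example.com",
  headers: { Accept: "application/json" },
  environment: "staging",
};
```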
Post-project result and observations
Project result
Neither agent completed the requirements in a way that I found to be satisfactory.
The end result from each wasn't functional in the ways that mattered - the applications were "shallow".
To be clear about that, I couldn't really run HTTP requests from either of them without some issue or failure tripping the application up, nor did important basic CRUD operations work properly.
Both agents completely missed requirements and guidelines. Both confidently got things wrong, and even with detailed instructions targeting specific issues, both failed to correct particular mistakes after several iterations.
First observation - friction and stewardship over engineering/development
Both agents needed a lot of constant guidance.
Both were given small tasks, large tasks, tasks with new work, and tasks based on old work. Both ignored a number of the requirements defined in the specs. Both sounded confident in answering instructions; however, their work missed the mark almost every time.
Going over tasks or issues again and again was frustrating. Copilot, in its early days, got things closer to the mark based on surrounding context alone, without specs and guidance.
This process also took away one of the things I enjoy the most: actually doing development and knowledge work, thinking through and solving architectural designs and specifics.
Second observation - an unfamiliar code base, and almost no personal retention of project internals
During these sessions I reviewed all the work. I wanted to be familiar with the project and make sure it was heading in the right direction.
What I found was that even though I checked all of the agents' work and reviewed all the diffs, I retained less. I didn't care about it; I didn't know it.
That's bad news from a project sustainability point of view.
NB: this has some similarity to reviewing a human colleague's work, though the difference in this scenario is that I'm the "owner" of the work. I should know it.
Third observation - dependency
This is where things get particularly alarming for me.
Even though Junie and Claude were not producing results I was happy with, I noticed I was progressively reaching for them more for guidance/ideas/solutions/investigation.
From a high-level, informational perspective I think these tools can be great - they can certainly get things rolling more quickly.
Once they become something habitually reached for, however, a lot more caution is required.
The more we outsource our skills and critical thinking to LLMs or similar, the less sustained and nourished those skills, and our minds, will become.
And this isn't just about development either. Think of the chat interfaces we interact with, and the affirming, placating language these tools use: people are outsourcing more and more of their thinking, and even feeling, to these tools.
But that's another topic in and of itself, for another day...
Conclusion
My scepticism remains intact, and my stance hasn't changed. LLM/AI-powered tools are indeed amazing and can be very useful, but I'm not convinced they're all that people are saying they are at this point in time.
Two lines of thought after this experiment.
Thought one:
A strong feeling of "I don't get it - I must be doing something wrong..."
The story is that people's worlds are being fundamentally and positively changed by these tools - they're so empowered they can take on much more work, or reduce their workforce, or make bucketloads of cash almost overnight with apps they're building.
My experiences don't make it clear to me how that can be the case in any sustainable and meaningful way. It took me approximately the same time, or more, to get a worse result that ultimately I'll rm -rf.
Thought two:
I'm a curious person. I thrive when I learn, and it's rewarding to put something new into practice.
Outsourcing the very things that make development something I love, along with my critical thinking, to external systems which don't actually think? I'm very wary of this.
A final "PS thought"
People who know me might say I'm "anti-AI". The truth is more nuanced. I'm not opposed to it.
I believe there are a lot of ethical concerns and bad actors, and that things are happening far too quickly for us to assess ongoing impact, but I'm not "anti".
And when it comes to writing code, I'm a huge fan of IntelliJ's Full Line Completion and similar implementations. I find these sorts of tools much more adept, and they improve my efficiency in tangible ways.