Agentic Coding - AI as a software developer
AI (more precisely, LLMs) has been developing very rapidly lately, and it is hard to avoid in everyday life - it has been integrated into Google search, our email, browsers, phones, and social media, and we run into it whether we like it or not. For some time now it has also been said that it will replace programmers, and the term vibe coding has appeared, referring to developers providing only the vibe while the AI writes the code.
I have been using these tools since the beginning (I got the Copilot beta quite early on), mainly as a smarter code-completion aid; recently I have also occasionally asked about topics I rarely deal with and do not remember in detail, but I had never generated complete programs or larger code snippets. Until around the beginning of this year I considered vibe coding mostly a game, and most of the stories I came across on social media were funny or curious rather than serious - for example, someone with no programming knowledge created a complete service with AI assistance, just by "vibing", and was already making money with it; a few days later they reported that the service had been hacked and, since they had clients, asked where they could possibly get help, …
Then around the beginning of this year, some professionals whose opinions I value (e.g. Armin Ronacher) also spoke positively about the topic, and credible stories appeared on Reddit as well, so at the end of spring I decided to run an experiment and vibe code a simpler web application: I would leave everything possible to AI agents and touch the code only if absolutely necessary.
TL;DR: I was skeptical at first, but now I have a new friend, Claude, who develops my live-tracker project in the evenings while we tinker in the garden or during long boring meetings.
The project⌗
I wanted to create a meaningful project, not just another TODO app. Since I often run in the wilderness, sometimes alone over longer distances, I like it when my friends know where I am. Until now I have been sending Garmin LiveTrack links, but that is a bit cumbersome, and the link stops working as soon as the run is over. So I thought it would be nice to have a live tracker of my own that could be expanded later, e.g. with team tracking - useful for anyone who wants to follow a whole team during long relays (e.g. UB).
The MVP was to be a simple single-user website that could track me, but we went much further than that:
- multiple users (but no registration), profile editing
- real-time location sharing
- naming of livetrack sessions, ability to review them later
- map display of users’ last public locations on the main page
- mobile-friendly interface
- dark mode
To be able to track my location, I extended one of my Garmin data fields with this capability. I tried to get AI help for this as well, but at the time I was only using Gemini, which did not handle the - admittedly exotic and rarely used - Garmin Monkey C language well, so I wrote it myself. With the experience gained since then, I will try again, as I will need a more general version anyway.
(available here, but still quite experimental and only works on more expensive watches)
The tech stack⌗
According to the discussions I read on social media, it is better to choose a simpler (more beginner-friendly) language and start with as few external dependencies as possible. Python and plain JavaScript were good candidates, as was Go, which is relatively simple but still properly typed, so I chose Go for the backend. I also did not want to start from scratch, so I picked PocketBase: a simple backend solution that offers a REST API out of the box and has a pretty good admin interface.
For the frontend I chose plain HTML/CSS/JS (with web components). Although TypeScript would be the obvious choice if I were writing the code myself, I had read that complex types can confuse AIs, so I stuck with plain JavaScript at the start, which also let me begin without any build system.
Backend: PocketBase (Go)
Frontend: HTML/CSS/JS (web components)
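To show why PocketBase appealed to me as a low-dependency starting point: embedding it in a Go program takes only a few lines. Below is a sketch of a minimal entrypoint - it depends on the external `github.com/pocketbase/pocketbase` module (so it is not runnable standalone), and the `/api/health` route is a hypothetical illustration of adding a custom endpoint next to the built-in REST API, not this project's actual code:

```go
package main

import (
	"log"
	"net/http"

	"github.com/pocketbase/pocketbase"
	"github.com/pocketbase/pocketbase/core"
)

func main() {
	app := pocketbase.New()

	// Hypothetical custom endpoint alongside PocketBase's generated REST API.
	app.OnServe().BindFunc(func(se *core.ServeEvent) error {
		se.Router.GET("/api/health", func(e *core.RequestEvent) error {
			return e.String(http.StatusOK, "ok")
		})
		return se.Next()
	})

	// Start() parses CLI args; `./app serve` launches the API and admin UI.
	if err := app.Start(); err != nil {
		log.Fatal(err)
	}
}
```

Running the resulting binary with `serve` starts both the REST API and the admin interface, which is roughly all the backend scaffolding the experiment needed.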
Agentic coding experiment⌗
On the command line: since I have been using the terminal for everything for 25+ years, it was natural to choose a terminal-based solution - despite my having had a GitHub Copilot subscription for years, it currently only offers IDE integrations. When I started thinking about the experiment, everyone was migrating from Cursor to Claude, and I was about to subscribe to the Pro plan, which includes the Claude Code terminal agent, when Google made the Gemini CLI agent freely available, so I started with that instead.
First experiences⌗
On June 25, Gemini CLI was announced, and by the evening, the MVP was ready.
Okay, the task was not super complicated, but I had almost no experience in using AI agents in this way, and what it produced was quite acceptable, especially for a tech demo.
✅ quick demos for proof-of-concept purposes
Let’s move on⌗
The goal was to push the boundaries of the technology, so I could not stop here; the next step was to support multiple users.
And here came the first problem. Gemini's plan looked very clever, but problems arose during implementation. PocketBase is not a very widely used solution (though it has over 50k stars on GitHub), so the model was probably trained on few related code examples; despite reading the documentation, it started hallucinating non-existent methods, and by repeatedly deleting and re-adding them it got into a loop, so I had to help. I was still too enthusiastic to be patient, so I just wrote the two problematic lines myself and we moved on - but that was the first and last time I touched the code.
❌ with less known technologies, the model may hallucinate and get into loops
During the Garmin data field extension it started writing Java-like code, which does not work there, so I did not insist and wrote it myself instead.
Docker⌗
Everything else on my NAS runs in Docker, so the next step was to dockerize it.
And here another advantage of AI agents revealed itself: I write a Dockerfile from scratch about once every six months and always forget the syntax, so I usually start by searching and reading documentation. This time I skipped that: Gemini wrote the Dockerfile in no time, built the image, tested it, asked me to log in (`docker login`), and pushed it to Docker Hub.
✅ quickly helps with rarely used but otherwise simple, well-documented tasks
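For a Go service like this, the generated Dockerfile tends to follow the standard multi-stage pattern. A hedged sketch - the binary name, paths, and base image versions are my illustrative assumptions, not the project's actual file (PocketBase's default port really is 8090):

```dockerfile
# Build stage: compile a static Go binary.
FROM golang:1.22-alpine AS build
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /livetrack .

# Runtime stage: small image with only the binary.
FROM alpine:3.20
WORKDIR /app
COPY --from=build /livetrack .
# PocketBase listens on 8090 by default.
EXPOSE 8090
CMD ["./livetrack", "serve", "--http=0.0.0.0:8090"]
```

The multi-stage split is exactly the kind of boilerplate that is easy to forget and trivial for an agent to reproduce correctly.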
GPX upload mini tool⌗
To test the functionality, I needed a way to upload larger routes, so I created a small tool called gpxup that uploads a GPX file via the API.
✅ quickly creates small tools and scripts
Claude⌗
Since I read everywhere that no matter how you prompt other agents, they will not match Claude's results, I decided to invest in a Pro subscription and try it out.
The common advice is that although the annual subscription is cheaper, things are changing so fast that you should subscribe monthly; if necessary, you can cancel at any time and switch to whatever is currently the best solution.
I have to say that in my workflow, Claude is in a completely different league from anything else I have tried so far (up to the writing of this post). Currently, if you want to use AI for programming, Claude Code is the tool. (The Pro subscription only includes the Sonnet-4 model, not the more advanced Opus, but even so I felt a big difference compared to other solutions.)
The features came pouring out: login, profile page, session page, dark mode, …
MCP servers⌗
MCP (Model Context Protocol) is a new standard that allows AI models to use external services (databases, programs, …). It is quite simple (maybe too simple - time will tell), and almost everything supports it. For example, with its help agents can use a real browser to check and debug the web pages they create (playwright-mcp).
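For reference, one common way to wire such a server into Claude Code is a project-level `.mcp.json`; the entry below follows the invocation documented in the playwright-mcp README, but treat the exact package name and flags as something to verify against the current release:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```

With this in place, the agent can drive a real browser session - take screenshots, read the DOM, click around - instead of guessing at what its generated page looks like.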
When Claude placed the user avatar at the end of the route, I did not like the shape of the marker - despite telling it I wanted an upside-down teardrop, it used some lame oval (although falling drops do look more like that, but whatever). I was about to touch the code myself when I remembered I could try Playwright MCP, and with its help we got past the problem without human intervention.
Rough refactor⌗
At this point I had a usable web application whose backend was essentially a single main.go, and whose frontend consisted of a few HTML and JS files. In the long run this is not maintainable - especially if I later have to touch it myself - so I decided to ask Claude to refactor the code.
Since I suspected this would be a big chunk, I first had Claude (Sonnet-4) create a detailed plan, which I reviewed with Gemini (2.5 Pro) and Copilot (GPT-5) - they also added some comments. Claude then broke the plan down into smaller tasks, and I asked it to stop after each major step and wait for my review. This method worked very well: it updated the plan documents (backend plan, frontend plan) after each step so it was always clear where we stood, and we could continue where we left off the next day - or whenever I ran into the 5-hour limit.
The backend refactor took 3 days, the frontend 2 days (not full days, 3-4 hours of active work per day), I reviewed everything it did after each step, asked for modifications where needed, ran tests with it, and only moved on when everything was fine.
The two PRs became huge, but the more enthusiastic can review them if they want:
- backend refactor PR ~15k+ lines
- frontend refactor PR ~28k+ lines
(The backend PR check failed because GitGuardian found the test account password created for Claude among the commits. :D It only works in my development environment, so I did not start rewriting the git history.)
✅ serious refactoring with strict supervision and testing
For now, the project stands here; I thought I would write this blog post and take a little break, then see where to go next. It may soon reach a size that Claude can barely handle within its 200k-token context window, but there is no sign of that yet.
Tools, models⌗
As I wrote above, which model or tool is best depends on how you use it and what task you want to solve - and by the time I publish this article, ten new ones will probably have come out.
Models⌗
New models appear daily, and their capabilities, prices, and availability vary widely - you can browse them here.
Some can only be used through their own interface; some you can run yourself (if you have the right hardware). They can also be evaluated on many aspects: how good they are at planning, how well they generate code, whether they can use tools (testing, debugging), how well they follow instructions, how well they handle longer contexts, and of course how much they cost. A model that excels in one aspect may fail in another - it really depends on what you use it for and how.
My experiences:
Gemini 2.5 pro and flash (Google)⌗
I used them for free through Gemini CLI; the pro daily limit runs out after a while, but it automatically falls back to flash. Perfect for getting acquainted and not bad for simpler programming tasks, but I managed to get it into a loop, and it also hallucinated. The CLI has the advantage that if I see it going astray, I can quickly stop it and steer it back on track.
GPT-4.1 and GPT-5 (OpenAI)⌗
I tried them through GitHub Copilot (GPT-5 is quite recent - it appeared during the project) and used them in the terminal via opencode. They may be familiar to many from the ChatGPT interface, but my experience with them was mixed; GPT-5 in particular tended to wander off and do things nobody asked for. This was less noticeable with GPT-4.1, but I did not experiment much with either, because Claude seemed much better.
Claude Sonnet-4 (Anthropic)⌗
The best I have tried; my experiences are described above. It performed very well in planning, coding, and refactoring, and when it encountered something it could not solve on the spot - or felt would require more effort than my request implied - it said so and suggested alternative solutions. I did not experience this with any other model.
During the refactor I once ran out of Claude's limit, and I thought I would have the new GPT-5 write unit tests for the completed part. It almost succeeded, but properly testing two functions would have required PocketBase, and mocking the whole thing would have been pointless - so it "solved" the problem by deleting things from the implementation and keeping only the testable parts. :D
Claude, on the other hand, simply stated that it would not write unit tests for those two functions, but that it would be worth writing integration tests for them later.
Agents⌗
There are also many types of agents, everyone knows the websites where you can chat with LLMs, but there are also solutions specifically designed for programming:
- command-line (CLI/TUI) tools
- those provided by the model vendors themselves (Gemini CLI, Claude Code, OpenAI Codex, …)
- independent tools that resell model access (Cursor CLI)
- completely independent (opencode, crush, …)
- IDE extensions (Github Copilot, Amazon CodeWhisperer, Tabnine, …)
- complete IDEs (Cursor, Replit, …)
- Copilot on GitHub (generating PRs, code review, handling issues, …)
- …
My workflow⌗
First of all, before doing anything, I create a git/jj repo, and I commit and push after every step. Agents can use git too, but a single careless move can destroy the entire repository, and if they also have push rights, a small mistake can mean everything is lost.
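The habit above can be sketched as a shell session - paths, file contents, and commit messages are illustrative; the point is one commit per reviewed agent step, so any bad edit is one revert away:

```shell
set -e
cd "$(mktemp -d)"                      # stand-in for the real project directory
git init -q livetrack && cd livetrack

echo 'package main' > main.go          # pretend this is the agent's first step
git add -A
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "step 1: agent-generated MVP scaffold (reviewed)"

echo '// reviewed follow-up' >> main.go  # ...the next reviewed step
git add -A
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "step 2: refactor step (reviewed)"

git rev-list --count HEAD              # one commit per reviewed step
```

Pushing goes to a remote the agent has no credentials for, so even a destructive local mistake cannot take the history with it.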
Most tools let you configure separate planning and programming agents (they are usually there by default, and you can create others as well), and these agents can even use different models - e.g. in the Claude Max subscription the planning agent uses the Opus model while the programming agent uses Sonnet-4; supposedly this is the most efficient way to work.
I usually start in plan mode and try to formulate the request as if I were explaining a ticket at work - so that any colleague could start on it without questions. I ask the agent to create a detailed plan of how it would solve the problem. If necessary, I refine the plan further (or have it refined), and I only start the implementation phase once I myself understand what it is going to do.
I do not (yet?) trust it enough to accept its work without review, so after each major step I review everything it did, ask for modifications where needed, run the tests with it, and only move on when everything is fine.
Summary⌗
I was skeptical when I started the experiment, but I was positively surprised.
LLM-based AIs will not become AGI, and they may soon reach the limits of the technology, but their current capabilities are good enough to make programmers more efficient and help them progress faster in their work. Part of coding (e.g. boilerplate code, simpler functions) can be completely automated, and developers can focus more on planning and solving more complex problems, in which a smarter agent can also provide assistance.
Will they take away programmers' jobs? I don't think so: software development is a complex task in which coding is only part of the work. They need supervision, and we do not yet know their real costs - we are probably still in the investment phase, and every AI company is unprofitable.
Can they replace a junior programmer? At this moment, yes - but while a junior has the potential to become a senior, an AI agent will never be smarter than what it was trained to be, and it cannot develop or gain experience on its own.
Can they replace a programmer who implements tickets almost without thinking (and without asking questions)? Yes, but I don't think such programmers add much value anyway. Of course, they too will find (or have already found) their place: eagerly copying the ticket text into a ChatGPT browser window and pasting the generated code back without any idea of what they are writing. Unfortunately, I see this happening daily.
One thing is certain: LLMs can be great tools in the hands of software developers, and even if they do not develop further in this direction, they are here to stay. What I expect next are smaller, more specialized models that we can run locally (on our own computers, even on our phones). I am curious where we will be a year from now.
Comments
You can use your Mastodon account to reply to this post: here