Genie, Take the Wheel
How I Stopped Micromanaging Code and Learned to Love the Lamp
Over the last two months, I went from zero to fluency with Claude Code’s take on vibe coding. Prior to this, I hadn’t used any AI in programming outside of a couple of years of GitHub Copilot’s auto-complete feature. In a typical (for me) act of hipster defiance of the popular, I had completely skipped the Cursor revolution until after I had already gotten familiar with Claude Code.
This gave me the rare opportunity to experience vibe coding without any existing preconceptions from intermediate evolutionary links like Cursor or Windsurf. Here are a few peculiar observations from the experience.
1. So much breadth..
Vibe coding is like the spice; it expands one’s horizon beyond one’s natural limitations. It gives one new superpowers.
Thanks to Claude Code, I am suddenly a crappy SwiftUI developer. Neat. For a quick experimental app, I don’t need to learn much of anything about how it all works; I can trust the magical code roulette to get it done, and an app with a few views appears in my Xcode.
Thanks to vibe coding I can do basic computer vision with Python. I expected that to be more mystical, but it just works, and I don’t have to wonder how it is implemented unless I want to.
Thanks to the vibe code roulette, I have never written Flask before, but I have a Flask server up and running, doing useful work. The agent recommended Hono for my TypeScript backend, made it safer with Zod, tested it with vitest, and poof, the implementation appeared out of thin air.
I needed a couple of simple Chrome plugins. One-shot, no problem there. The list goes on and on.
This is a huge change, especially for those of us who remember buying books to pick up a new ecosystem, watching hours of tutorials, or spelunking through official docs for that one needle-in-a-haystack detail blocking us. The barrier to entry for getting started with any ecosystem, at least for code with examples available online, has eroded to almost zero. Whatever the model might have seen on GitHub or beyond is now at your fingertips. For quick hacks and small experiments, it’s a big deal.
2. .. at a cost
The breadth of tools that you can now use on a whim is spectacular. At the same time, you’re under no illusion of actually having become an expert in any of them. You’re a nomad doing short stays, not a real local with any depth of understanding. For many use-cases, that’s fine.
What’s not delightful is that as soon as you stop getting what you want out of the interaction with the agent, you are on your own. You have a codebase to maintain that you’ve never looked at. After all, the whole thing was conjured for you by your ambivalent coding genie. It tried its best to please, hit its limits, and now you have to pick up the load. The abstraction balloon burst, and you plummeted back to the previous reality of organic hand-crafted small-batch edits.
Was betting on vibes sensible here? Is this still more productive than the traditional path of needing to build at least some experience with your tools before you start using them? That’s entirely context-dependent; sometimes, yes, sometimes no.
The feeling of working with a vibe coded codebase is that of repeatedly opening a project written by someone else, one that changes every time you look at it. To thrive in this new paradigm, you need to let go of meticulously combing through every line that goes into the project as you edit and code review. That’s not your job anymore as long as things work.
You’re now a stranger in your “own” codebase. You open files as if for the first time. You don’t know where to look, things are named and positioned in ways you yourself wouldn’t have, nor in a way you would have instructed your team to follow. There’s cruft and scruff in there that you didn’t ask for, hitting you all at once. Not too different from opening another maintainer’s repo for the first time.
This project isn’t actually yours anymore. Someone else is building it, they have their own style and approach, their best practices and biases. You can roll up your sleeves and try to micromanage every line, but now you’re back to the dark ages of pre-2024. With a coding agent, you’re facing the perennial tradeoffs of delegation. You’re having to decide how much effort to spend in the weeds rather than on the bigger picture. And it’s clear to me that the value is not in the microscopic details.
3. Code loses value
As bottomless code pours out of the genie, you have several epiphanies in short order:
Code is now throwaway.
Because you invested virtually zero time building it, you have zero loss aversion to throwing it all away and having it rebuilt from scratch. Sure, you’ll burn a few tokens and torch a few minutes, but someone else is doing the bulk of the work.
In the past, writing a quick experiment, a quick script, a small command line tool would have caused hesitation. You weren’t sure it was worth the bet of spending 15, 30, or 60 minutes executing the task. Reading the documentation for a new library, spelunking Stack Overflow for best practices, and having to remember how to create and activate virtual environments for that specific ecosystem - ugh.
Now? As long as you know how to validate that the script is working, you can have it appear out of thin air in under a minute, and as soon as you’re done, you can throw it away. Glue code is effectively free now, so you might as well spam it.
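To make that concrete, here’s the sort of throwaway glue script I mean, a minimal sketch where the column names and sample data are hypothetical; the point is that something like this now costs a prompt rather than an afternoon:

```python
# Throwaway glue: collapse duplicate rows in a CSV by a key column.
# The "id"/"name" columns are made up for illustration.
import csv
import io


def dedupe_by_key(csv_text: str, key: str) -> list[dict]:
    """Keep the first row seen for each distinct value of `key`."""
    seen = set()
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[key] not in seen:
            seen.add(row[key])
            rows.append(row)
    return rows


if __name__ == "__main__":
    sample = "id,name\n1,a\n1,b\n2,c\n"
    print(dedupe_by_key(sample, "id"))
```

Once it has served its purpose, deleting it costs nothing; regenerating a variant costs a sentence.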
Over time the LLM architectures and techniques will get better, context windows will grow, the attention mechanism will be perfected, and overall model effectiveness will improve. A 15-minute vibe-coded script will be replaced by a no-brainer 60-minute script, which will be replaced by a no-brainer half-day task, and so on.
It’s possible the progress will not be linear, but will instead hit an asymptote. But people’s predictions about this have been wrong before, so maybe the models will continue to get better at the existing pace. The incentives for capital are all aligned around continuing to improve the tech and gain more leverage.
4. Just-in-time software
We’re not there yet, but as the quality of intelligence-on-tap goes up and costs come down, there’s no reason more software won’t be generated entirely on the fly. It will start small. Naive glue code and simple scripts are already a mostly-solved problem, and the circle of “one- or two-shottable” software will only grow.
Want to put together a quick soft synth for your website? Need a physics simulation, visually rendered by the GPU? Fancy turning a PDF intake form into a whole data entry workflow with validation and user guidance? No problem, given an accurate enough problem description—and that’s a big given!—you will likely get what you want. With more specialized libraries created and uploaded to GitHub every day, stacking legos towards a solution only gets more viable.
Maybe not on the first iteration, but you wouldn’t get that from a human developer either: they don’t know what’s in your head, after all; you have to share that vision, explain the constraints, and make sure they fully grokked it. And you probably suck at writing the perfect all-encompassing spec. It’s tedious and laborious, and you’d rather have a conversation where you’re polled for details and you discover the constraints together. LLMs will not develop telepathy, and most often you yourself haven’t thought through every detail until you’ve made some progress and can play with the intermediate implementation steps.
We’re not quite there yet, but it’s only a matter of time. The big question is whether we’ll ever get to the point where a non-expert can get complex software out of these tools without even knowing what to ask for. Possibly not, at least not for a while. But one day the coding agent might instantly put a dozen options in front of you; you select the one closest to what you vaguely had in mind and iterate from there. It’s akin to asking for concept art from a skilled artist on your team without having the expertise or language to explain what exactly you want. You can often point at the option that feels most right, but you can’t describe it well in advance.
You end up with a much more creative and experimental workflow than what we’re used to today, but one that might actually win in the end, at least for non-critical applications.
Some compare coding agents to the 3D printer “revolution”. Everybody is now able to print physical components for whatever it is that they’re making. But, do they in practice? The revolution never actually happened: 3D printing is still restricted to a niche of enthusiasts, and never gained mass adoption. Most just want to buy finished products, and even if they just need a component, they’d rather get it on Amazon than bother with learning a whole new workflow they might be bad at for some time. It’s likely just-in-time software will follow the same path.
5. Moving up the value chain
With all of this abundance, thanks to your newly-hired hard-working, tireless junior to mid-stage digital SWE genie, you need to move up the value chain. You are now the PM. You’re now the tech lead. You’re now the architect, and your job is to operate at a higher level of abstraction. You’re the UX person. You’re the visionary, the taste maker, the customer representative.
The dream is that eventually you won’t have to look at the code at all because you trust that the implementation of your design is sufficiently well-executed at the micro level. Coding agents are able to take on more and more of the mechanical tasks, and the human is left with matters of feeling, taste, vibe, things that the computer can’t judge, yet.
Of course we’re not there yet, but that’s where everything’s headed. Already today you get to experience the value of being higher up the chain when, instead of spending time thinking about the implementation details of your tasks, you write specs that describe what you’re hoping to achieve and define the constraints for the agent. You then discuss the plan with the agent, reach an agreement, oversee the work, provide feedback, test it, etc. As time goes on the human work will only get more abstract.
There’s a parallel here with what happened with compilers. When was the last time any of us had to look at the x86 instructions emitted for a module we wrote? Sure, you will pry open the black box once in a blue moon for high-performance applications, especially if you are a performance specialist, or if you’re trying to debug a strange and particularly nasty bug in some lower-level representation of your logic. But for the vast majority of software work, that is not necessary. We trust the compiler to do the good-enough thing, the same way we trust the microarchitecture of the machine below to run those instructions with reasonable speculative execution. No need to scrutinize it.
Again, there’s a whole world of specialists out there worrying about these problems. The magic of abstraction allows us to be oblivious to their work for regular application-level tasks. For most of us, the black box you get from the compiler is “good enough”.
I expect that soon entire code modules will become uninteresting for programmers to look at. All you need to know are the interfaces the module provides, everything else can be safely abstracted away. The internals are now the realm of automation, humans not needed, the same way poking at assembly is unnecessary in routine software development. It’s purely functional programming at the layer of modules, assuming no shared mutable global state.
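A toy sketch of what “purely functional at the layer of modules” could look like in practice; all the names here are hypothetical. Callers bind only to the interface, so the agent is free to regenerate the internals wholesale as long as the contract holds:

```python
# The interface is the contract: callers depend on it, not the internals.
from typing import Protocol


class Tokenizer(Protocol):
    def tokenize(self, text: str) -> list[str]: ...


class WhitespaceTokenizer:
    # Internal detail: this could be swapped for a regex-based
    # implementation tomorrow without any caller noticing.
    def tokenize(self, text: str) -> list[str]:
        return text.split()


def word_count(tok: Tokenizer, text: str) -> int:
    # Depends only on the Tokenizer interface; no shared mutable state.
    return len(tok.tokenize(text))
```

With no state leaking across the boundary, the module really does behave like a pure function of its inputs, which is exactly what makes its internals safe to hand over to automation.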
Large, tightly coupled modules that mutate shared state among themselves are going to cause trouble for coding agents and human SWEs alike. They have always caused problems; nothing new here. Both software and wetware struggle to think through all the permutations of state the system can get itself into, and all the branches that take it there.
6. Closing the loop
Andrej Karpathy points out in his AI Startup School talk the importance of letting the agent do its work without a human in the loop. I couldn’t agree more; the difference is night and day. You must do whatever you can to enable the agent to validate that it’s achieving its task without your help, maximizing its autonomy and reducing touch-points with humans.
It’s likely we’ll see a resurgence of TDD/BDD-style testing, given that humans no longer need to be hassled into writing those. Getting adoption of a test-first workflow has always been a pain, but your digital SWE has relentless enthusiasm and will happily oblige. And setting technical constraints has never been more important than when providing guardrails for an unpredictable and sometimes volatile genie.
With a comprehensive set of new tests, reference mock-ups, regression tests and possibly an eval suite, the agent can continue iterating on the implementation until those success conditions are met. Don’t bother humans until you’re done, go away, clanker, we’re paying for your tokens to work, not distract us.
Command-line tools, or anything that can have a straightforward test written for it, are a natural fit for this: it is straightforward for Claude to both write these tests and run them as it’s developing the application.
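As a sketch of that shape, here’s a tiny hypothetical command-line utility and the kind of check an agent can write first and then iterate against until it passes:

```python
# A tiny CLI the agent can validate end-to-end: it reads integers as
# arguments and prints their sum. The names and behavior are invented
# for illustration; what matters is that the success condition is
# mechanically checkable without a human in the loop.
import sys


def total(args: list[str]) -> int:
    """Sum the integer arguments; the testable core of the tool."""
    return sum(int(a) for a in args)


def main(argv: list[str]) -> int:
    print(total(argv[1:]))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

The agent can write `total`’s tests before the implementation exists, run them on every iteration, and only surface the result once they pass.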
For web apps, it is essential that the agent is able to open the application in a browser and look at it, to make sure it achieves the UI/UX you asked it to implement, or that the page matches the aesthetic you were going for. There’s no way around giving the agent Puppeteer or Playwright MCP server access so it can open the page and inspect it through screenshots.
For mobile apps, it’s once again about being able to open the app and take a look, rather than having to guess correctness on implementation alone, or having to ask the human for an approval at intermediate steps. This is harder, but it looks like the loop is being closed there as well.
Gamedev is lagging behind, although people are making progress there as well. I suspect it will trail behind for some time due to the complex nature of game engine editors and interaction with complex multimedia assets of many different kinds (audio, 3D meshes, animations, levels, VFX, etc.). Testing mechanical scenarios will be feasible, but validating “feel”, “responsiveness”, and emotional impact will be outside of the realm of bots for a while longer.
Ultimately reducing touchpoints with the AI’s workflow is just another variant of workflow optimization you already have to do across individual contributors and teams. The more synchronization rules you create in a process, reducing the team’s ability to ship independently, the less efficient the process.
7. Need for speed
There’s a lot to be in awe of already, in spite of the predictable human tendency to take magical advancements for granted. The truth, however, is that there’s still much work to be done before these tools reach their full potential.
Model speed is one of those major unlocks.
Right now, even the simplest command run by Claude Code takes 5 seconds or longer. That doesn’t sound like a lot, but you feel it. There’s a reason why it takes that long: every time you ask the tool to do something, behind the scenes Claude Code executes a sequence of steps. It needs to figure out your intent, what you’re asking it to do. It needs to break down the request into manageable chunks. It needs to execute said chunks with the correct tools for the job. It needs to check that the change actually achieved what it set out to do.
Every one of those steps introduces latency between the request and the actual operation being executed. By itself, not a big deal, but add it up over a full day, and it’s significant. But this sum isn’t actually the major culprit.
The real offender is that the delays are long enough to tempt you to context switch to something else, like opening your browser or checking your Slack or email, instead of staring into the void of your terminal waiting for the result. Now you’re context switching between several parallel activities, only half-paying attention to any of them. You are constantly pulled out of the zone as you wait and get distracted, or get bored of waiting for something to finally happen.
I suspect that if we got rid of the delays, if we increased the speed of token generation to 5-10x the current speed, the experience would be more fluid and stream-of-consciousness-like. The bottleneck would be the jockey’s ability to ask for changes and review the proposals emitted by the genie. My hunch is that we don’t yet understand just how different that boost in speed will feel: it will not be just faster, it will be a 0-to-1 towards an entirely different experience, changes at the speed of thought.
8. Ugh, you want me to do this by hand?
Speaking of quick adaptation, I’m still amused that it took me only a few hours with Claude Code to feel hassled by having to write code manually again. The human brain has a seemingly limitless capacity for taking new conveniences for granted, and editing code by hand is no different. “Massaging characters on a screen? Are we back to the Stone Age? We have assistants for that now.. this is such a hassle! Ugh, why do I have to do everything myself? Why aren’t you better, Claude?”
The assembly-to-C analogy again comes to mind: you get used to working at a higher level of abstraction, but, in the early days, the abstraction often wasn’t good enough. You had to dive down one layer, with the full resentment of losing the conveniences of upstairs, and the need to remember how it all worked below your new zone of comfort.
I wonder if near future newly-minted SWEs will have never had to manually edit source code beyond the tiniest tweaks. Will they just point and explain the change they want, perhaps without much precision? Will everybody become a tech lead of sorts, from day one?
I wonder if the transitional generation of programmers, brought up in a manual editing world, will have a distinct advantage over the younger crop. Or perhaps manual editing will be an anchor holding them back from achieving their full vibe-coding potential, incapable of letting go of the wheel, like drivers who spent decades with a manual gearbox, refusing to trade familiar control for convenience.
I wonder if manual editing will follow the path of manual memory management, where a small subset of developers still needs to be familiar with it, but for the rest of the community and the industry, a fully automated approach is more than sufficient to get the job done. Even in game development these days, even in native C++ engines like Unreal Engine, the programmer hardly, if ever, thinks about memory management; it’s orchestrated for them by the system and handled perfectly well as long as they follow the expected idioms.
9. Everything is new all at once
This is to be expected, but the progress of best practices around these tools has been breathtaking, and it takes a tremendous amount of time just to keep up with all of the changes introduced daily. You have to give it to the teams working on Claude Code and Cursor: they’re iterating fast and furious, trying to keep up with each other’s constant output. New best practices emerge all the time, and you’re not sure if you should toss out the old ones or if what is now popular is a real step change.
What do you put in your CLAUDE.md? Which of the many layers of CLAUDE.md files? Should this go into a command instead? Should this be a hook? Should this be run as a Claude Code GitHub Action instead? What’s the best set of Claude Code commands to have in one’s project? Should this be run by one agent or fanned out to a swarm of agents? How many tasks in parallel? What’s the best way to prompt the agent? What format of specs works best? Which model should you be using for which type of task? Can you swap the stock models for third party ones and get better ROI?
It’s all exciting, moving very fast, and very volatile. You can give up and wait for the tools to stabilize, but the FOMO is real, and you worry that you might be left behind, with competition racing past you, whether in the business or in the job market. Much of that pressure is likely not worth giving in to, but it doesn’t mean you should completely sleep on this.
10. In the end
My guess is that in the medium term the average developer will have no choice but to adopt agentic coding as their daily driver. The tools will get faster, more efficient, more autonomous, and staying on top of the latest advancements will keep you employed. Editing logic by hand will be akin to shoveling dirt in an era of bulldozers. After all, why would a company not pay the same for more output, given the option?
The number of developers needed by the industry might shrink over time—perhaps analogous to farming in the 20th century?—and only the most machine-augmented programmers will be left around to orchestrate swarms of intelligent assistants. Fortunately, software developers have gone through these augmentation cycles many times before and are the first to embrace new tools that increase leverage. Unfortunately, it’s unclear if the market will be able to support as many of them as before. Or Jevons Paradox might kick in again and we’ll be just fine.
I can’t imagine a world where human SWEs aren’t the bottleneck for agentic development. Already today with Claude’s sub-agents and parallel Claude Codes running in the same git worktree, the human attention span and context shifting can only stretch so far. Is a wetware orchestrator able to keep up with half a dozen agents all implementing their own separate feature in parallel? Apparently not. Kanban’s rule around limiting WIP tasks seems more relevant than ever.
Regardless of what happens, there hasn’t been as exciting a time in SWE in a while, and we can only guess where the current trajectory is taking us. The growing pains of the technology are both frustrating and to be expected, but the potential is obvious and awe-inspiring. The least we can do is hook ourselves onto the unstoppable train of progress and ride it wherever it takes us.
#ai #coding-agents #claude-code #software-development #automation #productivity #developer-experience #future-of-work #programming #artificial-intelligence #workflow #abstraction #testing #tdd