The Last Carpenter
It’s late in the evening, I’m on my second decaf, and I should be heading to bed. But I know that if I don’t try implementing a terminal inside of Kale tonight, I’ll keep putting it off. My body is already bracing for the pain. Hours of reading docs. Library versions that don’t play nice with what you’ve already integrated. StackOverflow archaeology. Pseudo-terminals, Electron IPC, the venerable 13-year-old xterm.js. A lot of reading just to glue it all together and hope it works.
But then I remember: none of this applies anymore. These are scars from an era that disappeared a few months ago, but the old lessons keep resurfacing. It’s like refusing to go to the dentist even though we just invented anaesthesia. I make a new git branch, have a quick conversation with the agent about how to implement what I have in mind, and in 20 minutes the machine is off to the races. I alt-tab to something else. Half an hour later, the familiar ding: job’s done. I’m sure it’s broken. I remember when agents would tell you they’d succeeded but actually hadn’t. I start the app, expecting a half-baked monstrosity. And there it is: a working terminal running inside my app, with Claude Code and a custom system prompt already loaded, handling my inputs correctly. I can shift-return, I can stop Claude Code without accidentally dropping into the shell. I had to do nothing but chat.
That was… uneventful? My previous conditioning was wrong. The correct answer now is always “sure, why don’t we try that, it’s literally a freeroll.”
I studied software engineering for four years in college. I’ve read hundreds of books. Thousands upon thousands of hours of practice. Fifteen years of carving ornate tables and chairs by hand. Now I press a button and the table appears.
I built Kale – a writing tool, designed around how I actually write – and it messed me up. Not because it didn’t work. Because it worked great. And in working, it confirmed the thing I’d been trying not to think about: the craft I spent my career learning is going away. Software developers have been through this before. We’ve been unlearning things we prided ourselves on since the ’70s. We laugh at the “skill” of sequencing punch cards. But when it’s your turn, it hits different. I am the carpenter who has lost his craft. The question is whether I can also be the architect who directs the machines. Kale was my attempt to find out.
On Kale, by Kale, in Kale
This article is a true dogfood. I composed it in Kale, Kale prompted this article, Kale crafted it, Kale polished it. The software iterated itself.

Functionally, Kale is a document editor with a built-in comment system, designed around my personal workflow for writing articles. Technologically, it’s what you’d get if you threw Cursor, Claude Code, Google Docs, and Git for desktop into a blender and optimized the smoothie to my particular taste.
Almost everything I’ve worked on for the last five years has been zero-to-one. Blank canvas, figure out what the experience should feel like, ship it. At Freckle, we improved math education for classrooms that didn’t know they needed help. At Double Dusk, co-op gaming from first principles. Caltsar was meeting accountability for Google Calendar. With Tutorbox, we launched AI-powered English tutoring before most people had even heard of ChatGPT. Every time, the same loop: understand what someone actually needs, find the right shape for it, build the thing.
Inspired by Peter Steinberger of OpenClaw and Boris Cherny of Claude Code, I built Kale to feel (in my hands, not in the abstract) what it’s like when an agent builds your software for you. I wrote every line of code for my first startup’s MVP. Artisanal. Handmade. My hands. I have now written zero lines of code in almost a year and I’m over 100x faster than I was three years ago. Let’s see how far the most expensive tools on the market can take us.
The Genie and the Rule
When creating Kale, I held myself to the YouTube Rule, coined by my friend Finbarr Taylor: “At all times the agent is coding, it must require so little attention from me that I can be watching YouTube.” No peeking. No fidgeting with something it’s working on. Full delegation. Years of managing engineers taught me that the best delegation isn’t looking over someone’s shoulder. It’s setting context, defining success, and getting out of the way. The YouTube Rule is that instinct, formalized. As an added benefit, I get to watch YouTube.
The New Craft
The YouTube Rule required a completely different relationship to building. I started by going after the scariest assumptions first – the stuff I had no idea if the agent could even pull off – and validated them as fast as possible. Experiment, play, don’t lock anything in. You stop thinking about code and start noticing where the thing feels wrong. Where the UX is clunky. Where the architecture is getting in the agent’s way. It’s more like sculpting than engineering. One thing I carried over from game development was the obsession with feel. In games, you care about the emotional impact of every interaction. Most general software doesn’t bother: it’s all function, zero feeling. For Kale, I wanted the canvas to look pretty, to make me want to write, and to feel good when the agent reviewed my work or workshopped a passage. At Double Dusk, we spent weeks tuning the weight of a character’s jump. Not the physics. The feel. How heavy should the landing be? How floaty the peak? We’re not aiming for accuracy. We’re shooting for experience. That obsession carried straight into Kale. How does it feel when a comment appears? When Claude finishes a review? These micro-moments are where users fall in love or bounce. Most software teams never think about this stuff. But experience is everything. My goal with Kale: a writing app that both performs well and feels right for the N of 1 of me.
The Comment System
Kale’s comment system does three things, and each one emerged from a real pattern in how I actually write.
First, comments are reminders for myself. Stuff I want to come back to but don’t want to deal with right now. This mirrors the two phases of writing music: the generative phase and the editorial phase. You can’t do both at once; they use different parts of your brain. Comments let me punt the inner critic to a second pass so the first pass flows instead of stuttering.
Second, comments are a to-do list for Claude. Standard operations against specific chunks of text: “This passage flows poorly. Fix it.” “This turns into word salad. Fix it.” “I’m not sure this is actually true. Research it.” “I want to rewrite this. Give me a draft.”
Third – and this is the most interesting one – Claude uses comments to act as an editor and writing coach. This came from my own experience working with human writing coaches. I’d send them a polished article, and days later get back a file plastered with edits, critiques, ideas, and feedback. Hugely useful for getting better at the craft. Now I can get that experience in a few minutes instead of waiting a week. Is it as good as a pro? Probably not. Is it vastly more convenient? For blog posts and essays, where the gap between “pretty good” and “perfect” barely matters? Yeah, absolutely. Iterate faster, avoid getting stuck.
Choosing the Comment Format
Comment systems for the written word are all over the map. My constraint was specific: I wanted to edit files directly on the file system. What’s on disk is what you’re working with. No translating Markdown into some internal format and back.
Option one: build a custom internal representation, inject comments into that, export to Markdown when saving. Clean, but way heavier than what I needed.
Option two: a sidecar file tracking comments next to the content. Sounds fine until the writer starts deleting lines, adding paragraphs, and moving things around. Keeping the sidecar in sync is doable from the editor’s perspective, but once Claude Code enters the picture, you’re asking the agent to understand your custom sidecar format. That eats context and adds fragility, all for a problem that has simpler solutions.
I went with option three: inject Markdown comments directly into the source file. Everything lives in one place. Comments are invisible when published. Both the writer and Claude can edit freely without breaking anything. The main cost: making sure the editor doesn’t let you accidentally half-delete a comment boundary. A few quirks, all manageable. Simplest solution that met every constraint.
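To make option three concrete, here is a minimal sketch of what inline comments could look like and how little code it takes to read them, strip them for publishing, and detect a half-deleted boundary. It is written in Python for brevity (Kale itself is an Electron app), and the marker syntax and function names are hypothetical, invented for this illustration rather than taken from Kale.

```python
import re

# Hypothetical marker syntax, invented for this sketch (not Kale's actual format):
#   <!-- c:1 start: note text -->annotated text<!-- c:1 end -->
START = re.compile(r"<!-- c:(\d+) start: (.*?) -->")
END = re.compile(r"<!-- c:(\d+) end -->")


def extract_comments(markdown: str) -> dict[str, dict[str, str]]:
    """Return {id: {"note": ..., "text": ...}} for every comment that has both markers."""
    comments = {}
    for start in START.finditer(markdown):
        comment_id, note = start.group(1), start.group(2)
        # Look for the matching end marker after this start marker.
        rest = markdown[start.end():]
        end = re.search(rf"<!-- c:{comment_id} end -->", rest)
        if end:
            comments[comment_id] = {"note": note, "text": rest[:end.start()]}
    return comments


def strip_comments(markdown: str) -> str:
    """Remove all markers, leaving the prose as it would be published."""
    return END.sub("", START.sub("", markdown))


def unbalanced_ids(markdown: str) -> set[str]:
    """IDs with a start marker but no end marker, or the reverse."""
    starts = {m.group(1) for m in START.finditer(markdown)}
    ends = {m.group(1) for m in END.finditer(markdown)}
    return starts ^ ends
```

Because the markers are ordinary HTML comments, a Markdown renderer hides them without any extra work, and an agent can read or edit them with plain text tools. A check like `unbalanced_ids` is one way an editor could notice a half-deleted boundary before saving.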
Saving, Git, and Knowing When to Stop
Saving to disk sounds trivial until you realize the user might be writing at the same time as Claude. Both of you, editing the same file, side by side. Claude’s update tool fails if the file has changed since it last read it, so there’s this dangerous window between your keystrokes and the background save where Claude could try to write and blow something away. Fix: autosave every three seconds. Fast enough that a thinking model is too slow to sneak changes in between. A couple hours later I’d forgotten it had ever been a problem.
Git integration followed the same principle: do the least possible thing. Having manual version control under the writer’s control felt powerful: you can see the article evolve, and if you let the agent loose on your writing, there’s no way to lose your work. But the temptation to bloat was constant. Am I trying to recreate Cursor? Do I need a file explorer? A diff viewer? Branch management? Your brain is incredible at making things more complex than they need to be. You can always add more – more options, more workflows, more buttons, more knobs – and in an AI writing tool, you could have the AI nag you about a thousand different things, distracting you from the actual writing. I don’t want that. I want to be immersed. To write. To flow. So: what’s the absolute minimum? Save button. Open file button. Restore button. Those were the only ones I actually reached for. Everything else? Overkill.
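The race is easier to see in code. Below is a minimal Python sketch of a "refuse to write if the file changed since I read it" check. It illustrates the general idea only: the function names are hypothetical, and this is an assumption about how such a guard can work, not Claude Code's or Kale's actual implementation.

```python
import hashlib
from pathlib import Path


class StaleFileError(Exception):
    """Raised when the file on disk changed after it was last read."""


def fingerprint(path: Path) -> str:
    """Hash of the file's current bytes on disk."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def guarded_write(path: Path, new_text: str, seen: str) -> str:
    """Write only if the file still matches the fingerprint taken at read time.

    Returns the fingerprint of the newly written content.
    """
    if fingerprint(path) != seen:
        raise StaleFileError(f"{path.name} changed since it was last read")
    path.write_text(new_text, encoding="utf-8")
    return fingerprint(path)
```

A guard like this can only see what is on disk. Keystrokes still sitting in the editor's buffer are invisible to it, which is why a short autosave interval matters: the sooner the buffer reaches the file, the sooner a competing write trips the check instead of silently overwriting text.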
How Much Leash?
The meatiest non-obvious decision was safety. Should this app be completely uncontrolled – able to execute anything on the hard drive – or highly constrained, where the developer picks the tools and content lives only in memory? I let the agent loose. Kale runs a completely uncontrolled coding agent with total access to everything on your machine. I baked guidelines into its instructions – stick to this one document, don’t edit my machine, don’t connect to other computers, don’t do anything funny – but these are suggestions, not walls. The machine can talk itself out of following your rules if it decides it has a better idea. I was generally impressed by how well Claude respected the guardrails. But this approach would NOT be acceptable for anything public-facing. Right now, it’s a bazooka. I wouldn’t feel comfortable handing it to someone else, even with detailed instructions on how to avoid blowing yourself up. In a real product, Kale would need sandboxing: a safe container so users don’t explode their files or exfiltrate their data. And here’s the thing that building this made visceral for me: despite the safety training that companies like Anthropic invest in their models, it’s essentially impossible to make a perfectly safe AI. There will always exist some perfect string of inputs that changes its goals. Scale changes everything, too. My N of 1 went well. That tells you nothing about N of 10 million. If you’re worried about a one-in-a-million event, your wariness should increase dramatically when you’re actually dealing with millions of instances.
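The scale point can be put in numbers. As a rough sketch, assume each instance independently has the same small probability `p` of going wrong (a simplifying assumption, since real failures are rarely independent). The chance of at least one failure across `n` instances is then `1 - (1 - p)^n`:

```python
import math


def chance_of_at_least_one(p: float, n: int) -> float:
    """Probability that an event with per-instance probability p
    happens at least once across n independent instances."""
    # 1 - (1 - p)**n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(n * math.log1p(-p))


one_user = chance_of_at_least_one(1e-6, 1)            # about one in a million
ten_million = chance_of_at_least_one(1e-6, 10_000_000)  # above 99.99%
```

Under that assumption, a one-in-a-million event is negligible for a single user and close to certain across ten million instances.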
Closing the Loop
The YouTube Rule is a nice idea. Making it actually work inside a tactile Electron app was a whole different beast. You have to set the agent up for success before you walk away: prepare the plan together, keep the architecture minimal so it doesn’t get confused, have tests in place, and – this is the big one – give it a way to see and interact with what it’s building. In a document editor, everything is a bug hotspot. Moving the cursor with arrow keys, pressing backspace, selecting text to create a comment – all of it. The agent needed to operate the app without calling me over. I landed on dynamically generating Playwright JavaScript to drive the app directly (screenshots, mouse movements, selections, zooming in and out) instead of trying to wrestle an MCP server connection that never quite worked out of the box. I had a screenshotting script I’d ported from past projects: Cui and Rui, native apps with custom desktop UI systems we’d built from scratch at Double Dusk. Those frameworks were years of work. Seeing a fragment of that effort find new life inside Kale was a quiet payoff I hadn’t expected. The testing approach that emerged: create a test Markdown file in a temp folder for whatever corner case you’re chasing, spin up a fresh instance of the app against that file, reproduce or confirm directly. Define what “done” looks like, the same way you would with a human dev. Once this was in place, 90–95% of issues got reproduced and fixed without me touching anything.
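The fixture half of that loop is small enough to sketch. The Python below (chosen for brevity, not because Kale uses it) writes one corner case to its own temp folder and builds the command for a fresh app instance. The helper names are hypothetical, and the launch command assumes the app is started with the Electron CLI and accepts a file path as an argument; Kale's real entry point may differ.

```python
import tempfile
from pathlib import Path


def make_fixture(name: str, markdown: str) -> Path:
    """Write one corner case to its own fresh temp folder and return the file path."""
    folder = Path(tempfile.mkdtemp(prefix=f"kale-{name}-"))
    path = folder / f"{name}.md"
    path.write_text(markdown, encoding="utf-8")
    return path


def launch_command(app_dir: Path, fixture: Path) -> list[str]:
    """Command for a fresh app instance opened on the fixture.

    Assumption: the app is started with the Electron CLI and takes the
    file path as an argument. The real entry point may differ.
    """
    return ["npx", "electron", str(app_dir), str(fixture)]
```

An agent could pass the result of `launch_command` to `subprocess.Popen`, drive the window with its generated Playwright script, and throw the temp folder away afterwards, so each bug gets a clean, reproducible starting state.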
Not everything worked off the bat. I wanted a diff view, similar to what you get in Cursor or a GitHub Pull Request. A trivial, entirely-solved problem. The agent would one-shot it after a quick conversation about the optimal approach.
Instead, I kept coming back to find the diff showing me something I didn’t expect or want. Paragraphs mixed up instead of lined up side by side. Small edits within a paragraph convincing the system that the user had created an entirely new section of text. The agent thought it was done each time, but as a human writer trying to make sense of the before and after, nothing tracked. Further prompting only sent it into a loop: fix one corner case, regress another, never reaching an acceptable state.
I abandoned those branches. The juice wasn’t worth the squeeze, both in iterations and in the sprawl of code the implementation was adding for marginal benefit. In retrospect, I should have been ultra-crisp about what successful diffs looked like and captured the corner cases upfront. My vagueness about success criteria is what threw the agent off. I’m confident a second attempt would land, now that I know exactly what’s going to stump it. I experienced an engineering “team” faster than the dozens I’ve led in my career. It lacks some of the context-questioning habits of humans. It also never needs a bathroom break. But that moment stuck with me – the speed is real, and so is the gap that only taste can close.
Hats Off to the Machine, New Hats on My Head
The thing that fascinated me most: what it actually feels like to be the PM, the UX person, the user researcher, and the software engineer all at once. Not in theory. In practice, in one afternoon. As a founder, wearing many hats is nothing new. But the speed – making real progress without needing to go find an expert, without waiting on anyone – that was something else. At Freckle, being the CTO who also cared about UX meant fighting for calendar time between code reviews and interface polish. Always one or the other. With Kale, that tension just… dissolved. The machine handled the mechanical translation between what I wanted and the code that made it happen. I got to think about the product all day. When you’re this fast, you can experiment with interfaces and approaches like you’re sculpting a hundred iPhones out of clay to feel which ones sit right in your hand. It used to take two weeks to roll out a single prototype. Each try was expensive, so we spent so much time guessing in advance. Now it’s “Let’s try this. Nope. Let’s try this. Better.” This is how we built Whisk (the two-player co-op at Double Dusk). What’s the minimum set of mechanics that actually delivers fun? You can’t spec “delight” in a design doc. You have to feel it. And now you can feel a dozen variants in an afternoon instead of a quarter. This raises a real question about scale. Once a zero-to-one gets real traction, do you bring on deep domain experts? Is the improvement marginal or transformational? Maybe the tools keep getting better and the value-add of specialists shrinks. Maybe the pros with the same tools stay ahead. Maybe the gap shifts – as it has in software development – toward taste and context and understanding the problem instead of executing the solution. Either way: we’re all going to have much better software because builders can finally play and experiment at the speed of their intuition. I’ve done it. It’s magic.
Iterating on Myself
The models (Codex Extra High, Claude Opus 4.6) are astounding. With the right guidelines, upfront research, an agreed-upon plan, and real tests plus “manual” agentic QA, the results are remarkable. Most things got implemented as envisioned. Some took multiple tries. A few took too long to be worth hammering on any further. The genie is truly extraordinary. But the gap between what it can do and what you actually need – that’s where you earn your keep. Here’s the core tension: independence and correctness pull in opposite directions. Independence means you’re not checking the agent’s work, which means stuff slips through. How do you maximize both at the same time? That’s the unsolved meta-problem of agentic development, and I don’t think anybody has cracked it yet. How do I tell it to take into account all the things a highly capable software expert should (security, robustness, maybe its own taste)? Next time I build something like Kale, I’ll be way more disciplined about testing from the start. Make sure the agent knows what tests to write, review them before it runs off, agree upfront on what “done” looks like. I don’t want to be the one mashing buttons to double-check everything. I want the agent to be independent. But right now, that independence comes at a cost, and you have to decide how much you’re willing to pay.
My Reckoning
Building Kale has filled me with a sense of loss. Four years of school. Hundreds of books. Thousands of hours of practice. And all of that is heading toward extinction. That’s a bummer, no way around it. It’s like being a language carrier for a tradition that’s fading. Building Kale, I finally realized: it’s my turn to set fifteen years of practice on fire. I sit with a real question: how are other people – people who haven’t been doing this for twenty years – going to navigate this new paradigm? Is my experience holding me back? Am I a dinosaur, hooked on a past that’s no longer relevant? Or is my perspective an advantage, because I actually understand how things work at the low level? People find something they’re good at, something the world values – “I’m the X guy” – and it becomes their whole deal. Power, respect, authority, compensation. When that disappears, a lot of people get really upset and want to revolt. These are the Luddites. My identity was “I’m a super solid low-level guy” or “a super solid functional programming guy.” That stuff is basically worthless now. The innovator’s dilemma, playing out in real time: how do you let go of the thing you were exploiting to make a living?
Retreat or Rebirth
The people who stay relevant are the ones excited to get more out of their tools. They are not doing a Renaissance faire LARP of their glory days. This is equally true for musicians and artists as it is for developers. Musicians are struggling against an artistic machine that spits out twenty variants of the punk passion they used to rage. Developers are, well, no longer writing code. The great artists reinvent themselves. The David Bowie types go through era after era, some brilliant, some flops. They keep asking: “Now that all of this is possible, what will I become?” I am the carpenter who has lost his craft. But I’m equally the architect who directs the machines. I appreciate the past. I choose my future.
Coda
Kale isn’t just a product. It’s fifteen years of expertise poured into a new paradigm to see what comes out the other side. There’s more to build. The comment UX is bare bones, I’m curious about porting it to the web, and the tension between independence and correctness is nowhere near solved. But these are problems I’m excited to have. They’re the problems of someone who chose to be the architect, not someone mourning their old tools. I look forward to tinkering with its new possibilities. I will continue to revel in its awesome power.