Practical Lessons from Six Months of AI-First Development
It's been six months since I went fully AI-first in my development workflow. Not "AI-assisted" - AI-first, where AI agents do the majority of implementation work and I operate as an orchestrator. Here's an honest retrospective on what I've learned.
What Works Well
Parallelism is the superpower
The single biggest productivity gain isn't that AI writes code faster than me (it's roughly the same speed for simple tasks). It's that I can have five agents working simultaneously. A feature that would take me a day of focused work gets done in 1-2 hours of wall clock time.
But parallelism only works if you're good at decomposition. Early on, I'd create agents that stepped on each other's toes - editing the same files, creating incompatible interfaces, duplicating work. The lesson: file ownership is sacred. One agent per file, no exceptions.
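One way to enforce the "one agent per file" rule is to validate the task plan before spawning anything. This is a minimal sketch, assuming a hypothetical plan format that maps each agent to the files it may edit - not any specific tool's schema:

```python
from collections import defaultdict

def check_file_ownership(tasks):
    """Verify that no file is assigned to more than one agent.

    `tasks` maps an agent name to the list of files it may edit.
    Returns a dict of conflicting paths -> the agents claiming them.
    """
    owners = defaultdict(list)
    for agent, files in tasks.items():
        for path in files:
            owners[path].append(agent)
    return {path: agents for path, agents in owners.items() if len(agents) > 1}

# Example plan with a deliberate conflict: two agents claim types.ts.
plan = {
    "agent-a": ["src/api.ts", "src/types.ts"],
    "agent-b": ["src/ui.tsx", "src/types.ts"],
}
conflicts = check_file_ownership(plan)
```

Running the check before spawning turns a runtime merge conflict into a planning-time error, which is far cheaper to fix.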
Fresh context per agent produces better results
I initially tried to have agents share context - passing one agent's output to the next. This created a telephone game where errors and assumptions compounded. Much better to give each agent fresh context with clear requirements and let them work independently.
The trade-off is that you repeat some context in each prompt. That's fine. The cost of a few hundred extra tokens is nothing compared to the cost of an agent building on a misunderstanding.
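In practice, "fresh context" means every agent gets a self-contained brief. A sketch of what that might look like, assuming a shared requirements string that is repeated verbatim for each agent rather than summarised from another agent's output:

```python
def build_brief(base_requirements, task, owned_files):
    """Assemble a self-contained prompt for one agent.

    The full requirements are repeated in every brief on purpose:
    a few hundred extra tokens is cheaper than a compounding
    misunderstanding passed from agent to agent.
    """
    return "\n\n".join([
        "## Requirements (repeated in full for every agent)",
        base_requirements,
        "## Your task",
        task,
        "## Files you own (edit nothing else)",
        "\n".join(owned_files),
    ])
```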
Hook guardrails are non-negotiable
I cannot overstate how important automated guardrails are. In the first week without hooks, an agent committed a .env file, another added console.logs to production code, and a third modified a generated file that got overwritten on the next build.
Now I have 11 hooks that catch these issues before they happen. It took a day to set up and has saved weeks of debugging time since.
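The exact hooks will differ per project, but the three failures above all reduce to checking staged changes before a commit lands. A minimal sketch of that idea, with hypothetical rules covering the incidents mentioned (the real hooks and patterns will be project-specific):

```python
import re

# Hypothetical rules matching the three incidents: committed .env
# files, console.log calls, and edits under a generated/ directory.
BLOCKED_PATHS = [re.compile(r"(^|/)\.env$"), re.compile(r"(^|/)generated/")]
BLOCKED_CONTENT = [re.compile(r"\bconsole\.log\(")]

def check_staged(files):
    """files: list of (path, contents) pairs about to be committed.

    Returns human-readable violations; an empty list means the
    commit is allowed.
    """
    violations = []
    for path, contents in files:
        if any(p.search(path) for p in BLOCKED_PATHS):
            violations.append(f"blocked path: {path}")
        elif any(p.search(contents) for p in BLOCKED_CONTENT):
            violations.append(f"blocked content in: {path}")
    return violations

staged = [
    (".env", "API_KEY=secret"),
    ("src/app.js", "console.log('debug')"),
    ("src/ok.js", "export const x = 1;"),
]
problems = check_staged(staged)
```

Wired into a pre-commit hook (exit non-zero when `problems` is non-empty), this blocks the mistake before it ever reaches the repository.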
Model routing saves time and money
Using Opus for lint checks was costing me time (Opus is slower) and money (Opus costs more) with no quality benefit. Routing verification to Sonnet and trivial checks to Haiku dropped my spend by 40% and made feedback loops faster.
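The routing itself can be as simple as a lookup table keyed by task kind. A sketch under the assumption of three task tiers - the tier names and the task-kind labels are illustrative, not a specific tool's configuration:

```python
# Hypothetical routing table: complex generation goes to the
# flagship model, verification to a mid-tier model, and trivial
# checks to the cheapest/fastest one.
ROUTES = {
    "implement": "opus",
    "verify": "sonnet",
    "lint": "haiku",
}

def route(task_kind, default="sonnet"):
    """Pick a model for a task kind, falling back to a safe default
    for anything the table doesn't cover."""
    return ROUTES.get(task_kind, default)
```

The point of the explicit default is that an unrecognised task kind degrades to a reasonable mid-tier choice rather than silently burning flagship-model budget.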
Voice announcements keep me in the loop
This sounded gimmicky when I set it up, but it's become essential. When I'm reading documentation or reviewing a PR in another window, hearing "Agent 3 completed on blog-build" means I never miss a status change. It's ambient awareness without constant context-switching.
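A voice hook like this can be a few lines. The sketch below assumes macOS's `say` command and falls back to printing elsewhere; it's a hypothetical hook body, not any particular tool's notification API:

```python
import shutil
import subprocess

def format_announcement(agent, event, project):
    """Build the short status line to be spoken, e.g.
    'Agent 3 completed on blog-build'."""
    return f"{agent} {event} on {project}"

def announce(message):
    """Speak the message with macOS `say` when available,
    otherwise just print it."""
    if shutil.which("say"):
        subprocess.run(["say", message], check=False)
    else:
        print(message)
```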
What Doesn't Work
Shared mutable state between agents
The biggest failure pattern: two agents working on code that shares runtime state. Agent A writes a function that mutates a module-level variable. Agent B writes another function that reads that variable. Neither knows about the other's approach, and the result is a race condition that passes tests but fails in production.
The fix isn't better prompting - it's better architecture. If work requires shared state, it should be one agent's job, not split across multiple.
Over-decomposing small tasks
For a while, I was decomposing everything into the smallest possible units. "Create the type" → Agent A. "Create the function" → Agent B. "Write the test" → Agent C. For a feature that would take one agent 5 minutes, the coordination overhead of three agents made it take longer.
Now I use a simple rule: if a task takes under 10 minutes for one agent, don't split it. The overhead of spawning, coordinating, and merging isn't worth it.
Expecting AI to understand business context
AI agents can write technically correct code that's completely wrong for the business. They don't know that "discount" means something specific in your domain, or that certain customers have special pricing rules, or that the legal team just changed the data retention policy.
I've learned to be explicit about business rules in every prompt, not assume context. If there's a constraint that matters, it goes in the task description - even if it seems obvious to me.
Long-running sessions without compacting
Context windows fill up. When they do, AI quality degrades noticeably. Early on, I'd run marathon sessions with a single agent, and the output quality at the end was measurably worse than at the start.
Now I compact aggressively (every 30-40% of context usage) and start fresh sessions for new features. Short, focused sessions produce better results than long, unfocused ones.
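The compaction rule reduces to a single threshold check. A minimal sketch, assuming you can read the session's token usage and context limit from whatever tool you're using (the 35% threshold here sits in the middle of the 30-40% range above):

```python
def should_compact(tokens_used, context_limit, threshold=0.35):
    """Return True once context usage crosses the compaction
    threshold - the cue to compact or start a fresh session."""
    return tokens_used / context_limit >= threshold
```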
Surprising Things I Learned
My architecture skills improved
When you have to decompose every feature into parallelisable tasks with clear interfaces, you naturally design better architectures. The discipline of "can I split this into independent pieces?" leads to more modular, more composable systems.
Code review became more enjoyable
When I was the one writing the code, reviewing was tedious - I already knew what was in it. When AI writes the code, reviewing is genuinely interesting. I'm reading code with fresh eyes, evaluating decisions I didn't make, and catching things I might not have thought of.
I write more tests than before
It sounds counterintuitive, but AI makes test writing so low-friction that I test things I would have skipped before. Edge cases, error paths, integration scenarios - when testing is just "describe what to test and let an agent write it," the threshold for "is this worth testing?" drops dramatically.
The emotional adjustment was real
There was a genuine identity adjustment period. I'd been a "hands-on-keyboard" developer for years. Stepping back to orchestrate felt like cheating at first. It took a few weeks to internalise that orchestrating is a skill, that the systems I'm building are still my work, and that the output quality is at least as good as what I'd produce manually.
Advice for Teams Considering AI-First
- Start with one person. Don't try to switch the whole team at once. Have one engineer go AI-first for a month and report back. The learning curve is real but manageable.
- Invest in guardrails before speed. Set up hooks and automated checks before optimising for parallelism. Speed without safety creates expensive messes.
- Don't mandate specific tools. Some engineers will thrive with AI orchestration. Others will prefer AI-assisted coding (with more direct control). Both are valid. The goal is productivity, not uniformity.
- Measure outcomes, not keystrokes. Lines of code per day is meaningless. Features shipped, bugs introduced, time to resolution - these are the metrics that matter.
- Budget for experimentation. The first month of AI-first development is slower, not faster. You're learning new patterns, building configuration, and developing instincts. The productivity gains come in month 2 and beyond.
Six Months In
I'm more productive than I've ever been. Not marginally - dramatically. Features that used to take days take hours. I have more time for architecture, mentoring, and the strategic work that actually matters at the staff level.
But it's not magic. It's a skill set - decomposition, prompt engineering, verification, coordination - that takes time to develop. The tools are powerful, but they're only as good as the human directing them.
If you're considering going AI-first, my best advice is this: start small, invest in guardrails, and give yourself permission to work differently than you always have. The adjustment period is worth it.