I shipped a feature last month that an AI assistant wrote about 70% of. It worked perfectly in dev. It passed my tests. It broke in production within four hours because I skipped two steps I normally never skip.
That was the last time I shipped without a checklist.
If you’re a solo dev using AI to write meaningful chunks of your features, you need a release process that accounts for the specific ways AI-generated code can surprise you. Not a 40-page runbook — just a short, repeatable set of checks that catches the problems I’ve actually hit.
Here’s what I use now.
Pre-build constraints
Before I write a single prompt or touch any code, I nail down three things:
What exactly is shipping. One sentence. If I can’t describe the feature in one sentence, it’s too big for a single release. I split it.
What’s explicitly out of scope. AI assistants love to be helpful. They’ll add error handling you didn’t ask for, create utility functions for hypothetical future use, refactor adjacent code “while they’re at it.” I write down what I’m not doing so I can catch scope creep in the diff.
What’s the rollback plan. For every feature, I decide in advance: can I revert the commit, or do I need a feature flag? If there’s a database migration involved, the rollback plan gets more specific. I figure this out before writing code, not while production is on fire.
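The cheapest version of a feature flag is an environment variable. A minimal sketch, assuming a hypothetical `FEATURE_<NAME>` naming convention (the names and code paths here are illustrative, not from any specific framework):

```python
import os

def feature_enabled(name: str) -> bool:
    """Read a feature flag from an environment variable.

    Hypothetical convention: FEATURE_<NAME>=1 enables the flag;
    anything else (or unset) disables it. Rollback is then just
    unsetting the variable, with no code revert required.
    """
    return os.environ.get(f"FEATURE_{name.upper()}") == "1"

# Gate the new code path behind the flag.
if feature_enabled("new_checkout"):
    pass  # new AI-assisted code path
else:
    pass  # existing, known-good path
```

The point isn't this exact helper; it's that the kill switch exists before deploy, so turning the feature off is a config change instead of an emergency revert.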
Implementation guardrails
This is where AI-assisted development needs the most discipline. The code comes fast, which makes it easy to skip the parts that keep you safe.
Review every generated file as if a junior dev wrote it. Because that’s roughly what happened. The code will usually run. It might even be well-structured. But it may also include dependencies you don’t want, patterns that don’t match your codebase, or subtle assumptions about your data that aren’t true.
Keep diffs readable. I commit AI-generated code in small, logical chunks — not one massive commit with 400 lines. If I can’t explain what a commit does in one line, it’s too big. This also makes rollbacks surgical instead of all-or-nothing.
Check for hallucinated APIs. This still happens in 2026. The AI will call a method that doesn’t exist, use a config option that was removed two versions ago, or import from a package path that’s slightly wrong. Your linter catches some of this. Not all of it.
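One cheap mechanical check is verifying that every import and attribute the generated code relies on actually exists. A rough sketch of that idea (a hypothetical helper, not a substitute for running the test suite):

```python
import importlib

def api_exists(module_name: str, attr: str) -> bool:
    """Return True if module_name can be imported and exposes attr.

    A quick sanity check for AI-generated imports: it catches a
    misspelled module path or a method that was never there, though
    not a method whose signature or behavior changed.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)

# json.dumps exists; json.to_string does not.
assert api_exists("json", "dumps")
assert not api_exists("json", "to_string")
```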
Watch for over-engineering. AI loves abstractions. If I asked for a function that sends an email, I don’t need a NotificationStrategyFactory. I actively trim the generated code down to what I actually need.
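For that email example, the trimmed-down version is a couple of plain functions. A sketch, assuming a local SMTP relay (the host and function names are mine, not a prescribed API):

```python
import smtplib
from email.message import EmailMessage

def build_message(to: str, subject: str, body: str) -> EmailMessage:
    """Build a plain-text email. No strategy classes, no factory."""
    msg = EmailMessage()
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_email(to: str, subject: str, body: str,
               host: str = "localhost") -> None:
    """Send via an SMTP relay at `host` (adjust for your provider)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(build_message(to, subject, body))
```

If a second notification channel ever shows up, that's the moment to introduce an abstraction, not before.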
Pre-release quality gate
Here’s my actual checklist. I print this and check items off. I’m not joking — the physical act of checking a box makes me slow down.
PRE-RELEASE CHECKLIST
=====================
[ ] All new code has been read line-by-line (not skimmed)
[ ] Tests cover the happy path AND at least one failure case
[ ] Tests run against real dependencies (not mocked services)
[ ] No new dependencies added without explicit decision
[ ] No unrelated changes in the diff
[ ] Feature flag or rollback commit identified
[ ] Environment variables documented if any were added
[ ] Migration is reversible (if applicable)
[ ] Tested on a fresh checkout (not just my dev environment)
[ ] API responses checked for shape — not just status codes

The line about fresh checkout matters more than you’d think. My dev environment has cached data, running services, and environment variables that production won’t have. I’ve caught at least three bugs by running the build on a clean machine before tagging a release.
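The shape check on the last line deserves a concrete example, because a 200 response can still carry the wrong payload. A tiny sketch of what I mean (hypothetical helper; `required` maps field name to expected type, and a real API would warrant a proper schema validator):

```python
def check_shape(payload: dict, required: dict) -> list:
    """Return a list of problems: missing keys or wrong types."""
    problems = []
    for field, expected_type in required.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}, "
                            f"got {type(payload[field]).__name__}")
    return problems

# A 200 response with the wrong shape: id came back as a string.
resp = {"id": "42", "email": "user@example.com"}
print(check_shape(resp, {"id": int, "email": str}))
```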
Launch-day checklist
I keep launch day boring on purpose. No other changes go out. No “while I’m at it” fixes.
Deploy during my most alert hours. For me that’s mid-morning. Not Friday afternoon. Not right before a meeting.
Watch the logs for 30 minutes after deploy. Not “check in occasionally” — actually watch them. I keep a terminal open with structured logs filtered to the new feature. Most issues surface in the first 15 minutes.
Verify the feature manually in production. Not staging. Production. Click through it, submit real data, check the database. I’ve had features that worked everywhere except production because of a CDN cache or a permissions difference.
Tell at least one person it shipped. Even if you’re solo, having someone who knows a change went out means there’s a second pair of eyes if users report something weird. I post in a Discord server. A Slack message to a friend works too.
48-hour post-release checks
The first two days after a release are when the slow-burn problems show up.
Day one: check error rates. Not just 500s — look at 400s too. AI-generated validation code sometimes rejects valid input or accepts invalid input in ways that don’t cause crashes but do cause user frustration.
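Tallying status codes doesn't require a monitoring product; a few lines over the access log will do. A sketch, assuming common/combined log format where the status code is the ninth whitespace-separated field (adjust the index for your log layout):

```python
from collections import Counter

def status_histogram(log_lines):
    """Tally HTTP status codes from access-log lines.

    Assumes common/combined log format: the status code is the
    9th whitespace-separated field. Malformed lines are skipped.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) > 8 and fields[8].isdigit():
            counts[fields[8]] += 1
    return counts

logs = [
    '1.2.3.4 - - [10/Jan/2026:10:00:00 +0000] "GET /api/x HTTP/1.1" 200 512',
    '1.2.3.4 - - [10/Jan/2026:10:00:01 +0000] "POST /api/x HTTP/1.1" 400 87',
    '1.2.3.4 - - [10/Jan/2026:10:00:02 +0000] "POST /api/x HTTP/1.1" 400 87',
]
print(status_histogram(logs))
```

A spike of 400s on the new endpoint is exactly the quiet validation bug this paragraph is about.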
Day one: check performance. AI-generated code is often correct but not optimized. A function that runs fine with 10 records might crawl with 10,000. I look at response times for any endpoint the new feature touches.
Day two: check resource usage. Memory leaks, connection pool exhaustion, log volume spikes. These take time to manifest. I check my monitoring dashboard 48 hours post-deploy specifically for trends that are heading the wrong direction.
Day two: re-read the code. This sounds redundant, but I find things on the second read that I missed the first time. With fresh eyes and the knowledge of how the feature is actually being used, I often spot an edge case or a cleanup opportunity.
Common mistakes I’ve made (so you don’t have to)
Trusting AI-generated tests. If the AI wrote the feature and the tests, the tests might just verify that the code does what it does — not that it does what it should. I write my test assertions by hand, even when the AI writes the test scaffolding.
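The difference is easy to see with a toy example. Here `slugify` stands in for any AI-generated feature code (the function and values are made up for illustration):

```python
def slugify(title: str) -> str:
    """Stand-in for an AI-generated feature function."""
    return "-".join(title.lower().split())

# An AI-written test can be a tautology: it verifies the code
# does what it does, e.g.
#   assert slugify("Hello World") == slugify("Hello World")

# Hand-written assertions pin down what the output SHOULD be:
assert slugify("Hello World") == "hello-world"
assert slugify("  Extra   Spaces  ") == "extra-spaces"
```

The hand-written expected values are the part that encodes intent; that's the part I refuse to delegate.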
Shipping multiple AI-generated features at once. When something breaks, you can’t tell which change caused it. One feature per release. Always.
Skipping the diff review because “the tests pass.” Tests are necessary but not sufficient. I’ve shipped code where the tests passed but the implementation included a hardcoded API key, an unnecessary network call on every request, and a try/catch that swallowed errors silently. All in the same PR.
Not checking what got installed. AI assistants sometimes add packages. I review package.json or requirements.txt diffs with the same care I review code diffs. An unnecessary dependency is a future vulnerability.
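For requirements.txt-style files, spotting new packages can be automated. A rough sketch that compares bare package names and ignores version pins (it is deliberately not a full requirements parser):

```python
def new_dependencies(old_reqs: str, new_reqs: str) -> set:
    """Packages present in the new requirements but not the old.

    Rough comparison: strips version specifiers (==, >=, etc.)
    and comments, then diffs the package-name sets.
    """
    def names(text):
        out = set()
        for line in text.splitlines():
            line = line.strip()
            if line and not line.startswith("#"):
                for sep in ("==", ">=", "<=", "~=", ">", "<"):
                    line = line.split(sep)[0]
                out.add(line.strip().lower())
        return out
    return names(new_reqs) - names(old_reqs)

before = "requests==2.31.0\nflask>=2.0\n"
after = "requests==2.31.0\nflask>=2.0\nleft-pad-py==0.1\n"
print(new_dependencies(before, after))
```

Anything that prints here gets an explicit yes/no decision before the release goes out.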
Assuming the AI understood my codebase conventions. It doesn’t. It writes plausible code that might not match your naming conventions, error handling patterns, or file organization. I treat “does this match the rest of my code” as a separate review pass.
None of this is glamorous. There’s no trick that makes releasing AI-assisted code risk-free. But a short checklist that you actually follow beats a sophisticated process that you skip under pressure.
I’ve been using this workflow for about six months now. The bugs I ship have shifted from “this code does the wrong thing” to “this edge case wasn’t covered” — and that’s a category of bug I know how to handle.