The AI-First Codebase
A blueprint for enabling AI-first teams building with autonomous code generation
By now it’s close to unanimous that AI is a huge multiplier for engineers. The less time we spend writing boilerplate and hunting down obscure fixes or configuration settings, the more time we have for work that really moves the needle. With this newfound productivity boost comes fear of a much larger disruption: that some of us might be wholly displaced by AI. As an optimist, I’m not so worried about that, but regardless of where you stand on the issue, the most fruitful place to spend your energy is figuring out how to adapt to this new world.
VCs have always posed the question of what happens to us in a world of abundance, where everything from knowledge to work is easy because of AI. In our particular case: What happens in the long run when engineering productivity is abundant?
Andrew Ng recently pointed out a concrete and dramatic shift in the ratio of PMs to engineers on one of his teams:
“Just yesterday, one of my teams came to me, and for the first time, when we're planning headcount for a project, this team proposed to me not to have 1:4 PM/engineers, but to have 1:0.5 PM/engineers.”
An ominous foreshadowing, perhaps. I’m not sure about the specific ratios, but I definitely buy the downward shift as AI-equipped engineers produce substantially more than they used to. Today the shift is still subtle at a broad level, but at the limit we know the number of engineers required to write code will converge toward near-zero, if not zero.
Or will it? You could also argue PMs are no more immune to AI than engineers (see ChatPRD), that innovation will simply shift to the new bottleneck and whittle away the PM part of the ratio. Eventually, perhaps we’ll see that first mythical one-person billion-dollar company thanks to AI.
My contention is that this entire line of thinking falls under the “faster horse” mentality, fixated on optimizing what we already know instead of rethinking the solution from first principles. The most powerful outcome of abundant engineering isn't just a world with fewer engineers or fewer PMs; it's the chance to fundamentally rethink how we build.
This article proposes a third path, one where humans remain firmly in charge, wrangling an unbounded army of super-intelligent builders. This requires more than just better prompts; it requires a new blueprint for how our teams and our codebases are structured. It requires what I call an AI-First Codebase.
My intuition leads me in this direction largely out of pragmatism. To succeed, the AI-first revolution must start with an incremental change that reorients us in the right direction. Thousands of teams simply can’t throw away their existing code, but they are likely very open to gradually retrofitting their existing repositories and workflows to embrace AI-first development.
Continue reading to see how we will build this idea from the ground up. First, we’ll diagnose the deep collaboration gap that plagues modern product teams. Then, we will lay out the blueprint for the AI-First Codebase and the new, "contract-driven" workflow it enables, creating a system where humans and AI can finally collaborate at scale.
Rethinking How We Build
While the undeniable rise in AI-driven engineering productivity has created a powerful narrative of progress, it masks two fundamental problems that prevent us from truly harnessing this new power. The first is a deep and worsening gap in how humans collaborate to build products. The second is a widespread illusion about what it means for AI to "write code." Until we solve both, we will remain stuck in a loop of frustrating half-measures.
The Collaboration Gap
To understand the stakes, consider the all-too-common story of a founder friend of mine. She's non-technical and frustrated. She can build new product ideas instantly on a platform like Lovable, creating fully working MVPs that feel like magic. Yet when she shows them to her engineering team, the answer is always the same: “This is great, but it’s a toy. It’s not production-ready, and it’ll take some time to port over to our repo.” With her hands tied, she resorts to marketing the disposable AI-generated app because the "real" production app can't keep pace.
This story captures the essence of the collaboration gap, a deep frustration that drives teams towards two radical, dead-end futures: one where the engineer is designed out of the process, and another where the PM is. But the real problem lies with the tools themselves, which force builders into a false choice between two kinds of traps. The first are the “walled gardens” like Lovable, which offer incredible speed but no code ownership, inevitably leading to a costly rewrite or migration. The second are the “hidden cliffs” of developer-first platforms like Replit, which offer code ownership but come with a steep “graduation tax” when you need to leave their magical infrastructure behind.
Both models excel at banging out MVPs for the 0-to-1 phase, but neither provides a sustainable path for the 1-to-1,000 journey of a real business.
The Illusion of AI Autonomy
As we diagnose the failures of these platforms, it’s tempting to look at the headlines coming out of big tech and believe we’re already AI-first. We see reports that AI is now writing 25%-30% of new code at companies like Google and Microsoft. The implication seems to be that we just need to put these tools in the hands of more engineers and the problem will solve itself, but this is an egregious misinterpretation of the metric.
Those numbers don’t represent autonomous AI agents conceiving of and writing huge portions of a company’s production software. In reality, they include even the most basic uses of AI, i.e. code autocomplete from Copilot or something similar. That’s a metric of keystrokes saved, not of independent creation. Autocomplete is admittedly a major AI innovation in coding, but it’s a far cry from saying “AI is writing our code.”
For AI to truly author production-grade code in a meaningful way, to autonomously take a feature from spec to PR, it needs more than a tab-complete or a prompt; it needs a system. Just as with humans, unleashing AI on a codebase with no established rules, design choices, or shared conventions will only dig you deeper into tech debt until you grind to a halt.
AI unlocks productivity, but for AI to thrive in production it needs rails. We need a new blueprint for how code is produced, moving from a human-centric process augmented by AI tools to an AI-centric process governed by human architects.
AI-First Principles
Our task now is to design the blueprint for this future team, where product managers and engineers both thrive by leveraging AI. Before laying out the technical details, we must first establish the core tenets that will guide our architecture and workflows.
Humans remain in charge: Humans must remain the architects of the product/system and the arbiters of taste. Unlike AI, they have vision, opinions, and a direct connection to customers’ needs, which in turn provides direction. Without clear direction, there’s nowhere to point all that newfound horsepower. However, we’ll need to rethink how that direction is given. We’re no longer just managing people; we’re directing an unbounded army of super-intelligent builders, which requires a new level of precision and clarity.
AI agents are first-class citizens: We must stop treating AI as a clever autocomplete tool and start treating it as a real team member. Like any human on the team, an AI agent needs clear goals, rich context, and rigid constraints to perform well. An AI unleashed on a codebase with no established rules will only produce chaos faster. Our system must be designed to provide this structure, turning the AI into a predictable, reliable, and incredibly efficient implementer.
Engineering rules still apply: AI does not repeal the fundamental laws of good software engineering. Regardless of who or what writes the code, you still need the tried and true disciplines of monitoring, alerting, comprehensive testing, robust security, scalable architecture, and more. We should not throw these out in the pursuit of speed; instead, we must adapt them for this new world and build a system where they are enforced automatically and in a way that AI understands and abides by.
Equipped with these principles, we choose our strategy: the codebase. It’s where all these tenets can be unified and enforced. It’s our gateway to “production.” The repo must become more than just a place to store code; it must become the central interface for collaboration between all humans and all AI agents. By choosing the codebase as our anchor, we are making a deliberate choice to have all contributors, from engineers to PMs to AI, work "in production" from the get-go.
To guide how we construct the blueprint of this codebase, we’ll lay out what good looks like. A good AI-first codebase:
Moves humans higher up the execution stack. They’re freed from low-level implementation to focus on the more leveraged work of architecture, product strategy, and defining the rules of the system.
Allows builders of all types, technical or not, to contribute. It creates a safe and structured environment where anyone can direct the AI execution layer. It defines a clear process, in the repo itself, for contributors to closely follow.
Is designed specifically for AI to move fast without breaking things. It provides the rails (explicit rules, workflows, and points of human intervention) that turn the AI from a chaotic, unsupervised force into a reliable, high-speed but controlled implementer.
With these principles and ideals established, we can now piece together our approach.
Autonomously Shipping With AI
Shifting from manually writing our own code to commanding AI to do it for us autonomously requires a much more structured approach. As tempted as we are, we can’t simply jump into implementation. Successfully delegating to an AI agent means getting the exact outcome we want from it, and to do that we first need to provide it with full context and very precise instructions. But even that’s not enough, because AI can still get it wrong.
To further improve the AI's success rate, we need to build in checkpoints for human oversight and contain anomalies and hallucinations within finely defined units of work, i.e. breaking the problem into smaller, incremental tasks. This isn’t a radical new invention; it’s what seasoned managers already do when delegating to others, only more precise and more verbose. And it’s an approach many AI builders have already found success with. We’ll formalize these best practices and make them our default approach.
The Builder’s Journey
Before we get into the intricacies of the AI-first repo, let’s walk through the experience we’re building towards. Imagine you, either a PM or an engineer, want to build a new feature, say, weekly email digests. You jump into Cursor and, in agent mode, type:
@proj-new weekly email digest
Cursor kicks into action, following all the instructions in that rules file, and begins an interactive dialogue with you to divine the feature's intent and success criteria. It asks clarifying questions, perhaps referencing existing parts of the codebase for context, to glean from you what exactly needs to be built. Once the agent has enough information, it generates the initial project plan here:
/projects/[yyyy-mm-dd]-weekly-email-digest/plan.md
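To give you a feel for what lands in that file, here’s a minimal, hypothetical sketch of a plan.md; its sections mirror the fields the agent interviews you for (user story, acceptance criteria, test scenarios, data model), but the exact format is whatever your team’s template dictates:

```markdown
# Project Plan: Weekly Email Digest

## User Story
As a user, I want a weekly email summarizing my activity so that I can stay up to date without checking the app daily.

## Acceptance Criteria
- A digest email is sent once a week to users who have opted in
- Users can opt out from their notification settings

## Test Scenarios
- Happy path: an active user receives a digest covering the previous week
- Edge case: a user with no activity that week receives no email
- Failure: a send failure is retried and logged

## Data Model Changes
- New table for per-user digest preferences (opt-in flag, send day)
```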
You review the generated project plan. It’s mostly right but you tweak a few things to get exactly what you want. You might even continue the dialogue with the agent to give it more context or instruction so it can regenerate the plan. When you’re happy with the plan, you type:
@proj-taskify @plan.md
Again, the agent goes off to do its thing, this time instructed to turn your project plan into a finely decomposed set of tasks. It ensures each task is a standalone change that can be safely shipped (per your rules file) and is scoped appropriately so as not to overwhelm the person reviewing the PR. It puts the tasks here:
[proj-dir]/task-01-create-tables.md
[proj-dir]/task-02-create-aggregation-job.md
...
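For illustration, the first of those task files might look something like this (a hypothetical sketch using the Depends On / Goal / Implementation Steps fields the agent is instructed to produce):

```markdown
# task-01-create-tables

**Depends On:** (none)

**Goal:** Create the database tables needed to store digest preferences and send history.

**Implementation Steps:**
- [ ] Add a table for per-user digest preferences (opt-in flag, send day)
- [ ] Add a table recording each digest send, for idempotency and auditing
- [ ] Write the migration and update the schema docs per /.constitution/architecture.md
```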
You review it and, once you’re happy with it, you create a project branch and submit a PR. A teammate, ideally an engineer, reviews your PR and, after some exchanges and tweaks, approves it, and your PR lands. Now the fun begins! You tell your agent to work on the first task by typing:
@proj-execute @task-01-create-tables.md
From here, you can probably extrapolate how things will play out, which is what I wanted to accomplish: giving you a feel for how differently the AI-first flow works. But this exercise likely raised more questions in your mind. Let’s take a step back and explore them.
Key Challenges
Your first observation might’ve been that, as an engineer, you’re going to be doing a lot less of what you love (writing code) and more of what you might hate (writing docs), and you’d be right. PMs’ jobs won’t really change much at all, but for engineers, moving to an AI-first world means stepping away from coding and putting on your architect’s hat. You’ll still be in the code as a steward of the codebase and for diagnosing the occasional corner case, but as AI improves, the harsh reality is that writing code yourself will become a hobby rather than a job.
The other thing you might’ve noticed is that you’re now also reviewing docs, not just code. Unfortunately, I have more bad news: you may not be reviewing code for much longer either. With an unlimited army of agents, the volume of code produced will far outstrip the capacity of engineers to review it. Engineers will again be the bottleneck, this time as code consumers rather than producers. The solution is either to staff up with more engineers for the sole purpose of alleviating this bottleneck, which is problematic in many ways (they’d really just be data labelers or content moderators), or to approach code review very differently. Put another way, you will need to get creative with your CI/CD.
Finally, you’re probably also wondering what’s in those rules files. This is where much of the secret sauce lives, and there’s no one right solution as each team will have their own favorite LLM and their own approach to prompting. However, the high level strategy is the same:
Each job-oriented rules file (e.g. proj-new.md) is an “agent” charged with a specific job, given specific instructions, and exposed to the right context
Agents can reference other rules files, so refactor them (e.g. style.md, testing.md, architecture.md, etc.) to keep things DRY and organized; see the sketch after this list
Agents and rules files are living documents that should be continually updated and improved, i.e. instead of directly fixing AI-generated code, you’ll prompt-engineer your way to better agents and rules files
Make the AI’s job (and yours) easier by leveraging a monolithic architecture to keep all the necessary context easily within reach; see my prior post on monolith vs microservices
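To illustrate the cross-referencing, here’s a rough sketch of how the top of a job-oriented rules file might point agents at the shared files; the paths and wording are illustrative assumptions, not a prescribed format:

```markdown
# Shared context (repeated at the top of each proj-* rules file)

Before doing anything else, read and follow:
- /.constitution/architecture.md  (where logic lives, approved patterns)
- /.constitution/style.md         (naming, comments, lint/format commands)
- /.constitution/testing.md       (testing philosophy, coverage targets, tooling)

If an instruction in this file conflicts with the constitution, the constitution wins.
```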
One last thing worth calling out: while Cursor may seem easy and obvious to engineers, we do need to acknowledge that someone non-technical like a PM might struggle. Still, the cost of learning Cursor, GitHub, and other basics is well worth the gains of being able to harness AI at scale. The agreement to require everyone to learn these things should be made up front.
Now let’s dive into the blueprint for this and how we’ll approach getting code into production.
The Blueprint
The workflow we just explored isn’t magic; it’s the result of a highly structured environment. For an AI agent to reliably translate ideas into a plan, a plan into a series of tasks, and tasks into production-ready code, it must be guided by a clear set of rules. I call this the AI constitution, a collection of living, machine-readable documents that form the blueprint of an AI-first codebase. We’ll put this all in the /.constitution directory at the root of the repo.
AI Constitution
While each team will evolve its own specific rules, I’ll lay out a reasonable template structure most teams would find useful:
architecture.md: This is the master plan for the whole repo and the systems it represents, a bird’s-eye view of how everything fits together and the high-level job of each component. It contains the rigid architectural patterns the AI must follow. For example, it might mandate a modular monolith, define where business logic must reside (e.g. in a service layer, not in API routes), and specify the approved patterns for data access. Define here the 80/20 rules that will ensure your agents get the broad strokes right.
style.md: This document enforces consistency. It details everything from variable naming conventions (e.g. camelCase) and comment requirements (JSDoc for all public functions) to the specific linting and formatting rules enforced by tools like ESLint and Prettier. This ensures all code, whether written by a human or an AI, looks and feels the same. As an added measure, you can also provide linting and other scripts (e.g. `npm run lint`) for the agents to validate their output.
testing.md: This outlines the non-negotiable standards for quality. It defines the team’s testing philosophy (e.g. “unit tests should be granular and fast with mocks”), sets requirements (“aim for test coverage of at least 80%”), and specifies the tools to be used (e.g. Vitest, Playwright). See the sketch after this list.
operations.md: This file addresses the crucial “day two” concerns of running software in production. For example, it mandates that every new feature include an `oncall.md` playbook, defines the standards for structured logging and monitoring, and outlines core security protocols.
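As a concrete illustration, here’s a minimal sketch of what a testing.md might contain; the thresholds and tools are just the examples mentioned above, and the npm script names are placeholders for whatever your repo actually defines:

```markdown
# testing.md

## Philosophy
- Unit tests should be granular and fast, with mocks for external services
- End-to-end tests cover critical user journeys only

## Requirements
- Aim for test coverage of at least 80% on new code
- Every task PR must add or update tests for the behavior it changes

## Tooling
- Unit/integration: Vitest (`npm run test`)
- End-to-end: Playwright (`npm run test:e2e`)

## Agent instructions
Run the commands above before declaring a task complete, and paste any failures into the PR description.
```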
Keep in mind that this constitution, the rules by which your agents operate, will continually evolve as you build. Update it as you go. Also, don’t be afraid to build agents whose sole job is to keep the constitution updated.
The Agentic Workflow
With the constitution in place to keep AI in check, we now look to the workflow that will turn our ideas into shipped product. Here’s an example sequence that could generally work for most teams:
Human: Create a feature branch
AI: Generate and finalize a project plan and required tasks
Human: Review PR of plan + tasks
AI: Agentic generation loop (per task):
  Agent: Tackles the task and submits a PR
  Agent: CI/CD runs with agentic checks
  On success, deploy a preview; on failure, loop back for agent iteration
  Human: Optional human review to close the PR
Human: Merge the project branch when all tasks are done
For now, we don’t yet have a command-line version of Cursor, so much of this will be orchestrated by a human. But we can still add a fair amount of automation, particularly in CI/CD (more later).
For our Cursor agents, we’d need the following:
proj-new.md: Turns an idea into a detailed plan
proj-taskify.md: Turns a plan into a sequence of self-contained, finely scoped tasks
proj-execute.md: Takes a task and implements it
These make up the core of our agentic workflow, and as mentioned before, they reference the constitution to ensure our agents abide by our production-ready rules. Beyond these core agents, you can also build additional agents focused on specific concerns to get the most robust output, e.g. proj-codereview.md, proj-test.md, proj-prettify.md.
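To round out the trio, here’s a rough, hypothetical sketch of what a proj-execute.md could look like, following the same structure as the proj-new.md sample below; treat it as a starting point to adapt, not a proven recipe:

```markdown
# ROLE
You are an "Implementation Agent". You turn exactly one task file into a working, reviewable change.

# CONTEXT
The user invokes you with `@proj-execute` followed by a single `task-##-*.md` file. You have full access to the codebase and to `/.constitution`.

# PROCESS
1. Read the task file, its parent `plan.md`, and the relevant constitution files.
2. Implement only the Implementation Steps listed in the task; do not expand scope.
3. Run the validation scripts the constitution defines (lint, tests) and fix any failures.
4. Summarize what changed, which constitution rules you applied, and anything you were unsure about.

# STOPPING CONDITION
Stop once the task's steps are complete and local checks pass. Do not start the next task.
```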
At this point you might be thinking, what’s actually inside one of these agent files? I asked Gemini 2.5 Pro to help me write proj-new.md and this is what it came up with. After some tweaking, I was surprised how much it could do in Cursor agent mode (YMMV of course, but I found success using Gemini 2.5 Pro). Eventually, I’d like to put up a GitHub project you can fork that has this all ready to go. For now, here’s a sample:
# ROLE
You are a "Product Scaffolding Agent". Your personality is that of a helpful, expert senior product manager and engineer combined. You are collaborative, ask clarifying questions, and are an expert at translating user needs into a complete, actionable engineering plan.
# CONTEXT
You have full access to the user's current codebase. The user has invoked you with the trigger `@proj-new.md` followed by a feature name.
# OBJECTIVE
Your primary objective is to conduct a conversational interview with the user to gather all necessary information, and then generate a **complete feature plan**. This plan consists of two types of artifacts:
1. A single, comprehensive `plan.md` file.
2. A series of sequential, granular `task-##.md` files that break down the work.
The final `plan.md` must strictly adhere to the format defined in `/.constitution/template_project_plan.md`.
# PROCESS
Follow this sequence of steps precisely. Do not move to the next step until the previous one is complete.
**Step 1: Greet, Acknowledge, and Scan.**
- Greet the user and confirm the feature name.
- Perform a quick scan of the existing `/projects` directory to look for related keywords.
- Formulate an intelligent opening question based on your findings to show you are code-aware.
**Step 2: Elicit Core Requirements (Conversational).**
- Ask for the core **User Story**.
- Ask for the key **Acceptance Criteria**.
- Ask for the **Test Scenarios** (happy path, edge cases, failures).
- Discuss and confirm the required **Data Model** changes.
**Step 3: Summarize and Confirm.**
- Provide a concise summary of everything you've gathered for the plan.
- Ask the user for final confirmation to proceed. Use the exact phrase: "Does this plan look correct? If so, I will generate the `plan.md` file and then proceed to break it down into tasks."
**Step 4: Generate the Plan File.**
- Upon user confirmation, generate the complete `plan.md` file at the path `/projects/[yyyy-mm-dd]-[feature-name]/plan.md`.
**Step 5: Decompose Plan into Sequential Tasks.**
- After generating the `plan.md`, immediately analyze it and decompose the work into a sequence of small, logical, and self-contained tasks.
- For each task, create a new file named `task-##-[task-description].md` (e.g., `task-01-create-tables.md`, `task-02-create-jobs.md`, etc.) in the same feature directory.
- Each task file should be simple and clear, containing:
- `**Depends On:**` (Optional: The ID of a previous task, e.g., `task-01-create-tables`)
- `**Goal:**` A one-sentence description of the task.
- `**Implementation Steps:**` A technical, checklist-style breakdown of the work.
# STOPPING CONDITION
Your work is finished once all `plan.md` and `task-##-*.md` files have been created and saved. **DO NOT, under any circumstances, proceed to write any implementation code (`.ts`, `.tsx`, `.py`, etc.).** Your sole responsibility is generating the markdown plan. Announce that the plan is complete and that the next step is a human 'Planning Review'.
Automated Deployments
This brings us to the creative CI/CD needed to solve the code-review bottleneck. We’ll pair our well-defined constitution with a smart assembly line that enforces these rules and ensures quality.
The core of the strategy revolves around the feature branch. When the "Planning PR" is approved, a long-lived branch like feature/weekly-email-digest is created. Every subsequent task PR generated by the AI is merged into this branch. Each merge triggers an automated deployment to a dedicated preview environment, powered by a platform like Cloudflare Workers. This preview gets its own unique URL, and we configure it to connect to its own isolated, ephemeral test database.
This live preview environment is the ultimate quality gate. It provides a stable target for a full suite of automated end-to-end tests, which become the primary method of validation. The human review happens on the plan; the automated review happens on the code's correctness. Only after all tasks are complete and the feature branch is fully green is it ready for a final merge into main, safely behind a feature flag to decouple deployment from release. Of course, as a safe default, you can always start with a human final review, but over time your goal is to make it unnecessary.
Before reaching the preview environment, your PR may run into issues and need a few revisions. This is where we can develop review agents to automate the iterations. Defining them in the repo itself means they conveniently have access to all the context they need: the code, the constitution, the plan and tasks, and any other agents they need to collaborate with. Claude Code is perfect for making your CI/CD intelligent since it supports headless mode and can essentially act like a command-line Cursor. A simple GitHub Action could trigger the iterations needed to get your PR ready for the preview environment.
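As one illustration, a review agent’s rules file might look something like the sketch below; the checklist and lint command are assumptions borrowed from the constitution examples above, and the CI wiring (e.g. a GitHub Action invoking Claude Code in headless mode) is left to your platform of choice:

```markdown
# proj-codereview: automated task-PR review

# ROLE
You review a single task PR against the constitution before it is merged into the feature branch.

# CHECKS
- Architecture: changes respect the boundaries defined in /.constitution/architecture.md
- Style: code follows /.constitution/style.md; run `npm run lint` and report any failures
- Testing: new behavior has tests per /.constitution/testing.md
- Scope: the diff implements only the referenced task file, nothing more

# OUTPUT
Post a review comment listing each check as pass/fail with file and line references.
If any check fails, propose concrete fixes the implementation agent can apply on its next iteration.
```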
Conclusion
Ultimately, the true promise of AI in software development isn’t just about making engineers faster, but about making the very act of building more accessible and collaborative. We began with the story of the frustrated founder, unable to bridge the gap between her clear vision and the team's production codebase. The AI-First Codebase, by anchoring on the repository as a shared interface, finally solves her problem. It creates a system where a product manager's well-written plan.md is no longer a request in an external ticket, but a direct, first-class input to the manufacturing process. This redefines a “builder” as anyone who can clearly articulate a problem and a plan, not just those who can write the code.
But this framework does more than just close the collaboration gap; it creates a system designed for continuous, compounding improvement. The "AI Constitution" is a living document. Every time an AI-generated pull request reveals a flaw or an opportunity, the human team's first reaction should not be to just fix the code, but to improve the constitution or the agentic recipe that allowed the mistake. This creates a powerful flywheel: an improvement to architecture.md makes every future feature more robust, and a refinement to a prompt recipe makes every future plan more accurate. The team’s intelligence is continuously codified into the system itself, creating an asset that compounds in value over time, which is the ultimate competitive advantage.
Talk is cheap, but code is convincing. To that end, I’m really hoping to build out a skeleton of this and put it up on GitHub for everyone to play with. If you're interested, be sure to subscribe for updates!