Every new AI IDE is the same model with a different system prompt
Cursor, Kiro, Windsurf, Antigravity, Trae. All forks of VS Code, all wrapping one of three model APIs, all selling a long system prompt that does not move the model's ceiling. The vibe-coding tech-debt numbers were always pointing somewhere else.
AI-assisted post: drafted with help from Claude, edited and fact-checked by Mart.
Hannibal Buress on The Eric Andre Show, "New Year's Eve Spooktacular," 31 December 2012. Meme background on Know Your Meme.
Forks all the way down
The funniest thing about the AI IDE gold rush is that almost every "new" editor you have read a review of in the last twelve months is, underneath the marketing, the same editor. Cursor, Windsurf, Kiro (AWS, July 2025), Antigravity from Google, Trae from ByteDance: they are all forks of Microsoft's VS Code. The forking is documented well enough that Visual Studio Magazine ran a feature on it in January 2026 titled "What a Difference a VS Code Fork Makes." The implied joke is that the difference is mostly cosmetic.
Cursor by itself is now reportedly at $2B ARR, with 2 million users and half the Fortune 500 as paying customers, and Anysphere keeps using the phrase "fastest-growing SaaS in history." It is an extraordinary commercial moment, and it deserves to be examined honestly. If you scrape off the marketing surface, what Anysphere ships is a maintained VS Code fork, a model picker that calls one of three APIs, and a long block of natural-language instructions that reaches the model as a system prompt rather than as anything the user wrote. That last piece is the part that gets called "the product."
```mermaid
flowchart TD
    VSC["VS Code<br/>(Microsoft, MIT-licensed)"]
    F["Fork the editor"]
    S["Re-skin chrome,<br/>add branded panels"]
    P["Bolt on long system prompt"]
    A["Wrap an agent loop<br/>over the same tool set<br/>(file edit, terminal, search)"]
    M["Charge $20–40/month"]
    API1["Anthropic API"]
    API2["OpenAI API"]
    API3["Google API"]
    VSC --> F --> S --> P --> A --> M
    API1 -.-> A
    API2 -.-> A
    API3 -.-> A
```
Same recipe, different chrome:
VS Code forks:
| IDE | Maker | Base | Model(s) | Pricing |
|---|---|---|---|---|
| Cursor | Anysphere | VS Code fork | Anthropic / OpenAI / Google (picker) | $20/mo Pro |
| Windsurf | (ex-Codeium) | VS Code fork + plugins for 40+ IDEs | Multi-provider, Cascade agent | $15/mo Pro |
| Kiro | AWS | VS Code fork, spec-driven workflow | Claude Sonnet 4.6 (default) | AWS-priced |
| Antigravity | Google | VS Code fork | Gemini (default) | Google-priced |
| Trae | ByteDance | VS Code fork | (not listed) | Free tier + paid |
Plugin / native editors (not VS Code forks):
| Editor | Maker | Base | Model(s) | Pricing |
|---|---|---|---|---|
| GitHub Copilot | Microsoft / GitHub | VS Code extension + JetBrains + Neovim plugins | OpenAI + Anthropic (added 2024) | $10/mo Individual |
| Zed | Zed Industries | Native Rust editor, not a fork | Anthropic / OpenAI / local via Ollama | Free + Pro tier |
| JetBrains AI Assistant | JetBrains | IntelliJ family, not a fork | OpenAI / Anthropic / Google | Bundled with IDE subscription |
| Continue.dev | Open source | VS Code + JetBrains extension | Any model, BYO key | Free |
Terminal-first (no editor at all):
| Tool | Maker | Base | Model(s) | Pricing |
|---|---|---|---|---|
| Claude Code | Anthropic | Terminal CLI, editor-agnostic | Claude only | Usage-priced |
| Codex CLI | OpenAI | Terminal CLI | OpenAI only | Usage-priced |
| aider | Open source | Terminal CLI, editor-agnostic | Any model, BYO key | Free |
It is not only VS Code
If the wrapper thesis were confined to VS Code forks, it would be easier to dismiss as a Cursor-specific complaint, but the same convergence is happening in editors that have no connection to VS Code at all. The Neovim plugin ecosystem now hosts at least half a dozen mature AI integrations (avante.nvim, codecompanion.nvim, copilot.lua, parrot.nvim, and a long tail of smaller plugins), most of which wrap Anthropic, OpenAI, or Google APIs, directly or via GitHub Copilot, behind some combination of a chat panel, an inline-edit command, and an agent loop functionally similar to Cursor's. The user experience is different (modal editing, keyboard-first ergonomics, no mouse-driven panels), but the artefact produced on a given task is not.
The same pattern shows up in Emacs through gptel, copilot.el, and ellama. Zed, an editor written from scratch in Rust by alumni of the Atom team, ships its own AI assistant tied to the same three APIs and adds local-model support via Ollama. JetBrains AI Assistant is bundled into IntelliJ, PyCharm, GoLand, and the rest of the JetBrains family (an entirely different IDE substrate, owned by an entirely different company) and routes user requests to Anthropic, OpenAI, and Google. None of these are VS Code forks. All of them converge on the same UX shape: chat panel, inline-edit command, agent loop, system prompt, model picker.
Terminal-first tools take the thesis further by stripping the editor entirely. Claude Code, Codex CLI, and aider operate as command-line agents over a project directory with no graphical UI at all. The workflow is: the user issues a command, the model reads relevant files, the model proposes edits, the user accepts or rejects them. The output of an aider session on a real engineering task is similar enough to the output of a Cursor session on the same task that the editor's contribution becomes hard to measure separately from the model's contribution.
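To make that concrete, here is a minimal Python sketch of the command, read, propose, accept-or-reject cycle. Everything in it is hypothetical: `Edit`, `run_turn`, and `stub_model` are illustrative names, and `stub_model` is a canned stand-in for a call to a provider API, not any tool's real implementation. The thing to notice is that the editor only shows up as the `approve` callback.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Edit:
    """A proposed change (hypothetical shape): replace a whole file's contents."""
    path: str
    new_text: str


# A "model" is anything that maps (command, visible files) to proposed edits.
Model = Callable[[str, dict[str, str]], list[Edit]]


def run_turn(
    files: dict[str, str],
    command: str,
    model: Model,
    approve: Callable[[Edit], bool],
) -> dict[str, str]:
    """One pass of the loop: the model reads files, proposes edits,
    and only user-approved edits are applied to a copy of the project."""
    proposals = model(command, dict(files))  # model sees a snapshot, not the live dict
    updated = dict(files)
    for edit in proposals:
        if approve(edit):  # the editor (or a terminal y/n prompt) is just this callback
            updated[edit.path] = edit.new_text
    return updated


def stub_model(command: str, files: dict[str, str]) -> list[Edit]:
    """Hypothetical stand-in for a provider API call: rewrites print( to logging.info(."""
    return [
        Edit(path, text.replace("print(", "logging.info("))
        for path, text in files.items()
        if "print(" in text
    ]


if __name__ == "__main__":
    project = {"app.py": "print('hi')\n", "util.py": "x = 1\n", "cli.py": "print('cli')\n"}
    # Reject the proposed edit to cli.py, accept everything else.
    result = run_turn(project, "use logging", stub_model, lambda e: e.path != "cli.py")
    print(result)
```

In this toy framing, swapping Cursor for aider means swapping a GUI diff view for a terminal prompt inside `approve`; the `model` argument does not change.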
The wrapper conversation is not really about VS Code. The wrapper is a wrapper regardless of substrate, and the leverage point in the productivity curve sits at the model layer, which is where it has been for several years now.
The system prompt is the product
None of these system prompts were meant to be public, but a single GitHub repo now indexes leaked copies of more than 28 of them (Cursor, Kiro, Windsurf, Trae, Devin, Lovable, Replit, and so on), and the repo sits at around 134K stars as of May 2026. If you have never read one, it is genuinely worth opening a few side by side. They are mostly very long lists of scaffolding rules: be concise, cite line numbers, do not invent file paths, prefer this tool for that situation, never edit binary files, and so on. Pages of it. Tens of thousands of tokens in some cases.
Drew Breunig captured the dynamic neatly in his February post on the topic: "the model sets the theoretical ceiling; the system prompt determines whether the peak is reached." That is roughly the most honest framing of what these IDEs are doing. They are not training new models. They are writing very long instructions for someone else's model and hoping the instructions guide it closer to its own upper limit. Sometimes that works, sometimes it backfires. Cursor's prompt, for instance, contains a rule along the lines of
Your intermediate outputs must be kept short and concise.
which silently overrides any "be more verbose" preference a user sets in their own rules file, as Cursor's own community forum has been pointing out for months. The hidden instruction wins because it sits higher in the prompt hierarchy than anything the user writes.
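A deliberately simplified Python model of why the hidden rule wins. It assumes the wrapper sends its own prompt in the system role and the user's rules file as an ordinary user-role message, and that conflicts resolve strictly by role priority. Real models only approximate that hierarchy, and this is not how Cursor actually assembles its requests; `Message`, `build_messages`, and `effective_verbosity` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed instruction hierarchy: lower number = higher priority.
# Real models follow this only as a learned tendency, not a hard rule.
ROLE_PRIORITY = {"system": 0, "user": 1}


@dataclass
class Message:
    role: str
    content: str
    verbosity: Optional[str] = None  # "short" or "long" if the message sets a preference


def build_messages(vendor_prompt: str, user_rules: str, task: str) -> list[Message]:
    """Assemble a request the way a wrapper might (hypothetical layout).
    Preference tags are hard-coded here; a real model infers them from the text."""
    return [
        Message("system", vendor_prompt, verbosity="short"),  # hidden from the user
        Message("user", user_rules, verbosity="long"),        # the user's rules file
        Message("user", task),
    ]


def effective_verbosity(messages: list[Message]) -> Optional[str]:
    """Return the preference set by the highest-priority message that sets one."""
    prefs = [m for m in messages if m.verbosity is not None]
    if not prefs:
        return None
    # min() keeps the earliest message on ties, so within a role the first preference wins.
    return min(prefs, key=lambda m: ROLE_PRIORITY[m.role]).verbosity


if __name__ == "__main__":
    msgs = build_messages(
        "Your intermediate outputs must be kept short and concise.",
        "Be more verbose in explanations.",
        "Refactor utils.py",
    )
    print(effective_verbosity(msgs))
```

The user-role preference only takes effect when no system-role message sets one, which is the asymmetry the forum threads are describing.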
The point worth holding onto is that this whole layer is text, and paid-for text at that. The model weights underneath are untouched by it.
The model is the ceiling, and the wrappers know it
A system prompt cannot teach a model something it does not already know. It can route between known behaviors, shape outputs, prevent certain failure modes, give the model a persona. None of those operations expand the model's capability surface. The same Claude that drives a 200-line refactor inside Cursor, a refactor we will then merrily merge to master, still cannot count the R's in "strawberry" for the same tokenization reasons I wrote about in the previous post. The same Gemini that lives behind Antigravity will still tell you to walk 50 meters to a car wash rather than drive the car you wanted washed. There is no IDE wrapper that fixes either of those.
When Carnegie Mellon ran a frontier model through a battery of routine office tasks earlier this year and it failed somewhere around 65 to 70 percent of them, the headline-grabbing number was that frontier failure rate. The quieter observation in the same line of work is that the rate did not move when the model was put behind a fancier UI. Gartner's complaint about "agent washing," where they estimate only about 130 of the thousands of vendors marketing autonomous agents are actually shipping anything that meets the technical definition, lives in the same family of observations. Agent-shaped wrapping does not make a non-agent into an agent.
The tech-debt numbers are talking about the model
If the IDE wrappers were actually bending the productivity curve, the numbers people are now collecting on vibe-coded codebases would show a meaningful spread between tools. They do not. From Pixelmojo's 2026 compilation of recent audits: roughly 63 percent of developers report spending more time debugging AI-generated code than they would have spent writing the code themselves; AI-assisted development is running about 12 percent more expensive than baseline in year one, and without active debt remediation, year-two maintenance costs are landing around 4x baseline; around 45 percent of AI-generated code is being found to contain vulnerabilities, and a 2025 audit of 1,645 Lovable apps found 170 of them with critical security vulnerabilities.
These numbers move with the model and with how the model is being used. They do not move with the choice of IDE. If you take the same engineer and the same Claude and the same task, the codebase you get out is not appreciably different between Cursor and Kiro, except for who sees your code on the way and how much you paid for the privilege. The wrapper is selling comfort and ergonomics, which are real, but ergonomics are not the same thing as capability. Why those productivity numbers refuse to move even as throughput rises is the subject of the next post in the series, which picks up on cognitive debt and the comprehension cost that LLM-assisted workflows are quietly running on engineering teams.
What the next IDE will look like
I am willing to predict, with reasonable confidence, that the next "new AI IDE" announcement will be a VS Code fork, will wrap one of the same three model APIs (Anthropic, OpenAI, Google), and will ship with a longer system prompt and a more elaborate agent loop than its predecessors. The pitch will use words like "principle-driven" or "constitution-driven" or something else that connotes seriousness. The substrate will be identical. This is not a complaint, exactly. The substrate is genuinely good, the system prompts are genuinely useful, and ergonomics matter.
The structural change in the LLM-coding stack between 2024 and 2026 has not happened at the editor layer. It has happened at the tool-use layer. Anthropic's Model Context Protocol, announced in late 2024 and now adopted by Cursor, Claude Code, Zed, JetBrains, Continue, and a long list of others, standardises how external tools (file systems, project search, git, browsers, custom backends) talk to a model. The effect on the IDE market is that tool integrations, which used to be IDE-specific work, are becoming portable across editors: a custom MCP server written for one IDE should work in any other MCP-capable client, largely without modification. The substance of a 2026 AI IDE is fairly captured by three pieces: a system prompt, an agent harness (tool loop, context compaction, retry, memory), and a tool registry. Two of those three, the harness and the tool registry, are becoming commodity; the system prompt is the one piece still nominally proprietary, and the leaked-prompts repository is empirical evidence that even those are converging.
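A minimal sketch of what "portable" means in practice. MCP messages are JSON-RPC 2.0; a client discovers tools with `tools/list` and invokes one with `tools/call`, and the Python below handles just those two message shapes in-process. It is not the official MCP SDK: the transport, the `initialize` handshake, and capability negotiation are omitted, and `ToolRegistry` and the `search_project` tool are hypothetical.

```python
import json
from typing import Any, Callable


class ToolRegistry:
    """Hypothetical in-process MCP-style server: handles tools/list and tools/call only."""

    def __init__(self) -> None:
        self._tools: dict[str, tuple[dict[str, Any], Callable[..., str]]] = {}

    def register(self, name: str, description: str,
                 input_schema: dict[str, Any], fn: Callable[..., str]) -> None:
        # Tool definition fields (name, description, inputSchema) follow the MCP spec.
        spec = {"name": name, "description": description, "inputSchema": input_schema}
        self._tools[name] = (spec, fn)

    def handle(self, raw: str) -> str:
        """Take one JSON-RPC 2.0 request string and return the response string."""
        req = json.loads(raw)
        req_id = req.get("id")
        method = req.get("method")
        if method == "tools/list":
            return self._reply(req_id, {"tools": [spec for spec, _ in self._tools.values()]})
        if method == "tools/call":
            params = req.get("params", {})
            name = params.get("name")
            if name not in self._tools:
                return self._error(req_id, -32602, f"Unknown tool: {name}")
            _, fn = self._tools[name]
            text = fn(**params.get("arguments", {}))
            # Tool results come back as a list of typed content blocks.
            return self._reply(req_id, {"content": [{"type": "text", "text": text}], "isError": False})
        return self._error(req_id, -32601, "Method not found")

    @staticmethod
    def _reply(req_id: Any, result: dict[str, Any]) -> str:
        return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})

    @staticmethod
    def _error(req_id: Any, code: int, message: str) -> str:
        return json.dumps({"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}})


# A hypothetical project and tool, standing in for a real file-system search.
PROJECT = {"src/app.py": "def main():\n    run()\n", "README.md": "Run main() to start.\n"}


def search_project(query: str) -> str:
    """Return newline-separated paths of files whose text contains `query`."""
    return "\n".join(path for path, text in sorted(PROJECT.items()) if query in text)


registry = ToolRegistry()
registry.register(
    "search_project",
    "Find project files containing a string.",
    {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]},
    search_project,
)

call = {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
        "params": {"name": "search_project", "arguments": {"query": "main"}}}

if __name__ == "__main__":
    print(registry.handle(json.dumps(call)))
```

Because the client only ever sees `name`, `inputSchema`, and JSON content blocks, the same server can in principle sit behind any MCP-capable editor; what differs between editors is the prompt and harness wrapped around the call.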
But the leverage point in the curve is the model, and that is why Yann LeCun raising a billion dollars for AMI Labs on an explicitly anti-LLM, world-model thesis is more interesting to me, long-term, than Cursor's $2B ARR. Cursor is repackaging a ceiling. LeCun is, with characteristic stubbornness, trying to build a new one. One of those bets eventually moves the curve; the other charges $20 a month to redecorate it.
If you are evaluating yet another AI IDE, the honest evaluation is uncomfortable: it will be roughly as good as the model it runs on, plus or minus the quality of someone else's hidden system prompt. If you already have direct access to the same model, you already have most of the value. The wrapper is mostly the wrapping.