Cursor Composer 2 Review: Benchmarks, Pricing, and the Kimi K2.5 Controversy Explained
On March 18, 2026, Cursor launched Composer 2, an AI coding model that immediately shook up the developer tooling ecosystem. Frontier-level performance, aggressive pricing, remarkable inference speed: on paper, Composer 2 has everything it takes to become the new standard in AI-assisted coding. But just hours after the announcement, an unexpected controversy erupted. A developer discovered that the model was built on Kimi K2.5, a Chinese open-source model developed by Moonshot AI, a detail Cursor had carefully omitted from its official blog post.
The incident raises fundamental questions about transparency in the AI industry, the growing role of Chinese open-source models in the global ecosystem, and the increasingly blurred line between "building your own model" and "fine-tuning someone else's." Here is everything you need to know.
What Is Cursor Composer 2 and Why Is Everyone Talking About It?
Cursor, developed by San Francisco-based startup Anysphere Inc., is a code editor built on a fork of Visual Studio Code and enhanced with deeply integrated AI capabilities. Founded in 2022 by Michael Truell, Aman Sanger, Sualeh Asif, and Arvid Lunnemark, the company has experienced a meteoric rise. By November 2025, Anysphere was valued at $29.3 billion after raising $3.38 billion from investors including a16z, Thrive Capital, and DST Global. All four cofounders, under 30, made it onto the Forbes 30 Under 30 list.
Composer is the name of Cursor's proprietary model family, specifically designed for AI-assisted coding. After Composer 1 and Composer 1.5, this second major version represents a significant leap in performance. Cursor describes it as "frontier-level at coding," meaning it performs at the level of the best existing models for programming tasks.
What Makes Composer 2 Different From Its Competitors
Composer 2 is not simply a language model applied to code. It is an agentic model, capable of autonomously executing sequences of hundreds of actions: navigating a project, modifying multiple files simultaneously, running terminal commands, and iterating on its own results. This agentic approach, combined with a self-summarization technique for managing long contexts, allows it to tackle complex tasks that go far beyond simple autocompletion.
The model also delivers impressive inference speed, exceeding 200 tokens per second, making it one of the fastest coding models on the market. In an interactive code editor, that speed fundamentally transforms the development experience.
How Does the Kimi K2.5 Model Behind Composer 2 Actually Work?
To understand Composer 2, you first need to understand its foundation: Kimi K2.5, developed by Moonshot AI, a Chinese AI startup. Kimi K2.5 is an open-source Mixture of Experts (MoE) model, an architecture that achieves high performance while keeping computational costs under control.
Kimi K2.5 Technical Architecture
The numbers are staggering. Kimi K2.5 totals 1 trillion parameters (1T), but only 32 billion are active for any given request, thanks to the MoE system. The model uses 384 total experts, with 8 experts (plus 1 shared expert) activated per token. It has 61 layers of depth, a hidden attention dimension of 7,168, and uses MLA (Multi-head Latent Attention) with SwiGLU activation. Its context window reaches 256,000 tokens.
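To make the MoE figures concrete, here is a minimal sketch of top-k expert routing. Only the sizes (384 routed experts, 8 activated per token, a 7,168-dimensional hidden state) come from the published Kimi K2.5 specs; the gating mechanism itself (a single linear router followed by a softmax over the selected logits) is a generic MoE illustration, not Moonshot's actual implementation.

```python
import numpy as np

# Illustrative top-k expert routing for a Mixture of Experts layer.
# Sizes mirror the published Kimi K2.5 figures; the routing logic is
# a generic sketch, not Moonshot's implementation.
N_EXPERTS = 384   # routed experts
TOP_K = 8         # experts activated per token
HIDDEN = 7168     # hidden dimension

rng = np.random.default_rng(0)
router_weights = rng.standard_normal((HIDDEN, N_EXPERTS)) * 0.02

def route(token_hidden: np.ndarray):
    """Pick TOP_K experts for one token and weight their outputs."""
    logits = token_hidden @ router_weights      # (N_EXPERTS,)
    top = np.argsort(logits)[-TOP_K:]           # indices of the best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                        # softmax over the chosen 8
    return top, gates

token = rng.standard_normal(HIDDEN)
experts, gates = route(token)
# Only 8 of 384 experts run for this token; the shared expert would be
# applied unconditionally on top of these.
```

The point of the architecture is visible in the last two lines: a forward pass touches 8 experts out of 384, which is how a 1T-parameter model keeps only 32B parameters active per request.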
What particularly distinguishes Kimi K2.5 is its natively multimodal nature. The model was pre-trained on approximately 15 trillion (15T) mixed tokens combining text and vision, thanks to the MoonViT component. This allows it to understand visual interfaces, generate code from design mockups, and orchestrate agents capable of processing visual data.
The model also introduced the concept of Agent Swarm, an approach where it decomposes complex tasks into parallel sub-tasks executed by dynamically instantiated, domain-specific agents.
From Kimi K2.5 to Composer 2: Cursor's Training Process
Cursor did not simply use Kimi K2.5 out of the box. According to statements from Lee Robinson, Cursor's vice president of developer education, approximately 75% of the compute invested in Composer 2 came from Cursor's own training, with only 25% from the base model.
This process involved two main steps. First, continued pretraining on the base model, aimed at strengthening its coding-specific capabilities. Then, large-scale reinforcement learning, four times the volume of the base, focused on long-horizon coding tasks. This combination is what enables Composer 2 to solve problems requiring hundreds of sequential actions.
Training also incorporated self-summarization, a technique that allows the model to automatically condense the working context when it grows too large, rather than simply truncating the conversation history.
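Cursor has not described its self-summarization mechanism in detail, but the idea can be sketched as follows: whenever the running transcript exceeds a token budget, the oldest steps are folded into a model-written summary instead of being cut off. Both helper functions here are crude stand-ins (the word count for a real tokenizer, the fixed string for an actual model call).

```python
# Minimal sketch of self-summarization for long agent contexts.
# `count_tokens` and `summarize` are placeholder assumptions, not
# Cursor's actual components.
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: ~1 token per whitespace word.
    return len(text.split())

def summarize(messages: list[str]) -> str:
    # Placeholder: a real agent would ask the model to condense these.
    return f"[summary of {len(messages)} earlier steps]"

def compact(history: list[str], budget: int = 50) -> list[str]:
    """Keep the transcript under `budget` tokens by folding the oldest
    steps into a summary, preserving the most recent steps verbatim."""
    while sum(count_tokens(m) for m in history) > budget and len(history) > 2:
        # Fold the two oldest entries into one summary line.
        history = [summarize(history[:2])] + history[2:]
    return history
```

The design choice worth noting is that recent steps survive verbatim while only distant history is lossy, which matches the article's contrast with simple truncation.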
Cursor accesses Kimi K2.5 through Fireworks AI, an inference and reinforcement learning platform, as part of an authorized commercial partnership with Moonshot AI.
Benchmarks and Performance: Composer 2 vs GPT-5.4 and Claude Opus 4.6
Composer 2's performance was measured across three reference benchmarks, and the results show dramatic improvement over previous versions.
Coding Benchmark Results
| Model | CursorBench | Terminal-Bench 2.0 | SWE-bench Multilingual |
|---|---|---|---|
| Composer 2 | 61.3% | 61.7% | 73.7% |
| Composer 1.5 | 44.2% | 47.9% | 65.9% |
| Composer 1 | 38.0% | 40.0% | 56.9% |
The progression is striking. Between Composer 1 and Composer 2, the CursorBench score jumped by 61%, the Terminal-Bench 2.0 score by 54%, and the SWE-bench Multilingual score by 30%. The last metric is particularly significant, as SWE-bench Multilingual evaluates a model's ability to solve real issues in open-source projects across multiple programming languages.
Terminal-Bench 2.0, maintained by the Laude Institute, is an agentic evaluation benchmark focused on terminal use. It measures a model's ability to navigate, diagnose, and solve technical problems in a command-line environment.
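The relative gains quoted above can be recomputed directly from the table:

```python
# Relative improvement from Composer 1 to Composer 2, per benchmark,
# using the (v1, v2) score pairs from the table above.
scores = {
    "CursorBench":            (38.0, 61.3),
    "Terminal-Bench 2.0":     (40.0, 61.7),
    "SWE-bench Multilingual": (56.9, 73.7),
}
for name, (v1, v2) in scores.items():
    gain = (v2 - v1) / v1 * 100
    print(f"{name}: +{gain:.0f}%")
# CursorBench: +61%
# Terminal-Bench 2.0: +54%
# SWE-bench Multilingual: +30%
```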
How Composer 2 Stacks Up Against the Giants
Based on available data, Composer 2 outperforms Anthropic's Claude Opus 4.6 on certain coding benchmarks while trailing behind OpenAI's GPT-5.4 on other metrics. The nuance matters: Composer 2 particularly excels at implementation, the ability to write functional code quickly, while models like GPT-5.4 and Claude Opus 4.6 retain an advantage in architectural planning and complex reasoning tasks.
Community feedback from developers confirms this analysis. Multiple users report that Composer 2 is remarkably effective for day-to-day development tasks (refactoring, bug fixing, feature creation) but that for complex architectural problems or modifications requiring deep understanding of a project's context, Claude Opus 4.6 remains superior.
The decisive advantage of Composer 2 lies in its performance-to-cost ratio. Where most frontier models force you to choose between quality and budget, Composer 2 delivers both.
Cursor Pricing in 2026: How Much Does Composer 2 Cost?
This is arguably the most disruptive aspect of Composer 2: its pricing. The model is offered at $0.50 per million input tokens and $2.50 per million output tokens in its standard version. The Fast variant, which provides the same intelligence with higher inference speed, costs $1.50 per million input tokens and $7.50 per million output tokens.
Price Comparison With Competing Models
| Model | Input Price ($/M tokens) | Output Price ($/M tokens) | Ratio vs Composer 2 |
|---|---|---|---|
| Cursor Composer 2 (Standard) | $0.50 | $2.50 | 1x |
| Cursor Composer 2 (Fast) | $1.50 | $7.50 | 3x |
| Claude Sonnet 4.6 (Anthropic) | $3.00 | $15.00 | 6x |
| Claude Opus 4.6 (Anthropic) | $5.00 | $25.00 | 10x |
The gap is dramatic. Composer 2 costs roughly one-tenth the price of Claude Opus 4.6 and one-sixth of Claude Sonnet 4.6, while delivering comparable performance on coding benchmarks. For developers who use AI intensively on a daily basis, the savings are substantial.
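A quick calculation makes the gap tangible. The per-token prices are taken from the table above; the workload (200M input and 50M output tokens per month) is a hypothetical example, not a published figure.

```python
# Monthly API cost at the per-token prices from the comparison table.
PRICES = {  # ($ per 1M input tokens, $ per 1M output tokens)
    "Composer 2 (Standard)": (0.50, 2.50),
    "Composer 2 (Fast)":     (1.50, 7.50),
    "Claude Sonnet 4.6":     (3.00, 15.00),
    "Claude Opus 4.6":       (5.00, 25.00),
}

def monthly_cost(model, input_m=200, output_m=50):
    """Cost in dollars for input_m million input and output_m million
    output tokens (the volumes are an illustrative assumption)."""
    price_in, price_out = PRICES[model]
    return input_m * price_in + output_m * price_out
```

At this workload, Composer 2 (Standard) comes to $225 a month against $2,250 for Claude Opus 4.6; because both the input and output prices differ by the same factor, the 10x ratio holds at any token volume.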
Cursor Subscription Plans
Beyond per-token API pricing, Cursor offers several subscription tiers for accessing its tools within the code editor:
| Plan | Monthly Price | Description |
|---|---|---|
| Hobby | Free | Limited access to basic features |
| Pro | $20/month | Standard usage with Composer 2 access |
| Pro+ | $60/month | 3x more usage than Pro |
| Ultra | $200/month | 20x more usage than Pro |
| Teams | $40/user/month | Enterprise plan per user |
On individual plans, Composer usage is part of a standalone usage pool with generous included volume. Beyond those limits, billing switches to usage-based pricing.
Independent tests by developers show that for the same coding task, using Composer 2 in Cursor costs roughly a quarter of what Claude Opus or GPT-5.4 costs through the same editor. This economic advantage is especially significant for development teams accumulating millions of tokens daily.
The Kimi K2.5 Controversy: Why Cursor Hid the Model's Origin
This is the aspect of the announcement that generated the most discussion, and for good reason. The way Cursor initially failed to disclose Composer 2's foundations raises important questions about transparency in the AI industry.
Timeline of Events
Wednesday, March 18: Cursor publishes a blog post announcing Composer 2. The text describes the improvements as resulting from "the first continued pretraining of the base model, combined with reinforcement learning." No mention of Kimi K2.5 or Moonshot AI.
Thursday, March 19: Less than two hours after the launch, a developer going by @fynnso intercepts the actual model ID in a Cursor API request: `kimi-k2p5-rl-0317-s515-fast`. The name immediately betrays the origin: "Kimi K2.5 + RL." The developer publishes the finding on X with a cutting comment: "At least rename the model ID."
In the hours that follow, Du Yulun, Moonshot AI's head of pretraining, tweets that Composer 2's tokenizer is "completely identical" to Kimi's. He directly challenges Michael Truell, Cursor's cofounder. The tweet is later deleted.
Friday, March 20: The tide turns. Moonshot AI's official account (@Kimi_Moonshot) posts a congratulatory message to the Cursor team, confirming that the use of Kimi K2.5 is authorized under a commercial partnership through Fireworks AI.
The same day, Aman Sanger, Cursor's cofounder, acknowledges the mistake: "We've evaluated a lot of base models on perplexity-based evals and Kimi K2.5 proved to be the strongest. It was a miss to not mention the Kimi base in our blog from the start."
Lee Robinson adds: "Only ~1/4 of the compute spent on the final model came from the base, the rest is from our training. We will do full pretraining in the future."
A Recurring Transparency Problem at Cursor
This is not the first time Cursor has been caught failing to disclose the origins of its models. In November 2025, when Composer 1 launched, the community discovered that the model's tokenizer was identical to DeepSeek's, another Chinese open-source model. The model even occasionally output Chinese text during inference. At the time, Cursor offered no explanation.
This pattern raises a more fundamental concern. If Cursor systematically builds its models on Chinese open-source bases without disclosing the fact, developers are entitled to ask what the company's actual value-add is, and how much trust to place in its communications.
Kimi K2.5's Modified License and Its Implications
Kimi K2.5 uses a modified MIT license containing a specific clause: any commercial product exceeding 100 million monthly active users or generating over $20 million in monthly revenue must prominently display "Kimi K2.5" in the product's user interface. Given Cursor's valuation and paid user base, it is highly likely that the revenue threshold is reached.
According to Moonshot AI, license compliance is ensured through the commercial agreement with Fireworks AI, the technical intermediary between the two companies. This clarification eased tensions but did not entirely silence critics in the community, particularly those pointing out the lack of attribution in Cursor's interface.
What This Affair Reveals About the Global AI Ecosystem in 2026
Beyond the Cursor case, this controversy highlights several deeper trends in the AI industry.
The Rise of Chinese Open-Source Models
The fact that an American startup valued at nearly $30 billion chose a Chinese open-source model as the foundation for its flagship product is itself a powerful signal. Kimi K2.5, DeepSeek, and other models from Chinese labs are establishing themselves as credible alternatives to Western models, particularly thanks to their innovative architectures (Mixture of Experts) and their performance-to-cost ratios.
The CEO of Hugging Face himself noted that this episode illustrates the growing influence of Chinese open-source models in the global AI ecosystem. This observation holds true not just for coding but also for vision, reasoning, and agentic capabilities.
The Blurred Line Between "In-House Model" and "Fine-Tuning"
When a company takes an open-source model, continues pretraining it, and adds massive reinforcement learning, at what point can it call the result "its own model"? Cursor claims that 75% of Composer 2's compute came from its own training. But the community points out that without the Kimi K2.5 base, that 75% would not have produced the same result.
This question is not purely academic. It has direct implications for investors valuing these companies, for developers choosing their tools, and for overall trust in the AI industry.
The Importance of Transparency for Users
For developers entrusting their code to an AI assistant, knowing which model is running under the hood is relevant information. It influences security decisions, regulatory compliance, and understanding the tool's strengths and limitations. Failing to disclose this information, even when usage is perfectly legal, erodes trust.
Should You Use Cursor Composer 2 in 2026?
Controversy aside, facts are facts. Composer 2 delivers frontier-level coding performance at a fraction of the cost of its competitors. Its inference speed of over 200 tokens per second makes it particularly fluid in an interactive development environment. For day-to-day programming tasks, it offers a value proposition unmatched on the market.
If your work primarily involves implementation, refactoring, and bug fixing, Composer 2 is likely the best available option in terms of cost-effectiveness. If your needs lean more toward architectural planning and complex reasoning, models like Claude Opus 4.6 or GPT-5.4 remain relevant alternatives, though significantly more expensive.
The real question going forward is no longer technical but strategic. Cursor has announced plans to do full pretraining in-house in the future. If the company can achieve that while maintaining this level of performance and pricing, it could permanently reshape the AI coding market. In the meantime, Composer 2 remains a remarkable product, built on a remarkable open-source foundation, and the industry would do well to heed the transparency lesson this controversy has imposed.