At Bridgers, we design and deploy AI solutions for our clients: conversational agents, data extraction pipelines, business automation powered by LLMs. Every project confronts us with the same challenge: stitching together a patchwork of tools (gateway, observability, evaluation, optimization) that do not communicate well. When we discovered TensorZero, an open-source stack claiming to unify all of this in a single component written in Rust, we immediately wanted to evaluate it. Here is our detailed analysis, from the perspective of an agency that builds AI products every day.
An Open-Source LLMOps Stack: What Does That Actually Mean?
Before diving into TensorZero, let us clarify the problem it solves. When you move from an LLM prototype to a production product, you must manage five workstreams simultaneously:
The gateway: a layer that routes your calls to different LLM providers (OpenAI, Anthropic, Mistral, etc.), handles retries, fallbacks, and load balancing
Observability: recording and analyzing every inference to understand what works and what fails
Optimization: model fine-tuning, prompt optimization, advanced inference techniques
Evaluation: systematic testing to measure LLM output quality
Experimentation: rigorous A/B tests to validate every change before full deployment
Most teams use a different tool for each need: LiteLLM for the gateway, Langfuse for observability, homegrown scripts for evaluation, and often nothing for optimization and experimentation. TensorZero unifies all five in a single stack under the Apache 2.0 license, with no paid features whatsoever.
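To ground what "a single stack" means in practice: every call goes through one HTTP endpoint in front of all providers. Below is a minimal sketch of an inference request, assuming a self-hosted gateway listening on localhost:3000 (TensorZero's documented default); the function name `extract_entities` is hypothetical and would be declared in your gateway configuration, so treat the payload as illustrative rather than authoritative.

```python
import json

def build_inference_request(function_name: str, user_message: str) -> dict:
    # The gateway's /inference endpoint addresses a *function* defined in
    # config, not a hard-coded provider model. Which model actually serves
    # the request is decided by the gateway's routing/variant logic.
    return {
        "function_name": function_name,
        "input": {
            "messages": [{"role": "user", "content": user_message}],
        },
    }

payload = build_inference_request("extract_entities", "Acme Corp hired Jane Doe.")

# With a running gateway, you would POST it, e.g.:
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:3000/inference",
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   print(urllib.request.urlopen(req).read())
print(json.dumps(payload))
```

Because the application only ever sees the function name, swapping the underlying provider becomes a configuration change rather than a code change.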
Who Is Behind TensorZero?
The project is driven by a 9-person team based in Brooklyn, New York, co-founded in January 2024 by Gabriel Bianconi and Viraj Mehta.
CTO Viraj Mehta's background explains many of the architectural choices. His PhD at CMU focused on reinforcement learning applied to nuclear fusion reactors, a domain where a single data point costs roughly $30,000 for five seconds of collection. That experience instilled a near-obsessive philosophy: never waste a single data point. It drives the "data flywheel" concept at the heart of TensorZero.
The team also includes Aaron Hill, a Rust compiler maintainer (significant when the product is 77.5% Rust), Alan Mishler, VP at J.P. Morgan AI Research with over 1,300 citations, and Shuyang Li, Staff Engineer at Google on LLM infrastructure.
In August 2025, TensorZero raised $7.3 million in seed funding with FirstMark Capital (Matt Turck) as lead, joined by Bessemer Venture Partners, Bedrock, and DRW, according to the official TensorZero blog. The project now counts 11,100 GitHub stars, 769 forks, and 124 contributors.
Why TensorZero Is Built in Rust (And Why You Should Care)
The choice of programming language for an LLM gateway is not a cosmetic detail. The gateway is the hot path of your AI infrastructure: every LLM call passes through this layer. Any added latency multiplies across your total request volume.
The benchmarks published by TensorZero, run on an AWS c7i.xlarge instance (4 vCPUs, 8 GB RAM), reveal a staggering gap with LiteLLM (written in Python):
| Metric | LiteLLM at 100 QPS | LiteLLM at 500 QPS | LiteLLM at 1,000 QPS | TensorZero at 10,000 QPS |
|---|---|---|---|---|
| Mean latency | 4.91 ms | 7.45 ms | Total failure | 0.37 ms |
| P50 | 4.83 ms | 5.81 ms | Total failure | 0.35 ms |
| P90 | 5.26 ms | 10.02 ms | Total failure | 0.50 ms |
| P99 | 5.87 ms | 39.69 ms | Total failure | 0.94 ms |
Put differently: TensorZero at 10,000 requests per second delivers lower latency than LiteLLM at 100 requests per second. And LiteLLM collapses entirely beyond 1,000 QPS, per the official benchmarks.
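To make the gap concrete, here is a back-of-the-envelope calculation of cumulative time spent inside the gateway per hour of traffic, using the published mean latencies above. Illustrative arithmetic only, not a benchmark of its own:

```python
def gateway_seconds_per_hour(mean_latency_ms: float, qps: int) -> float:
    # Each request spends `mean_latency_ms` inside the gateway; multiply by
    # requests per hour, then convert milliseconds to seconds.
    return mean_latency_ms * qps * 3600 / 1000

# At 500 QPS, using the mean latencies from the table:
litellm = gateway_seconds_per_hour(7.45, 500)      # ≈ 13,410 s of added latency/hour
tensorzero = gateway_seconds_per_hour(0.37, 500)   # ≈ 666 s at the same load
```

Roughly a 20x difference in aggregate waiting time at identical load, and that is before LiteLLM's collapse past 1,000 QPS even enters the picture.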
For an agency like Bridgers that builds AI products that must absorb unpredictable traffic spikes, this performance difference is not academic: it determines whether your infrastructure holds or breaks.
Four technical reasons explain the gap:
No garbage collector. Rust uses an ownership model that guarantees memory safety without GC pauses. Result: zero random latency spikes.
Concurrency without data races. Rust's type system catches race conditions at compile time. For a concurrent gateway, that eliminates an entire category of bugs.
No GIL. Python imposes a Global Interpreter Lock that creates a throughput ceiling. Rust has no such limitation.
Deterministic performance. Where Python can surprise you in production with unexpected slowdowns, Rust delivers predictable, consistent behavior under load.
The Data Flywheel: How Your Production Data Improves Your Models
TensorZero's most original concept is its data flywheel, a self-reinforcing learning loop that transforms every production interaction into an improvement opportunity.
The idea rests on modeling LLM applications as POMDPs (Partially Observable Markov Decision Processes), a theoretical framework from reinforcement learning. In practice, every LLM function is viewed as an agent that observes a partial environment, makes a decision (the text output), and receives a reward (the business KPI), according to the TensorZero technical blog.
The loop works as follows:
Collect: every inference is stored in ClickHouse in a structured format. TensorZero records input variables and outputs, not raw prompts. This makes data portable across providers: you can fine-tune an Anthropic model with data collected via OpenAI.
Optimize: data feeds multiple optimization types. Supervised fine-tuning (SFT), preference fine-tuning (DPO), RLHF for models. MIPROv2, DSPy, and GEPA for prompts. Dynamic In-Context Learning (DICL), Best-of-N, and Mixture-of-N for inference.
Evaluate: offline backtests on historical data validate each optimization before deployment. You can replay 6 months of inferences with a new prompt without sending a single request to an LLM.
Loop: production traffic automatically generates new variants and evaluates them. Engineers focus on strategic decisions.
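One turn of that loop can be sketched as follows. The `/inference` and `/feedback` endpoints follow TensorZero's HTTP API; the gateway address, the function name `support_reply`, and the metric name `ticket_resolved` are assumptions for illustration (both names would be declared in the TOML config):

```python
import json
import urllib.request

GATEWAY = "http://localhost:3000"  # assumed local gateway, default port

def post(path: str, body: dict) -> dict:
    # Minimal JSON-over-HTTP helper; requires a running gateway.
    req = urllib.request.Request(
        GATEWAY + path,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def handle_ticket(message: str, transport=post) -> str:
    # 1. Collect: the gateway logs the structured input/output to ClickHouse.
    result = transport("/inference", {
        "function_name": "support_reply",
        "input": {"messages": [{"role": "user", "content": message}]},
    })
    inference_id = result["inference_id"]
    reply = result["content"][0]["text"]
    # 2. Feedback: later, attach the business KPI to that same inference,
    # turning this interaction into labeled data for the Optimize step.
    transport("/feedback", {
        "metric_name": "ticket_resolved",
        "inference_id": inference_id,
        "value": True,
    })
    return reply
```

The `transport` parameter is only there so the logic can be exercised without a live gateway; in production you would call `handle_ticket(message)` directly.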
Autopilot: The AI Engineer That Optimizes Your LLMs While You Sleep
Launched in preview in version 2026.1.7 (February 2026), TensorZero Autopilot is described by the team as "Claude Code for LLM engineering." It is an automated system that operates on top of the stack to continuously optimize your LLM applications.
In practice, Autopilot can:
Analyze millions of inferences to detect error patterns
Recommend model or inference strategy changes
Generate and refine prompts based on real feedback
Drive fine-tuning, RL, and knowledge distillation workflows
Set up evaluations and prevent regressions
Launch A/B tests to validate changes
For an agency, the appeal is obvious: Autopilot could dramatically reduce the engineering time spent on manual LLM optimization for each client. The TensorZero team claims it has achieved "substantial performance improvements in use cases ranging from data extraction to customer support agents."
Autopilot is currently invite-only. It is also TensorZero's future monetization layer: the open-source stack stays free, Autopilot will be the paid managed service.
Technical Comparison: TensorZero vs LangSmith vs Langfuse vs LiteLLM
When we evaluate a tool for client projects, we systematically compare it against alternatives. Here is how TensorZero positions itself.
| Criterion | TensorZero | LangSmith | Langfuse | LiteLLM |
|---|---|---|---|---|
| License | Apache 2.0 (100% free) | Commercial (paid) | Partial (paid tier) | Partial (enterprise tier) |
| LLM gateway | Yes (Rust, < 1 ms P99) | No (via LangChain) | No | Yes (Python, fails at 1K QPS) |
| Observability | OSS UI + ClickHouse | Paid | Full UI | Third-party integrations |
| Built-in fine-tuning | SFT, DPO, RLHF | No | No | No |
| Native A/B testing | Yes (RCT + bandits) | No | No | No |
| Inference-time opt. | DICL, BoN, MoN, CoT | No | No | No |
| Evaluations | Static + dynamic | Paid | Built-in | No |
| Self-hosted | Full | Partial | Yes | Yes |
| Native providers | 19+ | Via LangChain | N/A | 100+ |
| Dynamic routing | No (static) | No | N/A | Yes (latency/cost) |
Where TensorZero Beats LangSmith
TensorZero cleanly separates application engineering from LLM optimization. LangSmith requires a paid subscription and remains tied to the LangChain ecosystem. TensorZero is language-agnostic (HTTP API), while LangChain requires Python or JavaScript, according to the comparison documentation. For an agency working with diverse tech stacks, this flexibility is valuable.
Where TensorZero Beats Langfuse
Langfuse excels at observability with a mature UI and advanced playground. But it offers no gateway, no fine-tuning, no A/B testing, and no inference-time optimization. TensorZero covers all these aspects. The two tools can coexist, per the official comparison page.
Where TensorZero Beats LiteLLM
Raw performance is the major differentiator. But beyond the gateway, TensorZero provides a complete ecosystem that LiteLLM does not: evaluations, experimentation, optimization, integrated observability. LiteLLM remains superior in the number of supported providers (100+) and dynamic routing by latency or cost, per the official benchmarks.
Use Cases: What TensorZero Can Do for Your Client Projects
On-Premise Deployment for Regulated Industries
A case study published by TensorZero describes automating code changelogs at a major European bank. Key points for agencies:
Fully on-premise deployment with TensorZero + Ollama
No data leaves the client's infrastructure
Dynamic In-Context Learning (DICL) enables continuous improvement without ML intervention
Integration into existing CI/CD pipelines (GitLab)
For clients in finance, healthcare, or legal, this sovereign deployment capability is a decisive argument.
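Such a deployment can be sketched as a single Compose file: gateway, ClickHouse, and Ollama on the same private network, with no outbound traffic. Image names, mounts, and credentials below are assumptions to adapt to the client's environment, not a reference deployment:

```yaml
services:
  clickhouse:
    image: clickhouse/clickhouse-server
    volumes:
      - clickhouse-data:/var/lib/clickhouse
  ollama:
    image: ollama/ollama          # serves local models; the gateway's TOML
    volumes:                      # config would point a provider at
      - ollama-models:/root/.ollama  # http://ollama:11434
  gateway:
    image: tensorzero/gateway
    volumes:
      - ./config:/app/config:ro   # TOML function/variant definitions
    environment:
      - TENSORZERO_CLICKHOUSE_URL=http://chuser:chpassword@clickhouse:8123/tensorzero
    ports:
      - "3000:3000"
    depends_on:
      - clickhouse
      - ollama
volumes:
  clickhouse-data:
  ollama-models:
```

Everything, including observability data, stays on hardware the client controls.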
Model Optimization at Lower Cost
TensorZero's NER (Named Entity Recognition) example demonstrates that a GPT-4o Mini optimized with TensorZero can outperform unoptimized GPT-4o, at a fraction of the cost and latency. For an agency billing AI projects, being able to deliver superior performance with cheaper models directly increases margins.
A/B Testing Models in Production
You are building a chatbot for a client. Rather than arbitrarily choosing between GPT-4o, Claude 3.7, and Mistral, you deploy all three via TensorZero with native A/B testing. The system automatically measures which model produces the best responses according to the client's KPIs (satisfaction, accuracy, resolution time). No homegrown scripts, no selection bias.
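In TensorZero's configuration model, that three-way split is a few lines of TOML: each variant gets a weight, and the gateway allocates traffic accordingly. Function, variant, and model names below are illustrative assumptions, not copied from a real project:

```toml
[functions.chatbot_reply]
type = "chat"

# Weights define the A/B traffic allocation across variants.
[functions.chatbot_reply.variants.gpt_4o]
type = "chat_completion"
model = "openai::gpt-4o"
weight = 0.4

[functions.chatbot_reply.variants.claude]
type = "chat_completion"
model = "anthropic::claude-3-7-sonnet-20250219"
weight = 0.3

[functions.chatbot_reply.variants.mistral]
type = "chat_completion"
model = "mistral::mistral-large-latest"
weight = 0.3
```

Feedback metrics recorded against each inference then tell you which variant actually wins on the client's KPIs.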
AI Agents with a Learning Loop
For customer support agents, data extraction pipelines, or content generation systems, TensorZero's data flywheel transforms every interaction into training data. The more the agent is used, the better it gets. This is the shift from a static AI agent to one that learns.
The Business Model: Zero Cost, But for How Long?
TensorZero is distributed under the Apache 2.0 license with no restrictions. Enterprise support is free. Self-hosting costs nothing beyond your own LLM API keys and ClickHouse infrastructure.
Gabriel Bianconi explained to VentureBeat: "We realized very early on that we needed to make this open source, to give enterprises the confidence to do this." Monetization will come from the managed Autopilot service, which will provide GPU infrastructure for fine-tuning and automated experiment management.
FirstMark's Matt Turck summarized his conviction in a tweet: "Been thinking about feedback loops in AI forever and those guys are the real deal."
19+ LLM Providers Compatible with TensorZero
The TensorZero gateway natively supports over 19 providers: OpenAI, Anthropic, AWS Bedrock, AWS SageMaker, Azure OpenAI, DeepSeek, Fireworks, GCP Vertex AI (Anthropic and Gemini), Google AI Studio, Groq, Hyperbolic, Mistral, OpenRouter, SGLang, TGI, Together AI, vLLM, xAI (Grok), and any OpenAI-compatible API.
For agencies, the value is twofold. You can start a project with one provider and migrate to another without rewriting your code. And you can integrate self-hosted models via vLLM or Ollama for clients with sovereignty constraints.
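The migration story largely reduces to changing a model identifier. A hedged sketch using TensorZero's `provider::model` shorthand, with example model names (with a function-based setup, the swap happens in TOML config instead, with no code change at all):

```python
def request_for(model_name: str, user_message: str) -> dict:
    # Same request body, different provider: only the model string changes.
    return {
        "model_name": model_name,
        "input": {"messages": [{"role": "user", "content": user_message}]},
    }

openai_req = request_for("openai::gpt-4o-mini", "Summarize this ticket.")
claude_req = request_for("anthropic::claude-3-7-sonnet-20250219", "Summarize this ticket.")
```

The inputs and recorded observability data keep the same structure either way, which is what makes the collected dataset portable between providers.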
What TensorZero Still Lacks
No tool is perfect, and TensorZero has real limitations:
The GitOps approach can be off-putting. Prompt management uses TOML configuration files versioned in Git. For teams accustomed to graphical interfaces like LangSmith or Langfuse, the transition takes effort.
The UI is functional, not elegant. If you need to present an observability dashboard to a non-technical client, Langfuse or LangSmith offer a better visual experience.
No dynamic routing. LiteLLM can dynamically route requests based on provider latency or cost. TensorZero only supports static routing.
Around 20 native providers. Versus over 100 for LiteLLM. Support for any OpenAI-compatible API partially compensates.
No native SSO. Large organizations will need to add Nginx or OAuth2 Proxy for authentication.
Autopilot is in preview. The flagship feature is not yet publicly available.
When to Recommend TensorZero to a Client
Yes, if: the client operates LLMs in production at medium to large scale, wants to continuously optimize models, has performance or data sovereignty constraints, or wants to build a competitive moat through continuous learning.
No, if: the client is at the prototype stage, lacks a DevOps culture, needs a full graphical interface for non-technical users, or operates at very low volume.
The 9-Person Team Behind TensorZero
The team, based in Brooklyn, New York, is remarkably dense in expertise. CEO Gabriel Bianconi served as CPO at Ondo Finance, a DeFi decacorn with over $1 billion in assets under management, and holds BS and MS degrees from Stanford. CTO Viraj Mehta completed his PhD at CMU on reinforcement learning for nuclear fusion and LLMs, with additional Stanford degrees in math and computer science.
Beyond the founders, the roster reads like a who's who of applied AI. Aaron Hill is a Rust compiler maintainer with prior experience at AWS and Svix. Alan Mishler, VP at J.P. Morgan AI Research, holds a CMU PhD in statistics and over 1,300 academic citations. Andrew Jesson, a Columbia postdoc with an Oxford PhD in LLMs, brings over 4,000 citations. Antoine Toussaint, a former quant and Princeton PhD, previously served as a Staff Software Engineer at Shopify and Illumio. Michelle Hui worked at Wing (Alphabet) and the United Nations. Shuyang Li was a Staff Engineer at Google on LLM infrastructure and search. Simeon Lee led design at Merge.
As CTO Viraj Mehta put it in a recent tweet: "Honestly, it is crazy to get to work with a team like this. I've spent time at Stanford, CMU, national labs, Google, etc and I think this is the densest collection of cracked people around."
Our Analysis: A Tool to Watch Very Closely
TensorZero is the most comprehensive project in the open-source LLMOps ecosystem. Where alternatives specialize in one aspect (Langfuse on observability, LiteLLM on the gateway, LangChain on prototyping), TensorZero targets end-to-end integration.
The bet is ambitious: building in Rust for uncompromising performance, modeling LLM applications as POMDPs to maximize learning, and making everything 100% free. With 11,100 GitHub stars, $7.3 million raised, and a team that includes a Rust compiler maintainer and a VP from J.P. Morgan AI Research, the project has the resources to make this vision real.
As the LLM gateway guide from getmaxim.ai notes: "TensorZero targets teams with strong DevOps cultures that treat AI infrastructure with the same rigor as traditional backend systems."
For agencies building AI products for their clients, TensorZero represents an opportunity to move from a fragile assembly of disparate tools to a unified, performant, and free stack. We continue to evaluate it on parallel projects, and we will keep you posted on our findings.