Leanstral: Complete Guide to the First Open-Source Agent for Formal Proofs in Lean 4
Mistral AI has released Leanstral, an open-source AI agent dedicated to Lean 4, the formal proof language used by mathematicians, researchers, and a growing number of software verification developers. Published on March 16, 2026 under the Apache 2.0 license, Leanstral is the first model specifically trained for proof engineering in real-world code repositories.
For developers and technical agencies, this release opens concrete possibilities: integrating formal verification into client workflows, offering smart contract audits at reduced cost, or contributing to formalized research projects. This guide covers the architecture, deployment, and integration scenarios for Leanstral.
Why Leanstral Matters for Developers and Technical Agencies
Formal Verification Leaves the Ivory Tower
Until recently, formal verification was confined to research labs and a few heavily regulated sectors (aerospace, defense). The reason: writing proofs in Lean 4 or Rocq (formerly Coq) requires rare expertise and considerable time.
Leanstral changes the equation. By automating part of the proof engineering process, it makes this technology accessible to teams that previously lacked the budget or in-house skills to adopt it.
What Lean 4 Can Verify
Lean 4 is not just a tool for mathematicians. It is a full-fledged functional programming language capable of expressing and verifying:
- Program properties (algorithm correctness, absence of specific bugs)
- Security policies (Amazon uses Lean to verify Cedar, its access policy engine)
- Complex mathematical proofs (perfectoid spaces, automorphic forms, Fermat's Last Theorem)
- Formal software contracts (specifications that code must satisfy)
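To make the first item concrete, here is a toy Lean 4 snippet pairing a program with a proof of its specification. The `double` function and theorem name are invented for this guide, but the pattern (definition, statement, tactic proof) is the one Leanstral works with:

```lean
-- A toy program property: `double n` really is twice `n`.
-- (Illustrative example; names are invented for this guide.)
def double (n : Nat) : Nat := n + n

theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double
  omega
```

The `omega` tactic discharges the linear-arithmetic goal automatically; this is exactly the kind of routine proof step the agent is meant to take off a developer's hands.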
The Lean community has over 10,000 members on Zulip, the GitHub repository exceeds 7,500 stars, and more than 50 university courses now include Lean in their curricula.
Leanstral Technical Architecture: What Developers Need to Know
Mixture-of-Experts: 119 Billion Parameters, 6.5 Billion Active
Leanstral is built on a Sparse Mixture-of-Experts (MoE) architecture from the Mistral Small 4 family.
| Specification | Detail |
|---|---|
| Model identifier | Leanstral-120B-A6B-2603 |
| Total parameters | ~119B (128 experts) |
| Active parameters per token | ~6.5B (4 experts activated per token) |
| Architecture | Sparse MoE |
| Context window | 256K tokens |
| Inputs | Text and images |
| License | Apache 2.0 |
| API endpoint | labs-leanstral-2603 |
The MoE routing activates 4 experts out of 128 for each token, yielding an efficiency ratio of approximately 18x between the model's total capacity and its actual inference cost. In deployment terms, this means Leanstral can run on 4 A100 80GB GPUs, where a dense model of equivalent size would require significantly heavier infrastructure.
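The figures above can be sanity-checked with a quick back-of-envelope calculation, assuming bf16 weights (2 bytes per parameter; real deployments may quantize further):

```python
# Back-of-envelope check of the MoE efficiency and memory figures quoted above.
# Assumes bf16 (2 bytes/parameter); quantized deployments would need less.

TOTAL_PARAMS = 119e9      # total parameters (~119B)
ACTIVE_PARAMS = 6.5e9     # active parameters per token (~6.5B)
BYTES_PER_PARAM = 2       # bf16

# Ratio between total capacity and per-token compute.
efficiency_ratio = TOTAL_PARAMS / ACTIVE_PARAMS

# Memory needed just for the weights, vs 4x A100 80GB.
weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
gpu_budget_gb = 4 * 80

print(f"efficiency ratio ≈ {efficiency_ratio:.1f}x")
print(f"weights ≈ {weights_gb:.0f} GB vs {gpu_budget_gb} GB available")
```

The weights alone come to roughly 238 GB, which fits in the 320 GB of a 4x A100 80GB node with headroom left for the KV cache.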
MCP Integration with Lean's Language Server
The most important technical detail for developers: Leanstral was trained with tool-calling capabilities via the MCP (Model Context Protocol), specifically for lean-lsp-mcp, the MCP server for Lean's Language Server Protocol.
In practice, this means the agent does not simply generate text that looks like Lean. It interacts in a loop with the compiler:
1. It submits a proof attempt
2. The Lean compiler verifies and returns errors
3. The agent analyzes the errors and adjusts
4. The cycle repeats until the proof compiles
This feedback loop with a binary verifier (it compiles or it does not) is fundamentally different from standard code generation where the model produces output without automatic feedback.
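The control flow of that loop can be sketched in a few lines of Python. In the real agent, proof attempts come from the model and verification goes through lean-lsp-mcp; here both sides are stubbed so the loop itself is runnable:

```python
# Sketch of the verify-and-retry loop described above. The real agent talks
# to the Lean compiler via MCP (lean-lsp-mcp); here the model and verifier
# are stand-in callables so only the control flow is shown.

from typing import Callable, Optional

def prove_with_feedback(
    attempt_proof: Callable[[Optional[str]], str],    # model: last error -> new attempt
    check_with_lean: Callable[[str], Optional[str]],  # verifier: proof -> error or None
    max_rounds: int = 8,
) -> Optional[str]:
    """Loop until the verifier accepts a proof or the budget runs out."""
    error = None
    for _ in range(max_rounds):
        proof = attempt_proof(error)
        error = check_with_lean(proof)
        if error is None:          # binary verdict: it compiles
            return proof
    return None                    # budget exhausted, no valid proof

# Toy demonstration with stubbed model and verifier:
attempts = iter(["sorry", "by simp", "by omega"])
proof = prove_with_feedback(
    attempt_proof=lambda err: next(attempts),
    check_with_lean=lambda p: None if p == "by omega" else "error: unsolved goals",
)
```

The key property is the binary verdict: unlike free-form code review, the verifier's answer is unambiguous, so the loop has a well-defined stopping condition.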
The agent also supports arbitrary MCPs via Mistral Vibe, making it extensible to additional development tools.
FLTEval Benchmarks: Performance and Cost Compared
FLTEval: A Benchmark on Real Repositories, Not Competition Exercises
Mistral AI created FLTEval alongside Leanstral. This benchmark evaluates the ability to complete formal proofs in the FLT project (formalizing Fermat's Last Theorem), a real research project hosted on GitHub with 55 contributors, 663 stars, and EPSRC funding.
The difference from MiniF2F (the commonly used benchmark): FLTEval measures proof engineering in a real repository, with imports, dependencies, and multi-file structures. This is the kind of work developers and researchers encounter daily.
Parallel Inference Strategy (pass@N)
Leanstral leverages the fact that Lean is a binary verifier to launch multiple proof attempts in parallel. The pass@N score indicates the probability that at least one of N attempts produces a valid proof. This strategy is particularly well-suited to distributed deployment architectures.
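Under an independence assumption, pass@N follows a simple formula: 1 - (1 - p1)^N. Comparing that prediction to the reported FLTEval scores is instructive, because the observed gains grow more slowly, showing that parallel attempts are correlated (they tend to fail on the same hard problems):

```python
# If attempts were independent, pass@N would follow 1 - (1 - p1)**N.
# The reported FLTEval scores grow more slowly than that, which shows
# the parallel attempts are correlated rather than independent.

def pass_at_n_independent(p1: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p1) ** n

p1 = 0.219  # Leanstral pass@1 on FLTEval
for n, observed in [(2, 0.263), (4, 0.293), (8, 0.310), (16, 0.319)]:
    predicted = pass_at_n_independent(p1, n)
    print(f"pass@{n:<2} predicted {predicted:.3f} vs observed {observed:.3f}")
```

For example, independence would predict pass@2 ≈ 0.390, while the observed score is 0.263.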
Comparison Table: Leanstral vs Claude vs Open Source
| Model | Active Parameters | Cost per run ($) | FLTEval Score |
|---|---|---|---|
| Leanstral pass@1 | 6.5B | 18 | 21.9 |
| Leanstral pass@2 | 6.5B | 36 | 26.3 |
| Leanstral pass@4 | 6.5B | 72 | 29.3 |
| Leanstral pass@8 | 6.5B | 145 | 31.0 |
| Leanstral pass@16 | 6.5B | 290 | 31.9 |
| Claude Haiku 4.5 | N/A (proprietary) | 184 | 23.0 |
| Claude Sonnet 4.6 | N/A (proprietary) | 549 | 23.7 |
| Claude Opus 4.6 | N/A (proprietary) | 1,650 | 39.6 |
| Qwen3.5-397B-A17B | 17B | N/A | 25.4 (pass@4) |
| Kimi-K2.5-1T-A32B | 32B | N/A | ~20.1 |
| GLM5-744B-A40B | 40B | N/A | ~16.6 |
Key points for developers:
- Leanstral pass@2 at $36 outperforms Sonnet ($549) and Haiku ($184). The cost-performance ratio is unmatched in the ecosystem.
- With 6.5B active parameters, Leanstral surpasses GLM5 (40B active), Kimi-K2.5 (32B active), and Qwen3.5 (17B active, pass@4).
- Claude Opus still leads on raw quality (39.6 vs 31.9 for Leanstral pass@16), but at 5.7x the cost of pass@16, and 46x the cost of pass@2.
Cost Table: Self-Hosting vs API
| Deployment Mode | Cost | Advantage |
|---|---|---|
| Labs API (labs-leanstral-2603) | Free (limited period) | Immediate start, no infrastructure |
| Mistral Vibe (/leanstall) | Free (uses API) | Automatic configuration |
| Self-hosting (4x A100/H100) | Hardware cost only | Full control, no API dependency |
| Claude Sonnet 4.6 (API) | ~$549 per FLTEval run | No self-hosting option |
| Claude Opus 4.6 (API) | ~$1,650 per FLTEval run | Best quality, highest cost |
How to Deploy Leanstral: Three Scenarios for Developers
Scenario 1: Quick Test with Mistral Vibe
Mistral Vibe is Mistral AI's open-source CLI for orchestrating agents. Version 2.5.0 (March 16, 2026) adds the /leanstall command, which automatically configures Leanstral with the Lean MCP server.
This is the fastest path to evaluate the model on your own proofs. No infrastructure required: Vibe uses the Mistral API as its backend.
Scenario 2: Integration via the Labs API
The labs-leanstral-2603 endpoint is available for free for a limited period. For agencies looking to integrate formal verification into a CI/CD pipeline or internal tool, this is the simplest way to prototype.
The model supports MCP tool-calling, which allows integration into existing agentic workflows.
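As a starting point for prototyping, here is a minimal sketch of a request to the endpoint through Mistral's chat completions API. The model identifier comes from the article; the exact request shape and prompt are assumptions, so verify them against the official API documentation before relying on them:

```python
# Minimal sketch of calling the Leanstral labs endpoint through Mistral's
# chat completions API. The payload shape follows Mistral's standard API;
# treat the exact fields and prompt as assumptions and check the docs.

import json

API_URL = "https://api.mistral.ai/v1/chat/completions"

def build_request(goal: str, model: str = "labs-leanstral-2603") -> dict:
    """Build the JSON payload asking the agent to close a Lean 4 goal."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a Lean 4 proof engineer."},
            {"role": "user", "content": f"Complete this proof:\n{goal}"},
        ],
    }

payload = build_request("theorem t (n : Nat) : n + 0 = n := by sorry")

# Actual call (requires the `requests` package and an API key), e.g.:
# requests.post(API_URL, json=payload,
#               headers={"Authorization": f"Bearer {api_key}"})
print(json.dumps(payload, indent=2)[:120])
```

In a CI/CD pipeline, this call would sit behind a job that feeds the returned proof back to the Lean compiler for verification before accepting it.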
Scenario 3: Self-Hosting with vLLM
Weights are available on Hugging Face (mistralai/Leanstral-120B-A6B-2603) under the Apache 2.0 license. Recommended setup:
- 4 A100 80GB or H100 GPUs
- vLLM with `--tensor-parallel-size 4`
- Flash Attention MLA backend
Self-hosting is relevant for agencies working on sensitive projects (intellectual property, confidential data) or to guarantee availability without depending on a third-party service.
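A launch command for the recommended setup might look like the following sketch; the model id is the one published on Hugging Face, and the flags are standard vLLM options, but verify them against your vLLM version:

```shell
# Sketch of a vLLM launch for the recommended 4-GPU setup.
# Model id is from the article; check flags against your vLLM version.
vllm serve mistralai/Leanstral-120B-A6B-2603 \
  --tensor-parallel-size 4 \
  --max-model-len 262144
```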
Note: at launch, the Hugging Face page showed a temporary 404 error. Weights should be fully accessible shortly.
Concrete Integration Scenarios for Agencies and Developers
Smart Contract Audits at Reduced Cost
The smart contract audit market relies heavily on formal verification. With Leanstral, an agency can offer formal audits at a fraction of the current cost. A proof of correctness via Leanstral pass@2 costs $36 compared to $549 with Sonnet or $1,650 with Opus. This cost reduction can transform the profitability of a blockchain audit offering.
Continuous Verification Pipeline for Critical Software
For teams developing critical software (medtech, fintech, infrastructure), Leanstral can integrate into a continuous verification pipeline. The typical workflow:
1. The developer writes the specification in Lean 4
2. Leanstral generates compliance proofs
3. The Lean compiler verifies validity
4. On failure, the agent automatically adjusts
5. The validated proof is versioned alongside the code
Migrating Rocq Proof Bases to Lean 4
Mistral AI demonstrated Leanstral's ability to translate proofs from Rocq (formerly Coq) to Lean 4, preserving semantics and custom notation. For agencies supporting academic or industrial clients in an ecosystem migration, this is a high-value use case.
Accelerating Formalized Research
Projects like Mathlib (over 20,000 contributions) and FLT (formalizing Fermat's Last Theorem) generate a considerable volume of routine proofs. Leanstral can automate this portion of the work, allowing researchers to focus on creative proofs and new mathematical definitions.
Verifying Code Produced by Other AI Agents
The most promising medium-term use case: using Leanstral as a verification layer on top of other code generation agents. The code agent produces the implementation, Leanstral generates and verifies the compliance proof. This is the "trustworthy vibe coding" concept that Mistral AI promotes.
Technical Limitations and Points of Caution
Lean 4 Only
Leanstral only supports Lean 4. It does not generate proofs for Rocq, Isabelle, Agda, or any other proof assistant. If your project uses a different formal language, Leanstral is not suitable.
Opus Still Leads on Absolute Quality
Claude Opus 4.6 scores 39.6 on FLTEval versus 31.9 for Leanstral pass@16, a gap of about 24%. For projects where every score point matters (critical security proofs, high-profile academic publications), Opus may justify its higher cost. The Hacker News community raised the obvious question: shouldn't a specialist model beat a generalist on its own specialty?
Diminishing Returns Beyond pass@8
The performance gain from pass@8 (31.0) to pass@16 (31.9) is only 0.9 points for a doubling in cost. Beyond a certain number of parallel attempts, the marginal investment becomes less profitable. Developers will need to calibrate the number of passes based on their budget and quality requirements.
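The diminishing returns are easiest to see as marginal cost per score point, computed from the cost and score figures reported earlier in the article:

```python
# Marginal cost per FLTEval point at each doubling of parallel attempts,
# using the (N, cost in $, score) figures from the comparison table above.

runs = [(1, 18, 21.9), (2, 36, 26.3), (4, 72, 29.3), (8, 145, 31.0), (16, 290, 31.9)]

for (n0, c0, s0), (n1, c1, s1) in zip(runs, runs[1:]):
    marginal = (c1 - c0) / (s1 - s0)   # dollars per additional score point
    print(f"pass@{n0} -> pass@{n1}: ${marginal:.0f} per point")
```

Going from pass@1 to pass@2 costs about $4 per additional point; going from pass@8 to pass@16 costs roughly $160 per point, a factor of about forty worse.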
Hardware Requirements for Self-Hosting
Self-hosting requires 4 high-end GPUs (A100 or H100). This is a significant investment, even for a well-equipped agency. For most use cases, the free API or Mistral Vibe will be more pragmatic during the discovery phase.
Weight Availability at Launch
The Hugging Face page showed a 404 error at launch. While likely temporary, this was noted by the community as a friction point for immediate adoption.
Leanstral in the Ecosystem: Positioning and Outlook
Leanstral occupies a unique position: it is the only model that combines Lean 4-specific proof training, an open-source license (Apache 2.0), and competitive inference costs. This combination did not exist before March 16, 2026.
For agencies and developers, the opportunity exists on two levels:
Short term: test Leanstral on existing projects, evaluate its integration into verification pipelines, and build new service offerings around formal verification.
Medium term: formal verification could become a standard in critical software development, as tools like Leanstral reduce its cost and complexity. Teams that master this technology today will be well-positioned tomorrow.
Mistral AI chose to make Leanstral free and open source to accelerate adoption and gather feedback. For developers, now is the time to experiment.