Live Data from Zip Platform

The only benchmark that tells you the truth about agentic use

The Senrie Zip Benchmark doesn't test with sample data. Its results come from actual use on the Zip platform: real tasks, real performance, real insights.

180 Models
16 Providers
4 Evaluations

Performance Rankings

Top 10 Models

Across intelligence, speed, and price metrics

View full leaderboard
Anthropic
OpenAI
Google
DeepSeek
Meta
xAI
Mistral
Alibaba
Cohere

Intelligence Index

Composite score from real agentic tasks

Speed

Output tokens per second

Price (Blended)

USD per 1M tokens • 7:2:1 ratio
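
The page gives the blended price only as a 7:2:1 ratio and does not say which three price components the weights apply to. As a rough sketch, assuming the ratio is a weighted average over three per-1M-token prices, the calculation might look like this. The function name, the argument order, and the example prices are all hypothetical and are not taken from the Zip methodology.

```python
from typing import Sequence


def blended_price(
    prices_per_1m: Sequence[float],
    weights: Sequence[float] = (7, 2, 1),
) -> float:
    """Return the weighted average of per-1M-token prices in USD.

    Assumption: the 7:2:1 ratio weights three price components.
    Which component gets which weight is not stated on the page,
    so the caller must supply prices in the matching order.
    """
    if len(prices_per_1m) != len(weights):
        raise ValueError("prices and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(p * w for p, w in zip(prices_per_1m, weights))
    return weighted_sum / total_weight


# Hypothetical prices (USD per 1M tokens) for the three components.
example = blended_price([1.0, 2.0, 10.0])
```

With these made-up prices the weights give (7 × 1.0 + 2 × 2.0 + 1 × 10.0) / 10, so the component with weight 7 dominates the blended figure.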

Why This Matters

Built on Real-World Data

Real Agentic Tasks

We evaluate models on actual coding, research, analysis, and planning tasks drawn from production usage.

No Synthetic Data

Every benchmark result comes from real user interactions on the Zip platform, not curated test sets.

Continuously Updated

Benchmarks are updated regularly as new models are released and performance data accumulates.

Last updated: 2026-06-10 • Version 2.0.0
Learn about our methodology →