Live Data from Zip Platform
The only benchmark that tells you the truth about agentic use
Senrie Zip Benchmark doesn't test with sample data. Its results come from actual use on the Zip platform: real tasks, real performance, real insights.
180 Models
16 Providers
4 Evaluations
Performance Rankings
Top 10 Models
Across intelligence, speed, and price metrics
Intelligence Index
Composite score from real agentic tasks
Speed
Output tokens per second
Price (Blended)
USD per 1M tokens • 7:2:1 ratio
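One way to read the blended price is as a weighted average of three per-1M-token prices using the 7:2:1 ratio. The sketch below illustrates that arithmetic only. The page does not say which token category each weight applies to, so the function name `blended_price`, the order of the three prices, and the sample prices are all assumptions made for illustration, not the benchmark's actual implementation.

```python
from typing import Sequence

# Weights taken from the "7:2:1 ratio" shown on the page.
# ASSUMPTION: which token category each weight applies to is not stated
# in the source; the positions here are purely illustrative.
RATIO_WEIGHTS = (7, 2, 1)


def blended_price(prices_per_million: Sequence[float],
                  weights: Sequence[float] = RATIO_WEIGHTS) -> float:
    """Return the weighted average of per-1M-token prices in USD.

    prices_per_million: one price per token category, in the same
    order as `weights` (hypothetical ordering, see note above).
    """
    if len(prices_per_million) != len(weights):
        raise ValueError("prices and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(p * w for p, w in zip(prices_per_million, weights))
    return weighted_sum / total_weight


# Hypothetical prices in USD per 1M tokens for the three categories.
example = blended_price((1.0, 0.5, 4.0))
print(f"Blended price: ${example:.2f} per 1M tokens")
```

With these made-up prices the calculation is (7 × 1.0 + 2 × 0.5 + 1 × 4.0) / 10, so the category with weight 7 dominates the blended figure.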
Why This Matters
Built on Real-World Data
Real Agentic Tasks
We evaluate models on the actual coding, research, analysis, and planning tasks that arise in production usage.
No Synthetic Data
Every benchmark result comes from real user interactions on the Zip platform, not curated test sets.
Continuously Updated
Benchmarks are updated regularly as new models are released and performance data accumulates.
Last updated: 2026-06-10 • Version 2.0.0
Learn about our methodology →