Research that makes top models efficient.
Comprehensive evaluation under maximum thinking effort across knowledge, reasoning, coding, and agentic capability benchmarks.
Ultra-efficient architecture requiring a fraction of the compute of competing closed-source models at equivalent quality levels.
Fully open weights and architecture. Run locally, fine-tune for your use case, or deploy at scale with no restrictions.
Top scores on AIME, Codeforces, SWE-bench, HLE, and 12+ additional evaluation suites, compared against GPT-5.5, Claude Opus, and DeepSeek.