AI RESEARCH LAB Open-Source · Efficiency-First

NitrAI

A research lab that makes top models efficient.

AIME '25: 96.7%
Codeforces Elo: 3206
Apex Shortlist: 90.2%
FLOPs Efficiency: 9.8x

Benchmark Dashboard

Performance vs. Top-Tier Models

Comprehensive evaluation under maximum thinking effort across knowledge, reasoning, coding, and agentic capability benchmarks.

Figure: OpenGCM 8B: Performance Evaluation vs. Top Tier Models
*Models evaluated under maximum thinking effort for relevant benchmarks.

Legend: OpenGCM 8B, DeepSeek-V4-Pro, Claude-Opus 4.8, GPT-5.5, Gemini 3.1 Pro
Panels: KNOWLEDGE & REASONING, AGENTIC CAPABILITIES
Y-axis: Accuracy / Pass@1 (%), 0 to 100

Bar labels per benchmark, in the order shown:
SimpleQA Verified (Pass@1): 57.9, 46.2, 45.5
HLE (Pass@1): 75.6, 37.7, 40.0, 39.8, 44.4
Apex Shortlist (Pass@1): 90.2, 85.9, 78.1, 89.1
Codeforces (Rating): 3206, 3168, 3052
SWE Verified (Resolved): 80.6, 80.8, 80.6
Terminal Bench 2.0 (Acc): 67.9, 65.4, 75.1, 68.5
Toolathlon (Pass@1): 51.8, 47.2, 51.8, 48.8

Efficiency plots:
Single-Token FLOPs (T) vs. Token Position (K), 0 to 1024: 9.8x lower, 3.7x lower
Accumulated KV Cache (GB) vs. Sequence Length (K), 0 to 1024: 13.7x lower, 9.5x lower

AIME '25 (IMO-AnswerBench): 96.7
AIME '26 (IMO-AnswerBench): 97.1
AIME-Answer (AnswerBench): 87.1
IFBench (Instruction Following): 74.5
SWE-bench Pro (Software Engineering): 62.1
Terminal-Bench (Interactive Shell): 81.0
NL2Repo (Repo-level generation): 48.9
DeepSWE (Agentic Debugging): 46.2
ProgramBench (Logic & Syntax): 63.7
MCP-Atlas (Model Context Protocol): 77.0
Tool-Decathlon (Multi-tool use loops): 48.2
Humanity's Exam (Extreme Reasoning): 54.7

9.8× FLOPs Efficiency

Ultra-efficient architecture requiring a fraction of the compute of competing closed-source models at equivalent quality levels.
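
To make "a fraction of the compute" concrete, the sketch below converts an "N x lower" figure into the share of a baseline's cost. The function names are hypothetical, and the baseline is whichever model the chart compares against, which this page does not state. The 9.8 and 13.7 factors are taken from the efficiency plots.

```python
def fraction_of_baseline(times_lower: float) -> float:
    """Convert an 'N x lower' factor into the fraction of the baseline cost."""
    if times_lower <= 0:
        raise ValueError("times_lower must be positive")
    return 1.0 / times_lower


def percent_saved(times_lower: float) -> float:
    """Percentage of the baseline cost that is avoided."""
    return (1.0 - fraction_of_baseline(times_lower)) * 100.0


# Factors quoted on this page; the baseline model is an assumption left unnamed.
for label, factor in [("single-token FLOPs", 9.8), ("accumulated KV cache", 13.7)]:
    share = fraction_of_baseline(factor)
    saved = percent_saved(factor)
    print(f"{label}: {share:.1%} of baseline, {saved:.1f}% saved")
```

Under this reading, a 9.8x reduction means using roughly 10.2% of the baseline's per-token FLOPs.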

Open-Source, Apache 2.0

Fully open weights and architecture. Run locally, fine-tune for your use case, or deploy at scale under the permissive terms of the Apache 2.0 license.

Frontier-Class Benchmarks

Top scores across AIME, Codeforces, SWE-bench, HLE, and 12+ additional evaluation suites against GPT-5.5, Claude Opus, and DeepSeek.