Evaluating the logical reasoning capabilities of leading AI models across a range of complex tasks
| Rank | Model | Total Score |
|---|---|---|
| 1 | Gemini 3 Pro (preview) | 36/40 |
| 2 | GPT-5.2 (xhigh) | 34/40 |
| 3 | Grok 4 Reasoning | 27/40 |
| 4 | GPT-5 | 26/40 |
| 5 | GPT-5.1 | 25/40 |
| 5 | Gemini 3 Flash (preview) | 25/40 |
| 7 | Qwen 3 Max (thinking) | 22/40 |
| 7 | Kimi K2 Thinking | 22/40 |
| 9 | Claude Sonnet 4.5 (high, via OpenRouter) | 21/40 |
| 10 | Grok 4 Fast Reasoning | 20/40 |
| 11 | Gemini 2.5 Pro | 19/40 |
| 11 | Claude Opus 4.1 | 19/40 |
| 13 | Claude Haiku 4.5 | 14/40 |
| 14 | MiniMax M2 | 10/40 |
| 15 | gpt-oss-120b | 8/40 |
| 16 | Qwen3-235B-A22B-2507 | 4/40 |
I'm working on additional AI benchmark evaluations that will be added to this page soon.
Subscribe to my YouTube channel to be notified when new benchmarks are released.