AI BENCHMARKS

Comprehensive performance evaluations of leading AI models


ARC-AGI Benchmark

Evaluating the abstract reasoning capabilities of leading AI models on the Abstraction and Reasoning Corpus

The ARC-AGI benchmark measures how well AI models solve abstract reasoning tasks from the Abstraction and Reasoning Corpus. Models are ranked by accuracy, the percentage of tasks solved correctly.

Based on the ARC-AGI repository by François Chollet. The Abstraction and Reasoning Corpus (ARC) is a benchmark designed to measure general AI reasoning capabilities.
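To make "solved correctly" concrete, here is a minimal sketch of how a single task could be scored. It assumes the JSON layout used in the ARC repository (a task has "train" and "test" lists of input/output grid pairs) and exact-match grading on every test grid. The helper name `is_solved`, the toy task, and the one-attempt rule are illustrative assumptions, not a description of the harness behind this page.

```python
import json

Grid = list[list[int]]  # each cell is an integer color code


def is_solved(task: dict, predictions: list[Grid]) -> bool:
    """Return True only if every predicted grid equals the expected test output exactly."""
    expected = [pair["output"] for pair in task["test"]]
    if len(predictions) != len(expected):
        return False
    return all(pred == exp for pred, exp in zip(predictions, expected))


# Toy task (hypothetical, not from the corpus): mirror each row left to right.
task = json.loads("""
{
  "train": [{"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]}],
  "test":  [{"input": [[0, 2], [0, 0]], "output": [[2, 0], [0, 0]]}]
}
""")

# Stand-in for a model's answer: apply the mirroring rule to each test input.
predictions = [[row[::-1] for row in pair["input"]] for pair in task["test"]]

print(is_solved(task, predictions))
```

Under exact-match grading a single wrong cell fails the whole task, which is one reason accuracy figures on this benchmark tend to be low.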

Latest Results

Rank  Model                                          Accuracy  Tasks Solved  Avg. Time (s)  Last Updated
1     anthropic/claude-sonnet-4                      35.0%     140/400       17.21          2025-05-23
2     anthropic/claude-3.7-sonnet                    34.2%     137/400       15.20          2025-04-05
3     anthropic/claude-3.5-sonnet                    32.5%     130/400       12.01          2025-02-07
4     deepseek/deepseek-chat-v3-0324                 28.0%     112/400       17.50          2025-04-05
5     google/gemini-2.5-flash                        26.2%     105/400       5.59           2025-06-22
6     google/gemini-2.5-flash-preview                25.2%     101/400       6.37           2025-04-18
7     Grok 3 (Beta)                                  24.3%     94/387        10.0k          2025-02-25
8     Gemini 2.0 Pro                                 22.2%     89/400        10.0k          2025-02-25
9     openai/gpt-4.1                                 20.5%     82/400        7.13           2025-04-15
10    meta-llama/llama-4-maverick                    18.5%     74/400        5.54           2025-04-06
11    openai/gpt-4.1-mini                            17.5%     70/400        5.42           2025-04-15
12    google/gemini-2.0-flash-lite-001               16.0%     64/400        3.60           2025-03-13
13    google/gemini-2.5-flash-lite-preview-06-17     15.0%     60/400        1.85           2025-06-22
14    mistralai/mistral-small-3.2-24b-instruct:free  11.2%     45/400        40.99          2025-06-22
15    meta-llama/llama-4-scout                       10.5%     42/400        1.19           2025-04-06
16    google/gemma-3-27b-it                          10.2%     41/400        30.89          2025-03-14
17    mistralai/mistral-small-3.1-24b-instruct-2503  10.0%     40/400        88.91          2025-03-21
18    openai/gpt-4o-mini                             8.0%      32/400        7.61           2025-02-09
19    openai/gpt-4.1-nano                            6.2%      25/400        5.23           2025-04-14
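The accuracy figures in the table follow from the solved-over-attempted counts, and the ranking is a descending sort on that value. The sketch below reproduces this for four rows copied from the table. The `Result` class and `rank` function are hypothetical names chosen for illustration, not the code used to generate this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    model: str
    solved: int     # tasks answered correctly
    attempted: int  # tasks the model was scored on

    @property
    def accuracy(self) -> float:
        """Percentage of attempted tasks that were solved."""
        return 100.0 * self.solved / self.attempted


def rank(results: list[Result]) -> list[Result]:
    """Order results from highest to lowest accuracy."""
    return sorted(results, key=lambda r: r.accuracy, reverse=True)


# Counts copied from the table above.
results = [
    Result("google/gemini-2.5-flash", 105, 400),
    Result("Grok 3 (Beta)", 94, 387),
    Result("anthropic/claude-sonnet-4", 140, 400),
    Result("Gemini 2.0 Pro", 89, 400),
]

for position, r in enumerate(rank(results), start=1):
    print(f"{position}  {r.model}  {r.accuracy:.1f}%  {r.solved}/{r.attempted}")
```

Dividing by the attempted count rather than a fixed 400 matters for Grok 3 (Beta), which was scored on 387 tasks: its 94 solved tasks place it above Gemini 2.0 Pro, which solved 89 of 400.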

For detailed benchmark results and methodology, check out my YouTube videos:

Watch Benchmark Videos

Chess Benchmark

Results

Rank Model
1 OpenAI o3
2 Gemini 2.5 Pro

Watch the matches: YouTube Playlist

Coming Soon

I'm working on additional AI benchmark evaluations, which will be added to this page soon.

Subscribe to my YouTube channel to be notified when new benchmarks are released:

Subscribe for Updates