Evaluating the abstract reasoning capabilities of leading AI models on the Abstraction and Reasoning Corpus
The ARC-AGI benchmark evaluates AI models on their ability to solve abstract reasoning tasks from the Abstraction and Reasoning Corpus. Models are ranked based on their accuracy (percentage of tasks solved correctly).
Based on the ARC-AGI repository by François Chollet. The Abstraction and Reasoning Corpus (ARC) is a benchmark designed to measure general AI reasoning capabilities.
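To make "solved correctly" concrete, here is a minimal sketch of an exact-match check. It relies on the task layout used in the ARC repository (JSON with `train` and `test` lists of `input`/`output` grids, each grid a 2-D list of integers 0-9). The helper names `is_solved` and `task_solved` are hypothetical. The rule that every test output must match on a single attempt is an assumption, since this page does not state the exact scoring rule used for the leaderboard.

```python
from typing import List

Grid = List[List[int]]  # a 2-D grid of integers 0-9


def is_solved(predicted: Grid, expected: Grid) -> bool:
    """Exact match: same shape and the same value in every cell."""
    return predicted == expected


def task_solved(predictions: List[Grid], task: dict) -> bool:
    """Assumed rule: a task counts only if every test output is matched."""
    expected = [pair["output"] for pair in task["test"]]
    return len(predictions) == len(expected) and all(
        is_solved(p, e) for p, e in zip(predictions, expected)
    )


# Tiny made-up task in the repository's JSON layout (rows are swapped).
example_task = {
    "train": [{"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]}],
    "test": [{"input": [[1, 1], [0, 0]], "output": [[0, 0], [1, 1]]}],
}

print(task_solved([[[0, 0], [1, 1]]], example_task))  # True
print(task_solved([[[0, 0], [1, 0]]], example_task))  # False
```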
Rank | Model | Accuracy | Solved / Total | Avg. Time (s) | Last Updated |
---|---|---|---|---|---|
1 | anthropic/claude-sonnet-4 | 35.0% | 140/400 | 17.21 | 2025-05-23 |
2 | anthropic/claude-3.7-sonnet | 34.2% | 137/400 | 15.20 | 2025-04-05 |
3 | anthropic/claude-3.5-sonnet | 32.5% | 130/400 | 12.01 | 2025-02-07 |
4 | deepseek/deepseek-chat-v3-0324 | 28.0% | 112/400 | 17.50 | 2025-04-05 |
5 | google/gemini-2.5-flash | 26.2% | 105/400 | 5.59 | 2025-06-22 |
6 | google/gemini-2.5-flash-preview | 25.2% | 101/400 | 6.37 | 2025-04-18 |
7 | Grok 3 (Beta) | 24.3% | 94/387 | 10.0k | 2025-02-25 |
8 | Gemini 2.0 Pro | 22.2% | 89/400 | 10.0k | 2025-02-25 |
9 | openai/gpt-4.1 | 20.5% | 82/400 | 7.13 | 2025-04-15 |
10 | meta-llama/llama-4-maverick | 18.5% | 74/400 | 5.54 | 2025-04-06 |
11 | openai/gpt-4.1-mini | 17.5% | 70/400 | 5.42 | 2025-04-15 |
12 | google/gemini-2.0-flash-lite-001 | 16.0% | 64/400 | 3.60 | 2025-03-13 |
13 | google/gemini-2.5-flash-lite-preview-06-17 | 15.0% | 60/400 | 1.85 | 2025-06-22 |
14 | mistralai/mistral-small-3.2-24b-instruct:free | 11.2% | 45/400 | 40.99 | 2025-06-22 |
15 | meta-llama/llama-4-scout | 10.5% | 42/400 | 1.19 | 2025-04-06 |
16 | google/gemma-3-27b-it | 10.2% | 41/400 | 30.89 | 2025-03-14 |
17 | mistralai/mistral-small-3.1-24b-instruct-2503 | 10.0% | 40/400 | 88.91 | 2025-03-21 |
18 | openai/gpt-4o-mini | 8.0% | 32/400 | 7.61 | 2025-02-09 |
19 | openai/gpt-4.1-nano | 6.2% | 25/400 | 5.23 | 2025-04-14 |
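The Accuracy column follows directly from the solved/total counts in the table above. The sketch below recomputes it for three rows and sorts by it. The `Result` class is a hypothetical helper, not part of any published evaluation code, and tie-breaking is not addressed because the page does not describe it.

```python
from dataclasses import dataclass


@dataclass
class Result:
    model: str
    solved: int
    total: int

    @property
    def accuracy(self) -> float:
        """Percentage of tasks solved out of the tasks counted."""
        return 100.0 * self.solved / self.total


# Three rows copied from the table, deliberately out of order.
results = [
    Result("openai/gpt-4.1", 82, 400),
    Result("anthropic/claude-sonnet-4", 140, 400),
    Result("Grok 3 (Beta)", 94, 387),
]

# Rank by accuracy, highest first.
ranked = sorted(results, key=lambda r: r.accuracy, reverse=True)
for rank, r in enumerate(ranked, start=1):
    print(f"{rank} | {r.model} | {r.accuracy:.1f}% | {r.solved}/{r.total}")
```

Note that the Grok 3 (Beta) row uses a total of 387 rather than 400, so its percentage is computed against a smaller set of tasks than the other rows.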
For detailed benchmark results and methodology, see the benchmark videos on my YouTube channel.
I'm working on additional AI benchmarks that will be added to this page in the future.
Subscribe to my YouTube channel to be notified when new benchmarks are released.