Top AI Models Compared: Coding & Reasoning Scores (2025)

Benchmark comparison of 15 leading AI models from 10 organizations, with coding scores, reasoning scores, context windows, and open-weights status.
| # | model | org | release_date | context_window_k | params_b | coding_score | reasoning_score | multimodal | open_weights |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-4o | OpenAI | 2024-05-13 | 128 | unknown | 82 | 88 | yes | no |
| 2 | Claude 3.5 Sonnet | Anthropic | 2024-10-22 | 200 | unknown | 85 | 91 | yes | no |
| 3 | Claude 3.7 Sonnet | Anthropic | 2025-02-24 | 200 | unknown | 88 | 94 | yes | no |
| 4 | Gemini 2.0 Flash | Google | 2025-01-15 | 1000 | unknown | 80 | 86 | yes | no |
| 5 | Gemini 2.5 Pro | Google | 2025-03-25 | 1000 | unknown | 86 | 93 | yes | no |
| 6 | Llama 3.3 70B | Meta | 2024-12-06 | 128 | 70 | 76 | 82 | no | yes |
| 7 | DeepSeek-V3 | DeepSeek | 2024-12-26 | 128 | 671 | 82 | 87 | no | yes |
| 8 | DeepSeek-R1 | DeepSeek | 2025-01-20 | 128 | 671 | 84 | 95 | no | yes |
| 9 | Grok-3 | xAI | 2025-02-17 | 131 | unknown | 83 | 92 | yes | no |
| 10 | Mistral Large 2 | Mistral | 2024-07-24 | 128 | 123 | 78 | 80 | no | yes |
| 11 | Qwen2.5-72B | Alibaba | 2024-09-19 | 128 | 72 | 79 | 83 | no | yes |
| 12 | Phi-4 | Microsoft | 2024-12-12 | 16 | 14 | 77 | 81 | no | yes |
| 13 | o3-mini | OpenAI | 2025-01-31 | 200 | unknown | 87 | 96 | no | no |
| 14 | o1 | OpenAI | 2024-09-12 | 200 | unknown | 85 | 95 | no | no |
| 15 | Command R+ | Cohere | 2024-04-04 | 128 | 104 | 74 | 78 | no | no |
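
As a rough illustration of how the table can be queried, the Python sketch below copies the model, coding_score, reasoning_score, and open_weights columns from the rows above and compares open-weights models with closed ones. The helper names `best` and `mean_score` are hypothetical and chosen only for this example; the page does not say which benchmarks the scores come from, so the numbers are treated purely as the values listed.

```python
from statistics import mean

# (model, coding_score, reasoning_score, open_weights), copied from the table above.
ROWS = [
    ("GPT-4o", 82, 88, False),
    ("Claude 3.5 Sonnet", 85, 91, False),
    ("Claude 3.7 Sonnet", 88, 94, False),
    ("Gemini 2.0 Flash", 80, 86, False),
    ("Gemini 2.5 Pro", 86, 93, False),
    ("Llama 3.3 70B", 76, 82, True),
    ("DeepSeek-V3", 82, 87, True),
    ("DeepSeek-R1", 84, 95, True),
    ("Grok-3", 83, 92, False),
    ("Mistral Large 2", 78, 80, True),
    ("Qwen2.5-72B", 79, 83, True),
    ("Phi-4", 77, 81, True),
    ("o3-mini", 87, 96, False),
    ("o1", 85, 95, False),
    ("Command R+", 74, 78, False),
]

CODING, REASONING, OPEN = 1, 2, 3  # tuple positions of the columns used here


def best(rows, score_index):
    """Return the row with the highest value in the given score column."""
    return max(rows, key=lambda row: row[score_index])


def mean_score(rows, score_index):
    """Return the arithmetic mean of one score column."""
    return mean(row[score_index] for row in rows)


open_rows = [row for row in ROWS if row[OPEN]]
closed_rows = [row for row in ROWS if not row[OPEN]]

best_open_coder = best(open_rows, CODING)
best_reasoner = best(ROWS, REASONING)

print("Best open-weights coding score:", best_open_coder[0], best_open_coder[CODING])
print("Best reasoning score overall:", best_reasoner[0], best_reasoner[REASONING])
print("Mean coding score, open vs closed:",
      round(mean_score(open_rows, CODING), 2),
      round(mean_score(closed_rows, CODING), 2))
```

The same pattern extends to the other columns (for example, filtering on context_window_k), provided the "unknown" entries in params_b are handled before any numeric comparison.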

