Top AI Models Compared: Coding & Reasoning Scores (2025)
Benchmark comparison of 15 leading AI models from 10 organizations, with coding scores, reasoning scores, context windows, and open-weights status.
| # | model | org | release_date | context_window_k | params_b | coding_score | reasoning_score | multimodal | open_weights |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-4o | OpenAI | 2024-05-13 | 128 | unknown | 82 | 88 | yes | no |
| 2 | Claude 3.5 Sonnet | Anthropic | 2024-10-22 | 200 | unknown | 85 | 91 | yes | no |
| 3 | Claude 3.7 Sonnet | Anthropic | 2025-02-24 | 200 | unknown | 88 | 94 | yes | no |
| 4 | Gemini 2.0 Flash | Google | 2025-01-15 | 1000 | unknown | 80 | 86 | yes | no |
| 5 | Gemini 2.5 Pro | Google | 2025-03-25 | 1000 | unknown | 86 | 93 | yes | no |
| 6 | Llama 3.3 70B | Meta | 2024-12-06 | 128 | 70 | 76 | 82 | no | yes |
| 7 | DeepSeek-V3 | DeepSeek | 2024-12-26 | 128 | 671 | 82 | 87 | no | yes |
| 8 | DeepSeek-R1 | DeepSeek | 2025-01-20 | 128 | 671 | 84 | 95 | no | yes |
| 9 | Grok-3 | xAI | 2025-02-17 | 131 | unknown | 83 | 92 | yes | no |
| 10 | Mistral Large 2 | Mistral | 2024-07-24 | 128 | 123 | 78 | 80 | no | yes |
| 11 | Qwen2.5-72B | Alibaba | 2024-09-19 | 128 | 72 | 79 | 83 | no | yes |
| 12 | Phi-4 | Microsoft | 2024-12-12 | 16 | 14 | 77 | 81 | no | yes |
| 13 | o3-mini | OpenAI | 2025-01-31 | 200 | unknown | 87 | 96 | no | no |
| 14 | o1 | OpenAI | 2024-09-12 | 200 | unknown | 85 | 95 | no | no |
| 15 | Command R+ | Cohere | 2024-04-04 | 128 | 104 | 74 | 78 | no | no |
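
As a rough illustration of how the table can be queried, the sketch below copies the model, coding_score, reasoning_score, and open_weights columns into a small Python structure and ranks them. The names `Row`, `ROWS`, `top_by`, and `mean_score` are hypothetical helpers introduced here and are not part of the dataset. The scores are taken as listed, without any claim about how they were measured.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One table row, reduced to the columns used below (hypothetical type)."""
    model: str
    coding: int
    reasoning: int
    open_weights: bool


# Values copied from the coding_score, reasoning_score, and open_weights columns.
ROWS = [
    Row("GPT-4o", 82, 88, False),
    Row("Claude 3.5 Sonnet", 85, 91, False),
    Row("Claude 3.7 Sonnet", 88, 94, False),
    Row("Gemini 2.0 Flash", 80, 86, False),
    Row("Gemini 2.5 Pro", 86, 93, False),
    Row("Llama 3.3 70B", 76, 82, True),
    Row("DeepSeek-V3", 82, 87, True),
    Row("DeepSeek-R1", 84, 95, True),
    Row("Grok-3", 83, 92, False),
    Row("Mistral Large 2", 78, 80, True),
    Row("Qwen2.5-72B", 79, 83, True),
    Row("Phi-4", 77, 81, True),
    Row("o3-mini", 87, 96, False),
    Row("o1", 85, 95, False),
    Row("Command R+", 74, 78, False),
]


def top_by(rows, key):
    """Return the row with the highest value in the given score column."""
    return max(rows, key=lambda r: getattr(r, key))


def mean_score(rows, key):
    """Return the arithmetic mean of one score column over the given rows."""
    values = [getattr(r, key) for r in rows]
    return sum(values) / len(values)


open_rows = [r for r in ROWS if r.open_weights]
closed_rows = [r for r in ROWS if not r.open_weights]

print("Top coding score:", top_by(ROWS, "coding").model)
print("Top reasoning score:", top_by(ROWS, "reasoning").model)
print("Top open-weights coding score:", top_by(open_rows, "coding").model)
print("Mean coding score, open weights:", round(mean_score(open_rows, "coding"), 1))
print("Mean coding score, closed weights:", round(mean_score(closed_rows, "coding"), 1))
```

With 15 rows the group averages are only a coarse summary; they describe this table and should not be read as a general statement about open versus closed models.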