
Akademi Kecerdasan Buatan Indonesia
February 18, 2025 at 01:23 PM
Benchmark Performance
Grok 3 posts strong results across math, science, and coding benchmarks: a Grok 3 variant takes the top score in every comparison listed here. The reported results break down as follows:
* Reasoning + Test-Time Compute:
- Math (AIME '24): Grok-3 Reasoning Beta (93), Grok-3 mini Reasoning (96), o3-mini-high (87), o1 (83), DeepSeek-R1 (80), Gemini-2 Flash Thinking (73)
- Science (GPQA): Grok-3 Reasoning Beta (85), Grok-3 mini Reasoning (84), o3-mini-high (80), o1 (78), DeepSeek-R1 (71), Gemini-2 Flash Thinking (74)
- Coding (LCB Oct-Feb): Grok-3 Reasoning Beta (79), Grok-3 mini Reasoning (80), o3-mini-high (74), o1 (73), DeepSeek-R1 (65), Gemini-2 Flash Thinking (46)
* Standard Benchmarks:
- Math (AIME '24): Grok-3 (52), Grok-3 mini (40), Gemini-2 Pro (36), DeepSeek-V3 (39), Claude 3.5 Sonnet (16), GPT-4o (9)[1][2]
- Science (GPQA): Grok-3 (75), Grok-3 mini (65), Gemini-2 Pro (65), DeepSeek-V3 (59), Claude 3.5 Sonnet (65), GPT-4o (50)[1][2]
- Coding (LCB Oct-Feb): Grok-3 (57), Grok-3 mini (41), Gemini-2 Pro (36), DeepSeek-V3 (40), Claude 3.5 Sonnet (36), GPT-4o (34)[1][2]
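To make the gaps in the lists above easier to read, here is a minimal Python sketch that stores the reported scores and prints, for each benchmark, the top model and its lead over the best non-Grok model. The scores are copied from the lists above. The dictionary layout and the helper name `leader_and_lead` are assumptions made for illustration and are not part of any xAI tooling.

```python
# Scores transcribed from the lists above. The dictionary layout and the
# helper below are illustrative assumptions, not part of any xAI tooling.
REASONING = {
    "Math (AIME '24)": {
        "Grok-3 Reasoning Beta": 93, "Grok-3 mini Reasoning": 96,
        "o3-mini-high": 87, "o1": 83, "DeepSeek-R1": 80,
        "Gemini-2 Flash Thinking": 73,
    },
    "Science (GPQA)": {
        "Grok-3 Reasoning Beta": 85, "Grok-3 mini Reasoning": 84,
        "o3-mini-high": 80, "o1": 78, "DeepSeek-R1": 71,
        "Gemini-2 Flash Thinking": 74,
    },
    "Coding (LCB Oct-Feb)": {
        "Grok-3 Reasoning Beta": 79, "Grok-3 mini Reasoning": 80,
        "o3-mini-high": 74, "o1": 73, "DeepSeek-R1": 65,
        "Gemini-2 Flash Thinking": 46,
    },
}

STANDARD = {
    "Math (AIME '24)": {
        "Grok-3": 52, "Grok-3 mini": 40, "Gemini-2 Pro": 36,
        "DeepSeek-V3": 39, "Claude 3.5 Sonnet": 16, "GPT-4o": 9,
    },
    "Science (GPQA)": {
        "Grok-3": 75, "Grok-3 mini": 65, "Gemini-2 Pro": 65,
        "DeepSeek-V3": 59, "Claude 3.5 Sonnet": 65, "GPT-4o": 50,
    },
    "Coding (LCB Oct-Feb)": {
        "Grok-3": 57, "Grok-3 mini": 41, "Gemini-2 Pro": 36,
        "DeepSeek-V3": 40, "Claude 3.5 Sonnet": 36, "GPT-4o": 34,
    },
}


def leader_and_lead(scores):
    """Return (top model, its lead in points over the best non-Grok model)."""
    top = max(scores, key=scores.get)
    best_other = max(v for k, v in scores.items() if not k.startswith("Grok"))
    return top, scores[top] - best_other


for group_name, group in (("Reasoning", REASONING), ("Standard", STANDARD)):
    for benchmark, scores in group.items():
        top, lead = leader_and_lead(scores)
        print(f"{group_name} | {benchmark}: {top} leads the best "
              f"non-Grok model by {lead} points")
```

On these numbers the lead is widest in the standard coding comparison (17 points) and narrowest in the reasoning science comparison (5 points).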
Additionally, an early version of Grok-3, codenamed "Chocolate," became the first AI model to exceed an Elo score of 1400 in the LMSYS Chatbot Arena, ranking first across all categories[3]. On AIME 2025, Grok-3 Reasoning Beta and Grok-3 mini Reasoning took the top two positions, significantly outperforming other reasoning models[2].
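For a rough sense of what an Arena rating of 1400 means, here is a small Python sketch of the classic Elo expected-score formula. It assumes the standard 400-point logistic scale. The Arena's own rating computation may differ in detail, and the 1300-rated opponent is a hypothetical value chosen for illustration, not a figure from the sources.

```python
def elo_expected_score(rating_a, rating_b, scale=400.0):
    """Expected score (roughly, win probability) of A against B under
    the classic Elo formula. The 400-point scale is the conventional
    default and is an assumption here."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Hypothetical matchup: a 1400-rated model against a 1300-rated one.
p = elo_expected_score(1400, 1300)
print(f"Expected score for the 1400-rated model: {p:.2f}")  # about 0.64
```

Under this formula a 100-point gap means the higher-rated model is expected to be preferred in about 64% of head-to-head comparisons, and equal ratings give exactly 50%.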
Citations:
[1] xAI's Grok 3 Is Here—And It Might Be the Smartest AI on Earth https://felloai.com/2025/02/xais-grok-3-is-here-and-it-might-be-the-smartest-ai-on-earth/
[2] 200,000 GPUs! Musk unveils the "world's strongest" model Grok 3 ... https://news.futunn.com/en/post/53289310/200000-gpus-musk-unveils-the-world-s-strongest-model-grok
[3] Grok 3 Has Better Reasoning Than ChatGPT: Experts React To Elon ... https://www.ibtimes.com/grok-3-has-better-reasoning-chatgpt-experts-react-elon-musk-xais-latest-chatbot-3764083