Akademi Kecerdasan Buatan Indonesia
February 18, 2025 at 01:23 PM
Benchmark Performance

Grok 3 has performed strongly across several benchmarks, and its best variant leads every comparison listed below. Here's a breakdown of the reported results:

* Reasoning + Test-Time Compute:
  - Math (AIME '24): Grok-3 Reasoning Beta (93), Grok-3 mini Reasoning (96), o3-mini-high (87), o1 (83), DeepSeek-R1 (80), Gemini-2 Flash Thinking (73)
  - Science (GPQA): Grok-3 Reasoning Beta (85), Grok-3 mini Reasoning (84), o3-mini-high (80), o1 (78), DeepSeek-R1 (71), Gemini-2 Flash Thinking (74)
  - Coding (LCB Oct-Feb): Grok-3 Reasoning Beta (79), Grok-3 mini Reasoning (80), o3-mini-high (74), o1 (73), DeepSeek-R1 (65), Gemini-2 Flash Thinking (46)
* Standard Benchmarks:
  - Math (AIME '24): Grok-3 (52), Grok-3 mini (40), Gemini-2 Pro (36), DeepSeek-V3 (39), Claude 3.5 Sonnet (16), GPT-4o (9)[1][2]
  - Science (GPQA): Grok-3 (75), Grok-3 mini (65), Gemini-2 Pro (65), DeepSeek-V3 (59), Claude 3.5 Sonnet (65), GPT-4o (50)[1][2]
  - Coding (LCB Oct-Feb): Grok-3 (57), Grok-3 mini (41), Gemini-2 Pro (36), DeepSeek-V3 (40), Claude 3.5 Sonnet (36), GPT-4o (34)[1][2]

Additionally, an early version of Grok-3, codenamed "Chocolate," became the first AI model to break the 1400 Elo mark in the LMSYS Chatbot Arena, ranking first across all categories[3].

In the AIME 2025 mathematics competition, Grok-3 Reasoning Beta and Grok-3 mini Reasoning took the top two positions, significantly outperforming the other reasoning models[2].

Citations:
[1] xAI's Grok 3 Is Here—And It Might Be the Smartest AI on Earth. https://felloai.com/2025/02/xais-grok-3-is-here-and-it-might-be-the-smartest-ai-on-earth/
[2] 200,000 GPUs! Musk unveils the "world's strongest" model Grok 3 ... https://news.futunn.com/en/post/53289310/200000-gpus-musk-unveils-the-world-s-strongest-model-grok
[3] Grok 3 Has Better Reasoning Than ChatGPT: Experts React To Elon ... https://www.ibtimes.com/grok-3-has-better-reasoning-chatgpt-experts-react-elon-musk-xais-latest-chatbot-3764083
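
To make the margins in the benchmark figures above concrete, here is a short Python sketch. It encodes the rounded scores exactly as quoted in this post and computes, for each benchmark, how far the best Grok-3 variant leads the best non-xAI model. The dictionary layout and the helper names (`is_xai`, `xai_margin`, `all_margins`) are illustrative assumptions, not part of any xAI tooling. Treating every model whose name starts with "Grok-3" as an xAI model is also a simplification.

```python
"""Compare the best Grok-3 variant with the best non-xAI model per benchmark.

Scores are the rounded figures quoted in the post; nothing here is re-measured.
"""

# Benchmark group -> benchmark -> model -> reported score (rounded).
SCORES = {
    "Reasoning + Test-Time Compute": {
        "AIME '24": {"Grok-3 Reasoning Beta": 93, "Grok-3 mini Reasoning": 96,
                     "o3-mini-high": 87, "o1": 83, "DeepSeek-R1": 80,
                     "Gemini-2 Flash Thinking": 73},
        "GPQA": {"Grok-3 Reasoning Beta": 85, "Grok-3 mini Reasoning": 84,
                 "o3-mini-high": 80, "o1": 78, "DeepSeek-R1": 71,
                 "Gemini-2 Flash Thinking": 74},
        "LCB Oct-Feb": {"Grok-3 Reasoning Beta": 79, "Grok-3 mini Reasoning": 80,
                        "o3-mini-high": 74, "o1": 73, "DeepSeek-R1": 65,
                        "Gemini-2 Flash Thinking": 46},
    },
    "Standard": {
        "AIME '24": {"Grok-3": 52, "Grok-3 mini": 40, "Gemini-2 Pro": 36,
                     "DeepSeek-V3": 39, "Claude 3.5 Sonnet": 16, "GPT-4o": 9},
        "GPQA": {"Grok-3": 75, "Grok-3 mini": 65, "Gemini-2 Pro": 65,
                 "DeepSeek-V3": 59, "Claude 3.5 Sonnet": 65, "GPT-4o": 50},
        "LCB Oct-Feb": {"Grok-3": 57, "Grok-3 mini": 41, "Gemini-2 Pro": 36,
                        "DeepSeek-V3": 40, "Claude 3.5 Sonnet": 36, "GPT-4o": 34},
    },
}


def is_xai(model: str) -> bool:
    # Hypothetical rule: every xAI entry in the post is named "Grok-3...".
    return model.startswith("Grok-3")


def xai_margin(scores: dict) -> tuple:
    """Return (best xAI model, best other model, lead in points).

    On a tie, max() keeps the first model in listing order.
    """
    best_xai = max((m for m in scores if is_xai(m)), key=scores.get)
    best_other = max((m for m in scores if not is_xai(m)), key=scores.get)
    return best_xai, best_other, scores[best_xai] - scores[best_other]


def all_margins(groups: dict = SCORES) -> dict:
    # Map (group, benchmark) to the margin tuple for every listed benchmark.
    return {
        (group, bench): xai_margin(scores)
        for group, benchmarks in groups.items()
        for bench, scores in benchmarks.items()
    }


if __name__ == "__main__":
    for (group, bench), (xai, other, gap) in all_margins().items():
        print(f"[{group}] {bench}: {xai} leads {other} by {gap} points")
```

Run as a script, it prints one line per benchmark. With these rounded figures, the smallest lead is on GPQA in the reasoning group (5 points over o3-mini-high) and the largest is on LCB (LiveCodeBench) in the standard group (17 points over DeepSeek-V3). Because the inputs are rounded, a gap of a point or two should not be over-read.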
