RT X Freeze: Grok 4.20 Beta ranks #2 with 97% accuracy score on the 𝜏²-Bench for Telecom (Agentic Tool Use) It outperforms Claude Opus 4.6(max), G...

Source: Musk on X | Published: 2026-03-31 17:42

RT X Freeze
Grok 4.20 Beta ranks #2 with a 97% accuracy score on the 𝜏²-Bench for Telecom (Agentic Tool Use)

It outperforms Claude Opus 4.6 (max), GPT-5.4 (xhigh), and Gemini 3.1 Pro, while closing in on GLM-5, which holds the top score in agentic workflows

Tool calling is the whole game for AI agents, and this is where Grok 4.20 takes over

with state-of-the-art intelligence that fires up instantly, making it the fastest in the industry in tokens per second