RT X Freeze
Grok 4.20 Beta ranks #2 with a 97% accuracy score on 𝜏²-Bench Telecom (agentic tool use)
It outperforms Claude Opus 4.6 (max), GPT-5.4 (xhigh), and Gemini 3.1 Pro, while closing in on GLM-5, the top scorer in agentic workflows
Tool calling is the whole game for AI agents, and this is where Grok 4.20 takes over,
with state-of-the-art intelligence that fires up instantly, making it the fastest in the industry in tokens per second