Elon Musk's artificial intelligence company xAI released its latest language model, Grok 3, on Monday, marking significant progress in the field. According to Musk, the new model required ten times the computing power of its predecessor and was trained in a Memphis data center equipped with about 200,000 GPUs.
The Grok 3 series launches in several variants, including a streamlined version designed for speed at the cost of some accuracy. In addition, a new "reasoning" model is built specifically for mathematical and scientific problems. Users can toggle these capabilities through the Think and Big Brain settings in the Grok interface. xAI said this version is not yet final: the model is still being trained, and the team plans further improvements in the coming weeks.
According to the AI benchmarking platform lmarena.ai, Grok 3 scored above 1,400 in the chatbot arena, taking the lead across all categories, including programming, and surpassing models from OpenAI, Anthropic, and Google. Real-world performance may differ from benchmark results, however: although Claude 3.5 Sonnet scores lower than some rivals on coding benchmarks, many users still consider it the better choice for programming tasks.
OpenAI co-founder Andrej Karpathy received early access to Grok 3 and praised the model's logical reasoning. The "Think" feature successfully handled complex tasks such as estimating the training FLOPs for GPT-2 and generating hexagonal grids for board games, tasks previously limited to OpenAI's top-end models. The feature also improves accuracy on basic operations such as letter counting and decimal comparison.
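For readers curious what "estimating the training FLOPs for GPT-2" involves, a common back-of-the-envelope method is the 6·N·D approximation (N = parameter count, D = training tokens, with the factor 6 covering the forward and backward passes). The sketch below uses GPT-2's published 1.5B parameter count, but the token count is an assumed round figure for illustration only, not an official number, and this is not necessarily the exact method Grok 3 used.

```python
def training_flops(num_params: float, num_tokens: float) -> float:
    """Approximate total training compute via the common 6*N*D rule of thumb:
    roughly 6 floating-point operations per parameter per training token
    (forward pass + backward pass)."""
    return 6.0 * num_params * num_tokens

# Illustrative numbers: GPT-2's 1.5B parameters, with an assumed
# (hypothetical) training set of 100B tokens.
n_params = 1.5e9
n_tokens = 100e9

print(f"Estimated training compute: {training_flops(n_params, n_tokens):.1e} FLOPs")
```

The point of such questions is that they require the model to recall the right approximation, pick plausible inputs, and carry out the arithmetic correctly, which is exactly where chain-of-thought "think" modes tend to help.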
On the new search capabilities, Karpathy judged DeepSearch's quality comparable to Perplexity's research tools, providing relevant answers on topics such as upcoming Apple products and Palantir stock movements. However, he also found notable problems: the model sometimes generates fake URLs, makes unsupported claims, and cites posts on X only when specifically prompted.
The model also seems to lack awareness of its own existence, omitting xAI when listing the major AI labs. These limitations leave DeepSearch short of the quality of OpenAI's "Deep Research," and the model also underperformed on humor and ethical questions.