GPT-4.5 Beats Claude and DeepSeek in an AI "Werewolf"-Style Elimination Game

Author: LoRA Time: 04 Mar 2025 1031

It turns out that AI can do more than compete on the chessboard: it can also show striking intelligence in social deduction games such as Werewolf. A recently released benchmark called the "Elimination Game" produced an eye-catching result: GPT-4.5 came out on top in this social game, finishing ahead of Claude 3.7 Sonnet and DeepSeek R1. The result raises an obvious question: how far has the "social intelligence" of AI already evolved?

The rules of the Elimination Game are tense by design. Up to 8 players (AI models or human players) enter the game, and one player is voted out in each round until only two survivors remain. The eliminated players then form a jury that decides which of the two finalists wins. It is something like an AI version of "Game of Thrones", full of betrayal, deception, and strategy.

During the game, all players can talk in a public chat room, where they explain their views, win others over, and mislead their opponents. Besides the public channel, players can also chat privately to form secret alliances or to set traps under cover of a diversion. Even within just three rounds of private chat, the amount of information exchanged and the scheming involved can be considerable. Players have to walk a tightrope between trust and deception: one careless move and they are voted out.

When the game reaches the final showdown, the two remaining players each give a closing statement and do their best to persuade the jury of eliminated players to vote for them. The jury then votes, and that vote determines the single winner.
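The flow of a game as described above can be sketched in a few lines of Python. This is only an illustrative simulation, not the benchmark's actual code: the function names `plurality` and `play_elimination_game` are hypothetical, the chat phases are left as comments, every vote is random, ties are broken at random, and the minimum of 3 players is an assumption (the article only states the maximum of 8).

```python
import random
from collections import Counter


def plurality(votes, rng):
    """Return the name with the most votes; ties are broken at random (assumption)."""
    counts = Counter(votes)
    top = max(counts.values())
    tied = sorted(name for name, n in counts.items() if n == top)
    return rng.choice(tied)


def play_elimination_game(players, seed=0):
    """Simulate one game with random voters (hypothetical stand-ins for models)."""
    if not 3 <= len(players) <= 8:
        # Upper bound comes from the article; lower bound is an assumption.
        raise ValueError("expected between 3 and 8 players")
    rng = random.Random(seed)
    alive = list(players)
    jury = []

    while len(alive) > 2:
        # Public chat and private chats would happen here.
        # Each surviving player votes for someone other than themselves.
        votes = [rng.choice([p for p in alive if p != voter]) for voter in alive]
        eliminated = plurality(votes, rng)
        alive.remove(eliminated)
        jury.append(eliminated)  # eliminated players join the jury

    # The two finalists would give their closing statements here.
    # Each juror then votes for one of the finalists.
    jury_votes = [rng.choice(alive) for _ in jury]
    winner = plurality(jury_votes, rng)
    return winner, alive, jury


if __name__ == "__main__":
    names = ["A", "B", "C", "D", "E", "F", "G", "H"]
    winner, finalists, jury = play_elimination_game(names, seed=42)
    print("finalists:", finalists)
    print("jury:", jury)
    print("winner:", winner)
```

With 8 players, this loop runs six elimination rounds, so the jury that picks the winner has six members. In the real benchmark, the random choices would be replaced by each model's own decisions based on the conversations.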

So how did the major models perform in this AI take on Werewolf? The results are eye-opening:

GPT-4.5: master of social reasoning and top persuader, the overall winner. GPT-4.5 plays like a shrewd, calculating Werewolf expert, with very strong strategic and social reasoning. Its betrayal rate is extremely low: it prefers to form alliances and cooperate. In the final round, however, it showed remarkable persuasiveness and won the jury over, getting the eliminated players to vote for it willingly. GPT-4.5 finished with a win rate of 62.6%, ahead of every other model.

Claude 3.7 Sonnet: a flexible "balance master", but a step behind. Claude 3.7 Sonnet's strategic flexibility is slightly below that of GPT-4.5, but its social reasoning and its ability to deceive are still strong. Its betrayal rate is moderate, and it moves comfortably between cooperation and betrayal. It also performed very well in the jury stage and finished with a win rate of 59.3%, a result that should not be underestimated.

DeepSeek R1: a reckless player whose aggressive strategy is fierce but lacks staying power. DeepSeek R1 makes bold strategic choices, plays very aggressively, and has a relatively high betrayal rate. In social strategy and language expression, however, it is clearly at a disadvantage and struggles to win over the jury. In the final stage it achieved a win rate of only 53.8%, an unsatisfying result. Its play is also less stable, and it relies more on head-on, confrontational strategies.

The Elimination Game benchmark clearly probes the level of social intelligence in AI, and GPT-4.5's impressive performance once again updates our understanding of what these models can do. As AI's social intelligence continues to evolve, it may, as in science fiction films, become deeply integrated into human society and even surpass humans in some fields. This AI Werewolf contest is only the beginning. The boundaries of AI's intelligence are still expanding, and the surprises ahead may go well beyond what we imagine.
