Tuesday, October 14, 2025

Elon Musk launched Grok 4 last night, calling it the ‘smartest AI in the world’: what businesses need to know




After days of controversy surrounding a flurry of antisemitic responses made recently by his Grok AI-powered chatbot on his social network X (formerly Twitter), a seemingly unrepentant and unbothered Elon Musk launched the latest version of his AI model family, Grok 4, during an event livestreamed on X last night, calling it “the smartest AI in the world.”

As Musk posted on X: “Grok 4 is the first time, in my experience, that an AI has been able to solve difficult, real-world engineering questions where the answers cannot be found anywhere on the Internet or in books. And it will get much better.”

The new release actually includes two distinct models: Grok 4, a single-agent reasoning model, and Grok 4 Heavy, a multi-agent system designed to solve complex problems through internal collaboration and synthesis.

Both models are optimized for reasoning tasks and come with native tool integration, enabling capabilities such as web search, code execution, and multimodal analysis.

Musk and his team at xAI showcased benchmarks that suggest Grok 4 outperforms all current competitors across a range of academic and coding evaluations, even compared to the previously leading AI reasoning models from rivals, OpenAI o3 and Google Gemini.

However, xAI has not yet released a model card or any official release notes for Grok 4 to the public, making it difficult to independently assess performance and the claims made during the stream. We’ll update if/when these become available.

Nor did Musk and his xAI team members participating in the livestream address the obvious controversy facing Grok over the past week, including multiple incidents of Grok making antisemitic remarks, referring to itself as “MechaHitler,” and suggesting that people with Jewish surnames should be handled decisively by Adolf Hitler, a seemingly overt reference to the Holocaust and the genocide of 6 million Jews during World War II.

The closest Musk came was when he stated: “The thing that I think is most important for AI safety, at least my biological neural net tells me the most important thing, is to be maximally truth-seeking,” and “We need to make sure that the AI is a good AI. Good Grok,” as well as “It’s important to instill the values you want in a child that would grow up to be incredibly powerful.”

However, Musk did not apologize, nor did he accept responsibility for Grok’s antisemitic, sexually offensive, and conspiratorial remarks.

Throughout the livestream, the team emphasized Grok 4’s ability to reason from first principles, correct its own errors, and potentially invent new technologies or discover novel scientific insights.

The presentation also included demonstrations of Grok 4 Heavy applying multi-agent collaboration to tackle research-level problems across disciplines.

Availability and pricing

Grok 4 is available now through several channels, depending on user type and subscription level:

  • API Access (for developers and enterprises):
    Grok 4 and Grok 4 Heavy are live via the xAI API. Pricing is structured as follows:
    • $3 per 1 million input tokens
    • $15 per 1 million output tokens
    • $0.75 per 1 million cached input tokens
    • Prices double after 128,000 tokens in a single context window
      The API supports text and image inputs, function calling, and structured outputs, and offers a 256,000-token context window.
  • Consumer Access (via Grok chatbot and apps):
    Individual users can access Grok 4 through the Grok chatbot on X, the Grok app (iOS and Android), and X.com, but only with one of the following subscriptions:
    • PremiumPlus: $16/month
    • SuperGrok: $300/month
    • A new “SuperGrok Heavy” tier, also priced at $300/month, gives access to both Grok 4 and Grok 4 Heavy, the multi-agent variant.
      (Note: SuperGrok and PremiumPlus tiers may differ in availability and usage quotas across X and Grok platforms.)
  • Launch Timing:
    Grok 4 became available immediately following the July 9, 2025 livestream. Temporary access limits were in place during the demo, but full rollout to subscribers began shortly after.
  • Platform Expansion:
    xAI has indicated plans to make Grok 4 available through Microsoft Azure AI Foundry, where Grok 3 and Grok 3 Mini are currently listed.
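To make the API pricing above concrete, here is a minimal Python sketch of a per-request cost estimate. The function name `estimate_cost` is hypothetical, and the long-context rule is an assumption: the article only says prices double after 128,000 tokens in a single context window, so this sketch doubles every rate for a request whose total tokens exceed that threshold. xAI's actual billing rule may differ.

```python
# Rates quoted in the article, in US dollars per 1 million tokens.
INPUT_RATE = 3.00
CACHED_INPUT_RATE = 0.75
OUTPUT_RATE = 15.00

# Assumption: once a request's total tokens exceed this threshold,
# every rate for that request doubles. The article does not spell out
# the exact billing rule, so treat this as an illustration only.
LONG_CONTEXT_THRESHOLD = 128_000
LONG_CONTEXT_MULTIPLIER = 2.0


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the dollar cost of one Grok 4 API request (hypothetical helper)."""
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")

    # Total tokens that share the context window for this request.
    context = input_tokens + cached_tokens + output_tokens
    multiplier = LONG_CONTEXT_MULTIPLIER if context > LONG_CONTEXT_THRESHOLD else 1.0

    base = (
        input_tokens * INPUT_RATE
        + cached_tokens * CACHED_INPUT_RATE
        + output_tokens * OUTPUT_RATE
    ) / 1_000_000
    return base * multiplier


if __name__ == "__main__":
    print(f"short request: ${estimate_cost(10_000, 2_000):.2f}")
    print(f"long request:  ${estimate_cost(150_000, 2_000):.2f}")
```

Under these assumptions, a request with 10,000 input tokens and 2,000 output tokens comes to about $0.06, while a 150,000-token prompt with the same output comes to about $0.96 because the doubled rate applies.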

For subscription details, users are directed to x.ai/grok and X Premium support. Here’s how it compares to other leading AI models in terms of pricing per million tokens.

| Provider & model | Context window | Input ($ / M tokens) | Cached input | Output ($ / M tokens) | Additional notes |
|---|---|---|---|---|---|
| xAI – Grok 4 / 4 Heavy | 256K (2× price >128K) | $3.00 | $0.75 | $15.00 | Image input, function calling, structured JSON (apidog) |
| OpenAI – o3 | 200K | $2.00 | $0.50 | $8.00 | 50% Batch API discount available (OpenAI, OpenAI Help Center) |
| GPT-4o | 128K | $5.00 | $2.50 | $20.00 | Vision, audio, tools (OpenAI) |
| Anthropic – Claude Sonnet 4 | 200K | $3.00 | $0.30 | $15.00 | 50% batch output discount (Anthropic) |
| Claude Opus 4 | 200K | $15.00 | $1.50 | $75.00 | High-accuracy flagship (Anthropic) |
| Google – Gemini 2.5 Pro | 200K (2× price >200K) | $1.25 | $0.31 | $10.00 | 75% cache hit discount (Google AI for Developers, Google Cloud) |
| Gemini 2.5 Flash | 200K | $0.30 | $0.075 | $2.50 | Fast, low-cost preview tier (Google Cloud) |
| DeepSeek – deepseek-reasoner | 64K | $0.55 (miss) / $0.14 (hit) | $0.14 | $2.19 | 50-75% off-peak discount (DeepSeek API Docs) |
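As a rough illustration of how the list prices in the table translate into spend, the sketch below ranks the models for a given workload. The `PRICES` mapping and the `rank_by_cost` helper are hypothetical names introduced here. The figures are the uncached input and output rates from the table, and the calculation deliberately ignores caching, batch and off-peak discounts, and long-context surcharges.

```python
# (input $/M tokens, output $/M tokens), copied from the table above.
# DeepSeek uses the cache-miss input price. Discounts and long-context
# surcharges are ignored, so this is a simplified comparison.
PRICES = {
    "Grok 4": (3.00, 15.00),
    "OpenAI o3": (2.00, 8.00),
    "GPT-4o": (5.00, 20.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Opus 4": (15.00, 75.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "deepseek-reasoner": (0.55, 2.19),
}


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Return (model, dollars) pairs sorted from cheapest to most expensive."""
    costs = {
        name: (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
        for name, (in_rate, out_rate) in PRICES.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Example workload: 1 million input tokens and 1 million output tokens.
    for name, dollars in rank_by_cost(1_000_000, 1_000_000):
        print(f"{name:<20} ${dollars:,.2f}")
```

For one million input tokens plus one million output tokens, this puts Grok 4 at $18.00, level with Claude Sonnet 4 and above o3 at $10.00, with Claude Opus 4 the most expensive at $90.00.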

Unlike its predecessor Grok 3, released in February, which separated tool-augmented responses from general reasoning, Grok 4 was trained with tools from the start.

The model integrates capabilities such as code execution, web search, and document parsing. It also introduces Grok 4 Heavy, a multi-agent system in which multiple internal models work in parallel to generate and validate answers.

Grok 4 also includes a new voice mode featuring expressive outputs with reduced latency, and it supports text and image input, structured outputs, and function calling.

Performance highlights

The independent AI model analysis and benchmarking group Artificial Analysis stated on X that xAI provided it with a version of Grok 4 (not Heavy) ahead of the public release for scoring.

On technical benchmarks, Grok 4 leads the Artificial Analysis Intelligence Index with a score of 73, ahead of competitors such as OpenAI’s o3 (70) and Google’s Gemini 2.5 Pro (70).

It also recorded top scores in:

  • GPQA Diamond: 88%
  • ARC-AGI 2: 15.9%, double the second-best score
  • Humanity’s Last Exam: 24% on the text-only version, and 44% with tools
  • MMLU-Pro and AIME 2024: 87% and 94%, respectively
  • Coding and math evaluations: highest to date on LiveCodeBench, SciCode, AIME24, and MATH-500

Despite its benchmark success, Grok 4’s output speed stands at 75 tokens per second, slower than models like Gemini 2.5 Flash (353) or OpenAI’s o3 (187), but still faster than Anthropic’s Claude 4 Opus (66).

The model features a 256,000-token context window, which sits above the 200K context limits of o3 and Claude 4 Sonnet but below the 1 million tokens offered by Gemini 2.5 Pro and GPT-4.1.
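To put those output speeds in practical terms, the following sketch converts tokens per second into approximate generation time for a response of a given length. The helper name `seconds_to_generate` is hypothetical, and the calculation assumes constant throughput, ignoring time to first token and any hidden reasoning tokens, so real latencies will differ.

```python
# Output speeds in tokens per second, as quoted in the paragraph above.
OUTPUT_SPEED = {
    "Grok 4": 75,
    "Gemini 2.5 Flash": 353,
    "OpenAI o3": 187,
    "Claude 4 Opus": 66,
}


def seconds_to_generate(output_tokens: int, model: str) -> float:
    """Approximate streaming time, assuming constant throughput (hypothetical helper)."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return output_tokens / OUTPUT_SPEED[model]


if __name__ == "__main__":
    # Example: a 1,500-token answer.
    for model in OUTPUT_SPEED:
        print(f"{model:<18} {seconds_to_generate(1_500, model):5.1f} s")
```

On these figures, a 1,500-token answer would take roughly 20 seconds on Grok 4, compared with about 8 seconds on o3, about 4 seconds on Gemini 2.5 Flash, and about 23 seconds on Claude 4 Opus.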

Real-world use cases

xAI provided several demonstrations of Grok 4’s performance in applied scenarios:

  • In a simulated business task called VendingBench, Grok 4 significantly outperformed other models in long-horizon financial planning.
  • At the Arc Institute, researchers used Grok 4 to analyze CRISPR logs and uncover novel hypotheses.
  • In radiology, the model interpreted chest X-rays with higher accuracy than leading peers.
  • In the financial sector, its combination of real-time data access and reasoning made it suitable for forecasting and analysis.

The model can also create 3D video games with minimal input by autonomously sourcing and integrating assets. Additionally, it demonstrated the ability to simulate astrophysical events using grounded approximations from published research.

Reception and discussion

Industry response to the Grok 4 launch has been divided, mixing enthusiasm for its performance with criticism of the event’s delivery and broader trust issues.

David Shapiro, an AI power user and writer, noted: “Grok 4 now takes its place as ‘good enough to actually help with frontier research’… but has merely caught up with OpenAI.”

Ethan Mollick, a professor at Wharton, remarked on X: “So Grok 3 has had three separate incidents where apparently unvetted changes to the deployed system triggered a large-scale ethical issue and an emergency rollback. I don’t think you can do a Grok 4 launch that doesn’t at least address this honestly, if user trust matters,” later adding, “Grok 3 was an excellent model, and Grok 4 might be superb but having an excellent model is not enough – there are lots of really good models out there. You actually have to trust the model you are building on.”

Ben Hylak, co-founder and CTO of AI product observability startup Raindrop AI (himself a former Musk employee), criticized the livestream itself: “This xAI livestream is one of the worst things I’ve ever watched in my life. Love y’all, but it’s bad.”

Despite the criticisms, benchmarking firm Artificial Analysis noted: “Grok 4 is now the leading AI model.”

Ongoing trust issues

The launch of Grok 4 comes amid renewed criticism over Grok’s prior behavior in consumer deployments, particularly as a chatbot integrated into Musk’s social network, X.

Over the July 4 holiday and in subsequent days, Grok generated antisemitic and conspiratorial responses that reignited scrutiny of its system design and governance practices.

As reported by my VentureBeat colleague Michael F. Nuñez, Grok responded to questions about Jewish influence in Hollywood by asserting that Jewish executives “dominate leadership” at major studios and influence content through “progressive ideologies.” It went on to rant about people with Jewish surnames as fitting a “pattern” of engaging in “extreme leftist activism,” and suggested that Hitler knew “how to handle it decisively, every damn time,” an apparent reference to the Holocaust.

The conspiratorial and antisemitic posting was so prolific that the Anti-Defamation League (ADL), a preeminent U.S.-based nonprofit combating antisemitism and hatred, posted on July 8: “What we are seeing from Grok LLM right now is irresponsible, dangerous and antisemitic, plain and simple. This supercharging of extremist rhetoric will only amplify and encourage the antisemitism that is already surging on X and many other platforms.”

This incident follows a history of problematic Grok outputs, including a May 2025 case in which the Grok bot integrated into X randomly inserted references to a completely nonsensical and nonexistent “white genocide” in South Africa into unrelated queries, and an earlier case in which its system prompt was discovered to direct the Grok chatbot on X to avoid referencing any sources that described Musk and his former political funding beneficiary, U.S. President Donald J. Trump, as spreaders of misinformation. In both of these cases, xAI blamed the behaviors on unnamed employees and said they were being addressed.

Already, today, users of Grok 4 on the consumer app have observed it once again outputting anti-Zionist and antisemitic remarks.

As I previously noted, Musk has openly stated on multiple occasions that he wanted to change Grok to better reflect his personal beliefs and distrust of mainstream media and accredited sources. This makes it a poor choice in enterprise contexts where such views could adversely impact customers and the businesses building atop the Grok family of models.

My prior recommendation remains: For those in the enterprise trying to ensure their business’s AI products work properly and accurately… Grok is sadly best avoided. Fortunately, there are numerous other options to choose from.

