
Meta’s Llama-4: A Promising but Controversial Leap in Open Source AI Models

This week, Meta released its latest artificial intelligence models, the highly anticipated Llama 4 family, to developers, and hinted at an even larger model that is still being trained. Meta claims the new models are state of the art and can compete with the best closed-source models without the need for any fine-tuning.

According to an official announcement from Meta, “These models are our best yet thanks to distillation from Llama 4 Behemoth, a 288 billion active parameter model with 16 experts that is our most powerful yet and among the world’s smartest LLMs. Llama 4 Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still training, and we’re excited to share more details about it even while it’s still in flight.”

The Llama 4 Scout and Maverick models each use 17 billion active parameters per inference, but they differ in the number of experts: Scout uses 16, while Maverick uses 128. Both models are now available for download on llama.com and Hugging Face, and Meta has also integrated them into WhatsApp, Messenger, Instagram, and its Meta.AI website.
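
Both checkpoints can be pulled with standard open-source tooling. Below is a minimal sketch using the Hugging Face transformers library; the repository ID, precision, and hardware choices are our own assumptions, so confirm the exact names and requirements on the model cards before relying on them.

```python
# Minimal sketch: loading a Llama 4 checkpoint from Hugging Face.
# The repository ID below is an assumption -- check llama.com or the
# Hugging Face model card for the exact name, license, and hardware needs.
import torch
from transformers import pipeline

MODEL_ID = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed repo name

generator = pipeline(
    "text-generation",
    model=MODEL_ID,
    device_map="auto",           # spread the weights across available GPUs
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
)

result = generator(
    "Explain the mixture-of-experts idea in one sentence.",
    max_new_tokens=64,
)
print(result[0]["generated_text"])
```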

The mixture of experts (MoE) architecture has been around in the tech world for a while, but it is new to Llama and is designed to make the model highly efficient. Rather than having a large model that activates all its parameters for every task, a mixture of experts activates only the necessary parts, leaving the rest of the model’s brain “dormant” and thus saving computing resources. This means that more powerful models can run on less powerful hardware.
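
To make the idea concrete, here is a toy mixture-of-experts layer in PyTorch: a small router scores each token, only the top-scoring experts run, and every other expert stays idle for that token. This is an illustrative simplification, not Meta's implementation.

```python
# Toy mixture-of-experts layer: only the top-k experts run per token,
# the rest stay "dormant". Illustrative only -- not Meta's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int, top_k: int = 1):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores each token per expert
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (tokens, dim)
        scores = F.softmax(self.router(x), dim=-1)            # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)     # pick k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, k] == e                       # tokens routed to expert e
                if mask.any():                                 # idle experts are skipped entirely
                    out[mask] += weights[mask, k:k + 1] * expert(x[mask])
        return out

layer = ToyMoE(dim=64, num_experts=16, top_k=1)
tokens = torch.randn(8, 64)          # 8 tokens with 64-dim embeddings
print(layer(tokens).shape)           # torch.Size([8, 64])
```

In production systems the router is trained jointly with the experts and load-balanced across GPUs; the sketch only shows why most of the weights can sit untouched for any given token.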

In Meta’s case, for example, Llama 4 Maverick contains 400 billion total parameters but activates only 17 billion at a time, allowing it to run on a single NVIDIA H100 DGX host.
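
The sparse activation cuts the compute per token, but all 400 billion parameters still have to be stored somewhere. Some back-of-the-envelope arithmetic (the precision choices below are our own assumptions, not Meta's deployment settings) shows the gap between what is stored and what is actually used per token:

```python
# Back-of-the-envelope memory math for Llama 4 Maverick.
# Precision choices are assumptions for illustration only.
TOTAL_PARAMS = 400e9    # all experts must be stored in memory
ACTIVE_PARAMS = 17e9    # parameters used per token during inference

def gib(params: float, bytes_per_param: float) -> float:
    """Convert a parameter count at a given precision to GiB."""
    return params * bytes_per_param / 2**30

print(f"Weights at 8-bit:  {gib(TOTAL_PARAMS, 1):,.0f} GiB to store")
print(f"Weights at 16-bit: {gib(TOTAL_PARAMS, 2):,.0f} GiB to store")
print(f"Active at 16-bit:  {gib(ACTIVE_PARAMS, 2):,.0f} GiB touched per token")
```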

Meta’s new Llama 4 models feature native multimodality with early fusion techniques that integrate text and vision tokens. This allows for joint pre-training with massive amounts of unlabeled text, image, and video data, making the model more versatile.
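
In an early-fusion design, image patches and text are both turned into tokens and fed through the same transformer stack from the very first layer, rather than being merged late through a separate vision head. The toy sketch below illustrates the shape of that idea; the dimensions and modules are stand-ins, not Meta's architecture.

```python
# Toy "early fusion" sketch: text tokens and image-patch tokens are
# embedded into the same space and concatenated into one sequence
# before the transformer sees them. Illustrative only.
import torch
import torch.nn as nn

DIM = 64

text_embed = nn.Embedding(32000, DIM)        # vocabulary ids -> embeddings
patch_embed = nn.Linear(3 * 16 * 16, DIM)    # flattened 16x16 RGB patches -> embeddings
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True),
    num_layers=2,
)

text_ids = torch.randint(0, 32000, (1, 12))  # 12 text tokens
patches = torch.randn(1, 9, 3 * 16 * 16)     # 9 image patches

fused = torch.cat([text_embed(text_ids), patch_embed(patches)], dim=1)
out = encoder(fused)                         # one joint sequence of 21 tokens
print(out.shape)                             # torch.Size([1, 21, 64])
```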

Perhaps most impressive is Llama 4 Scout’s context window of 10 million tokens. This significantly surpasses the previous generation’s 128K limit and exceeds most competitors, including current leaders like Gemini with its 1M-token context. According to Meta, this leap enables multi-document summarization, extensive code analysis, and reasoning across massive datasets in a single prompt. Meta said the model was able to process and retrieve information from virtually any part of its 10-million-token window.
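
For a sense of scale, a 10-million-token budget could hold hundreds of full-length documents in a single request. The sketch below packs documents against such a budget; the whitespace token count is a crude stand-in for a real tokenizer, and the budget handling is our own assumption.

```python
# Rough sketch: packing documents into one prompt under a 10M-token budget.
# Whitespace splitting is a crude approximation of a real tokenizer.
CONTEXT_BUDGET = 10_000_000

def approx_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for real tokenization

def pack_documents(docs: list[str], budget: int = CONTEXT_BUDGET) -> str:
    used, chunks = 0, []
    for doc in docs:
        cost = approx_tokens(doc)
        if used + cost > budget:
            break                 # stop before overflowing the context window
        chunks.append(doc)
        used += cost
    print(f"Packed {len(chunks)} documents, ~{used:,} tokens of {budget:,}")
    return "\n\n---\n\n".join(chunks)

prompt = pack_documents(["Quarterly report text... " * 1000 for _ in range(50)])
```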

Meta also teased its still-in-training Behemoth model, which boasts 288 billion active parameters with 16 experts and nearly two trillion total parameters. The company claims this model already outperforms GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on STEM benchmarks like MATH-500 and GPQA Diamond.

However, not all that glitters is gold. Several independent researchers have challenged Meta’s benchmark claims, finding inconsistencies when running their own tests. For example, some users found cases where Llama 4 was scored higher than other models despite giving the wrong answer. The model also tends to write in an overly excited tone, often peppering its responses with emojis.

Other users also pushed back on Meta’s claims about long-context performance. Independent AI researcher Simon Willison wrote in a blog post: “I then tried it with Llama 4 Scout via OpenRouter and got complete junk output for some reason.”

Our own testing of the model suggested that Meta’s claims about its retrieval capabilities fall apart in practice. We ran a “Needle in a Haystack” experiment, embedding specific sentences in lengthy texts and challenging the model to find them. At moderate context lengths, Llama 4 already struggled to locate the planted sentences reliably.
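
A needle-in-a-haystack test like the one described above can be scripted in a few lines: plant a unique sentence at a known depth in filler text, ask the model to repeat it back, and check the answer. The sketch below is a generic harness; ask_model is a placeholder rather than any specific API, and the dummy model is only there to show the flow.

```python
# Sketch of a "Needle in a Haystack" retrieval test.
# `ask_model` is a placeholder for whatever inference API is being tested.
import random

FILLER = "The sky was a calm, unremarkable shade of grey that afternoon. "
NEEDLE = "The secret passphrase is 'amber falcon 42'."

def build_haystack(n_sentences: int, needle_position: float) -> str:
    sentences = [FILLER] * n_sentences
    insert_at = int(n_sentences * needle_position)  # e.g. 0.5 = middle of the context
    sentences.insert(insert_at, NEEDLE + " ")
    return "".join(sentences)

def run_trial(ask_model, n_sentences: int, needle_position: float) -> bool:
    haystack = build_haystack(n_sentences, needle_position)
    question = "What is the secret passphrase mentioned in the text above?"
    answer = ask_model(haystack + "\n\n" + question)
    return "amber falcon 42" in answer.lower()

# Example with a dummy model that guesses randomly -- replace with a real call.
dummy = lambda prompt: random.choice(["amber falcon 42", "I don't know"])
results = [run_trial(dummy, n_sentences=2000, needle_position=p) for p in (0.1, 0.5, 0.9)]
print(results)
```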
