Analysis 09 JUNE 2026

The Naming Game: How "AI" Became the Most Profitable Word in History

Companies sell you an LLM, label it AI, and let your imagination fill the word with AGI. A data-driven investigation into the gap between what the technology actually is, what it actually does, and what it is being valued as.

By Kai Tutor, Societal News | June 9, 2026 | ~10 min read

[Chart: the gap between AI company valuations and actual revenue and performance data in 2026, with OpenAI valued at nearly $900 billion while spending $1.69 for every dollar earned.]

Global AI capital expenditure is projected near $690 billion for 2026. OpenAI projects $74 billion in operating losses by 2028. The valuations require the marketing.

15-50%: Factual error rate across modern LLMs, depending on model and task

37%: Gap between lab benchmark scores and real-world enterprise deployment performance

<1%: Frontier AI score on ARC-AGI-3, a genuine novel reasoning test ordinary humans pass easily

$1.69: Amount OpenAI spends for every $1.00 of revenue it earns

$965B: Anthropic's peak pre-IPO valuation, on a $47B annualized revenue run-rate

$300B: Venture capital poured into AI in Q1 2026 alone

Before the Argument, the Definitions

Almost the entire marketing problem lives in the gap between three terms that get used interchangeably and should not be.

Artificial Intelligence is the broad umbrella, an academic field dating to 1956 covering any system that performs tasks we associate with intelligence. By this definition your email spam filter is AI. So is the algorithm that beats you at chess. The term is so wide it is nearly meaningless as a product descriptor, which is exactly why it is so useful for marketing.

A Large Language Model, or LLM, is the specific thing nearly every "AI" product today actually is. It is a statistical engine trained on enormous amounts of text to predict the most likely next token in a sequence. That is the whole mechanism. It is extraordinarily good at this, and the prediction process produces outputs that look like reasoning. But the underlying operation is pattern completion at massive scale, not understanding in any human sense.
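To make "predict the most likely next token" concrete, here is a deliberately tiny sketch. It is not how a production LLM is built: real models use neural networks trained on vast corpora, while this toy just counts which word follows which in a three-sentence text. The corpus and the function names `train_bigram` and `predict_next` are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, how often every other word follows it."""
    tokens = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    counts = follows.get(token.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical toy corpus; whitespace-separated words stand in for tokens.
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept ."
model = train_bigram(corpus)

print(predict_next(model, "the"))  # cat  ("cat" follows "the" 3 times out of 5)
print(predict_next(model, "dog"))  # None (never seen, so nothing to complete)
```

The toy has no idea what a cat is. It completes the pattern it has seen most often and has nothing to offer for a word it has never seen, which is the basic shape of the mechanism described above, minus the scale.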

Artificial General Intelligence, or AGI, is the thing being implied but not delivered. AGI means a system that can learn and perform any intellectual task a human can, transfer knowledge across domains, and handle novel problems it was never trained on. It does not exist. Nobody has built it. There is no clear technical path to it, and there is no agreement on how we would even measure it if it arrived.

The Marketing Sleight of Hand

Companies sell you an LLM, label it "AI," and let your imagination fill the word with AGI. The product is narrow. The connotation is general. The valuation is priced on the connotation. This same pattern, where official language is constructed to evoke something broader than what is actually being delivered, appears across many industries, as we documented in our investigation into how ghost jobs inflate perceptions of a healthy labor market.

What We Actually Have Right Now

We have very good text predictors with tool access. They write code, draft documents, summarize, translate, and hold fluent conversations. These are real, valuable capabilities, and dismissing them entirely is as dishonest as overselling them. But "can do impressive things" and "reliable enough to bet a trillion dollars on" are different claims, and the data on the second one is not flattering.

Start with hallucination. Across modern LLMs, factual error rates run from roughly 15% to over 50% depending on the model and task, with most clustering in the 20 to 27% range. In legal contexts the average sits near 18.7% and in scientific contexts near 16.9%. On high-complexity reasoning tasks the rate still exceeds 33%. In adversarial testing, citation fabrication has been measured as high as 94%. These are not edge cases. A system that invents facts a fifth to a half of the time is being sold as a knowledge engine.

The Lab-to-Deployment Gap Nobody Talks About

Enterprise agentic systems show roughly a 37% gap between their lab benchmark scores and real-world deployment performance. The same model, Claude Opus 4, scored 64.9% inside one agent framework and 57.6% inside another on an identical task set, a 7.3-point swing that came entirely from the wrapper software, not the model. The benchmarks that headlines quote are measuring something closer to ideal lab conditions than to what you get when you actually deploy.
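The framework swing is simple arithmetic, sketched below using the two scores quoted above. The relative figure is our own derived number rather than one reported by the benchmark, and the function name `framework_swing` is invented for illustration.

```python
def framework_swing(score_a, score_b):
    """Return (absolute swing in points, swing as a fraction of the higher score)."""
    high, low = max(score_a, score_b), min(score_a, score_b)
    points = high - low
    return points, points / high


# Scores quoted in the article for the same model in two agent frameworks.
swing_points, relative_drop = framework_swing(64.9, 57.6)

print(f"{swing_points:.1f} percentage points")     # 7.3 percentage points
print(f"{relative_drop:.1%} of the higher score")  # 11.2% of the higher score
```

Put differently, roughly a ninth of the better headline score depends on the scaffolding around the model rather than on the model itself.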

The most damning evidence comes from benchmarks built to mimic actual jobs. On TheAgentCompany, a test of consequential real-world workplace tasks, state-of-the-art agents fail the majority of tasks. Administrative and finance work, the bread-and-butter of most office jobs, produced the lowest scores, with many models completing none of the tasks successfully. The systems do best at software engineering, and the researchers explain why with refreshing bluntness: the entire field has been optimized around coding benchmarks because coding data is abundant and public. In other words, the models look smartest at the one thing they were most heavily trained and tested on, and that narrow success gets generalized into "it can do knowledge work."

On ARC-AGI-3, a test of genuine novel reasoning with no stated rules, frontier models score below 1% while ordinary humans handle it easily. That single data point is the clearest illustration of the AGI gap. The thing being marketed as approaching human-level general intelligence cannot do what a curious child does in minutes.

Why "AI" Gets Marketed This Hard

The valuations require the marketing, not the other way around. By mid-2026, Anthropic had raised $65 billion to reach a $965 billion valuation, briefly the highest ever for a pre-IPO company. OpenAI sat around $852 billion. SpaceX, after merging with xAI, was valued around $1.25 trillion. The combined fundraising across the three expected IPOs could exceed $200 billion. Global venture capital poured roughly $300 billion into AI in Q1 2026 alone, and industry-wide AI capital expenditure for 2026 is projected near $690 billion.

OpenAI's Financials in Plain Terms

OpenAI generated around $20 billion in annualized revenue while projecting a $14 billion loss for 2026, spending roughly $1.69 for every dollar of revenue. It expects to keep burning at a 57% rate through 2027 and projects operating losses near $74 billion in 2028 before some promised pivot to profitability in 2030. It has signed up to $1.4 trillion in compute commitments over eight years. Of its reported 900 million weekly users, only about 5% pay anything.
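The figures above can be cross-checked with a few lines of arithmetic. This is a rough sketch under stated assumptions: total spending is taken as revenue plus the projected loss, the annualized revenue is treated as comparable to the calendar-year loss projection, and the compute commitments are spread evenly across the eight years. None of these assumptions is confirmed by the company's reporting.

```python
# Figures as quoted in the article, in billions of US dollars.
REVENUE_BN = 20.0                # annualized revenue
PROJECTED_LOSS_BN = 14.0         # projected 2026 loss
COMPUTE_COMMITMENTS_BN = 1400.0  # $1.4 trillion in commitments
COMMITMENT_YEARS = 8
WEEKLY_USERS = 900_000_000
PAYING_SHARE_PERCENT = 5


def spend_per_revenue_dollar(revenue, loss):
    """Assumption: total spending equals revenue plus the reported loss."""
    return (revenue + loss) / revenue


spend_ratio = spend_per_revenue_dollar(REVENUE_BN, PROJECTED_LOSS_BN)
paying_users = WEEKLY_USERS * PAYING_SHARE_PERCENT // 100
# Assumption: commitments are spread evenly across the period.
avg_compute_per_year_bn = COMPUTE_COMMITMENTS_BN / COMMITMENT_YEARS

print(f"${spend_ratio:.2f} spent per $1.00 of revenue")  # $1.70 spent per $1.00 of revenue
print(f"{paying_users:,} paying users")                  # 45,000,000 paying users
print(f"${avg_compute_per_year_bn:.0f}B per year")       # $175B per year
```

The rounded inputs give $1.70, in line with the roughly $1.69 cited. The evenly spread compute figure, about $175 billion a year, is several times current annualized revenue.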

Anthropic's financials look meaningfully healthier in trajectory. It reported annualized revenue rising from $1 billion to $9 billion to $19 billion and then to a $47 billion run-rate, and forecasts cutting its cash burn to roughly one-third of revenue in 2026 and 9% by 2027. That is a genuinely different curve. But healthier-than-OpenAI is not the same as justifying a near-trillion-dollar valuation, and even bullish analysts note that all three major players are still losing more than they make.

This is the engine of the hype. When your valuation is 15 to 50 times revenue, and that revenue does not yet cover your costs, the gap between what you are worth and what you have proven has to be filled with a story. The story is that these systems are on a smooth path toward general intelligence that will automate enormous swaths of human labor, and that the company telling the story will own that future. The word "AI," with its science-fiction freight, does the storytelling work that the actual product cannot. The same dynamic, where official narratives are constructed to obscure the real numbers, connects to our examination of how economic data is built to tell the most comfortable version of the story.
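The revenue multiples can be checked directly from the valuations and revenue figures quoted in this article. This is a minimal sketch, assuming the multiple is simply valuation divided by annualized revenue. The combined SpaceX and xAI entity is left out because no revenue figure for it appears here.

```python
# Valuation and annualized revenue as quoted in the article, in billions of USD.
FIGURES_BN = {
    "Anthropic": {"valuation": 965.0, "revenue": 47.0},
    "OpenAI": {"valuation": 852.0, "revenue": 20.0},
}


def revenue_multiple(valuation, revenue):
    """Valuation expressed as a multiple of annualized revenue."""
    return valuation / revenue


multiples = {
    name: revenue_multiple(**figures) for name, figures in FIGURES_BN.items()
}

for name, multiple in multiples.items():
    print(f"{name}: {multiple:.1f}x revenue")
# Anthropic: 20.5x revenue
# OpenAI: 42.6x revenue
```

Both results fall inside the 15 to 50 times range cited above.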

The Commodity Problem

There is an additional crack forming under the whole structure. Cutting-edge model capability is becoming cheap and abundant, with open-source models now trailing the best proprietary ones by single-digit percentages on real benchmarks. On the hardware bug-repair benchmark HWE-Bench, the top open-source model came within 7.6 points of the best proprietary model. The trillion-dollar valuations assume these companies keep their pricing power and that customers have no alternative. If frontier-level capability becomes a commodity, the moat that justifies the valuation evaporates, and so does the premium people are paying.

So Is It a Scam?

"Scam" is too strong if it implies the technology is fake. It is not. The tools are real and genuinely useful, and people use them to ship code and get work done every day. That part is not in dispute.

The more precise charge is that the framing is a confidence game built on a deliberate category error. Three things are being conflated to create value that the product alone does not support. A narrow text predictor gets sold under a term so broad it implies almost anything, in order to evoke a capability that does not exist and may never arrive on the current technical path. The hallucination rates, the 37% lab-to-deployment gap, the majority failure on real office tasks, and the sub-1% scores on genuine novel reasoning are all publicly measurable. They sit in tension with valuations that only make sense if you believe the AGI story.

You can hold both truths at once. These are remarkable tools that have changed how a lot of people work. And the companies building them are, in many cases, valued at multiples that require a leap of faith the current evidence does not earn. The marketing exists to make that leap feel inevitable rather than optional. The data suggests it is very much optional, and that the most important word in the entire industry is doing far more financial work than the technology behind it.

Frequently Asked Questions

What is the difference between AI, LLM, and AGI?

Artificial Intelligence is a broad academic field dating to 1956 covering any system that performs tasks associated with intelligence. A Large Language Model is the specific thing nearly every AI product today actually is: a statistical engine trained on text to predict the most likely next token. Artificial General Intelligence is a system that can learn and perform any intellectual task a human can. AGI does not exist. The marketing sleight of hand is that companies sell you an LLM, label it AI, and let your imagination fill the word with AGI.

How often do AI language models hallucinate or make factual errors?

Across modern LLMs, factual error rates run from roughly 15% to over 50% depending on the model and task, with most clustering in the 20 to 27% range. In legal contexts the average sits near 18.7% and in scientific contexts near 16.9%. On high-complexity tasks the rate exceeds 33%. In adversarial testing, citation fabrication has been measured as high as 94%. Enterprise systems show roughly a 37% gap between lab benchmarks and real-world deployment performance.

Are AI companies like OpenAI and Anthropic profitable?

No. OpenAI generated around $20 billion in annualized revenue while projecting a $14 billion loss for 2026, spending $1.69 for every dollar of revenue. It projects operating losses near $74 billion in 2028. Anthropic reported a rising revenue trajectory with a $47 billion run-rate but still loses more than it makes. All three major players are currently valued at 15 to 50 times revenue while spending more than they earn.

Can AI actually do real office work and knowledge tasks?

Current AI systems fail the majority of tasks on TheAgentCompany, a benchmark mimicking actual workplace tasks. Administrative and finance work produced the lowest scores, with many models completing none of the tasks successfully. On ARC-AGI-3, a test of genuine novel reasoning, frontier models score below 1%, while ordinary humans handle it easily. The systems perform best at software engineering because that is what they were most heavily trained and benchmarked on.

Is AI a scam?

Scam is too strong if it implies the technology is fake. The tools are real and useful. The more precise charge is that the framing is a confidence game built on a deliberate category error: a narrow text predictor gets sold under a term implying almost anything, to evoke a capability that does not exist. The hallucination rates, the 37% lab-to-deployment gap, the majority failure on real office tasks, and the sub-1% scores on novel reasoning all sit in tension with valuations that only make sense if you believe the AGI story.

Kai Tutor | The Societal News Team
