LLM Economics

What a $28B Neocloud Tells You About Your AI Token Costs

A neocloud is a cloud built to rent GPU capacity for AI compute. When one reportedly clears $28B a year, it signals that the compute under your LLM bill is a large, fast-moving cost you should model as a variable, not a constant.

Jun 23, 2026 · 4 min read

Key takeaways

  • A neocloud is a cloud that rents raw GPU capacity for AI training and inference, separate from hyperscalers like AWS or Azure.
  • Latent Space's AINews argues SpaceX is already running a roughly $28B/yr neocloud, a marker of how much capital is flowing into AI compute.
  • That compute spend sits upstream of the $/1M-token price you pay: provider economics, not magic, set your unit cost.
  • For SaaS founders, the move is to treat token price as a scenario input and re-check margins whenever providers shift.

Why does a $28B neocloud matter for a SaaS founder?

Because the compute it sells is the raw material in your cost of goods sold. Every LLM call you make is GPU time someone bought, marked up, and resold to you as tokens. When billions of dollars pour into GPU capacity, it tells you two things at once: demand for inference is enormous, and the price you pay per token is a downstream effect of a supply chain you do not control. If you model token cost as a fixed line item, you are modeling the one number most likely to move.

What is a 'neocloud', exactly?

A neocloud is a cloud provider built specifically to rent GPUs for AI workloads, rather than general-purpose servers. Think of it as the difference between renting a fully fitted office (a hyperscaler) and renting raw industrial floor space wired for heavy machinery (a neocloud). The second is typically pitched as cheaper per unit of compute and aimed at customers who want one thing: lots of accelerators, running hot. The reported $28B/yr figure, if accurate, would put this kind of operation in the same revenue conversation as mid-size public cloud businesses.

How does GPU spend become your $/1M-token price?

Roughly, in three layers. First, capital: the GPUs, networking, and data-center build. Second, operating cost: power, cooling, staff, and the financing on all that hardware. Third, the model provider's own margin when it wraps that compute in an API and charges you per token. By the time it reaches your invoice, your $/1M-token rate is a blend of all three plus the provider's pricing strategy. That is why the same prompt can cost very different amounts across providers, and why 'cheap GPUs upstream' does not automatically mean 'cheap tokens downstream'.

What should you actually do about it?

Stop treating token price as a constant. Build at least three scenarios for any AI feature: a base case at today's price, a downside if prices rise (capacity tightens or a provider hikes rates), and an upside if they fall (new chips, more competition). Then look at what each does to your gross margin per user. The reframe worth keeping: the neocloud headline is not trivia about SpaceX; it is a reminder that your COGS rides on someone else's capex cycle.

For example, say a feature uses 2M tokens per active user each month at an assumed $4 per 1M tokens. That is $8 of COGS per user. Hold your price flat and let token cost swing 30% in either direction, and COGS ranges from $5.60 to $10.40 per user, so the same feature can quietly move from comfortable to underwater. The numbers are illustrative, but the exercise is the point.
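
The arithmetic above can be sketched in a few lines of Python. This is a minimal sketch, not a pricing tool: the 2M tokens, the $4 per 1M tokens, and the 30% swing come from the example, while the $10 per-user price and every function and variable name are assumptions added for illustration.

```python
# Scenario sketch for one AI feature. Usage, token price, and swing come from
# the example in the text; ASSUMED_PRICE_PER_USER is a hypothetical value.

TOKENS_PER_USER = 2_000_000      # tokens per active user per month
BASE_TOKEN_PRICE = 4.00          # USD per 1M tokens (assumed in the text)
SWING = 0.30                     # 30% move in either direction
ASSUMED_PRICE_PER_USER = 10.00   # hypothetical USD per user per month


def cogs_per_user(tokens_per_user: int, price_per_million: float) -> float:
    """Token cost per user: millions of tokens times the $/1M-token rate."""
    return tokens_per_user / 1_000_000 * price_per_million


def gross_margin(price_per_user: float, cogs: float) -> float:
    """Gross margin as a fraction of the price charged to the user."""
    return (price_per_user - cogs) / price_per_user


def build_scenarios() -> dict[str, tuple[float, float]]:
    """Return {scenario name: (COGS per user, gross margin)}."""
    token_prices = {
        "upside": BASE_TOKEN_PRICE * (1 - SWING),    # token prices fall
        "base": BASE_TOKEN_PRICE,                    # today's price
        "downside": BASE_TOKEN_PRICE * (1 + SWING),  # token prices rise
    }
    results = {}
    for name, token_price in token_prices.items():
        cogs = cogs_per_user(TOKENS_PER_USER, token_price)
        results[name] = (cogs, gross_margin(ASSUMED_PRICE_PER_USER, cogs))
    return results


if __name__ == "__main__":
    for name, (cogs, margin) in build_scenarios().items():
        print(f"{name:>8}: COGS ${cogs:.2f}/user, gross margin {margin:.0%}")
```

At the assumed $10 price, gross margin runs from about 44% in the upside case to 20% at base and about -4% in the downside case. Swap in your own price and usage to see where your feature crosses zero.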

Takeaway: model token cost as a moving input, and your pricing survives the next provider shake-up. You can build best, base, and worst token-cost cases per feature in Calcaas in a few minutes.

Frequently asked questions

What is a neocloud?

A neocloud is a cloud provider focused on renting GPU compute for AI training and inference, as opposed to general-purpose hyperscalers. Neoclouds optimize for raw accelerator capacity and typically serve AI labs and companies running large inference workloads.

Does cheaper GPU compute mean cheaper LLM tokens for me?

Not directly. Your token price also reflects the provider's operating costs, financing, and margin. Lower upstream GPU costs can push token prices down over time, but providers set their own rates, so verify current pricing rather than assume it.

How should I model AI token costs in my pricing?

Treat token price as a variable. Build base, downside, and upside scenarios, multiply by your real token usage per user, and check the gross margin each one produces. Re-run it whenever a provider changes pricing or you switch models.

Why do token prices differ so much between providers?

Because each provider has different hardware, utilization, and margin targets, and prices each model tier differently. Comparing the $/1M input and output token rates across providers for your actual workload is the only reliable way to know which is cheapest for you.
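
A comparison like that can be sketched as follows. The provider names, rates, and workloads here are hypothetical placeholders, not real price quotes; substitute current published rates and your own measured token counts.

```python
# Compare monthly token cost across providers for a given workload mix.
# All rates and workloads below are made-up values for illustration only.

# Hypothetical (input rate, output rate) in USD per 1M tokens.
HYPOTHETICAL_RATES = {
    "provider_a": (3.00, 15.00),
    "provider_b": (5.00, 10.00),
}


def workload_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Cost of a workload when input and output tokens are priced separately."""
    return (input_tokens / 1_000_000 * input_rate
            + output_tokens / 1_000_000 * output_rate)


def costs_by_provider(input_tokens: int, output_tokens: int) -> dict[str, float]:
    """Return {provider name: cost in USD} for the same workload."""
    return {
        name: workload_cost(input_tokens, output_tokens, in_rate, out_rate)
        for name, (in_rate, out_rate) in HYPOTHETICAL_RATES.items()
    }


def cheapest(input_tokens: int, output_tokens: int) -> str:
    """Name of the provider with the lowest cost for this workload."""
    costs = costs_by_provider(input_tokens, output_tokens)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    workloads = {
        "input-heavy": (1_500_000, 500_000),   # e.g. long prompts, short answers
        "output-heavy": (500_000, 1_500_000),  # e.g. short prompts, long answers
    }
    for label, (tokens_in, tokens_out) in workloads.items():
        print(label, costs_by_provider(tokens_in, tokens_out),
              "cheapest:", cheapest(tokens_in, tokens_out))
```

With these made-up rates, provider_a is cheaper for the input-heavy workload and provider_b for the output-heavy one. The ranking depends on your input/output mix, not just the headline rate.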
