What a $28B Neocloud Tells You About Your AI Token Costs

A neocloud is a GPU-only cloud built for AI compute, and when one reportedly clears $28B a year, it is a signal that the compute under your LLM bill is a large, fast-moving cost you should model as a variable, not a constant.

Jun 23, 2026 · 4 min read

Key takeaways

A neocloud is a cloud that rents raw GPU capacity for AI training and inference, separate from hyperscalers like AWS or Azure.
Latent Space's AINews argues SpaceX is already running a roughly $28B/yr neocloud, a marker of how much capital is flowing into AI compute.
That compute spend sits upstream of the $/1M-token price you pay: provider economics, not magic, set your unit cost.
For SaaS founders, the move is to treat token price as a scenario input and re-check margins whenever providers shift.

Why does a $28B neocloud matter for a SaaS founder?

Because the compute it sells is the raw material in your cost of goods sold. Every LLM call you make is GPU time someone bought, marked up, and resold to you as tokens. When billions of dollars pour into GPU capacity, it tells you two things at once: demand for inference is enormous, and the price you pay per token is a downstream effect of a supply chain you do not control. If you model token cost as a fixed line item, you are modeling the one number most likely to move.

What is a 'neocloud', exactly?

A neocloud is a cloud provider built specifically to rent GPUs for AI workloads, rather than general-purpose servers. Think of it as the difference between renting a fully fitted office (a hyperscaler) and renting raw industrial floor space wired for heavy machinery (a neocloud). The second is typically pitched as cheaper per unit of compute and is aimed at customers who want one thing: lots of accelerators, running hot. The reported $28B/yr figure, if accurate, would put this kind of operation in the same revenue conversation as mid-size public cloud businesses.

How does GPU spend become your $/1M-token price?

Roughly, in three layers. First, capital: the GPUs, networking, and data-center build. Second, operating cost: power, cooling, staff, and the financing on all that hardware. Third, the model provider's own margin when it wraps that compute in an API and charges you per token. By the time it reaches your invoice, your $/1M-token rate is a blend of all three plus the provider's pricing strategy. That is why the same prompt can cost very different amounts across providers, and why 'cheap GPUs upstream' does not automatically mean 'cheap tokens downstream'.
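To make the three layers concrete, here is a minimal sketch of how they could blend into a per-token price. Every number and name in it is a hypothetical assumption (the per-GPU-hour costs, the throughput, and the margin are made up for illustration), and real providers do not publish their pricing formulas, so treat it as a thinking aid rather than a description of any provider.

```python
def price_per_million_tokens(capital_per_gpu_hour, opex_per_gpu_hour,
                             tokens_per_gpu_hour, provider_margin):
    """Blend capital, operating cost, and margin into a $/1M-token price.

    provider_margin is a fraction of the final price (0.5 means 50%).
    All inputs are illustrative assumptions, not real provider data.
    """
    cost_per_gpu_hour = capital_per_gpu_hour + opex_per_gpu_hour
    cost_per_million = cost_per_gpu_hour * 1_000_000 / tokens_per_gpu_hour
    return cost_per_million / (1 - provider_margin)


def implied_margin(price_per_million, capital_per_gpu_hour,
                   opex_per_gpu_hour, tokens_per_gpu_hour):
    """Margin the provider keeps if it holds its price at a given level."""
    cost_per_gpu_hour = capital_per_gpu_hour + opex_per_gpu_hour
    cost_per_million = cost_per_gpu_hour * 1_000_000 / tokens_per_gpu_hour
    return (price_per_million - cost_per_million) / price_per_million


# Hypothetical baseline: $1.50 capital + $0.50 opex per GPU-hour,
# 2M tokens served per GPU-hour, 50% provider margin.
baseline_price = price_per_million_tokens(1.50, 0.50, 2_000_000, 0.50)

# Upstream capital cost halves, but the provider keeps its price unchanged:
# the saving lands in the provider's margin, not on your invoice.
margin_after_cheaper_gpus = implied_margin(baseline_price, 0.75, 0.50, 2_000_000)

print(f"baseline price: ${baseline_price:.2f} per 1M tokens")
print(f"provider margin after cheaper GPUs: {margin_after_cheaper_gpus:.1%}")
```

The second calculation is the point: under these assumptions the upstream cost drops while your price stays exactly where it was, because the provider decides how much of the saving to pass on.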

What should you actually do about it?

Stop treating token price as a constant. Build at least three scenarios for any AI feature: a base case at today's price, a downside if prices rise (capacity tightens or a provider hikes rates), and an upside if they fall (new chips, more competition). Then look at what each does to your gross margin per user. The reframe worth keeping: the neocloud headline is not trivia about SpaceX; it is a reminder that your COGS rides on someone else's capex cycle.

For example, say a feature uses 2M tokens per active user each month at an assumed $4 per 1M tokens. That is about $8 of COGS per user. Let token cost swing 30% in either direction and COGS ranges from about $5.60 to $10.40 per user. Hold your price flat, and the same feature can quietly move from comfortable to underwater. The numbers are illustrative, but the exercise is the point.
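The same exercise can be sketched in a few lines of Python. The 2M tokens, the $4 per 1M rate, and the 30% swing come from the example above; the $10 per-user price is a hypothetical assumption added only so there is a margin to compute, and the function name is made up for this sketch.

```python
def margin_scenarios(tokens_per_user, price_per_million, user_price, swing=0.30):
    """Return {scenario: (cogs_per_user, gross_margin)} for three token-price cases.

    user_price is what you charge per user per month (an assumed figure here).
    """
    multipliers = {"upside": 1 - swing, "base": 1.0, "downside": 1 + swing}
    results = {}
    for name, multiplier in multipliers.items():
        cogs = tokens_per_user / 1_000_000 * price_per_million * multiplier
        gross_margin = (user_price - cogs) / user_price
        results[name] = (round(cogs, 2), round(gross_margin, 3))
    return results


# 2M tokens per user per month at $4 per 1M tokens, sold at a hypothetical $10.
results = margin_scenarios(2_000_000, 4.00, user_price=10.00)

for name, (cogs, gross_margin) in results.items():
    print(f"{name:>8}: COGS ${cogs:.2f}, gross margin {gross_margin:.0%}")
```

At the assumed $10 price, the base case leaves a 20% gross margin, the upside 44%, and the downside goes negative. Swap in your own usage and price to see where your break-even sits.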

Takeaway: model token cost as a moving input, and your pricing survives the next provider shake-up. You can build best, base, and worst token-cost cases per feature in Calcaas in a few minutes.

Frequently asked questions

What is a neocloud?

A neocloud is a cloud provider focused on renting GPU compute for AI training and inference, as opposed to general-purpose hyperscalers. They optimize for raw accelerator capacity and typically serve AI labs and companies running large inference workloads.

Does cheaper GPU compute mean cheaper LLM tokens for me?

Not directly. Your token price also reflects the provider's operating costs, financing, and margin. Lower upstream GPU costs can push token prices down over time, but providers set their own rates, so verify current pricing rather than assume it.

How should I model AI token costs in my pricing?

Treat token price as a variable. Build base, downside, and upside scenarios, multiply by your real token usage per user, and check the gross margin each one produces. Re-run it whenever a provider changes pricing or you switch models.

Why do token prices differ so much between providers?

Because each provider has different hardware, utilization, and margin targets, and prices each model tier differently. Comparing the $/1M input and output token rates across providers for your actual workload is the only reliable way to know which is cheapest for you.
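A rough sketch of that comparison, assuming two made-up providers with made-up rates (neither the names nor the prices refer to anything real). It shows why the mix of input and output tokens in your workload matters as much as the headline rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """Hypothetical $/1M-token rates for one provider."""
    input_per_million: float
    output_per_million: float


def monthly_cost(rate, input_tokens, output_tokens):
    """Dollar cost of a workload at the given input and output rates."""
    total = (input_tokens * rate.input_per_million
             + output_tokens * rate.output_per_million)
    return total / 1_000_000


def cheapest(rates, input_tokens, output_tokens):
    """Name of the provider with the lowest cost for this workload."""
    return min(rates, key=lambda name: monthly_cost(rates[name],
                                                    input_tokens, output_tokens))


# Illustrative rates only: A is cheap on input, B is cheap on output.
rates = {
    "provider_a": Rate(input_per_million=1.00, output_per_million=8.00),
    "provider_b": Rate(input_per_million=3.00, output_per_million=4.00),
}

# Input-heavy workload (e.g. long documents, short answers).
input_heavy_winner = cheapest(rates, input_tokens=9_000_000, output_tokens=1_000_000)

# Output-heavy workload (e.g. short prompts, long generations).
output_heavy_winner = cheapest(rates, input_tokens=1_000_000, output_tokens=9_000_000)

print("input-heavy workload:", input_heavy_winner)
print("output-heavy workload:", output_heavy_winner)
```

With these assumed rates the winner flips depending on the workload, which is why the comparison has to use your own token mix rather than a single blended price.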
