Inference Economics: Why a $13B Valuation Is a Bet on the Token Spread
When an inference provider raises at a $13B valuation, investors are buying the spread between what a token costs to serve and what you are charged. That spread is why your API price is not a cost floor.
Jun 21, 2026 · 4 min read
Key takeaways
Baseten is reportedly raising $1.5B at a $13B valuation, months after its last mega-round.
The 'spread' is the gap between an LLM provider's serving cost and the price you pay per token.
That spread is the entire reason inference is a venture category, not just a line item.
Your API price is a margin-bearing number that can compress (prices fall) or reprice (you move to a pricier tier).
Model your unit economics as a range (today, -30%, +50%), not a single price.
What is the inference 'spread'?
When you call an LLM API, you see one number: dollars per million tokens. Underneath sits the serving cost the provider actually pays, which is driven by GPU time, utilization, batching efficiency, and memory bandwidth. The gap between the two is the inference margin, or the spread. It is why inference providers can raise billions: investors are betting that the spread stays wide at scale, across many models.
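As a rough illustration of that definition, the Python sketch below computes the spread and the provider's gross margin from a list price and a serving cost. The function name and the dollar figures are hypothetical; real serving costs are not public, and these numbers do not describe any particular provider.

```python
def inference_spread(price_per_mtok: float, serving_cost_per_mtok: float) -> tuple[float, float]:
    """Return (spread in dollars per million tokens, gross margin as a fraction of price)."""
    if price_per_mtok <= 0:
        raise ValueError("price_per_mtok must be positive")
    spread = price_per_mtok - serving_cost_per_mtok
    return spread, spread / price_per_mtok


# Hypothetical inputs: a $2.00 list price and an assumed $0.80 serving cost per 1M tokens.
spread, margin = inference_spread(price_per_mtok=2.00, serving_cost_per_mtok=0.80)
print(f"spread: ${spread:.2f} per 1M tokens, gross margin: {margin:.0%}")
```

As a builder you only ever observe the first argument; the second is the provider's private number, which is why the price can move without any visible change in the product you are buying.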
Why does the spread matter for your costs?
Because it means the price you pay is not a cost floor. It is a commercial number sitting on top of a falling serving cost, and commercial numbers move. Most builders anchor their cost model to today's API price as if it were physics. It is not. Treating one price as permanent is the most common mistake in AI unit economics.
How can the token price move?
Scenario 1: the spread compresses
Competition and better GPU utilization push prices down. That is good for your COGS, but if you set your own price assuming today's rates, a competitor can pass the savings on to customers before you do.
Scenario 2: the spread reprices
A provider repositions, deprecates a cheap model, or moves you to a pricier tier. Your COGS jumps with no change in your product. With no pricing buffer, your margin absorbs the whole hit.
How do you model AI cost when the price keeps moving?
Stop quoting a single number. Instead:
Model a range, not a price. Run your unit economics at today's rate, at -30%, and at +50%. If your margin survives only one of those, you have a bet, not a pricing model.
Track cost per outcome, not per token. Feature cost per user = tokens per action x actions per user x price per token. Token price is one input; usage is the other.
Re-run on every model swap. A new model resets both the price and the tokens per task, sometimes in opposite directions.
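The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a full pricing model: the `Feature` fields, the function names, and every number in the example are hypothetical, and it uses a single blended price per million tokens rather than separate input and output rates.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    tokens_per_action: int     # average tokens consumed by one action
    actions_per_user: float    # actions per user per month
    revenue_per_user: float    # dollars per user per month


def cost_per_user(feature: Feature, price_per_mtok: float) -> float:
    """Monthly token cost per user: tokens per action x actions per user x price per token."""
    tokens = feature.tokens_per_action * feature.actions_per_user
    return tokens * price_per_mtok / 1_000_000


def margin_range(feature: Feature, price_today: float,
                 shifts: tuple[float, ...] = (0.0, -0.30, 0.50)) -> dict[float, float]:
    """Gross margin (as a fraction of revenue) at today's price and at each price shift."""
    margins = {}
    for shift in shifts:
        cost = cost_per_user(feature, price_today * (1 + shift))
        margins[shift] = (feature.revenue_per_user - cost) / feature.revenue_per_user
    return margins


# Hypothetical feature: 4,000 tokens per action, 500 actions a month, $10 of revenue per user.
feature = Feature(tokens_per_action=4_000, actions_per_user=500, revenue_per_user=10.0)
for shift, margin in margin_range(feature, price_today=2.00).items():
    print(f"price {shift:+.0%}: gross margin {margin:.0%}")
```

After a model swap, update both `price_today` and `tokens_per_action` before re-running, since a new model can change the two in opposite directions.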
The takeaway
Baseten's investors are betting the spread stays wide. As a builder, you are exposed to the same spread from the other side: every dollar of inference margin a provider captures is a dollar in your COGS. You cannot control the spread, but you can model your sensitivity to it. You can run your token costs at today's rate, a price cut, and a price hike side by side in Calcaas.
Frequently asked questions
What is the inference spread in AI pricing?
The inference spread is the gap between a provider's cost to serve a token (GPU time, utilization, batching) and the price it charges you per token. It is the provider's gross margin on inference.
Is the API price I pay a fixed cost floor?
No. API pricing is a commercial decision layered on top of a falling serving cost. It can drop as competition increases or rise if a provider repositions or moves you to a different tier.
How should I model AI costs if token prices keep changing?
Model a range rather than a single price. Run your margin at today's rate, a discount (for example -30%), and an increase (for example +50%), and confirm it survives all three before committing to a price.
Why are investors paying so much for inference startups?
Because they are betting the spread between serving cost and token price stays wide and durable at scale. A large valuation like $13B prices in years of sustained inference margin.