Custom AI Chips Will Reshape Token Prices: What Builders Should Do Now

Custom silicon from OpenAI, Google, Apple and SpaceX is built to cut inference cost, but that does not guarantee cheaper API prices for you, so model your margins across price scenarios instead of betting on one rate.

Jun 27, 2026 · 4 min read

Key takeaways

OpenAI revealed Jalapeno, a custom inference chip built with Broadcom, joining Google, Apple and SpaceX in building their own silicon.
The motive is control and lower inference cost, plus reducing single-supplier dependence on Nvidia.
Lower internal cost does not automatically become lower API prices: labs can keep the margin.
A memory-chip crunch is pushing some hardware costs up at the same time, so the net direction is uncertain.
The defense is scenario modeling: stress-test your unit economics against token-price swings rather than hardcoding one number.

Why is everyone suddenly building their own AI chips?

OpenAI just unveiled Jalapeno, its first custom inference chip, built with Broadcom. It joins Google's TPUs, Apple's in-house silicon, and SpaceX's own chip efforts. The pattern is clear: the biggest AI players want to end their single-supplier dependence on Nvidia.

The stated logic is control and efficiency: hardware tuned to a specific workload, the kind of step-change Apple unlocked when it dropped Intel. For inference, custom chips promise lower cost per token at scale.

Does custom silicon mean cheaper tokens for you?

Here is the uncomfortable part. Lower internal cost for a lab is not the same as a lower price on your invoice. Custom silicon can widen the gap between what it costs a provider to serve a token and what they charge for it. That spread is margin, and a provider under pressure to fund data centers and chip programs has every reason to keep some of it.

So the popular narrative, 'custom chips will make AI cheap for everyone,' is only half true. It will make AI cheaper to produce. Whether it gets cheaper to buy depends on competition, not silicon.

Will costs even go down in the near term?

Not necessarily. At the same time, a memory-chip crunch is pushing some component costs up, and providers are scrambling for data-center capacity. Custom inference chips are a multi-year play, while supply pressures are immediate. The honest answer is that token prices could fall, hold, or rise depending on the segment and the quarter.

That uncertainty is exactly why a single fixed cost assumption is dangerous.

What should builders actually do?

Stop modeling one token price. Model a range. Practical version:

1. Take your current blended cost per task.
2. Re-run your margins at token prices down 30% and up 30%.
3. Check whether your pricing survives the higher-cost case. If a 30% cost increase erases your margin, your pricing, not the chip market, is the risk.
4. Re-run whenever a provider ships new hardware or changes prices.
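
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive model: the function names, the $0.10 price per task, and the $0.04 token cost per task are hypothetical, and it assumes your token spend scales linearly with the provider's token price.

```python
def gross_margin(price_per_task: float, cost_per_task: float) -> float:
    """Gross margin as a fraction of the price charged per task."""
    return (price_per_task - cost_per_task) / price_per_task


def margin_scenarios(
    price_per_task: float,
    token_cost_per_task: float,
    other_cost_per_task: float = 0.0,
    shifts: tuple = (-0.30, 0.0, 0.30),
) -> dict:
    """Recompute gross margin with token cost shifted by each fraction in `shifts`.

    Assumption: only the token portion of cost moves with provider prices;
    `other_cost_per_task` (hosting, support, etc.) stays fixed.
    """
    results = {}
    for shift in shifts:
        cost = token_cost_per_task * (1 + shift) + other_cost_per_task
        results[shift] = gross_margin(price_per_task, cost)
    return results


if __name__ == "__main__":
    # Hypothetical inputs: charge $0.10 per task, spend $0.04 per task on tokens.
    for shift, margin in margin_scenarios(0.10, 0.04).items():
        print(f"token price {shift:+.0%}: gross margin {margin:.1%}")
```

With these made-up inputs, margin moves from about 72% in the cheaper case to about 48% in the more expensive one. Replace the inputs with your own blended cost per task and read the three results side by side.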

This turns a macro story you cannot control into a number you can.
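
One way to make that number concrete is the largest token-price increase your pricing can absorb before gross margin drops to a floor you choose. The sketch below is an assumption-laden illustration: `break_even_token_shift` and its parameters are hypothetical names, the inputs are invented, and it again assumes token cost scales linearly with provider prices.

```python
def break_even_token_shift(
    price_per_task: float,
    token_cost_per_task: float,
    other_cost_per_task: float = 0.0,
    target_margin: float = 0.0,
) -> float:
    """Fractional token-price change at which gross margin equals `target_margin`.

    A result of 0.25 means a 25% token-price increase takes you to the floor.
    A negative result means you are already below the floor at current prices.
    """
    if token_cost_per_task <= 0:
        raise ValueError("token_cost_per_task must be positive")
    # Highest total cost per task that still leaves the target margin.
    max_total_cost = price_per_task * (1 - target_margin)
    # Portion of that budget available for tokens after fixed per-task costs.
    max_token_cost = max_total_cost - other_cost_per_task
    return max_token_cost / token_cost_per_task - 1


if __name__ == "__main__":
    # Hypothetical inputs: $0.10 price, $0.04 token cost, 50% margin floor.
    shift = break_even_token_shift(0.10, 0.04, target_margin=0.50)
    print(f"Token prices can rise about {shift:.0%} before margin hits the floor")
```

If the result is smaller than the price swing you consider plausible, the exposure sits in your pricing rather than in the chip market.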

Takeaway: Custom chips change who controls inference cost, not whether you should plan for it. Model your margins across price scenarios in Calcaas so the next hardware shift is a tweak, not a fire drill.

Frequently asked questions

What is OpenAI's Jalapeno chip?

Jalapeno is OpenAI's first custom inference chip, developed with Broadcom. It is designed to run AI workloads more efficiently than general-purpose hardware and to reduce OpenAI's reliance on a single chip supplier.

Why are tech companies building custom AI chips instead of buying Nvidia?

To gain control, tune hardware to their specific workloads, and lower inference cost at scale, while reducing single-supplier risk. It mirrors the gains Apple saw when it moved from Intel to its own silicon.

Will custom AI chips make API prices cheaper?

Not automatically. Custom silicon can lower a provider's cost to serve a token, but the price you pay depends on competition and the provider's margin strategy. Cheaper to produce does not always mean cheaper to buy.

How should startups plan for changing token prices?

Model a range rather than a single rate. Re-run your unit economics with token prices both lower and higher, confirm your pricing survives a cost increase, and update the model whenever providers change hardware or prices.
