GPT-5.6 Pricing Explained: Sol vs Terra vs Luna Cost Breakdown

GPT-5.6 ships in three tiers: Sol at $5/$30, Terra at $2.50/$15, and Luna at $1/$6 per million input/output tokens. The cost decision is now about routing each task to the cheapest tier that clears your quality bar.

Jun 27, 2026 · 4 min read

Key takeaways

Three tiers, one generation: Sol ($5 in / $30 out), Terra ($2.50 / $15), Luna ($1 / $6) per 1M tokens.
Terra is positioned as competitive with GPT-5.5 at about half the cost, resetting the default for everyday work.
The Sol-to-Luna spread is 5x on both input and output, so tier routing affects margin more than family choice.
New caching rules: cache writes cost 1.25x input, cache reads keep a 90% discount, with a 30-minute minimum cache life.
A cached prefix pays off after about two reads, making caching a real lever for repetitive prompts.

What does GPT-5.6 cost?

OpenAI previewed the GPT-5.6 series with three durable tiers. Per 1M tokens:

Sol (flagship): $5 input, $30 output
Terra (balanced): $2.50 input, $15 output
Luna (fast, low cost): $1 input, $6 output

Output is six times the input price across all three tiers, which matters because most real workloads turn output-heavy once you add reasoning and tool calls.
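To make the output-heavy point concrete, here is a minimal sketch that prices one task at each tier using the per-1M-token rates above. The function name `task_cost` and the 2,000-input / 1,000-output token mix are assumptions chosen for illustration, not figures from OpenAI.

```python
# Per-1M-token prices quoted in this article (USD).
PRICES = {
    "sol": {"input": 5.00, "output": 30.00},
    "terra": {"input": 2.50, "output": 15.00},
    "luna": {"input": 1.00, "output": 6.00},
}


def task_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one task at the given tier (no caching)."""
    p = PRICES[tier]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


def output_share(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the fraction of the task cost that comes from output tokens."""
    output_cost = output_tokens * PRICES[tier]["output"] / 1_000_000
    return output_cost / task_cost(tier, input_tokens, output_tokens)


# Hypothetical task: 2,000 input tokens and 1,000 output tokens.
for tier in PRICES:
    cost = task_cost(tier, 2_000, 1_000)
    share = output_share(tier, 2_000, 1_000)
    print(f"{tier}: ${cost:.4f} per task, {share:.0%} of it from output")
```

Because the output-to-input price ratio is the same at every tier, the output share of a given token mix is identical across Sol, Terra, and Luna. Only the absolute cost changes.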

Which tier should you use?

Terra is the headline. OpenAI positions it as competitive with the prior GPT-5.5 while costing about half. For a lot of everyday product work, that makes Terra the new default, with Luna for high-volume or latency-sensitive paths and Sol reserved for the hardest reasoning.

The bigger point is structural: the spread from Luna to Sol is 5x on both input and output. A product that routes each task to the cheapest tier that still passes quality will beat a product that hardcodes one model, regardless of which family it chose. Tier routing is now a first-class pricing decision.
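A routing rule of this kind can be sketched in a few lines. This is a simplified illustration, not a production router: the task names, eval pass rates, and the 0.90 quality bar are all hypothetical, and `pick_tier` is a name invented here. The only input taken from the article is the cost ordering of the tiers.

```python
from typing import Dict, Optional

# Tiers ordered from cheapest to most expensive, per the article's price list.
TIERS_CHEAPEST_FIRST = ["luna", "terra", "sol"]


def pick_tier(quality_by_tier: Dict[str, float], quality_bar: float) -> Optional[str]:
    """Return the cheapest tier whose measured quality meets the bar, else None."""
    for tier in TIERS_CHEAPEST_FIRST:
        if quality_by_tier.get(tier, 0.0) >= quality_bar:
            return tier
    return None


# Hypothetical eval pass rates per task type; replace with your own measurements.
evals = {
    "summarize_ticket": {"luna": 0.96, "terra": 0.98, "sol": 0.99},
    "multi_step_refactor": {"luna": 0.61, "terra": 0.88, "sol": 0.95},
}

routes = {task: pick_tier(scores, quality_bar=0.90) for task, scores in evals.items()}
print(routes)
```

With these made-up scores, the simple task lands on Luna and the harder one escalates to Sol, because Terra falls just short of the bar. The quality numbers have to come from your own evals; the price list only supplies the ordering.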

How does the new prompt caching math change costs?

GPT-5.6 introduces more predictable caching: explicit cache breakpoints, a 30-minute minimum cache life, cache writes billed at 1.25x the uncached input rate, and cache reads keeping the 90% cached-input discount.

Here is the break-even in illustrative units. Say a fixed prompt prefix would otherwise cost 1 unit of input each time you send it. Without caching, N sends cost N units. With caching, you pay 1.25 once to write, then 0.1 per read. Caching wins when:

1.25 + 0.1N <= N

Rearranging gives 0.9N >= 1.25, so N >= about 1.4. In plain terms, if a cached prefix is used twice or more inside the 30-minute window, caching is already cheaper. The estimate is slightly conservative, because it charges a read on all N sends in addition to the write. For chat histories, system prompts, and RAG context that repeat constantly, that threshold is trivial to clear.
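The same break-even can be checked numerically. The sketch below follows the article's simplified model exactly (one write at 1.25 units plus 0.1 units per send); the function and constant names are illustrative, and real billing may differ in detail.

```python
import math

WRITE_MULTIPLIER = 1.25  # cache write, relative to the uncached input rate
READ_MULTIPLIER = 0.10   # cache read after the 90% discount


def uncached_cost(n_sends: int) -> float:
    """Cost in prefix units of sending the prefix n times with no caching."""
    return float(n_sends)


def cached_cost(n_sends: int) -> float:
    """Article's model: one write, plus a read charge on each of the n sends."""
    return WRITE_MULTIPLIER + READ_MULTIPLIER * n_sends


# Solve 1.25 + 0.1N <= N for N.
break_even = WRITE_MULTIPLIER / (1 - READ_MULTIPLIER)
first_winning_n = math.ceil(break_even)
print(f"break-even N = {break_even:.2f}, caching wins from N = {first_winning_n}")

for n in range(1, 6):
    print(f"N={n}: uncached={uncached_cost(n):.2f}, cached={cached_cost(n):.2f}")
```

At a single send, caching costs more (1.35 vs 1.00 units). From the second send onward it is cheaper, and the gap widens by 0.9 units with every additional send.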

What about throughput?

OpenAI also flagged GPT-5.6 Sol on Cerebras at up to 750 tokens per second in July. Faster output does not change the per-token price, but it cuts user-perceived latency and lets more work fit into time-boxed agent runs, which indirectly affects how many tokens a task consumes.

How do you turn this into a margin decision?

Pricing tables do not tell you your cost. Your token mix does. To compare GPT-5.6 tiers against your current provider:

1. Pull your average input and output tokens per task.
2. Apply each tier's input and output price for a blended cost per task.
3. Layer in caching for any repeated prefix using the break-even above.
4. Multiply by tasks per active user to get cost per user, then compare to your price.
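The four steps above can be sketched as a small calculation. Every workload number here is hypothetical (3,000 input tokens, 800 output tokens, half the input served from cache, 200 tasks per user per month, a $20 plan), and `cost_per_user` is a name made up for this example. As a simplifying assumption, the sketch bills cached input at the read rate and ignores the one-time write surcharge, which is reasonable only when a prefix is reused many times.

```python
# Per-1M-token prices quoted in this article (USD).
PRICES = {
    "sol": {"input": 5.00, "output": 30.00},
    "terra": {"input": 2.50, "output": 15.00},
    "luna": {"input": 1.00, "output": 6.00},
}
CACHE_READ_DISCOUNT = 0.90  # cached input is billed at 10% of the input rate


def cost_per_user(
    tier: str,
    input_tokens: int,
    output_tokens: int,
    cached_share: float,
    tasks_per_user: int,
) -> float:
    """Blended monthly USD cost per active user for one tier.

    cached_share is the fraction of input tokens served as cache reads.
    Assumption: the cache write surcharge is ignored (heavily reused prefix).
    """
    p = PRICES[tier]
    effective_input_rate = p["input"] * (1 - cached_share * CACHE_READ_DISCOUNT)
    per_task = (input_tokens * effective_input_rate + output_tokens * p["output"]) / 1_000_000
    return per_task * tasks_per_user


# Hypothetical workload and price point.
PLAN_PRICE = 20.00
for tier in PRICES:
    cost = cost_per_user(tier, 3_000, 800, cached_share=0.5, tasks_per_user=200)
    margin = (PLAN_PRICE - cost) / PLAN_PRICE
    print(f"{tier}: ${cost:.2f} per user, gross margin {margin:.0%}")
```

Swapping in your own token averages and plan price shows which tier leaves an acceptable margin for the same workload.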

Takeaway: GPT-5.6 makes tiering and caching the main cost levers, not the brand on the API. Model your token mix across Sol, Terra, and Luna in Calcaas before you commit a default.

Frequently asked questions

How much does GPT-5.6 cost per token?

Per 1M tokens, Sol is $5 input and $30 output, Terra is $2.50 input and $15 output, and Luna is $1 input and $6 output. Output is priced at six times the input rate across all three tiers.

What is the difference between Sol, Terra, and Luna?

Sol is the flagship for the hardest reasoning, Terra is the balanced everyday tier positioned as competitive with GPT-5.5 at about half the cost, and Luna is the fast, lowest-cost tier. They are capability tiers within the same GPT-5.6 generation.

Is GPT-5.6 prompt caching worth it?

Usually yes for repeated content. Cache writes cost 1.25x the input rate and reads get a 90% discount, so a cached prefix becomes cheaper after roughly two reads within the 30-minute cache window. System prompts and RAG context clear that bar easily.

How do I compare GPT-5.6 to my current model's cost?

Take your real average input and output tokens per task, multiply by each model's per-token prices for a blended cost, add caching where prompts repeat, then scale to cost per active user. Comparing blended cost per user, not headline price, shows the true margin impact.
