Transparent pricing

How full-capability model pricing works

Use this page to understand how full-capability model access is billed before you send high-volume traffic or top up your balance for production workloads.

Your request type determines the cost

Text models are usually billed by input, output, and cached tokens. Image models are commonly billed by image count, size, and quality tier. Video, audio, and other media tasks are usually billed by duration, resolution, or completed task. Check the model catalog for the current unit price.

Three common billing methods

Billing type	Typical requests	How cost is calculated	Watch closely
Token billing	Chat, reasoning, embeddings	Input and output are counted separately; some models price cache usage separately	Context, output length, cache hits
Per image / request	Image generation and editing	Image count combined with size, quality tier, or request type	Count, resolution, quality, edits
Media task	Video, audio, asynchronous generation	Duration, resolution, or successfully completed task	Duration, quality, status, retries
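
As a rough illustration of the three billing methods in the table, the sketch below estimates cost under each one. Every price, key, and function name here is a hypothetical placeholder, not a value from the model catalog. It also assumes that input, output, and cached tokens are reported as three separate counts and that a media task is billed only when it completes successfully; confirm both against the billing docs for your model.

```python
# Hypothetical cost estimators for the three billing methods.
# All rates are made-up placeholders; look up real unit prices in the model catalog.

def token_cost(input_tokens, output_tokens, cached_tokens, prices):
    """Token billing: each token class has its own price per million tokens.

    Assumes the three counts are disjoint (cached tokens are not also
    counted as input tokens).
    """
    total = (
        input_tokens * prices["input"]
        + output_tokens * prices["output"]
        + cached_tokens * prices["cached"]
    )
    return total / 1_000_000


def image_cost(count, size, quality, price_table):
    """Per-image billing: unit price depends on size and quality tier."""
    return count * price_table[(size, quality)]


def media_cost(duration_seconds, price_per_second, completed):
    """Media-task billing by duration.

    Assumes failed tasks are not charged; whether retries are billed
    is provider-specific and not covered here.
    """
    if not completed:
        return 0.0
    return duration_seconds * price_per_second


if __name__ == "__main__":
    example_token_prices = {"input": 2.0, "output": 8.0, "cached": 0.5}
    example_image_prices = {
        ("1024x1024", "standard"): 0.02,
        ("1024x1024", "high"): 0.04,
    }
    print(token_cost(1_000_000, 500_000, 200_000, example_token_prices))
    print(image_cost(4, "1024x1024", "high", example_image_prices))
    print(media_cost(30, 0.05, completed=True))
```

Running the estimate with your expected context size, output length, and cache hit rate before a launch makes it easier to see which of the "Watch closely" factors dominates the bill.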

Pricing guidance

Review pricing guidance before sending production traffic to a new model.
The same model can be priced differently under different service plans or billing rules.
Some models are billed per request or per generated asset instead of token usage.

Pricing FAQ

Why can full-capability models cost more?

Full-capability access usually prioritizes stronger reasoning, a broader feature set, and more complete model behavior, so its usage cost can be higher than that of lighter or reduced alternatives.

How do I know what a model will cost me?

Use this page as a general guide first, then check the live price shown in the product before heavy usage, because prices differ by model and request type.

What should I check before sending large traffic or topping up?

Before high-volume usage, confirm the current price, your balance or quota, and whether the target model supports the request type you plan to use.
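
The three checks in this answer can be sketched as a small pre-flight function. This is only an illustration: `ModelInfo`, `preflight`, and the example values are hypothetical names invented here, and in practice the price and supported request types would come from the model catalog rather than being hard-coded.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical record of what you looked up for a model."""
    name: str
    supported_request_types: FrozenSet[str]
    confirmed_unit_price: Optional[float]  # None means "not checked yet"


def preflight(model: ModelInfo, request_type: str,
              estimated_cost: float, balance: float) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if model.confirmed_unit_price is None:
        problems.append("current price has not been confirmed")
    if request_type not in model.supported_request_types:
        problems.append(f"{model.name} does not support '{request_type}' requests")
    if estimated_cost > balance:
        problems.append("estimated cost exceeds available balance or quota")
    return problems


if __name__ == "__main__":
    model = ModelInfo("example-model", frozenset({"chat"}), 2.0)
    for issue in preflight(model, "image", estimated_cost=50.0, balance=10.0):
        print("blocked:", issue)
```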

For teams, not just self-serve

If pricing is tied to workflow risk, the next step is to contact the enterprise team

If your team is comparing access stability, full-capability model behavior, usage records, and support expectations, a self-serve pricing table is only the first step. Use the enterprise page when you need to align on workload, budget, and rollout risk.

You run models inside Cursor, Cline, or an internal product.

You need clearer budget control, request logs, and a safer rollout path before scale.

You want to discuss maintained access resources, troubleshooting, or procurement workflow.
