Last updated: June 19, 2026

Azure Provisioned Throughput (PTU) Calculator

Estimate how many Provisioned Throughput Units you need for your Microsoft Foundry model deployment. Enter your workload parameters below and the calculator determines PTU requirements for Global, Data Zone, and Regional deployment types using the latest Microsoft documentation.

What Are Provisioned Throughput Units (PTUs)?

Provisioned Throughput Units (PTUs) are generic units of model processing capacity that you purchase to power provisioned deployments on Microsoft Foundry. Unlike pay-as-you-go (standard) deployments, where you pay per token, PTU deployments give you a reserved block of compute capacity that stays allocated exclusively to your workloads, and that you pay for, whether you use it or not.

PTU quota is managed per subscription and per region. Each quota defines the maximum number of PTUs that can be assigned to deployments in that subscription and region. Importantly, quota does not guarantee capacity — capacity is allocated at deployment time and held as long as the deployment exists. If capacity is unavailable when you create a deployment, the deployment will fail.
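The difference between quota and capacity can be sketched in a few lines of Python. This is an illustration only: the `PtuQuota` class, its fields, and the subscription and region values are hypothetical and are not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass
class PtuQuota:
    """Hypothetical record of PTU quota for one subscription and one region."""
    subscription_id: str
    region: str
    limit: int      # maximum PTUs that can be assigned to deployments
    assigned: int   # PTUs already assigned to existing deployments

    def headroom(self) -> int:
        """PTUs that can still be assigned before the quota limit is reached."""
        return self.limit - self.assigned

    def fits(self, requested_ptus: int) -> bool:
        """True if the request fits within quota.

        This says nothing about capacity: a deployment that fits within
        quota can still fail if no capacity is available at creation time.
        """
        return 0 < requested_ptus <= self.headroom()


# Illustrative values only.
quota = PtuQuota("sub-example", "eastus2", limit=300, assigned=200)
print(quota.headroom())  # 100
print(quota.fits(100))   # True
print(quota.fits(150))   # False
```

A passing `fits` check is necessary but not sufficient: capacity is only confirmed when the deployment itself is created.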

PTU reservations can be shared across a growing portfolio of models sold directly by Azure, including Azure OpenAI models (GPT-5.4, GPT-5.2, GPT-5.1, GPT-5, GPT-4.1, o3, o4-mini, and more), Azure DeepSeek models (DeepSeek-R1, DeepSeek-V3-0324, DeepSeek-R1-0528), Meta Llama (Llama-3.3-70B-Instruct), and Fireworks models (FW-GPT-OSS-120B, FW-Kimi-K2.5, FW-DeepSeek-V3.2, FW-MiniMax-M2.5). For example, if you have a 500 PTU reservation and use 300 PTUs for Azure OpenAI models, you can use the remaining 200 PTUs for a DeepSeek-R1 deployment, which automatically shares the reservation discount.
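The 500 PTU example works out as follows in a short Python sketch. The function name `reservation_coverage` and the deployment labels are hypothetical, and the sketch only does the arithmetic; it does not model how Azure applies discounts internally.

```python
def reservation_coverage(reserved_ptus, deployed_ptus):
    """Split deployed PTUs into those covered by the reservation and the rest.

    deployed_ptus maps a deployment label to its PTU count.
    """
    total = sum(deployed_ptus.values())
    covered = min(total, reserved_ptus)
    return {
        "covered": covered,
        "uncovered": total - covered,
        "unused_reservation": reserved_ptus - covered,
    }


# 300 PTUs on Azure OpenAI models leave 200 PTUs of the reservation unused.
deployments = {"azure-openai-models": 300}
print(reservation_coverage(500, deployments))
# {'covered': 300, 'uncovered': 0, 'unused_reservation': 200}

# Adding a 200 PTU DeepSeek-R1 deployment uses up the rest of the reservation.
deployments["DeepSeek-R1"] = 200
print(reservation_coverage(500, deployments))
# {'covered': 500, 'uncovered': 0, 'unused_reservation': 0}
```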

When to Use Provisioned Throughput

Choose provisioned throughput deployments when your application has well-defined, predictable throughput requirements — typically production workloads with known traffic patterns. Key scenarios include:

  • Latency-sensitive applications: PTU deployments deliver consistent model processing times because capacity is pre-allocated, unlike standard deployments, which may experience variable latency under load.
  • High-throughput production workloads: If you process a large, steady volume of requests, PTUs often provide cost savings compared to per-token pricing.
  • Predictable capacity needs: You can estimate your requests per minute (RPM), input tokens per request, and output tokens per request with reasonable accuracy, and enter them into this calculator.
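As a rough illustration of how RPM, input tokens, and output tokens can turn into a PTU count, here is a minimal Python sketch. Every model-specific number in it (`input_tpm_per_ptu`, `output_weight`, `minimum_ptus`, `increment`) is a hypothetical placeholder, and the formula is an assumption about the general shape of the calculation, not the calculator's actual method. Use the values published for your model and deployment type.

```python
import math


def estimate_ptus(rpm, input_tokens, output_tokens, *,
                  input_tpm_per_ptu, output_weight,
                  minimum_ptus, increment):
    """Estimate PTUs for a steady workload (sketch under stated assumptions).

    rpm            : peak requests per minute
    input_tokens   : average input tokens per request
    output_tokens  : average output tokens per request
    output_weight  : assumed number of input tokens one output token counts as
    """
    # Convert the workload into input-equivalent tokens per minute.
    weighted_tpm = rpm * (input_tokens + output_weight * output_tokens)
    raw_ptus = weighted_tpm / input_tpm_per_ptu

    # Round up to the smallest deployable size: the minimum plus whole increments.
    if raw_ptus <= minimum_ptus:
        return minimum_ptus
    steps = math.ceil((raw_ptus - minimum_ptus) / increment)
    return minimum_ptus + steps * increment


# Hypothetical model parameters; only the minimum and increment differ.
small_steps = estimate_ptus(100, 1000, 200, input_tpm_per_ptu=2500,
                            output_weight=4, minimum_ptus=15, increment=5)
large_steps = estimate_ptus(100, 1000, 200, input_tpm_per_ptu=2500,
                            output_weight=4, minimum_ptus=50, increment=50)
print(small_steps)  # 75
print(large_steps)  # 100
```

The same workload needs 72 raw PTUs in both calls; the difference in the result comes only from the assumed minimum size and increment.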

For exploratory workloads, variable traffic, or low-volume usage, standard (pay-as-you-go) deployments are usually more cost-effective.
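The trade-off between the two billing models can be sketched as a simple monthly cost comparison. All prices, the 75 PTU deployment size, and the 730-hour month below are hypothetical placeholders chosen for illustration; substitute current list prices and any reservation discount that applies to you.

```python
HOURS_PER_MONTH = 730  # assumed average month length in hours


def monthly_standard_cost(avg_rpm, input_tokens, output_tokens,
                          price_in_per_1m, price_out_per_1m):
    """Pay-as-you-go cost: scales with the tokens actually processed."""
    requests = avg_rpm * 60 * HOURS_PER_MONTH
    per_request = (input_tokens * price_in_per_1m
                   + output_tokens * price_out_per_1m) / 1_000_000
    return requests * per_request


def monthly_ptu_cost(ptus, price_per_ptu_hour):
    """Provisioned cost: fixed by deployment size, independent of usage."""
    return ptus * price_per_ptu_hour * HOURS_PER_MONTH


def cheaper_option(avg_rpm, ptus):
    # Hypothetical prices: $2 and $8 per 1M input/output tokens, $0.25 per PTU-hour.
    standard = monthly_standard_cost(avg_rpm, 1000, 200, 2.0, 8.0)
    provisioned = monthly_ptu_cost(ptus, 0.25)
    return "provisioned" if provisioned < standard else "standard"


# A deployment sized for a 100 RPM peak (75 PTUs, hypothetical).
print(cheaper_option(avg_rpm=100, ptus=75))  # steady traffic at peak: provisioned
print(cheaper_option(avg_rpm=25, ptus=75))   # bursty traffic, low average: standard
```

Under these assumed prices, the provisioned deployment only wins when average traffic stays close to the peak it was sized for.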

Deployment Types Explained

When creating a provisioned deployment in Microsoft Foundry, you choose from three deployment types:

  • Global Provisioned Throughput (GlobalProvisionedManaged) — Routes traffic across all Azure regions for the highest availability and typically the lowest minimum PTU requirement. Best for workloads without strict data residency constraints.
  • Data Zone Provisioned Throughput (DataZoneProvisionedManaged) — Keeps all data processing within a geographic data zone (e.g., EU or US). Balances availability with data residency compliance.
  • Regional Provisioned Throughput (ProvisionedManaged) — Restricts all traffic to a single Azure region. Required when regulatory or compliance needs demand that data stays in one specific region. Typically has the highest minimum PTU deployment size.

New models are typically onboarded with Global Provisioned first, with Data Zone and Regional options following later. PTU quota and any reservations must match the region and deployment type (Global, Data Zone, or Regional) you intend to use.
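The choice among the three deployment types can be expressed as a small lookup. In this Python sketch the SKU names come from the list above, while the `Residency` enum and `choose_sku` function are hypothetical helpers, and the mapping reduces the decision to data residency alone.

```python
from enum import Enum


class Residency(Enum):
    """Hypothetical classification of a workload's data residency needs."""
    NONE = "none"                    # no strict residency constraint
    DATA_ZONE = "data_zone"          # must stay within a data zone (e.g., EU or US)
    SINGLE_REGION = "single_region"  # must stay within one Azure region


# Deployment type (SKU) names as listed above.
SKU_BY_RESIDENCY = {
    Residency.NONE: "GlobalProvisionedManaged",
    Residency.DATA_ZONE: "DataZoneProvisionedManaged",
    Residency.SINGLE_REGION: "ProvisionedManaged",
}


def choose_sku(residency: Residency) -> str:
    """Return the provisioned deployment type matching the residency need."""
    return SKU_BY_RESIDENCY[residency]


print(choose_sku(Residency.NONE))           # GlobalProvisionedManaged
print(choose_sku(Residency.DATA_ZONE))      # DataZoneProvisionedManaged
print(choose_sku(Residency.SINGLE_REGION))  # ProvisionedManaged
```

Whichever type the lookup returns, quota and any reservation still need to match that deployment type and region.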
