Estimate how many Provisioned Throughput Units you need for your Microsoft Foundry model deployment. Enter your workload parameters below and the calculator determines PTU requirements for Global, Data Zone, and Regional deployment types using the latest Microsoft documentation.
Provisioned Throughput Units (PTUs) are generic units of model processing capacity that you purchase to power provisioned deployments on Microsoft Foundry. Unlike standard (pay-as-you-go) deployments, where you pay per token, PTU deployments give you a reserved block of compute capacity that is allocated exclusively to your workloads and stays allocated whether you use it or not.
PTU quota is managed per subscription and per region. Each quota defines the maximum number of PTUs that can be assigned to deployments in that subscription and region. Importantly, quota does not guarantee capacity — capacity is allocated at deployment time and held as long as the deployment exists. If capacity is unavailable when you create a deployment, the deployment will fail.
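The distinction between quota and capacity can be sketched as two separate checks. This is a minimal illustration, not an Azure API: `RegionState`, `check_deployment`, and the `capacity_available` figure are hypothetical names and values introduced here, since real capacity is only known to the service at deployment time.

```python
from dataclasses import dataclass


@dataclass
class RegionState:
    """Hypothetical snapshot of one subscription + region (not an Azure type)."""
    quota_limit: int         # maximum PTUs assignable to deployments here
    quota_used: int          # PTUs already assigned to existing deployments
    capacity_available: int  # assumed: PTUs the service could allocate right now


def check_deployment(requested_ptus: int, state: RegionState) -> str:
    """Return why a provisioned deployment would or would not succeed."""
    if requested_ptus <= 0:
        raise ValueError("requested_ptus must be positive")
    # Check 1: quota is an upper bound on what you may assign.
    if state.quota_used + requested_ptus > state.quota_limit:
        return "blocked: insufficient quota"
    # Check 2: quota does not guarantee capacity; it is allocated at deployment time.
    if requested_ptus > state.capacity_available:
        return "fails: quota is sufficient but capacity is unavailable"
    return "ok"


# Quota headroom is 200 PTUs, but only 50 PTUs of capacity are assumed free.
print(check_deployment(100, RegionState(quota_limit=300, quota_used=100, capacity_available=50)))
```

In this sketch the request passes the quota check and still fails, which is the case the paragraph above warns about.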
PTU reservations can be shared across a growing portfolio of models sold directly by Azure, including Azure OpenAI models (GPT-5.4, GPT-5.2, GPT-5.1, GPT-5, GPT-4.1, o3, o4-mini, and more), Azure DeepSeek models (DeepSeek-R1, DeepSeek-V3-0324, DeepSeek-R1-0528), Meta Llama (Llama-3.3-70B-Instruct), and Fireworks models (FW-GPT-OSS-120B, FW-Kimi-K2.5, FW-DeepSeek-V3.2, FW-MiniMax-M2.5). For example, if you have a 500 PTU reservation and use 300 PTUs for Azure OpenAI models, you can use the remaining 200 PTUs for DeepSeek-R1, and that deployment automatically shares the reservation discount.
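The reservation example above is simple arithmetic, sketched here for clarity. The function name and the dictionary keys are illustrative labels, not Azure identifiers.

```python
def remaining_reservation(reservation_ptus: int, deployed: dict[str, int]) -> int:
    """PTUs of a shared reservation not yet consumed by existing deployments."""
    used = sum(deployed.values())
    # Never report a negative remainder if deployments exceed the reservation.
    return max(reservation_ptus - used, 0)


deployed = {"azure-openai": 300}  # PTUs already used by Azure OpenAI deployments
print(remaining_reservation(500, deployed))  # 200, available for e.g. DeepSeek-R1
```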
Choose provisioned throughput deployments when your application has well-defined, predictable throughput requirements, typically production workloads with known traffic patterns.
For exploratory workloads, variable traffic, or low-volume usage, standard (pay-as-you-go) deployments are usually more cost-effective.
When creating a provisioned deployment in Microsoft Foundry, you choose from three deployment types:
Global Provisioned (GlobalProvisionedManaged): Routes traffic across all Azure regions for the highest availability and typically the lowest minimum PTU requirement. Best for workloads without strict data residency constraints.
Data Zone Provisioned (DataZoneProvisionedManaged): Keeps all data processing within a geographic data zone (e.g., EU or US). Balances availability with data residency compliance.
Regional Provisioned (ProvisionedManaged): Restricts all traffic to a single Azure region. Required when regulatory or compliance needs demand that data stays in one specific region. Typically has the highest minimum PTU deployment size.
New models are typically onboarded with Global Provisioned first, with Data Zone and Regional options following later. PTU quota and any reservations must match the region and deployment type (Global, Data Zone, or Regional) you intend to use.
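The choice among the three deployment types can be summarized as a lookup keyed on the data residency requirement. This is a rough sketch: the SKU names come from the text above, while the residency labels (`"none"`, `"data_zone"`, `"single_region"`) and the function name are assumptions made for illustration.

```python
# SKU names as listed above, keyed by deployment type.
SKU_BY_TYPE = {
    "global": "GlobalProvisionedManaged",
    "data_zone": "DataZoneProvisionedManaged",
    "regional": "ProvisionedManaged",
}

# Hypothetical residency labels mapped to the least restrictive type that satisfies them.
TYPE_BY_RESIDENCY = {
    "none": "global",             # no strict data residency constraints
    "data_zone": "data_zone",     # processing must stay within a data zone (e.g., EU or US)
    "single_region": "regional",  # data must stay in one specific region
}


def choose_sku(residency: str) -> str:
    """Return the provisioned SKU name for a given residency requirement."""
    try:
        return SKU_BY_TYPE[TYPE_BY_RESIDENCY[residency]]
    except KeyError:
        raise ValueError(f"unknown residency requirement: {residency!r}") from None


print(choose_sku("data_zone"))  # DataZoneProvisionedManaged
```

Whichever type you pick, the PTU quota and any reservation must match that same deployment type and region.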