Static math · runs locally

CUDA Occupancy Calculator

Estimate theoretical occupancy from block size, registers per thread, and shared memory per block. The calculator surfaces which resource binds first and how nearby block sizes compare.

Kernel and architecture


Occupancy results


Per-resource utilization

Derived

Block-size sweep

How it works

Occupancy is the ratio of a kernel's active warps to the maximum number of warps that can be resident on a single SM. Each SM has caps on threads, warps, blocks, registers, and shared memory. Each cap allows some number of resident blocks, and the number of resident blocks per SM is the minimum of those.
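The definition above can be sketched in a few lines of Python. This is a minimal illustration and not the calculator's actual source: the function names and the per-constraint numbers are hypothetical, chosen only to show how the minimum picks the limiter.

```python
def resident_blocks(candidates):
    """Return (blocks_per_sm, limiter) from per-constraint block counts.

    On a tie, the constraint listed first in the dict is reported.
    """
    limiter = min(candidates, key=candidates.get)
    return candidates[limiter], limiter


def theoretical_occupancy(blocks_per_sm, warps_per_block, max_warps_per_sm):
    """Active warps divided by the SM's maximum resident warps."""
    return blocks_per_sm * warps_per_block / max_warps_per_sm


# Hypothetical per-constraint results for a 256-thread block (8 warps)
# on an SM assumed to hold at most 64 resident warps.
candidates = {
    "threads": 8,
    "warps": 8,
    "blocks": 32,
    "registers": 5,
    "shared memory": 6,
}
blocks, limiter = resident_blocks(candidates)
occupancy = theoretical_occupancy(blocks, warps_per_block=8, max_warps_per_sm=64)
print(f"{blocks} blocks/SM, limited by {limiter}, occupancy {occupancy:.1%}")
# prints: 5 blocks/SM, limited by registers, occupancy 62.5%
```

With these made-up inputs, registers allow the fewest blocks, so 5 blocks of 8 warps give 40 active warps out of 64.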

The math here follows the NVIDIA CUDA occupancy spreadsheet. Registers are allocated per warp, rounded up to the architecture's register allocation unit. Shared memory per block (static plus dynamic) is rounded up to the shared-memory allocation unit. For each constraint, the candidate count of resident blocks is the cap divided by the per-block usage, rounded down. The smallest candidate wins, and its constraint is the limiter.
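The two rounding rules can be sketched as follows. This is a simplified illustration, not the spreadsheet's exact formula: the helper names are hypothetical, the default caps and allocation units are placeholder assumptions that should be replaced with the values for the target architecture, and any further per-architecture adjustments are left out.

```python
def round_up(value, unit):
    """Round value up to the next multiple of unit."""
    return -(-value // unit) * unit


def blocks_by_registers(regs_per_thread, warps_per_block,
                        regs_per_sm=65536,    # assumed register file size
                        reg_alloc_unit=256,   # assumed allocation unit
                        warp_size=32):
    """Blocks that fit in the register file (expects regs_per_thread >= 1)."""
    # Registers are allocated per warp, rounded up to the allocation unit.
    regs_per_warp = round_up(regs_per_thread * warp_size, reg_alloc_unit)
    warps_that_fit = regs_per_sm // regs_per_warp
    return warps_that_fit // warps_per_block


def blocks_by_shared_memory(static_bytes, dynamic_bytes,
                            smem_per_sm=167936,    # assumed SM capacity
                            smem_alloc_unit=128):  # assumed allocation unit
    """Blocks that fit in shared memory, or None if the kernel uses none."""
    smem_per_block = round_up(static_bytes + dynamic_bytes, smem_alloc_unit)
    if smem_per_block == 0:
        return None  # this constraint does not apply
    return smem_per_sm // smem_per_block


# 40 registers/thread: 40 * 32 = 1280 registers per warp (already a multiple
# of 256), so 65536 // 1280 = 51 warps fit, i.e. 51 // 8 = 6 blocks of 8 warps.
print(blocks_by_registers(regs_per_thread=40, warps_per_block=8))  # 6

# 4000 + 16000 bytes rounds up to 20096, and 167936 // 20096 = 8 blocks.
print(blocks_by_shared_memory(static_bytes=4000, dynamic_bytes=16000))  # 8
```

Both functions round usage up and capacity down, so the result never overstates how many blocks fit.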

What "high occupancy" actually means

High theoretical occupancy is a useful diagnostic but not always a goal worth pursuing for its own sake. Modern GPUs can hide latency at much lower occupancy if the kernel has enough instruction-level parallelism. Use this tool to understand which resource is binding so you can decide whether reducing it is worth the engineering effort.

Edge cases and notes

FAQ

Why does my actual occupancy look different in Nsight?
Nsight reports achieved occupancy, which factors in launch bound mismatches, divergence, and runtime conditions. This calculator reports theoretical occupancy, an upper bound that ignores those effects.
Where do these architecture numbers come from?
The CUDA Programming Guide's "Compute Capabilities" table and the NVIDIA Occupancy Calculator. They are versioned in the source.
Why is my occupancy 0?
You probably exceeded a per-block resource cap. The issue list above the bars explains which one. If shared memory per block is larger than the SM cap, no block can fit at all.
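A per-block check of this kind could look like the sketch below. It is an assumption about how such a check might be written, not the calculator's code: `CAPS`, `per_block_issues`, and the cap values are hypothetical placeholders.

```python
# Hypothetical caps; substitute the values for the target architecture.
CAPS = {
    "max_threads_per_block": 1024,
    "max_regs_per_thread": 255,
    "smem_per_sm": 167936,
}


def per_block_issues(threads_per_block, regs_per_thread, smem_per_block, caps=CAPS):
    """List every per-block cap that the launch configuration exceeds."""
    issues = []
    if threads_per_block > caps["max_threads_per_block"]:
        issues.append(
            f"threads per block {threads_per_block} exceeds {caps['max_threads_per_block']}"
        )
    if regs_per_thread > caps["max_regs_per_thread"]:
        issues.append(
            f"registers per thread {regs_per_thread} exceeds {caps['max_regs_per_thread']}"
        )
    if smem_per_block > caps["smem_per_sm"]:
        issues.append(
            f"shared memory per block {smem_per_block} exceeds {caps['smem_per_sm']}"
        )
    return issues


# A block asking for more shared memory than the assumed SM cap cannot fit,
# so occupancy is 0 regardless of the other resources.
issues = per_block_issues(threads_per_block=256, regs_per_thread=32, smem_per_block=200000)
occupancy = 0.0 if issues else None  # None: proceed with the normal calculation
print(issues)
```

Running the checks before the main calculation lets the tool report the specific cap that was exceeded instead of only showing 0.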