PyTorch Profiler on Spyre
Stack: torch-spyre (new, Inductor-based).
torch.profiler.profile is the entry point for per-op timing on Spyre.
Two modes are available:
CPU-only: no extra install; measures host-side Python and torch.compile activity.
CPU + PrivateUse1: measures CPU and Spyre-side kernel activity; requires the kineto-spyre PyTorch wheel.
CPU-only (no extra install)
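import torch
from torch.profiler import profile, ProfilerActivity

# `model` and `x_spyre` are assumed to be defined already
# (a module and an input tensor on the Spyre device).
compiled = torch.compile(model, backend="spyre")

with profile(activities=[ProfilerActivity.CPU]) as prof:
    output = compiled(x_spyre)

# Aggregate per-op statistics, sorted by total CPU time
print(prof.key_averages().table(sort_by="cpu_time_total"))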
This captures CPU wall-clock time for every ATen call and every Dynamo / Inductor stage.
CPU + PrivateUse1
Install a matching kineto-spyre wheel for your
PyTorch version (check the releases page for
the current combination). Example URL for PyTorch 2.10.0:
uv pip install --no-deps --force-reinstall \
https://github.com/IBM/kineto-spyre/releases/download/torch-2.10.0.aiu.kineto.1.1.1/torch-2.10.0+aiu.kineto.1.1.1-cp312-cp312-linux_x86_64.whl
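Before installing, it can help to confirm that a wheel's version and Python tag match your environment. The sketch below is a hypothetical helper, not part of kineto-spyre: it relies only on the standard wheel filename layout (name-version[-build]-python-abi-platform) and assumes a CPython interpreter when building the local tag.

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename or URL into (base_version, local_label, python_tag).

    Hypothetical helper; relies only on the standard wheel naming layout:
    name-version[-build]-python-abi-platform.whl
    """
    stem = filename.rsplit("/", 1)[-1]  # drop any URL or directory prefix
    if not stem.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {stem}")
    parts = stem[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename layout: {stem}")
    version = parts[1]
    python_tag = parts[-3]
    # A local version label, if present, follows the "+" sign.
    base_version, _, local_label = version.partition("+")
    return base_version, local_label, python_tag


def current_python_tag():
    """Return the CPython tag of the running interpreter, e.g. 'cp312'.

    Assumes CPython; other interpreters use different tag prefixes.
    """
    return f"cp{sys.version_info.major}{sys.version_info.minor}"
```

For the example wheel above, parse_wheel_filename returns the PyTorch base version 2.10.0 and the Python tag cp312. Compare these against your installed PyTorch version and current_python_tag() before running the install command.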
Then profile with ProfilerActivity.PrivateUse1:
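import torch
from torch.profiler import profile, ProfilerActivity

# `compiled` and `x_device` are assumed to exist already
# (the compiled model and an input tensor on the Spyre device).
with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.PrivateUse1],
    record_shapes=True,
    profile_memory=True,
    on_trace_ready=torch.profiler.tensorboard_trace_handler("./logs/mymodel"),
) as prof:
    # .cpu() copies the result back to the host inside the profiled region
    compiled_result = compiled(x_device).cpu()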
Print aggregates
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10).replace("CUDA", "AIU"))
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10).replace("CUDA", "AIU"))
The .replace("CUDA", "AIU") is a cosmetic workaround — the profiler’s
internal column category is still named after CUDA; native renaming is
on the roadmap.
Export a trace for viewers
prof.export_chrome_trace("spyre_trace.json")
See Trace analysis for viewing.
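For a quick look without a viewer, the exported file can also be summarized directly. It is JSON in the Chrome trace event format, where complete events carry "ph": "X" and a "dur" field in microseconds. The following is a minimal sketch using only the standard library; the function names load_events and summarize_events are hypothetical helpers, not part of torch.profiler.

```python
import json
from collections import defaultdict


def load_events(path):
    """Load trace events from a Chrome trace JSON file."""
    with open(path) as f:
        data = json.load(f)
    # Chrome traces are either {"traceEvents": [...]} or a bare list of events.
    return data["traceEvents"] if isinstance(data, dict) else data


def summarize_events(events, top=10):
    """Return (name, total_duration_us, count) for the `top` events by total duration."""
    totals = defaultdict(lambda: [0.0, 0])
    for evt in events:
        # Only complete events ("ph" == "X") carry a duration.
        if evt.get("ph") != "X":
            continue
        entry = totals[evt.get("name", "<unnamed>")]
        entry[0] += evt.get("dur", 0)
        entry[1] += 1
    ranked = sorted(totals.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(name, total, count) for name, (total, count) in ranked[:top]]
```

A typical call would be summarize_events(load_events("spyre_trace.json")), which lists the events with the largest accumulated duration first.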
Advanced features
Full reference lives in the upstream PyTorch profiler documentation:
record_function: annotate named spans
schedule: skip warmup, sample a bounded window
on_trace_ready: stream to TensorBoard-compatible JSON
with_stack: include file and line for Python ops
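As a rough illustration of how record_function, schedule, and on_trace_ready fit together, here is a minimal CPU-only sketch. The model, input, span name, and handler are stand-ins chosen for illustration, and the Spyre backend is deliberately left out so the example runs on any PyTorch install. On Spyre you would presumably add ProfilerActivity.PrivateUse1 and profile the compiled model instead.

```python
import torch
from torch.profiler import ProfilerActivity, profile, record_function, schedule

# Stand-in model and input (assumptions for illustration only).
model = torch.nn.Linear(64, 64)
x = torch.randn(8, 64)

span_names = []    # event names seen in the recorded window
windows_seen = []  # one entry per completed profiling window


def on_ready(prof):
    """Called once per completed active window; collects aggregated event names."""
    windows_seen.append(prof.step_num)
    span_names.extend(evt.key for evt in prof.key_averages())


with profile(
    activities=[ProfilerActivity.CPU],
    # Skip 1 step, warm up for 1 step, then record 2 steps, once.
    schedule=schedule(wait=1, warmup=1, active=2, repeat=1),
    on_trace_ready=on_ready,
) as prof:
    for _ in range(6):
        # Named span that appears alongside the ATen ops in the trace.
        with record_function("forward_pass"):
            model(x)
        # Tell the profiler that one iteration has finished.
        prof.step()
```

Only the two active steps are recorded, which keeps the trace small and leaves the first iterations out of the measurements. Replacing on_ready with torch.profiler.tensorboard_trace_handler writes each window to disk instead of collecting names in memory.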
Known issues (from torch-spyre-docs)
Multi-AIU communication profiling is not supported yet.
See also
Trace analysis: viewers for the traces
Device monitoring: aiu-smi telemetry alongside torch.profiler