Profiling
Stack: torch-spyre (new, Inductor-based).
Scope: performance — why is it slow? For correctness questions (why is the result wrong?) see Debugging.
Torch-Spyre provides tooling to measure the performance of PyTorch workloads running on the Spyre accelerator. The full design of the planned toolkit is in RFC 0601 — Spyre Profiling Toolkit.
The in-tree torch_spyre.profiler package is currently a scaffold: torch_spyre.profiler.is_available() returns False, and there is no public API yet. Profiling today goes through torch.profiler plus the external integrations described on this page (kineto-spyre, aiu-smi, aiu-trace-analyzer); the in-tree API will be populated as RFC 0601 lands.
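Because is_available() currently returns False, code that should start using the in-tree API once it lands can gate on that check. Below is a minimal sketch that assumes only the is_available() function named above. The helper name in_tree_profiler_available is hypothetical, and nothing else in torch_spyre.profiler is assumed to exist.

```python
import importlib


def in_tree_profiler_available() -> bool:
    """Report whether torch_spyre.profiler can be used (hypothetical helper).

    Returns False when torch_spyre is not installed, when the profiler module
    lacks is_available(), or when is_available() itself returns False (the
    expected result while the package is a scaffold).
    """
    try:
        spyre_profiler = importlib.import_module("torch_spyre.profiler")
    except ImportError:  # also covers ModuleNotFoundError
        return False
    # Only is_available() is assumed; tolerate builds where it is missing.
    is_available = getattr(spyre_profiler, "is_available", None)
    return callable(is_available) and bool(is_available())


if __name__ == "__main__":
    if in_tree_profiler_available():
        print("torch_spyre.profiler reports itself available")
    else:
        print("fall back to torch.profiler and the external integrations")
```

Gating on the function instead of on a version number means the same code keeps working when the scaffold is filled in.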
What can be profiled today
| Capability | Status | Where |
|---|---|---|
| Compiler pipeline logs | Available | |
| CPU-side timing with | Available | |
| Device telemetry (power, temperature, bandwidth) | Available (IBM-internal distribution; public release tracked in #1335) | |
| Device-side kernel timing via | Preview (requires | |
| Trace post-processing (aiu-trace-analyzer) | Available, known gaps | |
| | Planned | |
| Scratchpad utilization metrics | Planned | |
| IR-instrumentation-based fine-grained profiler | Planned | |
Toolkit layers
| Layer | Tool | Granularity |
|---|---|---|
| Application / PyTorch | | Kernel-level |
| Compiler frontend | Inductor logging | Pass-level |
| Compiler backend | IR instrumentation (planned) | Intra-kernel |
| Runtime | | Kernel + memory |
| Device / HW | | Device-level telemetry |
| Post-processing | | Derived metrics |
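The Post-processing layer above turns raw traces into derived metrics. This page does not describe aiu-trace-analyzer's interface, so the sketch below is a generic, stdlib-only stand-in rather than that tool. It reads a Chrome-format trace, such as the file torch.profiler writes via prof.export_chrome_trace(path), and ranks complete events by total time. The function names are hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Generic sketch over the Chrome trace event format; NOT aiu-trace-analyzer.


def load_trace_events(path: str) -> List[dict]:
    """Load events from a Chrome trace file (object form or bare-array form)."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # The object form keeps events under "traceEvents"; the array form is the list itself.
    return data["traceEvents"] if isinstance(data, dict) else data


def top_events_by_time(events: Iterable[dict], top_n: int = 10) -> List[Tuple[str, float, int]]:
    """Rank complete events ("ph": "X") by summed duration.

    Returns (name, total_ms, count) tuples, longest total first. "dur" is in
    microseconds. Nested events are counted inclusively, so parent and child
    totals overlap and should not be added together.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for ev in events:
        if ev.get("ph") != "X":
            continue  # skip instant, counter, and metadata events
        name = ev.get("name", "<unnamed>")
        totals[name] += float(ev.get("dur", 0.0))
        counts[name] += 1
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(name, total_us / 1000.0, counts[name]) for name, total_us in ranked[:top_n]]
```

Typical use is top_events_by_time(load_trace_events("trace.json")). Inclusive totals are the simplest derived metric. Self time (total minus nested children) would need per-thread nesting, which this sketch does not reconstruct.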
Contents
Environment variables: logging, device enumeration, and the runtime/driver variables used by aiu-smi and aiu-trace-analyzer
PyTorch Profiler: torch.profiler usage (CPU today, device-side in preview)
Device monitoring: aiu-smi setup
Trace analysis: viewing traces in Chrome / Perfetto / TensorBoard, and aiu-trace-analyzer post-processing
Performance analysis methodology: bounding a region and pairing traces with telemetry
Toolkit usage matrix: which tool for which metric
End-to-end example: profiling a Granite model on Spyre, combining all four tools into one workflow
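The Performance analysis methodology entry above bounds a region in a trace and pairs it with telemetry. Below is a minimal sketch of the pairing step. It assumes telemetry has already been parsed into hypothetical (timestamp_us, watts) tuples. This page does not describe aiu-smi's real output format, and clock_offset_us stands in for whatever alignment the two clocks actually need. All names are illustrative.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence, Tuple

# Hypothetical, already-parsed telemetry sample: (timestamp_us, power_watts).
# This is NOT aiu-smi's output format, which this page does not describe.
Sample = Tuple[float, float]


@dataclass(frozen=True)
class RegionTelemetry:
    sample_count: int            # samples that fell inside the region
    mean_watts: Optional[float]  # None when no sample landed in the region


def telemetry_for_region(
    samples: Sequence[Sample],
    start_us: float,
    end_us: float,
    clock_offset_us: float = 0.0,
) -> RegionTelemetry:
    """Pair a bounded trace region [start_us, end_us] with telemetry samples.

    clock_offset_us is added to each telemetry timestamp to move it onto the
    trace's clock; both clocks are assumed to count microseconds.
    """
    if end_us < start_us:
        raise ValueError("region end precedes region start")
    inside = [
        watts
        for ts_us, watts in samples
        if start_us <= ts_us + clock_offset_us <= end_us
    ]
    return RegionTelemetry(
        sample_count=len(inside),
        mean_watts=mean(inside) if inside else None,
    )


if __name__ == "__main__":
    # Region bounds would typically come from a trace event's "ts" and "ts" + "dur".
    samples = [(0.0, 100.0), (10.0, 200.0), (20.0, 300.0), (30.0, 400.0)]
    print(telemetry_for_region(samples, start_us=10.0, end_us=20.0))
```

Returning the sample count alongside the mean makes sparse coverage visible. A region shorter than the telemetry sampling interval may contain zero samples.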
See also
Debugging: correctness-focused workflow, including TORCH_COMPILE_DEBUG artifacts and the sendnn bisect
Running Models: torch.compile usage
Compiler Architecture: pipeline overview
RFC 0601: full profiling toolkit design
Contributing to the Profiler: branch / commit conventions, build flag, test layout, and review process for the profiling squad
Work in Progress
The subsystems labelled Planned above are under active development as part of RFC 0601. Their APIs reflect the planned design and may change.