Profiling

Stack: torch-spyre (new, Inductor-based).

Scope: performance — why is it slow? For correctness questions (why is the result wrong?) see Debugging.

Torch-Spyre provides tooling to measure the performance of PyTorch workloads running on the Spyre accelerator. The full design of the planned toolkit is in RFC 0601 — Spyre Profiling Toolkit.

The in-tree torch_spyre.profiler package is currently a scaffold — torch_spyre.profiler.is_available() returns False, and there is no public API yet. Profiling today goes through torch.profiler plus the external integrations described on this page (kineto-spyre, aiu-smi, aiu-trace-analyzer); the in-tree API will be populated as RFC 0601 lands.
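A script that should keep working once the in-tree API lands can probe for it first and fall back to torch.profiler otherwise. The following is a minimal sketch, not part of torch-spyre: the helper name spyre_profiler_available is hypothetical, and the only torch-spyre call it relies on is torch_spyre.profiler.is_available(), which this page says currently returns False.

```python
import importlib
import importlib.util


def spyre_profiler_available() -> bool:
    """Return True only if torch_spyre.profiler imports and reports itself available.

    Hypothetical helper for illustration; it is not provided by torch-spyre.
    """
    try:
        # find_spec on a top-level name does not import the package.
        if importlib.util.find_spec("torch_spyre") is None:
            return False
        profiler = importlib.import_module("torch_spyre.profiler")
    except Exception:
        # Any import-time failure (missing wheel, missing runtime libraries)
        # is treated as "not available".
        return False
    is_available = getattr(profiler, "is_available", None)
    return bool(is_available()) if callable(is_available) else False


if spyre_profiler_available():
    print("in-tree Spyre profiler is available")
else:
    print("falling back to torch.profiler and the external integrations")
```

While the package remains a scaffold, the fallback branch is the one that runs.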

What can be profiled today

| Capability | Status | Where |
| --- | --- | --- |
| Compiler pipeline logs | Available | Environment variables |
| CPU-side timing with torch.profiler | Available | PyTorch Profiler |
| Device telemetry (power, temperature, bandwidth) | Available (IBM-internal distribution; public release tracked in #1335) | Device monitoring |
| Device-side kernel timing via ProfilerActivity.PrivateUse1 | Preview (requires kineto-spyre wheel) | PyTorch Profiler |
| Trace post-processing (aiu-trace-analyzer) | Available, known gaps | Trace analysis |
| torch.spyre.memory_allocated() / max_memory_allocated() | Planned | RFC 0601 |
| Scratchpad utilization metrics | Planned | RFC 0601 |
| IR-instrumentation-based fine-grained profiler | Planned | RFC 0601 |
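The two torch.profiler rows above differ only in which activities are requested. The sketch below shows one way to express that choice, assuming a recent PyTorch where torch.profiler.ProfilerActivity exposes PrivateUse1. The helper names select_activities and profile_callable are hypothetical, and device-side activity should only be requested when the kineto-spyre wheel is installed.

```python
from typing import Callable, List


def select_activities(profiler_activity, want_device: bool) -> List:
    """Always profile the CPU; add PrivateUse1 only when asked for and present."""
    activities = [profiler_activity.CPU]
    if want_device and hasattr(profiler_activity, "PrivateUse1"):
        activities.append(profiler_activity.PrivateUse1)
    return activities


def profile_callable(fn: Callable[[], object], want_device: bool = False) -> str:
    """Run fn once under torch.profiler and return a summary table as text.

    Assumption: want_device=True is only meaningful with kineto-spyre installed.
    """
    # Imported lazily so select_activities stays usable without PyTorch.
    from torch.profiler import ProfilerActivity, profile, record_function

    activities = select_activities(ProfilerActivity, want_device)
    with profile(activities=activities) as prof:
        with record_function("profiled_region"):
            fn()
    return prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10)
```

A call such as profile_callable(lambda: model(inputs)) would print CPU-side timings for one forward pass, where model and inputs stand for whatever workload is being measured.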

Toolkit layers

| Layer | Tool | Granularity |
| --- | --- | --- |
| Application / PyTorch | torch.profiler + kineto-spyre | Kernel-level |
| Compiler frontend | Inductor logging | Pass-level |
| Compiler backend | IR instrumentation (planned) | Intra-kernel |
| Runtime | libaiupti kernel + memory events | Kernel + memory |
| Device / HW | aiu-smi | Device-level telemetry |
| Post-processing | aiu-trace-analyzer | Derived metrics |
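For the compiler frontend layer, Inductor logging in upstream PyTorch is switched on through the TORCH_LOGS environment variable. The sketch below builds such an environment for a child process. It assumes torch-spyre honours the standard PyTorch setting; any Spyre-specific variables are documented on the Environment variables page and are not reproduced here. The helper name inductor_logging_env and the script name in the usage note are hypothetical.

```python
import os
from typing import Dict, Iterable, Optional


def inductor_logging_env(
    artifacts: Iterable[str] = ("inductor",),
    base_env: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with TORCH_LOGS set to the given artifacts.

    TORCH_LOGS is the upstream PyTorch logging switch; whether every artifact
    name is useful on Spyre is an assumption to verify locally.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TORCH_LOGS"] = ",".join(artifacts)
    return env


env = inductor_logging_env(("inductor", "output_code"))
print(env["TORCH_LOGS"])
```

The returned mapping can be passed as the env argument of subprocess.run, for example when launching a hypothetical workload script run_model.py, so that logging is enabled before PyTorch is imported in the child process.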

Work in Progress

The subsystems labelled Planned above are under active development as part of RFC 0601. Their APIs reflect the planned design and may change.