# PyTorch Profiler on Spyre

**Stack:** torch-spyre (new, Inductor-based).

`torch.profiler.profile` is the entry point for per-op timing on Spyre.
Two modes are available:

1. **CPU-only** — no extra install; measures host-side Python and
   `torch.compile` activity.
2. **CPU + PrivateUse1** — measures CPU *and* Spyre-side kernel activity;
   requires the [`kineto-spyre`][kineto-spyre] PyTorch wheel.

## CPU-only (no extra install)

```python
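import torch
from torch.profiler import profile, ProfilerActivity

# `model` is your nn.Module; `x_spyre` is assumed to be an input tensor
# that already lives on the Spyre device.
compiled = torch.compile(model, backend="spyre")

with profile(activities=[ProfilerActivity.CPU]) as prof:
    # The first call triggers compilation, so Dynamo / Inductor time is included.
    output = compiled(x_spyre)

# Aggregate by op name, largest total CPU time first.
print(prof.key_averages().table(sort_by="cpu_time_total"))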
```

This captures host-side wall-clock time for every ATen call and every
Dynamo / Inductor stage. Spyre-side kernel time is not recorded in this
mode.

## CPU + PrivateUse1

Install a matching [`kineto-spyre`][kineto-spyre] wheel for your
PyTorch version (check the [releases page][kineto-spyre-releases] for
the current combination). Example URL for PyTorch 2.10.0:

```bash
uv pip install --no-deps --force-reinstall \
  https://github.com/IBM/kineto-spyre/releases/download/torch-2.10.0.aiu.kineto.1.1.1/torch-2.10.0+aiu.kineto.1.1.1-cp312-cp312-linux_x86_64.whl
```

Then profile with `ProfilerActivity.PrivateUse1`:

```python
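import torch
from torch.profiler import profile, ProfilerActivity

# `compiled` and `x_spyre` are defined as in the CPU-only example above.
with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.PrivateUse1],
    record_shapes=True,    # record input shapes for each op
    profile_memory=True,   # track tensor memory allocations
    on_trace_ready=torch.profiler.tensorboard_trace_handler("./logs/mymodel"),
) as prof:
    # The device-to-host copy is part of the profiled region.
    compiled_result = compiled(x_spyre).cpu()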
```

### Print aggregates

```python
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10).replace("CUDA", "AIU"))
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10).replace("CUDA", "AIU"))
```

The `.replace("CUDA", "AIU")` is a cosmetic workaround — the profiler's
internal column category is still named after CUDA; native renaming is
on the roadmap.
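
If the relabeling is needed in several places, it can be wrapped in a small helper. The sketch below is only a convenience; `aiu_table` is a hypothetical name, not part of `torch.profiler` or torch-spyre. It assumes `prof` is a finished profiler object that exposes `key_averages().table(...)` as used above.

```python
def aiu_table(prof, sort_by, row_limit=10):
    """Render the aggregate table with "CUDA" relabeled as "AIU".

    `aiu_table` is a hypothetical helper, not a torch-spyre API.
    """
    table = prof.key_averages().table(sort_by=sort_by, row_limit=row_limit)
    # Purely textual: every occurrence of "CUDA" in the rendered table is
    # replaced, including any op name that happens to contain it.
    return table.replace("CUDA", "AIU")
```

Usage would mirror the calls above, for example `print(aiu_table(prof, "cpu_time_total"))`. The `sort_by` key itself is passed through unchanged.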

### Export a trace for viewers

```python
prof.export_chrome_trace("spyre_trace.json")
```

See [Trace analysis](trace_analysis.md) for viewing.
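
The exported file can also be inspected without a viewer. As a minimal sketch, assuming the file follows the Chrome trace event format (a `traceEvents` list in which complete events have `"ph": "X"` and a `dur` in microseconds), the following sums duration per event name using only the standard library. `summarize_trace` is a hypothetical helper name.

```python
import json
from collections import defaultdict


def summarize_trace(path, top=10):
    """Return the `top` event names by total duration, in microseconds."""
    with open(path) as f:
        trace = json.load(f)
    # A Chrome trace is either {"traceEvents": [...]} or a bare list of events.
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for evt in events:
        # "X" marks a complete event, the kind that carries a "dur" field.
        if evt.get("ph") == "X":
            totals[evt["name"]] += evt.get("dur", 0)
    # Largest total duration first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Calling `summarize_trace("spyre_trace.json")` on the file exported above would return a list of `(name, total_microseconds)` pairs. Nested events are each counted in full, so totals for parent and child spans overlap.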

## Advanced features

Full reference lives in the upstream
[PyTorch profiler documentation][torch-profiler-docs]:

- `record_function` — annotate named spans
- `schedule` — skip warmup, sample a bounded window
- `on_trace_ready` — stream to TensorBoard-compatible JSON
- `with_stack` — include file and line for Python ops
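
As a minimal sketch of how these options combine, the following profiles a small CPU-only stand-in model, so it runs without Spyre hardware. The model, the input, the `forward_pass` span name, and the `on_ready` callback are illustrative assumptions; on Spyre you would substitute the compiled model and a device tensor, and add `ProfilerActivity.PrivateUse1` if `kineto-spyre` is installed.

```python
import torch
from torch.profiler import ProfilerActivity, profile, record_function, schedule

# Stand-in workload (assumption): a tiny eager CPU model and random input.
model = torch.nn.Linear(64, 64)
x = torch.randn(8, 64)

window_spans = []  # one set of event names per completed active window


def on_ready(prof):
    # Invoked by the profiler each time an active window finishes.
    window_spans.append({evt.key for evt in prof.key_averages()})


with profile(
    activities=[ProfilerActivity.CPU],
    # Skip 1 step, warm up for 1 step, record 2 steps, run the cycle once.
    schedule=schedule(wait=1, warmup=1, active=2, repeat=1),
    on_trace_ready=on_ready,
    with_stack=True,  # attach Python file and line information
) as prof:
    for _ in range(4):
        with record_function("forward_pass"):  # hypothetical span name
            model(x)
        prof.step()  # tell the profiler that one iteration has finished
```

With this schedule only the third and fourth iterations are recorded, and `on_ready` fires once, after the fourth `prof.step()`. Replacing `on_ready` with `torch.profiler.tensorboard_trace_handler(...)` would write each window to disk instead.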

## Known issues (from torch-spyre-docs)

- **Multi-AIU communication profiling is not supported yet.**

## See also

- [Trace analysis](trace_analysis.md) — viewers for the traces
- [Device monitoring](device_monitoring.md) — `aiu-smi` telemetry
  alongside `torch.profiler`

[kineto-spyre]: https://github.com/IBM/kineto-spyre
[kineto-spyre-releases]: https://github.com/IBM/kineto-spyre/releases
[torch-profiler-docs]: https://pytorch.org/docs/stable/profiler.html