# Contributing to the Profiler
This page extends the general [contribution guidelines](guidelines.md) with
conventions specific to the **profiling squad**, the contributors building
out the Spyre profiling toolkit described in [RFC 0601][rfc-0601]. If this
is your first profiler PR, read the guidelines page first. Everything here
assumes you already have a working fork and `pre-commit` set up.
The user-facing docs we ship live under
[Profiling](../user_guide/profiling/index.md). When you change a feature,
walk those pages as a new user would. If the instructions no longer match
reality, fix them in the same PR.
## What makes profiler work different
Most torch-spyre PRs touch one layer. Profiler PRs almost always cross
several:
| Layer | Where it lives | Typical change |
|---|---|---|
| Python API | `torch_spyre/profiler/` | `profile_spyre()` wrapper, `torch.spyre.memory_*` |
| C++ registration | `torch_spyre/csrc/profiler/` | PrivateUse1 observer, kineto wiring |
| Build | `CMakeLists.txt`, `torch_spyre/csrc/profiler/CMakeLists.txt` | Guarded by `USE_SPYRE_PROFILER` |
| Tests | `tests/profiler/` | Skip-marked when `USE_SPYRE_PROFILER` is off |
| External | [`kineto-spyre`][kineto-spyre], [`aiu-trace-analyzer`][ata] | Versioned separately |
| Docs | `docs/source/user_guide/profiling/` | User-visible additions |
Plan PRs accordingly. See [PR scope](#pr-scope) below.
## Branch naming
Use `profiler/-` so a `git branch -r` listing
tells the reviewer what each branch is about without opening the PR:
| Prefix | Use for |
|---|---|
| `profiler/build-…` | Build system, CMake, linking, `USE_SPYRE_PROFILER` |
| `profiler/reg-…` | C++ registration, PrivateUse1 plugin loading |
| `profiler/api-…` | Python APIs in `torch_spyre/profiler/` |
| `profiler/trace-…` | Trace enrichment, post-processing, Perfetto grouping |
| `profiler/mem-…` | Memory profiling at any layer |
| `profiler/test-…` | Test additions |
| `profiler/docs-…` | Documentation, examples |
| `profiler/feat-…` | Multi-PR feature work |
| `profiler/fix-…` | Bug fixes |
Keep `` to **3–5 hyphenated words**. `cmake-libaiupti`
is about right. `sol` is too terse to read at a glance, and
`tex-scratchpad-vram-sol-average` should be split into smaller PRs.
## PR title prefix
Prefix the **PR title** with `[profiler]` so it is easy to slice profiler
work out of `git log` after merge. Because we squash-merge, the PR title
becomes the single commit message landed on `main` — there is no need to
prefix every individual commit on the feature branch. Stack a sub-area
tag when one is obvious:
```text
[profiler] Add profile_spyre() context manager
[profiler][memory] Stub torch.spyre.memory_allocated()
[profiler][trace] Group runtime events under PerfettoSpyreRuntime track
[profiler][test] Add scaffold with USE_SPYRE_PROFILER skip markers
[profiler][docs] Document kineto-spyre wheel install
```
Tooling that slices profiler work looks for this tag in `main` history
(merged work) and in `profiler/*` branches (in-flight work), so the
branch prefix plus the PR title prefix together are enough — individual
commits on the branch can carry whatever message is most useful while
you iterate.
Sign off your commits (`git commit -s`) like every other torch-spyre
commit.
## PR scope
Profiling features tend to grow large because they touch multiple layers.
Keep each PR to **one observable change**, even when the underlying feature
spans several PRs:
* Good: "Add `profile_spyre()` wrapper (Python only, no kineto wiring yet)".
* Good: "Wire kineto observer behind `USE_SPYRE_PROFILER` (no API change)".
* Bad: "Add memory profiling APIs and Perfetto trace grouping". Split it.
If a feature genuinely cannot be split, flag that in the PR description
and in the matching sub-issue so reviewers know what they are signing up
for.
## Building with the profiler enabled
The profiler is gated by a CMake flag so torch-spyre still imports
cleanly without it. Local development usually wants it on:
```bash
USE_SPYRE_PROFILER=1 pip install -e . --no-build-isolation
```
When the flag is **off**, every profiler import path must still succeed
(the import test below covers this). When it is **on**, install the
kineto-spyre wheel:
```bash
pip install kineto-spyre
```
If you change build wiring, verify both the on and off paths build and
import.
## Testing profiler changes
The profiler test suite lives at `tests/profiler/`. The `conftest.py`
exposes a `spyre_profiler_available` fixture that skips tests when the
feature is compiled out, so the same suite works in both build modes.
Run only the profiler tests:
```bash
pytest tests/profiler/ -v
```
For focused iteration on activity / trace / memory / sync subsets:
```bash
pytest tests/profiler/test_spyre_profiler.py -k activity
pytest tests/profiler/test_spyre_profiler.py -k trace
```
Smoke test validation before you open a PR:
1. `import torch_spyre.profiler` succeeds with `USE_SPYRE_PROFILER=0`.
2. `tests/profiler/` passes with the kineto-spyre wheel installed.
3. If you touched trace emission, capture a small trace and open it in
Perfetto. See [Trace analysis](../user_guide/profiling/trace_analysis.md).
4. If you touched device telemetry, sanity-check against `aiu-smi`. See
[Device monitoring](../user_guide/profiling/device_monitoring.md).
## Trace and telemetry sanity checks
Profiler bugs are often invisible at the test level. The test passes, the
trace is wrong. When you change anything that emits trace events or
metrics, attach one of the following to the PR description:
* A short Perfetto screenshot of the affected region.
* The output of `aiu-trace-analyzer` summarizing the trace.
* A diff of the relevant counter values before and after.
This is the most common review request on profiler PRs. Including it up
front saves a round trip.
:::{tip}
**Cross-check with `chrome://tracing` when validating event ordering.**
Perfetto silently truncates overlapping events on the same thread — two
events that overlap (which is impossible within a single thread and
indicates a real bug) are rendered as a single clean span, hiding the
problem. `chrome://tracing` instead renders the overlap as garbled,
intermingled labels, which makes the bug obvious. Using it has two
benefits for "is the trace correct?" reviews:
1. Overlapping/interleaved events on the same thread are visible
instead of hidden.
2. It runs locally — your trace data does not transit a third-party web
service.
Use Perfetto for analysis and presentation, but reach for
`chrome://tracing` when you specifically need to verify that no two
events on the same thread overlap.
:::
## Coordinating with kineto-spyre
[`kineto-spyre`][kineto-spyre] is a separate repository on its own release
cadence. If your change needs a new kineto-spyre symbol or behaviour:
1. Land the change in kineto-spyre **first**, with its own PR and release.
2. Pin the new kineto-spyre version in torch-spyre's requirements.
3. Open the torch-spyre PR with a description line like
*"Requires `kineto-spyre>=X.Y.Z`."*
Do not couple a torch-spyre PR to an unreleased kineto-spyre commit.
Reviewers cannot run it and CI cannot reproduce it.
## Documentation expectations
If you change something a user can see (a new API, a new env var, a new
trace field, a new build flag), update the docs in the same PR. The right
home is almost always under `docs/source/user_guide/profiling/`. Sphinx
runs with `-W` in CI, so warnings will fail your PR. Build locally before
pushing:
```bash
python -m sphinx docs/source docs/build/html -W --keep-going
python -m http.server 8080 --directory docs/build/html
```
## Reviewers
Profiler PRs need a review from a **profiling squad lead** plus **one
other squad member**. CODEOWNERS for `torch_spyre/profiler/`,
`torch_spyre/csrc/profiler/`, and `tests/profiler/` will request the
right people automatically. If GitHub does not auto-request a lead,
request one manually.
[rfc-0601]: https://github.com/torch-spyre/rfcs/blob/main/0601-SpyreProfilingToolkit/0601-SpyreProfilingToolkitRFC.md
[kineto-spyre]: https://github.com/IBM/kineto-spyre
[ata]: https://github.com/IBM/aiu-trace-analyzer