Torch-Spyre Documentation
Torch-Spyre is the PyTorch backend for the IBM Spyre AI Accelerator.
It enables standard PyTorch models to run on the Spyre device with full
torch.compile support via a custom Inductor backend.
New to Torch-Spyre?
Three on-ramps depending on what you need:
- Just want to run a model? Start with Run PyTorch on Spyre device.
- Need the mental model? Read Key concepts: a 5–10 minute primer on dataflow execution, sticks and tiled tensors, the LX scratchpad, the eager vs compiled paths, and graph breaks.
- Want the design story? How Torch-Spyre works: an out-of-tree PyTorch backend walks through the four challenges we hit and the PyTorch extension mechanisms that addressed each one.
For a one-line definition of a specific term, jump to the Glossary.
For Users
- Getting Started
- How Torch-Spyre works: an out-of-tree PyTorch backend
- A device with a different execution model from a GPU
- Challenge 1: making PyTorch recognize a new device
- Challenge 2: teaching PyTorch a memory layout it had never seen
- Challenge 3: extending TorchInductor for dataflow compilation
- Challenge 4: covering ops in a model forward pass
- What we learned
- What is next
- Getting started
- Appendix: extension point reference for out-of-tree PyTorch backends
- Acknowledgments
- Key concepts
- 1. Execution model
- 2. Hardware
- 3. Memory hierarchy
- 4. Sticks and tiled tensors
- 5. Eager vs compiled path
- 6. Graph breaks
- 7. Compilation pipeline
- 8. Dtype defaults and casting
- 9. Running models today: FMS vs stock HuggingFace
- 10. Hardware constraints checklist
- 11. Streams
- 12. Distributed execution
- Where to go next
- Glossary
- Installation
- Run PyTorch on Spyre device
- More Examples
- How Torch-Spyre works: an out-of-tree PyTorch backend
- User Guide
- API Reference
For Developers
- Architecture
- Compiler Stack
- Overview
- Inductor Front-End: Deep Dive
- Back-End Compiler (DeepTools)
- KTIR (Kernel Tile IR)
- Spyre Inductor Operation Cookbook
- Working Set Reduction - Design Document
- Coarse-Tiling Loop IR for the Spyre Backend
- Work Division Planning
- What work division does
- Three-pass planner overview
- Key concepts
- Pass 1 — Span Reduction (span_reduction)
- Pass 2 — Cost-Model Matmul Division (cost_model_matmul_division)
- Pass 3 — Work Distribution (work_distribution)
- Worked example: large matmul on 32 cores
- Interaction with SDSC and scratchpad planning
- User Work-Division Hints
- Limitations and Future Work
- See Also
- Scratchpad (LX) optimization
- Runtime
- Contributing
- RFCs
- Index
- Summaries
- RFC 0047 — Tensors with Device-Specific Layouts
- RFC 0171 — Spyre Device Construct in PyTorch
- RFC 0186 — Test Frameworks
- RFC 0601 — Spyre Profiling Toolkit
- RFC 0682 — Kernel Tile Intermediate Representation (KTIR)
- RFC 1287 — Test Suite Configuration for Upstream PyTorch Tests on OOT Devices
- RFC 1632 — Model Enablement Tracking
- RFC 1633 — End-to-End Model Performance Testing