IBM Spyre device

This document provides an overview of the Spyre device.

What is Spyre

The Spyre AI Card, also known as the IBM Spyre Accelerator, is a high-performance, energy-efficient AI accelerator. Currently, it is generally available for IBM Z, LinuxONE, and Power systems.

Spyre Accelerators are engineered to support higher-accuracy AI techniques. Target workloads include real-time generative asset creation, ingestion and interpretation of customer data for outreach and cross-selling, and risk assessment.

The IBM Spyre AI Card. *Image credit: IBM Research.*

Key features

Some of the key features of the Spyre device are listed below:

It is equipped with 32 AI accelerator cores, capable of handling matrix operations and low‑precision workloads for high throughput.
It is manufactured using advanced 5nm node technology.
Each card supports up to 128 GB of LPDDR5 memory; an ensemble of up to eight cards provides 1 TB of combined memory along with the aggregate AI compute of all eight cards.
It delivers more than 300 TOPS of AI compute per card while consuming 75 W.
It connects to the host through a PCIe Gen5 x16 interface and ships as a PCIe form-factor card.
Each core has a 2 MB LX scratchpad (SRAM), shared between the two corelets within the core.
Each core has a 256 MB limit on the contiguous device-memory span it can address. This is a hardware constraint on the addressable range, distinct from the 2 MB LX scratchpad capacity.
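
The figures in the list above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of any Spyre API: all constant and function names are hypothetical, and it assumes that "MB" means 2**20 bytes, which this document does not specify.

```python
# Illustrative constants taken from the feature list; the names are hypothetical.
MEMORY_PER_CARD_GB = 128
MAX_CARDS_PER_ENSEMBLE = 8
TOPS_PER_CARD = 300        # lower bound: the card is quoted as exceeding 300 TOPS
POWER_PER_CARD_W = 75

# Assumption: MB is interpreted as 2**20 bytes.
MB = 2**20
LX_SCRATCHPAD_BYTES = 2 * MB
MAX_CONTIGUOUS_SPAN_BYTES = 256 * MB


def ensemble_memory_gb(cards: int) -> int:
    """Combined LPDDR5 capacity of an ensemble of `cards` cards, in GB."""
    if not 1 <= cards <= MAX_CARDS_PER_ENSEMBLE:
        raise ValueError(f"an ensemble holds 1 to {MAX_CARDS_PER_ENSEMBLE} cards")
    return cards * MEMORY_PER_CARD_GB


def tops_per_watt() -> float:
    """Lower bound on per-card efficiency implied by the quoted figures."""
    return TOPS_PER_CARD / POWER_PER_CARD_W


def fits_in_core_span(num_bytes: int) -> bool:
    """True if a contiguous buffer stays within one core's addressable span."""
    return 0 <= num_bytes <= MAX_CONTIGUOUS_SPAN_BYTES


if __name__ == "__main__":
    print(ensemble_memory_gb(8))              # eight cards, in GB
    print(tops_per_watt())                    # TOPS per watt, lower bound
    print(fits_in_core_span(300 * MB))        # larger than the 256 MB span
    print(MAX_CONTIGUOUS_SPAN_BYTES // LX_SCRATCHPAD_BYTES)
```

The last line shows why the two per-core limits should not be confused: under the stated unit assumption, the addressable span is 128 times larger than the LX scratchpad, so data within the span still has to be staged through the scratchpad in pieces.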

Core microarchitecture

Each Spyre core is built from two corelets that share a single 2 MB LX scratchpad (SRAM). Inside each corelet there is an 8 × 8 systolic Processing Element (PE) array, used for matrix-style compute on the PT execution unit, plus a 1D Special Function Unit (SFU) for non-linear activations such as GELU and softmax.

Cores talk to each other over a bi-directional ring interconnect at 128 B per cycle per direction. The architecture descends from IBM’s research-stage RaPiD AI accelerator (Venkataramani et al., ISCA 2021, DOI:10.1109/ISCA52012.2021.00021).
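
To make the structure above concrete, the sketch below counts processing elements per card and models hop distance on a bi-directional ring. It is a simplified illustration under stated assumptions: it takes the 32-core count from the key features, assumes all cores sit on a single ring and are numbered consecutively around it, and ignores latency and contention. The function names are hypothetical and do not come from the Spyre software stack.

```python
CORES_PER_CARD = 32          # from the key features list
CORELETS_PER_CORE = 2
PE_ROWS, PE_COLS = 8, 8      # systolic PE array inside each corelet
RING_BYTES_PER_CYCLE = 128   # per direction


def pes_per_card() -> int:
    """Total systolic processing elements across all cores of one card."""
    return CORES_PER_CARD * CORELETS_PER_CORE * PE_ROWS * PE_COLS


def ring_hops(src: int, dst: int, num_cores: int = CORES_PER_CARD) -> int:
    """Shortest hop count between two cores on a bi-directional ring.

    Assumption: every core is on one ring, indexed 0..num_cores-1 in ring order.
    """
    if not (0 <= src < num_cores and 0 <= dst < num_cores):
        raise ValueError("core index out of range")
    clockwise = (dst - src) % num_cores
    return min(clockwise, num_cores - clockwise)


def min_transfer_cycles(num_bytes: int) -> int:
    """Lower bound on cycles to push num_bytes over one ring link, one direction."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    return -(-num_bytes // RING_BYTES_PER_CYCLE)  # ceiling division


if __name__ == "__main__":
    print(pes_per_card())
    print(ring_hops(0, 31))          # neighbours across the wrap-around
    print(ring_hops(0, 16))          # diametrically opposite cores
    print(min_transfer_cycles(2 * 2**20))
```

Because the ring is bi-directional, the worst-case distance in this model is half the ring, not the full ring, which is what `min(clockwise, num_cores - clockwise)` captures.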

Memory and tiling constants

The runtime, compiler, and tensor-layout code all share one tiling constant:

BYTES_IN_STICK = 128

A stick is a 128-byte aligned memory chunk, which works out to 64 elements at fp16. The size matches the natural granularity of data transfers between LPDDR5 device memory and the per-core LX scratchpad, so the hardware can pull in a full stick of contiguous elements in a single transfer.
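
The stick arithmetic can be sketched as follows. Only `BYTES_IN_STICK` comes from this document; the helper functions and the dtype table are hypothetical, and rounding a dimension up to whole sticks is an assumption about how a stick-aligned layout might pad data, not a description of the actual compiler.

```python
import math

BYTES_IN_STICK = 128  # shared tiling constant from this document

# Illustrative element sizes in bytes; this table is not part of the Spyre stack.
DTYPE_BYTES = {"int8": 1, "fp16": 2, "fp32": 4}


def elements_per_stick(dtype: str) -> int:
    """Number of elements of `dtype` that fit in one 128-byte stick."""
    return BYTES_IN_STICK // DTYPE_BYTES[dtype]


def sticks_needed(num_elements: int, dtype: str) -> int:
    """Whole sticks required to hold num_elements contiguous elements."""
    if num_elements < 0:
        raise ValueError("num_elements must be non-negative")
    return math.ceil(num_elements / elements_per_stick(dtype))


def padded_length(num_elements: int, dtype: str) -> int:
    """Element count after rounding up to a whole number of sticks."""
    return sticks_needed(num_elements, dtype) * elements_per_stick(dtype)


if __name__ == "__main__":
    print(elements_per_stick("fp16"))
    print(sticks_needed(100, "fp16"))
    print(padded_length(100, "fp16"))
```

For example, a run of 100 fp16 values does not fill a whole number of sticks, so under this padding assumption it would occupy two sticks and leave 28 element slots unused.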

Production deployments

As of 2025, Spyre is shipping in two production systems. IBM z17 mainframes support up to 48 Spyre cards, each delivering 300+ TOPS (see the IBM Z press release). IBM Power11 servers run the same silicon as a 75W PCIe gen5 x16 card with 128 GB of LPDDR5 memory (see the IBM Power11 press release). The Torch-Spyre integration described in these docs targets that PCIe card configuration.

Use cases

The Spyre device is designed for enterprise AI workloads including:

Real-time fraud detection
Code generation and assistance
Large language model inference
Multi-model ensemble inferencing

Integration with PyTorch

The Spyre device is integrated with PyTorch as a custom backend device, enabling standard PyTorch models to leverage Spyre’s AI acceleration capabilities. See the Getting Started guide for setup and usage instructions. The examples section provides annotated code examples.

IBM and the PyTorch community are collaborating to broaden Spyre’s integration into the open-source AI stack, including vLLM and torchtitan. For details, see the IBM Research blog post "Expanding AI model training and inference for the open-source community".

Learn more

Refer to the official product pages and IBM Research blogs to learn more about the Spyre device.

Product pages

IBM Spyre Accelerator for Z and LinuxONE

IBM Research blogs

Lifting the cover on the IBM Spyre Accelerator — architecture deep-dive and full-stack approach
Enhancing enterprise AI with the IBM Spyre Accelerator — Spyre for IBM Z mainframe AI inference
Expanding AI model training and inference for the open-source community — Spyre in the PyTorch ecosystem

Internal reference

Dataflow Architecture Reference