IBM Spyre device

This document provides an overview of the Spyre device.

What is Spyre

The Spyre AI Card, also known as the IBM Spyre Accelerator, is a high-performance, energy-efficient AI accelerator. Currently, it is generally available for IBM Z, LinuxONE, and Power systems.

Spyre Accelerators are engineered to support the development of higher-accuracy AI techniques, enabling real-time generative asset creation, customer data ingestion and interpretation for outreach, cross-selling, and risk assessments.

The IBM Spyre AI Card

The IBM Spyre AI Card. Image credit: IBM Research.

Key features

Some of the key features of the Spyre device are listed below:

  • It is equipped with 32 AI accelerator cores, capable of handling matrix operations and low‑precision workloads for high throughput.

  • It is manufactured using advanced 5nm node technology.

  • Each card supports up to 128 GB of LPDDR5 memory, with ensembles of up to eight cards delivering 1 TB memory and massive AI performance.

  • It delivers exceptional AI compute, exceeding 300 TOPS per card, while consuming just 75W.

  • PCIe gen5 x16 host interface (PCIe form factor card).

Core microarchitecture

Each Spyre core is built from two corelets that share a single 2 MB LX scratchpad (SRAM). Inside each corelet there is an 8 × 8 systolic Processing Element (PE) array, used for matrix-style compute on the PT execution unit, plus a 1D Special Function Unit (SFU) for non-linear activations such as GELU and softmax.

Cores talk to each other over a bi-directional ring interconnect at 128 B per cycle per direction. The architecture descends from IBM’s research-stage RaPiD AI accelerator (Venkataramani et al., ISCA 2021, DOI:10.1109/ISCA52012.2021.00021).

Memory and tiling constants

The runtime, compiler, and tensor-layout code all share one tiling constant:

BYTES_IN_STICK = 128

A stick is a 128-byte aligned memory chunk, which works out to 64 elements at fp16. The size matches the natural granularity of data transfers between LPDDR5 device memory and the per-core LX scratchpad, so the hardware can pull in a full stick of contiguous elements in a single transfer.

Production deployments

As of 2025, Spyre is shipping in two production systems. IBM z17 mainframes support up to 48 Spyre cards, each delivering 300+ TOPS (see the IBM Z press release). IBM Power11 servers run the same silicon as a 75W PCIe gen5 x16 card with 128 GB of LPDDR5 memory (see the IBM Power11 press release). The Torch-Spyre integration described in these docs targets that PCIe card configuration.

Use cases

The Spyre device is designed for enterprise AI workloads including:

  • Real-time fraud detection

  • Code generation and assistance

  • Large language model inference

  • Multi-model ensemble inferencing

Integration with PyTorch

The Spyre device is integrated with PyTorch as a custom backend device, enabling standard PyTorch models to leverage Spyre’s AI acceleration capabilities. See the Getting Started guide for setup and usage instructions. The examples section provides annotated code examples.

IBM and the PyTorch community are collaborating to broaden Spyre’s integration into the open-source AI stack, including vLLM and torchtitan. See Expanding AI model training and inference for the open-source community for details.

Learn more

Refer to the official product pages and IBM Research blogs to learn more about the Spyre device.

Product pages

IBM Research blogs

Internal reference