At Hot Chips 2025, Google presented more insights into its next-gen TPU platform called Ironwood and how it scales on a rack level.
The 7th Generation TPU architecture, codenamed Ironwood, was introduced back in April of this year and was said to offer 24x the performance of today's most powerful supercomputers. Google started off Hot Chips 2025 with a recap of its existing TPU systems and how they have progressed over the years.
Back in 2022, Google launched its TPU v4, which offered 4096 chips within a single pod along with 32 GB of 1.2 TB/s HBM memory, and a compute output of 275 TFLOPs per chip. A year later, they introduced the TPU v5p with 8960 chips, 95 GB of 2.8 TB/s HBM memory, and 459 TFLOPs of compute output.
This year, the Ironwood TPU Superpod will offer 9216 chips per pod, with 192 GB of 7.4 TB/s HBM memory, and a massive 4614 TFLOPs of peak compute per chip. That's more than a 16x increase in per-chip peak compute over TPU v4.
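The generational jump is easy to check with a few lines of arithmetic. The sketch below uses only the per-chip and per-pod figures quoted above; the dictionary layout and names are illustrative, and it does not account for any difference in numeric precision behind the quoted peak numbers.

```python
# Figures as quoted in the article: peak TFLOPs per chip, HBM capacity (GB),
# HBM bandwidth (TB/s), and chips per pod. Names here are illustrative only.
GENERATIONS = {
    "TPU v4":   {"tflops": 275,  "hbm_gb": 32,  "hbm_tbps": 1.2, "chips": 4096},
    "TPU v5p":  {"tflops": 459,  "hbm_gb": 95,  "hbm_tbps": 2.8, "chips": 8960},
    "Ironwood": {"tflops": 4614, "hbm_gb": 192, "hbm_tbps": 7.4, "chips": 9216},
}


def ratio(metric, newer="Ironwood", older="TPU v4"):
    """Return how many times larger `metric` is on `newer` than on `older`."""
    return GENERATIONS[newer][metric] / GENERATIONS[older][metric]


if __name__ == "__main__":
    for metric in ("tflops", "hbm_gb", "hbm_tbps", "chips"):
        print(f"{metric}: Ironwood is {ratio(metric):.2f}x TPU v4")
```

On these numbers, per-chip peak compute grows by roughly 16.8x, while HBM capacity and bandwidth each grow by about 6x and pod size by 2.25x.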
Google then took a deep dive into the Ironwood Superpod and max-scale cluster. The core building block of this system is the Ironwood SoC. Four of these chips sit on an Ironwood PCBA motherboard, which is then installed in an Ironwood TPU rack. Each rack stacks a total of 16 Ironwood PCBAs like trays, which makes for a 64-chip solution.
Next up is the interconnect solution. Google uses its Inter-Chip Interconnect, or ICI, which is a scale-up network. This network lets Google scale all the way up to 43 blocks (one block is 64 chips) of Superpods connected using a 1.8 Petabytes network. Internal communications are handled using a range of NICs.
With these, Google builds a superpod out of a suite of racks. First up is the Ironwood Superpod itself, which includes 144 racks; then there's the optical switch chassis, which is used to enable scale-up ICI across multiple blocks; and lastly, there's the CDU (coolant distribution unit) rack, which is used for coolant distribution.
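The chip counts quoted so far fit together as a simple hierarchy. The following is a rough sketch using only the numbers given in the article (4 chips per PCBA, 16 PCBAs per rack, 144 racks per Superpod, 192 GB of HBM per chip); the constant and function names are made up for illustration, and decimal units (1 PB = 1,000,000 GB) are assumed.

```python
# Building-block counts as quoted in the article; names are illustrative.
CHIPS_PER_PCBA = 4
PCBAS_PER_RACK = 16
RACKS_PER_SUPERPOD = 144
HBM_PER_CHIP_GB = 192


def chips_per_rack():
    """Chips in one Ironwood TPU rack (one 64-chip block)."""
    return CHIPS_PER_PCBA * PCBAS_PER_RACK


def chips_per_superpod():
    """Chips in a full Superpod of 144 racks."""
    return chips_per_rack() * RACKS_PER_SUPERPOD


def superpod_hbm_pb():
    """Aggregate HBM across a Superpod, in decimal petabytes."""
    return chips_per_superpod() * HBM_PER_CHIP_GB / 1_000_000


if __name__ == "__main__":
    print(chips_per_rack(), chips_per_superpod(), round(superpod_hbm_pb(), 2))
```

The result is 64 chips per rack and 9216 chips per Superpod, matching the pod size quoted earlier. The aggregate HBM works out to about 1.77 PB, which is close to the 1.8 Petabytes figure mentioned above, although the article does not say explicitly that the two are the same quantity.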
Now, coming to the rack itself, Google has been using a 3D torus layout for its TPUs for at least the past three generations. Each logical building block is a 4x4x4 3D network of 64 chips, or nodes, that is packaged into a single rack.
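To make the 4x4x4 torus concrete, here is a minimal sketch of how neighbors and hop counts work in such a topology. It models only the logical layout described above: each node has a coordinate (x, y, z), and links wrap around at the edges of each axis. The function names are hypothetical, and the sketch says nothing about how Google physically wires the wraparound links.

```python
# Logical dimensions of one block, per the article: 4 x 4 x 4 = 64 nodes.
DIMS = (4, 4, 4)


def torus_neighbors(coord, dims=DIMS):
    """Return the six neighbors of `coord`, wrapping around on every axis."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # modulo gives the wraparound
            neighbors.append(tuple(n))
    return neighbors


def torus_hops(a, b, dims=DIMS):
    """Minimum hop count between two nodes, taking the shorter way per axis."""
    return sum(
        min(abs(x - y), size - abs(x - y))
        for x, y, size in zip(a, b, dims)
    )


if __name__ == "__main__":
    print(torus_neighbors((0, 0, 0)))
    print(torus_hops((0, 0, 0), (3, 3, 3)))  # one wraparound hop per axis
    print(torus_hops((0, 0, 0), (2, 2, 2)))  # farthest node in a 4x4x4 torus
```

Because of the wraparound, the corner node (3, 3, 3) is only 3 hops from (0, 0, 0), and no pair of nodes in a 4x4x4 torus is more than 6 hops apart.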
The interconnect takes a hybrid approach, combining PCB traces, passive copper cables, and optical fiber links that connect to the optical circuit switch (OCS) through patch panels. This makes the unit very flexible.
At the top of the rack is an upper drip pan, which is used to detect any leaks from the liquid manifold, and right below it is the power delivery, which includes two power domains that take in 416 V AC and convert it to DC using rectifiers. The entire unit is liquid-cooled, and a fully loaded system can draw more than 100 kW of power in a single rack. That's about it as far as the Ironwood TPU talks are concerned.