With ISC High Performance 2022 taking place this week in Hamburg, Germany, Intel is using the show’s first in-person gathering in 3 years to offer an update on the state of their high-performance/supercomputing silicon plans. The big news from this year’s show is that Intel is naming the successor to the Ponte Vecchio accelerator, which the company is now announcing as Rialto Bridge.
Previously appearing on Intel’s roadmaps as “Ponte Vecchio Next,” Ponte’s successor has been in development by Intel’s GPU teams even as the first major installation of Ponte itself (the Aurora supercomputer) is still being built. Part of the company’s 3-year(ish) roadmap leading to CPUs and accelerators converging in the Falcon Shores XPU, Rialto Bridge is the part that, if you’ll pardon the pun, bridges the gap between Ponte and Falcon, offering an evolution of the Ponte design that leverages newer technologies and manufacturing processes.
While Intel isn’t offering a fully detailed technical breakdown at this early stage in the process, the company does talk a bit about specs at a high level and provides a render of the future chip that removes any doubt that this is a Ponte successor. The render shows a package made up of dozens of tiles/chiplets in much the same layout as Ponte. The biggest change Intel is talking about today is that they will expand the total number of Xe compute cores from 128 on Ponte to a maximum of 160 on Rialto Bridge, presumably by increasing the number of Xe cores in each compute tile.
In the absence of concrete details on the manufacturing side, Intel at least confirms that Rialto will use newer manufacturing nodes for its construction. Ponte in its current form uses a mix of TSMC N7 (Link Tile), TSMC N5 (Compute), and Intel 7 (Cache & Base) parts. The Intel 4 process is expected to come online this year, so it would make sense to use it to upgrade the base and cache tiles. Ideally, Intel would move to a newer process node for the compute tiles as well, possibly using this opportunity to bring production of those tiles over to Intel 4, although we wouldn’t rule out TSMC N4 either.
However, at the risk of reading too much into a single render, Rialto has one notable difference from Ponte when it comes to the compute tiles: while Ponte used pairs of compute tiles with a Rambo cache tile in between, Rialto would at first glance appear to use monolithic slabs. This implies that Intel has chosen to integrate the Rambo cache into the compute tile die, and that they are willing to build fewer, larger compute tiles. This lends some credibility to the idea of Intel taking over the manufacture of the compute tiles (since they already manufacture the cache tiles), but we’ll have to see what Intel announces later.
Interestingly, Intel also promises more I/O bandwidth for Rialto, although again, that’s a very high-level (and unspecific) detail. Ponte is already one of the first products to ship with PCIe 5.0 connectivity, and with PCIe 6.0 hardware still some way off, this may be more about on-chip bandwidth than off-chip bandwidth, or about the bandwidth available between accelerators over Intel’s Xe Link interconnect.
HBM3 is also a shoo-in for Intel’s next-gen accelerator, as it’s already shipping in accelerators this year. HPC accelerators live and die by memory bandwidth, so we’re assuming it is one of the first things Intel is considering for Rialto. It would also line up with Intel’s awkwardly worded “More GT/s,” since memory transfer rates are often quoted in gigatransfers per second.
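To make the link between transfer rate and bandwidth concrete, peak memory bandwidth can be roughly estimated as bus width in bytes multiplied by the per-pin transfer rate. The sketch below uses the 8192-bit width listed for Ponte Vecchio in the comparison table; the per-pin rates of 3.2 and 6.4 GT/s are illustrative assumptions in line with typical HBM2e and HBM3 speeds, not figures Intel has disclosed, and the function name is hypothetical.

```python
def peak_bandwidth_gbs(bus_width_bits: int, rate_gtps: float) -> float:
    """Peak bandwidth in GB/s: bus width in bytes x giga-transfers per second."""
    return bus_width_bits / 8 * rate_gtps


PONTE_BUS_BITS = 8192  # VRAM width listed for Ponte Vecchio

# Assumed per-pin transfer rates, for illustration only (not Intel figures)
HBM2E_GTPS = 3.2
HBM3_GTPS = 6.4

hbm2e_gbs = peak_bandwidth_gbs(PONTE_BUS_BITS, HBM2E_GTPS)
hbm3_gbs = peak_bandwidth_gbs(PONTE_BUS_BITS, HBM3_GTPS)

print(f"HBM2e-class: {hbm2e_gbs:.1f} GB/s")
print(f"HBM3-class:  {hbm3_gbs:.1f} GB/s")
print(f"Ratio:       {hbm3_gbs / hbm2e_gbs:.1f}x")
```

Under these assumed rates, the same bus width would deliver twice the peak bandwidth, which is why a higher GT/s figure alone would be a meaningful upgrade even if the width stayed the same.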
Finally, Intel states that Rialto will be based on a newer version of the Open Accelerator Module (OAM) socket specification, which is particularly noteworthy given that the next version of OAM has yet to be announced. Without further details, the biggest differentiating factor seems to be the supported power – while OAM 1.x allows for modules to draw power up to 700 watts, Intel talks about delivering up to 800 watts on a Rialto module. For better or worse, this aligns with the increase in power consumption for the most powerful versions of next-generation HPC accelerators, and is a key factor in the move to liquid and immersion cooling for high-end hardware.
Compute GPU Accelerator Comparison
| Product | Rialto Bridge | Ponte Vecchio | H100 80GB |
|---|---|---|---|
| Tiles (incl. HBM) | 31? | 47 | 6 + 1 spare |
| Compute Units | 160 | 128 | 132 |
| L2 / L3 | ? | 2 x 204MB | 50MB |
| VRAM Type | HBM3? | 8 x HBM2e | 5 x HBM3 |
| VRAM Width | ? | 8192-bit | 5120-bit |
| Chip-to-Chip Total BW | ? | 64 x 11.25GB/s (4×16 90G SERDES) | 18 x 50GB/s |
| CPU Coherency | Yes | Yes | With NVLink4 |
| Form Factor | OAM 2.0 (800W) | OAM (600W) | SXM4 (400W*) |
| Release Date | Mid-2023 (sampling) | 2022 | 2022 |
*Some custom deployments go up to 600W
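The chip-to-chip entries in the table above are given as link count times per-link rate, so the totals are easy to work out. The short sketch below does that arithmetic; reading “90G SERDES” as 90 Gb/s per link is an assumption, though it is consistent with the 11.25GB/s per-link figure in the table.

```python
# Assumption: "90G SERDES" means 90 Gb/s per link; 90 / 8 = 11.25 GB/s
SERDES_GBITS = 90
per_link_gbs = SERDES_GBITS / 8

# Ponte Vecchio: 64 links (4 x 16) at 11.25 GB/s each
ponte_total_gbs = 64 * per_link_gbs

# H100: 18 NVLink links at 50 GB/s each
h100_total_gbs = 18 * 50

print(f"Per Xe Link lane:  {per_link_gbs} GB/s")
print(f"Ponte Vecchio sum: {ponte_total_gbs} GB/s")
print(f"H100 sum:          {h100_total_gbs} GB/s")
```

That works out to 720GB/s for Ponte Vecchio against 900GB/s for H100, which gives a sense of how much headroom a higher I/O bandwidth on Rialto would need to close the gap.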
Overall, Intel is targeting a 30% increase in “application-level” performance with Rialto Bridge. At first glance that isn’t a huge gain, but it is also for a part that comes out about a year after the original Ponte Vecchio. The 25% increase in the number of Xe cores means most of that performance increase should be delivered by the additional hardware rather than by clock speed changes. And since Intel is giving real-world performance expectations instead of just theoretical throughput, we wouldn’t be too surprised if Rialto’s specs were a little more generous on paper. Intel is also promising that Rialto should be more efficient than Ponte, which is a reasonable claim at face value given that performance should increase faster than power consumption.
According to Intel’s roadmap, Rialto Bridge is scheduled to begin sampling in mid-2023. Given Intel’s troubles getting Ponte Vecchio out on time (you still can’t get it unless you’re Aurora), that would be a surprisingly fast turnaround for Intel. However, since the two are pipelined designs with a very strong architectural resemblance, ideally Intel won’t have nearly as many teething problems with Rialto as it did with Ponte. But as always, we’ll see what actually happens next year as Intel gets closer to shipping its next accelerator.
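A quick back-of-the-envelope split of that 30% target shows how little is left over once the extra cores are accounted for. The sketch below assumes performance scales linearly with Xe core count, which is a simplification for illustration and not something Intel has stated.

```python
PONTE_CORES = 128
RIALTO_CORES = 160
TARGET_SPEEDUP = 1.30  # Intel's "application-level" performance target

# Assumption: performance scales linearly with the number of Xe cores
core_scaling = RIALTO_CORES / PONTE_CORES

# Whatever remains must come from clocks, memory, or other per-core gains
residual_gain = TARGET_SPEEDUP / core_scaling

print(f"Gain from extra cores: {(core_scaling - 1) * 100:.0f}%")
print(f"Residual per-core gain: {(residual_gain - 1) * 100:.0f}%")
```

Under that assumption, the extra cores account for 25% and only about 4% would need to come from everything else.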
All roads lead to Falcon Shores
With the addition of Rialto Bridge to Intel’s HPC plans, the company’s current silicon roadmap is as follows:
Both the HBM-equipped Xeon and HPC accelerator lines are slated to merge into Intel’s first flexible XPU, Falcon Shores, in 2024. First announced at Intel’s winter investor meeting earlier this year, Falcon Shores will be Intel’s first product to take the use of tiles in high-performance CPUs and GPUs to its logical conclusion by allowing for a configurable number of each tile type. As a result, Falcon Shores will encompass not only mixed CPU/GPU designs, but also (relatively) pure CPU and GPU designs, making it the successor to both Intel’s HPC CPUs and HPC GPUs.
For today’s event, Intel isn’t offering any further details on Falcon Shores, so the company is still speaking only in terms of targeting a 5x increase in everything from power efficiency to compute density and memory bandwidth. How they intend to do that, aside from relying on their planned packaging and shared memory technologies, remains to be seen. However, this update does provide a better picture of where Falcon Shores will fit into Intel’s product roadmaps by showing how the current HBM Xeon and Xe HPC products will merge into it.
Ultimately, Falcon Shores remains Intel’s power play for the HPC industry. The company is betting that being able to provide a tightly integrated (yet still tiled and flexible) experience, with everyone using a single API, will give them an edge in the HPC market and put them ahead of traditional GPU-based accelerators. And if they can stick to those plans, then 2024 is shaping up to be a very interesting year in the high-performance computing industry.