Among NVIDIA’s announcements tonight at Computex 2022, the company revealed that it is preparing to launch liquid-cooled versions of its high-end PCIe accelerator cards. As an alternative to the traditional air-cooled dual-slot cards, the liquid-cooled cards are offered in a more compact single-slot form factor for improved cooling and higher density. The liquid-cooled A100 will be available in Q3, and a liquid-cooled H100 will be available early next year.
While liquid cooling in the data center is far from new, it has typically been reserved for more custom hardware with extreme cooling and/or density requirements, such as the upcoming generation of high-end NVIDIA H100 (SXM) servers. PCIe servers, on the other hand, are all about standardization and compatibility, which for server graphics cards/accelerators means dual-slot cards designed for forced-air cooling in a server chassis. This serves the market segment well, but the 300-350 watt TDPs of these cards mean they can’t go any thinner and still be effectively air-cooled, which in turn creates a 4-card limit for standard rackmount systems.
But times are changing, and liquid cooling is being implemented in larger-capacity data centers both to keep up with cooling increasingly hot hardware and to improve overall data center energy efficiency. To this end, NVIDIA will release liquid-cooled versions of its A100 and H100 PCIe cards to give data center customers an easy and officially supported way to install liquid-cooled PCIe accelerators in their facilities.
The cards (pictured above) are essentially reference A100/H100 cards, with the traditional dual-slot heatsink replaced by a full-coverage, single-slot water block. Designed for integration by server vendors, they use an open-loop design meant to serve as part of a larger liquid cooling setup.
Apart from the change in cooling, however, the specifications of the cards remain unchanged. NVIDIA isn’t increasing the TDPs or clock speeds of these cards, so their performance should be identical to the traditional air-cooled cards (as long as those aren’t thermally throttled, of course). Put another way, these new cards use liquid cooling to improve power efficiency and density rather than performance.
The first card to hit the market will be the liquid-cooled version of the 80GB A100 PCIe accelerator. That will be available to customers in the third quarter of this year. Meanwhile, a liquid-cooled version of the H100 PCIe is also in development, and NVIDIA expects that to be available in early 2023.
NVIDIA has also been working with Equinix to qualify the liquid-cooled A100 in their data centers and to get a picture of the real-world power savings of the new hardware. Interestingly, NVIDIA reports a significant reduction in overall data center power consumption from moving to liquid cooling: for a 2,000-server (4,000 A100 card) setup, overall power consumption dropped by 28%. According to NVIDIA, this comes from a combination of savings across the data center, including everything from improved GPU power efficiency at lower temperatures to the lower energy cost of cooling water compared to running large air chillers. All of this underscores why NVIDIA is promoting liquid-cooled hardware as an energy efficiency gain for data center operators looking to reduce power consumption.
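To put the 28% figure in concrete terms, the arithmetic can be sketched in a few lines of Python. NVIDIA has not published an absolute baseline for the case study, so the `ASSUMED_BASELINE_KW` value below is purely hypothetical and the helper name is ours; only the server count, card count, and percentage come from NVIDIA’s reported figures.

```python
def power_after_reduction(baseline_kw: float, reduction_pct: float) -> float:
    """Return the power draw left after applying a percentage reduction."""
    return baseline_kw * (1 - reduction_pct / 100)


# Figures as reported for the case study.
SERVERS = 2000
CARDS = 4000
REDUCTION_PCT = 28.0

# Hypothetical facility baseline, chosen only to make the math easy to follow.
# This is NOT a number from NVIDIA or Equinix.
ASSUMED_BASELINE_KW = 1000.0

cards_per_server = CARDS // SERVERS
after_kw = power_after_reduction(ASSUMED_BASELINE_KW, REDUCTION_PCT)
saved_kw = ASSUMED_BASELINE_KW - after_kw

print(f"Cards per server: {cards_per_server}")
print(f"Power after switch: {after_kw:.0f} kW (saves {saved_kw:.0f} kW)")
```

Under that assumed 1,000 kW baseline, the same deployment would draw roughly 720 kW after the switch. The real savings scale with whatever the actual facility baseline is.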
And while this first generation of liquid-cooled hardware is designed for efficiency, NVIDIA says that won’t always be the case. For future generations of cards, the company will also consider liquid cooling to improve performance at current power levels – presumably by reinvesting data center-scale gains into higher TDPs for the cards.
While the bulk of NVIDIA’s announcement today (as well as the case study) focuses on PCIe cards, NVIDIA also revealed that it has been working on official liquid-cooled designs for its HGX systems, which are used to house the company’s more powerful SXM cards. A liquid-cooled HGX A100 is already shipping, and a liquid-cooled HGX H100 is due out in Q4.