An ASHRAE podcast recently delved into a critical evolution within data centres: the increasing necessity of liquid cooling. Host Justin Seter guided a panel of industry experts – David Quirk, Dustin Demetriou and Tom Davidson – through the intricacies of this technology, whose adoption is being driven by the insatiable demands of artificial intelligence (AI) and high-performance Graphics Processing Unit (GPU) applications. This is Part 4 of a nine-part series.

Motherboard elements still rely on traditional air cooling. Image by Rawpixel/Freepik.com

The research aims to establish clear criteria, not only for steady-state inlet conditions but also for transient scenarios, considering power transfers and potential fluctuations in the electrical supply. Understanding the allowable temperature variations and their impact on both software operations and the long-term reliability of the hardware is a key objective. The ultimate goal is to develop common terminology and guidelines that enable both IT hardware and infrastructure teams to ensure robust and resilient operations across various availability and redundancy configurations.
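To make the idea concrete, the sketch below shows the kind of check such criteria could eventually enable: comparing a coolant inlet-temperature trace against a steady-state limit and a bounded transient excursion. The limits, time window and function name here are hypothetical placeholders used purely for illustration; they are not ASHRAE figures.

```python
# Minimal sketch of a steady-state/transient inlet-temperature check.
# All limits below are assumed placeholder values, not ASHRAE criteria.

STEADY_MAX_C = 45.0        # assumed steady-state inlet limit (deg C)
TRANSIENT_MAX_C = 50.0     # assumed hard limit for short excursions (deg C)
TRANSIENT_WINDOW_S = 60    # assumed maximum time allowed above the steady-state limit

def check_inlet_trace(samples):
    """samples: list of (time_s, temp_c) tuples. Returns True if the trace stays within limits."""
    excursion_start = None
    for t, temp in samples:
        if temp > TRANSIENT_MAX_C:
            return False                      # hard limit exceeded
        if temp > STEADY_MAX_C:
            if excursion_start is None:
                excursion_start = t           # start of a transient excursion
            elif t - excursion_start > TRANSIENT_WINDOW_S:
                return False                  # excursion lasted too long
        else:
            excursion_start = None            # back within the steady-state band
    return True

# Example: a brief excursion during a power transfer that stays within the assumed window
trace = [(0, 40), (30, 44), (60, 47), (90, 46), (150, 43)]
print(check_inlet_trace(trace))  # True under the assumed limits
```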

Demetriou added a crucial layer to the resiliency discussion, highlighting the evolving nature of data centre deployments. He pointed out that historically, liquid cooling systems were often delivered as fully integrated solutions from a single vendor, encompassing the servers, cooling distribution units, pumps and plumbing. However, the trend towards massive, hyperscale deployments has made this single-vendor approach impractical. Instead, infrastructure is increasingly being assembled from disparate components supplied by various vendors. This disaggregation presents a significant challenge in ensuring seamless interoperability and maintaining the levels of efficiency and resiliency inherent in integrated solutions. The ongoing research and the development of best practices and guidelines are therefore crucial in enabling the industry to effectively design and commission these disaggregated liquid cooling systems, ensuring they operate reliably and efficiently and giving the various stakeholders a common language.

Building on the discussion of escalating power densities and ASHRAE’s initial guidance, the podcast segment delved deeper into the complexities of hybrid liquid-air cooling, the challenges posed by diverse stakeholders in data centre deployments, and the critical need for industry-wide standardisation.

Demetriou clarified that while the focus has been on liquid-to-chip cooling, it is crucial to understand that most current implementations are hybrid systems. Even in high-density racks, while the CPU, GPU and potentially some memory modules are liquid-cooled via cold plates, other vital server components such as power supplies, RAM and other motherboard elements still rely on traditional air cooling. This means that even a 50-kilowatt liquid-cooled rack still presents a significant air-cooling load, often estimated at around 10 kilowatts. As rack densities continue to climb to 100 kilowatts and beyond, the air-cooling component also increases proportionally, eventually reaching a point where air alone will again become a limiting factor.

Demetriou also emphasised that while AI is a primary driver for high-density liquid cooling, other applications may require liquid cooling at lower densities, highlighting the importance of designing data centre infrastructure that can accommodate diverse use cases rather than solely focusing on extreme AI workloads. Designing solely for 400-500 kilowatt racks might necessitate lower water temperatures that are inefficient for lower-density deployments.
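A rough back-of-the-envelope split illustrates the proportional air load. Assuming, purely for illustration, that about 20% of rack power is still rejected to air (in line with the roughly 10 kilowatts quoted for a 50-kilowatt rack), the residual air-cooling load scales with density as sketched below; the fraction is an assumption rather than a published figure, and real splits vary by server design.

```python
# Illustrative estimate of the residual air-cooling load in a hybrid
# liquid/air-cooled rack. The 20% air-cooled fraction is an assumption
# inferred from the figures quoted above (~10 kW of air load on a 50 kW
# rack); actual splits depend on the server design.

AIR_COOLED_FRACTION = 0.20  # assumed share of rack power still rejected to air

def hybrid_cooling_split(rack_power_kw: float,
                         air_fraction: float = AIR_COOLED_FRACTION) -> tuple[float, float]:
    """Return (liquid_load_kw, air_load_kw) for a given total rack power."""
    air_load = rack_power_kw * air_fraction
    liquid_load = rack_power_kw - air_load
    return liquid_load, air_load

if __name__ == "__main__":
    for rack_kw in (50, 100, 400, 500):
        liquid, air = hybrid_cooling_split(rack_kw)
        print(f"{rack_kw:>3} kW rack -> ~{liquid:.0f} kW to liquid, ~{air:.0f} kW to air")
```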