The necessary modularity of data centers to meet the challenge of AI


The sudden democratization of large-scale artificial intelligence over the past year, particularly of new generative AI applications such as ChatGPT, has imposed a number of new technical requirements on the data centers where these applications are hosted. The infrastructure that supports them will consume more energy, process more data and use more bandwidth than before, all in facilities that may have been built more than 20 years ago. These facilities must now adapt to support new orders of magnitude, particularly in power density per rack. The only way to achieve this is to adopt a modular design.

Data centers can seem like very static entities: huge buildings with rows of generators and a multitude of equipment, all carefully designed to keep the facility running uninterrupted, whether under normal operating conditions or in the event of a total power grid failure. However, modern data centers are anything but static; many facilities are designed from the start to be highly modular: a floor can evolve to accommodate a change in network topology, airflow or physical redundancy several times a year if necessary. Why is this, and how can it be achieved?

The widespread emergence of AI-related deployments in data centers demonstrates how quickly customer needs can evolve. While in 2022 a data center operator could forecast an average power consumption of 10 kilowatts per rack for customer equipment, demand for 25, 50 or even 100 kilowatt racks is becoming increasingly common, and continues to grow. With a traditional static design, this can create many issues in terms of performance, maintenance and redundancy.
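To make the scale of that shift concrete, here is a minimal sketch with hypothetical numbers (the 200 kW row budget is an assumption, not a figure from the article): a fixed power budget sized in the 10 kW-per-rack era supports far fewer racks as density climbs.

```python
# Illustrative sketch with made-up numbers: how a fixed per-row power
# budget, provisioned for ~10 kW racks, shrinks as rack density climbs.

ROW_BUDGET_KW = 200.0  # hypothetical power budget provisioned for one row


def racks_supported(row_budget_kw: float, rack_kw: float) -> int:
    """Number of racks of a given density a fixed row budget can feed."""
    return int(row_budget_kw // rack_kw)


for density_kw in (10, 25, 50, 100):
    count = racks_supported(ROW_BUDGET_KW, density_kw)
    print(f"{density_kw:>3} kW racks -> {count} racks per row")
```

A row that once held 20 customer racks holds only 2 at 100 kW each; a static design leaves the rest of the floor space stranded, which is exactly the problem the rest of the article addresses.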

More bandwidth, more energy, but static designs

First, these dense racks often require more network bandwidth to operate at their highest level of efficiency. This point is too often overlooked, and it can leave a customer dissatisfied when they find they cannot deploy at high rack density without adequate bandwidth to match.

Second, an uneven increase in power consumption across a data center floor can often strain a cooling system that was not designed to handle these types of hot spots. Dense racking at one end of a row can easily cause temperatures to rise at the other end.

Finally, resilience and redundancy measures are calculated based on the location and distribution of specific electrical loads within the facility. If a very dense group of equipment is added to an area, a static design may no longer be able to guarantee the required generator capacity.

Understanding the actual cooling state of an installation

Each of these concerns can have significant consequences, ranging from the inability to run AI equipment at its maximum performance potential to unwanted downtime in the event of an outage or surge on the local electricity grid. With a highly adaptable, modular design framework, these issues can be solved in any data center, regardless of its age.

For example, spaces can be repurposed, or reserved early in the facility's design, for use as additional network rooms, allowing more circuits, switches and routers to be installed to increase network bandwidth to the customer over time. At the same time, a modular method of designing and deploying overhead cable trays allows the data center operator to physically bring this connectivity to the customer, something often overlooked in static designs. Some AI-related technologies, such as InfiniBand, can use heavy, bulky cabling that must be installed in a modular fashion to avoid real performance and operational issues down the line.

Understanding the true cooling status of a facility, through the use of CFD (Computational Fluid Dynamics), allows the data center operator to identify trapped airflow and unintended airflow patterns that result in suboptimal cooling, as well as areas with spare air-cooling capacity that can be used to cool dense, and particularly hot, AI deployments.

Why a modular power configuration

Many data centers can also be modular enough to move from an air-cooling-only configuration to a hybrid configuration where both air and liquid cooling (AALC and DLC) are available as needed, allowing AI deployments to take place within an existing data center or in a larger dedicated space.

With a modular power configuration – where the data center is conceptualized as a series of blocks, each with its own power, redundancy and cooling infrastructure – core components can be sized and deployed appropriately for each customer deployment, ensuring that as deployments are added to a space, they can be supported even if they differ significantly in power consumption.

These are just a few examples of how a modular approach to data center design helps ensure that AI deployments, even at very high rack densities, can be supported in a high-performance, robust and cost-effective manner within an existing data center.

Modular designs will mean the difference between the ability to support current and future generations of AI deployments in existing sites and the need to build new ones.


