About our Data Centers Dataset
US Data Centers Power Capacity Estimation Model
INTRODUCTION
Data centers are critical infrastructure that house powerful computing resources used for storing, processing, and managing vast amounts of data. The power capacity of a data center is the total amount of electrical power it can draw to operate its IT equipment and supporting infrastructure, such as cooling systems and power conditioning units. This capacity is a crucial factor in determining a data center's ability to handle workloads and maintain uptime.
The US Data Centers Power Capacity Estimation Model is designed to predict the power capacity of data centers across the United States. It draws on a dataset with detailed information on facility size, IT infrastructure, and other facility attributes. Using an ensemble modeling technique, we combine multiple predictive algorithms to improve the accuracy and reliability of our estimates. The model gives data center operators, energy planners, and policymakers the insights they need to optimize energy usage and plan for future capacity needs.
ABOUT DATA CENTER POWER CAPACITY
The power consumption of data centers is influenced by many factors, with artificial intelligence (AI) being a major contributor to recent increases. According to Meta's latest sustainability report, its data centers consumed 14,975,435 MWh in 2023. For instance, the data center located in Prineville, OR consumed 1,243,306 MWh, and by applying a simple formula we can estimate the power capacity of the facility.
Power Capacity (MW) = Energy Consumption (MWh) / (Time (hours) × Percentage Utilization)
Assuming the data center operates continuously over the year (8,760 hours) at a utilization rate between 70% and 90%, its average draw of about 142 MW implies a power capacity between roughly 158 MW (at 90% utilization) and 203 MW (at 70% utilization).
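The calculation above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the estimation model itself: the function name estimate_capacity_mw and its parameters are illustrative, and the only inputs taken from the text are the Prineville energy figure, the 8,760-hour year, and the 70% to 90% utilization range.

```python
HOURS_PER_YEAR = 8_760  # continuous operation over a non-leap year


def estimate_capacity_mw(energy_mwh, utilization, hours=HOURS_PER_YEAR):
    """Power capacity (MW) = energy (MWh) / (hours x utilization)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be a fraction in (0, 1]")
    return energy_mwh / (hours * utilization)


prineville_mwh = 1_243_306  # annual consumption quoted in the text

# Higher utilization implies a smaller installed capacity, and vice versa.
low_mw = estimate_capacity_mw(prineville_mwh, 0.90)
high_mw = estimate_capacity_mw(prineville_mwh, 0.70)

print(f"Estimated capacity: {low_mw:.1f} MW to {high_mw:.1f} MW")
```

The lower bound comes from the higher utilization rate, because a facility that runs closer to its limit needs less headroom to consume the same energy.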
Furthermore, Newmark projects that data center power usage will reach 35 GW by 2030, roughly double the 17 GW recorded in 2022. This growth highlights the need for precise power capacity estimates to support advanced technologies. Many existing facilities struggle to adapt to new energy and cooling demands, which has led to a surge in new construction aimed at meeting these higher performance standards and directly influences how companies estimate power requirements to stay competitive.
OUR POWER CAPACITY ESTIMATION METHODOLOGY
This section outlines the methodology employed to estimate the power capacity of data centers across multiple providers and locations, considering various project stages, including announced, active, and under construction. The dataset comprises over 1,000 data center buildings, and the estimation process involved several key steps:
Data Compilation
We compiled a comprehensive dataset that includes the key variables affecting power capacity, such as facility size, power consumption, provider details, geographic proximity between data centers, and operational status. This dataset serves as the foundation for our analysis.
Outlier Detection
Because accurate estimates depend on representative data, the first phase of the modeling process focused on identifying potential outliers. Here, an outlier is a data point that deviates from the expected linear relationship between facility size and power capacity. These anomalies may arise from advancements in cooling technologies or server efficiencies, leading to facilities consuming less power relative to their size. Identifying them up front makes the modeling process more robust.
Most of the potential outliers we identified are data centers that are under construction, have been announced, or are among the latest active facilities. These buildings deviate from the expected linear correlation between facility size and power capacity, likely because of the innovations their operators are implementing to improve efficiency.
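One common way to flag points that depart from a linear size-to-capacity relationship is to fit a straight line and look for unusually large residuals. The sketch below illustrates that idea only; the text does not specify which outlier test was used, so the robust z-score based on the median absolute deviation, the threshold of 3.0, and the sample data are all assumptions made for illustration.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def flag_outliers(xs, ys, threshold=3.0):
    """Flag points whose residual is far from the median residual.

    Uses a robust z-score (median absolute deviation scaled by 1.4826),
    so a single extreme point does not hide itself by inflating the spread.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    med = median(residuals)
    mad = median([abs(r - med) for r in residuals])
    scale = 1.4826 * mad
    if scale == 0:  # perfectly linear data: nothing to flag
        return [False] * len(residuals)
    return [abs(r - med) / scale > threshold for r in residuals]


# Hypothetical facilities: size in thousand sq ft, capacity in MW.
sizes = [100, 150, 200, 250, 300, 350, 400, 450]
capacities = [10, 15, 20, 25, 30, 35, 90, 45]

flags = flag_outliers(sizes, capacities)
outlier_sizes = [s for s, f in zip(sizes, flags) if f]
print(outlier_sizes)
```

In this made-up sample only the 400 thousand sq ft facility is flagged, since its capacity sits well above the line followed by the other buildings.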
Model Evaluation
Following the outlier detection, we trained a series of regression models and compared their performance in predicting power capacity. The primary metrics were the coefficient of determination (R²) and the Mean Squared Error (MSE). R² quantifies the proportion of variance in the dependent variable (power capacity) that can be explained by the independent variables (facility size and other factors). The MSE measures how much the predictions deviate from the actual values, indicating the model's accuracy.
An ensemble learning model achieved the highest performance, with an R² value of 0.97 and a root mean squared error (RMSE, the square root of the MSE) of 12.72, and was selected for further analysis.
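To make the metrics concrete, the following sketch computes R², MSE, and RMSE from their standard definitions. The function name regression_metrics and the sample values are illustrative assumptions and are unrelated to the model results reported above.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Return R², MSE, and RMSE for paired actual/predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    # Residual sum of squares: variation the model fails to explain.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variation around the mean of the actual values.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    mse = ss_res / n
    return {"r2": 1 - ss_res / ss_tot, "mse": mse, "rmse": sqrt(mse)}


# Hypothetical power capacities in MW.
actual_mw = [20, 40, 60, 80]
predicted_mw = [22, 38, 63, 79]

metrics = regression_metrics(actual_mw, predicted_mw)
print({name: round(value, 3) for name, value in metrics.items()})
```

Because the RMSE is expressed in the same units as the target variable, it is often easier to interpret than the MSE when comparing candidate models.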
Prediction Generation
Using the selected model, we generated predictions for the power capacity of the remaining data centers. To make these predictions easier to interpret, we also computed an 80% prediction interval for each estimate. This interval provides a range within which we expect the true power capacity of each data center to lie, accounting for the inherent uncertainty in our estimates.
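The text does not say how the 80% interval was constructed. One simple option, shown here purely as an assumption, is to take the 10th and 90th percentiles of the model's errors on held-out data and add them to each point prediction. The function name, the residual values, and the 50 MW point estimate below are hypothetical.

```python
from statistics import quantiles


def prediction_interval_80(point_estimate, holdout_residuals):
    """Empirical 80% interval from held-out residuals (actual - predicted).

    The 10th and 90th percentiles of the residuals bound the middle 80%
    of past errors; shifting them by the point estimate gives the interval.
    """
    deciles = quantiles(holdout_residuals, n=10)  # 9 cut points
    q10, q90 = deciles[0], deciles[-1]
    lower = max(0.0, point_estimate + q10)  # capacity cannot be negative
    upper = point_estimate + q90
    return lower, upper


# Hypothetical held-out errors in MW and a hypothetical 50 MW prediction.
residuals_mw = [-15, -9, -6, -3, -1, 0, 2, 4, 7, 10, 14]
lower_mw, upper_mw = prediction_interval_80(50.0, residuals_mw)
print(f"80% interval: {lower_mw:.1f} MW to {upper_mw:.1f} MW")
```

This approach assumes that errors on new facilities resemble errors on the held-out facilities; if larger sites show larger errors, the residuals would need to be scaled or grouped by size first.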
INSIGHTS
The following chart illustrates the relationship between the data center size and the power capacity, including our new estimations:
In data centers that are under construction or in the announcement phase, the lines are steeper, suggesting higher power requirements as the facility size increases. This aligns with expectations as newer data centers may have greater demands for power infrastructure due to more extensive and modern equipment being installed. For those in the active phase, the slope is more moderate, reflecting potentially more optimized and stable power usage patterns.
This differentiation makes sense when considering the evolving nature of data centers. Recent advancements in technology, particularly in artificial intelligence (AI) chips and the more efficient distribution of IT workloads, have contributed to an overall trend of achieving more computational power without the need for dramatically increasing physical space. These innovations, combined with power management improvements, are creating more power-efficient data centers, allowing for greater productivity and performance in less space, which is reflected in the tilting trend lines.
ABOUT OUR DATASET
Aterio's "US Data Centers Dataset" provides a comprehensive inventory of data centers across the United States, including those currently operational, under development, and planned for future construction.