The rapid expansion of artificial intelligence is leading to predictions that data centers could use up to 12% of the United States’ electricity by 2028, as reported by the Lawrence Berkeley National Laboratory. Scientists are working to enhance the energy efficiency of data centers to make AI more sustainable. MIT researchers, along with the MIT-IBM Watson AI Lab, have developed a quick prediction tool for data center operators to estimate power consumption of specific AI workloads on processors or AI accelerator chips.
This new method provides reliable power estimates in seconds, unlike conventional modeling techniques that take much longer. It can be applied to a variety of hardware configurations, including new designs not yet in use. These estimates help data center operators allocate resources efficiently across multiple AI models and processors, increasing energy efficiency. Algorithm developers and model providers can also use this tool to evaluate the potential energy consumption of new models before deployment.
“The AI sustainability challenge is a pressing question we have to answer,” says Kyungmi Lee, an MIT postdoc and lead author of a paper on the technique. Lee worked with Zhiye Song, an EECS graduate student; Eun Kyung Lee and Xin Zhang, research managers at IBM Research and the MIT-IBM Watson AI Lab; Tamar Eilam, IBM Fellow and chief scientist of sustainable computing at IBM Research; and senior author Anantha P. Chandrakasan, MIT provost. The research is being presented at the IEEE International Symposium on Performance Analysis of Systems and Software.
Inside data centers, thousands of GPUs perform operations to train and deploy AI models, with power consumption varying depending on configuration and workload. Traditional prediction methods involve breaking workloads into steps and simulating module use, which can be time-consuming for large AI workloads. “As an operator, if I want to compare different algorithms or configurations to find the most energy-efficient manner to proceed, if a single emulation is going to take days, that is going to become very impractical,” Lee explains.
To expedite the process, MIT researchers used less-detailed information that could be estimated faster, recognizing that AI workloads often have repeatable patterns. Algorithm developers optimize programs to run efficiently on GPUs, and these optimizations create a regular structure that the researchers leveraged. They developed a lightweight estimation model called EnergAIzer, capturing GPU power usage patterns.
While fast, the initial estimate didn’t account for all energy costs, such as the fixed energy of program setup and the additional cost of data operations. Hardware fluctuations can also affect energy use. To improve accuracy, the researchers added correction terms to their model, derived from real GPU measurements. Users can input workload information into EnergAIzer to quickly estimate energy consumption, then adjust configurations to see how the changes affect power use.
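The overall idea, as described above, is a base estimate built from repeated workload patterns plus correction terms calibrated from real measurements. A minimal sketch of that structure might look like the following. All function names, the feature choices, and the specific correction terms here are illustrative assumptions, not EnergAIzer’s actual design:

```python
# Hypothetical sketch: estimate workload energy from repeated kernel
# patterns, then add correction terms calibrated from real GPU
# measurements. Names and numbers are illustrative assumptions only.

def estimate_energy_joules(kernel_profile, avg_power_watts, overheads):
    """Estimate total energy for an AI workload.

    kernel_profile:   list of (kernel_name, runtime_seconds, repetitions),
                      exploiting the repeatable structure of AI workloads
    avg_power_watts:  per-kernel average GPU power, taken from a small
                      set of real measurements
    overheads:        fixed correction terms (joules), e.g. program
                      setup and data-movement costs
    """
    # Base estimate: sum power x time over the repeated kernel patterns.
    compute_energy = sum(
        avg_power_watts[name] * runtime * reps
        for name, runtime, reps in kernel_profile
    )
    # Correction terms: fixed setup cost plus data-movement energy.
    return compute_energy + overheads["setup_j"] + overheads["data_movement_j"]


# Toy usage with made-up numbers: two kernels repeated 5,000 times each.
profile = [("attention", 0.002, 5000), ("mlp", 0.003, 5000)]
power = {"attention": 300.0, "mlp": 250.0}
overheads = {"setup_j": 150.0, "data_movement_j": 400.0}
print(estimate_energy_joules(profile, power, overheads))  # → 7300.0
```

Because the base estimate only needs per-pattern summaries rather than a full step-by-step simulation, an estimator of this shape can return an answer in seconds, which is the speed advantage the article describes.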
Testing EnergAIzer with real AI workload data from GPUs showed only about 8% error in power consumption estimates, matching the accuracy of traditional methods that take far longer. The tool can also predict the power use of future GPUs, provided hardware changes are gradual. The researchers aim to test EnergAIzer on new GPU configurations and scale it to workloads that span many GPUs working together. “To really make an impact on sustainability, we need a tool that can provide a fast energy estimation solution across the stack,” Lee says. This research was supported in part by the MIT-IBM Watson AI Lab.
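For readers curious how an error figure like “about 8%” is typically computed, one common metric is the mean absolute percentage error between estimated and measured power. This is a generic sketch, the article does not specify EnergAIzer’s exact error metric, and the numbers below are made up for illustration:

```python
# Generic sketch of a prediction-error metric: mean absolute percentage
# error between estimated and measured power. Values are illustrative.

def mean_abs_pct_error(estimated, measured):
    """Average of |estimate - measurement| / measurement, in percent."""
    errors = [
        abs(e - m) / m * 100.0
        for e, m in zip(estimated, measured)
    ]
    return sum(errors) / len(errors)


# Toy example: estimated vs. measured average power (watts) for
# three hypothetical workloads.
est = [310.0, 255.0, 480.0]
meas = [300.0, 270.0, 450.0]
print(round(mean_abs_pct_error(est, meas), 1))  # → 5.2
```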
Original Source: news.mit.edu
