Methods for evaluating the energy efficiency of large language models

Authors

O. Dorosh, M. Huzii
DOI:

https://doi.org/10.18372/2073-4751.85.21093

Keywords:

Green AI, large language models, energy efficiency, LLM, benchmarking

Abstract

This paper examines methods for evaluating the energy efficiency of autoregressive large language models based on the transformer architecture, focusing on representatives of the Cogito, Phi-4, Mistral, and RNJ-1 families. Given the rapidly increasing computational complexity of attention mechanisms and the associated power demands during inference, the study emphasizes experimental measurement of model power consumption on a consumer-grade NVIDIA RTX 3070 Ti GPU using CUDA acceleration. The proposed approach enables quantitative assessment of average, minimum, and maximum power draw, as well as comparative analysis of relative energy efficiency across models in typical text-generation scenarios. The obtained results provide a baseline for further research on energy-efficient deployment of artificial intelligence systems and highlight the industrial and societal importance of reducing the energy footprint of modern LLMs. In addition, the article presents a number of complementary approaches to improving the energy efficiency of LLMs, such as query routing and dynamic power adjustment during query decoding. The combined use of these optimization methodologies is an important factor in the development and deployment of LLM-based neural network systems.
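As a minimal illustration of this kind of measurement (a sketch, not the authors' benchmark code), the following Python snippet polls GPU board power via NVML while a Hugging Face causal language model generates text, then reports the average, minimum, and maximum power draw together with an approximate energy-per-token figure. The model identifier, prompt, and 100 ms sampling interval are illustrative assumptions; the sketch requires torch, transformers, and the nvidia-ml-py (pynvml) package on a CUDA-capable GPU such as the RTX 3070 Ti.

# Minimal sketch of GPU power sampling during LLM text generation.
# Model name, prompt, and sampling interval are illustrative assumptions.
import threading
import time

import pynvml
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "microsoft/phi-4"   # assumption: any locally available causal LM
PROMPT = "Explain the energy footprint of transformer inference."
SAMPLE_INTERVAL_S = 0.1          # poll instantaneous power every 100 ms


def sample_power(handle, samples, stop_event):
    # Record board power (converted from milliwatts to watts) until generation ends.
    while True:
        samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)
        if stop_event.wait(SAMPLE_INTERVAL_S):
            break


def main():
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME, torch_dtype=torch.float16, device_map="cuda"
    )
    inputs = tokenizer(PROMPT, return_tensors="pt").to("cuda")

    samples, stop_event = [], threading.Event()
    sampler = threading.Thread(target=sample_power, args=(handle, samples, stop_event))

    start = time.time()
    sampler.start()
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    stop_event.set()
    sampler.join()
    elapsed = time.time() - start

    new_tokens = output.shape[1] - inputs["input_ids"].shape[1]
    avg_w, min_w, max_w = sum(samples) / len(samples), min(samples), max(samples)
    energy_j = avg_w * elapsed  # approximate energy in joules over the generation run
    print(f"avg {avg_w:.1f} W, min {min_w:.1f} W, max {max_w:.1f} W")
    print(f"{new_tokens} tokens in {elapsed:.1f} s, "
          f"~{energy_j / max(new_tokens, 1):.2f} J/token")

    pynvml.nvmlShutdown()


if __name__ == "__main__":
    main()

In practice, energy per generated token (joules per token), or its inverse (tokens per joule), gives a convenient basis for the kind of relative efficiency comparison across models described above.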

References

R. Schwartz, J. Dodge, N. A. Smith, O. Etzioni, "Green AI," Communications of the ACM, vol. 63, no. 12, pp. 54–63, 2020.

E. Strubell, A. Ganesh, A. McCallum, "Energy and Policy Considerations for Deep Learning in NLP," Proceedings of ACL 2019. https://aclanthology.org/P19-1355

NVIDIA Corp., "Energy Efficiency Trends in AI Inference," NVIDIA Whitepaper, 2024. https://developer.nvidia.com

Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models," arXiv preprint, 2021. https://arxiv.org/abs/2106.09685

Hugging Face, "Optimum-Benchmark GitHub Repository," 2025. https://github.com/huggingface/optimum-benchmark

Zhang et al., "Distributed Inference of Large Language Models: Challenges and Opportunities," IEEE TPDS, 2024.

Li et al., "Adaptive Energy-Aware Scheduling for Distributed Transformer Inference," ACM SoCC, 2024.

"ThUnderVolt: Enabling Aggressive Voltage Underscaling and Timing Error Resilience for Energy Efficient Deep Neural Network Accelerators," arXiv preprint, 2018. https://arxiv.org/abs/1802.03806

"FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance," OpenReview. https://openreview.net/forum?id=XUZ2S0JVJP

Published

2026-04-28

How to Cite

Dorosh, O., & Huzii, M. (2026). Methods for evaluating the energy efficiency of large language models. Problems of Informatization and Control, 1(85). https://doi.org/10.18372/2073-4751.85.21093

Issue

Section

Articles