Methods for evaluating the energy efficiency of large language models
DOI: https://doi.org/10.18372/2073-4751.85.21093
Keywords: Green AI, large language models, energy efficiency, LLM, benchmarking
Abstract
This paper examines methods for evaluating the energy efficiency of autoregressive large language models based on the transformer architecture, focusing on representatives of the Cogito, Phi-4, Mistral, and RNJ-1 families. Given the rapidly growing computational complexity of attention mechanisms and the associated power demands during inference, the study emphasizes experimental measurement of model power consumption on a consumer-grade NVIDIA RTX 3070 Ti GPU with CUDA acceleration. The proposed approach enables quantitative assessment of average, minimum, and maximum power draw, as well as comparative analysis of relative energy efficiency across models in typical text-generation scenarios. The results provide a baseline for further research on energy-efficient deployment of artificial intelligence systems and underscore the industrial and societal importance of reducing the energy footprint of modern LLMs. The article also reviews complementary approaches to improving LLM energy efficiency, such as query routing and dynamic power adjustment during query decoding. Combining these optimization methodologies is an important factor in the development and deployment of LLM-based systems.
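The abstract describes sampling average, minimum, and maximum GPU power draw during text generation. The paper's own measurement code is not shown here, so the following is only a minimal sketch of one common way to do this on an NVIDIA GPU: polling NVML (via the `pynvml` bindings) from a background thread while a generation workload runs. The names `measure_power`, `power_stats`, and the `workload` callable are illustrative, not from the paper.

```python
import threading
import time

try:
    import pynvml  # NVIDIA Management Library bindings (pip install nvidia-ml-py)
    NVML_AVAILABLE = True
except ImportError:
    NVML_AVAILABLE = False

def power_stats(samples):
    """Reduce a list of power readings (watts) to (average, minimum, maximum)."""
    return sum(samples) / len(samples), min(samples), max(samples)

def _sampler(handle, samples, stop_event, interval_s=0.1):
    """Append instantaneous power draw in watts until stop_event is set."""
    while not stop_event.is_set():
        # nvmlDeviceGetPowerUsage reports milliwatts
        samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)
        time.sleep(interval_s)

def measure_power(workload, gpu_index=0):
    """Run workload() while sampling GPU power; return (avg, min, max) in watts."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    samples, stop = [], threading.Event()
    thread = threading.Thread(target=_sampler, args=(handle, samples, stop))
    thread.start()
    try:
        workload()  # e.g. a lambda wrapping model.generate(...) on the RTX 3070 Ti
    finally:
        stop.set()
        thread.join()
        pynvml.nvmlShutdown()
    return power_stats(samples)
```

With such measurements in hand, a simple relative-efficiency figure for cross-model comparison can be obtained by dividing the number of generated tokens by the integrated energy (average power multiplied by wall-clock generation time), yielding tokens per joule.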
References
R. Schwartz, J. Dodge, N. A. Smith, and O. Etzioni, "Green AI," Communications of the ACM, vol. 63, no. 12, pp. 54–63, 2020.
E. Strubell, A. Ganesh, and A. McCallum, "Energy and Policy Considerations for Deep Learning in NLP," in Proc. ACL, 2019. https://aclanthology.org/P19-1355
NVIDIA Corp., “Energy Efficiency Trends in AI Inference,” NVIDIA Whitepaper, 2024. https://developer.nvidia.com
E. J. Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models," arXiv preprint arXiv:2106.09685, 2021. https://arxiv.org/abs/2106.09685
Hugging Face, “Optimum-Benchmark GitHub Repository,” 2025. https://github.com/huggingface/optimum-benchmark
Zhang et al., “Distributed Inference of Large Language Models: Challenges and Opportunities,” IEEE TPDS, 2024.
Li et al., “Adaptive Energy-Aware Scheduling for Distributed Transformer Inference,” ACM SoCC, 2024.
"ThUnderVolt: Enabling Aggressive Voltage Underscaling and Timing Error Resilience for Energy Efficient Deep Neural Network Accelerators," arXiv preprint arXiv:1802.03806, 2018. https://arxiv.org/abs/1802.03806
"FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance," 2023. https://openreview.net/forum?id=XUZ2S0JVJP
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
The scientific journal adheres to the principles of Open Access and provides free, immediate, and permanent access to all published materials without financial, technical, or legal barriers for readers.
All articles are published in Open Access under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Copyright
Authors who publish their works in the journal:
- retain the copyright to their publications;
- grant the journal the right of first publication of the article;
- agree to the distribution of their materials under the CC BY 4.0 license;
- have the right to reuse, archive, and distribute their works (including in institutional and subject repositories), provided that proper reference is made to the original publication in the journal.