Methods for evaluating the energy efficiency of large language models
DOI: https://doi.org/10.18372/2073-4751.85.21093
Keywords: Green AI, large language models, energy efficiency, LLM, benchmarking
Abstract
This paper examines methods for evaluating the energy efficiency of autoregressive large language models based on the transformer architecture, focusing on representatives of the Cogito, Phi-4, Mistral, and RNJ-1 families. Given the rapidly growing computational complexity of attention mechanisms and the associated power demands during inference, the study emphasizes experimental measurement of model power consumption on a consumer-grade NVIDIA RTX 3070 Ti GPU with CUDA acceleration. The proposed approach enables quantitative assessment of average, minimum, and maximum power draw, as well as comparative analysis of relative energy efficiency across models in typical text-generation scenarios. The results provide a baseline for further research on energy-efficient deployment of artificial intelligence systems and underscore the industrial and societal importance of reducing the energy footprint of modern LLMs. The article also reviews complementary approaches to improving LLM energy efficiency, such as query routing and dynamic power adjustment during query decoding. Combining these optimization methodologies is an important factor in the development and deployment of LLM-based systems.
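The abstract describes sampling average, minimum, and maximum GPU power draw during text generation. The paper's own measurement code is not shown here, so the following is only a minimal sketch of one common way to do this on an NVIDIA GPU: polling NVML (via the `pynvml` bindings) from a background thread while a generation workload runs. The names `measure_power`, `power_stats`, and the `workload` callable are illustrative, not from the paper.

```python
import threading
import time

try:
    import pynvml  # NVIDIA Management Library bindings (pip install nvidia-ml-py)
    NVML_AVAILABLE = True
except ImportError:
    NVML_AVAILABLE = False

def power_stats(samples):
    """Reduce a list of power readings (watts) to (average, minimum, maximum)."""
    return sum(samples) / len(samples), min(samples), max(samples)

def _sampler(handle, samples, stop_event, interval_s=0.1):
    """Append instantaneous power draw in watts until stop_event is set."""
    while not stop_event.is_set():
        # nvmlDeviceGetPowerUsage reports milliwatts
        samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)
        time.sleep(interval_s)

def measure_power(workload, gpu_index=0):
    """Run workload() while sampling GPU power; return (avg, min, max) in watts."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
    samples, stop = [], threading.Event()
    thread = threading.Thread(target=_sampler, args=(handle, samples, stop))
    thread.start()
    try:
        workload()  # e.g. a lambda wrapping model.generate(...) on the RTX 3070 Ti
    finally:
        stop.set()
        thread.join()
        pynvml.nvmlShutdown()
    return power_stats(samples)
```

With such measurements in hand, a simple relative-efficiency figure for cross-model comparison can be obtained by dividing the number of generated tokens by the integrated energy (average power multiplied by wall-clock generation time), yielding tokens per joule.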
References
R. Schwartz, J. Dodge, N. A. Smith, and O. Etzioni, "Green AI," Communications of the ACM, vol. 63, no. 12, pp. 54–63, 2020.
E. Strubell, A. Ganesh, and A. McCallum, "Energy and Policy Considerations for Deep Learning in NLP," in Proc. ACL, 2019. https://aclanthology.org/P19-1355
NVIDIA Corp., “Energy Efficiency Trends in AI Inference,” NVIDIA Whitepaper, 2024. https://developer.nvidia.com
E. J. Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models," arXiv preprint arXiv:2106.09685, 2021. https://arxiv.org/abs/2106.09685
Hugging Face, “Optimum-Benchmark GitHub Repository,” 2025. https://github.com/huggingface/optimum-benchmark
Zhang et al., “Distributed Inference of Large Language Models: Challenges and Opportunities,” IEEE TPDS, 2024.
Li et al., “Adaptive Energy-Aware Scheduling for Distributed Transformer Inference,” ACM SoCC, 2024.
"ThUnderVolt: Enabling Aggressive Voltage Underscaling and Timing Error Resilience for Energy Efficient Deep Neural Network Accelerators," arXiv preprint arXiv:1802.03806, 2018. https://arxiv.org/abs/1802.03806
"FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance," 2023. https://openreview.net/forum?id=XUZ2S0JVJP
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
The scientific journal adheres to the principles of Open Access and provides free, immediate, and permanent access to all published materials without financial, technical, or legal barriers for readers.
All articles are published in Open Access under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Copyright
Authors who publish their works in the journal:
- retain the copyright to their publications;
- grant the journal the right of first publication of the article;
- agree to the distribution of their materials under the CC BY 4.0 license;
- have the right to reuse, archive, and distribute their works (including in institutional and subject repositories), provided that proper reference is made to the original publication in the journal.