Modern methods of protecting artificial intelligence models from adversarial attacks
DOI:
https://doi.org/10.18372/2225-5036.31.21160
Keywords:
artificial intelligence, cybersecurity, artificial intelligence protection, large language models, adversarial attacks
Abstract
This article provides a comprehensive overview of the current state of artificial intelligence (AI) model security, systematizing vectors of targeted attacks and corresponding defense methods. The evolution of the threat landscape is analyzed, from classic attacks on machine learning (ML) models to specific vulnerabilities inherent in modern large language models (LLMs). The introduction outlines the problem's relevance in the context of AI's deep integration into critical infrastructure and business processes, emphasizing the shift from reactive vulnerability patching to proactive risk management, as reflected in industry standards like the NIST AI Risk Management Framework. The main body of the research begins with a detailed classification of modality-agnostic attacks, including adversarial examples, data poisoning, backdoors, model stealing, membership inference attacks, and model inversion. It then analyzes the most critical threats to modern systems based on criteria of prevalence, potential damage, and detection difficulty, focusing on supply chain attacks and data leakage. General-purpose defense methods are systematized according to the model lifecycle stages: at the data level (sanitization, differential privacy), during training (adversarial training, robust optimization), and at the inference stage (monitoring, security policies). A separate section is dedicated to the paradigm shift caused by LLMs. LLM-specific threats are examined in detail: prompt injection, including direct (jailbreaks) and indirect attacks in RAG systems, insecure output handling, and risks associated with fine-tuning and the use of external tools. Accordingly, multi-layered defense strategies for LLMs are analyzed, such as prompt hardening, the implementation of guardrails, red-teaming, and secure design of RAG systems and tools. An analytical synthesis summarizes the strengths and weaknesses of the reviewed approaches, assesses the risks of false positives and false negatives (FP/FN), and discusses their application conditions. The conclusion summarizes key findings and identifies promising directions for future research, including the development of formal security guarantees for LLMs, the standardization of benchmarks, and the advancement of comprehensive monitoring systems.
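To make the first group of threats concrete: the adversarial examples discussed in the abstract are inputs perturbed just enough to flip a model's prediction while remaining nearly indistinguishable from the original. The sketch below is a minimal, purely illustrative implementation of the Fast Gradient Sign Method (FGSM), one classic way to craft such examples; it is not taken from the article itself, and the model, input range, and `epsilon` budget are assumptions. The adversarial training defense mentioned alongside it simply reuses perturbed inputs like these as additional training data.

```python
import torch
import torch.nn as nn

def fgsm_attack(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
                epsilon: float = 0.03) -> torch.Tensor:
    """Craft an adversarial example with the Fast Gradient Sign Method.

    Each input feature is shifted by +/- epsilon in the direction that
    increases the model's loss, which is often enough to change the
    prediction while keeping the perturbation imperceptible.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step along the sign of the loss gradient, then clamp back to the
    # assumed valid input range [0, 1].
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```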
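On the LLM side, prompt injection (both direct jailbreaks and indirect injection via documents retrieved in RAG pipelines) is typically mitigated with the kind of guardrails the abstract mentions. The following is a deliberately simplified, hypothetical input-side guardrail: the deny-list patterns, function names, and prompt layout are illustrative assumptions rather than the article's method, and a production system would layer trained classifiers and output-side checks on top of such filters.

```python
import re

# Toy deny-list of phrasings that often appear in prompt-injection
# attempts; purely illustrative, not an exhaustive or robust filter.
INJECTION_PATTERNS = [
    r"ignore (all|any|previous) .*instructions",
    r"you are now",
    r"system prompt",
]

def is_suspicious(text: str) -> bool:
    """Flag text (user input or a retrieved document) that matches
    known injection phrasing."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)

def build_prompt(system: str, user: str, retrieved_docs: list[str]) -> str:
    # Indirect prompt injection arrives through retrieved content, so
    # every document is screened before it enters the prompt context.
    safe_docs = [d for d in retrieved_docs if not is_suspicious(d)]
    context = "\n---\n".join(safe_docs)
    return f"{system}\n\nContext:\n{context}\n\nUser: {user}"
```

Screening retrieved documents with the same filter as user input reflects the secure RAG design the abstract refers to: indirect injections enter through content the system itself fetches, not only through the user's message.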
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
The scientific journal "Ukrainian Scientific Journal of Information Security" adheres to the principles of open science and provides free, open, and permanent access to all published materials. The goal of this policy is to increase the visibility, citation, and impact of research results in the field of information security. The journal operates on Open Access principles and does not charge a fee for access to published articles.
All articles are published in Open Access under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Copyright
Authors who publish their works in the journal “Ukrainian Scientific Journal of Information Security”:
- retain the copyright to their publications;
- grant the journal the right of first publication of the article;
- agree to the distribution of their materials under the CC BY 4.0 license;
- have the right to reuse, archive, and distribute their works (including in institutional and subject repositories), provided that proper reference is made to the original publication in the journal.