Cross-Modal Adversarial Defense for Medical LLM Agents in Clinical Decision Support Systems
Keywords:
adversarial defense, cross-modal, large language models, clinical decision support, medical AI, robustness, fairness, governance
Abstract
The integration of large language model (LLM) agents into clinical decision support systems (CDSS) is a significant advance in healthcare informatics, yet it exposes these systems to cross-modal adversarial attacks. Such attacks exploit the interplay between textual, visual, and structured clinical data to degrade model performance, mislead diagnostic reasoning, and potentially harm patient outcomes. This paper develops a framework for cross-modal adversarial defense tailored to medical LLM agents operating in critical care environments. We examine the architectural foundations of multimodal LLM agents, the threat surfaces that arise from heterogeneous input channels, and the structural trade-offs inherent in designing robust defense mechanisms. We adopt a system-level perspective to evaluate governance constraints, deployment infrastructure, the sustainability of defenses under distributional shift, and fairness implications for diverse patient populations. Drawing on adversarial machine learning theory, clinical safety standards, and socio-technical infrastructure design, we propose a multi-layered defense strategy that integrates input sanitization, cross-modal consistency verification, robust training procedures, and real-time anomaly detection. The analysis indicates that no single defense suffices; a coordinated ecosystem of technical safeguards, policy frameworks, and institutional oversight is necessary. We further discuss how these defenses scale across healthcare settings, from tertiary hospitals to resource-constrained clinics, and consider the ethical and regulatory dimensions of deploying adversarial defenses that may inadvertently introduce biases. This work contributes to the emerging field of safe and trustworthy medical AI by providing a systematic roadmap for defending multimodal LLM agents in high-stakes clinical decision support.
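As a rough illustration of how three of the layers named in the abstract (input sanitization, cross-modal consistency verification, and anomaly detection) might be chained, the following minimal Python sketch screens a single clinical input and returns a list of flags. It is not the paper's implementation. The `ClinicalInput` record, the function names, the `VITAL_RANGES` table, the finding vocabulary, and the 0.5 threshold are all assumptions made for this example. Robust training is omitted because it is a training-time rather than an inference-time measure.

```python
from dataclasses import dataclass, field

# Hypothetical record type; field names are illustrative, not taken from the paper.
@dataclass
class ClinicalInput:
    note_text: str
    image_findings: set[str]  # labels assumed to come from an upstream image model
    vitals: dict[str, float] = field(default_factory=dict)

# Assumed plausibility ranges, for illustration only (not clinical guidance).
VITAL_RANGES = {"heart_rate": (20.0, 250.0), "spo2": (50.0, 100.0)}

def sanitize_text(text: str) -> str:
    """Layer 1: drop non-printable characters (e.g. zero-width spaces)."""
    return "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")

def consistency_score(inp: ClinicalInput, vocabulary: set[str]) -> float:
    """Layer 2: Jaccard overlap between findings named in the note and
    findings reported for the image. Plain substring matching is used,
    so negation ("no pneumonia") is NOT handled in this sketch."""
    text = inp.note_text.lower()
    text_findings = {term for term in vocabulary if term in text}
    union = text_findings | inp.image_findings
    if not union:
        return 1.0  # neither modality asserts a finding, so nothing conflicts
    return len(text_findings & inp.image_findings) / len(union)

def vitals_anomalies(inp: ClinicalInput) -> list[str]:
    """Layer 3: names of structured values outside their assumed range."""
    out = []
    for name, value in inp.vitals.items():
        if name in VITAL_RANGES:
            low, high = VITAL_RANGES[name]
            if not (low <= value <= high):
                out.append(name)
    return sorted(out)

def screen(inp: ClinicalInput, vocabulary: set[str],
           min_consistency: float = 0.5) -> list[str]:
    """Run all layers and collect flags; an empty list means nothing was flagged."""
    flags = []
    cleaned = sanitize_text(inp.note_text)
    if cleaned != inp.note_text:
        flags.append("text_sanitized")
    checked = ClinicalInput(cleaned, inp.image_findings, inp.vitals)
    if consistency_score(checked, vocabulary) < min_consistency:
        flags.append("cross_modal_mismatch")
    flags.extend(f"vital_out_of_range:{name}" for name in vitals_anomalies(checked))
    return flags
```

Each layer contributes its own flag and none short-circuits the pipeline, which mirrors the abstract's point that no single defense suffices. In a real deployment, each flag would presumably be routed to human review, not acted on automatically.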
Copyright (c) 2026 Computational Intelligence Systems

This article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



