Cross-Modal Federated Learning Security: Extending Prototype Consistency for Backdoor-Resilient Multimodal Representation Learning

Authors

  • Henri C. Berry, Department of Computer Science, George Mason University, Fairfax, VA, USA.
  • Davide Allen, Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, USA.
  • Sunil Jha, Department of Computer Science, University of North Texas, Denton, TX, USA.

Keywords

cross-modal federated learning; backdoor defense; prototype consistency; multimodal representation learning; adversarial robustness; distributed learning security; governance; fairness

Abstract

Multimodal federated learning enables collaborative model training across heterogeneous data sources that span images, text, audio, and sensor streams. However, integrating multiple modalities introduces new attack surfaces, particularly backdoor poisoning, in which an adversary embeds hidden triggers in one or more modalities to cause targeted misclassification at inference time. Defenses designed for unimodal settings often fail to generalize to cross-modal environments because of the interplay between modality-specific representation spaces. This paper proposes a systematic extension of prototype consistency mechanisms, originally developed for single-modal and split learning contexts, to secure cross-modal federated learning architectures. We examine the trade-offs between robustness and utility when enforcing alignment constraints on multimodal prototype clusters, and we analyze how such constraints interact with the heterogeneity of client data distributions and modality-specific encoders. Beyond technical design, we discuss the governance and fairness implications of deploying backdoor-resilient multimodal systems in critical infrastructures such as healthcare diagnostics and autonomous navigation. Our analysis draws on recent advances in prototype-based defenses, including the ProtoGuard framework for vertical split learning, and situates them within the broader landscape of secure representation learning. The paper concludes with a forward-looking discussion of sustainability, policy requirements, and the need for standardized evaluation benchmarks for cross-modal federated security.
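To make the idea of a cross-modal prototype consistency check concrete, the following is a minimal illustrative sketch and not the paper's method or the ProtoGuard algorithm. It assumes each client reports image and text embeddings that already live in a shared embedding space, takes a class prototype to be the mean embedding, and flags clients whose image and text prototypes disagree far more than the median client. All function names, the toy data, and the `margin` threshold are hypothetical.

```python
import math
from statistics import median

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors; used here as a class prototype."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def consistency_score(image_embs, text_embs):
    """Mean cosine similarity between image and text prototypes over shared classes.

    image_embs / text_embs: dict mapping class label -> list of embedding vectors.
    Assumption: both modalities are embedded in the same space.
    """
    shared = sorted(set(image_embs) & set(text_embs))
    if not shared:
        raise ValueError("no classes shared between modalities")
    sims = [cosine(mean_vector(image_embs[c]), mean_vector(text_embs[c]))
            for c in shared]
    return sum(sims) / len(sims)

def flag_clients(scores, margin=0.3):
    """Flag clients scoring more than `margin` below the median (hypothetical rule)."""
    cutoff = median(scores.values()) - margin
    return sorted(cid for cid, s in scores.items() if s < cutoff)

# Toy data: (image embeddings, text embeddings) per client, two classes, 2-D vectors.
clients = {
    "client_a": ({0: [[1.0, 0.0], [0.9, 0.1]], 1: [[0.0, 1.0]]},
                 {0: [[1.0, 0.1]], 1: [[0.1, 1.0]]}),
    "client_b": ({0: [[0.8, 0.2]], 1: [[0.1, 0.9]]},
                 {0: [[0.9, 0.0]], 1: [[0.0, 1.0]]}),
    # client_c's class-0 text prototype coincides with class 1, mimicking a
    # poisoned modality that drags one class toward a target label.
    "client_c": ({0: [[1.0, 0.0]], 1: [[0.0, 1.0]]},
                 {0: [[0.0, 1.0]], 1: [[0.0, 1.0]]}),
}

scores = {cid: consistency_score(img, txt) for cid, (img, txt) in clients.items()}
suspects = flag_clients(scores)
print(scores)
print(suspects)
```

A median-relative cutoff is used here only because it needs no clean reference data; under strongly non-IID client distributions, benign clients could also fall below such a cutoff, which is the robustness versus utility tension the abstract refers to.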

References

1. Chen, X., Liu, C., Li, B., Lu, K., & Song, D. (2017). Targeted backdoor attacks on deep learning systems using data poisoning. arXiv preprint arXiv:1712.05526.

2. Gu, T., Dolan-Gavitt, B., & Garg, S. (2017). BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv preprint arXiv:1708.06733.

3. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (pp. 8748–8763). PMLR.

4. Jia, C., Yang, Y., Xia, Y., Chen, Y. T., Parekh, Z., Pham, H., ... & Duerig, T. (2021). Scaling up visual and vision-language representation learning with noisy text supervision. In International Conference on Machine Learning (pp. 4904–4916). PMLR.

5. Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., & Zhang, L. (2016). Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (pp. 308–318).

6. Papernot, N., Abadi, M., Erlingsson, U., Goodfellow, I., & Talwar, K. (2016). Semi-supervised knowledge transfer for deep learning from private training data. arXiv preprint arXiv:1610.05755.

7. Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning (pp. 1597–1607). PMLR.

8. Caron, M., Bojanowski, P., Joulin, A., & Douze, M. (2018). Deep clustering for unsupervised learning of visual features. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 132–149).

9. Yang, K., Qin, Z., Du, S., & Wang, X. (2022). Cross-modal backdoor attack and defense in multimodal learning. arXiv preprint arXiv:2203.13035.

10. Stich, S. U., Cordonnier, J. B., & Jaggi, M. (2018). Sparsified SGD with memory. In Advances in Neural Information Processing Systems 31.

11. Alistarh, D., Grubic, D., Li, J., Tomioka, R., & Vojnovic, M. (2017). QSGD: Communication-efficient SGD via gradient quantization and encoding. In Advances in Neural Information Processing Systems 30.

12. Zhang, Z., Li, Y., Wang, C., & Chen, C. (2023). Prototype-based backdoor defense for federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, No. 9, pp. 11167–11175).

13. Shui, Y., Jin, R., Dou, Z., & Gao, Z. (2026). ProtoGuard-SL: Prototype Consistency Based Backdoor Defense for Vertical Split Learning. arXiv preprint arXiv:2604.03595.

14. Li, T., Sahu, A. K., Talwalkar, A., & Smith, V. (2020). Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37(3), 50–60.

15. Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., ... & Zhao, S. (2021). Advances and open problems in federated learning. Foundations and Trends in Machine Learning, 14(1–2), 1–210.

16. Sun, Z., Qian, H., & Wang, H. (2024). Multimodal backdoor attacks and defenses: A survey. ACM Computing Surveys, 56(5), 1–35.

17. Bagdasaryan, E., Veit, A., Hua, Y., Estrin, D., & Shmatikov, V. (2020). How to backdoor federated learning. In International Conference on Artificial Intelligence and Statistics (pp. 2938–2948). PMLR.

18. Blanchard, P., El Mhamdi, E. M., Guerraoui, R., & Stainer, J. (2017). Machine learning with adversaries: Byzantine tolerant gradient descent. In Advances in Neural Information Processing Systems 30.

19. Li, M., Zhou, Y., & Li, Q. (2023). Federated learning with prototype clustering for non-iid data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12042–12051).

20. Xu, J., Gole, S., & Li, R. (2025). Cross-modal prototype alignment for robust federated learning. arXiv preprint arXiv:2501.10045.

Published

2026-05-27

How to Cite

Henri C. Berry, Davide Allen, & Sunil Jha. (2026). Cross-Modal Federated Learning Security: Extending Prototype Consistency for Backdoor-Resilient Multimodal Representation Learning. Computational Intelligence Systems, 4(1). Retrieved from https://scivexus.org/index.php/CIS/article/view/371