Privacy-Preserving Vertical Split Learning for Healthcare AI with Prototype-Based Anomaly Detection Against Data Poisoning

Dan Xue; Niklas Barnett

Authors

Dan Xue, Department of Computer Science, University of Alabama at Birmingham, Birmingham, AL, USA.
Niklas Barnett, Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY, USA.

Keywords:

vertical split learning, privacy-preserving machine learning, healthcare AI, anomaly detection, data poisoning, prototype learning, federated learning, security, governance

Abstract

The integration of artificial intelligence into healthcare systems promises transformative improvements in diagnostics, treatment personalization, and operational efficiency, yet it simultaneously amplifies concerns regarding patient data privacy and the integrity of learning pipelines. Vertical split learning has emerged as a compelling paradigm that enables multiple healthcare institutions to collaboratively train deep neural networks without sharing raw feature-level data, thereby preserving confidentiality while leveraging complementary data modalities. However, the distributed nature of vertical split learning introduces new attack surfaces, particularly data poisoning and backdoor attacks that can compromise model behavior without violating privacy boundaries. This paper presents a comprehensive system-level analysis of a privacy-preserving vertical split learning framework augmented with prototype-based anomaly detection to counter data poisoning. We examine architectural trade-offs between privacy guarantees, communication overhead, and model accuracy, and we discuss how prototype-based mechanisms, which rely on learning compact class representations, can detect anomalous updates or poisoned samples at the cut layer during split learning. The proposed framework integrates differential privacy mechanisms, secure aggregation protocols, and a prototype consistency verification module that identifies deviations from expected latent distributions. Beyond technical design, we explore governance implications, deployment challenges in heterogeneous hospital networks, regulatory compliance with HIPAA and GDPR, and sustainability considerations regarding computational and energy costs. System-level robustness is evaluated through cross-domain comparisons with horizontal federated learning and fully centralized approaches, highlighting the nuanced benefits and limitations of vertical split learning in sensitive healthcare environments. We conclude with forward-looking perspectives on adaptive defense strategies, standardization of privacy-preserving benchmarks, and the role of policy in fostering trustworthy AI infrastructure.
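
To make the prototype consistency idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each class prototype is the mean cut-layer embedding over a small trusted reference set, that Euclidean distance is the consistency measure, and that a sample is flagged when its distance to the prototype of its claimed label exceeds the reference mean plus k standard deviations. The function names, the value of k, and the toy two-dimensional embeddings are all hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, pstdev


def class_prototypes(embeddings, labels):
    """Return one prototype per class: the mean embedding of that class."""
    sums, counts = {}, {}
    for vec, lab in zip(embeddings, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def fit_thresholds(embeddings, labels, prototypes, k=3.0):
    """Per-class distance threshold: mean + k * std of reference distances."""
    per_class = {}
    for vec, lab in zip(embeddings, labels):
        per_class.setdefault(lab, []).append(math.dist(vec, prototypes[lab]))
    return {lab: mean(d) + k * pstdev(d) for lab, d in per_class.items()}


def flag_anomalies(embeddings, labels, prototypes, thresholds):
    """Indices of samples lying too far from the prototype of their claimed label."""
    return [
        i
        for i, (vec, lab) in enumerate(zip(embeddings, labels))
        if math.dist(vec, prototypes[lab]) > thresholds[lab]
    ]


# Hypothetical trusted reference embeddings at the cut layer (assumption: 2-D).
reference = [
    [0.0, 0.1], [0.2, 0.0], [-0.1, 0.0], [0.0, -0.2],   # class 0
    [5.0, 5.1], [5.2, 5.0], [4.9, 5.0], [5.0, 4.8],     # class 1
]
reference_labels = [0, 0, 0, 0, 1, 1, 1, 1]

prototypes = class_prototypes(reference, reference_labels)
thresholds = fit_thresholds(reference, reference_labels, prototypes)

# Incoming batch: the second sample looks like class 1 but claims label 0,
# which mimics a label-flipping style of poisoning.
batch = [[0.05, 0.0], [5.0, 5.0], [5.02, 4.98]]
batch_labels = [0, 0, 1]

flagged = flag_anomalies(batch, batch_labels, prototypes, thresholds)
print(flagged)  # prints [1]
```

In this sketch the check runs on embeddings only, so it needs no access to raw features held by the participating institutions. A fixed mean-plus-k-sigma threshold is a deliberate simplification; a deployed detector would likely need thresholds that adapt as the latent distribution shifts during training.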

