Promptable Vision Foundation Models for Industrial Defect Segmentation and Quality Inspection
Keywords:
promptable vision foundation models, industrial defect segmentation, quality inspection, few-shot learning, model deployment, robustness, governance, sustainability
Abstract
Industrial defect segmentation and quality inspection are critical to maintaining product integrity across manufacturing sectors such as automotive, electronics, and aerospace. Traditional inspection systems rely on manually engineered features or supervised deep learning pipelines that require large, task-specific annotated datasets and frequent retraining when production conditions change. Promptable vision foundation models, such as the Segment Anything Model and vision-language architectures, offer a paradigm shift by enabling flexible, few-shot defect segmentation through natural language or visual prompts. This paper provides a system-level analysis of deploying such models in industrial environments, focusing on architectural trade-offs between generality and specificity, infrastructure requirements for real-time inference, and governance frameworks for model validation and fairness. We examine the interplay between model scale, latency, and on-premise versus cloud deployment, highlighting the sustainability implications of large-scale transformer architectures. Robustness to domain shifts, class imbalance, and adversarial inputs is discussed, alongside policy considerations for accountability and certification of AI-based inspection systems. Cross-domain comparisons with traditional machine vision and deep learning methods illustrate the conditions under which promptable models offer superior adaptability. The paper concludes by outlining future research directions, including lightweight model distillation, continuous learning under concept drift, and regulatory alignment for safety-critical quality assurance.
License
Copyright (c) 2023 Computational Intelligence Systems

This article is published under the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.



