Cultural Bias Auditing in Multimodal Generative Models Through Cross-Lingual Prompt Sensitivity Analysis

Roy J. Burton; Maurice D. Greene; Arthur M. Walters

Authors

Roy J. Burton, Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY, USA.
Maurice D. Greene, Department of Computer Science, Colorado State University, Fort Collins, CO, USA.
Arthur M. Walters, Department of Electrical Engineering and Computer Science, University of Kansas, Lawrence, KS, USA.

Keywords:

cultural bias, multimodal generative models, cross-lingual prompt sensitivity, fairness auditing, text-to-image generation, socio-technical infrastructure, model governance

Abstract

The rapid deployment of multimodal generative models, particularly those that produce images from textual prompts, has made it harder to ensure cultural fairness across global user populations. This paper proposes a framework for auditing cultural bias in such models through cross-lingual prompt sensitivity analysis, a method that uses linguistic diversity to expose latent cultural assumptions embedded in model representations. By rendering semantically equivalent prompts in multiple languages and comparing the resulting image distributions, we reveal consistent disparities in how models depict culturally specific artifacts, social roles, and geographic settings. The approach emphasizes system-level considerations, including the architectural trade-offs between model scale and bias amplification, the infrastructure required for multilingual evaluation pipelines, and the governance mechanisms needed to operationalize fairness audits. We present a detailed case study of text-to-image models, demonstrating that even state-of-the-art systems exhibit pronounced cultural gaps that correlate with the linguistic and demographic composition of their training data. Our analysis further examines the sustainability of bias mitigation strategies, the interplay between robustness and cultural fidelity, and the policy implications of deploying these models in cross-cultural contexts. The paper concludes with recommendations for integrating cross-lingual auditing into the development lifecycle of generative systems, advocating a shift from post-hoc evaluation to proactive bias governance.
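
The comparison of image distributions across languages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify a metric, so Jensen-Shannon divergence is used here as an assumed stand-in. It also assumes each generated image has already been assigned a single categorical cultural label by annotators or a classifier. The function names, language codes, and labels below are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def label_distribution(labels, categories):
    """Relative frequency of each category among one language's image labels."""
    if not labels:
        raise ValueError("each language needs at least one labeled image")
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical, 1 for disjoint distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def prompt_sensitivity(labels_by_language):
    """Pairwise divergence between languages for one semantically equivalent prompt."""
    categories = sorted({c for labels in labels_by_language.values() for c in labels})
    dists = {
        lang: label_distribution(labels, categories)
        for lang, labels in labels_by_language.items()
    }
    return {
        (a, b): js_divergence(dists[a], dists[b])
        for a, b in combinations(sorted(dists), 2)
    }


# Hypothetical labels for images generated from one prompt in three languages.
example = {
    "en": ["western"] * 4,
    "hi": ["south_asian"] * 4,
    "ja": ["western", "western", "east_asian", "east_asian"],
}
for pair, score in prompt_sensitivity(example).items():
    print(pair, round(score, 3))
```

Under these assumptions, a score near 0 for a language pair suggests the model depicts the prompt similarly in both languages, while a score near 1 indicates the depictions barely overlap. Whether a high score reflects appropriate cultural adaptation or a harmful gap is a judgment the metric alone cannot make.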
