Quantifying the Ethical Risks of Generative AI through Automated Toxicity Scoring and Human-Centric Alignment Auditing Pipelines

Paul Whitmore; Edward Telford

doi:10.66280/cis.v4i1.125

Authors

Paul Whitmore, School of Engineering, University of Louisville
Edward Telford, Department of Electrical and Computer Engineering, Boise State University

Keywords:

Generative Artificial Intelligence, Algorithmic Governance, Toxicity Scoring, Human-Centric Alignment, Socio-Technical Systems, AI Ethics, Infrastructure Robustness.

Abstract

The rapid deployment of generative artificial intelligence systems has fundamentally altered the landscape of digital interaction, information dissemination, and socio-technical governance. While these models offer unprecedented creative and analytical capabilities, they simultaneously introduce profound ethical risks ranging from algorithmic bias and cultural erasure to the propagation of toxic content. This paper presents a comprehensive inquiry into the quantification of these risks through the integration of automated toxicity scoring mechanisms and human-centric alignment auditing pipelines. We argue that traditional evaluation metrics, which often focus on narrow computational performance, fail to capture the nuanced and context-dependent harms inherent in large-scale generative deployments. By establishing a multi-layered auditing framework, this research explores the structural trade-offs between model utility and safety, the architectural challenges of real-time monitoring, and the policy implications of automated governance. We demonstrate that while automated scoring provides the necessary scalability for high-velocity data streams, human-centric auditing remains an indispensable component for interpreting cultural nuances and complex sociopolitical dynamics. The discussion extends to the sustainability of these oversight systems and the robustness of alignment techniques against adversarial manipulation. Ultimately, this study proposes a path toward more resilient AI infrastructures that prioritize human well-being and democratic values within the technical design cycle, ensuring that the advancement of generative intelligence does not come at the expense of societal cohesion or ethical integrity.

