Citation-Aware Retrieval-Augmented Generation for Reliable Knowledge-Intensive AI Applications

Cesar Howard; Qian Wei

Authors

Cesar Howard, School of Information Technology, University of Cincinnati, Cincinnati, OH, USA.
Qian Wei, Department of Computer Science and Engineering, University of Nevada, Reno, Reno, NV, USA.

Keywords:

retrieval-augmented generation, citation networks, knowledge grounding, large language models, information retrieval, AI reliability, system architecture, socio-technical governance

Abstract

Retrieval-Augmented Generation (RAG) has emerged as a prominent paradigm for grounding large language models in external knowledge sources, thereby mitigating hallucination and factual staleness. However, conventional RAG systems treat retrieved passages as independent evidence and often overlook the relational and provenance cues carried by citation networks. This paper proposes a citation-aware extension to the standard RAG framework in which the retrieval component is augmented with graph-based citation structures and the generation module is guided by citation-derived confidence signals. We argue that citation awareness improves not only factual accuracy but also the verifiability, transparency, and trustworthiness of generated outputs. The discussion covers architectural design choices, trade-offs between computational overhead and retrieval fidelity, robustness to adversarial citation manipulation, and implications for large-scale deployment in regulated domains such as healthcare, law, and scientific publishing. A cross-domain comparison shows how citation-aware RAG systems can be tailored to different citation practices, from biomedical literature to legal case law. We further examine governance challenges, including citation bias, data provenance, and model updating strategies. The paper concludes with a research agenda for building citation-aware systems that support reliable knowledge-intensive applications while respecting ethical and policy constraints.
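
To make the idea of augmenting retrieval with citation structure concrete, the following is a minimal sketch and not the authors' implementation. It assumes a PageRank-style authority score as a stand-in for the "graph-based citation structures" mentioned above, and blends it with a text-similarity score from any retriever. The function names `pagerank` and `citation_aware_rerank`, the blending weight `alpha`, and the toy document identifiers are all hypothetical.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank over a citation graph.

    `citations` maps each document id to the list of ids it cites.
    Returns a dict of authority scores that sum to 1 (empty if no nodes).
    """
    nodes = set(citations) | {c for cited in citations.values() for c in cited}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            cited = citations.get(node, [])
            if cited:
                # Pass this document's score to the documents it cites.
                share = damping * rank[node] / len(cited)
                for target in cited:
                    new_rank[target] += share
            else:
                # A document that cites nothing spreads its score uniformly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


def citation_aware_rerank(similarities, citations, alpha=0.7):
    """Blend retriever similarity with normalized citation authority.

    `similarities` maps document ids to a similarity score in [0, 1].
    `alpha` (an assumed tuning knob) weights similarity against authority.
    """
    authority = pagerank(citations)
    # Guard against an empty graph so the division below is always safe.
    max_auth = max(authority.values(), default=0.0) or 1.0
    scored = {
        doc: alpha * sim + (1.0 - alpha) * authority.get(doc, 0.0) / max_auth
        for doc, sim in similarities.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Toy example: A and B both cite C; the retriever slightly prefers A.
toy_citations = {"A": ["C"], "B": ["C"], "C": []}
toy_similarities = {"A": 0.80, "B": 0.78, "C": 0.75}
ranked = citation_aware_rerank(toy_similarities, toy_citations)
```

In this toy graph the cited document C moves ahead of A and B despite its lower similarity score. A single blended score is only one possible design; it also illustrates the manipulation risk the abstract raises, since inflated citation counts would raise a document's rank directly.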

