International Journal of Drug Delivery Technology
Volume 16, Issue 13s, 2026

CiteChat: A RAG Based Intelligent ChatBot

Suganya R1, Kavitha V2, Shereya B3, Sujitha C4, Dharani G5, Banu Rithika M S6

1Assistant Professor, Department of Artificial Intelligence and Data Science, V.S.B. Engineering College, Karur, India. Email: suganyavsb20163@gmail.com

2Assistant Professor, Department of Artificial Intelligence and Data Science, V.S.B. Engineering College, Karur, India. Email: kavithataru2015@gmail.com

3Department of Artificial Intelligence and Data Science, V.S.B. Engineering College, Karur, India. Email: shereyabaskaran138@gmail.com

4Department of Artificial Intelligence and Data Science, V.S.B. Engineering College, Karur, India. Email: sujithachandrasekaran2004@gmail.com

5Department of Artificial Intelligence and Data Science, V.S.B. Engineering College, Karur, India. Email: dharanigokulakannan962004@gmail.com

6Department of Artificial Intelligence and Data Science, V.S.B. Engineering College, Karur, India. Email: banurithikabanurithika@gmail.com


ABSTRACT

Background: CiteChat is a document-based intelligent chatbot that uses Retrieval-Augmented Generation (RAG) to provide accurate, context-sensitive answers about user-uploaded documents, including resumes, research papers and articles. Because every response is grounded in the uploaded document, the proposed system is intended to offer greater factual consistency than conventional large language model (LLM) systems. A strict grounding procedure based on similarity-threshold validation prevents the generation of unsupported answers and thereby minimizes hallucination.

Methodology: First, the PDF documents are read and encoded with transformer-based sentence embeddings, and the embeddings are stored in local memory, which preserves data privacy while allowing fast retrieval. When the user enters a query, a semantic similarity search finds the most relevant document segments, which are then passed to a Groq-powered LLaMA model to generate the response. The system also increases transparency by attaching page-level citation snippets to the generated answers and by computing a retrieval-based confidence score that indicates the reliability of each answer.
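The retrieval step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Chunk` type, the toy three-dimensional vectors (standing in for real sentence embeddings), the threshold value of 0.5, and the use of the mean similarity as the confidence score are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    """A document segment with its page number and embedding (hypothetical type)."""
    page: int
    text: str
    vector: list[float]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, chunks, top_k=2, threshold=0.5):
    """Return up to top_k (score, chunk) pairs whose similarity meets the threshold.

    The threshold value is an assumed placeholder, not a figure from the paper.
    """
    scored = sorted(
        ((cosine(query_vec, c.vector), c) for c in chunks),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [(score, c) for score, c in scored[:top_k] if score >= threshold]


def confidence(hits):
    """Assumed confidence score: mean similarity of the retrieved chunks."""
    return sum(score for score, _ in hits) / len(hits) if hits else 0.0


# Toy vectors stand in for transformer-based sentence embeddings.
chunks = [
    Chunk(1, "Skills: Python, SQL", [1.0, 0.0, 0.0]),
    Chunk(2, "Education: B.Tech", [0.0, 1.0, 0.0]),
    Chunk(3, "Projects: RAG chatbot", [0.8, 0.0, 0.6]),
]

hits = retrieve([1.0, 0.0, 0.0], chunks)
if hits:
    for score, chunk in hits:
        print(f"page {chunk.page}: {chunk.text} (similarity {score:.2f})")
    print(f"confidence: {confidence(hits):.2f}")
else:
    # No segment passed the threshold, so the system declines to answer.
    print("No supporting passage found in the document.")
```

In a full pipeline, the retrieved chunks would be placed in the prompt sent to the language model, and their page numbers would be returned alongside the answer as citations. An empty result list is the point at which an unsupported answer would be refused.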

Results: Experimental analysis shows better factual grounding and fewer hallucinations than standalone LLM-based approaches.

Conclusion: A lightweight Streamlit interface supports document upload and interactive querying, making document analysis easy.

Keywords: Document retrieval, Large language model, Retrieval-Augmented Generation, Vector database, Hallucination.

How to cite this article: Suganya R, Kavitha V, Shereya B, Sujitha C, Dharani G, Banu Rithika MS. CiteChat: A RAG Based Intelligent ChatBot. Int J Drug Deliv Technol. 2026;16(13s): 975-985. DOI: 10.25258/ijddt.16.13s.109.

Source of support: Nil.

Conflict of interest: None