1 Assistant Professor, Department of Artificial Intelligence and Data Science, V.S.B. Engineering College, Karur, India. Email: suganyavsb20163@gmail.com
2 Assistant Professor, Department of Artificial Intelligence and Data Science, V.S.B. Engineering College, Karur, India. Email: kavithataru2015@gmail.com
3 Department of Artificial Intelligence and Data Science, V.S.B. Engineering College, Karur, India. Email: shereyabaskaran138@gmail.com
4 Department of Artificial Intelligence and Data Science, V.S.B. Engineering College, Karur, India. Email: sujithachandrasekaran2004@gmail.com
5 Department of Artificial Intelligence and Data Science, V.S.B. Engineering College, Karur, India. Email: dharanigokulakannan962004@gmail.com
6 Department of Artificial Intelligence and Data Science, V.S.B. Engineering College, Karur, India. Email: banurithikabanurithika@gmail.com
Background: CiteChat is a document-based intelligent chatbot that uses Retrieval-Augmented Generation (RAG) to provide accurate, context-sensitive answers about user-uploaded documents, including resumes, research papers, and articles. Because every response is grounded in the uploaded document, the proposed system is intended to offer greater factual consistency than conventional large language model (LLM) systems. A strict grounding procedure based on similarity threshold validation is applied to avoid generating unsupported answers, which minimizes hallucination.
Methodology: First, the PDF documents are read and encoded using transformer-based sentence embeddings, and the embeddings are stored in local memory, which preserves data privacy while keeping the data easy to retrieve. When the user enters a query, a semantic similarity search finds the most relevant document segments, which are then passed to a Groq-powered LLaMA model to generate the response. The system also increases transparency by presenting page-level citation snippets alongside the generated answers and by computing a retrieval-based confidence score that indicates the reliability of each answer.
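The retrieval, threshold validation, citation, and confidence steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CiteChat implementation: the bag-of-words `embed` function is a toy stand-in for transformer-based sentence embeddings, the names `retrieve` and `answer`, the `top_k` value, and the threshold of 0.2 are hypothetical, the confidence score is assumed to be the mean similarity of the accepted segments, and the call to the Groq-hosted LLaMA model is omitted.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, top_k=2, threshold=0.2):
    """Rank (page, text) chunks by similarity and keep those above the threshold.

    The threshold value is a hypothetical choice for this sketch.
    """
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(text)), page, text) for page, text in chunks),
        reverse=True,
    )
    hits = [(s, p, t) for s, p, t in scored[:top_k] if s >= threshold]
    # Assumed confidence definition: mean similarity of the accepted chunks.
    confidence = sum(s for s, _, _ in hits) / len(hits) if hits else 0.0
    return hits, confidence

def answer(query, chunks):
    """Build the grounded context, or refuse when nothing passes the threshold."""
    hits, confidence = retrieve(query, chunks)
    if not hits:
        # Threshold validation failed: do not call the LLM at all.
        return {"grounded": False, "context": "", "citations": [], "confidence": 0.0}
    context = "\n".join(text for _, _, text in hits)
    citations = [{"page": page, "snippet": text[:80]} for _, page, text in hits]
    # A full system would now send `context` and `query` to the LLM.
    return {"grounded": True, "context": context,
            "citations": citations, "confidence": round(confidence, 3)}

if __name__ == "__main__":
    sample = [
        (1, "The candidate has five years of Python experience."),
        (2, "Education: B.Tech in Artificial Intelligence and Data Science."),
        (3, "Hobbies include chess and hiking."),
    ]
    print(answer("How many years of Python experience?", sample))
    print(answer("What is the weather on Mars?", sample))
```

In this sketch, a query with no sufficiently similar segment returns an ungrounded result without any generation step, which is one simple way to realize the idea of refusing unsupported answers.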
Results: Experimental analysis shows better factual grounding and fewer hallucinations than standalone LLM-based approaches.
Conclusion: The lightweight Streamlit interface supports document upload and interactive querying, making document analysis easy.
Keywords: Document retrieval, Large language model, Retrieval-Augmented Generation, Vector database, Hallucination.
How to cite this article: Suganya R, Kavitha V, Shereya B, Sujitha C, Dharani G, Banu Rithika MS. CiteChat: A RAG Based Intelligent ChatBot. Int J Drug Deliv Technol. 2026;16(13s): 975-985. DOI: 10.25258/ijddt.16.13s.109.
Source of support: Nil.
Conflict of interest: None.