AI Document Q&A System (LLaMA 3 - Local RAG)

An AI-powered system that allows users to upload documents (PDF, DOCX, TXT, MD) and ask questions using a local LLaMA 3 model via Ollama with Retrieval-Augmented Generation (RAG).


🧠 Architecture

Documentation

A document chat system built on LLaMA 3 and a RAG architecture. Documents are uploaded and queried with fully local inference, so no external API is required. Built with FastAPI, Streamlit, and FAISS.

✨ Key Features

📄 Upload PDF, DOCX, TXT, MD documents
💬 ChatGPT-style conversational interface
🧠 100% local LLM (no API required)
⚡ Fast semantic search using FAISS
🔍 Source tracking for answers
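
Source tracking can be sketched as keeping the originating file name and chunk position next to each chunk, so that the retrieved chunks can be listed under an answer. The `Chunk` dataclass and `format_sources` helper are hypothetical names used for illustration; the project's actual data model is not described here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A piece of document text plus where it came from (hypothetical structure)."""
    text: str
    source: str  # original file name, e.g. "report.pdf"
    index: int   # position of the chunk within that file


def format_sources(chunks: list[Chunk]) -> str:
    """Return a de-duplicated, order-preserving source list for display."""
    seen: list[str] = []
    for chunk in chunks:
        label = f"{chunk.source} (chunk {chunk.index})"
        if label not in seen:
            seen.append(label)
    return "\n".join(f"- {label}" for label in seen)
```

The resulting string could be shown below each answer in the chat view.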

💼 What This Project Demonstrates

🧠 Applied AI / LLM Engineering

Built a Retrieval-Augmented Generation (RAG) system using a local LLM
Integrated LLaMA 3 via Ollama for offline inference
Designed prompts to reduce hallucination
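
One common way to design a prompt that reduces hallucination is to restrict the model to the retrieved context and give it an explicit way to decline. The sketch below illustrates that idea; the wording of the instructions and the `build_prompt` name are assumptions, not the project's actual prompt.

```python
from __future__ import annotations


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    # Number each chunk so the model (and the reader) can refer back to it.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n"
        "If the context does not contain the answer, reply exactly: "
        "\"I don't know based on the provided document.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```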

🔍 Information Retrieval & Vector Search

Semantic search using FAISS
Transformer embeddings for document understanding
Optimized chunking and retrieval
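
Chunking is often done with a sliding window so that neighbouring chunks overlap and a sentence cut at a boundary still appears whole in one of them. A minimal word-based sketch follows; the `chunk_size` and `overlap` defaults are illustrative assumptions, not the values tuned for this project.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Larger overlaps improve recall at boundaries but increase the number of vectors to embed and store.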

⚙️ Backend Engineering

FastAPI-based backend
Document ingestion pipeline
Error handling and reliability
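
The parsing step of an ingestion pipeline can be sketched as a dispatch on file extension that fails loudly for unsupported or empty files. The use of `pypdf` and `python-docx` is an assumption (the parsers used by this project are not named), and `parse_document` is a hypothetical helper. The third-party imports are deferred so that TXT and MD files work without them.

```python
import io
from pathlib import Path


def _parse_text(data: bytes) -> str:
    # TXT and MD are treated as UTF-8 text; undecodable bytes are replaced.
    return data.decode("utf-8", errors="replace")


def _parse_pdf(data: bytes) -> str:
    from pypdf import PdfReader  # assumption: third-party `pypdf` is installed
    reader = PdfReader(io.BytesIO(data))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def _parse_docx(data: bytes) -> str:
    import docx  # assumption: third-party `python-docx` is installed
    document = docx.Document(io.BytesIO(data))
    return "\n".join(p.text for p in document.paragraphs)


PARSERS = {
    ".txt": _parse_text,
    ".md": _parse_text,
    ".pdf": _parse_pdf,
    ".docx": _parse_docx,
}


def parse_document(filename: str, data: bytes) -> str:
    """Extract plain text from an uploaded file, or raise ValueError."""
    suffix = Path(filename).suffix.lower()
    parser = PARSERS.get(suffix)
    if parser is None:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    text = parser(data)
    if not text.strip():
        raise ValueError("No extractable text found in document")
    return text
```

A backend endpoint could catch `ValueError` and return a client error instead of failing mid-pipeline.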

🎨 Frontend & UX Design

Streamlit chat interface
Chat history and file upload
Backend-safe UX

🏗️ System Design

Upload → Parse → Chunk → Embed → Store → Retrieve → Generate
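
The later stages of this pipeline (Embed → Store → Retrieve → Generate) can be sketched in a few lines of standard-library Python. This is a toy stand-in under stated assumptions: `embed` uses word counts instead of sentence-transformers embeddings, `InMemoryStore` does a brute-force cosine search instead of using a FAISS index, and `generate` is any callable that maps a prompt to text. All names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a transformer model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Brute-force vector store (stand-in for a FAISS index)."""

    def __init__(self):
        self._items = []  # list of (chunk_text, vector) pairs

    def add(self, chunks):
        self._items.extend((chunk, embed(chunk)) for chunk in chunks)

    def search(self, query, k=3):
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def answer(question, store, generate, k=3):
    """Retrieve the top-k chunks, build a prompt, and call the generator."""
    context = store.search(question, k)
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"
    return generate(prompt), context
```

Returning the retrieved context together with the reply keeps the sources available for display.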

🧪 Debugging & Reliability Engineering

Solved race conditions
Handled local model constraints
Built fault-tolerant flow
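
The race conditions that were solved are not described here. One typical case in an upload-then-query flow is a question arriving while the index is still being rebuilt. As an assumption-laden sketch of how such a case could be guarded against, the hypothetical `SafeChunkStore` below swaps in new data under a lock and hands readers a consistent snapshot.

```python
import threading


class SafeChunkStore:
    """Shared chunk store guarded by a lock (hypothetical illustration)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._chunks = []

    def replace(self, chunks):
        new_chunks = list(chunks)  # build the new state outside the lock
        with self._lock:
            self._chunks = new_chunks  # swap it in as one step

    def snapshot(self):
        with self._lock:
            return list(self._chunks)  # copy, so callers never see later edits
```

Readers therefore see either the old document's chunks or the new one's, never a mixture.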

🔐 Production Awareness

Clean structure
GitHub-ready setup
Environment-safe practices
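
Environment-safe practice usually means reading machine-specific settings from environment variables rather than hard-coding them. A minimal sketch, assuming hypothetical variable names (`OLLAMA_URL`, `MODEL_NAME`, `TOP_K`) that are not taken from the project; the default URL uses Ollama's standard local port.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    ollama_url: str
    model_name: str
    top_k: int


def load_settings(env=None) -> Settings:
    """Read settings from the environment, falling back to local defaults."""
    env = os.environ if env is None else env
    return Settings(
        ollama_url=env.get("OLLAMA_URL", "http://localhost:11434"),
        model_name=env.get("MODEL_NAME", "llama3"),
        top_k=int(env.get("TOP_K", "3")),
    )
```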

⚠️ Limitations

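High RAM usage (the model runs locally)
No persistence (documents and their index are not kept between sessions)
Single user (no concurrent sessions)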

🔮 Future Improvements

Multi-document support
Chat memory
Docker packaging
Cloud fallback

⚙️ How It Works

Upload document
Chunk & embed
Store in FAISS
Retrieve relevant chunks
LLM generates answer
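
The final step, in which the LLM generates the answer, can be sketched as a call to Ollama's local HTTP API. The example assumes Ollama is running on its default port with a model pulled under the tag `llama3`; the helper names are hypothetical, and the project may use a client library instead of raw HTTP.

```python
import json
import urllib.request


def build_generate_request(prompt, model="llama3", base_url="http://localhost:11434"):
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url=f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, timeout=120.0):
    """Send the prompt to the local model and return the generated text."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

A generous timeout is used because local inference can be slow on modest hardware.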

🏗️ Tech Stack

Frontend: Streamlit
Backend: FastAPI
LLM: LLaMA 3 (via Ollama)
Embeddings: sentence-transformers
Vector DB: FAISS
