Added uploads folder on the host machine to the docker workflow (5348fda9) · Commits · Harris, Tyrone / Offline Multilingual Question Answering System

Dockerfile

+2 −1

		@@ -32,8 +32,9 @@ COPY . .
		# EXPOSE 6379 # Redis default port
		# EXPOSE 5432 # PostgreSQL default port

		# Define commands to start Redis and PostgreSQL servers
		# Define commands to start Redis and PostgreSQL servers, and then run the main program
		CMD service redis-server start && \
		service postgresql start && \
		sleep 5 && \
		python -c "from langchain.document_loaders import DirectoryLoader; from langchain.text_splitter import RecursiveCharacterTextSplitter; from langchain.vectorstores import FAISS; from langchain.embeddings import HuggingFaceEmbeddings; loader = DirectoryLoader('/uploads', glob='*/.txt'); documents = loader.load(); text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0); texts = text_splitter.split_documents(documents); embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2'); db = FAISS.from_documents(texts, embeddings); db.save_local('data/faiss_index')" && \
		python main.py
		No newline at end of file
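The indexing one-liner above passes `glob='*/.txt'` to `DirectoryLoader`. This assumes the loader expands that pattern with `pathlib.Path.glob` (or `rglob` when recursive loading is enabled). Under that assumption, the pattern only matches files literally named `.txt` one level below `/uploads`, so ordinary documents such as `faq.txt` would be skipped. The stdlib-only sketch below compares it with `'**/*.txt'` on a throwaway directory. The directory layout and file names are hypothetical.

```python
import tempfile
from pathlib import Path


def matching_files(root: Path, pattern: str) -> list[str]:
    """Return the files under root that Path.glob matches, as sorted POSIX-style relative paths."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical upload layout: one top-level document and one in a subfolder.
    (root / "notes").mkdir()
    (root / "intro.txt").write_text("top-level document")
    (root / "notes" / "faq.txt").write_text("nested document")

    literal = matching_files(root, "*/.txt")      # pattern used in the Dockerfile
    recursive = matching_files(root, "**/*.txt")  # candidate replacement pattern

    print("'*/.txt'   ->", literal)
    print("'**/*.txt' ->", recursive)
```

On this layout `'*/.txt'` matches nothing, while `'**/*.txt'` picks up both the top-level file and the nested one.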

README.md

+16 −17

		@@ -34,22 +34,21 @@ This system is designed to provide a seamless and informative question-answering

		## Features

		- Offline Operation: Functions entirely without an internet connection, ensuring data privacy and availability.
		- Multilingual Support: Handles questions and provides answers in multiple languages (Spanish, French, German, Thai, Russian, Arabic, Portuguese, Mandarin).
		- Contextual Understanding: Maintains conversation history within chat sessions to provide more relevant and coherent responses to follow-up questions.
		- Self-Correction: Employs a retry mechanism to iteratively refine answers, minimizing hallucinations and improving accuracy.
		- Terminal-Based Chat Interface: Offers a user-friendly, real-time chat interface for interaction.
		- UID Tracking & Database: Assigns unique identifiers to each interaction, facilitating tracking, analysis, and debugging.
		- Caching: Enhances performance by storing and reusing previous results.
		- Document Referencing: Provides transparency by citing the sources used to generate answers.
		- Efficient Multilingual Tokenizer: Utilizes SentencePiece for efficient handling of multiple languages.
		- Offline LLM: Leverages the Vicuna-7B model for powerful language understanding and answer generation capabilities in an offline setting.
		- Queue-Based Processing: Handles multiple chat requests concurrently, ensuring fair and efficient processing.
		- User Feedback Mechanism: Collects user feedback (thumbs up/thumbs down) to improve the system's performance over time.
		* Offline Operation: Functions entirely without an internet connection, ensuring data privacy and availability.
		* Multilingual Support: Handles questions and provides answers in multiple languages (Spanish, French, German, Thai, Russian, Arabic, Portuguese, Mandarin).
		* Contextual Understanding: Maintains conversation history within chat sessions to provide more relevant and coherent responses to follow-up questions.
		* Self-Correction: Employs a retry mechanism to iteratively refine answers, minimizing hallucinations and improving accuracy.
		* Terminal-Based Chat Interface: Offers a user-friendly, real-time chat interface for interaction.
		* UID Tracking & Database: Assigns unique identifiers to each interaction, facilitating tracking, analysis, and debugging.
		* Caching: Enhances performance by storing and reusing previous results.
		* Document Referencing: Provides transparency by citing the sources used to generate answers.
		* Efficient Multilingual Tokenizer: Utilizes SentencePiece for efficient handling of multiple languages.
		* Offline LLM: Leverages the Vicuna-7B model for powerful language understanding and answer generation capabilities in an offline setting.
		* Queue-Based Processing: Handles multiple chat requests concurrently, ensuring fair and efficient processing.
		* User Feedback Mechanism: Collects user feedback (thumbs up/thumbs down) to improve the system's performance over time.

		![image](/CRAIG_Graph.svg)


		## System Architecture

		The system's modular architecture comprises interconnected components, each fulfilling a specific role in the question-answering process.
		@@ -378,10 +377,10 @@ The `Dockerfile` includes instructions to:
		1. Run the Docker container in interactive mode:

		```bash
		docker run -it offline-qna-system
		docker run -it -v $(pwd)/uploads:/uploads offline-qna-system # mounts the 'uploads' folder from your current directory
		```

		This will start the container, and you should see the chat interface in the terminal.
		This will start the container, and you should see the chat interface in the terminal. The `-v $(pwd)/uploads:/uploads` option mounts the `uploads` folder from your current directory to the `/uploads` directory inside the container, allowing the container to access and process the documents you add to the `uploads` folder on your host machine.
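In the command above, `$(pwd)/uploads` gives `-v` an absolute host path, so Docker treats it as a bind mount of that directory. When the container is launched from a script, building the argument list explicitly also avoids shell word-splitting if the current directory contains spaces. The sketch below uses a hypothetical helper, `build_run_args`, which is not part of the project. It only assembles and prints the command and does not invoke Docker.

```python
import os
import shlex


def build_run_args(image: str, host_dir: str, container_dir: str = "/uploads") -> list[str]:
    """Assemble a `docker run` argv that bind-mounts host_dir at container_dir.

    Hypothetical helper for illustration; it mirrors the README command
    `docker run -it -v $(pwd)/uploads:/uploads offline-qna-system`.
    """
    source = os.path.abspath(host_dir)  # plays the role of $(pwd)/uploads: an absolute host path
    return ["docker", "run", "-it", "-v", f"{source}:{container_dir}", image]


args = build_run_args("offline-qna-system", "uploads")
print(shlex.join(args))  # single shell-safe line, quoted if the path contains spaces
```

The Dockerfile's `CMD` builds the FAISS index from `/uploads` once, before `python main.py` starts. Documents copied into `uploads` while the container is already running would therefore presumably not be indexed until the container is restarted.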

		2. Set up the database:

		@@ -419,6 +418,6 @@ docker run -it \
		-p 6379:6379 \ # Map Redis port
		offline-qna-system
		```

		This command mounts the /path/to/your/data directory from your host machine to the /app/data directory inside the container, ensuring that any data stored in the database or cache is persisted even after the container stops. It also maps the container's Redis port (6379) to the same port on the host machine, allowing you to access Redis from outside the container.
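To confirm the `-p 6379:6379` mapping from the host, one option is to send a Redis `PING` and check for a `+PONG` reply. The sketch below uses only the standard library and the RESP array encoding of `PING`. `redis_ping` is a hypothetical helper, and the default `127.0.0.1:6379` assumes the container is running with the port mapping above.

```python
import socket


def redis_ping(host: str = "127.0.0.1", port: int = 6379, timeout: float = 2.0) -> bool:
    """Return True if a Redis server at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"*1\r\n$4\r\nPING\r\n")  # RESP array encoding of the PING command
            reply = conn.recv(64)
    except OSError:  # connection refused, timeout, unreachable host, ...
        return False
    return reply.startswith(b"+PONG")


if __name__ == "__main__":
    print("Redis reachable:", redis_ping())
```

As written, the `-p 6379:6379 \ # Map Redis port` line will likely not continue onto the next line. In bash, a `\` followed by a space escapes the space rather than the newline, so the trailing comment ends the command before `offline-qna-system`. Placing such comments on their own lines above the command avoids this.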

		By following these instructions, you can easily set up and run the offline multilingual question-answering system within a Docker container, simplifying the deployment process and ensuring a consistent environment across different machines.
		No newline at end of file