### `MacM3restart.sh` (new file)

```bash
#!/bin/bash

# Restart Redis server
brew services restart redis

# Restart PostgreSQL server
brew services restart postgresql

# Run the main program
python main.py
```

### `MacM3setup.sh` (new file)

```bash
#!/bin/bash

# Check that the 'uploads' directory exists
if [ ! -d "uploads" ]; then
  echo "Error: The 'uploads' directory does not exist. Please create it and add your documents."
  exit 1
fi

# Create a virtual environment
python3 -m venv offlineqa-env || { echo "Error creating virtual environment. Exiting."; exit 1; }

# Activate the virtual environment
source offlineqa-env/bin/activate || { echo "Error activating virtual environment. Exiting."; exit 1; }

# Install dependencies
pip install -r requirements.txt || { echo "Error installing dependencies. Exiting."; exit 1; }

# Download and organize models
mkdir -p models/translation models/llm models/embedding

# Download translation models
python <<'PY' || { echo "Error downloading translation models. Exiting."; exit 1; }
from transformers import MarianMTModel, MarianTokenizer

language_pairs = {
    'es': ('Helsinki-NLP/opus-mt-es-en', 'Helsinki-NLP/opus-mt-en-es'),
    'fr': ('Helsinki-NLP/opus-mt-fr-en', 'Helsinki-NLP/opus-mt-en-fr'),
    'de': ('Helsinki-NLP/opus-mt-de-en', 'Helsinki-NLP/opus-mt-en-de'),
    'th': ('Helsinki-NLP/opus-mt-th-en', 'Helsinki-NLP/opus-mt-en-th'),
    'ru': ('Helsinki-NLP/opus-mt-ru-en', 'Helsinki-NLP/opus-mt-en-ru'),
    'ar': ('Helsinki-NLP/opus-mt-ar-en', 'Helsinki-NLP/opus-mt-en-ar'),
    'pt': ('Helsinki-NLP/opus-mt-pt-en', 'Helsinki-NLP/opus-mt-en-pt'),
    'zh': ('Helsinki-NLP/opus-mt-zh-en', 'Helsinki-NLP/opus-mt-en-zh'),
}

for _, (to_en_model_name, from_en_model_name) in language_pairs.items():
    MarianTokenizer.from_pretrained(to_en_model_name, cache_dir='./models/translation')
    MarianMTModel.from_pretrained(to_en_model_name, cache_dir='./models/translation')
    MarianTokenizer.from_pretrained(from_en_model_name, cache_dir='./models/translation')
    MarianMTModel.from_pretrained(from_en_model_name, cache_dir='./models/translation')
PY

# Download the LLM (Vicuna-7B)
python <<'PY' || { echo "Error downloading Vicuna-7B. Exiting."; exit 1; }
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained('TheBloke/Vicuna-7B-v1.5-GGUF', cache_dir='./models/llm')
model = AutoModelForCausalLM.from_pretrained('TheBloke/Vicuna-7B-v1.5-GGUF', cache_dir='./models/llm')
PY

# Download the SentencePiece tokenizer
python <<'PY' || { echo "Error downloading SentencePiece tokenizer. Exiting."; exit 1; }
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base', cache_dir='./models/embedding')
PY

# Generate embeddings for documents
python <<'PY' || { echo "Error generating embeddings. Exiting."; exit 1; }
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings

# Load and preprocess documents from the 'uploads' folder
loader = DirectoryLoader('./uploads', glob='**/*.txt')
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# Generate embeddings (all-MiniLM-L6-v2 is recommended for Vicuna)
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
db = FAISS.from_documents(texts, embeddings)
db.save_local('data/faiss_index')
PY

# Download the spaCy language model and the coreference resolution data
python -m spacy download en_core_web_sm || { echo "Error downloading spaCy model. Exiting."; exit 1; }
python -m spacy_experimental.coref.download en || { echo "Error downloading spaCy coreference data. Exiting."; exit 1; }

# Install and start Redis
brew install redis || { echo "Error installing Redis. Exiting."; exit 1; }
brew services start redis || { echo "Error starting Redis. Exiting."; exit 1; }

# Install and start PostgreSQL
brew install postgresql || { echo "Error installing PostgreSQL. Exiting."; exit 1; }
brew services start postgresql || { echo "Error starting PostgreSQL. Exiting."; exit 1; }

# Create a database and user in PostgreSQL
psql postgres -c "CREATE DATABASE my_qna_db;" || { echo "Error creating database. Exiting."; exit 1; }
psql postgres -c "CREATE USER my_qna_user WITH ENCRYPTED PASSWORD 'your_password';" || { echo "Error creating user. Exiting."; exit 1; }
psql postgres -c "GRANT ALL PRIVILEGES ON DATABASE my_qna_db TO my_qna_user;" || { echo "Error granting privileges. Exiting."; exit 1; }

echo "Setup complete! You can now run the system using 'python main.py'"
```

### `README.md`

* Gets user input, generates UIDs for new chat sessions, and retrieves context for follow-up questions
* Adds chat requests to the queue for processing

### M3 Mac Quick Setup

1. **Clone the repository:**

   ```bash
   git clone https://code.ornl.gov/6cq/offline-multilingual-question-answering-system
   ```

2. **Navigate to the project directory:**

   ```bash
   cd offline-multilingual-question-answering-system
   ```

3. **Create a document upload folder:**
   * Create a folder named `uploads` at the root level of the project.

4. **Add your documents:**
   * Place your documents (e.g., plain text files) in the `uploads` folder.

5. **Run the setup script:**

   ```bash
   ./MacM3setup.sh
   ```

   This script will perform the following actions:

   * **Create and activate a virtual environment:**
     * It creates a virtual environment named `offlineqa-env` using `python3 -m venv`.
     * It activates the virtual environment using `source offlineqa-env/bin/activate`.
   * **Install dependencies:**
     * It installs the required Python packages listed in `requirements.txt` using `pip install -r requirements.txt`.
   * **Download and organize models:**
     * It creates the necessary directories for storing the models (`models/translation`, `models/llm`, `models/embedding`).
     * It downloads the translation models (MarianMT) for the supported language pairs using the `transformers` library.
     * It downloads the Vicuna-7B LLM and its tokenizer.
     * It downloads the SentencePiece tokenizer.
   * **Prepare your document collection:**
     * It loads and preprocesses documents from the `uploads` folder using `DirectoryLoader` and `RecursiveCharacterTextSplitter`.
     * It generates embeddings for the documents using the `all-MiniLM-L6-v2` embedding model (recommended for Vicuna) and stores them in a FAISS index.
     * It saves the FAISS index to disk (`data/faiss_index`).
   * **Download the spaCy language model and coreference data:**
     * It downloads the `en_core_web_sm` language model for spaCy using `python -m spacy download en_core_web_sm`.
     * It downloads the coreference resolution data for spaCy using `python -m spacy_experimental.coref.download en`.
   * **Set up and start Redis and PostgreSQL:**
     * It installs Redis and PostgreSQL using Homebrew.
     * It starts the Redis and PostgreSQL servers.
     * It creates a database and user in PostgreSQL.

6. **Run the main program:**

   ```bash
   python main.py
   ```

   Note that the virtual environment is active only inside the setup script's own shell; activate it in your current shell first (`source offlineqa-env/bin/activate`) so that `python main.py` picks up the installed dependencies.

   The system will start running, presenting the terminal-based chat interface for user interaction.

### Running the System

1. **Initial Setup:**
   * If you haven't already, follow the installation instructions to set up the system and its dependencies.
2. **Starting the System:**
   * Execute the `MacM3restart.sh` script to start the Redis and PostgreSQL servers and run the main program:

     ```bash
     ./MacM3restart.sh
     ```

   * The system will start running, presenting the terminal-based chat interface for user interaction.
3. **Restarting the System:**
   * If you need to restart the system (e.g., after changing the code or configuration), run the same `MacM3restart.sh` script:

     ```bash
     ./MacM3restart.sh
     ```

   * This script restarts the Redis and PostgreSQL servers and then re-runs the main program.

## Getting Started

### Prerequisites
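Back in `MacM3setup.sh`, the embedding step splits documents into 1000-character chunks with `chunk_overlap=0` before indexing them. As a rough sketch of what that windowing does — the real `RecursiveCharacterTextSplitter` additionally prefers to break at paragraph and sentence separators, whereas this toy version cuts at fixed character offsets:

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> list[str]:
    """Split text into fixed-size windows; each window repeats the last
    chunk_overlap characters of the previous one."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# With chunk_overlap=0 the windows simply tile the document:
print(chunk_text("abcdefgh", chunk_size=4))                    # ['abcd', 'efgh']
# A non-zero overlap carries trailing context into the next chunk:
print(chunk_text("abcdefgh", chunk_size=4, chunk_overlap=2))   # ['abcd', 'cdef', 'efgh', 'gh']
```

A non-zero overlap helps a retrieved chunk stand on its own when a sentence straddles a boundary; the script's choice of `0` keeps the index smaller.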
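In the setup script, `db.save_local('data/faiss_index')` writes the index as a directory of files, and the main program cannot answer questions until that directory exists. A small preflight check along these lines could fail fast with a clearer message than a stack trace; `index_ready` is an illustrative helper, not part of the repository:

```python
import tempfile
from pathlib import Path

def index_ready(index_dir: str = "data/faiss_index") -> bool:
    """True if the FAISS index directory exists and contains at least one file."""
    p = Path(index_dir)
    return p.is_dir() and any(p.iterdir())

# Demonstrate with a throwaway directory standing in for data/faiss_index:
with tempfile.TemporaryDirectory() as tmp:
    print(index_ready(tmp))                 # False: directory exists but is empty
    (Path(tmp) / "index.faiss").touch()
    print(index_ready(tmp))                 # True: saved index files are present
```

In `main.py` one might call this at startup and exit with "run ./MacM3setup.sh first" when it returns `False`.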