Commit aeec94a7 authored by Harris, Tyrone's avatar Harris, Tyrone

added Mac quick setup scripts

parent dd048b77

MacM3restart.sh

0 → 100644
+10 −0
#!/bin/bash

# Restart Redis server
brew services restart redis

# Restart PostgreSQL server
brew services restart postgresql

# Activate the project's virtual environment so main.py sees its dependencies
source offlineqa-env/bin/activate

# Run the main program
python main.py
 No newline at end of file

MacM3setup.sh

0 → 100644
+95 −0
#!/bin/bash

# Check if the 'uploads' directory exists
if [ ! -d "uploads" ]; then
  echo "Error: The 'uploads' directory does not exist. Please create it and add your documents."
  exit 1
fi

# Create a virtual environment
python3 -m venv offlineqa-env || { echo "Error creating virtual environment. Exiting."; exit 1; }

# Activate the virtual environment
source offlineqa-env/bin/activate || { echo "Error activating virtual environment. Exiting."; exit 1; }

# Install dependencies
pip install -r requirements.txt || { echo "Error installing dependencies. Exiting."; exit 1; }

# Download and organize models
mkdir -p models/translation models/llm models/embedding

# Download translation models
# (a quoted heredoc avoids bash mangling quotes inside the inline Python)
python - <<'EOF' || { echo "Error downloading translation models. Exiting."; exit 1; }
from transformers import MarianMTModel, MarianTokenizer

language_pairs = {
    'es': ('Helsinki-NLP/opus-mt-es-en', 'Helsinki-NLP/opus-mt-en-es'),
    'fr': ('Helsinki-NLP/opus-mt-fr-en', 'Helsinki-NLP/opus-mt-en-fr'),
    'de': ('Helsinki-NLP/opus-mt-de-en', 'Helsinki-NLP/opus-mt-en-de'),
    'th': ('Helsinki-NLP/opus-mt-th-en', 'Helsinki-NLP/opus-mt-en-th'),
    'ru': ('Helsinki-NLP/opus-mt-ru-en', 'Helsinki-NLP/opus-mt-en-ru'),
    'ar': ('Helsinki-NLP/opus-mt-ar-en', 'Helsinki-NLP/opus-mt-en-ar'),
    'pt': ('Helsinki-NLP/opus-mt-pt-en', 'Helsinki-NLP/opus-mt-en-pt'),
    'zh': ('Helsinki-NLP/opus-mt-zh-en', 'Helsinki-NLP/opus-mt-en-zh'),
}

# Fetch both directions (xx->en and en->xx) for each language pair
for _, (to_en_model_name, from_en_model_name) in language_pairs.items():
    MarianTokenizer.from_pretrained(to_en_model_name, cache_dir='./models/translation')
    MarianMTModel.from_pretrained(to_en_model_name, cache_dir='./models/translation')
    MarianTokenizer.from_pretrained(from_en_model_name, cache_dir='./models/translation')
    MarianMTModel.from_pretrained(from_en_model_name, cache_dir='./models/translation')
EOF

# Download the LLM (Vicuna-7B)
python - <<'EOF' || { echo "Error downloading Vicuna-7B. Exiting."; exit 1; }
from transformers import AutoTokenizer, AutoModelForCausalLM

# Note: GGUF builds (e.g. TheBloke/Vicuna-7B-v1.5-GGUF) are llama.cpp files
# that AutoModelForCausalLM cannot load; use the HF-format weights instead.
tokenizer = AutoTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5', cache_dir='./models/llm')
model = AutoModelForCausalLM.from_pretrained('lmsys/vicuna-7b-v1.5', cache_dir='./models/llm')
EOF

# Download the SentencePiece tokenizer (XLM-RoBERTa)
python - <<'EOF' || { echo "Error downloading SentencePiece tokenizer. Exiting."; exit 1; }
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base', cache_dir='./models/embedding')
EOF

# Generate embeddings for documents
python - <<'EOF' || { echo "Error generating embeddings. Exiting."; exit 1; }
from langchain.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import HuggingFaceEmbeddings

# Load and preprocess documents from the 'uploads' folder
loader = DirectoryLoader('./uploads', glob='**/*.txt')
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# Generate embeddings (all-MiniLM-L6-v2 is recommended for Vicuna)
embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
db = FAISS.from_documents(texts, embeddings)
db.save_local('data/faiss_index')
EOF

# Download the spaCy language model and the coreference-resolution data
python -m spacy download en_core_web_sm || { echo "Error downloading spaCy model. Exiting."; exit 1; }
python -m spacy_experimental.coref.download en || { echo "Error downloading spaCy coreference data. Exiting."; exit 1; }

# Install and start Redis
brew install redis || { echo "Error installing Redis. Exiting."; exit 1; }
brew services start redis || { echo "Error starting Redis. Exiting."; exit 1; }

# Install and start PostgreSQL
brew install postgresql || { echo "Error installing PostgreSQL. Exiting."; exit 1; }
brew services start postgresql || { echo "Error starting PostgreSQL. Exiting."; exit 1; }

# Create a database and user in PostgreSQL
psql postgres -c "CREATE DATABASE my_qna_db;" || { echo "Error creating database. Exiting."; exit 1; }
psql postgres -c "CREATE USER my_qna_user WITH ENCRYPTED PASSWORD 'your_password';" || { echo "Error creating user. Exiting."; exit 1; }
psql postgres -c "GRANT ALL PRIVILEGES ON DATABASE my_qna_db TO my_qna_user;" || { echo "Error granting privileges. Exiting."; exit 1; }

echo "Setup complete! You can now run the system using 'python main.py'"
 No newline at end of file
+96 −0
@@ -133,6 +133,102 @@ The system's modular architecture comprises interconnected components, each fulf
    * Gets user input, generates UIDs for new chat sessions, and retrieves context for follow-up questions
    * Adds chat requests to the queue for processing

### M3 Mac Quick Setup

1.  **Clone the repository:**

    ```bash
    git clone https://code.ornl.gov/6cq/offline-multilingual-question-answering-system
    ```

2.  **Navigate to the project directory:**

    ```bash
    cd offline-multilingual-question-answering-system
    ```

3.  **Create a document upload folder:**

    *  Create a folder named `uploads` at the root level of the project.

4.  **Add your documents:**

    *  Place your documents (e.g., plain text files) in the `uploads` folder.

5.  **Run the setup script:**

    ```bash
    chmod +x MacM3setup.sh  # the script is committed without the execute bit
    ./MacM3setup.sh
    ```

    This script will perform the following actions:

    *   **Create and activate a virtual environment:**

        *   It creates a virtual environment named `offlineqa-env` using `python3 -m venv`.
        *   It activates the virtual environment using `source offlineqa-env/bin/activate`.

    *   **Install dependencies:**

        *   It installs the required Python packages listed in `requirements.txt` using `pip install -r requirements.txt`.

    *   **Download and organize models:**

        *   It creates the necessary directories for storing the models (`models/translation`, `models/llm`, `models/embedding`).
        *   It downloads the translation models (MarianMT) for the supported language pairs using the `transformers` library.
        *   It downloads the Vicuna-7B LLM and its tokenizer.
        *   It downloads the SentencePiece tokenizer.

    *   **Prepare your document collection:**

        *   It loads and preprocesses documents from the `uploads` folder using `DirectoryLoader` and `RecursiveCharacterTextSplitter`.
        *   It generates embeddings for the documents using the `all-MiniLM-L6-v2` embedding model (recommended for Vicuna) and stores them in a FAISS index.
        *   It saves the FAISS index to disk (`data/faiss_index`).

    *   **Download the spaCy language model and the coreference data:**

        *   It downloads the `en_core_web_sm` language model for spaCy using `python -m spacy download en_core_web_sm`.
        *   It downloads the coreference resolution data for spaCy using `python -m spacy_experimental.coref.download en`.

    *   **Set up and start Redis and PostgreSQL:**

        *   It installs Redis and PostgreSQL using Homebrew.
        *   It starts the Redis and PostgreSQL servers.
        *   It creates a database and user in PostgreSQL.
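
    The document-chunking step above can be sketched with a minimal fixed-size splitter. This is a simplified stand-in for `RecursiveCharacterTextSplitter` (which additionally prefers paragraph and sentence boundaries), shown only to illustrate the `chunk_size=1000`, `chunk_overlap=0` settings:

    ```python
    def split_text(text, chunk_size=1000, chunk_overlap=0):
        """Naive fixed-size splitter; the step shrinks when chunks overlap."""
        step = chunk_size - chunk_overlap
        return [text[i:i + chunk_size] for i in range(0, len(text), step)]

    chunks = split_text("x" * 2500)
    print([len(c) for c in chunks])  # → [1000, 1000, 500]
    ```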

6.  **Run the main program:**

    ```bash
    python main.py
    ```

    The system will start running, presenting the terminal-based chat interface for user interaction.


### Running the System

1.  **Initial Setup:**
    *   If you haven't already, follow the installation instructions to set up the system and its dependencies.

2.  **Starting the System:**
    *   Execute the `MacM3restart.sh` script to start the Redis and PostgreSQL servers and run the main program:

        ```bash
        chmod +x MacM3restart.sh  # first run only: the script is committed without the execute bit
        ./MacM3restart.sh
        ```

    *   The system will start running, presenting the terminal-based chat interface for user interaction.

3.  **Restarting the System:**
    *   If you need to restart the system (e.g., after making changes to the code or configuration), you can use the same `MacM3restart.sh` script:

        ```bash
        ./MacM3restart.sh
        ```

    *   This script will restart the Redis and PostgreSQL servers and then re-run the main program.
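
    If the restart script fails, a quick way to check whether the servers are actually listening is a TCP probe of their default local ports (6379 for Redis, 5432 for PostgreSQL). This is a stdlib sketch, not part of the project's scripts:

    ```python
    import socket

    def service_up(host="127.0.0.1", port=6379, timeout=1.0):
        """Return True if a TCP connection to host:port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Probe the default local ports used by the setup above
    for name, port in [("redis", 6379), ("postgresql", 5432)]:
        print(f"{name}: {'up' if service_up(port=port) else 'down'}")
    ```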


## Getting Started

### Prerequisites