@@ -223,6 +223,73 @@ The system is built upon a modular architecture, orchestrated using Langgraph, a
This will install all the necessary Python packages listed in `requirements.txt` within the virtual environment, ensuring that your project has its own isolated set of dependencies.
6. **Download and organize models:**
* **Create the `models` directory and its subdirectories:**
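As a minimal sketch, assuming the layout used throughout this guide (`models/llm` for the language model and `models/embedding` for the embedding model):
```shell
# Create the model directories referenced in the steps below.
mkdir -p models/llm models/embedding
```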
* **Ensure you have enough disk space:** The Vicuna-7B model is quite large (around 13GB). Make sure you have sufficient disk space available before downloading.
* **Use the `transformers` library to download:**
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Download the full-precision Vicuna-7B checkpoint into ./models/llm.
tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5", cache_dir="./models/llm")
model = AutoModelForCausalLM.from_pretrained("lmsys/vicuna-7b-v1.5", cache_dir="./models/llm")
```
This code will download both the tokenizer and the model weights to the `./models/llm` directory.
* **Consider using a quantized version:** If you're running the system on a CPU or have limited memory, you might want to explore using a quantized version of Vicuna-7B, which can significantly reduce its memory footprint and improve inference speed. Refer to the Vicuna-7B model documentation for instructions on how to obtain and use a quantized version.
* **Download the embedding model:** Place your offline embedding model, including its SentencePiece tokenizer, in the `./models/embedding` directory.
7. **Prepare your document collection:**
* Organize your documents in a suitable format (e.g., plain text files) within the `data/documents` directory.
* Use an offline embedding model (e.g., Sentence Transformers) to generate embeddings for your documents and store them in a local vector database (e.g., FAISS) for efficient retrieval. You can refer to the Langchain documentation or other resources for guidance on how to perform document embedding and indexing.