Commit 76cb0600 authored by Thomasson, Noah

Clarify README and add top_p parameter

parent 6cd3a897
@@ -16,7 +16,7 @@ and generating all different permutations of the sequence. In addition to permut
For this project, we are trying to enhance and evaluate performance on GSM8k, a mathematical reasoning benchmark of word problems that is widely used in the literature. After a set of permutations is generated from an initial sequence of prompt phrases, each permutation is evaluated to assess its performance. The prompt is evaluated against the ~1300 question-answer pairs of the GSM8k test set. First, the `{QUESTION HERE}` placeholder is filled in with a specific GSM8k question, and the resulting string is passed to the LLM to generate a response. The response is then compared against the reference answer; if the LLM answered correctly, that pair is scored 1, otherwise 0. The accuracy of the prompt is then the average score across all question-answer pairs in the GSM8k dataset.
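A minimal sketch of that scoring loop (the `score_prompt` and `extract_answer` names and the client call are illustrative placeholders, not the project's actual code):

```python
import re

def extract_answer(text: str) -> str:
    """Pull the last number out of a string; GSM8k reference answers are numeric."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return numbers[-1] if numbers else ""

def score_prompt(prompt_template: str, dataset: list, client) -> float:
    """Average 0/1 score of one prompt permutation over GSM8k question-answer pairs."""
    correct = 0
    for question, reference_answer in dataset:
        # Fill the {QUESTION HERE} placeholder with a concrete GSM8k question.
        prompt = prompt_template.replace("{QUESTION HERE}", question)
        response = client.generate(input_text=prompt, temperature=0.0, top_p=0.0)
        # Score 1 if the model's final numeric answer matches the reference, else 0.
        correct += int(extract_answer(response) == extract_answer(reference_answer))
    return correct / len(dataset)
```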
-At the very end, an output file is generated with the LLM scores for every prompt, so that we can find the optimal prompt format for GSM8k-like mathematical reasoning tasks, and compare different trends between well-performing and basly-performing prompts.
+At the very end, an output file is generated with the LLM scores for every prompt, so that we can find the optimal prompt format for GSM8k-like mathematical reasoning tasks, and compare different trends between well-performing and badly-performing prompts.
The LLM used in this project is llama3, served by an Ollama backend; however, any other Ollama-hosted model can be used just as easily. To use a different LLM API, replace the OllamaClient class in `prompt_utils.py` with a client for that API.
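Only the `generate(input_text=..., temperature=..., top_p=...)` call is visible in this commit, so a replacement client mainly needs to expose that same method. The class below is a hypothetical example against a generic HTTP completion endpoint, not a real API:

```python
import requests

class CustomLLMClient:
    """Hypothetical stand-in for OllamaClient; endpoint and response schema are placeholders."""

    def __init__(self, url: str, model: str):
        self.url = url
        self.model = model

    def generate(self, input_text: str, temperature: float = 0.0, top_p: float = 0.0) -> str:
        payload = {
            "model": self.model,
            "prompt": input_text,
            "temperature": temperature,
            "top_p": top_p,
        }
        resp = requests.post(f"{self.url}/generate", json=payload, timeout=120)
        resp.raise_for_status()
        return resp.json()["text"]  # adjust to whatever your API actually returns
```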
@@ -122,6 +122,9 @@ python3 ./main.py --model llama3 --url 127.0.0.1:11434 --parallel 1 --data-file
Note that the log and output file paths should probably be changed for your use case.
#### Evaluation config.
See the contents of `config.json`. The `temperature` and `top_p` entries are passed to the LLM as sampling parameters when it generates an output from a prompt. See the [Ollama parameter docs](https://github.com/ollama/ollama/blob/main/docs/modelfile.md#PARAMETER) for information on what each parameter means.
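For reference, a non-streaming Ollama generation request carrying these two options looks roughly like this; the `OllamaClient` internals are not shown in this commit, so the snippet is a sketch rather than the project's own code:

```python
import json
import requests

with open("config.json") as f:
    eval_params = json.load(f)["eval_parameters"]

resp = requests.post(
    "http://127.0.0.1:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "What is 12 * 7? Answer with a number.",
        "stream": False,
        # Ollama accepts sampling parameters via the "options" field.
        "options": {
            "temperature": eval_params["temperature"],
            "top_p": eval_params["top_p"],
        },
    },
)
print(resp.json()["response"])
```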
# Troubleshooting
### module not found
config.json
{
    "eval_parameters": {
-        "temperature": 0.0
+        "temperature": 0.0,
+        "top_p": 0.0
    }
-}
\ No newline at end of file
+}
@@ -73,7 +73,8 @@ def run_task(idx: int, data: list, prompts: list, client: OllamaClient, logger,
llm_response = client.generate(
    input_text=prompt,
-    temperature=config["eval_parameters"]["temperature"]
+    temperature=config["eval_parameters"]["temperature"],
+    top_p=config["eval_parameters"]["top_p"]
)
# validate and save llm response