Evals is a framework for evaluating OpenAI models and an open-source registry of benchmarks. You can use Evals to create and run evaluations that:
- Use datasets to generate prompts
- Measure the quality of completions provided by OpenAI models
- Compare performance across different datasets and models
The goal of Evals is to make building an eval as easy as possible while requiring as little code as possible. To get started, we recommend following these steps in order:
- Read through this document and follow the setup instructions below.
- Learn how to run existing evaluations: run-evals.md
- Familiarize yourself with the existing evaluation templates: eval-templates.md
- Learn the process for building an eval: build-eval.md
- See an example of implementing custom evaluation logic: custom-eval.md
Setup
To run evals, you will need to set up and specify your OpenAI API key. After you obtain an API key, specify it using the OPENAI_API_KEY environment variable.
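For example, a minimal sketch for a Unix-like shell, using a placeholder in place of your actual key:

```
export OPENAI_API_KEY=<your-api-key>
```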
Downloading evals
The Evals registry is stored using Git-LFS. Once you have downloaded and installed LFS, you can fetch the evals with:
```
git lfs fetch --all
git lfs pull
```
If you only want to fetch the data for select evals, you can do so via:
```
git lfs fetch --include=evals/registry/data/${your eval}
git lfs pull
```
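For instance, assuming a hypothetical eval whose data lives under `evals/registry/data/my-eval`, the commands would be:

```
git lfs fetch --include=evals/registry/data/my-eval
git lfs pull
```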
Making evals
If you are going to create evals, we suggest cloning this repo directly from GitHub and installing the requirements with `pip install -e .`. Using `-e`, changes you make to your eval will be reflected immediately without having to reinstall.
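Once installed, you can try running an existing eval as described in run-evals.md. A minimal sketch, assuming the repo's `oaieval` command-line tool, a completion model such as `gpt-3.5-turbo`, and the `test-match` example eval from the registry:

```
oaieval gpt-3.5-turbo test-match
```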