Use the Evaluation tab to test and validate AI behavior within a domain. Here you create sets of prompts that simulate user interactions and provide the expected SQL output alongside each prompt, so the AI's accuracy in translating natural language into correct database queries can be measured directly. After running evaluations, review the results to identify areas for improvement.
UI showing the Evaluation tab

Evaluation Sets

Within this tab, create and manage collections of prompts used to evaluate AI behavior. Add a new evaluation set by providing a name and a JSON definition that contains the prompts and, optionally, the expected SQL for each prompt. Existing evaluation sets are listed in this view, allowing you to review and manage them over time.
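The exact schema of the JSON definition depends on your installation; the field names below (prompts, prompt, expected_sql) and the table and column names in the sample query are illustrative assumptions, not a documented contract. A minimal sketch of an evaluation set might look like this:

```json
{
  "prompts": [
    {
      "prompt": "How many orders were placed in 2023?",
      "expected_sql": "SELECT COUNT(*) FROM orders WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';"
    },
    {
      "prompt": "List the five customers with the highest total spend."
    }
  ]
}
```

The second prompt omits expected_sql to reflect that expected SQL is optional for a prompt.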

Evaluation Runs

This view lists completed evaluation runs and their results, allowing you to review past executions, compare outcomes, and track how AI performance changes over time.
To learn how to create an Evaluation Set, understand indicators, and review the results, consult the related article Use Evaluations Sets and Runs.