# Use Evaluation Sets and Runs

This article describes Evaluation Sets and Evaluation Runs, the tools you use to test and validate how well the AI performs on a Domain.

## Evaluation Sets

In the Evaluation Sets tab, you can create specific sets of prompts that simulate user interactions and provide expected SQL outputs alongside these prompts. This allows for a direct comparison to see how accurately the AI converts natural language into database queries.

To create an Evaluation Set:

1. Open a Domain and click on the **Evaluation** tab.
2. Navigate to the **Evaluation Sets** sub-tab.
3. Click **Add Evaluation**.

<img alt="Image showing the Evaluations tab" lightAlt="Image showing the Evaluations tab" darkAlt="Image showing the Evaluations tab" src="https://mintcdn.com/wisdomai/zeXGFUSd1mmTmFn0/images/add-evaluation-set-01.png?fit=max&auto=format&n=zeXGFUSd1mmTmFn0&q=85&s=542614cc00e8a6178ebe8f3fee963e11" className="dark:hidden" width="1920" height="800" data-path="images/add-evaluation-set-01.png" />

<img alt="Image showing the Evaluations tab" lightAlt="Image showing the Evaluations tab" darkAlt="Image showing the Evaluations tab" src="https://mintcdn.com/wisdomai/FPeFp9c8fz1r_sxU/dark-img/add-evaluation.png?fit=max&auto=format&n=FPeFp9c8fz1r_sxU&q=85&s=b48c12d89bbad9b39b62d7bd902299f4" className="hidden dark:block" width="1872" height="664" data-path="dark-img/add-evaluation.png" />

4. Fill out the **Create Evaluation Set** form: provide a **Name** for the set and a natural language query. Optionally, you can import a CSV file and provide ground truth SQL.
5. Click **Create Evaluation Set**. The new Evaluation Set appears in the list.

<img alt="Image showing the Add Evaluations form" lightAlt="Image showing the Add Evaluations form" darkAlt="Image showing the Add Evaluations form" src="https://mintcdn.com/wisdomai/AlmJTV-THzwWGfCm/images/new-ui-evaluetion-set.png?fit=max&auto=format&n=AlmJTV-THzwWGfCm&q=85&s=0fb462e1899850aef46dfb858885340c" className="dark:hidden" width="860" height="630" data-path="images/new-ui-evaluetion-set.png" />

<img alt="Image showing the Add Evaluations form" lightAlt="Image showing the Add Evaluations form" darkAlt="Image showing the Add Evaluations form" src="https://mintcdn.com/wisdomai/FPeFp9c8fz1r_sxU/dark-img/add-evaluation-form.png?fit=max&auto=format&n=FPeFp9c8fz1r_sxU&q=85&s=63f6e17c71ce3cb6f562131832559faf" className="hidden dark:block" width="1287" height="771" data-path="dark-img/add-evaluation-form.png" />
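If you plan to use the optional CSV import, a short script can help you assemble the file. The following is a minimal sketch only: the column names `prompt` and `ground_truth_sql` are assumptions, not the documented import format, and the second case is a made-up example. Check the columns the **Create Evaluation Set** form expects before uploading.

```python
import csv
import io

# Hypothetical evaluation cases: a natural language prompt plus its expected SQL.
# The column names below are assumptions, not the documented import format.
CASES = [
    {
        "prompt": "Calculate total revenue for closed won opportunities using ACV.",
        "ground_truth_sql": "SELECT SUM(amount) FROM Opportunity",
    },
    {
        "prompt": "How many opportunities are there?",
        "ground_truth_sql": "SELECT COUNT(*) FROM Opportunity",
    },
]


def build_csv(cases):
    """Return CSV text with a header row and one row per evaluation case."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["prompt", "ground_truth_sql"])
    writer.writeheader()
    writer.writerows(cases)
    return buffer.getvalue()


csv_text = build_csv(CASES)
```

Using the `csv` module rather than joining strings by hand keeps prompts that contain commas or quotes correctly escaped.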

## Evaluation Runs

Evaluation Runs are where the AI processes your defined Evaluation Sets. After running these evaluations, you can review the results to identify areas for improvement.

To run an Evaluation, go to the **Evaluation Sets** tab, select an Evaluation Set, and click the **run icon**. The results appear in the **Evaluation Runs** tab.

<img alt="Run Evaluation Pn" lightAlt="Run Evaluation Pn" darkAlt="Run Evaluation Pn" src="https://mintcdn.com/wisdomai/AlmJTV-THzwWGfCm/images/run-action-icon.png?fit=max&auto=format&n=AlmJTV-THzwWGfCm&q=85&s=6f63d3ed3291d71abcd04f0f8cec4864" className="dark:hidden" width="1687" height="776" data-path="images/run-action-icon.png" />

<img alt="Run Evaluation Pn" lightAlt="Run Evaluation Pn" darkAlt="Run Evaluation Pn" src="https://mintcdn.com/wisdomai/FPeFp9c8fz1r_sxU/dark-img/run-evaluation.png?fit=max&auto=format&n=FPeFp9c8fz1r_sxU&q=85&s=fd8b1e7c1db775efe11d44a501b06c20" className="hidden dark:block" width="1744" height="597" data-path="dark-img/run-evaluation.png" />

### Evaluation Run indicators

Evaluation Run Indicators provide a concise overview of the Run's progress and outcome. These indicators offer immediate feedback on the Evaluation's status and score, detailing how well the AI's generated responses matched the expected results. They are:

* **Status:** Signals whether the run is still in progress or has completed.
  * Running: The evaluation run is currently in progress.
  * Completed: The evaluation run has finished successfully.
* **Score:** Reflects the result of the completed Evaluation. It tells you how many conversations passed based on the predefined **evaluation criteria** (that is, the prompt and expected SQL result you added in the Evaluation Set modal).

<Tip>
  The Score evaluation criteria measure how closely the generated answer matches the expected SQL result you defined in the Evaluation Set.
</Tip>

<img alt="Evaluation Runs Indicatorsv2 Pn" lightAlt="Evaluation Runs Indicatorsv2 Pn" darkAlt="Evaluation Runs Indicatorsv2 Pn" src="https://mintcdn.com/wisdomai/oq67V3T9jhHN9WJO/images/evaluation-runs-indicatorsv2.png?fit=max&auto=format&n=oq67V3T9jhHN9WJO&q=85&s=c66d19d9df5da79beda001cadd152884" className="dark:hidden" width="1455" height="786" data-path="images/evaluation-runs-indicatorsv2.png" />

<img alt="Evaluation Runs Indicatorsv2 Pn" lightAlt="Evaluation Runs Indicatorsv2 Pn" darkAlt="Evaluation Runs Indicatorsv2 Pn" src="https://mintcdn.com/wisdomai/FPeFp9c8fz1r_sxU/dark-img/evaluation-runs-indicatorsv2.png?fit=max&auto=format&n=FPeFp9c8fz1r_sxU&q=85&s=5e065107a2f00a2a149a1cd6f19de7ed" className="hidden dark:block" width="1756" height="854" data-path="dark-img/evaluation-runs-indicatorsv2.png" />
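As a rough illustration of how a Score of this kind can be read, the sketch below counts passed conversations out of the total. The function name and the `passed/total` summary format are assumptions made for this example; they are not taken from the product.

```python
def score(passed_flags):
    """Return (passed, total) for a completed run.

    `passed_flags` holds one boolean per conversation: True if it met
    the evaluation criteria, False otherwise.
    """
    passed = sum(1 for flag in passed_flags if flag)
    return passed, len(passed_flags)


# Three conversations, two of which met the criteria.
passed, total = score([True, False, True])
summary = f"{passed}/{total}"
```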

### Evaluation Run report

When you click the **View Report** option, you will see comprehensive details about how the evaluation run performed. Here's a breakdown of the information you'll find:

* **View Domain:** A link that allows you to navigate to the specific domain that was evaluated.
* **Soft Match:** This **Score** indicates the overall performance of the Evaluation. It shows how many of the evaluation criteria were met out of the total. It is called Soft Match because a result can count as a match even if it is not identical to the expected (provided) SQL.
* **Individual Session Details:** The report organizes the evaluation results by individual sessions or queries (e.g., Session 1, Session 2).

<img alt="View Report 01v2 Gi" lightAlt="View Report 01v2 Gi" darkAlt="View Report 01v2 Gi" src="https://mintcdn.com/wisdomai/wAEh0bG6TcD8vn3h/images/view-report-01v2-1.gif?s=8244d1ae99b71c5930d44fcc5aceff91" className="dark:hidden" width="1766" height="868" data-path="images/view-report-01v2-1.gif" />

<img alt="View Report 01v2 Gi" lightAlt="View Report 01v2 Gi" darkAlt="View Report 01v2 Gi" src="https://mintcdn.com/wisdomai/qJRFF44IeHBfP-Jl/images/view-report-dk.gif?s=67c454c9a772f125fd82e9063c7f486c" className="hidden dark:block" width="1758" height="946" data-path="images/view-report-dk.gif" />
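To see why a match can be "soft", consider two queries whose text differs but whose results are the same. The sketch below is an illustration under assumptions, not a description of how Soft Match is computed: it uses an in-memory SQLite database, a made-up `Opportunity` table, and a hypothetical `results_match` helper that compares returned rows instead of query text.

```python
import sqlite3


def results_match(conn, expected_sql, generated_sql):
    """Compare the rows two queries return, ignoring row order."""
    expected = sorted(conn.execute(expected_sql).fetchall())
    generated = sorted(conn.execute(generated_sql).fetchall())
    return expected == generated


# Made-up sample data for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Opportunity (stage TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO Opportunity VALUES (?, ?)",
    [("Closed Won", 100.0), ("Closed Won", 50.0), ("Open", 25.0)],
)

expected_sql = "SELECT SUM(amount) FROM Opportunity"
# Different text (table alias, column alias) but the same result.
generated_sql = "SELECT SUM(o.amount) AS total_amount FROM Opportunity AS o"

same_text = expected_sql == generated_sql
same_result = results_match(conn, expected_sql, generated_sql)
```

A strict text comparison would reject the generated query here, while a result-based comparison accepts it.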

For each session, the report is broken down into several components, allowing you to analyze the AI's performance thoroughly.

* **Session Title:** This states the query or task that was evaluated for that particular session (e.g., "Calculate total revenue for closed won opportunities using ACV.").
* **Evaluation Details:** This expandable section provides specific insights into the session's outcome. The core components are detailed in the table below.

| Component            | Description                                                                                                     |
| :------------------- | :-------------------------------------------------------------------------------------------------------------- |
| **Prompt**           | The specific input prompt that was used for the session.                                                        |
| **Manual Score**     | An option for you to manually score the Evaluation (✅ or ❌), which overrides the automated score.               |
| **Automated Score**  | The system automatically assigns the score. This corresponds to the **Soft Match**.                             |
| **Ground Truth**     | The expected or correct outcome, typically the ideal SQL query (e.g., `SELECT SUM(amount) FROM Opportunity`).   |
| **Generated Result** | The output produced by the system, including the generated SQL query and the final result (e.g., `"$137.55M"`). |

<img alt="Gif showing the Session Details" lightAlt="Gif showing the Session Details" darkAlt="Gif showing the Session Details" src="https://mintcdn.com/wisdomai/tFQ8vc64nav1FvIc/images/evaluation-inspec-1.gif?s=11a4472f40e4d5ad6578e4a8d73650d7" className="dark:hidden" width="1794" height="954" data-path="images/evaluation-inspec-1.gif" />

<img alt="Gif showing the Session Details" lightAlt="Gif showing the Session Details" darkAlt="Gif showing the Session Details" src="https://mintcdn.com/wisdomai/lEU19OFfT-fznjCT/images/evaluation-inspec-v2.gif?s=2717f7fd76cf21ef4cd731cf09c0091f" className="hidden dark:block" width="1772" height="856" data-path="images/evaluation-inspec-v2.gif" />
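The relationship between the two scores in the table can be sketched as a simple precedence rule. This is a minimal sketch, assuming each score is a pass/fail value and that an unset Manual Score is represented as `None`; the function name is hypothetical.

```python
from typing import Optional


def effective_score(automated: bool, manual: Optional[bool] = None) -> bool:
    """Return the score that applies to a session.

    A Manual Score, when set, overrides the Automated Score.
    """
    return automated if manual is None else manual
```

For example, `effective_score(automated=False, manual=True)` yields a pass, because the manual review takes precedence.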

<Warning>
  Providing incorrect SQL leads to a syntax or semantic error. The system displays a red message describing the error, often including the location in the query where the problem occurred, to help you correct it.

  <img alt="Example showing an Incorrect Sql in Red" lightAlt="Example showing an Incorrect Sql in Red" darkAlt="Example showing an Incorrect Sql in Red" src="https://mintcdn.com/wisdomai/82xgEVWUP7cwhDHG/images/incorrect-sql-example.png?fit=max&auto=format&n=82xgEVWUP7cwhDHG&q=85&s=91e3bf5c888e5bda14a1a9c9966b2c3f" className="dark:hidden" width="1435" height="584" data-path="images/incorrect-sql-example.png" />

  <img alt="Example showing an Incorrect Sql in Red" lightAlt="Example showing an Incorrect Sql in Red" darkAlt="Example showing an Incorrect Sql in Red" src="https://mintcdn.com/wisdomai/FPeFp9c8fz1r_sxU/dark-img/incorrect-sql-example.png?fit=max&auto=format&n=FPeFp9c8fz1r_sxU&q=85&s=9ecbdd6da73f7f316837256f52b5935d" className="hidden dark:block" width="1582" height="501" data-path="dark-img/incorrect-sql-example.png" />
</Warning>
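You can catch many such errors before adding ground truth SQL by compiling the query locally. The sketch below is one possible approach, assuming an in-memory SQLite database with a made-up `Opportunity` table and a hypothetical `check_sql` helper. SQLite's dialect can differ from your data source, so a query that passes here may still fail there, and the reverse.

```python
import sqlite3


def check_sql(conn, sql):
    """Return None if SQLite can compile the query, else the error message.

    EXPLAIN asks SQLite to compile the statement without running it.
    """
    try:
        conn.execute(f"EXPLAIN {sql}")
    except sqlite3.Error as exc:
        return str(exc)
    return None


# Made-up schema for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Opportunity (stage TEXT, amount REAL)")

ok = check_sql(conn, "SELECT SUM(amount) FROM Opportunity")
syntax_error = check_sql(conn, "SELEC SUM(amount) FROM Opportunity")
semantic_error = check_sql(conn, "SELECT SUM(amout) FROM Opportunity")
```

The second query fails on a misspelled keyword (a syntax error), and the third fails on a column that does not exist (a semantic error).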

* **Conversation:** You can expand this section to review the entire conversational exchange related to that specific session. This includes:
  * **AI Workstream:** Offers a look into the AI's process, showing the tool it selected, the examples it referenced, and the step-by-step plan it followed to generate the response.
  * **Reviewed Status:** Confirms whether the response has been reviewed and summarizes the outcome (e.g., "The user received a complete response... so no further action or information is needed.").

<img alt="Show Chat V2" lightAlt="Gif showing the Conversation in Detail" darkAlt="Gif showing the Conversation in Detail" title="Show Chat V2" src="https://mintcdn.com/wisdomai/baQ8c6mHAyJCaOOI/images/show-chat-1f.gif?s=76ee268aede1f7d659be813f838360b4" className="dark:hidden" width="1794" height="952" data-path="images/show-chat-1f.gif" />

<img alt="Show Chat V2" lightAlt="Gif showing the Conversation in Detail" darkAlt="Gif showing the Conversation in Detail" title="Show Chat V2" src="https://mintcdn.com/wisdomai/N3P0pgrbnb3fiq8-/images/show-chat-v3.gif?s=d0c0586e666aab48e7ef56442f48010e" className="hidden dark:block" width="1760" height="796" data-path="images/show-chat-v3.gif" />

Navigating the report helps you understand the system's processing, verify generated results, and review the underlying conversational logic.

## Next steps

<CardGroup cols={2}>
  <Card title="Manage Domains" icon="sliders" href="/setting-up-wisdom-ai/manage-domains/understand-domains">
    Manage and customize your data domains to refine context and improve query results.
  </Card>

  <Card title="Auditing" icon="clipboard-list" href="/using-wisdom-ai-everyday/auditing">
    Organize insights by tagging chats, navigating history, and sharing vetted answers with your team.
  </Card>

  <Card title="Basic Tutorial: Connect and Test" icon="rocket" href="/setting-up-wisdom-ai/basic-tutorial-connect-and-test">
    Walk through the initial setup to connect a data source and run your first query.
  </Card>

  <Card title="Advanced Data Modeling" icon="sitemap" href="/setting-up-wisdom-ai/advanced-data-modeling-creating-context">
    Define relationships and context in your data to enable more powerful analysis.
  </Card>
</CardGroup>
