How to evaluate on a specific dataset version

Before reading this guide, it may help to read the guides on versioning datasets and on fetching examples.

Using `list_examples`

`evaluate` / `aevaluate` accept an iterable of examples as the `data` argument, which makes it possible to evaluate on a particular version of a dataset. Use `list_examples` / `listExamples` with `as_of` / `asOf` to fetch the examples that belong to a given version tag, then pass the result to `data`.

Python

from langsmith import Client

ls_client = Client()

# Assumes actual outputs have a 'class' key.
# Assumes example outputs have a 'label' key.
def correct(outputs: dict, reference_outputs: dict) -> bool:
    return outputs["class"] == reference_outputs["label"]

results = ls_client.evaluate(
    lambda inputs: {"class": "Not toxic"},
    # Pass in filtered data here:
    data=ls_client.list_examples(
        dataset_name="Toxic Queries",
        as_of="latest",  # specify version here
    ),
    evaluators=[correct],
)
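The example above pins the run to the `latest` tag. As a minimal sketch of the same pattern for other versions, the hypothetical helper below (`examples_at_version` is not part of the SDK) wraps `list_examples` so that the version can be either a tag or a timestamp. It assumes that `as_of` accepts a `datetime` as well as a tag string, that naive timestamps should be read as UTC, and that a tag named `"prod"` exists on the dataset; adjust these to your setup.

```python
from datetime import datetime, timezone
from typing import Any, Iterable, Union


def examples_at_version(
    client: Any, dataset_name: str, version: Union[str, datetime]
) -> Iterable[Any]:
    """Hypothetical helper: fetch a dataset's examples as of a tag or timestamp.

    `client` is expected to expose `list_examples(dataset_name=..., as_of=...)`,
    as the LangSmith `Client` in the example above does.
    """
    if isinstance(version, datetime) and version.tzinfo is None:
        # Assumption: treat naive timestamps as UTC so the cutoff is unambiguous.
        version = version.replace(tzinfo=timezone.utc)
    return client.list_examples(dataset_name=dataset_name, as_of=version)


def correct(outputs: dict, reference_outputs: dict) -> bool:
    # Same evaluator as above: compare the predicted class to the reference label.
    return outputs["class"] == reference_outputs["label"]


def evaluate_on_version(client: Any, dataset_name: str, version: Union[str, datetime]):
    """Run the placeholder target from the example above against one dataset version."""
    return client.evaluate(
        lambda inputs: {"class": "Not toxic"},
        data=examples_at_version(client, dataset_name, version),
        evaluators=[correct],
    )


# Usage (requires the langsmith package and API credentials):
#   from langsmith import Client
#   evaluate_on_version(Client(), "Toxic Queries", "prod")  # "prod" is a hypothetical tag
#   evaluate_on_version(Client(), "Toxic Queries", datetime(2024, 1, 1))
```

Keeping the version as a single argument makes it straightforward to run the same experiment against several versions and compare the results.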

Related

For more on fetching views of a dataset, see the guide on fetching examples.
