As an alternative to the `evaluate()` evaluation flow, you can define your evaluations as Vitest or Jest test cases. This is useful when you want to combine familiar test assertions with LangSmith experiment tracking.
This feature requires `langsmith>=0.3.1`.
While you can run LangSmith's test helpers from your existing test files (e.g. `.*.test.ts` files) using your existing test config files, the examples below set up a separate test config file and command to run your evals, and assume your eval files end with `.eval.ts`. This ensures that the custom test reporter and other LangSmith touchpoints do not modify your existing test outputs.
If you are using Vitest, install `vitest` and `dotenv`, along with `openai` (and of course `langsmith`!) as a dependency:
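For example, with npm (a sketch; adjust for your package manager of choice):

```bash
npm install -D vitest dotenv
npm install openai langsmith
```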
Next, create an `ls.vitest.config.ts` file with the following base config:
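A minimal sketch of that config, assuming the LangSmith reporter is exposed at `langsmith/vitest/reporter` and that `dotenv` loads your `.env` file:

```ts
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    // Only pick up files ending in some variation of .eval.ts / .eval.js
    include: ["**/*.eval.?(c|m)[jt]s"],
    // LangSmith's custom reporter formats eval results in the console
    reporters: ["langsmith/vitest/reporter"],
    // Load environment variables from .env before the evals run
    setupFiles: ["dotenv/config"],
  },
});
```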
- `include` ensures that only files ending with some variation of `.eval.ts` in your project are run
- `reporters` is responsible for nicely formatting your output as shown above
- `setupFiles` runs `dotenv` to load environment variables before running your evals

If your existing config sets an `"environment"` field, either remove it from your config or set it to `"node"`.
Next, add an `eval` entry to the `scripts` field in your `package.json` to run Vitest with the config you just created:
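For example (the `eval` script name matches the run command used later in this guide):

```json
{
  "scripts": {
    "eval": "vitest run --config ls.vitest.config.ts"
  }
}
```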
If you are using Jest, install `jest` and `dotenv`, along with `openai` (and of course `langsmith`!) as a dependency:
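For example, with npm:

```bash
npm install -D jest dotenv
npm install openai langsmith
```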
Next, create an `ls.jest.config.cjs` file with the following base config:
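A minimal sketch, assuming the LangSmith reporter is exposed at `langsmith/jest/reporter`:

```js
module.exports = {
  // Only pick up files ending in some variation of .eval.js / .eval.ts
  testMatch: ["**/*.eval.?(c|m)[jt]s"],
  // LangSmith's custom reporter formats eval results in the console
  reporters: ["langsmith/jest/reporter"],
  // Load environment variables from .env before the evals run
  setupFiles: ["dotenv/config"],
};
```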
- `testMatch` ensures that only files ending with some variation of `.eval.js` in your project are run
- `reporters` is responsible for nicely formatting your output as shown above
- `setupFiles` runs `dotenv` to load environment variables before running your evals

If your existing config sets a `"testEnvironment"` field, either remove it from your config or set it to `"node"`.
Next, add an `eval` entry to the `scripts` field in your `package.json` to run Jest with the config you just created:
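For example:

```json
{
  "scripts": {
    "eval": "jest --config ls.jest.config.cjs"
  }
}
```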
Your eval files must import `describe` and `test` from the `langsmith/jest` or `langsmith/vitest` entrypoint rather than from your test framework directly, and each test case must be wrapped in a `describe` block. Set up your first eval by creating a file named `sql.eval.ts` (or `sql.eval.js` if you are using Jest without TypeScript) and pasting the below contents into it:
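Here is a minimal sketch of what such a file might contain. It assumes a hypothetical `generateSql()` helper in `./app.js` that calls your LLM app, and the `inputs`/`referenceOutputs` parameter names reflect the `langsmith/vitest` entrypoint (see the API refs for exact signatures):

```ts
// sql.eval.ts
import * as ls from "langsmith/vitest";
// Swap the import above for "langsmith/jest" if you are using Jest.
import { expect } from "vitest";

// Hypothetical helper that calls your app to turn a question into SQL.
import { generateSql } from "./app.js";

ls.describe("generate sql demo", () => {
  ls.test(
    "should generate a SELECT statement",
    {
      // The inputs and reference outputs become a dataset example in LangSmith.
      inputs: { question: "Which customers bought something?" },
      referenceOutputs: { sql: "SELECT * FROM customers;" },
    },
    async ({ inputs, referenceOutputs }) => {
      const sql = await generateSql(inputs.question);
      // Logged outputs are tracked as the "actual" results for this example.
      ls.logOutputs({ sql });
      expect(sql).toEqual(referenceOutputs?.sql);
    }
  );
});
```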
You can think of each `ls.test()` case as corresponding to a dataset example, and each `ls.describe()` block as defining a LangSmith dataset. If you have LangSmith tracing environment variables set when you run the test suite, the SDK does the following:

- creates a dataset named after your `ls.describe()` block in LangSmith if it does not exist
- creates an experiment with a `pass` feedback key for each test case
- sets the `pass` boolean feedback key based on the test case passing / failing

It will also track any outputs that you log with `ls.logOutputs()` or return from the test function as "actual" result values from your app for the experiment.
Create a `.env` file with your `OPENAI_API_KEY` and LangSmith credentials if you don't already have one:
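For example (placeholder values shown; `LANGSMITH_TRACING` and `LANGSMITH_API_KEY` are the standard LangSmith environment variables):

```bash
OPENAI_API_KEY=<your-openai-api-key>
LANGSMITH_API_KEY=<your-langsmith-api-key>
LANGSMITH_TRACING=true
```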
Then, use the `eval` script we set up in the previous step to run the test:
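With npm, that looks like:

```bash
npm run eval
```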
By default, the test runner only logs the automatic `pass` feedback key for each test case. You can add additional feedback with either `ls.logFeedback()` or `ls.wrapEvaluator()`. To do so, try the following as your `sql.eval.ts` file (or `sql.eval.js` if you are using Jest without TypeScript):
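Below is a sketch of what that file might look like. It reuses the hypothetical `generateSql()` helper from earlier, and the evaluator's parameter shape and judge prompt are illustrative assumptions rather than required API:

```ts
// sql.eval.ts
import * as ls from "langsmith/vitest";
// Swap the import above for "langsmith/jest" if you are using Jest.
import OpenAI from "openai";

// Hypothetical helper that calls your app to turn a question into SQL.
import { generateSql } from "./app.js";

const judge = new OpenAI();

// LLM-as-judge evaluator. Returning { key, score } is what allows the
// wrapped version below to log feedback automatically.
async function myEvaluator(params: {
  inputs: { question: string };
  outputs: { sql: string };
  referenceOutputs?: { sql: string };
}) {
  const res = await judge.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "user",
        content:
          `Question: ${params.inputs.question}\n` +
          `Reference SQL: ${params.referenceOutputs?.sql}\n` +
          `Candidate SQL: ${params.outputs.sql}\n` +
          "Reply with exactly CORRECT or INCORRECT.",
      },
    ],
  });
  const verdict = res.choices[0].message.content ?? "";
  return { key: "correctness", score: verdict.trim().startsWith("CORRECT") };
}

ls.describe("generate sql demo", () => {
  ls.test(
    "should generate a SELECT statement",
    {
      inputs: { question: "Which customers bought something?" },
      referenceOutputs: { sql: "SELECT * FROM customers;" },
    },
    async ({ inputs, referenceOutputs }) => {
      const sql = await generateSql(inputs.question);
      ls.logOutputs({ sql });
      // Manually logged feedback, in addition to the automatic "pass" key.
      ls.logFeedback({ key: "ends_with_semicolon", score: sql.trim().endsWith(";") });
      // Wrapping the evaluator traces the judge call separately and logs its
      // { key, score } return value as "correctness" feedback.
      const wrappedEvaluator = ls.wrapEvaluator(myEvaluator);
      await wrappedEvaluator({ inputs, outputs: { sql }, referenceOutputs });
    }
  );
});
```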
Note the `ls.wrapEvaluator()` call around the `myEvaluator` function. This makes it so that the LLM-as-judge call is traced separately from the rest of the test case to avoid clutter, and it conveniently creates feedback if the return value from the wrapped function matches `{ key: string; score: number | boolean }`. In this case, instead of showing up in the main test case run, the evaluator trace will show up in a trace associated with the `correctness` feedback key.
You can see the evaluator runs in LangSmith by clicking their corresponding feedback chips in the UI.
You can also define multiple test cases that share the same test logic with `ls.test.each()`. This is useful when you want to evaluate your app the same way against different inputs:
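A sketch of how that might look, again using the hypothetical `generateSql()` helper; the call shape shown (an array of examples followed by a shared test function) mirrors Jest's and Vitest's `test.each`:

```ts
import * as ls from "langsmith/vitest";
import { expect } from "vitest";

import { generateSql } from "./app.js"; // hypothetical helper

ls.describe("generate sql demo", () => {
  ls.test.each([
    {
      inputs: { question: "Which customers bought something?" },
      referenceOutputs: { sql: "SELECT * FROM customers;" },
    },
    {
      inputs: { question: "List all orders." },
      referenceOutputs: { sql: "SELECT * FROM orders;" },
    },
  ])("generates the expected sql", async ({ inputs, referenceOutputs }) => {
    const sql = await generateSql(inputs.question);
    ls.logOutputs({ sql });
    expect(sql).toEqual(referenceOutputs?.sql);
  });
});
```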
In addition to returning outputs from your test function, you can log them at any point with `ls.logOutputs()` like this:
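For example (within an `ls.describe()` block, as before):

```ts
import * as ls from "langsmith/vitest";

import { generateSql } from "./app.js"; // hypothetical helper

ls.describe("generate sql demo", () => {
  ls.test(
    "logs outputs explicitly",
    {
      inputs: { question: "Which customers bought something?" },
      referenceOutputs: { sql: "SELECT * FROM customers;" },
    },
    async ({ inputs }) => {
      const sql = await generateSql(inputs.question);
      // Recorded as the "actual" outputs for this example, even though
      // nothing is returned from the test function.
      ls.logOutputs({ sql });
    }
  );
});
```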
You can also skip or focus individual suites and test cases using the familiar `.skip` and `.only` methods on `ls.test()` and `ls.describe()`:
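A sketch of both, under the same assumptions as the earlier examples:

```ts
import * as ls from "langsmith/vitest";

import { generateSql } from "./app.js"; // hypothetical helper

ls.describe("generate sql demo", () => {
  // .skip keeps the test defined but does not run it.
  ls.test.skip(
    "should generate select all",
    {
      inputs: { question: "Which customers bought something?" },
      referenceOutputs: { sql: "SELECT * FROM customers;" },
    },
    async ({ inputs }) => {
      ls.logOutputs({ sql: await generateSql(inputs.question) });
    }
  );
});

// .only restricts the run to this suite.
ls.describe.only("generate sql demo (focused)", () => {
  ls.test(
    "should list orders",
    {
      inputs: { question: "List all orders." },
      referenceOutputs: { sql: "SELECT * FROM orders;" },
    },
    async ({ inputs }) => {
      ls.logOutputs({ sql: await generateSql(inputs.question) });
    }
  );
});
```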
You can set experiment configuration such as metadata by passing an extra argument to `ls.describe()` for the full suite or by passing a `config` field into `ls.test()` for individual tests:
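One possible shape, assuming `ls.describe()` accepts a trailing config object and `ls.test()` accepts a `config` field alongside the example data (the metadata keys here are illustrative):

```ts
import * as ls from "langsmith/vitest";

import { generateSql } from "./app.js"; // hypothetical helper

ls.describe(
  "generate sql demo",
  () => {
    ls.test(
      "should generate select all",
      {
        inputs: { question: "Which customers bought something?" },
        referenceOutputs: { sql: "SELECT * FROM customers;" },
        // Per-test config, e.g. metadata attached to this test's run
        config: { metadata: { variant: "baseline" } },
      },
      async ({ inputs }) => {
        ls.logOutputs({ sql: await generateSql(inputs.question) });
      }
    );
  },
  // Suite-level config applied to the created experiment
  { metadata: { model: "gpt-4o-mini" } }
);
```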
The test runner will also automatically read `process.env.ENVIRONMENT`, `process.env.NODE_ENV`, and `process.env.LANGSMITH_ENVIRONMENT` and set them as metadata on created experiments. You can then filter experiments by metadata in LangSmith's UI.
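For example, running your evals like this (in a POSIX shell) would record the environment in the created experiment's metadata:

```bash
LANGSMITH_ENVIRONMENT=staging npm run eval
```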
See the API refs for a full list of configuration options.
If you want to run your evals without syncing any results to LangSmith, set `LANGSMITH_TEST_TRACKING=false` in your environment.
The tests will run as normal, but the experiment logs will not be sent to LangSmith.
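For example, using the `eval` script set up earlier:

```bash
LANGSMITH_TEST_TRACKING=false npm run eval
```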