Example.
For custom code evaluators bound to a dataset, the evaluator function takes in two arguments:
Run (reference). This represents the new run in your experiment. For example, if you ran an experiment via the SDK, this would contain the input and output of the chain or model you are testing.
Example (reference). This represents the reference example in your dataset that the chain or model you are testing uses. The inputs to the Run and Example should be the same. If your Example has reference outputs, you can compare them to the run’s output for scoring.
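As a rough illustration of the two-argument shape described above, here is a minimal sketch of an exact-match evaluator. It makes several assumptions that are not stated in the text: the Run and Example are modeled as plain dictionaries with `inputs` and `outputs` keys, the output field is called `answer` (a hypothetical name), and the evaluator returns a dictionary with `key` and `score` entries. Check the Run and Example references for the actual field names and return format before relying on it.

```python
def exact_match_evaluator(run: dict, example: dict) -> dict:
    """Return score 1 if the run's answer equals the reference answer, else 0."""
    # Assumption: both arguments are dict-like with an "outputs" mapping.
    run_outputs = run.get("outputs") or {}
    reference_outputs = example.get("outputs") or {}

    # Without a reference output there is nothing to compare against.
    if "answer" not in reference_outputs:
        return {"key": "exact_match", "score": None}

    matched = run_outputs.get("answer") == reference_outputs["answer"]
    return {"key": "exact_match", "score": int(matched)}


# Hand-built stand-ins for a Run and an Example that share the same inputs.
run = {"inputs": {"question": "2 + 2?"}, "outputs": {"answer": "4"}}
example = {"inputs": {"question": "2 + 2?"}, "outputs": {"answer": "4"}}

result = exact_match_evaluator(run, example)
print(result)
```

The evaluator returns no score when the Example has no reference output, since in that case a comparison is not meaningful.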