Welcome to the LangSmith Evaluation documentation. The following sections help you create datasets, run evaluations, and analyze results:
  • Datasets: Create datasets for evaluation through the UI or the SDK, and manage existing datasets.
  • Evaluations: Run evaluations on your applications using different evaluator types and evaluation techniques.
  • Analyze experiment results: View and analyze your evaluation results by comparing experiments, filtering results, and downloading data.
  • Annotation & human feedback: Collect human feedback on your application outputs through annotation queues and inline annotation.
  • Tutorials: Follow step-by-step tutorials to evaluate different types of applications, from chatbots to complex agents.
For terminology definitions and core concepts, refer to the introduction on evaluation.
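
To make the dataset, evaluation, and analysis steps concrete, here is a minimal conceptual sketch in plain Python. It deliberately does not use the LangSmith SDK: `Example`, `exact_match`, `run_experiment`, and `toy_app` are hypothetical names chosen for illustration, and the real SDK interfaces are described in the sections listed above.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class Example:
    """One dataset row: the inputs to the application and a reference answer."""
    inputs: dict
    reference: str


def exact_match(output: str, reference: str) -> float:
    """Hypothetical evaluator: 1.0 on a case-insensitive match, otherwise 0.0."""
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0


def run_experiment(
    target: Callable[[dict], str],
    dataset: list[Example],
    evaluator: Callable[[str, str], float],
) -> list[dict]:
    """Run the target on every example and record a score for each output."""
    rows = []
    for example in dataset:
        output = target(example.inputs)
        rows.append(
            {
                "inputs": example.inputs,
                "output": output,
                "reference": example.reference,
                "score": evaluator(output, example.reference),
            }
        )
    return rows


def toy_app(inputs: dict) -> str:
    """Stand-in for the application under test (an assumption, not a real app)."""
    answers = {"What is 2 + 2?": "4", "Capital of France?": "paris"}
    return answers.get(inputs["question"], "I don't know")


# 1. Dataset: a small list of examples with reference outputs.
dataset = [
    Example({"question": "What is 2 + 2?"}, "4"),
    Example({"question": "Capital of France?"}, "Paris"),
    Example({"question": "Largest planet?"}, "Jupiter"),
]

# 2. Evaluation: run the application over the dataset and score each output.
results = run_experiment(toy_app, dataset, exact_match)

# 3. Analysis: aggregate the scores and filter down to the failing rows.
average_score = mean(row["score"] for row in results)
failures = [row for row in results if row["score"] < 1.0]

print(f"average score: {average_score:.2f}")
for row in failures:
    print("failed:", row["inputs"]["question"], "->", row["output"])
```

The sketch keeps the three stages separate so that a dataset can be reused across experiments and an evaluator can be swapped without changing the application under test.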