RunEvalConfig

Configuration class for running evaluations on datasets.

Remarks
RunEvalConfig in LangSmith is a configuration class for running evaluations on datasets. Its primary purpose is to define the parameters and the evaluators that will be applied during the evaluation of a dataset. The configuration can include LangChain evaluators, custom evaluators, and different keys for inputs, predictions, and references.

Type parameters
T - The type of evaluators.
U - The type of custom evaluators.

Properties

customEvaluators?: U[] (Optional)
Custom evaluators to apply to a dataset run. Each evaluator is provided with a run trace containing the model outputs, as well as an "example" object representing a record in the dataset.
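For illustration, here is a minimal custom evaluator sketch. It assumes the RunEvaluator shape from the langsmith SDK (an evaluateRun method receiving the run trace and, when available, the dataset example); the evaluator key and scoring logic are invented for this example:

import type { Run, Example } from "langsmith/schemas";

// Illustrative custom evaluator: scores 1 when the model produced
// any non-empty output, 0 otherwise. The (run, example) signature
// follows the langsmith RunEvaluator interface.
const notEmptyEvaluator = {
  evaluateRun: async (run: Run, example?: Example) => {
    const prediction = Object.values(run.outputs ?? {})[0];
    return {
      key: "not_empty",          // metric name reported for the run
      score: prediction ? 1 : 0,
    };
  },
};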
evalLlm (Optional)
The language model specification for evaluators that require one.
evaluators?: (EvalConfig | T)[] (Optional)
LangChain evaluators to apply to a dataset run. You can optionally specify these by name, or by configuring them with an EvalConfig object.
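Both styles can appear in the same list. For example (the built-in evaluator name "qa" and the "conciseness" criterion are used purely as illustrations):

const evalConfig = new RunEvalConfig([
  "qa",                                      // specified by name
  new RunEvalConfig.Criteria("conciseness"), // configured with an EvalConfig
]);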
prepareData (Optional)
Convert the evaluation data into a format that can be used by the evaluator. By default, we pass the first value of run.inputs, run.outputs (the predictions), and the references (example.outputs).
Returns: The prepared data.
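As a sketch, a custom preparation step could mirror the documented default. The exact hook and signature below are assumptions rather than the SDK's confirmed API; the field names follow the default behavior described above:

import type { Run, Example } from "langsmith/schemas";

// Hypothetical prepareData override reproducing the default mapping:
// the first value of run.inputs, run.outputs, and example.outputs.
const prepareData = (run: Run, example?: Example) => ({
  input: Object.values(run.inputs)[0],
  prediction: Object.values(run.outputs ?? {})[0],
  reference: example ? Object.values(example.outputs ?? {})[0] : undefined,
});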
Static Criteria
Configuration to load a "CriteriaEvalChain" evaluator, which prompts an LLM to determine whether the model's prediction complies with the provided criteria.
Parameters:
criteria - The criteria to use for the evaluator.
llm - The language model to use for the evaluator.
Returns: The configuration for the evaluator.
Example:
const evalConfig = new RunEvalConfig(
  [new RunEvalConfig.Criteria("helpfulness")],
);

const evalConfig = new RunEvalConfig(
  [new RunEvalConfig.Criteria({
    isCompliant: "Does the submission comply with the requirements of XYZ?",
  })],
);
Static LabeledCriteria
Configuration to load a "LabeledCriteriaEvalChain" evaluator, which prompts an LLM to determine whether the model's prediction complies with the provided criteria, and which also provides a "ground truth" label for the evaluator to incorporate in its evaluation.
Parameters:
criteria - The criteria to use for the evaluator.
llm - The language model to use for the evaluator.
Returns: The configuration for the evaluator.
Example:
const evalConfig = new RunEvalConfig(
  [new RunEvalConfig.LabeledCriteria("correctness")],
);

const evalConfig = new RunEvalConfig(
  [new RunEvalConfig.LabeledCriteria({
    mentionsAllFacts: "Does the submission include all facts provided in the reference?",
  })],
);
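Example
Several evaluators can be combined in a single configuration. Building on the Criteria and LabeledCriteria examples above:

const evalConfig = new RunEvalConfig([
  new RunEvalConfig.Criteria("conciseness"),
  new RunEvalConfig.LabeledCriteria("correctness"),
]);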