Hi, it's all in the title: I'd be eager to build a pairwise evaluator based on previous evaluation results, rather than on a direct comparison of the outputs. For instance, I could write business rules saying that I want to favour one metric and use the other as damage control in the pairwise experiment (see the sketch at the end of this post).
def ranked_preference(inputs: dict, outputs: list[dict]) -> list:
    # Assumes example inputs have a 'question' key and experiment
    # outputs have an 'answer' key; `chain` is an LLM judge chain
    # defined elsewhere that compares the two candidate answers.
    response = chain.invoke({
        "question": inputs["question"],
        "answer_a": outputs[0].get("answer", "N/A"),
        "answer_b": outputs[1].get("answer", "N/A"),
    })
    # Map the judge's verdict to per-run scores, assuming the chain
    # returns a dict with a 'Preference' key (1 = A wins, 2 = B wins).
    if response["Preference"] == 1:
        return [1, 0]
    if response["Preference"] == 2:
        return [0, 1]
    return [0.5, 0.5]
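For context, I'm wiring this evaluator into the pairwise experiment roughly like this, via evaluate_comparative from the langsmith SDK (the experiment names below are placeholders):

from langsmith.evaluation import evaluate_comparative

# Compare two existing experiments using the evaluator above.
evaluate_comparative(
    ["experiment-a", "experiment-b"],  # placeholder experiment names
    evaluators=[ranked_preference],
)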
For instance, in this code, does the "outputs" dict carry more metadata that I could use? Or should I look at runs rather than outputs? I am not sure whether evaluation results are found in the Run object, though (Reference here).
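To make the business-rule idea concrete, here is a minimal sketch of what I'd like to write. client.list_feedback is the real SDK call, but everything else is an assumption on my part: the metric names 'correctness' and 'toxicity', the rule itself, and the signature, which assumes a pairwise evaluator can receive the two Run objects the way a regular evaluator receives a single run.

from langsmith import Client
from langsmith.schemas import Run

client = Client()

def feedback_scores(run_id) -> dict:
    # Prior evaluation results live as Feedback entries attached to a
    # run, not on the Run object itself, so fetch them via the client.
    return {f.key: f.score for f in client.list_feedback(run_ids=[run_id])}

def rule_based_preference(runs: list[Run], example) -> list:
    # Hypothetical signature: assumes the pairwise evaluator can take
    # the two Run objects from the compared experiments.
    a = feedback_scores(runs[0].id)
    b = feedback_scores(runs[1].id)

    def effective(fb: dict) -> float:
        # Business rule (hypothetical metric names): favour 'correctness',
        # with 'toxicity' as damage control, i.e. a toxic answer never wins.
        if (fb.get("toxicity") or 0) > 0.5:
            return -1.0
        return fb.get("correctness") or 0.0

    if effective(a) > effective(b):
        return [1, 0]
    if effective(b) > effective(a):
        return [0, 1]
    return [0.5, 0.5]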