We’re excited to launch a new LangSmith feature that makes it easier to build high-quality LLM-as-a-judge evaluators: Align Evals.
One big challenge we hear consistently from teams building evaluations is: “Our evaluation scores don’t match what we’d expect a human on our team to say.” Align Evals helps you calibrate your evaluators to better match human preferences.
This feature gives you a side-by-side comparison of human-graded data and LLM-generated scores, plus a playground-like interface where you can iterate on your evaluator prompt and see the evaluator’s alignment score.
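To make the “alignment score” idea concrete: you can think of it as the rate at which the evaluator’s verdict agrees with your human-graded labels on the same examples. Here’s a rough, illustrative sketch (not necessarily the exact formula LangSmith uses):

```python
# Illustrative only: "alignment" as simple agreement between human labels
# and an LLM judge's boolean verdicts on the same examples.
def alignment_score(human_labels: list[bool], judge_scores: list[bool]) -> float:
    matches = sum(h == j for h, j in zip(human_labels, judge_scores))
    return matches / len(human_labels)

# The judge agrees with the human grader on 3 of 4 examples -> 0.75
print(alignment_score([True, True, False, True], [True, False, False, True]))
```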
Thanks for sharing, @tanushree-sharma. This looks very useful. The side-by-side human vs. LLM comparison should make calibrating evaluators much more straightforward. Does Align Evals also support custom rubrics (e.g., categorical or multi-criteria evaluations), or is it limited to numeric scoring for now?
Currently, we only support boolean evaluators, and you can run this for a single evaluator at a time. We do want to add support for categorical scores as well! For multi-criteria evaluations, I’d recommend breaking those up into distinct judges; we’ve found that independent judges tend to work better in practice than evaluating multiple criteria in a single LLM-as-a-judge.
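As a rough sketch of what that can look like in code (this is not the Align Evals feature itself; the OpenAI client, model name, and input/output field names below are just placeholders), each criterion gets its own boolean judge:

```python
from openai import OpenAI

client = OpenAI()  # placeholder judge-model client; any chat model works

def make_boolean_judge(criterion: str, instructions: str):
    """Build an LLM-as-a-judge that grades a single criterion as pass/fail."""
    def judge(inputs: dict, outputs: dict) -> dict:
        prompt = (
            f"You are grading a model response on ONE criterion: {criterion}.\n"
            f"{instructions}\n\n"
            f"Question: {inputs['question']}\n"
            f"Response: {outputs['answer']}\n\n"
            "Reply with exactly PASS or FAIL."
        )
        reply = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        verdict = reply.choices[0].message.content.strip().upper()
        return {"key": criterion, "score": verdict == "PASS"}
    return judge

# One independent judge per criterion, rather than a single judge that
# scores correctness and conciseness in the same prompt.
correctness_judge = make_boolean_judge(
    "correctness", "PASS only if the response is factually accurate."
)
conciseness_judge = make_boolean_judge(
    "conciseness", "PASS only if the response contains nothing unnecessary."
)
```

Each judge returns a single boolean score, which also keeps it compatible with the boolean evaluators we support today.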
I was SO excited for this feature because I found it right after I had set up a hackier way to go through this process with our Subject Matter Expert. I was excited enough that I pushed our Platform team to upgrade our self-hosted LangSmith instance to version 0.11.45 on our staging environment so I could start playing with it for the next step of our project.
But sadly, when I went to a dataset and clicked on the Evaluation button, I saw no “Create from Labeled Dataset” option. The release notes indicate that this feature should be available in our current 0.11.45 version. Is this not true? Is there a setting we need to enable to get this feature?
Excited to have you try this out! Thanks for flagging that this isn’t showing up in self-hosted deployments; we are looking into it.
In the meantime, you’ll have to bug your Platform team one more time (sorry!) to enable it for your org. They will need to run the following query in Postgres:
update organizations set config = config || '{"enable_align_evaluators": true}' where id = '<org_id>';
I am trying to enable it via Postgres as you suggested, @tanushree-sharma.
I had to update the query a bit (the config field is json rather than jsonb, so the || operator needed a cast):
update organizations set config = (config::jsonb || '{"enable_align_evaluators": true}'::jsonb)::json where id = '<org_id>';
I can confirm the update to the config field of the organizations table was made.
However, when I go to an existing dataset with experiments, I still don’t see the option (see screenshot). We are on 0.11.45, and we restarted the backend application.