Agent with ChatHuggingFace LLM does not support both tools and response format?

I am creating an agent, which was working perfectly fine as long as I only passed the tools as an argument to create_agent().

from pydantic import BaseModel, Field


class Citation(BaseModel):
    """Citation information for a document."""
    document_id: str = Field(description="The unique identifier of the document")
    title: str = Field(description="The title of the document which was cited")
    reference_text: str = Field(description="The text snippet from the document that was referenced")


class QueryResponse(BaseModel):
    """Final user query response with answer and citations"""
    answer: str = Field(description="The agent's final answer")
    citations: list[Citation] = Field(description="List of citations used to generate the answer")

class VLMAgent:
    """ Wrapper class for the VLM agent. """
    def __init__(self):
        self._vlm = None
        self._tools = None
        self._checkpointer = None
        self._embeddings = None
        self._vlm_agent = None

    def initialize(self, embeddings: HuggingFaceEmbeddings, vlm_manager: LanguageModelManager):
        if self._vlm_agent is None:
            self._vlm = vlm_manager.vlm
            self._embeddings = embeddings
            self._checkpointer = get_redis_checkpointer()
            self._tools = [
                get_documents_metadata,
                get_documents_content,
                create_vector_search_tool(self._embeddings),
            ]

            # Get middleware list from factory
            guardrails_middleware = create_guardrails_middleware(
                guardrails_llm=vlm_manager.guardrails_vlm,
                max_retries=3
            )

            self._vlm_agent = create_agent(
                self._vlm,
                tools=self._tools,
                response_format=ToolStrategy(QueryResponse),
                context_schema=CustomAgentState,
                system_prompt=system_prompt,
                checkpointer=self._checkpointer,
                middleware=guardrails_middleware + [
                    SummarizationMiddleware(
                        model=self._vlm,
                        trigger=("messages", 20),  # Trigger earlier to prevent token overflow
                        keep=("messages", 4),  # Keep fewer messages to maintain context within limits
                    ),
                ], 
            )

Once I also added the response_format schema, I started receiving the following error:
Traceback (most recent call last):
  File "/mnt/d/Miscellaneous/Projects/backend_vlm/src/application/chat_service/streaming.py", line 88, in stream_agent_response
    for chunk in vlm_agent.stream(
  File "/home/alex/venvs/lib/python3.10/site-packages/langgraph/pregel/main.py", line 2633, in stream
    for _ in runner.tick(
  File "/home/alex/venvs/lib/python3.10/site-packages/langgraph/pregel/_runner.py", line 167, in tick
    run_with_retry(
  File "/home/alex/venvs/lib/python3.10/site-packages/langgraph/pregel/_retry.py", line 42, in run_with_retry
    return task.proc.invoke(task.input, config)
  File "/home/alex/venvs/lib/python3.10/site-packages/langgraph/_internal/_runnable.py", line 656, in invoke
    input = context.run(step.invoke, input, config, **kwargs)
  File "/home/alex/venvs/lib/python3.10/site-packages/langgraph/_internal/_runnable.py", line 400, in invoke
    ret = self.func(*args, **kwargs)
  File "/home/alex/venvs/lib/python3.10/site-packages/langchain/agents/factory.py", line 1132, in model_node
    response = wrap_model_call_handler(request, _execute_model_sync)
  File "/home/alex/venvs/lib/python3.10/site-packages/langchain/agents/factory.py", line 146, in normalized_single
    result = single_handler(request, handler)
  File "/home/alex/venvs/lib/python3.10/site-packages/langchain/agents/middleware/types.py", line 1672, in wrapped
    return func(request, handler)
  File "/mnt/d/Miscellaneous/Projects/backend_vlm/src/application/chat_service/middlewares.py", line 414, in retry_on_error
    raise last_error
  File "/mnt/d/Miscellaneous/Projects/backend_vlm/src/application/chat_service/middlewares.py", line 409, in retry_on_error
    return handler(request)
  File "/home/alex/venvs/lib/python3.10/site-packages/langchain/agents/factory.py", line 1097, in execute_model_sync
    model, effective_response_format = _get_bound_model(request)
  File "/home/alex/venvs/lib/python3.10/site-packages/langchain/agents/factory.py", line 1074, in _get_bound_model
    request.model.bind_tools(
  File "/home/alex/venvs/lib/python3.10/site-packages/langchain_huggingface/chat_models/huggingface.py", line 968, in bind_tools
    raise ValueError(msg)
ValueError: When specifying tool_choice, you must provide exactly one tool. Received 4 tools.

I tried the same thing using a locally deployed Ollama model and it worked fine: I did not receive any error, and the final response contained both the answer and citations fields.
Is this an issue only with the Hugging Face chat models?

Hi @alexbelengeanu

having done some investigation in the source code, I can state this:

Short answer: Yes - this is specific to the current HuggingFace chat integration, not to create_agent in general.

What ToolStrategy does inside create_agent

When you pass response_format=ToolStrategy(QueryResponse), create_agent:

  • Builds a hidden structured‑output tool from your QueryResponse schema.
  • Appends that tool to the tools list.
  • Forces a tool call by setting tool_choice="any" when binding the model.

So with your 3 tools + ToolStrategy(QueryResponse), ChatHuggingFace is asked to bind_tools with 4 tools and tool_choice="any".

This pattern is supported and tested for providers like OpenAI, and it works fine there (and with many OpenAI-compatible backends such as Ollama).
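To make the mechanics concrete, here is a stdlib-only sketch of that binding step (the function and variable names are illustrative, not the actual create_agent internals):

```python
def bind_for_structured_output(user_tools, schema_tool_name):
    """Illustrative sketch: create_agent appends a hidden tool built from the
    response_format schema and forces a tool call when binding the model."""
    all_tools = list(user_tools) + [schema_tool_name]
    return {"tools": all_tools, "tool_choice": "any"}

request = bind_for_structured_output(
    ["get_documents_metadata", "get_documents_content", "vector_search"],
    "QueryResponse",
)
# 3 user tools + 1 hidden structured-output tool = 4 tools, tool_choice="any"
print(len(request["tools"]), request["tool_choice"])
```

That 4-tool request with a forced tool_choice is exactly what reaches ChatHuggingFace.bind_tools.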

Why ChatHuggingFace throws ValueError

The HuggingFace integration’s bind_tools is much stricter:

    def bind_tools(
        self,
        tools: Sequence[dict[str, Any] | type | Callable | BaseTool],
        *,
        tool_choice: dict | str | bool | None = None,
        **kwargs: Any,
    ) -> Runnable[LanguageModelInput, AIMessage]:
        ...
        formatted_tools = [convert_to_openai_tool(tool) for tool in tools]
        if tool_choice is not None and tool_choice:
            if len(formatted_tools) != 1:
                msg = (
                    "When specifying `tool_choice`, you must provide exactly one "
                    f"tool. Received {len(formatted_tools)} tools."
                )
                raise ValueError(msg)
            if isinstance(tool_choice, str):
                if tool_choice not in ("auto", "none", "required"):
                    tool_choice = {
                        "type": "function",
                        "function": {"name": tool_choice},
                    }
            elif isinstance(tool_choice, bool):
                tool_choice = formatted_tools[0]
            elif isinstance(tool_choice, dict):
                if (
                    formatted_tools[0]["function"]["name"]
                    != tool_choice["function"]["name"]
                ):
                    ...
            ...
        return super().bind(tools=formatted_tools, **kwargs)

Key points:

  • Any non-null tool_choice requires exactly one tool. With your 3 tools + 1 structured-output tool, len(formatted_tools) == 4, so it always raises the ValueError you see.
  • ChatHuggingFace only treats "auto", "none" and "required" as special; anything else (like "any") is converted into a dict targeting a single tool name. That assumption is aligned with simple chat.bind_tools([SingleTool]) examples, but not with create_agent's "any" sentinel and multiple tools.
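The failing condition boils down to a few lines; here is a stdlib-only reproduction of the guard (simplified from the source quoted above, with plain strings standing in for the tools):

```python
def check_tool_choice(formatted_tools, tool_choice):
    # Simplified mirror of the guard in ChatHuggingFace.bind_tools
    if tool_choice is not None and tool_choice:
        if len(formatted_tools) != 1:
            msg = (
                "When specifying `tool_choice`, you must provide exactly one "
                f"tool. Received {len(formatted_tools)} tools."
            )
            raise ValueError(msg)

# 3 user tools + the hidden structured-output tool
tools = ["get_documents_metadata", "get_documents_content",
         "vector_search", "QueryResponse"]
try:
    check_tool_choice(tools, "any")
except ValueError as e:
    print(e)  # the exact error from the traceback
```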

Other integrations (OpenAI, Ollama's OpenAI wrapper, VertexAI GenAI, etc.) have bind_tools implementations that allow "any"/"auto" with multiple tools, which is why the exact same agent graph runs there without error.

So yes, this combination (multiple tools + ToolStrategy from create_agent) is currently broken specifically for ChatHuggingFace because its bind_tools contract doesn’t match what create_agent expects.

I’m not actually sure whether this is a bug or a feature :slight_smile:

Could you @alexbelengeanu try to create a subclass and see whether it works?

  from langchain_huggingface import ChatHuggingFace as _ChatHuggingFace

  class PatchedChatHuggingFace(_ChatHuggingFace):
      def bind_tools(self, tools, *, tool_choice=None, **kwargs):
          # Allow LangChain agents' 'any' sentinel with multiple tools
          if tool_choice == "any":
              tool_choice = "auto"  # or just None
          # Optionally drop the len(formatted_tools) == 1 restriction entirely
          return super().bind_tools(tools=tools, tool_choice=tool_choice, **kwargs)

Then use PatchedChatHuggingFace in your LanguageModelManager.

Let me know how it works please :slight_smile:

Hey @pawel-twardziak :slight_smile:

Thanks for your suggestion. Using that wrapper fixes the ValueError, but for some reason the agent does not produce the final answer using the provided schema. There is no structured_response key in the final result, only the content, so the agent returns the result the same way it did before I added the response_format.

I am doing this now:

def bind_tools(
    self,
    tools: Sequence[Dict[str, Any] | type | Callable | BaseTool],
    *,
    tool_choice: Dict | str | bool | None = None,
    **kwargs: Any,
) -> Runnable[LanguageModelInput, AIMessage]:
    if tool_choice in ("any", "auto", "required") and len(tools) > 1:
        tool_choice = None

    return super().bind_tools(tools=tools, tool_choice=tool_choice, **kwargs)
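My guess at why structured_response disappears with this override: once tool_choice is dropped to None, the model is no longer forced to call the hidden QueryResponse tool, and the agent only fills in structured_response when that tool is actually called. A stdlib-only sketch of that final step (names illustrative, not the actual ToolStrategy internals):

```python
def extract_structured_response(tool_calls, schema_tool="QueryResponse"):
    """Illustrative sketch: structured_response only exists when the model
    actually called the hidden schema tool in its final turn."""
    for call in tool_calls:
        if call["name"] == schema_tool:
            return call["args"]
    return None  # model answered in plain text -> no structured_response

# Forced tool call -> structured output present
forced = [{"name": "QueryResponse", "args": {"answer": "42", "citations": []}}]
# With tool_choice dropped, the model may answer in plain text instead:
free = []
print(extract_structured_response(forced))
print(extract_structured_response(free))  # None, as I observe
```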

And inside the API endpoint responsible for sending chat messages I am doing that:

...

docs_metadata, docs_content = prepare_documents(payload.documents)
llm_content = payload.to_llm_content()
result = vlm_agent.invoke(
    {"messages": [{"role": "user", "content": llm_content}]},
    context=CustomAgentState(
        session_id=payload.session_id,
        documents_metadata=docs_metadata,
        documents_content=docs_content,
    ),
    config={"configurable": {"thread_id": payload.session_id}},
)

if result.get("structured_response"):
    logger.debug(f"Structured output for session {payload.session_id}.")
else:
    logger.debug(f"No structured output found for session {payload.session_id}. Using raw content.")

Maybe I’ll think of something else to get the desired output format. Thank you very much for your time and the idea :slight_smile:

hi @alexbelengeanu

thanks for your feedback :slight_smile: Sorry it didn’t help.

What Hugging Face model are you using?

For this use case I tested with several models, but now I mainly work with Qwen and Gemma.

I have exactly the same issue. I am trying to create an agent:

agent = create_agent(model=chatModel, tools=[geocode_city, get_weather], response_format=ToolStrategy(WeatherResponse))

The LLM is a ChatHuggingFace model, and I get the same error.

I am using the meta-llama/Llama-3.3-70B-Instruct model

This error doesn’t occur when I am using a model from TogetherAI