Agent with ChatHuggingFace LLM does not support both tools and response format?

I am creating an agent, which was working perfectly fine as long as I only passed the tools as an argument to create_agent().

from pydantic import BaseModel, Field


class Citation(BaseModel):
    """Citation information for a document."""
    document_id: str = Field(description="The unique identifier of the document")
    title: str = Field(description="The title of the document which was cited")
    reference_text: str = Field(description="The text snippet from the document that was referenced")


class QueryResponse(BaseModel):
    """Final user query response with answer and citations"""
    answer: str = Field(description="The agent's final answer")
    citations: list[Citation] = Field(description="List of citations used to generate the answer")

class VLMAgent:
    """ Wrapper class for the VLM agent. """
    def __init__(self):
        self._vlm = None
        self._tools = None
        self._checkpointer = None
        self._embeddings = None
        self._vlm_agent = None

    def initialize(self, embeddings: HuggingFaceEmbeddings, vlm_manager: LanguageModelManager):
        if self._vlm_agent is None:
            self._vlm = vlm_manager.vlm
            self._embeddings = embeddings
            self._checkpointer = get_redis_checkpointer()
            self._tools = [
                get_documents_metadata,
                get_documents_content,
                create_vector_search_tool(self._embeddings),
            ]

            # Get middleware list from factory
            guardrails_middleware = create_guardrails_middleware(
                guardrails_llm=vlm_manager.guardrails_vlm,
                max_retries=3
            )

            self._vlm_agent = create_agent(
                self._vlm,
                tools=self._tools,
                response_format=ToolStrategy(QueryResponse),
                context_schema=CustomAgentState,
                system_prompt=system_prompt,
                checkpointer=self._checkpointer,
                middleware=guardrails_middleware + [
                    SummarizationMiddleware(
                        model=self._vlm,
                        trigger=("messages", 20),  # Trigger earlier to prevent token overflow
                        keep=("messages", 4),  # Keep fewer messages to maintain context within limits
                    ),
                ], 
            )

Once I also added the response_format schema, I started receiving the following error:
Traceback (most recent call last):
  File "/mnt/d/Miscellaneous/Projects/backend_vlm/src/application/chat_service/streaming.py", line 88, in stream_agent_response
    for chunk in vlm_agent.stream(
  File "/home/alex/venvs/lib/python3.10/site-packages/langgraph/pregel/main.py", line 2633, in stream
    for _ in runner.tick(
  File "/home/alex/venvs/lib/python3.10/site-packages/langgraph/pregel/_runner.py", line 167, in tick
    run_with_retry(
  File "/home/alex/venvs/lib/python3.10/site-packages/langgraph/pregel/_retry.py", line 42, in run_with_retry
    return task.proc.invoke(task.input, config)
  File "/home/alex/venvs/lib/python3.10/site-packages/langgraph/_internal/_runnable.py", line 656, in invoke
    input = context.run(step.invoke, input, config, **kwargs)
  File "/home/alex/venvs/lib/python3.10/site-packages/langgraph/_internal/_runnable.py", line 400, in invoke
    ret = self.func(*args, **kwargs)
  File "/home/alex/venvs/lib/python3.10/site-packages/langchain/agents/factory.py", line 1132, in model_node
    response = wrap_model_call_handler(request, _execute_model_sync)
  File "/home/alex/venvs/lib/python3.10/site-packages/langchain/agents/factory.py", line 146, in normalized_single
    result = single_handler(request, handler)
  File "/home/alex/venvs/lib/python3.10/site-packages/langchain/agents/middleware/types.py", line 1672, in wrapped
    return func(request, handler)
  File "/mnt/d/Miscellaneous/Projects/backend_vlm/src/application/chat_service/middlewares.py", line 414, in retry_on_error
    raise last_error
  File "/mnt/d/Miscellaneous/Projects/backend_vlm/src/application/chat_service/middlewares.py", line 409, in retry_on_error
    return handler(request)
  File "/home/alex/venvs/lib/python3.10/site-packages/langchain/agents/factory.py", line 1097, in execute_model_sync
    model, effective_response_format = _get_bound_model(request)
  File "/home/alex/venvs/lib/python3.10/site-packages/langchain/agents/factory.py", line 1074, in _get_bound_model
    request.model.bind_tools(
  File "/home/alex/venvs/lib/python3.10/site-packages/langchain_huggingface/chat_models/huggingface.py", line 968, in bind_tools
    raise ValueError(msg)
ValueError: When specifying tool_choice, you must provide exactly one tool. Received 4 tools.

I tried the same thing using a locally deployed Ollama model and it worked fine: I did not receive any error, and the final response contained both the answer and citations fields.
Is this an issue only with the Hugging Face chat models?

Hi @alexbelengeanu

having done some investigation in the source code, I can state this:

Short answer: Yes - this is specific to the current HuggingFace chat integration, not to create_agent in general.

What ToolStrategy does inside create_agent

When you pass response_format=ToolStrategy(QueryResponse), create_agent:

  • Builds a hidden structured‑output tool from your QueryResponse schema.
  • Appends that tool to the tools list.
  • Forces a tool call by setting tool_choice="any" when binding the model.

So with your 3 tools + ToolStrategy(QueryResponse), ChatHuggingFace is asked to bind_tools with 4 tools and tool_choice="any".

This pattern is supported and tested for providers like OpenAI, and it works fine there (and with many OpenAI-compatible backends such as Ollama).
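To make the mechanics concrete, here is a stdlib-only sketch of that binding step (the function and variable names are illustrative, not the actual create_agent internals):

```python
def bind_for_structured_output(user_tools, schema_tool_name):
    """Illustrative sketch: create_agent appends a hidden tool built from the
    response_format schema and forces a tool call when binding the model."""
    all_tools = list(user_tools) + [schema_tool_name]
    return {"tools": all_tools, "tool_choice": "any"}

request = bind_for_structured_output(
    ["get_documents_metadata", "get_documents_content", "vector_search"],
    "QueryResponse",
)
# 3 user tools + 1 hidden structured-output tool = 4 tools, tool_choice="any"
print(len(request["tools"]), request["tool_choice"])
```

That 4-tool request with a forced tool_choice is exactly what reaches ChatHuggingFace.bind_tools.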

Why ChatHuggingFace throws ValueError

The HuggingFace integration’s bind_tools is much stricter:

    def bind_tools(
        self,
        tools: Sequence[dict[str, Any] | type | Callable | BaseTool],
        *,
        tool_choice: dict | str | bool | None = None,
        **kwargs: Any,
    ) -> Runnable[LanguageModelInput, AIMessage]:
        ...
        formatted_tools = [convert_to_openai_tool(tool) for tool in tools]
        if tool_choice is not None and tool_choice:
            if len(formatted_tools) != 1:
                msg = (
                    "When specifying `tool_choice`, you must provide exactly one "
                    f"tool. Received {len(formatted_tools)} tools."
                )
                raise ValueError(msg)
            if isinstance(tool_choice, str):
                if tool_choice not in ("auto", "none", "required"):
                    tool_choice = {
                        "type": "function",
                        "function": {"name": tool_choice},
                    }
            elif isinstance(tool_choice, bool):
                tool_choice = formatted_tools[0]
            elif isinstance(tool_choice, dict):
                if (
                    formatted_tools[0]["function"]["name"]
                    != tool_choice["function"]["name"]
                ):
                    ...
            ...
        return super().bind(tools=formatted_tools, **kwargs)

Key points:

  • Any non-null tool_choice requires exactly one tool. With your 3 tools + 1 structured-output tool, len(formatted_tools) == 4, so it always raises the ValueError you see.
  • ChatHuggingFace only treats "auto", "none" and "required" as special; anything else (like "any") is converted into a dict targeting a single tool name. That assumption is aligned with simple chat.bind_tools([SingleTool]) examples, but not with create_agent's "any" sentinel and multiple tools.
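The failing condition boils down to a few lines; here is a stdlib-only reproduction of the guard (simplified from the source quoted above, with plain strings standing in for the tools):

```python
def check_tool_choice(formatted_tools, tool_choice):
    # Simplified mirror of the guard in ChatHuggingFace.bind_tools
    if tool_choice is not None and tool_choice:
        if len(formatted_tools) != 1:
            msg = (
                "When specifying `tool_choice`, you must provide exactly one "
                f"tool. Received {len(formatted_tools)} tools."
            )
            raise ValueError(msg)

# 3 user tools + the hidden structured-output tool
tools = ["get_documents_metadata", "get_documents_content",
         "vector_search", "QueryResponse"]
try:
    check_tool_choice(tools, "any")
except ValueError as e:
    print(e)  # the exact error from the traceback
```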

Other integrations (OpenAI, Ollama's OpenAI wrapper, VertexAI GenAI, etc.) have bind_tools implementations that allow "any"/"auto" with multiple tools, which is why the exact same agent graph runs there without error.

So yes, this combination (multiple tools + ToolStrategy from create_agent) is currently broken specifically for ChatHuggingFace because its bind_tools contract doesn’t match what create_agent expects.

I’m not actually sure whether this is a bug or a feature :slight_smile:

Could you @alexbelengeanu try to create a subclass and see whether it works?

  from langchain_huggingface import ChatHuggingFace as _ChatHuggingFace

  class PatchedChatHuggingFace(_ChatHuggingFace):
      def bind_tools(self, tools, *, tool_choice=None, **kwargs):
          # Allow LangChain agents' 'any' sentinel with multiple tools
          if tool_choice == "any":
              tool_choice = "auto"  # or just None
          # Optionally drop the len(formatted_tools) == 1 restriction entirely
          return super().bind_tools(tools=tools, tool_choice=tool_choice, **kwargs)

Then use PatchedChatHuggingFace in your LanguageModelManager.

Let me know how it works please :slight_smile:

Hey @pawel-twardziak :slight_smile:

Thanks for your suggestion. Using that wrapper fixes the ValueError, but for some reason the agent does not produce the final answer using the provided schema. There is no structured_response key in the final result, only the content, so the agent returns the result the same way it did before I added the response_format.

I am doing this now:

def bind_tools(
    self,
    tools: Sequence[Dict[str, Any] | type | Callable | BaseTool],
    *,
    tool_choice: Dict | str | bool | None = None,
    **kwargs: Any,
) -> Runnable[LanguageModelInput, AIMessage]:
    if tool_choice in ("any", "auto", "required") and len(tools) > 1:
        tool_choice = None

    return super().bind_tools(tools=tools, tool_choice=tool_choice, **kwargs)
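My guess at why structured_response disappears with this override: once tool_choice is dropped to None, the model is no longer forced to call the hidden QueryResponse tool, and the agent only fills in structured_response when that tool is actually called. A stdlib-only sketch of that final step (names illustrative, not the actual ToolStrategy internals):

```python
def extract_structured_response(tool_calls, schema_tool="QueryResponse"):
    """Illustrative sketch: structured_response only exists when the model
    actually called the hidden schema tool in its final turn."""
    for call in tool_calls:
        if call["name"] == schema_tool:
            return call["args"]
    return None  # model answered in plain text -> no structured_response

# Forced tool call -> structured output present
forced = [{"name": "QueryResponse", "args": {"answer": "42", "citations": []}}]
# With tool_choice dropped, the model may answer in plain text instead:
free = []
print(extract_structured_response(forced))
print(extract_structured_response(free))  # None, as I observe
```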

And inside the API endpoint responsible for sending chat messages I am doing that:

...

docs_metadata, docs_content = prepare_documents(payload.documents)
llm_content = payload.to_llm_content()
result = vlm_agent.invoke(
    {"messages": [{"role": "user", "content": llm_content}]},
    context=CustomAgentState(
        session_id=payload.session_id,
        documents_metadata=docs_metadata,
        documents_content=docs_content,
    ),
    config={"configurable": {"thread_id": payload.session_id}},
)

if result.get("structured_response"):
    logger.debug(f"Structured output for session {payload.session_id}.")
else:
    logger.debug(f"No structured output found for session {payload.session_id}. Using raw content.")

Maybe I’ll think of something else to get the desired output format. Thank you very much for your time and the idea :slight_smile:

hi @alexbelengeanu

thanks for your feedback :slight_smile: Sorry it didn’t help.

What Hugging Face model are you using?

For this use case I tested with several models, but now I mainly work with Qwen and Gemma.

I have exactly the same issue. I am trying to create an agent:

agent = create_agent(model=chatModel, tools=[geocode_city, get_weather], response_format=ToolStrategy(WeatherResponse))

The LLM is a ChatHuggingFace model, and I get the same error.

I am using the meta-llama/Llama-3.3-70B-Instruct model

This error doesn’t occur when I am using a model from TogetherAI