ChatHuggingFace + HuggingFacePipeline never parses tool calls

Hello Forum!

When using ChatHuggingFace + HuggingFacePipeline, the returned AIMessage never populates AIMessage.tool_calls (it is always empty), even if the model output contains well-formatted JSON for a tool call.

Expected outcome:
AIMessage.tool_calls containing a list of ToolCall objects (as with OpenAI’s chat model) whenever the LLM response contains a JSON object describing a tool call.

Diving into the source code, I can see that this is expected: there is simply no code for parsing tool calls out of the results.

When the llm is an instance of HuggingFacePipeline, the ChatHuggingFace._generate(…) method calls:

1. self._to_chat_prompt(..)

2. llm._generate(…)

3. self._to_chat_result(…)

The llm._generate method returns an LLMResult (which has no field for tool_calls), so my guess is that the ChatHuggingFace model should be parsing the tool calls and creating the appropriate ToolCall objects.
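For context, AIMessage.tool_calls is just a list of ToolCall entries; in langchain_core, ToolCall is a TypedDict with name/args/id/type keys, so plain dicts of this shape are what the parsing step would have to produce:

```python
# Shape of a populated AIMessage.tool_calls entry. ToolCall is a TypedDict
# in langchain_core.messages, so a plain dict with these keys is sufficient.
tool_call = {
    "name": "multiply",        # name of the tool to invoke
    "args": {"a": 3, "b": 4},  # arguments parsed from the model's JSON
    "id": "call_abc123",       # unique id for this call
    "type": "tool_call",
}
tool_calls = [tool_call]
```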

I have some bandwidth to work on this, and it would be great if we could align on possible solutions and how to implement them. If you think this change is interesting enough to be incorporated, I can think through possible implementations and post them here for a more focused/precise discussion.

Looking forward to hearing your thoughts!!

I’m using the following versions:

  • langchain=1.0.3

  • langchain_core=1.0.3

  • langchain-huggingface=1.0.1

I’m posting an example code to reproduce the issue, just in case I am doing something wrong.

import torch

from langchain_core.messages import SystemMessage, HumanMessage
from langchain_huggingface.llms import HuggingFacePipeline
from langchain_huggingface import ChatHuggingFace
from langchain.tools import tool



@tool
def multiply(a: int, b: int) -> int:
    """Multiply a and b.

    Args:
        a: first int
        b: second int
    """
    return a * b

model_name = 'Qwen/Qwen3-4B-Thinking-2507'

llm = HuggingFacePipeline.from_model_id(
    model_id=model_name,
    task='text-generation',
    device=0,
    batch_size=1,
    model_kwargs={
        'temperature': 0.1,
        'max_length': 8192,
        'torch_dtype': torch.float16,
    },
)

sys_msg = SystemMessage(content="You are a helpful assistant tasked with performing arithmetic on a set of inputs.")

prompt='''
multiply 3 by 4. 
You must use a tool name multiply which receives as parameters the two numbers to be multiplied
Respond only using a JSON blob with the following format:
{
  "name": "multiply",
  "args": { "a": "3", "b": "4" },
  "id": "multiply_call",
  "type": "tool_call"
}
'''
human_msg = HumanMessage(content=prompt)


chat_model = ChatHuggingFace(llm=llm)
llm_with_tools = chat_model.bind_tools([multiply])

llm_output = llm_with_tools.invoke([sys_msg, human_msg])

print(llm_output.tool_calls)

Diego

Hi @diegomarron

It seems like this is an expected limitation when ChatHuggingFace wraps HuggingFacePipeline. It’s not an issue in your code.
With the pipeline backend, ChatHuggingFace converts the raw generated text into an AIMessage without attempting to parse tool calls, so AIMessage.tool_calls stays empty.
Tool-call parsing is only implemented for the structured chat backends such as HuggingFaceEndpoint (Hugging Face TGI/Inference Endpoints), not for raw pipelines.

If you must stay on pipelines, post-process the model’s JSON output yourself and map it to ToolCall objects (parse the JSON and set AIMessage.tool_calls), or use a structured-output parser such as JsonOutputParser/JsonOutputKeyToolsParser to extract the call and then invoke your tool.
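A minimal sketch of that post-processing, in pure Python with no LangChain imports (the function name is mine): scan the raw generation for JSON objects and convert any tool-call-shaped blob into ToolCall-style dicts that you can assign to AIMessage.tool_calls. Using json.JSONDecoder.raw_decode handles nested objects correctly, which a simple regex would not.

```python
import json
import uuid

def extract_tool_calls(text: str) -> list[dict]:
    """Scan raw model output for JSON objects and return any that look like
    tool calls as ToolCall-shaped dicts (name/args/id/type)."""
    decoder = json.JSONDecoder()
    calls = []
    idx = text.find("{")
    while idx != -1:
        try:
            # raw_decode parses one complete JSON value starting at idx,
            # returning the object and the index just past its end.
            blob, end = decoder.raw_decode(text, idx)
        except json.JSONDecodeError:
            idx = text.find("{", idx + 1)
            continue
        if isinstance(blob, dict) and "name" in blob and "args" in blob:
            calls.append({
                "name": blob["name"],
                "args": blob["args"],
                "id": blob.get("id") or f"call_{uuid.uuid4().hex[:8]}",
                "type": "tool_call",
            })
        idx = text.find("{", end)
    return calls
```

You could then build the message manually, e.g. `AIMessage(content="", tool_calls=extract_tool_calls(raw_text))`.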

Some examples:

from langchain_huggingface.llms import HuggingFaceEndpoint
from langchain_huggingface import ChatHuggingFace
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_core.tools import tool

@tool
def multiply(a: int, b: int) -> int:
    return a * b

# Requires HUGGINGFACEHUB_API_TOKEN env var or pass huggingfacehub_api_token=...
llm = HuggingFaceEndpoint(
    repo_id="microsoft/Phi-3-mini-4k-instruct",  # or your endpoint/model
    max_new_tokens=64,
    do_sample=False,
)

chat = ChatHuggingFace(llm=llm)
chat_with_tools = chat.bind_tools([multiply])

msgs = [
    SystemMessage(content="You can call tools when needed."),
    HumanMessage(content="Multiply 3 by 4 using the tool."),
]
ai = chat_with_tools.invoke(msgs)
print(ai.tool_calls)
Second example, using a self-hosted TGI server via langchain_community:

from langchain_community.llms.huggingface_text_gen_inference import (
    HuggingFaceTextGenInference,
)
from langchain_huggingface import ChatHuggingFace
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_core.tools import tool

@tool
def multiply(a: int, b: int) -> int:
    return a * b

# TGI server must be running (e.g., http://localhost:8080)
llm = HuggingFaceTextGenInference(
    inference_server_url="http://localhost:8080",
    max_new_tokens=64,
    temperature=0.0,
)

chat = ChatHuggingFace(llm=llm)
chat_with_tools = chat.bind_tools([multiply])

msgs = [
    SystemMessage(content="You can call tools when needed."),
    HumanMessage(content="Multiply 3 by 4 using the tool."),
]
ai = chat_with_tools.invoke(msgs)
print(ai.tool_calls)

Hi @pawel-twardziak
Thank you so much for your pointer. Now I have a better picture :slight_smile:
Sadly, due to legal constraints, I must stick to offline models that run locally.

I would like to politely raise an observation regarding the current implementation of ChatHuggingFace + HuggingFacePipeline.

While I can certainly work around the current state, including parsing the model’s output and manually injecting the necessary tool schemas directly into the prompt, doing so is a significant inconvenience: it forces me to re-implement fundamental logic that LangChain is supposed to simplify.

I would expect the ChatHuggingFace + HuggingFacePipeline combination to work like other chat models:

  • .bind_tools(…)

  • automatically pass the tools schema and output generation constraints to the prompt

  • fill the tool_calls for me
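To illustrate the second bullet, here is a rough sketch of what bind_tools would have to do for the pipeline backend: render each tool’s schema into the system prompt. This is pure Python; the function name and prompt wording are mine, and in LangChain the OpenAI-style schemas could come from something like convert_to_openai_tool.

```python
import json

def render_tools_prompt(tools: list[dict]) -> str:
    """Render OpenAI-style tool schemas ({'name', 'description', 'parameters'}
    dicts) into a system-prompt block asking for a tool-call JSON blob."""
    lines = [
        "You may call a tool by replying ONLY with a JSON blob of the form:",
        '{"name": <tool name>, "args": {<arguments>}, "id": <call id>, "type": "tool_call"}',
        "Available tools:",
    ]
    for t in tools:
        lines.append(
            f"- {t['name']}: {t['description']} "
            f"(parameters: {json.dumps(t['parameters'])})"
        )
    return "\n".join(lines)
```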

In addition, the current implementation of ChatHuggingFace also requires message preprocessing before llm.invoke(…), since:

  • The _to_chat_prompt(..) method supports neither ToolCall nor ToolMessage objects

  • The _to_chatml_prompt(..) method requires the last message to be a HumanMessage, which is not always the case for a ReAct agent (control may come back from a ToolNode)
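A sketch of the kind of preprocessing those two points currently force on the caller, using plain role/content dicts in place of LangChain message objects (the folding strategy shown is one option among several):

```python
def ensure_human_last(messages: list[dict]) -> list[dict]:
    """Fold trailing tool-result messages into a final 'user' turn so the
    conversation ends with a human message, as _to_chat_prompt expects.
    Messages are {'role': ..., 'content': ...} dicts standing in for
    LangChain's HumanMessage/AIMessage/ToolMessage objects."""
    out = [m for m in messages if m["role"] != "tool"]
    tool_results = [m["content"] for m in messages if m["role"] == "tool"]
    if tool_results:
        # Re-inject tool outputs as a closing human turn.
        summary = "\n".join(f"Tool result: {r}" for r in tool_results)
        out.append({"role": "user", "content": summary})
    return out
```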

Streamlining this integration would dramatically enhance developer efficiency, uphold the core promise of the LangChain framework, and unlock more seamless integration of all supported model providers. I want to emphasize again that I have the bandwidth to work on this with you.

Thank you very much for your time :slight_smile:

Diego