How to use tool calling with ChatLlamaCpp and Gemma 4 E4B in create_agent?

Hello, I need help with my project.
I'm trying to use ChatLlamaCpp with create_agent and want to add tool calling to it. My code looks like this:

from langchain_community.chat_models import ChatLlamaCpp
from langchain.agents import create_agent

def get_weather(city: str) -> str:
    """Get weather for a given city."""
    return f"It's always sunny in {city}!"

def main():
    model = ChatLlamaCpp(
        model_path="models/gemma-4-E4B-it-Q4_K_M.gguf",
        verbose=False
    )

    model_with_tools = model.bind_tools(
        tools=[get_weather],
        tool_choice={"type": "function", "function": {"name": "get_weather"}}
    )

    agent = create_agent(
        model=model_with_tools,
        tools=[get_weather],
        system_prompt="You are a helpful assistant"
    )

    result = agent.invoke({
        "messages": [{"role": "user", "content": "What's the weather in San Francisco?"}]
    })

    print(result)

if __name__ == "__main__":
    main()

Why is the response of agent.invoke like this?

{'messages': [HumanMessage(content="What's the weather in San Francisco?", additional_kwargs={}, response_metadata={}, id='411447a3-a8b7-428e-afd1-4a10669380a7'), AIMessage(content='<|tool_call>call:get_weather{city:<|"|>San Francisco<|"|>}<tool_call|>', additional_kwargs={}, response_metadata={'finish_reason': 'stop'}, id='lc_run--019dea66-3ee5-79d1-bfce-d0f469fe8807-0', tool_calls=[], invalid_tool_calls=[])]}

Why is the content of the AI message <|tool_call>call:get_weather{city:<|"|>San Francisco<|"|>}<tool_call|> ?
How do I fix it?
Thank you.

Hey @azemihako — couple things going on:

The raw <|tool_call>...|> text is Gemma’s native tool-call syntax leaking through unparsed. The model is emitting tool calls in its own chat-template format, but langchain_community.chat_models.ChatLlamaCpp isn’t converting that text into structured AIMessage.tool_calls. So bind_tools registers the schema, but create_agent never sees a parsed tool call to dispatch.
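To make concrete what's being dropped, here's a throwaway parser that pulls the tool name and arguments out of the exact raw string in your output. This is purely illustrative (and brittle, since it's keyed to the marker syntax in that one string); the real fix belongs in the chat model integration, not user code:

```python
import re

def parse_leaked_tool_call(text):
    """Extract (tool_name, args_dict) from leaked Gemma-style tool-call text, or None."""
    # Match the wrapper tokens seen in the question's output:
    # <|tool_call>call:NAME{...}<tool_call|>
    m = re.search(r'<\|tool_call>call:(\w+)\{(.*)\}<tool_call\|>', text)
    if m is None:
        return None
    name, body = m.group(1), m.group(2)
    # Arguments appear as  key:<|"|>value<|"|>  inside the braces.
    args = {k: v for k, v in re.findall(r'(\w+):<\|"\|>(.*?)<\|"\|>', body)}
    return name, args

raw = '<|tool_call>call:get_weather{city:<|"|>San Francisco<|"|>}<tool_call|>'
print(parse_leaked_tool_call(raw))  # ('get_weather', {'city': 'San Francisco'})
```

All the information create_agent needs is sitting right there in the content string; it just never makes it into AIMessage.tool_calls, so the agent loop treats it as a final answer.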

Status note on langchain-community: it’s in maintenance mode and not where new integrations land. ChatLlamaCpp there hasn’t kept up with newer chat templates / tool-call formats, and I wouldn’t expect it to. So this path is unlikely to “just start working.”

What I’d recommend instead — let llama.cpp itself handle the tool-call parsing and talk to it over HTTP. llama.cpp recently grew first-class support for the Anthropic Messages API, which appears to be more feature-complete than its OpenAI-compat endpoint (overview: New in llama.cpp: Anthropic Messages API). Use ChatAnthropic pointed at the local server:

from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(
    base_url="http://localhost:8080",
    api_key="not-needed",
    model="<your-model-id>",
)
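The snippet above assumes a llama-server instance is already listening on port 8080. A typical launch looks something like this (your model path will differ; the key flag is --jinja, which enables the chat-template processing llama.cpp uses to recognize and parse the model's native tool-call syntax server-side):

```shell
# Start llama.cpp's HTTP server on the port the snippet above expects.
llama-server \
  -m models/gemma-4-E4B-it-Q4_K_M.gguf \
  --port 8080 \
  --jinja
```

With the parsing handled by the server, the client just receives structured tool calls, which is exactly what create_agent needs.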

Then bind_tools and create_agent work as expected. @jcuypers is running roughly this setup with Qwen3 and reports good results — context in Optionally show reasoning · Issue #1117 · langchain-ai/deepagents · GitHub. They also flagged that the OpenAI-compat path in llama.cpp doesn’t surface everything the agent needs (e.g., thinking blocks), which is why the Anthropic path is the safer default.

If you want the OpenAI-compat path anyway (simpler, fewer features), ChatOpenAI(base_url="http://localhost:8080/v1", api_key="not-needed", model=...) works for plain tool calling.

One smaller nit: tool_choice={"type": "function", "function": {...}} is OpenAI’s wire format; the in-process ChatLlamaCpp doesn’t necessarily honor that shape — another reason the server-based path is cleaner.