The raw `<|tool_call>...|>` text is Gemma’s native tool-call syntax leaking through unparsed. The model is emitting tool calls in its own chat-template format, but `langchain_community.chat_models.ChatLlamaCpp` isn’t converting that text into structured `AIMessage.tool_calls`. So `bind_tools` registers the schema, but `create_agent` never sees a parsed tool call to dispatch.
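Roughly what that failure looks like in code (a minimal sketch; the tool, model path, and prompt are illustrative stand-ins, and the `tool_choice` shape mirrors your setup):

```python
from langchain_community.chat_models import ChatLlamaCpp
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Look up the current weather for a city."""
    return f"Sunny in {city}"

# Hypothetical local path; any Gemma GGUF should reproduce the symptom.
llm = ChatLlamaCpp(model_path="/models/gemma-2-9b-it.gguf")
llm_with_tools = llm.bind_tools(
    [get_weather],
    # Forcing the tool, mirroring the tool_choice shape from your code.
    tool_choice={"type": "function", "function": {"name": "get_weather"}},
)

msg = llm_with_tools.invoke("What's the weather in Paris?")
print(msg.tool_calls)  # [] -- nothing structured for the agent to dispatch
print(msg.content)     # raw <|tool_call>... chat-template text instead
```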
Status note on `langchain-community`: it’s in maintenance mode and not where new integrations land. `ChatLlamaCpp` there hasn’t kept up with newer chat templates / tool-call formats, and I wouldn’t expect it to. So this path is unlikely to “just start working.”
What I’d recommend instead: let llama.cpp itself handle the tool-call parsing and talk to it over HTTP. llama.cpp recently gained first-class support for the Anthropic Messages API, which appears to be more feature-complete than its OpenAI-compat endpoint (overview: New in llama.cpp: Anthropic Messages API). Use `ChatAnthropic` pointed at the local server:
```python
from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(
    base_url="http://localhost:8080",
    api_key="not-needed",
    model="<your-model-id>",
)
```
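This assumes llama-server is already running locally, e.g. `llama-server -m your-model.gguf --port 8080 --jinja`; as far as I know, `--jinja` is the flag that enables chat-template-driven tool-call parsing on the server side (double-check against `llama-server --help` for your build).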
Then `bind_tools` and `create_agent` work as expected. @jcuypers is running roughly this setup with Qwen3 and reports good results; context in Optionally show reasoning · Issue #1117 · langchain-ai/deepagents · GitHub. They also flagged that the OpenAI-compat path in llama.cpp doesn’t surface everything the agent needs (e.g., thinking blocks), which is why the Anthropic path is the safer default.
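A minimal end-to-end sketch of that (assuming langchain ≥ 1.0 for `create_agent`; `get_weather` is an illustrative stand-in):

```python
from langchain.agents import create_agent
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Look up the current weather for a city."""
    return f"Sunny in {city}"

# Same local-server setup as above.
model = ChatAnthropic(
    base_url="http://localhost:8080",
    api_key="not-needed",
    model="<your-model-id>",
)

agent = create_agent(model, tools=[get_weather])
result = agent.invoke(
    {"messages": [{"role": "user", "content": "What's the weather in Paris?"}]}
)
print(result["messages"][-1].content)
```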
If you want the OpenAI-compat path anyway (simpler, fewer features), `ChatOpenAI(base_url="http://localhost:8080/v1", api_key="not-needed", model=...)` works for plain tool calling.
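Spelled out, with the same placeholder model id as above:

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed",
    model="<your-model-id>",
)
```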
One smaller nit: `tool_choice={"type": "function", "function": {...}}` is OpenAI’s wire format; the in-process `ChatLlamaCpp` doesn’t necessarily honor that shape, which is another reason the server-based path is cleaner.