How do I stop an agent from copying an earlier answer instead of calling its tool?

Sometimes the model (with tools bound) will not call the tool when a similar query comes in. Even though the previous query triggered a tool call, the model gets lazy and answers the similar query by copying the previous response.

If the model responds without calling a tool, I could invoke the model again. But in course_qa_graph, the tool's response is sent back to the model, which then generates the final response — so how do I tell the model's final response apart from its first response? Or should I remove some history messages?
Are there any better suggestions?

from typing import Literal
from langgraph.types import Command
from nodes.router import router
from agents.course_qa_agent import course_qa_graph
from agents.general_agent import general_graph
from langgraph.graph import START, StateGraph, MessagesState, END
from langchain_core.messages import AIMessage, ToolMessage  # import message classes
from langgraph.prebuilt import ToolNode, tools_condition
from tools.api_client import load_chat_model, toolbox

model = load_chat_model()
course_tools = toolbox.load_toolset('course_information_tools')


class CourseQAState(MessagesState):
    """数据库问答代理的状态类"""
    answer: str


def call_model(state: CourseQAState) -> dict:
    """
    使用检索到的信息作为上下文回答问题

    Args:
        state: 数据库问答状态

    Returns:
        dict: 包含答案和消息的字典
    """

    system_message = 'You are an efficient data-query assistant'
    model_with_tools = model.bind_tools(course_tools)

    response = model_with_tools.invoke([
        {"role": "system", "content": system_message},
        *state['messages']])
    response.name = 'course_qa_agent'

    answer = response.content
    return {"answer": answer,
            "messages": [response]}


course_qa_graph_builder = StateGraph(CourseQAState)
course_qa_graph_builder.add_node('course_call_model', call_model)
course_qa_graph_builder.add_node('course_tools', ToolNode(course_tools))
course_qa_graph_builder.add_edge(START, "course_call_model")
course_qa_graph_builder.add_conditional_edges(
    "course_call_model",
    tools_condition,
    {'tools': 'course_tools', '__end__': END}
)
course_qa_graph_builder.add_edge("course_tools", "course_call_model")

course_qa_graph = course_qa_graph_builder.compile(checkpointer=True)



class AgentState(MessagesState):
    """代理状态类,继承自MessagesState"""
    pass


def supervisor(state: AgentState) -> Command[
    Literal['course_qa_agent', 'general_agent', END]]:
    """
    主管代理,负责路由到相应的子代理

    Args:
        state: 代理状态

    Returns:
        Command: 包含目标节点的命令
    """
    messages = state.get('messages', [])
    last_message = messages[-1]

    # If the last message is an AI message or a tool message, end the flow
    if isinstance(last_message, (AIMessage, ToolMessage)):
        return Command(goto=END)

    response = router(state)
    intent = response.get('intent', '通用对话')

    routing_map = {
        '课程查询': 'course_qa_agent',  # "course query" intent
        '通用对话': 'general_agent'     # "general chat" intent
    }
    target = routing_map.get(intent, 'general_agent')
    return Command(goto=target)


def course_qa_agent(state: AgentState) -> Command[Literal['supervisor']]:
    """
    数据库问答代理

    Args:
        state: 代理状态

    Returns:
        Command: 返回主管代理的命令
    """
    response = course_qa_graph.invoke({'messages': state.get('messages', [])})
    last_message = response.get('messages', [])[-1]
    return Command(goto='supervisor', update={'messages': last_message})


def general_agent(state: AgentState) -> Command[Literal['supervisor']]:
    """
    通用对话代理

    Args:
        state: 代理状态

    Returns:
        Command: 返回主管代理的命令
    """
    response = general_graph.invoke({'messages': state.get('messages', [])})
    last_message = response.get('messages', [])[-1]
    # print('response:',last_message)
    return Command(goto='supervisor', update={'messages': last_message})


builder = StateGraph(AgentState)
builder.add_node(supervisor)
builder.add_node(course_qa_agent)
builder.add_node(general_agent)  # must be registered, or routing to 'general_agent' will fail
builder.add_edge(START, "supervisor")
graph = builder.compile()

Hi @feng-1985,

In your course_qa_graph you call model.bind_tools(course_tools) with the default tool behavior.

In LangChain this is effectively tool_choice="auto", which means “the model may or may not call tools” depending on what it thinks is easiest.

Because you pass the full MessagesState (including previous assistant answers) into each invoke, the model often sees that it has already answered a very similar question and just reuses that answer instead of paying the cost of a tool call.

That is expected behaviour for tool_choice="auto" models.

1. Force the model to call tools

For course Q&A it sounds like you always want to use your tools (DB / APIs) instead of the model’s own memory.
You can enforce that at the model level with tool_choice when you bind tools:

model = load_chat_model()

model_with_tools = model.bind_tools(
    course_tools,
    tool_choice="any",   # or "required" for OpenAI-style models
)

For example, Anthropic's integration documents tool_choice like this (check the source code or docs of your own provider):

tool_choice: Which tool to require the model to call. Options are:

- Name of the tool as a string or as dict `{"type": "tool", "name": "<<tool_name>>"}`: calls the corresponding tool
- `'auto'`, `{"type": "auto"}`, or `None`: automatically selects a tool (including no tool)
- `'any'` or `{"type": "any"}`: force at least one tool to be called

With tool_choice="any" (or "required" for OpenAI-style backends), every call to course_call_model will first return an AIMessage with non-empty tool_calls, so your existing tools_condition + ToolNode wiring will always run the tool at least once before producing a final answer.
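This also gives you a programmatic way to tell the two responses apart, which answers the question in your post: the pre-tool AIMessage carries a non-empty tool_calls attribute, while the final answer (produced after the ToolMessages are fed back) does not. A dependency-free sketch of that check, with plain dicts standing in for AIMessage objects (`is_final_response` is a hypothetical helper, not part of LangGraph):

```python
def is_final_response(message: dict) -> bool:
    # An AI message produced before a tool run carries non-empty `tool_calls`;
    # the final answer (generated after tool results come back) has none.
    return not message.get("tool_calls")


# First response: the model asks for a tool call (names are illustrative).
first = {
    "role": "ai",
    "content": "",
    "tool_calls": [{"name": "search_courses", "args": {"query": "machine learning"}}],
}

# Final response: generated after the tool output was fed back.
final = {"role": "ai", "content": "Here are the matching courses...", "tool_calls": []}
```

With real LangChain message objects the same check is simply `not response.tool_calls`.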

2. Don’t show the model its previous answers when you don’t want copying

Right now you pass state["messages"] (which includes old AI answers) into course_qa_graph:

response = course_qa_graph.invoke({"messages": state.get("messages", [])})

If you don’t want the model to simply copy a previous assistant message, you can trim or filter the history before sending it to the course QA subgraph.

A simple pattern is to pass only the latest human question (and optionally tool outputs), instead of the whole chat history:

from langchain_core.messages import HumanMessage


def course_qa_agent(state: AgentState) -> Command[Literal['supervisor']]:
    # keep only the last user turn for the course QA subgraph
    user_messages = [m for m in state["messages"] if isinstance(m, HumanMessage)]
    sub_state = {"messages": user_messages[-1:]}  # last human message only

    response = course_qa_graph.invoke(sub_state)
    last_message = response["messages"][-1]
    return Command(goto="supervisor", update={"messages": [last_message]})

For more advanced control you can use trim_messages from langchain_core.messages.utils to keep only the last N tokens/messages before each model call; LangGraph’s own memory docs show this pattern via a pre_model_hook that trims messages before the LLM is called (LangGraph “Add memory” / trimming messages).
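trim_messages itself works on token counts and trimming strategies; as a dependency-free sketch of the same idea, a message-count variant might look like this (`trim_to_last_n` is a hypothetical helper, not the langchain_core API):

```python
def trim_to_last_n(messages: list[dict], n: int = 6, keep_system: bool = True) -> list[dict]:
    """Keep the (first) system message, if any, plus only the last n
    non-system messages — a crude message-count analogue of trim_messages."""
    system = [m for m in messages if m.get("role") == "system"] if keep_system else []
    rest = [m for m in messages if m.get("role") != "system"]
    return system[:1] + rest[-n:]


# Example history: one system prompt plus nine alternating turns.
history = [{"role": "system", "content": "You are a helpful assistant"}]
history += [
    {"role": "user" if i % 2 == 0 else "ai", "content": f"turn {i}"} for i in range(9)
]
trimmed = trim_to_last_n(history, n=4)  # system prompt + last 4 turns
```

The real trim_messages is preferable in production because it counts tokens (via a model or a custom token_counter) rather than messages, and can ensure the trimmed history still starts on a human turn.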

You could also wire the built-in summarization middleware into your graph - I wrote about that in another thread.


It's hard to decide whether to show the previous answer or not — the user's query may be related to the previous answer. Maybe I should introduce a method to filter the history messages?

Also, forcing the model to call tools may cause bad responses for some queries (such as queries that don't match any tool).
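One common mitigation for that concern is to also expose a trivial "answer directly" tool as an escape hatch: the model still satisfies tool_choice="any", but the graph detects that call and routes to a plain response instead of the ToolNode. A dependency-free sketch of the routing side (`answer_directly` and the node names are hypothetical, not from your code):

```python
def route_forced_call(tool_calls: list[dict]) -> str:
    """Routing helper for a graph where tool_choice='any' is enforced.

    If the model's only way out was the escape-hatch tool, skip the
    ToolNode and go straight to a plain-response node instead.
    """
    names = [call["name"] for call in tool_calls]
    if names == ["answer_directly"]:
        return "respond"   # hypothetical node that answers without tools
    return "tools"         # run the real tools as usual
```

The escape-hatch tool itself can be a no-op whose description tells the model to use it only when no data tool applies.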

To decide whether or not to show the previous answer you would need more advanced context engineering, imho.
An LLM is a kind of "thinking" machine that can decide on its own. Usually that's effective, but sometimes it doesn't match our goals.
The question is: why would an LLM run the tool again if the answer is already in the context? Would the tool produce a different answer for the same input? If so, it's no longer idempotent, which is wrong.