RAG tool-calling script written for LangChain 0.3.27 and migrated to 1.0.2 now returns an empty AIMessage with no tool calls or additional kwargs

The original script was built on LangChain 0.3.x and worked well. A few days ago I migrated to version 1.0.2 and modified the script to accommodate the changes. Now, when I run the new script, the agent fails to produce a meaningful response: the final AIMessage has empty content and no tool calls or additional kwargs. The debug output specifically notes: "DEBUG: No final answer content accumulated from stream. Agent might have failed or produced no discernible output." It seems the agent is either not processing the user input correctly or failing to generate a response, whether directly from the LLM or through the tools. Could someone take a look and help out? Thank you all.
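For reference, the sketch below is a minimal standalone check I would run first (my own sketch, not part of the app: it assumes the same local Ollama endpoint and model, and the stub tool body is made up). It should show whether the model emits tool calls at all when tools are bound, independent of the agent and Streamlit wiring:

```python
# Minimal sanity check: does the Ollama model emit tool calls when tools are bound?
# Assumes Ollama is running locally with the same model; the tool body is a stub.
from langchain_core.tools import tool
from langchain_ollama import ChatOllama

@tool
def rag_search(question: str) -> str:
    """Answer networking questions (stub body for this test)."""
    return "stub"

llm = ChatOllama(base_url="http://localhost:11434", model="llama3-groq-tool-use:8b-q3_K_M")
reply = llm.bind_tools([rag_search]).invoke("What is the IP address of switch 2?")

print("content:", repr(reply.content))               # often empty when a tool is chosen
print("tool_calls:", reply.tool_calls)               # expected: a rag_search call
print("additional_kwargs:", reply.additional_kwargs)
```

If `tool_calls` comes back empty here as well, the problem is presumably in the model or the tool binding rather than in the agent wiring below.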

Original script using version 0.3.27:

```python
import streamlit as st
from operator import itemgetter
from langchain_core.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    ChatPromptTemplate,
    MessagesPlaceholder
)
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.runnables import RunnablePassthrough
from langchain_community.chat_message_histories import SQLChatMessageHistory
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_community.vectorstores import FAISS

# ---- LANGCHAIN AGENT IMPORTS (NEW) ----
# Note: These are the new imports required for creating and running a tool-calling agent.
from langchain.agents import AgentExecutor, create_tool_calling_agent  # AgentExecutor receives the agent's decision and executes it; create_tool_calling_agent builds the agent logic
from langchain_core.tools import tool  # The @tool decorator converts a function into a format the agent can understand

# ---- UI and Session ----
st.title("Welcome To AddyBot The Smartest AI On The Block")
st.write("What would you like to know! I have the answer you seek!")

user_id = st.text_input("Enter your user id", "")

def get_session_history(session_id: str):
    """Returns a SQL-backed chat history object for a given session_id."""
    return SQLChatMessageHistory(session_id, connection="sqlite:///chat_history.db")

if "chat_history" not in st.session_state:
    st.session_state.chat_history = []

if st.button("Click Here To Wipe History AND Start A New Chat"):
    st.session_state.chat_history = []
    if user_id:
        history = get_session_history(user_id)
        history.clear()
    st.rerun()

for message in st.session_state.chat_history:
    with st.chat_message(message['role']):
        st.markdown(message['content'])

# ---- LLM, Embeddings, Vector Store ----
base_url = "http://localhost:11434"
model = 'llama3.2:3b-instruct-q4_K_M'  # Note: Ensure this model supports tool calling. Llama 3 models generally do.
llm = ChatOllama(base_url=base_url, model=model)
embeddings = OllamaEmbeddings(model='nomic-embed-text', base_url=base_url)

db_name = "/home/kmurray/Desktop/LANGCHAIN AND OLLAMA/Network_Testing_DataStore"
vector_store = FAISS.load_local(db_name, embeddings, allow_dangerous_deserialization=True)
retriever = vector_store.as_retriever(search_type='similarity', search_kwargs={'k': 2})

def format_docs(docs):
    """Formats retrieved documents into a single string."""
    return '\n\n'.join([doc.page_content for doc in docs]) if docs else "No relevant context found."

# ---- Prompt Templates: this section holds the system instructions ----
# Note: This system message will now be part of the agent's main prompt.
system_msg = "You are a helpful assistant. Make sure your answer is relevant to the question."

# ---- RAG and General Chains (to be used inside tools) ----
# Note: These chains are now the building blocks for the agent's tools, not directly invoked by the user.

# RAG Chain
rag_prompt = ChatPromptTemplate.from_messages([
    ("system", system_msg + "\nAnswer the user's question based on the context provided below."),
    ("human", "Question: {question}\nContext: {context}\nAnswer:")
])
rag_base_chain = (
    RunnablePassthrough.assign(context=itemgetter("question") | retriever | format_docs)
    | rag_prompt
    | llm
    | StrOutputParser()
)

# General Chain
general_prompt = ChatPromptTemplate.from_messages([
    ("system", system_msg),
    ("human", "{input}")
])
general_chain = general_prompt | llm | StrOutputParser()

# ---- AGENT AND TOOLS SETUP (NEW) ----
# Note: This entire section replaces the old keyword-based router.
# We define two functions and decorate them with `@tool`. The agent will decide which one to call.

@tool
def rag_search(question: str) -> str:
    """
    Use this tool when the user asks a question about networking, IP addresses, switches,
    routers, device configurations, VLANs, subnets, firewalls, ports, or interfaces.
    This tool is for retrieving specific, technical information from the knowledge base.
    """
    # Note: The docstring above is CRITICAL. The agent uses it to decide when to use this tool.
    print(f"--- Calling RAG Search Tool for: {question} ---")
    return rag_base_chain.invoke({"question": question})  # the user's question is used here

@tool
def general_conversation(query: str) -> str:
    """
    Use this tool for general conversation, greetings, or any questions that are not
    related to specific networking, device, or configuration details.
    """
    print(f"--- Calling General Conversation Tool for: {query} ---")
    return general_chain.invoke({"input": query})

# A list of tools the agent can choose from.
tools = [rag_search, general_conversation]

agent_prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessagePromptTemplate.from_template("You are a helpful assistant"),
        MessagesPlaceholder(variable_name="chat_history", optional=True),  # Optional if you handle history elsewhere
        HumanMessagePromptTemplate.from_template("{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad")
    ]
)

# Create the agent itself. This binds the LLM, the tools, and the prompt together.
# The agent is a Runnable that will output either a tool call or a final answer.
agent = create_tool_calling_agent(llm, tools, agent_prompt)

# Create the AgentExecutor. This is the runtime that actually executes the tools the agent decides to call.
# `verbose=True` is very helpful for debugging as it prints the agent's thought process.
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Finally, wrap the AgentExecutor in RunnableWithMessageHistory to give it conversational memory.
# This is the final, stateful object we will interact with.
agent_with_chat_history = RunnableWithMessageHistory(
    agent_executor,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

# ---- END OF NEW AGENT SETUP ----

# ---- Streamlit Chat Input ----
user_prompt = st.chat_input("How can I help?")  # the user's question originates here and is stored in user_prompt

if user_prompt:  # only run this section if the user typed a question
    if user_id:
        st.session_state.chat_history.append({'role': 'user', 'content': user_prompt})
        with st.chat_message("user"):
            st.markdown(user_prompt)

        full_response = ""
        with st.chat_message("assistant"):
            # Note: The old router is gone. We now call the agent directly.
            # The agent will decide internally whether to use RAG or the general tool.

            # This generator streams the agent's final answer to the UI.
            def gen_agent_response():
                # The stream from an AgentExecutor yields many intermediate steps (tool calls, etc.).
                # We are only interested in the final 'output' chunks to display to the user.
                for chunk in agent_with_chat_history.stream(          # this call kicks off the entire agent run
                    {"input": user_prompt},                           # the agent receives a dictionary with the question under "input"
                    config={"configurable": {"session_id": user_id}}
                ):
                    if "output" in chunk:
                        yield chunk["output"]

            full_response = st.write_stream(gen_agent_response())

        if full_response:
            st.session_state.chat_history.append({'role': 'assistant', 'content': full_response})
    else:
        st.warning("Please enter a user ID before starting a chat!")
```
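Under 0.3.27 the AgentExecutor returns a plain dict with an `output` key, which is why filtering the stream on `chunk["output"]` above works. A quick check outside Streamlit (reusing `agent_executor` from the script above, with a made-up question) would look roughly like this:

```python
# Quick 0.3.x check outside Streamlit, reusing agent_executor defined above.
# The question string is made up for illustration.
result = agent_executor.invoke({"input": "What VLANs exist on the core switch?"})
print(list(result.keys()))   # typically ['input', 'output']
print(result["output"])      # the final answer text that the UI streams as chunk["output"]
```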

NEW SCRIPT USING VERSION 1.0.2:

```python
import streamlit as st
from operator import itemgetter
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.runnables import RunnablePassthrough
from langchain_community.chat_message_histories import SQLChatMessageHistory
from langchain_ollama import ChatOllama, OllamaEmbeddings
from langchain_community.vectorstores import FAISS

# ---- LANGCHAIN AGENT IMPORTS (UPDATED FOR v1) ----
from langchain.agents import create_agent
from langchain.tools import tool
from langchain_core.messages import AIMessage, AIMessageChunk, SystemMessage
from langchain_core.callbacks.base import BaseCallbackHandler  # Import for callbacks

# ---- UI and Session ----
st.title("Welcome To AddyBot The Smartest AI On The Block")
st.write("What would you like to know! I have the answer you seek!")

user_id = st.text_input("Enter your user id", "")

def get_session_history(session_id: str):
    """Returns a SQL-backed chat history object for a given session_id."""
    return SQLChatMessageHistory(session_id, connection="sqlite:///chat_history.db")

if "chat_history" not in st.session_state:
    st.session_state.chat_history = []

if st.button("Click Here To Wipe History AND Start A New Chat"):
    st.session_state.chat_history = []
    if user_id:
        history = get_session_history(user_id)
        history.clear()
    st.rerun()

for message in st.session_state.chat_history:
    with st.chat_message(message['role']):
        st.markdown(message['content'])

# ---- LLM, Embeddings, Vector Store ----
base_url = "http://localhost:11434"
model = 'llama3-groq-tool-use:8b-q3_K_M'  # Using the model I've tested

# Initialize the LLM; a callback handler is attached below to print its internal workings
llm = ChatOllama(base_url=base_url, model=model)
embeddings = OllamaEmbeddings(model='nomic-embed-text', base_url=base_url)

db_name = "/home/kmurray/Desktop/LANGCHAIN AND OLLAMA/Network_Testing_DataStore"
vector_store = FAISS.load_local(db_name, embeddings, allow_dangerous_deserialization=True)
retriever = vector_store.as_retriever(search_type='similarity', search_kwargs={'k': 2})

def format_docs(docs):
    """Formats retrieved documents into a single string."""
    return '\n\n'.join([doc.page_content for doc in docs]) if docs else "No relevant context found."

# ---- Prompt Templates: this section holds the system instructions ----
system_msg = "You are a helpful assistant. Make sure your answer is relevant to the question."

# ---- RAG and General Chains (to be used inside tools) ----
rag_prompt = ChatPromptTemplate.from_messages([
    ("system", system_msg + "\nAnswer the user's question based on the context provided below."),
    ("human", "Question: {question}\nContext: {context}\nAnswer:")
])
rag_base_chain = (
    RunnablePassthrough.assign(context=itemgetter("question") | retriever | format_docs)
    | rag_prompt
    | llm
    | StrOutputParser()
)

general_prompt = ChatPromptTemplate.from_messages([
    ("system", system_msg),
    ("human", "{input}")
])
general_chain = general_prompt | llm | StrOutputParser()

# ---- AGENT AND TOOLS SETUP (UPDATED FOR v1) ----

@tool
def rag_search(question: str) -> str:
    """
    Use this tool when the user asks a question about networking, IP addresses, switches,
    routers, device configurations, VLANs, subnets, firewalls, ports, or interfaces.
    This tool is for retrieving specific, technical information from the knowledge base.
    """
    print(f"--- Calling RAG Search Tool for: '{question}' ---")
    result = rag_base_chain.invoke({"question": question})
    print(f"--- RAG Search Tool output: '{result}' (Type: {type(result)}) ---")
    return result

@tool
def general_conversation(query: str) -> str:
    """
    Use this tool for general conversation, greetings, or any questions that are not
    related to specific networking, device, or configuration details.
    """
    print(f"--- Calling General Conversation Tool for: '{query}' ---")
    result = general_chain.invoke({"input": query})
    print(f"--- General Conversation Tool output: '{result}' (Type: {type(result)}) ---")
    return result

# A list of tools the agent can choose from.
tools = [rag_search, general_conversation]

# ---- Custom Callback Handler to print LLM prompts and responses ----
class StdOutCallbackHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        print("\n--- LLM Invocation START ---")
        print("  Prompts sent to LLM:")
        for prompt in prompts:
            print(f"  {prompt}")
        print("--- LLM Invocation END (waiting for response) ---")

    def on_llm_end(self, response, **kwargs):
        print("\n--- LLM Response START ---")
        print(f"  Raw LLM Response: {response}")
        # response is an LLMResult; generations is a list of lists, so index twice
        if response.generations and response.generations[0][0].message:
            msg = response.generations[0][0].message
            print(f"  Parsed AIMessage Content: '{msg.content}'")
            print(f"  Parsed AIMessage Tool Calls: {msg.tool_calls}")
            print(f"  Parsed AIMessage Additional Kwargs: {msg.additional_kwargs}")
        print("--- LLM Response END ---\n")

# IMPORTANT: Bind tools to the LLM *before* passing it to create_agent.
# Also, bind the custom callback handler to the LLM.
llm_with_tools = llm.bind_tools(tools).with_config(callbacks=[StdOutCallbackHandler()])

agent_runnable = create_agent(model=llm_with_tools, tools=tools, system_prompt=system_msg)

# ---- Wrapping agent_runnable directly in RunnableWithMessageHistory ----
agent_with_chat_history = RunnableWithMessageHistory(
    agent_runnable,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    # output_messages_key="output"  # Removed because we parse the raw graph state chunks ourselves
)

# ---- END OF NEW AGENT SETUP ----

# ---- Streamlit Chat Input ----
user_prompt = st.chat_input("How can I help?")

if user_prompt:
    if user_id:
        st.session_state.chat_history.append({'role': 'user', 'content': user_prompt})
        with st.chat_message("user"):
            st.markdown(user_prompt)

        full_response = ""
        with st.chat_message("assistant"):
            def gen_agent_response():
                final_answer_content = ""  # To store the final accumulated response
                for chunk in agent_with_chat_history.stream(
                    {"input": user_prompt},
                    config={"configurable": {"session_id": user_id}}
                ):
                    # print(f"DEBUG: Raw chunk received in stream: {chunk}")  # Uncomment for full stream debug

                    # LangGraph typically streams dictionaries representing state updates.
                    if isinstance(chunk, dict):
                        # Access the 'messages' list nested under the 'model' node update
                        if 'model' in chunk and isinstance(chunk['model'], dict) and 'messages' in chunk['model']:
                            messages_list = chunk['model']['messages']
                            for msg in messages_list:
                                if isinstance(msg, (AIMessage, AIMessageChunk)):
                                    # --- CRITICAL DEBUG PRINT ---
                                    # This is from the stream's perspective, after internal processing
                                    print("DEBUG: Processing AIMessage from 'model'->'messages' (stream):")
                                    print(f"  Content: '{msg.content}'")
                                    print(f"  Tool Calls: {msg.tool_calls}")
                                    print(f"  Additional Kwargs: {msg.additional_kwargs}")
                                    # --- END CRITICAL DEBUG PRINT ---

                                    if msg.tool_calls:
                                        # The LLM decided to call a tool; its content will be empty.
                                        print(f"DEBUG: Agent decided to call tool(s): {msg.tool_calls}")
                                        yield "*(Calling tool...)*\n"
                                    elif msg.content:
                                        # This is the actual textual content from the LLM
                                        final_answer_content += msg.content
                                        yield msg.content
                                else:
                                    # Handle other types of messages if they appear in 'messages'
                                    if str(msg).strip():  # Only yield if not empty
                                        final_answer_content += str(msg)
                                        yield str(msg) + "\n"

                        # Look for 'tool_output' if it is streamed as a separate key
                        elif 'tool_output' in chunk:
                            print(f"DEBUG: Tool output received: {chunk['tool_output']}")
                            # Tool output is usually fed back into the agent, not directly displayed.
                            # If you want to display it for transparency:
                            # yield f"*(Tool result: {chunk['tool_output']})*\n"

                        # Look for the agent's final outcome if it is explicitly streamed
                        elif 'agent_outcome' in chunk and isinstance(chunk['agent_outcome'], dict) and 'return_values' in chunk['agent_outcome']:
                            if 'output' in chunk['agent_outcome']['return_values']:
                                outcome_content = chunk['agent_outcome']['return_values']['output']
                                if isinstance(outcome_content, str) and outcome_content.strip():
                                    final_answer_content += outcome_content
                                    yield outcome_content
                                elif isinstance(outcome_content, (AIMessage, AIMessageChunk)) and outcome_content.content.strip():
                                    final_answer_content += outcome_content.content
                                    yield outcome_content.content

                    # Handle raw AIMessage/AIMessageChunk if they are streamed directly (less common with LangGraph)
                    elif isinstance(chunk, (AIMessage, AIMessageChunk)):
                        if chunk.content:
                            final_answer_content += chunk.content
                            yield chunk.content

                # After the loop, if no content was yielded, show a default message
                if not final_answer_content.strip():
                    print("DEBUG: No final answer content accumulated from stream. Agent might have failed or produced no discernible output.")
                    yield "I'm sorry, I couldn't generate a response."

            full_response = st.write_stream(gen_agent_response())

        if full_response:
            st.session_state.chat_history.append({'role': 'assistant', 'content': full_response})
    else:
        st.warning("Please enter a user ID before starting a chat!")
```
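For completeness, my understanding is that `create_agent` in 1.0.2 returns a LangGraph-style graph whose state carries a `messages` list (which is why the stream chunks arrive under `chunk['model']['messages']`), so a direct call that bypasses `RunnableWithMessageHistory` should look roughly like the sketch below. This is an assumption on my part, please correct me if the input format is wrong:

```python
# Standalone sketch (assumption): drive the 1.x agent directly with a messages-style input,
# bypassing RunnableWithMessageHistory, to see whether the graph itself produces an answer.
from langchain_core.messages import HumanMessage

state = agent_runnable.invoke(
    {"messages": [HumanMessage(content="What is the IP address of switch 2?")]}
)
for m in state["messages"]:
    print(type(m).__name__, repr(getattr(m, "content", "")), getattr(m, "tool_calls", None))
```

If this direct call also ends with an empty final AIMessage, the failure is presumably between `create_agent` and the Ollama model rather than in the history wrapper or the Streamlit streaming loop.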

