Groq Model Response

I am using Groq as the model provider, and the model is openai/gpt-oss-120b. Sometimes I get a degraded response from the model, as shown below. I am using a multi-agent flow.

The response from the agent is:
The appointment 33854985 at Takhassusi Hospital, Dental Clinic with Reem Alghamdi on thirty‑thirty‑thirty? (Oops! I made … … … … … …

I’m sorry— … … … hall…

Oops! … … … …

Apologies…

… Sorry…

We…

Okay…

[dozens of further lines of repeated ellipses and similar fragments omitted]

…The appointment for the patient Raza Ullah Ullah is Cancelled at Takhassusi Hospital, Dental Clinic with Reem Alghamdi on thirty‑first August two thousand twenty‑six at five p.m.

What could be the reason for this type of response? Any advice would be appreciated.

Hi @razaullah

What does your graph definition look like? Could you share as much code as possible?

I am using a multi-agent flow: create_agent builds a supervisor, and the other sub-agents are exposed to it as tools. The supervisor agent code is as follows:

supervisor = create_agent(
    model=model,
    system_prompt=supervisor_prompt,
    tools=[fetch_patient_information_tool, manage_appointment_tool, book_appointment_tool],
    checkpointer=checkpointer,
    name="supervisor_agent",
)

One of the tools is defined as follows:

@tool(return_direct=True)
async def fetch_patient_information_tool(request: str, runtime: ToolRuntime) -> str:
    """Use this tool to retrieve patient information such as name, mobile number, date of birth, or address. 
    This tool handles ONLY patient personal information queries. Call this when the user asks about their 
    personal details or provides a patient ID to retrieve information.
    
    Args:
        request: The user's request for patient information
    """
    root_thread_id = runtime.config["configurable"]["thread_id"]
    logger.info(f"[patient_agent] 🚀 Invoking with thread_id={root_thread_id}, request: {request}")
    
    # Build config with shared checkpoint namespace for memory continuity
    sub_config = extend_runtime_config(runtime, checkpoint_ns=SHARED_NS)
    
    # 💾 Log checkpoint state BEFORE invoke to see what memory is loaded
    await log_checkpoint_state(patient_agent, "patient_agent", sub_config)

    # Note: checkpointer is already bound to patient_agent at creation time
    result = await patient_agent.ainvoke(
        {"messages": [{"role": "user", "content": request}]},
        config=sub_config
    )
    
    # Convert ToolMessages to TOON format
    result = convert_tool_messages_to_toon_format(result, tool_names=["mssql_get_patient_info", "mssql_get_patient_by_name_dob"])
    
    # Update the checkpoint with TOON formatted messages
    await patient_agent.aupdate_state(sub_config, result)
    logger.info(f"[patient_agent] 💾 Updated checkpoint with TOON formatted messages")
    
    logger.info(f"[patient_agent] 🛠️ Sub-agent invocation complete.")
    # Log sub-agent's internal tool calls and messages AFTER invoke
    log_subagent_result("patient_agent", result)
    
    response = extract_last_ai_message(result)
    logger.info(f"[patient_agent] ✅ Final response: {response[:200]}...")
    response = strip_markdown(response)
    
    return response.lower()

The sub-agent is defined as follows:

async def patient_information_retrieval_agent(tools, language_id, checkpointer=None):
    """
    Create a patient information retrieval agent.
    """
    logger.info(f"Creating patient information retrieval agent with checkpointer: {checkpointer}")
    prompt_folder = 'english_prompts' if language_id == 2 else 'arabic_prompts'
    prompt_path = os.path.join(os.path.dirname(__file__), '..', 'prompts', prompt_folder, 'patient_information_retrieval_prompt.yml')
    logger.info(f"Loading patient information retrieval prompt with language id: {language_id}")
    
    with open(prompt_path, 'r', encoding='utf-8') as f:
        prompts = yaml.safe_load(f)

    return create_agent(
        model=model,
        tools=tools,
        system_prompt=prompts['patient_information_retrieval_prompt'].format(LANGUAGE_ID=language_id),
        name="patient_information_retrieval_agent", 
        checkpointer=checkpointer,
        middleware=[
            # SummarizationMiddleware(
            #     model = model,
            #     trigger=("tokens", 1000),
            #     keep=("messages", 5),
            # ),
            ]
    )

The other tools and sub_agents are defined in the same manner.

@pawel-twardziak If you need any other information, let me know.


Thanks, I’m pondering… :face_with_monocle:

Could you remove all the TOON conversion functionality? I’m pretty sure that, together with the aupdate_state usage inside the tools, it is corrupting the message state.
Remove it and please tell me whether the issue is still there.
There is no real need to convert to TOON: it does not provide any significant token reduction and, crucially, the framework is not ready for it.
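For illustration, a stripped-down version of the tool without the TOON conversion and aupdate_state steps might look roughly like this. This is only a sketch: StubAgent and the message shapes here are assumptions standing in for the real patient_agent and framework types, so the flow can be shown self-contained.

```python
import asyncio

class StubAgent:
    """Stand-in for the real patient_agent (assumption for illustration)."""
    async def ainvoke(self, payload, config=None):
        # Echo back a history plus a final AI answer, mimicking an agent result.
        return {"messages": [
            {"role": "user", "content": payload["messages"][0]["content"]},
            {"role": "ai", "content": "Patient found: Raza Ullah"},
        ]}

patient_agent = StubAgent()

def extract_last_ai_message(result):
    # Return the content of the last AI message in the agent result.
    for msg in reversed(result["messages"]):
        if msg["role"] == "ai":
            return msg["content"]
    return ""

async def fetch_patient_information_tool(request: str, config=None) -> str:
    # Invoke the sub-agent directly; no TOON conversion, no manual aupdate_state.
    result = await patient_agent.ainvoke(
        {"messages": [{"role": "user", "content": request}]},
        config=config,
    )
    return extract_last_ai_message(result)

print(asyncio.run(fetch_patient_information_tool("patient id 33854985")))
```

The point is that the sub-agent's checkpointer already persists its state after ainvoke, so rewriting that state afterwards is both unnecessary and risky.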

@pawel-twardziak I removed the TOON conversion function and am now analyzing the results. I will let you know what I observe in this case.

Thanks for your feedback.


@pawel-twardziak When I removed the TOON format conversion function, it started working. However, since I have a multi-agent architecture, I also need the agents to share context. Now, when I interact with the agent to book an appointment, it consumes almost 35k tokens. I tried to exclude tools using the built-in middleware, but sometimes it excludes a tool that I need to fulfill the user's request, and then I get an error.
Another approach would be to exclude tools that were already used earlier in the conversation, but I may still need information that was extracted in an earlier tool call. Also, SummarizationMiddleware will rewrite the full context.
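For context, the kind of history trimming I have in mind could be sketched like this. trim_history is a hypothetical helper of my own (not an existing middleware): it keeps the system message plus only the most recent messages, similar in spirit to a keep=("messages", 5) setting.

```python
def trim_history(messages, keep_last=5):
    """Keep the system message (if any) plus the last `keep_last` messages.

    `messages` is a list of {"role": ..., "content": ...} dicts.
    """
    system = [m for m in messages if m["role"] == "system"][:1]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]

history = [{"role": "system", "content": "You are a booking assistant."}]
history += [{"role": "user", "content": f"turn {i}"} for i in range(10)]

trimmed = trim_history(history, keep_last=5)
print(len(trimmed))  # system message + 5 most recent turns
```

The downside, as noted above, is that facts extracted in early tool calls fall out of the window unless they are summarized or stored elsewhere.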

Can you guide me on this? Which approach is best for keeping the full context needed to fulfill user requests while also minimizing token consumption?

Looking forward to your guidance.


Hi @razaullah

OK, great that it works again. I’ll follow up today or tomorrow.

Hi @razaullah, I have a few conclusions, but I need to put them together. I’ll get back to you tomorrow.

Sorry, I was occupied yesterday; I will do my best today.