Advice for implementing a complex customer-facing AI healthcare assistant

Hi everyone,

I’m currently working on a fairly large healthcare conversational use case and would appreciate advice from the community on architecture and best practices.

Use Case Overview

The assistant is expected to handle end-to-end patient journeys, including (but not limited to):

  • Fetching patient information
  • Fetching a patient’s appointment details
  • Confirming or cancelling appointments
  • Booking appointments:
    • Asking the user about symptoms
    • Fetching hospitals near the user’s location
    • Fetching clinics based on the selected hospital
    • Fetching available doctors with dates and time slots
    • Supporting both “nearest available” appointments and bookings on a specific date
  • Sending appointment details via SMS or WhatsApp
  • Sending hospital location via SMS or WhatsApp
  • Fetching and updating insurance details
  • Fetching insurance status
  • Fetching user-submitted queries to doctors

All backend capabilities are exposed as tools via an MCP server.

Current Architecture

I’m using a multi-agent architecture:

  • A supervisor agent created using create_agent
  • Multiple sub-agents, also created using create_agent, exposed as tools to the supervisor
  • Each sub-agent owns a specific domain and has access to its own set of tools

Challenges

  1. Flow reliability
    In complex scenarios (especially booking), the supervisor sometimes skips steps or jumps ahead, even though the steps are logically required.
    I’m concerned about whether this architecture can reliably handle all edge cases and long, structured workflows without missing mandatory steps.
  2. Bilingual support (Arabic & English)
    • ~90% of usage will be in Arabic, ~10% in English
    • Arabic responses require heavy customization:
      • Strict response structures for hospital lists
      • Different formatting rules for doctor lists, appointment slots, confirmations, etc.
    • This currently requires a lot of prompt-level instructions, and I’m worried about prompt complexity, maintainability, and consistency.

What I’m Looking For

I’d love guidance on:

  • Whether this supervisor + sub-agent pattern is the right approach for such deterministic, step-heavy workflows
  • Techniques to enforce step-by-step execution (e.g., state machines, planners, guardrails, or explicit workflow validation)
  • Best practices for handling highly structured, multilingual responses, especially when one language (Arabic) dominates and has strict formatting requirements
  • Any real-world patterns you’ve used successfully for similar healthcare or booking flows in LangChain

Any suggestions, examples, or references would be greatly appreciated.

Thanks in advance! :folded_hands:

Hi @razaullah

This is a really solid and realistic healthcare use case. You’re right to be concerned about reliability, especially for booking flows where steps are mandatory and can’t be skipped.

Based on similar structured workflows I’ve worked on, I’d suggest slightly shifting the architecture rather than relying fully on a supervisor + sub-agent setup.

1. Supervisor + Sub-Agents

That pattern is powerful for reasoning-heavy or open-ended tasks, but for deterministic, step-by-step flows (like booking), it can become unpredictable. LLMs are probabilistic planners, so they may occasionally skip or reorder steps.

For booking pipelines, I would strongly recommend using LangGraph with explicit state and controlled transitions.

2. Move Booking to a State-Driven LangGraph

Instead of letting a supervisor decide the flow dynamically, define a state object and explicit nodes.

Example state:

    from typing import Optional, TypedDict

    class BookingState(TypedDict):
        patient_id: str
        symptoms: Optional[str]
        hospital_id: Optional[str]
        clinic_id: Optional[str]
        doctor_id: Optional[str]
        slot_id: Optional[str]
        confirmed: bool
        step: str

Each node performs one atomic action:

  1. collect_symptoms
  2. fetch_hospitals
  3. select_hospital
  4. fetch_clinics
  5. fetch_doctors
  6. select_slot
  7. confirm_booking
  8. send_notifications

Example node:

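    def collect_symptoms(state: BookingState):
        # Symptoms already captured: nothing to do.
        # .get() avoids a KeyError when the key has not been set yet.
        if state.get("symptoms"):
            return state
        # Ask the user for symptoms here, then record that we are waiting.
        state["step"] = "awaiting_symptoms"
        return state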

Example conditional transition:

    def route_after_symptoms(state: BookingState):
        # Stay on the symptoms step until the field is filled.
        if not state.get("symptoms"):
            return "collect_symptoms"
        return "fetch_hospitals"

This guarantees that hospitals are never fetched before symptoms are present, and the same holds for every later step. The graph, not the LLM, enforces the order.
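The same ordering rule can be sketched without any framework. This is only an illustration under assumptions: `STEP_FIELDS` and `next_step` are hypothetical names, and the step names are adapted from the node list above. In LangGraph, this kind of logic would sit in the routing functions attached to conditional edges.

```python
# Each step and the state field it is responsible for filling (hypothetical names).
# Order matters: a step is only reached once every earlier field is set.
STEP_FIELDS = [
    ("collect_symptoms", "symptoms"),
    ("select_hospital", "hospital_id"),
    ("select_clinic", "clinic_id"),
    ("select_doctor", "doctor_id"),
    ("select_slot", "slot_id"),
]


def next_step(state: dict) -> str:
    """Return the first step whose field is still missing."""
    for step, field in STEP_FIELDS:
        if not state.get(field):
            return step
    # Every selection is present; only confirmation is left.
    return "done" if state.get("confirmed") else "confirm_booking"


if __name__ == "__main__":
    state = {"patient_id": "p-1", "symptoms": "headache"}
    print(next_step(state))  # select_hospital
```

Because the next step is computed from the state alone, a user message that jumps ahead (for example, naming a doctor before a hospital) cannot move the flow past a missing field.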

3. Hybrid Pattern (Recommended)

Use the LLM for:

  1. Intent classification
  2. Extracting structured fields from user messages
  3. Clarification questions

Use LangGraph for:

  1. Workflow control
  2. Mandatory step enforcement
  3. Tool execution
  4. State validation

Think of it as:

LLM = interpreter

Graph = controller

This pattern tends to be much more stable in production.
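A minimal sketch of that split, assuming the LLM returns extracted fields as a JSON string. `fake_llm_extract` is a stand-in for a real model call, and `ALLOWED_FIELDS` and `apply_extraction` are hypothetical names. The point is that the model proposes values, while code decides which of them reach the workflow state.

```python
import json

# Fields the extractor may fill; anything else in its output is ignored.
ALLOWED_FIELDS = {"symptoms", "hospital_id", "clinic_id", "doctor_id", "slot_id"}


def fake_llm_extract(message: str) -> str:
    """Stand-in for an LLM call that returns a JSON string of extracted fields."""
    if "headache" in message.lower():
        # The extra "confirmed" key simulates a model overstepping its role.
        return json.dumps({"symptoms": "headache", "confirmed": True})
    return "{}"


def apply_extraction(state: dict, raw_json: str) -> dict:
    """Merge extracted fields into a copy of the state, keeping control in code."""
    try:
        extracted = json.loads(raw_json)
    except json.JSONDecodeError:
        return dict(state)  # malformed output changes nothing
    if not isinstance(extracted, dict):
        return dict(state)
    updated = dict(state)
    for key, value in extracted.items():
        # Only allow-listed, non-empty string values reach the state.
        if key in ALLOWED_FIELDS and isinstance(value, str) and value.strip():
            updated[key] = value.strip()
    return updated


if __name__ == "__main__":
    state = {"patient_id": "p-1", "confirmed": False}
    state = apply_extraction(state, fake_llm_extract("I have a headache"))
    print(state["symptoms"], state["confirmed"])  # headache False
```

Note that `confirmed` is not in the allow-list, so the interpreter can never mark a booking as confirmed on its own.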

4. Enforcing Step Reliability

A few practical suggestions:

  1. Validate required fields before every transition.
  2. Keep tool calls idempotent (especially for booking/cancellation).
  3. Persist state in Redis or a database.
  4. Log every state transition for audit (important in healthcare).
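For the idempotency point, one possible approach is to derive a stable key from the request and return the stored result when the same request arrives again. The sketch below is an assumption-laden illustration: `BookingStore` is an in-memory stand-in for Redis or a database, and the key scheme (patient plus slot) is hypothetical.

```python
import hashlib


def idempotency_key(patient_id: str, slot_id: str) -> str:
    """Derive a stable key so a retried request maps to the same booking."""
    raw = f"{patient_id}:{slot_id}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class BookingStore:
    """In-memory stand-in for Redis or a database table."""

    def __init__(self) -> None:
        self._bookings = {}
        self.backend_calls = 0  # counts how often the real backend would be hit

    def book(self, patient_id: str, slot_id: str) -> dict:
        key = idempotency_key(patient_id, slot_id)
        if key in self._bookings:
            # Retry or duplicate tool call: return the earlier result unchanged.
            return self._bookings[key]
        self.backend_calls += 1  # the real booking API call would go here
        booking = {"booking_id": key[:12], "patient_id": patient_id, "slot_id": slot_id}
        self._bookings[key] = booking
        return booking


if __name__ == "__main__":
    store = BookingStore()
    first = store.book("p-1", "slot-9")
    retry = store.book("p-1", "slot-9")
    print(first == retry, store.backend_calls)  # True 1
```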

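Add guard checks like this one, used as a routing function before confirmation:

    def validate_required_fields(state: BookingState):
        # Route back to data collection if any mandatory field is missing.
        required = ["hospital_id", "doctor_id", "slot_id"]
        for field in required:
            if not state.get(field):
                return "collect_missing_info"
        return "confirm_booking"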

In healthcare systems, determinism and traceability matter more than agent cleverness.
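For traceability, a small helper can write one audit entry per transition. This is a sketch, not a compliance recipe: `record_transition` is a hypothetical name, the list stands in for a real log sink, and recording only the names of changed fields (not their values) is an assumption about keeping patient data out of logs.

```python
import json
from datetime import datetime, timezone


def record_transition(log: list, session_id: str, from_step: str, to_step: str, changed: dict) -> dict:
    """Append one audit entry describing a state transition."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "session_id": session_id,
        "from": from_step,
        "to": to_step,
        # Log which fields changed, not their values.
        "changed_fields": sorted(changed),
    }
    # One JSON object per line is easy to ship to most log pipelines.
    log.append(json.dumps(entry, ensure_ascii=False))
    return entry


if __name__ == "__main__":
    audit_log = []
    record_transition(
        audit_log, "sess-1", "collect_symptoms", "fetch_hospitals", {"symptoms": "headache"}
    )
    print(len(audit_log))  # 1
```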

5. Arabic + English Strategy

I would strongly avoid solving strict formatting purely in prompts. That becomes fragile and hard to maintain.

Instead:

  1. Have the LLM return structured JSON.
  2. Use a separate rendering layer for Arabic and English.

Example LLM output:

    {
      "hospitals": [
        {"id": "1", "name": "Al Noor Hospital", "distance_km": 3.2}
      ]
    }

Then format in code:

    def render_hospitals_ar(data):
        return "\n".join(
            [f"{i+1}. {h['name']} - {h['distance_km']} كم"
             for i, h in enumerate(data["hospitals"])]
        )

This keeps:

  1. Prompts simpler
  2. Formatting deterministic
  3. Arabic structure fully controlled

The LLM should handle reasoning and extraction, not presentation rules.
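Extending the Arabic renderer above to both languages, a minimal sketch might validate the structured output first and then pick the unit label by language. `UNIT_KM`, `validate_hospitals`, and `render_hospitals` are hypothetical names, and the line format simply mirrors the example above rather than any required layout.

```python
# Distance unit label per language code.
UNIT_KM = {"ar": "كم", "en": "km"}


def validate_hospitals(payload: dict) -> list:
    """Check the structured output before rendering; raise on anything unexpected."""
    hospitals = payload.get("hospitals")
    if not isinstance(hospitals, list):
        raise ValueError("'hospitals' must be a list")
    for h in hospitals:
        if (
            not isinstance(h, dict)
            or not isinstance(h.get("name"), str)
            or not isinstance(h.get("distance_km"), (int, float))
        ):
            raise ValueError(f"malformed hospital entry: {h!r}")
    return hospitals


def render_hospitals(payload: dict, lang: str) -> str:
    """Render a numbered hospital list in the requested language."""
    unit = UNIT_KM[lang]  # a KeyError for unsupported languages is deliberate
    lines = [
        f"{i}. {h['name']} - {h['distance_km']} {unit}"
        for i, h in enumerate(validate_hospitals(payload), start=1)
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    data = {"hospitals": [{"id": "1", "name": "Al Noor Hospital", "distance_km": 3.2}]}
    print(render_hospitals(data, "en"))  # 1. Al Noor Hospital - 3.2 km
```

Adding a new list type (doctors, slots, confirmations) then means adding one validator and one renderer, with no prompt changes.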

6. Where Multi-Agent Still Makes Sense

Multi-agent is still useful for:

  1. Insurance Q&A
  2. General medical inquiries
  3. User-submitted questions to doctors
  4. Knowledge retrieval (RAG)

But for strict booking pipelines, a graph-based workflow is usually safer.

7. High-Level Architecture Suggestion

Intent Classifier

→ Route to:

  1. Booking Graph
  2. Insurance Graph
  3. General Q&A Agent
  4. Escalation / Human Handoff

This prevents a supervisor from hallucinating control flow in critical paths.
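The routing layer can be sketched as a plain lookup from intent label to handler. Everything here is illustrative: `classify_intent` is a keyword stand-in for an LLM classifier, the handler functions are placeholders for the real graphs and agents, and sending unknown labels to human handoff is an assumed default.

```python
def classify_intent(message: str) -> str:
    """Keyword stand-in for an LLM intent classifier."""
    text = message.lower()
    if "book" in text or "appointment" in text:
        return "booking"
    if "insurance" in text:
        return "insurance"
    if "human" in text:
        return "handoff"
    return "general"


# Placeholder handlers; each would invoke the real graph or agent.
def run_booking_graph(message: str) -> str:
    return "booking_graph"


def run_insurance_graph(message: str) -> str:
    return "insurance_graph"


def run_general_agent(message: str) -> str:
    return "general_agent"


def escalate_to_human(message: str) -> str:
    return "human_handoff"


ROUTES = {
    "booking": run_booking_graph,
    "insurance": run_insurance_graph,
    "general": run_general_agent,
    "handoff": escalate_to_human,
}


def route(message: str, classify=classify_intent) -> str:
    """Dispatch on the intent label; unknown labels go to a human."""
    handler = ROUTES.get(classify(message), escalate_to_human)
    return handler(message)


if __name__ == "__main__":
    print(route("I want to book an appointment"))  # booking_graph
```

The classifier only ever picks a label from a fixed set, so a bad classification can send a request to the wrong workflow but cannot invent a new one.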

@Bitcot_Kaushal, thank you very much for the detailed answer. It makes sense to use a deterministic workflow, but my agent needs to perform several different tasks, possibly within one session. For example, a user can ask for a hospital location, which the agent has to fetch and send via the user’s preferred channel (SMS or WhatsApp), and then ask to book an appointment.
For that reason I have divided the architecture into a multi-agent system: a supervisor routes each user request to a book-appointment agent, an appointment-management agent, an insurance-details agent, and so on. All of them share the same memory for context management.
How can I handle this?

Hi @razaullah

That makes complete sense: your use case definitely needs flexibility within the same session.

You can absolutely keep the multi-agent + supervisor setup. The important part is just this:

:backhand_index_pointing_right: Let the Supervisor handle routing only (booking, insurance, location, etc.)
:backhand_index_pointing_right: Let each critical flow (like booking) run as a deterministic LangGraph workflow internally

So when the user says:

  • “Send hospital location” → route to Info agent

  • Then “Book appointment” → route to Booking graph

Each workflow controls its own steps reliably, while the supervisor just decides where to send the request.

This way you get flexibility + reliability without the supervisor hallucinating control flow in critical paths.
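One way to picture switching workflows inside a single session is a container that holds shared context plus a separate state per workflow. This is a sketch under assumptions: `Session` and `handle` are hypothetical names, and a real setup would persist this through a checkpointer or database instead of keeping it in memory.

```python
class Session:
    """Per-conversation container: shared context plus one state dict per workflow."""

    def __init__(self, patient_id: str) -> None:
        self.shared = {"patient_id": patient_id}
        self.workflows = {}

    def state_for(self, workflow: str) -> dict:
        # Create the workflow state on first use; later calls resume it.
        return self.workflows.setdefault(workflow, {})


def handle(session: Session, workflow: str, updates: dict) -> dict:
    """Apply updates to one workflow's state without touching the others."""
    state = session.state_for(workflow)
    state.update(updates)
    # Each workflow sees the shared context merged with its own state.
    return {**session.shared, **state}


if __name__ == "__main__":
    session = Session("p-1")
    handle(session, "booking", {"symptoms": "headache"})  # booking starts
    handle(session, "info", {"channel": "whatsapp"})      # user asks for a location
    view = handle(session, "booking", {"hospital_id": "h-1"})  # booking resumes
    print(view["symptoms"])  # headache
```

The booking state survives the detour to the info request, so the booking workflow can pick up at the step where it stopped.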

@Bitcot_Kaushal, should the supervisor be implemented with the create_agent function or with the LangGraph approach? Currently I use create_agent for a supervisor that routes to the sub-agents, which I pass to it as tools. The sub-agents are also defined with create_agent, and each has its own tools from the MCP server.

Hi @razaullah ,

Use LangGraph for the Supervisor, not create_agent.

Let the supervisor graph handle:

  • Intent detection

  • Routing

  • Session state

Then:

  • Use deterministic LangGraph workflows for critical flows (like booking).

  • Use create_agent agents only for flexible/reasoning tasks (insurance Q&A, general info, etc.).

So basically:

Graph = controller
Agents = workers

That way you keep flexibility in one session, but avoid LLM-based routing breaking important booking steps.

@Bitcot_Kaushal, I got your points. Thank you for the guidance, much appreciated.

Hi @razaullah

If it’s solved, a huge favor: please mark this post as Solved so that others can benefit from it :rocket: