🚀 Help Needed: LangChain JS + Google GenAI Live Voice + RAG (MongoDB Vector Search)

Hello everyone!
I have built a RAG application using LangChain JS + MongoDB Atlas Vector Search, and everything works perfectly for text chat.

Now I want to add real-time voice (live audio) conversation using the Google GenAI Live WebSocket API, but I am unsure how to integrate it with LangChain's RAG retriever.

Below is my full implementation and what I want to achieve.


:white_check_mark: My Current Setup (Working)

Technologies:

  • LangChain JS
  • MongoDB Atlas Vector Search
  • Google Generative AI (embeddings + chat model)
  • Express backend
  • SSE streaming for chat responses
  • PDF ingestion & chunking

:pushpin: My Full Code (Current Working RAG Setup)

Server Setup

import express from "express";
import cors from "cors";
import dotenv from "dotenv";
import multer from "multer";
import path from "path";
import fs from "fs";
import { v4 as uuidv4 } from "uuid";
import { MongoClient } from "mongodb";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import {
    GoogleGenerativeAIEmbeddings,
    ChatGoogleGenerativeAI,
} from "@langchain/google-genai";
import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { WebSocketServer } from "ws";
import { GoogleGenAI, Modality } from "@google/genai";

dotenv.config();

const app = express();
const PORT = process.env.PORT || 8000;

// ==========================
// โš™๏ธ Middleware
// ==========================
app.use(cors());
app.use(express.json());
app.use(express.urlencoded({ extended: true }));

const upload = multer({ dest: "uploads/" });

// ==========================
// 🧠 MongoDB & Model Setup
// ==========================
const MONGO_URI = process.env.MONGO_URI;
const DB_NAME = process.env.MONGO_DB_NAME || "rag_demo";
const COLLECTION_NAME = process.env.MONGO_COLLECTION || "knowledge_base";

const embeddings = new GoogleGenerativeAIEmbeddings({
    model: "text-embedding-004",
    apiKey: process.env.GOOGLE_API_KEY,
});

const chatModel = new ChatGoogleGenerativeAI({
    model: "gemini-2.0-flash",
    apiKey: process.env.GOOGLE_API_KEY,
    streaming: true,
});

:file_folder: Upload API (PDF → Chunks → Embeddings → MongoDB)

app.post("/api/upload-file", upload.single("file"), async (req, res) => {
    const { file } = req;
    const { name } = req.body;

    if (!file) return res.status(400).json({ error: "File is required" });
    if (!name) return res.status(400).json({ error: "Name is required" });

    const ext = path.extname(file.originalname).toLowerCase();
    const fileId = uuidv4();

    try {
        if (ext !== ".pdf") {
            return res.status(400).json({ error: "Only PDF files are supported" });
        }

        const loader = new PDFLoader(file.path);
        const rawDocs = await loader.load();

        const splitter = new RecursiveCharacterTextSplitter({
            chunkSize: 1500,
            chunkOverlap: 200,
        });

        const splitDocs = await splitter.splitDocuments(rawDocs);

        const documents = splitDocs.map((doc, index) => ({
            ...doc,
            metadata: {
                fileId,
                name,
                fileName: file.originalname,
                chunkIndex: index,
                uploadedAt: new Date(),
            },
        }));

        const client = new MongoClient(MONGO_URI);
        await client.connect();
        const db = client.db(DB_NAME);
        const collection = db.collection(COLLECTION_NAME);

        const vectorStore = new MongoDBAtlasVectorSearch(embeddings, {
            collection,
            indexName: "vector_index",
            textKey: "pageContent",
            embeddingKey: "embedding",
        });

        await vectorStore.addDocuments(documents);
        await client.close();

        res.status(201).json({
            message: "✅ File uploaded & vectors stored in MongoDB Atlas",
            fileId,
            totalChunks: documents.length,
        });
    } catch (err) {
        console.error("Upload error:", err);
        res.status(500).json({ error: err.message });
    } finally {
        // Clean up the temp upload whether ingestion succeeded or failed
        if (fs.existsSync(file.path)) fs.unlinkSync(file.path);
    }
});

:speech_balloon: Chat API (RAG + SSE Streaming)

app.post("/api/chat-stream", async (req, res) => {
    const { question } = req.body;
    if (!question) return res.status(400).json({ error: "Question is required" });

    try {
        res.setHeader("Content-Type", "text/event-stream");
        res.setHeader("Cache-Control", "no-cache");
        res.setHeader("Connection", "keep-alive");

        const client = new MongoClient(MONGO_URI);
        await client.connect();
        const db = client.db(DB_NAME);
        const collection = db.collection(COLLECTION_NAME);

        const vectorStore = new MongoDBAtlasVectorSearch(embeddings, {
            collection,
            indexName: "vector_index",
            textKey: "pageContent",
            embeddingKey: "embedding",
        });

        const results = await vectorStore.similaritySearch(question, 5);
        const context = results.map((r) => r.pageContent).join("\n---\n");

        const prompt = `
You are a helpful AI assistant. Use the context below to answer the user's question.
If the answer is not in the context, say "I don't know".

Context:
${context}

Question:
${question}
`;

        const stream = await chatModel.stream(prompt);

        for await (const chunk of stream) {
            const token = chunk?.content;
            if (token && token.length > 0) {
                res.write(`data: ${JSON.stringify({ type: "text", data: token })}\n\n`);
            }
        }

        res.write(`data: ${JSON.stringify({ type: "complete" })}\n\n`);
        res.end();
        await client.close();
    } catch (err) {
        console.error("Chat streaming error:", err);
        res.write(`data: ${JSON.stringify({ type: "error", data: err.message })}\n\n`);
        res.end();
    }
});
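
For reference, this is how I consume the stream on the client side. Each event arrives as a `data: <json>\n\n` frame, so a tiny parser (my own helper, not part of LangChain) turns a received buffer into payload objects:

```javascript
// Minimal parser for the /api/chat-stream SSE payloads above.
// Splits a received buffer on the "\n\n" frame boundary and parses
// each "data: <json>" line into { type: "text" | "complete" | "error", data? }.
function parseSseChunk(buffer) {
  return buffer
    .split("\n\n")
    .filter((frame) => frame.startsWith("data: "))
    .map((frame) => JSON.parse(frame.slice(6))); // "data: " is 6 chars
}
```

On the frontend I append `event.data` for `type: "text"` tokens and stop reading on `type: "complete"`.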

:microphone: Now My Actual Question: How to do Live Voice RAG?

Google GenAI provides real-time bidirectional audio streaming using WebSockets:

wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent

Using @google/genai, we can open a session roughly like this (as I understand it, `config` and `callbacks` are also part of the call, and the exact live model name may differ):

const session = await client.live.connect({
    model: "gemini-2.0-flash-live",
    config: { responseModalities: [Modality.AUDIO] },
    callbacks: { onmessage: (msg) => { /* transcripts + audio chunks */ } },
});

This gives:

  • Real-time microphone streaming
  • Real-time transcript
  • Real-time audio response

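One piece I expect to need regardless of the answer: as far as I can tell, the Live API wants input audio as base64-encoded 16-bit little-endian PCM, so mic samples (Float32 from the browser) have to be converted first. My helper sketch (name and shape are mine; sample-rate handling not shown):

```javascript
// Convert Float32 mic samples in [-1, 1] to 16-bit little-endian PCM,
// base64-encoded — the input format the Live API expects for realtime
// audio, to the best of my understanding.
function floatTo16BitPcmBase64(float32) {
  const buf = Buffer.alloc(float32.length * 2);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i])); // clamp to valid range
    buf.writeInt16LE(Math.round(s * 32767), i * 2);
  }
  return buf.toString("base64");
}
```
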
:red_question_mark: My Question to the LangChain Community

Does LangChain JS support RAG inside Live Voice (WebSocket) sessions?

Specifically:

:check_mark: Q1: Is there any built-in LangChain wrapper for the Google Live WebSocket API?

:check_mark: Q2: Can I combine real-time transcript → RAG → send context → audio response?

:check_mark: Q3: Or do I need to implement this flow manually?
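
If it has to be manual, this is roughly the flow I imagine (function names and the `sendClientContent` usage are my assumptions, not verified against the SDK):

```javascript
// Hypothetical manual flow: transcript in → retriever → context back into
// the live session as a text turn, letting Gemini answer with audio.

// Pure helper: build the context-augmented turn from retrieved docs
// (mirrors the prompt format of my /api/chat-stream endpoint).
function buildContextTurn(question, docs) {
  const context = docs.map((d) => d.pageContent).join("\n---\n");
  return `Use this context to answer:\n${context}\n\nQuestion: ${question}`;
}

// Wiring sketch (not runnable without a live session and API key):
// called whenever the live session emits a user transcript.
async function handleTranscript(session, vectorStore, transcript) {
  const docs = await vectorStore.similaritySearch(transcript, 5);
  session.sendClientContent({
    turns: [{ role: "user", parts: [{ text: buildContextTurn(transcript, docs) }] }],
    turnComplete: true,
  });
}
```

Is this the intended pattern, or is there a better way to keep retrieval inside LangChain while the audio loop runs over the WebSocket?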