🚀 Help Needed: LangChain JS + Google GenAI Live Voice + RAG (MongoDB Vector Search)

Hello everyone!
I have built a RAG application using LangChain JS + MongoDB Atlas Vector Search, and everything works perfectly for text chat.

Now I want to add real-time voice (live audio) conversation using the Google GenAI Live WebSocket API, but I am unsure how to integrate it with LangChain's RAG retriever.

Below is my full implementation and what I want to achieve.


:white_check_mark: My Current Setup (Working)

Technologies:

  • LangChain JS
  • MongoDB Atlas Vector Search
  • Google Generative AI (embeddings + chat model)
  • Express backend
  • SSE streaming for chat responses
  • PDF ingestion & chunking

:pushpin: My Full Code (Current Working RAG Setup)

Server Setup

import express from "express";
import cors from "cors";
import dotenv from "dotenv";
import multer from "multer";
import path from "path";
import fs from "fs";
import { v4 as uuidv4 } from "uuid";
import { MongoClient } from "mongodb";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import {
    GoogleGenerativeAIEmbeddings,
    ChatGoogleGenerativeAI,
} from "@langchain/google-genai";
import { MongoDBAtlasVectorSearch } from "@langchain/mongodb";
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { WebSocketServer } from "ws";
import { GoogleGenAI, Modality } from "@google/genai";

dotenv.config();

const app = express();
const PORT = process.env.PORT || 8000;

// ==========================
// โš™๏ธ Middleware
// ==========================
app.use(cors());
app.use(express.json());
app.use(express.urlencoded({ extended: true }));

const upload = multer({ dest: "uploads/" });

// ==========================
// 🧠 MongoDB & Model Setup
// ==========================
const MONGO_URI = process.env.MONGO_URI;
const DB_NAME = process.env.MONGO_DB_NAME || "rag_demo";
const COLLECTION_NAME = process.env.MONGO_COLLECTION || "knowledge_base";

const embeddings = new GoogleGenerativeAIEmbeddings({
    model: "text-embedding-004",
    apiKey: process.env.GOOGLE_API_KEY,
});

const chatModel = new ChatGoogleGenerativeAI({
    model: "gemini-2.0-flash",
    apiKey: process.env.GOOGLE_API_KEY,
    streaming: true,
});

:file_folder: Upload API (PDF → Chunks → Embeddings → MongoDB)

app.post("/api/upload-file", upload.single("file"), async (req, res) => {
    const { file } = req;
    const { name } = req.body;

    if (!file) return res.status(400).json({ error: "File is required" });
    if (!name) return res.status(400).json({ error: "Name is required" });

    const ext = path.extname(file.originalname).toLowerCase();
    const fileId = uuidv4();

    try {
        if (ext !== ".pdf") {
            return res.status(400).json({ error: "Only PDF files are supported" });
        }

        const loader = new PDFLoader(file.path);
        const rawDocs = await loader.load();

        const splitter = new RecursiveCharacterTextSplitter({
            chunkSize: 1500,
            chunkOverlap: 200,
        });

        const splitDocs = await splitter.splitDocuments(rawDocs);

        const documents = splitDocs.map((doc, index) => ({
            ...doc,
            metadata: {
                fileId,
                name,
                fileName: file.originalname,
                chunkIndex: index,
                uploadedAt: new Date(),
            },
        }));

        const client = new MongoClient(MONGO_URI);
        await client.connect();
        const db = client.db(DB_NAME);
        const collection = db.collection(COLLECTION_NAME);

        const vectorStore = new MongoDBAtlasVectorSearch(embeddings, {
            collection,
            indexName: "vector_index",
            textKey: "pageContent",
            embeddingKey: "embedding",
        });

        await vectorStore.addDocuments(documents);
        await client.close();

        res.status(201).json({
            message: "✅ File uploaded & vectors stored in MongoDB Atlas",
            fileId,
            totalChunks: documents.length,
        });
    } catch (err) {
        console.error("Upload error:", err);
        res.status(500).json({ error: err.message });
    } finally {
        // Clean up the temp upload whether ingestion succeeded or failed
        if (fs.existsSync(file.path)) fs.unlinkSync(file.path);
    }
});

:speech_balloon: Chat API (RAG + SSE Streaming)

app.post("/api/chat-stream", async (req, res) => {
    const { question } = req.body;
    if (!question) return res.status(400).json({ error: "Question is required" });

    try {
        res.setHeader("Content-Type", "text/event-stream");
        res.setHeader("Cache-Control", "no-cache");
        res.setHeader("Connection", "keep-alive");

        const client = new MongoClient(MONGO_URI);
        await client.connect();
        const db = client.db(DB_NAME);
        const collection = db.collection(COLLECTION_NAME);

        const vectorStore = new MongoDBAtlasVectorSearch(embeddings, {
            collection,
            indexName: "vector_index",
            textKey: "pageContent",
            embeddingKey: "embedding",
        });

        const results = await vectorStore.similaritySearch(question, 5);
        const context = results.map((r) => r.pageContent).join("\n---\n");

        const prompt = `
You are a helpful AI assistant. Use the context below to answer the user's question.
If the answer is not in the context, say "I don't know".

Context:
${context}

Question:
${question}
`;

        const stream = await chatModel.stream(prompt);

        for await (const chunk of stream) {
            const token = chunk?.content;
            if (token && token.length > 0) {
                res.write(`data: ${JSON.stringify({ type: "text", data: token })}\n\n`);
            }
        }

        res.write(`data: ${JSON.stringify({ type: "complete" })}\n\n`);
        res.end();
        await client.close();
    } catch (err) {
        console.error("Chat streaming error:", err);
        res.write(`data: ${JSON.stringify({ type: "error", data: err.message })}\n\n`);
        res.end();
    }
});
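
For reference, this is how I consume the stream on the client side. Each event arrives as a `data: <json>\n\n` frame, so a tiny parser (my own helper, not part of LangChain) turns a received buffer into payload objects:

```javascript
// Minimal parser for the /api/chat-stream SSE payloads above.
// Splits a received buffer on the "\n\n" frame boundary and parses
// each "data: <json>" line into { type: "text" | "complete" | "error", data? }.
function parseSseChunk(buffer) {
  return buffer
    .split("\n\n")
    .filter((frame) => frame.startsWith("data: "))
    .map((frame) => JSON.parse(frame.slice(6))); // "data: " is 6 chars
}
```

On the frontend I append `event.data` for `type: "text"` tokens and stop reading on `type: "complete"`.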

:microphone: Now My Actual Question: How to do Live Voice RAG?

Google GenAI provides real-time bidirectional audio streaming using WebSockets:

wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent

Using @google/genai, we can open a session roughly like this (as I understand it, `config` and `callbacks` are also part of the call, and the exact live model name may differ):

const session = await client.live.connect({
    model: "gemini-2.0-flash-live",
    config: { responseModalities: [Modality.AUDIO] },
    callbacks: { onmessage: (msg) => { /* transcripts + audio chunks */ } },
});

This gives:

  • Real-time microphone streaming
  • Real-time transcript
  • Real-time audio response

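One piece I expect to need regardless of the answer: as far as I can tell, the Live API wants input audio as base64-encoded 16-bit little-endian PCM, so mic samples (Float32 from the browser) have to be converted first. My helper sketch (name and shape are mine; sample-rate handling not shown):

```javascript
// Convert Float32 mic samples in [-1, 1] to 16-bit little-endian PCM,
// base64-encoded — the input format the Live API expects for realtime
// audio, to the best of my understanding.
function floatTo16BitPcmBase64(float32) {
  const buf = Buffer.alloc(float32.length * 2);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i])); // clamp to valid range
    buf.writeInt16LE(Math.round(s * 32767), i * 2);
  }
  return buf.toString("base64");
}
```
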
:red_question_mark: My Question to the LangChain Community

Does LangChain JS support RAG inside Live Voice (WebSocket) sessions?

Specifically:

:check_mark: Q1: Is there any built-in LangChain wrapper for the Google Live WebSocket API?

:check_mark: Q2: Can I combine real-time transcript → RAG → send context → audio response?

:check_mark: Q3: Or do I need to implement this flow manually?
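
If it has to be manual, this is roughly the flow I imagine (function names and the `sendClientContent` usage are my assumptions, not verified against the SDK):

```javascript
// Hypothetical manual flow: transcript in → retriever → context back into
// the live session as a text turn, letting Gemini answer with audio.

// Pure helper: build the context-augmented turn from retrieved docs
// (mirrors the prompt format of my /api/chat-stream endpoint).
function buildContextTurn(question, docs) {
  const context = docs.map((d) => d.pageContent).join("\n---\n");
  return `Use this context to answer:\n${context}\n\nQuestion: ${question}`;
}

// Wiring sketch (not runnable without a live session and API key):
// called whenever the live session emits a user transcript.
async function handleTranscript(session, vectorStore, transcript) {
  const docs = await vectorStore.similaritySearch(transcript, 5);
  session.sendClientContent({
    turns: [{ role: "user", parts: [{ text: buildContextTurn(transcript, docs) }] }],
    turnComplete: true,
  });
}
```

Is this the intended pattern, or is there a better way to keep retrieval inside LangChain while the audio loop runs over the WebSocket?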