Add WhatsApp Chat Document Loader (Python parity)

Checked

  • I searched existing ideas and did not find a similar one
  • I added a very descriptive title
  • I’ve clearly described the feature request and motivation for it

Feature request

Add a WhatsApp Chat Loader to LangChain.js, matching the functionality available in LangChain Python.

Motivation

WhatsApp is one of the most widely used messaging platforms globally (2B+ users), and chat export is a common use case for conversational AI and RAG applications. The Python implementation exists but the JS/TS version doesn’t, creating a gap for developers who need this functionality.

Why this matters:

  1. Feature parity with Python implementation - ensures consistency across the LangChain ecosystem
  2. High demand - WhatsApp chat analysis is a frequent use case for conversation history, customer support analysis, and knowledge extraction
  3. Simple implementation - Text file parsing only, no external API dependencies or authentication required
  4. Low maintenance burden - Standard .txt export format with well-defined structure

Reference:

Proposal (If applicable)

Important discovery: This is actually a Chat Loader, not a Document Loader. It returns ChatSession objects with HumanMessage instances, preserving the conversational structure.

Implementation approach:

  • Type: Chat Loader (not a Document Loader)
  • Location: libs/langchain-community/src/chat_loaders/whatsapp.ts
  • Entrypoint: @langchain/community/chat_loaders/whatsapp
  • Input: Exported WhatsApp chat files (.txt format, without media)
  • Output: ChatSession with HumanMessage objects containing sender and timestamp metadata
  • Features:
    • Multi-line message support
    • System message filtering (deleted messages, media omitted, etc.)
    • Timestamp and sender extraction
    • Support for both 12-hour (AM/PM) and 24-hour formats
  • Dependencies: None (pure text parsing with regex)
  • Tests: Integration tests with sample chat exports
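The features listed above could be sketched roughly as follows. This is only a minimal illustration, not the Python implementation and not the code from any PR: the `ParsedMessage` type, the `parseWhatsAppExport` function, both regexes, and the ignored placeholder strings are assumptions made for this example. Real exports vary by platform and locale, so the pattern would need to be checked against actual sample files.

```typescript
// Hypothetical shape for one parsed message (not LangChain's ChatSession type).
interface ParsedMessage {
  timestamp: string; // kept as the raw string, since date order depends on locale
  sender: string;
  text: string;
}

// Assumed layout: optional "[", date, ", ", time with optional seconds and
// optional AM/PM, optional "]", optional " -", then "sender: text".
const MESSAGE_LINE =
  /^\[?(\d{1,4}[\/.-]\d{1,2}[\/.-]\d{1,4},\s\d{1,2}:\d{2}(?::\d{2})?(?:\s?[AP]M)?)\]?(?:\s-)?\s([^:]+):\s(.*)$/i;

// A line that starts with a timestamp but has no "sender: text" part is
// treated here as a system notice (for example, someone leaving a group).
const TIMESTAMP_PREFIX = /^\[?\d{1,4}[\/.-]\d{1,2}[\/.-]\d{1,4},\s\d{1,2}:\d{2}/;

// Placeholder texts to filter out (assumed wording, compared case-insensitively).
const IGNORED_TEXTS = new Set(["this message was deleted", "<media omitted>"]);

function parseWhatsAppExport(raw: string): ParsedMessage[] {
  const messages: ParsedMessage[] = [];
  let current: ParsedMessage | null = null;

  for (const line of raw.split(/\r?\n/)) {
    const match = MESSAGE_LINE.exec(line);
    if (match) {
      // A new message starts here.
      current = { timestamp: match[1], sender: match[2].trim(), text: match[3] };
      messages.push(current);
    } else if (TIMESTAMP_PREFIX.test(line)) {
      // System notice: skip it and stop appending to the previous message.
      current = null;
    } else if (current) {
      // Continuation of a multi-line message.
      current.text += "\n" + line;
    }
  }

  // Drop placeholder messages such as deleted or media-only entries.
  return messages.filter((m) => !IGNORED_TEXTS.has(m.text.trim().toLowerCase()));
}
```

In an actual loader, each `ParsedMessage` would presumably be mapped to a HumanMessage with the sender and timestamp stored as metadata, as described under Output above.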

Example usage:

import { WhatsAppChatLoader } from "@langchain/community/chat_loaders/whatsapp";

const loader = new WhatsAppChatLoader("path/to/chat.txt");
const chatSessions = await loader.load();
// Returns ChatSession with HumanMessage objects

Question: Given the simplicity of this loader and low maintenance requirements, would it be acceptable to contribute this directly to @langchain/community, or should this be published as a standalone NPM package per the current integration guidelines? I’m ready to implement this with full tests and documentation if this is welcome in the main repository.

Hi @Jude

Thanks @pawel-twardziak for jumping on this!
I was planning to implement it, but glad to see it’s already done.
I’ll review the PR and provide feedback if needed.

Hi @Jude

I raised the PR quickly. It is more or less a mirror of the Python implementation. If you think the loader should include something else, feel free to add comments, or just raise your own PR and let me know, and I will cancel mine :slight_smile:
