The docs describe how to trim messages for short-term memory so you don't exceed the model's token window. The problem is that the documentation is outdated: the `preModelHook` option is no longer available. My question is: how can I do the same with the current version of LangChain?
Hi @Younes,
Have you tried middleware? See Overview - Docs by LangChain.
Use agent middleware (the beforeModel hook) to trim or summarize messages before the model call, or use the built-in summarizationMiddleware.
The built-in one:
```ts
import { createAgent, summarizationMiddleware } from "langchain";
import { ChatOpenAI } from "@langchain/openai";

const agent = createAgent({
  model: new ChatOpenAI({ model: "gpt-4o" }),
  tools: [],
  middleware: [
    summarizationMiddleware({
      model: new ChatOpenAI({ model: "gpt-4o" }),
      trigger: { tokens: 4000, messages: 10 }, // when to summarize
      keep: { messages: 20 }, // how much to retain afterward
    }),
  ],
});
```
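For intuition, here is a rough sketch of what the two options mean: `trigger` decides when summarization kicks in, and `keep` decides how many recent messages survive verbatim. This is my own simplification (chars/4 as an approximate token count), not the middleware's actual implementation, which also calls the model to produce the summary message:

```typescript
type Msg = { content: string };

// Rough model of the trigger/keep bookkeeping only.
function shouldSummarize(
  msgs: Msg[],
  trigger: { tokens: number; messages: number }
): boolean {
  // chars/4 as a crude token estimate
  const approxTokens = msgs.reduce((n, m) => n + m.content.length / 4, 0);
  return approxTokens >= trigger.tokens || msgs.length >= trigger.messages;
}

function split(msgs: Msg[], keep: { messages: number }) {
  return {
    toSummarize: msgs.slice(0, -keep.messages), // folded into a summary message
    kept: msgs.slice(-keep.messages),           // retained verbatim
  };
}

const history: Msg[] = Array.from({ length: 12 }, (_, i) => ({ content: `msg ${i}` }));
console.log(shouldSummarize(history, { tokens: 4000, messages: 10 })); // true (12 >= 10 messages)
console.log(split(history, { messages: 20 }).kept.length); // 12: fewer than 20, all kept
```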
Or write your own middleware:
```ts
import { createAgent, createMiddleware, trimMessages } from "langchain";
import { ChatOpenAI } from "@langchain/openai";
import { RemoveMessage } from "@langchain/core/messages";
import { REMOVE_ALL_MESSAGES } from "@langchain/langgraph";

const trimHistory = createMiddleware({
  name: "TrimHistory",
  beforeModel: async (state) => {
    const trimmed = await trimMessages(state.messages, {
      strategy: "last",
      maxTokens: 4000,
      allowPartial: true,
      includeSystem: true,
      // Simple approximate counter; replace with your tokenizer if needed
      tokenCounter: async (msgs) =>
        msgs.reduce(
          (sum, m) => sum + (typeof m.content === "string" ? m.content.length / 4 : 0),
          0
        ),
    });
    // The messages channel is merged with an appending reducer, so returning
    // the trimmed list alone would be *added* to the existing history.
    // Prepend a RemoveMessage with REMOVE_ALL_MESSAGES to replace it instead.
    return {
      messages: [new RemoveMessage({ id: REMOVE_ALL_MESSAGES }), ...trimmed],
    };
  },
});

const agent = createAgent({
  model: new ChatOpenAI({ model: "gpt-4o" }),
  tools: [],
  middleware: [trimHistory],
});
```
I am trying to create a middleware that trims the messages, but it doesn’t seem to be working. With my code snippet below I would expect the agent to not have any memory, since I am removing all previous messages. But it still has memory.
```ts
trimHistory = createMiddleware({
  name: 'TrimHistory',
  beforeModel: async (state) => ({
    messages: [],
  }),
});
```
```ts
private readonly agent = createAgent({
  model: new ChatOpenAI({ model: 'gpt-4o-mini' }),
  tools: [retrieveDocuments],
  contextSchema: z.object({ organisationId: z.number() }),
  checkpointer: this.checkpointer,
  middleware: [this.trimHistory],
});
```
The checkpointer (checkpointer: this.checkpointer) is the memory.
If the middleware does not work, there must be something wrong with its implementation.
This is the implementation:
```ts
this.checkpointer = PostgresSaver.fromConnString(
  process.env.DATABASE_URL as string,
);
```
I meant the middleware implementation.
Could you share your entire code so that I can assess and test whether it is an implementation issue?
I’ll share all the parts that belong to the agent:
```ts
const retrieveDocuments = tool(
  async ({ query }, config) => {
    const { organisationId } = config.context;
    const vectorStore = await PGVectorStore.initialize(embeddingsModel, {
      tableName: 'embeddings',
      postgresConnectionOptions: {
        connectionString: process.env.DATABASE_URL,
      },
    });
    const limit = 20;
    const retrievedDocs = await vectorStore.similaritySearch(query, limit, {
      organisationId,
    });
    const serialized = retrievedDocs
      .map(
        (doc) =>
          `Source: ${doc.metadata.source}\n${doc.metadata?.ogImage ? `ogImage: ${doc.metadata.ogImage}\n` : ''}Content: ${doc.pageContent}`,
      )
      .join('\n');
    await vectorStore.end();
    return [serialized, retrievedDocs];
  },
  {
    name: 'retrieve',
    description: 'Retrieve information related to a query.',
    schema: z.object({ query: z.string() }),
    responseFormat: 'content_and_artifact',
  },
);
```
```ts
const checkpointer = PostgresSaver.fromConnString(
  process.env.DATABASE_URL as string,
);

const agent = createAgent({
  model: new ChatOpenAI({ model: 'gpt-4o-mini' }),
  tools: [retrieveDocuments],
  contextSchema: z.object({ organisationId: z.number() }),
  responseFormat: z.object({
    text: z.string().describe('A markdown response to the user query'),
    sources: z
      .array(z.object({ url: z.string(), ogImage: z.string() }))
      .describe('the sources used to come to an answer'),
  }),
  checkpointer: checkpointer,
});
```
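A note on the `messages: []` experiment earlier in the thread: in LangGraph-backed agents the messages channel is merged through an appending reducer, so a middleware that returns an empty array adds nothing rather than clearing anything. Deletion has to be expressed explicitly; in the real API that means returning a RemoveMessage whose id is the REMOVE_ALL_MESSAGES sentinel from @langchain/langgraph (treat that export as my assumption about the current version). The sketch below models just that reducer behavior in plain TypeScript, with a stand-in sentinel:

```typescript
type Msg = { id: string; content: string };

// Stand-in sentinel, analogous to LangGraph's REMOVE_ALL_MESSAGES.
const REMOVE_ALL = "__remove_all__";

// Simplified model of the messages reducer: updates are appended, not
// substituted, unless a remove-all marker is present. (The real reducer
// also merges by id and handles single-message removals.)
function reduceMessages(existing: Msg[], update: Msg[]): Msg[] {
  const marker = update.findIndex((m) => m.id === REMOVE_ALL);
  if (marker !== -1) {
    // Prior history is dropped; only messages after the marker remain.
    return update.slice(marker + 1);
  }
  return [...existing, ...update];
}

const history: Msg[] = [
  { id: "1", content: "hi" },
  { id: "2", content: "hello!" },
];

// Returning [] from beforeModel: the reducer appends nothing; history survives.
console.log(reduceMessages(history, []).length); // 2

// Returning a remove-all marker first actually clears the channel.
console.log(reduceMessages(history, [{ id: REMOVE_ALL, content: "" }]).length); // 0
```

So to wipe or trim history from beforeModel, return something like [new RemoveMessage({ id: REMOVE_ALL_MESSAGES }), ...messagesToKeep] rather than a bare array.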