RecursiveCharacterTextSplitter separator order

wavozy · July 1, 2025, 9:01pm

Hi! It’s not very clear to me how the order of the items in the separator list, which is a parameter of RecursiveCharacterTextSplitter, works. The docs (RecursiveCharacterTextSplitter, LangChain 0.0.149) are very brief. It’s only clear that the text isn’t split at every item, and the order seems to indicate some sort of priority. But how exactly does it decide to split the input text at one of the default items in the separator list, ["\n\n", "\n", " ", ""]?
Thank you

nhuang · July 1, 2025, 9:24pm

It tries to split in the order of the separator list.

First it will try to split by the first item, "\n\n".
Then, if any of the resulting chunks are larger than the max chunk size (the chunk_size parameter), the next separator in the list is used on that chunk to try to reduce its size.
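To make that first pass concrete, here is a minimal pure-Python sketch. It is not LangChain code: the helper name oversized_pieces and the sample text are made up for illustration. It only shows which pieces from the first split would be handed to the next separator.

```python
def oversized_pieces(text, separator, chunk_size):
    """Split once and flag the pieces that would need the next separator.

    Hypothetical helper for illustration only, not part of LangChain.
    """
    pieces = [p for p in text.split(separator) if p]
    # True means: still too big, so the next separator would be tried on it.
    return [(p, len(p) > chunk_size) for p in pieces]


text = "short paragraph\n\nthis paragraph is quite a bit longer\nand spans two lines"
result = oversized_pieces(text, "\n\n", 20)
for piece, too_big in result:
    print(too_big, repr(piece))
```

Only the second piece is flagged here, so only that piece would go on to be split by "\n".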

wavozy · July 1, 2025, 9:54pm

Thank you! But how long does it wait before trying the next separator? If a chunk surpasses max_chunk_size, it could try each separator in turn, immediately reach "", and split the string right there. It seems to me that the string is sometimes split earlier than max_chunk_size and sometimes much later, but I don’t understand how it decides when to use each separator once the index approaches max_chunk_size. In other words, how do I know the real min and max chunk size before another separator is picked? For example, with the separator list [". ", " "], how can I know the min and max string size at which the separator changes from dot to space? Thanks again

nhuang · July 1, 2025, 10:32pm

There’s no concept of “waiting”. The full implementation is here; you can take a look for full detail.

You can think of it like this:
First, we split on the first separator.
Then we iterate through the chunks of the document.
When we hit a chunk that is too big (larger than the max chunk size), then:

We merge all of the small, acceptable chunks seen so far, getting as close as possible to the desired chunk size, and flush those into a “final chunks” list.
We then use the next separator on the too-big chunk to break it into smaller pieces, and continue.

You can check out the source code for the exact logic, but this is the general overview!
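The steps above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the function name recursive_split is made up, separators are dropped rather than kept, and chunk overlap is ignored. It does show why there is no fixed size at which the separator "switches": a later separator is only ever applied to an individual piece that is still larger than chunk_size.

```python
def recursive_split(text, separators, chunk_size):
    """Simplified sketch of recursive splitting (assumes chunk_size >= 1).

    Split on the first separator, merge small pieces up to chunk_size,
    and recurse with the remaining separators on any piece that is too big.
    """
    sep, rest = separators[0], separators[1:]
    # An empty separator means "split into single characters".
    pieces = text.split(sep) if sep else list(text)

    final_chunks = []  # finalized output
    pending = []       # small pieces waiting to be merged

    def flush():
        # Greedily pack pending pieces into chunks of at most chunk_size.
        current = ""
        for piece in pending:
            candidate = piece if not current else current + sep + piece
            if len(candidate) <= chunk_size:
                current = candidate
            else:
                final_chunks.append(current)
                current = piece
        if current:
            final_chunks.append(current)
        pending.clear()

    for piece in pieces:
        if not piece:
            continue
        if len(piece) <= chunk_size:
            pending.append(piece)
        else:
            # Too big: finalize what we have, then try the next separator.
            flush()
            if rest:
                final_chunks.extend(recursive_split(piece, rest, chunk_size))
            else:
                final_chunks.append(piece)  # no separators left, keep as-is
    flush()
    return final_chunks


chunks = recursive_split("aaa bbb ccc. dddddddddd eee. ff", [". ", " "], 8)
print(chunks)  # ['aaa bbb', 'ccc', 'dddddddddd', 'eee', 'ff']
```

In this example "ff" stays whole because it already fits after splitting on ". ", while "aaa bbb ccc" is re-split on " " and merged back up to the limit. The piece "dddddddddd" remains oversize because no separator is left to break it; with "" at the end of the list it would be cut into characters and re-merged.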
