Inability to Determine the Breakpoint between s1 and s2 due to Current Windowing Logic

OnAnd0n · March 12, 2026, 6:20pm

I have just opened an issue on GitHub.
Could you please take a look when you have a moment?

https://github.com/langchain-ai/langchain-experimental/issues/73

keenborder786 · March 12, 2026, 10:10pm

@OnAnd0n I checked the problem you identified and your proposed solution.

Why don’t we make combine_sentences use a fixed-size window with edge padding (clamp or reflect indices) instead of shrinking the window at document boundaries?

It removes the start/end “dead-zone” bias. It also keeps embedding calls at n (same as now), so runtime and cost stay nearly identical. Plus, it’s a small, local change in one function.
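To make the clamping idea concrete, here is a minimal sketch. It assumes sentences are plain strings; clamped_window and combine_sentences_clamped are hypothetical names for illustration, not the actual langchain-experimental code.

```python
def clamped_window(i: int, n: int, buffer_size: int) -> list[int]:
    """Indices of a fixed-size window centred on sentence i.

    Out-of-range indices are clamped to the first or last sentence,
    so every window has exactly 2 * buffer_size + 1 entries.
    """
    return [min(max(j, 0), n - 1)
            for j in range(i - buffer_size, i + buffer_size + 1)]


def combine_sentences_clamped(sentences: list[str], buffer_size: int = 1) -> list[str]:
    """Hypothetical variant of combine_sentences with edge clamping."""
    n = len(sentences)
    return [
        " ".join(sentences[j] for j in clamped_window(i, n, buffer_size))
        for i in range(n)
    ]
```

With three sentences and buffer_size=1, the first window repeats the first sentence and the last window repeats the last one, so no window is shorter than the others.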

OnAnd0n · March 14, 2026, 4:27pm

@keenborder786
Thank you for the clamping suggestion.
Unfortunately, I found that the failure to separate s1 and s2 persists even after applying the clamping method.
However, by applying the Disjoint Context Comparison approach alongside your suggested Clamping method, we can address the boundary detection issue at the start of the document.

While Clamping ensures a stable window size, combining it with a Disjoint approach eliminates Inclusion Bias by isolating the information at each boundary.

Here is how the combined logic works with a buffer_size of 2:

Boundary 0 (s1 | s2):
Left (Pre): {s1, s1} (Clamping applied)
Right (Post): {s2, s3}

Boundary 1 (s2 | s3):
Left (Pre): {s1, s2}
Right (Post): {s3, s4}

Boundary 2 (s3 | s4):
Left (Pre): {s2, s3}
Right (Post): {s4, s5}

Boundary 3 (s4 | s5):
Left (Pre): {s3, s4}
Right (Post): {s5, s6}

Boundary 4 (s5 | s6):
Left (Pre): {s4, s5}
Right (Post): {s6, s6} (Clamping applied)

By integrating these two approaches, we can accurately capture the semantic shift between s1 and s2.
Furthermore, since each sentence is embedded only once and the windows are built from those per-sentence vectors, the computational overhead and cost are reduced.
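The window table above can be reproduced with a short sketch. The function name disjoint_windows is hypothetical, and indices are 0-based, so s1 is index 0.

```python
def disjoint_windows(n: int, buffer_size: int) -> list[tuple[list[int], list[int]]]:
    """For each boundary k (between sentence k and k + 1), return the
    left (pre) and right (post) index windows.

    The two windows never share a sentence, and out-of-range indices
    are clamped so both always contain buffer_size entries.
    """
    def clamp(j: int) -> int:
        return min(max(j, 0), n - 1)

    windows = []
    for k in range(n - 1):
        left = [clamp(j) for j in range(k - buffer_size + 1, k + 1)]
        right = [clamp(j) for j in range(k + 1, k + buffer_size + 1)]
        windows.append((left, right))
    return windows


# Six sentences, buffer_size of 2, as in the listing above.
for k, (left, right) in enumerate(disjoint_windows(6, 2)):
    pre = ", ".join(f"s{j + 1}" for j in left)
    post = ", ".join(f"s{j + 1}" for j in right)
    print(f"Boundary {k}: Left {{{pre}}} | Right {{{post}}}")
```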

Applied Methods

Clamping: Ensures stable window sizes at the document edges.
Disjoint Context Comparison: Eliminates overlap bias by isolating pre- and post-context windows.

Modified Functions

_calculate_sentence_distances: Updated to implement the Disjoint and Clamping logic.
calculate_cosine_distances: Refactored to handle the refined distance metrics.

Deprecated Function

combine_sentences: This function is no longer used as the logic has been shifted to a more efficient vector-based mean pooling.
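As a rough illustration of the vector-based mean pooling mentioned above, the sketch below pools per-sentence embeddings over the disjoint, clamped windows and takes the cosine distance at each boundary. It is pure Python with hypothetical names (mean_pool, boundary_distances) and is not the code from the PR; treating a zero-length vector as maximally distant is an assumed convention.

```python
import math


def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; zero vectors count as distance 1.0 (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def boundary_distances(embeddings: list[list[float]], buffer_size: int) -> list[float]:
    """Distance at each boundary k between the pooled pre and post windows.

    Each sentence embedding is computed once upstream; the windows only
    reuse those vectors, so no combined text is re-embedded.
    """
    n = len(embeddings)

    def clamp(j: int) -> int:
        return min(max(j, 0), n - 1)

    distances = []
    for k in range(n - 1):
        left = mean_pool([embeddings[clamp(j)]
                          for j in range(k - buffer_size + 1, k + 1)])
        right = mean_pool([embeddings[clamp(j)]
                           for j in range(k + 1, k + buffer_size + 1)])
        distances.append(cosine_distance(left, right))
    return distances
```

With toy embeddings where only the first sentence differs from the rest, the largest distance falls on the s1 | s2 boundary, which is the case this thread is about.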

OnAnd0n · March 14, 2026, 4:32pm

@keenborder786

I am attaching an image showing the improved results after applying the proposed logic.

Would it be okay to proceed with the Pull Request (PR) in this manner?
Also, could you recommend whom I should tag or request for a review?
(I am mindful that the GitHub maintainers all have very busy schedules, and I want to make sure the PR reaches the right person so it doesn’t get overlooked.)

keenborder786 · March 17, 2026, 6:42am

Excellent, yes, you can open a PR in the langchain-ai/langchain-community repository on GitHub (Community-maintained LangChain integrations) and mention @mdrxy.

keenborder786 · March 17, 2026, 6:43am

Also, if possible, @OnAnd0n, can you mark your reply as a solution? It helps the community.

OnAnd0n · March 17, 2026, 10:51am

Thank you for the guidance!!!
Just to double-check, my changes are currently within the langchain-experimental package. Should I still open the PR in the langchain-community repository, or was that a link for the experimental repo?

keenborder786 · March 17, 2026, 1:33pm

Oh yes, open it in the langchain-experimental repository.

OnAnd0n · April 12, 2026, 3:01pm

@keenborder786
There hasn’t been any progress on PRs in langchain-experimental at all.
In this case, where or whom should I reach out to?

keenborder786 · April 13, 2026, 11:13am

@OnAnd0n you can reach out to @mdrxy or tag him in your PR.
