GraphDocument with multiple Document sources

febus · November 19, 2025, 1:31pm

Hello there! I am trying to ingest data in Neo4J for a hybrid RAG application.

I have seen some tutorials where you basically do this:

1. Load Document objects through a loader
2. Chunk the documents
3. Feed them to an LLM transformer, which creates GraphDocument objects with GraphDocument.source pointing to the original Document chunk
4. Add them to Neo4j using Neo4jGraph.add_graph_documents(graph_documents, include_source=True)

My JSON structures contain a Title, a Summary, and a Long Description (plus some irrelevant properties), so I’d like to build my data as:

GraphDocument (everything but the description)
GraphDocument.source (all the JSON data, formatted and with embeddings for vector queries)

My data is already available in a structured JSON format, and I know exactly what the GraphDocument has to contain. I don’t need an LLM to read the JSON and infer the structure (it’s slower, more expensive, and not deterministic).

I thought about building it manually, but I ran into an issue I couldn’t find any documentation for: GraphDocument.source is a single Document object, while the description will inevitably become large at some point and will need to be chunked into multiple Document objects.

I’m not sure what purpose GraphDocument.source serves (apart from creating the MENTIONS relationships). I could keep the vector and graph data separate and create those relationships manually, but is that the right approach?

Wouldn’t it be better to support multiple sources in the GraphDocument class?

hsm207 · December 12, 2025, 7:43pm

I agree that from a design perspective, it would be very convenient if the GraphDocument.source attribute could directly support a list of Document objects. This would be an intuitive way to represent a graph derived from a text that has been chunked into multiple pieces.

The current design, however, is still quite powerful and supports this use case. You could enrich the metadata of each document chunk during the ingestion process.

For example, when you split a large document, you can add identifiers to each chunk’s metadata, such as a parent_id (linking to the original document) together with either a chunk_number or a pair of next_chunk_id and prev_chunk_id pointers.
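
The chunk identifiers described above could be attached roughly like this. It is only a minimal sketch: Chunk is a hypothetical stand-in for a LangChain Document, split_with_links and the chunk_id key are names made up for illustration, and the fixed-size character split is a placeholder for whatever text splitter you actually use.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a LangChain Document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_with_links(parent_id: str, text: str, chunk_size: int = 200) -> list[Chunk]:
    """Split text into fixed-size chunks and link each one to its neighbours."""
    pieces = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks = []
    for n, piece in enumerate(pieces):
        chunks.append(Chunk(
            page_content=piece,
            metadata={
                # chunk_id is an assumed naming scheme: "<parent_id>-<chunk_number>"
                "chunk_id": f"{parent_id}-{n}",
                "parent_id": parent_id,
                "chunk_number": n,
                # First and last chunks have no previous / next neighbour.
                "prev_chunk_id": f"{parent_id}-{n - 1}" if n > 0 else None,
                "next_chunk_id": f"{parent_id}-{n + 1}" if n < len(pieces) - 1 else None,
            },
        ))
    return chunks
```

Each resulting chunk could then be used as the source of its own GraphDocument, so every GraphDocument still points at exactly one Document while the metadata ties the chunks back together.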

Then, at retrieval time, when you fetch a graph node and its corresponding source chunk, your application can inspect this metadata. If the context seems incomplete, you can use the parent_id and chunk_number to retrieve the next (or previous) chunks from your document store and reconstruct the full context for the LLM.
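
The retrieval-time step might look something like the sketch below. It assumes a hypothetical in-memory store keyed by (parent_id, chunk_number); in a real application that lookup would be a query against Neo4j or your document store, and expand_context and window are illustrative names, not part of any library.

```python
def expand_context(store: dict, parent_id: str, chunk_number: int, window: int = 1) -> str:
    """Join the retrieved chunk with up to `window` neighbours on each side."""
    numbers = range(max(0, chunk_number - window), chunk_number + window + 1)
    # Skip neighbours that do not exist (start or end of the document).
    parts = [store[(parent_id, n)] for n in numbers if (parent_id, n) in store]
    return "".join(parts)


# Hypothetical in-memory store keyed by (parent_id, chunk_number).
store = {
    ("doc1", 0): "Neo4j stores ",
    ("doc1", 1): "graphs and ",
    ("doc1", 2): "vectors.",
}

# A hit on chunk 1 is expanded with its previous and next chunk.
context = expand_context(store, "doc1", 1)
```

A larger window gives the LLM more surrounding text at the cost of a longer prompt, so it is probably worth tuning per use case.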
