Hello there! I am trying to ingest data into Neo4j for a hybrid RAG application.
I have seen some tutorials where you basically do this:
- Load `Document` objects through a loader
- Chunk the documents
- Feed the chunks to an LLM transformer, which creates `GraphDocument` objects with `GraphDocument.source` pointing to the original `Document` chunk
- Add them to Neo4j using `Neo4jGraph.add_graph_documents(graph_documents, include_source=True)`
My JSON structure basically contains Title, Summary, and Long Description (plus some additional irrelevant properties), so I'd like to build my data as:
- `GraphDocument` (everything but the description)
- `GraphDocument.source` (all the JSON data, formatted and with embeddings for vector queries)
Since my data is already available in a structured JSON format and I know exactly what the `GraphDocument` has to contain, I don't really need an LLM to read the JSON and infer its structure (that would be slower, more expensive, and non-deterministic).
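Because the JSON already has a fixed shape, the mapping can be a plain, deterministic function. Below is a minimal sketch, assuming each record has `id`, `title`, `summary` and `description` keys. The key names, the `Item` label, and the `NodeSpec`/`GraphDocSpec` classes are hypothetical; the classes are plain-Python stand-ins for LangChain's `Node`/`GraphDocument` so the sketch runs without LangChain installed.

```python
import json
from dataclasses import dataclass, field


@dataclass
class NodeSpec:
    """Hypothetical stand-in for LangChain's Node (id, type, properties)."""
    id: str
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class GraphDocSpec:
    """Hypothetical stand-in for GraphDocument: nodes, relationships, source."""
    nodes: list
    relationships: list
    source_text: str
    source_metadata: dict


# Assumed JSON keys: which ones live on the graph node vs. in the source text.
NODE_KEYS = ("title", "summary")
SOURCE_KEYS = ("title", "summary", "description")


def record_to_graph_doc(record: dict) -> GraphDocSpec:
    """Deterministically map one JSON record to graph data plus source text."""
    node = NodeSpec(
        id=record["id"],
        type="Item",  # hypothetical label
        properties={k: record[k] for k in NODE_KEYS if k in record},
    )
    # Source text: the JSON content formatted for embedding; extra keys are dropped.
    source_text = "\n".join(
        f"{k.capitalize()}: {record[k]}" for k in SOURCE_KEYS if k in record
    )
    return GraphDocSpec(
        nodes=[node],
        relationships=[],  # no relationships between items in this example
        source_text=source_text,
        source_metadata={"item_id": record["id"]},
    )


record = json.loads(
    '{"id": "item-1", "title": "Widget", "summary": "A small widget.",'
    ' "description": "A long description of the widget.", "sku": "X1"}'
)
doc = record_to_graph_doc(record)
print(doc.nodes[0].properties)  # {'title': 'Widget', 'summary': 'A small widget.'}
print(doc.source_text)
```

With LangChain installed, the same pieces would map onto `Node(id=..., type=..., properties=...)` and `GraphDocument(nodes=..., relationships=..., source=Document(page_content=..., metadata=...))`. Since no LLM is involved, re-running it on the same record always yields the same graph.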
I thought about building it manually, but I ran into an issue I couldn't find any documentation for: `GraphDocument.source` is a single `Document` object, while the description will inevitably grow large at some point and need to be chunked into multiple `Document` objects.
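Chunking the description itself does not need an LLM either. A minimal sketch using fixed-size character windows with overlap, a simple stand-in for a real splitter such as LangChain's `RecursiveCharacterTextSplitter`; `chunk_text`, `chunk_description`, the chunk id scheme, and the default sizes are all hypothetical choices:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def chunk_description(item_id: str, description: str, **kwargs) -> list[dict]:
    """One chunk record per window, tagged with its parent item and position."""
    return [
        {"id": f"{item_id}-chunk-{i}", "item_id": item_id, "index": i, "text": piece}
        for i, piece in enumerate(chunk_text(description, **kwargs))
    ]


chunks = chunk_description("item-1", "x" * 500, chunk_size=200, overlap=40)
print([len(c["text"]) for c in chunks])  # [200, 200, 180]
```

Each chunk record carries its parent `item_id` and its position, so it can later be linked back to the item node.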
I'm not sure what purpose `GraphDocument.source` serves (apart from creating the `MENTIONS` relationships). I could keep the vector and graph data separate and create those relationships manually, but is that the right approach?
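Keeping vector and graph data separate and linking them manually could look roughly like this: one Cypher statement that merges the item node and attaches each embedded description chunk to it. This is a sketch under assumptions: the `Item`/`Chunk` labels, the property names, `build_ingest_params`, and the toy `fake_embed` function are hypothetical (a real embedding model would replace it); `MENTIONS` is reused only to match the relationship name that `include_source=True` creates.

```python
from typing import Callable

# Hypothetical labels/properties; MENTIONS matches the name include_source=True uses.
INGEST_QUERY = """
MERGE (i:Item {id: $item_id})
SET i.title = $title, i.summary = $summary
WITH i
UNWIND $chunks AS chunk
MERGE (c:Chunk {id: chunk.id})
SET c.text = chunk.text, c.index = chunk.index, c.embedding = chunk.embedding
MERGE (c)-[:MENTIONS]->(i)
"""


def build_ingest_params(
    record: dict, chunks: list[str], embed: Callable[[str], list[float]]
) -> dict:
    """Build the parameters for INGEST_QUERY: item fields plus embedded chunks."""
    return {
        "item_id": record["id"],
        "title": record.get("title"),
        "summary": record.get("summary"),
        "chunks": [
            {
                "id": f"{record['id']}-chunk-{i}",
                "index": i,
                "text": text,
                "embedding": embed(text),  # your embedding model goes here
            }
            for i, text in enumerate(chunks)
        ],
    }


def fake_embed(text: str) -> list[float]:
    """Toy embedding for illustration only; replace with a real model."""
    return [float(len(text)), float(text.count(" "))]


params = build_ingest_params(
    {"id": "item-1", "title": "Widget", "summary": "A small widget."},
    ["first part of the description", "second part"],
    fake_embed,
)
```

With LangChain's `Neo4jGraph`, the statement could then be run as `graph.query(INGEST_QUERY, params)`, and a vector index on `Chunk.embedding` would serve the vector side of the hybrid queries.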
Wouldn't it be better to support multiple sources in the `GraphDocument` class?
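Without changing `GraphDocument`, one possible workaround is to emit one graph document per chunk, all carrying the same nodes. As far as I can tell, `Neo4jGraph.add_graph_documents` MERGEs nodes on their `id`, so the shared node would be created once and every chunk would get its own `MENTIONS` edge to it. The sketch uses plain dicts plus a small simulation of MERGE-by-id; `fan_out_sources` and `simulate_merge` are hypothetical helpers, not LangChain APIs.

```python
def fan_out_sources(nodes: list[dict], relationships: list[dict],
                    chunks: list[dict]) -> list[dict]:
    """One graph-document-shaped dict per chunk; only the source differs."""
    return [
        {"nodes": nodes, "relationships": relationships, "source": chunk}
        for chunk in chunks
    ]


def simulate_merge(graph_docs: list[dict]) -> tuple[set, set]:
    """Mimic MERGE-by-id to show the resulting node ids and MENTIONS edges."""
    node_ids, mentions = set(), set()
    for gd in graph_docs:
        for node in gd["nodes"]:
            node_ids.add(node["id"])  # same id -> same node, created once
            mentions.add((gd["source"]["id"], node["id"]))  # chunk -> node
    return node_ids, mentions


nodes = [{"id": "item-1", "type": "Item"}]
chunks = [{"id": "item-1-chunk-0", "text": "..."}, {"id": "item-1-chunk-1", "text": "..."}]
node_ids, mentions = simulate_merge(fan_out_sources(nodes, [], chunks))
print(node_ids)  # {'item-1'}
print(sorted(mentions))  # [('item-1-chunk-0', 'item-1'), ('item-1-chunk-1', 'item-1')]
```

With LangChain this would be `[GraphDocument(nodes=nodes, relationships=rels, source=chunk_doc) for chunk_doc in chunk_docs]` passed to `graph.add_graph_documents(..., include_source=True)`. If I read the LangChain source correctly, each source `Document` node is keyed on `metadata["id"]` (falling back to a hash of the text), so giving every chunk a distinct, stable id should keep re-ingestion from creating duplicates.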