Curious to get other developers’ thoughts on a design tradeoff for a restaurant conversational AI use case.
We’re trying to decide whether menu data should live in the system prompt or be pulled from a database during the conversation.
My concern with the system prompt approach is reliability: the model may hallucinate, miss constraints, or fail to consistently respect add-ons and customizations.
My concern with the database approach is latency: if every item selection or modification requires a DB call, that may slow down the interaction too much for a real-time conversational experience.
For those who have worked on similar systems, how are you thinking about this tradeoff? Have you found a good pattern that balances response quality, accuracy, and latency?
It largely depends on the size and complexity of the menu data. If the menu is small and relatively static, keeping it in the system prompt can work well because it avoids extra round trips and keeps latency low.
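The inline approach can be as simple as rendering the menu into the prompt string at startup. A minimal sketch (the menu data and function names here are illustrative, not from any specific framework):

```python
# Hypothetical small, static menu; in practice this might be loaded
# once from a config file at deploy time.
MENU = {
    "burgers": [("Classic Burger", 8.99), ("Veggie Burger", 9.49)],
    "drinks": [("Cola", 2.49)],
}

def build_system_prompt(menu: dict) -> str:
    """Render the whole menu into the system prompt up front."""
    lines = ["You are a restaurant ordering assistant. Menu:"]
    for category, items in menu.items():
        lines.append(f"{category}:")
        lines.extend(f"  - {name}: ${price:.2f}" for name, price in items)
    lines.append("Only offer items and prices listed above.")
    return "\n".join(lines)

print(build_system_prompt(MENU))
```

For a menu this size, the extra tokens are negligible and every turn has the full menu in context with zero extra round trips.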
However, once the menu becomes large (categories, add-ons, modifiers, pricing rules, or availability changes), injecting everything into the system prompt becomes expensive in token usage and can also increase latency on the LLM side due to the larger context window. It also makes updates harder to manage.
In those cases, a better pattern is to store the menu in a database and expose it through structured tools. A useful approach is to divide the menu by category (e.g., burgers, drinks, desserts) and let the agent fetch only the relevant subset when needed, rather than loading the entire menu every turn.
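To make that concrete, here is a rough sketch of the tool-backed pattern, assuming a SQLite store and a JSON-schema-style tool declaration of the kind most function-calling APIs accept (the table name, function name, and schema envelope are all illustrative; the exact tool format varies by provider):

```python
import json
import sqlite3

# In-memory stand-in for the real menu database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE menu_items (name TEXT, category TEXT, price REAL, modifiers TEXT)"
)
conn.executemany(
    "INSERT INTO menu_items VALUES (?, ?, ?, ?)",
    [
        ("Classic Burger", "burgers", 8.99, json.dumps(["extra cheese", "bacon"])),
        ("Veggie Burger", "burgers", 9.49, json.dumps(["extra cheese"])),
        ("Cola", "drinks", 2.49, json.dumps([])),
    ],
)

def get_menu_category(category: str) -> str:
    """Fetch only one category's items, returned as JSON for the model."""
    rows = conn.execute(
        "SELECT name, price, modifiers FROM menu_items WHERE category = ?",
        (category,),
    ).fetchall()
    items = [
        {"name": n, "price": p, "modifiers": json.loads(m)} for n, p, m in rows
    ]
    return json.dumps({"category": category, "items": items})

# Tool declaration the agent framework would register; the agent calls
# get_menu_category only when the conversation actually needs that category.
MENU_TOOL = {
    "name": "get_menu_category",
    "description": "Fetch menu items for one category (e.g. burgers, drinks).",
    "parameters": {
        "type": "object",
        "properties": {"category": {"type": "string"}},
        "required": ["category"],
    },
}

print(get_menu_category("burgers"))
```

The latency cost is one indexed lookup per category the conversation touches, which is usually far cheaper than carrying the full menu in every prompt, and the returned JSON grounds the model in exact names, prices, and modifiers instead of relying on its memory of the context.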