Make ToolCallLimitMiddleware proactive via before_model hook

The `ToolCallLimitMiddleware` is currently reactive: the limit logic runs only after the limit has been reached. This has a few downsides:

  1. The LLM may use tools less effectively because it is not aware of the tool call limit from the beginning. When the limit is reached, we panic the LLM by suddenly sending a "tool call limit reached" message that it had no warning of.

  2. Detecting the limit reactively leads to an additional tool call that never gets used.

Use Case

An LLM that is aware of the tool call limit can plan ahead and MAYBE converge better. Compare this with how a human acts in real life: in a game, if we know how many tries or retries are left, we plan accordingly.

Proposed Solution

This can be implemented with the `@before_model` hook, which passes an additional message telling the LLM how many tool calls are left.

[
HumanMessage(content='Get all current affairs across the globe'),
HumanMessage(content=f'You are left with {tool_calls_left} tool calls, plan accordingly'),  # inserted before model call
AIMessage(content=...),
ToolMessage(content=...),
HumanMessage(content=f'You are left with {tool_calls_left} tool calls, plan accordingly'),  # inserted before next model call
AIMessage(content=...),
...
]
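To make the flow above concrete, here is a minimal, library-free sketch of what such a hook could compute before each model call. The `Message` dataclass, `count_tool_calls`, and `before_model_notice` are hypothetical stand-ins invented for illustration, not LangChain APIs. The sketch also assumes that each tool result in the history counts as one used tool call.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a chat message (not a LangChain class)."""

    role: str  # "human", "ai", or "tool"
    content: str


def count_tool_calls(messages: list[Message]) -> int:
    """Count tool results already in the history (assumed: one per tool call)."""
    return sum(1 for m in messages if m.role == "tool")


def before_model_notice(messages: list[Message], max_tool_calls: int) -> list[Message]:
    """Return a copy of the history with a remaining-budget notice appended."""
    tool_calls_left = max(max_tool_calls - count_tool_calls(messages), 0)
    notice = Message(
        role="human",
        content=f"You are left with {tool_calls_left} tool calls, plan accordingly",
    )
    # The input list is left untouched; the caller decides how to merge the notice.
    return [*messages, notice]
```

In a real middleware, the same computation would presumably live inside the `@before_model` hook and return the notice as a state update rather than a new list.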

Alternatives Considered

The proposed solution requires no additional configuration through constructor parameters, and the remaining-tool-calls message is sent after each iteration. We can make a few enhancements to it:

  1. Introduce new attributes on the `ToolCallLimitMiddleware` class that let developers configure the behaviour of the remaining-tool-calls message sent to the LLM.
class ToolCallLimitMiddleware(BaseCallbackHandler):
    def __init__(
        self,
        max_tool_calls: int = 50,
        proactive: bool = True,  # New flag
        warning_threshold: int = 5  # Warn when N calls left
    ):
        self.max_tool_calls = max_tool_calls
        self.proactive = proactive
        self.warning_threshold = warning_threshold
  2. The remaining-tool-calls message is sent as a `HumanMessage`. If this is not the right message type, we can consider `ToolMessage`.
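
As a rough illustration of how the `proactive` and `warning_threshold` options could gate the notice, here is a small standalone sketch. The function name `remaining_calls_notice` and its `calls_made` parameter are hypothetical. Reading "warn when N calls left" as "warn once N or fewer calls remain" is also an assumption, since the proposal does not pin this down.

```python
from typing import Optional


def remaining_calls_notice(
    calls_made: int,
    max_tool_calls: int = 50,
    proactive: bool = True,
    warning_threshold: int = 5,
) -> Optional[str]:
    """Return the notice text, or None when no notice should be sent."""
    if not proactive:
        # Proactive mode is off: keep the current reactive behaviour.
        return None
    tool_calls_left = max(max_tool_calls - calls_made, 0)
    if tool_calls_left > warning_threshold:
        # Assumed semantics: stay silent until the budget is nearly spent.
        return None
    return f"You are left with {tool_calls_left} tool calls, plan accordingly"
```

Setting `warning_threshold` equal to `max_tool_calls` would reproduce the base proposal, where the notice is sent before every model call.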

I’ve also created a GitHub issue for this.

this is a nice and useful feature :slight_smile: