Hey folks, I’d be eager to better unit test tools but the current experience is not super great:
How to get typing when importing a tool? The `invoke` function is untyped, which makes it hard to discover the args it expects
Is it possible to tie some metadata to a tool directly? This would be also useful for other use cases. For instance, I’d like to treat some tools as “final”, meaning that when they are called, the agent should not call a tool afterwards. More broadly it could be interesting to tag them, or even attach a nice “print” function to a tool to optimize how it displays its results
I’d like to avoid exporting separately the unwrapped function, as I think a tool should be tested as a tool
Example code I’d like to improve:
class TestDocumentsTools(unittest.TestCase):
    def test_read_document_file_text_content(self):
        filepath = path.join(assets_dir, "./foo.odt")
        # could be easier to call the wrapped function rather than the tool here
        content = read_document_file_text_content.invoke(filepath)
        self.assertTrue(content.find("foo") > -1)
        self.assertFalse(content.find("<") > -1)

    def test_copyfile_dst_does_not_exist(self):
        not_exist_dir = path.join(assets_dir, "does_not_exist")
        # not typed: flaky and hard to figure
        # I would often pass args in order rather than using a dict
        copy_res = copy_file.invoke(
            {"filepath": "./foo.py", "new_directory_or_filepath": not_exist_dir})
        self.assertEqual(copy_res, "Destination does not exist")
Typed invoke: Not possible today; the decorator erases types. Use tool.func(arg1, arg2) instead to call the original function with positional args.
Metadata: Already built in as BaseTool fields: return_direct=True for “final” tools, tags=[...] for tagging, metadata={...} for arbitrary data. return_direct is settable via @tool(return_direct=True); as far as I can tell the decorator does not take tags or metadata, so pass those to StructuredTool.from_function(...) or assign them on the tool instance.
Test without exporting the raw function: Use tool.func(...) directly in tests. It’s the original unwrapped callable, no dict needed.
1. Typing for invoke
This is a fundamental limitation of the @tool decorator’s current design. The decorator’s overloads all return BaseTool (an erased type), and BaseTool.invoke is declared roughly as:
def invoke(self, input: str | dict | ToolCall, config: RunnableConfig | None = None, **kwargs: Any) -> Any:
Both input: str | dict | ToolCall and -> Any are generic: nothing from the original function signature survives in them. There is no generic StructuredTool[InputModel, ReturnType] that would flow the original types through.
Your practical options today:
Option A: Use tool.func directly in tests. StructuredTool stores the original callable as .func:
func: Callable[..., Any] | None = None
"""The function to run when the tool is called."""
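So copy_file.func("./foo.py", not_exist_dir) lets you pass positional args, because you’re calling the original function. Note that the attribute itself is annotated Callable[..., Any] | None (see above), so a static checker will not validate the arguments through .func; the type hints live on the function you defined. The test suite itself does this (line 2239 in test_tools.py). The downside is that it bypasses tool-layer machinery (callbacks, error handling), which may or may not matter in unit tests.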
Option B: Cast at import. If you want the tool’s attributes (such as .func) to be discoverable in your IDE, cast the decorated tool to StructuredTool at definition:
from typing import cast
from langchain_core.tools import StructuredTool, tool

@tool
def copy_file(filepath: str, new_directory_or_filepath: str) -> str:
    """..."""
    ...

copy_file = cast(StructuredTool, copy_file)
# Now at least your IDE knows it's a StructuredTool and you can inspect .func
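Option C: Use args_schema explicitly with a typed Pydantic model, then build the input from that model and pass its dict form to invoke (for example model.model_dump()), since invoke only accepts str, dict, or ToolCall. invoke still returns Any, but the schema and field names are fully discoverable.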
The “proper” fix would require the @tool decorator to return a generic StructuredTool[ArgsModel, ReturnType] that propagates through invoke. That’s a deeper type system change not currently present in the codebase.
2. Metadata / tagging tools
Good news: BaseTool already has most of what you want:
return_direct: bool = False
"""Whether to return the tool's output directly.
Setting this to `True` means that after the tool is called, the `AgentExecutor` will
stop looping.
"""
...
tags: list[str] | None = None
"""Optional list of tags associated with the tool.
...
"""
metadata: dict[str, Any] | None = None
"""Optional metadata associated with the tool.
...
"""
...
extras: dict[str, Any] | None = None
"""Optional provider-specific extra fields for the tool.
...
"""
“final” / stop-after-call: return_direct=True is exactly this. You can set it via the decorator: @tool(return_direct=True).
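Tags: tags=["final", "display:table"] is a regular BaseTool field, so it works out of the box when passed to the tool constructor or set on the instance.
Arbitrary metadata: metadata={"is_final": True, "display_fn": my_fn} works the same way, though storing callables in metadata is unconventional.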
Custom print/display: There’s no first-class display_fn field, but you could store it in metadata or, more idiomatically, subclass StructuredTool to add typed fields.
3. Testing the tool as a tool (without exporting the raw function)
The cleanest solution for invoking with positional arguments while staying on the tool is calling .func on the StructuredTool, which gives you the original function without a separate export:
class TestDocumentsTools(unittest.TestCase):
    def test_read_document_file_text_content(self):
        filepath = path.join(assets_dir, "./foo.odt")
        # Call the underlying function with positional args
        content = read_document_file_text_content.func(filepath)
        self.assertTrue(content.find("foo") > -1)

    def test_copyfile_dst_does_not_exist(self):
        not_exist_dir = path.join(assets_dir, "does_not_exist")
        # Positional, no dict needed
        copy_res = copy_file.func("./foo.py", not_exist_dir)
        self.assertEqual(copy_res, "Destination does not exist")
If you want to test the full tool path (callbacks, error handling, validation), keep using invoke with a dict — that’s the intended interface. For pure logic tests, .func(...) is cleaner and the pattern the project itself uses.
Summary table:
| Goal | Current solution |
| --- | --- |
| Positional args (original signature) | tool.func(arg1, arg2) |
| Keyword args via dict | tool.invoke({"key": val}) |
| “Final” / stop-after | @tool(return_direct=True) |
| Tagging | tags=[...] on BaseTool |
| Arbitrary metadata | metadata={...} on BaseTool |
| Typed invoke return | Not available; requires generic StructuredTool[I, O] |
Wow thanks for the detailed answer! Somehow I couldn’t find these from either docs or chat.langchain, this totally solves my questions and will vastly improve my tools setup!!
For `return_direct`, this will cover some use cases, but others are more subtle. In those cases I would add a “maybe_return_direct” tag, for instance, and have the LLM decide whether it should process the ToolMessage or “just” send its result as is. Thanks again!