Tool Use: Function Calling in LLM Applications (2026)
Tool use lets you define functions with JSON schemas that the LLM can call when it needs external data or actions. You define the tool name, description, and parameter schema; the model returns a structured JSON call when it decides to use a tool; you execute the function and return the result. This pattern powers everything from web search to database queries to code execution in LLM applications.
When to Use
- ✓ The LLM needs real-time data it wasn't trained on (current prices, weather, live database records)
- ✓ Tasks require executing code, running calculations, or performing actions with side effects
- ✓ Structured output is required and you want the model to fill in fields from retrieved data
- ✓ Building agents that can take actions in the world (send emails, create tickets, update records)
- ✓ Replacing few-shot prompting for data extraction — tool use produces more reliable structured output
How It Works
1. Define tools as JSON schemas: each tool has a name, description (explain when to use it), and parameters (input_schema with types and descriptions). The description is critical — it guides the model's tool selection.
2. Pass tool definitions to the API with the user message. The model reads the tool descriptions and decides whether to call a tool or respond directly.
3. When the model decides to call a tool, it returns a tool_use block (Anthropic) or a tool_calls entry (OpenAI) with the tool name and JSON arguments. It stops generating and waits for the result.
4. Execute the function with the provided arguments and return the result as a tool_result message. The model continues the conversation, now informed by the tool result.
5. Implement validation on tool inputs — don't blindly execute whatever arguments the model provides. Validate types, ranges, and permissions before calling external systems.
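Step 5 can be sketched as a small server-side check run before dispatching to the real function. The schema format mirrors the input_schema used throughout this page; validate_tool_input and its error strings are illustrative names, not part of any SDK, and a production system might use the jsonschema package instead:

```python
def validate_tool_input(schema: dict, args: dict) -> list[str]:
    """Check model-provided arguments against a tool's input_schema.

    Returns a list of human-readable problems; an empty list means valid.
    Illustrative sketch only -- covers required fields and basic JSON types.
    """
    type_map = {'string': str, 'integer': int, 'number': (int, float),
                'boolean': bool, 'array': list, 'object': dict}
    errors = []
    # Required fields must be present
    for field in schema.get('required', []):
        if field not in args:
            errors.append(f'missing required field: {field}')
    # Every provided field must be declared and have the declared type
    for name, value in args.items():
        spec = schema.get('properties', {}).get(name)
        if spec is None:
            errors.append(f'unexpected field: {name}')
        elif not isinstance(value, type_map[spec['type']]):
            errors.append(f'{name}: expected {spec["type"]}')
    return errors
```

If validation fails, return the error list as the tool_result instead of executing — the model can usually correct its arguments and retry.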
Examples
import anthropic

client = anthropic.Anthropic()

tools = [{
    'name': 'get_stock_price',
    'description': 'Get the current stock price for a ticker symbol. Use when the user asks about stock prices.',
    'input_schema': {
        'type': 'object',
        'properties': {
            'ticker': {
                'type': 'string',
                'description': 'Stock ticker symbol (e.g., AAPL, NVDA)'
            }
        },
        'required': ['ticker']
    }
}]

response = client.messages.create(
    model='claude-3-5-sonnet-20241022',
    max_tokens=1024,
    tools=tools,
    messages=[{'role': 'user', 'content': 'What is Nvidia stock trading at today?'}]
)

# Check if the model wants to use a tool
if response.stop_reason == 'tool_use':
    tool_use = next(b for b in response.content if b.type == 'tool_use')
    print(f'Tool: {tool_use.name}, Args: {tool_use.input}')

# Agent loop: keep calling tools until the model produces a final answer
def run_agent(user_message: str, tools: list, max_turns: int = 10):
    messages = [{'role': 'user', 'content': user_message}]
    for _ in range(max_turns):
        response = client.messages.create(
            model='claude-3-5-sonnet-20241022',
            max_tokens=1024,
            tools=tools,
            messages=messages
        )
        if response.stop_reason == 'end_turn':
            return response.content[0].text
        # Process all tool calls in this response
        tool_results = []
        for block in response.content:
            if block.type == 'tool_use':
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    'type': 'tool_result',
                    'tool_use_id': block.id,
                    'content': str(result)
                })
        messages.extend([
            {'role': 'assistant', 'content': response.content},
            {'role': 'user', 'content': tool_results}
        ])
    return 'Max turns reached'

Common Mistakes
- ✗ Poor tool descriptions — the model selects tools based on descriptions, not names. 'Get data' is useless; 'Retrieve the current inventory count for a product SKU from the warehouse database' is actionable. Write descriptions that tell the model exactly when to use the tool.
- ✗ Not validating tool inputs before execution — LLMs can hallucinate argument values (wrong types, out-of-range numbers, non-existent IDs). Always validate inputs server-side before calling downstream APIs.
- ✗ Exposing too many tools at once — beyond ~20 tools, model accuracy on tool selection degrades. Group related tools, use tool namespacing, or dynamically include only relevant tools based on the user's query.
- ✗ Not handling tool errors gracefully — if a tool call fails, return a clear error message as the tool_result. Don't return empty results or Python tracebacks — the model can recover from meaningful error messages but not from cryptic failures.
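The last point can be sketched as a wrapper around tool execution. execute_tool and safe_execute_tool are illustrative names; the tool_result shape (including is_error) matches the Anthropic format used in the examples above:

```python
def safe_execute_tool(execute_tool, name: str, args: dict, tool_use_id: str) -> dict:
    """Run a tool and always return a well-formed tool_result, even on failure.

    On error, the model receives a short actionable message instead of a
    traceback, so it can correct its arguments or try a different tool.
    """
    try:
        result = execute_tool(name, args)
        return {'type': 'tool_result', 'tool_use_id': tool_use_id,
                'content': str(result)}
    except KeyError as e:
        content = f'Error: unknown tool or missing argument {e}. Check the tool schema and retry.'
    except Exception as e:
        content = f'Error: {type(e).__name__}: {e}. The call failed; adjust the arguments and retry.'
    return {'type': 'tool_result', 'tool_use_id': tool_use_id,
            'content': content, 'is_error': True}
```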
FAQ
What's the difference between tool use and RAG?
RAG retrieves text and puts it in the prompt (read-only, static data). Tool use calls executable functions with side effects (write access, live data, computation). Use RAG for retrieving knowledge from documents; use tool use when you need live data, computation, or actions. Many agents combine both.
How many tools can I give a model?
Models reliably handle 5–15 tools. GPT-4o and Claude handle 20+ but accuracy on tool selection decreases. For systems with many tools (50+), implement tool retrieval: embed tool descriptions and dynamically include the 5-10 most relevant tools for each query.
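The retrieval idea can be sketched as follows. Here a simple token-overlap score stands in for embedding similarity; a production system would embed the descriptions with a real embedding model and rank by cosine similarity:

```python
def retrieve_tools(query: str, tools: list[dict], k: int = 5) -> list[dict]:
    """Return the k tools whose descriptions best match the query.

    Token overlap is a crude stand-in for similarity over embeddings,
    used here only to keep the sketch self-contained.
    """
    query_tokens = set(query.lower().split())

    def score(tool: dict) -> int:
        desc_tokens = set(tool['description'].lower().split())
        return len(query_tokens & desc_tokens)

    # Rank all tools by overlap with the query, keep the top k
    ranked = sorted(tools, key=score, reverse=True)
    return ranked[:k]
```

The selected subset is then passed as the tools parameter for that request, keeping the model's choice space small.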
Can the model call multiple tools in parallel?
Yes — Claude and GPT-4o both support parallel tool calls where the model returns multiple tool_use blocks in a single response. This is called parallel tool calling and is a significant performance optimization. See the parallel-tool-calls skill for details.
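On the execution side, multiple tool_use blocks from one response can also be run concurrently. A sketch with a thread pool follows; execute_tool is an illustrative callable, and the blocks are shown as plain dicts rather than SDK objects:

```python
from concurrent.futures import ThreadPoolExecutor

def run_tools_in_parallel(execute_tool, tool_use_blocks: list[dict]) -> list[dict]:
    """Execute every tool call from one model response concurrently.

    Threads suit I/O-bound tools (HTTP APIs, database queries). Results
    come back in the same order as the blocks, each tagged with its
    tool_use_id so the model can match results to calls.
    """
    def run_one(block: dict) -> dict:
        result = execute_tool(block['name'], block['input'])
        return {'type': 'tool_result', 'tool_use_id': block['id'],
                'content': str(result)}

    with ThreadPoolExecutor(max_workers=len(tool_use_blocks)) as pool:
        return list(pool.map(run_one, tool_use_blocks))
```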
How do I prevent the model from calling tools it shouldn't?
Three layers: (1) Write precise tool descriptions that specify preconditions. (2) Add a tool_choice parameter to force or restrict tool use. (3) Implement application-level guardrails that validate tool calls before execution. For sensitive tools (delete, payment), require explicit user confirmation.
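Layer (3) can be sketched as an application-level gate checked before any execution. The tool names and the confirmation policy here are hypothetical, not from any SDK:

```python
# Hypothetical policy: which tools exist, and which need human sign-off
ALLOWED_TOOLS = {'get_stock_price', 'delete_record', 'send_payment'}
SENSITIVE_TOOLS = {'delete_record', 'send_payment'}

def gate_tool_call(name: str, user_confirmed: bool = False) -> tuple[bool, str]:
    """Decide whether a model-requested tool call may proceed.

    Returns (allowed, reason). Sensitive tools additionally require an
    explicit user confirmation collected by the application.
    """
    if name not in ALLOWED_TOOLS:
        return False, f'tool {name!r} is not on the allowlist'
    if name in SENSITIVE_TOOLS and not user_confirmed:
        return False, f'tool {name!r} requires explicit user confirmation'
    return True, 'ok'
```

When the gate rejects a call, return the reason as the tool_result so the model can explain the situation to the user rather than failing silently.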
What's forced tool use and when should I use it?
Forced tool use (tool_choice={'type': 'any'} in the Anthropic API, tool_choice='required' in OpenAI's) requires the model to call a tool on its next turn; Anthropic's {'type': 'tool', 'name': ...} goes further and forces one specific tool. Use it when you always need structured output — it's more reliable than asking the model to produce JSON because the tool schema enforces the output structure. This is the recommended pattern for structured data extraction.
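With the Anthropic SDK, forcing a specific extraction tool looks like this sketch. The record_person tool is hypothetical; the request is shown as a parameter dict so the shape is visible without an API call:

```python
# Hypothetical extraction tool: the schema IS the output format
extraction_tool = {
    'name': 'record_person',
    'description': 'Record structured facts about a person mentioned in the text.',
    'input_schema': {
        'type': 'object',
        'properties': {
            'name': {'type': 'string', 'description': 'Full name'},
            'role': {'type': 'string', 'description': 'Job title or role'},
        },
        'required': ['name'],
    },
}

# tool_choice of type 'tool' forces exactly this tool on the next turn
request = {
    'model': 'claude-3-5-sonnet-20241022',
    'max_tokens': 1024,
    'tools': [extraction_tool],
    'tool_choice': {'type': 'tool', 'name': 'record_person'},
    'messages': [{'role': 'user',
                  'content': 'Ada Lovelace was the first programmer.'}],
}
# client.messages.create(**request) then always returns a record_person
# tool_use block whose input matches the schema above.
```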