LLM Function Calling: The Complete Guide with Examples
Quick answer: Function calling lets an LLM emit structured JSON that names an external function and its arguments, instead of (or in addition to) generating text. The model decides when to call a function and what arguments to pass; your application executes the call and returns the output, which the model then uses to write its final response. It's the foundation of most modern LLM agents.
How function calling works
The flow is:
- You define tools (functions with JSON schema descriptions)
- You send a user message
- The model decides whether to call a tool (or answer directly)
- If it calls a tool, it returns a structured JSON object with the function name and arguments
- You execute the function and return the result
- The model synthesizes a final response using the tool result
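The flow above can be sketched without any provider SDK. In this minimal simulation, `fake_model` is a hypothetical, hard-coded stand-in for the LLM, `lookup_order` is an invented tool, and the message shapes are simplified assumptions rather than any vendor's wire format.

```python
import json

def lookup_order(order_id: str) -> dict:
    # Hypothetical tool; a real one would query your order system
    return {"order_id": order_id, "status": "shipped"}

TOOLS = {"lookup_order": lookup_order}

def fake_model(messages: list) -> dict:
    """Hard-coded stand-in for an LLM: requests a tool once, then answers."""
    last = messages[-1]
    if last["role"] == "tool":
        data = json.loads(last["content"])
        return {"role": "assistant",
                "content": f"Order {data['order_id']} is {data['status']}."}
    # A structured request: function name plus JSON-encoded arguments
    return {"role": "assistant",
            "tool_call": {"name": "lookup_order",
                          "arguments": json.dumps({"order_id": "A123"})}}

def run(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    reply = fake_model(messages)          # model decides: tool call or direct answer
    while "tool_call" in reply:
        call = reply["tool_call"]
        args = json.loads(call["arguments"])
        result = TOOLS[call["name"]](**args)   # your code executes the function
        messages.append(reply)
        messages.append({"role": "tool", "content": json.dumps(result)})
        reply = fake_model(messages)      # model answers using the tool result
    return reply["content"]

print(run("Where is order A123?"))  # Order A123 is shipped.
```

The point of the sketch is the division of labor: the model only ever produces text or a structured request, and the application owns execution.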
Defining tools with OpenAI
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather conditions for a city. Use when the user asks about weather in a specific location.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'London' or 'New York'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature units"
                }
            },
            "required": ["city"]
        }
    }
}
Key insight: The description fields are critical — they're what the model reads to decide whether to call the function and how to call it. Write descriptions as if explaining to a smart person who doesn't know your system.
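One way to act on this is to check tool definitions for missing descriptions before shipping them. The helper below is a hypothetical sketch (`missing_descriptions` is not part of any SDK) and assumes the OpenAI-style tool layout shown above.

```python
def missing_descriptions(tool: dict) -> list[str]:
    """Return names in an OpenAI-style tool definition that lack a description."""
    fn = tool["function"]
    missing = []
    if not fn.get("description", "").strip():
        missing.append(fn["name"])
    for prop, spec in fn.get("parameters", {}).get("properties", {}).items():
        if not spec.get("description", "").strip():
            missing.append(f"{fn['name']}.{prop}")
    return missing

# Example definition with one undocumented parameter
example_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather conditions for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'London'"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

print(missing_descriptions(example_tool))  # ['get_weather.units']
```

A check like this only catches absent descriptions; whether a description is actually clear still needs testing against real prompts.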
Executing a tool call (OpenAI)
from openai import OpenAI
import json

client = OpenAI()

def get_weather(city: str, units: str = "celsius") -> dict:
    # Your actual weather API call here
    return {"city": city, "temp": 18, "conditions": "partly cloudy", "units": units}

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    tools = [weather_tool]

    # First call - model may call a tool
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )
    message = response.choices[0].message

    # If no tool call, return the direct response
    if not message.tool_calls:
        return message.content

    # Append the assistant message once, then answer every tool call in it:
    # the API expects a tool message for each tool_call_id
    messages.append(message)
    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = get_weather(**args)  # only one tool is defined in this example
        messages.append({
            "role": "tool",
            "content": json.dumps(result),
            "tool_call_id": tool_call.id,
        })

    # Send results back and get final response
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
    )
    return final_response.choices[0].message.content

print(run_agent("What's the weather like in Tokyo?"))
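The example above has a single tool, so it can call `get_weather` directly. With more than one tool you need to route on the function name the model returned. A minimal sketch of that routing follows; `TOOL_REGISTRY`, `dispatch`, and the second tool `get_time` are illustrative names, not part of the OpenAI SDK.

```python
import json

def get_weather(city: str, units: str = "celsius") -> dict:
    return {"city": city, "temp": 18, "conditions": "partly cloudy", "units": units}

def get_time(timezone: str) -> dict:
    # Hypothetical second tool, included only to show dispatch
    return {"timezone": timezone, "time": "12:00"}

# Map the tool names exposed to the model onto Python callables
TOOL_REGISTRY = {"get_weather": get_weather, "get_time": get_time}

def dispatch(name: str, arguments_json: str) -> str:
    """Run the named tool and return a JSON string usable as tool-message content."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        # Report unknown tools back to the model instead of raising
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(func(**json.loads(arguments_json)))

print(dispatch("get_weather", '{"city": "Tokyo"}'))
print(dispatch("delete_everything", "{}"))
```

In the OpenAI loop, this would be called as `dispatch(tool_call.function.name, tool_call.function.arguments)` and its return value used as the `content` of the tool message.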
Parallel tool calls
Many current models can request several tool calls in a single response. The calls are independent of each other, so your code can execute them concurrently:
import asyncio

async def execute_all(calls):
    # execute_tool is a coroutine you define: it runs one tool call and returns its result
    tasks = [execute_tool(call) for call in calls]
    # gather preserves input order, so results[i] belongs to calls[i]
    return await asyncio.gather(*tasks)

# Response may contain multiple tool_calls
if response.choices[0].finish_reason == "tool_calls":
    tool_calls = response.choices[0].message.tool_calls
    # Execute all tool calls concurrently
    results = asyncio.run(execute_all(tool_calls))
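The snippet leaves `execute_tool` undefined. One possible shape for it, assuming the tools are ordinary blocking functions, is to run each in a worker thread with `asyncio.to_thread` (Python 3.9+) and wrap the result as a tool message keyed by `tool_call_id`. The fake tool calls below are stand-ins built with `SimpleNamespace` so the sketch runs without an API key.

```python
import asyncio
import json
from types import SimpleNamespace

def get_weather(city: str, units: str = "celsius") -> dict:
    # Blocking stand-in for a real weather API call
    return {"city": city, "temp": 18, "units": units}

async def execute_tool(call) -> dict:
    """Run one tool call in a worker thread and wrap the result as a tool message."""
    args = json.loads(call.function.arguments)
    result = await asyncio.to_thread(get_weather, **args)
    return {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}

async def execute_all(calls) -> list:
    # gather returns results in the same order as the awaitables passed in
    return await asyncio.gather(*(execute_tool(c) for c in calls))

# Fake tool calls shaped like the SDK objects (id, function.name, function.arguments)
fake_calls = [
    SimpleNamespace(id="call_1", function=SimpleNamespace(
        name="get_weather", arguments='{"city": "Tokyo"}')),
    SimpleNamespace(id="call_2", function=SimpleNamespace(
        name="get_weather", arguments='{"city": "Paris"}')),
]

tool_messages = asyncio.run(execute_all(fake_calls))
print([m["tool_call_id"] for m in tool_messages])  # ['call_1', 'call_2']
```

Keeping each result paired with its `tool_call_id` matters more than the concurrency itself: the follow-up request needs one tool message per id.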
Anthropic tool use syntax
import json
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "search_database",
    "description": "Search the product database for items matching a query",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "default": 10}
        },
        "required": ["query"]
    }
}]

response = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Find me waterproof hiking boots under $200"}]
)

if response.stop_reason == "tool_use":
    # Pick the first tool_use block; the response may also contain text blocks
    tool_use = next(b for b in response.content if b.type == "tool_use")
    # search_database is your own function; tool_use.input is already a dict
    result = search_database(**tool_use.input)

    # Continue conversation with result
    response2 = client.messages.create(
        model="claude-sonnet-4",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "Find me waterproof hiking boots under $200"},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": [{"type": "tool_result", "tool_use_id": tool_use.id, "content": json.dumps(result)}]}
        ]
    )
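The example handles only the first `tool_use` block. A response can contain more than one, and each needs a matching `tool_result` in the next user turn. The sketch below shows one way to build that turn; `build_tool_results` is a hypothetical helper, `search_database` is a stub, and the content blocks are faked with `SimpleNamespace` so it runs offline.

```python
import json
from types import SimpleNamespace

def search_database(query: str, limit: int = 10) -> list:
    # Hypothetical stand-in for the real product search
    return [{"name": f"result for {query}", "rank": i} for i in range(min(limit, 2))]

def build_tool_results(content_blocks) -> list:
    """Return one tool_result block per tool_use block, in order."""
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue  # skip text blocks
        output = search_database(**block.input)
        results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": json.dumps(output),
        })
    return results

# Fake content shaped like response.content: a text block plus two tool_use blocks
fake_content = [
    SimpleNamespace(type="text", text="Let me search."),
    SimpleNamespace(type="tool_use", id="toolu_1", name="search_database",
                    input={"query": "hiking boots"}),
    SimpleNamespace(type="tool_use", id="toolu_2", name="search_database",
                    input={"query": "waterproof boots", "limit": 1}),
]

user_turn = {"role": "user", "content": build_tool_results(fake_content)}
print([b["tool_use_id"] for b in user_turn["content"]])  # ['toolu_1', 'toolu_2']
```

Note that the stub gives `limit` a Python default: a `default` in the JSON schema documents intent but does not guarantee the model will send the field.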
Best practices
- Limit your tool count: Tool selection tends to get less reliable as the tool list grows. As a rule of thumb, expect degradation past roughly 10 tools; start with 3-5 and add more only when needed.
- Write great descriptions: The description determines when and how the model calls your function. Test edge cases.
- Validate outputs: Always validate tool call arguments before executing. Treat LLM-generated JSON as untrusted input.
- Add error handling: Return meaningful error messages from tools — the model uses them to self-correct.
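The validation and error-handling points can be combined in one wrapper. The following is a minimal sketch for the `get_weather` arguments used earlier, written with hand-rolled checks rather than a schema library; `validate_weather_args` and `safe_tool_result` are illustrative names.

```python
import json

ALLOWED_UNITS = {"celsius", "fahrenheit"}

def validate_weather_args(raw_arguments: str) -> dict:
    """Parse and check model-generated arguments; raise ValueError on bad input."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc.msg}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    unknown = set(args) - {"city", "units"}
    if unknown:
        raise ValueError(f"unexpected arguments: {sorted(unknown)}")
    city = args.get("city")
    if not isinstance(city, str) or not city.strip():
        raise ValueError("'city' is required and must be a non-empty string")
    units = args.get("units", "celsius")
    if not isinstance(units, str) or units not in ALLOWED_UNITS:
        raise ValueError(f"'units' must be one of {sorted(ALLOWED_UNITS)}")
    return {"city": city.strip(), "units": units}

def safe_tool_result(raw_arguments: str) -> str:
    """Return tool-message content: the result, or an error the model can act on."""
    try:
        args = validate_weather_args(raw_arguments)
    except ValueError as exc:
        return json.dumps({"error": str(exc)})
    # A real implementation would call the weather API here
    return json.dumps({"city": args["city"], "temp": 18, "units": args["units"]})

print(safe_tool_result('{"city": "Tokyo", "units": "kelvin"}'))
print(safe_tool_result('{"city": "Tokyo"}'))
```

Returning the error as the tool result, rather than raising, keeps the conversation going and gives the model a specific message to correct against.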
For agentic applications, see best LLMs for automation for the top models by function calling reliability.
Methodology
All benchmarks, pricing, and performance figures cited in this article are sourced from publicly available data: provider pricing pages (verified 2026-04-16), LMSYS Chatbot Arena ELO leaderboard, MTEB retrieval benchmark, and independent API tests. Costs are listed as per-million-token input/output unless noted. Rankings reflect the publication date and change as models update.