LLM Function Calling: The Complete Guide with Examples

Quick answer: Function calling lets an LLM output structured JSON that names an external function and its arguments, instead of (or in addition to) generating text. The model decides when to call a function and what arguments to pass; your code executes the function, and the model then synthesizes a final response from the function's output. It underpins most modern LLM agents.

How function calling works

The flow is:

1. You define tools (functions with JSON schema descriptions)
2. You send a user message
3. The model decides whether to call a tool (or answer directly)
4. If it calls a tool, it returns a structured JSON object with the function name and arguments
5. You execute the function and return the result
6. The model synthesizes a final response using the tool result
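
The middle of that flow (the model returns a structured call, and you execute it) can be sketched without any API request. This is a minimal illustration, assuming a tool call arrives as a function name plus a JSON-encoded arguments string; `TOOL_REGISTRY`, `dispatch`, and the hand-written `example_call` payload are hypothetical names for this sketch, not part of any SDK.

```python
import json

def get_weather(city: str, units: str = "celsius") -> dict:
    # Stand-in for a real weather lookup (values are hard-coded for illustration).
    return {"city": city, "temp": 18, "units": units}

# Hypothetical registry: tool name -> Python callable.
TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Run one model-issued tool call and return a JSON string to send back."""
    func = TOOL_REGISTRY.get(tool_call["name"])
    if func is None:
        return json.dumps({"error": f"unknown tool: {tool_call['name']}"})
    # The arguments arrive as a JSON string, not as a dict.
    args = json.loads(tool_call["arguments"])
    return json.dumps(func(**args))

# Illustrative payload written by hand, not captured from a model.
example_call = {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}
tool_output = dispatch(example_call)
print(tool_output)
```

The string returned by `dispatch` is what you would send back to the model as the tool result in the next request.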

Defining tools with OpenAI

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather conditions for a city. Use when the user asks about weather in a specific location.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name, e.g. 'London' or 'New York'"
                },
                "units": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature units"
                }
            },
            "required": ["city"]
        }
    }
}

Key insight: The description fields are critical — they're what the model reads to decide whether to call the function and how to call it. Write descriptions as if explaining to a smart person who doesn't know your system.
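
One way to keep descriptions close to the code they describe is to generate the tool definition from the function itself. The sketch below is an assumption-laden illustration, not an SDK feature: `tool_from_function` and `JSON_TYPES` are hypothetical helpers, the type mapping covers only a few basic annotations, and per-parameter descriptions are passed in by hand.

```python
import inspect

# Assumed mapping from Python annotations to JSON Schema types (not exhaustive).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_from_function(func, param_docs: dict) -> dict:
    """Build an OpenAI-style tool definition from a signature and docstring."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {
            # Unknown or missing annotations fall back to "string".
            "type": JSON_TYPES.get(param.annotation, "string"),
            "description": param_docs.get(name, ""),
        }
        # Parameters without a default value are treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

def get_weather(city: str, units: str = "celsius") -> dict:
    """Get current weather conditions for a city. Use when the user asks about weather in a specific location."""
    return {"city": city, "units": units}

tool = tool_from_function(
    get_weather,
    {"city": "City name, e.g. 'London' or 'New York'", "units": "Temperature units"},
)
```

With this approach the docstring doubles as the tool description, so editing one updates the other.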

Executing a tool call (OpenAI)

from openai import OpenAI
import json

client = OpenAI()

def get_weather(city: str, units: str = "celsius") -> dict:
    # Your actual weather API call here
    return {"city": city, "temp": 18, "conditions": "partly cloudy", "units": units}

def run_agent(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    tools = [weather_tool]
    
    # First call - model may call a tool
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto"
    )
    
    message = response.choices[0].message
    
    # If the model did not request a tool, return its direct answer
    if not message.tool_calls:
        return message.content
    
    # Append the assistant turn, then answer every tool call it contains;
    # the API expects one "tool" message per tool_call_id
    messages.append(message)
    for tool_call in message.tool_calls:
        args = json.loads(tool_call.function.arguments)
        result = get_weather(**args)
        messages.append({
            "role": "tool",
            "content": json.dumps(result),
            "tool_call_id": tool_call.id
        })
    
    final_response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    return final_response.choices[0].message.content

print(run_agent("What's the weather like in Tokyo?"))

Parallel tool calls

Many current models can request several tool calls in a single response. Running them concurrently is up to your code:

import asyncio

async def execute_all(calls):
    # execute_tool is assumed to be your own async function that runs one tool call
    tasks = [execute_tool(call) for call in calls]
    # gather() runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*tasks)

# A single response may contain several tool_calls
message = response.choices[0].message
if message.tool_calls:
    results = asyncio.run(execute_all(message.tool_calls))
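
The snippet above leaves `execute_tool` undefined. A minimal self-contained sketch of one possible version follows, assuming your tools are ordinary synchronous functions that can run in worker threads. The `TOOLS` registry, `get_time`, and the hand-built `fake_calls` objects are hypothetical stand-ins that mimic the shape of the entries in `message.tool_calls`.

```python
import asyncio
import json
from types import SimpleNamespace

def get_weather(city: str) -> dict:
    return {"city": city, "temp": 18}  # hard-coded for illustration

def get_time(city: str) -> dict:
    return {"city": city, "time": "09:00"}  # hard-coded for illustration

# Hypothetical registry: tool name -> synchronous Python callable.
TOOLS = {"get_weather": get_weather, "get_time": get_time}

async def execute_tool(call) -> dict:
    """Run one tool call in a worker thread and build its tool message."""
    func = TOOLS[call.function.name]
    args = json.loads(call.function.arguments)
    result = await asyncio.to_thread(func, **args)  # requires Python 3.9+
    return {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}

async def execute_all(calls) -> list:
    # gather() preserves input order, so results line up with the calls.
    return await asyncio.gather(*(execute_tool(c) for c in calls))

# Hand-built stand-ins for the objects found in message.tool_calls.
fake_calls = [
    SimpleNamespace(id="call_1", function=SimpleNamespace(
        name="get_weather", arguments='{"city": "Tokyo"}')),
    SimpleNamespace(id="call_2", function=SimpleNamespace(
        name="get_time", arguments='{"city": "Tokyo"}')),
]
tool_messages = asyncio.run(execute_all(fake_calls))
```

Each returned dict carries the `tool_call_id` of the call it answers, so the list can be appended to the conversation as is.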

Anthropic tool use syntax

import anthropic
import json

client = anthropic.Anthropic()

tools = [{
    "name": "search_database",
    "description": "Search the product database for items matching a query",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "default": 10}
        },
        "required": ["query"]
    }
}]

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # substitute any current Claude model ID
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Find me waterproof hiking boots under $200"}]
)

if response.stop_reason == "tool_use":
    # Pick the first tool_use block; a response can also contain text blocks
    tool_use = next(b for b in response.content if b.type == "tool_use")
    # search_database is your own function; tool_use.input is already a dict
    result = search_database(**tool_use.input)
    
    # Continue the conversation: replay the assistant turn, then send the tool result
    response2 = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "Find me waterproof hiking boots under $200"},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": [{"type": "tool_result", "tool_use_id": tool_use.id, "content": json.dumps(result)}]}
        ]
    )
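
The two providers describe tools with the same JSON Schema but wrap it differently: OpenAI nests it under `function.parameters`, while Anthropic uses a flat object with `input_schema`. If you maintain tools in the OpenAI shape, a small converter along these lines may save duplication. `openai_tool_to_anthropic` is a hypothetical helper written for this sketch, not part of either SDK.

```python
def openai_tool_to_anthropic(tool: dict) -> dict:
    """Convert an OpenAI-style tool definition to the Anthropic tool shape."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        # The JSON Schema itself is reused unchanged under a different key.
        "input_schema": fn["parameters"],
    }

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather conditions for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

anthropic_tool = openai_tool_to_anthropic(weather_tool)
```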

Best practices

Limit your tool count: Tool selection tends to become less reliable as the tool list grows, and a common rule of thumb is to stay under about 10 tools. Start with 3-5 and add more only when needed.
Write precise descriptions: The description determines when and how the model calls your function. Test edge cases, including requests that should not trigger the tool.
Validate model output: Always validate tool call arguments before executing. Treat LLM-generated JSON as untrusted input.
Add error handling: Return meaningful error messages from tools. The model uses them to self-correct.
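
The last two practices can be combined in a single wrapper. The sketch below is deliberately simplified and assumes only the `required`, `type`, and `enum` keywords need checking; `validate_args`, `safe_call`, and `SCHEMA` are hypothetical names for this example. A full JSON Schema validator library would cover far more cases.

```python
import json

SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

# Simplified mapping from JSON Schema type names to Python types.
PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def validate_args(args: dict, schema: dict) -> list:
    """Return a list of human-readable problems (empty if the args look valid)."""
    props = schema.get("properties", {})
    errors = [f"missing required argument: {name}"
              for name in schema.get("required", []) if name not in args]
    for name, value in args.items():
        if name not in props:
            errors.append(f"unexpected argument: {name}")
            continue
        spec = props[name]
        expected = PY_TYPES.get(spec.get("type"))
        if expected and not isinstance(value, expected):
            errors.append(f"{name} must be of type {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name} must be one of {spec['enum']}")
    return errors

def safe_call(func, raw_arguments: str, schema: dict) -> str:
    """Validate, execute, and always return a JSON string the model can read."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"arguments are not valid JSON: {exc.msg}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    errors = validate_args(args, schema)
    if errors:
        return json.dumps({"error": "; ".join(errors)})
    try:
        return json.dumps(func(**args))
    except Exception as exc:  # report tool failures instead of crashing the loop
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})

def get_weather(city: str, units: str = "celsius") -> dict:
    return {"city": city, "units": units}  # stand-in for a real lookup

bad = safe_call(get_weather, '{"city": "Paris", "units": "kelvin"}', SCHEMA)
good = safe_call(get_weather, '{"city": "Paris"}', SCHEMA)
```

Returning the error as the tool result, rather than raising, gives the model a chance to retry with corrected arguments.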

For agentic applications, see "Best LLMs for automation" for the top models ranked by function calling reliability.

Methodology

All benchmarks, pricing, and performance figures cited in this article are sourced from publicly available data: provider pricing pages (verified 2026-04-16), LMSYS Chatbot Arena ELO leaderboard, MTEB retrieval benchmark, and independent API tests. Costs are listed as per-million-token input/output unless noted. Rankings reflect the publication date and change as models update.
