Parallel Tool Calls: Running Tools Simultaneously (2026)
When an LLM needs data from multiple independent sources, parallel tool calls let it request all of them at once instead of waiting for each sequentially. The model returns multiple tool_use blocks in one response; you execute them concurrently with asyncio.gather() or Promise.all(), then return all results together. For independent lookups, this bounds latency by the slowest single tool rather than the sum of all tools.
When to Use
- ✓ Fetching data from multiple independent APIs (stock price + news + weather)
- ✓ Running multiple database queries simultaneously when results don't depend on each other
- ✓ Executing multiple file reads, web searches, or calculations that are independent
- ✓ Any agent step where the model needs N things that can be gathered simultaneously
- ✓ Reducing latency in data aggregation agents where each data source takes 200-500ms
How It Works
1. When the model decides to call multiple independent tools, it returns multiple tool_use blocks in a single response (Anthropic) or multiple entries in the tool_calls array (OpenAI). Both APIs support this natively.
2. Your orchestration layer detects multiple tool_use blocks in the response and executes them concurrently: asyncio.gather() in Python, Promise.all() in JavaScript.
3. After all tools complete, return all results as separate tool_result messages in a single user turn. The model then processes all results together and either produces a final answer or requests more tools.
4. Not all tool calls can be parallelized: if tool B uses the output of tool A, they must run sequentially. Design your tool call graph upfront to identify which calls are parallel vs. sequential.
5. Prompt the model to use parallel calls when appropriate: 'When you need data from multiple independent sources, request them all at once rather than one at a time.'
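The parallel-vs-sequential split in steps 1-4 can be sketched with local stub tools. All tool names, return values, and latencies below are invented for illustration; real tools would hit actual APIs:

```python
import asyncio
import time

# Stub tools: each simulates a 0.1 s I/O-bound lookup.
async def get_stock_price(ticker: str) -> float:
    await asyncio.sleep(0.1)
    return 172.5

async def get_recent_news(company: str) -> list:
    await asyncio.sleep(0.1)
    return ['headline-1', 'headline-2']

async def summarize(price: float, news: list) -> str:
    # Uses the outputs above, so it cannot join the parallel batch.
    await asyncio.sleep(0.1)
    return f'{price} with {len(news)} stories'

async def answer() -> str:
    # Step 1: independent calls run as one parallel batch (~0.1 s total)
    price, news = await asyncio.gather(
        get_stock_price('NVDA'),
        get_recent_news('Nvidia'),
    )
    # Step 2: the dependent call runs afterwards, sequentially
    return await summarize(price, news)

start = time.monotonic()
print(asyncio.run(answer()))               # two steps of latency, not three
print(f'elapsed: {time.monotonic() - start:.2f}s')
```

The total wall time is roughly two tool latencies (the parallel batch plus the dependent call), not three.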
Examples
import asyncio
import anthropic

client = anthropic.Anthropic()

async def execute_tool(tool_name: str, tool_input: dict):
    # Dispatch to your async tool implementations
    return await your_async_tool_registry[tool_name](**tool_input)

async def run_parallel_tools(tool_uses: list) -> list:
    # Launch every requested tool concurrently and wait for all of them
    tasks = [execute_tool(t.name, t.input) for t in tool_uses]
    results = await asyncio.gather(*tasks)
    # Pair each result with the id of the tool_use block that requested it
    return [
        {
            'type': 'tool_result',
            'tool_use_id': tool_use.id,
            'content': str(result),
        }
        for tool_use, result in zip(tool_uses, results)
    ]

# In your agent loop:
if response.stop_reason == 'tool_use':
    tool_uses = [b for b in response.content if b.type == 'tool_use']
    # Execute ALL tool calls in parallel
    tool_results = asyncio.run(run_parallel_tools(tool_uses))
    messages.append({'role': 'user', 'content': tool_results})

System prompt addition:
'When you need to gather information from multiple independent sources to answer a question, request all tools simultaneously in a single response rather than making sequential requests. For example, if you need stock price, recent news, and analyst rating for a company, request all three tools in one response.'
User: 'What's the investment thesis for Nvidia right now?'
Assistant response (parallel):
[tool_use: get_stock_price(ticker='NVDA')]
[tool_use: get_recent_news(company='Nvidia', days=7)]
[tool_use: get_analyst_ratings(ticker='NVDA')]
Common Mistakes
- ✗ Not executing tool calls concurrently despite receiving parallel requests: if the model returns 3 tool_use blocks but you execute them sequentially, you lose all the latency benefit. Always check for multiple tool_use blocks and execute them with asyncio.gather().
- ✗ Returning tool results sequentially instead of all at once: after executing parallel tools, return ALL results in a single user message. If you return them one by one in separate turns, the model can't correlate them.
- ✗ Parallelizing dependent tool calls: if tool B needs the output of tool A, they cannot run in parallel. Identify dependency chains in your tool graph before parallelizing. Dependent calls must remain sequential.
- ✗ No timeout for parallel tool execution: if one of 5 parallel tools hangs, the entire parallel batch waits. Set per-tool timeouts (5-10 seconds) and return an error result for timed-out tools rather than blocking the whole batch.
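The per-tool timeout advice can be sketched with asyncio.wait_for. The tool names, tool_use ids, and the 0.2 s timeout here are illustrative (a production timeout would be 5-10 s as noted above):

```python
import asyncio

async def call_with_timeout(tool_use_id: str, coro, timeout: float):
    # A hung tool becomes an error tool_result instead of stalling the batch
    try:
        result = await asyncio.wait_for(coro, timeout=timeout)
        return {'type': 'tool_result', 'tool_use_id': tool_use_id,
                'content': str(result)}
    except asyncio.TimeoutError:
        return {'type': 'tool_result', 'tool_use_id': tool_use_id,
                'content': f'Error: tool timed out after {timeout}s',
                'is_error': True}

async def hung_tool():
    await asyncio.sleep(60)  # simulates a backend that never responds
    return 'unreachable'

async def fast_tool():
    await asyncio.sleep(0.05)
    return 'ok'

async def run_batch():
    return await asyncio.gather(
        call_with_timeout('toolu_1', hung_tool(), timeout=0.2),
        call_with_timeout('toolu_2', fast_tool(), timeout=0.2),
    )

results = asyncio.run(run_batch())
# Batch completes in ~0.2 s: one timeout error result, one success
```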
FAQ
Do all models support parallel tool calls?
GPT-4o, GPT-4o-mini, and Claude 3.5+ all support parallel tool calls natively. Claude 3 Haiku and earlier models support it but are less reliable at batching when not explicitly prompted. Gemini 1.5+ supports parallel function calls. Always test your specific model — behavior varies.
How many tools can run in parallel?
There's no API limit on parallel tool calls, but practical limits apply: your backend must handle N concurrent requests, and sending 20 tool results back as 20 separate tool_result blocks in one message is valid but adds token overhead. In practice, 2-5 parallel calls per step is the typical range.
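If the model requests more calls than your backend can comfortably serve, one option is to accept them all but bound concurrency with a semaphore. This is a minimal sketch; the cap of 5 and the fake_tool helper are arbitrary choices for illustration:

```python
import asyncio

async def run_bounded(coros, max_concurrent: int = 5):
    # Accept any number of requested calls, but keep at most
    # max_concurrent in flight against the backend at once.
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(coro):
        async with sem:
            return await coro

    # gather preserves the order of the input coroutines
    return await asyncio.gather(*(bounded(c) for c in coros))

async def fake_tool(i: int) -> int:
    await asyncio.sleep(0.02)
    return i

results = asyncio.run(run_bounded([fake_tool(i) for i in range(20)]))
```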
How do I handle partial failures in parallel tool calls?
Return error tool_results for failed calls and successful results for successful ones. The model will see which tools failed and can decide to retry failed calls, ask for user input, or proceed with partial data. Never let one tool failure block the entire parallel batch.
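A sketch of that pattern using asyncio.gather(return_exceptions=True). The tool_use ids and tool bodies are invented for illustration; is_error is Anthropic's flag for failed tool_results:

```python
import asyncio

async def failing_tool():
    raise RuntimeError('upstream API returned 503')

async def working_tool():
    await asyncio.sleep(0.01)
    return {'price': 172.5}

async def run_batch():
    # return_exceptions=True stops one failure from cancelling the rest
    outcomes = await asyncio.gather(
        failing_tool(), working_tool(), return_exceptions=True
    )
    tool_results = []
    for tool_use_id, outcome in zip(['toolu_1', 'toolu_2'], outcomes):
        if isinstance(outcome, Exception):
            tool_results.append({
                'type': 'tool_result',
                'tool_use_id': tool_use_id,
                'content': f'Error: {outcome}',
                'is_error': True,
            })
        else:
            tool_results.append({
                'type': 'tool_result',
                'tool_use_id': tool_use_id,
                'content': str(outcome),
            })
    return tool_results

results = asyncio.run(run_batch())
# One error result and one success go back in the same user message
```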
Can parallel tool calls include tool calls that depend on each other?
No — tools in the same parallel batch must be independent. For mixed scenarios, use sequential batches: first parallel batch for independent calls, then sequential call(s) that depend on first batch results. The model naturally structures calls this way when it understands dependencies, which is why good tool descriptions matter.
Does parallel tool use affect token costs?
Minimally. Parallel tool calls return all tool_use blocks in one response (same tokens as sequential first call), and all tool_results in one user message (fewer tokens than multiple sequential user messages). The tool definition tokens are fixed regardless of how many calls are made. Net effect: parallel is slightly cheaper due to fewer conversation turns.