AI Function Calling Code: Tool Use Snippets for LLMs
Copy-paste Python code for LLM function calling with OpenAI, Claude, and Gemini. Includes tool definitions, parallel execution, and common tool patterns.
Function calling changed everything about what you can build with LLMs. Before it existed, getting AI to take real actions was a mess of prompt engineering and regex parsing. Now? You define a function schema, and the model tells you exactly what to call with structured arguments. No parsing nightmares, no “I hope the model formatted this correctly.”
I use function calling in almost every AI project now—from simple chatbots that check the weather to complex agents that orchestrate entire workflows. The patterns here are the ones I reach for constantly, refined through building dozens of production features.
Every snippet is copy-paste ready and tested with the latest API versions.
Understanding Function Calling
Function calling (also called “tool use”) lets LLMs request the execution of predefined functions. The model doesn’t actually run code—it requests a function call, you execute it, and you return the result.
The flow works like this:
- You define functions the model can use (name, description, parameters)
- You send a message along with available functions
- The model decides if it needs a function and returns a function call request
- You execute the function with the provided arguments
- You return the result to the model
- The model incorporates the result into its response
When to use function calling:
- Accessing real-time data (weather, stocks, databases)
- Taking actions (sending emails, creating records)
- Structured data extraction
- Building AI agents that interact with systems
For a comparison with the Model Context Protocol approach, see our guide on MCP vs function calling.
Choosing Your Integration Approach
Function calling isn’t the only way to connect LLMs to external systems. Here’s when to use each approach:
| Approach | Best For | Trade-offs |
|---|---|---|
| Function calling | Structured tool use within a conversation | Vendor-specific, tightly coupled to your code |
| MCP (Model Context Protocol) | Reusable tools across applications, enterprise deployments | More infrastructure, separate server process |
| Output parsing | Simple extractions, when you just need structured data | Fragile, model-dependent formatting |
| ReAct agents | Complex reasoning chains, when the model needs to plan | Higher latency, harder to debug |
My recommendation: Start with function calling for its simplicity. Move to MCP when you’re building tools you’ll reuse across projects or need to share across a team. The concepts transfer directly—MCP tools are essentially function definitions with a standard transport layer.
OpenAI Function Calling
OpenAI’s function calling is the most mature implementation. GPT-5 handles complex multi-function scenarios reliably. For the official documentation, see the OpenAI Function Calling Guide.
Understanding the Schema Design
The function schema is the most important part of your implementation. A well-designed schema leads to reliable function calls; a vague schema leads to frustration and hallucinated arguments.
Key principles for schema design:
- Specific descriptions: Don’t write “Gets weather.” Write “Retrieves current weather conditions including temperature, humidity, and conditions for a specified city.” The model uses this text to decide when to call the function.
- Parameter constraints: Use enum for fixed options, minimum/maximum for numbers, and clear descriptions for each parameter. The more constraints you provide, the more reliable the outputs.
- Required vs optional: Only mark parameters as required if they're truly necessary. Optional parameters with good defaults reduce friction.
The JSON Schema format follows the JSON Schema specification. You don’t need to master the full spec, but understanding type, properties, required, enum, and description covers 95% of use cases.
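To make these principles concrete, here is a sketch of a schema that applies them. The search_inventory function is hypothetical, but the structure matches the OpenAI tool format used throughout this guide:

search_inventory_function = {
    "type": "function",
    "function": {
        "name": "search_inventory",
        # A specific description tells the model exactly when this tool applies
        "description": (
            "Search the product inventory by keyword and return matching items "
            "with current stock levels. Use this when the user asks whether a "
            "product is available or how many units are in stock."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "keyword": {
                    "type": "string",
                    "description": "Product name or partial name, e.g. 'wireless mouse'"
                },
                "category": {
                    "type": "string",
                    "enum": ["electronics", "furniture", "clothing"],
                    "description": "Optional category filter"
                },
                "max_results": {
                    "type": "integer",
                    "minimum": 1,
                    "maximum": 25,
                    "description": "Maximum number of items to return (defaults to 10)"
                }
            },
            # Only keyword is truly necessary; the rest have sensible defaults
            "required": ["keyword"]
        }
    }
}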
Basic Function Definition
Functions are defined as JSON schemas:
from openai import OpenAI
client = OpenAI()
# Define a weather function
weather_function = {
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location. Use this when the user asks about weather conditions.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and state/country, e.g., 'San Francisco, CA' or 'London, UK'"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit preference"
}
},
"required": ["location"]
}
}
}
# Define multiple functions
tools = [
weather_function,
{
"type": "function",
"function": {
"name": "search_web",
"description": "Search the web for current information",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query"
}
},
"required": ["query"]
}
}
}
]
Single Function Call
import json
def get_weather(location: str, unit: str = "celsius") -> dict:
"""Your actual weather API implementation."""
# This would call a real weather API
return {
"location": location,
"temperature": 22 if unit == "celsius" else 72,
"unit": unit,
"conditions": "sunny"
}
def run_conversation(user_message: str):
"""Complete function calling flow."""
messages = [{"role": "user", "content": user_message}]
# First API call - may request function
response = client.chat.completions.create(
model="gpt-5-turbo",
messages=messages,
tools=tools
)
message = response.choices[0].message
# Check if model wants to call a function
if message.tool_calls:
# Add the assistant's response to messages
messages.append(message)
# Execute each function call
for tool_call in message.tool_calls:
function_name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)
# Route to appropriate function
if function_name == "get_weather":
result = get_weather(**arguments)
elif function_name == "search_web":
result = search_web(**arguments)
else:
result = {"error": f"Unknown function: {function_name}"}
# Add function result to messages
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": json.dumps(result)
})
# Second API call - get final response
final_response = client.chat.completions.create(
model="gpt-5-turbo",
messages=messages,
tools=tools
)
return final_response.choices[0].message.content
# No function call needed
return message.content
# Usage
response = run_conversation("What's the weather like in Tokyo?")
print(response)
Parallel Function Calling
GPT-5 can request multiple functions simultaneously:
def run_parallel_functions(user_message: str):
"""Handle parallel function calls."""
messages = [{"role": "user", "content": user_message}]
response = client.chat.completions.create(
model="gpt-5-turbo",
messages=messages,
tools=tools,
parallel_tool_calls=True # Enabled by default
)
message = response.choices[0].message
if message.tool_calls:
messages.append(message)
# Execute all function calls (potentially in parallel)
import concurrent.futures
def execute_tool(tool_call):
function_name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)
if function_name == "get_weather":
return tool_call.id, get_weather(**arguments)
elif function_name == "search_web":
return tool_call.id, search_web(**arguments)
return tool_call.id, {"error": "Unknown function"}
# Execute in parallel
with concurrent.futures.ThreadPoolExecutor() as executor:
results = list(executor.map(execute_tool, message.tool_calls))
# Add all results
for tool_call_id, result in results:
messages.append({
"role": "tool",
"tool_call_id": tool_call_id,
"content": json.dumps(result)
})
# Get final response
final = client.chat.completions.create(
model="gpt-5-turbo",
messages=messages,
tools=tools
)
return final.choices[0].message.content
return message.content
# Example: "What's the weather in Tokyo and New York?"
# Model will call get_weather twice in parallel
Forcing Function Use
Sometimes you want to guarantee the model calls a specific function:
# Force any function call
response = client.chat.completions.create(
model="gpt-5-turbo",
messages=messages,
tools=tools,
tool_choice="required" # Must call at least one function
)
# Force a specific function
response = client.chat.completions.create(
model="gpt-5-turbo",
messages=messages,
tools=tools,
tool_choice={
"type": "function",
"function": {"name": "get_weather"}
}
)
# Disable function calling for this request
response = client.chat.completions.create(
model="gpt-5-turbo",
messages=messages,
tools=tools,
tool_choice="none"
)
For more OpenAI patterns, see our OpenAI API tutorial.
Claude Tool Use
Anthropic’s Claude calls these “tools” rather than functions, but the concept is identical. Claude 4 has excellent tool use capabilities. Check the official Anthropic Tool Use documentation for the latest features.
Tool Definition
from anthropic import Anthropic
client = Anthropic()
tools = [
{
"name": "get_weather",
"description": "Get the current weather for a location. Call this when users ask about weather.",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and country, e.g., 'Tokyo, Japan'"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["location"]
}
},
{
"name": "search_database",
"description": "Search the product database",
"input_schema": {
"type": "object",
"properties": {
"query": {"type": "string"},
"category": {"type": "string"},
"max_results": {"type": "integer", "default": 10}
},
"required": ["query"]
}
}
]
Single Tool Use
def run_claude_tools(user_message: str):
"""Claude tool use flow."""
messages = [{"role": "user", "content": user_message}]
response = client.messages.create(
model="claude-4-sonnet",
max_tokens=1024,
tools=tools,
messages=messages
)
# Check stop reason
if response.stop_reason == "tool_use":
# Extract tool use blocks
tool_uses = [
block for block in response.content
if block.type == "tool_use"
]
# Add assistant response
messages.append({"role": "assistant", "content": response.content})
# Execute tools and collect results
tool_results = []
for tool_use in tool_uses:
result = execute_tool(tool_use.name, tool_use.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": tool_use.id,
"content": json.dumps(result)
})
# Add tool results
messages.append({"role": "user", "content": tool_results})
# Get final response
final = client.messages.create(
model="claude-4-sonnet",
max_tokens=1024,
tools=tools,
messages=messages
)
return extract_text(final.content)
return extract_text(response.content)
def execute_tool(name: str, inputs: dict):
"""Execute a tool by name."""
if name == "get_weather":
return get_weather(**inputs)
elif name == "search_database":
return search_database(**inputs)
return {"error": f"Unknown tool: {name}"}
def extract_text(content) -> str:
"""Extract text from Claude response content."""
return "".join(
block.text for block in content
if hasattr(block, "text")
)
Multi-Turn Tool Use
Claude can use tools across multiple conversation turns:
def chat_with_tools(conversation: list):
"""Multi-turn conversation with tools."""
while True:
response = client.messages.create(
model="claude-4-sonnet",
max_tokens=1024,
tools=tools,
messages=conversation
)
# Add response to conversation
conversation.append({
"role": "assistant",
"content": response.content
})
# If not a tool use, return the response
if response.stop_reason != "tool_use":
return extract_text(response.content)
# Execute all tools
tool_results = []
for block in response.content:
if block.type == "tool_use":
result = execute_tool(block.name, block.input)
tool_results.append({
"type": "tool_result",
"tool_use_id": block.id,
"content": json.dumps(result)
})
# Add results for next iteration
conversation.append({
"role": "user",
"content": tool_results
})
For more Claude patterns, see our Claude API tutorial.
Gemini Function Calling
Google’s Gemini also supports function calling with a similar pattern:
import google.generativeai as genai
from google.generativeai.types import FunctionDeclaration, Tool
genai.configure(api_key="your-api-key")
# Define functions
get_weather_func = FunctionDeclaration(
name="get_weather",
description="Get the weather for a location",
parameters={
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name"
}
},
"required": ["location"]
}
)
# Create tool with functions
weather_tool = Tool(function_declarations=[get_weather_func])
# Create model with tools
model = genai.GenerativeModel(
"gemini-3-pro",
tools=[weather_tool]
)
def run_gemini_functions(prompt: str):
"""Gemini function calling flow."""
chat = model.start_chat()
response = chat.send_message(prompt)
# Check for function calls
for part in response.parts:
if hasattr(part, "function_call"):
func_call = part.function_call
# Execute function
result = execute_function(func_call.name, dict(func_call.args))
# Send result back
response = chat.send_message(
genai.protos.Content(
parts=[genai.protos.Part(
function_response=genai.protos.FunctionResponse(
name=func_call.name,
response={"result": result}
)
)]
)
)
return response.text
Common Tool Patterns
Here are reusable tool patterns for common use cases.
Web Search Tool
def create_search_tool():
"""Web search tool using a search API."""
return {
"type": "function",
"function": {
"name": "search_web",
"description": "Search the web for current information. Use for facts, news, or anything needing up-to-date data.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query"
},
"num_results": {
"type": "integer",
"description": "Number of results (1-10)",
"default": 5
}
},
"required": ["query"]
}
}
}
def search_web(query: str, num_results: int = 5) -> list:
"""Execute web search."""
# Use your preferred search API (Serp, Tavily, etc.)
import requests
response = requests.get(
"https://api.search.example/search",
params={"q": query, "num": num_results},
headers={"Authorization": f"Bearer {API_KEY}"}
)
return response.json()["results"]
Database Query Tool
def create_database_tool():
"""SQL query tool with safety constraints."""
return {
"type": "function",
"function": {
"name": "query_database",
"description": "Execute a read-only SQL query. Only SELECT queries allowed.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "SQL SELECT query"
},
"limit": {
"type": "integer",
"description": "Max rows to return",
"default": 100
}
},
"required": ["query"]
}
}
}
def query_database(query: str, limit: int = 100) -> dict:
"""Execute safe database query."""
# Safety: Only allow SELECT
query_upper = query.strip().upper()
if not query_upper.startswith("SELECT"):
return {"error": "Only SELECT queries allowed"}
# Prevent dangerous operations
dangerous = ["DROP", "DELETE", "UPDATE", "INSERT", "ALTER", "TRUNCATE"]
if any(kw in query_upper for kw in dangerous):
return {"error": "Query contains forbidden keywords"}
# Add LIMIT if not present
if "LIMIT" not in query_upper:
query = f"{query} LIMIT {limit}"
# Execute (use your database connection)
import sqlite3
conn = sqlite3.connect("database.db")
cursor = conn.cursor()
cursor.execute(query)
columns = [desc[0] for desc in cursor.description]
rows = cursor.fetchall()
return {
"columns": columns,
"rows": rows,
"row_count": len(rows)
}
API Call Tool
def create_api_tool():
"""Generic HTTP API call tool."""
return {
"type": "function",
"function": {
"name": "call_api",
"description": "Make an HTTP request to an API endpoint",
"parameters": {
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "API endpoint URL"
},
"method": {
"type": "string",
"enum": ["GET", "POST"],
"description": "HTTP method"
},
"body": {
"type": "object",
"description": "Request body for POST requests"
}
},
"required": ["url", "method"]
}
}
}
def call_api(url: str, method: str, body: dict = None) -> dict:
"""Execute API call with safety checks."""
import requests
# Whitelist allowed domains
allowed_domains = ["api.example.com", "api.trusted.io"]
from urllib.parse import urlparse
domain = urlparse(url).netloc
if domain not in allowed_domains:
return {"error": f"Domain not allowed: {domain}"}
try:
if method == "GET":
response = requests.get(url, timeout=10)
else:
response = requests.post(url, json=body, timeout=10)
return {
"status_code": response.status_code,
"data": response.json()
}
except Exception as e:
return {"error": str(e)}
File Operations Tool
def create_file_tool():
"""Safe file reading tool."""
return {
"type": "function",
"function": {
"name": "read_file",
"description": "Read contents of a file",
"parameters": {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "File path relative to workspace"
}
},
"required": ["path"]
}
}
}
def read_file(path: str) -> dict:
"""Read file with safety constraints."""
import os
# Prevent directory traversal
if ".." in path or path.startswith("/"):
return {"error": "Invalid path"}
# Restrict to workspace directory
workspace = "/app/workspace"
full_path = os.path.join(workspace, path)
# Ensure path is within workspace
if not os.path.abspath(full_path).startswith(workspace):
return {"error": "Path outside workspace"}
if not os.path.exists(full_path):
return {"error": "File not found"}
try:
with open(full_path, "r") as f:
content = f.read()
return {"content": content, "size": len(content)}
except Exception as e:
return {"error": str(e)}
Parallel and Multi-Tool Patterns
Tool Chaining
When one tool’s output feeds another:
def chained_tool_flow(user_message: str):
"""Handle tools that depend on each other."""
messages = [{"role": "user", "content": user_message}]
max_iterations = 5
iteration = 0
while iteration < max_iterations:
iteration += 1
response = client.chat.completions.create(
model="gpt-5-turbo",
messages=messages,
tools=tools
)
message = response.choices[0].message
# No more tool calls - return final answer
if not message.tool_calls:
return message.content
messages.append(message)
# Execute all tool calls
for tool_call in message.tool_calls:
result = execute_tool(
tool_call.function.name,
json.loads(tool_call.function.arguments)
)
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": json.dumps(result)
})
return "Max iterations reached"
Error Handling in Tool Use
Robust error handling for function calling:
def safe_tool_execution(tool_call) -> dict:
"""Execute tool with comprehensive error handling."""
function_name = tool_call.function.name
# Parse arguments safely
try:
arguments = json.loads(tool_call.function.arguments)
except json.JSONDecodeError as e:
return {
"error": "Invalid JSON arguments",
"details": str(e)
}
# Validate required parameters
tool_schema = get_tool_schema(function_name)
required = tool_schema.get("required", [])
missing = [p for p in required if p not in arguments]
if missing:
return {"error": f"Missing required parameters: {missing}"}
# Execute with timeout
import concurrent.futures
try:
with concurrent.futures.ThreadPoolExecutor() as executor:
future = executor.submit(
execute_tool, function_name, arguments
)
result = future.result(timeout=30)
return result
except concurrent.futures.TimeoutError:
return {"error": "Tool execution timed out"}
except Exception as e:
return {"error": f"Execution failed: {str(e)}"}
Retry Patterns with Exponential Backoff
When tools fail, you don’t always want to give up immediately. Here’s a retry pattern I’ve found reliable in production:
import time
import random
def execute_with_retry(tool_call, max_retries: int = 3) -> dict:
"""Retry failed tools with exponential backoff."""
base_delay = 1.0 # Initial delay in seconds
for attempt in range(max_retries + 1):
result = safe_tool_execution(tool_call)
# Success - return immediately
if "error" not in result:
return result
# Check if error is retryable
error = result.get("error", "")
retryable_errors = [
"rate limit",
"timeout",
"connection",
"temporary",
"503",
"429"
]
is_retryable = any(err in error.lower() for err in retryable_errors)
if not is_retryable or attempt == max_retries:
return result
# Calculate delay with jitter
delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
print(f"Retry {attempt + 1}/{max_retries} after {delay:.2f}s: {error}")
time.sleep(delay)
return result
Graceful Degradation Strategies
Sometimes the best response to a tool failure is to gracefully fall back. Here’s a pattern that keeps your agent running even when tools fail:
def execute_with_fallback(tool_call, fallback_tools: dict) -> dict:
"""Try primary tool, fall back to alternatives on failure."""
primary_result = safe_tool_execution(tool_call)
if "error" not in primary_result:
return primary_result
# Check for fallback
function_name = tool_call.function.name
fallback_fn = fallback_tools.get(function_name)
if fallback_fn:
try:
arguments = json.loads(tool_call.function.arguments)
fallback_result = fallback_fn(**arguments)
return {
"result": fallback_result,
"source": "fallback",
"primary_error": primary_result["error"]
}
except Exception as e:
pass # Fall through to return error
# Return original error with context
return {
**primary_result,
"fallback_attempted": fallback_fn is not None
}
# Example fallback configuration
fallback_tools = {
"get_weather": lambda location, **kwargs: {
"location": location,
"status": "Weather service unavailable",
"suggestion": "Try again in a few minutes"
},
"search_web": lambda query, **kwargs: {
"query": query,
"results": [],
"message": "Search temporarily unavailable"
}
}
Logging and Debugging Best Practices
I can’t stress this enough—good logging saves hours of debugging. Here’s a structured logging pattern that’s made my life easier:
import logging
import uuid
from datetime import datetime
# Configure structured logging
logger = logging.getLogger("function_calling")
def log_tool_execution(tool_call, result, execution_time: float):
"""Log tool execution with structured data."""
log_data = {
"trace_id": str(uuid.uuid4()),
"timestamp": datetime.utcnow().isoformat(),
"function": tool_call.function.name,
"execution_time_ms": round(execution_time * 1000, 2),
"success": "error" not in result,
"error": result.get("error") if "error" in result else None
}
# Log at appropriate level
if log_data["success"]:
logger.info("Tool execution completed", extra=log_data)
else:
logger.error("Tool execution failed", extra=log_data)
return log_data
def traced_tool_execution(tool_call) -> tuple[dict, dict]:
"""Execute tool with full tracing."""
start_time = time.time()
result = safe_tool_execution(tool_call)
execution_time = time.time() - start_time
log_data = log_tool_execution(tool_call, result, execution_time)
return result, log_data
Streaming with Function Calls
Streaming responses while handling function calls is one of the trickier patterns to get right. The challenge? Function call arguments arrive in chunks, so you need to accumulate them before executing. Here’s how to handle it elegantly.
Server-Sent Events (SSE) with Tools
This pattern lets you stream text responses while still supporting function calls—the best of both worlds:
import json
from openai import OpenAI
client = OpenAI()
def stream_with_functions(user_message: str, tools: list):
"""Stream responses, handling function calls mid-stream."""
messages = [{"role": "user", "content": user_message}]
response = client.chat.completions.create(
model="gpt-5-turbo",
messages=messages,
tools=tools,
stream=True
)
# Accumulators for streamed content
content_chunks = []
tool_calls_data = {} # id -> {name, arguments}
for chunk in response:
delta = chunk.choices[0].delta
# Handle text content
if delta.content:
content_chunks.append(delta.content)
yield {"type": "content", "data": delta.content}
# Handle tool calls (streamed in chunks)
if delta.tool_calls:
for tc in delta.tool_calls:
# Only the first chunk of each tool call carries an id; later chunks fall back to the most recent one
tc_id = tc.id or list(tool_calls_data.keys())[-1]
if tc_id not in tool_calls_data:
tool_calls_data[tc_id] = {
"id": tc_id,
"name": tc.function.name if tc.function else None,
"arguments": ""
}
if tc.function:
if tc.function.name:
tool_calls_data[tc_id]["name"] = tc.function.name
if tc.function.arguments:
tool_calls_data[tc_id]["arguments"] += tc.function.arguments
# If we accumulated tool calls, execute them
if tool_calls_data:
for tc_id, tc_data in tool_calls_data.items():
yield {"type": "tool_start", "name": tc_data["name"]}
try:
args = json.loads(tc_data["arguments"])
result = execute_tool(tc_data["name"], args)
yield {"type": "tool_result", "name": tc_data["name"], "result": result}
except Exception as e:
yield {"type": "tool_error", "name": tc_data["name"], "error": str(e)}
UI Feedback Patterns
When streaming, users need visual feedback about what’s happening. Here’s a pattern I use for building responsive chat UIs:
import asyncio
from typing import AsyncGenerator
async def stream_with_ui_feedback(
user_message: str,
tools: list
) -> AsyncGenerator[dict, None]:
"""Provide rich UI feedback during streaming."""
# Signal start
yield {"type": "status", "message": "Thinking..."}
messages = [{"role": "user", "content": user_message}]
current_tool = None
# async_stream_with_functions is assumed to be an async variant of stream_with_functions above
async for event in async_stream_with_functions(messages, tools):
match event["type"]:
case "content":
# Switch status if needed
if current_tool:
yield {"type": "status", "message": "Responding..."}
current_tool = None
yield event
case "tool_start":
current_tool = event["name"]
# Friendly tool names for UI
friendly_names = {
"get_weather": "Checking weather",
"search_web": "Searching the web",
"query_database": "Looking up information"
}
status = friendly_names.get(current_tool, f"Using {current_tool}")
yield {"type": "status", "message": f"{status}..."}
yield {"type": "tool_indicator", "name": current_tool, "state": "running"}
case "tool_result":
yield {"type": "tool_indicator", "name": event["name"], "state": "complete"}
case "tool_error":
yield {"type": "tool_indicator", "name": event["name"], "state": "error"}
yield {"type": "status", "message": "Encountered an issue, continuing..."}
yield {"type": "status", "message": "Complete"}
Partial Response Handling
One thing that bit me early on: users can disconnect mid-stream, and you need to handle that gracefully:
class StreamingSession:
"""Manage a streaming session with cancellation support."""
def __init__(self):
self.cancelled = False
self.accumulated_content = ""
self.completed_tool_calls = []
def cancel(self):
"""Cancel the streaming session."""
self.cancelled = True
async def stream(self, messages: list, tools: list):
"""Stream with cancellation support."""
response = await client.chat.completions.create(
model="gpt-5-turbo",
messages=messages,
tools=tools,
stream=True
)
async for chunk in response:
if self.cancelled:
# Clean up and return partial results
yield {
"type": "cancelled",
"partial_content": self.accumulated_content,
"completed_tools": self.completed_tool_calls
}
return
delta = chunk.choices[0].delta
if delta.content:
self.accumulated_content += delta.content
yield {"type": "content", "data": delta.content}
# Stream completed normally
yield {"type": "complete"}
Function Calling vs MCP: When to Use Which
I get asked this question constantly. Both function calling and the Model Context Protocol (MCP) let LLMs interact with external systems, but they solve different problems.
The Core Difference
Function calling is request-scoped: You define tools fresh for each API call. The model sees them, potentially uses them, and then they’re gone.
MCP is connection-scoped: Tools persist across a session. You set up a connection once, and the model can use those tools throughout an entire conversation—or across multiple conversations.
Here’s how I think about it:
Function Calling = Building blocks, assembled per request
MCP = Infrastructure, persistent and shared
When to Use Function Calling
Stick with function calling when:
- You're building a single application. If your tools only serve one codebase, function calling keeps things simple. No separate server to run.
- Tools change based on context. Maybe you show different functions to different users, or adjust available tools based on conversation state. Function calling gives you that per-request flexibility.
- You want minimal infrastructure. Function calling is just JSON in your API call. No additional processes, no connection management, no new dependencies.
- You're prototyping. For quick experiments, function calling gets you from idea to working demo faster.
When to Switch to MCP
Consider MCP (see our MCP guide) when:
- You're sharing tools across applications. Build a database tool once, use it in Claude Desktop, your web app, and your CLI tool. That's MCP's sweet spot.
- Enterprise deployment. Teams need centralized tool management, access control, and audit logging? MCP provides the infrastructure for that.
- Complex tool ecosystems. When you have 20+ tools that need versioning, monitoring, and consistent behavior across contexts.
- Long-running sessions. Heavy tools that benefit from persistent connections and cached state.
Performance Considerations
I’ve benchmarked both approaches, and here’s what I found:
| Aspect | Function Calling | MCP |
|---|---|---|
| Latency (first call) | Lower—no connection setup | Higher—requires handshake |
| Latency (subsequent) | Same each call | Faster—connection reuse |
| Token cost | Higher—tools sent every request | Lower—tools defined once |
| Cold start | Instant | Varies by server |
For most applications with fewer than 10 tools, function calling wins on simplicity and matches MCP on performance. Beyond that, MCP's token savings add up.
Migration Path
Here’s the good news: migrating from function calling to MCP isn’t a rewrite. The tool definitions are nearly identical. You’re essentially moving your function schemas to a separate process and adding transport.
# Function calling tool
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {...}
}
}
# MCP tool (almost identical)
{
"name": "get_weather",
"description": "Get weather for a location",
"inputSchema": {...} # Same as parameters
}
I usually recommend: start with function calling, build your tools, prove they work, then migrate to MCP when you need the scaling benefits.
For a deeper comparison, check out our MCP vs function calling guide.
Performance and Cost Optimization
Function calling adds overhead to your API calls. Here’s how to manage it.
Token Costs
Every function definition consumes tokens. A typical function with description and parameters uses 100-200 tokens. With 10 functions defined, you’re adding 1,000-2,000 tokens to every request before any content.
Optimization strategies:
- Only include functions relevant to the current context
- Use shorter, more precise descriptions (still clear, but concise)
- Consider separate “tool sets” for different conversation types
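One way to apply the first and last strategies above is to keep named tool sets and choose one per request. A minimal sketch, reusing definitions from earlier sections (the conversation types are illustrative):

# Hypothetical tool sets grouped by conversation type
TOOL_SETS = {
    "support": [weather_function],
    "research": [create_search_tool()],
    "analytics": [create_database_tool()],
}

def tools_for_request(conversation_type: str) -> list:
    """Return only the tool definitions relevant to this conversation type."""
    return TOOL_SETS.get(conversation_type, [])

messages = [{"role": "user", "content": "Will it rain in Tokyo tomorrow?"}]
response = client.chat.completions.create(
    model="gpt-5-turbo",
    messages=messages,
    tools=tools_for_request("support")  # only these definitions count against the token budget
)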
Latency Patterns
Function calling adds two sources of latency:
- Decision time: The model deciding whether to call a function (~50-200ms additional)
- Round trips: Each function call requires a new API call
For time-sensitive applications, consider:
- Parallel function execution (shown in earlier examples)
- Caching function results when inputs repeat
- Pre-warming common function calls based on user context
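For the caching suggestion, functools.lru_cache works when a tool is a pure lookup with hashable arguments. A minimal sketch wrapping the get_weather function from earlier:

from functools import lru_cache

@lru_cache(maxsize=256)
def _get_weather_cached(location: str, unit: str) -> tuple:
    """Cached lookup; repeated calls with the same arguments skip the underlying call."""
    result = get_weather(location, unit)
    # lru_cache needs hashable return values, so freeze the dict into sorted items
    return tuple(sorted(result.items()))

def get_weather_fast(location: str, unit: str = "celsius") -> dict:
    """Dict-returning entry point to route weather tool calls through."""
    return dict(_get_weather_cached(location, unit))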
Troubleshooting Common Issues
Function Not Being Called
What you see: You expect the model to call a function, but it responds with text instead.
Why it happens: The model decided your query didn’t need the function, or the function description didn’t clearly match the task.
How to fix it:
- Make your function description more specific: instead of “Get weather,” use “Get the current weather conditions for a location. Call this whenever the user asks about weather.”
- Use tool_choice="required" to force function usage
- Check that required parameters aren't too restrictive
Invalid JSON in Arguments
What you see: json.JSONDecodeError when parsing function arguments.
Why it happens: The model occasionally produces malformed JSON, especially with complex nested parameters.
How to fix it:
- Always wrap JSON parsing in try/except (as shown in error handling section)
- Simplify parameter schemas—flatten nested objects when possible
- Use Pydantic for validation after parsing
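For the Pydantic suggestion, validating right after parsing catches bad arguments before they reach your function. A minimal sketch using Pydantic v2, mirroring the get_weather schema from earlier:

from typing import Literal
from pydantic import BaseModel, ValidationError

class WeatherArgs(BaseModel):
    location: str
    unit: Literal["celsius", "fahrenheit"] = "celsius"

def parse_weather_args(raw_arguments: str) -> dict:
    """Parse and validate raw tool-call arguments, returning an error dict on failure."""
    try:
        args = WeatherArgs.model_validate_json(raw_arguments)
        return args.model_dump()
    except ValidationError as e:
        return {"error": "Invalid arguments", "details": e.errors()}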
Model Returns Wrong Function
What you see: The model calls search_database when you expected search_web.
Why it happens: Function descriptions overlap or are ambiguous.
How to fix it:
- Make function purposes mutually exclusive in descriptions
- Add negative examples: “Use for web search. Do NOT use for database queries.”
- Reduce the number of similar functions
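For example, descriptions like the pair below are much harder to confuse than two that both just say "Search for information" (the wording is illustrative):

# Disambiguated descriptions: each says what the tool is for and what it is not for
search_web_description = (
    "Search the public web for current news and general facts. "
    "Do NOT use this for questions about our own product catalog."
)
search_database_description = (
    "Search the internal product database by keyword. "
    "Do NOT use this for general web or news queries."
)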
Frequently Asked Questions
What’s the difference between function calling and MCP?
Function calling is request-scoped—you define tools for each API request. MCP is connection-scoped—tools persist across a session. Function calling is simpler; MCP is more powerful for complex agent architectures.
Can the model call functions that don’t exist?
Yes, and this is a real issue called “hallucinated function calls.” Always validate that the function name exists before executing. The patterns above include this check.
How do I limit which functions the model can use?
Only include the functions you want available in the tools parameter. You can also use tool_choice to force or prevent specific function calls.
Are function call responses counted in token limits?
Yes, both the tool definitions and results count toward your context window. Large tool schemas or verbose results can significantly impact token usage.
How should I handle sensitive operations?
Implement human-in-the-loop for sensitive tools. Show the user what the model wants to do and wait for confirmation before executing.
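A minimal sketch of that confirmation gate is shown below; the sensitive tool names and the console prompt are placeholders for whatever approval flow your application uses.

SENSITIVE_TOOLS = {"send_email", "delete_record", "call_api"}  # hypothetical example names

def execute_with_confirmation(tool_call) -> dict:
    """Ask a human before running sensitive tools; run everything else directly."""
    name = tool_call.function.name
    arguments = json.loads(tool_call.function.arguments)
    if name in SENSITIVE_TOOLS:
        print(f"Model wants to call {name} with {arguments}")
        if input("Approve? [y/N] ").strip().lower() != "y":
            return {"error": "Action rejected by user"}
    return execute_tool(name, arguments)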
Can I use function calling with streaming?
Yes, all providers support streaming with function calling. Tool call arguments are streamed in chunks that you accumulate before parsing.
How many functions can I define at once?
There’s no hard limit, but practical constraints matter. More than 15-20 functions significantly increases token usage and can confuse the model. Group related functions logically and only include those relevant to the current task.
Do I need to define functions every request?
Yes, function definitions are sent with each API call. They’re not persisted on the server. This is actually beneficial—you can dynamically adjust available tools per request.
Can functions call other functions?
The model can request multiple function calls in sequence (tool chaining), where the output of one informs the next. The model handles the orchestration; you just execute whatever it requests.
What happens if my function takes too long?
The API call remains open while you execute the function. Implement timeouts (30 seconds is reasonable) and return error objects if functions fail. The model can adapt its response based on the error.
Wrapping Up
Function calling transforms LLMs from text generators to capable agents. The patterns here give you:
- OpenAI: Basic, parallel, and forced function calling
- Claude: Tool definitions and multi-turn use
- Gemini: Function declarations and execution
- Common tools: Web search, database, API, file operations
- Safety: Input validation and timeout handling
These are the building blocks for AI applications that do real work in the world. Start with the basic patterns, add safety checks, and build up to complex multi-tool agents.
For a complete agent implementation, see our guide on building your first AI agent.