8  Function calling

Whereas RAG is mostly passive (in the sense that the LLM has little to no control over the query process), function calling is an active method in which the LLM can decide, at any point in a conversation, to invoke a helper function that provides some specific functionality. The most common use case is to implement dynamic querying of a private API.

For example, suppose you’re building a bot for a delivery service. Implemented by hand, this would require some elaborate prompting: first, you’d need to instruct the bot to ask the user for the necessary data (such as the package ID). Then, you’d need a way to produce a well-formatted API call, invoke your API, and inject the resulting data into a prompt template to produce the final answer. Even this might not be enough: getting the right answer might require more than one API call, with some back-and-forth between bot and user to narrow down the precise information the user needs.

This back-and-forth between bot, user, and API is so common that it makes sense to abstract it into a design pattern. This is what function calling is meant to support. Instead of manually crafting detailed prompts with a description of your API, and implementing the whole back-and-forth conversation workflow, most LLM providers already support function calling as an explicit feature.

How function calling works

First, you define a set of “functions”, which can be anything from real code functions (e.g., Python methods) to API calls. It doesn’t matter what the underlying implementation is, as the LLM will never directly interact with the function; it will just tell you when and how it should be invoked.

For that reason, you need to provide the LLM with a natural language description as well as a structured definition of the arguments of every function. This is usually all encapsulated in a standardized JSON schema, such as the following:

{
    "functions": [
        {
            "name": "get_user_info",
            "description": "get information about what a user has bought.",
            "arguments": [
                {
                    "name": "user_id",
                    "description": "The unique user identifier",
                    "type": "string",
                    "mandatory": true,
                }
            ]
        },
        {
            "name": "get_item_info",
            "description": "get information about an item's status and location.",
            "arguments": [
                {
                    "name": "item_id",
                    "description": "The unique item identifier",
                    "type": "string",
                    "mandatory": true,
                }
            ]
        }
    ]
}

Then, at inference time, a special system prompt instructs the LLM to either respond as normal or produce a function call. The function call is a special type of structured response in which the LLM provides just a JSON object with the identifier of the function to call and the values for all mandatory arguments. An oversimplified example might be:

The following is a set of API functions you can invoke to obtain
relevant information to answer a user query.

{functions}

Given the following user query, determine whether an API call is appropriate.
If any arguments are missing from the conversation, ask the user.
If all arguments are available, output your response in JSON
format with the corresponding function call.
Otherwise, answer the user in natural language.

{query}

Given a prompt like the above, a well-tuned LLM should be able to determine whether a specific query requires an API call or not. The developer must capture these function-calling replies and, instead of outputting them to the user, call the appropriate function and inject the result back into the LLM context. Then, the LLM will produce an appropriate natural language response.
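
To make this concrete, here is a minimal sketch of that loop in Python, assuming the LLM replies either with plain text or with a bare JSON object like the one described in the prompt above. The function registry, the llm_chat parameter, and the message format are illustrative placeholders, not any particular provider’s API.

import json

# Hypothetical registry mapping function names (as declared in the JSON schema)
# to their real implementations.
FUNCTIONS = {
    "get_user_info": lambda user_id: {"purchases": ["item-1", "item-2", "item-3"]},
    "get_item_info": lambda item_id: {"status": "in transit", "location": "hub 7"},
}

def try_parse_function_call(reply):
    """Return the parsed call if the reply is a function-call JSON object, else None."""
    try:
        call = json.loads(reply)
    except json.JSONDecodeError:
        return None
    return call if isinstance(call, dict) and "function" in call else None

def answer(messages, llm_chat):
    """Drive the conversation until the LLM produces a natural language reply.

    `llm_chat` stands in for whatever function sends `messages` to your model
    and returns the assistant's text.
    """
    while True:
        reply = llm_chat(messages)
        call = try_parse_function_call(reply)
        if call is None:
            return reply  # plain natural language: show it to the user
        # Intercept the call, run the real function, and feed the result back
        # into the context as a "tool" message before asking the LLM again.
        result = FUNCTIONS[call["function"]](**call["arguments"])
        messages.append({"role": "assistant", "content": reply})
        messages.append({
            "role": "tool",
            "content": json.dumps({"function": call["function"], "result": result}),
        })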

An example of a possible conversation in this fictional setting would be as follows.

First, the user asks for some specific information.

USER: Hey, please show me my latest purchases.

Given this query, and an appropriate prompt like the one shown above, the LLM might recognize that it needs to call the get_user_info function, but that it’s missing the user_id argument.

ASSISTANT: Sure, I will need your user ID for that.

The user replies back.

USER: Of course, my user ID is 12345.

Since the LLM receives the whole conversation history, the second time it’s called it will recognize that it has all the required arguments, and produce a function call.

ASSISTANT: {"function": "get_user_info", "arguments": {"user_id": {"12345"}}}

This time, instead of showing this message to the user, the developer intercepts the function call, invokes the API, and injects the return value, presumably a list of purchases, back into the conversation.

TOOL: {"function": "get_user_info", "result": [ ... ]}

Given this new information, the LLM can now answer back to the user.

ASSISTANT: You have bought 3 items in the last month...

This process can occur as many times as necessary in a conversation. With a suitable prompt, the LLM can even detect when some argument value is missing and produce the corresponding natural language question for the user. This way, we can naturally weave a conversation in which the user supplies the necessary arguments for a given function call in any order. The LLM can also call more than one function in the same conversation, giving a lot more flexibility than a rigid RAG cycle.

Use cases for function calling

Function calling is particularly useful for integrating an LLM with an external tool that can be consumed as an API. A typical use case (which we will see in Chapter 16) is building a shopping assistant for an online store that can suggest products, add or remove items from the shopping cart, provide information on delivery status, etc.

An interesting trick is to use function calling for structured generation. When you want an LLM to produce a JSON-formatted output, it’s typically hard to guarantee you always get the exact schema you need–except maybe when using the best models. However, even some of the smaller models, once fine-tuned for function calling, are extremely robust in generating the exact argument names and types for any function. Thus, if you can frame your generation prompt as an API function call, you get all this robustness for free.
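
As a rough sketch of this trick, assuming an OpenAI-style client, you can declare a single “function” whose arguments are exactly the fields you want, and force the model to call it. The extract_invoice schema below is purely illustrative.

import json
from openai import OpenAI

client = OpenAI()

# A "function" nobody will ever execute: its arguments ARE the structured output.
extract_invoice = {
    "type": "function",
    "function": {
        "name": "extract_invoice",
        "description": "Record the fields extracted from an invoice.",
        "parameters": {
            "type": "object",
            "properties": {
                "customer": {"type": "string"},
                "total": {"type": "number"},
                "currency": {"type": "string"},
            },
            "required": ["customer", "total", "currency"],
        },
    },
}

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any function-calling-capable model
    messages=[{"role": "user", "content": "Invoice: ACME Corp owes 1,250.00 EUR."}],
    tools=[extract_invoice],
    # Force the model to "call" this function rather than answer in free text.
    tool_choice={"type": "function", "function": {"name": "extract_invoice"}},
)

arguments = response.choices[0].message.tool_calls[0].function.arguments
data = json.loads(arguments)  # e.g. {"customer": "ACME Corp", "total": 1250.0, ...}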

But the possibilities don’t end here. Whatever service you can encapsulate behind a reasonably well-structured and fine-grained API, you can now put an LLM in front of it and make the API queryable in natural language. Here are some typical examples:

  • Customer support: Integrate an LLM with a company’s knowledge base, product information, and customer data to create an intelligent virtual agent for customer support. The LLM can handle common queries, provide product recommendations, look up order status, and escalate complex issues to human agents.

  • Information systems: Connect an LLM to a query API that provides real-time information about some specific domain, from weather to stocks. Use it for internal tools connected to a company dashboard and integrate a conversational-style interface with a traditional graphical user interface.

  • Workflow automation: Connect an LLM to APIs for various business tools like CRM, project management, HR systems etc. Allow users to automate common workflows by querying the LLM in natural language, e.g. “Create a new Salesforce lead for this email”, “Schedule a meeting with the team next week”, “Approve this time off request”.

  • Collaborative writing: Integrate an LLM with document editing and collaboration tools to assist with writing tasks. The LLM can help brainstorm ideas, provide feedback on tone and structure, check for grammar and spelling, and even generate content based on prompts. We will see an example of this use case in Chapter 19.

  • Software development: When combined with the powerful code generation skills of language models, another possibility opens up: connecting an LLM to code repositories, documentation, and APIs to create an AI programming assistant. Developers can ask the LLM to explain code, debug issues, suggest improvements, and even generate new code based on high-level requirements. We will see an example of this use case in Chapter 20.

The key is to identify areas where humans currently interact with APIs and information systems, and see how an LLM can make those interactions more natural, efficient and productive.

Some caveats and limitations

As usual with LLMs, there are significant caveats and limitations to any integration. Although in general you can mitigate hallucinations considerably, the LLM can still hallucinate a wrong function call by, e.g., passing the wrong arguments. In the simplest cases, you can catch the error when arguments have the wrong type or are out of range. However, subtle hallucinations might result in a function call that succeeds but wasn’t the user’s intention.
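
A first line of defense is to validate the generated arguments against the same schema you gave the model before executing anything. The sketch below uses the jsonschema library and a hypothetical argument schema derived from the earlier get_user_info example.

import json
from jsonschema import validate, ValidationError

# Hypothetical JSON Schema for the arguments of get_user_info.
GET_USER_INFO_ARGS = {
    "type": "object",
    "properties": {
        # Example of an extra constraint: enforce a five-digit ID format.
        "user_id": {"type": "string", "pattern": "^[0-9]{5}$"},
    },
    "required": ["user_id"],
    "additionalProperties": False,
}

def safe_arguments(raw_json):
    """Parse and validate LLM-generated arguments before calling the real API."""
    args = json.loads(raw_json)
    try:
        validate(instance=args, schema=GET_USER_INFO_ARGS)
    except ValidationError as e:
        return None, f"Invalid arguments: {e.message}"
    return args, None

Of course, validation of this kind only catches the simple errors; a call with well-formed but wrong arguments will still slip through.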

For this reason, in critical systems it is crucial that you don’t blindly call an API on behalf of the user, especially when doing so can have irreversible effects. For example, in a banking app, your LLM might hallucinate an incorrect destination in a transfer, effectively sending the user’s money to an arbitrary third party. Furthermore, attackers might find a way to manipulate your prompt and trigger exactly this kind of mistake.

In these cases, you should always make the user explicitly trigger the final action, and make sure they have reviewed and understood its implications. This enhances reliability at a small cost in convenience, turning the LLM into an assistant that fills in the data for you but doesn’t click the red button.
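
One way to implement this, sketched below with hypothetical helper hooks, is to tag certain functions as irreversible and route their calls through an explicit confirmation step instead of executing them directly.

# Functions whose effects cannot be undone; never execute these automatically.
IRREVERSIBLE = {"transfer_money", "close_account"}

def handle_function_call(call, execute, ask_user):
    """Run safe calls directly; ask the user to confirm irreversible ones.

    `execute` runs the real API call and `ask_user` shows a message and returns
    True only if the user explicitly confirms. Both are hypothetical hooks.
    """
    name, args = call["function"], call["arguments"]
    if name in IRREVERSIBLE:
        summary = f"About to run {name} with {args}. Proceed?"
        if not ask_user(summary):
            return {"function": name, "result": "cancelled by user"}
    return {"function": name, "result": execute(name, args)}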

Another possible source of concern is when the LLM hallucinates the response, even though it made the right call and received the right data. This is the same problem we had with RAG: even if the context contains the right answer, there is no guarantee the LLM will use it. One easy fix in many cases is to display the raw function result next to the LLM’s interpretation, so the user can double-check the response.

One final caveat that may be relevant in many cases concerns privacy. If you are interacting with a private API–say, a banking app–through a commercial LLM, you are effectively sending your users’ information to OpenAI (or any other provider) as part of the prompts, and this may include user IDs, addresses, financial details, etc. This underscores the need for powerful open source LLMs that companies can self-host for added privacy and security.

Conclusions

Function calling can be seen as both a special case and a generalization of retrieval augmented generation. It is a special case because it involves injecting external information in the prompt to enhance the capabilities of an LLM. It is a generalization because you can implement RAG with function calling, simply by encapsulating your search functionality in a function call specification.

This pattern is extremely flexible, but at the same time it’s very repetitive. To make it work, it is crucial to get the prompt right, and since prompts are, in general, not entirely portable across different models, implementing this workflow from scratch every single time is a chore.

For this reason, most LLM services provide a native way to perform function calling, basically abstracting away the fragile prompt engineering component. Moreover, the LLM provider might have fine-tuned their model for a specific function-calling prompt and format. And since most LLM providers implement the OpenAI API specification, porting function calling between providers is much easier.
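
For instance, with an OpenAI-style client the whole prompt-engineering layer reduces to passing the function definitions as a tools parameter and checking whether the reply contains tool calls. The sketch below is illustrative, and the exact field names vary slightly between providers.

from openai import OpenAI

client = OpenAI()  # most providers expose a compatible interface

tools = [{
    "type": "function",
    "function": {
        "name": "get_user_info",
        "description": "Get information about what a user has bought.",
        "parameters": {
            "type": "object",
            "properties": {"user_id": {"type": "string"}},
            "required": ["user_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any function-calling-capable model
    messages=[{"role": "user", "content": "Show me my latest purchases. My ID is 12345."}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:                       # the model decided to call a function
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
else:                                        # or it answered in natural language
    print(message.content)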