Large Language Models (LLMs) exhibit 'intelligent-like' behaviour when responding to a prompt. Different LLMs have been trained for different use cases, and the most recent, such as Google's Gemini, can accept a task type that they use to optimise their approach to the use case. Gemini is a family of LLMs trained on multimodal data. Nevertheless, an LLM is a 'single inference' mechanism: a prompt or query is input and the LLM responds with generated output (usually text). It does not in itself retain context or session history, and this is where agents come in.
An agent has a core loop in which it observes the world state, then decides upon and enacts an action. It then waits for and observes the reaction to that action, and continues until its end goal is achieved.
Within this loop, the agent uses its profile, goals and instructions, along with its memory and planning abilities, to decide upon the next action. This could simply be predicting the next token, or the agent may decide it needs to act on the environment and observe the resulting change. That involves calling a tool, obtaining the response, and adding it to the context. Tools are configured with metadata that describes what the tool does along with its input and output parameters.
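As an illustration, tool metadata is often expressed as an OpenAPI-style function declaration like the one below (a minimal sketch; the `search_flights` name and its parameters are invented for this example). The model never runs this; it only reads the metadata to decide when to call the tool and with which arguments.

```python
# Hypothetical tool declaration in the JSON-schema style many LLM APIs accept.
flight_search_tool = {
    "name": "search_flights",
    "description": "Search for available flights between two airports on a given date.",
    "parameters": {
        "type": "object",
        "properties": {
            "origin": {"type": "string", "description": "IATA code of the departure airport, e.g. LHR"},
            "destination": {"type": "string", "description": "IATA code of the arrival airport, e.g. JFK"},
            "date": {"type": "string", "description": "Departure date in YYYY-MM-DD format"},
        },
        "required": ["origin", "destination", "date"],
    },
}
```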
The latest LLMs are capable of following instruction-based reasoning frameworks such as ReAct, Chain-of-Thought and Tree-of-Thoughts. An agent uses these, and whereas foundational LLMs are unable to interact with the real world, an agent can interact with external data and services. These tools often align with web APIs that change the world state.
The orchestration layer in an agent manages its state (or memory) and maintains its reasoning and planning, using one or more prompt engineering frameworks to guide that reasoning and planning:
ReAct is a prompt engineering framework that provides a thought process strategy for language models to reason and take action on a user query.
Chain-of-Thought is a prompt engineering framework that enables reasoning through intermediate steps.
Tree-of-Thoughts is a prompt engineering framework suited to exploration and strategic planning tasks. Agents can use these approaches to choose the next best action for a given user request; a sketch of a ReAct-style prompt follows.
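To make this concrete, a minimal ReAct-style prompt scaffold might look like the following (a sketch only; the tool names and format are illustrative rather than any particular library's template):

```python
# A bare-bones ReAct prompt template. The model interleaves free-text
# reasoning ("Thought") with structured tool calls ("Action"), and the
# orchestration layer injects each tool result as an "Observation".
REACT_PROMPT = """Answer the question using the tools available.

Tools: flight_search, weather_lookup

Use this format:
Question: the user's question
Thought: reason about what to do next
Action: tool_name[tool input]
Observation: the tool's result
... (Thought/Action/Observation can repeat)
Thought: I now know the answer
Final Answer: the answer to the question

Question: {question}
"""
```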
Tools bridge the agent's internal knowledge to the real world. They allow the agent to interact with APIs, databases, etc.
Typically an agent will use its LLM to determine whether additional information is needed and, if so, select an appropriate tool from its configured set (e.g. a weather API, calculator or database lookup). It will construct a function call with the necessary parameters, invoke the tool, observe the output, perform any additional processing, and finally integrate the retrieved information with its current context to generate a complete response. A sketch of this loop follows.
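The sketch below assumes a hypothetical `call_llm` helper that returns either a final answer or a structured tool call; the message format is invented for illustration:

```python
# Sketch of the observe/decide/act loop. `call_llm` and the tool registry
# are hypothetical stand-ins for whichever model API and tools the agent
# is configured with.
def run_agent(question, tools, call_llm, max_steps=5):
    context = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = call_llm(context)            # model decides: final answer or tool call?
        if reply.get("tool") is None:
            return reply["content"]          # no further information needed
        observation = tools[reply["tool"]](**reply["args"])  # invoke the selected tool
        # Feed the observation back so the next step can reason over it.
        context.append({"role": "assistant", "content": f"Called {reply['tool']}({reply['args']})"})
        context.append({"role": "tool", "content": str(observation)})
    raise RuntimeError("No final answer within max_steps")
```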
Consider a flight booking agent. A user makes a request: 'I want to book a flight to X'. The agent articulates its thoughts on what it should do, which might include 'I should search for flights to X'. As a result it might decide to use its Flight Search Tool. The results of executing this tool can either be presented to the user or used as an intermediate step for further reasoning. Currently there are three types of tools that an agent can use:
Extensions allow the agent to directly call an external API. The agent is taught how to use the API through examples, so that it knows what arguments will successfully call the endpoint. At runtime the agent uses the model and those examples to decide which extension, if any, to use.
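For instance, on Vertex AI a pre-built extension can be attached along these lines (a sketch based on the preview `vertexai` SDK; the API surface is in preview and may have changed, and the project id is a placeholder):

```python
# Sketch: invoking Google's pre-built code-interpreter extension on Vertex AI.
import vertexai
from vertexai.preview.extensions import Extension

vertexai.init(project="my-project", location="us-central1")  # placeholder project
code_interpreter = Extension.from_hub("code_interpreter")
response = code_interpreter.execute(
    operation_id="generate_and_execute",
    operation_params={"query": "find the first 10 prime numbers"},
)
print(response)
```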
Functions are similar to extensions, but the model does not execute the function; it selects one and fills in its arguments as required. The function call is returned to the client, where it can be executed:
We define a function in the usual way (a minimal sketch; `search_flights` and its signature are illustrative, and the docstring and type hints are what the model uses to fill in arguments):
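```python
def search_flights(origin: str, destination: str, date: str) -> dict:
    """Search for available flights.

    Args:
        origin: IATA code of the departure airport, e.g. "LHR".
        destination: IATA code of the arrival airport, e.g. "JFK".
        date: Departure date in YYYY-MM-DD format.
    """
    # Client-side implementation: call your flight-search API of choice here.
    return {"flights": []}  # placeholder result
```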
And then set up the agent along the following lines (a sketch using the google-generativeai SDK; the model name and API-key handling are illustrative):
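```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Passing the Python function as a tool lets the SDK derive the function
# declaration from its signature and docstring.
model = genai.GenerativeModel("gemini-1.5-flash", tools=[search_flights])

chat = model.start_chat()
response = chat.send_message("Find me a flight from London to New York on 2025-03-01")

# The model does not execute the function; it returns the selected function
# and its filled-in arguments for the client to execute.
for part in response.parts:
    if fn := part.function_call:
        print(fn.name, dict(fn.args))
```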
A Data Store is an embedding store for additional data, typically implemented as a vector database. A common example is in the implementation of Retrieval Augmented Generation (RAG) applications.
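As a toy illustration of the idea (the `embed` function below is a stand-in for a real embedding model, and the documents are invented):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy stand-in for a real embedding model (e.g. a hosted text-embedding API)."""
    vec = np.zeros(64)
    for token in text.lower().split():
        vec[hash(token) % 64] += 1.0
    return vec

# A toy in-memory "data store": documents embedded as vectors, retrieved by
# cosine similarity. A real system would use a vector database instead.
documents = [
    "Refunds are processed within 5 days of cancellation.",
    "Flights can be changed up to 24 hours before departure.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q) + 1e-9)
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]

print(retrieve("What is the refund policy?"))
```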
Google shows how RAG can be combined with ReAct: the agent's Thought/Action steps retrieve supporting passages from the data store, and the observations ground the final answer.
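An illustrative trace of this pattern (the question and retrieved snippet are invented for this sketch):

```
Question: What is the refund window for cancelled flights?
Thought: I should look this up in the policy data store.
Action: retrieve["refund window for cancelled flights"]
Observation: "Refunds are processed within 5 days of cancellation."
Thought: The retrieved passage answers the question.
Final Answer: Refunds are processed within 5 days of cancellation.
```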
The following code uses LangChain to link a Google Search tool and the Google Places API. The agent can then be used to answer queries such as "Who did the Texas Longhorns play in football last week? What is the address of the other team's stadium?"
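A sketch of such an agent, assuming SerpAPI and Google Places API keys and the langchain-community, langchain-google-vertexai and langgraph packages (the tool wiring and model name are illustrative):

```python
import os
from langchain_core.tools import tool
from langchain_community.utilities import SerpAPIWrapper
from langchain_community.tools import GooglePlacesTool
from langchain_google_vertexai import ChatVertexAI
from langgraph.prebuilt import create_react_agent

os.environ["SERPAPI_API_KEY"] = "..."   # SerpAPI key for Google Search
os.environ["GPLACES_API_KEY"] = "..."   # Google Places API key

@tool
def search(query: str) -> str:
    """Run a Google Search via SerpAPI."""
    return SerpAPIWrapper().run(query)

@tool
def places(query: str) -> str:
    """Query the Google Places API."""
    return GooglePlacesTool().run(query)

model = ChatVertexAI(model="gemini-1.5-flash")
agent = create_react_agent(model, [search, places])

query = ("Who did the Texas Longhorns play in football last week? "
         "What is the address of the other team's stadium?")
# Stream the agent's intermediate steps (thoughts, tool calls, observations).
for step in agent.stream({"messages": [("human", query)]}, stream_mode="values"):
    step["messages"][-1].pretty_print()
```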
The following shows the actual output from running the code above: