How many times have we dreamed about robots that perform daily tasks just like humans do? The concept of agents turns that once-unrealistic prospect into something concrete. Today, Large Language Models (LLMs) have become the main technology making these agent ideas possible. In this article, we'll explore how to create smart digital assistants using LLMs and other tools. The core idea is to allow the agent not only to answer questions but also to trigger actions without requiring explicit human interaction.
Many such agents are already deployed across the web. While general-purpose LLMs like GPT or LLaMA are designed to provide appropriate textual answers to user queries, more advanced agents can perform specific actions, change variables in their environment, or send commands and queries to other agents, all in real time and on the basis of a single user request. Imagine an assistant that can perform different actions for you without asking many questions, taking care of the most relevant aspects of the request. Typical examples of agent actions include, but are not limited to:
Book a hotel room for your vacation.
Automatically adjust your home's temperature.
Send emails to specific people when information needs to be shared.
Monitor and control conditions in a factory to ensure safety.
Compile a shopping list and order the items through a store's Application Programming Interface (API).
The possibilities are endless, and today platforms exist that let us build agents from scratch or reuse previously deployed ones, assembling the pieces like a puzzle. Examples of such platforms are n8n, Make.com, and AgentGPT. They allow the development of specialized agents that can include many kinds of tools to retrieve information from websites, social media, or internal company databases. In addition, they can make use of LLMs or programmed functions to perform specific behaviors or control specialized devices.
Defining simple agents with Large Language Models
Essentially, these agents listen to your request, figure out what needs to be done, do it, and then check whether it worked, repeating until the task is complete. This workflow is known as “Think, Act, and Observe”. Following this behaviour, a basic agent can be built from three core components: a User Interface, LLMs, and Tools/Actions. Each of these components plays the following role:
User Interface: Provides the communication elements the user needs to enter commands that the agent will handle. Most often this is a chat window where the user can type requests and view the responses generated by the agent. Other kinds of UIs are web-based dashboards, mobile applications, and chatbots.
LLMs: These are the most interesting and fundamental components of current AI agents. LLMs enhance agent capabilities through specialized prompts: the user request passes through different prompts to achieve the desired result. Some prompts split the reasoning behind the request into several steps, indicating which tools the agent should use at each one. Other prompts, and other LLMs, can be specialized in specific topics and used to answer questions on those topics.
Tools and Actions: Beyond generating textual responses, LLM-powered agents often need up-to-date, real-time information. To meet this need, agents can use tools: specialized routines or functions that let them access external resources and perform actions. To work properly with LLMs, such tools, together with their inputs and outputs, have to be declared thoroughly and precisely. In Python, for instance, tools are declared with decorators and well-specified docstrings, as in the sketch below.
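As a concrete illustration, the sketch below declares a tool with LangChain's @tool decorator, which uses the function's docstring as the tool description. The hotel-availability scenario, the function name, and its arguments are invented for this example; a real tool would call an actual booking service instead of returning a stub.

```python
from langchain_core.tools import tool

@tool
def get_room_availability(city: str, check_in: str, check_out: str) -> str:
    """Return available hotel rooms in `city` between `check_in` and `check_out`.

    Args:
        city: Destination city, e.g. "Lisbon".
        check_in: Check-in date in ISO format (YYYY-MM-DD).
        check_out: Check-out date in ISO format (YYYY-MM-DD).
    """
    # A real implementation would query a booking API; this stub keeps the
    # example self-contained.
    return f"3 rooms available in {city} from {check_in} to {check_out}."
```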
Explore the HuggingFace Agents course for a hands-on learning experience with LLM-based agent components. Alex Honchar's "Introduction to LLM Agents" offers comprehensive explanations and practical examples of agent architecture. Additionally, the LangChain Hub provides a searchable repository of efficient and precise prompts for LLMs to achieve desired agent behaviors. These resources offer valuable insights into building smart, specialized agents.
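Putting the three components together, here is a minimal, self-contained sketch of the “Think, Act, and Observe” loop. The call_llm parameter stands in for whichever chat-completion client you use, and the 'FINAL <answer>' versus '<tool_name> <argument>' reply convention is an illustrative assumption, not a standard protocol.

```python
def check_weather(city: str) -> str:
    """Toy tool: return a canned weather report for `city`."""
    return f"It is 21°C and sunny in {city}."

TOOLS = {"check_weather": check_weather}

def run_agent(user_request: str, call_llm, max_steps: int = 5) -> str:
    history = [f"User request: {user_request}"]
    for _ in range(max_steps):
        # Think: ask the LLM for the next action or a final answer.
        prompt = "\n".join(history) + "\nReply with 'FINAL <answer>' or '<tool_name> <argument>':"
        plan = call_llm(prompt)
        if plan.startswith("FINAL"):
            return plan[len("FINAL"):].strip()
        # Act: run the tool the LLM selected, with the argument it produced.
        tool_name, _, argument = plan.partition(" ")
        tool = TOOLS.get(tool_name)
        observation = tool(argument.strip()) if tool else f"Unknown tool: {tool_name}"
        # Observe: append the result so the next "think" step can use it.
        history.append(f"Action: {plan}\nObservation: {observation}")
    return "Stopped: step limit reached."
```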
Intelligent AI agents: beyond simple reasoners
We can go beyond the previous simple architecture and outline more complex, realistic structures for advanced scenarios. By definition, an intelligent agent is a perceptual system capable of interpreting and processing information from its environment and acting logically and rationally on the data it retrieves and processes. We can add sensors and actuators so that the agent can perceive and change that environment. This broader view requires additional elements that allow the agent to behave appropriately and predictably, even for the complex sequences of actions a user query may involve.
A key aspect of this schema is the agent's ability to understand and correctly process the variety of inputs it receives from its context. Here, LLMs are the most suitable tool available for giving agents that versatility. The maturity some models have reached allows a seamless workflow from the beginning to the end of a task, by means of their trained parameters and prompts. An agent that uses LLMs to trigger specific actions typically combines several architectural components to interact with the LLM, integrate with APIs, manage workflows, and ensure secure operation. Building on these ideas, we can extend the LLM-based agent design with the following components:
API Orchestrator: This component connects the agent with services or other agents. The output of the LLMs in the planning stages determines which tools to use, and the API Orchestrator provides the appropriate interfaces (i.e., APIs) to access those tools, which can be external or internal applications. Examples of API orchestration platforms are Amazon API Gateway and FastAPI. This component can itself be composed of an API Gateway (a mediator between the agent and external apps), Service Connectors (to interact with specific apps, databases, or messaging systems), and an API Action Mapping scheme (which maps user intents or LLM results to API calls; see the sketch after this list).
Workflow Manager: Handles complex workflows, managing state and multi-step processes that involve interactions across different systems. An example of an existing technology that can play this role is Temporal. Basic components of this module are a state machine (keeps track of the order and progress of workflows) and a task queue (schedules and queues tasks, e.g., Celery or RabbitMQ).
Security Layer: Ensures secure interactions between users, the LLM, and company systems. For instance, Auth0 can be used for this purpose. Components of this layer can be Authentication and Authorization, Data Encryption, and Audit Logging.
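To make the API Action Mapping idea concrete, here is a minimal sketch that routes an LLM-extracted intent and its entities to a service call. The endpoint URLs, intent names, and payload fields are illustrative assumptions, not real services.

```python
import requests

# Each intent maps to a callable that performs the corresponding API request.
ACTION_MAP = {
    "report_generation": lambda entities: requests.post(
        "https://reports.internal.example.com/generate",
        json={"quarter": entities["quarter"]},
        timeout=30,
    ),
    "thermostat_set": lambda entities: requests.put(
        "https://home.example.com/api/thermostat",
        json={"target_celsius": entities["temperature"]},
        timeout=10,
    ),
}

def dispatch(intent: str, entities: dict):
    """Map an LLM-extracted intent to the corresponding API call."""
    if intent not in ACTION_MAP:
        raise ValueError(f"No API action registered for intent '{intent}'")
    response = ACTION_MAP[intent](entities)
    response.raise_for_status()
    return response.json()
```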
 An example of the complete workflow in this architecture could be as follows:
User Request: A user asks, "Generate a sales report for Q3."
LLM Interaction: The LLM understands the context and converts this into an actionable task, identifying the request as "report generation" with an entity "Q3".
API Orchestrator: The API orchestrator triggers an API call to the company’s report generation system.
Workflow Manager: If this is a multi-step process (e.g., fetching data, processing, sending), the workflow manager coordinates the actions.
Security Layer: Ensures the user has permission to request the report.
Report Delivery: The agent returns the requested report to the user via the UI.
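Under the same assumptions, this end-to-end flow can be sketched as a single handler. Here call_llm, dispatch (the intent-to-API mapping from the earlier sketch), and user_can (the permission check) are placeholders for the LLM client, the API Orchestrator, and the Security Layer respectively.

```python
import json

def handle_request(user_id: str, text: str, call_llm, dispatch, user_can) -> str:
    # LLM Interaction: convert the free-text request into a structured task.
    raw = call_llm(
        "Extract the intent and entities from the request below and answer "
        'with JSON shaped like {"intent": "...", "entities": {...}}.\n'
        f"Request: {text}"
    )
    task = json.loads(raw)

    # Security Layer: verify the user may trigger this action.
    if not user_can(user_id, task["intent"]):
        return "You are not authorized to run this action."

    # API Orchestrator / Workflow Manager: run the mapped API call(s).
    result = dispatch(task["intent"], task["entities"])

    # Report Delivery: hand the result back to the user interface.
    return f"Here is your report: {result}"
```

For the Q3 example, the LLM would ideally return {"intent": "report_generation", "entities": {"quarter": "Q3"}}, which dispatch would route to the company's reporting system.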
This architecture leverages the LLM's natural language understanding, along with APIs and workflows, to perform complex actions within an organization. Another key tool that can be added to this schema is Retrieval Augmented Generation (RAG). RAG can be integrated between the LLM layer and an external knowledge base, improving the agent's ability to generate responses and trigger actions that are more contextually grounded. In the same way, RAG can enhance the language understanding of the LLMs by keeping intent and entity knowledge up to date through real-time information retrieval (e.g., FAQs, policies, related documents). A RAG module can be invoked whenever the agent determines that retrieval is needed at any step of the process, as the sketch below illustrates. Another critical enhancement to this architecture is the use of instruct-tuned LLMs in the planning stages and task-specific fine-tuned LLMs in the act/observe stages.
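As an illustration of where a RAG module fits, the sketch below retrieves the most relevant snippets from a small in-memory knowledge base and prepends them to the prompt. The embedding function, the list-based index, and the dot-product scoring are deliberate simplifications; a production setup would use a proper embedding model and a vector database.

```python
import numpy as np

def retrieve(query_vec: np.ndarray, index: list[tuple[np.ndarray, str]], k: int = 3) -> list[str]:
    """Return the k snippets whose embeddings are most similar to the query."""
    scored = sorted(index, key=lambda item: -float(np.dot(query_vec, item[0])))
    return [text for _, text in scored[:k]]

def answer_with_rag(question: str, embed, call_llm, index) -> str:
    # Retrieval step: ground the prompt in the most relevant knowledge.
    context = "\n".join(retrieve(embed(question), index))
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    # Generation step: the LLM answers with the retrieved context in view.
    return call_llm(prompt)
```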
Final thoughts
AI agents have achieved impressive levels of performance since LLMs were adopted into their infrastructure. Anyone can deliver agents that perform actions in specific contexts simply by providing the right set of resources and clear, precise instructions. The power of LLMs and prompt engineering handles the most complex and specific tasks under the hood. Besides exploring the links provided in this article, another valuable skill for agent development is therefore prompt engineering. The LangChain Hub repository offers many prompts across several domains for shaping new agents with remarkable adaptability to general or specific tasks.
In the field of software development, the creation of agents enables a new programming approach. If we develop simple, high-quality, and well-documented pieces of code, the actions surrounding that code can be performed by agents with the flexibility that prompts and LLMs provide. For instance, we can write the code that generates reports or processes data in a common way across different data pipelines, and delegate to an agent the user-facing interface that determines the scope of information to include in those reports.
The Internet of Things (IoT) is another area that can benefit from this technology. Not only can the automation of control tasks in different environments be enhanced, but established concepts such as dark factories (fully automated manufacturing processes) could gain further momentum with this new approach. The power of prompts opens a wide range of possibilities for IoT and robotics, among many other disciplines. The sky is the limit.
About the Author
Eduardo Xamena is a Data Scientist and Machine Learning Engineer with a strong foundation in software development, specializing in Python, Azure (ADF, AML, Data Lake), Cosmos-Scope, and AWS tools like Sagemaker and S3. He has led impactful projects, including developing machine learning models that improved fraud detection by 20% and building scalable data pipelines for water service providers across Latin America. Eduardo has also enhanced existing solutions, creating new pipelines and visualizations for a Debiased Machine Learning project focused on Net Promoter Score and telemetry data.