AI Agent


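Introduction

People handle complex pattern recognition well, but before reaching a conclusion they often supplement what they already know with tools such as books, Google Search, or a calculator. In the same way, Generative AI models can be trained to use tools to check real-time information or suggest real-world actions. For example, a model can use a database retrieval tool to look up a customer's purchase history and generate tailored shopping recommendations. It can also make various API calls to send an email or carry out a financial transaction on the user's behalf.

To do this, a model needs not only access to external tools but also the ability to plan and execute tasks on its own. When reasoning, logic, and access to external information are connected to a Generative AI model, the result goes beyond a plain generative model and becomes an **agent**: a program that extends the standalone capabilities of a Generative AI model. This whitepaper covers agents and the related concepts in more detail.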

Introduction
Humans are fantastic at messy pattern recognition tasks. However, they often rely on tools
- like books, Google Search, or a calculator - to supplement their prior knowledge before
arriving at a conclusion. Just like humans, Generative AI models can be trained to use tools
to access real-time information or suggest a real-world action. For example, a model can
leverage a database retrieval tool to access specific information, like a customer's purchase
history, so it can generate tailored shopping recommendations. Alternatively, based on a
user's query, a model can make various API calls to send an email response to a colleague
or complete a financial transaction on your behalf. To do so, the model must not only have
access to a set of external tools, it also needs the ability to plan and execute any task in a
self-directed fashion. This combination of reasoning, logic, and access to external information,
all connected to a Generative AI model, invokes the concept of an agent, or a
program that extends beyond the standalone capabilities of a Generative AI model. This
whitepaper dives into all these and associated aspects in more detail.

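What is an agent?

At its most basic, an agent is an application that observes the world and acts on it, using the tools available to it, in order to achieve a goal. Agents are autonomous: given a suitable goal or task, they can operate independently of human intervention. They can also be proactive, reasoning about what to do next to reach the final goal even without explicit instructions.

The notion of an agent in AI is very general and powerful, but this whitepaper focuses on the specific types of agents that Generative AI models can currently build.

To understand how an agent works internally, we introduce the foundational components that drive its behavior, actions, and decision making. The combination of these components can be described as a **cognitive architecture**, and many architectures can be built by mixing and matching components. Focusing on core functionality, an agent's cognitive architecture contains the following three essential components.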

What is an agent?
In its most fundamental form, a Generative AI agent can be defined as an application that
attempts to achieve a goal by observing the world and acting upon it using the tools that it
has at its disposal. Agents are autonomous and can act independently of human intervention,
especially when provided with proper goals or objectives they are meant to achieve. Agents
can also be proactive in their approach to reaching their goals. Even in the absence of
explicit instruction sets from a human, an agent can reason about what it should do next to
achieve its ultimate goal. While the notion of agents in AI is quite general and powerful, this
whitepaper focuses on the specific types of agents that Generative AI models are capable of
building at the time of publication.
In order to understand the inner workings of an agent, let’s first introduce the foundational
components that drive the agent’s behavior, actions, and decision making. The combination
of these components can be described as a cognitive architecture, and there are many
such architectures that can be achieved by the mixing and matching of these components.
Focusing on the core functionalities, there are three essential components in an agent’s
cognitive architecture as shown in Figure 1.

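The model

In the context of an agent, the model is the **language model (LM)** that serves as the central decision maker for agent processes. An agent can use one or several LMs of any size (small or large), provided they can follow reasoning and logic frameworks such as ReAct, Chain-of-Thought, or Tree-of-Thoughts. The model can be general purpose, multimodal, or fine-tuned for the needs of a specific agent architecture.

For best results, use a model that fits the target application and, ideally, has been trained on data signatures associated with the tools you plan to use. In general, though, the model is not trained with the agent's specific configuration (for example, tool choices or the orchestration/reasoning setup). Instead, the model can be further refined for the agent's tasks by giving it examples of the agent using specific tools or reasoning steps in various contexts.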

The model
In the scope of an agent, a model refers to the language model (LM) that will be utilized as
the centralized decision maker for agent processes. The model used by an agent can be one
or multiple LMs of any size (small / large) that are capable of following instruction-based
reasoning and logic frameworks, like ReAct, Chain-of-Thought, or Tree-of-Thoughts. Models
can be general purpose, multimodal or fine-tuned based on the needs of your specific agent
architecture. For best production results, you should leverage a model that best fits your
desired end application and, ideally, has been trained on data signatures associated with the
tools that you plan to use in the cognitive architecture. It’s important to note that the model is
typically not trained with the specific configuration settings (i.e. tool choices, orchestration/
reasoning setup) of the agent. However, it’s possible to further refine the model for the
agent’s tasks by providing it with examples that showcase the agent’s capabilities, including
instances of the agent using specific tools or reasoning steps in various contexts.

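The tools

Foundational language models are impressive at text and image generation, but they cannot interact with the outside world. Tools close this gap, letting agents use external data and services and carry out a wider range of actions than the underlying model could alone. Tools take many forms and vary in complexity, but they typically align with web API methods such as GET, POST, PATCH, and DELETE.

For example, a tool could let an agent update customer information in a database or fetch weather data to inform a travel recommendation. With tools, agents can support more specialized systems such as **retrieval augmented generation (RAG)**, which extends what they can do well beyond the foundational model on its own.

Tool implementation is covered in more detail later; the key point is that tools connect the agent's internal capabilities to the external world and open up a broader range of tasks.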

The tools
Foundational models, despite their impressive text and image generation, remain constrained
by their inability to interact with the outside world. Tools bridge this gap, empowering agents
to interact with external data and services while unlocking a wider range of actions beyond
that of the underlying model alone. Tools can take a variety of forms and have varying
depths of complexity, but typically align with common web API methods like GET, POST,
PATCH, and DELETE. For example, a tool could update customer information in a database
or fetch weather data to influence a travel recommendation that the agent is providing to
the user. With tools, agents can access and process real-world information. This empowers
them to support more specialized systems like retrieval augmented generation (RAG),
which significantly extends an agent’s capabilities beyond what the foundational model can
achieve on its own. We’ll discuss tools in more detail below, but the most important thing
to understand is that tools bridge the gap between the agent’s internal capabilities and the
external world, unlocking a broader range of possibilities.
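
As a rough illustration of the idea that tools typically align with web API methods, the sketch below models a tool as a named wrapper around an HTTP-style operation. The `Tool` class, the in-memory `CUSTOMERS` table, and both handler functions are hypothetical stand-ins invented for this example; no real API or SDK type is involved.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical in-memory table standing in for an external customer database.
CUSTOMERS: Dict[str, Dict[str, Any]] = {"c1": {"name": "Ada", "city": "Austin"}}


@dataclass
class Tool:
    """A named operation the agent may invoke (illustrative, not a real SDK type)."""
    name: str
    method: str        # HTTP-style verb the tool corresponds to
    description: str   # text a model could read when choosing a tool
    handler: Callable[..., Any]

    def __call__(self, **kwargs: Any) -> Any:
        return self.handler(**kwargs)


def get_customer(customer_id: str) -> Dict[str, Any]:
    # GET-like: read a record without modifying state.
    return dict(CUSTOMERS[customer_id])


def update_customer(customer_id: str, **fields: Any) -> Dict[str, Any]:
    # PATCH-like: change only the supplied fields.
    CUSTOMERS[customer_id].update(fields)
    return dict(CUSTOMERS[customer_id])


TOOLS = {
    tool.name: tool
    for tool in (
        Tool("get_customer", "GET", "Look up a customer record.", get_customer),
        Tool("update_customer", "PATCH", "Update fields on a customer record.", update_customer),
    )
}

# The agent would pick a tool by name and supply arguments.
print(TOOLS["update_customer"](customer_id="c1", city="Zurich"))
```

Keeping a description next to each handler reflects the point made above: the model needs something to read in order to decide which tool fits the request.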

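The orchestration layer

The orchestration layer describes the cyclical process in which the agent takes in information, reasons internally, and uses that reasoning to decide its next action. In general, this loop continues until the agent reaches its goal or a stopping point.

The complexity of the orchestration layer varies widely with the agent and the task. Some loops consist of simple calculations and decision rules, while others involve chained logic, additional machine learning algorithms, or probabilistic reasoning techniques.

The orchestration layer repeats information intake, reasoning, planning, and action to move the agent toward its goal. Its implementation is discussed in more detail in the cognitive architecture section.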

The orchestration layer
The orchestration layer describes a cyclical process that governs how the agent takes in
information, performs some internal reasoning, and uses that reasoning to inform its next
action or decision. In general, this loop will continue until an agent has reached its goal or a
stopping point. The complexity of the orchestration layer can vary greatly depending on the
agent and task it’s performing. Some loops can be simple calculations with decision rules,
while others may contain chained logic, involve additional machine learning algorithms, or
implement other probabilistic reasoning techniques. We’ll discuss more about the detailed
implementation of the agent orchestration layers in the cognitive architecture section.
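
The cycle described here can be sketched as a small loop. This is a minimal illustration of the "simple calculations with decision rules" end of the spectrum, not how any particular agent framework implements orchestration; `run_loop`, `decide`, `act`, and the scripted user reply are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]


def run_loop(
    state: State,
    decide: Callable[[State], str],
    act: Callable[[State, str], State],
    is_done: Callable[[State], bool],
    max_steps: int = 10,
) -> Tuple[State, List[str]]:
    """Repeat decide -> act until the goal is met or the step budget runs out."""
    history: List[str] = []
    for _ in range(max_steps):
        if is_done(state):
            break
        action = decide(state)      # internal reasoning (here: a decision rule)
        state = act(state, action)  # acting produces the next state to observe
        history.append(action)
    return state, history


def decide(state: State) -> str:
    # Decision rule: a flight search needs an origin before it can run.
    return "ask_origin" if state.get("origin") is None else "search_flights"


def act(state: State, action: str) -> State:
    new_state = dict(state)
    if action == "ask_origin":
        new_state["origin"] = "Austin"  # scripted user reply (hypothetical)
    elif action == "search_flights":
        new_state["results"] = [f"{new_state['origin']} -> {new_state['destination']}"]
    return new_state


final_state, history = run_loop(
    {"origin": None, "destination": "Zurich"},
    decide,
    act,
    is_done=lambda s: "results" in s,
)
print(history)
```

The `max_steps` budget plays the role of the "stopping point" mentioned above: the loop ends either when the goal is reached or when the budget is spent.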

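Agents vs. models

The following table clarifies the difference between agents and models:

| Models | Agents |
| --- | --- |
| Knowledge is limited to what is in their training data. | Knowledge is extended through connections to external systems via tools. |
| Perform a single inference or prediction. | Manage session history (e.g. chat history) and perform multi-turn inference or prediction. |
| No native tool implementation. | Tools are natively implemented in the agent architecture. |
| No native logic layer. Users can ask simple questions or compose complex prompts with reasoning frameworks such as CoT or ReAct. | Native cognitive architecture that uses pre-built frameworks such as CoT, ReAct, or LangChain. |

Agents use a structured cognitive architecture, including logic and reasoning, to interact with the outside world.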

Agents vs. models
To gain a clearer understanding of the distinction between agents and models, consider the
following chart:

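Cognitive architectures: How agents operate

To understand how agents operate, consider a chef in a busy kitchen. The chef's goal is to serve delicious food to patrons, which involves a cycle of planning, execution, and adjustment.

Information gathering: The chef hears the patron's order and checks which ingredients are in the pantry and refrigerator.
Internal reasoning: The chef considers which dishes and flavors can be made from the ingredients on hand.
Taking action: The chef carries out the cooking itself, such as preparing ingredients, blending spices, and searing meat.
Adjusting and refining: When ingredients run low or customer feedback arrives, the chef revises the plan and uses previous outcomes to set the next one.

This cycle of gathering information, planning, executing, and adjusting describes a distinct cognitive architecture.

Agents work in a similar way: they use a cognitive architecture to process information, make informed decisions, and refine their next actions based on previous results.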

Cognitive architectures: How agents operate
Imagine a chef in a busy kitchen. Their goal is to create delicious dishes for restaurant
patrons which involves some cycle of planning, execution, and adjustment.

• They gather information, like the patron’s order and what ingredients are in the pantry
and refrigerator.
• They perform some internal reasoning about what dishes and flavor profiles they can
create based on the information they have just gathered.
• They take action to create the dish: chopping vegetables, blending spices, searing meat.
At each stage in the process the chef makes adjustments as needed, refining their plan as
ingredients are depleted or customer feedback is received, and uses the set of previous
outcomes to determine the next plan of action. This cycle of information intake, planning,
executing, and adjusting describes a unique cognitive architecture that the chef employs to
reach their goal.
Just like the chef, agents can use cognitive architectures to reach their end goals by
iteratively processing information, making informed decisions, and refining next actions
based on previous outputs. At the core of agent cognitive architectures lies the orchestration
layer, responsible for maintaining memory, state, reasoning and planning. It uses the rapidly
evolving field of prompt engineering and associated frameworks to guide reasoning and
planning, enabling the agent to interact more effectively with its environment and complete
tasks. Research in the area of prompt engineering frameworks and task planning for
language models is rapidly evolving, yielding a variety of promising approaches. While not an
exhaustive list, these are a few of the most popular frameworks and reasoning techniques
available at the time of this publication:

• ReAct, a prompt engineering framework that provides a thought process strategy for
language models to Reason and take action on a user query, with or without in-context
examples. ReAct prompting has been shown to outperform several SOTA baselines and improve
human interpretability and trustworthiness of LLMs.

• Chain-of-Thought (CoT), a prompt engineering framework that enables reasoning
capabilities through intermediate steps. There are various sub-techniques of CoT including
self-consistency, active-prompt, and multimodal CoT that each have strengths and
weaknesses depending on the specific application.

• Tree-of-thoughts (ToT), a prompt engineering framework that is well suited for
exploration or strategic lookahead tasks. It generalizes over chain-of-thought prompting
and allows the model to explore various thought chains that serve as intermediate steps
for general problem solving with language models.

Agents can utilize one of the above reasoning techniques, or many other techniques, to
choose the next best action for the given user request. For example, let’s consider an agent
that is programmed to use the ReAct framework to choose the correct actions and tools for
the user query.

The sequence of events might go something like this:
1. User sends query to the agent
2. Agent begins the ReAct sequence
3. The agent provides a prompt to the model, asking it to generate one of the next ReAct
steps and its corresponding output:

a. Question: The input question from the user query, provided with the prompt
b. Thought: The model’s thoughts about what it should do next
c. Action: The model’s decision on what action to take next
i. This is where tool choice can occur
ii. For example, an action could be one of [Flights, Search, Code, None], where the first
3 represent a known tool that the model can choose, and the last represents “no
tool choice”

d. Action input: The model’s decision on what inputs to provide to the tool (if any)
e. Observation: The result of the action / action input sequence
i. This thought / action / action input / observation could repeat N-times as needed
f. Final answer: The model’s final answer to provide to the original user query

4. The ReAct loop concludes and a final answer is provided back to the user
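
The sequence above can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions: `scripted_model` replays a fixed `SCRIPT` instead of calling a real language model, and `flights_tool` returns made-up text rather than querying any flight service. Only the Question / Thought / Action / Action input / Observation / Final answer structure is taken from the text.

```python
from typing import Callable, Dict, List, Tuple


def flights_tool(query: str) -> str:
    # Hypothetical tool: returns canned text instead of real flight data.
    return f"2 flights found for '{query}' (placeholder data)"


TOOLS: Dict[str, Callable[[str], str]] = {"Flights": flights_tool}

# Scripted stand-in for the model's successive ReAct steps.
SCRIPT = [
    {"thought": "I need real-time flight data.",
     "action": "Flights", "action_input": "Austin to Zurich"},
    {"thought": "The observation answers the question.",
     "action": "None", "final_answer": "I found 2 flights from Austin to Zurich."},
]


def scripted_model(transcript: List[str]) -> Dict[str, str]:
    # Choose the next scripted step from how many observations exist so far.
    n_observations = sum(line.startswith("Observation:") for line in transcript)
    return SCRIPT[n_observations]


def react(question: str, model, tools, max_turns: int = 5) -> Tuple[str, List[str]]:
    transcript = [f"Question: {question}"]
    for _ in range(max_turns):
        step = model(transcript)
        transcript.append(f"Thought: {step['thought']}")
        transcript.append(f"Action: {step['action']}")
        if step["action"] == "None":  # "no tool choice": the model answers directly
            transcript.append(f"Final answer: {step['final_answer']}")
            return step["final_answer"], transcript
        transcript.append(f"Action input: {step['action_input']}")
        observation = tools[step["action"]](step["action_input"])
        transcript.append(f"Observation: {observation}")
    raise RuntimeError("ReAct loop did not finish within max_turns")


answer, transcript = react("Find flights from Austin to Zurich", scripted_model, TOOLS)
print("\n".join(transcript))
```

In a real agent the transcript would be rendered into the prompt on every turn, so the model sees its earlier thoughts and observations before choosing the next step.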

As shown in Figure 2, the model, tools, and agent configuration work together to provide a grounded, concise response back to the user based on the user’s original query. While the model could have guessed at an answer (hallucinated) based on its prior knowledge, it instead used a tool (Flights) to search for real-time external information. This additional information was provided to the model, allowing it to make a more informed decision based on real factual data and to summarize this information back to the user. In summary, the quality of agent responses can be tied directly to the model’s ability to reason and act on these various tasks, including the ability to select the right tools, and to how well those tools have been defined. Like a chef crafting a dish with fresh ingredients and attentive to customer feedback, agents rely on sound reasoning and reliable information to deliver optimal results. In the next section, we’ll dive into the various ways agents connect with fresh data.

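The role of the orchestration layer

At the core of an agent's cognitive architecture is the orchestration layer, which manages memory, state, reasoning, and planning. It uses prompt engineering and associated frameworks to guide reasoning and planning, helping the agent interact effectively with its environment and complete tasks.

Key reasoning frameworks

Agents can use a variety of frameworks and reasoning techniques. A few of the main ones are:

ReAct (Reason + Act): A framework that combines reasoning and acting to resolve a user's request. It has been shown to outperform several state-of-the-art (SOTA) baselines and to improve human interpretability and trustworthiness.
Chain-of-Thought (CoT): A framework that supports reasoning through intermediate steps. CoT includes sub-techniques such as Self-Consistency, Active Prompt, and Multimodal CoT, each with strengths and weaknesses depending on the application.
Tree-of-Thoughts (ToT): A framework suited to exploration or strategic lookahead tasks. It generalizes CoT and lets the model explore multiple thought chains while solving a problem.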

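Tools: Our keys to the outside world

Language models are very good at processing information, but they cannot directly perceive or influence the outside world. Because a model relies only on its training data, it is limited in situations that require real-time interaction. Tools address this by giving the model access to real-time data and external systems.

What is a tool?

Tools act as the link between the foundational model and the outside world. With them, an agent can go beyond understanding and perform tasks such as:

Adjusting smart home settings
Updating calendars
Fetching user information from a database
Sending emails based on specific conditions

Primary tool types

At present, Google models can interact with three primary tool types:

Extensions: Connect an agent to APIs in a standardized way so that the agent can execute a variety of APIs.
Functions: Self-contained modules of code that perform a specific task and are executed on the client side.
Data Stores: Provide dynamic data and information that the agent can access in real time.

Through tools, agents interact with external systems and gain the potential to perform tasks that a language model cannot perform on its own.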

Tools: Our keys to the outside world
While language models excel at processing information, they lack the ability to directly
perceive and influence the real world. This limits their usefulness in situations requiring
interaction with external systems or data. This means that, in a sense, a language model
is only as good as what it has learned from its training data. But regardless of how much
data we throw at a model, they still lack the fundamental ability to interact with the outside
world. So how can we empower our models to have real-time, context-aware interaction with
external systems? Functions, Extensions, Data Stores and Plugins are all ways to provide this
critical capability to the model.
While they go by many names, tools are what create a link between our foundational models
and the outside world. This link to external systems and data allows our agent to perform a
wider variety of tasks and do so with more accuracy and reliability. For instance, tools can
enable agents to adjust smart home settings, update calendars, fetch user information from
a database, or send emails based on a specific set of instructions.
As of the date of this publication, there are three primary tool types that Google models are
able to interact with: Extensions, Functions, and Data Stores. By equipping agents with tools,
we unlock a vast potential for them to not only understand the world but also act upon it,
opening doors to a myriad of new applications and possibilities.

Extensions

The easiest way to understand **Extensions** is to think of them as a standardized bridge between an agent and an API. They let an agent execute APIs seamlessly, regardless of how those APIs are implemented.

Extensions
The easiest way to understand Extensions is to think of them as bridging the gap between
an API and an agent in a standardized way, allowing agents to seamlessly execute APIs
regardless of their underlying implementation. Let’s say that you’ve built an agent with a goal
of helping users book flights. You know that you want to use the Google Flights API to retrieve
flight information, but you’re not sure how you’re going to get your agent to make calls to this
API endpoint.

Figure 3. How do Agents interact with External APIs?

One approach could be to implement custom code that would take the incoming user query,
parse the query for relevant information, then make the API call. For example, in a flight
booking use case a user might state “I want to book a flight from Austin to Zurich.” In this
scenario, our custom code solution would need to extract “Austin” and “Zurich” as relevant
entities from the user query before attempting to make the API call. But what happens if the
user says “I want to book a flight to Zurich” and never provides a departure city? The API call
would fail without the required data and more code would need to be implemented in order
to catch edge and corner cases like this. This approach is not scalable and could easily break
in any scenario that falls outside of the implemented custom code.
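
To make the brittleness concrete, here is a minimal sketch of the custom-code approach. The regular expression, `extract_route`, and `book_flight` are hypothetical helpers written for this illustration; `book_flight` only formats a string and does not call the Google Flights API.

```python
import re
from typing import Optional, Tuple

# Naive pattern: expects the exact phrasing "from <City> to <City>".
ROUTE_PATTERN = re.compile(r"from ([A-Z][A-Za-z ]*?) to ([A-Z][A-Za-z]*)")


def extract_route(query: str) -> Optional[Tuple[str, str]]:
    """Return (origin, destination) if the query matches the pattern, else None."""
    match = ROUTE_PATTERN.search(query)
    return (match.group(1), match.group(2)) if match else None


def book_flight(query: str) -> str:
    route = extract_route(query)
    if route is None:
        # Every unanticipated phrasing lands here and needs more custom code.
        raise ValueError("Could not find both a departure and a destination city")
    origin, destination = route
    return f"Would call the flights API with origin={origin}, destination={destination}"


print(book_flight("I want to book a flight from Austin to Zurich."))
print(extract_route("I want to book a flight to Zurich"))
```

The second query fails simply because the user left out the departure city, which is exactly the kind of edge case the paragraph above warns about.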

A more resilient approach would be to use an Extension. An Extension bridges the gap
between an agent and an API by:
1. Teaching the agent how to use the API endpoint using examples.
2. Teaching the agent what arguments or parameters are needed to successfully call the
API endpoint.

Extensions can be crafted independently of the agent, but should be provided as part of the
agent’s configuration. The agent uses the model and examples at run time to decide which
Extension, if any, would be suitable for solving the user’s query. This highlights a key strength
of Extensions: their built-in example types allow the agent to dynamically select the
most appropriate Extension for the task.
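
The idea of selecting an Extension from its examples can be imitated with a toy heuristic. In a real agent the model makes this choice; the sketch below substitutes simple word overlap so that it runs without any model. `ExtensionSpec`, `select_extension`, and the example sentences are assumptions for illustration and are not part of the Vertex AI Extensions API.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class ExtensionSpec:
    """Hypothetical description of an Extension: a name plus example queries."""
    name: str
    examples: List[str]


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def select_extension(query: str, extensions: List[ExtensionSpec]) -> Optional[str]:
    """Pick the extension whose best example shares the most words with the query."""
    query_tokens = tokens(query)
    best_name, best_score = None, 0
    for ext in extensions:
        score = max((len(query_tokens & tokens(ex)) for ex in ext.examples), default=0)
        if score > best_score:
            best_name, best_score = ext.name, score
    return best_name  # None means "no suitable extension"


EXTENSIONS = [
    ExtensionSpec("google_flights", ["Show me flights from Austin to Zurich",
                                     "Book a flight to Paris"]),
    ExtensionSpec("google_maps", ["Where is the nearest coffee shop",
                                  "Find a restaurant near me"]),
]

print(select_extension("Show me flights to Zurich next Friday", EXTENSIONS))
print(select_extension("Where is the nearest coffee shop to my hotel?", EXTENSIONS))
```

Word overlap is a deliberately crude stand-in; the point is only that examples attached to each Extension give the selector something to match the query against.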

Think of this the same way that a software developer decides which API endpoints to use
while solving a user’s problem. If the user wants to book a flight, the
developer might use the Google Flights API. If the user wants to know where the nearest
coffee shop is relative to their location, the developer might use the Google Maps API. In
this same way, the agent / model stack uses a set of known Extensions to decide which one
will be the best fit for the user’s query. If you’d like to see Extensions in action, you can try
them out on the Gemini application by going to Settings > Extensions and then enabling any
you would like to test. For example, you could enable the Google Flights extension then ask
Gemini “Show me flights from Austin to Zurich leaving next Friday.”

Extension use case

Suppose you are building an agent that helps users book flights. You want to use the Google Flights API to retrieve flight information, but you still need to work out how the agent will call that API.

Sample Extensions
To simplify the usage of Extensions, Google provides some out of the box extensions that
can be quickly imported into your project and used with minimal configurations. For example,
the Code Interpreter extension in Snippet 1 allows you to generate and run Python code from
a natural language description.

Python
import vertexai
import pprint

PROJECT_ID = "YOUR_PROJECT_ID"
REGION = "us-central1"
vertexai.init(project=PROJECT_ID, location=REGION)

from vertexai.preview.extensions import Extension

# Load the prebuilt Code Interpreter extension from the hub.
extension_code_interpreter = Extension.from_hub("code_interpreter")
CODE_QUERY = """Write a python method to invert a binary tree in O(n) time."""

# Ask the extension to generate and run code for the natural language query.
response = extension_code_interpreter.execute(
    operation_id="generate_and_execute",
    operation_params={"query": CODE_QUERY},
)

print("Generated Code:")
# Print the generated source itself rather than wrapping it in a set literal.
pprint.pprint(response["generated_code"])

# The above snippet will generate the following code.
```
Generated Code:
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def invert_binary_tree(root):
    """
    Inverts a binary tree.
    Args:
        root: The root of the binary tree.
    Returns:
        The root of the inverted binary tree.
    """

    if not root:
        return None

    # Swap the left and right children recursively
    root.left, root.right = \
        invert_binary_tree(root.right), invert_binary_tree(root.left)

    return root

# Example usage:
# Construct a sample binary tree
root = TreeNode(4)
root.left = TreeNode(2)
root.right = TreeNode(7)
root.left.left = TreeNode(1)
root.left.right = TreeNode(3)
root.right.left = TreeNode(6)
root.right.right = TreeNode(9)

# Invert the binary tree
inverted_root = invert_binary_tree(root)
```

Snippet 1. Code Interpreter Extension can generate and run Python code

To summarize, Extensions provide a way for agents to perceive, interact with, and influence the
outside world in a myriad of ways. The selection and invocation of these Extensions is guided
by the use of Examples, all of which are defined as part of the Extension configuration.

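Traditional approach: Write custom code that parses the user's input to extract the relevant information and then makes the API call. This approach handles exceptions and edge cases poorly and does not scale well.
Extension approach: An Extension connects the agent to the API by:
1. Teaching the agent how to use the API (how to call the endpoint) through examples.
2. Explicitly defining the arguments or parameters needed to call the API.

Advantages of Extensions

Extensions can be designed independently of the agent, and at run time they let the agent dynamically select the Extension best suited to the task. This is a key strength of Extensions and lets the agent handle user requests more flexibly.

Example: the Google Flights Extension

If a user asks "Show me flights from Austin to Zurich leaving next Friday," the agent can enable the Google Flights Extension and make the appropriate API call.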

Sample Extensions

To make Extensions easier to use, Google provides out-of-the-box Extensions that can be integrated into a project quickly with minimal configuration.

Example: Code Interpreter

The Code Interpreter Extension generates and runs Python code from a natural language description. The Python example below uses this Extension.

import vertexai
import pprint

PROJECT_ID = "YOUR_PROJECT_ID"
REGION = "us-central1"

vertexai.init(project=PROJECT_ID, location=REGION)

from vertexai.preview.extensions import Extension

extension_code_interpreter = Extension.from_hub("code_interpreter")
CODE_QUERY = """Write a python method to invert a binary tree in O(n) time."""

response = extension_code_interpreter.execute(
    operation_id="generate_and_execute",
    operation_params={"query": CODE_QUERY},
)

print("Generated Code:")
# Print the generated source itself rather than wrapping it in a set literal.
pprint.pprint(response["generated_code"])

The snippet above takes a natural language request (for example, "Write a Python method to invert a binary tree in O(n) time") and uses it to generate and run Python code.

Generated code example

class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def invert_binary_tree(root):
    if not root:
        return None
    root.left, root.right = invert_binary_tree(root.right), invert_binary_tree(root.left)
    return root

By showing code generation and execution end to end, this example demonstrates the power of Extensions.

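Functions

In software engineering, a **function** is a self-contained module of code designed to perform a specific task and reusable as needed. In an agent setting, the model takes a set of known functions and decides when to use each one and which arguments it needs, based on its specification.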

Functions
In the world of software engineering, functions are defined as self-contained modules
of code that accomplish a specific task and can be reused as needed. When a software
developer is writing a program, they will often create many functions to do various tasks.
They will also define the logic for when to call function_a versus function_b, as well as the
expected inputs and outputs.
Functions work very similarly in the world of agents, but we can replace the software
developer with a model. A model can take a set of known functions and decide when to use
each Function and what arguments the Function needs based on its specification. Functions
differ from Extensions in a few ways, most notably:
1. A model outputs a Function and its arguments, but doesn’t make a live API call.
2. Functions are executed on the client-side, while Extensions are executed on
the agent-side.
Using our Google Flights example again, a simple setup for functions might look like the
example in Figure 7.

Note that the main difference here is that neither the Function nor the agent interact directly
with the Google Flights API. So how does the API call actually happen?
With functions, the logic and execution of calling the actual API endpoint is offloaded away
from the agent and back to the client-side application as seen in Figure 8 and Figure 9 below.
This offers the developer more granular control over the flow of data in the application. There
are many reasons why a Developer might choose to use functions over Extensions, but a few
common use cases are:
• API calls need to be made at another layer of the application stack, outside of the direct
agent architecture flow (e.g. a middleware system, a front end framework, etc.)
• Security or Authentication restrictions that prevent the agent from calling an API directly
(e.g. API is not exposed to the internet, or non-accessible by agent infrastructure)
• Timing or order-of-operations constraints that prevent the agent from making API calls in
real-time (e.g. batch operations, human-in-the-loop review, etc.)

• Additional data transformation logic needs to be applied to the API Response that the
agent cannot perform. For example, consider an API endpoint that doesn’t provide a
filtering mechanism for limiting the number of results returned. Using Functions on the
client-side provides the developer additional opportunities to make these transformations.
• The developer wants to iterate on agent development without deploying additional
infrastructure for the API endpoints (i.e. Function Calling can act like “stubbing” of APIs)
While the difference in internal architecture between the two approaches is subtle as seen in
Figure 8, the additional control and decoupled dependency on external infrastructure makes
Function Calling an appealing option for the Developer.
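
The division of labor described above can be sketched as follows. `fake_model`, `search_flights`, and the `CLIENT_FUNCTIONS` registry are hypothetical placeholders: the "model" here just returns a fixed JSON string, and the client-side function returns placeholder data instead of calling a real flights API.

```python
import json
from typing import Any, Callable, Dict, List


def search_flights(origin: str, destination: str) -> List[Dict[str, str]]:
    # Client-side code: this is where the real API call (with credentials,
    # batching, or human review) would live. Placeholder data only.
    return [{"origin": origin, "destination": destination, "flight": "PLACEHOLDER-001"}]


# Functions the client application is willing to execute.
CLIENT_FUNCTIONS: Dict[str, Callable[..., Any]] = {"search_flights": search_flights}


def fake_model(user_query: str) -> str:
    # Stand-in for the model: it names a function and its arguments,
    # but never makes a live API call itself.
    return json.dumps({
        "name": "search_flights",
        "args": {"origin": "Austin", "destination": "Zurich"},
    })


def handle_on_client(model_output: str) -> Any:
    """Parse the model's function call and execute it on the client side."""
    call = json.loads(model_output)
    func = CLIENT_FUNCTIONS.get(call["name"])
    if func is None:
        raise KeyError(f"Model requested an unknown function: {call['name']}")
    return func(**call["args"])


result = handle_on_client(fake_model("I want to book a flight from Austin to Zurich"))
print(result)
```

Because execution stays in `handle_on_client`, the developer can swap `search_flights` for a stub during development, which matches the "stubbing" use case listed above.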

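Differences between Functions and Extensions

Execution location:
- Functions are executed on the client side; the agent does not make the API call directly.
- Extensions are executed on the agent side and call the API directly.
Main use cases:
- Functions are used when security or authentication restrictions prevent the agent from calling an API directly.
- Extensions are useful when the agent needs to perform multi-step tasks or call APIs in real time.

Function use case

A function call can be as simple as the following:

function_call {
  name: "display_cities"
  args: {
    "cities": ["Crested Butte", "Whistler", "Zermatt"],
    "preferences": "skiing"
  }
}

In the JSON-style function call above, the agent generates a list of recommended cities from the user's input, and the API call itself is then made on the client side.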

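Use cases

Functions are useful for handling complex client-side execution flows where the agent developer does not want the language model to manage API execution. A representative example of Function Calling follows.

Case 1: Travel concierge agent

Suppose you are training an agent to help with travel planning. A user might say:

"I'd like to take a ski trip with my family but I'm not sure where to go."

With a typical prompt, the model's response might look like this:

Here is a list of cities suited to a family ski trip:
- Crested Butte, Colorado, USA
- Whistler, BC, Canada
- Zermatt, Switzerland

This output is not structured for another system to process. With Function Calling, the same data can be produced in a JSON-style format:

function_call {
  name: "display_cities"
  args: {
    "cities": ["Crested Butte", "Whistler", "Zermatt"],
    "preferences": "skiing"
  }
}

This payload is sent to the client-side server, which can call the Google Places API or similar services to give the user rich travel information.

Applicable scenarios

When credentials should not be included in the API call
When asynchronous operations are needed (for example, long processing times)
When functions must run on a system separate from the agent
When additional data transformation logic is needed on the API response

Benefits

Function Calling gives developers finer control over data flow and execution order. When appropriate, the data returned by an external call can be passed back to the agent or omitted.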

Use cases
A model can be used to invoke functions in order to handle complex, client-side execution
flows for the end user, where the agent Developer might not want the language model to
manage the API execution (as is the case with Extensions). Let’s consider the following
example where an agent is being trained as a travel concierge to interact with users that want
to book vacation trips. The goal is to get the agent to produce a list of cities that we can use
in our middleware application to download images, data, etc. for the user’s trip planning. A
user might say something like:
I’d like to take a ski trip with my family but I’m not sure where to go.
In a typical prompt to the model, the output might look like the following:
Sure, here’s a list of cities that you can consider for family ski trips:
• Crested Butte, Colorado, USA
• Whistler, BC, Canada
• Zermatt, Switzerland
While the above output contains the data that we need (city names), the format isn’t ideal
for parsing. With Function Calling, we can teach a model to format this output in a structured
style (like JSON) that’s more convenient for another system to parse. Given the same input
prompt from the user, an example JSON output from a Function might look like Snippet
5 instead.
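
To illustrate why a structured payload is easier to consume than free text, the sketch below extracts the city names both ways. The free-text string and the JSON payload mirror the ski-trip example in this section; the two helper functions are hypothetical, and the JSON string is hand-written here rather than produced by a model.

```python
import json
from typing import List

FREE_TEXT = """Sure, here's a list of cities that you can consider for family ski trips:
• Crested Butte, Colorado, USA
• Whistler, BC, Canada
• Zermatt, Switzerland"""

# Hand-written stand-in for a model-generated function call.
FUNCTION_CALL = json.dumps({
    "name": "display_cities",
    "args": {"cities": ["Crested Butte", "Whistler", "Zermatt"], "preferences": "skiing"},
})


def cities_from_free_text(text: str) -> List[str]:
    """Fragile: depends on the bullet character and comma layout staying the same."""
    cities = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("•"):
            cities.append(line.lstrip("• ").split(",")[0].strip())
    return cities


def cities_from_function_call(payload: str) -> List[str]:
    """Robust: the field name is part of the function's declared specification."""
    return json.loads(payload)["args"]["cities"]


print(cities_from_free_text(FREE_TEXT))
print(cities_from_function_call(FUNCTION_CALL))
```

Both helpers return the same list for this input, but only the second keeps working if the model rewords its prose or changes its bullet style.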

This JSON payload is generated by the model, and then sent to our Client-side server to do
whatever we would like to do with it. In this specific case, we’ll call the Google Places API to
take the cities provided by the model and look up Images, then provide them as formatted
rich content back to our User. Consider this sequence diagram in Figure 9 showing the above
interaction in step by step detail.

The result of the example in Figure 9 is that the model is leveraged to “fill in the blanks” with
the parameters required for the Client side UI to make the call to the Google Places API. The
Client side UI manages the actual API call using the parameters provided by the model in the
returned Function. This is just one use case for Function Calling, but there are many other
scenarios to consider like:
• You want a language model to suggest a function that you can use in your code, but you
don't want to include credentials in your code. Because function calling doesn't run the
function, you don't need to include credentials in your code with the function information.

• You are running asynchronous operations that can take more than a few seconds. These
scenarios work well with function calling because it's an asynchronous operation.
• You want to run functions on a device that's different from the system producing the
function calls and their arguments.
One key thing to remember about functions is that they are meant to offer the developer
much more control over not only the execution of API calls, but also the entire flow of data
in the application as a whole. In the example in Figure 9, the developer chose to not return
API information back to the agent as it was not pertinent for future actions the agent might
take. However, based on the architecture of the application, it may make sense to return the
external API call data to the agent in order to influence future reasoning, logic, and action
choices. Ultimately, it is up to the application developer to choose what is right for the
specific application.
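
The client-side half of this pattern can be sketched in a few lines. This is only a minimal illustration, not the whitepaper's implementation: `HANDLERS`, `handle_function_call`, and the stubbed `display_cities` handler are hypothetical names, and the model's output is assumed to have already been parsed into a function name plus an argument dictionary. The `return_to_agent` flag mirrors the choice described above between keeping the result on the client and passing it back to the agent.

```python
from typing import Any, Callable, Optional


# Hypothetical client-side handler; a real one might call the Google Places API here.
def display_cities(cities: list[str], preferences: Optional[str] = None) -> list[dict[str, Any]]:
    """Build one display card per city suggested by the model."""
    return [{"city": city, "tag": preferences or "general"} for city in cities]


# Registry mapping function names the model may emit to local callables.
HANDLERS: dict[str, Callable[..., Any]] = {"display_cities": display_cities}


def handle_function_call(name: str, args: dict[str, Any], return_to_agent: bool = False) -> dict[str, Any]:
    """Execute a model-proposed function call on the client side."""
    if name not in HANDLERS:
        raise ValueError(f"Model requested an unknown function: {name}")
    result = HANDLERS[name](**args)
    response: dict[str, Any] = {"ui_payload": result}
    if return_to_agent:
        # Only included when the developer wants the agent to reason over the result.
        response["agent_message"] = {"function": name, "result": result}
    return response


# The kind of payload the model produces in the ski-trip example.
call = {"name": "display_cities", "args": {"cities": ["Aspen", "Vail", "Park City"], "preferences": "skiing"}}
outcome = handle_function_call(call["name"], call["args"])
print(outcome["ui_payload"][0])
```

Because the model only names the function and supplies arguments, any credentials the real handler needs stay on the client and never appear in the prompt.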

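Function sample code

Let's write Python code that has the agent generate a list of recommended cities in the travel concierge scenario. The example follows.

1. Define the display_cities function

python

from typing import Optional


def display_cities(cities: list[str], preferences: Optional[str] = None):
    """
    Provides a list of recommended cities based on the user's search query and preferences.

    Args:
        cities (list[str]): The list of cities to recommend to the user.
        preferences (Optional[str]): The user's preferences, such as skiing, beach, or restaurants.

    Returns:
        list[str]: The list of recommended cities.
    """
    return cities

The function simply returns the cities it receives. The model's job is to fill in the cities and preferences arguments from the user's request.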

2. Set up the model and tool

python

from vertexai.generative_models import GenerativeModel, Tool, FunctionDeclaration

# Initialize the model (assumes the Vertex AI project and credentials are already configured)
model = GenerativeModel("gemini-1.5-flash-001")

# Build a function declaration from display_cities
display_cities_function = FunctionDeclaration.from_func(display_cities)

# Wrap the declaration in a tool
tool = Tool(function_declarations=[display_cities_function])

# User query
message = "I'd like to take a ski trip with my family but I'm not sure where to go."

# Send the query and the tool to the model
res = model.generate_content(message, tools=[tool])

# Inspect the function call the model proposed
function_call = res.candidates[0].content.parts[0].function_call
print(f"Function Name: {function_call.name}")
print(f"Function Args: {function_call.args}")

3. Sample output

plaintext

Function Name: display_cities
Function Args: {'preferences': 'skiing', 'cities': ['Aspen', 'Vail', 'Park City']}

This code turns the user's request into a structured, JSON-style function call containing the recommended cities, which the client side can then act on.

To achieve the above output from our ski vacation scenario, the snippets above build out each of
the components needed to make this work with our gemini-1.5-flash-001 model.
First, display_cities is defined as a simple Python function.

Next, the model is instantiated, the Tool is built, and the user’s query and tools are passed to
the model. Executing that code produces the output shown in step 3.

In summary, functions offer a straightforward framework that empowers application
developers with fine-grained control over data flow and system execution, while effectively
leveraging the agent/model for critical input generation. Developers can selectively choose
whether to keep the agent “in the loop” by returning external data, or omit it based on
specific application architecture requirements.

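Data stores

Think of a language model as a library. The library is full of books, its training data, but it cannot acquire new books or reflect up-to-date information in real time. Data stores address this limitation by giving the model access to more dynamic, current information.

What is a data store?

A data store lets a developer supply data to an agent in forms such as:

Structured data, such as spreadsheets and PDFs
Unstructured data, such as text files and HTML

The data store converts the incoming documents into vector embeddings held in a vector database, so the agent can easily retrieve and use the information it needs.

Use cases

With a data store, an agent can carry out the following sequence:

The user query is sent to an embedding model, which generates an embedding for the query.
The query embedding is used to search the vector database for the most relevant entries.
The retrieved content is returned in text form.
The agent generates a response based on the user query and the retrieved content.

Through this process, the agent does not rely on its training data alone and can base its decisions on current information.

RAG (Retrieval-Augmented Generation) applications

Data stores are used to implement Retrieval-Augmented Generation (RAG) applications. This approach extends the model to draw on the following types of data:

Website content
Structured data, such as PDF, CSV, and Word documents
Unstructured data, such as HTML and text files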

Implementation and application

Data stores are typically implemented as a vector database that the agent can access at runtime. A vector database stores data as high-dimensional vectors (vector embeddings), which lets the agent search for and use relevant information more efficiently.
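
To make the idea of a vector database concrete, here is a minimal in-memory sketch. It is only an illustration under stated assumptions: `TinyVectorStore` is a hypothetical class, the three-dimensional vectors are made up by hand rather than produced by an embedding model, and a linear cosine-similarity scan stands in for the approximate nearest-neighbor search a production vector database would use.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store of (vector, text) pairs."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], text: str) -> None:
        self._items.append((vector, text))

    def search(self, query: list[float], k: int = 1) -> list[str]:
        """Return the texts of the k stored vectors most similar to the query."""
        ranked = sorted(self._items, key=lambda item: cosine_similarity(query, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


# Hand-made vectors; a real system would obtain these from an embedding model.
store = TinyVectorStore()
store.add([0.9, 0.1, 0.0], "Ski resort guide")
store.add([0.0, 0.8, 0.2], "Beach vacation brochure")
store.add([0.1, 0.1, 0.9], "Restaurant reviews")

top = store.search([0.8, 0.2, 0.1], k=1)
print(top)
```

The query vector points in nearly the same direction as the ski guide's vector, so that document is returned first. Similarity of direction, not exact keyword match, is what drives retrieval.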

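Data store use cases

Website content: search pre-indexed web content from a specific domain or URL.
Structured data: PDF, Word documents, spreadsheets, CSV, and so on.
Unstructured data: text files, HTML, PDF, and so on.

The general RAG (Retrieval-Augmented Generation) process

Generate the query embedding: the user's request is passed to an embedding model and converted into a vector.
Search the database: the query embedding is matched against the contents of the vector database using a matching algorithm (e.g., ScaNN).
Return the relevant data: the matched content is returned in text form.
Generate the agent response: the agent formulates a response based on the user's request and the retrieved data.

This process connects the user's question to the vector database, retrieves the relevant content, and feeds it into the agent's decisions and actions.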
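
The four steps above can be sketched end to end with the standard library alone. This is a toy under loud assumptions: `embed` is a bag-of-words counter standing in for a real embedding model, the documents are invented for illustration, a linear scan replaces an algorithm such as ScaNN, and the final step stops at building the prompt that would be handed to the model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Invented documents, indexed ahead of time as (embedding, text) pairs.
DOCUMENTS = [
    "Parental leave policy: employees receive sixteen weeks of paid leave.",
    "Expense policy: meals are reimbursed up to fifty dollars per day.",
    "Travel policy: book flights at least two weeks in advance.",
]
INDEX = [(embed(doc), doc) for doc in DOCUMENTS]


def retrieve(query: str, k: int = 1) -> list[str]:
    """Steps 1-3: embed the query, match it against the index, return text."""
    query_vec = embed(query)
    ranked = sorted(INDEX, key=lambda pair: similarity(query_vec, pair[0]), reverse=True)
    return [doc for _, doc in ranked[:k]]


def build_prompt(query: str) -> str:
    """Input to step 4: combine the user query with the retrieved context."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


question = "How many weeks of parental leave do employees get?"
print(build_prompt(question))
```

In a real application the prompt returned by `build_prompt` would be sent to the model, which generates the final response grounded in the retrieved text.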

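Example: using RAG with ReAct

In a RAG-based application, the agent processes the retrieved data using ReAct reasoning and planning, which lets it give more precise and specific answers to the user's question.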

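Tools recap

The tools an agent can use at runtime fall into three broad types: Extensions, Functions, and Data Stores. Each has its own purpose, and developers can use them independently or in combination as needed.

Tool type	Execution location	Use cases
Extensions	Agent side	- The agent should control interactions with the API endpoints - Leveraging built-in extensions (e.g., Vertex Search, Code Interpreter) - Multi-step planning and API calls, where the next action depends on the output of the previous one
Functions	Client side	- Security or authentication restrictions prevent the agent from calling an API directly - Timing or order-of-operations constraints (e.g., batch processing, human-in-the-loop review) prevent real-time API calls - Additional data transformation logic must be applied on the client side
Data Stores	Agent side	- Implementing Retrieval-Augmented Generation (RAG) - Using pre-indexed web content and structured data such as PDF, CSV, and Word documents - Using unstructured data such as HTML and text files

Combining tools

Developers can combine these tools according to the agent's requirements to solve a range of problems. Extensions support real-time interaction with APIs, Functions give the client side control over data, and Data Stores support responses grounded in up-to-date information.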


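Enhancing model performance with targeted learning

For an agent to use tools effectively and produce accurate output, it needs targeted learning beyond general model training. The difference is like that between basic cooking skills and mastery of a specific cuisine: both require foundational cooking knowledge, but the latter demands in-depth study of particular techniques and ingredients.

Approaches to model learning

Three main approaches are used for targeted learning:

In-context learning
- The model is given a generalized prompt, tools, and a few sample examples at inference time, so it learns "on the fly" how to handle a specific task.
- Example: the ReAct framework, a methodology that combines reasoning and acting to solve problems posed in natural language.
Retrieval-based in-context learning
- The most relevant information, tools, and examples are retrieved dynamically from external memory and used to populate the model prompt.
- Example: the "Example Store" in Vertex AI extensions, or a data-store-based RAG architecture.
Fine-tuning based learning
- The model is trained on a larger dataset of specific examples before inference, so that it learns how to use particular tools and perform particular tasks. This gives the model a deeper understanding before it ever receives a user query.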
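
As a rough illustration of retrieval-based in-context learning, the sketch below picks the stored examples most similar to an incoming query and uses them to populate a few-shot prompt. Everything in it is an assumption made for illustration: `EXAMPLE_STORE` and its entries are invented, the tool names `flights`, `weather`, and `convert` are hypothetical, and a simple word-overlap score stands in for the embedding-based retrieval a real example store would use.

```python
# Hypothetical example store: past (user query, tool call) pairs.
EXAMPLE_STORE = [
    ("Find flights from Austin to Zurich", 'flights(origin="AUS", destination="ZRH")'),
    ("What is the weather in Paris?", 'weather(city="Paris")'),
    ("Book a flight from Boston to Denver", 'flights(origin="BOS", destination="DEN")'),
    ("Convert 100 dollars to euros", 'convert(amount=100, source="USD", target="EUR")'),
]


def word_overlap(a: str, b: str) -> int:
    """Crude relevance score: number of lowercase words the two texts share."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def select_examples(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Retrieve the k stored examples most similar to the incoming query."""
    return sorted(EXAMPLE_STORE, key=lambda ex: word_overlap(query, ex[0]), reverse=True)[:k]


def build_few_shot_prompt(query: str, k: int = 2) -> str:
    """Populate the prompt with retrieved examples, then append the new query."""
    shots = "\n".join(f"User: {q}\nTool call: {call}" for q, call in select_examples(query, k))
    return f"{shots}\nUser: {query}\nTool call:"


new_query = "I need flights from Seattle to Miami"
print(build_few_shot_prompt(new_query))
```

Plain in-context learning would hard-code the same few examples into every prompt. Here the examples change with each query, which is what distinguishes the retrieval-based variant.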

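An analogy for the learning approaches: the chef

In-context learning
The chef receives a specific recipe (the prompt), a few key ingredients (the tools), and some example dishes (the few-shot examples) from a customer, and works out on the spot how to prepare a dish that meets the request.
Retrieval-based in-context learning
The chef works in a kitchen with a well-stocked pantry of ingredients and cookbooks (the external data stores) and draws on them to fulfill the customer's request.
Fine-tuning based learning
The chef undergoes formal training to learn a specific cuisine (e.g., French cooking).

Benefits of combining approaches

Combining the three approaches lets you exploit the strengths of each and offset its weaknesses, yielding a more robust and adaptable solution.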

Agent quick start with LangChain

The LangChain and LangGraph libraries make it possible to prototype an agent quickly. These open-source libraries help you build custom agents that answer user requests by chaining together sequences of logic, reasoning, and tool calls.

The following example uses LangChain and LangGraph with a Gemini model and a few simple tools to build an agent that handles a multi-stage user query.

1. Configure the tools

This example uses two tools: SerpAPI (for Google Search) and the Google Places API (for location lookup).

python

import os

from langgraph.prebuilt import create_react_agent
from langchain_core.tools import tool
from langchain_community.utilities import SerpAPIWrapper
from langchain_community.tools import GooglePlacesTool

# Set the API keys (replace the placeholders with your own keys)
os.environ["SERPAPI_API_KEY"] = "XXXXX"
os.environ["GPLACES_API_KEY"] = "XXXXX"

@tool
def search(query: str):
    """Use the SerpAPI to run a Google Search."""
    search = SerpAPIWrapper()
    return search.run(query)

@tool
def places(query: str):
    """Use the Google Places API to run a Google Places Query."""
    places = GooglePlacesTool()
    return places.run(query)

2. Create the model and agent

LangGraph's prebuilt ReAct agent is used to handle the user query.

python

from langchain_google_vertexai import ChatVertexAI

# Initialize the model
model = ChatVertexAI(model="gemini-1.5-flash-001")
tools = [search, places]

# Define the user query
query = "Who did the Texas Longhorns play in football last week? What is the address of the other team's stadium?"

# Create the ReAct agent
agent = create_react_agent(model, tools)
inputs = {"messages": [("human", query)]}

# Run the agent and print the latest message at each step
for s in agent.stream(inputs, stream_mode="values"):
    message = s["messages"][-1]
    if isinstance(message, tuple):
        print(message)
    else:
        message.pretty_print()

3. Execution result

plaintext

=============================== User Request ===============================
Who did the Texas Longhorns play in football last week? What is the address of the other team's stadium?
=============================== Tool Call ================================
Tool Calls: search
Args:
query: Texas Longhorns football schedule
=============================== Result ================================
The Texas Longhorns played the Georgia Bulldogs last week.
Tool Calls: places
Args:
query: Georgia Bulldogs stadium
=============================== Result ================================
The address of the Georgia Bulldogs stadium is 100 Sanford Dr, Athens, GA 30602, USA.

This simple example shows how the model, the orchestration layer, and the tools work together to form an agent that achieves a specific goal.
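
To see what the orchestration layer is doing in the example above, it can help to strip the libraries away. The following is a framework-free sketch of a reason-act loop and not LangGraph's actual implementation: `scripted_model` is a hard-coded stand-in for the LLM, `run_agent` is a hypothetical name, and the two tools return canned strings taken from the run shown above rather than calling SerpAPI or Google Places.

```python
from typing import Callable


# Stub tools returning canned strings; real tools would call SerpAPI or Google Places.
def search(query: str) -> str:
    return "The Texas Longhorns played the Georgia Bulldogs last week."


def places(query: str) -> str:
    return "Sanford Stadium, 100 Sanford Dr, Athens, GA 30602, USA"


TOOLS: dict[str, Callable[[str], str]] = {"search": search, "places": places}


def scripted_model(messages: list[dict]) -> dict:
    """Stand-in for an LLM: picks the next step from how many tool results exist."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if len(tool_results) == 0:
        return {"tool": "search", "query": "Texas Longhorns football schedule"}
    if len(tool_results) == 1:
        return {"tool": "places", "query": "Georgia Bulldogs stadium"}
    return {"answer": f"{tool_results[0]['content']} Stadium address: {tool_results[1]['content']}"}


def run_agent(query: str, max_steps: int = 5) -> str:
    """Orchestration loop: ask the model, run the requested tool, feed back the result."""
    messages = [{"role": "human", "content": query}]
    for _ in range(max_steps):
        decision = scripted_model(messages)
        if "answer" in decision:
            return decision["answer"]
        observation = TOOLS[decision["tool"]](decision["query"])
        messages.append({"role": "tool", "name": decision["tool"], "content": observation})
    raise RuntimeError("Agent did not finish within max_steps")


final_answer = run_agent("Who did the Texas Longhorns play last week, and where is their stadium?")
print(final_answer)
```

The second tool call depends on the result of the first, which is why the loop feeds each observation back to the model before asking for the next step. The `max_steps` cap is a design choice in this sketch to keep a misbehaving model from looping forever.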

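Production applications with Vertex AI agents

This whitepaper has covered the core components of agents, but building a production-grade application requires integrating them with additional tools such as user interfaces, evaluation frameworks, and continuous improvement mechanisms. Google's Vertex AI platform simplifies this process so that developers can focus on building the agent itself.

Key features of Vertex AI

Natural language interface
- Developers can quickly define elements such as goals, task instructions, tools, sub-agents, and examples to specify the desired system behavior.
Development tools
- The platform provides tools for testing, evaluating, debugging, and improving agent performance, so developers can focus on agent quality rather than infrastructure management.
Managed environment
- The platform handles the complexity of infrastructure, deployment, and maintenance, leaving developers free to concentrate on building agents.

Sample agent architecture

A sample agent architecture built on the Vertex AI platform includes the following components:

Vertex Agent Builder: a tool for designing and configuring agents.
Vertex Extensions: built-in extensions that manage API calls.
Vertex Function Calling: a tool for handling function calls.
Vertex Example Store: stores and manages examples to improve reasoning accuracy.

The whitepaper's accompanying figure shows an end-to-end agent architecture that integrates these components.

Benefits of Vertex AI

Using Vertex AI offers the following benefits:

Speed: faster agent development and testing.
Quality: continuous performance evaluation and improvement.
Scalability: enterprise-grade scalability.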


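Summary

This whitepaper discussed the building blocks of generative AI agents and how to implement them effectively. The key takeaways are:

1. The role of agents

Agents extend the capabilities of language models by using tools to access real-time information, suggest real-world actions, and plan and execute complex tasks.
An agent uses one or more language models to decide how to transition between states, and uses external tools to complete complex tasks that a model could not easily complete on its own.

2. The importance of the cognitive architecture

At the heart of an agent is the orchestration layer, which structures reasoning, planning, and decision-making and guides the agent's actions.
Reasoning frameworks such as ReAct, Chain-of-Thought (CoT), and Tree-of-Thoughts (ToT) help the agent take in information, reason internally, and make informed decisions.

3. Using tools

Agents use tools such as Extensions, Functions, and Data Stores to interact with external systems and to draw on knowledge beyond their training data.

Extensions: connect the agent to external APIs, enabling real-time information retrieval.
Functions: give the client side finer-grained control over data flow and execution.
Data Stores: provide access to structured and unstructured data, enabling data-driven responses.

4. Future outlook

Solving more complex problems: as tools grow more sophisticated and reasoning capabilities improve, agents will be able to tackle increasingly complex problems.
Agent chaining: combining specialized agents, each an expert in a particular domain, is expected to gain traction as a strategy for solving a wide range of problems.

5. Continuous improvement

Building complex agent architectures requires experimentation and iterative refinement. Each agent differs depending on its design and implementation, and the solution must fit the specific business requirements and organizational needs.

Conclusion

This whitepaper offers practical guidance for building real applications from the components of an agent and for exploring what becomes possible beyond the limits of a language model alone. Generative AI agents are expected to keep advancing and to play an important role in creating practical value.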
