Setting Up Your Vision Agent
Let's set up the foundation for our vision agent module.
Project Setup
First, clone the module template which is a base template for creating Naptha modules such as agents, agent orchestrators and agent environments.
Environment Setup
Create a copy of the .env file:
You need to set a PRIVATE_KEY
in the .env
file (e.g. this can be the
same as the PRIVATE_KEY
you use with the Naptha SDK). If using OpenAI,
make sure to set the OPENAI_API_KEY
environment variable also.
Project Structure
When you clone the module template, you'll see the following structure:
Install dependencies using poetry:
Install Python 3.10+ and pipx
First, ensure you have Python 3.10+ installed. Then install pipx:
Install Poetry Package Manager
Install Poetry using pipx:
Configuration
-
deployment.json
This file defines the deployment configuration for your module. It includes the module name, node details, and configuration settings.
-
llm_configs.json
Here, we will define the LLM configuration for our agent. We will be using the gpt-4o-mini
model
but you can also add other multi-modal models such as gpt-4-vision
or gemini
or Mistral
as well as adjust the temperature, max tokens, and API base.
Next Steps
In the next section, we'll implement the core functionality in run.py
for our a simple vision agent.
The run.py
file is the default entry point that will be used when the
module run is initiated. The run
function therein can instantiate a class
(e.g. an agent class) or call a function.