Setting Up Your Vision Agent

Let's set up the foundation for our vision agent module.

Project Setup

First, clone the module template, a base template for creating Naptha modules such as agents, agent orchestrators, and agent environments.

Environment Setup

Create a copy of the .env file:

Set PRIVATE_KEY in the .env file (this can be the same PRIVATE_KEY you use with the Naptha SDK). If you are using OpenAI, also set the OPENAI_API_KEY environment variable.
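As a quick sanity check, the keys described above can be verified before running the module. The following is a minimal sketch, not part of the template: the helper names `parse_env_text` and `missing_env_keys` are hypothetical, and it assumes the .env file uses plain `KEY=VALUE` lines.

```python
from typing import Dict, Iterable, List


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_env_keys(env: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent or empty in `env`."""
    return [key for key in required if not env.get(key)]


# Placeholder content for illustration only; real keys must never be committed.
sample = """
# Naptha settings
PRIVATE_KEY=example-private-key
OPENAI_API_KEY=
"""

env = parse_env_text(sample)
print(missing_env_keys(env, ["PRIVATE_KEY", "OPENAI_API_KEY"]))
```

In practice you would read the text from the .env file itself (for example with `pathlib.Path(".env").read_text()`) and only require OPENAI_API_KEY when OpenAI is the provider.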

Project Structure

When you clone the module template, you'll see the following structure:

Install the dependencies using Poetry (if you do not have Poetry yet, follow the installation steps below first):

Install Python 3.10+ and pipx

First, ensure you have Python 3.10+ installed. Then install pipx:

Install Poetry Package Manager

Install Poetry using pipx:

Configuration

deployment.json

This file defines the deployment configuration for your module. It includes the module name, node details, and configuration settings.
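To make the three parts mentioned above concrete, here is a rough sketch that builds a deployment entry in Python and checks that each part is present. The field names (`name`, `node`, `config`) and all values are assumptions chosen for illustration; the actual schema is whatever the module template's deployment.json defines.

```python
import json
from typing import Any, Dict, List

# Hypothetical section names mirroring "module name, node details, and
# configuration settings"; check the template's file for the real keys.
REQUIRED_SECTIONS = ("name", "node", "config")


def missing_sections(deployment: Dict[str, Any]) -> List[str]:
    """Return the expected top-level sections that are not present."""
    return [key for key in REQUIRED_SECTIONS if key not in deployment]


# Illustrative values only.
deployment = {
    "name": "vision_agent",                      # module name
    "node": {"ip": "localhost"},                 # node details
    "config": {"llm_config": "vision_default"},  # configuration settings
}

print(json.dumps(deployment, indent=4))
print(missing_sections(deployment))
```

A check like this can catch a missing section early, before the module is deployed to a node.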

llm_configs.json

Here we define the LLM configuration for our agent. We will use the gpt-4o-mini model, but you can also add other multimodal models such as gpt-4-vision, Gemini, or Mistral, and adjust the temperature, max tokens, and API base.
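The settings named above can be pictured as one record per model. The sketch below assumes hypothetical field names (`config_name`, `model`, `temperature`, `max_tokens`, `api_base`) and example values; it is not the template's actual file, only an illustration of how such an entry could be assembled and serialized.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class LLMConfig:
    """One hypothetical entry in llm_configs.json."""
    config_name: str    # assumed label used to refer to this entry
    model: str          # e.g. "gpt-4o-mini"
    temperature: float  # sampling temperature
    max_tokens: int     # upper bound on generated tokens
    api_base: str       # base URL of the model provider's API


def to_json(configs: List[LLMConfig]) -> str:
    """Serialize the entries as a JSON list."""
    return json.dumps([asdict(cfg) for cfg in configs], indent=4)


# Example values are assumptions; adjust them to your provider and needs.
configs = [
    LLMConfig(
        config_name="vision_default",
        model="gpt-4o-mini",
        temperature=0.7,
        max_tokens=1000,
        api_base="https://api.openai.com/v1",
    )
]

print(to_json(configs))
```

Adding another multimodal model would then amount to appending a second entry with its own model name and API base.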

Next Steps

In the next section, we'll implement the core functionality in run.py for our simple vision agent.

The run.py file is the default entry point used when a module run is initiated. Its run function can instantiate a class (e.g. an agent class) or call a function.
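The entry-point pattern described here can be sketched as follows. This is only an outline under stated assumptions: the `SimpleVisionAgent` class, the dictionary shape of `module_run`, and the `image_url` and `question` inputs are all hypothetical placeholders, and no model is actually called.

```python
from typing import Any, Dict


class SimpleVisionAgent:
    """Hypothetical agent class standing in for the real implementation."""

    def __init__(self, model: str) -> None:
        self.model = model

    def describe(self, image_url: str, question: str) -> Dict[str, Any]:
        # Placeholder: a real agent would send the image and question
        # to the configured model and return its answer.
        return {"model": self.model, "image_url": image_url, "question": question}


def run(module_run: Dict[str, Any]) -> Dict[str, Any]:
    """Entry point: instantiate the agent class and delegate to it."""
    inputs = module_run.get("inputs", {})
    agent = SimpleVisionAgent(model=module_run.get("model", "gpt-4o-mini"))
    return agent.describe(
        image_url=inputs.get("image_url", ""),
        question=inputs.get("question", "What is in this image?"),
    )


if __name__ == "__main__":
    # Local smoke test with made-up inputs.
    print(run({"inputs": {"image_url": "https://example.com/cat.png"}}))
```

Keeping run thin and pushing the logic into a class makes the agent easier to test on its own.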