Setting Up Your Development Environment & Schema Design
Welcome to the second module of our knowledge base course!
In this section, we'll establish our development environment and design a robust schema for our Nobel Prize KB module.
Project Setup
First, clone the official Naptha KB module template:
Environment Setup
Next, install the dependencies using Poetry:
Before proceeding, make sure these prerequisites are installed on your system:
1. Python 3.11 or later
2. Poetry
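To confirm both prerequisites in one step, you can run a small script like the one below. This is a minimal sketch, not part of the Naptha template. It checks only the running interpreter's version and whether a poetry executable is on your PATH, using the standard-library sys and shutil modules. The function names are our own.

```python
import shutil
import sys
from typing import Tuple

# Minimum Python version listed in the prerequisites above.
MIN_PYTHON: Tuple[int, int] = (3, 11)


def python_ok(min_version: Tuple[int, int] = MIN_PYTHON) -> bool:
    """Return True if the running interpreter is at least min_version."""
    return sys.version_info[:2] >= min_version


def poetry_on_path() -> bool:
    """Return True if a 'poetry' executable can be found on PATH."""
    return shutil.which("poetry") is not None


if __name__ == "__main__":
    version = sys.version.split()[0]
    print(f"Python {version}: {'OK' if python_ok() else 'too old, need 3.11+'}")
    print(f"Poetry: {'found' if poetry_on_path() else 'not found on PATH'}")
```

Save it under any name, for example check_prereqs.py (a hypothetical file name), and run it with the interpreter you plan to use for the project.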
We'll also need a few additional dependencies for data handling and progress tracking. You can add them to your pyproject.toml:
Or install them directly using Poetry:
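After installing, you can check that the new packages are importable from the project's environment. The sketch below does not assume particular package names. Instead, it takes module names on the command line and uses the standard-library importlib.util.find_spec to report any top-level module that cannot be found, without importing it. Keep in mind that a package's import name can differ from its name on PyPI.

```python
import importlib.util
import sys
from typing import Iterable, List


def missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the names in module_names that cannot be found for import.

    Pass top-level module names (e.g. "json"); find_spec looks a top-level
    module up without importing it.
    """
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names come from the command line, e.g. the import names of the
    # packages you added to pyproject.toml.
    missing = missing_modules(sys.argv[1:])
    if missing:
        print("Missing modules:", ", ".join(missing))
        sys.exit(1)
    print("All requested modules are importable.")
```

Run it through Poetry, for example poetry run python check_deps.py followed by the module names (check_deps.py is a hypothetical file name), so that it inspects Poetry's virtual environment rather than your system interpreter.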
Project Structure
Your project should have the following structure:
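Once your files are in place, a quick script can confirm that nothing is missing. The expected paths below are an assumption pieced together from files this guide mentions (pyproject.toml, schemas.py, configs/deployment.json, and the data folder from the example repository), so edit the list to match your template's actual layout.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical layout based on files mentioned in this guide; adjust as needed.
EXPECTED_PATHS = [
    "pyproject.toml",
    "nobel_kb/schemas.py",
    "nobel_kb/configs/deployment.json",
    "nobel_kb/data",
]


def missing_paths(root: str, expected: Iterable[str] = EXPECTED_PATHS) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    base = Path(root)
    return [rel for rel in expected if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    print("All expected paths present." if not missing else f"Missing: {missing}")
```

Run it from the project root. Because it only checks existence, it will not catch files that are present but empty or malformed.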
Schema Design
The schemas.py file defines how our KB module will interact with data. Let's create our Nobel Prize schema:
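As a minimal sketch of what such a schema can contain, the example below defines one record type for a laureate and one request type that names an operation plus its arguments. It uses standard-library dataclasses so it runs without extra dependencies. If your template's schemas.py already uses Pydantic models, the same fields carry over. The field names, the func_name/func_input_data request shape, and the set of allowed operations are all assumptions for illustration, not the template's definitions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Hypothetical operations the KB module might expose (illustrative names only).
ALLOWED_FUNCTIONS = {"init", "query", "add_data"}


@dataclass
class NobelLaureate:
    """One row of the laureates dataset (field names are assumptions)."""
    id: int
    firstname: str
    surname: str
    year: int
    category: str
    motivation: str = ""

    def __post_init__(self) -> None:
        # Nobel Prizes were first awarded in 1901, so earlier years are invalid.
        if self.year < 1901:
            raise ValueError(f"year {self.year} predates the first Nobel Prizes")


@dataclass
class InputSchema:
    """Request sent to the KB module: which operation to run, plus its arguments."""
    func_name: str
    func_input_data: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        # Reject unknown operations early instead of failing deep in the module.
        if self.func_name not in ALLOWED_FUNCTIONS:
            raise ValueError(
                f"unknown func_name {self.func_name!r}; "
                f"expected one of {sorted(ALLOWED_FUNCTIONS)}"
            )
```

For example, InputSchema(func_name="query", func_input_data={"query": "Curie"}) is accepted, while an unlisted func_name raises ValueError at construction time.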
Configuration
Let's set up our KB's deployment configuration in configs/deployment.json.
This configuration defines the storage backend, schema, and query options for our knowledge base:
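To make the three parts of that configuration concrete, the sketch below builds one as a Python structure and prints it as JSON. Every key name here (storage_config, storage_type, storage_schema, options, query_col, answer_col) and the "db" backend label are illustrative assumptions, not the Naptha format. Mirror whatever structure your template's configs/deployment.json already uses.

```python
import json
from typing import Any, Dict, List


def build_deployment_config(module_name: str, table_name: str) -> List[Dict[str, Any]]:
    """Return a deployment config covering storage backend, schema, and query options.

    All key names are hypothetical; align them with your template's file.
    """
    return [
        {
            "name": f"{module_name}_deployment",
            "module": {"name": module_name},
            "config": {
                "storage_config": {
                    "storage_type": "db",  # storage backend (hypothetical label)
                    "path": table_name,  # table the KB writes to
                    "storage_schema": {  # column name -> column definition
                        "id": {"type": "INTEGER", "primary_key": True},
                        "firstname": {"type": "TEXT"},
                        "surname": {"type": "TEXT"},
                        "year": {"type": "INTEGER"},
                        "category": {"type": "TEXT"},
                        "motivation": {"type": "TEXT"},
                    },
                    "options": {  # query options
                        "query_col": "surname",
                        "answer_col": "motivation",
                    },
                },
            },
        }
    ]


if __name__ == "__main__":
    # Print rather than write, so you can review before saving to configs/deployment.json.
    print(json.dumps(build_deployment_config("nobel_kb", "nobel_laureates"), indent=2))
```

Generating the file from code is optional; the point is that the storage backend, the column schema, and the query options live in separate, clearly named sections.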
Data Preparation
For our Nobel Prize dataset, we'll use the open-source Nobel Prize Laureates dataset published by Nobel Media AB, available at https://public.opendatasoft.com/explore/dataset/nobel-prize-laureates/information/. It covers every Nobel laureate since the first prizes were awarded in 1901, and it includes laureates who have received more than one Nobel Prize.
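Because a laureate can appear more than once, it helps to identify repeat winners before loading the data. The sketch below assumes one row per award and an id column shared by all of a laureate's rows. Both the column name and the sample ids in the usage are hypothetical, so check them against the dataset's actual headers.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def repeat_laureates(rows: Iterable[Mapping[str, str]], id_col: str = "id") -> Dict[str, int]:
    """Map each laureate id that appears in more than one row to its row count.

    Assumes one row per award, so the row count equals the number of prizes.
    """
    counts = Counter(row[id_col] for row in rows)
    return {laureate_id: n for laureate_id, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Made-up ids; the awards themselves are real (Curie: 1903 and 1911).
    sample = [
        {"id": "101", "surname": "Curie", "year": "1903", "category": "physics"},
        {"id": "101", "surname": "Curie", "year": "1911", "category": "chemistry"},
        {"id": "102", "surname": "Röntgen", "year": "1901", "category": "physics"},
    ]
    print(repeat_laureates(sample))
```

With the sample rows, only id "101" is reported, with a count of 2.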
We also cleaned and formatted the data into two CSV files available at https://github.com/thestriver/nobel_kb/tree/main/nobel_kb/data with the following headers:
We will use the smaller nobel-prize-laureates-2024.csv file for our KB module.
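Before wiring the file into the module, you can sanity-check it with the standard-library csv module. The sketch below reads the CSV into a list of row dictionaries and can optionally filter on a category column. That column name is an assumption, so substitute the header your copy of the file actually uses.

```python
import csv
from typing import Dict, List, Optional


def load_laureates(path: str, category: Optional[str] = None) -> List[Dict[str, str]]:
    """Read the laureates CSV into row dicts, optionally keeping one category.

    The "category" column name is hypothetical; match it to the real header.
    Comparison is case-insensitive, and all values stay as strings.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    if category is not None:
        rows = [r for r in rows if r.get("category", "").lower() == category.lower()]
    return rows
```

For example, load_laureates("nobel_kb/data/nobel-prize-laureates-2024.csv", category="physics") would return only physics rows, assuming the file sits at that path in your project (the location is inferred from the example repository) and has a category column. Since csv.DictReader yields strings, convert numeric fields such as the year before validating them against your schema.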
Next Steps
In the next section, we'll implement the core functionality of our Nobel Prize KB module, including:
- Data loading and initialization
- Query handling and search functionality
- Error handling and validation
Make sure your environment is properly set up and your schema is well-defined before moving forward!