Setting Up Your Development Environment & Schema Design

Welcome to the second module of our knowledge base course!

In this section, we'll establish our development environment and design a robust schema for our Nobel Prize KB module.

Project Setup

First, clone the official Naptha KB module template:

Environment Setup

Next, install the project dependencies using Poetry:

Before proceeding, ensure you have these prerequisites installed on your system:

1. Install Python 3.11+

2. Install Poetry
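
To confirm both prerequisites before continuing, a small check like the following can help. This is a minimal sketch, not part of the template: the function name missing_prerequisites is hypothetical, and it assumes Poetry is exposed on your PATH as an executable named poetry.

```python
import shutil
import sys


def missing_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    # This course asks for Python 3.11 or newer.
    if tuple(version_info[:2]) < (3, 11):
        problems.append(
            f"Python 3.11+ required, found {version_info[0]}.{version_info[1]}"
        )
    # Assumption: Poetry is installed as an executable called `poetry` on PATH.
    if which("poetry") is None:
        problems.append("Poetry not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

The interpreter version and the PATH lookup are passed in as parameters so the check can be exercised without changing the machine it runs on.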

We'll also need some additional dependencies for data handling and progress tracking. You can add these to your pyproject.toml:

Or install them directly using Poetry:

Project Structure

Your project should have the following structure:

Schema Design

The schemas.py file defines how our KB module will interact with data. Let's create our Nobel Prize schema:
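
As a rough illustration of what such a schema can look like, here is a minimal sketch built on standard-library dataclasses so that it runs without extra dependencies. The template's own schemas.py may use a different base class, and every class name, field name, and operation name below (LaureateRecord, KBRequest, init, add_data, run_query) is a hypothetical placeholder rather than the module's actual definition.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class LaureateRecord:
    """One laureate entry; every field name here is a hypothetical placeholder."""
    name: str
    year: int
    category: str
    motivation: Optional[str] = None

    def __post_init__(self) -> None:
        # CSV readers yield strings, so coerce the year before validating it.
        self.year = int(self.year)
        if self.year < 1901:
            raise ValueError(f"year {self.year} predates the first Nobel Prizes")
        if not self.name.strip():
            raise ValueError("name must not be empty")


@dataclass
class KBRequest:
    """A request to the KB: which operation to run and with what arguments."""
    operation: str
    arguments: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        allowed = {"init", "add_data", "run_query"}  # hypothetical operation names
        if self.operation not in allowed:
            raise ValueError(f"unknown operation: {self.operation!r}")
```

Validating in __post_init__ means a malformed row or an unsupported operation fails as soon as the object is created, before anything reaches storage.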

Configuration

Let's set up our KB's deployment configuration in configs/deployment.json. This configuration defines the storage backend, schema, and query options for our knowledge base:
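
To make the three parts of that configuration concrete, the sketch below builds a config as a Python dictionary, checks that each part is present, and serializes it to JSON. The key names (storage, schema, query_options, and everything nested under them) are illustrative assumptions only; the actual layout of deployment.json is defined by the template and may differ.

```python
import json

# Hypothetical layout: these key names are illustrative assumptions,
# not the template's actual deployment.json format.
deployment = {
    "name": "nobel_kb_deployment",
    "storage": {"backend": "db", "table": "nobel_laureates"},
    "schema": {"name": "text", "year": "integer", "category": "text"},
    "query_options": {"search_column": "name", "limit": 10},
}

REQUIRED_SECTIONS = ("storage", "schema", "query_options")


def validate_deployment(config: dict) -> list:
    """Return the required sections that are missing from a deployment config."""
    return [section for section in REQUIRED_SECTIONS if section not in config]


if __name__ == "__main__":
    missing = validate_deployment(deployment)
    if missing:
        raise SystemExit(f"missing sections: {missing}")
    # Pretty-print the JSON that would be written to the config file.
    print(json.dumps(deployment, indent=2))
```

Checking the config in Python before writing it out catches a missing section early, instead of at deployment time.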

Data Preparation

For our Nobel Prize dataset, we'll use the open Nobel Prize Laureates dataset published by Nobel Media AB, available at https://public.opendatasoft.com/explore/dataset/nobel-prize-laureates/information/. The dataset lists all Nobel laureates from 1901 onwards, the year the prizes were first awarded, and it accounts for laureates who have been awarded more than one Nobel Prize.

We also cleaned and formatted the data into two CSV files available at https://github.com/thestriver/nobel_kb/tree/main/nobel_kb/data with the following headers:

We will use the smaller nobel-prize-laureates-2024.csv file for our KB module.
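
Before wiring the file into the module, it helps to inspect its columns. The sketch below reads a CSV stream with the standard csv module and returns the header names together with the rows. The helper name read_laureates is hypothetical, and the in-memory sample uses made-up column names that stand in for the real file's headers.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_laureates(handle: TextIO) -> Tuple[List[str], List[Dict[str, str]]]:
    """Read a CSV stream and return (header names, rows as dicts)."""
    reader = csv.DictReader(handle)
    rows = [dict(row) for row in reader]
    # fieldnames is None for an empty stream, so fall back to an empty list.
    headers = list(reader.fieldnames or [])
    return headers, rows


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file, with made-up column names.
    sample = io.StringIO("name,year,category\nMarie Curie,1903,Physics\n")
    headers, rows = read_laureates(sample)
    print(headers)
    print(len(rows))
```

For the real data, pass a handle opened with open(path, newline="", encoding="utf-8"), where path points to your local copy of nobel-prize-laureates-2024.csv. Every value comes back as a string, so numeric columns need converting before use.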

Next Steps

In the next section, we'll implement the core functionality of our Nobel Prize KB module, including:

Data loading and initialization
Query handling and search functionality
Error handling and validation

Make sure your environment is properly set up and your schema is well-defined before moving forward!