How-To Guides

Repository Clone

To get started, first clone the GitHub repository:

bash
git clone https://github.com/ICICLE-ai/organization-sic-classifier-for-smart-foodsheds.git
cd organization-sic-classifier-for-smart-foodsheds

Create and Activate Virtual Environment

Create a virtual environment:

bash
python3 -m venv venv

Activate the virtual environment:

  • On macOS/Linux:
bash
source venv/bin/activate
  • On Windows:
bash
venv\Scripts\activate
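
A quick way to confirm that activation worked, sketched here for macOS/Linux shells (or Git Bash on Windows). It relies on the VIRTUAL_ENV variable that the venv activation script exports; the helper name venv_status is hypothetical and not part of this repository.

```bash
# Report whether a virtual environment appears to be active.
# The venv activate script exports VIRTUAL_ENV; it is unset otherwise.
# venv_status is an illustrative helper, not something the repository ships.
venv_status() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active: ${VIRTUAL_ENV}"
  else
    echo "inactive"
    return 1
  fi
}

venv_status || echo "Activate the virtual environment before installing packages."
```

If the function prints "inactive", rerun the activation command for your platform in the same terminal session.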

Install all required Python packages:

bash
pip install -r requirements.txt

Dataset

The dataset is available on Hugging Face and must be downloaded before running any training or testing script.

Dataset Download Instructions

bash
git lfs install
git clone https://huggingface.co/datasets/ICICLE-AI/organization-sic-code_smart-foodsheds

After the download finishes, extract the dataset and place the resulting data/ folder in the root directory of the cloned organization-sic-classifier-for-smart-foodsheds repository (next to src/).
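
The expected placement can be checked with a small shell function. This is a minimal sketch that only tests whether data/ and src/ sit side by side; check_layout is a hypothetical helper name, and the function assumes it is run from (or pointed at) the root of the cloned code repository.

```bash
# Check that data/ and src/ sit side by side in the given repository root.
# check_layout is an illustrative helper, not part of the repository.
check_layout() {
  local root="${1:-.}"
  local dir
  local missing=0
  for dir in src data; do
    if [ ! -d "${root}/${dir}" ]; then
      echo "missing directory: ${root}/${dir}"
      missing=1
    fi
  done
  return "${missing}"
}

if check_layout .; then
  echo "layout looks right"
else
  echo "move the extracted data/ folder into the repository root"
fi
```

The function returns a non-zero status when either folder is absent, so it can also be used as a guard before running a script.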

Dataset Variants

The dataset includes multiple variants based on the source of the organization descriptions:

  • gsnip: Google search snippets
  • gptsummary: GPT-4o-mini generated summaries
  • llamasummary: LLaMA 3.1–8B Instruct generated summaries
  • gsnip+gptsummary: Google search snippets combined with GPT-4o-mini generated summaries
  • gsnip+llamasummary: Google search snippets combined with LLaMA 3.1–8B Instruct generated summaries

Each variant includes the following splits:

  • train.csv
  • dev.csv
  • test.csv
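
Before training, it can help to confirm that every variant has all three splits. The sketch below assumes each variant lives in its own folder under data/ (for example data/gsnip/train.csv); that layout is an assumption and may not match the downloaded dataset, so adjust the paths as needed. The helper name missing_splits is hypothetical.

```bash
# ASSUMPTION: each variant has its own folder, e.g. data/gsnip/train.csv.
# The actual layout of the downloaded dataset may differ.
VARIANTS="gsnip gptsummary llamasummary gsnip+gptsummary gsnip+llamasummary"
SPLITS="train dev test"

# Print every expected split file that is absent under the given data root.
# Prints nothing when all variants are complete.
missing_splits() {
  local root="${1:-data}"
  local variant split
  for variant in ${VARIANTS}; do
    for split in ${SPLITS}; do
      if [ ! -f "${root}/${variant}/${split}.csv" ]; then
        echo "${root}/${variant}/${split}.csv"
      fi
    done
  done
}

missing_splits data
```

Empty output means that every variant has its train, dev, and test files under the assumed layout.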