This guide provides detailed instructions for installing and configuring BOND.

Requirements:
- Python: 3.11 or higher
- Operating System: Linux, macOS, or Windows
- Memory: 8GB RAM minimum (16GB+ recommended)
- Disk Space: ~5GB for ontology database and FAISS indices
- Optional: GPU for faster embedding inference (CUDA-compatible)
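The Python version floor can be checked up front. A minimal stdlib snippet (nothing BOND-specific is assumed here):

```python
import sys

def meets_floor(min_version=(3, 11)):
    """True if the running interpreter satisfies BOND's minimum Python."""
    return sys.version_info >= min_version

print("Python OK" if meets_floor() else "Please upgrade to Python 3.11+")
```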
```bash
# Clone the repository
git clone https://github.com/Aronow-Lab/BOND.git
cd BOND

# Create virtual environment
python3.11 -m venv bond_venv

# Activate (Linux/macOS)
source bond_venv/bin/activate
# Activate (Windows)
bond_venv\Scripts\activate

# Upgrade pip
pip install --upgrade pip

# Install BOND package (editable mode)
pip install -e .
```

Or install with development dependencies:

```bash
pip install -e ".[dev]"
```

You need an SQLite database containing ontology terms. Detailed information on how to create, update, and manage the ontology database can be found in assets/README.md.
You have two options:

Option 1: If you have access to a pre-built ontologies.sqlite file:

```bash
mkdir -p assets
cp /path/to/ontologies.sqlite assets/ontologies.sqlite
```

Option 2: Generate the SQLite database from OBO/OWL files:

```bash
bond-generate-sqlite \
    --input_dir /path/to/ontology/files \
    --output_path assets/ontologies.sqlite
```

The script supports:

- OBO format files (.obo)
- OWL format files (.owl)
- JSON-LD format
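Ontology formats like these are typically distinguished by file extension. A hypothetical helper illustrating that routing (the real detection lives inside bond-generate-sqlite and may differ):

```python
from pathlib import Path

# Hypothetical extension-to-format map mirroring the formats listed above.
FORMATS = {".obo": "OBO", ".owl": "OWL", ".json": "JSON-LD", ".jsonld": "JSON-LD"}

def classify(paths):
    """Map each file name to its ontology format, or 'unsupported'."""
    return {Path(p).name: FORMATS.get(Path(p).suffix.lower(), "unsupported")
            for p in paths}

print(classify(["cl.obo", "uberon.owl", "mondo.json", "readme.txt"]))
```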
Required ontologies for full functionality:
- Cell Ontology (CL)
- UBERON
- MONDO Disease Ontology
- Experimental Factor Ontology (EFO)
- PATO
- HANCESTRO
- NCBI Taxonomy
- Organism-specific development stage ontologies (HsapDv, MmusDv, etc.)
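To confirm those ontologies actually made it into the database, you can count terms per CURIE prefix. The table and column names below (`terms`, `id`) are assumptions — check assets/README.md for the real schema; the demo runs against a tiny in-memory stand-in rather than assets/ontologies.sqlite:

```python
import sqlite3

def count_terms_by_prefix(conn, table="terms", id_col="id"):
    """Count ontology terms grouped by CURIE prefix (e.g. 'CL', 'UBERON').

    NOTE: table/column names are assumptions; adjust to the actual schema.
    """
    counts = {}
    for (term_id,) in conn.execute(f"SELECT {id_col} FROM {table}"):
        prefix = term_id.split(":", 1)[0]
        counts[prefix] = counts.get(prefix, 0) + 1
    return counts

# Demo with an in-memory database standing in for assets/ontologies.sqlite
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE terms (id TEXT PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO terms VALUES (?, ?)", [
    ("CL:0000084", "T cell"),
    ("CL:0000623", "natural killer cell"),
    ("UBERON:0000178", "blood"),
])
print(count_terms_by_prefix(conn))
```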
Create or update the abbreviations dictionary at assets/abbreviations.json to improve query matching for abbreviated terms. This file is optional but recommended:
```bash
# Create abbreviations file
mkdir -p assets
cat > assets/abbreviations.json << 'EOF'
{
  "cell_type": {
    "t": "t cell",
    "nk": "natural killer cell",
    "dc": "dendritic cell",
    "b": "b cell",
    "mono": "monocyte",
    "mφ": "macrophage",
    "neu": "neutrophil"
  },
  "tissue": {
    "bm": "bone marrow",
    "ln": "lymph node",
    "spl": "spleen"
  }
}
EOF
```

Before building the FAISS index, you need to configure your embedding model. The FAISS index must be built with the same embedding model you'll use at runtime.
Important: Configure your embedding model in the Environment Configuration step below before building the FAISS index.
See the Selecting Your Encoder section below for detailed options.
Build the FAISS index for dense semantic search:
```bash
bond-build-faiss \
    --sqlite_path assets/ontologies.sqlite \
    --assets_path assets \
    --embed_model st:all-MiniLM-L6-v2
```

Note: This step requires:

- An embedding model configured in the .env file (see the Environment Configuration step below)
- Several hours for large ontology databases
- Sufficient disk space (~2-5GB)

Important: Make sure you've configured your embedding model in the .env file before running this command, as the FAISS index must match your runtime embedding model.
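For intuition, here is what the dense index provides in miniature: nearest-neighbour search over L2-normalized term embeddings. The 3-dimensional vectors below are toy stand-ins for real sentence-embedding output; FAISS performs the same lookup at scale:

```python
import numpy as np

# Toy "index": three ontology term labels with made-up embedding vectors.
terms = ["t cell", "natural killer cell", "bone marrow"]
emb = np.array([[1.0, 0.1, 0.0],
                [0.9, 0.4, 0.1],
                [0.0, 0.1, 1.0]])
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # L2-normalize rows

def search(query_vec, k=2):
    """Return the k nearest terms by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    scores = emb @ q                      # cosine similarity per term
    top = np.argsort(-scores)[:k]
    return [(terms[i], float(scores[i])) for i in top]

print(search(np.array([1.0, 0.0, 0.0])))
```

This is why index and runtime models must match: the query vector and the stored vectors only live in the same space if they come from the same encoder.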
Create a .env file in the project root:
```bash
# Embedding Model Configuration
# Options:
#   - st:all-MiniLM-L6-v2 (Sentence Transformers, default)
#   - st:sentence-transformers/all-mpnet-base-v2
#   - litellm/http://your-embedding-service
BOND_EMBED_MODEL=ollama/rajdeopankaj/bond-embed-v1-fp16:latest

# LLM Providers for Expansion and Disambiguation
# You need at least one configured

# Option 1: Anthropic Claude
BOND_EXPANSION_LLM=anthropic/claude-3-5-sonnet-20241022
BOND_DISAMBIGUATION_LLM=anthropic/claude-3-5-sonnet-20241022
ANTHROPIC_API_KEY=your-anthropic-api-key

# Option 2: OpenAI GPT
# BOND_EXPANSION_LLM=openai/gpt-4o
# BOND_DISAMBIGUATION_LLM=openai/gpt-4o
# OPENAI_API_KEY=your-openai-api-key

# Option 3: Other LiteLLM-compatible providers
# BOND_EXPANSION_LLM=cohere/command-r-plus
# BOND_DISAMBIGUATION_LLM=cohere/command-r-plus
# COHERE_API_KEY=your-cohere-api-key

# Paths (defaults shown)
BOND_ASSETS_PATH=assets
BOND_SQLITE_PATH=assets/ontologies.sqlite
BOND_RERANKER_PATH=reranker-model/

# Optional: Retrieval-only mode (skip LLM stages)
# BOND_RETRIEVAL_ONLY=1

# Optional: API Authentication
# BOND_API_KEY=your-secret-api-key
# BOND_ALLOW_ANON=1  # Allow anonymous access (development only)
```

The reranker model improves accuracy by 10-15%. It's optional but recommended:
- Download from Hugging Face: https://huggingface.co/AronowLab/BOND-reranker
- Extract model files to the reranker-model/ directory:

  ```bash
  mkdir -p reranker-model
  # Download and extract model files to reranker-model/
  ```

- Verify files: The directory should contain:
  - config.json
  - model.safetensors (or pytorch_model.bin)
  - tokenizer_config.json
  - vocab.txt
  - Other tokenizer files
Note: The BOND_RERANKER_PATH in your .env file should point to this directory (default: reranker-model/). See reranker-model/README.md for detailed instructions.
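A small script can verify the directory before wiring it into .env. The required-file list mirrors the one above; the demo checks a temporary stand-in directory rather than reranker-model/ itself:

```python
import tempfile
from pathlib import Path

REQUIRED = ["config.json", "tokenizer_config.json", "vocab.txt"]

def missing_reranker_files(model_dir):
    """List required reranker files absent from model_dir."""
    d = Path(model_dir)
    missing = [name for name in REQUIRED if not (d / name).exists()]
    # Weights ship as either safetensors or a PyTorch checkpoint.
    if not ((d / "model.safetensors").exists()
            or (d / "pytorch_model.bin").exists()):
        missing.append("model.safetensors (or pytorch_model.bin)")
    return missing

# Demo against a temporary stand-in for reranker-model/
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text("{}")
    result = missing_reranker_files(tmp)
print(result)
```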
Verify that all components are properly installed:
```bash
# Check SQLite database exists
ls -lh assets/ontologies.sqlite

# Check FAISS index exists
ls -lh assets/faiss_store/embeddings.faiss
ls -lh assets/faiss_store/id_map.npy

# Check abbreviations file (optional)
ls -lh assets/abbreviations.json

# Check reranker model (optional)
ls -lh reranker-model/config.json

# Check CLI works
bond-query --help

# Test query (requires database and FAISS index)
bond-query \
    --query "T-cell" \
    --field cell_type \
    --organism "Homo sapiens" \
    --tissue "blood"
```

You can also test the Python API directly:

```python
from bond import BondMatcher
from bond.config import BondSettings

# Initialize matcher
settings = BondSettings()
matcher = BondMatcher(settings)

# Test query
result = matcher.query(
    query="T-cell",
    field_name="cell_type",
    organism="Homo sapiens",
    tissue="blood",
)

print(f"Matched: {result['chosen']['label']}")
print(f"Ontology ID: {result['chosen']['id']}")
```
Start the API server and check its health endpoint:

```bash
# Start server (if API key is set)
bond-serve

# In another terminal:
curl http://localhost:8000/health
```

A Dockerfile is provided for containerized deployment:
```bash
# Build image
docker build -t bond:latest .

# Run container
docker run -p 8000:8000 \
    -v $(pwd)/assets:/app/assets \
    -e BOND_API_KEY=your-key \
    -e ANTHROPIC_API_KEY=your-key \
    bond:latest
```

If the ontology database cannot be found, ensure it exists at the specified path:

```bash
ls -lh assets/ontologies.sqlite
```

If the FAISS index is missing, build it:

```bash
bond-build-faiss --sqlite_path assets/ontologies.sqlite --assets_path assets
```

If LLM calls fail:
- Verify API keys are set correctly
- Check API key permissions (write access required)
- Ensure sufficient API credits/quota
- Try a different LLM provider
If you run out of memory while building the FAISS index:
- Build index with smaller batch size
- Use CPU-only FAISS (faiss-cpu) instead of GPU version
- Process ontologies in chunks
If you hit import errors, ensure the virtual environment is activated and dependencies are installed:

```bash
source bond_venv/bin/activate
pip install -e .
```

Next steps:

- Read the README.md for usage examples
- Explore Hybrid Search Guide for advanced features
- Review the Reranker Training Guide for custom model training; see notebooks/ for training code and example notebooks
For questions and support, see the resources below.
- Benchmark Dataset: HuggingFace Dataset
- Paper: Multi-agent AI System for High Quality Metadata Curation at Scale (a related multi-agent curation system)
- Issues: GitHub Issues
Important: Configure your embedding model before building the FAISS index. The FAISS index must be built with the same embedding model you'll use at runtime.

You can use the published BOND encoders with BOND.
Option 1: Ollama

- Pull the model:

  ```bash
  ollama pull rajdeopankaj/bond-embed-v1-fp16
  ```

- Set the env var (e.g., in .env):

  ```bash
  BOND_EMBED_MODEL=ollama:rajdeopankaj/bond-embed-v1-fp16
  # OLLAMA_API_BASE=http://localhost:11434  # if remote, set your host
  ```

- Build FAISS:

  ```bash
  bond-build-faiss --sqlite_path assets/ontologies.sqlite --assets_path assets
  ```

Option 2: LiteLLM-compatible endpoint

- Deploy pankajrajdeo/bond-embed-v1-fp16 behind a LiteLLM-compatible endpoint (e.g., TEI + gateway).
- Set the env var to the routed model name, for example:

  ```bash
  BOND_EMBED_MODEL=litellm:huggingface/teimodel
  ```

- Build FAISS as usual.
References: