Unimelb SIGIR Live Rag Competition

This guide provides comprehensive instructions for setting up your development environment and using the resources provided for the LiveRAG Challenge.

Quickstart

  1. Clone the repository:

    git clone https://github.com/Dadams2/liverag
    cd liverag
  2. Create a virtual environment:

    uv venv
    source .venv/bin/activate
  3. Install dependencies:

    uv sync
  4. Set up environment variables: Copy .env.example to .env and fill in the required values.

  5. Run the setup scripts:

    • To set up AWS resources:
      python scripts/setup_aws.py
    • To set up AWS credentials:
      python scripts/setup_credentials.py
    • To set up Hugging Face access:
      python scripts/setup_hf.py

AWS Account Setup

Team Account

TBD

LiveRAG Account

Access to pre-built indices is provided through a TII-managed AWS account:

To find the Access Key ID and Secret Access Key, please refer to the email on Friday 21 Mar at 09:44.

Configure AWS CLI Profile:

aws configure --profile sigir-participant
# Use the following settings:
# AWS Access Key ID: [your access key]
# AWS Secret Access Key: [your secret key]
# Default region name: us-east-1
# Default output format: json

Verify Access:

# Should display your AWS account ID
aws sts get-caller-identity --profile sigir-participant

# Test access to configuration service
aws ssm get-parameter --name /pinecone/ro_token --profile sigir-participant

Development Environment

Setting Up Python

  1. Install Python 3.12 (or a recent version) on your development machine
  2. Choose a dependency management approach (see next section for uv)

Using uv for Dependency Management

uv is a fast Python package installer and resolver that can be used as an alternative to pip or conda.

Installing uv

# Using pip
pip install uv

# On macOS with Homebrew
brew install uv

# On Linux with curl
curl -LsSf https://astral.sh/uv/install.sh | sh

Creating a Virtual Environment with uv

# Create a new virtual environment
uv venv

# Activate the virtual environment
# On Unix/macOS:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate

Installing Dependencies with uv

# Install packages directly
uv pip install torch transformers boto3 pinecone opensearch-py

# Install with specific versions 
uv pip install torch==2.5.1 transformers==4.45.2 boto3==1.35.88 pinecone==5.4.2 opensearch-py==2.8.0

# Install from requirements.txt
uv pip install -r requirements.txt

Advantages of uv

  • Much faster than pip (up to 10-100x)
  • Better dependency resolution
  • Compatible with existing tools and workflows
  • Can generate lock files for reproducible environments

Hugging Face Setup (for Private Model Access)

The model used in this challenge, tiiuae/Falcon3-10B-Instruct, is private, so you will need to generate an access token and make it available to any code you run.

Generate a Hugging Face Access Token

First, create or log into your Hugging Face account. Then open Settings → Access Tokens on huggingface.co and create a token with at least read access.


Authenticate via CLI (Recommended)

Assuming you have set up your Python environment as above, you should already have huggingface-cli installed:

huggingface-cli login
# Paste your token when prompted

This stores your credentials in a local token file (~/.cache/huggingface/token in recent versions of huggingface_hub; older versions used ~/.huggingface/token), which transformers and other HF libraries pick up automatically.
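If you need to check whether a token is already stored, a small stdlib sketch can look in both default locations (the paths below are assumptions based on common huggingface_hub defaults; your installation may differ):

```python
from pathlib import Path

# Hedged sketch: locate a CLI-stored Hugging Face token. Recent
# huggingface_hub versions write ~/.cache/huggingface/token; older
# ones used ~/.huggingface/token. Both paths are assumed defaults.
def find_hf_token():
    candidates = [
        Path.home() / ".cache" / "huggingface" / "token",
        Path.home() / ".huggingface" / "token",
    ]
    for path in candidates:
        if path.is_file():
            return path.read_text().strip()
    return None  # fall back to the HUGGINGFACE_HUB_TOKEN env var
```

If this returns None, log in via the CLI or set the environment variable described below.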


Use Environment Variables

If you're running code on AWS (EC2 or containers), set the token as an environment variable:

export HUGGINGFACE_HUB_TOKEN=your_token_here

Or add this to your .env file:

HUGGINGFACE_HUB_TOKEN=your_token_here

Ensure your code or scripts load the environment variables, e.g., with dotenv.
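As a minimal sketch of what such loading does, the snippet below parses simple KEY=VALUE lines from a .env file using only the standard library; the third-party python-dotenv package's load_dotenv() handles this more robustly (quoting, comments, variable interpolation) and is the usual choice:

```python
import os

# Minimal stdlib sketch of .env loading (python-dotenv is more robust).
# Existing environment variables are not overridden.
def load_env_file(path=".env"):
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                # Skip blanks, comments, and malformed lines
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
    except FileNotFoundError:
        pass  # no .env present; rely on the ambient environment
```

Call load_env_file() before any code that reads HUGGINGFACE_HUB_TOKEN.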


Pass Token Programmatically in Code

If needed, you can pass the token directly when loading models:

from transformers import AutoTokenizer, AutoModelForCausalLM
from huggingface_hub import login

login(token="your_token_here")  # optional if already logged in

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/Falcon3-10B-Instruct",
    token="your_token_here"
)

tokenizer = AutoTokenizer.from_pretrained(
    "tiiuae/Falcon3-10B-Instruct",
    token="your_token_here"
)

Using huggingface_hub Login in Notebooks

If you are working directly in a notebook (not recommended), use:

from huggingface_hub import notebook_login

notebook_login()

Using Pre-Built Indices

We provide two pre-built indices for retrieval:

Pinecone (Dense) Index

import boto3
from pinecone import Pinecone
from transformers import AutoModel, AutoTokenizer

# Get Pinecone token from AWS SSM
session = boto3.Session(profile_name="sigir-participant", region_name="us-east-1")
ssm = session.client("ssm")
token = ssm.get_parameter(Name="/pinecone/ro_token", WithDecryption=True)["Parameter"]["Value"]

# Initialize Pinecone
pc = Pinecone(api_key=token)
index = pc.Index(name="fineweb10bt-512-0w-e5-base-v2")

# See the example notebook for full query implementation
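The full query flow (embedding the question with the same e5-base-v2 model the index was built with, then searching) is in the example notebook. As a hedged sketch, a retrieval call against the index above might be wrapped like this; top_k and include_metadata are standard Pinecone query parameters, and the helper name is ours:

```python
# Hedged sketch: wrap a Pinecone dense retrieval call. `index` is the
# pc.Index(...) object from above; `query_vector` must be produced by
# the same e5-base-v2 embedding model used to build the index.
def dense_retrieve(index, query_vector, k=5):
    result = index.query(vector=query_vector, top_k=k, include_metadata=True)
    # Each match carries an id, a similarity score, and stored metadata
    return result["matches"]
```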

OpenSearch (Sparse) Index

import boto3
from opensearchpy import OpenSearch, AWSV4SignerAuth, RequestsHttpConnection

# Get credentials and endpoint
session = boto3.Session(profile_name="sigir-participant")
credentials = session.get_credentials()
auth = AWSV4SignerAuth(credentials, region="us-east-1")

ssm = session.client("ssm")
host_name = ssm.get_parameter(Name="/opensearch/endpoint")["Parameter"]["Value"]

# Initialize OpenSearch client
aos_client = OpenSearch(
    hosts=[{"host": host_name, "port": 443}],
    http_auth=auth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

# See the example notebook for full query implementation
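For the sparse side, a standard BM25 full-text search uses OpenSearch's match query. The sketch below builds such a query body; the field name "text" and the index name in the usage comment are assumptions, so check the actual index mapping (e.g. via aos_client.indices.get_mapping) before relying on them:

```python
# Hedged sketch: build a BM25 "match" query body for OpenSearch.
# The default field name "text" is an assumption about the index mapping.
def build_match_query(question, field="text", size=5):
    return {
        "size": size,
        "query": {"match": {field: question}},
    }

# Usage against aos_client from above (index name is hypothetical):
# hits = aos_client.search(index="fineweb-index", body=build_match_query("..."))
```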

Cost Management

Efficient cost management is crucial to ensure AWS credits last throughout the competition:

  • Shut down unused resources – Turn off GPU instances when not in use
  • Monitor costs regularly – Use AWS Cost Explorer and set up CloudWatch billing alarms
  • Experiment on smaller datasets – Test on smaller data before scaling up
  • Use spot instances when appropriate for non-critical workloads
  • Set up AWS Budgets to receive notifications before exceeding planned spending

Additional Resources

Note: Remember that if you exceed your AWS credits, we will be directly charged and not refunded!

Development Updates
